Local Read Sync: On-Tile Efficiency with Dynamic Rendering

The Best of Both Worlds

In the legacy render pass system, we used subpasses to perform efficient on-tile read operations. This allowed the GPU to read from a color or depth attachment directly from its on-chip memory (the tile cache), avoiding expensive trips to main memory. This was a critical optimization for mobile and tiled-rendering GPUs.
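To get a feel for what those "expensive trips" cost, the rough arithmetic below estimates the external memory traffic generated when a G-buffer is stored by one pass and loaded again by the next. The `GBufferConfig` struct, the helper names, and every number plugged in are illustrative assumptions rather than measurements, and the model ignores framebuffer compression and caches.

```cpp
#include <cstdint>

// Hypothetical G-buffer description; all fields are illustrative assumptions.
struct GBufferConfig {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t attachmentCount;
    std::uint32_t bytesPerPixel; // per attachment
};

// Bytes moved to and from main memory per frame if every attachment is
// stored by the geometry pass and loaded again by the lighting pass.
constexpr std::uint64_t offTileBytesPerFrame(const GBufferConfig& c) {
    const std::uint64_t oneWay =
        std::uint64_t{c.width} * c.height * c.attachmentCount * c.bytesPerPixel;
    return 2 * oneWay; // one store plus one load
}

// The same traffic scaled to a frame rate.
constexpr std::uint64_t offTileBytesPerSecond(const GBufferConfig& c, std::uint32_t fps) {
    return offTileBytesPerFrame(c) * fps;
}
```

For an assumed 1920x1080 target with three attachments of 8 bytes per pixel, this works out to roughly 99.5 MB per frame, or about 5.97 GB/s at 60 frames per second. Keeping the data on-tile is meant to avoid that round trip.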

With Vulkan 1.4, this same efficiency is available in Dynamic Rendering: the VK_KHR_dynamic_rendering_local_read extension was promoted to core, and its functionality is switched on through the dynamicRenderingLocalRead device feature. This gives us the simplicity of a "pass-less" world with the performance of a subpass-based world.

Implementing the Local Read

The implementation involves two parts: a specialized barrier and a specific rendering setup. When you use a local read, you tell the GPU: "I want to read from an attachment, but I promise the read will only occur at the same pixel (x, y) location as the current write." This allows the hardware to keep the data on-tile.

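// 1. Define the Dependency
// oldLayout and newLayout match: the barrier orders accesses, it does not transition the image.
// subresourceRange and colorIndex are assumed to be defined elsewhere.
auto localReadBarrier = vk::ImageMemoryBarrier2{
    .srcStageMask = vk::PipelineStageFlagBits2::eColorAttachmentOutput,
    .srcAccessMask = vk::AccessFlagBits2::eColorAttachmentWrite,
    .dstStageMask = vk::PipelineStageFlagBits2::eFragmentShader,
    .dstAccessMask = vk::AccessFlagBits2::eInputAttachmentRead,
    .oldLayout = vk::ImageLayout::eRenderingLocalRead,
    .newLayout = vk::ImageLayout::eRenderingLocalRead,
    .image = gBufferAttachment.image(),
    .subresourceRange = subresourceRange
};

// A barrier recorded inside a rendering instance must be by-region;
// that flag is what lets the dependency stay local to each tile.
auto localReadDependency = vk::DependencyInfo{
    .dependencyFlags = vk::DependencyFlagBits::eByRegion,
    .imageMemoryBarrierCount = 1,
    .pImageMemoryBarriers = &localReadBarrier
};

// 2. Perform the Rendering
// Map color attachments to input attachment indices. This structure is not part of
// the RenderingInfo pNext chain: it is set on the command buffer inside the rendering
// instance, and the same mapping must be supplied when the pipeline is created.
auto localReadInfo = vk::RenderingInputAttachmentIndexInfo{
    .colorAttachmentCount = 1,
    .pColorAttachmentInputIndices = &colorIndex
};

auto renderingInfo = vk::RenderingInfo{
    // ...
};

commandBuffer.beginRendering(renderingInfo);
commandBuffer.setRenderingInputAttachmentIndices(localReadInfo);
// ... record the draws that write the attachment ...
commandBuffer.pipelineBarrier2(localReadDependency);
// ... record the draws whose Slang shader reads it on-tile ...
commandBuffer.endRendering();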
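The same-pixel promise described above can be illustrated with a small CPU-side model. This is only a sketch: the `Pixel` struct, the helper names, and the 16x16 tile size are hypothetical, and real hardware chooses its own tile dimensions. It shows why a read at the fragment's own coordinate can always be served from the current tile, while a read at a neighboring coordinate may belong to a different tile entirely.

```cpp
#include <cstdint>

// Hypothetical model of a tiled framebuffer; the tile size is an assumption.
struct Pixel {
    std::uint32_t x;
    std::uint32_t y;
};

constexpr std::uint32_t kTileSize = 16;

// Index of the tile that owns a pixel, for a framebuffer `tilesPerRow` tiles wide.
constexpr std::uint32_t tileOf(Pixel p, std::uint32_t tilesPerRow) {
    return (p.y / kTileSize) * tilesPerRow + (p.x / kTileSize);
}

// A local read is only valid at the fragment's own coordinate.
constexpr bool isLocalRead(Pixel fragment, Pixel read) {
    return fragment.x == read.x && fragment.y == read.y;
}

// True when the pixel being read lives in the tile currently being shaded.
constexpr bool staysOnTile(Pixel fragment, Pixel read, std::uint32_t tilesPerRow) {
    return tileOf(fragment, tilesPerRow) == tileOf(read, tilesPerRow);
}
```

In this model, a fragment at (15, 7) reading (16, 7) crosses into the next tile, so that data cannot be assumed to be on-chip. A read at (14, 7) happens to stay within the tile, but it still breaks the promise, because the by-region dependency only orders accesses at matching pixel locations.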

Slang Integration

In your Slang shader, you use the standard input attachment syntax. The Slang compiler will correctly target the SPIR-V instructions required for local read access. This ensures that your shader code remains clean and portable across different hardware.

// Slang snippet
[[vk::input_attachment_index(0)]]
SubpassInput<float4> gBufferInput;

[shader("fragment")]
float4 main(float2 uv : TEXCOORD0) : SV_Target {
    // Fetches the value stored at this fragment's own pixel location
    float4 data = gBufferInput.SubpassLoad();
    // ...
}

By mastering local read synchronization, you can build a modern deferred renderer that keeps its intermediate data on-tile just as a legacy subpass-based renderer does, but with the flexibility and clarity of modern Vulkan. In the next chapter, we’ll see how these principles apply to direct CPU-to-GPU data movement in Host Image Copies.
