The Bubble Problem: Finding and Fixing Stalls
Identifying the Bubble
A "bubble" in the GPU timeline is a period where some units sit idle because they are waiting for a dependency to be satisfied. Bubbles are hard to spot by reading your code alone: you might think you’ve enabled overlap, but if your stage masks are too broad, the GPU may still stall.
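To find them, use a GPU profiler such as NVIDIA Nsight Graphics or AMD Radeon GPU Profiler. The Synchronization Validation feature of the Khronos validation layer (shipped with the LunarG Vulkan SDK) is a useful companion: it does not measure timing, but it flags hazards caused by missing or insufficient barriers. In a profiler, a bubble looks like a gap in the timeline where the graphics or compute row is empty while the other is busy.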
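To make "a gap where one row is empty while the other is busy" concrete, here is a minimal C++17 sketch that computes those gaps from two lists of busy spans. It assumes you already have each queue's busy periods as (begin, end) pairs, obtained from a capture by whatever means your profiler supports; the Interval type, mergeBusy and findBubbles are hypothetical helpers, not part of any profiler's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical busy span on one queue row, e.g. in microseconds.
struct Interval {
    std::uint64_t begin;
    std::uint64_t end;  // exclusive
};

// Sorts spans and merges any that overlap or touch, so a row becomes a
// list of disjoint busy periods in time order.
std::vector<Interval> mergeBusy(std::vector<Interval> spans) {
    std::sort(spans.begin(), spans.end(),
              [](const Interval& a, const Interval& b) { return a.begin < b.begin; });
    std::vector<Interval> merged;
    for (const Interval& s : spans) {
        assert(s.begin <= s.end);
        if (!merged.empty() && s.begin <= merged.back().end) {
            merged.back().end = std::max(merged.back().end, s.end);
        } else {
            merged.push_back(s);
        }
    }
    return merged;
}

// Returns the periods where `activeQueue` has work but `watchedQueue` has
// none, i.e. the bubbles on `watchedQueue` while the other queue is busy.
std::vector<Interval> findBubbles(const std::vector<Interval>& activeQueue,
                                  const std::vector<Interval>& watchedQueue) {
    const std::vector<Interval> active = mergeBusy(activeQueue);
    const std::vector<Interval> watched = mergeBusy(watchedQueue);
    std::vector<Interval> bubbles;
    std::size_t first = 0;  // first watched span that may overlap the current period
    for (const Interval& period : active) {
        std::uint64_t cursor = period.begin;
        while (first < watched.size() && watched[first].end <= cursor) ++first;
        for (std::size_t k = first; cursor < period.end; ++k) {
            if (k < watched.size() && watched[k].begin < period.end) {
                // Idle gap before this watched span starts.
                if (watched[k].begin > cursor) bubbles.push_back({cursor, watched[k].begin});
                cursor = std::max(cursor, watched[k].end);
            } else {
                // No more watched work inside this period: the rest is idle.
                bubbles.push_back({cursor, period.end});
                cursor = period.end;
            }
        }
    }
    return bubbles;
}
```

For example, with graphics busy over [0, 100) and [150, 300) and compute busy over [20, 60) and [200, 250), findBubbles(graphics, compute) returns four compute bubbles: [0, 20), [60, 100), [150, 200) and [250, 300).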
Common Causes of Bubbles
- Overly Conservative Stage Masks: If you use vk::PipelineStageFlagBits2::eAllCommands for every barrier, each barrier waits for all earlier commands to finish every stage and holds back all later commands until then, effectively draining the GPU before the next task can start. This is the most common cause of bubbles. Always use the most specific stage mask possible.
- Sequential Submission: Even with two queues, if your CPU code waits for one queue to finish before submitting to the other, you’ve created a bubble that originates on the CPU side. Use the Wait-Before-Signal pattern and, where appropriate, multiple submission threads.
- Dependency Chains: A chain of small dependencies can sometimes cost more than one slightly broader barrier. If five compute passes each wait for the previous one, every barrier introduces a small stall; batching them into a single compute submission is sometimes the better choice.
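The cost of an overly broad source mask can be illustrated with a toy model. In the C++17 sketch below, a producing pass reports when each of its stages finishes, and a barrier releases dependent work only once every stage named in srcStageMask is done. StageBit, releaseTime and the bit values are invented for this illustration; they mimic, but are not, vk::PipelineStageFlagBits2, and real hardware overlaps stages in ways this model ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for a few vk::PipelineStageFlagBits2 values.
// The bit positions are made up for this model, not Vulkan's real values.
enum StageBit : std::uint32_t {
    kVertexShader   = 1u << 0,
    kFragmentShader = 1u << 1,
    kColorOutput    = 1u << 2,
    kAllCommands    = 0xFFFFFFFFu,  // every stage, playing the role of eAllCommands
};

// When each stage of the producing pass finishes, e.g. in microseconds.
using StageFinishTimes = std::map<std::uint32_t, double>;

// Toy model of a barrier's source scope: dependent work is released once
// every producer stage named in srcStageMask has finished. Naming more
// stages can only push the release time later, never earlier.
double releaseTime(const StageFinishTimes& producer, std::uint32_t srcStageMask) {
    double release = 0.0;
    for (const auto& [stage, finish] : producer) {
        assert(finish >= 0.0);
        if ((stage & srcStageMask) != 0u) {
            release = std::max(release, finish);
        }
    }
    return release;
}
```

With a producer whose vertex, fragment and color-output stages finish at 1.5, 4.0 and 4.5, releaseTime(producer, kVertexShader) returns 1.5 while releaseTime(producer, kAllCommands) returns 4.5: the broad mask makes the consumer wait for stages it may not depend on at all.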
Fixing the Stall
Once you’ve found a bubble, the fix is usually to refine your vk::DependencyInfo.
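- Refine Stage Masks: Check whether your srcStageMask can name an earlier stage, so the barrier waits on less of the producer’s work, or your dstStageMask a later stage, so less of the consumer’s work is blocked. For example, can your compute work start as soon as eVertexShader is done, instead of waiting for eFragmentShader?
- Use Memory Barriers Wisely: Sometimes a single global memory barrier is better than several image barriers if it allows more work to start sooner. This only applies when no image layout transition or queue family ownership transfer is needed, because a global memory barrier cannot perform either.
- Increase Concurrency: If your profiler shows that the compute units are underutilized, consider moving more work (such as occlusion culling) from graphics to compute.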
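The destination side of a barrier can be modeled the same way. In this hypothetical C++17 sketch, a consuming pass runs its stages in order, and only stages named in dstStageMask have to wait for the producer's release time. StageBit, ConsumerStage and consumerFinishTime are invented names with made-up bit values, and treating stages as strictly sequential is a deliberate simplification; the model only shows the direction of the effect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for vk::PipelineStageFlagBits2 values; the bit
// positions are invented for this model.
enum StageBit : std::uint32_t {
    kVertexShader   = 1u << 0,
    kFragmentShader = 1u << 1,
    kAllCommands    = 0xFFFFFFFFu,
};

// One stage of the consuming pass and how long it runs.
struct ConsumerStage {
    std::uint32_t stage;
    double duration;
};

// Toy model of a barrier's destination scope: consumer stages named in
// dstStageMask may not start before `release` (when the producer's
// srcStageMask work finished); other stages start as soon as the previous
// stage is done. Real GPUs overlap stages; this model does not.
double consumerFinishTime(const std::vector<ConsumerStage>& stages,
                          std::uint32_t dstStageMask, double release) {
    double t = 0.0;
    for (const ConsumerStage& s : stages) {
        assert(s.duration >= 0.0);
        if ((s.stage & dstStageMask) != 0u) {
            t = std::max(t, release);  // blocked by the barrier
        }
        t += s.duration;
    }
    return t;
}
```

For a consumer whose vertex stage takes 2.0 and whose fragment stage takes 3.0, with the producer releasing at 4.5, a dstStageMask of kAllCommands finishes at 9.5, while kFragmentShader (appropriate when only the fragment shader reads the produced data) lets the vertex stage run early and finishes at 7.5.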
By systematically finding and eliminating these bubbles, you move from a renderer that "just works" to one that is truly professional-grade. In the next chapter, we’ll see how these same principles apply to one of the most common background tasks in modern games: asset streaming.