The source for this sample can be found in the Khronos Vulkan samples GitHub repository.
Vulkan requires the application to manage image layouts, so that all render pass attachments are in the correct layout when the render pass begins.
This is usually done using pipeline barriers or the initialLayout and finalLayout parameters of the render pass.
If the rendering pipeline is complex, transitioning each image to its correct layout is not trivial, as it requires some sort of state tracking.
If previous image contents are not needed, there is an easy way out: setting the old layout of the transition to UNDEFINED.
While this is functionally correct, it can have performance implications as it may prevent the GPU from performing some optimizations.
This tutorial will cover an example of such optimizations and how to avoid the performance overhead from using sub-optimal layouts.
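The state tracking mentioned above can be as simple as remembering the layout each image was last left in. The following is a minimal sketch under stated assumptions, not the sample's actual implementation: `Layout` is a stand-in for `VkImageLayout` so the code compiles without the Vulkan headers, and `ImageId`, `Transition`, and `LayoutTracker` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-in for VkImageLayout (assumption: a real application would use the
// Vulkan enum directly).
enum class Layout { Undefined, ColorAttachmentOptimal, ShaderReadOnlyOptimal };

// Hypothetical handle type; a real application would key on VkImage.
using ImageId = std::uint64_t;

// Mirrors the oldLayout/newLayout pair of an image memory barrier.
struct Transition {
    Layout old_layout;
    Layout new_layout;
};

class LayoutTracker {
public:
    // Returns the transition to record for this image and remembers
    // new_layout as its last known layout. An image seen for the first
    // time has no contents worth preserving, so Undefined is used as its
    // old layout.
    Transition transition(ImageId image, Layout new_layout) {
        Layout old_layout = Layout::Undefined;
        auto it = last_known_.find(image);
        if (it != last_known_.end()) {
            old_layout = it->second;
        }
        last_known_[image] = new_layout;
        return {old_layout, new_layout};
    }

private:
    std::unordered_map<ImageId, Layout> last_known_;
};
```

With a tracker like this, only the very first transition of an image starts from the undefined layout; every later barrier uses the last known layout as its old layout.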
Mali GPUs employ a technology called transaction elimination, which avoids framebuffer write bandwidth for static regions of the framebuffer. This is especially beneficial for games that contain many static opaque overlays.
Transaction elimination is used for an image under the following conditions:
The sample count is 1.
The mipmap level is 1.
The image uses
The image does not use
A single color attachment is being used. This condition does not apply to the Mali G51 GPU or later.
The effective tile size is 16x16 pixels. Pixel data storage determines the effective tile size.
The driver keeps a signature buffer for the image to check for redundant frame buffer writes. The signature buffer must always be in sync with the actual contents of the image, which is the case when an image is only used within the tile write path. In practice, this corresponds to only using layouts that are either read-only or can only be written to by fragment shading. These "safe" layouts are:
All other layouts, including the UNDEFINED layout, are considered "unsafe" as they allow writes to an image outside the tile write path.
When an image is transitioned via an "unsafe" layout, the signature buffer must be invalidated to prevent the signature and the data from becoming desynchronized.
Note that the swapchain image is a slightly special case, as it is considered "safe" even when transitioned from UNDEFINED.
In addition, signature invalidation could happen as part of vkCmdWaitEvents(), or as part of a VkRenderPass if the color attachment reference layout is different from the final layout.
The vkCmdBlitImage() framebuffer transfer stage operation will also always invalidate the signature buffer, so shader-based blits will likely be more efficient.
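The interaction between the signature buffer and invalidation described above can be modeled in a few lines. This is only an illustrative sketch: the hardware's real signature algorithm is not described here, so an FNV-1a hash is used as a stand-in, and `TiledTarget`, `store_tile`, and `invalidate_signatures` are hypothetical names rather than driver or Vulkan API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a hash, used as a stand-in for the hardware tile signature
// (assumption: the real signature algorithm differs).
inline std::uint64_t tile_signature(const std::vector<std::uint8_t>& tile) {
    std::uint64_t h = 14695981039346656037ull;
    for (std::uint8_t byte : tile) {
        h ^= byte;
        h *= 1099511628211ull;
    }
    return h;
}

// Hypothetical model of one render target and its signature buffer.
class TiledTarget {
public:
    explicit TiledTarget(std::size_t tile_count)
        : signatures_(tile_count, 0), valid_(tile_count, false) {}

    // Called when a tile finishes rendering. Returns true if the tile was
    // written to memory, false if the write was eliminated because the
    // stored signature is valid and matches the new tile contents.
    bool store_tile(std::size_t index, const std::vector<std::uint8_t>& tile) {
        const std::uint64_t sig = tile_signature(tile);
        if (valid_[index] && signatures_[index] == sig) {
            ++tiles_killed_;
            return false;
        }
        signatures_[index] = sig;
        valid_[index] = true;
        ++tiles_written_;
        return true;
    }

    // Models a transition via an "unsafe" layout: the image may have been
    // written outside the tile write path, so no signature can be trusted.
    void invalidate_signatures() { valid_.assign(valid_.size(), false); }

    std::size_t tiles_written() const { return tiles_written_; }
    std::size_t tiles_killed() const { return tiles_killed_; }

private:
    std::vector<std::uint64_t> signatures_;
    std::vector<bool> valid_;
    std::size_t tiles_written_ = 0;
    std::size_t tiles_killed_ = 0;
};
```

In this model, storing the same tile twice writes it once and eliminates the second write, while calling `invalidate_signatures()` between the two stores forces both writes to reach memory, which is the extra bandwidth that "unsafe" transitions cost.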
The sample sets up deferred rendering using two render passes, to show the effect of transitioning G-buffer images from UNDEFINED rather than their last known layout.
Note that a deferred rendering implementation using subpasses might be more efficient overall; see the subpasses tutorial for more detail.
The base case is with all color images being transitioned from UNDEFINED, as shown in the image below.
When we switch to using the last known layout as oldLayout in the pipeline barriers, transaction elimination can take place.
This is highlighted in the counters, which show about double the number of tiles killed by CRC match, along with a ~10% reduction in write bandwidth.
A reduction in memory bandwidth will reduce the power consumption of the device, resulting in less overheating and longer battery life. Additionally, this may improve performance on games that are bandwidth limited.
Use the COLOR_ATTACHMENT_OPTIMAL image layout for color attachments.
Keep an image in a "safe" image layout to avoid unnecessary signature invalidation, including avoiding unnecessary transitions via "unsafe" layouts.
Use storeOp = DONT_CARE rather than UNDEFINED layouts to skip unneeded render target writes.
Do not transition color attachments from "safe" to "unsafe" layouts unless required by the algorithm.
Do not use vkCmdBlitImage() to copy constant data between two images; shader-based blits are likely to be more efficient as they will preserve the signature integrity.
Loss of transaction elimination will increase external memory bandwidth for scenes with static regions across frames. This may reduce performance on systems which are memory bandwidth limited, as well as cause a general increase in power consumption.
The GPU performance counters can count the number of tile writes killed by transaction elimination, so you can determine if it is being triggered at all.