Untitled :: Vulkan Documentation Project

Performance samples

The goal of these samples is to demonstrate how to use certain features and functions to achieve optimal performance. To visualize this, they also include real-time profiling information.

AFBC

AFBC (Arm Frame Buffer Compression) is a real-time lossless compression algorithm found in Arm Mali GPUs, designed to tackle the ever-growing demand for higher-resolution graphics. It is applied to framebuffers as the GPU writes them to memory, and can reduce bandwidth by up to 50%.

Command buffer usage

This sample demonstrates how to use and manage secondary command buffers, and how to record them concurrently. Implementing multi-threaded recording of draw calls can help reduce CPU frame time.

Constant data

The Vulkan API exposes several ways to send uniform data to shaders. With so many options, the natural question is "Which one is fastest?", and more often than not the answer is "It depends". The main issue for developers is that the fastest method may differ between vendors, so there is often no "one size fits all" solution. This sample aims to highlight this issue and help move the Vulkan ecosystem to a point where we are better equipped to solve it for developers. It does so by providing an interactive way to toggle between the different constant data methods that the Vulkan API exposes. The sample can then be run on a platform of the developer's choice to see the performance implications of each method.

Descriptor management

An application using Vulkan has to implement a system to manage descriptor pools and sets. The most straightforward and flexible approach is to re-create them every frame, but doing so can be very inefficient, especially on mobile platforms. The problem of descriptor management is intertwined with that of buffer management, that is, choosing how to pack data into VkBuffer objects. This sample explores a few options to improve both descriptor and buffer management.

HPP Pipeline cache

A transcoded version of the Performance sample Pipeline cache that illustrates the usage of the C++ bindings of Vulkan provided by vulkan.hpp.

HPP Swapchain images

A transcoded version of the Performance sample Swapchain images that illustrates the usage of the C++ bindings of Vulkan provided by vulkan.hpp.

HPP Texture compression comparison

A transcoded version of the Performance sample Texture compression comparison that illustrates the usage of the C++ bindings of Vulkan provided by vulkan.hpp.

Image compression control

This sample shows how to use the extensions VK_EXT_image_compression_control and VK_EXT_image_compression_control_swapchain to select between different levels of image compression. The UI shows the impact compression has on image size and bandwidth, illustrating the benefits of fixed-rate (visually lossless) compression.

Layout transitions

Vulkan requires the application to manage image layouts, so that all render pass attachments are in the correct layout when the render pass begins. This is usually done using pipeline barriers or the initialLayout and finalLayout parameters of the render pass. If the rendering pipeline is complex, transitioning each image to its correct layout is not trivial, as it requires some sort of state tracking. If the previous image contents are not needed, there is an easy way out: setting oldLayout/initialLayout to VK_IMAGE_LAYOUT_UNDEFINED. While this is functionally correct, it can have performance implications, as it may prevent the GPU from performing some optimizations. This sample covers an example of such optimizations and how to avoid the performance overhead of using sub-optimal layouts.

MSAA

Aliasing is the result of under-sampling a signal. In graphics, this means computing pixel colors at a sampling rate too low for the detail being rendered, which produces artifacts, most commonly "jaggies" along model edges. Multisample anti-aliasing (MSAA) is an efficient technique that reduces this pixel sampling error.

Multi-threaded recording with multiple render passes

Ideally you render all stages of your frame in a single render pass. However, in some cases different stages can’t be performed in the same render pass. This sample shows how multi-threading can help to boost performance when using multiple render passes to render a single frame.

Pipeline barriers

Vulkan gives the application significant control over memory access for resources. Pipeline barriers are particularly convenient for synchronizing memory accesses between render passes. Barriers are required whenever there is a memory dependency; the application should not assume that render passes are executed in order. However, too many or overly strict barriers can hurt the application’s performance. This sample covers how to set up pipeline barriers efficiently, with a focus on pipeline stages.

Pipeline cache

Vulkan gives applications the ability to save the internal representation of a pipeline (graphics or compute) so that the same pipeline can be recreated later. This sample looks in detail at the implementation and performance implications of pipeline creation, caching and management.

Render passes

Vulkan render passes use attachments to describe input and output render targets. This sample shows how loading and storing attachments can affect performance on mobile. When creating a render pass, you can specify various color attachments and a depth-stencil attachment. Each of these is described by a VkAttachmentDescription struct, which specifies the load operation (loadOp) and the store operation (storeOp). This sample lets you choose between different combinations of these operations at runtime.

Specialization constants

Vulkan exposes a number of methods for setting values within shader code at run time, including UBOs and specialization constants. This sample compares these two methods and their performance impact.

Sub passes

Vulkan introduces the concept of subpasses to subdivide a single render pass into separate logical phases. The benefit of using subpasses over multiple render passes is that the GPU is able to perform various optimizations. Tile-based renderers, for example, can take advantage of tile memory, which, being on-chip, is significantly faster than external memory, potentially saving a considerable amount of bandwidth.

Surface rotation

Mobile devices can be rotated, so the logical orientation of the application window and the physical orientation of the display may not match. Applications then need to be able to operate in two modes: portrait and landscape. The difference between these two modes can be simplified to just a change in resolution. However, some display subsystems always work in the "native" (or "physical") orientation of the display panel. Since the device has been rotated, the application output must also be rotated to achieve the desired effect. In this sample we focus on the rotation step and analyze the performance implications of implementing it correctly with Vulkan.
