VK_EXT_descriptor_buffer

Table of Contents

1. Problem Statement
2. Solution Space
3. Proposal
4. Mapping to DirectX® 12 Descriptor Heaps
5. Porting existing Vulkan applications
6. Example
7. Issues

This document outlines a proposal to make the management of descriptor memory more explicit by allowing descriptors to be placed in buffer memory, so that their data and memory can be managed alongside other buffer objects.

1. Problem Statement

With more “bindless” models of descriptor management, applications are ever increasing the number of descriptors that end up in descriptor sets. Managing allocations this large, and ensuring they end up in device local memory for fast access, is becoming an increasingly awkward problem to manage in the driver. Developers moving to Vulkan are starting to hit bottlenecks that they simply don’t encounter on other platforms.

In other scenarios, making sure descriptors do not end up in device memory is important. Copying descriptors in Vulkan is considered rather esoteric, but it is a fairly common strategy in other APIs and implementing a similar style in Vulkan can lead to problems. There is no hint to let an implementation know that a descriptor set will only be used as a source for copies (i.e. as a staging buffer). If a descriptor set is mapped to device local memory (BAR) or uncached memory, reading from the descriptor set on the host can have a catastrophic effect on performance. On top of this, some applications rely on being able to copy tens of thousands of individual descriptors every frame. The overhead of setting up this many calls to vkUpdateDescriptorSets is not ideal.

In contrast to this, developers are managing uploads for other large resources (e.g. images, buffers) in application code and generally doing a good job of it – typically this is not identified as a problem area. Developers approaching Vulkan are often confused by the way in which descriptor pools work - and several have made requests to manage things more explicitly. The key things that we’ve had requests for are (relevant Vulkan issues in brackets):

Explicit allocation management
Better mapping to DirectX 12
Host-only descriptor pools
GPU descriptor updates

2. Solution Space

There are several more-or-less invasive options that could work here:

1. Add relevant flags and other information to descriptor pools
2. Like 1, but enable memory binding for descriptor pools
3. Bypass descriptor pools, and allow direct creation and memory binding for descriptor sets
4. Bypass descriptor sets, and use descriptor set layouts in buffers
5. Bypass descriptor set layouts, and use blobs of memory in buffers that shaders access with explicit layouts

VK_VALVE_mutable_descriptor_type includes support for option 1, through the use of VK_DESCRIPTOR_SET_LAYOUT_CREATE_HOST_ONLY_POOL_BIT_VALVE and VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE. However, this does not fully solve the problem of memory management since we can only avoid allocating device memory for descriptors. Being able to control where shader-accessible descriptors are allocated is still unavailable to applications.

Option 2 attempts to redefine what a descriptor pool is, and it would seem like a very awkward abstraction. The whole point of the descriptor pool is to allocate and manage memory on the behalf of the application.

Options 3 and 4 are similarly invasive, but move descriptor pools out of the way, making things a lot clearer. The major downside to this is that it potentially blocks out older implementations; however this is likely the same set of implementations that wouldn’t see a benefit from this proposal anyway (i.e. “non-bindless” hardware).

Option 4 has the advantage of having a smaller surface area than option 3 and allows applications to use existing buffer management functions in both Vulkan and in their own code. Being able to use buffers directly means that applications are in control of where the memory is allocated and can control if memory is:

Host-only (plain malloc)
Host-only but shader-visible (VkDeviceMemory with HOST_VISIBLE_BIT)
Device local and shader-visible (resizable BAR on discrete GPUs, unified memory on integrated)
Device local only (GPU copies descriptors)

Option 5 is more invasive than Option 4 and requires shader-side changes.

In order to keep the required changes in this extension to the API only, the extra steps in Option 5 are deferred to a future planned extension, and this proposal focuses on Option 4.

3. Proposal

3.1. Modeling a descriptor set as memory

Descriptors in Vulkan as it stands are generally considered quite abstract. They do not have a size, and when creating descriptor pools it is only specified how many descriptors can be allocated.

This abstraction is removed by the proposal and it assumes that a VkDescriptorSetLayout can be expressed as a list of binding points with a known:

Byte offset
Element size
Number of elements tightly packed

The element size depends on the descriptor type and is a property of the physical device.

Implementations are free to control the byte offset, and so can freely repack descriptors for optimal memory access. For exact control over byte offsets for different descriptors, descriptor indexing should be used, since arrays have guaranteed packing.

If we think in terms of VkDescriptorPool with this model, an implementation of that could be something like an arena allocator where size is derived from the descriptor counts, and a VkDescriptorSet with VkDescriptorSetLayout just allocates a certain number of bytes from the pool. This is essentially the same model as VkBuffer and VkImage allocation.
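The arena model described above can be sketched in plain C. This only illustrates the mental model and is not how any particular driver implements VkDescriptorPool; the DescriptorArena type, the arena_alloc function, and the ARENA_OUT_OF_MEMORY sentinel are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a descriptor pool: a fixed-size arena. */
typedef struct DescriptorArena {
    uint64_t capacity; /* total bytes, derived from the descriptor counts */
    uint64_t used;     /* bytes handed out so far */
} DescriptorArena;

/* Sentinel returned when the arena cannot satisfy a request. */
#define ARENA_OUT_OF_MEMORY UINT64_MAX

/* Allocate `size` bytes aligned to `alignment` (assumed power of two).
 * Returns the byte offset of the allocation inside the arena. */
static uint64_t arena_alloc(DescriptorArena *arena, uint64_t size,
                            uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Round the current cursor up to the requested alignment. */
    uint64_t start = (arena->used + alignment - 1) & ~(alignment - 1);
    if (start > arena->capacity || size > arena->capacity - start)
        return ARENA_OUT_OF_MEMORY;

    arena->used = start + size;
    return start;
}
```

In this picture, allocating a descriptor set amounts to calling arena_alloc with the byte size of its layout, which is the same pattern applications already use when sub-allocating buffers.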

When we call vkCmdBindDescriptorSets, what we are really doing is binding a buffer of a certain size. The shader compiler looks at VkPipelineLayout and based on the DescriptorSet and Binding decorations, it can look up that a descriptor can be read from the bound descriptor set at a specific offset.

As VK_EXT_descriptor_indexing is required, its descriptor limits apply.

3.1.1. Next level update-after-bind

With descriptors being modeled as buffer memory, we remove all pretense of the implementation being able to consume descriptors when recording the command buffer. In the Vulkan 1.0 descriptor model, descriptors must be valid when descriptor sets are bound and remain valid, which means implementations are free to consume the descriptors, repack them, and so on if they desire. With descriptor indexing, the UPDATE_AFTER_BIND_BIT and PARTIALLY_BOUND_BIT flags imply a buffer-like model where descriptors must not be consumed unless dynamically used by shaders. With descriptor buffers, this model is implied, and a descriptor set layout must not be specified as both update-after-bind and descriptor buffer capable.

As descriptors can be updated in the GPU timeline, descriptor buffers go a bit further than update-after-bind. In the existing update-after-bind model, descriptors can only be consumed correctly if they were written before queue submits.

3.1.2. Dropping support for abstract descriptor types

Some descriptor types are a bit more abstract in nature. Dynamic uniform buffers and dynamic storage buffers for example have a component to them that does not consume descriptor memory, but function more like push constants. Descriptor types which cannot be expressed in terms of descriptors in memory are not supported with descriptor buffers, but rapidly changing descriptors can be replaced with existing alternatives such as:

Push constants
Place buffer device address in push constants
Push descriptors

Update-after-bind has similar restrictions already.
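As a rough illustration of the second alternative, a per-draw offset that would previously have been a dynamic offset can be folded into a buffer device address and passed through push constants. The DrawPushConstants struct and make_draw_push_constants helper are hypothetical names; the base address is assumed to come from vkGetBufferDeviceAddress, and the shader is assumed to read its data through that address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical push constant block: carries a raw buffer device address
 * instead of relying on a dynamic uniform buffer descriptor. */
typedef struct DrawPushConstants {
    uint64_t perDrawDataAddress;
} DrawPushConstants;

/* Fold what used to be a dynamic offset into the address itself.
 * bufferBaseAddress is assumed to be a value obtained from
 * vkGetBufferDeviceAddress for a buffer of bufferSize bytes. */
static DrawPushConstants make_draw_push_constants(uint64_t bufferBaseAddress,
                                                  uint64_t bufferSize,
                                                  uint64_t dynamicOffset)
{
    assert(dynamicOffset < bufferSize); /* offset must stay inside the buffer */

    DrawPushConstants pc = { bufferBaseAddress + dynamicOffset };
    return pc;
}
```

The resulting struct would then be recorded with vkCmdPushConstants before each draw, so no descriptor memory has to change between draws.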

3.1.3. One buffer, many offsets

While binding descriptor sets as memory is possible on a wide range of hardware, descriptors are still considered "special" memory by many implementations, and it may not be possible to bind many different buffers at the same time. Some possible restrictions can be:

Limited address space for descriptors
Descriptor sets are accessed with offset from one or more base pointers

In Vulkan, applications are guaranteed at least 4 descriptor sets, but many implementations go beyond this. At the same time, it might not be possible to bind that many different descriptor buffers.

In D3D12 for example, this problem manifests itself as ID3D12GraphicsCommandList::SetDescriptorHeaps().

Similarly, this extension will work on a model where applications allocate large descriptor buffers, and bind those buffers to the command buffer. From there, descriptor sets are expressed as offsets into the bound buffers.

It is expected that changing a descriptor buffer binding is a fairly heavy operation on some implementations and should be avoided. Changing offsets however, is very efficient.

A limited address space can be expressed with special memory types that allocate from a dedicated address space region.

3.1.4. No mixing and matching descriptor buffers and older model

The implication of descriptor buffers is that applications will now take more control over which descriptor buffers are bound to a command buffer. Without descriptor buffers, this is something implementations were able to hide from applications, so it is not possible to mix and match these models in one draw or dispatch. It is possible to mix and match the two models in different draws or dispatches, but it is equivalent to changing the descriptor buffer bindings and should be avoided if possible.

In terms of state invalidation, whenever a descriptor buffer offset is bound, it invalidates all bindings for descriptor sets and vice versa.

3.2. Putting Descriptors in Memory

This extension introduces new commands to put shader-accessible descriptors directly in memory. Properties of descriptor set layouts may vary based on enabled device features, so new device-level functions are added to query the properties of layouts. These calls are invariant across the lifetime of the device, and between VkDevice objects created from the same physical device(s), with the same creation parameters.

void vkGetDescriptorSetLayoutSizeEXT(
    VkDevice                                    device,
    VkDescriptorSetLayout                       layout,
    VkDeviceSize*                               pLayoutSizeInBytes);

void vkGetDescriptorSetLayoutBindingOffsetEXT(
    VkDevice                                    device,
    VkDescriptorSetLayout                       layout,
    uint32_t                                    binding,
    VkDeviceSize*                               pOffset);

Applications are responsible for writing data into memory, but the application does not control the memory location directly – descriptor set layouts dictate where each descriptor lives, so that the shader interface continues to work as-is with set and binding numbers.

The size and offset of descriptors are exposed to applications, so they know how to copy them into memory. This is important since applications are free to copy descriptors on the device itself.

The sizes for different descriptor types are defined in the properties: samplerDescriptorSize, combinedImageSamplerDescriptorSize, sampledImageDescriptorSize, storageImageDescriptorSize, uniformTexelBufferDescriptorSize, robustUniformTexelBufferDescriptorSize, storageTexelBufferDescriptorSize, robustStorageTexelBufferDescriptorSize, uniformBufferDescriptorSize, robustUniformBufferDescriptorSize, storageBufferDescriptorSize, robustStorageBufferDescriptorSize, inputAttachmentDescriptorSize, accelerationStructureDescriptorSize, combinedImageSamplerDensityMapDescriptorSize.

Descriptor arrays have guaranteed packing, such that each element of an array for a given binding has an offset from that binding’s base offset equal to the size of the descriptor multiplied by the array offset. Bindings can be moved around as the driver sees fit, but variable-sized descriptor arrays must be packed at the end.

For use cases where layouts contain a variable-sized descriptor count, the size returned reflects the upper bound described in the descriptor set layout. The size actually required for a descriptor set layout with a variable-sized descriptor array can be obtained by taking the offset of the variable-sized binding and adding the product of the number of descriptors that are actually used and the size of the descriptor.
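The packing rules above reduce to two small pieces of arithmetic, sketched here in plain C. The helper names are hypothetical; bindingOffset is assumed to come from vkGetDescriptorSetLayoutBindingOffsetEXT and descriptorSize from the matching descriptor size property. The second helper assumes that the used portion of a variable-sized array is added to the offset of that binding, which is packed last.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of element `arrayIndex` of an arrayed binding, relative to
 * the start of the descriptor set. Arrays are tightly packed. */
static uint64_t array_element_offset(uint64_t bindingOffset,
                                     uint64_t descriptorSize,
                                     uint32_t arrayIndex)
{
    assert(descriptorSize > 0);
    return bindingOffset + descriptorSize * (uint64_t)arrayIndex;
}

/* Bytes needed for a set whose last binding is a variable-sized array of
 * which only `usedCount` descriptors are actually used (assumption: the
 * variable-sized binding sits at the end of the layout). */
static uint64_t variable_layout_size(uint64_t variableBindingOffset,
                                     uint64_t descriptorSize,
                                     uint32_t usedCount)
{
    assert(descriptorSize > 0);
    return variableBindingOffset + descriptorSize * (uint64_t)usedCount;
}
```

An application could use the second helper to allocate less memory than the upper bound reported by vkGetDescriptorSetLayoutSizeEXT when it knows how many descriptors it will populate.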

Descriptor set layouts used for this purpose must be created with a new create flag:

VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT = 0x00000010

Layouts created with this flag must not be used to create a VkDescriptorSet and must not include dynamic uniform buffers or dynamic storage buffers. Applications can achieve the same dynamic offsetting by either updating a descriptor buffer, using push constants, or by using push descriptors. The blob of memory corresponding to a descriptor is obtained from resource views directly. How applications get that data into device memory is entirely up to them, but the offset must match that obtained from the layout.

typedef struct VkDescriptorAddressInfoEXT {
    VkStructureType                                 sType;
    const void*                                     pNext;
    VkDeviceAddress                                 address;
    VkDeviceSize                                    range;
    VkFormat                                        format;
} VkDescriptorAddressInfoEXT;

typedef union VkDescriptorDataEXT {
    const VkSampler*                                pSampler;
    const VkDescriptorImageInfo*                    pCombinedImageSampler;
    const VkDescriptorImageInfo*                    pInputAttachmentImage;
    const VkDescriptorImageInfo*                    pSampledImage;
    const VkDescriptorImageInfo*                    pStorageImage;
    const VkDescriptorAddressInfoEXT*               pUniformTexelBuffer;
    const VkDescriptorAddressInfoEXT*               pStorageTexelBuffer;
    const VkDescriptorAddressInfoEXT*               pUniformBuffer;
    const VkDescriptorAddressInfoEXT*               pStorageBuffer;
    VkDeviceAddress                                 accelerationStructure;
} VkDescriptorDataEXT;

typedef struct VkDescriptorGetInfoEXT {
    VkStructureType                                 sType;
    const void*                                     pNext;
    VkDescriptorType                                type;
    VkDescriptorDataEXT                             data;
} VkDescriptorGetInfoEXT;

void vkGetDescriptorEXT(
    VkDevice                                        device,
    const VkDescriptorGetInfoEXT*                   pCreateInfo,
    size_t                                          dataSize,
    void*                                           pDescriptor);

These APIs extract raw descriptor blob data from objects. The data obtained from these calls can be freely copied around. Note that these calls do not know anything about descriptor set layouts. It is the application’s responsibility to write descriptors to a suitable location.
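Since placement is left to the application, writing a descriptor comes down to a bounds-checked memcpy into mapped buffer memory. The following is a minimal sketch in plain C; place_descriptor and its parameters are hypothetical, the blob is assumed to have been filled by vkGetDescriptorEXT, and the offsets are assumed to come from the layout queries described earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy one descriptor blob into host-mapped descriptor buffer memory.
 *   mapped         - start of the mapped descriptor buffer
 *   mappedSize     - size of the mapping in bytes
 *   setOffset      - where this descriptor set starts in the buffer
 *   bindingOffset  - offset of the binding within the set layout
 *   descriptorSize - size of one descriptor of this type
 *   arrayIndex     - element within the binding (0 if not arrayed)
 *   blob           - descriptor data, descriptorSize bytes long */
static void place_descriptor(uint8_t *mapped, size_t mappedSize,
                             size_t setOffset, size_t bindingOffset,
                             size_t descriptorSize, uint32_t arrayIndex,
                             const void *blob)
{
    size_t dst = setOffset + bindingOffset +
                 descriptorSize * (size_t)arrayIndex;

    assert(dst + descriptorSize <= mappedSize); /* stay inside the mapping */
    memcpy(mapped + dst, blob, descriptorSize);
}
```

The same address arithmetic applies when the copy is performed on the device instead of the host.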

A notable change here is that there is no longer any need for VkBufferView objects. Texel buffers are built from buffer device addresses and format instead. This improvement is motivated by DX12 portability. In some use cases, texel buffers are linearly allocated and having to create and manage a large number of unique view objects is problematic. With descriptor buffers, this style of API is now feasible in Vulkan.

A similar improvement is that uniform buffers and storage buffers also take buffer device addresses.

Acceleration structure descriptors are also built from device addresses, or handles retrieved from vkGetAccelerationStructureHandleNV when using VkAccelerationStructureNV objects.

Inline uniform buffers do not have a descriptor data getter API associated with them. Instead, the descriptor data is copied directly into the buffer offset obtained by vkGetDescriptorSetLayoutBindingOffsetEXT. As the name suggests, inline uniform buffers are embedded into the descriptor set itself.

As descriptors are now in regular memory, drivers cannot hide copies of immutable samplers that end up in descriptor sets from the application. As such, applications are required to provide these samplers as if they were not provided immutably. These samplers must have identical parameters to the immutable samplers in the descriptor set layout. Alternatively, applications can use dedicated descriptor sets for immutable samplers that do not require app-managed memory, by embedding them in a special descriptor set.

If the descriptorBufferImageLayoutIgnored feature is enabled, the imageLayout in VkDescriptorImageInfo is ignored, otherwise it specifies the layout that the descriptor will be used with. type must not be VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC or VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC. 'format' in VkDescriptorAddressInfoEXT is ignored for non-texel buffers.

The combinedImageSamplerDescriptorSingleArray property indicates that the implementation does not require an array of VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptors to be written into a descriptor buffer as an array of image descriptors, immediately followed by an array of sampler descriptors. If VK_FALSE, applications are expected to write the first sampledImageDescriptorSize bytes of the data returned through pDescriptor to the first array, and the remaining samplerDescriptorSize bytes of the data to the second array. On these implementations, variable descriptor counts of combined image samplers may be supported, but it is not useful as the descriptor set size must assume the upper bound.
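For implementations where combinedImageSamplerDescriptorSingleArray is VK_FALSE, the split write described above might look like the following sketch in plain C. The function name and parameters are hypothetical. It assumes the sampler array starts immediately after an image array sized for the full declared array count, and that the blob holds the image part followed by the sampler part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write element `index` of a combined image sampler array as two arrays:
 * image descriptors first, sampler descriptors immediately after.
 *   bindingBase      - start of the binding in mapped memory
 *   arrayCount       - declared number of elements in the binding
 *   descriptor       - blob holding image bytes then sampler bytes
 *   sampledImageSize - assumed sampledImageDescriptorSize
 *   samplerSize      - assumed samplerDescriptorSize */
static void write_combined_image_sampler_split(uint8_t *bindingBase,
                                               uint32_t arrayCount,
                                               uint32_t index,
                                               const uint8_t *descriptor,
                                               size_t sampledImageSize,
                                               size_t samplerSize)
{
    assert(index < arrayCount);

    uint8_t *imageArray = bindingBase;
    uint8_t *samplerArray = bindingBase + (size_t)arrayCount * sampledImageSize;

    /* First part of the blob goes to the image array... */
    memcpy(imageArray + (size_t)index * sampledImageSize,
           descriptor, sampledImageSize);
    /* ...and the remaining bytes go to the sampler array. */
    memcpy(samplerArray + (size_t)index * samplerSize,
           descriptor + sampledImageSize, samplerSize);
}
```

Because the start of the sampler array depends on the declared array count, this also shows why a variable descriptor count brings no size savings on such implementations.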

3.2.1. Embedded Immutable Samplers

Immutable samplers can be embedded into descriptor layouts, allowing them to be bound without disturbing descriptor buffer bindings or requiring device memory backing. Descriptor set layouts must be created with a new flag for this purpose:

VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT = 0x00000020

When this flag is used, this set layout can only contain descriptor bindings with a descriptorType of VK_DESCRIPTOR_TYPE_SAMPLER, a descriptorCount of 1 (i.e. not arrayed), and a valid VkSampler used in pImmutableSamplers. Note that arrays of immutable samplers are not supported, as implementations typically need these in memory to allow dynamic indexing - whereas no device memory is directly associated with these sets.

3.3. Pipeline creation

To use pipelines with descriptor buffers a new VkPipelineCreateFlag must be used:

VK_PIPELINE_CREATE_DESCRIPTOR_BUFFER_BIT_EXT = 0x20000000

3.4. Descriptor Binding

Descriptor buffers are bound to the command buffer directly (similar to vertex buffers).

typedef struct VkDescriptorBufferBindingPushDescriptorBufferHandleEXT {
    VkStructureType                             sType;
    const void*                                 pNext;
    VkBuffer                                    buffer;
} VkDescriptorBufferBindingPushDescriptorBufferHandleEXT;

typedef struct VkDescriptorBufferBindingInfoEXT {
    VkStructureType                             sType;
    const void*                                 pNext;
    VkDeviceAddress                             address;
    VkBufferUsageFlags                          usage;
} VkDescriptorBufferBindingInfoEXT;

void vkCmdBindDescriptorBuffersEXT(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    bufferCount,
    const VkDescriptorBufferBindingInfoEXT*     pBindingInfos);

Unlike binding descriptor sets, there’s no invalidating going on with this binding – a buffer remains bound and is interpreted by a pipeline in the manner the pipeline expects, irrespective of what layout was used to construct the buffer for each set.

There must be no more than maxSamplerDescriptorBufferBindings descriptor buffers containing sampler descriptor data bound. Such buffers must be created with VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT.

There must be no more than maxResourceDescriptorBufferBindings descriptor buffers containing resource descriptors bound. Such buffers must be created with VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT.

If a buffer contains both usage flags, it counts once against both limits.

If the bufferlessPushDescriptors property is VK_FALSE and a buffer contains the VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT usage flag, a VkDescriptorBufferBindingPushDescriptorBufferHandleEXT structure must be added to the pNext chain of VkDescriptorBufferBindingInfoEXT.

bufferCount must be less than or equal to maxDescriptorBufferBindings.

Any previously bound buffers at binding points greater than or equal to bufferCount are unbound.

Each entry in pBindingInfos contains the device address of a descriptor buffer and the usage flags that the buffer was created with.

Changing buffers may be an expensive operation and should be done infrequently (if ever).

The maximum available range of each binding to a shader is maxSamplerDescriptorBufferRange and/or maxResourceDescriptorBufferRange.

The samplerDescriptorBufferAddressSpaceSize, resourceDescriptorBufferAddressSpaceSize, and descriptorBufferAddressSpaceSize properties give the upper bound for the total amount of address space used for descriptor buffers.

Buffers used for this purpose need to be created with new usage flags:

VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT  = 0x00200000
VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT = 0x00400000

VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT specifies that the buffer will be used to contain sampler descriptors when bound as a descriptor buffer. VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT specifies that the buffer will be used to contain resource descriptors, i.e. non-sampler descriptors, when bound as a descriptor buffer. Buffers containing VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptors must have been created with both VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT and VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT.
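The binding limits described in this section can be checked before recording a bind. The sketch below is plain C with hypothetical names (BindingLimits, bindings_within_limits, and the shortened flag macros); the flag values are the ones listed above, and the limits are assumed to be read from the device properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Usage flag values as listed in this proposal (names shortened here). */
#define SAMPLER_DESCRIPTOR_BUFFER_BIT  0x00200000u
#define RESOURCE_DESCRIPTOR_BUFFER_BIT 0x00400000u

/* Hypothetical mirror of the three binding-count limits. */
typedef struct BindingLimits {
    uint32_t maxDescriptorBufferBindings;
    uint32_t maxSamplerDescriptorBufferBindings;
    uint32_t maxResourceDescriptorBufferBindings;
} BindingLimits;

/* Returns true if binding buffers with the given usage flags respects all
 * three limits. A buffer with both flags counts once against each limit. */
static bool bindings_within_limits(const uint32_t *usages,
                                   uint32_t bufferCount,
                                   const BindingLimits *limits)
{
    assert(usages != NULL || bufferCount == 0);

    if (bufferCount > limits->maxDescriptorBufferBindings)
        return false;

    uint32_t samplerCount = 0, resourceCount = 0;
    for (uint32_t i = 0; i < bufferCount; ++i) {
        if (usages[i] & SAMPLER_DESCRIPTOR_BUFFER_BIT)
            ++samplerCount;
        if (usages[i] & RESOURCE_DESCRIPTOR_BUFFER_BIT)
            ++resourceCount;
    }
    return samplerCount <= limits->maxSamplerDescriptorBufferBindings &&
           resourceCount <= limits->maxResourceDescriptorBufferBindings;
}
```

A check like this is most useful in debug builds, since the validity requirement itself is enforced by the API rules rather than by the application.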

Each descriptor set is associated with a buffer and an offset into that buffer which can be set by:

void vkCmdSetDescriptorBufferOffsetsEXT(
    VkCommandBuffer                             commandBuffer,
    VkPipelineBindPoint                         pipelineBindPoint,
    VkPipelineLayout                            layout,
    uint32_t                                    firstSet,
    uint32_t                                    setCount,
    const uint32_t*                             pBufferIndices,
    const VkDeviceSize*                         pOffsets);

vkCmdSetDescriptorBufferOffsetsEXT causes the sets numbered [firstSet .. firstSet+setCount-1] to use the bindings stored in the bound descriptor buffers: set firstSet+i uses the buffer bound at index pBufferIndices[i], at an offset of pOffsets[i], for subsequent bound pipeline commands set by pipelineBindPoint. Any bindings that were previously applied via these sets, or calls to vkCmdBindDescriptorSets, are no longer valid. Calling vkCmdBindDescriptorSets invalidates bindings previously applied via vkCmdSetDescriptorBufferOffsetsEXT.

Setting offsets should be a cheap operation and can be performed frequently. The offsets must be aligned to descriptorBufferOffsetAlignment.
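When several descriptor sets are sub-allocated from one large descriptor buffer, each set offset has to be rounded up to the required alignment. The following plain C sketch shows one way to compute such offsets; pack_set_offsets is a hypothetical helper, the set sizes are assumed to come from vkGetDescriptorSetLayoutSizeEXT, and the alignment from descriptorBufferOffsetAlignment. It uses division rather than bit masking so that it does not assume a power-of-two alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Round `value` up to the next multiple of `alignment`. */
static uint64_t align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Lay out `setCount` descriptor sets back to back in one buffer.
 * Writes the aligned start offset of each set to outOffsets and returns
 * the number of bytes consumed up to the end of the last set. */
static uint64_t pack_set_offsets(const uint64_t *setSizes, uint32_t setCount,
                                 uint64_t alignment, uint64_t *outOffsets)
{
    uint64_t cursor = 0;
    for (uint32_t i = 0; i < setCount; ++i) {
        cursor = align_up(cursor, alignment);
        outOffsets[i] = cursor;   /* candidate value for pOffsets[i] */
        cursor += setSizes[i];
    }
    return cursor;
}
```

The computed offsets can be stored once at load time and reused for every call that sets descriptor buffer offsets, which keeps the per-draw cost low.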

Embedded Immutable Samplers can be bound using:

void vkCmdBindDescriptorBufferEmbeddedSamplersEXT(
    VkCommandBuffer                             commandBuffer,
    VkPipelineBindPoint                         pipelineBindPoint,
    VkPipelineLayout                            layout,
    uint32_t                                    set);

vkCmdBindDescriptorBufferEmbeddedSamplersEXT binds the embedded immutable samplers in layout at set index set to the same set in the command buffer. Set bindings are invalidated in the same manner as they are for vkCmdSetDescriptorBufferOffsetsEXT. The VkDescriptorSetLayout at index set of layout must have been created with the VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT bit. There must be no more than maxEmbeddedImmutableSamplerBindings embedded immutable sampler sets bound. Like DX12, there is a limit to how many unique embedded immutable samplers may be alive in a device at any one point; this limit is designed to match DX12.

3.5. Descriptor Updates

As descriptors are just a blob of memory, descriptor updates can be performed by any operation on either the host or device that can access memory, enabling a form of GPU descriptor update. Descriptor buffer reads can be synchronized using a new access bit in the relevant shader stage:

VK_ACCESS_2_DESCRIPTOR_BUFFER_READ_BIT_EXT = 0x20000000000ULL

Note that host writes are implicitly made visible to all stages in vkQueueSubmit, so this access flag is only relevant when performing GPU-side updates of descriptors.

If the allowSamplerImageViewPostSubmitCreation property is VK_FALSE there are special requirements for when descriptor data for VkSampler or VkImageView objects can be used. Those objects must have been created before any vkQueueSubmit (or vkQueueSubmit2) call that executes a command buffer which accesses descriptor data for them.

For example, if allowSamplerImageViewPostSubmitCreation is VK_FALSE, this is disallowed:

1. Call vkQueueSubmit() which is waiting for a timeline semaphore
2. Create a VkImageView
3. Update the descriptor buffer used by the previous submission from the host using the descriptor data of the new VkImageView
4. Signal the semaphore from the host

3.6. Push descriptors

Combining descriptor buffers with push descriptors is supported if the descriptorBufferPushDescriptors feature bit is set.

To support push descriptors on certain implementations, an additional buffer usage flag is added:

VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT = 0x04000000

If the application desires to use push descriptors and descriptor buffers together, a descriptor set layout must be declared with VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR and VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT bits set.

If the bufferlessPushDescriptors property is VK_FALSE, there are special requirements for using push descriptors with descriptor buffers. VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT is a special buffer usage flag that certain implementations require in order for push descriptors to interoperate with descriptor buffers. When pushing descriptors using this kind of set layout, a descriptor buffer created with the VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT usage flag must be bound to the command buffer. The intention is that implementations can reserve scratch space in descriptor buffers for dealing with push descriptors. The mechanics are highly implementation-defined, and expecting applications to deal with them directly is considered too burdensome.

Binding a buffer that was created with VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT requires the application to record any current push descriptors again.

3.7. Capture/Replay

When creating a resource with the capture/replay feature enabled, an opaque handle can be obtained which can be passed into creation calls in a future replay, causing descriptors to be created with the same data.

New flags are supplied when creating buffers, images, image views, samplers, and acceleration structures that are to be captured/replayed:

VK_BUFFER_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT                 = 0x00000020
VK_IMAGE_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT                  = 0x00010000
VK_IMAGE_VIEW_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT             = 0x00000004
VK_SAMPLER_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT                = 0x00000008
VK_ACCELERATION_STRUCTURE_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT = 0x00000008

There are separate commands to get opaque data for buffers, images, image views, samplers, and acceleration structures:

VkResult vkGetBufferOpaqueCaptureDescriptorDataEXT(
    VkDevice                                    device,
    const VkBufferCaptureDescriptorDataInfoEXT* pInfo,
    void*                                       pData);

typedef struct VkBufferCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkBuffer           buffer;
} VkBufferCaptureDescriptorDataInfoEXT;

VkResult vkGetImageOpaqueCaptureDescriptorDataEXT(
    VkDevice                                   device,
    const VkImageCaptureDescriptorDataInfoEXT* pInfo,
    void*                                      pData);

typedef struct VkImageCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkImage            image;
} VkImageCaptureDescriptorDataInfoEXT;

VkResult vkGetImageViewOpaqueCaptureDescriptorDataEXT(
    VkDevice                                       device,
    const VkImageViewCaptureDescriptorDataInfoEXT* pInfo,
    void*                                          pData);

typedef struct VkImageViewCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkImageView        imageView;
} VkImageViewCaptureDescriptorDataInfoEXT;

VkResult vkGetSamplerOpaqueCaptureDescriptorDataEXT(
    VkDevice                                     device,
    const VkSamplerCaptureDescriptorDataInfoEXT* pInfo,
    void*                                        pData);

typedef struct VkSamplerCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkSampler          sampler;
} VkSamplerCaptureDescriptorDataInfoEXT;

VkResult vkGetAccelerationStructureOpaqueCaptureDescriptorDataEXT(
    VkDevice                                                   device,
    const VkAccelerationStructureCaptureDescriptorDataInfoEXT* pInfo,
    void*                                                      pData);

typedef struct VkAccelerationStructureCaptureDescriptorDataInfoEXT {
    VkStructureType                  sType;
    const void*                      pNext;
    VkAccelerationStructureKHR       accelerationStructure;
    VkAccelerationStructureNV        accelerationStructureNV;
} VkAccelerationStructureCaptureDescriptorDataInfoEXT;

Once queried, this must be provided to buffer/image/imageview/sampler/acceleration structure creation in a similar manner to buffer device address creation, by chaining the following structure to buffer, image, imageview, sampler, or acceleration structure creation:

typedef struct VkOpaqueCaptureDescriptorDataCreateInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    const void*        opaqueCaptureDescriptorData;
} VkOpaqueCaptureDescriptorDataCreateInfoEXT;

In each case, the capture data is sized to the bufferCaptureReplayDescriptorDataSize, imageCaptureReplayDescriptorDataSize, imageViewCaptureReplayDescriptorDataSize, samplerCaptureReplayDescriptorDataSize, or accelerationStructureCaptureReplayDescriptorDataSize limits as appropriate.

In addition, vkGetDeviceMemoryOpaqueCaptureAddress must be used to capture the opaque address and replay it with VkMemoryOpaqueCaptureAddressAllocateInfo, for any memory used by resources with these handles.

3.8. Device Features

The following features are exposed:

typedef struct VkPhysicalDeviceDescriptorBufferFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           descriptorBuffer;
    VkBool32           descriptorBufferCaptureReplay;
    VkBool32           descriptorBufferImageLayoutIgnored;
    VkBool32           descriptorBufferPushDescriptors;
} VkPhysicalDeviceDescriptorBufferFeaturesEXT;

If the descriptorBuffer feature is enabled, VK_AMD_shader_fragment_mask must not be enabled. If the descriptorBufferImageLayoutIgnored feature is enabled, the image layout provided when getting a descriptor is ignored. The descriptorBufferCaptureReplay feature is primarily for capture/replay tools, and allows opaque data to be captured and replayed, allowing the same descriptor handles to be used on replay. If the descriptorBufferPushDescriptors feature is enabled, push descriptors can be used with descriptor buffers.

3.9. Device Properties

The following properties are exposed:

typedef struct VkPhysicalDeviceDescriptorBufferPropertiesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           combinedImageSamplerDescriptorSingleArray;
    VkBool32           bufferlessPushDescriptors;
    VkBool32           allowSamplerImageViewPostSubmitCreation;
    VkDeviceSize       descriptorBufferOffsetAlignment;
    uint32_t           maxDescriptorBufferBindings;
    uint32_t           maxResourceDescriptorBufferBindings;
    uint32_t           maxSamplerDescriptorBufferBindings;
    uint32_t           maxEmbeddedImmutableSamplerBindings;
    uint32_t           maxEmbeddedImmutableSamplers;
    size_t             bufferCaptureReplayDescriptorDataSize;
    size_t             imageCaptureReplayDescriptorDataSize;
    size_t             imageViewCaptureReplayDescriptorDataSize;
    size_t             samplerCaptureReplayDescriptorDataSize;
    size_t             accelerationStructureCaptureReplayDescriptorDataSize;
    size_t             samplerDescriptorSize;
    size_t             combinedImageSamplerDescriptorSize;
    size_t             sampledImageDescriptorSize;
    size_t             storageImageDescriptorSize;
    size_t             uniformTexelBufferDescriptorSize;
    size_t             robustUniformTexelBufferDescriptorSize;
    size_t             storageTexelBufferDescriptorSize;
    size_t             robustStorageTexelBufferDescriptorSize;
    size_t             uniformBufferDescriptorSize;
    size_t             robustUniformBufferDescriptorSize;
    size_t             storageBufferDescriptorSize;
    size_t             robustStorageBufferDescriptorSize;
    size_t             inputAttachmentDescriptorSize;
    size_t             accelerationStructureDescriptorSize;
    VkDeviceSize       maxSamplerDescriptorBufferRange;
    VkDeviceSize       maxResourceDescriptorBufferRange;
    VkDeviceSize       samplerDescriptorBufferAddressSpaceSize;
    VkDeviceSize       resourceDescriptorBufferAddressSpaceSize;
    VkDeviceSize       descriptorBufferAddressSpaceSize;
} VkPhysicalDeviceDescriptorBufferPropertiesEXT;

descriptorBufferOffsetAlignment describes the alignment required, in bytes, when setting offsets into the descriptor buffer.
combinedImageSamplerDescriptorSingleArray indicates that the implementation does not require an array of VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptors to be written into a descriptor buffer as an array of image descriptors, immediately followed by an array of sampler descriptors.
bufferlessPushDescriptors indicates that the implementation does not require a buffer created with VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT to be bound when using push descriptors.
allowSamplerImageViewPostSubmitCreation indicates that the implementation does not restrict when the VkSampler or VkImageView objects used to retrieve descriptor data can be created in relation to command buffer submission. If this value is VK_FALSE, then the application must create any VkSampler or VkImageView objects whose descriptor data is accessed during the execution of a command buffer, before the vkQueueSubmit (or vkQueueSubmit2) call that submits that command buffer.
maxDescriptorBufferBindings defines the maximum total number of descriptor buffers and embedded immutable sampler sets that can be bound.
maxResourceDescriptorBufferBindings defines the maximum number of resource descriptor buffers that can be bound.
maxSamplerDescriptorBufferBindings defines the maximum number of sampler descriptor buffers that can be bound.
maxEmbeddedImmutableSamplerBindings defines the maximum number of embedded immutable sampler sets that can be bound.
maxEmbeddedImmutableSamplers describes the maximum number of unique immutable samplers in descriptor set layouts created with VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT, and pipeline layouts created from them, which can simultaneously exist on a device.
bufferCaptureReplayDescriptorDataSize, imageCaptureReplayDescriptorDataSize, imageViewCaptureReplayDescriptorDataSize, samplerCaptureReplayDescriptorDataSize, and accelerationStructureCaptureReplayDescriptorDataSize define the maximum size, in bytes, of the opaque data used for capture replay with each respective object type.
samplerDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_SAMPLER descriptor.
combinedImageSamplerDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptor.
sampledImageDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE descriptor.
storageImageDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_IMAGE descriptor.
uniformTexelBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER descriptor.
robustUniformTexelBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER descriptor when robust buffer access is enabled.
storageTexelBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER descriptor.
robustStorageTexelBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER descriptor when robust buffer access is enabled.
uniformBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER descriptor.
robustUniformBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER descriptor when robust buffer access is enabled.
storageBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_BUFFER descriptor.
robustStorageBufferDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_BUFFER descriptor when robust buffer access is enabled.
inputAttachmentDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT descriptor.
accelerationStructureDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR/VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_NV descriptor.
maxSamplerDescriptorBufferRange describes the accessible range, in bytes, of a sampler buffer when bound.
maxResourceDescriptorBufferRange describes the accessible range, in bytes, of a resource buffer when bound.
samplerDescriptorBufferAddressSpaceSize describes the total amount of address space available, in bytes, for descriptor buffers containing samplers.
resourceDescriptorBufferAddressSpaceSize describes the total amount of address space available, in bytes, for descriptor buffers containing resources.
descriptorBufferAddressSpaceSize describes the total amount of address space available, in bytes, for all descriptor buffers.
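
As a rough illustration of how the binding-count limits above interact, the following sketch checks a planned combination of bindings against them. The BindingLimits struct and the bindings_fit helper are hypothetical stand-ins that mirror four fields of VkPhysicalDeviceDescriptorBufferPropertiesEXT; real code would read these values from the queried properties structure. It also assumes each descriptor buffer holds either resources or samplers, not both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the binding-count limits described above. */
typedef struct BindingLimits {
    uint32_t maxDescriptorBufferBindings;
    uint32_t maxResourceDescriptorBufferBindings;
    uint32_t maxSamplerDescriptorBufferBindings;
    uint32_t maxEmbeddedImmutableSamplerBindings;
} BindingLimits;

/* Returns true if the planned bindings respect every limit.
 * Assumption: each buffer is counted as either a resource or a sampler buffer. */
static bool bindings_fit(const BindingLimits *limits,
                         uint32_t resourceBuffers,
                         uint32_t samplerBuffers,
                         uint32_t embeddedSamplerSets)
{
    assert(limits != NULL);
    /* Sum in 64 bits so that large counts cannot wrap around. */
    uint64_t total = (uint64_t)resourceBuffers + samplerBuffers + embeddedSamplerSets;
    return resourceBuffers     <= limits->maxResourceDescriptorBufferBindings &&
           samplerBuffers      <= limits->maxSamplerDescriptorBufferBindings &&
           embeddedSamplerSets <= limits->maxEmbeddedImmutableSamplerBindings &&
           total               <= limits->maxDescriptorBufferBindings;
}
```

The total is checked separately because maxDescriptorBufferBindings covers descriptor buffers and embedded immutable sampler sets together.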

If VK_VALVE_mutable_descriptor_type is used, a descriptor is considered to be a union of all the enabled types, so the size of a descriptor is the maximum of all enabled types.
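
The union rule for mutable descriptors can be sketched as a simple maximum over the enabled types. The helper name and the idea of passing the sizes as a plain array are illustrative assumptions; the sizes themselves would come from the corresponding fields of VkPhysicalDeviceDescriptorBufferPropertiesEXT.

```c
#include <assert.h>
#include <stddef.h>

/* Size of a mutable descriptor: the largest size among the enabled types.
 * enabledTypeSizes is a hypothetical array filled from the device properties. */
static size_t mutable_descriptor_size(const size_t *enabledTypeSizes, size_t typeCount)
{
    assert(enabledTypeSizes != NULL && typeCount > 0);
    size_t largest = 0;
    for (size_t i = 0; i < typeCount; i++) {
        if (enabledTypeSizes[i] > largest)
            largest = enabledTypeSizes[i];
    }
    return largest;
}
```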

typedef struct VkPhysicalDeviceDescriptorBufferDensityMapPropertiesEXT {
    VkStructureType    sType;
    void*              pNext;
    size_t             combinedImageSamplerDensityMapDescriptorSize;
} VkPhysicalDeviceDescriptorBufferDensityMapPropertiesEXT;

combinedImageSamplerDensityMapDescriptorSize describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptor when using the VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT flag of the VK_EXT_fragment_density_map extension.

4. Mapping to DirectX® 12 Descriptor Heaps

In DirectX 12 (DX12), descriptors are allocated into descriptor heaps, which work almost completely differently to anything currently in Vulkan. This extension aims to reduce one aspect of the divergence between the two. Below is a rough description of the mapping from DX12 to this extension. Applications looking to port between the two APIs will likely have more information available than the DX12 API provides, and can likely take shortcuts (highlighted where possible). This doesn’t solve the overall limits for object counts, and so it’s not possible to trivially emulate every corner of the DX12 API.

4.1. Descriptor Heap Creation

DX12 has the following command to create a heap:

typedef struct D3D12_DESCRIPTOR_HEAP_DESC {
  D3D12_DESCRIPTOR_HEAP_TYPE  Type;
  UINT                        NumDescriptors;
  D3D12_DESCRIPTOR_HEAP_FLAGS Flags;
  UINT                        NodeMask;
} D3D12_DESCRIPTOR_HEAP_DESC;

HRESULT CreateDescriptorHeap(
  const D3D12_DESCRIPTOR_HEAP_DESC *pDescriptorHeapDesc,
  REFIID                           riid,
  void                             **ppvHeap
);

Implementing the equivalent functionality in Vulkan would mean the following operations:

Create a VkDescriptorSetLayout with VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT. The count would be up to 1000000 for resources, and 2048 for samplers.
- If VK_VALVE_mutable_descriptor_type is supported, we only need one descriptor set layout which supports all descriptor types for the heap type.
- Otherwise, there are two alternatives:
  - Create up to 6 descriptor set layouts of the relevant descriptor types the application cares about (STORAGE_BUFFER, UNIFORM_BUFFER, SAMPLED_IMAGE, STORAGE_IMAGE, UNIFORM_TEXEL_BUFFER, STORAGE_TEXEL_BUFFER).
  - Create one descriptor set layout with 6 fixed-size arrays instead of using variable descriptor counts. This means NumDescriptors is effectively ignored.
Create a VkBuffer, size equal to NumDescriptors multiplied by the descriptor size within it, and its device mask set per NodeMask.
If Flags includes D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE, allocate DEVICE_LOCAL memory.
- If this memory can be DEVICE_LOCAL and HOST_VISIBLE, then that can be mapped directly for the CPU pointer and used as the heap CPU pointer.
- Otherwise, HOST_VISIBLE staging memory should be allocated for a parallel buffer. Copying from this staging buffer to the main descriptor buffer should be done at each submit where the staging buffer has been modified.
If Flags does not include D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE, allocate HOST_VISIBLE memory that can be used for staging copies to DEVICE_LOCAL memory.
- Alternatively, plain malloc can be used if descriptor copies are implemented as memcpy.
Copying descriptors in the style of CopyDescriptorsSimple() is implemented with either memcpy or staging copies.

This model would support the full TIER_3 resource binding feature in DX12 and shader model 6.6 direct heap access, but can be simplified a lot for applications with DX11-style binding models.
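
The sizing and memory-selection steps above can be sketched as follows. The enum, struct, and function names are hypothetical, and the two boolean inputs stand in for the D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE flag and for whether a DEVICE_LOCAL and HOST_VISIBLE memory type exists; no Vulkan or DX12 calls are made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum HeapMemoryStrategy {
    HEAP_MAPPED_DEVICE_LOCAL, /* DEVICE_LOCAL | HOST_VISIBLE, written through a mapping */
    HEAP_STAGED_DEVICE_LOCAL, /* DEVICE_LOCAL plus a parallel HOST_VISIBLE staging buffer */
    HEAP_HOST_STAGING         /* HOST_VISIBLE memory (or malloc), used only as a copy source */
} HeapMemoryStrategy;

typedef struct HeapPlan {
    uint64_t           bufferSize; /* bytes to allocate for the descriptor buffer */
    HeapMemoryStrategy strategy;
} HeapPlan;

/* Decide how large the backing buffer is and where it should live. */
static HeapPlan plan_descriptor_heap(uint32_t numDescriptors,
                                     uint64_t descriptorSize,
                                     bool shaderVisible,
                                     bool mappableDeviceLocalAvailable)
{
    assert(descriptorSize > 0);
    HeapPlan plan;
    plan.bufferSize = (uint64_t)numDescriptors * descriptorSize;
    if (!shaderVisible)
        plan.strategy = HEAP_HOST_STAGING;
    else if (mappableDeviceLocalAvailable)
        plan.strategy = HEAP_MAPPED_DEVICE_LOCAL;
    else
        plan.strategy = HEAP_STAGED_DEVICE_LOCAL;
    return plan;
}
```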

4.2. Descriptor Creation

Unlike DX12, Vulkan (and this extension) requires view objects and sampler objects to exist and have their lifetimes managed by the application. These objects need to be kept alive for the descriptor itself to be valid. How this is managed precisely is going to depend on the application’s usage patterns, though vkd3d-proton suggests one viable option. The scheme used by vkd3d-proton involves keeping a hash map of the views associated with each resource object (or the device for samplers), using creation parameters as a key, so that their lifetime is tied to the underlying resource and they can be reused. When actually creating the UAV/SRV/Sampler, the object should be looked up in the relevant hash map, and created there if necessary. The descriptor itself is then written directly to the provided CPU pointer.

Note that 'VkBufferView' objects are not used and have been replaced by an explicit address, range, and format. This is very important since applications have a tendency to linearly allocate texel buffers and may end up rapidly creating views at different offsets. If applications were forced to hold on to all unique 'VkBufferView' objects, things would get out of hand quickly. vkd3d-proton currently works around this problem by quantizing the texel buffer offset and range, and instead performs offset/range checks per access in shaders to keep the number of objects low, which is obviously not desirable.

For image views on the other hand, the number of unique views in flight per resource tends to be constrained and manageable. In terms of performance characteristics, creating SRVs and UAVs is already far more expensive in DX12 than copying descriptors. The style observed in most DX12 applications is that view objects are created in non-shader visible heaps, which are then streamed into shader visible heaps.
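
A minimal sketch of a per-resource view cache keyed by creation parameters, in the spirit of the scheme described above, might look like the following. It is not vkd3d-proton's implementation: the ViewKey fields, the fixed-capacity open-addressing table, the FNV-1a hash, and fake_create_view (a stand-in for a real view creation call such as vkCreateImageView) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIEW_CACHE_CAPACITY 64u /* power of two; fixed size for this sketch */

/* Hypothetical creation parameters; all uint32_t so the struct has no padding. */
typedef struct ViewKey {
    uint32_t format, baseMip, mipCount, baseLayer, layerCount;
} ViewKey;

typedef uint64_t ViewHandle; /* stand-in for VkImageView; 0 marks an empty slot */
typedef ViewHandle (*CreateViewFn)(const ViewKey *key, void *user);

/* One cache per resource; must be zero-initialized before use. */
typedef struct ViewCache {
    ViewKey    keys[VIEW_CACHE_CAPACITY];
    ViewHandle views[VIEW_CACHE_CAPACITY];
} ViewCache;

/* FNV-1a over the raw bytes of the key. */
static uint32_t hash_view_key(const ViewKey *key)
{
    const unsigned char *bytes = (const unsigned char *)key;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*key); i++) {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* Look the view up by its creation parameters, creating it on a miss.
 * Returns 0 if creation fails or the table is full. */
static ViewHandle view_cache_get(ViewCache *cache, const ViewKey *key,
                                 CreateViewFn create, void *user)
{
    assert(cache != NULL && key != NULL && create != NULL);
    uint32_t slot = hash_view_key(key) & (VIEW_CACHE_CAPACITY - 1u);
    for (uint32_t probe = 0; probe < VIEW_CACHE_CAPACITY; probe++) {
        if (cache->views[slot] == 0) {
            ViewHandle view = create(key, user);
            if (view != 0) {
                cache->keys[slot] = *key;
                cache->views[slot] = view;
            }
            return view;
        }
        if (memcmp(&cache->keys[slot], key, sizeof(*key)) == 0)
            return cache->views[slot];
        slot = (slot + 1u) & (VIEW_CACHE_CAPACITY - 1u);
    }
    return 0;
}

/* Illustrative creator: counts calls and returns the count as the handle. */
static ViewHandle fake_create_view(const ViewKey *key, void *user)
{
    (void)key;
    uint32_t *created = (uint32_t *)user;
    *created += 1u;
    return *created;
}
```

Destroying the cached views when the owning resource is destroyed is what ties their lifetime to the resource; that teardown is omitted here.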

4.3. Descriptor Heap Queries

Descriptor heaps provide methods to query the “start” pointer for the descriptor heap on both the CPU and GPU.

D3D12_CPU_DESCRIPTOR_HANDLE GetCPUDescriptorHandleForHeapStart();
D3D12_GPU_DESCRIPTOR_HANDLE GetGPUDescriptorHandleForHeapStart();
UINT GetDescriptorHandleIncrementSize(
  D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapType
);

GetGPUDescriptorHandleForHeapStart should be the VkDeviceAddress for the device-local buffer. GetCPUDescriptorHandleForHeapStart should be the mapped host address for the host-visible buffer. GetDescriptorHandleIncrementSize should be the size of the largest descriptor possible in the buffer.
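
For the simple single-array case, the three queries reduce to base-plus-stride arithmetic. The sketch below assumes a hypothetical EmulatedHeap struct that an emulation layer might keep; gpuBase is a plain integer standing in for a VkDeviceAddress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one emulated descriptor heap. */
typedef struct EmulatedHeap {
    uint8_t *cpuBase;        /* mapped host address of the host-visible buffer */
    uint64_t gpuBase;        /* stand-in for the device-local buffer's VkDeviceAddress */
    uint32_t incrementSize;  /* size of the largest descriptor the heap can hold */
    uint32_t numDescriptors;
} EmulatedHeap;

/* CPU handle of descriptor `index`: heap start plus index times increment. */
static uint8_t *heap_cpu_handle(const EmulatedHeap *heap, uint32_t index)
{
    assert(heap != NULL && index < heap->numDescriptors);
    return heap->cpuBase + (size_t)index * heap->incrementSize;
}

/* GPU handle of descriptor `index`, computed the same way on the device address. */
static uint64_t heap_gpu_handle(const EmulatedHeap *heap, uint32_t index)
{
    assert(heap != NULL && index < heap->numDescriptors);
    return heap->gpuBase + (uint64_t)index * heap->incrementSize;
}
```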

However, this model can break down fairly quickly if the descriptor set layout is more complicated. When more than one descriptor array is used to emulate the union-style descriptor heap of DX12, it is not possible to provide a unique pointer to host memory that is suitable for copying.

An engine abstraction that takes descriptor heap and offset separately is much easier to implement overall and avoids all these pitfalls.

4.4. Descriptor Copies

D3D12-style descriptor copies can be performed using memcpy on the host-visible descriptor buffer memory, but applications need to make sure the memory that is being read from is cached on the host. Alternatively, it is possible to use staging buffer copies.
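
A memcpy-based copy might be sketched as below. The function name and parameters are illustrative and only loosely modelled on CopyDescriptorsSimple; the sketch assumes both heaps use the same descriptor stride, that the ranges do not overlap, and that the source memory is host-cached.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy `count` descriptors between two host-visible heaps with equal stride.
 * Assumption: the source and destination ranges do not overlap. */
static void copy_descriptors_simple(uint8_t *dstHeap, uint32_t dstIndex,
                                    const uint8_t *srcHeap, uint32_t srcIndex,
                                    uint32_t count, size_t descriptorSize)
{
    assert(dstHeap != NULL && srcHeap != NULL && descriptorSize > 0);
    memcpy(dstHeap + (size_t)dstIndex * descriptorSize,
           srcHeap + (size_t)srcIndex * descriptorSize,
           (size_t)count * descriptorSize);
}
```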

4.5. Descriptor Binding

Binding descriptors to shaders in DX12 consists of two operations: setting the descriptor heaps, and setting tables as offsets into those heaps.

SetDescriptorHeaps allows applications to set one sampler heap, and one CBV/SRV/UAV heap (containing other resources). This command should straightforwardly map to vkCmdBindDescriptorBuffersEXT, with each heap being bound as a separate buffer.

Set{Graphics|Compute}RootDescriptorTable allows applications to set various offsets to the descriptor heap, to be more or less used like descriptor sets in Vulkan. This command will map fairly directly to vkCmdSetDescriptorBufferOffsetsEXT, but if implementing DX12 root signatures natively, this approach will not work easily. The core assumption of DX12 is that the heap is a big array and a table offset should be seen more as an index offset into that big array. descriptorBufferOffsetAlignment might be larger than one descriptor, so binding at the desired offset might not be possible. Descriptor buffer offsets are better suited for suballocating individual descriptor sets rather than slicing existing descriptor sets.
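
The alignment problem can be made concrete with a small check. The helper below is hypothetical; it takes the descriptor size and the descriptorBufferOffsetAlignment value as plain integers and reports whether a table starting at a given descriptor index could be bound directly as a descriptor buffer offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a table starting at firstDescriptorIndex lands on a byte offset
 * that satisfies the descriptor buffer offset alignment. */
static bool table_offset_is_bindable(uint64_t firstDescriptorIndex,
                                     uint64_t descriptorSize,
                                     uint64_t offsetAlignment)
{
    assert(offsetAlignment != 0);
    return (firstDescriptorIndex * descriptorSize) % offsetAlignment == 0;
}
```

With 32-byte descriptors and a 64-byte alignment, for example, only every second descriptor index is a valid table start, which is why arbitrary DX12 table offsets do not map directly.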

An engine abstraction can decide to take this into account when allocating descriptor sets:

In the DX12 path, a root signature has N tables, each of which needs to allocate M descriptors.
In the Vulkan path, a "root signature" translates to a VkPipelineLayout, which in turn translates to N `VkDescriptorSetLayout`s, each of which requires M bytes in the descriptor buffer.

If native DX12 root signature compatibility is required however, the suggested implementation is to bind the heap in its entirety with a single vkCmdSetDescriptorBufferOffsetsEXT of 0. The shader declares global unsized arrays and from there we can implement shader model 6.6 by just indexing into the descriptor array directly. For older models, descriptor table offsets can translate to u32 push constants that add an extra offset, meaning that we promote legacy root signatures to shader model 6.6. This is a fairly invasive process and it is only expected that translation layers would go to this length.

5. Porting existing Vulkan applications

Porting an existing Vulkan application to the new API should require minimal additional code, and ideally should allow the removal of older code.

Applications should be uploading descriptors in the exact same manner they upload other resource data (e.g. new textures, constants, etc.). All advice about how to upload resources (e.g. use staging buffers, use the DMA queue asynchronously, etc.) applies in the exact same manner for descriptors as it does for anything else.

When porting an application then, the aim should not be to create a new separate path for descriptor uploads, but to directly hook into existing resource upload paths. This amortizes the cost of descriptor uploads with other data uploads and reduces the amount of code dedicated to descriptor management. Any improvements to data uploads then automatically apply to descriptor uploads. For strategies where resizable BAR or unified memory can be used, none of this is necessary and uploading descriptors becomes memcpy.

For descriptor management, pools are removed. Instead of allocating descriptor sets from pools, applications can instead allocate from a custom allocator, which is backed by a big descriptor buffer. The size to allocate for a set would be obtained from vkGetDescriptorSetLayoutSizeEXT and alignment from descriptorBufferOffsetAlignment. A linear or arena allocator would be a good match for this.

Instead of updating descriptor sets with vkUpdateDescriptorSets, vkGetDescriptorEXT could point directly to the mapped descriptor buffer, or a scratch buffer can be used and copied later.

6. Example

This example intends to show:

How to create descriptor set layouts
How to use immutable samplers with descriptor buffers
How to use embedded immutable samplers
How to use push descriptors
How to allocate enough descriptor buffer memory
How to bind ranges of descriptor buffers to descriptor sets

VkSampler immutableSamplers[4]; // Create these somehow.

// When using descriptor buffers, it is generally a good idea to separate out samplers and resources into separate sets,
// since descriptor buffers containing samplers might be very limited in size.
const VkDescriptorSetLayoutBinding setLayout0[] =
{
    {
        0,                                      // binding
        VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,       // descriptorType
        2,                                      // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
        NULL                                    // pImmutableSamplers
    },
    {
        1,                                       // binding
        VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER, // descriptorType
        2,                                       // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,            // stageFlags
        NULL                                     // pImmutableSamplers
    }
};

const VkDescriptorSetLayoutBinding setLayout1[] =
{
    {
        0,                                      // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,             // descriptorType
        2,                                      // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
        &immutableSamplers[0],                  // pImmutableSamplers
    },
    {
        1,                                       // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,              // descriptorType
        2,                                       // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,            // stageFlags
        NULL,                                    // pImmutableSamplers
    }
};

const VkDescriptorSetLayoutBinding setLayout2[] =
{
    // binding to a single storage buffer descriptor
    {
        0,                                      // binding
        VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,      // descriptorType
        1,                                      // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
        NULL                                    // pImmutableSamplers
    }
};

// Embedded immutable samplers are internally allocated and we do not need to allocate anything.
const VkDescriptorSetLayoutBinding setLayout3[] =
{
    {
        0,                                      // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,             // descriptorType
        1,                                      // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
        &immutableSamplers[2],                  // pImmutableSamplers
    },
    {
        1,                                       // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,              // descriptorType
        1,                                       // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,            // stageFlags
        &immutableSamplers[3],                   // pImmutableSamplers
    }
};

// Descriptor set layouts are created as normal, but we use the descriptor buffer flag on the set layouts.
VkDescriptorSetLayout layout0 =
    create_descriptor_set_layout({ .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT, .pBindings = setLayout0, .bindingCount = 2 });
VkDescriptorSetLayout layout1 =
    create_descriptor_set_layout({ .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT, .pBindings = setLayout1, .bindingCount = 2 });
VkDescriptorSetLayout layout2 =
    create_descriptor_set_layout({ .flags =
            VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT |
            VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR,
        .pBindings = setLayout2, .bindingCount = 1 });
VkDescriptorSetLayout layout3 =
    create_descriptor_set_layout({ .flags =
            VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT |
            VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT,
        .pBindings = setLayout3, .bindingCount = 2 });

// Use 5 descriptor set layouts, mostly here to demonstrate how multiple sets can refer to one descriptor buffer.
// Also, use embedded sampler sets and push descriptors for completeness.
VkPipelineLayout layout = create_pipeline_layout({ .layouts = { layout0, layout0, layout1, layout2, layout3 }});

// Query how big the descriptor set layout is.
VkDeviceSize layoutSizes[2];
vkGetDescriptorSetLayoutSizeEXT(device, layout0, &layoutSizes[0]);
vkGetDescriptorSetLayoutSizeEXT(device, layout1, &layoutSizes[1]);

// Align the descriptor set size so it is suitable for suballocation within a descriptor buffer.
layoutSizes[0] = align(layoutSizes[0], props.descriptorBufferOffsetAlignment);
layoutSizes[1] = align(layoutSizes[1], props.descriptorBufferOffsetAlignment);

// Query individual offsets into the descriptor set.
VkDeviceSize layoutOffsets[2][2];
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout0, 0, &layoutOffsets[0][0]);
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout0, 1, &layoutOffsets[0][1]);
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout1, 0, &layoutOffsets[1][0]);
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout1, 1, &layoutOffsets[1][1]);

#define SET_COUNT 64

// Allocate the equivalent of a big descriptor pool.
// The size is arbitrary and should be large and be able to hold all descriptors used by app,
// for this sample, we allocate the smallest possible descriptor buffer for the number of sets we need.
// The most compatible thing to do is 1 resource buffer, 1 sampler buffer.
Buffer resourceBuffer = create_buffer({
    .size = layoutSizes[0] * 2 * SET_COUNT,
    .usage = VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT |
        (props.bufferlessPushDescriptors ? 0 : VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT),
    .properties = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT });

Buffer samplerBuffer = create_buffer({
    .size = layoutSizes[1] * SET_COUNT,
    .usage = VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT,
    .properties = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT });

const VkDescriptorBufferBindingPushDescriptorBufferHandleEXT push_descriptor_buffer_handle = {
    VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_PUSH_DESCRIPTOR_BUFFER_HANDLE_EXT, NULL, resourceBuffer.handle};

const VkDescriptorBufferBindingInfoEXT binding_infos[2] = {
    { VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT, (props.bufferlessPushDescriptors ? NULL : &push_descriptor_buffer_handle),
        resourceBuffer.deviceAddress,
        VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT | (props.bufferlessPushDescriptors ? 0 : VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT) },
    { VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT, NULL, samplerBuffer.deviceAddress,
        VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT }
};

// Bind the descriptor buffers once, from here, we will offset into the buffer for different descriptor sets.
vkCmdBindDescriptorBuffersEXT(cmd, 2, binding_infos);

// Allocate these somehow, not particularly important to this example.
VkImageView views[SET_COUNT][2][2];
VkSampler samplers[SET_COUNT][2];
VkDeviceAddress bufferAddressTexelBuffer;

// No buffers are associated with embedded immutable samplers. This maps to DX12 static samplers.
// There is no vkCmdBindPipelineLayout(), so this is the way to do it in Vulkan.
vkCmdBindDescriptorBufferEmbeddedSamplersEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 4);

for (int i = 0; i < SET_COUNT; i++)
{
    // This refers to the buffers we bound in vkCmdBindDescriptorBuffersEXT.
    // Allocate descriptor sets linearly.
    const uint32_t bufferIndices[] = { 0, 0, 1 };
    const VkDeviceSize offsets[] = { 2 * i * layoutSizes[0], (2 * i + 1) * layoutSizes[0], i * layoutSizes[1] };

    // Set 0: Resource set pulled from buffer 0
    // Set 1: Resource set pulled from buffer 0
    // Set 2: Sampler set pulled from buffer 1
    // Set 3: Push descriptors
    // Set 4: Embedded samplers

    vkCmdSetDescriptorBufferOffsetsEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3,
        bufferIndices, offsets);

    VkWriteDescriptorSet ssbo_write = { /* Fill in as desired, details not interesting here. */ };
    vkCmdPushDescriptorSetKHR(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 3, 1, &ssbo_write);

    VkDescriptorImageInfo image_info = {};
    VkDescriptorAddressInfoEXT addr_info = { VK_STRUCTURE_TYPE_DESCRIPTOR_ADDRESS_INFO_EXT };
    VkDescriptorGetInfoEXT info = { VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT };

    for (int j = 0; j < 2; j++)
    {
        info.type = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
        info.data.pSampledImage = &image_info;
        // If descriptorBufferImageLayoutIgnored is enabled, this is ignored, convenient!
        image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

        // Offset is based on the binding offset + the offset within the descriptor set layout we queried earlier.
        // For array indexing, use the descriptor size from physical device property.
        // set j, binding 0, element k
        for (int k = 0; k < 2; k++)
        {
            image_info.imageView = views[i][j][k];
            vkGetDescriptorEXT(device, &info, props.sampledImageDescriptorSize,
            resourceBuffer.hostPointer + offsets[j] + layoutOffsets[0][0] + k * props.sampledImageDescriptorSize);
        }

        // set j, binding 1, element k
        info.type = VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER;
        info.data.pUniformTexelBuffer = &addr_info;
        for (int k = 0; k < 2; k++)
        {
            addr_info.range = 1024;
            addr_info.address = bufferAddressTexelBuffer + (4 * i + 2 * j + k) * addr_info.range;
            // No VkBufferView needed, how convenient!
            addr_info.format = VK_FORMAT_R8G8B8A8_UNORM;
            vkGetDescriptorEXT(device, &info, props.uniformTexelBufferDescriptorSize,
            resourceBuffer.hostPointer + offsets[j] + layoutOffsets[0][1] + k * props.uniformTexelBufferDescriptorSize);
        }
    }

    // For immutable samplers, we have to emit the buffer payload.
    // In practice, the immutable samplers must work even if implementation just ignores pImmutableSamplers.
    info.type = VK_DESCRIPTOR_TYPE_SAMPLER;
    // set 2, binding 0, element k
    for (int k = 0; k < 2; k++)
    {
        info.data.pSampler = &immutableSamplers[k];
        vkGetDescriptorEXT(device, &info, props.samplerDescriptorSize,
        samplerBuffer.hostPointer + offsets[2] + layoutOffsets[1][0] + k * props.samplerDescriptorSize);
    }

    // set 2, binding 1, element k
    for (int k = 0; k < 2; k++)
    {
        info.data.pSampler = &samplers[i][k];
        vkGetDescriptorEXT(device, &info, props.samplerDescriptorSize,
        samplerBuffer.hostPointer + offsets[2] + layoutOffsets[1][1] + k * props.samplerDescriptorSize);
    }

    vkCmdDraw(...);
}

7. Issues

7.1. RESOLVED: How do immutable samplers work?

There may be cases where a driver needs immutable samplers stored as part of the descriptor, rather than solely existing as a part of the pipeline. With descriptor sets, this could be hidden from the application as the driver controlled how writes were performed – not so with this API. To fix this, samplers must be used to populate these descriptor bindings as if they were not immutable, and they must have been created with identical parameters.

For parity with DX12, a special kind of descriptor set, embedded immutable samplers, is supported as an alternative that follows DX12 restrictions.

7.2. RESOLVED: Should we support dynamic buffers?

No, these have very specialized support paths in some drivers, and end up being more pain than it’s worth to support. Applications can achieve the same using device addresses in push constants, or pipelined descriptor buffer updates.

7.3. UNRESOLVED: How does this interact with descriptor set invalidation?

There’s some extra complication with whether descriptor set layouts work with buffers or sets (VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT) that will need sorting. Shouldn’t be too difficult and will likely just be along the lines of invalidating sets that don’t match in this regard when binding a new pipeline layout, but it’s too much detail for this design document.

7.4. RESOLVED: Should `vkGetDescriptorOffset` take an `arrayOffset` parameter, or should we make guarantees about how arrays work?

Guarantees about how arrays work makes it much easier to work with GPU-side updates, as it avoids having to either add a “get offset” shader intrinsic, or for apps to keep a mapping when doing GPU copies.
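
With that guarantee, locating an array element is plain arithmetic that host code or a shader can perform without a lookup. The helper below is an illustrative sketch, assuming array elements are laid out contiguously at a stride of one descriptor size; it does not cover combined image sampler arrays when combinedImageSamplerDescriptorSingleArray is VK_FALSE.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of one array element within the descriptor buffer:
 * set offset + binding offset + element index * descriptor size. */
static uint64_t descriptor_element_offset(uint64_t setOffset,
                                          uint64_t bindingOffset,
                                          uint32_t arrayElement,
                                          uint64_t descriptorSize)
{
    assert(descriptorSize > 0);
    return setOffset + bindingOffset + (uint64_t)arrayElement * descriptorSize;
}
```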

7.5. RESOLVED: Now that descriptors are in regular memory, should there be a limit on the size of “inline uniforms”?

We should allow developers to put as many constants into descriptor buffers as they want, thus removing the limit, at least when it interacts with this extension. This is likely to remove an indirection compared to putting these in a uniform buffer. Potentially we might want to at least have it match the uniform buffer limit rather than being independent.

7.6. RESOLVED: Why are view objects required when DX12 has no such requirement?

DX12 has dedicated heap objects which allow implementations to hide a lot of implementation detail behind them; without them, some vendors rely on view objects to store metadata. Introducing heaps to Vulkan as-is was too complex alongside the other changes in this extension, when the primary goal is to enable explicit memory management, rather than precise DX12 compatibility. If this turns out to be a significant problem, a future extension could be developed to bridge this gap.

7.7. RESOLVED: Should `vkGetDescriptorEXT` / `vkGetDescriptorSetLayoutBindingOffsetEXT` be arrayed?

No – there is no reason why pulling this loop into the driver should provide any benefit.

7.8. RESOLVED: Should we support combined image/sampler descriptors with this extension?

While some consider these deprecated, removing them would prevent some applications being able to port to this extension. Additionally, YCbCr support currently relies on this descriptor type, which is required on some platforms. It might be possible to remove that requirement in the YCbCr feature, but it is a lot of work for a fairly low payoff.

7.9. RESOLVED: How does this interact with variable descriptor count?

The variable flag is allowed; vkGetDescriptorSetLayoutSizeEXT returns a size assuming the maximum number of descriptors will be used, but developers are free to use the set with a buffer sized for a smaller number of descriptors. The exception to this is when combinedImageSamplerDescriptorSingleArray is VK_FALSE and the binding contains VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptors; in this case the image and sampler descriptors are still arranged in the descriptor buffer as though the maximum number of descriptors is used, and so the buffer must be sized accordingly.
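The sizing rule above can be sketched as a small helper. This is an illustration under stated assumptions rather than a definitive formula: it assumes the variable-count binding is the last region of the set, so the trimmed size is the binding's offset plus the space for the descriptors actually used. All names are hypothetical; `max_layout_size` stands in for the vkGetDescriptorSetLayoutSizeEXT result, and `must_use_max_layout` is set by the caller for the combined image sampler exception described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes of descriptor buffer memory to reserve for a set
 * whose variable-count binding holds only actual_count descriptors.
 * Assumes the variable-count binding occupies the end of the set. */
size_t variable_count_set_size(size_t max_layout_size,
                               size_t variable_binding_offset,
                               size_t descriptor_size,
                               uint32_t actual_count,
                               int must_use_max_layout)
{
    if (must_use_max_layout) {
        /* Exception case: layout is arranged for the maximum count. */
        return max_layout_size;
    }
    size_t trimmed = variable_binding_offset +
                     (size_t)actual_count * descriptor_size;
    /* A trimmed set can never need more than the full layout size. */
    assert(trimmed <= max_layout_size);
    return trimmed;
}
```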

7.10. RESOLVED: Should we require descriptors to be retrieved for `NULL_HANDLE` or is `memset(0)` sufficient?

Some vendors use non-zero values for null descriptors, so zero-filling memory is not sufficient in general; applications can instead retrieve null descriptors by passing VK_NULL_HANDLE to vkGetDescriptorEXT. For descriptor types which take buffer device addresses, an address of 0 is used instead.

7.11. RESOLVED: How can YCbCr descriptors be obtained?

YCbCr descriptors can have multiple descriptors associated with them; applications must allow for this space. VkSamplerYcbcrConversionImageFormatProperties::combinedImageSamplerDescriptorCount determines how many descriptors each image format requires. When calling vkGetDescriptorEXT for a YCbCr combined descriptor, applications must provide a pointer to enough memory for this many combined image sampler descriptors, and factor this in when copying descriptors.
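As a rough illustration of the space requirement, the following sketch computes how many bytes to reserve for one YCbCr combined descriptor. It assumes the required space is the per-format descriptor count multiplied by the size of a single combined image sampler descriptor; the helper name and parameters are hypothetical stand-ins for values the application queries from the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to hold one YCbCr combined descriptor.
 * descriptor_count stands in for combinedImageSamplerDescriptorCount and
 * descriptor_size for the size of one combined image sampler descriptor. */
size_t ycbcr_descriptor_bytes(size_t descriptor_size,
                              uint32_t descriptor_count)
{
    assert(descriptor_count >= 1);
    return (size_t)descriptor_count * descriptor_size;
}
```

The same byte count applies when copying such a descriptor from one location to another, since all of its constituent descriptors have to move together.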

7.12. RESOLVED: How should we expect capture/replay tooling (e.g. RenderDoc/vktrace) to use this?

A capture replay bit on image/buffer creation will be added to enable descriptors to be reused between runs. This allows capture tools to capture the buffer data as bound, and replay with the same descriptors, rather than attempting to do a mapping. Some sort of GPU feedback is still desirable on capture to determine which handles are accessed, but this will be similar to the situation with descriptor indexing.

7.13. RESOLVED: On some platforms, descriptor sets occupy a 4GB range, allowing the set pointer to be 32-bit, rather than 64-bit. How can this be guaranteed for descriptor buffers?

This could be done a number of ways – e.g. having unique memory types that guarantee allocation in a 4GB range.

7.14. RESOLVED: Should the alignment be separate from the size?

No - the alignment of a descriptor is always the size of the descriptor.

7.15. RESOLVED: What is the fast path for constant data in this new model? Previously most vendors have recommended dynamic UBOs as a fast path, but those go away in this extension.

Getting data into a shader quickly is mostly dominated by the number of indirections and by cache behavior. Static accesses with fewer indirections and minimal memory model interactions (e.g. read-only and not NonPrivate) will be fastest. Push constants should be favored for small amounts of data. For larger amounts of data, applications should favor allocating buffers and putting data into those buffers according to whichever of the below API mechanisms is most straightforward for their use case, with some potential degradation at each step.

1. Push constants
2. Pointer to data in push constants
3. Inline uniform data in descriptor buffers
4. Push descriptors
5. Uniform buffer in descriptor memory
6. Storage buffer in descriptor memory

The order listed above is not necessarily the same for all IHVs.

7.16. RESOLVED: Should applications be able to mix sets and buffers?

Originally the intention was to support this, but at least one vendor cannot support this natively.

7.17. RESOLVED: Should we use buffer device addresses for the buffer arguments?

Buffer parameters in recent extensions have been using device address arguments, so this extension aims to be consistent. Part of the reason for using device addresses is that the base address can be modified with a single pointer argument instead of an object plus an offset. This extension, however, explicitly uses a separate command to set the offset dynamically, so that the application can set the base address statically. Specifying the base address as a device address is still useful for consistency.

7.18. RESOLVED: How does this interact with VK_EXT_pipeline_robustness?

There is no way to request robust and non-robust descriptors separately, or specify robust/non-robust descriptors in the set layout, so if the robustBufferAccess feature is enabled then robust descriptors are always used.