VK_NV_cluster_acceleration_structure
This document introduces a new type of bottom-level acceleration structure that is built from pre-generated clusters of primitives, which reduces acceleration structure build times.
1. Problem Statement
Acceleration structure build times can pose a bottleneck in ray tracing applications with extensive dynamic geometry. Examples include managing numerous animated objects, implementing LOD systems, or handling dynamic tessellation. As scenes become increasingly complex, these build times can escalate significantly, impacting performance.
2. Solution Space
The clustered geometry proposal seeks to resolve this challenge by allowing applications to build bottom-level acceleration structures using pre-generated clusters of primitives, significantly reducing build times.
3. Proposal
This document proposes three new acceleration structure types:
-
Cluster Level Acceleration Structure (CLAS): A new type of acceleration structure described in more detail below.
-
Cluster Template: A partially constructed CLAS which can be instantiated to multiple cluster level acceleration structures.
-
Cluster BLAS: An alternative to the existing Bottom Level Acceleration Structure (BLAS), constructed from references to CLAS structures.
A CLAS is an intermediate acceleration structure created from triangles, which can then be used to build Cluster BLAS. The Cluster BLAS serves as an alternative to the traditional BLAS. The goal is for applications to organize mesh geometry into CLAS primitives before creating the Cluster BLAS. To optimize trace performance, geometry should be grouped into CLAS based on spatial proximity.
A CLAS behaves similarly to a BLAS in many respects but has the following differences:
-
Triangle and Vertex Limits: A CLAS can contain only a small number of triangles and vertices, bounded by the maxTrianglesPerCluster and maxVerticesPerCluster properties.
-
TLAS Integration: CLAS cannot be directly included in a TLAS. Instead, they are referenced as part of a Cluster BLAS, which can be traced.
-
Geometry Indices: Geometry indices in a CLAS can be specified per primitive; they are local to the CLAS and may be non-consecutive.
-
ClusterID: A CLAS can be assigned a user-defined 32-bit ClusterID, which can be accessed from a hit shader.
-
Vertex positions in a CLAS can be quantized for better storage by implicitly zeroing a variable number of floating point mantissa bits.
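The mantissa quantization mentioned in the last item can be illustrated with a small sketch. It assumes IEEE 754 binary32 positions and that the truncation clears the lowest mantissa bits; the helper name `truncate_mantissa` is hypothetical, and the exact rounding an implementation applies may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Zero the lowest `bits` mantissa bits of a 32-bit float.
 * Illustrative only: this is an assumption about how the truncation
 * behaves, not the implementation's actual quantization routine. */
static float truncate_mantissa(float value, unsigned bits)
{
    uint32_t raw;

    assert(bits <= 23u);               /* binary32 has 23 explicit mantissa bits */
    memcpy(&raw, &value, sizeof raw);  /* reinterpret the float without aliasing issues */
    raw &= ~((1u << bits) - 1u);       /* clear the low `bits` bits */
    memcpy(&value, &raw, sizeof value);
    return value;
}
```

Clearing more bits leaves fewer significant bits to store per position, at the cost of moving each vertex slightly toward zero.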
Cluster Templates are designed to efficiently instantiate CLAS in memory. During the CLAS instantiation process from a Cluster Template, the actual vertex positions are provided, and the ClusterID as well as the geometry index can be offset uniformly. Cluster Templates perform as much pre-computation as possible that is independent of final vertex positions, enabling reuse when generating multiple CLAS instances. A Cluster Template is a partially constructed CLAS with the following distinctions:
-
It does not store or require vertex position data; however, when positions are provided, it can use them to guide the spatial relationship among triangles.
-
Its size is smaller due to the absence of position information.
-
It cannot be used for tracing or as a basis for building other acceleration structures.
-
Bounding box information can be used in combination with the ability to zero some of the floating point mantissa bits, to optimize the storage of the actual vertices at instantiation.
-
It retains non-positional properties similar to a CLAS, which are inherited when the CLAS is instantiated.
This extension provides a host-side query function to fetch the memory requirements and a single versatile multi-indirect function for managing cluster geometry which allows applications to generate CLAS geometry, construct Cluster BLAS from CLAS lists, and move or copy CLAS and BLAS. By sourcing inputs from device memory and processing multiple elements simultaneously, the call reduces the host-side costs associated with traditional acceleration structure functions and enables device-driven scene preparation.
4. API Features
The following provides a basic overview of how this extension can be used:
4.1. Feature
The following feature is exposed by this extension:
typedef struct VkPhysicalDeviceClusterAccelerationStructureFeaturesNV {
VkStructureType sType;
void* pNext;
VkBool32 clusterAccelerationStructure;
} VkPhysicalDeviceClusterAccelerationStructureFeaturesNV;
clusterAccelerationStructure is the core feature enabling this extension’s functionality.
4.2. Properties
The following properties are exposed by this extension:
typedef struct VkPhysicalDeviceClusterAccelerationStructurePropertiesNV {
VkStructureType sType;
void* pNext;
uint32_t maxVerticesPerCluster;
uint32_t maxTrianglesPerCluster;
uint32_t clusterScratchByteAlignment;
uint32_t clusterByteAlignment;
uint32_t clusterTemplateByteAlignment;
uint32_t clusterBottomLevelByteAlignment;
uint32_t clusterTemplateBoundsByteAlignment;
uint32_t maxClusterGeometryIndex;
} VkPhysicalDeviceClusterAccelerationStructurePropertiesNV;
maxVerticesPerCluster and maxTrianglesPerCluster specify the maximum number of vertices and triangles per cluster, respectively.
The buffers and scratch memory used for building acceleration structures must adhere to the alignment requirements specified by the ByteAlignment members of this structure.
maxClusterGeometryIndex is the maximum geometry index possible for a triangle in cluster acceleration structures.
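As a rough sketch of how an application might consume these properties, the helpers below check a candidate cluster against the per-cluster limits and round an address up to a required alignment. `ClusterLimits`, `cluster_fits` and `align_up` are hypothetical names; the struct is a local stand-in for the relevant members of VkPhysicalDeviceClusterAccelerationStructurePropertiesNV so that the example compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in holding only the property members used here. A real
 * application would read these values from
 * VkPhysicalDeviceClusterAccelerationStructurePropertiesNV. */
typedef struct ClusterLimits {
    uint32_t maxVerticesPerCluster;
    uint32_t maxTrianglesPerCluster;
    uint32_t clusterByteAlignment;
} ClusterLimits;

/* Returns true when a cluster with the given counts respects both limits. */
static bool cluster_fits(const ClusterLimits *limits,
                         uint32_t triangles, uint32_t vertices)
{
    return triangles <= limits->maxTrianglesPerCluster &&
           vertices <= limits->maxVerticesPerCluster;
}

/* Round `address` up to the next multiple of `alignment`. */
static uint64_t align_up(uint64_t address, uint32_t alignment)
{
    uint64_t remainder;

    assert(alignment > 0u);
    remainder = address % alignment;
    return remainder == 0u ? address : address + (alignment - remainder);
}
```

A mesh partitioner could call `cluster_fits` before emitting each cluster, and a sub-allocator could use `align_up` with the matching alignment property when placing CLAS, templates or scratch memory.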
4.3. Commands
This extension provides a host-side query function to fetch the memory requirements and a versatile multi-indirect call for managing cluster geometry. This call enables applications to generate cluster geometry, construct Cluster BLAS from CLAS lists, and move or copy CLAS and BLAS. By sourcing inputs from device memory and processing multiple elements simultaneously, the call reduces the host-side costs associated with traditional acceleration structure functions.
4.3.1. Checking memory requirements
To determine the memory requirements for building or moving cluster acceleration structures, use:
VKAPI_ATTR void VKAPI_CALL vkGetClusterAccelerationStructureBuildSizesNV(
VkDevice device,
VkClusterAccelerationStructureInputInfoNV const* pInfo,
VkAccelerationStructureBuildSizesInfoKHR* pSizeInfo);
where pInfo contains the parameters of the memory requirements query and pSizeInfo contains the resulting memory requirements.
The VkClusterAccelerationStructureInputInfoNV structure is used both when querying memory requirements and when performing a build or move operation. The word "operation" below refers to any of these. The structure is defined as:
typedef struct VkClusterAccelerationStructureInputInfoNV {
VkStructureType sType;
void* pNext;
uint32_t maxAccelerationStructureCount;
VkBuildAccelerationStructureFlagsKHR flags;
VkClusterAccelerationStructureOpTypeNV opType;
VkClusterAccelerationStructureOpModeNV opMode;
VkClusterAccelerationStructureOpInputNV opInput;
} VkClusterAccelerationStructureInputInfoNV;
-
maxAccelerationStructureCount is the maximum number of acceleration structures used in this operation.
-
flags is a bitmask of VkBuildAccelerationStructureFlagsKHR specifying flags for the operation.
-
opType is a VkClusterAccelerationStructureOpTypeNV value specifying the type of operation.
-
opMode is a VkClusterAccelerationStructureOpModeNV value specifying the mode of operation.
-
opInput is a VkClusterAccelerationStructureOpInputNV value specifying the upper bounds in the operation.
VkClusterAccelerationStructureOpTypeNV can be one of:
typedef enum VkClusterAccelerationStructureOpTypeNV {
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_MOVE_OBJECTS_NV = 0,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_CLUSTERS_BOTTOM_LEVEL_NV = 1,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_TRIANGLE_CLUSTER_NV = 2,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_TRIANGLE_CLUSTER_TEMPLATE_NV = 3,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_INSTANTIATE_TRIANGLE_CLUSTER_NV = 4,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_MAX_ENUM_NV = 0x7FFFFFFF
} VkClusterAccelerationStructureOpTypeNV;
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_MOVE_OBJECTS_NV means cluster acceleration structures (CLAS, Cluster Templates or Cluster BLAS) will be moved or copied.
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_CLUSTERS_BOTTOM_LEVEL_NV means bottom-level cluster acceleration structures (Cluster BLAS) will be built.
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_TRIANGLE_CLUSTER_NV means cluster acceleration structures (CLAS) will be built.
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_TRIANGLE_CLUSTER_TEMPLATE_NV means cluster templates will be built.
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_INSTANTIATE_TRIANGLE_CLUSTER_NV means cluster templates will be instantiated.
VkClusterAccelerationStructureOpModeNV can be one of:
typedef enum VkClusterAccelerationStructureOpModeNV {
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_IMPLICIT_DESTINATIONS_NV = 0,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_EXPLICIT_DESTINATIONS_NV = 1,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_COMPUTE_SIZES_NV = 2,
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_MAX_ENUM_NV = 0x7FFFFFFF
} VkClusterAccelerationStructureOpModeNV;
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_IMPLICIT_DESTINATIONS_NV indicates that the build or move operation will implicitly distribute the built or moved structures in the user-specified buffer (VkClusterAccelerationStructureCommandsInfoNV::dstImplicitData).
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_EXPLICIT_DESTINATIONS_NV indicates that the build or move operation will write the built or moved acceleration structures to the addresses specified in the user-specified buffer (VkClusterAccelerationStructureCommandsInfoNV::dstAddressesArray).
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_COMPUTE_SIZES_NV indicates that the computed sizes of the cluster acceleration structures will be written to the user-specified buffer (VkClusterAccelerationStructureCommandsInfoNV::dstSizesArray).
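The relationship between the mode and the destination member it names can be summarized in code. This is only a sketch: `OpMode`, `DestinationMember` and `destination_for_mode` are hypothetical local names, and the enum values mirror the ones listed above so that the example compiles without the Vulkan headers.

```c
#include <assert.h>

/* Local mirror of the VkClusterAccelerationStructureOpModeNV values. */
typedef enum OpMode {
    OP_MODE_IMPLICIT_DESTINATIONS = 0,
    OP_MODE_EXPLICIT_DESTINATIONS = 1,
    OP_MODE_COMPUTE_SIZES = 2
} OpMode;

/* Members of VkClusterAccelerationStructureCommandsInfoNV named by each mode. */
typedef enum DestinationMember {
    DEST_IMPLICIT_DATA,    /* dstImplicitData   */
    DEST_ADDRESSES_ARRAY,  /* dstAddressesArray */
    DEST_SIZES_ARRAY       /* dstSizesArray     */
} DestinationMember;

/* Map an operation mode to the destination member its description refers to. */
static DestinationMember destination_for_mode(OpMode mode)
{
    switch (mode) {
    case OP_MODE_IMPLICIT_DESTINATIONS: return DEST_IMPLICIT_DATA;
    case OP_MODE_EXPLICIT_DESTINATIONS: return DEST_ADDRESSES_ARRAY;
    case OP_MODE_COMPUTE_SIZES:         return DEST_SIZES_ARRAY;
    }
    assert(0 && "unknown operation mode");
    return DEST_IMPLICIT_DATA;
}
```

A validation layer in an application could use such a mapping to confirm that the member required by the chosen mode has been filled in before recording the command.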
VkClusterAccelerationStructureOpInputNV can be one of:
typedef union VkClusterAccelerationStructureOpInputNV {
VkClusterAccelerationStructureClustersBottomLevelInputNV* pClustersBottomLevel;
VkClusterAccelerationStructureTriangleClusterInputNV* pTriangleClusters;
VkClusterAccelerationStructureMoveObjectsInputNV* pMoveObjects;
} VkClusterAccelerationStructureOpInputNV;
-
pClustersBottomLevel is a VkClusterAccelerationStructureClustersBottomLevelInputNV structure specifying an upper threshold on the number of cluster level acceleration structures that will be used to build a bottom level acceleration structure:
typedef struct VkClusterAccelerationStructureClustersBottomLevelInputNV {
VkStructureType sType;
void* pNext;
uint32_t maxTotalClusterCount;
uint32_t maxClusterCountPerAccelerationStructure;
} VkClusterAccelerationStructureClustersBottomLevelInputNV;
-
pTriangleClusters is a VkClusterAccelerationStructureTriangleClusterInputNV structure specifying an upper threshold on parameters to build a regular or template cluster acceleration structure, or to instantiate it:
typedef struct VkClusterAccelerationStructureTriangleClusterInputNV {
VkStructureType sType;
void* pNext;
VkFormat vertexFormat;
uint32_t maxGeometryIndexValue;
uint32_t maxClusterUniqueGeometryCount;
uint32_t maxClusterTriangleCount;
uint32_t maxClusterVertexCount;
uint32_t maxTotalTriangleCount;
uint32_t maxTotalVertexCount;
uint32_t minPositionTruncateBitCount;
} VkClusterAccelerationStructureTriangleClusterInputNV;
-
pMoveObjects is a VkClusterAccelerationStructureMoveObjectsInputNV structure specifying an upper threshold on the number of bytes moved and the type of acceleration structure being moved. It also specifies whether source and destination acceleration structures overlap in the move operation:
typedef struct VkClusterAccelerationStructureMoveObjectsInputNV {
VkStructureType sType;
void* pNext;
VkClusterAccelerationStructureTypeNV type;
VkBool32 noMoveOverlap;
VkDeviceSize maxMovedBytes;
} VkClusterAccelerationStructureMoveObjectsInputNV;
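Since these input structures describe upper bounds, an application has to derive them from the clusters it intends to submit. The following sketch shows one way to compute the per-cluster and total bounds for the triangle cluster input. `ClusterCounts`, `TriangleClusterBounds` and `compute_bounds` are hypothetical names, and reading the `maxCluster*` members as per-cluster maxima and the `maxTotal*` members as sums over all clusters is an interpretation based on the member names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cluster description used only by this sketch. */
typedef struct ClusterCounts {
    uint32_t triangleCount;
    uint32_t vertexCount;
} ClusterCounts;

/* Subset of VkClusterAccelerationStructureTriangleClusterInputNV. */
typedef struct TriangleClusterBounds {
    uint32_t maxClusterTriangleCount;
    uint32_t maxClusterVertexCount;
    uint32_t maxTotalTriangleCount;
    uint32_t maxTotalVertexCount;
} TriangleClusterBounds;

/* Derive per-cluster maxima and totals from the clusters to be built. */
static TriangleClusterBounds compute_bounds(const ClusterCounts *clusters,
                                            size_t count)
{
    TriangleClusterBounds b = {0u, 0u, 0u, 0u};
    size_t i;

    assert(clusters != NULL || count == 0u);
    for (i = 0; i < count; ++i) {
        if (clusters[i].triangleCount > b.maxClusterTriangleCount)
            b.maxClusterTriangleCount = clusters[i].triangleCount;
        if (clusters[i].vertexCount > b.maxClusterVertexCount)
            b.maxClusterVertexCount = clusters[i].vertexCount;
        b.maxTotalTriangleCount += clusters[i].triangleCount;
        b.maxTotalVertexCount += clusters[i].vertexCount;
    }
    return b;
}
```

Because the same bounds feed both the size query and the later build, over-estimating them is safe but may make the reported memory requirements larger than necessary.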
4.3.2. Performing build or move operation
To build or move a cluster acceleration structure or a cluster template, or to instantiate a cluster template, call:
VKAPI_ATTR void VKAPI_CALL vkCmdBuildClusterAccelerationStructureIndirectNV(
VkCommandBuffer commandBuffer,
VkClusterAccelerationStructureCommandsInfoNV const* pCommandInfos);
-
pCommandInfos is a pointer to a VkClusterAccelerationStructureCommandsInfoNV structure containing parameters required for building or moving the cluster acceleration structure and is defined as:
typedef struct VkClusterAccelerationStructureCommandsInfoNV {
VkStructureType sType;
void* pNext;
VkClusterAccelerationStructureInputInfoNV input;
VkDeviceAddress dstImplicitData;
VkDeviceAddress scratchData;
VkStridedDeviceAddressRegionKHR dstAddressesArray;
VkStridedDeviceAddressRegionKHR dstSizesArray;
VkStridedDeviceAddressRegionKHR srcInfosArray;
VkDeviceAddress srcInfosCount;
VkClusterAccelerationStructureAddressResolutionFlagsNV addressResolutionFlags;
} VkClusterAccelerationStructureCommandsInfoNV;
-
input is a VkClusterAccelerationStructureInputInfoNV structure describing the build or move parameters for the cluster acceleration structure.
-
dstImplicitData is the device address of the memory where implicitly placed cluster acceleration structures will be saved. It must be provided if input::opMode == VK_CLUSTER_ACCELERATION_STRUCTURE_OP_MODE_IMPLICIT_DESTINATIONS_NV.
-
scratchData is the device address of scratch memory that will be used during cluster acceleration structure move or build.
-
dstAddressesArray is a VkStridedDeviceAddressRegionKHR where the individual addresses and stride of moved or built cluster acceleration structures will be saved or read from, depending on input::opMode.
-
dstSizesArray is NULL or a VkStridedDeviceAddressRegionKHR containing sizes of moved or built cluster acceleration structures.
-
srcInfosArray is a VkStridedDeviceAddressRegionKHR where input data for the build or move operation is read from. This is an input to the implementation and is described in more detail below.
-
srcInfosCount is the device address of memory containing the number of build or move operations to perform.
-
addressResolutionFlags is a bitmask of VkClusterAccelerationStructureAddressResolutionFlagBitsNV values specifying which addresses in VkClusterAccelerationStructureCommandsInfoNV are retrieved from the device through another level of indirection. It can be a combination of:
- VK_CLUSTER_ACCELERATION_STRUCTURE_ADDRESS_RESOLUTION_INDIRECTED_DST_IMPLICIT_DATA_BIT_NV
- VK_CLUSTER_ACCELERATION_STRUCTURE_ADDRESS_RESOLUTION_INDIRECTED_SCRATCH_DATA_BIT_NV
- VK_CLUSTER_ACCELERATION_STRUCTURE_ADDRESS_RESOLUTION_INDIRECTED_DST_ADDRESS_ARRAY_BIT_NV
- VK_CLUSTER_ACCELERATION_STRUCTURE_ADDRESS_RESOLUTION_INDIRECTED_DST_SIZES_ARRAY_BIT_NV
- VK_CLUSTER_ACCELERATION_STRUCTURE_ADDRESS_RESOLUTION_INDIRECTED_SRC_INFOS_ARRAY_BIT_NV
- VK_CLUSTER_ACCELERATION_STRUCTURE_ADDRESS_RESOLUTION_INDIRECTED_SRC_INFOS_COUNT_BIT_NV
Depending on VkClusterAccelerationStructureInputInfoNV::opType, srcInfosArray can contain structures of the following types:
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_MOVE_OBJECTS_NV: VkClusterAccelerationStructureMoveObjectsInfoNV
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_CLUSTERS_BOTTOM_LEVEL_NV: VkClusterAccelerationStructureBuildClustersBottomLevelInfoNV
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_TRIANGLE_CLUSTER_NV: VkClusterAccelerationStructureBuildTriangleClusterInfoNV
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_BUILD_TRIANGLE_CLUSTER_TEMPLATE_NV: VkClusterAccelerationStructureBuildTriangleClusterTemplateInfoNV
-
VK_CLUSTER_ACCELERATION_STRUCTURE_OP_TYPE_INSTANTIATE_TRIANGLE_CLUSTER_NV: VkClusterAccelerationStructureInstantiateClusterInfoNV
If performing a move operation, the source acceleration structure is specified in srcInfosArray with:
typedef struct VkClusterAccelerationStructureMoveObjectsInfoNV {
VkDeviceAddress srcAccelerationStructure;
} VkClusterAccelerationStructureMoveObjectsInfoNV;
Depending on input::opMode, the acceleration structure is moved to the buffer in VkClusterAccelerationStructureCommandsInfoNV::dstImplicitData or to the addresses in VkClusterAccelerationStructureCommandsInfoNV::dstAddressesArray.
If creating a bottom level acceleration structure from clusters, the cluster references that make up the bottom level acceleration structure are specified with the structure below. Refer to the spec for more details on individual parameters:
typedef struct VkClusterAccelerationStructureBuildClustersBottomLevelInfoNV {
uint32_t clusterReferencesCount;
uint32_t clusterReferencesStride;
VkDeviceAddress clusterReferences;
} VkClusterAccelerationStructureBuildClustersBottomLevelInfoNV;
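A minimal sketch of filling this structure follows. It assumes that clusterReferences points to an array of 64-bit CLAS device addresses and that the array is tightly packed; both points should be checked against the spec. `DeviceAddress`, `BuildClustersBottomLevelInfo` and `make_blas_info` are local stand-ins and hypothetical names so that the example compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t DeviceAddress;  /* stand-in for VkDeviceAddress */

/* Same member order as
 * VkClusterAccelerationStructureBuildClustersBottomLevelInfoNV. */
typedef struct BuildClustersBottomLevelInfo {
    uint32_t      clusterReferencesCount;
    uint32_t      clusterReferencesStride;
    DeviceAddress clusterReferences;
} BuildClustersBottomLevelInfo;

/* Describe one Cluster BLAS built from `clusterCount` CLAS references
 * stored contiguously at `referencesAddress` (assumed layout). */
static BuildClustersBottomLevelInfo make_blas_info(DeviceAddress referencesAddress,
                                                   uint32_t clusterCount)
{
    BuildClustersBottomLevelInfo info;

    assert(clusterCount == 0u || referencesAddress != 0u);
    info.clusterReferencesCount = clusterCount;
    info.clusterReferencesStride = (uint32_t)sizeof(DeviceAddress); /* tightly packed */
    info.clusterReferences = referencesAddress;
    return info;
}
```

One such record per Cluster BLAS would be written into the buffer referenced by srcInfosArray, typically by a compute shader or a staging copy.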
If building a triangle cluster, the input data (for example vertex data, index data and opacity micromaps) is specified with the structure below. Refer to the spec for more details on individual parameters:
typedef struct VkClusterAccelerationStructureBuildTriangleClusterInfoNV {
uint32_t clusterID;
VkClusterAccelerationStructureClusterFlagsNV clusterFlags;
uint32_t triangleCount:9;
uint32_t vertexCount:9;
uint32_t positionTruncateBitCount:6;
uint32_t indexType:4;
uint32_t opacityMicromapIndexType:4;
VkClusterAccelerationStructureGeometryIndexAndGeometryFlagsNV baseGeometryIndexAndGeometryFlags;
uint16_t indexBufferStride;
uint16_t vertexBufferStride;
uint16_t geometryIndexAndFlagsBufferStride;
uint16_t opacityMicromapIndexBufferStride;
VkDeviceAddress indexBuffer;
VkDeviceAddress vertexBuffer;
VkDeviceAddress geometryIndexAndFlagsBuffer;
VkDeviceAddress opacityMicromapArray;
VkDeviceAddress opacityMicromapIndexBuffer;
} VkClusterAccelerationStructureBuildTriangleClusterInfoNV;
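The five bit-fields after clusterFlags add up to exactly 32 bits (9 + 9 + 6 + 4 + 4). Because C leaves bit-field layout to the implementation, code that writes these records from a shader or a non-C language may prefer explicit shifts. The sketch below packs the fields into one 32-bit word under the assumption that fields are allocated from the least significant bit upward, as is common on little-endian ABIs; `pack_cluster_word` is a hypothetical helper, and the layout should be verified against the target compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Bit widths taken from the bit-field declarations above. */
enum {
    TRIANGLE_COUNT_BITS = 9,
    VERTEX_COUNT_BITS = 9,
    TRUNCATE_BITS = 6,
    INDEX_TYPE_BITS = 4,
    OMM_INDEX_TYPE_BITS = 4
};

/* Pack the five fields into one word, lowest field first (assumed layout). */
static uint32_t pack_cluster_word(uint32_t triangleCount, uint32_t vertexCount,
                                  uint32_t positionTruncateBitCount,
                                  uint32_t indexType,
                                  uint32_t opacityMicromapIndexType)
{
    uint32_t word;
    unsigned shift = 0u;

    /* Each value must fit in its declared width. */
    assert(triangleCount < (1u << TRIANGLE_COUNT_BITS));
    assert(vertexCount < (1u << VERTEX_COUNT_BITS));
    assert(positionTruncateBitCount < (1u << TRUNCATE_BITS));
    assert(indexType < (1u << INDEX_TYPE_BITS));
    assert(opacityMicromapIndexType < (1u << OMM_INDEX_TYPE_BITS));

    word = triangleCount << shift;            shift += TRIANGLE_COUNT_BITS;
    word |= vertexCount << shift;             shift += VERTEX_COUNT_BITS;
    word |= positionTruncateBitCount << shift; shift += TRUNCATE_BITS;
    word |= indexType << shift;               shift += INDEX_TYPE_BITS;
    word |= opacityMicromapIndexType << shift;
    return word;
}
```

The 9-bit width caps triangleCount and vertexCount at 511 in this encoding; the limits reported in maxTrianglesPerCluster and maxVerticesPerCluster still apply on top of that.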
If building a triangle cluster template, the input data (for example vertex data, index data and opacity micromaps) is specified with the structure below. Refer to the spec for more details on individual parameters:
typedef struct VkClusterAccelerationStructureBuildTriangleClusterTemplateInfoNV {
uint32_t clusterID;
VkClusterAccelerationStructureClusterFlagsNV clusterFlags;
uint32_t triangleCount:9;
uint32_t vertexCount:9;
uint32_t positionTruncateBitCount:6;
uint32_t indexType:4;
uint32_t opacityMicromapIndexType:4;
VkClusterAccelerationStructureGeometryIndexAndGeometryFlagsNV baseGeometryIndexAndGeometryFlags;
uint16_t indexBufferStride;
uint16_t vertexBufferStride;
uint16_t geometryIndexAndFlagsBufferStride;
uint16_t opacityMicromapIndexBufferStride;
VkDeviceAddress indexBuffer;
VkDeviceAddress vertexBuffer;
VkDeviceAddress geometryIndexAndFlagsBuffer;
VkDeviceAddress opacityMicromapArray;
VkDeviceAddress opacityMicromapIndexBuffer;
VkDeviceAddress instantiationBoundingBoxLimit;
} VkClusterAccelerationStructureBuildTriangleClusterTemplateInfoNV;
instantiationBoundingBoxLimit is the address of a bounding box within which all instantiated clusters must lie. The bounding box is specified by six 32-bit floating-point values in the order MinX, MinY, MinZ, MaxX, MaxY, MaxZ.
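To make the expected layout concrete, the sketch below computes such a bounding box on the host from tightly packed xyz positions and writes it in the MinX, MinY, MinZ, MaxX, MaxY, MaxZ order. `compute_bounding_box` is a hypothetical helper; an application would run it over every position the template may later be instantiated with and upload the six floats to device memory.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the axis-aligned bounding box of `count` xyz positions and store
 * it in `out` as MinX, MinY, MinZ, MaxX, MaxY, MaxZ. */
static void compute_bounding_box(const float *positions, size_t count,
                                 float out[6])
{
    size_t i;
    int axis;

    assert(positions != NULL && count > 0u);
    for (axis = 0; axis < 3; ++axis) {
        out[axis] = positions[axis];      /* initial minimum */
        out[axis + 3] = positions[axis];  /* initial maximum */
    }
    for (i = 1; i < count; ++i) {
        for (axis = 0; axis < 3; ++axis) {
            float v = positions[3u * i + (size_t)axis];
            if (v < out[axis])
                out[axis] = v;
            if (v > out[axis + 3])
                out[axis + 3] = v;
        }
    }
}
```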
If instantiating a triangle cluster template, the address of the template along with cluster-specific values are specified with the structure below. Refer to the spec for more details on individual parameters:
typedef struct VkClusterAccelerationStructureInstantiateClusterInfoNV {
uint32_t clusterIdOffset;
uint32_t geometryIndexOffset:24;
uint32_t reserved:8;
VkDeviceAddress clusterTemplateAddress;
VkStridedDeviceAddressNV vertexBuffer;
} VkClusterAccelerationStructureInstantiateClusterInfoNV;
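The uniform offsetting described in the Proposal section can be sketched as follows. It assumes that the offsets are simply added to the values stored in the template, which should be confirmed against the spec; the helper names are hypothetical. The 24-bit width of geometryIndexOffset comes from the bit-field declaration above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GEOMETRY_INDEX_OFFSET_BITS 24u

/* True when `offset` is representable in the 24-bit geometryIndexOffset field. */
static bool geometry_offset_fits(uint32_t offset)
{
    return offset < (1u << GEOMETRY_INDEX_OFFSET_BITS);
}

/* Assumed behavior: the instantiated ClusterID is the template's ClusterID
 * plus clusterIdOffset (unsigned arithmetic wraps modulo 2^32 here). */
static uint32_t instantiated_cluster_id(uint32_t templateClusterId,
                                        uint32_t clusterIdOffset)
{
    return templateClusterId + clusterIdOffset;
}

/* Assumed behavior: each geometry index in the template is shifted by the
 * same geometryIndexOffset. */
static uint32_t instantiated_geometry_index(uint32_t templateGeometryIndex,
                                            uint32_t geometryIndexOffset)
{
    assert(geometry_offset_fits(geometryIndexOffset));
    return templateGeometryIndex + geometryIndexOffset;
}
```

Under this reading, one template can serve many mesh instances: each instantiation supplies its own vertex buffer and a distinct pair of offsets so that hit shaders can tell the resulting CLAS apart.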
5. Issues
1) Why use a separate VkRayTracingPipelineClusterAccelerationStructureCreateInfoNV structure to enable the feature instead of a pipeline bit?
RESOLVED: The extension was originally provisional and we did not want to use a pipeline bit. This should be revisited when the extension is promoted.
2) Do cluster acceleration structures support serialization/deserialization?
RESOLVED: No. The current specification does not support it, but it could be added if there is interest.