Dispatching Commands

The dispatching commands described in this chapter provoke work in a compute pipeline. Dispatching commands are recorded into a command buffer and when executed by a queue, will produce work which executes according to the bound compute pipeline. A compute pipeline must be bound to a command buffer before any dispatching commands are recorded in that command buffer.

To record a dispatch, call:

// Provided by VK_VERSION_1_0
void vkCmdDispatch(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    groupCountX,
    uint32_t                                    groupCountY,
    uint32_t                                    groupCountZ);
  • commandBuffer is the command buffer into which the command will be recorded.

  • groupCountX is the number of local workgroups to dispatch in the X dimension.

  • groupCountY is the number of local workgroups to dispatch in the Y dimension.

  • groupCountZ is the number of local workgroups to dispatch in the Z dimension.

When the command is executed, a global workgroup consisting of groupCountX × groupCountY × groupCountZ local workgroups is assembled.

Valid Usage
  • VUID-vkCmdDispatch-commandBuffer-02712
    If commandBuffer is a protected command buffer and protectedNoFault is not supported, any resource written to by the VkPipeline object bound to the pipeline bind point used by this command must not be an unprotected resource

  • VUID-vkCmdDispatch-commandBuffer-02713
    If commandBuffer is a protected command buffer and protectedNoFault is not supported, pipeline stages other than the framebuffer-space and compute stages in the VkPipeline object bound to the pipeline bind point used by this command must not write to any resource

  • VUID-vkCmdDispatch-commandBuffer-04617
    If any of the shader stages of the VkPipeline bound to the pipeline bind point used by this command uses the RayQueryKHR capability, then commandBuffer must not be a protected command buffer

  • VUID-vkCmdDispatch-groupCountX-00386
    groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0]

  • VUID-vkCmdDispatch-groupCountY-00387
    groupCountY must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[1]

  • VUID-vkCmdDispatch-groupCountZ-00388
    groupCountZ must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[2]

Valid Usage (Implicit)
  • VUID-vkCmdDispatch-commandBuffer-parameter
    commandBuffer must be a valid VkCommandBuffer handle

  • VUID-vkCmdDispatch-commandBuffer-recording
    commandBuffer must be in the recording state

  • VUID-vkCmdDispatch-commandBuffer-cmdpool
    The VkCommandPool that commandBuffer was allocated from must support VK_QUEUE_COMPUTE_BIT operations

  • VUID-vkCmdDispatch-suspended
    This command must not be called between suspended render pass instances

  • VUID-vkCmdDispatch-videocoding
    This command must only be called outside of a video coding scope

Host Synchronization
  • Host access to commandBuffer must be externally synchronized

  • Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties
Command Buffer Levels Render Pass Scope Video Coding Scope Supported Queue Types Command Type

Primary
Secondary

Both

Outside

VK_QUEUE_COMPUTE_BIT

Action

Conditional Rendering

vkCmdDispatch is affected by conditional rendering

To record an indirect dispatching command, call:

// Provided by VK_VERSION_1_0
void vkCmdDispatchIndirect(
    VkCommandBuffer                             commandBuffer,
    VkBuffer                                    buffer,
    VkDeviceSize                                offset);
  • commandBuffer is the command buffer into which the command will be recorded.

  • buffer is the buffer containing dispatch parameters.

  • offset is the byte offset into buffer where parameters begin.

vkCmdDispatchIndirect behaves similarly to vkCmdDispatch except that the parameters are read by the device from a buffer during execution. The parameters of the dispatch are encoded in a VkDispatchIndirectCommand structure taken from buffer starting at offset.

Valid Usage
  • VUID-vkCmdDispatchIndirect-buffer-02708
    If buffer is non-sparse then it must be bound completely and contiguously to a single VkDeviceMemory object

  • VUID-vkCmdDispatchIndirect-buffer-02709
    buffer must have been created with the VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT usage flag set

  • VUID-vkCmdDispatchIndirect-offset-02710
    offset must be a multiple of 4

  • VUID-vkCmdDispatchIndirect-commandBuffer-02711
    commandBuffer must not be a protected command buffer

  • VUID-vkCmdDispatchIndirect-offset-00407
    The sum of offset and the size of VkDispatchIndirectCommand must be less than or equal to the size of buffer

Valid Usage (Implicit)
  • VUID-vkCmdDispatchIndirect-commandBuffer-parameter
    commandBuffer must be a valid VkCommandBuffer handle

  • VUID-vkCmdDispatchIndirect-buffer-parameter
    buffer must be a valid VkBuffer handle

  • VUID-vkCmdDispatchIndirect-commandBuffer-recording
    commandBuffer must be in the recording state

  • VUID-vkCmdDispatchIndirect-commandBuffer-cmdpool
    The VkCommandPool that commandBuffer was allocated from must support VK_QUEUE_COMPUTE_BIT operations

  • VUID-vkCmdDispatchIndirect-suspended
    This command must not be called between suspended render pass instances

  • VUID-vkCmdDispatchIndirect-videocoding
    This command must only be called outside of a video coding scope

  • VUID-vkCmdDispatchIndirect-commonparent
    Both of buffer, and commandBuffer must have been created, allocated, or retrieved from the same VkDevice

Host Synchronization
  • Host access to commandBuffer must be externally synchronized

  • Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties
Command Buffer Levels Render Pass Scope Video Coding Scope Supported Queue Types Command Type

Primary
Secondary

Both

Outside

VK_QUEUE_COMPUTE_BIT

Action

Conditional Rendering

vkCmdDispatchIndirect is affected by conditional rendering

The VkDispatchIndirectCommand structure is defined as:

// Provided by VK_VERSION_1_0
typedef struct VkDispatchIndirectCommand {
    uint32_t    x;
    uint32_t    y;
    uint32_t    z;
} VkDispatchIndirectCommand;
  • x is the number of local workgroups to dispatch in the X dimension.

  • y is the number of local workgroups to dispatch in the Y dimension.

  • z is the number of local workgroups to dispatch in the Z dimension.

The members of VkDispatchIndirectCommand have the same meaning as the corresponding parameters of vkCmdDispatch.

Valid Usage
  • VUID-VkDispatchIndirectCommand-x-00417
    x must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0]

  • VUID-VkDispatchIndirectCommand-y-00418
    y must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[1]

  • VUID-VkDispatchIndirectCommand-z-00419
    z must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[2]

To record a dispatch using non-zero base values for the components of WorkgroupId, call:

// Provided by VK_VERSION_1_1
void vkCmdDispatchBase(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    baseGroupX,
    uint32_t                                    baseGroupY,
    uint32_t                                    baseGroupZ,
    uint32_t                                    groupCountX,
    uint32_t                                    groupCountY,
    uint32_t                                    groupCountZ);
// Provided by VK_KHR_device_group
// Equivalent to vkCmdDispatchBase
void vkCmdDispatchBaseKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    baseGroupX,
    uint32_t                                    baseGroupY,
    uint32_t                                    baseGroupZ,
    uint32_t                                    groupCountX,
    uint32_t                                    groupCountY,
    uint32_t                                    groupCountZ);
  • commandBuffer is the command buffer into which the command will be recorded.

  • baseGroupX is the start value for the X component of WorkgroupId.

  • baseGroupY is the start value for the Y component of WorkgroupId.

  • baseGroupZ is the start value for the Z component of WorkgroupId.

  • groupCountX is the number of local workgroups to dispatch in the X dimension.

  • groupCountY is the number of local workgroups to dispatch in the Y dimension.

  • groupCountZ is the number of local workgroups to dispatch in the Z dimension.

When the command is executed, a global workgroup consisting of groupCountX × groupCountY × groupCountZ local workgroups is assembled, with WorkgroupId values ranging from [baseGroup*, baseGroup* + groupCount*) in each component. vkCmdDispatch is equivalent to vkCmdDispatchBase(0,0,0,groupCountX,groupCountY,groupCountZ).

Valid Usage
  • VUID-vkCmdDispatchBase-commandBuffer-02712
    If commandBuffer is a protected command buffer and protectedNoFault is not supported, any resource written to by the VkPipeline object bound to the pipeline bind point used by this command must not be an unprotected resource

  • VUID-vkCmdDispatchBase-commandBuffer-02713
    If commandBuffer is a protected command buffer and protectedNoFault is not supported, pipeline stages other than the framebuffer-space and compute stages in the VkPipeline object bound to the pipeline bind point used by this command must not write to any resource

  • VUID-vkCmdDispatchBase-commandBuffer-04617
    If any of the shader stages of the VkPipeline bound to the pipeline bind point used by this command uses the RayQueryKHR capability, then commandBuffer must not be a protected command buffer

  • VUID-vkCmdDispatchBase-baseGroupX-00421
    baseGroupX must be less than VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0]

  • VUID-vkCmdDispatchBase-baseGroupX-00422
    baseGroupY must be less than VkPhysicalDeviceLimits::maxComputeWorkGroupCount[1]

  • VUID-vkCmdDispatchBase-baseGroupZ-00423
    baseGroupZ must be less than VkPhysicalDeviceLimits::maxComputeWorkGroupCount[2]

  • VUID-vkCmdDispatchBase-groupCountX-00424
    groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0] minus baseGroupX

  • VUID-vkCmdDispatchBase-groupCountY-00425
    groupCountY must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[1] minus baseGroupY

  • VUID-vkCmdDispatchBase-groupCountZ-00426
    groupCountZ must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[2] minus baseGroupZ

  • VUID-vkCmdDispatchBase-baseGroupX-00427
    If any of baseGroupX, baseGroupY, or baseGroupZ are not zero, then the bound compute pipeline must have been created with the VK_PIPELINE_CREATE_DISPATCH_BASE_BIT flag or the bound compute shader object must have been created with the VK_SHADER_CREATE_DISPATCH_BASE_BIT_EXT flag

Valid Usage (Implicit)
  • VUID-vkCmdDispatchBase-commandBuffer-parameter
    commandBuffer must be a valid VkCommandBuffer handle

  • VUID-vkCmdDispatchBase-commandBuffer-recording
    commandBuffer must be in the recording state

  • VUID-vkCmdDispatchBase-commandBuffer-cmdpool
    The VkCommandPool that commandBuffer was allocated from must support VK_QUEUE_COMPUTE_BIT operations

  • VUID-vkCmdDispatchBase-suspended
    This command must not be called between suspended render pass instances

  • VUID-vkCmdDispatchBase-videocoding
    This command must only be called outside of a video coding scope

Host Synchronization
  • Host access to commandBuffer must be externally synchronized

  • Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties
Command Buffer Levels Render Pass Scope Video Coding Scope Supported Queue Types Command Type

Primary
Secondary

Both

Outside

VK_QUEUE_COMPUTE_BIT

Action

Conditional Rendering

vkCmdDispatchBase is affected by conditional rendering

To record an area-based dispatch, call:

// Provided by VK_QCOM_tile_shading
void vkCmdDispatchTileQCOM(
    VkCommandBuffer                             commandBuffer,
    const VkDispatchTileInfoQCOM*               pDispatchTileInfo);
  • commandBuffer is the command buffer into which the command will be recorded.

  • pDispatchTileInfo is a pointer to a VkDispatchTileInfoQCOM structure containing information about the area-based dispatch.

This command operates in the per-tile execution model, invoking a separate dispatch for each covered tile. The global workgroup count and local workgroup size of each dispatch are defined by the implementation to efficiently iterate over a uniform grid of pixel blocks within the area of its active tile.

Each shader invocation operates on a single pixel block and its size is determined by the shader’s tiling rate, which must be defined by shaders executed by this command. The TileShadingRateQCOM execution mode operand defines the shader’s tiling rate. Its x and y must be a power of two and less than or equal to the maxTileShadingRate limit. Its z must be less than or equal to the active tile’s depth as reported by VK_QCOM_tile_properties, and VkTilePropertiesQCOM.tileSize.z % TileShadingRateQCOM::z must equal 0.

The start location of the shader invocation’s pixel block is vec3(TileOffsetQCOM, 0) + (GlobalInvocationId * TileShadingRateQCOM)

Shader invocations can perform tile attachment load/store operations at any location within the active tile, but the most efficient access may be limited to fragment locations within and local to the shader invocation’s pixel block.

Valid Usage
  • VUID-vkCmdDispatchTileQCOM-commandBuffer-02712
    If commandBuffer is a protected command buffer and protectedNoFault is not supported, any resource written to by the VkPipeline object bound to the pipeline bind point used by this command must not be an unprotected resource

  • VUID-vkCmdDispatchTileQCOM-commandBuffer-02713
    If commandBuffer is a protected command buffer and protectedNoFault is not supported, pipeline stages other than the framebuffer-space and compute stages in the VkPipeline object bound to the pipeline bind point used by this command must not write to any resource

  • VUID-vkCmdDispatchTileQCOM-commandBuffer-04617
    If any of the shader stages of the VkPipeline bound to the pipeline bind point used by this command uses the RayQueryKHR capability, then commandBuffer must not be a protected command buffer

  • VUID-vkCmdDispatchTileQCOM-None-10668
    When this command is recorded per-tile execution model must be enabled

  • VUID-vkCmdDispatchTileQCOM-None-10669
    The tileShadingDispatchTile must enabled

Valid Usage (Implicit)
  • VUID-vkCmdDispatchTileQCOM-commandBuffer-parameter
    commandBuffer must be a valid VkCommandBuffer handle

  • VUID-vkCmdDispatchTileQCOM-pDispatchTileInfo-parameter
    pDispatchTileInfo must be a valid pointer to a valid VkDispatchTileInfoQCOM structure

  • VUID-vkCmdDispatchTileQCOM-commandBuffer-recording
    commandBuffer must be in the recording state

  • VUID-vkCmdDispatchTileQCOM-commandBuffer-cmdpool
    The VkCommandPool that commandBuffer was allocated from must support VK_QUEUE_COMPUTE_BIT operations

  • VUID-vkCmdDispatchTileQCOM-renderpass
    This command must only be called inside of a render pass instance

  • VUID-vkCmdDispatchTileQCOM-suspended
    This command must not be called between suspended render pass instances

  • VUID-vkCmdDispatchTileQCOM-videocoding
    This command must only be called outside of a video coding scope

Host Synchronization
  • Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties
Command Buffer Levels Render Pass Scope Video Coding Scope Supported Queue Types Command Type

Primary
Secondary

Inside

Outside

VK_QUEUE_COMPUTE_BIT

Action

Conditional Rendering

vkCmdDispatchTileQCOM is affected by conditional rendering

The VkDispatchTileInfoQCOM structure is defined as:

// Provided by VK_QCOM_tile_shading
typedef struct VkDispatchTileInfoQCOM {
    VkStructureType    sType;
    const void*        pNext;
} VkDispatchTileInfoQCOM;
  • sType is a VkStructureType value identifying this structure.

  • pNext is NULL or a pointer to a structure extending this structure.

Valid Usage (Implicit)

A subpass shading dispatches a compute pipeline work with the work dimension of render area of the calling subpass and work groups are partitioned by specified work group size. Subpass operations like subpassLoad are allowed to be used.

To record a subpass shading, call:

// Provided by VK_HUAWEI_subpass_shading
void vkCmdSubpassShadingHUAWEI(
    VkCommandBuffer                             commandBuffer);
  • commandBuffer is the command buffer into which the command will be recorded.

When the command is executed, a global workgroup consisting of ceil (render area size / local workgroup size) local workgroups is assembled.

Valid Usage
Valid Usage (Implicit)
  • VUID-vkCmdSubpassShadingHUAWEI-commandBuffer-parameter
    commandBuffer must be a valid VkCommandBuffer handle

  • VUID-vkCmdSubpassShadingHUAWEI-commandBuffer-recording
    commandBuffer must be in the recording state

  • VUID-vkCmdSubpassShadingHUAWEI-commandBuffer-cmdpool
    The VkCommandPool that commandBuffer was allocated from must support VK_QUEUE_GRAPHICS_BIT operations

  • VUID-vkCmdSubpassShadingHUAWEI-renderpass
    This command must only be called inside of a render pass instance

  • VUID-vkCmdSubpassShadingHUAWEI-suspended
    This command must not be called between suspended render pass instances

  • VUID-vkCmdSubpassShadingHUAWEI-videocoding
    This command must only be called outside of a video coding scope

Host Synchronization
  • Host access to commandBuffer must be externally synchronized

  • Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties
Command Buffer Levels Render Pass Scope Video Coding Scope Supported Queue Types Command Type

Primary
Secondary

Inside

Outside

VK_QUEUE_GRAPHICS_BIT

Action

Conditional Rendering

vkCmdSubpassShadingHUAWEI is not affected by conditional rendering

Dispatch Command for CUDA PTX Kernels

Compute kernels can be provided in SPIR-V or PTX code. When using PTX kernels the dispatch mechanism is different than with regular compute pipelines.

The way to create a PTX assembly file is beyond the scope of this documentation. For mode information, please refer to the CUDA toolkit documentation at https://docs.nvidia.com/cuda/.

Prior to using this command, you must initialize a CUDA module, and create a function handle that will serve as the entry point of the kernel to dispatch. See CUDA Modules.

The dispatching of a CUDA kernel is recorded into a command buffer, and when executed by a queue submit, will produce work which executes according to the bound compute pipeline.

To record a CUDA kernel launch, call:

// Provided by VK_NV_cuda_kernel_launch
void vkCmdCudaLaunchKernelNV(
    VkCommandBuffer                             commandBuffer,
    const VkCudaLaunchInfoNV*                   pLaunchInfo);
  • commandBuffer is the command buffer into which the command will be recorded.

  • pLaunchInfo is a pointer to a VkCudaLaunchInfoNV structure in which the grid (similar to workgroup) dimension, function handle and related arguments are defined.

When the command is executed, a global workgroup consisting of gridDimX × gridDimY × gridDimZ local workgroups is assembled.

Valid Usage (Implicit)
  • VUID-vkCmdCudaLaunchKernelNV-commandBuffer-parameter
    commandBuffer must be a valid VkCommandBuffer handle

  • VUID-vkCmdCudaLaunchKernelNV-pLaunchInfo-parameter
    pLaunchInfo must be a valid pointer to a valid VkCudaLaunchInfoNV structure

  • VUID-vkCmdCudaLaunchKernelNV-commandBuffer-recording
    commandBuffer must be in the recording state

  • VUID-vkCmdCudaLaunchKernelNV-commandBuffer-cmdpool
    The VkCommandPool that commandBuffer was allocated from must support VK_QUEUE_COMPUTE_BIT, or VK_QUEUE_GRAPHICS_BIT operations

  • VUID-vkCmdCudaLaunchKernelNV-suspended
    This command must not be called between suspended render pass instances

  • VUID-vkCmdCudaLaunchKernelNV-videocoding
    This command must only be called outside of a video coding scope

Host Synchronization
  • Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties
Command Buffer Levels Render Pass Scope Video Coding Scope Supported Queue Types Command Type

Primary
Secondary

Both

Outside

VK_QUEUE_COMPUTE_BIT
VK_QUEUE_GRAPHICS_BIT

Action

Conditional Rendering

vkCmdCudaLaunchKernelNV is not affected by conditional rendering

Passing Dispatch Parameters and Arguments

The VkCudaLaunchInfoNV structure is very close to the parameters of the CUDA-Driver function cuLaunchKernel documented in section 6.19 Execution Control of CUDA Driver API.

The structure is defined as:

// Provided by VK_NV_cuda_kernel_launch
typedef struct VkCudaLaunchInfoNV {
    VkStructureType        sType;
    const void*            pNext;
    VkCudaFunctionNV       function;
    uint32_t               gridDimX;
    uint32_t               gridDimY;
    uint32_t               gridDimZ;
    uint32_t               blockDimX;
    uint32_t               blockDimY;
    uint32_t               blockDimZ;
    uint32_t               sharedMemBytes;
    size_t                 paramCount;
    const void* const *    pParams;
    size_t                 extraCount;
    const void* const *    pExtras;
} VkCudaLaunchInfoNV;
  • sType is a VkStructureType value identifying this structure.

  • pNext is NULL or a pointer to a structure extending this structure.

  • function is the CUDA-Driver handle to the function being launched.

  • gridDimX is the number of local workgroups to dispatch in the X dimension. It must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0]

  • gridDimY is the number of local workgroups to dispatch in the Y dimension. It must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[1]

  • gridDimZ is the number of local workgroups to dispatch in the Z dimension. It must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[2]

  • blockDimX is block size in the X dimension.

  • blockDimY is block size in the Y dimension.

  • blockDimZ is block size in the Z dimension.

  • sharedMemBytes is the dynamic shared-memory size per thread block in bytes.

  • paramCount is the length of the pParams table.

  • pParams is a pointer to an array of paramCount pointers, corresponding to the arguments of function.

  • extraCount is reserved for future use.

  • pExtras is reserved for future use.

Kernel parameters of function are specified via pParams, very much the same way as described in cuLaunchKernel

If function has N parameters, then pParams must be an array of N pointers and paramCount must be N. Each of kernelParams[0] through kernelParams[N-1] must point to a region of memory from which the actual kernel parameter will be copied. The number of kernel parameters and their offsets and sizes are not specified here as that information is stored in the VkCudaFunctionNV object.

The application-owned memory pointed to by pParams and kernelParams[0] through kernelParams[N-1] are consumed immediately, and may be altered or freed after vkCmdCudaLaunchKernelNV has returned.

Valid Usage
  • VUID-VkCudaLaunchInfoNV-gridDimX-09406
    gridDimX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0]

  • VUID-VkCudaLaunchInfoNV-gridDimY-09407
    gridDimY must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[1]

  • VUID-VkCudaLaunchInfoNV-gridDimZ-09408
    gridDimZ must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[2]

  • VUID-VkCudaLaunchInfoNV-paramCount-09409
    paramCount must be the total amount of parameters listed in the pParams table

  • VUID-VkCudaLaunchInfoNV-pParams-09410
    pParams must be a pointer to a table of paramCount parameters, corresponding to the arguments of function

  • VUID-VkCudaLaunchInfoNV-extraCount-09411
    extraCount must be 0

  • VUID-VkCudaLaunchInfoNV-pExtras-09412
    pExtras must be NULL

Valid Usage (Implicit)

Resource Sharing from Vulkan to the CUDA Kernel

Given that one key limitation of this extension is that Vulkan cannot access, nor bind any global resource of CUDA modules, the only way to exchange data with the kernel must be to pass resources via the arguments of the function.

You can use VK_KHR_buffer_device_address to write/read to/from a VkBuffer object. VK_KHR_buffer_device_address allows you to get the device address of the buffer to pass it as an argument into pParams. Application-side pointer arithmetic on the device address is legal, but will not be bounds-checked on the device.

The corresponding argument of the CUDA function should be declared as a pointer of the same type as the referenced buffer. CUDA code may simply read or write to this buffer in the typical C way.

You may also use VK_NVX_image_view_handle as another convenient way to read/write from/to a VkImage.

The corresponding argument of the CUDA function must be typed as cudaSurfaceObject_t.

  • You may read from it by using CUDA surface-read functions such as surf3Dread, surf2Dread, and surf1Dread

  • You may write to it by using CUDA surface-write functions such as surf3Dwrite, surf2Dwrite, and surf1Dwrite

Please refer to CUDA surface object documentation for more details

On Vulkan side, here is an example on how to setup VkImageViewHandleInfoNVX to query the handle for cudaSurfaceObject_t:

VkImageViewHandleInfoNVX imageViewHandleInfo = {VK_STRUCTURE_TYPE_IMAGE_VIEW_HANDLE_INFO_NVX};
imageViewHandleInfo.sampler = VK_NULL_HANDLE;
imageViewHandleInfo.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
imageViewHandleInfo.imageView = imageViewIn; // the VkImageView we want to access
uint32_t myViewHandleIn = vkGetImageViewHandleNVX(m_device, &imageViewHandleInfo);
imageViewHandleInfo.imageView = imageViewOut; // the VkImageView we want to access
uint32_t myViewHandleOut = vkGetImageViewHandleNVX(m_device, &imageViewHandleInfo);

Here is an example of how to declare parameters for pParams

VkCudaLaunchInfoNV launchInfo = { VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV };

int block_size = 8;
float dt = 1.0f / 60.0f;

const void* params[] =
{
  &dt,
  &uint32_t myViewHandleIn,
  &uint32_t myViewHandleOut
};

launchInfo.function = cudaFunction; // CUDA function previously created
launchInfo.gridDimX = (volumeTexDimensionNonBoundary / block_size);
launchInfo.gridDimY = (volumeTexDimensionNonBoundary / block_size);
launchInfo.gridDimZ = (volumeTexDimensionNonBoundary / block_size);
launchInfo.blockDimX = block_size;
launchInfo.blockDimY = block_size;
launchInfo.blockDimZ = block_size;
launchInfo.sharedMemBytes = 0;
launchInfo.paramCount = 3;
launchInfo.pParams = &params[0];
launchInfo.extraCount = 0;
launchInfo.pExtras = nullptr;

vkCmdCudaLaunchKernelNV(commandBuffer, &launchInfo);

In the CUDA kernel source code, here is an example on how arguments match pParams and how we can use Surface object:

extern "C"  __global__ void cudaFunction(
  float dt,
  cudaSurfaceObject_t volumeTexIn,
  cudaSurfaceObject_t volumeTexOut
  )
{
  int i = 1 + blockIdx.x * blockDim.x + threadIdx.x;
  int j = 1 + blockIdx.y * blockDim.y + threadIdx.y;
  int k = 1 + blockIdx.z * blockDim.z + threadIdx.z;

  float val;
  surf3Dread(&val, volumeTexIn, i * sizeof(float), j, k);
  ...
  float result = ...;
  // write result
  surf3Dwrite(result, volumeTexOut, i * sizeof(float), j, k);
}

Compute Occupancy Priority

The VK_NV_compute_occupancy_priority extension provides applications with control over how their compute workloads utilize GPU compute resources, specifically allowing prioritization relative to other simultaneously executing workloads. Applications can specify the priority with which compute workloads should occupy GPU compute resources, allowing for a fine-grained distinction between workloads that may want to execute at a background priority over a long period of time versus workloads with harder latency requirements.

To set the compute occupancy priority for subsequent compute dispatches, call:

// Provided by VK_NV_compute_occupancy_priority
void vkCmdSetComputeOccupancyPriorityNV(
    VkCommandBuffer                             commandBuffer,
    const VkComputeOccupancyPriorityParametersNV* pParameters);
  • commandBuffer is the command buffer into which the command will be recorded.

  • pParameters is a pointer to a VkComputeOccupancyPriorityParametersNV structure specifying the occupancy priority parameters.

The occupancy priority affects how compute workloads utilize GPU compute resources relative to other simultaneously executing workloads. The priority is stateful on a command buffer. All compute dispatch commands issued subsequent to a vkCmdSetComputeOccupancyPriorityNV call will be executed with the specified priority parameters until another vkCmdSetComputeOccupancyPriorityNV call is made.

All command buffers (primary and secondary) start with a priority level equal to the VK_COMPUTE_OCCUPANCY_PRIORITY_NORMAL_NV value. The priority state is not inherited by secondary command buffers - each command buffer maintains its own independent priority state.

Valid Usage (Implicit)
  • VUID-vkCmdSetComputeOccupancyPriorityNV-commandBuffer-parameter
    commandBuffer must be a valid VkCommandBuffer handle

  • VUID-vkCmdSetComputeOccupancyPriorityNV-pParameters-parameter
    pParameters must be a valid pointer to a valid VkComputeOccupancyPriorityParametersNV structure

  • VUID-vkCmdSetComputeOccupancyPriorityNV-commandBuffer-recording
    commandBuffer must be in the recording state

  • VUID-vkCmdSetComputeOccupancyPriorityNV-commandBuffer-cmdpool
    The VkCommandPool that commandBuffer was allocated from must support VK_QUEUE_COMPUTE_BIT operations

  • VUID-vkCmdSetComputeOccupancyPriorityNV-videocoding
    This command must only be called outside of a video coding scope

Host Synchronization
  • Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties
Command Buffer Levels Render Pass Scope Video Coding Scope Supported Queue Types Command Type

Primary
Secondary

Both

Outside

VK_QUEUE_COMPUTE_BIT

State

Conditional Rendering

vkCmdSetComputeOccupancyPriorityNV is not affected by conditional rendering

The VkComputeOccupancyPriorityParametersNV structure is defined as:

// Provided by VK_NV_compute_occupancy_priority
typedef struct VkComputeOccupancyPriorityParametersNV {
    VkStructureType    sType;
    const void*        pNext;
    float              occupancyPriority;
    float              occupancyThrottling;
} VkComputeOccupancyPriorityParametersNV;
  • sType is a VkStructureType value identifying this structure.

  • pNext is NULL or a pointer to a structure extending this structure.

  • occupancyPriority is a value specifying the occupancy priority for subsequent compute workloads, with a valid range of [0.0, 1.0]. A value of 0.0 represents the lowest priority, while a value of 1.0 is the maximum priority. Default priority is specified by a value of 0.5.

  • occupancyThrottling is a value specifying the level of occupancy throttling applied to subsequent workloads, with a valid range of [0.0, 1.0]. A value of 0.0 (the default) means no throttling is applied, allowing workloads to use the full available compute capacity. Non-zero values represent increasing levels of throttling, with higher values resulting in more restrictive occupancy limits. A value of 1.0 represents the maximum level of throttling supported by the implementation.

Valid Usage
  • VUID-VkComputeOccupancyPriorityParametersNV-occupancyPriority-11919
    occupancyPriority must be between 0 and 1, inclusive

  • VUID-VkComputeOccupancyPriorityParametersNV-occupancyThrottling-11920
    occupancyThrottling must be between 0 and 1, inclusive

Valid Usage (Implicit)

VK_COMPUTE_OCCUPANCY_PRIORITY_LOW_NV is a constant value that can be used for VkComputeOccupancyPriorityParametersNV::occupancyPriority to specify a low priority level.

#define VK_COMPUTE_OCCUPANCY_PRIORITY_LOW_NV 0.25f

VK_COMPUTE_OCCUPANCY_PRIORITY_NORMAL_NV is a constant value that can be used for VkComputeOccupancyPriorityParametersNV::occupancyPriority to specify a normal priority level. This represents the default priority level.

#define VK_COMPUTE_OCCUPANCY_PRIORITY_NORMAL_NV 0.50f

VK_COMPUTE_OCCUPANCY_PRIORITY_HIGH_NV is a constant value that can be used for VkComputeOccupancyPriorityParametersNV::occupancyPriority to specify a high priority level.

#define VK_COMPUTE_OCCUPANCY_PRIORITY_HIGH_NV 0.75f