Execution Graphs
Execution graphs provide a way for applications to dispatch multiple operations dynamically from a single initial command on the host. To achieve this, an execution graph pipeline links together multiple shaders or pipelines, each of which describes one or more operations that can be dispatched within the execution graph. Each linked pipeline or shader describes an execution node within the graph, which can be dispatched dynamically from another shader within the same graph. This allows applications to describe much richer execution topologies, at a finer granularity, than would typically be possible with API commands alone.
Pipeline Creation
To create execution graph pipelines, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkCreateExecutionGraphPipelinesAMDX(
    VkDevice device,
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkExecutionGraphPipelineCreateInfoAMDX* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines);
- device is the logical device that creates the execution graph pipelines.
- pipelineCache is either VK_NULL_HANDLE, indicating that pipeline caching is disabled; or the handle of a valid pipeline cache object, in which case use of that cache is enabled for the duration of the command.
- createInfoCount is the length of the pCreateInfos and pPipelines arrays.
- pCreateInfos is a pointer to an array of VkExecutionGraphPipelineCreateInfoAMDX structures.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
- pPipelines is a pointer to an array of VkPipeline handles in which the resulting execution graph pipeline objects are returned.
Pipelines are created and returned as described for Multiple Pipeline Creation.
The VkExecutionGraphPipelineCreateInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkExecutionGraphPipelineCreateInfoAMDX {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCreateFlags flags;
    uint32_t stageCount;
    const VkPipelineShaderStageCreateInfo* pStages;
    const VkPipelineLibraryCreateInfoKHR* pLibraryInfo;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
} VkExecutionGraphPipelineCreateInfoAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkPipelineCreateFlagBits specifying how the pipeline will be generated.
- stageCount is the number of entries in the pStages array.
- pStages is a pointer to an array of stageCount VkPipelineShaderStageCreateInfo structures describing the set of the shader stages to be included in the execution graph pipeline.
- pLibraryInfo is a pointer to a VkPipelineLibraryCreateInfoKHR structure defining pipeline libraries to include.
- layout is the description of binding locations used by both the pipeline and descriptor sets used with the pipeline.
- basePipelineHandle is a pipeline to derive from.
- basePipelineIndex is an index into the pCreateInfos parameter to use as a pipeline to derive from.
The parameters basePipelineHandle and basePipelineIndex are described in more detail in Pipeline Derivatives.
Each shader stage provided when creating an execution graph pipeline
(including those in libraries) is associated with a name and an index,
determined by the inclusion or omission of a
VkPipelineShaderStageNodeCreateInfoAMDX structure in its pNext
chain.
For any graphics pipeline libraries, only the name and index of the vertex
or mesh shader stage is linked directly to the graph as a node - other
shader stages in the pipeline will be executed after those shader stages as
normal.
Task shaders cannot be included in a graphics pipeline used for a draw node.
In addition to the shader name and index, an internal "node index" is also generated for each node, which can be queried with vkGetExecutionGraphPipelineNodeIndexAMDX, and is used exclusively for initial dispatch of an execution graph.
VK_SHADER_INDEX_UNUSED_AMDX
is a special shader index used to indicate
that the created node does not override the index.
In this case, the shader index is determined through other means.
It is defined as:
#define VK_SHADER_INDEX_UNUSED_AMDX (~0U)
The VkPipelineShaderStageNodeCreateInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkPipelineShaderStageNodeCreateInfoAMDX {
    VkStructureType sType;
    const void* pNext;
    const char* pName;
    uint32_t index;
} VkPipelineShaderStageNodeCreateInfoAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pName is the shader name to use when creating a node in an execution graph. If pName is NULL, the name of the entry point specified in SPIR-V is used as the shader name.
- index is the shader index to use when creating a node in an execution graph. If index is VK_SHADER_INDEX_UNUSED_AMDX then the original index is used, either as specified by the ShaderIndexAMDX execution mode, or 0 if that too is not specified.
When included in the pNext chain of a VkPipelineShaderStageCreateInfo structure, this structure specifies the shader name and shader index of a node when creating an execution graph pipeline.
If this structure is omitted, the shader name is set to the name of the entry point in SPIR-V and the shader index is set to 0.
When dispatching a node from another shader, the name is fixed at pipeline creation, but the index can be set dynamically. By associating multiple shaders with the same name but different indexes, applications can dynamically select different nodes to execute. Applications must ensure each node has a unique name and index.
Shaders with the same name must be of the same type - e.g. a compute shader and a graphics shader, or even two compute shaders where one is coalescing and the other is not, cannot share the same name.
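The naming rules above can be sketched as a small resolution function. This is only an illustration under stated assumptions: NodeCreateInfo, ShaderDecl, NodeIdentity and both functions are hypothetical stand-ins that are not part of the API, and the "structure omitted" case follows the text as written (entry point name, index 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the value of VK_SHADER_INDEX_UNUSED_AMDX given in the text. */
#define SHADER_INDEX_UNUSED (~0U)

/* Hypothetical stand-in for VkPipelineShaderStageNodeCreateInfoAMDX,
 * reduced to the two fields that affect naming. */
typedef struct NodeCreateInfo {
    const char *pName;  /* NULL: use the SPIR-V entry point name */
    uint32_t    index;  /* SHADER_INDEX_UNUSED: do not override the index */
} NodeCreateInfo;

/* Hypothetical summary of what the shader module itself declares. */
typedef struct ShaderDecl {
    const char *entryPoint;          /* entry point name in SPIR-V */
    int         hasShaderIndexMode;  /* non-zero if ShaderIndexAMDX is declared */
    uint32_t    shaderIndexMode;     /* operand of that execution mode */
} ShaderDecl;

typedef struct NodeIdentity {
    const char *name;
    uint32_t    index;
} NodeIdentity;

/* Resolve the (name, index) pair of one stage. Pass info == NULL when the
 * structure is absent from the stage's pNext chain. */
NodeIdentity resolve_node_identity(const NodeCreateInfo *info,
                                   const ShaderDecl *decl)
{
    NodeIdentity id;
    assert(decl != NULL && decl->entryPoint != NULL);
    if (info == NULL) {
        /* Structure omitted: entry point name and index 0. */
        id.name = decl->entryPoint;
        id.index = 0;
        return id;
    }
    id.name = (info->pName != NULL) ? info->pName : decl->entryPoint;
    if (info->index != SHADER_INDEX_UNUSED) {
        id.index = info->index;
    } else {
        /* No override: fall back to ShaderIndexAMDX, then to 0. */
        id.index = decl->hasShaderIndexMode ? decl->shaderIndexMode : 0;
    }
    return id;
}

/* Two nodes collide when both name and index match; applications are
 * required to avoid this. */
int node_identity_equal(NodeIdentity a, NodeIdentity b)
{
    return a.index == b.index && strcmp(a.name, b.name) == 0;
}
```

An application-side validation pass could call node_identity_equal on every pair of resolved stages before pipeline creation to catch duplicate (name, index) pairs early.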
To query the internal node index for a particular node in an execution graph, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkGetExecutionGraphPipelineNodeIndexAMDX(
    VkDevice device,
    VkPipeline executionGraph,
    const VkPipelineShaderStageNodeCreateInfoAMDX* pNodeInfo,
    uint32_t* pNodeIndex);
- device is the logical device that executionGraph was created on.
- executionGraph is the execution graph pipeline to query the internal node index for.
- pNodeInfo is a pointer to a VkPipelineShaderStageNodeCreateInfoAMDX structure identifying the name and index of the node to query.
- pNodeIndex is the returned internal node index of the identified node.
Once this function returns, the contents of pNodeIndex contain the internal node index of the identified node.
Initializing Scratch Memory
Implementations may need scratch memory to manage dispatch queues or similar when executing an execution graph pipeline; this memory is explicitly managed by the application.
To query the scratch space required to dispatch an execution graph, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkGetExecutionGraphPipelineScratchSizeAMDX(
    VkDevice device,
    VkPipeline executionGraph,
    VkExecutionGraphPipelineScratchSizeAMDX* pSizeInfo);
- device is the logical device that executionGraph was created on.
- executionGraph is the execution graph pipeline to query the scratch space for.
- pSizeInfo is a pointer to a VkExecutionGraphPipelineScratchSizeAMDX structure that will contain the required scratch size.
After this function returns, information about the scratch space required will be returned in pSizeInfo.
The VkExecutionGraphPipelineScratchSizeAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkExecutionGraphPipelineScratchSizeAMDX {
    VkStructureType sType;
    void* pNext;
    VkDeviceSize minSize;
    VkDeviceSize maxSize;
    VkDeviceSize sizeGranularity;
} VkExecutionGraphPipelineScratchSizeAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- minSize indicates the minimum scratch space required for dispatching the queried execution graph.
- maxSize indicates the maximum scratch space that can be used for dispatching the queried execution graph.
- sizeGranularity indicates the granularity at which the scratch space can be increased from minSize.
Applications can use any amount of scratch memory of at least minSize for dispatching a graph; however, only values equal to minSize plus an integer multiple of sizeGranularity will be used.
Greater values may result in higher performance, up to maxSize, which indicates the most memory that an implementation can use effectively.
To initialize scratch memory for a particular execution graph, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdInitializeGraphScratchMemoryAMDX(
    VkCommandBuffer commandBuffer,
    VkPipeline executionGraph,
    VkDeviceAddress scratch,
    VkDeviceSize scratchSize);
- commandBuffer is the command buffer into which the command will be recorded.
- executionGraph is the execution graph pipeline to initialize the scratch memory for.
- scratch is the address of scratch memory to be initialized.
- scratchSize is a range in bytes of scratch memory to be initialized.
This command must be called before using scratch to dispatch the bound execution graph pipeline.
Execution of this command may modify any memory locations in the range [scratch, scratch + scratchSize).
Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
If any portion of scratch is modified by any command other than vkCmdDispatchGraphAMDX, vkCmdDispatchGraphIndirectAMDX, vkCmdDispatchGraphIndirectCountAMDX, or vkCmdInitializeGraphScratchMemoryAMDX with the same execution graph, it must be reinitialized for the execution graph again before dispatching against it.
Dispatching a Graph
Initial dispatch of an execution graph is done from the host in the same way as any other command, and can be used in a similar way to compute dispatch commands, with indirect variants available.
To record an execution graph dispatch, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    VkDeviceSize scratchSize,
    const VkDispatchGraphCountInfoAMDX* pCountInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is the address of scratch memory to be used.
- scratchSize is a range in bytes of scratch memory to be used.
- pCountInfo is a host pointer to a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in pCountInfo are executed.
Nodes executed as part of this command are not implicitly synchronized in any way against each other once they are dispatched.
There are no rasterization order guarantees between separately dispatched graphics nodes, though individual primitives within a single dispatch do adhere to rasterization order.
Draw calls executed before or after the execution graph also execute relative to each graphics node with respect to rasterization order.
For this command, all device/host pointers in substructures are treated as host pointers and read only during host execution of this command. Once this command returns, no reference to the original pointers is retained.
Execution of this command may modify any memory locations in the range [scratch, scratch + scratchSize).
Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
This command captures command buffer state for mesh nodes similarly to draw commands.
To record an execution graph dispatch with node and payload parameters read on device, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphIndirectAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    VkDeviceSize scratchSize,
    const VkDispatchGraphCountInfoAMDX* pCountInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is the address of scratch memory to be used.
- scratchSize is a range in bytes of scratch memory to be used.
- pCountInfo is a host pointer to a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in pCountInfo are executed.
Nodes executed as part of this command are not implicitly synchronized in any way against each other once they are dispatched.
There are no rasterization order guarantees between separately dispatched graphics nodes, though individual primitives within a single dispatch do adhere to rasterization order.
Draw calls executed before or after the execution graph also execute relative to each graphics node with respect to rasterization order.
For this command, all device/host pointers in substructures are treated as device pointers and read during device execution of this command.
The allocation and contents of these pointers only need to be valid during device execution.
All of these addresses will be read in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.
Execution of this command may modify any memory locations in the range [scratch, scratch + scratchSize).
Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
This command captures command buffer state for mesh nodes similarly to draw commands.
To record an execution graph dispatch with all parameters read on device, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphIndirectCountAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    VkDeviceSize scratchSize,
    VkDeviceAddress countInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is the address of scratch memory to be used.
- scratchSize is a range in bytes of scratch memory to be used.
- countInfo is a device address of a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in countInfo are executed.
Nodes executed as part of this command are not implicitly synchronized in any way against each other once they are dispatched.
For this command, all pointers in substructures are treated as device pointers and read during device execution of this command.
The allocation and contents of these pointers only need to be valid during device execution.
All of these addresses will be read in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.
Execution of this command may modify any memory locations in the range [scratch, scratch + scratchSize).
Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
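The three dispatch commands differ mainly in where each pointer is read from. The lookup below is a compact summary of the descriptions above, not an API: DispatchCommand, AddressSpace, PointerRules and pointer_rules are all hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical tags for the three dispatch commands described above. */
typedef enum DispatchCommand {
    CMD_DISPATCH_GRAPH,                /* vkCmdDispatchGraphAMDX */
    CMD_DISPATCH_GRAPH_INDIRECT,       /* vkCmdDispatchGraphIndirectAMDX */
    CMD_DISPATCH_GRAPH_INDIRECT_COUNT  /* vkCmdDispatchGraphIndirectCountAMDX */
} DispatchCommand;

typedef enum AddressSpace { ADDRESS_HOST, ADDRESS_DEVICE } AddressSpace;

typedef struct PointerRules {
    AddressSpace countInfo;      /* where VkDispatchGraphCountInfoAMDX lives */
    AddressSpace substructures;  /* how infos and payloads are interpreted */
} PointerRules;

/* Returns which address space each pointer is read from for a command. */
PointerRules pointer_rules(DispatchCommand cmd)
{
    PointerRules rules = { ADDRESS_HOST, ADDRESS_HOST };
    switch (cmd) {
    case CMD_DISPATCH_GRAPH:
        /* Everything is read on the host while the command is recorded. */
        break;
    case CMD_DISPATCH_GRAPH_INDIRECT:
        /* Count info is a host pointer; infos and payloads are read on device. */
        rules.substructures = ADDRESS_DEVICE;
        break;
    case CMD_DISPATCH_GRAPH_INDIRECT_COUNT:
        /* Count info is itself a device address; everything is read on device. */
        rules.countInfo = ADDRESS_DEVICE;
        rules.substructures = ADDRESS_DEVICE;
        break;
    default:
        assert(0 && "unknown dispatch command");
    }
    return rules;
}
```

One practical consequence: only for the host-read case can the application free or reuse the pointed-to memory as soon as the recording call returns; in the device-read cases it has to stay valid until device execution.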
The VkDeviceOrHostAddressConstAMDX union is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef union VkDeviceOrHostAddressConstAMDX {
    VkDeviceAddress deviceAddress;
    const void* hostAddress;
} VkDeviceOrHostAddressConstAMDX;
- deviceAddress is a buffer device address as returned by the vkGetBufferDeviceAddressKHR command.
- hostAddress is a const host memory address.
The VkDispatchGraphCountInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkDispatchGraphCountInfoAMDX {
    uint32_t count;
    VkDeviceOrHostAddressConstAMDX infos;
    uint64_t stride;
} VkDispatchGraphCountInfoAMDX;
- count is the number of dispatches to perform.
- infos is the device or host address of a flat array of VkDispatchGraphInfoAMDX structures.
- stride is the byte stride between successive VkDispatchGraphInfoAMDX structures in infos.
Whether infos is consumed as a device or host pointer is defined by the command this structure is used in.
The VkDispatchGraphInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkDispatchGraphInfoAMDX {
    uint32_t nodeIndex;
    uint32_t payloadCount;
    VkDeviceOrHostAddressConstAMDX payloads;
    uint64_t payloadStride;
} VkDispatchGraphInfoAMDX;
- nodeIndex is the index of a node in an execution graph to be dispatched.
- payloadCount is the number of payloads to dispatch for the specified node.
- payloads is the device or host address of a flat array of payloads, with a total size equal to the product of payloadCount and payloadStride.
- payloadStride is the byte stride between successive payloads in payloads.
Whether payloads is consumed as a device or host pointer is defined by the command this structure is used in.
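Both infos and payloads are strided arrays, so element i sits at the base address plus i times the stride. The following host-only sketch shows that addressing. GraphInfo and GraphCountInfo are hypothetical mirrors that keep only the host-pointer variant of the union, the helper functions are illustrative names, and the code assumes each stride keeps elements suitably aligned for their type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side mirror of VkDispatchGraphInfoAMDX. */
typedef struct GraphInfo {
    uint32_t    nodeIndex;
    uint32_t    payloadCount;
    const void *payloads;       /* host-pointer variant of the union */
    uint64_t    payloadStride;
} GraphInfo;

/* Hypothetical host-side mirror of VkDispatchGraphCountInfoAMDX. */
typedef struct GraphCountInfo {
    uint32_t    count;
    const void *infos;          /* host-pointer variant of the union */
    uint64_t    stride;
} GraphCountInfo;

/* Address of the i-th dispatch info: base + i * stride. */
const GraphInfo *graph_info_at(const GraphCountInfo *ci, uint32_t i)
{
    assert(ci != NULL && i < ci->count);
    const unsigned char *base = (const unsigned char *)ci->infos;
    return (const GraphInfo *)(base + (size_t)i * (size_t)ci->stride);
}

/* Address of the j-th payload of one info: base + j * payloadStride. */
const void *payload_at(const GraphInfo *info, uint32_t j)
{
    assert(info != NULL && j < info->payloadCount);
    const unsigned char *base = (const unsigned char *)info->payloads;
    return base + (size_t)j * (size_t)info->payloadStride;
}

/* Total number of payloads across every dispatch info. */
uint64_t total_payload_count(const GraphCountInfo *ci)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < ci->count; ++i) {
        total += graph_info_at(ci, i)->payloadCount;
    }
    return total;
}
```

Keeping the stride separate from the element size lets payloads be embedded in larger application records without first copying them into a packed array.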
Shader Enqueue
Compute shaders in an execution graph can use the OpInitializeNodePayloadsAMDX instruction to initialize nodes for dispatch.
Any node payload initialized in this way will be enqueued for dispatch once the shader is done writing to the payload.
As compilers may be conservative when making this determination, shaders can further call OpFinalizeNodePayloadsAMDX to guarantee that the payload is no longer being written.
The Node Name operand of the PayloadNodeNameAMDX decoration on a payload identifies the shader name of the node to be enqueued, and the Shader Index operand of OpInitializeNodePayloadsAMDX identifies the shader index.
A node identified in this way is dispatched as described in the following sections.
Compute Nodes
Compute shaders added as nodes to an execution graph are executed differently based on the presence or absence of the StaticNumWorkgroupsAMDX or CoalescingAMDX execution modes.
Dispatching a compute shader node that does not declare either the StaticNumWorkgroupsAMDX or CoalescingAMDX execution mode will execute a number of workgroups in each dimension specified by the first 12 bytes of the payload, interpreted as a VkDispatchIndirectCommand.
The same payload will be broadcast to each workgroup in the same dispatch.
Additional values in the payload have no effect on execution.
Dispatching a compute shader node with the StaticNumWorkgroupsAMDX execution mode will execute workgroups in each dimension according to the x, y, and z size operands to the StaticNumWorkgroupsAMDX execution mode.
The same payload will be broadcast to each workgroup in the same dispatch.
Any values in the payload have no effect on execution.
Dispatching a compute shader node with the CoalescingAMDX execution mode will enqueue a single invocation for execution.
Implementations may combine multiple such dispatches into the same workgroup, up to the size of the workgroup.
The number of invocations coalesced into a given workgroup in this way can be queried via the CoalescedInputCountAMDX built-in.
Any values in the payload have no effect on execution.
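To make the payload-driven and coalescing cases concrete, here is a small host-side sketch. GroupCount mirrors the three uint32_t members of VkDispatchIndirectCommand, and both function names are hypothetical. The coalescing bound is an inference from the text rather than a stated guarantee: since an implementation may pack up to one workgroup's worth of invocations together, N enqueued invocations need at least ceil(N / workgroupSize) workgroups and at most N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as VkDispatchIndirectCommand: three 32-bit counts. */
typedef struct GroupCount {
    uint32_t x;
    uint32_t y;
    uint32_t z;
} GroupCount;

/* Node without StaticNumWorkgroupsAMDX or CoalescingAMDX: the first
 * 12 bytes of the payload give the workgroup count per dimension. */
GroupCount groups_from_payload(const void *payload, size_t payloadSize)
{
    GroupCount groups;
    assert(sizeof groups == 12);
    assert(payload != NULL && payloadSize >= sizeof groups);
    /* memcpy avoids alignment and aliasing problems on raw payload bytes. */
    memcpy(&groups, payload, sizeof groups);
    return groups;
}

/* CoalescingAMDX node: fewest workgroups that could hold `invocations`
 * single-invocation dispatches if the implementation packs them fully. */
uint32_t min_coalesced_workgroups(uint32_t invocations, uint32_t workgroupSize)
{
    assert(workgroupSize > 0);
    return invocations / workgroupSize +
           (invocations % workgroupSize != 0 ? 1u : 0u);
}
```

A practical consequence for payload layout is that application data for a payload-driven node has to start after the three counts, because those leading bytes are always interpreted as the dispatch size.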
Mesh Nodes
Graphics pipelines added as nodes to an execution graph are executed in a manner similar to a vkCmdDrawMeshTasksIndirectEXT command, using the same payloads as compute shaders, but capturing some state from the command buffer.
When an execution graph dispatch is recorded into a command buffer, it captures the following dynamic state for use with draw nodes:
- VK_DYNAMIC_STATE_VIEWPORT
- VK_DYNAMIC_STATE_SCISSOR
- VK_DYNAMIC_STATE_LINE_WIDTH
- VK_DYNAMIC_STATE_DEPTH_BIAS
- VK_DYNAMIC_STATE_BLEND_CONSTANTS
- VK_DYNAMIC_STATE_DEPTH_BOUNDS
- VK_DYNAMIC_STATE_VIEWPORT_WITH_COUNT
- VK_DYNAMIC_STATE_SCISSOR_WITH_COUNT
- VK_DYNAMIC_STATE_FRAGMENT_SHADING_RATE_KHR
Other state is not captured, and graphics pipelines must not be created with other dynamic states when used as a library in an execution graph pipeline.