Cluster Culling Shading

This shader type has an execution environment similar to that of a compute shader: a collection of shader invocations forms a workgroup and cooperates to perform coarse-level geometry culling and LOD selection. A shader invocation can emit a set of built-in output variables via a new built-in function, DispatchClusterHUAWEI(). The cluster culling shader organizes these emitted variables into a drawing command used by the subsequent rendering pipeline.

Cluster Culling Shader Input

The only inputs available to the cluster culling shader are variables identifying the specific workgroup and invocation.
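
Because the shader receives nothing but these identifiers, each invocation has to derive the cluster it works on from them. A minimal sketch, assuming one cluster per invocation and a one-dimensional dispatch; the function name `cluster_index` and its parameters are hypothetical and not part of the extension:

```python
def cluster_index(workgroup_id: int, local_invocation_index: int,
                  workgroup_size: int) -> int:
    """Flat cluster index for one invocation.

    Assumption: one cluster per invocation and a 1-D dispatch; a real
    shader may map identifiers to clusters differently.
    """
    if not 0 <= local_invocation_index < workgroup_size:
        raise ValueError("local invocation index outside the workgroup")
    return workgroup_id * workgroup_size + local_invocation_index
```

With 32 invocations per workgroup, invocation 5 of workgroup 2 would handle cluster 69 under this mapping.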

Cluster Culling Shader Output

If a cluster survives culling in a cluster culling shader invocation, that invocation should emit a drawing command for the cluster so that it is processed by the subsequent rendering pipeline. There are two types of drawing command: indexed and non-indexed. Both types consist of a set of built-in output variables whose definitions are similar to the members of VkDrawIndexedIndirectCommand and VkDrawIndirectCommand, respectively.

Cluster culling shaders have the following built-in output variables:

  • built-in variable IndexCountHUAWEI is the number of vertices to draw in indexed mode.

  • built-in variable VertexCountHUAWEI is the number of vertices to draw in non-indexed mode.

  • built-in variable InstanceCountHUAWEI is the number of instances to draw.

  • built-in variable FirstIndexHUAWEI is the base index within the index buffer.

  • built-in variable FirstVertexHUAWEI is the index of the first vertex to draw.

  • built-in variable VertexOffsetHUAWEI is the value added to the vertex index before indexing into the vertex buffer.

  • built-in variable FirstInstanceHUAWEI is the instance ID of the first instance to draw.

  • built-in variable ClusterIDHUAWEI is the index of the cluster being rendered by this drawing command. When the cluster culling shader is enabled, ClusterIDHUAWEI replaces gl_DrawID as the value passed to the vertex shader.

  • built-in variable ClusterShadingRateHUAWEI is the shading rate of the cluster being rendered by this drawing command.
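
The two output modes can be modelled as plain records whose fields parallel the members of VkDrawIndexedIndirectCommand and VkDrawIndirectCommand. The sketch below is a host-side illustration only, not shader code: the class names, snake_case field names, and helper functions are hypothetical, and the shading-rate output is left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexedClusterCommand:
    """Indexed mode; fields parallel VkDrawIndexedIndirectCommand."""
    index_count: int     # IndexCountHUAWEI
    instance_count: int  # InstanceCountHUAWEI
    first_index: int     # FirstIndexHUAWEI
    vertex_offset: int   # VertexOffsetHUAWEI
    first_instance: int  # FirstInstanceHUAWEI
    cluster_id: int      # ClusterIDHUAWEI


@dataclass
class NonIndexedClusterCommand:
    """Non-indexed mode; fields parallel VkDrawIndirectCommand."""
    vertex_count: int    # VertexCountHUAWEI
    instance_count: int  # InstanceCountHUAWEI
    first_vertex: int    # FirstVertexHUAWEI
    first_instance: int  # FirstInstanceHUAWEI
    cluster_id: int      # ClusterIDHUAWEI


def indexed_vertex_ids(cmd: IndexedClusterCommand,
                       index_buffer: List[int]) -> List[int]:
    """Vertex buffer positions read by an indexed command, in fetch order."""
    window = index_buffer[cmd.first_index:cmd.first_index + cmd.index_count]
    # vertex_offset is added to each index before the vertex buffer lookup.
    return [index + cmd.vertex_offset for index in window]


def non_indexed_vertex_ids(cmd: NonIndexedClusterCommand) -> List[int]:
    """Vertex indices read by a non-indexed command, in fetch order."""
    return list(range(cmd.first_vertex, cmd.first_vertex + cmd.vertex_count))
```

The helpers show how the offset fields interact: in indexed mode the first index selects a window of the index buffer and the vertex offset shifts every fetched index, while in non-indexed mode the vertices are simply a consecutive range.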

Cluster Culling Shader Cluster Ordering

  • When a cluster culling shader is used, all output clusters generated by DispatchClusterHUAWEI() in a given workgroup are passed to the subsequent pipeline stage before any cluster generated by a subsequent workgroup.

  • Within a workgroup, the order of output clusters generated by DispatchClusterHUAWEI() is specified by the local invocation ID, from lower to higher values.

  • A cluster culling invocation in the workgroup that does not call DispatchClusterHUAWEI() sends no cluster to the subsequent rendering pipeline.

  • Any cluster culling shader invocation may also call DispatchClusterHUAWEI() multiple times, as shown below:

// Cluster Culling Shader sample code:
        ......
    DispatchClusterHUAWEI();  // dispatch 0
        ......
    DispatchClusterHUAWEI();  // dispatch 1
        ......
    DispatchClusterHUAWEI();  // dispatch 2
        ......

In this case, the output sequence of clusters in a workgroup is as shown below (assuming 32 shader invocations in a workgroup):

1. shader invocation0.dispatch0
2. shader invocation1.dispatch0
            ..........
32. shader invocation31.dispatch0
33. shader invocation0.dispatch1
34. shader invocation1.dispatch1
            ..........
64. shader invocation31.dispatch1
65. shader invocation0.dispatch2
66. shader invocation1.dispatch2
            ..........
96. shader invocation31.dispatch2
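
The sequence above is dispatch-major: every invocation's first dispatch is emitted in local invocation order before any second dispatch, and so on. A small sketch that reproduces the numbering, assuming every invocation issues the same number of dispatches; the function names are illustrative only:

```python
from typing import List, Tuple


def cluster_output_order(workgroup_size: int,
                         dispatches_per_invocation: int) -> List[Tuple[int, int]]:
    """Return (invocation, dispatch) pairs in the order clusters are output.

    Assumption: every invocation calls the dispatch function the same
    number of times.
    """
    order = []
    for dispatch in range(dispatches_per_invocation):
        # Within one dispatch round, lower local invocation IDs come first.
        for invocation in range(workgroup_size):
            order.append((invocation, dispatch))
    return order


def output_position(invocation: int, dispatch: int, workgroup_size: int) -> int:
    """1-based position of a cluster in the listing above."""
    return dispatch * workgroup_size + invocation + 1
```

For example, `output_position(31, 2, 32)` evaluates to 96, matching the last entry of the listing.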

Cluster Culling Shader Primitive Ordering

The following limited guarantees are provided for the relative ordering of primitives produced by a cluster culling shader, as they pertain to primitive order:

  • The order of primitives in a given cluster is determined by how the cluster was emitted:

    • For DispatchClusterHUAWEI() with indexed output built-in variables, primitives are ordered by their vertices as sourced from lower to higher index buffer addresses.

    • For DispatchClusterHUAWEI() with non-indexed output built-in variables, primitives are ordered from vertices with a lower-numbered vertexIndex to a higher-numbered vertexIndex.
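
To make the two rules concrete, the sketch below enumerates the primitives of one cluster in order. It assumes a triangle-list topology, which the text above does not state, and the function names are hypothetical:

```python
from typing import List, Tuple

Triangle = Tuple[int, int, int]


def triangles_indexed(index_buffer: List[int], first_index: int,
                      index_count: int, vertex_offset: int) -> List[Triangle]:
    """Triangles of an indexed cluster, from lower to higher index buffer
    positions. Assumes triangle-list topology; leftover indices are ignored."""
    triangles = []
    for base in range(first_index, first_index + index_count - 2, 3):
        triangles.append((index_buffer[base] + vertex_offset,
                          index_buffer[base + 1] + vertex_offset,
                          index_buffer[base + 2] + vertex_offset))
    return triangles


def triangles_non_indexed(first_vertex: int, vertex_count: int) -> List[Triangle]:
    """Triangles of a non-indexed cluster, from lower to higher vertexIndex.
    Assumes triangle-list topology; leftover vertices are ignored."""
    return [(v, v + 1, v + 2)
            for v in range(first_vertex, first_vertex + vertex_count - 2, 3)]
```

In both cases the first primitive is built from the earliest vertices the command reads, so primitive order within a cluster follows directly from the command's own index or vertex range.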