GLSL_NV_mesh_shader

The original text file describing this extension as a set of diffs to the OpenGL Shading Language Specification follows.

Name

    NV_mesh_shader

Name String

    GL_NV_mesh_shader

Contact

    Christoph Kubisch, NVIDIA (ckubisch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)

Contributors

    Yury Uralsky, NVIDIA
    Daniel Koch, NVIDIA
    Sahil Parmar, NVIDIA

Status

    Shipping

Version

    Last Modified Date:     March 6, 2019
    NVIDIA Revision:        7

Dependencies

    This extension can be applied to OpenGL GLSL versions 4.50
    (#version 450) and higher.

    This extension can be applied to OpenGL ES ESSL versions 3.20
    (#version 320) and higher.

    This extension is written against the GLSL 4.50.6 Specification
    (Compatibility Profile), dated April 14, 2016.

    This extension interacts with GLSL 4.60 and KHR_vulkan_glsl.

    This extension interacts with NV_viewport_array2.

    This extension interacts with NV_stereo_view_rendering.

    This extension interacts with NVX_multiview_per_view_attributes.

    This extension interacts with ARB_shader_draw_parameters.

    This extension interacts with EXT_clip_cull_distance.

Overview

    This extension provides a new mechanism allowing applications to use two
    new programmable shader types -- the task and mesh shader -- to generate
    collections of geometric primitives to be processed by fixed-function
    primitive assembly and rasterization logic.  When the task and mesh
    shaders are dispatched, they replace the standard programmable vertex
    processing pipeline, including vertex array attribute fetching, vertex
    shader processing, tessellation, and the geometry shader processing.

    Both new shader types have execution environments similar to that of
    compute shaders, where a collection of shader invocations forms a work
    group and cooperate to produce a set of outputs.  Unlike traditional
    vertex, tessellation, and geometry shaders that typically process a vertex
    or primitive at a time, the mesh and task shaders process and generate a
    batch of primitives at once.  The optional task shader pre-processes
    geometry and generates a variable number of mesh shader tasks.  The mesh
    shader evaluates the geometry corresponding to its task and emits a mesh
    -- a collection of vertices arranged into point, line, or triangle
    primitives.  The primitives emitted by the mesh shader are then processed
    by fixed-function primitive assembly and rasterization logic and generate
    fragments that will be processed by the fragment shader.

    Work is submitted to the mesh pipeline by launching it from the API,
    which spawns a one-dimensional array of tasks, much as the API dispatch
    for compute spawns a three-dimensional array of compute shader work
    groups.  If a task shader is present, each task generated by this launch
    spawns a task shader work group.  If no task shader is present, each task
    generated by the launch spawns a mesh shader work group.

    When a task shader work group is executed, its invocations execute in
    parallel and evaluate geometry associated with the task.  The task shader
    has no built-in or user-defined input variables other than the built-ins
    identifying the work group and invocation being executed.  The task shader
    can use that information to read properties of the geometry associated
    with the task from memory, using shader storage buffers, textures, or
    other resources.  The task shader determines the number of mesh shader
    tasks that should be spawned for the task it is processing and writes the
    task count to the built-in variable gl_TaskCountNV.  Additionally, the
    task shader can compute and write additional properties of the geometry it
    processes to user-defined output variables qualified with "taskNV" to
    task memory, which can be read as inputs by all of the mesh shaders that
    it spawns.  The task shader can be used to drive level-of-detail
    calculations for procedurally generated geometry, to perform coarse-level
    culling for batches of static or dynamic geometry, and for other forms of
    work reduction or amplification.
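
    As a rough illustration of the coarse-level culling case, a task shader
    might look like the sketch below.  The buffer and block names
    (MeshletBounds, Camera, Task), their layouts and bindings, and the helper
    sphereVisible() are hypothetical and not defined by this extension.

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 32) in;

      // Hypothetical application data: one bounding sphere per meshlet.
      layout(std430, binding = 0) readonly buffer MeshletBounds {
        uint meshletCount;
        vec4 boundingSphere[];      // xyz = center, w = radius
      };

      // Hypothetical view frustum; plane normals are assumed to point inward.
      layout(std140, binding = 0) uniform Camera {
        vec4 frustumPlane[6];
      };

      // Task memory, readable by every mesh shader work group spawned here.
      taskNV out Task {
        uint meshletIndex[32];
      } OUT;

      shared uint visibleCount;

      bool sphereVisible(vec4 s)
      {
        for (int i = 0; i < 6; ++i) {
          if (dot(frustumPlane[i].xyz, s.xyz) + frustumPlane[i].w < -s.w)
            return false;
        }
        return true;
      }

      void main()
      {
        if (gl_LocalInvocationID.x == 0)
          visibleCount = 0u;
        barrier();

        // Each invocation tests one meshlet and compacts the survivors.
        uint meshlet = gl_GlobalInvocationID.x;
        if (meshlet < meshletCount && sphereVisible(boundingSphere[meshlet])) {
          uint slot = atomicAdd(visibleCount, 1u);
          OUT.meshletIndex[slot] = meshlet;
        }
        barrier();

        // One mesh shader work group is spawned per surviving meshlet.
        if (gl_LocalInvocationID.x == 0)
          gl_TaskCountNV = visibleCount;
      }

    Compacting the surviving indices into task memory keeps the number of
    spawned mesh shader work groups equal to the number of visible meshlets,
    which is one way to realize the work reduction described above.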

    When a mesh shader work group is executed, its invocations execute in
    parallel to evaluate geometry corresponding to its task and emit a mesh
    for further processing by subsequent pipeline stages.  As with task
    shaders, mesh shaders have no built-in inputs other than those identifying
    the work group and invocation being executed, and must fetch their inputs
    explicitly from memory.  The mesh shader invocations collectively must
    produce a mesh, which consists of:

    * a primitive count, written to the built-in output gl_PrimitiveCountNV;

    * a collection of vertex attributes, where each vertex in the mesh has a
      set of built-in and user-defined per-vertex output variables and blocks;

    * a collection of primitive attributes, where each of the
      gl_PrimitiveCountNV primitives in the mesh has a set of built-in and
      user-defined per-primitive output variables and blocks; and

    * an array of vertex index values written to the built-in output array
      gl_PrimitiveIndicesNV, where each output primitive has a set of one,
      two, or three indices that identify the output vertices in the mesh used
      to form the primitive.

    The number of primitives and vertices emitted by the mesh shader can be
    variable, but the mesh shader must specify maximum vertex and primitive
    counts.  There are implementation-dependent limits on the number of
    vertices and primitives emitted by the mesh shader, and there are also
    implementation-dependent limits on the total amount of memory consumed by
    a mesh.  In the initial implementation of this extension, implementation
    limits are sufficiently low that complex geometry will need to be
    decomposed into multiple tasks.
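
    The four parts of a mesh listed above can be seen in a minimal sketch
    that emits a single triangle.  The user-defined output vertexTint is
    hypothetical, and the built-in output block instance gl_MeshVerticesNV
    is taken from the built-in variable definitions of this extension rather
    than from the text above.

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 1) in;
      layout(triangles, max_vertices = 3, max_primitives = 1) out;

      out vec3 vertexTint[];        // per-vertex, sized by max_vertices

      void main()
      {
        // Per-vertex attributes.
        gl_MeshVerticesNV[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
        gl_MeshVerticesNV[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
        gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);
        vertexTint[0] = vec3(1.0, 0.0, 0.0);
        vertexTint[1] = vec3(0.0, 1.0, 0.0);
        vertexTint[2] = vec3(0.0, 0.0, 1.0);

        // Three indices per triangle, identifying vertices of this mesh.
        gl_PrimitiveIndicesNV[0] = 0u;
        gl_PrimitiveIndicesNV[1] = 1u;
        gl_PrimitiveIndicesNV[2] = 2u;

        // Primitive count actually emitted (at most max_primitives).
        gl_PrimitiveCountNV = 1u;
      }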

    A typical mesh shader used to render static triangle data might operate in
    three phases.  The first phase fetches vertex position data and local
    index data of the primitives that the mesh represents.  The index data
    would have been prepared offline to leverage vertex re-use within the
    mesh.  In the second phase, triangles would be culled and output primitive
    indices written.  Finally, other vertex attributes of the surviving subset
    of vertices would be loaded and computed.  During this process, the
    invocations would sometimes work on a per-vertex and sometimes on a
    per-primitive level.

    Additionally, mesh shaders include infrastructure to allow a single mesh
    shader work group to compute a mesh with multiple "views" (e.g., left and
    right eye views for stereoscopic rendering), using a "view index" similar
    to the view IDs used in the OVR_multiview (OpenGL and OpenGL ES) and
    VK_KHR_multiview (Vulkan) extensions.  Unlike those extensions, the
    programming model here does not run separate shader invocations for each
    view but instead allows shaders to designate individual outputs as
    "per-view".  When a mesh shader completes, its primitives will be
    processed separately for each view with fragments directed at separate
    layers of the framebuffer.  For each view, outputs designated as per-view
    (such as position) will take on values written for that view and all other
    outputs will take on a single shared value written for all views.

      Conventional   From Application
         Vertex             |
        Pipeline            v
                    Launch Mesh Tasks
        (Fig 3.1)           |
            |           +---+-----+
            |           |         |
            |           |         |
            |           |    Task Shader ---+
            |           |         |         |
            |           |         v         |
            |           |  Task Generation  |     Image Load/Store
            |           |         |         |     Atomic Counter
            |           +---+-----+         |<--> Shader Storage
            |               |               |     Texture Fetch
            |               v               |     Uniform Block
            |         Mesh Shader ----------+
            |               |               |
            +-------------> +               |
                            |               |
                            v               |
                       Rasterization        |
                            |               |
                            v               |
                      Fragment Shader ------+
                            |
                            v
                  Per-Fragment Operations
                            |
                            v
                      Framebuffer

                Mesh Processing Pipeline


    Mapping to SPIR-V
    -----------------

    For informational purposes (non-normative), the following is an
    expected way for an implementation to map GLSL constructs to SPIR-V
    constructs:

      task shader -> TaskNV Execution model
      mesh shader -> MeshNV Execution model

      shared qualifier -> Workgroup Storage Class (existing)

      points layout qualifier -> OutputPoints Execution Mode (existing)
      lines layout qualifier -> OutputLinesNV Execution Mode
      triangles layout qualifier -> OutputTrianglesNV Execution Mode
      max_vertices layout qualifier -> OutputVertices Execution Mode (existing)
      max_primitives layout qualifier -> OutputPrimitivesNV Execution Mode
      local_size_(xyz) layout qualifiers -> LocalSize Execution Mode (existing)
      local_size_(xyz)_id layout qualifiers -> LocalSizeId Execution Mode (existing)

      perprimitiveNV auxiliary storage qualifier -> PerPrimitiveNV Decoration
      perviewNV auxiliary storage qualifier -> PerViewNV Decoration
      taskNV auxiliary storage qualifier -> PerTaskNV Decoration

      gl_WorkGroupSize -> WorkgroupSize decorated OpVariable (existing)
      gl_WorkGroupID -> WorkgroupId decorated OpVariable (existing)
      gl_LocalInvocationID -> LocalInvocationId decorated OpVariable (existing)
      gl_GlobalInvocationID -> GlobalInvocationId decorated OpVariable (existing)
      gl_LocalInvocationIndex -> LocalInvocationIndex decorated OpVariable (existing)
      gl_TaskCountNV -> TaskCountNV decorated OpVariable
      gl_PrimitiveCountNV -> PrimitiveCountNV decorated OpVariable
      gl_PrimitiveIndicesNV -> PrimitiveIndicesNV decorated OpVariable
      gl_Position -> Position decorated OpVariable (existing)
      gl_PositionPerViewNV -> PositionPerViewNV decorated OpVariable (existing extension)
      gl_PointSize -> PointSize decorated OpVariable (existing)
      gl_ClipDistance -> ClipDistance decorated OpVariable (existing)
      gl_ClipDistancePerViewNV -> ClipDistancePerViewNV decorated OpVariable
      gl_CullDistance -> CullDistance decorated OpVariable (existing)
      gl_CullDistancePerViewNV -> CullDistancePerViewNV decorated OpVariable
      gl_PrimitiveID -> PrimitiveId decorated OpVariable (existing)
      gl_Layer -> Layer decorated OpVariable (existing)
      gl_LayerPerViewNV -> LayerPerViewNV decorated OpVariable
      gl_ViewportIndex -> ViewportIndex decorated OpVariable (existing)
      gl_ViewportMask -> ViewportMaskNV decorated OpVariable (existing extension)
      gl_ViewportMaskPerViewNV -> ViewportMaskPerViewNV decorated OpVariable (existing extension)
      gl_MeshViewCountNV -> MeshViewCountNV decorated OpVariable
      gl_MeshViewIndicesNV -> MeshViewIndicesNV decorated OpVariable
      gl_DrawID -> DrawIndex decorated OpVariable (existing 1.3, extension)

      gl_MeshPerVertexNV -> block name, not needed
      gl_MeshPerPrimitiveNV -> block name, not needed

      writePackedPrimitiveIndices4x8NV -> OpWritePackedPrimitiveIndices4x8NV()


Modifications to the OpenGL Shading Language Specification, Version 4.50.6

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_mesh_shader : <behavior>

    where <behavior> is as specified in section 3.3.

    A new preprocessor #define is added to the OpenGL Shading Language:

      #define GL_NV_mesh_shader 1
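
    As a minimal sketch, a shader would typically place the directive right
    after its #version line and may test the preprocessor define.  The
    #error fallback shown here is only one possible choice.

      #version 450
      #extension GL_NV_mesh_shader : enable

      #ifdef GL_NV_mesh_shader
        // Task and mesh stages and the perprimitiveNV, perviewNV, and
        // taskNV qualifiers are available here.
      #else
        #error GL_NV_mesh_shader is not supported by this compiler
      #endif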


    Modify the introduction to Chapter 2, Overview of OpenGL Shading (p. 7)

    (modify first paragraph)  ... Currently, these processors are the vertex,
    tessellation control, tessellation evaluation, geometry, fragment,
    compute, task, and mesh processors.

    (modify second paragraph)  ... The specific languages will be referred to
    by the name of the processor they target: vertex, tessellation control,
    tessellation evaluation, geometry, fragment, compute, task, or mesh.


    Insert new sections at the end of Chapter 2 (p. 9)

    Section 2.7, Task Processor

    The task processor is a programmable unit that operates in conjunction
    with the mesh processor to produce a collection of primitives that will be
    processed by subsequent stages of the graphics pipeline.  The task and
    mesh processors form a primitive processing pipeline that can be used
    instead of the conventional primitive processing pipeline that includes
    the vertex, tessellation control, tessellation evaluation, and geometry
    processors.  Compilation units written in the OpenGL Shading Language to
    run on this processor are called task shaders.  When a set of task shaders
    is successfully compiled and linked, they result in a task shader
    executable that runs on the task processor.

    A task shader has access to many of the same resources as fragment and
    other shader processors, including textures, buffers, image variables, and
    atomic counters.  The task shader has no fixed-function inputs other than
    variables identifying the specific work group and invocation; any vertex
    attributes or other data required by the task shader must be fetched from
    memory.  The only fixed output of the task shader is a task count,
    identifying the number of mesh shader work groups to spawn.  The task
    shader can write additional outputs to task memory, which can be read by
    all of the mesh shader work groups it spawns.

    A task shader operates on a group of work items called a work group. A
    work group is a collection of shader invocations that execute the same
    code, potentially in parallel. An invocation within a work group may share
    data with other members of the same work group through shared variables
    and issue memory and control barriers to synchronize with other members of
    the same work group.

    Section 2.8, Mesh Processor

    The mesh processor is a programmable unit that operates in conjunction
    with the task processor to produce a collection of primitives that will be
    processed by subsequent stages of the graphics pipeline.  The task and
    mesh processors form a primitive processing pipeline that can be used
    instead of the conventional primitive processing pipeline that includes
    the vertex, tessellation control, tessellation evaluation, and geometry
    processors.  Compilation units written in the OpenGL Shading Language to
    run on this processor are called mesh shaders.  When a set of mesh shaders
    is successfully compiled and linked, they result in a mesh shader
    executable that runs on the mesh processor.

    A mesh shader has access to many of the same resources as fragment and
    other shader processors, including textures, buffers, image variables, and
    atomic counters.  The only inputs available to the mesh shader are
    variables identifying the specific work group and invocation and any
    outputs written to task memory by the task shader that spawned the mesh
    shader's work group.  Any vertex attributes or other data required by the
    mesh shader must be fetched from memory.  The invocations of the mesh
    shader work group write an output mesh, comprising a set of primitives
    with per-primitive attributes, a set of vertices with per-vertex
    attributes, and an array of indices identifying the mesh vertices that
    belong to each primitive.  The primitives of this mesh are then processed
    by subsequent graphics pipeline stages, where the outputs of the mesh
    shader form an interface with the fragment shader.

    A mesh shader operates on a group of work items called a work group. A
    work group is a collection of shader invocations that execute the same
    code, potentially in parallel. An invocation within a work group may share
    data with other members of the same work group through shared variables
    and issue memory and control barriers to synchronize with other members of
    the same work group.


    Modify Section 3.6, Keywords (p. 18)

    (add to the end of the list of keywords, p. 19)

      perprimitiveNV
      perviewNV
      taskNV


    Modify Section 3.8.2, Dynamically Uniform Expressions and Uniform Control
    Flow (p. 21)

    (modify third paragraph of this section)

    An invocation group is the complete set of invocations collectively
    processing a particular compute, task, or mesh shader workgroup, or a
    graphical operation, where the scope ...


    Modify Section 4.3, Storage Qualifiers (p. 43)

    (modify table of base storage qualifiers, p. 43)

       Qualifier          Meaning
    ------------------    -----------------------------------------------
        shared            variable storage for compute, task, and mesh shaders
                          shared across all work items in a local work group

    (add to table of auxiliary storage qualifiers, p. 44)

    Auxiliary Storage
       Qualifier          Meaning
    ------------------    -----------------------------------------------
    perprimitiveNV        mesh shader outputs with per-primitive instances
    perviewNV             mesh shader outputs with per-view instances
    taskNV                generic outputs for task shader work groups


    Modify Section 4.3.4, Input Variables (p. 46)

    (modify third paragraph, p. 47, to treat all mesh shader outputs as
    "arrayed" interfaces)

    Some inputs and outputs are arrayed ...  Geometry shader inputs,
    tessellation control shader inputs and outputs, tessellation evaluation
    inputs, and mesh shader outputs all have an additional level of arrayness
    relative to other shader inputs and outputs.  Component limits for these
    arrayed interfaces (e.g., gl_MaxTessControlInputComponents) are limits for
    a single instance and not for the entire interface.

    (insert before the last paragraph, p. 47, "Fragment shader inputs get")

    Task shaders do not permit user-defined input variables and do not form a
    formal interface with any previous shader stage. See section 7.1 "Built-In
    Variables" for a description of built-in task shader input variables.  All
    other input to a task shader is retrieved explicitly through image loads,
    texture fetches, loads from uniforms, uniform buffers, or shader storage
    buffers, or other user supplied code.  Redeclaration of built-in input
    variables in task shaders is not permitted.

    Mesh shaders form an interface with task shaders and support a collection
    of input variables in task memory.  All user-defined mesh shader inputs
    must be declared as members of a single interface block qualified with
    the "taskNV" qualifier.  Mesh shaders do not support user-defined inputs
    declared outside interface blocks or without "taskNV" and do not support
    more than one input interface block.  In addition to user-defined inputs,
    mesh shaders support the built-in input variables described in section
    7.1.  User-defined mesh shader input variables are filled with the values
    of matching user-defined output variables written by the task shader.  As
    with other input variables, mesh shader inputs in task memory must be
    declared using the same type and qualification as task memory outputs from
    the previous (task) shader stage.  It is a compile-time error to use the
    "taskNV" qualifier with inputs in any stage other than the mesh shader.
    All other input to a mesh shader is retrieved explicitly through image
    loads, texture fetches, loads from uniforms, uniform buffers, or shader
    storage buffers, or other user supplied code.  Redeclaration of built-in
    input variables in mesh shaders is not permitted.
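
    A sketch of the mesh shader side of this interface follows.  It assumes
    a task shader that declares a matching "taskNV out Task" block with the
    same member; the block name Task, the member meshletIndex, and the
    MeshletCenters buffer (assumed to hold clip-space positions) are
    hypothetical.  It also relies on gl_WorkGroupID.x numbering the mesh
    shader work groups spawned by one task shader work group.

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 1) in;
      layout(points, max_vertices = 1, max_primitives = 1) out;

      // The single permitted input block; it must match the task shader's
      // output block in type and qualification.
      taskNV in Task {
        uint meshletIndex[32];
      } IN;

      // Everything else is fetched explicitly from memory.
      layout(std430, binding = 1) readonly buffer MeshletCenters {
        vec4 center[];
      };

      void main()
      {
        uint meshlet = IN.meshletIndex[gl_WorkGroupID.x];

        gl_MeshVerticesNV[0].gl_Position  = center[meshlet];
        gl_MeshVerticesNV[0].gl_PointSize = 4.0;
        gl_PrimitiveIndicesNV[0] = 0u;
        gl_PrimitiveCountNV = 1u;
      }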

    (modify last paragraph, p. 47)

    Fragment shader inputs get...  The auxiliary storage qualifiers centroid,
    sample, and perprimitiveNV can also be applied, as well as...

    (modify first paragraph, p. 48)

    Fragment shader inputs that are signed or unsigned integers, integer
    vectors, or any double-precision floating-point type must be qualified
    with the interpolation qualifier flat or with the auxiliary storage
    qualifier perprimitiveNV.

    (add a new example to the second paragraph, p. 48)

      perprimitiveNV in vec3 triangleNormal;
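
    For illustration only, a complete fragment shader using such an input
    might look like the following sketch.  It assumes a mesh shader that
    declares "perprimitiveNV out vec3 triangleNormal[];" and a per-vertex
    "out vec3 baseColor[];"; baseColor, fragColor, and the fixed light
    direction are hypothetical.

      #version 450
      #extension GL_NV_mesh_shader : require

      perprimitiveNV in vec3 triangleNormal;   // one value per primitive
      in vec3 baseColor;                       // interpolated per vertex

      layout(location = 0) out vec4 fragColor;

      void main()
      {
        vec3  lightDir = normalize(vec3(0.3, 0.5, 0.8));
        float lambert  = max(dot(normalize(triangleNormal), lightDir), 0.0);
        fragColor = vec4(baseColor * lambert, 1.0);
      }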

    (modify third paragraph, p. 48)

    The fragment shader inputs form an interface with the mesh shader or last
    active shader in the conventional vertex processing pipeline (e.g.,
    vertex, tessellation evaluation, geometry). ...  Also, interpolation
    qualification (e.g., flat) and auxiliary qualification other than
    "perprimitiveNV" (e.g. centroid) may differ. ...


    Modify Section 4.3.6, Output Variables (p. 49)

    (modify last paragraph, p. 49 to add task and mesh shaders)

    It is a compile-time error to declare a vertex, tessellation evaluation,
    tessellation control, geometry, task, or mesh shader output that contains
    any of the following:  ...

    (insert before the next-to-last paragraph "The order of execution", p. 50)

    Task shader output variables may be used to write values in task memory
    that can be read by the mesh shader invocations for the tasks that it
    spawns.  All user-defined task shader outputs must be declared as members
    of a single interface block qualified with the "taskNV" qualifier.  Task
    shaders do not support user-defined outputs declared outside interface
    blocks or without "taskNV" and do not support more than one output
    interface block.  It is a compile-time error to use the "taskNV"
    qualifier in output declarations in any other shader stage.

    Mesh shader output variables may be used to write per-vertex or
    per-primitive data.  Output variables qualified with "perprimitiveNV"
    have separate instances for each primitive in the output mesh; all other
    output variables have separate instances for each vertex in the output
    mesh.  It is a compile-time error to use the "perprimitiveNV" qualifier
    in output declarations in any other shader stage.  Both types of output
    variables are arrayed (see "arrayed" under 4.3.4, Inputs) and each
    per-vertex or per-primitive output variable (or output block, see
    interface blocks below) needs to be declared as an array. For example,

      out float vertexColor[];                      // per-vertex color
      perprimitiveNV out vec3 triangleNormal[];     // per-triangle normal

    Each element of such an array corresponds to one vertex or primitive of
    the output mesh.  Each array can optionally have a size declared.  The
    array size will be set by (or if provided must be consistent with) the
    output layout declaration(s) establishing the maximum number of vertices
    and primitives in the output mesh.  When checking a mesh shader against
    implementation limits on the total number of output variable components,
    the compiler adds the number of per-vertex outputs for a single vertex
    instance and the number of per-primitive outputs for a single primitive
    instance.  Unlike tessellation control shaders, a mesh shader invocation
    may write to outputs for any vertex or primitive.
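
    The sizing rules and the freedom of any invocation to write any element
    can be sketched as follows.  The Meshlet buffer, its layout, and the
    output names vertexNormal, texCoord, and materialId are hypothetical;
    the limits 64 and 126 are example values, not requirements.

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 32) in;
      layout(triangles, max_vertices = 64, max_primitives = 126) out;

      out vec3 vertexNormal[64];               // explicit, matches max_vertices
      out vec2 texCoord[];                     // implicitly sized to 64
      perprimitiveNV out uint materialId[126]; // matches max_primitives

      // Hypothetical pre-built meshlet data.
      layout(std430, binding = 0) readonly buffer Meshlet {
        uint vertexCount;
        uint primitiveCount;
        uint material;
        vec4 position[64];
        vec4 normal[64];
        vec2 uv[64];
        uint index[378];                       // 3 * 126 local indices
      } meshlet;

      void main()
      {
        uint vertexCount    = min(meshlet.vertexCount, 64u);
        uint primitiveCount = min(meshlet.primitiveCount, 126u);

        // Invocations stride over vertices; each writes several elements.
        for (uint v = gl_LocalInvocationID.x; v < vertexCount; v += 32u) {
          gl_MeshVerticesNV[v].gl_Position = meshlet.position[v];
          vertexNormal[v] = meshlet.normal[v].xyz;
          texCoord[v]     = meshlet.uv[v];
        }

        // The same invocations then switch to per-primitive work.
        for (uint p = gl_LocalInvocationID.x; p < primitiveCount; p += 32u) {
          gl_PrimitiveIndicesNV[3u * p + 0u] = meshlet.index[3u * p + 0u];
          gl_PrimitiveIndicesNV[3u * p + 1u] = meshlet.index[3u * p + 1u];
          gl_PrimitiveIndicesNV[3u * p + 2u] = meshlet.index[3u * p + 2u];
          materialId[p] = meshlet.material;
        }

        if (gl_LocalInvocationID.x == 0)
          gl_PrimitiveCountNV = primitiveCount;
      }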

    Mesh shader outputs qualified with "perviewNV" are considered to be
    per-view and arrayed with a second additional level of arrayness.  Each
    non-block output variable must be declared as an array with at least
    two dimensions.  For output block members, one level of arrayness applies
    to the block declaration and a second applies to the block member
    declaration.  For example,

      perviewNV out float perViewVertexColor[][];
      out PerVertexBlock {
        perviewNV vec2 perViewTextureCoord[];
      } v[];

    For non-block output variables, each element in the outer (leftmost)
    dimension of such an array corresponds to one vertex or primitive of the
    output mesh, as described immediately above.  Each element in the second
    (next-to-leftmost) dimension corresponds to a single view of the output
    primitive or vertex.  The array dimension corresponding to the view number
    can optionally have a size declared.  The array size will be set to (or if
    provided must be consistent with) the maximum number of views supported by
    the implementation given by the constant gl_MaxMeshViewCountNV.

    When using per-view outputs, all view instances of per-view outputs count
    separately against implementation limits on the total number of output
    components.  Additionally, values for extra views will be stored in the
    upper end of the set of available locations for mesh shader outputs.  A
    compile- or link-time error will be generated if extra storage required
    for extra per-view outputs leaves the compiler unable to assign locations
    for all outputs or includes a location already consumed by an active
    output variable with an associated "location" layout qualifier.

    (modify the next-to-last and last paragraph, p. 50)

    The order of execution of tessellation control, task, and mesh shader
    invocations relative to the other invocations for the same input patch or
    local work group is undefined unless the built-in function barrier() is
    used to provide some control over relative execution order.  When a shader
    invocation calls barrier(), ...

    Because tessellation control, task, and mesh shader invocations execute in
    undefined order between barriers, the values of output variables will
    sometimes be undefined. ...


    Modify Section 4.3.8, Shared Variables (p. 52)

    (modify first paragraph of the section, p. 52)

    The shared qualifier is used to declare variables that have storage shared
    between all work items in a compute, task, or mesh shader local work
    group. Variables declared as shared may only be used in compute, task, or
    mesh shaders.  ...

    (modify last paragraph of the section, p. 52)

    There is a limit to the total size of all variables declared as shared in
    a single shader stage.  This limit, expressed in basic machine units, may
    be determined by using the OpenGL API to query the value of
    MAX_COMPUTE_SHARED_MEMORY_SIZE (compute shaders),
    MAX_TASK_SHARED_MEMORY_SIZE_NV (task shaders), or
    MAX_MESH_SHARED_MEMORY_SIZE_NV (mesh shaders).


    Modify Section 4.3.9, Interface Blocks, p. 52

    (rework grammar rules, p. 53, to allow "taskNV", "perprimitiveNV", and
     "perviewNV" to qualify blocks)

      interface-qualifier:
        in-block-qualifiers(_opt) "in"
        out-block-qualifiers(_opt) "out"
        uniform
        buffer
        // Note: Not shown for simplicity, but memory qualifiers may also be used

      in-block-qualifiers:
        patch
        taskNV
        perprimitiveNV

      out-block-qualifiers:
        out-block-qualifier
        out-block-qualifier out-block-qualifiers

      out-block-qualifier:
        patch
        taskNV
        perprimitiveNV
        perviewNV


    Modify Section 4.4, Layout Qualifiers, p. 57

    (modify the layout qualifier table, pp. 58-59)

      Layout Qualifier   | Qualifier | Individual | Block | Block  | Allowed interfaces
                         | only      | variable   |       | Member |
      -----------------------------------------------------------------------------
      local_size_x =     |           |            |       |        | compute in
      local_size_y =     |     X     |            |       |        | mesh in
      local_size_z =     |           |            |       |        | task in
      -----------------------------------------------------------------------------
      max_vertices =     |     X     |            |       |        | geometry out
                         |           |            |       |        | mesh out
      -----------------------------------------------------------------------------
      max_primitives =   |     X     |            |       |        | mesh out
      -----------------------------------------------------------------------------
      [ points ]         |           |            |       |        |
      [ lines ]          |     X     |            |       |        | mesh out
      [ triangles ]      |           |            |       |        |


    Add new Section 4.4.1.5, Task Shader Inputs, p. 67

    (note:  the content of this section is nearly identical to the content of
    section 4.4.1.4, Compute Shader Inputs)

    There are no layout location qualifiers for task shader inputs.

    Layout qualifier identifiers for task shader inputs are the work group
    size qualifiers:

      layout-qualifier-id :
        local_size_x = integer-constant-expression
        local_size_y = integer-constant-expression
        local_size_z = integer-constant-expression

    These task shader input layout qualifiers behave identically to the
    equivalent compute shader qualifiers and specify a fixed local group size
    used for each task shader work group.  If no size is specified in any of
    the three dimensions, a default size of one will be used.

    If the fixed local group size of the shader in any dimension is greater
    than the maximum size supported by the implementation for that dimension,
    a compile-time error results.  Also, if such a layout qualifier is
    declared more than once in the same shader, all those declarations must
    set the same set of local workgroup sizes and set them to the same values;
    otherwise a compile-time error results. If multiple task shaders attached
    to a single program object declare a fixed local group size, the
    declarations must be identical; otherwise a link-time error results.

    Furthermore, if a program object contains any task shaders, at least one
    must contain an input layout qualifier specifying a fixed local group size
    for the program, or a link-time error will occur.

    Note that task shaders do not currently support multi-dimensional work
    groups; the maximum value for local_size_y and local_size_z will be one.


    Add new Section 4.4.1.6, Mesh Shader Inputs, p. 67

    (note:  the content of this section is nearly identical to the content of
    section 4.4.1.4, Compute Shader Inputs)

    There are no layout location qualifiers for mesh shader inputs.

    Layout qualifier identifiers for mesh shader inputs are the work group
    size qualifiers:

      layout-qualifier-id :
        local_size_x = integer-constant-expression
        local_size_y = integer-constant-expression
        local_size_z = integer-constant-expression

    These mesh shader input layout qualifiers behave identically to the
    equivalent compute shader qualifiers and specify a fixed local group size
    used for each mesh shader work group.  If no size is specified in any of
    the three dimensions, a default size of one will be used.

    If the fixed local group size of the shader in any dimension is greater
    than the maximum size supported by the implementation for that dimension,
    a compile-time error results.  Also, if such a layout qualifier is
    declared more than once in the same shader, all those declarations must
    set the same set of local workgroup sizes and set them to the same values;
    otherwise a compile-time error results.  If multiple mesh shaders attached
    to a single program object declare a fixed local group size, the
    declarations must be identical; otherwise a link-time error results.

    Furthermore, if a program object contains any mesh shaders, at least one
    must contain an input layout qualifier specifying a fixed local group size
    for the program, or a link-time error will occur.

    Note that mesh shaders do not currently support multi-dimensional work
    groups; the maximum value for local_size_y and local_size_z will be one.
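
    (note:  the following minimal mesh shader is a non-normative sketch
    rather than specification language.  It assumes that a work group size
    of 32 is within the implementation's limit; the output layout and the
    body of main() are illustrative only.)

      #version 450
      #extension GL_NV_mesh_shader : require

      // Fixed local group size:  32 invocations in x.  local_size_y and
      // local_size_z are omitted and therefore default to one.
      layout(local_size_x = 32) in;

      layout(points, max_vertices = 32, max_primitives = 32) out;

      void main()
      {
          // With a 32x1x1 work group, this index is in the range [0, 31].
          uint i = gl_LocalInvocationIndex;

          // Each invocation produces one vertex and one point primitive.
          gl_MeshVerticesNV[i].gl_Position =
              vec4(float(i) / 16.0 - 1.0, 0.0, 0.0, 1.0);
          gl_MeshVerticesNV[i].gl_PointSize = 1.0;
          gl_PrimitiveIndicesNV[i] = i;

          if (i == 0u) {
              gl_PrimitiveCountNV = 32u;
          }
      }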


    Modify section 4.4.2.1, Transform Feedback Layout Qualifiers, p. 69

    (add a new paragraph at the end of the section, p. 71)

    Transform feedback cannot be used to capture the outputs of task and
    mesh shaders.  Use of transform feedback layout qualifiers in these shader
    types will result in a compile-time error.


    Add new Section 4.4.2.5, Mesh Shader Outputs, p. 75

    Mesh shaders can have three additional types of output layout identifiers:
    an output primitive type, a maximum output vertex count, and a maximum
    output primitive count. The primitive type, vertex and primitive count
    identifiers are allowed only on the interface qualifier out, not on an
    output block, block member, or variable declaration.

    The layout qualifier identifiers for mesh shader outputs are

      layout-qualifier-id :
        points
        lines
        triangles
        max_vertices = integer-constant-expression
        max_primitives = integer-constant-expression

    The primitive type identifiers "points", "lines", and "triangles" are used
    to specify the type of output primitive produced by the mesh shader, and
    only one of these is accepted.  At least one mesh shader (compilation
    unit) in a program must declare an output primitive type, and all mesh
    shader output primitive type declarations in a program must declare the
    same primitive type.  It is not required that all mesh shaders in a
    program declare an output primitive type.

    The vertex count identifier "max_vertices" is used to specify the maximum
    number of vertices the shader will ever emit for the invocation group.  At
    least one mesh shader (compilation unit) in a program must declare a
    maximum output vertex count, and all mesh shader output vertex count
    declarations in a program must declare the same count.  It is not required
    that all mesh shaders in a program declare a count.

    The primitive count identifier "max_primitives" is used to specify the
    maximum number of primitives the shader will ever emit for the invocation
    group.  At least one mesh shader (compilation unit) in a program must
    declare a maximum output primitive count, and all mesh shader output
    primitive count declarations in a program must declare the same count.  It
    is not required that all mesh shaders in a program declare a count.

    The intrinsically declared output block gl_MeshVerticesNV[] and any
    user-defined output variables or blocks not qualified with
    "perprimitiveNV" will be sized by the "max_vertices" output declaration.
    The intrinsically declared output block gl_MeshPrimitivesNV[] and any
    user-defined output variables or blocks qualified with "perprimitiveNV"
    will be sized by the "max_primitives" output declaration.  The
    intrinsically declared array gl_PrimitiveIndicesNV[] will be sized
    according to the primitive type and "max_primitives" declarations, where
    the size is:

    * the value of "max_primitives" if "points" is declared,
    * two times the value of "max_primitives" if "lines" is declared, or
    * three times the value of "max_primitives" if "triangles" is declared.

    For outputs declared without an array size, including intrinsically
    declared outputs (e.g., gl_MeshVerticesNV), a layout must be declared
    before any use of the method length() or other array use that requires
    its size to be known.  It is a compile-time error if an output array is
    declared with an explicit size that does not match the array size derived
    from the layout qualifier.
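
    (note:  the following mesh shader is a non-normative sketch of the
    sizing rules above.  The counts 64 and 126 and the user-defined outputs
    "normal" and "faceColor" are hypothetical.)

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 32) in;
      layout(triangles, max_vertices = 64, max_primitives = 126) out;

      // Per-vertex user output:  implicitly sized to max_vertices (64),
      // like gl_MeshVerticesNV[].
      layout(location = 0) out vec3 normal[];

      // Per-primitive user output:  implicitly sized to max_primitives
      // (126), like gl_MeshPrimitivesNV[].
      layout(location = 1) perprimitiveNV out vec4 faceColor[];

      void main()
      {
          // With "triangles", gl_PrimitiveIndicesNV[] holds
          // 3 * 126 = 378 elements.  This sketch emits one triangle.
          if (gl_LocalInvocationIndex == 0u) {
              gl_MeshVerticesNV[0].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
              gl_MeshVerticesNV[1].gl_Position = vec4( 1.0, -1.0, 0.0, 1.0);
              gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  1.0, 0.0, 1.0);

              normal[0] = vec3(0.0, 0.0, 1.0);
              normal[1] = vec3(0.0, 0.0, 1.0);
              normal[2] = vec3(0.0, 0.0, 1.0);

              faceColor[0] = vec4(1.0, 0.5, 0.0, 1.0);

              gl_PrimitiveIndicesNV[0] = 0u;
              gl_PrimitiveIndicesNV[1] = 1u;
              gl_PrimitiveIndicesNV[2] = 2u;

              gl_PrimitiveCountNV = 1u;
          }
      }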


    Modify Section 4.5, Interpolation Qualifiers, p. 83

    (modify first paragraph of the section, p. 83)

    The presence of and type of interpolation is controlled by the above
    interpolation qualifiers as well as the auxiliary storage qualifiers
    centroid and sample.  The auxiliary storage qualifiers "patch", "taskNV",
    and "perprimitiveNV" are not used for interpolation; it is a compile-time
    error to use interpolation qualifiers with those auxiliary storage
    qualifiers.  The auxiliary storage qualifier "perviewNV" may not be used
    when declaring fragment shader inputs, but can be used with interpolation
    qualifiers in the declaration of mesh shader outputs.

    (add a new paragraph at the end of the section, p. 84)

    A variable qualified with the auxiliary storage qualifier
    "perprimitiveNV" will also not be interpolated.  Instead, it will use
    the same per-primitive value for all fragments generated by each
    primitive.  Such a variable can also be qualified with an interpolation
    qualifier or with centroid or sample, but those qualifications will mean
    the same thing as only qualifying with "perprimitiveNV".
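
    (note:  the following fragment shader is a non-normative sketch.  It
    assumes a mesh shader that writes an ordinary per-vertex output at
    location 0 and a "perprimitiveNV" output at location 1; the variable
    names and locations are hypothetical.)

      #version 450
      #extension GL_NV_mesh_shader : require

      // Interpolated across the primitive as usual.
      layout(location = 0) in vec3 normal;

      // Not interpolated:  every fragment of a primitive reads the single
      // value the mesh shader wrote for that primitive.
      layout(location = 1) perprimitiveNV in vec4 faceColor;

      layout(location = 0) out vec4 fragColor;

      void main()
      {
          float facing = max(normalize(normal).z, 0.0);
          fragColor = vec4(faceColor.rgb * facing, faceColor.a);
      }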


    Modify Section 7.1, Built-In Language Variables (p. 120)

    (insert after the first paragraph and variable list, p. 123)

    In the task language, built-in variables are intrinsically declared as:

      const uvec3 gl_WorkGroupSize;
      in uvec3 gl_WorkGroupID;
      in uvec3 gl_LocalInvocationID;
      in uvec3 gl_GlobalInvocationID;
      in uint gl_LocalInvocationIndex;
      in uint gl_MeshViewCountNV;
      in uint gl_MeshViewIndicesNV[];

      out uint gl_TaskCountNV;

    In the mesh language, built-in variables are intrinsically declared as:

      const uvec3 gl_WorkGroupSize;
      in uvec3 gl_WorkGroupID;
      in uvec3 gl_LocalInvocationID;
      in uvec3 gl_GlobalInvocationID;
      in uint gl_LocalInvocationIndex;
      in uint gl_MeshViewCountNV;
      in uint gl_MeshViewIndicesNV[];

      out uint gl_PrimitiveCountNV;
      out uint gl_PrimitiveIndicesNV[];

      out gl_MeshPerVertexNV {
         vec4  gl_Position;
         perviewNV vec4  gl_PositionPerViewNV[];  // NVX_multiview_per_view_attributes
         float gl_PointSize;
         float gl_ClipDistance[];
         perviewNV float gl_ClipDistancePerViewNV[][];
         float gl_CullDistance[];
         perviewNV float gl_CullDistancePerViewNV[][];
      } gl_MeshVerticesNV[];

      perprimitiveNV out gl_MeshPerPrimitiveNV {
        int gl_PrimitiveID;
        int gl_Layer;
        perviewNV int gl_LayerPerViewNV[];
        int gl_ViewportIndex;
        int gl_ViewportMask[];          // NV_viewport_array2
        perviewNV int gl_ViewportMaskPerViewNV[][];
      } gl_MeshPrimitivesNV[];

    (modify the discussion of the built-in variables shared with compute
    shaders, which starts on p. 123)

    The built-in constant gl_WorkGroupSize is a compute, task, or mesh shader
    constant containing the local work-group size of the shader. The size ...

    The built-in variable gl_WorkGroupID is a compute, task, or mesh shader
    input variable containing the three-dimensional index of the global work
    group that the current invocation is executing in. ...

    The built-in variable gl_LocalInvocationID is a compute, task, or mesh
    shader input variable containing the three-dimensional index of the local
    work group within the global work group that the current invocation is
    executing in. ...

    The built-in variable gl_GlobalInvocationID is a compute, task, or mesh
    shader input variable containing the global index of the current work
    item. This value uniquely identifies this invocation from all other
    invocations across all local and global work groups initiated by the
    current DispatchCompute or DrawMeshTasksNV call or by a previously
    executed task shader. ...

    The built-in variable gl_LocalInvocationIndex is a compute, task, or mesh
    shader input variable that contains the one-dimensional representation of
    the gl_LocalInvocationID.

    (modify discussion of gl_PrimitiveID, gl_Layer, and gl_ViewportIndex to
    allow as a mesh output, pp. 125-127)

    The output variable gl_PrimitiveID is available only in the geometry and
    mesh languages and provides a single integer that serves as a primitive
    identifier.  This is then available to fragment shaders as the fragment
    input gl_PrimitiveID, which will select the written primitive ID from the
    provoking vertex in the primitive being shaded when using a geometry
    shader or from the appropriate per-primitive output value when using a
    mesh shader.  If a fragment shader using gl_PrimitiveID is active and a
    geometry or mesh shader is also active, the geometry or mesh shader must
    write to gl_PrimitiveID or the fragment shader input gl_PrimitiveID is
    undefined.  ...

    The variable gl_Layer is available as an output variable in the geometry
    and mesh languages and an input variable in the fragment language.  In the
    geometry and mesh languages, it is used to select a specific layer (or
    face and layer of a cube map) of a multi-layer framebuffer attachment.
    When using a geometry shader, the actual layer used will come from one of
    the vertices in the primitive being shaded. Which vertex the layer comes
    from is discussed in section 11.3.4.6 "Layer and Viewport Selection" of
    the OpenGL Specification. It might be undefined, so it is best to write
    the same layer value for all vertices of a primitive.  When using a mesh
    shader, the actual layer will come from the appropriate per-primitive
    output value written by the mesh shader. ...

    The input variable gl_Layer in the fragment language will have the same
    value that was written to the output variable gl_Layer in the geometry or
    mesh language.  If the geometry or mesh stage does not dynamically assign
    ... If the geometry or mesh stage makes no static assignment to gl_Layer,
    the input value...  Otherwise, the fragment stage will read the same value
    written by the geometry or mesh stage, even if...

    The variable gl_ViewportIndex is available as an output variable in the
    geometry and mesh languages and an input variable in the fragment
    language.  In the geometry and mesh language, it provides the ...
    Primitives generated by the geometry or mesh shader will undergo viewport
    transformation and scissor testing using the viewport transformation and
    scissor rectangle selected by the value of gl_ViewportIndex.  When using a
    geometry shader, the viewport index used will come from one of the
    vertices in the primitive being shaded.  However, which vertex the
    viewport index comes from is implementation-dependent, so it is best to
    use the same viewport index for all vertices of the primitive.  When using
    a mesh shader, the viewport index used will come from the appropriate
    per-primitive output value written by the mesh shader.  If a geometry or
    mesh shader does not assign a value to gl_ViewportIndex, ...   If a
    geometry or mesh shader statically assigns a value to gl_ViewportIndex...

    The input variable gl_ViewportIndex in the fragment stage will have the
    same value that was written to the output variable gl_ViewportIndex in the
    geometry or mesh stage. If the geometry or mesh stage does not dynamically
    assign...  If the geometry or mesh stage makes no static assignment...
    Otherwise, the fragment stage will read the same value written by the
    geometry or mesh stage, even if...

    (insert new paragraphs before the seventh paragraph, starting with
    "Fragment shaders output values", p. 127, describing new task and mesh
    built-in variables)

    The input variable gl_MeshViewCountNV is only available in the mesh and
    task languages and defines the number of views processed by the current
    mesh and task shader invocations.  When using the multi-view API feature,
    the primitives emitted by the mesh shader will be processed separately for
    each enabled view and sent to a different layer of a layered render
    target.  Mesh shader outputs qualified with "perviewNV" are declared as
    arrays with separate values for each view.  To ensure defined results,
    mesh shaders must write values for array elements zero through
    gl_MeshViewCountNV-1 for each such per-view output.

    The input variable gl_MeshViewIndicesNV is only available in the mesh and
    task languages.  This variable is an array where each element holds the
    view number of one of the views being processed by the current mesh and
    task shader invocations.  The array elements with indices greater than or
    equal to the value of gl_MeshViewCountNV are undefined.  If the value of
    gl_MeshViewIndicesNV[i] is <j>, then any outputs qualified with
    "perviewNV" will take on the value of array element <i> when processing
    primitives for view index <j>.
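
    (note:  the following mesh shader is a non-normative sketch of writing
    a per-view output.  Element i of a per-view array is written for each i
    below gl_MeshViewCountNV and is consumed for the view whose number is
    gl_MeshViewIndicesNV[i].  The uniform "viewProj" is hypothetical and is
    assumed to have an entry for every view number in use; how the set of
    views is configured is not shown.)

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 3) in;
      layout(triangles, max_vertices = 3, max_primitives = 1) out;

      // Hypothetical per-view transforms, indexed by view number.
      uniform mat4 viewProj[4];

      const vec4 corners[3] = vec4[3](
          vec4(-1.0, -1.0, 0.0, 1.0),
          vec4( 1.0, -1.0, 0.0, 1.0),
          vec4( 0.0,  1.0, 0.0, 1.0));

      void main()
      {
          uint v = gl_LocalInvocationIndex;    // one invocation per vertex

          // Write array elements 0 .. gl_MeshViewCountNV-1.
          for (uint i = 0u; i < gl_MeshViewCountNV; ++i) {
              uint view = gl_MeshViewIndicesNV[i];
              gl_MeshVerticesNV[v].gl_PositionPerViewNV[i] =
                  viewProj[view] * corners[v];
          }

          gl_PrimitiveIndicesNV[v] = v;
          if (v == 0u) {
              gl_PrimitiveCountNV = 1u;
          }
      }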

    The output variable gl_TaskCountNV is only available in the task language
    and defines the number of subsequent mesh shader work groups to generate
    upon completion of the task shader.

    The output variable gl_PrimitiveCountNV is only available in the mesh
    language and defines the number of primitives in the output mesh produced
    by the mesh shader that should be processed by subsequent pipeline stages.

    The output array variable gl_PrimitiveIndicesNV[] is only available in the
    mesh language.  Depending on the output primitive type declared using a
    layout qualifier, each group of one (points), two (lines), or three
    (triangles) consecutive elements specifies the indices of the vertices
    making up one primitive.  All index values must be in the range
    [0, N-1], where N is the value of the "max_vertices" layout qualifier.
    Out-of-bounds index values will result in undefined behavior.
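
    (note:  the following mesh shader is a non-normative sketch of the index
    layout for the "lines" primitive type, where primitive p uses elements
    2*p and 2*p+1 of gl_PrimitiveIndicesNV[].  The counts and vertex
    positions are hypothetical.)

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 32) in;
      layout(lines, max_vertices = 32, max_primitives = 31) out;

      void main()
      {
          uint i = gl_LocalInvocationIndex;

          // 32 vertices spaced along the x axis.
          gl_MeshVerticesNV[i].gl_Position =
              vec4(float(i) / 31.0 * 2.0 - 1.0, 0.0, 0.0, 1.0);

          // Segment i joins vertices i and i+1, giving 31 segments.  All
          // indices written are in [0, 31], i.e., below max_vertices.
          if (i < 31u) {
              gl_PrimitiveIndicesNV[2u * i + 0u] = i;
              gl_PrimitiveIndicesNV[2u * i + 1u] = i + 1u;
          }

          if (i == 0u) {
              gl_PrimitiveCountNV = 31u;
          }
      }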

    The mesh shader output block members gl_PositionPerViewNV[],
    gl_ClipDistancePerViewNV[][], gl_CullDistancePerViewNV[][],
    gl_LayerPerViewNV[], and gl_ViewportMaskPerViewNV[][] are per-view
    versions of the single-view variables with equivalent names that lack the
    "PerViewNV" suffix:

        Per-View Variable               Single-View Variable
        ----------------------------    --------------------
        gl_PositionPerViewNV[]          gl_Position
        gl_ClipDistancePerViewNV[][]    gl_ClipDistance[]
        gl_CullDistancePerViewNV[][]    gl_CullDistance[]
        gl_LayerPerViewNV[]             gl_Layer
        gl_ViewportMaskPerViewNV[][]    gl_ViewportMask[]

    All of these outputs are considered arrayed, with separate values for each
    view.  The view number is used to index in the first dimension of these
    arrays.  For all of these variables, if a shader statically assigns a
    value to any element of a per-view array, it may not statically assign a
    value to the equivalent single-view variable in any mesh shader
    compilation unit.

    As with the gl_ClipDistance[] and gl_CullDistance[] arrays, the second
    dimension of gl_ClipDistancePerViewNV[][] and gl_CullDistancePerViewNV[][]
    is predeclared as unsized and must be sized by the shader either by
    redeclaring it with a size or by indexing it only with integral constant
    expressions.  The
    size determines the number and set of enabled clip or cull distances and
    can be at most gl_MaxClipDistances or gl_MaxCullDistances, respectively.
    The number of varying components consumed by these arrays will match the
    size of the array, and shaders writing to either array must write all
    enabled distances, or clipping/culling results will be undefined.

    (modify the fifth paragraph, p. 129)

    The gl_PerVertex, gl_MeshPerVertexNV, and gl_MeshPerPrimitiveNV blocks can
    be redeclared in a shader to explicitly indicate what subset of the fixed
    pipeline interface will be used. ...

    (modify the sixth paragraph, p. 129)

    This establishes the output interface the shader will use with the
    subsequent pipeline stage. It must be a subset of the built-in members of
    gl_PerVertex, gl_MeshPerVertexNV, or gl_MeshPerPrimitiveNV. ...
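
    (note:  the following mesh shader is a non-normative sketch of
    redeclaring the built-in blocks with a subset of their members.  Which
    members are kept is an arbitrary choice for illustration.)

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 1) in;
      layout(triangles, max_vertices = 3, max_primitives = 1) out;

      // Only gl_Position is used from the per-vertex interface.
      out gl_MeshPerVertexNV {
          vec4 gl_Position;
      } gl_MeshVerticesNV[];

      // Only gl_PrimitiveID is used from the per-primitive interface.
      perprimitiveNV out gl_MeshPerPrimitiveNV {
          int gl_PrimitiveID;
      } gl_MeshPrimitivesNV[];

      void main()
      {
          gl_MeshVerticesNV[0].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
          gl_MeshVerticesNV[1].gl_Position = vec4( 1.0, -1.0, 0.0, 1.0);
          gl_MeshVerticesNV[2].gl_Position = vec4( 0.0,  1.0, 0.0, 1.0);

          gl_MeshPrimitivesNV[0].gl_PrimitiveID = int(gl_WorkGroupID.x);

          gl_PrimitiveIndicesNV[0] = 0u;
          gl_PrimitiveIndicesNV[1] = 1u;
          gl_PrimitiveIndicesNV[2] = 2u;

          gl_PrimitiveCountNV = 1u;
      }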


    Modify Section 7.3, Built-In Constants (p. 136)

    Add to the end of the long list of constants that makes up this section:

        const int gl_MaxMeshViewCountNV = 4;


    Add new Section 8.xx, Mesh Shader Functions, after section 8.15, p. 187

    These functions are only available in mesh shaders.

    Insert a syntax/description table similar to the previous section.

    Syntax:
        void writePackedPrimitiveIndices4x8NV(uint indexOffset,
                                              uint packedIndices)

    Description:
        Interprets <packedIndices> as four 8-bit unsigned integer values and
        stores them into the gl_PrimitiveIndicesNV array starting from the
        provided <indexOffset>, which must be a multiple of four.
        Lower bytes are stored at lower addresses in the array.
        The write operations must not exceed the size of the
        gl_PrimitiveIndicesNV array.
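
    (note:  the following mesh shader is a non-normative sketch of the
    function described above.  It emits a quad as two triangles with index
    list 0, 1, 2, 2, 1, 3.  The first four indices are written with one
    packed write; the last two are written individually because a second
    packed write at offset four would run past the six-element array.)

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 1) in;
      layout(triangles, max_vertices = 4, max_primitives = 2) out;

      void main()
      {
          gl_MeshVerticesNV[0].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
          gl_MeshVerticesNV[1].gl_Position = vec4( 1.0, -1.0, 0.0, 1.0);
          gl_MeshVerticesNV[2].gl_Position = vec4(-1.0,  1.0, 0.0, 1.0);
          gl_MeshVerticesNV[3].gl_Position = vec4( 1.0,  1.0, 0.0, 1.0);

          // Bytes from lowest to highest are 0x00, 0x01, 0x02, 0x02, so
          // this stores 0, 1, 2, 2 into elements 0 through 3.
          writePackedPrimitiveIndices4x8NV(0u, 0x02020100u);

          // Remaining two indices of the second triangle.
          gl_PrimitiveIndicesNV[4] = 1u;
          gl_PrimitiveIndicesNV[5] = 3u;

          gl_PrimitiveCountNV = 2u;
      }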


    Modify Section 8.16, Shader Invocation Control Functions, p. 186

    (modify first paragraph of the section, p. 186)

    The shader invocation control function is available only in tessellation
    control, compute, task, and mesh shaders.  It is used to control the
    relative execution order of multiple shader invocations used to process a
    patch (in the case of tessellation control shaders) or a local work group
    (in the case of compute, task, and mesh shaders), which are otherwise
    executed with an undefined relative order.

    (modify the last paragraph, p. 186)

    For compute, task, and mesh shaders, the barrier() function may be placed
    within flow control, but that flow control must be uniform flow control.
    ...


    Modify Section 8.17, Shader Memory Control Functions, p. 187

    (modify table of functions, p. 187)

      void memoryBarrierShared()

        Control the ordering of memory transactions to shared variables issued
        within a single shader invocation.

        Only available in compute, task, and mesh shaders.

      void groupMemoryBarrier()

        Control the ordering of all memory transactions issued within a single
        shader invocation, as viewed by other invocations in the same work
        group.

        Only available in compute, task, and mesh shaders.

    (modify last paragraph, p. 187)

    ... all of the above variable types. The functions memoryBarrierShared()
    and groupMemoryBarrier() are available only in compute, task, and mesh
    shaders; the other functions are available in all shader types.

    (modify last paragraph, p. 188)

    ... When using the function groupMemoryBarrier(), this ordering guarantee
    applies only to other shader invocations in the same compute, task, or
    mesh shader work group; all other memory barrier functions provide the
    guarantee to all other shader invocations. ...
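
    (note:  the following mesh shader is a non-normative sketch combining
    barrier() and memoryBarrierShared() with a shared counter.  Both
    barrier() calls sit outside any flow control, so every invocation in
    the work group reaches them.  The "keep even-numbered points" test is a
    hypothetical stand-in for a real visibility test.)

      #version 450
      #extension GL_NV_mesh_shader : require

      layout(local_size_x = 32) in;
      layout(points, max_vertices = 32, max_primitives = 32) out;

      shared uint pointCount;

      void main()
      {
          uint i = gl_LocalInvocationIndex;

          if (i == 0u) {
              pointCount = 0u;
          }
          memoryBarrierShared();
          barrier();

          gl_MeshVerticesNV[i].gl_Position =
              vec4(float(i) / 16.0 - 1.0, 0.0, 0.0, 1.0);
          gl_MeshVerticesNV[i].gl_PointSize = 1.0;

          // Surviving points claim consecutive slots in the index array.
          if ((i & 1u) == 0u) {
              uint slot = atomicAdd(pointCount, 1u);
              gl_PrimitiveIndicesNV[slot] = i;
          }
          memoryBarrierShared();
          barrier();

          if (i == 0u) {
              gl_PrimitiveCountNV = pointCount;
          }
      }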


Interactions with GLSL 4.60 and KHR_vulkan_glsl

    If GLSL 4.60 or KHR_vulkan_glsl is supported, the layout qualifiers
    "local_size_x_id", "local_size_y_id", and "local_size_z_id" are supported
    in mesh and task shaders, as in compute shaders.

    In the big layout qualifier table in section 4.4, add:

      Layout Qualifier   | Qualifier | Individual | Block | Block  | Allowed interfaces
                         | only      | variable   |       | Member |
      ---------------------------------------------------------+--------------------
      local_size_x_id =  |           |            |       |        | compute in
      local_size_y_id =  |     X     |            |       |        | mesh in
      local_size_z_id =  |           |            |       |        | task in
                         |           |            |       |        | (SPIR-V generation
                         |           |            |       |        |  only)

    No changes are required to the spec language describing these layout
    qualifiers, since the language doesn't specifically reference compute
    shaders and the mesh/task support should be identical.
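
    (note:  the following task shader is a non-normative sketch for SPIR-V
    generation.  The specialization constant ID 0 and the default size of 32
    are hypothetical, as is the choice to launch one mesh shader work group
    per task shader invocation.)

      #version 460
      #extension GL_NV_mesh_shader : require

      // Default local size in x, used unless specialized.
      layout(local_size_x = 32) in;

      // Specialization constant 0 may override local_size_x.
      layout(local_size_x_id = 0) in;

      void main()
      {
          if (gl_LocalInvocationIndex == 0u) {
              gl_TaskCountNV = gl_WorkGroupSize.x;
          }
      }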

Interactions with NV_viewport_array2

    If NV_viewport_array2 is not supported, remove gl_ViewportMask[] from the
    gl_MeshPerPrimitiveNV block declaration.

Interactions with NV_stereo_view_rendering

    Mesh shaders support a fully generic set of per-view positions and
    viewport masks, so we include no support for the more limited
    gl_SecondaryPositionNV and gl_SecondaryViewportMaskNV[] built-ins from
    NV_stereo_view_rendering.

Interactions with NVX_multiview_per_view_attributes

    If NVX_multiview_per_view_attributes is not supported, remove
    gl_PositionPerViewNV[] from the gl_MeshPerVertexNV block declaration and
    remove gl_ViewportMaskPerViewNV[][] from the gl_MeshPerPrimitiveNV block
    declaration.

    If NVX_multiview_per_view_attributes is supported, it is a compile-time
    error for a mesh shader to make a static assignment to
    gl_PositionPerViewNV as well as to either of gl_Position or
    gl_SecondaryPositionNV.

    If NVX_multiview_per_view_attributes is supported, it is a compile-time
    error for a mesh shader to make a static assignment to
    gl_ViewportMaskPerViewNV[][] as well as to either of gl_ViewportMask[] or
    gl_SecondaryViewportMaskNV[].

Interactions with ARB_shader_draw_parameters

    If ARB_shader_draw_parameters is supported, the task and mesh shaders
    will also have the following built-in input:

      in int      gl_DrawIDARB;

    The variable gl_DrawIDARB is a vertex, task, and mesh language input
    variable that holds the integer index of the drawing command to which the
    current vertex belongs (see "Shader Inputs" in section 11.1.3.9 of the
    OpenGL Graphics System Specification) or, for task and mesh shaders, to
    which the current work group belongs.  If the vertex or work group is
    not invoked by a Multi* form of a draw command, then the value of
    gl_DrawIDARB is zero.
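
    (note:  the following task shader is a non-normative sketch of reading
    gl_DrawIDARB.  The buffer block "DrawInfo", its binding, and its
    contents are hypothetical.)

      #version 450
      #extension GL_NV_mesh_shader : require
      #extension GL_ARB_shader_draw_parameters : require

      layout(local_size_x = 1) in;

      // Hypothetical table holding one meshlet count per drawing command.
      layout(std430, binding = 0) readonly buffer DrawInfo {
          uint meshletCount[];
      };

      void main()
      {
          // Launch as many mesh shader work groups as this draw requires.
          gl_TaskCountNV = meshletCount[gl_DrawIDARB];
      }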

Interactions with EXT_clip_cull_distance

    If implemented with OpenGL ES ESSL and EXT_clip_cull_distance is not
    supported, remove references to gl_ClipDistance, gl_CullDistance,
    gl_ClipDistancePerViewNV and gl_CullDistancePerViewNV.

Issues

    (1) What are the matching requirements between mesh outputs declared
        with "perprimitiveNV" and fragment shader inputs?  What should we do
        with interpolation and other auxiliary storage qualifiers on
        per-primitive values?

      RESOLVED:  In the initial implementation of this extension, reading
      per-primitive mesh shader outputs in a fragment shader would return
      incorrect/undefined values if the fragment shader input has no special
      qualification.  As a result, we require that mesh shader outputs
      qualified with "perprimitiveNV" be matched with fragment shader inputs
      qualified with "perprimitiveNV" and vice versa.

      We currently allow any of the interpolation and related auxiliary
      storage qualifiers (e.g., flat, centroid) on fragment shader inputs
      qualified with "perprimitiveNV".  These qualifiers have no effect.  This
      resolution is consistent with the core GLSL specification language that
      allows (and ignores) auxiliary storage qualifiers such as "sample" or
      "centroid" to be used on inputs qualified by "flat", despite the fact
      that the storage qualifiers are meaningless for flat-shaded attributes.

    (2) How do "arrayed" outputs and blocks work for mesh shaders?  Do you
        have to declare an array dimension?  If you do declare an array
        dimension, how is it checked?

      RESOLVED:  The rules for mesh shader outputs are the same as for arrayed
      inputs and outputs in tessellation control, tessellation evaluation, and
      geometry shaders.  When declaring an "arrayed" block, the size is
      optional.  If omitted, the size is taken from the maximum vertex or
      primitive counts declared using layout qualifiers ("max_vertices" and
      "max_primitives").  If a size is provided, it must match the limits
      specified by the layout qualifiers.

    (3) How are location layout qualifiers handled in mesh and task shaders?
        Do we support some sort of layout or offset qualifier for task memory?

      RESOLVED:  For mesh shader outputs, the "location" layout qualifier is
      supported and is used for interface matching with the fragment shader.
      Locations assigned to mesh shader outputs have the same semantics as
      locations assigned to vertex, tessellation control, tessellation
      evaluation, and geometry shader outputs.  As with tessellation control
      shaders, mesh shader outputs are "arrayed" with separate instances of
      each variable or block for each output vertex or primitive.  These
      multiple instances do not consume separate locations for each
      vertex/primitive.

      For task shader outputs (used as mesh shader inputs), we've chosen not
      to support any location or offset layout qualifiers.  Instead, we limit
      task and mesh shaders to use at most one block qualified by "taskNV" and
      do not allow non-block variables to use "taskNV".  With a single block
      where member declarations need to match between stages, any internal
      offsets/locations can be assigned by the compiler without any external
      annotation.
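
      (note:  the following pair of shaders is a non-normative sketch of
      the single "taskNV" block described above.  The block name "Task",
      its member, the instance names, and the count of eight are
      hypothetical.  The sketch assumes that the mesh shader work groups
      launched by one task shader work group are numbered from zero.)

        // ---- task shader ----
        #version 450
        #extension GL_NV_mesh_shader : require

        layout(local_size_x = 1) in;

        taskNV out Task {
            uint firstMeshlet;
        } taskOut;

        void main()
        {
            taskOut.firstMeshlet = gl_WorkGroupID.x * 8u;
            gl_TaskCountNV = 8u;    // launch eight mesh shader work groups
        }

        // ---- mesh shader ----
        #version 450
        #extension GL_NV_mesh_shader : require

        layout(local_size_x = 1) in;
        layout(points, max_vertices = 1, max_primitives = 1) out;

        // Same member declarations as in the task shader; no location or
        // offset qualifiers are used.
        taskNV in Task {
            uint firstMeshlet;
        } taskIn;

        void main()
        {
            uint meshlet = taskIn.firstMeshlet + gl_WorkGroupID.x;

            gl_MeshVerticesNV[0].gl_Position =
                vec4(float(meshlet) * 0.01 - 1.0, 0.0, 0.0, 1.0);
            gl_MeshVerticesNV[0].gl_PointSize = 1.0;
            gl_PrimitiveIndicesNV[0] = 0u;
            gl_PrimitiveCountNV = 1u;
        }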

    (4) For mesh shaders supporting multiple views, how do applications
        specify the set of views that should be produced?

      RESOLVED:  Ignoring mesh shaders, there are significant differences in
      how multiple views are handled in OpenGL and Vulkan.  OVR_multiview
      (OpenGL ES) specifies the view count using the "num_views" layout
      qualifier, where shaders will implicitly use views 0 through
      num_views-1.  VK_KHR_multiview (Vulkan) provides no view information in
      the shader, other than references to a view index.  Instead, the Vulkan
      render pass specifies a bitfield identifying the set of views to
      produce.  In the Vulkan algorithm, there is no explicit notion of a view
      count in the shader, and the view mask is not known at shader compile
      time.

      For mesh shaders in OpenGL, we use the same OVR_multiview "num_views"
      layout qualifier to specify the view count.  Unlike multiview vertex
      shaders, multiview mesh shaders are not run separately for each view.
      The "num_views" layout qualifier is used only to determine array sizes
      for outputs qualified with "perviewNV".  For mesh shaders in Vulkan, the
      view mask of the render pass is used to determine the storage
      requirements of per-view attributes and controls the values of the
      gl_MeshViewCountNV and gl_MeshViewIndicesNV built-ins.

    (5) For outputs declared with "perviewNV", which are arrays with separate
        elements for each view, what are the rules for array sizing and
        indexing?   Do you have to declare an array dimension?  If you do
        declare an array dimension, how is it checked?

      RESOLVED:  The rules for per-view mesh shader outputs are the same as
      for arrayed inputs and outputs in tessellation control, tessellation
      evaluation, and geometry shaders, as well as the per-vertex and
      per-primitive mesh shader output arrays.  When declaring an output
      qualified with "perviewNV", an extra array dimension needs to be used
      for indexing across views.  The array size in that dimension is
      optional.  If omitted, the size is taken from the
      implementation-dependent maximum view count.  If provided, the size
      must match the maximum view count.

      Given that the view count on Vulkan is inferred at *run time* from the
      view mask in the render pass, we can't use that derived view count for
      SPIR-V code generation and compile-time error checking.  Because of
      this, we have chosen to use the *maximum* view count for sizing per-view
      arrays, which is known at compile time.

    (6) What built-ins should be provided for multi-view mesh shaders?

      RESOLVED:  We provide per-view versions of gl_Position,
      gl_ClipDistance[], and gl_CullDistance[] in the built-in block
      gl_MeshPerVertexNV:

         perviewNV vec4  gl_PositionPerViewNV[];
         perviewNV float gl_ClipDistancePerViewNV[][];
         perviewNV float gl_CullDistancePerViewNV[][];

      Because these per-view built-ins refer to the same attributes as the
      equivalent standard built-ins, we prohibit the static use of a per-view
      built-in and its standard equivalent in a single shader.

      We considered instead allowing shaders to redeclare output blocks to add
      "perviewNV" qualification to existing built-ins, such as:

        out gl_PerVertex {
          perviewNV vec4 gl_Position[];
        } v[];

      This approach was rejected because modifying the basic types of built-in
      variables could result in new declarations that conflict with the basic
      definitions built into the compiler.

    (7) For multi-view, how do we broadcast mesh shader outputs to multiple
        layers or viewports, where at least some outputs have per-view values?

      RESOLVED:  In the OpenGL and Vulkan multi-view extensions, the
      programming model has logically separate shader invocations for each
      view.  These extensions have a view ID/index built-in that can be used
      to determine which view is being processed by a given invocation.  If a
      hardware platform is capable of compiling a multi-view shader to
      correctly process multiple views in a single shader invocation, the
      implementation is free to perform such an optimization.

      For mesh shaders, a transparent optimization that combines invocations
      for N different views is significantly more problematic.  Separate
      invocations could produce structurally different output (e.g., different
      primitive counts or different topology), which would be more difficult
      to "broadcast".  To simplify matters, we instead use a programming model
      where there is a single work group that processes all views at once.
      For per-view attributes, the mesh shader is responsible for computing
      separate output values for each view.

    (8) Should the gl_NumWorkGroups built-in be supported in task or mesh
        shaders, as with compute shaders?

      RESOLVED:  No, this isn't worth the trouble.  If required, an
      application can pass a workgroup count manually via a uniform.

      If we were to support such a thing, it would be necessary to figure out
      how this built-in would interact with gl_NumWorkGroups.  For compute
      shaders, if you dispatched five workgroups with DispatchCompute, they
      would always be numbered 0..4 and have values less than
      gl_NumWorkGroups.  If you called glDrawMeshTasksNV with <first> set to 3
      and <count> set to 5, the work groups would be numbered 3..7 and it
      would be necessary to decide if gl_NumWorkGroups should be 5 or 8.
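
      (note:  the following mesh shader is a non-normative sketch of passing
      a work group count through a uniform, as suggested in this issue.  The
      uniform name "workGroupCount" is hypothetical, and the sketch assumes
      that work groups are numbered starting from zero.)

        #version 450
        #extension GL_NV_mesh_shader : require

        layout(local_size_x = 1) in;
        layout(points, max_vertices = 1, max_primitives = 1) out;

        // Hypothetical uniform; the application sets it to the number of
        // work groups it launches.
        uniform uint workGroupCount;

        void main()
        {
            // Spread one point per work group evenly across clip-space x.
            float t = (float(gl_WorkGroupID.x) + 0.5) / float(workGroupCount);

            gl_MeshVerticesNV[0].gl_Position =
                vec4(2.0 * t - 1.0, 0.0, 0.0, 1.0);
            gl_MeshVerticesNV[0].gl_PointSize = 1.0;
            gl_PrimitiveIndicesNV[0] = 0u;
            gl_PrimitiveCountNV = 1u;
        }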

Revision History

    Version 7, March 6, 2019 (pknowles)
    - Added EXT_clip_cull_distance interactions.

    Version 6, October 22, 2018 (sparmar)
    - Fix typo for per-primitive fragment shader input example

    Version 5, October 5, 2018 (pbrown)
    - Add an interaction with GLSL 4.60 and GL_KHR_vulkan_glsl to allow the
      use of "local_size_[xyz]_id" where applicable.

    Version 4, October 4, 2018 (pbrown)
    - Fix incorrect layout qualifier table entries.  "local_size_[xyz]" is
      legal in task shaders.

    Version 3, September 18, 2018 (pbrown)
    - Additional edits preparing for publication.

    Version 2, September 11, 2018 (pbrown)
    - Miscellaneous edits preparing for publication.

    Version 1 (ckubisch, pbrown)
    - NVIDIA internal revisions.