Video Coding
Vulkan implementations may expose one or more queue families supporting video coding operations. These operations are performed by recording them into a command buffer within a video coding scope, and submitting them to queues with compatible video coding capabilities.
The Vulkan video functionalities are designed to be made available through a set of APIs built on top of each other, consisting of:
- A core API providing common video coding functionalities,
- APIs providing codec-independent video decode and video encode related functionalities, respectively,
- Additional codec-specific APIs built on top of those.
This chapter details the fundamental components and operations of these APIs.
Video Picture Resources
In the context of video coding, multidimensional arrays of image data that can be used as the source or target of video coding operations are referred to as video picture resources. They may store additional metadata that includes implementation-private information used during the execution of video coding operations, as discussed later.
Video picture resources are backed by VkImage objects. Individual subregions of VkImageView objects created from such resources can be used as decode output pictures, encode input pictures, reconstructed pictures, and/or reference pictures.
The parameters of a video picture resource are specified using a VkVideoPictureResourceInfoKHR structure.
The VkVideoPictureResourceInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoPictureResourceInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkOffset2D codedOffset;
    VkExtent2D codedExtent;
    uint32_t baseArrayLayer;
    VkImageView imageViewBinding;
} VkVideoPictureResourceInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- codedOffset is the offset in texels of the image subregion to use.
- codedExtent is the size in pixels of the coded image data.
- baseArrayLayer is the array layer of the image view specified in imageViewBinding to use as the video picture resource.
- imageViewBinding is an image view representing the video picture resource.
The image subresource referred to by such a structure is defined as the image array layer index specified in baseArrayLayer relative to the image subresource range that the image view specified in imageViewBinding was created with.
The meaning of codedOffset and codedExtent depends on the command and context in which the video picture resource is used, as well as on the video profile used and the corresponding codec-specific semantics, as described later.
A video picture resource is uniquely defined by the image subresource referred to by an instance of this structure, together with the codedOffset and codedExtent members, which identify the image subregion within the referenced image subresource that corresponds to the video picture resource, according to the particular codec-specific semantics.
Accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile used. As a result, given an effective image subregion corresponding to a video picture resource, the image subregion actually accessed may be larger, as it may include additional padding texels due to the picture access granularity. Any writes performed by video coding operations to such padding texels result in undefined texel values.
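The effect of the picture access granularity can be sketched as follows. The text above does not define the exact rounding rule, so this sketch assumes that the accessed region is the coded region expanded outward to the nearest multiples of pictureAccessGranularity. The Offset2D, Extent2D, and Rect2D types and the access_region function are hypothetical stand-ins, not Vulkan API.

```c
#include <stdint.h>

/* Hypothetical stand-ins for VkOffset2D / VkExtent2D (not the Vulkan types). */
typedef struct { int32_t x, y; } Offset2D;
typedef struct { uint32_t width, height; } Extent2D;
typedef struct { Offset2D offset; Extent2D extent; } Rect2D;

/* Round v down / up to a multiple of g (g > 0). */
static uint32_t align_down(uint32_t v, uint32_t g) { return v - v % g; }
static uint32_t align_up(uint32_t v, uint32_t g) { return align_down(v + g - 1, g); }

/* ASSUMPTION: accessed region = coded region expanded outward to granularity
 * boundaries. Texels inside the result but outside the coded region are
 * padding texels; writes by video coding operations leave them undefined. */
static Rect2D access_region(Offset2D codedOffset, Extent2D codedExtent,
                            Extent2D granularity)
{
    uint32_t x0 = align_down((uint32_t)codedOffset.x, granularity.width);
    uint32_t y0 = align_down((uint32_t)codedOffset.y, granularity.height);
    uint32_t x1 = align_up((uint32_t)codedOffset.x + codedExtent.width,
                           granularity.width);
    uint32_t y1 = align_up((uint32_t)codedOffset.y + codedExtent.height,
                           granularity.height);
    Rect2D r = { { (int32_t)x0, (int32_t)y0 }, { x1 - x0, y1 - y0 } };
    return r;
}
```

Under this assumption, a 1920x1080 coded extent at offset (0, 0) with a 16x16 granularity touches a 1920x1088 region, so the last 8 rows are padding that an application should not rely on after a decode.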
Two video picture resources match if they refer to the same image subresource and they specify identical codedOffset and codedExtent values.
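The matching rule, combined with the definition of the referenced image subresource (baseArrayLayer is relative to the view's subresource range), can be sketched as below. This assumes the application keeps its own bookkeeping per image view: the viewed image, plus the base array layer and base mip level of the subresource range the view was created with. Image aspect is ignored. ViewInfo, PictureResource, resolve_layer, and pictures_match are hypothetical. An opaque integer id stands in for a real VkImage handle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for an image view: the image it views and the
 * subresource range base it was created with. */
typedef struct {
    uint64_t imageId;
    uint32_t viewBaseArrayLayer;
    uint32_t viewBaseMipLevel;
} ViewInfo;

/* Hypothetical mirror of the VkVideoPictureResourceInfoKHR fields that
 * matter for matching; imageViewBinding is replaced by its ViewInfo. */
typedef struct {
    int32_t  codedOffsetX, codedOffsetY;
    uint32_t codedWidth, codedHeight;
    uint32_t baseArrayLayer; /* relative to the view's subresource range */
    ViewInfo view;
} PictureResource;

/* Absolute array layer of the image referred to by the picture resource. */
static uint32_t resolve_layer(const PictureResource *p)
{
    return p->view.viewBaseArrayLayer + p->baseArrayLayer;
}

/* Same image subresource plus identical codedOffset and codedExtent. */
static bool pictures_match(const PictureResource *a, const PictureResource *b)
{
    return a->view.imageId == b->view.imageId &&
           a->view.viewBaseMipLevel == b->view.viewBaseMipLevel &&
           resolve_layer(a) == resolve_layer(b) &&
           a->codedOffsetX == b->codedOffsetX &&
           a->codedOffsetY == b->codedOffsetY &&
           a->codedWidth == b->codedWidth &&
           a->codedHeight == b->codedHeight;
}
```

Resolving the layer first matters. Two different image views of the same image can describe the same picture resource, for example layer 2 of a view starting at layer 0 and layer 0 of a view starting at layer 2.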
Decoded Picture Buffer
An integral part of video coding pipelines is the reconstruction of pictures from a compressed video bitstream. A reconstructed picture is a video picture resource resulting from this process.
Such reconstructed pictures can be used as reference pictures in subsequent video coding operations to provide predictions of the values of samples of subsequently decoded or encoded pictures. The correct use of such reconstructed pictures as reference pictures is driven by the video compression standard, the implementation, and the application-specific use cases.
The list of reference pictures used to provide such predictions within a single video coding operation is referred to as the list of active reference pictures.
The decoded picture buffer (DPB) is an indexed data structure that maintains the set of reference pictures available to be used in video coding operations. Individual indexed entries of the DPB are referred to as the decoded picture buffer (DPB) slots. The range of valid DPB slot indices is between zero and N-1, where N is the capacity of the DPB.
Each DPB slot can refer to a reference picture containing a video frame or can refer to up to two reference pictures containing the top and/or bottom fields that, when both present, together represent a full video frame.
In Vulkan, the state and the backing store of the DPB are separated as follows:
- The state of individual DPB slots is maintained by video session objects.
- The backing store of DPB slots is provided by subregions of VkImage objects used as video picture resources.
In addition, the implementation may also maintain opaque metadata associated with DPB slots, including reference picture metadata corresponding to the video picture resources associated with the DPB slots.
Such metadata may be stored by the implementation as part of the DPB slot state maintained by the video session, or as part of the video picture resource backing the DPB slot.
Any metadata stored in the video picture resources backing DPB slots is independent of the video session used to store it; hence such video picture resources can be shared with other video sessions. Correspondingly, any metadata that depends on the video session is always stored as part of the DPB slot state maintained by that video session.
The responsibility of managing the DPB is split between the application and the implementation as follows:
- The application maintains the association between DPB slot indices and corresponding video picture resources.
- The implementation maintains global and per-slot opaque reference picture metadata.
In addition, the application is also responsible for managing the mapping between the codec-specific picture IDs and DPB slots, and any other codec-specific states unless otherwise specified.
DPB Slot States
At a given time, each DPB slot is either in the active or the inactive state. Initially, all DPB slots managed by a video session are in the inactive state.
A DPB slot can be activated by using it as the target of picture reconstruction in a video coding operation with the reconstructed picture requested to be set up as a reference picture, according to the codec-specific semantics. This changes its state to active and associates it with a picture reference to the reconstructed picture.
Some video coding standards allow multiple picture references to be associated with a single DPB slot. In this case the state of the individual picture references can be independently updated.
As an example, H.264 decoding allows associating a separate top field and bottom field picture with the same DPB slot.
As part of reference picture setup, the implementation may also generate reference picture metadata. Such reference picture metadata is specific to each picture reference associated with the DPB slot.
If such a video coding operation completes successfully, the activated DPB slot will have a valid picture reference and the reconstructed picture is associated with the DPB slot. This is true even if the DPB slot is used as the target of a picture reconstruction that only sets up a top field or bottom field reference picture and thus does not yet refer to a complete frame. However, if any data provided as input to such a video coding operation is not compliant with the video compression standard used, that video coding operation may complete unsuccessfully, in which case the activated DPB slot will have an invalid picture reference. This is true even if the DPB slot previously had a valid picture reference to a top field or bottom field reference picture, but the reconstruction of the other field corresponding to the DPB slot failed.
The application can use queries to get feedback about the outcome of video coding operations and use the resulting VkQueryResultStatusKHR value to determine whether the video coding operation completed successfully (result status is positive) or unsuccessfully (result status is negative).
Using a reference picture associated with a DPB slot that has an invalid picture reference as an active reference picture in subsequent video coding operations is legal; however, the contents of the outputs of such operations are undefined, and any DPB slots activated by such video coding operations will also have an invalid picture reference. This is true even if such video coding operations may otherwise complete successfully.
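Because an invalid picture reference propagates to every DPB slot activated from it, an application may want to mirror slot validity on its side. The sketch below assumes the application learns each operation's outcome from the query result status (positive for success, negative for failure, as described above). It records, per DPB slot, whether the slot's picture reference is known to be valid. SlotTracker, MAX_SLOTS, and record_operation are hypothetical. For brevity the sketch tracks a single picture reference per slot and ignores field pairs.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 16 /* hypothetical DPB capacity for the example */

typedef struct {
    bool active[MAX_SLOTS];
    bool valid[MAX_SLOTS]; /* meaningful only when active[i] is true */
} SlotTracker;

/* Record one video coding operation that used refs[0..refCount-1] as active
 * reference pictures and set up a reference picture in setupSlot (-1 if no
 * reference picture setup was requested). status is the query result
 * status: > 0 success, < 0 failure; 0 (not ready) is treated as not valid. */
static void record_operation(SlotTracker *t, const uint32_t *refs,
                             uint32_t refCount, int32_t setupSlot,
                             int32_t status)
{
    if (setupSlot < 0)
        return;
    bool inputsValid = true;
    for (uint32_t i = 0; i < refCount; ++i)
        if (!t->active[refs[i]] || !t->valid[refs[i]])
            inputsValid = false;
    t->active[setupSlot] = true;
    /* Valid only if the operation succeeded AND every reference was valid. */
    t->valid[setupSlot] = status > 0 && inputsValid;
}
```

A real decoder would typically react to an invalid slot by waiting for the next picture that does not depend on it (for example an intra-coded picture), rather than displaying the undefined output.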
A DPB slot can also be deactivated by the application, changing its state to inactive and invalidating any picture references and reference picture metadata associated with the DPB slot.
If an already active DPB slot is used as the target of picture reconstruction in a video coding operation, but the decoded picture is not requested to be set up as a reference picture, according to the codec-specific semantics, then no reference picture setup happens, and the corresponding picture reference and reference picture metadata are invalidated within the DPB slot. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.
If an already active DPB slot is used as the target of picture reconstruction when decoding a field picture that is not marked as reference, then the behavior is as follows:
- If the DPB slot is currently associated with a frame, then the DPB slot is deactivated.
- If the DPB slot is not currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the decoded picture is a bottom field picture, then the other field picture association of the DPB slot, if any, is not disturbed.
- If the DPB slot is currently associated with a top field picture and the decoded picture is a top field picture, or if the DPB slot is currently associated with a bottom field picture and the decoded picture is a bottom field picture, then that picture association is invalidated, without disturbing the other field picture association, if any. If the DPB slot no longer has any associated picture references after such an operation, the DPB slot is implicitly deactivated.
A DPB slot can be activated with a new frame even if it is already active. In this case all previous associations of the DPB slot with reference pictures are replaced with an association with the reconstructed picture used to activate it.
If an already active DPB slot is activated with a reconstructed field picture, then the behavior is as follows:
- If the DPB slot is currently associated with a frame, then that association is replaced with an association with the reconstructed field picture used to activate it.
- If the DPB slot is not currently associated with a top field picture and the DPB slot is activated with a top field picture, or if the DPB slot is not currently associated with a bottom field picture and the DPB slot is activated with a bottom field picture, then the DPB slot is associated with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
- If the DPB slot is currently associated with a top field picture and the DPB slot is activated with a new top field picture, or if the DPB slot is currently associated with a bottom field picture and the DPB slot is activated with a new bottom field picture, then that association is replaced with an association with the reconstructed field picture used to activate it, without disturbing the other field picture association, if any.
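The frame and field association rules above can be modeled as a small state machine. This is a hypothetical application-side model of the rules, not an implementation's internal DPB representation. The PicKind and SlotState types and the slot_reconstruct function are illustrative assumptions, and pictures are represented by nonzero integer ids (0 meaning "none"). The text only spells out the non-reference frame case for a slot holding a frame. Clearing every association in that case is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { PIC_FRAME, PIC_TOP_FIELD, PIC_BOTTOM_FIELD } PicKind;

/* Associations of one DPB slot; an id of 0 means "no picture".
 * Invariant: a slot holds either a frame or up to two fields. */
typedef struct {
    uint32_t frame;
    uint32_t top;
    uint32_t bottom;
} SlotState;

static bool slot_active(const SlotState *s)
{
    return s->frame != 0 || s->top != 0 || s->bottom != 0;
}

/* Apply a picture reconstruction that targets this slot.
 * pic: nonzero id of the reconstructed picture.
 * isReference: whether reference picture setup was requested. */
static void slot_reconstruct(SlotState *s, PicKind kind, uint32_t pic,
                             bool isReference)
{
    if (kind == PIC_FRAME) {
        /* Reference frame: replaces all previous associations.
         * Non-reference frame (ASSUMPTION): clears them all, which
         * implicitly deactivates the slot. */
        s->frame = isReference ? pic : 0;
        s->top = s->bottom = 0;
        return;
    }
    uint32_t *same = (kind == PIC_TOP_FIELD) ? &s->top : &s->bottom;
    if (!isReference) {
        if (s->frame != 0)
            s->frame = 0; /* slot held a frame: deactivate */
        else
            *same = 0;    /* drop same-parity field (no-op if absent) */
        return;
    }
    s->frame = 0; /* a frame association, if any, is replaced by the field */
    *same = pic;  /* add or replace same-parity field; other field untouched */
}
```

Because deactivation is derived from the associations (slot_active), the "implicitly deactivated" cases fall out without a separate flag.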
Video Profiles
The VkVideoProfileInfoKHR structure is defined as follows:
// Provided by VK_KHR_video_queue
typedef struct VkVideoProfileInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkVideoCodecOperationFlagBitsKHR videoCodecOperation;
    VkVideoChromaSubsamplingFlagsKHR chromaSubsampling;
    VkVideoComponentBitDepthFlagsKHR lumaBitDepth;
    VkVideoComponentBitDepthFlagsKHR chromaBitDepth;
} VkVideoProfileInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- videoCodecOperation is a VkVideoCodecOperationFlagBitsKHR value specifying a video codec operation.
- chromaSubsampling is a bitmask of VkVideoChromaSubsamplingFlagBitsKHR specifying video chroma subsampling information.
- lumaBitDepth is a bitmask of VkVideoComponentBitDepthFlagBitsKHR specifying video luma bit depth information.
- chromaBitDepth is a bitmask of VkVideoComponentBitDepthFlagBitsKHR specifying video chroma bit depth information.
Video profiles are provided as input to video capability queries such as vkGetPhysicalDeviceVideoCapabilitiesKHR or vkGetPhysicalDeviceVideoFormatPropertiesKHR, as well as when creating resources to be used by video coding operations such as images, buffers, query pools, and video sessions.
The full description of a video profile is specified by an instance of this structure, and the codec-specific and auxiliary structures provided in its pNext chain.
When this structure is specified as an input parameter to vkGetPhysicalDeviceVideoCapabilitiesKHR, or through the pProfiles member of a VkVideoProfileListInfoKHR structure in the pNext chain of the input parameter of a query command such as vkGetPhysicalDeviceVideoFormatPropertiesKHR or vkGetPhysicalDeviceImageFormatProperties2, the following error codes indicate specific causes of the failure of the query operation:
- VK_ERROR_VIDEO_PICTURE_LAYOUT_NOT_SUPPORTED_KHR specifies that the requested video picture layout (e.g. through the pictureLayout member of a VkVideoDecodeH264ProfileInfoKHR structure included in the pNext chain of VkVideoProfileInfoKHR) is not supported.
- VK_ERROR_VIDEO_PROFILE_OPERATION_NOT_SUPPORTED_KHR specifies that a video profile operation specified by videoCodecOperation is not supported.
- VK_ERROR_VIDEO_PROFILE_FORMAT_NOT_SUPPORTED_KHR specifies that video format parameters specified by chromaSubsampling, lumaBitDepth, or chromaBitDepth are not supported.
- VK_ERROR_VIDEO_PROFILE_CODEC_NOT_SUPPORTED_KHR specifies that the codec-specific parameters corresponding to the video codec operation are not supported.
Possible values of VkVideoProfileInfoKHR::videoCodecOperation, specifying the type of video coding operation and video compression standard used by a video profile, are:
// Provided by VK_KHR_video_queue
typedef enum VkVideoCodecOperationFlagBitsKHR {
    VK_VIDEO_CODEC_OPERATION_NONE_KHR = 0,
    // Provided by VK_KHR_video_encode_h264
    VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR = 0x00010000,
    // Provided by VK_KHR_video_encode_h265
    VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR = 0x00020000,
    // Provided by VK_KHR_video_decode_h264
    VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR = 0x00000001,
    // Provided by VK_KHR_video_decode_h265
    VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR = 0x00000002,
    // Provided by VK_KHR_video_decode_av1
    VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR = 0x00000004,
    // Provided by VK_KHR_video_encode_av1
    VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR = 0x00040000,
} VkVideoCodecOperationFlagBitsKHR;
- VK_VIDEO_CODEC_OPERATION_NONE_KHR specifies that no video codec operations are supported.
- VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR specifies support for H.264 decode operations.
- VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR specifies support for H.265 decode operations.
- VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR specifies support for AV1 decode operations.
- VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR specifies support for H.264 encode operations.
- VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR specifies support for H.265 encode operations.
- VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR specifies support for AV1 encode operations.
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoCodecOperationFlagsKHR;
VkVideoCodecOperationFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoCodecOperationFlagBitsKHR.
The video format chroma subsampling is defined with the following enums:
// Provided by VK_KHR_video_queue
typedef enum VkVideoChromaSubsamplingFlagBitsKHR {
    VK_VIDEO_CHROMA_SUBSAMPLING_INVALID_KHR = 0,
    VK_VIDEO_CHROMA_SUBSAMPLING_MONOCHROME_BIT_KHR = 0x00000001,
    VK_VIDEO_CHROMA_SUBSAMPLING_420_BIT_KHR = 0x00000002,
    VK_VIDEO_CHROMA_SUBSAMPLING_422_BIT_KHR = 0x00000004,
    VK_VIDEO_CHROMA_SUBSAMPLING_444_BIT_KHR = 0x00000008,
} VkVideoChromaSubsamplingFlagBitsKHR;
- VK_VIDEO_CHROMA_SUBSAMPLING_MONOCHROME_BIT_KHR specifies that the format is monochrome.
- VK_VIDEO_CHROMA_SUBSAMPLING_420_BIT_KHR specifies that the format is 4:2:0 chroma subsampled, i.e. the two chroma components are sampled horizontally and vertically at half the sample rate of the luma component.
- VK_VIDEO_CHROMA_SUBSAMPLING_422_BIT_KHR specifies that the format is 4:2:2 chroma subsampled, i.e. the two chroma components are sampled horizontally at half the sample rate of the luma component.
- VK_VIDEO_CHROMA_SUBSAMPLING_444_BIT_KHR specifies that the format is 4:4:4 chroma sampled, i.e. all three components of the Y′CBCR format are sampled at the same rate, thus there is no chroma subsampling.
Chroma subsampling is described in more detail in the Chroma Reconstruction section.
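The subsampling schemes above determine the size of each chroma plane relative to the luma plane. The following is a minimal sketch using the flag values listed in the enumeration above. chroma_extent is a hypothetical helper, and rounding odd luma dimensions up is an assumption made for illustration.

```c
#include <stdint.h>

/* Flag values copied from VkVideoChromaSubsamplingFlagBitsKHR above. */
enum {
    CHROMA_MONOCHROME = 0x00000001,
    CHROMA_420        = 0x00000002,
    CHROMA_422        = 0x00000004,
    CHROMA_444        = 0x00000008
};

/* Size of each chroma plane for a w x h luma plane. Returns 0 on success and
 * -1 if subsampling is not exactly one known value. Monochrome has no chroma
 * planes and is reported as 0 x 0. Odd sizes are rounded up (ASSUMPTION). */
static int chroma_extent(uint32_t subsampling, uint32_t w, uint32_t h,
                         uint32_t *cw, uint32_t *ch)
{
    switch (subsampling) {
    case CHROMA_MONOCHROME: *cw = 0;           *ch = 0;           return 0;
    case CHROMA_420:        *cw = (w + 1) / 2; *ch = (h + 1) / 2; return 0;
    case CHROMA_422:        *cw = (w + 1) / 2; *ch = h;           return 0;
    case CHROMA_444:        *cw = w;           *ch = h;           return 0;
    default:                return -1;
    }
}
```

For a 1920x1080 picture, 4:2:0 gives two 960x540 chroma planes, 4:2:2 gives two 960x1080 planes, and 4:4:4 keeps them at full resolution.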
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoChromaSubsamplingFlagsKHR;
VkVideoChromaSubsamplingFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoChromaSubsamplingFlagBitsKHR.
Possible values for the video format component bit depth are:
// Provided by VK_KHR_video_queue
typedef enum VkVideoComponentBitDepthFlagBitsKHR {
    VK_VIDEO_COMPONENT_BIT_DEPTH_INVALID_KHR = 0,
    VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR = 0x00000001,
    VK_VIDEO_COMPONENT_BIT_DEPTH_10_BIT_KHR = 0x00000004,
    VK_VIDEO_COMPONENT_BIT_DEPTH_12_BIT_KHR = 0x00000010,
} VkVideoComponentBitDepthFlagBitsKHR;
- VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR specifies a component bit depth of 8 bits.
- VK_VIDEO_COMPONENT_BIT_DEPTH_10_BIT_KHR specifies a component bit depth of 10 bits.
- VK_VIDEO_COMPONENT_BIT_DEPTH_12_BIT_KHR specifies a component bit depth of 12 bits.
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoComponentBitDepthFlagsKHR;
VkVideoComponentBitDepthFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoComponentBitDepthFlagBitsKHR.
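Although chromaSubsampling, lumaBitDepth, and chromaBitDepth are bitmask types, a single video profile describes one specific format. The sketch below assumes the Vulkan valid usage rule for VkVideoProfileInfoKHR: chromaSubsampling and lumaBitDepth each have a single bit set, and chromaBitDepth has a single bit set unless the profile is monochrome. It checks structural well-formedness only. Whether the profile is actually supported is answered by the capability queries and their video-profile-specific error codes. The helper names are hypothetical, and the flag values are copied from the enumerations above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the enumerations above. */
#define CHROMA_MONOCHROME_BIT 0x00000001u
#define BIT_DEPTH_8_BIT       0x00000001u
#define BIT_DEPTH_10_BIT      0x00000004u
#define BIT_DEPTH_12_BIT      0x00000010u

static bool single_bit(uint32_t mask)
{
    return mask != 0 && (mask & (mask - 1)) == 0;
}

/* Bits per component for a single known bit-depth flag, otherwise 0. */
static uint32_t component_bits(uint32_t depthFlag)
{
    switch (depthFlag) {
    case BIT_DEPTH_8_BIT:  return 8;
    case BIT_DEPTH_10_BIT: return 10;
    case BIT_DEPTH_12_BIT: return 12;
    default:               return 0;
    }
}

/* ASSUMPTION: mirrors the single-bit rules described in the lead-in. */
static bool profile_format_is_well_formed(uint32_t chromaSubsampling,
                                          uint32_t lumaBitDepth,
                                          uint32_t chromaBitDepth)
{
    if (!single_bit(chromaSubsampling) || component_bits(lumaBitDepth) == 0)
        return false;
    if (chromaSubsampling == CHROMA_MONOCHROME_BIT)
        return true; /* chromaBitDepth is not checked for monochrome */
    return component_bits(chromaBitDepth) != 0;
}
```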
Additional information about the video decode use case can be provided by adding a VkVideoDecodeUsageInfoKHR structure to the pNext chain of VkVideoProfileInfoKHR.
The VkVideoDecodeUsageInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_queue
typedef struct VkVideoDecodeUsageInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkVideoDecodeUsageFlagsKHR videoUsageHints;
} VkVideoDecodeUsageInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- videoUsageHints is a bitmask of VkVideoDecodeUsageFlagBitsKHR specifying hints about the intended use of the video decode profile.
The following bits can be specified in VkVideoDecodeUsageInfoKHR::videoUsageHints as a hint about the video decode use case:
// Provided by VK_KHR_video_decode_queue
typedef enum VkVideoDecodeUsageFlagBitsKHR {
    VK_VIDEO_DECODE_USAGE_DEFAULT_KHR = 0,
    VK_VIDEO_DECODE_USAGE_TRANSCODING_BIT_KHR = 0x00000001,
    VK_VIDEO_DECODE_USAGE_OFFLINE_BIT_KHR = 0x00000002,
    VK_VIDEO_DECODE_USAGE_STREAMING_BIT_KHR = 0x00000004,
} VkVideoDecodeUsageFlagBitsKHR;
- VK_VIDEO_DECODE_USAGE_TRANSCODING_BIT_KHR specifies that video decoding is intended to be used in conjunction with video encoding to transcode a video bitstream with the same and/or different codecs.
- VK_VIDEO_DECODE_USAGE_OFFLINE_BIT_KHR specifies that video decoding is intended to be used to consume a local video bitstream.
- VK_VIDEO_DECODE_USAGE_STREAMING_BIT_KHR specifies that video decoding is intended to be used to consume a video bitstream received as a continuous flow over a network.
There are no restrictions on the combination of bits that can be specified by the application. However, applications should use reasonable combinations in order for the implementation to be able to select the most appropriate mode of operation for the particular use case.
// Provided by VK_KHR_video_decode_queue
typedef VkFlags VkVideoDecodeUsageFlagsKHR;
VkVideoDecodeUsageFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoDecodeUsageFlagBitsKHR.
Additional information about the video encode use case can be provided by adding a VkVideoEncodeUsageInfoKHR structure to the pNext chain of VkVideoProfileInfoKHR.
The VkVideoEncodeUsageInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeUsageInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkVideoEncodeUsageFlagsKHR videoUsageHints;
    VkVideoEncodeContentFlagsKHR videoContentHints;
    VkVideoEncodeTuningModeKHR tuningMode;
} VkVideoEncodeUsageInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- videoUsageHints is a bitmask of VkVideoEncodeUsageFlagBitsKHR specifying hints about the intended use of the video encode profile.
- videoContentHints is a bitmask of VkVideoEncodeContentFlagBitsKHR specifying hints about the content to be encoded using the video encode profile.
- tuningMode is a VkVideoEncodeTuningModeKHR value specifying the tuning mode to use when encoding with the video profile.
The following bits can be specified in VkVideoEncodeUsageInfoKHR::videoUsageHints as a hint about the video encode use case:
// Provided by VK_KHR_video_encode_queue
typedef enum VkVideoEncodeUsageFlagBitsKHR {
    VK_VIDEO_ENCODE_USAGE_DEFAULT_KHR = 0,
    VK_VIDEO_ENCODE_USAGE_TRANSCODING_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_USAGE_STREAMING_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_USAGE_RECORDING_BIT_KHR = 0x00000004,
    VK_VIDEO_ENCODE_USAGE_CONFERENCING_BIT_KHR = 0x00000008,
} VkVideoEncodeUsageFlagBitsKHR;
- VK_VIDEO_ENCODE_USAGE_TRANSCODING_BIT_KHR specifies that video encoding is intended to be used in conjunction with video decoding to transcode a video bitstream with the same and/or different codecs.
- VK_VIDEO_ENCODE_USAGE_STREAMING_BIT_KHR specifies that video encoding is intended to be used to produce a video bitstream that is expected to be sent as a continuous flow over a network.
- VK_VIDEO_ENCODE_USAGE_RECORDING_BIT_KHR specifies that video encoding is intended to be used for real-time recording for offline consumption.
- VK_VIDEO_ENCODE_USAGE_CONFERENCING_BIT_KHR specifies that video encoding is intended to be used in a video conferencing scenario.
There are no restrictions on the combination of bits that can be specified by the application. However, applications should use reasonable combinations in order for the implementation to be able to select the most appropriate mode of operation for the particular use case.
// Provided by VK_KHR_video_encode_queue
typedef VkFlags VkVideoEncodeUsageFlagsKHR;
VkVideoEncodeUsageFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeUsageFlagBitsKHR.
The following bits can be specified in VkVideoEncodeUsageInfoKHR::videoContentHints as a hint about the encoded video content:
// Provided by VK_KHR_video_encode_queue
typedef enum VkVideoEncodeContentFlagBitsKHR {
    VK_VIDEO_ENCODE_CONTENT_DEFAULT_KHR = 0,
    VK_VIDEO_ENCODE_CONTENT_CAMERA_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_CONTENT_DESKTOP_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_CONTENT_RENDERED_BIT_KHR = 0x00000004,
} VkVideoEncodeContentFlagBitsKHR;
- VK_VIDEO_ENCODE_CONTENT_CAMERA_BIT_KHR specifies that video encoding is intended to be used to encode camera content.
- VK_VIDEO_ENCODE_CONTENT_DESKTOP_BIT_KHR specifies that video encoding is intended to be used to encode desktop content.
- VK_VIDEO_ENCODE_CONTENT_RENDERED_BIT_KHR specifies that video encoding is intended to be used to encode rendered (e.g. game) content.
There are no restrictions on the combination of bits that can be specified by the application. However, applications should use reasonable combinations in order for the implementation to be able to select the most appropriate mode of operation for the particular content type.
// Provided by VK_KHR_video_encode_queue
typedef VkFlags VkVideoEncodeContentFlagsKHR;
VkVideoEncodeContentFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeContentFlagBitsKHR.
Possible video encode tuning mode values are as follows:
// Provided by VK_KHR_video_encode_queue
typedef enum VkVideoEncodeTuningModeKHR {
    VK_VIDEO_ENCODE_TUNING_MODE_DEFAULT_KHR = 0,
    VK_VIDEO_ENCODE_TUNING_MODE_HIGH_QUALITY_KHR = 1,
    VK_VIDEO_ENCODE_TUNING_MODE_LOW_LATENCY_KHR = 2,
    VK_VIDEO_ENCODE_TUNING_MODE_ULTRA_LOW_LATENCY_KHR = 3,
    VK_VIDEO_ENCODE_TUNING_MODE_LOSSLESS_KHR = 4,
} VkVideoEncodeTuningModeKHR;
- VK_VIDEO_ENCODE_TUNING_MODE_DEFAULT_KHR specifies the default tuning mode.
- VK_VIDEO_ENCODE_TUNING_MODE_HIGH_QUALITY_KHR specifies that video encoding is tuned for high quality. When using this tuning mode, the implementation may compromise the latency of video encoding operations to improve quality.
- VK_VIDEO_ENCODE_TUNING_MODE_LOW_LATENCY_KHR specifies that video encoding is tuned for low latency. When using this tuning mode, the implementation may compromise quality to increase the performance and lower the latency of video encode operations.
- VK_VIDEO_ENCODE_TUNING_MODE_ULTRA_LOW_LATENCY_KHR specifies that video encoding is tuned for ultra-low latency. When using this tuning mode, the implementation may compromise quality to maximize the performance and minimize the latency of video encoding operations.
- VK_VIDEO_ENCODE_TUNING_MODE_LOSSLESS_KHR specifies that video encoding is tuned for lossless encoding. When using this tuning mode, video encode operations produce lossless output.
The VkVideoProfileListInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoProfileListInfoKHR {
    VkStructureType sType;
    const void* pNext;
    uint32_t profileCount;
    const VkVideoProfileInfoKHR* pProfiles;
} VkVideoProfileListInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- profileCount is the number of elements in the pProfiles array.
- pProfiles is a pointer to an array of VkVideoProfileInfoKHR structures.
Video transcoding is an example of a use case that necessitates the specification of multiple profiles in various contexts.
When the application provides a video decode profile and one or more video encode profiles in the profile list, the implementation ensures that any capabilities returned or resources created are suitable for the video transcoding use case without the need for manual data transformations.
Video Capabilities
Video Coding Capabilities
To query video coding capabilities for a specific video profile, call:
// Provided by VK_KHR_video_queue
VkResult vkGetPhysicalDeviceVideoCapabilitiesKHR(
    VkPhysicalDevice physicalDevice,
    const VkVideoProfileInfoKHR* pVideoProfile,
    VkVideoCapabilitiesKHR* pCapabilities);
- physicalDevice is the physical device from which to query the video decode or encode capabilities.
- pVideoProfile is a pointer to a VkVideoProfileInfoKHR structure.
- pCapabilities is a pointer to a VkVideoCapabilitiesKHR structure in which the capabilities are returned.
If the video profile described by pVideoProfile is supported by the implementation, then this command returns VK_SUCCESS and pCapabilities is filled with the capabilities supported with the specified video profile. Otherwise, one of the video-profile-specific error codes is returned.
The VkVideoCapabilitiesKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoCapabilitiesKHR {
    VkStructureType sType;
    void* pNext;
    VkVideoCapabilityFlagsKHR flags;
    VkDeviceSize minBitstreamBufferOffsetAlignment;
    VkDeviceSize minBitstreamBufferSizeAlignment;
    VkExtent2D pictureAccessGranularity;
    VkExtent2D minCodedExtent;
    VkExtent2D maxCodedExtent;
    uint32_t maxDpbSlots;
    uint32_t maxActiveReferencePictures;
    VkExtensionProperties stdHeaderVersion;
} VkVideoCapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoCapabilityFlagBitsKHR specifying capability flags.
- minBitstreamBufferOffsetAlignment is the minimum alignment for bitstream buffer offsets.
- minBitstreamBufferSizeAlignment is the minimum alignment for bitstream buffer range sizes.
- pictureAccessGranularity is the granularity at which image access to video picture resources happens.
- minCodedExtent is the minimum width and height of the coded frames.
- maxCodedExtent is the maximum width and height of the coded frames.
- maxDpbSlots is the maximum number of DPB slots supported by a single video session.
- maxActiveReferencePictures is the maximum number of active reference pictures a single video coding operation can use.
- stdHeaderVersion is a VkExtensionProperties structure reporting the Video Std header name and version supported for the video profile.
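Before creating a video session, an application may compare what it plans to use against the capabilities returned for the profile. The sketch below uses a hypothetical Caps struct that copies a subset of the VkVideoCapabilitiesKHR members, and a hypothetical SessionPlan. It checks the coded extent range, the DPB slot count, and the active reference picture count against the limits described above. It also rounds a bitstream range size up to minBitstreamBufferSizeAlignment, assuming that value is nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copy of a subset of VkVideoCapabilitiesKHR members. */
typedef struct {
    uint32_t minWidth, minHeight; /* minCodedExtent */
    uint32_t maxWidth, maxHeight; /* maxCodedExtent */
    uint32_t maxDpbSlots;
    uint32_t maxActiveReferencePictures;
} Caps;

/* Hypothetical description of what the application intends to use. */
typedef struct {
    uint32_t width, height;
    uint32_t dpbSlots;
    uint32_t activeReferencePictures;
} SessionPlan;

static bool plan_fits_caps(const SessionPlan *p, const Caps *c)
{
    return p->width  >= c->minWidth  && p->width  <= c->maxWidth  &&
           p->height >= c->minHeight && p->height <= c->maxHeight &&
           p->dpbSlots <= c->maxDpbSlots &&
           p->activeReferencePictures <= c->maxActiveReferencePictures;
}

/* Round a bitstream range size up to a multiple of the (nonzero)
 * minBitstreamBufferSizeAlignment value. */
static uint64_t align_bitstream_size(uint64_t size, uint64_t alignment)
{
    return (size + alignment - 1) / alignment * alignment;
}
```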
It is common for video compression standards to allow using all reference pictures associated with active DPB slots as active reference pictures, hence for video decode profiles the values returned in
Bits which can be set in VkVideoCapabilitiesKHR::flags are:
// Provided by VK_KHR_video_queue
typedef enum VkVideoCapabilityFlagBitsKHR {
    VK_VIDEO_CAPABILITY_PROTECTED_CONTENT_BIT_KHR = 0x00000001,
    VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR = 0x00000002,
} VkVideoCapabilityFlagBitsKHR;
- VK_VIDEO_CAPABILITY_PROTECTED_CONTENT_BIT_KHR specifies that video sessions support producing and consuming protected content.
- VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR indicates that the video picture resources associated with the DPB slots of a video session can be backed by separate VkImage objects. If this capability flag is not present, then all DPB slots of a video session must be associated with video picture resources backed by the same VkImage object (e.g. using different layers of the same image).
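The effect of VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR on how DPB slots may be backed can be sketched as a check over the images the application intends to use. The flag value is copied from the enumeration above. dpb_backing_allowed is hypothetical, and integer image ids stand in for VkImage handles.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_SEPARATE_REFERENCE_IMAGES_BIT 0x00000002u /* from the enum above */

/* imageIds[i] identifies the image backing the video picture resource of the
 * i-th DPB slot in use. Without the capability, every DPB slot must be
 * backed by the same image (for example, different layers of it). */
static bool dpb_backing_allowed(uint32_t capabilityFlags,
                                const uint64_t *imageIds, uint32_t count)
{
    if (capabilityFlags & CAP_SEPARATE_REFERENCE_IMAGES_BIT)
        return true;
    for (uint32_t i = 1; i < count; ++i)
        if (imageIds[i] != imageIds[0])
            return false;
    return true;
}
```

In practice this means an application that wants to support implementations without the flag should allocate its DPB as one layered image sized for the full slot count, rather than one image per slot.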
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoCapabilityFlagsKHR;
VkVideoCapabilityFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoCapabilityFlagBitsKHR.
Video Format Capabilities
To enumerate the supported video formats and corresponding capabilities for a specific video profile, call:
// Provided by VK_KHR_video_queue
VkResult vkGetPhysicalDeviceVideoFormatPropertiesKHR(
    VkPhysicalDevice physicalDevice,
    const VkPhysicalDeviceVideoFormatInfoKHR* pVideoFormatInfo,
    uint32_t* pVideoFormatPropertyCount,
    VkVideoFormatPropertiesKHR* pVideoFormatProperties);
- physicalDevice is the physical device from which to query the video format properties.
- pVideoFormatInfo is a pointer to a VkPhysicalDeviceVideoFormatInfoKHR structure specifying the usage and video profiles for which supported image formats and capabilities are returned.
- pVideoFormatPropertyCount is a pointer to an integer related to the number of video format properties available or queried, as described below.
- pVideoFormatProperties is a pointer to an array of VkVideoFormatPropertiesKHR structures in which supported image formats and capabilities are returned.
If pVideoFormatProperties is NULL, then the number of video format properties supported for the given physicalDevice is returned in pVideoFormatPropertyCount. Otherwise, pVideoFormatPropertyCount must point to a variable set by the application to the number of elements in the pVideoFormatProperties array, and on return the variable is overwritten with the number of values actually written to pVideoFormatProperties. If the value of pVideoFormatPropertyCount is less than the number of video format properties supported, at most pVideoFormatPropertyCount values will be written to pVideoFormatProperties, and VK_INCOMPLETE will be returned instead of VK_SUCCESS, to indicate that not all the available values were returned.
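The count/array protocol above is usually driven with a two-call idiom: query the count, allocate, then fetch. The sketch below models that protocol without a Vulkan device. query_formats is a hypothetical stand-in for vkGetPhysicalDeviceVideoFormatPropertiesKHR that returns plain integers instead of VkVideoFormatPropertiesKHR structures, and SKETCH_SUCCESS / SKETCH_INCOMPLETE stand in for VK_SUCCESS / VK_INCOMPLETE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in result codes; real code uses VkResult values from the Vulkan headers. */
typedef enum { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 1 } SketchResult;

/* Hypothetical list of supported "format properties" (integers here). */
static const int kSupported[] = { 10, 20, 30 };

/* Hypothetical stand-in that follows the same count/array protocol. */
static SketchResult query_formats(uint32_t *pCount, int *pProps) {
    const uint32_t available = (uint32_t)(sizeof kSupported / sizeof kSupported[0]);
    if (pProps == NULL) {            /* first call: report how many exist */
        *pCount = available;
        return SKETCH_SUCCESS;
    }
    uint32_t n = *pCount < available ? *pCount : available;
    for (uint32_t i = 0; i < n; ++i)
        pProps[i] = kSupported[i];
    *pCount = n;                     /* overwritten with the number written */
    return n < available ? SKETCH_INCOMPLETE : SKETCH_SUCCESS;
}

/* Caller side: query the count, allocate an array, then fetch the entries. */
static int *get_all_formats(uint32_t *outCount) {
    uint32_t count = 0;
    query_formats(&count, NULL);
    int *props = calloc(count ? count : 1, sizeof *props);
    if (props == NULL)
        return NULL;
    SketchResult r = query_formats(&count, props);
    assert(r == SKETCH_SUCCESS);     /* the array was sized from the first call */
    (void)r;
    *outCount = count;
    return props;
}
```

With the real command, each VkVideoFormatPropertiesKHR element in the array is an output structure, so its sType is expected to be set to VK_STRUCTURE_TYPE_VIDEO_FORMAT_PROPERTIES_KHR before the second call.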
Video format properties are always queried with respect to a specific set of video profiles. These are specified by chaining the VkVideoProfileListInfoKHR structure to pVideoFormatInfo.
For most use cases, the images are used by a single video session and a single video profile is provided. For a use case such as video transcoding, where a decode session output image can be used as encode input in one or more encode sessions, multiple video profiles corresponding to the video sessions that will share the image must be provided.
If any of the video profiles specified via VkVideoProfileListInfoKHR::pProfiles are not supported, then this command returns one of the video-profile-specific error codes. Furthermore, if VkPhysicalDeviceVideoFormatInfoKHR::imageUsage includes any image usage flags not supported by the specified video profiles, then this command returns VK_ERROR_IMAGE_USAGE_NOT_SUPPORTED_KHR.
This command also returns VK_ERROR_IMAGE_USAGE_NOT_SUPPORTED_KHR if VkPhysicalDeviceVideoFormatInfoKHR::imageUsage does not include the appropriate flags as dictated by the decode capability flags returned in VkVideoDecodeCapabilitiesKHR::flags for any of the profiles specified in the VkVideoProfileListInfoKHR structure provided in the pNext chain of pVideoFormatInfo.
If the decode capability flags include VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR but not VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR, then in order to query video format properties for decode DPB and output usage, VkPhysicalDeviceVideoFormatInfoKHR::imageUsage must include both VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR and VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR. Otherwise, the call will fail with VK_ERROR_IMAGE_USAGE_NOT_SUPPORTED_KHR.
If the decode capability flags include VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR but not VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR, then in order to query video format properties for decode DPB usage, VkPhysicalDeviceVideoFormatInfoKHR::imageUsage must include VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR, but not VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR. Otherwise, the call will fail with VK_ERROR_IMAGE_USAGE_NOT_SUPPORTED_KHR.
Similarly, to query video format properties for decode output usage, VkPhysicalDeviceVideoFormatInfoKHR::imageUsage must include VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR, but not VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR. Otherwise, the call will fail with VK_ERROR_IMAGE_USAGE_NOT_SUPPORTED_KHR.
The imageUsage member of the VkPhysicalDeviceVideoFormatInfoKHR structure specifies the expected video usage flags that the returned video formats must support. Correspondingly, the imageUsageFlags member of each VkVideoFormatPropertiesKHR structure returned will contain at least the same set of image usage flags. If the implementation supports using images of a particular format in operations other than video decode/encode, then the imageUsageFlags member of the corresponding VkVideoFormatPropertiesKHR structure returned will include additional image usage flags indicating that.
For most use cases, only decode or encode related usage flags are going to be specified. For a use case such as transcode, if the image were to be shared between decode and encode session(s), then both decode and encode related usage flags can be set.
Multiple VkVideoFormatPropertiesKHR entries may be returned with the same format member with different componentMapping, imageType, or imageTiling values, as described later.
If VkPhysicalDeviceVideoFormatInfoKHR::imageUsage includes VK_IMAGE_USAGE_VIDEO_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR or VK_IMAGE_USAGE_VIDEO_ENCODE_EMPHASIS_MAP_BIT_KHR, multiple VkVideoFormatPropertiesKHR entries may be returned with the same format, componentMapping, imageType, and imageTiling member values, but different quantizationMapTexelSize returned in the VkVideoFormatQuantizationMapPropertiesKHR structure, if one is included in the VkVideoFormatPropertiesKHR::pNext chain, when the queried quantization map type supports multiple distinct quantization map texel sizes.
In addition, a different set of VkVideoFormatPropertiesKHR entries may be returned depending on the imageUsage member of the VkPhysicalDeviceVideoFormatInfoKHR structure, even for the same set of video profiles, for example, based on whether encode input, encode DPB, decode output, and/or decode DPB usage is requested.
The application can select the parameters returned in the VkVideoFormatPropertiesKHR entries and use compatible parameters when creating the input, output, and DPB images. The implementation will report all image creation and usage flags that are valid for images used with the requested video profiles, but applications should create images only with those that are necessary for the particular use case.
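Selecting among the returned entries typically means matching a preferred format and tiling, and checking that the entry's imageUsageFlags cover the usage the application actually needs. The sketch below shows that selection step. FormatEntry and its fields are hypothetical simplifications of the VkVideoFormatPropertiesKHR members, with plain integers standing in for VkFormat and VkImageTiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of the VkVideoFormatPropertiesKHR members used here. */
typedef struct {
    int      format;      /* stands in for VkFormat        */
    int      tiling;      /* stands in for VkImageTiling   */
    uint32_t usageFlags;  /* stands in for imageUsageFlags */
} FormatEntry;

/* Returns the index of the first entry with the wanted format and tiling whose
 * usage flags include every requested bit, or -1 if no entry qualifies. */
static int pick_entry(const FormatEntry *entries, size_t count,
                      int wantFormat, int wantTiling, uint32_t wantUsage) {
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        const FormatEntry *e = &entries[i];
        if (e->format == wantFormat && e->tiling == wantTiling &&
            (e->usageFlags & wantUsage) == wantUsage)
            return (int)i;
    }
    return -1;
}
```

In line with the guidance above, the image would then be created with only wantUsage rather than the entry's full usageFlags.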
Before creating an image, the application can obtain the complete set of supported image format features by calling vkGetPhysicalDeviceImageFormatProperties2 using parameters derived from the members of one of the reported VkVideoFormatPropertiesKHR entries and adding the same VkVideoProfileListInfoKHR structure to the pNext chain of VkPhysicalDeviceImageFormatInfo2.
The following applies to all VkVideoFormatPropertiesKHR entries returned by vkGetPhysicalDeviceVideoFormatPropertiesKHR:
- vkGetPhysicalDeviceFormatProperties2 must succeed when called with VkVideoFormatPropertiesKHR::format.
- If VkVideoFormatPropertiesKHR::imageTiling is VK_IMAGE_TILING_OPTIMAL, then the optimalTilingFeatures returned by vkGetPhysicalDeviceFormatProperties2 must include all format features required by the image usage flags reported in VkVideoFormatPropertiesKHR::imageUsageFlags for the format, as indicated in the Format Feature Dependent Usage Flags section.
- If VkVideoFormatPropertiesKHR::imageTiling is VK_IMAGE_TILING_LINEAR, then the linearTilingFeatures returned by vkGetPhysicalDeviceFormatProperties2 must include all format features required by the image usage flags reported in VkVideoFormatPropertiesKHR::imageUsageFlags for the format, as indicated in the Format Feature Dependent Usage Flags section.
- vkGetPhysicalDeviceImageFormatProperties2 must succeed when called with a VkPhysicalDeviceImageFormatInfo2 structure containing the following information:
  - The pNext chain including the same VkVideoProfileListInfoKHR structure used to call vkGetPhysicalDeviceVideoFormatPropertiesKHR.
  - format set to the value of VkVideoFormatPropertiesKHR::format.
  - type set to the value of VkVideoFormatPropertiesKHR::imageType.
  - tiling set to the value of VkVideoFormatPropertiesKHR::imageTiling.
  - usage set to the value of VkVideoFormatPropertiesKHR::imageUsageFlags.
  - flags set to the value of VkVideoFormatPropertiesKHR::imageCreateFlags.
The componentMapping member of VkVideoFormatPropertiesKHR defines the ordering of the Y′CBCR color channels from the perspective of the video codec operations specified in VkVideoProfileListInfoKHR. For example, if the implementation produces video decode output with the format VK_FORMAT_G8_B8R8_2PLANE_420_UNORM where the blue and red chrominance channels are swapped, then the componentMapping member of the corresponding VkVideoFormatPropertiesKHR structure will have the following member values:
components.r = VK_COMPONENT_SWIZZLE_B; // Cb component
components.g = VK_COMPONENT_SWIZZLE_IDENTITY; // Y component
components.b = VK_COMPONENT_SWIZZLE_R; // Cr component
components.a = VK_COMPONENT_SWIZZLE_IDENTITY; // unused, defaults to 1.0
The VkPhysicalDeviceVideoFormatInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkPhysicalDeviceVideoFormatInfoKHR {
VkStructureType sType;
const void* pNext;
VkImageUsageFlags imageUsage;
} VkPhysicalDeviceVideoFormatInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- imageUsage is a bitmask of VkImageUsageFlagBits specifying the intended usage of the video images.
The VkVideoFormatPropertiesKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoFormatPropertiesKHR {
VkStructureType sType;
void* pNext;
VkFormat format;
VkComponentMapping componentMapping;
VkImageCreateFlags imageCreateFlags;
VkImageType imageType;
VkImageTiling imageTiling;
VkImageUsageFlags imageUsageFlags;
} VkVideoFormatPropertiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- format is a VkFormat that specifies the format that can be used with the specified video profiles and image usages.
- componentMapping defines the color channel order used for the format. format along with componentMapping describe how the color channels are ordered when producing video decoder output or are expected to be ordered in video encoder input, when applicable. If the format reported does not require component swizzling, then all members of componentMapping will be set to VK_COMPONENT_SWIZZLE_IDENTITY.
- imageCreateFlags is a bitmask of VkImageCreateFlagBits specifying the supported image creation flags for the format.
- imageType is a VkImageType that specifies the image type the format can be used with.
- imageTiling is a VkImageTiling that specifies the image tiling the format can be used with.
- imageUsageFlags is a bitmask of VkImageUsageFlagBits specifying the supported image usage flags for the format.
The list of supported video format properties for a set of image usage flags with respect to a video profile is defined as the list of VkVideoFormatPropertiesKHR structures and any structures included in its pNext chain, obtained by calling vkGetPhysicalDeviceVideoFormatPropertiesKHR with VkPhysicalDeviceVideoFormatInfoKHR::imageUsage equal to the VkImageUsageFlags in question and the VkPhysicalDeviceVideoFormatInfoKHR::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing a single array element specifying the VkVideoProfileInfoKHR structure chain describing the video profile in question.
Video Sessions
Video sessions are objects that represent and maintain the state needed to perform video decode or encode operations using a specific video profile.
In case of video encode profiles this includes the current rate control configuration and the currently set video encode quality level.
Video sessions are represented by VkVideoSessionKHR handles:
// Provided by VK_KHR_video_queue
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkVideoSessionKHR)
Creating a Video Session
To create a video session object, call:
// Provided by VK_KHR_video_queue
VkResult vkCreateVideoSessionKHR(
VkDevice device,
const VkVideoSessionCreateInfoKHR* pCreateInfo,
const VkAllocationCallbacks* pAllocator,
VkVideoSessionKHR* pVideoSession);
- device is the logical device that creates the video session.
- pCreateInfo is a pointer to a VkVideoSessionCreateInfoKHR structure containing parameters to be used to create the video session.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
- pVideoSession is a pointer to a VkVideoSessionKHR handle in which the resulting video session object is returned.
The resulting video session object is said to be created with the video codec operation specified in pCreateInfo->pVideoProfile->videoCodecOperation.
The name and version of the codec-specific Video Std header to be used with the video session is specified by the VkExtensionProperties structure pointed to by pCreateInfo->pStdHeaderVersion.
If a non-existent or unsupported Video Std header version is specified in pCreateInfo->pStdHeaderVersion->specVersion, then this command returns VK_ERROR_VIDEO_STD_VERSION_NOT_SUPPORTED_KHR.
Video session objects are created in uninitialized state. In order to transition the video session into initial state, the application must issue a vkCmdControlVideoCodingKHR command with VkVideoCodingControlInfoKHR::flags including VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR.
Video session objects also maintain the state of the DPB. The number of DPB slots usable with the created video session is specified in pCreateInfo->maxDpbSlots, and each slot is initially in the inactive state.
Each DPB slot maintained by the created video session can refer to a reference picture representing a video frame.
In addition, if the videoCodecOperation member of the VkVideoProfileInfoKHR structure pointed to by pCreateInfo->pVideoProfile is VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR and the pictureLayout member of the VkVideoDecodeH264ProfileInfoKHR structure provided in the VkVideoProfileInfoKHR::pNext chain is not VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_PROGRESSIVE_KHR, then the created video session supports interlaced frames, and each DPB slot maintained by the created video session can instead refer to separate top field and bottom field reference pictures that together can represent a full video frame.
In this case, it is up to the application, driven by the video content, whether it associates any individual DPB slot with separate top and/or bottom field pictures or a single picture representing a full frame.
The created video session can be used to perform video coding operations using video frames up to the maximum size specified in pCreateInfo->maxCodedExtent. The minimum frame size allowed is implicitly derived from VkVideoCapabilitiesKHR::minCodedExtent, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile specified by pCreateInfo->pVideoProfile. Accordingly, the created video session is said to be created with a minCodedExtent equal to that.
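Together, the two bounds above define the range of frame sizes a session accepts. The sketch below checks a frame size against them. Extent2D is a local stand-in for VkExtent2D, and the function name is hypothetical; the assertion encodes the assumption that the session's maxCodedExtent is not smaller than the profile's minCodedExtent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for VkExtent2D. */
typedef struct { uint32_t width, height; } Extent2D;

/* Returns true if a frame of size `frame` lies within [minExt, sessionMax] in
 * both dimensions, where minExt is the profile's minCodedExtent and sessionMax
 * is the maxCodedExtent the session was created with. */
static bool frame_size_ok(Extent2D frame, Extent2D minExt, Extent2D sessionMax) {
    assert(minExt.width <= sessionMax.width && minExt.height <= sessionMax.height);
    return frame.width  >= minExt.width     && frame.height >= minExt.height &&
           frame.width  <= sessionMax.width && frame.height <= sessionMax.height;
}
```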
In case of video session objects created with a video encode operation, implementations may return the VK_ERROR_INVALID_VIDEO_STD_PARAMETERS_KHR error if any of the specified Video Std parameters do not adhere to the syntactic or semantic requirements of the used video compression standard, or if values derived from parameters according to the rules defined by the used video compression standard do not adhere to the capabilities of the video compression standard or the implementation.
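Applications should not rely on the VK_ERROR_INVALID_VIDEO_STD_PARAMETERS_KHR error being returned by any command as a means to verify Video Std parameters, as implementations are not required to report the error in any specific set of cases.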
The VkVideoSessionCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoSessionCreateInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t queueFamilyIndex;
VkVideoSessionCreateFlagsKHR flags;
const VkVideoProfileInfoKHR* pVideoProfile;
VkFormat pictureFormat;
VkExtent2D maxCodedExtent;
VkFormat referencePictureFormat;
uint32_t maxDpbSlots;
uint32_t maxActiveReferencePictures;
const VkExtensionProperties* pStdHeaderVersion;
} VkVideoSessionCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- queueFamilyIndex is the index of the queue family the created video session will be used with.
- flags is a bitmask of VkVideoSessionCreateFlagBitsKHR specifying creation flags.
- pVideoProfile is a pointer to a VkVideoProfileInfoKHR structure specifying the video profile the created video session will be used with.
- pictureFormat is the image format the created video session will be used with. If pVideoProfile->videoCodecOperation specifies a decode operation, then pictureFormat is the image format of decode output pictures usable with the created video session. If pVideoProfile->videoCodecOperation specifies an encode operation, then pictureFormat is the image format of encode input pictures usable with the created video session.
- maxCodedExtent is the maximum width and height of the coded frames the created video session will be used with.
- referencePictureFormat is the image format of reference pictures stored in the DPB the created video session will be used with.
- maxDpbSlots is the maximum number of DPB Slots that can be used with the created video session.
- maxActiveReferencePictures is the maximum number of active reference pictures that can be used in a single video coding operation using the created video session.
- pStdHeaderVersion is a pointer to a VkExtensionProperties structure requesting the Video Std header version to use for the videoCodecOperation specified in pVideoProfile.
Bits which can be set in VkVideoSessionCreateInfoKHR::flags are:
// Provided by VK_KHR_video_queue
typedef enum VkVideoSessionCreateFlagBitsKHR {
VK_VIDEO_SESSION_CREATE_PROTECTED_CONTENT_BIT_KHR = 0x00000001,
// Provided by VK_KHR_video_encode_queue
VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_PARAMETER_OPTIMIZATIONS_BIT_KHR = 0x00000002,
// Provided by VK_KHR_video_maintenance1
VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR = 0x00000004,
// Provided by VK_KHR_video_encode_quantization_map
VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR = 0x00000008,
// Provided by VK_KHR_video_encode_quantization_map
VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_EMPHASIS_MAP_BIT_KHR = 0x00000010,
} VkVideoSessionCreateFlagBitsKHR;
- VK_VIDEO_SESSION_CREATE_PROTECTED_CONTENT_BIT_KHR specifies that the video session uses protected video content.
- VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_PARAMETER_OPTIMIZATIONS_BIT_KHR specifies that the implementation is allowed to override video session parameters and other codec-specific encoding parameters to optimize video encode operations based on the use case information specified in the video profile and the used video encode quality level. Not specifying VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_PARAMETER_OPTIMIZATIONS_BIT_KHR does not guarantee that the implementation will not do any codec-specific parameter overrides, as certain overrides are necessary for the correct operation of the video encoder implementation due to limitations to the available encoding tools on that implementation. This flag, however, enables the implementation to apply further optimizing overrides.
- VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR specifies that queries within video coding scopes using the created video session are executed inline with video coding operations.
- VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR specifies that the video session can be used to encode pictures with quantization delta maps.
- VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_EMPHASIS_MAP_BIT_KHR specifies that the video session can be used to encode pictures with emphasis maps.
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoSessionCreateFlagsKHR;
VkVideoSessionCreateFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoSessionCreateFlagBitsKHR.
Destroying a Video Session
To destroy a video session, call:
// Provided by VK_KHR_video_queue
void vkDestroyVideoSessionKHR(
VkDevice device,
VkVideoSessionKHR videoSession,
const VkAllocationCallbacks* pAllocator);
- device is the logical device that destroys the video session.
- videoSession is the video session to destroy.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
Video Session Memory Association
After creating a video session object, and before it can be used to record video coding operations into command buffers, the application must allocate and bind device memory to the video session. Device memory is allocated separately (see Device Memory) and then associated with the video session.
Video sessions may have multiple memory bindings identified by unique unsigned integer values. Appropriate device memory must be bound to each such memory binding before using the video session to record command buffer commands with it.
To determine the memory requirements for a video session object, call:
// Provided by VK_KHR_video_queue
VkResult vkGetVideoSessionMemoryRequirementsKHR(
VkDevice device,
VkVideoSessionKHR videoSession,
uint32_t* pMemoryRequirementsCount,
VkVideoSessionMemoryRequirementsKHR* pMemoryRequirements);
- device is the logical device that owns the video session.
- videoSession is the video session to query.
- pMemoryRequirementsCount is a pointer to an integer related to the number of memory binding requirements available or queried, as described below.
- pMemoryRequirements is NULL or a pointer to an array of VkVideoSessionMemoryRequirementsKHR structures in which the memory binding requirements of the video session are returned.
If pMemoryRequirements is NULL, then the number of memory bindings required for the video session is returned in pMemoryRequirementsCount. Otherwise, pMemoryRequirementsCount must point to a variable set by the application to the number of elements in the pMemoryRequirements array, and on return the variable is overwritten with the number of memory binding requirements actually written to pMemoryRequirements. If pMemoryRequirementsCount is less than the number of memory bindings required for the video session, then at most pMemoryRequirementsCount elements will be written to pMemoryRequirements, and VK_INCOMPLETE will be returned, instead of VK_SUCCESS, to indicate that not all required memory binding requirements were returned.
The VkVideoSessionMemoryRequirementsKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoSessionMemoryRequirementsKHR {
VkStructureType sType;
void* pNext;
uint32_t memoryBindIndex;
VkMemoryRequirements memoryRequirements;
} VkVideoSessionMemoryRequirementsKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- memoryBindIndex is the index of the memory binding.
- memoryRequirements is a VkMemoryRequirements structure in which the requested memory binding requirements for the binding index specified by memoryBindIndex are returned.
To attach memory to a video session object, call:
// Provided by VK_KHR_video_queue
VkResult vkBindVideoSessionMemoryKHR(
VkDevice device,
VkVideoSessionKHR videoSession,
uint32_t bindSessionMemoryInfoCount,
const VkBindVideoSessionMemoryInfoKHR* pBindSessionMemoryInfos);
- device is the logical device that owns the video session.
- videoSession is the video session to be bound with device memory.
- bindSessionMemoryInfoCount is the number of elements in pBindSessionMemoryInfos.
- pBindSessionMemoryInfos is a pointer to an array of bindSessionMemoryInfoCount VkBindVideoSessionMemoryInfoKHR structures specifying memory regions to be bound to specific memory bindings of the video session.
The valid usage statements below refer to the VkMemoryRequirements structure corresponding to a specific element of pBindSessionMemoryInfos, which is defined as follows:
- If the memoryBindIndex member of the element of pBindSessionMemoryInfos in question matches the memoryBindIndex member of one of the elements returned in pMemoryRequirements when vkGetVideoSessionMemoryRequirementsKHR is called with the same videoSession and with pMemoryRequirementsCount equal to bindSessionMemoryInfoCount, then the memoryRequirements member of that element of pMemoryRequirements is the VkMemoryRequirements structure corresponding to the element of pBindSessionMemoryInfos in question.
- Otherwise, the element of pBindSessionMemoryInfos in question is said to not have a corresponding VkMemoryRequirements structure.
The VkBindVideoSessionMemoryInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkBindVideoSessionMemoryInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t memoryBindIndex;
VkDeviceMemory memory;
VkDeviceSize memoryOffset;
VkDeviceSize memorySize;
} VkBindVideoSessionMemoryInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- memoryBindIndex is the memory binding index to bind memory to.
- memory is the allocated device memory to be bound to the video session’s memory binding with index memoryBindIndex.
- memoryOffset is the start offset of the region of memory which is to be bound.
- memorySize is the size in bytes of the region of memory, starting from memoryOffset bytes, to be bound.
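One possible strategy for satisfying all memory bindings is to suballocate them from a single allocation. The sketch below plans such a layout from the returned requirements: each binding's offset is rounded up to its alignment, and the usable memory types are the intersection of every binding's memoryTypeBits. Req and Bind are hypothetical stand-ins for the relevant fields of VkVideoSessionMemoryRequirementsKHR and VkBindVideoSessionMemoryInfoKHR. Using one allocation is a design choice of this sketch, not a requirement; if no common memory type exists, separate allocations are needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fields this sketch needs. */
typedef struct { uint32_t bindIndex; uint64_t size, alignment; uint32_t memoryTypeBits; } Req;
typedef struct { uint32_t bindIndex; uint64_t offset, size; } Bind;

/* Rounds v up to the next multiple of a (a > 0). */
static uint64_t align_up(uint64_t v, uint64_t a) { return (v + a - 1) / a * a; }

/* Lays out all bindings back to back in one allocation. Returns the total
 * allocation size, or 0 if no memory type is acceptable to every binding.
 * *typeBits receives the memory type bits usable for the shared allocation. */
static uint64_t plan_single_allocation(const Req *reqs, uint32_t n,
                                       Bind *binds, uint32_t *typeBits) {
    uint64_t offset = 0;
    uint32_t common = UINT32_MAX;
    for (uint32_t i = 0; i < n; ++i) {
        assert(reqs[i].alignment != 0);
        common &= reqs[i].memoryTypeBits;
        offset = align_up(offset, reqs[i].alignment);
        binds[i].bindIndex = reqs[i].bindIndex;
        binds[i].offset = offset;
        binds[i].size = reqs[i].size;
        offset += reqs[i].size;
    }
    *typeBits = common;
    return common != 0 ? offset : 0;
}
```

In real code, the returned size and a memory type index chosen from typeBits would feed vkAllocateMemory, and each Bind would become one VkBindVideoSessionMemoryInfoKHR element (memoryBindIndex, memory, memoryOffset, memorySize) passed to vkBindVideoSessionMemoryKHR.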
Video Profile Compatibility
Resources and query pools used with a particular video session must be compatible with the video profile the video session was created with.
A VkBuffer is compatible with a video profile if it was created with the VkBufferCreateInfo::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing an element matching the VkVideoProfileInfoKHR structure chain describing the video profile, and VkBufferCreateInfo::usage including at least one bit specific to video coding usage:
- VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_DECODE_DST_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
- VK_BUFFER_USAGE_VIDEO_ENCODE_DST_BIT_KHR
A VkBuffer is also compatible with a video profile if it was created with VkBufferCreateInfo::flags including VK_BUFFER_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR.
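The two buffer compatibility rules above can be expressed as a single predicate. In the sketch below, video profiles are reduced to integer IDs, whereas the real rule compares full VkVideoProfileInfoKHR structure chains. The bit values and the BufferDesc type are hypothetical stand-ins for the Vulkan usage and create flags named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bits; real code uses VK_BUFFER_USAGE_VIDEO_*_BIT_KHR and
 * VK_BUFFER_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR. */
#define USAGE_DECODE_SRC 0x1u
#define USAGE_DECODE_DST 0x2u
#define USAGE_ENCODE_SRC 0x4u
#define USAGE_ENCODE_DST 0x8u
#define VIDEO_USAGE_MASK (USAGE_DECODE_SRC | USAGE_DECODE_DST | USAGE_ENCODE_SRC | USAGE_ENCODE_DST)
#define CREATE_PROFILE_INDEPENDENT 0x1u

/* Hypothetical summary of how a buffer was created. */
typedef struct {
    uint32_t usage, flags;
    const int *profiles;   /* profile IDs from the VkVideoProfileListInfoKHR */
    size_t profileCount;
} BufferDesc;

static bool buffer_compatible(const BufferDesc *b, int profile) {
    assert(b != NULL);
    if (b->flags & CREATE_PROFILE_INDEPENDENT)
        return true;                          /* second rule */
    if ((b->usage & VIDEO_USAGE_MASK) == 0)
        return false;                         /* needs a video usage bit */
    for (size_t i = 0; i < b->profileCount; ++i)
        if (b->profiles[i] == profile)
            return true;                      /* profile listed at creation */
    return false;
}
```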
A VkImage is compatible with a video profile if it was created with the VkImageCreateInfo::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing an element matching the VkVideoProfileInfoKHR structure chain describing the video profile, and VkImageCreateInfo::usage including at least one bit specific to video coding usage:
- VK_IMAGE_USAGE_VIDEO_DECODE_SRC_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_SRC_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_DST_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_DPB_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR
- VK_IMAGE_USAGE_VIDEO_ENCODE_EMPHASIS_MAP_BIT_KHR
A VkImage is also compatible with a video profile if all of the following conditions are true for the VkImageCreateInfo structure the image was created with:
- VkImageCreateInfo::flags included VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR.
- The list of VkVideoFormatPropertiesKHR structures, obtained by calling vkGetPhysicalDeviceVideoFormatPropertiesKHR with VkPhysicalDeviceVideoFormatInfoKHR::imageUsage equal to the VkImageCreateInfo::usage the image was created with and the VkPhysicalDeviceVideoFormatInfoKHR::pNext chain including a VkVideoProfileListInfoKHR structure with its pProfiles member containing a single array element specifying the VkVideoProfileInfoKHR structure chain describing the video profile in question, contains an element for which all of the following conditions are true with respect to the VkImageCreateInfo structure the image was created with:
  - VkImageCreateInfo::format equals VkVideoFormatPropertiesKHR::format.
  - VkImageCreateInfo::flags only contains VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR and/or bits also set in VkVideoFormatPropertiesKHR::imageCreateFlags. Specifying VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR when creating decode output pictures or encode input pictures is always supported when the videoMaintenance1 feature is enabled, regardless of the supported VkImageCreateFlags reported in VkVideoFormatPropertiesKHR::imageCreateFlags. Accordingly, implementations should not report VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR in VkVideoFormatPropertiesKHR::imageCreateFlags for any video format.
  - VkImageCreateInfo::imageType equals VkVideoFormatPropertiesKHR::imageType.
  - VkImageCreateInfo::tiling equals VkVideoFormatPropertiesKHR::imageTiling.
  - VkImageCreateInfo::usage only contains bits also set in VkVideoFormatPropertiesKHR::imageUsageFlags, or VkImageCreateInfo::flags includes VK_IMAGE_CREATE_EXTENDED_USAGE_BIT.
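The profile-independent image rule above amounts to "the independent flag is set, and some returned format entry matches". The sketch below encodes those conditions. ImageDesc is a hypothetical type holding the fields shared by VkImageCreateInfo and a VkVideoFormatPropertiesKHR entry, the create-flag values are local stand-ins, and the entries array is assumed to be the list already queried with imageUsage equal to the image's usage and the single profile in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in create-flag bits; real code uses VK_IMAGE_CREATE_VIDEO_PROFILE_INDEPENDENT_BIT_KHR
 * and VK_IMAGE_CREATE_EXTENDED_USAGE_BIT. */
#define CREATE_PROFILE_INDEPENDENT 0x1u
#define CREATE_EXTENDED_USAGE      0x2u

/* Hypothetical: fields shared by the image create info and a format entry. */
typedef struct { int format, imageType, tiling; uint32_t createFlags, usage; } ImageDesc;

/* Checks the per-entry conditions listed above. */
static bool entry_matches(const ImageDesc *img, const ImageDesc *entry) {
    uint32_t extraFlags = img->createFlags & ~CREATE_PROFILE_INDEPENDENT;
    bool usageOk = (img->usage & ~entry->usage) == 0 ||
                   (img->createFlags & CREATE_EXTENDED_USAGE) != 0;
    return img->format == entry->format &&
           (extraFlags & ~entry->createFlags) == 0 &&
           img->imageType == entry->imageType &&
           img->tiling == entry->tiling && usageOk;
}

static bool independent_image_compatible(const ImageDesc *img,
                                         const ImageDesc *entries, size_t n) {
    assert(img != NULL);
    if ((img->createFlags & CREATE_PROFILE_INDEPENDENT) == 0)
        return false;
    for (size_t i = 0; i < n; ++i)
        if (entry_matches(img, &entries[i]))
            return true;
    return false;
}
```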
While some of these rules allow creating buffer or image resources that may be compatible with any video profile, applications should still prefer to include the specific video profiles the buffer or image resource is expected to be used with (through a VkVideoProfileListInfoKHR structure included in the pNext chain of the corresponding create info structure) whenever that information is available at resource creation time.
A VkImageView is compatible with a video profile if the VkImage it was created from is also compatible with that video profile.
A VkQueryPool is compatible with a video profile if it was created with the VkQueryPoolCreateInfo::pNext chain including a VkVideoProfileInfoKHR structure chain describing the same video profile, and VkQueryPoolCreateInfo::queryType having one of the following values:
- VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR
- VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR
Video Session Parameters
Video session parameters objects can store preprocessed codec-specific parameters used with a compatible video session, and enable reducing the number of parameters needed to be provided and processed by the implementation while recording video coding operations into command buffers.
Parameters stored in such objects are immutable to facilitate the concurrent use of the stored parameters in multiple threads. At the same time, new parameters can be added to existing objects using the vkUpdateVideoSessionParametersKHR command.
In order to support concurrent use of the stored immutable parameters while also allowing the video session parameters object to be extended with new parameters, each video session parameters object maintains an update sequence counter that is set to 0 at object creation time and must be incremented by each subsequent update operation.
Certain video sequences that adhere to particular video compression standards permit updating previously supplied parameters. If a parameter update is necessary, the application has the following options:
- Cache the set of parameters on the application side and create a new video session parameters object adding all the parameters with appropriate changes, as necessary; or
- Create a new video session parameters object providing only the updated parameters and the previously used object as the template, which ensures that parameters not specified at creation time will be copied unmodified from the template object.
The actual types of parameters that can be stored, the capacity for individual parameter types, and the methods of initializing, updating, and referring to individual parameters are specific to the video codec operation the video session parameters object was created with.
- For VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR these are defined in the H.264 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR these are defined in the H.265 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR these are defined in the AV1 Decode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR these are defined in the H.264 Encode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR these are defined in the H.265 Encode Parameter Sets section.
- For VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR these are defined in the AV1 Encode Parameter Sets section.
Video session parameters objects created with an encode operation are further specialized based on the video encode quality level the video session parameters are used with, as implementations may apply different sets of parameter overrides depending on the used quality level. This enables implementations to store the potentially optimized set of parameters in these objects, further limiting the necessary processing required while recording video encode operations into command buffers.
Video session parameters are represented by VkVideoSessionParametersKHR handles:
// Provided by VK_KHR_video_queue
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkVideoSessionParametersKHR)
Creating Video Session Parameters
To create a video session parameters object, call:
// Provided by VK_KHR_video_queue
VkResult vkCreateVideoSessionParametersKHR(
VkDevice device,
const VkVideoSessionParametersCreateInfoKHR* pCreateInfo,
const VkAllocationCallbacks* pAllocator,
VkVideoSessionParametersKHR* pVideoSessionParameters);
- device is the logical device that creates the video session parameters object.
- pCreateInfo is a pointer to a VkVideoSessionParametersCreateInfoKHR structure containing parameters to be used to create the video session parameters object.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
- pVideoSessionParameters is a pointer to a VkVideoSessionParametersKHR handle in which the resulting video session parameters object is returned.
The resulting video session parameters object is said to be created with the video codec operation that pCreateInfo->videoSession was created with.
Video session parameters objects created with an encode operation are always created with respect to a video encode quality level. By default, they are created with quality level zero, unless a VkVideoEncodeQualityLevelInfoKHR structure is included in the pCreateInfo->pNext chain, in which case the video session parameters object is created with the quality level specified in VkVideoEncodeQualityLevelInfoKHR::qualityLevel.
If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then it will be used as a template for constructing the new video session parameters object. This happens by first adding any parameters according to the additional creation parameters provided in the pCreateInfo->pNext chain, followed by adding any parameters from the template object that have a key that does not match the key of any of the already added parameters.
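The two-step construction can be modeled in plain C as a keyed merge in which explicitly added entries take precedence. This is a simplified, hypothetical model (Entry, contains_key, and build_from_template are illustrative names); real objects store Std parameter structures keyed by their codec-specific IDs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each entry is reduced to the key that identifies it
 * (e.g. seq_parameter_set_id for an H.264 SPS) and a payload tag standing in
 * for the rest of the Std structure. */
typedef struct {
    unsigned key;
    int payload;
} Entry;

static bool contains_key(const Entry *list, size_t count, unsigned key)
{
    for (size_t i = 0; i < count; ++i)
        if (list[i].key == key)
            return true;
    return false;
}

/* Added entries go in first; then each template entry is copied only if no
 * entry with the same key is already present. Returns the entry count
 * written to `out` (never more than outCap). */
static size_t build_from_template(const Entry *added, size_t addedCount,
                                  const Entry *tmpl, size_t tmplCount,
                                  Entry *out, size_t outCap)
{
    size_t n = 0;
    for (size_t i = 0; i < addedCount && n < outCap; ++i)
        out[n++] = added[i];
    for (size_t i = 0; i < tmplCount && n < outCap; ++i)
        if (!contains_key(out, n, tmpl[i].key))
            out[n++] = tmpl[i];
    return n;
}
```

With added entries for keys 1 and 3 and a template holding keys 1, 2, and 3, the result holds the newly added versions of keys 1 and 3 followed by the template's key 2. This is how an application can replace a few parameter sets while inheriting the rest unchanged.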
For video session parameters objects created with an encode operation, the template object specified in pCreateInfo->videoSessionParametersTemplate must have been created with the same video encode quality level as the newly created object.
This means that codec-specific parameters stored in video session parameters objects can only be reused across different video encode quality levels by re-specifying them. Video session parameters objects previously created against other quality levels cannot be used as a template, because the original codec-specific parameters (before the implementation may have applied parameter overrides) may no longer be available in them for the purposes of constructing the derived object.
Video session parameters objects are only compatible with quantization maps if they are created with pCreateInfo->flags including VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR.
Video session parameters objects created with VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR against a video session object that was created with VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR or VK_VIDEO_SESSION_CREATE_ALLOW_ENCODE_EMPHASIS_MAP_BIT_KHR are created with a specific compatible quantization map texel size, specified in the quantizationMapTexelSize member of the VkVideoEncodeQuantizationMapSessionParametersCreateInfoKHR structure included in the pNext chain of pCreateInfo.
This means that the quantization map texel size that such a video session parameters object is compatible with is fixed for the lifetime of the object. Applications have to create separate video session parameters objects to use different quantization map texel sizes with a single video session object. This is necessary because the quantization map texel size used may affect the parameter overrides the implementation has to perform, and thus the final values of the codec-specific parameters.
For video session parameters objects created with VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR, the template object specified in pCreateInfo->videoSessionParametersTemplate must also have been created with VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR and the same compatible quantization map texel size specified in VkVideoEncodeQuantizationMapSessionParametersCreateInfoKHR::quantizationMapTexelSize.
This means that codec-specific parameters stored in video session parameters objects can only be reused with different quantization map texel sizes by re-specifying them. Video session parameters objects previously created against other quantization map texel sizes cannot be used as a template, because the original codec-specific parameters (before the implementation may have applied parameter overrides) may no longer be available in them for the purposes of constructing the derived object.
For video session parameters objects created without VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR, the template object specified in pCreateInfo->videoSessionParametersTemplate must also have been created without VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR.
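The template rules for encode objects (same quality level, same quantization map compatibility, and, when compatible, the same texel size) can be summarized as a single predicate. The sketch below is a hypothetical application-side check (EncodeParamsDesc and template_is_compatible are illustrative names, not API). It simplifies by assuming the texel size is always meaningful for compatible objects, whereas the text above ties it to sessions created with the delta map or emphasis map flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of how an encode parameters object was created. */
typedef struct {
    uint32_t qualityLevel;       /* VkVideoEncodeQualityLevelInfoKHR, or 0 */
    bool     quantMapCompatible; /* ..._QUANTIZATION_MAP_COMPATIBLE_BIT_KHR */
    uint32_t texelWidth;         /* quantizationMapTexelSize.width */
    uint32_t texelHeight;        /* quantizationMapTexelSize.height */
} EncodeParamsDesc;

/* True if `tmpl` may be used as the template when creating `created`. */
static bool template_is_compatible(const EncodeParamsDesc *created,
                                   const EncodeParamsDesc *tmpl)
{
    if (created->qualityLevel != tmpl->qualityLevel)
        return false;
    if (created->quantMapCompatible != tmpl->quantMapCompatible)
        return false;
    if (created->quantMapCompatible &&
        (created->texelWidth != tmpl->texelWidth ||
         created->texelHeight != tmpl->texelHeight))
        return false;
    return true;
}
```

An application tracking such descriptors alongside its handles can use this check to decide whether to derive from an existing object or to re-specify all codec-specific parameters.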
If pCreateInfo->videoSession was created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then the created video session parameters object will initially contain the following sets of parameter entries:
- StdVideoH264SequenceParameterSet structures representing H.264 SPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH264SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH264SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264SequenceParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same seq_parameter_set_id.
- StdVideoH264PictureParameterSet structures representing H.264 PPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH264SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH264PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264PictureParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same seq_parameter_set_id and pic_parameter_set_id.
If pCreateInfo->videoSession was created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then the created video session parameters object will initially contain the following sets of parameter entries:
- StdVideoH265VideoParameterSet structures representing H.265 VPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH265SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH265VideoParameterSet entries specified in pParametersAddInfo->pStdVPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265VideoParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same vps_video_parameter_set_id.
- StdVideoH265SequenceParameterSet structures representing H.265 SPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH265SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH265SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265SequenceParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same sps_video_parameter_set_id and sps_seq_parameter_set_id.
- StdVideoH265PictureParameterSet structures representing H.265 PPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH265SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH265PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265PictureParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id.
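Unlike H.264 SPS entries, H.265 SPS and PPS entries are identified by a composite key, so two PPS entries with the same pps_pic_parameter_set_id but a different SPS or VPS ID are distinct. The following hypothetical sketch (H265PpsKey and h265_template_pps_copied are illustrative names, not API) checks whether a template PPS entry would be copied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key of an H.265 PPS entry: the IDs that identify it. */
typedef struct {
    uint8_t vps_id; /* sps_video_parameter_set_id */
    uint8_t sps_id; /* pps_seq_parameter_set_id */
    uint8_t pps_id; /* pps_pic_parameter_set_id */
} H265PpsKey;

static bool h265_pps_key_equal(H265PpsKey a, H265PpsKey b)
{
    return a.vps_id == b.vps_id && a.sps_id == b.sps_id && a.pps_id == b.pps_id;
}

/* True if a template PPS entry with key `candidate` would be copied, i.e. no
 * entry already present in the created object has the same full key. */
static bool h265_template_pps_copied(const H265PpsKey *present, size_t count,
                                     H265PpsKey candidate)
{
    for (size_t i = 0; i < count; ++i)
        if (h265_pps_key_equal(present[i], candidate))
            return false;
    return true;
}
```

The same pattern applies to SPS entries with the pair (sps_video_parameter_set_id, sps_seq_parameter_set_id), and to VPS entries with vps_video_parameter_set_id alone.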
If pCreateInfo->videoSession was created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, then the created video session parameters object will contain a single AV1 sequence header represented by a StdVideoAV1SequenceHeader structure specified through the pStdSequenceHeader member of the VkVideoDecodeAV1SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain.
Because such video session parameters objects can only contain a single AV1 sequence header, it is not possible to use a previously created object as a template or to subsequently update the created video session parameters object.
If pCreateInfo->videoSession was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then the created video session parameters object will initially contain the following sets of parameter entries:
- StdVideoH264SequenceParameterSet structures representing H.264 SPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH264SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH264SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264SequenceParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same seq_parameter_set_id.
- StdVideoH264PictureParameterSet structures representing H.264 PPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH264SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH264PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264PictureParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same seq_parameter_set_id and pic_parameter_set_id.
If pCreateInfo->videoSession was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then the created video session parameters object will initially contain the following sets of parameter entries:
- StdVideoH265VideoParameterSet structures representing H.265 VPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH265SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH265VideoParameterSet entries specified in pParametersAddInfo->pStdVPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265VideoParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same vps_video_parameter_set_id.
- StdVideoH265SequenceParameterSet structures representing H.265 SPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH265SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH265SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265SequenceParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same sps_video_parameter_set_id and sps_seq_parameter_set_id.
- StdVideoH265PictureParameterSet structures representing H.265 PPS entries, as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH265SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain is not NULL, then the set of StdVideoH265PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added first;
  - If pCreateInfo->videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265PictureParameterSet entry stored in it is copied to the created video session parameters object if the created object does not already contain such an entry with the same sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id.
If pCreateInfo->videoSession was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR, then the created video session parameters object will contain a single AV1 sequence header specified through the members of the VkVideoEncodeAV1SessionParametersCreateInfoKHR structure provided in the pCreateInfo->pNext chain.
Because such video session parameters objects can only contain a single AV1 sequence header, it is not possible to use a previously created object as a template or to subsequently update the created video session parameters object.
In case of video session parameters objects created with a video encode operation, implementations may return the VK_ERROR_INVALID_VIDEO_STD_PARAMETERS_KHR error if any of the specified Video Std parameters do not adhere to the syntactic or semantic requirements of the used video compression standard, or if values derived from parameters according to the rules defined by the used video compression standard do not adhere to the capabilities of the video compression standard or the implementation.
Applications should not rely on the
The VkVideoSessionParametersCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoSessionParametersCreateInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoSessionParametersCreateFlagsKHR flags;
VkVideoSessionParametersKHR videoSessionParametersTemplate;
VkVideoSessionKHR videoSession;
} VkVideoSessionParametersCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoSessionParametersCreateFlagBitsKHR specifying create flags.
- videoSessionParametersTemplate is VK_NULL_HANDLE or a valid handle to a VkVideoSessionParametersKHR object used as a template for constructing the new video session parameters object.
- videoSession is the video session object against which the video session parameters object is going to be created.
Limiting values are defined below that are referenced by the relevant valid usage statements of this structure.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then let StdVideoH264SequenceParameterSet spsAddList[] be the list of H.264 SPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH264SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH264SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added to spsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264SequenceParameterSet entry stored in it with seq_parameter_set_id not matching any of the entries already in spsAddList is added to spsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then let StdVideoH264PictureParameterSet ppsAddList[] be the list of H.264 PPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH264SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH264PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added to ppsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264PictureParameterSet entry stored in it with seq_parameter_set_id or pic_parameter_set_id not matching any of the entries already in ppsAddList is added to ppsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then let StdVideoH265VideoParameterSet vpsAddList[] be the list of H.265 VPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH265SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH265VideoParameterSet entries specified in pParametersAddInfo->pStdVPSs are added to vpsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265VideoParameterSet entry stored in it with vps_video_parameter_set_id not matching any of the entries already in vpsAddList is added to vpsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then let StdVideoH265SequenceParameterSet spsAddList[] be the list of H.265 SPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH265SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH265SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added to spsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265SequenceParameterSet entry stored in it with sps_video_parameter_set_id or sps_seq_parameter_set_id not matching any of the entries already in spsAddList is added to spsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then let StdVideoH265PictureParameterSet ppsAddList[] be the list of H.265 PPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoDecodeH265SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH265PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added to ppsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265PictureParameterSet entry stored in it with sps_video_parameter_set_id, pps_seq_parameter_set_id, or pps_pic_parameter_set_id not matching any of the entries already in ppsAddList is added to ppsAddList.
- If videoSession was created with an encode operation, then let uint32_t qualityLevel be the video encode quality level of the created video session parameters object, defined as follows:
  - If the pNext chain of this structure includes a VkVideoEncodeQualityLevelInfoKHR structure, then qualityLevel is equal to VkVideoEncodeQualityLevelInfoKHR::qualityLevel.
  - Otherwise qualityLevel is 0.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then let StdVideoH264SequenceParameterSet spsAddList[] be the list of H.264 SPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH264SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH264SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added to spsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264SequenceParameterSet entry stored in it with seq_parameter_set_id not matching any of the entries already in spsAddList is added to spsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then let StdVideoH264PictureParameterSet ppsAddList[] be the list of H.264 PPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH264SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH264PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added to ppsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH264PictureParameterSet entry stored in it with seq_parameter_set_id or pic_parameter_set_id not matching any of the entries already in ppsAddList is added to ppsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then let StdVideoH265VideoParameterSet vpsAddList[] be the list of H.265 VPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH265SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH265VideoParameterSet entries specified in pParametersAddInfo->pStdVPSs are added to vpsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265VideoParameterSet entry stored in it with vps_video_parameter_set_id not matching any of the entries already in vpsAddList is added to vpsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then let StdVideoH265SequenceParameterSet spsAddList[] be the list of H.265 SPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH265SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH265SequenceParameterSet entries specified in pParametersAddInfo->pStdSPSs are added to spsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265SequenceParameterSet entry stored in it with sps_video_parameter_set_id or sps_seq_parameter_set_id not matching any of the entries already in spsAddList is added to spsAddList.
- If videoSession was created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then let StdVideoH265PictureParameterSet ppsAddList[] be the list of H.265 PPS entries to add to the created video session parameters object, defined as follows:
  - If the pParametersAddInfo member of the VkVideoEncodeH265SessionParametersCreateInfoKHR structure provided in the pNext chain is not NULL, then the set of StdVideoH265PictureParameterSet entries specified in pParametersAddInfo->pStdPPSs are added to ppsAddList;
  - If videoSessionParametersTemplate is not VK_NULL_HANDLE, then each StdVideoH265PictureParameterSet entry stored in it with sps_video_parameter_set_id, pps_seq_parameter_set_id, or pps_pic_parameter_set_id not matching any of the entries already in ppsAddList is added to ppsAddList.
Bits which can be set in VkVideoSessionParametersCreateInfoKHR::flags are:
// Provided by VK_KHR_video_encode_quantization_map
typedef enum VkVideoSessionParametersCreateFlagBitsKHR {
// Provided by VK_KHR_video_encode_quantization_map
VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR = 0x00000001,
} VkVideoSessionParametersCreateFlagBitsKHR;
- VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR specifies that the created video session parameters object can be used with quantization maps.
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoSessionParametersCreateFlagsKHR;
VkVideoSessionParametersCreateFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoSessionParametersCreateFlagBitsKHR.
The VkVideoEncodeQuantizationMapSessionParametersCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_quantization_map
typedef struct VkVideoEncodeQuantizationMapSessionParametersCreateInfoKHR {
VkStructureType sType;
const void* pNext;
VkExtent2D quantizationMapTexelSize;
} VkVideoEncodeQuantizationMapSessionParametersCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- quantizationMapTexelSize specifies the quantization map texel size that a video session parameters object created with VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR is compatible with.
Destroying Video Session Parameters
To destroy a video session parameters object, call:
// Provided by VK_KHR_video_queue
void vkDestroyVideoSessionParametersKHR(
VkDevice device,
VkVideoSessionParametersKHR videoSessionParameters,
const VkAllocationCallbacks* pAllocator);
- device is the logical device that destroys the video session parameters object.
- videoSessionParameters is the video session parameters object to destroy.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
Updating Video Session Parameters
To update a video session parameters object with new parameters, call:
// Provided by VK_KHR_video_queue
VkResult vkUpdateVideoSessionParametersKHR(
VkDevice device,
VkVideoSessionParametersKHR videoSessionParameters,
const VkVideoSessionParametersUpdateInfoKHR* pUpdateInfo);
- device is the logical device that updates the video session parameters.
- videoSessionParameters is the video session parameters object to update.
- pUpdateInfo is a pointer to a VkVideoSessionParametersUpdateInfoKHR structure specifying the parameter update information.
After a successful call to this command, the update sequence counter of videoSessionParameters is changed to the value specified in pUpdateInfo->updateSequenceCount.
As each update issued to a video session parameters object needs to specify the next available update sequence count value, concurrent updates of the same video session parameters object are inherently disallowed. However, recording video coding operations to command buffers referring to parameters previously added to the video session parameters object is allowed, even if there is a concurrent update in progress adding some new entries to the object.
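Because each update has to name the next sequence count, an application typically mirrors the counter on its side. The following is a hypothetical sketch (ParamsUpdateTracker and its helpers are illustrative names, not API). It assumes a newly created object starts with an update sequence counter of 0, so the first update uses 1, and that the counter only changes when the update call succeeds, as stated above. Callers are assumed to serialize updates of the same object externally.

```c
#include <stdint.h>

/* Hypothetical application-side mirror of one parameters object's update
 * sequence counter; assumed to be 0 right after creation. */
typedef struct {
    uint32_t updateSequenceCount;
} ParamsUpdateTracker;

/* Value to place in VkVideoSessionParametersUpdateInfoKHR::updateSequenceCount
 * for the next update of the tracked object. */
static uint32_t next_update_sequence_count(const ParamsUpdateTracker *t)
{
    return t->updateSequenceCount + 1;
}

/* Record the outcome of vkUpdateVideoSessionParametersKHR: only a successful
 * call moves the counter to the value that was used. */
static void commit_update(ParamsUpdateTracker *t, uint32_t usedCount, int succeeded)
{
    if (succeeded)
        t->updateSequenceCount = usedCount;
}
```

Keeping the "next value" and "commit" steps separate makes the failure path explicit: after a failed update the same sequence count is reused for the retry.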
If videoSessionParameters was created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR and the pUpdateInfo->pNext chain includes a VkVideoDecodeH264SessionParametersAddInfoKHR structure, then this command adds the following parameter entries to videoSessionParameters:
- The H.264 SPS entries specified in VkVideoDecodeH264SessionParametersAddInfoKHR::pStdSPSs.
- The H.264 PPS entries specified in VkVideoDecodeH264SessionParametersAddInfoKHR::pStdPPSs.
If videoSessionParameters was created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR and the pUpdateInfo->pNext chain includes a VkVideoDecodeH265SessionParametersAddInfoKHR structure, then this command adds the following parameter entries to videoSessionParameters:
- The H.265 VPS entries specified in VkVideoDecodeH265SessionParametersAddInfoKHR::pStdVPSs.
- The H.265 SPS entries specified in VkVideoDecodeH265SessionParametersAddInfoKHR::pStdSPSs.
- The H.265 PPS entries specified in VkVideoDecodeH265SessionParametersAddInfoKHR::pStdPPSs.
If videoSessionParameters was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR and the pUpdateInfo->pNext chain includes a VkVideoEncodeH264SessionParametersAddInfoKHR structure, then this command adds the following parameter entries to videoSessionParameters:
- The H.264 SPS entries specified in VkVideoEncodeH264SessionParametersAddInfoKHR::pStdSPSs.
- The H.264 PPS entries specified in VkVideoEncodeH264SessionParametersAddInfoKHR::pStdPPSs.
If videoSessionParameters was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR and the pUpdateInfo->pNext chain includes a VkVideoEncodeH265SessionParametersAddInfoKHR structure, then this command adds the following parameter entries to videoSessionParameters:
- The H.265 VPS entries specified in VkVideoEncodeH265SessionParametersAddInfoKHR::pStdVPSs.
- The H.265 SPS entries specified in VkVideoEncodeH265SessionParametersAddInfoKHR::pStdSPSs.
- The H.265 PPS entries specified in VkVideoEncodeH265SessionParametersAddInfoKHR::pStdPPSs.
In case of video session parameters objects created with a video encode operation, implementations may return the VK_ERROR_INVALID_VIDEO_STD_PARAMETERS_KHR error if any of the specified Video Std parameters do not adhere to the syntactic or semantic requirements of the used video compression standard, or if values derived from parameters according to the rules defined by the used video compression standard do not adhere to the capabilities of the video compression standard or the implementation.
Applications should not rely on the
The VkVideoSessionParametersUpdateInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoSessionParametersUpdateInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t updateSequenceCount;
} VkVideoSessionParametersUpdateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- updateSequenceCount is the new update sequence count to set for the video session parameters object.
Video Coding Scope
Applications can record video coding commands for a video session only within a video coding scope.
To begin a video coding scope, call:
// Provided by VK_KHR_video_queue
void vkCmdBeginVideoCodingKHR(
VkCommandBuffer commandBuffer,
const VkVideoBeginCodingInfoKHR* pBeginInfo);
- commandBuffer is the command buffer in which to record the command.
- pBeginInfo is a pointer to a VkVideoBeginCodingInfoKHR structure specifying the parameters of the video coding scope, including the video session and video session parameters object to use.
After beginning a video coding scope, the video session object specified in pBeginInfo->videoSession is bound to the command buffer, and the command buffer is ready to record video coding operations. Similarly, if pBeginInfo->videoSessionParameters is not VK_NULL_HANDLE, it is also bound to the command buffer, and video coding operations can refer to the codec-specific parameters stored in it.
This command also establishes the set of bound reference picture resources that can be used as reconstructed pictures or reference pictures within the video coding scope. Each element of this set consists of a video picture resource and the DPB slot index associated with it, if there is one.
The set of bound reference picture resources is immutable within a video coding scope; however, the DPB slot index associated with any of the bound reference picture resources can change during the video coding scope in response to video coding operations.
The VkVideoReferenceSlotInfoKHR structures provided as the elements of pBeginInfo->pReferenceSlots are interpreted by this command as follows:
- If slotIndex is non-negative and pPictureResource is not NULL, then the video picture resource defined by the VkVideoPictureResourceInfoKHR structure pointed to by pPictureResource is added to the set of bound reference picture resources and is associated with the DPB slot index specified in slotIndex.
- If slotIndex is non-negative and pPictureResource is NULL, then the DPB slot with index slotIndex is deactivated by this command.
- If slotIndex is negative and pPictureResource is not NULL, then the video picture resource defined by the VkVideoPictureResourceInfoKHR structure pointed to by pPictureResource is added to the set of bound reference picture resources without an associated DPB slot. Such a picture resource can subsequently be used as a reconstructed picture to associate it with a DPB slot.
- If slotIndex is negative and pPictureResource is NULL, then the element is ignored.
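The four cases above amount to a small decision on two inputs. The following C sketch is illustrative only: RefSlotModel, RefSlotAction, and classify_begin_ref_slot are hypothetical names that model just the slotIndex and pPictureResource members of VkVideoReferenceSlotInfoKHR; they are not part of the Vulkan API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VkVideoReferenceSlotInfoKHR: only the two members
 * that drive the interpretation rules are modeled. In the real structure,
 * pPictureResource is a const VkVideoPictureResourceInfoKHR*. */
typedef struct {
    int32_t slotIndex;
    const void *pPictureResource;
} RefSlotModel;

/* Hypothetical names for the four outcomes listed above. */
typedef enum {
    REF_SLOT_BIND_WITH_DPB_SLOT,    /* bound and associated with slotIndex */
    REF_SLOT_DEACTIVATE_DPB_SLOT,   /* DPB slot slotIndex is deactivated */
    REF_SLOT_BIND_WITHOUT_DPB_SLOT, /* bound, but with no DPB slot association */
    REF_SLOT_IGNORED                /* element has no effect */
} RefSlotAction;

static RefSlotAction classify_begin_ref_slot(const RefSlotModel *slot) {
    assert(slot != NULL);
    const int hasResource = slot->pPictureResource != NULL;
    if (slot->slotIndex >= 0) {
        return hasResource ? REF_SLOT_BIND_WITH_DPB_SLOT
                           : REF_SLOT_DEACTIVATE_DPB_SLOT;
    }
    return hasResource ? REF_SLOT_BIND_WITHOUT_DPB_SLOT : REF_SLOT_IGNORED;
}
```

In this model, an element with a negative slotIndex but a picture resource is still bound; it only gains a DPB slot association later, when used as a reconstructed picture.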
It is possible for multiple bound reference picture resources to be associated with the same DPB slot index, or for a single bound reference picture to refer to multiple separate reference pictures. For example, in case of an H.264 decode profile with interlaced frame support a single DPB slot can refer to two separate pictures for the top and bottom fields. Depending on the picture layout used by the H.264 decode profile, the following special cases may arise:
All non-negative slotIndex values specified in the elements of pBeginInfo->pReferenceSlots must identify DPB slots of the video session that are in the active state at the time this command is executed on the device.
The application does not have to specify an entry in
In case of a video encode session, the application is also responsible for providing information about the current rate control state configured for the video session by including an instance of the VkVideoEncodeRateControlInfoKHR structure in the pNext chain of pBeginInfo. If no VkVideoEncodeRateControlInfoKHR is included, then the presence of an empty VkVideoEncodeRateControlInfoKHR structure is implied, which indicates that the current rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR. The specified state must match the effective rate control state configured for the video session at the time the recorded command is executed on the device.
Including an instance of the VkVideoEncodeRateControlInfoKHR structure in the
The VkVideoBeginCodingInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoBeginCodingInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoBeginCodingFlagsKHR flags;
VkVideoSessionKHR videoSession;
VkVideoSessionParametersKHR videoSessionParameters;
uint32_t referenceSlotCount;
const VkVideoReferenceSlotInfoKHR* pReferenceSlots;
} VkVideoBeginCodingInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is reserved for future use.
- videoSession is the video session object to be bound for the processing of the video commands.
- videoSessionParameters is VK_NULL_HANDLE or a handle of a VkVideoSessionParametersKHR object to be used for the processing of the video commands. If VK_NULL_HANDLE, then no video session parameters object is bound for the duration of the video coding scope.
- referenceSlotCount is the number of elements in the pReferenceSlots array.
- pReferenceSlots is a pointer to an array of VkVideoReferenceSlotInfoKHR structures specifying the information used to determine the set of bound reference picture resources for the video coding scope and their initial association with DPB slot indices.
Limiting values are defined below that are referenced by the relevant valid usage statements of this structure.
- Let VkOffset2D codedOffsetGranularity be the minimum alignment requirement for the coded offset of video picture resources. Unless otherwise defined, the values of the x and y members of codedOffsetGranularity are 0.
  - If videoSession was created with an H.264 decode profile with a VkVideoDecodeH264ProfileInfoKHR::pictureLayout of VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR, then codedOffsetGranularity is equal to VkVideoDecodeH264CapabilitiesKHR::fieldOffsetGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for that video profile.
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoBeginCodingFlagsKHR;
VkVideoBeginCodingFlagsKHR is a bitmask type for setting a mask, but is currently reserved for future use.
The VkVideoReferenceSlotInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoReferenceSlotInfoKHR {
VkStructureType sType;
const void* pNext;
int32_t slotIndex;
const VkVideoPictureResourceInfoKHR* pPictureResource;
} VkVideoReferenceSlotInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- slotIndex is the index of the DPB slot or a negative integer value.
- pPictureResource is NULL or a pointer to a VkVideoPictureResourceInfoKHR structure describing the video picture resource associated with the DPB slot index specified by slotIndex.
To end a video coding scope, call:
// Provided by VK_KHR_video_queue
void vkCmdEndVideoCodingKHR(
VkCommandBuffer commandBuffer,
const VkVideoEndCodingInfoKHR* pEndCodingInfo);
- commandBuffer is the command buffer in which to record the command.
- pEndCodingInfo is a pointer to a VkVideoEndCodingInfoKHR structure specifying the parameters for ending the video coding scope.
After ending a video coding scope, the video session object, the optional video session parameters object, and all reference picture resources previously bound by the corresponding vkCmdBeginVideoCodingKHR command are unbound.
The VkVideoEndCodingInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoEndCodingInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEndCodingFlagsKHR flags;
} VkVideoEndCodingInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is reserved for future use.
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoEndCodingFlagsKHR;
VkVideoEndCodingFlagsKHR is a bitmask type for setting a mask, but is currently reserved for future use.
Video Coding Control
To apply dynamic controls to the bound video session object, call:
// Provided by VK_KHR_video_queue
void vkCmdControlVideoCodingKHR(
VkCommandBuffer commandBuffer,
const VkVideoCodingControlInfoKHR* pCodingControlInfo);
- commandBuffer is the command buffer in which to record the command.
- pCodingControlInfo is a pointer to a VkVideoCodingControlInfoKHR structure specifying the control parameters.
The control parameters provided in this call are applied to the video session at the time the command executes on the device and are in effect until a subsequent call to this command with the same video session bound changes the corresponding control parameters.
A newly created video session must be reset before performing video coding operations using it by including VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR in pCodingControlInfo->flags. The reset operation also returns all DPB slots of the video session to the inactive state. Correspondingly, any DPB slot index associated with the bound reference picture resources is removed.
For encode sessions, the reset operation returns rate control configuration to implementation default settings and sets the video encode quality level to zero.
After video coding operations are performed using a video session, the reset operation can be used to return the video session to the same initial state as after the reset of a newly created video session. This can be used, for example, when different video sequences need to be processed with the same video session object.
If pCodingControlInfo->flags includes VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR, then the command replaces the rate control configuration maintained by the video session with the configuration specified in the VkVideoEncodeRateControlInfoKHR structure included in the pCodingControlInfo->pNext chain.
If pCodingControlInfo->flags includes VK_VIDEO_CODING_CONTROL_ENCODE_QUALITY_LEVEL_BIT_KHR, then the command changes the current video encode quality level to the value specified in the qualityLevel member of the VkVideoEncodeQualityLevelInfoKHR structure included in the pCodingControlInfo->pNext chain.
The VkVideoCodingControlInfoKHR structure is defined as:
// Provided by VK_KHR_video_queue
typedef struct VkVideoCodingControlInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoCodingControlFlagsKHR flags;
} VkVideoCodingControlInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoCodingControlFlagsKHR specifying control flags.
Bits which can be set in VkVideoCodingControlInfoKHR::flags, specifying the video coding control parameters to be modified, are:
// Provided by VK_KHR_video_queue
typedef enum VkVideoCodingControlFlagBitsKHR {
VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR = 0x00000001,
// Provided by VK_KHR_video_encode_queue
VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR = 0x00000002,
// Provided by VK_KHR_video_encode_queue
VK_VIDEO_CODING_CONTROL_ENCODE_QUALITY_LEVEL_BIT_KHR = 0x00000004,
} VkVideoCodingControlFlagBitsKHR;
- VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR specifies a request for the bound video session to be reset before other coding control parameters are applied.
- VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR specifies that the coding control parameters include video encode rate control parameters (see VkVideoEncodeRateControlInfoKHR).
- VK_VIDEO_CODING_CONTROL_ENCODE_QUALITY_LEVEL_BIT_KHR specifies that the coding control parameters include video encode quality level parameters (see VkVideoEncodeQualityLevelInfoKHR).
// Provided by VK_KHR_video_queue
typedef VkFlags VkVideoCodingControlFlagsKHR;
VkVideoCodingControlFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoCodingControlFlagBitsKHR.
Inline Queries
If a video session was created with VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR, beginning queries using commands such as vkCmdBeginQuery within a video coding scope is not allowed. Instead, queries are executed inline by including an instance of the VkVideoInlineQueryInfoKHR structure in the pNext chain of the parameters of one of the video coding commands, with its queryPool member set to a valid VkQueryPool handle.
The VkVideoInlineQueryInfoKHR structure is defined as:
// Provided by VK_KHR_video_maintenance1
typedef struct VkVideoInlineQueryInfoKHR {
VkStructureType sType;
const void* pNext;
VkQueryPool queryPool;
uint32_t firstQuery;
uint32_t queryCount;
} VkVideoInlineQueryInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- queryPool is VK_NULL_HANDLE or a valid handle to a VkQueryPool object that will manage the results of the queries.
- firstQuery is the query index within the query pool that will contain the query results for the first video coding operation. The query results of subsequent video coding operations will be contained by subsequent query indices.
- queryCount is the number of queries to execute.
In practice, if queryPool is not VK_NULL_HANDLE, then queryCount will always have to match the number of video coding operations issued by the video coding command this structure is specified for, meaning that using inline queries in a video coding command will always execute a query for each issued video coding operation.
This structure can be included in the pNext chain of the input parameter structure of video coding commands:
- In the pNext chain of the pDecodeInfo parameter of the vkCmdDecodeVideoKHR command, to execute a query for each video decode operation issued by the command.
- In the pNext chain of the pEncodeInfo parameter of the vkCmdEncodeVideoKHR command, to execute a query for each video encode operation issued by the command.
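The inline query bookkeeping above can be sketched in C, under the assumption that queryPool is represented by an opaque pointer, with NULL standing in for VK_NULL_HANDLE. InlineQueryModel and both functions are hypothetical; they only restate that each issued video coding operation gets its own query, starting at firstQuery, and that queryCount has to match the number of issued operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of VkVideoInlineQueryInfoKHR. */
typedef struct {
    const void *queryPool; /* NULL models VK_NULL_HANDLE */
    uint32_t firstQuery;
    uint32_t queryCount;
} InlineQueryModel;

/* True if the structure is usable with a command that issues opCount operations. */
static bool inline_queries_match(const InlineQueryModel *q, uint32_t opCount) {
    if (q->queryPool == NULL) {
        return true; /* no inline queries are executed */
    }
    return q->queryCount == opCount;
}

/* Query index that receives the results of the op-th video coding operation. */
static uint32_t inline_query_index(const InlineQueryModel *q, uint32_t op) {
    assert(q->queryPool != NULL && op < q->queryCount);
    return q->firstQuery + op;
}
```

Since vkCmdDecodeVideoKHR currently issues a single operation per call, a decode command using inline queries would pair with queryCount equal to 1 in this model.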
Video Decode Operations
Video decode operations consume compressed video data from a video bitstream buffer and zero or more reference pictures, and produce a decode output picture and an optional reconstructed picture.
Such decode output pictures can be shared with the Decoded Picture Buffer, and can also be used as the input of video encode operations, with graphics or compute operations, or with Window System Integration APIs, depending on the capabilities of the implementation.
Video decode operations may access the following resources in the VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR stage:
- The source video bitstream buffer range and the image subregions corresponding to the list of active reference pictures with access VK_ACCESS_2_VIDEO_DECODE_READ_BIT_KHR.
- The image subregions corresponding to the target decode output picture and reconstructed picture with access VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR.
The image subresource of each video picture resource accessed by the video decode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:
- If the image subresource is used in the video decode operation only as decode output picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR layout.
- If the image subresource is used in the video decode operation both as decode output picture and reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
- If the image subresource is used in the video decode operation only as reconstructed picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
- If the image subresource is used in the video decode operation as a reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR layout.
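These layout rules reduce to one check: any use as a reconstructed or reference picture requires VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR, and use solely as a decode output picture requires VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR. The C sketch below encodes that rule with hypothetical stand-in enumerants (DecodeLayoutModel) rather than the real VkImageLayout values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two VkImageLayout values named above. */
typedef enum {
    MODEL_LAYOUT_NOT_ACCESSED,
    MODEL_LAYOUT_VIDEO_DECODE_DST, /* VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR */
    MODEL_LAYOUT_VIDEO_DECODE_DPB  /* VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR */
} DecodeLayoutModel;

/* Layout an image subresource needs for one video decode operation,
 * given the roles it plays in that operation. */
static DecodeLayoutModel required_decode_layout(bool asDecodeOutput,
                                                bool asReconstructed,
                                                bool asReference) {
    if (asReconstructed || asReference) {
        /* Any DPB role, alone or coinciding with the output, needs the DPB layout. */
        return MODEL_LAYOUT_VIDEO_DECODE_DPB;
    }
    if (asDecodeOutput) {
        return MODEL_LAYOUT_VIDEO_DECODE_DST;
    }
    return MODEL_LAYOUT_NOT_ACCESSED;
}
```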
A video decode operation may complete unsuccessfully. In this case, the decode output picture will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.
Codec-Specific Semantics
The following aspects of video decode operations are codec-specific:
- The interpretation of the contents of the source video bitstream buffer range.
- The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
- The construction and interpretation of information related to the decode output picture and the generation of picture data to the corresponding image subregion.
- The decision on reference picture setup.
- The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.
These codec-specific behaviors are defined for each video codec operation separately.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.264 Decode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the H.265 Decode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, then the codec-specific aspects of the video decoding process are performed as defined in the AV1 Decode Operations section.
Video Decode Operation Steps
Each video decode operation performs the following steps in the VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR stage:
- Reads the encoded video data from the source video bitstream buffer range.
- Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process.
- Writes the decoded picture data to the decode output picture, and optionally to the reconstructed picture, if one is specified and is different from the decode output picture, according to the codec-specific semantics.
- If reference picture setup is requested, the DPB slot index specified in the reconstructed picture information is activated with the reconstructed picture.
When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.
Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR with pVideoProfile->videoCodecOperation specifying a decode operation, the VkVideoDecodeCapabilitiesKHR structure must be included in the pNext chain of the VkVideoCapabilitiesKHR structure to retrieve capabilities specific to video decoding.
The VkVideoDecodeCapabilitiesKHR structure is defined as:
// Provided by VK_KHR_video_decode_queue
typedef struct VkVideoDecodeCapabilitiesKHR {
VkStructureType sType;
void* pNext;
VkVideoDecodeCapabilityFlagsKHR flags;
} VkVideoDecodeCapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoDecodeCapabilityFlagBitsKHR describing the supported video decoding capabilities.
Bits which may be set in VkVideoDecodeCapabilitiesKHR::flags, indicating the decoding capabilities supported, are:
// Provided by VK_KHR_video_decode_queue
typedef enum VkVideoDecodeCapabilityFlagBitsKHR {
VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR = 0x00000001,
VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR = 0x00000002,
} VkVideoDecodeCapabilityFlagBitsKHR;
- VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR indicates support for using the same video picture resource as the reconstructed picture and decode output picture in a video decode operation.
- VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR indicates support for using distinct video picture resources as the reconstructed picture and decode output picture in a video decode operation.
Some video profiles allow using distinct video picture resources as the reconstructed picture and decode output picture in specific video decode operations even when the video decode profile does not support VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR. For example, even if the implementation only reports VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR, the decode output picture for frames with film grain enabled must be a different video picture resource from the reconstructed picture, because film grain is applied outside of the coding loop.
Implementations are only required to support one of VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR and VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR. Accordingly, applications should handle both cases to maximize portability.
If both
// Provided by VK_KHR_video_decode_queue
typedef VkFlags VkVideoDecodeCapabilityFlagsKHR;
VkVideoDecodeCapabilityFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoDecodeCapabilityFlagBitsKHR.
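Because only one of the two bits is guaranteed, an application needs a fallback path. One possible selection policy is sketched below. The bit values are copied from VkVideoDecodeCapabilityFlagBitsKHR, while DpbOutputMode, choose_dpb_output_mode, and the preferDistinct policy are hypothetical application-side choices. The sketch does not model per-operation exceptions such as frames with film grain enabled, which need distinct resources even when only the coincide mode is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from VkVideoDecodeCapabilityFlagBitsKHR. */
#define DECODE_CAP_DPB_AND_OUTPUT_COINCIDE 0x00000001u
#define DECODE_CAP_DPB_AND_OUTPUT_DISTINCT 0x00000002u

typedef enum {
    DPB_OUTPUT_MODE_NONE,     /* neither bit reported */
    DPB_OUTPUT_MODE_COINCIDE, /* one resource is both output and reconstructed picture */
    DPB_OUTPUT_MODE_DISTINCT  /* separate output and reconstructed pictures */
} DpbOutputMode;

/* Picks a mode from VkVideoDecodeCapabilitiesKHR::flags. preferDistinct is a
 * hypothetical application preference, consulted only when both are supported. */
static DpbOutputMode choose_dpb_output_mode(uint32_t capFlags, bool preferDistinct) {
    const bool coincide = (capFlags & DECODE_CAP_DPB_AND_OUTPUT_COINCIDE) != 0;
    const bool distinct = (capFlags & DECODE_CAP_DPB_AND_OUTPUT_DISTINCT) != 0;
    if (coincide && distinct) {
        return preferDistinct ? DPB_OUTPUT_MODE_DISTINCT : DPB_OUTPUT_MODE_COINCIDE;
    }
    if (coincide) {
        return DPB_OUTPUT_MODE_COINCIDE;
    }
    if (distinct) {
        return DPB_OUTPUT_MODE_DISTINCT;
    }
    return DPB_OUTPUT_MODE_NONE;
}
```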
Video Decode Commands
To launch video decode operations, call:
// Provided by VK_KHR_video_decode_queue
void vkCmdDecodeVideoKHR(
VkCommandBuffer commandBuffer,
const VkVideoDecodeInfoKHR* pDecodeInfo);
- commandBuffer is the command buffer in which to record the command.
- pDecodeInfo is a pointer to a VkVideoDecodeInfoKHR structure specifying the parameters of the video decode operations.
Each call issues one or more video decode operations. The implicit parameter opCount corresponds to the number of video decode operations issued by the command. After calling this command, the active query index of each active query is incremented by opCount.
Currently each call to this command results in the issue of a single video decode operation.
If the bound video session was created with VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR and the pNext chain of pDecodeInfo includes a VkVideoInlineQueryInfoKHR structure with its queryPool member specifying a valid VkQueryPool handle, then this command will execute a query for each video decode operation issued by it.
Active Reference Picture Information
The list of active reference pictures used by a video decode operation is a list of image subregions used as the source of reference picture data and related parameters, and is derived from the VkVideoReferenceSlotInfoKHR structures provided as the elements of the pDecodeInfo->pReferenceSlots array. For each element of pDecodeInfo->pReferenceSlots, one or more elements are added to the active reference picture list, as defined by the codec-specific semantics. Each element of this list contains the following information:
- The image subregion within the image subresource referred to by the video picture resource used as the reference picture.
- The DPB slot index the reference picture is associated with.
- The codec-specific reference information related to the reference picture.
Reconstructed Picture Information
Information related to the optional reconstructed picture used by a video decode operation is derived from the VkVideoReferenceSlotInfoKHR structure pointed to by pDecodeInfo->pSetupReferenceSlot, if not NULL, as defined by the codec-specific semantics, and consists of the following:
- The image subregion within the image subresource referred to by the video picture resource used as the reconstructed picture.
- The DPB slot index to use for picture reconstruction.
- The codec-specific reference information related to the reconstructed picture.
Specifying a valid VkVideoReferenceSlotInfoKHR structure in pDecodeInfo->pSetupReferenceSlot is always required, unless the video session was created with VkVideoSessionCreateInfoKHR::maxDpbSlots equal to zero. However, the DPB slot identified by pDecodeInfo->pSetupReferenceSlot→slotIndex is only activated with the reconstructed picture specified in pDecodeInfo->pSetupReferenceSlot→pPictureResource if reference picture setup is requested according to the codec-specific semantics.
If reconstructed picture information is specified, and pDecodeInfo->pSetupReferenceSlot→pPictureResource refers to a video picture resource different from that of the decode output picture, but reference picture setup is not requested, the contents of the video picture resource corresponding to the reconstructed picture will be undefined after the video decode operation.
Some implementations may always output the reconstructed picture or use it as temporary storage during the video decode operation even when the reconstructed picture is not marked for future reference.
Decode Output Picture Information
Information related to the decode output picture used by a video decode operation is derived from pDecodeInfo->dstPictureResource and any codec-specific parameters provided in the pDecodeInfo->pNext chain, as defined by the codec-specific semantics, and consists of the following:
- The image subregion within the image subresource referred to by the video picture resource used as the decode output picture.
- The codec-specific picture information related to the decode output picture.
Several limiting values are defined below that are referenced by the relevant valid usage statements of this command.
- Let uint32_t activeReferencePictureCount be the size of the list of active reference pictures used by the video decode operation. Unless otherwise defined, activeReferencePictureCount is set to the value of pDecodeInfo->referenceSlotCount.
  - If the bound video session was created with an H.264 decode profile, then let activeReferencePictureCount be the value of pDecodeInfo->referenceSlotCount plus the number of elements of the pDecodeInfo->pReferenceSlots array that have a VkVideoDecodeH264DpbSlotInfoKHR structure included in their pNext chain with both pStdReferenceInfo->flags.top_field_flag and pStdReferenceInfo->flags.bottom_field_flag set.
    This means that the elements of pDecodeInfo->pReferenceSlots that include both a top and bottom field reference are counted as two separate active reference pictures, as described in the active reference picture list construction rules for H.264 decode operations.
- Let VkOffset2D codedOffsetGranularity be the minimum alignment requirement for the coded offset of video picture resources. Unless otherwise defined, the values of the x and y members of codedOffsetGranularity are 0.
  - If the bound video session was created with an H.264 decode profile with a VkVideoDecodeH264ProfileInfoKHR::pictureLayout of VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR, then codedOffsetGranularity is equal to VkVideoDecodeH264CapabilitiesKHR::fieldOffsetGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for that video profile.
- Let uint32_t dpbFrameUseCount[] be an array of size maxDpbSlots, where maxDpbSlots is the VkVideoSessionCreateInfoKHR::maxDpbSlots the bound video session was created with, with each element indicating the number of times a frame associated with the corresponding DPB slot index is referred to by the video coding operation. Let the initial value of each element of the array be 0.
  - If pDecodeInfo->pSetupReferenceSlot is not NULL, then dpbFrameUseCount[i] is incremented by one, where i equals pDecodeInfo->pSetupReferenceSlot→slotIndex. If the bound video session object was created with an H.264 decode profile, then dpbFrameUseCount[i] is decremented by one if either pStdReferenceInfo->flags.top_field_flag or pStdReferenceInfo->flags.bottom_field_flag is set in the VkVideoDecodeH264DpbSlotInfoKHR structure in the pDecodeInfo->pSetupReferenceSlot→pNext chain.
  - For each element of pDecodeInfo->pReferenceSlots, dpbFrameUseCount[i] is incremented by one, where i equals the slotIndex member of the corresponding element. If the bound video session object was created with an H.264 decode profile, then dpbFrameUseCount[i] is decremented by one if either pStdReferenceInfo->flags.top_field_flag or pStdReferenceInfo->flags.bottom_field_flag is set in the VkVideoDecodeH264DpbSlotInfoKHR structure in the pNext chain of the corresponding element of pDecodeInfo->pReferenceSlots.
- Let uint32_t dpbTopFieldUseCount[] and uint32_t dpbBottomFieldUseCount[] be arrays of size maxDpbSlots, where maxDpbSlots is the VkVideoSessionCreateInfoKHR::maxDpbSlots the bound video session was created with, with each element indicating the number of times the top field or the bottom field, respectively, associated with the corresponding DPB slot index is referred to by the video coding operation. Let the initial value of each element of the arrays be 0.
  - If the bound video session object was created with an H.264 decode profile and pDecodeInfo->pSetupReferenceSlot is not NULL, then perform the following:
    - If pStdReferenceInfo->flags.top_field_flag is set in the VkVideoDecodeH264DpbSlotInfoKHR structure in the pDecodeInfo->pSetupReferenceSlot→pNext chain, then dpbTopFieldUseCount[i] is incremented by one, where i equals pDecodeInfo->pSetupReferenceSlot→slotIndex.
    - If pStdReferenceInfo->flags.bottom_field_flag is set in the VkVideoDecodeH264DpbSlotInfoKHR structure in the pDecodeInfo->pSetupReferenceSlot→pNext chain, then dpbBottomFieldUseCount[i] is incremented by one, where i equals pDecodeInfo->pSetupReferenceSlot→slotIndex.
  - If the bound video session object was created with an H.264 decode profile, then perform the following for each element of pDecodeInfo->pReferenceSlots:
    - If pStdReferenceInfo->flags.top_field_flag is set in the VkVideoDecodeH264DpbSlotInfoKHR structure in the pNext chain of the element, then dpbTopFieldUseCount[i] is incremented by one, where i equals the slotIndex member of the element.
    - If pStdReferenceInfo->flags.bottom_field_flag is set in the VkVideoDecodeH264DpbSlotInfoKHR structure in the pNext chain of the element, then dpbBottomFieldUseCount[i] is incremented by one, where i equals the slotIndex member of the element.
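For an H.264 decode profile, the counting rules above can be sketched in C as follows. H264SlotDescModel flattens slotIndex and the top_field_flag and bottom_field_flag values from pStdReferenceInfo into one hypothetical structure, and MODEL_MAX_DPB_SLOTS is an assumed stand-in for maxDpbSlots; none of these names are Vulkan API. Incrementing a frame count and then decrementing it when either field flag is set is equivalent to incrementing it only when neither flag is set, which is what the sketch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_DPB_SLOTS 16 /* hypothetical maxDpbSlots of the bound session */

/* Hypothetical flattening of one VkVideoReferenceSlotInfoKHR element and the
 * field flags of its VkVideoDecodeH264DpbSlotInfoKHR::pStdReferenceInfo. */
typedef struct {
    int32_t slotIndex;
    bool topField;    /* pStdReferenceInfo->flags.top_field_flag */
    bool bottomField; /* pStdReferenceInfo->flags.bottom_field_flag */
} H264SlotDescModel;

typedef struct {
    uint32_t activeReferencePictureCount;
    uint32_t dpbFrameUseCount[MODEL_MAX_DPB_SLOTS];
    uint32_t dpbTopFieldUseCount[MODEL_MAX_DPB_SLOTS];
    uint32_t dpbBottomFieldUseCount[MODEL_MAX_DPB_SLOTS];
} H264DecodeLimitsModel;

/* Per-element increments shared by the setup slot and the reference slots. */
static void count_h264_slot(H264DecodeLimitsModel *out, const H264SlotDescModel *s) {
    assert(s->slotIndex >= 0 && s->slotIndex < MODEL_MAX_DPB_SLOTS);
    const uint32_t i = (uint32_t)s->slotIndex;
    if (!s->topField && !s->bottomField) {
        out->dpbFrameUseCount[i]++; /* +1, then -1 if either field flag is set */
    }
    if (s->topField) {
        out->dpbTopFieldUseCount[i]++;
    }
    if (s->bottomField) {
        out->dpbBottomFieldUseCount[i]++;
    }
}

/* setup may be NULL (no pSetupReferenceSlot); refs has refCount elements. */
static void compute_h264_decode_limits(const H264SlotDescModel *setup,
                                       const H264SlotDescModel *refs,
                                       uint32_t refCount,
                                       H264DecodeLimitsModel *out) {
    memset(out, 0, sizeof *out);
    out->activeReferencePictureCount = refCount;
    if (setup != NULL) {
        count_h264_slot(out, setup);
    }
    for (uint32_t k = 0; k < refCount; ++k) {
        if (refs[k].topField && refs[k].bottomField) {
            /* an element with both field flags counts as two active references */
            out->activeReferencePictureCount++;
        }
        count_h264_slot(out, &refs[k]);
    }
}
```

For example, a frame setup in slot 0 plus references to both fields of slot 1 and the top field of slot 2 gives an activeReferencePictureCount of 3 in this model.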
The VkVideoDecodeInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_queue
typedef struct VkVideoDecodeInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoDecodeFlagsKHR flags;
VkBuffer srcBuffer;
VkDeviceSize srcBufferOffset;
VkDeviceSize srcBufferRange;
VkVideoPictureResourceInfoKHR dstPictureResource;
const VkVideoReferenceSlotInfoKHR* pSetupReferenceSlot;
uint32_t referenceSlotCount;
const VkVideoReferenceSlotInfoKHR* pReferenceSlots;
} VkVideoDecodeInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is reserved for future use.
- srcBuffer is the source video bitstream buffer to read the encoded bitstream from.
- srcBufferOffset is the starting offset in bytes from the start of srcBuffer to read the encoded bitstream from.
- srcBufferRange is the size in bytes of the encoded bitstream to decode from srcBuffer, starting from srcBufferOffset.
- dstPictureResource is the video picture resource to use as the decode output picture.
- pSetupReferenceSlot is NULL or a pointer to a VkVideoReferenceSlotInfoKHR structure specifying the reconstructed picture information.
- referenceSlotCount is the number of elements in the pReferenceSlots array.
- pReferenceSlots is NULL or a pointer to an array of VkVideoReferenceSlotInfoKHR structures describing the DPB slots and corresponding reference picture resources to use in this video decode operation (the set of active reference pictures).
// Provided by VK_KHR_video_decode_queue
typedef VkFlags VkVideoDecodeFlagsKHR;
VkVideoDecodeFlagsKHR is a bitmask type for setting a mask, but is currently reserved for future use.
H.264 Decode Operations
Video decode operations using an H.264 decode profile can be used to decode elementary video stream sequences compliant to the ITU-T H.264 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH264SequenceParameterSet structure corresponding to the active SPS specifying the H.264 sequence parameter set.
  - The StdVideoH264PictureParameterSet structure corresponding to the active PPS specifying the H.264 picture parameter set.
  - The StdVideoDecodeH264PictureInfo structure specifying the H.264 picture information.
  - The StdVideoDecodeH264ReferenceInfo structures specifying the H.264 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the H.264 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the H.264 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.264 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
H.264 Decode Bitstream Data Access
If the target decode output picture is a frame, then the video bitstream buffer range should contain a VCL NAL unit comprised of the slice headers and data of a picture representing an entire frame, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
If the target decode output picture is a field, then the video bitstream buffer range should contain a VCL NAL unit comprised of the slice headers and data of a picture representing a field, as defined in sections 7.3.3 and 7.3.4, and this data is interpreted as defined in sections 7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
The offsets provided in
VkVideoDecodeH264PictureInfoKHR::pSliceOffsets
should specify
the starting offsets corresponding to each slice header within the video
bitstream buffer range.
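As an illustration, an application that receives an H.264 Annex B byte stream might gather the values for pSliceOffsets by scanning the bitstream buffer range for start codes that introduce coded slice NAL units. The sketch below is hypothetical: collect_h264_slice_offsets is not part of Vulkan, and it assumes Annex B framing, that each recorded offset should point at the three-byte start code prefix (00 00 01) of a slice NAL unit (whether an implementation expects the prefix or the NAL header itself should be checked), and that only nal_unit_type 1 (non-IDR slice) and 5 (IDR slice) are of interest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Vulkan API): records the offset of each
 * 3-byte start code prefix (00 00 01) that introduces a coded slice NAL
 * unit. Offsets are relative to `data`, assumed to be the start of the
 * bitstream buffer range passed to the decode command.
 * Returns the total number of slice start codes found; at most
 * `max_offsets` of them are written to `offsets`. */
static uint32_t collect_h264_slice_offsets(const uint8_t *data, size_t size,
                                           uint32_t *offsets,
                                           uint32_t max_offsets)
{
    uint32_t count = 0;
    for (size_t i = 0; i + 3 < size; ++i) {
        if (data[i] == 0x00 && data[i + 1] == 0x00 && data[i + 2] == 0x01) {
            /* nal_unit_type is the low 5 bits of the NAL header byte. */
            uint8_t nal_unit_type = data[i + 3] & 0x1F;
            if (nal_unit_type == 1 || nal_unit_type == 5) {
                if (count < max_offsets)
                    offsets[count] = (uint32_t)i;
                ++count;
            }
            i += 2; /* continue scanning after the prefix */
        }
    }
    return count;
}
```

The returned count is a natural candidate for sliceCount, provided it does not exceed the capacity of the offsets array.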
H.264 Decode Picture Data Access
The effective imageOffset
and imageExtent
corresponding to a
decode output picture,
reference picture, or
reconstructed picture used in video decode
operations with an H.264 decode profile are defined
as follows:
- imageOffset is (0,0) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a frame.
- imageOffset is (0,0) and imageExtent is (codedExtent.width, codedExtent.height), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- imageOffset is (codedOffset.x, codedOffset.y) and imageExtent is (codedExtent.width, codedExtent.height / 2), if the picture represents a field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
Where codedOffset and codedExtent are the members of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
However, accesses to image data within a video picture resource happen at
the granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
This means that the complete image subregion accessed by video coding
operations using an H.264 decode profile for the
video picture resource is defined as the set of texels within the coordinate
range:
- ([startX, endX), [startY, endY))
Where:
- startX equals imageOffset.x rounded down to the nearest integer multiple of pictureAccessGranularity.width;
- endX equals imageOffset.x + imageExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- startY equals imageOffset.y rounded down to the nearest integer multiple of pictureAccessGranularity.height;
- endY equals imageOffset.y + imageExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
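The rounding and clamping rules above can be sketched as a small function. This is an illustrative model only: AccessRange and compute_access_range are hypothetical names, offsets are assumed to be non-negative, and the granularity components are assumed to be non-zero.

```c
#include <stdint.h>

/* Half-open texel range [start_x, end_x) x [start_y, end_y). */
typedef struct {
    uint32_t start_x, end_x;
    uint32_t start_y, end_y;
} AccessRange;

static uint32_t round_down(uint32_t v, uint32_t g) { return (v / g) * g; }
static uint32_t round_up(uint32_t v, uint32_t g) { return ((v + g - 1) / g) * g; }
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* off_*, ext_*: effective imageOffset and imageExtent of the picture.
 * gran_*: pictureAccessGranularity reported for the video profile.
 * sub_*:  width and height of the referenced image subresource. */
static AccessRange compute_access_range(uint32_t off_x, uint32_t off_y,
                                        uint32_t ext_w, uint32_t ext_h,
                                        uint32_t gran_w, uint32_t gran_h,
                                        uint32_t sub_w, uint32_t sub_h)
{
    AccessRange r;
    r.start_x = round_down(off_x, gran_w);
    r.end_x = min_u32(round_up(off_x + ext_w, gran_w), sub_w); /* clamp to width */
    r.start_y = round_down(off_y, gran_h);
    r.end_y = min_u32(round_up(off_y + ext_h, gran_h), sub_h); /* clamp to height */
    return r;
}
```

For example, a 1920x1080 frame at offset (0,0) with a (64,64) granularity in a 1920x1088 image accesses rows up to 1088, not 1080, which is why padding in the image matters.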
In case of video decode operations using an H.264 decode profile, any access to a picture at the coordinates
(x
,y
), as defined by the ITU-T H.264 Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
specified below:
- (x, y), if the accessed picture represents a frame.
- (x, y × 2), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- (x, y × 2 + 1), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR.
- (codedOffset.x + x, codedOffset.y + y), if the accessed picture represents a top field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
- (codedOffset.x + x, codedOffset.y + y), if the accessed picture represents a bottom field and the picture layout of the used H.264 decode profile is VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
Where codedOffset
is the member of the corresponding
VkVideoPictureResourceInfoKHR structure.
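For the frame case and the interleaved-lines layout, the mapping above can be sketched as follows. PicKind, Texel, and map_interleaved are hypothetical names, not Vulkan types; the separate-planes cases are left out for brevity.

```c
#include <stdint.h>

/* Hypothetical picture kinds (not Vulkan enums). */
typedef enum { PIC_FRAME, PIC_TOP_FIELD, PIC_BOTTOM_FIELD } PicKind;

typedef struct { uint32_t x, y; } Texel;

/* Maps an H.264 sample position (x, y) of a frame or field to texel
 * coordinates in the image subresource, assuming either a frame or the
 * VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR
 * layout for fields. */
static Texel map_interleaved(PicKind kind, uint32_t x, uint32_t y)
{
    Texel t = { x, y };
    if (kind == PIC_TOP_FIELD)
        t.y = y * 2;       /* top field lines land on even rows */
    else if (kind == PIC_BOTTOM_FIELD)
        t.y = y * 2 + 1;   /* bottom field lines land on odd rows */
    return t;
}
```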
H.264 Decode Profile
A video profile supporting H.264 video decode operations is specified by
setting VkVideoProfileInfoKHR::videoCodecOperation
to
VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
and adding a
VkVideoDecodeH264ProfileInfoKHR
structure to the
VkVideoProfileInfoKHR::pNext
chain.
The VkVideoDecodeH264ProfileInfoKHR
structure is defined as:
// Provided by VK_KHR_video_decode_h264
typedef struct VkVideoDecodeH264ProfileInfoKHR {
VkStructureType sType;
const void* pNext;
StdVideoH264ProfileIdc stdProfileIdc;
VkVideoDecodeH264PictureLayoutFlagBitsKHR pictureLayout;
} VkVideoDecodeH264ProfileInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdProfileIdc is a StdVideoH264ProfileIdc value specifying the H.264 codec profile IDC, as defined in section A.2 of the ITU-T H.264 Specification.
- pictureLayout is a VkVideoDecodeH264PictureLayoutFlagBitsKHR value specifying the picture layout used by the H.264 video sequence to be decoded.
The H.264 video decode picture layout flags are defined as follows:
// Provided by VK_KHR_video_decode_h264
typedef enum VkVideoDecodeH264PictureLayoutFlagBitsKHR {
VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_PROGRESSIVE_KHR = 0,
VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR = 0x00000001,
VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR = 0x00000002,
} VkVideoDecodeH264PictureLayoutFlagBitsKHR;
- VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_PROGRESSIVE_KHR specifies support for progressive content. This flag has the value 0.
- VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_INTERLEAVED_LINES_BIT_KHR specifies support for or use of a picture layout for interlaced content where all lines belonging to the top field are decoded to the even-numbered lines within the picture resource, and all lines belonging to the bottom field are decoded to the odd-numbered lines within the picture resource.
- VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR specifies support for or use of a picture layout for interlaced content where all lines belonging to a field are grouped together in a single image subregion, and the two fields comprising the frame can be stored in separate image subregions of the same image subresource or in separate image subresources.
// Provided by VK_KHR_video_decode_h264
typedef VkFlags VkVideoDecodeH264PictureLayoutFlagsKHR;
VkVideoDecodeH264PictureLayoutFlagsKHR
is a bitmask type for setting a
mask of zero or more VkVideoDecodeH264PictureLayoutFlagBitsKHR.
H.264 Decode Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the
capabilities for an H.264 decode profile, the
VkVideoCapabilitiesKHR::pNext
chain must include a
VkVideoDecodeH264CapabilitiesKHR
structure that will be filled with
the profile-specific capabilities.
The VkVideoDecodeH264CapabilitiesKHR
structure is defined as:
// Provided by VK_KHR_video_decode_h264
typedef struct VkVideoDecodeH264CapabilitiesKHR {
VkStructureType sType;
void* pNext;
StdVideoH264LevelIdc maxLevelIdc;
VkOffset2D fieldOffsetGranularity;
} VkVideoDecodeH264CapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxLevelIdc is a StdVideoH264LevelIdc value indicating the maximum H.264 level supported by the profile, where enum constant STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifies H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification.
- fieldOffsetGranularity is the minimum alignment for VkVideoPictureResourceInfoKHR::codedOffset specified for a video picture resource when using the picture layout VK_VIDEO_DECODE_H264_PICTURE_LAYOUT_INTERLACED_SEPARATE_PLANES_BIT_KHR.
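An application using the separate-planes layout might check a codedOffset against fieldOffsetGranularity before recording a decode, for instance when placing the bottom field at row 544 of a 1088-line image. The helper below is hypothetical (not a Vulkan function), and it assumes that a granularity component of 0, as reported for profiles not using the separate-planes layout, imposes no alignment constraint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: returns true if (off_x, off_y) is an integer
 * multiple of the reported fieldOffsetGranularity (gran_x, gran_y).
 * Assumption: a zero granularity component means "no constraint". */
static bool coded_offset_aligned(int32_t off_x, int32_t off_y,
                                 int32_t gran_x, int32_t gran_y)
{
    bool x_ok = (gran_x == 0) || (off_x % gran_x == 0);
    bool y_ok = (gran_y == 0) || (off_y % gran_y == 0);
    return x_ok && y_ok;
}
```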
H.264 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
can contain the following types of parameters:
- H.264 Sequence Parameter Sets (SPS)
  Represented by StdVideoH264SequenceParameterSet structures and interpreted as follows:
  - reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  - seq_parameter_set_id is used as the key of the SPS entry;
  - level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
  - if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
    - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
    - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  - if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
    - reserved1 is used only for padding purposes and is otherwise ignored;
    - flags.color_description_present_flag is interpreted as the value of colour_description_present_flag, as defined in section E.2.1 of the ITU-T H.264 Specification (note that the name of colour_description_present_flag was misspelled in the Video Std header);
    - if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      - reserved1 is used only for padding purposes and is otherwise ignored;
      - all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
    - all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.
- H.264 Picture Parameter Sets (PPS)
  Represented by StdVideoH264PictureParameterSet structures and interpreted as follows:
  - the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
  - if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
    - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
    - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.
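The bit layout of the two scaling list masks is the same for SPS and PPS entries: bits 0 to 5 describe the 4x4 lists and bits 6 to 11 describe the 8x8 lists. A minimal packing sketch, assuming the caller has already parsed the per-list present flags (seq_ or pic_scaling_list_present_flag) and the UseDefaultScalingMatrix flags into plain arrays; the function names are hypothetical, and the 16-bit mask width is an assumption (12 bits suffice).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit i <- present_flag[i], for the 12 possible scaling lists. */
static uint16_t pack_present_mask(const bool present_flag[12])
{
    uint16_t mask = 0;
    for (unsigned i = 0; i < 12; ++i)
        if (present_flag[i])
            mask |= (uint16_t)(1u << i);
    return mask;
}

/* Bit i <- UseDefaultScalingMatrix4x4Flag[i] for i < 6,
 *          UseDefaultScalingMatrix8x8Flag[i - 6] otherwise. */
static uint16_t pack_use_default_mask(const bool use_default_4x4[6],
                                      const bool use_default_8x8[6])
{
    uint16_t mask = 0;
    for (unsigned i = 0; i < 12; ++i) {
        bool flag = (i < 6) ? use_default_4x4[i] : use_default_8x8[i - 6];
        if (flag)
            mask |= (uint16_t)(1u << i);
    }
    return mask;
}
```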
When a video session parameters object is
created with the codec operation
VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
, the
VkVideoSessionParametersCreateInfoKHR::pNext
chain must include
a VkVideoDecodeH264SessionParametersCreateInfoKHR
structure specifying
the capacity and initial contents of the object.
The VkVideoDecodeH264SessionParametersCreateInfoKHR
structure is
defined as:
// Provided by VK_KHR_video_decode_h264
typedef struct VkVideoDecodeH264SessionParametersCreateInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t maxStdSPSCount;
uint32_t maxStdPPSCount;
const VkVideoDecodeH264SessionParametersAddInfoKHR* pParametersAddInfo;
} VkVideoDecodeH264SessionParametersCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxStdSPSCount is the maximum number of H.264 SPS entries the created VkVideoSessionParametersKHR can contain.
- maxStdPPSCount is the maximum number of H.264 PPS entries the created VkVideoSessionParametersKHR can contain.
- pParametersAddInfo is NULL or a pointer to a VkVideoDecodeH264SessionParametersAddInfoKHR structure specifying H.264 parameters to add upon object creation.
The VkVideoDecodeH264SessionParametersAddInfoKHR
structure is defined
as:
// Provided by VK_KHR_video_decode_h264
typedef struct VkVideoDecodeH264SessionParametersAddInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t stdSPSCount;
const StdVideoH264SequenceParameterSet* pStdSPSs;
uint32_t stdPPSCount;
const StdVideoH264PictureParameterSet* pStdPPSs;
} VkVideoDecodeH264SessionParametersAddInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdSPSCount is the number of elements in the pStdSPSs array.
- pStdSPSs is a pointer to an array of StdVideoH264SequenceParameterSet structures describing the H.264 SPS entries to add.
- stdPPSCount is the number of elements in the pStdPPSs array.
- pStdPPSs is a pointer to an array of StdVideoH264PictureParameterSet structures describing the H.264 PPS entries to add.
This structure can be specified in the following places:
- In the pParametersAddInfo member of the VkVideoDecodeH264SessionParametersCreateInfoKHR structure specified in the pNext chain of VkVideoSessionParametersCreateInfoKHR used to create a video session parameters object. In this case, if the video codec operation the video session parameters object is created with is VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then it defines the set of initial parameters to add to the created object (see Creating Video Session Parameters).
- In the pNext chain of VkVideoSessionParametersUpdateInfoKHR. In this case, if the video codec operation the video session parameters object to be updated was created with is VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR, then it defines the set of parameters to add to it (see Updating Video Session Parameters).
H.264 Decoding Parameters
The VkVideoDecodeH264PictureInfoKHR
structure is defined as:
// Provided by VK_KHR_video_decode_h264
typedef struct VkVideoDecodeH264PictureInfoKHR {
VkStructureType sType;
const void* pNext;
const StdVideoDecodeH264PictureInfo* pStdPictureInfo;
uint32_t sliceCount;
const uint32_t* pSliceOffsets;
} VkVideoDecodeH264PictureInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdPictureInfo is a pointer to a StdVideoDecodeH264PictureInfo structure specifying H.264 picture information.
- sliceCount is the number of elements in pSliceOffsets.
- pSliceOffsets is a pointer to an array of sliceCount offsets specifying the start offset of the slices of the picture within the video bitstream buffer range specified in VkVideoDecodeInfoKHR.
This structure is specified in the pNext
chain of the
VkVideoDecodeInfoKHR structure passed to vkCmdDecodeVideoKHR to
specify the codec-specific picture information for an H.264 decode operation.
- Decode Output Picture Information
  When this structure is specified in the pNext chain of the VkVideoDecodeInfoKHR structure passed to vkCmdDecodeVideoKHR, the information related to the decode output picture is defined as follows:
  - If pStdPictureInfo->flags.field_pic_flag is not set, then the picture represents a frame.
  - If pStdPictureInfo->flags.field_pic_flag is set, then the picture represents a field. Specifically:
    - If pStdPictureInfo->flags.bottom_field_flag is not set, then the picture represents the top field of the frame.
    - If pStdPictureInfo->flags.bottom_field_flag is set, then the picture represents the bottom field of the frame.
  - The image subregion used is determined according to the H.264 Decode Picture Data Access section.
  - The decode output picture is associated with the H.264 picture information provided in pStdPictureInfo.
- Std Picture Information
  The members of the StdVideoDecodeH264PictureInfo structure pointed to by pStdPictureInfo are interpreted as follows:
  - reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  - flags.is_intra is interpreted as defined in section 3.73 of the ITU-T H.264 Specification;
  - flags.is_reference is interpreted as defined in section 3.136 of the ITU-T H.264 Specification;
  - flags.complementary_field_pair is interpreted as defined in section 3.35 of the ITU-T H.264 Specification;
  - seq_parameter_set_id and pic_parameter_set_id are used to identify the active parameter sets, as described below;
  - all other members are interpreted as defined in section 7.4.3 of the ITU-T H.264 Specification.
Reference picture setup is controlled by the value of StdVideoDecodeH264PictureInfo::flags.is_reference.
If it is set and a reconstructed picture is specified, then the latter is used as the target of picture reconstruction to activate the DPB slot specified in pDecodeInfo->pSetupReferenceSlot->slotIndex.
If StdVideoDecodeH264PictureInfo
::flags.is_reference
is not set,
but a reconstructed picture is
specified, then the corresponding picture reference associated with the
DPB slot is invalidated, as described in the
DPB Slot States section.
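The reference picture setup rules above reduce to a three-way decision. The sketch below uses hypothetical outcome labels rather than Vulkan enums, and it assumes that a reconstructed picture is specified exactly when pDecodeInfo->pSetupReferenceSlot is not NULL.

```c
#include <stdbool.h>

/* Hypothetical outcome labels (not Vulkan enums). */
typedef enum {
    SETUP_NONE,            /* no reconstructed picture specified */
    SETUP_ACTIVATE_SLOT,   /* reconstructed picture activates the setup slot */
    SETUP_INVALIDATE_SLOT  /* picture reference in the setup slot is invalidated */
} SetupOutcome;

/* is_reference:   StdVideoDecodeH264PictureInfo::flags.is_reference
 * has_setup_slot: true when pDecodeInfo->pSetupReferenceSlot is not NULL */
static SetupOutcome h264_reference_setup(bool is_reference, bool has_setup_slot)
{
    if (!has_setup_slot)
        return SETUP_NONE;
    return is_reference ? SETUP_ACTIVATE_SLOT : SETUP_INVALIDATE_SLOT;
}
```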
- Active Parameter Sets
  The members of the StdVideoDecodeH264PictureInfo structure pointed to by pStdPictureInfo are used to select the active parameter sets to use from the bound video session parameters object, as follows:
  - The active SPS is the SPS identified by the key specified in StdVideoDecodeH264PictureInfo::seq_parameter_set_id.
  - The active PPS is the PPS identified by the key specified by the pair constructed from StdVideoDecodeH264PictureInfo::seq_parameter_set_id and StdVideoDecodeH264PictureInfo::pic_parameter_set_id.
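The key-based selection can be modeled with a linear lookup over stored entries. This is a sketch of the keying scheme only: SpsEntry, PpsEntry, and the find functions are hypothetical stand-ins for storage that lives inside the implementation's video session parameters object, and only the key fields are modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for stored entries; only the keys are modeled. */
typedef struct { uint8_t seq_parameter_set_id; } SpsEntry;
typedef struct { uint8_t seq_parameter_set_id; uint8_t pic_parameter_set_id; } PpsEntry;

/* The active SPS is keyed by seq_parameter_set_id alone. */
static const SpsEntry *find_active_sps(const SpsEntry *sps, size_t count,
                                       uint8_t sps_id)
{
    for (size_t i = 0; i < count; ++i)
        if (sps[i].seq_parameter_set_id == sps_id)
            return &sps[i];
    return NULL;
}

/* The active PPS is keyed by the (seq_parameter_set_id,
 * pic_parameter_set_id) pair, so equal PPS IDs under different SPS IDs
 * are distinct entries. */
static const PpsEntry *find_active_pps(const PpsEntry *pps, size_t count,
                                       uint8_t sps_id, uint8_t pps_id)
{
    for (size_t i = 0; i < count; ++i)
        if (pps[i].seq_parameter_set_id == sps_id &&
            pps[i].pic_parameter_set_id == pps_id)
            return &pps[i];
    return NULL;
}
```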
The VkVideoDecodeH264DpbSlotInfoKHR
structure is defined as:
// Provided by VK_KHR_video_decode_h264
typedef struct VkVideoDecodeH264DpbSlotInfoKHR {
VkStructureType sType;
const void* pNext;
const StdVideoDecodeH264ReferenceInfo* pStdReferenceInfo;
} VkVideoDecodeH264DpbSlotInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdReferenceInfo is a pointer to a StdVideoDecodeH264ReferenceInfo structure specifying H.264 reference information.
This structure is specified in the pNext
chain of
VkVideoDecodeInfoKHR::pSetupReferenceSlot
, if not NULL
, and
the pNext
chain of the elements of
VkVideoDecodeInfoKHR::pReferenceSlots
to specify the
codec-specific reference picture information for an H.264 decode operation.
- Active Reference Picture Information
  When this structure is specified in the pNext chain of the elements of VkVideoDecodeInfoKHR::pReferenceSlots, one or two elements are added to the list of active reference pictures used by the video decode operation for each element of VkVideoDecodeInfoKHR::pReferenceSlots as follows:
  - If neither pStdReferenceInfo->flags.top_field_flag nor pStdReferenceInfo->flags.bottom_field_flag is set, then the picture is added as a frame reference to the list of active reference pictures.
  - If pStdReferenceInfo->flags.top_field_flag is set, then the picture is added as a top field reference to the list of active reference pictures.
  - If pStdReferenceInfo->flags.bottom_field_flag is set, then the picture is added as a bottom field reference to the list of active reference pictures.
  - For each added reference picture, the corresponding image subregion used is determined according to the H.264 Decode Picture Data Access section.
  - Each added reference picture is associated with the DPB slot index specified in the slotIndex member of the corresponding element of VkVideoDecodeInfoKHR::pReferenceSlots.
  - Each added reference picture is associated with the H.264 reference information provided in pStdReferenceInfo.
When both the top and bottom fields of an interlaced frame currently
associated with a DPB slot are intended to be used as active reference
pictures and both fields are stored in the same image subregion (which is the
case when using
|
- Reconstructed Picture Information
  When this structure is specified in the pNext chain of VkVideoDecodeInfoKHR::pSetupReferenceSlot, the information related to the reconstructed picture is defined as follows:
  - If neither pStdReferenceInfo->flags.top_field_flag nor pStdReferenceInfo->flags.bottom_field_flag is set, then the picture represents a frame.
  - If pStdReferenceInfo->flags.top_field_flag is set, then the picture represents a field, specifically, the top field of the frame.
  - If pStdReferenceInfo->flags.bottom_field_flag is set, then the picture represents a field, specifically, the bottom field of the frame.
  - The image subregion used is determined according to the H.264 Decode Picture Data Access section.
  - If reference picture setup is requested, then the reconstructed picture is used to activate the DPB slot with the index specified in VkVideoDecodeInfoKHR::pSetupReferenceSlot->slotIndex.
  - The reconstructed picture is associated with the H.264 reference information provided in pStdReferenceInfo.
- Std Reference Information
  The members of the StdVideoDecodeH264ReferenceInfo structure pointed to by pStdReferenceInfo are interpreted as follows:
  - flags.top_field_flag is used to indicate whether the reference is used as a top field reference;
  - flags.bottom_field_flag is used to indicate whether the reference is used as a bottom field reference;
  - flags.used_for_long_term_reference is used to indicate whether the picture is marked as “used for long-term reference” as defined in section 8.2.5.1 of the ITU-T H.264 Specification;
  - flags.is_non_existing is used to indicate whether the picture is marked as “non-existing” as defined in section 8.2.5.2 of the ITU-T H.264 Specification;
  - all other members are interpreted as defined in section 8.2 of the ITU-T H.264 Specification.
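The Active Reference Picture Information rules turn each pReferenceSlots element into one or two active reference pictures, depending on the field flags in its Std reference information. The sketch below models that expansion with hypothetical types (RefSlot, ActiveRef, RefKind) standing in for the slot index and the two std reference flags; it is not a Vulkan API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference kinds (not Vulkan enums). */
typedef enum { REF_FRAME, REF_TOP_FIELD, REF_BOTTOM_FIELD } RefKind;

/* Stand-in for one pReferenceSlots element and its std reference flags. */
typedef struct {
    int32_t slot_index;
    bool top_field_flag;
    bool bottom_field_flag;
} RefSlot;

typedef struct {
    int32_t slot_index;
    RefKind kind;
} ActiveRef;

/* Adds a frame reference if neither flag is set, otherwise one entry per
 * set field flag. Writes at most max_out entries to out and returns the
 * total number of active reference pictures produced. */
static size_t expand_active_refs(const RefSlot *slots, size_t count,
                                 ActiveRef *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        const RefSlot *s = &slots[i];
        RefKind kinds[2];
        size_t k = 0;
        if (!s->top_field_flag && !s->bottom_field_flag)
            kinds[k++] = REF_FRAME;
        if (s->top_field_flag)
            kinds[k++] = REF_TOP_FIELD;
        if (s->bottom_field_flag)
            kinds[k++] = REF_BOTTOM_FIELD;
        for (size_t j = 0; j < k; ++j) {
            if (n < max_out) {
                out[n].slot_index = s->slot_index;
                out[n].kind = kinds[j];
            }
            ++n;
        }
    }
    return n;
}
```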
H.264 Decode Requirements
This section describes the required H.264 decoding capabilities for
physical devices that have at least one queue family that supports the video
codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H264_BIT_KHR
, as
returned by vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
Video Std Header Name | Version |
---|---|
| 1.0.0 |

Video Capability | Requirement | Requirement Type [1] |
---|---|---|
| - | min |
| 4096 | max |
| 4096 | max |
| (64,64) | max |
| - | max |
| - | min |
| 0 | min |
| 0 | min |
| | min |
| | min |
| (0,0) except for profiles using | implementation-dependent |

[1] The Requirement Type column specifies whether the requirement is the minimum value all implementations must support, the maximum value all implementations must support, or the exact value all implementations must support. For bitmasks, a minimum value is the minimum set of bits all implementations must set, but they may have additional bits set beyond this minimum.
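One reading of the three requirement types can be expressed as simple predicates, for instance when validating capabilities reported by an implementation against the table above (such as the 4096 "max" alignment or the (64,64) "max" granularity). The helper names are hypothetical, and treating "max" as "the reported value must not exceed the listed value" is an interpretation, not normative text.

```c
#include <stdbool.h>
#include <stdint.h>

/* "min" on a scalar: the implementation supports at least the listed value. */
static bool meets_min(uint32_t reported, uint32_t required)
{
    return reported >= required;
}

/* "max" on a scalar such as an alignment or granularity: the reported
 * value must not exceed the listed value. */
static bool meets_max(uint32_t reported, uint32_t required)
{
    return reported <= required;
}

/* "min" on a bitmask: every listed bit is set; extra bits are allowed. */
static bool meets_min_bits(uint32_t reported, uint32_t required_bits)
{
    return (reported & required_bits) == required_bits;
}
```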
H.265 Decode Operations
Video decode operations using an H.265 decode profile can be used to decode elementary video stream sequences compliant to the ITU-T H.265 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.265 Specification:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH265VideoParameterSet structure corresponding to the active VPS specifying the H.265 video parameter set.
  - The StdVideoH265SequenceParameterSet structure corresponding to the active SPS specifying the H.265 sequence parameter set.
  - The StdVideoH265PictureParameterSet structure corresponding to the active PPS specifying the H.265 picture parameter set.
  - The StdVideoDecodeH265PictureInfo structure specifying the H.265 picture information.
  - The StdVideoDecodeH265ReferenceInfo structures specifying the H.265 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the H.265 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the H.265 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.265 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.265 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
H.265 Decode Bitstream Data Access
The video bitstream buffer range should contain a VCL NAL unit comprised of the slice segment headers and data of a picture representing a frame, as defined in sections 7.3.6 and 7.3.8, and this data is interpreted as defined in sections 7.4.7 and 7.4.9 of the ITU-T H.265 Specification, respectively.
The offsets provided in
VkVideoDecodeH265PictureInfoKHR::pSliceSegmentOffsets
should
specify the starting offsets corresponding to each slice segment header
within the video bitstream buffer range.
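For an H.265 Annex B byte stream, the same start-code scan applies, but the NAL unit header differs: nal_unit_type occupies bits 1 to 6 of the first header byte, and coded slice segments use types 0 to 9 and 16 to 21. The helper below is hypothetical and, like any such sketch, assumes Annex B framing and that each offset should point at the three-byte start code prefix of a slice segment NAL unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Vulkan API): records the offset of each
 * 3-byte start code prefix (00 00 01) that introduces an H.265 coded slice
 * segment NAL unit. Returns the total number found; at most `max_offsets`
 * offsets are written. */
static uint32_t collect_h265_slice_segment_offsets(const uint8_t *data,
                                                   size_t size,
                                                   uint32_t *offsets,
                                                   uint32_t max_offsets)
{
    uint32_t count = 0;
    for (size_t i = 0; i + 3 < size; ++i) {
        if (data[i] == 0x00 && data[i + 1] == 0x00 && data[i + 2] == 0x01) {
            /* Skip forbidden_zero_bit, then take the 6-bit nal_unit_type. */
            uint8_t nal_unit_type = (data[i + 3] >> 1) & 0x3F;
            if (nal_unit_type <= 9 ||
                (nal_unit_type >= 16 && nal_unit_type <= 21)) {
                if (count < max_offsets)
                    offsets[count] = (uint32_t)i;
                ++count;
            }
            i += 2; /* continue scanning after the prefix */
        }
    }
    return count;
}
```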
H.265 Decode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of a
decode output picture,
reference picture, or
reconstructed picture accessed by video coding
operations using an H.265 decode profile is defined
as the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
In case of video decode operations using an H.265 decode profile, any access to a picture at the coordinates (x, y), as defined by the ITU-T H.265 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates (x, y).
H.265 Decode Profile
A video profile supporting H.265 video decode operations is specified by
setting VkVideoProfileInfoKHR::videoCodecOperation
to
VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR
and adding a
VkVideoDecodeH265ProfileInfoKHR
structure to the
VkVideoProfileInfoKHR::pNext
chain.
The VkVideoDecodeH265ProfileInfoKHR
structure is defined as:
// Provided by VK_KHR_video_decode_h265
typedef struct VkVideoDecodeH265ProfileInfoKHR {
VkStructureType sType;
const void* pNext;
StdVideoH265ProfileIdc stdProfileIdc;
} VkVideoDecodeH265ProfileInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdProfileIdc is a StdVideoH265ProfileIdc value specifying the H.265 codec profile IDC, as defined in section A.3 of the ITU-T H.265 Specification.
H.265 Decode Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the
capabilities for an H.265 decode profile, the
VkVideoCapabilitiesKHR::pNext
chain must include a
VkVideoDecodeH265CapabilitiesKHR
structure that will be filled with
the profile-specific capabilities.
The VkVideoDecodeH265CapabilitiesKHR
structure is defined as:
// Provided by VK_KHR_video_decode_h265
typedef struct VkVideoDecodeH265CapabilitiesKHR {
VkStructureType sType;
void* pNext;
StdVideoH265LevelIdc maxLevelIdc;
} VkVideoDecodeH265CapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxLevelIdc is a StdVideoH265LevelIdc value indicating the maximum H.265 level supported by the profile, where enum constant STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifies H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification.
H.265 Decode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR
can contain the following types of parameters:
- H.265 Video Parameter Sets (VPS)
  Represented by StdVideoH265VideoParameterSet structures and interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - vps_video_parameter_set_id is used as the key of the VPS entry;
  - the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to vps_max_latency_increase_plus1, vps_max_dec_pic_buffering_minus1, and vps_max_num_reorder_pics, respectively, as defined in section 7.4.3.1 of the ITU-T H.265 Specification;
  - the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - reserved is used only for padding purposes and is otherwise ignored;
    - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of vps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure, and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of vps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure, and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
  - the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265VideoParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
- H.265 Sequence Parameter Sets (SPS)
  Represented by StdVideoH265SequenceParameterSet structures and interpreted as follows:
  - reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  - the pair constructed from sps_video_parameter_set_id and sps_seq_parameter_set_id is used as the key of the SPS entry;
  - the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  - the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to sps_max_latency_increase_plus1, sps_max_dec_pic_buffering_minus1, and sps_max_num_reorder_pics, respectively, as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  - if flags.sps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - pShortTermRefPicSet is a pointer to an array of num_short_term_ref_pic_sets StdVideoH265ShortTermRefPicSet structures, where each element is interpreted as follows:
    - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    - used_by_curr_pic_flag is a bitmask where bit index i corresponds to used_by_curr_pic_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - use_delta_flag is a bitmask where bit index i corresponds to use_delta_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - used_by_curr_pic_s0_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s0_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - used_by_curr_pic_s1_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s1_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265ShortTermRefPicSet are interpreted as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - if flags.long_term_ref_pics_present_flag is set, then the StdVideoH265LongTermRefPicsSps structure pointed to by pLongTermRefPicsSps is interpreted as follows:
    - used_by_curr_pic_lt_sps_flag is a bitmask where bit index i corresponds to used_by_curr_pic_lt_sps_flag[i] as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265LongTermRefPicsSps are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  - if flags.vui_parameters_present_flag is set, then the StdVideoH265SequenceParameterSetVui structure pointed to by pSequenceParameterSetVui is interpreted as follows:
    - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    - the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of sps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure, and each element is interpreted as follows:
        - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of sps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure, and each element is interpreted as follows:
        - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
    - all other members of the StdVideoH265SequenceParameterSetVui structure pointed to by pSequenceParameterSetVui are interpreted as defined in section E.3.1 of the ITU-T H.265 Specification;
  - if flags.sps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265SequenceParameterSet are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification.
- H.265 Picture Parameter Sets (PPS)
  Represented by StdVideoH265PictureParameterSet structures and interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - the triplet constructed from sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id is used as the key of the PPS entry;
  - if flags.pps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - if flags.pps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265PictureParameterSet are interpreted as defined in section 7.4.3.3 of the ITU-T H.265 Specification.
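Several members above (the HRD flags, cbr_flag, and the short-term reference picture set flags) store a per-index syntax element array as a bitmask in which bit index i corresponds to element i. The following is a minimal sketch of converting between the two forms; the helper names are hypothetical, and the 32-entry limit is an assumption chosen so that the mask fits in a uint32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for the "bit index i corresponds to X[i]" convention.
 * Limited to 32 entries so the bitmask fits in a uint32_t. */

/* Pack flags[0..count-1] into a bitmask: bit i is set iff flags[i] is true. */
static uint32_t pack_flag_bitmask(const bool *flags, uint32_t count)
{
    assert(count <= 32);
    uint32_t mask = 0;
    for (uint32_t i = 0; i < count; ++i) {
        if (flags[i]) {
            mask |= UINT32_C(1) << i;
        }
    }
    return mask;
}

/* Recover element X[i] of the syntax element array from the bitmask. */
static bool flag_bit(uint32_t mask, uint32_t i)
{
    assert(i < 32);
    return ((mask >> i) & 1u) != 0;
}

/* Count how many of the first count elements are set. */
static uint32_t count_set_flags(uint32_t mask, uint32_t count)
{
    assert(count <= 32);
    uint32_t n = 0;
    for (uint32_t i = 0; i < count; ++i) {
        n += flag_bit(mask, i) ? 1u : 0u;
    }
    return n;
}
```

For example, when inter_ref_pic_set_prediction_flag is 0, section 7.4.8 of the ITU-T H.265 Specification sets UsedByCurrPicS0[i] equal to used_by_curr_pic_s0_flag[i], so count_set_flags(used_by_curr_pic_s0_flag, num_negative_pics) should give the number of negative-POC entries that section 8.3.2 places in PocStCurrBefore for that reference picture set.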
When a video session parameters object is created with the codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, the VkVideoSessionParametersCreateInfoKHR::pNext chain must include a VkVideoDecodeH265SessionParametersCreateInfoKHR structure specifying the capacity and initial contents of the object.
The VkVideoDecodeH265SessionParametersCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_h265
typedef struct VkVideoDecodeH265SessionParametersCreateInfoKHR {
    VkStructureType sType;
    const void* pNext;
    uint32_t maxStdVPSCount;
    uint32_t maxStdSPSCount;
    uint32_t maxStdPPSCount;
    const VkVideoDecodeH265SessionParametersAddInfoKHR* pParametersAddInfo;
} VkVideoDecodeH265SessionParametersCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxStdVPSCount is the maximum number of H.265 VPS entries the created VkVideoSessionParametersKHR can contain.
- maxStdSPSCount is the maximum number of H.265 SPS entries the created VkVideoSessionParametersKHR can contain.
- maxStdPPSCount is the maximum number of H.265 PPS entries the created VkVideoSessionParametersKHR can contain.
- pParametersAddInfo is NULL or a pointer to a VkVideoDecodeH265SessionParametersAddInfoKHR structure specifying H.265 parameters to add upon object creation.
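One way to pick the three capacities is to size them for every possible key. In the ITU-T H.265 Specification (section 7.4.3), vps_video_parameter_set_id and sps_seq_parameter_set_id each range from 0 to 15 and pps_pic_parameter_set_id ranges from 0 to 63; because SPS entries are keyed by a pair and PPS entries by a triplet, the key spaces multiply. This is only a sketch: H265ParamsCapacity is a hypothetical local mirror of the three capacity members, not the Vulkan type, and a real application would more likely size the object for the parameter sets it actually expects to see.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values of each H.265 parameter set identifier. */
enum {
    H265_VPS_ID_COUNT = 16, /* vps_video_parameter_set_id: 0..15 */
    H265_SPS_ID_COUNT = 16, /* sps_seq_parameter_set_id: 0..15 */
    H265_PPS_ID_COUNT = 64  /* pps_pic_parameter_set_id: 0..63 */
};

/* Hypothetical mirror of maxStdVPSCount, maxStdSPSCount, and maxStdPPSCount. */
typedef struct {
    uint32_t maxStdVPSCount;
    uint32_t maxStdSPSCount;
    uint32_t maxStdPPSCount;
} H265ParamsCapacity;

/* Capacities that leave room for one entry per distinct key:
 * VPS keyed by its id, SPS by (VPS id, SPS id), PPS by (VPS id, SPS id, PPS id). */
static H265ParamsCapacity h265_worst_case_capacity(void)
{
    H265ParamsCapacity c;
    c.maxStdVPSCount = H265_VPS_ID_COUNT;
    c.maxStdSPSCount = H265_VPS_ID_COUNT * H265_SPS_ID_COUNT;
    c.maxStdPPSCount = H265_VPS_ID_COUNT * H265_SPS_ID_COUNT * H265_PPS_ID_COUNT;
    return c;
}
```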
The VkVideoDecodeH265SessionParametersAddInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_h265
typedef struct VkVideoDecodeH265SessionParametersAddInfoKHR {
    VkStructureType sType;
    const void* pNext;
    uint32_t stdVPSCount;
    const StdVideoH265VideoParameterSet* pStdVPSs;
    uint32_t stdSPSCount;
    const StdVideoH265SequenceParameterSet* pStdSPSs;
    uint32_t stdPPSCount;
    const StdVideoH265PictureParameterSet* pStdPPSs;
} VkVideoDecodeH265SessionParametersAddInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdVPSCount is the number of elements in the pStdVPSs array.
- pStdVPSs is a pointer to an array of StdVideoH265VideoParameterSet structures describing the H.265 VPS entries to add.
- stdSPSCount is the number of elements in the pStdSPSs array.
- pStdSPSs is a pointer to an array of StdVideoH265SequenceParameterSet structures describing the H.265 SPS entries to add.
- stdPPSCount is the number of elements in the pStdPPSs array.
- pStdPPSs is a pointer to an array of StdVideoH265PictureParameterSet structures describing the H.265 PPS entries to add.
This structure can be specified in the following places:
- In the pParametersAddInfo member of the VkVideoDecodeH265SessionParametersCreateInfoKHR structure specified in the pNext chain of VkVideoSessionParametersCreateInfoKHR used to create a video session parameters object. In this case, if the video codec operation the video session parameters object is created with is VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then it defines the set of initial parameters to add to the created object (see Creating Video Session Parameters).
- In the pNext chain of VkVideoSessionParametersUpdateInfoKHR. In this case, if the video codec operation the video session parameters object to be updated was created with is VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, then it defines the set of parameters to add to it (see Updating Video Session Parameters).
H.265 Decoding Parameters
The VkVideoDecodeH265PictureInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_h265
typedef struct VkVideoDecodeH265PictureInfoKHR {
    VkStructureType sType;
    const void* pNext;
    const StdVideoDecodeH265PictureInfo* pStdPictureInfo;
    uint32_t sliceSegmentCount;
    const uint32_t* pSliceSegmentOffsets;
} VkVideoDecodeH265PictureInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdPictureInfo is a pointer to a StdVideoDecodeH265PictureInfo structure specifying H.265 picture information.
- sliceSegmentCount is the number of elements in pSliceSegmentOffsets.
- pSliceSegmentOffsets is a pointer to an array of sliceSegmentCount offsets specifying the start offset of the slice segments of the picture within the video bitstream buffer range specified in VkVideoDecodeInfoKHR.
This structure is specified in the pNext chain of the VkVideoDecodeInfoKHR structure passed to vkCmdDecodeVideoKHR to specify the codec-specific picture information for an H.265 decode operation.
- Decode Output Picture Information
  When this structure is specified in the pNext chain of the VkVideoDecodeInfoKHR structure passed to vkCmdDecodeVideoKHR, the information related to the decode output picture is defined as follows:
  - The image subregion used is determined according to the H.265 Decode Picture Data Access section.
  - The decode output picture is associated with the H.265 picture information provided in pStdPictureInfo.
- Std Picture Information
  The members of the StdVideoDecodeH265PictureInfo structure pointed to by pStdPictureInfo are interpreted as follows:
  - reserved is used only for padding purposes and is otherwise ignored;
  - flags.IrapPicFlag is interpreted as defined in section 3.73 of the ITU-T H.265 Specification;
  - flags.IdrPicFlag is interpreted as defined in section 3.67 of the ITU-T H.265 Specification;
  - flags.IsReference is interpreted as defined in section 3.132 of the ITU-T H.265 Specification;
  - sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id are used to identify the active parameter sets, as described below;
  - PicOrderCntVal is interpreted as defined in section 8.3.1 of the ITU-T H.265 Specification;
  - NumBitsForSTRefPicSetInSlice is the number of bits used in st_ref_pic_set when short_term_ref_pic_set_sps_flag is 0, or 0 otherwise, as defined in sections 7.4.7 and 7.4.8 of the ITU-T H.265 Specification;
  - NumDeltaPocsOfRefRpsIdx is the value of NumDeltaPocs[RefRpsIdx] when short_term_ref_pic_set_sps_flag is 1, or 0 otherwise, as defined in sections 7.4.7 and 7.4.8 of the ITU-T H.265 Specification;
  - RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetLtCurr are interpreted as defined in section 8.3.2 of the ITU-T H.265 Specification, where each element of these arrays either identifies an active reference picture using its DPB slot index or contains the value STD_VIDEO_H265_NO_REFERENCE_PICTURE to indicate “no reference picture”;
  - all other members are interpreted as defined in section 8.3.2 of the ITU-T H.265 Specification.
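An application that tracks which picture each DPB slot holds can build RefPicSetStCurrBefore and the other two arrays by translating the POC lists of section 8.3.2 into slot indices. The sketch below assumes a hypothetical DpbTracker bookkeeping struct; REF_PIC_SET_LIST_SIZE (8) and NO_REFERENCE_PICTURE (0xFF) are assumed stand-ins for the list size and for STD_VIDEO_H265_NO_REFERENCE_PICTURE, so real code should use the Video Std header definitions instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the Video Std constants. */
#define REF_PIC_SET_LIST_SIZE 8
#define NO_REFERENCE_PICTURE  0xFFu

/* Hypothetical application-side DPB bookkeeping: the POC held by each slot
 * and whether the slot currently holds an active reference picture. */
typedef struct {
    int32_t  poc[16];
    int      active[16];
    uint32_t slotCount;
} DpbTracker;

/* Translate a POC list (e.g. PocStCurrBefore) into DPB slot indices,
 * filling unused or unmatched entries with NO_REFERENCE_PICTURE. */
static void fill_ref_pic_set(const DpbTracker *dpb,
                             const int32_t *pocs, uint32_t pocCount,
                             uint8_t out[REF_PIC_SET_LIST_SIZE])
{
    assert(pocCount <= REF_PIC_SET_LIST_SIZE);
    assert(dpb->slotCount <= 16);
    for (uint32_t i = 0; i < REF_PIC_SET_LIST_SIZE; ++i) {
        out[i] = (uint8_t)NO_REFERENCE_PICTURE;
    }
    for (uint32_t i = 0; i < pocCount; ++i) {
        for (uint32_t s = 0; s < dpb->slotCount; ++s) {
            if (dpb->active[s] && dpb->poc[s] == pocs[i]) {
                out[i] = (uint8_t)s; /* active reference picture found */
                break;
            }
        }
        /* A POC with no matching active slot stays NO_REFERENCE_PICTURE. */
    }
}
```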
Reference picture setup is controlled by the value of StdVideoDecodeH265PictureInfo::flags.IsReference. If it is set and a reconstructed picture is specified, then the reconstructed picture is used as the target of picture reconstruction to activate the corresponding DPB slot. If StdVideoDecodeH265PictureInfo::flags.IsReference is not set, but a reconstructed picture is specified, then the corresponding picture reference associated with the DPB slot is invalidated, as described in the DPB Slot States section.
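The rule above reduces to a small decision table. The following sketch is illustrative only: the enum and function are hypothetical, isReference stands for StdVideoDecodeH265PictureInfo::flags.IsReference, and hasReconstructedPicture is assumed to mean that VkVideoDecodeInfoKHR::pSetupReferenceSlot is not NULL.

```c
#include <assert.h>
#include <stdbool.h>

/* Effect of a decode operation on the DPB slot of the reconstructed picture. */
typedef enum {
    SLOT_UNCHANGED,   /* no reconstructed picture specified */
    SLOT_ACTIVATED,   /* IsReference set: the slot is activated */
    SLOT_INVALIDATED  /* IsReference not set: the slot's picture reference is invalidated */
} SlotOutcome;

static SlotOutcome h265_setup_outcome(bool isReference, bool hasReconstructedPicture)
{
    if (!hasReconstructedPicture) {
        return SLOT_UNCHANGED;
    }
    return isReference ? SLOT_ACTIVATED : SLOT_INVALIDATED;
}
```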
- Active Parameter Sets
  The members of the StdVideoDecodeH265PictureInfo structure pointed to by pStdPictureInfo are used to select the active parameter sets to use from the bound video session parameters object, as follows:
  - The active VPS is the VPS identified by the key specified in StdVideoDecodeH265PictureInfo::sps_video_parameter_set_id.
  - The active SPS is the SPS identified by the key specified by the pair constructed from StdVideoDecodeH265PictureInfo::sps_video_parameter_set_id and StdVideoDecodeH265PictureInfo::pps_seq_parameter_set_id.
  - The active PPS is the PPS identified by the key specified by the triplet constructed from StdVideoDecodeH265PictureInfo::sps_video_parameter_set_id, StdVideoDecodeH265PictureInfo::pps_seq_parameter_set_id, and StdVideoDecodeH265PictureInfo::pps_pic_parameter_set_id.
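The key matching above can be sketched with a linear search over the stored entries. The record types below are hypothetical and hold only the key fields; the vpsId, spsId, and ppsId arguments play the roles of sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id. How an implementation actually stores and looks up its entries is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter set records reduced to their key fields. */
typedef struct { uint8_t vpsId; } VpsKey;
typedef struct { uint8_t vpsId, spsId; } SpsKey;
typedef struct { uint8_t vpsId, spsId, ppsId; } PpsKey;

typedef struct {
    const VpsKey *vps; size_t vpsCount;
    const SpsKey *sps; size_t spsCount;
    const PpsKey *pps; size_t ppsCount;
} ParamStore;

/* A NULL member means no stored entry matches that key. */
typedef struct {
    const VpsKey *vps;
    const SpsKey *sps;
    const PpsKey *pps;
} ActiveSets;

static ActiveSets select_active_sets(const ParamStore *s,
                                     uint8_t vpsId, uint8_t spsId, uint8_t ppsId)
{
    ActiveSets a = { NULL, NULL, NULL };
    assert(s != NULL);
    for (size_t i = 0; i < s->vpsCount; ++i) {          /* key: VPS id */
        if (s->vps[i].vpsId == vpsId) { a.vps = &s->vps[i]; break; }
    }
    for (size_t i = 0; i < s->spsCount; ++i) {          /* key: (VPS id, SPS id) */
        if (s->sps[i].vpsId == vpsId && s->sps[i].spsId == spsId) {
            a.sps = &s->sps[i];
            break;
        }
    }
    for (size_t i = 0; i < s->ppsCount; ++i) {          /* key: (VPS id, SPS id, PPS id) */
        if (s->pps[i].vpsId == vpsId && s->pps[i].spsId == spsId &&
            s->pps[i].ppsId == ppsId) {
            a.pps = &s->pps[i];
            break;
        }
    }
    return a;
}
```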
The VkVideoDecodeH265DpbSlotInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_h265
typedef struct VkVideoDecodeH265DpbSlotInfoKHR {
    VkStructureType sType;
    const void* pNext;
    const StdVideoDecodeH265ReferenceInfo* pStdReferenceInfo;
} VkVideoDecodeH265DpbSlotInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdReferenceInfo is a pointer to a StdVideoDecodeH265ReferenceInfo structure specifying reference picture information described in section 8.3 of the ITU-T H.265 Specification.
This structure is specified in the pNext chain of VkVideoDecodeInfoKHR::pSetupReferenceSlot, if not NULL, and in the pNext chain of the elements of VkVideoDecodeInfoKHR::pReferenceSlots to specify the codec-specific reference picture information for an H.265 decode operation.
- Active Reference Picture Information
  When this structure is specified in the pNext chain of the elements of VkVideoDecodeInfoKHR::pReferenceSlots, one element is added to the list of active reference pictures used by the video decode operation for each element of VkVideoDecodeInfoKHR::pReferenceSlots as follows:
  - The image subregion used is determined according to the H.265 Decode Picture Data Access section.
  - The reference picture is associated with the DPB slot index specified in the slotIndex member of the corresponding element of VkVideoDecodeInfoKHR::pReferenceSlots.
  - The reference picture is associated with the H.265 reference information provided in pStdReferenceInfo.
- Reconstructed Picture Information
  When this structure is specified in the pNext chain of VkVideoDecodeInfoKHR::pSetupReferenceSlot, the information related to the reconstructed picture is defined as follows:
  - The image subregion used is determined according to the H.265 Decode Picture Data Access section.
  - If reference picture setup is requested, then the reconstructed picture is used to activate the DPB slot with the index specified in VkVideoDecodeInfoKHR::pSetupReferenceSlot->slotIndex.
  - The reconstructed picture is associated with the H.265 reference information provided in pStdReferenceInfo.
- Std Reference Information
  The members of the StdVideoDecodeH265ReferenceInfo structure pointed to by pStdReferenceInfo are interpreted as follows:
  - flags.used_for_long_term_reference is used to indicate whether the picture is marked as “used for long-term reference” as defined in section 8.3.2 of the ITU-T H.265 Specification;
  - flags.unused_for_reference is used to indicate whether the picture is marked as “unused for reference” as defined in section 8.3.2 of the ITU-T H.265 Specification;
  - all other members are interpreted as defined in section 8.3 of the ITU-T H.265 Specification.
H.265 Decode Requirements
This section describes the required H.265 decoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.
| Video Std Header Name | Version |
|---|---|
| | 1.0.0 |

| Video Capability | Requirement | Requirement Type¹ |
|---|---|---|
| | - | min |
| | 4096 | max |
| | 4096 | max |
| | (64,64) | max |
| | - | max |
| | - | min |
| | 0 | min |
| | 0 | min |
| | | min |
| | | min |

¹ The Requirement Type column specifies whether the requirement is the minimum value all implementations must support, the maximum value all implementations must support, or the exact value all implementations must support. For bitmasks, a minimum value is the minimum set of bits all implementations must set, but they may have additional bits set beyond this minimum.
AV1 Decode Operations
Video decode operations using an AV1 decode profile can be used to decode elementary video stream sequences compliant with the AV1 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video decode operation steps with the codec-specific semantics defined in section 7 of the AV1 Specification:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoAV1SequenceHeader structure stored in the bound video session parameters object specifying the active sequence header.
  - The StdVideoDecodeAV1PictureInfo structure specifying the AV1 picture information.
  - The StdVideoDecodeAV1ReferenceInfo structures specifying the AV1 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The contents of the provided video bitstream buffer range are interpreted as defined in the AV1 Decode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used active reference pictures, decode output picture, and optional reconstructed picture is accessed as defined in the AV1 Decode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the AV1 picture information.
If the parameters and the bitstream adhere to the syntactic and semantic requirements defined in the corresponding sections of the AV1 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video decode operation will complete successfully. Otherwise, the video decode operation may complete unsuccessfully.
AV1 Decode Bitstream Data Access
The video bitstream buffer range should contain one or more frame OBUs, each consisting of a frame header OBU and a tile group OBU, that together represent an entire frame. Frame OBUs, frame header OBUs, and tile group OBUs are defined in sections 5.10, 5.9, and 5.11 of the AV1 Specification, respectively, and their data is interpreted as defined in sections 6.9, 6.8, and 6.10, respectively.
The offset specified in VkVideoDecodeAV1PictureInfoKHR::frameHeaderOffset should specify the starting offset of the frame header OBU of the frame.
When the tiles of the frame are encoded into multiple tile groups, each frame OBU has a separate frame header OBU, but their content is expected to match per the requirements of the AV1 Specification.
Accordingly, the offset specified in |
The offsets and sizes provided in VkVideoDecodeAV1PictureInfoKHR::pTileOffsets and VkVideoDecodeAV1PictureInfoKHR::pTileSizes, respectively, should specify the starting offsets and sizes corresponding to each tile within the video bitstream buffer range.
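Before recording a decode, an application may want to sanity-check that the frame header offset and every tile lie inside the bitstream range it is about to submit. The following is a minimal sketch of such a check; the function name and its parameters are hypothetical (they mirror frameHeaderOffset, tileCount, pTileOffsets, and pTileSizes), and it checks only range containment, not any other rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the frame header offset and every tile [offset, offset + size)
 * lie within a bitstream range of rangeSize bytes. The sum is computed in
 * 64 bits so that offset + size cannot overflow. */
static bool av1_offsets_in_range(uint64_t rangeSize, uint32_t frameHeaderOffset,
                                 uint32_t tileCount, const uint32_t *pTileOffsets,
                                 const uint32_t *pTileSizes)
{
    if (frameHeaderOffset >= rangeSize) {
        return false;
    }
    for (uint32_t i = 0; i < tileCount; ++i) {
        uint64_t end = (uint64_t)pTileOffsets[i] + pTileSizes[i];
        if (end > rangeSize) {
            return false;
        }
    }
    return true;
}
```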
AV1 Decode Picture Data Access
Accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile. Accordingly, the complete image subregion of a decode output picture, reference picture, or reconstructed picture accessed by video coding operations using an AV1 decode profile is defined as the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure.
Where codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
In the case of video decode operations using an AV1 decode profile, any access to a picture at the coordinates (x, y), as defined by the AV1 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates (x, y).
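The endX and endY computation is a round-up to the access granularity followed by a clamp. The sketch below uses a local Extent2D type as a stand-in for VkExtent2D; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t width, height; } Extent2D; /* local stand-in for VkExtent2D */

/* Round v up to the nearest multiple of g (g > 0), then clamp to limit. */
static uint32_t round_up_clamp(uint32_t v, uint32_t g, uint32_t limit)
{
    assert(g > 0);
    uint64_t r = ((uint64_t)v + g - 1) / g * g;
    return r < limit ? (uint32_t)r : limit;
}

/* Exclusive end of the accessed region: texels in [0, endX) x [0, endY). */
static Extent2D av1_access_end(Extent2D codedExtent, Extent2D granularity,
                               Extent2D subresourceExtent)
{
    Extent2D end;
    end.width  = round_up_clamp(codedExtent.width,  granularity.width,  subresourceExtent.width);
    end.height = round_up_clamp(codedExtent.height, granularity.height, subresourceExtent.height);
    return end;
}
```

For a 1920x1080 coded extent with a (64,64) granularity, the height rounds up to 1088; if the image subresource is only 1080 texels tall, the clamp brings endY back to 1080.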
AV1 Reference Names and Semantics
Individual reference frames used in the decoding process have different semantics, as defined in section 6.10.24 of the AV1 Specification. The AV1 semantics associated with a reference picture are indicated by the corresponding enumeration constant defined in the Video Std enumeration type StdVideoAV1ReferenceName:
- STD_VIDEO_AV1_REFERENCE_NAME_INTRA_FRAME identifies the reference used for intra coding (INTRA_FRAME), as defined in sections 2 and 7.11.2 of the AV1 Specification.
- All other enumeration constants refer to backward or forward references used for inter coding, as defined in sections 2 and 7.11.3 of the AV1 Specification:
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME identifies the LAST_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST2_FRAME identifies the LAST2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST3_FRAME identifies the LAST3_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_GOLDEN_FRAME identifies the GOLDEN_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_BWDREF_FRAME identifies the BWDREF_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF2_FRAME identifies the ALTREF2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF_FRAME identifies the ALTREF_FRAME reference
These enumeration constants are not directly used in any APIs but are used to indirectly index into certain Video Std and Vulkan API parameter arrays.
AV1 Decode Profile
A video profile supporting AV1 video decode operations is specified by setting VkVideoProfileInfoKHR::videoCodecOperation to VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR and adding a VkVideoDecodeAV1ProfileInfoKHR structure to the VkVideoProfileInfoKHR::pNext chain.
The VkVideoDecodeAV1ProfileInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_av1
typedef struct VkVideoDecodeAV1ProfileInfoKHR {
    VkStructureType sType;
    const void* pNext;
    StdVideoAV1Profile stdProfile;
    VkBool32 filmGrainSupport;
} VkVideoDecodeAV1ProfileInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdProfile is a StdVideoAV1Profile value specifying the AV1 codec profile, as defined in section A.2 of the AV1 Specification.
- filmGrainSupport specifies whether AV1 film grain, as defined in section 7.18.3 of the AV1 Specification, can be used with the video profile. When this member is VK_TRUE, video session objects created against the video profile will be able to decode pictures that have film grain enabled.
Enabling |
AV1 Decode Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the capabilities for an AV1 decode profile, the VkVideoCapabilitiesKHR::pNext chain must include a VkVideoDecodeAV1CapabilitiesKHR structure that will be filled with the profile-specific capabilities.
The VkVideoDecodeAV1CapabilitiesKHR structure is defined as:
// Provided by VK_KHR_video_decode_av1
typedef struct VkVideoDecodeAV1CapabilitiesKHR {
    VkStructureType sType;
    void* pNext;
    StdVideoAV1Level maxLevel;
} VkVideoDecodeAV1CapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxLevel is a StdVideoAV1Level value specifying the maximum AV1 level supported by the profile, as defined in section A.3 of the AV1 Specification.
AV1 Decode Parameter Sets
Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR contain a single instance of the following parameter set:
- AV1 Sequence Header
  Represented by StdVideoAV1SequenceHeader structures and interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - the StdVideoAV1ColorConfig structure pointed to by pColorConfig is interpreted as follows:
    - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
    - all other members of StdVideoAV1ColorConfig are interpreted as defined in section 6.4.2 of the AV1 Specification;
  - if flags.timing_info_present_flag is set, then the StdVideoAV1TimingInfo structure pointed to by pTimingInfo is interpreted as follows:
    - flags.reserved is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoAV1TimingInfo are interpreted as defined in section 6.4.3 of the AV1 Specification;
  - all other members of StdVideoAV1SequenceHeader are interpreted as defined in section 6.4 of the AV1 Specification.
When a video session parameters object is created with the codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, the VkVideoSessionParametersCreateInfoKHR::pNext chain must include a VkVideoDecodeAV1SessionParametersCreateInfoKHR structure specifying the contents of the object.
The VkVideoDecodeAV1SessionParametersCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_av1
typedef struct VkVideoDecodeAV1SessionParametersCreateInfoKHR {
    VkStructureType sType;
    const void* pNext;
    const StdVideoAV1SequenceHeader* pStdSequenceHeader;
} VkVideoDecodeAV1SessionParametersCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdSequenceHeader is a pointer to a StdVideoAV1SequenceHeader structure describing the AV1 sequence header entry to store in the created object.
As AV1 video session parameters objects will only ever contain a single AV1 sequence header, this has to be specified at object creation time and such video session parameters objects cannot be updated using the vkUpdateVideoSessionParametersKHR command. When a new AV1 sequence header is decoded from the input video bitstream the application needs to create a new video session parameters object to store it.
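Because the object cannot be updated, the application side reduces to detecting when a sequence header differs from the one the current object was created from. The sketch below compares raw sequence header OBU payloads; the SeqHeaderCache struct, its 256-byte buffer limit, and the function name are all hypothetical choices for illustration, and creating the new object itself is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache of the sequence header OBU payload that the current
 * video session parameters object was created from. */
typedef struct {
    unsigned char payload[256];
    size_t size;
    bool valid;
} SeqHeaderCache;

/* Returns true when the new payload differs from the cached one, meaning a
 * new video session parameters object should be created; the cache is then
 * updated to the new payload. Returns false for an identical header. */
static bool av1_needs_new_parameters(SeqHeaderCache *cache,
                                     const unsigned char *payload, size_t size)
{
    assert(size <= sizeof cache->payload);
    if (cache->valid && cache->size == size &&
        memcmp(cache->payload, payload, size) == 0) {
        return false; /* same header: keep using the current object */
    }
    memcpy(cache->payload, payload, size);
    cache->size = size;
    cache->valid = true;
    return true;
}
```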
AV1 Decoding Parameters
The VkVideoDecodeAV1PictureInfoKHR structure is defined as:
// Provided by VK_KHR_video_decode_av1
typedef struct VkVideoDecodeAV1PictureInfoKHR {
    VkStructureType sType;
    const void* pNext;
    const StdVideoDecodeAV1PictureInfo* pStdPictureInfo;
    int32_t referenceNameSlotIndices[VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR];
    uint32_t frameHeaderOffset;
    uint32_t tileCount;
    const uint32_t* pTileOffsets;
    const uint32_t* pTileSizes;
} VkVideoDecodeAV1PictureInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdPictureInfo is a pointer to a StdVideoDecodeAV1PictureInfo structure specifying AV1 picture information.
- referenceNameSlotIndices is an array of seven (VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR, which is equal to the Video Std definition STD_VIDEO_AV1_REFS_PER_FRAME) signed integer values specifying the index of the DPB slot or a negative integer value for each AV1 reference name used for inter coding. In particular, the DPB slot index for the AV1 reference name frame is specified in referenceNameSlotIndices[frame - STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME].
- frameHeaderOffset is the byte offset of the AV1 frame header OBU, as defined in section 5.9 of the AV1 Specification, within the video bitstream buffer range specified in VkVideoDecodeInfoKHR.
- tileCount is the number of elements in pTileOffsets and pTileSizes.
- pTileOffsets is a pointer to an array of tileCount integers specifying the byte offset of the tiles of the picture within the video bitstream buffer range specified in VkVideoDecodeInfoKHR.
- pTileSizes is a pointer to an array of tileCount integers specifying the byte size of the tiles of the picture within the video bitstream buffer range specified in VkVideoDecodeInfoKHR.
This structure is specified in the pNext chain of the VkVideoDecodeInfoKHR structure passed to vkCmdDecodeVideoKHR to specify the codec-specific picture information for an AV1 decode operation.
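The referenceNameSlotIndices indexing rule can be sketched as follows. Av1RefName is a local stand-in whose values follow the AV1 Specification numbering (INTRA_FRAME = 0 through ALTREF_FRAME = 7), not the Video Std type StdVideoAV1ReferenceName itself; the array length of seven is taken from the member description above, and -1 is just one choice of negative value for a reference name that has no DPB slot.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in mirroring the AV1 reference name numbering. */
typedef enum {
    REF_INTRA_FRAME = 0,
    REF_LAST_FRAME,
    REF_LAST2_FRAME,
    REF_LAST3_FRAME,
    REF_GOLDEN_FRAME,
    REF_BWDREF_FRAME,
    REF_ALTREF2_FRAME,
    REF_ALTREF_FRAME
} Av1RefName;

#define AV1_REFS_PER_FRAME 7 /* seven inter reference names, LAST_FRAME..ALTREF_FRAME */

/* Store the DPB slot used for an inter reference name at index
 * frame - LAST_FRAME; INTRA_FRAME has no entry in this array. */
static void set_reference_slot(int32_t indices[AV1_REFS_PER_FRAME],
                               Av1RefName frame, int32_t dpbSlot)
{
    assert(frame >= REF_LAST_FRAME && frame <= REF_ALTREF_FRAME);
    indices[frame - REF_LAST_FRAME] = dpbSlot;
}
```

Initializing every element to a negative value first and then setting only the names the frame actually uses keeps unused names marked as having no DPB slot.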
- Decode Output Picture Information
  When this structure is specified in the pNext chain of the VkVideoDecodeInfoKHR structure passed to vkCmdDecodeVideoKHR, the information related to the decode output picture is defined as follows:
  - The image subregion used is determined according to the AV1 Decode Picture Data Access section.
  - The decode output picture is associated with the AV1 picture information provided in pStdPictureInfo.
Std Picture Information

The members of the StdVideoDecodeAV1PictureInfo structure pointed to by pStdPictureInfo are interpreted as follows:
- flags.reserved, reserved1, and reserved2 are used only for padding purposes and are otherwise ignored;
- flags.apply_grain indicates that film grain is enabled for the decoded picture, as defined in section 6.8.20 of the AV1 Specification;
- OrderHint, OrderHints, and expectedFrameId are interpreted as defined in section 6.8.2 of the AV1 Specification;
- the StdVideoAV1TileInfo structure pointed to by pTileInfo is interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - pMiColStarts is a pointer to an array of TileCols unsigned integers that corresponds to MiColStarts defined in section 6.8.14 of the AV1 Specification;
  - pMiRowStarts is a pointer to an array of TileRows unsigned integers that corresponds to MiRowStarts defined in section 6.8.14 of the AV1 Specification;
  - pWidthInSbsMinus1 is a pointer to an array of TileCols unsigned integers that corresponds to width_in_sbs_minus_1 defined in section 6.8.14 of the AV1 Specification;
  - pHeightInSbsMinus1 is a pointer to an array of TileRows unsigned integers that corresponds to height_in_sbs_minus_1 defined in section 6.8.14 of the AV1 Specification;
  - all other members of StdVideoAV1TileInfo are interpreted as defined in section 6.8.14 of the AV1 Specification;
- the StdVideoAV1Quantization structure pointed to by pQuantization is interpreted as follows:
  - flags.reserved is used only for padding purposes and is otherwise ignored;
  - all other members of StdVideoAV1Quantization are interpreted as defined in section 6.8.11 of the AV1 Specification;
- if flags.segmentation_enabled is set, then the StdVideoAV1Segmentation structure pointed to by pSegmentation is interpreted as follows:
  - the elements of FeatureEnabled are bitmasks where bit index j of element i corresponds to FeatureEnabled[i][j] as defined in section 6.8.13 of the AV1 Specification;
  - FeatureData is interpreted as defined in section 6.8.13 of the AV1 Specification;
- the StdVideoAV1LoopFilter structure pointed to by pLoopFilter is interpreted as follows:
  - flags.reserved is used only for padding purposes and is otherwise ignored;
  - update_ref_delta is a bitmask where bit index i is interpreted as the value of update_ref_delta corresponding to element i of loop_filter_ref_deltas as defined in section 6.8.10 of the AV1 Specification;
  - update_mode_delta is a bitmask where bit index i is interpreted as the value of update_mode_delta corresponding to element i of loop_filter_mode_deltas as defined in section 6.8.10 of the AV1 Specification;
  - all other members of StdVideoAV1LoopFilter are interpreted as defined in section 6.8.10 of the AV1 Specification;
- if flags.enable_cdef is set in the active sequence header, then the members of the StdVideoAV1CDEF structure pointed to by pCDEF are interpreted as follows:
  - cdef_y_sec_strength and cdef_uv_sec_strength are the bitstream values of the corresponding syntax elements defined in section 5.9.19 of the AV1 Specification;
  - all other members of StdVideoAV1CDEF are interpreted as defined in section 6.10.14 of the AV1 Specification;
- the StdVideoAV1LoopRestoration structure pointed to by pLoopRestoration is interpreted as follows:
  - LoopRestorationSize[plane] is interpreted as log2(size) - 5, where size is the value of LoopRestorationSize[plane] as defined in section 6.10.15 of the AV1 Specification;
  - all other members of StdVideoAV1LoopRestoration are interpreted as defined in section 6.10.15 of the AV1 Specification;
- the members of the StdVideoAV1GlobalMotion structure provided in global_motion are interpreted as defined in section 7.10 of the AV1 Specification;
- if flags.film_grain_params_present is set in the active sequence header, then the StdVideoAV1FilmGrain structure pointed to by pFilmGrain is interpreted as follows:
  - flags.reserved is used only for padding purposes and is otherwise ignored;
  - all other members of StdVideoAV1FilmGrain are interpreted as defined in section 6.8.20 of the AV1 Specification;
- all other members are interpreted as defined in section 6.8 of the AV1 Specification.
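Two of the interpretations above use a non-obvious encoding: LoopRestorationSize is stored as log2(size) - 5, and FeatureEnabled packs the per-segment feature flags into one bitmask per segment. The following C sketch shows both conversions, starting from the values an AV1 bitstream parser would typically hold. The function names are hypothetical, and the 8 x 8 table dimensions are assumed to follow the AV1 constants MAX_SEGMENTS and SEG_LVL_MAX.

```c
#include <stdint.h>

/* Converts an AV1 LoopRestorationSize value in samples (a power of two
 * between 32 and 256) into the log2(size) - 5 form; returns -1 otherwise. */
static int av1_encode_loop_restoration_size(unsigned size)
{
    if (size < 32 || size > 256 || (size & (size - 1)) != 0)
        return -1; /* not a restoration unit size AV1 can signal */
    int log2 = 0;
    while ((1u << log2) < size)
        ++log2;
    return log2 - 5;
}

/* Packs a per-segment boolean feature table into FeatureEnabled bitmasks:
 * bit j of out[i] holds enabled[i][j] (8 segments x 8 features). */
static void av1_pack_feature_enabled(int enabled[8][8], uint8_t out[8])
{
    for (int i = 0; i < 8; ++i) {
        out[i] = 0;
        for (int j = 0; j < 8; ++j)
            if (enabled[i][j])
                out[i] |= (uint8_t)(1u << j);
    }
}
```

Under this reading, restoration unit sizes of 32, 64, 128, and 256 samples are stored as 0, 1, 2, and 3, respectively. The same bit-per-element packing applies to update_ref_delta and update_mode_delta in StdVideoAV1LoopFilter.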
When film grain is enabled for the decoded frame, the flags.update_grain and film_grain_params_ref_idx values specified in StdVideoAV1FilmGrain are ignored by AV1 decode operations, and the load_grain_params function, as defined in section 6.8.20 of the AV1 Specification, is not executed. Instead, the application is responsible for specifying the effective film grain parameters for the frame in StdVideoAV1FilmGrain.

When film grain is enabled for the decoded frame, the application is required to specify a different decode output picture resource in VkVideoDecodeInfoKHR::dstPictureResource than the reconstructed picture specified in VkVideoDecodeInfoKHR::pSetupReferenceSlot->pPictureResource, even if the implementation does not report support for VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR in VkVideoDecodeCapabilitiesKHR::flags for the video decode profile.
Reference picture setup is controlled by the value of StdVideoDecodeAV1PictureInfo::refresh_frame_flags. If it is not zero and a reconstructed picture is specified, then the latter is used as the target of picture reconstruction to activate the DPB slot specified in pDecodeInfo->pSetupReferenceSlot->slotIndex. If StdVideoDecodeAV1PictureInfo::refresh_frame_flags is zero, but a reconstructed picture is specified, then the corresponding picture reference associated with the DPB slot is invalidated, as described in the DPB Slot States section.
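The rule above can be summarized as a small decision function. The following C sketch is a hypothetical helper, not part of the API. It classifies what an AV1 decode operation does with the DPB slot named in pSetupReferenceSlot, given the refresh_frame_flags value and whether a reconstructed picture was specified. The text above does not describe any slot update when no reconstructed picture is specified, so the sketch reports that case separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the effect on the DPB slot named in
 * VkVideoDecodeInfoKHR::pSetupReferenceSlot. */
typedef enum {
    AV1_SETUP_NONE,       /* no reconstructed picture: rule does not apply */
    AV1_SETUP_ACTIVATE,   /* refresh_frame_flags != 0: slot is activated */
    AV1_SETUP_INVALIDATE  /* refresh_frame_flags == 0: picture reference invalidated */
} Av1SetupOutcome;

static Av1SetupOutcome av1_setup_outcome(uint8_t refresh_frame_flags,
                                         bool has_reconstructed_picture)
{
    if (!has_reconstructed_picture)
        return AV1_SETUP_NONE;
    return refresh_frame_flags != 0 ? AV1_SETUP_ACTIVATE : AV1_SETUP_INVALIDATE;
}
```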
Active Parameter Sets

The active sequence header is the AV1 sequence header stored in the bound video session parameters object.
The VkVideoDecodeAV1DpbSlotInfoKHR
structure is defined as:
// Provided by VK_KHR_video_decode_av1
typedef struct VkVideoDecodeAV1DpbSlotInfoKHR {
VkStructureType sType;
const void* pNext;
const StdVideoDecodeAV1ReferenceInfo* pStdReferenceInfo;
} VkVideoDecodeAV1DpbSlotInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdReferenceInfo is a pointer to a StdVideoDecodeAV1ReferenceInfo structure specifying AV1 reference information.
This structure is specified in the pNext chain of VkVideoDecodeInfoKHR::pSetupReferenceSlot, if not NULL, and in the pNext chain of the elements of VkVideoDecodeInfoKHR::pReferenceSlots to specify the codec-specific reference picture information for an AV1 decode operation.
Active Reference Picture Information

When this structure is specified in the pNext chain of the elements of VkVideoDecodeInfoKHR::pReferenceSlots, one element is added to the list of active reference pictures used by the video decode operation for each element of VkVideoDecodeInfoKHR::pReferenceSlots as follows:
- The image subregion used is determined according to the AV1 Decode Picture Data Access section.
- The reference picture is associated with the DPB slot index specified in the slotIndex member of the corresponding element of VkVideoDecodeInfoKHR::pReferenceSlots.
- The reference picture is associated with the AV1 reference information provided in pStdReferenceInfo.
Reconstructed Picture Information

When this structure is specified in the pNext chain of VkVideoDecodeInfoKHR::pSetupReferenceSlot, the information related to the reconstructed picture is defined as follows:
- The image subregion used is determined according to the AV1 Decode Picture Data Access section.
- If reference picture setup is requested, then the reconstructed picture is used to activate the DPB slot with the index specified in VkVideoDecodeInfoKHR::pSetupReferenceSlot->slotIndex.
- The reconstructed picture is associated with the AV1 reference information provided in pStdReferenceInfo.
Std Reference Information

The members of the StdVideoDecodeAV1ReferenceInfo structure pointed to by pStdReferenceInfo are interpreted as follows:
- flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
- flags.disable_frame_end_update_cdf is interpreted as defined in section 6.8.2 of the AV1 Specification;
- flags.segmentation_enabled is interpreted as defined in section 6.8.13 of the AV1 Specification;
- frame_type is interpreted as defined in section 6.8.2 of the AV1 Specification;
  The frame_type member is defined with the type uint8_t, but it takes the same values defined in the StdVideoAV1FrameType enumeration type as StdVideoDecodeAV1PictureInfo::frame_type.
- RefFrameSignBias is a bitmask where bit index i corresponds to RefFrameSignBias[i] as defined in section 6.8.2 of the AV1 Specification;
- OrderHint is interpreted as defined in section 6.8.2 of the AV1 Specification;
- SavedOrderHints is interpreted as defined in section 7.20 of the AV1 Specification.

When the AV1 reference information is provided for the reconstructed picture, certain parameters (e.g. frame_type) are specified both in the AV1 picture information and in the AV1 reference information. This is necessary because, unlike the AV1 picture information, which is only used for the purposes of the video decode operation in question, the AV1 reference information specified for the reconstructed picture may be associated with the activated DPB slot, meaning that some implementations may maintain it as part of the reference picture metadata corresponding to the video picture resource associated with the DPB slot. When the AV1 reference information is provided for an active reference picture, the specified parameters correspond to the parameters specified when the DPB slot was activated (set up) with the reference picture, as usual, in order to communicate these parameters for implementations that do not maintain any subset of these parameters as part of the DPB slot's reference picture metadata.
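RefFrameSignBias records, for each reference name, whether the referenced frame follows the current frame in display order. The following C sketch derives the bitmask the same way the AV1 Specification derives RefFrameSignBias[refFrame], namely as get_relative_dist(OrderHints[refFrame], OrderHint) > 0, with bit index i corresponding to reference name i. The function names and parameter layout are hypothetical; ref_order_hints is assumed to be indexed by reference name, like OrderHints, and order_hint_bits is assumed to be at least 1 when enable_order_hint is true.

```c
#include <stdbool.h>
#include <stdint.h>

/* get_relative_dist() as defined by the AV1 Specification: the signed
 * distance between two order hints, wrapping modulo 2^order_hint_bits. */
static int av1_relative_dist(bool enable_order_hint, unsigned order_hint_bits,
                             int a, int b)
{
    if (!enable_order_hint)
        return 0;
    int diff = a - b;
    int m = 1 << (order_hint_bits - 1);
    return (diff & (m - 1)) - (diff & m);
}

/* Builds the RefFrameSignBias bitmask: bit i (LAST_FRAME = 1 .. ALTREF_FRAME = 7)
 * is set when reference name i refers to a frame that follows the current
 * frame (order_hint) in display order. ref_order_hints[i] holds OrderHints[i]. */
static uint8_t av1_ref_frame_sign_bias(bool enable_order_hint, unsigned order_hint_bits,
                                       int order_hint, const int ref_order_hints[8])
{
    uint8_t mask = 0;
    for (int i = 1; i <= 7; ++i) {
        if (av1_relative_dist(enable_order_hint, order_hint_bits,
                              ref_order_hints[i], order_hint) > 0)
            mask |= (uint8_t)(1u << i);
    }
    return mask;
}
```

Because the distance wraps modulo 2^order_hint_bits, a reference with order hint 126 is treated as preceding a current frame with order hint 2 when 7 order hint bits are used.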
AV1 Decode Requirements
This section describes the required AV1 decoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_DECODE_AV1_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.
| Video Std Header Name | Version |
|---|---|
| | 1.0.0 |

| Video Capability | Requirement | Requirement Type (1) |
|---|---|---|
| | - | min |
| | 4096 | max |
| | 4096 | max |
| | (64,64) | max |
| | - | max |
| | - | min |
| | 0 | min |
| | 0 | min |
| | | min |
| | | min |

(1) The Requirement Type column specifies whether the requirement is the minimum value all implementations must support, the maximum value all implementations must support, or the exact value all implementations must support. For bitmasks, a minimum value is the minimum set of bits all implementations must set, but they may have additional bits set beyond this minimum.
Video Encode Operations
Video encode operations consume an encode input picture and zero or more reference pictures, and produce compressed video data to a video bitstream buffer and an optional reconstructed picture.
Such encode input pictures can be used as the output of video decode operations, with graphics or compute operations, or with Window System Integration APIs, depending on the capabilities of the implementation.
Video encode operations may access the following resources in the VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR stage:
- The image subregions corresponding to the source encode input picture and active reference pictures with access VK_ACCESS_2_VIDEO_ENCODE_READ_BIT_KHR.
- The destination video bitstream buffer range and the optional reconstructed picture with access VK_ACCESS_2_VIDEO_ENCODE_WRITE_BIT_KHR.
The image subresource of each video picture resource accessed by the video encode operation is specified using a corresponding VkVideoPictureResourceInfoKHR structure. Each such image subresource must be in the appropriate image layout as follows:
- If the image subresource is used in the video encode operation as an encode input picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_SRC_KHR layout.
- If the image subresource is used in the video encode operation as a reconstructed picture or reference picture, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_DPB_KHR layout.
- If the image subresource is used in the video encode operation as a quantization map, then it must be in the VK_IMAGE_LAYOUT_VIDEO_ENCODE_QUANTIZATION_MAP_KHR layout.
A video encode operation may complete unsuccessfully. In this case the target video bitstream buffer will have undefined contents. Similarly, if reference picture setup is requested, the reconstructed picture will also have undefined contents, and the activated DPB slot will have an invalid picture reference.
If a video encode operation completes successfully and the codec-specific parameters provided by the application adhere to the syntactic and semantic requirements defined in the corresponding video compression standard, then the target video bitstream buffer will contain compressed video data after the execution of the video encode operation according to the respective codec-specific semantics.
Codec-Specific Semantics
The following aspects of video encode operations are codec-specific:
- The compressed video data written to the target video bitstream buffer range.
- The construction and interpretation of the list of active reference pictures and the interpretation of the picture data referred to by the corresponding image subregions.
- The construction and interpretation of information related to the encode input picture and the interpretation of the picture data referred to by the corresponding image subregion.
- The decision on reference picture setup.
- The construction and interpretation of information related to the optional reconstructed picture and the generation of picture data to the corresponding image subregion.
- Certain aspects of rate control.
These codec-specific behaviors are defined for each video codec operation separately.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then the codec-specific aspects of the video encoding process are performed as defined in the H.264 Encode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then the codec-specific aspects of the video encoding process are performed as defined in the H.265 Encode Operations section.
- If the used video codec operation is VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR, then the codec-specific aspects of the video encoding process are performed as defined in the AV1 Encode Operations section.
Video Encode Parameter Overrides
Implementations supporting video encode operations for any particular video codec operation often support only a subset of the available encoding tools defined by the corresponding video compression standards. Accordingly, certain implementation-dependent limitations may apply to codec-specific parameters provided through the structures defined in the Video Std headers corresponding to the used video codec operation.
Exposing all of these restrictions on particular codec-specific parameter values or combinations thereof in the form of application-queryable capabilities is impractical, hence this specification allows implementations to override the value of any of the codec-specific parameters, unless otherwise specified, as long as all of the following conditions are met:
- If the application-provided codec-specific parameters adhere to the syntactic and semantic requirements and rules defined by the used video compression standard, and thus would be usable to produce a video bitstream compliant with that standard, then the codec-specific parameters resulting from the process of implementation overrides must also adhere to the same requirements and rules, and any video bitstream produced using the overridden parameters must also be compliant.
- The overridden codec-specific parameter values must not have an impact on the codec-independent behaviors defined for video encode operations.
- The implementation must not override any codec-specific parameters specified to a command that may cause application-provided codec-specific parameters specified to subsequent commands to no longer adhere to the semantic requirements and rules defined by the used video compression standard, unless the implementation also overrides those parameters to adhere to any such requirements and rules.
- The overridden codec-specific parameter values must not have an impact on the codec-specific picture data access semantics.
- The overridden codec-specific parameter values may change the contents of the codec-specific bitstream elements produced by video encode operations or otherwise retrieved by the application (e.g. using the vkGetEncodedVideoSessionParametersKHR command) but must still adhere to the codec-specific semantics defined for that video codec operation, including, but not limited to, the number, type, and order of the encoded codec-specific bitstream elements.
Besides codec-specific parameter overrides performed for implementation-dependent reasons, applications can enable the implementation to apply additional optimizing overrides that may improve the efficiency or performance of video encoding operations. However, implementations must meet the conditions listed above even in case of such optimizing overrides.
Unless the application opts in for optimizing overrides, implementations are not expected to override any of the codec-specific parameters, except when such overrides are necessary for the correct operation of the video encoder implementation due to limitations to the available encoding tools on that implementation.
Video Encode Operation Steps
Each video encode operation performs the following steps in the VK_PIPELINE_STAGE_2_VIDEO_ENCODE_BIT_KHR stage:
- Reads the input picture data from the encode input picture;
- Determines derived encoding quality parameters according to the codec-specific semantics and the current rate control state;
- Compresses the input picture data according to the codec-specific semantics, applying any prediction data read from the active reference pictures and rate control restrictions in the process;
- Writes the encoded bitstream data to the destination video bitstream buffer range;
- Performs picture reconstruction of the encoded video data according to the codec-specific semantics, applying any prediction data read from the active reference pictures in the process, if a reconstructed picture is specified and reference picture setup is requested;
- If reference picture setup is requested, activates the DPB slot index specified in the reconstructed picture information with the reconstructed picture;
- Writes the reconstructed picture data to the reconstructed picture, if one is specified, according to the codec-specific semantics.
When reconstructed picture information is provided, the specified DPB slot index is associated with the corresponding bound reference picture resource, regardless of whether reference picture setup is requested.
Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR with
pVideoProfile->videoCodecOperation
specifying an encode operation, the
VkVideoEncodeCapabilitiesKHR structure must be included in the
pNext
chain of the VkVideoCapabilitiesKHR structure to retrieve
capabilities specific to video encoding.
The VkVideoEncodeCapabilitiesKHR
structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeCapabilitiesKHR {
VkStructureType sType;
void* pNext;
VkVideoEncodeCapabilityFlagsKHR flags;
VkVideoEncodeRateControlModeFlagsKHR rateControlModes;
uint32_t maxRateControlLayers;
uint64_t maxBitrate;
uint32_t maxQualityLevels;
VkExtent2D encodeInputPictureGranularity;
VkVideoEncodeFeedbackFlagsKHR supportedEncodeFeedbackFlags;
} VkVideoEncodeCapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoEncodeCapabilityFlagBitsKHR describing supported encoding features.
- rateControlModes is a bitmask of VkVideoEncodeRateControlModeFlagBitsKHR indicating supported rate control modes.
- maxRateControlLayers indicates the maximum number of rate control layers supported.
- maxBitrate indicates the maximum supported bitrate.
- maxQualityLevels indicates the number of discrete video encode quality levels supported. Implementations must support at least one quality level.
- encodeInputPictureGranularity indicates the granularity at which encode input picture data is encoded and may indicate a texel granularity up to the size of the largest supported codec-specific coding block. This capability does not impose any valid usage constraints on the application; however, depending on the contents of the encode input picture, it may have effects on the encoded bitstream, as described in more detail below.
- supportedEncodeFeedbackFlags is a bitmask of VkVideoEncodeFeedbackFlagBitsKHR values specifying the supported flags for video encode feedback queries.
Implementations must include support for at least VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR and VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR in supportedEncodeFeedbackFlags.
encodeInputPictureGranularity
provides information about the way
encode input picture data is used as input to video
encode operations.
In particular, some implementations may not be able to limit the set of
texels used to encode the output video bitstream to the image subregion
specified in the VkVideoPictureResourceInfoKHR structure corresponding
to the encode input picture (i.e. to the resolution of the image data to
encode specified in its codedExtent
member).
For example, the application requests the coded extent to be 1920x1080, but
the implementation is only able to source the encode input picture data at
the granularity of the codec-specific coding block size which is 16x16
pixels (or as otherwise indicated in |
If codedExtent
rounded up to the next integer multiple of
encodeInputPictureGranularity
is greater than the extent of the image
subresource specified for the encode input picture,
then the texel values corresponding to texel coordinates outside of the
bounds of the image subresource may be undefined.
However, implementations should use well-defined default values for such
texels in order to maximize the encoding efficiency for the last coding
block row/column, and/or to ensure consistent encoding results across
repeated encoding of the same input content.
Nonetheless, the values used for such texels must not have an effect on
whether the video encode operation produces a compliant bitstream, and must
not have any other effects on the encoded picture data beyond what may
otherwise result from using these texel values as input to any compression
algorithm, as defined in the used video compression standard.
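The condition above amounts to a simple rounding check. The following C sketch shows it for the 1920x1080 example with a 16x16 granularity: rounding the height up gives 1088, so an image subresource that is only 1080 texels tall may have undefined texels read past its last row, while a 1920x1088 image subresource avoids this. Extent2D is a local stand-in for VkExtent2D, the function names are hypothetical, and codedOffset is ignored for simplicity. The sketch assumes that exceeding the image in either dimension counts as "greater than the extent".

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t width, height; } Extent2D; /* stand-in for VkExtent2D */

/* Rounds v up to the next integer multiple of g (g > 0). */
static uint32_t round_up_u32(uint32_t v, uint32_t g)
{
    return (v + g - 1) / g * g;
}

/* True when codedExtent, rounded up to encodeInputPictureGranularity,
 * extends past the image subresource in either dimension, i.e. when the
 * implementation may read texels whose values may be undefined. */
static bool encode_input_may_read_outside(Extent2D codedExtent,
                                          Extent2D granularity,
                                          Extent2D imageExtent)
{
    uint32_t w = round_up_u32(codedExtent.width, granularity.width);
    uint32_t h = round_up_u32(codedExtent.height, granularity.height);
    return w > imageExtent.width || h > imageExtent.height;
}
```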
While not required, it is generally a good practice for applications to make
sure that the image subresource used for the encode input picture has an
extent that is an integer multiple of the codec-specific coding block size
(or at least |
Bits which may be set in VkVideoEncodeCapabilitiesKHR::flags, indicating the encoding tools supported, are:
// Provided by VK_KHR_video_encode_queue
typedef enum VkVideoEncodeCapabilityFlagBitsKHR {
VK_VIDEO_ENCODE_CAPABILITY_PRECEDING_EXTERNALLY_ENCODED_BYTES_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_CAPABILITY_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_DETECTION_BIT_KHR = 0x00000002,
// Provided by VK_KHR_video_encode_quantization_map
VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR = 0x00000004,
// Provided by VK_KHR_video_encode_quantization_map
VK_VIDEO_ENCODE_CAPABILITY_EMPHASIS_MAP_BIT_KHR = 0x00000008,
} VkVideoEncodeCapabilityFlagBitsKHR;
- VK_VIDEO_ENCODE_CAPABILITY_PRECEDING_EXTERNALLY_ENCODED_BYTES_BIT_KHR specifies that the implementation supports the use of VkVideoEncodeInfoKHR::precedingExternallyEncodedBytes.
- VK_VIDEO_ENCODE_CAPABILITY_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_DETECTION_BIT_KHR specifies that the implementation is able to detect and report when the destination video bitstream buffer range provided by the application is not large enough to fit the encoded bitstream data produced by a video encode operation, by reporting the VK_QUERY_RESULT_STATUS_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_KHR query result status code.
  Some implementations may not be able to reliably detect insufficient bitstream buffer range conditions in all situations. Such implementations will not report support for the VK_VIDEO_ENCODE_CAPABILITY_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_DETECTION_BIT_KHR encode capability flag for the video profile, but may still report the VK_QUERY_RESULT_STATUS_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_KHR query result status code in certain cases. Applications should always check for the specific query result status code VK_QUERY_RESULT_STATUS_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_KHR even when this encode capability flag is not supported by the implementation for the video profile in question. However, applications must not assume that a different negative query result status code indicating an unsuccessful completion of a video encode operation is not the result of an insufficient bitstream buffer condition unless this encode capability flag is supported.
- VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR specifies support for using quantization delta maps.
- VK_VIDEO_ENCODE_CAPABILITY_EMPHASIS_MAP_BIT_KHR specifies support for using emphasis maps.
// Provided by VK_KHR_video_encode_queue
typedef VkFlags VkVideoEncodeCapabilityFlagsKHR;
VkVideoEncodeCapabilityFlagsKHR
is a bitmask type for setting a mask
of zero or more VkVideoEncodeCapabilityFlagBitsKHR.
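The guidance on insufficient bitstream buffer range detection can be expressed as a small classification of encode feedback query result statuses. The following C sketch is hypothetical: the constants are local copies assumed to match VK_VIDEO_ENCODE_CAPABILITY_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_DETECTION_BIT_KHR and VK_QUERY_RESULT_STATUS_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_KHR from the Vulkan headers. It also assumes that positive status values indicate success and that zero means the result is not yet available. The specific insufficient-range status is always checked, and only when the capability flag is supported does another negative status rule out a buffer that was too small.

```c
#include <stdint.h>

/* Local copies; values assumed to match vulkan_core.h. */
#define ENCODE_CAP_INSUFFICIENT_RANGE_DETECTION 0x00000002u
#define QUERY_STATUS_INSUFFICIENT_RANGE         (-1000299000)

typedef enum {
    ENCODE_OK,               /* status > 0: completed successfully */
    ENCODE_PENDING,          /* status == 0: result not yet available */
    ENCODE_BUFFER_TOO_SMALL, /* insufficient bitstream buffer range reported */
    ENCODE_FAILED_OTHER,     /* failed; buffer size known not to be the cause */
    ENCODE_FAILED_UNKNOWN    /* failed; buffer size cannot be ruled out */
} EncodeOutcome;

/* Hypothetical helper classifying an encode feedback query result status. */
static EncodeOutcome classify_encode_status(int32_t status, uint32_t encodeCapFlags)
{
    /* Check the specific code even if the capability flag is not reported. */
    if (status == QUERY_STATUS_INSUFFICIENT_RANGE)
        return ENCODE_BUFFER_TOO_SMALL;
    if (status > 0)
        return ENCODE_OK;
    if (status == 0)
        return ENCODE_PENDING;
    /* Any other negative status rules out an undersized buffer only when
     * the implementation reports reliable detection. */
    return (encodeCapFlags & ENCODE_CAP_INSUFFICIENT_RANGE_DETECTION)
               ? ENCODE_FAILED_OTHER
               : ENCODE_FAILED_UNKNOWN;
}
```

An application seeing ENCODE_FAILED_UNKNOWN might, for example, retry with a larger bitstream buffer range before treating the failure as unrelated to buffer size.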
Video Encode Quality Levels
Implementations can support more than one video encode quality level for a video encode profile, which control the number and type of implementation-specific encoding tools and algorithms utilized in the encoding process.
Generally, using higher video encode quality levels may produce higher quality video streams at the cost of additional processing time. However, as the final quality of an encoded picture depends on the contents of the encode input picture, the contents of the active reference pictures, the codec-specific encode parameters, and the particular implementation-specific tools used corresponding to the individual video encode quality levels, there are no guarantees that using a higher video encode quality level will always produce a higher quality encoded picture for any given set of inputs.
To query properties for a specific video encode quality level supported by a video encode profile, call:
// Provided by VK_KHR_video_encode_queue
VkResult vkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceVideoEncodeQualityLevelInfoKHR* pQualityLevelInfo,
VkVideoEncodeQualityLevelPropertiesKHR* pQualityLevelProperties);
- physicalDevice is the physical device to query the video encode quality level properties for.
- pQualityLevelInfo is a pointer to a VkPhysicalDeviceVideoEncodeQualityLevelInfoKHR structure specifying the video encode profile and quality level to query properties for.
- pQualityLevelProperties is a pointer to a VkVideoEncodeQualityLevelPropertiesKHR structure in which the properties are returned.
The VkPhysicalDeviceVideoEncodeQualityLevelInfoKHR
structure is
defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkPhysicalDeviceVideoEncodeQualityLevelInfoKHR {
VkStructureType sType;
const void* pNext;
const VkVideoProfileInfoKHR* pVideoProfile;
uint32_t qualityLevel;
} VkPhysicalDeviceVideoEncodeQualityLevelInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pVideoProfile is a pointer to a VkVideoProfileInfoKHR structure specifying the video profile to query the video encode quality level properties for.
- qualityLevel is the video encode quality level to query properties for.
The VkVideoEncodeQualityLevelPropertiesKHR
structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeQualityLevelPropertiesKHR {
VkStructureType sType;
void* pNext;
VkVideoEncodeRateControlModeFlagBitsKHR preferredRateControlMode;
uint32_t preferredRateControlLayerCount;
} VkVideoEncodeQualityLevelPropertiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- preferredRateControlMode is a VkVideoEncodeRateControlModeFlagBitsKHR value indicating the preferred rate control mode to use with the video encode quality level.
- preferredRateControlLayerCount indicates the preferred number of rate control layers to use with the video encode quality level.
The VkVideoEncodeQualityLevelInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeQualityLevelInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t qualityLevel;
} VkVideoEncodeQualityLevelInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- qualityLevel is the used video encode quality level.
This structure can be specified in the following places:
- In the pNext chain of VkVideoSessionParametersCreateInfoKHR to specify the video encode quality level to use for a video session parameters object created for a video encode session. If no instance of this structure is included in the pNext chain of VkVideoSessionParametersCreateInfoKHR, then the video session parameters object is created with a video encode quality level of zero.
- In the pNext chain of VkVideoCodingControlInfoKHR to change the video encode quality level state of the bound video session.
Retrieving Encoded Session Parameters
Any codec-specific parameters stored in video session parameters objects may need to be separately encoded and included in the final video bitstream data, depending on the used video compression standard. In such cases the application must call the vkGetEncodedVideoSessionParametersKHR command to retrieve the encoded parameter data from the used video session parameters object in order to be able to produce a compliant video bitstream.
This is needed because implementations may have changed some of the codec-specific parameters stored in the video session parameters object, as defined in the Video Encode Parameter Overrides section. In addition, the vkGetEncodedVideoSessionParametersKHR command enables the application to retrieve the encoded parameter data without having to encode these codec-specific parameters manually.
Encoded parameter data can be retrieved from a video session parameters object created with a video encode operation using the command:
// Provided by VK_KHR_video_encode_queue
VkResult vkGetEncodedVideoSessionParametersKHR(
VkDevice device,
const VkVideoEncodeSessionParametersGetInfoKHR* pVideoSessionParametersInfo,
VkVideoEncodeSessionParametersFeedbackInfoKHR* pFeedbackInfo,
size_t* pDataSize,
void* pData);
- device is the logical device that owns the video session parameters object.
- pVideoSessionParametersInfo is a pointer to a VkVideoEncodeSessionParametersGetInfoKHR structure specifying the parameters of the encoded parameter data to retrieve.
- pFeedbackInfo is either NULL or a pointer to a VkVideoEncodeSessionParametersFeedbackInfoKHR structure in which feedback about the requested parameter data is returned.
- pDataSize is a pointer to a size_t value related to the amount of encode parameter data returned, as described below.
- pData is either NULL or a pointer to a buffer to write the encoded parameter data to.
If pData is NULL, then the size of the encoded parameter data, in bytes, that can be retrieved is returned in pDataSize. Otherwise, pDataSize must point to a variable set by the application to the size of the buffer, in bytes, pointed to by pData, and on return the variable is overwritten with the number of bytes actually written to pData. If the value pointed to by pDataSize is less than the size of the encoded parameter data that can be retrieved, then no data will be written to pData, zero will be written to pDataSize, and VK_INCOMPLETE will be returned instead of VK_SUCCESS, to indicate that no encoded parameter data was returned.
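The size query followed by retrieval, as described above, can be sketched in C. The real vkGetEncodedVideoSessionParametersKHR needs a VkDevice and a video session parameters object, so this sketch uses a hypothetical stand-in, fake_get_encoded_params. The stand-in models only the documented pDataSize/pData contract and returns placeholder bytes. SKETCH_SUCCESS and SKETCH_INCOMPLETE are also hypothetical names; they mirror the values of VK_SUCCESS and VK_INCOMPLETE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes mirroring VK_SUCCESS (0) and VK_INCOMPLETE (5). */
enum { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 5 };

/* Placeholder bytes standing in for encoded parameter data. */
static const unsigned char kEncodedParams[] = { 0x10, 0x20, 0x30, 0x40, 0x50, 0x60 };

/* Hypothetical stand-in that models only the pDataSize/pData contract of
 * vkGetEncodedVideoSessionParametersKHR. */
static int fake_get_encoded_params(size_t *pDataSize, void *pData)
{
    if (pData == NULL) {                      /* size query */
        *pDataSize = sizeof kEncodedParams;
        return SKETCH_SUCCESS;
    }
    if (*pDataSize < sizeof kEncodedParams) { /* buffer too small */
        *pDataSize = 0;                       /* nothing is written */
        return SKETCH_INCOMPLETE;
    }
    memcpy(pData, kEncodedParams, sizeof kEncodedParams);
    *pDataSize = sizeof kEncodedParams;       /* bytes actually written */
    return SKETCH_SUCCESS;
}

/* Queries the size, allocates a buffer, then retrieves the data.
 * Returns a malloc'd buffer (caller frees) and stores its size in *outSize,
 * or returns NULL on failure. */
static unsigned char *retrieve_encoded_params(size_t *outSize)
{
    size_t size = 0;
    if (fake_get_encoded_params(&size, NULL) != SKETCH_SUCCESS || size == 0)
        return NULL;
    unsigned char *data = malloc(size);
    if (data == NULL)
        return NULL;
    if (fake_get_encoded_params(&size, data) != SKETCH_SUCCESS) {
        free(data);
        return NULL;
    }
    *outSize = size;
    return data;
}
```

In real code, both calls would go to vkGetEncodedVideoSessionParametersKHR with the same device and pVideoSessionParametersInfo. pFeedbackInfo would be either NULL or a pointer to a VkVideoEncodeSessionParametersFeedbackInfoKHR structure.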
If pFeedbackInfo is not NULL then the members of the VkVideoEncodeSessionParametersFeedbackInfoKHR structure and any additional structures included in its pNext chain that are applicable to the video session parameters object specified in pVideoSessionParametersInfo->videoSessionParameters will be filled with feedback about the requested parameter data on all successful calls to this command.
This includes the cases when pData is NULL or when VK_INCOMPLETE is returned by the command.
The VkVideoEncodeSessionParametersGetInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeSessionParametersGetInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoSessionParametersKHR videoSessionParameters;
} VkVideoEncodeSessionParametersGetInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- videoSessionParameters is the VkVideoSessionParametersKHR object to retrieve encoded parameter data from.
Depending on the used video encode operation, additional codec-specific structures may need to be included in the pNext chain of this structure to identify the specific video session parameters to retrieve encoded parameter data for, as described in the corresponding sections.
The VkVideoEncodeSessionParametersFeedbackInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeSessionParametersFeedbackInfoKHR {
VkStructureType sType;
void* pNext;
VkBool32 hasOverrides;
} VkVideoEncodeSessionParametersFeedbackInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- hasOverrides indicates whether any of the requested parameter data were overridden by the implementation.
Depending on the used video encode operation, additional codec-specific structures can be included in the pNext chain of this structure to capture codec-specific feedback information about the requested parameter data, as described in the corresponding sections.
Video Encode Commands
To launch video encode operations, call:
// Provided by VK_KHR_video_encode_queue
void vkCmdEncodeVideoKHR(
VkCommandBuffer commandBuffer,
const VkVideoEncodeInfoKHR* pEncodeInfo);
- commandBuffer is the command buffer in which to record the command.
- pEncodeInfo is a pointer to a VkVideoEncodeInfoKHR structure specifying the parameters of the video encode operations.
Each call issues one or more video encode operations. The implicit parameter opCount corresponds to the number of video encode operations issued by the command. After calling this command, the active query index of each active query is incremented by opCount.
Currently each call to this command results in the issue of a single video encode operation.
If the bound video session was created with VK_VIDEO_SESSION_CREATE_INLINE_QUERIES_BIT_KHR and the pNext chain of pEncodeInfo includes a VkVideoInlineQueryInfoKHR structure with its queryPool member specifying a valid VkQueryPool handle, then this command will execute a query for each video encode operation issued by it.
- Active Reference Picture Information
  The list of active reference pictures used by a video encode operation is a list of image subregions used as the source of reference picture data and related parameters, and is derived from the VkVideoReferenceSlotInfoKHR structures provided as the elements of the pEncodeInfo->pReferenceSlots array. For each element of pEncodeInfo->pReferenceSlots, one or more elements are added to the active reference picture list, as defined by the codec-specific semantics. Each element of this list contains the following information:
  - The image subregion within the image subresource referred to by the video picture resource used as the reference picture.
  - The DPB slot index the reference picture is associated with.
  - The codec-specific reference information related to the reference picture.
- Reconstructed Picture Information
  Information related to the optional reconstructed picture used by a video encode operation is derived from the VkVideoReferenceSlotInfoKHR structure pointed to by pEncodeInfo->pSetupReferenceSlot, if not NULL, as defined by the codec-specific semantics, and consists of the following:
  - The image subregion within the image subresource referred to by the video picture resource used as the reconstructed picture.
  - The DPB slot index to use for picture reconstruction.
  - The codec-specific reference information related to the reconstructed picture.
Specifying a valid VkVideoReferenceSlotInfoKHR structure in pEncodeInfo->pSetupReferenceSlot is always required, unless the video session was created with VkVideoSessionCreateInfoKHR::maxDpbSlots equal to zero. However, the DPB slot identified by pEncodeInfo->pSetupReferenceSlot->slotIndex is only activated with the reconstructed picture specified in pEncodeInfo->pSetupReferenceSlot->pPictureResource if reference picture setup is requested according to the codec-specific semantics.
If reconstructed picture information is specified, but reference picture setup is not requested, according to the codec-specific semantics, the contents of the video picture resource corresponding to the reconstructed picture will be undefined after the video encode operation.
Some implementations may always output the reconstructed picture or use it as temporary storage during the video encode operation even when the reconstructed picture is not marked for future reference.
- Encode Input Picture Information
  Information related to the encode input picture used by a video encode operation is derived from pEncodeInfo->srcPictureResource and any codec-specific parameters provided in the pEncodeInfo->pNext chain, as defined by the codec-specific semantics, and consists of the following:
  - The image subregion within the image subresource referred to by the video picture resource used as the encode input picture.
  - The codec-specific picture information related to the encoded picture.
Several limiting values are defined below that are referenced by the relevant valid usage statements of this command.
- Let uint32_t activeReferencePictureCount be the size of the list of active reference pictures used by the video encode operation. Unless otherwise defined, activeReferencePictureCount is set to the value of pEncodeInfo->referenceSlotCount.
- Let VkOffset2D codedOffsetGranularity be the minimum alignment requirement for the coded offset of video picture resources. Unless otherwise defined, the values of the x and y members of codedOffsetGranularity are 0.
- Let uint32_t dpbFrameUseCount[] be an array of size maxDpbSlots, where maxDpbSlots is the VkVideoSessionCreateInfoKHR::maxDpbSlots the bound video session was created with, with each element indicating the number of times a frame associated with the corresponding DPB slot index is referred to by the video coding operation. Let the initial value of each element of the array be 0.
  - If pEncodeInfo->pSetupReferenceSlot is not NULL, then dpbFrameUseCount[i] is incremented by one, where i equals pEncodeInfo->pSetupReferenceSlot->slotIndex.
  - For each element of pEncodeInfo->pReferenceSlots, dpbFrameUseCount[i] is incremented by one, where i equals the slotIndex member of the corresponding element.
- If there is a bound video session parameters object created with VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR, then let VkExtent2D quantizationMapTexelSize be the quantization map texel size the bound video session parameters object was created with.
- Let VkExtent2D maxCodingBlockSize be the maximum codec-specific coding block size that may be used by the video encode operation.
  - If the bound video session object was created with an H.264 encode profile, then let maxCodingBlockSize be equal to the size of an H.264 macroblock, i.e. {16,16}.
  - If the bound video session object was created with an H.265 encode profile, then let maxCodingBlockSize be equal to the maximum H.265 coding block size that may be used by the video encode operation, derived as the maximum of the CTB sizes corresponding to the VkVideoEncodeH265CtbSizeFlagBitsKHR bits set in VkVideoEncodeH265CapabilitiesKHR::ctbSizes, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile the bound video session was created with.
  - If the bound video session object was created with an AV1 encode profile, then let maxCodingBlockSize be equal to the maximum AV1 superblock size that may be used by the video encode operation, derived as the maximum of the superblock sizes corresponding to the VkVideoEncodeAV1SuperblockSizeFlagBitsKHR bits set in VkVideoEncodeAV1CapabilitiesKHR::superblockSizes, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile the bound video session was created with.
  - Otherwise, maxCodingBlockSize is undefined.
- If maxCodingBlockSize is defined, then let VkExtent2D minCodingBlockExtent be the coded extent of the encode input picture expressed in terms of codec-specific coding blocks, assuming the maximum size of such coding blocks, as defined by maxCodingBlockSize, calculated from the value of the codedExtent member of pEncodeInfo->srcPictureResource as follows:
  - minCodingBlockExtent.width = (codedExtent.width + maxCodingBlockSize.width - 1) / maxCodingBlockSize.width
  - minCodingBlockExtent.height = (codedExtent.height + maxCodingBlockSize.height - 1) / maxCodingBlockSize.height
- If the bound video session object was created with an H.264 encode profile, then:
  - Let StdVideoH264PictureType h264PictureType be the picture type of the encoded picture set to the value of pStdPictureInfo->primary_pic_type specified in the VkVideoEncodeH264PictureInfoKHR structure included in the pEncodeInfo->pNext chain.
  - Let StdVideoH264PictureType h264L0PictureTypes[] and StdVideoH264PictureType h264L1PictureTypes[] be the picture types of the reference pictures in the L0 and L1 reference lists, respectively. If pStdPictureInfo->pRefLists specified in the VkVideoEncodeH264PictureInfoKHR structure included in the pEncodeInfo->pNext chain is not NULL, then for each reference index specified in the elements of the pStdPictureInfo->pRefLists->RefPicList0 and pStdPictureInfo->pRefLists->RefPicList1 arrays, if the reference index is not STD_VIDEO_H264_NO_REFERENCE_PICTURE, pStdReferenceInfo->primary_pic_type is added to h264L0PictureTypes or h264L1PictureTypes, respectively, where pStdReferenceInfo is the member of the VkVideoEncodeH264DpbSlotInfoKHR structure included in the pNext chain of the element of pEncodeInfo->pReferenceSlots for which slotIndex equals the reference index in question.
- If the bound video session object was created with an H.265 encode profile, then:
  - Let StdVideoH265PictureType h265PictureType be the picture type of the encoded picture set to the value of pStdPictureInfo->pic_type specified in the VkVideoEncodeH265PictureInfoKHR structure included in the pEncodeInfo->pNext chain.
  - Let StdVideoH265PictureType h265L0PictureTypes[] and StdVideoH265PictureType h265L1PictureTypes[] be the picture types of the reference pictures in the L0 and L1 reference lists, respectively. If pStdPictureInfo->pRefLists specified in the VkVideoEncodeH265PictureInfoKHR structure included in the pEncodeInfo->pNext chain is not NULL, then for each reference index specified in the elements of the pStdPictureInfo->pRefLists->RefPicList0 and pStdPictureInfo->pRefLists->RefPicList1 arrays, if the reference index is not STD_VIDEO_H265_NO_REFERENCE_PICTURE, pStdReferenceInfo->pic_type is added to h265L0PictureTypes or h265L1PictureTypes, respectively, where pStdReferenceInfo is the member of the VkVideoEncodeH265DpbSlotInfoKHR structure included in the pNext chain of the element of pEncodeInfo->pReferenceSlots for which slotIndex equals the reference index in question.
- If the bound video session object was created with an AV1 encode profile, then:
  - If the primaryReferenceCdfOnly member of the VkVideoEncodeAV1PictureInfoKHR structure included in the pEncodeInfo->pNext chain is set to VK_TRUE, then let int32_t cdfOnlyReferenceIndex be the value of VkVideoEncodeAV1PictureInfoKHR::pStdPictureInfo->primary_ref_frame.
  - Otherwise let int32_t cdfOnlyReferenceIndex be -1.
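Two of these limiting values can be sketched in C to make the arithmetic concrete. minCodingBlockExtent is an integer ceiling division, and dpbFrameUseCount is a per-slot counter. Extent2D and RefSlot are simplified, hypothetical stand-ins for VkExtent2D and VkVideoReferenceSlotInfoKHR. They let the example compile without the Vulkan headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirrors of VkExtent2D and of the slotIndex member
 * of VkVideoReferenceSlotInfoKHR; only the fields used here are modeled. */
typedef struct { uint32_t width, height; } Extent2D;
typedef struct { int32_t slotIndex; } RefSlot;

/* minCodingBlockExtent: the coded extent measured in coding blocks of size
 * maxCodingBlockSize, rounded up (integer ceiling division). */
static Extent2D min_coding_block_extent(Extent2D codedExtent, Extent2D maxCodingBlockSize)
{
    assert(maxCodingBlockSize.width > 0 && maxCodingBlockSize.height > 0);
    Extent2D r;
    r.width  = (codedExtent.width  + maxCodingBlockSize.width  - 1) / maxCodingBlockSize.width;
    r.height = (codedExtent.height + maxCodingBlockSize.height - 1) / maxCodingBlockSize.height;
    return r;
}

/* dpbFrameUseCount: counts[i] is the number of times DPB slot i is referred
 * to, counting the setup slot (if any) and every element of refSlots.
 * counts must have room for maxDpbSlots entries. */
static void dpb_frame_use_count(uint32_t maxDpbSlots, const RefSlot *setupSlot,
                                const RefSlot *refSlots, uint32_t refSlotCount,
                                uint32_t *counts)
{
    for (uint32_t i = 0; i < maxDpbSlots; ++i)
        counts[i] = 0;
    if (setupSlot != NULL) {
        assert(setupSlot->slotIndex >= 0 && (uint32_t)setupSlot->slotIndex < maxDpbSlots);
        counts[setupSlot->slotIndex]++;
    }
    for (uint32_t i = 0; i < refSlotCount; ++i) {
        assert(refSlots[i].slotIndex >= 0 && (uint32_t)refSlots[i].slotIndex < maxDpbSlots);
        counts[refSlots[i].slotIndex]++;
    }
}
```

Take a 1920x1080 input with the H.264 maxCodingBlockSize of {16,16}: min_coding_block_extent yields {120,68}, because 1080 / 16 = 67.5 rounds up. If the same slot index is passed as both the setup slot and a reference slot, that slot gets a count of two.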
The VkVideoEncodeInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEncodeFlagsKHR flags;
VkBuffer dstBuffer;
VkDeviceSize dstBufferOffset;
VkDeviceSize dstBufferRange;
VkVideoPictureResourceInfoKHR srcPictureResource;
const VkVideoReferenceSlotInfoKHR* pSetupReferenceSlot;
uint32_t referenceSlotCount;
const VkVideoReferenceSlotInfoKHR* pReferenceSlots;
uint32_t precedingExternallyEncodedBytes;
} VkVideoEncodeInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoEncodeFlagBitsKHR indicating video encode command flags.
- dstBuffer is the destination video bitstream buffer to write the encoded bitstream to.
- dstBufferOffset is the starting offset in bytes from the start of dstBuffer to write the encoded bitstream to.
- dstBufferRange is the maximum bitstream size in bytes that can be written to dstBuffer, starting from dstBufferOffset.
- srcPictureResource is the video picture resource to use as the encode input picture.
- pSetupReferenceSlot is NULL or a pointer to a VkVideoReferenceSlotInfoKHR structure specifying the reconstructed picture information.
- referenceSlotCount is the number of elements in the pReferenceSlots array.
- pReferenceSlots is NULL or a pointer to an array of VkVideoReferenceSlotInfoKHR structures describing the DPB slots and corresponding reference picture resources to use in this video encode operation (the set of active reference pictures).
- precedingExternallyEncodedBytes is the number of bytes externally encoded by the application to the video bitstream and is used to update the internal state of the implementation’s rate control algorithm to account for the bitrate budget consumed by these externally encoded bytes.
Bits which can be set in VkVideoEncodeInfoKHR::flags, specifying video encode flags, are:
// Provided by VK_KHR_video_encode_quantization_map
typedef enum VkVideoEncodeFlagBitsKHR {
// Provided by VK_KHR_video_encode_quantization_map
VK_VIDEO_ENCODE_WITH_QUANTIZATION_DELTA_MAP_BIT_KHR = 0x00000001,
// Provided by VK_KHR_video_encode_quantization_map
VK_VIDEO_ENCODE_WITH_EMPHASIS_MAP_BIT_KHR = 0x00000002,
} VkVideoEncodeFlagBitsKHR;
- VK_VIDEO_ENCODE_WITH_QUANTIZATION_DELTA_MAP_BIT_KHR specifies the use of a quantization delta map in the issued video encode operations.
- VK_VIDEO_ENCODE_WITH_EMPHASIS_MAP_BIT_KHR specifies the use of an emphasis map in the issued video encode operations.
// Provided by VK_KHR_video_encode_queue
typedef VkFlags VkVideoEncodeFlagsKHR;
VkVideoEncodeFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeFlagBitsKHR.
Video Encode Rate Control
The size of the encoded bitstream data produced by video encode operations is a function of the following set of constraints:
-
The capabilities of the compression algorithms defined and employed by the used video compression standard;
-
Restrictions imposed by the selected video profile according to the rules defined by the used video compression standard;
-
Further restrictions imposed by the capabilities supported by the implementation for the selected video profile;
-
The image data in the encode input picture and the set of active reference pictures (as these affect the effectiveness of the compression algorithms employed by the video encode operations);
-
The set of codec-specific and codec-independent encoding parameters provided by the application.
These also inherently define the set of decoder capabilities required for reconstructing and processing the picture data in the encoded bitstream.
Video coding uses bitrate as the quantitative metric associated with encoded bitstream data size. Bitrate expresses the rate at which video bitstream data can be transferred or processed, measured in bits per second. It depends on both the encoded bitstream data size of the encoded pictures and the frame rate of the video sequence.
Rate control algorithms are used by video encode operations to enable adjusting encoding parameters to achieve a target bitrate, or otherwise directly or indirectly control the bitrate of the generated video bitstream data. These algorithms are usually not defined by the used video compression standard, although some video compression standards do provide non-normative guidelines for implementations.
Accordingly, this specification does not require implementations to produce identical encoded bitstream data in response to video encode operations. However, it does define a set of codec-independent and codec-specific parameters that enable the application to control the behavior of the rate control algorithms supported by the implementation. Some of these parameters guarantee certain implementation behavior, while others provide guidance for implementations to apply various rate control heuristics.
Applications need to make sure that they configure rate control parameters appropriately and that they follow the promises made to the implementation through parameters providing guidance for the implementation’s rate control algorithms and heuristics in order to be able to get the desired rate control behavior and to be able to hit the set bitrate targets. In addition, the behavior of rate control may also differ across implementations even if the capabilities of the used video profile match between those implementations. This may happen due to implementations applying different rate control algorithms or heuristics internally, and thus even the same set of guidance parameter values may have different effects on the rate control behavior across implementations.
Rate Control Modes
After a video session is reset to the initial state, the default behavior and parameters of video encode rate control are entirely implementation-dependent and the application cannot affect the bitrate or quality parameters of the encoded bitstream data produced by video encode operations unless the application changes the rate control configuration of the video session, as described in the Video Coding Control section.
For each supported video profile, the implementation may expose a set of rate control modes that are available for use by the application when encoding bitstreams targeting that video profile. These modes allow using different rate control algorithms that fall into one of the following two categories:
-
Per-operation rate control
-
Stream-level rate control
In case of per-operation rate control, the bitrate of the generated video bitstream data is indirectly controlled by quality, size, or other encoding parameters specified by the application for each individual video encode operation.
In case of stream-level rate control, the application can directly specify target bitrates besides other encoding parameters to control the behavior of the rate control algorithm used by the implementation across multiple video encode operations.
The rate control modes are defined with the following enums:
// Provided by VK_KHR_video_encode_queue
typedef enum VkVideoEncodeRateControlModeFlagBitsKHR {
VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR = 0,
VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_RATE_CONTROL_MODE_CBR_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_RATE_CONTROL_MODE_VBR_BIT_KHR = 0x00000004,
} VkVideoEncodeRateControlModeFlagBitsKHR;
- VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR specifies the use of implementation-specific rate control.
- VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR specifies that rate control is disabled and the application will specify per-operation rate control parameters controlling the encoding quality. In this mode implementations will encode pictures independently of the output bitrate of prior video encode operations.
  - When using an H.264 encode profile, implementations will use the QP value specified in VkVideoEncodeH264NaluSliceInfoKHR::constantQp to control the quality of the encoded picture.
  - When using an H.265 encode profile, implementations will use the QP value specified in VkVideoEncodeH265NaluSliceSegmentInfoKHR::constantQp to control the quality of the encoded picture.
  - When using an AV1 encode profile, implementations will use the quantizer index value specified in VkVideoEncodeAV1PictureInfoKHR::constantQIndex to control the quality of the encoded picture.
- VK_VIDEO_ENCODE_RATE_CONTROL_MODE_CBR_BIT_KHR specifies the use of constant bitrate (CBR) rate control mode. In this mode the implementation will attempt to produce the encoded bitstream at a constant bitrate while conforming to the constraints of other rate control parameters.
- VK_VIDEO_ENCODE_RATE_CONTROL_MODE_VBR_BIT_KHR specifies the use of variable bitrate (VBR) rate control mode. In this mode the implementation will produce the encoded bitstream at a variable bitrate according to the constraints of other rate control parameters.
// Provided by VK_KHR_video_encode_queue
typedef VkFlags VkVideoEncodeRateControlModeFlagsKHR;
VkVideoEncodeRateControlModeFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeRateControlModeFlagBitsKHR.
Leaky Bucket Model
Video encoding implementations use the leaky bucket model for stream-level rate control. The leaky bucket is a concept referring to the interface between the video encoder and the consumer (for example, a network connection). The video encoder produces encoded bitstream data corresponding to the encoded pictures and adds it to the leaky bucket, while the consumer drains its contents.
Analogously, a similar leaky bucket is considered to exist at the input interface of a video decoder, into which encoded bitstream data is continuously added and is subsequently consumed by the video decoder. It is desirable to avoid overflowing or underflowing this leaky bucket because:
-
In case of an underflow, the video decoder will be unable to consume encoded bitstream data in order to decode pictures (and optionally display them).
-
In case of an overflow, the leaky bucket will be unable to accommodate more encoded bitstream data and such data may need to be thrown away, leading to the loss of the corresponding encoded pictures.
These requirements can be satisfied by imposing various constraints on the encoder-side leaky bucket to avoid its overflow or underflow, depending on the used rate control algorithm and codec parameters. However, enumerating these constraints is outside the scope of this specification.
The term virtual buffer is often used as an alternative to refer to the leaky bucket.
This virtual buffer model is defined by the following parameters:
- The bitrate (R) at which the encoded bitstream is expected to be processed.
- The size (B) of the virtual buffer.
- The initial occupancy (F) of the virtual buffer.
In this model the virtual buffer is used to smooth out fluctuations in the bitrate of the encoded bitstream over time without experiencing buffer overflow or underflow, as long as the bitrate of the encoded stream does not diverge from the target bitrate for extended periods of time.
This buffering may inherently impose a processing delay, as the goal of the model is to enable decoders to maintain a consistent processing rate of an encoded bitstream with varying data rate.
The initial or start-up delay (D) is computed as:
D = F / R
Applications need to configure the virtual buffer with sufficient size to avoid or minimize buffer overflows and underflows while also keeping it small enough to meet their latency goals.
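The relationship between R, B, F, and D can be illustrated with a small C sketch. It assumes R is in bits per second and B and F are in bits. It also adds a deliberately simplified decoder-side check that is not taken from any video compression standard's buffering model: bits arrive at a constant rate, and one picture is removed per frame interval. LeakyBucket, startup_delay_ms, and check_decoder_bucket are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified leaky bucket (virtual buffer) parameters. Assumed units:
 * R in bits per second, B and F in bits. */
typedef struct {
    uint64_t bitrate;          /* R */
    uint64_t bufferSize;       /* B */
    uint64_t initialOccupancy; /* F */
} LeakyBucket;

/* Start-up delay D = F / R, in milliseconds (rounded down). */
static uint64_t startup_delay_ms(const LeakyBucket *lb)
{
    assert(lb->bitrate > 0);
    return lb->initialOccupancy * 1000u / lb->bitrate;
}

/* Simplified decoder-side check: the bucket starts at F bits, each picture's
 * bits are removed at its decode time, and R / fps bits arrive before the
 * next picture is decoded. Returns -1 on underflow, 1 on overflow, 0 otherwise. */
static int check_decoder_bucket(const LeakyBucket *lb, uint32_t fps,
                                const uint64_t *pictureBits, size_t count)
{
    assert(fps > 0);
    uint64_t occupancy = lb->initialOccupancy;
    for (size_t i = 0; i < count; ++i) {
        if (pictureBits[i] > occupancy)
            return -1; /* picture not fully received yet: underflow */
        occupancy -= pictureBits[i];
        occupancy += lb->bitrate / fps;
        if (occupancy > lb->bufferSize)
            return 1;  /* more data than the buffer can hold: overflow */
    }
    return 0;
}
```

For example, with R = 4,000,000 bits/s and F = 2,000,000 bits, startup_delay_ms returns 500. Video compression standards define their own, more detailed buffering models, which this sketch does not attempt to reproduce.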
Rate Control Layers
Some video compression standards and video profiles allow associating encoded pictures with specific video coding layers. The name, identification, and semantics associated with such video coding layers are defined by the corresponding video compression standards.
Analogously, stream-level rate control can be configured to use one or more rate control layers:
-
When a single rate control layer is configured, it is applied to all encoded pictures, regardless of the picture’s video coding layer. In this case the distribution of the available bitrate budget across video coding layers is implementation-dependent.
-
When multiple rate control layers are configured, each rate control layer is applied to the corresponding video coding layer, i.e. only across encoded pictures pertaining to the corresponding video coding layer.
Individual rate control layers are identified using layer indices between zero and N-1, where N is the number of active rate control layers.
Rate control layers are only applicable when using stream-level rate control modes.
Rate Control State
Rate control state is maintained by the implementation in the video session objects and its parameters are specified using an instance of the VkVideoEncodeRateControlInfoKHR structure.
The complete rate control state of a video session is defined by the following set of parameters:
- The values of the members of the VkVideoEncodeRateControlInfoKHR structure used to configure the rate control state.
- The values of the members of any VkVideoEncodeRateControlLayerInfoKHR structures specified in VkVideoEncodeRateControlInfoKHR::pLayers used to configure the state of individual rate control layers.
- If the video session was created with an H.264 encode profile:
  - The values of the members of the VkVideoEncodeH264RateControlInfoKHR structure, if one is specified in the pNext chain of the VkVideoEncodeRateControlInfoKHR used to configure the rate control state.
  - The values of the members of any VkVideoEncodeH264RateControlLayerInfoKHR structures included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure used to configure the state of a rate control layer.
- If the video session was created with an H.265 encode profile:
  - The values of the members of the VkVideoEncodeH265RateControlInfoKHR structure, if one is specified in the pNext chain of the VkVideoEncodeRateControlInfoKHR used to configure the rate control state.
  - The values of the members of any VkVideoEncodeH265RateControlLayerInfoKHR structures included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure used to configure the state of a rate control layer.
- If the video session was created with an AV1 encode profile:
  - The values of the members of the VkVideoEncodeAV1RateControlInfoKHR structure, if one is specified in the pNext chain of the VkVideoEncodeRateControlInfoKHR used to configure the rate control state.
  - The values of the members of any VkVideoEncodeAV1RateControlLayerInfoKHR structures included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure used to configure the state of a rate control layer.
Two rate control states match if all the parameters listed above match between them.
The VkVideoEncodeRateControlInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeRateControlInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEncodeRateControlFlagsKHR flags;
VkVideoEncodeRateControlModeFlagBitsKHR rateControlMode;
uint32_t layerCount;
const VkVideoEncodeRateControlLayerInfoKHR* pLayers;
uint32_t virtualBufferSizeInMs;
uint32_t initialVirtualBufferSizeInMs;
} VkVideoEncodeRateControlInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is reserved for future use.
- rateControlMode is a VkVideoEncodeRateControlModeFlagBitsKHR value specifying the rate control mode.
- layerCount specifies the number of rate control layers to use.
- pLayers is a pointer to an array of layerCount VkVideoEncodeRateControlLayerInfoKHR structures, each specifying the rate control configuration of the corresponding rate control layer.
- virtualBufferSizeInMs is the size in milliseconds of the virtual buffer used by the implementation’s rate control algorithm for the leaky bucket model, with respect to the average bitrate of the stream calculated by summing the values of the averageBitrate members of the elements of the pLayers array.
- initialVirtualBufferSizeInMs is the initial occupancy in milliseconds of the virtual buffer used by the implementation’s rate control algorithm for the leaky bucket model.
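As a rough illustration of how virtualBufferSizeInMs relates to the layer bitrates, the following C sketch converts a millisecond value into bits at the stream average bitrate, i.e. the sum of averageBitrate across pLayers. RateControlLayer and both helper functions are hypothetical; RateControlLayer stands in for VkVideoEncodeRateControlLayerInfoKHR. Applying the same conversion to initialVirtualBufferSizeInMs is an assumption of the sketch. The implementation's internal representation of the virtual buffer is its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VkVideoEncodeRateControlLayerInfoKHR; only
 * averageBitrate (bits per second) is modeled. */
typedef struct {
    uint64_t averageBitrate;
} RateControlLayer;

/* Average bitrate of the stream: the sum of averageBitrate over pLayers. */
static uint64_t stream_average_bitrate(const RateControlLayer *pLayers, uint32_t layerCount)
{
    assert(pLayers != NULL || layerCount == 0);
    uint64_t sum = 0;
    for (uint32_t i = 0; i < layerCount; ++i)
        sum += pLayers[i].averageBitrate;
    return sum;
}

/* Converts a millisecond value such as virtualBufferSizeInMs into bits at
 * the stream average bitrate. Returns 0 when layerCount is zero, because
 * the millisecond values are ignored in that case. */
static uint64_t buffer_ms_to_bits(const RateControlLayer *pLayers, uint32_t layerCount,
                                  uint32_t sizeInMs)
{
    if (layerCount == 0)
        return 0;
    return stream_average_bitrate(pLayers, layerCount) * sizeInMs / 1000u;
}
```

With two layers at 2,000,000 and 1,000,000 bits/s, a virtualBufferSizeInMs of 1000 corresponds to 3,000,000 bits.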
If layerCount is zero then the values of virtualBufferSizeInMs and initialVirtualBufferSizeInMs are ignored.
This structure can be specified in the following places:
- In the pNext chain of VkVideoBeginCodingInfoKHR to specify the current rate control state expected to be configured when beginning a video coding scope.
- In the pNext chain of VkVideoCodingControlInfoKHR to change the rate control configuration of the bound video session.
Including this structure in the pNext chain of VkVideoCodingControlInfoKHR and including VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR in VkVideoCodingControlInfoKHR::flags enables updating the rate control configuration of the bound video session. This replaces the entire rate control configuration of the bound video session and may reset the state of all enabled rate control layers to an initial state according to the codec-specific rate control semantics defined in the corresponding codec-specific sections.
When layerCount is greater than one, multiple rate control layers are configured, and each rate control layer is applied to the corresponding video coding layer identified by the index of the corresponding element of pLayers.
- If the video session was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then this index specifies the H.264 temporal layer ID of the video coding layer the rate control layer is applied to.
- If the video session was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then this index specifies the H.265 temporal ID of the video coding layer the rate control layer is applied to.
- If the video session was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR, then this index specifies the AV1 temporal ID of the temporal layer the rate control layer is applied to.
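The mapping from video coding layers to rate control layers can be sketched as a small C helper. rate_control_layer_index is hypothetical and not part of the Vulkan API. It encodes the rules described in this section: a single rate control layer applies to all pictures, while with multiple layers the layer index equals the picture's temporal ID. Returning -1 when no layers are configured or when the temporal ID is out of range is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Vulkan API): returns the index into pLayers of
 * the rate control layer applied to a picture with the given temporal ID
 * (H.264 temporal layer ID, H.265 temporal ID, or AV1 temporal ID), or -1 if
 * no layer applies. Treating an out-of-range ID as "no layer" is an
 * assumption of this sketch. */
static int32_t rate_control_layer_index(uint32_t layerCount, uint32_t temporalId)
{
    if (layerCount == 0)
        return -1;                   /* no rate control layers configured */
    if (layerCount == 1)
        return 0;                    /* one layer covers every picture */
    if (temporalId < layerCount)
        return (int32_t)temporalId;  /* layer index == temporal ID */
    return -1;
}
```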
Additional structures providing codec-specific rate control parameters can be included in the pNext chain of VkVideoCodingControlInfoKHR depending on the video profile the bound video session was created with.
For further details, see the corresponding codec-specific rate control sections.
The new rate control configuration takes effect when the corresponding vkCmdControlVideoCodingKHR is executed on the device, and only impacts video encode operations that follow in execution order.
// Provided by VK_KHR_video_encode_queue
typedef VkFlags VkVideoEncodeRateControlFlagsKHR;
VkVideoEncodeRateControlFlagsKHR is a bitmask type for setting a mask, but currently reserved for future use.
Rate Control Layer State
The configuration of individual rate control layers is specified using an instance of the VkVideoEncodeRateControlLayerInfoKHR structure.
The VkVideoEncodeRateControlLayerInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkVideoEncodeRateControlLayerInfoKHR {
VkStructureType sType;
const void* pNext;
uint64_t averageBitrate;
uint64_t maxBitrate;
uint32_t frameRateNumerator;
uint32_t frameRateDenominator;
} VkVideoEncodeRateControlLayerInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is a pointer to a structure extending this structure.
- averageBitrate is the average bitrate to be targeted by the implementation’s rate control algorithm.
- maxBitrate is the peak bitrate to be targeted by the implementation’s rate control algorithm.
- frameRateNumerator is the numerator of the frame rate assumed by the implementation’s rate control algorithm.
- frameRateDenominator is the denominator of the frame rate assumed by the implementation’s rate control algorithm.
The ability of the implementation’s rate control algorithm to match the requested average and/or peak bitrates may be limited by the other codec-independent and codec-specific rate control parameters specified by the application, by the input content, and by whether the application conforms to the rate control guidance provided to the implementation, as described earlier.
Additional structures providing codec-specific rate control parameters can
be included in the pNext chain of VkVideoEncodeRateControlLayerInfoKHR,
depending on the video profile the bound video session was created with.
For further details, see the codec-specific encode rate control sections (H.264, H.265, and AV1).
Video Encode Quantization Maps
Quantization maps are VkImage objects that are used in video encode operations to control the relative quantization parameter values across the encoded picture. Each texel in the quantization map controls the relative quantization parameter values used to encode the corresponding rectangular block of texels in the encode input picture.
The size of the rectangular block of texels each quantization map texel covers is referred to as the quantization map texel size.
The extent of the image subresource used as a quantization map when encoding
a picture with a coded extent of (width, height) thus has to be at least
(⌈width / texelSize.width⌉, ⌈height / texelSize.height⌉), where texelSize
is the used quantization map texel size.
In particular, the quantization map texel at location (x, y) contains
relative quantization parameter values used when encoding the
texelSize-sized rectangular block of the encode input picture starting at
the texel location (x × texelSize.width, y × texelSize.height).
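The two sizing rules above can be sketched in C as follows. This is a minimal illustration, not part of the API: Extent2D is a hypothetical stand-in for VkExtent2D so that the sketch compiles without the Vulkan headers, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for VkExtent2D, so the sketch builds without vulkan.h. */
typedef struct {
    uint32_t width;
    uint32_t height;
} Extent2D;

/* ceil(a / b) for b > 0, written to avoid overflow in (a + b - 1). */
static uint32_t div_round_up(uint32_t a, uint32_t b) {
    assert(b > 0);
    return a / b + (a % b != 0);
}

/* Smallest quantization map extent usable with a picture of the given coded extent. */
static Extent2D min_quantization_map_extent(Extent2D codedExtent, Extent2D texelSize) {
    Extent2D e;
    e.width = div_round_up(codedExtent.width, texelSize.width);
    e.height = div_round_up(codedExtent.height, texelSize.height);
    return e;
}

/* Encode input picture texel at which the block covered by map texel (x, y) starts. */
static void map_texel_origin(uint32_t x, uint32_t y, Extent2D texelSize,
                             uint32_t *outX, uint32_t *outY) {
    *outX = x * texelSize.width;
    *outY = y * texelSize.height;
}
```

For example, a 1920×1080 picture with a 16×16 quantization map texel size needs a map of at least 120×68 texels, because 1080 / 16 = 67.5 rounds up to 68.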
The quantization map texel size does not always match the size of the
codec-specific coding blocks used during encoding.
Furthermore, some video compression standards allow the size of the
codec-specific coding blocks to vary across the encoded picture.
To accommodate such mismatches between the granularity at which
quantization parameters are stored in quantization maps and the granularity
at which they are applied to codec-specific coding blocks during encoding,
the following mapping rules define the quantization map texel value
corresponding to a given codec-specific coding block with a size
(width, height) at the texel location (x, y) in the encode input picture:
- If the size of the codec-specific coding block matches the used quantization map texel size, then the fetched quantization map value corresponding to the codec-specific coding block is the texel value at the texel location (x / texelSize.width, y / texelSize.height).
- If the size of the codec-specific coding block is smaller than the used quantization map texel size, then the fetched quantization map value corresponding to the codec-specific coding block is the texel value at the texel location (⌊x / texelSize.width⌋, ⌊y / texelSize.height⌋).
- If the size of the codec-specific coding block is larger than the used quantization map texel size, then the fetched quantization map value corresponding to the codec-specific coding block may be any value determined as the linear interpolation of the quantization map texel values in the subregion starting at texel location (x / texelSize.width, y / texelSize.height) with a size (⌈width / texelSize.width⌉, ⌈height / texelSize.height⌉).
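The three mapping rules can be folded into one helper, sketched below. The sketch assumes that coding blocks are aligned to their own size and that block and texel sizes are powers of two, as is typical. For a block that is not larger than the texel size, the computed region collapses to the single texel selected by the first two rules. For a larger block, it is the subregion the third rule interpolates over. Treating a block that is larger in only one dimension the same way is an assumption of this sketch. The code only locates the texels; which interpolated value is used remains up to the implementation. TexelRegion and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a rectangle of quantization map texels. */
typedef struct {
    uint32_t x, y;          /* first texel */
    uint32_t width, height; /* size in texels */
} TexelRegion;

/* Quantization map texels relevant to a coding block of size (width, height)
 * located at (x, y) in the encode input picture. */
static TexelRegion qmap_region_for_block(uint32_t x, uint32_t y,
                                         uint32_t width, uint32_t height,
                                         uint32_t texelWidth, uint32_t texelHeight) {
    assert(width > 0 && height > 0 && texelWidth > 0 && texelHeight > 0);
    TexelRegion r;
    /* floor(x / texelSize.width), floor(y / texelSize.height) */
    r.x = x / texelWidth;
    r.y = y / texelHeight;
    /* ceil(width / texelSize.width) x ceil(height / texelSize.height);
     * this is 1 x 1 whenever the block is not larger than the texel size. */
    r.width = width / texelWidth + (width % texelWidth != 0);
    r.height = height / texelHeight + (height % texelHeight != 0);
    return r;
}
```

With a 16×16 texel size, a 64×64 block at (64, 0) maps to the 4×4 texel region starting at texel (4, 0). An 8×8 block at (40, 24) maps to the single texel (2, 1).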
The actual control parameters stored in the quantization map depend on its type. This specification supports the following types of quantization maps:
Quantization Delta Maps
Quantization delta maps contain values that directly affect the codec-specific quantization parameter values used to encode the corresponding block of the encode input picture.
Quantization delta maps can be used in conjunction with any
rate control mode, including
VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR.
Due to their codec-specific nature, they are described in more detail in the corresponding codec-specific section for video encode operations that support them. In particular:
- The behavior of quantization delta maps used with an H.264 encode profile is described in the H.264 Encode Quantization and H.264 QP Delta Maps sections.
- The behavior of quantization delta maps used with an H.265 encode profile is described in the H.265 Encode Quantization and H.265 QP Delta Maps sections.
- The behavior of quantization delta maps used with an AV1 encode profile is described in the AV1 Encode Quantization and AV1 Quantizer Index Delta Maps sections.
This specification does not support quantization delta maps for any other video encode operation.
Emphasis Maps
Emphasis maps contain values that indirectly affect the codec-specific quantization parameter values used to encode the corresponding block of the encode input picture.
The texels of emphasis maps contain values that provide input to the encoder implementation about the relative importance (emphasis) of regions of the encoded pictures in order to enable the implementation’s rate control algorithm to allocate more bitrate budget for regions of the encoded picture with higher emphasis values than to those with lower emphasis values.
Emphasis maps can only be used when the current
rate control mode configured for the video
session is neither VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR
nor VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR.
As these emphasis values only control the otherwise implementation-specific behavior of the used rate control algorithm, this specification does not impose additional restrictions on implementations beyond the ones outlined in the corresponding codec-specific sections describing quantization behavior:
- The behavior of emphasis maps used with an H.264 encode profile is described in the H.264 Encode Quantization section.
- The behavior of emphasis maps used with an H.265 encode profile is described in the H.265 Encode Quantization section.
- The behavior of emphasis maps used with an AV1 encode profile is described in the AV1 Encode Quantization section.
This specification does not support emphasis maps for any other video encode operation.
Emphasis maps always have single-channel unsigned normalized integer formats,
and implementations are required to support the VK_FORMAT_R8_UNORM
format for emphasis maps, as reported in
VkVideoFormatPropertiesKHR::format, when the video encode
profile supports VK_VIDEO_ENCODE_CAPABILITY_EMPHASIS_MAP_BIT_KHR.
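Because VK_FORMAT_R8_UNORM is the emphasis map format implementations are required to support, the sketch below shows how an application might quantize per-block importance weights to 8-bit texels. It assumes the application computes those weights as floating-point values in [0, 1] and stores the texels in a tightly packed host-side staging buffer. The conversion uses round-to-nearest scaling by 255. How the resulting emphasis values steer bitrate allocation remains implementation-specific, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maps a weight in [0, 1] to an 8-bit UNORM value; out-of-range inputs
 * (including NaN) are clamped. */
static uint8_t emphasis_to_r8_unorm(float weight) {
    if (!(weight > 0.0f)) return 0;   /* negative values and NaN */
    if (weight >= 1.0f) return 255;
    return (uint8_t)(weight * 255.0f + 0.5f);
}

/* Fills a tightly packed width x height emphasis map from a weight array. */
static void fill_emphasis_map(const float *weights, uint8_t *texels,
                              size_t width, size_t height) {
    assert(weights != NULL && texels != NULL);
    for (size_t i = 0; i < width * height; ++i) {
        texels[i] = emphasis_to_r8_unorm(weights[i]);
    }
}
```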
Quantization Map Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR with
pVideoProfile->videoCodecOperation
specifying an encode operation, the
VkVideoEncodeQuantizationMapCapabilitiesKHR structure can be included
in the pNext
chain of the VkVideoCapabilitiesKHR structure to
retrieve capabilities specific to video encode quantization maps.
The VkVideoEncodeQuantizationMapCapabilitiesKHR
structure is defined
as:
// Provided by VK_KHR_video_encode_quantization_map
typedef struct VkVideoEncodeQuantizationMapCapabilitiesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkExtent2D         maxQuantizationMapExtent;
} VkVideoEncodeQuantizationMapCapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxQuantizationMapExtent indicates the maximum supported width and height of quantization maps.
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the
capabilities of an H.264 encode profile, the
VkVideoEncodeH264QuantizationMapCapabilitiesKHR
structure can be
included in the pNext
chain of the VkVideoCapabilitiesKHR
structure to retrieve additional video encode quantization map capabilities
specific to H.264 encode profiles.
The VkVideoEncodeH264QuantizationMapCapabilitiesKHR
structure is
defined as:
// Provided by VK_KHR_video_encode_h264 with VK_KHR_video_encode_quantization_map
typedef struct VkVideoEncodeH264QuantizationMapCapabilitiesKHR {
    VkStructureType    sType;
    void*              pNext;
    int32_t            minQpDelta;
    int32_t            maxQpDelta;
} VkVideoEncodeH264QuantizationMapCapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- minQpDelta indicates the minimum QP delta value supported for H.264 QP delta maps.
- maxQpDelta indicates the maximum QP delta value supported for H.264 QP delta maps.
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the
capabilities of an H.265 encode profile, the
VkVideoEncodeH265QuantizationMapCapabilitiesKHR
structure can be
included in the pNext
chain of the VkVideoCapabilitiesKHR
structure to retrieve additional video encode quantization map capabilities
specific to H.265 encode profiles.
The VkVideoEncodeH265QuantizationMapCapabilitiesKHR
structure is
defined as:
// Provided by VK_KHR_video_encode_h265 with VK_KHR_video_encode_quantization_map
typedef struct VkVideoEncodeH265QuantizationMapCapabilitiesKHR {
    VkStructureType    sType;
    void*              pNext;
    int32_t            minQpDelta;
    int32_t            maxQpDelta;
} VkVideoEncodeH265QuantizationMapCapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- minQpDelta indicates the minimum QP delta value supported for H.265 QP delta maps.
- maxQpDelta indicates the maximum QP delta value supported for H.265 QP delta maps.
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the
capabilities of an AV1 encode profile, the
VkVideoEncodeAV1QuantizationMapCapabilitiesKHR
structure can be
included in the pNext
chain of the VkVideoCapabilitiesKHR
structure to retrieve additional video encode quantization map capabilities
specific to AV1 encode profiles.
The VkVideoEncodeAV1QuantizationMapCapabilitiesKHR
structure is
defined as:
// Provided by VK_KHR_video_encode_av1 with VK_KHR_video_encode_quantization_map
typedef struct VkVideoEncodeAV1QuantizationMapCapabilitiesKHR {
    VkStructureType    sType;
    void*              pNext;
    int32_t            minQIndexDelta;
    int32_t            maxQIndexDelta;
} VkVideoEncodeAV1QuantizationMapCapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- minQIndexDelta indicates the minimum quantizer index delta value supported for AV1 quantizer index delta maps.
- maxQIndexDelta indicates the maximum quantizer index delta value supported for AV1 quantizer index delta maps.
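Before filling a QP delta map, or an AV1 quantizer index delta map, an application could clamp its computed deltas to the range reported by the codec-specific capability structure, as in the sketch below. The function is hypothetical. Its minDelta and maxDelta arguments stand for minQpDelta and maxQpDelta, or for minQIndexDelta and maxQIndexDelta in the AV1 case. Storing the deltas as int32_t is an illustrative choice and does not imply a particular quantization map format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamps each delta into [minDelta, maxDelta] in place and returns how many
 * values had to be changed. */
static size_t clamp_delta_map(int32_t *deltas, size_t count,
                              int32_t minDelta, int32_t maxDelta) {
    assert(minDelta <= maxDelta);
    size_t clamped = 0;
    for (size_t i = 0; i < count; ++i) {
        if (deltas[i] < minDelta) {
            deltas[i] = minDelta;
            ++clamped;
        } else if (deltas[i] > maxDelta) {
            deltas[i] = maxDelta;
            ++clamped;
        }
    }
    return clamped;
}
```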
Quantization Map Format Properties
When calling vkGetPhysicalDeviceVideoFormatPropertiesKHR, the
VkVideoFormatQuantizationMapPropertiesKHR structure can be included
in the pNext
chain of the VkVideoFormatPropertiesKHR structure
to retrieve video format properties specific to video encode quantization
maps.
The VkVideoFormatQuantizationMapPropertiesKHR
structure is defined as:
// Provided by VK_KHR_video_encode_quantization_map
typedef struct VkVideoFormatQuantizationMapPropertiesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkExtent2D         quantizationMapTexelSize;
} VkVideoFormatQuantizationMapPropertiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- quantizationMapTexelSize indicates the quantization map texel size of the video format, i.e., the number of pixels corresponding to each quantization map texel.
The values returned in this structure are only defined if the allowed image
usage flags returned in VkVideoFormatPropertiesKHR::imageUsageFlags
for this video format include
VK_IMAGE_USAGE_VIDEO_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR or
VK_IMAGE_USAGE_VIDEO_ENCODE_EMPHASIS_MAP_BIT_KHR.
Implementations may support multiple quantization map texel sizes for a
particular video format, which is indicated by
vkGetPhysicalDeviceVideoFormatPropertiesKHR returning multiple entries
with different quantizationMapTexelSize values.
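When several entries are returned, the application has to pick one. The sketch below shows one possible policy, not a prescribed one. It selects the entry with the smallest texel area whose required map extent, for a given coded extent, still fits within VkVideoEncodeQuantizationMapCapabilitiesKHR::maxQuantizationMapExtent. Extent2D is a hypothetical stand-in for VkExtent2D, and the function names are made up. The policy ignores the fact that, for H.265 and AV1, smaller texel sizes may restrict the usable CTB or superblock sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VkExtent2D. */
typedef struct {
    uint32_t width;
    uint32_t height;
} Extent2D;

static uint32_t div_round_up(uint32_t a, uint32_t b) {
    return a / b + (a % b != 0);
}

/* Returns the index of the chosen texel size, or -1 if none fits. */
static int pick_texel_size(const Extent2D *texelSizes, size_t count,
                           Extent2D codedExtent, Extent2D maxMapExtent) {
    int best = -1;
    uint64_t bestArea = 0;
    for (size_t i = 0; i < count; ++i) {
        Extent2D ts = texelSizes[i];
        assert(ts.width > 0 && ts.height > 0);
        uint32_t mapW = div_round_up(codedExtent.width, ts.width);
        uint32_t mapH = div_round_up(codedExtent.height, ts.height);
        if (mapW > maxMapExtent.width || mapH > maxMapExtent.height)
            continue; /* required map would exceed maxQuantizationMapExtent */
        uint64_t area = (uint64_t)ts.width * ts.height;
        if (best < 0 || area < bestArea) {
            best = (int)i;
            bestArea = area;
        }
    }
    return best;
}
```

For a 3840×2160 picture and a 128×128 maximum map extent, a 16×16 texel size would need a 240×135 map and is rejected. The 32×32 entry (120×68 map) is chosen instead.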
When calling vkGetPhysicalDeviceVideoFormatPropertiesKHR, the
VkVideoFormatH265QuantizationMapPropertiesKHR
structure can be
included in the pNext
chain of the VkVideoFormatPropertiesKHR
structure to retrieve video format properties specific to video encode
quantization maps used with an H.265 encode profile.
The VkVideoFormatH265QuantizationMapPropertiesKHR
structure is defined
as:
// Provided by VK_KHR_video_encode_h265 with VK_KHR_video_encode_quantization_map
typedef struct VkVideoFormatH265QuantizationMapPropertiesKHR {
    VkStructureType                     sType;
    void*                               pNext;
    VkVideoEncodeH265CtbSizeFlagsKHR    compatibleCtbSizes;
} VkVideoFormatH265QuantizationMapPropertiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- compatibleCtbSizes is a bitmask of VkVideoEncodeH265CtbSizeFlagBitsKHR indicating the CTB sizes that quantization maps using this video format are compatible with.
  The value of compatibleCtbSizes does not limit the use of the specific quantization map format, but it does limit the implementation’s ability to encode pictures with CTB sizes not included in compatibleCtbSizes but otherwise supported by the used video profile, as indicated by VkVideoEncodeH265CapabilitiesKHR::ctbSizes. In particular, using smaller quantization map texel sizes may prevent implementations from encoding with larger CTB sizes, which may have a negative impact on the efficiency of the encoder.
The values returned in this structure are only defined if the allowed image
usage flags returned in VkVideoFormatPropertiesKHR::imageUsageFlags
for this video format include
VK_IMAGE_USAGE_VIDEO_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR or
VK_IMAGE_USAGE_VIDEO_ENCODE_EMPHASIS_MAP_BIT_KHR.
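The compatibleCtbSizes rule can be applied by intersecting the CTB sizes the profile supports (VkVideoEncodeH265CapabilitiesKHR::ctbSizes) with those compatible with the chosen quantization map format. In the sketch below, CTB_SIZE_16, CTB_SIZE_32, and CTB_SIZE_64 are hypothetical stand-ins for the VkVideoEncodeH265CtbSizeFlagBitsKHR constants; real code would use the Vulkan enumerants. Picking the largest remaining CTB size is only an example policy, motivated by the efficiency remark above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VkVideoEncodeH265CtbSizeFlagBitsKHR bits. */
enum {
    CTB_SIZE_16 = 1 << 0,
    CTB_SIZE_32 = 1 << 1,
    CTB_SIZE_64 = 1 << 2
};

/* Largest CTB size (in luma samples) that is both supported by the profile
 * (ctbSizes) and compatible with the quantization map format
 * (compatibleCtbSizes); returns 0 if the two sets do not intersect. */
static uint32_t largest_usable_ctb_size(uint32_t ctbSizes, uint32_t compatibleCtbSizes) {
    uint32_t usable = ctbSizes & compatibleCtbSizes;
    if (usable & CTB_SIZE_64) return 64;
    if (usable & CTB_SIZE_32) return 32;
    if (usable & CTB_SIZE_16) return 16;
    return 0;
}
```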
When calling vkGetPhysicalDeviceVideoFormatPropertiesKHR, the
VkVideoFormatAV1QuantizationMapPropertiesKHR
structure can be
included in the pNext
chain of the VkVideoFormatPropertiesKHR
structure to retrieve video format properties specific to video encode
quantization maps used with an AV1 encode profile.
The VkVideoFormatAV1QuantizationMapPropertiesKHR
structure is defined
as:
// Provided by VK_KHR_video_encode_av1 with VK_KHR_video_encode_quantization_map
typedef struct VkVideoFormatAV1QuantizationMapPropertiesKHR {
    VkStructureType                           sType;
    void*                                     pNext;
    VkVideoEncodeAV1SuperblockSizeFlagsKHR    compatibleSuperblockSizes;
} VkVideoFormatAV1QuantizationMapPropertiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- compatibleSuperblockSizes is a bitmask of VkVideoEncodeAV1SuperblockSizeFlagBitsKHR indicating the AV1 superblock sizes that quantization maps using this video format are compatible with.
  The value of compatibleSuperblockSizes does not limit the use of the specific quantization map format, but it does limit the implementation’s ability to encode pictures with superblock sizes not included in compatibleSuperblockSizes but otherwise supported by the used video profile, as indicated by VkVideoEncodeAV1CapabilitiesKHR::superblockSizes. In particular, using smaller quantization map texel sizes may prevent implementations from encoding with larger superblock sizes, which may have a negative impact on the efficiency of the encoder.
The values returned in this structure are only defined if the allowed image
usage flags returned in VkVideoFormatPropertiesKHR::imageUsageFlags
for this video format include
VK_IMAGE_USAGE_VIDEO_ENCODE_QUANTIZATION_DELTA_MAP_BIT_KHR or
VK_IMAGE_USAGE_VIDEO_ENCODE_EMPHASIS_MAP_BIT_KHR.
Encoding with Quantization Maps
The VkVideoEncodeQuantizationMapInfoKHR
structure can be included in
the pNext
chain of the VkVideoEncodeInfoKHR structure passed to
the vkCmdEncodeVideoKHR command to specify the quantization map used
by the issued video encode operations.
The VkVideoEncodeQuantizationMapInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_quantization_map
typedef struct VkVideoEncodeQuantizationMapInfoKHR {
    VkStructureType    sType;
    const void*        pNext;
    VkImageView        quantizationMap;
    VkExtent2D         quantizationMapExtent;
} VkVideoEncodeQuantizationMapInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- quantizationMap specifies the image view to use as the quantization map.
- quantizationMapExtent specifies the extent of the image subregion of quantizationMap to use as the quantization map, starting at offset (0,0).
H.264 Encode Operations
Video encode operations using an H.264 encode profile can be used to encode elementary video stream sequences compliant with the ITU-T H.264 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video encode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.264 Specification as follows:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoH264SequenceParameterSet structure corresponding to the active SPS specifying the H.264 sequence parameter set.
  - The StdVideoH264PictureParameterSet structure corresponding to the active PPS specifying the H.264 picture parameter set.
  - The StdVideoEncodeH264PictureInfo structure specifying the H.264 picture information.
  - The StdVideoEncodeH264SliceHeader structures specifying the H.264 slice header parameters for each encoded H.264 slice.
  - The StdVideoEncodeH264ReferenceInfo structures specifying the H.264 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The encoded bitstream data is written to the destination video bitstream buffer range as defined in the H.264 Encode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used encode input picture, active reference pictures, and optional reconstructed picture is accessed as defined in the H.264 Encode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the H.264 picture information.
If the parameters adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.264 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video encode operation will complete successfully. Otherwise, the video encode operation may complete unsuccessfully.
H.264 Encode Parameter Overrides
Implementations may override, unless otherwise specified, any of the H.264 encode parameters specified in the following Video Std structures:
- StdVideoH264SequenceParameterSet
- StdVideoH264PictureParameterSet
- StdVideoEncodeH264PictureInfo
- StdVideoEncodeH264SliceHeader
- StdVideoEncodeH264ReferenceInfo
All such H.264 encode parameter overrides must fulfill the conditions defined in the Video Encode Parameter Overrides section.
In addition, implementations must not override any of the following H.264 encode parameters:
- StdVideoEncodeH264PictureInfo::primary_pic_type
- StdVideoEncodeH264SliceHeader::slice_type
In case of H.264 encode parameters stored in video session parameters objects, applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened. If the query indicates that implementation overrides were applied, then the application needs to retrieve and use the encoded H.264 parameter sets in the bitstream in order to be able to produce a compliant H.264 video bitstream using the H.264 encode parameters stored in the video session parameters object.
In case of any H.264 encode parameters stored in the encoded bitstream
produced by video encode operations, if the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, the
application can use such queries to retrieve feedback about whether any
implementation overrides have been applied to those H.264 encode parameters.
H.264 Encode Bitstream Data Access
Each video encode operation writes one or more VCL NAL units comprising
slice headers and data of the encoded picture, in the format defined in
sections 7.3.3 and 7.3.4, according to the semantics defined in sections
7.4.3 and 7.4.4 of the ITU-T H.264 Specification, respectively.
The number of VCL NAL units written is specified by
VkVideoEncodeH264PictureInfoKHR::naluSliceEntryCount.
In addition, if VkVideoEncodeH264PictureInfoKHR::generatePrefixNalu
is VK_TRUE for the video encode operation, then an additional prefix NAL
unit is written before each VCL NAL unit corresponding to individual slices,
in the format defined in section 7.3.2.12, according to the semantics
defined in section 7.4.2.12 of the ITU-T H.264 Specification,
respectively.
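To illustrate the resulting layout, the sketch below lists the nal_unit_type values one would expect in the output of a single video encode operation, assuming one VCL NAL unit per slice. It uses the nal_unit_type values from Table 7-1 of the ITU-T H.264 Specification: 1 for a coded slice of a non-IDR picture, 5 for a coded slice of an IDR picture, and 14 for a prefix NAL unit. The function is hypothetical and ignores any non-VCL NAL units, such as parameter sets, that an application may emit separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    NAL_SLICE_NON_IDR = 1, /* coded slice of a non-IDR picture */
    NAL_SLICE_IDR = 5,     /* coded slice of an IDR picture */
    NAL_PREFIX = 14        /* prefix NAL unit */
};

/* Writes the expected nal_unit_type sequence for one encoded picture into out
 * and returns the number of entries, or 0 if capacity is too small. */
static uint32_t expected_h264_nal_types(uint32_t naluSliceEntryCount, bool isIdr,
                                        bool generatePrefixNalu,
                                        uint8_t *out, uint32_t capacity) {
    uint32_t total = generatePrefixNalu ? 2u * naluSliceEntryCount : naluSliceEntryCount;
    if (total > capacity) return 0;
    uint32_t n = 0;
    for (uint32_t i = 0; i < naluSliceEntryCount; ++i) {
        if (generatePrefixNalu)
            out[n++] = NAL_PREFIX; /* prefix NAL unit precedes each slice */
        out[n++] = isIdr ? NAL_SLICE_IDR : NAL_SLICE_NON_IDR;
    }
    assert(n == total);
    return n;
}
```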
H.264 Encode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity,
as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of an encode input picture, reference picture, or
reconstructed picture accessed by video coding
operations using an H.264 encode profile is defined
as the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
Where codedExtent is the member of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
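The per-dimension computation of the accessed range can be sketched as below. The function name is hypothetical. The same call serves endX, with codedExtent.width, pictureAccessGranularity.width, and the subresource width, and endY, with the corresponding heights.

```c
#include <assert.h>
#include <stdint.h>

/* endX (or endY): codedSize rounded up to a multiple of granularity,
 * then clamped to the image subresource size. */
static uint32_t picture_access_end(uint32_t codedSize, uint32_t granularity,
                                   uint32_t subresourceSize) {
    assert(granularity > 0);
    uint64_t rounded = ((uint64_t)codedSize + granularity - 1) / granularity * granularity;
    return rounded < subresourceSize ? (uint32_t)rounded : subresourceSize;
}
```

For instance, a 1920×1080 coded extent with a 16×16 access granularity in a 1920×1088 image subresource gives an accessed region of 1920×1088 texels. If the subresource were only 1920×1080, the clamp would limit endY to 1080.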
In case of video encode operations using an H.264 encode profile, any access to a picture at the coordinates
(x, y), as defined by the ITU-T H.264 Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates (x, y).
Implementations may choose not to access some or all texels within particular reference pictures available to a video encode operation (e.g. due to video encode parameter overrides restricting the effective set of used reference pictures, or if the encoding algorithm chooses not to use certain subregions of the reference picture data for sample prediction).
H.264 Frame, Picture, and Slice
H.264 pictures are partitioned into slices, as defined in section 6.3 of the ITU-T H.264 Specification.
For the purposes of this specification, the H.264 slices comprising a picture are referred to as the picture partitions of the picture.
Video encode operations using an H.264 encode profile can encode slices of different types, as defined in section 7.4.3
of the ITU-T H.264 Specification, by specifying the
corresponding enumeration constant value in
StdVideoEncodeH264SliceHeader::slice_type in the
H.264 slice header parameters from the
Video Std enumeration type StdVideoH264SliceType:
- STD_VIDEO_H264_SLICE_TYPE_P indicates that the slice is a P slice as defined in section 3.109 of the ITU-T H.264 Specification.
- STD_VIDEO_H264_SLICE_TYPE_B indicates that the slice is a B slice as defined in section 3.9 of the ITU-T H.264 Specification.
- STD_VIDEO_H264_SLICE_TYPE_I indicates that the slice is an I slice as defined in section 3.66 of the ITU-T H.264 Specification.
Pictures constructed from such slices can be of different types, as defined
in section 7.4.2.4 of the ITU-T H.264 Specification.
Video encode operations using an H.264 encode profile can encode pictures of a specific type by specifying the
corresponding enumeration constant value in
StdVideoEncodeH264PictureInfo::primary_pic_type in the
H.264 picture information from the Video Std
enumeration type StdVideoH264PictureType:
- STD_VIDEO_H264_PICTURE_TYPE_P indicates that the picture is a P picture. A frame consisting of a P picture is also referred to as a P frame.
- STD_VIDEO_H264_PICTURE_TYPE_B indicates that the picture is a B picture. A frame consisting of a B picture is also referred to as a B frame.
- STD_VIDEO_H264_PICTURE_TYPE_I indicates that the picture is an I picture. A frame consisting of an I picture is also referred to as an I frame.
- STD_VIDEO_H264_PICTURE_TYPE_IDR indicates that the picture is a special type of I picture called an IDR picture as defined in section 3.69 of the ITU-T H.264 Specification. A frame consisting of an IDR picture is also referred to as an IDR frame.
H.264 Coding Blocks
H.264 encode supports a single type of coding block called a macroblock, as defined in section 3.84 of the ITU-T H.264 Specification.
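An H.264 macroblock covers a 16×16 block of luma samples, so the number of macroblocks in a frame follows from the coded extent by rounding each dimension up to whole macroblocks. The sketch below assumes frame (non-field) coding, and its function name is hypothetical. The row count is useful when reasoning about slices that must start at macroblock row boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define H264_MB_SIZE 16u /* luma samples per macroblock side */

/* Macroblock columns and rows for a frame of the given coded size. */
static void h264_mb_grid(uint32_t codedWidth, uint32_t codedHeight,
                         uint32_t *mbCols, uint32_t *mbRows) {
    assert(mbCols != NULL && mbRows != NULL);
    *mbCols = (codedWidth + H264_MB_SIZE - 1) / H264_MB_SIZE;
    *mbRows = (codedHeight + H264_MB_SIZE - 1) / H264_MB_SIZE;
}
```

A 1920×1080 frame, for example, consists of 120 × 68 = 8160 macroblocks, with the last macroblock row extending past the coded height.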
H.264 Encode Profile
A video profile supporting H.264 video encode operations is specified by
setting VkVideoProfileInfoKHR::videoCodecOperation
to
VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR
and adding a
VkVideoEncodeH264ProfileInfoKHR
structure to the
VkVideoProfileInfoKHR::pNext
chain.
The VkVideoEncodeH264ProfileInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264ProfileInfoKHR {
    VkStructureType           sType;
    const void*               pNext;
    StdVideoH264ProfileIdc    stdProfileIdc;
} VkVideoEncodeH264ProfileInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdProfileIdc is a StdVideoH264ProfileIdc value specifying the H.264 codec profile IDC, as defined in section A.2 of the ITU-T H.264 Specification.
H.264 Encode Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the
capabilities for an H.264 encode profile, the
VkVideoCapabilitiesKHR::pNext
chain must include a
VkVideoEncodeH264CapabilitiesKHR
structure that will be filled with
the profile-specific capabilities.
The VkVideoEncodeH264CapabilitiesKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264CapabilitiesKHR {
    VkStructureType                        sType;
    void*                                  pNext;
    VkVideoEncodeH264CapabilityFlagsKHR    flags;
    StdVideoH264LevelIdc                   maxLevelIdc;
    uint32_t                               maxSliceCount;
    uint32_t                               maxPPictureL0ReferenceCount;
    uint32_t                               maxBPictureL0ReferenceCount;
    uint32_t                               maxL1ReferenceCount;
    uint32_t                               maxTemporalLayerCount;
    VkBool32                               expectDyadicTemporalLayerPattern;
    int32_t                                minQp;
    int32_t                                maxQp;
    VkBool32                               prefersGopRemainingFrames;
    VkBool32                               requiresGopRemainingFrames;
    VkVideoEncodeH264StdFlagsKHR           stdSyntaxFlags;
} VkVideoEncodeH264CapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoEncodeH264CapabilityFlagBitsKHR indicating supported H.264 encoding capabilities.
- maxLevelIdc is a StdVideoH264LevelIdc value indicating the maximum H.264 level supported by the profile, where enum constant STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifies H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification.
- maxSliceCount indicates the maximum number of slices that can be encoded for a single picture. Further restrictions may apply to the number of slices that can be encoded for a single picture depending on other capabilities and codec-specific rules.
- maxPPictureL0ReferenceCount indicates the maximum number of reference pictures the implementation supports in the reference list L0 for P pictures.
  As implementations may override the reference lists, maxPPictureL0ReferenceCount does not limit the number of elements that the application can specify in the L0 reference list for P pictures. However, if maxPPictureL0ReferenceCount is zero, then the use of P pictures is not allowed.
- maxBPictureL0ReferenceCount indicates the maximum number of reference pictures the implementation supports in the reference list L0 for B pictures.
- maxL1ReferenceCount indicates the maximum number of reference pictures the implementation supports in the reference list L1 if encoding of B pictures is supported.
  As implementations may override the reference lists, maxBPictureL0ReferenceCount and maxL1ReferenceCount do not limit the number of elements that the application can specify in the L0 and L1 reference lists for B pictures. However, if maxBPictureL0ReferenceCount and maxL1ReferenceCount are both zero, then the use of B pictures is not allowed.
- maxTemporalLayerCount indicates the maximum number of H.264 temporal layers supported by the implementation.
- expectDyadicTemporalLayerPattern indicates that the implementation’s rate control algorithms expect the application to use a dyadic temporal layer pattern when encoding multiple temporal layers.
- minQp indicates the minimum QP value supported.
- maxQp indicates the maximum QP value supported.
- prefersGopRemainingFrames indicates that the implementation’s rate control algorithm prefers the application to specify the number of frames of each type remaining in the current group of pictures when beginning a video coding scope.
- requiresGopRemainingFrames indicates that the implementation’s rate control algorithm requires the application to specify the number of frames of each type remaining in the current group of pictures when beginning a video coding scope.
- stdSyntaxFlags is a bitmask of VkVideoEncodeH264StdFlagBitsKHR indicating capabilities related to H.264 syntax elements.
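The two zero-count rules above translate directly into checks an application might run before choosing a GOP structure. In the sketch below, H264RefCaps is a hypothetical subset of VkVideoEncodeH264CapabilitiesKHR that lets the code compile without the Vulkan headers. The function names are also hypothetical. Nonzero counts do not cap how many entries the application may place in the reference lists, because implementations may override them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of VkVideoEncodeH264CapabilitiesKHR. */
typedef struct {
    uint32_t maxPPictureL0ReferenceCount;
    uint32_t maxBPictureL0ReferenceCount;
    uint32_t maxL1ReferenceCount;
} H264RefCaps;

/* P pictures are not allowed when maxPPictureL0ReferenceCount is zero. */
static bool h264_p_pictures_allowed(const H264RefCaps *caps) {
    return caps->maxPPictureL0ReferenceCount > 0;
}

/* B pictures are not allowed when both B-picture reference counts are zero. */
static bool h264_b_pictures_allowed(const H264RefCaps *caps) {
    return caps->maxBPictureL0ReferenceCount > 0 || caps->maxL1ReferenceCount > 0;
}
```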
Bits which may be set in
VkVideoEncodeH264CapabilitiesKHR::flags
, indicating the H.264
encoding capabilities supported, are:
// Provided by VK_KHR_video_encode_h264
typedef enum VkVideoEncodeH264CapabilityFlagBitsKHR {
    VK_VIDEO_ENCODE_H264_CAPABILITY_HRD_COMPLIANCE_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_H264_CAPABILITY_PREDICTION_WEIGHT_TABLE_GENERATED_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_H264_CAPABILITY_ROW_UNALIGNED_SLICE_BIT_KHR = 0x00000004,
    VK_VIDEO_ENCODE_H264_CAPABILITY_DIFFERENT_SLICE_TYPE_BIT_KHR = 0x00000008,
    VK_VIDEO_ENCODE_H264_CAPABILITY_B_FRAME_IN_L0_LIST_BIT_KHR = 0x00000010,
    VK_VIDEO_ENCODE_H264_CAPABILITY_B_FRAME_IN_L1_LIST_BIT_KHR = 0x00000020,
    VK_VIDEO_ENCODE_H264_CAPABILITY_PER_PICTURE_TYPE_MIN_MAX_QP_BIT_KHR = 0x00000040,
    VK_VIDEO_ENCODE_H264_CAPABILITY_PER_SLICE_CONSTANT_QP_BIT_KHR = 0x00000080,
    VK_VIDEO_ENCODE_H264_CAPABILITY_GENERATE_PREFIX_NALU_BIT_KHR = 0x00000100,
  // Provided by VK_KHR_video_encode_h264 with VK_KHR_video_encode_quantization_map
    VK_VIDEO_ENCODE_H264_CAPABILITY_MB_QP_DIFF_WRAPAROUND_BIT_KHR = 0x00000200,
} VkVideoEncodeH264CapabilityFlagBitsKHR;
- VK_VIDEO_ENCODE_H264_CAPABILITY_HRD_COMPLIANCE_BIT_KHR specifies whether the implementation may be able to generate HRD compliant bitstreams if any of the nal_hrd_parameters_present_flag or vcl_hrd_parameters_present_flag members of StdVideoH264SpsVuiFlags are set to 1 in the active SPS.
- VK_VIDEO_ENCODE_H264_CAPABILITY_PREDICTION_WEIGHT_TABLE_GENERATED_BIT_KHR specifies that if StdVideoH264PpsFlags::weighted_pred_flag is set to 1 or StdVideoH264PictureParameterSet::weighted_bipred_idc is set to STD_VIDEO_H264_WEIGHTED_BIPRED_IDC_EXPLICIT in the active PPS when encoding a P picture or B picture, respectively, then the implementation is able to internally decide syntax for pred_weight_table, as defined in section 7.4.3.2 of the ITU-T H.264 Specification, and the application is not required to provide a weight table in the H.264 slice header parameters.
- VK_VIDEO_ENCODE_H264_CAPABILITY_ROW_UNALIGNED_SLICE_BIT_KHR specifies that each slice in a frame with multiple slices may begin or finish at any offset in a macroblock row. If not supported, all slices in the frame must begin at the start of a macroblock row (and hence each slice must finish at the end of a macroblock row).
- VK_VIDEO_ENCODE_H264_CAPABILITY_DIFFERENT_SLICE_TYPE_BIT_KHR specifies that when a frame is encoded with multiple slices, the implementation allows encoding each slice with a different StdVideoEncodeH264SliceHeader::slice_type specified in the H.264 slice header parameters. If not supported, all slices of the frame must be encoded with the same slice_type, which corresponds to the picture type of the frame.
- VK_VIDEO_ENCODE_H264_CAPABILITY_B_FRAME_IN_L0_LIST_BIT_KHR specifies support for using a B frame as L0 reference, as specified in StdVideoEncodeH264ReferenceListsInfo::RefPicList0 in the H.264 picture information.
- VK_VIDEO_ENCODE_H264_CAPABILITY_B_FRAME_IN_L1_LIST_BIT_KHR specifies support for using a B frame as L1 reference, as specified in StdVideoEncodeH264ReferenceListsInfo::RefPicList1 in the H.264 picture information.
- VK_VIDEO_ENCODE_H264_CAPABILITY_PER_PICTURE_TYPE_MIN_MAX_QP_BIT_KHR specifies support for specifying different QP values in the members of VkVideoEncodeH264QpKHR.
- VK_VIDEO_ENCODE_H264_CAPABILITY_PER_SLICE_CONSTANT_QP_BIT_KHR specifies support for specifying different constant QP values for each slice.
- VK_VIDEO_ENCODE_H264_CAPABILITY_GENERATE_PREFIX_NALU_BIT_KHR specifies support for generating prefix NAL units by setting VkVideoEncodeH264PictureInfoKHR::generatePrefixNalu to VK_TRUE.
- VK_VIDEO_ENCODE_H264_CAPABILITY_MB_QP_DIFF_WRAPAROUND_BIT_KHR indicates support for wraparound during the calculation of the QP values of subsequently encoded macroblocks, as defined in equation 7-37 of the ITU-T H.264 Specification. If not supported, equation 7-37 of the ITU-T H.264 Specification is effectively reduced to the following:
  $QP_Y = QP_{Y,PREV} + \mathit{mb\_qp\_delta}$
  The effect of this is that the maximum QP difference across subsequent macroblocks is limited to the $[-(26 + QpBdOffset_Y / 2),\ 25 + QpBdOffset_Y / 2]$ range.
// Provided by VK_KHR_video_encode_h264
typedef VkFlags VkVideoEncodeH264CapabilityFlagsKHR;
VkVideoEncodeH264CapabilityFlagsKHR
is a bitmask type for setting a
mask of zero or more VkVideoEncodeH264CapabilityFlagBitsKHR.
Bits which may be set in VkVideoEncodeH264CapabilitiesKHR::stdSyntaxFlags, indicating the capabilities related to the H.264 syntax elements, are:
// Provided by VK_KHR_video_encode_h264
typedef enum VkVideoEncodeH264StdFlagBitsKHR {
VK_VIDEO_ENCODE_H264_STD_SEPARATE_COLOR_PLANE_FLAG_SET_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_H264_STD_QPPRIME_Y_ZERO_TRANSFORM_BYPASS_FLAG_SET_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_H264_STD_SCALING_MATRIX_PRESENT_FLAG_SET_BIT_KHR = 0x00000004,
VK_VIDEO_ENCODE_H264_STD_CHROMA_QP_INDEX_OFFSET_BIT_KHR = 0x00000008,
VK_VIDEO_ENCODE_H264_STD_SECOND_CHROMA_QP_INDEX_OFFSET_BIT_KHR = 0x00000010,
VK_VIDEO_ENCODE_H264_STD_PIC_INIT_QP_MINUS26_BIT_KHR = 0x00000020,
VK_VIDEO_ENCODE_H264_STD_WEIGHTED_PRED_FLAG_SET_BIT_KHR = 0x00000040,
VK_VIDEO_ENCODE_H264_STD_WEIGHTED_BIPRED_IDC_EXPLICIT_BIT_KHR = 0x00000080,
VK_VIDEO_ENCODE_H264_STD_WEIGHTED_BIPRED_IDC_IMPLICIT_BIT_KHR = 0x00000100,
VK_VIDEO_ENCODE_H264_STD_TRANSFORM_8X8_MODE_FLAG_SET_BIT_KHR = 0x00000200,
VK_VIDEO_ENCODE_H264_STD_DIRECT_SPATIAL_MV_PRED_FLAG_UNSET_BIT_KHR = 0x00000400,
VK_VIDEO_ENCODE_H264_STD_ENTROPY_CODING_MODE_FLAG_UNSET_BIT_KHR = 0x00000800,
VK_VIDEO_ENCODE_H264_STD_ENTROPY_CODING_MODE_FLAG_SET_BIT_KHR = 0x00001000,
VK_VIDEO_ENCODE_H264_STD_DIRECT_8X8_INFERENCE_FLAG_UNSET_BIT_KHR = 0x00002000,
VK_VIDEO_ENCODE_H264_STD_CONSTRAINED_INTRA_PRED_FLAG_SET_BIT_KHR = 0x00004000,
VK_VIDEO_ENCODE_H264_STD_DEBLOCKING_FILTER_DISABLED_BIT_KHR = 0x00008000,
VK_VIDEO_ENCODE_H264_STD_DEBLOCKING_FILTER_ENABLED_BIT_KHR = 0x00010000,
VK_VIDEO_ENCODE_H264_STD_DEBLOCKING_FILTER_PARTIAL_BIT_KHR = 0x00020000,
VK_VIDEO_ENCODE_H264_STD_SLICE_QP_DELTA_BIT_KHR = 0x00080000,
VK_VIDEO_ENCODE_H264_STD_DIFFERENT_SLICE_QP_DELTA_BIT_KHR = 0x00100000,
} VkVideoEncodeH264StdFlagBitsKHR;
- VK_VIDEO_ENCODE_H264_STD_SEPARATE_COLOR_PLANE_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264SpsFlags::separate_colour_plane_flag in the SPS when that value is 1.
- VK_VIDEO_ENCODE_H264_STD_QPPRIME_Y_ZERO_TRANSFORM_BYPASS_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264SpsFlags::qpprime_y_zero_transform_bypass_flag in the SPS when that value is 1.
- VK_VIDEO_ENCODE_H264_STD_SCALING_MATRIX_PRESENT_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided values for StdVideoH264SpsFlags::seq_scaling_matrix_present_flag in the SPS and StdVideoH264PpsFlags::pic_scaling_matrix_present_flag in the PPS when any of those values are 1.
- VK_VIDEO_ENCODE_H264_STD_CHROMA_QP_INDEX_OFFSET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PictureParameterSet::chroma_qp_index_offset in the PPS when that value is non-zero.
- VK_VIDEO_ENCODE_H264_STD_SECOND_CHROMA_QP_INDEX_OFFSET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PictureParameterSet::second_chroma_qp_index_offset in the PPS when that value is non-zero.
- VK_VIDEO_ENCODE_H264_STD_PIC_INIT_QP_MINUS26_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PictureParameterSet::pic_init_qp_minus26 in the PPS when that value is non-zero.
- VK_VIDEO_ENCODE_H264_STD_WEIGHTED_PRED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PpsFlags::weighted_pred_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H264_STD_WEIGHTED_BIPRED_IDC_EXPLICIT_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PictureParameterSet::weighted_bipred_idc in the PPS when that value is STD_VIDEO_H264_WEIGHTED_BIPRED_IDC_EXPLICIT.
- VK_VIDEO_ENCODE_H264_STD_WEIGHTED_BIPRED_IDC_IMPLICIT_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PictureParameterSet::weighted_bipred_idc in the PPS when that value is STD_VIDEO_H264_WEIGHTED_BIPRED_IDC_IMPLICIT.
- VK_VIDEO_ENCODE_H264_STD_TRANSFORM_8X8_MODE_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PpsFlags::transform_8x8_mode_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H264_STD_DIRECT_SPATIAL_MV_PRED_FLAG_UNSET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH264SliceHeaderFlags::direct_spatial_mv_pred_flag in the H.264 slice header parameters when that value is 0.
- VK_VIDEO_ENCODE_H264_STD_ENTROPY_CODING_MODE_FLAG_UNSET_BIT_KHR specifies whether the implementation supports CAVLC entropy coding, as defined in section 9.2 of the ITU-T H.264 Specification, and thus supports using the application-provided value for StdVideoH264PpsFlags::entropy_coding_mode_flag in the PPS when that value is 0.
- VK_VIDEO_ENCODE_H264_STD_ENTROPY_CODING_MODE_FLAG_SET_BIT_KHR specifies whether the implementation supports CABAC entropy coding, as defined in section 9.3 of the ITU-T H.264 Specification, and thus supports using the application-provided value for StdVideoH264PpsFlags::entropy_coding_mode_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H264_STD_DIRECT_8X8_INFERENCE_FLAG_UNSET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264SpsFlags::direct_8x8_inference_flag in the SPS when that value is 0.
- VK_VIDEO_ENCODE_H264_STD_CONSTRAINED_INTRA_PRED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH264PpsFlags::constrained_intra_pred_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H264_STD_DEBLOCKING_FILTER_DISABLED_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH264SliceHeader::disable_deblocking_filter_idc in the H.264 slice header parameters when that value is STD_VIDEO_H264_DISABLE_DEBLOCKING_FILTER_IDC_DISABLED.
- VK_VIDEO_ENCODE_H264_STD_DEBLOCKING_FILTER_ENABLED_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH264SliceHeader::disable_deblocking_filter_idc in the H.264 slice header parameters when that value is STD_VIDEO_H264_DISABLE_DEBLOCKING_FILTER_IDC_ENABLED.
- VK_VIDEO_ENCODE_H264_STD_DEBLOCKING_FILTER_PARTIAL_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH264SliceHeader::disable_deblocking_filter_idc in the H.264 slice header parameters when that value is STD_VIDEO_H264_DISABLE_DEBLOCKING_FILTER_IDC_PARTIAL.
- VK_VIDEO_ENCODE_H264_STD_SLICE_QP_DELTA_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH264SliceHeader::slice_qp_delta in the H.264 slice header parameters when that value is identical across the slices of the encoded frame.
- VK_VIDEO_ENCODE_H264_STD_DIFFERENT_SLICE_QP_DELTA_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH264SliceHeader::slice_qp_delta in the H.264 slice header parameters when that value is different across the slices of the encoded frame.
These capability flags tell the application which values of the listed H.264 syntax elements the implementation can use as provided, without having to override them. They do not otherwise restrict the values that the application can specify for any of these syntax elements.
// Provided by VK_KHR_video_encode_h264
typedef VkFlags VkVideoEncodeH264StdFlagsKHR;
VkVideoEncodeH264StdFlagsKHR
is a bitmask type for setting a mask of
zero or more VkVideoEncodeH264StdFlagBitsKHR.
H.264 Encode Quality Level Properties
When calling vkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHR with pQualityLevelInfo->pVideoProfile->videoCodecOperation specified as VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, the VkVideoEncodeH264QualityLevelPropertiesKHR structure must be included in the pNext chain of the VkVideoEncodeQualityLevelPropertiesKHR structure to retrieve additional video encode quality level properties specific to H.264 encoding.
The VkVideoEncodeH264QualityLevelPropertiesKHR structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264QualityLevelPropertiesKHR {
VkStructureType sType;
void* pNext;
VkVideoEncodeH264RateControlFlagsKHR preferredRateControlFlags;
uint32_t preferredGopFrameCount;
uint32_t preferredIdrPeriod;
uint32_t preferredConsecutiveBFrameCount;
uint32_t preferredTemporalLayerCount;
VkVideoEncodeH264QpKHR preferredConstantQp;
uint32_t preferredMaxL0ReferenceCount;
uint32_t preferredMaxL1ReferenceCount;
VkBool32 preferredStdEntropyCodingModeFlag;
} VkVideoEncodeH264QualityLevelPropertiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- preferredRateControlFlags is a bitmask of VkVideoEncodeH264RateControlFlagBitsKHR values indicating the preferred flags to use for VkVideoEncodeH264RateControlInfoKHR::flags.
- preferredGopFrameCount indicates the preferred value to use for VkVideoEncodeH264RateControlInfoKHR::gopFrameCount.
- preferredIdrPeriod indicates the preferred value to use for VkVideoEncodeH264RateControlInfoKHR::idrPeriod.
- preferredConsecutiveBFrameCount indicates the preferred value to use for VkVideoEncodeH264RateControlInfoKHR::consecutiveBFrameCount.
- preferredTemporalLayerCount indicates the preferred value to use for VkVideoEncodeH264RateControlInfoKHR::temporalLayerCount.
- preferredConstantQp indicates the preferred values to use for VkVideoEncodeH264NaluSliceInfoKHR::constantQp for each picture type when using rate control mode VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR.
- preferredMaxL0ReferenceCount indicates the preferred maximum number of reference pictures to use in the reference list L0.
- preferredMaxL1ReferenceCount indicates the preferred maximum number of reference pictures to use in the reference list L1.
- preferredStdEntropyCodingModeFlag indicates the preferred value to use for entropy_coding_mode_flag in StdVideoH264PpsFlags.
H.264 Encode Session
Additional parameters can be specified when creating a video session with an
H.264 encode profile by including an instance of the
VkVideoEncodeH264SessionCreateInfoKHR structure in the pNext
chain of VkVideoSessionCreateInfoKHR.
The VkVideoEncodeH264SessionCreateInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264SessionCreateInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useMaxLevelIdc;
StdVideoH264LevelIdc maxLevelIdc;
} VkVideoEncodeH264SessionCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- useMaxLevelIdc indicates whether the value of maxLevelIdc should be used by the implementation. When it is VK_FALSE, the implementation ignores the value of maxLevelIdc and uses the value of VkVideoEncodeH264CapabilitiesKHR::maxLevelIdc, as reported by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile.
- maxLevelIdc is a StdVideoH264LevelIdc value specifying the upper bound on the H.264 level for the video bitstreams produced by the created video session, where enum constant STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifies H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification.
H.264 Encode Parameter Sets
Video session parameters objects created with
the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR
can contain the following types of parameters:
- H.264 Sequence Parameter Sets (SPS)
  Represented by StdVideoH264SequenceParameterSet structures and interpreted as follows:
  - reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  - seq_parameter_set_id is used as the key of the SPS entry;
  - level_idc is one of the enum constants STD_VIDEO_H264_LEVEL_IDC_<major>_<minor> identifying the H.264 level <major>.<minor> as defined in section A.3 of the ITU-T H.264 Specification;
  - if flags.seq_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - scaling_list_present_mask is a bitmask where bit index i corresponds to seq_scaling_list_present_flag[i] as defined in section 7.4.2.1 of the ITU-T H.264 Specification;
    - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.1 of the ITU-T H.264 Specification;
    - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.1 of the ITU-T H.264 Specification;
  - if flags.vui_parameters_present_flag is set, then pSequenceParameterSetVui is a pointer to a StdVideoH264SequenceParameterSetVui structure that is interpreted as follows:
    - reserved1 is used only for padding purposes and is otherwise ignored;
    - flags.color_description_present_flag is interpreted as the value of colour_description_present_flag, as defined in section E.2.1 of the ITU-T H.264 Specification (the name of colour_description_present_flag was misspelled in the Video Std header);
    - if flags.nal_hrd_parameters_present_flag or flags.vcl_hrd_parameters_present_flag is set, then the StdVideoH264HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      - reserved1 is used only for padding purposes and is otherwise ignored;
      - all other members of StdVideoH264HrdParameters are interpreted as defined in section E.2.2 of the ITU-T H.264 Specification;
    - all other members of StdVideoH264SequenceParameterSetVui are interpreted as defined in section E.2.1 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264SequenceParameterSet are interpreted as defined in section 7.4.2.1 of the ITU-T H.264 Specification.
- H.264 Picture Parameter Sets (PPS)
  Represented by StdVideoH264PictureParameterSet structures and interpreted as follows:
  - the pair constructed from seq_parameter_set_id and pic_parameter_set_id is used as the key of the PPS entry;
  - if flags.pic_scaling_matrix_present_flag is set, then the StdVideoH264ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - scaling_list_present_mask is a bitmask where bit index i corresponds to pic_scaling_list_present_flag[i] as defined in section 7.4.2.2 of the ITU-T H.264 Specification;
    - use_default_scaling_matrix_mask is a bitmask where bit index i corresponds to UseDefaultScalingMatrix4x4Flag[i], when i < 6, or corresponds to UseDefaultScalingMatrix8x8Flag[i-6], otherwise, as defined in section 7.3.2.2 of the ITU-T H.264 Specification;
    - ScalingList4x4 and ScalingList8x8 correspond to the identically named syntax elements defined in section 7.3.2.2 of the ITU-T H.264 Specification;
  - all other members of StdVideoH264PictureParameterSet are interpreted as defined in section 7.4.2.2 of the ITU-T H.264 Specification.
Implementations may override any of these parameters according to the semantics defined in the Video Encode Parameter Overrides section before storing the resulting H.264 parameter sets into the video session parameters object. Applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened and to retrieve the encoded H.264 parameter sets in order to be able to produce a compliant H.264 video bitstream.
Such H.264 parameter set overrides may also have cascading effects on the
implementation overrides applied to the encoded bitstream produced by video
encode operations.
If the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, then the
application can use such queries to retrieve feedback about whether any
implementation overrides have been applied to the encoded bitstream.
When a video session parameters object is created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, the VkVideoSessionParametersCreateInfoKHR::pNext chain must include a VkVideoEncodeH264SessionParametersCreateInfoKHR structure specifying the capacity and initial contents of the object.
The VkVideoEncodeH264SessionParametersCreateInfoKHR
structure is
defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264SessionParametersCreateInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t maxStdSPSCount;
uint32_t maxStdPPSCount;
const VkVideoEncodeH264SessionParametersAddInfoKHR* pParametersAddInfo;
} VkVideoEncodeH264SessionParametersCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxStdSPSCount is the maximum number of H.264 SPS entries the created VkVideoSessionParametersKHR can contain.
- maxStdPPSCount is the maximum number of H.264 PPS entries the created VkVideoSessionParametersKHR can contain.
- pParametersAddInfo is NULL or a pointer to a VkVideoEncodeH264SessionParametersAddInfoKHR structure specifying H.264 parameters to add upon object creation.
The VkVideoEncodeH264SessionParametersAddInfoKHR
structure is defined
as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264SessionParametersAddInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t stdSPSCount;
const StdVideoH264SequenceParameterSet* pStdSPSs;
uint32_t stdPPSCount;
const StdVideoH264PictureParameterSet* pStdPPSs;
} VkVideoEncodeH264SessionParametersAddInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdSPSCount is the number of elements in the pStdSPSs array.
- pStdSPSs is a pointer to an array of StdVideoH264SequenceParameterSet structures describing the H.264 SPS entries to add.
- stdPPSCount is the number of elements in the pStdPPSs array.
- pStdPPSs is a pointer to an array of StdVideoH264PictureParameterSet structures describing the H.264 PPS entries to add.
This structure can be specified in the following places:
- In the pParametersAddInfo member of the VkVideoEncodeH264SessionParametersCreateInfoKHR structure specified in the pNext chain of VkVideoSessionParametersCreateInfoKHR used to create a video session parameters object. In this case, if the video session parameters object is created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then it defines the set of initial parameters to add to the created object (see Creating Video Session Parameters).
- In the pNext chain of VkVideoSessionParametersUpdateInfoKHR. In this case, if the video session parameters object to be updated was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, then it defines the set of parameters to add to it (see Updating Video Session Parameters).
The VkVideoEncodeH264SessionParametersGetInfoKHR
structure is defined
as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264SessionParametersGetInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 writeStdSPS;
VkBool32 writeStdPPS;
uint32_t stdSPSId;
uint32_t stdPPSId;
} VkVideoEncodeH264SessionParametersGetInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- writeStdSPS indicates whether the encoded H.264 sequence parameter set identified by stdSPSId is requested to be retrieved.
- writeStdPPS indicates whether the encoded H.264 picture parameter set identified by the pair constructed from stdSPSId and stdPPSId is requested to be retrieved.
- stdSPSId specifies the H.264 sequence parameter set ID used to identify the retrieved H.264 sequence and/or picture parameter set(s).
- stdPPSId specifies the H.264 picture parameter set ID used to identify the retrieved H.264 picture parameter set when writeStdPPS is VK_TRUE.
When this structure is specified in the pNext chain of the VkVideoEncodeSessionParametersGetInfoKHR structure passed to vkGetEncodedVideoSessionParametersKHR, the command will write encoded parameter data to the output buffer in the following order:
- The H.264 sequence parameter set identified by stdSPSId, if writeStdSPS is VK_TRUE.
- The H.264 picture parameter set identified by the pair constructed from stdSPSId and stdPPSId, if writeStdPPS is VK_TRUE.
The VkVideoEncodeH264SessionParametersFeedbackInfoKHR
structure is
defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264SessionParametersFeedbackInfoKHR {
VkStructureType sType;
void* pNext;
VkBool32 hasStdSPSOverrides;
VkBool32 hasStdPPSOverrides;
} VkVideoEncodeH264SessionParametersFeedbackInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- hasStdSPSOverrides indicates whether any of the parameters of the requested H.264 sequence parameter set, if one was requested via VkVideoEncodeH264SessionParametersGetInfoKHR::writeStdSPS, were overridden by the implementation.
- hasStdPPSOverrides indicates whether any of the parameters of the requested H.264 picture parameter set, if one was requested via VkVideoEncodeH264SessionParametersGetInfoKHR::writeStdPPS, were overridden by the implementation.
H.264 Encoding Parameters
The VkVideoEncodeH264PictureInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264PictureInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t naluSliceEntryCount;
const VkVideoEncodeH264NaluSliceInfoKHR* pNaluSliceEntries;
const StdVideoEncodeH264PictureInfo* pStdPictureInfo;
VkBool32 generatePrefixNalu;
} VkVideoEncodeH264PictureInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- naluSliceEntryCount is the number of elements in pNaluSliceEntries.
- pNaluSliceEntries is a pointer to an array of naluSliceEntryCount VkVideoEncodeH264NaluSliceInfoKHR structures specifying the parameters of the individual H.264 slices to encode for the input picture.
- pStdPictureInfo is a pointer to a StdVideoEncodeH264PictureInfo structure specifying H.264 picture information.
- generatePrefixNalu controls whether prefix NALUs are generated before slice NALUs into the target bitstream, as defined in sections 7.3.2.12 and 7.4.2.12 of the ITU-T H.264 Specification.
This structure is specified in the pNext
chain of the
VkVideoEncodeInfoKHR structure passed to vkCmdEncodeVideoKHR to
specify the codec-specific picture information for an H.264 encode operation.
- Encode Input Picture Information
  When this structure is specified in the pNext chain of the VkVideoEncodeInfoKHR structure passed to vkCmdEncodeVideoKHR, the information related to the encode input picture is defined as follows:
  - The image subregion used is determined according to the H.264 Encode Picture Data Access section.
  - The encode input picture is associated with the H.264 picture information provided in pStdPictureInfo.
- Std Picture Information
  The members of the StdVideoEncodeH264PictureInfo structure pointed to by pStdPictureInfo are interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - flags.IdrPicFlag as defined in section 7.4.1 of the ITU-T H.264 Specification;
  - flags.is_reference as defined in section 3.136 of the ITU-T H.264 Specification;
  - seq_parameter_set_id and pic_parameter_set_id are used to identify the active parameter sets, as described below;
  - primary_pic_type as defined in section 7.4.2 of the ITU-T H.264 Specification;
  - PicOrderCnt as defined in section 8.2 of the ITU-T H.264 Specification;
  - temporal_id as defined in section G.7.4.1.1 of the ITU-T H.264 Specification;
  - if pRefLists is not NULL, then it is a pointer to a StdVideoEncodeH264ReferenceListsInfo structure that is interpreted as follows:
    - flags.reserved is used only for padding purposes and is otherwise ignored;
    - ref_pic_list_modification_flag_l0 and ref_pic_list_modification_flag_l1 as defined in section 7.4.3.1 of the ITU-T H.264 Specification;
    - num_ref_idx_l0_active_minus1 and num_ref_idx_l1_active_minus1 as defined in section 7.4.3 of the ITU-T H.264 Specification;
    - RefPicList0 and RefPicList1 as defined in section 8.2.4 of the ITU-T H.264 Specification, where each element of these arrays either identifies an active reference picture using its DPB slot index or contains the value STD_VIDEO_H264_NO_REFERENCE_PICTURE to indicate “no reference picture”;
    - if refList0ModOpCount is not zero, then pRefList0ModOperations is a pointer to an array of refList0ModOpCount StdVideoEncodeH264RefListModEntry structures specifying the modification parameters for the reference list L0 as defined in section 7.4.3.1 of the ITU-T H.264 Specification;
    - if refList1ModOpCount is not zero, then pRefList1ModOperations is a pointer to an array of refList1ModOpCount StdVideoEncodeH264RefListModEntry structures specifying the modification parameters for the reference list L1 as defined in section 7.4.3.1 of the ITU-T H.264 Specification;
    - if refPicMarkingOpCount is not zero, then pRefPicMarkingOperations is a pointer to an array of refPicMarkingOpCount StdVideoEncodeH264RefPicMarkingEntry structures specifying the reference picture marking parameters as defined in section 7.4.3.3 of the ITU-T H.264 Specification;
  - all other members are interpreted as defined in section 7.4.3 of the ITU-T H.264 Specification.
Reference picture setup is controlled by the value of StdVideoEncodeH264PictureInfo::flags.is_reference. If it is set and a reconstructed picture is specified, then the latter is used as the target of picture reconstruction to activate the DPB slot specified in pEncodeInfo->pSetupReferenceSlot->slotIndex. If StdVideoEncodeH264PictureInfo::flags.is_reference is not set, but a reconstructed picture is specified, then the corresponding picture reference associated with the DPB slot is invalidated, as described in the DPB Slot States section.
- Active Parameter Sets
  The members of the StdVideoEncodeH264PictureInfo structure pointed to by pStdPictureInfo are used to select the active parameter sets to use from the bound video session parameters object, as follows:
  - The active SPS is the SPS identified by the key specified in StdVideoEncodeH264PictureInfo::seq_parameter_set_id.
  - The active PPS is the PPS identified by the key specified by the pair constructed from StdVideoEncodeH264PictureInfo::seq_parameter_set_id and StdVideoEncodeH264PictureInfo::pic_parameter_set_id.
H.264 encoding uses explicit weighted sample prediction for a slice, as defined in section 8.4.2.3 of the ITU-T H.264 Specification, if any of the following conditions are true for the active PPS and the pStdSliceHeader member of the corresponding element of pNaluSliceEntries:
- pStdSliceHeader->slice_type is STD_VIDEO_H264_SLICE_TYPE_P and weighted_pred_flag is enabled in the active PPS.
- pStdSliceHeader->slice_type is STD_VIDEO_H264_SLICE_TYPE_B and weighted_bipred_idc in the active PPS equals STD_VIDEO_H264_WEIGHTED_BIPRED_IDC_EXPLICIT.
The VkVideoEncodeH264NaluSliceInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264NaluSliceInfoKHR {
VkStructureType sType;
const void* pNext;
int32_t constantQp;
const StdVideoEncodeH264SliceHeader* pStdSliceHeader;
} VkVideoEncodeH264NaluSliceInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- constantQp is the QP to use for the slice if the current rate control mode configured for the video session is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR.
- pStdSliceHeader is a pointer to a StdVideoEncodeH264SliceHeader structure specifying H.264 slice header parameters for the slice.
- Std Slice Header Parameters
  The members of the StdVideoEncodeH264SliceHeader structure pointed to by pStdSliceHeader are interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - if pWeightTable is not NULL, then it is a pointer to a StdVideoEncodeH264WeightTable that is interpreted as follows:
    - flags.reserved is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoEncodeH264WeightTable are interpreted as defined in section 7.4.3.2 of the ITU-T H.264 Specification;
  - all other members are interpreted as defined in section 7.4.3 of the ITU-T H.264 Specification.
The VkVideoEncodeH264DpbSlotInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264DpbSlotInfoKHR {
VkStructureType sType;
const void* pNext;
const StdVideoEncodeH264ReferenceInfo* pStdReferenceInfo;
} VkVideoEncodeH264DpbSlotInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdReferenceInfo is a pointer to a StdVideoEncodeH264ReferenceInfo structure specifying H.264 reference information.
This structure is specified in the pNext chain of VkVideoEncodeInfoKHR::pSetupReferenceSlot, if not NULL, and the pNext chain of the elements of VkVideoEncodeInfoKHR::pReferenceSlots to specify the codec-specific reference picture information for an H.264 encode operation.
- Active Reference Picture Information
  When this structure is specified in the pNext chain of the elements of VkVideoEncodeInfoKHR::pReferenceSlots, one element is added to the list of active reference pictures used by the video encode operation for each element of VkVideoEncodeInfoKHR::pReferenceSlots as follows:
  - The image subregion used is determined according to the H.264 Encode Picture Data Access section.
  - The reference picture is associated with the DPB slot index specified in the slotIndex member of the corresponding element of VkVideoEncodeInfoKHR::pReferenceSlots.
  - The reference picture is associated with the H.264 reference information provided in pStdReferenceInfo.
- Reconstructed Picture Information
-
When this structure is specified in the
pNext
chain of VkVideoEncodeInfoKHR::pSetupReferenceSlot
, the information related to the reconstructed picture is defined as follows:-
The image subregion used is determined according to the H.264 Encode Picture Data Access section.
-
If reference picture setup is requested, then the reconstructed picture is used to activate the DPB slot with the index specified in VkVideoEncodeInfoKHR::
pSetupReferenceSlot->slotIndex
. -
The reconstructed picture is associated with the H.264 reference information provided in
pStdReferenceInfo
.
-
- Std Reference Information
-
The members of the StdVideoEncodeH264ReferenceInfo structure pointed to by pStdReferenceInfo are interpreted as follows:
-
flags.reserved is used only for padding purposes and is otherwise ignored;
-
flags.used_for_long_term_reference is used to indicate whether the picture is marked as “used for long-term reference” as defined in section 8.2.5.1 of the ITU-T H.264 Specification;
-
primary_pic_type is interpreted as defined in section 7.4.2 of the ITU-T H.264 Specification;
-
long_term_pic_num and long_term_frame_idx are interpreted as defined in section 7.4.3 of the ITU-T H.264 Specification;
-
temporal_id is interpreted as defined in section G.7.4.1.1 of the ITU-T H.264 Specification;
-
all other members are interpreted as defined in section 8.2 of the ITU-T H.264 Specification.
H.264 Encode Rate Control
Group of Pictures
In case of H.264 encoding it is common practice to follow a regular pattern of different picture types in display order when encoding subsequent frames. This pattern is referred to as the group of pictures (GOP).
A regular GOP is defined by the following parameters:
-
The number of frames in the GOP;
-
The number of consecutive B frames between I and/or P frames in display order.
GOPs are further classified as open and closed GOPs.
Frame types in an open GOP follow each other in display order according to the following algorithm:
1. The first frame is always an I frame.
2. This is followed by a number of consecutive B frames, as defined above.
3. If the number of frames in the GOP is not reached yet, then the next frame is a P frame and the algorithm continues from step 2.
In case of a closed GOP, an IDR frame is used at a certain period, referred to as the IDR period.
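The open GOP algorithm, combined with a periodic IDR frame for the closed GOP case, can be sketched in C as a function that returns the picture type of a frame from its display-order index. This is a minimal sketch under stated assumptions. The FrameType enum and frame_type function are hypothetical names, not Vulkan or Video Std API. The GOP is assumed to restart at every IDR frame. An idr_period of UINT32_MAX is treated as an infinite IDR period, mirroring the idrPeriod semantics described later in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical picture-type enum for this sketch (not a Vulkan/Video Std type). */
typedef enum { FRAME_IDR, FRAME_I, FRAME_P, FRAME_B } FrameType;

/*
 * Picture type of the frame at display-order index n, for a regular GOP of
 * gop_frame_count frames with b_count consecutive B frames between I and/or
 * P frames. Assumptions: the GOP restarts at each IDR frame, and an
 * idr_period of UINT32_MAX means that only frame 0 is an IDR frame.
 */
static FrameType frame_type(uint32_t n, uint32_t gop_frame_count,
                            uint32_t b_count, uint32_t idr_period)
{
    assert(gop_frame_count > 0 && idr_period > 0 && b_count != UINT32_MAX);
    uint32_t since_idr = n % idr_period;          /* frames since the last IDR */
    if (since_idr == 0)
        return FRAME_IDR;
    uint32_t pos = since_idr % gop_frame_count;   /* position within the GOP */
    if (pos == 0)
        return FRAME_I;                           /* step 1: GOP starts with I */
    /* steps 2-3: b_count B frames, then one P frame, until the GOP is full */
    return (pos % (b_count + 1) == 0) ? FRAME_P : FRAME_B;
}
```

For example, with gop_frame_count = 8, b_count = 2, and idr_period = UINT32_MAX, frames 0 through 8 in display order are IDR, B, B, P, B, B, P, B, I. The trailing B frame of the first GOP comes from step 2 being cut short when the GOP size is reached.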
It is also typical for H.264 encoding to use specific reference picture usage patterns across the frames of the GOP. The two most common reference patterns used are as follows:
- Flat Reference Pattern
-
-
Each P frame uses the last non-B frame, in display order, as reference.
-
Each B frame uses the last non-B frame, in display order, as its forward reference, and uses the next non-B frame, in display order, as its backward reference.
-
- Dyadic Reference Pattern
-
-
Each P frame uses the last non-B frame, in display order, as reference.
-
The following algorithm is applied to the sequence of consecutive B frames between I and/or P frames in display order:
-
The B frame in the middle of this sequence uses the frame preceding the sequence as its forward reference, and uses the frame following the sequence as its backward reference.
-
The algorithm is executed recursively for the following frame sequences:
-
The B frames of the original sequence preceding the frame in the middle, if any.
-
The B frames of the original sequence following the frame in the middle, if any.
-
-
-
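The two reference patterns can be illustrated with a short C sketch. It assigns forward and backward references to the B frames lying strictly between two non-B frames at display-order indices prev and next. P frames reference the last non-B frame in both patterns, so only B frames are modeled.

The names BRefs, assign_flat, and assign_dyadic are hypothetical. When a run contains an even number of B frames, the description above does not say which of the two central frames is "the middle" one. This sketch assumes the earlier of the two, because integer division rounds down.

```c
#include <assert.h>
#include <stdint.h>

/* Display-order indices of the references used by one B frame (hypothetical type). */
typedef struct { uint32_t forward; uint32_t backward; } BRefs;

/* Flat pattern: every B frame in (prev, next) references prev and next. */
static void assign_flat(uint32_t prev, uint32_t next, BRefs *refs)
{
    for (uint32_t k = prev + 1; k < next; ++k) {
        refs[k].forward = prev;
        refs[k].backward = next;
    }
}

/* Dyadic pattern: the middle B frame references the frames surrounding the
 * run, then each half is processed recursively with the middle frame acting
 * as the new boundary. */
static void assign_dyadic(uint32_t prev, uint32_t next, BRefs *refs)
{
    assert(next > prev);
    if (next - prev < 2)              /* no B frames between prev and next */
        return;
    uint32_t mid = prev + (next - prev) / 2;
    refs[mid].forward = prev;
    refs[mid].backward = next;
    assign_dyadic(prev, mid, refs);   /* B frames preceding the middle frame */
    assign_dyadic(mid, next, refs);   /* B frames following the middle frame */
}
```

With seven B frames between non-B frames 0 and 8, the dyadic pattern makes frame 4 reference (0, 8), frames 2 and 6 reference (0, 4) and (4, 8), and the remaining odd frames reference their immediate neighbors. The flat pattern instead makes every one of them reference (0, 8).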
The application can provide guidance to the implementation’s rate control algorithm about the structure of the GOP used by the application. Such guidance does not mandate that the application use that specific GOP structure, as the picture type of individual encoded pictures is still application-controlled. However, any deviation from the provided guidance may result in undesired rate control behavior including, but not limited to, the implementation not being able to conform to the expected average or target bitrates, or other rate control parameters specified by the application.
When an H.264 encode session is used to encode multiple temporal layers, it is also common practice to follow a regular pattern for the H.264 temporal ID for the encoded pictures in display order when encoding subsequent frames. This pattern is referred to as the temporal GOP. The most common temporal layer pattern used is as follows:
- Dyadic Temporal Layer Pattern
-
-
The number of frames in the temporal GOP is $2^{n-1}$, where $n$ is the number of temporal layers.
-
The $i$-th frame in the temporal GOP uses temporal ID $t$ if and only if the index of the least significant bit set in $i$ equals $n-t-1$, except for the first frame, which is the only frame in the temporal GOP using temporal ID zero.
-
The $i$-th frame in the temporal GOP uses the $r$-th frame as reference, where $r$ is calculated from $i$ by clearing the least significant bit set in it, except for the first frame in the temporal GOP, which uses the first frame of the previous temporal GOP, if any, as reference.
-
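The dyadic temporal layer pattern reduces to bit manipulation on the 0-based frame index $i$ within the temporal GOP. The following C sketch computes the temporal ID and the in-GOP reference index of a frame for $n$ temporal layers. The function names are hypothetical, and $n$ is assumed to be at most 32 so that the GOP length fits in 32 bits. Frame 0 references the first frame of the previous temporal GOP, so it is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the least significant set bit of a non-zero value. */
static uint32_t lsb_index(uint32_t v)
{
    uint32_t idx = 0;
    assert(v != 0);
    while ((v & 1u) == 0) {
        v >>= 1;
        ++idx;
    }
    return idx;
}

/* Temporal ID of frame i (0-based) in a dyadic temporal GOP of 2^(n-1) frames. */
static uint32_t dyadic_temporal_id(uint32_t i, uint32_t n)
{
    assert(n >= 1 && n <= 32 && i < (1u << (n - 1)));
    if (i == 0)
        return 0;                 /* only the first frame uses temporal ID 0 */
    /* lsb_index(i) == n - t - 1, hence t = n - 1 - lsb_index(i) */
    return n - 1 - lsb_index(i);
}

/* In-GOP index of the frame referenced by frame i > 0. */
static uint32_t dyadic_reference(uint32_t i)
{
    assert(i != 0);               /* frame 0 references the previous temporal GOP */
    return i & (i - 1);           /* clear the least significant set bit */
}
```

For $n = 3$ the temporal GOP has four frames with temporal IDs 0, 2, 1, 2. Frames 1, 2, and 3 reference frames 0, 0, and 2, respectively.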
Multi-layer rate control and multi-layer coding are typically used for streaming cases where low latency is expected, hence B pictures with backward prediction are usually not used.
The VkVideoEncodeH264RateControlInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264RateControlInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEncodeH264RateControlFlagsKHR flags;
uint32_t gopFrameCount;
uint32_t idrPeriod;
uint32_t consecutiveBFrameCount;
uint32_t temporalLayerCount;
} VkVideoEncodeH264RateControlInfoKHR;
-
sType
is a VkStructureType value identifying this structure.
-
pNext
is NULL or a pointer to a structure extending this structure.
-
flags
is a bitmask of VkVideoEncodeH264RateControlFlagBitsKHR specifying H.264 rate control flags.
-
gopFrameCount
is the number of frames within a group of pictures (GOP) intended to be used by the application. If it is 0, the rate control algorithm may assume an implementation-dependent GOP length. If it is UINT32_MAX, the GOP length is treated as infinite.
-
idrPeriod
is the interval, in terms of number of frames, between two IDR frames (see IDR period). If it is 0, the rate control algorithm may assume an implementation-dependent IDR period. If it is UINT32_MAX, the IDR period is treated as infinite.
-
consecutiveBFrameCount
is the number of consecutive B frames between I and/or P frames within the GOP.
-
temporalLayerCount
specifies the number of H.264 temporal layers that the application intends to use.
When an instance of this structure is included in the pNext chain of the VkVideoCodingControlInfoKHR structure passed to the vkCmdControlVideoCodingKHR command, and VkVideoCodingControlInfoKHR::flags includes VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR, the parameters in this structure are used as guidance for the implementation’s rate control algorithm (see Video Coding Control).
If flags includes VK_VIDEO_ENCODE_H264_RATE_CONTROL_ATTEMPT_HRD_COMPLIANCE_BIT_KHR, then the rate control state is reset to an initial state to meet HRD compliance requirements. Otherwise, the new rate control state may be applied without a reset, depending on the implementation and the specified rate control parameters.
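It would be possible to infer the picture type to be used when encoding a frame on the basis of the values provided for gopFrameCount, idrPeriod, and consecutiveBFrameCount. However, the picture type of each encoded picture remains application-controlled.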
Bits which can be set in VkVideoEncodeH264RateControlInfoKHR::flags, specifying H.264 rate control flags, are:
// Provided by VK_KHR_video_encode_h264
typedef enum VkVideoEncodeH264RateControlFlagBitsKHR {
VK_VIDEO_ENCODE_H264_RATE_CONTROL_ATTEMPT_HRD_COMPLIANCE_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_H264_RATE_CONTROL_REGULAR_GOP_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_H264_RATE_CONTROL_REFERENCE_PATTERN_FLAT_BIT_KHR = 0x00000004,
VK_VIDEO_ENCODE_H264_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR = 0x00000008,
VK_VIDEO_ENCODE_H264_RATE_CONTROL_TEMPORAL_LAYER_PATTERN_DYADIC_BIT_KHR = 0x00000010,
} VkVideoEncodeH264RateControlFlagBitsKHR;
-
VK_VIDEO_ENCODE_H264_RATE_CONTROL_ATTEMPT_HRD_COMPLIANCE_BIT_KHR
specifies that rate control should attempt to produce an HRD compliant bitstream, as defined in annex C of the ITU-T H.264 Specification.
-
VK_VIDEO_ENCODE_H264_RATE_CONTROL_REGULAR_GOP_BIT_KHR
specifies that the application intends to use a regular GOP structure according to the parameters specified in the gopFrameCount, idrPeriod, and consecutiveBFrameCount members of the VkVideoEncodeH264RateControlInfoKHR structure.
-
VK_VIDEO_ENCODE_H264_RATE_CONTROL_REFERENCE_PATTERN_FLAT_BIT_KHR
specifies that the application intends to follow a flat reference pattern in the GOP.
-
VK_VIDEO_ENCODE_H264_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR
specifies that the application intends to follow a dyadic reference pattern in the GOP.
-
VK_VIDEO_ENCODE_H264_RATE_CONTROL_TEMPORAL_LAYER_PATTERN_DYADIC_BIT_KHR
specifies that the application intends to follow a dyadic temporal layer pattern.
// Provided by VK_KHR_video_encode_h264
typedef VkFlags VkVideoEncodeH264RateControlFlagsKHR;
VkVideoEncodeH264RateControlFlagsKHR
is a bitmask type for setting a
mask of zero or more VkVideoEncodeH264RateControlFlagBitsKHR.
Rate Control Layers
The VkVideoEncodeH264RateControlLayerInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264RateControlLayerInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useMinQp;
VkVideoEncodeH264QpKHR minQp;
VkBool32 useMaxQp;
VkVideoEncodeH264QpKHR maxQp;
VkBool32 useMaxFrameSize;
VkVideoEncodeH264FrameSizeKHR maxFrameSize;
} VkVideoEncodeH264RateControlLayerInfoKHR;
-
sType
is a VkStructureType value identifying this structure.
-
pNext
is NULL or a pointer to a structure extending this structure.
-
useMinQp
indicates whether the QP values determined by rate control will be clamped to the lower bounds on the QP values specified in minQp.
-
minQp
specifies the lower bounds on the QP values, for each picture type, that the implementation’s rate control algorithm will use when useMinQp is VK_TRUE.
-
useMaxQp
indicates whether the QP values determined by rate control will be clamped to the upper bounds on the QP values specified in maxQp.
-
maxQp
specifies the upper bounds on the QP values, for each picture type, that the implementation’s rate control algorithm will use when useMaxQp is VK_TRUE.
-
useMaxFrameSize
indicates whether the implementation’s rate control algorithm should use the values specified in maxFrameSize as the upper bounds on the encoded frame size for each picture type.
-
maxFrameSize
specifies the upper bounds on the encoded frame size, for each picture type, when useMaxFrameSize is VK_TRUE.
When used, the values in minQp and maxQp guarantee that the effective QP values used by the implementation will respect those lower and upper bounds, respectively. However, limiting the range of QP values that the implementation is able to use will also limit the ability of the implementation’s rate control algorithm to comply with other constraints. In particular, the implementation may not be able to comply with the following:
-
The average and/or peak bitrate values to be used for the encoded bitstream specified in the averageBitrate and maxBitrate members of the VkVideoEncodeRateControlLayerInfoKHR structure.
-
The upper bounds on the encoded frame size, for each picture type, specified in the maxFrameSize member of VkVideoEncodeH264RateControlLayerInfoKHR.
In general, applications need to configure rate control parameters appropriately in order to get the desired rate control behavior, as described in the Video Encode Rate Control section.
When an instance of this structure is included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure specified in one of the elements of the pLayers array member of the VkVideoEncodeRateControlInfoKHR structure passed to the vkCmdControlVideoCodingKHR command, VkVideoCodingControlInfoKHR::flags includes VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR, and the bound video session was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, it specifies the H.264-specific rate control parameters of the rate control layer corresponding to that element of pLayers.
The VkVideoEncodeH264QpKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264QpKHR {
int32_t qpI;
int32_t qpP;
int32_t qpB;
} VkVideoEncodeH264QpKHR;
-
qpI
is the QP to be used for I pictures.
-
qpP
is the QP to be used for P pictures.
-
qpB
is the QP to be used for B pictures.
The VkVideoEncodeH264FrameSizeKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264FrameSizeKHR {
uint32_t frameISize;
uint32_t framePSize;
uint32_t frameBSize;
} VkVideoEncodeH264FrameSizeKHR;
-
frameISize
is the size in bytes to be used for I pictures.
-
framePSize
is the size in bytes to be used for P pictures.
-
frameBSize
is the size in bytes to be used for B pictures.
GOP Remaining Frames
Besides session level rate control configuration, the application can specify the number of frames per frame type remaining in the group of pictures (GOP).
The VkVideoEncodeH264GopRemainingFrameInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h264
typedef struct VkVideoEncodeH264GopRemainingFrameInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useGopRemainingFrames;
uint32_t gopRemainingI;
uint32_t gopRemainingP;
uint32_t gopRemainingB;
} VkVideoEncodeH264GopRemainingFrameInfoKHR;
-
sType
is a VkStructureType value identifying this structure.
-
pNext
is NULL or a pointer to a structure extending this structure.
-
useGopRemainingFrames
indicates whether the implementation’s rate control algorithm should use the values specified in gopRemainingI, gopRemainingP, and gopRemainingB. If useGopRemainingFrames is VK_FALSE, then the values of gopRemainingI, gopRemainingP, and gopRemainingB are ignored.
-
gopRemainingI
specifies the number of I frames the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the video encode operation.
-
gopRemainingP
specifies the number of P frames the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the video encode operation.
-
gopRemainingB
specifies the number of B frames the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the video encode operation.
Setting useGopRemainingFrames to VK_TRUE and including this structure in the pNext chain of VkVideoBeginCodingInfoKHR is only mandatory if the VkVideoEncodeH264CapabilitiesKHR::requiresGopRemainingFrames capability reported for the used video profile is VK_TRUE. However, implementations may use these remaining frame counts, when specified, even when they are not required. In particular, when the application does not use a regular GOP structure, these values may provide additional guidance for the implementation’s rate control algorithm.
The VkVideoEncodeH264CapabilitiesKHR::prefersGopRemainingFrames capability is also used to indicate that the implementation’s rate control algorithm may operate more accurately if the application specifies the remaining frame counts using this structure.
As with other rate control guidance values, if the effective order and number of frames encoded by the application are not in line with the remaining frame counts specified in this structure at any given point, then the behavior of the implementation’s rate control algorithm may deviate from the one expected by the application.
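One way to keep these remaining frame counts in line with the encoded frames is sketched below in C. It assumes a regular GOP following the open GOP algorithm described earlier and a finite gopFrameCount.

- Initialize per-type counters from gopFrameCount and consecutiveBFrameCount at the start of each GOP.
- Report the counters before encoding.
- Decrement the counter of the picture type just encoded.

Because a counter is decremented for every encoded frame, the scheme works in encode order as well as in display order. GopRemaining, PicType, gop_totals, and gop_consume are hypothetical names, and IDR frames are counted as I frames here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the gopRemainingI/P/B values. */
typedef struct { uint32_t i, p, b; } GopRemaining;

typedef enum { PIC_I, PIC_P, PIC_B } PicType;

/* Per-type frame counts of one regular GOP: an I frame first, then runs of
 * b_count B frames, each followed by a P frame, until gop_frame_count frames
 * have been produced (the last run may be cut short). */
static GopRemaining gop_totals(uint32_t gop_frame_count, uint32_t b_count)
{
    assert(gop_frame_count > 0 && b_count != UINT32_MAX);
    GopRemaining r = { 1, 0, 0 };
    for (uint32_t k = 1; k < gop_frame_count; ++k) {
        if (k % (b_count + 1) == 0)
            r.p++;
        else
            r.b++;
    }
    return r;
}

/* Call after each encode operation with the type of the picture just encoded. */
static void gop_consume(GopRemaining *r, PicType type)
{
    uint32_t *counter = (type == PIC_I) ? &r->i
                      : (type == PIC_P) ? &r->p
                                        : &r->b;
    assert(*counter > 0);   /* more frames of this type than the GOP guidance */
    (*counter)--;
}
```

Under this scheme, the current counter values would be copied into gopRemainingI, gopRemainingP, and gopRemainingB, with useGopRemainingFrames set to VK_TRUE, in the structure chained to VkVideoBeginCodingInfoKHR.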
H.264 QP Delta Maps
Quantization delta maps used with an H.264 encode profile are referred to as QP delta maps. Their texels contain integer values representing QP delta values that are applied in the process of determining the quantization parameters of the encoded picture.
Accordingly, H.264 QP delta maps always have single-channel integer formats, as reported in VkVideoFormatPropertiesKHR::format.
When the rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, the QP delta values are added to the per-slice constant QP values. In effect, this enables the application to explicitly control the QP values used at the granularity of the quantization map texel size.
For all other rate control modes, the QP delta values can be used to offset the QP values that the rate control algorithm would otherwise produce.
H.264 Encode Quantization
Performing H.264 encode operations involves the process of assigning QP values to individual H.264 macroblocks. This process depends on the used rate control mode, as well as other encode and rate control parameters, as described below:
-
If the configured rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR, then the QP value is initialized by the implementation-specific default rate control algorithm.
-
If the video encode operation is issued with a quantization delta map, the QP delta value corresponding to the macroblock, as fetched from the quantization map, is added to the previously determined QP value. If the fetched QP delta value falls outside the supported QP delta value range reported in the minQpDelta and maxQpDelta members of VkVideoEncodeH264QuantizationMapCapabilitiesKHR, then the QP value used for the macroblock becomes undefined.
-
-
If the configured rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, then the QP value is initialized from the constant QP value specified for the H.264 slice the macroblock is part of.
-
If the video encode operation is issued with a quantization delta map, the QP delta value corresponding to the macroblock, as fetched from the quantization map, is added to the previously determined QP value. If the fetched QP delta value falls outside the supported QP delta value range reported in the minQpDelta and maxQpDelta members of VkVideoEncodeH264QuantizationMapCapabilitiesKHR, then the QP value used for the macroblock becomes undefined.
-
-
If the configured rate control mode is not VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR or VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, then the QP value is initialized by the corresponding rate control algorithm.
-
If the video encode operation is issued with a quantization delta map, the QP delta value corresponding to the macroblock, as fetched from the quantization map, is added to the previously determined QP value. If the fetched QP delta value falls outside the supported QP delta value range reported in the minQpDelta and maxQpDelta members of VkVideoEncodeH264QuantizationMapCapabilitiesKHR, then the QP value used for the macroblock becomes undefined.
-
If the video encode operation is issued with an emphasis map, the rate control will adjust the QP value based on the emphasis value corresponding to the macroblock, as fetched from the quantization map, according to the following equation:
$$QP_{new} = f(QP_{prev}, e)$$
where $QP_{new}$ is the resulting QP value, $QP_{prev}$ is the previously determined QP value, $e$ is the emphasis value corresponding to the macroblock, and $f$ is an implementation-defined function for which the following implication is true:
$$e_1 < e_2 \Rightarrow f(QP, e_1) \geq f(QP, e_2)$$
This means that lower emphasis values will result in higher or equal QP values, whereas higher emphasis values will result in lower or equal QP values. The function is not required to be strictly decreasing with respect to the input emphasis value for a given input QP value.
-
If clamping to minimum QP values is enabled in the applied rate control layer, then the QP value is clamped to the corresponding minimum QP value.
-
If clamping to maximum QP values is enabled in the applied rate control layer, then the QP value is clamped to the corresponding maximum QP value.
-
-
If VK_VIDEO_ENCODE_H264_CAPABILITY_MB_QP_DIFF_WRAPAROUND_BIT_KHR is not supported, then the determined QP value is clamped so that the mb_qp_delta value of the encoded macroblock complies with the modified version of equation 7-37 of the ITU-T H.264 Specification. As a result, the QP difference across subsequent macroblocks is limited to the $[-(26 + QpBdOffset_Y / 2),\ 25 + QpBdOffset_Y / 2]$ range. This clamping only causes an observable change in behavior when the video encode operation is issued with a QP delta map.
-
In all cases, the final QP value is clamped to the QP value range supported by the video profile, as reported in the minQp and maxQp members of VkVideoEncodeH264CapabilitiesKHR.
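The per-macroblock QP determination steps above can be summarized in a C sketch. It is a hedged model, not implementation code, and several inputs are assumptions:

- The rate control algorithm is not modeled, so its initial QP is taken as an input.
- The emphasis map adjustment, whose function $f$ is implementation-defined, is also omitted.
- The QpBdOffsetY value and the QP of the previous macroblock are passed in as assumed inputs.

All type and function names (RcMode, MbQpParams, mb_qp) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the configured rate control mode. */
typedef enum { RC_DEFAULT, RC_DISABLED, RC_OTHER } RcMode;

typedef struct {
    RcMode  mode;
    int32_t initial_qp;                     /* rate control QP, or the slice constant QP */
    bool    has_delta_map;
    int32_t qp_delta;                       /* value fetched from the QP delta map */
    int32_t min_qp_delta, max_qp_delta;     /* supported QP delta range */
    bool    use_min_qp, use_max_qp;         /* clamping enabled in the rate control layer */
    int32_t layer_min_qp, layer_max_qp;     /* layer bounds for this picture type */
    bool    wraparound_supported;           /* MB_QP_DIFF_WRAPAROUND capability */
    int32_t qp_bd_offset_y;                 /* QpBdOffsetY of the sequence */
    int32_t profile_min_qp, profile_max_qp; /* minQp/maxQp of the video profile */
} MbQpParams;

static int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static int32_t mb_qp(const MbQpParams *p, int32_t prev_mb_qp)
{
    int32_t qp = p->initial_qp;
    if (p->has_delta_map) {
        /* Out-of-range deltas make the QP undefined; this sketch rejects them. */
        assert(p->qp_delta >= p->min_qp_delta && p->qp_delta <= p->max_qp_delta);
        qp += p->qp_delta;
    }
    if (p->mode == RC_OTHER) {              /* layer clamping applies in this branch */
        if (p->use_min_qp && qp < p->layer_min_qp) qp = p->layer_min_qp;
        if (p->use_max_qp && qp > p->layer_max_qp) qp = p->layer_max_qp;
    }
    if (!p->wraparound_supported) {         /* keep mb_qp_delta representable */
        int32_t half = p->qp_bd_offset_y / 2;
        qp = clamp_i32(qp, prev_mb_qp - (26 + half), prev_mb_qp + 25 + half);
    }
    return clamp_i32(qp, p->profile_min_qp, p->profile_max_qp);
}
```

The final clamp to the profile range is applied unconditionally, matching "in all cases" above. The mb_qp_delta clamp depends on the previous macroblock, which is why that QP is a separate argument.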
H.264 Encode Requirements
This section describes the required H.264 encoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H264_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.
Video Std Header Name | Version |
---|---|
|
1.0.0 |
Video Capability | Requirement | Requirement Type1 |
---|---|---|
|
- |
min |
|
4096 |
max |
|
4096 |
max |
|
(64,64) |
max |
|
- |
max |
|
- |
min |
|
0 |
min |
|
0 |
min |
|
- |
min |
|
- |
min |
|
64000 |
min |
|
1 |
min |
|
(64,64) |
max |
|
|
min |
|
- |
min |
|
|
min |
|
1 |
min |
|
0 |
min |
|
0 |
min |
|
0 |
min |
|
1 |
min |
|
- |
implementation-dependent |
|
- |
max |
|
- |
min |
|
- |
implementation-dependent |
|
- |
implementation-dependent |
|
- |
min |
|
- 2 |
min |
|
- 3 |
max |
|
- 3 |
min |
- 1
-
The Requirement Type column specifies whether the requirement is the minimum value all implementations must support, the maximum value all implementations must support, or the exact value all implementations must support. For bitmasks, a minimum value is the least bits all implementations must set, but they may have additional bits set beyond this minimum.
- 2
-
If VkVideoEncodeCapabilitiesKHR::flags includes VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR or VK_VIDEO_ENCODE_CAPABILITY_EMPHASIS_MAP_BIT_KHR, then the width and height members of maxQuantizationMapExtent must be greater than zero.
- 3
-
If VkVideoEncodeCapabilitiesKHR::flags includes VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR, then maxQpDelta must be greater than minQpDelta.
H.265 Encode Operations
Video encode operations using an H.265 encode profile can be used to encode elementary video stream sequences compliant to the ITU-T H.265 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video encode operation steps with the codec-specific semantics defined in section 8 of the ITU-T H.265 Specification as follows:
-
Syntax elements, derived values, and other parameters are applied from the following structures:
-
The StdVideoH265VideoParameterSet structure corresponding to the active VPS specifying the H.265 video parameter set.
-
The StdVideoH265SequenceParameterSet structure corresponding to the active SPS specifying the H.265 sequence parameter set.
-
The StdVideoH265PictureParameterSet structure corresponding to the active PPS specifying the H.265 picture parameter set.
-
The StdVideoEncodeH265PictureInfo structure specifying the H.265 picture information.
-
The StdVideoEncodeH265SliceSegmentHeader structures specifying the H.265 slice segment header parameters for each encoded H.265 slice segment.
-
The StdVideoEncodeH265ReferenceInfo structures specifying the H.265 reference information corresponding to the optional reconstructed picture and any active reference pictures.
-
-
The encoded bitstream data is written to the destination video bitstream buffer range as defined in the H.265 Encode Bitstream Data Access section.
-
Picture data in the video picture resources corresponding to the used encode input picture, active reference pictures, and optional reconstructed picture is accessed as defined in the H.265 Encode Picture Data Access section.
-
The decision on reference picture setup is made according to the parameters specified in the H.265 picture information.
If the parameters adhere to the syntactic and semantic requirements defined in the corresponding sections of the ITU-T H.265 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video encode operation will complete successfully. Otherwise, the video encode operation may complete unsuccessfully.
H.265 Encode Parameter Overrides
Implementations may override, unless otherwise specified, any of the H.265 encode parameters specified in the following Video Std structures:
-
StdVideoH265VideoParameterSet
-
StdVideoH265SequenceParameterSet
-
StdVideoH265PictureParameterSet
-
StdVideoEncodeH265PictureInfo
-
StdVideoEncodeH265SliceSegmentHeader
-
StdVideoEncodeH265ReferenceInfo
All such H.265 encode parameter overrides must fulfill the conditions defined in the Video Encode Parameter Overrides section.
In addition, implementations must not override any of the following H.265 encode parameters:
-
StdVideoEncodeH265PictureInfo::pic_type
-
StdVideoEncodeH265SliceSegmentHeader::slice_type
In case of a video session parameters object created with VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR, the following H.265 SPS and PPS parameters may be overridden by the implementation according to the quantization map texel size the video session parameters object was created with:
-
StdVideoH265SequenceParameterSet::log2_min_luma_coding_block_size_minus3
-
StdVideoH265SequenceParameterSet::log2_diff_max_min_luma_coding_block_size
-
StdVideoH265SequenceParameterSet::log2_min_pcm_luma_coding_block_size_minus3
-
StdVideoH265SequenceParameterSet::log2_diff_max_min_pcm_luma_coding_block_size
-
StdVideoH265PictureParameterSet::diff_cu_qp_delta_depth
This may be necessary in order to limit the set of H.265 coding unit and coding tree unit sizes used during picture encoding to those that are supported by the implementation when using the specific quantization map texel size.
In case of H.265 encode parameters stored in video session parameters objects, applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened. If the query indicates that implementation overrides were applied, then the application needs to retrieve and use the encoded H.265 parameter sets in the bitstream in order to be able to produce a compliant H.265 video bitstream using the H.265 encode parameters stored in the video session parameters object.
In case of any H.265 encode parameters stored in the encoded bitstream produced by video encode operations, if the implementation supports the VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR video encode feedback query flag, the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to those H.265 encode parameters.
H.265 Encode Bitstream Data Access
Each video encode operation writes one or more VCL NAL units comprising slice segment headers and data of the encoded picture. These are written in the format defined in sections 7.3.6 and 7.3.8, and according to the semantics defined in sections 7.4.7 and 7.4.9 of the ITU-T H.265 Specification, respectively.
The number of VCL NAL units written is specified by VkVideoEncodeH265PictureInfoKHR::naluSliceSegmentEntryCount.
H.265 Encode Picture Data Access
Accesses to image data within a video picture resource happen at the granularity indicated by VkVideoCapabilitiesKHR::pictureAccessGranularity, as returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of an encode input picture, reference picture, or reconstructed picture accessed by video coding operations using an H.265 encode profile is defined as the set of texels within the coordinate range:
-
([0, endX), [0, endY))
Where:
-
endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
-
endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
Here, codedExtent is the member of the VkVideoPictureResourceInfoKHR structure corresponding to the picture.
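As a worked example of the rule above, the following C sketch computes endX from codedExtent.width, pictureAccessGranularity.width, and the image subresource width. The same function computes endY from the corresponding heights. The function name access_end is hypothetical. Rounding is done in 64-bit arithmetic so that coded sizes close to UINT32_MAX cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Exclusive end coordinate of the accessed region along one axis: the coded
 * size rounded up to a multiple of the access granularity, then clamped to
 * the size of the image subresource. */
static uint32_t access_end(uint32_t coded, uint32_t granularity, uint32_t image_size)
{
    assert(granularity > 0);
    uint64_t rounded = ((uint64_t)coded + granularity - 1) / granularity * granularity;
    return rounded < image_size ? (uint32_t)rounded : image_size;
}
```

Consider a 1920×1080 coded extent with a (64,64) access granularity and a 1920×1088 image subresource. The accessed region is ([0, 1920), [0, 1088)), so the 8 rows below the coded area are included. If the image were exactly 1920×1080, the clamp would keep the region at ([0, 1920), [0, 1080)).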
In case of video encode operations using an H.265 encode profile, any access to a picture at the coordinates (x, y), as defined by the ITU-T H.265 Specification, is an access to the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure at the texel coordinates (x, y).
Implementations may choose not to access some or all texels within particular reference pictures available to a video encode operation (e.g. due to video encode parameter overrides restricting the effective set of used reference pictures, or if the encoding algorithm chooses not to use certain subregions of the reference picture data for sample prediction).
H.265 Frame, Picture, Slice Segments, and Tiles
H.265 pictures consist of one or more slices, slice segments, and tiles, as defined in section 6.3.1 of the ITU-T H.265 Specification.
For the purposes of this specification, the H.265 slice segments comprising a picture are referred to as the picture partitions of the picture.
Video encode operations using an H.265 encode profile can encode slice segments of different types, as defined in section 7.4.7.1 of the ITU-T H.265 Specification. The slice segment type is selected by specifying the corresponding enumeration constant value from the Video Std enumeration type StdVideoH265SliceType in StdVideoEncodeH265SliceSegmentHeader::slice_type, in the H.265 slice segment header parameters:
-
STD_VIDEO_H265_SLICE_TYPE_B
indicates that the slice segment is part of a B slice as defined in section 3.12 of the ITU-T H.265 Specification.
-
STD_VIDEO_H265_SLICE_TYPE_P
indicates that the slice segment is part of a P slice as defined in section 3.111 of the ITU-T H.265 Specification.
-
STD_VIDEO_H265_SLICE_TYPE_I
indicates that the slice segment is part of an I slice as defined in section 3.74 of the ITU-T H.265 Specification.
Pictures constructed from such slice segments can be of different types, as defined in section 7.4.3.5 of the ITU-T H.265 Specification.
Video encode operations using an H.265 encode profile can encode pictures of a specific type. The type is selected by specifying the corresponding enumeration constant value from the Video Std enumeration type StdVideoH265PictureType in StdVideoEncodeH265PictureInfo::pic_type, in the H.265 picture information:
-
STD_VIDEO_H265_PICTURE_TYPE_P
indicates that the picture is a P picture. A frame consisting of a P picture is also referred to as a P frame.
-
STD_VIDEO_H265_PICTURE_TYPE_B
indicates that the picture is a B picture. A frame consisting of a B picture is also referred to as a B frame.
-
STD_VIDEO_H265_PICTURE_TYPE_I
indicates that the picture is an I picture. A frame consisting of an I picture is also referred to as an I frame.
-
STD_VIDEO_H265_PICTURE_TYPE_IDR
indicates that the picture is a special type of I picture called an IDR picture as defined in section 3.67 of the ITU-T H.265 Specification. A frame consisting of an IDR picture is also referred to as an IDR frame.
H.265 Coding Blocks
H.265 encode supports two types of coding blocks:
- Coding tree unit, as defined in section 3.35 of the ITU-T H.265 Specification.
- Coding unit, as defined in section 3.36 of the ITU-T H.265 Specification.
H.265 Encode Profile
A video profile supporting H.265 video encode operations is specified by setting VkVideoProfileInfoKHR::videoCodecOperation to VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR and adding a VkVideoEncodeH265ProfileInfoKHR structure to the VkVideoProfileInfoKHR::pNext chain.
The VkVideoEncodeH265ProfileInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265ProfileInfoKHR {
    VkStructureType           sType;
    const void*               pNext;
    StdVideoH265ProfileIdc    stdProfileIdc;
} VkVideoEncodeH265ProfileInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdProfileIdc is a StdVideoH265ProfileIdc value specifying the H.265 codec profile IDC, as defined in section A.3 of the ITU-T H.265 Specification.
H.265 Encode Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the capabilities for an H.265 encode profile, the VkVideoCapabilitiesKHR::pNext chain must include a VkVideoEncodeH265CapabilitiesKHR structure that will be filled with the profile-specific capabilities.
The VkVideoEncodeH265CapabilitiesKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265CapabilitiesKHR {
    VkStructureType                                sType;
    void*                                          pNext;
    VkVideoEncodeH265CapabilityFlagsKHR            flags;
    StdVideoH265LevelIdc                           maxLevelIdc;
    uint32_t                                       maxSliceSegmentCount;
    VkExtent2D                                     maxTiles;
    VkVideoEncodeH265CtbSizeFlagsKHR               ctbSizes;
    VkVideoEncodeH265TransformBlockSizeFlagsKHR    transformBlockSizes;
    uint32_t                                       maxPPictureL0ReferenceCount;
    uint32_t                                       maxBPictureL0ReferenceCount;
    uint32_t                                       maxL1ReferenceCount;
    uint32_t                                       maxSubLayerCount;
    VkBool32                                       expectDyadicTemporalSubLayerPattern;
    int32_t                                        minQp;
    int32_t                                        maxQp;
    VkBool32                                       prefersGopRemainingFrames;
    VkBool32                                       requiresGopRemainingFrames;
    VkVideoEncodeH265StdFlagsKHR                   stdSyntaxFlags;
} VkVideoEncodeH265CapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoEncodeH265CapabilityFlagBitsKHR indicating supported H.265 encoding capabilities.
- maxLevelIdc is a StdVideoH265LevelIdc value indicating the maximum H.265 level supported by the profile, where enum constant STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifies H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification.
- maxSliceSegmentCount indicates the maximum number of slice segments that can be encoded for a single picture. Further restrictions may apply to the number of slice segments that can be encoded for a single picture depending on other capabilities and codec-specific rules.
- maxTiles indicates the maximum number of H.265 tile columns and rows, as defined in sections 3.175 and 3.176 of the ITU-T H.265 Specification, that can be encoded for a single picture. Further restrictions may apply to the number of H.265 tiles that can be encoded for a single picture depending on other capabilities and codec-specific rules.
- ctbSizes is a bitmask of VkVideoEncodeH265CtbSizeFlagBitsKHR describing the supported CTB sizes.
- transformBlockSizes is a bitmask of VkVideoEncodeH265TransformBlockSizeFlagBitsKHR describing the supported transform block sizes.
- maxPPictureL0ReferenceCount indicates the maximum number of reference pictures the implementation supports in the reference list L0 for P pictures.
  As implementations may override the reference lists, maxPPictureL0ReferenceCount does not limit the number of elements that the application can specify in the L0 reference list for P pictures. However, if maxPPictureL0ReferenceCount is zero, then the use of P pictures is not allowed. In case of H.265 encoding, pictures can be encoded using only forward prediction even if P pictures are not supported, as the ITU-T H.265 Specification supports generalized P & B frames (also known as low delay B frames), whereby B frames can refer to past frames through both the L0 and L1 reference lists.
- maxBPictureL0ReferenceCount indicates the maximum number of reference pictures the implementation supports in the reference list L0 for B pictures.
- maxL1ReferenceCount indicates the maximum number of reference pictures the implementation supports in the reference list L1 if encoding of B pictures is supported.
  As implementations may override the reference lists, maxBPictureL0ReferenceCount and maxL1ReferenceCount do not limit the number of elements that the application can specify in the L0 and L1 reference lists for B pictures. However, if maxBPictureL0ReferenceCount and maxL1ReferenceCount are both zero, then the use of B pictures is not allowed.
- maxSubLayerCount indicates the maximum number of H.265 sub-layers supported by the implementation.
- expectDyadicTemporalSubLayerPattern indicates that the implementation’s rate control algorithms expect the application to use a dyadic temporal sub-layer pattern when encoding multiple temporal sub-layers.
- minQp indicates the minimum QP value supported.
- maxQp indicates the maximum QP value supported.
- prefersGopRemainingFrames indicates that the implementation’s rate control algorithm prefers the application to specify the number of frames of each type remaining in the current group of pictures when beginning a video coding scope.
- requiresGopRemainingFrames indicates that the implementation’s rate control algorithm requires the application to specify the number of frames of each type remaining in the current group of pictures when beginning a video coding scope.
- stdSyntaxFlags is a bitmask of VkVideoEncodeH265StdFlagBitsKHR indicating capabilities related to H.265 syntax elements.
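The rules above for when P and B pictures may be used can be sketched as follows. This is a minimal C sketch that assumes the capabilities have already been queried; H265RefCaps, ForwardPictureKind, and the helper functions are hypothetical names introduced only for illustration, and the structure mirrors just three members of VkVideoEncodeH265CapabilitiesKHR so that the example compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local mirror of three VkVideoEncodeH265CapabilitiesKHR members. */
typedef struct H265RefCaps {
    uint32_t maxPPictureL0ReferenceCount;
    uint32_t maxBPictureL0ReferenceCount;
    uint32_t maxL1ReferenceCount;
} H265RefCaps;

/* P pictures are not allowed when maxPPictureL0ReferenceCount is zero. */
static bool p_pictures_allowed(const H265RefCaps *caps) {
    return caps->maxPPictureL0ReferenceCount > 0;
}

/* B pictures are not allowed when both B-picture limits are zero. */
static bool b_pictures_allowed(const H265RefCaps *caps) {
    return caps->maxBPictureL0ReferenceCount > 0 || caps->maxL1ReferenceCount > 0;
}

/* Hypothetical strategy for forward-only (low delay) prediction. */
typedef enum ForwardPictureKind {
    FORWARD_NONE,        /* only intra pictures can be encoded */
    FORWARD_P,           /* use P pictures */
    FORWARD_LOW_DELAY_B  /* use B pictures that reference past frames only */
} ForwardPictureKind;

/* Prefer P pictures; otherwise fall back to generalized P & B pictures. */
static ForwardPictureKind choose_forward_picture_kind(const H265RefCaps *caps) {
    if (p_pictures_allowed(caps)) return FORWARD_P;
    if (b_pictures_allowed(caps)) return FORWARD_LOW_DELAY_B;
    return FORWARD_NONE;
}
```

Because implementations may override the reference lists, the sketch uses these counts only to decide whether a picture type can be used at all, not as hard limits on the list lengths the application specifies.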
Bits which may be set in VkVideoEncodeH265CapabilitiesKHR::flags, indicating the H.265 encoding capabilities supported, are:
// Provided by VK_KHR_video_encode_h265
typedef enum VkVideoEncodeH265CapabilityFlagBitsKHR {
    VK_VIDEO_ENCODE_H265_CAPABILITY_HRD_COMPLIANCE_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_H265_CAPABILITY_PREDICTION_WEIGHT_TABLE_GENERATED_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_H265_CAPABILITY_ROW_UNALIGNED_SLICE_SEGMENT_BIT_KHR = 0x00000004,
    VK_VIDEO_ENCODE_H265_CAPABILITY_DIFFERENT_SLICE_SEGMENT_TYPE_BIT_KHR = 0x00000008,
    VK_VIDEO_ENCODE_H265_CAPABILITY_B_FRAME_IN_L0_LIST_BIT_KHR = 0x00000010,
    VK_VIDEO_ENCODE_H265_CAPABILITY_B_FRAME_IN_L1_LIST_BIT_KHR = 0x00000020,
    VK_VIDEO_ENCODE_H265_CAPABILITY_PER_PICTURE_TYPE_MIN_MAX_QP_BIT_KHR = 0x00000040,
    VK_VIDEO_ENCODE_H265_CAPABILITY_PER_SLICE_SEGMENT_CONSTANT_QP_BIT_KHR = 0x00000080,
    VK_VIDEO_ENCODE_H265_CAPABILITY_MULTIPLE_TILES_PER_SLICE_SEGMENT_BIT_KHR = 0x00000100,
    VK_VIDEO_ENCODE_H265_CAPABILITY_MULTIPLE_SLICE_SEGMENTS_PER_TILE_BIT_KHR = 0x00000200,
  // Provided by VK_KHR_video_encode_h265 with VK_KHR_video_encode_quantization_map
    VK_VIDEO_ENCODE_H265_CAPABILITY_CU_QP_DIFF_WRAPAROUND_BIT_KHR = 0x00000400,
} VkVideoEncodeH265CapabilityFlagBitsKHR;
- VK_VIDEO_ENCODE_H265_CAPABILITY_HRD_COMPLIANCE_BIT_KHR specifies whether the implementation may be able to generate HRD compliant bitstreams if any of the nal_hrd_parameters_present_flag, vcl_hrd_parameters_present_flag, or sub_pic_hrd_params_present_flag members of StdVideoH265HrdFlags are set to 1 in the HRD parameters of the active VPS or active SPS, or if StdVideoH265SpsVuiFlags::vui_hrd_parameters_present_flag is set to 1 in the active SPS.
- VK_VIDEO_ENCODE_H265_CAPABILITY_PREDICTION_WEIGHT_TABLE_GENERATED_BIT_KHR specifies that if the weighted_pred_flag or the weighted_bipred_flag member of StdVideoH265PpsFlags is set to 1 in the active PPS when encoding a P picture or B picture, respectively, then the implementation is able to internally decide syntax for pred_weight_table, as defined in section 7.4.7.3 of the ITU-T H.265 Specification, and the application is not required to provide a weight table in the H.265 slice segment header parameters.
- VK_VIDEO_ENCODE_H265_CAPABILITY_ROW_UNALIGNED_SLICE_SEGMENT_BIT_KHR specifies that each slice segment in a frame with a single or multiple tiles per slice may begin or finish at any offset in a CTB row. If not supported, all slice segments in such a frame must begin at the start of a CTB row (and hence each slice segment must finish at the end of a CTB row). Also indicates that each slice segment in a frame with multiple slices per tile may begin or finish at any offset within the enclosing tile’s CTB row. If not supported, slice segments in such a frame must begin at the start of the enclosing tile’s CTB row (and hence each slice segment must finish at the end of the enclosing tile’s CTB row).
- VK_VIDEO_ENCODE_H265_CAPABILITY_DIFFERENT_SLICE_SEGMENT_TYPE_BIT_KHR specifies that when a frame is encoded with multiple slice segments, the implementation allows encoding each slice segment with a different StdVideoEncodeH265SliceSegmentHeader::slice_type specified in the H.265 slice segment header parameters. If not supported, all slice segments of the frame must be encoded with the same slice_type, which corresponds to the picture type of the frame.
- VK_VIDEO_ENCODE_H265_CAPABILITY_B_FRAME_IN_L0_LIST_BIT_KHR specifies support for using a B frame as L0 reference, as specified in StdVideoEncodeH265ReferenceListsInfo::RefPicList0 in the H.265 picture information.
- VK_VIDEO_ENCODE_H265_CAPABILITY_B_FRAME_IN_L1_LIST_BIT_KHR specifies support for using a B frame as L1 reference, as specified in StdVideoEncodeH265ReferenceListsInfo::RefPicList1 in the H.265 picture information.
- VK_VIDEO_ENCODE_H265_CAPABILITY_PER_PICTURE_TYPE_MIN_MAX_QP_BIT_KHR specifies support for specifying different QP values in the members of VkVideoEncodeH265QpKHR.
- VK_VIDEO_ENCODE_H265_CAPABILITY_PER_SLICE_SEGMENT_CONSTANT_QP_BIT_KHR specifies support for specifying different constant QP values for each slice segment.
- VK_VIDEO_ENCODE_H265_CAPABILITY_MULTIPLE_TILES_PER_SLICE_SEGMENT_BIT_KHR specifies whether encoding multiple tiles per slice segment, as defined in section 6.3.1 of the ITU-T H.265 Specification, is supported. If this capability flag is not present, then the implementation is only able to encode a single tile for each slice segment.
- VK_VIDEO_ENCODE_H265_CAPABILITY_MULTIPLE_SLICE_SEGMENTS_PER_TILE_BIT_KHR specifies whether encoding multiple slice segments per tile, as defined in section 6.3.1 of the ITU-T H.265 Specification, is supported. If this capability flag is not present, then the implementation is only able to encode a single slice segment for each tile.
- VK_VIDEO_ENCODE_H265_CAPABILITY_CU_QP_DIFF_WRAPAROUND_BIT_KHR indicates support for wraparound during the calculation of the QP values of subsequently encoded coding units, as defined in section 7.4.9.14 of the ITU-T H.265 Specification. If not supported, equation 8-283 of the ITU-T H.265 Specification is effectively reduced to the following:
  $$\mathrm{Qp}_Y = \mathrm{qP}_{Y\_\mathrm{PRED}} + \mathrm{CuQpDeltaVal}$$
  The effect of this is that the maximum QP difference across subsequent coding units is limited to the range $\left[-\left(26 + \mathrm{QpBdOffset}_Y / 2\right),\ 25 + \mathrm{QpBdOffset}_Y / 2\right]$.
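To make the difference concrete, the following minimal C sketch contrasts the reduced form above with a wraparound form. It assumes that the full derivation referred to as equation 8-283 is QpY = ((qPY_PRED + CuQpDeltaVal + 52 + 2 * QpBdOffsetY) % (52 + QpBdOffsetY)) - QpBdOffsetY, which is our reading of section 8.6.1 of the ITU-T H.265 Specification; verify it against the edition you target. It also assumes the valid luma QP range is [-QpBdOffsetY, 51]. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Luma QP with wraparound (assumed form of the full H.265 derivation). */
static int qp_y_wraparound(int qpy_pred, int cu_qp_delta_val, int qp_bd_offset_y) {
    return ((qpy_pred + cu_qp_delta_val + 52 + 2 * qp_bd_offset_y) % (52 + qp_bd_offset_y))
           - qp_bd_offset_y;
}

/* Reduced form used when CU_QP_DIFF_WRAPAROUND is not supported. */
static int qp_y_no_wraparound(int qpy_pred, int cu_qp_delta_val) {
    return qpy_pred + cu_qp_delta_val;
}

/* CuQpDeltaVal range: [-(26 + QpBdOffsetY / 2), 25 + QpBdOffsetY / 2]. */
static bool cu_qp_delta_in_range(int cu_qp_delta_val, int qp_bd_offset_y) {
    return cu_qp_delta_val >= -(26 + qp_bd_offset_y / 2) &&
           cu_qp_delta_val <= 25 + qp_bd_offset_y / 2;
}

/* Without wraparound, a delta is only usable if the un-wrapped QpY stays
 * inside the assumed valid range [-QpBdOffsetY, 51]. */
static bool delta_usable_without_wraparound(int qpy_pred, int cu_qp_delta_val,
                                            int qp_bd_offset_y) {
    int qp = qp_y_no_wraparound(qpy_pred, cu_qp_delta_val);
    return cu_qp_delta_in_range(cu_qp_delta_val, qp_bd_offset_y) &&
           qp >= -qp_bd_offset_y && qp <= 51;
}
```

For example, with 8-bit luma (QpBdOffsetY = 0), a predicted QP of 50 and a delta of +5 wrap around to 3, whereas without wraparound the same transition needs a delta of -47, which is outside the allowed delta range.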
// Provided by VK_KHR_video_encode_h265
typedef VkFlags VkVideoEncodeH265CapabilityFlagsKHR;
VkVideoEncodeH265CapabilityFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeH265CapabilityFlagBitsKHR.
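As an illustration of how an application might consult these bits, the following minimal C sketch checks whether a given number of tiles and slice segments per picture is compatible with the MULTIPLE_TILES_PER_SLICE_SEGMENT and MULTIPLE_SLICE_SEGMENTS_PER_TILE capabilities and with maxSliceSegmentCount. The H265_CAP_* macros mirror the enum values listed above so the sketch compiles without the Vulkan headers; the function name is hypothetical, and only these necessary counting conditions are modeled, not the full set of H.265 or implementation-specific restrictions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from VkVideoEncodeH265CapabilityFlagBitsKHR. */
#define H265_CAP_MULTIPLE_TILES_PER_SLICE_SEGMENT 0x00000100u
#define H265_CAP_MULTIPLE_SLICE_SEGMENTS_PER_TILE 0x00000200u

static bool tile_slice_layout_supported(uint32_t cap_flags, uint32_t max_slice_segment_count,
                                        uint32_t tile_count, uint32_t slice_segment_count) {
    if (tile_count == 0 || slice_segment_count == 0) return false;
    if (slice_segment_count > max_slice_segment_count) return false;
    /* Without this bit each slice segment holds at most one tile,
     * so there must be at least as many slice segments as tiles. */
    if (!(cap_flags & H265_CAP_MULTIPLE_TILES_PER_SLICE_SEGMENT) &&
        slice_segment_count < tile_count)
        return false;
    /* Without this bit each tile holds at most one slice segment,
     * so there cannot be more slice segments than tiles. */
    if (!(cap_flags & H265_CAP_MULTIPLE_SLICE_SEGMENTS_PER_TILE) &&
        slice_segment_count > tile_count)
        return false;
    return true;
}
```

When neither bit is set, the two conditions together require exactly one slice segment per tile.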
Bits which may be set in VkVideoEncodeH265CapabilitiesKHR::stdSyntaxFlags, indicating the capabilities related to the H.265 syntax elements, are:
// Provided by VK_KHR_video_encode_h265
typedef enum VkVideoEncodeH265StdFlagBitsKHR {
    VK_VIDEO_ENCODE_H265_STD_SEPARATE_COLOR_PLANE_FLAG_SET_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_H265_STD_SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG_SET_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_H265_STD_SCALING_LIST_DATA_PRESENT_FLAG_SET_BIT_KHR = 0x00000004,
    VK_VIDEO_ENCODE_H265_STD_PCM_ENABLED_FLAG_SET_BIT_KHR = 0x00000008,
    VK_VIDEO_ENCODE_H265_STD_SPS_TEMPORAL_MVP_ENABLED_FLAG_SET_BIT_KHR = 0x00000010,
    VK_VIDEO_ENCODE_H265_STD_INIT_QP_MINUS26_BIT_KHR = 0x00000020,
    VK_VIDEO_ENCODE_H265_STD_WEIGHTED_PRED_FLAG_SET_BIT_KHR = 0x00000040,
    VK_VIDEO_ENCODE_H265_STD_WEIGHTED_BIPRED_FLAG_SET_BIT_KHR = 0x00000080,
    VK_VIDEO_ENCODE_H265_STD_LOG2_PARALLEL_MERGE_LEVEL_MINUS2_BIT_KHR = 0x00000100,
    VK_VIDEO_ENCODE_H265_STD_SIGN_DATA_HIDING_ENABLED_FLAG_SET_BIT_KHR = 0x00000200,
    VK_VIDEO_ENCODE_H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_SET_BIT_KHR = 0x00000400,
    VK_VIDEO_ENCODE_H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_UNSET_BIT_KHR = 0x00000800,
    VK_VIDEO_ENCODE_H265_STD_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT_FLAG_SET_BIT_KHR = 0x00001000,
    VK_VIDEO_ENCODE_H265_STD_TRANSQUANT_BYPASS_ENABLED_FLAG_SET_BIT_KHR = 0x00002000,
    VK_VIDEO_ENCODE_H265_STD_CONSTRAINED_INTRA_PRED_FLAG_SET_BIT_KHR = 0x00004000,
    VK_VIDEO_ENCODE_H265_STD_ENTROPY_CODING_SYNC_ENABLED_FLAG_SET_BIT_KHR = 0x00008000,
    VK_VIDEO_ENCODE_H265_STD_DEBLOCKING_FILTER_OVERRIDE_ENABLED_FLAG_SET_BIT_KHR = 0x00010000,
    VK_VIDEO_ENCODE_H265_STD_DEPENDENT_SLICE_SEGMENTS_ENABLED_FLAG_SET_BIT_KHR = 0x00020000,
    VK_VIDEO_ENCODE_H265_STD_DEPENDENT_SLICE_SEGMENT_FLAG_SET_BIT_KHR = 0x00040000,
    VK_VIDEO_ENCODE_H265_STD_SLICE_QP_DELTA_BIT_KHR = 0x00080000,
    VK_VIDEO_ENCODE_H265_STD_DIFFERENT_SLICE_QP_DELTA_BIT_KHR = 0x00100000,
} VkVideoEncodeH265StdFlagBitsKHR;
- VK_VIDEO_ENCODE_H265_STD_SEPARATE_COLOR_PLANE_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265SpsFlags::separate_colour_plane_flag in the SPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265SpsFlags::sample_adaptive_offset_enabled_flag in the SPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_SCALING_LIST_DATA_PRESENT_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for the scaling_list_enabled_flag and sps_scaling_list_data_present_flag members of StdVideoH265SpsFlags in the SPS, and the application-provided value for StdVideoH265PpsFlags::pps_scaling_list_data_present_flag in the PPS when those values are 1.
- VK_VIDEO_ENCODE_H265_STD_PCM_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265SpsFlags::pcm_enabled_flag in the SPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_SPS_TEMPORAL_MVP_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265SpsFlags::sps_temporal_mvp_enabled_flag in the SPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_INIT_QP_MINUS26_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PictureParameterSet::init_qp_minus26 in the PPS when that value is non-zero.
- VK_VIDEO_ENCODE_H265_STD_WEIGHTED_PRED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::weighted_pred_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_WEIGHTED_BIPRED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::weighted_bipred_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_LOG2_PARALLEL_MERGE_LEVEL_MINUS2_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PictureParameterSet::log2_parallel_merge_level_minus2 in the PPS when that value is non-zero.
- VK_VIDEO_ENCODE_H265_STD_SIGN_DATA_HIDING_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::sign_data_hiding_enabled_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::transform_skip_enabled_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_UNSET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::transform_skip_enabled_flag in the PPS when that value is 0.
- VK_VIDEO_ENCODE_H265_STD_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::pps_slice_chroma_qp_offsets_present_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_TRANSQUANT_BYPASS_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::transquant_bypass_enabled_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_CONSTRAINED_INTRA_PRED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::constrained_intra_pred_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_ENTROPY_CODING_SYNC_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::entropy_coding_sync_enabled_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_DEBLOCKING_FILTER_OVERRIDE_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::deblocking_filter_override_enabled_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_DEPENDENT_SLICE_SEGMENTS_ENABLED_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoH265PpsFlags::dependent_slice_segments_enabled_flag in the PPS when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_DEPENDENT_SLICE_SEGMENT_FLAG_SET_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH265SliceSegmentHeader::dependent_slice_segment_flag in the H.265 slice segment header parameters when that value is 1.
- VK_VIDEO_ENCODE_H265_STD_SLICE_QP_DELTA_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH265SliceSegmentHeader::slice_qp_delta in the H.265 slice segment header parameters when that value is identical across the slice segments of the encoded frame.
- VK_VIDEO_ENCODE_H265_STD_DIFFERENT_SLICE_QP_DELTA_BIT_KHR specifies whether the implementation supports using the application-provided value for StdVideoEncodeH265SliceSegmentHeader::slice_qp_delta in the H.265 slice segment header parameters when that value is different across the slice segments of the encoded frame.
These capability flags provide information to the application about specific H.265 syntax element values that the implementation supports without having to override them and do not otherwise restrict the values that the application can specify for any of the mentioned H.265 syntax elements.
// Provided by VK_KHR_video_encode_h265
typedef VkFlags VkVideoEncodeH265StdFlagsKHR;
VkVideoEncodeH265StdFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeH265StdFlagBitsKHR.
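Since these bits are purely informational, an application may want to know in advance which of its chosen syntax element values the implementation may override. The following minimal C sketch computes that set for three example elements. The H265_STD_* macros copy values from the enum above, while DesiredH265Syntax and the function name are hypothetical. Following the descriptions above, a value of 1 is looked up through the corresponding *_FLAG_SET bit; a value of 0 is only checked for transform_skip_enabled_flag, the one element in this sketch that has an *_UNSET bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A few VkVideoEncodeH265StdFlagBitsKHR values copied from the enum above. */
#define H265_STD_SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG_SET 0x00000002u
#define H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_SET         0x00000400u
#define H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_UNSET       0x00000800u
#define H265_STD_ENTROPY_CODING_SYNC_ENABLED_FLAG_SET    0x00008000u

/* Hypothetical subset of SPS/PPS flag values an application intends to use. */
typedef struct DesiredH265Syntax {
    bool sample_adaptive_offset_enabled_flag;
    bool transform_skip_enabled_flag;
    bool entropy_coding_sync_enabled_flag;
} DesiredH265Syntax;

/* Returns the std syntax bits that the chosen values rely on but that the
 * implementation did not report in stdSyntaxFlags. */
static uint32_t unsupported_std_syntax(uint32_t std_syntax_flags, const DesiredH265Syntax *want) {
    uint32_t needed = 0;
    if (want->sample_adaptive_offset_enabled_flag)
        needed |= H265_STD_SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG_SET;
    needed |= want->transform_skip_enabled_flag ? H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_SET
                                                : H265_STD_TRANSFORM_SKIP_ENABLED_FLAG_UNSET;
    if (want->entropy_coding_sync_enabled_flag)
        needed |= H265_STD_ENTROPY_CODING_SYNC_ENABLED_FLAG_SET;
    return needed & ~std_syntax_flags;
}
```

A non-zero result does not make the parameters invalid. It only signals that the encoded bitstream may contain different values for those syntax elements than the ones the application provided.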
Bits which may be set in VkVideoEncodeH265CapabilitiesKHR::ctbSizes, indicating the CTB sizes supported by the implementation, are:
// Provided by VK_KHR_video_encode_h265
typedef enum VkVideoEncodeH265CtbSizeFlagBitsKHR {
    VK_VIDEO_ENCODE_H265_CTB_SIZE_16_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_H265_CTB_SIZE_32_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_H265_CTB_SIZE_64_BIT_KHR = 0x00000004,
} VkVideoEncodeH265CtbSizeFlagBitsKHR;
- VK_VIDEO_ENCODE_H265_CTB_SIZE_16_BIT_KHR specifies that a CTB size of 16x16 is supported.
- VK_VIDEO_ENCODE_H265_CTB_SIZE_32_BIT_KHR specifies that a CTB size of 32x32 is supported.
- VK_VIDEO_ENCODE_H265_CTB_SIZE_64_BIT_KHR specifies that a CTB size of 64x64 is supported.
// Provided by VK_KHR_video_encode_h265
typedef VkFlags VkVideoEncodeH265CtbSizeFlagsKHR;
VkVideoEncodeH265CtbSizeFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeH265CtbSizeFlagBitsKHR.
Implementations must support at least one of VkVideoEncodeH265CtbSizeFlagBitsKHR.
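A common use of ctbSizes is picking a CTB size and then working out the CTB grid of a picture, for example when planning slice segments that begin at the start of a CTB row. The following minimal C sketch assumes a policy of preferring the largest supported size (a choice made here, not a requirement) and computes the grid dimension as Ceil(pixels / CtbSizeY), mirroring how the ITU-T H.265 Specification derives PicWidthInCtbsY and PicHeightInCtbsY. The macros copy the enum values above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* VkVideoEncodeH265CtbSizeFlagBitsKHR values copied from the enum above. */
#define H265_CTB_SIZE_16 0x00000001u
#define H265_CTB_SIZE_32 0x00000002u
#define H265_CTB_SIZE_64 0x00000004u

/* Returns the largest supported CTB size in luma samples, or 0 if none. */
static uint32_t largest_ctb_size(uint32_t ctb_sizes) {
    if (ctb_sizes & H265_CTB_SIZE_64) return 64;
    if (ctb_sizes & H265_CTB_SIZE_32) return 32;
    if (ctb_sizes & H265_CTB_SIZE_16) return 16;
    return 0;
}

/* Number of CTBs covering a picture dimension: Ceil(pixels / ctb_size). */
static uint32_t ctbs_for(uint32_t pixels, uint32_t ctb_size) {
    assert(ctb_size != 0);
    return (pixels + ctb_size - 1) / ctb_size;
}
```

For a 1920x1080 picture, 64x64 CTBs give a 30x17 grid, with the last CTB row only partially covered by the picture.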
Bits which may be set in VkVideoEncodeH265CapabilitiesKHR::transformBlockSizes, indicating the transform block sizes supported by the implementation, are:
// Provided by VK_KHR_video_encode_h265
typedef enum VkVideoEncodeH265TransformBlockSizeFlagBitsKHR {
    VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_4_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_8_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_16_BIT_KHR = 0x00000004,
    VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_32_BIT_KHR = 0x00000008,
} VkVideoEncodeH265TransformBlockSizeFlagBitsKHR;
- VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_4_BIT_KHR specifies that a transform block size of 4x4 is supported.
- VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_8_BIT_KHR specifies that a transform block size of 8x8 is supported.
- VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_16_BIT_KHR specifies that a transform block size of 16x16 is supported.
- VK_VIDEO_ENCODE_H265_TRANSFORM_BLOCK_SIZE_32_BIT_KHR specifies that a transform block size of 32x32 is supported.
// Provided by VK_KHR_video_encode_h265
typedef VkFlags VkVideoEncodeH265TransformBlockSizeFlagsKHR;
VkVideoEncodeH265TransformBlockSizeFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeH265TransformBlockSizeFlagBitsKHR.
Implementations must support at least one of VkVideoEncodeH265TransformBlockSizeFlagBitsKHR.
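Because bit i of transformBlockSizes corresponds to a transform block size of (4 << i) luma samples, the smallest and largest supported sizes translate directly into the SPS fields log2_min_luma_transform_block_size_minus2 and log2_diff_max_min_luma_transform_block_size. The following minimal C sketch assumes the supported sizes form a contiguous range; TransformSizeFields and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All four VkVideoEncodeH265TransformBlockSizeFlagBitsKHR bits (4x4 .. 32x32). */
#define H265_TB_SIZE_ALL_BITS 0x0000000Fu

/* Hypothetical holder for the two SPS transform size fields. */
typedef struct TransformSizeFields {
    uint32_t log2_min_luma_transform_block_size_minus2;
    uint32_t log2_diff_max_min_luma_transform_block_size;
} TransformSizeFields;

/* Returns false if no transform block size bit is set. */
static bool transform_fields_from_mask(uint32_t mask, TransformSizeFields *out) {
    mask &= H265_TB_SIZE_ALL_BITS;
    if (mask == 0) return false;
    uint32_t lo = 0, hi = 3;
    while (!(mask & (1u << lo))) lo++;  /* smallest supported size */
    while (!(mask & (1u << hi))) hi--;  /* largest supported size */
    /* Bit i means log2(size) == i + 2, so "minus2" is simply the bit index. */
    out->log2_min_luma_transform_block_size_minus2 = lo;
    out->log2_diff_max_min_luma_transform_block_size = hi - lo;
    return true;
}
```

As we understand the ITU-T H.265 Specification, the largest transform block must also not exceed the CTB size, so an application would typically clamp the upper end of this range against the CTB size it selected from ctbSizes.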
H.265 Encode Quality Level Properties
When calling vkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHR with pVideoProfile->videoCodecOperation specified as VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, the VkVideoEncodeH265QualityLevelPropertiesKHR structure must be included in the pNext chain of the VkVideoEncodeQualityLevelPropertiesKHR structure to retrieve additional video encode quality level properties specific to H.265 encoding.
The VkVideoEncodeH265QualityLevelPropertiesKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265QualityLevelPropertiesKHR {
    VkStructureType                         sType;
    void*                                   pNext;
    VkVideoEncodeH265RateControlFlagsKHR    preferredRateControlFlags;
    uint32_t                                preferredGopFrameCount;
    uint32_t                                preferredIdrPeriod;
    uint32_t                                preferredConsecutiveBFrameCount;
    uint32_t                                preferredSubLayerCount;
    VkVideoEncodeH265QpKHR                  preferredConstantQp;
    uint32_t                                preferredMaxL0ReferenceCount;
    uint32_t                                preferredMaxL1ReferenceCount;
} VkVideoEncodeH265QualityLevelPropertiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- preferredRateControlFlags is a bitmask of VkVideoEncodeH265RateControlFlagBitsKHR values indicating the preferred flags to use for VkVideoEncodeH265RateControlInfoKHR::flags.
- preferredGopFrameCount indicates the preferred value to use for VkVideoEncodeH265RateControlInfoKHR::gopFrameCount.
- preferredIdrPeriod indicates the preferred value to use for VkVideoEncodeH265RateControlInfoKHR::idrPeriod.
- preferredConsecutiveBFrameCount indicates the preferred value to use for VkVideoEncodeH265RateControlInfoKHR::consecutiveBFrameCount.
- preferredSubLayerCount indicates the preferred value to use for VkVideoEncodeH265RateControlInfoKHR::subLayerCount.
- preferredConstantQp indicates the preferred values to use for VkVideoEncodeH265NaluSliceSegmentInfoKHR::constantQp for each picture type when using rate control mode VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR.
- preferredMaxL0ReferenceCount indicates the preferred maximum number of reference pictures to use in the reference list L0.
- preferredMaxL1ReferenceCount indicates the preferred maximum number of reference pictures to use in the reference list L1.
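To illustrate how the preferred GOP-related values might be turned into a frame-type pattern, the following minimal C sketch assigns a type to each frame in display order. It is one simple interpretation under stated assumptions, not the normative meaning of gopFrameCount, idrPeriod, and consecutiveBFrameCount, which are defined with VkVideoEncodeH265RateControlInfoKHR rather than in this section. The assumptions are: frame 0 and every idrPeriod-th frame is IDR, with 0 treated as "no periodic IDR"; every other GOP starts with an I frame; and within a GOP every (consecutiveBFrameCount + 1)-th frame is a P frame with B frames in between. FrameKind and frame_kind are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum FrameKind { FRAME_IDR, FRAME_I, FRAME_P, FRAME_B } FrameKind;

/* Display-order frame type for frame n under the assumptions stated above.
 * Special gop_frame_count values are not modeled; it must be non-zero. */
static FrameKind frame_kind(uint32_t n, uint32_t gop_frame_count, uint32_t idr_period,
                            uint32_t consecutive_b) {
    assert(gop_frame_count > 0);
    if (n == 0 || (idr_period != 0 && n % idr_period == 0)) return FRAME_IDR;
    uint32_t pos = n % gop_frame_count;  /* position within the current GOP */
    if (pos == 0) return FRAME_I;
    return (pos % (consecutive_b + 1) == 0) ? FRAME_P : FRAME_B;
}
```

With gopFrameCount = 8 and consecutiveBFrameCount = 2, this yields IDR B B P B B P B I ... in display order. In encode order, each run of B frames would follow the P or I frame that comes after it, and B frames at the end of a GOP would reference the next GOP's first frame, a detail this sketch does not model.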
H.265 Encode Session
Additional parameters can be specified when creating a video session with an H.265 encode profile by including an instance of the VkVideoEncodeH265SessionCreateInfoKHR structure in the pNext chain of VkVideoSessionCreateInfoKHR.
The VkVideoEncodeH265SessionCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265SessionCreateInfoKHR {
    VkStructureType         sType;
    const void*             pNext;
    VkBool32                useMaxLevelIdc;
    StdVideoH265LevelIdc    maxLevelIdc;
} VkVideoEncodeH265SessionCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- useMaxLevelIdc indicates whether the value of maxLevelIdc should be used by the implementation. When it is VK_FALSE, the implementation ignores the value of maxLevelIdc and uses the value of VkVideoEncodeH265CapabilitiesKHR::maxLevelIdc, as reported by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile.
- maxLevelIdc is a StdVideoH265LevelIdc value specifying the upper bound on the H.265 level for the video bitstreams produced by the created video session, where enum constant STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifies H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification.
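The STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> constants name a level, but this sketch does not assume their numeric values equal the general_level_idc value written into the bitstream. Section A.4 of the ITU-T H.265 Specification sets general_level_idc to 30 times the level number, so level <major>.<minor> maps to 30 * major + 3 * minor. The following minimal C sketch shows that arithmetic together with the useMaxLevelIdc rule described above, with levels expressed as general_level_idc values. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* general_level_idc = 30 * level number, e.g. level 3.1 -> 93. */
static uint32_t h265_general_level_idc(uint32_t major, uint32_t minor) {
    assert(minor <= 9);
    return 30u * major + 3u * minor;
}

/* Level upper bound applied by the session: maxLevelIdc when useMaxLevelIdc
 * is true, otherwise the profile's reported capability. */
static uint32_t effective_max_level(bool use_max_level_idc, uint32_t requested,
                                    uint32_t caps_max_level) {
    return use_max_level_idc ? requested : caps_max_level;
}
```

For example, level 5.1 corresponds to general_level_idc 153 and level 6.2 to 186.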
H.265 Encode Parameter Sets
Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR can contain the following types of parameters:
- H.265 Video Parameter Sets (VPS)
  Represented by StdVideoH265VideoParameterSet structures and interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - vps_video_parameter_set_id is used as the key of the VPS entry;
  - the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to vps_max_latency_increase_plus1, vps_max_dec_pic_buffering_minus1, and vps_max_num_reorder_pics, respectively, as defined in section 7.4.3.1 of the ITU-T H.265 Specification;
  - the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
    - reserved is used only for padding purposes and is otherwise ignored;
    - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
    - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of vps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure, and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of vps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where vps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265VideoParameterSet structure, and each element is interpreted as follows:
      - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
  - the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265VideoParameterSet are interpreted as defined in section 7.4.3.1 of the ITU-T H.265 Specification.
- H.265 Sequence Parameter Sets (SPS)
  Represented by StdVideoH265SequenceParameterSet structures and interpreted as follows:
  - reserved1 and reserved2 are used only for padding purposes and are otherwise ignored;
  - the pair constructed from sps_video_parameter_set_id and sps_seq_parameter_set_id is used as the key of the SPS entry;
  - the StdVideoH265ProfileTierLevel structure pointed to by pProfileTierLevel is interpreted as follows:
    - general_level_idc is one of the enum constants STD_VIDEO_H265_LEVEL_IDC_<major>_<minor> identifying the H.265 level <major>.<minor> as defined in section A.4 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265ProfileTierLevel are interpreted as defined in section 7.4.4 of the ITU-T H.265 Specification;
  - the max_latency_increase_plus1, max_dec_pic_buffering_minus1, and max_num_reorder_pics members of the StdVideoH265DecPicBufMgr structure pointed to by pDecPicBufMgr correspond to sps_max_latency_increase_plus1, sps_max_dec_pic_buffering_minus1, and sps_max_num_reorder_pics, respectively, as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  - if flags.sps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - pShortTermRefPicSet is a pointer to an array of num_short_term_ref_pic_sets StdVideoH265ShortTermRefPicSet structures, where each element is interpreted as follows:
    - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    - used_by_curr_pic_flag is a bitmask where bit index i corresponds to used_by_curr_pic_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - use_delta_flag is a bitmask where bit index i corresponds to use_delta_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - used_by_curr_pic_s0_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s0_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - used_by_curr_pic_s1_flag is a bitmask where bit index i corresponds to used_by_curr_pic_s1_flag[i] as defined in section 7.4.8 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265ShortTermRefPicSet are interpreted as defined in section 7.4.8 of the ITU-T H.265 Specification;
  - if flags.long_term_ref_pics_present_flag is set, then the StdVideoH265LongTermRefPicsSps structure pointed to by pLongTermRefPicsSps is interpreted as follows:
    - used_by_curr_pic_lt_sps_flag is a bitmask where bit index i corresponds to used_by_curr_pic_lt_sps_flag[i] as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
    - all other members of StdVideoH265LongTermRefPicsSps are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification;
  - if flags.vui_parameters_present_flag is set, then the StdVideoH265SequenceParameterSetVui structure pointed to by pSequenceParameterSetVui is interpreted as follows:
    - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
    - the StdVideoH265HrdParameters structure pointed to by pHrdParameters is interpreted as follows:
      - flags.fixed_pic_rate_general_flag is a bitmask where bit index i corresponds to fixed_pic_rate_general_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      - flags.fixed_pic_rate_within_cvs_flag is a bitmask where bit index i corresponds to fixed_pic_rate_within_cvs_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      - flags.low_delay_hrd_flag is a bitmask where bit index i corresponds to low_delay_hrd_flag[i] as defined in section E.3.2 of the ITU-T H.265 Specification;
      - if flags.nal_hrd_parameters_present_flag is set, then pSubLayerHrdParametersNal is a pointer to an array of sps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure, and each element is interpreted as follows:
        - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      - if flags.vcl_hrd_parameters_present_flag is set, then pSubLayerHrdParametersVcl is a pointer to an array of sps_max_sub_layers_minus1 + 1 StdVideoH265SubLayerHrdParameters structures, where sps_max_sub_layers_minus1 is the corresponding member of the encompassing StdVideoH265SequenceParameterSet structure, and each element is interpreted as follows:
        - cbr_flag is a bitmask where bit index i corresponds to cbr_flag[i] as defined in section E.3.3 of the ITU-T H.265 Specification;
        - all other members of the StdVideoH265SubLayerHrdParameters structure are interpreted as defined in section E.3.3 of the ITU-T H.265 Specification;
      - all other members of StdVideoH265HrdParameters are interpreted as defined in section E.3.2 of the ITU-T H.265 Specification;
    - all other members of pSequenceParameterSetVui are interpreted as defined in section E.3.1 of the ITU-T H.265 Specification;
  - if flags.sps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265SequenceParameterSet are interpreted as defined in section 7.4.3.2 of the ITU-T H.265 Specification.
- H.265 Picture Parameter Sets (PPS)
  Represented by StdVideoH265PictureParameterSet structures and interpreted as follows:
  - reserved1, reserved2, and reserved3 are used only for padding purposes and are otherwise ignored;
  - the triplet constructed from sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id is used as the key of the PPS entry;
  - if flags.pps_scaling_list_data_present_flag is set, then the StdVideoH265ScalingLists structure pointed to by pScalingLists is interpreted as follows:
    - ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
    - ScalingListDCCoef16x16 and ScalingListDCCoef32x32 correspond to scaling_list_dc_coef_minus8[0] and scaling_list_dc_coef_minus8[1], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
  - if flags.pps_palette_predictor_initializer_present_flag is set, then the PredictorPaletteEntries member of the StdVideoH265PredictorPaletteEntries structure pointed to by pPredictorPaletteEntries is interpreted as defined in section 7.4.9.13 of the ITU-T H.265 Specification;
  - all other members of StdVideoH265PictureParameterSet are interpreted as defined in section 7.4.3.3 of the ITU-T H.265 Specification.
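Several of the members above (for example used_by_curr_pic_flag, use_delta_flag, and cbr_flag) store an H.265 per-index flag array as a bitmask in which bit index i corresponds to element [i]. The following minimal C sketch shows that packing and its inverse. The helper names pack_flag_array and flag_at are hypothetical and not part of Vulkan or the Vulkan Video std headers, and the 32-bit limit is an assumption made for illustration; the usable width depends on the declared type of each member.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Vulkan API): packs an H.265 per-index flag array,
 * whose elements are 0 or 1, into a bitmask where bit index i holds flags[i]. */
static uint32_t pack_flag_array(const uint8_t *flags, uint32_t count)
{
    assert(count <= 32); /* assumption: the packed member is at most 32 bits wide */
    uint32_t mask = 0;
    for (uint32_t i = 0; i < count; ++i) {
        if (flags[i]) {
            mask |= UINT32_C(1) << i;
        }
    }
    return mask;
}

/* Hypothetical helper: recovers element i of the original flag array. */
static int flag_at(uint32_t mask, uint32_t i)
{
    assert(i < 32);
    return (int)((mask >> i) & 1u);
}
```

For example, a used_by_curr_pic_flag[] array of {1, 0, 1} packs to the bitmask 0x5.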
Implementations may override any of these parameters according to the semantics defined in the Video Encode Parameter Overrides section before storing the resulting H.265 parameter sets into the video session parameters object. Applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened and to retrieve the encoded H.265 parameter sets in order to be able to produce a compliant H.265 video bitstream.
Such H.265 parameter set overrides may also have cascading effects on the
implementation overrides applied to the encoded bitstream produced by video
encode operations.
If the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, then the
application can use such queries to retrieve feedback about whether any
implementation overrides have been applied to the encoded bitstream.
When a video session parameters object is created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, the VkVideoSessionParametersCreateInfoKHR::pNext chain must include a VkVideoEncodeH265SessionParametersCreateInfoKHR structure specifying the capacity and initial contents of the object.
The VkVideoEncodeH265SessionParametersCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265SessionParametersCreateInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t maxStdVPSCount;
uint32_t maxStdSPSCount;
uint32_t maxStdPPSCount;
const VkVideoEncodeH265SessionParametersAddInfoKHR* pParametersAddInfo;
} VkVideoEncodeH265SessionParametersCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- maxStdVPSCount is the maximum number of H.265 VPS entries the created VkVideoSessionParametersKHR can contain.
- maxStdSPSCount is the maximum number of H.265 SPS entries the created VkVideoSessionParametersKHR can contain.
- maxStdPPSCount is the maximum number of H.265 PPS entries the created VkVideoSessionParametersKHR can contain.
- pParametersAddInfo is NULL or a pointer to a VkVideoEncodeH265SessionParametersAddInfoKHR structure specifying H.265 parameters to add upon object creation.
The VkVideoEncodeH265SessionParametersAddInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265SessionParametersAddInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t stdVPSCount;
const StdVideoH265VideoParameterSet* pStdVPSs;
uint32_t stdSPSCount;
const StdVideoH265SequenceParameterSet* pStdSPSs;
uint32_t stdPPSCount;
const StdVideoH265PictureParameterSet* pStdPPSs;
} VkVideoEncodeH265SessionParametersAddInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdVPSCount is the number of elements in the pStdVPSs array.
- pStdVPSs is a pointer to an array of StdVideoH265VideoParameterSet structures describing the H.265 VPS entries to add.
- stdSPSCount is the number of elements in the pStdSPSs array.
- pStdSPSs is a pointer to an array of StdVideoH265SequenceParameterSet structures describing the H.265 SPS entries to add.
- stdPPSCount is the number of elements in the pStdPPSs array.
- pStdPPSs is a pointer to an array of StdVideoH265PictureParameterSet structures describing the H.265 PPS entries to add.
This structure can be specified in the following places:
- In the pParametersAddInfo member of the VkVideoEncodeH265SessionParametersCreateInfoKHR structure specified in the pNext chain of VkVideoSessionParametersCreateInfoKHR used to create a video session parameters object. In this case, if the video session parameters object is created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then it defines the set of initial parameters to add to the created object (see Creating Video Session Parameters).
- In the pNext chain of VkVideoSessionParametersUpdateInfoKHR. In this case, if the video session parameters object to be updated was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR, then it defines the set of parameters to add to it (see Updating Video Session Parameters).
The VkVideoEncodeH265SessionParametersGetInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265SessionParametersGetInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 writeStdVPS;
VkBool32 writeStdSPS;
VkBool32 writeStdPPS;
uint32_t stdVPSId;
uint32_t stdSPSId;
uint32_t stdPPSId;
} VkVideoEncodeH265SessionParametersGetInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- writeStdVPS indicates whether the encoded H.265 video parameter set identified by stdVPSId is requested to be retrieved.
- writeStdSPS indicates whether the encoded H.265 sequence parameter set identified by the pair constructed from stdVPSId and stdSPSId is requested to be retrieved.
- writeStdPPS indicates whether the encoded H.265 picture parameter set identified by the triplet constructed from stdVPSId, stdSPSId, and stdPPSId is requested to be retrieved.
- stdVPSId specifies the H.265 video parameter set ID used to identify the retrieved H.265 video, sequence, and/or picture parameter set(s).
- stdSPSId specifies the H.265 sequence parameter set ID used to identify the retrieved H.265 sequence and/or picture parameter set(s) when writeStdSPS and/or writeStdPPS is VK_TRUE.
- stdPPSId specifies the H.265 picture parameter set ID used to identify the retrieved H.265 picture parameter set when writeStdPPS is VK_TRUE.
When this structure is specified in the pNext chain of the VkVideoEncodeSessionParametersGetInfoKHR structure passed to vkGetEncodedVideoSessionParametersKHR, the command will write encoded parameter data to the output buffer in the following order:
- The H.265 video parameter set identified by stdVPSId, if writeStdVPS is VK_TRUE.
- The H.265 sequence parameter set identified by the pair constructed from stdVPSId and stdSPSId, if writeStdSPS is VK_TRUE.
- The H.265 picture parameter set identified by the triplet constructed from stdVPSId, stdSPSId, and stdPPSId, if writeStdPPS is VK_TRUE.
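The write order above can be sketched as follows. This is an illustrative model only and assumes nothing about how vkGetEncodedVideoSessionParametersKHR is implemented; the h265_ps_kind enumeration and the h265_write_order function are hypothetical names. Whichever flags are set, a retrieved VPS always precedes a retrieved SPS, which always precedes a retrieved PPS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags for the kinds of parameter sets written to the output buffer. */
typedef enum { H265_PS_VPS, H265_PS_SPS, H265_PS_PPS } h265_ps_kind;

/* Fills `order` with the parameter sets that would be written, in the order
 * described above (VPS, then SPS, then PPS), and returns how many there are. */
static size_t h265_write_order(bool write_vps, bool write_sps, bool write_pps,
                               h265_ps_kind order[3])
{
    assert(order != NULL);
    size_t n = 0;
    if (write_vps) order[n++] = H265_PS_VPS;
    if (write_sps) order[n++] = H265_PS_SPS;
    if (write_pps) order[n++] = H265_PS_PPS;
    return n;
}
```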
The VkVideoEncodeH265SessionParametersFeedbackInfoKHR
structure is
defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265SessionParametersFeedbackInfoKHR {
VkStructureType sType;
void* pNext;
VkBool32 hasStdVPSOverrides;
VkBool32 hasStdSPSOverrides;
VkBool32 hasStdPPSOverrides;
} VkVideoEncodeH265SessionParametersFeedbackInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- hasStdVPSOverrides indicates whether any of the parameters of the requested H.265 video parameter set, if one was requested via VkVideoEncodeH265SessionParametersGetInfoKHR::writeStdVPS, were overridden by the implementation.
- hasStdSPSOverrides indicates whether any of the parameters of the requested H.265 sequence parameter set, if one was requested via VkVideoEncodeH265SessionParametersGetInfoKHR::writeStdSPS, were overridden by the implementation.
- hasStdPPSOverrides indicates whether any of the parameters of the requested H.265 picture parameter set, if one was requested via VkVideoEncodeH265SessionParametersGetInfoKHR::writeStdPPS, were overridden by the implementation.
H.265 Encoding Parameters
The VkVideoEncodeH265PictureInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265PictureInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t naluSliceSegmentEntryCount;
const VkVideoEncodeH265NaluSliceSegmentInfoKHR* pNaluSliceSegmentEntries;
const StdVideoEncodeH265PictureInfo* pStdPictureInfo;
} VkVideoEncodeH265PictureInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- naluSliceSegmentEntryCount is the number of elements in pNaluSliceSegmentEntries.
- pNaluSliceSegmentEntries is a pointer to an array of naluSliceSegmentEntryCount VkVideoEncodeH265NaluSliceSegmentInfoKHR structures specifying the parameters of the individual H.265 slice segments to encode for the input picture.
- pStdPictureInfo is a pointer to a StdVideoEncodeH265PictureInfo structure specifying H.265 picture information.
This structure is specified in the pNext
chain of the
VkVideoEncodeInfoKHR structure passed to vkCmdEncodeVideoKHR to
specify the codec-specific picture information for an H.265 encode operation.
- Encode Input Picture Information
  When this structure is specified in the pNext chain of the VkVideoEncodeInfoKHR structure passed to vkCmdEncodeVideoKHR, the information related to the encode input picture is defined as follows:
  - The image subregion used is determined according to the H.265 Encode Picture Data Access section.
  - The encode input picture is associated with the H.265 picture information provided in pStdPictureInfo.
- Std Picture Information
  The members of the StdVideoEncodeH265PictureInfo structure pointed to by pStdPictureInfo are interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - flags.is_reference as defined in section 3.132 of the ITU-T H.265 Specification;
  - flags.IrapPicFlag as defined in section 3.73 of the ITU-T H.265 Specification;
  - flags.used_for_long_term_reference is used to indicate whether the picture is marked as “used for long-term reference” as defined in section 8.3.2 of the ITU-T H.265 Specification;
  - flags.discardable_flag and cross_layer_bla_flag as defined in section F.7.4.7.1 of the ITU-T H.265 Specification;
  - pic_type as defined in section 7.4.3.5 of the ITU-T H.265 Specification;
  - sps_video_parameter_set_id, pps_seq_parameter_set_id, and pps_pic_parameter_set_id are used to identify the active parameter sets, as described below;
  - PicOrderCntVal as defined in section 8.3.1 of the ITU-T H.265 Specification;
  - TemporalId as defined in section 7.4.2.2 of the ITU-T H.265 Specification;
  - if pRefLists is not NULL, then it is a pointer to a StdVideoEncodeH265ReferenceListsInfo structure that is interpreted as follows:
    - flags.reserved is used only for padding purposes and is otherwise ignored;
    - ref_pic_list_modification_flag_l0 and ref_pic_list_modification_flag_l1 as defined in section 7.4.7.2 of the ITU-T H.265 Specification;
    - num_ref_idx_l0_active_minus1 and num_ref_idx_l1_active_minus1 as defined in section 7.4.7.1 of the ITU-T H.265 Specification;
    - RefPicList0 and RefPicList1 as defined in section 8.3.4 of the ITU-T H.265 Specification, where each element of these arrays either identifies an active reference picture using its DPB slot index or contains the value STD_VIDEO_H265_NO_REFERENCE_PICTURE to indicate “no reference picture”;
    - list_entry_l0 and list_entry_l1 as defined in section 7.4.7.2 of the ITU-T H.265 Specification;
  - if flags.short_term_ref_pic_set_sps_flag is not set, then the StdVideoH265ShortTermRefPicSet structure pointed to by pShortTermRefPicSet is interpreted as defined for the elements of the pShortTermRefPicSet array specified in H.265 sequence parameter sets;
  - if flags.long_term_ref_pics_present_flag is set in the active SPS, then the StdVideoEncodeH265LongTermRefPics structure pointed to by pLongTermRefPics is interpreted as follows:
    - used_by_curr_pic_lt_flag is a bitmask where bit index i corresponds to used_by_curr_pic_lt_flag[i] as defined in section 7.4.7.1 of the ITU-T H.265 Specification;
    - all other members of StdVideoEncodeH265LongTermRefPics are interpreted as defined in section 7.4.7.1 of the ITU-T H.265 Specification;
  - all other members are interpreted as defined in section 7.4.7.1 of the ITU-T H.265 Specification.
  Reference picture setup is controlled by the value of StdVideoEncodeH265PictureInfo::flags.is_reference. If it is set and a reconstructed picture is specified, then the latter is used as the target of picture reconstruction to activate the DPB slot specified in pEncodeInfo->pSetupReferenceSlot->slotIndex. If StdVideoEncodeH265PictureInfo::flags.is_reference is not set, but a reconstructed picture is specified, then the corresponding picture reference associated with the DPB slot is invalidated, as described in the DPB Slot States section.
- Active Parameter Sets
  The members of the StdVideoEncodeH265PictureInfo structure pointed to by pStdPictureInfo are used to select the active parameter sets to use from the bound video session parameters object, as follows:
  - The active VPS is the VPS identified by the key specified in StdVideoEncodeH265PictureInfo::sps_video_parameter_set_id.
  - The active SPS is the SPS identified by the key specified by the pair constructed from StdVideoEncodeH265PictureInfo::sps_video_parameter_set_id and StdVideoEncodeH265PictureInfo::pps_seq_parameter_set_id.
  - The active PPS is the PPS identified by the key specified by the triplet constructed from StdVideoEncodeH265PictureInfo::sps_video_parameter_set_id, StdVideoEncodeH265PictureInfo::pps_seq_parameter_set_id, and StdVideoEncodeH265PictureInfo::pps_pic_parameter_set_id.
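To illustrate the key scheme, the following C sketch models parameter set entries by their key fields only and looks up the active VPS, SPS, and PPS the way the list above describes. The entry structures and find_* functions are hypothetical stand-ins for the contents of a video session parameters object, not Vulkan API, and linear search is used purely for brevity. Note that an SPS entry is identified by the (VPS ID, SPS ID) pair, so the same SPS ID under a different VPS ID selects a different entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for stored entries; only the key fields are modeled. */
typedef struct { uint8_t vps_id; } vps_entry;
typedef struct { uint8_t vps_id, sps_id; } sps_entry;
typedef struct { uint8_t vps_id, sps_id, pps_id; } pps_entry;

/* Active VPS: keyed by sps_video_parameter_set_id alone. */
static const vps_entry *find_vps(const vps_entry *v, size_t n, uint8_t vps_id)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (v[i].vps_id == vps_id) return &v[i];
    return NULL;
}

/* Active SPS: keyed by the pair (sps_video_parameter_set_id, pps_seq_parameter_set_id). */
static const sps_entry *find_sps(const sps_entry *s, size_t n,
                                 uint8_t vps_id, uint8_t sps_id)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (s[i].vps_id == vps_id && s[i].sps_id == sps_id) return &s[i];
    return NULL;
}

/* Active PPS: keyed by the triplet (VPS ID, SPS ID, pps_pic_parameter_set_id). */
static const pps_entry *find_pps(const pps_entry *p, size_t n,
                                 uint8_t vps_id, uint8_t sps_id, uint8_t pps_id)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (p[i].vps_id == vps_id && p[i].sps_id == sps_id && p[i].pps_id == pps_id)
            return &p[i];
    return NULL;
}
```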
H.265 encoding uses explicit weighted sample prediction for a slice segment, as defined in section 8.5.3.3.4 of the ITU-T H.265 Specification, if any of the following conditions are true for the active PPS and the pStdSliceSegmentHeader member of the corresponding element of pNaluSliceSegmentEntries:
- pStdSliceSegmentHeader->slice_type is STD_VIDEO_H265_SLICE_TYPE_P and weighted_pred_flag is enabled in the active PPS.
- pStdSliceSegmentHeader->slice_type is STD_VIDEO_H265_SLICE_TYPE_B and weighted_bipred_flag is enabled in the active PPS.
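The two conditions above reduce to a small predicate. In this sketch, slice_type is a hypothetical local stand-in for the STD_VIDEO_H265_SLICE_TYPE_* values (their numeric values are not relied upon), and the two booleans correspond to weighted_pred_flag and weighted_bipred_flag of the active PPS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the STD_VIDEO_H265_SLICE_TYPE_* values. */
typedef enum { SLICE_TYPE_B, SLICE_TYPE_P, SLICE_TYPE_I } slice_type;

/* Returns true if explicit weighted sample prediction applies to a slice segment,
 * given its slice type and the two relevant flags of the active PPS. */
static bool uses_explicit_weighted_pred(slice_type type,
                                        bool weighted_pred_flag,
                                        bool weighted_bipred_flag)
{
    switch (type) {
    case SLICE_TYPE_P: return weighted_pred_flag;
    case SLICE_TYPE_B: return weighted_bipred_flag;
    case SLICE_TYPE_I: return false; /* neither condition can hold for I slices */
    }
    assert(!"unknown slice type");
    return false;
}
```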
The number of H.265 tiles, as defined in section 3.174 of the ITU-T H.265 Specification, is derived from the num_tile_columns_minus1 and num_tile_rows_minus1 members of the active PPS as follows:
  (num_tile_columns_minus1 + 1) × (num_tile_rows_minus1 + 1)
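As a worked example of the formula, a PPS with num_tile_columns_minus1 = 3 and num_tile_rows_minus1 = 1 describes (3 + 1) × (1 + 1) = 8 tiles, and a PPS with both members equal to 0 describes a single tile. The helper below uses a hypothetical name and is only an illustration of the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of H.265 tiles from the two active PPS members. */
static uint32_t h265_tile_count(uint32_t num_tile_columns_minus1,
                                uint32_t num_tile_rows_minus1)
{
    return (num_tile_columns_minus1 + 1u) * (num_tile_rows_minus1 + 1u);
}
```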
The VkVideoEncodeH265NaluSliceSegmentInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265NaluSliceSegmentInfoKHR {
VkStructureType sType;
const void* pNext;
int32_t constantQp;
const StdVideoEncodeH265SliceSegmentHeader* pStdSliceSegmentHeader;
} VkVideoEncodeH265NaluSliceSegmentInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- constantQp is the QP to use for the slice segment if the current rate control mode configured for the video session is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR.
- pStdSliceSegmentHeader is a pointer to a StdVideoEncodeH265SliceSegmentHeader structure specifying H.265 slice segment header parameters for the slice segment.
- Std Slice Segment Header Parameters
  The members of the StdVideoEncodeH265SliceSegmentHeader structure pointed to by pStdSliceSegmentHeader are interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - if pWeightTable is not NULL, then it is a pointer to a StdVideoEncodeH265WeightTable that is interpreted as follows:
    - flags.luma_weight_l0_flag, flags.chroma_weight_l0_flag, flags.luma_weight_l1_flag, and flags.chroma_weight_l1_flag are bitmasks where bit index i corresponds to luma_weight_l0_flag[i], chroma_weight_l0_flag[i], luma_weight_l1_flag[i], and chroma_weight_l1_flag[i], respectively, as defined in section 7.4.7.3 of the ITU-T H.265 Specification;
    - all other members of StdVideoEncodeH265WeightTable are interpreted as defined in section 7.4.7.3 of the ITU-T H.265 Specification;
  - all other members are interpreted as defined in section 7.4.7.1 of the ITU-T H.265 Specification.
The VkVideoEncodeH265DpbSlotInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265DpbSlotInfoKHR {
VkStructureType sType;
const void* pNext;
const StdVideoEncodeH265ReferenceInfo* pStdReferenceInfo;
} VkVideoEncodeH265DpbSlotInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pStdReferenceInfo is a pointer to a StdVideoEncodeH265ReferenceInfo structure specifying H.265 reference information.
This structure is specified in the pNext chain of VkVideoEncodeInfoKHR::pSetupReferenceSlot, if not NULL, and in the pNext chain of the elements of VkVideoEncodeInfoKHR::pReferenceSlots to specify the codec-specific reference picture information for an H.265 encode operation.
- Active Reference Picture Information
  When this structure is specified in the pNext chain of the elements of VkVideoEncodeInfoKHR::pReferenceSlots, one element is added to the list of active reference pictures used by the video encode operation for each element of VkVideoEncodeInfoKHR::pReferenceSlots as follows:
  - The image subregion used is determined according to the H.265 Encode Picture Data Access section.
  - The reference picture is associated with the DPB slot index specified in the slotIndex member of the corresponding element of VkVideoEncodeInfoKHR::pReferenceSlots.
  - The reference picture is associated with the H.265 reference information provided in pStdReferenceInfo.
- Reconstructed Picture Information
  When this structure is specified in the pNext chain of VkVideoEncodeInfoKHR::pSetupReferenceSlot, the information related to the reconstructed picture is defined as follows:
  - The image subregion used is determined according to the H.265 Encode Picture Data Access section.
  - If reference picture setup is requested, then the reconstructed picture is used to activate the DPB slot with the index specified in VkVideoEncodeInfoKHR::pSetupReferenceSlot->slotIndex.
  - The reconstructed picture is associated with the H.265 reference information provided in pStdReferenceInfo.
- Std Reference Information
  The members of the StdVideoEncodeH265ReferenceInfo structure pointed to by pStdReferenceInfo are interpreted as follows:
  - flags.reserved is used only for padding purposes and is otherwise ignored;
  - flags.used_for_long_term_reference is used to indicate whether the picture is marked as “used for long-term reference” as defined in section 8.3.2 of the ITU-T H.265 Specification;
  - flags.unused_for_reference is used to indicate whether the picture is marked as “unused for reference” as defined in section 8.3.2 of the ITU-T H.265 Specification;
  - pic_type as defined in section 7.4.3.5 of the ITU-T H.265 Specification;
  - PicOrderCntVal as defined in section 8.3.1 of the ITU-T H.265 Specification;
  - TemporalId as defined in section 7.4.2.2 of the ITU-T H.265 Specification.
H.265 Encode Rate Control
Group of Pictures
In case of H.265 encoding it is common practice to follow a regular pattern of different picture types in display order when encoding subsequent frames. This pattern is referred to as the group of pictures (GOP).
A regular GOP is defined by the following parameters:
- The number of frames in the GOP;
- The number of consecutive B frames between I and/or P frames in display order.
GOPs are further classified as open and closed GOPs.
Frame types in an open GOP follow each other in display order according to the following algorithm:
1. The first frame is always an I frame.
2. This is followed by a number of consecutive B frames, as defined above.
3. If the number of frames in the GOP is not reached yet, then the next frame is a P frame and the algorithm continues from step 2.
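The open GOP algorithm above can be sketched as a function of the display-order position within the GOP. The function and enumeration names are hypothetical; gop_frame_count and b_count play the roles of the number of frames in the GOP and the number of consecutive B frames described earlier. After the leading I frame, the pattern simply repeats with a period of b_count + 1 (b_count B frames, then one P frame) until the GOP length is reached.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical frame type tags, using their letters as values for readability. */
typedef enum { FRAME_I = 'I', FRAME_P = 'P', FRAME_B = 'B' } frame_type;

/* Frame type at display-order position `pos` within an open GOP of
 * `gop_frame_count` frames with `b_count` consecutive B frames. */
static frame_type open_gop_frame_type(uint32_t pos, uint32_t gop_frame_count,
                                      uint32_t b_count)
{
    assert(gop_frame_count > 0 && pos < gop_frame_count);
    if (pos == 0) return FRAME_I;              /* step 1: the first frame is an I frame */
    /* steps 2 and 3: b_count B frames, then a P frame, repeated */
    return (pos % (b_count + 1) == 0) ? FRAME_P : FRAME_B;
}
```

With 8 frames per GOP and 2 consecutive B frames, this yields I B B P B B P B in display order; with 0 B frames it yields an I frame followed only by P frames.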
In case of a closed GOP, an IDR frame is used at a certain period.
It is also typical for H.265 encoding to use specific reference picture usage patterns across the frames of the GOP. The two most common reference patterns used are as follows:
- Flat Reference Pattern
  - Each P frame uses the last non-B frame, in display order, as reference.
  - Each B frame uses the last non-B frame, in display order, as its forward reference, and uses the next non-B frame, in display order, as its backward reference.
- Dyadic Reference Pattern
  - Each P frame uses the last non-B frame, in display order, as reference.
  - The following algorithm is applied to the sequence of consecutive B frames between I and/or P frames in display order:
    - The B frame in the middle of this sequence uses the frame preceding the sequence as its forward reference, and uses the frame following the sequence as its backward reference.
    - The algorithm is executed recursively for the following frame sequences:
      - The B frames of the original sequence preceding the frame in the middle, if any.
      - The B frames of the original sequence following the frame in the middle, if any.
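The recursive step of the dyadic reference pattern can be sketched as below. Frames are identified by display-order index; fwd and bwd are the frames immediately preceding and following the run of B frames. The text does not say which frame counts as the middle of an even-length run, so this sketch assumes the lower middle; the function name is hypothetical.

```c
#include <assert.h>

/* For display-order positions lo..hi (inclusive) holding consecutive B frames,
 * with `fwd` the frame preceding the run and `bwd` the frame following it,
 * records each B frame's forward and backward reference per the dyadic pattern. */
static void dyadic_b_refs(int lo, int hi, int fwd, int bwd,
                          int fwd_ref[], int bwd_ref[])
{
    if (lo > hi) return;                 /* empty sub-sequence: nothing to assign */
    assert(fwd < lo && hi < bwd);        /* references bracket the run */
    int mid = lo + (hi - lo) / 2;        /* assumption: lower middle for even-length runs */
    fwd_ref[mid] = fwd;
    bwd_ref[mid] = bwd;
    /* Recurse: B frames before the middle now end at the middle frame,
     * B frames after the middle now start from it. */
    dyadic_b_refs(lo, mid - 1, fwd, mid, fwd_ref, bwd_ref);
    dyadic_b_refs(mid + 1, hi, mid, bwd, fwd_ref, bwd_ref);
}
```

For an I frame at index 0, B frames at indices 1 to 3, and a P frame at index 4, frame 2 references frames 0 and 4, frame 1 references frames 0 and 2, and frame 3 references frames 2 and 4.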
The application can provide guidance to the implementation’s rate control algorithm about the structure of the GOP used by the application. Such guidance does not mandate that the application use that specific GOP structure, as the picture type of individual encoded pictures is still application-controlled; however, any deviation from the provided guidance may result in undesired rate control behavior including, but not limited to, the implementation not being able to conform to the expected average or target bitrates, or to other rate control parameters specified by the application.
When an H.265 encode session is used to encode multiple temporal sub-layers, it is also common practice to follow a regular pattern for the H.265 temporal ID for the encoded pictures in display order when encoding subsequent frames. This pattern is referred to as the temporal GOP. The most common temporal layer pattern used is as follows:
- Dyadic Temporal Sub-Layer Pattern
  - The number of frames in the temporal GOP is 2^(n-1), where n is the number of temporal sub-layers.
  - The ith frame in the temporal GOP uses temporal ID t if and only if the index of the least significant bit set in i equals n-t-1, except for the first frame, which is the only frame in the temporal GOP using temporal ID zero.
  - The ith frame in the temporal GOP uses the rth frame as reference, where r is calculated from i by clearing the least significant bit set in it, except for the first frame in the temporal GOP, which uses the first frame of the previous temporal GOP, if any, as reference.
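The three rules of the dyadic temporal sub-layer pattern translate directly into bit operations, sketched below with hypothetical helper names. For n = 3 sub-layers, the temporal GOP has 4 frames with temporal IDs 0, 2, 1, 2, and frames 1, 2, and 3 reference frames 0, 0, and 2, respectively. The reference of frame 0 (the first frame of the previous temporal GOP) is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the least significant set bit; i must be non-zero. */
static uint32_t lsb_index(uint32_t i)
{
    assert(i != 0);
    uint32_t idx = 0;
    while ((i & 1u) == 0) { i >>= 1; ++idx; }
    return idx;
}

/* Number of frames in the temporal GOP for n temporal sub-layers: 2^(n-1). */
static uint32_t temporal_gop_size(uint32_t n)
{
    assert(n >= 1 && n <= 32);
    return UINT32_C(1) << (n - 1);
}

/* Temporal ID of the ith frame: 0 for the first frame, otherwise the t for which
 * lsb_index(i) == n - t - 1, that is, t = n - 1 - lsb_index(i). */
static uint32_t temporal_id(uint32_t i, uint32_t n)
{
    assert(i < temporal_gop_size(n));
    return i == 0 ? 0 : n - 1 - lsb_index(i);
}

/* Reference of the ith frame (i > 0): i with its least significant set bit cleared. */
static uint32_t temporal_ref(uint32_t i)
{
    assert(i != 0);
    return i & (i - 1);
}
```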
Multi-layer rate control and multi-layer coding are typically used for streaming cases where low latency is expected, hence B pictures with backward prediction are usually not used.
The VkVideoEncodeH265RateControlInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265RateControlInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEncodeH265RateControlFlagsKHR flags;
uint32_t gopFrameCount;
uint32_t idrPeriod;
uint32_t consecutiveBFrameCount;
uint32_t subLayerCount;
} VkVideoEncodeH265RateControlInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoEncodeH265RateControlFlagBitsKHR specifying H.265 rate control flags.
- gopFrameCount is the number of frames within a group of pictures (GOP) intended to be used by the application. If it is 0, the rate control algorithm may assume an implementation-dependent GOP length. If it is UINT32_MAX, the GOP length is treated as infinite.
- idrPeriod is the interval, in terms of number of frames, between two IDR frames (see IDR period). If it is 0, the rate control algorithm may assume an implementation-dependent IDR period. If it is UINT32_MAX, the IDR period is treated as infinite.
- consecutiveBFrameCount is the number of consecutive B frames between I and/or P frames within the GOP.
- subLayerCount specifies the number of H.265 sub-layers that the application intends to use.
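The special values of gopFrameCount and idrPeriod can be summarized as in the following sketch. classify_period and is_idr_frame are hypothetical application-side helpers. In particular, is_idr_frame assumes that the first frame is an IDR frame and that the application has chosen a concrete period, since a value of 0 only lets the rate control algorithm assume an implementation-dependent period and does not tell the application where IDR frames fall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    PERIOD_IMPLEMENTATION_DEPENDENT, /* value 0 */
    PERIOD_INFINITE,                 /* value UINT32_MAX */
    PERIOD_FINITE                    /* any other value */
} period_kind;

/* How the rate control guidance treats a gopFrameCount or idrPeriod value. */
static period_kind classify_period(uint32_t value)
{
    if (value == 0) return PERIOD_IMPLEMENTATION_DEPENDENT;
    if (value == UINT32_MAX) return PERIOD_INFINITE;
    return PERIOD_FINITE;
}

/* Hypothetical helper: whether the frame at this index is an IDR frame, assuming
 * frame 0 is an IDR frame and a non-zero, application-chosen idr_period. */
static bool is_idr_frame(uint64_t frame_index, uint32_t idr_period)
{
    assert(idr_period != 0);
    if (idr_period == UINT32_MAX) return frame_index == 0; /* infinite period */
    return frame_index % idr_period == 0;
}
```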
When an instance of this structure is included in the pNext chain of the VkVideoCodingControlInfoKHR structure passed to the vkCmdControlVideoCodingKHR command, and VkVideoCodingControlInfoKHR::flags includes VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR, the parameters in this structure are used as guidance for the implementation’s rate control algorithm (see Video Coding Control).
If flags includes VK_VIDEO_ENCODE_H265_RATE_CONTROL_ATTEMPT_HRD_COMPLIANCE_BIT_KHR, then the rate control state is reset to an initial state to meet HRD compliance requirements. Otherwise, the new rate control state may be applied without a reset, depending on the implementation and the specified rate control parameters.
It would be possible to infer the picture type to be used when encoding a
frame, on the basis of the values provided for
Bits which can be set in VkVideoEncodeH265RateControlInfoKHR::flags, specifying H.265 rate control flags, are:
// Provided by VK_KHR_video_encode_h265
typedef enum VkVideoEncodeH265RateControlFlagBitsKHR {
VK_VIDEO_ENCODE_H265_RATE_CONTROL_ATTEMPT_HRD_COMPLIANCE_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_H265_RATE_CONTROL_REGULAR_GOP_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_H265_RATE_CONTROL_REFERENCE_PATTERN_FLAT_BIT_KHR = 0x00000004,
VK_VIDEO_ENCODE_H265_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR = 0x00000008,
VK_VIDEO_ENCODE_H265_RATE_CONTROL_TEMPORAL_SUB_LAYER_PATTERN_DYADIC_BIT_KHR = 0x00000010,
} VkVideoEncodeH265RateControlFlagBitsKHR;
- VK_VIDEO_ENCODE_H265_RATE_CONTROL_ATTEMPT_HRD_COMPLIANCE_BIT_KHR specifies that rate control should attempt to produce an HRD-compliant bitstream, as defined in annex C of the ITU-T H.265 Specification.
- VK_VIDEO_ENCODE_H265_RATE_CONTROL_REGULAR_GOP_BIT_KHR specifies that the application intends to use a regular GOP structure according to the parameters specified in the gopFrameCount, idrPeriod, and consecutiveBFrameCount members of the VkVideoEncodeH265RateControlInfoKHR structure.
- VK_VIDEO_ENCODE_H265_RATE_CONTROL_REFERENCE_PATTERN_FLAT_BIT_KHR specifies that the application intends to follow a flat reference pattern in the GOP.
- VK_VIDEO_ENCODE_H265_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR specifies that the application intends to follow a dyadic reference pattern in the GOP.
- VK_VIDEO_ENCODE_H265_RATE_CONTROL_TEMPORAL_SUB_LAYER_PATTERN_DYADIC_BIT_KHR specifies that the application intends to follow a dyadic temporal sub-layer pattern.
// Provided by VK_KHR_video_encode_h265
typedef VkFlags VkVideoEncodeH265RateControlFlagsKHR;
VkVideoEncodeH265RateControlFlagsKHR
is a bitmask type for setting a
mask of zero or more VkVideoEncodeH265RateControlFlagBitsKHR.
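The following sketch composes a flags mask from a description of the application's intended GOP. To stay self-contained, it mirrors the bit values listed in VkVideoEncodeH265RateControlFlagBitsKHR above in local constants instead of including the Vulkan headers; the constant, enumeration, and function names are hypothetical. It only sets bits and does not encode any valid-usage rules, which should be checked against the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the VkVideoEncodeH265RateControlFlagBitsKHR values listed above. */
enum {
    H265_RC_ATTEMPT_HRD_COMPLIANCE            = 0x00000001,
    H265_RC_REGULAR_GOP                       = 0x00000002,
    H265_RC_REFERENCE_PATTERN_FLAT            = 0x00000004,
    H265_RC_REFERENCE_PATTERN_DYADIC          = 0x00000008,
    H265_RC_TEMPORAL_SUB_LAYER_PATTERN_DYADIC = 0x00000010
};

/* Hypothetical description of the reference pattern the application follows. */
typedef enum { REF_PATTERN_NONE, REF_PATTERN_FLAT, REF_PATTERN_DYADIC } ref_pattern;

/* Hypothetical helper: builds a mask describing the application's intended GOP. */
static uint32_t h265_rc_flags(bool attempt_hrd, bool regular_gop,
                              ref_pattern pattern, bool dyadic_temporal_layers)
{
    uint32_t flags = 0;
    if (attempt_hrd)  flags |= H265_RC_ATTEMPT_HRD_COMPLIANCE;
    if (regular_gop)  flags |= H265_RC_REGULAR_GOP;
    if (pattern == REF_PATTERN_FLAT)   flags |= H265_RC_REFERENCE_PATTERN_FLAT;
    if (pattern == REF_PATTERN_DYADIC) flags |= H265_RC_REFERENCE_PATTERN_DYADIC;
    if (dyadic_temporal_layers) flags |= H265_RC_TEMPORAL_SUB_LAYER_PATTERN_DYADIC;
    assert((flags & ~UINT32_C(0x1F)) == 0); /* only the five bits listed above */
    return flags;
}
```

For example, a regular GOP with a dyadic reference pattern and no other intent yields 0x2 | 0x8 = 0xA.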
Rate Control Layers
The VkVideoEncodeH265RateControlLayerInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265RateControlLayerInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useMinQp;
VkVideoEncodeH265QpKHR minQp;
VkBool32 useMaxQp;
VkVideoEncodeH265QpKHR maxQp;
VkBool32 useMaxFrameSize;
VkVideoEncodeH265FrameSizeKHR maxFrameSize;
} VkVideoEncodeH265RateControlLayerInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- useMinQp indicates whether the QP values determined by rate control will be clamped to the lower bounds on the QP values specified in minQp.
- minQp specifies the lower bounds on the QP values, for each picture type, that the implementation’s rate control algorithm will use when useMinQp is VK_TRUE.
- useMaxQp indicates whether the QP values determined by rate control will be clamped to the upper bounds on the QP values specified in maxQp.
- maxQp specifies the upper bounds on the QP values, for each picture type, that the implementation’s rate control algorithm will use when useMaxQp is VK_TRUE.
- useMaxFrameSize indicates whether the implementation’s rate control algorithm should use the values specified in maxFrameSize as the upper bounds on the encoded frame size for each picture type.
- maxFrameSize specifies the upper bounds on the encoded frame size, for each picture type, when useMaxFrameSize is VK_TRUE.
When used, the values in minQp
and maxQp
guarantee that the
effective QP values used by the implementation will respect those lower and
upper bounds, respectively.
However, limiting the range of QP values that the implementation is able to
use will also limit the ability of the implementation’s rate control
algorithm to comply with other constraints.
In particular, the implementation may not be able to comply with the
following:
- The average and/or peak bitrate values to be used for the encoded bitstream specified in the averageBitrate and maxBitrate members of the VkVideoEncodeRateControlLayerInfoKHR structure.
- The upper bounds on the encoded frame size, for each picture type, specified in the maxFrameSize member of VkVideoEncodeH265RateControlLayerInfoKHR.
In general, applications need to configure rate control parameters appropriately in order to get the desired rate control behavior, as described in the Video Encode Rate Control section.
When an instance of this structure is included in the pNext
chain of a
VkVideoEncodeRateControlLayerInfoKHR structure specified in one of the
elements of the pLayers
array member of the
VkVideoEncodeRateControlInfoKHR structure passed to the
vkCmdControlVideoCodingKHR command,
VkVideoCodingControlInfoKHR::flags
includes
VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR
, and the bound
video session was created with the video codec operation
VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR
, it specifies the
H.265-specific rate control parameters of the rate control layer
corresponding to that element of pLayers
.
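The following C sketch illustrates how the useMinQp/minQp and useMaxQp/maxQp members bound a QP value for a given picture type. It does not call any Vulkan API. H265Qp, H265LayerQpBounds, PictureType, and clamp_layer_qp are hypothetical stand-ins that mirror the structure members so the example compiles on its own. The actual clamping is performed inside the implementation's rate control algorithm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring VkVideoEncodeH265QpKHR. */
typedef struct {
    int32_t qpI;
    int32_t qpP;
    int32_t qpB;
} H265Qp;

/* Hypothetical subset of VkVideoEncodeH265RateControlLayerInfoKHR. */
typedef struct {
    bool   useMinQp;
    H265Qp minQp;
    bool   useMaxQp;
    H265Qp maxQp;
} H265LayerQpBounds;

typedef enum { PICTURE_I, PICTURE_P, PICTURE_B } PictureType;

/* Select the per-picture-type value from a QP triple. */
static int32_t qp_for_type(const H265Qp *qp, PictureType type) {
    switch (type) {
    case PICTURE_I: return qp->qpI;
    case PICTURE_P: return qp->qpP;
    default:        return qp->qpB;
    }
}

/* Clamp a QP chosen by rate control to the layer bounds, but only for the
 * bounds whose use* flag is set. */
int32_t clamp_layer_qp(const H265LayerQpBounds *layer, PictureType type,
                       int32_t qp) {
    if (layer->useMinQp) {
        int32_t lo = qp_for_type(&layer->minQp, type);
        if (qp < lo) qp = lo;
    }
    if (layer->useMaxQp) {
        int32_t hi = qp_for_type(&layer->maxQp, type);
        if (qp > hi) qp = hi;
    }
    return qp;
}
```

Each bound is applied independently. A layer can therefore cap the QP from above while leaving the lower end unconstrained.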
The VkVideoEncodeH265QpKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265QpKHR {
int32_t qpI;
int32_t qpP;
int32_t qpB;
} VkVideoEncodeH265QpKHR;
- qpI is the QP to be used for I pictures.
- qpP is the QP to be used for P pictures.
- qpB is the QP to be used for B pictures.
The VkVideoEncodeH265FrameSizeKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265FrameSizeKHR {
uint32_t frameISize;
uint32_t framePSize;
uint32_t frameBSize;
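} VkVideoEncodeH265FrameSizeKHR;
- frameISize is the size in bytes to be used for I pictures.
- framePSize is the size in bytes to be used for P pictures.
- frameBSize is the size in bytes to be used for B pictures.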
GOP Remaining Frames
Besides session level rate control configuration, the application can specify the number of frames per frame type remaining in the group of pictures (GOP).
The VkVideoEncodeH265GopRemainingFrameInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_h265
typedef struct VkVideoEncodeH265GopRemainingFrameInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useGopRemainingFrames;
uint32_t gopRemainingI;
uint32_t gopRemainingP;
uint32_t gopRemainingB;
} VkVideoEncodeH265GopRemainingFrameInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- useGopRemainingFrames indicates whether the implementation’s rate control algorithm should use the values specified in gopRemainingI, gopRemainingP, and gopRemainingB. If useGopRemainingFrames is VK_FALSE, then the values of gopRemainingI, gopRemainingP, and gopRemainingB are ignored.
- gopRemainingI specifies the number of I frames the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the video encode operation.
- gopRemainingP specifies the number of P frames the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the video encode operation.
- gopRemainingB specifies the number of B frames the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the video encode operation.
Setting useGopRemainingFrames
to VK_TRUE
and including this
structure in the pNext
chain of VkVideoBeginCodingInfoKHR is
only mandatory if the
VkVideoEncodeH265CapabilitiesKHR::requiresGopRemainingFrames
reported for the used video profile is VK_TRUE
.
However, implementations may use these remaining frame counts, when
specified, even when it is not required.
In particular, when the application does not use a
regular GOP structure, these values may provide
additional guidance for the implementation’s rate control algorithm.
The VkVideoEncodeH265CapabilitiesKHR::prefersGopRemainingFrames
capability is also used to indicate that the implementation’s rate control
algorithm may operate more accurately if the application specifies the
remaining frame counts using this structure.
As with other rate control guidance values, if the effective order and number of frames encoded by the application are not in line with the remaining frame counts specified in this structure at any given point, then the behavior of the implementation’s rate control algorithm may deviate from the one expected by the application.
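The sketch below is a rough illustration of how an application might fill gopRemainingI, gopRemainingP, and gopRemainingB. It assumes a hypothetical regular GOP in which a single I frame is followed by repeating runs of B frames, each closed by a P frame. The GOP layout, the GopRemaining type, and the gop_totals/gop_remaining helpers are all assumptions made for illustration. Applications using other GOP structures would need to count frames according to their own pattern.

```c
#include <stdint.h>

/* Hypothetical stand-in for the counters of
 * VkVideoEncodeH265GopRemainingFrameInfoKHR. */
typedef struct {
    uint32_t gopRemainingI;
    uint32_t gopRemainingP;
    uint32_t gopRemainingB;
} GopRemaining;

/* Saturating subtraction so that over-counting never wraps around. */
static uint32_t sat_sub(uint32_t a, uint32_t b) { return a > b ? a - b : 0; }

/* Frame counts of a whole GOP under the assumed pattern: one I frame, then
 * repeating runs of bFrames B frames closed by a P frame (a trailing
 * partial run consists of B frames only). */
GopRemaining gop_totals(uint32_t gopFrameCount, uint32_t bFrames) {
    GopRemaining t = {0, 0, 0};
    if (gopFrameCount == 0)
        return t;
    t.gopRemainingI = 1;
    t.gopRemainingP = (gopFrameCount - 1) / (bFrames + 1);
    t.gopRemainingB = gopFrameCount - 1 - t.gopRemainingP;
    return t;
}

/* Counts to report before the next encode, given the frames of each type
 * already encoded in the current GOP. */
GopRemaining gop_remaining(GopRemaining totals, uint32_t encodedI,
                           uint32_t encodedP, uint32_t encodedB) {
    GopRemaining r;
    r.gopRemainingI = sat_sub(totals.gopRemainingI, encodedI);
    r.gopRemainingP = sat_sub(totals.gopRemainingP, encodedP);
    r.gopRemainingB = sat_sub(totals.gopRemainingB, encodedB);
    return r;
}
```

For a 16-frame GOP with two consecutive B frames, gop_totals yields 1 I, 5 P, and 10 B frames. After the leading I frame and the first P frame have been encoded, the remaining counts are 0, 4, and 10.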
H.265 QP Delta Maps
Quantization delta maps used with an H.265 encode profile are referred to as QP delta maps and their texels contain integer values representing QP delta values that are applied in the process of determining the quantization parameters of the encoded picture.
Accordingly, H.265 QP delta maps always have single channel integer formats,
as reported in VkVideoFormatPropertiesKHR::format
.
When the rate control mode is
VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, the QP delta
values are added to the constant QP value of each slice segment. In
effect, this enables the application to explicitly control the QP values used at
the granularity of the quantization map texel size.
For all other rate control modes, the QP delta values can be used to offset the QP values that the rate control algorithm would otherwise produce.
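The sketch below models the VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR case in plain C. The QP of a coding unit is the slice segment's constant QP plus the delta stored in the QP delta map texel that covers it. QpDeltaMap and cu_qp_rc_disabled are hypothetical names, and the deltas are held in a host array rather than an image. Selecting the texel from the coding unit's top-left position is also an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical host-side model of a QP delta map (not a VkImage). */
typedef struct {
    const int32_t *deltas;   /* width * height QP deltas, row-major */
    uint32_t width, height;  /* map extent in texels */
    uint32_t texelWidth;     /* quantization map texel size in pixels */
    uint32_t texelHeight;
} QpDeltaMap;

/* Rate control disabled: constant slice segment QP plus the delta of the
 * map texel that contains the coding unit's top-left pixel (assumption). */
int32_t cu_qp_rc_disabled(const QpDeltaMap *map, int32_t sliceQp,
                          uint32_t cuX, uint32_t cuY) {
    uint32_t tx = cuX / map->texelWidth;
    uint32_t ty = cuY / map->texelHeight;
    if (tx >= map->width)  tx = map->width - 1;   /* stay inside the map */
    if (ty >= map->height) ty = map->height - 1;
    return sliceQp + map->deltas[ty * map->width + tx];
}
```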
H.265 Encode Quantization
Performing H.265 encode operations involves the process of assigning QP values to individual H.265 coding units. This process depends on the used rate control mode, as well as other encode and rate control parameters, as described below:
- If the configured rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR, then the QP value is initialized by the implementation-specific default rate control algorithm.
  - If the video encode operation is issued with a quantization delta map, the QP delta value corresponding to the coding unit, as fetched from the quantization map, is added to the previously determined QP value. If the fetched QP delta value falls outside the supported QP delta value range reported in the minQpDelta and maxQpDelta members of VkVideoEncodeH265QuantizationMapCapabilitiesKHR, then the QP value used for the coding unit becomes undefined.
- If the configured rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, then the QP value is initialized from the constant QP value specified for the H.265 slice segment the coding unit is part of.
  - If the video encode operation is issued with a quantization delta map, the QP delta value corresponding to the coding unit, as fetched from the quantization map, is added to the previously determined QP value. If the fetched QP delta value falls outside the supported QP delta value range reported in the minQpDelta and maxQpDelta members of VkVideoEncodeH265QuantizationMapCapabilitiesKHR, then the QP value used for the coding unit becomes undefined.
- If the configured rate control mode is neither VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR nor VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, then the QP value is initialized by the corresponding rate control algorithm.
  - If the video encode operation is issued with a quantization delta map, the QP delta value corresponding to the coding unit, as fetched from the quantization map, is added to the previously determined QP value. If the fetched QP delta value falls outside the supported QP delta value range reported in the minQpDelta and maxQpDelta members of VkVideoEncodeH265QuantizationMapCapabilitiesKHR, then the QP value used for the coding unit becomes undefined.
  - If the video encode operation is issued with an emphasis map, the rate control will adjust the QP value based on the emphasis value corresponding to the coding unit, as fetched from the quantization map, according to the following equation:
    $$QP_{new} = f(QP_{prev}, e)$$
    Where $QP_{new}$ is the resulting QP value, $QP_{prev}$ is the previously determined QP value, $e$ is the emphasis value corresponding to the coding unit, and $f$ is an implementation-defined function for which the following implication is true:
    $$e_1 < e_2 \Rightarrow f(QP, e_1) \geq f(QP, e_2)$$
    This means that lower emphasis values result in higher or equal QP values, whereas higher emphasis values result in lower or equal QP values. However, the function is not required to be strictly decreasing with respect to the input emphasis value for a given input QP value.
  - If clamping to minimum QP values is enabled in the applied rate control layer, then the QP value is clamped to the corresponding minimum QP value.
  - If clamping to maximum QP values is enabled in the applied rate control layer, then the QP value is clamped to the corresponding maximum QP value.
- If VK_VIDEO_ENCODE_H265_CAPABILITY_CU_QP_DIFF_WRAPAROUND_BIT_KHR is not supported, then the determined QP value is clamped in such a way that the CuQpDeltaVal value of the encoded coding unit complies with the modified version of equation 8-283 of the ITU-T H.265 Specification. The effect of this is that the maximum QP difference across subsequent coding units is limited to the $[-(26 + QpBdOffset_Y / 2), 25 + QpBdOffset_Y / 2]$ range. This limit only results in an observable change in behavior when the video encode operation is issued with a QP delta map.
- In all cases, the final QP value is clamped to the QP value range supported by the video profile, as reported in the minQp and maxQp members of VkVideoEncodeH265CapabilitiesKHR.
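The following C sketch makes the ordering of the steps above concrete by deriving the QP of one coding unit from an already initialized value. It is a hypothetical model, not implementation behavior. CuQpInputs, derive_cu_qp, and cu_qp_diff_range are illustrative names. Emphasis maps and the actual CuQpDeltaVal limiting are not modeled. An out-of-range delta is reported as an error, because the resulting QP is undefined. The CuQpDeltaVal range helper uses QpBdOffsetY = 6 * bit_depth_luma_minus8, as defined by the ITU-T H.265 Specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-coding-unit inputs; names are illustrative only. */
typedef struct {
    int32_t initialQp;                  /* default RC, constant slice QP, or RC algorithm */
    bool    hasDelta;                   /* a QP delta map is used */
    int32_t delta;                      /* delta fetched for this coding unit */
    int32_t minQpDelta, maxQpDelta;     /* supported QP delta range */
    bool    useMinQp, useMaxQp;         /* layer clamping; false for DEFAULT/DISABLED modes */
    int32_t layerMinQp, layerMaxQp;
    int32_t profileMinQp, profileMaxQp; /* QP range of the video profile */
} CuQpInputs;

static int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Returns false if the delta is out of range (the QP would be undefined). */
bool derive_cu_qp(const CuQpInputs *in, int32_t *outQp) {
    int32_t qp = in->initialQp;
    if (in->hasDelta) {
        if (in->delta < in->minQpDelta || in->delta > in->maxQpDelta)
            return false;
        qp += in->delta;
    }
    if (in->useMinQp && qp < in->layerMinQp) qp = in->layerMinQp;
    if (in->useMaxQp && qp > in->layerMaxQp) qp = in->layerMaxQp;
    *outQp = clamp_i32(qp, in->profileMinQp, in->profileMaxQp); /* final clamp */
    return true;
}

/* QP difference range between subsequent coding units without wraparound:
 * [-(26 + QpBdOffsetY / 2), 25 + QpBdOffsetY / 2]. */
void cu_qp_diff_range(uint32_t bitDepthLumaMinus8, int32_t *lo, int32_t *hi) {
    int32_t qpBdOffsetY = 6 * (int32_t)bitDepthLumaMinus8;
    *lo = -(26 + qpBdOffsetY / 2);
    *hi = 25 + qpBdOffsetY / 2;
}
```

For 8-bit luma (bit_depth_luma_minus8 = 0), the range evaluates to [-26, 25]. For 10-bit luma, it evaluates to [-32, 31].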
H.265 Encode Requirements
This section describes the required H.265 encoding capabilities for
physical devices that have at least one queue family that supports the video
codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_H265_BIT_KHR
, as
returned by vkGetPhysicalDeviceQueueFamilyProperties2 in
VkQueueFamilyVideoPropertiesKHR::videoCodecOperations
.
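Video Std Header Name | Version |
---|---|
vulkan_video_codec_h265std_encode | 1.0.0 |
Video Capability | Requirement | Requirement Type (1) |
---|---|---|
VkVideoCapabilitiesKHR::flags | - | min |
VkVideoCapabilitiesKHR::minBitstreamBufferOffsetAlignment | 4096 | max |
VkVideoCapabilitiesKHR::minBitstreamBufferSizeAlignment | 4096 | max |
VkVideoCapabilitiesKHR::pictureAccessGranularity | (64,64) | max |
VkVideoCapabilitiesKHR::minCodedExtent | - | max |
VkVideoCapabilitiesKHR::maxCodedExtent | - | min |
VkVideoCapabilitiesKHR::maxDpbSlots | 0 | min |
VkVideoCapabilitiesKHR::maxActiveReferencePictures | 0 | min |
VkVideoEncodeCapabilitiesKHR::flags | - | min |
VkVideoEncodeCapabilitiesKHR::rateControlModes | - | min |
VkVideoEncodeCapabilitiesKHR::maxBitrate | 128000 | min |
VkVideoEncodeCapabilitiesKHR::maxQualityLevels | 1 | min |
VkVideoEncodeCapabilitiesKHR::encodeInputPictureGranularity | (64,64) | max |
VkVideoEncodeCapabilitiesKHR::supportedEncodeFeedbackFlags | VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR and VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR | min |
VkVideoEncodeH265CapabilitiesKHR::flags | - | min |
VkVideoEncodeH265CapabilitiesKHR::maxLevelIdc | STD_VIDEO_H265_LEVEL_IDC_1_0 | min |
VkVideoEncodeH265CapabilitiesKHR::maxSliceSegmentCount | 1 | min |
VkVideoEncodeH265CapabilitiesKHR::maxTiles | (1,1) | min |
VkVideoEncodeH265CapabilitiesKHR::ctbSizes | at least one bit set | implementation-dependent |
VkVideoEncodeH265CapabilitiesKHR::transformBlockSizes | at least one bit set | implementation-dependent |
VkVideoEncodeH265CapabilitiesKHR::maxPPictureL0ReferenceCount | 0 | min |
VkVideoEncodeH265CapabilitiesKHR::maxBPictureL0ReferenceCount | 0 | min |
VkVideoEncodeH265CapabilitiesKHR::maxL1ReferenceCount | 0 | min |
VkVideoEncodeH265CapabilitiesKHR::maxSubLayerCount | 1 | min |
VkVideoEncodeH265CapabilitiesKHR::expectDyadicTemporalSubLayerPattern | - | implementation-dependent |
VkVideoEncodeH265CapabilitiesKHR::minQp | - | max |
VkVideoEncodeH265CapabilitiesKHR::maxQp | - | min |
VkVideoEncodeH265CapabilitiesKHR::prefersGopRemainingFrames | - | implementation-dependent |
VkVideoEncodeH265CapabilitiesKHR::requiresGopRemainingFrames | - | implementation-dependent |
VkVideoEncodeH265CapabilitiesKHR::stdSyntaxFlags | - | min |
VkVideoEncodeQuantizationMapCapabilitiesKHR::maxQuantizationMapExtent | - (2) | min |
VkVideoEncodeH265QuantizationMapCapabilitiesKHR::minQpDelta | - (3) | max |
VkVideoEncodeH265QuantizationMapCapabilitiesKHR::maxQpDelta | - (3) | min |
1. The Requirement Type column specifies whether the requirement is the minimum value all implementations must support, the maximum value all implementations must support, or the exact value all implementations must support. For bitmasks, a minimum value is the least set of bits all implementations must set, but they may have additional bits set beyond this minimum.
2. If VkVideoEncodeCapabilitiesKHR::flags includes VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR or VK_VIDEO_ENCODE_CAPABILITY_EMPHASIS_MAP_BIT_KHR, then the width and height members of maxQuantizationMapExtent must be greater than zero.
3. If VkVideoEncodeCapabilitiesKHR::flags includes VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR, then maxQpDelta must be greater than minQpDelta.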
AV1 Encode Operations
Video encode operations using an AV1 encode profile can be used to encode elementary video stream sequences compliant with the AV1 Specification.
Refer to the Preamble for information on how the Khronos Intellectual Property Rights Policy relates to normative references to external materials not created by Khronos.
This process is performed according to the video encode operation steps with the codec-specific semantics defined in section 7 of the AV1 Specification:
- Syntax elements, derived values, and other parameters are applied from the following structures:
  - The StdVideoAV1SequenceHeader structure, the optional StdVideoEncodeAV1DecoderModelInfo structure, and the optional array of StdVideoEncodeAV1OperatingPointInfo structures stored in the bound video session parameters object specifying the active sequence header.
  - The StdVideoEncodeAV1PictureInfo structure specifying the AV1 picture information.
  - The StdVideoEncodeAV1ReferenceInfo structures specifying the AV1 reference information corresponding to the optional reconstructed picture and any active reference pictures.
- The encoded bitstream data is written to the destination video bitstream buffer range as defined in the AV1 Encode Bitstream Data Access section.
- Picture data in the video picture resources corresponding to the used encode input picture, active reference pictures, and optional reconstructed picture is accessed as defined in the AV1 Encode Picture Data Access section.
- The decision on reference picture setup is made according to the parameters specified in the AV1 picture information.
If the parameters adhere to the syntactic and semantic requirements defined in the corresponding sections of the AV1 Specification, as described above, and the DPB slots associated with the active reference pictures all refer to valid picture references, then the video encode operation will complete successfully. Otherwise, the video encode operation may complete unsuccessfully.
AV1 Encode Parameter Overrides
Implementations may override, unless otherwise specified, any of the AV1 encode parameters specified in the following Video Std structures:
- StdVideoAV1SequenceHeader
- StdVideoEncodeAV1DecoderModelInfo
- StdVideoEncodeAV1OperatingPointInfo
- StdVideoEncodeAV1PictureInfo
- StdVideoEncodeAV1ReferenceInfo
All such AV1 encode parameter overrides must fulfill the conditions defined in the Video Encode Parameter Overrides section.
In addition, implementations must not override any of the following AV1 encode parameters:
- the following parameters specified in StdVideoAV1SequenceHeader:
  - flags.still_picture
  - flags.enable_order_hint
  - flags.frame_id_numbers_present_flag
  - flags.film_grain_params_present
  - flags.timing_info_present_flag
  - flags.initial_display_delay_present_flag
  - delta_frame_id_length_minus_2
  - additional_frame_id_length_minus_1
  - order_hint_bits_minus_1
- the following parameters specified in the StdVideoAV1ColorConfig structure pointed to by StdVideoAV1SequenceHeader::pColorConfig:
  - flags.mono_chrome
  - flags.color_range
  - BitDepth
  - subsampling_x
  - subsampling_y
  - color_primaries
  - transfer_characteristics
  - matrix_coefficients
  - chroma_sample_position
- the following parameters specified in the StdVideoAV1TimingInfo structure pointed to by StdVideoAV1SequenceHeader::pTimingInfo:
  - flags.equal_picture_interval
  - num_units_in_display_tick
  - time_scale
  - num_ticks_per_picture_minus_1
- the parameters specified in StdVideoEncodeAV1DecoderModelInfo
- the parameters specified in StdVideoEncodeAV1OperatingPointInfo
- the following parameters specified in StdVideoEncodeAV1PictureInfo:
  - flags.show_frame
  - flags.showable_frame
  - frame_type
  - frame_presentation_time
  - current_frame_id
  - order_hint
  - refresh_frame_flags
  - render_width_minus_1
  - render_height_minus_1
  - ref_order_hint
  - ref_frame_idx
  - delta_frame_id_minus_1
- the following parameters specified in the StdVideoEncodeAV1ExtensionHeader structure pointed to by StdVideoEncodeAV1PictureInfo::pExtensionHeader when VkVideoEncodeAV1PictureInfoKHR::generateObuExtensionHeader is set to VK_TRUE:
  - temporal_id
  - spatial_id
If VkVideoEncodeAV1PictureInfoKHR::primaryReferenceCdfOnly is
set to VK_TRUE for a video encode operation, the implementation will
not override StdVideoEncodeAV1PictureInfo::primary_ref_frame.
Implementations supporting the
|
If VkVideoEncodeAV1CapabilitiesKHR::codedPictureAlignment
is not
equal to {8,8}
for the used video profile, implementations will override
the coded picture’s resolution and parameters related to the width and
height in the following manner:
- Let w and h be the codedExtent.width and codedExtent.height of the VkVideoPictureResourceInfoKHR structure corresponding to the encode input picture, rounded up to the nearest integer multiple of 8.
- Let aW and aH be w and h rounded up to the nearest integer multiple of codedPictureAlignment.width and codedPictureAlignment.height, respectively.
- If w equals aW, no override will occur. Otherwise, the coded width will be aW.
- If h equals aH, no override will occur. Otherwise, the coded height will be aH.
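The override rules above reduce to a small amount of integer arithmetic, and the sketch below computes the resulting coded extent. Extent2D (a stand-in for VkExtent2D), round_up, and av1_override_coded_extent are hypothetical names. The alignment values are assumed to be non-zero.

```c
#include <stdint.h>

typedef struct { uint32_t width, height; } Extent2D; /* stand-in for VkExtent2D */

/* Round v up to the nearest multiple of m (m must be non-zero). */
static uint32_t round_up(uint32_t v, uint32_t m) { return (v + m - 1) / m * m; }

/* Coded extent after applying the codedPictureAlignment override rules:
 * a dimension keeps its original value when its 8-aligned size already
 * satisfies the alignment, otherwise it becomes the aligned size. */
Extent2D av1_override_coded_extent(Extent2D codedExtent, Extent2D alignment) {
    uint32_t w  = round_up(codedExtent.width, 8);
    uint32_t h  = round_up(codedExtent.height, 8);
    uint32_t aW = round_up(w, alignment.width);
    uint32_t aH = round_up(h, alignment.height);
    Extent2D out;
    out.width  = (w == aW) ? codedExtent.width  : aW;
    out.height = (h == aH) ? codedExtent.height : aH;
    return out;
}
```

Consider a codedExtent of 1920x1080 and a codedPictureAlignment of 16x16. The width needs no override, while the height is overridden to 1088. This leaves 8 rows of padding that an application could signal as cropping.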
The AV1 specification codes all resolutions to an 8x8 alignment, but supports unaligned resolutions through implicit cropping. Thus, if the original coded extent, aligned to 8x8, meets the implementation required alignment, no override needs to occur. Otherwise, the implementation cannot code the requested coded extent, so the final resolution in the bitstream is overridden to be aligned to the implementation required alignment. For example, consider an implementation that is only able to output
bitstreams that are 16x16 aligned (as indicated by
VkVideoEncodeAV1CapabilitiesKHR:: |
In case of a video session parameters object
created with
VK_VIDEO_SESSION_PARAMETERS_CREATE_QUANTIZATION_MAP_COMPATIBLE_BIT_KHR,
the following AV1 sequence header parameters
may be overridden by the implementation according to the
quantization map texel size the video
session parameters object was created with:
- StdVideoAV1SequenceHeader::flags.use_128x128_superblock
This may be necessary to change the AV1 superblock size used during encoding to be compatible with the used quantization map texel size.
In case of AV1 encode parameters stored in video session parameters objects, applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened. If the query indicates that implementation overrides were applied, then the application needs to retrieve and use the encoded AV1 sequence header in the bitstream in order to be able to produce a compliant AV1 video bitstream using the AV1 encode parameters stored in the video session parameters object.
In case of any AV1 encode parameters stored in the encoded bitstream
produced by video encode operations, if the implementation supports the
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR
video encode feedback query flag, the
application can use such queries to retrieve feedback about whether any
implementation overrides have been applied to those AV1 encode parameters.
AV1 Encode Bitstream Data Access
Each video encode operation writes either:
- A single OBU with obu_type equal to OBU_FRAME comprising the frame header and tile data of the encoded picture, or
- An OBU with obu_type equal to OBU_FRAME_HEADER encapsulating the frame header of the encoded picture, followed by one or more OBUs with obu_type equal to OBU_TILE_GROUP comprising the tile data of the encoded picture.
In addition, if
VkVideoEncodeAV1PictureInfoKHR::generateObuExtensionHeader
is
set to VK_TRUE
for the video encode operation, then OBU extension
headers are included in the generated bitstream as defined in sections
5.3.1, 5.3.2, and 5.3.3 of the AV1 Specification.
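Applications that post-process the written bitstream may want to identify which of the two layouts was produced. The sketch below parses the OBU header and the optional OBU extension header, following the bit layout in sections 5.3.2 and 5.3.3 of the AV1 Specification. ObuHeader and parse_obu_header are hypothetical names. The leb128 obu_size field that follows the header when obu_has_size_field is set is not parsed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* obu_type values from the AV1 Specification relevant to encode output. */
enum { OBU_FRAME_HEADER = 3, OBU_TILE_GROUP = 4, OBU_FRAME = 6 };

typedef struct {
    uint8_t obuType;
    bool    hasExtension;  /* obu_extension_flag */
    bool    hasSizeField;  /* obu_has_size_field */
    uint8_t temporalId;    /* valid only if hasExtension */
    uint8_t spatialId;     /* valid only if hasExtension */
} ObuHeader;

/* Parses obu_header() and, if present, obu_extension_header().
 * Returns the number of bytes consumed, or 0 on error. */
size_t parse_obu_header(const uint8_t *data, size_t size, ObuHeader *out) {
    if (size < 1 || (data[0] & 0x80))  /* obu_forbidden_bit must be 0 */
        return 0;
    out->obuType      = (data[0] >> 3) & 0x0F;
    out->hasExtension = (data[0] >> 2) & 1;
    out->hasSizeField = (data[0] >> 1) & 1;
    out->temporalId = 0;
    out->spatialId  = 0;
    if (!out->hasExtension)
        return 1;
    if (size < 2)
        return 0;
    out->temporalId = (data[1] >> 5) & 0x07;  /* temporal_id f(3) */
    out->spatialId  = (data[1] >> 3) & 0x03;  /* spatial_id f(2) */
    return 2;
}
```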
AV1 Encode Picture Data Access
Accesses to image data within a video picture resource happen at the
granularity indicated by
VkVideoCapabilitiesKHR::pictureAccessGranularity
, as returned by
vkGetPhysicalDeviceVideoCapabilitiesKHR for the used video profile.
Accordingly, the complete image subregion of an encode input picture, reference picture, or
reconstructed picture accessed by video coding
operations using an AV1 encode profile is defined as
the set of texels within the coordinate range:
- ([0, endX), [0, endY))
Where:
- endX equals codedExtent.width rounded up to the nearest integer multiple of pictureAccessGranularity.width and clamped to the width of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
- endY equals codedExtent.height rounded up to the nearest integer multiple of pictureAccessGranularity.height and clamped to the height of the image subresource referred to by the corresponding VkVideoPictureResourceInfoKHR structure;
Where codedExtent is the member of the
VkVideoPictureResourceInfoKHR structure corresponding to the picture.
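As a sketch of the range computation above, the following C function returns endX and endY for a picture. Extent2D (a stand-in for VkExtent2D), round_up, min_u32, and av1_access_end are hypothetical helpers. The granularity parameter stands for pictureAccessGranularity, and subresource stands for the extent of the referenced image subresource.

```c
#include <stdint.h>

typedef struct { uint32_t width, height; } Extent2D; /* stand-in for VkExtent2D */

/* Round v up to the nearest multiple of m (m must be non-zero). */
static uint32_t round_up(uint32_t v, uint32_t m) { return (v + m - 1) / m * m; }
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Exclusive end coordinates of the accessed region [0, endX) x [0, endY). */
Extent2D av1_access_end(Extent2D codedExtent, Extent2D granularity,
                        Extent2D subresource) {
    Extent2D end;
    end.width  = min_u32(round_up(codedExtent.width, granularity.width),
                         subresource.width);
    end.height = min_u32(round_up(codedExtent.height, granularity.height),
                         subresource.height);
    return end;
}
```

Consider a 1920x1080 picture with a 64x64 access granularity. It touches rows up to 1088 when the image is that tall, but only up to 1080 when the image itself is 1080 rows high.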
In case of video encode operations using an AV1 encode profile, any access to a picture at the coordinates
(x,y), as defined by the AV1 Specification, is an access to the image subresource
referred to by the corresponding
VkVideoPictureResourceInfoKHR structure at the texel coordinates
(x,y).
Implementations may choose not to access some or all texels within particular reference pictures available to a video encode operation (e.g. due to video encode parameter overrides restricting the effective set of used reference pictures, or if the encoding algorithm chooses not to use certain subregions of the reference picture data for sample prediction).
AV1 Reference Names and Semantics
Individual reference frames used in the encoding process have different
semantics, as defined in section 6.10.24 of the AV1 Specification.
The AV1 semantics associated with a reference picture are indicated by the
corresponding enumeration constant defined in the Video Std enumeration type
StdVideoAV1ReferenceName:
- STD_VIDEO_AV1_REFERENCE_NAME_INTRA_FRAME identifies the reference used for intra coding (INTRA_FRAME), as defined in sections 2 and 7.11.2 of the AV1 Specification.
- All other enumeration constants refer to backward or forward references used for inter coding, as defined in sections 2 and 7.11.3 of the AV1 Specification:
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME identifies the LAST_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST2_FRAME identifies the LAST2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_LAST3_FRAME identifies the LAST3_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_GOLDEN_FRAME identifies the GOLDEN_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_BWDREF_FRAME identifies the BWDREF_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF2_FRAME identifies the ALTREF2_FRAME reference
  - STD_VIDEO_AV1_REFERENCE_NAME_ALTREF_FRAME identifies the ALTREF_FRAME reference
These enumeration constants are not directly used in any APIs but are used to indirectly index into certain Video Std and Vulkan API parameter arrays.
AV1 Prediction Modes
AV1 encoding supports multiple types of prediction modes, as described in section 6.10.24 of the AV1 Specification.
Possible AV1 encode prediction modes are as follows:
// Provided by VK_KHR_video_encode_av1
typedef enum VkVideoEncodeAV1PredictionModeKHR {
VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_INTRA_ONLY_KHR = 0,
VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_SINGLE_REFERENCE_KHR = 1,
VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_UNIDIRECTIONAL_COMPOUND_KHR = 2,
VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_BIDIRECTIONAL_COMPOUND_KHR = 3,
} VkVideoEncodeAV1PredictionModeKHR;
- VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_INTRA_ONLY_KHR specifies the use of intra-only prediction mode, used when encoding AV1 frames of type STD_VIDEO_AV1_FRAME_TYPE_KEY or STD_VIDEO_AV1_FRAME_TYPE_INTRA_ONLY.
- VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_SINGLE_REFERENCE_KHR specifies the use of single reference prediction mode. It is used when encoding AV1 frames of type STD_VIDEO_AV1_FRAME_TYPE_INTER or STD_VIDEO_AV1_FRAME_TYPE_SWITCH with reference_select, as defined in section 6.8.23 of the AV1 Specification, equal to 0. When using this prediction mode, the application must specify a reference picture for at least one AV1 reference name in VkVideoEncodeAV1PictureInfoKHR::referenceNameSlotIndices that is supported by the implementation, as reported in VkVideoEncodeAV1CapabilitiesKHR::singleReferenceNameMask.
- VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_UNIDIRECTIONAL_COMPOUND_KHR specifies the use of unidirectional compound prediction mode. It is used when encoding AV1 frames of type STD_VIDEO_AV1_FRAME_TYPE_INTER or STD_VIDEO_AV1_FRAME_TYPE_SWITCH with reference_select, as defined in section 6.8.23 of the AV1 Specification, equal to 1, and both reference names used for prediction are from the same reference frame group, as defined in section 6.10.24 of the AV1 Specification. When using this prediction mode, the application must specify a reference picture for at least two AV1 reference names in VkVideoEncodeAV1PictureInfoKHR::referenceNameSlotIndices that are supported by the implementation, as reported in VkVideoEncodeAV1CapabilitiesKHR::unidirectionalCompoundReferenceNameMask. Those two reference names must form one of the allowed pairs of reference names defined in section 5.11.25 of the AV1 Specification, listed below:
  - LAST_FRAME and LAST2_FRAME,
  - LAST_FRAME and LAST3_FRAME,
  - LAST_FRAME and GOLDEN_FRAME, or
  - BWDREF_FRAME and ALTREF_FRAME.
- VK_VIDEO_ENCODE_AV1_PREDICTION_MODE_BIDIRECTIONAL_COMPOUND_KHR specifies the use of bidirectional compound prediction mode. It is used when encoding AV1 frames of type STD_VIDEO_AV1_FRAME_TYPE_INTER or STD_VIDEO_AV1_FRAME_TYPE_SWITCH with reference_select, as defined in section 6.8.23 of the AV1 Specification, equal to 1, and the two reference names used for prediction are from different reference frame groups, as defined in section 6.10.24 of the AV1 Specification. When using this prediction mode, the application must specify a reference picture for at least one AV1 reference name from each reference frame group in VkVideoEncodeAV1PictureInfoKHR::referenceNameSlotIndices that is supported by the implementation, as reported in VkVideoEncodeAV1CapabilitiesKHR::bidirectionalCompoundReferenceNameMask.
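The reference name constraints for the two compound modes can be checked with simple comparisons. The sketch below uses a local RefName enumeration mirroring StdVideoAV1ReferenceName with the AV1 reference frame numbering (INTRA_FRAME = 0 through ALTREF_FRAME = 7). It assumes that reference frame group 1 consists of LAST_FRAME through GOLDEN_FRAME and group 2 of BWDREF_FRAME through ALTREF_FRAME. The function names are hypothetical, and the implementation's capability masks are not consulted.

```c
#include <stdbool.h>

/* Local mirror of StdVideoAV1ReferenceName (AV1 reference frame numbering). */
typedef enum {
    REF_INTRA = 0, REF_LAST = 1, REF_LAST2 = 2, REF_LAST3 = 3, REF_GOLDEN = 4,
    REF_BWDREF = 5, REF_ALTREF2 = 6, REF_ALTREF = 7
} RefName;

/* Assumed grouping: 1 = LAST..GOLDEN, 2 = BWDREF..ALTREF, 0 = neither. */
static int ref_group(RefName r) {
    if (r >= REF_LAST && r <= REF_GOLDEN) return 1;
    if (r >= REF_BWDREF && r <= REF_ALTREF) return 2;
    return 0;
}

/* Unidirectional compound: one of the four allowed pairs, in either order. */
bool is_unidirectional_pair(RefName a, RefName b) {
    if (a > b) { RefName t = a; a = b; b = t; }
    return (a == REF_LAST &&
            (b == REF_LAST2 || b == REF_LAST3 || b == REF_GOLDEN)) ||
           (a == REF_BWDREF && b == REF_ALTREF);
}

/* Bidirectional compound: one reference name from each group. */
bool is_bidirectional_pair(RefName a, RefName b) {
    int ga = ref_group(a), gb = ref_group(b);
    return ga != 0 && gb != 0 && ga != gb;
}
```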
The effective prediction mode used to encode individual AV1 mode info blocks may use simpler prediction modes than the one set by the application for the frame, as allowed by the AV1 Specification, in particular:
- Frames encoded with single reference prediction mode may contain mode info blocks encoded with intra-only prediction mode.
- Frames encoded with unidirectional compound prediction mode may contain mode info blocks encoded with intra-only or single reference prediction mode.
- Frames encoded with bidirectional compound prediction mode may contain mode info blocks encoded with intra-only, single reference, or unidirectional compound prediction mode.
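The enumerant values of VkVideoEncodeAV1PredictionModeKHR listed above increase with the complexity of the prediction mode. This means the rule above can be expressed as a numeric comparison. The sketch uses a local mirror of the enumeration and a hypothetical block_mode_allowed helper. It only restates the ordering described in this section.

```c
#include <stdbool.h>

/* Local mirror of the VkVideoEncodeAV1PredictionModeKHR values. */
typedef enum {
    MODE_INTRA_ONLY = 0,
    MODE_SINGLE_REFERENCE = 1,
    MODE_UNIDIRECTIONAL_COMPOUND = 2,
    MODE_BIDIRECTIONAL_COMPOUND = 3
} PredictionMode;

/* A mode info block may use the frame's prediction mode or any simpler one,
 * which with these values is a plain "less than or equal" comparison. */
bool block_mode_allowed(PredictionMode frameMode, PredictionMode blockMode) {
    return blockMode <= frameMode;
}
```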
AV1 Coding Blocks
AV1 encode supports two types of coding blocks, as defined in section 2 of the AV1 Specification:
- Superblock.
- Mode info block.
AV1 Encode Profile
A video profile supporting AV1 video encode operations is specified by
setting VkVideoProfileInfoKHR::videoCodecOperation
to
VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR
and adding a
VkVideoEncodeAV1ProfileInfoKHR
structure to the
VkVideoProfileInfoKHR::pNext
chain.
The VkVideoEncodeAV1ProfileInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1ProfileInfoKHR {
VkStructureType sType;
const void* pNext;
StdVideoAV1Profile stdProfile;
} VkVideoEncodeAV1ProfileInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- stdProfile is a StdVideoAV1Profile value specifying the AV1 codec profile, as defined in section A.2 of the AV1 Specification.
AV1 Encode Capabilities
When calling vkGetPhysicalDeviceVideoCapabilitiesKHR to query the
capabilities for an AV1 encode profile, the
VkVideoCapabilitiesKHR::pNext
chain must include a
VkVideoEncodeAV1CapabilitiesKHR
structure that will be filled with the
profile-specific capabilities.
The VkVideoEncodeAV1CapabilitiesKHR
structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1CapabilitiesKHR {
VkStructureType sType;
void* pNext;
VkVideoEncodeAV1CapabilityFlagsKHR flags;
StdVideoAV1Level maxLevel;
VkExtent2D codedPictureAlignment;
VkExtent2D maxTiles;
VkExtent2D minTileSize;
VkExtent2D maxTileSize;
VkVideoEncodeAV1SuperblockSizeFlagsKHR superblockSizes;
uint32_t maxSingleReferenceCount;
uint32_t singleReferenceNameMask;
uint32_t maxUnidirectionalCompoundReferenceCount;
uint32_t maxUnidirectionalCompoundGroup1ReferenceCount;
uint32_t unidirectionalCompoundReferenceNameMask;
uint32_t maxBidirectionalCompoundReferenceCount;
uint32_t maxBidirectionalCompoundGroup1ReferenceCount;
uint32_t maxBidirectionalCompoundGroup2ReferenceCount;
uint32_t bidirectionalCompoundReferenceNameMask;
uint32_t maxTemporalLayerCount;
uint32_t maxSpatialLayerCount;
uint32_t maxOperatingPoints;
uint32_t minQIndex;
uint32_t maxQIndex;
VkBool32 prefersGopRemainingFrames;
VkBool32 requiresGopRemainingFrames;
VkVideoEncodeAV1StdFlagsKHR stdSyntaxFlags;
} VkVideoEncodeAV1CapabilitiesKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoEncodeAV1CapabilityFlagBitsKHR indicating supported AV1 encoding capabilities.
- maxLevel is a StdVideoAV1Level value indicating the maximum AV1 level supported by the profile, as defined in section A.3 of the AV1 Specification.
- codedPictureAlignment indicates the alignment at which the implementation will code pictures. This capability does not impose any valid usage constraints on the application. However, depending on the codedExtent of the encode input picture resource, this capability may result in a change of the resolution of the encoded picture, as described in more detail below.
- maxTiles indicates the maximum number of AV1 tile columns and rows the implementation supports.
- minTileSize indicates the minimum extent of individual AV1 tiles the implementation supports.
- maxTileSize indicates the maximum extent of individual AV1 tiles the implementation supports.
- superblockSizes is a bitmask of VkVideoEncodeAV1SuperblockSizeFlagBitsKHR values indicating the supported AV1 superblock sizes.
- maxSingleReferenceCount indicates the maximum number of reference pictures the implementation supports when using single reference prediction mode.
- singleReferenceNameMask is a bitmask of supported AV1 reference names when using single reference prediction mode.
- maxUnidirectionalCompoundReferenceCount indicates the maximum number of reference pictures the implementation supports when using unidirectional compound prediction mode.
- maxUnidirectionalCompoundGroup1ReferenceCount indicates the maximum number of reference pictures the implementation supports when using unidirectional compound prediction mode from reference frame group 1, as defined in section 6.10.24 of the AV1 Specification.
- unidirectionalCompoundReferenceNameMask is a bitmask of supported AV1 reference names when using unidirectional compound prediction mode.
- maxBidirectionalCompoundReferenceCount indicates the maximum number of reference pictures the implementation supports when using bidirectional compound prediction mode.
- maxBidirectionalCompoundGroup1ReferenceCount indicates the maximum number of reference pictures the implementation supports when using bidirectional compound prediction mode from reference frame group 1, as defined in section 6.10.24 of the AV1 Specification.
- maxBidirectionalCompoundGroup2ReferenceCount indicates the maximum number of reference pictures the implementation supports when using bidirectional compound prediction mode from reference frame group 2, as defined in section 6.10.24 of the AV1 Specification.
- bidirectionalCompoundReferenceNameMask is a bitmask of supported AV1 reference names when using bidirectional compound prediction mode.
- maxTemporalLayerCount indicates the maximum number of AV1 temporal layers supported by the implementation.
- maxSpatialLayerCount indicates the maximum number of AV1 spatial layers supported by the implementation.
- maxOperatingPoints indicates the maximum number of AV1 operating points supported by the implementation.
- minQIndex indicates the minimum quantizer index value supported.
- maxQIndex indicates the maximum quantizer index value supported.
- prefersGopRemainingFrames indicates that the implementation’s rate control algorithm prefers the application to specify the number of frames in each AV1 rate control group remaining in the current group of pictures when beginning a video coding scope.
- requiresGopRemainingFrames indicates that the implementation’s rate control algorithm requires the application to specify the number of frames in each AV1 rate control group remaining in the current group of pictures when beginning a video coding scope.
- stdSyntaxFlags is a bitmask of VkVideoEncodeAV1StdFlagBitsKHR indicating capabilities related to AV1 syntax elements.
singleReferenceNameMask, unidirectionalCompoundReferenceNameMask, and bidirectionalCompoundReferenceNameMask are encoded such that when bit index i is set, it indicates support for the AV1 reference name STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME + i.
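As a non-normative illustration of this bit encoding, the C sketch below tests whether a given reference name is present in one of these masks. To stay self-contained it does not include the Vulkan Video Std headers. The AV1_REF_* constants are local stand-ins assumed to mirror the StdVideoAV1ReferenceName values, with STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME equal to 1. The name av1_mask_supports_ref_name is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins assumed to mirror StdVideoAV1ReferenceName; INTRA_FRAME (0)
 * is not a reference name and therefore has no bit in the masks. */
enum {
    AV1_REF_LAST_FRAME    = 1,
    AV1_REF_LAST2_FRAME   = 2,
    AV1_REF_LAST3_FRAME   = 3,
    AV1_REF_GOLDEN_FRAME  = 4,
    AV1_REF_BWDREF_FRAME  = 5,
    AV1_REF_ALTREF2_FRAME = 6,
    AV1_REF_ALTREF_FRAME  = 7
};

/* Hypothetical helper: returns true if bit (refName - LAST_FRAME) is set in a
 * reference name mask such as singleReferenceNameMask. */
static bool av1_mask_supports_ref_name(uint32_t nameMask, int refName)
{
    if (refName < AV1_REF_LAST_FRAME || refName > AV1_REF_ALTREF_FRAME)
        return false;
    return ((nameMask >> (refName - AV1_REF_LAST_FRAME)) & 1u) != 0u;
}
```

For example, a mask value of 0x48 has bits 3 and 6 set. It therefore indicates support for GOLDEN_FRAME and ALTREF_FRAME only.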
These masks indicate which elements of the
codedPictureAlignment provides information about implementation limitations to encode arbitrary resolutions. In particular, some implementations may not be able to generate bitstreams aligned to the requirements of the AV1 Specification (8x8). In such cases, the implementation may override the width and height of the bitstream in order to produce a bitstream compliant with the AV1 Specification. If such an override occurs, the encoded resolution of the coded picture is enlarged. The texel values used for texel coordinates outside of the bounds of the codedExtent of the encode input picture resource are first governed by the rules regarding the encode input picture granularity. Any texel values outside of the region described by the encode input picture granularity are implementation-defined. Implementations should use well-defined values to minimize impact on the produced encoded content.
This capability does not impose additional application requirements. However, these overrides change the effective resolution of the bitstream and add padding pixels. Applications sensitive to such overrides can use this capability and the corresponding override behavior to compute the cropping needed to reproduce the original input of the encoding. They can then transmit that cropping in a side channel (e.g. by using cropping fields available in a container). Additionally, applications can explicitly consider this alignment in their coded extent to avoid implementation-defined texel values being included in the encoded content.
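The cropping computation mentioned above can be sketched in C as follows. This sketch assumes, as an illustration only, that an override rounds each dimension of the coded extent up to the next multiple of the corresponding codedPictureAlignment component. It also assumes that the padding is added at the right and bottom edges. Extent2D stands in for VkExtent2D, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Local stand-in for VkExtent2D so the sketch compiles without Vulkan headers. */
typedef struct { uint32_t width, height; } Extent2D;

/* Rounds value up to the next multiple of align (align must be non-zero). */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    return ((value + align - 1u) / align) * align;
}

/* Hypothetical helper: assuming the override rounds each dimension up to a
 * multiple of codedPictureAlignment, returns the enlarged coded extent and
 * writes the right/bottom cropping needed to recover the original picture. */
static Extent2D av1_aligned_extent(Extent2D coded, Extent2D alignment,
                                   uint32_t *cropRight, uint32_t *cropBottom)
{
    Extent2D aligned = { align_up(coded.width, alignment.width),
                         align_up(coded.height, alignment.height) };
    *cropRight  = aligned.width  - coded.width;
    *cropBottom = aligned.height - coded.height;
    return aligned;
}
```

With an assumed alignment of 64x16, a 1920x1080 input would become 1920x1088, with 8 rows to crop at the bottom.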
Bits which may be set in VkVideoEncodeAV1CapabilitiesKHR::flags, indicating the AV1 encoding capabilities supported, are:
// Provided by VK_KHR_video_encode_av1
typedef enum VkVideoEncodeAV1CapabilityFlagBitsKHR {
VK_VIDEO_ENCODE_AV1_CAPABILITY_PER_RATE_CONTROL_GROUP_MIN_MAX_Q_INDEX_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_AV1_CAPABILITY_GENERATE_OBU_EXTENSION_HEADER_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_AV1_CAPABILITY_PRIMARY_REFERENCE_CDF_ONLY_BIT_KHR = 0x00000004,
VK_VIDEO_ENCODE_AV1_CAPABILITY_FRAME_SIZE_OVERRIDE_BIT_KHR = 0x00000008,
VK_VIDEO_ENCODE_AV1_CAPABILITY_MOTION_VECTOR_SCALING_BIT_KHR = 0x00000010,
} VkVideoEncodeAV1CapabilityFlagBitsKHR;
-
VK_VIDEO_ENCODE_AV1_CAPABILITY_PER_RATE_CONTROL_GROUP_MIN_MAX_Q_INDEX_BIT_KHR
indicates support for specifying different quantizer index values in the members of VkVideoEncodeAV1QIndexKHR. -
VK_VIDEO_ENCODE_AV1_CAPABILITY_GENERATE_OBU_EXTENSION_HEADER_BIT_KHR
indicates support for generating OBU extension headers, as defined in section 5.3.3 of the AV1 Specification. -
VK_VIDEO_ENCODE_AV1_CAPABILITY_PRIMARY_REFERENCE_CDF_ONLY_BIT_KHR
indicates support for using the primary reference frame indicated by the value of StdVideoEncodeAV1PictureInfo::primary_ref_frame in the AV1 picture information only for CDF data reference, as defined in section 6.8.2 of the AV1 Specification. -
VK_VIDEO_ENCODE_AV1_CAPABILITY_FRAME_SIZE_OVERRIDE_BIT_KHR
indicates support for encoding a picture with a frame size different from the maximum frame size defined in the active AV1 sequence header. If this capability is not supported, then frame_size_override_flag must not be set in the AV1 picture information of the encoded frame. In that case, the coded extent of the encode input picture must also match the maximum coded extent allowed by the active AV1 sequence header, i.e. (max_frame_width_minus_1 + 1, max_frame_height_minus_1 + 1). -
VK_VIDEO_ENCODE_AV1_CAPABILITY_MOTION_VECTOR_SCALING_BIT_KHR
indicates support for motion vector scaling, as defined in section 7.11.3.3 of the AV1 Specification. If this capability is not supported, then the coded extent of all active reference pictures must match the coded extent of the encode input picture. This capability may only be supported by a video profile when VK_VIDEO_ENCODE_AV1_CAPABILITY_FRAME_SIZE_OVERRIDE_BIT_KHR is also supported.
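A minimal C sketch of the two extent rules above might look like the following. The bit values are copied from VkVideoEncodeAV1CapabilityFlagBitsKHR, and the helper names are hypothetical. The sketch also assumes the AV1 Specification's requirement that a frame is never larger than the sequence header maximum. It does not check AV1's own limits on reference scaling ratios.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from VkVideoEncodeAV1CapabilityFlagBitsKHR. */
#define AV1_CAP_FRAME_SIZE_OVERRIDE   0x00000008u
#define AV1_CAP_MOTION_VECTOR_SCALING 0x00000010u

/* Hypothetical helper: can a frame with this coded extent be encoded, given the
 * active sequence header's max_frame_width_minus_1 / max_frame_height_minus_1? */
static bool av1_frame_extent_allowed(uint32_t capFlags, uint32_t width, uint32_t height,
                                     uint32_t maxWidthMinus1, uint32_t maxHeightMinus1)
{
    uint32_t maxWidth = maxWidthMinus1 + 1u;
    uint32_t maxHeight = maxHeightMinus1 + 1u;

    if (width > maxWidth || height > maxHeight)
        return false; /* never larger than the sequence header maximum */
    if (capFlags & AV1_CAP_FRAME_SIZE_OVERRIDE)
        return true;  /* smaller sizes can be signaled via frame_size_override_flag */
    return width == maxWidth && height == maxHeight;
}

/* Hypothetical helper: without motion vector scaling, every active reference
 * picture must have the same coded extent as the encode input picture. */
static bool av1_reference_extent_allowed(uint32_t capFlags,
                                         uint32_t refWidth, uint32_t refHeight,
                                         uint32_t width, uint32_t height)
{
    if (capFlags & AV1_CAP_MOTION_VECTOR_SCALING)
        return true; /* AV1 scaling-ratio limits are not checked here */
    return refWidth == width && refHeight == height;
}
```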
// Provided by VK_KHR_video_encode_av1
typedef VkFlags VkVideoEncodeAV1CapabilityFlagsKHR;
VkVideoEncodeAV1CapabilityFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeAV1CapabilityFlagBitsKHR.
Bits which may be set in VkVideoEncodeAV1CapabilitiesKHR::stdSyntaxFlags, indicating the capabilities related to the AV1 syntax elements, are:
// Provided by VK_KHR_video_encode_av1
typedef enum VkVideoEncodeAV1StdFlagBitsKHR {
VK_VIDEO_ENCODE_AV1_STD_UNIFORM_TILE_SPACING_FLAG_SET_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_AV1_STD_SKIP_MODE_PRESENT_UNSET_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_AV1_STD_PRIMARY_REF_FRAME_BIT_KHR = 0x00000004,
VK_VIDEO_ENCODE_AV1_STD_DELTA_Q_BIT_KHR = 0x00000008,
} VkVideoEncodeAV1StdFlagBitsKHR;
-
VK_VIDEO_ENCODE_AV1_STD_UNIFORM_TILE_SPACING_FLAG_SET_BIT_KHR
indicates whether the implementation supports using the application-provided value for StdVideoAV1TileInfoFlags::uniform_tile_spacing_flag in the AV1 tile parameters when that value is 1. This applies regardless of the coded extent of the encode input picture and the number of tile columns and rows requested in the TileCols and TileRows members of StdVideoAV1TileInfo. -
VK_VIDEO_ENCODE_AV1_STD_SKIP_MODE_PRESENT_UNSET_BIT_KHR
specifies whether the implementation supports using the application-provided value for StdVideoEncodeAV1PictureInfoFlags::skip_mode_present when that value is 0. -
VK_VIDEO_ENCODE_AV1_STD_PRIMARY_REF_FRAME_BIT_KHR
specifies whether the implementation supports using the application-provided value for StdVideoEncodeAV1PictureInfo::primary_ref_frame. -
VK_VIDEO_ENCODE_AV1_STD_DELTA_Q_BIT_KHR
specifies whether the implementation supports using the application-provided values for the DeltaQYDc, DeltaQUDc, DeltaQUAc, DeltaQVDc, and DeltaQVAc members of StdVideoAV1Quantization.
These capability flags provide information to the application about specific AV1 syntax element values that the implementation supports without having to override them and do not otherwise restrict the values that the application can specify for any of the mentioned AV1 syntax elements.
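These bits only describe values that are known to be used without an override. An application might therefore use them to predict whether its requested values could be overridden. The following C sketch does this for the first two bits. The helper name is hypothetical. A true result only means an override is possible, not that the requested values are invalid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from VkVideoEncodeAV1StdFlagBitsKHR. */
#define AV1_STD_UNIFORM_TILE_SPACING_FLAG_SET 0x00000001u
#define AV1_STD_SKIP_MODE_PRESENT_UNSET       0x00000002u

/* Hypothetical helper: returns true if the requested uniform_tile_spacing_flag
 * or skip_mode_present value is one the implementation might override because
 * the corresponding stdSyntaxFlags bit is not set. */
static bool av1_std_values_may_be_overridden(uint32_t stdSyntaxFlags,
                                             bool uniformTileSpacing,
                                             bool skipModePresent)
{
    /* uniform_tile_spacing_flag == 1 is only covered when the bit is set. */
    if (uniformTileSpacing && !(stdSyntaxFlags & AV1_STD_UNIFORM_TILE_SPACING_FLAG_SET))
        return true;
    /* skip_mode_present == 0 is only covered when the bit is set. */
    if (!skipModePresent && !(stdSyntaxFlags & AV1_STD_SKIP_MODE_PRESENT_UNSET))
        return true;
    return false;
}
```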
// Provided by VK_KHR_video_encode_av1
typedef VkFlags VkVideoEncodeAV1StdFlagsKHR;
VkVideoEncodeAV1StdFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeAV1StdFlagBitsKHR.
Bits which may be set in VkVideoEncodeAV1CapabilitiesKHR::superblockSizes, indicating the superblock sizes supported by the implementation, are:
// Provided by VK_KHR_video_encode_av1
typedef enum VkVideoEncodeAV1SuperblockSizeFlagBitsKHR {
VK_VIDEO_ENCODE_AV1_SUPERBLOCK_SIZE_64_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_AV1_SUPERBLOCK_SIZE_128_BIT_KHR = 0x00000002,
} VkVideoEncodeAV1SuperblockSizeFlagBitsKHR;
-
VK_VIDEO_ENCODE_AV1_SUPERBLOCK_SIZE_64_BIT_KHR
specifies that a superblock size of 64x64 is supported. -
VK_VIDEO_ENCODE_AV1_SUPERBLOCK_SIZE_128_BIT_KHR
specifies that a superblock size of 128x128 is supported.
// Provided by VK_KHR_video_encode_av1
typedef VkFlags VkVideoEncodeAV1SuperblockSizeFlagsKHR;
VkVideoEncodeAV1SuperblockSizeFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeAV1SuperblockSizeFlagBitsKHR.
Implementations must support at least one of VkVideoEncodeAV1SuperblockSizeFlagBitsKHR.
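For example, an application could use this capability to choose the value of the sequence header's use_128x128_superblock syntax element, as named in the AV1 Specification. The C sketch below implements one hypothetical policy. It honors the application's preference when that size is supported and otherwise falls back to the other size. This fallback relies on the guarantee that at least one size is supported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from VkVideoEncodeAV1SuperblockSizeFlagBitsKHR. */
#define AV1_SUPERBLOCK_SIZE_64  0x00000001u
#define AV1_SUPERBLOCK_SIZE_128 0x00000002u

/* Hypothetical policy: returns the value to use for use_128x128_superblock,
 * preferring the requested size and otherwise using the supported one. */
static bool av1_choose_use_128x128_superblock(uint32_t superblockSizes, bool prefer128)
{
    if (prefer128)
        return (superblockSizes & AV1_SUPERBLOCK_SIZE_128) != 0u;
    /* Prefer 64x64; only fall back to 128x128 if 64x64 is unsupported. */
    return (superblockSizes & AV1_SUPERBLOCK_SIZE_64) == 0u;
}
```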
AV1 Encode Quality Level Properties
When calling vkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHR with pVideoProfile->videoCodecOperation specified as VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR, the VkVideoEncodeAV1QualityLevelPropertiesKHR structure must be included in the pNext chain of the VkVideoEncodeQualityLevelPropertiesKHR structure to retrieve additional video encode quality level properties specific to AV1 encoding.
The VkVideoEncodeAV1QualityLevelPropertiesKHR structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1QualityLevelPropertiesKHR {
VkStructureType sType;
void* pNext;
VkVideoEncodeAV1RateControlFlagsKHR preferredRateControlFlags;
uint32_t preferredGopFrameCount;
uint32_t preferredKeyFramePeriod;
uint32_t preferredConsecutiveBipredictiveFrameCount;
uint32_t preferredTemporalLayerCount;
VkVideoEncodeAV1QIndexKHR preferredConstantQIndex;
uint32_t preferredMaxSingleReferenceCount;
uint32_t preferredSingleReferenceNameMask;
uint32_t preferredMaxUnidirectionalCompoundReferenceCount;
uint32_t preferredMaxUnidirectionalCompoundGroup1ReferenceCount;
uint32_t preferredUnidirectionalCompoundReferenceNameMask;
uint32_t preferredMaxBidirectionalCompoundReferenceCount;
uint32_t preferredMaxBidirectionalCompoundGroup1ReferenceCount;
uint32_t preferredMaxBidirectionalCompoundGroup2ReferenceCount;
uint32_t preferredBidirectionalCompoundReferenceNameMask;
} VkVideoEncodeAV1QualityLevelPropertiesKHR;
-
sType
is a VkStructureType value identifying this structure. -
pNext
is NULL or a pointer to a structure extending this structure. -
preferredRateControlFlags
is a bitmask of VkVideoEncodeAV1RateControlFlagBitsKHR values indicating the preferred flags to use for VkVideoEncodeAV1RateControlInfoKHR::flags. -
preferredGopFrameCount
indicates the preferred value to use for VkVideoEncodeAV1RateControlInfoKHR::gopFrameCount. -
preferredKeyFramePeriod
indicates the preferred value to use for VkVideoEncodeAV1RateControlInfoKHR::keyFramePeriod. -
preferredConsecutiveBipredictiveFrameCount
indicates the preferred value to use for VkVideoEncodeAV1RateControlInfoKHR::consecutiveBipredictiveFrameCount. -
preferredTemporalLayerCount
indicates the preferred value to use for VkVideoEncodeAV1RateControlInfoKHR::temporalLayerCount. -
preferredConstantQIndex
indicates the preferred value to use for VkVideoEncodeAV1PictureInfoKHR::constantQIndex when using rate control mode VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR. -
preferredMaxSingleReferenceCount
indicates the preferred maximum number of reference pictures to use with single reference prediction mode. -
preferredSingleReferenceNameMask
is a bitmask of preferred AV1 reference names when using single reference prediction mode. -
preferredMaxUnidirectionalCompoundReferenceCount
indicates the preferred maximum number of reference pictures to use with unidirectional compound prediction mode. -
preferredMaxUnidirectionalCompoundGroup1ReferenceCount
indicates the preferred maximum number of reference pictures to use with unidirectional compound prediction mode from reference frame group 1, as defined in section 6.10.24 of the AV1 Specification. -
preferredUnidirectionalCompoundReferenceNameMask
is a bitmask of preferred AV1 reference names when using unidirectional compound prediction mode. -
preferredMaxBidirectionalCompoundReferenceCount
indicates the preferred maximum number of reference pictures to use with bidirectional compound prediction mode. -
preferredMaxBidirectionalCompoundGroup1ReferenceCount
indicates the preferred maximum number of reference pictures to use with bidirectional compound prediction mode from reference frame group 1, as defined in section 6.10.24 of the AV1 Specification. -
preferredMaxBidirectionalCompoundGroup2ReferenceCount
indicates the preferred maximum number of reference pictures to use with bidirectional compound prediction mode from reference frame group 2, as defined in section 6.10.24 of the AV1 Specification. -
preferredBidirectionalCompoundReferenceNameMask
is a bitmask of preferred AV1 reference names when using bidirectional compound prediction mode.
preferredSingleReferenceNameMask, preferredUnidirectionalCompoundReferenceNameMask, and preferredBidirectionalCompoundReferenceNameMask are encoded such that when bit index i is set, it indicates preference for using the AV1 reference name STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME + i.
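One possible way to consume these preferences is to turn a preferred mask into a list of reference names. The list is capped at the matching preferred maximum count, for example preferredSingleReferenceNameMask together with preferredMaxSingleReferenceCount. The C sketch below does this in ascending bit order. The traversal order and the helper name are illustrative assumptions. AV1_REF_LAST_FRAME is a local stand-in assumed to equal STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME (1).

```c
#include <stdint.h>

#define AV1_REFS_PER_FRAME 7 /* VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR */
#define AV1_REF_LAST_FRAME 1 /* assumed value of STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME */

/* Hypothetical helper: collects at most maxCount reference names whose bits are
 * set in nameMask, lowest bit first, and returns how many were written. */
static uint32_t av1_names_from_mask(uint32_t nameMask, uint32_t maxCount,
                                    int names[AV1_REFS_PER_FRAME])
{
    uint32_t count = 0;
    for (int i = 0; i < AV1_REFS_PER_FRAME && count < maxCount; ++i) {
        if (nameMask & (1u << i))
            names[count++] = AV1_REF_LAST_FRAME + i; /* bit i -> LAST_FRAME + i */
    }
    return count;
}
```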
AV1 Encode Session
Additional parameters can be specified when creating a video session with an AV1 encode profile by including an instance of the VkVideoEncodeAV1SessionCreateInfoKHR structure in the pNext chain of VkVideoSessionCreateInfoKHR.
The VkVideoEncodeAV1SessionCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1SessionCreateInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useMaxLevel;
StdVideoAV1Level maxLevel;
} VkVideoEncodeAV1SessionCreateInfoKHR;
-
sType
is a VkStructureType value identifying this structure. -
pNext
is NULL or a pointer to a structure extending this structure. -
useMaxLevel
indicates whether the value of maxLevel should be used by the implementation. When it is set to VK_FALSE, the implementation ignores the value of maxLevel and uses the value of VkVideoEncodeAV1CapabilitiesKHR::maxLevel, as reported by vkGetPhysicalDeviceVideoCapabilitiesKHR for the video profile. -
maxLevel
is a StdVideoAV1Level value specifying the upper bound on the AV1 level for the video bitstreams produced by the created video session.
AV1 Encode Parameter Sets
Video session parameters objects created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR contain a single instance of the following parameter set:
- AV1 Sequence Header
  Represented by StdVideoAV1SequenceHeader structures and interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - the StdVideoAV1ColorConfig structure pointed to by pColorConfig is interpreted as follows:
    - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
    - all other members of StdVideoAV1ColorConfig are interpreted as defined in section 6.4.2 of the AV1 Specification;
  - if flags.timing_info_present_flag is set, then the StdVideoAV1TimingInfo structure pointed to by pTimingInfo is interpreted as follows:
    - flags.reserved is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoAV1TimingInfo are interpreted as defined in section 6.4.3 of the AV1 Specification;
  - all other members of StdVideoAV1SequenceHeader are interpreted as defined in section 6.4 of the AV1 Specification.
When StdVideoAV1SequenceHeader::flags.timing_info_present_flag is set, the AV1 sequence header can be amended with AV1 decoder model information, represented by a StdVideoEncodeAV1DecoderModelInfo structure and interpreted as follows:
- reserved1 is used only for padding purposes and is otherwise ignored;
- all other members of StdVideoEncodeAV1DecoderModelInfo are interpreted as defined in section 6.4.4 of the AV1 Specification.
When StdVideoAV1SequenceHeader::flags.reduced_still_picture_header is not set, the AV1 sequence header can be amended with AV1 operating point information, represented by an array of StdVideoEncodeAV1OperatingPointInfo structures and interpreted as follows:
- flags.reserved is used only for padding purposes and is otherwise ignored;
- all other members of StdVideoEncodeAV1OperatingPointInfo are interpreted as the corresponding element of the respective arrays defined in section 6.4 of the AV1 Specification.
Implementations may override any of these parameters according to the semantics defined in the Video Encode Parameter Overrides section before storing the resulting AV1 sequence header into the video session parameters object. Applications need to use the vkGetEncodedVideoSessionParametersKHR command to determine whether any implementation overrides happened and to retrieve the encoded AV1 sequence header in order to be able to produce a compliant AV1 video bitstream.
The encoded AV1 sequence header retrieved using the vkGetEncodedVideoSessionParametersKHR command is encoded as a single OBU with obu_type equal to OBU_SEQUENCE_HEADER, as defined in section 5.3 of the AV1 Specification.
Such AV1 sequence header overrides may also have cascading effects on the implementation overrides applied to the encoded bitstream produced by video encode operations. If the implementation supports the VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR video encode feedback query flag, then the application can use such queries to retrieve feedback about whether any implementation overrides have been applied to the encoded bitstream.
When a video session parameters object is created with the codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR, the VkVideoSessionParametersCreateInfoKHR::pNext chain must include a VkVideoEncodeAV1SessionParametersCreateInfoKHR structure specifying the contents of the object.
The VkVideoEncodeAV1SessionParametersCreateInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1SessionParametersCreateInfoKHR {
VkStructureType sType;
const void* pNext;
const StdVideoAV1SequenceHeader* pStdSequenceHeader;
const StdVideoEncodeAV1DecoderModelInfo* pStdDecoderModelInfo;
uint32_t stdOperatingPointCount;
const StdVideoEncodeAV1OperatingPointInfo* pStdOperatingPoints;
} VkVideoEncodeAV1SessionParametersCreateInfoKHR;
-
sType
is a VkStructureType value identifying this structure. -
pNext
is NULL or a pointer to a structure extending this structure. -
pStdSequenceHeader
is a pointer to a StdVideoAV1SequenceHeader structure describing parameters of the AV1 sequence header entry to store in the created object. -
pStdDecoderModelInfo
is NULL or a pointer to a StdVideoEncodeAV1DecoderModelInfo structure specifying the AV1 decoder model information to store in the created object. -
stdOperatingPointCount
is the number of elements in the pStdOperatingPoints array. -
pStdOperatingPoints
is NULL or a pointer to an array of stdOperatingPointCount StdVideoEncodeAV1OperatingPointInfo structures specifying the AV1 operating point information to store in the created object. Each element i specifies the parameter values corresponding to element i of the syntax elements defined in section 6.4 of the AV1 Specification.
AV1 Encoding Parameters
The VkVideoEncodeAV1PictureInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1PictureInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEncodeAV1PredictionModeKHR predictionMode;
VkVideoEncodeAV1RateControlGroupKHR rateControlGroup;
uint32_t constantQIndex;
const StdVideoEncodeAV1PictureInfo* pStdPictureInfo;
int32_t referenceNameSlotIndices[VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR];
VkBool32 primaryReferenceCdfOnly;
VkBool32 generateObuExtensionHeader;
} VkVideoEncodeAV1PictureInfoKHR;
-
sType
is a VkStructureType value identifying this structure. -
pNext
is NULL or a pointer to a structure extending this structure. -
predictionMode
specifies the AV1 prediction mode to use for the encoded frame. -
rateControlGroup
specifies the AV1 rate control group to use for the encoded frame when the current rate control mode is not VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR. Otherwise it is ignored. -
constantQIndex
is the quantizer index to use for the encoded frame if the current rate control mode configured for the video session is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR. -
pStdPictureInfo
is a pointer to a StdVideoEncodeAV1PictureInfo structure specifying AV1 picture information. -
referenceNameSlotIndices
is an array of seven (VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR, which is equal to the Video Std definition STD_VIDEO_AV1_REFS_PER_FRAME) signed integer values. Each value specifies either the index of a DPB slot or a negative integer, for one AV1 reference name used for inter coding. In particular, the DPB slot index for the AV1 reference name frame is specified in referenceNameSlotIndices[frame - STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME]. -
primaryReferenceCdfOnly
controls whether the primary reference frame indicated by the value of pStdPictureInfo->primary_ref_frame is used only for CDF data reference, as defined in section 6.8.2 of the AV1 Specification. If set to VK_TRUE, then the primary reference frame’s picture data will not be used for sample prediction. -
generateObuExtensionHeader
controls whether OBU extension headers are generated into the target bitstream, as defined in sections 5.3.1, 5.3.2, and 5.3.3 of the AV1 Specification.
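The indexing rule for referenceNameSlotIndices can be sketched in C as follows. All entries start out negative, meaning unused, and each AV1 reference name is stored at offset frame - STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME. The local constants are stand-ins, with LAST_FRAME assumed to be 1, and the helper names are hypothetical. The mask check is an illustrative assumption: it expects the caller to pass the capability reference name mask that corresponds to predictionMode.

```c
#include <stdbool.h>
#include <stdint.h>

#define AV1_REFS_PER_FRAME 7 /* VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR */
#define AV1_REF_LAST_FRAME 1 /* assumed value of STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME */

/* Marks every reference name as unused (negative slot index). */
static void av1_clear_slot_indices(int32_t slots[AV1_REFS_PER_FRAME])
{
    for (int i = 0; i < AV1_REFS_PER_FRAME; ++i)
        slots[i] = -1;
}

/* Hypothetical helper: associates reference name refName with DPB slot
 * dpbSlot. Returns false, leaving the array unchanged, if refName is not a
 * valid reference name or its bit is not set in nameMask. */
static bool av1_set_slot_index(int32_t slots[AV1_REFS_PER_FRAME], uint32_t nameMask,
                               int refName, int32_t dpbSlot)
{
    int i = refName - AV1_REF_LAST_FRAME;
    if (i < 0 || i >= AV1_REFS_PER_FRAME || !(nameMask & (1u << i)))
        return false;
    slots[i] = dpbSlot;
    return true;
}
```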
This structure is specified in the pNext chain of the VkVideoEncodeInfoKHR structure passed to vkCmdEncodeVideoKHR to specify the codec-specific picture information for an AV1 encode operation.
- Encode Input Picture Information
  When this structure is specified in the pNext chain of the VkVideoEncodeInfoKHR structure passed to vkCmdEncodeVideoKHR, the information related to the encode input picture is defined as follows:
  - The image subregion used is determined according to the AV1 Encode Picture Data Access section.
  - The encode input picture is associated with the AV1 picture information provided in pStdPictureInfo.
- Std Picture Information
  The members of the StdVideoEncodeAV1PictureInfo structure pointed to by pStdPictureInfo are interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - pSegmentation must be NULL. AV1 segmentation is currently not supported in video encode operations, so the application needs to set flags.segmentation_enabled to 0 and pSegmentation to NULL;
  - pTileInfo is NULL or a pointer to a StdVideoAV1TileInfo structure specifying AV1 tile parameters;
  - the StdVideoAV1Quantization structure pointed to by pQuantization is interpreted as follows:
    - flags.reserved is used only for padding purposes and is otherwise ignored;
    - all other members of StdVideoAV1Quantization are interpreted as defined in section 6.8.11 of the AV1 Specification;
  - the StdVideoAV1LoopFilter structure pointed to by pLoopFilter is interpreted as follows:
    - flags.reserved is used only for padding purposes and is otherwise ignored;
    - update_ref_delta is a bitmask where bit index i is interpreted as the value of update_ref_delta corresponding to element i of loop_filter_ref_deltas, as defined in section 6.8.10 of the AV1 Specification;
    - update_mode_delta is a bitmask where bit index i is interpreted as the value of update_mode_delta corresponding to element i of loop_filter_mode_deltas, as defined in section 6.8.10 of the AV1 Specification;
    - all other members of StdVideoAV1LoopFilter are interpreted as defined in section 6.8.10 of the AV1 Specification;
  - if flags.enable_cdef is set in the active sequence header, then the members of the StdVideoAV1CDEF structure pointed to by pCDEF are interpreted as follows:
    - cdef_y_sec_strength and cdef_uv_sec_strength are the bitstream values of the corresponding syntax elements defined in section 5.9.19 of the AV1 Specification;
    - all other members of StdVideoAV1CDEF are interpreted as defined in section 6.10.14 of the AV1 Specification;
  - if flags.UsesLr is set in the active sequence header, then the StdVideoAV1LoopRestoration structure pointed to by pLoopRestoration is interpreted as follows:
    - LoopRestorationSize[plane] is interpreted as log2(size) - 5, where size is the value of LoopRestorationSize[plane] as defined in section 6.10.15 of the AV1 Specification;
    - all other members of StdVideoAV1LoopRestoration are interpreted as defined in section 6.10.15 of the AV1 Specification;
  - the members of the StdVideoAV1GlobalMotion structure provided in global_motion are interpreted as defined in section 7.10 of the AV1 Specification;
  - pExtensionHeader is NULL or a pointer to a StdVideoEncodeAV1ExtensionHeader structure whose temporal_id and spatial_id members specify the temporal and spatial layer ID of the encoded frame, respectively. These IDs are encoded into the OBU extension header if VkVideoEncodeAV1PictureInfoKHR::generateObuExtensionHeader is set to VK_TRUE for the encode operation;
  - if flags.buffer_removal_time_present_flag is set, then pBufferRemovalTimes is a pointer to an array of N unsigned integer values specifying the elements of the buffer_removal_time array, as defined in section 6.8.2 of the AV1 Specification. Here N is the number of operating points specified for the active sequence header through VkVideoEncodeAV1SessionParametersCreateInfoKHR::stdOperatingPointCount;
  - all other members are interpreted as defined in section 6.8 of the AV1 Specification.
  Reference picture setup is controlled by the value of StdVideoEncodeAV1PictureInfo::refresh_frame_flags. If it is not zero and a reconstructed picture is specified, then the latter is used as the target of picture reconstruction to activate the DPB slot specified in pEncodeInfo->pSetupReferenceSlot->slotIndex. If StdVideoEncodeAV1PictureInfo::refresh_frame_flags is zero but a reconstructed picture is specified, then the corresponding picture reference associated with the DPB slot is invalidated, as described in the DPB Slot States section.
- Std Tile Parameters
  Specifying AV1 tile parameters is optional. If StdVideoEncodeAV1PictureInfo::pTileInfo is NULL, then the implementation determines the values of the AV1 tile parameters defined in section 6.8.14 of the AV1 Specification in an implementation-dependent manner. If StdVideoEncodeAV1PictureInfo::pTileInfo is not NULL, then the members of the StdVideoAV1TileInfo structure pointed to by StdVideoEncodeAV1PictureInfo::pTileInfo are interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - TileCols and TileRows specify the number of tile columns and tile rows as defined in section 6.8.14 of the AV1 Specification;
  - tile_size_bytes_minus_1 is ignored, because its value, as defined in section 6.8.14 of the AV1 Specification, is determined as the result of the encoding process;
  - pMiColStarts and pMiRowStarts are ignored. The elements of the MiColStarts and MiRowStarts arrays defined in section 6.8.14 of the AV1 Specification are determined by the implementation, based on tile widths and heights that are either determined by the implementation or specified through the pWidthInSbsMinus1 and pHeightInSbsMinus1 arrays, respectively;
  - pWidthInSbsMinus1 is NULL or a pointer to an array of TileCols unsigned integers that corresponds to width_in_sbs_minus_1 defined in section 6.8.14 of the AV1 Specification;
  - pHeightInSbsMinus1 is NULL or a pointer to an array of TileRows unsigned integers that corresponds to height_in_sbs_minus_1 defined in section 6.8.14 of the AV1 Specification;
  - all other members of StdVideoAV1TileInfo are interpreted as defined in section 6.8.14 of the AV1 Specification.
  If flags.uniform_tile_spacing_flag is set, then pWidthInSbsMinus1 and pHeightInSbsMinus1 are ignored. If flags.uniform_tile_spacing_flag is not set and pWidthInSbsMinus1 is NULL, then the width of individual tile columns is determined in an implementation-dependent manner. If flags.uniform_tile_spacing_flag is not set and pHeightInSbsMinus1 is NULL, then the height of individual tile rows is determined in an implementation-dependent manner.
  In general, implementations are expected to respect the application-specified AV1 tile parameters. However, implementations may have restrictions beyond those specified in the AV1 Specification and this specification (through video profile capabilities). These restrictions may apply to the combination of tile column and row counts, and to tile widths and heights with respect to the extent of the encoded frame. Certain parameter combinations may therefore require the implementation to override them in order to conform to such implementation-specific limitations.
- Active Parameter Sets
  The active sequence header is the AV1 sequence header stored in the bound video session parameters object.
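To illustrate how explicit tile widths relate to pWidthInSbsMinus1, the C sketch below first derives the number of superblock columns using the MiCols and sbCols formulas of the AV1 Specification. It then splits them into TileCols near-equal tile columns. The even split is a hypothetical policy, and the helper names are illustrative. uint16_t is assumed to match the element type of pWidthInSbsMinus1, and AV1's maximum tile width and area limits are not checked. Explicit widths like these are only considered when flags.uniform_tile_spacing_flag is not set.

```c
#include <stdint.h>

/* Superblock columns of a frame: MiCols = 2 * ((width + 7) >> 3), then
 * sbCols = (MiCols + 31) >> 5 for 128x128 or (MiCols + 15) >> 4 for 64x64. */
static uint32_t av1_sb_cols(uint32_t frameWidth, int use128x128Superblock)
{
    uint32_t miCols = 2u * ((frameWidth + 7u) >> 3);
    return use128x128Superblock ? (miCols + 31u) >> 5 : (miCols + 15u) >> 4;
}

/* Hypothetical policy: splits sbCols superblock columns into tileCols tile
 * columns of near-equal width and writes width_in_sbs_minus_1 values.
 * Assumes 1 <= tileCols <= sbCols so that every tile is at least one
 * superblock wide. */
static void av1_split_tile_cols(uint32_t sbCols, uint32_t tileCols,
                                uint16_t widthInSbsMinus1[])
{
    for (uint32_t i = 0; i < tileCols; ++i) {
        uint32_t start = (i * sbCols) / tileCols;
        uint32_t end = ((i + 1u) * sbCols) / tileCols;
        widthInSbsMinus1[i] = (uint16_t)(end - start - 1u);
    }
}
```

For a 1920-texel-wide frame with 64x64 superblocks, this yields 30 superblock columns. Splitting them into 4 tile columns gives widths of 7, 8, 7, and 8 superblocks.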
The VkVideoEncodeAV1DpbSlotInfoKHR structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1DpbSlotInfoKHR {
VkStructureType sType;
const void* pNext;
const StdVideoEncodeAV1ReferenceInfo* pStdReferenceInfo;
} VkVideoEncodeAV1DpbSlotInfoKHR;
-
sType
is a VkStructureType value identifying this structure. -
pNext
is NULL or a pointer to a structure extending this structure. -
pStdReferenceInfo
is a pointer to a StdVideoEncodeAV1ReferenceInfo structure specifying AV1 reference information.
This structure is specified in the pNext chain of VkVideoEncodeInfoKHR::pSetupReferenceSlot, if not NULL, and in the pNext chain of the elements of VkVideoEncodeInfoKHR::pReferenceSlots, to specify the codec-specific reference picture information for an AV1 encode operation.
- Active Reference Picture Information
  When this structure is specified in the pNext chain of the elements of VkVideoEncodeInfoKHR::pReferenceSlots, one element is added to the list of active reference pictures used by the video encode operation for each element of VkVideoEncodeInfoKHR::pReferenceSlots as follows:
  - The image subregion used is determined according to the AV1 Encode Picture Data Access section.
  - The reference picture is associated with the DPB slot index specified in the slotIndex member of the corresponding element of VkVideoEncodeInfoKHR::pReferenceSlots.
  - The reference picture is associated with the AV1 reference information provided in pStdReferenceInfo.
- Reconstructed Picture Information
  When this structure is specified in the pNext chain of VkVideoEncodeInfoKHR::pSetupReferenceSlot, the information related to the reconstructed picture is defined as follows:
  - The image subregion used is determined according to the AV1 Encode Picture Data Access section.
  - If reference picture setup is requested, then the reconstructed picture is used to activate the DPB slot with the index specified in VkVideoEncodeInfoKHR::pSetupReferenceSlot->slotIndex.
  - The reconstructed picture is associated with the AV1 reference information provided in pStdReferenceInfo.
- Std Reference Information
  The members of the StdVideoEncodeAV1ReferenceInfo structure pointed to by pStdReferenceInfo are interpreted as follows:
  - flags.reserved and reserved1 are used only for padding purposes and are otherwise ignored;
  - flags.disable_frame_end_update_cdf is interpreted as defined in section 6.8.2 of the AV1 Specification;
  - flags.segmentation_enabled is interpreted as defined in section 6.8.13 of the AV1 Specification;
  - RefFrameId is interpreted as the element of the RefFrameId array defined in section 6.8.2 of the AV1 Specification corresponding to the reference frame;
  - frame_type is interpreted as defined in section 6.8.2 of the AV1 Specification;
  - OrderHint is interpreted as defined in section 6.8.2 of the AV1 Specification;
  - pExtensionHeader is NULL or a pointer to a StdVideoEncodeAV1ExtensionHeader structure whose temporal_id and spatial_id members specify the temporal and spatial layer ID of the reference frame, respectively.
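The interaction between StdVideoEncodeAV1PictureInfo::refresh_frame_flags and the reconstructed picture, as described for the Std Picture Information and Reconstructed Picture Information above, can be summarized with the C sketch below. The enum and helper names are hypothetical. The hasSetupSlot parameter stands for whether VkVideoEncodeInfoKHR::pSetupReferenceSlot specifies a reconstructed picture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of what happens to the setup DPB slot. */
typedef enum {
    AV1_SETUP_NONE,           /* no reconstructed picture specified */
    AV1_SETUP_ACTIVATE_SLOT,  /* reconstructed picture activates the DPB slot */
    AV1_SETUP_INVALIDATE_SLOT /* picture reference of the DPB slot is invalidated */
} Av1SetupOutcome;

/* Hypothetical helper mirroring the rules above: a non-zero refresh_frame_flags
 * requests reference picture setup, while zero invalidates the slot's picture
 * reference even though a reconstructed picture is specified. */
static Av1SetupOutcome av1_setup_outcome(uint8_t refreshFrameFlags, bool hasSetupSlot)
{
    if (!hasSetupSlot)
        return AV1_SETUP_NONE;
    return refreshFrameFlags != 0u ? AV1_SETUP_ACTIVATE_SLOT : AV1_SETUP_INVALIDATE_SLOT;
}
```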
AV1 Encode Rate Control
Group of Pictures
In case of AV1 encoding it is common practice to follow a regular pattern of frame types and prediction directions in display order when encoding subsequent frames. This pattern is referred to as the group of pictures (GOP).
The AV1 Specification, unlike some other video compression standards, does not restrict the direction in display order of the referenced frames based on the used frame type or AV1 prediction mode. Accordingly, this specification introduces the concept of rate control groups, for which the application can specify separate rate control configuration parameters.
When encoding a frame, the application specifies the rate control group the encoded frame belongs to through a VkVideoEncodeAV1RateControlGroupKHR value in VkVideoEncodeAV1PictureInfoKHR::rateControlGroup. The implementation’s rate control algorithm then uses this value to determine which rate control configuration parameters apply to the frame.
Possible AV1 encode rate control groups are as follows:
// Provided by VK_KHR_video_encode_av1
typedef enum VkVideoEncodeAV1RateControlGroupKHR {
VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_INTRA_KHR = 0,
VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR = 1,
VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR = 2,
} VkVideoEncodeAV1RateControlGroupKHR;
- VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_INTRA_KHR should be specified when encoding AV1 frames that use intra-only prediction (e.g. when encoding AV1 frames of type STD_VIDEO_AV1_FRAME_TYPE_KEY or STD_VIDEO_AV1_FRAME_TYPE_INTRA_ONLY).
- VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR should be specified when encoding AV1 frames that only have forward references in display order.
- VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR should be specified when encoding AV1 frames that have backward references in display order.
While the application can specify any rate control group for any frame, regardless of the frame type, prediction mode, or prediction direction, specifying a rate control group that does not reflect the prediction direction used by the encoded frame may result in unexpected behavior of the implementation’s rate control algorithm.
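The following C sketch shows one way to derive the rate control group from how a frame is predicted. The `RcGroup` enum and `pick_rate_control_group` are hypothetical names used only for this illustration. The enum mirrors the numeric values of VkVideoEncodeAV1RateControlGroupKHR, which lets the sketch compile without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local mirror of VkVideoEncodeAV1RateControlGroupKHR
 * (same numeric values, different names). */
typedef enum {
    RC_GROUP_INTRA = 0,
    RC_GROUP_PREDICTIVE = 1,
    RC_GROUP_BIPREDICTIVE = 2
} RcGroup;

/* intra_only:       the frame uses intra-only prediction
 *                   (e.g. a key frame or an intra-only frame).
 * has_backward_ref: at least one reference frame follows this frame
 *                   in display order. */
static RcGroup pick_rate_control_group(bool intra_only, bool has_backward_ref)
{
    /* An intra-only frame has no references at all. */
    assert(!(intra_only && has_backward_ref));
    if (intra_only)
        return RC_GROUP_INTRA;
    /* A frame that references a later frame in display order is
     * bipredictive; otherwise it only has forward references. */
    return has_backward_ref ? RC_GROUP_BIPREDICTIVE : RC_GROUP_PREDICTIVE;
}
```

In an application, the returned value would presumably be converted to VkVideoEncodeAV1RateControlGroupKHR and stored in VkVideoEncodeAV1PictureInfoKHR::rateControlGroup.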
A regular GOP is defined by the following parameters:
- The number of frames in the GOP;
- The number of consecutive frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR between frames encoded with other rate control groups in display order.
GOPs are further classified as open and closed GOPs.
Frame types in an open GOP follow each other in display order according to the following algorithm:
1. The first frame is always a frame encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_INTRA_KHR.
2. This is followed by a number of consecutive frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR.
3. If the number of frames in the GOP is not reached yet, then the next frame is a frame encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR and the algorithm continues from step 2.
In case of a closed GOP, a frame with the AV1 frame type STD_VIDEO_AV1_FRAME_TYPE_KEY is inserted periodically, at the key frame period.
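The open GOP algorithm above can be sketched in C as follows. The sketch assumes that the GOP length and the bipredictive run length correspond to `gopFrameCount` and `consecutiveBipredictiveFrameCount`. The `open_gop_groups` function and the `RcGroup` enum are hypothetical names; the enum mirrors the values of VkVideoEncodeAV1RateControlGroupKHR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of VkVideoEncodeAV1RateControlGroupKHR. */
typedef enum {
    RC_GROUP_INTRA = 0,
    RC_GROUP_PREDICTIVE = 1,
    RC_GROUP_BIPREDICTIVE = 2
} RcGroup;

/* Writes the rate control group of each frame of an open GOP, in display
 * order, to out[0] .. out[gop_frame_count - 1]. */
static void open_gop_groups(uint32_t gop_frame_count,
                            uint32_t consecutive_b_count,
                            RcGroup *out)
{
    assert(gop_frame_count > 0 && out != NULL);
    uint32_t i = 0;
    out[i++] = RC_GROUP_INTRA;                  /* step 1 */
    while (i < gop_frame_count) {
        /* step 2: a run of bipredictive frames, cut short at the GOP end */
        for (uint32_t b = 0; b < consecutive_b_count && i < gop_frame_count; ++b)
            out[i++] = RC_GROUP_BIPREDICTIVE;
        /* step 3: a predictive frame if the GOP is not complete yet,
         * after which the loop continues from step 2 */
        if (i < gop_frame_count)
            out[i++] = RC_GROUP_PREDICTIVE;
    }
}
```

For example, a GOP of 8 frames with 2 consecutive bipredictive frames yields, in display order: intra, bipredictive, bipredictive, predictive, bipredictive, bipredictive, predictive, bipredictive. The last bipredictive frame has no later non-bipredictive frame within its own GOP. Its backward reference is therefore the first frame of the following GOP, which is what makes the GOP open.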
It is also typical for AV1 encoding to use specific reference picture usage patterns across the frames of the GOP. The two most common reference patterns used are as follows:
Flat Reference Pattern
- Each frame encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR refers to the last frame that was not encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR, in display order, as its forward reference.
- Each frame encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR refers to the last frame that was not encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR, in display order, as its forward reference, and refers to the next frame that was not encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR, in display order, as its backward reference.
Dyadic Reference Pattern
- Each frame encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR refers to the last frame that was not encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR, in display order, as its forward reference.
- The following algorithm is applied to the sequence of consecutive frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR between frames using other rate control groups in display order:
  - The frame in the middle of this sequence uses the frame preceding the sequence as its forward reference, and uses the frame following the sequence as its backward reference.
  - The algorithm is executed recursively for the following frame sequences:
    - The frames of the original sequence preceding the frame in the middle, if any.
    - The frames of the original sequence following the frame in the middle, if any.
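The C sketch below compares the two reference patterns on a run of bipredictive frames. The run occupies display indices `lo` through `hi`, and non-bipredictive frames sit at `lo - 1` and `hi + 1`. The `BiRefs` type and the `flat_refs` and `dyadic_refs` functions are hypothetical. The text above does not say which frame is "in the middle" of an even-length run, so the sketch assumes the lower of the two central frames.

```c
#include <assert.h>
#include <stdint.h>

/* Display-order indices of the two references of a bipredictive frame. */
typedef struct {
    int32_t forward;   /* preceding reference in display order */
    int32_t backward;  /* following reference in display order */
} BiRefs;

/* Flat pattern: every frame of the run [lo, hi] references the
 * non-bipredictive frames immediately before and after the whole run. */
static void flat_refs(int32_t lo, int32_t hi, BiRefs *refs)
{
    assert(lo >= 1);  /* a preceding frame must exist */
    for (int32_t i = lo; i <= hi; ++i) {
        refs[i].forward = lo - 1;
        refs[i].backward = hi + 1;
    }
}

/* Dyadic pattern: the middle frame of [lo, hi] references the frames
 * surrounding the run, then the rule is applied recursively to the
 * frames before and after the middle frame. */
static void dyadic_refs(int32_t lo, int32_t hi, BiRefs *refs)
{
    assert(lo >= 1);
    if (lo > hi)
        return;                          /* empty sub-sequence */
    int32_t mid = lo + (hi - lo) / 2;    /* lower middle for even lengths */
    refs[mid].forward = lo - 1;
    refs[mid].backward = hi + 1;
    dyadic_refs(lo, mid - 1, refs);
    dyadic_refs(mid + 1, hi, refs);
}
```

Consider a run of three bipredictive frames at display indices 1 to 3, between frames 0 and 4. With `dyadic_refs`, frame 2 references frames 0 and 4, frame 1 references frames 0 and 2, and frame 3 references frames 2 and 4. With `flat_refs`, all three frames reference frames 0 and 4.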
The application can provide guidance to the implementation’s rate control algorithm about the structure of the GOP used by the application. Such guidance does not require the application to use that specific GOP structure, as the frame type and the selected rate control group are still application-controlled. However, any deviation from the provided guidance may result in undesired rate control behavior, including, but not limited to, the implementation not being able to conform to the expected average or target bitrates, or other rate control parameters specified by the application.
When an AV1 encode session is used to encode multiple temporal layers, it is also common practice to follow a regular pattern for the AV1 temporal ID for the encoded frames in display order when encoding subsequent frames. This pattern is referred to as the temporal GOP. The most common temporal layer pattern used is as follows:
Dyadic Temporal Layer Pattern
- The number of frames in the temporal GOP is $2^{n-1}$, where n is the number of temporal layers.
- The i-th frame in the temporal GOP (with i counting from zero) uses temporal ID t if and only if the index of the least significant bit set in i equals $n-t-1$, except for the first frame, which is the only frame in the temporal GOP using temporal ID zero.
- The i-th frame in the temporal GOP uses the r-th frame as reference, where r is calculated from i by clearing the least significant bit set in it, except for the first frame in the temporal GOP, which uses the first frame of the previous temporal GOP, if any, as reference.
Multi-layer rate control and multi-layer coding are typically used for streaming cases where low latency is expected, hence frames usually do not use backward references in display order.
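The dyadic temporal layer pattern reduces to bit manipulation on the frame index. The following C sketch computes the temporal ID and the in-GOP reference of the i-th frame (counting from zero) for n temporal layers. All function names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the least significant set bit; x must be non-zero. */
static uint32_t lsb_index(uint32_t x)
{
    assert(x != 0);
    uint32_t idx = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++idx;
    }
    return idx;
}

/* Temporal ID of the i-th frame of a temporal GOP with n layers,
 * which contains 2^(n-1) frames. */
static uint32_t dyadic_temporal_id(uint32_t i, uint32_t n)
{
    assert(n >= 1 && n <= 32 && i < (1u << (n - 1)));
    if (i == 0)
        return 0;                  /* only the first frame uses temporal ID 0 */
    return n - 1 - lsb_index(i);   /* solves lsb_index(i) == n - t - 1 for t */
}

/* In-GOP reference of the i-th frame, or -1 for the first frame, which
 * references the first frame of the previous temporal GOP (if any). */
static int32_t dyadic_temporal_ref(uint32_t i)
{
    if (i == 0)
        return -1;
    return (int32_t)(i & (i - 1)); /* clear the least significant set bit */
}
```

With n = 3, the temporal GOP has 4 frames whose temporal IDs are 0, 2, 1, 2. Frames 1 and 2 reference frame 0, and frame 3 references frame 2.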
The VkVideoEncodeAV1RateControlInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1RateControlInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEncodeAV1RateControlFlagsKHR flags;
uint32_t gopFrameCount;
uint32_t keyFramePeriod;
uint32_t consecutiveBipredictiveFrameCount;
uint32_t temporalLayerCount;
} VkVideoEncodeAV1RateControlInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkVideoEncodeAV1RateControlFlagBitsKHR specifying AV1 rate control flags.
- gopFrameCount is the number of frames within a group of pictures (GOP) intended to be used by the application. If it is set to 0, the rate control algorithm may assume an implementation-dependent GOP length. If it is set to UINT32_MAX, the GOP length is treated as infinite.
- keyFramePeriod is the interval, in terms of number of frames, between two frames with the AV1 frame type STD_VIDEO_AV1_FRAME_TYPE_KEY (see key frame period). If it is set to 0, the rate control algorithm may assume an implementation-dependent key frame period. If it is set to UINT32_MAX, the key frame period is treated as infinite.
- consecutiveBipredictiveFrameCount is the number of consecutive frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR between frames encoded with other rate control groups within the GOP.
- temporalLayerCount specifies the number of AV1 temporal layers that the application intends to use.
When an instance of this structure is included in the pNext chain of the VkVideoCodingControlInfoKHR structure passed to the vkCmdControlVideoCodingKHR command, and VkVideoCodingControlInfoKHR::flags includes VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR, the parameters in this structure are used as guidance for the implementation’s rate control algorithm (see Video Coding Control).
Bits which can be set in VkVideoEncodeAV1RateControlInfoKHR::flags, specifying AV1 rate control flags, are:
// Provided by VK_KHR_video_encode_av1
typedef enum VkVideoEncodeAV1RateControlFlagBitsKHR {
VK_VIDEO_ENCODE_AV1_RATE_CONTROL_REGULAR_GOP_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_AV1_RATE_CONTROL_TEMPORAL_LAYER_PATTERN_DYADIC_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_AV1_RATE_CONTROL_REFERENCE_PATTERN_FLAT_BIT_KHR = 0x00000004,
VK_VIDEO_ENCODE_AV1_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR = 0x00000008,
} VkVideoEncodeAV1RateControlFlagBitsKHR;
- VK_VIDEO_ENCODE_AV1_RATE_CONTROL_REGULAR_GOP_BIT_KHR specifies that the application intends to use a regular GOP structure according to the parameters specified in the gopFrameCount, keyFramePeriod, and consecutiveBipredictiveFrameCount members of the VkVideoEncodeAV1RateControlInfoKHR structure.
- VK_VIDEO_ENCODE_AV1_RATE_CONTROL_TEMPORAL_LAYER_PATTERN_DYADIC_BIT_KHR specifies that the application intends to follow a dyadic temporal layer pattern.
- VK_VIDEO_ENCODE_AV1_RATE_CONTROL_REFERENCE_PATTERN_FLAT_BIT_KHR specifies that the application intends to follow a flat reference pattern in the GOP.
- VK_VIDEO_ENCODE_AV1_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR specifies that the application intends to follow a dyadic reference pattern in the GOP.
// Provided by VK_KHR_video_encode_av1
typedef VkFlags VkVideoEncodeAV1RateControlFlagsKHR;
VkVideoEncodeAV1RateControlFlagsKHR is a bitmask type for setting a mask of zero or more VkVideoEncodeAV1RateControlFlagBitsKHR.
Rate Control Layers
The VkVideoEncodeAV1RateControlLayerInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1RateControlLayerInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useMinQIndex;
VkVideoEncodeAV1QIndexKHR minQIndex;
VkBool32 useMaxQIndex;
VkVideoEncodeAV1QIndexKHR maxQIndex;
VkBool32 useMaxFrameSize;
VkVideoEncodeAV1FrameSizeKHR maxFrameSize;
} VkVideoEncodeAV1RateControlLayerInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- useMinQIndex indicates whether the quantizer index values determined by rate control will be clamped to the lower bounds on the quantizer index values specified in minQIndex.
- minQIndex specifies the lower bounds on the quantizer index values, for each rate control group, that the implementation’s rate control algorithm will use when useMinQIndex is set to VK_TRUE.
- useMaxQIndex indicates whether the quantizer index values determined by rate control will be clamped to the upper bounds on the quantizer index values specified in maxQIndex.
- maxQIndex specifies the upper bounds on the quantizer index values, for each rate control group, that the implementation’s rate control algorithm will use when useMaxQIndex is set to VK_TRUE.
- useMaxFrameSize indicates whether the implementation’s rate control algorithm should use the values specified in maxFrameSize as the upper bounds on the encoded frame size for each rate control group.
- maxFrameSize specifies the upper bounds on the encoded frame size, for each rate control group, when useMaxFrameSize is set to VK_TRUE.
When used, the values in minQIndex and maxQIndex guarantee that the effective quantizer index values used by the implementation will respect those lower and upper bounds, respectively. However, limiting the range of quantizer index values that the implementation is able to use also limits the ability of the implementation’s rate control algorithm to comply with other constraints. In particular, the implementation may not be able to comply with the following:
- The average and/or peak bitrate values to be used for the encoded bitstream specified in the averageBitrate and maxBitrate members of the VkVideoEncodeRateControlLayerInfoKHR structure.
- The upper bounds on the encoded frame size, for each rate control group, specified in the maxFrameSize member of VkVideoEncodeAV1RateControlLayerInfoKHR.
In general, applications need to configure rate control parameters appropriately in order to get the desired rate control behavior, as described in the Video Encode Rate Control section.
When an instance of this structure is included in the pNext chain of a VkVideoEncodeRateControlLayerInfoKHR structure specified in one of the elements of the pLayers array member of the VkVideoEncodeRateControlInfoKHR structure passed to the vkCmdControlVideoCodingKHR command, VkVideoCodingControlInfoKHR::flags includes VK_VIDEO_CODING_CONTROL_ENCODE_RATE_CONTROL_BIT_KHR, and the bound video session was created with the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR, it specifies the AV1-specific rate control parameters of the rate control layer corresponding to that element of pLayers.
The VkVideoEncodeAV1QIndexKHR
structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1QIndexKHR {
uint32_t intraQIndex;
uint32_t predictiveQIndex;
uint32_t bipredictiveQIndex;
} VkVideoEncodeAV1QIndexKHR;
- intraQIndex is the quantizer index to be used for frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_INTRA_KHR.
- predictiveQIndex is the quantizer index to be used for frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR.
- bipredictiveQIndex is the quantizer index to be used for frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR.
The VkVideoEncodeAV1FrameSizeKHR
structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1FrameSizeKHR {
uint32_t intraFrameSize;
uint32_t predictiveFrameSize;
uint32_t bipredictiveFrameSize;
} VkVideoEncodeAV1FrameSizeKHR;
- intraFrameSize is the size in bytes to be used for frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_INTRA_KHR.
- predictiveFrameSize is the size in bytes to be used for frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR.
- bipredictiveFrameSize is the size in bytes to be used for frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR.
GOP Remaining Frames
Besides session level rate control configuration, the application can specify the number of frames per frame type remaining in the group of pictures (GOP).
The VkVideoEncodeAV1GopRemainingFrameInfoKHR
structure is defined as:
// Provided by VK_KHR_video_encode_av1
typedef struct VkVideoEncodeAV1GopRemainingFrameInfoKHR {
VkStructureType sType;
const void* pNext;
VkBool32 useGopRemainingFrames;
uint32_t gopRemainingIntra;
uint32_t gopRemainingPredictive;
uint32_t gopRemainingBipredictive;
} VkVideoEncodeAV1GopRemainingFrameInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- useGopRemainingFrames indicates whether the implementation’s rate control algorithm should use the values specified in gopRemainingIntra, gopRemainingPredictive, and gopRemainingBipredictive. If useGopRemainingFrames is VK_FALSE, then the values of gopRemainingIntra, gopRemainingPredictive, and gopRemainingBipredictive are ignored.
- gopRemainingIntra specifies the number of frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_INTRA_KHR the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the next video encode operation.
- gopRemainingPredictive specifies the number of frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_PREDICTIVE_KHR the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the next video encode operation.
- gopRemainingBipredictive specifies the number of frames encoded with VK_VIDEO_ENCODE_AV1_RATE_CONTROL_GROUP_BIPREDICTIVE_KHR the implementation’s rate control algorithm should assume to be remaining in the GOP prior to executing the next video encode operation.
Setting useGopRemainingFrames to VK_TRUE and including this structure in the pNext chain of VkVideoBeginCodingInfoKHR is only mandatory if the VkVideoEncodeAV1CapabilitiesKHR::requiresGopRemainingFrames capability reported for the used video profile is VK_TRUE. However, implementations may use these remaining frame counts, when specified, even when specifying them is not required. In particular, when the application does not use a regular GOP structure, these values may provide additional guidance for the implementation’s rate control algorithm. The VkVideoEncodeAV1CapabilitiesKHR::prefersGopRemainingFrames capability is also used to indicate that the implementation’s rate control algorithm may operate more accurately if the application specifies the remaining frame counts using this structure.
As with other rate control guidance values, if the effective order and number of frames encoded by the application are not in line with the remaining frame counts specified in this structure at any given point, then the behavior of the implementation’s rate control algorithm may deviate from the one expected by the application.
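For a regular open GOP, one way to keep the remaining frame counts up to date is a two-step approach. First, compute the per-group totals at the start of each GOP. Then decrement the matching counter after each encode operation is submitted. The following C sketch shows this approach. The `GopRemaining` type and both functions are hypothetical. Their three counters stand in for gopRemainingIntra, gopRemainingPredictive, and gopRemainingBipredictive. Because the counters are decremented per encoded frame, the sketch does not depend on whether frames are encoded in display order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of VkVideoEncodeAV1RateControlGroupKHR. */
typedef enum {
    RC_GROUP_INTRA = 0,
    RC_GROUP_PREDICTIVE = 1,
    RC_GROUP_BIPREDICTIVE = 2
} RcGroup;

/* Stand-ins for gopRemainingIntra, gopRemainingPredictive and
 * gopRemainingBipredictive. */
typedef struct {
    uint32_t intra;
    uint32_t predictive;
    uint32_t bipredictive;
} GopRemaining;

/* Per-group frame counts of a regular open GOP: one intra frame, followed by
 * runs of consecutive_b_count bipredictive frames, each run followed by a
 * predictive frame unless the GOP ends first. */
static GopRemaining gop_remaining_at_start(uint32_t gop_frame_count,
                                           uint32_t consecutive_b_count)
{
    assert(gop_frame_count > 0);
    GopRemaining r = { 1, 0, 0 };
    uint32_t after_intra = gop_frame_count - 1;
    uint32_t period = consecutive_b_count + 1; /* one run plus one predictive frame */
    r.predictive = after_intra / period;
    r.bipredictive = after_intra - r.predictive;
    return r;
}

/* Decrements the counter of the group used by the frame just submitted. */
static void gop_remaining_consume(GopRemaining *r, RcGroup group)
{
    uint32_t *counter = group == RC_GROUP_INTRA      ? &r->intra
                      : group == RC_GROUP_PREDICTIVE ? &r->predictive
                                                     : &r->bipredictive;
    assert(*counter > 0);
    --*counter;
}
```

For a GOP of 8 frames with 2 consecutive bipredictive frames, the initial counts are 1 intra, 2 predictive, and 5 bipredictive frames.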
AV1 Quantizer Index Delta Maps
Quantization delta maps used with an AV1 encode profile are referred to as quantizer index delta maps and their texels contain integer values representing quantizer index delta values that are applied in the process of determining the quantizer indices of the encoded picture.
Accordingly, AV1 quantizer index delta maps always have single-channel integer formats, as reported in VkVideoFormatPropertiesKHR::format.
When the rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, the quantizer index delta values are added to the constant quantizer index value. In effect, this enables the application to explicitly control the quantizer index values used at the granularity of the quantization map texel size.
For all other rate control modes, the quantizer index delta values can be used to offset the quantizer index values that the rate control algorithm would otherwise produce.
AV1 Encode Quantization
Performing AV1 encode operations involves the process of assigning quantizer index values to individual AV1 mode info blocks. This process depends on the used rate control mode, as well as other encode and rate control parameters, as described below:
- If the configured rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR, then the quantizer index value is initialized by the implementation-specific default rate control algorithm.
  - If the video encode operation is issued with a quantization delta map, the quantizer index delta value corresponding to the mode info block, as fetched from the quantization map, is added to the previously determined quantizer index value. If the fetched quantizer index delta value falls outside the supported quantizer index delta value range reported in the minQIndexDelta and maxQIndexDelta members of VkVideoEncodeAV1QuantizationMapCapabilitiesKHR, then the quantizer index value used for the mode info block becomes undefined.
- If the configured rate control mode is VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, then the quantizer index value is initialized from the constant quantizer index value specified for the encoded frame.
  - If the video encode operation is issued with a quantization delta map, the quantizer index delta value corresponding to the mode info block, as fetched from the quantization map, is added to the previously determined quantizer index value. If the fetched quantizer index delta value falls outside the supported quantizer index delta value range reported in the minQIndexDelta and maxQIndexDelta members of VkVideoEncodeAV1QuantizationMapCapabilitiesKHR, then the quantizer index value used for the mode info block becomes undefined.
- If the configured rate control mode is not VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DEFAULT_KHR or VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR, then the quantizer index value is initialized by the corresponding rate control algorithm.
  - If the video encode operation is issued with a quantization delta map, the quantizer index delta value corresponding to the mode info block, as fetched from the quantization map, is added to the previously determined quantizer index value. If the fetched quantizer index delta value falls outside the supported quantizer index delta value range reported in the minQIndexDelta and maxQIndexDelta members of VkVideoEncodeAV1QuantizationMapCapabilitiesKHR, then the quantizer index value used for the mode info block becomes undefined.
  - If the video encode operation is issued with an emphasis map, the rate control will adjust the quantizer index value based on the emphasis value corresponding to the mode info block, as fetched from the quantization map, according to the following equation:
    $$\mathrm{QIndex}_{\mathrm{new}} = f(\mathrm{QIndex}_{\mathrm{prev}}, e)$$
    where $\mathrm{QIndex}_{\mathrm{new}}$ is the resulting quantizer index value, $\mathrm{QIndex}_{\mathrm{prev}}$ is the previously determined quantizer index value, $e$ is the emphasis value corresponding to the mode info block, and $f$ is an implementation-defined function for which the following implication is true:
    $$e_1 < e_2 \Rightarrow f(\mathrm{QIndex}, e_1) \geq f(\mathrm{QIndex}, e_2)$$
    This means that lower emphasis values will result in higher quantizer index values, whereas higher emphasis values will result in lower quantizer index values. However, the function is not necessarily strictly decreasing with respect to the input emphasis value for a given input quantizer index value.
  - If clamping to minimum quantizer index values is enabled in the applied rate control layer, then the quantizer index value is clamped to the corresponding minimum quantizer index value.
  - If clamping to maximum quantizer index values is enabled in the applied rate control layer, then the quantizer index value is clamped to the corresponding maximum quantizer index value.
In all cases, the final quantizer index value is clamped to the minimum and maximum quantizer index values supported by the video profile.
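The per-block steps above can be summarized in a C sketch. It models only the described order of operations, not any particular implementation. The `RcMode` and `BlockQIndexParams` types are hypothetical. The `initial` argument stands for either the rate control algorithm's choice or the constant quantizer index. The emphasis map adjustment is left out because f is implementation-defined. A return value of `false` marks the case in which the resulting quantizer index would be undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of the configured rate control mode. */
typedef enum { RC_MODE_DEFAULT, RC_MODE_DISABLED, RC_MODE_OTHER } RcMode;

typedef struct {
    RcMode  mode;
    bool    has_delta_map;
    int32_t delta;                    /* delta fetched for this block */
    int32_t min_delta, max_delta;     /* minQIndexDelta / maxQIndexDelta */
    bool    use_min, use_max;         /* useMinQIndex / useMaxQIndex */
    int32_t layer_min, layer_max;     /* bounds for the frame's rate control group */
    int32_t profile_min, profile_max; /* range supported by the video profile */
} BlockQIndexParams;

static bool block_qindex(int32_t initial, const BlockQIndexParams *p, int32_t *out)
{
    assert(p != NULL && out != NULL);
    int32_t q = initial;
    if (p->has_delta_map) {
        if (p->delta < p->min_delta || p->delta > p->max_delta)
            return false;             /* the resulting qindex is undefined */
        q += p->delta;
    }
    if (p->mode == RC_MODE_OTHER) {
        /* An emphasis map would adjust q here (implementation-defined). */
        if (p->use_min && q < p->layer_min)
            q = p->layer_min;
        if (p->use_max && q > p->layer_max)
            q = p->layer_max;
    }
    /* In all cases, clamp to the range supported by the video profile. */
    if (q < p->profile_min)
        q = p->profile_min;
    if (q > p->profile_max)
        q = p->profile_max;
    *out = q;
    return true;
}
```

The per-layer clamping is applied only for rate control modes other than the default and disabled modes, which matches the structure of the list above.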
AV1 Encode Requirements
This section describes the required AV1 encoding capabilities for physical devices that have at least one queue family that supports the video codec operation VK_VIDEO_CODEC_OPERATION_ENCODE_AV1_BIT_KHR, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 in VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.
| Video Std Header Name | Version |
|---|---|
| | 1.0.0 |

| Video Capability | Requirement | Requirement Type [1] |
|---|---|---|
| | - | min |
| | 4096 | max |
| | 4096 | max |
| | (64,64) | max |
| | - | max |
| | - | min |
| | 0 | min |
| | 0 | min |
| | - | min |
| | - | min |
| | 5529600 | min |
| | 1 | min |
| | (64,64) | max |
| | | min |
| | - | min |
| | (8,8) | min |
| | 1 | min |
| | - | max |
| | - | min |
| | | min |
| | at least one bit set | implementation-dependent |
| | 0 | min |
| | - [2] | min |
| | 0 [3] | min |
| | 0 [3,4] | min |
| | - [2] | min |
| | 0 [3] | min |
| | 0 [3,5] | min |
| | 0 [3,5] | min |
| | - [2] | min |
| | 1 | min |
| | 1 | min |
| | 0 | min |
| | - | max |
| | - | min |
| | - | implementation-dependent |
| | - | implementation-dependent |
| | - | min |
| | - [6] | min |
| | - [7] | max |
| | - [7] | min |
[1] The Requirement Type column specifies whether the requirement is the minimum value all implementations must support, the maximum value all implementations must support, or the exact value all implementations must support. For bitmasks, a minimum value is the least bits all implementations must set, but they may have additional bits set beyond this minimum.
[2] These masks must only have bits set in the least significant VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR bits (bit index i indicates support for the AV1 reference name STD_VIDEO_AV1_REFERENCE_NAME_LAST_FRAME + i when using the corresponding AV1 prediction mode), and must have at least as many bits set in any *ReferenceNameMask capability as the value of the corresponding max*ReferenceCount capability.
[3] If greater than zero, it must be at least 2.
[4] maxUnidirectionalCompoundGroup1ReferenceCount must be less than or equal to maxUnidirectionalCompoundReferenceCount.
[5] The sum of maxBidirectionalCompoundGroup1ReferenceCount and maxBidirectionalCompoundGroup2ReferenceCount must be greater than or equal to maxBidirectionalCompoundReferenceCount.
[6] If VkVideoEncodeCapabilitiesKHR::flags includes VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR or VK_VIDEO_ENCODE_CAPABILITY_EMPHASIS_MAP_BIT_KHR, then the width and height members of maxQuantizationMapExtent must be greater than zero.
[7] If VkVideoEncodeCapabilitiesKHR::flags includes VK_VIDEO_ENCODE_CAPABILITY_QUANTIZATION_DELTA_MAP_BIT_KHR, then maxQIndexDelta must be greater than minQIndexDelta.
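The relationships required by footnotes 2 through 5 and 7 can be expressed as simple predicates, as in the following C sketch. All function names in it are hypothetical. `MAX_AV1_REFS_PER_FRAME` is assumed to equal VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR, which the Vulkan headers define as 7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to equal VK_MAX_VIDEO_AV1_REFERENCES_PER_FRAME_KHR (7). */
#define MAX_AV1_REFS_PER_FRAME 7u
static_assert(MAX_AV1_REFS_PER_FRAME < 32u, "mask bits must fit in uint32_t");

static uint32_t popcount32(uint32_t x)
{
    uint32_t n = 0;
    for (; x != 0; x &= x - 1)  /* clear the lowest set bit each step */
        ++n;
    return n;
}

/* Footnote 2: only the low MAX_AV1_REFS_PER_FRAME bits may be set, and at
 * least max_reference_count bits must be set. */
static bool check_reference_name_mask(uint32_t mask, uint32_t max_reference_count)
{
    if ((mask >> MAX_AV1_REFS_PER_FRAME) != 0)
        return false;
    return popcount32(mask) >= max_reference_count;
}

/* Footnote 3: a non-zero count must be at least 2. */
static bool check_compound_count(uint32_t count)
{
    return count == 0 || count >= 2;
}

/* Footnote 4 */
static bool check_unidirectional_group1(uint32_t group1_count, uint32_t total_count)
{
    return group1_count <= total_count;
}

/* Footnote 5: the sum is computed in 64 bits to avoid overflow. */
static bool check_bidirectional_groups(uint32_t group1_count, uint32_t group2_count,
                                       uint32_t total_count)
{
    return (uint64_t)group1_count + group2_count >= total_count;
}

/* Footnote 7 */
static bool check_qindex_delta_range(bool delta_map_supported,
                                     int32_t min_delta, int32_t max_delta)
{
    return !delta_map_supported || max_delta > min_delta;
}
```

Each predicate checks one footnote in isolation. Applying them to real capability structures would additionally require knowing which capability rows each footnote annotates.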