VK_KHR_video_decode_queue
- 1. Problem Statement
- 2. Solution Space
- 3. Proposal
- 4. Examples
- 4.1. Select queue family with video decode support for a given video codec operation
- 4.2. Check support and query the capabilities for a video decode profile
- 4.3. Select decode output and DPB formats supported by the video decode profile
- 4.4. Create bitstream buffer
- 4.5. Create decode output image and image view
- 4.6. Create DPB image and image view
- 4.7. Record decode operation (video session without DPB slots)
- 4.8. Record decode operation with reconstructed picture information (DISTINCT mode)
- 4.9. Record decode operation with reconstructed picture information (COINCIDE mode)
- 4.10. Record decode operation with reference picture list
- 5. Issues
- 5.1. RESOLVED: Why is there no VK_PIPELINE_STAGE_VIDEO_DECODE_BIT_KHR?
- 5.2. RESOLVED: Are the decode output picture data and reconstructed picture data used to activate a DPB slot written to the same resource?
- 5.3. RESOLVED: How can layered codec-specific decode extensions enable applications to provide the necessary codec-specific picture information, parameter sets, etc. that may be needed to perform the video coding operations?
- 5.4. RESOLVED: Can vkCmdVideoDecodeKHR only decode frames? What about field decoding, slice decoding, etc.?
- 5.5. RESOLVED: What is the effect of the flags provided in VkVideoDecodeUsageInfoKHR::videoUsageHints?
- 5.6. RESOLVED: When is it mandatory to specify reconstructed picture information in VkVideoDecodeInfoKHR::pSetupReferenceSlot?
- 5.7. RESOLVED: Does always requiring the specification of reconstructed picture information in VkVideoDecodeInfoKHR::pSetupReferenceSlot have any undesired performance consequences?
- 6. Further Functionality
This document outlines a proposal to enable performing video decode operations in Vulkan.
1. Problem Statement
Integrating video decode operations into Vulkan applications enables a wide set of new usage scenarios including, but not limited to, the following examples:
- Applying post-processing on top of video frames decoded from a compressed video stream
- Sourcing dynamic texture data from compressed video streams
It is also not uncommon for Vulkan capable devices to feature dedicated hardware acceleration for video decompression.
The goal of this proposal is to enable these use cases, expose the underlying hardware capabilities, and provide tight integration with other functionalities of the Vulkan API.
2. Solution Space
The following options have been considered:
1. Rely on external sharing capabilities to interact with existing video decode APIs
2. Add new dedicated APIs to Vulkan specific to video decoding
3. Build upon a common set of APIs that enable video coding operations in general
As discussed in the proposal for the VK_KHR_video_queue extension, reusing a common, shared infrastructure across all video coding functionalities that leverages existing Vulkan capabilities was preferred, hence this extension follows option 3.
A further question was whether a common set of APIs could be used to enable video decoding in general, upon which codec-specific extensions can be built. As API reuse is just as feasible within the domain of video decoding as it is for video coding in general, this proposal follows the same principle and extends VK_KHR_video_queue with codec-independent video decoding capabilities.
3. Proposal
3.1. Video Decode Queues
While VK_KHR_video_queue
already includes support for a more fine grained query to determine the set of supported video codec operations for a given queue family, this extension introduces an explicit queue flag called VK_QUEUE_VIDEO_DECODE_BIT_KHR
to indicate support for video decoding.
Applications can use this flag bit to identify video decode capable queue families in general, if needed, before querying more details about the individual video codec operations supported through the use of the VkQueueFamilyVideoPropertiesKHR
structure. It also indicates support for the set of command buffer commands available on video decode queues, which include the following:
- Pipeline barrier and event handling commands used for synchronization
- Basic query commands to begin, end, and reset queries
- Timestamp write commands
- Generic video coding commands
- The new video decode command introduced by this extension
For the full list of individual commands supported by video decode queues, and whether any command is supported inside/outside of video coding scopes, refer to the manual page of the corresponding command.
3.2. Video Decode Profiles
Video decode profiles are defined using a VkVideoProfileInfoKHR
structure that specifies a videoCodecOperation
value identifying a video decode operation. This extension does not introduce any video decode operation flags, as that is left to the codec-specific decode extensions.
On the other hand, this extension allows the application to specify usage hints specific to video decoding by chaining the following new structure to VkVideoProfileInfoKHR
:
typedef struct VkVideoDecodeUsageInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoDecodeUsageFlagsKHR videoUsageHints;
} VkVideoDecodeUsageInfoKHR;
The hint flags introduced by this extension are as follows:
- VK_VIDEO_DECODE_USAGE_TRANSCODING_BIT_KHR should be used in video transcoding use cases
- VK_VIDEO_DECODE_USAGE_OFFLINE_BIT_KHR should be used when decoding local video content
- VK_VIDEO_DECODE_USAGE_STREAMING_BIT_KHR should be used when decoding video content streamed over a network
These usage hints do not impose any restrictions or provide any guarantees, so any combination of flags can be used, but they allow the application to better communicate the intended use case so that implementations can make appropriate choices based on it.
Logically, however, the usage hint structure is part of the video profile definition, so capabilities may vary across video decode profiles that differ only in their video decode usage hints. The hints also affect video profile compatibility between resources and video sessions, so the same VkVideoDecodeUsageInfoKHR structure has to be included everywhere the specific video decode profile is used.
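To make the compatibility rule concrete, the following is a minimal sketch in C that models a video decode profile as a plain value and compares two of them. `DecodeProfileKey` and `decode_profiles_match` are hypothetical names used only for illustration; they are not part of the Vulkan API, and a real application would have to consider the complete VkVideoProfileInfoKHR structure chain, including any codec-specific structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a video decode profile.
 * The real definition is a VkVideoProfileInfoKHR structure chain. */
typedef struct DecodeProfileKey {
    uint32_t codecOperation;    /* models videoCodecOperation */
    uint32_t chromaSubsampling; /* models chromaSubsampling */
    uint32_t lumaBitDepth;      /* models lumaBitDepth */
    uint32_t chromaBitDepth;    /* models chromaBitDepth */
    uint32_t usageHints;        /* models videoUsageHints */
} DecodeProfileKey;

/* Two profiles are treated as the same profile only if every field,
 * including the usage hints, is identical. */
bool decode_profiles_match(const DecodeProfileKey *a, const DecodeProfileKey *b)
{
    assert(a != NULL && b != NULL);
    return a->codecOperation == b->codecOperation &&
           a->chromaSubsampling == b->chromaSubsampling &&
           a->lumaBitDepth == b->lumaBitDepth &&
           a->chromaBitDepth == b->chromaBitDepth &&
           a->usageHints == b->usageHints;
}
```

In this model, an image created against a profile with one set of hints would not be considered compatible with a video session whose profile differs only in the hints, which is why the same hint structure has to travel with the profile everywhere.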
3.3. New Pipeline Stage and Access Flags
This extension also introduces a new pipeline stage identified by the VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR
flag to enable synchronizing video decode operations with respect to other Vulkan operations.
In addition, two new access flags are introduced to indicate reads and writes, respectively, performed by the video decode pipeline stage:
- VK_ACCESS_2_VIDEO_DECODE_READ_BIT_KHR
- VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR
As these flags no longer fit into the legacy 32-bit enums, this extension requires the VK_KHR_synchronization2 extension and relies on the 64-bit versions of the pipeline stage and access mask flags to handle synchronization specific to video decode operations.
3.4. New Buffer and Image Usage Flags
This extension introduces the following new buffer usage flags:
- VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR allows using the buffer as a video bitstream buffer in video decode operations
- VK_BUFFER_USAGE_VIDEO_DECODE_DST_BIT_KHR is reserved for future use
This extension also introduces the following new image usage flags:
- VK_IMAGE_USAGE_VIDEO_DECODE_SRC_BIT_KHR is reserved for future use
- VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR allows using the image as a decode output picture
- VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR allows using the image as a decode DPB picture (reconstructed/reference picture)
Specifying these usage flags alone is not sufficient to create a buffer or image that is compatible with a video session created against any particular video profile. In fact, when specifying any of these usage flags at resource creation time, the application has to include a VkVideoProfileListInfoKHR
structure in the pNext
chain of the corresponding create info structure with VkVideoProfileListInfoKHR::pProfiles
including a video decode profile. The created resources will be compatible only with that single video decode profile (and any additional video encode profiles that may have been specified in the list).
3.5. New Format Feature Flags
To indicate which formats are compatible with video decode usage, the following new format feature flags are introduced:
- VK_FORMAT_FEATURE_VIDEO_DECODE_OUTPUT_BIT_KHR indicates support for decode output picture usage
- VK_FORMAT_FEATURE_VIDEO_DECODE_DPB_BIT_KHR indicates support for decode DPB picture usage
The presence of the format flags alone, as returned by the various format queries, is not sufficient to indicate that an image with that format is usable with video decoding using any particular video decode profile. Actual compatibility with a specific video decode profile has to be verified using the vkGetPhysicalDeviceVideoFormatPropertiesKHR
command.
3.6. Basic Operation
Video decode operations can be recorded into command buffers allocated from command pools created against queue families that support the VK_QUEUE_VIDEO_DECODE_BIT_KHR
flag.
Recording video decode operations happens through the use of the following new command:
VKAPI_ATTR void VKAPI_CALL vkCmdDecodeVideoKHR(
VkCommandBuffer commandBuffer,
const VkVideoDecodeInfoKHR* pDecodeInfo);
The common, codec-independent parameters of the video decode operation are provided using the following new structure:
typedef struct VkVideoDecodeInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoDecodeFlagsKHR flags;
VkBuffer srcBuffer;
VkDeviceSize srcBufferOffset;
VkDeviceSize srcBufferRange;
VkVideoPictureResourceInfoKHR dstPictureResource;
const VkVideoReferenceSlotInfoKHR* pSetupReferenceSlot;
uint32_t referenceSlotCount;
const VkVideoReferenceSlotInfoKHR* pReferenceSlots;
} VkVideoDecodeInfoKHR;
Executing such a video decode operation results in the decompression of a single picture (unless otherwise defined by layered extensions), and, if there is an active VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR
query, the status of the video decode operation is recorded into the active query slot.
If the decode operation requires additional codec-specific parameters, then such parameters are provided in the pNext
chain of the structure above. Whether such codec-specific information is necessary, and what it may contain is up to the codec-specific extensions.
srcBuffer, srcBufferOffset, and srcBufferRange provide information about the used video bitstream buffer range. The video decode operation reads the compressed picture data from this buffer range.
The application has to create the video bitstream buffer with the new VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR
usage flag, and must also include the used video session’s video profile in the VkVideoProfileListInfoKHR
structure specified at buffer creation time.
The expected contents of the video bitstream buffer range depend on the specific video codec used, as defined by corresponding codec-specific extensions built upon this proposal.
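This section does not state alignment rules for the bitstream buffer range. Assuming the offset and size alignment values reported in VkVideoCapabilitiesKHR by the VK_KHR_video_queue extension (`minBitstreamBufferOffsetAlignment` and `minBitstreamBufferSizeAlignment`) apply to srcBufferOffset and srcBufferRange, an application could compute a suitable range as in the sketch below. `BitstreamRange`, `align_up`, and `place_bitstream` are hypothetical helper names, not part of the Vulkan API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: values intended for srcBufferOffset and
 * srcBufferRange. */
typedef struct BitstreamRange {
    uint64_t offset;
    uint64_t range;
} BitstreamRange;

/* Round value up to the next multiple of alignment (alignment must be non-zero). */
uint64_t align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Choose an aligned offset at or after `cursor`, and an aligned range that
 * covers `dataSize` bytes of compressed picture data. The two alignment
 * parameters are assumed to come from the queried video capabilities. */
BitstreamRange place_bitstream(uint64_t cursor, uint64_t dataSize,
                               uint64_t offsetAlignment, uint64_t sizeAlignment)
{
    BitstreamRange result;
    result.offset = align_up(cursor, offsetAlignment);
    result.range = align_up(dataSize, sizeAlignment);
    return result;
}
```

The buffer itself then has to be large enough to hold `offset + range` bytes; how any padding bytes beyond the actual picture data should be filled is up to the codec-specific extensions.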
The dstPictureResource, pSetupReferenceSlot, and pReferenceSlots members specify the decode output picture, reconstructed picture, and reference pictures, respectively, used by the video decode operation, as discussed in later sections of this proposal.
3.7. Decode Output Picture
dstPictureResource
defines the parameters of the video picture resource to use as the decode output picture. The video decode operation writes the picture data resulting from the decompression of the bitstream data to this video picture resource. As such it is a mandatory parameter of the operation.
The application has to create the image view specified in dstPictureResource.imageViewBinding
with the new VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR
usage flag, and must also include the used video session’s video profile in the VkVideoProfileListInfoKHR
structure specified at image creation time.
The image subresource backing the decode output picture has to be in the new VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR
layout at the time the video decode operation is executed, except if it matches the reconstructed picture, as discussed later, in which case the image subresource has to be in the new VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR
layout.
3.8. Reconstructed Picture
pSetupReferenceSlot
is an optional parameter specifying the video picture resource and DPB slot index to use for the reconstructed picture. Implementations use the reconstructed picture for one of the following purposes:
- When the decoded picture is requested to be set up as a reference, according to the codec-specific semantics, the video decode operation will output the raw decoding results to this picture and activate the reconstructed picture’s DPB slot with it in order to enable using the picture as a reference picture in future video decode operations. Content-wise, this picture is generally identical to the decode output picture unless the decode output picture contains any sort of post-processing (e.g. film grain).
- When the decoded picture is not requested to be set up as a reference, implementations may use the reconstructed picture’s resource and/or DPB slot for intermediate data required by the decoding process.
Accordingly, pSetupReferenceSlot must never be NULL, except when the video session was created without any DPB slots.
The original version of this extension only required the specification of the reconstructed picture information (i.e. a non- |
In summary, for decoded pictures requested to be set up as a reference, this parameter can be used to add new reference pictures to the DPB, and change the association between DPB slot indices and video picture resources. That also implies that the application has to specify a video picture resource in pSetupReferenceSlot→pPictureResource
that was included in the set of bound reference picture resources specified when the video coding scope was started (in one of the elements of VkVideoBeginCodingInfoKHR::pReferenceSlots
). No similar requirement exists for the decode output picture specified by dstPictureResource
which can refer to any video picture resource.
The application has to create the image view specified in pSetupReferenceSlot→pPictureResource→imageViewBinding
with the new VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR
usage flag, and must also include the used video session’s video profile in the VkVideoProfileListInfoKHR
structure specified at image creation time.
The image subresource backing the reconstructed picture has to be in the new VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR
layout at the time the video decode operation is executed.
Implementations diverge in the way they handle reconstructed pictures:
- On some implementations the decode output picture and reconstructed picture have to be distinct video picture resources, even if the picture data written by the video decode operation to the two resources is identical.
- On other implementations the decode output picture and reconstructed picture have to coincide, i.e. both have to refer to the same video picture resource.
- Some other implementations may actually support both modes.
Support for the individual modes is indicated by the VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR
and VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR
capability flags. Implementations are only required to support one of the two modes, hence portable applications have to make sure that they support both options. Dealing with this implementation divergence, however, is fairly simple.
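The choice between the two modes can be sketched as a small pure function. This is only an illustration: `DecodePictureMode` and `select_picture_mode` are hypothetical names, and the two `supports...` inputs are assumed to have been derived by the application from the COINCIDE and DISTINCT capability flags in VkVideoDecodeCapabilitiesKHR::flags.

```c
#include <stdbool.h>

/* Hypothetical application-side enum; not part of the Vulkan API. */
typedef enum DecodePictureMode {
    DECODE_PICTURE_MODE_UNSUPPORTED,
    DECODE_PICTURE_MODE_DISTINCT,
    DECODE_PICTURE_MODE_COINCIDE
} DecodePictureMode;

/* Pick the mode to use for reconstructed pictures.
 * When both modes are available the application preference decides;
 * otherwise the only supported mode is used. */
DecodePictureMode select_picture_mode(bool supportsCoincide,
                                      bool supportsDistinct,
                                      bool preferCoincide)
{
    if (supportsCoincide && supportsDistinct) {
        return preferCoincide ? DECODE_PICTURE_MODE_COINCIDE
                              : DECODE_PICTURE_MODE_DISTINCT;
    }
    if (supportsCoincide) {
        return DECODE_PICTURE_MODE_COINCIDE;
    }
    if (supportsDistinct) {
        return DECODE_PICTURE_MODE_DISTINCT;
    }
    /* Neither flag reported; not expected for a supported decode profile. */
    return DECODE_PICTURE_MODE_UNSUPPORTED;
}
```

A portable application would make this decision once per video profile and then create its images accordingly.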
Generally speaking, the application has to create the following images in order to decode a video stream:
- One or more images usable as reconstructed pictures, some of which will contain the reference pictures associated with the DPB.
- Optionally, an additional image usable as the decode output picture when the reconstructed picture has to be distinct or when using video sessions without any DPB slots.
In practice, applications will typically allocate more than one decode output picture for buffering purposes and/or to minimize synchronization overhead resulting from having to prevent write-after-write hazards across subsequent video decode operations targeting the same decode output picture resource. Some applications may also allocate more resources for reference pictures than the number of DPB slots for similar reasons.
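One simple way to spread consecutive decode operations across several decode output pictures is a round-robin index, sketched below. `OutputPictureRing` and `acquire_output_picture` are hypothetical names chosen for this illustration; the sketch only rotates indices and does not replace the synchronization the application still needs before reusing a picture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a pool of decode output pictures. */
typedef struct OutputPictureRing {
    uint32_t count; /* number of decode output pictures allocated */
    uint32_t next;  /* index handed out by the next acquire call */
} OutputPictureRing;

/* Return the index of the output picture to use for the next decode
 * operation and advance the ring. Consecutive calls return different
 * indices as long as count > 1, so back-to-back decode operations do
 * not write to the same output picture. */
uint32_t acquire_output_picture(OutputPictureRing *ring)
{
    assert(ring != NULL && ring->count > 0);
    uint32_t index = ring->next;
    ring->next = (ring->next + 1) % ring->count;
    return index;
}
```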
The application should always create the image(s) backing the DPB with the VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR usage flag. If only the coincide mode is supported for reconstructed pictures, then the DPB image(s) that may be used as a reconstructed picture in a video decode operation have to also include the VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR usage flag to allow them to be used as coinciding decode output and reconstructed pictures.
The image backing the decode output picture should always be created with the VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR
usage flag.
The DPB image(s) are expected to be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR
layout while in use by video decode operations, while the decode output image is expected to be in the VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR
layout.
Here we have two cases to consider:
1. When a reconstructed picture is specified; and
2. When one is not needed (as the video session was created without any DPB slots)
In case of (2), the application can use the image created for the decode output picture in dstPictureResource, regardless of whether the distinct or coincide mode is used.
In case of (1), the behavior is as follows:
- In distinct mode the decode output picture’s image should be used in dstPictureResource, and (one of) the DPB image(s) should be referred to in pSetupReferenceSlot, as it would naturally follow.
- In coincide mode both dstPictureResource and pSetupReferenceSlot should refer to a video picture resource in (one of) the DPB image(s).
In the latter situation the decoded picture will be written only to the DPB image, and the image created for decode-output-only use remains unused. If the application wants to concurrently use the decoded picture while also performing video decode operations using the same picture as reference, it can manually copy the decoded picture stored in the DPB image to the otherwise unused decode output image, if needed. This way it practically mimics the behavior of an implementation supporting the distinct mode. However, in most use cases that is not necessary, hence on implementations supporting the coincide mode the application can avoid having two copies of the decoded pictures, even if they are used as reference pictures later on.
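The cases above, together with the layout requirements from the previous sections, can be summarized in a small decision function. This is a sketch under stated assumptions: `ImageChoice`, `LayoutChoice`, `DecodeTargets`, and `plan_decode_targets` are hypothetical application-side names, and the layout values merely stand for the DST and DPB video decode image layouts discussed earlier.

```c
#include <stdbool.h>

/* Which of the application's images backs a given picture (hypothetical). */
typedef enum ImageChoice {
    IMAGE_CHOICE_NONE,   /* no picture specified (NULL pSetupReferenceSlot) */
    IMAGE_CHOICE_OUTPUT, /* image created for decode output use */
    IMAGE_CHOICE_DPB     /* (one of) the DPB image(s) */
} ImageChoice;

/* Stand-ins for the video decode DST and DPB image layouts. */
typedef enum LayoutChoice {
    LAYOUT_CHOICE_DECODE_DST,
    LAYOUT_CHOICE_DECODE_DPB
} LayoutChoice;

typedef struct DecodeTargets {
    ImageChoice  dstPicture;   /* backs dstPictureResource */
    ImageChoice  setupPicture; /* backs pSetupReferenceSlot */
    LayoutChoice dstLayout;    /* layout expected for the decode output */
} DecodeTargets;

DecodeTargets plan_decode_targets(bool sessionHasDpbSlots, bool coincideMode)
{
    DecodeTargets targets;
    if (!sessionHasDpbSlots) {
        /* Case (2): no reconstructed picture, the mode does not matter. */
        targets.dstPicture = IMAGE_CHOICE_OUTPUT;
        targets.setupPicture = IMAGE_CHOICE_NONE;
        targets.dstLayout = LAYOUT_CHOICE_DECODE_DST;
    } else if (coincideMode) {
        /* Case (1), coincide: output and reconstructed picture are the same
         * DPB resource, which has to be in the DPB layout. */
        targets.dstPicture = IMAGE_CHOICE_DPB;
        targets.setupPicture = IMAGE_CHOICE_DPB;
        targets.dstLayout = LAYOUT_CHOICE_DECODE_DPB;
    } else {
        /* Case (1), distinct: separate output image plus a DPB image. */
        targets.dstPicture = IMAGE_CHOICE_OUTPUT;
        targets.setupPicture = IMAGE_CHOICE_DPB;
        targets.dstLayout = LAYOUT_CHOICE_DECODE_DST;
    }
    return targets;
}
```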
If the video profile in use requires additional codec-specific parameters for the reconstructed picture, then such parameters are provided in the pNext
chain of pSetupReferenceSlot
. Whether such codec-specific reconstructed picture information is necessary, and what it may contain is up to the codec-specific extensions.
3.9. Reference Pictures
If the video session allows, reference pictures can be specified in the pReferenceSlots
array to provide predictions of the values of samples of the decoded picture.
Each entry in the pReferenceSlots
array adds one or more pictures, currently associated with the DPB slot specified in the element’s slotIndex
member and stored in the video picture resource specified in the element’s pPictureResource
member, to the list of active reference pictures to use in the video decode operation.
The application has to make sure to specify each video picture resource used as a reference picture in a video decode operation, beforehand, in the set of bound reference picture resources specified when the video coding scope was started (in one of the elements of VkVideoBeginCodingInfoKHR::pReferenceSlots
).
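An application could guard against violating this requirement with a check along the following lines before recording the decode operation. This is a minimal sketch: `PictureRef`, `is_bound`, and `references_are_bound` are hypothetical names, and `resourceId` is an application-defined identifier standing in for a video picture resource (image view, base array layer, and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side record of a DPB slot and its resource. */
typedef struct PictureRef {
    int32_t  slotIndex;  /* DPB slot index */
    uint32_t resourceId; /* application-defined id of the picture resource */
} PictureRef;

/* True if resourceId appears in the set bound at begin-coding time. */
bool is_bound(const PictureRef *bound, size_t boundCount, uint32_t resourceId)
{
    for (size_t i = 0; i < boundCount; ++i) {
        if (bound[i].resourceId == resourceId) {
            return true;
        }
    }
    return false;
}

/* True if every reference picture resource used by the decode operation
 * was included in the bound reference picture resources. */
bool references_are_bound(const PictureRef *bound, size_t boundCount,
                          const PictureRef *used, size_t usedCount)
{
    assert(bound != NULL || boundCount == 0);
    assert(used != NULL || usedCount == 0);
    for (size_t i = 0; i < usedCount; ++i) {
        if (!is_bound(bound, boundCount, used[i].resourceId)) {
            return false;
        }
    }
    return true;
}
```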
The application has to create the image view specified in pPictureResource→imageViewBinding
of the elements of pReferenceSlots
with the new VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR
usage flag, and must also include the used video session’s video profile in the VkVideoProfileListInfoKHR
structure specified at image creation time.
The image subresources backing the reference pictures have to be in the new VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR
layout at the time the video decode operation is executed.
Typically the number of elements in pReferenceSlots
equals the number of reference pictures added, but in certain cases (depending on the used video codec and video profile) there may be multiple pictures in the same DPB slot resource.
If the video profile in use requires additional codec-specific parameters for the reference pictures, then such parameters are provided in the pNext
chain of the elements of pReferenceSlots
. Whether such codec-specific reference picture information is necessary, and what it may contain is up to the codec-specific extensions.
3.10. Capabilities
Querying capabilities specific to video decoding happens through the query mechanisms introduced by the VK_KHR_video_queue
extension.
Support for individual video decode operations can be retrieved for each queue family using the VkQueueFamilyVideoPropertiesKHR
structure, as discussed earlier.
The application can also use the vkGetPhysicalDeviceVideoCapabilitiesKHR
command to query the capabilities of a specific video decode profile. In case of video decode profiles, the following new structure has to be included in the pNext
chain of the VkVideoCapabilitiesKHR
structure used to retrieve the general video decode capabilities:
typedef struct VkVideoDecodeCapabilitiesKHR {
VkStructureType sType;
void* pNext;
VkVideoDecodeCapabilityFlagsKHR flags;
} VkVideoDecodeCapabilitiesKHR;
This structure only contains a new decode-specific flags
member that indicates support for various video decode capabilities, like the support for the distinct and coincide modes for reconstructed pictures, as discussed earlier.
The vkGetPhysicalDeviceVideoFormatPropertiesKHR
command can be used to query the supported image/picture formats for a given set of video profiles, as described in the VK_KHR_video_queue
extension.
In particular, if the application would like to query the list of format properties supported for decode output pictures, then it should include the new VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR usage flag in VkPhysicalDeviceVideoFormatInfoKHR::imageUsage.
Similarly, to query the list of format properties supported for decode DPB pictures (reconstructed/reference pictures), the application should include the new VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR usage flag in VkPhysicalDeviceVideoFormatInfoKHR::imageUsage.
When using the coincide mode, the application will need DPB pictures that support both decode output and DPB usage, hence it should call vkGetPhysicalDeviceVideoFormatPropertiesKHR with VkPhysicalDeviceVideoFormatInfoKHR::imageUsage including both VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR and VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR.
3.11. Usage Summary
To summarize the usage of the video decoding features introduced by this extension, let us take a look at a typical usage scenario when using this extension to decode a video stream.
Before the application can start recording command buffers with video decode operations, it has to do the following:
- Ensure that the implementation can decode the video content at hand by first querying the video codec operations supported by each queue family using the vkGetPhysicalDeviceQueueFamilyProperties2 command and the VkQueueFamilyVideoPropertiesKHR output structure.
- If needed, the application has to also retrieve the VkQueueFamilyQueryResultStatusPropertiesKHR output structure for the queue family to check support for VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR queries.
- Construct the VkVideoProfileInfoKHR structure describing the entire video profile, including the video codec operation, chroma subsampling, bit depths, and any other usage or codec-specific parameters.
- Ensure that the specific video profile is supported by the implementation using the vkGetPhysicalDeviceVideoCapabilitiesKHR command and retrieve the general, decode-specific, and codec-specific capabilities at the same time.
- Query the list of image/picture format properties supported for the video profile using the vkGetPhysicalDeviceVideoFormatPropertiesKHR command, and select a suitable format for the DPB and decode output pictures.
- If needed, create one or more images corresponding to the decode output picture(s) and/or DPB picture(s) with the appropriate usage flags and video profile list, as described earlier, and bind suitable device memory to them. Also create any image views with the appropriate usage flags to use in the video decode operations.
- Create a buffer with the VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR usage flag and the video profile list, to use as the source video bitstream buffer. If the buffer is expected to be populated using the CPU, consider binding compatible host-visible device memory to the buffer.
- If result status queries are needed and supported (as determined earlier), create a query pool with the VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR query type and the used video decode profile.
- Create the video session using the video decode profile and appropriate parameters within the capabilities supported by the profile, as determined earlier. Bind suitable device memory to each memory binding index of the video session.
- If needed, create a video session parameters object for the video session.
Recording video decode operations into command buffers typically consists of the following sequence:
- Start a video coding scope with the created video session (and parameters) object using the vkCmdBeginVideoCodingKHR command. Make sure to include all video picture resources in VkVideoBeginCodingInfoKHR::pReferenceSlots that may be used as reconstructed or reference pictures within the video coding scope, and ensure that the DPB slots specified for each reflect the current DPB slot association for the resource.
- If this is the first video coding scope the video session is used in, reset the video session to the initial state by recording a vkCmdControlVideoCodingKHR command with the VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR flag.
- If needed, start a VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR query using vkCmdBeginQuery. Reset the query using vkCmdResetQueryPool, beforehand, as needed.
- Issue a video decode operation using the vkCmdDecodeVideoKHR command with appropriate parameters, as discussed earlier.
- If needed, end the started query using vkCmdEndQuery.
- Record any further control or decode operations into the video coding scope, as needed.
- End the video coding scope using the vkCmdEndVideoCodingKHR command.
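The ordering of the recording sequence above can be modeled with a small checker that walks a list of recorded commands. This is an illustrative sketch only: `DecodeCmd` and `validate_decode_sequence` are hypothetical names, the enum values merely label the commands mentioned in the list, and the checker follows the order given there rather than the complete valid usage rules of the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the commands named in the sequence above. */
typedef enum DecodeCmd {
    DECODE_CMD_BEGIN_CODING,  /* vkCmdBeginVideoCodingKHR */
    DECODE_CMD_CONTROL_RESET, /* vkCmdControlVideoCodingKHR with RESET */
    DECODE_CMD_BEGIN_QUERY,   /* vkCmdBeginQuery */
    DECODE_CMD_DECODE,        /* vkCmdDecodeVideoKHR */
    DECODE_CMD_END_QUERY,     /* vkCmdEndQuery */
    DECODE_CMD_END_CODING     /* vkCmdEndVideoCodingKHR */
} DecodeCmd;

/* Check that the commands follow the sequence described in the text.
 * firstUseOfSession is true if the video session has not been reset yet. */
bool validate_decode_sequence(const DecodeCmd *cmds, size_t count,
                              bool firstUseOfSession)
{
    assert(cmds != NULL || count == 0);
    bool inScope = false;
    bool inQuery = false;
    bool initialized = !firstUseOfSession;

    for (size_t i = 0; i < count; ++i) {
        switch (cmds[i]) {
        case DECODE_CMD_BEGIN_CODING:
            if (inScope) return false;            /* no nested scopes */
            inScope = true;
            break;
        case DECODE_CMD_CONTROL_RESET:
            if (!inScope || inQuery) return false;
            initialized = true;
            break;
        case DECODE_CMD_BEGIN_QUERY:
            if (!inScope || inQuery) return false;
            inQuery = true;
            break;
        case DECODE_CMD_DECODE:
            /* Decoding needs a scope and a session that has been reset. */
            if (!inScope || !initialized) return false;
            break;
        case DECODE_CMD_END_QUERY:
            if (!inQuery) return false;
            inQuery = false;
            break;
        case DECODE_CMD_END_CODING:
            if (!inScope || inQuery) return false;
            inScope = false;
            break;
        default:
            return false;
        }
    }
    /* Everything that was started has to be ended. */
    return !inScope && !inQuery;
}
```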
4. Examples
4.1. Select queue family with video decode support for a given video codec operation
VkVideoCodecOperationFlagBitsKHR neededVideoDecodeOp = ...
uint32_t queueFamilyIndex;
uint32_t queueFamilyCount;
vkGetPhysicalDeviceQueueFamilyProperties2(physicalDevice, &queueFamilyCount, NULL);
VkQueueFamilyProperties2* props = calloc(queueFamilyCount,
sizeof(VkQueueFamilyProperties2));
VkQueueFamilyVideoPropertiesKHR* videoProps = calloc(queueFamilyCount,
sizeof(VkQueueFamilyVideoPropertiesKHR));
for (queueFamilyIndex = 0; queueFamilyIndex < queueFamilyCount; ++queueFamilyIndex) {
props[queueFamilyIndex].sType = VK_STRUCTURE_TYPE_QUEUE_FAMILY_PROPERTIES_2;
props[queueFamilyIndex].pNext = &videoProps[queueFamilyIndex];
videoProps[queueFamilyIndex].sType = VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR;
}
vkGetPhysicalDeviceQueueFamilyProperties2(physicalDevice, &queueFamilyCount, props);
for (queueFamilyIndex = 0; queueFamilyIndex < queueFamilyCount; ++queueFamilyIndex) {
if ((props[queueFamilyIndex].queueFamilyProperties.queueFlags & VK_QUEUE_VIDEO_DECODE_BIT_KHR) != 0 &&
(videoProps[queueFamilyIndex].videoCodecOperations & neededVideoDecodeOp) != 0) {
break;
}
}
if (queueFamilyIndex < queueFamilyCount) {
// Found appropriate queue family
...
} else {
// Did not find a queue family with the needed capabilities
...
}
4.2. Check support and query the capabilities for a video decode profile
VkResult result;
// We also include the optional decode usage hints here
VkVideoDecodeUsageInfoKHR profileUsageInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_USAGE_INFO_KHR,
.pNext = ... // pointer to codec-specific profile structure
.videoUsageHints = VK_VIDEO_DECODE_USAGE_DEFAULT_KHR,
};
VkVideoProfileInfoKHR profileInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PROFILE_INFO_KHR,
.pNext = &profileUsageInfo,
.videoCodecOperation = ... // used video decode operation
.chromaSubsampling = VK_VIDEO_CHROMA_SUBSAMPLING_420_BIT_KHR,
.lumaBitDepth = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR,
.chromaBitDepth = VK_VIDEO_COMPONENT_BIT_DEPTH_8_BIT_KHR
};
VkVideoDecodeCapabilitiesKHR decodeCapabilities = {
.sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_CAPABILITIES_KHR,
.pNext = ... // pointer to codec-specific capability structure
};
VkVideoCapabilitiesKHR capabilities = {
.sType = VK_STRUCTURE_TYPE_VIDEO_CAPABILITIES_KHR,
.pNext = &decodeCapabilities
};
result = vkGetPhysicalDeviceVideoCapabilitiesKHR(physicalDevice, &profileInfo, &capabilities);
if (result == VK_SUCCESS) {
// Profile is supported, check additional capabilities
...
} else {
// Profile is not supported, result provides additional information about why
...
}
4.3. Select decode output and DPB formats supported by the video decode profile
VkVideoProfileInfoKHR profileInfo = {
...
};
VkVideoProfileListInfoKHR profileListInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PROFILE_LIST_INFO_KHR,
.pNext = NULL,
.profileCount = 1,
.pProfiles = &profileInfo
};
VkPhysicalDeviceVideoFormatInfoKHR formatInfo = {
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VIDEO_FORMAT_INFO_KHR,
.pNext = &profileListInfo
};
VkVideoFormatPropertiesKHR* formatProps = NULL;
uint32_t formatCount = 0;
// First query decode output formats
formatInfo.imageUsage = VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR;
vkGetPhysicalDeviceVideoFormatPropertiesKHR(physicalDevice, &formatInfo, &formatCount, NULL);
formatProps = calloc(formatCount, sizeof(VkVideoFormatPropertiesKHR));
for (uint32_t i = 0; i < formatCount; ++i) {
formatProps[i].sType = VK_STRUCTURE_TYPE_VIDEO_FORMAT_PROPERTIES_KHR;
}
vkGetPhysicalDeviceVideoFormatPropertiesKHR(physicalDevice, &formatInfo, &formatCount, formatProps);
for (uint32_t i = 0; i < formatCount; ++i) {
// Select decode output format and image creation capabilities best suited for the use case
...
}
free(formatProps);
// Then query DPB formats
formatInfo.imageUsage = VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR;
// If DISTINCT mode is not supported or if COINCIDE mode is supported and preferred, then the DPB
// images generally have to be created to be usable both as decode output and DPB pictures
if ((decodeCapabilities.flags & VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR) == 0 || preferCoincideMode) {
formatInfo.imageUsage |= VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR;
}
vkGetPhysicalDeviceVideoFormatPropertiesKHR(physicalDevice, &formatInfo, &formatCount, NULL);
formatProps = calloc(formatCount, sizeof(VkVideoFormatPropertiesKHR));
for (uint32_t i = 0; i < formatCount; ++i) {
formatProps[i].sType = VK_STRUCTURE_TYPE_VIDEO_FORMAT_PROPERTIES_KHR;
}
vkGetPhysicalDeviceVideoFormatPropertiesKHR(physicalDevice, &formatInfo, &formatCount, formatProps);
for (uint32_t i = 0; i < formatCount; ++i) {
// Select DPB format and image creation capabilities best suited for the use case
...
}
free(formatProps);
4.4. Create bitstream buffer
VkBuffer bitstreamBuffer = VK_NULL_HANDLE;
VkVideoProfileListInfoKHR profileListInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PROFILE_LIST_INFO_KHR,
.pNext = NULL,
.profileCount = ... // number of video profiles to use the bitstream buffer with
.pProfiles = ... // pointer to an array of video profile information structure chains
};
VkBufferCreateInfo createInfo = {
.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
.pNext = &profileListInfo,
...
.usage = VK_BUFFER_USAGE_VIDEO_DECODE_SRC_BIT_KHR | ... // any other usages that may be needed
...
};
vkCreateBuffer(device, &createInfo, NULL, &bitstreamBuffer);
4.5. Create decode output image and image view
VkImage outputImage = VK_NULL_HANDLE;
VkImageView outputImageView = VK_NULL_HANDLE;
VkVideoProfileListInfoKHR profileListInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PROFILE_LIST_INFO_KHR,
.pNext = NULL,
.profileCount = ... // number of video profiles to use the decode output image with
.pProfiles = ... // pointer to an array of video profile information structure chains
};
VkImageCreateInfo imageCreateInfo = {
.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
.pNext = &profileListInfo,
...
.usage = VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR | ... // any other usages that may be needed
...
};
vkCreateImage(device, &imageCreateInfo, NULL, &outputImage);
VkImageViewUsageCreateInfo imageViewUsageInfo = {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_USAGE_CREATE_INFO,
.pNext = NULL,
.usage = VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR
};
VkImageViewCreateInfo imageViewCreateInfo = {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.pNext = &imageViewUsageInfo,
.flags = 0,
.image = outputImage,
.viewType = ... // image view type (only 2D or 2D_ARRAY is supported)
... // other image view creation parameters
};
vkCreateImageView(device, &imageViewCreateInfo, NULL, &outputImageView);
4.6. Create DPB image and image view
// NOTE: This example creates a single image and image view that is used to back all DPB pictures
// but, depending on the support of the VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR
// capability flag, the application can choose to create separate images for each DPB slot or
// picture
VkImage dpbImage = VK_NULL_HANDLE;
VkImageView dpbImageView = VK_NULL_HANDLE;
VkImageUsageFlags dpbImageUsage = VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR;
// If DISTINCT mode is not supported or if COINCIDE mode is supported and preferred, then the DPB
// images generally have to be created to be usable both as decode output and DPB pictures
if ((decodeCapabilities.flags & VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR) == 0 || preferCoincideMode) {
dpbImageUsage |= VK_IMAGE_USAGE_VIDEO_DECODE_DST_BIT_KHR;
}
VkVideoProfileListInfoKHR profileListInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PROFILE_LIST_INFO_KHR,
.pNext = NULL,
.profileCount = ... // number of video profiles to use the decode DPB image with
.pProfiles = ... // pointer to an array of video profile information structure chains
};
VkImageCreateInfo imageCreateInfo = {
.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
.pNext = &profileListInfo,
...
.usage = dpbImageUsage | ... // any other usages that may be needed
...
.arrayLayers = ... // typically equal to the DPB slot count
};
vkCreateImage(device, &imageCreateInfo, NULL, &dpbImage);
VkImageViewUsageCreateInfo imageViewUsageInfo = {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_USAGE_CREATE_INFO,
.pNext = NULL,
.usage = dpbImageUsage
};
VkImageViewCreateInfo imageViewCreateInfo = {
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.pNext = &imageViewUsageInfo,
.flags = 0,
.image = dpbImage,
.viewType = ... // image view type (only 2D or 2D_ARRAY is supported)
... // other image view creation parameters
};
vkCreateImageView(device, &imageViewCreateInfo, NULL, &dpbImageView);
4.7. Record decode operation (video session without DPB slots)
vkCmdBeginVideoCodingKHR(commandBuffer, ...);
VkVideoPictureResourceInfoKHR decodeOutputPictureResource = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
.pNext = NULL,
.codedOffset = ... // offset within the image subresource (typically { 0, 0 })
.codedExtent = ... // extent of decoded picture (typically the video frame size)
.baseArrayLayer = 0,
.imageViewBinding = outputImageView
};
VkVideoDecodeInfoKHR decodeInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
.pNext = ... // pointer to codec-specific picture information structure
.flags = 0,
.srcBuffer = bitstreamBuffer,
.srcBufferOffset = ... // offset of picture data in the video bitstream buffer
.srcBufferRange = ... // size of picture data in the video bitstream buffer
.dstPictureResource = decodeOutputPictureResource,
.pSetupReferenceSlot = NULL,
.referenceSlotCount = 0,
.pReferenceSlots = NULL
};
vkCmdDecodeVideoKHR(commandBuffer, &decodeInfo);
vkCmdEndVideoCodingKHR(commandBuffer, ...);
4.8. Record decode operation with reconstructed picture information (DISTINCT mode)
// Bound reference resource list provided has to include reconstructed picture resource
vkCmdBeginVideoCodingKHR(commandBuffer, ...);
VkVideoPictureResourceInfoKHR decodeOutputPictureResource = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
.pNext = NULL,
.codedOffset = ... // offset within the image subresource (typically { 0, 0 })
.codedExtent = ... // extent of decoded picture (typically the video frame size)
.baseArrayLayer = 0,
.imageViewBinding = outputImageView
};
VkVideoPictureResourceInfoKHR reconstructedPictureResource = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
.pNext = NULL,
.codedOffset = ... // offset within the image subresource (typically { 0, 0 })
.codedExtent = ... // extent of reconstructed picture (typically the video frame size)
.baseArrayLayer = ... // layer to use for setup picture in DPB
.imageViewBinding = dpbImageView
};
VkVideoReferenceSlotInfoKHR setupSlotInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
.pNext = ... // pointer to codec-specific reconstructed picture information structure
.slotIndex = ... // DPB slot index to use with the reconstructed picture
// (optionally activated per the codec-specific semantics)
.pPictureResource = &reconstructedPictureResource
};
VkVideoDecodeInfoKHR decodeInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
.pNext = ... // pointer to codec-specific picture information structure
...
.dstPictureResource = decodeOutputPictureResource,
.pSetupReferenceSlot = &setupSlotInfo,
...
};
vkCmdDecodeVideoKHR(commandBuffer, &decodeInfo);
vkCmdEndVideoCodingKHR(commandBuffer, ...);
4.9. Record decode operation with reconstructed picture information (COINCIDE mode)
// Bound reference resource list provided has to include reconstructed picture resource
vkCmdBeginVideoCodingKHR(commandBuffer, ...);
VkVideoPictureResourceInfoKHR reconstructedPictureResource = {
.sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
.pNext = NULL,
.codedOffset = ... // offset within the image subresource (typically { 0, 0 })
.codedExtent = ... // extent of decoded picture (typically the video frame size)
.baseArrayLayer = ... // layer to use for setup picture in DPB
.imageViewBinding = dpbImageView
};
VkVideoReferenceSlotInfoKHR setupSlotInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
.pNext = ... // pointer to codec-specific reconstructed picture information structure
.slotIndex = ... // DPB slot index to use with the reconstructed picture
// (optionally activated per the codec-specific semantics)
.pPictureResource = &reconstructedPictureResource
};
VkVideoDecodeInfoKHR decodeInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
.pNext = ... // pointer to codec-specific picture information structure
...
.dstPictureResource = reconstructedPictureResource,
.pSetupReferenceSlot = &setupSlotInfo,
...
};
vkCmdDecodeVideoKHR(commandBuffer, &decodeInfo);
vkCmdEndVideoCodingKHR(commandBuffer, ...);
4.10. Record decode operation with reference picture list
// Bound reference resource list provided has to include all used reference picture resources
vkCmdBeginVideoCodingKHR(commandBuffer, ...);
VkVideoPictureResourceInfoKHR referencePictureResources[] = {
{
.sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
.pNext = NULL,
.codedOffset = ... // offset within the image subresource (typically { 0, 0 })
.codedExtent = ... // extent of reference picture (typically the video frame size)
.baseArrayLayer = ... // layer of first reference picture resource
.imageViewBinding = dpbImageView
},
{
.sType = VK_STRUCTURE_TYPE_VIDEO_PICTURE_RESOURCE_INFO_KHR,
.pNext = NULL,
.codedOffset = ... // offset within the image subresource (typically { 0, 0 })
.codedExtent = ... // extent of reference picture (typically the video frame size)
.baseArrayLayer = ... // layer of second reference picture resource
.imageViewBinding = dpbImageView
},
...
};
// NOTE: Individual resources do not have to refer to the same image view, e.g. if different
// image views are created for each picture resource, or if the
// VK_VIDEO_CAPABILITY_SEPARATE_REFERENCE_IMAGES_BIT_KHR capability is supported and the
// application created separate images for the reference pictures.
VkVideoReferenceSlotInfoKHR referenceSlotInfo[] = {
{
.sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
.pNext = ... // pointer to codec-specific reference picture information structure
.slotIndex = ... // DPB slot index of the first reference picture
.pPictureResource = &referencePictureResources[0]
},
{
.sType = VK_STRUCTURE_TYPE_VIDEO_REFERENCE_SLOT_INFO_KHR,
.pNext = ... // pointer to codec-specific reference picture information structure
.slotIndex = ... // DPB slot index of the second reference picture
.pPictureResource = &referencePictureResources[1]
},
...
};
VkVideoDecodeInfoKHR decodeInfo = {
.sType = VK_STRUCTURE_TYPE_VIDEO_DECODE_INFO_KHR,
.pNext = ... // pointer to codec-specific picture information structure
...
.referenceSlotCount = sizeof(referenceSlotInfo) / sizeof(referenceSlotInfo[0]),
.pReferenceSlots = &referenceSlotInfo[0]
};
vkCmdDecodeVideoKHR(commandBuffer, &decodeInfo);
vkCmdEndVideoCodingKHR(commandBuffer, ...);
5. Issues
5.1. RESOLVED: Why is there no VK_PIPELINE_STAGE_VIDEO_DECODE_BIT_KHR?
This extension requires the VK_KHR_synchronization2 extension because the new access flags it introduces do not fit in the 32-bit enum VkAccessFlagBits. Accordingly, all new pipeline stage and access flags have been added to the corresponding 64-bit enums and no new flags have been added to the legacy 32-bit enums. While the new pipeline stage flag uses bit #26, which would also fit in the legacy VkPipelineStageFlagBits enum, there is no real benefit to including it there. Instead, the bit is marked reserved.
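The bit-width argument can be illustrated with a minimal sketch. The helper names below are hypothetical, and the bit positions used for the access flags (35 and 36) are assumptions made for illustration; only the pipeline stage bit position (#26) is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a 64-bit flag value is representable in a
 * legacy 32-bit flag enum, i.e. no bit above bit 31 is set. */
static bool fits_in_legacy_32bit_enum(uint64_t flag)
{
    return (flag >> 32) == 0;
}

/* Hypothetical helper: narrows a flag to the legacy 32-bit representation.
 * Only valid for flags that actually fit. */
static uint32_t to_legacy_flag(uint64_t flag)
{
    assert(fits_in_legacy_32bit_enum(flag));
    return (uint32_t)flag;
}

/* Bit 26 is the position the text gives for the new pipeline stage flag. */
static const uint64_t kDecodeStageFlag = UINT64_C(1) << 26;

/* Assumed positions of the new access flags (illustrative only). */
static const uint64_t kDecodeReadAccessFlag  = UINT64_C(1) << 35;
static const uint64_t kDecodeWriteAccessFlag = UINT64_C(1) << 36;
```

Under these assumptions the stage flag could be narrowed to 32 bits, while the access flags could not, which is why the 64-bit enums of VK_KHR_synchronization2 are needed.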
5.2. RESOLVED: Are the decode output picture data and reconstructed picture data used to activate a DPB slot written to the same resource?
When activating DPB slots with reconstructed pictures (reference picture setup), decode operations have to write the decompressed picture data to a DPB-capable video picture resource.
Behavior varies across implementations in this case:
- Some implementations write the outputs of the picture decompression to two separate resources: the decoded output picture and the reconstructed picture
- Some other implementations only write picture decompression results to one place, which in this case has to be a DPB picture (as it must be usable as a reference picture later on)
A separate output could be useful if e.g. the application intends to use the decode output picture for other purposes (e.g. for window-system presentation or for texture sampling), while using the reconstructed picture in parallel to continue decoding. However, such concurrent use of the outputs is not always necessary.
Trying to mandate a uniform behavior across such implementations could have negative performance implications:
- If separate outputs were required, then implementations using a single output would have to perform additional copies even though the application's use case might not need them.
- If a single output were required, then applications would have to perform copies whenever they wanted to do such concurrent processing and would not be able to take advantage of implementations that can already write separate outputs in an optimized fashion.
Instead, this extension allows implementations to support either of these modes of operation (separate or single output), as indicated by the VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR and VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_COINCIDE_BIT_KHR capability flags, respectively. Implementations are required to support at least one of these two modes, and the extension also allows for implementations that support both modes of operation.
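The mode selection implied by the two capability flags can be sketched as follows. This is a minimal sketch that mirrors the condition used in the DPB examples of section 4. The flag constants are local stand-ins with hypothetical names and values; real code would use the Vulkan header definitions of the capability and image usage flags named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the decode capability flags (hypothetical values). */
enum {
    CAP_DPB_AND_OUTPUT_COINCIDE = 1 << 0,
    CAP_DPB_AND_OUTPUT_DISTINCT = 1 << 1
};

/* Local stand-ins for the image usage flags (hypothetical values). */
enum {
    USAGE_DECODE_DST = 1 << 0,
    USAGE_DECODE_DPB = 1 << 1
};

/* Returns true if the application should use COINCIDE mode, i.e. if
 * DISTINCT mode is not supported, or if COINCIDE mode is supported and
 * preferred by the application. */
static bool use_coincide_mode(uint32_t capFlags, bool preferCoincide)
{
    bool coincideSupported = (capFlags & CAP_DPB_AND_OUTPUT_COINCIDE) != 0;
    bool distinctSupported = (capFlags & CAP_DPB_AND_OUTPUT_DISTINCT) != 0;

    /* At least one of the two modes has to be supported. */
    assert(coincideSupported || distinctSupported);

    return !distinctSupported || (coincideSupported && preferCoincide);
}

/* Computes the usage flags for DPB images: in COINCIDE mode the same
 * picture resource also receives the decode output. */
static uint32_t dpb_image_usage(uint32_t capFlags, bool preferCoincide)
{
    uint32_t usage = USAGE_DECODE_DPB;
    if (use_coincide_mode(capFlags, preferCoincide)) {
        usage |= USAGE_DECODE_DST;
    }
    return usage;
}
```

With only the DISTINCT flag set, the DPB image needs DPB usage alone; with only the COINCIDE flag set, it needs both usages regardless of the application's preference.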
5.3. RESOLVED: How can layered codec-specific decode extensions enable applications to provide the necessary codec-specific picture information, parameter sets, etc. that may be needed to perform the video coding operations?
There are multiple points where codec-specific picture information can be provided to a video decode operation. This extension suggests the following convention:
- Codec-specific decode parameters are expected to be provided in the pNext chain of VkVideoDecodeInfoKHR.
- Codec-specific reconstructed picture information is expected to be provided in the pNext chain of VkVideoDecodeInfoKHR::pSetupReferenceSlot.
- Codec-specific reference picture information is expected to be provided in the pNext chain of the elements of the VkVideoDecodeInfoKHR::pReferenceSlots array.
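The convention above relies on the usual pNext chaining mechanism. The following minimal sketch shows how a consumer could locate a codec-specific structure in such a chain. All type names, tags, and the find_in_chain helper are hypothetical stand-ins; they are not part of this extension or of any codec-specific extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure type tags; real code uses VkStructureType values. */
typedef enum {
    TYPE_DECODE_INFO,
    TYPE_REFERENCE_SLOT_INFO,
    TYPE_CODEC_PICTURE_INFO,
    TYPE_CODEC_DPB_SLOT_INFO
} StructType;

/* Stand-in for the sType/pNext pair that starts every chainable structure. */
typedef struct BaseIn {
    StructType sType;
    const struct BaseIn *pNext;
} BaseIn;

/* Walks a pNext chain and returns the first structure of the given type,
 * or NULL if the chain does not contain one. */
static const BaseIn *find_in_chain(const void *pNext, StructType type)
{
    for (const BaseIn *it = pNext; it != NULL; it = it->pNext) {
        if (it->sType == type) {
            return it;
        }
    }
    return NULL;
}

/* Decode info with codec-specific picture information in its chain. */
static const BaseIn codecPictureInfo = { TYPE_CODEC_PICTURE_INFO, NULL };
static const BaseIn decodeInfo = { TYPE_DECODE_INFO, &codecPictureInfo };

/* Setup reference slot with codec-specific reconstructed picture
 * information in its chain. */
static const BaseIn codecDpbSlotInfo = { TYPE_CODEC_DPB_SLOT_INFO, NULL };
static const BaseIn setupSlotInfo = { TYPE_REFERENCE_SLOT_INFO, &codecDpbSlotInfo };

/* Usage: each kind of codec-specific information is found only in the
 * chain of the structure it is attached to. */
static void check_example_chains(void)
{
    assert(find_in_chain(decodeInfo.pNext, TYPE_CODEC_PICTURE_INFO) != NULL);
    assert(find_in_chain(setupSlotInfo.pNext, TYPE_CODEC_DPB_SLOT_INFO) != NULL);
}
```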
5.4. RESOLVED: Can vkCmdDecodeVideoKHR only decode frames? What about field decoding, slice decoding, etc.?
This extension does not define the types of pictures or sub-picture content that can be decoded by a vkCmdDecodeVideoKHR command. It is expected that the codec-specific decode extensions built upon this extension define the types of pictures that can be decoded. Furthermore, both codec-specific and codec-independent extensions can expand the set of capabilities introduced here to enable more advanced use cases, as needed.
5.5. RESOLVED: What is the effect of the flags provided in VkVideoDecodeUsageInfoKHR::videoUsageHints?
There are no specific behavioral effects associated with any of the video decode usage hints, so the application can specify any combination of these flags. They are included to enable the application to better communicate the intended use case scenario to the implementation.
However, just like any other additional video profile information included in the pNext chain of VkVideoProfileInfoKHR structures, they are part of the video profile definition. Hence, whenever matching video profiles have to be provided to an API call, be it a query or a resource creation structure, the application must provide identical video decode usage hint values. This also applies if the application does not include the VkVideoDecodeUsageInfoKHR structure, which is treated equivalently to specifying the structure with videoUsageHints equal to VK_VIDEO_DECODE_USAGE_DEFAULT_KHR (or zero), per the usual conventions of Vulkan.
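The matching rule can be sketched as follows. This is a simplified illustration: the types and field names are hypothetical, and a real video profile has more members than the single codec operation field compared here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the optional decode usage information. */
typedef struct {
    uint32_t videoUsageHints;
} DecodeUsageInfo;

/* Simplified stand-in for a video profile definition. */
typedef struct {
    uint32_t codecOperation;
    const DecodeUsageInfo *usageInfo; /* NULL when the structure is omitted */
} ProfileDesc;

/* An omitted usage structure is equivalent to usage hints equal to zero. */
static uint32_t effective_usage_hints(const ProfileDesc *profile)
{
    assert(profile != NULL);
    return profile->usageInfo != NULL ? profile->usageInfo->videoUsageHints : 0;
}

/* Two profiles match only if their effective usage hints are identical too. */
static bool profiles_match(const ProfileDesc *a, const ProfileDesc *b)
{
    return a->codecOperation == b->codecOperation &&
           effective_usage_hints(a) == effective_usage_hints(b);
}

/* Example data: hint value 0x1 is an arbitrary non-default placeholder. */
static const DecodeUsageInfo defaultHints = { 0 };
static const DecodeUsageInfo someHint = { 0x1 };
static const ProfileDesc withoutInfo = { 1, NULL };
static const ProfileDesc withDefault = { 1, &defaultHints };
static const ProfileDesc withHint = { 1, &someHint };
```

In this sketch, withoutInfo and withDefault describe the same profile, while withHint is a different profile even though the hint has no specific behavioral effect.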
5.6. RESOLVED: When is it mandatory to specify reconstructed picture information in VkVideoDecodeInfoKHR::pSetupReferenceSlot?
In the original version of this extension, specifying a non-NULL pSetupReferenceSlot parameter was only required for activating DPB slots with reference pictures, but no shipping implementation actually supported specifying NULL for pSetupReferenceSlot. In the end, some implementations turned out to require a reconstructed picture resource and/or DPB slot even when the decoded picture is not expected to be used as a reference picture by future video decode operations, so this extension was changed in revision 8 as follows:
1. Specifying reconstructed picture information (i.e. a non-NULL pSetupReferenceSlot) was made mandatory for all cases except when the video session was created with no DPB slots
2. Reference picture setup (and, inherently, DPB slot activation) was changed to be subject to codec-specific behavior, meaning that specifying a non-NULL pSetupReferenceSlot will only trigger reference picture setup if the appropriate codec-specific parameters or semantics indicate so (typically in the form of marking the decoded picture as reference)
The second change was necessary in order to avoid unnecessary DPB slot activation and the cost of populating the reconstructed picture resource when it is distinct from the decode output and the reconstructed picture is not intended to be used as a reference picture by future video decode operations. However, as some implementations may use the reconstructed picture resource and/or DPB slot as transient storage during the decoding process, if a non-NULL pSetupReferenceSlot is specified but no reference picture setup is requested, then the contents of the reconstructed picture resource become undefined and some of the picture references associated with the reconstructed picture’s DPB slot may get invalidated.
While this change breaks backward-compatibility, no implementation actually supported the removed behavior, thus it should not have any effect on shipping applications.
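The resulting rules can be summarized in a minimal sketch. The enum, the function, and its parameters are hypothetical names used only for illustration. The sketch does not model the case of a setup slot being specified for a video session created without DPB slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DECODE_INVALID,    /* reconstructed picture information is missing */
    DECODE_NO_SETUP,   /* picture is decoded, no DPB slot is activated */
    DECODE_WITH_SETUP  /* picture is decoded and the DPB slot is activated */
} DecodeOutcome;

/* sessionDpbSlotCount: number of DPB slots the video session was created with
 * hasSetupSlot:        whether pSetupReferenceSlot is non-NULL
 * codecMarksReference: whether the codec-specific parameters or semantics
 *                      request reference picture setup */
static DecodeOutcome classify_decode(uint32_t sessionDpbSlotCount,
                                     bool hasSetupSlot,
                                     bool codecMarksReference)
{
    if (!hasSetupSlot) {
        /* NULL is only allowed for sessions created with no DPB slots. */
        return sessionDpbSlotCount == 0 ? DECODE_NO_SETUP : DECODE_INVALID;
    }
    /* A non-NULL setup slot alone does not trigger reference picture setup.
     * Without setup, the contents of the reconstructed picture resource
     * become undefined. */
    return codecMarksReference ? DECODE_WITH_SETUP : DECODE_NO_SETUP;
}

/* Usage: a session with four DPB slots decoding a non-reference picture
 * still has to specify the setup slot. */
static void check_decode_rules(void)
{
    assert(classify_decode(4, false, false) == DECODE_INVALID);
    assert(classify_decode(4, true, false) == DECODE_NO_SETUP);
}
```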
5.7. RESOLVED: Does always requiring the specification of reconstructed picture information in VkVideoDecodeInfoKHR::pSetupReferenceSlot have any undesired performance consequences?
There may be performance implications when using distinct mode, i.e. using a reconstructed picture resource that is distinct from the decode output picture. As discussed in the previous issue, the explicit codec-specific opt-in for reference picture setup allows implementations to avoid consuming twice as much memory bandwidth to also write out the reconstructed picture when it is otherwise not needed. However, as it is implementation-specific whether the reconstructed picture is written to, even when no reference picture setup takes place, the reconstructed picture becomes a shared resource that the application has to synchronize in order to avoid write-after-write hazards. Accordingly, depending on the implementation, there may be some performance implications to always requiring the specification of reconstructed picture information.
6. Further Functionality
This extension is meant to provide only common video decode functionality, thus support for individual video decode profiles using specific video compression standards is left for extensions layered on top of the infrastructure provided here.
Currently the following layered extensions are available:
- VK_KHR_video_decode_h264 - adds support for decoding H.264/AVC video sequences
- VK_KHR_video_decode_h265 - adds support for decoding H.265/HEVC video sequences