Subsystems: Vulkan for Audio Processing

Enhancing Audio with Vulkan

In the previous section, we implemented a basic audio system for our engine. Now, we’ll explore how Vulkan’s compute capabilities can enhance audio processing, particularly for implementing realistic 3D spatial audio using Head-Related Transfer Functions (HRTF).

Understanding HRTF

Head-Related Transfer Functions (HRTF) are a set of acoustic filters that model how sound is altered by the head, outer ear, and torso before reaching the eardrums. These filters vary based on the direction of the sound source relative to the listener.

HRTF processing allows us to create convincing 3D audio by applying the appropriate filters to sound sources based on their position. This creates a more immersive experience than simple stereo panning and distance attenuation.

The challenge with HRTF processing is that it’s computationally expensive:

Each sound source requires a unique set of filters based on its position
These filters must be applied to the audio stream in real-time
The process involves convolution: every output sample is the sum of recent input samples, each multiplied by a filter coefficient

This is where Vulkan compute shaders can help by offloading these calculations to the GPU.
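
To make the cost concrete, here is a minimal CPU sketch of the direct-form convolution described above. The function name `ConvolveMono` is illustrative and not part of the engine; it assumes a mono input block and a single impulse response, and treats samples before the start of the block as silence.

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR convolution: y[n] = sum over k of h[k] * x[n - k].
// Input samples before the start of the block are treated as silence.
std::vector<float> ConvolveMono(const std::vector<float>& input,
                                const std::vector<float>& impulseResponse) {
    std::vector<float> output(input.size(), 0.0f);
    for (std::size_t n = 0; n < input.size(); ++n) {
        float acc = 0.0f;
        // k <= n keeps the index n - k inside the block
        for (std::size_t k = 0; k < impulseResponse.size() && k <= n; ++k) {
            acc += impulseResponse[k] * input[n - k];
        }
        output[n] = acc;
    }
    return output;
}
```

With a 1024-sample block and a 256-tap impulse response per ear, this is roughly 1024 × 256 × 2 = 524,288 multiply-adds per block for each sound source. Every output sample is independent of the others, which is what makes the work easy to spread across GPU threads.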

Why Use Vulkan for Audio Processing?

Traditional audio processing is done on the CPU, but there are several advantages to using Vulkan compute shaders for certain audio tasks:

Parallelism: Audio processing, especially HRTF convolution, can be highly parallelized, making it well-suited for GPU computation.
Reduced CPU Load: Offloading audio processing to the GPU frees up CPU resources for game logic, AI, and other tasks.
Scalability: GPU-based processing can more easily scale to handle hundreds or thousands of simultaneous sound sources.
Shared Device Resources: Because audio compute work runs on the same Vulkan device as rendering, both can use the same buffers and queues, which avoids copying data between separate APIs.

Implementing HRTF Processing with Vulkan

Let’s extend our audio system to include HRTF processing using Vulkan compute shaders.

First, we’ll add HRTF-related structures to our audio system:

// Audio.h (additions)
#include <vulkan/vulkan_raii.hpp>
#include <array>
#include <memory>
#include <string>
#include <vector>

namespace Engine {
namespace Audio {

// HRTF data for a specific direction
struct HRTFData {
    std::array<float, 256> leftEarImpulseResponse;
    std::array<float, 256> rightEarImpulseResponse;
};

// HRTF database containing filters for different directions
class HRTFDatabase {
public:
    HRTFDatabase(const std::string& filename);

    // Get HRTF data for a specific direction
    const HRTFData& GetHRTFData(float azimuth, float elevation) const;

private:
    // In a real implementation, this would be a more sophisticated data structure
    std::vector<HRTFData> m_Data;
    // Mapping from direction to data index
    // ...
};

// Extended AudioSystem with Vulkan-based HRTF processing
class AudioSystem {
public:
    // ... existing methods ...

    // Enable/disable HRTF processing
    void SetHRTFEnabled(bool enabled) { m_HRTFEnabled = enabled; }
    bool IsHRTFEnabled() const { return m_HRTFEnabled; }

    // Set the HRTF database to use
    void SetHRTFDatabase(std::shared_ptr<HRTFDatabase> database) { m_HRTFDatabase = database; }

private:
    // ... existing members ...

    // HRTF processing
    bool m_HRTFEnabled = false;
    std::shared_ptr<HRTFDatabase> m_HRTFDatabase;

    // Vulkan resources for HRTF processing
    struct VulkanResources {
        vk::raii::ShaderModule computeShaderModule = nullptr;
        vk::raii::DescriptorSetLayout descriptorSetLayout = nullptr;
        vk::raii::PipelineLayout pipelineLayout = nullptr;
        vk::raii::Pipeline computePipeline = nullptr;
        vk::raii::DescriptorPool descriptorPool = nullptr;

        // Buffers for audio data
        vk::raii::Buffer inputBuffer = nullptr;
        vk::raii::DeviceMemory inputBufferMemory = nullptr;
        vk::raii::Buffer outputBuffer = nullptr;
        vk::raii::DeviceMemory outputBufferMemory = nullptr;
        vk::raii::Buffer hrtfBuffer = nullptr;
        vk::raii::DeviceMemory hrtfBufferMemory = nullptr;

        // Descriptor sets
        std::vector<vk::raii::DescriptorSet> descriptorSets;

        // Command buffer for compute operations
        vk::raii::CommandPool commandPool = nullptr;
        vk::raii::CommandBuffer commandBuffer = nullptr;
    };

    VulkanResources m_VulkanResources;

    // Initialize Vulkan resources for HRTF processing
    void InitializeVulkanResources();
    void CleanupVulkanResources();

    // Process audio with HRTF using Vulkan
    void ProcessAudioWithVulkan(float* inputBuffer, float* outputBuffer, size_t frameCount);
};

} // namespace Audio
} // namespace Engine
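
The header leaves the mapping from direction to data index open. One possible approach, sketched below, is a nearest-neighbour lookup on a regular grid. The `HRTFGrid` struct, the `NearestHRTFIndex` function, and the row-by-row storage layout are all assumptions made for illustration; real HRTF data sets often use irregular sampling and interpolate between neighbouring filters.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical layout: filters are stored row by row, one row per elevation.
// Azimuth covers [0, 360) degrees, elevation covers [-90, 90] degrees.
struct HRTFGrid {
    int azimuthCount;    // e.g. 72 entries -> 5 degree steps
    int elevationCount;  // e.g. 37 entries -> 5 degree steps; must be at least 2
};

// Map a direction in radians to the index of the nearest measured filter.
std::size_t NearestHRTFIndex(const HRTFGrid& grid, float azimuthRad, float elevationRad) {
    const float pi = 3.14159265358979f;
    const float azStep = 360.0f / static_cast<float>(grid.azimuthCount);
    const float elStep = 180.0f / static_cast<float>(grid.elevationCount - 1);

    // Wrap azimuth into [0, 360) so that negative angles work.
    float azDeg = std::fmod(azimuthRad * 180.0f / pi, 360.0f);
    if (azDeg < 0.0f) azDeg += 360.0f;

    // Clamp elevation to the measured range.
    float elDeg = elevationRad * 180.0f / pi;
    if (elDeg < -90.0f) elDeg = -90.0f;
    if (elDeg > 90.0f) elDeg = 90.0f;

    // Round to the nearest grid point; azimuth wraps around, elevation does not.
    const int azIndex = static_cast<int>(std::lround(azDeg / azStep)) % grid.azimuthCount;
    const int elIndex = static_cast<int>(std::lround((elDeg + 90.0f) / elStep));

    return static_cast<std::size_t>(elIndex) * static_cast<std::size_t>(grid.azimuthCount)
         + static_cast<std::size_t>(azIndex);
}
```

A `GetHRTFData` implementation along these lines could return `m_Data[NearestHRTFIndex(grid, azimuth, elevation)]`, at the cost of audible steps when a source moves between grid points.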

Now, let’s implement the Vulkan-based HRTF processing:

// Audio.cpp (implementation)

void AudioSystem::InitializeVulkanResources() {
    // Get Vulkan device from the engine
    auto& device = m_Engine.GetVulkanDevice();

    // Create compute shader module
    // codeSize is given in bytes, so derive it from the element type of the loaded data
    auto shaderCode = LoadShaderFile("shaders/hrtf_processing.comp.spv");
    vk::ShaderModuleCreateInfo shaderModuleCreateInfo({}, shaderCode.size() * sizeof(shaderCode[0]),
                                                     reinterpret_cast<const uint32_t*>(shaderCode.data()));
    m_VulkanResources.computeShaderModule = vk::raii::ShaderModule(device, shaderModuleCreateInfo);

    // Create descriptor set layout
    std::array<vk::DescriptorSetLayoutBinding, 3> bindings = {
        // Input audio buffer
        vk::DescriptorSetLayoutBinding(0, vk::DescriptorType::eStorageBuffer, 1,
                                      vk::ShaderStageFlagBits::eCompute),
        // Output audio buffer
        vk::DescriptorSetLayoutBinding(1, vk::DescriptorType::eStorageBuffer, 1,
                                      vk::ShaderStageFlagBits::eCompute),
        // HRTF data buffer
        vk::DescriptorSetLayoutBinding(2, vk::DescriptorType::eStorageBuffer, 1,
                                      vk::ShaderStageFlagBits::eCompute)
    };

    vk::DescriptorSetLayoutCreateInfo descriptorSetLayoutCreateInfo({}, bindings);
    m_VulkanResources.descriptorSetLayout = vk::raii::DescriptorSetLayout(device, descriptorSetLayoutCreateInfo);

    // Create pipeline layout
    vk::PipelineLayoutCreateInfo pipelineLayoutCreateInfo({}, *m_VulkanResources.descriptorSetLayout);
    m_VulkanResources.pipelineLayout = vk::raii::PipelineLayout(device, pipelineLayoutCreateInfo);

    // Create compute pipeline
    vk::PipelineShaderStageCreateInfo shaderStageCreateInfo({}, vk::ShaderStageFlagBits::eCompute,
                                                           *m_VulkanResources.computeShaderModule, "main");
    vk::ComputePipelineCreateInfo computePipelineCreateInfo({}, shaderStageCreateInfo,
                                                           *m_VulkanResources.pipelineLayout);
    m_VulkanResources.computePipeline = vk::raii::Pipeline(device, nullptr, computePipelineCreateInfo);

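    // Create descriptor pool
    // vk::raii::DescriptorSet frees itself on destruction, which requires eFreeDescriptorSet
    std::array<vk::DescriptorPoolSize, 1> poolSizes = {
        vk::DescriptorPoolSize(vk::DescriptorType::eStorageBuffer, 3)
    };
    vk::DescriptorPoolCreateInfo descriptorPoolCreateInfo(
        vk::DescriptorPoolCreateFlagBits::eFreeDescriptorSet, 1, poolSizes);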
    m_VulkanResources.descriptorPool = vk::raii::DescriptorPool(device, descriptorPoolCreateInfo);

    // Allocate descriptor sets
    vk::DescriptorSetAllocateInfo descriptorSetAllocateInfo(*m_VulkanResources.descriptorPool,
                                                           1, &*m_VulkanResources.descriptorSetLayout);
    m_VulkanResources.descriptorSets = vk::raii::DescriptorSets(device, descriptorSetAllocateInfo);

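    // Create buffers for audio data: 1024 mono input samples, 2048 stereo output samples,
    // and 512 floats for the two 256-tap impulse responses (sizeof(HRTFData)).
    // CreateBuffer is assumed to allocate host-visible, host-coherent memory, because the
    // buffers are mapped directly in ProcessAudioWithVulkan.
    // In a real implementation, you would size these appropriately and handle multiple frames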
    CreateBuffer(device, sizeof(float) * 1024, vk::BufferUsageFlagBits::eStorageBuffer,
                m_VulkanResources.inputBuffer, m_VulkanResources.inputBufferMemory);
    CreateBuffer(device, sizeof(float) * 2048, vk::BufferUsageFlagBits::eStorageBuffer,
                m_VulkanResources.outputBuffer, m_VulkanResources.outputBufferMemory);
    CreateBuffer(device, sizeof(float) * 512, vk::BufferUsageFlagBits::eStorageBuffer,
                m_VulkanResources.hrtfBuffer, m_VulkanResources.hrtfBufferMemory);

    // Update descriptor sets
    std::array<vk::DescriptorBufferInfo, 3> bufferInfos = {
        vk::DescriptorBufferInfo(*m_VulkanResources.inputBuffer, 0, VK_WHOLE_SIZE),
        vk::DescriptorBufferInfo(*m_VulkanResources.outputBuffer, 0, VK_WHOLE_SIZE),
        vk::DescriptorBufferInfo(*m_VulkanResources.hrtfBuffer, 0, VK_WHOLE_SIZE)
    };

    std::array<vk::WriteDescriptorSet, 3> descriptorWrites = {
        vk::WriteDescriptorSet(*m_VulkanResources.descriptorSets[0], 0, 0, 1,
                              vk::DescriptorType::eStorageBuffer, nullptr, &bufferInfos[0]),
        vk::WriteDescriptorSet(*m_VulkanResources.descriptorSets[0], 1, 0, 1,
                              vk::DescriptorType::eStorageBuffer, nullptr, &bufferInfos[1]),
        vk::WriteDescriptorSet(*m_VulkanResources.descriptorSets[0], 2, 0, 1,
                              vk::DescriptorType::eStorageBuffer, nullptr, &bufferInfos[2])
    };

    device.updateDescriptorSets(descriptorWrites, {});

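    // Create command pool and command buffer
    // eResetCommandBuffer allows the command buffer to be re-recorded on every call
    vk::CommandPoolCreateInfo commandPoolCreateInfo(vk::CommandPoolCreateFlagBits::eResetCommandBuffer,
                                                   m_Engine.GetVulkanQueueFamilyIndex());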
    m_VulkanResources.commandPool = vk::raii::CommandPool(device, commandPoolCreateInfo);

    vk::CommandBufferAllocateInfo commandBufferAllocateInfo(*m_VulkanResources.commandPool,
                                                           vk::CommandBufferLevel::ePrimary, 1);
    auto commandBuffers = vk::raii::CommandBuffers(device, commandBufferAllocateInfo);
    m_VulkanResources.commandBuffer = std::move(commandBuffers[0]);
}

void AudioSystem::ProcessAudioWithVulkan(float* inputBuffer, float* outputBuffer, size_t frameCount) {
    if (!m_HRTFEnabled || !m_HRTFDatabase) {
        // If HRTF is disabled, duplicate the mono input into both channels of the
        // interleaved stereo output (or do simple stereo panning)
        for (size_t i = 0; i < frameCount; ++i) {
            outputBuffer[2 * i] = inputBuffer[i];
            outputBuffer[2 * i + 1] = inputBuffer[i];
        }
        return;
    }

    auto& queue = m_Engine.GetVulkanComputeQueue();

    // Copy input audio data to the input buffer
    // (the buffer memory is assumed to be host-visible and host-coherent)
    void* data = m_VulkanResources.inputBufferMemory.mapMemory(0, frameCount * sizeof(float));
    memcpy(data, inputBuffer, frameCount * sizeof(float));
    m_VulkanResources.inputBufferMemory.unmapMemory();

    // Update HRTF data based on source positions
    // In a real implementation, you would update this for each sound source
    // For simplicity, we're just using a single HRTF filter here
    const auto& hrtfData = m_HRTFDatabase->GetHRTFData(0.0f, 0.0f);
    data = m_VulkanResources.hrtfBufferMemory.mapMemory(0, sizeof(HRTFData));
    memcpy(data, &hrtfData, sizeof(HRTFData));
    m_VulkanResources.hrtfBufferMemory.unmapMemory();

    // Record command buffer
    vk::CommandBufferBeginInfo beginInfo(vk::CommandBufferUsageFlagBits::eOneTimeSubmit);
    m_VulkanResources.commandBuffer.begin(beginInfo);

    m_VulkanResources.commandBuffer.bindPipeline(vk::PipelineBindPoint::eCompute, *m_VulkanResources.computePipeline);
    m_VulkanResources.commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eCompute,
                                                     *m_VulkanResources.pipelineLayout, 0,
                                                     *m_VulkanResources.descriptorSets[0], {});

    // Dispatch one invocation per sample, rounded up to whole workgroups
    const uint32_t workgroupSize = 64; // must match local_size_x in the shader
    m_VulkanResources.commandBuffer.dispatch(
        static_cast<uint32_t>((frameCount + workgroupSize - 1) / workgroupSize), 1, 1);

    // Make the shader's writes visible to the host before they are read back
    vk::MemoryBarrier barrier(vk::AccessFlagBits::eShaderWrite, vk::AccessFlagBits::eHostRead);
    m_VulkanResources.commandBuffer.pipelineBarrier(vk::PipelineStageFlagBits::eComputeShader,
                                                    vk::PipelineStageFlagBits::eHost,
                                                    {}, barrier, nullptr, nullptr);

    m_VulkanResources.commandBuffer.end();

    // Submit command buffer and block until the GPU has finished
    vk::SubmitInfo submitInfo({}, {}, *m_VulkanResources.commandBuffer);
    queue.submit(submitInfo, nullptr);
    queue.waitIdle();

    // Copy output audio data (two channels per frame) from the output buffer
    data = m_VulkanResources.outputBufferMemory.mapMemory(0, frameCount * 2 * sizeof(float));
    memcpy(outputBuffer, data, frameCount * 2 * sizeof(float));
    m_VulkanResources.outputBufferMemory.unmapMemory();
}

void AudioSystem::Update(float deltaTime) {
    // Process all active audio sources
    for (auto& source : m_Sources) {
        if (source->IsPlaying()) {
            // Get audio data from the source
            auto clip = source->GetClip();
            if (!clip) continue;

            // Calculate spatial position relative to listener
            glm::vec3 relativePosition = source->GetPosition() - m_Listener.GetPosition();

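            // Rotate relative position into listener space. The matrix columns are the
            // listener's right, up, and backward axes in world space, so its transpose
            // (the inverse of a rotation) maps world space to listener space.
            glm::mat3 listenerOrientation(
                glm::cross(m_Listener.GetForward(), m_Listener.GetUp()),
                m_Listener.GetUp(),
                -m_Listener.GetForward()
            );
            relativePosition = glm::transpose(listenerOrientation) * relativePosition;

            // Calculate azimuth and elevation
            // (the listener looks down -Z, so negate z to make azimuth 0 straight ahead)
            float distance = glm::length(relativePosition);
            float azimuth = std::atan2(relativePosition.x, -relativePosition.z);
            float elevation = std::atan2(relativePosition.y, std::sqrt(relativePosition.x * relativePosition.x + relativePosition.z * relativePosition.z));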

            // Get audio data from the clip
            const float* audioData = clip->GetData() + source->GetCurrentSample();
            size_t remainingSamples = clip->GetSampleCount() - source->GetCurrentSample();
            size_t framesToProcess = std::min(remainingSamples, size_t(1024));

            // Process audio with HRTF using Vulkan
            float processedAudio[2048]; // Stereo output (2 channels)
            ProcessAudioWithVulkan(const_cast<float*>(audioData), processedAudio, framesToProcess);

            // Send processed audio to the audio backend
            // ...

            // Update source state
            source->IncrementSample(framesToProcess);
        }
    }
}
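
The direction computation in `Update` can be isolated into a small helper that is easy to test on its own. The sketch below avoids glm and uses hypothetical `Vec3`, `Direction`, and `ToListenerDirection` names. It assumes that `forward` and `up` are unit length and perpendicular, that azimuth 0 means straight ahead, and that positive azimuth is to the listener's right.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline float Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 Cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

struct Direction {
    float azimuth;    // radians, 0 = straight ahead, positive = to the right
    float elevation;  // radians, positive = above the listener
    float distance;   // same units as the input position
};

// relative = source position minus listener position, in world space.
Direction ToListenerDirection(const Vec3& relative, const Vec3& forward, const Vec3& up) {
    const Vec3 right = Cross(forward, up);

    // Projecting onto the listener's axes is the same as multiplying by the
    // transpose of the matrix whose columns are those axes.
    const float x = Dot(relative, right);
    const float y = Dot(relative, up);
    const float z = Dot(relative, forward);

    Direction d;
    d.distance = std::sqrt(x * x + y * y + z * z);
    d.azimuth = std::atan2(x, z);
    d.elevation = std::atan2(y, std::sqrt(x * x + z * z));
    return d;
}
```

For a listener looking down the negative Z axis with Y up, a source five units in front yields an azimuth and elevation of zero, and a source directly to the right yields an azimuth of π/2.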

HRTF Compute Shader

Here’s the compute shader that performs the HRTF convolution:

// hrtf_processing.comp
#version 450

layout(local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

// Input mono audio buffer
layout(std430, binding = 0) buffer InputBuffer {
    float samples[];
} inputBuffer;

// Output stereo audio buffer, interleaved: [L0, R0, L1, R1, ...]
// (a buffer block may contain only one unsized array, and it must be the last member)
layout(std430, binding = 1) buffer OutputBuffer {
    float samples[];
} outputBuffer;

// HRTF data
layout(std430, binding = 2) buffer HRTFBuffer {
    float leftImpulseResponse[256];
    float rightImpulseResponse[256];
} hrtfBuffer;

void main() {
    uint gID = gl_GlobalInvocationID.x;

    // Check if this invocation is within the audio buffer
    if (gID >= uint(inputBuffer.samples.length())) {
        return;
    }

    // Perform convolution with HRTF impulse responses
    float leftSample = 0.0;
    float rightSample = 0.0;

    for (int i = 0; i < 256; i++) {
        int sampleIndex = int(gID) - i;
        // Samples before the start of the block are treated as silence
        if (sampleIndex >= 0) {
            leftSample += inputBuffer.samples[sampleIndex] * hrtfBuffer.leftImpulseResponse[i];
            rightSample += inputBuffer.samples[sampleIndex] * hrtfBuffer.rightImpulseResponse[i];
        }
    }

    // Write interleaved stereo output
    outputBuffer.samples[2u * gID] = leftSample;
    outputBuffer.samples[2u * gID + 1u] = rightSample;
}
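
On the C++ side, the dispatch has to launch enough workgroups to cover every sample. A minimal sketch of that calculation follows; the names `kWorkgroupSize` and `WorkgroupCount` are illustrative, and the constant is assumed to mirror `local_size_x = 64` in the shader.

```cpp
#include <cstdint>

// Must match local_size_x in the compute shader.
constexpr std::uint32_t kWorkgroupSize = 64;

// Ceiling division: the smallest number of workgroups whose invocations
// cover sampleCount samples.
constexpr std::uint32_t WorkgroupCount(std::uint32_t sampleCount) {
    return (sampleCount + kWorkgroupSize - 1) / kWorkgroupSize;
}
```

Plain integer division would drop the final partial group, while always adding one would launch an idle group whenever the sample count is an exact multiple of the workgroup size. Invocations in the last group that fall past the end of the data are rejected by the bounds check at the top of `main`.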

Performance Considerations

When implementing HRTF processing with Vulkan, consider these performance optimizations:

Batch Processing: Process multiple audio frames in a single dispatch to amortize the overhead of command submission.
Memory Transfers: Minimize transfers between CPU and GPU memory by processing larger chunks of audio at once.
Multiple Sources: Process multiple sound sources in a single dispatch to maximize GPU utilization.
Dynamic HRTF Selection: Only update HRTF filters when sound source positions change significantly.
Workgroup Size: Tune the workgroup size based on your target hardware for optimal performance.
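
As an illustration of dynamic HRTF selection, the sketch below re-uploads a filter only when the source direction has drifted past a threshold since the last upload. `HRTFSelection`, `AngleDelta`, and `ShouldUpdateHRTF` are hypothetical names, and the threshold is a tuning parameter rather than a recommended value.

```cpp
#include <cmath>

struct HRTFSelection {
    float azimuth = 0.0f;    // radians, direction of the filter currently uploaded
    float elevation = 0.0f;  // radians
    bool valid = false;      // false until the first filter has been uploaded
};

// Smallest absolute difference between two angles, accounting for wrap-around.
inline float AngleDelta(float a, float b) {
    const float pi = 3.14159265358979f;
    float d = std::fmod(a - b, 2.0f * pi);
    if (d > pi) d -= 2.0f * pi;
    if (d < -pi) d += 2.0f * pi;
    return std::fabs(d);
}

// Returns true, and records the new direction, when the filter should be re-uploaded.
bool ShouldUpdateHRTF(HRTFSelection& current, float azimuth, float elevation, float thresholdRad) {
    if (current.valid &&
        AngleDelta(azimuth, current.azimuth) < thresholdRad &&
        AngleDelta(elevation, current.elevation) < thresholdRad) {
        return false;  // close enough to the uploaded filter
    }
    current.azimuth = azimuth;
    current.elevation = elevation;
    current.valid = true;
    return true;
}
```

Because the stored direction changes only when an upload happens, slow drift is measured against the filter that is actually on the GPU rather than against the previous frame.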

Integration with the Audio System

To integrate the Vulkan-based HRTF processing into our audio system, we need to modify the AudioSystem::Initialize and AudioSystem::Shutdown methods:

void AudioSystem::Initialize() {
    // Initialize audio backend
    // ...

    // Initialize Vulkan resources for HRTF processing
    if (m_Engine.IsVulkanInitialized()) {
        InitializeVulkanResources();
    }

    // Load default HRTF database
    m_HRTFDatabase = std::make_shared<HRTFDatabase>("data/hrtf/default.hrtf");
    // Only enable HRTF when the Vulkan resources it depends on were created
    m_HRTFEnabled = m_Engine.IsVulkanInitialized();
}

void AudioSystem::Shutdown() {
    // Cleanup Vulkan resources
    if (m_Engine.IsVulkanInitialized()) {
        CleanupVulkanResources();
    }

    // Shutdown audio backend
    // ...
}

Advantages of Vulkan-Based HRTF

See the core benefits listed in “Why Use Vulkan for Audio Processing?” for a summary of why compute shaders are a good fit. In the context of HRTF specifically, two practical advantages are worth highlighting:

Quality: You can afford longer HRTF impulse responses (more filter taps) because the per-sample work is spread across many GPU threads, improving spatial realism.
Advanced Effects: The GPU’s compute power enables more sophisticated effects (e.g., room acoustics simulation) alongside HRTF.

Limitations and Considerations

While Vulkan-based audio processing offers many advantages, there are some limitations to consider:

Latency: GPU processing adds latency from buffer uploads, queue submission, and readback, which may be problematic for real-time audio.
Complexity: Implementing and debugging GPU-based audio processing is more complex than CPU-based solutions.
Platform Support: Not all platforms support Vulkan, so you may need fallback CPU implementations.
Power Consumption: GPU processing may increase power consumption, which is a consideration for mobile devices.

Real-World Applications

Several modern game engines and audio middleware solutions are beginning to leverage GPU acceleration for audio processing:

Steam Audio: Valve’s audio SDK supports GPU acceleration for parts of its spatial audio pipeline, using AMD TrueAudio Next for convolution and Radeon Rays for ray tracing.
Wwise: Audiokinetic’s Wwise can offload certain DSP effects to the GPU.
Custom Solutions: AAA game studios often implement custom GPU-accelerated audio processing for their titles.

By implementing Vulkan-based HRTF processing in our engine, we’re following an approach that production audio tools also use for high-performance audio in modern games.

In the next section, we’ll shift our focus to the physics subsystem and explore how Vulkan compute shaders can accelerate physics simulations.
