Buffer Device Address Alignment
All loads and stores through a PhysicalStorageBuffer pointer must have an Aligned memory operand:
%x = OpLoad %type %ptr Aligned 16
OpStore %ptr %obj Aligned 16
Shading languages have a default, but may let you set the alignment explicitly (e.g. buffer_reference_align in GLSL).
This alignment is a promise about how aligned this specific pointer is. The compiler has no idea what the address will be when the shader is compiled, so providing an alignment lets it generate valid code that matches the requirement. The user is responsible for making sure the address they use is actually aligned to it.
layout(buffer_reference, buffer_reference_align = 64) buffer MyBDA {
    uint data;
};
MyBDA ptr_a; // at 0x1000
MyBDA ptr_b; // at 0x1010
MyBDA ptr_c; // at 0x1040
ptr_a.data = 0; // (Aligned 64) valid!
ptr_b.data = 0; // (Aligned 64) invalid!
ptr_c.data = 0; // (Aligned 64) valid!
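The valid/invalid comments above come down to a simple modulo test on the address. A minimal C++ sketch, where `is_aligned` is a hypothetical helper and not part of any Vulkan or GLSL API:

```cpp
#include <cstdint>

// Hypothetical helper (not a Vulkan/GLSL API): true if `address` satisfies the
// promise made by an `Aligned N` operand. `alignment` must be nonzero.
constexpr bool is_aligned(std::uint64_t address, std::uint64_t alignment) {
    return address % alignment == 0;
}

// The three pointers from the GLSL example, checked against Aligned 64.
static_assert(is_aligned(0x1000, 64), "ptr_a: 0x1000 = 64 * 64");
static_assert(!is_aligned(0x1010, 64), "ptr_b: 16 bytes past a 64-byte boundary");
static_assert(is_aligned(0x1040, 64), "ptr_c: 0x1040 = 65 * 64");
```

The same test could be applied to the value returned by vkGetBufferDeviceAddress before handing it to a shader, as a cheap debug-time check.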
When deciding on an alignment, it must always be greater than or equal to the size of the largest scalar/component type in the block.
// alignment must be at least 4
layout(buffer_reference) buffer MyBDA {
    vec4 a; // scalar is float
};
// alignment must be at least 1
layout(buffer_reference) buffer MyBDA {
    uint8_t a; // scalar is 8-bit int
};
// alignment must be at least 8
layout(buffer_reference) buffer MyBDA {
    uint a;   // 32-bit
    double b; // 64-bit
};
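Put another way, the minimum alignment is simply the maximum over the scalar sizes in the block. A small C++ sketch of that rule, where the hypothetical helper `min_block_alignment` takes the byte size of each scalar/component type (a vector or array contributes the size of its component type):

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: the smallest valid alignment for a block whose
// scalar/component types have the given byte sizes, i.e. the largest size.
constexpr std::size_t min_block_alignment(std::initializer_list<std::size_t> scalar_sizes) {
    std::size_t largest = 1;
    for (std::size_t s : scalar_sizes) {
        if (s > largest) largest = s;
    }
    return largest;
}

static_assert(min_block_alignment({4}) == 4, "vec4 a: component is a 4-byte float");
static_assert(min_block_alignment({1}) == 1, "uint8_t a");
static_assert(min_block_alignment({4, 8}) == 8, "uint a; double b;");
```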
Setting Alignment Example
To help explain alignment, let's take the example of loading an array of vectors:
layout(buffer_reference, buffer_reference_align = ???) buffer MyBDA {
    uvec4 data[];
};
MyBDA ptr; // at 0x1000
ptr.data[i] = uvec4(0);
Here we have two options: we could set the alignment to either 4 or 16.
If we set the alignment to 16, we are telling the compiler it can load 16 bytes at a time, so it will hopefully emit a vector load/store.
If we set the alignment to 4, the compiler will likely have no way to infer the real alignment and will instead emit four scalar 32-bit int loads/stores.
Note: Some GPUs can do vector load/store even on unaligned addresses.
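Whether 16 is a safe promise depends on both the base address and the array stride. A power of two divides every element address base + i * stride exactly when it divides both base and stride, so the strongest alignment that holds for every element is the lowest set bit of (base | stride). A C++ sketch, with `guaranteed_alignment` as a hypothetical helper and assuming nonzero base and stride:

```cpp
#include <cstdint>

// Hypothetical helper: the largest power-of-two alignment that holds for every
// element of an array starting at `base` and advancing by `stride` bytes.
// A power of two divides all of base + i * stride exactly when it divides
// both base and stride, so the answer is the lowest set bit of (base | stride).
// Assumes base and stride are nonzero.
constexpr std::uint64_t guaranteed_alignment(std::uint64_t base, std::uint64_t stride) {
    const std::uint64_t bits = base | stride;
    return bits & (~bits + 1);  // isolate the lowest set bit
}

// uvec4 data[] at 0x1000 has a 16-byte stride, so Aligned 16 is a safe promise.
static_assert(guaranteed_alignment(0x1000, 16) == 16, "uvec4 array at 0x1000");
// If the base were only 8-byte aligned (hypothetical 0x1008), 16 would be too strong.
static_assert(guaranteed_alignment(0x1008, 16) == 8, "uvec4 array at 0x1008");
```

Promising 4 is never wrong for this array; it just discards the stronger guarantee the compiler could have used for vector accesses.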
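For the next case, consider uvec3 instead of uvec4 with the scalar layout. Each uvec3 is 12 bytes, so the array stride is 12 and only 4-byte alignment holds for every element, which is why the alignment is set to 4: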
layout(buffer_reference, buffer_reference_align = 4, scalar) buffer MyBDA {
    uvec3 data[];
};
data[0]; // 0x1000
data[1]; // 0x100C
data[2]; // 0x1018
data[3]; // 0x1024
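The addresses in the comments above can be checked directly. A C++ sketch, assuming the same example base address 0x1000 and the 12-byte scalar stride, with `element_address` and `all_elements_aligned` as hypothetical helpers:

```cpp
#include <cstdint>

// uvec3 data[] with the scalar layout: three 4-byte components, so the array
// stride is 12 bytes. The base address is the example value from the text.
constexpr std::uint64_t kBase = 0x1000;
constexpr std::uint64_t kStride = 3 * 4;

// Hypothetical helper: address of data[i].
constexpr std::uint64_t element_address(std::uint64_t i) {
    return kBase + i * kStride;
}

// Hypothetical helper: true if data[0] .. data[count - 1] all satisfy `alignment`.
constexpr bool all_elements_aligned(std::uint64_t count, std::uint64_t alignment) {
    for (std::uint64_t i = 0; i < count; ++i) {
        if (element_address(i) % alignment != 0) return false;
    }
    return true;
}

static_assert(element_address(1) == 0x100C, "data[1]");
static_assert(element_address(2) == 0x1018, "data[2]");
static_assert(element_address(3) == 0x1024, "data[3]");

// 4 holds for every element; 8 and 16 already fail at data[1] (0x100C).
static_assert(all_elements_aligned(4, 4), "");
static_assert(!all_elements_aligned(4, 8), "");
static_assert(!all_elements_aligned(4, 16), "");
```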
Matching Alignment From The Host
With buffer device address you can simply memcpy into that memory from the host, which can easily lead to bugs if you aren’t careful about alignment.
Note: The following issues are not specific to buffer device address; they can occur with any uniform or storage buffer.
Take the following GLSL code as an example:
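#version 460
#extension GL_EXT_buffer_reference : require
#extension GL_EXT_scalar_block_layout : require
#extension GL_EXT_shader_explicit_arithmetic_types_int64 : require

// ArrayStride is 16
struct Metadata {
    uint64_t address;
    uint status;
};

layout(buffer_reference, buffer_reference_align = 8, scalar) readonly buffer Payload {
    uint count;      // offset 0
    Metadata meta[]; // offset 8
};

layout(set = 0, binding = 0) buffer SSBO_0 {
    Payload data;
};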
Because the uint64_t needs to be accessed at an 8-byte alignment, glslang (and any other compiler) will lay things out as tightly as it can for you while still respecting that alignment.
The first thing you might notice is that Metadata has an array stride of 16 instead of 12. Otherwise, uint64_t address would land on an offset that is not 8-byte aligned in every other element of the array.
Next, because the largest scalar in struct Metadata is an 8-byte value, the compiler knows meta must start at offset 8 instead of 4. This is why changing the struct to
struct Metadata {
    uint status;
    uint64_t address;
};
or
struct Metadata {
    uint64_t address;
    uint status;
    uint pad;
};
won’t change the offset from 8.
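The offset and stride rules above can be sketched as a tiny layout calculator. This is a hypothetical C++ model of the rules as described here (members placed at the next multiple of their alignment, struct size rounded up to its largest member alignment), not glslang's actual implementation; `Member`, `align_up`, `struct_alignment`, and `struct_size` are illustrative names:

```cpp
#include <cstddef>

// Hypothetical helpers modeling the rules described above (not glslang code).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

struct Member {
    std::size_t size;
    std::size_t alignment;
};

// A struct is as aligned as its most-aligned member.
template <std::size_t N>
constexpr std::size_t struct_alignment(const Member (&members)[N]) {
    std::size_t result = 1;
    for (const Member& m : members) {
        if (m.alignment > result) result = m.alignment;
    }
    return result;
}

// Struct size, which is also its array stride: rounded up to its alignment.
template <std::size_t N>
constexpr std::size_t struct_size(const Member (&members)[N]) {
    std::size_t offset = 0;
    for (const Member& m : members) {
        offset = align_up(offset, m.alignment) + m.size;
    }
    return align_up(offset, struct_alignment(members));
}

// The three Metadata variants from the text.
constexpr Member kOriginal[] = {{8, 8}, {4, 4}};        // uint64_t address; uint status;
constexpr Member kReordered[] = {{4, 4}, {8, 8}};       // uint status; uint64_t address;
constexpr Member kPadded[] = {{8, 8}, {4, 4}, {4, 4}};  // address; status; pad;

// All three have a 16-byte stride and 8-byte alignment, so after the 4-byte
// `uint count` the meta array still starts at offset 8.
static_assert(struct_size(kOriginal) == 16, "");
static_assert(struct_size(kReordered) == 16, "");
static_assert(struct_size(kPadded) == 16, "");
static_assert(align_up(4, struct_alignment(kOriginal)) == 8, "");
static_assert(align_up(4, struct_alignment(kReordered)) == 8, "");
static_assert(align_up(4, struct_alignment(kPadded)) == 8, "");
```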
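Here is how Payload is laid out in memory (first two elements of meta shown):
offset 0: count (uint, 4 bytes)
offset 4: padding (4 bytes)
offset 8: meta[0].address (uint64_t, 8 bytes)
offset 16: meta[0].status (uint, 4 bytes)
offset 20: padding (4 bytes)
offset 24: meta[1].address (uint64_t, 8 bytes)
offset 32: meta[1].status (uint, 4 bytes)
offset 36: padding (4 bytes)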
The issue arises when we write to this memory from the host. When you call vkMapMemory and get back a void*, you need to be careful that the data is laid out exactly as the shader expects. One way to ensure this is to use a struct on the host, which in this case will match the shader code.
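#include <cstdint>
#include <vulkan/vulkan.h>

// On common 64-bit targets this matches the GLSL layout: sizeof == 16.
struct Metadata {
    uint64_t address;
    uint32_t status;
};

// count at offset 0, meta at offset 8 (4 bytes of padding after count).
struct Payload {
    uint32_t count;
    Metadata meta[2];
};

// Illustrative wrapper; the function name is hypothetical.
void upload_payload(VkDevice device, VkDeviceMemory device_memory) {
    Payload payload;
    payload.count = 2;
    payload.meta[0].address = 0xDEADBEEF;
    payload.meta[0].status = 20;
    payload.meta[1].address = 0xDEADBEEF;
    payload.meta[1].status = 5;

    void* data;
    vkMapMemory(device, device_memory, 0, VK_WHOLE_SIZE, 0, &data);
    // You can also just memcpy here as well!
    Payload* payload_ptr = (Payload*)data;
    *payload_ptr = payload;
    // If the memory is not HOST_COHERENT, flush with vkFlushMappedMemoryRanges.
    vkUnmapMemory(device, device_memory);
}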
If we examine the C++ code in Compiler Explorer (https://godbolt.org/z/Gq75qq1x6), we can see that the generated assembly uses the same offsets as the GLSL code above!
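Instead of inspecting assembly, one option is to let the C++ compiler check the host layout at build time with static_assert and offsetof. The sketch below adds alignas(8) to the address member, which is not in the struct shown above: it pins the 8-byte alignment even on ABIs where uint64_t is only 4-byte aligned inside structs (for example the 32-bit x86 System V ABI), where the plain struct would put meta at offset 4.

```cpp
#include <cstddef>
#include <cstdint>

// Host-side mirror of the shader's Metadata; alignas(8) is an addition that
// forces 8-byte alignment of `address` regardless of the target ABI.
struct Metadata {
    alignas(8) std::uint64_t address;
    std::uint32_t status;
};

struct Payload {
    std::uint32_t count;
    Metadata meta[2];
};

// Fail the build if the host layout drifts from the shader layout
// (ArrayStride 16, meta at offset 8).
static_assert(sizeof(Metadata) == 16, "Metadata array stride must be 16");
static_assert(offsetof(Metadata, status) == 8, "status follows address");
static_assert(offsetof(Payload, meta) == 8, "meta must start at offset 8");
static_assert(sizeof(Payload) == 8 + 2 * 16, "count + padding + two elements");
```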