Memory limits with Vulkan


Mali GPUs

This article covers situations in which a Vulkan application might trigger an out of memory (OOM) condition on Mali GPUs, resulting in a DEVICE_LOST error, even if the API usage is correct. The OOM condition that developers hit most often is due to a very high vertex load, which might be relatively common when porting Vulkan applications from desktop to mobile.

Mali GPUs have a memory region which is available to store the intermediate geometry output from a render pass. This memory is used to store all of the varying data generated by vertex, tessellation, and geometry shading prior to fragment shading. Exceeding the size of this region may result in a VK_ERROR_DEVICE_LOST. The limit is fixed to 180 MB on current Mali GPUs, but it may be increases or lifted altogether in future GPUs.

The reasoning behind this limit is that tile-based renderers need to write out and then read back intermediate geometry output, thus vertex load is directly correlated to memory bandwidth. For a typical program using 64 bytes of varying data per vertex the 180 MB of intermediate storage can contain over 2 million vertices, which we expect to be enough for normal mobile application usage. We will now cover the reasons why such a vertex load is unlikely to be sustainable and possible mitigations if your application is hitting it.

Let us consider a vertex-heavy application with a single render pass that reaches the 180 MB limit. Since the GPU has to write the data out and read it back from memory, this results in 2 x 180 = 360 MB/render pass, which at 30 FPS brings memory bandwidth up to 30 x 360 = 10.8 GB/s. Memory bandwidth has a direct correlation with power consumption, which can be estimated as 100 mW/(GB/s). This means that an application using 180 MB of varying data will consume at least 1.08 W, and this does not consider further contributions to memory bandwidth and general GPU power consumption. A mobile GPU cannot sustain such a power usage without overheating, which would further cause a reduction of GPU frequency and a performance drop.

The only real solution to the issue is to keep the application’s vertex count below approximately 2 million, as derived above for an average of 64 bytes of varying data per vertex. In scenarios where the memory storage is exceeded and reducing the vertex load is not feasible, we recommend that the application splits the render pass into multiple render passes, each using a safe amount of intermediate storage. Later render passes can use a loadOp=LOAD to restore the content of the framebuffer and continue rendering on top of earlier rendering. This form of incremental rendering might impact performance, due to the write-out and further read-back of the color image.

If your vertex load is unpredictable and you are hitting DEVICE_LOST issues in the field, you can set up a scheme for estimating memory consumption for each draw call in a render pass, then performing incremental rendering if the limit is reached. You should keep in mind that memory is allocated for all vertex indices between the min and max index referenced by a draw call, and for all generated vertices for tessellation and geometry shading, even if they are subsequently culled by the clipping and culling pass. Such an estimate will be conservative, as the actual amount of memory allocated might be lower, so we don’t recommend adding a further safety margin to the 180 MB limit.