Research-0004: Vulkan compute backend: toolchain, loader, memory model, DMABUF import
- Status: Active
- Workstream: ADR-0127
- Last updated: 2026-04-20
Question
For a Vulkan compute backend that lives alongside CUDA / SYCL / HIP under core/src/vulkan/ and core/src/feature/vulkan/, what are the concrete toolchain and memory-model choices? Specifically:
- Which Vulkan loader do we link (`vulkan.h` + `libvulkan.so` vs volk)?
- Which shader language + compiler do we standardise on?
- Which memory allocator (hand-rolled vs VMA)?
- How do we import FFmpeg-decoded hw-frames without a host round-trip?
Sources
- Existing backend scaffolding:
  - `core/src/cuda/`: dlopen-based loader pattern for `libcuda.so.1`.
  - `core/src/sycl/`: USM picture pool and D3D11 external-handle import (ADR-0101, ADR-0103).
- Khronos Vulkan 1.3 spec: the normative reference.
- volk: meta-loader by Arseny Kapoulkine (`volk.h` + `volk.c`, also usable header-only); MIT; currently ~5000 installs/week via vcpkg / Conan.
- VMA (Vulkan Memory Allocator): AMD; MIT; the industry-standard allocator; 23k GitHub stars.
- shaderc: Google's glslang wrapper; provides `glslc` as the canonical GLSL → SPIR-V compiler.
- FFmpeg hwcontext docs: `libavutil/hwcontext_vulkan.c`, FFmpeg's own Vulkan hwcontext, useful for aligning the external-memory handle types.
- Mesa lavapipe docs: CPU-backed Vulkan implementation for CI.
Findings
1. Loader: volk vs raw vulkan.h
Raw vulkan.h + link-time -lvulkan is the obvious choice but has one deal-breaker: if the user's system doesn't have a Vulkan loader (rare on Linux, common on headless containers and some macOS setups), the whole libvmaf.so fails to load. The CUDA backend solved the same problem for libcuda.so.1 via a dlopen shim (ADR-0122 context).
volk is a two-file (`volk.h` + `volk.c`) meta-loader, also usable header-only, that:
- Calls `dlopen("libvulkan.so.1")` / `LoadLibrary("vulkan-1.dll")` at `volkInitialize()`.
- Loads instance- and device-level function pointers into globals, with no link-time dependency on the Vulkan loader.
- Interoperates with VMA and other Vulkan-ecosystem libraries.
- Needs only the Vulkan headers at build time; nothing links against `libvulkan`.

Decision: volk. It is the same pattern as the existing dlopen'd CUDA loader, and no build machine needs a Vulkan loader or driver installed.
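volk does this internally. For comparison with the CUDA shim, here is a minimal POSIX-only sketch of the same idea, assuming nothing beyond `<dlfcn.h>`: try candidate library names at runtime, resolve `vkGetInstanceProcAddr`, and report failure instead of making `libvmaf.so` unloadable. The names `vk_loader`, `vk_loader_open` and `vk_loader_close` are hypothetical, and `VkInstance` is simplified to `void *`.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical, simplified types: VkInstance is modelled as void * and the
 * returned function pointer as a generic void (*)(void). */
typedef void (*vk_void_fn)(void);
typedef vk_void_fn (*vk_get_instance_proc_addr_fn)(void *instance, const char *name);

typedef struct {
    void *lib; /* dlopen handle, NULL when no Vulkan loader is available */
    vk_get_instance_proc_addr_fn get_instance_proc_addr;
} vk_loader;

/* Try each candidate library in order. Returns 0 on success, -1 if no
 * candidate could be opened or none exported vkGetInstanceProcAddr. */
static int vk_loader_open(vk_loader *ld, const char *const *candidates, size_t n)
{
    ld->lib = NULL;
    ld->get_instance_proc_addr = NULL;
    for (size_t i = 0; i < n; i++) {
        void *lib = dlopen(candidates[i], RTLD_NOW | RTLD_LOCAL);
        if (!lib)
            continue; /* not installed: try the next name */
        void *sym = dlsym(lib, "vkGetInstanceProcAddr");
        if (!sym) {
            dlclose(lib); /* some other library: release it */
            continue;
        }
        ld->lib = lib;
        /* POSIX requires this object-to-function pointer conversion to work for dlsym results. */
        ld->get_instance_proc_addr = (vk_get_instance_proc_addr_fn)sym;
        return 0;
    }
    return -1;
}

static void vk_loader_close(vk_loader *ld)
{
    if (ld->lib)
        dlclose(ld->lib);
    ld->lib = NULL;
    ld->get_instance_proc_addr = NULL;
}
```

A caller would pass `{"libvulkan.so.1", "libvulkan.so"}` and, on -1, mark the Vulkan backend unavailable while the rest of libvmaf keeps working. With volk, the equivalent check is whether `volkInitialize()` returns `VK_SUCCESS`.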
2. Shader language + compiler
Three viable options:
| Shader language | Compiler | Pros | Cons |
|---|---|---|---|
| GLSL 4.60 | glslc (shaderc) | Most mature tooling; every Linux distro ships it; matches FFmpeg's own Vulkan shader style; SPIR-V 1.3 target fits Vulkan 1.3 baseline | C-era syntax; new shaders require remembering how GLSL handles unsigned math |
| HLSL | DXC (Microsoft) | Sharper tooling on Windows; larger user base from gamedev; nicer generics | Still foreign on Linux; HLSL → SPIR-V path is newer + less battle-tested for compute |
| Slang | slangc | Modern language design; first-class SPIR-V target | Too new; we'd be the only dependent in the libvmaf sphere |
GLSL wins on tooling maturity alone. Shaders are compiled offline at build time to SPIR-V 1.3 with `glslc --target-env=vulkan1.3 --target-spv=spv1.3` (without `--target-spv`, the `vulkan1.3` target environment emits SPIR-V 1.6). The resulting `.spv` files are embedded as byte arrays in the backend source tree rather than shipped as loose files, which keeps the install set clean and avoids runtime file I/O.
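Embedding pairs well with a cheap load-time sanity check. The sketch below assumes the embedded arrays are declared as `uint32_t` words (for example generated with glslc's `-mfmt=c` output mode), or as byte arrays forced to 4-byte alignment, since `VkShaderModuleCreateInfo::pCode` is a `const uint32_t *`. `spirv_blob_check` is a hypothetical helper, not part of any library; it verifies the SPIR-V magic number, the multiple-of-4 size that `codeSize` requires, and that the module version does not exceed the one we target.

```c
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC 0x07230203u

/* Sanity-check an embedded SPIR-V module before handing it to
 * vkCreateShaderModule. Returns 0 if the blob has a valid header whose
 * version is <= max_major.max_minor, -1 otherwise. */
static int spirv_blob_check(const uint32_t *words, size_t size_bytes,
                            unsigned max_major, unsigned max_minor)
{
    /* The SPIR-V header is 5 words; codeSize must be a multiple of 4. */
    if (!words || size_bytes < 5 * sizeof(uint32_t) || size_bytes % 4 != 0)
        return -1;
    /* Word 0 is the magic number (compared in host byte order). */
    if (words[0] != SPIRV_MAGIC)
        return -1;
    /* Word 1 encodes the version as 0x00MMmm00 (major, minor). */
    unsigned major = (words[1] >> 16) & 0xffu;
    unsigned minor = (words[1] >> 8) & 0xffu;
    if (major > max_major || (major == max_major && minor > max_minor))
        return -1;
    return 0;
}
```

Calling it with `max_major = 1, max_minor = 3` would catch a shader that was accidentally built for a newer SPIR-V version than the one we target, before the driver rejects it.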
3. Memory allocator: VMA vs hand-rolled
Hand-rolled: possible, but Vulkan memory management is famously the most error-prone surface of the API. Memory types, heap budgets, alignment requirements, sub-allocation — all non-trivial.
VMA solves these once for the whole Vulkan ecosystem and is used by Dolphin, RPCS3, Godot and many other Vulkan projects with serious production deployments. It is a single header plus one implementation `.cpp`; MIT; no external deps.
Decision: vendor VMA under subprojects/vulkan-memory-allocator/ (as a Meson wrap) and use it for all buffer/image allocation. Keeps our code focused on the VMAF math, not memory plumbing.
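To show what VMA takes off our hands, here are two of the smallest hand-rolled pieces as a sketch: picking a memory type index from the `memoryTypeBits` mask in `VkMemoryRequirements`, and rounding a sub-allocation offset up to a power-of-two alignment. The structs are hypothetical stand-ins that mirror the shape of `VkPhysicalDeviceMemoryProperties`; the flag values match the real `VK_MEMORY_PROPERTY_*_BIT` constants. Heap budgets and `bufferImageGranularity`, which VMA also handles, are left out.

```c
#include <stdint.h>

/* Same values as VK_MEMORY_PROPERTY_{DEVICE_LOCAL,HOST_VISIBLE,HOST_COHERENT}_BIT. */
#define MEM_DEVICE_LOCAL  0x1u
#define MEM_HOST_VISIBLE  0x2u
#define MEM_HOST_COHERENT 0x4u

/* Hypothetical mirrors of VkMemoryType / VkPhysicalDeviceMemoryProperties. */
typedef struct {
    uint32_t property_flags;
    uint32_t heap_index;
} mem_type;

typedef struct {
    uint32_t type_count;
    mem_type types[32]; /* Vulkan allows at most 32 memory types */
} mem_props;

/* Return the first memory type that the resource allows (bit i of type_bits)
 * and that has every required property flag, or -1 if none qualifies. */
static int find_memory_type(const mem_props *p, uint32_t type_bits, uint32_t required)
{
    for (uint32_t i = 0; i < p->type_count; i++) {
        if ((type_bits & (1u << i)) && (p->types[i].property_flags & required) == required)
            return (int)i;
    }
    return -1;
}

/* Round offset up to the next multiple of alignment (a power of two). */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

Even this small piece has traps: the first matching type is not necessarily the fastest, and a device-local type may be absent from a given resource's `memoryTypeBits`. VMA encodes those preferences for us.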
4. DMABUF / external-memory import
Our existing SYCL and CUDA backends import FFmpeg-decoded frames through their own equivalents of VK_KHR_external_memory_fd (CUDA has cuMemImportFromShareableHandle; SYCL has the oneAPI external-memory extension). Vulkan offers the most standardised path:
- Linux: `VK_KHR_external_memory_fd` plus `VK_EXT_external_memory_dma_buf`, with `VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT`. FFmpeg `AV_HWDEVICE_TYPE_VULKAN` and `AV_HWDEVICE_TYPE_VAAPI` both expose the DMABUF fd, which we pass in a `VkImportMemoryFdInfoKHR` when allocating the backing memory; synchronisation fds, where present, are imported separately with `vkImportSemaphoreFdKHR`.
- Windows: `VK_KHR_external_memory_win32` with `VK_EXTERNAL_MEMORY_HANDLE_TYPE_D3D11_TEXTURE_BIT` when the source is `AV_HWDEVICE_TYPE_D3D11VA`.
- macOS (MoltenVK): no external-memory support. Fall back to a host-staged copy (the same path the SYCL backend uses on setups without USM page-migration support).
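The per-platform choice reduces to a small decision function. This is a sketch only: the enums and the `device_exts` struct are hypothetical stand-ins for the FFmpeg hwdevice type and for extension queries made with `vkEnumerateDeviceExtensionProperties`; the real import path still has to build the `VkImportMemoryFdInfoKHR` (or Win32 equivalent) chain.

```c
/* Hypothetical enums: they mirror the cases discussed above and are not
 * FFmpeg's or Vulkan's actual definitions. */
typedef enum { SRC_VULKAN, SRC_VAAPI, SRC_D3D11VA, SRC_OTHER } frame_source;
typedef enum { HOST_LINUX, HOST_WINDOWS, HOST_MACOS } host_os;
typedef enum { IMPORT_DMABUF_FD, IMPORT_D3D11_TEXTURE, IMPORT_HOST_STAGED } import_path;

/* Which device extensions were reported as supported (1) or not (0). */
typedef struct {
    int has_external_memory_fd;    /* VK_KHR_external_memory_fd */
    int has_dma_buf;               /* VK_EXT_external_memory_dma_buf */
    int has_external_memory_win32; /* VK_KHR_external_memory_win32 */
} device_exts;

/* Pick the zero-copy path when the platform, frame source and device
 * extensions all line up; otherwise fall back to a host-staged copy. */
static import_path choose_import_path(host_os os, frame_source src, const device_exts *ext)
{
    if (os == HOST_LINUX && (src == SRC_VULKAN || src == SRC_VAAPI) &&
        ext->has_external_memory_fd && ext->has_dma_buf)
        return IMPORT_DMABUF_FD;
    if (os == HOST_WINDOWS && src == SRC_D3D11VA && ext->has_external_memory_win32)
        return IMPORT_D3D11_TEXTURE;
    /* MoltenVK, missing extensions, or an unsupported frame source. */
    return IMPORT_HOST_STAGED;
}
```

Keeping the fallback as the default branch means a missing extension degrades to a slower copy instead of a hard failure.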
Timeline fencing uses VK_KHR_timeline_semaphore (Vulkan 1.2 core, 1.3 guaranteed). Matches FFmpeg's own hwcontext_vulkan semaphore model.
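A timeline semaphore is a single monotonically increasing 64-bit counter: a signal sets it to a strictly larger value, and a wait for value V completes once the counter is at least V. The sketch below is a host-side model of that contract using frame indices as values; `timeline_model` and its functions are hypothetical and only mimic the semantics of `vkSignalSemaphore` / `vkWaitSemaphores`, they do not call Vulkan.

```c
#include <stdint.h>

typedef struct {
    uint64_t value; /* current counter value */
} timeline_model;

/* Signal to v. As with vkSignalSemaphore, v must be strictly greater than
 * the current value. Returns 0 on success, -1 on a non-monotonic signal. */
static int timeline_signal(timeline_model *t, uint64_t v)
{
    if (v <= t->value)
        return -1;
    t->value = v;
    return 0;
}

/* A wait for wait_value is satisfied once the counter has reached it. */
static int timeline_reached(const timeline_model *t, uint64_t wait_value)
{
    return t->value >= wait_value;
}
```

For frame N, the producer would signal N after writing the frame and the VIF dispatch would wait for N, so one semaphore orders the whole stream instead of one binary semaphore per frame.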
5. CI strategy: lavapipe
Standard GitHub-hosted Actions runners have no GPU, and therefore no Vulkan hardware. Mesa lavapipe (the CPU Vulkan driver) is enough to:
- Validate SPIR-V shaders compile and load.
- Validate VkInstance/VkDevice creation and queue lifecycle.
- Run the VIF kernel end-to-end and compare results against the CPU reference within a looser tolerance than the hardware legs use (llvmpipe's math is not IEEE-strict).
What lavapipe does not cover:
- Real hardware-specific numerical behaviour (fp16/fp32 blending, tile-memory paths on Mali/Adreno).
- Real DMABUF import (no external-memory extensions exposed).
Proposal: lavapipe leg is a "did it compile / does it run" gate only. Hardware-tolerance assertions run on developer machines or a self-hosted runner. The gate-vs-soak distinction mirrors the existing windows-GPU-build-only-legs policy (ADR-0121).
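The gate and the hardware checks can share one comparison and differ only in the tolerance. A sketch, assuming per-frame scores are available as `double` arrays; `first_score_mismatch` is hypothetical, and the tolerance values are left to the caller because this note does not fix them. A runner could choose the looser value when the selected device reports `VK_PHYSICAL_DEVICE_TYPE_CPU`, which is how lavapipe identifies itself.

```c
#include <stddef.h>

/* Compare per-frame Vulkan scores against the CPU reference. Returns the
 * index of the first frame whose absolute difference exceeds abs_tol, or -1
 * if every frame is within tolerance. NaN differences count as mismatches. */
static long first_score_mismatch(const double *cpu_ref, const double *vk,
                                 size_t n, double abs_tol)
{
    for (size_t i = 0; i < n; i++) {
        double d = cpu_ref[i] - vk[i];
        if (d < 0)
            d = -d;
        if (!(d <= abs_tol)) /* negated form so that NaN also fails */
            return (long)i;
    }
    return -1;
}
```

Returning the first failing frame index, rather than a bare pass/fail, makes a red CI leg easier to triage.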
Open questions (for follow-up iterations)
- shaderc packaging: vendor as a `subprojects/` wrap, or require a system install found via `pkg-config shaderc`? A wrap is simpler for first-time contributors; a system install is leaner for distros. Decide in the implementation PR once we have a concrete build time for the wrap.
- First vendor to hardware-validate on: NVIDIA or AMD first? NVIDIA has the most mature Vulkan + external-memory + DMABUF support on Linux; AMD consumer cards are the highest-demand Vulkan-only audience. Probably NVIDIA first (re-uses the CUDA box), AMD second.
- Does VIF-in-Vulkan benefit from subgroup operations? The VIF pyramid reductions are natural fits for subgroup arithmetic (core since Vulkan 1.1; `subgroupAdd` and friends via `GL_KHR_shader_subgroup_arithmetic` in GLSL), optionally with `VK_KHR_shader_subgroup_uniform_control_flow` for stronger reconvergence guarantees. Worth a profiling round, with the AVX-512 CPU path as the speed bar the Vulkan path has to clear.
- Do we expose `libvmaf_vulkan.h` publicly? Yes, symmetric with `libvmaf_cuda.h` / `libvmaf_sycl.h`. Function names start with `vmaf_vulkan_`.
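To see why the pyramid reductions map well onto subgroups, here is a scalar C emulation of what a subgroup sum computes, written as the XOR-butterfly pattern a shuffle-based implementation would use. It assumes a 32-lane subgroup (real sizes vary by vendor and are reported in `VkPhysicalDeviceSubgroupProperties::subgroupSize`); `subgroup_add_emulated` is hypothetical and only illustrates the data flow.

```c
#define SUBGROUP_SIZE 32u /* assumption: 32-wide subgroup */

/* Emulate a subgroup-wide sum with an XOR butterfly: in each round every
 * lane adds the value held by the lane whose index differs in one bit.
 * After log2(SUBGROUP_SIZE) rounds every lane holds the full sum. */
static void subgroup_add_emulated(float lanes[SUBGROUP_SIZE])
{
    float snapshot[SUBGROUP_SIZE];
    for (unsigned offset = SUBGROUP_SIZE / 2; offset > 0; offset >>= 1) {
        /* All lanes read the previous round's values, as in lock-step hardware. */
        for (unsigned i = 0; i < SUBGROUP_SIZE; i++)
            snapshot[i] = lanes[i];
        for (unsigned i = 0; i < SUBGROUP_SIZE; i++)
            lanes[i] = snapshot[i] + snapshot[i ^ offset];
    }
}
```

Every lane ends up with the total after 5 rounds instead of 31 sequential adds. Because the summation order differs from a sequential CPU loop, float results can differ from the scalar reference in the last bits, which is one more reason the lavapipe leg compares against a tolerance rather than bit-exactly.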
Next steps
- Governance PR (this one) lands, opening the road for `/add-gpu-backend vulkan` to scaffold the tree.
- Scaffold PR follows: runtime init, device selection, queue setup, empty feature kernel file. No VIF math yet.
- VIF Vulkan kernel PR: port the existing scalar VIF as a set of compute shaders, one shader per VIF pyramid scale. Validate against scalar on a developer box.
- DMABUF import PR: wire FFmpeg `AV_HWDEVICE_TYPE_VULKAN` frames into the Vulkan VIF path without a host round-trip.
- PSNR / SSIM Vulkan ports: follow the VIF pattern.
- Hardware self-hosted runner investigation: separate workstream.
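For the scaffold PR's device-selection step, one plausible policy is to prefer real GPUs and keep a CPU device such as lavapipe as the last resort, so the same code path serves developer machines and the CI leg. A sketch under that assumption: `dev_desc` and the constants are hypothetical mirrors of `VkPhysicalDeviceType` and `VK_QUEUE_COMPUTE_BIT` (the numeric values match the real ones), and the ranking itself is a suggestion, not a decided policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as VkPhysicalDeviceType and VK_QUEUE_COMPUTE_BIT. */
enum { DEV_OTHER = 0, DEV_INTEGRATED = 1, DEV_DISCRETE = 2, DEV_VIRTUAL = 3, DEV_CPU = 4 };
#define QUEUE_COMPUTE 0x2u

/* Hypothetical summary of one physical device. */
typedef struct {
    int device_type;
    uint32_t queue_family_count;
    uint32_t queue_flags[8]; /* sketch: at most 8 queue families */
} dev_desc;

/* Higher is better; CPU devices (lavapipe) are usable but rank lowest. */
static int type_rank(int t)
{
    switch (t) {
    case DEV_DISCRETE:   return 4;
    case DEV_INTEGRATED: return 3;
    case DEV_VIRTUAL:    return 2;
    case DEV_CPU:        return 1;
    default:             return 0;
    }
}

/* Return the index of the best-ranked device that has a compute-capable
 * queue family (and store that family in *out_family), or -1 if none. */
static int pick_device(const dev_desc *devs, size_t n, uint32_t *out_family)
{
    int best = -1, best_rank = -1;
    uint32_t best_family = 0;
    for (size_t i = 0; i < n; i++) {
        for (uint32_t q = 0; q < devs[i].queue_family_count; q++) {
            if (devs[i].queue_flags[q] & QUEUE_COMPUTE) {
                int r = type_rank(devs[i].device_type);
                if (r > best_rank) {
                    best = (int)i;
                    best_rank = r;
                    best_family = q;
                }
                break; /* first compute-capable family is enough here */
            }
        }
    }
    if (best >= 0 && out_family)
        *out_family = best_family;
    return best;
}
```

A user-supplied device index (as the CUDA and SYCL backends allow) would naturally override this ranking.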