ADR-0445: Persistent VkPipelineCache for Vulkan compute backend¶
- Status: Accepted
- Date: 2026-05-16
- Deciders: Lusoris, Claude (Anthropic)
- Tags:
vulkan,gpu,performance,pipeline-cache,fork-local
Context¶
PR #865 profiling (Research-0135) showed that the Vulkan compute backend recompiles all SPIR-V kernels from scratch on every process start because every vkCreateComputePipelines() call passed VK_NULL_HANDLE as the pipeline cache handle. On NVIDIA RTX 4090 this costs 80–120 ms of cold-start latency across the full kernel set. The Vulkan 1.3 spec §10.6 defines VkPipelineCache as the standard mechanism for persisting compiled PSO blobs across process invocations; drivers skip recompilation on warm starts when the vendorID / deviceID header matches.
Decision¶
We will add a VkPipelineCache pipeline_cache to VmafVulkanContext and load it from $XDG_CACHE_HOME/libvmaf/vulkan-pipeline-cache.bin at context init; serialise it back at destroy. The VkPipelineCacheHeaderVersionOne header is validated (vendor ID + device ID) before reuse. Every vkCreateComputePipelines call in the codebase passes ctx->pipeline_cache instead of VK_NULL_HANDLE. An env-var opt-out (LIBVMAF_VULKAN_PIPELINE_CACHE=0) skips all cache I/O.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Keep VK_NULL_HANDLE | Zero implementation complexity | 80–120 ms cold start on every invocation | Unacceptable for short single-file runs |
| In-process cache only (no disk persistence) | No file I/O | No cross-invocation benefit; still compiles on cold start | Eliminates the main saving |
| Persistent cache (chosen) | Warm start 2–5 ms, bit-exact output unchanged | First run still cold; file must be validated on load | Best outcome per Research-0135 data |
No alternatives needed beyond the above — PR #865 settled the choice; this PR implements option 3.
Consequences¶
- Positive: Process startup drops from ~140 ms to ~25 ms on warm runs (1-frame PSNR-Vulkan benchmark on RTX 4090). Bit-exactness is unaffected (the cache just replays compiled ISA, not any numeric path).
- Negative: First run remains cold (~140 ms). A stale cache file (wrong device) is silently discarded and recreated; no user-visible error.
- Neutral / follow-ups:
- The cache file path follows XDG conventions and is gitignored by default. CI uses
LIBVMAF_VULKAN_PIPELINE_CACHE=0to avoid cross-run state contamination. - Every future
vkCreateComputePipelines()addition must passctx->pipeline_cache— see theAGENTS.mdinvariant below. - The
VkPipelineCacheHeaderVersionOnevalidation does not includedriverVersion— a driver update produces a harmless discarded blob on the first post-update run; the driver itself handles stale ISA internally.
References¶
- Research-0135:
docs/research/0135-vulkan-dispatch-overhead-2026-05-15.md - PR #865 profiling finding (source of this implementation).
- Vulkan 1.3 spec §10.6 "Pipeline Cache".
- ADR-0246: kernel template scaffolding (
kernel_template.h). - Source:
req(PR #865 description — implement VkPipelineCache persistence fix).