Skip to content

ADR-0445: Persistent VkPipelineCache for Vulkan compute backend

  • Status: Accepted
  • Date: 2026-05-16
  • Deciders: Lusoris, Claude (Anthropic)
  • Tags: vulkan, gpu, performance, pipeline-cache, fork-local

Context

PR #865 profiling (Research-0135) showed that the Vulkan compute backend recompiles all SPIR-V kernels from scratch on every process start because every vkCreateComputePipelines() call passed VK_NULL_HANDLE as the pipeline cache handle. On NVIDIA RTX 4090 this costs 80–120 ms of cold-start latency across the full kernel set. The Vulkan 1.3 spec §10.6 defines VkPipelineCache as the standard mechanism for persisting compiled PSO blobs across process invocations; drivers skip recompilation on warm starts when the vendorID / deviceID header matches.

Decision

We will add a VkPipelineCache pipeline_cache to VmafVulkanContext and load it from $XDG_CACHE_HOME/libvmaf/vulkan-pipeline-cache.bin at context init; serialise it back at destroy. The VkPipelineCacheHeaderVersionOne header is validated (vendor ID + device ID) before reuse. Every vkCreateComputePipelines call in the codebase passes ctx->pipeline_cache instead of VK_NULL_HANDLE. An env-var opt-out (LIBVMAF_VULKAN_PIPELINE_CACHE=0) skips all cache I/O.

Alternatives considered

Option Pros Cons Why not chosen
Keep VK_NULL_HANDLE Zero implementation complexity 80–120 ms cold start on every invocation Unacceptable for short single-file runs
In-process cache only (no disk persistence) No file I/O No cross-invocation benefit; still compiles on cold start Eliminates the main saving
Persistent cache (chosen) Warm start 2–5 ms, bit-exact output unchanged First run still cold; file must be validated on load Best outcome per Research-0135 data

No alternatives needed beyond the above — PR #865 settled the choice; this PR implements option 3.

Consequences

  • Positive: Process startup drops from ~140 ms to ~25 ms on warm runs (1-frame PSNR-Vulkan benchmark on RTX 4090). Bit-exactness is unaffected (the cache just replays compiled ISA, not any numeric path).
  • Negative: First run remains cold (~140 ms). A stale cache file (wrong device) is silently discarded and recreated; no user-visible error.
  • Neutral / follow-ups:
  • The cache file path follows XDG conventions and is gitignored by default. CI uses LIBVMAF_VULKAN_PIPELINE_CACHE=0 to avoid cross-run state contamination.
  • Every future vkCreateComputePipelines() addition must pass ctx->pipeline_cache — see the AGENTS.md invariant below.
  • The VkPipelineCacheHeaderVersionOne validation does not include driverVersion — a driver update produces a harmless discarded blob on the first post-update run; the driver itself handles stale ISA internally.

References

  • Research-0135: docs/research/0135-vulkan-dispatch-overhead-2026-05-15.md
  • PR #865 profiling finding (source of this implementation).
  • Vulkan 1.3 spec §10.6 "Pipeline Cache".
  • ADR-0246: kernel template scaffolding (kernel_template.h).
  • Source: req (PR #865 description — implement VkPipelineCache persistence fix).