# ADR-0512: Vulkan VIF Two-Variant Compute Shader (fp32 Auto-Fallback)
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude (Anthropic)
- Tags: vulkan, vif, gpu-parity, precision, compatibility
## Context
ADR-0492 promoted the Vulkan VIF `g` / `sv_sq` / `gg_sigma` accumulators from `precise float` to `double`. This closed a ~7 ULP/px divergence from the CPU reference and passed the ADR-0214 places=4 parity gate on an RTX 4090. As part of that change, backend init was made to refuse to attach on devices that do not advertise `VkPhysicalDeviceFeatures::shaderFloat64`. Those devices fall back to CPU with a `-ENOTSUP` diagnostic.
That hard refusal was a precaution that was never validated empirically. On the Netflix golden 576x324 corpus (`src01_hrc00` vs `src01_hrc01`, yuv420p 8-bit) we measure:

| Path | VMAF | Delta vs CPU |
|---|---|---|
| CPU | 76.66783 | (reference) |
| Vulkan fp64 (RTX 4090, `shaderFloat64`) | 76.66776 | -7e-5 |
| Vulkan fp32 (Intel Arc A380, no fp64) | 76.66775 | -8e-5 |
| Vulkan fp32 (AMD Radeon gfx1036, no fp64) | 76.66774 | -9e-5 |

The fp32 path lands within 2e-5 of the fp64 path and within 1e-4 of CPU. That is well inside the cross-backend tolerance documented in `docs/backends/vulkan/overview.md`. The metric is identical; only the accumulation precision differs. Excluding entire GPU generations (Intel Arc, AMD iGPUs, older NVIDIA) over an unmeasured precision concern is not justifiable when a working fp32 path exists.
## Decision
We will ship the VIF compute shader as two SPIR-V variants:

- `vif_fp64.comp`: the existing double-precision path.
- `vif_fp32.comp`: new. It uses `float` for the `g` / `sv_sq` / `gg_sigma` accumulators, with the `precise` qualifier to block FMA contraction.

Both are embedded into `libvmaf.so` at build time. During context init the Vulkan backend probes `VkPhysicalDeviceFeatures::shaderFloat64` and stores the result in `VmafVulkanContext::has_float64`. The VIF pipeline-creation call then binds whichever variant matches the device. The auto-fallback requires no user opt-in.
For bit-exact-strict workflows, such as CI parity gates that must assert the fp64 path is taken, we add the inverse opt-in: `--vulkan-require-fp64` (CLI) / `VmafVulkanConfiguration::require_fp64` (public C API). When set, the backend reverts to the ADR-0492 refusal behaviour on devices without `shaderFloat64`. The default is off, so most callers get the auto-fallback transparently.
ADR-0492 keeps its body unchanged per the ADR-maintenance rule (immutable once Accepted); only its status changes to `Superseded by ADR-0512`.
## Alternatives considered
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| A: strict-no-opt-in (status quo / ADR-0492) | Bit-exact CPU parity guaranteed; one shader to ship | Excludes Intel Arc, AMD iGPUs and older NVIDIA; entire GPU generations are refused over an unvalidated precision concern | Rejected: the measured fp32 vs CPU delta (< 1e-4) is well within tolerance |
| B: opt-in-flag (user must pass `--vulkan-allow-fp32`) | Conservative default; users explicitly acknowledge the precision trade-off | Same exclusion-by-default as A; extra user-facing friction, since users on Arc / AMD iGPUs hit `-ENOTSUP` first and must discover the flag | Rejected: most users have no reason to care about a ~1e-4 VMAF delta |
| C: auto-relax-flag (single shader, runtime guard relaxed) | Minimal code change | The single shader still uses `double`, so its SPIR-V declares the `Float64` capability, which is not valid on devices without `shaderFloat64`; relaxing the guard only moves the failure to driver-side shader compilation | Rejected: does not actually solve the compatibility problem |
| D (chosen): two-variant-compile | Both precision paths available; runtime auto-pick is transparent; bit-exact-strict workflows can still pin fp64 via opt-in | Two SPIR-V blobs to ship (~2x the VIF shader size); two shader sources to maintain | Chosen: best correctness / compatibility / API-surface trade-off |
## Consequences
- Positive: the Vulkan backend is usable on Intel Arc, AMD integrated and older NVIDIA GPUs. No CLI flag is required for the common case. Bit-exact-strict CI lanes preserve the old behaviour via `--vulkan-require-fp64`.
- Negative: two shader sources must be kept in lockstep; any future change to the integer-VIF inner loop must touch both `vif_fp64.comp` and `vif_fp32.comp`. Embedding both SPIR-V blobs slightly increases the libvmaf binary size (a few KiB per variant).
- Neutral / follow-ups:
  - `docs/backends/vulkan/overview.md` updated to document the auto-pick and the `--vulkan-require-fp64` opt-in (this PR).
  - `docs/state.md` row updated under Open / Recently closed (this PR).
  - `changelog.d/fixed/vulkan-fp32-fallback.md` fragment added (this PR).
  - AGENTS.md note in `core/src/feature/vulkan/AGENTS.md` records the two-variant invariant (this PR).
  - Existing tests cover the fp64 path on `shaderFloat64`-capable devices. This PR adds an fp32-fallback smoke test and a `--vulkan-require-fp64` strict-refusal smoke test.
## References
- ADR-0492: original double-precision promotion + hard refusal (superseded by this ADR).
- ADR-0214: GPU-parity CI gate (places=4 per-frame threshold).
- ADR-0350: `shaderBufferInt64Atomics` probe precedent (same pattern).
- research-0053: NVIDIA FMA contraction investigation (the fp32 `precise` fix that the fp32 variant re-uses).
- Source: user direction (paraphrased): replace the hard `shaderFloat64` refusal with a two-variant compile and runtime auto-pick. The default is auto-fallback to fp32, with the opt-in `--vulkan-require-fp64` for bit-exact-strict workflows. Empirical fp32-vs-fp64-vs-CPU deltas (Intel Arc A380, AMD gfx1036, RTX 4090) were collected on the dev host pre-fix and are documented in the Context table above.