ADR-0391: ciede2000 Vulkan NVIDIA places=4 fork debt is a structural f32/f64 precision gap¶
- Status: Accepted
- Date: 2026-05-03
- Deciders: lusoris, Claude (Anthropic)
- Tags: vulkan, ciede, precision, gpu, nvidia, fork-local
Context¶
PR #346 ("vif + ciede shaders — precise decorations") cut ciede2000's NVIDIA-Vulkan places=4 cross-backend mismatch from 42/48 to 5/48 frames (max abs 1.67e-04 → 8.9e-05) by tagging the load-bearing FP ops in ciede.comp with GLSL precise. The remaining 5/48 tail at 1.78× the places=4 threshold (5.0e-05) was deferred there as "CPU-side double-vs-float bisect follow-up."
This ADR closes that follow-up. The investigation (see research-0055) ran a controlled experiment: rebuild the CPU ciede.c::get_lab_color to do its colour-space chain in float instead of double (matching the Vulkan shader's precision contract), then diff against the NVIDIA RTX 4090 (driver 595.71.05) Vulkan output. Result: float-CPU and NVIDIA-Vulkan agree to ~6e-7 on the 5 frames that fail double-CPU vs NVIDIA-Vulkan at places=4 (frames 0/1/2/5/6 — the highest-ΔE frames in the 48-frame fixture). Conversely, on the 43 frames that pass double-CPU vs NVIDIA-Vulkan, float-CPU vs NVIDIA-Vulkan diverges by ~8.5e-5 (i.e. worse than the gate). The signal is unambiguous: the shader is doing math correctly; the CPU reference is doing math in double and the residual gap is the irreducible f32 vs f64 precision delta on the highest-ΔE pixels, where the per-pixel ΔE sum amplifies single-precision rounding.
The CPU reference's get_lab_color() takes double arguments and runs the BT.709 → linear-RGB → XYZ → Lab chain in double, narrowing to float only on assignment to the LABColor struct. The Vulkan shader, like every fork GPU backend, is float32 throughout. Vulkan's shaderFloat64 device feature is optional (not present on most consumer GPUs at full throughput — RTX 4090 reports it but at 1/64 of fp32 throughput; lavapipe supports it without penalty but lavapipe is already passing places=4) and would require dual-pipeline compilation to keep the lavapipe lane fast. Even if shaderFloat64 were used, the SPIR-V spec doesn't mandate IEEE-754 conformance for f64 transcendentals — another driver-divergence vector.
Decision¶
We accept the 5/48 NVIDIA-Vulkan ciede2000 places=4 mismatch as a structural fork debt and document it under docs/state.md Open bugs as T-VK-CIEDE-F32-F64. We do not promote the ciede shader to f64. The lavapipe parity gate (places=4, currently 0/48) remains authoritative for CI; NVIDIA hardware validation stays a manual local gate documented in docs/backends/vulkan.md. PR #346's precise decorations stay as the high-water mark of what shader-level mitigation can achieve on f32; further work on this tail requires either f64 promotion (rejected here) or f32-remediation in the CPU reference (out of scope — the CPU reference is the source of truth and the Netflix golden gate is built around its current f32/f64 mix).
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Promote ciede shader to f64 with shaderFloat64 device feature | Closes the gap on hardware that supports it (RTX 4090 does, lavapipe does, Mesa anv does) | Optional Vulkan feature — not portable. RTX 4090 runs f64 at 1/64 fp32 throughput (~64× slower per-pixel). SPIR-V f64 transcendentals (Pow, Sqrt, Sin) are not bit-mandated by spec — driver divergence still possible. Would require dual-pipeline (f32 lavapipe / f64 NVIDIA) to keep lavapipe fast | Complexity + perf cost greatly outweighs the 5/48 tail at 1.78× threshold |
| Add CPU-side f32 fast path under a flag, ship f32-CPU as the fork's reference for ciede2000 | Trivially closes the GPU gap on every backend | Breaks the 8-year-old CPU reference behaviour. Netflix golden gate built around current f32/f64 mix. Diverges from upstream Netflix/vmaf master (rebase debt) | Out of scope — CPU reference is source-of-truth; we don't change it to make the GPU look good |
Polynomial approximation of pow(x, 2.4) and pow(x, 1/3) matched to glibc's f64 implementation, evaluated in f32 with Kahan summation | Closes the gap purely shader-side, no f64, no CPU change | Substantial engineering effort to match glibc's specific transcendental implementation; not portable across Vulkan drivers (lavapipe vs NVIDIA Pow SPIR-V lowerings differ); 5/48 tail doesn't justify the complexity | Cost-benefit fails — places=4 threshold at 1.78× is acceptable as documented fork debt |
| Accept as documented fork debt; lavapipe gate authoritative (this ADR) | Zero engineering cost. lavapipe passes places=4 (0/48 mismatches). NVIDIA gap is bounded (1.78× threshold, 5/48 frames). Bug entry in docs/state.md keeps it visible | NVIDIA-hardware users see the gap on local cross-backend diff; mitigated by docs in docs/backends/vulkan.md | Chosen — proportional response to a structural precision issue with no clean shader-level fix |
Consequences¶
- Positive: closes the PR #346 deferred follow-up with a definitive root cause; replaces "5/48 mystery tail" with "irreducible f32/f64 gap on high-ΔE frames"; future investigators don't redo the experiment.
- Negative: NVIDIA-hardware users running the local
cross_backend_vif_diff.py --feature ciede --backend vulkansee 5 failures on the 576×324 fixture. Mitigated by an entry indocs/backends/vulkan.mdcalling this out. - Neutral / follow-ups: T-VK-CIEDE-F32-F64 row added to
docs/state.mdOpen bugs; will close if a future ADR enables optional f64 path (e.g. for an HDR-Lab use case where f64 is already required) or if the CI parity matrix grows an NVIDIA hardware lane via self-hosted runner — that lane would then need a per-feature tolerance override (places=3 for ciede on NVIDIA) to be green.
References¶
- PR #346 ("vif + ciede shaders — precise decorations") commit message reservation: "deferred to a CPU-side double-vs-float bisect follow-up"
- ADR-0187 — original ciede Vulkan kernel
- ADR-0265 (in PR #346, not yet merged) —
precisedecorations rationale - research-0055 — full experimental data
docs/state.mdOpen bugs — T-VK-CIEDE-F32-F64- Source:
req— "investigate this pre-existing places=4 violation on NVIDIA driver, root-cause it, and either ship a fix OR file it as a tracked Open bug"