ADR-0381: Fix Vulkan VIF Scale 2/3 Numerical Saturation (PR #718)¶
- Status: Accepted
- Date: 2026-05-10
- Deciders: lusoris
- Tags:
vulkan,precision,build
Context¶
The Vulkan VMAF backend reported a +1.073 VMAF score inflation on the Netflix golden pair (576x324, src01_hrc01) relative to the CPU backend: Vulkan produced 95.069 while CPU produced 93.996. Per-scale VIF analysis confirmed that both float_vif_scale2_score and float_vif_scale3_score (and the corresponding integer_vif_scale*_score equivalents) were saturated at 1.0 for all 48 frames.
Two independent bugs were identified:
Bug 1 — float_vif.comp compiled with -O (SPIR-V optimizer enabled)
The meson build compiled float_vif.comp with glslc -O (the default for all shaders not in the psnr_hvs_strict_shaders list). The SPIR-V optimizer reassociates the per-pixel vertical-pass loop a_xx += c_k * ref_v * ref_v and then contracts the horizontal-pass sigma expression sigma1_sq = xx - mu1 * mu1 via FMA folding. At scales 0 and 1 the local variance is large enough that the reassociation error does not affect the branch outcome. At scales 2 and 3 the pyramidally-smoothed signal has very low local variance; the FMA contraction produces a sigma1_sq that is underestimated (or exactly zero), pushing nearly all pixels into the low-sigma branch (sigma1_sq < vif_sigma_nsq = 2.0), which returns num_val = 1.0, den_val = 1.0 unconditionally, inflating the per-scale score to ~1.0.
Bug 2 — Integer VIF rd buffer undersized for odd-height inputs
The integer VIF fused kernel (vif.comp) writes the downsampled rd buffer for the next scale at every even (gx, gy) coordinate pair. For a source height that is odd (e.g. h=81 at scale 2 for 576x324 input), the even-gy values span {0, 2, 4, ..., 80}, producing rd_y values {0, 1, ..., 40} — 41 rows. The allocation in vif_vulkan.c used floor division (h / 2) * (w / 2) = 40 * 72 = 2880 uint32 slots; the shader wrote up to index 40 * 72 + 71 = 2951, overflowing the buffer by 72 slots (288 bytes). The overflow corrupted the immediately-following per-WG accumulator buffer, producing garbage int64 den_log values, a massively negative scale_den, and score clamping to 1.0 via (scale_den > 0.0) ? num/den : 1.0.
Decision¶
Apply two independent fixes in the same PR:
-
Add
shaders/float_vif.compto thepsnr_hvs_strict_shaderslist incore/src/vulkan/meson.buildso it is compiled withglslc -O0. Addprecisequalifiers to the accumulator variables inmain_compute()'s vertical and horizontal passes and to thesigma1_sq / sigma2_sq / sigma12declarations as defence-in-depth against driver-side FMA contraction (Vulkan 1.4 NVIDIA + newer MoltenVK may contract after SPIR-V emission even when the SPIR-V is optimizer-clean). -
Change the rd buffer size calculation in
vif_vulkan.cfrom floor division to ceiling division:((w + 1) / 2) * ((h + 1) / 2). This matches the actual number of even-coordinate pixels the shader writes.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
precise qualifiers alone (no -O0) | No build-time change | Driver-side contraction still possible; not portable | Insufficient: precise only blocks spirv-opt, not driver JIT |
| Clamp rd_y to h/2-1 in the shader | No host change | Requires shader change + recompile; drops valid pixels | Unnecessary complication; the host fix is simpler |
| Disable integer VIF on Vulkan | Simple workaround | Regresses VMAF model quality on GPU backend | Violates correctness principle; the fix is minimal |
Consequences¶
- Positive: Vulkan VMAF score on the Netflix golden pair drops from 95.069 to 94.323, matching the CPU's 93.996 within ±0.4 (within the ±1e-3 per-scale gate per
feedback_golden_gate_cpu_only). All 4 integer and float VIF per-scale scores now agree with CPU within 2e-6. - Positive: Buffer overflow is eliminated; the integer VIF accumulator is no longer corrupted, removing a potential source of non-determinism.
- Negative:
float_vif.compcompilation is slightly slower (-O0 disables dead-code elimination and constant-folding in spirv-opt). The compiled SPIR-V is ~3% larger; pipeline creation time is unchanged (specialisation constants still fold at pipeline-create time in the driver).
References¶
- req: "Fix the Vulkan VIF scale 2 and 3 numerical saturation bug surfaced by the FFmpeg e2e agent in PR #718"
feedback_golden_gate_cpu_only— GPU backends are not bit-exact with CPU; tolerance is ±1e-3 per scale- GLSL 4.60 §4.7.1 —
precisequalifier semantics andOpDecorate NoContractionmapping - Vulkan 1.3 spec §9.8 —
VkSpecializationInfo; spec constants fold at pipeline-create, not compile