ADR-1104: Remove AVX-512 dispatch from float VIF convolution to restore Netflix golden scores
- Status: Accepted
- Date: 2026-06-13
- Deciders: Lusoris, Claude (Anthropic)
- Tags: simd, correctness, float-vif, bug-fix
Context
ADR-0504 added AVX-512 dispatch to the three float VIF convolution wrappers (vif_filter1d_s, vif_filter1d_sq_s, vif_filter1d_xy_s) in core/src/feature/vif_tools.c, routing to convolution_f32_avx512_{s,sq_s,xy_s} when VMAF_X86_CPU_FLAG_AVX512 is set at runtime.
ADR-0504 stated: "Netflix golden assertions still pass at their declared places tolerance." This claim was incorrect on AVX-512 hardware.
The AVX-512 float convolution path uses a wider FMA partial-sum tree (16 floats per iteration vs 8 in AVX2). IEEE-754 floating-point addition is not associative, so the wider reduction order rounds differently and produces a VMAFEXEC_score mean of approximately 76.66729 on the Netflix src01 pair instead of 76.66740433333332 (the protected golden assertion in python/test/vmafexec_test.py line 156). The absolute difference is approximately 1.1e-4, which exceeds the places=4 threshold of 5e-5.
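The effect of lane width on rounding can be reproduced with a toy model. The sketch below is not the vmaf kernel: `f32` and `lane_sum` are hypothetical helpers that emulate float32 accumulation with a configurable number of partial sums, and folding the partials left to right is a simplifying assumption (real SIMD kernels usually reduce with a tree). The input is contrived so that the two lane counts visibly disagree.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lane_sum(values, lanes):
    """Sum values using `lanes` independent float32 partial sums,
    then fold the partial sums left to right (toy SIMD reduction)."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        lane = i % lanes
        partial[lane] = f32(partial[lane] + f32(v))
    total = 0.0
    for p in partial:
        total = f32(total + p)
    return total


# Contrived input: a large value, its negation 8 slots later, and a small 1.0.
values = [0.0] * 16
values[0] = 1.0e8
values[1] = 1.0
values[8] = -1.0e8

sum_8 = lane_sum(values, 8)    # 1.0: the large values cancel inside lane 0
sum_16 = lane_sum(values, 16)  # 0.0: 1.0 is absorbed by 1e8 before the cancel
print(sum_8, sum_16)
```

Both results are correctly rounded at every step; only the order of the additions differs. The real convolution shows the same mechanism at a much smaller magnitude.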
The regression was latent from the day ADR-0504 landed: GitHub Actions runners (Azure VMs) do not expose AVX-512, so CI always took the AVX2 code path and the test appeared green. Developer machines with AVX-512 (e.g. Ice Lake, Zen 4) failed locally. The failure surfaced during a full-matrix validation run that exercised the AVX-512 path explicitly.
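A quick way to see which path a given Linux machine is likely to take is to inspect the CPU flags it exposes. The sketch below is an illustration only: `cpu_flags` and `exposes_avx512` are hypothetical helpers, the sample strings are made up, and testing only `avx512f` is an assumption (the set of AVX-512 subsets that VMAF_X86_CPU_FLAG_AVX512 actually requires is not stated here).

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from /proc/cpuinfo-style text as a set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def exposes_avx512(cpuinfo_text):
    """Assumption: the AVX-512 foundation flag is a sufficient indicator."""
    return "avx512f" in cpu_flags(cpuinfo_text)


# Made-up samples standing in for a CI runner and a developer machine.
sample_ci = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma\n"
sample_dev = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma avx512f avx512bw\n"

print(exposes_avx512(sample_ci), exposes_avx512(sample_dev))
```

On a real Linux host the same function can be fed the contents of /proc/cpuinfo, for example `exposes_avx512(open("/proc/cpuinfo").read())`.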
A prior fix (changelog entry avx512-float-conv-width-alignment-emergency) guarded the dispatch behind a row-width alignment check (w % 16 == 0) to prevent out-of-bounds reads on narrower rows, but did not address the numerical divergence.
Decision
Remove the #if HAVE_AVX512 dispatch blocks from vif_filter1d_s, vif_filter1d_sq_s, and vif_filter1d_xy_s in core/src/feature/vif_tools.c. The float VIF path will dispatch to AVX2 (unchanged from upstream Netflix/vmaf) or scalar.
The convolution_f32_avx512_s/sq_s/xy_s implementations in convolution_avx512.c are retained (they are still used by other callers and may be re-enabled for non-golden paths in the future), but are no longer called from vif_tools.c.
This restores full golden-score parity:
- Float model (vmaf_float_v0.6.1.json): 76.66744, diff 3.6e-5 < 5e-5 (places=4 PASS)
- Integer model (vmaf_v0.6.1.json): 76.66783, diff 5.8e-7 < 5e-5 (places=4 PASS)
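The places=4 criterion can be checked directly with the standard library. This is a minimal sketch, assuming the golden test uses `unittest`'s `assertAlmostEqual` semantics (the difference rounded to four decimal places must equal zero); `passes_places4` is a hypothetical helper, and only the float-model figures quoted in this ADR are used.

```python
import unittest

GOLDEN = 76.66740433333332  # golden value quoted in this ADR


def passes_places4(score, golden=GOLDEN):
    """Return True if assertAlmostEqual(score, golden, places=4) would pass."""
    tc = unittest.TestCase()
    try:
        tc.assertAlmostEqual(score, golden, places=4)
        return True
    except AssertionError:
        return False


avx512_ok = passes_places4(76.66729)  # approximate AVX-512 result
avx2_ok = passes_places4(76.66744)    # float model result after this change
print(avx512_ok, avx2_ok)
```

The AVX-512 figure is about 1.1e-4 away from the golden value, which rounds to 0.0001 and fails; the post-change figure is about 3.6e-5 away, which rounds to 0.0 and passes.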
Full test run: 271 passed, 12 skipped, 0 failed (vmafexec_test, vmafexec_feature_extractor_test, quality_runner_test, feature_extractor_test, result_test, ssimulacra2_test).
The integer VIF AVX-512 path (vif_avx512.c) is unaffected — it is a separate code path with its own integer arithmetic and is covered by the explicitly accepted divergence in ADR-0214.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Weaken the golden assertion (lower places) | CI would pass | Violates the project's core rule: golden assertions are immutable; "Fix the code, not the assertion" | Not an option |
| Keep AVX-512 dispatch and round-correct the wider accumulator | Preserves AVX-512 performance | Matching the exact Netflix AVX2 bit pattern with 512-bit arithmetic is not practically achievable without emulating the AVX2 reduction order, which would forfeit the performance benefit | Not feasible |
| Guard dispatch behind enable_avx512=false build flag only | Easy to toggle | Doesn't help on default builds on AVX-512 machines; same correctness problem | Not sufficient |
| Use compensated summation (Kahan) in AVX-512 path | Numerically stable | Adds complexity and latency; correctness target is Netflix golden, not generic precision; AVX2 path already matches | Not needed |
Consequences
- Positive: Netflix golden assertions pass on all hardware (AVX-512 and non-AVX-512) at places=4.
- Neutral: Float VIF throughput on AVX-512 CPUs reverts to the AVX2 level (the level that existed before ADR-0504). This is the upstream Netflix/vmaf performance baseline.
- Negative: The AVX-512 throughput gain from ADR-0504 is lost for the float VIF path (~40-50 % on AVX-512 CPUs per ADR-0504 profile). Re-enabling it would require a different numerical correction strategy.
- Follow-up: If AVX-512 float VIF performance is desired in the future, an ADR must explicitly document the accepted deviation from the Netflix golden and add a CI gate (e.g. a separate snapshot JSON for AVX-512) rather than relying on the integer-path ADR-0214 acceptance.
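One possible shape for the per-ISA snapshot gate mentioned in the follow-up is sketched below. Everything in it is hypothetical: the JSON layout, the `avx2`/`avx512` keys, and the `gate` helper are illustrative assumptions, and the AVX-512 expected value is only the approximate figure quoted in this ADR, used as a placeholder rather than a vetted snapshot.

```python
import json

# Hypothetical snapshot file content, keyed by ISA.
SNAPSHOT_JSON = """
{
  "avx2":   {"expected": 76.66740433333332, "places": 4},
  "avx512": {"expected": 76.66729,          "places": 4}
}
"""


def gate(score, isa, snapshots):
    """Pass if the score matches the snapshot for this ISA at its places."""
    entry = snapshots[isa]
    return round(abs(score - entry["expected"]), entry["places"]) == 0


snapshots = json.loads(SNAPSHOT_JSON)

avx2_pass = gate(76.66744, "avx2", snapshots)        # within 5e-5 of golden
avx512_vs_golden = gate(76.66729, "avx2", snapshots)  # fails against golden
avx512_pass = gate(76.66729, "avx512", snapshots)     # passes its own snapshot
print(avx2_pass, avx512_vs_golden, avx512_pass)
```

Keeping the AVX-512 expectation in a separate entry leaves the Netflix golden value untouched while still making the accepted deviation explicit and testable.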
References
- ADR-0504 (0504-float-convolution-avx512-port.md): the decision this corrects.
- Changelog entry avx512-float-conv-width-alignment-emergency: row-alignment guard (unrelated to numerical divergence; the alignment guard remains in convolution_avx512.c for other callers).
- ADR-0214: precedent for accepted ULP divergence on the integer VIF path; does NOT cover float VIF.
- python/test/vmafexec_test.py line 156: protected Netflix golden assertion (76.66740433333332, places=4).