ADR-0973: Master CI fixes (Metal MS-SSIM fixture dim + ssimulacra2 icpx XYB bit-exactness)
- Status: Accepted
- Date: 2026-05-31
- Deciders: Lusoris
- Tags: ci, simd, metal, ssimulacra2, icpx, bit-exactness, tests
Context

Two CI regressions on master (tip 4948b771c) blocked the merge train:

- macOS Metal MS-SSIM parity (`test_metal_float_ms_ssim_parity`) failed on all three macOS jobs with `CPU: vmaf_read_pictures failed`. Root cause: the test fixture used `FIXTURE_W = 256u`, `FIXTURE_H = 144u`. The CPU `float_ms_ssim` extractor enforces a minimum input dimension at init (`core/src/feature/float_ms_ssim.c:131-138`): `min_dim = GAUSSIAN_LEN << (SCALES - 1) = 11 << 4 = 176`. Because 144 < 176, `init()` returns `-EINVAL`, which propagates to `vmaf_read_pictures`; the CPU twin failed before the Metal path ever ran. The macOS-only signal was misleading: the bug was unconditional, but the test is Metal-gated, so it only ran on Apple runners.
- Linux all-backends `test_ssimulacra2_simd::test_xyb` failed with `linear_rgb_to_xyb SIMD not bit-identical to scalar`. Root cause: icx / icpx (Intel oneAPI 2025.3 in CI, reproduced on 2026.0.0 locally) emits `vfmadd231ps` instructions for the matrix-multiply chain in the inline scalar reference `ref_linear_rgb_to_xyb` (`kM00 * r + m01 * g + kM02 * b + kOpsinBias`), even though the test TU is compiled with both `-ffp-contract=off` and `-fp-model=precise`. The AVX2/AVX-512 SIMD path uses explicit `_mm256_mul_ps` + `_mm256_add_ps` intrinsics (there is no `_mm256_fmadd_ps` in `linear_rgb_to_xyb_avx2`), so it emits separate `vmulps` + `vaddps`. The scalar reference was contracted to FMA and the SIMD path was not, so the results diverged by ~1 ULP per lane and the `memcmp()` bit-exactness assertion failed.
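The arithmetic behind the first regression fits in a few lines of C. This is a sketch, not the extractor's code: `GAUSSIAN_LEN` and `SCALES` take the values quoted above, `ms_ssim_check_dims` is a hypothetical stand-in for the init-time gate, and it assumes both width and height are compared against the same floor.

```c
#include <errno.h>

/* Values quoted in this ADR for the CPU float_ms_ssim extractor. */
enum { GAUSSIAN_LEN = 11, SCALES = 5 };

/* Per-dimension floor: 11 << (5 - 1) = 11 << 4 = 176. */
static unsigned ms_ssim_min_dim(void)
{
    return (unsigned)GAUSSIAN_LEN << (SCALES - 1);
}

/* Hypothetical stand-in for the init-time gate (assumption: both
 * dimensions are checked). Returns -EINVAL below the floor, else 0. */
static int ms_ssim_check_dims(unsigned w, unsigned h)
{
    const unsigned min_dim = ms_ssim_min_dim();
    if (w < min_dim || h < min_dim)
        return -EINVAL;
    return 0;
}
```

With these definitions, `ms_ssim_check_dims(256u, 144u)` (the old fixture) returns `-EINVAL`, while `ms_ssim_check_dims(256u, 192u)` returns `0`.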
Both failures were verified locally in the `vmaf-dev-mcp` container before any code change. See the companion research digest: `docs/research/0973-master-ci-regressions-verified-2026-05-31.md`.
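To make the second regression concrete, the C sketch below (not taken from the test suite; all helper names are hypothetical) contrasts two roundings, as in the SIMD path's separate `vmulps` + `vaddps`, with one fused rounding, as in `vfmadd231ps`. The inputs are chosen so that `a * b` needs one bit more than a float holds: rounding the product before the add drops that bit, a fused multiply-add keeps it. The fused result is emulated in double, which is exact for these particular inputs, so the sketch needs neither libm nor FMA hardware.

```c
#include <stdint.h>
#include <string.h>

/* Two roundings, like vmulps followed by vaddps. The volatile store
 * forces the product to be rounded to float before the add, even if
 * the compiler would otherwise contract the expression into an FMA. */
static float mul_then_add(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

/* One rounding, like vfmadd231ps or fmaf(a, b, c). For the inputs used
 * in the usage note, a*b + c is exact in double, so a single conversion
 * to float yields the correctly rounded fused result. */
static float fused_via_double(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Raw IEEE-754 bits: what a memcmp()-based bit-exactness check sees. */
static uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, `mul_then_add` returns `0.0f` (bits `0x00000000`) and `fused_via_double` returns `2^-24` (bits `0x33800000`). Cancellation exaggerates the gap here; in the XYB matrix chain the difference was about 1 ULP per lane, but any difference at all fails a bitwise comparison.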
Decision

We will:

- Bump `FIXTURE_H` in `core/test/test_metal_float_ms_ssim_parity.c` from `144u` to `192u`, the next multiple of 16 above the 176 floor; it halves cleanly through all four pyramid downsamples (192 → 96 → 48 → 24 → 12). Keep `FIXTURE_W = 256u` (already ≥ 176).
- Add `#pragma clang fp contract(off)` at file scope in `core/test/test_ssimulacra2_simd.c`, paired with a `-Wunknown-pragmas` suppression for GCC. Empirically, icx 2025.3 / 2026.0 ignores `-ffp-contract=off`, `-fp-model=precise`, and `#pragma STDC FP_CONTRACT OFF` for inline scalar code in this TU; this pragma is the narrowest mechanism found that does suppress the FMA contraction (`-fp-model=strict` also works but is broader; see Alternatives considered). icx is clang-based, so it honours the clang FP pragma. GCC does not recognise the pragma, which is harmless because it already emits non-contracted code under `-ffp-contract=off`; the paired suppression silences the resulting warning.
The pragma is scoped to the test TU only. The production SIMD kernels and the production scalar extractor are unchanged — there is no score drift.
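A minimal sketch of what such a file-scope block might look like follows. The guard macros, the push/pop form of the GCC suppression, and `ref_row_dot_bias` are assumptions for illustration, not copied from `core/test/test_ssimulacra2_simd.c`; the function only mimics the shape of one row of the `ref_linear_rgb_to_xyb` chain.

```c
/* GCC does not know the clang pragma; silence -Wunknown-pragmas for it.
 * Clang-based compilers (clang, icx/icpx) define __clang__ and skip this. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunknown-pragmas"
#endif

/* At file scope this disables FP contraction for the rest of the TU on
 * clang-based compilers, so a * b + c stays a separate mul and add. */
#pragma clang fp contract(off)

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif

/* Hypothetical scalar reference for one matrix row plus bias, shaped
 * like the kM00 * r + ... + kOpsinBias chain in the test's reference. */
static float ref_row_dot_bias(float m0, float m1, float m2,
                              float r, float g, float b, float bias)
{
    return m0 * r + m1 * g + m2 * b + bias;
}
```

Placing the pragma once at file scope, rather than inside each reference function, means any scalar helper added to the TU later inherits the non-contracted behaviour without further edits.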
Alternatives considered

| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Bump fixture to 256x192 in the Metal test (chosen) | Trivial and mechanical; makes the test do what it always claimed to do (a 5-scale pyramid fixture) | None | Chosen |
| Lower the `min_dim` enforcement in `float_ms_ssim.c` | None for the test | Breaks the Netflix#1414 invariant: small inputs would silently corrupt mid-pyramid | Rejected: load-bearing invariant |
| Use a different feature with no min-dim | None | Defeats the test's purpose (it validates `float_ms_ssim_metal` specifically) | Rejected |
| Add `#pragma clang fp contract(off)` to the test TU (chosen) | Surgical (one pragma at one file scope); zero impact on production code and scores; mirrors the existing pattern in `core/src/feature/sycl/integer_adm_sycl.cpp`; passes on both GCC and icx | None observed | Chosen |
| Switch `_simd_strict_fp_args` to `-fp-model=strict` for icx | Also suppresses FMA in icx (verified empirically) | Broader semantics (denormals, exceptions) risk unrelated drift elsewhere in the test TU; affects ALL SIMD tests, not just ssimulacra2 | Rejected: the pragma is narrower |
| Switch AVX2/AVX-512 SIMD to `_mm*_fmadd_ps` (mirroring the ADR-0891 ptlr pattern) | Symmetric with the existing ADR-0891 fix for `picture_to_linear_rgb` | Changes the production SIMD score by ~1 ULP; risks breaking the snapshot gate and the cross-extractor parity invariant, because the GCC-compiled scalar `core/src/feature/ssimulacra2.c` (`-ffp-contract=off` by default) does NOT use FMA | Rejected: introduces score drift |
| Rewrite the scalar ref with `volatile` temporaries | Forces no contraction | Verbose and fragile; easy for a future contributor to "clean up" and reintroduce the bug | Rejected: the pragma is self-documenting |
| Add `__attribute__((optnone))` to the ref functions | Definitely disables FMA | Disables ALL optimisations, slows the test, leaves a foot-gun | Rejected |
Consequences

- Positive: the macOS Metal jobs and the Linux all-backends (icpx) job go green again. No production code change, no score drift, no risk to the snapshot gates.
- Negative: adds one more file with a `#pragma clang fp contract(off)` block; future contributors must know that the SSIMULACRA 2 test's scalar reference relies on this pragma when built with icx. This is documented inline at the top of the test file and recapped in `core/test/AGENTS.md`.
- Neutral / follow-ups: a follow-up could also audit `core/src/feature/x86/ssimulacra2_host_avx2.c` and `core/src/feature/x86/ssimulacra2_avx512.c`, which use `#pragma STDC FP_CONTRACT OFF` (ignored by icx) for their scalar tail loops. The test-TU pragma alone resolves the failing test. Those prod-side pragmas are vestigial but harmless on GCC, and they sit inside the strict-FP carve-out static lib (`-ffp-contract=off` + `-fp-model=precise`), whose intrinsics-only SIMD body is the binary surface the test actually compares against. Auditing the prod-side pragmas is out of scope here.
References

- `core/src/feature/float_ms_ssim.c` lines 131-138 (min-dim gate, Netflix#1414).
- `core/test/test_float_ms_ssim_min_dim.c` (existing test proving the 176 floor).
- `core/src/feature/x86/ssimulacra2_avx2.c::ssimulacra2_linear_rgb_to_xyb_avx2` (uses explicit `_mm256_mul_ps` + `_mm256_add_ps`, no FMA intrinsics).
- `core/test/meson.build` lines 25-39, 598-622 (the `_simd_strict_fp_args` + `_ssimulacra2_test_x86_fma_args` regime that this fix supplements).
- ADR-0153 (Netflix#1414: `float_ms_ssim` min-dim init check).
- ADR-0161 / ADR-0162 / ADR-0163 (SSIMULACRA 2 SIMD bit-exactness contract).
- ADR-0214 (cross-backend parity gate).
- ADR-0589 (Metal SSIM L/C/S parity bound).
- ADR-0891 (explicit `fmaf()` unification in `picture_to_linear_rgb`).
- Source: per user direction, `req` ("Fix 2 verified master CI regressions. HARD REQUIREMENT: reproduce EACH failure locally in the `vmaf-dev-mcp` container BEFORE writing any fix. No guessing.")