ADR-0599: Cross-Backend Parity Audit — Full Extractor Matrix (2026-05-18)
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris
- Tags: cuda, sycl, vulkan, hip, ci, parity, audit
Context
The fork ships 17 distinct SYCL GPU twin extractors, 17 CUDA twins, 17 Vulkan twins, and 18 HIP registrations alongside the 20 CPU-only base extractors. Up to this point, parity was verified only for the three production-model features (ADM + VIF + motion) via the lavapipe gate (ADR-0176/0177/0178). A systematic, single-pass audit covering every registered extractor had not been run. Any new extractor registration (e.g. the ADR-0545 dedup pass, ADR-0533 HIP sweep) could silently introduce cross-backend divergence that the existing gate would not detect.
The user directed a full sweep: enumerate every entry in feature_extractor_list[], invoke it on the Netflix golden 576x324 fixture across all available backends (CPU, SYCL, CUDA, Vulkan, HIP), compare pooled means and per-frame values at full IEEE-754 precision (--precision max), and report any divergence at or beyond the ADR-0214 places=4 bar.
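The comparison step described above can be sketched as follows. This is a minimal illustration, not the audit's actual tooling. It assumes each backend run produces a JSON report with a `pooled_metrics` map (metric name to a dict with a `mean` field) and a `frames` list whose entries carry a `metrics` map. It also assumes the ADR-0214 places=4 bar follows `unittest.assertAlmostEqual` semantics, where the difference rounded to four decimals must be zero. The helper names `load_report`, `diverges`, and `compare_reports` are hypothetical.

```python
import json

PLACES = 4  # assumed reading of the ADR-0214 bar (assertAlmostEqual-style rounding)


def load_report(path: str) -> dict:
    """Load one backend's JSON report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def diverges(a: float, b: float, places: int = PLACES) -> bool:
    """True when a and b still differ after rounding the delta to `places` decimals."""
    return round(abs(a - b), places) != 0


def compare_reports(ref: dict, other: dict, places: int = PLACES) -> list:
    """Return (scope, metric, ref_value, other_value) for every divergent value."""
    findings = []
    # Pooled means, keyed by metric name (assumed report layout).
    other_pooled = other.get("pooled_metrics", {})
    for name, stats in ref.get("pooled_metrics", {}).items():
        if name not in other_pooled:
            continue  # metric absent on this backend: a coverage gap, not a numeric delta
        theirs = other_pooled[name]["mean"]
        if diverges(stats["mean"], theirs, places):
            findings.append(("pooled", name, stats["mean"], theirs))
    # Per-frame values, paired by position in the frames list.
    pairs = zip(ref.get("frames", []), other.get("frames", []))
    for idx, (fr, fo) in enumerate(pairs):
        for name, value in fr["metrics"].items():
            if name in fo["metrics"] and diverges(value, fo["metrics"][name], places):
                findings.append((f"frame {idx}", name, value, fo["metrics"][name]))
    return findings
```

Under this reading of places=4, a delta of 3.1e-5 rounds to zero and passes the bar, while a delta of 2e-4 is reported. An exact bit-for-bit check would replace `diverges` with a plain `a != b`.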
Decision
Run the full parity matrix in the vmaf-dev-mcp container on 2026-05-18 and publish results as Research-0550. The audit uses --precision max (IEEE-754 %.17g) to avoid the 6-decimal-place JSON format floor masking sub-1e-5 deltas. Backend isolation is achieved with --no_sycl, --no_cuda, --no_vulkan flags in turn. HIP is probed but limited to scaffold/ENOSYS (no discrete AMD GPU on the audit host).
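The effect of the output format on what the audit can see is illustrated below. This is a sketch using Python's `%`-formatting, whose `%.6f` and `%.17g` conversions follow the C `printf` conventions. The values `a` and `b` are made up for illustration and are not measurements from the audit.

```python
# Two hypothetical scores that differ by 3e-7.
a = 0.25
b = 0.25 + 3e-7


def fmt6(x: float) -> str:
    """Six decimal places, like a fixed-precision report field."""
    return "%.6f" % x


def fmt17(x: float) -> str:
    """17 significant digits, enough to round-trip any IEEE-754 double."""
    return "%.17g" % x


# With six decimals both values print identically, so the delta is invisible.
print(fmt6(a), fmt6(b), fmt6(a) == fmt6(b))

# With %.17g the two strings differ, and parsing them recovers the exact doubles.
print(fmt17(a), fmt17(b), fmt17(a) == fmt17(b))
print(float(fmt17(a)) == a, float(fmt17(b)) == b)
```

Because `%.17g` text parses back to the identical double, string-level or float-level equality on such output is equivalent to a bit-for-bit comparison of the underlying values (NaN aside).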
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Extend existing lavapipe gate (ADM+VIF+motion only) | Already automated, zero new code | Covers only 3 of 18 extractors; new extractors remain unaudited indefinitely | Does not fulfil the "every registered extractor" requirement |
| Per-PR cross-backend matrix in CI | Catches regressions immediately | Requires lavapipe + CUDA + SYCL runners simultaneously; expensive; gate time >5 min | Deferred; this audit is a one-shot snapshot to establish the baseline |
| ULP comparison (integer representation) | Finest-grained metric | Requires source-level instrumentation; pooled JSON is the only output surface | --precision max gives full double precision without instrumentation |
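For readers unfamiliar with the ULP metric named in the last row, the sketch below shows one common way to compute an ULP distance between two doubles from their integer representation. It only illustrates the metric and is not something the audit ran. The helper names `ordered_bits` and `ulp_distance` are hypothetical, and NaN inputs are not handled.

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a double to an integer whose ordering matches the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles are stored sign-magnitude; negate the magnitude so the
    # resulting integers increase monotonically with the float value.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles separating a and b (0 means identical)."""
    return abs(ordered_bits(a) - ordered_bits(b))


# Values parsed back from %.17g text are the exact doubles that were printed,
# so the distance can be computed from report values alone.
x = 1.0
y = float("%.17g" % (1.0 + 2.0 ** -52))  # the next double above 1.0
print(ulp_distance(x, x), ulp_distance(x, y))
```

A distance of 0 corresponds to the exact bit-for-bit agreement that a full-precision comparison checks, with the one exception that +0.0 and -0.0 also map to distance 0 here.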
Consequences
- Positive: Establishes a clean all-exact baseline for all 18 CPU extractors across SYCL, CUDA, and Vulkan as of commit e5d26e238 (2026-05-18). No P0 or P1 findings. The ADR-0214 places=4 gate is satisfied by a margin of several orders of magnitude (exact bit-for-bit agreement) for every tested extractor. The earlier 3.1e-5 ADM-scale1 delta documented in metrics-backends-matrix.md is no longer observed; it was eliminated by the ADR-0178 + ADR-0545 kernel hardening and registry dedup waves.
- Negative: HIP parity is still unmeasured (no discrete AMD GPU on the audit host). A follow-up on a gfx1100/gfx1030 host is required. The ADR-0551 audit confirmed real HSACO kernels exist for all 18 HIP extractors; parity numbers are pending.
- Neutral: speed_chroma, speed_temporal, and integer ssim (vmaf_fex_ssim) have no GPU twins. This is an intentional design choice, documented here as a registration coherence note (not a bug).
References
- Research digest: Research-0550
- ADR-0214 (cross-backend parity gate, places=4 requirement)
- ADR-0176/0177/0178 (Vulkan VIF/motion/ADM gate, original scope)
- ADR-0545 (registry dedup pass that reduced Vulkan entries from 67 to 18, plus SYCL twin dedup)
- ADR-0533 (HIP extractor sweep registration)
- ADR-0551 (HIP extractor audit — confirmed all 18 have real HSACO kernels)
req: The user directed the full systematic sweep in the session briefing of 2026-05-18.