ADR-0599: Cross-Backend Parity Audit - Full Extractor Matrix (2026-05-18)

Status: Accepted
Date: 2026-05-18
Deciders: lusoris
Tags: cuda, sycl, vulkan, hip, ci, parity, audit

Context

The fork ships 17 distinct SYCL GPU twin extractors, 17 CUDA twins, 17 Vulkan twins, and 18 HIP registrations alongside the 20 CPU-only base extractors. Up to this point, parity was verified only for the three production-model features (ADM + VIF + motion) via the lavapipe gate (ADR-0176/0177/0178). A systematic, single-pass audit covering every registered extractor had not been run. Any new extractor registration (e.g. the ADR-0545 dedup pass, ADR-0533 HIP sweep) could silently introduce cross-backend divergence that the existing gate would not detect.

The user directed a full sweep: enumerate every entry in feature_extractor_list[], invoke it on the Netflix golden 576x324 fixture across all available backends (CPU, SYCL, CUDA, Vulkan, HIP), compare pooled means and per-frame values at full IEEE-754 precision (--precision max), and report any divergence at or beyond the ADR-0214 places=4 bar.
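The comparison step described above can be sketched as follows. This is a minimal illustration, not the audit's actual tooling: it assumes that "places=4" follows the convention of Python's `unittest.assertAlmostEqual` (the difference rounds to zero at four decimals), that the pooled mean is the arithmetic mean of the per-frame values, and that each backend's scores have already been loaded into a dict mapping a feature name to a non-empty list of per-frame values. The function names and the feature keys in the usage note are hypothetical.

```python
from statistics import fmean


def agrees(a: float, b: float, places: int = 4) -> bool:
    # Assumed reading of "places=4": the difference rounds to zero at 4 decimals.
    return round(a - b, places) == 0


def divergent_features(cpu: dict, gpu: dict, places: int = 4) -> list:
    """Return the names of features whose per-frame values or pooled mean diverge."""
    bad = []
    for name, cpu_vals in cpu.items():
        gpu_vals = gpu[name]
        same_len = len(cpu_vals) == len(gpu_vals)
        # Every frame must agree, not just the pooled value.
        frames_ok = same_len and all(
            agrees(c, g, places) for c, g in zip(cpu_vals, gpu_vals)
        )
        # Pooled mean, assumed here to be the arithmetic mean over frames.
        mean_ok = same_len and agrees(fmean(cpu_vals), fmean(gpu_vals), places)
        if not (frames_ok and mean_ok):
            bad.append(name)
    return bad
```

Called as `divergent_features(cpu_scores, sycl_scores)`, an empty list means the backend pair meets the bar for every feature present in the CPU reference.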

Decision

Run the full parity matrix in the vmaf-dev-mcp container on 2026-05-18 and publish results as Research-0550. The audit uses --precision max (IEEE-754 %.17g) so that the 6-decimal-place JSON format floor cannot mask sub-1e-5 deltas. Backends are isolated by passing the --no_sycl, --no_cuda, and --no_vulkan flags in turn. HIP is probed but limited to scaffold/ENOSYS (no discrete AMD GPU on the audit host).
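The effect of the format floor can be shown with a short sketch. It assumes the default 6-decimal output behaves like a `%.6f` conversion; the two sample scores are made up for illustration. A `%.17g` conversion, by contrast, always carries enough digits to round-trip an IEEE-754 double exactly.

```python
def fixed6(x: float) -> str:
    # Assumed shape of the default 6-decimal output.
    return f"{x:.6f}"


def full(x: float) -> str:
    # 17 significant digits are enough to round-trip any IEEE-754 double.
    return f"{x:.17g}"


# Two hypothetical per-frame scores that differ by about 3e-7.
score_a = 0.9734281
score_b = 0.9734284

masked = fixed6(score_a) == fixed6(score_b)   # both print as 0.973428
visible = full(score_a) != full(score_b)      # the delta survives at %.17g
lossless = float(full(score_a)) == score_a    # parsing the text recovers the double
```

With these inputs `masked`, `visible`, and `lossless` are all true: the 6-decimal strings are identical while the full-precision strings differ and parse back to the original values.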

Alternatives considered

Option	Pros	Cons	Why not chosen
Extend existing lavapipe gate (ADM+VIF+motion only)	Already automated, zero new code	Covers only 3 of 18 extractors; new extractors remain unaudited indefinitely	Does not fulfil the "every registered extractor" requirement
Per-PR cross-backend matrix in CI	Catches regressions immediately	Requires lavapipe + CUDA + SYCL runners simultaneously; expensive; gate time >5 min	Deferred; this audit is a one-shot snapshot to establish the baseline
ULP comparison (integer representation)	Finest-grained metric	Requires source-level instrumentation; pooled JSON is the only output surface	--precision max gives full double precision without instrumentation
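For readers unfamiliar with the ULP alternative in the last row, the following is a minimal sketch of an integer-representation distance between two doubles. It is a generic illustration of the technique, not code from the fork, and the function names are hypothetical. NaN inputs are not handled.

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a double onto an integer line that preserves numeric ordering."""
    (i,) = struct.unpack("<q", struct.pack("<d", x))
    # IEEE-754 is sign-magnitude: negative values need their magnitude negated
    # so that -0.0 and +0.0 both map to 0 and ordering stays monotonic.
    return i if i >= 0 else -(i & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 means bit-identical or +/-0)."""
    return abs(ordered_bits(a) - ordered_bits(b))
```

A distance of 0 corresponds to the bit-for-bit agreement reported in this audit; a distance of 1 means the two values are adjacent doubles.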

Consequences

Positive: Establishes a clean all-exact baseline for all 18 CPU extractors across SYCL, CUDA, and Vulkan as of commit e5d26e238 (2026-05-18). No P0 or P1 findings. Every tested extractor agrees bit-for-bit, so the ADR-0214 places=4 gate is satisfied with no measurable delta. The earlier 3.1e-5 ADM-scale1 delta documented in metrics-backends-matrix.md is no longer observed; it was eliminated by the ADR-0178 + ADR-0545 kernel hardening and registry dedup waves.
Negative: HIP parity is still unmeasured because the audit host has no discrete AMD GPU. A follow-up on a gfx1100/gfx1030 host is required. The ADR-0551 audit confirmed that real HSACO kernels exist for all 18 HIP extractors; parity numbers are pending.
Neutral: speed_chroma, speed_temporal, and integer ssim (vmaf_fex_ssim) have no GPU twins. This is an intentional design choice, documented here as a registration coherence note (not a bug).

References

Research digest: Research-0550
ADR-0214 (cross-backend parity gate, places=4 requirement)
ADR-0176/0177/0178 (Vulkan VIF/motion/ADM gate, original scope)
ADR-0545 (registry dedup pass: reduced Vulkan entries from 67 to 18, SYCL twin dedup)
ADR-0533 (HIP extractor sweep registration)
ADR-0551 (HIP extractor audit — confirmed all 18 have real HSACO kernels)
req: The user directed the full systematic sweep in the session briefing of 2026-05-18.