Cross-backend GPU-parity gate¶

The GPU-parity gate is the fork's single matrix gate for verifying that every GPU backend agrees with the CPU scalar reference (and with each other) on every feature with a GPU twin. It generalises the older single-feature gate (scripts/ci/cross_backend_vif_diff.py) to a one-shot run that covers every (feature, backend-pair) cell.

The gate is enforced on every PR as the CI job vulkan-parity-matrix-gate in .github/workflows/tests-and-quality-gates.yml.

What it gates¶

Per-feature absolute tolerance. Default 5e-5 (places=4 — the fork's GPU-vs-CPU contract from ADR-0125 / ADR-0138 / ADR-0140). Feature-specific relaxations live in the FEATURE_TOLERANCE table inside scripts/ci/cross_backend_parity_gate.py and each carries an ADR pointer:

Feature	Tolerance	Contract source
`vif`, `motion`, `motion_v2`, `adm`, `psnr`, `float_moment`	`5e-5`	ADR-0125 / ADR-0138 / ADR-0140
`float_ssim`, `float_ms_ssim`, `float_psnr`, `float_motion`, `float_vif`, `float_adm`	`5e-5`	ADR-0188 / ADR-0192
`ciede`	`5e-3`	ADR-0187 (per-pixel pow/sqrt/sin/atan2)
`psnr_hvs`	`5e-4`	ADR-0191 (DCT + per-block float reduction)
`ssimulacra2`	`5e-3`	ADR-0192 (XYB cube root + IIR blur)

Vulkan motion is routed through the canonical integer_motion_vulkan extractor in both parity scripts. The legacy motion_vulkan extractor remains addressable by explicit name for compatibility, but it is not the lavapipe gate path after ADR-0662.

Backend pairs. The CI lane runs cpu ↔ vulkan only — Mesa lavapipe runs on stock GitHub-hosted Ubuntu and needs no GPU hardware. CUDA / SYCL / hardware-Vulkan are advisory until a self-hosted runner is wired in (see docs/development/self-hosted-runner.md); the script already supports them via --backends cpu cuda sycl vulkan.
FP16 features. A --fp16-features flag forces the 1e-2 absolute tolerance the FP16 contract calls for. Used by the future ONNX Runtime tiny-AI lane (T7-39).

How to run it locally¶

# Build CPU + Vulkan once
cd libvmaf && meson setup build \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build

# Run the matrix gate on the fork's stock 576×324 fixture
cd ..
python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --backends cpu vulkan \
    --json-out /tmp/parity.json --md-out /tmp/parity.md

Default fixture in CI is the Netflix src01_hrc00_576x324.yuv ↔ src01_hrc01_576x324.yuv pair fetched from the Netflix/vmaf_resource mirror with caching — same fixture as the legacy lane.

How to read failure output¶

The gate writes two artefacts:

JSON (--json-out) — one record per cell with status, tolerance_abs, n_frames, per_metric_max_abs_diff, per_metric_mismatches, and a free-text note field for ERROR-class failures (binary not built, frame-count mismatch, Vulkan ICD init failure, etc.). Schema is versioned in the top-level schema_version field.
Markdown (--md-out) — single status table plus a per-failure detail block. Suitable for direct PR comment.

A cell's status is one of:

Status	Meaning
`OK`	Every per-frame metric within tolerance
`FAIL`	At least one per-frame mismatch beyond `tolerance_abs`
`ERROR`	Run failed before diffing (binary missing, fixture missing, ICD init failed, frame count mismatch)

How to add a new feature to the gate¶

Add the feature → metric-name list to FEATURE_METRICS in scripts/ci/cross_backend_parity_gate.py. The metric names must match the keys the vmaf JSON output emits when the feature is selected via --feature <name>.
If the feature is FP32 with no relaxations, no further change is needed — it picks up the default 5e-5 (places=4) automatically.
If the feature carries a relaxed contract, add an entry to FEATURE_TOLERANCE with an inline comment citing the ADR that justifies the relaxation. Per CLAUDE.md §12 r1 ("Netflix golden assertions are untouchable") any tightening later requires a measurement-driven follow-up ADR.
Add a row to the table in this document.

Relationship to other gates¶

Gate	Role
Netflix golden (§8)	CPU numerical correctness; required status check, untouchable
GPU-parity matrix gate (this doc, T6-8)	CPU↔GPU agreement on every feature; required status check
`vulkan-vif-arc-nightly`	Hardware-Vulkan drift versus lavapipe (advisory, self-hosted)
Per-backend snapshot diffs (`testdata/scores_cpu_*.json`)	Per-backend regression catch (snapshot-based, not pairwise)

Source¶

ADR: ADR-0214.
Backlog row: T6-8.
Wave-1 roadmap §7.