The fork's CI build matrix in .github/workflows/libvmaf-build-matrix.yml and .github/workflows/tests-and-quality-gates.yml accumulated redundant legs over successive PRs without a systematic review. As of 2026-05-28 the matrix ran 20 active build rows plus separate test jobs, with several rows that were proper subsets of other rows (the plain CPU legs covered no failure mode not already caught by the DNN legs). The VMAFX rebrand plan (ADR-0686 umbrella) called for "CI matrix: dedupe properly — catch everything but don't run a hundred times; avoid the current N-way matrix fan-out."
Additionally, tests-and-quality-gates.yml contained two Vulkan cross-backend jobs (vulkan-vif-cross-backend and vulkan-parity-matrix-gate) that ran the identical workload — same binary, same fixtures, same lavapipe ICD, same Netflix normal pair — with the newer parity gate being a strict superset of the older per-feature diff job.
We remove the following legs from the PR CI matrix and consolidate as described:
Removed from libvmaf-build-matrix.yml — libvmaf-build job

| Removed row | Rationale |
| --- | --- |
| Build — Ubuntu gcc (CPU) | Proper subset of Build — Ubuntu gcc (CPU) + DNN. The DNN build compiles the entire CPU source tree with the same compiler, additionally links ORT, and runs meson test. No unique regression class. |
| Build — Ubuntu clang (CPU) | Proper subset of Build — Ubuntu clang (CPU) + DNN. Same rationale as above. |
| Build — macOS clang (CPU) | Proper subset of Build — macOS clang (CPU) + DNN (experimental: true). macOS DNN leg already validates the macOS clang + libvmaf surface. |
| Build — macOS Vulkan via MoltenVK (advisory) | Was continue-on-error: true and experimental: true — never a merge gate. Consumed ~12+ min of macOS runner time per PR. Known fragile dependency: GL_EXT_shader_atomic_int64 via Metal Tier-2 argument buffers (ADR-0338 known limitations). Moved to nightly.yml where the advisory cadence is appropriate. |
| Build — Ubuntu CUDA (dynamic, gcc+nvcc) | Proper subset of Build — Ubuntu SYCL + CUDA (combined leg). All CUDA translation units are compiled by nvcc in the combined leg. The static-library CUDA leg (Build — Ubuntu CUDA Static) is retained because it exercises a distinct NVCC-static interaction that the dynamic combined leg does not. |
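The "proper subset" argument used in the table can be sketched as a set comparison. This is a minimal illustration only: the coverage tags below (`cpu-sources`, `ort-link`, `nvcc-static-link`, and so on) are hypothetical labels invented for the example, not values read from the workflow files, and `redundant_legs` is a hypothetical helper.

```python
# Hypothetical coverage model: each leg maps to the set of things it exercises.
# The tag names are illustrative assumptions, not data from the workflows.
COVERAGE = {
    "Build — Ubuntu gcc (CPU)": {"ubuntu", "gcc", "cpu-sources", "meson-test"},
    "Build — Ubuntu gcc (CPU) + DNN": {
        "ubuntu", "gcc", "cpu-sources", "meson-test", "ort-link",
    },
    "Build — Ubuntu CUDA (dynamic, gcc+nvcc)": {
        "ubuntu", "gcc", "nvcc-compile", "dynamic-link",
    },
    "Build — Ubuntu SYCL + CUDA": {
        "ubuntu", "gcc", "nvcc-compile", "dynamic-link", "sycl-compile",
    },
    "Build — Ubuntu CUDA Static": {
        "ubuntu", "gcc", "nvcc-compile", "nvcc-static-link",
    },
}


def redundant_legs(coverage):
    """Map each leg to another leg whose coverage is a proper superset of it."""
    redundant = {}
    for leg, tags in coverage.items():
        for other, other_tags in coverage.items():
            # `<` on sets is the proper-subset test: contained in, and not equal.
            if leg != other and tags < other_tags:
                redundant[leg] = other
                break
    return redundant


if __name__ == "__main__":
    for leg, superset in redundant_legs(COVERAGE).items():
        print(f"{leg!r} is covered by {superset!r}")
```

Under these assumed tags, the static CUDA leg is not flagged, because its `nvcc-static-link` tag appears in no other leg. That mirrors the reason given for retaining it.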
The vulkan-vif-cross-backend job is removed from tests-and-quality-gates.yml as a pure duplicate of vulkan-parity-matrix-gate. Both jobs build CPU+Vulkan with the same meson flags, select lavapipe (VK_LOADER_DRIVERS_SELECT: '*lvp*'), run against the Netflix normal pair (576×324), and diff per-frame feature scores. The parity gate uses cross_backend_parity_gate.py with calibrated ULP tolerances from gpu_ulp_calibration.yaml and covers all 17 features in one invocation; the removed job used cross_backend_vif_diff.py in 12 sequential steps. No coverage is lost.
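For readers unfamiliar with ULP-based tolerances, the following sketch shows one common way to compare per-frame scores within a ULP budget. It is not the implementation in cross_backend_parity_gate.py: the function names, the choice of 32-bit floats, and the single `max_ulp` parameter are all assumptions made for illustration.

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a value (rounded to float32) onto a monotonic integer line."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Non-negative floats keep their bit pattern; negative floats are mirrored
    # below zero so that +0.0 and -0.0 both map to 0.
    return bits if bits < 0x80000000 else 0x80000000 - bits


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float32 steps between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))


def frames_within_tolerance(cpu_scores, gpu_scores, max_ulp: int) -> bool:
    """True if every per-frame score pair differs by at most max_ulp ULPs."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("frame counts differ between backends")
    return all(
        ulp_distance(c, g) <= max_ulp for c, g in zip(cpu_scores, gpu_scores)
    )


if __name__ == "__main__":
    cpu = [1.0, 0.5]
    gpu = [1.0 + 2.0 ** -23, 0.5]  # first frame is one float32 step away
    print(frames_within_tolerance(cpu, gpu, max_ulp=1))
    print(frames_within_tolerance(cpu, gpu, max_ulp=0))
```

A per-feature tolerance table, as the calibration file name suggests, would amount to calling `frames_within_tolerance` once per feature with that feature's own `max_ulp`.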
The macOS Vulkan via MoltenVK leg moves to nightly.yml, which preserves macOS MoltenVK coverage at a lower cadence. continue-on-error: true is unchanged because MoltenVK-specific SPIR-V failures are a known-limitation class, not a correctness gate.
Required checks: none affected. Every removed leg was advisory (not listed in required-aggregator.yml). The aggregator's required array is unchanged. No branch-protection update needed.
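The claim that no required check is affected amounts to a disjointness test between the removed legs and the aggregator's required array. A minimal sketch follows. The contents of `REQUIRED` are placeholders, since the actual array in required-aggregator.yml is not reproduced here, and `required_checks_affected` is a hypothetical helper.

```python
# Legs removed by this decision, as named in the text above.
REMOVED = [
    "Build — Ubuntu gcc (CPU)",
    "Build — Ubuntu clang (CPU)",
    "Build — macOS clang (CPU)",
    "Build — macOS Vulkan via MoltenVK (advisory)",
    "Build — Ubuntu CUDA (dynamic, gcc+nvcc)",
    "vulkan-vif-cross-backend",
]

# Placeholder only: the real required array lives in required-aggregator.yml
# and is not shown in this document.
REQUIRED = [
    "Build — Ubuntu gcc (CPU) + DNN",
    "vulkan-parity-matrix-gate",
]


def required_checks_affected(removed, required):
    """Return the removed legs that are also required checks, sorted by name."""
    return sorted(set(removed) & set(required))


if __name__ == "__main__":
    affected = required_checks_affected(REMOVED, REQUIRED)
    if affected:
        print("branch protection must be updated for:", affected)
    else:
        print("no required check removed")
```

An empty result is the condition under which the required array and branch protection can stay as they are.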
Positive: The PR matrix shrinks from 20 active build rows to 15. tests-and-quality-gates.yml loses one complete duplicate job. MoltenVK advisory coverage is preserved in nightly. Approximate wall-clock saving per PR: ~15–25 minutes of runner time (3 × plain CPU legs, 1 × Ubuntu CUDA dynamic leg, and 1 × macOS Vulkan MoltenVK leg dropped).
Negative: A regression that manifests only on plain CPU gcc/clang builds without ORT linked in would no longer be caught on the PR matrix. The risk is judged effectively zero: ORT is header-included and not invasive, and the DNN flag adds only ORT link steps.
Neutral / follow-ups: The continue-on-error field on the libvmaf-build job is simplified to false unconditionally (the only row that set it to true was the MoltenVK row, which is now gone from this job).