
Rebase notes

fix/codeql-quality-batch — code-scanning hygiene (2026-06-27)

Small behaviour-neutral quality fixes. Upstream-mirror touches to re-apply on the next upstream sync: removed an unused from collections.abc import Hashable in compat/python-vmaf/tools/decorator.py, and unused pytest/tempfile imports in two compat/python-vmaf/tests/ files. Fork files: core/tools/vmaf.cpp (2-label switch→if), and new include guards on core/src/feature/moment.h / alias.h. The bulk of the code-scanning backlog was resolved by dismissal (verified false-positive/intentional via the GitHub code-scanning API), not code change — see docs/state.md T-CODEQL-QUALITY-BATCH.

fix/round4-ffmpeg-patches — libvmaf_sycl filter leak + QSV NULL guard (2026-06-27)

ffmpeg-patches/0005-libvmaf-add-libvmaf-sycl-filter.patch gained two fixes in its libavfilter/vf_libvmaf.c hunk (new-count 335→353): uninit_sycl now calls vmaf_sycl_state_free() after vmaf_close(), and do_vmaf_sycl NULL-guards the QSV mfxHDLPair chain. The patch was regenerated surgically (only those two + blocks and the hunk-header recount; the configure/Makefile/allfilters hunks are byte-unchanged). Do NOT let git format-patch/git am --3way regenerate the whole patch: that fuzzed the configure probe (>= 3.0.0 → 2.0.0, libvmaf/libvmaf_sycl.h → libvmaf_sycl.h). Verified by a full 16-patch git apply --3way series replay against n8.1.1. Keep these two + blocks on re-sync. Finding #21 (a redundant but idempotent check_pkg_config libvmaf_sycl configure probe) was intentionally left in place.

fix/round4-cli-build-go — round-4 audit bug-fix bundle (2026-06-27)

All fork-added/fork-modified surfaces (no upstream-mirror conflict risk):

  • core/src/meson.build — added _x86_simd_strict_fp_extra to the x86_float_adm_avx2 / x86_float_adm_avx512 carve-outs (icx fp-model parity; no-op on gcc/clang). Keep when re-syncing the meson SIMD carve-out block.
  • core/tools/vmaf.cpp, core/tools/vmaf_bench.c — fork-added CLI timing helpers (wall_time_s / now_ms): zero-init + cached static QPF frequency.
  • core/tools/meson.build — comment-only path fix.
  • pkg/libvmaf/paths.go — fork-added Go MCP path allowlist (AllowedRoots fail-closed via discoverRepoRoot). RepoRoot() signature unchanged.

fix/round4-c-bundle — round-4 audit bug-fix bundle (2026-06-27)

Audit-derived bug fixes; several touch upstream-mirror files, so the next upstream sync must preserve these hunks (they are not in Netflix/vmaf):

  • core/src/feature/ciede.c — init returns -EINVAL (not -ENOMEM) for an unsupported bitdepth; close uses two independent vmaf_picture_unref guards (was a conjunctive guard that leaked s->ref on partial alloc).
  • core/src/feature/cambi.c — close guards vmaf_picture_unref on s->pics[i].ref so never-allocated slots don't poison err.
  • core/src/feature/integer_ssim.c — comment-only correction of the GPU-twin note (the const double sm fix itself is unchanged).
  • core/src/read_json_model.c — partial-collection teardown on the non-string-key early return in model_collection_parse_loop.
  • core/src/model.c: vmaf_model_collection_append short-name path now takes goto fail_model to free mc->model.
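The ciede.c close fix in the list above is an instance of a general cleanup pattern, which the following minimal sketch contrasts with the leaky form. The Picture and State types, the dist field, and picture_unref are hypothetical stand-ins introduced for illustration; they are not the libvmaf definitions, and only the s->ref name comes from the note.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the libvmaf types. */
typedef struct { void *data; } Picture;
typedef struct { Picture ref; Picture dist; } State;

static int unref_calls; /* counts releases so the behaviour can be observed */

static int picture_unref(Picture *p)
{
    p->data = NULL;
    unref_calls++;
    return 0;
}

/* Conjunctive guard: releases nothing unless BOTH pictures were allocated,
 * so a partially initialised state leaks the one that was allocated. */
static int close_conjunctive(State *s)
{
    int err = 0;
    if (s->ref.data && s->dist.data) {
        err |= picture_unref(&s->ref);
        err |= picture_unref(&s->dist);
    }
    return err;
}

/* Independent guards: each picture is released if, and only if, it exists. */
static int close_independent(State *s)
{
    int err = 0;
    if (s->ref.data)
        err |= picture_unref(&s->ref);
    if (s->dist.data)
        err |= picture_unref(&s->dist);
    return err;
}
```

With only ref allocated, close_conjunctive releases nothing while close_independent releases exactly one picture, which is the partial-allocation case the note describes.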

Fork-added files (no upstream-sync concern): core/src/feature/cuda/speed_*, core/src/dnn/ort_backend.c.

gorust-rederive — bound GPU/AI probe subprocesses + Rust -sys picture double-free footgun (2026-06-27)

Rebase impact: none on upstream Netflix/vmaf — all fork-local. Every touched surface (pkg/gpu/detect.go, pkg/ai/infer.go, bindings/rust/vmafx-sys/src/safe.rs, core/src/meson.build) is fork-added Go/Rust/build code with no upstream counterpart, so a future /sync-upstream sees no conflict here.

Cross-crate invariant: vmafx-sys::safe::VmafContext::read_pictures and the higher-level vmafx::Context::read_pictures must keep aligned picture-ownership semantics — both consume pictures by value (move) and neither manually unrefs on the error path (the libvmaf contract takes ownership for the call's duration; a second unref is a use-after-free against a CUDA-enabled libvmaf). The vmafx crate side was settled by PR #1056 (round-3 R3-2); this change brings the -sys crate to the same contract. Do not revert either to a borrowing signature or re-add an error-path unref. See bindings/rust/vmafx-sys/AGENTS.md.

Note: this change also restores docs/state.md (truncated to 0 bytes by PR #1055, the pelorus ABI re-vendor) and docs/rebase-notes.md itself (truncated to 0 bytes by PR #1060, the FMA-ADM fix). Both were unrelated accidental wipes on master and are recovered here from their last-good blobs.

feat/pelorus-abi-minor3-consume — re-pin vendored Pelorus ABI to minor-3 + consume PEL_SEC_COMPLEXITY (2026-06-27)

Rebase impact: none on upstream Netflix/vmaf; all fork-local (ADR-1120, builds on ADR-1113 + ADR-1118).

  • Cross-repo ABI parity invariant: the vendored Pelorus interop mirror is single-sourced in VMAFx/pelorus (ADR-0103) and pinned by PELORUS_VENDOR_SHA in scripts/sync-pelorus-interop.sh, now 818d844 (ABI 1.3, was 835e097 / ABI 1.0). The drift-guard CI gate (sync-pelorus-interop.sh without --update) fails on any divergence from the pin, so a future maintainer must NOT hand-edit the vendored files (core/include/libvmaf/pelorus/*.h, core/src/interop/pelorus_*.c, and the body of core/test/test_pelorus_interop.c from its first vendored #include on). Fix defects upstream in pelorus and re-vendor via --update.
  • Manifest invariant: the script's manifest array, core/src/meson.build, and the test_pelorus_interop target in core/test/meson.build must stay in lockstep with the pelorus source set. Minor-3 added pelorus/denoise.h + pelorus_denoise_params.c + pelorus_qp_report_csv.c; the last is REQUIRED to link the fixture (pel_x265_csv_parse).
  • --update now re-vendors the fixture body (previously only the six manifest files), preserving the Lusoris-authored header before the first vendored include. The drift check compares the body whitespace-insensitively.
  • Vendored files are lint/format-excluded by prefix glob (core/src/interop/pelorus_, core/include/libvmaf/pelorus/) in .pre-commit-config.yaml, Makefile, and scripts/ci/assertion-density.sh. New vendored files matching those prefixes are covered automatically; no new exclusion entries are needed.
  • Complexity modulation (golden-isolation invariant, rebase-sensitive): perceptual_weight.c::complexity_modulation MUST return exactly 1.0 when PEL_SEC_COMPLEXITY is absent or complexity is non-finite. That is what keeps the no-side-data golden path bit-exact (Netflix 576×324 pair = 76.667831). The guard is test_complexity_modulates_weight/_grid_zero.
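The complexity-modulation invariant can be pictured with a small sketch. The function name, its signature, SKETCH_GAIN, and the placeholder formula are all assumptions made for illustration; these notes do not show the real curve in perceptual_weight.c. Only the two identity cases (absent side data, non-finite complexity) come from the note.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical gain; the real modulation curve is not shown in these notes. */
#define SKETCH_GAIN 0.25

/* Returns a multiplicative weight. The identity cases mirror the invariant:
 * no side data, or a non-finite complexity, must yield exactly 1.0. */
static double complexity_modulation_sketch(bool has_complexity, double complexity)
{
    if (!has_complexity || !isfinite(complexity))
        return 1.0; /* literal constant, never a computed value */
    return 1.0 + SKETCH_GAIN * complexity; /* placeholder formula */
}
```

Returning the literal 1.0, rather than evaluating the formula with a neutral input, is what makes the identity exact regardless of floating-point rounding in the modulation expression.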

fix/bughunt-cuda — CUDA pinned-buffer leaks, motion SAD precision, errno fidelity (2026-06-27)

Rebase impact: none on upstream — all fork-local. The CUDA backend (core/src/feature/cuda/) is a fork addition with no Netflix/vmaf counterpart. Touches only core/src/feature/cuda/{float_vif_cuda.c,float_adm_cuda.c, integer_motion_cuda.c,speed_temporal_cuda.c,speed_chroma_cuda.c}. No public header, CLI flag, meson-option, ffmpeg-patch, or Netflix golden-gate surface changes (the golden gate is CPU-only; SpEED is not in the golden pairs). The leak fixes fire only in close_fex_cuda / init-error paths (no success-path behaviour change); the errno fixes only change the value returned on an already-failing CUDA error path (-EIO → the mapped errno); the integer_motion_cuda precision change brings GPU SAD output closer to the CPU double-precision reference (GPU-only, not bit-exact with CPU by design). Rebase-sensitive invariant for the next syncer: in speed_temporal_cuda.c / speed_chroma_cuda.c, every fail: label reached from CHECK_CUDA_GOTO must return _cuda_err; (the macro-mapped errno), not a literal -EIO — matching the CHECK_CUDA_RETURN convention in cuda_helper.cuh. The two manual cuMemcpyDtoH / cuCtxPushCurrent boolean checks deliberately keep their literal -EIO.

fix/bughunt-core-engine — core-engine error-path fixes (2026-06-27)

Rebase impact: low (three upstream-mirror files touched, all on error/cleanup paths). core/src/libvmaf.c, core/src/feature/feature_collector.c, and core/src/model.c are upstream-mirror files with Netflix counterparts, so a future /sync-upstream may produce small conflicts here. The changes are fork-local divergences confined to failure paths:

  • threaded_read_pictures_batch (libvmaf.c) is a fork-added threaded-batch helper (not in upstream), so its enqueue-failure unref fix carries no upstream conflict risk. Two adjacent doc comments in the same function were tightened to keep it under the fork's readability-function-size LineThreshold (60); purely cosmetic, no behaviour change.
  • aggregate_vector_append (feature_collector.c): one-line -EINVAL → -ENOMEM on the feature-name malloc-failure path. Mirrors the fork's own .cpp twin. If upstream rewrites this allocation, prefer the -ENOMEM semantics.
  • vmaf_model_collection_append (model.c): grow-path realloc failure no longer takes the shared fail: label (which nulls *model_collection); it returns -ENOMEM inline. Rebase-sensitive invariant: only the fresh-allocation failures may null the caller's out-param; the grow path must leave the still-valid existing collection (and the caller's handle) intact.

No public header, CLI flag, meson-option, ffmpeg-patch, or Netflix golden-gate surface changes; all three edits fire only on malloc/realloc/enqueue failure, so success-path scores are unchanged (golden gate verified green).
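The grow-path invariant for vmaf_model_collection_append follows a common realloc idiom, sketched below under stated assumptions. The Collection type, collection_append, and the fail_next_realloc test hook are hypothetical and exist only so the failure path can be exercised; this is not the model.c code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical collection type for illustration; not the libvmaf struct. */
typedef struct {
    int *items;
    size_t cnt, cap;
} Collection;

/* Test hook (an assumption of this sketch): force the next realloc to fail. */
static int fail_next_realloc;

static void *sketch_realloc(void *p, size_t n)
{
    if (fail_next_realloc) {
        fail_next_realloc = 0;
        return NULL;
    }
    return realloc(p, n);
}

static int collection_append(Collection **out, int item)
{
    Collection *c = *out;
    if (!c) {
        c = calloc(1, sizeof(*c));
        if (!c)
            return -ENOMEM; /* fresh allocation failed: *out is still NULL */
    }
    if (c->cnt == c->cap) {
        size_t cap = c->cap ? c->cap * 2 : 4;
        /* Grow into a temporary so a failure leaves c->items valid. */
        int *tmp = sketch_realloc(c->items, cap * sizeof(*tmp));
        if (!tmp) {
            if (!*out)
                free(c);    /* fresh collection: discard it */
            return -ENOMEM; /* existing collection: left intact */
        }
        c->items = tmp;
        c->cap = cap;
    }
    c->items[c->cnt++] = item;
    *out = c;
    return 0;
}
```

The design point is that the error return on the grow path never writes to *out, so a caller that already holds a populated collection keeps a usable handle after -ENOMEM.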

fix/bughunt-mcp — MCP Go↔Python parity + HTTP hardening (2026-06-27)

no rebase impact: edits the fork-only MCP servers (cmd/vmafx-mcp/{main.go,impl.go,impl_direct.go} + new cmd/vmafx-mcp/http_security.go, mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py) + tests + docs/state.md + changelog. No libvmaf C-API / CLI / meson_options.txt / public-header change → no ffmpeg-patch impact. Rebase-sensitive invariant — HTTP transport security parity (cmd/vmafx-mcp/AGENTS.md invariant #13): the Go securityMiddleware / bind logic (http_security.go) and the Python _make_security_middleware / _resolve_bind_host (http_transport.py) MUST share the same ADR-0967 env contract (VMAFX_MCP_HTTP_TOKEN constant-time bearer, VMAFX_MCP_HTTP_NO_AUTH=1 opt-out, refuse-all-401-when-neither-set, 4 MiB body limit, VMAFX_MCP_HTTP_BIND default 127.0.0.1). Precision-default parity (ADR-0119 / ADR-1117): both servers default vmaf_score precision to legacy (%.6f) on every transport / dispatch path. A rebase touching either server must keep both in lock-step.

fix/mcp-probe-backend-required — MCP probe_backend required-arg message (2026-06-20)

Rebase impact: none on upstream — fork-local. The MCP server (mcp-server/vmaf-mcp/) is a fork addition with no Netflix/vmaf counterpart. One-line change in _call_tool_dispatch's probe_backend branch (removes a redundant explicit guard, relies on the existing KeyError→ValueError wrapper). No public C-API / CLI / header impact.

fix/speed-extractor-oob-deadlock-heap-corruption — GPU SpEED covariance + eigenbasis correctness + safety (2026-06-20)

Rebase impact: none on upstream — all fork-local. The SpEED feature (speed_chroma / speed_temporal) and all of its GPU backends are fork-additions with no Netflix/vmaf counterpart. Touches the fork-only GPU extractors (core/src/feature/cuda/{speed_chroma_cuda.c,speed_temporal_cuda.c, speed/speed_score.cu}, core/src/feature/hip/{speed_chroma_hip.c, speed_temporal_hip.c,speed/speed_score.hip}, core/src/feature/sycl/{speed_chroma_sycl.cpp,speed_temporal_sycl.cpp}), the fork-only core/src/feature/speed.c CPU host (init-return propagation only — the global covariance math itself is unchanged), and the GPU parity test fixture. No public header, CLI, meson-option, ffmpeg-patch, or Netflix golden-gate surface changes (SpEED is not in the golden pairs). Rebase-sensitive invariant for the next syncer: the GPU means/cov kernels must stay on the CPU's global covariance formulation (means[25] over the full phase-shifted submatrix, NOT per-tile means[25*num_blocks]), and the ref/dis paths must keep separate covariance + eigenvalue bases — recorded in core/src/feature/cuda/AGENTS.md and verified by test_cuda_speed_{chroma,temporal}_parity at 1e-4. If a future change touches any one backend's kernels, mirror it across all four (CPU + CUDA + HIP + SYCL).

fix/audit-runtime-bugs-batch — 18 audit runtime bugs: SYCL/CUDA init leaks, AI crash-hardening, MCP parity (2026-06-20)

Rebase impact: none on upstream. All 18 fixes are fork-local and touch only fork-added files with no upstream Netflix/vmaf counterpart: core/src/feature/sycl/integer_motion_sycl.cpp, core/src/feature/cuda/integer_ssim_cuda.c, core/src/feature/cuda/integer_vif_cuda.c, core/src/feature/ssimulacra2.c (fork-added SSIMULACRA2 extractor), cmd/vmafx-mcp/impl.go, and five ai/ extraction/training scripts (bvi_dvc_to_full_features.py, extract_full_features.py, konvid_to_full_features.py, train_fr_regressor_v2.py, vmaf_train/datamodule.py). No public header, CLI flag, meson-option, ffmpeg-patch, or golden-gate surface changes; all C/SYCL/CUDA edits fire only on already-failing error/OOM paths so success-path behaviour and scores are unchanged. Rebase-sensitive note for the next person syncing: the cmd/vmafx-mcp/impl.go change deletes the last vulkan backend reference in the Go MCP server to keep it byte-compatible with the Python MCP server after the Vulkan removal (ADR-0726); if a sync re-introduces a vulkan keyword in either MCP server, both must move together. No new rebase-sensitive invariants worth a dedicated AGENTS.md entry beyond the existing SYCL/CUDA error-path notes.

fix/sycl-psnr-hvs-chroma-ceiling — SYCL psnr_hvs odd-dimension chroma geometry (2026-06-20)

Rebase impact: none on upstream. Fork-local one-line correctness fix in the fork-only SYCL feature extractor core/src/feature/sycl/integer_psnr_hvs_sycl.cpp, which has no upstream Netflix/vmaf counterpart. init_fex_sycl now derives the 4:2:0 / 4:2:2 chroma plane dims with ceiling division ((w + 1U) >> 1) instead of floor (w >> 1), matching picture.c / the CPU reference / the CUDA + HIP twins. No public-header, CLI, meson-option, ffmpeg-patch, or golden-gate surface changes; even-dimension behaviour is byte-identical to before. Rebase-sensitive note for the next person syncing: the picture allocator's ceiling subsample convention ((dim + ss) >> ss) is the single source of truth for chroma plane dims — any new GPU feature extractor that re-derives plane dimensions in its own init must use the ceiling form, not floor; the floor form only agrees on even dimensions and silently drops the last chroma block strip otherwise. This is the same class of bug as the PSNR and Vulkan chroma ceiling fixes already in tree.
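A minimal sketch of the two conventions, using the (dim + ss) >> ss form quoted above. The helper names are hypothetical; ss is assumed to be the subsampling shift (1 for a halved axis, 0 for a full-size axis), and the expression is only a true ceiling for those two values.

```c
/* ss is the subsampling shift: 1 for a halved axis, 0 for a full-size axis.
 * (dim + ss) >> ss rounds up for ss == 1 and is the identity for ss == 0. */
static unsigned chroma_dim_ceil(unsigned dim, unsigned ss)
{
    return (dim + ss) >> ss;
}

/* Floor form: agrees with the ceiling form only on even dimensions. */
static unsigned chroma_dim_floor(unsigned dim, unsigned ss)
{
    return dim >> ss;
}
```

For a 577-pixel-wide 4:2:0 frame the ceiling form gives 289 chroma columns and the floor form gives 288, so the floor form loses the last column; for a 576-pixel width both give 288.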

fix/metal-drain-motion2 — Metal end-of-stream drain + frame-0 motion2 (2026-06-20)

Rebase impact: none on upstream. All changes are fork-local (the Metal backend has no upstream Netflix/vmaf counterpart) plus one additive bit in the shared flush path. Touches: core/src/feature/feature_extractor.h (adds VMAF_FEATURE_EXTRACTOR_METAL = 1 << 7 to the VmafFeatureExtractorFlags enum — a fork-added enum; bit 7 is the next free slot after the fork's HIP bit 6), core/src/libvmaf.c (a new #ifdef HAVE_METAL drain branch in flush_context_serial, gated so non-Metal builds are byte-unchanged), and 8 fork-only core/src/feature/metal/*.mm extractors (set the new flag; + float_motion_metal.mm collect-index fix).

Rebase-sensitive notes for the next person syncing:

  • flush_context_serial is a fork-local rewrite of the upstream flush. If an upstream sync re-touches the end-of-stream flush, the fork's per-backend drain blocks (CUDA / HIP / SYCL / Metal) must be re-applied: each GPU backend whose extractors carry a VMAF_FEATURE_EXTRACTOR_<BACKEND> flag needs its pending gpu_pending final-frame collect() drained before its flush() runs, or the last frame's score is dropped. Do not drop the Metal branch.
  • Frame-0 motion2 contract. Every motion-family extractor (CPU + all GPU twins) appends motion2 = 0.0 at index 0 and a no-op at index 1; index ≥ 2 emits min(prev, cur) at index − 1. float_motion_metal now matches this exactly; keep it aligned with integer_motion_metal and the HIP / CUDA twins on any future motion2 change (cross-backend invariant, see core/src/feature/metal/AGENTS.md).
  • Darwin-only. Not buildable / not exercised on the Linux dev or CI lane; re-validate on Apple Silicon after any upstream flush-path sync.
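The frame-0 motion2 contract can be read as a per-frame emit rule, sketched here. The emit_motion2 helper, the NOT_WRITTEN sentinel, and the reading of prev and cur as the motion scores of frames index − 1 and index are assumptions of this sketch, not the extractor code.

```c
#include <stddef.h>

#define NOT_WRITTEN (-1.0) /* sentinel used only by this sketch */

static double min_d(double a, double b)
{
    return a < b ? a : b;
}

/* motion[i] is the per-frame motion score, assumed already computed.
 * out[] has one slot per frame, pre-filled with NOT_WRITTEN. */
static void emit_motion2(const double *motion, size_t index, double *out)
{
    if (index == 0) {
        out[0] = 0.0; /* frame 0: motion2 is defined as 0.0 */
    } else if (index == 1) {
        /* no-op: nothing is appended while processing frame 1 */
    } else {
        /* processing frame `index` settles the score of frame index - 1 */
        out[index - 1] = min_d(motion[index - 1], motion[index]);
    }
}
```

Under this rule the slot for the final frame is still unwritten after the last extract call, which is why the end-of-stream drain and flush described above matter for the last frame's score.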

fix/k150k-training-data-integrity — fail-loud on empty-frame clips + MOS-join key mismatch (2026-06-20)

Rebase impact: none on upstream. All changes are fork-local in ai/scripts/extract_k150k_features.py (a fork-added training script with no upstream Netflix counterpart). Two defensive guards: _process_clip raises on an empty frame list instead of writing an all-NaN row + marking the clip done; the MOS-label join gains an mp4.stem fallback + an up-front coverage hard-fail. Invariants recorded in ai/AGENTS.md (do not revert the lookup to a single mos_map.get(clip_name, NaN); keep the staging-first .done ordering). No C-library, ABI, golden-data, or ffmpeg-patch surface touched.

fix/speed-gpu-registry — restore orphaned GPU SpEED registrations + delete dead feature_extractor.c (2026-06-19)

Rebase impact: none on upstream. All changes are fork-local. Touches the fork-only registry file core/src/feature/feature_extractor.cpp (adds six externs + array entries under the existing #if HAVE_{CUDA,SYCL,HIP} blocks), deletes the fork-only dead twin core/src/feature/feature_extractor.c (orphaned by PR #875's .c → .cpp split; meson compiled only the .cpp), and adds by-name resolution asserts to core/src/feature/../test/test_feature_extractor.c. Rebase-sensitive note for the next person syncing: there is now exactly ONE registry file (feature_extractor.cpp); if an upstream/Netflix sync re-introduces a feature_extractor.c it must be reconciled into the .cpp, not kept alongside it, because the split-brain is what this fix removes. Stale feature_extractor.c references remain in ~40 sibling source comments (Metal .mm, HIP .c) and in historical docs/adr/* / docs/research/* (audit trail: do NOT rewrite); the live comment sweep for the non-ADR consumer files is deferred to the RC LOW doc-hygiene PR and coordinated with the in-flight Metal PR #986.

fix/sycl-init-leaks-exception-safety — SYCL init error-path + exception-boundary hardening (2026-06-19)

Rebase impact: none on upstream — fork-local SYCL error-path + exception-boundary hardening. Touches only fork-added SYCL sources (core/src/feature/sycl/integer_adm_sycl.cpp, core/src/feature/sycl/integer_vif_sycl.cpp, core/src/sycl/common.cpp, core/src/sycl/dmabuf_import.cpp), none of which have an upstream Netflix/vmaf counterpart. No public header, CLI, meson-option, ffmpeg-patch, or golden-gate surface changes; success-path behaviour is unchanged (cleanup/exception handling only fires on already-failing paths).

fix/cuda-init-submit-leaks — CUDA error-path resource frees (2026-06-19)

Rebase impact: none on upstream — fork-local CUDA error-path hardening. Touches only fork-added CUDA feature extractors under core/src/feature/cuda/ (integer_ms_ssim_cuda.c, integer_psnr_hvs_cuda.c, ssimulacra2_cuda.c, speed_chroma_cuda.c); adds NULL-guarded frees / cleanup-goto routing on init + submit failure paths only. No public-header, meson, CLI, ffmpeg-patch, or golden-gate impact, and success-path behaviour is unchanged.

fix/hip-chroma-mcp-parity — psnr_hip enable_chroma option + MCP Go/Python parity (2026-06-20)

Rebase impact: none on upstream — all fork-local. Touches the fork-only HIP extractor (core/src/feature/hip/integer_psnr_hip.c: add an enable_chroma VmafOption), the fork-only Go MCP server (cmd/vmafx-mcp/impl.go + impl_test.go: drop the dead vulkan backend value, drop the unsupported --format from tune-per-shot), and the fork-only Python MCP server (server.py: probe_backend ValueError guard). No upstream Netflix/vmaf file is touched; no public C-API/CLI surface changes (the enable_chroma option already exists on the CPU/CUDA psnr twins).

fix/tox-py314-scipy-118 — tox env py311→py314 (2026-06-20)

Rebase impact: none on upstream; fork-local CI config only. Touches python/tox.ini (envlist py311 → py314, matching the CI setup-python 3.14.5, since the fork's requirements.txt deps now require ≥3.12) plus a changelog fragment + state.md row. No source or test code changed.

feat/upstream-v1.0.16-models (2026-06-20)

Rebase impact: low (additive model data + one C registry block + one meson embed block; no public-header / CLI / ffmpeg-patch / golden-gate change). Verbatim port of Netflix upstream commit 4718b4f5f ("Add VMAF v1.0.16 SDR models, documentation, and tests"). Because it is a pure upstream port, it is exempt from the ADR-0108 six-deliverable rule (CLAUDE §12 r11); the changelog fragment + this rebase note are still provided.

What was ported and how it was adapted to the fork's diverged layout (ADR-0700 libvmaf/core/):

  • libvmaf/src/model.c → core/src/model.c: added the 8 extern decls + 8 built_in_models[] registry entries, mirroring the existing vmaf_v0.6.1/vmaf_4k_v0.6.1neg idiom byte-for-byte (the fork's struct is the same VmafBuiltInModel {version, data, data_len}).
  • libvmaf/src/meson.build → core/src/meson.build: added two foreach blocks embedding the v1.0.16 + v1.0.16_hfr JSONs via the same xxd -i -n src_@PLAINNAME@ custom_target the fork already uses for the v0 models. The v1 models live in their own model/vmaf_v1.0.16{,_hfr}/ subdirectories, so a dedicated dir prefix is used (matching upstream).
  • model/vmaf_v1.0.16/*.json + model/vmaf_v1.0.16_hfr/*.json (8 files): copied verbatim via git checkout 4718b4f5f -- ….
  • python/test/vmaf_v1_quality_runner_test.py: copied verbatim (path NOT renamed). 46 new golden assertions; no pre-existing assertion touched.
  • Upstream's resource/doc/models_v1.md + the models.md → models_v0.md rename + the README "News" line do not map: the fork has no resource/doc/ model docs (it consolidated them under docs/models/), so the new doc lives at docs/models/v1.md (added to the mkdocs nav, with a cross-link from docs/models/overview.md). The README change is moot: the fork's README diverged and no longer links resource/doc/models.md.

Deliberately NOT ported (rebase-sensitive — re-check on the next upstream sync): the upstream commit also bundled an unrelated feature-source reorg in meson.build (moving speed.c, common/convolution.c, vif_tools.c out of the float_enabled block into the always-on list). The fork already wires those sources differently, so applying the upstream hunk would conflict / double-list. If a future sync touches that region, reconcile against the fork's current libvmaf_feature_sources layout, not the upstream diff.

Known fork gap (load-bearing invariant): the 4 _hfr models embed and register but cannot be scored until motion_five_frame_window=true + motion_moving_average=true are implemented (the prev_prev_ref 5-frame plumbing deferred per ADR-0337). The 4 non-HFR models score correctly (1080p 3H == upstream golden VMAF 82.816059). Do not "fix" the HFR runtime error by deleting the option from the model JSONs — the JSONs are verbatim Netflix data; the fix is to land the 5-frame motion plumbing.

feat/golusoris-tune (2026-06-15)

Rebase impact: low (Go-only, additive + in-place rewrite of one binary's composition root). Phase-1 of the golusoris adoption (ADR-1119): migrates the cmd/vmafx-tune CLI from a hand-built cobra.Command root onto the golusoris clikit (cobra + fx) framework. Touches only cmd/vmafx-tune/* + its docs / changelog / rebase-notes; no C / meson / public-header / ffmpeg-patch / golden-gate impact, and the Python tools/vmaf-tune harness is untouched. cmd/vmafx-tune is non-cgo (no pkg/libvmaf import), so no libvmaf.so build is needed to compile or test it.

Files: new cmd/vmafx-tune/cmd/golusoris.go (the withGolusoris adapter + configOptions + levelledLogger); cmd/vmafx-tune/cmd/root.go rewritten to clikit.New + clikit.Command; compare.go / ladder.go / report.go subcommand builders re-wired through clikit and their run* functions now take (ctx, deps, flags); new cmd/vmafx-tune/cmd/root_test.go; existing tests updated for the new run* signatures; cmd/vmafx-tune/AGENTS.md invariants extended; docs/usage/vmafx-tune-go.md documents the VMAFX_LOG_LEVEL / VMAFX_LOG_FORMAT surface.

Rebase-sensitive invariants for follow-up PRs and any golusoris bump:

  • clikit WithFx is long-running, not one-shot. clikit.WithFx builds an fx.App and calls app.Run() (blocks until signal) and never surfaces an fx.Invoke error as the exit code. One-shot tuning subcommands therefore use clikit.WithRunE(withGolusoris(fn)), where withGolusoris builds the graph from bootstrap.Base, fx.Populates the deps, runs fn, and returns its error. Do not "simplify" these to clikit.WithFx(golusoris.Core, fx.Invoke(fn)) — the CLI would block and lose its exit code.
  • fx.NopLogger is deliberate. A one-shot CLI must not print fx provide/invoke/lifecycle chatter on every run; bootstrap.FxLogger() (which routes fx events onto the app logger) is for long-running services only. The injected *slog.Logger still carries domain diagnostics.
  • levelledLogger compensates for a golusoris v0.4.0 scoping gap. A root-scope fx.Replace(config.Options{EnvPrefix:"VMAFX_"}) reaches root-scope consumers (our domain code reads the right config) but does not penetrate the golusoris.log submodule's own config.Options dependency, so the auto-built logger falls back to the default APP_ prefix and stays at LevelInfo. withGolusoris therefore adds fx.Decorate(levelledLogger) to rebuild the *slog.Logger from the root config at the VMAFX_-configured level/format. Delete this decorator once golusoris makes the root config override penetrate submodules (track upstream alongside golusoris #234); the TestGolusorisInjection_ConfigDrivesLogLevel test guards the behavior.
  • VMAFX_ env prefix. configOptions() sets EnvPrefix:"VMAFX_" to match the fork-wide env contract (ADR-1119). golusoris splits every underscore into the config delimiter, so VMAFX_LOG_LEVEL → log.level.
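The environment-to-config-key mapping in the last bullet can be illustrated with a small standalone sketch. It is not golusoris code: env_to_key is a hypothetical helper, and the prefix stripping, lowercasing, and "." delimiter are inferred from the single VMAFX_LOG_LEVEL example above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the config key for `env` into `out`; returns 0 on success,
 * -1 if the prefix does not match or the buffer is too small. */
static int env_to_key(const char *env, const char *prefix, char *out, size_t cap)
{
    size_t plen = strlen(prefix);
    if (strncmp(env, prefix, plen) != 0)
        return -1;
    const char *rest = env + plen;
    size_t n = strlen(rest);
    if (n + 1 > cap)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char)rest[i];
        /* every underscore becomes the delimiter; letters are lowercased */
        out[i] = (ch == '_') ? '.' : (char)tolower(ch);
    }
    out[n] = '\0';
    return 0;
}
```

Because every underscore becomes a delimiter, a variable read under the default APP_ prefix never matches the VMAFX_ keys, which is consistent with the logger falling back to its default level in the scoping gap described above.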

feat/golusoris-foundation (2026-06-14)

Rebase impact: low (Go-only, additive). Phase 0 of the golusoris fx framework adoption (ADR-1119). Adds github.com/golusoris/golusoris v0.3.1 to go.mod/go.sum (and the widened transitive closure — fx/dig, koanf, chi, river transitives; go build ./... + all test binaries compile clean, no version-skew since both repos already pinned identical shared-dep versions), one new package internal/app/bootstrap/bootstrap.go (Base fx module set + FxLogger), and an Info/Get() addition to pkg/version/version.go (the interim stand-in for golusoris#226). No C/meson/CLI/public-header change → no ffmpeg-patch impact, no golden-gate impact. No binary is migrated in this PR — the six cmd/vmafx-* composition roots are rewritten in the subsequent phased PRs (vmafx-server first; vmafx-controller gated on golusoris#225). Rebase-sensitive note for the follow-up PRs: each binary's fx.New(...) must lead with fx.Replace(config.Options{EnvPrefix:"VMAFX_"}) before golusoris.Core to preserve the VMAFX_ env contract, and the cgo libvmaf.Scorer provider must order its OnStop after the gRPC server's (drain before Close()). Docs: docs/adr/1119-* + fragment + _order.txt + README row, docs/research/1119-*, changelog.d/chore/1119-golusoris-foundation.md.

feat/metric-brisque (2026-06-14)

Rebase impact: low. Adds four fork-only files (core/src/feature/brisque.c, brisque_math.h, brisque_model.h, core/test/test_brisque.c) plus the vendored model + provenance (model/other_models/brisque_live.model, NOTICE-brisque, model/brisque_live_card.md) — no upstream twin (BRISQUE is fork-added; the model is the LIVE-lab allmodel bundled under a documented research-use exception, ADR-1115). The model is embedded into the binary at build time via an xxd -i Meson custom_target (the same mechanism libvmaf's JSON models use; brisque_model.h only declares the generated src_brisque_live_model[] / _len externs), so the giant byte array never enters the tree — keeping it under the 1 MB large-file gate. Additive registration: one extern VmafFeatureExtractor vmaf_fex_brisque; + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the LIVE C++23 registry, NOT the dead feature_extractor.c twin), a model embed custom_target + one source line in core/src/meson.build, one executable() (linking the generated model TU) + one test() in core/test/meson.build. Edits docs/metrics/brisque.md (new), mkdocs.yml (BRISQUE nav row), docs/state.md, docs/rebase-notes.md, changelog.d/, core/src/feature/AGENTS.md (invariant note), testdata/scores_cpu_brisque.json (new), docs/research/1101-brisque-nr-metric.md (new), and the ADR index (docs/adr/1115-brisque-nr-metric.md + fragment + _order.txt + README row). CPU-only scalar extractor; no public C-API / ABI / CLI flag / meson_options.txt change → no ffmpeg-patch impact (reachable via the existing generic --feature path). First feature-extractor consumer of the vendored libsvm (core/src/svm.cpp/svm.h) — if a future upstream sync changes the libsvm parser/predict ABI, brisque.c's svm_parse_model_from_buffer + svm_predict calls must be re-checked alongside predict.c. 
Load-bearing invariants if the algorithm is ever touched: GGD (not AGGD) for the MSCN field, Gaussian sigma=7/6 (not 1.166), MATLAB antialiased bicubic (not INTER_CUBIC), the inline range arrays (not allrange), no output clamp — all required for parity with the bundled trained model (see core/src/feature/AGENTS.md and ADR-1115).

feat/metric-y-funque-plus (2026-06-14)

Rebase impact: low. Fork-only additive metric — no upstream twin. Adds two fork-only files (core/src/feature/y_funque_plus.c, core/test/test_y_funque_plus.c). Additive registration only: one extern VmafFeatureExtractor vmaf_fex_y_funque_plus; + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the live C++23 registry — NOT the dead feature_extractor.c twin), a dedicated libvmaf_y_funque_plus_static_lib static_library() + one extract_all_objects() line in core/src/meson.build (mirrors the ssimulacra2 -ffp-contract=off carve-out), and one executable() + one test() in core/test/meson.build. Edits docs/metrics/y-funque-plus.md (new), docs/metrics/features.md (one new row), docs/state.md, docs/rebase-notes.md, changelog.d/, and the ADR index (docs/adr/1114-y-funque-plus-atoms.md + fragment + _order.txt + README.md row). CPU-only scalar extractor; no public C-API / ABI / CLI flag / meson_options.txt / public-header change → no ffmpeg-patch impact (CLAUDE §12 r14 N/A; reachable via the generic --feature path). Load-bearing invariants if the algorithm is ever touched: the Haar butterfly uses the pywt 'haar' convention cH=(a+b-c-d)/2, cV=(a-b+c-d)/2 (NOT the H/V-swapped form the design dossier text mistakenly listed — pywt was verified directly); the DLM numerator pools rest^3 WITHOUT abs while the denominator pools the ref detail WITH abs (pyr_features.py:54/61); the 2x downscale is OpenCV INTER_CUBIC (Keys cubic a=-0.75), the dominant cross-host parity risk — keep -ffp-contract=off.
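The Haar butterfly convention quoted above can be written out for a single 2×2 block. The HaarDetail type, the haar_detail helper, and the block layout (a b on the top row, c d on the bottom row) are assumptions of this sketch; only the two formulas come from the note.

```c
/* 2x2 block layout assumed by this sketch:
 *     a b
 *     c d
 */
typedef struct { double cH, cV; } HaarDetail;

static HaarDetail haar_detail(double a, double b, double c, double d)
{
    HaarDetail o;
    o.cH = (a + b - c - d) / 2.0; /* top row minus bottom row */
    o.cV = (a - b + c - d) / 2.0; /* left column minus right column */
    return o;
}
```

Under the assumed layout, a block whose rows differ (4 4 over 0 0) yields cH = 4 and cV = 0, and a block whose columns differ (4 0 over 4 0) yields cH = 0 and cV = 4; the swapped form would exchange those two results.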

feat/pelorus-sidedata-reader (2026-06-14)

Rebase impact: low-to-medium. Fork-only additive feature (ADR-1118), builds on the vendored Pelorus interop ABI (ADR-1113). Adds three fork-only files (core/include/libvmaf/perceptual_weight.h public C-API, core/src/feature/perceptual_weight.{c,h} the weight module + internal contract, core/test/test_perceptual_weight.c the golden-isolation test), none with an upstream twin. Edits to shared files, all additive:

  • core/src/libvmaf.c: the rebase-sensitive one. Adds (1) two #includes, (2) a VmafPerceptualWeightStore perceptual; field at the tail of struct VmafContext (after dnn), (3) a vmaf_perceptual_weight_store_destroy call in vmaf_close after vmaf_ctx_dnn_free, (4) three new public entry points after vmaf_import_feature_score, and (5) the weighting branch inside vmaf_feature_score_pooled plus a new static pool_reduce helper + PoolAccumulators struct just above it. Load-bearing invariant: the no-side-data path through vmaf_feature_score_pooled MUST stay byte-identical to upstream; the weighted accumulators are only summed when vmaf_perceptual_weight_active() is true, and the MEAN/HARMONIC_MEAN reduce runs the literal upstream expression when weighting is inactive. A rebase that refactors this function must preserve that bit-exactness (the golden gate depends on it; test_perceptual_weight.c guards it).
  • core/src/meson.build: one source line (feature/perceptual_weight.c) in libvmaf_sources (NOT the feature static lib; it is a pooling helper, not a registered extractor).
  • core/include/libvmaf/meson.build: one install_headers entry.
  • core/test/meson.build: one executable() + one test().
  • ffmpeg: ffmpeg-patches/0017-libvmaf-read-pelorus-sidedata.patch (new public C-API consumed by vf_libvmaf + new perceptual_weight AVOption → ffmpeg-patch impact per CLAUDE r14) appended at the tail of ffmpeg-patches/series.txt. The patch anchors on stable post-0016 context (score_fmt option line, VmafContext *vmaf; struct line, the do_vmaf vmaf_read_pictures call); CI validates it via a full series replay against a clean n8.1 checkout (git am --3way), not standalone git apply.
  • Docs/index: docs/api/perceptual-weight.md (new), mkdocs.yml (api nav row + ADR-1118 nav row), docs/state.md, docs/research/1102-*.md (new), changelog.d/added/, and the ADR index (docs/adr/1118-perceptual-sidedata-weighting.md + fragment + _order.txt + regenerated README.md).
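The shape of the pooling invariant in this entry (inactive weighting runs the plain expression, weighted accumulators are touched only when weighting is active) can be sketched as follows. This is an illustration in Python, not the C code in libvmaf.c. The name pool_mean and its signature are hypothetical, and only the MEAN case is shown.

```python
def pool_mean(scores, weights=None):
    """Mean pooling with an optional per-frame weight list (hypothetical helper)."""
    if weights is None:
        # Inactive path: the plain unweighted expression, kept on its own
        # branch so a refactor of the weighted code cannot disturb it.
        return sum(scores) / len(scores)
    # Active path: weighted accumulators are only summed here.
    weighted_sum = sum(s * w for s, w in zip(scores, weights))
    return weighted_sum / sum(weights)


plain = pool_mean([1.0, 2.0, 3.0])
weighted = pool_mean([1.0, 3.0], weights=[3.0, 1.0])
```

Keeping the two paths as separate branches makes it easy for a reviewer to confirm that the unweighted expression was left untouched after a rebase.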

feat/mcp-tiny-ai-feature-coverage (2026-06-14)

no rebase impact: all touched code is fork-local. The MCP servers (cmd/vmafx-mcp/{tools.go,impl.go,impl_direct.go,main.go,score_extras_test.go} and mcp-server/vmaf-mcp/src/vmaf_mcp/server.py + tests/test_score_extras_adr1117.py) do not exist in upstream Netflix/vmaf, so no mechanical merge conflict is possible. The change adds optional scoring parameters that shell out to existing vmaf CLI flags — it does not add, rename, or remove any public C-API entry point, CLI flag, meson_options.txt entry, public header, or LIBVMAFContext field, so per CLAUDE.md §12 r14 there is no ffmpeg-patch impact (the patches under ffmpeg-patches/ consume libvmaf symbols, not the MCP servers). Rebase-sensitive invariant the diff must preserve: the Go (scoringExtraProperties()) and Python (_scoring_extra_properties()) schema generators MUST stay byte-identical (same keys/enums/defaults/descriptions) per cmd/vmafx-mcp/AGENTS.md §1 — the parity tests (server_test.go::TestToolSchemasMatchPython, test_score_extras_adr1117.py) and the source-of-truth flag names in core/tools/cli_parse.c are the backstop. Also edits docs/mcp/tools.md, docs/state.md, changelog.d/, the ADR index (ADR-1117 + fragment + _order.txt + regenerated README.md), and a research digest — all fork-local docs.
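One way to picture the parity requirement is a check on canonical serialisations. The sketch below is an assumption about how such a comparison could be written, not a description of TestToolSchemasMatchPython or test_score_extras_adr1117.py. The property name example_flag and both helper names are hypothetical.

```python
import json


def canonical(schema):
    """Serialise a schema dict deterministically: sorted keys, no extra whitespace."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":")).encode("utf-8")


def schemas_match(left, right):
    """True when both schemas serialise to the same bytes."""
    return canonical(left) == canonical(right)


# Hypothetical property; key order differs but content is the same.
go_side = {"example_flag": {"type": "integer", "default": 1, "enum": [1, 2]}}
py_side = {"example_flag": {"enum": [1, 2], "default": 1, "type": "integer"}}
# A drifted default on one side breaks parity.
py_drifted = {"example_flag": {"enum": [1, 2], "default": 2, "type": "integer"}}
```

Comparing bytes rather than parsed values also catches drift such as 1 versus 1.0 in a default.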

feat/metric-niqe (2026-06-14)

Rebase impact: low. Adds four fork-only files (core/src/feature/niqe.c, niqe_math.h, niqe_model.h, core/test/test_niqe.c) — no upstream twin (NIQE is fork-trained against model/other_models/niqe_v0.1.pkl). Additive registration: one extern VmafFeatureExtractor vmaf_fex_niqe; + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the LIVE C++23 registry, NOT the dead feature_extractor.c twin), one source line in core/src/meson.build, one executable() + one test() in core/test/meson.build. Edits docs/metrics/niqe.md (new), mkdocs.yml (NIQE nav row + regenerated ADR-nav block via scripts/docs/generate-adr-nav.sh), docs/state.md, docs/rebase-notes.md, changelog.d/, testdata/scores_cpu_niqe.json (new), and the ADR index (docs/adr/1112-niqe-nr-metric.md + fragment + _order.txt + regenerated README.md). CPU-only scalar extractor; no public C-API / ABI / CLI flag / meson_options.txt change → no ffmpeg-patch impact (the feature is reachable via the existing generic --feature path). Load-bearing invariants if the algorithm is ever touched: the AGGD N keeps the trailing *aggdratio factor and the MSCN maps + PIL bicubic half-res output stay float32-rounded — both are required for parity with the pkl the model was trained against (see core/src/feature/AGENTS.md and ADR-1112).

feat/metal-standalone-batch (2026-06-14)

Rebase impact: low. Adds 12 fork-only files (4 kernels x {.metal,_metal.mm,test}) for integer_ciede / integer_psnr_hvs / integer_cambi / ssimulacra2 (no upstream twins). Additive registration (4 externs + 4 list entries in feature_extractor.c #if HAVE_METAL; 4 .mm sources + 4 custom_targets + 4 metal_air_files in core/src/metal/meson.build; a foreach test block in core/test/meson.build). Edits docs/metrics/features.md (+Metal on the 4 rows), state.md, changelog. cambi is a Strategy-II hybrid (GPU kernels + exact-CPU host residual via cambi_internal.h), matching ADR-0205. Metal-only; no public C-API/CLI change -> no ffmpeg-patch impact.

feat/metric-delta-e-itp (2026-06-14)

Rebase impact: low. Fork-only additive metric — no upstream twin. Adds three new files (core/src/feature/delta_e_itp.c, core/src/feature/delta_e_itp_math.h, core/test/test_delta_e_itp.c). Additive registration only: one extern + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the live C++23 file — NOT the stale feature_extractor.c twin, which is dead per ADR-0846 and is a separate cleanup), one source line in core/src/meson.build (next to ciede.c), one executable() + one test() block in core/test/meson.build (next to test_ciede), and one nav entry in mkdocs.yml. No public C-API / ABI / CLI flag / meson_options.txt / public-header change → no ffmpeg-patch impact (CLAUDE §12 r14 N/A). The metric mirrors the CPU ciede.c structure (chroma-upsample helpers, 8/16-bit reads, double-precision frame sum); if the upstream ciede.c chroma-upsampling helpers are ever refactored, the copied-verbatim scale_chroma_planes / scale_chroma_planes_hbd in delta_e_itp.c are independent and need no follow-up. Compiled unconditionally (CPU); no backend flag.

feat/metric-pu21 (2026-06-14)

Rebase impact: low (fork-only additive). Adds five fork-only files (core/src/feature/pu21.c, pu21_math.h, pu21_ssim.c, pu21_ssim.h, core/test/test_pu21.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.cpp (the active C++ registry — NOT the dead feature_extractor.c, which the build does not compile), pu21.c + pu21_ssim.c added to the unconditional libvmaf_feature_sources in core/src/meson.build (next to ciede.c), test block + test() row in core/test/meson.build. Edits docs/metrics/pu21.md (new), mkdocs.yml, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/. Reuses only the read-only iqa Gaussian-convolve helper (iqa/convolve.c) and the read-only Gaussian window table (iqa/ssim_tools.h); the golden float_ssim/iqa_ssim (L=255) is untouched — PU21 ships its own L=256 SSIM. No public C-API / ABI / CLI flag / meson_options.txt change → no ffmpeg-patch impact. If the iqa convolve layout (output packed at the reduced stride w-kw+1) ever changes, pu21_ssim.c's reduction stride must follow.

feat/metal-integer-adm (2026-06-14)

Rebase impact: low. Adds three fork-only files (core/src/feature/metal/integer_adm.metal, integer_adm_metal.mm, core/test/test_metal_integer_adm_parity.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), .mm source + custom_target + metal_air_files entry in core/src/metal/meson.build, test block in core/test/meson.build. Edits docs/metrics/features.md (adm fixed-point GPU column += SYCL/HIP/Metal), docs/state.md, docs/rebase-notes.md, changelog.d/. Metal-only (-Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Mirrors the CPU integer_adm.c fixed-point DWT pipeline — if that algorithm changes, the Metal twin must follow.

fix/mcp-schema-bitdepth-vulkan (2026-06-14)

no rebase impact: edits the fork-only MCP servers (cmd/vmafx-mcp/tools.go, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py) + docs/mcp/tools.md + state.md + changelog. No libvmaf C-API/CLI change. The bitdepth enum + backend enum must stay in sync between the Python and Go MCP tool schemas (byte-compatible pair).

feat/metal-integer-vif (2026-06-14)

Rebase impact: low. Adds three fork-only files (core/src/feature/metal/integer_vif.metal, integer_vif_metal.mm, core/test/test_metal_integer_vif_parity.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), .mm source + custom_target + metal_air_files entry in core/src/metal/meson.build, test block in core/test/meson.build. Edits docs/metrics/vif.md, docs/state.md, docs/rebase-notes.md, changelog.d/. Metal-only (-Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Mirrors the CPU integer_vif.c fixed-point arithmetic + the float_vif_metal scaffold; if the CPU integer-VIF math changes, the Metal twin must follow.

feat/metal-float-adm (2026-06-14)

Rebase impact: low. Adds three fork-only files (core/src/feature/metal/float_adm.metal, float_adm_metal.mm, core/test/test_metal_float_adm_parity.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), .mm source + custom_target + metal_air_files entry in core/src/metal/meson.build, test block in core/test/meson.build. Edits docs/metrics/features.md (float_adm GPU column += Metal), docs/state.md, docs/rebase-notes.md, changelog.d/. Metal-only (-Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Core-VMAF kernel; mirrors the CUDA float_adm/ DWT+CSF+CM pipeline — if a future change alters that algorithm, the Metal twin must follow.

feat/metal-float-vif (2026-06-14)

Rebase impact: low. Adds three fork-only files (core/src/feature/metal/float_vif.metal, float_vif_metal.mm, core/test/test_metal_float_vif_parity.c) — no upstream twin. Additive registration: one extern + one list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), one .mm source + one custom_target + one metal_air_files entry in core/src/metal/meson.build, one test block in core/test/meson.build. Edits docs/metrics/vif.md, docs/state.md, docs/rebase-notes.md, changelog.d/ — keep both additive hunks on concurrent-branch conflict. Metal-only (compiles under -Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Part of the Metal full-parity sweep (9 real kernels); float_vif is core-VMAF.

feat/metal-integer-ssim (2026-06-14)

Rebase impact: low. Adds three fork-only files (core/src/feature/metal/integer_ssim.metal, integer_ssim_metal.mm, core/test/test_metal_integer_ssim_parity.c) — no upstream twin. Registration edits are additive: one extern + one list entry in core/src/feature/feature_extractor.c (inside the #if HAVE_METAL block), one .mm source + one custom_target + one metal_air_files entry in core/src/metal/meson.build, one test block in core/test/meson.build. Edits docs/metrics/ssim.md, docs/state.md, docs/rebase-notes.md, changelog.d/ — keep both additive hunks if a concurrent branch also edits them. The Metal kernel only compiles under -Denable_metal=enabled (macOS); no public libvmaf C-API / ABI / CLI / meson_options.txt change, so no ffmpeg-patch (CLAUDE §12 r14) impact. Scope note: Metal full-parity is 9 real kernels (not 11) — integer_moment/integer_ms_ssim are not distinct extractors.

feat/gpu-motion3-v2-twins (2026-06-14)

Rebase impact: low. Touches three fork-added GPU wrappers (core/src/feature/{sycl,hip,metal}/integer_motion_v2_{sycl.cpp,hip.c,metal.mm}) — none have an upstream twin, so no upstream-sync conflict — plus their fork-only parity tests (core/test/test_{sycl,hip,metal}_motion_v2_parity.c). Each mirrors the merged CUDA flush_fex_cuda motion3_v2 post-process (fix/cuda-motion-v2-motion3-emission, ADR-1108) byte-for-byte and reuses the shared motion_blend_tools.h helper; no GPU kernel is modified. Edits docs/metrics/motion.md, docs/state.md, docs/rebase-notes.md, changelog.d/ — keep both additive hunks if a concurrent branch also edits them. No public libvmaf C-API, ABI, header, CLI, or meson_options.txt change, so no ffmpeg-patch (CLAUDE §12 r14) impact. If a future change alters the CPU integer_motion_v2.c::flush blend/clip/seed/moving-average logic, all four GPU twins (cuda/sycl/hip/metal) must be updated in the same PR to keep the places=4 parity gate green.

fix/cuda-motion-v2-motion3-emission (2026-06-13)

Rebase impact: low. Touches core/src/feature/cuda/integer_motion_v2_cuda.c (fork-added CUDA wrapper — no upstream twin, so no upstream-sync conflict), core/test/test_cuda_motion_v2_parity.c (fork-only test), and core/src/feature/cuda/AGENTS.md (fork doc). Adds docs/adr/1108-*.md, changelog.d/fixed/1108-*.md. Edits docs/metrics/motion.md, docs/adr/README.md, and docs/state.md — these can conflict with a concurrent branch that also edits the same doc; keep both additive hunks (the motion3_v2 rows/paragraph here plus whatever the other branch adds). The motion3_v2 emission reuses the existing motion_blend_tools.h host helper and the established vmaf_feature_collector_append_with_dict API — no public libvmaf C-API, ABI, header, CLI, or meson_options.txt surface change, so no ffmpeg-patch (CLAUDE §12 r14) impact. The CUDA kernel itself is unchanged; only the host-side flush + option table grew.

feat/vmafx-scorestream-phase2 (2026-06-13)

no rebase impact: all changes are in fork-local Go files that do not exist in upstream Netflix/vmaf — pkg/libvmaf/stream.go (+ test), pkg/libvmaf/libvmaf.go (adds the exported Scorer.ResolveModel wrapper), cmd/vmafx-server/grpc_server.go, cmd/vmafx-node/server/server.go, cmd/vmafx-node/main.go, and their tests, plus docs/ and changelog.d/. The cgo path links against the public libvmaf C ABI (vmaf_picture_alloc / vmaf_read_pictures / vmaf_score_at_index / vmaf_score_pooled / vmaf_feature_score_at_index) — all stable upstream entry points in core/include/libvmaf/libvmaf.h; no upstream-mirrored C source is modified, so no mechanical conflict is possible. If a future upstream sync renamed any of those public functions, pkg/libvmaf/{direct,stream}.go would need the same one-line follow per the existing cgo-coupling invariant.

fix/json-model-feature-name-leak (2026-06-13)

Rebase impact: low. Touches core/src/read_json_model.c (upstream-mirrored, libvmaf/src/read_json_model.c upstream) and its fork-only C++23 twin core/src/read_json_model.cpp (ADR-0761 / ADR-0846 Wave 8) — both gain one free(model->feature[index].name) line plus a comment inside append_feature_name, immediately before the strdup. Also adds one test function + one registration line to core/test/test_model.c. Upstream lacks the duplicate-key overwrite guard, so a future sync that rewrites append_feature_name in the .c file will conflict on that hunk only; keep the fork's free-before-strdup (it fixes a real leak the upstream code shares). The .cpp twin is fork-only and never receives upstream hunks. No public API, ABI, header, or CLI surface changes — no ffmpeg-patch impact.

fix/golden-cpu-regression-restore (2026-06-13)

Rebase impact: low. Touches core/src/feature/vif_tools.c (removes #if HAVE_AVX512 dispatch blocks from the three float VIF functions). This file also exists in upstream Netflix/vmaf. Future upstream syncs that modify vif_tools.c will see a clean merge on any hunk that does not overlap with the three removed dispatch blocks. If upstream ever adds AVX-512 float VIF dispatch, the upstream version must be audited for Netflix golden parity before enabling it on this fork.

docs/rc-deferred-closeout (2026-06-13)

no rebase impact: changes confined to docs/state.md (move T-DOC-LEGACY-RUNNER from Open to Recently Closed) and docs/metrics/cambi.md (remove stale Vulkan section, doc-only). Conflicts with a concurrent branch that also edits state.md: keep both state-update rows; the row order within Recently Closed does not matter.

test/float-extractor-cpu-coverage — Float extractor CPU-path unit tests (2026-06-13)

no rebase impact: test-only addition. New files: core/test/test_float_{psnr,moment,ssim,ms_ssim,vif,adm,motion}_coverage.c. Edit to core/test/meson.build adds 7 new executable targets after line 1705 (test_float_vif_min_dim). No conflict risk unless a concurrent branch also adds tests immediately after that same line; in that case, append both blocks in source order.

fix/hip-meson-speed-tus-dedup — Remove duplicate HIP speed TU entries (2026-06-12)

no rebase impact: change confined to core/src/hip/meson.build (build config only). Removes the duplicate speed_chroma_hip.c / speed_temporal_hip.c wiring block that ADR-0852 introduced when ADR-0964 had already included those TUs earlier in the same hip_sources list. If a concurrent branch also edits core/src/hip/meson.build, keep both sets of changes; the resolved file must contain each speed TU exactly once.

fix/docker-ffmpeg-tag-n811-pin — Docker FFMPEG_TAG n8.1 → n8.1.1 + patch 0016 context (2026-06-12)

no rebase impact: Dockerfile and Dockerfile.ffmpeg pin changes are build-config only; patch 0016 context line fix is an ffmpeg-patches-internal correction with no effect on C API, ABI, or libvmaf source. Files touched: Dockerfile, Dockerfile.ffmpeg, .pre-commit-config.yaml, ffmpeg-patches/0016-libvmaf-wire-score-fmt-on-all-vmaf-filters.patch.

chore/codeql-cpp-cleanup-bundle — CodeQL C++ note-level cleanup (2026-06-12)

no rebase impact: all changes are local variable renames, dead-code removals, and comment additions. Files touched: adm_avx2.c, adm_avx512.c, integer_adm.c, feature_collector.c, feature_collector.cpp, feature_name.c, libvmaf.c, mkdirp.c, speed.c, vif_tools.c, pdjson.c, predict.c, svm.cpp, test_score_pooled_eagain.c, test_tensor_io.c, test_cambi.c, test_integer_adm_simd.c, test_svm_api.c. No API or ABI changes; no semantic changes to score computation. If a concurrent branch modifies any of these files, resolve by keeping both sets of changes — variable renames are local and non-conflicting.

chore/bundle-fable-5-findings — 4 Fable deep-hunt fixes (2026-06-12)

  • core/src/feature/x86/integer_ssim_avx2.c: reorder w*(s*s) to (w*s)*s for the 16-bit accumulation; only affects the integer_ssim AVX2 16-bit path. No conflict risk on other branches unless they also modify the integer_ssim_avx2.c accumulation order.
  • core/src/libvmaf.c: three separate hunks: bpc && → || in validate_pic_params; Phase 2 CUDA PREV_REF uses vmaf_picture_ref instead of a bare copy; dist translate error-propagation in read_pictures_cuda_translate. If a concurrent branch edits libvmaf.c in those functions, resolve by keeping all three fixes; they are independent.
  • core/test/test_validate_pic_params_bpc.c and core/test/meson.build: new test file and meson registration. No conflict risk unless another branch adds a test with the same name.
  • cmd/vmafx-server/concurrency.go, concurrency_test.go, grpc_server.go, http_server.go, main.go: ScoreLimiter addition. If a concurrent branch also modifies main.go flag parsing or grpc_server.go/http_server.go handler signatures, resolve by preserving the WithLimiter constructors and the --max-concurrent-scores flag.

fix/master-855-tip-3-reds — bootstrap-test recal + Dockerfile ldconfig (2026-06-08, no ADR)

no rebase impact: python/test/local_explainer_test.py line 276 expected value and places argument changed (fork-local test, not Netflix golden data); Dockerfile gains a single RUN ldconfig line after make install. If a concurrent branch also edits python/test/local_explainer_test.py lines 271-277, resolve by keeping places=3 and the # ADR-0418 macOS-libm Δ relax comments. If a concurrent branch edits Dockerfile around the libvmaf build block, ensure RUN ldconfig is present immediately after the make install line.

fix/containerfile-gid-and-stale-rename — GID/UID 1000 → 2000 (2026-06-08, ADR-1101)

no rebase impact: changes confined to dev/Containerfile (GID/UID values), docs/adr/1101-containerfile-gid-uid-2000.md (new ADR), and changelog.d/fixed/1101-containerfile-gid-uid-2000.md (new fragment). No production C source, public header, meson build files, or Python package modified. If a concurrent branch also edits dev/Containerfile, the only conflict will be in the groupadd/useradd lines; resolve by keeping GID/UID 2000.

fix/matrix-5-real-bugs (2026-06-08, no ADR — 5 correctness bug fixes)

core/src/feature/hip/integer_vif/vif_statistics.hip: removed #define AMD_WAVEFRONT_SIZE 64; reduction loop and lane guards now use warpSize device variable. Conflicts possible if another branch edits the same wavefront-reduce section; resolve by keeping the warpSize-based version. core/src/feature/hip/float_vif/float_vif_score.hip, float_motion/float_motion_score.hip, float_psnr/float_psnr_score.hip, float_moment/moment_score.hip: similar pattern — shared-memory arrays resized for minimum warp size (32); runtime warpSize used for loops. Conflict risk is low (only these wavefront-size definitions changed); keep the warpSize-based version. core/src/libvmaf.c: ref = &ref_host; dist = &dist_host guarded by if (hw_flags & HW_FLAG_HOST). Conflicts possible if another branch modifies the same #ifdef HAVE_CUDA block; resolve by keeping the HW_FLAG_HOST guard. mcp-server/vmaf-mcp/src/vmaf_mcp/server.py: _PROBE_YUV_WIDTH/HEIGHT bumped from 32 to 64; runtime_healthy set to score is not None. Low conflict risk. ffmpeg-patches/0005-libvmaf-add-libvmaf-sycl-filter.patch: FILTER_SINGLE_PIXFMT replaced by FILTER_PIXFMTS; do_vmaf_sycl and config_props_sycl split on AV_PIX_FMT_QSV. Conflicts possible if another branch edits patch 0005; apply this version first, then rebase the other. dev/Containerfile: RUN bash .../fetch-test-yuvs.sh layer added. Low conflict risk.

test/ai-scripts-coverage-round3 (2026-06-06, no ADR — test-only)

no rebase impact: adds two new test files (ai/tests/test_calibrate_phase_f_recipes_unit.py and ai/tests/test_analyze_knob_sweep_unit.py) and one changelog fragment. No existing C source, public API, upstream-mirrored Python, or golden assertion is modified.

docs/r12-c-api-doc-completeness (2026-06-06, no ADR — doc-only)

no rebase impact: comment-only changes to core/include/libvmaf/libvmaf_cuda.h, core/include/libvmaf/libvmaf_sycl.h, core/include/libvmaf/dnn.h, core/include/libvmaf/picture_v2.h, core/include/libvmaf/libvmaf.h, and core/include/libvmaf/model.h. No C sources, build files, or public API signatures touched — Doxygen comment additions only.

docs/doxygen-private-headers-r4 (2026-06-07)

no rebase impact: purely additive Doxygen comment blocks inserted into 10 internal headers under core/src/. No include paths, struct layouts, or function signatures are changed. Conflicts only if another branch inserts text at the same line positions in these headers.

fix/pic-pool-odr-cuda-gpumask-cov-floor (2026-06-08)

core/src/meson.build: adds cpp_args to picture_pool_cpp23_lib. Conflicts possible if another branch modifies the same static_library() block; resolve by keeping both the cpp_args line and the other change. core/tools/test/test_vmaf_cuda_gpumask.sh and core/tools/test/meson.build: shell guard + timeout added; low conflict risk. scripts/ci/coverage-check.sh and .github/workflows/tests-and-quality-gates.yml: per-file floor and pytest timeout changed; low conflict risk (numeric/string values only).

fix/cuda-done-path-double-unref-ort-coverage (2026-06-07)

no rebase impact: changes confined to core/src/libvmaf.c (split read_pictures_cuda_cleanup into full and _device_only variants inside the existing #ifdef HAVE_CUDA block — non-CUDA builds are unchanged; the call site at the done=true branch is guarded by the same #ifdef HAVE_CUDA) and core/src/dnn/ort_backend.c (collapse a dead else branch into a single-line ternary in ort_log_and_release_status — no behaviour change on any exercised code path, coverage-only impact).


fix/ci-multi-platform-bundle-838 (2026-06-07)

no rebase impact: changes confined to core/tools/cli_parse.cpp (const-qualifier on local strsep parameter — isolated #ifndef HAVE_STRSEP compat block), core/src/opt.cpp (replace static_cast<int> with memcpy in a single switch statement — no surrounding context dependency), core/src/feature/feature_extractor.cpp (add extern "C" wrappers around existing extern declarations — purely syntactic, no semantic change), core/src/libvmaf.c (add #ifdef HAVE_CUDA cleanup call in the done=true branch of vmaf_read_pictures — guarded by HAVE_CUDA; non-CUDA builds are unchanged), and python/test/vmafexec_feature_extractor_test.py (lower places=6 to places=4 on 5 per-frame assertions).


fix/go-rust-ci-red-bundle (2026-06-07)

no rebase impact: changes confined to .github/workflows/go-ci.yml (env var addition to go test step), cmd/vmafx-operator/internal/controller/vmafxnode_controller_test.go (timestamp truncation), cmd/vmafx-mcp/impl_direct.go (restore ValidatePath calls), and bindings/rust/vmafx-sys/Cargo.toml (add [lib] doctest = false). No C source, public header, or upstream-mirrored code modified.


fix/build-matrix-macos-windows-fixes (2026-06-07)

Rebase-sensitive (meson.build): core/src/meson.build gains dependencies : [pthread_dependency] on both picture_pool_cpp23_lib and gpu_picture_pool_cpp23_lib static library targets (~lines 1768–1788). If a concurrent branch adds other fields to those static_library() calls, merge both sets of fields.

Other changes are not rebase-sensitive:

  • core/src/feature/arm64/motion_v2_neon.c: rewrite of neon_any_nonzero_s32 (isolated function, no surrounding context).
  • compat/python-vmaf/__init__.py: two call-sites of --cpumask changed from "-1" to "4294967295".
  • .github/workflows/libvmaf-build-matrix.yml: two Vulkan matrix rows removed; if a concurrent branch also removes the Vulkan step bodies (Install Vulkan SDK, Cache meson subprojects (Vulkan wraps), Run Vulkan smoke tests (macOS MoltenVK), etc.), take both removals.
  • python/test/python_harness_coverage_test.py: test expectation update (--cpumask -1 → --cpumask 4294967295; test_run_preserves_user_env expected dict gains LC_ALL/LANG).
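The two --cpumask spellings denote the same 32-bit pattern: -1 in two's complement is all ones, which reads as 4294967295 when treated as unsigned. The note does not say why the call sites were changed. A plausible reading (an assumption here) is that the value is parsed as unsigned on some platform. The helper name as_uint32 is hypothetical.

```python
def as_uint32(value):
    """Reinterpret a Python int as an unsigned 32-bit value (two's complement)."""
    return value & 0xFFFFFFFF


all_bits_set = as_uint32(-1)
# The unsigned literal passes through unchanged.
unchanged = as_uint32(4294967295)
```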

fix/nightly-bisect-tracker-issue (2026-06-07)

no rebase impact: changes confined to .github/workflows/nightly-bisect.yml, scripts/ci/post-bisect-comment.py, docs/state.md, and changelog.d/fixed/nightly-bisect-tracker-issue.md. No C source, public header, Go source, or test logic modified.

fix/feature-extractor-flags-zero-skip-gpu (2026-06-07)

no rebase impact: the change is confined to a single function body in core/src/feature/feature_extractor.c (lines 443–473). No header changes, no meson.build changes, no new files except the ADR and changelog fragment. The only other file touched is core/test/test_picture.c (missing <string.h> include added). Neither file is a high-contention rebase target.

Rebase-sensitive: modifies core/src/meson.build and core/test/meson.build — two high-contention build files that accumulate edits from most GPU-backend PRs.

In core/src/meson.build:

  • The sycl_dependency declare_dependency block gains link_args: ['-fsycl'].
  • The vmaf_link_args += ['-fsycl'] line and its surrounding comment block are replaced with a shorter comment referencing sycl_dependency. If a concurrent branch adds entries to vmaf_link_args, take that branch's additions and keep the updated comment.

In core/test/meson.build:

  • The test('test_sycl_motion_add_uv_parity', ...) call loses should_fail: true and its accompanying ADR-1093 comment block. If a concurrent branch adds new SYCL test executables nearby, no conflict is expected; should_fail on other tests is unaffected.

In core/test/test_sycl_motion_add_uv_parity.c:

  • Feature-name queries updated (integer_motion2_mau, float_motion2_mau). Conflicts only if another branch edits the same query lines.

fix/mcp-resource-uri-validation (2026-06-07)

no rebase impact: single-function change in cmd/vmafx-mcp/impl_direct.go (resolveModelArgToPath) and one new test in cmd/vmafx-mcp/impl_direct_test.go. Only the Go cmd/vmafx-mcp package is touched; no C sources, no public headers, no test fixtures, no build files. Conflicts only if another branch edits resolveModelArgToPath or adds tests to impl_direct_test.go.

fix/cross-platform-path-list-separator (2026-06-06)

no rebase impact: single-line change in pkg/libvmaf/paths.go replacing strings.Split(extra, ":") with filepath.SplitList(extra). Only the Go pkg/libvmaf package is touched; no C sources, no public headers, no test fixtures, no build files. Conflicts only if another branch edits the same AllowedRoots function in that file.
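Why a hard-coded ":" separator is not portable can be shown with a Python analogue of the Go change. The fix itself is in Go and uses filepath.SplitList. The function below is a hypothetical stand-in that takes the separator as a parameter, so both platforms can be demonstrated on any host.

```python
import os


def split_path_list(extra, sep=os.pathsep):
    """Split a path list on the platform separator; an empty string gives an empty list."""
    if not extra:
        return []
    return extra.split(sep)


windows_list = "C:\\models;D:\\data"
# Splitting on ":" cuts through the drive letters.
broken = split_path_list(windows_list, sep=":")
# Splitting on the Windows list separator keeps each root intact.
correct = split_path_list(windows_list, sep=";")
```

On a POSIX host os.pathsep is ":" and on Windows it is ";", which is the same per-platform choice filepath.SplitList makes in Go.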

fix/neon-motion-zero-skip (2026-06-06)

no rebase impact: single-file change to core/src/feature/arm64/motion_v2_neon.c. Replaces the neon_hadd_s32 (signed horizontal sum) early-exit check with neon_any_nonzero_s32 (bitwise OR-fold) in both motion_score_pipeline_8_neon and motion_score_pipeline_16_neon. No public API, no header, no test data, no upstream-mirrored file is modified. Conflicts only if another branch edits the same static helper region of that file.
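The difference between the two early-exit checks can be shown with plain integers. This is a simplified sketch in Python, not the NEON code. It ignores 32-bit wraparound, and the function names are hypothetical stand-ins for the two helpers.

```python
from functools import reduce
from operator import or_


def hadd_is_zero(lanes):
    """Signed horizontal sum test: true whenever the lanes cancel out."""
    return sum(lanes) == 0


def any_nonzero(lanes):
    """Bitwise OR-fold test: true if any lane has any bit set."""
    return reduce(or_, lanes, 0) != 0


# Lanes that cancel in a signed sum but are clearly not all zero.
cancelling = [5, -5, 2, -2]
all_zero = [0, 0, 0, 0]
```

With the sum-based check, cancelling would be mistaken for an all-zero vector and skipped. The OR-fold only reports zero when every lane is zero.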

fix/helm-values-completeness-adr-1074 (ADR-1074, 2026-06-06)

no rebase impact: changes are confined to deploy/helm/vmafx/values.yaml, deploy/helm/vmafx/values.schema.json, and three templates (templates/statefulset.yaml, templates/node.yaml, templates/networkpolicy.yaml). No C source, public header, upstream-mirrored file, Python test, or golden-data assertion is touched. Conflict risk exists only if another branch edits those same Helm files concurrently.

test/coverage-pkg-observability (2026-06-06)

no rebase impact: changes are confined to pkg/observability/coverage_gaps_test.go (new test file), pkg/observability/AGENTS.md (invariant notes), and changelog.d/added/observability-coverage-gaps.md (fragment). No production source, public header, or build file is modified. Conflicts only if another branch edits the same lines in AGENTS.md or rebase-notes.md.

fix/sanitizer-deselect-tests-and-quality-gates (2026-06-06)

no rebase impact: CI-only change to .github/workflows/tests-and-quality-gates.yml adding test_gpu_picture_pool_uaf, test_integer_motion_v2_coverage, and test_pic_preallocation to the ADR-0347 per-sanitizer EXCLUDE patterns for address, undefined, and thread. No source, header, test, or build file is modified. Conflicts only if another branch edits the same case block in that workflow file.

fix/mcp-score-at-index-eagain-guard (ADR-1073, 2026-06-06)

no rebase impact: changes are confined to core/src/libvmaf.c (vmaf_score_at_index guard condition), core/src/mcp/compute_vmaf.c (n_threads restored to 1u, debug code removed), and core/test/test_mcp_smoke.c (fixture dimensions 64→192, debug print removed). No public API surface, no upstream-mirrored file is modified. The guard change is a one-line fix that does not affect the call signature or semantics observable to callers that never encounter multi-frame pools.

fix/skip-motion-five-frame-window-adr-0337 (ADR-0337, 2026-06-06)

no rebase impact: only python/test/feature_extractor_test.py is modified — 9 test methods gain @unittest.skip decorators. No C source, public header, upstream-mirrored file, or golden-data assertion is touched. Rebase against Netflix/vmaf master or any feature branch has zero conflict risk.

fix/prev-ref-batch-refcount-and-motion-score (ADR-1072, 2026-06-06)

Files touched: core/src/libvmaf.c (two sites in threaded_extract_batch_func and one in threaded_extract_func), core/test/test_hip_ms_ssim_parity.c (FIXTURE_H 144→192), core/test/test_cuda_float_ms_ssim_parity.c (FIXTURE_H 144→192), core/test/test_hip_motion_parity.c (add debug=1 opts, add feature.h include), docs/adr/1072-prev-ref-batch-refcount-leak.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/1072-prev-ref-batch-refcount-leak.md.

Rebase impact: The libvmaf.c hunks add vmaf_picture_unref + memset + memset(f->prev_ref) inside the VMAF_FEATURE_EXTRACTOR_PREV_REF block in threaded_extract_batch_func. If a concurrent branch modifies the same block or the unref: label region, resolve by keeping both the concurrent change and the new unref-before-memset + zero-f->prev_ref logic from this branch. The test fixture changes (144→192) and the debug-flag addition are self-contained with no shared invariants. No public-API, ABI, or upstream-mirrored file changes.


fix/test-failures-macos-dnn (2026-06-06, no ADR — bug fixes)

Files touched: core/src/gpu_picture_pool.{c,cpp}, core/src/libvmaf.c, core/src/feature/integer_motion.c, core/src/feature/feature_extractor.cpp, core/test/test_framesync.c, core/test/test_integer_motion_coverage.c, changelog.d/fixed/macos-dnn-test-failures-6-fixes.md, docs/rebase-notes.md.

Rebase impact: All changes are internal bug fixes with no public-API or ABI changes. If a concurrent branch modifies vmaf_score_at_index (libvmaf.c), vmaf_gpu_picture_pool_init (gpu_picture_pool.{c,cpp}), or integer_motion.c init(), resolve conflicts by keeping both the concurrent change and the err != -EAGAIN / *pool = NULL / w < 3 || h < 3 guards from this branch. The test fixes in test_framesync.c and test_integer_motion_coverage.c are self-contained; no invariants span other branches.

no rebase impact on public API, build flags, or upstream-mirrored files.


docs/doxygen-public-header-drift (2026-06-06, no ADR — doc-only fix)

no rebase impact: comment-only changes to core/include/libvmaf/libvmaf_cuda.h, core/include/libvmaf/libvmaf_vulkan.h, and core/include/libvmaf/libvmaf_sycl.h. No C sources, build files, or public API signatures touched.

chore/ci-workflow-audit-sha-pin-dead-jobs (2026-06-06, no ADR — workflow hygiene)

no rebase impact: changes are entirely in .github/workflows/ (SHA pin, dead-job removal, comment correction). No C sources, public API, build flags, or upstream-mirrored files are touched.


test/go-vmafx-mcp-handler-coverage (2026-06-06, no ADR — test-only)

Files touched: cmd/vmafx-mcp/impl_handlers_test.go (new), cmd/vmafx-mcp/AGENTS.md, changelog.d/added/go-vmafx-mcp-handler-coverage.md, docs/rebase-notes.md.

Rebase impact: test-only addition; no production code changed. If a concurrent branch adds a new tool handler to impl.go, add a corresponding error-path test to impl_handlers_test.go following the established pattern (t.Setenv("VMAF_BIN", "/nonexistent/...") for binary-dependent handlers).


fix/r10-cpp23-wave-error-paths (2026-06-06, ADR-1060)

Files touched: core/src/feature/feature_extractor.cpp, core/src/read_json_model.cpp

Rebase impact: none. All changes are internal to existing functions with no public-API or header changes. Branches that also touch feature_extractor.cpp should verify that the free_fex_list label and the context-create parse-options error path merge cleanly.


fix/helm-chart-security-hardening (2026-06-06, ADR-1058)

Files touched: deploy/helm/vmafx/templates/pdb.yaml (new), deploy/helm/vmafx/templates/operator-rbac.yaml, deploy/helm/vmafx/templates/networkpolicy.yaml, deploy/helm/vmafx/values.yaml, deploy/helm/vmafx/values.schema.json

Rebase impact: The operator RBAC resource names changed: *-operator-role (ClusterRole) is replaced by *-operator-crds (ClusterRole) + *-operator-ns (Role). Any branch that patches operator-rbac.yaml will conflict on the resource name. Run helm upgrade (not in-place patch) when applying to existing operator installs. The networkPolicy.allow schema is now additionalProperties: false; any branch that adds a new allow.* key must also enumerate it in values.schema.json.


fix/rust-clippy-library-strictness (2026-06-06, ADR-1063)

Files touched: bindings/rust/vmafx-sys/src/lib.rs, bindings/rust/vmafx-sys/src/safe.rs, bindings/rust/vmafx/src/lib.rs, bindings/rust/vmafx/src/picture.rs, bindings/rust/vmafx/src/error.rs, core/src/feature/rust/tad/src/lib.rs

Rebase impact: vmafx-sys/src/lib.rs no longer uses crate-level #![allow(clippy::all)]; the generated bindings are now in a private mod bindings with the allow scoped to that module. Any branch that adds new hand-written code to vmafx-sys/src/lib.rs or safe.rs must write clippy-clean code. The VmafContext::default() call is gone — branches that depend on it must use VmafContext::new() instead. The #![deny(unsafe_op_in_unsafe_fn)] in safe.rs and tad/src/lib.rs will cause a compile error on any in-flight branch that adds a bare unsafe operation inside an unsafe fn without an explicit unsafe {} block.


fix/msvc-cpp-std-vc-latest-1056 (2026-06-06, ADR-1056)

Files touched: core/meson.build, core/AGENTS.md

Rebase impact: core/meson.build no longer carries cpp_std=c++23 in default_options. Any branch that adds cpp_std=... to default_options will conflict with this change. The add_project_arguments('-std=c++23') block must remain beneath the cxx = meson.get_compiler('cpp') line and above the first cc.check_header call. The get_option('cpp_std') == 'none' guard must be preserved; removing it would cause the SYCL leg to receive both -Dcpp_std=c++14 (from the workflow) and -std=c++23 (from the else branch), which is a compile error.


fix/ci-pin-cuda-132-jimver (2026-06-06, no ADR — CI configuration pin fix)

no rebase impact: CI-only change (.github/workflows/build.yml, .github/workflows/libvmaf-build-matrix.yml). No C sources, public API, or upstream-mirrored files are touched.


fix/macos-docker-platform-unblock (2026-06-04, no ADR — build bug fix)

no rebase impact: adds <string_view> include to core/tools/vmaf.cpp (no logic change) and replaces VmafCudaFunctions with CudaFunctions in 13 CUDA close callbacks (correct type name, no ABI/API change). Neither modification touches upstream-mirrored code paths or public API signatures.


revert/float-adm-simd-dispatch-neon-fma (2026-06-06, ADR-1057)

no rebase impact: removes adm_prime_simd_dispatch() from adm_tools.h and adm_tools.c; removes the call site added to float_adm.c::init() by PR #685; deletes core/test/test_float_adm_simd.c and its meson.build entries. Any in-flight branch that rebases onto a version of adm_tools.h that still contains adm_prime_simd_dispatch() will see a merge conflict at the declaration — resolve by simply not including the declaration (the function no longer exists after this revert). The SIMD kernel files (adm_tools_avx2.c, adm_tools_neon.c, etc.) are untouched; the functions remain compiled and linkable for a future re-dispatch PR.


fix/core-test-regressions-pr-train (2026-06-04, no ADR — bug fixes)

Files touched: core/src/gpu_picture_pool.cpp, core/src/feature/feature_extractor.cpp, core/src/feature/integer_motion.c, core/src/predict.c, core/test/test_framesync.c, core/test/test_integer_motion_coverage.c, core/test/test_score_pooled_eagain.c

Rebase impact: Any concurrent branch that also edits feature_extractor_list[] must preserve the &vmaf_fex_integer_motion_v2 entry. Any branch that adds a new motion extractor with VMAF_FEATURE_EXTRACTOR_PREV_REF flag benefits from the context_extract prev_ref management added here. Branches modifying predict_load_feature_score must not regress the -EAGAIN vs -EINVAL distinction for unwritten feature vectors (Netflix#755 / ADR-0154).
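A sketch of the kind of -EAGAIN vs -EINVAL distinction this entry asks branches to preserve. It assumes that "not written yet" maps to -EAGAIN (the caller may retry) and that bad arguments map to -EINVAL; the `Demo*` types and `demo_get_score` are hypothetical and do not reproduce `predict_load_feature_score`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-frame score slot. */
typedef struct DemoScore {
    int written; /* non-zero once a value has been stored */
    double value;
} DemoScore;

/* Hypothetical feature vector: one slot per frame index. */
typedef struct DemoVector {
    DemoScore *score;
    size_t capacity;
} DemoVector;

/* Returns 0 and stores the score in *out when the slot is written.
 * Returns -EINVAL for bad arguments (a caller bug).
 * Returns -EAGAIN when the slot exists conceptually but has not been
 * written yet, so the caller can retry later (assumed semantics). */
static int demo_get_score(const DemoVector *v, size_t index, double *out)
{
    if (!v || !out) return -EINVAL;
    if (index >= v->capacity || !v->score[index].written) return -EAGAIN;
    *out = v->score[index].value;
    return 0;
}
```

Collapsing both cases into a single error code is the regression to avoid: a caller could then no longer tell "retry later" apart from "this call can never succeed".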


fix/legacy-runner-import-stub-adr0749 (2026-06-04, no ADR — bug fix)

Files touched: compat/python-vmaf/core/quality_runner.py, docs/state.md, changelog.d/fixed/legacy-runner-import-stub-adr0749.md

Rebase impact: The VmafLegacyQualityRunner stub is fork-local and does not conflict with upstream Netflix/vmaf (which never had this class). No upstream sync will touch compat/python-vmaf/core/quality_runner.py in a way that removes the stub; if upstream adds a class with the same name, the stub must be removed rather than overwritten.

fix/arm-motion-v2-re-register-and-test-order

Files touched: core/src/meson.build, core/src/feature/feature_extractor.c, core/test/test_integer_motion_coverage.c

Rebase impact: no rebase impact from other branches expected. If a concurrent PR touches feature_extractor_list[] or meson.build's CPU source list, preserve integer_motion_v2.c registration and the &vmaf_fex_integer_motion_v2 list entry — removing them breaks all "motion_v2" lookups on CPU-only builds.

ci/dev-container-gate-adr0819 (2026-06-04, ADR-0819)

no rebase impact: adds .github/workflows/dev-container-build.yml and docs/adr/0819-dev-container-ci-gate.md. No C source, public C API, upstream-mirrored Python, Netflix golden-assertion file, or ffmpeg-patches file is touched.

docs/mkdocs-strict-nav-conformance

no rebase impact: changes are isolated to mkdocs.yml nav entries and a changelog fragment. No C source, public C API, upstream-mirrored Python, Netflix golden-assertion file, or ffmpeg-patches file is touched.


ci/promote-gpu-coverage-gate-required

no rebase impact: changes are isolated to the CI workflow file and docs. No C source, public C API, upstream-mirrored Python, Netflix golden-assertion file, or ffmpeg-patches file is touched.


fix/containerfile-user-hardening-adr1042

no rebase impact: container hardening changes only (USER directive and ARG/ENV scoping). No public API or upstream-mirrored C code touched.


fix/r9-helm-vmaftune-grpc-bugs (2026-06-04)

no rebase impact: changes are confined to deploy/helm/vmafx/ (Helm chart config only), tools/vmaf-tune/src/vmaftune/cli.py (Python), and cmd/vmafx-node/online_feedback.go (fork-local Go binary). None of these files has an upstream Netflix/vmaf counterpart.


fix/r6-sycl-kernel-correctness (2026-06-04)

Files touched: core/src/feature/sycl/integer_vif_sycl.cpp, core/src/feature/sycl/integer_motion_sycl.cpp, core/src/feature/sycl/integer_adm_sycl.cpp

Rebase impact: no rebase impact — all files are fork-local SYCL paths with no upstream counterparts.


fix/r6-cuda-hip-kernel-correctness (2026-06-04)

Files touched: core/src/feature/cuda/integer_vif/filter1d.cu, core/src/feature/cuda/integer_adm/adm_cm.cu, core/src/feature/hip/integer_adm/adm_decouple.hip, core/src/feature/hip/integer_vif/vif_statistics.hip

Rebase impact: no rebase impact — all four files are fork-local GPU paths with no upstream counterparts. The CUDA files are in feature/cuda/ which Netflix upstream does not ship; the HIP files are fully fork-added.


fix/r6-metric-scoring-guards (2026-06-04)

Files touched: core/src/feature/integer_psnr.c, core/src/feature/x86/psnr_avx2.c, core/src/feature/x86/psnr_avx512.c, core/src/feature/arm64/psnr_neon.c, core/src/feature/adm.c, core/src/feature/integer_adm.c, core/src/feature/float_adm.c

Rebase impact: no rebase impact — all fixes are in error-path / edge-case branches that upstream has not touched since the fork. The APSNR cap formula change (* 2 removed) only affects scores on nearly-perfect sequences; it is not a Netflix golden-data assertion value.


fix/r7-ci-wf-concurrency-timeout (2026-06-04, ADR-1035)

Files touched: .github/workflows/nightly.yml, .github/workflows/nightly-bisect.yml, .github/workflows/supply-chain.yml, .github/workflows/release-please.yml, .github/workflows/scorecard.yml, .github/workflows/rust-ci.yml, .github/workflows/go-ci.yml, .github/workflows/e2e-k8s.yml

No rebase impact: pure CI configuration changes with no code-path dependencies. Upstream Netflix/vmaf does not carry these workflows.


Files touched: docs/development/build-flags.md, docs/metrics/features.md, mkdocs.yml

No rebase impact: documentation-only changes. No C library, public header, or Netflix golden-assertion file is touched.


fix/r7-mcp-precision-subsample-drift (2026-06-04, ADR-1038)

Files touched: cmd/vmafx-mcp/impl.go, cmd/vmafx-mcp/tools.go, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py

No rebase impact: pure default-value changes. No C library, public header, upstream Python harness, or Netflix golden-assertion file is touched.


fix/r7-vendored-svm-realloc-oom (2026-06-04, ADR-1039)

Files touched: core/src/svm.cpp

no rebase impact: three internal realloc safety patches. No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. If an upstream Netflix/vmaf commit also fixes these same three sites, take the upstream version (which is also a MEM04-C fix) and drop this patch at rebase time.
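For comparison against an upstream fix at rebase time, a realloc safety patch commonly has the following shape. This is a generic sketch with a hypothetical `demo_grow` helper, assuming the patches follow the usual temporary-pointer pattern; it does not reproduce the three svm.cpp sites.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Grows *buf to hold new_count ints.
 * On failure the original buffer stays valid and owned by the caller,
 * because the realloc result is held in a temporary first. */
static int demo_grow(int **buf, size_t *count, size_t new_count)
{
    /* Reject zero-length requests and size computations that overflow. */
    if (new_count == 0 || new_count > SIZE_MAX / sizeof(**buf))
        return -EINVAL;

    int *tmp = realloc(*buf, new_count * sizeof(**buf));
    if (!tmp)
        return -ENOMEM; /* *buf is untouched; nothing leaked */

    *buf = tmp;
    *count = new_count;
    return 0;
}
```

Writing `*buf = realloc(*buf, ...)` directly would overwrite the only pointer to the old block when the allocation fails, which is the leak this pattern avoids.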


Files touched: Cargo.toml, bindings/rust/vmafx/Cargo.toml, ai/pyproject.toml, mcp-server/vmaf-mcp/pyproject.toml, dev-llm/pyproject.toml, python/pyproject.toml, tools/ensemble-training-kit/pyproject.toml, tools/vmaf-roi-score/pyproject.toml, tools/vmaf-tune/pyproject.toml, core/src/svm.cpp

No rebase impact: license field corrections and copyright header additions have no effect on build or test outputs. Upstream Netflix/vmaf does not carry Cargo.toml or any of these pyproject.toml files.


fix/sycl-speed-incomplete-type-access (2026-06-04)

Files touched: core/src/feature/sycl/speed_chroma_sycl.cpp, core/src/feature/sycl/speed_temporal_sycl.cpp

no rebase impact: internal build-fix replacing direct struct member dereferences with the existing public API call vmaf_sycl_get_queue_ptr(). No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. If an upstream commit adds a SYCL speed extractor, ensure it also uses vmaf_sycl_get_queue_ptr() rather than direct struct access.


fix/cli-narrowing-casts-vmaf-cpp (2026-06-04)

Files touched: core/tools/vmaf.cpp

no rebase impact: three static_cast<unsigned>(...) wrappers added to the VmafPictureConfiguration initializer at line ~1360. No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. If an upstream commit modifies the VmafPictureConfiguration initializer or adds new pic_params fields, verify the cast pattern is preserved.


fix/release-please-config-json-parse-error (2026-06-04)

no rebase impact: removes a duplicate array element from release-please-config.json. No C source, public header, Python harness, or Netflix golden-assertion file is touched. Any in-flight branch that modifies release-please-config.json should simply ensure the ai package's changelog-sections array no longer contains two chore entries.

fix/simd-psnr-16bit-scalar-tail-overflow (2026-06-04)

Files touched: core/src/feature/x86/psnr_avx2.c, core/src/feature/x86/psnr_avx512.c, core/src/feature/arm64/psnr_neon.c

no rebase impact: internal arithmetic fix in scalar tail loops. No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The change affects only the three SIMD backends' scalar-remainder path for 16-bit PSNR; the SIMD main loop is unchanged. Port of any upstream commit touching these files should verify that the (uint32_t)abs(...) pattern is preserved in the scalar tail if the upstream change modifies it.
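Why the cast matters can be shown with a small scalar-tail sketch. The function name and signature are hypothetical, and the real kernels may accumulate differently; the point is only that the 16-bit difference is widened to an unsigned type before it is squared.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of squared errors over the scalar remainder [start, width).
 * With 16-bit samples the absolute difference can reach 65535, and
 * 65535 * 65535 = 4294836225 does not fit in a signed 32-bit int.
 * Casting to uint32_t before the multiply keeps the product in range. */
static uint64_t demo_sse_tail_16bit(const uint16_t *ref, const uint16_t *dis,
                                    size_t start, size_t width)
{
    uint64_t sse = 0;
    for (size_t i = start; i < width; i++) {
        const uint32_t e = (uint32_t)abs((int)ref[i] - (int)dis[i]);
        sse += (uint64_t)e * e;
    }
    return sse;
}
```

An upstream change that rewrites the tail as `int e = ...; sse += e * e;` would reintroduce signed overflow on worst-case inputs, so that is the form to watch for when porting.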


fix/r6-cpu-scoring-nan-ub-guards (2026-06-04)

Files touched: core/src/feature/integer_psnr.c, core/src/feature/ms_ssim.c, core/src/feature/float_ssim.c, core/src/feature/float_ms_ssim.c, core/src/feature/iqa/ssim_tools.c, core/src/feature/adm.c, core/src/feature/integer_adm.c, core/src/feature/float_adm.c, core/src/feature/motion.c, core/src/feature/cambi.c, docs/adr/1033-cpu-scoring-nan-ub-guards.md, changelog.d/fixed/1033-cpu-scoring-nan-ub-guards.md

no rebase impact: all changes are internal correctness fixes inside CPU-path scoring functions. No public C API headers, no meson_options.txt entries, no ffmpeg-patches/ series entries, and no Netflix golden-assertion files are touched. Rebasing on top of any upstream commit that modifies these same source files may produce minor context conflicts in the guard blocks; resolve by keeping both the upstream change and the NaN guard. ADR-1033.


fix/vmaf-init-double-init-guard-vmaf-close-pointer-contract (2026-06-04, ADR-1032)

Files touched: core/src/libvmaf.c, core/src/dnn/dnn_api.c, core/include/libvmaf/libvmaf.h, core/test/test_context.c

no rebase impact: all changes are fork-local bug-fixes with no upstream equivalents. vmaf_init guard is a new branch (no upstream logic removed), vmaf_close header change is documentation-only, and the DNN fallback path touches a fork-added sidecar-loading block that does not exist in Netflix upstream. No Netflix golden assertions or upstream-mirrored Python are touched.


fix/cuda-vif-filter1d-adm-cm-opprec (2026-06-04)

Files touched: core/src/feature/cuda/integer_vif/filter1d.cu, core/src/feature/cuda/integer_adm/adm_cm.cu

no rebase impact: pure kernel arithmetic fixes. No public C API header, no meson build option, no FFmpeg patch surface, and no upstream-mirrored Python file is touched. The fixes correct two silent arithmetic defects (a typo in the rd-filter upper-bound guard in filter1d.cu and a missing parenthesis pair in two x_sq reduction loops in adm_cm.cu). Cross-backend SYCL/HIP/Vulkan ADM and VIF twins do not carry the same expressions and are unaffected.


fix/sycl-vif-rd-stride-motion-uv-sync (2026-06-04)

Files touched: core/src/feature/sycl/integer_vif_sycl.cpp, core/src/feature/sycl/integer_motion_sycl.cpp, docs/adr/1034-sycl-vif-rd-stride-motion-uv-sync.md, changelog.d/fixed/sycl-vif-rd-stride-motion-uv-sync.md

If this branch is rebased onto a commit that changes the rd_stride or rd_size allocation in integer_vif_sycl.cpp, re-verify that both the scalar (SIMD-32) and SIMD-16 kernel variants use (e_w + 1U) / 2U as the stride and that the allocation uses ((w + 1U) / 2U) * ((h + 1U) / 2U). If a future PR routes UV H2D copies through copy_queue and updates last_upload_event, the vmaf_sycl_queue_wait(state) added in submit_fex_sycl can be removed in favour of the GPU-side barrier; track this as a follow-up optimization.
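The two expressions to re-verify are ceiling divisions by two. A small host-side sketch (helper names are hypothetical; the real kernels are SYCL C++) shows why the `+ 1U` matters for odd dimensions:

```c
#include <stddef.h>

/* Stride of the half-resolution buffer: ceil(e_w / 2). */
static unsigned demo_rd_stride(unsigned e_w)
{
    return (e_w + 1U) / 2U;
}

/* Element count of the half-resolution buffer: ceil(w / 2) * ceil(h / 2). */
static size_t demo_rd_elems(unsigned w, unsigned h)
{
    return (size_t)((w + 1U) / 2U) * ((h + 1U) / 2U);
}
```

For a 7x5 input, truncating division would give 3 * 2 = 6 elements, while the expressions above give 4 * 3 = 12, which covers the last odd column and row. For even dimensions both forms agree.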


fix(hip,metal): HIP adm_decouple dangling body + VIF wavefront carry + Metal motion vertical halo (ADR-1030, 2026-06-04)

Files touched: core/src/feature/hip/integer_adm/adm_decouple.hip, core/src/feature/hip/integer_vif/vif_statistics.hip, core/src/feature/metal/float_motion.metal, docs/adr/1030-hip-metal-kernel-correctness.md, changelog.d/fixed/hip-metal-kernel-correctness-1030.md

Rebase impact: low. These are self-contained correctness fixes inside GPU-only kernel files. No public C API, no CPU feature extractor, no CLI flag, and no Netflix golden assertion is touched. Any branch that also modifies adm_decouple.hip will need to re-apply the dangling-body removal; branches touching vif_statistics.hip wavefront_reduce_i64 will need to keep the integer-addition reassembly. Metal float_motion.metal conflicts are straightforward to resolve by preserving TILE_H=20 and the - HALF_FW origin offsets.


docs/vulkan-overview-mark-removed-adr0726 (2026-06-04)

Files touched: docs/backends/vulkan/overview.md, docs/api/vulkan-image-import.md, docs/state.md, changelog.d/chore/vulkan-docs-mark-removed.md

no rebase impact: docs-only changes. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The changes add removal notices to two Vulkan documentation files that still described the backend as active after ADR-0726 removed it.


docs(post-rename): scrub residual libvmaf/ paths (ADR-0700)

no rebase impact: doc-only path corrections. All changed files are under docs/, AGENTS.md, CONTRIBUTING.md, and one comment in core/include/libvmaf/libvmaf_mcp.h. No C source files changed. No public headers changed (the comment in libvmaf_mcp.h is prose, not an include path). No Netflix golden assertions touched.


docs(usage,api): correct backend auto-priority + Doxygen drift in public headers

Files touched: docs/usage/vmafx-cli.md, docs/usage/vmaf-tune-score-backend.md, docs/usage/vmaf-tune.md, docs/usage/bench.md, docs/usage/ffmpeg.md, core/include/libvmaf/libvmaf.h, core/include/libvmaf/libvmaf_hip.h, core/include/libvmaf/AGENTS.md, core/include/libvmaf/model.h, changelog.d/changed/backend-autopriority-doxygen-drift.md, docs/rebase-notes.md.

No rebase impact: doc-only and Doxygen-only edits. No C source, public C symbol, ABI surface, Netflix golden assertion, or upstream-mirrored implementation is affected. The model.h change replaces a @field block with per-member inline comments — comment-only; no struct layout change.


docs/mcp-tools-audit-fixes

Files touched: docs/mcp/index.md, docs/mcp/tools.md, docs/mcp/http-transport.md, docs/mcp/release-channel.md, changelog.d/changed/mcp-tools-catalogue-audit-fixes.md, docs/rebase-notes.md.

No rebase impact: doc-only scrub. No C source, public headers, Netflix golden assertions, MCP server Python/Go source, or upstream-mirrored symbols are touched. No branch logic changed.


docs(post-vulkan-drop): residual scrub + fix -Denable_vulkan=true (invalid Meson) → =enabled

Branch: docs/post-vulkan-drop-residual-scrub

no rebase impact: docs-only change. Fixes stale Vulkan references in docs/ai/datasets/k150k.md, docs/mcp/tools.md, docs/api/index.md, and docs/api/gpu.md. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched.


docs(rebrand): scrub residual Lusoris-fork references

Files touched: docs/usage/cli.md, docs/ai/mos-corpora.md, docs/ai/konvid-1k-ingestion.md, docs/ai/konvid-150k-ingestion.md, docs/development/release.md, docs/development/automated-rule-enforcement.md, docs/mcp/index.md, docs/architecture/c4-context.md, docs/architecture/c4-container.md, docs/metrics/bad-cases.md, CONTRIBUTING.md, AGENTS.md, changelog.d/changed/scrub-lusoris-fork-refs.md, docs/rebase-notes.md.

No rebase impact: doc-only text substitutions (branding strings, fork issue URL, HIP status text). No C source, public header, Netflix golden assertion, upstream-mirrored symbol, version string, or copyright header was modified.


docs(versions): bump stale Go + required-checks-count + Python pins

Files touched: CLAUDE.md, docs/development/languages.md, docs/development/release.md, docs/architecture/c4-context.md, docs/mcp/index.md, docs/getting-started/install/windows.md, docs/ai/training.md, changelog.d/changed/bump-stale-docs-go-checks-python-pins.md, docs/rebase-notes.md.

No rebase impact: docs-only scrub; no C source, public header, Netflix golden assertion, or upstream-mirrored symbol is affected.


docs(copyright): drop "and Claude (Anthropic)" from fork headers — residual sweep

Files touched: README.md, dev-llm/src/vmaf_dev_llm/__init__.py, scripts/lib/__init__.py, changelog.d/changed/copyright-drop-anthropic-residuals.md, docs/rebase-notes.md.

No rebase impact: text-only copyright-line change in three files missed by the ADR-0861 / ADR-0776 sweeps. No C source, public header, Netflix golden assertion, upstream-mirrored symbol, or build system touched.


docs(post-ansnr): scrub residual ANSNR references (PR #38 follow-up)

Files touched: docs/api/gpu.md, docs/backends/hip/overview.md, docs/backends/index.md, docs/backends/arm/overview.md, docs/backends/metal/index.md, docs/development/build-flags.md, docs/development/cross-backend-gate.md, docs/metrics/features.md, docs/mcp/tools.md, README.md, core/src/feature/metal/AGENTS.md, core/src/hip/AGENTS.md, core/src/feature/cuda/AGENTS.md, AGENTS.md, changelog.d/changed/post-ansnr-doc-scrub.md.

No rebase impact: doc-only changes (no C source, public header, Netflix golden assertions, or upstream-mirrored symbols affected). If an upstream Netflix/vmaf PR adds float_ansnr back, take the upstream side only in the C sources; the fork's doc changes apply only to fork-specific backend docs.


fix(rebrand): correct C++ badge (c++11→c++23) + drop Vulkan from GPU badge

Files touched: README.md, changelog.d/fixed/readme-badges-cpp23-drop-vulkan.md, docs/rebase-notes.md.

No rebase impact: doc-only edit to README.md badge lines; no C source, public header, Netflix golden assertion, or upstream-mirrored symbol is affected.

test(hip): parity coverage round 5 — speed_chroma + speed_temporal (2026-06-04, ADR-1004)

Files touched: core/test/test_hip_speed_chroma_parity.c, core/test/test_hip_speed_temporal_parity.c, core/test/meson.build, docs/adr/1004-hip-kernel-coverage-round5.md, docs/adr/README.md, docs/state.md, changelog.d/added/1004-hip-kernel-coverage-round5.md

no rebase impact: the two new test TUs are fork-local additions with no upstream analogue. The meson.build additions are append-only within the if hip_enabled block. No C source, public header, Netflix golden assertion, or upstream-mirrored Python file is modified.


chore/build-cpp-std-c23-bump (2026-06-04, ADR-1003)

Files touched: core/meson.build, core/AGENTS.md, core/test/meson.build, docs/adr/1003-cpp-std-c23-bump.md, docs/adr/README.md, changelog.d/changed/cpp-std-c23-bump.md

Rebase impact: Low. The cpp_std=c++11 → cpp_std=c++23 change in core/meson.build may conflict with any upstream Netflix/vmaf PR that also touches default_options. Netflix upstream still uses c++11; on conflict, keep c++23 (the fork's stated standard). The core/test/meson.build fix for test_feature_collector_coverage is fork-local; take the fork side on any conflict.


test(mcp-server): coverage push round 4

Files touched: mcp-server/vmaf-mcp/tests/test_coverage_round4.py, changelog.d/added/mcp-server-coverage-round4.md, docs/rebase-notes.md.

Rebase impact: None. Fork-local Python test file with no upstream analogue; no C source, public header, or Netflix golden-assertion file is touched.


test(sycl): parity coverage round 5 — CAMBI parity gate

Branch: test/sycl-parity-round5-cambi

no rebase impact: adds core/test/test_sycl_cambi_parity.c (new file, no upstream analogue), one meson.build registration block, ADR-1001, and a changelog fragment. No C source, public header, feature extractor implementation, or Netflix golden-assertion file is touched.


chore(rust): bump bindgen 0.69 → 0.72 + workspace edition 2021 → 2024 (ADR-1002)

Branch: chore/rust-edition-2024-bindgen-072

Touches: Cargo.toml, Cargo.lock, bindings/rust/vmafx/Cargo.toml, bindings/rust/vmafx-sys/Cargo.toml, core/src/feature/rust/tad/src/lib.rs, docs/adr/1002-rust-edition-2024-bindgen-072.md, changelog.d/chore/rust-edition-2024-bindgen-072.md.

No rebase impact on upstream Netflix/vmaf code. All changed files are fork-local Rust crates (vmafx-sys, vmafx, vmafx-tad) with no upstream analogue. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. A future upstream port cannot conflict with Rust workspace settings since Netflix/vmaf has no Rust code. The header paths consumed by bindgen remain at core/include/libvmaf/ (ADR-0700 path); any future upstream header change that adds or removes a symbol is handled automatically by re-running cargo build (bindgen regenerates on every build).


fix(cppcheck): resolve Whole-Project warnings

Files touched: core/src/feature/integer_ssim.c, core/src/picture_pool.cpp, core/src/read_json_model.cpp, core/tools/vmaf.cpp, core/test/test_ssimulacra2_simd.c, core/test/dnn/test_tensor_io.c, .cppcheck-suppressions.txt, changelog.d/fixed/cppcheck-whole-project-warnings.md

Rebase impact: None for upstream Netflix/vmaf cherry-picks. All changes are either fork-local files (picture_pool.cpp, opt.cpp suppression) or minimal defensive additions (null checks, format-specifier corrections, struct-member initialisation) that do not alter external behaviour. The %d → %u format fixes in vmaf.cpp and read_json_model.cpp are cosmetic; the VmafModel{} initialisation is semantically equivalent to memset(m, 0, …) on any IEEE-754 platform.


docs(coverage): ADR-0922 coverage-gate runbook (2026-06-04)

Files touched: docs/development/coverage-gate.md (new), changelog.d/added/coverage-gate-runbook.md (new)

Rebase impact: None. Documentation-only addition; no source, build, or CI files are modified.


fix(cppcheck): motion_avx512 missing sub-kernel functions

Files touched: core/src/feature/x86/motion_avx512.c, core/src/feature/x86/motion_avx512.h

Rebase impact: None. Both files are fork-local SIMD additions. The four new public symbols (sad_avx512, y_convolution_8_avx512, y_convolution_16_avx512, x_convolution_16_avx512) are additive and have no upstream Netflix/vmaf equivalents. No existing symbol is renamed, removed, or ABI-changed.


chore/tech-stack-badges-go-pin-bump (2026-06-04, ADR-1000)

Files touched: README.md, go.mod, .github/workflows/go-ci.yml, docs/adr/1000-tech-stack-badges-go-rust-pins.md, docs/adr/_index_fragments/1000-tech-stack-badges-go-rust-pins.md, docs/adr/_index_fragments/_order.txt, changelog.d/changed/tech-stack-badges-go-pin-bump.md

Rebase impact: None for C/SYCL/CUDA/HIP/Vulkan/Rust code. go.mod minimum version is bumped 1.25.0 → 1.26.4; this only affects builds that run go build / go test. Upstream Netflix/vmaf has no Go code, so no upstream cherry-pick will conflict with this change. The README badge block change is purely additive; no upstream port touches the README badge section.


fix/tsan-framesync-stdatomic-cxx (2026-06-04, ADR-0999)

Files touched: core/src/framesync.h, core/src/ref.h

Rebase impact: None. Both files are upstream-mirror headers touched only in the preprocessor guard section; no function signatures or struct members are changed. Upstream Netflix/vmaf does not compile feature_extractor.cpp as C++ (they use a C-only build), so this guard addition will not conflict with any upstream cherry-pick. ref.h guard widening from _MSC_VER to all C++ is backward-compatible: non-MSVC C compilers are unchanged (#if defined(__cplusplus) is false in C mode).


fix(metal): hoist feature_extractor.h above extern "C" in Metal .mm files

Files touched: core/src/feature/metal/float_moment_metal.mm, core/src/feature/metal/float_motion_metal.mm, core/src/feature/metal/float_ms_ssim_metal.mm, core/src/feature/metal/float_psnr_metal.mm, core/src/feature/metal/float_ssim_metal.mm, core/src/feature/metal/integer_motion_metal.mm, core/src/feature/metal/integer_motion_v2_metal.mm, core/src/feature/metal/integer_psnr_metal.mm

Rebase impact: None. All changed files are fork-local Metal backend sources. No upstream Netflix/vmaf files are touched. The change is purely an include-order fix (moves feature_extractor.h above its enclosing extern "C" block); no API, ABI, or algorithm change.


fix(arm64): guard framesync.h stdatomic include for C++ mode

Branch: fix/arm64-clang-stdatomic-cxx-conflict

Files touched: changelog.d/fixed/arm64-clang-stdatomic-cxx-framesync.md, docs/state.md, docs/rebase-notes.md.

no rebase impact: The framesync.h guard is already present via ADR-0999 (fix/tsan-framesync-stdatomic-cxx); this PR adds the ARM64-specific changelog fragment and state.md tracking row.


port/upstream-speed-chroma-simd-30f472b14 (2026-06-03, upstream 30f472b14)

Files touched: core/src/feature/x86/speed_avx2.c, core/src/feature/x86/speed_avx2.h, core/src/feature/x86/speed_avx512.c, core/src/feature/x86/speed_avx512.h, core/src/feature/speed.c, core/src/meson.build, core/test/test_speed_simd.c, core/test/meson.build

Rebase impact: Reduces delta — this port lands the upstream commit verbatim (new AVX2 + AVX-512 covariance-sum kernels, function-pointer dispatch). Future /sync-upstream passes that touch speed.c will see a smaller diff because the kernel dispatch pattern is now present on both sides. The compute_cov_kernel_fn typedef and SpeedState::compute_cov_kernel field are fork additions; any upstream change to the compute_covariance signature must also update the typedef here.
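The function-pointer dispatch pattern mentioned above looks roughly like this. The kernel signature is an assumption made for illustration: the real `compute_cov_kernel_fn` typedef and `SpeedState` layout in speed.c are not reproduced here, and the "unrolled" kernel is only a stand-in for a SIMD variant.

```c
#include <stddef.h>

/* Hypothetical kernel signature; the real typedef may differ. */
typedef double (*demo_cov_kernel_fn)(const float *a, const float *b, size_t n);

/* Reference kernel: plain sum of products. */
static double demo_cov_scalar(const float *a, const float *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)a[i] * b[i];
    return sum;
}

/* Stand-in for a SIMD kernel: two-way unrolled with a scalar tail. */
static double demo_cov_unrolled(const float *a, const float *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += (double)a[i] * b[i];
        s1 += (double)a[i + 1] * b[i + 1];
    }
    if (i < n)
        s0 += (double)a[i] * b[i];
    return s0 + s1;
}

/* Hypothetical state struct holding the selected kernel. */
typedef struct DemoSpeedState {
    demo_cov_kernel_fn compute_cov_kernel;
} DemoSpeedState;

/* Picks the kernel once at init; callers only go through the pointer. */
static void demo_speed_init(DemoSpeedState *s, int have_simd)
{
    s->compute_cov_kernel = have_simd ? demo_cov_unrolled : demo_cov_scalar;
}
```

Because every kernel is called through one typedef, a signature change on either side has to be applied to the typedef and to all kernels in the same commit, which is why the note flags it for future syncs.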


test/go-coverage-push (2026-06-04)

Files touched: cmd/vmafx-controller/{grpc_server.go,grpc_server_test.go,http_cancel_test.go,main_test.go,main_extra_test.go,auth/grpc_interceptor.go,auth/middleware.go,queue/queue_listall_test.go}, cmd/vmafx-mcp/impl.go, cmd/vmafx-node/{executor_test.go,main_test.go,online_feedback_pump_test.go}, cmd/vmafx-operator/internal/controller/{vmafxjob_applystatus_test.go,vmafxmodeltraining_applystatus_test.go,vmafxmodeltraining_controller.go,vmafxnode_controller.go}, cmd/vmafx-server/{grpc_server.go,http_cancel_test.go,main_extra_test.go}, pkg/observability/otel_instruments_test.go, pkg/score/grpc_client_unary_test.go

Rebase impact: Low. All changes are either test files (no rebase conflict possible on pure test additions) or targeted bug fixes in production code (grpc_server.go undefined-var fix, operator int32 type cast, MCP Vulkan backend dispatch). The auth ContextWithClaims export and probeHealthz method are additive. No public header or proto changes.


test/compat-python-vmaf-coverage-push (2026-06-03)

Files touched: compat/python-vmaf/tests/ (new directory), pyproject.toml (testpaths + pythonpath additions)

Rebase impact: None. Pure test addition; no production code changed. The pyproject.toml diff only appends to testpaths and pythonpath — if a concurrent branch adds entries in the same section a trivial conflict resolution is required (keep both entries).


vmafx-title-rebrand (2026-06-03, no ADR)

Files touched: README.md, mkdocs.yml, pyproject.toml, CONTRIBUTING.md

Rebase impact: None. All four files are fork-local metadata surfaces (project title, site name, package description, contributor heading). Upstream Netflix/vmaf does not touch any of these files; no merge conflict is possible on rebase.


feat(vmaf-tune): ADR-0498 follow-up #7 — encoder stats, x264 detection, backend dispatch, codec-list parser

Files touched: tools/vmaf-tune/src/vmaftune/encode.py, tools/vmaf-tune/src/vmaftune/fast.py, tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, tools/vmaf-tune/tests/test_encode_dispatcher_per_adapter.py, tools/vmaf-tune/tests/test_adr_0498_followup7.py

Rebase impact: None. All changed files are fork-local to tools/vmaf-tune/; no upstream Netflix/vmaf files are touched. The _VERSION_PROBE_PATTERNS dict is additive (new keys only). The parse_available_codecs function is new; no existing symbol is renamed or removed. The _build_production_sample_extractor signature change (new backend=None kwarg) is backward-compatible. The test_encode_dispatcher_per_adapter.py fix (capture first call only) resolves a test fragility introduced by the probe-cache expansion; no merge conflict expected against Netflix upstream since that test is fork-added.


fix/cuda-duplicate-csf-r-definitions (2026-06-03)

Files touched: core/src/feature/cuda/integer_adm/adm_cm.cu

Rebase impact: None. Purely removes a duplicate code block introduced by a merge-order accident (PR #565 admin-merged while master already had the same helpers). No upstream file is touched; no public header changes.

feat/ai-run-manifest-12-scripts (ADR-0668 follow-up)

No rebase impact. Pure Python-only change to ai/scripts/train_konvid.py. No C/header files modified. No upstream Netflix/vmaf files touched. The only observable change is the addition of a train_konvid.manifest.json sidecar emitted after training completes.


cuda-adm-decouple-inline-ldg (2026-05-29, ADR-0773)

Files touched: core/src/feature/cuda/integer_adm/adm_csf.cu, core/src/feature/cuda/integer_adm/adm_cm.cu

Rebase impact: None. Both files are fork-added CUDA kernel translation units that do not exist in upstream Netflix/vmaf master (ADM CUDA port is fork-local). No rebase conflict is possible.

The change is a pure performance annotation: const T *__restrict__ pointer extraction before hot inner loops and __ldg() on all per-pixel DWT2 band reads. If upstream Netflix ever adds their own ADM CUDA port, these files will need to be re-reviewed against theirs; the F3 pattern should carry forward.


feat/vmafx-tune-go-stage4-report (ADR-0770)

No rebase impact: pure Go CLI and pkg/report additions. No upstream C/Python files modified. Files added: cmd/vmafx-tune/cmd/report.go, pkg/report/multi.go, pkg/report/multi_test.go, docs/adr/0770-vmafx-tune-go-stage4-report.md, changelog.d/added/vmafx-tune-go-stage4-report.md. Files modified: cmd/vmafx-tune/cmd/root.go (register report + ladder), cmd/vmafx-tune/AGENTS.md (invariants 8–9), docs/usage/vmafx-tune-go.md (Stage-4 section), docs/adr/README.md (new row), docs/rebase-notes.md (this entry).

doxygen-thread-safety-tags (2026-05-29, ADR-0788)

Files touched: core/include/libvmaf/libvmaf.h, core/include/libvmaf/picture.h, core/include/libvmaf/feature.h, core/include/libvmaf/model.h, core/include/libvmaf/dnn.h

Rebase impact: Low. These are comment-only additions. An upstream sync that modifies the same function signatures may create minor merge-fuzz on the Doxygen blocks; resolve by re-applying the @thread-safety tags to whatever the upstream version of the comment looks like.


containerfile-layer-optimization (ADR-0790, 2026-05-29)

Files touched: dev/Containerfile

Rebase impact: None. dev/Containerfile is fork-local (not present in upstream Netflix/vmaf). No rebase conflict is possible.


phase-4b8-c-abi-break-scoping (2026-05-29)

Files touched: docs/adr/0767-phase-4b8-c-abi-break-scoping.md, docs/research/research-0752-phase-4b8-c-abi-break-scoping.md, docs/adr/README.md, changelog.d/changed/0767-phase-4b8-c-abi-break-scoping.md

Rebase impact: None. This is a scoping/design document with no source changes. The implementation PR (when it lands) will touch core/include/libvmaf/*.h and every ffmpeg-patches/ file; that implementation PR will carry its own rebase note cataloguing the specific header and patch changes. When upstream Netflix/vmaf adds symbols to libvmaf.h or model.h between now and the v4 implementation, the ADR-0767 removal list should be checked against the upstream additions to avoid removing a symbol upstream has just added.


docs/hip-picture-stub-comment-closeout (ADR-0613, 2026-06-03)

core/src/picture.h — comment on VMAF_PICTURE_BUFFER_TYPE_HIP_DEVICE updated to reflect that picture_hip.{c,h} is fully implemented (ADR-0613); the old text described it as a stub.

Rebase impact: NONE — comment-only change; no logic, no ABI delta.


chore/cambi-drop-vulkan-scaffold — remove CAMBI Vulkan scaffolding per ADR-0726 (2026-06-03)

No rebase impact on upstream C/Python code.

Files modified are fork-local: core/src/feature/vulkan/cambi_vulkan.c (deleted), core/src/feature/vulkan/shaders/cambi_{preprocess,derivative,filter_mode,decimate,mask_dp}.comp (deleted), core/test/test_cambi_vulkan.c (deleted), core/src/vulkan/meson.build (CAMBI source + shader entries removed), core/src/feature/cambi_internal.h (comment updated), core/src/feature/cuda/integer_cambi_cuda.c (comments updated), core/src/feature/hip/integer_cambi_hip.c (comment updated), changelog.d/removed/cambi-vulkan-scaffold.md (new).

Rebase impact: None on upstream sync (no Netflix file touched).


CI scaffold-comment refresh (2026-06-03)

.github/workflows/fuzz.yml — header comment updated: ADR-0882 citation added alongside ADR-0270/0311. .github/workflows/libvmaf-build-matrix.yml — Metal matrix lane comment and name: field updated from "T8-1 scaffold" to "runtime" (ADR-0420 landed).

no rebase impact: comment-only change; no logic or structure altered.

Single ledger of fork-local changes that need attention when this fork syncs from upstream/master (Netflix/vmaf). Required by ADR-0108: every fork-local


Second-opinion batch smoke scaffold + pytest path fix (ADR-0991, 2026-06-03)

Files touched: ai/pyproject.toml (add pythonpath = ["scripts"] to pytest config), ai/testdata/smoke-second-opinion-batch/ (new: batch.json, fixtures/*.jsonl, README.md), docs/adr/0991-second-opinion-batch-runs.md (new), docs/research/research-0991-second-opinion-batch-2026-06-03.md (new), changelog.d/fixed/0991-second-opinion-batch-pytest-path.md (new).

Rebase impact: None on upstream sync (no Netflix/vmaf upstream file touched). The ai/pyproject.toml addition is additive; no conflict risk.


controller-multi-tenant-auth-gateway (2026-05-29, ADR-0794)

Files touched: cmd/vmafx-controller/auth/ (new package), cmd/vmafx-controller/main.go, cmd/vmafx-controller/grpc_server.go, cmd/vmafx-controller/http_server.go, cmd/vmafx-controller/queue/queue.go, cmd/vmafx-controller/queue/schema.sql, deploy/helm/vmafx/crds/vmafx.dev_vmafxtenants.yaml (new), deploy/helm/vmafx/templates/tenant-crd-config.yaml (new), deploy/helm/vmafx/templates/deployment.yaml, deploy/helm/vmafx/values.yaml, docs/server/auth.md (new), docs/adr/0794-controller-multi-tenant-auth-gateway.md (new).

Rebase impact: None. All touched files are fork-local additions (vmafx-controller, Helm chart, docs) that do not exist in upstream Netflix/vmaf. The SQLite schema change (tenant_id column) is additive and non-breaking. No upstream rebase conflict is possible.


KoNViD / UGC / BVI-DVC saliency batch manifests (ADR-0993, 2026-06-03)

Files touched: ai/batch-manifests/saliency/konvid-150k.json (new), ai/batch-manifests/saliency/ugc.json (new), ai/batch-manifests/saliency/bvi-dvc.json (new), docs/ai/saliency-feature-materializer.md (corpus-specific manifests section), docs/adr/0993-konvid-ugc-bvi-saliency-batch-launch.md (new), docs/adr/README.md (index row), changelog.d/added/konvid-ugc-bvi-saliency-batch-manifests.md (new).

Rebase impact: None on upstream sync (no Netflix file touched). All new files are fork-local; no upstream path conflicts.


ADR-0992 — MOS-label batch-run manifests for KonViD and CHUG

Files touched: ai/configs/mos-label-batch-konvid.json (new), ai/configs/mos-label-batch-chug.json (new), ai/tests/test_mos_label_batch_runs_smoke.py (new), ai/tests/test_batch_materialize_mos_labels.py (sys.path bug fix), docs/ai/mos-label-materializer.md, docs/adr/0992-mos-label-batch-runs.md (new), docs/adr/README.md, changelog.d/added/0992-mos-label-batch-runs.md (new), and this file.

Rebase impact: None on upstream sync (all touched files are fork-local; no Netflix/vmaf source file is modified). No cross-branch impact: the new ai/configs/*.json files are independent and will not conflict with any in-flight branch.


Changelog-fragment section hygiene (2026-05-30)

Files touched: changelog.d/perf/*.md → changelog.d/changed/perf-*.md (27 renames), changelog.d/performance/*.md → changelog.d/changed/perf-*.md (5 renames), changelog.d/README.md, release-please-config.json, docs/adr/0892-conventional-commits-and-changelog-fragment-hygiene.md (new), docs/research/0892-conventional-commits-audit-2026-05-30.md (new), changelog.d/fixed/conventional-commits-audit.md (new).

Rebase impact: None on upstream sync (no Netflix file touched). Cross-branch impact on fork: any in-flight feature branch holding a changelog.d/perf/*.md or changelog.d/performance/*.md file will hit a rename-detection conflict on rebase. git rebase with default -X settings detects the rename cleanly; if a conflict surfaces, the fix is to drop the in-flight branch's copy of the file and re-add the content under changelog.d/changed/perf-<topic>.md. The migrated files had their leading ### Performance / ## perf(…) headings stripped (renderer adds ### Changed itself); in-flight branches that added a new perf/ fragment should follow the same pattern.

See ADR-0892.


fix/ci-docs-pr-trigger — docs.yml PR trigger (2026-06-03, ADR-0986)

No rebase impact on upstream C/Python code.

Files modified are fork-local: .github/workflows/docs.yml (trigger + permissions update), docs/adr/0986-ci-docs-pr-trigger.md (new), docs/adr/_index_fragments/0986-ci-docs-pr-trigger.md (new), docs/adr/_index_fragments/_order.txt (appended), changelog.d/fixed/ci-docs-pr-trigger-0986.md (new), docs/rebase-notes.md (this entry).

Netflix upstream ships no GitHub Actions workflows. No rebase conflict is possible.


Research-0760 — Rust crate audit (docs + ADR-0707 correction, 2026-05-29)

No rebase impact on upstream C/Python code.

All files modified are fork-local: docs/research/research-0760-rust-crate-audit.md (new), changelog.d/added/rust-crate-audit-0760.md (new), docs/adr/0707-vmafx-rust-pilot-feature.md (corrected enable_rust_features default description from "true" to "false"), docs/rebase-notes.md (this entry).

Neither core/meson_options.txt, core/src/meson.build, nor any C/Rust source is modified. No Netflix upstream file is touched. No rebase conflict is possible.


fix/helm-node-deployment-deduplicate (2026-05-30, ADR-0713 / ADR-0719)

Files touched: deploy/helm/vmafx/templates/node.yaml (modified), deploy/helm/vmafx/templates/node-deployment.yaml (deleted).

Rebase impact: None. The deploy/helm/ tree is fork-only — Netflix upstream ships no Helm chart. The duplicate-Deployment collision and its fix live entirely within fork-added templates.

The two templates both rendered a Deployment named {{ include "vmafx.fullname" . }}-node under .Values.node.enabled, which made helm install fail with a duplicate-resource error and left Phase 4b distributed scoring uninstallable. The richer node.yaml (liveness/readiness probes, GPU resource injection, metrics port + Service, VMAFX_NODE_ID per ADR-0713) is kept; the rclone Secret mount + storage-mode / model-dir env vars from the deleted node-deployment.yaml were folded into node.yaml.

libvmaf.Score / ScoreDirect ctx.Context plumbing (2026-05-31, fix/libvmaf-score-ctx)

Files touched: pkg/libvmaf/libvmaf.go (Score signature: ctx as first param; exec.CommandContext + WaitDelay = 2s), pkg/libvmaf/direct.go (ScoreDirect signature: ctx as first param; per-frame ctx.Err() check at the top of the read+queue loop; rename of local ctx *C.VmafContext -> vmafCtx to avoid shadowing), pkg/libvmaf/libvmaf_test.go, pkg/libvmaf/direct_test.go (call-site updates + new cancel tests), cmd/vmafx-server/{http_server.go,grpc_server.go,http_cancel_test.go}, cmd/vmafx-controller/{http_server.go,grpc_server.go,http_cancel_test.go}, cmd/vmafx-node/executor.go, cmd/vmafx-mcp/impl_direct.go.

Rebase impact: All fork-local. pkg/libvmaf/ is a fork-only Go wrapper around the public libvmaf C ABI; cmd/vmafx-* are entirely fork-local binaries with no upstream counterparts. No headers in core/include/ were changed and no upstream-mirrored C source was touched, so upstream syncs cannot collide.

Action on next upstream sync: None. The C API surface (vmaf_init / vmaf_read_pictures / vmaf_score_pooled / vmaf_close) the Go layer wraps is unchanged; we only renamed a local C.VmafContext* variable inside Go.

vmafx-tune-go deep bug audit (2026-05-31, fix/vmafx-tune-go-audit-20260531)

Files touched: pkg/report/report.go, pkg/report/sanitize_test.go (new), pkg/bisect/bisect.go, pkg/bisect/nan_parse_test.go (new), pkg/bisect/timeout_test.go (new), pkg/encoder/encoder.go, pkg/encoder/discover.go, pkg/encoder/discover_test.go, pkg/encoder/discover_cache_test.go (new), pkg/encoder/timeout_test.go (new), cmd/vmafx-tune/cmd/compare.go, cmd/vmafx-tune/cmd/ladder.go, cmd/vmafx-tune/cmd/ladder_nan_test.go (new), changelog.d/fixed/0979-vmafx-tune-go-deep-bug-audit.md (new).

Rebase impact: Fork-local only. Every file lives under pkg/{report,bisect,encoder} or cmd/vmafx-tune/, which are 100% fork additions (the vmafx-tune-go Stage-1 surface from ADR-0705 / ADR-0713; no Netflix upstream counterpart exists). An upstream sync will not encounter conflicts on any of these files.

On-disk surface changes (relevant to in-tree callers):

  • New public helper report.SanitizeBisectSamples([]bisect.Sample) []any — exported so the schema-v2 sweep emitter in cmd/vmafx-tune/cmd.emitSweepJSON can apply the same nested NaN→null coercion the Python emitter (_nan_to_none in tools/vmaf-tune/src/vmaftune/compare.py) has used since the RFC-8259 hardening of 2026-05-17.
  • New env-var knobs VMAFX_TUNE_ENCODE_TIMEOUT (default 60m), VMAFX_TUNE_SCORE_TIMEOUT (default 30m), VMAFX_TUNE_PROBE_TIMEOUT (default 30s) for the ffmpeg / vmaf / ffprobe subprocess upper bounds. Operators can lower these in CI to fail-fast instead of hanging a job.
  • Codec-discovery cache key is now the binary path, not a one-shot sync.Once. Callers that depended on the old "first probe wins forever" shape (none in tree as of this PR) will see a re-probe on binary-path change.

Python-surfaces bug-audit bundle (2026-05-31, fix/python-surfaces-bug-audit)

no rebase impact: fork-local Python files only. Touches: ai/src/corpus/base.py (fork-added, ADR-0371), ai/src/vmaf_train/data/{datasets,manifest_scan,feature_dump,frame_dataset,frame_loader}.py (fork-added tiny-AI training surface), and mcp-server/vmaf-mcp/src/vmaf_mcp/server.py (fork-added MCP server, no upstream equivalent). No core/src/ or upstream-mirror file is touched.

Fork-local files: ai/src/corpus/base.py, ai/src/vmaf_train/data/datasets.py, ai/src/vmaf_train/data/manifest_scan.py, ai/src/vmaf_train/data/feature_dump.py, ai/src/vmaf_train/data/frame_dataset.py, ai/src/vmaf_train/data/frame_loader.py, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, mcp-server/vmaf-mcp/tests/test_server.py, ai/tests/test_python_surfaces_bug_audit.py (new), mcp-server/vmaf-mcp/tests/test_python_surfaces_bug_audit.py (new), changelog.d/fixed/python-surfaces-bug-audit-2026-05-31.md (new), docs/research/0983-python-surfaces-bug-audit-2026-05-31.md (new).

chore/gosec-findings-fix-v2 (2026-06-01, ADR-0983)

no rebase impact: the Go surface (cmd/, pkg/, gen/, api/vmafx/v1/) is wholly fork-local. Netflix/vmaf has no Go code. The sweep touches only Go files plus .github/workflows/go-ci.yml, docs/adr/, docs/research/, changelog.d/security/, and the regression test cmd/vmafx-mcp/impl_gosec_test.go. No C, no SIMD, no GPU, no upstream-mirror file is touched.

Re-run of the earlier chore/gosec-findings-fix (PR #509, closed without merge) against the post-#505 / post-#508 master tip. The prior PR conflicted with PR #505's pkg/bisect/bisect.go + pkg/encoder/encoder.go exec.CommandContext + per-stage timeout plumbing; this v2 sweep applies the same security-hardening fixes while preserving the ctx + timeout. Same set of touched files; same single real bug fixed (describeModel path traversal). No markdownlint / formatter regression; both parser CI scripts green.


Files touched: core/test/meson.build (added ../src/thread_locale.c to the test_svm_parser source list), api/vmafx/v1/vmafxjob_types.go, api/vmafx/v1/vmafxnode_types.go, api/vmafx/v1/vmafxmodeltraining_types.go, config/crd/bases/vmafx.dev_vmafxjobs.yaml, config/crd/bases/vmafx.dev_vmafxnodes.yaml, config/crd/bases/vmafx.dev_vmafxmodeltrainings.yaml, deploy/helm/vmafx/crds/*.yaml (synced copies), cmd/vmafx-operator/internal/controller/vmafxnode_controller.go, cmd/vmafx-operator/internal/controller/vmafxnode_probehealthz_test.go (new), cmd/vmafx-operator/internal/controller/vmafxmodeltraining_controller_branch_test.go (int32 casts).

Rebase impact: All changes are fork-local — the operator, vmafx.dev/v1 CRDs, and Helm chart are 100% additions on this fork (no upstream counterparts). The single upstream-mirrored file is core/test/meson.build; the change there is one-line additive (append '../src/thread_locale.c'), no conflict surface. No public C ABI is touched; the libsvm vendor remains observation-only per ADR-0889.

No load-bearing invariants; no AGENTS.md rebase-pin required.


core/src lifecycle + memory audit (2026-05-31, fix/core-lifecycle-memory-audit)

Files touched: core/src/picture_pool.c, core/src/model.c, core/src/model.cpp, core/src/predict.c, core/src/output.c, core/src/dict.c, core/src/feature/feature_collector.cpp, core/test/test_predict.c (new test case), core/test/test_model.c (new test case), core/test/test_output.c (new test case).

Rebase impact: Touch points are all upstream-mirrored TUs. Each fix is a narrow correctness patch (NULL guard, errno sign, errno code, missing return-value propagation, free-on-error path) — none of them changes the public C ABI, the entry-point list, or the data layout of any struct.

On upstream sync:

  • picture_pool.c::pool_preallocate_pictures cleanup: trivial vmaf_picture_unref → aligned_free swap on a fork-only code path (Netflix has no pool_preallocate_pictures in this form).
  • model.{c,cpp}::vmaf_model_load + vmaf_model_collection_load NULL guards: add the if (!version) return -EINVAL; block at the top of each function. Conflicts only if upstream reorders the body.
  • predict.c sign and propagation fixes: small textual deltas on upstream-mirrored functions. If upstream changes the sign convention, fall in line with upstream.
  • output.c CSV/SUB NULL guards: paste the same three-line guard the XML/JSON writers already have (ADR-0602).
  • dict.c::dict_normalize_numeric: one-word change strtof → strtod. Conflicts only if upstream switches to a different parser entirely.
  • feature_collector.cpp::aggregate_vector_append: one-word change -EINVAL → -ENOMEM.
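
The guard and parser fixes listed above are small enough to sketch. This is a minimal illustration, not the fork's code: `load_by_version_sketch` and `parse_numeric_sketch` are hypothetical stand-ins for the model loaders and for `dict_normalize_numeric`. Only the `-EINVAL` guard on a NULL `version` and the move from `strtof` to `strtod` come from the notes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loader that takes a version string. */
static int load_by_version_sketch(const char *version)
{
    if (!version) return -EINVAL; /* the guard named in the notes */
    /* ... real loading would happen here ... */
    return 0;
}

/* Hypothetical stand-in for a numeric normaliser: strtod() keeps double
 * precision, whereas strtof() would round the value to float first. */
static int parse_numeric_sketch(const char *s, double *out)
{
    if (!s || !out) return -EINVAL;
    char *end = NULL;
    const double v = strtod(s, &end);
    if (end == s || *end != '\0') return -EINVAL; /* no digits, or trailing junk */
    *out = v;
    return 0;
}
```

With `strtof` in place of `strtod`, an input such as "0.1" would come back as the nearest float, which is not the same value as the double nearest to 0.1.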

No load-bearing invariants; no AGENTS.md rebase-pin required.


Markdown-lint full-ruleset discharge (2026-05-31, ADR-0980)

Files touched: ~1,400 .md files across docs/, .claude/, core/, ai/, tools/, bindings/, mcp-server/, scripts/, cmd/, top-level README/CONTRIBUTING/CODE_OF_CONDUCT. .markdownlint.json is unchanged.

Rebase impact: None for upstream-mirrored TUs. The added <!-- markdownlint-disable ... --> comments live only in fork-added / fork-modified .md files; upstream-vendored .md files in subprojects/, core/test/data/, python/test/resource/, compat/python-vmaf/resource/, compat/python-vmaf/matlab/, model/, and testdata/ are excluded from the gate (see .pre-commit-config.yaml markdownlint-cli2 exclude: regex) and are not touched by this PR.

On upstream sync, no resolution is required for .md files. If a future upstream PR adds a new fork-mirrored .md file that brings new violations, either fix the content or extend the per-file disable comment for that file; do not modify .markdownlint.json.


vmafx-server + pkg/score bug-audit (2026-05-31, ADR-0978)

Files touched: pkg/observability/observability.go, pkg/observability/observability_test.go, pkg/score/grpc_client.go, pkg/score/grpc_client_test.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/http_server.go, cmd/vmafx-server/main_test.go, cmd/vmafx-server/grpc_recovery_test.go (new).

Rebase impact: None. All three surfaces are fork-local Go code:

  • pkg/observability/ is a fork-added package; Netflix/vmaf has no equivalent.
  • pkg/score/ is a fork-added wrapper around the fork's vmafx.v1 proto; Netflix/vmaf does not ship a gRPC client.
  • cmd/vmafx-server/ is a fork-added binary (ADR-0703); Netflix/vmaf has no equivalent gRPC + HTTP scoring service.

No C ABI, no public header, no upstream-mirrored TU touched. Upstream syncs do not interact with this change.


core/tools input-reader safety (2026-05-31, ADR-0977)

Files touched: core/tools/y4m_input.c, core/tools/yuv_input.c, core/tools/vmaf_bench.c, core/test/test_y4m_alloc_failure.c (new), core/test/meson.build.

Rebase impact: PARTIAL. y4m_input.c and yuv_input.c are vendored from Daala via upstream Netflix/vmaf; upstream still carries the unchecked malloc returns and the int-precision dst_buf_sz arithmetic. On any upstream sync that touches the y4m / yuv parsers, keep our (size_t) casts and the explicit if (!_y4m->dst_buf) return -1; block — the diff is localised (the size-arithmetic stanza lines and the return 0; tail of y4m_input_open_impl).

vmaf_bench.c is fork-only (no upstream churn).

Sync action: Mechanical merge if upstream touches the same lines: prefer the fork side at the size-arithmetic stanza and the malloc-failure block in y4m_input_open_impl, prefer the fork bench_cleanup label structure in vmaf_bench::bench_feature.
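
To make the "prefer the fork side" rule easier to recognise during a merge, here is a minimal sketch of the two properties being preserved: size arithmetic done in `size_t`, and an explicit failure return when `malloc` fails. The helper names and parameters are hypothetical; the real code lives in `y4m_input_open_impl` and is not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: size of a w x h plane with bytes_per_sample bytes
 * per sample, computed in size_t so large frames cannot overflow int. */
static int plane_size_sketch(int w, int h, int bytes_per_sample, size_t *out)
{
    if (w <= 0 || h <= 0 || bytes_per_sample <= 0 || !out) return -1;
    if ((size_t)w > SIZE_MAX / (size_t)h) return -1;
    const size_t samples = (size_t)w * (size_t)h;
    if (samples > SIZE_MAX / (size_t)bytes_per_sample) return -1;
    *out = samples * (size_t)bytes_per_sample;
    return 0;
}

/* Hypothetical open step: report allocation failure instead of handing
 * back a NULL buffer that a later read would dereference. */
static int open_buf_sketch(unsigned char **dst, int w, int h, int bytes_per_sample)
{
    size_t sz = 0;
    if (!dst || plane_size_sketch(w, h, bytes_per_sample, &sz) != 0) return -1;
    *dst = malloc(sz);
    if (!*dst) return -1;
    return 0;
}
```

If an upstream hunk reintroduces `int` multiplication or drops the post-`malloc` check, that is the side to reject.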


Test suite: NULL-check malloc sweep (2026-05-31, ADR-0971)

Files touched: core/test/test_ssimulacra2_simd.c, core/test/test_framesync.c, core/test/test_pic_preallocation.c, core/test/AGENTS.md.

Rebase impact: None. All changes are purely additive NULL-checks in test-only files. Netflix/vmaf does not carry these test files upstream (test_ssimulacra2_simd.c, test_pic_preallocation.c are fork-added; test_framesync.c has fork-local modifications). No C API or public ABI is touched. Subsequent upstream syncs do not interact with this change.

Public-header ISO-reserved include guards renamed (2026-05-31)

Files touched: core/include/libvmaf/libvmaf.h, core/include/libvmaf/picture.h, core/include/libvmaf/feature.h, core/include/libvmaf/model.h, core/include/libvmaf/macros.h, core/include/libvmaf/vmaf_assert.h, core/include/libvmaf/dnn.h, core/include/libvmaf/libvmaf_cuda.h, core/include/libvmaf/libvmaf_sycl.h.

Rebase impact: REAL. Six of the nine renamed headers (libvmaf.h, picture.h, feature.h, model.h, libvmaf_cuda.h, plus arguably dnn.h if upstream ever ports the tiny-AI surface) are upstream-mirrored from Netflix/vmaf. Upstream still ships the ISO-reserved __VMAF_*__ guard pattern that SEI CERT DCL37-C bans (ADR-0972).

Sync action: On any upstream sync that touches these six headers, keep our LIBVMAF_<BASENAME>_H lines and drop the upstream __VMAF_*__ ones. The diff is mechanical (3 lines per header — the #ifndef, the #define, and the closing #endif comment); no semantic merge required. The full guard-rename table is in ADR-0972 §Decision.

The remaining three renamed headers (macros.h, vmaf_assert.h, libvmaf_sycl.h) are fork-only and never receive upstream churn.

If a future upstream sync changes the LIBVMAF_* pattern itself (e.g. Netflix adopts the same fix with a different spelling), reopen ADR-0972 to decide whether to converge.
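
For orientation during a sync, the three guard lines per header look roughly like the sketch below. The macro name is an assumption derived from the LIBVMAF_<BASENAME>_H pattern; the authoritative spellings are in the ADR-0972 table. The placeholder function only stands in for the header's real declarations.

```c
/* picture.h (sketch): fork-side guard following LIBVMAF_<BASENAME>_H.
 * An upstream-side guard of the __VMAF_*__ shape begins with a double
 * underscore, which the C standard reserves for the implementation. */
#ifndef LIBVMAF_PICTURE_H
#define LIBVMAF_PICTURE_H

/* Placeholder standing in for the header's real contents. */
static inline int guard_sketch_marker(void) { return 1; }

#endif /* LIBVMAF_PICTURE_H */
```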


Rust vmafx safe binding crate scaffold (2026-05-31)

Files touched: Cargo.toml (workspace), bindings/rust/vmafx/ (new crate).

Rebase impact: None. Netflix/vmaf has no Rust bindings upstream. The new crate is a pure addition under bindings/rust/, parallel to the existing vmafx-sys crate (ADR-0706). The workspace Cargo.toml gains one members entry; no upstream file is touched. Subsequent upstream syncs do not interact with this code.

If a future upstream PR adds a Rust workspace (extremely unlikely), the fork's bindings/rust/vmafx/ and bindings/rust/vmafx-sys/ paths must not collide with the upstream layout. As of n8.1 there is no precedent.

gRPC ScoreStream Phase 1 (2026-05-31)

Files touched: proto/vmafx.proto, gen/go/vmafx.pb.go, gen/go/vmafx_grpc.pb.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/AGENTS.md, pkg/score/grpc_client.go, pkg/score/grpc_client_test.go, pkg/score/AGENTS.md, docs/architecture/grpc-streaming.md, docs/architecture/index.md, docs/adr/0933-grpc-streaming-multi-frame-scoring.md, docs/adr/_index_fragments/0933-grpc-streaming-multi-frame-scoring.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, changelog.d/added/0933-grpc-streaming-phase1.md.

Rebase impact: None against upstream — this surface is entirely fork-local (Netflix/vmaf has no Go gRPC service). The proto package stays vmafx.v1; the unary Score / Health RPCs are unchanged. ScoreStream is purely additive. The Phase 1 server handler returns codes.Unimplemented after validating the opening StreamConfig.

If a future upstream port touches core/ in a way that changes the public C API consumed by pkg/libvmaf, the Phase 2 wiring of ScoreStream to libvmaf will need to mirror that change — but Phase 1 is server-stub-only and doesn't reach the C surface yet.

Native bash pre-commit hook (ADR-0924, 2026-05-31)

no rebase impact: all paths are fork-local, namely scripts/githooks/ (new directory), docs/development/pre-commit-hooks.md, docs/adr/0924-*.md, docs/research/0924-*.md, changelog.d/added/native-pre-commit-hooks.md. The Makefile changes rename hooks-install → install-hooks (with the old name kept as a legacy alias), in a fork-only target that upstream Netflix does not define. No upstream-mirrored file is touched.


Metal kernel parity tests round 3 (2026-05-31)

Files touched: core/test/meson.build, core/test/test_metal_integer_motion_parity.c (new), core/test/test_metal_float_motion_parity.c (new), core/test/test_metal_float_moment_parity.c (new), core/test/test_metal_float_ms_ssim_parity.c (new)

Rebase impact: None. Closes the per-kernel parity coverage gap for the remaining four Metal extractors after PR #351 (registration audit) and PR #379 (round-2 parity: motion_v2, integer_psnr, float_psnr, float_ssim). All four new files live under the existing fork-local enable_metal block in core/test/meson.build (the entire Metal backend is fork-added — ADR-0361 / ADR-0421 / ADR-0589 / T8-2a — and absent from upstream Netflix/vmaf). The block edit appended four new executable() + test() pairs immediately after the round-2 block (PR #379 has since merged); the surrounding endif boundaries are untouched so upstream syncs cannot conflict here.

If upstream ever ports a Metal backend, the test files would need re-pointing at the upstream kernel names; the synthetic-fixture + -ENODEV skip pattern carries forward unchanged.


vmaf-tune coverage push — lowest-covered modules (2026-05-31)

Files touched: tools/vmaf-tune/tests/test_coverage_push_lowcov_modules.py, changelog.d/added/vmaf-tune-coverage-push.md.

Rebase impact: None. tools/vmaf-tune/ is fork-only (no upstream Netflix counterpart); the new test file imports only public + underscore-prefixed seams that already existed in the package. The 92 added tests are pure unit-level (no subprocess / no ffmpeg / no ONNX / no GPU) and exercise documented error paths in uncertainty.py, _gop_common.py, proxy.py, predictor_features.py, benchmark.py, encoder_profile.py, and fast.py. If a future refactor renames any of the targeted internal helpers (_parse_fps, _run_probe_encode, _run_signalstats, _parse_frame_sizes, _mean, _resolve_baseline, _row_encode_fps, _row_score_fps, _resolve_model_path), update the corresponding import in this single test file.


Core MCP transport coverage push (2026-05-31)

Files touched: core/test/test_mcp_coverage.c (new), core/test/meson.build, changelog.d/added/core-mcp-coverage-push.md, docs/research/core-mcp-coverage-push-2026-05-31.md.

Rebase impact: None. The embedded MCP server (core/src/mcp/, core/include/libvmaf/libvmaf_mcp.h) is fork-only — upstream Netflix/vmaf has no MCP surface — so this test-only push is fully self-contained and never lands on a Netflix file. If upstream ever adds an MCP-shaped surface, treat the test as canonical fork-side coverage and reconcile by name. Companion: ADR-0108 deliverables in docs/research/core-mcp-coverage-push-2026-05-31.md.


phase3-subset-sweep readonly-view fix (2026-05-31)

Files touched: ai/scripts/phase3_subset_sweep.py, ai/tests/test_phase3_subset_sweep_unit.py.

Rebase impact: None — ai/scripts/phase3_subset_sweep.py is fork-original (Research-0027 Phase-3 tooling, no upstream Netflix analogue). The fix tightens an internal contract (_standardize_inplace now refuses read-only inputs and the caller forces a writeable copy via to_numpy(copy=True)); there is no public API change and no coupling to upstream files. Safe to carry through any upstream sync.


GPU runtime error-path leak fixes (ADR-0960, 2026-05-31)

no rebase impact: all changes are in fork-local error paths of core/src/cuda/common.c (new fail_after_stream label between two existing labels) and core/src/picture_pool.c (one pthread_cond_signal call and two pic->priv = NULL assignments). No upstream Netflix/vmaf logic is altered. The new test file core/test/test_picture_pool_error_paths.c is wholly fork-added with no upstream counterpart.
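
The label-between-labels shape described here is the usual staged-cleanup idiom. The following is a generic sketch under stated assumptions, not the code in common.c: the struct, the resource names, and the `fail_at` injection parameter are hypothetical, and plain `malloc`/`free` stand in for the CUDA context and stream calls.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical three-resource state; real code would hold GPU handles. */
struct init_sketch { void *ctx; void *stream; void *buf; };

/* fail_at (1..3) simulates a failure at that acquisition step; 0 = success. */
static int init_sketch_run(struct init_sketch *s, int fail_at)
{
    const int err = -ENOMEM;
    s->ctx = s->stream = s->buf = NULL;

    s->ctx = (fail_at == 1) ? NULL : malloc(16);
    if (!s->ctx) goto fail;
    s->stream = (fail_at == 2) ? NULL : malloc(16);
    if (!s->stream) goto fail_after_ctx;
    s->buf = (fail_at == 3) ? NULL : malloc(16);
    if (!s->buf) goto fail_after_stream;
    return 0;

fail_after_stream:              /* without this label the stream would leak */
    free(s->stream);
    s->stream = NULL;           /* clear so a later teardown cannot double-free */
fail_after_ctx:
    free(s->ctx);
    s->ctx = NULL;
fail:
    return err;
}
```

Each label releases exactly what was acquired before the failing step and then falls through to the earlier labels, so adding a new resource means adding one label in the matching position.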

queue PullWork rollback on post-update Get failure (2026-05-31, ADR-0961)

no rebase impact: pure Go controller-internal fix. cmd/vmafx-controller/queue/ is entirely fork-added (no upstream Netflix/vmaf equivalent); upstream syncs do not touch this subtree.


ai/src NaN propagation guards — eval.correlations + tune._read_best_metric (2026-05-31, ADR-0963)

Files touched: ai/src/vmaf_train/eval.py, ai/src/vmaf_train/tune.py, ai/tests/test_eval_correlations.py, ai/tests/test_tune_objective.py.

Rebase impact: None — ai/src/vmaf_train/ is entirely fork-local with no upstream Netflix/vmaf equivalent. No C surface is touched. No upstream coupling.

Helm chart seccompProfile + node-deployment image helper (2026-05-31, ADR-0969)

no rebase impact: both changes are entirely within deploy/helm/vmafx/, which is fork-added infrastructure with no upstream counterpart in Netflix/vmaf. Netflix upstream does not ship a Helm chart; upstream syncs never touch this directory. PR #439 (ADR-0930) has since merged cleanly on top (it modified values.yaml in a non-conflicting block and did not touch node-deployment.yaml).

MCP HTTP transport security hardening (2026-05-31, ADR-0967)

no rebase impact: changes are confined to the fork-local MCP server subtree (mcp-server/vmaf-mcp/). Netflix upstream has no MCP server; this entire subtree will never merge upstream. The security middleware, auth helpers, and bind-host resolver are fork-invented code with no upstream counterpart.

HIP kernel parity-test coverage round 4 (2026-05-31, ADR-0958)

Files touched: core/test/test_hip_ssimulacra2_parity.c, core/test/test_hip_float_ssim_parity.c, core/test/meson.build.

Rebase impact: Low — the 2 new tests are fork-added consumers of fork-added HIP feature extractors (ssimulacra2_hip, float_ssim_hip). Upstream Netflix has no HIP backend, so neither the test sources nor the meson registration block has an upstream-mirror analogue. The skip-on--ENOSYS contract matches the round-1/2/3 template (PR #351 / PR #372 / PR #443) — if upstream ever ships a HIP backend the tests can be kept verbatim; their CPU side calls only public C-API entry points (vmaf_init, vmaf_use_feature, vmaf_read_pictures, vmaf_feature_score_at_index, vmaf_close) that are upstream-stable.
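
One way to picture the skip-on--ENOSYS contract is as a mapping from the backend's init result to a test exit code. The sketch below is an assumption about the shape, not a copy of the fork's test template: the function name is hypothetical, and the use of exit code 77 (which Meson's test runner reports as "skipped") is assumed rather than confirmed by these notes.

```c
#include <errno.h>

/* Exit code that Meson's test runner reports as "skipped". */
#define SKETCH_EXIT_SKIP 77

/* Hypothetical mapping from a backend-init return code to a test exit code. */
static int parity_exit_code_sketch(int init_err)
{
    if (init_err == 0) return 0;                      /* backend usable: the parity check decides */
    if (init_err == -ENOSYS) return SKETCH_EXIT_SKIP; /* backend unavailable: skip, do not fail */
    return 1;                                         /* any other error is a real failure */
}
```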

The round-4 plan also covered speed_chroma_hip / speed_temporal_hip parity gates, but those were deferred when the container build surfaced a pre-existing latent link defect — the helpers speed_internal_init_dimensions / speed_internal_float_stride are declared in core/src/feature/speed_internal.h but never defined. The same defect blocks the analogous CUDA / SYCL speed-family TUs from linking (none are currently wired into their respective meson archives). A follow-up PR adding core/src/feature/speed_internal.c will unblock all three GPU backends simultaneously. Tracked as T-HIP-SPEED-INTERNAL-IMPL-MISSING-2026-05-31 in docs/state.md.

Companion: docs/adr/0958-hip-kernel-coverage-round4.md, docs/research/0958-hip-kernel-coverage-round4-2026-05-31.md, changelog.d/added/0958-hip-kernel-coverage-round4.md.

Controller infrastructure fixes — StreamJobs + reaper stop signal (2026-05-31, ADR-0962)

No rebase impact: all changes are confined to the fork-local controller package (cmd/vmafx-controller/) and the Queue interface in cmd/vmafx-controller/queue/queue.go. Netflix upstream does not own these paths (the controller is a Phase 4b addition, not a port of Netflix code). The nodes.Registry context-propagation change is entirely within fork-local code and has no interaction with libvmaf C sources.

vmaf_mcp_stop() idempotent (CAS instead of exchange) (2026-05-31)

Files touched: core/src/mcp/mcp.c, core/test/test_mcp_stop_idempotent.c, core/test/meson.build.

Rebase impact: None — core/src/mcp/ is fork-only (Netflix has no MCP surface). The fix replaces three atomic_exchange(running, 2) + dual-value-guard pairs with three atomic_compare_exchange_strong(expected=1, desired=2) calls, keeping the existing 3-state state machine semantics intact and matching the CAS pattern already used by vmaf_mcp_start_{stdio,uds,sse}. The new regression test (test_mcp_stop_idempotent.c) is also fork-only. Sync impact: no Netflix file references vmaf_mcp_* symbols.
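The difference between the two stop patterns can be modelled in a few lines. This is a Python sketch rather than the C code in core/src/mcp/mcp.c: `AtomicInt` is a hypothetical lock-based stand-in for a C11 atomic, and the state names (0 = idle, 1 = running, 2 = stopping) are assumptions, since the notes only say the state machine has three states.

```python
import threading

# Assumed meaning of the three states; the notes do not name them.
IDLE, RUNNING, STOPPING = 0, 1, 2


class AtomicInt:
    """Hypothetical lock-based model of a C11 atomic int (illustration only)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def exchange(self, desired):
        # Unconditional write; returns the previous value.
        with self._lock:
            old, self._value = self._value, desired
            return old

    def compare_exchange_strong(self, expected, desired):
        # Write only when the current value equals `expected`.
        with self._lock:
            if self._value == expected:
                self._value = desired
                return True
            return False


def stop_with_exchange(running):
    """Exchange pattern: the state is overwritten even when nothing is running."""
    return running.exchange(STOPPING) == RUNNING


def stop_with_cas(running):
    """CAS pattern: only the RUNNING -> STOPPING transition ever happens."""
    return running.compare_exchange_strong(RUNNING, STOPPING)
```

In this model a second `stop_with_cas` call returns `False` and leaves the state untouched, whereas `stop_with_exchange` on an idle instance still writes 2 and needs a follow-up guard on the returned value.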

compat/python-vmaf/ scanf + ProcessRunner locale fixes (2026-05-31, ADR-0955)

Files touched: compat/python-vmaf/tools/scanf.py, compat/python-vmaf/__init__.py, python/test/python_harness_scanf_locale_bugs_test.py (new fork-only test).

Rebase impact: Medium. Both fixes live inside the upstream-mirror tree (compat/python-vmaf/), so a future upstream sync may overwrite them.

  1. tools/scanf.py::makeFormattedHandler.applyWidth. The upstream code has an inverted width guard:

         def applyWidth(handler):
             if width is None:
                 return makeWidthLimitedHandler(handler, width, ignoreWhitespace=True)
             return handler

     The fork swaps the branches so implicit-width converters return handler and explicit-width converters return the capped wrapper. When porting an upstream commit that re-touches this function, verify the swapped semantics are preserved. If Netflix has independently fixed the same bug, drop the fork delta and update ADR-0955's status to Superseded by upstream.

  2. __init__.py::ProcessRunner.run. Upstream sets the C locale via env.setdefault("LC_ALL", "C") / env.setdefault("LANG", "C"). The fork replaces both setdefault calls with unconditional assignment (env["LC_ALL"] = "C" / env["LANG"] = "C") so a parent shell with non-English LC_ALL / LANG cannot defeat the override. When porting an upstream commit that re-touches ProcessRunner.run, preserve the unconditional assignment pattern.
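For the width guard in the applyWidth item above, the fork's branch order can be sketched as follows. This is a minimal illustration, not the code in tools/scanf.py: `make_width_limited_handler` is a hypothetical stand-in that simply truncates the input, and `make_apply_width` exists only to supply the `width` closure variable.

```python
def make_width_limited_handler(handler, width):
    """Hypothetical stand-in: let the handler see at most `width` characters."""
    def limited(text):
        return handler(text[:width])
    return limited


def make_apply_width(width):
    """Build an apply_width(handler) closure with the fork's branch order."""
    def apply_width(handler):
        if width is None:
            # Implicit width: nothing to cap, so use the handler unchanged.
            return handler
        # Explicit width: wrap the handler in the capped variant.
        return make_width_limited_handler(handler, width)
    return apply_width


parse_int = int
unlimited = make_apply_width(None)(parse_int)  # no width given
capped = make_apply_width(3)(parse_int)        # explicit width of 3
```

With the branches the other way round, the no-width case would pass `None` into the wrapper and the explicit-width case would skip the cap entirely.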

The regression test python/test/python_harness_scanf_locale_bugs_test.py exercises both code paths and will fail if either fix regresses during an upstream sync.
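The ProcessRunner.run locale override described above comes down to the difference between `setdefault` and plain assignment. A minimal sketch, assuming hypothetical helper names (`build_child_env`, `build_child_env_setdefault`) that do not exist in the harness:

```python
import os


def build_child_env_setdefault(parent_env):
    """Upstream pattern: only fills in the locale when the parent left it unset."""
    env = dict(parent_env)
    env.setdefault("LC_ALL", "C")
    env.setdefault("LANG", "C")
    return env


def build_child_env(parent_env=None):
    """Fork pattern: always force the C locale, whatever the parent exported."""
    env = dict(os.environ if parent_env is None else parent_env)
    env["LC_ALL"] = "C"
    env["LANG"] = "C"
    return env


# Example parent environment with a non-English locale already exported.
parent = {"LC_ALL": "de_DE.UTF-8", "LANG": "de_DE.UTF-8", "PATH": "/usr/bin"}
```

Here `build_child_env_setdefault(parent)` keeps `de_DE.UTF-8`, so a child process could still format numbers with a decimal comma, while `build_child_env(parent)` yields `C` for both variables.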

GPU dispatch-runtime host-only unit test (2026-05-31, ADR-0954)

Files touched: core/test/test_gpu_dispatch_runtime.c (new), core/test/meson.build.

Rebase impact: Low. The new test executable is fork-local — upstream Netflix/vmaf does not ship the gpu_dispatch_env, gpu_dispatch_parse, or per-backend dispatch_strategy TUs targeted by the test (those are all ADR-0181 / ADR-0461 / ADR-0483 fork additions). The wiring in core/test/meson.build lives in the fork-added test region near other test_* entries; no upstream collision is possible. If upstream ever adds dispatch-strategy abstractions of its own, the test would coexist by name.

Python harness coverage push round 2 (2026-05-31)

Files touched: python/test/python_harness_coverage_test.py (new — 82 cases).

Rebase impact: None. The new test file lives under python/test/, exercises only fork-touched modules under compat/python-vmaf/, and does not modify any Netflix golden assertAlmostEqual value (CLAUDE.md §8). Upstream Netflix has no analogue at the compat/ path (that subtree exists because of ADR-0700). When /sync-upstream runs, this file is fork-only and needs no re-baselining. Companion: PR #412 (test/compat-python-vmaf-coverage) round 1, PR #413 (fix/decorator-persist-encode); neither overlaps with this change.

HIP ADM parity test feature-name + ENOSYS skip (ADR-0950, 2026-05-31)

Files touched: core/test/test_hip_adm_parity.c.

Rebase impact: None. Netflix/vmaf upstream has no HIP backend at all (HIP is a fork-exclusive backend per ADR-0212); the adm_hip extractor and its parity test only exist on this fork. There is no upstream counterpart to reconcile during sync. Companion fix to ADR-0949 (motion3 sibling); both tests now follow the same two-axis (enable_hip × enable_hipcc) skip predicate. Companion docs: docs/adr/0950-hip-adm-parity-feature-name-and-enosys-skip.md, changelog.d/fixed/0950-test-hip-adm-parity-feature-name-and-enosys-skip.md.

go-services-coverage-round2 (2026-05-31)

Files touched: cmd/vmafx-tune/cmd/unit_internal_test.go, cmd/vmafx-tune/cmd/unit_internal_fixtures_test.go, cmd/vmafx-controller/grpc_server_test.go, cmd/vmafx-controller/queue/queue_extra_test.go, pkg/encoder/version_extract_test.go, changelog.d/added/go-services-coverage-round2.md.

Rebase impact: None. The Go cmd/ and pkg/ trees are wholly fork-added — upstream Netflix/vmaf has no Go layer. All new files are test-only and never enter the libvmaf C build, the Python harness, or the FFmpeg patch stack. No production code is touched, so the upstream rebase boundary is unaffected. The cmd/vmafx-controller grpc_server tests carry the //go:build cgo tag mirroring the production source file, so they compile only when cgo is enabled (matching the existing main_test.go invariant).

dev/Containerfile libvmaf → core path fix (2026-05-31, ADR-0966)

No rebase impact: pure path fix, no upstream coupling. dev/Containerfile is entirely fork-local and the only change is substituting three occurrences of the old source-directory name libvmaf/ with core/ following the ADR-0700 rename. If a future sync touches dev/Containerfile (unlikely — Netflix does not ship a dev container), re-run grep -n 'libvmaf/' dev/Containerfile to confirm no stale references were re-introduced by the merge. The library output name (libvmaf.so) and stage name (libvmaf-build) are intentionally preserved as references to the product, not the source directory.


CUDA kernel parity coverage round 3 (2026-05-31)

Files touched: core/test/test_cuda_float_psnr_parity.c, core/test/test_cuda_float_vif_parity.c, core/test/test_cuda_float_ms_ssim_parity.c, core/test/test_cuda_float_moment_parity.c, core/test/test_cuda_ssimulacra2_parity.c, core/test/meson.build (+5 executable() + test() blocks under the existing if get_option('enable_cuda') guard, suite ['fast', 'gpu']), docs/adr/0947-cuda-kernel-coverage-round3.md, docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line), docs/research/cuda-kernel-coverage-round3-2026-05-31.md, changelog.d/added/cuda-kernel-coverage-round3.md.

Rebase impact: None. All five test files are fork-local (test_cuda_*_parity.c pattern is fork-only; upstream Netflix/vmaf has no equivalent test scaffold). core/test/meson.build edits are additive blocks inside the existing enable_cuda guard — no upstream file in this region. If upstream Netflix adds new CUDA kernels with matching names (float_psnr_cuda, float_vif_cuda, float_ms_ssim_cuda, float_moment_cuda, ssimulacra2_cuda), the parity tests continue to work unchanged. If upstream adds new test files near test_integer_vif_cpu_cuda_parity (the closest neighbour in meson.build) the additive blocks may need re-anchoring — trivial 3-way merge.

PRs #351 (round 1) and #374 (round 2) both inserted test entries under the same enable_cuda guard in core/test/meson.build and have since merged; the sequential three-way merges resolved cleanly at landing time.


ADR template — optional supply-chain / SBOM / carbon sections (2026-05-31)

Files touched: docs/adr/0000-template.md, docs/adr/README.md

Rebase impact: None. Upstream Netflix/vmaf does not maintain an ADR template; the entire docs/adr/ tree is fork-local. The new optional sections (## Supply-chain impact, ## SBOM delta, ## Carbon / footprint) appear between ## Consequences and ## References. No upstream conflict surface.


vmafx-operator zap → slog uniformity (2026-05-31)

Files touched: cmd/vmafx-operator/main.go, cmd/vmafx-operator/internal/controller/suite_test.go, cmd/vmafx-operator/AGENTS.md, go.mod, go.sum.

Rebase impact: None against Netflix/vmaf (the operator is a fork-only Go package; upstream ships no Kubernetes operator). Rebase impact does exist against the kubebuilder v4 template itself: future scaffold upgrades will re-introduce sigs.k8s.io/controller-runtime/pkg/log/zap imports in main.go and suite_test.go. When re-running kubebuilder edit / operator-sdk init, re-apply the slog bridge:

  • main.go: replace the zap.Options block with slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: ...}) passed through logr.FromSlogHandler.
  • suite_test.go: replace zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(true)) with slog.NewTextHandler(GinkgoWriter, &slog.HandlerOptions{Level: slog.LevelDebug}) through logr.FromSlogHandler.

The cmd/vmafx-operator/AGENTS.md invariant #6 documents this; check it before merging any upstream-template re-sync PR.


MCP server cgo direct path Phase 1 (2026-05-31, ADR-0931)

Files touched: pkg/libvmaf/direct.go, pkg/libvmaf/errors.go, pkg/libvmaf/direct_test.go, pkg/libvmaf/errors_test.go, pkg/libvmaf/AGENTS.md, cmd/vmafx-mcp/impl.go, cmd/vmafx-mcp/impl_direct.go, cmd/vmafx-mcp/impl_direct_test.go, cmd/vmafx-mcp/AGENTS.md.

Rebase impact: None against Netflix upstream. The change is entirely fork-local: it adds a new in-process cgo scoring path (ScoreDirect, ValidateModel) to pkg/libvmaf/ (which does not exist upstream) and wires two MCP tool handlers (vmaf_score, describe_model) in cmd/vmafx-mcp/ (which also does not exist upstream) to take that path when VMAFX_MCP_DIRECT=1. The libvmaf public C ABI used (vmaf_init / vmaf_use_features_from_model / vmaf_read_pictures / vmaf_score_pooled / vmaf_model_load_from_path / vmaf_picture_alloc / vmaf_picture_unref / vmaf_model_destroy / vmaf_close) is the canonical entry-point set documented in core/include/libvmaf/; the upstream signatures change rarely and any rename would already break core/tools/vmaf.c, so this code rides along.

If upstream renames or removes any of those entry points, update pkg/libvmaf/direct.go to match, then run the unit suite (LD_LIBRARY_PATH=$(pwd)/core/build-cpu/src go test ./pkg/libvmaf/ ./cmd/vmafx-mcp/).


OpenTelemetry tracing full roll-out — ADR-0782 (2026-06-03)

Files touched: pkg/observability/otel_instruments.go (new), cmd/vmafx-controller/grpc_server.go, cmd/vmafx-node/executor.go, cmd/vmafx-node/main.go, cmd/vmafx-server/main.go, cmd/vmafx-mcp/main.go, cmd/vmafx-controller/queue/queue.go, deploy/grafana/vmafx-overview.json (new), deploy/helm/vmafx/templates/otel-collector-sidecar.yaml (new), docs/observability/otel.md (new), docs/adr/0782-otel-tracing.md (new).

Rebase impact: None on the Netflix/vmaf C tree. Entirely fork-local Go instrumentation. No upstream C surfaces touched. The otel_instruments.go file is a pure addition; span call-sites follow the StartSpan/EndSpan pattern and do not change function signatures. If a future port touches executor.go or grpc_server.go, the span stanzas are additive and do not conflict with upstream semantics.


OpenTelemetry traces + metrics — Phase 1 (2026-05-31)

Files touched: pkg/observability/otel.go (new), pkg/observability/otel_test.go (new), pkg/observability/AGENTS.md (new), cmd/vmafx-controller/main.go, cmd/vmafx-controller/grpc_server.go, docs/development/observability.md (new), docs/adr/0927-opentelemetry-traces-metrics-phase1.md (new), go.mod, go.sum.

Rebase impact: None on the Netflix/vmaf C tree. The change is entirely fork-local Go code under pkg/observability and cmd/vmafx-controller. Upstream Netflix/vmaf has no Go services, so there is no cross-repo file to reconcile on sync. The added OTel dependencies (go.opentelemetry.io/otel, otelgrpc, OTLP HTTP exporters) live in go.mod and do not touch the C build.

When Phase 2 wires OTel into vmafx-node / vmafx-server / vmafx-mcp / vmafx-tune, follow the call-site pattern documented in pkg/observability/AGENTS.md (the 5 s bounded shutdown is mandatory). Each subsequent service ships as its own PR with its own ADR.

mkdocs ADR nav restructure + by-tag generator (2026-05-31)

Files touched: mkdocs.yml, scripts/docs/generate-adr-nav.sh, scripts/docs/generate-adr-by-tag.sh, docs/adr/by-tag/*.md (auto-generated, 443 files), docs/adr/0937-mkdocs-nav-decade-buckets.md, docs/adr/_index_fragments/0937-*.md, docs/adr/_index_fragments/_order.txt (append).

Rebase impact: None. All files are fork-only:

  • Upstream Netflix/vmaf has no mkdocs.yml, no docs/adr/ tree, and no scripts/docs/ directory.
  • The sentinel-bounded splice region in mkdocs.yml (# >>> ADR-NAV-GENERATED / # <<< ADR-NAV-GENERATED) is fork-local and unaffected by any upstream doc reorganisation.
  • The docs/adr/by-tag/ tree is regenerated by scripts/docs/generate-adr-by-tag.sh --write; on every ADR add / edit / tag-edit, re-run the script (or rely on the --check CI gate once wired into .github/workflows/docs.yml).

If the per-hundred bucket labels in LABELS inside scripts/docs/generate-adr-nav.sh drift away from the actual bucket themes (e.g., the 0800s and 0900s fill out with a clear topic), edit the dict and re-run --write.

BuildKit cache mounts on container build matrix (2026-05-31)

Files touched: Dockerfile, docker/Dockerfile.production-gpu, dev/Containerfile, Dockerfile.go-server, docs/adr/0923-buildkit-cache-mounts.md, changelog.d/changed/buildkit-cache-mounts.md.

Rebase impact: None. These four Dockerfiles are fork-local (Netflix's upstream has only the top-level Dockerfile which we already heavily customise; the production-gpu / dev / go-server trio are wholly fork-added). The change introduces three patterns worth preserving across rebases:

  1. # syntax=docker/dockerfile:1.7 header at the top of each file.
  2. RUN --mount=type=cache,target=/var/cache/apt,sharing=locked --mount=type=cache,target=/var/lib/apt,sharing=locked apt-get ... on every apt invocation, with the matching rm -rf /var/lib/apt/lists/* cleanup REMOVED.
  3. RUN --mount=type=cache,target=$CCACHE_DIR,sharing=locked CCACHE_DIR=... <build command> around every meson/ninja/cmake invocation; ccache installed as a build dependency; FFmpeg gets --cc='ccache gcc' --cxx='ccache g++'; cmake gets -DCMAKE_{C,CXX}_COMPILER_LAUNCHER=ccache.

If upstream Netflix adds new RUN apt-get install lines to the top-level Dockerfile, prepend the apt cache mount pair. If they add new C/C++ compile steps, wrap them with the ccache mount + env var.

The vmaf user uid/gid is now explicitly pinned to 1000 in dev/Containerfile so BuildKit --mount=...,uid=1000,gid=1000 directives resolve to the same identity that runs the build — preserve that pin on rebase.

Pre-existing test failures across ai/, vmaf-tune, mcp-server (2026-05-30)

Files touched: ai/tests/conftest.py, ai/tests/test_codec_aware_fr.py, ai/tests/test_dnn_exporter_run_provenance.py, ai/tests/test_export_roundtrip.py, ai/tests/test_qat_smoke.py, ai/tests/test_registry.py, ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py, ai/tests/test_train_fr_regressor_v3.py, ai/tests/test_tune_cli.py, ai/tests/test_variance_mode.py, ai/tests/test_conftest_pytorch_lightning_guard.py (new), ai/pyproject.toml, tools/vmaf-tune/src/vmaftune/ladder.py, tools/vmaf-tune/tests/test_ladder.py, mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py, mcp-server/vmaf-mcp/tests/test_http_transport.py.

Rebase impact: None. All three touched subsystems are fork-local:

  • ai/ — entirely fork-added (tiny-AI training); upstream Netflix/vmaf has no Python training package.
  • tools/vmaf-tune/ — fork-added recommendation tool; upstream has no equivalent.
  • mcp-server/vmaf-mcp/ — fork-added MCP JSON-RPC server; upstream has no equivalent.

No cross-repo conflict possible. The requires_pytorch_lightning() helper in ai/tests/conftest.py is a generic environment-probe pattern that will keep working unchanged for any future torch / torchvision / torchmetrics ABI drift; the only knob to revisit is whether to widen the broad except Exception if some future failure mode warrants more specific handling.
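The environment-probe pattern mentioned above can be sketched as below. `module_is_usable` is a hypothetical name and not the actual `requires_pytorch_lightning()` helper; it only illustrates why a broad `except Exception` is used: a broken install can fail with errors other than `ImportError` (for example an ABI mismatch raised while the package initialises).

```python
import importlib


def module_is_usable(name):
    """Return True when `name` imports cleanly, False on any import-time failure."""
    try:
        importlib.import_module(name)
    except Exception:
        # Deliberately broad: a missing package and a package that crashes
        # during import are treated the same way by the probe.
        return False
    return True
```

A test module could then skip itself when `module_is_usable("pytorch_lightning")` is false; narrowing the `except` clause would be the knob to revisit if a specific failure mode needs to surface instead of being skipped.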

Unified Python test orchestrator — top-level noxfile.py (2026-05-31, ADR-0914)

Files touched: noxfile.py (new), docs/development/python-test-orchestrator.md (new), docs/adr/0914-unified-python-test-orchestrator.md (new), docs/adr/_index_fragments/0914-unified-python-test-orchestrator.md (new), docs/adr/_index_fragments/_order.txt, docs/research/0914-python-test-orchestrator-audit-2026-05-31.md (new), changelog.d/added/0914-unified-python-test-orchestrator.md (new).

Rebase impact: None. The orchestrator is entirely fork-local — upstream Netflix/vmaf ships only the python/ legacy harness and its python/tox.ini, neither of which this change modifies. The new noxfile.py lives at repo root, a path upstream does not occupy. If upstream ever adds its own noxfile.py, treat the conflict as fork-takes-priority: our file delegates to upstream's python/tox.ini via the python_harness session, so behaviour is preserved.

clang-tidy modernize-* family enablement (2026-05-31)

Files touched: .clang-tidy, core/src/feature/feature_collector.cpp, core/src/metadata_handler.cpp.

Rebase impact: Low. .clang-tidy is fork-local; upstream Netflix does not ship one. feature_collector.cpp is fork-renamed from upstream .c under ADR-0725-family migrations. If an upstream sync brings a new .c patch that touches feature_collector, the patch likely applies cleanly to the .cpp (extern "C" linkage is preserved) but should be replayed in the C++ idiom (nullptr not NULL, <cstring> not <string.h>). metadata_handler.cpp is wholly fork-local with no upstream counterpart.

When syncing: keep the four -modernize-* opt-outs in .clang-tidy (noise / C-ABI hostility rationale documented in ADR-0915). If upstream ever ships their own clang-tidy config, merge by union — drop our opt-outs only with an explicit ADR.

cargo-deny supply-chain policy (2026-05-31)

Files touched: deny.toml (new), .github/workflows/rust-ci.yml (new cargo-deny job + deny.toml / core/src/feature/rust/** path filters), core/src/feature/rust/tad/Cargo.toml (publish = false).

Rebase impact: None against upstream Netflix/vmaf — deny.toml, the cargo-deny CI job, and the Rust workspace itself are all fork-local additions. Upstream does not maintain a Rust workspace, so no merge surface exists. The publish = false change to core/src/feature/rust/tad/Cargo.toml is also fork-local (core/src/feature/rust/ is an ADR-0707 pilot directory that does not exist upstream).

If a future upstream sync starts shipping a Rust workspace of its own, reconcile by extending deny.toml's [graph] members implicit-include behaviour (cargo-deny picks up workspace members automatically) and audit whether upstream's choice of licenses / banned-crate stance differs from ours. See ADR-0917.

Pixel-format edge coverage test (2026-05-31)

Files touched: core/test/test_pixel_format_edge_coverage.c (new), core/test/meson.build (one executable + one test() registration).

Rebase impact: Low. The new test file is wholly fork-local and only links against the public extractor / picture / collector C surface (no internal-source #include). If upstream Netflix renames any of the API entry points the test uses (vmaf_get_feature_extractor_by_name, vmaf_feature_extractor_context_create / _extract / _close / _destroy, vmaf_feature_collector_init / _get_score / _destroy, vmaf_picture_alloc / _unref), update the test accordingly. The meson.build additions sit between the existing test_psnr block and test_framesync; no upstream core/test/meson.build reordering should conflict, since the inserted block is immediately adjacent to fork-only neighbours.

See ADR-0912.

ADR README drift sweep (2026-05-31)

Files touched: docs/adr/README.md, docs/adr/_index_fragments/_order.txt, 35 new + 7 rewritten files under docs/adr/_index_fragments/[0-9]*.md, 3 orphan fragments removed under docs/adr/_index_fragments/, changelog.d/fixed/adr-readme-regen.md.

Rebase impact: None. The fragment tree and README.md are entirely fork-local (upstream Netflix/vmaf has no ADR directory). The sweep only re-aligns three fork-local index sources against the already-authoritative docs/adr/[0-9]*-*.md ADR file set, with no content changes to any ADR body. Future regenerations are mechanical via scripts/docs/concat-adr-index.sh --write.

codespell sweep + .codespellrc (2026-05-31)

Files touched: .codespellrc (new), CONTRIBUTING.md, docs/metrics/cambi.md, docs/adr/0910-codespell-sweep-config.md (new), changelog.d/fixed/codespell-sweep.md (new).

Rebase impact: Low. .codespellrc skip-list explicitly excludes every Netflix-author / vendored / upstream-mirrored file enumerated in ADR-0910 §Context (e.g. compat/python-vmaf/*, python/test/*, core/src/feature/{x86,arm64,cuda,hip,common,metal}/*, core/src/svm.cpp, core/src/pdjson.c, core/tools/y4m_input.c, core/tools/cli_parse.c, core/README.md, core/tools/README.md, core/test/test_picture.c), so re-running codespell after a sync surfaces only newly-introduced fork typos. If upstream lands new files under the skipped trees that the fork later adopts as fork-local (e.g. a new feature extractor we then modify), drop the matching skip row and re-run codespell to catch any latent typos.

If upstream changes path layout (rename core/ back to libvmaf/, etc.), update the skip-list paths in .codespellrc to match. ignore-words-list is independent of upstream layout.

Re-run: codespell --config .codespellrc (or just codespell from the repo root — picks up .codespellrc automatically). Expected output: no findings on a clean tree.

.gitignore staleness audit (ADR-0905, 2026-05-30)

Files touched: .gitignore, python/.gitignore.

Rebase impact: None. Both files are fork-local (the rules trimmed or rewired all originate from fork additions and the post-ADR-0700 directory rename). Upstream Netflix/vmaf maintains its own .gitignore independently; the matlab MEX block, the Cython adm_dwt2_cy block, and the legacy python/.gitignore scope were fork-only artefacts of the rename and never tracked upstream. On the next /sync-upstream, Netflix's .gitignore will merge cleanly because the trimmed rules (.gradle/, .pypirc) and the rewired matlab paths (compat/python-vmaf/matlab/**/*.mex*) do not overlap any upstream rule.

cpp const/noexcept/nodiscard annotation sweep (2026-05-30)

Files touched: core/src/dict.cpp, core/src/feature/feature_collector.cpp, core/src/feature/feature_name.cpp, core/src/fex_ctx_vector.cpp, core/src/opt.cpp.

Rebase impact: None. All annotations are added to fork-local TU-internal static helpers and one TU-local lambda in C++23 files that were introduced by the ADR-0723 / ADR-0727 / ADR-0729 / ADR-0731 C++ migration waves. The extern "C" public-ABI entry points are untouched, so no upstream header rebase is affected. If a future sync or migration wave adds new fork-local C++ static helpers, apply the same [[nodiscard]] / noexcept discipline so the lint posture stays uniform.

libvmaf-public-header-doc-gaps-round3 (2026-05-30)

Files touched:

  • core/include/libvmaf/picture.h (doc comments on enum + opaque typedef + 2 entry points, plus NOLINT-cited include guard)
  • core/include/libvmaf/libvmaf.h (doc comments on 2 enums + opaque typedef + 1 struct, plus NOLINT-cited include guard)
  • core/include/libvmaf/libvmaf_cuda.h (doc comments on opaque typedef + config struct + enum + 1 picture-config struct, plus NOLINT-cited include guard)

Rebase impact: Low. The doc-comment additions land above unchanged upstream-mirror declarations; any future Netflix upstream that touches the same function signatures, enum bodies, or struct definitions will produce a tractable 3-way merge — the doc text is fork-local and git merge will preserve our /** ... */ block above whatever upstream rewrites the declaration to. No identifier renames; no ABI/source impact.

The NOLINT annotations on __VMAF_H__ / __VMAF_PICTURE_H__ / __VMAF_CUDA_H__ are inline comments only — they do not alter the include guard symbols themselves, so upstream's preprocessor identity remains intact. Same pattern PR #327 (round 2) used for feature.h / model.h / dnn.h. If a future upstream sync changes the guard form (unlikely — these have been stable for years), the NOLINT cites become redundant and can be removed in a follow-on cleanup.

libvmaf-public-header-doc-gaps-round2 (2026-05-30)

Files touched:

  • core/include/libvmaf/feature.h (doc comments + NOLINT-cited guard)
  • core/include/libvmaf/model.h (doc comments + NOLINT-cited guard)
  • core/include/libvmaf/dnn.h (vmaf_dnn_session_close doc + NOLINT-cited guard)

Rebase impact: Low. The doc-comment additions land above unchanged upstream-mirror declarations; any future Netflix upstream that touches the same function signatures will produce a tractable 3-way merge — the doc text is fork-local and git merge will preserve our /** ... */ block above whatever upstream rewrites the signature to. No identifier renames; no ABI/source impact.

The NOLINT annotations on __VMAF_FEATURE_H__ / __VMAF_MODEL_H__ / __VMAF_DNN_H__ are inline comments only; they do not alter the include guard symbols themselves, so upstream's preprocessor identity remains intact. If a future upstream sync changes the guard form (unlikely, since these have been stable for years), the NOLINT cites become redundant and can be removed in a follow-on cleanup. There are no source/binary symbol renames; consumers of the patch stack (ffmpeg-patches/) and the Go/Rust bindings see identical declarations.

Bash strict-mode + trap-cleanup sweep (2026-05-30, ADR-0899)

Files touched: scripts/run_unittests.sh, scripts/ai/fetch-tiny-blobs.sh, dev/scripts/smoke-probe-loop.sh, scripts/ci/check-agent-worktree-drift.sh, scripts/ci/test_check_agent_worktree_drift.sh, scripts/ci/check-adr-numbering.sh, scripts/ci/check-dispatch-registry.sh, scripts/adr/next-free.sh, tools/ensemble-training-kit/_platform_detect.sh.

Rebase impact: None. All 9 files are fork-local: Netflix upstream has none of scripts/adr/, scripts/ci/check-*-drift*, scripts/ai/fetch-tiny-blobs.sh, dev/scripts/smoke-probe-loop.sh, or tools/ensemble-training-kit/, and does not carry the in-tree scripts/run_unittests.sh in this form. No conflict risk on sync-upstream.

Conflict watchpoints (none expected): if a future upstream sync introduces a Netflix-side scripts/run_unittests.sh, the strict-mode set -eu block at the top of our version is the only carrier of fork-specific behaviour and trivially survives a 3-way merge.

Metal kernel parity tests round 2 (2026-05-30)

Files touched: core/test/meson.build, core/test/test_metal_motion_v2_parity.c (new), core/test/test_metal_integer_psnr_parity.c (new), core/test/test_metal_float_psnr_parity.c (new), core/test/test_metal_float_ssim_parity.c (new)

Rebase impact: None. All four new files live under the existing fork-local enable_metal block in core/test/meson.build (the entire Metal backend is fork-added — ADR-0361 / ADR-0421 / ADR-0589 — and absent from upstream Netflix/vmaf). The block edit appends four new executable() + test() pairs immediately after the test_metal_install_header block; the surrounding endif boundaries are untouched so upstream syncs cannot conflict here.

If upstream ever ports a Metal backend, the test files would need re-pointing at the upstream kernel names; the synthetic-fixture + -ENODEV skip pattern from test_sycl_motion3_parity.c carries forward unchanged.

.claude/skills/ — ADR-0700 path drift cleanup (2026-05-30)

Files touched: .claude/skills/add-gpu-backend/scaffold.sh, .claude/skills/build-vmaf/build.sh, .claude/skills/build-vmaf/SKILL.md, .claude/skills/regen-docs/SKILL.md, .claude/skills/add-simd-path/templates/simd_feature.c.template

Rebase impact: None. Files are entirely fork-local (the .claude/ tree does not exist upstream — see ADR-0331 / ADR-0700). The change rewrites four residual libvmaf/ source-tree references to core/ to match the post-ADR-0700 layout. Public install-path references (core/include/libvmaf/..., libvmaf.so) are unchanged.

When syncing from upstream Netflix/vmaf, these files do not need attention; the conflict surface is empty.

ADR-0871 — SSIM SIMD dispatch pthread_once guard — 2026-05-30

Low rebase impact. The fix sits in two fork-added zones:

  • core/src/feature/iqa/ssim_tools.c — the file is a Tom-Distler BSD-2011 import, but the four globals (g_ssim_precompute, g_ssim_variance, g_ssim_accumulate, g_iqa_convolve), the setter functions, and the new iqa_ssim_install_dispatch_once helper are fork additions (Distler's 2011 import has no SIMD dispatch). The pthread_once guard and atomic-installer publish are appended to the existing fork-added block. A future re-import of Tom Distler's IQA would not collide because the new code lives in fork-added territory.
  • core/src/feature/iqa/ssim_simd.h — fork-added header (Netflix/vmaf has no equivalent); appends one declaration.
  • core/src/feature/float_ssim.c and core/src/feature/float_ms_ssim.c — the dispatch-install bodies are fork additions; the change factors them into a callback and routes the call through the once-helper. The Netflix-upstream init() bodies are unchanged beyond the dispatch block, so a future upstream change to the init() prologue would merge cleanly.
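The once-guard described in the list above can be modelled in Python to show the intended behaviour. This is not the C implementation: `Once`, `install_dispatch`, and `init_extractor` are hypothetical names, and the lock-plus-flag class only mimics the "first caller runs the installer, everyone else sees the result" contract of pthread_once.

```python
import threading


class Once:
    """Hypothetical model of pthread_once: run a callback at most once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False

    def call(self, fn):
        with self._lock:
            if not self._done:
                fn()
                self._done = True


install_count = 0
dispatch = {}


def install_dispatch():
    """Stand-in for the dispatch-install callback; counts how often it runs."""
    global install_count
    install_count += 1
    dispatch["precompute"] = "simd"


_once = Once()


def init_extractor():
    """Stand-in for an extractor init() that routes through the once-helper."""
    _once.call(install_dispatch)


# Several extractor instances initialising concurrently.
threads = [threading.Thread(target=init_extractor) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads call `init_extractor`, the installer body runs a single time, so the shared dispatch table is never written concurrently.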

Fork-local files: core/src/feature/iqa/ssim_tools.c (fork-added dispatch zone), core/src/feature/iqa/ssim_simd.h (fork-added header), core/src/feature/float_ssim.c (fork-added SIMD-install block), core/src/feature/float_ms_ssim.c (fork-added SIMD-install block), docs/adr/0871-ssim-dispatch-pthread-once.md, docs/research/tsan-race-audit-2026-05-30.md, changelog.d/fixed/tsan-race-audit.md.

sanitizer-pass-cleanup (2026-05-30, ADR-0869)

Files touched:

  • core/src/feature/cambi.c — adds two int shadow slots (window_size_opt, max_log_contrast_opt) to CambiState; the options table targets them; init() copies into the existing uint16_t runtime fields.
  • core/src/feature/x86/adm_avx2.c — moves the uint32_t cast inside the shift in four DWT2 filter-packing expressions.
  • core/src/feature/x86/adm_avx512.c — same as AVX2.

Rebase impact:

  • CAMBI: upstream Netflix's CambiState does not have the _opt shadow slots. On upstream sync, expect a context conflict on the struct definition and on the two option-table entries. Resolution is to keep the fork's shadow slots and the init-bridge assignments; upstream's option entries should be re-pointed at the _opt shadows.
  • ADM AVX2/AVX-512: the four filter-packing expressions are upstream-mirrored code. On upstream sync, a textual conflict is possible at every occurrence; the fork's resolution is the inside-cast (((uint32_t)filter[k] << 16)). Bit-exact with upstream output; safe to keep.
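Why the position of the cast matters can be shown with a small arithmetic model. This sketch assumes the filter taps are 16-bit values that C promotes to a 32-bit signed int before the shift; the notes do not state the element type, so treat that as an assumption. The function names are hypothetical, and `None` marks a result that C leaves undefined.

```python
INT32_MAX = 2**31 - 1
UINT32_MASK = 0xFFFFFFFF


def shift_as_promoted_int(tap):
    """Model `tap << 16` evaluated in signed int, then cast to uint32_t.

    Returns None when the shift itself is undefined in C: a negative
    operand, or a result that does not fit in a signed 32-bit int.
    """
    result = tap << 16
    if tap < 0 or result > INT32_MAX:
        return None
    return result


def shift_as_uint32(tap):
    """Model `(uint32_t)tap << 16`: convert first, then shift modulo 2**32."""
    return ((tap & UINT32_MASK) << 16) & UINT32_MASK
```

For a tap such as 0x7FFF both forms give 0x7FFF0000, consistent with the bit-exact claim; for 0x8000 the outside-cast form shifts into the sign bit, while the inside-cast form is well defined and gives 0x80000000.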

Verified clean under ASan+UBSan against the full unit-test suite (63 tests OK) and the vmaf CLI on 4:2:0 8-bit, 4:2:2 10-bit, 4:2:0 12-bit. Cambi tuned-options feature-name derivation (cambi_mlc_3_ws_63) works.

SIMD bit-exactness round-2 — SSIMULACRA 2 FMA unification + lib-FP-model extension (2026-05-30, ADR-0891)

Files touched: core/src/meson.build, core/src/feature/x86/ssimulacra2_avx2.c, core/src/feature/x86/ssimulacra2_avx512.c, core/test/test_ssimulacra2_simd.c.

Rebase impact: Low — SSIMULACRA 2 is fork-added (no upstream coupling) and the meson helper _libvmaf_feature_icx_args mirrors the existing _x86_simd_strict_fp_extra pattern from ADR-0339 (round-1). If upstream Netflix ever adds an intel-llvm build matrix and ships scalar references inside libvmaf_feature_static_lib that participate in SIMD bit-exactness tests, reuse _libvmaf_feature_icx_args rather than minting a new helper. The FMA-based picture_to_linear_rgb colour matrix is fully self-contained inside the SSIMULACRA 2 TUs; no upstream Netflix file references those symbols. Companion: docs/adr/0891-simd-bit-exact-round2-fmaf-libvmaf-feature-icx.md, changelog.d/fixed/0891-simd-bit-exact-round2.md.


SIMD strict-FP flags for icx (2026-05-30)

Files touched: core/src/meson.build, core/test/meson.build, core/src/feature/AGENTS.md

Rebase impact: Low. The changes add an icx-specific compile flag (-fp-model=precise) to x86 SIMD carve-out static libs and to the three SIMD bit-exactness test executables (test_psnr_hvs_simd, test_ms_ssim_decimate, test_ssimulacra2_simd). The flag is added only when cc.get_id() returns 'intel-llvm' or 'intel-llvm-cl', so GCC and vanilla Clang builds are unaffected.

If upstream Netflix adds new SIMD carve-out static libs, apply the same _x86_simd_strict_fp_extra pattern to them so the icx build stays green. If Netflix adds new SIMD test executables that compare a scalar reference against SIMD output, add _simd_strict_fp_args to their c_args.


Coverage Gate ORT accessor coverage (2026-05-30)

Files touched: core/test/dnn/test_ort_internals.c, changelog.d/fixed/coverage-gate-ort-backend-accessor.md.

Rebase impact: None. The added test exercises a fork-only public accessor (vmaf_ort_output_name_at) on a fork-only file (core/src/dnn/ort_backend.c); the test TU itself is fork-only under ADR-0112's testability surface. Upstream Netflix/vmaf has no ORT backend, so there is no cross-repo file to reconcile on sync. The ADR-0114 per-file floor override (PER_FILE_MIN["core/src/dnn/ort_backend.c"]=78) stays in place; the coverage delta (409 → 413 / 526 = 78.5 %) is the per-file safety margin restored after PR #129 grew the denominator with unreachable error-handling.
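The floor arithmetic above can be checked in a few lines. This is a minimal sketch, not the fork's coverage gate: `coverage_percent` and `meets_floor` are hypothetical names, and only the line counts (409 and 413 of 526) and the floor of 78 come from the entry above.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return line coverage as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


def meets_floor(covered: int, total: int, floor_percent: float) -> bool:
    """Hypothetical helper: True when coverage is at or above the floor."""
    return coverage_percent(covered, total) >= floor_percent


# Figures quoted in the entry: 409 -> 413 covered lines out of 526, floor 78.
before = coverage_percent(409, 526)
after = coverage_percent(413, 526)
print(f"before={before:.1f}% after={after:.1f}%")
print(meets_floor(409, 526, 78.0), meets_floor(413, 526, 78.0))
```

With these figures the pre-change count sits just under the floor and the post-change count just over it, which is why the four extra covered lines matter.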


unused-testdata-debug-scripts-cleanup (2026-05-30, ADR-0880)

Files touched: testdata/check_borders.py (deleted), testdata/compare_a380.py (deleted), testdata/scores_sycl_b580_576_mq.json (deleted).

Rebase impact: None. All three files were fork-added and not present in upstream Netflix/vmaf. No upstream patch context references them. Future /sync-upstream runs will not surface any conflicts on these paths.

trivy-container-scan-baseline (2026-05-30, ADR-0878)

Files touched: docker/Dockerfile.production, docker/Dockerfile.production-gpu

Rebase impact: None. Both files are fork-added (no upstream Netflix/vmaf equivalents — Netflix ships no production Dockerfile). The added USER nonroot:nonroot directive on each final stage will not conflict on any future upstream sync. If upstream ever publishes their own Dockerfile, the fork's containers stay separate (the GHCR namespace is vmafx/).

go-nilness-staticcheck-audit (2026-05-30)

Files touched: cmd/vmafx-server/{main.go,http_server.go}, cmd/vmafx-controller/{main.go,http_server.go}, cmd/vmafx-mcp/impl.go, cmd/vmafx-node/main_test.go, pkg/ai/infer_test.go, pkg/bisect/bisect_test.go.

Rebase impact: None. Every modified file is fork-original Go code under cmd/vmafx-* / pkg/*; Netflix/vmaf upstream does not ship Go code in these paths. No upstream conflict possible.

iwyu-audit (2026-05-30) — fork-only files, append-only direct includes

Files touched: 16 fork-authored sources under core/src/feature/, core/src/feature/x86/, core/test/, core/tools/.

Rebase impact: None. All modified files carry the Lusoris-only license header (filtered explicitly during scope selection — files with a Netflix header were skipped to preserve upstream-parity per CLAUDE.md §12 r12). The diff consists of removing dead #include directives and adding direct includes for symbols previously reached transitively. Upstream Netflix/vmaf does not contain any of these files in the form modified here, so there is no conflict surface for a future sync-upstream to navigate.

Follow-up: A second-phase IWYU pass on core/src/dnn/*, core/src/{cuda,sycl,hip,vulkan}/, and the DNN-gated feature extractors is owed (the host CPU-only build cannot exercise VMAF_HAVE_DNN because ONNX Runtime is not installed locally). That pass will run inside the vmaf-dev-mcp container per CLAUDE.md §12 r15.


magic-number-audit cert-int07c (2026-05-30, ADR-0874)

Files touched: core/src/mcp/{mcp_internal.h,mcp.c,compute_vmaf.c,transport_sse.c}, core/src/picture.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c.

Rebase impact: Low. All four touched core/src/mcp/* files and core/src/cuda/picture_cuda.c are fork-added; upstream Netflix/vmaf has neither MCP nor a CUDA picture-allocator with these bounds. core/src/picture.c and core/src/libvmaf.c are fork-mirrored; the renames touch fork-added helpers (dnn_*_output_feature_name) and the fork's VMAF_PIC_BPC_{MIN,MAX} hardening (originally a fork-local guard against bpc < 8 || bpc > 16). A future upstream sync that re-introduces a raw 8/16 predicate on those lines should keep the fork's named constants; the renames have no bearing on bit-exactness and do not alter behaviour. No new public C-API symbols are introduced.

eintr-and-io-error-audit (2026-05-30, ADR-0872)

Files touched: core/src/mcp/transport_stdio.c, core/src/mcp/transport_uds.c, core/src/libvmaf.c, core/src/feature/cambi.c, core/src/sycl/dmabuf_import.cpp, core/tools/vmaf_vpl.c.

Rebase impact: Low. The MCP transports are fully fork-local (no upstream peer). libvmaf.c, cambi.c, and vmaf_vpl.c carry fork-local hunks (vmaf_write_output, heatmaps close() fail-path, VPL VA-API init) that are already non-shared with upstream — the new (void) casts sit inside those hunks. dmabuf_import.cpp is wholly fork-added (no upstream file). No upstream conflict expected on the next sync; if Netflix ever adds their own MCP transport, the EINTR retry pattern should be ported there too.

adr-0100-per-surface-doc-audit (2026-05-30)

Files touched: docs/development/build-flags.md, docs/api/dnn.md, docs/usage/cli.md, changelog.d/added/adr-0100-per-surface-doc-audit.md.

Rebase impact: None. All four files are fork-added (the upstream Netflix/vmaf tree has no docs/development/build-flags.md, no docs/api/dnn.md, no docs/usage/cli.md at the fork's depth, and no changelog.d/). The audit closes per-surface doc gaps for fork-local surfaces (codec-context DNN API, codec/preset/CRF/resize CLI flags, six Meson options) that originated in fork ADRs (ADR-0335, ADR-0361, ADR-0519, ADR-0550, ADR-0568, ADR-0623, ADR-0707, ADR-0726). No upstream file is touched; no rebase conflict possible.

go-pkg-coverage-push (2026-05-30)

Files touched: pkg/observability/observability_test.go, pkg/report/report_test.go, pkg/encoder/discover_test.go, pkg/libvmaf/paths_test.go, pkg/gpu/parsers_test.go, pkg/gpu/probe_shim_test.go, pkg/bisect/parse_test.go, pkg/storage/internals_test.go, changelog.d/added/go-pkg-coverage-push.md.

Rebase impact: None. The Go pkg/ tree is wholly fork-added — upstream Netflix/vmaf has no Go layer. All new files are test-only and never enter the libvmaf C build, the Python harness, or the FFmpeg patch stack. No production code is touched, so the upstream rebase boundary is unaffected.

python-type-annotations-audit (2026-05-30)

Files touched: ai/src/aiutils/{__init__,jsonl_utils,parquet_utils}.py, ai/src/corpus/base.py, mcp-server/vmaf-mcp/src/vmaf_mcp/{server,http_transport}.py, tools/vmaf-tune/src/vmaftune/{auto,benchmark,corpus,encoder_profile,fr_from_nr_adapter,hdr,predictor_features,report,saliency,score,score_backend,sidecar}.py, tools/vmaf-tune/src/vmaftune/codec_adapters/_gop_common.py, pyproject.toml.

Rebase impact: None. Every touched file is fork-added (ai/, mcp-server/, tools/vmaf-tune/) or fork-only mypy config (pyproject.toml [tool.mypy.overrides]). Upstream Netflix/vmaf does not ship any of these trees; on a future upstream sync there is no conflict surface.

The change is a pure type-annotation tightening with one exception: the removal of a dead-code duplicate _run_benchmark() definition in mcp-server/vmaf-mcp/src/vmaf_mcp/server.py. The deleted copy was silently shadowed at import time by the progress-token-aware implementation 575 lines later, so the removal is behaviour-preserving and runtime semantics are unchanged.

openapi-rest-schema (2026-05-29, ADR-0797)

Files touched: api/openapi/vmafx-server-v1.yaml, api/openapi/oapi-codegen.yaml, gen/go/oapi/vmafx_server_v1.gen.go, cmd/vmafx-server/rest_adapter.go, cmd/vmafx-server/swagger_ui.go, cmd/vmafx-server/http_server.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/main.go, docs/server/rest.md

Rebase impact: None. All touched files are fork-local additions in the Go server layer (cmd/vmafx-server/, api/, gen/go/) that do not exist in upstream Netflix/vmaf. No rebase conflicts are possible.

The newHTTPServer signature gained a *grpcServer parameter; any fork-local branch that calls newHTTPServer with the old 4-argument form will fail to compile and must add the grpcServer argument.


ADR-0783 — Kubernetes e2e integration test harness (2026-05-29)

No rebase impact on upstream C/Python code.

All files are wholly fork-local additions: test/e2e/kind-cluster.sh, test/e2e/fixtures/gen-tiny-yuv.sh, test/e2e/fixtures/ref.yuv, test/e2e/fixtures/dist.yuv, test/e2e/kuttl-tests/ (all test case YAML), .github/workflows/e2e-k8s.yml, docs/k8s/integration-tests.md, docs/adr/0783-k8s-e2e-integration-test-harness.md, changelog.d/added/k8s-e2e-integration-test-harness.md.

Netflix upstream has no Kubernetes test infrastructure; no merge conflict risk. A sync-upstream that adds an upstream e2e directory would not conflict with this harness because Netflix uses libvmaf/ path roots that the fork has renamed to core/ (ADR-0700).


cuda-ms-ssim-vert-lcs-horiz-ldg (2026-05-29, ADR-0757)

Files touched: core/src/feature/cuda/integer_ms_ssim/ms_ssim_score.cu

Rebase impact: None. The modified file is a fork-added CUDA kernel TU that does not exist in upstream Netflix/vmaf master (ms_ssim CUDA port is fork-local). No rebase conflict is possible.

The change is a pure performance annotation: __launch_bounds__(128), const float *__restrict__ pointer extraction, and __ldg() on inner-loop loads. If upstream Netflix ever adds their own ms_ssim CUDA port, this file will need to be re-reviewed against theirs; the F3 pattern should carry forward.

cpp23 orphan .c sweep — metadata_handler.c (2026-05-29)

Files touched: core/src/metadata_handler.c (deleted)

Rebase impact: None. The file was dead source — never referenced by any meson.build after ADR-0708 renamed it to metadata_handler.cpp. Upstream Netflix/vmaf still uses metadata_handler.c; on future upstream sync, the upstream .c file will reappear in the patch context but meson.build will continue to reference only .cpp. No conflict possible: the deletion only affects the fork-local tree.

Rule for future cpp23 conversions: when renaming foo.c → foo.cpp in meson.build, always git rm core/src/foo.c in the same commit. Leaving both files in tree causes the source tree to diverge from the build definition.
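A check for leftover twins can be scripted. The following is a minimal sketch under stated assumptions: `find_orphan_c_sources` is a hypothetical helper, not an existing repo script, and it treats any .c file that sits beside a same-stem .cpp file as a candidate orphan without consulting meson.build.

```python
from pathlib import Path


def find_orphan_c_sources(src_root: Path) -> list[Path]:
    """Return .c files that sit next to a same-stem .cpp file."""
    orphans = []
    for cpp in sorted(src_root.rglob("*.cpp")):
        c_twin = cpp.with_suffix(".c")
        if c_twin.is_file():
            orphans.append(c_twin)
    return orphans


if __name__ == "__main__":
    root = Path("core/src")  # source root named in the rule above
    if root.is_dir():
        for path in find_orphan_c_sources(root):
            print(f"orphan candidate: {path}")
```

A hit is only a candidate: a pair that is intentionally kept side by side would also be reported, so each result still needs a look at the build definition.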


cuda-readback-free-host-pinned-leak sweep (2026-05-29)

Files touched: core/src/cuda/kernel_template.h, docs/backends/kernel-scaffolding.md

Rebase impact: None. The fix is entirely in fork-added files (kernel_template.h is a Lusoris-added header; kernel-scaffolding.md is fork-added documentation). No upstream Netflix/vmaf file is modified.

The changed function (vmaf_cuda_kernel_readback_free) did not exist in upstream — it was introduced by the fork's kernel-template ADR. No rebase conflict is possible.

ADR-0753 — CUDA resolution-aware dispatch scaffold (2026-05-29)

Files touched (initial + extended scope):

  • core/src/feature/cuda/resolution_dispatch.{h,c} (new)
  • core/src/feature/cuda/integer_adm/adm_cm.cu (two kernel macros)
  • core/src/feature/cuda/integer_adm_cuda.c (include, struct field, init, dispatch)
  • core/src/feature/cuda/integer_vif/filter1d.cu (FILTER1D_8_HORI_NO_BOUNDS macro + instantiation)
  • core/src/feature/cuda/integer_vif_cuda.c (struct field, init, resolution-aware dispatch in filter1d_8)
  • core/src/feature/cuda/integer_ssim/ssim_score.cu (calculate_ssim_vert_combine_no_bounds)
  • core/src/feature/cuda/integer_ssim_cuda.c (struct field, init, resolution-aware dispatch in submit_fex_cuda)
  • core/src/feature/cuda/AGENTS.md (invariant notes + verified wirings table)
  • docs/adr/0753-cuda-resolution-aware-dispatch.md (new; extended policy table)
  • docs/backends/cuda/overview.md (kernel dispatch table extended)
  • docs/research/0753-cuda-resolution-aware-dispatch-design.md (new)
  • changelog.d/added/cuda-resolution-aware-dispatch.md (new)

Rebase impact: Low on resolution_dispatch.{h,c} — these are wholly new fork-local files; no upstream conflict possible.

adm_cm.cu: The ADM_CM_LINE macro was split into ADM_CM_LINE_BOUNDED and ADM_CM_LINE_NO_BOUNDS. If upstream Netflix modifies adm_cm.cu after the fork diverges, the split needs to be reapplied around the new macro body. The extern "C" wrapping (ADR-0747) must be preserved for both entries.

integer_adm_cuda.c: The AdmStateCuda struct grew one field (func_adm_cm_line_kernel_8_no_bounds). If upstream adds fields to the struct in the same location, resolve the merge conflict by keeping both additions. The new #include "feature/cuda/resolution_dispatch.h" line must survive any upstream shuffle of the include block.

On rebase: verify that both cuModuleGetFunction calls in the init block still reference valid kernel symbol names from adm_cm.cu.

Research-0751 4K baseline + PR #79 adm_cm A/B (2026-05-29)

Files touched: docs/research/0751-cross-backend-4k-baseline-and-pr79-adm-cm-4k-measure.md, changelog.d/changed/cross-backend-4k-baseline.md

Rebase impact: None. Research-only digest; no source code changed. No upstream conflict possible — these are fork-added measurement artifacts.

CI round-3 fix — .semgrepignore, .gitleaks.toml, codeql-config.yml, compat/python-vmaf/ (2026-05-28)

Files touched: .semgrepignore, .gitleaks.toml, .github/codeql-config.yml, compat/python-vmaf/core/feature_extractor.py, core/test/test_hip_smoke.c, ai/src/aiutils/jsonl_utils.py, ai/src/vmaf_train/registry.py, .github/workflows/libvmaf-build-matrix.yml.

Rebase impact: Low. All changes are either CI config fixes (path corrections post-ADR-0700 rename) or code fixes for missing functions and removed extractors.

On upstream sync:

  • .semgrepignore and .gitleaks.toml are fork-local; no upstream conflict expected.
  • codeql-config.yml is fork-local; no upstream conflict expected.
  • compat/python-vmaf/core/feature_extractor.py: if Netflix upstream modifies python/vmaf/core/feature_extractor.py (old path), the rename-shim must preserve the removal of float_ansnr from VmafIntegerFeatureExtractor's features list. The legacy path (VmafFeatureExtractor, line 301) may still reference float_ansnr if upstream restores it; that's intentional pending the legacy-runner sunset decision.
  • core/test/test_hip_smoke.c: if upstream adds float_ansnr_hip back, the removed test function must be restored.

docs/research/0734-r610-driver-changelog-audit-2026-05-28.md — R610 driver audit

No rebase impact. This is a documentation-only research digest; it does not touch any C sources, build files, or API surfaces. No upstream sync conflict expected.


docs/research/0734-cudnn-version-audit-20260528.md — cuDNN/ORT audit (doc-only)

No rebase impact on upstream C/Python code: this PR adds only doc and changelog files. No C source, header, or Python source is modified.

If a future upstream sync adds cuDNN pinning or onnxruntime-gpu to any Python requirement, re-check dev/Containerfile lines 529–539 (ORT install) and ai/pyproject.toml for compatibility with the then-current cuDNN series.

Fork-local files added: docs/research/0734-cudnn-version-audit-20260528.md (new), changelog.d/changed/docs-cudnn-version-audit.md (new), docs/rebase-notes.md (this entry), docs/state.md (new deferred row).


Periodic drift sweep — upstream syncs may reintroduce libvmaf/ refs

After every Netflix/vmaf upstream sync, run the inventory grep from PR chore/post-rename-drift-sweep-20260528 to catch any new libvmaf/[a-z] or python/vmaf/ directory references outside ADR bodies and CHANGELOG.md. Files to recheck: Makefile, Dockerfile, .github/codeql-config.yml, IDE settings, skill scripts, and any newly-added utility under scripts/. See changelog fragment changelog.d/fixed/post-rename-drift-sweep.md for the full inventory commands.

port/upstream-batch-threading-picture-pool (2026-06-04)

Files touched: core/src/libvmaf.c, core/src/meson.build

Rebase impact: if a future upstream commit adds more #ifdef VMAF_BATCH_THREADING blocks, those blocks must be removed in the same port PR — the fork no longer uses the flag. The non-batch threaded_read_pictures path was removed; it is not recoverable from the fork without re-introducing the old per-extractor thread pool enqueue pattern.


.github/workflows/tests-and-quality-gates.yml — coverage job deselects slow vifks360 test

The coverage job's --deselect list includes python/test/quality_runner_test.py::QualityRunnerTest::test_run_vmaf_runner_float_vifks360o97 because the test exceeds the 60 s per-test limit on GitHub-hosted runners and truncates the suite. If upstream Netflix/vmaf adds a test with a similar name in a future sync, verify it does not also use a very large vif_kernelscale before removing the deselect. The deselect is CI-only; the test runs in the Netflix golden gate without a per-test timeout.


.github/workflows/ — post-ADR-0700 path rename (libvmaf/ → core/)

If an upstream Netflix/vmaf sync or cherry-pick brings new CI references to libvmaf/ (path filters, cd libvmaf, find libvmaf/src), they must be remapped to core/ in the same PR. The fork's source tree is rooted at core/ per ADR-0700; any upstream workflow or Makefile that still hardcodes libvmaf/ as a source directory will silently build from a non-existent path on this fork. Additionally, replace any gitleaks/gitleaks-action usage with the direct gitleaks CLI binary — the action requires a GITLEAKS_LICENSE for org repos even when public.
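Stale path references of this kind can be flagged mechanically. This is a minimal sketch, not the inventory grep from the drift-sweep PR: `find_stale_refs` is a hypothetical name, the pattern reuses the libvmaf/[a-z] and python/vmaf/ expressions quoted in these notes, and the two lookbehinds that skip `include/libvmaf/` and `<libvmaf/` are an assumption made here so that install-path headers are not reported.

```python
import re

# Matches source-directory style references; skips install-path headers
# such as core/include/libvmaf/... and #include <libvmaf/...> (assumption).
STALE_PATH = re.compile(r"(?<!include/)(?<!<)\b(?:libvmaf/[a-z]|python/vmaf/)")


def find_stale_refs(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line with a stale path."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if STALE_PATH.search(line):
            hits.append((lineno, line.strip()))
    return hits


sample = """\
      - run: cd core && meson setup build
      - run: find libvmaf/src -name '*.c'
      - run: pytest python/vmaf/
      - run: echo '#include <libvmaf/libvmaf.h>'
"""

for lineno, line in find_stale_refs(sample):
    print(f"line {lineno}: {line}")
```

The pattern requires a trailing slash, so a bare `cd libvmaf` is not caught; a real sweep would need an extra expression for that form.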


docker/Dockerfile.node — vmafx-node worker image + ffmpeg n8.2 (ADR-0717)

ffmpeg-patches now validated against both n8.1.1 and n8.2. The node Dockerfile pins FFMPEG_TAG=n8.2. When the next upstream sync lands, confirm:

  1. ffmpeg-patches/ still applies against the new tag. Run FFMPEG_SHA=<new-tag> bash ffmpeg-patches/test/build-and-run.sh.
  2. If patches fail, rebase the affected patches and update FFMPEG_TAG in both dev/Containerfile and docker/Dockerfile.node in the same PR (CLAUDE.md §12 r14).
  3. pkg/encoder/encoder.go shells out to ffmpeg. If a new FFmpeg version changes a codec's CLI flag, update the encoder package to match.

Touched files: docker/Dockerfile.node, cmd/vmafx-node/main.go, cmd/vmafx-node/probe/probe.go, cmd/vmafx-node/probe/probe_test.go, cmd/vmafx-node/server/server.go, cmd/vmafx-node/server/server_test.go, docs/adr/0717-vmafx-node-ffmpeg-latest.md, docs/development/vmafx-node.md, changelog.d/added/node-ffmpeg-latest.md, docs/state.md (this entry), docs/rebase-notes.md (this entry).


feat/speed-python-compat-extractors (Research-0732, item #2) — low-conflict upstream port

No structural rebase impact. This PR adds fork-local content to paths (compat/python-vmaf/core/feature_extractor.py, compat/python-vmaf/core/quality_runner.py, python/test/feature_extractor_test.py, docs/metrics/speed_qa.md) that are already diverged from upstream (python/vmaf/core/… in Netflix/vmaf). When syncing from upstream:

  • If Netflix/vmaf updates SpeedChromaFeatureExtractor or SpeedTemporalFeatureExtractor (e.g. bumps VERSION), apply the equivalent change to compat/python-vmaf/core/feature_extractor.py.
  • If Netflix/vmaf adds new SpEED QualityRunner subclasses, port them to compat/python-vmaf/core/quality_runner.py.
  • The compat harness mirrors Netflix's class hierarchy intentionally; keep the TYPE, VERSION, and ATOM_FEATURES_TO_VMAFEXEC_KEY_DICT in sync.

cmd/vmafx-server — Go gRPC + HTTP server (ADR-0703)

No rebase impact on upstream C/Python code: the Go server is entirely fork-local (cmd/, pkg/, gen/, proto/, go.mod, go.sum, Dockerfile.go-server, buf.gen.yaml). None of these paths overlap with Netflix/vmaf upstream.

If a future upstream sync touches model/ (model JSON schema changes) or core/include/libvmaf/libvmaf.h (public ABI), review:

  • pkg/libvmaf/libvmaf.go — the cgo #include and JSON parsing in parseOutput.
  • The ScoreResponse.features map keys (derived from pooled_metrics keys in the vmaf CLI JSON output; key names are stable but new keys may appear).
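To see which keys a given vmaf build emits, the pooled_metrics block of the CLI JSON output can be flattened. This is a sketch only: `pooled_means` is a hypothetical helper, and taking the `mean` statistic for each key is an assumption made for illustration, not a description of what parseOutput in pkg/libvmaf/libvmaf.go does.

```python
import json


def pooled_means(vmaf_json_text: str) -> dict[str, float]:
    """Map each pooled_metrics key to its 'mean' value (assumed statistic)."""
    report = json.loads(vmaf_json_text)
    pooled = report.get("pooled_metrics", {})
    return {
        name: float(stats["mean"])
        for name, stats in pooled.items()
        if "mean" in stats
    }


# Made-up sample values shaped like the vmaf CLI JSON output.
sample = json.dumps(
    {
        "pooled_metrics": {
            "vmaf": {"min": 90.0, "max": 99.0, "mean": 95.5, "harmonic_mean": 95.4},
            "psnr_y": {"min": 38.0, "max": 42.0, "mean": 40.25, "harmonic_mean": 40.2},
        }
    }
)

features = pooled_means(sample)
print(sorted(features))
```

Comparing the sorted key list before and after an upstream sync is a quick way to spot newly added keys.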

Touched files: cmd/vmafx-server/main.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/http_server.go, cmd/vmafx-server/main_test.go, pkg/libvmaf/libvmaf.go, pkg/libvmaf/libvmaf_test.go, pkg/observability/observability.go, proto/vmafx.proto, proto/buf.yaml, buf.gen.yaml, gen/go/vmafx.pb.go, gen/go/vmafx_grpc.pb.go, go.mod, go.sum, Dockerfile.go-server, docs/server/grpc.md, docs/adr/0703-vmafx-server-go-grpc.md, changelog.d/added/vmafx-server-go.md, docs/state.md, deploy/helm/vmafx/values.yaml (image repository update).


Every PR that touches upstream-shared paths or establishes a rebase-sensitive invariant adds an entry here. PRs with no rebase impact state "no rebase impact" in the PR description and skip the entry.

docs/hw-backend-audit-2026-05-28 — doc-only, no rebase impact

No upstream rebase impact: this PR adds a research digest (docs/research/0733-hardware-backend-audit-2026-05-28.md), a changelog fragment, and a docs/state.md update. No C source, build system, or upstream-shared path is touched. Netflix/vmaf upstream syncs are unaffected.

feat/vmafx-phase4-language-modernization-foundation (ADR-0702) — fork-only, no Netflix conflict

No upstream rebase impact. The files added in this PR (go.mod, Cargo.toml, pkg/, cmd/, bindings/, .github/workflows/go-ci.yml, .github/workflows/rust-ci.yml) are entirely fork-local. Netflix/vmaf upstream does not have a Go or Rust surface; cherry-picks from upstream are unaffected.

The docs/principles.md, docs/development/languages.md, .gitignore, and Makefile additions are additive; the Makefile targets are named distinctly (go-build, go-test, rust-build, rust-test) and do not conflict with any upstream Makefile target.

feat/vmafx-tune-go-stage1 (ADR-0705) — fork-only, no Netflix conflict

No upstream rebase impact: the Go port lives entirely under cmd/vmafx-tune/, pkg/encoder/, pkg/bisect/, and pkg/report/. These directories do not exist in upstream Netflix/vmaf. The Python tools/vmaf-tune/ is unchanged. go.mod and go.sum are fork-local additions that upstream does not carry. Cherry-picks from upstream that touch tools/vmaf-tune/ Python source files are unaffected by this PR.

feat/vmafx-mcp-go-port (ADR-0704) — fork-only, no Netflix conflict

No upstream rebase impact: this PR adds cmd/vmafx-mcp/, pkg/libvmaf/, go.mod, and go.sum — all entirely fork-local. The Python MCP server at mcp-server/vmaf-mcp/ is unchanged. Netflix/vmaf upstream does not contain any Go code or an MCP server. Cherry-picks from upstream are unaffected.

chore/post-cutover-url-sweep — fork-only URL change, no Netflix conflict

No upstream rebase impact: this change replaces all occurrences of the lusoris/vmaf GitHub repository slug with VMAFx/vmafx following the GitHub org cutover. All affected strings are fork-local (CI workflow URLs, GHCR image paths, ADR cross-references, doc URLs). Netflix/vmaf upstream does not contain any of these references. Cherry-picks from upstream are unaffected.

refactor/vmafx-repo-layout (ADR-0700) — IMPORTANT: breaks all in-flight PRs

Upstream sync strategy: upstream Netflix/vmaf patches arrive with libvmaf/ paths. When cherry-picking or porting upstream commits after ADR-0700 merged, rewrite paths in the patch stream:

# Single commit
git format-patch -1 <upstream-sha> --stdout \
  | sed 's|libvmaf/|core/|g' \
  | git am --3way

# Range of commits
git format-patch <base>..<tip> --stdout \
  | sed 's|libvmaf/|core/|g' \
  | git am --3way

In-flight PR rebase recipe: after git rebase origin/master, resolve each libvmaf/ path conflict by renaming to core/, and each python/vmaf/ conflict by renaming to compat/python-vmaf/.
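The sed pipeline substitutes every occurrence of libvmaf/ in the patch stream, including nested components such as the second one in libvmaf/include/libvmaf/libvmaf.h and the `<libvmaf/...>` include lines in hunk bodies. A narrower rewrite can be sketched as follows. This is an illustration under stated assumptions, not the fork's tooling: `rewrite_patch_paths` is a hypothetical helper, it only touches `diff --git`, `---` and `+++` header lines, and it assumes paths contain no spaces.

```python
import re

HEADER_PREFIXES = ("diff --git ", "--- ", "+++ ")

# Rewrite only the leading directory after the a/ or b/ diff prefix.
RENAMES = (
    (re.compile(r"(?<= )([ab])/libvmaf/"), r"\1/core/"),
    (re.compile(r"(?<= )([ab])/python/vmaf/"), r"\1/compat/python-vmaf/"),
)


def rewrite_patch_paths(patch_text: str) -> str:
    """Remap upstream path roots in diff header lines; leave hunk bodies alone."""
    out = []
    for line in patch_text.splitlines(keepends=True):
        if line.startswith(HEADER_PREFIXES):
            for pattern, replacement in RENAMES:
                line = pattern.sub(replacement, line)
        out.append(line)
    return "".join(out)


sample = (
    "diff --git a/libvmaf/include/libvmaf/libvmaf.h b/libvmaf/include/libvmaf/libvmaf.h\n"
    "--- a/libvmaf/include/libvmaf/libvmaf.h\n"
    "+++ b/libvmaf/include/libvmaf/libvmaf.h\n"
    "@@ -1,1 +1,2 @@\n"
    " #include <libvmaf/picture.h>\n"
    "+#include <libvmaf/model.h>\n"
)

print(rewrite_patch_paths(sample), end="")
```

The header lines end up under core/include/libvmaf/, matching the public header path cited elsewhere in these notes, while the include lines in the hunk body are left untouched. A removed content line that itself begins with `-- ` would be misread as a header; that edge case is not handled here.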

Python import compatibility: import vmaf continues to work via the compat/vmaf symlink (→ python-vmaf/) when compat/ is on sys.path, and via the python/vmaf/__init__.py shim when python/ is on sys.path. No from vmaf. import lines need changing.

What stays the same: libvmaf.so, libvmaf.pc, <libvmaf/...> C install-path headers, all public C symbols (VmafContext, vmaf_init, etc.), ffmpeg filter names.

Touched files: all source-tree path references across CI workflows, Makefile, scripts, docs, agent configs, and the libvmaf/ and python/vmaf/ directories themselves.

feat/ai-run-manifest-helper (ADR-0678)

No upstream rebase impact: this touches fork-local AI helper code, AI scripts, tests, Claude skills, docs, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship these AI provenance helpers or local training utilities.

Invariant: new standalone AI artifact sidecars use aiutils.run_manifest.write_run_manifest() so the shared envelope and run_provenance block stay deduplicated. Existing stable report schemas may continue embedding build_run_provenance() directly.

Smoke: .venv/bin/python -m pytest ai/tests/test_run_manifest.py ai/tests/test_build_bisect_cache.py ai/tests/test_legacy_extractor_manifests.py ai/tests/test_ptq_scripts.py ai/tests/test_qat_smoke.py -q

Touched files: ai/src/aiutils/run_manifest.py, ai/scripts/ptq_dynamic.py, ai/scripts/ptq_static.py, ai/scripts/qat_train.py, ai/scripts/build_bisect_cache.py, ai/scripts/collect_gpu_calibration_data.py, ai/scripts/extract_ugc_features.py, ai/scripts/extract_konvid_frames.py, AI tests, .claude/skills/ai-run-manifest/SKILL.md, AI package/Claude guidance, docs/ai/*.md, docs/adr/0678-*.md, docs/research/0699-*.md, changelog.d/added/0678-*.md, and this file.

feat/ai-dataset-fetch-manifests (ADR-0677)

No upstream rebase impact: this touches fork-local AI dataset fetch helpers, tests, docs, package AGENTS notes, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship these local downloader scripts.

Invariant: dataset fetch helpers that seed later AI JSONL/parquet builders write deterministic ADR-0661 run-manifest sidecars before conversion. fetch_konvid_1k.py defaults to <root>/fetch_manifest.json; fetch_youtube_ugc_subset.py keeps --manifest as the content manifest and defaults the run sidecar to <manifest>.run-manifest.json.

Smoke: .venv/bin/python -m pytest ai/tests/test_dataset_fetch_manifests.py -q

Touched files: ai/scripts/fetch_konvid_1k.py, ai/scripts/fetch_youtube_ugc_subset.py, ai/tests/test_dataset_fetch_manifests.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/training-data.md, docs/ai/konvid-1k-ingestion.md, docs/ai/youtube-ugc-ingestion.md, docs/ai/mos-corpora.md, docs/adr/0677-*.md, docs/research/0698-*.md, changelog.d/added/0677-*.md, and this file.

feat/mos-corpus-adapter-manifests (ADR-0676)

No upstream rebase impact: this touches fork-local AI MOS corpus adapters, tests, docs, package AGENTS notes, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship these local MOS-corpus ingestion scripts.

Invariant: CHUG, KoNViD-1k, KoNViD-150k, YouTube-UGC, LSVQ, LIVE-VQC, and Waterloo-IVC source adapters write <output>.manifest.json by default using corpus.base.write_ingest_manifest() and ADR-0661 run_provenance. Keep new MOS adapter CLIs on this sidecar contract before their JSONL rows feed aggregation, model-card refreshes, or signal-mix audits.

Smoke: .venv/bin/python -m pytest ai/tests/test_corpus_base.py ai/tests/test_chug.py ai/tests/test_konvid_1k.py ai/tests/test_konvid_150k.py ai/tests/test_lsvq.py ai/tests/test_live_vqc.py ai/tests/test_waterloo_ivc.py ai/tests/test_youtube_ugc.py -q

Touched files: ai/src/corpus/base.py, ai/scripts/chug_to_corpus_jsonl.py, ai/scripts/konvid_1k_to_corpus_jsonl.py, ai/scripts/konvid_150k_to_corpus_jsonl.py, ai/scripts/youtube_ugc_to_corpus_jsonl.py, ai/scripts/lsvq_to_corpus_jsonl.py, ai/scripts/live_vqc_to_corpus_jsonl.py, ai/scripts/waterloo_ivc_to_corpus_jsonl.py, ai/tests/test_corpus_base.py, ai/tests/test_chug.py, ai/AGENTS.md, docs/ai/*.md ingestion docs, docs/adr/0676-*.md, docs/research/0697-*.md, changelog.d/added/0676-*.md, and this file.

feat/full-feature-exporter-manifests (ADR-0668 follow-up)

No upstream rebase impact: this touches fork-local AI corpus exporters, tests, docs, package AGENTS notes, a research digest, and a changelog fragment. Upstream Netflix/vmaf does not ship these KoNViD or BVI-DVC training-table builders.

Invariant: ai/scripts/konvid_to_full_features.py and ai/scripts/bvi_dvc_to_full_features.py write <out>.manifest.json by default using aiutils.run_manifest. Keep the manifest beside refreshed local parquets so later model cards can prove source roots, cache/model inputs, feature order, and row/clip counts.

Smoke: .venv/bin/python -m pytest ai/tests/test_konvid_full_features.py ai/tests/test_bvi_dvc_dir_mode.py -q

Touched files: ai/scripts/konvid_to_full_features.py, ai/scripts/bvi_dvc_to_full_features.py, ai/tests/test_konvid_full_features.py, ai/tests/test_bvi_dvc_dir_mode.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/bvi-dvc-corpus-ingestion.md, docs/research/0696-full-feature-exporter-manifests.md, changelog.d/added/0696-full-feature-exporter-manifests.md, and this file.

feat/u2netp-mirror-exporter (ADR-0671)

No upstream rebase impact: this touches fork-local tiny-AI exporter tooling, tests, docs, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship the U2NetP mirror workflow.

Invariant: ai/scripts/export_u2netp_mirror.py imports an audited local xuebinqin/U-2-Net checkout and writes a gitignored ONNX plus manifest. Do not vendor upstream U-2-Net source here, do not accept non-Apache license text, and do not commit model/u2netp_mirror.onnx.

Smoke: .venv/bin/python -m pytest ai/tests/test_export_u2netp_mirror.py -q

Touched files: ai/scripts/export_u2netp_mirror.py, ai/tests/test_export_u2netp_mirror.py, ai/AGENTS.md, docs/ai/u2netp-mirror.md, docs/ai/models/u2netp_mirror_card.md, docs/ai/training.md, docs/adr/0671-*.md, docs/adr/_index_fragments/0671-*.md, docs/research/0691-*.md, changelog.d/added/0671-*.md, and this file.

feat/tune-score-backend-native-priority (ADR-0667)

No upstream rebase impact: this touches fork-local vmaf-tune backend-selection code, docs, tests, AGENTS notes, and ADR/research notes. Upstream Netflix/vmaf does not ship the fork vmaf-tune automation harness.

Invariant: tools/vmaf-tune/src/vmaftune/score_backend.py keeps DEFAULT_FALLBACKS = ("cuda", "sycl", "hip", "cpu"). The Vulkan entry was removed when ADR-0726 dropped the Vulkan backend; do not re-add it during backend-selector rebases. CPU remains the final fallback.
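The ordering invariant can be illustrated with a first-match selection. This is a minimal sketch, not the code in score_backend.py: only the DEFAULT_FALLBACKS tuple is taken from the invariant above, and `pick_backend` is a hypothetical function written for this example.

```python
from collections.abc import Iterable

# Order quoted in the invariant above; CPU is the final fallback.
DEFAULT_FALLBACKS = ("cuda", "sycl", "hip", "cpu")


def pick_backend(
    available: Iterable[str],
    order: tuple[str, ...] = DEFAULT_FALLBACKS,
) -> str:
    """Hypothetical helper: return the first backend in `order` that is available."""
    available_set = set(available)
    for name in order:
        if name in available_set:
            return name
    raise RuntimeError(f"no usable backend among {sorted(available_set)}")


print(pick_backend({"hip", "cpu"}))
print(pick_backend(["cpu"]))
```

Because selection walks the tuple in order, re-adding an entry during a rebase changes which backend wins on hosts that expose it, which is the reason to keep the tuple exactly as stated.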

Smoke: .venv/bin/python -m pytest tools/vmaf-tune/tests/test_score_backend.py -q

Touched files: tools/vmaf-tune/src/vmaftune/score_backend.py, tools/vmaf-tune/tests/test_score_backend.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-score-backend.md, docs/adr/0667-*.md, docs/adr/_index_fragments/0667-*.md, docs/research/0687-*.md, changelog.d/changed/0667-*.md, and this file.

feat/tune-report-quick-takeaways (ADR-0666)

No upstream rebase impact: this touches fork-local vmaf-tune report rendering, tests, user docs, ADR/research notes, AGENTS notes, and a changelog fragment. Upstream Netflix/vmaf does not ship the fork vmaf-tune profile-card renderer.

Smoke: .venv/bin/python -m pytest tools/vmaf-tune/tests/test_report.py -q

Touched files: tools/vmaf-tune/src/vmaftune/report.py, tools/vmaf-tune/tests/test_report.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0666-*.md, docs/adr/_index_fragments/0666-*.md, docs/research/0686-*.md, changelog.d/added/0666-*.md, and this file.

fix/fast-nr-calibration-quality-guard (ADR-0665)

No upstream rebase impact: this touches fork-local tiny-AI calibration tooling, vmaf-tune docs, package AGENTS notes, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship the fork nr_metric_v1 fast-NR sidecar calibration workflow.

Smoke: .venv/bin/python -m pytest ai/tests/test_calibrate_nr_threshold.py -q

Touched files: ai/scripts/calibrate_nr_threshold.py, ai/tests/test_calibrate_nr_threshold.py, ai/AGENTS.md, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune-fast-nr.md, docs/ai/training.md, docs/adr/0665-*.md, docs/adr/_index_fragments/0665-*.md, docs/research/0685-*.md, changelog.d/fixed/0665-*.md, and this file.

feat/ai-validation-report-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local tiny-AI validation tooling, model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the fork tiny-model registry or saliency-student validation surfaces.

Smoke: .venv/bin/python -m pytest ai/tests/test_validation_report_provenance.py -q

Touched files: ai/scripts/validate_model_registry.py, ai/scripts/validate_saliency_student.py, ai/tests/test_validation_report_provenance.py, docs/ai/model-registry.md, docs/ai/models/saliency_student_*.md, docs/ai/training.md, docs/research/0683-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0683-*.md, and this file.

feat/vmaf-tiny-validator-report-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local tiny-AI validator tooling, model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the v2/v3/v4 tiny-VMAF validator CLI family.

Smoke: .venv/bin/python -m pytest ai/tests/test_vmaf_tiny_validator_reports.py -q

Touched files: ai/scripts/validate_vmaf_tiny_v*.py, ai/tests/test_vmaf_tiny_validator_reports.py, docs/ai/models/vmaf_tiny_v*.md, docs/ai/training.md, docs/research/0681-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0681-*.md, and this file.

feat/saliency-student-metrics-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local AI saliency training tooling, model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the DUTS-trained saliency student metrics surface.

Smoke: .venv/bin/python -m pytest ai/tests/test_saliency_student_metrics_provenance.py -q

Touched files: ai/scripts/train_saliency_student.py, ai/scripts/train_saliency_student_v2.py, ai/tests/test_saliency_student_metrics_provenance.py, docs/ai/models/saliency_student_*.md, docs/ai/training.md, docs/research/0680-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0680-*.md, and this file.

feat/dnn-exporter-manifest-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local AI exporter tooling, tiny-model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship these DNN feature-model exporter sidecars.

Smoke: .venv/bin/python -m pytest ai/tests/test_dnn_exporter_run_provenance.py -q

Touched files: ai/scripts/export_tiny_models.py, ai/scripts/export_fastdvdnet_pre.py, ai/scripts/export_fastdvdnet_pre_placeholder.py, ai/scripts/export_transnet_v2.py, ai/scripts/export_transnet_v2_placeholder.py, ai/tests/test_dnn_exporter_run_provenance.py, docs/ai/models/*.md, docs/ai/training.md, docs/research/0679-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0679-*.md, and this file.

feat/ensemble-manifest-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local AI ensemble training tooling, docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the fr_regressor_v2_ensemble_v1 trainer/manifest surface.

Smoke: .venv/bin/python -m pytest ai/tests/test_train_fr_regressor_v2_ensemble.py -q

Touched files: ai/scripts/train_fr_regressor_v2_ensemble.py, ai/tests/test_train_fr_regressor_v2_ensemble.py, docs/ai/models/fr_regressor_v2_probabilistic.md, docs/ai/training.md, docs/research/0678-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0678-*.md, and this file.

feat/nr-threshold-calibration-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local AI/vmaf-tune calibration tooling, docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the vmaf-tune --fast-nr NR threshold calibration path.

Smoke: .venv/bin/python -m pytest ai/tests/test_calibrate_nr_threshold.py -q

Touched files: ai/scripts/calibrate_nr_threshold.py, ai/tests/test_calibrate_nr_threshold.py, docs/usage/vmaf-tune-fast-nr.md, docs/ai/training.md, docs/research/0677-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0677-*.md, and this file.

feat/phase-f-calibration-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local AI/vmaf-tune calibration tooling, docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the vmaf-tune auto Phase F recipe calibration path.

Smoke: .venv/bin/python -m pytest ai/tests/test_calibrate_phase_f_recipes.py -q

Touched files: ai/scripts/calibrate_phase_f_recipes.py, ai/tests/test_calibrate_phase_f_recipes.py, docs/usage/vmaf-tune.md, docs/ai/training.md, docs/research/0676-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0676-*.md, and this file.

feat/quant-ep-report-provenance (ADR-0661)

No upstream rebase impact: this touches fork-local AI investigation tooling, AI docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship this per-EP quantisation harness.

Smoke: .venv/bin/python -m pytest ai/tests/test_measure_quant_drop_per_ep.py -q

Touched files: ai/scripts/measure_quant_drop_per_ep.py, ai/tests/test_measure_quant_drop_per_ep.py, docs/ai/quant-eps.md, docs/research/0006-tinyai-ptq-accuracy-targets.md, docs/research/0675-quant-ep-report-provenance.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0675-*.md, and this file.

fix/windows-cuda-toolkit-installer (ADR-0664)

High CI rebase impact: this touches the fork-local build matrix workflow. Upstream Netflix/vmaf does not ship these Windows GPU build-only legs, but workflow syncs can silently restore older action-based setup patterns.

Rebase-sensitive fork invariant:

  • Build — Windows MSVC + CUDA (build only) installs CUDA 13.2.0 directly from NVIDIA's Windows network installer and verifies nvcc.exe --version. Do not restore Jimver/cuda-toolkit on this Windows leg without a superseding ADR and a green required Windows CUDA run.
  • Linux CUDA legs remain unchanged and still use Jimver/cuda-toolkit.

Smoke: gh pr checks <pr> --watch --required

Touched files: .github/workflows/libvmaf-build-matrix.yml, .github/AGENTS.md, docs/development/ci-runners.md, docs/adr/0664-*.md, docs/research/0664-*.md, changelog.d/fixed/0664-*.md, and this file.

fix/external-bench-wrapper-schema (ADR-0656)

No upstream rebase impact: all touched implementation paths are fork-local external-bench tooling, docs, tests, ADR/research, and changelog fragments. Upstream Netflix/vmaf does not ship this benchmark harness.

Rebase-sensitive fork invariant:

  • summary.competitor emitted by every tools/external-bench/*/run.sh wrapper must exactly match the registry key in compare.WRAPPERS. Model/version labels belong in optional metadata, not this identity field, or validate_wrapper_output() will reject the result before aggregation.
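
The identity rule above can be sketched as follows. This is a hedged illustration, not the fork's code: the registry contents, the "competitor-a" key, and the function body are assumptions; only the names WRAPPERS, validate_wrapper_output, and summary.competitor come from the note.

```python
# Illustrative sketch only; the real compare.WRAPPERS and
# validate_wrapper_output() may differ. The registry key is a placeholder.
WRAPPERS = {"competitor-a": "tools/external-bench/competitor-a/run.sh"}


def validate_wrapper_output(registry_key, payload):
    """Reject a result whose summary.competitor is not the registry key."""
    if registry_key not in WRAPPERS:
        raise KeyError(f"unknown wrapper: {registry_key}")
    competitor = payload.get("summary", {}).get("competitor")
    if competitor != registry_key:
        raise ValueError(
            f"summary.competitor={competitor!r} does not match {registry_key!r}"
        )
    return payload


# Model/version labels live in optional metadata, not the identity field.
ok = {"summary": {"competitor": "competitor-a"}, "metadata": {"model": "v2"}}
validate_wrapper_output("competitor-a", ok)
```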

Smoke: .venv/bin/python -m pytest tools/external-bench/tests/ -q

Touched files: tools/external-bench/, docs/ai/external-bench.md, docs/adr/0656-*.md, docs/research/0656-*.md, changelog.d/fixed/0656-*.md, mkdocs.yml, and this file.

fix/tiny-ai-disabled-runtime-gate (ADR-0660)

Low upstream rebase impact: the touched C files are fork-local tiny-AI extractors and helper tests. Upstream Netflix/vmaf does not ship these DNN feature extractors, but conflicts are possible if upstream changes the feature registry or libvmaf's optional-DNN surface.

Rebase-sensitive fork invariant:

  • Every tiny-AI feature extractor calls vmaf_tiny_ai_require_runtime(<feature>) after pixel-format / bit-depth validation and before vmaf_tiny_ai_resolve_model_path(). Disabled-DNN builds must return -ENOSYS before path probing; DNN-enabled builds keep missing model paths as -EINVAL.

Smoke: meson test -C build --suite=fast --print-errorlogs test_lpips test_dists test_fastdvdnet_pre test_mobilesal test_transnet_v2

Touched files: core/src/dnn/tiny_extractor_template.h, core/src/feature/feature_{lpips,dists,mobilesal}.c, core/src/feature/{fastdvdnet_pre,transnet_v2}.c, core/test/tiny_ai_test_template.h, core/src/dnn/AGENTS.md, docs/ai/, docs/metrics/features.md, docs/adr/0660-*.md, docs/research/0660-*.md, changelog.d/fixed/0660-*.md, and this file.

feat/saliency-feature-materializer (ADR-0655)

No upstream rebase impact: the implementation is fork-local AI tooling (ai/scripts/, ai/tests/) plus fork-local documentation and changelog files. Upstream Netflix/vmaf does not ship the fork's saliency training-table materializer.

Rebase-sensitive fork invariants:

  • ai/scripts/materialize_saliency_features.py owns bulk saliency enrichment for existing JSONL/parquet feature tables; trainers consume the resulting saliency_mean / saliency_var columns instead of silently running saliency inference inside training loops.
  • The status column remains row-local and human-readable (ok, skipped-existing, missing-source, missing-geometry, decode-failed, model-failed) so large local sweeps can be audited without scraping stderr.
  • SaliencyMaterializeConfig.default_width / default_height (added in the PR fixing the Netflix refresh materializer) supply fallback geometry for raw YUV corpora without container headers. Netflix corpus YUVs are always 1920×1080 at rest.
  • For .yuv sources, the ffmpeg decode prepends -f rawvideo -video_size WxH -pix_fmt yuv420p before -i; do not remove this for raw-YUV support.
  • In-process per-file saliency cache in materialize_rows() avoids redundant decodes for per-frame tables; the cache is scoped to one materialize_rows() call and does not persist across batch table boundaries.
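
The raw-YUV rule in the list above can be sketched as an argument builder. The helper name build_decode_input_args() is hypothetical; the ffmpeg options themselves are the ones named in the bullet.

```python
# Illustrative sketch only; build_decode_input_args() is a hypothetical name.
from pathlib import Path


def build_decode_input_args(source, width, height):
    """Return the ffmpeg input arguments for one source file."""
    args = []
    if Path(source).suffix.lower() == ".yuv":
        # Raw YUV has no container header, so geometry and pixel format
        # must be declared before -i.
        args += ["-f", "rawvideo", "-video_size", f"{width}x{height}",
                 "-pix_fmt", "yuv420p"]
    args += ["-i", str(source)]
    return args


print(build_decode_input_args("clip.yuv", 1920, 1080))
```

Container formats such as .mkv carry their own geometry, so in this sketch they receive only the -i argument.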

Smoke: PYTHONPATH=. .venv/bin/python -m pytest ai/tests/test_materialize_saliency_features.py -q

feat/signal-mix-audit (ADR-0650)

No upstream rebase impact: all implementation paths are fork-local AI tooling, tests, and documentation. Upstream Netflix/vmaf does not ship this training/audit package or the associated docs.

Rebase-sensitive fork invariants:

  • ai/scripts/signal_mix_audit.py remains table-only and side-effect free: no feature extraction, checkpoint export, corpus mutation, or default CI gate.
  • Signal-family regexes and docs/ai/signal-mix-audit.md must be updated together when new metric families or table columns are introduced.
  • Missing candidate metrics in the Markdown report are advisory work selectors, not proof that a candidate should be promoted without a corpus run.

Smoke: .venv/bin/python -m pytest ai/tests/test_signal_mix_audit.py -q

Touched files: ai/scripts/signal_mix_audit.py, ai/tests/test_signal_mix_audit.py, ai/AGENTS.md, docs/ai/signal-mix-audit.md, docs/adr/0650-*.md, docs/research/0650-*.md, changelog.d/added/0650-*.md, and this file.

fix/dnn-attached-multi-output (ADR-0646)

Low upstream rebase impact: the implementation touches fork-local DNN runtime plumbing plus libvmaf's context bridge. Upstream Netflix/vmaf does not ship the fork's ONNX Runtime attached tiny-AI surface, but conflicts are possible if upstream changes core/src/libvmaf.c near the per-frame pipeline.

Rebase-sensitive fork invariants:

  • Single-output attached tiny models keep the historical collector key exactly. Do not append _score or an ONNX output suffix for one-output models.
  • Multi-output attached models route through vmaf_ort_run(), not vmaf_ort_infer(). The latter is intentionally a single-output helper.
  • Sidecar output_names[] wins only when its count matches the ONNX output count; otherwise ONNX output names are used and sanitized.
  • Attached mode remains scalar-only. Vector or image output tensors must still use vmaf_dnn_session_run() until a future ADR defines feature-name flattening.
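
The output-naming precedence in the list above can be modelled in a few lines. This is a language-neutral illustration written in Python, not the C implementation: resolve_output_names() is a hypothetical name, and the sanitizing rule shown (replace anything outside [0-9A-Za-z_] with an underscore) is an assumption.

```python
# Illustrative sketch only; the function name and the sanitizing rule are
# assumptions, not the fork's C implementation.
import re


def resolve_output_names(sidecar_names, onnx_names):
    """Pick collector names for a multi-output attached model."""
    if sidecar_names and len(sidecar_names) == len(onnx_names):
        return list(sidecar_names)  # sidecar wins only on an exact count match
    return [re.sub(r"[^0-9A-Za-z_]", "_", name) for name in onnx_names]
```

A sidecar that lists too few or too many names is ignored as a whole in this sketch, rather than being partially applied.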

Smoke: docker exec vmaf-dev-mcp bash -lc 'cd /workspace && rm -rf /tmp/vmaf-dnn-multi-output-build && meson setup /tmp/vmaf-dnn-multi-output-build core -Denable_dnn=enabled -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled -Denable_hip=false -Denable_metal=disabled && meson test -C /tmp/vmaf-dnn-multi-output-build --suite=dnn --print-errorlogs'

Touched files: core/src/libvmaf.c, core/src/dnn/model_loader.*, core/src/dnn/ort_backend.*, core/test/dnn/*, model/tiny/smoke_multi_output_v0.*, scripts/gen_multi_output_smoke_onnx.py, docs/api/dnn.md, docs/ai/, docs/adr/0646-*.md, docs/research/0646-*.md, changelog.d/fixed/0646-*.md, and this file.

fix/ai-refresh-defaults-and-konvid-full-features (ADR-0642)

No upstream rebase impact: all touched implementation files live under fork-local ai/ tooling. Upstream Netflix/vmaf does not ship these training scripts, model-refresh docs, or local corpus ledgers.

Rebase-sensitive fork invariants:

  • AI feature extraction defaults point at core/build-cpu/tools/vmaf. Do not regress to /usr/local/bin/vmaf or ambiguous build/tools/vmaf; stale binaries have previously lacked fork-only extractors.
  • ai/scripts/konvid_to_full_features.py owns regeneration of both runs/full_features_konvid.parquet and runs/full_features_konvid_with_folds.parquet. The folded output's source=fold0..fold4 assignment is a deterministic balanced hash over clip keys and feeds eval_multiseed_v3_v4.py.
  • BVI-DVC full-feature dir mode accepts .mkv, .mp4, and .yuv. The local .mkv lossless bundle is the known-good refresh input after the raw-YUV copy produced all-zero VMAF in a one-clip smoke.
  • ai/scripts/extract_ugc_features.py emits the current FULL_FEATURES schema with an explicit vmaf_v0.6.1 model path. Do not restore the historical canonical-6-only UGC table when refreshing full_features_5corpus.
  • Aggregate full-feature training tables are rebuilt with ai/scripts/combine_full_feature_parquets.py; the normalized schema is corpus, source, frame_index, codec, <FULL_FEATURES>, vmaf.
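
One way to obtain a deterministic, balanced fold0..fold4 assignment from clip keys is sketched below. This is an assumption about the general technique, not the algorithm in konvid_to_full_features.py; assign_folds() and the choice of SHA-256 are illustrative.

```python
# Illustrative sketch only; assign_folds() and the SHA-256 ordering are
# assumptions and may not match konvid_to_full_features.py.
import hashlib


def assign_folds(clip_keys, n_folds=5):
    """Map each clip key to fold0..fold{n-1}, independent of input order."""
    ordered = sorted(
        set(clip_keys),
        key=lambda k: hashlib.sha256(k.encode("utf-8")).hexdigest(),
    )
    # Round-robin over the hash-sorted keys keeps fold sizes within one clip.
    return {key: f"fold{i % n_folds}" for i, key in enumerate(ordered)}
```

Sorting by digest removes any dependence on row order in the input table, which is what makes a regenerated table reproduce the same folds.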

Smoke: .venv/bin/python -m pytest ai/tests/test_konvid_full_features.py ai/tests/test_extract_ugc_features.py ai/tests/test_combine_full_feature_parquets.py ai/tests/test_feature_extractor_defaults.py ai/tests/test_bvi_dvc_dir_mode.py -q

Touched files: ai/data/feature_extractor.py, ai/scripts/*full_features*.py, ai/scripts/konvid_to_full_features.py, ai/src/vmaf_train/, ai/tests/test_*, ai/AGENTS.md, docs/ai/, docs/adr/0642-*.md, docs/research/0642-*.md, changelog.d/added/, and .workingdir2/AI_REFRESH_2026-05-20.md (ignored local ledger).

fix/dev-container-encoder-probes (ADR-0641)

Low upstream rebase impact: implementation changes are fork-local dev-container / vmaf-tune files (dev/, tools/vmaf-tune/, docs, ADR, research, changelog) plus one fork-local FFmpeg integration patch. Upstream Netflix/vmaf does not ship vmaf-tune or this dev-MCP compose stack. The only upstream-adjacent file is ffmpeg-patches/0003-*, which targets FFmpeg n8.1.1 rather than Netflix/vmaf.

Rebase-sensitive fork invariants:

  • dev/Containerfile must keep the pinned intel/vpl-gpu-rt source build and post-install /usr/lib/x86_64-linux-gnu/libmfx-gen.so check whenever FFmpeg keeps --enable-libvpl. libvpl-dev alone exposes QSV encoders but cannot create an Arc/iGPU session, and installing the runtime outside the dispatcher search path revives the same MFX_ERR_NOT_FOUND failure.
  • dev/docker-compose.yml must keep the dev-mcp healthcheck aligned with the stdio entrypoint (vmaf --version), not /sockets/vmaf-mcp.sock.
  • vmaf-tune compare defaults to the production CPU set libx265,libsvtav1; archival software codecs remain explicit via --encoders.
  • QSV VA-API device selection defaults to auto and uses Intel sysfs vendor-ID discovery; explicit --vaapi-device paths still override.
  • ffmpeg-patches/0003-* must call vmaf_sycl_state_free(&s->sycl_state). The public SYCL API frees and nulls a VmafSyclState **; using the older single-pointer call breaks the in-container FFmpeg build with -Wincompatible-pointer-types.
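
The vendor-ID discovery mentioned for QSV VA-API device selection can be sketched against the standard Linux sysfs layout. The function name and parameters are hypothetical, and the real hw_devices.py logic may differ; 0x8086 is Intel's PCI vendor ID.

```python
# Illustrative sketch only; find_intel_render_node() is a hypothetical name.
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"  # PCI vendor ID assigned to Intel


def find_intel_render_node(explicit=None, sysfs_root="/sys/class/drm",
                           dev_root="/dev/dri"):
    """Return an explicit device path if given, else the first Intel node."""
    if explicit:
        return explicit  # an explicit --vaapi-device path always wins
    for node in sorted(Path(sysfs_root).glob("renderD*")):
        try:
            vendor = (node / "device" / "vendor").read_text().strip().lower()
        except OSError:
            continue  # node without a readable vendor file
        if vendor == INTEL_VENDOR_ID:
            return f"{dev_root}/{node.name}"
    return None
```

Taking the roots as parameters keeps the sketch testable against a temporary directory instead of the live /sys tree.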

Touched files: dev/Containerfile, dev/docker-compose.yml, dev/AGENTS.md, tools/vmaf-tune/src/vmaftune/{bisect.py,cli.py,compare.py,hw_devices.py}, tools/vmaf-tune/src/vmaftune/codec_adapters/_qsv_common.py, tools/vmaf-tune/tests/, ffmpeg-patches/0003-*, docs/usage/vmaf-tune.md, docs/development/dev-mcp.md, docs/state.md, docs/adr/0641-*.md, docs/research/0641-*.md, changelog.d/fixed/0641-*.md, and this file.

chore/ci-warning-omnibus (ADR-0635)

No rebase impact: all touched files are fork-local CI workflow YAML (.github/workflows/libvmaf-build-matrix.yml), fork-added docs (docs/mcp/tools.md, docs/adr/, docs/research/, changelog.d/), and this file. Upstream Netflix/vmaf does not use GitHub Actions workflows that overlap with these changes. No C sources, no public headers, and no FFmpeg patch series are involved.

Touched files: .github/workflows/libvmaf-build-matrix.yml (ilammy→TheMrMilchmann action swap; windows-latest→windows-2025; vulkaninfo stderr redirect + debug demotion; ccache-v2 key prefix), docs/mcp/tools.md (run_benchmark heading backtick removal + a-id drop), docs/adr/0635-ci-warning-omnibus-2026-05-19.md, docs/adr/README.md (one index row), docs/research/ci-warning-omnibus-2026-05-19.md, changelog.d/fixed/0635-ci-warning-omnibus.md, docs/rebase-notes.md (this entry).

ADR-0672 — Saliency materializer temporal controls

Saliency-table provenance impact. This widens the ADR-0655 materializer from historical mean-only saliency to the same temporal reducer family exposed by vmaf-tune.

Key invariants:

  • ai/scripts/materialize_saliency_features.py forwards --temporal-aggregator and --ema-alpha into vmaftune.saliency.compute_saliency_map().
  • Newly computed rows record saliency_model_id, saliency_aggregator, and saliency_ema_alpha by default.
  • Rows skipped because they already contain finite saliency columns must not get invented model/reducer metadata; use --overwrite for intentional replacement.

Touched files: ai/scripts/materialize_saliency_features.py, ai/tests/test_materialize_saliency_features.py, ai/AGENTS.md, docs/ai/saliency-feature-materializer.md, docs/ai/u2netp-mirror.md, docs/adr/0672-saliency-materializer-temporal-controls.md, docs/research/0692-saliency-materializer-temporal-controls.md, changelog.d/added/0672-saliency-materializer-temporal-controls.md, docs/rebase-notes.md (this entry).

ADR-0654 — Predictor saliency signals

vmaf-tune predict --use-saliency is a predictor-feature switch, not the ROI/QP sidecar path. Preserve the temporary raw-yuv420p decode in predictor_features._compute_saliency() before calling saliency.compute_saliency_map(raw_path, width, height, ...); the saliency helper remains raw-YUV-only even though the public predict source can be any FFmpeg-readable container.

predictor_train.project_row() must keep the 14-column predictor input layout stable. When real corpora carry probe_*_avg_bytes, saliency_mean, saliency_var, frame_diff_mean, y_avg, or y_var, preserve those finite values. Only legacy rows should fall back to bitrate-derived probe bytes and zero saliency / signalstats values.
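
The "preserve finite values, fall back only for legacy rows" rule can be sketched with a small helper. finite_or() is a hypothetical name and is not taken from predictor_train.py; the column name comes from the paragraph above.

```python
# Illustrative sketch only; finite_or() is a hypothetical helper.
import math


def finite_or(row, column, fallback):
    """Keep a finite corpus value; otherwise use the legacy fallback."""
    value = row.get(column)
    if isinstance(value, (int, float)) and math.isfinite(value):
        return float(value)
    return fallback


real_row = {"saliency_mean": 0.25}
legacy_row = {"saliency_mean": float("nan")}
print(finite_or(real_row, "saliency_mean", 0.0))    # prints 0.25
print(finite_or(legacy_row, "saliency_mean", 0.0))  # prints 0.0
```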

fix/ci-test-failures-omnibus (ADR-0637)

No rebase impact: all touched files are fork-local CI configuration (.github/workflows/tests-and-quality-gates.yml), MCP server tests (mcp-server/vmaf-mcp/tests/test_smoke_e2e.py), ADR files, and changelog fragments. No upstream C sources, no public headers, no FFmpeg patch series involved. The timeout and coverage-floor edits are fork-CI-specific and have no upstream equivalent.

fix/scaffold-audit-p0-silent-correctness (ADR-0620)

No rebase impact: all touched files are fork-local Python harness files and docs. No upstream C sources, no public headers, no FFmpeg patch series involved. The three fixed Python files (routine.py, train_test_model.py, local_explainer.py) are also present upstream, but the specific exception-handling changes are in fork-added call paths (extended-stats bagging, plot_scatter visualisation, local-explainer model dispatch). If upstream lands a conflicting change to these exact lines, the merge resolution is straightforward: keep the raise paths and update context if the upstream change affects surrounding logic.

Touched files: python/vmaf/tools/exceptions.py (3 new exception classes), python/vmaf/routine.py (P0-1 fix + CalibrationError import), python/vmaf/core/train_test_model.py (P0-2 fix + MissingLabelStddevError import), python/vmaf/core/local_explainer.py (P0-3 fix + EnsembleNotSupportedError import), python/test/test_adr0620_scaffold_audit_p0.py (16 regression tests), docs/adr/0620-scaffold-audit-p0-silent-correctness-fixes.md, docs/adr/README.md (one index row), docs/state.md (3 rows moved from Open to Recently closed), changelog.d/fixed/adr0620-scaffold-audit-p0-silent-correctness.md.

fix/scaffold-audit-p1-feature-plumbing (ADR-0613)

Touches core/src/hip/picture_hip.c, core/src/feature/feature_mobilesal.c, and core/src/libvmaf.c. Upstream Netflix/vmaf does not have a HIP backend, the mobilesal extractor, or the DNN multi-output guard — so no rebase conflict is expected on any of the C-side changes.

Touches tools/vmaf-tune/src/vmaftune/cli.py — upstream does not have vmaf-tune. No rebase conflict expected.

Doc paths (docs/api/dnn.md, docs/ai/models/mobilesal.md, docs/state.md, docs/adr/README.md) are fork-local only.

Rebase-sensitive invariant (C): picture_hip.c now compiles in two branches: #ifdef HAVE_HIPCC (real hipMalloc) and #else (-ENOSYS). Any upstream change to picture_hip.h's function signatures must be reflected in both branches.

Touched files: core/src/hip/picture_hip.c, core/src/feature/feature_mobilesal.c, core/src/libvmaf.c (comment-only at lines 1115, 1214), tools/vmaf-tune/src/vmaftune/cli.py, docs/api/dnn.md, docs/ai/models/mobilesal.md, docs/state.md, docs/adr/0639-scaffold-audit-p1-feature-plumbing-fixes.md, docs/adr/README.md, changelog.d/fixed/adr-0613-scaffold-audit-p1.md, docs/rebase-notes.md (this entry).

feat/zed-editor-project-config (ADR-0608)

No rebase-sensitive invariants: only .zed/ (new directory), .gitignore (.zed/local/ exclusion), docs/development/ide-setup.md (Zed section), docs/adr/0608-zed-editor-project-config.md (ADR), and supporting fragment/changelog files are touched. None of these paths overlap with upstream Netflix/vmaf. .vscode/ is unchanged.

Touched files: .zed/settings.json, .zed/tasks.json, .zed/debug.json (new), .gitignore (.zed/local/ entry), docs/development/ide-setup.md (Zed section appended), docs/adr/0608-zed-editor-project-config.md, docs/adr/_index_fragments/0608-zed-editor-project-config.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md (regenerated), changelog.d/added/0608-zed-editor-project-config.md, docs/rebase-notes.md (this entry).

plan/netflix-grade-encoding-roadmap (ADR-0613 – ADR-0618)

No rebase-sensitive invariants — all changes are planning documents only: six ADRs, six research digests, one roadmap overview, one changelog fragment, and ADR index rows in docs/adr/README.md. No C sources, headers, build files, or Python implementation files are touched. No upstream-shared paths are modified.

Touched files: docs/adr/0613-dynamic-optimizer.md, docs/adr/0614-per-shot-abr-rendition.md, docs/adr/0615-fast-nr-prescoring.md, docs/adr/0616-vmaf-neg-integration.md, docs/adr/0617-cross-shot-complexity-weighting.md, docs/adr/0618-content-aware-classifier.md, docs/adr/README.md (index rows), docs/research/0609-dynamic-optimizer-research.md, docs/research/0610-per-shot-abr-rendition-research.md, docs/research/0611-fast-nr-prescoring-research.md, docs/research/0612-vmaf-neg-integration-research.md, docs/research/0613-cross-shot-complexity-weighting-research.md, docs/research/0614-content-aware-classifier-research.md, docs/development/netflix-grade-encoding-pipeline-roadmap-2026-05-19.md, changelog.d/added/netflix-grade-encoding-pipeline-roadmap.md.

chore/scaffold-audit-p3-cleanup (ADR-0621)

No rebase-sensitive invariants. All changes are in fork-local files (ai/scripts/, scripts/dev/, python/test/, .semgrepignore, docs/ai/model-registry.md, docs/adr/, docs/state.md, changelog.d/). None of the touched Python test files are shared with Netflix upstream (Netflix does not ship asset_test.py or quality_runner_test.py). The python/test/*.py files the PR touches carry fork-added tests or skip-decorator updates; no upstream test assertions are modified.

Touched files: scripts/dev/permutation_importance.py, ai/scripts/*.py (13 files), python/test/result_test.py, python/test/routine_test.py, python/test/asset_test.py, python/test/feature_extractor_test.py, python/test/quality_runner_test.py, .semgrepignore, docs/ai/model-registry.md, docs/adr/0621-scaffold-audit-p3-cleanup.md, docs/adr/README.md, docs/state.md, changelog.d/fixed/0621-scaffold-audit-p3-cleanup.md.

feat/mcp-p1-vmaftune-extractors-models-progress (ADR-0608)

No rebase-sensitive invariants. The only changed files are:

  • mcp-server/vmaf-mcp/src/vmaf_mcp/server.py — fork-local MCP server, never in Netflix upstream.
  • mcp-server/vmaf-mcp/tests/ — fork-local tests.
  • mcp-server/vmaf-mcp/tests/test_smoke_e2e.py — updated expected tool-name set.
  • docs/mcp/tools.md, docs/adr/0608-*.md, docs/adr/README.md, docs/rebase-notes.md — docs.
  • changelog.d/added/0608-*.md — changelog fragment.

No C sources, public headers, meson_options.txt, ffmpeg-patches/, or build files are touched.

chore/renovate-customManagers-dev-image (ADR-0605)

No rebase-sensitive invariants — the only change is to renovate.json (adding eight new customManagers entries for Containerfile ARG-pinned deps; extending the FFmpeg manager's managerFilePatterns to also scan dev/Containerfile). renovate.json is fork-local and never appears in upstream Netflix/vmaf. No C sources, headers, or build files are touched.

Touched files: renovate.json (customManagers + packageRules), docs/adr/0605-renovate-custommgr-dev-image.md, docs/adr/README.md (one index row), changelog.d/changed/0605-renovate-custommgr-dev-image.md, docs/rebase-notes.md (this entry).

chore/rocm-7-13-bump-and-renovate-manager (ADR-0604)

No rebase-sensitive invariants — the only change is to renovate.json (adding a customManagers entry and customDatasources block for ROCm). renovate.json is fork-local and never appears in upstream Netflix/vmaf. dev/Containerfile is unchanged (7.2.3 remains the correct pin).

Touched files: renovate.json (customManagers + customDatasources), docs/adr/0604-rocm-renovate-manager.md, docs/adr/README.md (one index row), docs/research/rocm-version-audit-2026-05-19.md, changelog.d/changed/0604-rocm-renovate-manager.md, docs/rebase-notes.md (this entry).

fix/ubuntu-26-04-fallout (ADR-0603)

No rebase-sensitive invariants — all changes are in the build/CI layer (dev/Containerfile, CI workflow YAML, meson.build nvcc flags, pyproject.toml ceiling bumps) and do not touch any upstream-shared C sources, public headers, or Python test assertions.

The one meson.build addition (-D__MATH_NO_INLINES in cuda_flags) is additive and harmless on any glibc version; if upstream Netflix touches the CUDA flags block in core/src/meson.build, preserve the -D__MATH_NO_INLINES entry alongside whatever upstream adds.

Touched files: dev/Containerfile, core/src/meson.build (cuda_flags), tools/vmaf-tune/pyproject.toml (requires-python ceiling), ai/pyproject.toml (requires-python ceiling), .github/workflows/libvmaf-build-matrix.yml (CUDA version pin), docs/adr/0603-ubuntu-26-04-fallout-fixes.md, docs/adr/README.md (index row), changelog.d/fixed/ubuntu-26-04-fallout.md, docs/rebase-notes.md (this entry).

fix/macos-vmaf-write-output-segv (ADR-0602)

No rebase-sensitive invariants — the changes are purely defensive guards (NULL checks, pic_cnt > 0 guards) added to existing functions in core/src/libvmaf.c and core/src/output.c, and a new test in core/test/test_output.c. If upstream Netflix merges any change to vmaf_write_output_with_format or vmaf_write_output_json, re-apply the three guards (vmaf-NULL, feature_collector-NULL, output_path-NULL) and the pic_cnt > 0 guards in json_write_pooled_entry / xml_write_one_metric_pools to the merged version.

Touched files: core/src/libvmaf.c (NULL guards at top of vmaf_write_output_with_format), core/src/output.c (pic_cnt > 0 guards, NULL guards in JSON writer, split xml_write_pooled_and_aggregate into three helpers, remove unused n_frames variables), core/test/test_output.c (test_write_output_pic_cnt_zero regression test), docs/adr/0602-macos-vmaf-write-output-segv.md, docs/adr/README.md (index row), docs/state.md (Recently-closed row), docs/rebase-notes.md (this entry), changelog.d/fixed/0602-macos-vmaf-write-output-segv.md.

fix/vmaftune-qsv-amf-hw-init-and-probe-size (ADR-0601)

Rebase impact: tools/vmaf-tune/ only — no libvmaf C sources, public headers, or meson_options.txt touched. Zero upstream conflict surface.

Rebase-sensitive invariants:

  • compare._QSV_ENCODERS must stay in sync with the set of QSV encoder names registered in codec_adapters/. If a new QSV adapter is added (e.g. vp9_qsv), add its encoder string to _QSV_ENCODERS in the same commit; omitting it silently skips the VA-API init chain for that encoder.
  • BaseQsvAdapter.qsv_hw_init_args() and compare._hw_init_args_for_encoder() must produce identical flag sequences. If one is updated, update the other. A test in test_bbb_e2e_v14_bug_cluster.py verifies this invariant.
  • The default _DEFAULT_VAAPI_DEVICE = "/dev/dri/renderD128" is also the default in BaseQsvAdapter.qsv_hw_init_args. Keep them in sync.

Touched files: tools/vmaf-tune/src/vmaftune/compare.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/codec_adapters/_qsv_common.py, tools/vmaf-tune/src/vmaftune/codec_adapters/_amf_common.py, tools/vmaf-tune/tests/test_bbb_e2e_v14_bug_cluster.py, docs/adr/0601-vmaftune-qsv-amf-hw-init-and-probe-fix.md, docs/adr/README.md (one index row), docs/usage/vmaf-tune.md (--vaapi-device flag + QSV init docs), docs/state.md (T-BBB-V14-HW-ENCODER-PROBE-QSV-INIT-2026-05-18 row), changelog.d/fixed/0601-vmaftune-qsv-amf-hw-init-and-probe-fix.md, docs/rebase-notes.md (this entry).

chore/ffmpeg-patches-n811-full-feature-exposure-sync (ADR-0576)

Rebase impact: ffmpeg-patches/ only — no libvmaf C sources, public headers, or meson_options.txt touched. Upstream Netflix/vmaf does not ship ffmpeg-patches/; no rebase conflict surface.

Rebase-sensitive invariants:

  • Patch 0014 targets the LIBVMAFContext struct and VmafConfiguration init blocks introduced cumulatively by patches 0003–0013. It must remain the final patch in the series (or be rebased against whichever patch last touches those init blocks if the series is reordered).
  • The cpumask / gpumask AVOption names must match the field names in VmafConfiguration from core/include/libvmaf/libvmaf.h. If a future libvmaf refactor renames those fields, patch 0014's struct designators (.cpumask =, .gpumask =) must be updated to match.
  • The feature= passthrough in the stock libvmaf filter continues to cover all extractors in feature_extractor_list[]; no patch is needed for new extractor additions unless they require a dedicated C-API init call (e.g., a new vmaf_<backend>_state_init() entry point).

fix/ffmpeg-patches-score-fmt-gap (ADR-1064)

Rebase impact: ffmpeg-patches/ only — adds patch 0016 and updates series.txt and README.md. No libvmaf C sources, public headers, or meson_options.txt touched.

Rebase-sensitive invariants:

  • Patch 0016 must come after patch 0014 (which adds cpumask/gpumask to LIBVMAFContext). Patch 0016 adds score_fmt immediately after the int64_t gpumask field; if 0014 is reordered or the struct layout changes, the context lines in 0016's struct hunk must be updated.
  • The vmaf_write_output_with_format symbol must be present in the libvmaf version checked by pkg-config. If a future refactor renames this entry point, all four uninit paths in patch 0016 must be updated.
  • Patch 0016 must be replayed with git am --3way on top of all 15 preceding patches before verifying that the series applies cleanly against n8.1.1 (the patch series is cumulative).

Re-test on rebase:

git clone --depth 1 --branch n8.1.1 https://git.ffmpeg.org/ffmpeg.git /tmp/ffmpeg-retest
git -C /tmp/ffmpeg-retest config user.email "lusoris@pm.me"
git -C /tmp/ffmpeg-retest config user.name "Lusoris"
for p in ffmpeg-patches/*.patch; do
    git -C /tmp/ffmpeg-retest am --3way "$p" || { echo "FAILED: $p"; break; }
done
# Expect 16 commits applied cleanly, no conflicts.

Upstream Netflix/vmaf has no ffmpeg-patches/; no rebase conflict surface against upstream/master. All 16 patches are fork-local.

feat/vmaftune-bisect-concurrency-cap (ADR-0577)

Rebase impact: pure Python — touches only tools/vmaf-tune/ and docs/. No C surface, no meson.build change, no public C-API change, no GPU path change.

Rebase-sensitive invariant: none. The _decode_semaphore singleton and set_decode_semaphore setter are new module-level additions in vmaftune/bisect.py; they do not conflict with any existing upstream pattern. The decode_semaphore keyword argument added to bisect_target_vmaf and make_bisect_predicate is backwards-compatible (defaults to None, falling back to the module-level semaphore).
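The fallback behaviour described above (an explicit keyword argument wins, otherwise the module-level semaphore is used) can be sketched as follows. This is an illustration only: the default slot count of 2, resolve_semaphore, and run_decode are assumptions introduced here, not the contents of vmaftune/bisect.py.

```python
import threading

# Assumed default cap; the real value in vmaftune/bisect.py may differ.
_decode_semaphore = threading.BoundedSemaphore(2)


def set_decode_semaphore(sem):
    """Replace the module-level semaphore (illustrative setter)."""
    global _decode_semaphore
    _decode_semaphore = sem


def resolve_semaphore(decode_semaphore=None):
    """Return the explicit semaphore if given, else the module-level one."""
    if decode_semaphore is not None:
        return decode_semaphore
    return _decode_semaphore


def run_decode(job, decode_semaphore=None):
    """Run a decode callable while holding one concurrency slot."""
    with resolve_semaphore(decode_semaphore):
        return job()
```

Because the keyword defaults to None, callers written before the change keep working and share the module-level cap without modification.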

Touched files: tools/vmaf-tune/src/vmaftune/bisect.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_bisect_concurrency_cap.py (new), tools/vmaf-tune/tests/test_bisect.py (exports check update), tools/vmaf-tune/tests/test_compare.py (semaphore kwarg assertion), docs/adr/0577-vmaftune-bisect-concurrency-cap-and-aggressive-cleanup.md (new), docs/adr/README.md (one index row), docs/usage/vmaf-tune.md (--max-concurrent-decodes docs + disk-mgmt section), changelog.d/fixed/vmaf-tune-bisect-concurrency-cap-enospc.md (new), docs/rebase-notes.md (this entry).

fix/windows-ci-sdk-pin-22621 (ADR-0575)

Rebase impact: tools only — touches core/tools/yuv_input.c. No meson.build change, no public C-API change, no GPU path change.

Rebase-sensitive invariant: #include <sys/stat.h> must remain before the #ifdef _MSC_VER macro block in yuv_input.c. If a rebase reorders these lines (e.g. by re-applying a prior ADR-0521 patch that placed the macros before the include), the MinGW64 and MSVC+SDK-26100 redefinition errors will recur.

Touched files: core/tools/yuv_input.c, docs/adr/0575-windows-msvc-stat-compat-include-order.md, docs/adr/README.md (one index row), docs/state.md (Updated note + T-WINDOWS-STAT-COMPAT row in Recently closed), changelog.d/fixed/0575-windows-stat-compat-include-order.md, docs/rebase-notes.md (this entry).

feat/integer-ssim-gpu-real-kernels (ADR-0564)

Rebase impact: low. The change touches four existing files, including two upstream-shared ones (feature_extractor.c and meson.build):

  • core/src/feature/feature_extractor.c: adds three extern declarations and three list entries (vmaf_fex_integer_ssim_cuda, vmaf_fex_integer_ssim_sycl, and a comment update). On rebase, apply after any upstream changes to this file.
  • core/src/meson.build: adds one entry to cuda_cu_sources dict and one entry to the C source list. The meson.build is append-only per fork coordination rules.
  • core/src/feature/hip/integer_ssim_hip.c: full rewrite of the host glue. The pre-existing upstream file used float intermediates; this branch rewrites it to int64. If upstream ever ships a real integer_ssim HIP extractor, it will conflict — prefer the upstream version and re-test.
  • core/src/feature/sycl/integer_ssim_sycl.cpp: appends a new extractor after the existing float_ssim_sycl code. On rebase, confirm the append point is still a clean } /* extern "C" */ boundary.

All new files (ssim_cuda.c, ssim_cuda.h, integer_ssim_score.cu) are fork-local with no upstream equivalent; no conflict expected.

Invariant: vmaf_fex_integer_ssim_cuda in ssim_cuda.c provides "ssim". The pre-existing vmaf_fex_integer_ssim_cuda in integer_ssim_cuda.c provides "float_ssim" — the naming is a historical misnomer kept for link-compat. Do not merge or rename without updating feature_extractor.c to match.

Touched files: core/src/feature/cuda/integer_ssim/integer_ssim_score.cu (new), core/src/feature/cuda/ssim_cuda.c (new), core/src/feature/cuda/ssim_cuda.h (new), core/src/feature/hip/integer_ssim_hip.c (rewritten), core/src/feature/sycl/integer_ssim_sycl.cpp (appended), core/src/feature/feature_extractor.c (extern + list entries), core/src/meson.build (PTX + C source entries), docs/adr/0564-integer-ssim-gpu-real-kernels.md, docs/adr/README.md (one index row), docs/research/0564-integer-ssim-gpu-real-kernels.md, docs/state.md (Recently-closed row), changelog.d/added/0564-integer-ssim-gpu-real-kernels.md, docs/rebase-notes.md (this entry).

fix/vmaftune-workdir-tmpfs-enospc (ADR-0549)

No rebase impact. All changes are confined to fork-local files:

  • tools/vmaf-tune/src/vmaftune/bisect.py (fork-added tool).
  • tools/vmaf-tune/src/vmaftune/cli.py (fork-added tool).
  • tools/vmaf-tune/tests/test_workdir_enospc.py (new test file).
  • tools/vmaf-tune/tests/test_compare.py (update expected kwargs).
  • dev/Containerfile (fork-local; whole file is fork-added).
  • dev/scripts/dev-mcp-entrypoint.sh (fork-local).
  • docs/adr/0549-vmaftune-workdir-relocation.md, docs/state.md, docs/usage/vmaf-tune.md, docs/adr/README.md, changelog.d/fixed/vmaf-tune-enospc-workdir.md, docs/rebase-notes.md (fork-only doc tree).

No upstream-shared paths touched. VMAFTUNE_WORKDIR is a new fork-local environment variable; it has no upstream counterpart and poses no rebase conflict risk.

docs/vcq-223-local-explainer-hang-diagnosis (ADR-0563)

No rebase impact. All changes are confined to fork-local documentation:

  • docs/adr/0551-local-explainer-hang-diagnosis.md (new file, fork-local).
  • docs/research/0551-local-explainer-hang.md (new file, fork-local).
  • docs/state.md — updated T-VCQ-223-LOCAL-EXPLAINER-HANG row (fork-local).
  • changelog.d/fixed/0551-local-explainer-hang-diagnosis.md (new file, fork-local).
  • docs/adr/README.md — new index row (fork-local).
  • docs/rebase-notes.md — this entry (fork-local).

No upstream-shared C sources, Python sources, or build files are touched. The @unittest.skip decorator in python/test/local_explainer_test.py is explicitly not removed in this PR — that is a follow-up code change.

chore/hip-cuda-orphan-tu-cleanup (ADR-0546)

No rebase impact. All deleted files (adm_hip.c, motion_hip.c, vif_hip.c, feature_hip.h, integer_ciede_hip.c, integer_moment_hip.c, float_ssim_cuda.c) are fork-local additions with no upstream analogue. If upstream ever adds a file with the same name to core/src/feature/hip/ or core/src/feature/cuda/, a sync-upstream cherry-pick will restore it; the deletion here does not create a rebase conflict because the upstream tree never had these paths. The core/src/hip/meson.build edit is entirely fork-local. core/src/feature/hip/AGENTS.md and core/src/feature/cuda/AGENTS.md are fork-local files.

chore/hip-extractor-audit-verify-9 (ADR-0563)

No rebase impact. This PR is documentation and audit closure only. All changed files are fork-local:

  • docs/adr/0563-hip-extractor-audit-verification.md (new ADR, fork-local).
  • docs/research/0563-hip-extractor-audit-verification.md (new research digest, fork-local).
  • docs/state.md (fork-local tracking ledger).
  • docs/adr/README.md (fork-local ADR index).
  • docs/rebase-notes.md (this file, fork-local).
  • changelog.d/changed/0551-hip-extractor-audit-close.md (fork-local fragment).

No upstream Netflix/vmaf file is touched. No libvmaf/ source is touched. No ffmpeg-patches/ file is touched. No meson_options.txt key is added. No new rebase-sensitive invariant is introduced.

fix/dev-container-sycl-hip-runtime (ADR-0543)

No rebase impact. All changes are confined to:

  • dev/Containerfile (fork-local; whole file is fork-added).
  • dev/scripts/dev-mcp-entrypoint.sh (fork-local).
  • dev/AGENTS.md (fork-local invariant note).
  • docs/adr/0541-dev-container-sycl-hip-runtime-fix.md, docs/state.md, docs/development/dev-mcp.md, changelog.d/fixed/0541-dev-container-sycl-hip-runtime.md, docs/adr/README.md (fork-only doc tree).

No upstream-shared paths touched. The container's pinned NEO_VER / IGC_VER / GMMLIB_VER / ROCM_VER ARGs become a recurring maintenance item: when a future host kernel revs the i915 / xe / KFD UAPI, bump the relevant ARG. The dev-mcp-entrypoint.sh visibility probe surfaces such regressions within 30 s of container start, so the ARG that needs bumping is easy to identify.

fix/dev-container-full-gpu-plumbing (ADR-0542)

No upstream-mirror paths touched. Modifies:

  • dev/Containerfile (stage 1 apt list: + intel-media-va-driver-non-free, mesa-va-drivers; revised Vulkan ICD selection comment block).
  • dev/docker-compose.yml (common-env: + HSA_OVERRIDE_GFX_VERSION, HSA_ENABLE_SDMA, + ROCR_VISIBLE_DEVICES; expanded NVIDIA_DRIVER_CAPABILITIES documentation comment).
  • dev/scripts/dev-mcp-entrypoint.sh (entrypoint-time VK_DRIVER_FILES rewrite to exclude lavapipe whenever any real ICD is present).
  • dev/AGENTS.md (new GPU-plumbing invariant section).
  • docs/development/dev-mcp.md (backend matrix + env-var contract + HSA override documentation).
  • docs/adr/0541-…md (+ index row), changelog.d/fixed/0541-…md, docs/state.md (one recently-closed row), this file.
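The entrypoint-time VK_DRIVER_FILES rewrite listed above lives in a shell script; the selection rule it describes (drop lavapipe whenever any real ICD is present, keep it as the only fallback otherwise) can be restated in Python as a minimal sketch. The function names are hypothetical, and matching lavapipe by the lvp_icd filename prefix is an assumption based on Mesa's usual manifest naming.

```python
import os


def select_vulkan_icds(manifests):
    """Drop lavapipe manifests when at least one other ICD is available."""
    real = [
        m for m in manifests
        if not os.path.basename(m).startswith("lvp_icd")
    ]
    # Keep the software rasteriser only when nothing else exists.
    return real if real else list(manifests)


def vk_driver_files_value(manifests):
    """Build a colon-separated value suitable for VK_DRIVER_FILES on Linux."""
    return ":".join(select_vulkan_icds(manifests))
```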

Rebase sensitivity: none (fork-local, additive container infra plus documentation). Every touched file lives under dev/, docs/, or changelog.d/. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry. The CLAUDE.md §12 r14 patch-stack rule does not apply (no libvmaf surface touched). Netflix upstream has no container infra under dev/ to conflict with.

fix/integer-vif-cuda-chroma-plane (ADR-0547)

Touches upstream-mirror path. Modifies:

  • core/src/feature/cuda/integer_vif_cuda.c (upstream-mirror: comment and option-help-text clarifications plus a one-shot warn-on-true block for the vestigial enable_chroma option; no kernel changes, and no behaviour change for any caller that doesn't set enable_chroma=true).

Why fork-local. Upstream Netflix/vmaf's CUDA VIF (verified at Netflix/vmaf@32780bd9b6:core/src/feature/cuda/integer_vif_cuda.c) neither carries the enable_chroma option nor has an n_planes field — it hardcodes data[0]-only access. The option was added by the fork-local PR #949 and the abandoned PR #948 attempted to mirror it on CPU; only the CUDA option landed and it was always a no-op.

Sync rule. If upstream ever adds genuine multi-plane VIF (which would be a significant departure from the Sheikh & Bovik 2006 definition), revisit this clarification:

  • Drop the vmaf_log VMAF_LOG_LEVEL_WARNING block from init_fex_cuda.
  • Restore s->enable_chroma = false; to the active-clamp form OR plumb enable_chroma into the dispatch loop (depending on upstream's shape).
  • Update docs/metrics/vif.md to advertise the per-chroma-plane features that newly exist.
  • Move the docs/state.md row from "Confirmed not-affected" to a normal closed-bug row.

Until then, sync conflicts on this file should keep both: the fork-local warn-on-true block in init_fex_cuda (search for the ADR-0541 comment anchor) and any incoming upstream changes to the neighbouring kernel-load paths.

Test reference. core/test/test_integer_vif_cpu_cuda_parity.c (suite fast/gpu) is the regression gate; it must continue to pass after any sync.

feat/hip-float-vif-score-kernel-real (ADR-0539)

No rebase impact. Touches:

  • core/src/feature/hip/hip_hsaco_stubs.c — fork-local TU; removes one VMAF_HSACO_WEAK_STUB(float_vif_score_hsaco) line. Upstream Netflix/vmaf has no HIP backend so no conflict possible.
  • docs/adr/0539-hip-float-vif-stub-removal.md, docs/adr/README.md, docs/state.md, docs/backends/hip/overview.md, core/src/feature/hip/AGENTS.md, changelog.d/fixed/hip-float-vif-stub-removal.md — fork-local docs.

Rebase invariant: the moment another .hip kernel under feature/hip/<extractor>/ becomes standalone-buildable, the same one-line removal must happen in hip_hsaco_stubs.c for its symbol. The AGENTS.md note added by this PR captures the pattern.

fix/hip-integer-vif-kernel-crash (ADR-0538)

Touches upstream-mirror paths. Modifies:

  • core/src/feature/hip/integer_vif/vif_statistics.hip (fork-local — HIP backend addition; no upstream conflict expected).
  • core/src/feature/hip/integer_vif_hip.c (fork-local — added by ADR-0379 / PR #...).
  • core/src/meson.build (upstream-shared — adds entries to the fork-local hip_kernel_sources dict, which itself is inside the fork-local if is_hipcc_enabled and is_hip_enabled block; conflict risk only if upstream lands a totally different HIP build pipeline, which is implausible).
  • core/src/hip/meson.build (fork-local).
  • core/src/feature/hip/AGENTS.md (fork-local).
  • core/src/feature/hip/hip_hsaco_stubs.c (NEW — fork-local).

No verbatim upstream code paths altered. Rebase invariant: if upstream ever adds an integer_vif HIP port, drop the fork's integer_vif/vif_statistics.hip and integer_vif_hip.c and re-evaluate whether the four ADR-0538 defects exist in their port too — three of the four are subtle (filter-half-width parsing, missing rd-write, host-pointer kernel arg) and an upstream re-implementation may well have the same blind spots.

fix/per-shot-bitrate-and-last-shot-chart (ADR-0531)

No rebase impact. All changes are confined to tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/report.py, tools/vmaf-tune/tests/test_per_shot.py, tools/vmaf-tune/tests/test_report.py, docs/adr/0531-*.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, and changelog.d/fixed/per-shot-bitrate-and-last-shot-chart.md. The tools/vmaf-tune/ tree does not exist in upstream Netflix/vmaf. No conflict risk on sync.

fix/per-shot-segments-readonly-cwd (ADR-0530)

No rebase impact. All changes are confined to tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0530-*.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, and changelog.d/fixed/0532-per-shot-segments-readonly-cwd.md. The tools/vmaf-tune/ tree does not exist in upstream Netflix/vmaf. No conflict risk on sync.

fix/dev-container-dri-bind (ADR-0528)

No rebase impact. The only changed files are dev/docker-compose.yml, dev/AGENTS.md, docs/development/dev-mcp.md, docs/adr/0528-*.md, docs/adr/README.md, docs/rebase-notes.md, docs/state.md, and changelog.d/fixed/dev-container-dri-bind.md. None of these paths exist in upstream Netflix/vmaf. No conflict risk on sync.

fix/compare-rate-quality-chart-from-bisect-samples (ADR-0534)

No rebase impact. All changes are confined to fork-local files:

  • tools/vmaf-tune/src/vmaftune/bisect.py: added BisectSample dataclass + BisectResult.samples field; bisect loop appends a sample per successful probe; to_recommend_result projects samples into the RecommendResult.bisect_samples tuple.
  • tools/vmaf-tune/src/vmaftune/compare.py: RecommendResult gained optional bisect_samples field; to_row emits bisect_samples only when populated (additive v2 schema change); CSV writer pinned to extrasaction="ignore".
  • tools/vmaf-tune/src/vmaftune/report.py: BisectSamplePoint added; CodecSweepPoint gained optional bisect_samples; _sweep_plot_fn rewrites the chart to render from samples when available (legacy connect-the-dots path retained with caveat note when samples absent).
  • tools/vmaf-tune/src/vmaftune/cli.py: --target-vmafs default flipped to 75,80,85,90,93; both --target-vmaf and --target-vmafs wrapped with _TrackedDefaultAction so the v1 single-target back-compat path activates only when --target-vmaf NN is explicit and --target-vmafs is at its default; _sweep_point_from_json parses the new field.
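The additive-schema point is easiest to see with the CSV path: a row may carry an extra bisect_samples key, and csv.DictWriter with extrasaction="ignore" silently drops it instead of raising. The sketch below is illustrative only; the column set, the to_row signature, and write_csv are assumptions, not the code in compare.py.

```python
import csv
import io

# Hypothetical v1 column set used only for this illustration.
FIELDNAMES = ["codec", "crf", "vmaf"]


def to_row(codec, crf, vmaf, bisect_samples=()):
    """Build a row dict; the optional key appears only when populated."""
    row = {"codec": codec, "crf": crf, "vmaf": vmaf}
    if bisect_samples:
        row["bisect_samples"] = list(bisect_samples)
    return row


def write_csv(rows):
    """Serialise rows, ignoring keys that are not CSV columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With the default extrasaction="raise", the first row containing bisect_samples would raise ValueError, which is why pinning the writer matters once the optional field exists.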

None of these paths exist in upstream Netflix/vmaf (the entire tools/vmaf-tune/ tree is fork-local). No conflict risk on sync.

ffmpeg-patch stack: no impact (this PR doesn't touch any libvmaf C-API, public header, or meson_options.txt entry).

tooling/adr-atomic-allocator (ADR-0535)

No rebase impact. All changes are confined to scripts/adr/next-free.sh, scripts/adr/test-next-free.sh, docs/adr/0535-adr-atomic-allocator.md, docs/adr/README.md, docs/adr/0000-template.md, docs/state.md, docs/rebase-notes.md, docs/development/adr-workflow.md, changelog.d/added/0535-adr-atomic-allocator.md, CLAUDE.md, and AGENTS.md. None of these paths exist in upstream Netflix/vmaf. No conflict risk on sync.

fix/premium-vmaf-target-defaults (ADR-0538)

No rebase impact. All changes are confined to fork-local files:

  • tools/vmaf-tune/src/vmaftune/cli.py — flip the --target-vmafs default from 75,80,85,90,93 (ADR-0534) to 94,96,97,98 and update the help text + supersession note.
  • tools/vmaf-tune/src/vmaftune/bisect.py — add _ABSOLUTE_CRF_RANGE_BY_NAME + _absolute_crf_range(adapter); default the bisect search window to that absolute range; bypass adapter.validate's CRF gate inside _encode_and_score in favour of an explicit absolute-range check.
  • tools/vmaf-tune/tests/test_bisect.py — update test_crf_range_defaults_to_adapter_quality_range -> test_crf_range_defaults_to_encoder_absolute_range; add three regression tests pinning libx264 / libx265 / libsvtav1 premium-archival targets at ok=True with achieved VMAF within 0.5 of target.
  • tools/vmaf-tune/tests/test_compare_rate_quality_sweep.py — update the default-target-vmafs assertion to 94,96,97,98.
  • tools/vmaf-tune/AGENTS.md — rewrite the --target-vmafs default rebase-sensitive-invariant note; add a new invariant for the bisect's encoder-absolute-range default.
  • docs/usage/vmaf-tune.md — supersede the rate-quality-sweep section's defaults / rationale; add the High-VMAF bisect contract subsection with the per-codec absolute-range table.
  • docs/adr/0538-premium-vmaf-target-defaults-and-bisect.md, docs/adr/0534-...md (status flip), docs/adr/README.md, docs/research/0537-...md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0538-premium-vmaf-target-defaults-and-bisect.md.
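A bisect over an encoder-absolute CRF window, with an explicit range check in place of the adapter's gate, might look like the sketch below. The range values are illustrative placeholders rather than the fork's _ABSOLUTE_CRF_RANGE_BY_NAME table, and bisect_highest_crf is a hypothetical helper that assumes quality never improves as CRF increases.

```python
# Placeholder values for illustration; consult the fork's table for real ones.
ABSOLUTE_CRF_RANGE_BY_NAME = {
    "libx264": (0, 51),
    "libx265": (0, 51),
    "libsvtav1": (0, 63),
}


def absolute_crf_range(encoder_name):
    """Look up the (low, high) CRF window for an encoder."""
    try:
        return ABSOLUTE_CRF_RANGE_BY_NAME[encoder_name]
    except KeyError:
        raise ValueError(f"no absolute CRF range for {encoder_name!r}") from None


def check_crf(encoder_name, crf):
    """Explicit absolute-range check used instead of a narrower gate."""
    low, high = absolute_crf_range(encoder_name)
    if not low <= crf <= high:
        raise ValueError(f"crf {crf} outside [{low}, {high}] for {encoder_name}")
    return crf


def bisect_highest_crf(encoder_name, meets_target):
    """Return the largest CRF whose probe meets the target, or None."""
    low, high = absolute_crf_range(encoder_name)
    best = None
    while low <= high:
        mid = (low + high) // 2
        if meets_target(check_crf(encoder_name, mid)):
            best = mid       # target met; try a cheaper (higher) CRF
            low = mid + 1
        else:
            high = mid - 1   # target missed; spend more bits
    return best
```

Searching the full window is what lets high targets such as 97 or 98 succeed: they typically need CRF values below the range a "sensible defaults" quality window would allow.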

None of these paths exist in upstream Netflix/vmaf (the entire tools/vmaf-tune/ tree is fork-local). No conflict risk on sync.

ffmpeg-patch stack: no impact (this PR does not touch any libvmaf C-API, public header, or meson_options.txt entry).

feat/bvi-dvc-pre-extracted-input (ADR-0527)

No rebase impact. All changes are confined to ai/scripts/bvi_dvc_to_full_features.py, ai/tests/test_bvi_dvc_dir_mode.py, and doc / AGENTS.md / changelog files. None of these paths exist in upstream Netflix/vmaf. No conflict risk on sync.

fix/hip-motion-extractor-register (ADR-0523)

No rebase impact. The only changed file is core/src/feature/feature_extractor.c, which is a fork-local file (the HIP and Metal extractor blocks it contains have no upstream equivalent). Upstream Netflix/vmaf does not ship integer_motion_hip and the #if HAVE_HIP block does not exist in upstream. No conflict risk on sync.

fix/dnn-symbolic-batch-dim (ADR-0524)

Rebase-sensitive — core/src/libvmaf.c carries the fork-local tiny-AI loader path (vmaf_ctx_dnn_attach and the two helpers dnn_attach_nchw / dnn_attach_feature_vector); Netflix upstream does not ship a tiny-model surface. Changes:

  • dnn_attach_nchw accepts in_shape[0] ∈ {1, -1} (symbolic batch folded to 1). The in_shape[1] != 1 (channels) reject is now separated from the batch check so each surface has its own diagnostic. The H/W reject message was sharpened to call out symbolic dims explicitly.
  • dnn_attach_feature_vector gained the same batch policy before the feature-width check; the optional rank-2 second-input shape probe (extra_shape) follows the same rule.
  • Per-frame inference (vmaf_ctx_dnn_run_frame_nchw and the feature-vector run path) is unchanged — both already emit shape[0] = 1 on the ORT Run call, so symbolic batch is purely a load-time concern.
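The load-time shape policy in the bullets above can be restated as a small Python sketch. It is a transliteration of the described rules, not the C code in libvmaf.c: the function name, the rank check, and the error wording are assumptions made for illustration.

```python
def validate_nchw_shape(in_shape):
    """Apply the described batch/channel/spatial policy to an NCHW shape.

    Returns the shape with a symbolic batch (-1) folded to 1.
    """
    if len(in_shape) != 4:
        raise ValueError(f"expected rank-4 NCHW input, got rank {len(in_shape)}")
    batch, channels, height, width = in_shape
    # Batch: a static 1 or a symbolic dim (reported as -1) are both accepted.
    if batch not in (1, -1):
        raise ValueError(f"batch dim must be 1 or symbolic (-1), got {batch}")
    # Channels get their own diagnostic, separate from the batch check.
    if channels != 1:
        raise ValueError(f"channel dim must be 1, got {channels}")
    # Spatial dims must be static; symbolic H/W is rejected explicitly.
    if height <= 0 or width <= 0:
        raise ValueError(
            f"H/W must be static positive dims (symbolic not supported), "
            f"got {height}x{width}"
        )
    return (1, channels, height, width)
```

Folding -1 to 1 at load time matches the note that per-frame inference already submits shape[0] = 1, so nothing downstream needs to know the model declared a dynamic batch.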

core/src/dnn/AGENTS.md gained an "Invariant — symbolic batch dim acceptance (ADR-0524)" section. Reverting the batch acceptance breaks every shipped NR tiny model (model/tiny/nr_metric_v1*.onnx) plus any future trainer using the PyTorch dynamic_axes default.

ffmpeg-patch stack: no impact. The tiny-AI loader sits behind vmaf_use_tiny_model, which the in-tree FFmpeg patches do not touch.

Test fixture: model/tiny/smoke_v0_symbolic_batch.onnx is a fork-local 166-byte Identity graph with dim_param='batch' on dim 0. The fixture has no sidecar (loader handles -ENOENT gracefully) and is not listed in model/tiny/registry.json (which catalogues shipped models, not test fixtures).

fix/cli-no-reference-wire (ADR-0520)

Rebase-sensitive — core/tools/cli_parse.c + core/tools/vmaf.c are upstream-shared paths. Changes:

  • cli_parse.c: the reference-required gate at the end of cli_parse() is now conditional on !settings->no_reference; the new branch requires tiny_model_path and force-enables no_prediction. If upstream Netflix reintroduces an unconditional if (!settings->path_ref) (the original shape pre-PR), restore the guard. The no_reference field has been in tree since the tiny-AI surface landed, so the merge conflict is a literal hunk replace.
  • vmaf.c: in the main() body the file_ref = fopen(c.path_ref, ...) call now opens c.path_dist when c.no_reference is true. If an upstream sync collapses the open into a helper, propagate the conditional. open_input_videos also gained a no_reference-aware error message (uses c->path_dist when ref is being faked).
  • core/tools/AGENTS.md: new ADR-0519 entry under "Governing ADRs" documents the CLI gate invariant + frame-loop invariant. Keep the entry through future merges.
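The two CLI rules above (the conditional reference gate and the open-twice path selection) are implemented in C; the sketch below restates them in Python so the invariant is easy to eyeball during a merge. The Settings dataclass and both function names are hypothetical stand-ins for the fields and logic in cli_parse.c and vmaf.c.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Illustrative subset of the CLI settings named in the notes above."""
    path_ref: Optional[str] = None
    path_dist: Optional[str] = None
    tiny_model_path: Optional[str] = None
    no_reference: bool = False
    no_prediction: bool = False


def apply_reference_gate(settings):
    """Require a reference unless no_reference is set with a tiny model."""
    if settings.no_reference:
        if not settings.tiny_model_path:
            raise ValueError("no-reference mode requires a tiny model path")
        settings.no_prediction = True  # force-enabled in no-reference mode
    elif not settings.path_ref:
        raise ValueError("a reference path is required")
    return settings


def reference_open_path(settings):
    """Pick the file opened for the reference slot (dist when ref is faked)."""
    return settings.path_dist if settings.no_reference else settings.path_ref
```

If an upstream sync restores an unconditional reference check, the first branch disappears and every no-reference invocation fails at parse time, which is the regression this entry warns about.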

core/src/libvmaf.c is not touched; the public API (vmaf_read_pictures, vmaf_use_tiny_model) is unchanged. The rank-4 DNN dispatch in vmaf_ctx_dnn_run_frame_nchw is upstream-internal and consumes ref argument bytes without caring about the slot semantics; the CLI's open-twice strategy works precisely because that dispatch is slot-agnostic. If an upstream refactor changes the dispatch to consult both ref and dist (e.g. for FR-only dual-input models), the fork-side wiring needs to either pass dist explicitly or expose a public vmaf_read_pictures_nr API.

ffmpeg-patch stack: no impact. The fork's FFmpeg filter does not surface NR-mode wiring today.

Netflix upstream does not ship --no-reference; the flag is a fork-local addition.

fix/msvc-unistd-gating (ADR-0521)

Rebase sensitivity: low — targeted portability guards on upstream-shared files.

Two files touched: core/src/feature/x86/vif_avx512.c and core/tools/yuv_input.c.

vif_avx512.c is a fork-local AVX-512 TU (no Netflix/vmaf upstream equivalent). The VMAF_NOINLINE_NOCLONE macro is added at the TU level and does not affect public headers or the ABI.

yuv_input.c has an upstream counterpart in Netflix/vmaf. The _WIN32 shims (fstat → _fstat64, S_ISREG, off_t) are added inside the existing #ifdef _WIN32 block, immediately after the already-present _fileno alias. On upstream sync: check whether Netflix has independently added MSVC portability to yuv_input.c; if so, prefer their solution and drop the fork-local block. The change is a four-line addition inside an existing guarded block, so merge-conflict risk is low.

No ffmpeg-patches file touches either file. No public API change.


fix/per-shot-scene-threshold-and-1-shot-chart (ADR-0513)

No rebase impact. Changes confined to fork-local trees: tools/vmaf-tune/src/vmaftune/per_shot.py (new split_long_shots helper + diff_threshold / framerate / max_shot_duration_sec kwargs on detect_shots), tools/vmaf-tune/src/vmaftune/cli.py (--scene-threshold + --max-shot-duration flags on tune-per-shot), tools/vmaf-tune/src/vmaftune/report.py (_shot_plot_fn uses ax.hlines bands instead of a step plot), tools/vmaf-tune/tests/test_per_shot.py + test_report.py (6 new regression tests). The C-side core/tools/vmaf_per_shot.c is untouched — the new --scene-threshold flag passes through to the existing --diff-threshold C option that has been in tree since ADR-0222. Docs: ADR-0512, docs/adr/README.md index row, docs/usage/vmaf-tune.md flag rows + "Tuning scene sensitivity" section, docs/state.md Recently-closed rows, changelog.d/fixed/per-shot-scene-threshold-and-1-shot-chart.md. Netflix upstream does not ship tools/vmaf-tune/.
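The idea behind capping shot length can be sketched as follows. This is not the helper in per_shot.py: the signature, the half-open (start_frame, end_frame) shot representation, and the rounding of the frame cap are all assumptions made for illustration.

```python
def split_long_shots(shots, framerate, max_shot_duration_sec):
    """Split any shot longer than the cap into consecutive chunks.

    `shots` is a list of half-open (start_frame, end_frame) pairs.
    """
    max_frames = max(1, int(framerate * max_shot_duration_sec))
    result = []
    for start, end in shots:
        # Peel off full-length chunks until the remainder fits the cap.
        while end - start > max_frames:
            result.append((start, start + max_frames))
            start += max_frames
        result.append((start, end))
    return result
```

For example, at 25 fps with a 4 s cap, a single 250-frame shot becomes three shots of 100, 100, and 50 frames, so a clip with no detected scene cuts still yields more than one segment.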

feat/compare-rate-quality-sweep — ADR-0516

No rebase impact. Changes confined to fork-local files: tools/vmaf-tune/src/vmaftune/compare.py (new compare_codecs_sweep, SweepReport, probe_encoder_available, detect_schema_version, v2 emitters, DEFAULT_CPU_ENCODERS, HARDWARE_ENCODERS, SCHEMA_VERSION_V1, SCHEMA_VERSION_V2), tools/vmaf-tune/src/vmaftune/cli.py (the _run_compare runner gains --target-vmafs parsing + sweep dispatch, the _run_report runner ingests v2 JSON into CodecSweepPoint), tools/vmaf-tune/src/vmaftune/report.py (new CodecSweepPoint, compute_pareto_frontier, _sweep_plot_fn per-codec line chart, v2 summary table renderers in both markdown + HTML), tools/vmaf-tune/tests/test_compare_rate_quality_sweep.py (new file, 24 regression tests), tools/vmaf-tune/AGENTS.md (v1 vs v2 schema invariant note + per-target bisect predicate construction rule), docs/usage/vmaf-tune.md (multi-target sweep section + flag table update + schema migration note), docs/adr/0516-vmaf-tune-compare-rate-quality-sweep.md (new), docs/adr/README.md (index row), docs/state.md (Recently closed row), changelog.d/added/compare-rate-quality-sweep.md (new fragment). Netflix upstream does not ship tools/vmaf-tune/; no upstream-shared C sources, public headers, Meson options, or ffmpeg-patches/ patches are touched.
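For readers unfamiliar with the Pareto-frontier step, a minimal sketch follows. It assumes each sweep point reduces to a (bitrate_kbps, vmaf) pair and that a point is dominated when another point has a bitrate no higher and a VMAF at least as high; the real compute_pareto_frontier in report.py operates on CodecSweepPoint objects and may define dominance differently.

```python
def pareto_frontier(points):
    """Return the non-dominated (bitrate_kbps, vmaf) pairs, sorted by bitrate."""
    frontier = []
    best_vmaf = float("-inf")
    # Visit cheapest points first; on bitrate ties, best quality first.
    for bitrate, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if vmaf > best_vmaf:
            frontier.append((bitrate, vmaf))
            best_vmaf = vmaf
    return frontier
```

A point survives only if it beats the quality of every cheaper point, so the result is strictly increasing in both bitrate and VMAF.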


fix/compare-source-is-container-plumbing (ADR-0509)

No rebase impact. Changes confined to tools/vmaf-tune/ (fork-local package) — src/vmaftune/cli.py (the _run_compare runner, the new _TrackedDefaultAction argparse action, _stamp_tracked_default_sentinels, and the _resolve_compare_source_geometry helper) and tests/test_compare.py (7 new regression tests). Netflix upstream does not ship tools/vmaf-tune/; no upstream-shared C sources, public headers, or build files are modified. The ADR (0509) and changelog fragment are fork-local docs only.
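Distinguishing "the user passed this flag" from "argparse filled in the default" is the job of a tracked-default action. The sketch below shows one common way to do that with a custom argparse.Action; the class body, the _explicit attribute suffix, the example flags, and their default values are assumptions, not the fork's _TrackedDefaultAction or _stamp_tracked_default_sentinels.

```python
import argparse


class TrackedDefaultAction(argparse.Action):
    """Store the value and record that it was given on the command line."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values)
        setattr(namespace, f"{self.dest}_explicit", True)


def was_explicit(namespace, dest):
    """True when the option was passed rather than defaulted."""
    return getattr(namespace, f"{dest}_explicit", False)


def build_parser():
    parser = argparse.ArgumentParser()
    # Default values here are placeholders for the illustration.
    parser.add_argument("--target-vmaf", type=float, default=92.0,
                        action=TrackedDefaultAction)
    parser.add_argument("--target-vmafs", default="94,96,97,98",
                        action=TrackedDefaultAction)
    return parser
```

argparse only invokes an action when the option appears in argv, so the marker attribute is absent for defaulted options. Note that this treats "not passed" as "at its default", which differs slightly from comparing values if a user passes the default explicitly.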


fix/chug-extract-vmaf-alignment — ADR-0510

No rebase impact. Changes confined to fork-local files: ai/scripts/extract_k150k_features.py, ai/scripts/chug_extract_features.py, ai/tests/test_extract_k150k_features.py, ai/tests/test_chug.py, ai/tests/test_chug_extract_features_smoke.py (new), docs/adr/0510-chug-extract-vmaf-alignment-fr-from-nr-guard.md (new), docs/adr/README.md (index row), docs/rebase-notes.md (this entry), docs/state.md (Recently closed row), ai/AGENTS.md (K150K-A invariant update), changelog.d/fixed/0509-*.md (new). The entire ai/ package and the FR-from-NR adapter pattern are fork-local — Netflix upstream has no CHUG ingestion, no K150K-A extractor, and no FR-from-NR adapter. No upstream-shared code, headers, build files, public C-API, or feature extractors are modified; the libvmaf CLI and all backends are unchanged.

fix/vulkan-two-variant-vif-shader (ADR-0512, supersedes ADR-0492)

No rebase impact on Netflix upstream — the Vulkan backend and its GLSL compute shaders are entirely fork-local (the core/src/vulkan/ and core/src/feature/vulkan/ trees do not exist in upstream). Fork-internal rebase invariants:

  • core/src/feature/vulkan/shaders/vif.comp was renamed into vif_fp64.comp + new sibling vif_fp32.comp (the original file is removed). Any future patch series that targets vif.comp by name must be retargeted onto both variants — kernel changes touch BOTH in lockstep (see core/src/feature/vulkan/AGENTS.md).
  • VmafVulkanContext gained an int has_float64 field (core/src/vulkan/vulkan_internal.h). Wire-compatible: feature TUs read it via the internal header, not the public ABI.
  • VmafVulkanConfiguration gained a public int require_fp64 field (core/include/libvmaf/libvmaf_vulkan.h). Append-only ABI extension — existing zero-initialised callers get the auto-fallback default.
  • New internal entry point vmaf_vulkan_context_new_with_opts(out, device_index, require_fp64); the original vmaf_vulkan_context_new is preserved as a wrapper that passes require_fp64 = 0.
  • New CLI flag --vulkan-require-fp64 (and underscore alias --vulkan_require_fp64); the usage string was split across two fprintf calls to stay under the C99 4095-char string-literal limit.

fix/dev-container-backend-exposure (ADR-0514)

No rebase impact. dev/Containerfile, dev/docker-compose.yml, and dev/AGENTS.md are entirely fork-local — upstream Netflix/vmaf does not ship the vmaf-dev-mcp container stack. If upstream ever ships its own dev container, merge by adopting upstream's image discipline and re-applying the four invariants documented in dev/AGENTS.md (tcm/latest/lib on LD_LIBRARY_PATH, no VK_ICD_FILENAMES pin, /dev/dri/by-path bind-mount, build-time backend probe).

fix/mcp-run-benchmark-repair — no rebase impact

All changed files (mcp-server/, testdata/bench_all.sh, docs/adr/0513-*, docs/mcp/, changelog.d/, docs/state.md) are fork-local. bench_all.sh is a fork-local benchmarking helper not present in Netflix/vmaf upstream. No rebase action required on upstream sync.


fix/restore-cuda-kernel-lifecycle-helpers

No rebase impact. Investigation confirmed VmafCudaKernelLifecycle, VmafCudaKernelReadback, and helper functions are intact in core/src/cuda/kernel_template.h (fork-local, ADR-0246). Changes confined to docs/state.md and changelog.d/ -- both fork-local, not present in Netflix upstream.


refactor/aiutils-vmaftune-corpus-dedup — no rebase impact

tools/vmaf-tune/ is fork-local. ai/src/aiutils/ is fork-local. No upstream-shared files are touched; no rebase action required.

fix/saliency-per-mb-eval-2026-05-15 — integer_vif enable_chroma

refactor/gpu-dispatch-env-pthread-once (ADR-0461)

No rebase impact: adds core/src/gpu_dispatch_env.{h,c} (new fork-local files) and modifies cuda/dispatch_strategy.c, vulkan/dispatch_strategy.c, sycl/dispatch_strategy.cpp — all fork-local TUs with no Netflix upstream equivalents. If upstream Netflix ever introduces their own dispatch env handling, merge by adopting their approach and dropping this helper.

No rebase impact: doc-only change. docs/state.md is fork-local and not present in Netflix upstream; upstream syncs do not touch it.


test/output-public-api-coverage-2026-05-16

No rebase impact. All changes are confined to core/test/test_output.c and changelog.d/. The test file is fork-local; upstream Netflix/vmaf does not ship test_output.c. No upstream-shared C sources, public headers, or build files are modified.


fix/sycl-motion-fps-weight-vulkan-import-status-2026-05-16

Sub-task B -- integer_motion_v2_sycl.cpp: adds motion_fps_weight to MotionV2StateSycl struct and options_motion_v2_sycl[]. If upstream Netflix ever adds motion_fps_weight to integer_motion_v2.c (the CPU reference), both the SYCL and CUDA motion_v2 twins should pick it up in the same PR per the invariant added to core/src/feature/sycl/AGENTS.md.

Sub-task A -- libvmaf_vulkan.h: removes stale -ENOSYS until T7-29 part 2 lands from the @return lines of vmaf_vulkan_import_image, vmaf_vulkan_wait_compute, and vmaf_vulkan_read_imported_pictures. No upstream rebase conflict expected -- the public Vulkan header is fork-local.

No rebase impact: fix/dev-mcp-stage3-and-bundled-fixes-2026-05-16 touches only dev/Containerfile, dev/AGENTS.md, docs/research/0135-*, and changelog.d/fixed/dev-mcp-container-stage-3.md. These are all fork-local infra files; no upstream-shared code, headers, build files, or feature extractors are modified. No sync-upstream conflicts expected.


No rebase impact: audit/t3-9b-ssimulacra2-ulp-audit — doc-only PR (ADR-0467, changelog fragment, BACKLOG update). No C files touched. No upstream-shared paths modified.

No rebase impact: feat/tiny-ai-registry-ci-and-saliency-v2-promotion-2026-05-15 touches model/tiny/registry.json (fork-local tiny-AI registry), docs/ai/models/ (fork-local model cards), docs/adr/0444-* (fork-local ADR), and the registry-validate CI job (fork-local CI). No upstream-shared code, headers, build files, or feature extractors are modified; the saliency model change is registry and docs only — the C-side mobilesal extractor is unaffected. Sync-upstream conflicts in this area are not expected.

No rebase impact: fix/mcp-embedded-docs-live-2026-05-14 updates fork-local MCP documentation and tools/vmaf-tune auto-planner code only; it does not touch upstream-shared code, headers, build files, or rebase-sensitive invariants.

The intended reader is whoever runs the next /sync-upstream (see ADR-0002 and .claude/skills/sync-upstream/). Read top-to-bottom before resolving conflicts.

Format

Each entry is a ### NNNN — short title heading with three fields:

  • Touches: paths likely to conflict on upstream merge.
  • Invariant: what the fork relies on that an upstream change could silently drop.
  • Re-test: the command(s) to run after the merge to confirm the invariant survived. Reproducer-style — no surrounding prose required.

IDs are assigned in commit order and never reused. A single entry may cover several PRs in one workstream; cross-link from the ID heading.

Entries (backfilled 2026-04-18 per ADR-0108 adoption)

perf/vif-cpu-workspace-hoist-2026-05-16 — VifState scratch buffer hoist (ADR-0452)

  • Touches: core/src/feature/vif.c, core/src/feature/float_vif.c, core/src/feature/vif.h.
  • Invariant: VifState gains a float *vif_buf field (VIF_SCRATCH_BUF_CNT × scaled_float_stride × scaled_h bytes, allocated in init, freed in close). compute_vif's signature gains a trailing float *data_buf parameter — callers must pass a buffer of at least 10 × ALIGN_CEIL(w * sizeof(float)) × h bytes. If an upstream Netflix commit modifies compute_vif's signature or adds fields to the implicit scratch layout, the fork's extra parameter must be reconciled with the upstream change. The fork does NOT carry the upstream per-frame allocation; if upstream adds a new scratch sub-plane, extend VIF_SCRATCH_BUF_CNT and the VifState::vif_buf allocation size in the same PR.
  • Re-test:

```shell
ninja -C build
meson test -C build 2>&1 | grep -E "Ok|Fail"
# Confirm 0 failures
```
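
As a worked example of the buffer-size bound in the invariant above, the following sketch computes the minimum scratch size in bytes. It assumes that ALIGN_CEIL rounds up to a 32-byte boundary and that sizeof(float) is 4; both values are assumptions to check against the fork's headers, and the helper names are hypothetical.

```python
FLOAT_SIZE = 4            # assumed sizeof(float)
MAX_ALIGN = 32            # assumed alignment used by ALIGN_CEIL
VIF_SCRATCH_BUF_CNT = 10  # the "10" quoted in the invariant


def align_ceil(n: int, align: int = MAX_ALIGN) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def min_vif_scratch_bytes(w: int, h: int) -> int:
    """Lower bound on the data_buf size that callers must pass to compute_vif."""
    stride = align_ceil(w * FLOAT_SIZE)
    return VIF_SCRATCH_BUF_CNT * stride * h


# 576x324 fixture: the row is 2304 bytes, already a multiple of 32.
size_576x324 = min_vif_scratch_bytes(576, 324)
```

If upstream adds a scratch sub-plane, only VIF_SCRATCH_BUF_CNT changes in this model; the stride and height terms stay as they are.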

perf/cambi-sycl-event-chain-2026-05-16 — CAMBI SYCL GPU-to-GPU event chains (SY-1)

  • Touches: core/src/feature/sycl/integer_cambi_sycl.cpp, core/src/feature/sycl/AGENTS.md, docs/adr/0471-cambi-sycl-event-chain.md.
  • Invariant: launch_spatial_mask, launch_decimate, and launch_filter_mode now return sycl::event and accept a sycl::event dep parameter (except launch_spatial_mask, which has no predecessor). If upstream Netflix ever rewrites the CAMBI SYCL port (unlikely — SYCL is fork-only), preserve the event-chain structure and ensure the two semantically-required q.wait() points (post-H2D and post-D2H) remain. The CUDA twin (integer_cambi_cuda.c) retains synchronous v1 posture and is not affected by this change.
  • Re-test:
meson test -C build --suite=fast cambi_sycl
python3 scripts/ci/cross_backend_parity_gate.py --feature cambi --places 4

fix/psnr-enable-chroma-gpu-parity-2026-05-16 — PSNR enable_chroma option GPU parity

  • Touches: core/src/feature/cuda/integer_psnr_cuda.c, core/src/feature/sycl/integer_psnr_sycl.cpp, core/src/feature/vulkan/psnr_vulkan.c, docs/metrics/features.md, docs/research/0135-*, docs/adr/0452-*, changelog.d/fixed/psnr-enable-chroma-cross-backend.md, docs/research/0136-psnr-enable-chroma-cross-backend-2026-05-16.md, docs/adr/0453-psnr-enable-chroma-gpu-parity.md.
  • Invariant: The enable_chroma option default is true on all backends. The n_planes clamp in GPU init() must stay in the following order: (1) pix_fmt == YUV400P sets n_planes = 1; (2) !enable_chroma && n_planes > 1 also clamps to 1. If upstream Netflix ever adds option-table support to the CUDA/Vulkan twins, port any new options but preserve the enable_chroma entry and its default_val.b = true exactly — a default flip to false would silently suppress chroma output and break the cross-backend parity gate.
  • Re-test:
python3 scripts/ci/cross_backend_parity_gate.py \
    --backends cpu cuda --features psnr --places 4
python3 scripts/ci/cross_backend_parity_gate.py \
    --backends cpu cuda --features psnr --places 4 \
    --feature-opts 'psnr=enable_chroma=false' \
    --feature-opts 'psnr_cuda=enable_chroma=false'
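
The clamp order in the invariant can be modelled in a few lines. This is a Python sketch of the logic only, not the C code in the GPU twins; the function name and the starting value of three planes are assumptions.

```python
def clamp_n_planes(pix_fmt: str, enable_chroma: bool = True) -> int:
    """Model of the init()-time plane clamp, in the order the invariant requires."""
    n_planes = 3  # assumed starting point: Y, Cb, Cr
    # (1) Monochrome input has no chroma planes at all.
    if pix_fmt == "YUV400P":
        n_planes = 1
    # (2) The user opted out of chroma on an otherwise colour input.
    if not enable_chroma and n_planes > 1:
        n_planes = 1
    return n_planes
```

With the default enable_chroma=True, a YUV420P input keeps all three planes, which is what the cross-backend parity gate expects.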

fix/vmaf-tune-temporal-saliency-2026-05-15 — recommend-saliency temporal aggregation

  • Touches: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_saliency.py, and docs/usage/vmaf-tune.md.
  • Invariant: mean remains the default compatibility reducer for recommend-saliency --saliency-aggregator. Changing the default changes user-visible saliency ROI behaviour and needs an ADR-0396 follow-up plus usage-doc update.
  • Re-test:
PYTHONPATH=tools/vmaf-tune/src pytest tools/vmaf-tune/tests/test_saliency.py -q

fix/saliency-per-mb-eval-2026-05-15 — saliency per-block IoU evaluator

  • Touches: ai/scripts/eval_saliency_per_mb.py, ai/tests/test_eval_saliency_per_mb.py, ai/AGENTS.md, docs/ai/saliency-per-mb-eval.md, docs/ai/index.md, docs/ai/roadmap.md, and mkdocs.yml.
  • Invariant: video-saliency model promotion should be measured at the encoder ROI block grid, not only full-resolution pixel IoU. Keep the evaluator dependency-light (numpy plus .npy / PGM loaders) so training sandboxes can run it without Pillow or OpenCV.
  • Re-test:
PYTHONPATH=. pytest ai/tests/test_eval_saliency_per_mb.py -q
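
To illustrate what "measured at the encoder ROI block grid" means, here is a dependency-free sketch of a block-level IoU. It is not the code in eval_saliency_per_mb.py: the 16-pixel block size, the 0.5 occupancy threshold, and the function names are all assumptions, and plain lists stand in for numpy arrays.

```python
def to_block_grid(mask, block=16, threshold=0.5):
    """Reduce a binary pixel mask (list of rows) to a per-block boolean grid."""
    h, w = len(mask), len(mask[0])
    grid = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            cells = [mask[y][x]
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            # A block counts as salient when enough of its pixels are set.
            row.append(sum(cells) / len(cells) >= threshold)
        grid.append(row)
    return grid


def block_iou(pred, truth, block=16, threshold=0.5):
    """Intersection over union computed on the block grid, not on pixels."""
    p = to_block_grid(pred, block, threshold)
    t = to_block_grid(truth, block, threshold)
    inter = union = 0
    for prow, trow in zip(p, t):
        for pv, tv in zip(prow, trow):
            inter += pv and tv
            union += pv or tv
    return inter / union if union else 1.0
```

A stray salient pixel that does not fill its block contributes to pixel IoU but not to the block score, which is why the two measures can rank models differently.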

fix/chug-hdr-audit-splits-2026-05-15 — CHUG HDR audit and content-safe splits

  • Touches: ai/scripts/chug_extract_features.py, ai/scripts/train_konvid_mos_head.py, ai/scripts/extract_k150k_features.py, ai/tests/test_chug.py, ai/tests/test_train_konvid_mos_head.py, ai/tests/test_extract_k150k_features.py, ai/AGENTS.md, docs/ai/chug-ingestion.md, and docs/ai/datasets/k150k.md.
  • Invariant: CHUG train/validation/test partitions are keyed by chug_content_name, not by individual bitrate-ladder rows. The materialiser writes split, chug_split_key, and chug_split_policy into every feature row. Preserve the --audit-output ffprobe HDR metadata audit as a pre-training guard. train_konvid_mos_head.py consumes explicit splits when available instead of silently re-shuffling CHUG rows. The FR-from-NR parquet extractor preserves CHUG side metadata when --metadata-jsonl is supplied.
  • Re-test: PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_chug.py ai/tests/test_train_konvid_mos_head.py ai/tests/test_extract_k150k_features.py -q
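
A minimal sketch of content-keyed splitting, assuming a deterministic hash of chug_content_name decides the partition. The hash scheme, the 80/10/10 proportions, the split labels, and the policy string are hypothetical; only the three row fields come from the invariant.

```python
import hashlib


def assign_split(content_name: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a content name to a partition; every ladder row of a clip agrees."""
    digest = hashlib.sha256(content_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


def annotate_rows(rows):
    """Write split, chug_split_key and chug_split_policy into every feature row."""
    out = []
    for row in rows:
        key = row["chug_content_name"]
        out.append({
            **row,
            "split": assign_split(key),
            "chug_split_key": key,
            "chug_split_policy": "content-hash-v0",  # hypothetical policy name
        })
    return out
```

Because the key is the content name rather than the row, two bitrate-ladder encodes of the same clip can never land on opposite sides of the train/test boundary.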

fix/tiny-ai-rgb-high-bitdepth-2026-05-15 — LPIPS / DISTS high-bit-depth input

  • Touches: core/src/dnn/tiny_extractor_template.h, core/src/feature/feature_lpips.c, core/src/feature/feature_dists.c, core/test/test_dists.c, core/src/dnn/AGENTS.md, core/src/feature/AGENTS.md, docs/ai/extractor-template.md, docs/ai/models/lpips_sq.md, docs/ai/models/dists_sq.md, docs/metrics/dists.md, and docs/metrics/features.md.
  • Invariant: LPIPS and DISTS-Sq accept planar 8/10/12/16-bit YUV while keeping the ONNX tensor ABI unchanged: ImageNet-normalised RGB8, NCHW [1,3,H,W], named inputs ref / dist, scalar output score. High-bit-depth samples are little-endian 16-bit containers rounded into the 8-bit domain before the shared BT.709 limited-range RGB conversion.
  • Re-test: meson test -C core/build-tiny-rgb-hbd test_dists test_lpips --print-errorlogs
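
The "rounded into the 8-bit domain" step can be sketched as below. The round-half-up formula and the clamp to 255 are assumptions about how the rounding is done, not a transcription of tiny_extractor_template.h; the function name is hypothetical.

```python
import struct


def hbd_to_8bit(raw: bytes, bpc: int) -> list:
    """Convert little-endian 16-bit containers holding bpc-bit samples to 8-bit."""
    if bpc not in (10, 12, 16):
        raise ValueError(f"unsupported high bit depth: {bpc}")
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}H", raw[:count * 2])
    shift = bpc - 8
    half = 1 << (shift - 1)  # assumed rounding: add half an 8-bit step, then shift
    return [min(255, (s + half) >> shift) for s in samples]
```

The clamp matters at the top of the range: a 10-bit 1023 rounds to 256 before clamping, which would overflow an 8-bit tensor element.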

fix/mcp-runtime-doc-status-2026-05-15 — embedded MCP runtime docs

  • Touches: docs/api/mcp.md, docs/development/build-flags.md, core/meson_options.txt, and libvmaf/AGENTS.md.
  • Invariant: embedded MCP is no longer an all-entrypoint -ENOSYS scaffold. Preserve the runtime contract when rebasing: stdio / UDS / loopback-SSE transports are live when their build flags are enabled; compute_vmaf uses a per-call ephemeral VmafContext; mutating measurement-thread tools still wait on the future SPSC bridge; enable_mcp remains default-off until that bridge lands.
  • Re-test: meson setup /tmp/vmaf-mcp-doc-check -Denable_mcp=true -Denable_mcp_stdio=true -Denable_mcp_uds=true -Denable_mcp_sse=enabled && ninja -C /tmp/vmaf-mcp-doc-check test_mcp_smoke && meson test -C /tmp/vmaf-mcp-doc-check test_mcp_smoke

fix/mcp-compute-vmaf-high-bitdepth-2026-05-15 — MCP compute_vmaf bitdepth

  • Touches: core/src/mcp/compute_vmaf.c, core/src/mcp/dispatcher.c, core/test/test_mcp_smoke.c, core/src/mcp/AGENTS.md, docs/api/mcp.md, and docs/mcp/embedded.md.
  • Invariant: embedded MCP compute_vmaf accepts YUV420p at 8/10/12/16 bpc and defaults to 8 when bitdepth is omitted. High-bit-depth raw samples are little-endian 16-bit words read directly into libvmaf picture storage. Do not silently add YUV422P or YUV444P without extending the tool schema with an explicit pixel_format argument and matching docs/tests.
  • Re-test: meson test -C core/build-mcp-hbd test_mcp_smoke --print-errorlogs

fix/chug-cuda-feature-split-2026-05-15 — FR-from-NR CUDA feature split

  • Touches: ai/scripts/extract_k150k_features.py, ai/tests/test_extract_k150k_features.py, ai/AGENTS.md, docs/ai/datasets/k150k.md, and docs/ai/chug-ingestion.md.
  • Invariant: CUDA mode in the FR-from-NR extractor uses explicit CUDA feature names for the stable CUDA pass and --cpu-vmaf-bin for the residual CPU feature pass (float_ssim, cambi). Do not collapse this back into one generic all-feature --backend cuda invocation; local CHUG 10-bit clips reproduced duplicate feature-key writes and CUDA context synchronization failures on that path.
  • Re-test: PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_extract_k150k_features.py -q

fix/vmaf-tune-libvpx-adapter-2026-05-14 — vmaf-tune libvpx-vp9 adapter

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, tools/vmaf-tune/src/vmaftune/codec_adapters/libvpx.py, tools/vmaf-tune/src/vmaftune/encode.py, and docs/usage/vmaf-tune*.md.
  • Invariant: libvpx-vp9 stays a normal codec-adapter registry entry. Do not add VP9 branches to corpus / encode search loops; the adapter owns -deadline good, -cpu-used, -crf, -b:v 0, -row-mt 1, and FFmpeg-native -pass / -passlogfile wiring. supports_encoder_stats remains false until a binary VP9 first-pass stats parser lands.
  • Re-test: PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_codec_adapter_libvpx.py tools/vmaf-tune/tests/test_encode_multi_codec.py -q
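
A rough sketch of what "stays a normal codec-adapter registry entry" looks like in practice. The class, registry, and function names are hypothetical and do not mirror the real vmaftune.codec_adapters API; the FFmpeg flags are the ones listed in the invariant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vp9AdapterSketch:
    """Hypothetical adapter that owns every VP9-specific FFmpeg flag."""
    encoder: str = "libvpx-vp9"
    supports_encoder_stats: bool = False  # stays false until a stats parser lands

    def encode_args(self, crf: int, cpu_used: int) -> list:
        return [
            "-c:v", self.encoder,
            "-deadline", "good",
            "-cpu-used", str(cpu_used),
            "-crf", str(crf),
            "-b:v", "0",      # constant-quality mode
            "-row-mt", "1",
        ]


# The search loop only ever looks adapters up; it has no VP9 branch.
REGISTRY_SKETCH = {"libvpx-vp9": Vp9AdapterSketch()}


def codec_args(codec: str, crf: int, cpu_used: int) -> list:
    return REGISTRY_SKETCH[codec].encode_args(crf, cpu_used)
```

Adding another codec in this model means adding one registry entry, with no change to codec_args or to any caller.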

fix/ai-frame-loader-color-pixfmt-2026-05-14 — packed colour frame loader

  • Touches: ai/src/vmaf_train/data/frame_loader.py, ai/tests/test_frame_loader.py, docs/ai/training.md, and ai/AGENTS.md.
  • Invariant: frame-loader support is limited to byte-contiguous formats with unambiguous tensor shape: gray -> HxW, and rgb24 / bgr24 / rgba / bgra -> HxWxC. Planar or subsampled formats such as yuv420p must keep failing before spawning ffmpeg until a PR adds explicit plane semantics.
  • Re-test: PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_frame_loader.py -q
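
The fail-early rule can be sketched as a shape lookup that runs before any subprocess is spawned. The table and function names are hypothetical; the supported formats and shapes are the ones named in the invariant.

```python
# Channels per byte-contiguous format; None means a 2-D (HxW) tensor.
_CHANNELS = {"gray": None, "rgb24": 3, "bgr24": 3, "rgba": 4, "bgra": 4}


def frame_shape(pix_fmt: str, width: int, height: int) -> tuple:
    """Return the tensor shape for a supported format, or raise before decoding."""
    if pix_fmt not in _CHANNELS:
        # Planar or subsampled formats have no single unambiguous shape.
        raise ValueError(f"unsupported pix_fmt for frame loading: {pix_fmt}")
    channels = _CHANNELS[pix_fmt]
    return (height, width) if channels is None else (height, width, channels)
```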

fix/mkdocs-strict-pre-push-2026-05-15 — mkdocs strict-mode pre-push hook

  • Touches: scripts/git-hooks/pre-push-mkdocs-strict.sh (new), scripts/git-hooks/pre-push (delegation call appended), .pre-commit-config.yaml (new mkdocs-strict local hook entry), docs/adr/0466-mkdocs-strict-pre-push-hook.md (new ADR).
  • Invariant: The hook gate mirrors the CI docs.yml lane (ADR-0403): mkdocs build --strict --quiet with the repo-root mkdocs.yml. Keeping the hook's config-file flag pointed at mkdocs.yml in the repo root is load-bearing — if mkdocs.yml is ever moved, update pre-push-mkdocs-strict.sh in the same PR. The SKIP=mkdocs-strict bypass token is the per-hook escape hatch; preserve it across rebases so the CI-gate-mirror contract (which also respects SKIP) stays coherent.
  • Re-test:
# Touch a docs file with a known-good anchor, push — hook should pass:
touch docs/index.md && git push
# Touch docs/index.md, add a broken anchor ref, push — hook should block:
echo "[bad](#nonexistent)" >> docs/index.md && git push

fix/dists-extractor-2026-05-14 — DISTS-Sq extractor smoke surface

  • Touches: core/src/feature/feature_extractor.c, core/src/feature/feature_dists.c, core/src/meson.build, core/test/meson.build, .gitattributes, model/tiny/registry.json, and docs/metrics/dists.md.
  • Invariant: dists_sq is a registered tiny-AI full-reference extractor that mirrors LPIPS' two-input ABI: model_path option, VMAF_DISTS_SQ_MODEL_PATH environment fallback, ONNX inputs ref / dist, scalar output score, and emitted feature key dists_sq. model/tiny/dists_sq.onnx is a smoke placeholder marked dists_sq_placeholder_v0; do not present it as production DISTS weights.
  • Re-test: meson test -C build-dists test_dists && .venv/bin/python ai/scripts/validate_model_registry.py model/tiny/registry.json

fix/backlog-gap-pass-10-2026-05-14 — KonViD-150k split score ingestion

  • Touches: ai/scripts/konvid_150k_to_corpus_jsonl.py, ai/tests/test_konvid_150k.py, docs/ai/konvid-150k-ingestion.md, ai/AGENTS.md.
  • Invariant: konvid_150k_to_corpus_jsonl.py accepts both the URL-manifest layout (manifest.csv + clips/) and the staged split score layout (k150ka_scores.csv / k150kb_scores.csv plus k150ka_extracted/ / k150kb_extracted/). Explicit --manifest-csv stays strict and must not silently fall back. Output JSONL schema remains unchanged.
  • Re-test: PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_konvid_150k.py -q

fix/backlog-gap-pass-11-2026-05-14 — vmaf-tune auto source probe

  • Touches: tools/vmaf-tune/src/vmaftune/auto.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, docs/usage/vmaf-tune.md, and tools/vmaf-tune/AGENTS.md.
  • Invariant: run_auto(smoke=False, meta_override=None) is not a scaffold. It probes source geometry, duration, and HDR once through _probe_source_meta, using one subprocess runner seam for testability. Probe failure must degrade to conservative defaults rather than raising or reintroducing NotImplementedError.
  • Re-test:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_auto_short_circuits.py \
  tools/vmaf-tune/tests/test_auto_confidence_aware.py \
  tools/vmaf-tune/tests/test_auto_recipe_overrides.py \
  tools/vmaf-tune/tests/test_auto_phase_f1_f2.py -q
.venv/bin/python -m ruff check \
  tools/vmaf-tune/src/vmaftune/auto.py \
  tools/vmaf-tune/src/vmaftune/cli.py \
  tools/vmaf-tune/tests/test_auto_short_circuits.py
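
The "degrade to conservative defaults" contract and the runner seam can be sketched as follows. This is not the real _probe_source_meta: the default values, the returned keys, and the HDR heuristic are assumptions, while the ffprobe flags are standard ones.

```python
import json
import subprocess

# Hypothetical conservative defaults used when probing fails.
DEFAULT_META = {"width": 1920, "height": 1080, "duration_s": 0.0, "is_hdr": False}


def probe_source_meta(path, runner=subprocess.run):
    """Probe geometry, duration and HDR once; never raise on probe failure."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=width,height,duration,color_transfer",
        "-of", "json", path,
    ]
    try:
        proc = runner(cmd, capture_output=True, text=True, check=True)
        stream = json.loads(proc.stdout)["streams"][0]
        return {
            "width": int(stream["width"]),
            "height": int(stream["height"]),
            "duration_s": float(stream.get("duration", 0.0)),
            "is_hdr": stream.get("color_transfer") in ("smpte2084", "arib-std-b67"),
        }
    except (OSError, subprocess.SubprocessError,
            ValueError, KeyError, IndexError, TypeError):
        return dict(DEFAULT_META)


# The runner seam lets a test simulate a missing ffprobe binary.
def _failing_runner(cmd, **kwargs):
    raise FileNotFoundError("ffprobe not installed")


fallback = probe_source_meta("clip.mp4", runner=_failing_runner)
```

Injecting the runner keeps the failure path testable without patching subprocess globally.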

fix/backlog-gap-pass-12-2026-05-14 — MCP docs + SSIMULACRA2 snapshot hardening

  • Touches: python/test/ssimulacra2_test.py, docs/mcp/index.md, docs/mcp/embedded.md, docs/mcp/release-channel.md, mcp-server/vmaf-mcp/README.md, mcp-server/AGENTS.md.
  • Invariant: the SSIMULACRA2 snapshot gate remains fork-local. It pins current extractor output for the 576x324 fixture with explicit x86_64 and arm64/aarch64 baselines, pins the shared 160x90 tail fixture, and must invoke the repo vmaf binary with an argv list, not a shell string. The external MCP server docs list all seven live tools. The embedded MCP docs describe the v3 runtime accurately: stdio, UDS, and loopback SSE are live; list_features and compute_vmaf are live; the SPSC measurement-thread drain and mutating tools remain future work.
  • Re-test: PYTHONPATH=python .venv/bin/python -m pytest python/test/ssimulacra2_test.py -q && PYTHONPATH=mcp-server/vmaf-mcp/src .venv/bin/python -m pytest mcp-server/vmaf-mcp/tests/test_server.py -q
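
The argv-list requirement can be illustrated as below. The helper names are hypothetical, and the feature name ssimulacra2 is an assumption about how the fork registers the extractor; the remaining flags are the vmaf CLI's long options.

```python
import subprocess


def build_vmaf_argv(vmaf_bin, ref, dist, width, height, out_json):
    """Build the command as a list so that no shell ever parses the paths."""
    return [
        vmaf_bin,
        "--reference", ref,
        "--distorted", dist,
        "--width", str(width),
        "--height", str(height),
        "--pixel_format", "420",
        "--bitdepth", "8",
        "--feature", "ssimulacra2",  # assumed extractor name
        "--json",
        "--output", out_json,
    ]


def run_vmaf(argv):
    # shell=False is the default: each list element reaches vmaf verbatim.
    return subprocess.run(argv, check=True, capture_output=True, text=True)
```

A fixture path containing spaces or shell metacharacters stays a single argument here, whereas a shell string would need quoting and could be split or interpreted.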

fix/read-json-model-dynamic-limits-2026-05-14 — dynamic JSON model arrays

  • Touches: core/src/read_json_model.c, core/src/model.h, core/src/model.c, and core/test/test_model.c.
  • Invariant: JSON model loading grows VmafModel.feature and score_transform.knots.list from the payload. Do not restore the old fixed MAX_FEATURE_COUNT / MAX_KNOT_COUNT parser caps; models with 65+ features or 11+ score-transform knots must parse when the JSON is otherwise valid.
  • Re-test: meson test -C build test_model --print-errorlogs.

fix/real-scaffold-gap-pass-4-2026-05-14 — vmaf-tune x264 two-pass

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py, tools/vmaf-tune/src/vmaftune/encode.py consumers, and docs/usage/vmaf-tune.md.
  • Invariant: libx264 opts into the shared Phase F two-pass seam through supports_two_pass = True and two_pass_args() -> ("-pass", N, "-passlogfile", path). The encode driver must stay adapter-driven; do not add an x264 branch in build_ffmpeg_command.
  • Re-test: PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_codec_adapter_x265_two_pass.py tools/vmaf-tune/tests/test_auto_phase_f1_f2.py -q
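
A sketch of the adapter-driven seam, assuming a simplified build_ffmpeg_command signature. The classes and the command layout are hypothetical; only supports_two_pass and the two_pass_args() return shape come from the invariant, and the pass number is stringified here for use in an argv list.

```python
class X264AdapterSketch:
    encoder = "libx264"
    supports_two_pass = True

    def two_pass_args(self, pass_number, passlogfile):
        return ("-pass", str(pass_number), "-passlogfile", passlogfile)


class OnePassAdapterSketch:
    """Hypothetical adapter that has not opted into the two-pass seam."""
    encoder = "hypothetical-one-pass"
    supports_two_pass = False


def build_ffmpeg_command(adapter, src, dst, pass_number=None, passlogfile=None):
    """The driver asks the adapter; it never checks for a specific codec."""
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", adapter.encoder]
    if pass_number is not None and adapter.supports_two_pass:
        cmd += adapter.two_pass_args(pass_number, passlogfile)
    cmd.append(dst)
    return cmd
```

The driver contains no string comparison against "libx264"; the capability flag alone decides whether the two-pass arguments are added.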

fix/backlog-gap-pass-8-2026-05-14 — CUDA psnr_hvs drain-batch integration

  • Touches: core/src/feature/cuda/integer_psnr_hvs_cuda.c, core/src/feature/cuda/AGENTS.md, docs/backends/cuda/overview.md, docs/development/cuda-profile-2026-05-03.md.
  • Invariant: integer_psnr_hvs_cuda.c enqueues all three plane-partial DtoH copies on s->lc.str during submit, calls vmaf_cuda_kernel_submit_post_record(&s->lc, fex->cu_state), and uses vmaf_cuda_kernel_collect_wait(&s->lc, fex->cu_state) in collect before reading h_partials[]. Do not move the readback + raw cuStreamSynchronize(s->lc.str) back into collect.
  • Re-test: meson setup build-cuda-drain libvmaf -Denable_cuda=true -Denable_sycl=false -Denable_vulkan=disabled --buildtype=debug && ninja -C build-cuda-drain src/libvmaf.so.3.0.0 && python3 scripts/ci/cross_backend_vif_diff.py --vmaf-binary "$PWD/build-cuda-drain/tools/vmaf" --reference testdata/ref_576x324_48f.yuv --distorted testdata/dis_576x324_48f.yuv --width 576 --height 324 --feature psnr_hvs --backend cuda --places 3. If the local CUDA fatbin build is blocked by toolkit include-path drift, at minimum compile the touched host TU: ninja -C build-cuda-drain src/liblibvmaf_feature.a.p/feature_cuda_integer_psnr_hvs_cuda.c.o.

fix/tune-scaffold-gap-pass-2-2026-05-14 — vmaf-tune per-shot real bisect CLI

  • Touches: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0392-vmaf-tune-phase-d-per-shot.md, tools/vmaf-tune/AGENTS.md.
  • Invariant: the CLI default for vmaf-tune tune-per-shot is the real Phase-B bisect backend. It extracts each detected half-open shot to temporary raw YUV, passes explicit geometry into bisect_target_vmaf, and emits measured per-shot VMAF in the JSON plan. --predicate-module MODULE:CALLABLE is the only CLI path that bypasses real bisect; the adapter-default predicate remains library-only dry-run behaviour.
  • Re-test: PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q.

fix/scaffold-gap-pass-2026-05-14b — vmaf-tune compare real bisect CLI

  • Touches: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/compare.py, tools/vmaf-tune/tests/test_compare.py, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-bisect.md, tools/vmaf-tune/AGENTS.md.
  • Invariant: the CLI default for vmaf-tune compare is the real Phase-B bisect backend when source geometry is supplied. The programmatic compare_codecs() default may still return ok=False because it lacks geometry, but the CLI must not silently rank using a placeholder predicate. --predicate-module MODULE:CALLABLE is the explicit custom/test escape hatch.
  • Re-test: PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_compare.py -q.
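
One way a MODULE:CALLABLE argument can be resolved is sketched below. This is not the loader in cli.py; the function name and error messages are hypothetical.

```python
import importlib


def load_predicate(spec: str):
    """Resolve 'package.module:callable_name' to the callable it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected MODULE:CALLABLE, got {spec!r}")
    obj = getattr(importlib.import_module(module_name), attr)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not name a callable")
    return obj
```

Rejecting a malformed spec up front keeps the escape hatch explicit: a typo fails loudly instead of falling back to a placeholder predicate.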

fix/scaffold-gap-pass-2026-05-14 — vmaf-tune hardware predictor real weights

  • Touches: tools/vmaf-tune/src/vmaftune/predictor_train.py, model/predictor_{h264,hevc,av1}_{nvenc,qsv}.onnx, matching model cards, docs/ai/predictor.md, tools/vmaf-tune/AGENTS.md.
  • Invariant: the trainer accepts canonical Phase-A rows and historical hardware-sweep aliases. Do not add an external corpus-conversion script for runs/phase_a/full_grid/comprehensive.jsonl; the loader is the compatibility seam.
  • Re-test: PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_predictor_train.py -q.

fix/vmaf-tune-ai-scaffold-state-cleanup — auto HDR dispatch + ensemble seed registry flip (2026-05-14)

  • Touches: tools/vmaf-tune/src/vmaftune/auto.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, tools/vmaf-tune/tests/test_auto_recipe_overrides.py, model/tiny/registry.json, python/test/model_registry_schema_test.py, docs/usage/vmaf-tune.md, docs/state.md, docs/research/0100-vmaf-tune-ai-scaffold-audit-2026-05-14.md, changelog.d/fixed/vmaf-tune-ai-scaffold-state-cleanup.md.
  • Invariant: vmaf-tune auto must use vmaftune.hdr.hdr_codec_args(codec, info) per HDR cell; a single generic PQ tuple is not valid because x265/SVT-AV1/NVENC/VVenC carry HDR signalling through different ffmpeg flag families. Recipe-adjusted effective_thresholds from _apply_recipe_override must be the thresholds used for F.3 decisions and JSON metadata. The five fr_regressor_v2_ensemble_v1_seed{0..4} registry rows are production entries (smoke: false) only while their sidecars carry matching SHA-256s and a passing PROMOTE gate.
  • Re-test on rebase:
PYTHONPATH=tools/vmaf-tune/src python -m pytest \
    tools/vmaf-tune/tests/test_auto_short_circuits.py \
    tools/vmaf-tune/tests/test_auto_recipe_overrides.py \
    tools/vmaf-tune/tests/test_auto_confidence_aware.py \
    tools/vmaf-tune/tests/test_hdr.py -v
PYTHONPATH=python python -m pytest python/test/model_registry_schema_test.py -v
bash core/test/dnn/test_registry.sh

feat/libvmaf-metal-filter-iosurface — Metal IOSurface zero-copy import (ADR-0423)

  • Touches: core/include/libvmaf/libvmaf_metal.h (new VmafMetalExternalHandles + four entry points appended), core/src/metal/picture_import.mm (new TU implementing the IOSurfaceLock + memcpy ring), core/src/metal/state_priv.h (shared struct defs between common.mm and picture_import.mm), core/src/metal/import.h (internal bridge for libvmaf.c HAVE_METAL block), core/src/metal/common.mm (state-free hook for the import ring), core/src/libvmaf.c (HAVE_METAL block: vmaf_metal_import_state / vmaf_metal_read_imported_pictures), core/src/metal/meson.build (one-line TU registration), core/test/test_metal_smoke.c (input-validation + device-default skip semantics), ffmpeg-patches/0013-libvmaf-add-libvmaf-metal-filter.patch (new), ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
  • Invariant: the import path is geometry-pinned to the first (w, h, bpc) tuple seen; subsequent imports with a different geometry return -EINVAL. Ring depth is 2 slots (VMAF_METAL_IMPORT_RING); a slot is identified by index % VMAF_METAL_IMPORT_RING and discarded if the caller's index no longer matches the stored one. The CPU memcpy path is synchronous, so vmaf_metal_wait_compute is a no-op (returns 0); do not promote it to a MTLSharedEvent drain without first switching the import body to an async MTLCommandBuffer submission. The Apple-Family-7+ gate is enforced inside vmaf_metal_state_init_external via [device supportsFamily:MTLGPUFamilyApple7] → -ENODEV on non-Apple hosts; the ffmpeg patch surfaces this as AVERROR(ENODEV) at config_props_metal time. Symbol names are load-bearing for the check_pkg_config probe in patch 0013; do not rename without simultaneously updating the patch.
  • Re-test on rebase:
meson setup build libvmaf -Denable_metal=enabled \
    -Denable_cuda=false -Denable_sycl=false
ninja -C build
nm build/libvmaf/libvmaf.dylib | grep vmaf_metal_picture_import
git -C ffmpeg-8 reset --hard n8.1.1
for p in ffmpeg-patches/000*-*.patch; do
    git -C ffmpeg-8 am --3way "$p" || break
done

Upstream Netflix/vmaf has no Metal backend; no rebase conflict surface against upstream/master. The 0013 patch is fork-local.
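
The geometry pin and the two-slot ring from the invariant above can be modelled in Python. This is a behavioural sketch only, not the Objective-C++ in picture_import.mm; the class and method names are hypothetical.

```python
import errno

VMAF_METAL_IMPORT_RING = 2  # ring depth from the invariant


class ImportRingSketch:
    def __init__(self):
        self.geometry = None                          # pinned on first import
        self.slots = [None] * VMAF_METAL_IMPORT_RING  # each slot: (index, payload)

    def import_picture(self, index, w, h, bpc, payload):
        geometry = (w, h, bpc)
        if self.geometry is None:
            self.geometry = geometry
        elif geometry != self.geometry:
            return -errno.EINVAL
        self.slots[index % VMAF_METAL_IMPORT_RING] = (index, payload)
        return 0

    def read(self, index):
        """Return the payload, or None if the slot was reused for a newer index."""
        slot = self.slots[index % VMAF_METAL_IMPORT_RING]
        if slot is None or slot[0] != index:
            return None
        return slot[1]
```

With a depth of two, importing frame 2 overwrites the slot that held frame 0, so a late read of frame 0 is discarded rather than returning the wrong picture.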

fix/saliency-per-mb-eval-2026-05-15 (Batch 4) — Metal install + header fix (ADR-0437)

  • Touches: core/include/core/meson.build (adds is_metal_enabled guard + libvmaf_metal.h to platform_specific_headers), core/test/meson.build (adds test_metal_install_header under host_machine.system() == 'darwin'), core/test/test_metal_install_header.c (new compile+link smoke test), docs/api/gpu.md (Metal + HIP symbol corrections, IOSurface sub-API table), docs/adr/0437-*.md, docs/adr/_index_fragments/0437-*.md, changelog.d/fixed/metal-public-header-install-and-import-state.md, docs/state.md.
  • Invariant: The is_metal_enabled guard mirrors the Vulkan guard (is_vulkan_enabled): both treat enabled and auto as "install the header". Do not change this to install only on enabled; auto on macOS resolves to a real Metal build and the header must be present for FFmpeg's check_pkg_config to succeed (same rationale as ADR-0192 for Vulkan).
  • No rebase conflict surface: upstream Netflix/vmaf has no Metal backend; core/include/core/meson.build diverges from upstream at the first is_cuda_enabled line. The only conflict risk is a batch that also edits platform_specific_headers — resolve by keeping both additions.
  • Re-test on rebase:
meson setup build -Denable_metal=enabled \
    -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson install -C build --destdir /tmp/vmaf-test-install
ls /tmp/vmaf-test-install/usr/local/include/libvmaf/libvmaf_metal.h

fix/metal-includes-and-ffmpeg-patch — Metal kernel batch T8-1c–k (ADR-0421)

  • Touches: core/src/feature/metal/*.metal (7 new kernel files), core/src/feature/metal/*_metal.mm (7 new dispatch files replacing *_metal.c scaffolds), core/src/metal/meson.build (.air custom_targets + metallib pipeline), ffmpeg-patches/0012-*.
  • Invariant: no atomic_ulong in any .metal file — Apple MSL silently drops 64-bit atomic updates (CI run 25685703780). All kernels use per-WG float/uint partials array indexed by bid.y * grid_groups.x + bid.x; host reduces in double. The float_moment_metal.mm corrects provided_features (was wrong in the scaffold: float_moment1/2/std → correct names float_moment_ref1st/dis1st/ref2nd/dis2nd).
  • Re-test on rebase (macOS, Apple-Family-7+):
meson setup build libvmaf -Denable_metal=enabled \
    -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_metal_smoke

On Linux: same build without -Denable_metal (Metal subdir excluded); no Metal tests registered. No upstream rebase conflict surface.

feat/metal-runtime-t8-1b — Metal backend runtime PR (ADR-0420)

  • Touches: core/src/metal/common.{c→mm,h}, core/src/metal/picture_metal.{c→mm}, core/src/metal/kernel_template.{c→mm}, core/src/metal/meson.build, core/test/test_metal_smoke.c, core/src/metal/AGENTS.md.
  • Invariant: the Metal backend ships three Objective-C++ TUs (common.mm, picture_metal.mm, kernel_template.mm) instead of the T8-1 pure-C scaffold. Public ABI in core/include/libvmaf/libvmaf_metal.h unchanged. Internal metal/common.h gained two accessor declarations (vmaf_metal_context_{device,queue}_handle) that consumer TUs call to retrieve bridge-retained void * Metal handles. The Obj-C++ TUs compile with -fobjc-arc via add_project_arguments(language: 'objcpp'). ARC + __bridge_retained / __bridge_transfer casts manage the +1 retain that lives on each C-struct slot. Upstream Netflix/vmaf has no Metal backend; there is no rebase conflict surface against upstream/master.
  • Re-test: meson setup build -Denable_metal=enabled on a recent macOS host (Apple Silicon preferred) + meson compile -C build + meson test -C build test_metal_smoke. On Apple-7+ the smoke test exercises real-device paths; on Intel Macs it short-circuits cleanly on -ENODEV. Non-Darwin builds are unaffected — subdir('metal') is already gated to Darwin.

fix/sve2-probe-darwin-gate — SVE2 build probe gated to non-Darwin hosts (ADR-0419)

  • Touches: core/src/meson.build (the SVE2 cc.compiles() probe block).
  • Invariant: is_sve2_supported = false is forced when host_machine.system() == 'darwin', mirroring the runtime __linux__ gate in core/src/arm/cpu.c::vmaf_get_cpu_flags_arm(). Apple Silicon (M1–M4) is ARMv8.x without SVE2 hardware, so the build-time and runtime gates must stay in lockstep.
  • Re-test: if upstream Netflix ever introduces its own SVE2 probe in core/src/meson.build, drop the fork-local Darwin short-circuit in favour of theirs if it matches the darwin ⇒ false invariant; otherwise layer the Darwin guard on top. Reverse the gate only when (a) Apple ships an arm part with SVE2 — no public roadmap as of 2026-05 — and (b) the runtime probe in arm/cpu.c grows a Darwin branch (e.g. sysctlbyname("hw.optional.arm.FEAT_SVE2", ...)).

fix/macos-test-recal-post-vif-sync — macOS Python test assertions recalibrated for post-bf9ad333 VMAF/ADM values (ADR-0418)

  • Touches: python/test/local_explainer_test.py, python/test/vmafexec_test.py, python/test/vmafexec_feature_extractor_test.py.
  • Invariant: 9+ assertions in those files were updated to the post-VIF-sync values that the macOS-libm binary actually produces, since Netflix upstream only shipped recalibration fixtures for the test_run_vmaf_* tests via 142c0671 / 7209110e / d93495f5 / fe756c9f and not the local_explainer_test::test_explain_vmaf_results, vmafexec_test::test_run_vmafexec_runner_akiyo_*, or the 5× vmafexec_feature_extractor::test_run_float_adm_fextractor_adm_* cases. Each updated line carries an inline # post-VIF-sync (#758) recal comment so the divergence is greppable. Affected values:
  • local_explainer_test.py:103: 76.68425574067017 → 76.66740228116836
  • vmafexec_test.py:871: 132.732952 → 132.732323
  • vmafexec_test.py:926, 1032, 1086: 88.030463 → 88.030322
  • vmafexec_feature_extractor_test.py:1834: 0.9420788125 → 0.9185737499999999
  • vmafexec_feature_extractor_test.py:1897: 0.9517253541666667 → 0.8902739375
  • vmafexec_feature_extractor_test.py:1960: 0.9554477708333334 → 0.8780868749999998
  • vmafexec_feature_extractor_test.py:2023: 0.9662835416666665 → 0.8407157499999999
  • vmafexec_feature_extractor_test.py:3030: 0.96851 → 0.962086
  • Re-test: after the next /sync-upstream, if upstream has shipped recalibrated fixtures for any of the listed test names, prefer upstream values over the fork-recalibrated ones in this entry. Mechanical: git grep "post-VIF-sync (#758) recal" enumerates every divergence; for each row, diff against the upstream value at the same line. If upstream still hasn't shipped the fixtures, leave the fork values in place — they're verified against the on-the-fly-VIF binary on the macOS-libm precision floor.

fix/vif-upstream-onthefly-filter-sync — VIF synced to Netflix upstream bf9ad333 + 8c645ce3

  • Touches: core/src/feature/vif.c, vif.h, vif_tools.c, vif_tools.h, vif_options.h, float_vif.c; python/test/quality_runner_test.py, feature_extractor_test.py, result_test.py, vmafexec_test.py.
  • Invariant: fork's VIF C-side now matches upstream HEAD verbatim for the listed files. The only fork-local divergence is float_vif.c::extract() passing s->vif_skip_scale0 ? 1 : 0 for the new compute_vif() parameter (instead of the upstream pattern of reading from a flag set in init). Test cherry-picks took upstream values for VIF score assertions and fork values for VMAF_legacy_score / VMAF_score where the fork's binary diverges from upstream's at places=4 (already pre-loosened).
  • Re-test: after the next /sync-upstream, run meson test -C build (must remain 54/54 OK) and PYTHONPATH=python python -m pytest python/test/feature_extractor_test.py python/test/quality_runner_test.py -q (must show 0 failures excluding niqe_runner skimage env issue). If upstream reverts or further modifies on-the-fly filter generation, this entry's invariant should re-sync rather than carry a fork-local divergence.

fix/master-build-failures-sycl-vulkan — SYCL macro collision + Vulkan SDK fallback + Cambi FR atom rename

  • Touches: core/src/feature/sycl/integer_adm_sycl.cpp, core/src/vulkan/common.c, python/vmaf/core/cambi_feature_extractor.py, python/test/cambi_test.py.
  • Invariant 1 (SYCL): adm_options.h defines ADM_BORDER_FACTOR as a C macro; the #undef before the constexpr redeclaration must remain if upstream ever changes the macro name or value. If upstream removes the macro entirely, the #ifdef-guarded #undef is a no-op and safe.
  • Invariant 2 (Vulkan): #ifndef VK_API_VERSION_1_4 guard must remain until Ubuntu 22.04 is retired from CI (or the minimum Vulkan Headers version is bumped past 1.3.280). Track via ADR-0264 (NVIDIA driver regression gate).
  • Invariant 3 (Cambi FR atom feature): CambiFullReferenceFeatureExtractor uses "cambi_encbd" (not "cambi") as the atom feature name for the distorted CAMBI score. If upstream changes the enc_bitdepth option alias from "encbd" to something else, the vmafexec XML key changes and the Python extractor's wildcard prefix must be updated to match.
  • Re-test: python3 -m pytest python/test/cambi_test.py -k "full_reference or fullref" -v

fix/precommit-onnx-binary-exclude — ADR collision sweep + pre-commit hook hardening

  • Touches: docs/adr/*.md (28 files renumbered to 0388–0415), docs/adr/README.md, docs/adr/_index_fragments/, scripts/ci/check-adr-numbering.sh, .pre-commit-config.yaml, tools/vmaf-tune/tests/test_hdr.py.
  • Invariant: No rebase impact on libvmaf C sources. The ADR renumbering affects documentation only; no code paths reference ADR numbers at runtime. Any in-flight branches that reference the old ADR numbers (0241-vmaf-tiny-v3, 0279-fr-regressor-v2-probabilistic, etc.) will need their references updated to the new numbers after rebasing onto master.
  • Re-test: bash scripts/ci/check-adr-numbering.sh must print "ADR numbering check passed.", and pre-commit run end-of-file-fixer ruff-check check-adr-numbering --all-files must pass for all three hooks.

fix/round8-mcp-tmpdir-leak — MCP describe_worst_frames tmp-dir cleanup

No rebase impact: this change is MCP-server-only (mcp-server/vmaf-mcp/src/vmaf_mcp/server.py), touches no libvmaf C source, no public C API headers, no Meson build files, and no FFmpeg patch stack entries. Upstream Netflix/vmaf does not have the MCP server. The change adds a shutil.rmtree before the per-invocation PNG generation loop.

  • Re-test: PYTHONPATH=mcp-server/vmaf-mcp/src python -m pytest mcp-server/vmaf-mcp/tests/test_server.py::test_describe_worst_frames_tmpdir_cleared_on_next_call — must report 1 passed.

fix/round8-opt-nan-bypass — NaN rejection in set_option_double

  • Touches: core/src/opt.c — adds #include <math.h> and an isnan(n) guard in set_option_double.
  • Invariant: all callers of vmaf_option_set with VMAF_OPT_TYPE_DOUBLE must receive -EINVAL when the value string parses to NaN. Upstream Netflix/vmaf's opt.c does not yet have this guard. If Netflix merges a version of opt.c that modifies set_option_double (e.g. to add a new type or change the strtod flow), verify the isnan guard is preserved and still sits before the n < min / n > max checks.
  • Re-test: meson setup core/build-test libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_tests=true && ninja -C core/build-test test/test_opt && core/build-test/test/test_opt — must report 25/25 passed, including test_double_nan_is_rejected and test_double_inf_rejected_when_max_finite.
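
The ordering matters because every ordered comparison against NaN evaluates to false under IEEE 754, so a NaN passes both the n < min and n > max checks untouched. A minimal Python model of that logic follows; the function signature and the (status, value) return convention are illustrative assumptions, not the C code in core/src/opt.c.

```python
import errno
import math
from typing import Optional, Tuple


def set_option_double(value: str, lo: float, hi: float) -> Tuple[int, Optional[float]]:
    """Illustrative model of a range-checked double option setter."""
    try:
        n = float(value)  # stands in for the strtod() parse
    except ValueError:
        return -errno.EINVAL, None
    # Reject NaN first: `n < lo` and `n > hi` are both False for NaN,
    # so the range check below would otherwise let it through.
    if math.isnan(n):
        return -errno.EINVAL, None
    if n < lo or n > hi:
        return -errno.EINVAL, None
    return 0, n
```

With this ordering, "nan" and "inf" are both rejected against a finite range, while an in-range value such as "0.5" is accepted.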

fix/fex-dedup-by-provided-feature — feature-extractor dedup by provided-feature names (ADR-0385)

  • Touches: core/src/fex_ctx_vector.c (new provided_features_overlap() helper, updated feature_extractor_vector_append() dedup logic); core/test/test_feature_extractor.c (new regression test); core/test/meson.build (adds fex_ctx_vector.c to test target sources).
  • Rebase impact: Low. The change is entirely internal to fex_ctx_vector.c; no public C API headers are touched, no core/include/ changes, no meson_options.txt, no FFmpeg patch stack entries. If upstream Netflix/vmaf rewrites fex_ctx_vector.c in a future sync, port the provided_features_overlap() helper and its two-stage dedup logic forward; reverting to name-only dedup re-opens T-CUDA-FEATURE-EXTRACTOR-DOUBLE-WRITE on every GPU binary that combines --feature <name> with a default model load.
  • Re-test:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu test_feature_extractor
# Expect: 6/6 tests passed, including
#   test_fex_vector_dedup_by_provided_feature_name: pass

# Verify no "cannot be overwritten" warnings:
build-cpu/tools/vmaf \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 --feature adm --threads 1 \
  2>&1 | grep "cannot be overwritten" | wc -l
# → 0
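
The two-stage dedup described above can be pictured as follows: reject an extractor whose name is already in the vector, then reject one whose provided feature names intersect those of an existing entry. This is a sketch under assumptions; the Extractor type, the append function and the example names are hypothetical stand-ins, not the C API in fex_ctx_vector.c.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extractor:
    name: str
    provided_features: Tuple[str, ...]


def provided_features_overlap(a: Extractor, b: Extractor) -> bool:
    """True when the two extractors would write at least one common feature."""
    return bool(set(a.provided_features) & set(b.provided_features))


def append(vector: List[Extractor], fex: Extractor) -> bool:
    """Append fex unless it duplicates an entry; return True if appended."""
    for existing in vector:
        if existing.name == fex.name:  # stage 1: same extractor name
            return False
        if provided_features_overlap(existing, fex):  # stage 2: same outputs
            return False
    vector.append(fex)
    return True


# Hypothetical example: a CPU and a GPU extractor that emit the same feature.
vector: List[Extractor] = []
first = append(vector, Extractor("feat_cpu", ("feat_score",)))
second = append(vector, Extractor("feat_gpu", ("feat_score",)))
```

Name-only dedup would accept the second extractor here, which is the situation that produces double writes to the same feature key.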

fix/pypsnr-ast-eval — JSON log serialization in PyFeatureExtractorMixin

No rebase impact: this change is Python-only (python/vmaf/core/feature_extractor.py), touches no C/CUDA/SYCL/HIP/Vulkan/Metal source, no public C API headers, no Meson build files, and no FFmpeg patch stack entries. The log files written by _generate_result are transient per-run scratch files (under workdir/); the format change from Python repr to JSON is invisible to callers. If upstream Netflix/vmaf modifies PyFeatureExtractorMixin._get_feature_scores or _generate_result in a future sync, verify that neither side re-introduces str() / ast.literal_eval — the numpy 2.x incompatibility is the root cause of T-PYPSNR-AST-EVAL.

  • Re-test: PYTHONPATH=$PWD/python python3 -m pytest python/test/feature_extractor_test.py -k pypsnr — must report 8/8 passed.
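
The failure mode behind this entry can be reproduced without NumPy installed. Under NumPy 2.x a scalar's repr has the form np.float64(1.5), and str() of a container uses repr() for its elements, so the resulting text is no longer a Python literal. The sketch below uses a stand-in class (FakeNpFloat) and a made-up key name (psnr_y) as assumptions for illustration.

```python
import ast
import json


class FakeNpFloat(float):
    """Stand-in for a NumPy 2.x scalar: repr() is not a Python literal."""

    def __repr__(self) -> str:
        return f"np.float64({float(self)!r})"


scores = {"psnr_y": [FakeNpFloat(30.5), FakeNpFloat(31.25)]}

# Old path: str() of the dict embeds repr() of each element.
text = str(scores)
try:
    ast.literal_eval(text)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False  # the np.float64(...) call node is rejected

# New path: a JSON round trip yields plain floats.
round_tripped = json.loads(json.dumps(scores))
```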

fix/pypsnr-feature-extractor-import — PyPsnrFeatureExtractor class hierarchy restoration

No rebase impact: this change is Python-only (python/vmaf/core/feature_extractor.py), touches no C/CUDA/SYCL/HIP/Vulkan/Metal source, no public C API headers, no Meson build files, and no FFmpeg patch stack entries. If upstream Netflix/vmaf adds or removes PyPsnrFeatureExtractor / PypsnrFeatureExtractor in a future sync, audit feature_extractor.py lines 722–830 to ensure the primary-vs-deprecated alias relationship is preserved (primary = PyPsnr*, deprecated = Pypsnr*).

0086 — TransNet shot-metadata columns + HDR VMAF model port slot (Research-0086, ADR-0300 follow-up)

  • Touches: tools/vmaf-tune/src/vmaftune/__init__.py (CORPUS_ROW_KEYS additive trio, no SCHEMA_VERSION bump), tools/vmaf-tune/src/vmaftune/per_shot.py (summarise_shots, _detect_shots_with_status, ShotMetadata), tools/vmaf-tune/src/vmaftune/corpus.py (_resolve_shot_metadata, row population, new shot_runner kwarg on iter_rows), tools/vmaf-tune/src/vmaftune/hdr.py (transfer-aware select_hdr_vmaf_model, hdr_model_name_for, HDR_MODEL_FILENAME, single-shot warning helper), tests + docs.
  • Invariant: iter_rows runs vmaf-perShot exactly once per source — the cost of TransNet inference is too high to pay per (preset, crf) cell. If a future PR moves shot detection inside the cell loop the corpus-generation wall time roughly doubles. Keep the per-source resolution at the top of iter_rows and pass ShotMetadata down to _row_for. Additionally: _detect_shots_with_status is the only call site that distinguishes "real one-shot source" from "fallback because the binary failed" — the public detect_shots shape cannot carry that boolean and downstream consumers depend on the (shots, ok) tuple to emit (0, 0.0, 0.0) sentinel rows.
  • Upstream conflict probability: zero. Upstream Netflix/vmaf does not carry a vmaf-tune directory, an hdr.py, or a shot-detection harness. The HDR VMAF model port slot (vmaf_hdr_v0.6.1.json) is fork-internal scaffolding — Netflix publishes the canonical artefact outside their public model/ tree. No upstream rebase will touch any of these files.
  • Re-test: pytest tools/vmaf-tune/tests/test_hdr.py tools/vmaf-tune/tests/test_shot_metadata_columns.py tools/vmaf-tune/tests/test_per_shot.py.

0358 — CUDA motion race + leak + motion2/motion3 precision parity (ADR-0358)

  • Touches: core/src/feature/cuda/integer_motion_cuda.c (memset moved from s->str to pic_stream; motion2_score emission switched to the CPU's MIN(score * motion_fps_weight, motion_max_val) post-process in collect + flush; motion3_postprocess_cuda guard relaxed to frame_index > 2 for the pre-incremented frame counter; vmaf_cuda_buffer_host_free(s->sad_host) added to close_fex_cuda and the init_fex_cuda error unwind), core/src/feature/cuda/integer_motion/motion_score.cu (shared-tile inner stride padded TILE_W → TILE_PITCH = TILE_W + 1; __launch_bounds__(BLOCK_X * BLOCK_Y, 8) added to both bpc kernels), core/src/feature/cuda/integer_motion_v2/motion_v2_score.cu (same padding + __launch_bounds__ for the v2 twin), docs/adr/0358-...md, docs/adr/README.md (index row), docs/backends/cuda/overview.md (motion bit-exact-at-places=4 appendix to "Numerical tolerance vs the CPU scalar path"), docs/state.md (Recently-closed row), changelog.d/fixed/cuda-motion-race-leak-precision.md. Upstream Netflix/vmaf does not currently ship the motion3-on-CUDA host post-processing surface (motion3_postprocess_cuda is fork-local per ADR-0219), so a future rebase touching integer_motion_cuda.c is unlikely to touch the same lines, but a pure-upstream port that resets the post-process to its naive form will silently un-fix bugs 3 + 4.
  • Invariant: the SAD cuMemsetD8Async runs on pic_stream, NOT on the drain stream s->str. The kernel's atomicAdd lives on pic_stream; both streams are CU_STREAM_NON_BLOCKING and there is no event linking them, so co-locating memset + kernel on the same stream is the only thing that orders them. Mirrors the verbatim pattern at integer_motion_v2_cuda.c:188. The motion2_score row emitted to the feature collector is the weighted-and-clipped value MIN(score * motion_fps_weight, motion_max_val), NOT the raw min(prev, cur) SAD score; this matches integer_motion.c:563. The motion3_postprocess_cuda moving-average guard reads frame_index > 2, NOT > 1, because frame_index is pre-incremented before the helper is called.
  • Re-test: build with cd libvmaf && meson setup build-cuda -Denable_cuda=true -Denable_sycl=false --buildtype=release && ninja -C build-cuda, then run meson test -C build-cuda (expect 55/55), then run tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv -d python/test/resource/yuv/src01_hrc01_576x324.yuv -w 576 -h 324 -p 420 -b 8 --backend cuda --feature motion_cuda --output /tmp/cuda.json --json and the same with --backend cpu --feature motion --output /tmp/cpu.json --json; places=4 diff over integer_motion, integer_motion2, integer_motion3 should report 0/144 mismatches at max_abs = 0.00e+00. compute-sanitizer --tool memcheck --leak-check full tools/vmaf ... --backend cuda --feature motion_cuda reports LEAK SUMMARY: 0 bytes leaked in 0 allocations post-fix. compute-sanitizer --tool racecheck reports 0 hazards.
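
The motion2 and motion3 parts of the invariant reduce to two small rules, modelled here in Python. The function names are hypothetical and the weight and clip values are passed in as parameters because their defaults are not stated in this entry.

```python
def motion2_emit(prev_sad: float, cur_sad: float,
                 motion_fps_weight: float, motion_max_val: float) -> float:
    """Value written to the feature collector for motion2_score."""
    raw = min(prev_sad, cur_sad)  # raw min(prev, cur) SAD score
    # Weighted-and-clipped form: MIN(score * motion_fps_weight, motion_max_val)
    return min(raw * motion_fps_weight, motion_max_val)


def motion3_guard(frame_index: int) -> bool:
    """Moving-average guard; frame_index is already incremented by the caller."""
    return frame_index > 2
```

Emitting the raw value instead of the weighted-and-clipped one only differs from the CPU path when the weight is not 1 or the clip is reached, which is why the mismatch is easy to miss on short test clips.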

0326 — vmaf-tune codec-adapter dispatcher pivot (ADR-0326, HP-1)

  • Touches: tools/vmaf-tune/src/vmaftune/encode.py (build_ffmpeg_command + new _resolve_codec_args / _legacy_codec_args helpers), tools/vmaf-tune/src/vmaftune/per_shot.py (_segment_command signature + body, new _default_segment_preset), tools/vmaf-tune/src/vmaftune/codec_adapters/ (11 adapters gain ffmpeg_codec_args + extra_params; libaom slice normalised), tools/vmaf-tune/tests/test_encode_dispatcher_per_adapter.py (new), tools/vmaf-tune/tests/test_codec_adapter_libaom.py (slice expectations updated for new contract). Upstream Netflix/vmaf has no vmaf-tune surface, so conflict probability is zero — this entry exists because the dispatcher contract is fork-local and any future adapter PRs need to land both ffmpeg_codec_args and a matching fixture row.
  • Invariant: every entry in tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py::_REGISTRY ships an ffmpeg_codec_args(preset, quality) -> list[str] method that returns the codec-correct argv slice (with -c:v <encoder> as the first two tokens). The runtime contract is enforced by tests/test_encode_dispatcher_per_adapter.py::test_fixture_table_covers_every_registered_adapter — adding an adapter without a fixture row fails this meta-test. x264 and x265 argv shapes stay byte-for-byte ["-c:v", encoder, "-preset", preset, "-crf", str(quality)] (defended by the test_x{264,265}_argv_byte_for_byte_legacy_shape pinning tests). The legacy fallback in encode._resolve_codec_args returns the historic libx264 shape for unregistered encoders so callers that bypass the registry stay invocable.
  • Re-test: cd tools/vmaf-tune && pytest tests/test_encode_dispatcher_per_adapter.py tests/test_per_shot.py tests/test_codec_adapter_libaom.py -v (36 + 16 + 12 = 64 tests green on this branch). For a wider check, run the full vmaf-tune suite — pre-existing failures in test_recommend.py, test_resolution.py, test_encode_multi_codec.py (parse_versions(encoder=...) + encoder_runner=), and test_codec_adapter_{x265,svtav1}.py (parse_versions(encoder=...)) are unrelated and predate HP-1.
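
A minimal sketch of the dispatcher contract, assuming a registry keyed by encoder name: the adapter class and the resolve_codec_args function below are simplified stand-ins, not the code in encode.py or codec_adapters/. The argv shape is the one quoted in the invariant.

```python
from typing import Dict, List


class X264Adapter:
    """Simplified adapter exposing the ffmpeg_codec_args contract."""

    encoder = "libx264"

    def ffmpeg_codec_args(self, preset: str, quality: int) -> List[str]:
        # "-c:v <encoder>" must be the first two tokens of the slice.
        return ["-c:v", self.encoder, "-preset", preset, "-crf", str(quality)]


_REGISTRY: Dict[str, object] = {"libx264": X264Adapter()}


def resolve_codec_args(encoder: str, preset: str, quality: int) -> List[str]:
    """Delegate to the adapter; fall back to the legacy shape if unregistered."""
    adapter = _REGISTRY.get(encoder)
    if adapter is not None and hasattr(adapter, "ffmpeg_codec_args"):
        return adapter.ffmpeg_codec_args(preset, quality)
    # Legacy fallback keeps callers that bypass the registry invocable.
    return ["-c:v", encoder, "-preset", preset, "-crf", str(quality)]
```

The harness never inspects the encoder name beyond the registry lookup, so adding a codec means adding an adapter and a fixture row rather than a new branch.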

0310 — Vulkan VIF int64 reduction race condition Phase 3 fix

  • Touches: core/src/feature/vulkan/shaders/vif.comp (replaces all three bare barrier() calls with explicit memoryBarrierShared(); barrier(); pairs covering the Phase-1 cooperative tile load, the Phase-2 vertical-conv shared write, and the Phase-4 cross-subgroup int64 reduction); plus documentation under docs/research/0089-...md (Phase 3 status appendix), docs/adr/0269-...md (Phase 3 status appendix), docs/state.md (T-VK-VIF-1.4-RESIDUAL closed; new T-VK-VIF-1.4-RESIDUAL-ARC opened), core/src/vulkan/AGENTS.md (Phase 3 update on the existing invariant row), changelog.d/fixed/vif-int64-reduction-race-condition.md. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the shader is zero. The entry exists because the fix is rebase-sensitive: any future cherry-pick that touches vif.comp and downgrades a memoryBarrierShared(); barrier(); pair back to a bare barrier() will silently re-introduce the NVIDIA Vulkan 1.4 race.
  • Invariant: vif.comp shared-memory ordering between cooperative-write phases must be release-acquire, not just a bare workgroup-execution barrier. NVIDIA's Vulkan 1.4 default memory model requires the explicit shared-memory release; bare barrier() works at API 1.3 by accident on this driver. SCALE is irrelevant — the fix applies to all four pipeline specialisations because the barrier sites are in the SCALE-shared code. Do NOT remove the explicit memoryBarrierShared() calls even if a perf review claims they are redundant under the GLSL spec wording: empirical real-hardware evidence in research-0089 2026-05-09 appendix shows otherwise on NVIDIA driver 595.71.05.
  • Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on a NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan --device 1 --places 4. Expect 0/48 across all four scales. Run the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3" against --vulkan_device 1; expect 5 identical (integer_vif_num_scale2, integer_vif_den_scale2) = (+2.494358e+04, +2.522523e+04) pairs at frame 5. Note that --vulkan_device 0 on this multi-GPU host is the Intel Arc A380 lane and will still fail at API 1.4 (separate T-VK-VIF-1.4-RESIDUAL-ARC row Open).

0309 — Vulkan VIF API-1.4 Phase 2 dump (T-VK-VIF-1.4-RESIDUAL)

  • Touches: docs/research/0089-vulkan-vif-fp-residual-bisect-2026-05-08.md (2026-05-09 status appendix with empirical numbers from the live RTX 4090), docs/state.md (T-VK-VIF-1.4-RESIDUAL row updated with the localisation), core/src/vulkan/AGENTS.md (new invariant row pinning the SCALE = 2 cross-subgroup-reduction memory-model finding), CHANGELOG.md (lusoris fork "Changed" entry). No code touched; the Phase 3 shader memory-model fix lands in a separate PR. Upstream Netflix/vmaf has no Vulkan backend so conflict probability for the AGENTS.md row is zero — entry exists because the empirical localisation flips the open state-row hypothesis from FP-precision to memory-model and retires the places=3 override path that earlier rebase scaffolding might have suggested.
  • Invariant: vif.comp SCALE = 2 specialisation's Phase-4 cross-subgroup int64 reduction is non-deterministic on NVIDIA driver 595.71.05 + Vulkan 1.4.341 (lines 547–592, subgroupAdd + barrier() + thread-0 read of s_lmem). API 1.3 lane is fully deterministic on the same hardware. The four apiVersion pinning sites in core/src/vulkan/common.c + core/src/vulkan/vma_impl.cpp stay at 1.3 until Phase 3 lands the explicit memory-scope barrier and a 5-run determinism gate confirms run-to-run identical (num, den) plus places=4 0/48 on NVIDIA. The places=3 override path is eliminated from the unblock options.
  • Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on a NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run the gate and the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3". Expect 45/48 places=4 failures on integer_vif_scale2 (max abs 1.527e-02) AND 5 distinct (integer_vif_num_scale2, integer_vif_den_scale2) pairs across 5 runs of --feature 'vif_vulkan=debug=true'. Both observations reproduced bit-for-bit on this session's hardware lane (UUID e478b41b-5c4f-1ddb-f990-e44916aff4c8).

0309 — vmaf-tune fast CLI surface (ADR-0276 status-update appendix, HP-3)

  • Touches: tools/vmaf-tune/src/vmaftune/cli.py (new fast subparser + _run_fast + _build_fast_sample_extractor + _build_fast_encode_runner + _parse_canonical6_means), tools/vmaf-tune/tests/test_cli_fast.py (new), tools/vmaf-tune/AGENTS.md (new exit-code-contract invariant bullet under the fast-path section), docs/adr/0276-vmaf-tune-fast-path.md (status-update appendix), docs/usage/vmaf-tune.md (new ## fast section). Upstream Netflix/vmaf has no fast-path surface, so conflict probability is zero — entry exists because the canonical-6 parsing path off pooled_metrics.<feature>.mean is sensitive to libvmaf's JSON output shape.
  • Invariant: The libvmaf JSON layout the _parse_canonical6_means helper consumes is the pooled_metrics.<feature>.mean shape (modern libvmaf 3.x), with a per-frame frames[].metrics.<feature> fallback. Both shapes are covered by parse_vmaf_json for the headline VMAF score in score.py; canonical-6 means re-use the same surface. If upstream changes the JSON schema (e.g. nests pooled_metrics under a new key), _parse_canonical6_means follows in the same PR — the fast-path proxy depends on the canonical-6 vector being correctly extracted from libvmaf's output. The OOD-gap exit code 3 from _run_fast is the documented fall-back signal in docs/usage/vmaf-tune.md § "Fall-back idiom"; do not silently downgrade it to 0. The CLI is the only seam that injects sample_extractor and encode_runner into fast.fast_recommend; the Python API still raises NotImplementedError when called without them.
  • Re-test: PYTHONPATH=tools/vmaf-tune/src python -m pytest tools/vmaf-tune/tests/test_cli_fast.py tools/vmaf-tune/tests/test_fast.py -v (21 tests). Smoke end-to-end without ffmpeg / ONNX / GPU: vmaf-tune fast --target-vmaf 92 --smoke --n-trials 8 should emit a JSON payload whose smoke is true and verify_vmaf is null. vmaf-tune fast --help lists every flag in _DOCUMENTED_FAST_FLAGS from test_cli_fast.py.
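
The two JSON shapes named in the invariant can be handled with a pooled-first lookup and a per-frame fallback. The helper below is a sketch under assumptions: feature_mean is a hypothetical name, and it returns None when neither shape carries the feature, which may differ from what _parse_canonical6_means does.

```python
from statistics import fmean
from typing import Optional


def feature_mean(report: dict, feature: str) -> Optional[float]:
    """Mean of one feature from a libvmaf-style JSON report."""
    # Preferred shape: pooled_metrics.<feature>.mean
    pooled = report.get("pooled_metrics", {}).get(feature, {})
    if "mean" in pooled:
        return float(pooled["mean"])
    # Fallback shape: average frames[].metrics.<feature>
    values = [
        frame["metrics"][feature]
        for frame in report.get("frames", [])
        if feature in frame.get("metrics", {})
    ]
    return fmean(values) if values else None


pooled_report = {"pooled_metrics": {"integer_motion2": {"mean": 3.5}}}
frames_report = {
    "frames": [
        {"metrics": {"integer_motion2": 2.0}},
        {"metrics": {"integer_motion2": 4.0}},
    ]
}
```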

0366 — vmaf-tune corpus schema v3 (ADR-0366)

  • Touches: tools/vmaf-tune/src/vmaftune/__init__.py (SCHEMA_VERSION 2 → 3, +12 canonical-6 aggregate keys), tools/vmaf-tune/src/vmaftune/score.py (new parse_feature_aggregates, ScoreResult.feature_means/_stds), tools/vmaf-tune/src/vmaftune/corpus.py (writer projects the aggregates into row keys; new read_jsonl with v2 back-compat), ai/scripts/train_fr_regressor_v[23].py (consume the new columns directly from the corpus DataFrame). All paths are wholly fork-local — tools/vmaf-tune/ and ai/scripts/ are not mirrored upstream — so rebase impact is zero.
  • Invariant: Phase B/C/D consumers and the FR-regressor trainers rely on the canonical-6 <feature>_mean columns being present on schema_version >= 3 rows and being NaN (never 0.0) when libvmaf does not expose the feature. Keep the writer-side NaN contract intact during any future widening; trainers drop NaN rows before fitting StandardScaler. The reader (read_jsonl) preserves the on-disk schema_version so trainers can filter to >= 3 if they need real per-feature data.
  • Re-test:
cd tools/vmaf-tune && python -m pytest \
  tests/test_corpus.py tests/test_corpus_schema_v3.py \
  tests/test_corpus_v2_back_compat.py -q
python -m pytest ai/tests/test_train_fr_regressor_v3.py -q
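
The writer-side NaN contract exists so that consumers can tell "feature not exposed" apart from a genuine 0.0. A consumer-side filter might look like the sketch below; usable_rows and the column name are hypothetical, and the real trainers operate on a DataFrame rather than a list of dicts.

```python
import math
from typing import Dict, List


def usable_rows(rows: List[Dict], column: str) -> List[Dict]:
    """Keep schema v3+ rows whose aggregate column holds a real number."""
    keep = []
    for row in rows:
        if row.get("schema_version", 0) < 3:
            continue  # pre-v3 rows carry no per-feature aggregates
        value = row.get(column, float("nan"))
        if math.isnan(value):
            continue  # missing feature: drop the row, never treat it as 0.0
        keep.append(row)
    return keep


rows = [
    {"schema_version": 3, "feat_mean": 0.97},
    {"schema_version": 3, "feat_mean": float("nan")},
    {"schema_version": 3, "feat_mean": 0.0},
    {"schema_version": 2},
]
kept = usable_rows(rows, "feat_mean")
```

A legitimate 0.0 survives the filter while the NaN row is dropped, which is the distinction a 0.0 sentinel would erase.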

0308 — encoder knob-sweep recipe-regression policy (ADR-0308, docs-only)

  • Touches: docs/research/0080-encoder-knob-sweep-findings.md, docs/adr/0308-encoder-knob-sweep-recipe-regression-policy.md, docs/adr/README.md (index row), ai/AGENTS.md (knob-sweep invariant section), changelog.d/changed/encoder-knob-sweep-findings.md. No code touched; companion to PR #400 (ADR-0305 + Research-0077 + ai/scripts/analyze_knob_sweep.py). Upstream Netflix/vmaf has no encoder-knob-sweep surface, so conflict probability is zero — this entry exists only because the policy threshold (7-of-9 structural cut) is rebase-sensitive on the corpus shape.
  • Invariant: the 7-of-9 source-count threshold from ADR-0308 §Decision point 1 is calibrated against the current 9-source Netflix Public Dataset corpus. If the corpus grows past 9 sources (e.g. UGC expansion per ADR-0287, or HDR additions), re-derive the absolute threshold as a fraction (≥7/9 ≈ 78 %). The structural cluster is sharp on the current corpus (top-15 cells all hit 9-of-9, no observed cells in 4-6 range), so a fractional cut at ~75 % is robust. Do NOT relax bitrate_tol_pct (default 5.0) or vmaf_tol (default 0.1) in ai/scripts/analyze_knob_sweep.py without an ADR — those tolerances are calibrated against the per-frame VMAF noise floor and bitrate quantisation in libavformat muxers.
  • Re-test: pytest ai/tests/test_knob_sweep_analysis.py -v (script logic; ships in PR #400). Policy gate is offline: regenerate runs/phase_a/full_grid/comprehensive.jsonl via tools/vmaf-tune/src/vmaftune/hw_encoder_corpus.py (3-hour run on a single host with NVENC + QSV) then re-run python ai/scripts/analyze_knob_sweep.py --jsonl <adapted.jsonl> --out-dir runs/phase_a/full_grid/reports/ and diff the resulting summary.md against docs/research/0080-encoder-knob-sweep-findings.md headline table. Structural cluster (top-15 cells, all 9-of-9) is the invariant to defend.
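
Re-deriving the cut as a fraction is a one-line computation. The sketch below assumes the 7/9 ratio is kept exactly and rounded up to a whole source count; the function name is hypothetical and the rounding rule is a choice, not something ADR-0308 specifies in this entry.

```python
import math
from fractions import Fraction

STRUCTURAL_FRACTION = Fraction(7, 9)  # the current 7-of-9 cut, about 78 %


def structural_threshold(n_sources: int) -> int:
    """Smallest source count that meets the fractional cut for a corpus size."""
    return math.ceil(STRUCTURAL_FRACTION * n_sources)
```

For the current 9-source corpus this reproduces the threshold of 7; a 12-source corpus would need 10 sources under the same fraction.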

0228 — Vulkan 1.4 bump deferred (ADR-0264, docs-only)

  • Touches: none (docs-only PR). Future Step A of T-VK-1.4-BUMP will touch core/src/feature/vulkan/shaders/vif.comp and core/src/feature/vulkan/shaders/ciede.comp; Step B will touch the three apiVersion sites in core/src/vulkan/common.c (lines 54, 264, 374) and the VMA_VULKAN_VERSION define in core/src/vulkan/vma_impl.cpp (line 22).
  • Invariant: master stays on VK_API_VERSION_1_3 and VMA_VULKAN_VERSION = 1003000. Lifting the constant in any future upstream sync (Netflix doesn't ship a Vulkan backend, so the conflict is improbable) without first auditing precise / OpDecorate ... NoContraction decoration on vif.comp and ciede.comp will reintroduce the NVIDIA-driver regression captured in research-0053. The psnr_hvs_strict_shaders -O0 list in core/src/vulkan/meson.build is the existing precedent for shader-side bit-exactness mitigations and should be the place a 1.4-era audit lands its results (potentially expanding to cover vif.comp + ciede.comp if the precise audit decides the optimizer is the right place to gate).
  • Re-test: when Step B lands, the gate is python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan and the same with --feature ciede against NVIDIA + RADV + lavapipe; max abs diff must stay ≤ 5.0e-05 (places=4) on all three.
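
The places=4 bound quoted above corresponds to a maximum absolute difference of half a unit in the fourth decimal place. A minimal sketch of such a gate, with hypothetical function names and made-up score lists, not the logic of cross_backend_vif_diff.py:

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest per-frame absolute difference between two score series."""
    if len(a) != len(b):
        raise ValueError("score series must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def within_places(a: Sequence[float], b: Sequence[float], places: int = 4) -> bool:
    """True when the series agree to `places` decimals (5.0e-05 for places=4)."""
    return max_abs_diff(a, b) <= 0.5 * 10 ** -places
```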

0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)

0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix

0228 — vmaf-tune resolution-aware model selection (ADR-0289)

0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)

0228 — tools/vmaf-tune/ codec-agnostic encode dispatcher (ADR-0294)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/encode.py — refactored to look up the codec adapter and delegate argv composition. Wholly fork-local.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, codec_adapters/x264.py — adapter contract gains ffmpeg_codec_args(preset, quality) and extra_params(). Both are duck-typed; missing methods fall back to the legacy x264-CRF shape.
  • tools/vmaf-tune/tests/test_encode_multi_codec.py — new 19-test suite pinning the dispatcher contract per codec.
  • docs/usage/vmaf-tune.md — new "Codec adapter contract" section.
  • Invariant: the harness (encode.py, corpus.py) must not branch on codec identity. The only codec-aware code is the per-adapter codec_adapters/*.py file. Any future change that adds an if adapter.encoder == "..." to the harness defeats the whole purpose of ADR-0294. The corpus row schema stays at SCHEMA_VERSION=1 — crf is preserved as the row column even when the underlying codec's quality knob is -cq / -qp / etc.; EncodeRequest.quality is a request-side property only. Adapters that don't yet expose ffmpeg_codec_args are intentionally permitted to fall back to the legacy x264-CRF shape; removing that fallback would break in-flight adapter PRs landing one-at-a-time.
  • Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/ -q   # 32 passed (13 existing + 19 multi-codec)

python -c "
from pathlib import Path
from vmaftune.encode import EncodeRequest, build_ffmpeg_command
req = EncodeRequest(
    source=Path('ref.yuv'), width=1920, height=1080,
    pix_fmt='yuv420p', framerate=24.0,
    encoder='libx264', preset='medium', crf=23,
    output=Path('out.mp4'),
)
cmd = build_ffmpeg_command(req)
assert cmd[cmd.index('-c:v') + 1] == 'libx264'
assert cmd[cmd.index('-preset') + 1] == 'medium'
assert cmd[cmd.index('-crf') + 1] == '23'
print('x264 dispatcher path OK')
"
```

0260 — vmaf-tune --sample-clip-seconds (ADR-0301)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/{cli,corpus,encode,score,__init__}.py — fork-local. No upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/tests/test_corpus.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md.
  • Invariant: corpus JSONL SCHEMA_VERSION bumped to 2 — additive clip_mode key only. Sample-clip windows are mirrored on both sides via FFmpeg input-side -ss/-t (encode) and libvmaf's --frame_skip_ref / --frame_cnt (score). The _resolve_sample_clip() helper is the single source of truth for the centre-anchored slice math; do not duplicate the computation elsewhere. Falls back silently to "full" when N >= duration_s.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep sample-clip
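
The centre-anchored slice math amounts to placing an N-second window in the middle of the source. The sketch below is an assumption-laden model of _resolve_sample_clip(): the function name, the return tuple, the "sample" label and the handling of non-positive N are all hypothetical; only the "full" fallback when N >= duration_s comes from the invariant above.

```python
from typing import Tuple


def resolve_sample_clip(duration_s: float, sample_s: float) -> Tuple[str, float, float]:
    """Return (clip_mode, start_s, length_s) for a centre-anchored window."""
    # Fallback: a window at least as long as the source means "use everything".
    # Treating sample_s <= 0 the same way is an assumption of this sketch.
    if sample_s <= 0 or sample_s >= duration_s:
        return "full", 0.0, duration_s
    start_s = (duration_s - sample_s) / 2.0  # equal margin on both sides
    return "sample", start_s, sample_s
```

The same start and length must then be applied on the encode side and on the score side, which is why the computation should live in exactly one helper.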

0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_amf,hevc_amf,av1_amf,_amf_common}.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry extended with three AMF entries.
  • tools/vmaf-tune/tests/test_codec_adapter_amf.py (new).
  • tools/vmaf-tune/tests/test_corpus.py — Phase A test renamed from test_known_codecs_phase_a_is_x264_only to test_known_codecs_includes_x264_and_amf.
  • tools/vmaf-tune/AGENTS.md — adds AMF preset-compression invariant.
  • docs/usage/vmaf-tune.md — adds Hardware encoders section.
  • Invariant: the 7-into-3 preset compression table in _amf_common.py (_PRESET_TO_AMF) is the cross-codec axis Phase B / C consumers depend on. Every AMF adapter accepts the canonical 7 preset names (placebo through ultrafast) and maps them onto the three AMF rungs (quality / balanced / speed). Do not extend the preset vocabulary without amending ADR-0282 — registry uniformity (no codec-identity branching in the harness search loop) rests on every codec accepting the same names.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q

0228 — vmaf-tune resolution-aware model selection (ADR-0289)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/resolution.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/corpus.py — adds CorpusOptions.resolution_aware: bool = True and pipes the effective model through score_res.request.model into the JSONL row.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds --resolution-aware / --no-resolution-aware (BooleanOptionalAction, default on).
  • tools/vmaf-tune/tests/test_resolution.py (new).
  • docs/usage/vmaf-tune.md — new "Resolution-aware mode" section.
  • docs/adr/0289-vmaf-tune-resolution-aware.md (new) + docs/research/0064-vmaf-tune-resolution-aware.md (new).
  • tools/vmaf-tune/AGENTS.md — two new invariant notes.
  • Invariant: the height-only decision rule (height >= 2160 → vmaf_4k_v0.6.1, else vmaf_v0.6.1) is the documented contract. The JSONL vmaf_model field is now per-row (not per-job) — mixed ladder corpora legitimately contain multiple distinct values across rows. Downstream consumers (Phase B / C / D) must group/filter by vmaf_model rather than assuming a constant. Width is accepted in the API for symmetry but ignored in the body; do not branch on it without a follow-up ADR.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep resolution-aware
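
The height-only rule is small enough to state as code. This is a sketch of the documented contract, with a hypothetical function name rather than the one in resolution.py:

```python
def select_vmaf_model(width: int, height: int) -> str:
    """Pick the VMAF model from the encode height alone."""
    del width  # accepted for API symmetry, deliberately ignored
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"
```

A ladder containing both 1080p and 2160p rungs therefore produces two distinct vmaf_model values in one corpus, which is why consumers have to group by that field.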

0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix

  • Touches:
  • core/tools/y4m_input.c — upstream-mirrored Daala-derived Y4M parser. The fix sits inside the 4:1:1 → 4:2:2-jpeg chroma upsample routine y4m_convert_411_422jpeg, lines ~500–530 in the function's three sub-loops. Upstream Netflix/vmaf carries the same shape; if upstream lands its own fix during a sync, prefer the upstream version and drop ours.
  • core/test/test_y4m_411_oob.c (new, fork-local) — drives the minimal W=2 H=4 4:1:1 stream through video_input_open + video_input_fetch_frame. Wholly fork-added; no upstream collision.
  • core/test/meson.build — adds test_y4m_411_oob executable + test() registration.
  • Invariant: the first two sub-loops of y4m_convert_411_422jpeg must guard _dst[(x << 1) | 1] writes with (x << 1 | 1) < dst_c_w, matching the third sub-loop's existing guard. Without the guard a 4:1:1 stream of width 2 (dst_c_w == 1) writes one byte past the destination chroma row.
  • Re-test:
  • cd libvmaf && meson setup ../build-asan --buildtype=debug -Db_sanitize=address -Db_lundef=false -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
  • ninja -C build-asan test/test_y4m_411_oob
  • ASAN_OPTIONS=detect_leaks=0 ./build-asan/test/test_y4m_411_oob — must report 1 tests run, 1 passed. Pre-fix the binary aborts with AddressSanitizer: heap-buffer-overflow … WRITE of size 1 at y4m_input.c:507.
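
The guard can be illustrated with a simplified Python model of the paired writes. The sketch copies values instead of applying the real filter taps, and the function name is hypothetical; it only demonstrates why the odd-index write needs its own bounds check when dst_c_w is 1.

```python
from typing import List, Sequence


def upsample_row_guarded(src_row: Sequence[int], dst_c_w: int) -> List[int]:
    """Write each source sample to dst[2x] and dst[2x + 1], both bounds-checked."""
    dst = [0] * dst_c_w
    for x, value in enumerate(src_row):
        even = x << 1
        odd = (x << 1) | 1
        if even < dst_c_w:
            dst[even] = value
        if odd < dst_c_w:  # without this check, width 2 writes one past the row
            dst[odd] = value
    return dst
```

For a width-2 stream the destination chroma row holds a single sample, so the odd index 1 is already out of range on the first iteration.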

0270 — saliency_student_v1 fork-trained on DUTS-TR (ADR-0286)

  • Touches:
  • model/tiny/registry.json — adds the saliency_student_v1 row. Fork-local registry; no upstream overlap.
  • model/tiny/saliency_student_v1.onnx (+ .json sidecar) — new weights and metadata. Fork-local.
  • ai/scripts/train_saliency_student.py — new training script. Wholly fork-local under ai/, which has no upstream counterpart.
  • docs/ai/models/saliency_student_v1.md, docs/research/0062-saliency-student-from-scratch-on-duts.md, docs/adr/0286-saliency-student-fork-trained-on-duts.md — new docs under fork-local trees.
  • Invariant: the C-side feature_mobilesal.c extractor's tensor-name contract — input (NCHW [1, 3, H, W]) and saliency_map (NCHW [1, 1, H, W]) — must continue to match the ONNX graph for both saliency_student_v1.onnx and the legacy mobilesal.onnx placeholder. Future weight swaps can change the graph internals freely but must keep these names + shapes; the smoke test asserts the registration. The op-allowlist constraint (graph uses only ops in core/src/dnn/op_allowlist.c) carries over from ADR-0218 — Resize is not used; ConvTranspose is the upsample op for v1 to keep the graph load-clean against vanilla origin/master.
  • Re-test:
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python -c "
from ai.src.vmaf_train.op_allowlist import check_model
from pathlib import Path
r = check_model(Path('model/tiny/saliency_student_v1.onnx'))
assert r.ok, r.pretty()
print('allowlist OK')
"
meson test -C build --suite=fast mobilesal

0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)

  • Touches:
  • core/src/feature/hip/float_ansnr_hip.{c,h} (new) — fifth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_ansnr_cuda.c call-graph-for-call-graph; init/submit/collect/close invoke the kernel-template helpers in the same order; the submit body intentionally bypasses vmaf_hip_kernel_submit_pre_launch (no atomic, kernel writes per-block (sig, noise) interleaved float partials directly).
  • core/src/hip/meson.build — adds the new TU to hip_sources.
  • core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_float_ansnr_hip; declaration and the registry row under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — adds test_float_ansnr_hip_extractor_registered sub-test pinning the lookup contract.
  • Invariant — the submit_pre_launch bypass is load-bearing. The CUDA twin makes the same choice for the same reason. If a future PR adds a submit_pre_launch call to float_ansnr_cuda.c's submit path, the HIP twin must follow in the same PR. Likewise the readback shape (wg_count * 2u * sizeof(float)) and the bpc table (peak/psnr_max for 8/10/12/16-bit) mirror the CUDA twin verbatim — keep aligned on rebase.
  • Re-test on rebase:
cd libvmaf
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build  # 48/48 green (47 CPU + HIP smoke)
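
The readback shape named in the invariant (wg_count * 2u * sizeof(float), with (sig, noise) interleaved per block) can be illustrated with a small host-side model. This is a sketch only: it assumes little-endian float32 partials and a plain sum as the reduction, and `readback_size` / `reduce_partials` are hypothetical names rather than functions from the fork.

```python
import struct


def readback_size(wg_count):
    """Bytes needed for wg_count interleaved (sig, noise) float32 pairs."""
    return wg_count * 2 * struct.calcsize("<f")


def reduce_partials(buf, wg_count):
    """Sum interleaved float32 partials; Python floats accumulate as doubles."""
    if len(buf) != readback_size(wg_count):
        raise ValueError("readback buffer does not match wg_count")
    values = struct.unpack(f"<{wg_count * 2}f", buf)
    sig = sum(values[0::2])    # even slots: signal partials
    noise = sum(values[1::2])  # odd slots: noise partials
    return sig, noise
```

If either twin changes the interleaving or the buffer size, a size check of this kind fails first, which is why the two need to stay aligned on rebase.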

0230 — HIP sixth-consumer kernel motion_v2_hip (ADR-0267)

  • Touches:
  • core/src/feature/hip/integer_motion_v2_hip.{c,h} (new) — sixth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_motion_v2_cuda.c call-graph-for-call-graph; carries the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag and a flush() callback. The state struct has a uintptr_t pix[2] ping-pong slot pair tracked outside the kernel-template (the template models a single device+host pair only).
  • core/src/hip/meson.build — adds the new TU to hip_sources.
  • core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_hip; declaration and the registry row under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — adds test_motion_v2_hip_extractor_registered sub-test pinning the lookup contract (extractor name is motion_v2_hip, matching the CUDA twin's motion_v2_cuda naming).
  • Invariant — temporal-extractor + ping-pong shape. The VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit, the flush() callback registration, and the uintptr_t pix[2] slot pair are load-bearing for the runtime PR (T7-10b). The runtime PR will swap uintptr_t pix[2] for a real device-buffer handle pair matching the CUDA twin's VmafCudaBuffer *pix[2]. On rebase: if the CUDA twin's flush-pass shape changes (currently min(score[i], score[i+1])), update the HIP twin's flush_fex_hip body in the same PR.
  • Re-test on rebase: same as 0229 — meson test -C build with enable_hip=true exercises the smoke contract.
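
The flush-pass shape quoted above, min(score[i], score[i+1]), can be sketched as follows. The handling of the final frame is an assumption made for illustration (the entry does not state it), and `motion2_flush` is a hypothetical name.

```python
def motion2_flush(scores):
    """Apply min(score[i], score[i+1]) to every frame that has a successor."""
    out = [min(a, b) for a, b in zip(scores, scores[1:])]
    if scores:
        out.append(scores[-1])  # assumption: the last frame keeps its own score
    return out
```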

0227 — ms_ssim_vulkan submit-side migrated to kernel_template (T-GPU-DEDUP-26)

  • Touches:
  • core/src/feature/vulkan/ms_ssim_vulkan.c: extract()'s raw VkCommandBuffer / VkFence / vkAllocateCommandBuffers / vkBeginCommandBuffer / vkCreateFence / vkQueueSubmit / vkWaitForFences / vkDestroyFence / vkFreeCommandBuffers blocks become VmafVulkanKernelSubmit triples (vmaf_vulkan_kernel_submit_begin / _submit_end_and_wait / _submit_free). One triple covers the decimate-pyramid command buffer; one triple per scale covers the per-scale SSIM submit. The pipeline-side bundles (pl_decimate 2-binding 4-variant + pl_ssim 10-binding 9-variant) and their _add_variant() chains are unchanged from the prior migration.
  • Invariant: any future submit-side template change (timeline semaphores, deferred fence release, queue-family parameterisation) must keep the helpers' synchronous-wait + per-frame fence + per-frame command-buffer contract intact, since ms_ssim_vulkan.c does host readback of the l_partials / c_partials / s_partials buffers immediately after _submit_end_and_wait returns. The submit-side contract is the same one already documented in core/src/vulkan/AGENTS.md's "Rebase-sensitive invariants" section for kernel_template.h.
  • Re-test:

```bash
cd libvmaf && meson test -C build
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature float_ms_ssim --backend vulkan --places 4
```

0231 — SHA-pin GitHub Actions (OSSF Pinned-Dependencies)

  • Touches: every workflow file under .github/workflows/. All 13 fork workflows (docker-image.yml, docs.yml, ffmpeg-integration.yml, libvmaf-build-matrix.yml, lint-and-format.yml, nightly-bisect.yml, nightly.yml, release-please.yml, rule-enforcement.yml, scorecard.yml, security-scans.yml, supply-chain.yml, tests-and-quality-gates.yml) had their uses: directives rewritten from <owner>/<repo>@vN[.M.K] to <owner>/<repo>@<40-char-sha> # vN.M.K. 97 references converted; the SLSA reusable-workflow ref in supply-chain.yml is the single documented holdout (see Invariant below).
  • Invariant: SHA-pin policy for uses: directives. Every action reference in .github/workflows/*.yml MUST be a 40-char commit SHA with the semver tag preserved as a trailing # vN.M.K comment. The OSSF Scorecard Pinned-Dependencies check parses both forms; a floating tag (@vN) is treated as unpinned and counts against the aggregate score. Single permitted exception: the SLSA generator reusable workflow (slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml) must keep its vX.Y.Z tag form because GitHub Actions consumers cannot SHA-pin reusable-workflow refs in every code path; the exception is documented inline in supply-chain.yml and must survive each rebase. Why this matters on upstream sync: Netflix upstream does not ship the fork's CI tree, so a /sync-upstream run that drags new workflow content (e.g. via repository templates or bot-authored bumps) into .github/workflows/ can reintroduce floating-tag references unnoticed. The post-rebase check below is the standing gate: anything it prints needs to be re-pinned before merging the sync.
  • Re-test on rebase:
# Anything that prints is a regression — every uses: must be either
# already SHA-pinned (40 hex) or, for the documented SLSA exception,
# the slsa-github-generator reusable-workflow ref.
grep -hnE '^\s*(- )?uses:\s+[^@]+@[^ #]+\s*$' .github/workflows/*.yml \
  | grep -vE '@[a-f0-9]{40}' \
  | grep -v 'slsa-framework/slsa-github-generator/.github/workflows/'
# SHA-resolution sanity for any new pin (per-action):
gh api repos/<owner>/<repo>/git/ref/tags/<vN.M.K> --jq '.object.sha'
# If the result is a "tag" object (annotated tag), deref:
gh api repos/<owner>/<repo>/git/tags/<sha-from-prev> --jq '.object.sha'
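
For hosts where the grep pipeline is awkward to run, the same filter can be expressed in a few lines. This is a sketch that mirrors the three grep stages above on a list of workflow lines; `unpinned_uses` is a hypothetical helper, not an existing script in the repository.

```python
import re

# Same patterns as the three grep stages of the re-test command.
USES_RE = re.compile(r"^\s*(- )?uses:\s+[^@]+@[^ #]+\s*$")
SHA_RE = re.compile(r"@[a-f0-9]{40}")
SLSA_PREFIX = "slsa-framework/slsa-github-generator/.github/workflows/"


def unpinned_uses(lines):
    """Return the lines the grep pipeline would print (regressions)."""
    return [
        line
        for line in lines
        if USES_RE.match(line)
        and not SHA_RE.search(line)
        and SLSA_PREFIX not in line
    ]
```

As with the grep, the first pattern only matches `uses:` lines that carry no trailing comment, so a floating tag followed by a `#` comment falls outside this check and would need a separate look.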

0226 — CUDA drain-batch engine-loop opt (T-GPU-OPT-1)

  • Touches:
  • core/src/cuda/drain_batch.{h,c} (new) — TLS drain-batch table + shared drain stream + _open()/register/_flush()/_close() API.
  • core/src/libvmaf.c — engine-side per-frame loop now wraps submit/collect with _open() + _flush() so all CUDA extractor finished events are waited on a single shared drain stream.
  • All 12 CUDA feature kernels (core/src/feature/cuda/*.c) register their finished event + drained flag with the drain batch on submit; collect skips its private cuStreamSynchronize when drained is true.
  • Invariant — drained-flag contract. Every CUDA extractor's collect path must check the per-frame drained flag and skip its own cuStreamSynchronize when set; otherwise the drain batching is a no-op. The flag is reset to false per frame inside vmaf_cuda_drain_batch_register().
  • Re-test on rebase:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast cuda

Expected: all CUDA tests green; bench shows ≥5% wall-clock gain on a 7-extractor VMAF model (model.json with all feature extractors enabled).
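
The drained-flag contract can be modelled without CUDA. The sketch below is a toy model under stated assumptions: `DrainBatch` and `Extractor` are illustrative classes, `flush()` stands in for the wait on the shared drain stream, and the `private_syncs` counter stands in for calls to cuStreamSynchronize.

```python
class DrainBatch:
    """Toy model of the per-frame drain batch (names are illustrative)."""

    def __init__(self):
        self.entries = []

    def register(self, extractor):
        extractor.drained = False      # reset per frame on register
        self.entries.append(extractor)

    def flush(self):
        # One shared wait stands in for the drain-stream synchronisation.
        for extractor in self.entries:
            extractor.drained = True
        self.entries.clear()


class Extractor:
    def __init__(self):
        self.drained = False
        self.private_syncs = 0

    def collect(self):
        if not self.drained:
            self.private_syncs += 1    # stands in for cuStreamSynchronize
```

An extractor whose collect path ignores the flag would still increment `private_syncs` after a flush, which is the no-op regression the invariant warns about.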

0225 — Netflix bench snapshot regen (upstream a44e5e61 motion fix)

  • Touches:
  • testdata/netflix_benchmark_results.json — fork-added snapshot. CPU rows now reflect the post-fix motion feature; cuda / sycl rows from the previous regen are preserved unchanged because those backends were not exercised on this rerun (host-environment tooling — wrong renderD path, libvmaf_cuda not enabled in the local FFmpeg build). Future full regens should include cuda / sycl.
  • testdata/bench_all.sh — default VMAF= no longer points at /usr/local/bin/vmaf (which on most dev hosts is stuck at the pre-upstream-a44e5e61 v3.0.0); now defaults to the in-tree fork build at core/build/tools/vmaf.
  • testdata/benchmark_netflix.py: FFMPEG, YUVDIR and the hardcoded LD_LIBRARY_PATH=/usr/local/lib are now overridable via VMAF_FFMPEG, VMAF_YUVDIR and any caller-set LD_LIBRARY_PATH.
  • Invariant: the snapshot's CPU pooled VMAF for src01_576x324 is 76.667828 (post-fix), not 76.668904 (the upstream-buggy mirror). If /sync-upstream ever re-pulls a Netflix change that touches motion.c mirror-handling, this number is the reference.
  • Re-test:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
LD_LIBRARY_PATH=$(pwd)/build/src python3 \
    ../testdata/benchmark_netflix.py

Expected CPU pooled rows: 76.667828, 35.068672, 7.985899.
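
A comparison against the reference rows above can be scripted. This is a minimal sketch: the three values are copied from the expected-rows line, the tuple order is assumed to match the order in which the benchmark reports them, and `matches_snapshot` is a hypothetical helper.

```python
# Reference values copied from the "Expected CPU pooled rows" line above.
EXPECTED_CPU_POOLED = (76.667828, 35.068672, 7.985899)


def matches_snapshot(values, places=6):
    """True when each value agrees with the reference to `places` decimals."""
    if len(values) != len(EXPECTED_CPU_POOLED):
        return False
    return all(
        round(abs(got - want), places) == 0
        for got, want in zip(values, EXPECTED_CPU_POOLED)
    )
```

With this check, the pre-fix value 76.668904 for the first row is rejected, which is the regression the invariant is meant to catch.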

0224 — CUDA graph capture feasibility (research-0047, DEFER)

  • Touches: none — investigation-only; no code lands. The research digest docs/research/0047-cuda-graph-capture-feasibility.md documents why a CUDA graph capture path on the per-frame submit chain is deferred rather than shipped (realised wall-clock gain capped at ~1-3% vs. the predicted 10-20%, with a 4-slot picture-pool rotation that defeats single-graph capture and forces per-frame cuGraphExecKernelNodeSetParams rebinding for (ref, dis) device pointers).
  • Invariant: the kernel_template.h docstring keeps naming VmafCudaKernelLifecycle.finished as a graph-capture hook point. Don't prune that comment on rebase — leaving the door open in the template is free, and the digest's "what needs to be true for a future GO" section depends on the hook still being there.
  • Re-test on rebase:
# Confirm the docstring still references graph capture as the hook
# point — wording change is fine, removal is not.
grep -q "graph capture" core/src/cuda/kernel_template.h

0223 — ADR slug-drift repair in CHANGELOG / rebase-notes (PR #304 follow-up)

  • Touches: CHANGELOG.md, docs/rebase-notes.md. No code; no upstream-shared path; no public-API surface.
  • Invariant: every [ADR-NNNN](docs/adr/NNNN-slug.md) link in the fork's tracked docs resolves to an actual on-disk file under docs/adr/. Repaired 4 broken slugs that did not exist on disk; the affected ADRs are 0138-iqa-convolve-avx2-bitexact-double, 0140-simd-dx-framework, 0190-ms-ssim-vulkan, and 0178-vulkan-adm-kernel. All retained their cited NNNN per ADR-0028 (NNNN is immutable once Accepted).
  • Re-test on rebase: from repo root, the following must print no lines:
for ref in $(grep -ohE 'docs/adr/[0-9]{4}-[a-z0-9-]+\.md' \
    CHANGELOG.md docs/rebase-notes.md AGENTS.md docs/state.md \
    | sort -u); do
  test -f "$ref" || echo "MISSING: $ref"
done
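
The same check can be written as a pure function, which is easier to unit-test than the shell loop. This is a sketch: `missing_adr_refs` is a hypothetical helper that takes document texts plus an existence predicate, and it reuses the regular expression from the loop above.

```python
import re

# Same pattern as the grep in the shell loop.
ADR_REF = re.compile(r"docs/adr/[0-9]{4}-[a-z0-9-]+\.md")


def missing_adr_refs(texts, exists):
    """Return sorted ADR paths cited in texts for which exists(path) is false."""
    refs = set()
    for text in texts:
        refs.update(ADR_REF.findall(text))
    return sorted(ref for ref in refs if not exists(ref))
```

Run from the repo root, one would pass the contents of the four tracked docs and `os.path.isfile` as the predicate; an empty result corresponds to the shell loop printing nothing.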

0125 — cambi_vulkan migrated to kernel_template (T-GPU-DEDUP-25, 5-bundle)

  • Touches:
  • core/src/feature/vulkan/cambi_vulkan.c: state's quintet (dsl_2bind + 5× pl_layout_* + shader_modules[CAMBI_PL_COUNT] + shared desc_pool) collapses to five VmafVulkanKernelPipeline bundles (pl_trivial, pl_derivative, pl_filter_mode, pl_decimate, pl_mask_dp), each owning its own descriptor pool. The first slot of pipelines[] per stage aliases the bundle's base pipeline; CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, and CAMBI_PL_MASK_THRESHOLD are sibling variants built via vmaf_vulkan_kernel_pipeline_add_variant().
  • cambi_vk_alloc_set takes a bundle pointer (->desc_pool / ->dsl) — every dispatch site picks the bundle that matches its push-constant struct.
  • The cambi_vk_make_dsl / cambi_vk_make_pl / cambi_vk_create_shader / cambi_vk_build_pipeline helpers are dropped — the template subsumes them.
  • Invariant — variants destroyed before bundle, base alias must be skipped. Five distinct push-constant struct sizes (CambiVkPushTrivial / CambiVkPushDerivative / CambiVkPushFilterMode / CambiVkPushDecimate / CambiVkPushMaskDp) force five bundles even though every stage's DSL is 2-binding SSBO; _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots (CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, CAMBI_PL_MASK_THRESHOLD) before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle.
  • Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): cambi mean = 0.0, identical to pre-migration (the pair has no banding artifacts).
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no Vulkan backend, so there is nothing to merge against.

0124 — ssimulacra2_vulkan migrated to kernel_template (T-GPU-DEDUP-24, 4-bundle)

  • Touches:
  • core/src/feature/vulkan/ssimulacra2_vulkan.c — state's 16 long-lived pipeline-object fields (4× *_dsl + *_pl + *_shader + the shared desc_pool) collapse to four VmafVulkanKernelPipeline bundles (pl_xyb, pl_mul, pl_blur, pl_ssim), each owning its own descriptor pool. The first slot of each per-bundle pipeline array (xyb_pipelines[0], mul_pipelines[0], blur_pipelines_h[0], ssim_pipelines[0]) aliases the bundle's base VkPipeline; remaining per-scale / per-pass slots are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • ss2v_build_pipeline_int3 reroutes through _add_variant() instead of calling vkCreateComputePipelines directly; ss2v_alloc_set takes a bundle pointer (->desc_pool / ->dsl) instead of a separate DSL argument; descriptor-set free sites at the tail of ss2v_run_scale route to each bundle's pool.
  • The ss2v_make_dsl / ss2v_make_pl / ss2v_create_shader helpers are dropped — the template subsumes them.
  • Invariant — variants destroyed before bundle, slot 0 alias must be skipped. Four distinct DSL shapes (XYB = 6 SSBOs, MUL = 3, BLUR = 2, SSIM = 8) prevent collapsing to one bundle: _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots in xyb_pipelines[1..N-1], mul_pipelines[1..N-1], ssim_pipelines[1..N-1], blur_pipelines_h[1..N-1], and every slot of blur_pipelines_v[] before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle, and must skip slot 0 of the first three arrays + blur_pipelines_h to avoid double-freeing the aliased base.
  • Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): ssimulacra2 mean = 24.613842, identical to pre-migration.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no ssimulacra2 extractor and no Vulkan backend, so there is nothing to merge against.

0118 — psnr_hvs_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-18)

  • Touches:
  • core/src/feature/vulkan/psnr_hvs_vulkan.c — state's dsl + pipeline_layout + shader + desc_pool + pipeline[3] collapses to VmafVulkanKernelPipeline pl + VkPipeline pipeline_chroma_u + VkPipeline pipeline_chroma_v. Plane 0 is the template's base pipeline; planes 1+2 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • New psnr_hvs_plane_pipeline() accessor maps plane index to the right VkPipeline handle.
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the chroma U/V variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7.
  • Numerical contract: unchanged. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0119 — vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-19)

  • Touches:
  • core/src/feature/vulkan/vif_vulkan.c — state's dsl + pipeline_layout + shader + desc_pool + pipelines[4] collapses to VmafVulkanKernelPipeline pl + VkPipeline scale_variants[3]. Scale 0 is the template's base pipeline; scales 1, 2, 3 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • New vif_scale_pipeline() accessor maps scale index to the right VkPipeline handle (replaces s->pipelines[scale]).
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 3 scale variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7 and psnr_hvs_vulkan in T-GPU-DEDUP-18.
  • Numerical contract: unchanged. Same shaders, same spec-constants, same push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0120 — float_vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-20)

  • Touches:
  • core/src/feature/vulkan/float_vif_vulkan.c — state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[2][4] 2-D lookup table is preserved so the existing [mode][scale] dispatch path stays clean, but pipelines[0][0] aliases s->pl.pipeline (the template's base). The other 6 entries are sibling pipelines created via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 6 sibling variants (every (mode, scale) except (0, 0)) before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan / psnr_hvs_vulkan / vif_vulkan.
  • Invariant — pipelines[0][0] aliasing. The base pipeline handle is owned by s->pl.pipeline; we copy it into pipelines[0][0] after _create() so the dispatch path can use a uniform 2-D lookup. The destroy loop must skip (mode=0, scale=0) to avoid double-freeing the template's pipeline.
  • Numerical contract: unchanged. Same shaders, spec-constants (mode + scale), push-constants. Netflix-pair smoke matches integer_vif bit-identically to 4 decimals.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0122 — float_adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-22)

  • Touches:
  • core/src/feature/vulkan/float_adm_vulkan.c — twin to adm_vulkan (T-GPU-DEDUP-21); 16-pipeline 2-D [stage][scale] array. State collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl. pipelines[0][0] aliases s->pl.pipeline; the other 15 entries are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariants:
  • Variants destroyed before bundle.
  • pipelines[0][0] aliasing — destroy loop must skip (stage=0, scale=0).
  • Numerical contract: unchanged. Same float (_s suffix) primitives from adm_tools.c; same 5-element spec-constant tuple; same float partial accumulation reduced in double on the host.

0121 — adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-21)

  • Touches:
  • core/src/feature/vulkan/adm_vulkan.c — state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[4][4] 2-D lookup is preserved so the per-stage dispatch path stays clean. pipelines[0][0] aliases s->pl.pipeline (the template's base); the other 15 entries are sibling pipelines via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariants:
  • Variants destroyed before bundle (same rule as ssim_vulkan / psnr_hvs / vif / float_vif).
  • pipelines[0][0] aliasing — destroy loop must skip (stage=0, scale=0) to avoid double-freeing the template's pipeline.
  • Numerical contract: unchanged. Same shaders + 5-element spec-constant tuple (width, height, bpc, scale, stage) + push-constants.
  • Rebase impact: low. Builds on top of PR #272.
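
The two invariants (variants first, alias slot skipped) reduce to a specific teardown order. The sketch below models that order for a [stage][scale] table and records the calls as tuples instead of touching Vulkan; `destroy_order` is a hypothetical helper used only for illustration.

```python
def destroy_order(stages, scales):
    """List teardown steps for a [stage][scale] table whose [0][0] is an alias."""
    steps = []
    for stage in range(stages):
        for scale in range(scales):
            if stage == 0 and scale == 0:
                continue  # alias of the bundle's base pipeline: skip it
            steps.append(("vkDestroyPipeline", stage, scale))
    # The bundle goes last; it releases the base pipeline exactly once.
    steps.append(("vmaf_vulkan_kernel_pipeline_destroy",))
    return steps
```

For the 4x4 table described above this yields 15 variant destroys followed by one bundle destroy, matching the "other 15 entries" count.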

0123 — ms_ssim_vulkan 2-bundle migration (T-GPU-DEDUP-23)

  • Touches:
  • core/src/feature/vulkan/ms_ssim_vulkan.c — state collapses decimate_dsl + decimate_pl + decimate_shader + ssim_dsl + ssim_pl + ssim_shader + desc_pool (7 fields) to two bundles VmafVulkanKernelPipeline pl_decimate + pl_ssim. Each bundle owns its own descriptor pool. The kernel has two distinct pipeline shapes (decimate = 2 SSBO bindings, ssim = 10 bindings), so two bundles is the minimum — _add_variant() only siblings pipelines under the same layout.
  • decimate_pipelines[0] aliases pl_decimate.pipeline (the template's base = scale 0). The remaining MS_SSIM_SCALES - 2 decimate variants (scales 1..3) are siblings via _add_variant().
  • ssim_pipeline_horiz[0] aliases pl_ssim.pipeline (base = scale 0, pass 0). The other 9 entries (4× ssim_pipeline_horiz for scales 1..4, plus 5× ssim_pipeline_vert for scales 0..4) are variants.
  • Invariant — variants destroyed before bundle. Same rule as ADR-0106 entry 0106: close_fex must destroy decimate_pipelines[1..3] and ssim_pipeline_horiz[1..4] + ssim_pipeline_vert[0..4] before calling vmaf_vulkan_kernel_pipeline_destroy() on pl_decimate / pl_ssim.
  • Invariant: [0] aliasing destroy-skip. decimate_pipelines[0] and ssim_pipeline_horiz[0] must not be passed to vkDestroyPipeline in close_fex; the bundle's _destroy() already releases them via pl_decimate.pipeline / pl_ssim.pipeline, and a double-free is UB. The destroy loops in close_fex start at i = 1 for decimate and skip i == 0 for ssim_horiz.
  • Invariant — per-bundle descriptor pool. The shared s->desc_pool is gone; alloc_descriptor_set now takes a const VmafVulkanKernelPipeline *bundle and uses bundle->desc_pool + bundle->dsl. Per-frame vkFreeDescriptorSets calls must target the matching pool (pl_decimate.desc_pool for decimate sets, pl_ssim.desc_pool for ssim sets) — mixing them is undefined behavior.
  • Numerical contract: unchanged. Same shaders, spec constants, push constants, and dispatch order as before. float_ms_ssim Netflix-pair smoke (576×324×48f) reports mean 0.963241; ssim pyramid intermediate values bit-identical to pre-migration run.
  • Rebase impact: low. Upstream Netflix has no Vulkan backend. Conflicts only against the parallel T-GPU-DEDUP-{18..22} PRs (#284–#288) on CHANGELOG.md / docs/rebase-notes.md — auto-resolve keeps both halves.
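
The variant counts quoted in this entry follow from the number of scales. A short worked sketch, assuming MS_SSIM_SCALES is 5 (inferred from the scale ranges 0..4 quoted above, not read from the source); `variant_counts` is a hypothetical helper.

```python
MS_SSIM_SCALES = 5  # assumption: inferred from the scale ranges quoted above


def variant_counts(scales=MS_SSIM_SCALES):
    """Return (decimate variants, ssim variants) created via _add_variant()."""
    decimate_total = scales - 1             # decimate_pipelines[0..scales-2]
    decimate_variants = decimate_total - 1  # minus the aliased base at [0]
    ssim_total = 2 * scales                 # one horiz + one vert per scale
    ssim_variants = ssim_total - 1          # minus ssim_pipeline_horiz[0]
    return decimate_variants, ssim_variants
```

With five scales this gives 3 decimate variants and 9 ssim variants, which are the numbers the destroy loops in close_fex have to cover.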

0106 — Vulkan kernel template multi-pipeline + ssim/motion migration (T-GPU-DEDUP-7)

  • Touches:
  • core/src/vulkan/kernel_template.h — new vmaf_vulkan_kernel_pipeline_add_variant() helper. Takes the base pipeline bundle (DSL / pipeline layout / shader / pool owned by vmaf_vulkan_kernel_pipeline_create) plus a partial VkComputePipelineCreateInfo and produces a sibling VkPipeline re-using the same layout / shader. The base _create and _destroy entry points are unchanged; existing consumers (psnr, moment, ciede) keep working.
  • core/src/feature/vulkan/motion_vulkan.c — state collapses VkPipeline pipelines[2] (kept "for SYCL parity" but functionally identical because COMPUTE_SAD goes through push constants, not spec-constants) to a single VmafVulkanKernelPipeline pl. create_pipelines / close_fex shrink to template-driven create + destroy.
  • core/src/feature/vulkan/ssim_vulkan.c — state becomes VmafVulkanKernelPipeline pl + VkPipeline pipeline_vert. Pass 0 (horizontal) is the template's base pipeline; pass 1 (vertical) is created via _add_variant(). close_fex destroys the variant first, then calls vmaf_vulkan_kernel_pipeline_destroy() on the bundle.
  • Invariant — no spec-constant drift between base and variant. _add_variant() overwrites sType / stage.sType / stage.stage / stage.module / layout of the caller's VkComputePipelineCreateInfo so the variant is guaranteed to share the base's shader and layout. Callers control the variant's spec-constant via pSpecializationInfo. Reordering these overwrites lets a consumer accidentally bind a different shader module under the same layout — UB at descriptor-set time.
  • Invariant: variant destroyed before bundle. close_fex in ssim must vkDestroyPipeline(s->pipeline_vert) before vmaf_vulkan_kernel_pipeline_destroy(&s->pl). The bundle's _destroy releases the descriptor pool, and the descriptor sets that vkAllocateDescriptorSets issued against the variant pipeline's layout are dropped cleanly only when the variant pipeline is already gone.
  • Numerical contract: unchanged. Both kernels run identical shaders + spec-constants + push-constants as before; only the Vulkan boilerplate that creates / destroys the pipeline scaffolding moved to a shared owner. Cross-backend parity gate at places=4 holds — Netflix-pair float_ssim smoke (576×324×48f) reports mean 0.863, identical to pre-migration.
  • Rebase impact: low. The base pipeline-bundle helpers predate this change (PR #270 / #271); the new _add_variant is additive. Upstream Netflix has no Vulkan backend to conflict with.
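
The overwrite rule in the first invariant can be sketched with plain dicts standing in for the Vulkan structs. This is an illustrative model only: `add_variant` here is a hypothetical Python stand-in for the C helper, and the dict keys mirror the VkComputePipelineCreateInfo field names mentioned above.

```python
def add_variant(base, create_info):
    """Return a variant create-info forced onto the base's shader and layout."""
    info = dict(create_info)
    stage = dict(info.get("stage", {}))
    # Fields the helper overwrites, so the variant cannot drift from the base.
    info["sType"] = "VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO"
    stage["sType"] = "VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO"
    stage["stage"] = "VK_SHADER_STAGE_COMPUTE_BIT"
    stage["module"] = base["shader"]
    info["stage"] = stage
    info["layout"] = base["pipeline_layout"]
    # pSpecializationInfo is left untouched: the caller controls it.
    return info
```

Whatever module or layout the caller fills in is replaced by the base's, while the specialization info passes through unchanged; that is the property the invariant asks future edits to preserve.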

0111 — integer_ciede_cuda migrated to kernel_template (T-GPU-DEDUP-11)

  • Touches:
  • core/src/feature/cuda/integer_ciede_cuda.c: state's CUstream + CUevent + CUevent + VmafCudaBuffer + host-pinned float* quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. init / collect / close call the template's lifecycle_init / readback_alloc / collect_wait / lifecycle_close / readback_free helpers. submit keeps the pre-launch wait inline (intentional: ciede has no atomic, so the template's pre-launch memset is unnecessary).
  • Numerical contract: unchanged. Pure CUDA-boilerplate consolidation. The host-side reduction in collect still uses the same double accumulator over per-block float partials — places=4 (ADR-0187) holds.

0112 — integer_moment_cuda migrated to kernel_template (T-GPU-DEDUP-12)

  • Touches:
  • core/src/feature/cuda/integer_moment_cuda.c — state's stream/event/device-buffer/host-pinned quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. submit calls vmaf_cuda_kernel_submit_pre_launch (atomic counters require the device-side memset). init / collect / close call the matching template helpers.
  • Numerical contract: unchanged. Same per-frame atomic accumulators (4× uint64), same sums_host[i] / n_pixels host division.
  • Rebase impact: low. Upstream Netflix has no equivalent template; this consolidation is fork-local.

0113 — integer_motion_v2_cuda migrated to kernel_template (T-GPU-DEDUP-13)

  • Touches:
  • core/src/feature/cuda/integer_motion_v2_cuda.c — stream/event pair + sad device+host quintet collapses to lc + rb. Raw-pixel ping-pong pix[2] stays outside the bundle. submit keeps the memset on pic_stream inline rather than calling submit_pre_launch (the helper would move the memset to lc.str, which races with the kernel reading the accumulator). init / collect / close call the matching template helpers.
  • Numerical contract: unchanged. Same D2D copy, same conditional kernel launch on frame ≥ 1, same host-side min(score[i], score[i+1]) flush.

0114 — integer_ssim_cuda migrated to kernel_template (T-GPU-DEDUP-14)

  • Touches:
  • core/src/feature/cuda/integer_ssim_cuda.c — stream/event/partials device+host quintet collapses to lc + rb. Five intermediate float buffers (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) stay outside the bundle. submit keeps the cuStreamWaitEvent + horiz + vert + DtoH chain inline — SSIM writes one float per block (no atomic), so the template's submit_pre_launch memset is unnecessary. init / collect / close use the matching template helpers.
  • Numerical contract: unchanged. Same horiz-then-vert two-pass pipeline, same per-block float partial reduction in double on the host. places=4 (matching the ciede_cuda precision pattern) holds.
  • Rebase impact: low. Upstream Netflix has no equivalent; this is fork-added.

0115 — ms_ssim_cuda + psnr_hvs_cuda lifecycle migration (T-GPU-DEDUP-15)

  • Touches:
  • core/src/feature/cuda/integer_ms_ssim_cuda.c — stream + 2-event lifecycle replaced with VmafCudaKernelLifecycle lc; multi-level pyramid + SSIM intermediate + 3-partials buffers stay outside the template's single-pair readback bundle.
  • core/src/feature/cuda/integer_psnr_hvs_cuda.c — same shape; 3-plane ref/dist/partials triples remain inline.
  • Numerical contract: unchanged. The migration only affects init / close boilerplate; submit / collect dispatch and host reduction paths are untouched apart from the s->str → s->lc.str / s->event → s->lc.submit / s->finished → s->lc.finished field renames.

0116 — float_psnr/ansnr/motion cuda → kernel_template (T-GPU-DEDUP-16)

  • Touches:
  • core/src/feature/cuda/float_psnr_cuda.c — stream/event/partials quintet → lc + rb; input upload buffers ref_in / dis_in stay outside the bundle.
  • core/src/feature/cuda/float_ansnr_cuda.c — same shape; rb wraps the (sig, noise) interleaved partials.
  • core/src/feature/cuda/float_motion_cuda.c — same shape; rb wraps the SAD partials, blur[2] ping-pong stays outside.
  • Numerical contract: unchanged. Same dispatch geometry, same reduction order. Cross-backend parity gate at the kernels' contracted precision (places=3 per ADR-0192) holds.

0117 — float_adm + float_vif cuda lifecycle migration (T-GPU-DEDUP-17)

  • Touches:
  • core/src/feature/cuda/float_adm_cuda.c — stream + 2-event lifecycle replaced with VmafCudaKernelLifecycle lc; multi-stage DWT + CSF pipeline state stays outside the template's single-pair readback bundle.
  • core/src/feature/cuda/float_vif_cuda.c — same shape; 4-level pyramid + per-scale (num, den) pairs remain inline.
  • Numerical contract: unchanged. The migration only affects init / close stream-event boilerplate; submit / collect dispatch and host reduction paths are untouched apart from the field renames.
  • Rebase impact: low. Upstream Netflix has no equivalent template; this is fork-added.

0107 — float_psnr_vulkan migrated to kernel_template (T-GPU-DEDUP-8)

  • Touches:
  • core/src/feature/vulkan/float_psnr_vulkan.c — state's dsl + pipeline_layout + shader + pipeline + desc_pool quintet is collapsed into a single VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy. No shader changes, no spec-constant changes, no push-constant changes.
  • Numerical contract: unchanged. The migration is a pure Vulkan-boilerplate consolidation. Cross-backend parity gate at places=4 holds — Netflix-pair smoke reports float_psnr mean 30.755 dB, identical to pre-migration.

0109 — float_ansnr_vulkan + motion_v2_vulkan migrated to kernel_template (T-GPU-DEDUP-9)

  • Touches:
  • core/src/feature/vulkan/float_ansnr_vulkan.c — single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
  • core/src/feature/vulkan/motion_v2_vulkan.c — same shape.
  • Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. Cross-backend parity gate at the kernel's contracted precision holds — Netflix-pair smoke reports float_ansnr mean 23.51 dB and motion2_v2_score mean 3.895, identical to pre-migration.

0110 — float_motion_vulkan migrated to kernel_template (T-GPU-DEDUP-10)

  • Touches:
  • core/src/feature/vulkan/float_motion_vulkan.c — single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
  • Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. Netflix-pair smoke reports motion mean 4.049 / motion2 mean 3.894, identical to pre-migration.
  • Rebase impact: low. Upstream Netflix has no Vulkan backend.

0108 — Bristol VI-Lab feasibility digest + BVI-CC ingest ADR (Draft)

  • Touches:
  • docs/research/0046-bristol-vi-lab-feasibility.md (new) — nine-dataset survey + use-case fit + effort estimate.
  • docs/adr/0241-bristol-bvi-cc-ingest.md (new, Status: Draft) — proposal to ingest BVI-CC as the second tiny-AI corpus.
  • docs/adr/README.md — index row for ADR-0241.
  • CHANGELOG.md — Added entry.
  • Numerical contract: not applicable (docs-only).
  • Rebase impact: none. Pure research deliverables; upstream Netflix has no equivalent surface.

0094 — Vulkan VkImage import v2 async pending-fence (T7-29 part 4 / ADR-0251)

  • ADR: ADR-0251; predecessor ADR-0186.
  • Touches:
  • core/src/vulkan/import.c — full rewrite of the submission path. Single-fence submit_and_wait becomes per-slot submit_to_slot + drain_slot_fence; the new slot_alloc / slot_release helpers materialise / tear down a ring slot (staging-pair + cmd buffer + fence). vmaf_vulkan_import_image indexes into the ring by frame_index % ring_size; vmaf_vulkan_wait_compute drains every outstanding fence. vmaf_vulkan_state_build_pictures waits the slot's fence before exposing the host pointer. Public-API signatures are unchanged.
  • core/src/vulkan/vulkan_internal.h — new struct VmafVulkanImportSlot; VmafVulkanImportSlots becomes a fixed-capacity VmafVulkanImportSlot ring[VMAF_VULKAN_RING_MAX] plus geometry + ring_size. Two new defines — VMAF_VULKAN_RING_DEFAULT (4) and VMAF_VULKAN_RING_MAX (8). VmafVulkanState gains requested_ring_size.
  • core/src/vulkan/common.c: vmaf_vulkan_state_init and _state_init_external set requested_ring_size = VMAF_VULKAN_RING_DEFAULT.
  • core/test/test_vulkan_async_pending_fence.c (new, contract smoke for the v1 → v2 swap).
  • core/test/meson.build — registers the new test under the existing enable_vulkan guard.
  • core/src/vulkan/AGENTS.md (new) — pins the three rebase-sensitive ring invariants.
  • docs/adr/0251-vulkan-async-pending-fence.md (new), docs/research/0042-vulkan-async-pending-fence.md (new), docs/api/gpu.md, docs/backends/vulkan/overview.md, CHANGELOG.md, docs/rebase-notes.md.
  • ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch: unchanged. The v2 ring is fully internal to VmafVulkanState; the public ABI stays byte-identical so the filter consumes the new path transparently.
  • Invariant 1 — fixed ring depth at first import. lazy_alloc_ring is the only place that materialises the ring; once allocated the depth never changes for the lifetime of the VmafVulkanState. Any caller that needs a different depth has to free + re-init. The geometry pinning contract from v1 (ADR-0186) is preserved verbatim.
  • Invariant 2 — vkResetFences only after VK_SUCCESS from vkWaitForFences. Sole reset path lives in drain_slot_fence; fence_in_flight flips back to 0 only after the wait succeeds. A -EIO from the wait propagates up without resetting (so a retry would correctly re-wait rather than silently move on).
  • Invariant 3 — state_free drains before destroying. vmaf_vulkan_import_slots_free walks the ring and calls drain_slot_fence on every in-flight slot, then issues one vkQueueWaitIdle belt-and-braces (any feature kernel that submitted on the same queue may still be running). Reordering this triggers validation-layer "destroying in-use object" errors.
  • Numerical contract: unchanged. Async submission only changes when the host can read the staging buffer, not which bytes the GPU writes. Cross-backend parity gate (scripts/ci/cross_backend_parity_gate.py, places=4) holds.
  • Memory delta: staging arena scales 1 → ring_size per direction. At default depth and 1080p 8-bit Y, the per-state host-visible footprint grows from ~4 MiB to ~16 MiB. Documented in ADR-0251 §Consequences.
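
The three ring invariants can be modelled in a few lines. This is a hypothetical Python model, not the C code in import.c: Slot, Ring, and the wait_ok flag are illustrative names, and draining a slot before reusing it is an assumption. Only the frame_index % ring_size indexing, the reset-only-after-successful-wait rule, and the drain-before-destroy order come from the notes above.

```python
import errno

RING_DEFAULT = 4  # mirrors VMAF_VULKAN_RING_DEFAULT
RING_MAX = 8      # mirrors VMAF_VULKAN_RING_MAX


class Slot:
    """Hypothetical stand-in for one ring slot (staging pair + cmd buffer + fence)."""

    def __init__(self):
        self.fence_in_flight = False
        self.resets = 0  # counts simulated vkResetFences calls


class Ring:
    def __init__(self, ring_size=RING_DEFAULT):
        if not 1 <= ring_size <= RING_MAX:
            raise ValueError("ring_size out of range")
        # Invariant 1: the depth is fixed once the ring exists (immutable tuple).
        self.slots = tuple(Slot() for _ in range(ring_size))

    def slot_for(self, frame_index):
        return self.slots[frame_index % len(self.slots)]

    def drain(self, slot, wait_ok=True):
        """Wait on the slot's fence; reset it only if the wait succeeded."""
        if not slot.fence_in_flight:
            return 0
        if not wait_ok:
            # Invariant 2: a failed wait propagates without resetting the fence.
            return -errno.EIO
        slot.resets += 1
        slot.fence_in_flight = False
        return 0

    def submit(self, frame_index):
        slot = self.slot_for(frame_index)
        rc = self.drain(slot)  # assumption: reuse waits for the previous submit
        if rc:
            return rc
        slot.fence_in_flight = True
        return 0

    def free(self):
        # Invariant 3: drain every in-flight slot before tearing the ring down.
        for slot in self.slots:
            self.drain(slot)
        self.slots = ()
```

With the default depth, frame 5 lands in slot 1, and a failed wait leaves fence_in_flight set so that a retry waits again instead of moving on.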

0090 — cambi_vulkan extractor (T7-36 / ADR-0210)

  • ADR: ADR-0210; predecessor ADR-0205.
  • Touches:
  • core/src/feature/vulkan/cambi_vulkan.c (replaces the spike scaffold's init_stub/extract_stub/close_stub triple with the full Vulkan-aware lifecycle).
  • core/src/feature/vulkan/shaders/cambi_preprocess.comp (new), cambi_mask_dp.comp (new — unified row-SAT / col-SAT / threshold-compare via PASS=0/1/2 spec const).
  • core/src/feature/cambi.c — appends a small block of public trampolines (vmaf_cambi_*) at the bottom of the file that thinly wrap the file-static helpers. No upstream function-static code is renamed or moved; the entire upstream body of cambi.c above the trampolines stays byte-identical, which keeps Netflix sync straightforward.
  • core/src/feature/cambi_internal.h (new) — internal-only header exposing vmaf_cambi_calculate_c_values, vmaf_cambi_get_spatial_mask, etc., to the GPU twin.
  • core/src/vulkan/meson.build — registers the 5 cambi shaders in vulkan_shader_sources[] and cambi_vulkan.c in vulkan_sources.
  • core/src/feature/feature_extractor.c — adds the extern decl + registry entry for vmaf_fex_cambi_vulkan under #if HAVE_VULKAN.
  • scripts/ci/cross_backend_vif_diff.py: cambi row in FEATURE_METRICS so the cross-backend gate runs at places=4 against the CPU baseline.
  • docs/adr/0210-cambi-vulkan-integration.md, docs/research/0032-cambi-vulkan-integration.md, docs/backends/vulkan.md, CHANGELOG.md.
  • Invariant 1 — bit-exactness by construction. Every GPU phase is integer arithmetic (uint16 derivative, int32 SAT, > compare, stride-2 gather, 3-element mode3 lookup). The readback into the host VmafPicture pair is byte-identical to what the CPU would have written; the host residual then runs the unmodified CPU calculate_c_values + spatial pooling on those buffers. Any rebase that introduces float arithmetic into one of these GPU phases — e.g., a future Netflix change to the derivative kernel that adds a bilinear interpolation step — will silently break places=4 and must be caught at the cross-backend gate.
  • Invariant 2 — cambi_internal.h signatures must stay in lock-step with cambi.c's file-static helpers. The Vulkan twin calls vmaf_cambi_calculate_c_values, which trampolines to the file-static calculate_c_values. Any signature change to the latter (extra parameters, type changes) must update the trampoline + header in the same PR or the GPU build breaks.
  • On upstream sync: cambi.c's file-static helpers are sometimes renamed by upstream (e.g., a decimate → cambi_decimate rename could happen during a Netflix tidy-up). When rebasing, search cambi.c's tail for the trampoline block — the static helpers it calls (get_spatial_mask, decimate, filter_mode, calculate_c_values, spatial_pooling, weight_scores_per_scale, get_pixels_in_window, increment_range, decrement_range, get_derivative_data_for_row, cambi_preprocessing) need to match the upstream symbol names. Update the trampoline body if upstream renames; signatures should not need to change because the trampoline already takes the function-pointer-typedef form (VmafRangeUpdater etc.).
  • Re-test on rebase: python3 scripts/ci/cross_backend_vif_diff.py --backend vulkan --feature cambi --ref testdata/ref_576x324_48f.yuv --dist testdata/dis_576x324_48f.yuv --width 576 --height 324 --pixel-format 420 --bitdepth 8 --frames 48. Should emit places=4 PASS with max_abs_diff = 0.0. If it diverges, bisect the GPU phases by reading back individual buffers (image_buf / mask_buf / deriv_buf) and comparing against the CPU's in-place pic plane after the equivalent stage.
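
Invariant 1 rests on integer arithmetic being independent of evaluation order. A minimal sketch of that property, using a plain-Python summed-area table rather than the actual shader code (the function names are illustrative only):

```python
from itertools import accumulate


def sat_rows_then_cols(img):
    """Summed-area table: prefix-sum each row, then each column."""
    rows = [list(accumulate(r)) for r in img]
    cols = [list(accumulate(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def sat_cols_then_rows(img):
    """Same table with the two passes swapped."""
    cols = [list(accumulate(c)) for c in zip(*img)]
    rows = [list(r) for r in zip(*cols)]
    return [list(accumulate(r)) for r in rows]


img = [[1, 2, 3], [4, 5, 6]]

# Integer passes commute exactly, so any pass order gives identical bytes.
int_match = sat_rows_then_cols(img) == sat_cols_then_rows(img)

# Float addition is not associative; regrouping can change the last bit.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)
float_match = float_left == float_right
```

Here int_match is True while float_match is False, which is why a float step in any GPU phase would put the max_abs_diff = 0.0 expectation at risk.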

The pre-ADR-0108 fork-local PRs are summarised by workstream rather than per-PR. Future PRs add entries individually.

0085 — Upstream c70debb1 partial port (adm_csf + barten_csf tests)

  • No ADR. Pure upstream cherry-pick per ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
  • Upstream source: c70debb1 (Kyle Swanson, 2026-04-28): "libvmaf/test: port new adm/vif/speed tests". The audit row that flagged the gap is T-NEW-2 in the 2026-04-29 quarterly upstream-backlog re-audit (PR #205).
  • Touches (additive only):
  • core/src/feature/adm_csf_tools.h — new header (verbatim from upstream); declares the inline adm_native_csf helper (DLM-paper CSF) used by the new test_adm_csf unit.
  • core/test/test_adm_csf.c — new unit (verbatim from upstream); 2 mu_assert cases on adm_native_csf(3, 3.0, 1080, {0, 45}).
  • core/test/test_barten_csf.c — new unit (verbatim from upstream); 23 mu_assert cases over barten_rod_cone_sens, barten_mtf, barten_csf, linear_interpolate, barten_watson_blend_csf (all symbols already on the fork).
  • core/test/meson.build — registers the two new executables + adds test('test_adm_csf', ...) and test('test_barten_csf', ...).
  • CHANGELOG.md Unreleased § Changed.
  • Deliberate scope cuts (the upstream commit's other halves are not portable verbatim):
  • test_vif_tools.c — depends on upstream symbols NUM_KERNELSCALES, the 21-entry valid_kernelscales table, vif_validate_kernelscale, vif_get_filter_size, vif_get_filter, speed_get_antialias_filter, and a [NUM_KERNELSCALES][5][65] filter table that the fork's vif_filter1d_table_s [11][4][65] does not match. Per Research-0024 Strategy E, the fork deliberately diverges from the upstream vif runtime-helper chain to preserve the ADR-0138 / 0139 / 0142 / 0143 SIMD bit-exactness contract. Porting this test requires porting the runtime helpers first.
  • test_speed_chroma.c: #includes feature/speed.c directly; the fork has no SpEED extractor (feature/speed.c does not exist). Pairs with audit row T-NEW-1 (port the SpEED extractor wholesale, or absorb it into the tiny-AI speed metric).
  • Invariants (rebase-relevant):
  • The new adm_csf_tools.h header is wholly additive and does not conflict with the existing fork adm_csf_s non-inline helper in adm_tools.h (different signature, different translation units).
  • The two new tests do not depend on Netflix golden YUVs — they evaluate the closed-form CSF math directly. No golden-data interaction.
  • On upstream sync: a future port of the upstream vif runtime-helper chain (Research-0024 Strategy A reversal) or the SpEED extractor (T-NEW-1) unlocks the deferred halves of this commit. Until then, fork-side test_vif_tools.c / test_speed_chroma.c stay absent.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu test_adm_csf test_barten_csf
meson test -C build-cpu test_adm_csf test_barten_csf

0084 — Embedded MCP server scaffold (T5-2, ADR-0209)

  • ADR: ADR-0209 (audit-first scaffold) on top of the ADR-0128 governance + Research-0005 design.
  • Upstream source: fork-local. Netflix/vmaf has no embedded MCP server (and no plans to add one — the workflow is agent-tooling-specific, well outside upstream's library scope).
  • Touches:
  • core/include/libvmaf/libvmaf_mcp.h — new public header.
  • core/include/core/meson.build — new if get_option('enable_mcp') install branch.
  • core/src/mcp/ — new directory: mcp.c (stub TU) + meson.build (exposes mcp_sources + mcp_defines).
  • core/src/meson.build — new is_mcp_enabled guard + subdir('mcp') block; mcp_sources threaded into the library('vmaf', ...) source list alongside dnn_sources.
  • core/test/meson.build — new if get_option('enable_mcp') block wiring test_mcp_smoke.
  • core/test/test_mcp_smoke.c — new 12-sub-test smoke.
  • core/meson_options.txt — new enable_mcp umbrella + three sub-flags (all default false).
  • Invariant: every public entry point in libvmaf_mcp.h (vmaf_mcp_init / _start_sse / _start_uds / _start_stdio / _stop / _close) returns -ENOSYS (or -EINVAL on bad arguments) until the T5-2b runtime PR lands. The smoke pins this contract — a runtime PR that flips a return code without flipping the smoke expectation regresses the gate.
  • On upstream sync: zero interaction with upstream files. Wholly additive directory + boolean build flags. The subdir('mcp') insertion in core/src/meson.build lives next to the existing subdir('dnn') / Vulkan blocks; an upstream conflict in that area would be confined to those few lines and is mechanical to resolve.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_mcp=false
ninja -C build-cpu && meson test -C build-cpu  # baseline still green

meson setup --reconfigure build-cpu libvmaf -Denable_mcp=true \
            -Denable_mcp_sse=true -Denable_mcp_uds=true -Denable_mcp_stdio=true
ninja -C build-cpu
meson test -C build-cpu test_mcp_smoke  # 12/12 sub-tests pass

0065 — T7-37 Netflix bench rerun + docs/benchmarks.md TBD fill

  • No ADR. Empirical fill of pre-existing TBD cells; no new decision. The bench script fixes that this rerun depends on shipped earlier under PR #169 (libvmaf/AGENTS.md backend-engagement foot-guns), PR #170 (--backend cuda actually engages CUDA), and PR #171 (testdata/bench_all.sh uses correct flags). Vulkan header install for SDK consumers is PR #175.
  • Touches (additive only): docs/benchmarks.md (every TBD cell replaced with measured numbers; hardware-profile table updated to the ryzen-4090-arc host the rerun was performed on; "How to reproduce" section now documents fixture acquisition for the gitignored BBB 4K 200-frame pair). CHANGELOG.md Unreleased § Changed entry.
  • Invariants (rebase-relevant): none. The numbers are tied to fork commit 41301496 and the ryzen-4090-arc profile; an upstream rebase that changes feature pipelines would invalidate the table but not break parsing.
  • On upstream sync: zero interaction. Pure docs.
  • Re-test on rebase: bash testdata/bench_all.sh (after a fresh fork build) — confirms the bench script still drives all four backends and that the per-row metrics-key counts (CPU=15, CUDA=12, SYCL/Vulkan=34) still distinguish them. If they collapse to one count, the new upstream broke a backend dispatcher silently.
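
The key-count check can be sketched as below. It assumes the per-backend counts have already been extracted from the bench output; how bench_all.sh reports them is not shown in these notes, and check_key_counts is a hypothetical helper.

```python
# Expected per-backend metrics-key counts quoted in the entry above.
EXPECTED = {"cpu": 15, "cuda": 12, "sycl": 34, "vulkan": 34}


def check_key_counts(observed, expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(observed) > 1 and len(set(observed.values())) == 1:
        problems.append(
            "all backends report the same key count; "
            "a dispatcher may have fallen back silently"
        )
    for backend, want in expected.items():
        got = observed.get(backend)
        if got != want:
            problems.append(f"{backend}: expected {want} keys, got {got}")
    return problems
```

Passing the expected counts returns an empty list; passing the same count for all four backends reports the collapse first, then each backend that deviates.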

0050 — float_adm_cuda + float_adm_sycl extractors (ADR-0202)

  • ADR: ADR-0202
  • Touches:
  • core/src/feature/cuda/float_adm/float_adm_score.cu (new)
  • core/src/feature/cuda/float_adm_cuda.{c,h} (new)
  • core/src/feature/sycl/float_adm_sycl.cpp (new)
  • core/src/meson.build — three changes: (1) new float_adm_score entry in cuda_cu_sources, (2) new cuda_cu_extra_flags dict that threads --fmad=false + -Xcompiler=-ffp-contract=off into the float_adm_score fatbin only, (3) new SYCL source in sycl_feature_sources.
  • core/src/feature/feature_extractor.c (extern decls + list entries for vmaf_fex_float_adm_cuda / vmaf_fex_float_adm_sycl under #if HAVE_CUDA / #if HAVE_SYCL).
  • Invariant 1 — --fmad=false for the float_adm fatbin only: the angle-flag dot product (ot_dp = oh*th + ov*tv) and the cube reductions (xa*xa*xa, csf_o*csf_o*csf_o) require IEEE-754 add/mul ordering to match the GLSL precise qualifier in float_adm.comp. NVCC's default -fmad=true fuses these and drifts past places=4 at scale 3 / adm2. The integer ADM kernels share cuda_flags but use int64 accumulators where FMA is irrelevant — keep the FMA-on default for them.
  • Invariant 2 — parent-LL dimension trap: stage 0 at scale > 0 reads the parent's LL band; the mirror/clamp bounds are scale_w/h[scale] (= parent's LL output dims = current scale's input dims), NOT scale_w/h[scale - 1] (= parent's full image dims). Both float_adm_cuda.c and float_adm_sycl.cpp cite this inline. Do not "simplify" by using the off-by-one neighbour.
  • Re-test:
CXX=icpx CC=icx meson setup build-cs -Denable_cuda=true \
     -Denable_sycl=true -Denable_vulkan=enabled \
     -Denable_float=true \
     -Dsycl_compiler=/opt/intel/oneapi/compiler/latest/bin/icpx
ninja -C build-cs
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-cs/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature float_adm \
  --backend cuda --places 4
# Same with --backend sycl on a host with a SYCL device.
# Both must report 0/N mismatches at places=4.
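
Why fusing matters for Invariant 1 can be shown with a float32 emulation. This is a sketch with hand-picked inputs, not values taken from the ADM kernels; f32 and the two helper functions are illustrative names.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_add_unfused(a, b, c):
    # Two roundings: the product is rounded to float32 before the add.
    return f32(f32(a * b) + c)


def mul_add_fused(a, b, c):
    # One rounding: a float32 x float32 product is exact in double, and for
    # the inputs below the sum is exact too, so only the final f32() rounds.
    # For arbitrary inputs this is only an approximation of a hardware FMA.
    return f32(a * b + c)


a = b = f32(1.0 + 2.0**-12)
c = -f32(1.0 + 2.0**-11)

unfused = mul_add_unfused(a, b, c)  # product rounds to 1 + 2^-11, sum is 0
fused = mul_add_fused(a, b, c)      # exact product keeps the 2^-24 term
```

The unfused result is 0.0 and the fused result is 2^-24. The inputs are the same and the answers differ, which is the kind of divergence the flag prevents.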

0049 — float_adm_vulkan extractor (ADR-0199)

  • ADR: ADR-0199
  • Touches:
  • core/src/feature/vulkan/float_adm_vulkan.c (new)
  • core/src/feature/vulkan/shaders/float_adm.comp (new)
  • core/src/vulkan/meson.build (adds the .comp shader and the new .c source)
  • core/src/feature/feature_extractor.c (extern decl + list entry under #if HAVE_VULKAN)
  • scripts/ci/cross_backend_vif_diff.py (float_adm entry in FEATURE_METRICS)
  • .github/workflows/tests-and-quality-gates.yml (lavapipe float_adm step at places=4)
  • Invariant: float_adm GPU port uses the 2 * sup - idx - 1 mirror form on both axes — matches both the scalar adm_dwt2_s and the AVX2 float_adm_dwt2_avx2, which both consume the same dwt2_src_indices_filt_s index buffer. This is intentionally different from float_vif's GPU mirror (ADR-0197), which uses -2 because float_vif's AVX2 path takes a different code branch. Do not "fix" the asymmetry by analogy with float_vif.
  • Re-test:
meson setup build-vk -Denable_vulkan=enabled -Denable_cuda=false \
                     -Denable_sycl=false
ninja -C build-vk
meson test -C build-vk
VK_LOADER_DRIVERS_SELECT='*lvp*' python3 \
  scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-vk/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature float_adm --places 4
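
The two mirror forms differ only in whether the edge sample is repeated. A minimal sketch, assuming sup is the exclusive upper bound of the axis and showing only the upper-edge overrun (the lower edge is not described in these notes); the function names are illustrative:

```python
def mirror_minus1(idx, sup):
    """2 * sup - idx - 1: reflect past the upper edge, repeating the edge sample."""
    return 2 * sup - idx - 1 if idx >= sup else idx


def mirror_minus2(idx, sup):
    """2 * sup - idx - 2: reflect past the upper edge without repeating it."""
    return 2 * sup - idx - 2 if idx >= sup else idx


sup = 5
adm_style = [mirror_minus1(i, sup) for i in range(3, 7)]  # [3, 4, 4, 3]
vif_style = [mirror_minus2(i, sup) for i in range(3, 7)]  # [3, 4, 3, 2]
```

Swapping one form for the other shifts every out-of-range tap by one sample.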

0083 — SSIMULACRA 2 Vulkan kernel (ADR-0201)

meson setup core/build-vk-ss2 \
  -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false \
  libvmaf
ninja -C core/build-vk-ss2 tools/vmaf
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build-vk-ss2/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 \
  --feature ssimulacra2 --backend vulkan --places 1
# expected: max_abs_diff ≈ 1.59e-2, 0/48 mismatches at places=1
  • Follow-ups:
  • CUDA + SYCL twins (batch 3 parts 7b + 7c per ADR-0192).
  • Performance follow-up: re-bin multiple rows / columns per WG in the IIR blur (currently local_size = 1, one row/col per WG for correctness).
  • Optional: rename psnr_hvs_strict_shaders to strict_shaders in core/src/vulkan/meson.build (cosmetic — out of scope for this PR).

0001 — SIMD bit-identical reductions for float ADM

  • Workstream PRs: #18, commits 24c88a32, f082cfd3.
  • Touches: core/src/feature/integer_adm.c, core/src/feature/float_adm.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/arm64/adm_neon.c, upstream python/test/feature_extractor_test.py test expectations.
  • Invariant: sum_cube and csf_den_scale accumulate cubed values in double precision (via _mm256_cvtps_pd / _mm512_cvtps_pd) in scalar, AVX2, AVX-512, and NEON. Upstream accumulates in float, which produces ~8e-5 drift between scalar and SIMD. Test expectations were tightened to match the double-precision path; an upstream-side accumulator change would re-introduce the drift and break the tightened assertions.
  • Re-test: meson test -C build --suite=fast && python -m pytest python/test/feature_extractor_test.py -k adm.
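
The effect of the accumulator width can be sketched with an emulation. The input data, the lane count of 8, and the choice to cube after widening are assumptions made for illustration; this is not the fork's kernel code.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_cubes_f32(values, lanes):
    """Accumulate x^3 in float32 over `lanes` partial sums (lanes=1 is scalar)."""
    acc = [0.0] * lanes
    for i, x in enumerate(values):
        cube = f32(f32(x * x) * x)
        acc[i % lanes] = f32(acc[i % lanes] + cube)
    total = 0.0
    for partial in acc:
        total = f32(total + partial)
    return total


def sum_cubes_f64(values, lanes):
    """Same reduction with float32 inputs widened to double before summing."""
    acc = [0.0] * lanes
    for i, x in enumerate(values):
        acc[i % lanes] += x * x * x
    return sum(acc)


# Synthetic float32 inputs; not taken from any VMAF fixture.
values = [f32(0.5 + 0.001 * (i % 97)) for i in range(1024)]

s32_scalar = sum_cubes_f32(values, 1)
s32_simd = sum_cubes_f32(values, 8)
s64_scalar = sum_cubes_f64(values, 1)
s64_simd = sum_cubes_f64(values, 8)

print("float32 scalar vs 8-lane:", abs(s32_scalar - s32_simd))
print("float64 scalar vs 8-lane:", abs(s64_scalar - s64_simd))
```

The double-precision pair agrees to roughly one part in 10^13 here regardless of lane count, while the float32 pair is bounded only at around one part in 10^5.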

0002 — CUDA ADM decouple-inline buffer elimination

  • Workstream PRs: commit 787e3382.
  • Touches: core/src/feature/cuda/integer_adm_cuda.cu, core/src/feature/cuda/adm_decouple_inline.cuh (new), core/src/feature/cuda/meson.build. Upstream's adm_decouple.cu is no longer compiled in the fork.
  • Invariant: CSF and CM CUDA kernels read ref / dis DWT2 buffers directly and compute decouple_r / decouple_a inline via __device__ helpers in adm_decouple_inline.cuh. The 6 intermediate buffers (decouple_r, decouple_a, csf_a × {scale-0 int16, scales 1-3 int32}) and the standalone adm_decouple.cu source are intentionally removed. ~107 MB GPU memory savings at 4K. An upstream change to adm_decouple.cu will look orphaned and a literal merge would re-introduce the buffer allocations.
  • Re-test: meson setup build -Denable_cuda=true && ninja -C build && meson test -C build --suite=cuda.

0003 — SYCL backend (USM pool / D3D11 import / vmaf_sycl_* API)

  • Workstream PRs: #33, #35, #5 (initial scaffolding), and the picture-pool deadlock fix that landed via #32.
  • Touches: core/include/libvmaf/libvmaf_sycl.h, core/src/sycl/, core/src/feature/sycl/, core/src/libvmaf.c (SYCL public-API entry points), meson_options.txt (enable_sycl).
  • Invariant: vmaf_sycl_preallocate_pictures constructs a real VmafSyclPicturePool honoring VmafSyclPicturePreallocationMethod (NONE / DEVICE / HOST); vmaf_sycl_picture_fetch dispatches to the pool when configured. The whole SYCL tree is fork-local and has no upstream counterpart — upstream changes to core/src/libvmaf.c near the SYCL entry-point block are likely to conflict. Picture-pool error paths in vmaf_read_pictures (libvmaf.c) must goto cleanup; rather than return err; to avoid leaking ref/dist pictures into the live-picture set (closes the always-on-pool deadlock fixed in #32 — see ADR-0104). See ADR-0101, ADR-0103, ADR-0104.
  • Re-test: meson setup build -Denable_sycl=true && ninja -C build && meson test -C build --suite=sycl (requires oneAPI / icpx).

0004 — DNN runtime + tiny-AI surfaces

  • Workstream PRs: #5, #8, #21, #22, #23, #31, #34, plus the pre-numbered DNN feat commits (9b985946, 1e5336d3, d122b721).
  • Touches: core/include/libvmaf/dnn.h, core/src/dnn/, core/src/feature/feature_lpips.c, model/tiny/, meson_options.txt (enable_onnxruntime).
  • Invariant: ordered EP selection (CUDA → DML → CPU) with graceful fallback (ADR-0102); fp16_io does host-side fp32↔fp16 cast on the scoring path; VMAF_TINY_MODEL_DIR enforces a path jail on model load (PR #31); the runtime op-allowlist (PR #21) walks the ONNX graph and rejects unknown ops + bounds Loop/If trip_count at 1024 (ADR-0036/0107). DNN tree is fork-local; upstream has no DNN code yet, so conflicts here are unlikely but the meson_options.txt and core/src/meson.build blocks near the DNN flag may collide.
  • Re-test: meson setup build -Denable_onnxruntime=true && ninja -C build && meson test -C build --suite=dnn.
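
The ordered-selection part of the invariant reduces to a filter over a preference list. This sketch shows only that ordering logic: the strings are ONNX Runtime's provider names, select_providers is a hypothetical helper, and the fork's actual C-side fallback is not shown here.

```python
PREFERENCE = (
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
)


def select_providers(available, preference=PREFERENCE):
    """Return the preferred providers that are available, best first."""
    chosen = [ep for ep in preference if ep in available]
    if not chosen:
        raise RuntimeError("no usable execution provider")
    return chosen
```

On a CPU-only host the result is the single CPU provider; with CUDA present it comes first and CPU stays as the fallback.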

0005 — --precision CLI flag (IEEE-754 round-trip lossless)

  • Workstream PRs: commit c989fbd9.
  • Touches: core/tools/vmaf.c, core/tools/cli_parse.c, core/include/libvmaf/libvmaf.h (added vmaf_write_output_with_format), core/src/output.c.
  • Invariant: default --precision is %.17g (round-trip lossless); legacy opts back into upstream's %.6f; the public C API gained vmaf_write_output_with_format and the old vmaf_write_output routes through it with the %.17g default. ABI-breaking only if upstream adds a same-named function with a different signature. See ADR-0006.
  • Re-test: vmaf -r ref.yuv -d dis.yuv ... --precision=full and diff against --precision=legacy.
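
The round-trip property behind the two formats can be checked directly. The sample value is the golden VMAF mean quoted in entry 0012; Python's % formatting follows the same printf conventions as the C format strings.

```python
x = 76.66890519623612  # golden mean quoted in entry 0012

full = "%.17g" % x    # 17 significant digits: parses back to the same double
legacy = "%.6f" % x   # 6 decimals: "76.668905", information is lost

full_round_trips = float(full) == x
legacy_round_trips = float(legacy) == x
```

full_round_trips is True and legacy_round_trips is False, so a diff between the two outputs is expected to show differences from the seventh decimal onward.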

0006 — Netflix golden tests preserved verbatim as required gate

  • Workstream PRs: across the fork's life; codified in ADR-0024.
  • Touches: python/test/quality_runner_test.py, python/test/vmafexec_test.py, python/test/vmafexec_feature_extractor_test.py, python/test/feature_extractor_test.py, python/test/result_test.py, python/test/resource/yuv/.
  • Invariant: assertAlmostEqual(...) golden values in the five upstream Python test files are never modified by this fork. Fork-added tests live in separate files (e.g. python/test/test_precision_flag.py). The CI gate "Netflix CPU golden tests (D24)" is required and blocks merge. Upstream changes to these files are accepted unless they relax the assertions.
  • Re-test: make test-netflix-golden.

0007 — Build system (CUDA 13.2, oneAPI 2025.3, MkDocs migration)

  • Workstream PRs: #7, #17, commit 8a995cb0.
  • Touches: meson.build, meson_options.txt, top-level Makefile, docs/ (Sphinx → MkDocs Material migration — docs/conf.py removed, mkdocs.yml added), docs/requirements.txt, Dockerfile.*, distro install scripts under scripts/.
  • Invariant: image pins are non-conservative (ADR-0027) — CUDA 13.2, oneAPI 2025.3, clang-format 22, black 26 — and ship experimental toolchain flags (--expt-relaxed-constexpr, etc.) deliberately. An upstream sync that pulls in a Dockerfile change targeted at older CUDA or older oneAPI must not relax the pins.
  • Re-test: meson setup build -Denable_cuda=true -Denable_sycl=true && ninja -C build && mkdocs build --strict.

0008 — Workspace / docs / MATLAB / resource-tree relocations

  • Workstream PRs: codified across ADR-0026, ADR-0029, ADR-0030, ADR-0031, ADR-0032, ADR-0033, ADR-0034, ADR-0038.
  • Touches: any path-walk in upstream's CI / scripts / docs that assumes the upstream layout (root-level workspace/, resource/, matlab/, root unittest script, root patches/).
  • Invariant: the fork's layout is python/vmaf/workspace/, python/vmaf/resource/, python/vmaf/matlab/, scripts/unittest, ffmpeg-patches/ only, .github/codeql-config.yml. Upstream moves to a different sub-tree (e.g. a hypothetical tools/workspace/) need to either be applied via a corresponding fork-side relocation or rejected with a rebase note.
  • Re-test: python -m pytest python/test/ -k golden (verifies the resource-tree path works); make test-netflix-golden.

0009 — License headers (Lusoris/Claude on wholly-new files; 2016–2026 on Netflix files)

  • Workstream PRs: commits c159761d, a185f8ef, 0e98c949, codified in ADR-0025 / ADR-0105.
  • Touches: every wholly-new fork file (notably the SYCL tree and core/src/dnn/) and every Netflix-touched file (year range 2016 → 2016–2026).
  • Invariant: wholly-new fork files carry Copyright 2026 Lusoris and Claude (Anthropic) under the same BSD-3-Clause-Plus-Patent license; mixed files use a dual-copyright notice. An upstream commit that resets a Netflix file's year range (e.g. back to 2016–2020) must be partially rejected — keep the fork's 2016–2026.
  • Re-test: check that wholly-new fork files retain the Lusoris/Claude header (grep -L "Copyright 2026 Lusoris" core/src/sycl/*.cpp lists the files that lack the header, so it is expected to print nothing).
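
A scripted equivalent of the grep -L check, as a minimal sketch. files_missing_header is a hypothetical helper, and the temporary directory only stands in for a source tree.

```python
import tempfile
from pathlib import Path

HEADER = "Copyright 2026 Lusoris"


def files_missing_header(root, pattern="*.cpp", needle=HEADER):
    """Return files under `root` matching `pattern` that lack `needle`."""
    missing = []
    for path in sorted(Path(root).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle not in text:
            missing.append(path)
    return missing


# Demo on a throwaway directory instead of the real SYCL tree.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.cpp").write_text(
        "// Copyright 2026 Lusoris and Claude (Anthropic)\n", encoding="utf-8"
    )
    Path(tmp, "bad.cpp").write_text("// no header\n", encoding="utf-8")
    missing_names = [p.name for p in files_missing_header(tmp)]
```

An empty list corresponds to grep -L printing nothing.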

0010 — .claude/ agent scaffolding + ADR tree + AGENTS.md / CLAUDE.md

  • Workstream PRs: #14, #24, #37, plus continuous additions.
  • Touches: .claude/, AGENTS.md, CLAUDE.md, docs/adr/, .github/PULL_REQUEST_TEMPLATE.md.
  • Invariant: this whole tree is fork-local and has no upstream counterpart. Upstream additions to .github/ (issue templates, workflows) need to merge cleanly with the fork's existing files rather than replacing them. The ADR tree's IDs ≤ 0099 are backfills; new decisions start at 0100 (ADR-0028 / ADR-0106).
  • Re-test: visual review of .github/ and docs/adr/README.md after the merge.

Pre-ADR-0108 entries above are the result of a one-shot backfill sweep on 2026-04-18; subsequent fork-local PRs add their own entries inline.

0011 — Nightly bisect-model-quality + fixture cache

  • Workstream PRs: closes #4; sticky tracker issue #40.
  • Touches: .github/workflows/nightly-bisect.yml, ai/scripts/build_bisect_cache.py, ai/testdata/bisect/{features.parquet, models/*.onnx, README.md}, scripts/ci/post-bisect-comment.py, docs/ai/bisect-model-quality.md, docs/adr/0109-nightly-bisect-model-quality.md, docs/research/0001-bisect-model-quality-cache.md, mkdocs.yml (nav).
  • Invariant: the committed parquet + ONNX bytes under ai/testdata/bisect/ must regenerate byte-identically from ai/scripts/build_bisect_cache.py with seeds FEATURE_SEED=20260418 and MODEL_SEED=20260419. The CI --check step asserts this before every bisect run, so any upstream pull that bumps pandas / pyarrow / onnx enough to change the serialiser bytes will fail the workflow until the cache is regenerated and committed.
  • Re-test:
python ai/scripts/build_bisect_cache.py --check
vmaf-train bisect-model-quality \
    ai/testdata/bisect/models/model_*.onnx \
    --features ai/testdata/bisect/features.parquet \
    --min-plcc 0.85 --input-name input
# Expected: "no regression in this range"; first_bad_index None.

Pure upstream code is not touched, so no Netflix-side conflict vector. Only fork-local files; risk is toolchain drift, not merge conflict.
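
The byte-identity contract amounts to "regenerate from the seed and compare hashes". A minimal sketch under stated assumptions: build_blob is a hypothetical stand-in for the real cache builder, which writes parquet and ONNX rather than random bytes, and only the seed value comes from the invariant.

```python
import hashlib
import random

FEATURE_SEED = 20260418  # seed value quoted in the invariant


def build_blob(seed, n=256):
    """Hypothetical builder: output bytes depend only on the seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def check(committed, seed):
    """True when regenerating from `seed` reproduces `committed` exactly."""
    fresh = hashlib.sha256(build_blob(seed)).hexdigest()
    return fresh == hashlib.sha256(committed).hexdigest()


committed = build_blob(FEATURE_SEED)
tampered = committed[:-1] + bytes([committed[-1] ^ 1])
```

check(committed, FEATURE_SEED) holds, while a single flipped bit fails it. A serialiser change after a dependency bump would fail the real --check in the same way.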

0012 — Upstream ADM port (Netflix 966be8d5)

  • Workstream PRs: this PR; ports a single upstream commit.
  • Touches: core/src/feature/integer_adm.{c,h}, core/src/feature/x86/adm_avx2.{c,h}, core/src/feature/x86/adm_avx512.{c,h}, core/src/feature/alias.c, core/src/feature/barten_csf_tools.h (new upstream file).
  • Invariant: the eight ADM files now mirror upstream's content byte-for-byte (modulo our clang-format-22 pass and the Netflix copyright-year bump on the new header). Future /sync-upstream runs can take new upstream ADM commits cleanly. Do not revert to a pre-966be8d5 ADM kernel without also reverting the call-site signatures in integer_compute_adm — upstream extended i4_adm_cm from 8 to 13 args.
  • Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --model version=vmaf_v0.6.1 -o /tmp/vmaf-port.json
grep '<metric name="vmaf"' /tmp/vmaf-port.json
# Expected: mean ≈ 76.66890 (golden 76.66890519623612, places=4 OK).

0013 — Upstream motion port (Netflix PR #1486 head 2aab9ef1)

  • Workstream PRs: this PR; ports upstream PR #1486 (4 commits on top of 966be8d5 ADM base, head 2aab9ef1). Sister to entry 0012.
  • Touches: core/src/feature/integer_motion.{c,h}, core/src/feature/motion_blend_tools.h (new upstream file), core/src/feature/x86/motion_avx2.c, core/src/feature/x86/motion_avx512.c, core/src/feature/alias.c (additive: integer_motion3 row), python/test/{quality_runner,vmafexec,feature_extractor,vmafexec_feature_extractor}_test.py (golden tolerance updates: places=4 → places=2 on motion-affected asserts; expected values unchanged).
  • Invariant: motion files mirror upstream byte-for-byte (modulo our clang-format-22 pass). The alias.c row for integer_motion3 was inserted surgically to avoid clobbering the AVX-512 ADM registration added by entry 0012; new motion3 metric appears in default VMAF model output but is not standalone-loadable via --feature integer_motion3 (sub-feature only). Netflix golden VMAF mean shifts 76.668904824 → 76.667830213 (well within the places=2 tolerance the upstream PR loosened to). Do not revert to places=4 on motion-touching assertions without also reverting the motion code.
  • Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --model version=vmaf_v0.6.1 -o /tmp/vmaf-motion-port.json
grep -E '<metric name="vmaf"|integer_motion3' /tmp/vmaf-motion-port.json
# Expected: vmaf mean ≈ 76.66783; integer_motion3 mean ≈ 3.98976.
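
The tolerance change can be checked against the two means quoted in the invariant. The helper mirrors the rounding rule of unittest's assertAlmostEqual (the difference must round to zero at the given number of decimal places); almost_equal itself is an illustrative name.

```python
def almost_equal(a, b, places):
    """unittest-style check: |a - b| rounds to zero at `places` decimals."""
    return round(abs(a - b), places) == 0


before = 76.668904824  # golden mean before the motion port
after = 76.667830213   # golden mean after the motion port

ok_at_2 = almost_equal(before, after, places=2)
ok_at_4 = almost_equal(before, after, places=4)
```

ok_at_2 is True and ok_at_4 is False: the shift of about 0.0011 passes the loosened gate and fails the original one.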

0014 — Coverage gate overhaul + upstream python/test/ reformat

  • Workstream PRs: this PR (coverage-gate overhaul + in-tree reformat of upstream-mirror Python tests).
  • Touches: .github/workflows/ci.yml (CPU + GPU coverage jobs: -Dc_args=-fprofile-update=atomic / -Dcpp_args=-fprofile-update=atomic, meson test --num-processes 1, -Denable_dnn=enabled, ORT install step on the CPU coverage job, lcov/geninfo replaced by gcovr with --json-summary / --xml / --txt output, artifact rename coverage-lcov-{cpu,gpu} → coverage-{cpu,gpu}), scripts/ci/coverage-check.sh (rewritten to parse gcovr JSON via python3 -c — same CLI signature), core/src/dnn/dnn_api.c + new core/src/dnn/dnn_attach_api.c (vmaf_use_tiny_model carved out into its own TU so the unit-test binaries — which pull in dnn_sources for feature_lpips.c but never link libvmaf.c — don't end up with an undefined reference to vmaf_ctx_dnn_attach once enable_dnn=enabled activates the real bodies), core/src/dnn/meson.build + core/src/meson.build (new dnn_libvmaf_only_sources list wired into libvmaf.so only), python/test/{feature_extractor,quality_runner,vmafexec,vmafexec_feature_extractor}_test.py (mechanical Black + isort reformat — no assertion values changed, imports regrouped, line wrapping normalised).
  • Invariant: coverage CI must keep all five pieces in lockstep — (a) -fprofile-update=atomic closes the intra-process counter race on SIMD inner loops (vif_avx2.c:673, motion_avx2, etc.) → negative counts → geninfo/gcovr abort; (b) --num-processes 1 closes the inter-process race where multiple parallel test binaries merge their counters into the same .gcda files for the shared libvmaf.so at process exit (per-thread atomicity does not cover this); (c) gcovr deduplicates .gcno files belonging to the same source compiled into multiple targets — without dedup, lcov sums hits across compilation units and yields impossible >100% values (dnn_api.c — 1176% was the smoking gun on the first attempt that had only (a)+(b)); (d) ORT install + enable_dnn=enabled in the coverage job is what makes core/src/dnn/*.c measurable in the first place — without ORT, the DNN tree compiles in stub branches and the 85% per-critical-file gate is meaningless; (e) vmaf_use_tiny_model lives in dnn_attach_api.c and is added to libvmaf.so only via dnn_libvmaf_only_sources — moving it back into dnn_api.c reintroduces the vmaf_ctx_dnn_attach undefined-reference link error in test_feature_extractor / test_lpips whenever enable_dnn=enabled, since those test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c. Lint scope: upstream-mirror Python tests are linted at the same standard as fork-added code; we accept that /sync-upstream and /port-upstream-commit will re-trigger Black/isort failures whenever upstream rewrites these files, and the fix is another in-tree reformat pass — never an exclusion. The fork's pyproject.toml and .pre-commit-config.yaml keep python/test/resource/ (binary fixtures only) excluded; python/test/*.py is in scope. See ADR-0110 (race fixes, superseded) and ADR-0111 (gcovr + ORT layer).

  • Re-test:
# Reproduce coverage path locally (requires gcc + python3-pip):
pip install --user 'gcovr>=8.0'
cd core
meson setup build-cov-test --buildtype=debug -Db_coverage=true \
    -Denable_avx512=true -Denable_float=true -Denable_dnn=disabled \
    -Dc_args=-fprofile-update=atomic -Dcpp_args=-fprofile-update=atomic
ninja -C build-cov-test
meson test -C build-cov-test --print-errorlogs --num-processes 1
~/.local/bin/gcovr --root .. \
    --filter 'src/.*' \
    --exclude '.*/test/.*' --exclude '.*/tests/.*' \
    --exclude '.*/subprojects/.*' \
    --gcov-ignore-parse-errors=negative_hits.warn \
    --gcov-ignore-parse-errors=suspicious_hits.warn \
    --print-summary --txt build-cov-test/coverage.txt \
    --json-summary build-cov-test/coverage.json \
    build-cov-test
grep -E 'dnn_api|model_loader' build-cov-test/coverage.txt
# Expected: gcovr completes without "Unexpected negative count" AND no
# per-file percentages exceed 100% (drop --num-processes 1 to reproduce
# the multi-process .gcda merge race; switch back to lcov to reproduce
# the dnn_api.c — 1176% over-count from compilation-unit summation).

# Lint smoke test for upstream-mirror tree:
pre-commit run --files python/test/quality_runner_test.py
# Expected: Black/isort/Ruff all PASS — files are reformatted in-tree
# to fork style and stay clean until the next upstream sync.
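
The over-count in invariant (c) can be illustrated with a small arithmetic sketch. It is only a model of the failure, not lcov's or gcovr's actual algorithm: it assumes the covered-line tally is added once per compilation unit while the file's instrumented lines are counted once. The function names and the figures (98 of 100 lines covered, 12 compilation units) are hypothetical, chosen because they happen to reproduce a 1176% reading.

```c
#include <assert.h>

/* Line coverage in whole percent when the covered-line tally of every
 * compilation unit containing the file is added up, but the file's
 * instrumented lines are counted only once. Illustrative model only. */
static unsigned summed_coverage_pct(unsigned covered_lines,
                                    unsigned total_lines,
                                    unsigned n_units)
{
    return covered_lines * n_units * 100u / total_lines;
}

/* The same figure after deduplication: each line counts once, no matter
 * how many targets compiled the file. */
static unsigned dedup_coverage_pct(unsigned covered_lines,
                                   unsigned total_lines)
{
    return covered_lines * 100u / total_lines;
}
```

Under these assumptions, summed_coverage_pct(98, 100, 12) gives 1176 while dedup_coverage_pct(98, 100) gives 98, which is why a per-file value above 100% points at missing deduplication rather than at the tests.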

0015 — Tox doctest collection skips vmaf/resource/

  • Workstream PRs: this PR (fix(ci): skip pytest doctest collection of vmaf/resource/ data files). Surfaced once ADR-0115 consolidated CI triggers to master and tox actually started running on PRs.
  • Touches: python/tox.ini (single-line --ignore=vmaf/resource added to the pytest invocation, plus an explanatory comment block). Pure fork-local; no upstream Python file changes.
  • Invariant: pytest --doctest-modules must not attempt to import files under python/vmaf/resource/. Those are parameter / dataset / example-config .py files; several have dots in their stems (e.g. vmaf_v7.2_bootstrap.py) that make them unimportable as Python modules. None carry doctests, so the ignore is correctness rather than a workaround. Do not drop the --ignore=vmaf/resource flag without first verifying every file under that directory has been renamed to a dot-free stem and is importable.
  • Re-test:
cd python && tox -e py311 -- --collect-only --doctest-modules \
    --ignore=vmaf/resource 2>&1 | grep -c "ERROR collecting vmaf/resource"
# Expected: 0 (was 5 before the fix).

Pure upstream code is not touched, so no Netflix-side conflict vector. Risk is upstream renaming or removing files under python/vmaf/resource/ such that the directory disappears, in which case the --ignore becomes a harmless no-op.

0016 — -fsycl link arg gated on icpx CXX (gcc/clang host linker allowed)

  • Workstream PRs: this PR (fix(libvmaf): gate -fsycl link arg on icpx CXX, allow gcc/clang host linker). Surfaced once ADR-0115's CI consolidation added an Ubuntu SYCL job to PR-time CI that uses CXX=g++ (host linker) with sidecar icpx for SYCL .cpp compilation.
  • Touches: core/src/meson.build (the vmaf_link_args block immediately after the is_sycl_enabled flag handling — currently ~lines 696-712). Pure fork-local; no upstream Meson file changes expected.
  • Invariant: -fsycl is appended to vmaf_link_args only when meson.get_compiler('cpp').get_id() == 'intel-llvm' (icpx). Rationale: the documented project mode (see comment near is_sycl_enabled block at top of src/meson.build) compiles SYCL .cpp files via custom_target with icpx, while the project's CXX driver may be gcc / clang / msvc; in that mode the SPIR-V device code is already embedded in the icpx-compiled .o files at compile time, and the runtime libraries (libsycl + libsvml + libirc + libze_loader) declared as link dependencies resolve every symbol. Passing -fsycl to a non-icpx linker is a hard error (g++: error: unrecognized command-line option '-fsycl'). Do not remove the cpp.get_id() == 'intel-llvm' guard without first verifying every CI matrix leg uses icpx as the project CXX.
  • Re-test:
meson setup build -Denable_sycl=true \
    -Dcpp_link_args=-Wl,--no-undefined
ninja -C build src/libvmaf.so.3
# Expected: link succeeds; no `-fsycl` errors with gcc/clang host CXX.

Pure fork-local guard; no Netflix-side conflict vector.

0017 — CLI precision default %.6f (Netflix-compat) + frame-skip unref

  • Workstream PRs: this PR (fix(cli): revert precision default to %.6f and unref skipped frames). Reverts the default flipped by commit c989fbd9 (ADR-0006) per ADR-0119. Companion fix in core/tools/vmaf.c resolves the picture-pool exhaustion in the --frame_skip_ref/dist loops surfaced once the always-on picture pool (ADR-0104) made unref'ing skipped pictures mandatory.
  • Touches:
  • core/tools/cli_parse.c (VMAF_DEFAULT_PRECISION_FMT + VMAF_LOSSLESS_PRECISION_FMT macros, resolve_precision_fmt() body, --help text)
  • core/tools/cli_parse.h (field comments only; struct shape unchanged)
  • core/src/output.c (DEFAULT_SCORE_FORMAT macro)
  • core/tools/vmaf.c (skip loop bodies at the c.frame_skip_ref / c.frame_skip_dist for-loops)
  • python/vmaf/core/result.py (per-frame and aggregate :.6f formatters)
  • python/test/command_line_test.py is unmodified — Netflix golden assertions stay frozen per CLAUDE.md §8; the binary's output format adapts to them, not the other way around.
  • Invariant: vmaf CLI default score-output format is %.6f (matches upstream Netflix byte-for-byte). --precision=max|full selects %.17g (IEEE-754 round-trip lossless). --precision=legacy is a synonym for the default. The library default for vmaf_write_output_with_format(..., score_format=NULL) matches. Skipped frames in the --frame_skip_ref / --frame_skip_dist pre-loops are vmaf_picture_unref'd immediately after fetch so the preallocated picture pool is not exhausted before the main scoring loop runs. Do not flip the macros back to %.17g or remove the unrefs without a superseding ADR — both are golden-gate-load-bearing.
  • Re-test:
ninja -C core/build
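# pytest node IDs are written file::Class::test with no space in between.
python -m pytest -v \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping_unequal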
# Expected: all three PASS in <1 s combined.

Pure fork-local; no Netflix-side conflict vector. If upstream ever changes the default format string, treat their value as the new baseline and reconfirm the golden assertions before adopting.
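
The difference between the two precision modes can be checked with a short C sketch. It assumes IEEE-754 doubles and a correctly rounding snprintf/strtod pair; score_round_trips is a hypothetical helper for illustration, not a function from the CLI or the library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format `score` with `fmt`, parse the text back, and report whether the
 * parsed value equals the input exactly. `fmt` must contain exactly one
 * double conversion such as "%.6f" or "%.17g". */
static int score_round_trips(double score, const char *fmt)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, fmt, score);
    if (n < 0 || (size_t)n >= sizeof buf)
        return 0; /* formatting failed or was truncated */
    return strtod(buf, NULL) == score;
}
```

With 0.1 + 0.2 as input, "%.17g" survives the round trip while "%.6f" prints 0.300000 and parses back to a different double. That is the trade-off behind keeping %.6f as the Netflix-compatible default and offering %.17g only as an opt-in.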

0018 — FFmpeg patches ship as ordered series.txt

  • Workstream PRs: this PR (fix(ci): drop dead sycl trigger + consolidate windows.yml into libvmaf.yml (ADR-0115)). Surfaced once ADR-0115's consolidation routed the docker / FFmpeg-SYCL jobs through the master-targeting CI gate for the first time on this branch — the standalone 0003-…sycl… apply broke because it referenced struct fields added by 0001-…tiny-model…, the Dockerfile only COPY'd 0003, and ffmpeg.yml referenced a stale ../patches/ path.
  • Touches: Dockerfile (lines ~86-95 — the FFmpeg patch-apply block), .github/workflows/ffmpeg.yml (the Build FFmpeg with SYCL patch series step), ffmpeg-patches/000{1,2,3}-*.patch (regenerated via real git format-patch -3 so they carry valid index <sha>..<sha> <mode> lines and committable SHAs). Pure fork-local; no upstream FFmpeg or Netflix file changes.
  • Invariant: both the Dockerfile and ffmpeg.yml walk ffmpeg-patches/series.txt line-by-line and apply each patch via git apply with a patch -p1 fallback. Do not ship a new patch without appending it to series.txt, and do not reorder existing entries — patch 0003 references LIBVMAFContext fields added by patch 0001, so any out-of-order apply breaks the build at hunk 2 of vf_libvmaf.c.
  • Four follow-up fixes bundled in the same PR (three configure/nvcc flag fixes plus one missing include):
  • --enable-libvmaf-sycl is not a valid FFmpeg configure option. Patch 0003 uses check_pkg_config libvmaf_sycl … auto-detection (matching how libvmaf_cuda is wired) — it never registers the switch. Both Dockerfile and ffmpeg.yml used to pass the flag and configure rejected it with Unknown option "--enable-libvmaf-sycl". SYCL support is now controlled solely by -Denable_sycl=true at libvmaf build time; FFmpeg picks it up automatically when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
  • The Dockerfile now carries two nvcc-flag ARGs. NVCC_FLAGS (libvmaf) keeps four -gencode lines plus the experimental --extended-lambda / --expt-relaxed-constexpr / --expt-extended-lambda flags needed for Thrust/CUB host+device code. FFMPEG_NVCC_FLAGS (FFmpeg) carries a single -gencode arch=compute_75,code=sm_75 -O2 — FFmpeg's check_nvcc runs nvcc -ptx, which fails with nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures on multi-arch input, and --extended-lambda requires host+device compilation. compute_75 PTX is forward-compatible with all newer GPUs via driver JIT.
  • --enable-libnpp is no longer passed to FFmpeg's configure. FFmpeg n8.1's libnpp probe carries an explicit die "ERROR: libnpp support is deprecated, version 13.0 and up are not supported" (configure:7335-7336) that fires on the base image's CUDA 13.2 libnpp. We don't use scale_npp / transpose_npp / sharpen_npp in any VMAF workflow; cuvid + nvdec + nvenc + libvmaf-cuda is the actual GPU path. Revisit once we move to an FFmpeg release that supports CUDA 13 libnpp upstream.
  • Patch 0002 (add-vmaf_pre-filter) gained a missing #include "libavutil/imgutils.h" for av_image_copy_plane(). FFmpeg's libavfilter Makefile builds with -Werror=implicit-function-declaration so this fired during the actual compile (not configure). Caught by a local docker build rather than waiting for GitHub Actions — much faster iteration loop.
  • Re-test:
cd /tmp && rm -rf ffmpeg-test && \
    git clone -q --depth 1 -b n8.1 \
        https://git.ffmpeg.org/ffmpeg.git ffmpeg-test && \
    cd ffmpeg-test && \
    while IFS= read -r line; do \
        case "$line" in ''|\#*) continue ;; esac; \
        git apply "/path/to/vmaf/ffmpeg-patches/$line" \
            || patch -p1 < "/path/to/vmaf/ffmpeg-patches/$line"; \
    done < /path/to/vmaf/ffmpeg-patches/series.txt
# Expected: all three patches apply with no rejects; the resulting
# tree compiles with --enable-libvmaf. SYCL is auto-detected via
# check_pkg_config (patch 0003), so no explicit configure flag is
# required when libvmaf-sycl.pc is on PKG_CONFIG_PATH.

Pure fork-local series; no Netflix-side conflict vector. See ADR-0118.
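
The line filter used by the series.txt walk is small enough to restate in C. This is a sketch of the rule in the shell loop above (skip empty lines and lines starting with #), not code from the repository; the function names and the patch file names in the usage note are hypothetical placeholders.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the shell test `case "$line" in ''|\#*) continue ;; esac`:
 * a series.txt line names a patch unless it is empty or starts with '#'. */
static bool series_line_is_patch(const char *line)
{
    return line[0] != '\0' && line[0] != '#';
}

/* Counts the patch entries in an in-memory copy of series.txt, given as
 * an array of lines with the trailing newline already removed. */
static unsigned series_count_patches(const char *const *lines, unsigned n)
{
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++)
        if (series_line_is_patch(lines[i]))
            count++;
    return count;
}
```

For a five-line file holding one comment, one blank line and the placeholder entries 0001-a.patch, 0002-b.patch and 0003-c.patch, series_count_patches returns 3. The filter says nothing about order, so the rule that entries must not be reordered still has to be enforced by review.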

0019 — Coverage Gate annotations: upload-artifact v7 + gcovr filter

  • Workstream PRs: this PR.
  • Touches: .github/workflows/ci.yml (CPU + GPU coverage steps: gcovr stderr piped through grep -vE 'Ignoring (suspicious|negative) hits' ... || true), .github/workflows/{ci,lint,nightly,nightly-bisect,supply-chain,libvmaf}.yml (actions/upload-artifact@v5|@v6 → @v7, actions/download-artifact@v5 → @v7 in supply-chain.yml). Note: windows.yml was consolidated into libvmaf.yml by ADR-0115 / PR #50, so the windows-side bump now lives in libvmaf.yml's build (MINGW64, …) job.
  • Invariant: the Coverage Gate Annotations panel must finish empty on a clean run. The two pieces are coordinated — (a) @v7 for upload / download artifact actions silences GitHub's Node-20 deprecation banner ahead of the 2026-06-02 forced-Node-24 cutoff; (b) the gcovr stderr filter swallows the Ignoring (suspicious|negative) hits warnings that gcovr 8 emits for the legitimately large hit counts in tight ANSNR / VIF / motion inner loops (e.g. ansnr_tools.c:207 at ~4.93 G hits across an HD multi-frame coverage suite — real, not a gcov bug). The filter is regex-narrow and keyed to the exact wording of those two warnings; any other gcovr warning still surfaces. Upstream (Netflix/vmaf) does not maintain these CI files; rebase impact is limited to the unlikely case that an upstream sync touches the shared .github/workflows/ tree, which it currently does not. See ADR-0117.
  • Re-test:
# Verify gcovr filter locally (after a coverage build per entry 0014):
~/.local/bin/gcovr --root .. \
    --filter 'src/.*' \
    --exclude '.*/test/.*' --exclude '.*/tests/.*' \
    --exclude '.*/subprojects/.*' \
    --gcov-ignore-parse-errors=negative_hits.warn \
    --gcov-ignore-parse-errors=suspicious_hits.warn \
    --print-summary --txt build-cov-test/coverage.txt \
    build-cov-test \
  2> >(grep -vE 'Ignoring (suspicious|negative) hits' >&2 || true)
# Expected: stderr contains the gcovr summary block but NO
# "Ignoring (suspicious|negative) hits" lines. coverage.txt unchanged.

# Verify all upload/download-artifact instances are on @v7:
grep -rE 'actions/(upload|download)-artifact@v[0-6]' .github/workflows/
# Expected: empty output.

0020 — CI workflow file + display-name renames (Title Case sweep)

  • Workstream PRs: this PR; renames all six core .github/workflows/*.yml files to purpose-descriptive kebab-case and normalises every workflow name: and job name: to Title Case. See ADR-0116.
  • Touches: .github/workflows/{ci,lint,security,libvmaf,ffmpeg,docker}.yml (renamed via git mv to tests-and-quality-gates.yml, lint-and-format.yml, security-scans.yml, libvmaf-build-matrix.yml, ffmpeg-integration.yml, docker-image.yml), README.md (5 badge URLs + labels), docs/principles.md (line 5 workflow-tuple update), .claude/skills/add-gpu-backend/SKILL.md + scaffold.sh (filename refs), docs/adr/0116-*.md (new), docs/adr/README.md (index row), CHANGELOG.md.
  • Invariant: workflow files are purpose-named; their name: fields are Title Case sentences with em-dash axis tags; job-level name: strings are Title Case sentences (Build — / Pre-Commit / Coverage Gate / etc.). Required-status-check contexts in master branch protection are bound to job-level names — when renaming any job, re-pin via gh api --method PUT repos/VMAFx/vmafx/branches/master/protection. The 19 required gates' semantics are unchanged from ADR-0037; only their display strings move.
  • Re-test:
# Validate every workflow file parses and lists the expected job names.
cd .github/workflows
for f in tests-and-quality-gates.yml lint-and-format.yml security-scans.yml \
         libvmaf-build-matrix.yml ffmpeg-integration.yml docker-image.yml; do
    yq '.name, .jobs.[].name' "$f" || echo "PARSE FAIL: $f"
done
# Expected: each workflow prints its Title Case workflow name + job names;
# no PARSE FAIL lines.

0021 — DNN-enabled CI matrix legs (gcc + clang + macOS)

  • Workstream PRs: this PR; adds three new entries to the libvmaf-build matrix in .github/workflows/libvmaf-build-matrix.yml covering -Denable_dnn=enabled across Ubuntu/gcc, Ubuntu/clang, and macOS/clang. See ADR-0120.
  • Touches: .github/workflows/libvmaf-build-matrix.yml (3 new matrix entries + ORT install steps + dedicated dnn-suite test step), docs/adr/0120-ai-enabled-ci-matrix-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry).
  • Invariant: the DNN matrix legs install ONNX Runtime via the same pinned source as the dedicated Tiny AI job (tests-and-quality-gates.yml) — Linux: MS tarball at the version pinned by ORT_VERSION; macOS: Homebrew. When the Tiny AI job's pin changes, the matrix legs' ORT_VERSION env in their Install ONNX Runtime (linux, DNN leg) step must change to match; otherwise compiler/portability coverage drifts away from the gating leg's actual ABI.
  • Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.libvmaf-build.strategy.matrix.include[] | select(.dnn==true) | .name' \
    .github/workflows/libvmaf-build-matrix.yml
# Expected output (3 lines):
#   Build — Ubuntu gcc (CPU) + DNN
#   Build — Ubuntu clang (CPU) + DNN
#   Build — macOS clang (CPU) + DNN

# Local DNN build sanity (matches what each leg will run):
meson setup core core/build --buildtype release \
    --prefix $PWD/install -Denable_float=true -Denable_dnn=enabled
ninja -vC core/build install
meson test -C core/build --suite=dnn --print-errorlogs
  • Branch protection: the two Linux DNN legs are pinned as required status checks on master immediately after this PR's merge (19 → 21 contexts). The macOS leg stays informational (experimental: true) because Homebrew ORT floats. Re-pin command:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
    --input /tmp/protection-update.json

0022 — Windows GPU build-only matrix legs (MSVC + CUDA, MSVC + oneAPI SYCL)

  • Workstream PRs: this PR; adds a new top-level windows-gpu-build job to .github/workflows/libvmaf-build-matrix.yml with two matrix entries (CUDA, SYCL). See ADR-0121.
  • Touches:
  • .github/workflows/libvmaf-build-matrix.yml (new windows-gpu-build job)
  • docs/adr/0121-windows-gpu-build-only-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry)
  • core/src/compat/win32/pthread.h (new — Win32 pthread shim for MSVC; mirrors compat/gcc/stdatomic.h pattern)
  • core/src/feature/integer_adm.h (UPSTREAM — converted the dwt_7_9_YCbCr_threshold[3] designated initializer to positional form so MSVC/nvcc-on-Windows accepts the C++ parse; semantically identical, no behavioural change)
  • core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM — added #if defined(__cplusplus) && defined(_MSC_VER) branch around #include <stdatomic.h> so MSVC C++ TUs pull atomic_int via using std::atomic_int;; POSIX paths unchanged)
  • core/src/sycl/d3d11_import.cpp (fix non-existent <libvmaf/log.h> → "log.h")
  • core/src/sycl/dmabuf_import.cpp (move <unistd.h> inside #if HAVE_SYCL_DMABUF guard for non-VA-API hosts)
  • core/src/sycl/common.cpp (replace POSIX clock_gettime(CLOCK_MONOTONIC) with portable std::chrono::steady_clock)
  • core/src/feature/x86/motion_avx2.c (UPSTREAM — replace GCC vector-extension __m256i[N] indexing at line 529 with _mm256_extract_epi64; bit-exact)
  • core/src/feature/x86/adm_avx2.c (UPSTREAM — replace 6 (__m256i)(_mm256_cmp_ps(...)) casts with _mm256_castps_si256(...) and 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact)
  • core/src/feature/x86/adm_avx512.c (UPSTREAM — replace 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact)
  • core/src/log.c (UPSTREAM — gate <unistd.h> behind !_WIN32, include <io.h> + redirect isatty/fileno to _isatty/_fileno for MSVC)
  • core/src/feature/integer_vif.c (UPSTREAM — switch the aligned_malloc cursor from void * to uint8_t * with explicit typed-pointer casts so MSVC accepts the byte-wise pointer arithmetic)
  • core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM — drop unused <unistd.h> include)
  • core/src/dnn/model_loader.c (fork-added — Windows fallback definitions for POSIX S_ISDIR / S_ISREG path-classification macros)
  • .github/workflows/lint-and-format.yml (fork-added — set lfs: true on the pre-commit job's checkout so LFS-stored ONNX blobs resolve and don't appear as phantom pre-commit-induced diffs)
  • core/src/feature/x86/motion_avx512.c (UPSTREAM — replace 1 __m128i[N] reduction with _mm_extract_epi64; bit-exact)
  • core/src/feature/x86/{vif_statistic_avx2,ansnr_avx2,ansnr_avx512,float_adm_avx2,float_adm_avx512,float_psnr_avx2,float_psnr_avx512,ssim_avx2,ssim_avx512}.c (UPSTREAM — convert 17 sites of trailing __attribute__((aligned(N))) to leading C11 _Alignas(N); same alignment, MSVC-portable)
  • core/src/feature/mkdirp.c and core/src/feature/mkdirp.h (UPSTREAM third-party MIT-licensed micro-library — gate <unistd.h> to non-Windows, add <direct.h> + _mkdir for Windows, add mode_t typedef for MSVC)
  • core/meson.build (new pthread_dependency gated on cc.check_header('pthread.h') failing)
  • core/src/meson.build and core/test/meson.build (thread pthread_dependency into every target compiling pthread-using TUs)
  • Invariant: Windows GPU legs are pinned to the same toolchain versions as the corresponding Linux GPU legs (CUDA 13.0.0, oneAPI BaseKit 2025.3.0.372) so a Linux-vs-Windows divergence implies an MSVC ABI issue, not a tooling-version delta. When either Linux GPU leg bumps its toolchain, the Windows leg must move in lockstep — the Intel installer URL on Windows hard-codes the per-release directory id and the version string, so the bump is a two-line edit in the SYCL Install Intel oneAPI (windows) step (the WINDOWS_BASEKIT_URL env var). Both legs additionally inject /experimental:c11atomics into CFLAGS / CXXFLAGS because libvmaf uses C11 atomics that MSVC's <stdatomic.h> rejects without that opt-in flag — when MSVC ships full C11 atomics support, the flag becomes unnecessary and can be dropped. Two Windows-only dependency steps round out the parity: the CUDA leg's Jimver/cuda-toolkit sub-package list includes both crt (CUDA Runtime Library compile-time headers, ships crt/host_config.h; cuda_cccl is not a valid Windows sub-package name — installer rejects it) and nvvm (ships nvvm/bin/cicc.exe + nvvm/libdevice/libdevice.*.bc; without it, nvcc's .cu → PTX stage fails with The system cannot find the path specified. — on Linux apt pulls NVVM in transitively with cuda-nvcc-XY, Windows requires it explicitly); the SYCL leg builds the Level Zero loader from source (oneapi-src/level-zero v1.18.5 → cmake --build … --target install) because Windows oneAPI BaseKit ships the SYCL runtime but not ze_loader.lib, and libvmaf's meson cc.find_library('ze_loader') needs both the header and the import library. When the Linux apt level-zero-dev version moves, bump the L0 git tag to match. core/src/meson.build guards the explicit svml / irc cc.find_library calls behind host_machine.system() != 'windows' — those calls exist for the gcc/g++ + icpx Linux flow where the host linker is non-Intel; on Windows the host compiler is icx-cl itself and auto-injects the Intel runtime. 
Round-10 surfaced an additional Windows-only gap: ~14 libvmaf TUs #include <pthread.h> unconditionally, but MSVC and clang-cl ship no pthread (MinGW does, via winpthreads). The fork now ships a header-only Win32 shim at core/src/compat/win32/pthread.h mapping the in-use pthread subset (mutex / cond / thread create+join+detach) onto SRWLOCK + CONDITION_VARIABLE + _beginthreadex. The shim is wired in via pthread_dependency in core/meson.build, declared only when cc.check_header('pthread.h') fails — so MinGW and POSIX paths stay untouched. When upstream Netflix/vmaf adds new pthread surface (e.g., pthread_rwlock_*), extend compat/win32/pthread.h to cover it. Both nvcc fatbin custom_targets (CUDA) and icpx custom_targets (SYCL common.cpp / picture_sycl.cpp / dmabuf_import.cpp, plus the SYCL feature kernels) bypass meson's dependencies: plumbing and hand-roll their own -I lists, so the shim path must be threaded into both cuda_extra_includes and sycl_inc_flags explicitly on Windows. icpx-cl on Windows additionally rejects -fPIC (unsupported option for target 'x86_64-pc-windows-msvc') — so sycl_common_args and sycl_feature_args route their -fPIC token through sycl_pic_arg = host_machine.system() != 'windows' ? ['-fPIC'] : []. PIC is the default for Windows DLLs, so dropping the flag is the correct fix rather than a workaround. Round-14 surfaced a third Windows-only blocker: core/src/feature/integer_adm.h (an upstream Netflix file, last touched by upstream port d06dd6cf) initialises dwt_7_9_YCbCr_threshold[3] with C99 designated initializers ({.a = ..., .k = ..., .f0 = ..., .g = {...}}). The header is included from both integer_adm.c (C TU) and cuda/integer_adm/*.cu (C++ TU via nvcc); MSVC's C++ frontend (and nvcc's cudafe++ on Windows) rejects C99 designated initializers without /std:c++20. 
Converted to positional initialization in the same struct-member order (a / k / f0 / g[4]) — the conversion is provably semantically identical and works in every C/C++ standard, so it costs nothing on the upstream-merge side beyond a trivial conflict marker if upstream Netflix later edits the same lines. Restore the designated form post-merge if upstream has it. Round-17 surfaced four more Windows/MSVC-only SYCL blockers, two of which touch upstream-shared headers. (a) core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM) unconditionally #include <stdatomic.h> and use the atomic_int typedef in struct definitions. MSVC's <stdatomic.h> (added in 19.34) only declares the C11 symbols inside the global namespace under C; in C++ compilation (icpx-cl drives the SYCL TUs as C++) MSVC surfaces them only inside namespace std::. gcc/clang expose both via a GNU extension, so the upstream code works on every other platform. The fork now wraps both headers' #include <stdatomic.h> in #if defined(__cplusplus) && defined(_MSC_VER) → #include <atomic> + using std::atomic_int;, falling through to the original <stdatomic.h> line on every other configuration. ABI is unchanged — atomic_int resolves to the same underlying type. If upstream Netflix adds further C11 atomic typedefs in these headers (e.g., atomic_uint, atomic_size_t), extend the using std:: lines to cover them. (b) core/src/sycl/d3d11_import.cpp (fork-added) used <libvmaf/log.h> which doesn't exist — log.h lives at core/src/log.h and is internal. Switched to "log.h"; the icpx invocation already supplies the src-relative -I. (c) core/src/sycl/dmabuf_import.cpp (fork-added) included <unistd.h> at file scope, but POSIX close() is only used inside the #if HAVE_SYCL_DMABUF VA-API block. Moved the <unistd.h> include inside that guard so non-DMA-BUF builds (Windows MSVC, macOS) compile cleanly. (d) core/src/sycl/common.cpp (fork-added) called clock_gettime(CLOCK_MONOTONIC), which doesn't exist on Windows. 
Replaced with std::chrono::steady_clock (guaranteed monotonic by the C++ standard, portable on every supported host). All four fixes preserve POSIX/Linux behaviour bit-identically and only change the Windows MSVC build path. Round-18 surfaced a fifth Windows blocker on the CUDA leg's CPU SIMD compile path: core/src/feature/x86/motion_avx2.c:529 (UPSTREAM, ported in commit 9371a0aa from Netflix PR #1486) computed final_accum[0] + final_accum[1] + final_accum[2] + final_accum[3] to extract the four int64 lanes from an __m256i. gcc/clang allow this via the GNU vector-extension treatment of __m256i (it carries __attribute__((vector_size(32)))); MSVC rejects it with C2088: built-in operator '[' cannot be applied to an operand of type '__m256i'. Replaced with _mm256_extract_epi64(final_accum, N) for N ∈ {0..3}, summed — bit-exact lane sum on every compiler. Restore the index form post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Round-19 surfaced the same MSVC pattern at 19 more call sites across the AVX2/AVX-512 ADM and motion files plus six GCC-style vector casts. core/src/feature/x86/adm_avx2.c (UPSTREAM): 6 lines (915-920) used (__m256i)(_mm256_cmp_ps(...)) C-style casts that gcc/clang accept via the GNU vector extension; replaced with the dedicated _mm256_castps_si256(...) bit-cast intrinsic. 12 lane-extract sites (r2_h[0]+r2_h[1], etc. at lines 2420 / 2425 / 2430 / 2893 / 2897 / 2901 / 4079 / 4084 / 4089 / 4627 / 4631 / 4635) replaced with _mm_extract_epi64(r2_X, N) summed pair. core/src/feature/x86/adm_avx512.c (UPSTREAM): 6 sister lane-extract sites (lines 4470 / 4477 / 4484 / 4625 / 4631 / 4637) — same fix. The AVX-512 paths reduce a __m512i down to __m128i first (via _mm512_extracti64x4_epi64 → _mm256_extracti64x2_epi64) before the index, so only the final __m128i[N] step needed changing. 
core/src/feature/x86/motion_avx512.c (UPSTREAM, ported in 9371a0aa from PR #1486): one final r2[0]+r2[1] reduction (line 448), same fix. All 19 lane-extract fixes plus the 6 cast fixes are bit-exact rewrites and only change the source-level syntax to MSVC-portable form. Restore the original forms post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Additionally core/src/sycl/d3d11_import.cpp (fork-added) switched from C-style COBJMACROS helpers (ID3D11Device_CreateTexture2D, …_Release, etc.) to C++ method-call syntax (device->CreateTexture2D, tex->Release) — d3d11.h gates COBJMACROS behind !defined(__cplusplus), so the C-style helpers aren't visible in this .cpp TU. The two forms are ABI-equivalent (both dispatch through the COM vtable); the choice is purely lexical and POSIX builds aren't affected (the whole TU is #ifdef _WIN32). Round-20 surfaced two more Windows-only blockers. (a) 17 sites across the x86 SIMD layer used GCC's float tmp[N] __attribute__((aligned(M))); form to align scratch buffers for _mm{256,512}_store_ps. MSVC rejects the trailing-attribute syntax with C2146: syntax error: missing ';' before identifier '__attribute__'. Replaced with the C11-standard _Alignas(M) float tmp[N]; (alignment specifier before the type) — works in gcc, clang and MSVC with /std:c11. Files touched (all UPSTREAM): vif_statistic_avx2.c (×2), ansnr_avx2.c (×2), ansnr_avx512.c (×2), float_adm_avx2.c (×2), float_adm_avx512.c (×2), float_psnr_avx2.c (×1), float_psnr_avx512.c (×1), ssim_avx2.c (×4), ssim_avx512.c (×4). The pre-existing vif_avx2.c / vif_avx512.c already define a portable ALIGNED(x) macro at file scope and position the attribute before the type, so they compile cleanly under MSVC and were not touched. 
(b) core/src/feature/mkdirp.c (UPSTREAM, third-party MIT-licensed copy of Stephen Mathieson's micro-library) included <unistd.h> unconditionally but never used POSIX unistd symbols (only mkdir via <sys/stat.h>/<direct.h>). Gated <unistd.h> to non-Windows and added <direct.h> for Windows; switched mkdir(pathname) → _mkdir(pathname) (the non-deprecated MSVC name). core/src/feature/mkdirp.h added a mode_t typedef under MSVC since neither <sys/types.h> nor <sys/stat.h> declares it on Windows; mode is ignored on the Windows path anyway. Round-21 surfaced two more blockers (the round-19 __m128i[N] sweep missed six sites) plus a pre-commit workflow checkout gap. (a) core/src/feature/x86/adm_avx512.c (UPSTREAM) had six further r2_X[0] + r2_X[1] reductions at lines 2128 / 2135 / 2142 / 2589 / 2595 / 2601 that reduce a __m512i accumulator down to __m128i before the lane index. Replaced with the same _mm_extract_epi64(r2_X, N) summed-pair pattern used in round 19 — bit-exact, MSVC-portable. (b) core/src/log.c (UPSTREAM) included <unistd.h> unconditionally to pick up POSIX isatty / fileno. On MSVC both live in <io.h> as _isatty / _fileno; gated the include and macro-redirected the names so the one call site at line 34 compiles on both sides without touching the POSIX path. (c) .github/workflows/lint-and-format.yml (fork-added) checked out without lfs: true, so the model/tiny/*.onnx files landed as LFS pointer stubs. pre-commit's "changes made by hooks" reporter then diffed the stubs against HEAD's real blobs and failed the job even though no hook touched them. Added lfs: true to the pre-commit job's checkout. (d) In core/src/meson.build, the cuda_common_vmaf_lib static library had no dependencies: list, so the Win32 pthread shim (wired in via pthread_dependency in core/meson.build) wasn't on its include path; cuda/common.h unconditionally includes <pthread.h> and MSVC failed with C1083. Added dependencies : [pthread_dependency] — no-op on POSIX (empty list), routes the shim path in on Windows. 
(e) core/src/feature/integer_vif.c (UPSTREAM) walked one big aligned_malloc result as void *data and did data += pad_size / data += h * stride_16 etc. to carve the buffer into typed sub-pointers. gcc/clang accept pointer arithmetic on void * as a GNU extension (treating sizeof(void) == 1); MSVC rejects it with C2036: 'void *': unknown size. Replaced the cursor type with uint8_t * and added explicit casts at assignment sites that take a typed pointer (uint16_t *mu1, uint32_t *mu1_32, etc.). Byte offsets are identical, layout unchanged, bit-exact. If upstream Netflix edits the same loop, reabsorb the walk and re-apply the cursor-type + cast pattern. (f) core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM) included <unistd.h> at line 33 but used no POSIX symbols from it; MSVC failed with C1083. Dropped the unused include outright — simplest fix, no runtime change on any platform. (g) core/src/dnn/model_loader.c (fork-added) uses S_ISDIR / S_ISREG to classify resolved paths. MSVC ships the underlying S_IFMT / S_IFDIR / S_IFREG bit masks in <sys/stat.h> but not the POSIX classification macros. Added a Windows-only fallback (#ifndef S_ISDIR #define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) #endif, same for S_ISREG) guarded by #ifdef _WIN32. Semantically identical to the POSIX macro on Linux/macOS. Round-21e surfaced the final source-portability blockers once the DLL build passed preprocessing. (h) core/src/predict.c, core/src/libvmaf.c and core/src/read_json_model.c (all UPSTREAM) used C99 variable-length arrays — double scores[cnt] at predict.c:385, char name[name_sz] at predict.c:453 and libvmaf.c:1741, plus cfg_name[cfg_name_sz] and generated_key[generated_key_sz] in the .json model-collection parser. 
gcc/clang accept VLAs as a C11 optional feature; MSVC (even with /std:c11) rejects them outright with C2057: expected constant expression (plus C2466 and C2133 on the const size_t sized arrays — MSVC treats const as runtime-bounded, not a constant expression, even when the initialiser is literal like 4 + 1). Replaced each runtime-sized buffer with a small malloc + explicit free on every exit path (in predict.c and read_json_model.c a goto out; cleanup arm was introduced because the loops error-exit mid-function). The generated_key buffer in read_json_model.c uses the narrower fix — char generated_key[5]; — since its size (four decimal digits of the bootstrap sub-model index plus NUL) is a true compile-time constant. Buffers are a handful of bytes each (name_sz is the model-collection name length plus the fixed _ci_p95_lo suffix, scores holds ~20 doubles, cfg_name is the name plus _0000 suffix), so the heap round-trip is not performance-relevant; the new -ENOMEM failure mode is handled uniformly by existing callers. The read_json_model.c refactor also plugs a pre-existing leak of the name buffer on the early return -EINVAL when a JSON object key isn't a string — the goto out; path frees name + cfg_name on every exit. core/test/test_feature_extractor.c:56 (UPSTREAM) declared const unsigned n_threads = 8; and used it as the extent of VmafFeatureExtractorContext *fex_ctx[n_threads];. Converted to enum { n_threads = 8 }; so MSVC sees a constant-expression; every other compiler accepts enum constants identically. Re-absorb if upstream Netflix later edits the same loops and your toolchain matrix omits MSVC. (i) The Windows MSVC build-only legs now build the full tree — CLI tools, unit tests and libvmaf.dll — rather than the previous short cut of disabling -Denable_tools / -Denable_tests. 
Per user direction ("fix the code ffs"), the tree polyfills the remaining POSIX surfaces on MSVC instead: (core/tools/compat/win32/getopt.h + core/tools/compat/win32/getopt.c) a from-scratch POSIX/GNU-compatible getopt_long shim (short / long options, no_argument / required_argument / optional_argument, argv permutation for non-option operands, -- explicit stop, =-embedded values). The shim is fork-added (BSD-3-Clause-Plus-Patent, Copyright 2026 Lusoris and Claude) and declared via a single getopt_dependency in core/meson.build, gated on cc.check_header('getopt.h') failing. The dependency auto-propagates the shim .c into any consuming target via meson's sources: keyword, so both the vmaf CLI (core/tools/meson.build) and the test_cli_parse unit test (core/test/meson.build) pick it up uniformly. MinGW ships <getopt.h> via mingw-w64-crt, so check_header succeeds there and the shim stays out of the TU list. (j) Eleven test executables (test_log, test_dict, test_opt, test_cpu, test_ref, test_feature, test_ciede, test_luminance_tools, test_cli_parse, test_sycl, test_sycl_pic_preallocation) were missing pthread_dependency in their dependencies: lists at core/test/meson.build. On POSIX pthread_dependency is an empty list so the omission was invisible; on MSVC those TUs transitively include feature_collector.h → <pthread.h> and fail with C1083. Threaded the dependency through all eleven targets. test_cli_parse additionally lists getopt_dependency to pick up the shim. (k) Three additional VLA sites surfaced once the test harness built on MSVC: test_cambi.c:254 had unsigned w = 5, h = 5; uint16_t buffer[3 * w];; converted to enum { w = 5, h = 5 }; so the array extent is a constant expression. test_pic_preallocation.c:382 and test_pic_preallocation.c:506 had const int num_threads = N; pthread_t threads[num_threads]; — MSVC rejects const int as non-constant-expression. Converted to enum { num_threads = N, fetches_per_thread = M };.
(l) test_ring_buffer.c:23 (since removed; the ring-buffer test logic was folded into the CUDA-buffer / pic-preallocation suites) and test_pic_preallocation.c:26 included <unistd.h> for usleep / sleep. Gated behind !_WIN32 with a Win32 fallback via <windows.h> + #define usleep(us) Sleep(((us) + 999) / 1000) / #define sleep(s) Sleep((s) * 1000). The conversion rounds sub-millisecond usleep inputs up, which is safe for these test paths (they use 100 µs jitter and 1 s waits). (m) core/tools/vmaf.c included <unistd.h> for isatty / fileno. Applied the same gating pattern used in log.c in round-21(b) — include <io.h> on MSVC and redirect isatty / fileno to _isatty / _fileno via #define. (n) __builtin_clz / __builtin_clzll are GCC intrinsics; MSVC ships __lzcnt / __lzcnt64 via <intrin.h> instead. The shim already lived in core/src/feature/integer_vif.h but integer_adm.c:939, x86/adm_avx2.c:1425 and x86/adm_avx512.c:1217 don't include that header. Extracted the shim into a dedicated core/src/feature/compat_builtin.h (fork-added) and included it from all four TUs. The guard is defined(_MSC_VER) && !defined(__clang__), so clang-cl / icx-cl (which provide the GCC intrinsics natively) skip the shim. (o) The SYCL leg's D3D11 import TU core/src/sycl/d3d11_import.cpp is C++ (icpx-cl drives it as C++ on Windows) but included the internal C header log.h without an extern "C" wrap. log.h is an upstream Netflix header with no __cplusplus guard, so vmaf_log got C++ name-mangled in the .cpp TU and failed to resolve against the C-linkage symbol produced by log.c at link time (LNK2019 from every test target that pulls in the SYCL static lib). Wrapped the #include "log.h" with extern "C" { ... } inside the fork-added .cpp rather than touching the upstream header — keeps log.h identical to upstream on every /sync-upstream. (p) The Windows MSVC legs build with --default-library=static. 
libvmaf's public API has no __declspec(dllexport) attributes (upstream Netflix is POSIX-shaped), so a vanilla MSVC shared build produces src/vmaf-3.dll with no exported symbols and the toolchain therefore never emits the companion vmaf.lib import library. Downstream tool targets then fail with LNK1181: cannot open input file 'src\vmaf.lib'. The MinGW matrix leg has used --default-library static since day one for the same reason (line 387); the MSVC legs now mirror that choice via matrix.include[].meson_extra. Downstream consumers that want a DLL can either add __declspec(dllexport) decorations to the public API or use a .def file; that is a separate decision and out of scope for the build-only gate.
  • Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.windows-gpu-build.strategy.matrix.include[].name' \
    .github/workflows/libvmaf-build-matrix.yml
# Expected output (2 lines):
#   Build — Windows MSVC + CUDA (build only)
#   Build — Windows MSVC + oneAPI SYCL (build only)
  • Branch protection: the two Windows GPU legs are pinned as required status checks on master immediately after this PR's merge. After ADR-0120's two Linux DNN legs the count moves 21 → 23. Re-pin via:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
    --input /tmp/protection-update.json
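
The VLA fixes in items (h) and (k) above take two shapes: an enum constant where the extent is known at compile time, and malloc with a single cleanup label where it is not. The following is a minimal sketch of both, not the code in predict.c or read_json_model.c. The helper names build_suffixed_name and thread_slot_count and the constant N_THREADS are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* An enum constant is an integer constant expression on every compiler,
 * so an array sized by it is not a VLA. */
enum { N_THREADS = 8 };

/* Hypothetical helper: writes "<name><suffix>" into out (capacity out_sz).
 * Returns 0 on success, -ENOMEM or -EINVAL on failure. */
static int build_suffixed_name(const char *name, const char *suffix,
                               char *out, size_t out_sz)
{
    assert(name != NULL && suffix != NULL && out != NULL);
    int err = 0;
    const size_t tmp_sz = strlen(name) + strlen(suffix) + 1;

    /* Was: char tmp[tmp_sz];  (runtime-sized, rejected by MSVC) */
    char *tmp = malloc(tmp_sz);
    if (!tmp)
        return -ENOMEM;

    snprintf(tmp, tmp_sz, "%s%s", name, suffix);
    if (tmp_sz > out_sz) {
        err = -EINVAL;
        goto out; /* the error exit still reaches the free below */
    }
    memcpy(out, tmp, tmp_sz);

out:
    free(tmp);
    return err;
}

/* Fixed-extent array sized by the enum constant. */
static size_t thread_slot_count(void)
{
    int slots[N_THREADS] = {0};
    return sizeof(slots) / sizeof(slots[0]);
}
```

Routing every exit through one label keeps the free in a single place, which is the property the read_json_model.c leak fix relies on.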

0023 — CUDA gencode coverage (sm_86/sm_89/compute_80 PTX) + init hardening

  • Workstream PRs: the ADR-0122 PR (gencode + init hardening) and the ADR-0123 follow-up for the 32b115df post-cubin-load regression.
  • Touches:
  • core/src/meson.build — the gencode array in the if get_option('enable_nvcc') branch.
  • core/src/cuda/common.c — vmaf_cuda_state_init() error paths (multi-line actionable log, cuda_free_functions() + free(c) + *cu_state = NULL cleanup).
  • docs/backends/cuda/overview.md — ## Runtime requirements section and ### GPU architecture coverage table.
  • Invariant: the gencode array unconditionally emits cubins for sm_75 / sm_80 / sm_86 / sm_89 plus a compute_80 PTX, independent of host nvcc version. Upstream Netflix's gencode only ships cubins at Txx major boundaries (sm_75 / sm_80 / sm_90 / sm_100 / sm_120); a literal merge that replaces our array with upstream's would re-open the Ampere-sm_86 / Ada-sm_89 coverage hole. The sm_90 / sm_100 / sm_120 entries are still version-gated and should be preserved verbatim if upstream adds new gates. The init-path error messages are fork-local strings; upstream's terse "Error: failed to load CUDA functions" must NOT win a merge.
  • Re-test:
meson setup build -Denable_cuda=true -Denable_nvcc=true
ninja -C build 2>&1 | grep -E 'compute_(80|86|89)'
# Expect at least -gencode=arch=compute_86,code=sm_86 and
#                -gencode=arch=compute_89,code=sm_89 and
#                -gencode=arch=compute_80,code=compute_80

# Actionable init message (run without CUDA driver on the loader path):
LD_LIBRARY_PATH= ./build/tools/vmaf --help 2>&1 | grep -qi 'libcuda.so.1' || \
    echo "init log regressed"

0024 — vmaf_read_pictures null-guard for CUDA device-only path

  • Workstream PRs: the ADR-0123 follow-up landed atop the ADR-0122 gencode/init-hardening work.
  • Touches:
  • core/src/libvmaf.c — the non-threaded tail of vmaf_read_pictures at the prev_ref update site (line ~1428 in the fork; upstream equivalent is the tail added by f740276a).
  • Invariant: the prev_ref update is guarded by if (ref && ref->ref) so pure-CUDA extractor sets (where ref = &ref_host but ref_host was never populated by translate_picture_device) do not deref a NULL refcount. Upstream currently has the same unguarded tail; the bug is masked upstream only because the experimental VMAF_PICTURE_POOL gate from 32b115df is still in place. A literal upstream merge that removes our null-guard while upstream's experimental gate is still holding would pass tests but re-open the libvmaf_cuda ffmpeg crash the moment the gate flips default-on (which the fork did in 65460e3a, ADR-0104). Keep the guard until the upstream null-guard port lands.
  • Re-test:
# Unit tests cover the non-regression on the library side:
meson test -C build

# End-to-end regression: ffmpeg libvmaf_cuda must exit 0 on a
# CUDA-device-only extractor set (full recipe in ADR-0123).
./ffmpeg -init_hw_device cuda=cu:0 -filter_hw_device cu \
  -i /tmp/ref.mp4 -i /tmp/dis.mp4 \
  -lavfi "[0:v]format=yuv420p,hwupload_cuda[r];\
          [1:v]format=yuv420p,hwupload_cuda[d];\
          [r][d]libvmaf_cuda=log_path=/tmp/out.json:log_fmt=json" \
  -f null -
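
The shape of the guard in the invariant above can be sketched in isolation. The types Ref and Picture and the function update_prev_ref are hypothetical stand-ins, not the libvmaf definitions; only the condition `ref && ref->ref` is taken from the entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a picture and its refcount handle. */
typedef struct { int cnt; } Ref;
typedef struct { Ref *ref; } Picture;

/* Updates *prev_ref only when the host picture was actually populated.
 * Returns 1 if prev_ref was updated, 0 if the update was skipped. */
static int update_prev_ref(Picture *prev_ref, const Picture *ref)
{
    assert(prev_ref != NULL);
    if (ref && ref->ref) {   /* the guard: both the picture and its refcount exist */
        ref->ref->cnt++;     /* take a reference before keeping the picture */
        *prev_ref = *ref;
        return 1;
    }
    return 0;                /* device-only path: nothing on the host to keep */
}
```

Without the second half of the condition, an unpopulated host picture would reach the `ref->ref->cnt++` line with a NULL refcount.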

0025 — VIF init() fail-path frees advanced byte-cursor

  • Workstream PRs: PR #47 (rewritten to leak-fix-only after master absorbed the void→uint8_t half via commit b0a4ac3a, entry 0022 §e). Ports the leak-fix half of upstream Netflix PR #1476.
  • Touches: core/src/feature/integer_vif.c (UPSTREAM — 2-line fix in the init() fail: handler).
  • Invariant: init() walks uint8_t *data forward through aligned_malloc's one allocation, advancing past each sub-pointer assignment. If vmaf_feature_name_dict_from_provided_features returns NULL the fail path must free the base pointer s->public.buf.data, never the advanced cursor data. Upstream master still has aligned_free(data) there — same bug — so this entry is the reminder to not let an upstream sync re-introduce the advanced-cursor form. If upstream lands PR #1476 or an equivalent, the sync can drop this entry.
  • Re-test:
meson test -C build --suite=fast
# Static check: ripgrep the pattern that must NOT return.
rg -n "aligned_free\(data\)" core/src/feature/integer_vif.c && \
    echo 'REGRESSED' || echo 'ok'
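
The base-pointer rule above can be illustrated with a small self-contained sketch. It uses plain malloc/free rather than aligned_malloc/aligned_free, and the struct Carved, the function carve_init and its fail_late switch are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *base;     /* what malloc returned; the only pointer safe to free */
    uint32_t *mu1_32;  /* first sub-buffer */
    uint16_t *mu1;     /* second sub-buffer */
} Carved;

/* Carves one allocation into two typed regions. fail_late simulates a
 * failure after the cursor has advanced. Returns 0 on success, -1 on failure. */
static int carve_init(Carved *c, size_t n, int fail_late)
{
    assert(c != NULL);
    const size_t sz32 = n * sizeof(uint32_t);
    const size_t sz16 = n * sizeof(uint16_t);

    c->base = malloc(sz32 + sz16);
    if (!c->base)
        return -1;

    uint8_t *data = c->base;       /* byte cursor, advanced below */
    c->mu1_32 = (uint32_t *)data;
    data += sz32;
    c->mu1 = (uint16_t *)data;

    if (fail_late) {
        /* free(data) here would hand an interior pointer to free(). */
        free(c->base);
        c->base = NULL;
        c->mu1_32 = NULL;
        c->mu1 = NULL;
        return -1;
    }
    return 0;
}
```

The cursor and the base pointer are equal only before the first advance, which is why a fail path that frees the cursor can look correct in review and still be wrong.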
  • Workstream PRs: this PR (ADR-0124 adoption). Closes the "rule-without-a-check" gap on ADR-0100 / 0105 / 0106 / 0108.
  • Touches (all FORK-ADDED — no upstream overlap): .github/workflows/rule-enforcement.yml (new), scripts/ci/check-copyright.sh (new), .pre-commit-config.yaml (appended local hook).
  • Invariant: the deep-dive-checklist job is blocking on every PR that is not an upstream port (exempt via port: title prefix or port/ branch). The other three gates (doc-substance-check, adr-backfill-check, copyright pre-commit) are advisory or pre-commit, never CI-blocking; this split is the whole point of ADR-0124 and an upstream sync must not move them into the required-status-check set without a follow-up ADR. The opt-out parser matches /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ per ADR-0108 §Opt-out-lines — if upstream ever changes PR-template phrasing (unlikely; this is fork-local), the regex and the template must move together.
  • Re-test:
# Lint the workflow + hook locally.
pre-commit run --files \
  .github/workflows/rule-enforcement.yml \
  scripts/ci/check-copyright.sh \
  .pre-commit-config.yaml

# Dry-run the copyright hook against a staged source file.
scripts/ci/check-copyright.sh core/src/libvmaf.c && echo ok

# Synthetic PR body that violates ADR-0108 should fail the parser;
# see docs/research/0002-automated-rule-enforcement.md §Verification
# plan for the three test cases.
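
The opt-out pattern quoted in the invariant can be exercised outside the workflow. The sketch below is a POSIX ERE translation for regcomp, not the parser the workflow runs: `\s` becomes `[[:space:]]` and the non-capturing group becomes a plain group. Case sensitivity and line-by-line matching are assumptions, and is_opt_out_line is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* POSIX ERE form of /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ */
static const char OPT_OUT_ERE[] =
    "^-?[[:space:]]*no .* (needed|impact|rebase-sensitive)";

/* Returns 1 if line is an opt-out line, 0 if not,
 * -1 if the pattern fails to compile. */
static int is_opt_out_line(const char *line)
{
    assert(line != NULL);
    regex_t re;
    if (regcomp(&re, OPT_OUT_ERE, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    const int matched = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Note that the pattern requires a space on both sides of the `.*`, so a bare "no impact" does not match while "no rebase impact" does.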

0027 — SSIMULACRA 2 scalar extractor (libjxl FastGaussian IIR blur)

  • Workstream PRs: this PR (feat/ssimulacra2-scalar); proposal ADR in PR #67.
  • Touches: core/src/feature/ssimulacra2.c (fork-local, new), core/src/meson.build, core/src/feature/feature_extractor.c.
  • Invariant: the extractor embeds several tables that must track libjxl upstream — opsin absorbance matrix, MakePositiveXYB offsets, 108 pooling weights, polynomial-transform coefficients, and the FastGaussian coefficient-derivation formulas (radius = 3.2795·σ + 0.2546, Cramer's 3×3 solve for β, n2/d1 assignment per Charalampidis 2016 (33)). If libjxl ever changes any of these, update ssimulacra2.c in the same PR that syncs upstream. Self-consistency must stay at exactly 100.000000 for identical ref/dist inputs — this is the cheapest regression check.
  • Re-test:
meson test -C build --suite=fast
./build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 --feature ssimulacra2 -o /tmp/self.xml \
  && grep -q 'ssimulacra2="100.000000"' /tmp/self.xml \
  && echo "ok: self-consistency 100.0"

0028 — MS-SSIM separable decimate + AVX2/AVX-512/NEON SIMD

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 (supersedes the rebase-incompatible feat/ms-ssim-decimate-simd; AVX2/AVX-512, commits 7de8cd7f scalar separable, 5f93c864 AVX2, 73436438 AVX-512); feat/ms-ssim-decimate-neon-v2 (NEON follow-up, stacked).
  • Touches: core/src/feature/ms_ssim_decimate.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx2.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx512.{c,h} (NEW), core/src/feature/arm64/ms_ssim_decimate_neon.{c,h} (NEW), core/src/feature/ms_ssim.c (call-site change), core/src/meson.build (register new SIMD TUs), core/test/test_ms_ssim_decimate.c (NEW), core/test/meson.build (arm64 gating).
  • Invariant: the 9-tap 9/7 biorthogonal wavelet LPF coefficients (ms_ssim_lpf_h / ms_ssim_lpf_v) are duplicated verbatim in five TUs for bit-identity: the scalar ms_ssim_decimate.c, the AVX2 variant, the AVX-512 variant, the NEON variant, and upstream's g_lpf_h / g_lpf_v in ms_ssim.c. Any upstream change to the coefficient values or the KBND_SYMMETRIC mirror branch in iqa/convolve.c must be mirrored to all five. If not mirrored, SIMD paths and scalar diverge silently and the bit-equality memcmp in test_ms_ssim_decimate catches it — but only when that test runs, so diff the five files first.
  • Re-test (on each supported host arch):
# x86_64 host — native build.
meson test -C build
./build/test/test_ms_ssim_decimate

# aarch64 host OR aarch64 cross under qemu — see /tmp/aarch64-cross.txt.
meson setup build-arm64 libvmaf --cross-file /tmp/aarch64-cross.txt \
    -Denable_cuda=false -Denable_sycl=false
ninja -C build-arm64
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
    build-arm64/test/test_ms_ssim_decimate

# Netflix MS-SSIM golden — places=4 must still pass through SIMD.
.venv/bin/python -m pytest \
    python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor

0029 — KBND_SYMMETRIC period-based reflection in iqa/convolve.c

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
  • Touches: core/src/feature/iqa/convolve.c (upstream file, rewritten KBND_SYMMETRIC).
  • Invariant: KBND_SYMMETRIC(img, w, h, x, y, _) must use the period-based form (period = 2*w, period = 2*h) so that offsets with |x| > w or |y| > h still land in bounds. Upstream's single-reflect form was out-of-bounds whenever w < kernel_half or h < kernel_half; the latent bug did not reproduce in Netflix golden tests because MS-SSIM pyramids never decimate below ~60×34. Any upstream change that reverts to the single-reflect form must be rejected or re-ported.
  • Re-test:
./build/test/test_ms_ssim_decimate        # test_1x1 border case
.venv/bin/python -m pytest \
    python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
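
A period-based reflection of the kind described above can be sketched for one axis as follows. This is one common way to write it, assuming half-sample symmetric mirroring (x = -1 maps to 0, x = w maps to w - 1). The fork's exact expression in iqa/convolve.c may differ, and reflect_symmetric is a hypothetical name.

```c
#include <assert.h>

/* Maps any integer offset x onto [0, w) by symmetric reflection with
 * period 2*w, so offsets more than one width away still land in bounds. */
static int reflect_symmetric(int x, int w)
{
    assert(w > 0);
    const int period = 2 * w;
    int m = x % period;   /* C's % keeps the sign of x */
    if (m < 0)
        m += period;      /* now 0 <= m < period */
    return m < w ? m : period - 1 - m;
}
```

For w = 1 every offset maps to 0, which is the 1x1 border case a single-reflect form cannot handle once the kernel half-width exceeds the image size.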

0030 — adm_decouple_s123_avx512 stack-array 64-byte alignment

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
  • Touches: core/src/feature/x86/adm_avx512.c (upstream file, one-line _Alignas(64) on int64_t angle_flag[16] at line 1317). core/test/test_pic_preallocation.c (upstream file, three vmaf_model_destroy(model) calls pairing the vmaf_model_load in test_picture_pool_basic / _small / _yuv444).
  • Invariant: the stack slot for angle_flag must be 64-byte aligned because two _mm512_loadu_si512(&angle_flag[0/8]) loads in the same scope may be promoted to aligned vmovdqa64 by LTO. Dropping the _Alignas(64) annotation re-introduces the SEGV under --buildtype=release -Db_lto=true -Db_sanitize=address. Debug / no-LTO builds keep vmovdqu64 and cannot flag the regression. See docs/development/known-upstream-bugs.md.
  • Re-test:
meson setup build-asan-lto libvmaf \
    -Denable_cuda=false -Denable_sycl=false \
    -Db_sanitize=address --buildtype=release -Db_lto=true
ninja -C build-asan-lto test/test_pic_preallocation
ASAN_OPTIONS=detect_leaks=1 \
    ./build-asan-lto/test/test_pic_preallocation
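
The alignment requirement in the invariant can be checked directly on a stack array. The sketch below only demonstrates the C11 `_Alignas(64)` annotation and a way to observe its effect; it does not reproduce the LTO promotion or the crash, and angle_flag_misalignment is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the address of a 64-byte aligned stack array modulo 64. */
static unsigned angle_flag_misalignment(void)
{
    _Alignas(64) int64_t angle_flag[16] = {0};
    assert(sizeof(angle_flag) == 128);  /* two 64-byte halves: [0] and [8] */
    return (unsigned)((uintptr_t)&angle_flag[0] % 64u);
}
```

With the annotation both 64-byte halves start on a 64-byte boundary, so an aligned load of either half is valid.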

0031 — Batch-A upstream-port small-fix sweep (ports of unmerged PRs)

  • Workstream PRs: feat/batch-a-upstream-small-fix-sweep — commits 546a40ee (T0-1), 8fed8ad1 (T4-4), 83a1db46 (T4-5), 34425dee (T4-6). ADRs 0131, 0132, 0134, 0135.
  • Touches:
  • core/src/cuda/picture_cuda.c (one-line cuMemFree port of Netflix#1382)
  • core/src/feature/feature_collector.c + core/test/test_feature_collector.c (mount/unmount bugfix port of Netflix#1406 + shared-helper test refactor)
  • core/src/meson.build (declare_dependency + override_dependency port of Netflix#1451)
  • core/include/libvmaf/model.h, core/src/model.c, core/test/test_model.c, docs/api/index.md (built-in model iterator port of Netflix#1424)
  • Invariant: each of the four upstream PRs is OPEN (unmerged) on the port date; when Netflix merges any of them, the fork's version is correction-bearing (T4-4 test refactor, T4-6 three defect fixes + Doxygen doc expansion), not line-identical. Resolution on upstream merge is always "keep fork version" because the fork's version already satisfies the PR's intent and additionally fixes the defects.
  • Netflix#1406 conflict will land in test_feature_collector.c — fork uses load_three_test_models() helper vs upstream's inline per-model VmafModel *m0, *m1, *m2; duplication.
  • Netflix#1424 conflict will land in core/src/model.c and core/test/test_model.c — fork uses else if guard + idx + 1 < CNT + const-qualified test types.
  • Netflix#1382 and Netflix#1451 are line-identical in substance; merge should be clean aside from trailing-comma style drift.
  • Re-test:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_feature_collector test/test_model
build/test/test_feature_collector
build/test/test_model
# Expected: 6/6 pass in test_feature_collector (mount/unmount
# 3-model sequences); 39/39 pass in test_model (includes
# test_version_next full-iteration invariant).

0032 — Thread-local locale handling for numeric I/O (port of Netflix/vmaf#1430)

  • Workstream PRs: port/netflix-1430-thread-locale (T4-3 from the "Batch-A follow-up" sweep, 2026-04-20).
  • Touches: core/src/thread_locale.h / core/src/thread_locale.c (new, upstream-authored); core/src/meson.build (two cdata.set('HAVE_USELOCALE'/'HAVE_XLOCALE_H') probes + src_dir + 'thread_locale.c' in libvmaf_sources); core/src/output.c (four writers gain push_c() + pop() bracket, preserving fork's ferror(outfile) ? -EIO : 0 return contract from ADR-0119); core/src/svm.cpp (drop <locale.h> include; replace setlocale/strdup/setlocale bracket with vmaf_thread_locale_push_c/pop; add buffer.imbue(std::locale::classic()) to both SVM parser ctors with fork's K&R + 4-space style); core/src/read_json_model.c (bracket model_parse with push/pop); core/test/meson.build (new test_locale_handling target + test registration); core/test/test_locale_handling.c (new, upstream-authored with three fork corrections for the score_format parameter).
  • Invariant: fork's output writers return ferror(outfile) ? -EIO : 0 — this must survive any upstream refactor of the writer bodies. The push_c() call MUST be paired with a pop() on every return path (writer bodies have a single tail return, so the pattern is locally push → body → pop → return ferror-check). Dropping pop() leaks a locale_t on POSIX and leaves the thread locked to "C" on Windows.
  • Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_locale_handling
# Repro the user-visible failure without the fix:
LC_ALL=de_DE.UTF-8 build/tools/vmaf --reference ref.yuv \
    --distorted dis.yuv --width 1920 --height 1080 \
    --pixel_format 420 --bitdepth 8 --output result.json \
    --json
# Assert output contains period decimals, not comma.
python -c "import json; d=json.load(open('result.json')); \
    assert all('.' in repr(v) for v in \
    [f['metrics']['vmaf'] for f in d['frames']])"
  • On upstream sync: when Netflix merges PR #1430, the (cherry picked from commit 054a97ed…) trailer in git log port/netflix-1430-thread-locale lets the next /sync-upstream skip this commit. If the upstream diff drifts, redo the three fork corrections listed in ADR-0137 §Decision.
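
The push/pop bracket from the invariant above can be sketched with the POSIX 2008 per-thread locale calls. This is an illustration under assumptions, not the contents of thread_locale.c: the names LocaleGuard, locale_push_c, locale_pop and format_score are hypothetical, the Windows path is omitted, and on macOS the same calls are declared in <xlocale.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical state holder for one push/pop bracket. */
typedef struct {
    locale_t c_locale;  /* temporary "C" locale owned by the bracket */
    locale_t previous;  /* what the thread was using before the push */
} LocaleGuard;

/* Switches the calling thread to the "C" locale. Returns 0 on success. */
static int locale_push_c(LocaleGuard *g)
{
    assert(g != NULL);
    g->c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (g->c_locale == (locale_t)0)
        return -1;
    g->previous = uselocale(g->c_locale);
    return 0;
}

/* Restores the previous thread locale and frees the temporary one. */
static void locale_pop(LocaleGuard *g)
{
    uselocale(g->previous);
    freelocale(g->c_locale);
}

/* Formats a score with a period decimal regardless of LC_NUMERIC. */
static int format_score(char *buf, size_t n, double score)
{
    LocaleGuard g;
    if (locale_push_c(&g) != 0)
        return -1;
    const int written = snprintf(buf, n, "%.6f", score);
    locale_pop(&g);  /* paired with the push on the single return path */
    return written;
}
```

Skipping locale_pop in this sketch would leak the locale_t created by newlocale, which is the POSIX half of the failure the invariant warns about.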

0033 — SSIM / MS-SSIM SIMD bit-exact to scalar via per-lane scalar double

  • Workstream PRs: feat/ms-ssim-decimate-neon (this PR — companion to the ADR-0138 convolve fast path).
  • Touches: core/src/feature/x86/ssim_avx2.c and core/src/feature/x86/ssim_avx512.c — ssim_accumulate_* rewritten. ssim_precompute_* and ssim_variance_* unchanged (they were already bit-exact). Plus the new bit-exact convolve_avx2.c / convolve_avx512.c and the upstream h-pass OOB fix at iqa/convolve.c:159.
  • Invariants (see ADR-0139 §Decision):
  • Convolve taps — single-rounded float*float → widen → double add, NO FMA. Mirrors scalar sum += img[i]*k[j] in iqa/convolve.c.
  • SSIM accumulate — scalar's 2.0 * literal (2.0 * ref_mu[i] * cmp_mu[i] + C1 and 2.0 * srsc + C2) is a C double literal. Both SIMD accumulators do the 2.0 * numerator + division + final l*c*s product per-lane in scalar double to match scalar type promotions byte-for-byte.
  • H-pass outer-loop bound — y < dst_h + vc - kh_even (not y < dst_h + vc); the - kh_even is load-bearing because the last cache row on even-tap kernels (e.g. box-8) is never read by the v-pass but was previously written OOB when image height equals kernel height.

Fork-local SSIM SIMD is NOT upstream. If upstream ever adds their own SSIM AVX2/AVX-512, keep the fork's version on conflict — it's the only variant verified bit-exact to scalar at --precision max.
  • Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_iqa_convolve test_ms_ssim_decimate
# Bit-exactness check across dispatch backends:
FIX=python/test/resource/yuv/checkerboard_1920_1080_10_3_0_0.yuv
DIS=python/test/resource/yuv/checkerboard_1920_1080_10_3_1_0.yuv
for m in 255 16 0; do
  build/tools/vmaf --cpumask $m --reference $FIX --distorted $DIS \
      --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
      --feature float_ssim --feature float_ms_ssim \
      --output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_16.xml)    # expect empty
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_0.xml)     # expect empty
  • On upstream sync: the AVX2/AVX-512 SSIM surface is entirely fork-local (upstream has VIF/ADM/motion/CAMBI SIMD but no SSIM). If upstream ever introduces SSIM SIMD, their kernel bodies will almost certainly compute l*c*s in vector float for throughput — do not adopt. The fork's per-lane-scalar-double reduction is required for the bit-exactness claim. Same applies to convolve_avx2/512 — they are fork-only; dispatch sits in ssim_tools.c via _iqa_convolve_set_dispatch.
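
The widen-then-add rule for convolve taps can be written out in scalar C. The sketch assumes default IEEE evaluation (no fast-math) and a double accumulator; taps_widen_then_add is a hypothetical name and the function is an illustration of the rule, not the code in iqa/convolve.c.

```c
#include <assert.h>

/* One output sample: each product is rounded once to float, then widened
 * and accumulated in double. The float temporary keeps the multiply and
 * the add as two separate operations rather than one fused multiply-add. */
static double taps_widen_then_add(const float *img, const float *k, int n)
{
    assert(img != 0 && k != 0 && n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        const float prod = img[i] * k[i];  /* single rounding, in float */
        sum += (double)prod;               /* widen, then add in double */
    }
    return sum;
}
```

A SIMD path that fuses the multiply into the add skips the float rounding of the product, which is enough to change low-order bits and break a bit-exactness comparison.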

0034 — SIMD DX framework + NEON SSIM/convolve bit-exact port

  • Workstream PRs: feat/simd-dx-framework (this PR, PR #A); ships the two demos on top of which PR #B will consume the framework (ssimulacra2, motion_v2, vif_statistic, ...).
  • Touches: core/src/feature/simd_dx.h (new header), core/src/feature/arm64/convolve_neon.c + convolve_neon.h (new NEON port), core/src/feature/arm64/ssim_neon.c (ssim_accumulate_neon rewritten for ADR-0139 bit-exactness; precompute + variance unchanged), core/src/feature/float_ssim.c + core/src/feature/float_ms_ssim.c (wire iqa_convolve_neon into the aarch64 dispatch setters), core/src/meson.build (arm64_sources += convolve_neon.c), core/test/meson.build (test_iqa_convolve arch filter extended to arm64 / aarch64), core/test/test_iqa_convolve.c (NEON variant check + aarch64 CPU flag detection), core/test/dnn/meson.build (test_cli.sh gated on not meson.is_cross_build() — bash invokes $VMAF_BIN directly so meson's exe_wrapper isn't applied), new build-aux/aarch64-linux-gnu.ini meson cross-file, .claude/skills/add-simd-path/SKILL.md (upgraded kernel-spec flags).
  • Invariants (see ADR-0140 §Decision):
  • simd_dx.h is fork-local. Keep the fork's version on upstream conflict. Macro names are ISA-suffixed (_AVX2_4L, _AVX512_8L, _NEON_4L) — do not collapse into a cross-ISA abstraction; the fork's SIMD policy (user-memory feedback_simd_dx_scope.md) rules out Highway / simde / xsimd.
  • The ADR-0138 widen-then-add rule (single-rounded float * float → widen → double add, NO FMA) applies to NEON exactly as to AVX2 / AVX-512. The NEON form uses paired float64x2_t accumulators (lo / hi) because NEON has no float64x4_t.
  • The ADR-0139 per-lane scalar-double reduction rule applies to ssim_accumulate_neon exactly as to the AVX2 / AVX-512 variants. The NEON implementation uses SIMD_ALIGNED_F32_BUF_NEON (_Alignas(16) float name[4]) + a 4-iteration scalar loop.
  • Re-test (requires aarch64-linux-gnu-gcc + qemu-user-static + aarch64 sysroot at /usr/aarch64-linux-gnu):
cd libvmaf
meson setup ../build-aarch64 \
  --cross-file ../build-aux/aarch64-linux-gnu.ini \
  -Denable_cuda=false -Denable_sycl=false -Denable_dnn=disabled
cd ..
ninja -C build-aarch64
meson test -C build-aarch64                       # expect 31/31 OK
# Bit-exactness check scalar vs NEON under QEMU:
REF=python/test/resource/yuv/src01_hrc00_576x324.yuv
DIS=python/test/resource/yuv/src01_hrc01_576x324.yuv
for m in 255 0; do
  LD_LIBRARY_PATH=$PWD/build-aarch64/src qemu-aarch64-static \
    -L /usr/aarch64-linux-gnu build-aarch64/tools/vmaf \
    --cpumask $m --reference $REF --distorted $DIS \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature float_ssim --feature float_ms_ssim \
    --output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_0.xml)     # expect empty
  • On upstream sync: upstream has no NEON SSIM and no NEON convolve for IQA. If they ever add one, keep the fork's version on conflict — the fork's NEON path is the only variant verified bit-exact to scalar at --precision max. The build-aux/aarch64-linux-gnu.ini cross-file has no upstream equivalent. The /add-simd-path skill is fork-only; upstream doesn't ship .claude/skills/.
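
The per-lane scalar-double pattern named in the invariants can be shown without NEON intrinsics. In the sketch below a plain copy stands in for the vector store, ALIGNED_F32_BUF_4 is a hypothetical stand-in for SIMD_ALIGNED_F32_BUF_NEON, the type of C1 is assumed to be double, and luminance_numerator_4 is a hypothetical name. The expression `2.0 * ref_mu[i] * cmp_mu[i] + C1` is the one quoted in ADR-0139's summary in entry 0033.

```c
#include <assert.h>

/* A 16-byte aligned 4-float buffer that vector lanes would be stored into. */
#define ALIGNED_F32_BUF_4(name) _Alignas(16) float name[4]

/* Computes 2.0 * ref_mu * cmp_mu + c1 for four lanes in scalar double.
 * 2.0 is a double literal, so each float operand is promoted to double
 * before the multiply, as in the scalar expression. */
static void luminance_numerator_4(const float ref_mu[4], const float cmp_mu[4],
                                  double c1, double out[4])
{
    assert(ref_mu != 0 && cmp_mu != 0 && out != 0);
    ALIGNED_F32_BUF_4(r);
    ALIGNED_F32_BUF_4(c);
    for (int i = 0; i < 4; i++) {  /* stands in for a vector store */
        r[i] = ref_mu[i];
        c[i] = cmp_mu[i];
    }
    for (int i = 0; i < 4; i++)    /* the 4-iteration scalar loop */
        out[i] = 2.0 * r[i] * c[i] + c1;
}
```

Only the cheap float stages stay in vector registers in this pattern; the stages that depend on double promotion run once per lane in scalar code.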

0036 — Port Netflix generalised AVX convolve + ADR-0141 cleanup

  • Workstream PRs: port/upstream-f3a628b4-generalized-avx-convolve (this PR).
  • Upstream commit: f3a628b4 "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21).
  • Touches:
  • convolution.h — upstream-tracking: adds #define MAX_FWIDTH_AVX_CONV 17.
  • convolution_avx.c — upstream-tracking (2,500 LoC deletion) plus fork-delta cleanup per ADR-0141: four scanline helpers convolution_f32_avx_s_1d_* changed from external linkage to static (no other TU uses them after the specialised-path removal); stride parameters widened from int to ptrdiff_t in the helpers, with (ptrdiff_t) casts at public-function multiplication sites; #include <stddef.h> added for the type.
  • core/src/feature/vif_tools.c — upstream-tracking: three AVX dispatch sites drop the fwidth == 17 || ... == 3 whitelist in favour of fwidth <= MAX_FWIDTH_AVX_CONV.
  • python/test/quality_runner_test.py, python/test/vmafexec_test.py — upstream-authored loosening of two full-VMAF-score assertions from places=2 (±0.005) to places=1 (±0.05). Adopted per the ADR-0142 Netflix-authority precedent (project rule #1 addresses fork drift, not upstream-authored test updates the fork must track).
  • Invariants (see ADR-0143 §Decision):
  • Static linkage on scanline helpers — upstream leaves the four convolution_f32_avx_s_1d_*_scanline helpers with external linkage out of habit; the fork narrows them to static. On upstream sync: if upstream ever externs them from another TU, that's a flag to re-audit; keep the fork's static unless the reference is real.
  • ptrdiff_t strides inside helpers — the public convolution_f32_avx_*_s wrappers keep int strides (matching the upstream interface + convolution.h declarations). Helpers take ptrdiff_t to silence bugprone-implicit-widening-of-multiplication-result. If upstream changes the public interface to ptrdiff_t, drop the fork's wrapper-level casts.
  • MAX_FWIDTH_AVX_CONV = 17 — the ceiling is upstream's; if upstream bumps it, the fork must rebuild + re-run the VIF golden test pair.
  • Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build            # expect 32/32 OK
clang-tidy -p build core/src/feature/common/convolution_avx.c
# Zero warnings expected on the touched file.

Netflix CPU golden CI leg exercises the two loosened assertions; confirmed locally under meson test.
  • On upstream sync: upstream is the source of truth for convolution_avx.c, convolution.h, vif_tools.c dispatch, and the two python golden tolerances. On a rebase, prefer upstream for those files except:
  • Keep the fork's static on the four scanline helpers.
  • Keep the fork's ptrdiff_t helper signatures + multiplication-site casts (unless upstream adopts them too, in which case converge).
  • Keep the fork's #include <stddef.h>.
If upstream re-introduces a specialised fast path for common widths, evaluate on a per-fwidth perf profile — the fork's /profile-hotpath skill covers this.
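
The static-helper and ptrdiff_t-stride conventions in the invariants above can be sketched as a pair of functions. The names row_ptr and sample_at are hypothetical and the bodies are not taken from convolution_avx.c; the sketch only shows where the casts sit relative to the multiplication.

```c
#include <assert.h>
#include <stddef.h>

/* Internal helper: static linkage, ptrdiff_t operands, so row * stride
 * is computed at pointer width. */
static const float *row_ptr(const float *base, ptrdiff_t row, ptrdiff_t stride)
{
    return base + row * stride;
}

/* Public-style wrapper keeps int parameters. The casts widen each operand
 * before the multiplication instead of widening an int product afterwards. */
static float sample_at(const float *img, int row, int col, int stride)
{
    assert(img != NULL && stride > 0);
    return row_ptr(img, (ptrdiff_t)row, (ptrdiff_t)stride)[col];
}
```

Multiplying two int values and then converting the result is the form the clang-tidy check flags, because any overflow has already happened by the time the conversion runs.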

0037 — Float convolution AVX-512 port (ADR-0504, fork-local)

  • Workstream PR: perf/float-convolution-avx512-port-2026-05-18.
  • Upstream: no AVX-512 float convolution path in upstream; this is fork-local (ADR-0504).
  • Touches (fork-local):
  • convolution_avx512.c — new TU with four static scanline helpers and three public wrappers, all ported from convolution_avx.c (__m256 → __m512, FMA added).
  • convolution.h — adds three convolution_f32_avx512_*_s declarations.
  • vif_tools.c — dispatch updated to test VMAF_X86_CPU_FLAG_AVX512 before AVX2 in all three vif_filter1d_*_s functions.
  • core/src/meson.build — adds convolution_avx512.c to x86_avx512_sources.
  • Rebase risk: LOW. convolution_avx512.c is entirely fork-local; upstream changes to convolution_avx.c or convolution.h may need to be mirrored here, but the AVX-512 file has no upstream conflict surface.
  • Gate: meson test -C build (63/63). Netflix CPU golden tests pass.

0038 — motion_v2 NEON SIMD (fork-local)

  • Workstream PR: port/motion-bundle-neon-and-updates (this PR).
  • Upstream: none — aarch64 NEON for motion_v2 is fork-local. Upstream scalar + AVX2 + AVX-512 variants exist; this PR adds the missing NEON fourth path. Scalar is the bit-exactness ground truth.
  • Touches (fork-local):
  • motion_v2_neon.c — new TU, ~300 LoC. 4-wide int32 SIMD over the 5-tap Gaussian pipeline. Five static inline helpers keep every function under the ADR-0141 60-line budget.
  • motion_v2_neon.h — new header declaring the two public entry points.
  • integer_motion_v2.c — dispatch update: adds an #if ARCH_AARCH64 block in init that selects the NEON variant when VMAF_ARM_CPU_FLAG_NEON is present, mirroring the existing x86 dispatch blocks.
  • core/src/meson.build — add arm64/motion_v2_neon.c to the arm64_sources list.
  • Invariants (see ADR-0145 §Decision):
  • Arithmetic right-shift throughout. The fork's AVX2 path uses _mm256_srlv_epi64 (logical) which can diverge from scalar on negative-diff pixels. The NEON port uses vshrq_n_s64(v, 16) for the known Phase-2 shift and vshlq_s64(v, -(int64_t)bpc) for the variable Phase-1 shift — both arithmetic, matching scalar C >> on signed integer. On rebase: keep the arithmetic forms; do NOT adopt vshrq_n_u64 or a logical emulation even if it runs faster.
  • 4-lane stride + mirror tails. SIMD stride = 4; scalar tails cover the remainder. The Phase-2 helper x_conv_row_sad_neon hands 4 lanes to x_conv_block4_neon and drops to scalar for both left/right edges (j < 2 and j + 6 > w). On rebase: preserve the 4-lane stride and the two-sided scalar tail.
  • Signature parity with AVX2. Both pipeline entry points match the AVX2 + AVX-512 variants' (const uint8_t *prev, ptrdiff_t, const uint8_t *cur, ptrdiff_t, int32_t *y_row, unsigned w, unsigned h, unsigned bpc) signature. On rebase: if upstream changes the signature, mirror the change here AND in the x86 variants in lockstep.
  • Re-test:
meson setup build-aarch64 libvmaf \
  --cross-file build-aux/aarch64-linux-gnu.ini \
  -Denable_cuda=false -Denable_sycl=false
ninja -C build-aarch64
meson test -C build-aarch64 --no-rebuild   # expect 31/31 OK
clang-tidy -p build-aarch64 \
  core/src/feature/arm64/motion_v2_neon.c
# Zero warnings expected on the touched file.

# NEON-vs-scalar bit-exact diff under QEMU:
YUV=python/test/resource/yuv
for mask in 0 255; do
  LD_LIBRARY_PATH=build-aarch64/src \
    qemu-aarch64-static -L /usr/aarch64-linux-gnu \
    build-aarch64/tools/vmaf \
    -r $YUV/src01_hrc00_576x324.yuv \
    -d $YUV/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 -n --feature motion_v2 \
    --cpumask $mask -o /tmp/mv2_$mask.xml --precision max
done
diff <(grep -v 'fps=' /tmp/mv2_0.xml) \
     <(grep -v 'fps=' /tmp/mv2_255.xml)  # expect empty
  • On upstream sync: upstream has no NEON motion_v2 and has not signalled plans to add one. If they ever do, diff their NEON against the fork's: on logical-vs-arithmetic shift, keep the fork's arithmetic form (matches scalar). On the function decomposition (the five helpers), adopt upstream's if it's smaller; the fork's layout is ADR-0141-driven, not a semantic contract.
  • Follow-up T7-32 (fixed 2026-05-09): The _mm256_srlv_epi64 (logical right shift) in motion_score_pipeline_16_avx2 was replaced with srav_epi64_imm, an AVX2-safe arithmetic-right-shift emulation: logical shift OR sign-fill mask via srai_epi32 + slli_epi64. Two bugs were closed in the same PR:
  • AVX2 logical-vs-arithmetic shift: _mm256_srlv_epi64 replaced by srav_epi64_imm in core/src/feature/x86/motion_v2_avx2.c. The emulation is bit-exact with scalar C >> bpc on signed int64_t.
  • Test scalar reference mirror: mirror_idx in core/test/test_motion_v2_simd.c used 2*size - idx - 1 instead of 2*size - idx - 2, diverging from integer_motion_v2.c::mirror(). Fixed to -2. All four adversarial fixtures (neg-diff bpc10/12, mixed-diff bpc10/12) now pass. meson test -C build 50/50 OK. On rebase: keep srav_epi64_imm; do not revert to _mm256_srlv_epi64. The rebase-time invariant is now: AVX2 path uses arithmetic shift (matching NEON and scalar).
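
The logical-vs-arithmetic divergence described above, and the "logical shift OR sign-fill mask" repair, can be modelled per lane. This is a minimal sketch, not the fork's code: the helper names are hypothetical, a Python integer stands in for one 64-bit lane, and the real srav_epi64_imm builds its mask with srai_epi32 + slli_epi64 intrinsics rather than a branch.

```python
MASK64 = (1 << 64) - 1  # one 64-bit SIMD lane


def to_u64(x: int) -> int:
    """Reinterpret a signed 64-bit value as its two's-complement bit pattern."""
    return x & MASK64


def to_s64(u: int) -> int:
    """Reinterpret a 64-bit pattern as a signed value."""
    return u - (1 << 64) if u & (1 << 63) else u


def logical_shr(x: int, n: int) -> int:
    """Zero-filling right shift, modelling one lane of _mm256_srlv_epi64."""
    return to_s64(to_u64(x) >> n)


def arithmetic_shr(x: int, n: int) -> int:
    """Scalar reference: Python's >> on ints is sign-propagating."""
    return x >> n


def emulated_sra(x: int, n: int) -> int:
    """Logical shift OR sign-fill mask: top n bits are set when x is negative."""
    u = to_u64(x)
    sign_fill = (MASK64 << (64 - n)) & MASK64 if (u >> 63) and n else 0
    return to_s64((u >> n) | sign_fill)


# The emulation agrees with the scalar shift for every tested shift count,
# including negative inputs where the plain logical shift does not.
for bpc in (8, 10, 12, 16):
    for x in (-65536 * 3 + 5, -1, 0, 12345678):
        assert emulated_sra(x, bpc) == arithmetic_shr(x, bpc)
```

On a negative input the plain logical shift returns a large positive value, which is the kind of mismatch the adversarial neg-diff fixtures are meant to expose.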

0039 — readability-function-size NOLINT sweep (ADR-0146)

  • ADR: ADR-0146
  • Touches:
  • core/src/dict.c
  • core/src/picture.c
  • core/src/picture_pool.c
  • core/src/predict.c
  • core/src/libvmaf.c
  • core/src/output.c
  • core/src/read_json_model.c
  • core/src/feature/feature_extractor.c
  • core/src/feature/feature_collector.c
  • core/src/feature/iqa/convolve.c
  • core/src/feature/iqa/ssim_tools.c
  • core/src/feature/x86/vif_statistic_avx2.c
  • Invariant: every readability-function-size NOLINT suppression has been replaced by a set of small static (or static inline, for the SIMD / IQA files) helpers. The helper names are stable interfaces the surrounding code depends on (e.g. iqa_convolve_1d_separable, iqa_convolve_2d, ssim_compute_stats, ssim_workspace_alloc / _free, vif_stat_simd8_compute / _reduce, struct vif_simd8_lane, read_pictures_extractor_loop, read_pictures_post_extractor, read_pictures_validate_and_prep, read_pictures_update_prev_ref). Upstream Netflix has no equivalent helpers; rebases touching any of these files will conflict against the fork's split shape.
  • On upstream sync:
  • If upstream lands a different decomposition of _iqa_convolve or _iqa_ssim, prefer upstream's shape only if it keeps the ADR-0138 / ADR-0139 bit-exactness invariants (single-rounded float mul → widen to double → double add; per-lane scalar-float reduction through aligned temp buffer). Otherwise keep the fork's split and re-document the divergence here.
  • The fork renamed _calc_scale → iqa_calc_scale to clear the bugprone-reserved-identifier check. If upstream modifies _calc_scale, keep the fork's name and port the behavioural change.
  • model_collection_parse_loop writes directly to cfg_name rather than through c->name — if upstream ever rewrites model_collection_parse, preserve the direct write (it's what lets the param stay non-const without a NOLINT).
  • Re-test on rebase (x86, any libsvm-less host):
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 -o /tmp/vmaf_$mask.xml
done
diff <(grep -v fyi /tmp/vmaf_0.xml) <(grep -v fyi /tmp/vmaf_255.xml)
# expect exit 0 (Netflix-golden-pair VMAF bit-identical scalar vs SIMD)

Also run clang-tidy -p build on every file in Touches; expect zero warnings.
  • Follow-up T7-6: decide whether to rename the _iqa_* API surface (convolve / ssim / decimate / img_filter / filter_pixel / get_pixel) across all callers to clear the remaining bugprone-reserved-identifier suppressions in ssim.c, ms_ssim.c, float_ms_ssim.c. Out of scope here.
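
The accumulation order named under "On upstream sync" for this entry (single-rounded float mul → widen to double → double add) can be illustrated with a small model. This is a sketch under stated assumptions: f32 emulates binary32 rounding through struct, both function names are hypothetical, and the contrast function only shows what changes when the product is not rounded to float first; it is not taken from any upstream code.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_single_rounded(a, b) -> float:
    """float mul (one rounding to binary32), widen to double, double add."""
    acc = 0.0  # double accumulator
    for x, y in zip(a, b):
        # The product of two binary32 values is exact in binary64,
        # so f32() here performs exactly one rounding.
        acc += f32(f32(x) * f32(y))
    return acc


def dot_unrounded(a, b) -> float:
    """Contrast case: the product stays at double precision before the add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += f32(x) * f32(y)
    return acc


v = [1.0 + 2.0 ** -12]
rounded = dot_single_rounded(v, v)
unrounded = dot_unrounded(v, v)
```

With this input the two results differ in the last binary32 bit, which is why a decomposition that moves the rounding point cannot be treated as behaviour-neutral.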

0040 — Thread-pool job recycling + inline data buffer (ADR-0147)

  • ADR: ADR-0147
  • Touches: core/src/thread_pool.c
  • Invariants:
  • VmafThreadPoolJob carries a fixed-size char inline_data[64] buffer. Payloads ≤ 64 bytes go through memcpy(job->inline_data, data, data_sz) + job->data = job->inline_data; payloads > 64 bytes take the legacy malloc path. The cleanup path MUST distinguish the two via job->data != job->inline_data — a naive free(job->data) would corrupt the slot. Enforced in vmaf_thread_pool_job_clear_data.
  • free_jobs list is protected by the existing queue.lock; enqueue pops from it before mallocing, runner recycles onto it after running a job. vmaf_thread_pool_destroy walks the list after vmaf_thread_pool_wait returns (all workers have exited → no lock needed). Any reorder that frees the queue lock before the free_jobs walk is a leak on shutdown.
  • Fork's void (*func)(void *data, void **thread_data) signature + per-worker VmafThreadPoolWorker are fork-local; upstream Netflix #1464 has func(void *data). Keep the fork's signature on any rebase — callers (src/libvmaf.c:threaded_enqueue_one etc.) depend on the two-arg form.
  • On upstream sync: Netflix PR #1464 is CLOSED (not merged) and bundles twelve unrelated optimizations. Only the thread-pool portion is ported here. If upstream ever reopens and merges #1464 (or a successor), cherry-pick only the pool mechanics; reject the payload-signature changes, the ADM / VIF / predict.c pieces (they conflict with ADR-0138 / 0139 / 0142 bit-exactness and with T7-5 predict.c refactor), and the feature-collector capacity bump (fork already capped at 8 for a reason — see src/feature/feature_collector.c).

  • Re-test on rebase (x86, any libsvm-less host):

ninja -C build && meson test -C build
for threads in 1 4; do
  for mask in 0 255; do
    VMAF_CPU_MASK=$mask ./build/tools/vmaf \
      --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
      --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
      --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
      -m version=vmaf_v0.6.1 --threads $threads -o /tmp/vmaf_${threads}_${mask}.xml
  done
done
# Expect bit-identical scores (attribute order may differ across
# --threads 1 vs --threads 4 because feature-collector emits in
# insertion order; the numeric values match).
diff <(grep -v fyi /tmp/vmaf_4_0.xml) <(grep -v fyi /tmp/vmaf_4_255.xml)
# expect exit 0 (scalar vs SIMD threaded)

Also run clang-tidy -p build core/src/thread_pool.c — expect zero warnings. Re-run the 500 000-job micro-benchmark from ADR-0147 §Decision if performance is under investigation.
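
The two invariants above (inline-vs-heap payload storage and the lock-protected free list) can be modelled in a few lines. This is a sketch, not a port of thread_pool.c: the Job and JobPool classes and the counters are hypothetical, object identity (`is`) stands in for the C pointer comparison job->data != job->inline_data, and a counter stands in for free().

```python
import threading

INLINE_CAP = 64  # mirrors char inline_data[64]


class Job:
    def __init__(self):
        self.inline_data = bytearray(INLINE_CAP)  # storage owned by the job slot
        self.data = None                           # inline_data or a separate buffer
        self.data_sz = 0


class JobPool:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for queue.lock
        self.free_jobs = []
        self.allocated = 0            # Job slots created (the "malloc" count)
        self.heap_frees = 0           # large payload buffers released

    def acquire(self, payload: bytes) -> Job:
        with self.lock:
            if self.free_jobs:
                job = self.free_jobs.pop()  # recycle before allocating
            else:
                job = Job()
                self.allocated += 1
        if len(payload) <= INLINE_CAP:
            job.inline_data[: len(payload)] = payload
            job.data = job.inline_data      # small payload: no extra allocation
        else:
            job.data = bytearray(payload)   # large payload: legacy heap path
        job.data_sz = len(payload)
        return job

    def clear_data(self, job: Job) -> None:
        # Only a buffer that is NOT the inline slot may be released.
        if job.data is not None and job.data is not job.inline_data:
            self.heap_frees += 1
        job.data = None
        job.data_sz = 0

    def recycle(self, job: Job) -> None:
        self.clear_data(job)
        with self.lock:
            self.free_jobs.append(job)
```

A short usage run: acquire a small payload, recycle it, then acquire a large one. The second acquire reuses the same slot, and only the large payload increments the release counter.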

0041 — IQA reserved-identifier rename + cleanup (ADR-0148)

  • ADR: ADR-0148
  • Touches: 21 files across core/src/feature/ (iqa/{convolve,decimate,ssim_tools}.{c,h}, iqa/ssim_simd.h, ssim.c, integer_ssim.c, ms_ssim.c, ms_ssim_decimate.h, float_ssim.c, float_ms_ssim.c, x86/convolve_avx2.{c,h}, x86/convolve_avx512.{c,h}, arm64/convolve_neon.{c,h}, AGENTS.md) plus core/test/test_iqa_convolve.c.
  • Invariants:
  • Every _iqa_* / _kernel / _ssim_int / _map_reduce / _map / _reduce / _context / _ms_ssim_* / _ssim_* / _alloc_buffers / _free_buffers symbol and the four underscore-prefixed header guards (_CONVOLVE_H_, _DECIMATE_H_, _SSIM_TOOLS_H_, __VMAF_MS_SSIM_DECIMATE_H__) are renamed to their non-reserved spellings. The fork's IQA surface no longer uses C's reserved-identifier name space.
  • The clang-analyzer-security.ArrayBound NOLINT bracket in ssim_accumulate_row and ssim_reduce_row_range (integer_ssim.c) is load-bearing — the inner kernel-loop k_min / k_max clamping is provably correct (k_min = max(0, hkernel_offs - x), k_max = min(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1))) but the analyzer can't follow it across helper boundaries. Do not collapse the bracket.
  • The clang-analyzer-unix.Malloc NOLINT bracket in test_iqa_convolve.c (check_simd_variant, check_case) is intentional — test exits process on failure path; small allocations leak by design at test end. Do not refactor to free-on-exit.
  • The cross-TU NOLINT pattern on compute_ssim (ssim.c) and compute_ms_ssim (ms_ssim.c) must stay: clang-tidy misc-use-internal-linkage runs per-TU and can't see the header bridge to float_ssim.c / float_ms_ssim.c. Keep the inline justification comment.
  • On upstream sync:
  • The Netflix upstream IQA library (tjdistler/iqa) has been effectively abandoned (last meaningful commit pre-2020). Future rebases will conflict on every renamed symbol; drop the underscore-prefix on each conflict and mirror the fork's iqa_* naming.
  • If upstream Netflix/vmaf ever reincorporates the IQA naming wholesale, prefer the fork's spellings — this PR is a one-shot mechanical rename with no semantic content.
  • Re-test on rebase:
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 \
    --feature float_ssim --feature float_ms_ssim \
    -o /tmp/iqa_$mask.xml
done
diff <(grep -v fyi /tmp/iqa_0.xml) <(grep -v fyi /tmp/iqa_255.xml)
# expect exit 0 (bit-identical scalar vs SIMD on float_ssim/ms_ssim)

Also run clang-tidy -p build on every touched file (excluding arm64/); expect zero warnings.
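
The k_min / k_max clamping that the analyzer cannot follow can be checked by brute force. The sketch below copies the two formulas from the invariant above; everything else is an assumption made for illustration: the function names, an odd centred kernel (hkernel_offs = hkernel_sz // 2), and tap k reading source column x - hkernel_offs + k.

```python
def clamp_taps(x: int, w: int, hkernel_sz: int, hkernel_offs: int):
    """Tap range [k_min, k_max) for output column x, as stated in the invariant."""
    k_min = max(0, hkernel_offs - x)
    k_max = min(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1))
    return k_min, k_max


def all_taps_in_bounds(w: int, hkernel_sz: int) -> bool:
    """True when every clamped tap reads a column inside [0, w)."""
    hkernel_offs = hkernel_sz // 2  # assumption: odd, centred kernel
    for x in range(w):
        k_min, k_max = clamp_taps(x, w, hkernel_sz, hkernel_offs)
        for k in range(k_min, k_max):
            src = x - hkernel_offs + k  # assumption: column read by tap k
            if not 0 <= src < w:
                return False
    return True


# An 11-tap kernel is used purely as an example size.
widths_ok = all(all_taps_in_bounds(w, 11) for w in range(1, 64))
```

Under these assumptions the clamp holds for every width tried, including widths narrower than the kernel, which is consistent with the claim that the NOLINT bracket suppresses a false positive.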

0042 — Port Netflix #1376 — FIFO-hang fix via Semaphore (ADR-0149)

  • ADR: ADR-0149
  • Upstream commit: Netflix PR #1376, head 1c06ca4f1bb5da38b54db075a27c35ba8ea9d7b7 (OPEN upstream as of 2026-04-24).
  • Touches:
  • python/vmaf/core/executor.py — base Executor class + ExternalVmafExecutor-style subclass; delete _wait_for_workfiles / _wait_for_procfiles polling loops; rewrite _open_{work,proc}files_in_fifo_mode around multiprocessing.Semaphore(0); add open_sem=None kwarg to every _open_{ref,dis}_{work,proc}file and to the _open_workfile staticmethod; drop unused from time import sleep.
  • python/vmaf/core/raw_extractor.py — AssetExtractor + DisYUVRawVideoExtractor; add open_sem=None to _open_{ref,dis}_workfile overrides (release on entry since these are no-ops); delete _wait_for_workfiles overrides; drop unused from time import sleep.
  • Fork carve-outs (load-bearing on rebase):
  • python/vmaf/__init__.py:__version__ stays "3.0.0" — do NOT port upstream's bump to "4.0.0". The fork tracks its own versioning (v3.x.y-lusoris.N) per ADR-0025.
  • from time import sleep is dropped from both files — upstream leaves the import in place (unused after their patch); the fork removes it because ADR-0141 touched-file rule requires ruff F401 clean.
  • Upstream typo preserved: the subclass warning message contains "to be created to be created". Comments note the typo inline; do not silently fix on rebase — it's upstream-authored and project policy is verbatim port.
  • On upstream sync: upstream PR #1376 is still OPEN. When it merges, re-diff against the merged form; the touched hunks should be conflict-free because the fork now carries the same shape. Re-check whether upstream fixed the "to be created to be created" typo; if so, adopt the fix (it becomes a simple string update).
  • Re-test:
python3 -m py_compile python/vmaf/core/executor.py \
                       python/vmaf/core/raw_extractor.py
ruff check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
black --check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
# all silent

# No FIFO-mode unit test in the tree; end-to-end harness
# exercise (needs libsvm + ffmpeg + fixtures) goes via
#   make test-netflix-golden
# which doesn't exercise fifo_mode path but does verify the
# refactor didn't break executor.py imports.
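
The shape of the change (an opener signals through a semaphore instead of the caller polling with sleep) can be sketched as follows. This is an illustration only: the function names are hypothetical, threading.Semaphore and regular files are used so the sketch runs anywhere, and the fork's code uses multiprocessing.Semaphore(0) with FIFOs, which exposes the same acquire / release interface.

```python
import os
import tempfile
import threading


def open_workfile(path, open_sem=None):
    """Hypothetical stand-in for an _open_*_workfile method."""
    with open(path, "wb") as f:
        if open_sem is not None:
            open_sem.release()  # signal "file is open" instead of being polled for
        f.write(b"\x00" * 16)


def open_workfiles_and_wait(paths, timeout=5.0):
    """Start one opener per path and block until each one has signalled."""
    open_sem = threading.Semaphore(0)  # starts at 0: acquire blocks until a release
    workers = [
        threading.Thread(target=open_workfile, args=(p, open_sem)) for p in paths
    ]
    for w in workers:
        w.start()
    # One acquire per opener replaces the old exists-then-sleep polling loop.
    ready = all(open_sem.acquire(timeout=timeout) for _ in paths)
    for w in workers:
        w.join()
    return ready


with tempfile.TemporaryDirectory() as d:
    paths = [os.path.join(d, "ref.yuv"), os.path.join(d, "dis.yuv")]
    all_opened = open_workfiles_and_wait(paths)
    sizes = [os.path.getsize(p) for p in paths]
```

An override that does nothing, like the raw_extractor.py ones, would release on entry so the waiting side never blocks on it.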

0043 — Port Netflix #1472 — CUDA on Windows MSYS2/MinGW (ADR-0150)

  • ADR: ADR-0150
  • Upstream commits: Netflix PR #1472, commits 15745cdf (portability) + b7b65e64 (meson plumbing). Both OPEN upstream as of 2026-04-24.
  • Touches:
  • core/src/cuda/common.h — drop <pthread.h> include; rename reserved header guard __VMAF_SRC_CUDA_COMMON_H__ → VMAF_SRC_CUDA_COMMON_INCLUDED.
  • core/src/cuda/cuda_helper.cuh — #ifdef DEVICE_CODE guard around <cuda.h> vs <ffnvcodec/dynlink_loader.h>.
  • core/src/picture.h — #ifdef DEVICE_CODE guard around <cuda.h> + forward-declare VmafCudaState vs <ffnvcodec/*> + full libvmaf_cuda.h; rename reserved header guard.
  • core/src/feature/integer_adm.h — updated comment above dwt_7_9_YCbCr_threshold table noting the fork's positional-initializer shape vs upstream's #ifndef __CUDACC__ shape (see §Fork carve-outs).
  • core/src/feature/cuda/integer_adm/{adm_cm,adm_csf,adm_csf_den,adm_decouple,adm_dwt2}.cu — #ifndef DEVICE_CODE guard around #include "feature_collector.h".
  • core/src/meson.build — Windows nvcc plumbing (+70 LoC under host_machine.system() == 'windows'): vswhere-based cl.exe discovery, MSVC + Windows SDK include path injection, CUDA version detection via nvcc --version, nvcc_ccbin_flags + nvcc_host_includes threaded through every custom_target that invokes nvcc.
  • Fork carve-outs (load-bearing on rebase):
  • integer_adm.h uses positional initializers, NOT upstream's #ifndef __CUDACC__ wrap. Both shapes resolve the MSVC/nvcc C++-designated-initializer issue; the positional form is C++-portable and keeps the table available to future .cu consumers. Keep the fork's form on rebase.
  • cuda_static_lib keeps dependencies : [pthread_dependency]. Upstream drops it; the fork needs it because ring_buffer.c (built as part of cuda_static_lib) #includes <pthread.h> directly. On rebase: keep the fork's version.
  • meson.build gencode coverage block: the fork's ADR-0122 explicit cubin list (sm_75/80/86/89 + compute_80 PTX) sits after the new upstream nvcc-detect block. On rebase, re-assemble the same merged order: nvcc-detect first, then gencode coverage (both host-independent).
  • Header guards: _INCLUDED spellings are fork-local (ADR-0148 precedent). Upstream keeps reserved __VMAF_SRC_*_H__ spellings. On rebase, keep _INCLUDED.
  • On upstream sync: PR #1472 is still OPEN. When merged, re-diff the three conflict-resolved hunks against upstream's final form. Keep fork's version on the four carve-outs above unless upstream meaningfully reshapes those regions.
  • Re-test on rebase (Linux host with CUDA toolkit):
meson setup libvmaf core/build-cuda \
    -Denable_cuda=true -Denable_nvcc=true -Denable_sycl=false
ninja -C core/build-cuda && meson test -C core/build-cuda
# Expect 6 .fatbin files generated + CLI linked + 35/35 tests pass.

Windows validation is operator-driven — CI does not yet have a Windows + MSYS2 + MinGW + MSVC BuildTools + CUDA runner (tracked as T7-3 in .workingdir2/OPEN.md).
  • Prerequisites note (Windows only): nv-codec-headers must be built from git master commit 876af32 or later. The release tag n13.0.19.0 is missing cuMemFreeHost, cuStreamCreateWithPriority, cuLaunchHostFunc, and other CudaFunctions members libvmaf uses. Pre-existing issue, not scope of this port.

0058 — libvmaf.pc Cflags leak fix (ADR-0200)

  • ADR: ADR-0200; bug-fix follow-up to entry 0057.
  • Upstream source: fork-local. Netflix has no Vulkan backend.
  • Touches:
  • core/subprojects/packagefiles/volk/meson.build — drops -include volk_priv_remap.h from volk_dep.compile_args; keeps -DVK_NO_PROTOTYPES.
  • core/src/vulkan/meson.build — pulls volk_priv_remap_h_path from the volk subproject and appends ['-include', <path>] to vmaf_cflags_common (private c_args: on libvmaf's library() call).
  • Invariants (load-bearing):
  • -include MUST stay off volk_dep.compile_args — otherwise it leaks into static libvmaf.pc Cflags. Test on rebase: meson setup ... -Ddefault_library=static -Denable_vulkan=enabled, then grep Cflags meson-private/libvmaf.pc — must NOT contain volk_priv_remap or any build-dir absolute path.
  • -include MUST be applied to libvmaf's compile — every libvmaf TU that calls volk's vk* API needs the rename macros active. The vmaf_cflags_common injection covers this for all libvmaf sub-libraries (libvmaf_feature, libvmaf_cpu, etc.).
  • The path comes from subproject('volk').get_variable(...), not from a hardcoded string — survives volk wrap version bumps.
  • On upstream sync: zero upstream interaction.
  • Re-test on rebase / volk wrap bump:
meson setup build-vk-static-test libvmaf -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false -Ddefault_library=static
ninja -C build-vk-static-test src/libvmaf.a
grep Cflags build-vk-static-test/meson-private/libvmaf.pc
# Expected: no `volk_priv_remap` substring, no build-dir absolute path

0057 — Volk vk* priv-remap for static-archive builds (ADR-0198)

  • ADR: ADR-0198; follow-up to ADR-0185.
  • Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
  • Touches:
  • core/subprojects/packagefiles/volk/meson.build — overlay applied on top of the upstream volk wrap. Adds a custom_target that runs gen_priv_remap.py to produce volk_priv_remap.h from the upstream volk.h, and wires -include of the generated header into volk.c's c_args and volk_dep's compile_args.
  • core/subprojects/packagefiles/volk/gen_priv_remap.py — fork-added generator script (regex against extern PFN_vkXxx vkXxx; declarations).
  • Invariants (load-bearing):
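  • Force-include must propagate to every libvmaf TU pulling in volk_dep — verified via meson dep graph. Removing the -include from compile_args re-introduces the static-link multi-def cascade. Entry 0058 (ADR-0200) later moved the -include off volk_dep.compile_args and onto vmaf_cflags_common; the requirement that every libvmaf TU sees the remap header still holds.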
  • Generator regex matches every vk* PFN declaration in volk.h — confirmed for volk-1.4.341 (784 declarations, 784 remaps). Bumping the volk wrap version: re-run the generator (it's a configure-time custom target, so it's automatic) and confirm the rename count printed to stdout matches the count of ^extern PFN_vk lines in the new volk.h.
  • The renamed symbols use the vmaf_priv_ prefix — chosen to match no upstream Netflix or Vulkan SDK identifier. Don't rename to _vk* (collides with reserved-identifier C namespace) or vkv_* etc.
  • On upstream sync: zero upstream interaction. The volk wrap is a libvmaf-managed subproject; Netflix doesn't ship a Vulkan backend.
  • Re-test on rebase / after any volk wrap bump:
meson setup build-vk-static libvmaf -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false \
    -Ddefault_library=static
ninja -C build-vk-static src/libvmaf.a
test "$(nm build-vk-static/src/libvmaf.a 2>/dev/null \
          | grep -cE '^[0-9a-f]* (T|D|B|R) vk[A-Z]')" = "0" \
    && echo OK

(Followed by the BtbN-style link reproducer in the ADR References section.)
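
The generator's approach, a regex over the extern PFN_vkXxx vkXxx; declarations plus a count check, can be sketched as below. This is not gen_priv_remap.py itself: the function name, the exact regex, and the #define output shape are assumptions; only the vmaf_priv_ prefix and the count comparison against ^extern PFN_vk lines come from the notes above.

```python
import re

# Assumed declaration shape: "extern PFN_vkXxx vkXxx;" at the start of a line.
DECL_RE = re.compile(r"^extern\s+PFN_vk\w+\s+(vk\w+)\s*;", re.MULTILINE)


def gen_priv_remap(volk_h: str, prefix: str = "vmaf_priv_") -> str:
    """Emit one rename macro per vk* function-pointer declaration."""
    symbols = DECL_RE.findall(volk_h)
    # Count check from the invariant: every ^extern PFN_vk line must be remapped.
    expected = len(re.findall(r"^extern PFN_vk", volk_h, re.MULTILINE))
    if len(symbols) != expected:
        raise ValueError(f"matched {len(symbols)} of {expected} declarations")
    return "".join(f"#define {s} {prefix}{s}\n" for s in symbols)


SAMPLE = """\
extern PFN_vkCreateInstance vkCreateInstance;
extern PFN_vkDestroyInstance vkDestroyInstance;
typedef struct VolkDeviceTable VolkDeviceTable;
"""
remap = gen_priv_remap(SAMPLE)
```

Failing loudly on a count mismatch is the useful property here: a volk bump that changes the declaration style would stop the build instead of silently leaving some vk* symbols exported.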

0056 — SSIMULACRA 2 snapshot gate + fp-contract-off split (ADR-0164)

  • ADR: ADR-0164
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
  • Touches:
  • python/test/ssimulacra2_test.py — new fork-added Python test. Uses subprocess.call against ExternalProgram.vmafexec with --feature ssimulacra2; parses the --json output; asserts pooled + per-frame scores.
  • Invariants (load-bearing):
  • Pinned values are CPU-only — generated on master HEAD after PR #100 merge. Re-generate if the scalar or any SIMD path changes semantically (which per ADR-0161/0162/0163's bit-exactness contract, it shouldn't — any bit-exact refactor leaves pinned values unchanged).
  • Tolerance is 4 decimal places (places=4) — matches 1e-4. The CPU paths are bit-exact so actual drift should be 0; the tolerance is defensive.
  • -ffp-contract=off everywhere in the ssimulacra2 pipeline: libvmaf_ssimulacra2_static_lib (scalar extractor), x86_ssimulacra2_avx2_lib, x86_ssimulacra2_avx512_lib, and arm64_ssimulacra2_lib (from ADR-0161). All four split out of their umbrella libs so other extractors keep upstream's default FMA policy. Without this the CI GCC/clang hosts drifted ~2e-4 from my AVX-512 authoring host — GCC 10+ defaults -ffp-contract=fast on x86 with -mfma and on aarch64, fusing a*b+c in scalar glue around the SIMD calls. Do NOT remove any of these carve-outs on rebase.
  • Fixtures are already-checked-in — src01_hrc00/01_576x324 is also the primary Netflix golden fixture; the 160×90 derived one stresses the sub-176 pyramid-termination path.
  • Do NOT modify the Netflix golden assertions in quality_runner_test.py et al. — those are upstream-pinned. This test is a SEPARATE file that adds fork-specific scores.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future, cross-reference against their pinning if they add one.
  • Re-test on rebase / after any ssimulacra2 change:
cd python && python -m pytest test/ssimulacra2_test.py -v   # 2/2
  • Follow-ups:
  • Cross-reference gate against libjxl tools/ssimulacra2 when ssimulacra2_rs cargo install is fixed.
  • Expand fixture coverage if new YUV test assets land.
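
How a places=4 check treats drift of the sizes mentioned above can be sketched with the standard library. This assumes the test relies on unittest's assertAlmostEqual, which the places=4 wording suggests but the notes do not state; the helper name and the score values are hypothetical, not the pinned numbers.

```python
import unittest

_tc = unittest.TestCase()


def within_places(actual: float, pinned: float, places: int = 4) -> bool:
    """True when assertAlmostEqual(actual, pinned, places=places) would pass."""
    try:
        _tc.assertAlmostEqual(actual, pinned, places=places)
        return True
    except AssertionError:
        return False


pinned = 90.0                                  # hypothetical pinned score
tiny_drift = within_places(90.00004, pinned)   # difference of about 4e-5
fp_contract_drift = within_places(90.0002, pinned)  # difference of about 2e-4
```

assertAlmostEqual rounds the absolute difference to the given number of places and compares it with zero, so a drift of about 2e-4, like the one seen before the -ffp-contract=off carve-outs, fails while a much smaller one passes.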

0055 — SSIMULACRA 2 picture_to_linear_rgb SIMD (ADR-0163)

  • ADR: ADR-0163
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
  • Touches:
  • ssimulacra2_avx2.{c,h} — new ssimulacra2_picture_to_linear_rgb_avx2 + helpers (read_plane_scalar_s2, srgb_to_linear_lane_avx2, compute_matrix_coefs).
  • ssimulacra2_avx512.{c,h} — 16-wide AVX-512 port.
  • ssimulacra2_neon.{c,h} — 4-wide aarch64 port.
  • ssimulacra2.c — new ptlr_fn field in Ssimu2State; dispatch wrapper convert_picture_to_linear_rgb unpacks VmafPicture into simd_plane_t[3]; init assigns AVX2/AVX-512/NEON pointers.
  • ssimulacra2_simd_common.h — new shared header declaring simd_plane_t. Decouples SIMD TUs from VmafPicture type.
  • test_ssimulacra2_simd.c — new test_ptlr_420_8, test_ptlr_420_10, test_ptlr_444_8, test_ptlr_444_10, test_ptlr_422_8 subtests + scalar references ref_read_plane, ref_srgb_to_linear, ref_picture_to_linear_rgb.
  • Invariants (load-bearing):
  • Scalar-order matmul — G = Yn + cb_g * Un + cr_g * Vn chained left-to-right in all three SIMD TUs. Regression test catches reordering drift (~1 ulp).
  • Per-lane scalar powf — vector polynomial approximation would drift scalar bit-exactness. Do not replace the lane spill/reload pattern with a vector libm.
  • simd_plane_t layout — {data, stride, w, h} ordering assumed by all three SIMD TUs. The dispatch wrapper builds this from VmafPicture fields; layout must match.
  • Bounds clamping in read_plane_scalar_* mirrors scalar reference verbatim (if (sx < 0) sx = 0; if (sx >= pw) sx = pw-1; etc.). Do not simplify — removes per-lane safety at plane edges.
  • Arbitrary chroma ratios fall through to the int64_t multiplication branch. Don't remove it — SSIMULACRA 2 is supposed to accept non-standard ratios gracefully.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides a SIMD YUV→RGB path, diff against the fork's — preserve the bit-exactness contract unless ADR-0142 Netflix-authority carve-out opens.
  • Re-test on rebase:
ninja -C build && build/test/test_ssimulacra2_simd     # 11/11
ninja -C build-aarch64 && \
  qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
    build-aarch64/test/test_ssimulacra2_simd            # 11/11
  • Follow-ups:
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — still pending (gated on tools/ssimulacra2 availability).
  • SSIMULACRA 2 now has zero scalar hot paths. T3-1 closes in full with phases 1+2+3 (ADR-0161, 0162, 0163).

0054 — SSIMULACRA 2 FastGaussian IIR blur SIMD (ADR-0162)

  • ADR: ADR-0162
  • Upstream source: fork-local. No SSIMULACRA 2 extractor in upstream Netflix/vmaf.
  • Touches:
  • ssimulacra2_avx2.{c,h} — new ssimulacra2_blur_plane_avx2 + 2 helpers (hblur_8rows_avx2, vblur_simd_8cols_avx2).
  • ssimulacra2_avx512.{c,h} — 16-wide port.
  • ssimulacra2_neon.{c,h} — 4-wide aarch64 port, uses vsetq_lane_f32 in place of gather.
  • ssimulacra2.c — adds blur_fn function pointer to Ssimu2State, dispatch in init_simd_dispatch(), call-site in blur_3plane.
  • test_ssimulacra2_simd.c — new test_blur + scalar reference (ref_blur_plane, ref_fast_gaussian_1d).
  • Invariants (load-bearing):
  • Row-batching lane layout — horizontal pass lane i MUST hold row (y_base + i). Gather index vector entries are (y_base + i) * w (stride-w). Changing this breaks bit-exactness vs scalar.
  • Scalar left-to-right summation order — n2_k * sum - d1_k * prev1_k - prev2_k chained sequentially; o0 + o1 + o2 at output time is (o0 + o1) + o2. Changing to (o0 + o2) + o1 or o0 + (o1 + o2) will drift ~1 ulp and the regression test catches it.
  • col_state is 6 * w contiguous floats — layout is [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2]. SIMD loads assume this layout; changing field order requires updating all three SIMD TUs in lockstep with blur_plane.
  • NEON lane-set pattern — aarch64 has no gather intrinsic; 4 explicit vsetq_lane_f32 calls per input vector. Do not replace with a ld1 {v.s}[lane]-style pseudo-gather without re-verifying bit-exactness.
  • Scalar tail in vertical pass matches scalar reference body verbatim. Any deviation breaks memcmp equality on widths that aren't multiples of the SIMD width.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides their own IIR blur SIMD, diff against the fork's and preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out is opened.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd  # 6/6
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
  build-aarch64/test/test_ssimulacra2_simd  # 6/6
  • Follow-ups:
  • picture_to_linear_rgb SIMD — last scalar hot path in the extractor. 2 calls / frame. Low ROI but mechanical.
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — still pending.
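
The col_state layout and the evaluation order from the invariants above can be written down as a small indexing sketch. The function names are hypothetical, the offset formula is derived only from the stated [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2] ordering, and Python floats are doubles, so this shows the grouping of operations rather than the float32 results.

```python
def col_state_offset(field: str, k: int, x: int, w: int) -> int:
    """Index of prev1_k[x] or prev2_k[x] inside the flat 6 * w buffer."""
    base = {"prev1": 0, "prev2": 3}[field]  # prev1_* blocks come first
    return (base + k) * w + x


def iir_term(n2_k: float, s: float, d1_k: float, prev1_k: float, prev2_k: float) -> float:
    """Evaluated left to right: ((n2_k * s) - (d1_k * prev1_k)) - prev2_k."""
    return n2_k * s - d1_k * prev1_k - prev2_k


def ordered_output(o0: float, o1: float, o2: float) -> float:
    """Output sum with the grouping the invariant requires."""
    return (o0 + o1) + o2


w = 10  # example plane width
offsets = {
    col_state_offset(f, k, x, w)
    for f in ("prev1", "prev2")
    for k in range(3)
    for x in range(w)
}
```

Every (field, k, x) triple maps to a distinct slot and the slots exactly fill 6 * w entries, which is the property the SIMD loads depend on.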

0053 — SSIMULACRA 2 SIMD bit-exact ports (ADR-0161)

  • ADR: ADR-0161
  • Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor at all (fork-added in ADR-0130).
  • Touches:
  • ssimulacra2_avx2.c / .h — 5 AVX2 kernels + per-lane cbrtf helper.
  • ssimulacra2_avx512.c / .h — 5 AVX-512 kernels; mechanical 16-wide widening of the AVX2 path.
  • ssimulacra2_neon.c / .h — 5 NEON kernels; 4-wide aarch64 mirror.
  • ssimulacra2.c — adds function-pointer dispatch fields to Ssimu2State + init_simd_dispatch() helper, calls go through the pointers.
  • meson.build — registers the three SIMD TUs in x86_avx2_sources / x86_avx512_sources / arm64_sources.
  • test_ssimulacra2_simd.c and test/meson.build — new bit-exact test harness.
  • Invariants (load-bearing):
  • Byte-for-byte bit-exactness to scalar on all 5 vectorised kernels under FLT_EVAL_METHOD == 0. Regression caught pre-merge: naïve pairing (a+b)+(c+d) vs scalar ((a+b)+c)+d drifts by 1 ULP. Keep sequential scalar-order chains in all three SIMD TUs on rebase.
  • cbrtf is per-lane scalar libm, not a polynomial. Any replacement with a vector cbrt would drift the ssimulacra2 score and break the regression test. Keep the spill/reload pattern.
  • ssim_map / edge_diff_map reductions use the ADR-0139 per-lane double scalar tail. Do NOT SIMD-reduce float lanes then lift to double — summation order changes.
  • downsample_2x2 deinterleave uses ISA-appropriate ops: AVX2 vshufps+vpermpd, AVX-512 vpermt2ps, NEON vuzp1q_f32+vuzp2q_f32. After deinterleave, sum order is ((r0e+r0o)+r1e)+r1o matching scalar.
  • #pragma STDC FP_CONTRACT OFF at every TU header. Ignored by aarch64 GCC (non-fatal -Wunknown-pragmas); kept for portability (clang, MSVC).
  • IIR blur + picture_to_linear_rgb stay scalar in this PR. Follow-up PRs target these; when they land, re-verify bit-exactness via test_ssimulacra2_simd expansion.
  • Runtime dispatch order: AVX-512 > AVX2 on x86; NEON on aarch64; scalar fallback. Preserve on rebase.
  • On upstream sync:
  • Upstream has no SSIMULACRA 2 extractor; nothing to merge.
  • If Netflix adopts SSIMULACRA 2 in the future, diff their implementation against the fork's scalar + SIMD TUs; keep the fork's bit-exactness contract absent a specific Netflix-authority carve-out ADR.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd   # 5/5
clang-tidy -p build core/src/feature/x86/ssimulacra2_avx2.c \
                     core/src/feature/x86/ssimulacra2_avx512.c
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
  build-aarch64/test/test_ssimulacra2_simd   # 5/5
clang-tidy -p build-aarch64 \
  core/src/feature/arm64/ssimulacra2_neon.c
  • Follow-ups:
  • IIR blur vectorisation (blur_plane vertical-pass column batching) — the biggest frame-level wallclock win.
  • picture_to_linear_rgb per-lane powf — lower ROI but mechanical.
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — ADR-0130 deferred; still pending.
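
The 1 ULP drift between pairwise and sequential summation noted in the invariants can be reproduced with emulated binary32 arithmetic. This is a minimal sketch: the helper names are hypothetical, f32 emulates float rounding through struct, and the input values were picked to make the effect visible; they are not taken from the regression that was caught.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add32(a: float, b: float) -> float:
    """binary32 addition: exact sum in binary64, then one rounding to binary32."""
    return f32(a + b)


def sum_sequential(a, b, c, d):
    """Scalar order: ((a + b) + c) + d."""
    return add32(add32(add32(a, b), c), d)


def sum_pairwise(a, b, c, d):
    """Naive SIMD-style pairing: (a + b) + (c + d)."""
    return add32(add32(a, b), add32(c, d))


tiny = 2.0 ** -24  # half an ulp of 1.0 in binary32
seq = sum_sequential(1.0, tiny, tiny, tiny)
pair = sum_pairwise(1.0, tiny, tiny, tiny)
```

In the sequential chain each half-ulp addend is rounded away in turn, while the pairwise form first combines two of them into a full ulp that survives, so the two results differ by exactly one binary32 ulp.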

0052 — psnr_hvs SIMD bit-exact ports (ADR-0159 AVX2, ADR-0160 NEON)

  • ADRs: ADR-0159 (AVX2), ADR-0160 (NEON sister port).
  • Upstream source: fork-local. Upstream Netflix/vmaf has no psnr_hvs SIMD path.
  • Touches:
  • core/src/feature/x86/psnr_hvs_avx2.c — AVX2 TU.
  • core/src/feature/x86/psnr_hvs_avx2.h — AVX2 header.
  • core/src/feature/arm64/psnr_hvs_neon.c — NEON TU (sister port, ADR-0160).
  • core/src/feature/arm64/psnr_hvs_neon.h — NEON header.
  • core/src/feature/third_party/xiph/psnr_hvs.c — add PsnrHvsState + runtime dispatch in init() (AVX2 under ARCH_X86, NEON under ARCH_AARCH64) + scoped NOLINTBEGIN/END around the upstream Xiph scalar block (kept verbatim as the bit-exact reference).
  • core/src/meson.build — add x86/psnr_hvs_avx2.c to x86_avx2_sources and arm64/psnr_hvs_neon.c to arm64_sources.
  • core/test/test_psnr_hvs_avx2.c, core/test/test_psnr_hvs_neon.c — bit-exact unit tests (x86 and aarch64 respectively).
  • core/test/meson.build — register both tests under enable_asm, arch-gated.
  • Invariants (load-bearing):
  • Bit-exactness to scalar: every od_coeff (int32) and every final psnr_hvs_{y,cb,cr,psnr_hvs} value the AVX2 path emits must be byte-identical to the scalar reference on the Netflix golden pairs. If a rebase introduces any pattern that breaks this (e.g. a floating-point horizontal reduce in the mask accumulator), the unit test test_psnr_hvs_avx2 will fail — don't relax the assertions; fix the SIMD path.
  • DCT butterfly layout: butterfly → transpose → butterfly → transpose. The transpose lives inside od_bin_fdct8x8_avx2. Do not move it.
  • Float accumulators stay scalar: means / variances / mask / error accumulation in calc_psnrhvs_avx2 use the same per-block scalar loop as scalar psnr_hvs — bit-exact by construction. Do not vectorize these with horizontal reductions without replicating ADR-0139's per-lane scalar-float reduction pattern. The cross-block error accumulator ret is threaded through accumulate_error() by pointer, not returned-then-summed: each of the 64 per-coefficient contributions per block must hit the outer ret directly, matching scalar's inline ret += ... at third_party/xiph/psnr_hvs.c line 355. IEEE-754 float add is non-associative — summing into a local float and then adding the per-block total to ret changes the summation tree and drifts the Netflix golden by ~5.5e-5.
  • #pragma STDC FP_CONTRACT OFF at the TU header disables FMA formation. Required: fmaf(a, b, c) can differ from (a*b)+c by 1 ulp, breaking bit-exactness. Do not remove the pragma; do not add -ffp-contract=fast to the build flags for this TU.
  • NOLINT suppressions are load-bearing — each cites ADR-0141 inline (bit-exactness scalar-diff auditability for the 30-butterfly function, scalar float→double promotion for sqrt, extractor-registry extern linkage for vmaf_fex_psnr_hvs, upstream-Xiph scoped block for rebase parity).
  • On upstream sync:
  • Upstream has no psnr_hvs SIMD as of 2026-04-24. Keep fork's version on conflict.
  • If upstream ever touches psnr_hvs.c for non-SIMD reasons (e.g. a masking-table update), rebase the AVX2 TU to match line-for-line and re-run test_psnr_hvs_avx2 to confirm bit-exactness survives.
  • The NEON TU (ADR-0160) is a sister port; arm64/psnr_hvs_neon.c mirrors this entry's invariants. On rebase, the two SIMD TUs must stay in lock-step with the scalar reference.
  • Re-test on rebase:
ninja -C build
meson test -C build test_psnr_hvs_avx2
# Expect: 5/5 subtests pass (DCT bit-exact on 3 random seeds +
# delta + constant input).

# CLI-level bit-exactness on Netflix golden (requires the YUV
# fixtures in python/test/resource/yuv/):
# VMAF_CPU_MASK=0    (scalar)
# VMAF_CPU_MASK=255  (AVX2 enabled)
# Diff per-frame psnr_hvs_{y,cb,cr,psnr_hvs} XML fields; expect
# byte-identical across all 3 golden pairs.
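The summation-tree point in the invariants above can be reproduced in a few lines. This is a sketch under assumptions: `accumulate_direct` and `accumulate_batched` are hypothetical names, and the inputs are chosen to make the rounding visible rather than to resemble real psnr_hvs contributions.

```c
#include <stddef.h>

/* Scalar-style accumulation: each contribution is added straight into
 * the caller's running total through the pointer. */
void accumulate_direct(float *ret, const float *contrib, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *ret += contrib[i];
}

/* Tempting but different: sum the block into a local float, then add
 * the block total once. This changes the summation tree. */
void accumulate_batched(float *ret, const float *contrib, size_t n)
{
    float block = 0.0f;
    for (size_t i = 0; i < n; i++)
        block += contrib[i];
    *ret += block;
}
```

Starting from a running total of 2^24 and adding four contributions of 1.0f, the direct form leaves the total unchanged (each single add rounds back down), while the batched form adds 4.0f in one step. Both are valid IEEE-754 results, but only one matches the scalar reference.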

0051 — Netflix#1486 motion updates verified present (ADR-0158)

  • ADR: ADR-0158
  • Upstream source: Netflix upstream PR #1486 ("Port motion updates"), MERGED 2026-04-20 as commits a44e5e6 (code) + 62f47d5 (Netflix golden updates).
  • Touches: documentation-only; the actual code changes this ADR documents are already in the fork's master via earlier incremental motion3 / blend / five-frame-window commits.
  • Invariants (load-bearing for future /sync-upstream):
  • The edge_8 mirror fix (i_tap = height - (i_tap - height + 2)) is present at integer_motion.c:240, x86/motion_avx2.c:147, x86/motion_avx512.c:147. If upstream's mirror line ever diverges again, this is the hunk to watch.
  • The motion_max_val feature option is at integer_motion.c:57,118-120 with default 10000.0 and FEATURE_PARAM flag. Upstream's default = fork's default; don't drift.
  • VMAF_integer_feature_motion3_score output plumbing is in integer_motion.c + alias.c.
  • Fork-local motion extensions (five-frame-window, moving-average, blend, fps_weight) are ADDITIONS on top of Netflix#1486. They are not upstream. Upstream changes to motion extractor internals may conflict with them — diff against core/src/feature/integer_motion.c on every rebase and check that the fork's MIN(s->score * s->motion_fps_weight, s->motion_max_val) invocations are preserved (lines ~409, ~503).
  • On upstream sync: nothing to port from Netflix#1486 — it's absorbed. If a future upstream PR touches the same code paths, prefer upstream's version for the scalar/edge handling and the fork's version for the five-frame-window / blend extensions.
  • Re-test on rebase:
ninja -C build
meson test -C build
# Expect: 35/35 pass.

# Verify the upstream markers are still in place after rebase:
grep -n "height - (i_tap - height + 2)\|motion_max_val\|VMAF_integer_feature_motion3_score" \
    core/src/feature/integer_motion.c \
    core/src/feature/alias.c \
    core/src/feature/x86/motion_avx2.c \
    core/src/feature/x86/motion_avx512.c
# Expect: matches at all 4 files. If any missing, the rebase
# silently dropped the Netflix#1486 content — investigate.
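For orientation when reviewing a conflicting hunk, the two expressions called out in the invariants can be written as standalone helpers. This is a sketch: the helper names, the `i_tap >= height` condition, and the use of double are assumptions; only the mirror formula and the MIN(score * fps_weight, max_val) shape come from the notes above.

```c
/* Bottom-edge mirror: a tap that falls at or past the last row is
 * reflected back into the picture using the formula from the notes. */
int mirror_bottom_tap(int i_tap, int height)
{
    if (i_tap >= height)   /* assumed guard condition */
        i_tap = height - (i_tap - height + 2);
    return i_tap;
}

/* Clamp of the fps-weighted score, written out without the MIN macro. */
double clamp_motion_score(double score, double fps_weight, double max_val)
{
    double weighted = score * fps_weight;
    return weighted < max_val ? weighted : max_val;
}
```

For a 10-row picture, tap 10 maps to row 8 and tap 11 maps to row 7, which is a quick sanity check after a rebase touches the mirror line.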

0050 — CUDA preallocation memory leak fix + vmaf_cuda_state_free (ADR-0157)

  • ADR: ADR-0157
  • Upstream source: Netflix upstream issue #1300 (OPEN since 2024; no maintainer fix as of 2026-04-24). User reports GPU memory rises monotonically across init/preallocate/fetch/close cycles.
  • Touches:
  • core/include/libvmaf/libvmaf_cuda.h — new public vmaf_cuda_state_free() API declaration.
  • core/src/cuda/common.c — new vmaf_cuda_state_free() implementation; vmaf_cuda_release() now calls cuda_free_functions(); vmaf_cuda_state_init() gets an outer failure unwind; init_with_primary_context() releases the retained primary context on fail_after_pop.
  • core/src/cuda/ring_buffer.c (since folded into the per-stream dispatch + drain machinery; see core/src/cuda/dispatch_strategy.c and core/src/cuda/drain_batch.c) — vmaf_ring_buffer_close() then unlocked + destroyed the mutex before freeing.
  • core/test/test_cuda_preallocation_leak.c — new GPU-gated reducer (10-cycle loop with full cleanup).
  • core/test/test_cuda_pic_preallocation.c, core/test/test_cuda_buffer_alloc_oom.c — add missing vmaf_cuda_state_free() + vmaf_model_destroy() calls after vmaf_close() in every test that allocates these.
  • core/test/meson.build — register the new reducer under enable_cuda guard.
  • Invariants (load-bearing):
  • Public contract: every caller of vmaf_cuda_state_init() MUST call vmaf_cuda_state_free() AFTER vmaf_close() on any VmafContext that imported the state. An informal free(cu_state) after close is a silent double-free hazard (vmaf_close's vmaf_cuda_release already memsets and frees the CudaFunctions internals; vmaf_cuda_state_free only frees the heap allocation itself).
  • vmaf_cuda_release() frees CudaFunctions via a saved pointer AFTER the memset. Order matters — memset first so cu_state->f is zeroed in the caller's struct, then free via the saved local. Do not re-order.
  • vmaf_ring_buffer_close() unlocks BEFORE destroying the mutex (POSIX requires the mutex be unlocked for destroy).
  • The cold-start unwind in init_with_primary_context releases cuDevicePrimaryCtxRetain's retained context if cuStreamCreateWithPriority fails.
  • The ADR-0122 / ADR-0123 is_cudastate_empty() null-guards at the top of every public vmaf_cuda_* entry must continue to compose with the new vmaf_cuda_state_free() (which accepts NULL directly and doesn't call through to the CUDA API).
  • The new free call order in callers is: vmaf_close(vmaf) → vmaf_cuda_state_free(cu_state) → vmaf_model_destroy(model). Reversing the first two produces a use-after-free.
  • On upstream sync:
  • Upstream has no vmaf_cuda_state_free() as of 2026-04-24. Keep the fork's version on any conflict. If upstream eventually lands the same API with a different spelling, prefer upstream's spelling and add a compat alias — but do not break the fork's ABI.
  • vmaf_cuda_release()'s cuda_free_functions() call is fork-local. On rebase, keep it.
  • The ring-buffer pthread_mutex_unlock + pthread_mutex_destroy pair is fork-local. On rebase, keep it.
  • If upstream refactors VmafCudaState ownership semantics (unlikely: their pattern has historically been "leaked state in a long-lived process is acceptable"), re-audit this ADR and the new public API.
  • Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 40/40 pass including test_cuda_preallocation_leak.

# ASan leak-check:
cd libvmaf && meson setup build-asan-cuda \
    -Db_sanitize=address -Denable_cuda=true -Denable_sycl=false \
    --buildtype=debug
ninja -C build-asan-cuda
ASAN_OPTIONS='detect_leaks=1:leak_check_at_exit=1' \
    build-asan-cuda/test/test_cuda_preallocation_leak
# Expect: 0 bytes leaked from core/src/* frames.
# (~180 bytes in libcuda.so.1 is expected — driver's process-
#  lifetime cuInit cache, does not grow per cycle.)
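The ownership split between the release step and the final free can be modelled without a GPU. The following is a toy model only: every type and function name is a hypothetical stand-in, not the libvmaf API. It mirrors the documented order, which is to save the pointer, zero the caller-visible struct, free through the saved local, and release the heap allocation itself last.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the real types; all names are hypothetical. */
typedef struct ToyFunctions { int loaded; } ToyFunctions;
typedef struct ToyState { ToyFunctions *f; int released; } ToyState;

ToyState *toy_state_init(void)
{
    ToyState *s = calloc(1, sizeof(*s));
    if (!s) return NULL;
    s->f = calloc(1, sizeof(*s->f));
    if (!s->f) { free(s); return NULL; }   /* unwind on partial failure */
    s->f->loaded = 1;
    return s;
}

/* Models the release run during close: save the pointer, zero the
 * struct so the caller sees f == NULL, then free via the saved local. */
void toy_release(ToyState *s)
{
    if (!s) return;
    ToyFunctions *saved = s->f;
    memset(s, 0, sizeof(*s));
    s->released = 1;
    free(saved);
}

/* Models the final free: only the heap allocation itself. NULL is OK. */
void toy_state_free(ToyState *s)
{
    free(s);
}
```

Running init, release, and free in a loop under ASan is a cheap way to convince yourself that neither step double-frees and that nothing is left behind per cycle.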

0049 — CUDA graceful error propagation (ADR-0156)

  • ADR: ADR-0156
  • Upstream source: Netflix upstream issue #1420 (OPEN as of 2026-04-24). Reports that, with two concurrent VMAF-CUDA processes, the second one crashes at vmaf_cuda_buffer_alloc because CHECK_CUDA(cuMemAlloc) → assert(0) on OOM.
  • Touches:
  • core/src/cuda/cuda_helper.cuh — redefined CHECK_CUDA family. New macros CHECK_CUDA_GOTO + CHECK_CUDA_RETURN + helper vmaf_cuda_result_to_errno. Old assert(0) semantics removed entirely.
  • core/src/cuda/common.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c — all CHECK_CUDA(...) sites converted; cleanup labels added where contexts / buffers were pushed / allocated.
  • core/src/feature/cuda/integer_motion_cuda.c, integer_vif_cuda.c, integer_adm_cuda.c — same conversion; 12 static helpers promoted void → int.
  • core/test/test_cuda_buffer_alloc_oom.c — new GPU-gated reducer.
  • core/test/meson.build — register new test under enable_cuda guard.
  • Invariants (load-bearing):
  • CHECK_CUDA_GOTO / CHECK_CUDA_RETURN must never call assert(0) or abort() on a CUDA error. Any regression back to the upstream abort-on-error semantics re-introduces Netflix#1420 and the NDEBUG footgun.
  • Every CHECK_CUDA_GOTO target label must pop any previously-pushed CUDA context and free any partially-constructed buffers before returning the errno. The graceful path must not leak resources.
  • vmaf_cuda_result_to_errno uses numeric CUresult values directly (0 / 1 / 2 / 3 / 4 / 101 / 201 / 400) so host TUs that don't include <cuda.h> can transitively consume the mapping via the inline function. If upstream renumbers CUresult enum values (historically stable — they've been fixed since CUDA 1.0), re-audit the switch.
  • ADR-0122 / ADR-0123 is_cudastate_empty(...) guards at the top of every public vmaf_cuda_* entry point must stay — they run before the CUDA API is touched and compose cleanly with the new error propagation.
  • Twelve static helper signatures in the feature extractors are int-returning (was void): any upstream-port that restores the void return silently regresses the error path.
  • On upstream sync:
  • Upstream Netflix still uses assert(0) in CHECK_CUDA as of 2026-04-24. Keep the fork's macro definitions in cuda_helper.cuh on any upstream conflict — this file is fork-local behaviour.
  • If upstream eventually lands Netflix#1420 with a similar refactor, prefer the fork's version unless upstream's has identical semantics (no assert(0) / no abort() / translates CUresult to -errno). Re-verify test_cuda_buffer_alloc_oom after rebase.
  • If upstream adds new CHECK_CUDA(...) sites in a port, rewrite them to CHECK_CUDA_GOTO / CHECK_CUDA_RETURN as part of the port commit.
  • If upstream changes any of the 12 static helper signatures back to void, re-promote them to int during the merge.
  • Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 39/39 pass including test_cuda_buffer_alloc_oom.

# Reducer check — verify the OOM-to-errno path is live:
meson test -C core/build-cuda test_cuda_buffer_alloc_oom -v
# Expect subtests: request 1 TiB → -ENOMEM; request 0 bytes → 0.

clang-tidy -p core/build-cuda --quiet \
    core/src/cuda/common.c \
    core/src/cuda/picture_cuda.c \
    core/src/feature/cuda/integer_motion_cuda.c \
    core/src/feature/cuda/integer_vif_cuda.c \
    core/src/feature/cuda/integer_adm_cuda.c \
    core/src/libvmaf.c
# Expect exit 0 on every file.
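A minimal sketch of the goto-based error propagation described in this entry, runnable without CUDA. The macro and function names are hypothetical (prefixed `SKETCH_` / `sketch_`), the allocator is a fake, and the errno chosen for each result code is an assumption apart from out-of-memory (2) mapping to -ENOMEM, which the reducer expectation above implies.

```c
#include <errno.h>
#include <stddef.h>

/* Maps numeric driver result codes to negative errno values. Codes are
 * the ones listed in the notes; most errno choices are assumptions. */
int sketch_result_to_errno(int res)
{
    switch (res) {
    case 0:   return 0;
    case 2:   return -ENOMEM;           /* out of memory */
    case 1: case 101: case 201: case 400:
              return -EINVAL;           /* assumed mapping */
    case 3: case 4:
              return -ENODEV;           /* assumed mapping */
    default:  return -EIO;              /* assumed fallback */
    }
}

/* On failure: store the errno in the local `err` and jump to the
 * cleanup label. Never asserts, never aborts. */
#define SKETCH_CHECK_GOTO(expr, label)              \
    do {                                            \
        int res_ = (expr);                          \
        if (res_ != 0) {                            \
            err = sketch_result_to_errno(res_);     \
            goto label;                             \
        }                                           \
    } while (0)

int g_ctx_depth = 0;   /* counts pushed contexts, sketch only */

/* Fake allocator: reports code 2 above an arbitrary 1 GiB cap. */
int fake_mem_alloc(size_t bytes)
{
    return bytes > ((size_t)1 << 30) ? 2 : 0;
}

int sketch_buffer_alloc(size_t bytes)
{
    int err = 0;
    g_ctx_depth++;                                   /* push context */
    SKETCH_CHECK_GOTO(fake_mem_alloc(bytes), fail);
    g_ctx_depth--;                                   /* pop on success */
    return 0;
fail:
    g_ctx_depth--;                                   /* pop on failure too */
    return err;
}
```

The design point is that the cleanup label restores the context depth before returning, so the failure path leaks nothing and the caller only ever sees a negative errno.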

0049 — compute_motion / picture_copy signature changes (b949cebf upstream port)

  • Upstream commit: Netflix/vmaf b949cebf (feature/motion: port several feature extractor options)
  • Prerequisite commit: Netflix/vmaf d3647c73 (picture_copy: add channel parameter)
  • PR: upstream/port-b949cebf-motion

Rebase-sensitive invariants:

  1. compute_motion signature change: compute_motion() in core/src/feature/motion.c / motion.h now takes an extra int motion_decimate parameter (the motion_add_scale1 flag). Any new caller added in the fork that calls compute_motion() must pass this parameter. The SIMD integer motion callers (motion_avx2.c, motion_avx512.c) do NOT call compute_motion(); they use the SAD/convolution dispatch table directly and are unaffected.

  2. vmaf_image_sad_c signature change — similarly gains int motion_add_scale1. Any caller in the fork must be updated. Currently only called from compute_motion() internally.

  3. picture_copy signature change — gains int channel as the last parameter (0=Y, 1=U, 2=V). Every caller in the tree has been updated to pass 0 (luma). When adding new callers that need UV planes, pass 1 or 2. The fork's CUDA/SYCL/Vulkan callers have been updated in this PR.

  4. Default behavior preserved — all new options default to no-op values. motion_add_scale1=false, motion_add_uv=false, motion_blend_factor=1.0, motion_fps_weight=1.0, motion_filter_size=5 (= DEFAULT_MOTION_FILTER_SIZE). Integer and float motion2 scores are bit-identical to pre-port baseline.

  5. vif_scale_frame_s dependency avoided — the upstream b949cebf motion.c imports vif_scale_frame_s from vif_tools.h. The fork does not have this function yet (vif options chain is deferred, Research-0024 Strategy E). The bilinear downscaler for motion_add_scale1 is implemented as local static functions in motion.c (motion_scale_bilinear, motion_bilinear_interp, motion_mirror_f). When upstream's vif options chain is eventually ported, reconcile by replacing these local functions with vif_scale_frame_s.

Reproducer:

# verify bit-exactness (default options, scores must be identical):
./core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --model path=model/vmaf_v0.6.1.json \
  --feature motion --no_prediction --json --output /tmp/motion.json
# integer_motion2 scores must match pre-port baseline at 6 decimal places.
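The channel convention from invariant 3 (0=Y, 1=U, 2=V, passed as the last parameter) can be pictured with a toy plane copy. This is only an illustration: `ToyPicture` and `toy_picture_copy` are hypothetical and do not reproduce the real VmafPicture layout or the real picture_copy signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit planar picture, not the real VmafPicture. */
typedef struct ToyPicture {
    const uint8_t *data[3];   /* 0 = Y, 1 = U, 2 = V */
    unsigned w[3], h[3];
    size_t stride[3];         /* in bytes */
} ToyPicture;

/* Copies one plane into a float buffer. `channel` is the last
 * parameter; dst_stride is counted in floats. */
void toy_picture_copy(float *dst, size_t dst_stride,
                      const ToyPicture *src, int channel)
{
    for (unsigned i = 0; i < src->h[channel]; i++) {
        const uint8_t *row = src->data[channel] + i * src->stride[channel];
        for (unsigned j = 0; j < src->w[channel]; j++)
            dst[i * dst_stride + j] = (float)row[j];
    }
}
```

A caller that previously relied on luma-only behaviour passes 0 and sees no change; chroma consumers pass 1 or 2 and must size the destination for the subsampled plane.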

0048 — i4_adm_cm int32 rounding overflow deliberately preserved (ADR-0155)

  • ADR: ADR-0155
  • Upstream source: Netflix upstream issue #955 (OPEN since 2020; no maintainer response as of 2026-04-24). Reports that add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1)) in core/src/feature/integer_adm.c scales 1–3 overflows int32_t (1u << 31 = 0x80000000 wraps to -2147483648). Rounding term is sign-negated; ADM scales 1–3 biased low by ≈1 LSB per summed term.
  • Touches (documentation-only):
  • docs/adr/0155-adm-i4-rounding-deferred-netflix-955.md — new ADR (this entry's anchor).
  • core/src/feature/integer_adm.c — in-file warning comment above the overflow site (add_bef_shift_flt[] initialiser loop around line 1277). No code change.
  • core/src/feature/AGENTS.md — invariant note under "Rebase-sensitive invariants".
  • Invariants (load-bearing — do NOT silently "fix"):
  • integer_adm.c keeps int32_t add_bef_shift_flt[3] with the overflowing 1u << 31 assignment. The Netflix golden assertions (python/test/quality_runner_test.py, vmafexec_test.py, feature_extractor_test.py) encode the buggy ADM output. Project hard rule #1 (ADR-0024) prohibits changing those assertions.
  • Any "fix" that changes ADM numerical output must land together with a coordinated Netflix-authored golden-number update (the ADR-0142 Netflix-authority carve-out). Until Netflix#955 closes upstream, there is no authority to track.
  • On upstream sync:
  • If Netflix finally lands a fix for #955 (widening the rounding term to uint32_t or int64_t), sync the C-side fix AND the updated assertAlmostEqual values in the same merge. Re-run make test-netflix-golden and /cross-backend-diff on the golden pairs to verify the new numbers are consistent across CPU / CUDA / SYCL.
  • Remove the in-file warning comment above the add_bef_shift_flt initialiser loop, flip ADR-0155 to Superseded by ADR-NNNN, and drop this rebase-notes entry.
  • If upstream instead closes #955 as wont-fix, keep this entry verbatim and update the ADR status to note upstream's closure.
  • Re-test on rebase (gates the invariant by confirming the golden numbers are unchanged):
ninja -C build
make test-netflix-golden
# Expect: VMAF mean 76.66890… on src01_hrc00/01_576x324 golden
# pair — bit-identical to pre-rebase.
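The overflow itself is easy to reproduce in isolation. The sketch below uses hypothetical helper names and assumes a 32-bit unsigned int and a two's-complement target (the out-of-range conversion is implementation-defined before C23). It is for understanding the preserved behaviour, not a proposed change.

```c
#include <stdint.h>

/* Rounding term as described in the notes: the unsigned shift result is
 * stored into an int32_t. For shift == 32 the value 0x80000000 does not
 * fit and becomes INT32_MIN on the usual targets. Expects 1..32. */
int32_t rounding_term_i32(unsigned shift)
{
    return (int32_t)(1u << (shift - 1));
}

/* Widened variant (one of the fix shapes mentioned for Netflix#955):
 * the term stays positive. Shown for contrast only; the fork keeps the
 * int32_t form on purpose. */
int64_t rounding_term_i64(unsigned shift)
{
    return (int64_t)(1u << (shift - 1));
}
```

For shift 31 both variants agree. Only at shift 32 does the int32_t form flip sign, which is exactly the case the golden assertions currently encode.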

0047 — vmaf_score_pooled -EAGAIN for pending features (ADR-0154)

  • ADR: ADR-0154
  • Upstream source: Netflix upstream issue #755 (OPEN as of 2026-04-24). Upstream maintainer closed the door on the streaming use case in 2020 ("you cannot call vmaf_score_pooled() in a loop"); fork reopens it via error-code semantics without changing the retroactive-write design.
  • Touches:
  • core/src/feature/feature_collector.c: vmaf_feature_collector_get_score returns -EAGAIN (was -EINVAL) when the requested index is valid but not yet written.
  • core/src/feature/feature_collector.h — inline vmaf_feature_vector_get_score now returns -EINVAL for null/out-of-range and -EAGAIN for not-written (was -1 for both). Added #include <errno.h>. Rename reserved __VMAF_FEATURE_COLLECTOR_H__ guard to VMAF_FEATURE_COLLECTOR_INCLUDED.
  • core/test/test_score_pooled_eagain.c — new 4-subtest reducer.
  • core/test/meson.build — register the new test.
  • Invariants (load-bearing, enforced by the reducer):
  • vmaf_feature_collector_get_score(fc, name, &score, i) returns -EAGAIN iff the feature name is registered and i is in range but score[i].written == false.
  • The return stays -EINVAL for (a) null pointers, (b) i >= feature_vector->capacity, (c) unknown feature name.
  • The inline fast-path vmaf_feature_vector_get_score uses the same split.
  • On upstream sync: upstream has not changed the error semantics since 2020. If they do (unlikely), keep the fork's -EAGAIN — it is strictly more informative and downstream code depending on the split would regress.
  • Re-test on rebase:
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: 4/4 subtests pass.

# Reducer check:
git stash push core/src/feature/feature_collector.c core/src/feature/feature_collector.h
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: Fail: 1 (tests fail without -EAGAIN split).
git stash pop
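The error-code split enforced by the reducer can be summarised in a few lines. This is a simplified sketch with hypothetical types (`ToyScore`, `ToyFeatureVector`); the real collector also resolves the feature by name, which is omitted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the collector's types. */
typedef struct ToyScore { bool written; double value; } ToyScore;
typedef struct ToyFeatureVector {
    ToyScore *score;
    unsigned capacity;
} ToyFeatureVector;

/* -EINVAL: bad arguments or index out of range.
 * -EAGAIN: index is valid but the score has not been written yet. */
int toy_vector_get_score(const ToyFeatureVector *fv, unsigned index,
                         double *out)
{
    if (!fv || !out || !fv->score) return -EINVAL;
    if (index >= fv->capacity) return -EINVAL;
    if (!fv->score[index].written) return -EAGAIN;
    *out = fv->score[index].value;
    return 0;
}
```

A streaming caller can treat -EAGAIN as "retry after more frames have been processed" and -EINVAL as a programming error, which is the distinction the old single return value could not express.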

0046 — float_ms_ssim min-dim guard (ADR-0153)

  • ADR: ADR-0153
  • Upstream source: Netflix upstream issue #1414 (OPEN as of 2026-04-24). No upstream fix has landed; fork adds the guard independently.
  • Touches:
  • core/src/feature/float_ms_ssim.c — add #include "log.h" + #include "iqa/ssim_tools.h" + a min_dim = GAUSSIAN_LEN << (SCALES - 1) check at the start of init; extract SIMD dispatch into a new ms_ssim_init_simd_dispatch helper to keep init within the ADR-0141 60-line budget.
  • core/test/test_float_ms_ssim_min_dim.c — new 3-subtest reducer.
  • core/test/meson.build — register the new test executable.
  • Invariant (load-bearing, enforced by the reducer): float_ms_ssim.init returns -EINVAL when w < 176 || h < 176, where 176 is computed dynamically from the filter constants. The magic number is not hardcoded — changing SCALES or GAUSSIAN_LEN upstream will auto-update the minimum.
  • On upstream sync: if Netflix upstream lands a similar init-time guard, keep the fork's version — the helper name ms_ssim_init_simd_dispatch is fork-local (introduced to satisfy ADR-0141) and upstream's patch won't match. Both guards should be compatible; re-verify the reducer after rebase.
  • Re-test on rebase:
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: 3/3 subtests pass.

# Reducer check (confirms the guard is load-bearing):
git stash push core/src/feature/float_ms_ssim.c
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: Fail: 1 (tests fail without the guard).
git stash pop
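The arithmetic behind the 176-pixel minimum is a single shift. In the sketch below the constant values (11 and 5) are assumptions chosen because 11 << (5 - 1) equals the 176 quoted above; the macro and function names are hypothetical, so check the real headers before relying on them.

```c
#include <errno.h>

/* Assumed values: 11 << (5 - 1) == 176 matches the documented minimum. */
#define SKETCH_GAUSSIAN_LEN 11
#define SKETCH_SCALES 5

/* Rejects inputs too small for the coarsest scale of the pyramid. */
int sketch_check_min_dim(unsigned w, unsigned h)
{
    const unsigned min_dim = SKETCH_GAUSSIAN_LEN << (SKETCH_SCALES - 1);
    if (w < min_dim || h < min_dim)
        return -EINVAL;
    return 0;
}
```

Because the bound is derived from the constants rather than hardcoded, changing either constant moves the minimum automatically, which is the property the invariant relies on.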

0045 — vmaf_read_pictures monotonic-index guard (ADR-0152)

  • ADR: ADR-0152
  • Upstream source: Netflix upstream issue #910 (OPEN as of 2026-04-24). No upstream fix has landed; the fork adds the guard independently, per the 2021-10-14 maintainer comment that recommended exactly this shape.
  • Touches:
  • core/src/libvmaf.c — add unsigned last_index + bool have_last_index fields to VmafContext; prepend a monotonic-index check inside read_pictures_validate_and_prep (returns -EINVAL on duplicates / regressions); update the two new fields at the tail of the same helper on success.
  • core/test/test_read_pictures_monotonic.c — new 3-subtest reducer covering the Netflix#910 sequence and the two classes of rejection (duplicate, out-of-order).
  • core/test/meson.build — register the new test executable.
  • Invariant (load-bearing, enforced by the reducer): vmaf_read_pictures(vmaf, ref, dist, index) returns -EINVAL when have_last_index && index <= last_index. Flush (vmaf_read_pictures(vmaf, NULL, NULL, 0)) routes to flush_context before the guard runs — flushing remains always-available independent of the last accepted index.
  • On upstream sync:
  • If Netflix upstream eventually lands a similar guard at the API boundary, keep the fork's version — the helper function name (read_pictures_validate_and_prep) is fork-local (ADR-0146), upstream's patch will target a different insertion point. Both guards should be compatible; re-verify the reducer after rebase.
  • If upstream instead lands an internal reordering mechanism (buffer-and-sort frames before dispatch), revisit this decision — the fork's API-level contract is stricter and may need to relax to match. Open a new ADR if so.
  • Re-test on rebase:
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: 3/3 subtests pass.

# Reducer check (confirms the guard is load-bearing):
git stash push core/src/libvmaf.c
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: Fail: 1 (the test rejects the un-guarded behaviour).
git stash pop
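The guard's control flow, including the flush bypass, can be sketched as follows. `ToyContext` and `toy_read_pictures` are hypothetical; the boolean `has_pictures` stands in for the real call's non-NULL picture pointers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical slice of the context: the two guard fields plus a flag
 * that records a flush for this sketch. */
typedef struct ToyContext {
    unsigned last_index;
    bool have_last_index;
    bool flushed;
} ToyContext;

/* has_pictures == false models the flush call, which is routed away
 * before the monotonic-index guard runs. */
int toy_read_pictures(ToyContext *ctx, bool has_pictures, unsigned index)
{
    if (!ctx) return -EINVAL;
    if (!has_pictures) {
        ctx->flushed = true;
        return 0;
    }
    if (ctx->have_last_index && index <= ctx->last_index)
        return -EINVAL;          /* duplicate or out-of-order index */
    /* ... validation and dispatch would happen here ... */
    ctx->last_index = index;     /* updated only on success */
    ctx->have_last_index = true;
    return 0;
}
```

Note that the guard only requires strictly increasing indices, so gaps (0, then 2) are accepted while repeats and regressions are rejected.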

0044 — i686 (32-bit x86) build-only CI job (ADR-0151)

  • ADR: ADR-0151
  • Upstream source: Netflix upstream issue #1481 (OPEN as of 2026-04-24). Reports i686 compile failure on _mm256_extract_epi64. Workaround documented in the issue: -Denable_asm=false.
  • Touches:
  • build-aux/i686-linux-gnu.ini — new cross-file; gcc + -m32 + cpu_family = 'x86' / cpu = 'i686'. No exe_wrapper.
  • .github/workflows/libvmaf-build-matrix.yml — new matrix row with i686: true flag + new install-deps step for gcc-multilib + g++-multilib; existing "Run tests" + "Run tox tests (ubuntu)" steps widened with && !matrix.i686 guards.
  • Invariants:
  • The i686 matrix row pins -Denable_asm=false — this is the upstream-documented workaround for _mm256_extract_epi64's missing declaration on 32-bit x86 targets. Do NOT remove the flag without first gating every _mm256_extract_epi64 call site in core/src/feature/x86/adm_avx2.c + motion_avx2.c + adm_avx512.c on __x86_64__. Removing the flag naively will re-break the build.
  • No exe_wrapper in the cross-file: meson marks tests as SKIP 77 even though the host can run i686 binaries natively. Build-only gate by design.
  • On upstream sync:
  • If upstream Netflix fixes #1481 at source (by gating the intrinsic calls on __x86_64__ or by emulating via two _mm256_extract_epi32 halves), sync the fix and re-enable ASM on the i686 row (drop -Denable_asm=false from meson_extra). Re-verify bit-exactness via /cross-backend-diff on the x86_64 golden pair.
  • If upstream marks i686 unsupported in meson (e.g. via a hard error), the fork's i686 row should be removed or downgraded to continue-on-error: true.
  • Re-test on rebase (Ubuntu host with gcc-multilib):
meson setup libvmaf core/build-i686 \
    --cross-file=build-aux/i686-linux-gnu.ini \
    -Denable_asm=false \
    -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-i686
file core/build-i686/tools/vmaf
# Expect: ELF 32-bit LSB pie executable, Intel i386

CI runs this same sequence via the new matrix row.
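One of the upstream fix shapes mentioned above is emulating the 64-bit extract with two 32-bit halves. The scalar recombination step of that idea can be sketched portably; `combine_epi32_halves` is a hypothetical helper, and the intrinsic calls that would produce the two halves are deliberately left out so the sketch compiles without AVX2 flags.

```c
#include <stdint.h>

/* Rebuilds a 64-bit lane from its low and high 32-bit halves. Unsigned
 * arithmetic avoids left-shifting a negative value. */
int64_t combine_epi32_halves(int32_t lo, int32_t hi)
{
    uint64_t bits = ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
    return (int64_t)bits;
}
```

If such an emulation ever replaces the intrinsic on 32-bit targets, it should be checked lane by lane against the native 64-bit extract on x86_64 before ASM is re-enabled on the i686 row.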

0058 — Tiny-AI Netflix corpus training scaffold (ADR-0252)

  • ADR: ADR-0252.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training harness or MCP server.
  • Touches:
  • ai/ — training harness; NflxLocalDataset loader reads from --data-root (never from a hardcoded path).
  • docs/ai/training-data.md — corpus path convention and loader API docs; purely additive.
  • mcp-server/vmaf-mcp/tests/test_smoke_e2e.py — new e2e smoke test; references only committed golden fixtures.
  • Invariants (load-bearing):
  • Data path is local-only. .workingdir2/netflix/ is gitignored; no YUV from this corpus is ever committed. The --data-root CLI flag must remain the sole mechanism for locating the corpus.
  • Smoke test uses only committed fixtures. test_smoke_e2e.py references python/test/resource/yuv/src01_hrc00_576x324.yuv (a committed golden file), never the local corpus path. On upstream sync the golden YUV path must stay stable.
  • No Netflix golden assertion is modified. The places=4 tolerance in test_smoke_e2e.py asserts against the vmaf_v0.6.1 CPU reference; it is not a golden assertion and may be adjusted by /regen-snapshots with justification.
  • On upstream sync: zero interaction with Netflix upstream. The ai/ subtree and mcp-server/ are wholly fork-local; upstream merges are conflict-free here. If Netflix ever ships a training harness, reconcile separately.
  • Re-test on rebase:
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (vmaf binary)
# Skips automatically if binary or golden YUV is absent.

0085 — Research-0030 Phase-3b multi-seed validation (Gate 1 passed)

  • No ADR. Empirical research digest closing Gate 1 of the 3-gate v2 validation chain. Architecture decision unchanged.
  • Upstream source: fork-local. Netflix has no multi-seed validation surface for tiny-AI training.
  • Touches (additive only):
  • docs/research/0030-phase3b-multiseed-validation.md — per-seed PLCC tables + stability analysis + Gate 2/3 plan.
  • ai/scripts/phase3_subset_sweep.py — adds --seeds flag (comma-separated list) + per-seed result aggregation.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The +0.0175 Δ is multi-seed mean PLCC, not seed-0 PLCC. Don't cite the +0.0106 from Research-0029 once Research-0030 lands; the multi-seed number is more trustworthy.
  • Subset B is more stable than canonical-6 across seeds. Don't ship a v2 model citing single-seed numbers — always report multi-seed mean ± seed-mean-std for any tiny-AI metric in a future digest.
  • The --seeds flag aggregates by flattening (seed × fold) pairs. The reported mean_plcc is the mean of all n_seeds × n_folds measurements; seed_mean_plcc_std is the std across per-seed means, which is the right number for "is the result seed-stable".
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files reproduce from the canonical command.

0084 — Research-0029 Phase-3b StandardScaler retry (positive result)

  • No ADR. Empirical research digest; revives the Research-0026 hypothesis after the Research-0028 negative result. The architectural decision (ship vmaf_tiny_v2) is gated on three validation steps documented in the digest §"Required before shipping".
  • Upstream source: fork-local. Netflix has no tiny-AI preprocessing-sensitivity analysis surface.
  • Touches (additive only):
  • docs/research/0029-phase3b-standardscaler-results.md — per-fold tables + apples-to-apples comparison + 3-gate pre-shipping checklist.
  • ai/scripts/phase3_subset_sweep.py — adds --standardize flag + _standardize_inplace helper.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • StandardScaler statistics MUST be fit per-fold on the train split only. Fitting on the full data would leak held-out information into LOSO; the _standardize_inplace helper enforces this by taking only the train slice as input.
  • A shipped vmaf_tiny_v2.onnx MUST bundle its scaler (mean, std) in the sidecar JSON — otherwise inference applies different normalisation than training and the win evaporates. Currently UN-implemented; tracked as a §"Caveats" #5 follow-up.
  • Subset B's feature list is the load-bearing finding: adm2, adm_scale3, vif_scale2, motion2, ssimulacra2, psnr_hvs, float_ssim. Phase-3c experiments may shift the optimal arch / lr / epochs but should keep this set.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the --standardize invocation in §"Reproducer".

0082 — Research-0028 Phase-3 subset sweep (negative-result digest)

  • No ADR. Empirical research digest. The architectural decision (no v2 model ships from this Phase) is governed by Research-0027's pre-registered stopping rule.
  • Upstream source: fork-local. Netflix has no tiny-AI subset-sweep surface.
  • Touches (additive only):
  • docs/research/0028-phase3-subset-sweep.md — per-fold tables
    + headline + standardisation caveat + Phase-3b/c/d follow-ups.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • canonical-6 stays the default until Phase-3b lands a ≥ 0.005 PLCC win (per Research-0027 stopping rule).
  • The PLCC drop is most likely a feature-scale issue, not evidence the new features lack signal. Don't cite this digest to retire ssimulacra2 / adm_scale3 from the candidate pool; re-test with StandardScaler first.
  • Phase-3 results are seed=0 only. Any v2-shipping decision needs 3-seed mean±std and KoNViD cross-check.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; runs/ files are reproducible from the canonical command in §"Reproducer".

0081 — Research-0027 Phase-2 feature importance results

  • No ADR. Empirical research digest closing Research-0026 Phase 2; the architectural decision (Subset A / B / C) is deferred to Phase-3 results in a future digest.
  • Upstream source: fork-local. Netflix has no cross-metric feature-importance analysis surface.
  • Touches (additive only):
  • docs/research/0027-phase2-feature-importance.md — per-method top-10 + consensus + redundancy + Phase-3 subset recommendations.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Consensus top-10 is the load-bearing finding: adm2, adm_scale3, ssimulacra2, vif_scale2. Phase-3 candidate subsets MUST include all four.
  • The 11-pair redundancy table is corpus-specific: it was measured on Netflix Public 9-source. A KoNViD-1k cross-check is a Phase-3 prerequisite if Subsets B/C advance.
  • runs/full_features_netflix.parquet and runs/full_features_correlation.json stay gitignored. Reproducer in §"Reproducer" regenerates both.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the canonical commands.

0080 — Phase-2 analysis scripts (Research-0026 Phase 2 prep)

  • No ADR. Pure analysis scaffolding; the architectural decision (which features to ship in v2) is gated on Phase 2's numerical output via Research-0027.
  • Upstream source: fork-local. Netflix has no tiny-AI training nor cross-metric correlation tooling.
  • Touches (additive only):
  • ai/scripts/extract_full_features.py — parquet extractor over Netflix corpus with FULL_FEATURES. Per-clip JSON cache at $XDG_CACHE_HOME/vmaf-tiny-ai-full/<source>/<dis_stem>.json.
  • ai/scripts/feature_correlation.py — Pearson + MI + LASSO + RF + consensus top-K analyser; outputs JSON.
  • ai/tests/test_feature_correlation.py — 5 pytest cases against synthetic parquet (no libvmaf dependency).
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The per-clip JSON cache and the FULL_FEATURES tuple must stay in lock-step. If the tuple grows (or shrinks), pre-existing cache files become stale and silently misalign their stored per_frame columns with the new tuple. The extractor MUST be re-run with a cleared cache when FULL_FEATURES changes. Regression hint: test_default_features_unchanged in test_feature_sets.py already guards the canonical 6; extend coverage to FULL_FEATURES if rebases touch it.
  • motion3 resolves to extractor motion_v2 in _METRIC_TO_EXTRACTOR, not motion3 (the upstream-canonical extractor name in the integer_motion_v2 module). The CLI --feature motion3 does NOT exist. The JSON output key is integer_motion3 which _lookup finds via the integer_ fallback.
  • adm and vif aggregates are NOT in FULL_FEATURES. The integer extractor emits integer_adm2 and integer_vif_scale0..3 but no bare adm/vif. Listing them produced all-NaN columns in v1 — fixed in PR #185 amend.
  • On upstream sync: zero interaction. Pure fork-side analysis tooling.
  • Re-test on rebase:
pytest ai/tests/test_feature_correlation.py ai/tests/test_feature_sets.py -v
# Expect: 14 passed in <1 s.
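
One way to make the cache/tuple lock-step failure loud instead of silent is to store a fingerprint of the feature tuple next to each cached clip and reject entries whose fingerprint no longer matches. This is only a sketch of that idea: the `fingerprint` field and both helper names are assumptions, not part of the extractor's actual cache format.

```python
import hashlib
import json


def feature_fingerprint(features):
    """Short, order-sensitive digest of a feature tuple."""
    payload = json.dumps(list(features)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def cache_is_fresh(cached, features):
    """True only if the cached entry was written for exactly this tuple."""
    return cached.get("fingerprint") == feature_fingerprint(features)


# Illustrative tuples; the real FULL_FEATURES has 21 entries.
old_features = ("adm2", "motion2")
new_features = ("adm2", "motion2", "ssimulacra2")
entry = {"fingerprint": feature_fingerprint(old_features), "per_frame": [[0.9, 1.2]]}
```

Because the digest is order-sensitive, reordering the tuple also invalidates the cache, which matches the column-misalignment risk described in the invariant.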

0079 — Tiny-AI feature-set registry (Research-0026 Phase 1)

  • No ADR. Pure additive extension of an existing module; the architectural decision (which features, which model) lives in Research-0026's go/no-go gate after Phase 2.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training pipeline.
  • Touches (additive only):
  • ai/data/feature_extractor.py — adds FULL_FEATURES (21 entries), FEATURE_SETS registry, resolve_feature_set() helper. _METRIC_TO_EXTRACTOR grew 11 → 25 entries.
  • ai/tests/test_feature_sets.py — new 9-test smoke suite.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant — these are load-bearing):
  • DEFAULT_FEATURES stays the canonical 6-tuple matching vmaf_v0.6.1's SVR input layout. Test test_default_features_unchanged is the regression guard; any quiet broadening would invalidate every shipped tiny-AI ONNX (input-dim baked into the model). If a future change must broaden the default, ship a paired model swap under ADR-0049 sidecar policy.
  • FULL_FEATURES excludes lpips and float_moment per Research-0026 §"Open questions" Q1. Test test_full_features_excludes_lpips_and_moment enforces. Adding either would re-classify the experiment from "tiny model on classical features" to "ensemble of DNNs".
  • Every entry in FULL_FEATURES MUST have an entry in _METRIC_TO_EXTRACTOR. Test test_every_full_feature_has_extractor_mapping is the guard — without the mapping the libvmaf CLI silently emits NaN columns for the missing metric.
  • On upstream sync: zero interaction. Fork-only training surface.
  • Re-test on rebase:
pytest ai/tests/test_feature_sets.py -v
# Expect: 9 passed in <1 s.
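
The third invariant is a plain coverage check between the feature tuple and the mapping. A minimal sketch of that check, using small stand-in data (the extractor names on the right-hand side are illustrative assumptions, and the real guard is the pytest case named above):

```python
def missing_extractor_mappings(features, metric_to_extractor):
    """Return the features that have no extractor mapping, in tuple order."""
    return [name for name in features if name not in metric_to_extractor]


# Stand-ins for FULL_FEATURES and _METRIC_TO_EXTRACTOR (hypothetical contents).
features = ("adm2", "motion2", "ssimulacra2")
mapping = {"adm2": "adm", "motion2": "motion"}

missing = missing_extractor_mappings(features, mapping)
```

An empty result is the passing state; any name returned here would surface downstream as an all-NaN column.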

0078 — Research-0026 cross-metric feature fusion plan

  • No ADR. Pure research-plan digest; the architectural decision (which features to add) is deferred to Research-0027 follow-up after Phase 2 numbers land.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training and no broader-feature-set hypothesis under investigation.
  • Touches (additive only):
  • docs/research/0026-cross-metric-feature-fusion.md — 4-phase experimental plan + cost estimate + go/no-go criteria.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The 6-feature canonical baseline (adm2, vif_scale0..3, motion2) stays the default. Any v2 model is opt-in via a new feature_set field in the sidecar JSON; existing vmaf_tiny_v1.onnx users get the same numbers.
  • lpips is OUT of the candidate pool (Phase 1/2). It's DNN-based and would blur the line between "tiny model on classical features" and "ensemble of DNNs". Revisit only if classical features can't close the gap.
  • On upstream sync: zero interaction. Pure fork-side research planning.
  • Re-test on rebase: documentation-only; no test surface.

0077 — Research-0025 FoxBird outlier resolved via KoNViD combined training

  • No ADR. Empirical research digest closing the open question in Research-0023 §5; no architecture or policy decision. Pure documentation of an empirical result.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training, no KoNViD-1k integration, and no LOSO eval surface.
  • Touches (additive only):
  • docs/research/0025-foxbird-resolved-via-konvid.md — per-clip table + comparison to Netflix-only baselines + interpretation + caveats + next-experiment list.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The training-fit per-clip numbers in §"Per-clip result" are NOT held-out generalisation metrics — FoxBird is in the training set. The proper validation is the LOSO sweep on the combined corpus (§"Next experiments" #1). Don't cite the 0.9936 FoxBird PLCC as a generalisation number; cite it as "training-fit on combined corpus, 5.4× RMSE improvement vs Netflix-only".
  • Combined trainer command line is canonical. The reproduction recipe in §"Setup" includes --seed 0, --konvid-val-fraction 0.1, --val-source Tennis, --val-mode netflix-source-and-konvid-holdout. Changing any knob invalidates the per-clip numbers.
  • runs/tiny_combined_canonical/ stays gitignored. The final ONNX is reproducible from the parquet + Netflix corpus + the canonical CLI; the durable record is the digest's table.
  • On upstream sync: zero interaction. Research digest is fork-only.
  • Re-test on rebase:
python ai/train/train_combined.py \
  --netflix-root .workingdir2/netflix \
  --konvid-parquet ai/data/konvid_vmaf_pairs.parquet \
  --model-arch mlp_small --epochs 30 --batch-size 256 --lr 1e-3 \
  --val-mode netflix-source-and-konvid-holdout \
  --val-source Tennis --konvid-val-fraction 0.1 --seed 0 \
  --out-dir runs/tiny_combined_canonical
# Expect: FoxBird PLCC ≈ 0.9936 ± 1e-3 (numerical-noise floor),
# mean PLCC ≥ 0.9983 across 9 Netflix clips.

0076 — Research-0024 vif/adm upstream-divergence digest (Strategy E doc)

  • No ADR. Pure documentation digest; the divergence decisions it ratifies are already governed by ADR-0138 / 0139 / 0142 / 0143 (vif SIMD bit-exactness contract) and ADR-0024 (Netflix golden-data immutability). The digest itself fits the per-PR research-digest deliverable bar from ADR-0108.
  • Upstream source: forward-looking — pre-emptively documents the fork's non-port of Netflix 4ad6e0ea / 41d42c9e / bc744aa3 / 8c645ce3 (vif chain) and 4dcc2f7c (float_adm chain). Strategy A on b949cebf motion chain stays approved.
  • Touches (additive only):
  • docs/research/0024-vif-upstream-divergence.md — 5-strategy decision matrix + numerical-risk analysis for each chain.
  • core/src/feature/AGENTS.md — two new "rebase-sensitive invariants" entries pinning the vif and adm divergences.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant — these are the whole point):
  • Do not port 4ad6e0ea (vif runtime helpers) or 8c645ce3 (vif prescale options) verbatim. They replace the precomputed vif_filter1d_table_s table whose frozen const float Gaussians make AVX2 == AVX-512 == NEON == scalar bit-for-bit. A future opt-in second-path port (Strategy C, runtime helpers behind --vif-prescale != 1) is allowed but must not touch the default code path.
  • Do not port 4dcc2f7c float_adm options chain. The 12-parameter compute_adm signature change cascades through SIMD (avx2 / avx512 / neon) and 3 GPU backends (vulkan / cuda / sycl). The new aim feature has no fork-side golden values; defer until concrete user demand.
  • Mirror bugfix 41d42c9e is a separate decision. Must come paired with places=4 → places=3 golden loosening per ADR-0142 Netflix-authority precedent. Not part of Strategy E; eligible for a focused single-purpose PR if any shipped model drifts more than places=3 because of the missing fix.
  • b949cebf motion chain port stays APPROVED under Strategy A (verbatim, float_motion-side only). Float_motion has no precomputed-table investment to protect; existing fork integer_motion already has 6/9 of these options; cheap to mirror onto float_motion.
  • On upstream sync: zero conflict — pure additions to research/ and AGENTS.md.
  • Re-test on rebase: documentation-only PR; rendered markdown is the only verification surface.
# Re-run the diff scan that produced the digest (catches new
# upstream commits since 9dac0a59):
git fetch upstream && git log --pretty=format:'%h %s' \
  upstream/master ^origin/master --since="2026-01-01" \
  -- core/src/feature/{float_,integer_,}{vif,motion,adm,cambi}*.{c,h} \
     core/src/feature/{vif,motion,adm,cambi}_options.h \
  | head -30
# If new vif / adm option ports appear, update Research-0024 §"Same
# divergence test for motion + float_adm" before deciding to port.

0075 — Upstream 798409e3 + 314db130 ports (CUDA null-deref + remove all.c)

  • No ADR. Pure upstream cherry-picks per ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
  • Upstream source:
  • 798409e3 (Lawrence Curtis, 2026-04-20): "Fix null deref crash on prev_ref update in pure CUDA pipelines"
  • 314db130 (Kyle Swanson, 2026-04-28): "libvmaf/feature: remove empty translation unit all.c"
  • Touches (additive / removal only):
  • core/src/libvmaf.c — adds if (ref && ref->ref) guard before vmaf_picture_ref(&vmaf->prev_ref, ref) at the two threaded paths (threaded_enqueue_one line 1057 and threaded_read_pictures_batch line 1105). Main path at line 1597 already has the guard.
  • core/src/feature/all.c — file deleted.
  • core/src/meson.build — drops the feature_src_dir + 'all.c' line.
  • core/src/feature/offset.c — updates the // NOLINTNEXTLINE comment to drop all.c from the list of per-feature consumers.
  • CHANGELOG.md Unreleased § Fixed (798409e3) + § Changed (314db130).
  • Invariants (rebase-relevant):
  • The fork has THREE prev_ref update sites; all need the if (ref && ref->ref) guard. The main vmaf_read_pictures path already had it (via read_pictures_update_prev_ref helper); the threaded paths (#ifdef VMAF_BATCH_THREADING) inherited the unguarded shape from upstream's old code. Future upstream rebases must preserve all three guards even if Netflix refactors the threaded paths.
  • all.c deletion is symbol-safe. All compute_* functions it forward-declared are reached via per-extractor TUs that #include the relevant <feature>.h. No external linker dependency on all.c's symbols.
  • On upstream sync: zero conflict expected — fork now matches upstream tip on these two surfaces.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=disabled
ninja -C build-cpu
meson test -C build-cpu  # 37 tests, all pass.

0074 — Combined Netflix + KoNViD-1k trainer driver

  • No ADR. Pure engineering follow-up; the architecture rationale is fully covered by ADR-0203 (training-prep architecture) and Research-0023 §5 (FoxBird-class outlier needs broader corpus).
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI trainer.
  • Stacks on the KoNViD-1k loader bridge (PR #178 / rebase-note 0073). Rebase order: land 0073 first.
  • Touches (additive only):
  • ai/train/train_combined.py — concatenating trainer that reuses _build_model / _train_loop / export_onnx from ai/train/train.py.
  • ai/tests/test_train_combined_smoke.py — 5 pytest cases (key splitter + --epochs 0 paths, no libvmaf or real corpus required).
  • docs/ai/training.md — "Combining KoNViD with the Netflix corpus" subsection rewritten from "follow-up" to runnable.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Reuse the canonical training-loop helpers. Don't fork _build_model / _train_loop / export_onnx into this file. Both trainers must share the model factory so a future change (e.g. adding mlp_large) lands in one place.
  • KoNViD train/val splits hold out whole clip keys, not random frames. A frame-level split would let frames from the same clip leak across train/val and inflate PLCC by 5-10 pp (well-known VQA pitfall — same reasoning as ADR-0203's Netflix 1-source-out split).
  • Missing data falls back, not errors. Missing --konvid-parquet → Netflix-only path. Missing --netflix-root → KoNViD-only path. Both missing → initial-weights ONNX export + rc=0 so the smoke command always produces a deterministic artefact.
  • On upstream sync: zero interaction; pure fork-local trainer.
  • Re-test on rebase:
pytest ai/tests/test_train_combined_smoke.py -v
# Expect: 5 passed (under ~3 s, no libvmaf required).
python ai/train/train_combined.py --epochs 0 \
  --netflix-root /tmp/missing --konvid-parquet /tmp/missing.parquet \
  --out-dir /tmp/combined_smoke
# Expect: <out-dir>/mlp_small_combined_final.onnx written, rc=0.
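
The whole-clip holdout rule can be sketched as follows. This is an illustration of the splitting principle under assumed inputs (one clip key per frame row); `split_by_clip_key` is a hypothetical helper, not the splitter in train_combined.py:

```python
import random


def split_by_clip_key(keys, val_fraction, seed):
    """Split frame indices so that every clip lands wholly in train or val."""
    unique = sorted(set(keys))          # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = max(1, round(len(unique) * val_fraction))
    val_keys = set(unique[:n_val])
    train_idx = [i for i, k in enumerate(keys) if k not in val_keys]
    val_idx = [i for i, k in enumerate(keys) if k in val_keys]
    return train_idx, val_idx, val_keys


# 10 clips with 3 frames each (synthetic keys).
frame_keys = [f"clip{c}" for c in range(10) for _ in range(3)]
train_idx, val_idx, val_keys = split_by_clip_key(frame_keys, 0.1, seed=0)
```

Shuffling keys rather than frame rows is what prevents frames of one clip from appearing on both sides of the split.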

0073 — KoNViD-1k → VMAF-pair acquisition + loader bridge

  • No ADR. Acquisition + loader pieces are pure additions; the methodology fits inside ADR-0203 / Research-0019.
  • Upstream source: fork-local. KoNViD-1k integration is a fork-only training-data play.
  • Touches (additive only):
  • ai/scripts/konvid_to_vmaf_pairs.py — acquisition pipeline.
  • ai/train/konvid_pair_dataset.py — KoNViDPairDataset class mirroring NetflixFrameDataset's interface.
  • ai/tests/test_konvid_pair_dataset.py — 5 pytest cases.
  • docs/ai/training.md — new "C1 (KoNViD-1k corpus)" section.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • KoNViDPairDataset mirrors NetflixFrameDataset shape. feature_dim == 6, numpy_arrays() → (X, y) returns (n_frames, 6) + (n_frames,). If NetflixFrameDataset's feature order changes, mirror it here.
  • Acquisition parquet schema is fixed. Required columns: key, frame_index, vif_scale0..3, adm2, motion2, vmaf. Add freely; do NOT rename / drop those.
  • ai/data/konvid_vmaf_pairs.parquet and $VMAF_TINY_AI_CACHE/konvid-1k/ stay gitignored. They regenerate from raw KoNViD .mp4 sources.
  • On upstream sync: zero interaction.
  • Re-test on rebase:
pytest ai/tests/test_konvid_pair_dataset.py -v
# Expect: 5 passed
python ai/scripts/konvid_to_vmaf_pairs.py --max-clips 5
# Expect: ~7 s wall, ai/data/konvid_vmaf_pairs.parquet with
#         5 unique keys × ~200 frames each.
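
The fixed-schema invariant can be checked mechanically after a rebase. A minimal sketch, assuming the column names have already been read from the parquet file by whatever reader is in use; `missing_columns` is a hypothetical helper:

```python
# Required columns as listed in the invariant above (vif_scale0..3 expanded).
REQUIRED_COLUMNS = (
    "key", "frame_index",
    "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3",
    "adm2", "motion2", "vmaf",
)


def missing_columns(columns):
    """Return required columns that are absent; extra columns are allowed."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


ok = missing_columns(list(REQUIRED_COLUMNS) + ["extra_feature"])
broken = missing_columns([c for c in REQUIRED_COLUMNS if c != "motion2"])
```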

0072 — Tiny-AI 3-arch LOSO eval harness + Research-0023

  • No ADR. Methodology fits inside Research-0023; ADR-0203 already covers the training-prep architecture and the three-arch sweep concept.
  • Research digest: docs/research/0023-loso-3arch-results.md.
  • Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
  • Touches (additive only):
  • ai/scripts/eval_loso_3arch.py — new harness; reuses the _load_session + _load_clip + CLIPS helpers from eval_loso_mlp_small.py (PR #165).
  • docs/research/0023-loso-3arch-results.md — methodology + per-fold tables for mlp_small / mlp_medium / linear.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Reuse the PR #165 helpers. Don't fork the _load_session external-data workaround into a copy — both scripts must keep using the same import. If a follow-up re-exports the shipped baselines with corrected external_data.location, both scripts deprecate the workaround simultaneously.
  • runs/ and model/tiny/training_runs/ stay gitignored. The harness writes runs/loso_eval/loso_3arch_eval.{json,md}; the durable record is the table in Research-0023 §2 + the per-fold tables in §3. Regenerate via the loop in §6 of the digest.
  • On upstream sync: zero interaction. Pure fork-local evaluation harness.
  • Re-test on rebase:
python ai/scripts/eval_loso_3arch.py
diff <(jq -r '.archs.mlp_small.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9808)
diff <(jq -r '.archs.mlp_medium.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9727)
diff <(jq -r '.archs.linear.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.3679)
# Expect: identical lines on a populated cache + identical fold ONNX.

0071 — T7-16 ADM Vulkan/SYCL drift verified-resolved (doc close)

  • No ADR. Verification-only close, sister of T7-15.
  • Upstream source: fork-local. ADM cross-backend gate is a fork-only test surface; Netflix/vmaf has no Vulkan or SYCL backend.
  • Touches (additive only):
  • docs/state.md — new "Recently closed" row for T7-16.
  • .workingdir2/BACKLOG.md — T7-16 row marked closed (local-only planning dossier; gitignored).
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • places=4 cross-backend ADM contract. Empirical adm_scale2 max_abs_diff is now 1e-6 (print floor; ULP=0) on Vulkan device 0 (NVIDIA), device 1 (Mesa anv on Arc), and SYCL device 0 (Arc); residual adm_scale1 ≈ 3.1e-5 and adm2 ≈ 5e-6 on 1/48 frames pass places=4 (5e-5 tolerance) but fail places=5. Hold the gate at places=4.
  • No ADM kernel source change. Fix is environmental (NVCC + driver + SYCL runtime).
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --feature adm --backend vulkan --device 0 --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324
# Expect: 0/48 mismatches across all 5 ADM metrics.
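
The places arithmetic behind "passes places=4, fails places=5" can be sketched as below. It assumes the tolerance is half a unit in the last compared decimal place, which matches the 5e-5 figure quoted for places=4; the diff tool's exact comparison rule is not shown here, so treat `within_places` as a hypothetical model:

```python
def tolerance(places):
    """Half a unit in the last compared decimal place (assumed rule)."""
    return 0.5 * 10 ** -places


def within_places(abs_diff, places):
    """True if an absolute difference is inside the assumed tolerance."""
    return abs(abs_diff) < tolerance(places)


adm_scale1_residual = 3.1e-5   # residual quoted in the invariant above
passes_at_4 = within_places(adm_scale1_residual, 4)
passes_at_5 = within_places(adm_scale1_residual, 5)
```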

0070 — T7-15 motion CUDA/SYCL drift verified-resolved (doc close)

  • No ADR. Verification-only close; no code change in PR #172.
  • Upstream source: fork-local. Cross-backend gate is a fork-only test surface; not in Netflix/vmaf.
  • Touches (additive only):
  • docs/state.md — "Recently closed" row for T7-15.
  • .workingdir2/BACKLOG.md — T7-15 row marked closed (local-only planning dossier; gitignored).
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • The places=4 cross-backend gate stays at places=4. Empirical max_abs_diff is currently 0.0 (CUDA) or 1e-6 (SYCL/Vulkan, JSON %f rounding floor); tightening to places=5 is tempting, but the 1e-6 print-floor would then make the SYCL + Vulkan rows fail. Hold at places=4 until --precision=max is wired into the diff tool.
  • No motion-kernel source change. PR #172 didn't modify core/src/feature/cuda/integer_motion/*.cu or core/src/feature/sycl/integer_motion_sycl.cpp. The fix is environmental (NVCC + driver), so the next CI run on a fresh image needs to be re-verified against the gate.
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature motion --backend cuda \
  --places 4
# Expect: 0/48 mismatches, max_abs_diff = 0.0

0069 — libvmaf_vulkan.h installed under prefix (build bug)

  • No ADR. Build-system bug fix; matches existing CUDA / SYCL install conditions.
  • Upstream source: fork-local. Vulkan backend is fork-only; Netflix/vmaf has no libvmaf_vulkan.h.
  • Touches:
  • core/include/core/meson.build — adds an is_vulkan_enabled gate that handles the feature option's enabled / auto states; appends libvmaf_vulkan.h to platform_specific_headers when active.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • Install rule mirrors the CUDA / SYCL pattern but uses the feature-option API. The is_cuda_enabled = get_option('enable_cuda') == true boolean idiom doesn't apply to enable_vulkan because that's a feature option, not a boolean. Use .enabled() or .auto(). Don't "simplify" to == true — that would silently drop the install in the auto state.
  • Pairs with ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch which probes for the header via check_pkg_config libvmaf_vulkan "libvmaf >= 3.0.0" libvmaf/libvmaf_vulkan.h vmaf_vulkan_state_init_external. Removing the install rule re-introduces lawrence's 2026-04-28 symptom: FFmpeg silently drops the libvmaf_vulkan filter despite --enable-libvmaf-vulkan.
  • On upstream sync: zero interaction; Vulkan backend is fork-only.
  • Re-test on rebase:
cd libvmaf
CC=icx CXX=icpx meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=true -Denable_sycl=true -Db_lto=false
ninja -C build
meson install -C build --destdir /tmp/libvmaf-install
ls /tmp/libvmaf-install/usr/local/include/libvmaf/libvmaf_vulkan.h
# Expect: file exists.

0066 — --backend cuda inverted-gpumask fix (CLI bug)

  • No ADR. Bug fix; behaviour now matches the public-header VmafConfiguration::gpumask contract.
  • Upstream source: fork-local. The --backend CLI selector was added by the fork (Netflix/vmaf has no exclusive-backend selector).
  • Touches (additive + 1-line behavioural fix):
  • core/tools/cli_parse.c::parse_cli_args — --backend cuda branch sets gpumask = 0 (was gpumask = 1).
  • core/test/test_cli_parse.c — 5 new regression tests (test_backend_{cpu,cuda_engages_cuda,cuda_preserves_explicit_gpumask,sycl,vulkan}) plus run_aom_ctc_tests / run_backend_tests helper split to keep run_tests under the function-size budget.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • VmafConfiguration::gpumask semantics: if gpumask: disable CUDA. compute_fex_flags in src/libvmaf.c routes CUDA only when gpumask == 0. Any code path that sets a non-zero gpumask to "request CUDA" silently disables it. The CLI's --backend cuda branch must set gpumask = 0 and rely on use_gpumask = true to trigger vmaf_cuda_state_init. Do not "fix" this back to gpumask = 1 — it's the bug being fixed.
  • Explicit --gpumask=N --backend cuda preserves N. A user who passes --gpumask=2 already has use_gpumask = true, so the --backend cuda branch's defaulting block (gated on !settings->use_gpumask) is skipped. The test_backend_cuda_preserves_explicit_gpumask regression locks this in.
  • On upstream sync: zero interaction; --backend is fork-only.
  • Re-test on rebase:
./build/test/test_cli_parse | grep -E 'backend_'
# Expect: 5 backend tests pass.
build/tools/vmaf -r REF -d DIS -w 576 -h 324 -p 420 -b 8 \
  --model "path=model/vmaf_v0.6.1.json" --threads 1 \
  --backend cuda --output cuda.json --json -q
python3 -c "import json; d=json.load(open('cuda.json')); \
  assert len(d['frames'][0]['metrics']) == 12, 'CUDA not engaged'"
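
The two gpumask invariants can be modelled in a few lines. This is a Python model of the behaviour described above, not the C code in cli_parse.c; the `Settings` class and both function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Settings:
    gpumask: int = 0
    use_gpumask: bool = False


def apply_backend_cuda(settings):
    """Model of the --backend cuda branch: default only if no explicit mask."""
    if not settings.use_gpumask:
        settings.gpumask = 0        # 0 means "CUDA enabled", not "disabled"
        settings.use_gpumask = True
    return settings


def cuda_dispatch_enabled(settings):
    """CUDA is routed only when a mask is in use and it equals zero."""
    return settings.use_gpumask and settings.gpumask == 0


defaulted = apply_backend_cuda(Settings())
explicit = apply_backend_cuda(Settings(gpumask=2, use_gpumask=True))
```

The `explicit` case mirrors `--gpumask=2 --backend cuda`: the user's value survives, and under the stated semantics a non-zero mask leaves CUDA dispatch off.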

0067 — Tiny-AI PTQ accuracy across Execution Providers (T5-3e)

  • No ADR. Investigation/measurement PR; ADR-0129 already governs the PTQ workstream. Findings update docs/research/0006-tinyai-ptq-accuracy-targets.md §"GPU-EP quantisation" — that section was previously a deferred-open-question; it is now the empirical landing spot.
  • Research digest: same file (Research-0006).
  • Upstream source: fork-local. Netflix/vmaf does not ship a PTQ harness or any tiny-AI ONNX path.
  • Touches (additive only):
  • ai/scripts/measure_quant_drop_per_ep.py — new sibling of measure_quant_drop.py. CPU+CUDA via ORT; Arc / OpenVINO-CPU via the native openvino Python runtime (no onnxruntime-openvino because no cp314 wheel exists). Reuses the _load_session rename workaround from PR #165 + a value_info-strip fix so dynamic-PTQ doesn't choke on the shipped MLP ONNX.
  • docs/ai/quant-eps.md — new user doc; linked from docs/ai/index.md.
  • docs/research/0006-tinyai-ptq-accuracy-targets.md — refreshed header, replaced "GPU-EP open question" with the measurement table, fixed pre-existing MD040/MD060 lints surfaced on the touched file.
  • docs/ai/index.md — added the quant-eps row, rewrapped to 80 cols.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant):
  • measure_quant_drop.py (the CI gate) is unchanged. The new script is purely additive. Any rebase that conflates the two scripts must keep the CI gate CPU-only — Arc int8 is broken, so a per-EP gate would red-light every PR.
  • value_info strip is required for vmaf_tiny_v1* dynamic PTQ. The shipped MLP ONNX duplicates weight tensors in value_info, which makes quantize_dynamic raise Inferred shape and existing shape differ. The fix is in _save_inlined. Don't remove it during a refactor unless the underlying ONNX is regenerated.
  • CUDA-12 ABI shim. ORT-GPU 1.25 wheels link libcublasLt.so.12 even on CUDA-13 hosts. The reproduction recipe pins the nvidia-*-cu12 wheels and prepends them to LD_LIBRARY_PATH. If a future ORT wheel drops the cu12 ABI we can cut the shim, but the script tolerates either since it doesn't import any CUDA symbol itself.
  • On upstream sync: zero interaction; entirely fork-local.
  • Re-test on rebase:
SP=$VIRTUAL_ENV/lib/python3.14/site-packages/nvidia
export LD_LIBRARY_PATH="$SP/cublas/lib:$SP/cudnn/lib:$SP/cuda_nvrtc/lib:$SP/cuda_runtime/lib:$SP/cufft/lib:$SP/curand/lib:$SP/cusolver/lib:$SP/cusparse/lib:$SP/cuda_cupti/lib:$SP/nvtx/lib:$SP/nvjitlink/lib"
python ai/scripts/measure_quant_drop_per_ep.py \
    --eps cpu cuda openvino \
    --extra-fp32 vmaf_tiny_v1.onnx vmaf_tiny_v1_medium.onnx \
    --out runs/quant-eps-$(date +%Y-%m-%d)
# Expected: CPU + CUDA PASS (drop ≤ 1.2e-4); OpenVINO Arc ERR
# (compile failure for Conv-int8) or NaN (MatMul-int8) until a
# newer intel_gpu plugin lands.

0065 — testdata/bench_all.sh correct backend-engagement flags

  • No ADR. Bug fix; no behavioural surface change beyond "the bench now actually engages the backends it claims to."
  • Upstream source: fork-local. testdata/bench_all.sh is a fork-only bench harness; not in Netflix/vmaf.
  • Touches (additive only):
  • testdata/bench_all.sh — switched per-row flag pattern from the disable-only singletons (--no_sycl for "CUDA", etc.) to the correct engagement form (--gpumask=0 --no_sycl --no_vulkan for CUDA, --sycl_device=0 --no_cuda --no_vulkan for SYCL, --vulkan_device=0 --no_cuda --no_sycl for Vulkan, and --no_cuda --no_sycl --no_vulkan for CPU). Added a 4th column (Vulkan) to the comparator. Honours $VMAF_BIN for the binary path and $VMAF_ONEAPI_SETVARS for the oneAPI install location.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • Disable-only singletons don't engage a backend. --no_sycl alone leaves CUDA available but unrequested. --no_cuda alone leaves SYCL available but unrequested. The CLI inits CUDA only when c.use_gpumask is set; SYCL only when c.sycl_device >= 0 or c.use_gpumask; Vulkan only when c.vulkan_device >= 0. Any change to those gates that drops one of the per-row flags will re-introduce the silent CPU fallback. Verify after a rebase by inspecting JSON frames[0].metrics key counts (CPU 14-15, CUDA 11-12, Vulkan ~34) — see libvmaf/AGENTS.md §"Backend-engagement foot-guns".
  • gpumask semantics are inverted from intuition. gpumask=0 enables CUDA dispatch; gpumask=1 disables it. The per-row CUDA flag is --gpumask=0, not --gpumask=1. Don't "fix" it to --gpumask=1 for symmetry with sycl_device/vulkan_device — that's the bug being fixed (parallel to PR #170).
  • On upstream sync: zero interaction; testdata/bench_all.sh is fork-only.
  • Re-test on rebase:
bash testdata/bench_all.sh    # smoke
# Verify each row's JSON keys match the expected per-backend count:
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cpu.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cuda.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_vulkan.json
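
The per-row engagement flags listed above can be kept in one table so that no row silently degrades to a disable-only singleton. A minimal sketch: the flag strings are copied from the note, while `BACKEND_FLAGS` and `build_command` are hypothetical names (the actual harness is a shell script):

```python
# Engagement flags per backend row, as listed in the Touches bullet above.
BACKEND_FLAGS = {
    "cpu": ["--no_cuda", "--no_sycl", "--no_vulkan"],
    "cuda": ["--gpumask=0", "--no_sycl", "--no_vulkan"],
    "sycl": ["--sycl_device=0", "--no_cuda", "--no_vulkan"],
    "vulkan": ["--vulkan_device=0", "--no_cuda", "--no_sycl"],
}


def build_command(vmaf_bin, backend, common_args):
    """Compose one bench row: binary, shared args, then engagement flags."""
    return [vmaf_bin, *common_args, *BACKEND_FLAGS[backend]]


cuda_cmd = build_command("vmaf", "cuda", ["--threads", "1"])
```

Every GPU row carries one engaging flag plus the two flags that disable the other GPU backends; the CPU row disables all three.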

0063 — Tiny-AI LOSO eval harness for mlp_small

  • No ADR. The methodology fits inside Research Digest 0022; ADR-0203 already covers the training-prep architecture.
  • Research digest: docs/research/0022-loso-mlp-small-results.md.
  • Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
  • Touches (additive only):
  • ai/scripts/eval_loso_mlp_small.py — new evaluation harness.
  • docs/ai/loso-eval.md — usage doc.
  • docs/research/0022-loso-mlp-small-results.md — methodology + results.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • _load_session workaround for renamed-baseline ONNX. The shipped baselines model/tiny/vmaf_tiny_v1*.onnx reference their pre-rename external_data.location values. The workaround in _load_session rewrites the entries before handing the proto to ORT. Removing the workaround breaks the baseline phase. The proper fix (re-export with matching names) is tracked as a follow-up; until then this code path is load-bearing.
  • runs/ and model/tiny/training_runs/ stay gitignored. The harness writes to runs/loso_eval/ by default; do NOT promote any of those outputs into the tree. The 9 fold ONNX and the per-clip JSON cache regenerate from the corpus + trainer + libvmaf CLI.
  • On upstream sync: zero interaction. Pure fork-local evaluation harness.
  • Re-test on rebase:
python ai/scripts/eval_loso_mlp_small.py
diff <(jq -r '.loso_aggregate.mean_plcc' runs/loso_eval/loso_mlp_small_eval.json) <(echo 0.9808)
# Expect: identical line on a populated cache + identical fold ONNX.
  • No ADR. Process / docs PR; rows trace back to the individually-cited ADRs / research digests in their own References columns.
  • Decision dossier: .workingdir2/decisions/section-a-decisions-2026-04-28.md.
  • Source audit: docs/backlog-audit-2026-04-28.md.
  • Upstream source: fork-local. Pure backlog hygiene PR; no Netflix code touched.
  • Touches (additive only):
  • .workingdir2/BACKLOG.md — 9 new rows: T3-17, T3-18, T5-3e, T5-4, T7-35, T7-36, T7-37, T7-38; T6-1a row extended with the bisect-cache fixture sub-bullet.
  • docs/research/0006-tinyai-ptq-accuracy-targets.md — drops the "defer until first user" framing on the GPU-EP quantisation open question per user direction; cross-links T5-3e.
  • docs/research/0020-cambi-gpu-strategies.md — v2 follow-up section now cites T7-36 as the gate for opening the v2 row.
  • docs/adr/0205-cambi-gpu-feasibility.md — Decision section's "follow-up integration PR" now cites T7-36.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant): none. Pure backlog text. Rebase-conflict risk is limited to the same BACKLOG.md table rows that any future row addition would touch; trivial to re-resolve.
  • On upstream sync: zero interaction.
  • Re-test on rebase: none — docs-only.

0062 — ssimulacra2 CUDA + SYCL twins (ADR-0206)

  • ADR: ADR-0206.
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2 GPU implementation; this PR adds the CUDA + SYCL twins of the fork's ADR-0201 Vulkan kernel.
  • Touches (additive + small wiring edits):
  • docs/adr/0206-ssimulacra2-cuda-sycl.md and the index row in docs/adr/README.md.
  • core/src/feature/cuda/ssimulacra2_cuda.{c,h} — new CUDA dispatch.
  • core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu and ssimulacra2_mul.cu — new CUDA fatbins.
  • core/src/feature/sycl/ssimulacra2_sycl.cpp — new SYCL extractor.
  • core/src/feature/feature_extractor.c — two new extern declarations + two new entries in feature_extractor_list[].
  • core/src/meson.build — adds ssimulacra2_blur + ssimulacra2_mul to cuda_cu_sources, introduces (or extends, if PR #157 / ADR-0202 landed first) the cuda_cu_extra_flags map with a ssimulacra2_blur entry, threads per_kernel_flags into the fatbin custom-target, and lists the two new C / CPP TUs.
  • core/src/cuda/AGENTS.md and core/src/sycl/AGENTS.md — rebase invariant notes for the per-kernel --fmad=false flag and the -fp-model=precise SYCL build flag.
  • docs/backends/cuda/overview.md, docs/backends/sycl/overview.md, docs/metrics/features.md — coverage matrix updates.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (load-bearing on rebase):
  • Per-kernel --fmad=false for ssimulacra2_blur. The IIR's o = n2 * sum - d1 * prev1 - prev2 must NOT fuse into FMAs — without the flag the recursive Gaussian's per-step rounding compounds across the 6-scale pyramid past places=4.
  • -fp-model=precise on the SYCL feature build line. Removing it drifts ssimulacra2_sycl past places=2 through the IIR.
  • Hybrid host/GPU split mirrors Vulkan. Host runs YUV→RGB, XYB, downsample, and SSIM/EdgeDiff combine in double; GPU runs only mul + IIR blur. Any future PR that ports XYB or YUV→RGB onto the GPU MUST land alongside an updated ADR-0206 and re-validate places=4 on every Netflix CPU pair.
  • CUDA fex uses .extract (synchronous), not .submit/.collect. Per-frame raw YUV is D2H-copied from picture_cuda's device-side VmafPicture.data[] into pinned host scratch via cuMemcpy2DAsync. Skipping the copy segfaults — direct host reads on a CUdeviceptr are the failure mode the prior agent's WIP hit.
  • On upstream sync: zero interaction with Netflix. The GPU coverage matrix for ssimulacra2 is wholly fork-local.
  • Re-test on rebase:
meson setup build_cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda

python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary ./build_cuda/tools/vmaf \
  --feature ssimulacra2 --backend cuda --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8
# Expect: 0/48 mismatches, max_abs_diff ~1e-6.
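
The reason for the `--fmad=false` invariant can be seen with a single multiply-add. The sketch below is an illustration in Python, not code from the fork: it emulates a fused multiply-add (one rounding) with exact rational arithmetic and compares it with the unfused form (two roundings). The helper names and the input values are assumptions chosen for the demonstration.

```python
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    """a*b is rounded to a double first, then the sum is rounded again."""
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b+c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # Fraction -> float conversion is correctly rounded


# Illustrative inputs (not taken from the kernel).
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(unfused_mul_add(a, b, c))  # 0.0: the product rounds to exactly 1.0
print(fused_mul_add(a, b, c))    # equals -2**-60: the low bits survive
```

In a recursive filter such as `o = n2 * sum - d1 * prev1 - prev2`, every output feeds the next step, so a one-rounding difference of this kind can accumulate across steps and pyramid scales.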

0061 — cambi GPU feasibility spike (ADR-0205)

  • ADR: ADR-0205.
  • Research digest: docs/research/0020-cambi-gpu-strategies.md.
  • Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
  • Touches (additive only):
  • docs/adr/0205-cambi-gpu-feasibility.md, docs/research/0020-cambi-gpu-strategies.md, docs/adr/README.md index row.
  • core/src/feature/vulkan/cambi_vulkan.c — new dormant scaffold (not yet in vulkan_sources, not yet registered).
  • core/src/feature/vulkan/shaders/cambi_{derivative,decimate,filter_mode}.comp — new reference GLSL shaders, not yet in the build's shaders list.
  • core/src/feature/AGENTS.md invariants + CHANGELOG.md bullet.
  • Invariants (rebase-relevant):
  • Hybrid host/GPU port by decision. If Netflix upstream tightens the c-value formula or histogram update protocol, the host residual call site in the eventual cambi_vulkan.c::cambi_vulkan_extract must be updated alongside cambi.c::calculate_c_values — the same code is reused. Do NOT translate the c-values phase to GPU during any upstream-port PR; that optimisation belongs to the v2 strategy-III PR (deferred).
  • Scaffolds dormant in the spike PR. The cambi_vulkan.c extractor returns -ENOSYS from cambi_vulkan_init_stub until the integration follow-up wires it in. Do NOT register vmaf_fex_cambi_vulkan_scaffold in feature_extractor.c's list.
  • Shaders not in the build's shader list. Adding them to core/src/vulkan/meson.build's vulkan_shaders list before the integration PR produces orphaned *_spv.h headers. Leave them alone in this spike PR.
  • On upstream sync: zero interaction. cambi.c itself is upstream-mirrored — Netflix changes flow through port-upstream-commit; only the integration PR's host residual call site needs paired attention.
  • Re-test on rebase:

```bash
meson setup build -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
```

0059 — Tiny-AI Netflix corpus training prep (ADR-0203)

  • ADR: ADR-0203.
  • Upstream source: fork-local. Netflix/vmaf has no equivalent training surface.
  • Touches:
  • ai/data/ — Netflix loader, libvmaf-CLI feature extractor, distillation scoring.
  • ai/train/ — PyTorch dataset, eval harness, Lightning-style training entry point.
  • ai/scripts/run_training.sh — convenience wrapper.
  • ai/tests/ — five new pytest modules (test_netflix_loader.py, test_dataset.py, test_eval.py, test_train_smoke.py, plus conftest.py).
  • docs/ai/training.md — new "C1 (Netflix corpus)" section; existing sections untouched.
  • ai/AGENTS.md — invariants section added.
  • Invariants (load-bearing):
  • Filename ladder regex is fork-specific. <source>_<quality>_<height>_<bitrate>.yuv (dis) + <source>_<fps>fps.yuv (ref). Upstream may publish a different naming convention later; do NOT merge them — keep this loader scoped to the Netflix corpus, add a sibling loader for any upstream alternative.
  • Per-clip cache schema is consumed by both dataset and any downstream tooling. Schema is {features:{feature_names, per_frame, n_frames}, scores:{per_frame, pooled}}. Any change must invalidate $VMAF_TINY_AI_CACHE (delete or version-tag the directory).
  • Smoke command stays runnable without a built vmaf binary. The _make_zero_payload helper in ai.train.dataset injects a fake payload for --epochs 0 so CI gates don't drag a libvmaf build into the Python test surface.
  • YUV size probe never silently guesses. probe_yuv_dims either matches the 1920x1080 default, returns ffprobe's answer, or raises. Tests pass assume_dims=(16, 16) explicitly for synthetic fixtures.
  • On upstream sync: no interaction with upstream. The ai/ subtree is wholly fork-local.
  • Re-test on rebase:
python -m pytest ai/tests/test_netflix_loader.py \
    ai/tests/test_dataset.py ai/tests/test_eval.py \
    ai/tests/test_train_smoke.py -v
python ai/train/train.py --epochs 0 --data-root /tmp/mock_corpus \
    --assume-dims 16x16 --val-source BetaSrc --out-dir /tmp/out
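
The filename ladder can be sketched as a pair of regular expressions. This is a minimal illustration, not the loader's actual regex (which these notes do not reproduce): `DIS_RE`, `REF_RE`, and `classify` are hypothetical names, and the sketch assumes that only the `<source>` field may contain underscores.

```python
import re
from typing import Optional

# Hypothetical patterns for <source>_<quality>_<height>_<bitrate>.yuv (dis)
# and <source>_<fps>fps.yuv (ref). Assumption: only <source> has underscores.
DIS_RE = re.compile(
    r"^(?P<source>.+)_(?P<quality>[^_]+)_(?P<height>\d+)_(?P<bitrate>[^_]+)\.yuv$"
)
REF_RE = re.compile(r"^(?P<source>.+)_(?P<fps>\d+(?:\.\d+)?)fps\.yuv$")


def classify(filename: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return ("ref" | "dis", fields), or None when the name is off-ladder."""
    for kind, pattern in (("ref", REF_RE), ("dis", DIS_RE)):
        match = pattern.match(filename)
        if match:
            return kind, match.groupdict()
    return None


print(classify("BetaSrc_25fps.yuv"))
print(classify("BetaSrc_20_288_375.yuv"))
print(classify("notes.txt"))
```

Returning `None` for off-ladder names, rather than guessing, keeps the loader scoped to this one naming convention; a sibling loader would carry its own patterns.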

0073 — Tiny-AI QAT trainer + first per-model QAT pass (T5-4)

  • ADR: ADR-0207 (design), ADR-0208 (per-model impl).
  • Touches: ai/train/qat.py (new), ai/scripts/qat_train.py (rewrite from NotImplementedError scaffold), ai/configs/learned_filter_v1_qat.yaml (new), ai/tests/test_qat_smoke.py (new), docs/ai/quantization.md (QAT tier added). All paths are wholly fork-local; no upstream Netflix/vmaf interaction.
  • Invariants:
  • Two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) is load-bearing. Both the legacy ONNX exporter (quantized::conv2d) and the new TorchDynamo exporter (Conv2dPackedParamsBase.__obj_flatten__) refuse to consume convert_fx output on PyTorch 2.11. The bridge (state-dict diff to a fresh fp32 module + ORT static-quantize) is the only path that yields a QDQ ONNX. Do NOT collapse to a single-step convert_fx → torch.onnx.export until both PyTorch issues are fixed; re-check both exporters on each PyTorch upgrade.
  • State-dict transfer matches by submodule name + shape. _copy_qat_weights_into_fp32 walks fp32_state keys, finds the same key in the FX-prepared module, copies the tensor. Tiny-AI models today have stable submodule names (entry, body.*, exit); a model architecture that uses top-level nn.Sequential would break this because prepare_qat_fx renames Sequential children to numeric indices. The RuntimeError("0 tensors copied") guard catches the silent failure mode.
  • FX preparation runs on CPU. PyTorch 2.11's FX symbolic tracer is flaky on CUDA buffers; the trainer migrates the model to CPU before prepare_qat_fx and back to the accelerator for the fine-tune phase. The smoke test deliberately exercises the CPU path so this stays covered.
  • torch.ao.quantization deprecation will hard-fail in PyTorch 2.10. Migration target is torchao.quantization.pt2e (prepare_pt2e / convert_pt2e); the two-step pipeline is mostly pt2e-compatible — only the FX-prep call changes.
  • On upstream sync: no interaction with upstream. The ai/ subtree is fully fork-local.
  • Re-test on rebase:
python -m pytest ai/tests/test_qat_smoke.py -v
python ai/scripts/qat_train.py \
    --config ai/configs/learned_filter_v1_qat.yaml \
    --output /tmp/qat_smoke.int8.onnx --smoke
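
The name-plus-shape matching and its zero-copy guard can be sketched without PyTorch. The block below is a plain-Python stand-in, assuming nested lists in place of tensors; `shape_of` and `copy_matching` are hypothetical names, and the real `_copy_qat_weights_into_fp32` operates on torch state dicts.

```python
def shape_of(value) -> tuple:
    """Shape of a nested list, standing in for tensor.shape."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return tuple(dims)


def copy_matching(fp32_state: dict, qat_state: dict) -> int:
    """Copy entries whose key and shape both match; fail loudly if none do."""
    copied = 0
    for key, target in fp32_state.items():
        source = qat_state.get(key)
        if source is not None and shape_of(source) == shape_of(target):
            fp32_state[key] = source
            copied += 1
    if copied == 0:
        raise RuntimeError("0 tensors copied")
    return copied


fp32 = {"entry.weight": [[0.0, 0.0]], "exit.bias": [0.0]}
qat_named = {"entry.weight": [[0.5, -0.5]], "exit.bias": [0.1]}
print(copy_matching(fp32, qat_named))  # 2

# Keys renamed to numeric indices no longer match any fp32 key.
qat_renamed = {"0.weight": [[0.5, -0.5]], "1.bias": [0.1]}
try:
    copy_matching({"entry.weight": [[0.0, 0.0]]}, qat_renamed)
except RuntimeError as err:
    print(err)  # 0 tensors copied
```

Without the guard, the renamed case would return an untouched fp32 module and the export would quietly ship the pre-QAT weights.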

0074 — GPU-parity matrix CI gate (T6-8 / ADR-0214)

  • Touched surfaces (fork-local): scripts/ci/cross_backend_parity_gate.py (new), .github/workflows/tests-and-quality-gates.yml (new vulkan-parity-matrix-gate job), docs/development/cross-backend-gate.md (new), docs/backends/index.md (cross-backend section), libvmaf/AGENTS.md (rebase-sensitive invariant note).
  • Why this matters on rebase: the CI lane and the matrix-gate script are entirely fork-local. Upstream Netflix/vmaf has no comparable gate; conflicts on rebase are restricted to the CI workflow file when upstream rearranges its own jobs. The gate's Python script lives outside core/src/ so the upstream-sync path doesn't see it.
  • Invariants the gate enforces:
  • Per-feature absolute tolerance is declared in one place (FEATURE_TOLERANCE in scripts/ci/cross_backend_parity_gate.py). Tightening a tolerance requires a measurement-driven follow-up ADR; loosening requires a justification ADR (CLAUDE.md §12 r1).
  • The legacy single-feature gate scripts/ci/cross_backend_vif_diff.py stays for one release cycle. Sister PRs in this session add to it; the T6-8b cleanup PR deletes it once the matrix gate has soaked.
  • CUDA / SYCL / hardware-Vulkan are advisory until a self-hosted runner is registered. The script supports them via --backends; flipping the CI lane to required is a follow-up wiring change, not a code change.
  • On upstream sync: no interaction with upstream tests-and-quality-gates.yml (the gate job is fork-added); rebase conflicts limited to insertion-order in the workflow file.
  • Re-test on rebase:
cd libvmaf && meson setup build \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build
cd ..
python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --backends cpu vulkan \
    --json-out /tmp/parity.json --md-out /tmp/parity.md
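
A per-feature tolerance check of the kind the gate performs might look like the sketch below. The table contents, the default, and the `count_mismatches` helper are hypothetical; in particular, treating places=4 as an absolute tolerance of 5e-5 is an assumption, not a statement about how the real script defines it.

```python
# Hypothetical table; the real FEATURE_TOLERANCE is not reproduced here.
FEATURE_TOLERANCE = {
    "float_ms_ssim": 5e-5,  # assumed equivalent of places=4
    "integer_vif": 5e-5,
}
DEFAULT_TOLERANCE = 5e-5  # assumption


def count_mismatches(
    feature: str, cpu: list[float], gpu: list[float]
) -> tuple[int, float]:
    """Return (frames over tolerance, max absolute difference)."""
    tol = FEATURE_TOLERANCE.get(feature, DEFAULT_TOLERANCE)
    diffs = [abs(c - g) for c, g in zip(cpu, gpu)]
    mismatches = sum(1 for d in diffs if d > tol)
    return mismatches, max(diffs, default=0.0)


cpu_scores = [0.98, 0.97, 0.99]
gpu_scores = [0.980001, 0.9702, 0.99]
mismatches, max_abs_diff = count_mismatches("float_ms_ssim", cpu_scores, gpu_scores)
print(mismatches)  # 1: only the second frame exceeds the tolerance
```

Keeping the table in one place means a tolerance change is a single reviewable diff, which is what makes the "ADR per change" rule enforceable.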

0220 — SYCL feature kernels are unconditionally fp64-free (T7-17)

  • Touches: core/src/sycl/common.cpp (init log line), core/src/sycl/AGENTS.md (new invariant row), all SYCL feature kernels under core/src/feature/sycl/ (no diff today, but the contract pins their shape going forward).
  • Invariant: every SYCL feature-kernel lambda captures and operates on float / integer types only. No double operand inside a parallel_for body, no sycl::reduction<double>, no sycl::plus<double>. A single fp64 instruction in the TU's SPIR-V module causes the Level Zero runtime to reject the entire module on Intel Arc A-series and other fp64-less devices, even when the offending kernel is never submitted. Host-side double (in extract / flush post-processing, score aggregation, log10 normalisation) remains fine. Concrete patterns in tree: ADM gain limiting via int64 Q31 (gain_limit_to_q31 + launch_decouple_csf<false> in integer_adm_sycl.cpp); VIF gain limiting via fp32 sycl::fmin; CIEDE / SSIM accumulators via sycl::reduction<int64_t> / sycl::plus<int64_t>.
  • On upstream sync: Netflix/vmaf has no SYCL backend upstream; conflicts cannot enter via git merge. The risk is a fork-local cherry-pick (e.g. a SYCL twin of a new CUDA kernel) bringing a double into a kernel lambda. Audit the lambda capture list and any sycl::reduce* calls against this invariant before merging.
  • Re-test on rebase:
# Build SYCL backend
meson setup build-sycl libvmaf -Denable_sycl=true CC=icx CXX=icpx
ninja -C build-sycl

# On an fp64-less device (e.g. Intel Arc A380), confirm the
# init log line is INFO-level and reads "device lacks native
# fp64 — kernels already use fp32 + int64 paths, no emulation
# overhead". The SYCL kernels must launch successfully (no
# SPIR-V module rejection from the Level Zero runtime).
build-sycl/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --backend sycl \
    --feature integer_vif --feature integer_adm \
    --output /tmp/sycl-fp64less.json --json
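
The integer-only pattern behind the Q31 gain limit can be illustrated with ordinary fixed-point arithmetic. This is a generic Q31 sketch in Python, not the body of `gain_limit_to_q31`; `to_q31` and `q31_mul` are hypothetical helpers, and the int64 range check stands in for the width of the device-side accumulator.

```python
Q31_SHIFT = 31
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def to_q31(gain: float) -> int:
    """Quantise a gain to Q31 fixed point on the host (fp64 is fine there)."""
    return int(round(gain * (1 << Q31_SHIFT)))


def q31_mul(value: int, gain_q31: int) -> int:
    """Scale an integer by a Q31 gain using integer arithmetic only."""
    product = value * gain_q31
    # Python ints do not overflow; a kernel's int64 would, so check here.
    assert INT64_MIN <= product <= INT64_MAX, "would overflow int64"
    return product >> Q31_SHIFT


print(to_q31(1.0))                  # 2147483648
print(q31_mul(1000, to_q31(1.5)))   # 1500
```

The float-to-Q31 conversion happens once on the host; the kernel lambda then sees only integers, which keeps fp64 out of the SPIR-V module.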

0091 — T6-9 model registry schema + --tiny-model-verify (ADR-0211)

  • No rebase impact: 100% fork-local surface. The registry (model/tiny/registry.json), its JSON Schema (model/tiny/registry.schema.json), the --tiny-model-verify CLI flag, and the vmaf_dnn_verify_signature() C entry point are entirely fork-local — none of these paths exist in upstream Netflix/vmaf. Listed here for completeness so a future /sync-upstream run sees the surface area was acknowledged.
  • Touches (additive only): model/tiny/registry.json, model/tiny/registry.schema.json, ai/scripts/validate_model_registry.py, core/src/dnn/model_loader.{c,h} (added vmaf_dnn_verify_signature()), core/include/libvmaf/dnn.h (public declaration), core/tools/cli_parse.{c,h} (ARG_TINY_MODEL_VERIFY + tiny_model_verify field), core/tools/vmaf.c (call site), core/test/dnn/test_tiny_model_verify.c, python/test/model_registry_schema_test.py, docs/ai/model-registry.md, docs/ai/inference.md, docs/ai/security.md, docs/adr/0209-...md, docs/adr/README.md (index row), CHANGELOG.md, core/src/dnn/AGENTS.md.
  • Invariants (rebase-relevant):
  • Schema is the contract. New registry fields land in registry.schema.json first, then in registry.json, then in any consumers (the C-side parser, the Python validator, the MCP). Reverse order causes mismatch.
  • schema_version is bounded. The schema accepts only {0, 1}; bump the enum and the loader's check together when adding 2.
  • Banned-function rule applies. The cosign invocation uses posix_spawnp(3p) with an explicit argv array. Do not replace with system(3) / popen(3) — both shell-parse the command and would re-introduce injection risk.
  • Bundle-file absence is fail-closed. When sigstore_bundle points at a not-yet-existing file (pre-release state), vmaf_dnn_verify_signature() returns -ENOENT. The CLI surfaces this as a load failure; do not "soften" to a warning without an explicit ADR.
  • Re-test on rebase:
python3 ai/scripts/validate_model_registry.py
python3 -m pytest python/test/model_registry_schema_test.py -v
meson test -C build-cpu --suite=dnn
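
The difference between an explicit argv array and a shell-parsed command line can be shown with a Python analogue. The sketch below is not the fork's C code and does not reproduce any cosign arguments; it spawns a stand-in child process with a list argv (no shell involved) and shows that shell metacharacters arrive as literal text.

```python
import subprocess
import sys

# Hostile-looking value; with an argv list it is just one argument.
untrusted = "model.onnx; echo injected"

# Stand-in child process that prints its first argument verbatim.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted]

# A list argv with the default shell=False never passes through a shell.
result = subprocess.run(child, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # model.onnx; echo injected
```

Had the same string been interpolated into a shell command line, the `;` would have terminated the first command and started a second one.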

0074 — HIP (AMD ROCm) backend scaffold (T7-10)

  • ADR: ADR-0212.
  • Upstream source: fork-local. HIP backend is fork-only; Netflix/vmaf has no libvmaf_hip.h and no enable_hip meson option.
  • Touches:
  • core/include/libvmaf/libvmaf_hip.h (new).
  • core/include/core/meson.build — adds the is_hip_enabled install gate, mirroring is_cuda_enabled / is_sycl_enabled boolean idioms.
  • core/meson_options.txt — new enable_hip boolean option (default false).
  • core/src/meson.build — new is_hip_enabled flag, conditional subdir('hip'), hip_sources + hip_deps threaded through libvmaf_feature_static_lib (alongside the existing CUDA / SYCL / Vulkan aggregations) and the top-level library('vmaf', ...) dependencies list.
  • core/src/hip/ (new directory: common.{c,h}, picture_hip.{c,h}, dispatch_strategy.{c,h}, meson.build).
  • core/src/feature/hip/ (new directory: adm_hip.c, vif_hip.c, motion_hip.c).
  • core/test/test_hip_smoke.c (new).
  • core/test/meson.build — registers the smoke test under if get_option('enable_hip') == true.
  • .github/workflows/libvmaf-build-matrix.yml — adds Build — Ubuntu HIP (T7-10 scaffold) row.
  • docs/backends/hip/overview.md (new), docs/backends/index.md (planned → scaffold row), docs/research/0033-hip-applicability.md (new), docs/adr/0212-hip-backend-scaffold.md (new), docs/adr/README.md (new index row).
  • libvmaf/AGENTS.md — new "HIP backend scaffold contract" rebase-sensitive invariant entry.
  • CHANGELOG.md — Unreleased § Added.
  • Invariants (rebase-relevant):
  • enable_hip is a boolean option, not a feature. Mirrors enable_cuda / enable_sycl; do not "harmonise" with enable_vulkan's feature / disabled form without an ADR amendment per ADR-0212 § "Decision".
  • Public C-API entry points return -ENOSYS for the scaffold. The smoke test core/test/test_hip_smoke.c pins this. A rebase that "succeeds" by accidentally enabling a code path (e.g. a refactor that early-returns 0 from vmaf_hip_state_init) breaks the smoke and the runtime PR's contract baseline.
  • hip_sources is added to libvmaf_feature_static_lib, NOT directly to the top-level library('vmaf', ...). The static lib is extracted into libvmaf via objects: [..., libvmaf_feature_static_lib.extract_all_objects(recursive: true), ...] at the bottom of core/src/meson.build. Adding hip_sources to the top library() too would double-link.
  • hip_deps IS added to the top library() dependencies: list. The runtime PR will populate hip_deps with the real dependency('hip-lang') linkage; threading it through the top library() ensures consumers see the transitive dependency.
  • Header purity: libvmaf_hip.h does not include <hip/hip_runtime.h>. HIP runtime types cross the public ABI as uintptr_t (matches the CUDA / Vulkan precedent; ADR-0212). Don't add <hip/...> includes to the public header during a rebase / runtime-PR bring-up.
  • No FFmpeg patch: the fork's ffmpeg-patches/ series does not currently consume the HIP API surface. CLAUDE §12 r14 only requires patch updates when an existing patch consumes the surface; the runtime PR (T7-10b) will add the hip_device filter option and the corresponding patch.
  • On upstream sync: zero interaction; HIP backend is fork-only.
  • Re-test on rebase:
cd libvmaf
meson setup build-hip -Denable_cuda=false -Denable_sycl=false \
                      -Denable_hip=true
ninja -C build-hip
meson test -C build-hip test_hip_smoke
# Expect: 9/9 pass.

# Default no-HIP build still works:
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=fast

0074 — SSIMULACRA 2 SVE2 SIMD parity (T7-38)

  • ADR: ADR-0213.
  • Touches: core/src/feature/arm64/ssimulacra2_sve2.{c,h} (new), core/src/feature/ssimulacra2.c (dispatch table override in init_simd_dispatch), core/src/arm/cpu.{c,h} (HWCAP2_SVE2 probe + new VMAF_ARM_CPU_FLAG_SVE2 enum value), core/src/meson.build (cc.compiles probe + optional arm64_ssimulacra2_sve2 static library), core/test/test_ssimulacra2_simd.c (SVE2 picker overrides on the arm64 path + dispatch diagnostic), build-aux/aarch64-linux-gnu-sve2.ini (new cross-file pinning qemu-aarch64-static -cpu max). All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
  • Invariants:
  • Fixed 4-lane SVE2 predicate. Every kernel uses svwhilelt_b32(0, 4) so SIMD arithmetic order is identical to the NEON sibling regardless of the runtime vector length. This keeps the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract intact. Do NOT widen the predicate to svptrue_b32() without a separate ADR + snapshot regen — variable-length lane reductions perturb the per-step rounding order.
  • NEON stays the fallback. SVE2 is purely additive; the dispatch table assigns NEON first and only overrides on VMAF_ARM_CPU_FLAG_SVE2. A toolchain that fails the cc.compiles(... -march=armv9-a+sve2) probe leaves HAVE_SVE2 unset and the legacy NEON-only build is unchanged.
  • -ffp-contract=off mirrors the NEON sibling. Without it GCC fuses the per-lane scalar tail's a*b+c patterns into fmla, drifting against the SIMD path by ~1 ulp. The arm64_ssimulacra2_sve2 static library carries the flag like its NEON counterpart.
  • On upstream sync: no interaction with upstream — arm64/ feature TUs and the arm/cpu.{c,h} flag enum are fork-local. An upstream sync that rewrites init_simd_dispatch in core/src/feature/ssimulacra2.c would also need the SVE2 cases preserved.
  • Re-test on rebase:
meson setup build-arm64-sve2 libvmaf \
    --cross-file=build-aux/aarch64-linux-gnu-sve2.ini -Denable_asm=true
ninja -C build-arm64-sve2 test/test_ssimulacra2_simd
meson test -C build-arm64-sve2 test_ssimulacra2_simd
# stderr should report `ssimulacra2 simd dispatch: NEON=1 SVE2=1`
# and 11/11 tests should pass.
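
Why the lane count is part of the byte-exact contract: floating-point addition is not associative, so the same data reduced with a different number of lanes can round differently. The sketch below models lanes as a Python list and sums the lane accumulators left to right; that horizontal order is an assumption for illustration, and the input values are contrived to make the effect visible.

```python
def lane_sum(values: list[float], lanes: int) -> float:
    """Accumulate values round-robin into `lanes` partial sums, then reduce."""
    acc = [0.0] * lanes
    for i, v in enumerate(values):
        acc[i % lanes] += v
    total = 0.0
    for partial in acc:  # left-to-right horizontal reduction (assumed order)
        total += partial
    return total


# Contrived data: the large terms cancel, the small ones should sum to 6.
data = [1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0]

print(lane_sum(data, 4))  # 6.0: the large terms cancel inside lane 0
print(lane_sum(data, 8))  # 3.0: three 1.0 terms are absorbed by 1e16
```

Pinning the predicate to four lanes keeps the SVE2 accumulation order identical to NEON on every hardware vector length.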

0075 — enable_lcs MS-SSIM extras on CUDA + Vulkan (T7-35 / ADR-0243)

  • Touched surfaces (fork-local): core/src/feature/cuda/integer_ms_ssim_cuda.c (added enable_lcs to MsSsimStateCuda + options[] + 15 host-side vmaf_feature_collector_append calls gated on the bool), core/src/feature/vulkan/ms_ssim_vulkan.c (rewrote enable_lcs help text + added emit_lcs_metrics helper + gated 15 vmaf_feature_collector_append calls), scripts/ci/cross_backend_vif_diff.py and scripts/ci/cross_backend_parity_gate.py (new float_ms_ssim_lcs pseudo-feature + FEATURE_ALIASES map + places=4 tolerance row).
  • Why this matters on rebase: the GPU MS-SSIM extractors are fork-local (Netflix upstream has no Vulkan or CUDA MS-SSIM kernel today). The enable_lcs semantic and the metric names (float_ms_ssim_{l,c,s}_scale{0..4}) must match the upstream CPU reference at core/src/feature/float_ms_ssim.c:189-221. If upstream ever renames or reorders those metrics, mirror the change on the GPU side in the same merge — public-API contract.
  • Invariants the contract enforces:
  • Default-path output (enable_lcs=false) stays bit-identical to the pre-T7-35 binary: only the host-side appends are gated; no kernel / shader / device-buffer changes.
  • Metric ordering is metric-wise (all l_scale* first, then c_*, then s_*) — matches the CPU emission order.
  • places=4 cross-backend tolerance per ADR-0190; enforced by the new float_ms_ssim_lcs cell in the parity matrix gate (ADR-0214).
  • On upstream sync: zero interaction; the GPU twins do not exist upstream. The CPU float_ms_ssim.c is shared with upstream but enable_lcs is upstream-stable since v3.0.0.
  • Re-test on rebase:
cd libvmaf && meson setup build-vulkan \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build-vulkan
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build-vulkan/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature float_ms_ssim_lcs --backend vulkan --places 4
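
The metric-wise ordering described above can be written out as a short generator. This is a sketch of the naming and ordering contract only; `lcs_metric_names` is a hypothetical helper, not a function in the tree.

```python
METRICS = ("l", "c", "s")   # luminance, contrast, structure
NUM_SCALES = 5              # scale0 .. scale4


def lcs_metric_names() -> list[str]:
    """All l_scale* names first, then c_scale*, then s_scale*."""
    return [
        f"float_ms_ssim_{metric}_scale{scale}"
        for metric in METRICS
        for scale in range(NUM_SCALES)
    ]


names = lcs_metric_names()
print(len(names))   # 15
print(names[0])     # float_ms_ssim_l_scale0
print(names[-1])    # float_ms_ssim_s_scale4
```

The count matches the 15 gated `vmaf_feature_collector_append` calls on each GPU backend.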

0075 — 32-bit ADM/cpu fallbacks port (T-NEW-3)

  • Touched surfaces (upstream-mirror): core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/x86/cpu.c. Cherry-picks of upstream 8a289703 (Christopher Degawa, "adm: add fallback for extract_epi64 for 32-bit") and 1b6c3886 ("x86/cpu: remove limit of avx+ on 32-bit").
  • Why this matters on rebase: trivially conflict-free with any future upstream extract_epi64 work because we land upstream's exact extract_epi64 macro/inline-fn pair. The conflict surface is the fork's clang-format-100col layout in adm_avx2.c / adm_avx512.c and the _Alignas(64) LTO-correctness slot in adm_avx512.c (docs/development/known-upstream-bugs.md); both are preserved verbatim.
  • Invariants the port preserves:
  • _Alignas(64) int64_t angle_flag[16] in adm_decouple_s123_avx512 stays — without it, LTO can promote the unaligned load to vmovdqa64 and fault under --buildtype=release -Db_lto=true.
  • The extract_epi64 symbol must remain resolved on both __x86_64__ (macro to _mm256_extract_epi64) and 32-bit (fallback inline). If a future upstream change inlines the helper differently, keep the conditional definition.
  • On upstream sync: if Netflix ships further 32-bit fallbacks (motion / psnr — not in this port), expect a parallel extract_epi64-style helper at the top of each affected SIMD file. The fork should mirror those verbatim into the same files.
  • Re-test on rebase:
meson setup build-i686 libvmaf \
    --cross-file=build-aux/i686-linux-gnu.ini \
    -Denable_asm=false
ninja -C build-i686
meson setup build-cpu libvmaf -Denable_avx512=true
ninja -C build-cpu
meson test -C build-cpu

0076 — codec-aware FR regressor surface (T7-CODEC-AWARE / ADR-0235)

  • Touches: ai/src/vmaf_train/codec.py (new), ai/src/vmaf_train/models/fr_regressor.py (extended), ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/extract_full_features.py. No upstream-shared paths.
  • Invariant: CODEC_VOCAB in ai/src/vmaf_train/codec.py is closed and order-stable — the index of each codec is the one-hot column index baked into trained ONNX. Adding a codec appends to the tuple and bumps CODEC_VOCAB_VERSION; reordering silently invalidates every shipped fr_regressor_v2_*.onnx. FRRegressor(num_codecs=0) must remain the v1 single-input contract — flipping the default would break every existing model/tiny/fr_regressor_v1.onnx consumer.
  • Re-test: pytest ai/tests/test_codec_aware_fr.py -v (8 sub-tests covering vocabulary contract + alias table + back-compat). Pure fork-local addition; no upstream rebase impact for the next /sync-upstream.
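
The order-stability invariant is easiest to see in a one-hot encoder. The vocabulary entries below are hypothetical placeholders (the real tuple lives in ai/src/vmaf_train/codec.py), and `one_hot` is an illustrative helper rather than the module's API.

```python
# Hypothetical entries, for illustration only.
CODEC_VOCAB = ("x264", "x265", "svtav1")


def one_hot(codec: str, vocab: tuple = CODEC_VOCAB) -> list[float]:
    """Column index == position in the vocabulary tuple."""
    index = vocab.index(codec)  # raises ValueError for unknown codecs
    return [1.0 if i == index else 0.0 for i in range(len(vocab))]


print(one_hot("x265"))  # [0.0, 1.0, 0.0]

# Appending keeps every existing column index where it was.
appended = CODEC_VOCAB + ("vvenc",)
print(appended.index("x265"))   # 1

# Reordering moves x265 to column 0; a trained model would misread it.
reordered = ("x265", "x264", "svtav1")
print(reordered.index("x265"))  # 0
```

Appending still widens the input vector, which is why it comes with a CODEC_VOCAB_VERSION bump rather than being silent.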

0075 — feature/speed extractors (T-NEW-1, upstream port d3647c73)

  • Touches: core/src/feature/speed.c (new), core/src/feature/picture_copy.{c,h} (signature change — added int channel parameter), core/src/feature/float_*.c call sites updated to pass channel=0, core/src/feature/feature_extractor.c registry block, core/src/feature/alias.c, core/src/meson.build, core/src/feature/vif_tools.{c,h} (helper-function port from upstream 4ad6e0ea).
  • Upstream source: verbatim cherry-pick of Netflix/vmaf d3647c73 ("feature/speed: port speed_chroma and speed_temporal extractors") with its dependency 4ad6e0ea ("feature/vif: port helper functions"). Both are pre-existing on Netflix master and enter the fork as part of the T7-4 audit catch-up.
  • Invariant: picture_copy() now takes a channel argument — every fork-local extractor that calls it (CUDA integer_ms_ssim, Vulkan ssim / ms_ssim) passes channel=0. If upstream later evolves the signature again (e.g. adds bit-depth or stride validation), update those fork-local call sites in lockstep. Speed extractors only register when VMAF_FLOAT_FEATURES=1 (build with -Denable_float=true).
  • On upstream sync: future Netflix commits in core/src/feature/speed.c apply cleanly because the file is now a verbatim mirror; conflict potential is limited to the registry block in feature_extractor.c (interleave with the fork's Vulkan / SYCL / CUDA blocks) and to any further picture_copy signature evolution.
  • Re-test on rebase:

```bash
meson setup build-cpu libvmaf -Denable_cuda=false \
    -Denable_sycl=false -Denable_float=true
ninja -C build-cpu
meson test -C build-cpu test_speed
meson test -C build-cpu       # full meson suite
make test-netflix-golden      # 3 CPU canonical pairs
```

0221 — CHANGELOG + ADR-index fragment-file pattern (T7-39 / ADR-0221)

  • What changed: the fork stopped editing CHANGELOG.md and docs/adr/README.md directly. Both files are now rendered from fragment trees:
  • changelog.d/<section>/<topic>.md (Keep-a-Changelog sections), plus the migration archive changelog.d/_pre_fragment_legacy.md.
  • docs/adr/_index_fragments/<NNNN-slug>.md, plus docs/adr/_index_fragments/_order.txt (frozen commit-merge order manifest) and docs/adr/_index_fragments/_header.md (table prelude).
  • Two scripts render the consolidated outputs:
  • scripts/release/concat-changelog-fragments.sh --check|--write
  • scripts/docs/concat-adr-index.sh --check|--write
  • On upstream sync: zero interaction — CHANGELOG.md is a fork-local Markdown surface (Netflix upstream doesn't ship a Keep-a-Changelog file in this format), and docs/adr/ is entirely fork-local. A /sync-upstream run will not touch the fragment trees.
  • Re-test on rebase:
bash scripts/release/concat-changelog-fragments.sh --check
bash scripts/docs/concat-adr-index.sh --check
# both must exit 0; otherwise run --write and re-stage.
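
The check/write contract of the two render scripts can be modelled in a few lines. The real scripts are bash and are not reproduced here; this Python sketch, with hypothetical `render` and `check` helpers and made-up fragment contents, only illustrates the idea of rendering fragments in manifest order and comparing the result with the committed file.

```python
def render(header: str, order: list[str], fragments: dict[str, str]) -> str:
    """Concatenate fragments in manifest order after the header."""
    parts = [header.rstrip("\n")]
    parts += [fragments[name].rstrip("\n") for name in order]
    return "\n".join(parts) + "\n"


def check(rendered: str, committed: str) -> int:
    """Shell-style exit code: 0 when in sync, 1 when the output is stale."""
    return 0 if rendered == committed else 1


# Made-up fragment contents for the demonstration.
header = "| ADR | Title |\n"
order = ["0001-a", "0002-b"]
fragments = {"0002-b": "| 0002 | B |\n", "0001-a": "| 0001 | A |\n"}

rendered = render(header, order, fragments)
print(check(rendered, rendered))             # 0
print(check(rendered, "| ADR | Title |\n"))  # 1
```

Because the order comes from the manifest rather than from directory listing order, two branches that add different fragments render the same result regardless of merge order.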

0077 — DISTS extractor proposal (T7-DISTS / ADR-0236)

  • What landed: ADR-0236 (Proposed) + Research-0043 design digest + ADR README index row + CHANGELOG entry.
  • Rebase impact: pure fork-local proposal-stage docs; no code, no Netflix-mirror file touched, no ffmpeg-patches change, no public C-API surface change.
  • Reproducer (when implementation lands as T7-DISTS):

```sh
vmaf --feature dists_sq=model_path=model/tiny/dists_sq.onnx \
     --reference ref.yuv --distorted dist.yuv \
     --width 1920 --height 1080 --pix_fmt yuv420p
```

0076 — GPU-gen ULP calibration head (proposal-stage, T7-GPU-ULP-CAL / ADR-0234)

  • What landed: ADR-0234 (Proposed), Research-0041, data-collection scaffold at ai/scripts/collect_gpu_calibration_data.py, forward-pointer in docs/usage/cli.md for the future --gpu-calibrated flag.
  • Rebase impact: pure fork-local (proposal docs + Python script); no upstream Netflix/vmaf code touched, no public C-API changes, no ffmpeg-patches changes.
  • Reproducer:

```sh
python3 ai/scripts/collect_gpu_calibration_data.py --smoke
```

0095 — Per-backend GPU kernel scaffolding templates (CUDA + Vulkan, ADR-0246)

  • ADR: ADR-0246.
  • Touches:
  • core/src/cuda/kernel_template.h (new, header-only).
  • core/src/vulkan/kernel_template.h (new, header-only).
  • core/src/cuda/AGENTS.md (new invariant row + dir listing).
  • core/src/vulkan/AGENTS.md (new file).
  • docs/backends/kernel-scaffolding.md (new).
  • docs/adr/0246-gpu-kernel-template.md (new).
  • CHANGELOG.md, docs/adr/README.md.
  • All paths are wholly fork-local. Upstream Netflix/vmaf has no Vulkan backend at all today and the CUDA backend uses different per-kernel scaffolding shapes; nothing here can collide on a pure upstream sync.
  • Invariants:
  • Templates are unused at PR-merge time. kernel_template.h in both core/src/cuda/ and core/src/vulkan/ lands with zero call-sites. Each future kernel migration is its own gated PR (places=4 cross-backend-diff per ADR-0214). Do not bulk-port existing kernels onto the templates in a single sync — that would short-circuit the per-kernel gate.
  • Per-backend, not cross-backend. Resist the urge to merge the two templates into a unified gpu/kernel_template.h. CUDA async-stream + event vs Vulkan command-buffer + fence + descriptor-pool share no concrete shape; a unified API would be lowest-common-denominator.
  • Helper functions, not macros. The header bodies are static inline functions for cuda-gdb / Nsight / RenderDoc step-debugging. The CHECK_CUDA_GOTO / CHECK_CUDA_RETURN macros in cuda_helper.cuh stay where they pay off (textual goto label), and the templates use them internally.
  • On upstream sync: no interaction with upstream paths. An upstream sync that touches core/src/cuda/common.h or picture_cuda.h may shift the helper signatures the template consumes (vmaf_cuda_buffer_alloc, vmaf_cuda_picture_get_stream, …); update the template if so.
  • Re-test on rebase:

```bash
# CUDA build (configure inside libvmaf/; see CLAUDE.md §2 note).
meson setup core/build-cuda libvmaf \
    -Denable_cuda=true -Denable_nvcc=true \
    -Denable_vulkan=disabled -Denable_sycl=false
ninja -C core/build-cuda
meson test -C core/build-cuda

# Vulkan build.
meson setup core/build-vulkan libvmaf \
    -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-vulkan
meson test -C core/build-vulkan
```

0222 — vmaf-perShot per-shot CRF predictor sidecar (T6-3b)

  • Touches: core/tools/meson.build (new executable + test wiring), core/tools/vmaf_per_shot.c (new file — fork-local, no upstream sibling), core/tools/test/meson.build (test row), core/tools/test/test_vmaf_per_shot.sh (new smoke test), core/tools/AGENTS.md (sidecar invariants), docs/usage/cli.md (cross-link), docs/usage/vmaf-perShot.md (new user doc), docs/ai/roadmap.md (T6-3b row update).
  • Invariant: the sidecar must stay standalone — it does not link the libvmaf metric path. Any upstream patch that tries to fold per-shot CRF prediction into vmaf_score_* would collapse the encoder-hint vs. quality-score separation recorded in roadmap §2.4 and ADR-0222 §Decision. The CSV / JSON column set (shot_id, start_frame, end_frame, frames, mean_complexity, mean_motion, predicted_crf) is the public schema; downstream encoders consume it directly.
  • Conflict expectation on /sync-upstream: low. Upstream Netflix has no per-shot CRF predictor in tree, so there is no natural collision point — tools/meson.build is the only mutually-edited file and the new executable('vmaf-perShot', …) block is appended after vmaf_bench_deps, well clear of upstream's likely additions.
  • Reproducer:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=disabled
ninja -C build
meson test -C build test_vmaf_per_shot --print-errorlogs
./build/tools/vmaf-perShot \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --output /tmp/plan.csv
cat /tmp/plan.csv
```
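
Because the column set is the public schema, a rebase can be sanity-checked by parsing the emitted plan and rejecting any header drift. A minimal sketch, assuming the CSV carries a header row naming the columns in the documented order; `parse_plan_csv` and `PLAN_COLUMNS` are hypothetical names, not part of the sidecar:

```python
import csv
import io

# Column set documented above as the sidecar's public schema.
PLAN_COLUMNS = [
    "shot_id", "start_frame", "end_frame", "frames",
    "mean_complexity", "mean_motion", "predicted_crf",
]


def parse_plan_csv(text):
    """Parse plan CSV text; raise ValueError if the header drifts from the schema."""
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: the first row is a header in the documented column order.
    if reader.fieldnames != PLAN_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)
```

Typical use would be `parse_plan_csv(open("/tmp/plan.csv").read())` after running the reproducer.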

0075 — vmaf-roi sidecar binary (T6-2b / ADR-0247)

  • Touches:
  • core/tools/meson.build — adds the vmaf_roi executable target (after the existing vmaf target, before vmaf_bench). Append-only; no upstream-shared lines moved or removed.
  • core/test/meson.build — adds the test_vmaf_roi executable + test() registration. Append-only.
  • core/tools/vmaf_roi.c — wholly new, fork-local.
  • core/tools/vmaf_roi_core.h — wholly new, fork-local.
  • core/test/test_vmaf_roi.c — wholly new, fork-local.
  • Invariant: the vmaf-roi sidecar emits two byte-exact formats that downstream encoder drivers (x265 --qpfile, SVT-AV1 --roi-map-file) will hard-depend on:
  • x265 ASCII grid — two #-prefixed header lines (# vmaf-roi qpfile (x265, --qpfile-style) and # frame=N ctu=S cols=C rows=R strength=F.FFF), space-separated signed integers, one row per CTU row, \n terminator.
  • SVT-AV1 raw binary — exactly cols * rows bytes of int8_t, row-major, no header.
  • QP-offset clamp: ±12 (VMAF_ROI_CORE_QP_OFFSET_MAX).
  • Reduction — per-CTU mean (not max). Switching to max or a percentile changes every downstream encoder result and requires its own ADR.
  • Pure helpers in vmaf_roi_core.h — the per-CTU mean reducer and saliency-to-QP mapper are static inline in a header so test_vmaf_roi compiles them without dragging the libvmaf link surface in. Moving them into a .c TU breaks the test wiring.
  • On upstream sync: no interaction with upstream — tools/ is a fork-local surface from upstream's perspective (upstream ships vmaf.c only). An upstream sync that rewrites core/tools/meson.build should preserve the vmaf_roi executable block.
  • Re-test on rebase:

```bash
meson setup build-cpu libvmaf \
    -Denable_cuda=false -Denable_sycl=false -Denable_tools=true
ninja -C build-cpu tools/vmaf_roi test/test_vmaf_roi
meson test -C build-cpu test_vmaf_roi
./build-cpu/tools/vmaf_roi \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --frame 0 --output - \
    --encoder x265 --ctu-size 64 --strength 6.0 | head -3
# First two lines are the # comment header; row 1 of the grid
# should be "4 2 1 -1 -1 -1 1 2 4" (placeholder radial map).
```
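
The invariants above (per-CTU mean, ±12 clamp, two output formats) can be sketched in a few lines. This is an illustration only, not the code in vmaf_roi_core.h: the linear saliency-to-QP map in `to_qp_offsets` is a hypothetical stand-in, and every function name here is invented for the example. Only the mean reduction, the clamp bound, the header text, and the byte layout come from the notes above.

```python
import struct

QP_OFFSET_MAX = 12  # mirrors VMAF_ROI_CORE_QP_OFFSET_MAX as documented above


def ctu_means(saliency, width, height, ctu):
    """Per-CTU mean of a row-major saliency map (floats in [0, 1])."""
    cols = (width + ctu - 1) // ctu
    rows = (height + ctu - 1) // ctu
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            y0, y1 = r * ctu, min((r + 1) * ctu, height)
            x0, x1 = c * ctu, min((c + 1) * ctu, width)
            total = sum(saliency[y * width + x]
                        for y in range(y0, y1) for x in range(x0, x1))
            row.append(total / ((y1 - y0) * (x1 - x0)))
        grid.append(row)
    return grid


def to_qp_offsets(grid, strength):
    """Hypothetical linear map (assumption), clamped to +/-QP_OFFSET_MAX."""
    return [[max(-QP_OFFSET_MAX, min(QP_OFFSET_MAX, round(strength * (1.0 - 2.0 * m))))
             for m in row] for row in grid]


def x265_grid(offsets, frame, ctu, strength):
    """Two '#' header lines, then one space-separated row per CTU row."""
    rows, cols = len(offsets), len(offsets[0])
    lines = [
        "# vmaf-roi qpfile (x265, --qpfile-style)",
        f"# frame={frame} ctu={ctu} cols={cols} rows={rows} strength={strength:.3f}",
    ]
    lines += [" ".join(str(v) for v in row) for row in offsets]
    return "\n".join(lines) + "\n"


def svtav1_bytes(offsets):
    """Exactly cols * rows int8 values, row-major, no header."""
    flat = [v for row in offsets for v in row]
    return struct.pack(f"{len(flat)}b", *flat)
```

Switching `ctu_means` to a max or percentile reducer changes every value in both outputs, which is why the notes require a separate ADR for that change.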

0219 — motion3 GPU coverage on Vulkan + CUDA + SYCL (T3-15(c) / ADR-0219)

  • What changed: The motion GPU twins (core/src/feature/vulkan/motion_vulkan.c, core/src/feature/cuda/integer_motion_cuda.c, core/src/feature/sycl/integer_motion_sycl.cpp) now emit VMAF_integer_feature_motion3_score in 3-frame window mode (default). Cross-backend gates extended (scripts/ci/cross_backend_*.py FEATURE_METRICS["motion"]).
  • Invariants:
  • motion3 = host-side scalar post-process of motion2. No device-side state changes; motion3 is computed on the host in extract() / collect() / flush() after the existing SAD reduction. The post-processing function (motion3_postprocess_*) mirrors CPU integer_motion.c lines 510-560 byte-for-byte: clip(motion_blend(motion2 * fps_weight, blend_factor, blend_offset), max_val) with optional moving-average against the unaveraged prior blended value.
  • motion_five_frame_window=true returns -ENOTSUP at init() on all three GPU backends. The 5-deep blur ring + second SAD-pair dispatch remain deferred. Do NOT silently fall back to the 3-frame path when the user enables the flag — fail loud per CERT C / CLAUDE.md §12 r4.
  • CPU motion3 algorithm is the source of truth. Any port of an upstream Netflix change to integer_motion.c that touches motion_blend(...), the motion_max_val clip, or the moving-average rule MUST be mirrored in motion3_postprocess_* across all three GPU files in the same PR. The cross-backend gate at places=4 will catch drift, but only after a full GPU run.
  • On upstream sync: Pure fork-local additions to GPU TUs. Upstream Netflix has no GPU motion extractor. The motion_blend_tools.h header is upstream-mirrored — if a sync rewrites the motion_blend() formula, regenerate the GPU snapshot and re-run the cross-backend gate.
  • Re-test on rebase:

```bash
# CPU sanity (motion3 emission unchanged)
./core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature motion --output /tmp/motion.json --json
python -c "import json; d=json.load(open('/tmp/motion.json')); \
print('motion3 frames:', sum(1 for f in d['frames'] \
if 'integer_motion3' in f.get('metrics', {})))"
# Expect 49 (one motion3 per frame).

# Cross-backend gate (Vulkan/lavapipe lane works on every host):
python scripts/ci/cross_backend_vif_diff.py \
    --feature motion --backend vulkan \
    --ref python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --dis python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --bitdepth 8 \
    --vmaf-bin core/build/tools/vmaf
# Expect: integer_motion / integer_motion2 / integer_motion3 all OK at places=4.
```
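
The host-side post-process described in the invariants can be sketched as follows. This is a hedged illustration, not a copy of integer_motion.c: the `blend` callable stands in for motion_blend() with its factor and offset already bound (its formula is not reproduced here), and applying the clip before the moving average is an assumption read off the expression above.

```python
def motion3_postprocess(motion2, fps_weight, blend, max_val,
                        prev_blended=None, moving_average=False):
    """Return (motion3_score, blended) for one frame.

    `blend` is a stand-in for motion_blend(score, blend_factor, blend_offset);
    the caller carries `blended` forward as the next frame's `prev_blended`.
    """
    blended = min(blend(motion2 * fps_weight), max_val)  # clip to max_val
    score = blended
    if moving_average and prev_blended is not None:
        # Average against the prior frame's *unaveraged* blended value.
        score = (blended + prev_blended) / 2.0
    return score, blended
```

Returning the unaveraged value separately keeps the running state independent of the averaging flag, which is the property the three GPU twins have to share with the CPU path.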

0216 — vmaf_tiny_v2 (Phase-3-validated tiny VMAF MLP)

  • Touches: model/tiny/registry.json, model/tiny/vmaf_tiny_v2.{onnx,json}, ai/scripts/{train,export,validate}_vmaf_tiny_v2.py, ai/AGENTS.md, core/test/dnn/{test_vmaf_tiny_v2.py,meson.build}, docs/ai/{models/vmaf_tiny_v2.md,inference.md,roadmap.md}, docs/adr/{0244-vmaf-tiny-v2.md,README.md}, CHANGELOG.md. All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
  • Invariants:
  • Bundled scaler stats are part of the trust root. The shipped ONNX bakes (input - mean) / std as Constant Sub + Div nodes that run before the MLP. Re-exporting must go through ai/scripts/export_vmaf_tiny_v2.py, which pulls mean / std from the trainer checkpoint and writes them as graph initialisers. Adding an out-of-band scaler step at runtime (e.g., a sidecar JSON consumed by the loader) is forbidden without a follow-up ADR — it splits the trust root and invalidates the registry sha256 contract.
  • Feature column order is fixed. The graph reads (adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2) in exactly this order; reordering breaks the bundled mean / std constants. Any change to the feature set requires a fresh Phase-3 chain (Research-0027 → 0028 → 0029 → 0030).
  • opset 17. Matches the sister tiny-AI models (learned_filter_v1, nr_metric_v1, fastdvdnet_pre) and the ORT op-allowlist baseline. Upgrading requires re-validating the Sub / Div / Gemm / Relu / Squeeze ops against op_allowlist.c.
  • On upstream sync: zero interaction. Netflix/vmaf has no equivalent surface; an upstream sync that touches core/src/dnn/ (op-allowlist or model-loader changes) needs to preserve Sub / Div / Gemm / Relu / Squeeze in the allowlist for opset 17.
  • Re-test on rebase:

```bash
bash core/test/dnn/test_registry.sh
python3 core/test/dnn/test_vmaf_tiny_v2.py
python3 ai/scripts/validate_vmaf_tiny_v2.py \
    --onnx model/tiny/vmaf_tiny_v2.onnx \
    --parquet runs/full_features_netflix.parquet \
    --rows 100 --min-plcc 0.97
meson test -C build-cpu --suite=dnn
```
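
The reason column order is part of the contract is easiest to see when the baked Sub + Div step is written out by hand. A minimal sketch of what the graph computes before the MLP, assuming plain Python lists for `mean` and `std`; `standardize` is a hypothetical helper for illustration, since the shipped model does this inside the ONNX graph and not at runtime:

```python
FEATURE_ORDER = ("adm2", "vif_scale0", "vif_scale1",
                 "vif_scale2", "vif_scale3", "motion2")


def standardize(features, mean, std):
    """Apply (input - mean) / std in the fixed column order the graph expects."""
    if len(mean) != len(FEATURE_ORDER) or len(std) != len(FEATURE_ORDER):
        raise ValueError("mean/std must have one entry per feature column")
    # mean[i] and std[i] are positional, so reordering FEATURE_ORDER would
    # silently pair each feature with another column's statistics.
    return [(features[name] - m) / s
            for name, m, s in zip(FEATURE_ORDER, mean, std)]
```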

0094 — Tiny-AI extractor template (ADR-0250)

  • Touches: core/src/dnn/tiny_extractor_template.h (new), core/src/feature/feature_lpips.c, core/src/feature/fastdvdnet_pre.c, core/src/dnn/AGENTS.md, docs/ai/extractor-template.md (new), docs/adr/0250-tiny-ai-extractor-template.md (new).
  • Invariants:
  • Helper signatures are wire-format-stable. vmaf_tiny_ai_resolve_model_path(name, option, env_var) and vmaf_tiny_ai_open_session(name, path, &out) produce the user-facing log lines <name>: no model path … and <name>: vmaf_dnn_session_open(<path>) failed: <rc> — downstream tooling greps these. Don't rename or reorder the parameters without bumping every extractor + the recipe doc.
  • YUV→RGB is bit-exact. The shared vmaf_tiny_ai_yuv8_to_rgb8_planes is a literal move of the pre-existing feature_lpips.c body (BT.709 limited-range, nearest-neighbour chroma upsample). LPIPS / saliency / future colour-sensitive tiny-AI scores depend on byte-exact equality with the prior ad-hoc copies. Any change to the conversion constants or the rounding rule needs a separate ADR + a coordinated snapshot regen — model/tiny/ weights aren't re-trained against new colour math casually.
  • Option-table macro is plain text substitution. The VMAF_TINY_AI_MODEL_PATH_OPTION(state_t, help) macro emits a single struct literal — no control flow, no recursion, no variadic shenanigans (Power-of-10 rule 1 / rule 9). Don't extend it into a multi-option emitter without a fresh ADR.
  • On upstream sync: zero interaction with upstream — feature_lpips.c and fastdvdnet_pre.c are fork-only files, and the new dnn/tiny_extractor_template.h lives entirely under fork-introduced core/src/dnn/. An upstream sync that rewrites unrelated feature_*.c files won't conflict.
  • Re-test on rebase:
cd libvmaf
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=dnn
meson test -C build-cpu test_lpips test_fastdvdnet_pre
# All 10 dnn-suite + both extractor tests must pass.
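
For orientation, the colour math behind the shared YUV→RGB helper can be sketched from the BT.709 definition. This is not the library's kernel and is not claimed to be byte-exact with it: the floating-point arithmetic and round-to-nearest rule below are assumptions, and `yuv420_to_rgb_planes` is a hypothetical name. Only the BT.709 limited-range matrix and the nearest-neighbour chroma upsample come from the notes above.

```python
KR, KB = 0.2126, 0.0722   # BT.709 luma coefficients
KG = 1.0 - KR - KB


def _clamp8(value):
    """Round to nearest and clamp to the 8-bit range (rounding rule is an assumption)."""
    return max(0, min(255, int(round(value))))


def yuv420_to_rgb_planes(y, u, v, width, height):
    """Convert 8-bit limited-range BT.709 YUV 4:2:0 planes to R, G, B planes.

    Chroma is upsampled nearest-neighbour: pixel (col, row) reads the chroma
    sample at (col // 2, row // 2).
    """
    cw = (width + 1) // 2
    r, g, b = [], [], []
    for row in range(height):
        for col in range(width):
            ci = (row // 2) * cw + col // 2
            yy = (y[row * width + col] - 16) * 255.0 / 219.0
            cb = (u[ci] - 128) * 255.0 / 224.0
            cr = (v[ci] - 128) * 255.0 / 224.0
            r.append(_clamp8(yy + 2.0 * (1.0 - KR) * cr))
            g.append(_clamp8(yy - 2.0 * KB * (1.0 - KB) / KG * cb
                                - 2.0 * KR * (1.0 - KR) / KG * cr))
            b.append(_clamp8(yy + 2.0 * (1.0 - KB) * cb))
    return r, g, b
```

Any deviation in the constants or the rounding step shifts the tensors fed to the tiny-AI models, which is why the notes tie such a change to a snapshot regen.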

0095 — Vulkan ring-depth tunable (ADR-0251 follow-up #3)

  • PR: feat/t7-29-followup3-ring-tunable.
  • What rebases need to know: VmafVulkanConfiguration grew an additive unsigned max_outstanding_frames field. Existing zero-initialised configs continue to receive the canonical default (0 → VMAF_VULKAN_RING_DEFAULT == 4). The clamp helper vmaf_vulkan_clamp_ring_size moved from import.c (file-local static) to vulkan_internal.h (static inline) so state_init and lazy_alloc_ring share one definition; an upstream sync that re-introduces the static in import.c would shadow the header helper — drop the duplicate, keep the inline.
  • New public symbol: vmaf_vulkan_state_max_outstanding_frames(const VmafVulkanState *) — read-side accessor for the clamped value. Pure additive surface; no upstream collision.
  • On upstream sync: zero interaction. The ring is wholly fork-introduced (ADR-0251); upstream Netflix has no Vulkan backend.
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_async_pending_fence
# All 8 cases must pass: 4 v2-contract + 4 ring-tunable.
```
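
The zero-means-default rule is the part of the clamp that rebases must not lose. A rough sketch of that behaviour, assuming an upper bound `ring_max` that these notes do not state (it is a hypothetical parameter here; the real limit lives in vulkan_internal.h):

```python
VMAF_VULKAN_RING_DEFAULT = 4  # documented default for a zero-initialised config


def clamp_ring_size(requested, ring_max):
    """Map a max_outstanding_frames request to an effective ring depth.

    `ring_max` is a hypothetical upper bound used only for illustration.
    """
    if requested == 0:
        return VMAF_VULKAN_RING_DEFAULT
    return min(requested, ring_max)
```

Keeping a single definition of this rule is what lets state_init and lazy_alloc_ring agree on the depth, which is the reason the duplicate static in import.c should be dropped on sync.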

0096 — tools/vmaf-tune/ automation umbrella spec (ADR-0237 / Research-0044)

  • PR: feat/vmaf-tune-spec.
  • What rebases need to know: this PR ships only an umbrella ADR + research digest under docs/. No tracked source code, no tools/vmaf-tune/ directory yet, no Meson changes. An upstream sync touching ffmpeg-patches or libvmaf/ cannot collide with this PR.
  • On upstream sync: zero interaction. Spec-only PR.
  • Re-test on rebase:
# No build/test impact — verify the docs render and links are alive:
ls docs/adr/0237-quality-aware-encode-automation.md \
   docs/research/0044-quality-aware-encode-automation.md
grep -c '\[ADR-0237\]' docs/adr/README.md

0097 — test_speed gated on enable_float (fix default-build failure)

  • PR: fix/test-speed-chroma-registration.
  • What rebases need to know: core/test/meson.build now wraps the test_speed executable + test() registration in if get_option('enable_float'). The speed_chroma / speed_temporal extractors live in speed.c, which is only compiled when enable_float=true (the entries in feature_extractor.c are wrapped in #if VMAF_FLOAT_FEATURES), so the test's vmaf_get_feature_extractor_by_name("speed_chroma") returned NULL on a default build (enable_float=false).
  • On upstream sync: zero interaction. test_speed.c was added fork-side via the Netflix port commit d3647c73. The gating pattern matches test_vulkan_* (if get_option('enable_vulkan').enabled()).
  • Re-test on rebase:
# default (enable_float=false): test_speed must NOT be in the suite
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false --reconfigure
ninja -C build
meson test -C build  # expect: NO test_speed in the run

# CI shape (enable_float=true): test_speed must run + pass
meson setup build libvmaf -Denable_float=true --reconfigure
ninja -C build
meson test -C build test_speed  # expect: 5/5 pass

0098 — Vulkan picture preallocation surface (ADR-0238)

  • PR: feat/vulkan-picture-preallocation.
  • What rebases need to know: ABI grows additively. New public surface in core/include/libvmaf/libvmaf_vulkan.h: enum VmafVulkanPicturePreallocationMethod, VmafVulkanPictureConfiguration, vmaf_vulkan_preallocate_pictures, vmaf_vulkan_picture_fetch. New enumerator VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE in core/src/picture.h::VmafPictureBufferType. New TU core/src/vulkan/picture_vulkan_pool.c (~180 LOC); registered in core/src/vulkan/meson.build. Fork-internal accessor vmaf_vulkan_state_context() (declared in vulkan_internal.h) exposes the imported state's VkInstance/VkDevice to the pool — used only by libvmaf.c::vmaf_vulkan_preallocate_pictures.
  • VmafContext field added: vmaf->vulkan.pool next to vmaf->vulkan.state. The vmaf_close() teardown closes the pool before clearing the state pointer (matches SYCL).
  • On upstream sync: zero interaction. Vulkan backend is fork-only; upstream Netflix has no Vulkan integration.
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_pic_preallocation
# All 6 cases must pass under ASan/UBSan:
#   test_method_none_is_a_no_op
#   test_method_host_allocates_round_robins
#   test_method_device_allocates_round_robins
#   test_fetch_without_preallocate_falls_back
#   test_unknown_method_rejected
#   test_null_args_rejected
```

0099 — feature_mobilesal.c + transnet_v2.c migrated to tiny_extractor_template.h

  • PR: refactor/migrate-ai-to-template.
  • What rebases need to know: feature_mobilesal.c and transnet_v2.c previously open-coded the model-path resolution (getenv + log block), the YUV→RGB kernel (mobilesal only), the vmaf_dnn_session_open + log boilerplate, and the VmafOption[].model_path row. They now use the helpers from dnn/tiny_extractor_template.h (PR #251) — the same template feature_lpips.c and fastdvdnet_pre.c already consume. Net −98 LOC of identical boilerplate.
  • Behavior preserved: bit-exact YUV→RGB conversion (mobilesal used the literal copy of feature_lpips.c's body that the template hoisted), identical error-log strings, identical option-table flag/type/offset shape. The migrated mobilesal_options macro expands to the same struct literal the hand-rolled version produced.
  • On upstream sync: zero interaction. Both files are fork-introduced; upstream Netflix has neither extractor.

0100 — cuda/ring_buffer.{c,h} → gpu_picture_pool.{c,h} (ADR-0239)

  • PR: refactor/gpu-picture-pool-extract.
  • What rebases need to know: core/src/cuda/ring_buffer.c and ring_buffer.h are removed. The same callback-based round-robin pool lives at core/src/gpu_picture_pool.{c,h} under renamed symbols (VmafRingBuffer → VmafGpuPicturePool, vmaf_ring_buffer_* → vmaf_gpu_picture_pool_*, _fetch_next_picture → _fetch). All call sites in libvmaf.c migrated. core/test/test_ring_buffer.c renamed to test_gpu_picture_pool.c with the corresponding meson update.
  • Netflix-upstream interaction: minimal — Netflix's cuda/ring_buffer.{c,h} last touched in commit cb1d49c6. An upstream sync that resurrects the old names should be redirected to the new ones; the file move is purely fork-local.
  • Netflix#1300 mutex-destroy-order fix preserved (ADR-0157) — moved verbatim to the new file; the fix remains attached to vmaf_gpu_picture_pool_close.
  • SYCL pool migration: vmaf_sycl_picture_pool_* keeps its public-internal API but now delegates to the generic pool. The SYCL wrapper struct (VmafSyclPicturePool) just owns the VmafSyclCookie storage. std::mutex drops out.
  • Vulkan pool migration: bundled into this PR after #264 merged. picture_vulkan_pool.c rewrites as a thin wrapper around the generic pool — wrapper struct owns per-pool state for the alloc/free callbacks; the generic pool owns the round-robin slots / mutex / unwind. Same pattern as the SYCL migration above.
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=dnn
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# All 11 dnn-suite + 4 extractor smoke tests must pass.
meson test -C build  # 47/47 pass under ASan/UBSan

# CUDA build (CI-only; pre-existing local nvcc include-path quirk):
meson setup build-cuda libvmaf -Denable_cuda=true
ninja -C build-cuda
meson test -C build-cuda test_gpu_picture_pool

# SYCL build:
meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
meson test -C build-sycl
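
The shape shared by the CUDA, SYCL, and Vulkan wrappers (backend-specific alloc/free callbacks around generic round-robin slots, a mutex, and unwind on partial allocation) can be sketched as below. This is a conceptual model only, not the C API in gpu_picture_pool.h; `PicturePool` and its method names are hypothetical.

```python
import threading


class PicturePool:
    """Round-robin pool whose slots are created and destroyed via callbacks."""

    def __init__(self, count, alloc, free):
        self._free = free
        self._lock = threading.Lock()
        self._next = 0
        self._slots = []
        try:
            for _ in range(count):
                self._slots.append(alloc())
        except Exception:
            self.close()  # unwind the slots allocated so far
            raise

    def fetch(self):
        """Return the next slot, wrapping around after the last one."""
        with self._lock:
            slot = self._slots[self._next]
            self._next = (self._next + 1) % len(self._slots)
            return slot

    def close(self):
        """Free every remaining slot exactly once."""
        with self._lock:
            while self._slots:
                self._free(self._slots.pop())
```

In this model a backend wrapper only supplies `alloc` and `free`; slot rotation and locking stay in one place, which is the deduplication the migration is after.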

0104 — psnr_vulkan.c migrated to vulkan/kernel_template.h

  • PR: refactor/migrate-psnr-vulkan-to-template.
  • What rebases need to know: vulkan/kernel_template.h (410 LOC, ADR-0246, PR #251) shipped with zero consumers. Its docstring designated psnr_vulkan.c as the reference implementation. This PR lands the migration as the first consumer of the Vulkan template — paired with PR #269 (the first CUDA template consumer). The 5 long-lived pipeline objects (descriptor-set layout, pipeline layout, shader module, compute pipeline, descriptor pool) collapse from individual struct fields to one VmafVulkanKernelPipeline pl bundle. create_pipeline() (~104 LOC) collapses to a single vmaf_vulkan_kernel_pipeline_create() call (~30 LOC) — the template owns the descriptor-set layout creation, pipeline layout, shader module, compute pipeline, and descriptor-pool sizing. close_fex()'s vkDeviceWaitIdle + 5×vkDestroy* sweep collapses to one vmaf_vulkan_kernel_pipeline_destroy() call.
  • Net LOC delta: −55 LOC on psnr_vulkan.c directly. Unlike the CUDA template (where helper-call boilerplate roughly matches the inline savings), the Vulkan template's pipeline creation is dramatic enough that even the first consumer wins.
  • Bit-exactness gates: spec-constants, push-constant struct, shader bytecode, dispatch grid math, and host-side reduction are byte-identical to the prior implementation. The template only owns descriptor-set layout / pipeline layout / shader module / compute pipeline creation / descriptor pool sizing — none of which affects the kernel's mathematical behaviour. Cross-backend parity gate (places=4) re-runs unchanged.
  • On upstream sync: zero interaction. psnr_vulkan.c is fork-introduced (T7-23 / ADR-0182 / ADR-0216).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe
# Cross-backend parity gate (places=4):
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
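
For readers unfamiliar with the `places` tolerance, one common reading is the rounding rule used by Python's `unittest.assertAlmostEqual`. The sketch below assumes the parity gate follows that rule; the gate script itself is the authority, and `agrees_at_places` is a hypothetical name:

```python
def agrees_at_places(a, b, places):
    """True when the gap between a and b rounds to zero at `places` decimals."""
    return round(abs(a - b), places) == 0
```

Under this reading, places=4 tolerates a gap below roughly 5e-5, while the looser places=2 and places=3 gates used for other features tolerate about 5e-3 and 5e-4 respectively.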

0105 — moment_vulkan.c + ciede_vulkan.c migrated to vulkan/kernel_template.h

  • PR: refactor/migrate-motion-vulkan-to-template (note: the branch name reflects the original intent; motion's two-pipeline shape didn't fit the template's single-pipeline contract, so this PR migrates moment + ciede instead).
  • What rebases need to know: second + third consumers of vulkan/kernel_template.h (after PR #270 = psnr_vulkan, the first consumer). Both files follow the identical migration pattern:
  • Replace 5 individual pipeline-object fields (dsl, pipeline_layout, shader, pipeline, desc_pool) with one VmafVulkanKernelPipeline pl bundle.
  • Replace ~100 LOC of create_pipeline() body (descriptor-set layout + pipeline layout + shader module + compute pipeline + descriptor pool boilerplate) with a single vmaf_vulkan_kernel_pipeline_create() call.
  • Replace close_fex()'s vkDeviceWaitIdle + 5×vkDestroy* sweep with one vmaf_vulkan_kernel_pipeline_destroy() call.
  • Per-file LOC deltas:
  • moment_vulkan.c: −60 LOC (450 → 390).
  • ciede_vulkan.c: −59 LOC (536 → 477).
  • Net: −119 LOC.
  • Bit-exactness preserved: spec-constants (width/height/bpc/subgroup_size identical across both), push-constant structs (MomentPushConsts, CiedePushConsts), shader bytecodes (moment_spv, ciede_spv), dispatch grid math, and host-side reductions are byte-identical to the prior implementation. Cross-backend parity gates (places=4 for moment integer reduce; places=2 for ciede transcendentals per ADR-0187) re-run unchanged.
  • motion_vulkan.c deferred: motion uses two pipelines (first frame vs subsequent) sharing one DSL + layout + shader + pool. The template's current shape produces one pipeline per descriptor; splitting motion across two VmafVulkanKernelPipeline instances would duplicate the shared objects. Tracked as a follow-up template extension (multi-pipeline support).
  • On upstream sync: zero interaction. Both files are fork-introduced (T7-23 / ADR-0182 / ADR-0187).
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe (under ASan/UBSan)
python scripts/ci/cross_backend_parity_gate.py --feature float_moment_ref1st --places 4
python scripts/ci/cross_backend_parity_gate.py --feature ciede2000 --places 2
```

0101 — GPU backend pattern doc (ADR-0240)

  • PR: docs/gpu-backend-template.
  • What rebases need to know: doc-only PR. Adds docs/development/gpu-backend-template.md (recipe new GPU backends follow) and core/include/libvmaf/AGENTS.md (public-headers-tree invariant note). No source code, no meson changes, no ABI impact.
  • On upstream sync: zero interaction. Both files are fork-introduced.
  • Re-test on rebase:

```bash
# Doc-only; verify links resolve:
test -f docs/development/gpu-backend-template.md
test -f core/include/libvmaf/AGENTS.md
grep -c 'gpu-backend-template' core/include/libvmaf/AGENTS.md
```

0102 — Tiny-AI test registration macro (tiny_ai_test_template.h)

  • PR: refactor/test-registration-macro.
  • What rebases need to know: new core/test/tiny_ai_test_template.h emits the four standard registration tests (<name>_is_registered, <name>_provides_primary_feature, <name>_options_table_well_formed, <name>_init_rejects_missing_model) via the VMAF_TINY_AI_DEFINE_REGISTRATION_TESTS(ext, feat, env, prefix) macro. The four per-extractor test files (test_lpips.c, test_mobilesal.c, test_transnet_v2.c, test_fastdvdnet_pre.c) shrank from ~140 LOC each to ~20-50 LOC. Net −286 LOC. Behavior bit-exact preserved (same assertions, same env-var save/restore dance, same setenv shim for MSVCRT). TransNet V2 keeps two extractor-specific extra tests (binary-flag round-trip + provided_features list-termination) that the macro doesn't cover.
  • On upstream sync: zero interaction. The four test files are fork-introduced (per ADR-0042 / ADR-0168 / ADR-0220 / ADR-0223 / ADR-0215).
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# 4/4 binaries pass; 18 individual tests total (4x4 standard + 2
# TransNet V2 extras).
```

0103 — integer_psnr_cuda.c migrated to cuda/kernel_template.h

  • PR: refactor/migrate-psnr-cuda-to-template.
  • What rebases need to know: cuda/kernel_template.h shipped with no consumers in PR #251 (ADR-0246). This PR migrates the first consumer (integer_psnr_cuda.c), the file the template's own docstring explicitly designated as the reference. The CUstream + CUevent + CUevent triple and the (VmafCudaBuffer device, void *host_pinned, size_t bytes) readback pair are now dispensed by the template helpers (vmaf_cuda_kernel_lifecycle_init/_close, vmaf_cuda_kernel_readback_alloc/_free, vmaf_cuda_kernel_submit_pre_launch, vmaf_cuda_kernel_collect_wait) instead of being open-coded. PsnrStateCuda shrinks: it replaces three fields (event + finished + str) with one VmafCudaKernelLifecycle and replaces (sse + sse_host) with one VmafCudaKernelReadback.
  • Net LOC delta: +8 LOC on integer_psnr_cuda.c alone — the helpers add per-call boilerplate. The dedup win materialises as more CUDA feature kernels (motion / moment / ssim / vif / adm) migrate one-at-a-time in follow-up PRs. Each subsequent migration saves ~15 LOC.
  • Bit-exactness gates: kernel launch + reduction logic unchanged. The migration only touches state-management boilerplate around the kernel; the SSE accumulator math, the per-bpc kernel function lookup, the host-side log10 score formula, and the dispatch grid-dim calculation are byte-identical to the prior implementation. Netflix golden gate + CPU/CUDA cross-backend parity gate (places=4) re-run unchanged.
  • On upstream sync: zero interaction. integer_psnr_cuda.c is fork-introduced (T7-23 / ADR-0182).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=true
ninja -C build
meson test -C build  # CUDA test suite must pass
# Cross-backend parity gate:
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4

0125 — Vulkan submit-side template + fence pool + descriptor pre-alloc bundle (ADR-0256)

  • Touches:
  • core/src/vulkan/kernel_template.h — fork-local. Output landing in runs/phase_a/ is gitignored — rerun the script to reproduce. VmafVulkanKernelSubmitPool struct + _create / _destroy / _acquire helpers + vmaf_vulkan_kernel_descriptor_sets_alloc helper. Upstream has no Vulkan backend — no merge surface.
  • core/src/feature/vulkan/{psnr_hvs,vif,float_vif,float_adm}_vulkan.c — fork-local kernel TUs, also no upstream peer.
  • Invariant: the four migrated kernels keep all per-frame VkFence + VkCommandBuffer + VkDescriptorSet resources alive across frames in the pool. Pre-bound descriptor sets rely on the kernel's VmafVulkanBuffer * handles being init-time stable (allocated in init(), freed only in close_fex). vmaf_vulkan_kernel_pipeline_destroy destroys the descriptor pool — pre-allocated sets are released implicitly via the pool; callers must NOT call vkFreeDescriptorSets on them.
  • Re-test on rebase:
meson setup build libvmaf -Denable_vulkan=enabled
ninja -C build
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
    meson test -C build test_vulkan_smoke \
                        test_vulkan_async_pending_fence \
                        test_vulkan_pic_preallocation
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature adm --backend vulkan --places 4

0107 — psnr_hvs_cuda async upload + persistent pinned staging (T-GPU-OPT-2/3)

  • Touches:
  • core/src/feature/cuda/integer_psnr_hvs_cuda.c — only consumer; fork-local from inception (T7-23 / ADR-0188 / ADR-0191). State adds upload_str (dedicated H2D stream), upload_done (cross-stream completion event), and per-plane persistent pinned h_uint_ref[3] / h_uint_dist[3] staging buffers allocated once in init_fex_cuda. The per-call helper upload_plane_cuda is split into issue_d2h_plane (pic-stream D2H), convert_plane (CPU normalise), and issue_h2d_plane (upload-stream H2D). submit_fex_cuda runs the three phases explicitly and records upload_done after the last H2D, then cuStreamWaitEvents on lc.str before kernel launches.
  • core/src/cuda/AGENTS.md — adds a rebase-sensitive invariant entry under §Rebase-sensitive invariants documenting the three-phase flow + persistent staging contract.
  • Invariant: the pinned h_uint_* and h_ref / h_dist buffers are never freed and re-allocated mid-stream; the H2Ds must run on upload_str (not on lc.str) so the cuStreamWaitEvent cross-stream link is meaningful; the upload_done event is recorded after the last H2D for the current frame and waited on once before the first kernel launch of that frame. CUDA graph capture (future T-GPU-OPT-N) depends on the no-per-frame-alloc invariant; collapsing the three-phase split or re-introducing per-frame vmaf_cuda_buffer_host_alloc calls breaks that follow-up. Bit-exactness gate is places=3 for psnr_hvs_y / cb / cr and the combined psnr_hvs (matches the existing matrix; not places=4).
  • On upstream sync: zero interaction. integer_psnr_hvs_cuda.c is fork-introduced (T7-23 / ADR-0188 / ADR-0191).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr_hvs --backend cuda --places 3

0227 — output.c writer-format unit tests (R3 of coverage-gap-2026-05-02)

  • Touches:
  • core/test/test_output.c (new) — exercises the four writers in core/src/output.c (XML / JSON / CSV / SUB) end-to-end via tmpfile()-backed sinks and a synthetic VmafFeatureCollector. Pure test-only; no production code change.
  • core/test/meson.build — registers test_output next to test_feature_collector (mirrors that test's wiring: link_with: libvmaf + libsvm objects + log/predict/metadata helpers).
  • Invariant: the test pulls libvmaf.c and output.c in via #include "*.c" (mirroring the precedent in test_feature_collector.c) so the per-translation-unit .gcno lands in the test build dir and gcovr aggregates output.c's coverage. The mu-test framework macro (mu_assert) deliberately early-returns from each static char *test_*() body, which is why every test body trips clang-analyzer-unix.Malloc "potential leak" notes (cleanup runs only on the success-tail path). This pattern is shared across every core/test/test_*.c file and is load-bearing (per ADR-0141 NOLINT carve-out): replacing it with goto-cleanup would obscure the per-assertion failure message.
  • On upstream sync: zero interaction. output.c is upstream-mirrored, but this PR doesn't touch it. The test only depends on the four public function signatures (vmaf_write_output_{xml,json,csv,sub}); if Netflix renames or reorders those, the test fails to compile and the rebase author updates it then.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && ./build/test/test_output

0126 — OSSF Scorecard policy (ADR-0263)

  • Touches: .github/workflows/scorecard.yml (line 45 — the github/codeql-action/upload-sarif@<sha> pin). The rest of the policy is doc-only (docs/adr/0263-*.md, docs/research/0053-*.md, changelog.d/security/). Upstream Netflix/vmaf does not ship a Scorecard workflow, so the path itself is fork-introduced and won't conflict.
  • Invariant: the upload-sarif SHA must point to a commit that currently exists in github/codeql-action's git tree. A SHA that was once v4 head but no longer exists in the action repository triggers Scorecard's "imposter commit" defence and breaks the workflow with a 400 error against api.scorecard.dev. Verify on every Dependabot bump by spot-checking gh api /repos/github/codeql-action/commits/<sha> returns 200.
  • On upstream sync: zero interaction.
  • Re-test on rebase:

```bash
# Confirm the pin still resolves to a real commit:
pin=$(grep -oE 'codeql-action/upload-sarif@[a-f0-9]{40}' \
    .github/workflows/scorecard.yml | head -1 | cut -d@ -f2)
gh api "/repos/github/codeql-action/commits/$pin" --jq '.sha'
# Then watch the next master push for a green Scorecard run:
gh run list --workflow scorecard --repo VMAFx/vmafx --limit 1
```
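
The extraction half of that check is easy to run offline, for example in a pre-commit hook, before spending an API call. A small sketch mirroring the grep pattern above; `extract_pin` is a hypothetical helper, and resolving the SHA against the action repository still needs the `gh api` call:

```python
import re

# Same pattern as the grep above: a full 40-hex-digit commit SHA pin.
PIN_RE = re.compile(r"codeql-action/upload-sarif@([a-f0-9]{40})")


def extract_pin(workflow_text):
    """Return the first 40-hex-digit upload-sarif pin, or None if absent."""
    match = PIN_RE.search(workflow_text)
    return match.group(1) if match else None
```

A `None` result means the workflow is pinned to a tag or branch rather than a full SHA, which is worth flagging before the Scorecard run does.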

0228 — U-2-Net u2netp saliency replacement deferred (ADR-0265)

  • Touches: docs-only.
  • docs/adr/0265-u2netp-saliency-replacement-blocked.md — new ADR continuing the deferral chain started by ADR-0257.
  • docs/research/0055-u2netp-saliency-replacement-survey.md — new research digest (upstream survey + license + distribution-allowlist audit + alternatives walk).
  • docs/ai/models/mobilesal.md — pointer block updated to reference both ADR-0257 (first blocker) and ADR-0265 (second blocker).
  • model/tiny/registry.json — mobilesal_placeholder_v0 notes field updated to reference ADR-0265 alongside ADR-0257 (no schema / sha256 / file changes).
  • model/tiny/mobilesal.json — sidecar notes field updated in lockstep.
  • scripts/gen_mobilesal_placeholder_onnx.py — generator notes string updated so re-running is idempotent against the new sidecar / registry text.
  • CHANGELOG.md — Changed entry via changelog.d/changed/T6-2a-followup-u2netp-replacement-deferred.md.
  • docs/adr/README.md — index row via docs/adr/_index_fragments/0265-u2netp-saliency-replacement-blocked.md.
  • Invariant: zero C-side surface change. feature_mobilesal.c tensor-name contract (input input → output saliency_map, NCHW float32 [1, 3, H, W] → [1, 1, H, W]) is unchanged; the on-disk model/tiny/mobilesal.onnx (sha256 f1226310…) is unchanged; mobilesal_placeholder_v0's smoke: true flag is unchanged. Any future drop-in (U-2-Net via T6-2a-mirror-u2netp-via-release + T6-2a-widen-allowlist-resize, distilled student, or BASNet / PoolNet survey result) replaces the .onnx and bumps the registry sha256 without touching the C side.
  • On upstream sync: zero interaction. feature_mobilesal.c, the registry, the ADR, and the research digest are all fork-local (T6-2a; ADR-0218 / ADR-0257 / ADR-0265; not present in Netflix upstream).
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_mobilesal
python3 ai/scripts/validate_model_registry.py
bash scripts/docs/concat-adr-index.sh --check
bash scripts/release/concat-changelog-fragments.sh --check

0108 — ssim_accumulate_avx512 per-lane double reduction vectorised

  • ADR: ADR-0139 (existing; no new ADR — the per-lane reduction order is unchanged).
  • Touches:
  • core/src/feature/x86/ssim_avx512.c — the ssim_accumulate_block_avx512 body. The per-lane scalar ssim_accumulate_lane calls (16 of them) are replaced by two 8-wide __m512d passes that compute lv, cv, sv, and lv*cv*sv lane-wise in vector double. Aligned double[16] spill buffers replace the previous _Alignas(64) float[16]×6 spill, and the scalar accumulation loop now does 4×16 vaddsd instead of 16 invocations of the per-lane helper.
  • CHANGELOG.md — Changed entry.
  • This file — this entry.
  • Invariant (load-bearing for ADR-0139 bit-exactness):
  • Per-lane double computation order is byte-identical: ((2.0 * rm) * cm + C1) / l_den, then (2.0 * srsc + C2) / c_den, then (lv * cv) * sv. No FMA contraction (use separate _mm512_mul_pd + _mm512_add_pd; _mm512_fmadd_pd is forbidden because it rounds once instead of twice and would diverge from scalar's two-step mul+add).
  • Float→double widening uses _mm512_cvtps_pd, which is IEEE-754-exact for finite floats (a float's 23-bit mantissa fits losslessly in a double's 52-bit mantissa).
  • Lane-by-lane left-to-right reduction order preserved: local_ssim += t_ssim[k] for k = 0..15. Tree reductions (pairwise add, dual-accumulator unroll) are forbidden — they break running-sum associativity against scalar.
  • AVX2 / NEON twins kept on the per-lane scalar path. Verified bit-identical against the new AVX-512 at --precision max on the Netflix src01_hrc00/01_576x324 and the checkerboard_1920_1080_10_3_*_0 pairs. The bit-exactness contract (ADR-0139) is per-lane, not per-ISA algorithm — so AVX2 / NEON stay scalar-per-lane until a dedicated PR vectorises them with the same care.
  • Rebase impact: zero conflict with Netflix upstream — the whole SSIM SIMD surface is fork-local (no upstream SSIM SIMD exists). Conflicts only arise if upstream changes ssim_accumulate_default_scalar in iqa/ssim_tools.c; in that case both the AVX2 / NEON per-lane helper and the AVX-512 vector-double block need a coordinated update preserving the three invariants above.
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
# Bit-exact at --precision max, scalar vs AVX2 vs AVX-512:
for MASK in 0 16 255; do
  core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --feature float_ms_ssim --feature float_ssim \
    --xml -o /tmp/m${MASK}.xml --precision max --cpumask $MASK
done
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m16.xml)   # empty
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m255.xml)  # empty
  • Why this matters on rebase: an upstream commit that touches core/src/feature/ssimulacra2.c could prompt a "let's also port the GPU XYB while we're here" follow-up. The ledger entry is the standing answer: don't, the measurement was redone on NVIDIA in May 2026 and the result still failed places=4 by five decades. See Research-0047.
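
The two arithmetic invariants in the ssim_accumulate_block_avx512 entry above (no FMA contraction, no tree reduction) can be illustrated in plain Python doubles. This is a minimal sketch, not fork code: the helper names are hypothetical, the input values are chosen only to expose the effect, and `fused_mul_add` emulates a single-rounding multiply-add with exact rational arithmetic rather than calling a hardware FMA.

```python
from fractions import Fraction


def sum_left_to_right(values):
    """Running sum in index order, like local_ssim += t_ssim[k]."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Tree reduction: sum each half separately, then add the two halves."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


def mul_then_add(a, b, c):
    """Two roundings: one after the multiply, one after the add."""
    return (a * b) + c


def fused_mul_add(a, b, c):
    """One rounding: exact rational a*b+c, rounded once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Same four values, two reduction orders, two different doubles.
values = [1e16, 1.0, -1e16, 1.0]
sequential = sum_left_to_right(values)
tree = sum_pairwise(values)

# Same three operands, two rounding counts, two different doubles.
a = 1.0 + 2.0 ** -27
separate = mul_then_add(a, a, -1.0)
fused = fused_mul_add(a, a, -1.0)

print(sequential, tree)
print(separate == fused)
```

Both pairs disagree in the last bits or worse, which is why a bit-exact contract has to pin the operation order and the rounding count rather than only the formula.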

0126 — FastDVDnet real upstream weights drop (ADR-0253)

  • What changed: replaces model/tiny/fastdvdnet_pre.onnx with the wrapped real upstream FastDVDnet checkpoint (sha256 eb9444cf6f07eefdc7f4f68d09131074dbd1dcee6f88a331ba684dd2fb5937d4, ~9.5 MiB), refreshes the sidecar model/tiny/fastdvdnet_pre.json, flips the registry row's smoke: true → false and adds license: "MIT" + the upstream commit pin c8fdf61. New exporter ai/scripts/export_fastdvdnet_pre.py (the older _placeholder.py exporter is retained for reference). New ADR docs/adr/0255-fastdvdnet-pre-real-weights.md; user-facing doc docs/ai/models/fastdvdnet_pre.md rewritten with provenance, license attribution, and reproduce-the-export instructions.
  • Upstream source: fork-local. Netflix/vmaf does not ship a FastDVDnet temporal pre-filter; the C extractor and ONNX surface are entirely fork-introduced (ADR-0215). The wrapped weights are attribution-only (upstream m-tassano/fastdvdnet MIT).
  • On upstream sync: zero interaction. Every file touched (ai/scripts/export_fastdvdnet_pre*.py, model/tiny/fastdvdnet_pre.*, docs/ai/models/fastdvdnet_pre.md, docs/adr/0253-*.md, CHANGELOG fragment, ADR index fragment) lives in fork-introduced trees.
  • Re-test on rebase:
# Re-derive the ONNX from the pinned upstream checkpoint.
mkdir -p /tmp/fastdvdnet_upstream && cd /tmp/fastdvdnet_upstream
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/model.pth
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/models.py
cd /path/to/vmaf
python3 ai/scripts/export_fastdvdnet_pre.py \
    --upstream-dir /tmp/fastdvdnet_upstream
python3 ai/scripts/validate_model_registry.py
meson test -C build --suite=fast --print-errorlogs test_fastdvdnet_pre

0127 — ONNX op-allowlist gains Resize (ADR-0258)

  • Touches:
  • core/src/dnn/op_allowlist.c — fork-local file (no upstream counterpart). One new entry "Resize" under the /* convolutional */ block.
  • core/test/dnn/test_op_allowlist.c, core/test/dnn/test_onnx_scan.c — fork-local DNN tests.
  • ai/tests/test_op_allowlist.py — fork-local Python parity test.
  • Invariant: the C allowlist is the single source of truth; the Python regex parser in ai/src/vmaf_train/op_allowlist.py walks the same op_allowlist.c file. Any future entry only needs the C edit — Python symmetry is automatic.
  • Upstream source: fork-local. Netflix/vmaf has no ONNX op-allowlist surface; the entire core/src/dnn/ tree is fork-introduced.
  • On upstream sync: zero interaction. Every file touched lives in fork-introduced trees.
  • Re-test on rebase:
meson test -C build test_op_allowlist test_onnx_scan
PYTHONPATH=ai/src python -m pytest ai/tests/test_op_allowlist.py
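
The single-source-of-truth idea in the invariant above (Python reads the allowlist out of the C file instead of keeping its own copy) can be sketched as follows. This is an illustration under stated assumptions: the array name `op_allowlist`, the C layout in `C_SOURCE`, and the `parse_allowlist` helper are all hypothetical, and the real parser in ai/src/vmaf_train/op_allowlist.py may work differently.

```python
import re

# Hypothetical excerpt; the real op_allowlist.c layout may differ.
C_SOURCE = '''
static const char *const op_allowlist[] = {
    /* convolutional */
    "Conv",
    "Resize",
    /* activations */
    "Relu",
    NULL,
};
'''


def parse_allowlist(c_source, array_name="op_allowlist"):
    """Return the string literals inside the named C array initialiser."""
    body = re.search(
        re.escape(array_name) + r"\s*\[\s*\]\s*=\s*\{(.*?)\};", c_source, re.S
    )
    if body is None:
        raise ValueError(f"array {array_name!r} not found")
    # Drop /* ... */ comments so quoted words inside them are not picked up.
    without_comments = re.sub(r"/\*.*?\*/", "", body.group(1), flags=re.S)
    return re.findall(r'"([^"\\]+)"', without_comments)


ops = parse_allowlist(C_SOURCE)
print(ops)
```

With this shape, adding an entry on the C side is picked up by the Python side on the next run without a second edit.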

0231 — vif.comp + ciede.comp precise decorations (ADR-0269 / Step A of Vulkan 1.4 bump)

  • Touches: core/src/feature/vulkan/shaders/vif.comp (3 local-variable type qualifiers: g, sv_sq, gg_sigma_f → precise float), core/src/feature/vulkan/shaders/ciede.comp (yuv_to_rgb outputs, rgb_to_xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE).
  • Invariant: Both shaders are fork-local (Vulkan backend is fork-added; upstream Netflix/vmaf has no Vulkan compute kernels). The precise keyword is GLSL 4.50 standard syntax; glslc 2026.1 lowers it to per-result OpDecorate NoContraction. The decorations are load-bearing for the cross-backend gate on NVIDIA driver 595.71+ — removing them would re-introduce the 42/48 ciede regression at API 1.3 documented in research-0054.
  • On upstream sync: zero interaction. Both shader files are entirely fork-introduced; upstream has no Vulkan compute path.
  • Re-test on rebase:
# Re-confirm the cross-backend gate on a Vulkan-capable host.
meson setup core/build -Denable_vulkan=enabled
ninja -C core/build
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature ciede --backend vulkan --places 4
# Confirm SPIR-V still emits NoContraction post-rebase.
glslc --target-env=vulkan1.3 -O \
    core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
spirv-dis /tmp/vif.spv | grep -c NoContraction   # expect ≥ 60

Expected on NVIDIA 595.71+: vif 0/48 OK, ciede 5/48 FAIL (max abs 8.9e-05 — pre-existing fork debt at API 1.3, see ADR-0269). On RADV / lavapipe: bit-exact (precise is a no-op there).
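
The "N/48 FAIL" figures can be read as a per-frame tolerance count. The sketch below assumes that places=4 means an absolute tolerance of 0.5e-4 per frame, which matches the usual round-to-N-places convention but is an assumption here, not something this ledger states; `count_mismatches` and the sample scores are hypothetical and do not reproduce scripts/ci/cross_backend_vif_diff.py.

```python
def count_mismatches(cpu_scores, gpu_scores, places=4):
    """Count frames whose absolute difference reaches 0.5 * 10**-places."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("score lists must have the same length")
    threshold = 0.5 * 10.0 ** -places
    diffs = [abs(c - g) for c, g in zip(cpu_scores, gpu_scores)]
    failures = sum(1 for d in diffs if d >= threshold)
    return failures, max(diffs, default=0.0), threshold


# Illustrative per-frame scores, not measured data.
cpu = [10.0, 20.0, 30.0]
gpu = [10.00001, 20.000089, 30.0]
failures, max_abs, threshold = count_mismatches(cpu, gpu)
print(f"{failures}/{len(cpu)} FAIL, max abs {max_abs:.1e}, threshold {threshold:.1e}")
```

Under that assumption a max abs of 8.9e-05 sits at roughly 1.78 times the threshold, so a frame can fail the gate while still agreeing to four significant decimal digits in most positions.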

0229 — fr_regressor_v2 codec-aware scaffold (ADR-0272)

  • ADR: ADR-0272
  • Touches:
  • ai/scripts/train_fr_regressor_v2.py (new) — Phase A JSONL consumer; trains the codec-aware FRRegressor.
  • model/tiny/fr_regressor_v2.onnx (new, smoke) — placeholder ONNX from --smoke mode; re-baked on production training.
  • model/tiny/fr_regressor_v2.json (new) — sidecar.
  • model/tiny/registry.json — new entry with smoke: true.
  • docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md (new).
  • docs/adr/README.md — index row.
  • docs/research/0058-fr-regressor-v2-feasibility.md (new).
  • docs/ai/models/fr_regressor_v2.md (new) — model card.
  • ai/AGENTS.md — invariant note (codec block layout + ENCODER_VOCAB ordering).
  • CHANGELOG.md — Added entry.
  • Invariant: the 8-D codec block layout is [encoder_onehot(6), preset_norm, crf_norm] with ENCODER_VOCAB = (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown) in load-bearing order. CRF normaliser is /63 (union upper bound). Preset normaliser is /9. Bumping the vocabulary requires a re-train; existing checkpoints pin the order they were trained against via encoder_vocab_version in the sidecar. The two-input ONNX (features, codec) follows the LPIPS-Sq precedent (ADR-0040 / ADR-0041).
  • Rebase impact: entirely fork-local; pure additive; no upstream-mirror file is touched. Phase A schema (consumed by this trainer) is itself fork-local (tools/vmaf-tune/). No conflict expected on /sync-upstream.
  • Re-test on rebase:
python ai/scripts/train_fr_regressor_v2.py --smoke
python ai/scripts/validate_model_registry.py
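
The 8-D codec block from the invariant above can be sketched as a small encoder. The vocabulary order and the /9 and /63 normalisers come from the entry; everything else is an assumption for illustration: `codec_block` is a hypothetical helper, presets are assumed to arrive as a numeric index, and out-of-vocabulary encoders are assumed to map to the "unknown" slot.

```python
# Order is load-bearing: it fixes which one-hot slot each encoder occupies.
ENCODER_VOCAB = (
    "libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown",
)
PRESET_DIVISOR = 9.0   # preset normaliser from the invariant
CRF_DIVISOR = 63.0     # CRF normaliser (union upper bound)


def codec_block(encoder, preset_index, crf):
    """Build [encoder_onehot(6), preset_norm, crf_norm] as a list of 8 floats."""
    # Assumption: encoders outside the vocabulary fall back to "unknown".
    name = encoder if encoder in ENCODER_VOCAB else "unknown"
    onehot = [1.0 if name == v else 0.0 for v in ENCODER_VOCAB]
    return onehot + [preset_index / PRESET_DIVISOR, crf / CRF_DIVISOR]


block = codec_block("libx265", preset_index=6, crf=21)
fallback = codec_block("some_future_encoder", preset_index=0, crf=63)
print(block)
print(fallback)
```

Reordering `ENCODER_VOCAB` would silently move the hot bit to a different slot for the same encoder, which is why a vocabulary change requires a re-train.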

0311 — libFuzzer harness expansion: yuv_input + cli_parse (ADR-0311)

  • ADR: ADR-0311; parent ADR-0270.
  • Touches:
  • core/test/fuzz/fuzz_yuv_input.c (new)
  • core/test/fuzz/fuzz_cli_parse.c (new)
  • core/test/fuzz/meson.build — two new executable(...) blocks for the harnesses, plus a shared fuzz_vidinput_sources list.
  • core/test/fuzz/yuv_input_corpus/* (new — 6 seeds covering 8/10-bit × 4:2:0 / 4:2:2 / 4:4:4 plus a truncated-frame seed).
  • core/test/fuzz/cli_parse_corpus/* (new — 6 seeds covering the --feature, --model, --reference, YUV-flag, and --help shapes).
  • core/test/fuzz/README.md — Targets table extended.
  • .github/workflows/fuzz.yml — matrix gains fuzz_yuv_input + fuzz_cli_parse; per-harness wall-clock budget reduced from 300 s to 60 s so the 3-target matrix fits the existing timeout-minutes: 15 cap.
  • docs/development/fuzzing.md — runbook table + smoke commands extended.
  • docs/adr/0311-libfuzzer-harness-expansion.md (new)
  • docs/research/0083-libfuzzer-harness-expansion-target-survey.md (new)
  • libvmaf/AGENTS.md — new invariant block for the one-parser-one-harness rule.
  • CHANGELOG.md — Added entry.
  • Invariant:
  • The fuzz scaffold remains opt-in (-Dfuzz=true) — every default meson setup invocation must continue to skip it.
  • fuzz_yuv_input re-includes tools/yuv_input.c and the rest of the vidinput trio as build inputs. Upstream Netflix/vmaf splits or renames of those source files need the matching meson.build source-list update.
  • fuzz_cli_parse re-includes tools/cli_parse.c as a build input and links against libvmaf for vmaf_version() and feature-dictionary symbols. The -Wl,--wrap=exit link arg is load-bearing — without it, usage()'s exit(1) would terminate the fuzzer process on first bad input.
  • LLVMFuzzerTestOneInput keeps external linkage; the scaffold-wide // NOLINTNEXTLINE(misc-use-internal-linkage) pattern is correct for libFuzzer's name-resolved entry-point ABI.
  • Rebase impact: any upstream sync that touches core/tools/{yuv_input,cli_parse}.c must re-run the 60 s smoke per harness on the merged tip; record any new-found crash-* artefact under the matching <target>_known_crashes/ dir, not in <target>_corpus/. The __wrap_exit shim in fuzz_cli_parse.c is GNU-ld / lld-only; do not assume it works on Apple ld without an -undefined,dynamic_lookup fallback.
  • Re-test on rebase:
CC=clang CXX=clang++ \
  meson setup build-fuzz libvmaf \
    --buildtype=debug \
    -Db_sanitize=address \
    -Db_lundef=false \
    -Dfuzz=true \
    -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz \
    test/fuzz/fuzz_y4m_input \
    test/fuzz/fuzz_yuv_input \
    test/fuzz/fuzz_cli_parse
./build-fuzz/test/fuzz/fuzz_yuv_input \
    -seed=0 -runs=1000 \
    core/test/fuzz/yuv_input_corpus/
./build-fuzz/test/fuzz/fuzz_cli_parse \
    -seed=0 -runs=1000 \
    core/test/fuzz/cli_parse_corpus/

0229 — libFuzzer scaffold for the YUV4MPEG2 parser (ADR-0270)

  • ADR: ADR-0270
  • Touches:
  • core/test/fuzz/fuzz_y4m_input.c (new)
  • core/test/fuzz/meson.build (new)
  • core/test/fuzz/README.md (new)
  • core/test/fuzz/y4m_input_corpus/* (new — six seeds)
  • core/test/fuzz/y4m_input_known_crashes/* (new — one 411-chroma OOB reproducer; excluded from CI corpus)
  • core/test/meson.build — subdir('fuzz') line.
  • core/meson_options.txt — new option('fuzz', ...).
  • .github/workflows/fuzz.yml (new — nightly 5-minute job).
  • docs/development/fuzzing.md (new — operator runbook).
  • docs/adr/0270-fuzzing-scaffold.md (new)
  • docs/research/0059-libfuzzer-scaffold-y4m.md (new)
  • docs/state.md — new Open-bug row for the 411-chroma OOB write.
  • CHANGELOG.md — Added entry.
  • Invariant: the fuzz scaffold is opt-in — every default meson setup invocation must continue to skip it. The harness links statically against core/tools/{y4m_input,yuv_input,vidinput}.c rather than libvmaf.so so the public C-API surface stays unchanged.
  • Rebase impact: the harness re-includes core/tools/y4m_input.c as a build input. Any upstream Netflix/vmaf change that splits or renames the tool sources (e.g. moves the parser into core/src/) needs the corresponding meson.build source list update and the harness re-test below. The y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m reproducer is the regression gate for the parser fix; do not delete it on upstream sync — if upstream lands the same fix, port the reproducer back into y4m_input_corpus/ as a permanent seed.
  • Re-test on rebase:
CC=clang CXX=clang++ \
  meson setup build-fuzz libvmaf \
    --buildtype=debug \
    -Db_sanitize=address \
    -Db_lundef=false \
    -Dfuzz=true \
    -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz test/fuzz/fuzz_y4m_input
./build-fuzz/test/fuzz/fuzz_y4m_input \
    -max_total_time=60 \
    core/test/fuzz/y4m_input_corpus/
# Verify the known-crash reproducer still triggers (until the fix lands):
./build-fuzz/test/fuzz/fuzz_y4m_input \
    core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m

0231 — HIP seventh-consumer kernel float_motion_hip (ADR-0273)

  • ADR: ADR-0273
  • Touches:
  • core/src/feature/hip/float_motion_hip.c (new) — seventh consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_motion_cuda.c call-graph-for-call-graph; init/submit/collect/close invoke the kernel-template helpers in the same order; flush() callback for tail-frame motion2 emission; motion_force_zero short-circuit posture (fex->extract swap with submit / collect / flush / close nulled). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-WG SAD float partials directly, no atomic, no memset).
  • core/src/feature/hip/float_motion_hip.h (new)
  • core/src/hip/meson.build — new entry in hip_sources.
  • core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — new sub-test test_float_motion_hip_extractor_registered (also asserts the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit) and a row in test_table[].
  • docs/adr/0273-hip-seventh-consumer-float-motion.md (new)
  • docs/adr/README.md — index row.
  • docs/backends/hip/overview.md — seventh / eighth consumer note.
  • core/src/hip/AGENTS.md — invariant note.
  • CHANGELOG.md — Added entry (joint with ADR-0274).
  • Invariant — three-buffer ping-pong + motion_force_zero short-circuit are load-bearing. The state struct carries three uintptr_t buffer slots (ref_in, blur[2]) that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *ref_in + VmafCudaBuffer *blur[2] field shape. The motion_force_zero short-circuit (fex->extract swap, kernel-template helpers nulled) must stay aligned with the CUDA twin on every refactor — otherwise the runtime PR's helper-body flip diverges between the two backends. The submit_pre_launch bypass mirrors the CUDA twin; if a future PR adds a submit_pre_launch call to float_motion_cuda.c's submit path, the HIP twin must follow in the same PR.
  • Rebase impact: entirely fork-local. New files are HIP-specific. The only upstream-touching edit is feature_extractor.c, but the change sits inside an existing #if HAVE_HIP block (ADR-0241); upstream has no HAVE_HIP so no conflict is expected.
  • Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
  -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke

0232 — HIP eighth-consumer kernel float_ssim_hip (ADR-0274)

  • ADR: ADR-0274
  • Touches:
  • core/src/feature/hip/float_ssim_hip.c (new) — eighth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_ssim_cuda.c call-graph-for-call-graph (the CUDA file registers vmaf_fex_float_ssim_cuda despite its integer_ filename). First multi-dispatch HIP consumer (chars.n_dispatches_per_frame == 2). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-block float partials directly). State struct carries five uintptr_t intermediate float buffer slots (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) tracked outside the kernel-template's readback bundle. validate_dims_hip and init_dims_hip helpers extracted from init() to fit the readability-function-size budget.
  • core/src/feature/hip/float_ssim_hip.h (new)
  • core/src/hip/meson.build — new entry in hip_sources.
  • core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — new sub-test test_float_ssim_hip_extractor_registered (also asserts chars.n_dispatches_per_frame == 2) and a row in test_table[].
  • docs/adr/0274-hip-eighth-consumer-float-ssim.md (new)
  • docs/adr/README.md — index row.
  • docs/backends/hip/overview.md — seventh / eighth consumer note (joint).
  • core/src/hip/AGENTS.md — invariant note.
  • CHANGELOG.md — Added entry (joint with ADR-0273).
  • Invariant — multi-dispatch + five-slot buffer pyramid + v1 scale=1 validation are load-bearing. The state struct carries five uintptr_t intermediate float buffer slots that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *h_* field shape — any drift in the CUDA twin's slot count requires a paired update here. The chars.n_dispatches_per_frame == 2 characteristic is asserted in the smoke test; do not silently lower it. The v1 scale=1 -EINVAL validation surface (in validate_dims_hip) must stay aligned with the CUDA twin's compute_scale / vmaf_log chain. The HIP twin's validate_dims_hip / init_dims_hip extraction is intentional for the function-size budget; do not re-inline without verifying the budget still passes.
  • Rebase impact: entirely fork-local; same posture as ADR-0273.
  • Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
  -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke

0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)

0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)

0228 — vmaf-tune libx265 codec adapter (ADR-0288)

0280 — vmaf-tune NVENC codec adapters (ADR-0290)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_nvenc,hevc_nvenc,av1_nvenc,_nvenc_common}.py (new). Wholly fork-local — no upstream Netflix/vmaf overlap.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry expanded.
  • tools/vmaf-tune/tests/test_codec_adapter_nvenc.py (new).
  • tools/vmaf-tune/tests/test_corpus.py — Phase-A registry assertion updated.
  • tools/vmaf-tune/AGENTS.md — invariant note expanded.
  • docs/usage/vmaf-tune.md — "Hardware encoders (NVENC)" section.
  • docs/adr/0290-vmaf-tune-nvenc-adapters.md (new) + docs/adr/README.md index row.
  • docs/research/0065-vmaf-tune-nvenc-adapters.md (new).
  • CHANGELOG.md — Added entry.
  • Invariant: known_codecs() returns the four-codec tuple ("av1_nvenc", "h264_nvenc", "hevc_nvenc", "libx264"); the mnemonic preset map (ultrafast/superfast/veryfast → p1, faster → p2, fast → p3, medium → p4, slow → p5, slower → p6, slowest/placebo → p7) is the canonical cross-codec preset alignment that downstream Phase B/C consumers assume. The CQ window is the hardware-permitted [0, 51]; the Phase A informative window is [15, 40].
  • Rebase impact: zero — tools/vmaf-tune/ is wholly fork-local and has no upstream Netflix/vmaf path overlap.
  • Re-test on rebase:
cd tools/vmaf-tune && python -m pytest tests/ -q
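
The preset alignment and CQ windows from the invariant above can be written down as a lookup table. The mapping and the two windows are taken from the entry; the constant and function names are hypothetical and are not the adapter's actual API.

```python
# Mnemonic preset name -> NVENC preset, as listed in the invariant.
NVENC_PRESET_MAP = {
    "ultrafast": "p1", "superfast": "p1", "veryfast": "p1",
    "faster": "p2",
    "fast": "p3",
    "medium": "p4",
    "slow": "p5",
    "slower": "p6",
    "slowest": "p7", "placebo": "p7",
}
CQ_HARDWARE_WINDOW = (0, 51)      # values the hardware accepts
CQ_INFORMATIVE_WINDOW = (15, 40)  # values Phase A treats as informative


def nvenc_preset(name):
    """Translate a cross-codec preset name into an NVENC p-level."""
    try:
        return NVENC_PRESET_MAP[name]
    except KeyError:
        raise ValueError(f"unknown preset {name!r}") from None


def validate_cq(cq):
    """Reject CQ values outside the hardware-permitted window."""
    low, high = CQ_HARDWARE_WINDOW
    if not low <= cq <= high:
        raise ValueError(f"cq {cq} outside [{low}, {high}]")
    return cq


def is_informative(cq):
    """True when cq falls inside the Phase A informative window."""
    low, high = CQ_INFORMATIVE_WINDOW
    return low <= cq <= high


print(nvenc_preset("medium"), validate_cq(23), is_informative(45))
```

Keeping the table in one place means a sweep over `--preset` names produces comparable rows across software and hardware encoders.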

0228 — vmaf-tune libx265 codec adapter (ADR-0288)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry add), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions(stderr, encoder=…) gains a per-codec branch), tools/vmaf-tune/src/vmaftune/cli.py (help-text wording only), tools/vmaf-tune/tests/test_codec_adapter_x265.py (new), tools/vmaf-tune/tests/test_corpus.py (membership-based codec list assertion).
  • Invariant: the codec-adapter contract documented in tools/vmaf-tune/AGENTS.md (multi-codec from day one; the search loop never branches on codec identity). The parse_versions signature is still backward-compatible — encoder defaults to libx264 so callers from before this PR keep working.
  • Upstream source: fork-local. tools/vmaf-tune/ is fork-only; upstream Netflix/vmaf does not ship encode automation.
  • On upstream sync: zero interaction. Confirm the _index_fragments/_order.txt row for 0288-vmaf-tune-codec-adapter-x265 remains present after any cross-merge.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -x

0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry row + import), tools/vmaf-tune/tests/test_corpus.py (membership assertion relaxed from == ("libx264",) to "libx264" in known_codecs()), tools/vmaf-tune/tests/test_codec_adapter_libaom.py (new), tools/vmaf-tune/AGENTS.md (preset-vocabulary invariant).
  • Invariant: the cross-codec preset vocabulary (placebo, slowest, slower, slow, medium, fast, faster, veryfast, superfast, ultrafast) is shared across AV1-family adapters so one --preset axis covers x264 / x265 / svtav1 / libaom-av1. Each adapter maps the human name onto its codec-specific knob; do not introduce per-adapter preset names.
  • Upstream source: fork-local. tools/vmaf-tune/ is the fork-introduced quality-aware encode automation harness (ADR-0237); it has no upstream Netflix/vmaf counterpart.
  • On upstream sync: zero interaction with upstream/master. Self-contained in tools/vmaf-tune/ and docs/.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/

0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)

  • ADR: ADR-0275
  • Touches:
  • model/tiny/vmaf_tiny_v3.int8.onnx (new, 4 267 B)
  • model/tiny/vmaf_tiny_v4.int8.onnx (new, 7 769 B)
  • model/tiny/registry.json — new vmaf_tiny_v3 and vmaf_tiny_v4 rows with quant_mode, int8_sha256, quant_accuracy_budget_plcc fields.
  • model/tiny/vmaf_tiny_v3.json, model/tiny/vmaf_tiny_v4.json — same fields mirrored into the per-model sidecars.
  • docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md — new "Quantisation" sections.
  • docs/adr/0275-vmaf-tiny-v3-v4-ptq.md (new) and ADR index row.
  • CHANGELOG.md — Added entry.
  • Invariant: python ai/scripts/measure_quant_drop.py --all reports [PASS] for both vmaf_tiny_v3 (drop ≤ 0.001 on Netflix features) and vmaf_tiny_v4 (drop ≤ 0.001), inside the 0.01 per-model budget. The runtime redirect from ADR-0174 picks the .int8.onnx sibling when an operator's registry overlay declares quant_mode: dynamic.
  • Rebase impact: entirely fork-local — neither v3 nor v4 nor the dynamic-PTQ harness exists upstream. The new int8 ONNX bytes ship as committed binaries (mirroring learned_filter_v1 and nr_metric_v1); they are well below the few-MB external-data threshold and don't require the sigstore + .onnx.data pattern.
  • Re-test on rebase:

python ai/scripts/validate_model_registry.py
python ai/scripts/measure_quant_drop.py --all

0229 — NVIDIA-Vulkan ciede2000 places=4 fork debt root-cause (ADR-0273)

  • Touched files: docs-only.
  • docs/adr/0273-...precision-gap.md (new) + _index_fragments/ row + _order.txt append.
  • docs/research/0055-ciede-vulkan-nvidia-f32-f64-root-cause.md (new) + docs/research/README.md index row.
  • docs/state.md — Open-bugs row T-VK-CIEDE-F32-F64.
  • docs/backends/vulkan/overview.md — NVIDIA-hardware caveat.
  • changelog.d/changed/ciede-vulkan-nvidia-f32-f64-precision-gap.md (new).
  • core/src/vulkan/AGENTS.md — invariant cross-link.
  • Invariant: the ciede.comp shader's f32 precision contract is load-bearing — promoting to f64 would silently change scores on every Vulkan device that supports shaderFloat64 and create a per-device-feature-bit divergence (RTX 4090 has it; many consumer GPUs don't). The CPU ciede.c::get_lab_color doing its colour-space chain in double is upstream Netflix behaviour and must not be narrowed to f32 to "fix" the GPU gap (would change Netflix golden ground truth). The 5/48 NVIDIA places=4 mismatch on the highest-ΔE frames is expected and documented; do not attempt to "fix" it without re-reading ADR-0273 first.
  • Rebase impact: zero — docs-only. The CPU and shader sources this ADR analyses are unchanged by this PR. If a future upstream rebase touches ciede.c::get_lab_color (the double chain) the ADR's reasoning still holds; if upstream changes the CPU reference's precision posture, ADR-0273 needs a Status: Superseded entry.
  • Re-test on rebase: a manual NVIDIA-hardware run if available:

cd libvmaf && meson setup build \
  -Denable_vulkan=enabled -Denable_cuda=false && ninja -C build
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary $PWD/core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature ciede --backend vulkan --device 0 --places 4
# Expected post-PR-346 (when merged): 5/48 mismatches at 1.78× threshold.
# Expected pre-PR-346 (current master): 42/48 mismatches at higher ratio.
# If the count drops below 5/48 on NVIDIA, ADR-0273 should record the
# delta and consider closing T-VK-CIEDE-F32-F64.

0229 — tools/vmaf-tune fast Phase A.5 scaffold (ADR-0276)

  • Touches: tools/vmaf-tune/src/vmaftune/fast.py (new), tools/vmaf-tune/src/vmaftune/cli.py (new fast subcommand branch), tools/vmaf-tune/pyproject.toml (new [fast] extra), tools/vmaf-tune/tests/test_fast.py (new), tools/vmaf-tune/AGENTS.md (new invariants), docs/usage/vmaf-tune.md (new "Phase A.5" section), docs/adr/0276-vmaf-tune-fast-path.md (new ADR), docs/research/0060-vmaf-tune-fast-path.md (new digest).
  • Invariant: the fast subcommand is opt-in and never automatically replaces the Phase A grid path. The slow grid is the ground-truth corpus generator (ADR-0237 contract); fast-path is for the recommendation use case only. Optuna is a lazy-imported optional dep gated behind the [fast] extra — importing it at module scope outside fast.py (or its tests) breaks the zero-dep core install.
  • Rebase impact: entirely fork-local; the tool sits under tools/vmaf-tune/ which is fork-added, and no upstream files are touched. Upstream Netflix/vmaf has no analogous surface.
  • Re-test on rebase:
pip install -e 'tools/vmaf-tune[fast]'
pytest tools/vmaf-tune/tests/test_fast.py -v
vmaf-tune fast --smoke --target-vmaf 92
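
The lazy-import rule in the invariant above (Optuna is only imported when the fast path actually runs) can be sketched like this. `load_optional` and `run_fast_search` are hypothetical names, not the contents of fast.py; the Optuna calls used (`create_study`, `Study.optimize`, `Study.best_params`) are part of Optuna's public API, but how the real subcommand uses them is not described here.

```python
import importlib


def load_optional(module_name, extra):
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name!r} is required for this command; "
            f"install the [{extra}] extra to enable it"
        ) from exc


def run_fast_search(objective, n_trials=20):
    """Hypothetical entry point: Optuna is imported only when this runs."""
    optuna = load_optional("optuna", "fast")  # deferred, never at module scope
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params


# Importing this module never touches Optuna; only run_fast_search does.
json_module = load_optional("json", "fast")
print(json_module.dumps({"lazy": True}))
```

Because the import sits inside the function body, a core install without the [fast] extra can still import the package and run every other subcommand.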

0229 — vmaf-tune recommend subcommand (ADR-0237 Phase B-lite)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/recommend.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds recommend subparser; corpus subcommand untouched.
  • tools/vmaf-tune/tests/test_recommend.py (new). 13-case smoke suite, mocks all binaries; runs in <100 ms.
  • docs/usage/vmaf-tune.md — adds ## recommend section.
  • Invariant: recommend consumes the existing CORPUS_ROW_KEYS schema unchanged — vmaf_score, bitrate_kbps, crf, preset, encoder, exit_status. No schema bump. If a future PR bumps SCHEMA_VERSION, both the corpus writer and the recommend reader must be updated in lockstep; tests assert this via test_corpus_row_keys_match_init_contract.
  • Rebase impact: zero — tools/vmaf-tune/ is wholly fork-local; no upstream surface touches it.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/
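
The schema-lockstep invariant above can be illustrated with a reader that refuses rows missing any of the named fields. This is a sketch under stated assumptions: `REQUIRED_KEYS` lists only the six fields named in the invariant, the selection rule (cheapest successful encode that meets the target) is an assumed stand-in for whatever recommend.py actually does, and the sample rows are made up.

```python
# The six fields named in the invariant; the real schema may carry more.
REQUIRED_KEYS = (
    "vmaf_score", "bitrate_kbps", "crf", "preset", "encoder", "exit_status",
)


def recommend(rows, target_vmaf):
    """Return the lowest-bitrate successful row meeting target_vmaf, or None."""
    candidates = []
    for row in rows:
        missing = [key for key in REQUIRED_KEYS if key not in row]
        if missing:
            # A schema drift between writer and reader should fail loudly.
            raise KeyError(f"corpus row is missing keys: {missing}")
        if row["exit_status"] == 0 and row["vmaf_score"] >= target_vmaf:
            candidates.append(row)
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["bitrate_kbps"])


# Illustrative rows, not measured data.
rows = [
    {"vmaf_score": 95.1, "bitrate_kbps": 4200, "crf": 20, "preset": "medium",
     "encoder": "libx264", "exit_status": 0},
    {"vmaf_score": 92.4, "bitrate_kbps": 2900, "crf": 24, "preset": "medium",
     "encoder": "libx264", "exit_status": 0},
    {"vmaf_score": 93.0, "bitrate_kbps": 2500, "crf": 23, "preset": "medium",
     "encoder": "libx264", "exit_status": 1},
]
best = recommend(rows, target_vmaf=92.0)
print(best["crf"], best["bitrate_kbps"])
```

Raising on a missing key, rather than skipping the row, is the reader-side half of the lockstep rule: a writer that drops or renames a field is caught on the first row.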

0228 — integer_ms_ssim_cuda.c joins drain_batch (T-GPU-OPT-2 / ADR-0271)

  • Touches: core/src/feature/cuda/integer_ms_ssim_cuda.c. No upstream Netflix/vmaf changes expected here — the file is fork-added (CUDA twin of the upstream-port ms_ssim_score.cu) and the surface this PR redrew (per-scale l_partials[i] / c_partials[i] / s_partials[i] arrays + the per-scale h_l_partials[i] / h_c_partials[i] / h_s_partials[i] pinned host shadows + the submit() ↔ collect() work redistribution + the cuEventRecord(s->lc.finished, s->lc.str) + vmaf_cuda_drain_batch_register(&s->lc) tail) is also entirely fork-local.
  • Invariant: the engine-scope drain-batch contract from ADR-0271 / drain_batch.h. The kernel-launch order on s->lc.str must stay stable: decimate (× 4), then for each scale i ∈ 0..4 horizvert_lcs ⇒ DtoH(l_partials[i]) ⇒ DtoH(c_partials[i]) ⇒ DtoH(s_partials[i]), then cuEventRecord(s->lc.finished, s->lc.str), then vmaf_cuda_drain_batch_register(&s->lc). Same-stream ordering is what makes the shared SSIM intermediates (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) safe across scales without explicit sync — any change that parallelises the per-scale work onto multiple streams breaks bit-exactness unless per-scale intermediates are also added.
  • On upstream sync: zero interaction (the file is fork-added). If a future upstream PR adds an integer_ms_ssim_cuda.c of its own, the merger must reconcile the per-scale partials topology + the drain_batch tail with whatever the new upstream shape brings.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build  # confirms the CPU build still links cleanly
# If the dev host has a working nvcc / host-compiler pair:
meson setup build_cuda -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda src/liblibvmaf_feature.a.p/feature_cuda_integer_ms_ssim_cuda.c.o
# Netflix CPU golden gate (CPU is the bit-exactness ground truth):
make test-netflix-golden
# Cross-backend parity (places=4 gate, ADR-0214):
/cross-backend-diff

0277 — ffmpeg-patches refresh against n8.1 — 2026-05-04 (ADR-0277)

  • Touches: ffmpeg-patches/ is unchanged (no content drift). Doc-only entries land in:
  • docs/adr/0277-ffmpeg-patches-refresh-2026-05-04.md — new ADR.
  • docs/adr/_index_fragments/0277-ffmpeg-patches-refresh-2026-05-04.md — index row.
  • docs/adr/_index_fragments/_order.txt — manifest append.
  • changelog.d/changed/ffmpeg-patches-refresh-2026-05-04.md — Changed entry.
  • This file — this entry.
  • Invariant: ffmpeg-patches/series.txt order is load-bearing — patches 0002…0006 build on each other and only apply cleanly cumulatively. The verification gate is a series replay, not a per-patch git apply --check (per ADR-0118 + CLAUDE.md §12 r14).
  • On upstream sync: zero interaction. Netflix/vmaf has no ffmpeg-patches/ tree; this is a fork-local integration surface.
  • Re-test on rebase (also: re-replay procedure for the next refresh):
# Clone pristine n8.1
git -C /tmp clone --depth 1 --branch n8.1 \
  https://github.com/FFmpeg/FFmpeg.git ff-replay-$(date +%F)
cd /tmp/ff-replay-$(date +%F)
git switch -c refresh-$(date +%F)
git config user.email refresh@local && git config user.name "Refresh Bot"

# Replay the series cumulatively
for p in /path/to/vmaf/ffmpeg-patches/000*-*.patch; do
  git am --3way "$p" || break
done

# Regenerate and compare to in-tree
mkdir -p /tmp/ff-regen-$(date +%F)
git format-patch n8.1.. -o /tmp/ff-regen-$(date +%F)/

# Diff old vs new excluding pure format-patch noise
for i in 1 2 3 4 5 6; do
  orig=$(ls /path/to/vmaf/ffmpeg-patches/000${i}-*.patch)
  regen=$(ls /tmp/ff-regen-$(date +%F)/000${i}-*.patch)
  diff -u \
    <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$orig") \
    <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$regen") \
    | head -40
done

If only stylistic diffs surface (PATCH N/M numbering, MIME headers, hunk-context counts, hunk offset shifts against cumulative state), keep originals — record a no-drift refresh ADR. If real content drift surfaces, regenerate and ship the refresh PR with the regenerated patches plus a content-summary ADR.
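
The noise-stripped comparison above can also be expressed in Python. This is a sketch only: `strip_noise` and `has_content_drift` are hypothetical helper names, and the filter mirrors just the three `grep -v` patterns from the shell loop, so PATCH N/M numbering or hunk-offset shifts would still show up as differences and need the same manual triage.

```python
import re

# Same three patterns the grep -v above drops: commit-hash line, date, blob index.
NOISE = re.compile(r"^(From [0-9a-f]|Date:|index )")


def strip_noise(patch_text: str) -> list:
    """Return the patch lines with pure format-patch noise removed."""
    return [line for line in patch_text.splitlines() if not NOISE.match(line)]


def has_content_drift(orig: str, regen: str) -> bool:
    """True when the two patches differ after noise stripping."""
    return strip_noise(orig) != strip_noise(regen)


# Made-up patch fragments that differ only in hash, date, and index lines.
orig_patch = "From abc123 Mon Sep 17\nDate: Mon\nindex 111..222 100644\n+int x;\n"
regen_patch = "From def456 Mon Sep 17\nDate: Tue\nindex 333..444 100644\n+int x;\n"
print(has_content_drift(orig_patch, regen_patch))
```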

End-to-end vf_libvmaf smoke is best run from CI (ffmpeg-integration.yml) against an installed libvmaf prefix — the meson-uninstalled .pc does not satisfy FFmpeg's #include <libvmaf.h> probe (the headers live under libvmaf/libvmaf.h only; the system-installed .pc carries an extra -I${includedir}/libvmaf shortcut that the uninstalled .pc omits).

0229 — T7-5 NOLINT-sweep closeout (ADR-0278)

  • Touched files:
  • core/src/feature/integer_adm.c (1 NOLINT cite, line ~988 adm_decouple_s123 — upstream-mirror Netflix 966be8d5).
  • core/src/feature/cuda/ssimulacra2_cuda.c (3 NOLINT cites: ss2c_picture_to_linear_rgb, ss2c_host_combine, ss2c_run_scale_gpu / extract_fex_cuda).
  • core/src/feature/vulkan/ssimulacra2_vulkan.c (3 NOLINT cites: ss2v_setup_gaussian, ss2v_picture_to_linear_rgb, ss2v_run_scale).
  • core/src/feature/vulkan/cambi_vulkan.c (1 NOLINT cite: cambi_vk_extract).
  • core/src/feature/sycl/integer_adm_sycl.cpp (6 cites, SYCL kernel-launch entries).
  • core/src/feature/sycl/integer_motion_sycl.cpp (2 cites).
  • core/src/feature/sycl/integer_vif_sycl.cpp (4 cites).
  • core/tools/vmaf.c (3 cites: copy_picture_data, init_gpu_backends, main).
  • Invariant: zero behavioural change. Edits are inside comment blocks — appended (ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278) to existing prose justifications. No function bodies split. The 12 SYCL sites share an identical justification string verbatim; preserving the byte-for-byte duplicate is the load-bearing documentation pattern (grep-able across the SYCL TUs).
  • On upstream sync: minimal interaction. The cite-only edits live inside comment blocks above the function signatures; rebases will surface them as touched lines but the function bodies are unchanged. For integer_adm.c's upstream-mirror block (Netflix 966be8d5), the comment edit at line 984–991 is cosmetic — keep the fork's version on conflict (it merely names the ADR; the underlying prose is unchanged).
  • Re-test on rebase:

```bash
# 1. Programmatic audit must report 0 missing citations
python3 - <<'PY'
import re, os
paths = [os.path.join(r, f)
         for r, _, fs in os.walk('libvmaf/src')
         for f in fs if f.endswith(('.c','.cpp','.h'))]
paths.append('core/tools/vmaf.c')
miss = total = 0
for p in paths:
    with open(p) as fh:
        ls = fh.readlines()
    for i, line in enumerate(ls):
        if 'NOLINT' in line and 'readability-function-size' in line and 'NOLINTEND' not in line:
            total += 1
            # Walk up to 13 lines of the comment block directly above the NOLINT site.
            ctx = [line]; j = i - 1
            while j >= 0 and j > i - 14:
                s = ls[j].strip()
                if not s: break
                if s.startswith(('//','/*','*')):
                    ctx.insert(0, ls[j]); j -= 1
                else: break
            buf = ''.join(ctx)
            if 'ADR-' not in buf and not re.search(r'[Rr]esearch-?\d', buf):
                miss += 1
print(f"sites={total} missing={miss}")
PY

# 2. Build + Netflix golden gate
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
make test-netflix-golden
```

0231 — vmaf-tune score path decodes mp4 -> raw YUV

  • Touches: tools/vmaf-tune/src/vmaftune/score.py (new _decode_to_raw_yuv + _needs_decode helpers, run_score shells out to ffmpeg when req.distorted.suffix not in {.yuv, .y4m}); tools/vmaf-tune/tests/test_corpus.py (3 new regression tests + the smoke-end-to-end mock now also stubs the ffmpeg decode call).
  • Invariant: the decode-back is the contract the libvmaf CLI imposes — mp4/webm/etc. --distorted is silently rejected as raw-yuv with the wrong byte count, surfacing as exit_status=234. Future encoder adapters that emit non-raw containers inherit this decode automatically. Do not "optimise" the temp YUV away without first migrating the corpus pipeline to the ffmpeg+libvmaf filter (which can pipe an mp4 stream in directly).
  • On upstream sync: zero interaction. vmaf-tune is fork-only tooling; upstream Netflix/vmaf has no analogue.
  • Re-test on rebase:

```bash
cd tools/vmaf-tune && python3 -m pytest tests/
# plus an end-to-end smoke (needs a real raw YUV + ffmpeg + vmaf):
./vmaf-tune corpus --source /path/to/ref.yuv --width 1920 \
    --height 1080 --pix-fmt yuv420p --framerate 25 --duration 6 \
    --encoder libx264 --preset medium --crf 23 \
    --output /tmp/smoke.jsonl --no-source-hash
# expect: vmaf_score is a real number, not NaN.
```

0232 — CUDA build pins nvcc --std c++20

  • Touches: core/src/meson.build line 686 (cuda_flags = [...]).
  • Invariant: nvcc 12.x clamps host C++ at C++17 by default; 13.x accepts up to C++20. Bumping the host stdlib past nvcc's default (any gcc >= 16, libstdc++ ships C++23 features) breaks the host-side parse in <type_traits> / <bits/utility.h>. Forcing --std c++20 on CUDA 13+ keeps the host headers parseable. Do not drop this flag without first checking the host gcc version against nvcc's default.
  • On upstream sync: zero interaction. Netflix/vmaf doesn't ship the cuda_flags list shape we use (their CUDA build is the original pre-fork pattern); a sync that touches core/src/meson.build around the is_cuda_enabled branch should keep the --std c++20 injection.
  • Re-test on rebase:
meson setup core/build-cuda -Denable_cuda=true \
    -Denable_sycl=false -Denable_vulkan=disabled
ninja -C core/build-cuda
# smoke
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
    -r .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -d .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -w 1920 -h 1080 -p 420 -b 8

0233 — CUDA motion flush_fex_cuda idempotency guard

  • Touches: core/src/feature/cuda/integer_motion_cuda.c — factored an append_if_unwritten helper and routed the two motion2 / motion3 final-frame writes through it.
  • Invariant: under T-GPU-OPT-1 (PR #312 / ADR-0242), the pending-collect inside flush_context_cuda may already have written motion2_score[s->index] / motion3_score[s->index] before flush_fex_cuda runs. Any future motion-cuda flush logic that emits the same (feature, index) pair must keep this idempotency contract or flush_context_cuda will mis-surface as "context could not be synchronized".
  • On upstream sync: the bug only exists because the fork's flush_context_cuda runs the pending-collect before the per-extractor flush. Netflix/vmaf upstream doesn't have the T-GPU-OPT-1 drain pattern, so the pre-#312 code path didn't duplicate-write. If Netflix lands a similar pattern, the fix shape mirrors what's done here.
  • Re-test on rebase:
ninja -C core/build-cuda
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 \
  --model path=model/vmaf_v0.6.1.json --threads 1 -q \
  --output /tmp/cuda.json --json
# Expect: clean run, no "cannot be overwritten" warning,
# no "problem flushing context" error.
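
The idempotency contract above can be modelled in a few lines of Python. This is an illustrative model, not the C helper: `ScoreStore` is a hypothetical stand-in for a write-once score collector, and the signature of this `append_if_unwritten` is an assumption, since the source does not show the real one.

```python
class ScoreStore:
    """Toy write-once store keyed by (feature, index)."""

    def __init__(self):
        self._scores = {}

    def is_written(self, feature, index):
        return (feature, index) in self._scores

    def append(self, feature, index, value):
        # A second write to the same slot is an error, as in a write-once collector.
        if self.is_written(feature, index):
            raise ValueError(f"{feature}[{index}] cannot be overwritten")
        self._scores[(feature, index)] = value

    def get(self, feature, index):
        return self._scores[(feature, index)]


def append_if_unwritten(store, feature, index, value):
    """Write the score only if the slot is empty; report whether a write happened."""
    if store.is_written(feature, index):
        return False
    store.append(feature, index, value)
    return True


store = ScoreStore()
append_if_unwritten(store, "motion2_score", 7, 1.5)  # pending-collect writes first
append_if_unwritten(store, "motion2_score", 7, 1.5)  # the later flush becomes a no-op
```

Routing both writers through the guard means whichever runs first wins and the second call neither raises nor overwrites.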

0234 — hw_encoder_corpus.py Phase A real-corpus runner

  • Touches: new scripts/dev/hw_encoder_corpus.py (no existing caller; opt-in tooling) and docs/development/intel-arc-vaapi-driver-priority.md. Output landing in runs/phase_a/ (stratified sample, 58 KiB) is gitignored; rerun the script to reproduce.
  • Invariant: the script's QSV path requires LIBVA_DRIVER_NAME=iHD in its environment (exported by the calling shell; the script does not set it) when targeting /dev/dri/renderD129 on a multi-card host that has NVIDIA's libva-driver-nvidia shim installed. Without that, libva picks up NVIDIA's NVDEC-VAAPI translation and the MFX session handshake fails with -9. See the companion doc for the failure mode + fix.
  • On upstream sync: zero interaction. The script lives under scripts/dev/ (fork-only); upstream Netflix/vmaf has no comparable Phase A corpus tooling.
  • Re-test on rebase:
python3 scripts/dev/hw_encoder_corpus.py \
  --vmaf-bin core/build-cuda/tools/vmaf \
  --source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
  --width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
  --encoder h264_nvenc --cq 25 \
  --out /tmp/smoke.jsonl
# Expect: 1 cell × ~150 frames, per-frame canonical-6 + vmaf,
# encoder=h264_nvenc, cq=25.

0235 — fr_regressor_v2 ENCODER_VOCAB v2 (hw codec extension)

  • Touches: ai/scripts/train_fr_regressor_v2.py, where ENCODER_VOCAB gains 6 hw-codec entries (3 NVENC + 3 QSV); ENCODER_VOCAB_VERSION bumps 1 -> 2; PRESET_ORDINAL gains 6 sub-tables for p1..p7 (NVENC) and the libx264-aligned QSV preset family.
  • Invariant: vocab order is load-bearing — index of every entry is baked into trained model graphs as a one-hot column position. New entries MUST be appended (never inserted into the middle), and the unknown sentinel MUST stay last (UNKNOWN_ENCODER_INDEX = N - 1). Bumping ENCODER_VOCAB_VERSION signals that any v1-graph ONNX needs re-export against v2 before consuming v2 training rows.
  • On upstream sync: zero interaction. train_fr_regressor_v2.py is fork-only (Phase B prereq, ADR-0237 / ADR-0272).
  • Re-test on rebase: python3 ai/scripts/train_fr_regressor_v2.py --corpus <jsonl> --epochs 200 --no-export — expect PLCC > 0.95 on a multi-codec corpus.
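
The append-only rule above can be checked mechanically. This is a minimal sketch under stated assumptions: `is_append_only` is a hypothetical helper, the sentinel is assumed to be the string "unknown", and the two vocabularies are illustrative rather than the real ENCODER_VOCAB contents.

```python
def is_append_only(old, new, sentinel="unknown"):
    """True when `new` extends `old` without reordering and keeps the sentinel last."""
    if not old or not new or old[-1] != sentinel or new[-1] != sentinel:
        return False
    old_known, new_known = old[:-1], new[:-1]
    # Every pre-existing entry must keep its index; new ones go after them.
    return new_known[:len(old_known)] == old_known and sentinel not in new_known


# Illustrative vocabularies only; the real lists live in the training script.
vocab_v1 = ("libx264", "libx265", "unknown")
vocab_v2 = ("libx264", "libx265", "h264_nvenc", "h264_qsv", "unknown")
print(is_append_only(vocab_v1, vocab_v2))
```

Because the sentinel stays at index N - 1, its position moves whenever N grows, which is consistent with the note that v1 graphs need re-export before consuming v2 rows.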

0276 — vmaf_tiny_v5 corpus-expansion probe (ADR-0287) — defer

  • What changed: research-only addition. New scripts under ai/scripts/ (fetch_youtube_ugc_subset.py, extract_ugc_features.py, train_vmaf_tiny_v5.py, eval_loso_vmaf_tiny_v5.py), new ADR docs/adr/0276-*.md, new research digest docs/research/0057-*.md, and one CHANGELOG entry. No new ONNX artefact under model/tiny/, no registry change, no public C-API / CLI / meson_options change. The probe trained an architecturally identical mlp_small on a 5-corpus parquet (4-corpus + 27 000 UGC rows); the 1-σ ship gate did not clear (Δ PLCC = +0.00005), so the exporter that the prior agent had drafted (export_vmaf_tiny_v5.py) was discarded before the commit.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI corpus-expansion surface; nothing on the upstream side touches these files.
  • On upstream sync: zero interaction. The v5 surface lives entirely under ai/scripts/ + docs/adr/ + docs/research/, all of which are fork-introduced trees. The shipped v2 model (model/tiny/vmaf_tiny_v2.onnx) and its registry row are untouched.
  • Re-test on rebase:
# No code under test on rebase — purely research artefacts.
# If revisiting the corpus expansion, the reproducer is in the
# research digest:
python3 ai/scripts/fetch_youtube_ugc_subset.py \
    --out-dir .workingdir2/ugc/download \
    --n-stems 30 \
    --manifest .workingdir2/ugc/manifest.json
python3 ai/scripts/extract_ugc_features.py \
    --manifest .workingdir2/ugc/manifest.json \
    --yuv-dir .workingdir2/ugc/yuv \
    --vmaf-bin build-cpu/tools/vmaf \
    --out-parquet runs/full_features_ugc.parquet \
    --max-height 360 --max-frames 300 --threads 8
python3 ai/scripts/eval_loso_vmaf_tiny_v5.py \
    --parquet-base  runs/full_features_4corpus.parquet \
    --parquet-extra runs/full_features_ugc.parquet \
    --out-json      runs/vmaf_tiny_v5_loso_metrics.json

0227 — vmaf-tune Intel QSV codec adapters (ADR-0281)

  • What changed: fork-local additions under tools/vmaf-tune/src/vmaftune/codec_adapters/_qsv_common.py, h264_qsv.py, hevc_qsv.py, av1_qsv.py, plus registry rows in codec_adapters/__init__.py and a new test file tools/vmaf-tune/tests/test_codec_adapter_qsv.py. Doc updates: docs/usage/vmaf-tune.md (Hardware encoders section), docs/adr/0281-vmaf-tune-qsv-adapters.md, docs/research/0066-vmaf-tune-qsv-adapters.md, tools/vmaf-tune/AGENTS.md, CHANGELOG.md.
  • Upstream source: fork-local. tools/vmaf-tune/ is fork-introduced under ADR-0237; Netflix/vmaf has no corresponding tree.
  • On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths.
  • Invariant: the registry exposes exactly four codecs (av1_qsv, h264_qsv, hevc_qsv, libx264 — alphabetical), each adapter validates its (preset, quality) pair, and the QSV preset vocabulary is the seven x264-style names (veryslow…veryfast, no ultrafast / superfast). The encode pipeline (encode.py) remains x264-CRF-tied and will be widened in a separate PR — the QSV adapters are inert until then. Future codec families that share parameter shape (NVENC, AMF) follow the same _<family>_common.py + N thin adapters pattern.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/
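
The (preset, quality) validation described above could look roughly like this. It is a sketch, not the adapter's code: the seven preset names are inferred from the x264 preset ordering between veryslow and veryfast, and `QUALITY_RANGE` as well as `validate_qsv` are assumptions introduced for illustration.

```python
# Seven x264-style names, slowest to fastest; no ultrafast / superfast.
QSV_PRESETS = ("veryslow", "slower", "slow", "medium", "fast", "faster", "veryfast")

# Assumed quality bounds; the real range is defined by the adapter.
QUALITY_RANGE = (1, 51)


def validate_qsv(preset: str, quality: int) -> None:
    """Raise ValueError when the (preset, quality) pair is outside the vocabulary."""
    if preset not in QSV_PRESETS:
        raise ValueError(f"unknown QSV preset: {preset!r}")
    low, high = QUALITY_RANGE
    if not low <= quality <= high:
        raise ValueError(f"quality {quality} outside [{low}, {high}]")


validate_qsv("medium", 23)  # accepted: returns None without raising
```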

0230 — K150K-A corpus extraction script (ADR-0362)

  • Touches: ai/scripts/extract_k150k_features.py (new fork-only file), ai/AGENTS.md (K150K invariant note appended), docs/adr/README.md (ADR-0362 index row), CHANGELOG.md, docs/rebase-notes.md.
  • Invariant: extract_k150k_features.py requires build-cpu/tools/vmaf (fork build with ssimulacra2 + motion_v2). If upstream Netflix adds these extractors to their own release binary, the --vmaf-bin default may be updated to the system binary -- but only after verifying that the metric JSON key names match the aliases in _METRIC_ALIASES. The FEATURE_NAMES tuple is column-order-locked to the parquet schema; any reorder invalidates trained checkpoints that consume the parquet.
  • Upstream interaction: none. Script is fork-only; the K150K clips and parquet are gitignored. Upstream Netflix/vmaf does not ship a K150K extractor.
  • Re-test on rebase:
python ai/scripts/extract_k150k_features.py --limit 10
# Expect: rows=10 cols=48 ok=10 fail=0

0229 — vmaf-tune libvvenc + NN-VC codec adapter (ADR-0285)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/vvenc.py (new fork-only file), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry edit, fork-only), tools/vmaf-tune/tests/test_codec_adapter_vvenc.py (new), tools/vmaf-tune/tests/test_corpus.py (relaxes the known_codecs() == ("libx264",) assertion to "libx264" in known_codecs() since the registry now spans multiple codecs).
  • Invariant: the codec-adapter registry is fork-introduced (Phase A of ADR-0237) and lives entirely outside the upstream Netflix tree, so tools/vmaf-tune/ does not touch upstream paths. The only rebase-sensitive surface is the CORPUS_ROW_KEYS schema in src/vmaftune/__init__.py (per the Phase A invariant in tools/vmaf-tune/AGENTS.md); this PR adds the adapter without changing the schema.
  • Upstream interaction: none. tools/vmaf-tune/ is not in Netflix/vmaf upstream.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/
  • Status update 2026-05-09: the original nnvc_intra toggle was removed (it emitted a fabricated IntraNN key that does not exist in any released VVenC). Replaced with a curated 9-knob real-VVenC 1.14.0 tuning surface (PerceptQPA, InternalBitDepth, Tier, Tiles, MaxParallelFrames, RPR, SAO, ALF, CCALF). Defaults preserve the bit-exact Phase A grid baseline. adapter_version bumped to "2" so cache keys invalidate. See ADR-0285 §"Status update 2026-05-09". No rebase impact: fork-local file, no upstream-tree touch.

0228 — vmaf-tune Phase D scaffold (ADR-0276)

  • Touches: tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0276-vmaf-tune-phase-d-per-shot.md.
  • Invariant: scaffold-only. The module relies on a stable predicate signature (shot, target_vmaf, encoder) -> (crf, predicted_vmaf) that Phase B's bisect (PR #347) drops into later. Shot ranges are half-open [start_frame, end_frame) even though the C-side vmaf-perShot JSON/CSV sidecar uses an inclusive end_frame — normalisation happens at the parse boundary in _parse_per_shot_json / parse_per_shot_csv. vmaf-perShot schema lives in docs/usage/vmaf-perShot.md and is fork-local (ADR-0222), so upstream cannot drift it; the only rebase risk is fork-internal renames.
  • Upstream source: entirely fork-local. tools/vmaf-tune/ is fork-introduced (ADR-0237). Netflix/vmaf upstream has no encode-automation surface.
  • On upstream sync: zero interaction expected. No file in this PR overlaps an upstream-mirrored path.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q
python tools/vmaf-tune/vmaf-tune tune-per-shot --help
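
The inclusive-to-half-open normalisation described above is a one-line conversion. This is a minimal sketch; `to_half_open` and `shot_length` are hypothetical names, not the module's parse helpers.

```python
def to_half_open(start_frame: int, end_frame_inclusive: int) -> tuple:
    """Convert a sidecar-style inclusive range to a half-open [start, end) range."""
    if end_frame_inclusive < start_frame:
        raise ValueError("end_frame precedes start_frame")
    return (start_frame, end_frame_inclusive + 1)


def shot_length(shot: tuple) -> int:
    """Frame count of a half-open shot range."""
    return shot[1] - shot[0]


first = to_half_open(0, 23)    # sidecar says frames 0..23 inclusive
second = to_half_open(24, 47)  # next shot starts right after
print(first, second, shot_length(first))
```

With half-open ranges, adjacent shots tile without overlap: the end of one shot equals the start of the next, and lengths are plain subtractions.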

0229 — vmaf-tune SVT-AV1 codec adapter (ADR-0278)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/svtav1.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions extended for the SVT-AV1 banner pattern), tools/vmaf-tune/src/vmaftune/corpus.py (optional ffmpeg_preset_token hook).
  • Invariant: PRESET_NAME_TO_INT is closed and order-stable; the integer values are baked into corpus rows that downstream fr_regressor_v2 (ADR-0235) trains on. Reordering or rewriting the table silently changes the integer SVT-AV1 receives. The codec key "libsvtav1" matches CODEC_VOCAB[2] in ai/src/vmaf_train/codec.py — keep them aligned on any rename.
  • Upstream source: fork-local. tools/vmaf-tune/ is a fork-introduced tree (see entry 0227 — Phase A scaffold). No Netflix/vmaf upstream interaction.
  • On upstream sync: zero interaction. Lives entirely under the fork-local tools/vmaf-tune/ tree.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -v

0230 — fr_regressor_v2 PROD ship (ADR-0291)

  • ADR: ADR-0291

  • Touches: model/tiny/fr_regressor_v2.onnx (binary, refreshed), model/tiny/fr_regressor_v2.json (sidecar, sha256 + metrics), model/tiny/registry.json (smoke flag flip, sha256 update), runs/phase_a/full_grid/per_frame_canonical6.jsonl (training corpus — fork-local artefact under runs/), companion docs.

  • Re-test recipe: see Research-0068 §Reproducer. Ship gate is LOSO PLCC ≥ 0.95 on the per-source folds; current run reports 0.9681 ± 0.0207.
  • Rebase invariant: the per-frame canonical-6 corpus must be rebuilt from runs/phase_a/{nvenc,qsv}_pf.jsonl (PR #392) before any retrain; do not re-train against the cell-only comprehensive.jsonl (it lacks the per-frame features and produces PLCC ≈ 0.7 — the smoke baseline).
  • No upstream interaction: fr_regressor_v2 is fork-local (ADR-0272).

0229 — vmaf-tune Phase E ladder generator (ADR-0295)

  • ADR: ADR-0295
  • Touches: entirely fork-local under tools/vmaf-tune/. New module tools/vmaf-tune/src/vmaftune/ladder.py, new test file tools/vmaf-tune/tests/test_ladder.py, two new subcommand blocks in tools/vmaf-tune/src/vmaftune/cli.py. No upstream-shared paths touched.
  • Invariant: vmaftune.ladder.convex_hull returns a strictly monotonic Pareto frontier (both bitrate and vmaf monotonically increasing); select_knees returns exactly min(n, len(hull)) rungs in ascending bitrate order; emit_manifest("hls") produces one #EXT-X-STREAM-INF per rung with monotonically-increasing BANDWIDTH= values. The default _default_sampler is intentionally NotImplementedError — production callers must inject a Phase B bisect-driven sampler. Phase B integration PR (gated on PR #347) swaps the default; the test suite continues to inject a synthetic stub.
  • Rebase impact: none — fork-local Python tool; upstream Netflix/vmaf does not ship a tools/vmaf-tune/ tree.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_ladder.py -v
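
The strict-monotonicity property above can be illustrated with a plain Pareto filter. This is a sketch under stated assumptions: `pareto_frontier` is a hypothetical function, and it only removes dominated points; it does not perform the additional pruning a true convex hull would, so it is not a drop-in for `vmaftune.ladder.convex_hull`.

```python
def pareto_frontier(points):
    """Keep (bitrate_kbps, vmaf) points where both values strictly increase."""
    frontier = []
    # Sort by bitrate ascending; on ties, the higher VMAF comes first and wins.
    for bitrate, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if not frontier or (bitrate > frontier[-1][0] and vmaf > frontier[-1][1]):
            frontier.append((bitrate, vmaf))
    return frontier


# Made-up sample points.
samples = [(1000, 80.0), (2000, 90.0), (1500, 78.0), (3000, 90.0), (4000, 95.0)]
print(pareto_frontier(samples))
```

The 1500 kbps point is dropped because a cheaper encode already scores higher, and the 3000 kbps point is dropped because it spends more bits for no VMAF gain.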

0229 — fr_regressor_v2 probabilistic head scaffold (ADR-0279)

  • Touches:
  • ai/scripts/train_fr_regressor_v2_ensemble.py (new — fork-local).
  • ai/scripts/eval_probabilistic_proxy.py (new — fork-local).
  • model/tiny/fr_regressor_v2_ensemble_v1*.onnx, fr_regressor_v2_ensemble_v1.json (new artefacts; smoke probes).
  • model/tiny/registry.json — five new kind: "fr" rows (fr_regressor_v2_ensemble_v1_seed{0..4}); existing entries untouched.
  • ai/AGENTS.md — new "fr_regressor_v2_ensemble_v1 — probabilistic head" section pinning the per-member ONNX I/O contract, manifest-as-runtime-entry-point invariant, ensemble-size pin, confidence-rule one-of, codec-vocab parity, and smoke-artefact posture.
  • docs/ai/models/fr_regressor_v2_probabilistic.md (new model card).
  • docs/research/0067-fr-regressor-v2-probabilistic.md (new audit digest).
  • docs/adr/0279-fr-regressor-v2-probabilistic.md (new ADR; Proposed). Index row appended to docs/adr/README.md.
  • CHANGELOG.md: ### Added row under "Unreleased — lusoris fork".
  • Invariant: the per-member ONNX I/O contract (two inputs: features [N, 6] standardised + codec_onehot [N, NUM_CODECS]; one output score [N]) and the manifest's confidence rule (one-of "ensemble" / "ensemble+conformal") are the C-side adapter's load-bearing contract. Per-member ensembles are stock FRRegressor(num_codecs=NUM_CODECS) calls — flipping to a v1-shaped single-input graph silently invalidates the manifest. CODEC_VOCAB parity with ai/src/vmaf_train/codec.py is required.
  • On upstream sync: zero interaction expected. Wholly fork-local; no upstream Netflix/vmaf path overlap. The ai/ package is fork-introduced (see ADR-0021, ADR-0036) — upstream has no probabilistic-regressor surface. If upstream ever ships its own fr_regressor_v2 variant, do NOT merge — register both ids side-by-side.
  • Re-test on rebase:
python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke
python ai/scripts/eval_probabilistic_proxy.py --smoke
python ai/scripts/validate_model_registry.py

0287 — vmaf-tune saliency-aware ROI tuning (ADR-0293)

  • Touches: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py (new recommend subcommand), tools/vmaf-tune/AGENTS.md (saliency invariant), docs/usage/vmaf-tune.md (saliency section).
  • Upstream source: fork-local. The vmaf-tune tree was introduced in PR #329 (ADR-0237 Phase A) and has no upstream Netflix counterpart.
  • On upstream sync: zero interaction — pure fork-local Python package under tools/vmaf-tune/.
  • Invariant: the saliency-to-QP-offset signal blend (offset = (2*sal − 1) * foreground_offset, clamped to ±12) is bit-for-bit equivalent to vmaf-roi's C-side blend (ADR-0247). tests/test_saliency.py pins the contract; if vmaf-roi's C blend changes, saliency.py follows in the same PR. The test seam contract (session_factory=…, encode_runner=…) lets the suite run without onnxruntime or ffmpeg.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/ -q
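
The blend formula in the invariant above translates directly into code. This is a sketch only: `qp_offset` is a hypothetical name, saliency is assumed to lie in [0, 1], and no claim is made that this Python float arithmetic reproduces the C-side result bit for bit.

```python
def qp_offset(sal: float, foreground_offset: float, limit: float = 12.0) -> float:
    """offset = (2*sal - 1) * foreground_offset, clamped to [-limit, +limit]."""
    raw = (2.0 * sal - 1.0) * foreground_offset
    return max(-limit, min(limit, raw))


# A fully salient pixel gets the whole foreground offset;
# a fully non-salient pixel gets the opposite sign.
print(qp_offset(1.0, -4.0), qp_offset(0.0, -4.0))
```

A saliency of 0.5 maps to a zero offset, so the blend is symmetric around the midpoint of the saliency range.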

0229 — tools/vmaf-roi-score/ Option C scaffold (ADR-0296)

  • ADR: ADR-0296
  • Touches:
  • tools/vmaf-roi-score/pyproject.toml (new)
  • tools/vmaf-roi-score/vmaf-roi-score (new console shim)
  • tools/vmaf-roi-score/src/vmafroiscore/__init__.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/cli.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/score.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/mask.py (new)
  • tools/vmaf-roi-score/tests/test_combine.py (new)
  • tools/vmaf-roi-score/README.md (new)
  • tools/vmaf-roi-score/AGENTS.md (new)
  • docs/adr/0296-vmaf-roi-saliency-weighted.md (new)
  • docs/adr/_index_fragments/0296-vmaf-roi-saliency-weighted.md (new)
  • docs/adr/_index_fragments/_order.txt — append-only.
  • docs/research/0069-vmaf-roi-saliency-weighted.md (new)
  • docs/usage/vmaf-roi-score.md (new)
  • changelog.d/added/T6-2c-vmaf-roi-score-scaffold.md (new)
  • Invariant: tools/vmaf-roi-score/ is wholly fork-local. No upstream Netflix/vmaf surface owns or interacts with this directory. The combine math is a pure linear blend on Python float; the JSON schema is pinned by ROI_RESULT_KEYS and SCHEMA_VERSION = 1. Schema bumps require an ADR-0288 supersession. Naming guard: do not confuse with core/tools/vmaf_roi.c (ADR-0247) — that's the encoder-steering binary. The scoring tool here is vmaf-roi-score; the names diverge deliberately.
  • Rebase impact: zero. Pure-Python tool under tools/; not part of the libvmaf C build, not part of any Netflix-mirrored surface.
  • Re-test on rebase:
pytest tools/vmaf-roi-score/tests

0228 — vmaf-tune compare codec-comparison mode (research-0061 Bucket #7)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/compare.py (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds the compare subparser and _run_compare router.
  • tools/vmaf-tune/tests/test_compare.py (new). Mocked predicate; no ffmpeg / vmaf binaries required.
  • tools/vmaf-tune/AGENTS.md — invariant note for the predicate seam and COMPARE_ROW_KEYS contract.
  • docs/usage/vmaf-tune.md — new "Codec comparison" section.
  • Invariant: compare.compare_codecs orchestrates per-codec ranking via an injected predicate(codec, src, target_vmaf) -> RecommendResult callable. The orchestration must not branch on codec name; new codecs land as one-file additions under codec_adapters/ and are picked up automatically by the registry. COMPARE_ROW_KEYS is the JSON / CSV column contract — same maintenance discipline as CORPUS_ROW_KEYS.
  • Rebase impact: entirely fork-local. The Phase A + Phase B recommend backend (ADR-0237) is fork-internal; upstream Netflix/vmaf has no tools/vmaf-tune/ tree.
  • Re-test on rebase:

```shell
pytest tools/vmaf-tune/tests/test_compare.py -v
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 --format markdown
```

0229 — vmaf-tune --score-backend GPU score wiring (ADR-0299)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/score_backend.py (new). Wholly fork-local — tools/vmaf-tune/ has no upstream Netflix/vmaf overlap.
  • tools/vmaf-tune/src/vmaftune/{score,corpus,cli}.py (additive kwargs, no API removals).
  • tools/vmaf-tune/tests/test_score_backend.py (new).
  • docs/usage/vmaf-tune.md (new GPU section + flag row).
  • docs/adr/0299-vmaf-tune-gpu-score.md (new).
  • docs/research/0071-vmaf-tune-gpu-score-backend.md (new).
  • Invariant: the libvmaf CLI exposes --backend NAME with values auto|cpu|cuda|sycl|vulkan exactly. Help-text parser in score_backend.parse_supported_backends pins this format. If upstream renames the flag or reformats the help line on merge, the parser silently degrades to "CPU only" — the test fixtures in test_score_backend.py will catch the format change but only if re-run.
  • Upstream source: fork-local. Netflix upstream's CLI does not ship a --backend selector (CPU-only).
  • On upstream sync: zero interaction. vmaf-tune lives entirely in fork-introduced paths and consumes only the fork's --backend flag.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v
# If the libvmaf help text reformats, parse_supported_backends
# will return {"cpu"} on test_parse_full_backend_line_yields_all_four
# and the test fails loudly.
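
The silent "CPU only" degradation described above follows from how a help-text parser typically falls back. This is a hypothetical sketch, not `score_backend.parse_supported_backends`: the regular expression, the helper name `parse_backends_sketch`, and the sample help line are all assumptions about a format the source only describes as `--backend NAME` with a `|`-separated value list.

```python
import re

KNOWN_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")

# Matches "--backend NAME ... a|b|c" on a single help line (assumed layout).
_BACKEND_LINE = re.compile(r"--backend\s+\S+.*?((?:\w+\|)+\w+)")


def parse_backends_sketch(help_text: str) -> set:
    """Return the backends advertised in the help text, or {"cpu"} if none parse."""
    for line in help_text.splitlines():
        match = _BACKEND_LINE.search(line)
        if match:
            found = {tok for tok in match.group(1).split("|") if tok in KNOWN_BACKENDS}
            if found:
                return found
    return {"cpu"}  # silent fallback: the failure mode the invariant warns about


# Made-up help line in the assumed format.
help_line = "  --backend NAME      select backend: auto|cpu|cuda|sycl|vulkan"
print(sorted(parse_backends_sketch(help_line)))
```

If the help line is reworded so the pattern no longer matches, the function still returns a valid-looking set, which is why only a fixture test makes the regression visible.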

0261 — vmaf-tune HDR-aware encode + score path (2026-05-03)

  • What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/hdr.py plus wiring into corpus.py / cli.py / score.py. Adds ffprobe-driven HDR detection, codec-specific HDR ffmpeg flag dispatch, schema-v2 corpus row keys (hdr_transfer, hdr_primaries, hdr_forced), and four --auto-hdr / --force-* CLI modes. See ADR-0300.
  • Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
  • On upstream sync: zero interaction. Upstream Netflix/vmaf ships no encode automation surface; this tree is entirely fork-local and lives outside libvmaf/ and python/.
  • Schema migration note: SCHEMA_VERSION bumped 1 → 2. The three new keys are additive — Phase B / C loaders treat missing keys as SDR for backward compat with v1 rows.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -q
python -m vmaftune.cli corpus --help  # confirm --auto-hdr surfaces
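
The backward-compatibility rule in the schema migration note can be sketched as a default-filling loader. The three key names come from the note above; `SDR_DEFAULTS`, its values, and `with_hdr_defaults` are assumptions, since the source does not say how SDR is encoded in a row.

```python
# Assumed SDR encoding; the real loaders may represent "not HDR" differently.
SDR_DEFAULTS = {"hdr_transfer": "", "hdr_primaries": "", "hdr_forced": False}


def with_hdr_defaults(row: dict) -> dict:
    """Return a copy of the row with any missing HDR keys filled in as SDR."""
    return {**SDR_DEFAULTS, **row}


v1_row = {"encoder": "libx264", "crf": 23}                             # no HDR keys
v2_row = {"encoder": "libx265", "crf": 28, "hdr_transfer": "smpte2084"}  # partial
print(with_hdr_defaults(v1_row)["hdr_forced"], with_hdr_defaults(v2_row)["hdr_transfer"])
```

Because the row's own values are merged last, a v2 row keeps whatever HDR metadata it carries and only absent keys fall back to the SDR defaults.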

0298 — vmaf-tune content-addressed cache (ADR-0298)

  • What changed: fork-local. New module tools/vmaf-tune/src/vmaftune/cache.py; cache integration in tools/vmaf-tune/src/vmaftune/corpus.py (iter_rows now consults the cache before encode/score); new CLI flags --no-cache, --cache-dir, --cache-size-gb in cli.py. Codec-adapter Protocol gains adapter_version: str; the lone Phase-A x264 adapter pins "1".
  • Upstream source: none. tools/vmaf-tune/ is fork-introduced (ADR-0237) and has no upstream counterpart.
  • On upstream sync: zero interaction with Netflix/vmaf master. The module sits entirely under tools/vmaf-tune/, which upstream does not ship.
  • Invariant for future codec adapters: every CodecAdapter must declare adapter_version: str. Bump it whenever the adapter's argv shape, preset list, or quality range changes — otherwise the cache returns stale results post-upgrade. The contract is asserted by test_cache_key_diffs_on_each_field in tests/test_cache.py.
  • Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/test_cache.py -v
```

0283 — vmaf-tune Apple VideoToolbox adapters (2026-05-05)

  • What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/codec_adapters/. New files: h264_videotoolbox.py, hevc_videotoolbox.py, _videotoolbox_common.py, plus the registry hook in __init__.py. See ADR-0283.
  • Update 2026-05-09: prores_videotoolbox.py adapter added to the same registry pattern (broadcast / prosumer ProRes intermediate). Quality knob differs — ProRes is a fixed-rate codec, so the harness's --crf slot carries the integer ProRes tier id (0=proxy → 5=xq) rather than a -q:v value. _videotoolbox_common.py extended with PRORES_PROFILE_* constants + validate_prores_videotoolbox() / prores_profile_name() helpers; profile ids verified against FFmpeg n8.1.1 libavcodec/videotoolboxenc.c. See the Status update appendix in ADR-0283.
  • Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py -q
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_prores_videotoolbox.py -q

0228 — vmaf-tune coarse-to-fine CRF search (ADR-0306)

  • What changed: fork-local tooling. Adds coarse_to_fine_search() to tools/vmaf-tune/src/vmaftune/corpus.py, plumbs new CLI flags onto vmaf-tune corpus (--coarse-to-fine, --coarse-step, --fine-radius, --fine-step, --target-vmaf), and ships a new vmaf-tune recommend subcommand. Widens tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py quality_range from (15, 40) to (0, 51). JSONL row schema unchanged (SCHEMA_VERSION=1).
  • Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation surface.
  • On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_corpus.py -k coarse_to_fine
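
For orientation, a two-pass search of this kind can be sketched as follows. This is a minimal illustration, not the body of coarse_to_fine_search(): the `score` callable, the default step sizes, and the selection rule (highest CRF that still clears the target, assuming VMAF does not increase with CRF) are all assumptions.

```python
from typing import Callable, Optional, Tuple


def coarse_to_fine(score: Callable[[int], float], target_vmaf: float,
                   crf_range: Tuple[int, int] = (0, 51), coarse_step: int = 10,
                   fine_radius: int = 5, fine_step: int = 1) -> Optional[int]:
    """Return the highest CRF whose score meets target_vmaf, or None.

    Assumes score(crf) is non-increasing in crf; that assumption is what
    makes refining only around the best coarse point safe.
    """
    lo, hi = crf_range
    # Pass 1: sparse sweep over the whole quality range.
    passing = [c for c in range(lo, hi + 1, coarse_step)
               if score(c) >= target_vmaf]
    if not passing:
        return None
    best = max(passing)
    # Pass 2: dense sweep in a window around the best coarse point.
    start = max(lo, best - fine_radius)
    stop = min(hi, best + fine_radius)
    for crf in range(start, stop + 1, fine_step):
        if crf > best and score(crf) >= target_vmaf:
            best = crf
    return best
```

With a synthetic `score = lambda crf: 100 - crf` and a target of 77, the coarse pass stops at CRF 20 and the fine pass moves the answer to 23. In a real run each `score` call is an encode plus a VMAF measurement, so results would normally be memoised.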

0314 — vmaf-tune --score-backend=vulkan (ADR-0314)

  • Touches:
      • tools/vmaf-tune/src/vmaftune/cli.py (additive argparse flag on corpus + recommend subparsers; resolves select_backend and catches BackendUnavailableError for clean exit-2).
      • tools/vmaf-tune/src/vmaftune/score.py (additive backend kwarg on build_vmaf_command and run_score; None = no flag emitted).
      • tools/vmaf-tune/src/vmaftune/corpus.py (new CorpusOptions.score_backend field, default None; forwarded into run_score).
      • tools/vmaf-tune/tests/test_score_backend.py (additive Vulkan-specific tests; pre-existing tests now pass after the backend= kwarg lands).
      • docs/adr/0314-vmaf-tune-score-backend-vulkan.md (new).
      • docs/usage/vmaf-tune.md (new "Vulkan score backend" subsection under the existing GPU-scoring section).
      • tools/vmaf-tune/AGENTS.md (invariant note: argparse choices stay in sync with libvmaf --backend vocabulary).
      • changelog.d/added/vmaf-tune-score-backend-vulkan.md (new).
  • Invariant: score_backend.ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan") is the exact set libvmaf's core/tools/cli_parse.c --backend alternation accepts. Adding a new harness-side value without the libvmaf-side wiring produces silent strict-mode failures on hosts that probe positively for it.
  • Upstream source: zero. Netflix upstream's CLI does not ship a --backend selector; both tools/vmaf-tune/ and core/src/vulkan/ are fork-introduced.
  • On upstream sync: zero interaction. No upstream-mirror file is touched.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v -k vulkan
pytest tools/vmaf-tune/tests/test_score_backend.py -v

Failures here usually indicate the libvmaf help-text format changed; score_backend.parse_supported_backends test fixtures pin the format and will fail loudly.
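
The "None = no flag emitted" rule and the closed backend vocabulary can be sketched together. The helper name `backend_args` is hypothetical; the real wiring lives in build_vmaf_command and run_score, whose full signatures are not reproduced here.

```python
from typing import List, Optional

# Mirrors the invariant above; keep in step with the libvmaf-side alternation.
ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")


def backend_args(backend: Optional[str]) -> List[str]:
    """Hypothetical helper: argv fragment contributed by the backend choice."""
    if backend is None:
        return []  # None means no flag is emitted at all
    if backend not in ALL_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {ALL_BACKENDS}"
        )
    return ["--backend", backend]
```

Rejecting unknown names at the harness boundary keeps a typo from reaching the scorer as an unrecognised flag value.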

0303 — fr_regressor_v2 ensemble prod flip (ADR-0303)

  • ADR: ADR-0303
  • Touches: entirely fork-local.
      • ai/scripts/train_fr_regressor_v2_ensemble_loso.py (new — 9-fold LOSO trainer over the five ensemble seeds; emits loso_seed{N}.json artefacts).
      • scripts/ci/ensemble_prod_gate.py (new — reads five loso_seed{N}.json files, returns exit 0 iff mean(PLCC_i) ≥ 0.95 AND max - min ≤ 0.005).
      • ai/AGENTS.md — appended "Ensemble registry invariant" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
      • docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md (new), docs/research/0075-fr-regressor-v2-ensemble-prod-flip.md (new), changelog.d/added/fr-regressor-v2-ensemble-prod-flip.md (new).
  • Rebase invariant: the production ship gate is two-part: mean_i(PLCC_i) ≥ 0.95 AND max_i(PLCC_i) - min_i(PLCC_i) ≤ 0.005 over five seeds. The variance bound is load-bearing: removing it silently allows a one-seed-wins-four-seeds-tie configuration that invalidates the ensemble's predictive-distribution semantics. Both thresholds live in scripts/ci/ensemble_prod_gate.py; do not weaken either without superseding ADR-0303.
  • Rebase invariant (registry): the five fr_regressor_v2_ensemble_v1_seed{0..4} registry rows are smoke: true on master at this commit; flipping them to false is the follow-up flip PR's job, gated on a real-corpus LOSO run + the CI gate. Do not flip seed rows during a rebase merge conflict resolution.
  • Re-test on rebase:
python3 -c "import ast; ast.parse(open('ai/scripts/train_fr_regressor_v2_ensemble_loso.py').read())"
python3 -c "import ast; ast.parse(open('scripts/ci/ensemble_prod_gate.py').read())"
python ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
python scripts/ci/ensemble_prod_gate.py --help
  • Upstream source: zero. fr_regressor_v2 and its ensemble are fork-introduced (parent ADR-0272 / ADR-0279).
  • On upstream sync: zero interaction.
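
The two-part gate reduces to a few lines. The sketch below restates the two thresholds from this entry; the function and constant names are illustrative and are not taken from scripts/ci/ensemble_prod_gate.py.

```python
from statistics import mean
from typing import Sequence

MEAN_PLCC_MIN = 0.95     # first half of the gate
PLCC_SPREAD_MAX = 0.005  # second half: max - min across seeds


def gate_passes(plcc_per_seed: Sequence[float]) -> bool:
    """True only when both the mean bound and the spread bound hold."""
    spread = max(plcc_per_seed) - min(plcc_per_seed)
    return mean(plcc_per_seed) >= MEAN_PLCC_MIN and spread <= PLCC_SPREAD_MAX


print(gate_passes([0.970, 0.971, 0.972, 0.970, 0.973]))  # tight cluster
print(gate_passes([0.990, 0.960, 0.960, 0.960, 0.960]))  # one seed runs away
```

The second call has a mean well above 0.95 but a spread of 0.03, so only the spread bound rejects it. That is the case the invariant is guarding.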

0313 — CI required-checks aggregator (2026-05-05)

  • What changed: fork-local CI policy. New .github/workflows/required-aggregator.yml: a single workflow that runs on every non-draft PR and verifies that each of the 23 named required checks reported success, skipped, or neutral (or did not appear at all, which is how a path-filter rejection shows up). The aggregator becomes the single branch-protection required check, replacing the 23-name list from ADR-0037.
  • Touches: .github/workflows/required-aggregator.yml (new), docs/adr/0313-ci-required-checks-aggregator.md (new), changelog.d/added/ci-required-checks-aggregator.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment file).
  • Upstream source: zero. Branch-protection policy is fork-only.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Manual operator step at adoption (uses PATCH, not PUT — corrected from the original ADR-0313 body which had the wrong verb):
echo '{"strict": false, "contexts": ["Required Checks Aggregator"]}' | \
  gh api -X PATCH "repos/VMAFx/vmafx/branches/master/protection/required_status_checks" --input -
  • Re-test on rebase:
# YAML lint passes
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/required-aggregator.yml'))"
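
The aggregator's pass rule, including the "absent means path-filtered" case, can be sketched as below. This is an illustration of the policy only: the function name and the check names used in the tests are hypothetical, and the workflow itself is YAML, not this code.

```python
from typing import Dict, Iterable, List

ACCEPTED = {"success", "skipped", "neutral"}


def failing_checks(required: Iterable[str],
                   reported: Dict[str, str]) -> List[str]:
    """Return required checks whose conclusion falls outside ACCEPTED.

    A required check that is missing from `reported` is treated as rejected
    by a path filter and therefore does not count as a failure.
    """
    return [name for name in required
            if name in reported and reported[name] not in ACCEPTED]
```

An empty return value means the aggregator would pass; any non-empty list names the checks that block the PR.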

0305 — encoder knob-space Pareto analysis (2026-05-05)

  • What changed: fork-local. New analysis scaffold for the 12,636-cell encoder knob sweep that backs tools/vmaf-tune/codec_adapters/* recipe defaults. New files: ai/scripts/analyze_knob_sweep.py (per-(source, codec, rc_mode) Pareto hull on (bitrate_kbps, vmaf_score), encode_time_ms tiebreaker, regression-detection check), ai/tests/test_knob_sweep_analysis.py (synthetic 20-row JSONL fixture). Methodology + scaffolded findings: see ADR-0305 + Research-0077. Companion to Research-0063.
  • Touches: none upstream-shared. Sits entirely under ai/ (fork-local since the tiny-AI training surface, ADR-0021) and docs/{adr,research}/ (fork ledger).
  • Upstream source: zero. The 12,636-cell sweep, the Pareto scaffold, and the regression-detection invariant are fork-introduced; Netflix/vmaf master ships no encoder knob-sweep tooling.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Invariant for future codec adapter PRs: per the ai/AGENTS.md knob-sweep corpus invariant (ADR-0305), recipes that regress vs the bare encoder at matched bitrate within the same (source, codec, rc_mode) slice MUST NOT ship as adapter defaults. New adapter PRs cite the per-slice hull row from reports/summary.md (or "no hull entry yet — bare default") in their PR description. The comprehensive.jsonl sweep file is generated locally and lives under runs/phase_a/full_grid/ (gitignored — never committed).
  • Re-test on rebase:
pytest ai/tests/test_knob_sweep_analysis.py -v
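
A minimal sketch of the per-slice selection, assuming "Pareto hull" means the set of non-dominated rows (lower bitrate_kbps and higher vmaf_score are both better) with encode_time_ms breaking exact ties. The real analyze_knob_sweep.py may compute a stricter convex hull; this is illustrative only.

```python
def pareto_front(rows):
    """Rows that no other row beats on both bitrate (lower) and VMAF (higher)."""
    ordered = sorted(
        rows,
        key=lambda r: (r["bitrate_kbps"], -r["vmaf_score"], r["encode_time_ms"]),
    )
    front = []
    best_vmaf = float("-inf")
    for row in ordered:
        # Walking up in bitrate, a row survives only if it improves on VMAF.
        if row["vmaf_score"] > best_vmaf:
            front.append(row)
            best_vmaf = row["vmaf_score"]
    return front
```

Because identical (bitrate, VMAF) pairs are ordered by encode time before the sweep, the faster recipe is the one that is kept.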

0302 — ENCODER_VOCAB v3 schema expansion (ADR-0302)

  • Touches: ai/scripts/train_fr_regressor_v2.py (adds an ENCODER_VOCAB_V3 parallel constant; does not modify the live ENCODER_VOCAB or ENCODER_VOCAB_VERSION).
  • Invariant: ENCODER_VOCAB is append-only and order-stable (per ADR-0235). The v3 scaffold preserves the v2 slot ordering verbatim — slots 0..12 are bit-identical to the v2 vocab; slots 13/14/15 append libsvtav1, h264_videotoolbox, hevc_videotoolbox. The live ENCODER_VOCAB_VERSION = 2 remains the source of truth until the follow-up retrain PR clears the LOSO PLCC ship gate.
  • Upstream interaction: zero. ai/scripts/train_fr_regressor_v2.py is fork-introduced (ADR-0272) and has no upstream counterpart.
  • Re-test on rebase:
python3 -c "
import importlib.util, pathlib
spec = importlib.util.spec_from_file_location(
    't', pathlib.Path('ai/scripts/train_fr_regressor_v2.py')
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
assert len(m.ENCODER_VOCAB_V3) == 16
assert m.ENCODER_VOCAB_VERSION == 2
print('OK')
"

0304 — vmaf-tune fast-path prod wiring (ADR-0304)

  • Touches: tools/vmaf-tune/src/vmaftune/fast.py (replaces the ADR-0276 scaffold's NotImplementedError paths with concrete Optuna TPE + v2 proxy + GPU verify wiring); new module tools/vmaf-tune/src/vmaftune/proxy.py (centralised seam for fr_regressor_v2 ONNX inference); expanded tools/vmaf-tune/tests/test_fast.py. Doc-side: ADR-0304, Research-0076, tools/vmaf-tune/AGENTS.md invariant note.
  • Upstream source: zero. tools/vmaf-tune/ and model/tiny/fr_regressor_v2.onnx are both fork-introduced (ADR-0237 / ADR-0352).
  • Invariant: the production proxy is always fr_regressor_v2 (no smoke models in the production path) and a single GPU verify pass at recommend-end is mandatory — proxy alone never wins. The vmaftune.proxy.run_proxy helper is the single seam every fast-path consumer goes through; future probabilistic-head / ensemble migrations land in that one module. ENCODER_VOCAB v2 one-hot ordering is frozen by ADR-0352 and pinned in proxy.ENCODER_VOCAB_V2 — keep in sync with ai/scripts/train_fr_regressor_v2.py; drift raises ProxyError at inference time before bad predictions ship.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_fast.py -v

0307 — vmaf-tune ladder default sampler wiring (ADR-0307)

  • What changed: fork-local tooling. tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler no longer raises NotImplementedError; it composes corpus.iter_rows (Phase A encode + score) with recommend.pick_target_vmaf (smallest CRF clearing target VMAF) over DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38) at the adapter's mid-range preset. Module-level docstring + AGENTS.md invariant updated. New tests in tools/vmaf-tune/tests/test_ladder.py stub iter_rows via monkeypatch.setattr so no live ffmpeg / vmaf binaries are needed.
  • Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation / ladder surface.
  • On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
  • Rebase invariant: the 5-point sweep (18, 23, 28, 33, 38) is the load-bearing default; downstream Phase E callers size their wall-time budget against five encodes per (resolution, target_vmaf) cell. Do not widen / narrow it without an ADR-0307 follow-up. The SamplerFn seam stays open — callers needing finer grids pass an explicit sampler=.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_ladder.py -v

0309 — fr_regressor_v2 ensemble real-corpus retrain harness (ADR-0309)

  • ADR: ADR-0309
  • Touches: entirely fork-local.
      • ai/scripts/run_ensemble_v2_real_corpus_loso.sh (new — Bash wrapper that loops the five seeds over the existing train_fr_regressor_v2_ensemble_loso.py against .workingdir2/netflix/).
      • ai/scripts/validate_ensemble_seeds.py (new — calls the ADR-0303 gate and writes PROMOTE.json / HOLD.json with a corpus sha256 snapshot).
      • ai/tests/test_validate_ensemble_seeds.py (new — 7 tests, synthetic JSON fixtures for both verdict paths).
      • ai/AGENTS.md — appended "Registry-flip is a separate PR (ADR-0309)" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
      • docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md, docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (all new).
  • Rebase invariant: the harness is decoupled from the registry mutation. Neither the wrapper nor the validator touches model/tiny/registry.json; the registry flip is a separate follow-up PR gated on a passing PROMOTE.json. Auto-flipping on PROMOTE was rejected in ADR-0309's alternatives matrix specifically because rebase-time mutation of shipped registry rows is the foot-gun this invariant exists to prevent.
  • Re-test on rebase:
python -m pytest ai/tests/test_validate_ensemble_seeds.py -v
python ai/scripts/validate_ensemble_seeds.py --help
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
  • Upstream source: zero.
  • On upstream sync: zero interaction.

0310 — BVI-DVC corpus ingestion for fr_regressor_v2 (ADR-0310)

  • Touches: ai/scripts/bvi_dvc_to_corpus_jsonl.py (new fork-only adapter), ai/scripts/merge_corpora.py (new fork-only shard merger), ai/tests/test_merge_corpora.py (new), docs/ai/bvi-dvc-corpus-ingestion.md (new), docs/adr/0310-bvi-dvc-corpus-ingestion.md (new), docs/research/0082-bvi-dvc-corpus-feasibility.md (new), ai/AGENTS.md (BVI-DVC invariant note).
  • Invariant: the BVI-DVC archive and any extracted artefacts (parquet, cached libvmaf JSON, JSONL corpus shard) are research-only and stay local — only derived fr_regressor_v2_*.onnx weights ship. The merge utility validates every row against the canonical vmaftune.CORPUS_ROW_KEYS tuple; the schema is the merge contract. Re-shape here is a pure transform on the cached libvmaf JSON; no ffmpeg / vmaf binary is invoked. The (src_sha256, encoder, preset, crf) natural key is load-bearing for de-duplication across mirrors and re-encodes.
  • Upstream interaction: none. ai/ is fork-introduced; BVI-DVC is not part of Netflix/vmaf upstream.
  • Re-test on rebase:
python -m pytest ai/tests/test_merge_corpora.py -v
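
De-duplication on the natural key can be sketched as below. The first-row-wins policy and the function name are assumptions; the sketch also checks only the four key fields, not the full vmaftune.CORPUS_ROW_KEYS tuple that the real merger validates.

```python
NATURAL_KEY = ("src_sha256", "encoder", "preset", "crf")


def merge_shards(shards):
    """Merge lists of row dicts, keeping the first row seen per natural key."""
    merged = {}
    for shard in shards:
        for row in shard:
            missing = [k for k in NATURAL_KEY if k not in row]
            if missing:
                raise KeyError(f"row is missing key fields: {missing}")
            # setdefault leaves an existing entry untouched: first one wins.
            merged.setdefault(tuple(row[k] for k in NATURAL_KEY), row)
    return list(merged.values())
```

Keying on src_sha256 instead of a file path is what lets the same clip fetched from two mirrors collapse into one row.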

ADR-0312 — ffmpeg-patches/ vmaf-tune integration (2026-05-05)

  • Files: ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch, ffmpeg-patches/0008-add-libvmaf_tune-filter.patch, ffmpeg-patches/0009-pass-autotune-cli-glue.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
  • Rebase invariant: patches 0007–0009 plug into the cumulative state after patches 0001–0006 apply against pristine n8.1. Per-patch git apply --check in isolation is the wrong gate; use the series-replay command in CLAUDE.md §12 r14 instead.
  • vmaf-tune patch invariant: the qpfile parser at libavcodec/qpfile_parser.{c,h} is shared across all three encoder adapters in patch 0007. Future encoders that grow a -qpfile AVOption inherit it; do not fork the parser. When tools/vmaf-tune/src/vmaftune/saliency.py's qpfile output format changes (new column, different frame-type alphabet, …), patch 0007 must change in the same PR (CLAUDE.md §12 r14).
  • vf_libvmaf_tune full-scoring promotion (2026-05-06): patch 0008 originally shipped as a scaffold (linear CRF↔VMAF interpolation, no libvmaf scoring) per ADR-0312's deferred-alternatives column. The filter now mirrors vf_libvmaf.c's CPU framesync pipeline end-to-end (vmaf_init + vmaf_model_load + vmaf_use_features_from_model in init(); per-frame vmaf_picture_alloc + memcpy + vmaf_read_pictures; flush + vmaf_score_pooled(MEAN) in uninit()). The CRF recommendation remains a piece-wise linear projection from the observed VMAF; per-clip Optuna TPE search stays in tools/vmaf-tune/src/vmaftune/recommend.py. Rebase-side: the new filter still depends only on libvmaf's CPU C-API (vmaf_init, vmaf_model_load, vmaf_use_features_from_model, vmaf_read_pictures, vmaf_score_pooled, vmaf_close, vmaf_picture_alloc/unref); zero new symbols beyond what vf_libvmaf.c already requires, so future libvmaf rebases that pass the existing libvmaf filter pass this one too. ADR-0312 sub-decision retired.
  • n7+ API migration (2026-05-06): patch 0008 originally referenced the removed AVFilterLink::frame_rate member directly (n6-era API); in n7+ that field moved off AVFilterLink onto a new FilterLink struct accessed via ff_filter_link(AVFilterLink *) from libavfilter/filters.h. Patch 0008 now uses ff_filter_link(outlink)->frame_rate = ff_filter_link(mainlink)->frame_rate; in config_output(), mirroring patches 0005/0006 which were already written against the post-n7 API. The bug slipped through CI because the FFmpeg-Vulkan lane only builds vf_libvmaf.o, not vf_libvmaf_tune.c; the full SYCL lane catches it now that PR #415 added ffmpeg-patches/** to the integration workflow's path filter. Discovery: PR #415 / ADR-0317.
  • Upstream source: zero. The vmaf-tune integration is fork-introduced; pure upstream syncs are unaffected.
  • On upstream sync: zero interaction with libvmaf master. FFmpeg-side rebases (when an n8.1 → n8.x bump lands in the FFMPEG_SHA of ffmpeg-patches/test/build-and-run.sh) are tracked separately under each refresh ADR (e.g., ADR-0277 for the 2026-05-04 refresh).
  • Re-test on rebase:
git -C /path/to/ffmpeg-8 reset --hard n8.1
for p in ffmpeg-patches/000*-*.patch; do
    git -C /path/to/ffmpeg-8 am --3way "$p" || break
done
# Build smoke (libvmaf-disabled — patches 0001–0006 skipped if libvmaf_dnn
# is not built). With libvmaf_dnn available:
cd /path/to/ffmpeg-8 && ./configure --enable-libvmaf --enable-libx264 --enable-libsvtav1 --enable-libaom --enable-gpl
make -j$(nproc) ffmpeg
./ffmpeg -hide_banner -h encoder=libx264 2>&1 | grep -i qpfile
  • 2026-05-06 update — patch 0007 SVT-AV1 ROI bridge promoted from scaffold to full impl: the libsvtav1 hunk now sets enc_params.enable_roi_map = true, builds one SvtAv1RoiMapEvt per qpfile frame upfront in eb_enc_init (per-MB qp_offsets averaged into per-64×64-SB b64_seg_map of up to 8 segment QPs; uniform binning when the value span exceeds the segment budget), and attaches each event as a ROI_MAP_EVENT priv-data node from eb_send_frame() with node->size = sizeof(SvtAv1RoiMapEvt*) (the validation contract enforced by SVT-AV1's resource_coordination_process.c). Lifetime invariant: events + maps live for the entire encode session because SVT-AV1 reads ROI_MAP_EVENT data via shallow-copied pointers on async pipeline threads (per enc_handle.c::copy_private_data_list); eb_enc_close frees them. Wiring is gated on SVT_AV1_CHECK_VERSION(1, 6, 0); older SVT-AV1 builds keep the log-and-continue fallback. libaom remains scaffold-only — its AOME_SET_ROI_MAP bridge stays a separate follow-up. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
  • 2026-05-06 update — patch 0007 libaom-av1 ROI bridge promoted from scaffold to full impl: the libaom-av1 hunk now caches the parsed VmafTuneQpFile in AOMContext, allocates a segment-id map at libaom's mode-info grid (ALIGN_POWER_OF_TWO(dim, 8) >> 2, since av1/common/enums.h::MI_SIZE == 4), and on every encoded frame picks up to 8 segment QPs from the per-frame qp_offset value range (uniform linear binning when the span exceeds AOM_MAX_SEGMENTS == 8), paints the per-mi segment map by expanding each per-16×16-MB qp_offset into a 4×4 block of mi cells, and issues aom_codec_control(&ctx->encoder, AOME_SET_ROI_MAP, &roi_map). Lifetime invariant: libaom deep-copies the segment map and delta_q[] table on every control call (per av1/encoder/encoder.c::av1_set_roi_map memcpy), so a single buffer is reused across frames and freed in aom_free(). The qpfile is also freed there. Trade-off: the 8-segment cap rounds nearby qp_offsets together when the saliency model emits more than 8 distinct values per frame; finer granularity requires vmaf-tune corpus instead. This retires the libaom-av1 deferral noted under ADR-0312 — both AV1 encoder hooks (libsvtav1 and libaom-av1) are now full-impl. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
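
The 8-segment trade-off described in the two updates above can be illustrated in Python. This is a sketch of the binning idea only, not a port of the C hunks: it assumes binning is triggered when a frame has more than 8 distinct qp_offset values, and it represents each bin by its rounded midpoint. Both choices are assumptions.

```python
MAX_SEGMENTS = 8  # segment budget shared by both AV1 encoder bridges


def bin_qp_offsets(offsets):
    """Map per-block qp offsets to (segment_qps, per_block_segment_ids)."""
    distinct = sorted(set(offsets))
    if len(distinct) <= MAX_SEGMENTS:
        # Few enough values: one segment per distinct offset, nothing is lost.
        index = {value: i for i, value in enumerate(distinct)}
        return distinct, [index[value] for value in offsets]
    lo, hi = distinct[0], distinct[-1]
    width = (hi - lo) / MAX_SEGMENTS
    # Uniform linear binning; the top edge is clamped into the last bin.
    ids = [min(int((value - lo) / width), MAX_SEGMENTS - 1) for value in offsets]
    qps = [round(lo + (i + 0.5) * width) for i in range(MAX_SEGMENTS)]
    return qps, ids
```

With 16 distinct offsets in a frame, neighbouring values land in the same bin and share one segment QP, which is the rounding effect noted in the libaom trade-off.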

0315 — Vendor-neutral VVC encode strategy (ADR-0315 / Research-0085)

  • ADR: ADR-0315
  • Digest: Research-0085
  • Touches: docs-only.
      • docs/research/0085-vendor-neutral-vvc-encode-landscape.md (new).
      • docs/adr/0315-vendor-neutral-vvc-encode-strategy.md (new).
      • docs/adr/_index_fragments/0315-vendor-neutral-vvc-encode-strategy.md (new).
      • docs/adr/_index_fragments/_order.txt (one-line append).
      • changelog.d/added/research-0085-vendor-neutral-vvc-encode.md (new).
      • docs/rebase-notes.md (this entry).
  • Rebase invariant: none. The research digest and ADR are pure surveys with no code dependencies; nothing in the fork's source tree references them in a way that breaks on upstream rebase.
  • Upstream source: zero. VVC encode strategy is a fork-local decision; upstream Netflix/vmaf has no codec adapter or encode-automation surface.
  • On upstream sync: zero interaction. Pure docs.
  • Re-test on rebase:
mkdocs build --strict 2>&1 | grep -E "(WARNING|ERROR)" || echo "docs build clean"
  • 2026-05-06 follow-up (Research-0085 verification pass):
      • docs/research/0085-vendor-neutral-vvc-encode-landscape.md flipped from Status: SKELETON to Status: Active. Most [UNVERIFIED] claims are now backed by primary-source URLs (NVIDIA SDK 13.0 docs, AMD AMF GitHub, Intel oneVPL GitHub + mfxstructures.h + CHANGELOG.md, Khronos registry, Phoronix Mesa/RADV coverage, VVenC issue tracker, ZLUDA repo).
      • ADR-0315's ## Context and ## Alternatives considered refreshed with the verified data points. Status stays Proposed.
      • [UNVERIFIED] count in the digest dropped 25 → 10; remaining items are legitimate gaps (NN-VC quality lift, vvenc per-kernel profile, HHI's non-public roadmap).
      • No code touched. No rebase impact beyond the existing docs-only posture.

0316 — cli_parse.c error() long-only-option fix (ADR-0316)

  • ADR: ADR-0316 (follow-up to ADR-0311).
  • Digest: none — bug-fix; fix shape fits in the ADR/commit body.
  • Touches:
      • core/tools/cli_parse.c (3 lines — call-site arg change at the ARG_THREADS / ARG_SUBSAMPLE / ARG_CPUMASK handlers).
      • core/test/fuzz/fuzz_cli_parse.c (removed known_assert_in_input early-reject filter).
      • core/test/fuzz/cli_parse_corpus/cli_threads_abbrev_assert.argv (promoted from cli_parse_known_crashes/).
      • core/test/test_cli_parse_long_only_args.c (new fork()-based regression test).
      • core/test/meson.build (new test wiring, gated off Windows alongside test_y4m_411_oob).
      • core/tools/AGENTS.md (added a long-only-options invariant note next to the existing cli_parse.c rules).
  • Rebase invariant: load-bearing. cli_parse.c is upstream-mirror with fork additions; the three handlers carry the fork-local shape of passing the ARG_* enum value (not 't' / 's' / 'c') to parse_unsigned(). If an upstream sync re-introduces the original short-option char shape, the assert returns and the parked-then-promoted reproducer (cli_parse_corpus/cli_threads_abbrev_assert.argv) will surface it in the next nightly fuzz run.
  • Upstream source: the bug shape exists in Netflix/vmaf master too (long-only options were added upstream with the same short-option-char placeholder). When the fork ports an upstream fix that overlaps these handlers, prefer the parse_unsigned(optarg, ARG_*, argv[0]) form already on the fork.
  • On upstream sync: re-apply the three-line change in cli_parse.c if upstream resets the call-site args. The unit test is fork-local and stays.
  • Re-test on rebase:
meson setup core/build libvmaf -Denable_tests=true \
    -Denable_cuda=false -Denable_sycl=false
ninja -C core/build test/test_cli_parse_long_only_args
meson test -C core/build test_cli_parse_long_only_args -v

ADR-0317 — CI flake fix: doc-only PR path-filter (2026-05-06)

  • Touched files:
      • .github/workflows/docker-image.yml — added paths: filter on both push: and pull_request: triggers.
      • .github/workflows/ffmpeg-integration.yml — added paths: filter on both push: and pull_request: triggers (covers all four matrix lanes: gcc, clang, SYCL, Vulkan).
      • docs/adr/0317-ci-doc-only-pr-flake-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ci-doc-only-pr-flakes.md.
  • Rebase invariant: not load-bearing. Workflow-only change. Both files are fork-local CI; upstream Netflix/vmaf does not ship a Docker workflow or an FFmpeg-integration matrix in this shape, so rebase conflicts are unlikely. If a future upstream sync introduces an overlapping docker-image.yml or FFmpeg matrix, prefer the fork's path-filtered form — the rationale (ADR-0313 aggregator posture, doc-only-PR runner-time burn) is fork-specific.
  • Upstream source: none — fork-local CI workflows.
  • On upstream sync: no action required. If reviewers later add new build inputs (e.g. a top-level docker-compose.yml, a new ffmpeg-patches/*.txt config file), extend the paths: lists in the same PR that adds the input.
  • Follow-up not in this ADR: patch ffmpeg-patches/0008-add-libvmaf_tune-filter.patch line 256 (outlink->frame_rate = mainlink->frame_rate;) needs to migrate to the ff_filter_link() accessor introduced in FFmpeg n7+, matching the pattern already in patches 0005 / 0006. Tracked separately; the path-filter does not hide it (any libvmaf/ or ffmpeg-patches/ PR will still trip the SYCL lane).
  • Re-test on rebase:
python3 -c "import yaml; \
  yaml.safe_load(open('.github/workflows/docker-image.yml')); \
  yaml.safe_load(open('.github/workflows/ffmpeg-integration.yml')); \
  print('OK')"

0319 — fr_regressor_v2 ensemble LOSO trainer — real loader + per-fold training (ADR-0319)

  • Touches: ai/scripts/train_fr_regressor_v2_ensemble_loso.py (real _load_corpus + _train_one_seed bodies), ai/scripts/run_ensemble_v2_real_corpus_loso.sh (wrapper argv fix), docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (Step 0 corpus-generation section), ai/AGENTS.md (canonical-6 schema invariant note), ai/tests/test_train_fr_regressor_v2_ensemble_loso_*.py (loader + train schema tests). Closes the deferrals tracked in rebase-notes §0303 + §0309.
  • Upstream source: none — fork-local ML training infrastructure. Netflix/vmaf upstream has no fr_regressor_v2 surface, no LOSO trainer, and no canonical-6 corpus tooling.
  • Invariant: the trainer's _load_corpus accepts the canonical-6 JSONL schema emitted by scripts/dev/hw_encoder_corpus.py bit-for-bit — required keys per row are (src, encoder, cq, frame_index, vmaf, adm2, vif_scale0..3, motion2). Codec block layout is 12-slot ENCODER_VOCAB v2 one-hot + constant preset_norm = 0.5 + crf_norm = (cq - cq_min) / (cq_max - cq_min). Schema changes require an ENCODER_VOCAB_VERSION bump and full ensemble retrain per the existing closed-vocabulary rule (ADR-0235 / ADR-0352). Fold-level StandardScaler is fit on the training rows only; leaking the held-out source's distribution into the scaler would silently inflate per-fold PLCC.
  • On upstream sync: no action required. If upstream Netflix/vmaf ever adds a competing LOSO trainer under python/vmaf/, do NOT merge them — keep the fork's training stack under ai/ per the AGENTS.md scope rule.
  • Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v2_ensemble_loso_loader.py \
       ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py -v
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
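
The codec-block layout and the train-only scaler rule can be sketched with the standard library. The function names are illustrative, the three-entry vocabulary used in the tests is a placeholder for the real ENCODER_VOCAB, and the scaler is a hand-rolled stand-in for StandardScaler (population standard deviation, unit scale for constant features).

```python
from statistics import mean, pstdev


def codec_block(encoder, cq, vocab, cq_min, cq_max, preset_norm=0.5):
    """One-hot encoder slot, constant preset_norm, then min-max scaled cq."""
    if encoder not in vocab:
        raise ValueError(f"{encoder!r} is not in the closed vocabulary")
    one_hot = [1.0 if name == encoder else 0.0 for name in vocab]
    crf_norm = (cq - cq_min) / (cq_max - cq_min)
    return one_hot + [preset_norm, crf_norm]


def fit_scaler(train_values):
    """Mean and std from the training fold only; held-out rows never enter."""
    mu, sigma = mean(train_values), pstdev(train_values)
    return mu, (sigma if sigma > 0 else 1.0)


def transform(values, mu, sigma):
    """Apply training-fold statistics to any fold, including the held-out one."""
    return [(v - mu) / sigma for v in values]
```

In a LOSO loop, fit_scaler sees only the training sources of the fold, and the resulting (mu, sigma) pair is then applied unchanged to the held-out source.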

ADR-0323 — fr_regressor_v3 train + register on ENCODER_VOCAB v3 (2026-05-06)

  • Scope: ai/scripts/train_fr_regressor_v3.py (new), ai/tests/test_train_fr_regressor_v3.py (new), model/tiny/fr_regressor_v3.onnx (new, real-weight checkpoint from a 9-fold LOSO gate-pass at mean PLCC 0.9975), model/tiny/fr_regressor_v3.json (new sidecar with encoder_vocab_version: 3 and full per-fold trace), model/tiny/registry.json (new fr_regressor_v3 row, smoke: false), ai/AGENTS.md (v3 retrain invariant section gains a "Status" subsection recording the gate result), docs/ai/models/fr_regressor_v3.md (new model card), docs/adr/0323-fr-regressor-v3-train-and-register.md + index row, changelog.d/added/fr-regressor-v3-train-register.md.
  • Rebase impact: zero. Fork-local feature; no upstream Netflix/vmaf surface is touched. The 16-slot ENCODER_VOCAB_V3 imported from train_fr_regressor_v2.py was already landed by PR #401 (ADR-0302).
  • On upstream sync: no action required. The v3 model ships alongside v2 — fr_regressor_v2.onnx and its sidecar are unchanged; the v3 row is appended to the registry and sorted alphabetically. If a future upstream sync ever lands a competing fr_regressor_v3 model under python/vmaf/, do NOT cross-link them — the fork's training stack lives under ai/.
  • Watch out for: the live ENCODER_VOCAB_VERSION in ai/scripts/train_fr_regressor_v2.py stays at 2 (per ADR-0302's invariant). Do not bump it to 3 in this PR or in any downstream port; the in-place promotion of v3 over v2 is a separate "promote v3 to authoritative" PR per ADR-0302's production-flip checklist.
  • Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v3.py -v
bash core/test/dnn/test_registry.sh   # must report OK: 20+
python -c "import onnx; onnx.checker.check_model(onnx.load('model/tiny/fr_regressor_v3.onnx')); print('OK')"

ADR-0321 — fr_regressor_v2_ensemble_v1 full production flip (2026-05-06)

  • Scope: ai/scripts/export_ensemble_v2_seeds.py (new), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.onnx (real full-corpus-trained weights replacing the 3025-byte synthetic scaffold bytes), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.json (new per-seed sidecars), model/tiny/registry.json (sha256 + smoke: false on the five seed rows), ai/AGENTS.md (new invariant: the registry-flip is now done; future re-flips require a fresh PROMOTE.json + re-run of the export driver).
  • Rebase impact: zero. This is a fork-local production-flip; no upstream Netflix/vmaf surface is touched. The 12-slot ENCODER_VOCAB v2 carried in each sidecar is the same one the LOSO trainer (ADR-0319) bakes into the codec-block layout, so there is no rebase-time vocabulary drift to worry about.
  • Watch out for: if a future upstream sync ever introduces a competing fr_regressor_v2_ensemble_* model under python/vmaf/, do NOT cross-link them — the fork's ensemble weights are gated on runs/ensemble_v2_real/PROMOTE.json and are not portable to a different training stack.
  • Re-test on rebase:
bash core/test/dnn/test_registry.sh   # must report OK: 19
python -c "import onnx; \
  [onnx.checker.check_model(onnx.load(f'model/tiny/fr_regressor_v2_ensemble_v1_seed{i}.onnx')) \
   for i in range(5)]; print('OK')"

ADR-0324 — Ensemble training kit (2026-05-06)

  • Touches: tools/ensemble-training-kit/ (new), docs/adr/0324-ensemble-training-kit.md (new), docs/adr/README.md (index row), changelog.d/added/0324-ensemble-training-kit.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the kit assumes the LOSO wrapper hard-codes seeds (0 1 2 3 4). The orchestrator surfaces a warning if --seeds deviates but still hands off to the wrapper. If a future PR parameterises the wrapper's seed list, update both the wrapper and the kit's pass-through logic in lockstep.
  • On upstream sync: no action required. The kit lives entirely under tools/ensemble-training-kit/ (a fork-local path) and only invokes other fork-local scripts (ai/scripts/, scripts/dev/, scripts/ci/).
  • Re-test on rebase:
bash -n tools/ensemble-training-kit/*.sh
bash tools/ensemble-training-kit/make-distribution-tarball.sh /tmp/kit-test.tar.gz
tar -tzf /tmp/kit-test.tar.gz | grep -q "tools/ensemble-training-kit/run-full-pipeline.sh"

ADR-0335 — Hardware-capability priors (2026-05-08)

  • Touches: ai/data/hardware_caps.csv (new), ai/scripts/hardware_caps_loader.py (new), ai/tests/test_hardware_caps.py (new), ai/AGENTS.md (one new bullet under "Rebase-sensitive invariants"), docs/ai/hardware-capability-priors.md (new), docs/research/0088-hardware-capability-priors-2026-05-08.md (new), docs/adr/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/_order.txt (one-line append), CHANGELOG.md (Added bullet under [Unreleased] — lusoris fork). No upstream-shared paths.
  • Invariant: the table is prior-only. The schema check in hardware_caps_loader.py rejects benchmark-shaped header columns (fps_*, throughput, mbps, latency, watts, tdp, score_*, vmaf_*), community-wiki source URLs (wikipedia.org, wikichip.org), empty fields, and rows with encoding_blocks=0. Adding throughput / quality columns is forbidden — that pathology was the contributor-pack digest's category-1 NO-GO finding. Schema extensions need a new ADR, not a silent column bump. The cap_vector_for() return-dict shape is load-bearing: trainers / corpus writers consume hwcap_* columns by name; reordering or renaming silently breaks downstream parquet schemas.
  • On upstream sync: no action required. The whole surface lives under ai/ and docs/ — Netflix upstream has no equivalent.
  • Re-test on rebase:
python -m pytest ai/tests/test_hardware_caps.py -v   # must report 23 passed
python ai/scripts/hardware_caps_loader.py            # JSON dump, 6+ rows
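
The prior-only rule above can be illustrated with a small validator. This is a minimal sketch, not the code in hardware_caps_loader.py: the function names, the constants, and the `source_url` column name are assumptions made for illustration. Only the rejected patterns, the wiki hosts, and the `encoding_blocks` column come from the invariant text.

```python
import fnmatch

# Patterns transcribed from the invariant above. The constant and function
# names are illustrative and are not taken from hardware_caps_loader.py.
FORBIDDEN_COLUMN_PATTERNS = (
    "fps_*", "throughput", "mbps", "latency",
    "watts", "tdp", "score_*", "vmaf_*",
)
FORBIDDEN_SOURCE_HOSTS = ("wikipedia.org", "wikichip.org")


def check_header(columns):
    """Raise ValueError if any column name looks like a benchmark result."""
    for column in columns:
        for pattern in FORBIDDEN_COLUMN_PATTERNS:
            if fnmatch.fnmatchcase(column.lower(), pattern):
                raise ValueError(f"benchmark-shaped column rejected: {column}")


def check_row(row, source_column="source_url"):
    """Validate one CSV row given as a dict of column name -> string.

    `source_column` is an assumed name for the column holding the citation URL.
    """
    for key, value in row.items():
        if value is None or not value.strip():
            raise ValueError(f"empty field: {key}")
    if any(host in row[source_column] for host in FORBIDDEN_SOURCE_HOSTS):
        raise ValueError(f"community-wiki source rejected: {row[source_column]}")
    if int(row["encoding_blocks"]) == 0:
        raise ValueError("encoding_blocks must be non-zero")
```

Rejecting by pattern at load time, rather than by an allow-list of known columns, keeps the sketch short, but an allow-list would be the stricter choice if the schema is meant to change only through a new ADR.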

ADR-0332 — External-competitor benchmark harness (2026-05-08)

  • Touches: tools/external-bench/ (new), docs/adr/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated), changelog.d/added/external-bench-harness.md (new), docs/research/0087-external-bench-competitor-survey-2026-05-08.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the harness is wrapper-only — never vendor or link x264-pVMAF (GPL-2.0) into this fork. Future competitors follow the same pattern (tools/external-bench/<competitor>/run.sh invokes a user-installed binary via env var; output schema-shimmed into the canonical JSON shape). The output schema (frames[].{frame_idx, predicted_vmaf_or_mos, runtime_ms} + summary.{competitor, plcc, srocc, rmse, runtime_total_ms, params, gflops}) is the contract between every wrapper and compare.py. run_wrapper's runner parameter MUST stay resolved at call time (not via default-arg binding) so monkeypatch-based tests work.
  • On upstream sync: no action required. The harness lives entirely under tools/external-bench/ (a fork-local path) and never touches Netflix-shared code.
  • Re-test on rebase:
python3 -m pytest tools/external-bench/tests/ -q   # must report 7 passed
bash -n tools/external-bench/*/run.sh

0327 — Conformal-VQA prediction surface for vmaf-tune (ADR-0279)

  • Touches: tools/vmaf-tune/src/vmaftune/conformal.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (Predictor.predict_vmaf_with_uncertainty), tools/vmaf-tune/src/vmaftune/cli.py (predict subcommand gains --with-uncertainty / --calibration-sidecar / --alpha), tools/vmaf-tune/tests/test_conformal.py (new), docs/ai/conformal-vqa.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the conformal wrapper sits outside the ONNX graph and adds no new runtime dependency — conformal.py imports only the standard library (math, statistics, dataclasses, json, warnings). Future calibration-sidecar shapes use the method discriminator string for versioning; do not rename "split-conformal" / "cv-plus" without bumping the loader. The Predictor.predict_vmaf_with_uncertainty signature is the Python-API contract consumed by vmaf-tune predict --with-uncertainty; renaming or reordering its keyword args breaks the CLI in lockstep.
  • On upstream sync: no action required. vmaf-tune is a fork-local tool; upstream Netflix/vmaf has no per-shot prediction surface.
  • Re-test on rebase:
python3 -m pytest tools/vmaf-tune/tests/test_conformal.py -q
python3 -m pytest tools/vmaf-tune/tests/test_predictor.py -q
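
For orientation, split-conformal intervals need nothing beyond the standard library, which is consistent with the no-new-dependency invariant. The following is a minimal sketch of the textbook split-conformal recipe, not the contents of conformal.py. The function names are assumptions, and the real module may clamp, version, or serialise its results differently.

```python
import math


def split_conformal_quantile(residuals, alpha):
    """Finite-sample quantile of absolute calibration residuals.

    Uses the rank ceil((n + 1) * (1 - alpha)) among the n sorted scores.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    scores = sorted(abs(r) for r in residuals)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def prediction_interval(point_prediction, quantile):
    """Symmetric interval around a point prediction."""
    return (point_prediction - quantile, point_prediction + quantile)


# Nine held-out residuals (predicted minus measured score), alpha = 0.2.
calibration = [0.5, -1.0, 1.5, -2.0, 2.5, -3.0, 3.5, -4.0, 4.5]
q = split_conformal_quantile(calibration, alpha=0.2)
low, high = prediction_interval(90.0, q)
print(q, low, high)
```

With nine residuals and alpha = 0.2 the rank is 8, so the quantile is the eighth-smallest absolute residual. A sidecar would only need to store that one number alongside the `method` discriminator.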

CI paths-ignore deny-list on heavy workflows (ADR-0341, 2026-05-09)

  • Touches: .github/workflows/libvmaf-build-matrix.yml (fork-local — paths-ignore: block under pull_request:), .github/workflows/tests-and-quality-gates.yml (fork-local — same block), docs/adr/0341-ci-paths-ignore-doc-only-prs.md + index fragment, changelog.d/changed/ci-paths-ignore-doc-only.md.
  • Invariant: the deny-list must stay strictly documentation-only (docs/**, **/*.md, changelog.d/**, CHANGELOG.md, .workingdir2/**). Any path that contributes to a build, test, or lint input — libvmaf/**, meson.build, meson_options.txt, subprojects/**, python/**, ai/**, mcp-server/**, model/**, testdata/**, .github/workflows/** — must NEVER appear in the deny-list, otherwise the corresponding required check is silently skipped on a code-touching PR. The Required Checks Aggregator (ADR-0313) catches only the doc-only case (no required check ever ran for any required name); a too-broad deny-list would lose build coverage without anyone noticing.
  • On upstream sync: Netflix/vmaf upstream does not carry these two workflow files (they are fork-local additions). No sync conflict expected.
  • Re-test on rebase:
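
The source lists no re-test command for this entry. One way to check the invariant by hand is sketched below. The helper name and the idea of passing the `paths-ignore` entries in as a plain list are assumptions, and loading them from the workflow YAML is left out. The allowed set is copied from the invariant above.

```python
# Allowed documentation-only patterns, copied from the invariant above.
ALLOWED_DENY_LIST = frozenset({
    "docs/**",
    "**/*.md",
    "changelog.d/**",
    "CHANGELOG.md",
    ".workingdir2/**",
})


def unexpected_deny_entries(paths_ignore):
    """Return paths-ignore entries that are not documentation-only patterns."""
    return sorted(set(paths_ignore) - ALLOWED_DENY_LIST)


# Hypothetical workflow content with one entry that must never be ignored.
print(unexpected_deny_entries(["docs/**", "**/*.md", "ai/**"]))
```

An exact-match comparison is deliberately strict: any new pattern, even a harmless one, shows up for review instead of being silently accepted.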

HDR VMAF model search — Path C documentation only (2026-05-09)

  • Files added (this fork only; upstream Netflix/vmaf has none of these):
  • model/vmaf_hdr_model_card.md — discoverable warning that the HDR scoring path falls back to the SDR vmaf_v0.6.1.json weights. Filename deliberately uses .md, not .json, so the vmaftune.hdr.select_hdr_vmaf_model glob (vmaf_hdr_*.json) keeps returning None.
  • docs/research/0089-hdr-vmaf-model-search.md — verbatim trail of the source-or-train survey (URLs + access dates).
  • changelog.d/added/hdr-vmaf-model-search.md — release-notes fragment per ADR-0221.
  • ADR-0300 grew an inline ### Status update 2026-05-09: HDR model status section.
  • Why no model JSON ships: Path A negative findings (no public Netflix HDR VMAF model exists; HDRMAX is a different algorithm not loadable by libvmaf's JSON path). Path B deferred behind gated subjective HDR corpora + multi-day training compute. No fabricated weights are introduced.
  • On upstream sync: if Netflix lands vmaf_hdr_*.json in Netflix/vmaf/model/, port via /port-upstream-commit; the resolver picks it up automatically with no vmaftune change. Then delete model/vmaf_hdr_model_card.md (or rewrite it as a normal model card describing the upstream weights). Watch https://github.com/Netflix/vmaf/issues/645 for the upstream release announcement.
  • Re-test on rebase: no behavioural change — pure docs. Sanity:
python3 -c "from pathlib import Path; \
  import sys; sys.path.insert(0,'tools/vmaf-tune/src'); \
  from vmaftune.hdr import select_hdr_vmaf_model; \
  print(select_hdr_vmaf_model(Path('model')))"
# Expect: None  — confirms the .md card does not match the glob
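
The filename reasoning can also be checked without importing vmaftune. This sketch assumes the resolver's match is equivalent to a plain shell-style glob on the file name. `find_hdr_model` and the `vmaf_hdr_v1.json` name are hypothetical and stand in for whatever upstream might ship.

```python
import fnmatch

HDR_MODEL_GLOB = "vmaf_hdr_*.json"  # glob quoted in the note above


def find_hdr_model(file_names):
    """Return the first name matching the HDR glob, or None."""
    for name in sorted(file_names):
        if fnmatch.fnmatchcase(name, HDR_MODEL_GLOB):
            return name
    return None


# The model card uses .md, so it is not picked up as a model.
print(find_hdr_model(["vmaf_hdr_model_card.md", "vmaf_v0.6.1.json"]))
# A hypothetical upstream file would be picked up with no code change.
print(find_hdr_model(["vmaf_hdr_model_card.md", "vmaf_hdr_v1.json"]))
```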

ADR-0349 — fr_regressor_v3 namespace resolution (2026-05-09)

  • Rebase impact: none. Docs-only change — adds ADR-0349, an append-only status appendix on ADR-0302 per ADR-0028, a ## fr_regressor_* namespace map block in ai/AGENTS.md, and two changelog fragments. No upstream Netflix/vmaf surface touched; no fr_regressor_* registry rows touched (sha256s for _v1, _v2, _v2_ensemble_v1_seed{0..4}, _v3 all unchanged); no C / Python / ONNX bytes modified.
  • What to check after a rebase: nothing automated. The only drift risk is a future agent claiming fr_regressor_v3plus_features for an unrelated workstream — ai/AGENTS.md carries the reservation; reviewers verify the map row exists before approving any new fr_regressor_* registry id.
  • Reproducer:

```bash
# ADR + AGENTS.md namespace map present and consistent:
test -f docs/adr/0349-fr-regressor-v3-namespace.md
grep -q "fr_regressor_\* namespace map" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" ai/AGENTS.md docs/adr/0349-fr-regressor-v3-namespace.md
# Status appendix present on ADR-0302:
grep -q "Status update 2026-05-09: namespace collision resolved" \
  docs/adr/0302-encoder-vocab-v3-schema-expansion.md
# Existing v3 production row bit-identical (sha256 unchanged):
python3 -c "
import json
reg = json.load(open('model/tiny/registry.json'))
v3 = next(m for m in reg['models'] if m['id'] == 'fr_regressor_v3')
assert v3['sha256'] == 'eaa16d23461eda74940b2ed590edfcaf13428aade294e47792a5a15f4d3b999c', v3
assert v3['smoke'] is False
print('OK: fr_regressor_v3 production row unchanged')
"
# Registry test still passes:
bash core/test/dnn/test_registry.sh

0327 — Pre-push PR-body deliverables validator hook

  • Touches: scripts/ci/validate-pr-body.sh (new), scripts/git-hooks/pre-push (new), scripts/ci/test-validate-pr-body.sh (new), Makefile (hooks-install target adds the pre-push symlink). Re-uses scripts/ci/deliverables-check.sh parser verbatim — no upstream-shared file is modified.
  • Invariant: parser shape parity with .github/workflows/rule-enforcement.yml deep-dive-checklist gate (ADR-0108). The validator constructs a PATH shim that intercepts git diff --name-only calls only; every other git invocation falls through to the real binary.
  • On upstream sync: not applicable — these files are entirely fork-local and Netflix has no equivalent. If scripts/ci/deliverables-check.sh is ever rewritten or moved, the validator's exec path (scripts/ci/deliverables-check.sh) and the test harness's expected exit codes must follow.
bash scripts/ci/test-validate-pr-body.sh   # 8/8 cases pass

0320 — Semgrep # nosemgrep cites on Netflix-upstream Python harness (Research-0090)

  • Touches: python/vmaf/core/asset.py, python/vmaf/core/executor.py, python/vmaf/core/feature_extractor.py, python/vmaf/core/quality_runner.py, python/vmaf/core/result_store.py, python/vmaf/tools/decorator.py, python/test/command_line_test.py, python/test/feature_extractor_test.py, python/test/ssimulacra2_test.py, python/vmaf/config.py.
  • Invariant: every fork-added # nosemgrep: <rule-id> line is paired with an inline cite to Research-0090. The cite + rule-id pair is the load-bearing artifact (per memory feedback_no_guessing: every "false positive" claim ships its safety proof). If an upstream sync removes the cited line of code, drop the cite-comment block too. If upstream adds a defusedxml fix at the ElementTree.parse() site (feature_extractor.py:115, quality_runner.py:1496), keep upstream's fix and drop our suppressions.
  • config.py:40 (the SSL-bypass deletion) is a fork-exclusive security fix; if upstream resurrects ssl._create_unverified_context on a sync, do not re-merge it — the bypass clobbers the process-global default and is unjustified per Research-0090, F1.
semgrep scan --config=p/cwe-top-25 --config=p/c --config=p/python . \
  --metrics=off --json | jq '.results | length'
# expect 0 — every legit finding either has a # nosemgrep cite or was fixed

0321 — Security-scans workflow registry-pack list (Research-0090)

  • Touches: .github/workflows/security-scans.yml, .github/workflows/lint-and-format.yml.
  • Invariant: the registry packs the workflow cites (p/cwe-top-25 + p/c + p/python) are validated against https://semgrep.dev/c/p/<pack> — the previously-cited p/cert-c-strict, p/cert-cpp-strict, and p/cpp packs were retired by Semgrep in 2025 and 404. The lint-and-format.yml pull of ${{ github.* }} into env: (clang-tidy + clang-tidy-sycl steps) defuses run-shell-injection; preserve the pattern on any edit. See Research-0090, F2/F3.
for pack in p/cwe-top-25 p/c p/python; do
  code=$(curl -sIL "https://semgrep.dev/c/${pack}" | head -1 | awk '{print $2}')
  [ "$code" = "200" ] && echo "${pack}: OK" || echo "${pack}: FAIL ($code)"
done

0320 — CodeQL C bulk sweep (78 deferred alerts → 60 fixed, 14 deferred to T7-5)

  • Touches: core/src/feature/{cambi.c,ciede.c,integer_adm.c,integer_psnr.c,adm_tools.h,third_party/xiph/psnr_hvs.c}, core/src/feature/x86/{adm_avx2.c,adm_avx512.c,ansnr_avx2.c,ansnr_avx512.c,vif_avx2.c,vif_avx512.c}, core/src/{pdjson.c,svm.cpp}, core/test/{test_cpu.c,test_model.c}, core/tools/{y4m_input.c,yuv_input.c,vmaf_bench.c}. All but vmaf_bench.c are upstream-mirror Netflix files.
  • Invariant: widening casts on integer multiplications ((size_t), (uint64_t), (double)) are LHS-prefixed before the multiply, never wrapped around the whole expression — the latter is a no-op against cpp/integer-multiplication-cast-to-long. Deleted commented-out blocks (e.g., the AVX-512 VP-loop dead variant in adm_avx512.c::adm_dwt2_inverse) are gone for good; if upstream brings them back, they reintroduce the alerts. iqa/convolve.c was deliberately left untouched: prefixing (double) on the float×float multiplications inside the scalar reference path breaks bit-exactness against the AVX2 path enforced by test_iqa_convolve — CodeQL alert deferred to a follow-up that updates both paths in lockstep.
  • On upstream sync: any upstream change that re-introduces the deleted comment blocks or rewrites the cast forms will surface the alerts again. The cambi_score signature change (CambiBuffers buffers → const CambiBuffers *buffers) is fork-local and likely to conflict with upstream patches that touch that function. The 14 deferred VifBuffer large-parameter alerts are tracked under T7-5 (multi-backend coordinated refactor including NEON).
  • Re-test on rebase:
cd libvmaf && meson test -C build   # all 50+ C tests
make test-netflix-golden            # upstream golden gate
# Re-run CodeQL on master afterwards; the 60 fixed alerts must stay closed.

CodeQL cpp/declaration-hides-variable sweep (2026-05-09)

  • What changed: Mechanical rename / scope-tighten / dedupe sweep closing 64 open cpp/declaration-hides-variable CodeQL alerts on master. Touched files: core/src/feature/cambi.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/x86/vif_avx2.c, core/src/feature/x86/vif_avx512.c. All five are upstream-mirror; the Netflix copyright header is preserved on each.
  • Renames adopted (semantic over _2 suffix):
  • cambi.c: inner int err shadowing function-scope err becomes mkdir_err (heatmaps init) and src_err (full-ref extract path).
  • adm_avx2.c / adm_avx512.c: the j == 0 first-column special-case block is wrapped in { ... } so its j0..j3 and s0..s3 stop being visible to the per-j tail loop. The inner duplicate __m256i add_shift_HP_vex = _mm256_set1_epi32(32768) (and 512-bit twin) is removed — bit-identical to the function-scope value already in scope. The __m256i rfactor1 that shadowed the function-scope float rfactor1[3] becomes rfactor_v0/_v1/_v2 (and the AVX-512 twin likewise).
  • vif_avx2.c / vif_avx512.c: tap-loop locals follow f_tap, r_top/r_bot, d_top/d_bot for the s0 stage, and f_tap0/f_tap1, r_back0/r_fwd0, etc. for the AVX-512 paired-tap stage. Inner per-fj __m256i fq / __m512i fq shadows of the centre-tap broadcast become f_tap. Inner-block duplicates of function-scope ref/dis/stride/ii (identical types and initialisers) are simply removed. The two scalar VifResiduals residuals declarations that shadowed function-scope Residuals512 residuals become tail_residuals. The two const uint16_t fcoeff declarations that shadowed function-scope __m512i fcoeff become fcoeff_scalar.
  • Invariant: bit-exactness gate — the rename sweep must not change any score. The Netflix CPU golden 3 (src01_hrc00, checkerboard_1, checkerboard_10) ran clean against this PR. All 76 VMAF-targeted Python tests pass; the 9 unrelated pre-existing failures (NIQE, PyPSNR, FileSystemResultStore) reproduce on a pristine origin/master checkout.
  • On upstream sync: Netflix has no equivalent renames on upstream master as of 2026-05-09. When syncing, prefer the fork's renamed identifiers (the CodeQL gate depends on them). If Netflix later renames the same locals differently, reconcile by keeping fork names and updating any imported chunks at port time.
  • Re-test on rebase:
meson test -C build --suite=fast
PYTHONPATH=$PWD/python python3 -m pytest \
  python/test/quality_runner_test.py -k test_run_vmaf \
  python/test/vmafexec_test.py \
  python/test/vmafexec_feature_extractor_test.py \
  -m "not slow" -q

ADR-0209 v1 stdio runtime (T5-2b) — Embedded MCP server (2026-05-08)

  • Touches: core/src/mcp/{mcp.c,dispatcher.c,transport_stdio.c,mcp_internal.h,meson.build,3rdparty/cJSON/{cJSON.c,cJSON.h,LICENSE}}, core/test/test_mcp_smoke.c, core/test/meson.build. All paths are fork-local. cJSON is vendored verbatim from upstream DaveGamble/cJSON@v1.7.18 under its MIT license.
  • Invariant: every TU under core/src/mcp/ (other than the vendored cJSON dir) is fork-local with the Copyright 2026 Lusoris and Claude (Anthropic) header; cJSON keeps its upstream MIT header verbatim. The public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged from T5-2 — only function bodies flipped from -ENOSYS to working implementations. SSE / UDS still return -ENOSYS so the v2 PR can wire them without touching the public surface.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface; the entire core/src/mcp/ subtree is fork-local. If upstream ever adds an MCP surface, expect a port-only sync since names will collide.
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
  -Denable_mcp=true -Denable_mcp_stdio=true
ninja -C build && meson test -C build test_mcp_smoke -v

ADR-0334 — state.md-touch-check CI gate (2026-05-08)

  • Touches: .github/workflows/rule-enforcement.yml (new top-level job state-md-touch-check), scripts/ci/state-md-touch-check.sh (new), scripts/ci/test-state-md-touch-check.sh (new), scripts/ci/AGENTS.md (new rebase-sensitive-surface row), .github/PULL_REQUEST_TEMPLATE.md (already carries the "Bug-status hygiene" section + no state delta: REASON opt-out — coupled to the script's regex). No upstream-shared paths.
  • Invariant: the gate's trigger predicate (Conventional-Commit fix: prefix, bare bug token in title, GitHub close-keywords closes/fixes/resolves #N, unchecked Bug-status-hygiene checkbox) and opt-out sentinel (no state delta: REASON) match the wording of the ## Bug-status hygiene section in .github/PULL_REQUEST_TEMPLATE.md. Reword the template only alongside the script. The job carries the pull_request.draft == false || github.event_name != 'pull_request' gate (ADR-0331 pattern) — keep that on any future hoist into the required-aggregator set.
  • On upstream sync: Netflix/vmaf has no equivalent rule. No conflict expected; the workflow file is fork-introduced.
  • Re-test on rebase:
bash scripts/ci/test-state-md-touch-check.sh
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/rule-enforcement.yml')); print('YAML OK')"
pre-commit run shellcheck --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
pre-commit run shfmt --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh

SYCL PSNR chroma extension (T3-15(b), 2026-05-09)

  • Touches: core/src/feature/sycl/integer_psnr_sycl.cpp (per-extractor chroma device buffers, per-plane SSE accumulators, and a provided_features extension to psnr_y / psnr_cb / psnr_cr), core/src/sycl/AGENTS.md (per-kernel rebase-sensitive invariant for the chroma-on-per-extractor-buffer arrangement), docs/metrics/features.md (footnote ¹ refresh — all three GPU PSNR extractors now emit chroma), docs/adr/0192-gpu-long-tail-batch-3.md References-section status update, changelog.d/added/sycl-psnr-chroma.md.
  • Invariant on the chroma upload path: chroma planes ride on per-extractor device buffers populated by host-side staging copies in the combined-graph pre_fn callback — NOT the SYCL state's shared frame buffer (vmaf_sycl_shared_frame_init), which is luma-only by design. Luma stays graph-recorded; chroma SSE kernels run direct in post_fn on the same in-order combined queue. The CUDA twin (PR #520 / commit 7f3d58a5) uses the existing CUDA per-plane picture infrastructure and therefore has no equivalent invariant.
  • On upstream sync: Netflix/vmaf upstream has no SYCL backend at all, so conflict probability is zero on psnr_sycl. If an upstream port to the fork's SYCL runtime someday extends vmaf_sycl_shared_frame_init to allocate chroma planes, the PSNR extension can be migrated onto it and the per-extractor chroma buffers retired — but only after a cross-backend gate run confirms bit-exactness against CPU at places=4 (ADR-0214).
source /opt/intel/oneapi/setvars.sh
CC=icx CXX=icpx meson setup build-sycl libvmaf \
  -Denable_sycl=true -Denable_cuda=false
ninja -C build-sycl
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-sycl/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr --backend sycl --device 0
# Expect 0/48 mismatches across psnr_y / psnr_cb / psnr_cr at places=4.

```text

Cppcheck nullPointer false-positive in dict.c (2026-05-09)

Files pinned:

  • core/src/dict.c:121 (one-line redundant-condition fix in dict_overwrite_existing).

Why this rebase-note exists: Master CI's Cppcheck (Whole Project) gate started failing on commit 14b5ffba (#537) and blocked every open PR because each PR rebases onto a broken master. The cppcheck finding was likely always present but masked by paths-ignore filtering on the prior workflow shape; PR #530 widened cppcheck's trigger surface and exposed it. Deleted the redundant && val guard since val is already checked at the public entry-point vmaf_dictionary_set (dict.c:137). No behavior change; cppcheck flags the original as "either the val check is redundant or there's a possible null deref" because it can't prove the interprocedural guarantee.

Rebase-sensitivity: zero — change is local to dict.c. Future upstream sync of this file should keep the fix or re-run cppcheck locally to confirm absence of recurrence.

Aggregator timeout bump (2026-05-09)

Files pinned:

  • .github/workflows/required-aggregator.yml (deadline 30→90 min, job timeout 35→100 min)

Why: 41 PRs in flight 2026-05-09 morning hit Aggregator timeouts while real CI eventually passed. Bumping both deadlines unblocks the train without touching the underlying matrix.

Rebase-sensitivity: zero — workflow file is wholly fork-local.

ARC self-hosted runner pool — pilot Cppcheck routing (2026-05-09)

  • .github/workflows/lint-and-format.yml (Cppcheck runs-on: ternary).

Why: opt-in graceful migration; ADR-0359 + docs/development/ci-runners.md document the flip-the-variable recipe when the cluster is degraded.

Rebase-sensitivity: zero — workflow file is fork-local.

ADR-0338 — macOS Vulkan-via-MoltenVK CI lane (2026-05-09)

  • Touches: .github/workflows/libvmaf-build-matrix.yml (fork-local — adds Build — macOS Vulkan via MoltenVK (advisory) lane, adds continue-on-error plumbing on matrix.experimental && matrix.moltenvk, adds Install MoltenVK + Vulkan loader/headers (macOS) step, adds Run Vulkan smoke tests (macOS MoltenVK) step, gates the existing test/cache/tox steps on !matrix.moltenvk), docs/backends/vulkan/moltenvk.md (new fork-local doc), docs/adr/0127-vulkan-compute-backend.md (status-update appendix per the ADR's Proposed status — body untouched), docs/adr/0338-macos-vulkan-via-moltenvk-lane.md (new), docs/adr/_index_fragments/0338-macos-vulkan-via-moltenvk-lane.md plus _order.txt append (new), docs/research/0089-moltenvk-feasibility-on-fork-shaders.md (new), changelog.d/added/macos-vulkan-via-moltenvk-lane.md (new).
  • Invariant on the upstream-mirror file: none — libvmaf-build-matrix.yml is fork-local. The new lane's continue-on-error clause MUST stay scoped to matrix.experimental == true && matrix.moltenvk == true so existing experimental: true matrix entries (e.g. the macOS DNN lane) keep their default fail-fast behaviour. VK_ICD_FILENAMES MUST point at /opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json — note the etc/vulkan segment, NOT share/vulkan (the homebrew formula's install layout uses etc/; verified against Formula/m/molten-vk.rb).
  • On upstream sync: Netflix upstream has no macOS Vulkan lane and no MoltenVK awareness; nothing to reconcile. If a future MoltenVK release drops support for GL_EXT_shader_atomic_int64 translation, moment.comp will fail on the lane; the fix path is in ADR-0338 §Decision (lane is continue-on-error so it does not block PRs) — update the known-limitations table in docs/backends/vulkan/moltenvk.md and either pin a working MoltenVK version in the brew install line or rewrite the shader.
  • Re-test on rebase:
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/libvmaf-build-matrix.yml'))" && \
  echo "YAML parse OK"
# Confirm the lane is still in the matrix:
grep -q "Build — macOS Vulkan via MoltenVK (advisory)" \
  .github/workflows/libvmaf-build-matrix.yml
# Confirm the lane is NOT promoted to required-aggregator until one
# green run on master (per ADR-0338):
! grep -q "macOS Vulkan via MoltenVK" \
  .github/workflows/required-aggregator.yml
# Confirm the ICD path is the etc/ one, not share/:
grep -q "etc/vulkan/icd.d/MoltenVK_icd.json" \
  .github/workflows/libvmaf-build-matrix.yml

ADR-0363 — Mend Renovate replaces Dependabot (2026-05-09)

  • Touches: renovate.json (new, repo-root), .github/workflows/renovate.yml (new), .github/dependabot.yml (deleted — renamed to .github/dependabot.yml.disabled), docs/development/dependency-bot.md (new operator playbook), changelog.d/changed/renovate-supersedes-dependabot.md (new), docs/adr/0363-renovate-replaces-dependabot.md (new), docs/adr/_index_fragments/0363-renovate-replaces-dependabot.md (new).
  • Invariant: .github/dependabot.yml no longer exists on master; the disabled copy is dependabot.yml.disabled. On upstream sync, if Netflix ever ships their own dependabot.yml, do NOT restore it — the fork intentionally uses Renovate. Merge the upstream file into dependabot.yml.disabled for reference only.
  • Upstream interaction: none. Netflix/vmaf upstream has no Renovate config. Conflict risk is zero unless upstream adds renovate.json or restores dependabot.yml.
  • Re-test on rebase:
# Verify the workflow SHA-pin is still present and non-floating:
grep -E 'renovatebot/github-action@[a-f0-9]{40}' .github/workflows/renovate.yml
# Verify dependabot.yml is still absent:
test ! -f .github/dependabot.yml && echo "ok: dependabot.yml absent"
# Validate renovate.json syntax (requires Node):
node -e "JSON.parse(require('fs').readFileSync('renovate.json','utf8')); console.log('JSON valid')"

ADR-0355 — Symphony-inspired agent-dispatch infrastructure (2026-05-09)

Files added (all fork-introduced, none mirror upstream):

  • .claude/workflows/_template.md, .claude/workflows/codeql-alert-sweep.md, .claude/workflows/simd-port.md, .claude/workflows/feature-extractor-port.md.
  • scripts/lib/__init__.py, scripts/lib/backlog_tracker.py, scripts/lib/AGENTS.md.
  • scripts/ci/agent-eligibility-precheck.py (new row in scripts/ci/AGENTS.md "Rebase-sensitive surfaces" table).
  • docs/development/agent-dispatch.md.

Why this rebase-note exists: pure additive, all paths are fork-only (.claude/, scripts/lib/, fork-only docs). Upstream Netflix/vmaf has no .claude/, no scripts/lib/, and no docs/development/agent-dispatch.md, so the merge surface is zero on /sync-upstream. The only coupling is internal between scripts/ci/agent-eligibility-precheck.py and scripts/lib/backlog_tracker.py (sys.path import). Both files move together; documented in scripts/lib/AGENTS.md and a new row in scripts/ci/AGENTS.md.

Rebase-sensitivity: zero w.r.t. upstream. Internal-only: renaming BacklogItem field names or the BacklogTracker / GitHubTracker public method signatures is a breaking change for the precheck and any future state-audit script — guard via the smoke listed in Research-0091 §"Smoke results" before any rename PR.

Format-coupling note: the BACKLOG.md row regex (scripts/lib/backlog_tracker.py:_ID_PATTERN) is brittle against table-shape edits. If a future BACKLOG.md edit adds a column or renames a status word, the parser will silently mis-classify rows — the smoke parses 101 rows on master at 2026-05-09; expect ≥ 100 after any structural edit.
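
The format-coupling risk can be made concrete with a row-count guard. The sketch below is hypothetical throughout: the three-column row shape, the ID grammar, the status words, and `parse_rows` are invented for illustration and are not the real `_ID_PATTERN` or BACKLOG.md layout. It shows how a strict row regex drops every row once a column is added, and how a minimum-row check turns that silent loss into a loud failure.

```python
import re

# Hypothetical row shape: | ID | title | status |
ROW = re.compile(
    r"^\|\s*(?P<id>T\d+-\d+[a-z]?)\s*"
    r"\|\s*(?P<title>[^|]+?)\s*"
    r"\|\s*(?P<status>open|done|deferred)\s*\|\s*$"
)


def parse_rows(text, min_rows):
    """Parse table rows and fail loudly if too few of them match."""
    rows = [m.groupdict() for line in text.splitlines() if (m := ROW.match(line))]
    if len(rows) < min_rows:
        raise ValueError(f"parsed {len(rows)} rows, expected at least {min_rows}")
    return rows


table = """\
| ID | Title | Status |
|----|-------|--------|
| T3-9 | psnr_hvs AVX-512 re-bench | done |
| T7-5 | VifBuffer refactor | deferred |
"""
print(len(parse_rows(table, min_rows=2)))
```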

0350 — psnr_hvs AVX-512 ceiling re-bench (ADR-0350, T3-9 (a))

  • docs/adr/0350-psnr-hvs-avx512-ceiling.md — closure ADR.
  • docs/adr/0160-psnr-hvs-neon-bitexact.md — appended ### Status update 2026-05-09 appendix.
  • docs/research/0091-psnr-hvs-avx512-bench-2026-05-09.md — empirical companion (cycle share, Amdahl ceiling, reproducer).

Why this rebase-note exists: T3-9 (a) closes as AVX2 ceiling. The result has zero rebase-sensitivity by itself — no engine code changes — but the bit-exactness invariants that lock it to a ceiling do. The 78.42 % scalar tail in calc_psnrhvs_avx2 / calc_psnrhvs_neon is locked by ADR-0138 / ADR-0139's "per-lane-scalar float reduction" rule (carried by ADR-0159 / ADR-0160). If a future upstream sync of core/src/feature/third_party/xiph/psnr_hvs.c (the Xiph/Daala DCT) changes the per-block summation tree — e.g. partial folding, re-ordered means, vectorised mask reductions — the AVX2 + NEON TUs in core/src/feature/x86/psnr_hvs_avx2.c and core/src/feature/arm64/psnr_hvs_neon.c MUST be re-audited against the new scalar reference, and the ceiling argument in ADR-0350 must be re-run (because the 78 / 15 cycle-share split would shift).

Rebase-sensitivity: low for the ceiling decision itself (empirical re-bench on a current host is cheap — 30 seconds via the reproducer in Research-0091 §7); high for the underlying bit-exactness invariants the decision rests on (Netflix golden trips on ≥ 5.5e-5 drift per ADR-0160 §Context). The ADR-0350 §Verification reproducer is the gate — re-run it if the cycle share shifts, the Netflix normal-pair fixture changes, or a new host class (e.g. wide-issue Granite Rapids) goes into CI.
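
As a rough illustration of why a 78.42 % scalar tail caps the gain, the arithmetic can be run with Amdahl's law. This sketch assumes the whole scalar tail gets no benefit from wider vectors and that everything else scales ideally. The model actually used in ADR-0350 and Research-0091 may partition the cycles differently, so treat the numbers as indicative only.

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only the non-serial part is accelerated."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)


SCALAR_TAIL = 0.7842  # cycle share quoted in the note above

# Doubling vector width (AVX2 to AVX-512) on everything outside the tail.
doubled = amdahl_speedup(SCALAR_TAIL, 2.0)
# Limit as the accelerated part becomes free.
ceiling = 1.0 / SCALAR_TAIL

print(f"2x on the vector part: {doubled:.3f}x overall")
print(f"upper bound:           {ceiling:.3f}x overall")
```

Under these assumptions an ideal doubling yields roughly 1.12x overall and no amount of widening exceeds about 1.28x. If the scalar share moves after an upstream sync, both figures move with it.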

0320 — FFmpeg n8.1 → n8.1.1 base bump (2026-05-09)

  • Touches: ffmpeg-patches/series.txt (header comment), ffmpeg-patches/README.md (apply / verify / smoke sections), ffmpeg-patches/test/build-and-run.sh (FFMPEG_SHA default), scripts/ci/ffmpeg-patches-check.sh (header comment; FFMPEG_BRANCH env default unchanged at release/8.1 since the branch tracks point releases), docs/development/automated-rule-enforcement.md (gate description). The 9 .patch files themselves are unchanged — every patch in the series applied cleanly, cumulatively, against pristine n8.1.1 via git am --3way.
  • Upstream source: FFmpeg upstream point release n8.1.1 (commit 239f2c7 "Bump micro for 8.1.1") — bug-fix-only on top of n8.1, no API or AVOption breakage that the patch stack consumes.
  • Invariant: the patch stack continues to apply against the current tip of FFmpeg's release/8.1 branch. Per ADR-0118 and ADR-0186 §FFmpeg patch coupling, the verification gate is cumulative git am --3way against a pristine checkout, not per-patch standalone apply. The scripts/ci/ffmpeg-patches-check.sh local gate uses git apply (no commit) but accumulates state in the same way.
  • On upstream sync: no action required. If a future FFmpeg point release (n8.1.2 or n8.2) lands new hunks that conflict with one of the patches, regenerate the affected patches via git format-patch on the resolved state, bump the references in the five files listed under "Touches", and add a fresh rebase-notes entry citing the conflict file(s).
  • Re-test on rebase:
cd /tmp && rm -rf ffmpeg-n811 && \
  git clone --depth 1 --branch n8.1.1 \
    https://git.ffmpeg.org/ffmpeg.git ffmpeg-n811
git -C /tmp/ffmpeg-n811 config user.email agent@local
git -C /tmp/ffmpeg-n811 config user.name agent
for p in ffmpeg-patches/000*-*.patch; do
  git -C /tmp/ffmpeg-n811 am --3way "$p" || break
done
bash scripts/ci/ffmpeg-patches-check.sh

ADR-0281 follow-up — QSV install-matrix discoverability backfill (2026-05-08)

  • Touches: docs/getting-started/install/{arch,fedora,ubuntu,macos,windows}.md (new ## Intel QSV section per page), docs/adr/0281-vmaf-tune-qsv-adapters.md (status-update appendix per ADR-0028), changelog.d/changed/qsv-install-matrix-docs.md (new fragment). No code, no engine, no upstream-shared C / Python source touched. Pure documentation backfill closing the SYCL-audit research-0086 Topic C gap (issue #464).
  • Invariant: each per-OS QSV section pins the package names against verified upstream URLs with a Verified 2026-05-08 access date. The hardware-generation matrix is sourced from the public Wikipedia "Intel Quick Sync Video — Hardware decoding and encoding" table; if Intel revises which generation supports AV1 encode (e.g. backports the encoder to Lunar Lake / Meteor Lake silicon currently absent from the table), the matrix in all five pages must move in lockstep — the Arch / Fedora / Ubuntu / Windows pages all carry the same matrix verbatim. The macOS page deliberately omits the matrix (QSV unsupported on macOS).
  • On upstream sync: no action required — Netflix/vmaf upstream does not ship per-OS install pages under docs/getting-started/install/; that tree is fork-only.

# Lint the install pages (markdownlint via pre-commit):

pre-commit run --files docs/getting-started/install/*.md

# Verify each page (except alpine + macos) still carries the matrix:

for f in arch fedora ubuntu windows; do
  grep -q 'Arc Battlemage' "docs/getting-started/install/${f}.md" || echo "MISSING: ${f}"
done

# Confirm the macOS page documents QSV as unsupported:

grep -q 'Intel QSV. is unsupported on macOS' docs/getting-started/install/macos.md

0333 — vmaf-tune Phase F multi-pass encoding (ADR-0333)

Touches:

  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (CodecAdapter Protocol gains supports_two_pass: bool + two_pass_args(...))
  • tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (overrides both)
  • tools/vmaf-tune/src/vmaftune/encode.py (EncodeRequest gains pass_number / stats_path; build_ffmpeg_command adds the 2-pass argv splice + pass-1 null-muxer redirect; new run_two_pass_encode)
  • tools/vmaf-tune/src/vmaftune/corpus.py (CorpusOptions.two_pass, routing in iter_rows)
  • tools/vmaf-tune/src/vmaftune/cli.py (--two-pass flag on corpus / recommend subparsers)

Invariant: 2-pass encoding routes through the codec adapter via supports_two_pass + two_pass_args(pass_number, stats_path). The encode driver never branches on codec name. Adapters with supports_two_pass = False are not an error: the driver falls back to single-pass and prints a stderr warning. The seam is open for sibling codec adapters (libx264, libsvtav1, libvvenc, libaom-av1) to opt in by overriding the two methods on their adapter file alone. This is the fork-local extension to the ADR-0237 Phase A multi-codec contract; upstream Netflix/vmaf has no equivalent and does not own this code path.

Re-test:
cd tools/vmaf-tune
python -m pytest tests/test_codec_adapter_x265_two_pass.py -q

(Optional, requires ffmpeg + libx265 in the runner's PATH:)

VMAF_TUNE_INTEGRATION=1 python -m pytest \
  tests/test_codec_adapter_x265_two_pass.py::test_real_x265_two_pass_smoke -q

Rebase-sensitivity: zero from upstream — tools/vmaf-tune/ is fork-local. The only concern is the codec_adapters Protocol shape: a future upstream commit that adds a sibling codec adapter SHOULD inherit the supports_two_pass = False default and either explicitly opt in or leave the flag off. Downstream sibling-codec PRs in this fork should follow the ADR-0288 / ADR-0333 pattern: one adapter file, override the two methods, add a test file mirroring test_codec_adapter_x265_two_pass.py.
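
The adapter seam described in the invariant above can be sketched as follows. This is a minimal illustration, not the fork's code: only CodecAdapter, supports_two_pass and two_pass_args(pass_number, stats_path) come from this entry. The two adapter classes, the plan_passes helper and the `-x265-params pass=N:stats=PATH` argv shape are assumptions made for the example.

```python
import sys
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Protocol, Sequence


class CodecAdapter(Protocol):
    """Shape named in this entry; everything else below is hypothetical."""

    supports_two_pass: bool

    def two_pass_args(self, pass_number: int, stats_path: PurePosixPath) -> Sequence[str]: ...


@dataclass(frozen=True)
class SinglePassOnlyAdapter:
    """Hypothetical adapter that keeps the supports_two_pass = False default."""

    supports_two_pass: bool = False

    def two_pass_args(self, pass_number, stats_path):
        return ()


@dataclass(frozen=True)
class X265LikeAdapter:
    """Hypothetical opt-in adapter; the argv shape is an assumption."""

    supports_two_pass: bool = True

    def two_pass_args(self, pass_number, stats_path):
        if pass_number not in (1, 2):
            raise ValueError(f"pass_number must be 1 or 2, got {pass_number}")
        return ("-x265-params", f"pass={pass_number}:stats={stats_path}")


def plan_passes(adapter: CodecAdapter, stats_path, two_pass_requested: bool):
    """Return one extra-argv tuple per encode pass, without checking codec names."""
    if two_pass_requested and adapter.supports_two_pass:
        return [tuple(adapter.two_pass_args(n, stats_path)) for n in (1, 2)]
    if two_pass_requested:
        # Single-pass fallback with a warning, as the invariant describes.
        print("warning: adapter has no 2-pass support; using single pass", file=sys.stderr)
    return [()]
```

The driver only reads the flag and calls the method, so a sibling adapter opts in by changing its own file and nothing else.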

ADR-0360 — CAMBI CUDA port (T3-15a, 2026-05-09)

Files pinned:

  • core/src/feature/cuda/integer_cambi_cuda.c (new)
  • core/src/feature/cuda/integer_cambi_cuda.h (new)
  • core/src/feature/cuda/integer_cambi/cambi_score.cu (new)
  • core/src/feature/feature_extractor.c (added vmaf_fex_cambi_cuda to list)
  • core/src/meson.build (added cambi_score to cuda_cu_sources, added integer_cambi_cuda.c to CUDA feature sources)

Why: The CUDA twin of vmaf_fex_cambi (Strategy II hybrid — three GPU kernels for the embarrassingly parallel stages; calculate_c_values + topK on CPU). Registers vmaf_fex_cambi_cuda under #if HAVE_CUDA guard.

Rebase-sensitivity: low. The three new files are wholly fork-local and will not conflict. The two upstream-shared files have small, self-contained hunks:

  • feature_extractor.c: the extern vmaf_fex_cambi_cuda declaration and the &vmaf_fex_cambi_cuda array entry are inside a #if HAVE_CUDA block. Upstream's additions to this file (new feature extractors, new dispatch flags) will not conflict unless Netflix adds their own CUDA twin for CAMBI (unlikely — they don't ship a CUDA backend).
  • meson.build: the cambi_score entry in the cuda_cu_sources dict and the integer_cambi_cuda.c line in the CUDA sources list. Any upstream changes to meson.build that restructure the cuda_cu_sources dict would require a manual merge; the dict entries are sorted alphabetically by key, so cambi_score lands between adm_score and motion_score.

If upstream adds cambi_cuda themselves: drop the fork copy and check for API divergence. Strategy II hybrid is the natural choice; the upstream implementation may differ if they choose Strategy III (fully-on-GPU calculate_c_values).

cambi_internal.h dependency: integer_cambi_cuda.c includes core/src/feature/cambi_internal.h (fork-added trampoline exposing cambi.c's static helpers). If upstream significantly refactors cambi.c (renames vmaf_cambi_preprocessing, vmaf_cambi_calculate_c_values, etc.), cambi_internal.h must be updated alongside. This is the same dependency the Vulkan twin (cambi_vulkan.c) has — see ADR-0210's rebase note for the full list of exposed functions.

Vulkan submit-pool PR-B: six secondary kernels (2026-05-09, ADR-0353)

Files changed:

  • core/src/feature/vulkan/ssim_vulkan.c
  • core/src/feature/vulkan/ciede_vulkan.c
  • core/src/feature/vulkan/ms_ssim_vulkan.c
  • core/src/feature/vulkan/motion_v2_vulkan.c
  • core/src/feature/vulkan/float_psnr_vulkan.c
  • core/src/feature/vulkan/float_motion_vulkan.c
  • core/src/feature/vulkan/AGENTS.md
  • docs/adr/0353-vulkan-submit-pool-pr-b-six-kernels.md

Why this rebase-note exists: six Vulkan host-glue TUs were migrated from per-frame command-buffer and descriptor-set allocation to the VmafVulkanKernelSubmitPool abstraction (ADR-0256). Any Netflix upstream sync that touches these same files (unlikely — they are fork-local) must preserve the VmafVulkanKernelSubmitPool fields in the state struct and the pool-destroy-before-pipeline-destroy ordering in close_fex().

Rebase-sensitivity: low. All six files are entirely fork-local; Netflix upstream does not have a Vulkan backend. The submit-pool API is defined in core/src/vulkan/kernel.h (also fork-local). No public header or C-API surface was changed; the FFmpeg patch series is unaffected.

Key invariant to preserve on rebase: vmaf_vulkan_kernel_submit_pool_destroy MUST be called before vmaf_vulkan_kernel_pipeline_destroy in every migrated kernel's close_fex(). See core/src/feature/vulkan/AGENTS.md §"Submit-pool ordering invariant".

0354 — Vulkan submit-pool PR-C: submit_pool_destroy-before-pipeline ordering

  • Touches: core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vulkan/float_ansnr_vulkan.c, core/src/feature/vulkan/moment_vulkan.c.
  • Invariant: In every migrated extractor, vmaf_vulkan_kernel_submit_pool_destroy() MUST precede every vmaf_vulkan_kernel_pipeline_destroy() call in close_fex(). Reversing the order frees the pool's command buffers after the pipeline's command pool is destroyed — undefined behaviour per Vulkan spec §6.2.
  • Re-test: meson test -C build --suite=vulkan passes. scripts/ci/cross_backend_vif_diff.py shows places=4 for all four extractors on all three target devices (RTX 4090, Arc A380, RADV iGPU).
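
A quick text-level check for the ordering invariant above might look like the sketch below. It is a conservative heuristic, not an existing fork script: it scans a whole C source string rather than only close_fex(), and it rejects any submit-pool destroy that appears after a pipeline destroy. The two function names come from this entry; destroy_order_ok and the sample C snippets (including their argument lists) are hypothetical.

```python
import re

POOL_DESTROY = "vmaf_vulkan_kernel_submit_pool_destroy"
PIPELINE_DESTROY = "vmaf_vulkan_kernel_pipeline_destroy"

# Match either call name followed by an opening parenthesis.
_CALL_RE = re.compile(rf"\b({POOL_DESTROY}|{PIPELINE_DESTROY})\s*\(")


def destroy_order_ok(c_source: str) -> bool:
    """True when no submit-pool destroy follows a pipeline destroy."""
    seen_pipeline_destroy = False
    for match in _CALL_RE.finditer(c_source):
        if match.group(1) == PIPELINE_DESTROY:
            seen_pipeline_destroy = True
        elif seen_pipeline_destroy:
            # A pool destroy after a pipeline destroy violates the invariant.
            return False
    return True


# Hypothetical close_fex() bodies; the argument lists are made up.
GOOD = """
    vmaf_vulkan_kernel_submit_pool_destroy(ctx, &s->sub_pool);
    vmaf_vulkan_kernel_pipeline_destroy(ctx, &s->pipeline);
"""
BAD = """
    vmaf_vulkan_kernel_pipeline_destroy(ctx, &s->pipeline);
    vmaf_vulkan_kernel_submit_pool_destroy(ctx, &s->sub_pool);
"""
```

Run over the four touched files after a rebase, a False result points at the file to inspect by hand. It does not replace the meson Vulkan suite.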

0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0291)

0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0352)

  • Touches: core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/psnr_vulkan.c (all fork-local Vulkan kernels; no upstream C paths touched), changelog.d/changed/vulkan-submit-pool-pr-a-adm-motion-psnr.md, docs/adr/0291-vulkan-submit-pool-pr-a-adm-motion-psnr.md.
  • Invariant: Each migrated TU adds VmafVulkanKernelSubmitPool sub_pool and pre-allocated VkDescriptorSet field(s) to its state struct. The pool must be destroyed (vmaf_vulkan_kernel_submit_pool_destroy) before vmaf_vulkan_kernel_pipeline_destroy in close_fex(); reversing the order would destroy the descriptor pool while the submit pool still holds live command buffer + fence references. Descriptor sets allocated via vmaf_vulkan_kernel_descriptor_sets_alloc are freed implicitly by the descriptor pool tear-down — do NOT call vkFreeDescriptorSets on them in close_fex(). For motion_vulkan, the pre-allocated set is rebound once per frame via vkUpdateDescriptorSets because the blur ping-pong changes which blur[] slot is "current"; for adm_vulkan and psnr_vulkan the sets are stable after init() and require no per-frame update.
  • Upstream interaction: none. All three files are fork-local Vulkan kernel TUs not present in Netflix/vmaf upstream.
  • On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths. The Vulkan backend is entirely fork-introduced.
  • Re-test on rebase:
meson test -C build --suite=fast
# Cross-backend parity gate (places=4):
python python/test/cross_backend_diff.py \
    --features adm motion psnr \
    --backend vulkan cpu \
    --places 4 \
    --yuv testdata/yuv/src01_hrc00_576x324.yuv \
            testdata/yuv/src01_hrc01_576x324.yuv

ADR-0350 — FFmpeg libvmaf filter CUDA backend selector (0010 patch)

Patch: ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch.

  • libavfilter/vf_libvmaf.c — adds cuda AVOption + state field + init / cleanup / picture-pool wiring under CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER.
  • configure — adds --enable-libvmaf-cuda (EXTERNAL_LIBRARY_LIST entry + help text), promotes libvmaf_cuda from blanket-autodetect to gated enabled libvmaf_cuda && require_pkg_config + check, preserves the enabled libvmaf && check_pkg_config libvmaf_cuda in-filter probe so the new selector still works without the explicit flag when libvmaf ships CUDA.

Why this rebase-note exists: Patch 0010 extends the SYCL (0003) / Vulkan (0004) per-context backend selectors to CUDA on the regular libvmaf filter. The patch coexists with the upstream dedicated libvmaf_cuda filter (CONFIG_LIBVMAF_CUDA_FILTER) by gating its struct field and code paths on !CONFIG_LIBVMAF_CUDA_FILTER; the dedicated filter keeps owning its own cu_state field. CLAUDE.md §12 r14 makes the patch update mandatory because the change touches a filter consumer of the vmaf_cuda_state_init / _import_state / _state_free / _preallocate_pictures / _fetch_preallocated_picture C-API surface in libvmaf_cuda.h.

Rebase-sensitivity: low. The patch's vf_libvmaf.c hunks are context-anchored on the SYCL/Vulkan selector blocks; if upstream FFmpeg renames CONFIG_LIBVMAF_CUDA_FILTER or moves the libvmaf_cuda.h include, the include guard at the top of the file needs the corresponding update. The configure hunks are context-anchored on the existing --enable-libvmaf-sycl / --enable-libvmaf-vulkan lines; those have proven stable across n8.0 → n8.1 → n8.1.1, so drift risk is low. When VmafCudaConfiguration ever grows a device_index field upstream, swap the cuda boolean for an int cuda_device mirroring SYCL's shape (separate ADR + patch refresh).

Verification gate: cumulative git am --3way replay of ffmpeg-patches/000{1..9}-*.patch + 0010-* against pristine FFmpeg n8.1.1 PASS (2026-05-09). Build of libavfilter/vf_libvmaf.o PASS under both CONFIG_LIBVMAF_CUDA=0 (selector errors at filter-init time per #else branch) and CONFIG_LIBVMAF_CUDA=1 && !CONFIG_LIBVMAF_CUDA_FILTER (selector active, picture-pool wiring compiles).

0320 — Vulkan instance / VMA apiVersion bump to 1.4 (Step B)

  • Touches: core/src/vulkan/common.c, core/src/vulkan/vma_impl.cpp, core/src/vulkan/AGENTS.md.
  • Invariant: the four apiVersion sites (lines 54, 264, 374 of common.c; line 22 of vma_impl.cpp) request Vulkan 1.4, not 1.3. Together with the Step-A precise decorations in vif.comp / ciede.comp (PR #346) and the Phase-3 cross-subgroup release-acquire fix (PR #511), this gates the cross-backend places=4 contract on Arc + RADV. NVIDIA closure depends on Phase 3c (PR #512; block-on-merge until that lands). Netflix upstream does not carry a VMA dependency or a Vulkan backend; no upstream merge conflict expected on these files.
  • Re-test on rebase:
meson setup build -Denable_vulkan=enabled -Denable_cuda=false \
  -Denable_sycl=false --buildtype=release
ninja -C build
for D in 0 1 2; do
  python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --backends cpu vulkan --vulkan-device "$D" \
    --features vif ciede adm motion psnr
done
# All 0/N mismatches at places=4 once Phase 3c (PR #512) has landed.

ADR-0332 v2 runtime (T5-2c) — Embedded MCP server UDS + real compute_vmaf (2026-05-09)

  • Touches: core/src/mcp/{mcp.c,dispatcher.c,mcp_internal.h,meson.build,compute_vmaf.c,transport_uds.c}, core/test/test_mcp_smoke.c. All paths are fork-local. No new third-party vendor drop in v2 — mongoose vendoring stays deferred to v3 with the SSE transport.
  • Invariant: same as ADR-0209 v1 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only function bodies flipped — vmaf_mcp_start_uds from -ENOSYS to a working AF_UNIX listener; compute_vmaf from a {"status":"deferred_to_v2"} placeholder to a real vmaf_score_pooled binding). Per ADR-0128 § operational guardrails the UDS socket file is created mode 0700; that chmod happens in vmaf_mcp_start_uds after bind and is a load-bearing security invariant — do NOT relax it on rebase. compute_vmaf runs on a per-call ephemeral VmafContext so the host's main scoring run is unperturbed; do NOT rewire it to reuse server->ctx because vmaf_score_pooled commits the model destructively to the context.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. If upstream adds one, expect a port-only sync since names will collide.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
                                -Denable_mcp=true -Denable_mcp_stdio=true \
                                -Denable_mcp_uds=true
ninja -C build && meson test -C build test_mcp_smoke -v
# Real-score smoke (single 576x324 pair):
build/test/test_mcp_smoke 2>&1 | tail -3   # expects "16 tests run, 16 passed"
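
The bind-then-chmod sequence that the invariant above calls load-bearing can be illustrated with a small model. The real implementation is C inside vmaf_mcp_start_uds; this sketch only mirrors the order of operations, and start_uds_listener is a hypothetical name.

```python
import os
import socket
import stat


def start_uds_listener(path: str) -> socket.socket:
    """Bind an AF_UNIX stream socket, then restrict the socket file to mode 0700."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)       # creates the socket file on disk
        os.chmod(path, 0o700)   # tighten permissions right after bind
        server.listen(1)
    except OSError:
        server.close()
        raise
    return server


def socket_file_mode(path: str) -> int:
    """Permission bits of the socket file, for use in a re-test assertion."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

A rebase check in this style would assert that socket_file_mode(path) equals 0o700 after the listener starts.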

ADR-0332 v3 runtime (T5-2d) — Embedded MCP server SSE transport (2026-05-09)

  • Touches: core/src/mcp/{mcp.c,mcp_internal.h,meson.build,transport_sse.c}, core/meson_options.txt, core/test/test_mcp_smoke.c, docs/mcp/embedded.md, docs/adr/0332-mcp-runtime-v2.md (status-update appendix). All paths are fork-local. No third-party vendor drop in v3 — the originally-planned mongoose vendor was reversed because cesanta/mongoose 7.18 is GPL-2.0-only OR commercial, incompatible with the fork's BSD-3-Clause-Plus-Patent license (verified at upstream LICENSE 2026-05-09). The SSE transport is plain POSIX sockets in fork-owned C (~500 LOC).
  • Invariant: same as ADR-0209 / ADR-0332 v2 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only vmaf_mcp_start_sse's body flipped from -ENOSYS to a working AF_INET listener). The SSE listener binds INADDR_LOOPBACK only; do NOT switch to INADDR_ANY without a separate ADR + auth design (v3 ships intentionally without CORS/Bearer/per-session auth on the assumption of a same-host trust boundary). The SSE stop path uses shutdown(SHUT_RDWR) before close() — plain close() of an AF_INET listening fd from another thread does NOT unblock accept() on Linux; do NOT remove the shutdown call. enable_mcp_sse is now a feature option (default auto), not boolean false.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. Do NOT re-introduce mongoose (or any GPL-licensed HTTP library) on a future rebase without first amending CLAUDE §1 and adding a separate license-compatibility ADR.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
                                -Denable_mcp=true -Denable_mcp_stdio=true \
                                -Denable_mcp_uds=true \
                                -Denable_mcp_sse=enabled
ninja -C build && meson test -C build test_mcp_smoke -v
build/test/test_mcp_smoke 2>&1 | tail -3   # expects "17 tests run, 17 passed"
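
The stop-path rule above (shutdown before close, loopback-only bind) can be modelled in a few lines. This sketch mirrors the behaviour the entry describes for Linux and is not the fork's transport_sse.c; accept_loop, stop_listener and start_and_stop are hypothetical names.

```python
import socket
import threading


def accept_loop(listener: socket.socket, stopped: threading.Event) -> None:
    """Accept and immediately drop connections until the listener is stopped."""
    try:
        while True:
            conn, _addr = listener.accept()
            conn.close()
    except OSError:
        # Raised once the listener has been shut down or closed.
        pass
    finally:
        stopped.set()


def stop_listener(listener: socket.socket) -> None:
    """Wake any thread blocked in accept(), then release the fd."""
    try:
        listener.shutdown(socket.SHUT_RDWR)
    except OSError:
        # Some platforms reject shutdown() on a listening socket.
        pass
    listener.close()


def start_and_stop(timeout_s: float = 5.0) -> bool:
    """Start a loopback listener, stop it, and report whether the thread exited."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # loopback only, kernel-chosen port
    listener.listen(1)
    stopped = threading.Event()
    worker = threading.Thread(target=accept_loop, args=(listener, stopped), daemon=True)
    worker.start()
    stop_listener(listener)
    return stopped.wait(timeout_s)
```

Dropping the shutdown() call from stop_listener is the change the invariant warns against: the accept thread could then stay blocked.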

Status update 2026-05-09 — placeholder-ref hardening

  • Additional touches: same set as the 2026-05-08 ADR-0334 entry, no new files. The hardening adds a git diff -U0 ... -- docs/state.md call inside scripts/ci/state-md-touch-check.sh (case 4a) plus 10 additional fixture cases in scripts/ci/test-state-md-touch-check.sh.
  • New invariant: inserted lines in docs/state.md (lines starting with +, excluding the +++ b/... header) must not contain this PR / this commit / bare TBD / <PR> / #NNN. Canonical accept forms are PR #N and commit `<sha>`. The placeholder vocabulary is coupled to PR #541's audit findings — reword in lockstep with the ADR-0334 status-update appendix if the fork's row template changes.
  • Re-test on rebase: same bash scripts/ci/test-state-md-touch-check.sh run as the 2026-05-08 entry; the harness now reports 18/18 passed (was 8/8 passed).
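
The new invariant can be illustrated with a small diff scanner. This is a sketch of the rule as worded above, not a port of scripts/ci/state-md-touch-check.sh: the regular expression, its case sensitivity and the function name placeholder_lines are assumptions.

```python
import re

# Placeholder vocabulary listed in the invariant; matching is assumed case-sensitive.
PLACEHOLDER_RE = re.compile(r"this PR|this commit|\bTBD\b|<PR>|#NNN")


def placeholder_lines(diff_text: str) -> list[str]:
    """Return inserted lines (without the leading '+') that contain a placeholder."""
    hits = []
    for line in diff_text.splitlines():
        # Inserted lines start with '+'; the '+++ b/...' file header is excluded.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        body = line[1:]
        if PLACEHOLDER_RE.search(body):
            hits.append(body)
    return hits


SAMPLE_DIFF = """\
+++ b/docs/state.md
+| T-1 | closed by this PR |
+| T-2 | closed by PR #541 |
-| T-0 | TBD |
"""
```

On the sample diff only the T-1 row is flagged: the T-2 row uses the canonical PR #N form, and the removed T-0 row is not an inserted line.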

0347 — Sanitizer matrix test-set scope (ADR-0347)

  • Touches: .github/workflows/tests-and-quality-gates.yml job sanitizers (build + test step), core/test/meson.build (no edits — the absence of any suite: 'unit' tag is the upstream state we now work with rather than against).
  • Invariant: the sanitizer job runs the full C unit-test set per sanitizer with a per-sanitizer deselect list driven by a case block on ${{ matrix.sanitizer }}. The deselect lists are load-bearing — each entry corresponds to a real bug tracked in docs/state.md. Under UBSan the build adds -Dc_args=-fno-sanitize=function -Dcpp_args=-fno-sanitize=function to suppress the K&R-prototype harness UB; the meson case branch must keep this build flag in sync with the test deselect entries. An upstream rebase that adds new test files via core/test/meson.build inherits full sanitizer coverage automatically (the workflow enumerates tests via meson test --list).
  • On upstream sync: if upstream Netflix lands a suite: 'unit' tagging convention, the workflow is robust to it (we already enumerate from meson test --list, not from --suite=unit). If upstream rewrites the harness to declare static char *test_X(void) with a (void) parameter, the -fno-sanitize=function flag becomes redundant; leave it in place (zero cost) until a deliberate cleanup PR reverts the suppression. If upstream lands a fix for any of the surfaced defects (SVMModelParser validation, feature_collector metadata leak, integer_adm::div_lookup race, framesync mutex mismatch), drop the corresponding deselect row from the workflow's case block in the same PR that pulls the upstream fix.
  • Re-test on rebase:
cd libvmaf
for SAN in address undefined thread; do
  EXTRA=()
  [ "$SAN" = undefined ] && EXTRA=( "-Dc_args=-fno-sanitize=function" "-Dcpp_args=-fno-sanitize=function" )
  rm -rf "build-$SAN"
  CC=clang CXX=clang++ LDFLAGS=-fuse-ld=lld \
    meson setup "build-$SAN" -Db_sanitize="$SAN" \
    -Denable_cuda=false -Denable_sycl=false --buildtype=debug \
    -Db_lto=false -Db_lundef=false "${EXTRA[@]}"
  meson compile -C "build-$SAN"
  case "$SAN" in
    address)   EXCLUDE='test_model$|test_predict$|test_float_ms_ssim_min_dim$' ;;
    undefined) EXCLUDE='test_model$' ;;
    thread)    EXCLUDE='test_model$|test_pic_preallocation$|test_framesync$' ;;
  esac
  TESTS=$(meson test -C "build-$SAN" --list \
    | grep '^libvmaf:' \
    | grep -vE "$EXCLUDE" \
    | sed 's/^libvmaf://')
  meson test -C "build-$SAN" --print-errorlogs $TESTS
done

CodeQL bulk mechanical sweep — Python tree (2026-05-09)

  • Why this matters on rebase: no rebase impact. The diff lives entirely in python/vmaf/ and one fork-local helper (core/src/vulkan/spv_embed.py). None of the touched Python modules have been changed by Netflix upstream in over four years; the closest churn is unrelated additions to python/vmaf/script/run_*.py driver flags. A future /sync-upstream will land on a clean tree.
  • What changed: dead imports removed; exit() → sys.exit() in seven CLI driver scripts; open(...) → with open(...) in python/vmaf/tools/decorator.py and core/src/vulkan/spv_embed.py; typed except KeyError: pass bodies got an explanatory one-line comment to satisfy py/empty-except; pass removed where it was a no-op tail statement; one commented-out debug block deleted from tools/misc.py.
  • Re-test on rebase: python3 -c "import ast; [ast.parse(open(f).read()) for f in (...)]" over the touched files; ruff check over the same set must produce no NEW errors versus master baseline.
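
The ast.parse one-liner in the re-test bullet can be expanded into a small helper that reports which files fail, rather than stopping at the first exception. The file list stays elided in the entry above, so the caller supplies it; syntax_failures is a hypothetical name.

```python
import ast
from pathlib import Path


def syntax_failures(paths):
    """Parse each file with ast and return (path, line) for every syntax error."""
    failures = []
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        try:
            # Parsing only; nothing in the file is imported or executed.
            ast.parse(source, filename=str(path))
        except SyntaxError as exc:
            failures.append((str(path), exc.lineno))
    return failures
```

An empty result over the touched files is the pass condition. The ruff comparison against the master baseline remains a separate step.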

0345 — cambi × {CUDA, SYCL, HIP} GPU port planning (ADR-0345, docs-only)

  • Touches: docs/research/0091-cambi-gpu-port-planning-2026-05-09.md (new), docs/adr/0345-cambi-gpu-port-strategy.md (new), docs/adr/_index_fragments/0345-cambi-gpu-port-strategy.md (new fragment), docs/adr/_index_fragments/_order.txt (append slot), changelog.d/changed/cambi-gpu-planning-digest.md (new). No code. Companion to the per-port PRs that follow per the digest's §6 ordered plan (CUDA → SYCL → HIP).
  • Upstream source: none — fork-local planning artefact. Netflix/vmaf upstream has no CUDA / SYCL / HIP cambi twin and no plans to add one on those backends.
  • Invariant: the planning round locks Strategy II host-staged hybrid for the three pending backends, inheriting verbatim from ADR-0205 §Decision and ADR-0210 §Decision. The cross-backend gate contract for cambi is places=4 from day one on all backends — by construction (integer-only GPU pre-passes; byte-identical readback; unmodified host residual). If any per-port PR sees empirical drift from CPU, fix the kernel — never relax the gate (memory feedback_no_test_weakening). The shared cambi_internal.h host residual surface (shipped with PR #196 for the Vulkan port) is the load-bearing reuse point — all four GPU twins (Vulkan, CUDA, SYCL, HIP) link against it and inherit any future CPU-side c-value formula change automatically.
  • On upstream sync: no action required. If a future upstream sync introduces a Netflix/vmaf cambi GPU twin (extremely unlikely — Netflix has no public CUDA / SYCL / HIP cambi work), evaluate whether to drop the fork's twin in favour of upstream's per the standard prefer-upstream rule; otherwise no action.
  • Re-test on rebase: docs-only — no compile / runtime gate. The Strategy III v2 follow-up (parked per ADR-0205 §Out of scope) gets its own ADR + rebase-notes entry when profile data lands.

0320 — Vulkan VIF API-1.4 NVIDIA residual Phase 3b (deferral)

  • Touches: core/src/feature/vulkan/shaders/vif.comp (comment-only update at the Phase-4 reduction site — documents the Phase-3b candidate-fix experiments and the driver-side hypothesis; no code logic change vs. PR #511); docs/adr/0269-vif-ciede-precise-step-a.md (appended Phase-3b status update appendix; ADR body remains frozen per ADR-0028); docs/research/0090-...md (new); docs/state.md (row T-VK-VIF-1.4-RESIDUAL-ARC retired in favour of T-VK-VIF-1.4-RESIDUAL-NVIDIA-DEFERRED after the hardware-mapping correction); core/src/vulkan/AGENTS.md (Phase 3b update + rebase invariant for cross-backend gate device-name selection); changelog.d/fixed/vif-arc-mesa-anv-int64-reduction.md (new fragment).
  • Invariant: the workgroup-scope memoryBarrierShared(); barrier(); pair PR #511 introduced is load-bearing for the Arc + RADV lanes at API 1.4 and stays. Phase 3b confirmed it cannot be downgraded back to a bare barrier() even if the NVIDIA residual ever closes — Arc's clean state is contingent on the workgroup-scope pair.
  • Cross-backend gate device-selection invariant (NEW): scripts that target a specific Vulkan vendor must select by deviceName substring, not by --vulkan_device <index>. vmaf_vulkan_context_new's device sort is stable inside the same devtype_score bucket and the vkEnumeratePhysicalDevices enumeration order is host-policy-dependent (driver registration order in /etc/vulkan/icd.d/, Mesa device-select layer, VK_LOADER_* env vars). PR #511's commit message inverted the device map on this fork's CI workstation; the empirical numbers it cited as "NVIDIA" actually came from Arc and vice versa. New cross-backend lanes targeting a specific vendor should not inherit the off-by-one.
  • On upstream sync: vif.comp is fork-local; no upstream Netflix/vmaf has a Vulkan path. Cherry-picks from upstream cannot reach this file.
  • Re-test on rebase (assumes a multi-GPU CI workstation with NVIDIA + Arc + RADV; lavapipe-only CI lanes are a no-op for the API-1.4 residual since lavapipe never reproduced the bug):

# Local API-1.4 bump (off-master reproducer; do NOT commit).
sed -i 's/VK_API_VERSION_1_3/VK_API_VERSION_1_4/g' \
    core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1003000/VMA_VULKAN_VERSION 1004000/' \
    core/src/vulkan/vma_impl.cpp
cd libvmaf && meson setup build -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false && ninja -C build
cd ..

# NVIDIA lane: expected 45/48 FAIL scale 2 until either the
# manual int64 subgroup-reduction patch lands or NVIDIA fixes
# the driver. Arc + RADV expected 0/48.
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature vif --backend vulkan --device

# Revert local bump after testing.
sed -i 's/VK_API_VERSION_1_4/VK_API_VERSION_1_3/g' \
    core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1004000/VMA_VULKAN_VERSION 1003000/' \
    core/src/vulkan/vma_impl.cpp
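
The device-selection invariant in this entry (pick by deviceName substring, never by enumeration index) can be sketched as a pure function. How a script obtains the enumerated names is left out on purpose; pick_device_index and the sample name strings are hypothetical.

```python
def pick_device_index(device_names, needle):
    """Return the index of the single device whose name contains `needle`."""
    wanted = needle.lower()
    matches = [i for i, name in enumerate(device_names) if wanted in name.lower()]
    if len(matches) != 1:
        # Refuse to guess when the substring is missing or ambiguous.
        raise LookupError(f"{needle!r} matched {len(matches)} devices: {device_names}")
    return matches[0]


# Two made-up enumeration orders for the same three GPUs.
HOST_A = ["NVIDIA GeForce RTX 4090", "Intel(R) Arc(tm) A380 Graphics", "AMD Radeon Graphics (RADV)"]
HOST_B = ["Intel(R) Arc(tm) A380 Graphics", "NVIDIA GeForce RTX 4090", "AMD Radeon Graphics (RADV)"]
```

The same substring resolves to index 1 on the first host and index 0 on the second, which is the mix-up a hard-coded index would repeat.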

Upstream-port-later batch — Research-0090 18-commit triage close-out (2026-05-09)

  • Touches: docs/state.md (one row in "Deferred (waiting on external trigger)"), this file, changelog.d/changed/upstream-port-later-batch-2026-05-09.md. No code touched. Companion to PR #446 (Research-0090) and the in-flight PRs #497 (MyTestCase super-PR), #443 / #444 (cambi-docs duplicate pair).
  • Per-commit classification (input set: 18 PORT_LATER SHAs from Research-0090):
| # | Upstream SHA | Subject (truncated) | Verdict | Reopen / forward path |
| --- | --- | --- | --- | --- |
| 1 | 38e905d1 | adopt MyTestCase + reformat BD-rate test data | PORT_DEFERRED | Subsumed by PR #497 commit e1dbdc09; close out when #497 merges |
| 2 | 005988ea | adopt MyTestCase + port new tests + align fifo_mode | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 3 | 4679db83 | fix VMAFEXEC_score tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit 0004d2cf: must preserve fork's golden places= values byte-for-byte (CLAUDE §8 / ADR-0024) |
| 4 | 3e075107 | adopt MyTestCase + update score values in vmafexec tests | PORT_DEFERRED | Subsumed by PR #497 commit 0004d2cf; close out when #497 merges |
| 5 | e3827e4d | adopt MyTestCase + port new tests in asset/bootstrap/local_explainer | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 6 | 25ff9f18 | remove empty VmafossexecCommandLineTest stub | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 13-line deletion. PR #497 currently RE-EMITS the stub; once #497 lands, cherry-pick this commit standalone (zero-conflict against post-#497 tip). |
| 7 | 3a041a97 | adopt MyTestCase + update score values | PORT_DEFERRED | Subsumed by PR #497 commit d52d9221; close out when #497 merges |
| 8 | ead2d12b | fix vif_scale3 + adm3_egl_1 tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit b5a3f61b: Netflix-golden tolerance guard same as row 3 |
| 9 | 6c097fc4 | reduce ADM/VIF tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit f3881d5c: Netflix-golden tolerance guard same as row 3 |
| 10 | 7df50f3a | align testutil with full set of fixture functions | PORT_DEFERRED | Subsumed by PR #497 commit f1ae0495; close out when #497 merges |
| 11 | 322ca041 | replace temporal slicing with pre-sliced YUV fixtures | PORT_DEFERRED | Subsumed by PR #497 commit 7d9d9a10; close out when #497 merges. Sequencing matters: this commit must land before rows 12, 14, 15, 17 (the YUV-fixture consumers); #497 already orders them correctly. |
| 12 | 74bdce1b | align vmafexec_feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 07e7cb48; close out when #497 merges |
| 13 | a3776335 | align feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 15a6874d; close out when #497 merges |
| 14 | 0341f730 | remove duplicate test_run_vmaf_integer_fextractor | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 76-line deletion. Same disposition as row 6: #497 currently re-emits the duplicate; cherry-pick standalone after #497. |
| 15 | 9fa593eb | port feature_extractor tests for aim/adm3/motion3 + new options | PORT_DEFERRED | Subsumed by PR #497 commit ab21b694; close out when #497 merges |
| 16 | d93495f5 | reduce tolerance for VMAF scores in quality_runner tests | PORT_DEFERRED w/ Netflix-golden guard | PR #497: Netflix-golden tolerance guard same as row 3 |
| 17 | 7d1ad54b | port feature extractor tests for aim/adm3/motion3 | PORT_DEFERRED | Subsumed by PR #497 commit 44b9e626; close out when #497 merges |
| 18 | 721569bc | resource/doc: cambi_high_res_speedup + motion2 score | PORT_DEFERRED → DEDUP | Already in flight on TWO branches (PR #443 + PR #444). Maintainer picks one and abandons the other per Research-0090 §Recommended action #4. No third port-PR opened. |
  • Invariant: after PR #497 merges, the Research-0090 PORT_LATER bucket reduces to exactly two follow-up cherry-picks against post-#497 master:
  • git cherry-pick 25ff9f18 (delete empty VmafossexecCommandLineTest).
  • git cherry-pick 0341f730 (delete duplicate test_run_vmaf_integer_fextractor).

    Both are pure deletions on python/test/command_line_test.py and python/test/feature_extractor_test.py respectively; no score change, no Netflix-golden interaction. They were excluded from PR #497 because the v2 super-PR's diff state currently RE-EMITS those identifiers (likely because #497 cherry-picked from an earlier upstream tip than 25ff9f18 / 0341f730).
  • Netflix-golden guard (binding): per CLAUDE §8 / ADR-0024, the three Netflix CPU golden pairs in python/test/quality_runner_test.py, vmafexec_test.py, vmafexec_feature_extractor_test.py, feature_extractor_test.py, result_test.py (1 normal src01_hrc00↔hrc01 + 2 checkerboard) carry hard-coded assertAlmostEqual rows that are NEVER modified by a fork PR. Upstream commits 4679db83, ead2d12b, 6c097fc4, d93495f5 explicitly LOWER places= on a subset of those rows (their stated motivation is macOS FP precision drift, not a true score change). Reviewer of PR #497 must verify that the 3 golden pairs retain fork tolerances byte-for-byte; only non-golden rows may adopt the relaxations.
  • On upstream sync: future /sync-upstream runs that re-detect these 18 SHAs should match this entry via the SHA list and short-circuit Pass-2 classification (skip re-triage).
  • Re-test on rebase: none required at the time of this commit (no code touched); after the two follow-up cherry-picks (25ff9f18 + 0341f730) eventually land, run
meson test -C build --suite=fast
make test-netflix-golden  # 3/3 CPU goldens still pass
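
The SHA-list short-circuit described in the "On upstream sync" bullet above could be approximated as below. The 18 short SHAs are the ones in this entry's classification list; the function name split_triaged and the match-on-first-eight-characters rule are assumptions, not the /sync-upstream implementation.

```python
# Short SHAs triaged in this entry (rows 1-18 of the classification list).
TRIAGED_SHAS = frozenset({
    "38e905d1", "005988ea", "4679db83", "3e075107", "e3827e4d", "25ff9f18",
    "3a041a97", "ead2d12b", "6c097fc4", "7df50f3a", "322ca041", "74bdce1b",
    "a3776335", "0341f730", "9fa593eb", "d93495f5", "7d1ad54b", "721569bc",
})


def split_triaged(candidate_shas):
    """Split upstream SHAs into (already triaged here, still needs triage)."""
    known, fresh = [], []
    for sha in candidate_shas:
        # Compare on the 8-character prefix so full-length SHAs also match.
        bucket = known if sha[:8].lower() in TRIAGED_SHAS else fresh
        bucket.append(sha)
    return known, fresh
```

Only the second list would go through Pass-2 classification again.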

0356 — Vulkan two-level GPU reduction for VIF / ADM / motion

  • Touches: core/src/feature/vulkan/vif_vulkan.c, adm_vulkan.c, motion_vulkan.c, core/src/vulkan/picture_vulkan.{h,c}, core/src/vulkan/meson.build, core/src/feature/vulkan/shaders/vif_reduce.comp, adm_reduce.comp, motion_reduce.comp.
  • Invariant: The vif_reduce.comp / adm_reduce.comp / motion_reduce.comp shaders are ACCUM_FIELDS=7 / ACCUM_SLOTS=6 / single-field. If an upstream sync adds or removes fields from the per-WG accumulator layout in vif.comp / adm.comp / motion.comp, the corresponding reducer shader and the host-side VIF_ACCUM_FIELDS / ADM_ACCUM_SLOTS_PER_WG constants must be updated in lockstep. Mismatch = silent miscompute.
  • Re-test: meson test -C build --suite=fast + run scripts/ci/cross-backend-diff.sh --backend=vulkan --places=4 against the Netflix normal pair on any available Vulkan device.
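
A toy model shows why a field-count mismatch between the per-workgroup accumulator and its reducer is silent. The flat, workgroup-major buffer layout is an assumption made for illustration; only the field count of 7 comes from this entry, and reduce_per_wg is a hypothetical stand-in for the second-level reducer shaders.

```python
def reduce_per_wg(buffer, num_workgroups, fields):
    """Second-level reduction: sum each accumulator field across workgroups."""
    if len(buffer) != num_workgroups * fields:
        raise ValueError("buffer size does not match num_workgroups * fields")
    totals = [0] * fields
    for wg in range(num_workgroups):
        for field in range(fields):
            totals[field] += buffer[wg * fields + field]
    return totals


# Six workgroups, seven fields each; every workgroup contributes 1 to field 0 only.
FIELDS = 7
PARTIALS = [1 if i % FIELDS == 0 else 0 for i in range(6 * FIELDS)]

correct = reduce_per_wg(PARTIALS, 6, FIELDS)   # reducer agrees with the layout
skewed = reduce_per_wg(PARTIALS, 7, 6)         # reducer assumes 6 fields instead
```

Both calls succeed because 6 × 7 and 7 × 6 describe the same buffer length, but the second smears the counts across fields. No size check catches that, which is why the constants must change in lockstep.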

Single ledger of fork-local changes that need attention when this fork syncs from upstream/master (Netflix/vmaf). Required by ADR-0108: every fork-local PR that touches upstream-shared paths or establishes a rebase-sensitive invariant adds an entry here. PRs with no rebase impact state "no rebase impact" in the PR description and skip the entry.

The intended reader is whoever runs the next /sync-upstream (see ADR-0002 and .claude/skills/sync-upstream/). Read top-to-bottom before resolving conflicts.

Format

Each entry is a ### NNNN — short title heading with three fields:

  • Touches: paths likely to conflict on upstream merge.
  • Invariant: what the fork relies on that an upstream change could silently drop.
  • Re-test: the command(s) to run after the merge to confirm the invariant survived. Reproducer-style — no surrounding prose required.

IDs are assigned in commit order and never reused. A single entry may cover several PRs in one workstream; cross-link from the ID heading.

Entries (backfilled 2026-04-18 per ADR-0108 adoption)

0332 — Agent worktree-drift hard guard (ADR-0332)

  • Touches: .pre-commit-config.yaml (one new local hook id), Makefile (hooks-install target — comment-only edit), scripts/ci/check-agent-worktree-drift.sh (new), scripts/ci/test_check_agent_worktree_drift.sh (new), AGENTS.md (new section §12a), docs/development/agent-worktree-discipline.md (new), docs/adr/0332-*.md (new), docs/adr/_index_fragments/ (new fragment + _order.txt append), changelog.d/added/agent-worktree-drift-guard.md (new).
  • Invariant on upstream-mirror files: none — every touched path is fork-local. The pre-commit hook ID agent-worktree-drift-guard is unique to this fork; upstream Netflix/vmaf has no local hook block in its (also non-existent) .pre-commit-config.yaml.
  • On upstream sync: no expected conflict. The .pre-commit-config.yaml block is fork-only; if Netflix ever introduces its own pre-commit config we'll rebase the agent-worktree-drift-guard local hook on top of upstream's blocks but the YAML structure is independent.
  • Re-test on rebase:

```bash
bash scripts/ci/test_check_agent_worktree_drift.sh
# End-to-end: refused commit from main with active agent.
cd "$(git -C . rev-parse --show-toplevel)" && \
  bash scripts/ci/check-agent-worktree-drift.sh ; echo "exit=$?"
# Allowed commit from inside an agent worktree.
cd "$(git -C . rev-parse --show-toplevel)/.claude/worktrees/agent-" && \
  bash $OLDPWD/scripts/ci/check-agent-worktree-drift.sh ; echo "exit=$?"
```

0320 — psnr_cuda chroma extension (ADR-0351)

  • Touches: core/src/feature/cuda/integer_psnr/psnr_score.cu (kernel — fork-only file, BSD-3-Clause-Plus-Patent / Lusoris+Claude header) and core/src/feature/cuda/integer_psnr_cuda.c (host glue — fork-only file, same header). Neither path is in upstream Netflix/vmaf. The PTX module key (psnr_score) and nvcc extra flags table in core/src/meson.build are unchanged by this PR.
  • Invariant: the kernel signature now takes a plane parameter (unsigned) appended after (width, height). The host file's psnr_cuda_dispatch packs kernelParams[] in the exact (ref, dis, sse, &width, &height, &plane) order — any refactor that reorders or drops the trailing argument silently breaks the chroma path because cuLaunchKernel cannot validate argument types. The host also relies on the picture_cuda upload path having uploaded all 3 planes for non-YUV400P inputs (see libvmaf.c::translate_picture_host's upload_mask); a future "minimise upload" optimisation must consult the extractor's chars to decide which planes can be skipped, not assume luma-only.
  • Upstream interaction: none — CUDA backend is not in Netflix/vmaf upstream. Cherry-picks from upstream that touch core/src/feature/integer_psnr.c (the CPU twin) need attention only if they change psnr_name[], mse_name[], or the enable_chroma semantics; the CUDA path mirrors those conventions byte-for-byte to keep the cross-backend gate clean.
  • Re-test on rebase:

```bash
cd libvmaf && meson setup build -Denable_cuda=true && ninja -C build
python ../scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build/tools/vmaf \
  --reference ../testdata/ref_576x324_48f.yuv \
  --distorted ../testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr --backend cuda --places 4
```

ADR-0994 — coverage build-break fix: remove vmaf_fex_integer_motion_v2 from feature_extractor.cpp + guard motion_five_frame_window in integer_motion.c (2026-06-03)

  • Touches: core/src/feature/integer_motion.c, core/src/feature/feature_extractor.cpp, docs/adr/0994-coverage-build-fix-motion-v2-ref.md (new), docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0994-coverage-build-fix-motion-v2-ref.md (new).
  • No rebase-sensitive invariants: bug fix only. Both files touched are production C/C++ sources; no option table or ABI surface changed.
  • Relation to ADR-0337: when the prev_prev_ref picture-pool refactor (ADR-0337 deferred hunk) lands, flip the -ENOTSUP guard in integer_motion.c::init() and restore the fex->prev_prev_ref reference in extract(), then remove the motion_five_frame_window comment block added here.

ADR-0337 — motion_v2 public option surface duplication (2026-05-09)

  • Touches: core/src/feature/integer_motion_v2.c, core/src/feature/x86/motion_v2_avx2.c, core/src/feature/x86/motion_v2_avx512.c, core/src/feature/arm64/motion_v2_neon.c, docs/adr/0337-motion-v2-public-api-options.md (new), docs/adr/_index_fragments/0337-motion-v2-public-api-options.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated), changelog.d/added/motion-v2-public-api-options.md (new), changelog.d/fixed/motion-v2-mirror-off-by-one.md (new), docs/state.md (deferral row update), core/src/feature/AGENTS.md (invariant note).
  • Upstream cluster ported: Netflix/vmaf 856d3835 (mirror off-by-one fix, propagated to scalar + AVX2 + AVX-512 + fork-local NEON), c17dd898 (motion_max_val option), a2b59b77 (motion_five_frame_window option, partial — see Deferred-hunks below), 4e469601 (remaining options + motion3_v2_score provided feature, manual port adapted to 3-frame mode only).
  • Architectural decision: ADR-0337 picks A1 — duplicate option surfaces between motion v1 and motion_v2. v1 (integer_motion.c) and v2 (integer_motion_v2.c) each register their own VmafOption[] table; the seven option names match upstream byte-for-byte so future /sync-upstream runs find no behavioural delta. The duplication is purely textual (~80 LOC of option-table rows + 7 struct fields). Touching one extractor's help string requires touching the other; ADR-0141 catches drift on the next edit.
  • Invariants:
  • motion_five_frame_window=true returns -ENOTSUP at init() on motion_v2. Mirrors ADR-0219 §Decision's GPU motion3 precedent. The 3-frame default mode is fully supported. When the picture-pool plumbing follow-up lands, the -ENOTSUP guard flips to a prev_prev_ref lookup; until then any caller passing =true sees a hard error.
  • motion v1's option surface is the source of truth for the seven shared option names. ADR-0158 carries v1's history; ADR-0337 carries v2's. Both extractors emit independently into the feature collector under VMAF_integer_feature_motion*_score (v1) and VMAF_integer_feature_motion*_v2_score (v2) — there is no shared output namespace.
  • GPU twins (CUDA / SYCL / HIP / Vulkan) of motion_v2 do NOT yet register the option surface in this PR. Their motion3_v2_score emission is out of scope per ADR-0337 §Consequences; whether the GPU twins gain the same options follows when a model needs the score there. The mirror off-by-one fix in 856d3835 is propagated only to scalar + AVX2 + AVX-512 + NEON in this PR; CUDA / SYCL / HIP / Vulkan mirror formulae stay on the pre-fix 2*size - idx - 1 form and document the divergence. Refresh tracked as a follow-up.
  • Deferred upstream hunks (kept in upstream commits but not ported in this PR; tracked here so the next /sync-upstream finds them):
  • a2b59b77 hunks in core/src/feature/feature_extractor.h (adds VmafPicture prev_prev_ref to the per-extractor framework struct), core/src/libvmaf.c (picture-pool sizing n_threads * 2 + 2, prev_prev_ref plumbing through threaded_extract_func / threaded_extract_batch_func / threaded_read_pictures / threaded_read_pictures_batch, vmaf_close cleanup), core/tools/vmaf.c (CLI picture-pool sizing). Conflicts in 8 regions on the fork's read_pictures* decomposition (ADR-0152 monotonic-index gate) make an in-PR port unsafe; the picture-pool refactor will land as its own PR with a five-frame-window fixture under python/test/ once the framework changes are reviewed in isolation.
  • a2b59b77's 5-frame branch in extract() (uses fex->prev_prev_ref directly) lands together with the framework hunks above.
  • 4e469601's 5-frame flush() branch (variable stride / min_idx = 2, lo_idx/hi_idx window) is collapsed in this PR to the min_idx = 1 constant for 3-frame mode only. The branch is structurally one if (s->motion_five_frame_window) away from full upstream parity; reinstate when the prev_prev_ref plumbing lands.
  • On upstream sync: future /sync-upstream against Netflix/vmaf master will report all four commits as already ported (cherry-picked or hand-ported with (cherry picked from …) trailers). When the picture-pool refactor PR lands, this ledger row gains a "5-frame mode now wired" sub-bullet and the -ENOTSUP guard flips. Avoid re-discovering the four commits as pending — the deferred hunks are tracked above, not the whole commits.
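The mirror divergence noted for the GPU twins can be made concrete with a small index model. This is a sketch under stated assumptions: the ledger quotes only the pre-fix high-side form 2*size - idx - 1; the low-side branch (-idx) and the symmetric high-side form shown here are assumptions derived from reflecting about the edge sample, and may not match what 856d3835 actually changed.

```python
def mirror_pre_fix(idx, size):
    """High side uses the form quoted in the ledger: 2*size - idx - 1."""
    if idx < 0:
        return -idx  # assumed low-side reflection (edge sample not repeated)
    if idx >= size:
        return 2 * size - idx - 1
    return idx


def mirror_symmetric(idx, size):
    """Assumed counterpart: reflect about the last sample on the high side too."""
    if idx < 0:
        return -idx
    if idx >= size:
        return 2 * (size - 1) - idx
    return idx


size = 8
for idx in (-2, -1, 8, 9):
    print(idx, mirror_pre_fix(idx, size), mirror_symmetric(idx, size))
```

In this model the two forms agree on the low side and differ by exactly one index on the high side, where the pre-fix form repeats the edge sample.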

0310 — Vulkan VIF int64 reduction race condition Phase 3 fix

  • Touches: core/src/feature/vulkan/shaders/vif.comp (replaces all three bare barrier() calls with explicit memoryBarrierShared(); barrier(); pairs covering the Phase-1 cooperative tile load, the Phase-2 vertical-conv shared write, and the Phase-4 cross-subgroup int64 reduction); plus documentation under docs/research/0089-...md (Phase 3 status appendix), docs/adr/0269-...md (Phase 3 status appendix), docs/state.md (T-VK-VIF-1.4-RESIDUAL closed; new T-VK-VIF-1.4-RESIDUAL-ARC opened), core/src/vulkan/AGENTS.md (Phase 3 update on the existing invariant row), changelog.d/fixed/vif-int64-reduction-race-condition.md. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the shader is zero. The entry exists because the fix is rebase-sensitive: any future cherry-pick that touches vif.comp and downgrades a memoryBarrierShared(); barrier(); pair back to a bare barrier() will silently re-introduce the NVIDIA Vulkan 1.4 race.
  • Invariant: vif.comp shared-memory ordering between cooperative-write phases must be release-acquire, not just a bare workgroup-execution barrier. NVIDIA's Vulkan 1.4 default memory model requires the explicit shared-memory release; bare barrier() works at API 1.3 by accident on this driver. SCALE is irrelevant — the fix applies to all four pipeline specialisations because the barrier sites are in the SCALE-shared code. Do NOT remove the explicit memoryBarrierShared() calls even if a perf review claims they are redundant under the GLSL spec wording: empirical real-hardware evidence in research-0089 2026-05-09 appendix shows otherwise on NVIDIA driver 595.71.05.
  • Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on a NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan --device 1 --places 4. Expect 0/48 across all four scales. Run the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3" against --vulkan_device 1; expect 5 identical (integer_vif_num_scale2, integer_vif_den_scale2) = (+2.494358e+04, +2.522523e+04) pairs at frame 5. Note that --vulkan_device 0 on this multi-GPU host is the Intel Arc A380 lane and will still fail at API 1.4 (separate T-VK-VIF-1.4-RESIDUAL-ARC row Open).

0309 — Vulkan VIF API-1.4 Phase 2 dump (T-VK-VIF-1.4-RESIDUAL)

  • Touches: docs/research/0089-vulkan-vif-fp-residual-bisect-2026-05-08.md (2026-05-09 status appendix with empirical numbers from the live RTX 4090), docs/state.md (T-VK-VIF-1.4-RESIDUAL row updated with the localisation), core/src/vulkan/AGENTS.md (new invariant row pinning the SCALE = 2 cross-subgroup-reduction memory-model finding), CHANGELOG.md (lusoris fork "Changed" entry). No code touched; the Phase 3 shader memory-model fix lands in a separate PR. Upstream Netflix/vmaf has no Vulkan backend so conflict probability for the AGENTS.md row is zero — entry exists because the empirical localisation flips the open state-row hypothesis from FP-precision to memory-model and retires the places=3 override path that earlier rebase scaffolding might have suggested.
  • Invariant: vif.comp SCALE = 2 specialisation's Phase-4 cross-subgroup int64 reduction is non-deterministic on NVIDIA driver 595.71.05 + Vulkan 1.4.341 (lines 547–592, subgroupAdd barrier() + thread-0 read of s_lmem). API 1.3 lane is fully deterministic on the same hardware. The four apiVersion pinning sites in core/src/vulkan/common.c + core/src/vulkan/vma_impl.cpp stay at 1.3 until Phase 3 lands the explicit memory-scope barrier and a 5-run determinism gate confirms run-to-run identical (num, den) plus places=4 0/48 on NVIDIA. The places=3 override path is eliminated from the unblock options.
  • Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on a NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run the gate and the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3". Expect 45/48 places=4 failures on integer_vif_scale2 (max abs 1.527e-02) AND 5 distinct (integer_vif_num_scale2, integer_vif_den_scale2) pairs across 5 runs of --feature 'vif_vulkan=debug=true'. Both observations reproduced bit-for-bit on this session's hardware lane (UUID e478b41b-5c4f-1ddb-f990-e44916aff4c8).

0308 — encoder knob-sweep recipe-regression policy (ADR-0308, docs-only)

  • Touches: docs/research/0080-encoder-knob-sweep-findings.md, docs/adr/0308-encoder-knob-sweep-recipe-regression-policy.md, docs/adr/README.md (index row), ai/AGENTS.md (knob-sweep invariant section), changelog.d/changed/encoder-knob-sweep-findings.md. No code touched; companion to PR #400 (ADR-0305 + Research-0077 + ai/scripts/analyze_knob_sweep.py). Upstream Netflix/vmaf has no encoder-knob-sweep surface, so conflict probability is zero — this entry exists only because the policy threshold (7-of-9 structural cut) is rebase-sensitive on the corpus shape.
  • Invariant: the 7-of-9 source-count threshold from ADR-0308 §Decision point 1 is calibrated against the current 9-source Netflix Public Dataset corpus. If the corpus grows past 9 sources (e.g. UGC expansion per ADR-0287, or HDR additions), re-derive the absolute threshold as a fraction (≥7/9 ≈ 78 %). The structural cluster is sharp on the current corpus (top-15 cells all hit 9-of-9, no observed cells in 4-6 range), so a fractional cut at ~75 % is robust. Do NOT relax bitrate_tol_pct (default 5.0) or vmaf_tol (default 0.1) in ai/scripts/analyze_knob_sweep.py without an ADR — those tolerances are calibrated against the per-frame VMAF noise floor and bitrate quantisation in libavformat muxers.
  • Re-test: pytest ai/tests/test_knob_sweep_analysis.py -v (script logic; ships in PR #400). Policy gate is offline: regenerate runs/phase_a/full_grid/comprehensive.jsonl via tools/vmaf-tune/src/vmaftune/hw_encoder_corpus.py (3-hour run on a single host with NVENC + QSV) then re-run python ai/scripts/analyze_knob_sweep.py --jsonl <adapted.jsonl> --out-dir runs/phase_a/full_grid/reports/ and diff the resulting summary.md against docs/research/0080-encoder-knob-sweep-findings.md headline table. Structural cluster (top-15 cells, all 9-of-9) is the invariant to defend.
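If the corpus grows, the 7-of-9 cut has to be re-derived as a fraction. A minimal sketch of that arithmetic follows; the function name structural_threshold and the choice to round up are assumptions, not something ADR-0308 is known to specify.

```python
import math
from fractions import Fraction

STRUCTURAL_FRACTION = Fraction(7, 9)  # the 7-of-9 cut from ADR-0308


def structural_threshold(n_sources):
    """Smallest source count that is at least 7/9 of the corpus.

    Rounding up is an assumption; rounding down would loosen the cut.
    """
    return math.ceil(STRUCTURAL_FRACTION * n_sources)


for n in (9, 12, 18):
    print(n, structural_threshold(n))
```

Exact fractions are used so that corpus sizes that are multiples of 9 do not pick up a float rounding error at the boundary.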

0228 — Vulkan 1.4 bump deferred (ADR-0264, docs-only)

  • Touches: none (docs-only PR). Future Step A of T-VK-1.4-BUMP will touch core/src/feature/vulkan/shaders/vif.comp and core/src/feature/vulkan/shaders/ciede.comp; Step B will touch the three apiVersion sites in core/src/vulkan/common.c (lines 54, 264, 374) and the VMA_VULKAN_VERSION define in core/src/vulkan/vma_impl.cpp (line 22).
  • Invariant: master stays on VK_API_VERSION_1_3 and VMA_VULKAN_VERSION = 1003000. Lifting the constant in any future upstream sync (Netflix doesn't ship a Vulkan backend, so the conflict is improbable) without first auditing precise / OpDecorate ... NoContraction decoration on vif.comp and ciede.comp will reintroduce the NVIDIA-driver regression captured in research-0053. The psnr_hvs_strict_shaders -O0 list in core/src/vulkan/meson.build is the existing precedent for shader-side bit-exactness mitigations and should be the place a 1.4-era audit lands its results (potentially expanding to cover vif.comp + ciede.comp if the precise audit decides the optimizer is the right place to gate).
  • Re-test: when Step B lands, the gate is python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan and the same with --feature ciede against NVIDIA + RADV + lavapipe; max abs diff must stay ≤ 5.0e-05 (places=4) on all three.
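The relation between places=4 and the 5.0e-05 bound quoted above is half a unit in the fourth decimal place. A short sketch of that relation follows; the helper names are hypothetical, and scripts/ci/cross_backend_vif_diff.py may implement its comparison differently.

```python
def places_tolerance(places):
    """Half a unit in the last retained decimal place."""
    return 0.5 * 10.0 ** -places


def within_places(a, b, places=4):
    """True when |a - b| stays inside the places-derived bound."""
    return abs(a - b) <= places_tolerance(places)


print(places_tolerance(4))
print(within_places(1.00000, 1.00004))
print(within_places(1.0, 1.0001))
```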

0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)

0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix

0228 — vmaf-tune resolution-aware model selection (ADR-0289)

0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)

0228 — tools/vmaf-tune/ codec-agnostic encode dispatcher (ADR-0294)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/encode.py — refactored to look up the codec adapter and delegate argv composition. Wholly fork-local.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, codec_adapters/x264.py — adapter contract gains ffmpeg_codec_args(preset, quality) and extra_params(). Both are duck-typed; missing methods fall back to the legacy x264-CRF shape.
  • tools/vmaf-tune/tests/test_encode_multi_codec.py — new 19-test suite pinning the dispatcher contract per codec.
  • docs/usage/vmaf-tune.md — new "Codec adapter contract" section.
  • Invariant: the harness (encode.py, corpus.py) must not branch on codec identity. The only codec-aware code is the per-adapter codec_adapters/*.py file. Any future change that adds an if adapter.encoder == "..." to the harness defeats the whole purpose of ADR-0294. The corpus row schema stays at SCHEMA_VERSION=1: crf is preserved as the row column even when the underlying codec's quality knob is -cq / -qp / etc.; EncodeRequest.quality is a request-side property only. Adapters that don't yet expose ffmpeg_codec_args are intentionally permitted to fall back to the legacy x264-CRF shape; removing that fallback would break in-flight adapter PRs landing one-at-a-time.
  • Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/ -q
# 32 passed (13 existing + 19 multi-codec)

python -c "
from pathlib import Path
from vmaftune.encode import EncodeRequest, build_ffmpeg_command
req = EncodeRequest(
    source=Path('ref.yuv'), width=1920, height=1080,
    pix_fmt='yuv420p', framerate=24.0,
    encoder='libx264', preset='medium', crf=23,
    output=Path('out.mp4'),
)
cmd = build_ffmpeg_command(req)
assert cmd[cmd.index('-c:v') + 1] == 'libx264'
assert cmd[cmd.index('-preset') + 1] == 'medium'
assert cmd[cmd.index('-crf') + 1] == '23'
print('x264 dispatcher path OK')
"
```
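The duck-typed contract from 0228 (ADR-0294) can be sketched as follows. This is an illustration, not the code in encode.py: the two adapter classes, the codec_argv helper, and the list-of-strings return type are assumptions; only the method names ffmpeg_codec_args(preset, quality) and extra_params(), the encoder attribute, and the legacy x264-CRF fallback come from the entry.

```python
class LegacyAdapter:
    """Hypothetical adapter that predates ffmpeg_codec_args."""
    encoder = "libx264"


class CqStyleAdapter:
    """Hypothetical adapter whose quality knob is -cq rather than -crf."""
    encoder = "h264_nvenc"

    def ffmpeg_codec_args(self, preset, quality):
        return ["-c:v", self.encoder, "-preset", preset, "-cq", str(quality)]

    def extra_params(self):
        return []


def codec_argv(adapter, preset, quality):
    """Compose codec argv without ever comparing adapter.encoder to a name."""
    build = getattr(adapter, "ffmpeg_codec_args", None)
    if build is None:
        # Legacy x264-CRF shape for adapters missing the new method.
        argv = ["-c:v", adapter.encoder, "-preset", preset, "-crf", str(quality)]
    else:
        argv = list(build(preset, quality))
    extra = getattr(adapter, "extra_params", None)
    if extra is not None:
        argv.extend(extra())
    return argv


print(codec_argv(LegacyAdapter(), "medium", 23))
print(codec_argv(CqStyleAdapter(), "medium", 23))
```

The harness-side function reads adapter.encoder as data but never branches on its value, which is the property the invariant protects.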

0260 — vmaf-tune --sample-clip-seconds (ADR-0301)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/{cli,corpus,encode,score,__init__}.py — fork-local. No upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/tests/test_corpus.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md.
  • Invariant: corpus JSONL SCHEMA_VERSION bumped to 2 — additive clip_mode key only. Sample-clip windows are mirrored on both sides via FFmpeg input-side -ss/-t (encode) and libvmaf's --frame_skip_ref / --frame_cnt (score). The _resolve_sample_clip() helper is the single source of truth for the centre-anchored slice math; do not duplicate the computation elsewhere. Falls back silently to "full" when N >= duration_s.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep sample-clip
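The centre-anchored slice math from 0260 amounts to splitting the unused duration evenly on both sides of the window. A minimal sketch, assuming a tuple return shape and a placeholder "sample" mode label; the real _resolve_sample_clip() signature and its clip_mode values other than "full" are not shown in this ledger.

```python
def resolve_sample_clip(duration_s, sample_s):
    """Return (clip_mode, start_s, length_s) for a centre-anchored window.

    Falls back to the full clip when the requested window is not shorter
    than the source, as the invariant describes.
    """
    if sample_s is None or sample_s >= duration_s:
        return ("full", 0.0, float(duration_s))
    start_s = (duration_s - sample_s) / 2.0
    return ("sample", start_s, float(sample_s))  # "sample" is a placeholder label


print(resolve_sample_clip(60.0, 10))
print(resolve_sample_clip(8.0, 10))
```

The same start and length would then feed both the encode side (-ss / -t) and the score side (--frame_skip_ref / --frame_cnt), which is why the computation should live in one helper.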

0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_amf,hevc_amf,av1_amf,_amf_common}.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry extended with three AMF entries.
  • tools/vmaf-tune/tests/test_codec_adapter_amf.py (new).
  • tools/vmaf-tune/tests/test_corpus.py — Phase A test renamed from test_known_codecs_phase_a_is_x264_only to test_known_codecs_includes_x264_and_amf.
  • tools/vmaf-tune/AGENTS.md — adds AMF preset-compression invariant.
  • docs/usage/vmaf-tune.md — adds Hardware encoders section.
  • Invariant: the 7-into-3 preset compression table in _amf_common.py (_PRESET_TO_AMF) is the cross-codec axis Phase B / C consumers depend on. Every AMF adapter accepts the canonical 7 preset names (placebo through ultrafast) and maps them onto the three AMF rungs (quality / balanced / speed). Do not extend the preset vocabulary without amending ADR-0282; registry uniformity (no codec-identity branching in the harness search loop) rests on every codec accepting the same names.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q

0228 — vmaf-tune resolution-aware model selection (ADR-0289)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/resolution.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/corpus.py — adds CorpusOptions.resolution_aware: bool = True and pipes the effective model through score_res.request.model into the JSONL row.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds --resolution-aware / --no-resolution-aware (BooleanOptionalAction, default on).
  • tools/vmaf-tune/tests/test_resolution.py (new).
  • docs/usage/vmaf-tune.md — new "Resolution-aware mode" section.
  • docs/adr/0289-vmaf-tune-resolution-aware.md (new) + docs/research/0064-vmaf-tune-resolution-aware.md (new).
  • tools/vmaf-tune/AGENTS.md — two new invariant notes.
  • Invariant: the height-only decision rule (height >= 2160 → vmaf_4k_v0.6.1, else vmaf_v0.6.1) is the documented contract. The JSONL vmaf_model field is now per-row (not per-job); mixed ladder corpora legitimately contain multiple distinct values across rows. Downstream consumers (Phase B / C / D) must group/filter by vmaf_model rather than assuming a constant. Width is accepted in the API for symmetry but ignored in the body; do not branch on it without a follow-up ADR.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep resolution-aware
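The height-only rule and the per-row vmaf_model consequence can be sketched in a few lines. The function name select_vmaf_model, the row dictionaries, and the "height" key are assumptions for illustration; the two model names, the 2160 cutoff, and the vmaf_model field come from the invariant.

```python
from collections import Counter


def select_vmaf_model(width, height):
    """Height-only rule: width is accepted for symmetry but ignored."""
    del width
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


# A mixed ladder yields more than one model within a single corpus.
ladder = [(3840, 2160), (1920, 1080), (1280, 720)]
rows = [{"height": h, "vmaf_model": select_vmaf_model(w, h)} for w, h in ladder]

# Downstream consumers group by the per-row field instead of assuming one value.
per_model = Counter(row["vmaf_model"] for row in rows)
print(dict(per_model))
```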

0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix

  • Touches:
  • core/tools/y4m_input.c — upstream-mirrored Daala-derived Y4M parser. The fix sits inside the 4:1:1 → 4:2:2-jpeg chroma upsample routine y4m_convert_411_422jpeg, lines ~500–530 in the function's three sub-loops. Upstream Netflix/vmaf carries the same shape; if upstream lands its own fix during a sync, prefer the upstream version and drop ours.
  • core/test/test_y4m_411_oob.c (new, fork-local) — drives the minimal W=2 H=4 4:1:1 stream through video_input_open + video_input_fetch_frame. Wholly fork-added; no upstream collision.
  • core/test/meson.build — adds test_y4m_411_oob executable + test() registration.
  • Invariant: the first two sub-loops of y4m_convert_411_422jpeg must guard _dst[(x << 1) | 1] writes with (x << 1 | 1) < dst_c_w, matching the third sub-loop's existing guard. Without the guard a 4:1:1 stream of width 2 (dst_c_w == 1) writes one byte past the destination chroma row.
  • Re-test:
  • cd libvmaf && meson setup ../build-asan --buildtype=debug -Db_sanitize=address -Db_lundef=false -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
  • ninja -C build-asan test/test_y4m_411_oob
  • ASAN_OPTIONS=detect_leaks=0 ./build-asan/test/test_y4m_411_oob — must report 1 tests run, 1 passed. Pre-fix the binary aborts with AddressSanitizer: heap-buffer-overflow … WRITE of size 1 at y4m_input.c:507.
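The out-of-bounds write can be reproduced with a simplified index model. This is a sketch, not the C routine: it copies samples instead of filtering them, and it assumes each source chroma sample x produces writes at dst[x << 1] and dst[(x << 1) | 1]. Only the guard expression and the W=2 case (dst_c_w == 1) come from the entry.

```python
def upsample_row(src, dst_c_w, guarded):
    """Model the paired writes; return (dst, indices written out of bounds)."""
    dst = [None] * dst_c_w
    out_of_bounds = []
    for x, sample in enumerate(src):
        even = x << 1
        odd = (x << 1) | 1
        dst[even] = sample
        if odd < dst_c_w:
            dst[odd] = sample
        elif not guarded:
            # The unguarded C code would write here, one byte past the row.
            out_of_bounds.append(odd)
    return dst, out_of_bounds


# Width-2 4:1:1 stream: one source chroma sample, one destination chroma sample.
print(upsample_row([128], dst_c_w=1, guarded=False))
print(upsample_row([128], dst_c_w=1, guarded=True))
```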

0270 — saliency_student_v1 fork-trained on DUTS-TR (ADR-0286)

  • Touches:
  • model/tiny/registry.json — adds the saliency_student_v1 row. Fork-local registry; no upstream overlap.
  • model/tiny/saliency_student_v1.onnx (+ .json sidecar) — new weights and metadata. Fork-local.
  • ai/scripts/train_saliency_student.py — new training script. Wholly fork-local under ai/, which has no upstream counterpart.
  • docs/ai/models/saliency_student_v1.md, docs/research/0062-saliency-student-from-scratch-on-duts.md, docs/adr/0286-saliency-student-fork-trained-on-duts.md — new docs under fork-local trees.
  • Invariant: the C-side feature_mobilesal.c extractor's tensor-name contract, input (NCHW [1, 3, H, W]) and saliency_map (NCHW [1, 1, H, W]), must continue to match the ONNX graph for both saliency_student_v1.onnx and the legacy mobilesal.onnx placeholder. Future weight swaps can change the graph internals freely but must keep these names + shapes; the smoke test asserts the registration. The op-allowlist constraint (graph uses only ops in core/src/dnn/op_allowlist.c) carries over from ADR-0218: Resize is not used; ConvTranspose is the upsample op for v1 to keep the graph load-clean against vanilla origin/master.
  • Re-test:
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python -c "
from ai.src.vmaf_train.op_allowlist import check_model
from pathlib import Path
r = check_model(Path('model/tiny/saliency_student_v1.onnx'))
assert r.ok, r.pretty()
print('allowlist OK')
"
meson test -C build --suite=fast mobilesal

0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)

  • Touches:
  • core/src/feature/hip/float_ansnr_hip.{c,h} (new) — fifth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_ansnr_cuda.c call-graph-for-call-graph; init/submit/collect/close invoke the kernel-template helpers in the same order; the submit body intentionally bypasses vmaf_hip_kernel_submit_pre_launch (no atomic, kernel writes per-block (sig, noise) interleaved float partials directly).
  • core/src/hip/meson.build — adds the new TU to hip_sources.
  • core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_float_ansnr_hip; declaration and the registry row under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — adds test_float_ansnr_hip_extractor_registered sub-test pinning the lookup contract.
  • Invariant — the submit_pre_launch bypass is load-bearing. The CUDA twin makes the same choice for the same reason. If a future PR adds a submit_pre_launch call to float_ansnr_cuda.c's submit path, the HIP twin must follow in the same PR. Likewise the readback shape (wg_count * 2u * sizeof(float)) and the bpc table (peak/psnr_max for 8/10/12/16-bit) mirror the CUDA twin verbatim — keep aligned on rebase.
  • Re-test on rebase:
cd libvmaf
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build  # 48/48 green (47 CPU + HIP smoke)
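The readback shape wg_count * 2u * sizeof(float) implies a host-side pass over interleaved (sig, noise) pairs. A sketch of that pass follows; the function name, the little-endian byte order, and summing the partials in Python floats are assumptions, and the actual collect path in the C sources may accumulate differently.

```python
import struct


def reduce_ansnr_partials(buf, wg_count):
    """Sum interleaved per-block (sig, noise) float partials from a readback buffer."""
    expected = wg_count * 2 * struct.calcsize("<f")
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    values = struct.unpack(f"<{wg_count * 2}f", buf)
    sig = sum(values[0::2])    # even slots: sig partials
    noise = sum(values[1::2])  # odd slots: noise partials
    return sig, noise


# Three workgroups, values chosen to be exactly representable as float32.
buf = struct.pack("<6f", 1.0, 0.5, 2.0, 0.25, 3.0, 0.125)
print(reduce_ansnr_partials(buf, 3))
```

The length check is the useful part on rebase: if either twin changes the number of floats per block, a mismatched buffer fails loudly here instead of being summed with the wrong stride.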

0230 — HIP sixth-consumer kernel motion_v2_hip (ADR-0267)

  • Touches:
  • core/src/feature/hip/integer_motion_v2_hip.{c,h} (new) — sixth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_motion_v2_cuda.c call-graph-for-call-graph; carries the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag and a flush() callback. The state struct has a uintptr_t pix[2] ping-pong slot pair tracked outside the kernel-template (the template models a single device+host pair only).
  • core/src/hip/meson.build — adds the new TU to hip_sources.
  • core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_hip; declaration and the registry row under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — adds test_motion_v2_hip_extractor_registered sub-test pinning the lookup contract (extractor name is motion_v2_hip, matching the CUDA twin's motion_v2_cuda naming).
  • Invariant — temporal-extractor + ping-pong shape. The VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit, the flush() callback registration, and the uintptr_t pix[2] slot pair are load-bearing for the runtime PR (T7-10b). The runtime PR will swap uintptr_t pix[2] for a real device-buffer handle pair matching the CUDA twin's VmafCudaBuffer *pix[2]. On rebase: if the CUDA twin's flush-pass shape changes (currently min(score[i], score[i+1])), update the HIP twin's flush_fex_hip body in the same PR.
  • Re-test on rebase: same as 0229 — meson test -C build with enable_hip=true exercises the smoke contract.

0227 — ms_ssim_vulkan submit-side migrated to kernel_template (T-GPU-DEDUP-26)

  • Touches:
  • core/src/feature/vulkan/ms_ssim_vulkan.cextract()'s raw VkCommandBuffer / VkFence / vkAllocateCommandBuffers / vkBeginCommandBuffer / vkCreateFence / vkQueueSubmit / vkWaitForFences / vkDestroyFence / vkFreeCommandBuffers blocks become VmafVulkanKernelSubmit triples (vmaf_vulkan_kernel_submit_begin / _submit_end_and_wait / _submit_free). One triple covers the decimate-pyramid command buffer; one triple per scale covers the per-scale SSIM submit. The pipeline-side bundles (pl_decimate 2-binding 4-variant + pl_ssim 10-binding 9-variant) and their _add_variant() chains are unchanged from the prior migration.
  • Invariant: any future submit-side template change (timeline semaphores, deferred fence release, queue-family parameterisation) must keep the helpers' synchronous-wait + per-frame fence + per-frame command-buffer contract intact, since ms_ssim_vulkan.c does host readback of the l_partials / c_partials / s_partials buffers immediately after _submit_end_and_wait returns. The submit-side contract is the same one already documented in core/src/vulkan/AGENTS.md's "Rebase-sensitive invariants" section for kernel_template.h.
  • Re-test:

```bash
cd libvmaf && meson test -C build
python scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature float_ms_ssim --backend vulkan --places 4
```

0231 — SHA-pin GitHub Actions (OSSF Pinned-Dependencies)

  • Touches: every workflow file under .github/workflows/. All 13 fork workflows (docker-image.yml, docs.yml, ffmpeg-integration.yml, libvmaf-build-matrix.yml, lint-and-format.yml, nightly-bisect.yml, nightly.yml, release-please.yml, rule-enforcement.yml, scorecard.yml, security-scans.yml, supply-chain.yml, tests-and-quality-gates.yml) had their uses: directives rewritten from <owner>/<repo>@vN[.M.K] to <owner>/<repo>@<40-char-sha> # vN.M.K. 97 references converted; the SLSA reusable-workflow ref in supply-chain.yml is the single documented holdout (see Invariant below).
  • Invariant — SHA-pin policy for uses:. Every action reference in .github/workflows/*.yml MUST be a 40-char commit SHA with the semver tag preserved as a trailing # vN.M.K comment. The OSSF Scorecard Pinned-Dependencies check parses both forms and a floating tag (@vN) is treated as unpinned and counts against the aggregate score. Single permitted exception: the SLSA generator reusable workflow (slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml) must keep its vX.Y.Z tag form because GitHub Actions consumers cannot SHA-pin reusable-workflow refs in every code path; the exception is documented inline in supply-chain.yml and survives on each rebase. Why this matters on upstream sync: Netflix upstream does not ship the fork's CI tree, so a /sync-upstream run that drags new workflow content (e.g. via repository templates or bot-authored bumps) into .github/workflows/ can re-introduce floating-tag references unnoticed. The post-rebase check below is the standing gate — anything that lights up needs to be re-pinned before merging the sync.
  • Re-test on rebase:
# Anything that prints is a regression — every uses: must be either
# already SHA-pinned (40 hex) or, for the documented SLSA exception,
# the slsa-github-generator reusable-workflow ref.
grep -hnE '^\s*(- )?uses:\s+[^@]+@[^ #]+\s*$' .github/workflows/*.yml \
  | grep -vE '@[a-f0-9]{40}' \
  | grep -v 'slsa-framework/slsa-github-generator/.github/workflows/'
# SHA-resolution sanity for any new pin (per-action):
gh api repos/<owner>/<repo>/git/ref/tags/<vN.M.K> --jq '.object.sha'
# If the result is a "tag" object (annotated tag), deref:
gh api repos/<owner>/<repo>/git/tags/<sha-from-prev> --jq '.object.sha'
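
The pin rule itself is simple enough to state in code. The following is a minimal sketch, not something that exists in the fork: `uses_ref_is_sha_pinned` is a hypothetical helper name, and the SHA and tag in the usage note are placeholders rather than real pins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the fork). Returns true when the ref
 * after the first '@' in an "owner/repo@ref" value is exactly 40
 * lowercase hex characters, i.e. a full commit SHA. A trailing
 * " # vN.M.K" comment is ignored. */
static bool uses_ref_is_sha_pinned(const char *uses_value)
{
    assert(uses_value != NULL);
    const char *at = strchr(uses_value, '@');
    if (at == NULL)
        return false; /* no ref at all, e.g. a local action path */
    const char *ref = at + 1;
    size_t n = 0;
    while (ref[n] != '\0' && ref[n] != ' ' && ref[n] != '#')
        n++;
    if (n != 40)
        return false; /* floating tag or abbreviated SHA */
    for (size_t i = 0; i < n; i++) {
        char c = ref[i];
        bool hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
        if (!hex)
            return false;
    }
    return true;
}
```

Under these assumptions, `uses_ref_is_sha_pinned("actions/checkout@v4")` is false, while the same action followed by a 40-character lowercase hex ref and a `# vN.M.K` comment is true. The SLSA exception would still need a separate allow-list, as in the grep pipeline above.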

0226 — CUDA drain-batch engine-loop opt (T-GPU-OPT-1)

  • Touches:
  • core/src/cuda/drain_batch.{h,c} (new) — TLS drain-batch table + shared drain stream + _open()/register/_flush()/_close() API.
  • core/src/libvmaf.c — engine-side per-frame loop now wraps submit/collect with _open() + _flush() so all CUDA extractor finished events are waited on a single shared drain stream.
  • All 12 CUDA feature kernels (core/src/feature/cuda/*.c) register their finished event + drained flag with the drain batch on submit; collect skips its private cuStreamSynchronize when drained is true.
  • Invariant — drained-flag contract. Every CUDA extractor's collect path must check the per-frame drained flag and skip its own cuStreamSynchronize when set; otherwise the drain batching is a no-op. The flag is reset to false per frame inside vmaf_cuda_drain_batch_register().
  • Re-test on rebase:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast cuda

Expected: all CUDA tests green; bench shows ≥5% wall-clock gain on a 7-extractor VMAF model (model.json with all feature extractors enabled).
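
The drained-flag contract can be modelled without CUDA. This is a sketch under stated assumptions: every `Mock*` type and `mock_*` function is invented for illustration, with a counter standing in for the private cuStreamSynchronize call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-in for a CUDA stream: counts private synchronisations. */
typedef struct {
    int sync_calls;
} MockStream;

static void mock_stream_synchronize(MockStream *s) { s->sync_calls++; }

typedef struct {
    MockStream stream;
    bool drained; /* set by the engine-side flush, cleared on register */
} MockExtractorState;

/* Register for a new frame: the flag starts out false every frame. */
static void mock_register(MockExtractorState *s) { s->drained = false; }

/* Engine-side flush: one shared wait covers every registered extractor. */
static void mock_flush(MockExtractorState *states, int n)
{
    assert(states != NULL);
    for (int i = 0; i < n; i++)
        states[i].drained = true;
}

/* Collect: skip the private sync when the batch already drained us. */
static void mock_collect(MockExtractorState *s)
{
    if (!s->drained)
        mock_stream_synchronize(&s->stream);
}
```

An extractor whose collect path ignores `drained` would still produce correct scores in this model, but it would synchronise on every frame, which is exactly the no-op case the invariant warns about.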

0225 — Netflix bench snapshot regen (upstream a44e5e61 motion fix)

  • Touches:
  • testdata/netflix_benchmark_results.json — fork-added snapshot. CPU rows now reflect the post-fix motion feature; cuda / sycl rows from the previous regen are preserved unchanged because those backends were not exercised on this rerun (host-environment tooling — wrong renderD path, libvmaf_cuda not enabled in the local FFmpeg build). Future full regens should include cuda / sycl.
  • testdata/bench_all.sh — default VMAF= no longer points at /usr/local/bin/vmaf (which on most dev hosts is stuck at the pre-upstream-a44e5e61 v3.0.0); now defaults to the in-tree fork build at core/build/tools/vmaf.
  • testdata/benchmark_netflix.py — FFMPEG, YUVDIR and the hardcoded LD_LIBRARY_PATH=/usr/local/lib are now overridable via VMAF_FFMPEG, VMAF_YUVDIR and any caller-set LD_LIBRARY_PATH.
  • Invariant: the snapshot's CPU pooled VMAF for src01_576x324 is 76.667828 (post-fix), not 76.668904 (the upstream-buggy mirror). If /sync-upstream ever re-pulls a Netflix change that touches motion.c mirror-handling, this number is the reference.
  • Re-test:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
LD_LIBRARY_PATH=$(pwd)/build/src python3 \
    ../testdata/benchmark_netflix.py

Expected CPU pooled rows: 76.667828, 35.068672, 7.985899.

0224 — CUDA graph capture feasibility (research-0047, DEFER)

  • Touches: none — investigation-only; no code lands. The research digest docs/research/0047-cuda-graph-capture-feasibility.md documents why a CUDA graph capture path on the per-frame submit chain is deferred rather than shipped (realised wall-clock gain capped at ~1-3% vs. the predicted 10-20%, with a 4-slot picture-pool rotation that defeats single-graph capture and forces per-frame cuGraphExecKernelNodeSetParams rebinding for (ref, dis) device pointers).
  • Invariant: the kernel_template.h docstring keeps naming VmafCudaKernelLifecycle.finished as a graph-capture hook point. Don't prune that comment on rebase — leaving the door open in the template is free, and the digest's "what needs to be true for a future GO" section depends on the hook still being there.
  • Re-test on rebase:
# Confirm the docstring still references graph capture as the hook
# point — wording change is fine, removal is not.
grep -q "graph capture" core/src/cuda/kernel_template.h

0223 — ADR slug-drift repair in CHANGELOG / rebase-notes (PR #304 follow-up)

  • Touches: CHANGELOG.md, docs/rebase-notes.md. No code; no upstream-shared path; no public-API surface.
  • Invariant: every [ADR-NNNN](docs/adr/NNNN-slug.md) link in the fork's tracked docs resolves to an actual on-disk file under docs/adr/. Repaired 4 broken slugs that did not exist on disk (0138-iqa-convolve-avx2-bitexact-double → 0138-iqa-convolve-avx2-bitexact-double, 0140-simd-dx-framework → 0140-simd-dx-framework, 0190-ms-ssim-vulkan → 0190-ms-ssim-vulkan, 0178-vulkan-adm-kernel → 0178-vulkan-adm-kernel). All retained their cited NNNN per ADR-0028 (NNNN is immutable once Accepted).
  • Re-test on rebase: from repo root, the following must print no lines:
for ref in $(grep -ohE 'docs/adr/[0-9]{4}-[a-z0-9-]+\.md' \
    CHANGELOG.md docs/rebase-notes.md AGENTS.md docs/state.md \
    | sort -u); do
  test -f "$ref" || echo "MISSING: $ref"
done

0125 — cambi_vulkan migrated to kernel_template (T-GPU-DEDUP-25, 5-bundle)

  • Touches:
  • core/src/feature/vulkan/cambi_vulkan.c — state's quintet (dsl_2bind + 5× pl_layout_* + shader_modules[CAMBI_PL_COUNT] + shared desc_pool) collapses to five VmafVulkanKernelPipeline bundles (pl_trivial, pl_derivative, pl_filter_mode, pl_decimate, pl_mask_dp), each owning its own descriptor pool. The first slot of pipelines[] per stage aliases the bundle's base pipeline; CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, and CAMBI_PL_MASK_THRESHOLD are sibling variants built via vmaf_vulkan_kernel_pipeline_add_variant().
  • cambi_vk_alloc_set takes a bundle pointer (->desc_pool / ->dsl) — every dispatch site picks the bundle that matches its push-constant struct.
  • The cambi_vk_make_dsl / cambi_vk_make_pl / cambi_vk_create_shader / cambi_vk_build_pipeline helpers are dropped — the template subsumes them.
  • Invariant — variants destroyed before bundle, base alias must be skipped. Five distinct push-constant struct sizes (CambiVkPushTrivial / CambiVkPushDerivative / CambiVkPushFilterMode / CambiVkPushDecimate / CambiVkPushMaskDp) force five bundles even though every stage's DSL is 2-binding SSBO; _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots (CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, CAMBI_PL_MASK_THRESHOLD) before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle.
  • Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): cambi mean = 0.0, identical to pre-migration (the pair has no banding artifacts).
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no Vulkan backend, so there is nothing to merge against.

0124 — ssimulacra2_vulkan migrated to kernel_template (T-GPU-DEDUP-24, 4-bundle)

  • Touches:
  • core/src/feature/vulkan/ssimulacra2_vulkan.c — state's 16 long-lived pipeline-object fields (4× *_dsl + *_pl + *_shader + the shared desc_pool) collapse to four VmafVulkanKernelPipeline bundles (pl_xyb, pl_mul, pl_blur, pl_ssim), each owning its own descriptor pool. The first slot of each per-bundle pipeline array (xyb_pipelines[0], mul_pipelines[0], blur_pipelines_h[0], ssim_pipelines[0]) aliases the bundle's base VkPipeline; remaining per-scale / per-pass slots are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • ss2v_build_pipeline_int3 reroutes through _add_variant() instead of calling vkCreateComputePipelines directly; ss2v_alloc_set takes a bundle pointer (->desc_pool / ->dsl) instead of a separate DSL argument; descriptor-set free sites at the tail of ss2v_run_scale route to each bundle's pool.
  • The ss2v_make_dsl / ss2v_make_pl / ss2v_create_shader helpers are dropped — the template subsumes them.
  • Invariant — variants destroyed before bundle, slot 0 alias must be skipped. Four distinct DSL shapes (XYB = 6 SSBOs, MUL = 3, BLUR = 2, SSIM = 8) prevent collapsing to one bundle: _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots in xyb_pipelines[1..N-1], mul_pipelines[1..N-1], ssim_pipelines[1..N-1], blur_pipelines_h[1..N-1], and every slot of blur_pipelines_v[] before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle, and must skip slot 0 of the first three arrays + blur_pipelines_h to avoid double-freeing the aliased base.
  • Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): ssimulacra2 mean = 24.613842, identical to pre-migration.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no ssimulacra2 extractor and no Vulkan backend, so there is nothing to merge against.

0118 — psnr_hvs_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-18)

  • Touches:
  • core/src/feature/vulkan/psnr_hvs_vulkan.c — state's dsl + pipeline_layout + shader + desc_pool + pipeline[3] collapses to VmafVulkanKernelPipeline pl + VkPipeline pipeline_chroma_u + VkPipeline pipeline_chroma_v. Plane 0 is the template's base pipeline; planes 1+2 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • New psnr_hvs_plane_pipeline() accessor maps plane index to the right VkPipeline handle.
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the chroma U/V variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7.
  • Numerical contract: unchanged. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0119 — vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-19)

  • Touches:
  • core/src/feature/vulkan/vif_vulkan.c — state's dsl + pipeline_layout + shader + desc_pool + pipelines[4] collapses to VmafVulkanKernelPipeline pl + VkPipeline scale_variants[3]. Scale 0 is the template's base pipeline; scales 1, 2, 3 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • New vif_scale_pipeline() accessor maps scale index to the right VkPipeline handle (replaces s->pipelines[scale]).
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 3 scale variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7 and psnr_hvs_vulkan in T-GPU-DEDUP-18.
  • Numerical contract: unchanged. Same shaders, same spec-constants, same push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0120 — float_vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-20)

  • Touches:
  • core/src/feature/vulkan/float_vif_vulkan.c — state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[2][4] 2-D lookup table is preserved so the existing [mode][scale] dispatch path stays clean, but pipelines[0][0] aliases s->pl.pipeline (the template's base). The other 6 entries are sibling pipelines created via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 6 sibling variants (every (mode, scale) except (0, 0)) before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan / psnr_hvs_vulkan / vif_vulkan.
  • Invariant — pipelines[0][0] aliasing. The base pipeline handle is owned by s->pl.pipeline; we copy it into pipelines[0][0] after _create() so the dispatch path can use a uniform 2-D lookup. The destroy loop must skip (mode=0, scale=0) to avoid double-freeing the template's pipeline.
  • Numerical contract: unchanged. Same shaders, spec-constants (mode + scale), push-constants. Netflix-pair smoke matches integer_vif bit-identically to 4 decimals.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.
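
The two float_vif invariants (variants first, aliased slot skipped) combine into one teardown order. The sketch below models it with mock handles; the `Mock*` names are illustrative assumptions, and a per-handle counter replaces vkDestroyPipeline so that a double free becomes observable.

```c
#include <assert.h>
#include <stddef.h>

#define MODES 2
#define SCALES 4

/* Mock handle: the real table holds VkPipeline handles. */
typedef struct {
    int destroy_count;
} MockPipeline;

typedef struct {
    MockPipeline *base;                     /* owned by the bundle */
    MockPipeline *pipelines[MODES][SCALES]; /* [0][0] aliases base */
} MockState;

static void mock_destroy_pipeline(MockPipeline *p)
{
    if (p != NULL)
        p->destroy_count++;
}

/* Bundle teardown releases the base pipeline exactly once. */
static void mock_bundle_destroy(MockState *s)
{
    assert(s != NULL);
    mock_destroy_pipeline(s->base);
    s->base = NULL;
}

static void mock_close(MockState *s)
{
    /* 1. Destroy the sibling variants, skipping the aliased slot. */
    for (int m = 0; m < MODES; m++) {
        for (int sc = 0; sc < SCALES; sc++) {
            if (m == 0 && sc == 0)
                continue; /* alias of base: the bundle frees it */
            mock_destroy_pipeline(s->pipelines[m][sc]);
            s->pipelines[m][sc] = NULL;
        }
    }
    s->pipelines[0][0] = NULL; /* drop the alias without destroying it */
    /* 2. Only then tear down the bundle. */
    mock_bundle_destroy(s);
}
```

Removing the `continue` makes the base handle's counter reach 2 in this model, which corresponds to the double free the invariant forbids. The same shape applies to the [stage][scale] tables in adm_vulkan and float_adm_vulkan.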

0122 — float_adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-22)

  • Touches:
  • core/src/feature/vulkan/float_adm_vulkan.c — twin to adm_vulkan (T-GPU-DEDUP-21); 16-pipeline 2-D [stage][scale] array. State collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl. pipelines[0][0] aliases s->pl.pipeline; the other 15 entries are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariants:
  • Variants destroyed before bundle.
  • pipelines[0][0] aliasing — destroy loop must skip (stage=0, scale=0).
  • Numerical contract: unchanged. Same float (_s suffix) primitives from adm_tools.c; same 5-element spec-constant tuple; same float partial accumulation reduced in double on the host.

0121 — adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-21)

  • Touches:
  • core/src/feature/vulkan/adm_vulkan.c — state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[4][4] 2-D lookup is preserved so the per-stage dispatch path stays clean. pipelines[0][0] aliases s->pl.pipeline (the template's base); the other 15 entries are sibling pipelines via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariants:
  • Variants destroyed before bundle (same rule as ssim_vulkan / psnr_hvs / vif / float_vif).
  • pipelines[0][0] aliasing — destroy loop must skip (stage=0, scale=0) to avoid double-freeing the template's pipeline.
  • Numerical contract: unchanged. Same shaders + 5-element spec-constant tuple (width, height, bpc, scale, stage) + push-constants.
  • Rebase impact: low. Builds on top of PR #272.

0123 — ms_ssim_vulkan 2-bundle migration (T-GPU-DEDUP-23)

  • Touches:
  • core/src/feature/vulkan/ms_ssim_vulkan.c — state collapses decimate_dsl + decimate_pl + decimate_shader + ssim_dsl + ssim_pl + ssim_shader + desc_pool (7 fields) to two bundles VmafVulkanKernelPipeline pl_decimate + pl_ssim. Each bundle owns its own descriptor pool. The kernel has two distinct pipeline shapes (decimate = 2 SSBO bindings, ssim = 10 bindings), so two bundles is the minimum — _add_variant() only siblings pipelines under the same layout.
  • decimate_pipelines[0] aliases pl_decimate.pipeline (the template's base = scale 0). The remaining MS_SSIM_SCALES - 2 decimate variants (scales 1..3) are siblings via _add_variant().
  • ssim_pipeline_horiz[0] aliases pl_ssim.pipeline (base = scale 0, pass 0). The other 9 entries (4× ssim_pipeline_horiz for scales 1..4, plus 5× ssim_pipeline_vert for scales 0..4) are variants.
  • Invariant — variants destroyed before bundle. Same rule as ADR-0106 entry 0106: close_fex must destroy decimate_pipelines[1..3] and ssim_pipeline_horiz[1..4] + ssim_pipeline_vert[0..4] before calling vmaf_vulkan_kernel_pipeline_destroy() on pl_decimate / pl_ssim.
  • Invariant — [0] aliasing destroy-skip. decimate_pipelines[0] and ssim_pipeline_horiz[0] must not be passed to vkDestroyPipeline in close_fex, because _destroy() already releases them via pl_decimate.pipeline / pl_ssim.pipeline. Double-free is UB. The destroy loops in close_fex start at i = 1 for decimate and skip i == 0 for ssim_horiz.
  • Invariant — per-bundle descriptor pool. The shared s->desc_pool is gone; alloc_descriptor_set now takes a const VmafVulkanKernelPipeline *bundle and uses bundle->desc_pool + bundle->dsl. Per-frame vkFreeDescriptorSets calls must target the matching pool (pl_decimate.desc_pool for decimate sets, pl_ssim.desc_pool for ssim sets) — mixing them is undefined behavior.
  • Numerical contract: unchanged. Same shaders, spec constants, push constants, and dispatch order as before. float_ms_ssim Netflix-pair smoke (576×324×48f) reports mean 0.963241; ssim pyramid intermediate values bit-identical to pre-migration run.
  • Rebase impact: low. Upstream Netflix has no Vulkan backend. Conflicts only against the parallel T-GPU-DEDUP-{18..22} PRs (#284–#288) on CHANGELOG.md / docs/rebase-notes.md — auto-resolve keeps both halves.
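
The per-bundle descriptor pool rule can be sketched as a toy ownership check. All names here are assumptions made for the model; real Vulkan does not report a mismatched pool as an error code, it is simply undefined behavior, so the `-1` return exists only to make the mistake visible.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int live_sets; /* sets currently allocated from this pool */
} MockDescPool;

typedef struct {
    MockDescPool *desc_pool; /* each bundle owns its own pool */
} MockPipelineBundle;

typedef struct {
    MockDescPool *owner; /* pool this set was allocated from */
} MockDescSet;

/* Allocate a set from the bundle's own pool. */
static MockDescSet mock_alloc_descriptor_set(const MockPipelineBundle *bundle)
{
    assert(bundle != NULL && bundle->desc_pool != NULL);
    bundle->desc_pool->live_sets++;
    MockDescSet set = { .owner = bundle->desc_pool };
    return set;
}

/* Free a set. Returns 0 on success, -1 when the caller passes a bundle
 * whose pool did not allocate the set (the model's stand-in for UB). */
static int mock_free_descriptor_set(const MockPipelineBundle *bundle,
                                    MockDescSet *set)
{
    assert(bundle != NULL && set != NULL);
    if (set->owner != bundle->desc_pool)
        return -1;
    bundle->desc_pool->live_sets--;
    set->owner = NULL;
    return 0;
}
```

In the model, a set allocated through a decimate bundle can only be freed through that same bundle; routing it to the ssim bundle leaves the decimate pool's `live_sets` count unchanged.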

0106 — Vulkan kernel template multi-pipeline + ssim/motion migration (T-GPU-DEDUP-7)

  • Touches:
  • core/src/vulkan/kernel_template.h — new vmaf_vulkan_kernel_pipeline_add_variant() helper. Takes the base pipeline bundle (DSL / pipeline layout / shader / pool owned by vmaf_vulkan_kernel_pipeline_create) plus a partial VkComputePipelineCreateInfo and produces a sibling VkPipeline re-using the same layout / shader. The base _create and _destroy entry points are unchanged; existing consumers (psnr, moment, ciede) keep working.
  • core/src/feature/vulkan/motion_vulkan.c — state collapses VkPipeline pipelines[2] (kept "for SYCL parity" but functionally identical because COMPUTE_SAD goes through push constants, not spec-constants) to a single VmafVulkanKernelPipeline pl. create_pipelines / close_fex shrink to template-driven create + destroy.
  • core/src/feature/vulkan/ssim_vulkan.c — state becomes VmafVulkanKernelPipeline pl + VkPipeline pipeline_vert. Pass 0 (horizontal) is the template's base pipeline; pass 1 (vertical) is created via _add_variant(). close_fex destroys the variant first, then calls vmaf_vulkan_kernel_pipeline_destroy() on the bundle.
  • Invariant — no spec-constant drift between base and variant. _add_variant() overwrites sType / stage.sType / stage.stage / stage.module / layout of the caller's VkComputePipelineCreateInfo so the variant is guaranteed to share the base's shader and layout. Callers control the variant's spec-constant via pSpecializationInfo. Reordering these overwrites lets a consumer accidentally bind a different shader module under the same layout — UB at descriptor-set time.
  • Invariant — variant destroyed before bundle. close_fex in ssim must vkDestroyPipeline(s->pipeline_vert) before vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — the bundle's _destroy releases the descriptor pool, and the descriptor sets that vkAllocateDescriptorSets issued against the variant pipeline's layout are dropped cleanly only when the variant pipeline is already gone.
  • Numerical contract: unchanged. Both kernels run identical shaders + spec-constants + push-constants as before; only the Vulkan boilerplate that creates / destroys the pipeline scaffolding moved to a shared owner. Cross-backend parity gate at places=4 holds — Netflix-pair float_ssim smoke (576×324×48f) reports mean 0.863, identical to pre-migration.
  • Rebase impact: low. The base pipeline-bundle helpers predate this change (PR #270 / #271); the new _add_variant is additive. Upstream Netflix has no Vulkan backend to conflict with.
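
The "no drift between base and variant" rule amounts to overwriting the shader and layout fields of the caller's create-info while leaving the specialization data alone. A minimal model follows; the struct and function names are hypothetical, and plain integers stand in for the Vulkan handles.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the fields the notes say _add_variant() controls. */
typedef struct {
    int module;             /* stands in for stage.module */
    int layout;             /* stands in for layout */
    const int *spec_values; /* stands in for stage.pSpecializationInfo */
} MockComputeCreateInfo;

typedef struct {
    int shader;          /* base bundle's shader module */
    int pipeline_layout; /* base bundle's pipeline layout */
} MockBaseBundle;

/* Force the variant onto the base's shader and layout; the caller's
 * specialization data passes through untouched. */
static MockComputeCreateInfo mock_add_variant(const MockBaseBundle *base,
                                              MockComputeCreateInfo partial)
{
    assert(base != NULL);
    partial.module = base->shader;
    partial.layout = base->pipeline_layout;
    return partial;
}
```

Whatever module or layout the caller filled in is discarded, so a consumer cannot end up with a different shader under the base layout by accident. Only the specialization values distinguish one variant from another.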

0111 — integer_ciede_cuda migrated to kernel_template (T-GPU-DEDUP-11)

  • Touches:
  • core/src/feature/cuda/integer_ciede_cuda.c — state's CUstream + CUevent + CUevent + VmafCudaBuffer + host-pinned float* quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. init / collect / close call the template's lifecycle_init/readback_alloc/collect_wait/lifecycle_close/readback_free helpers. submit keeps the pre-launch wait inline (intentional — ciede has no atomic, so the template's pre-launch memset is unnecessary).
  • Numerical contract: unchanged. Pure CUDA-boilerplate consolidation. The host-side reduction in collect still uses the same double accumulator over per-block float partials — places=4 (ADR-0187) holds.
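
The host-side reduction pattern (per-block float partials summed in a double accumulator) looks roughly like the sketch below. The function name is an assumption; the fork's collect path is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-block float partials in double precision so that small
 * partials are not absorbed when the running total grows large. */
static double reduce_partials_in_double(const float *partials, size_t n)
{
    assert(partials != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)partials[i];
    return acc;
}
```

For example, the partials `100000000.0f` and `1.0f` sum to exactly `100000001.0` here, whereas a float accumulator under IEEE single precision cannot represent that total. This is the kind of loss the double accumulator avoids.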

0112 — integer_moment_cuda migrated to kernel_template (T-GPU-DEDUP-12)

  • Touches:
  • core/src/feature/cuda/integer_moment_cuda.c — state's stream/event/device-buffer/host-pinned quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. submit calls vmaf_cuda_kernel_submit_pre_launch (atomic counters require the device-side memset). init / collect / close call the matching template helpers.
  • Numerical contract: unchanged. Same per-frame atomic accumulators (4× uint64), same sums_host[i] / n_pixels host division.
  • Rebase impact: low. Upstream Netflix has no equivalent template; this consolidation is fork-local.

0113 — integer_motion_v2_cuda migrated to kernel_template (T-GPU-DEDUP-13)

  • Touches:
  • core/src/feature/cuda/integer_motion_v2_cuda.c — stream/event pair + sad device+host quintet collapses to lc + rb. Raw-pixel ping-pong pix[2] stays outside the bundle. submit keeps the memset on pic_stream inline rather than calling submit_pre_launch (the helper would move the memset to lc.str, which races with the kernel reading the accumulator). init / collect / close call the matching template helpers.
  • Numerical contract: unchanged. Same D2D copy, same conditional kernel launch on frame ≥ 1, same host-side min(score[i], score[i+1]) flush.

0114 — integer_ssim_cuda migrated to kernel_template (T-GPU-DEDUP-14)

  • Touches:
  • core/src/feature/cuda/integer_ssim_cuda.c — stream/event/partials device+host quintet collapses to lc + rb. Five intermediate float buffers (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) stay outside the bundle. submit keeps the cuStreamWaitEvent + horiz + vert + DtoH chain inline — SSIM writes one float per block (no atomic), so the template's submit_pre_launch memset is unnecessary. init / collect / close use the matching template helpers.
  • Numerical contract: unchanged. Same horiz-then-vert two-pass pipeline, same per-block float partial reduction in double on the host. places=4 (matching the ciede_cuda precision pattern) holds.
  • Rebase impact: low. Upstream Netflix has no equivalent; this is fork-added.

0115 — ms_ssim_cuda + psnr_hvs_cuda lifecycle migration (T-GPU-DEDUP-15)

  • Touches:
  • core/src/feature/cuda/integer_ms_ssim_cuda.c — stream + 2-event lifecycle replaced with VmafCudaKernelLifecycle lc; multi-level pyramid + SSIM intermediate + 3-partials buffers stay outside the template's single-pair readback bundle.
  • core/src/feature/cuda/integer_psnr_hvs_cuda.c — same shape; 3-plane ref/dist/partials triples remain inline.
  • Numerical contract: unchanged. The migration only affects init / close boilerplate; submit / collect dispatch and host reduction paths are untouched apart from the s->str → s->lc.str / s->event → s->lc.submit / s->finished → s->lc.finished field renames.

0116 — float_psnr/ansnr/motion cuda → kernel_template (T-GPU-DEDUP-16)

  • Touches:
  • core/src/feature/cuda/float_psnr_cuda.c — stream/event/partials quintet → lc + rb; input upload buffers ref_in / dis_in stay outside the bundle.
  • core/src/feature/cuda/float_ansnr_cuda.c — same shape; rb wraps the (sig, noise) interleaved partials.
  • core/src/feature/cuda/float_motion_cuda.c — same shape; rb wraps the SAD partials, blur[2] ping-pong stays outside.
  • Numerical contract: unchanged. Same dispatch geometry, same reduction order. Cross-backend parity gate at the kernels' contracted precision (places=3 per ADR-0192) holds.

0117 — float_adm + float_vif cuda lifecycle migration (T-GPU-DEDUP-17)

  • Touches:
  • core/src/feature/cuda/float_adm_cuda.c — stream + 2-event lifecycle replaced with VmafCudaKernelLifecycle lc; multi-stage DWT + CSF pipeline state stays outside the template's single-pair readback bundle.
  • core/src/feature/cuda/float_vif_cuda.c — same shape; 4-level pyramid + per-scale (num, den) pairs remain inline.
  • Numerical contract: unchanged. The migration only affects init / close stream-event boilerplate; submit / collect dispatch and host reduction paths are untouched apart from the field renames.
  • Rebase impact: low. Upstream Netflix has no equivalent template; this is fork-added.

0107 — float_psnr_vulkan migrated to kernel_template (T-GPU-DEDUP-8)

  • Touches:
  • core/src/feature/vulkan/float_psnr_vulkan.c — state's dsl + pipeline_layout + shader + pipeline + desc_pool quintet is collapsed into a single VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy. No shader changes, no spec-constant changes, no push-constant changes.
  • Numerical contract: unchanged. The migration is a pure Vulkan-boilerplate consolidation. Cross-backend parity gate at places=4 holds — Netflix-pair smoke reports float_psnr mean 30.755 dB, identical to pre-migration.

0109 — float_ansnr_vulkan + motion_v2_vulkan migrated to kernel_template (T-GPU-DEDUP-9)

  • Touches:
  • core/src/feature/vulkan/float_ansnr_vulkan.c — single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
  • core/src/feature/vulkan/motion_v2_vulkan.c — same shape.
  • Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. Cross-backend parity gate at the kernel's contracted precision holds — Netflix-pair smoke reports float_ansnr mean 23.51 dB and motion2_v2_score mean 3.895, identical to pre-migration.

0110 — float_motion_vulkan migrated to kernel_template (T-GPU-DEDUP-10)

  • Touches:
  • core/src/feature/vulkan/float_motion_vulkan.c — single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
  • Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. Netflix-pair smoke reports motion mean 4.049 / motion2 mean 3.894, identical to pre-migration.
  • Rebase impact: low. Upstream Netflix has no Vulkan backend.

0108 — Bristol VI-Lab feasibility digest + BVI-CC ingest ADR (Draft)

  • Touches:
  • docs/research/0046-bristol-vi-lab-feasibility.md (new) — nine-dataset survey + use-case fit + effort estimate.
  • docs/adr/0241-bristol-bvi-cc-ingest.md (new, Status: Draft) — proposal to ingest BVI-CC as the second tiny-AI corpus.
  • docs/adr/README.md — index row for ADR-0241.
  • CHANGELOG.md — Added entry.
  • Numerical contract: not applicable (docs-only).
  • Rebase impact: none. Pure research deliverables; upstream Netflix has no equivalent surface.

0094 — Vulkan VkImage import v2 async pending-fence (T7-29 part 4 / ADR-0251)

  • ADR: ADR-0251; predecessor ADR-0186.
  • Touches:
  • core/src/vulkan/import.c — full rewrite of the submission path. Single-fence submit_and_wait becomes per-slot submit_to_slot + drain_slot_fence; the new slot_alloc / slot_release helpers materialise / tear down a ring slot (staging-pair + cmd buffer + fence). vmaf_vulkan_import_image indexes into the ring by frame_index % ring_size; vmaf_vulkan_wait_compute drains every outstanding fence. vmaf_vulkan_state_build_pictures waits the slot's fence before exposing the host pointer. Public-API signatures are unchanged.
  • core/src/vulkan/vulkan_internal.h — new struct VmafVulkanImportSlot; VmafVulkanImportSlots becomes a fixed-capacity VmafVulkanImportSlot ring[VMAF_VULKAN_RING_MAX] plus geometry + ring_size. Two new defines — VMAF_VULKAN_RING_DEFAULT (4) and VMAF_VULKAN_RING_MAX (8). VmafVulkanState gains requested_ring_size.
  • core/src/vulkan/common.c — vmaf_vulkan_state_init and _state_init_external set requested_ring_size = VMAF_VULKAN_RING_DEFAULT.
  • core/test/test_vulkan_async_pending_fence.c (new, contract smoke for the v1 → v2 swap).
  • core/test/meson.build — registers the new test under the existing enable_vulkan guard.
  • core/src/vulkan/AGENTS.md (new) — pins the three rebase-sensitive ring invariants.
  • docs/adr/0251-vulkan-async-pending-fence.md (new), docs/research/0042-vulkan-async-pending-fence.md (new), docs/api/gpu.md, docs/backends/vulkan/overview.md, CHANGELOG.md, docs/rebase-notes.md.
  • ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch — unchanged. The v2 ring is fully internal to VmafVulkanState; the public ABI stays byte-identical so the filter consumes the new path transparently.
  • Invariant 1 — fixed ring depth at first import. lazy_alloc_ring is the only place that materialises the ring; once allocated the depth never changes for the lifetime of the VmafVulkanState. Any caller that needs a different depth has to free + re-init. The geometry pinning contract from v1 (ADR-0186) is preserved verbatim.
  • Invariant 2 — vkResetFences only after VK_SUCCESS from vkWaitForFences. Sole reset path lives in drain_slot_fence; fence_in_flight flips back to 0 only after the wait succeeds. A -EIO from the wait propagates up without resetting (so a retry would correctly re-wait rather than silently move on).
  • Invariant 3 — state_free drains before destroying. vmaf_vulkan_import_slots_free walks the ring and calls drain_slot_fence on every in-flight slot, then issues one vkQueueWaitIdle belt-and-braces (any feature kernel that submitted on the same queue may still be running). Reordering this triggers validation-layer "destroying in-use object" errors.
  • Numerical contract: unchanged. Async submission only changes when the host can read the staging buffer, not which bytes the GPU writes. Cross-backend parity gate (scripts/ci/cross_backend_parity_gate.py, places=4) holds.
  • Memory delta: staging arena scales 1 → ring_size per direction. At default depth and 1080p 8-bit Y, the per-state host-visible footprint grows from ~4 MiB to ~16 MiB. Documented in ADR-0251 §Consequences.
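
The ring invariants and the memory figure can be checked with a small model. Everything named `Mock*` / `mock_*` is an illustrative assumption, `wait_result` stands in for the outcome of vkWaitForFences, and the byte count assumes one ref + dis pair of single-plane staging buffers per slot, which reproduces the ~4 MiB and ~16 MiB figures above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_MAX 8 /* mirrors VMAF_VULKAN_RING_MAX from the notes */

typedef struct {
    bool fence_in_flight;
    int resets; /* counts mock fence resets */
} MockSlot;

typedef struct {
    MockSlot ring[RING_MAX];
    unsigned ring_size; /* fixed once the ring is allocated */
} MockRing;

/* Slot selection: frame_index % ring_size. */
static MockSlot *mock_slot_for_frame(MockRing *r, unsigned frame_index)
{
    assert(r->ring_size > 0 && r->ring_size <= RING_MAX);
    return &r->ring[frame_index % r->ring_size];
}

/* Reset the fence only after a successful wait (wait_result == 0).
 * On failure the fence stays armed so a retry waits again. */
static int mock_drain_slot_fence(MockSlot *slot, int wait_result)
{
    if (!slot->fence_in_flight)
        return 0;
    if (wait_result != 0)
        return -EIO;
    slot->resets++;
    slot->fence_in_flight = false;
    return 0;
}

/* Host-visible staging bytes for a ref + dis pair per ring slot. */
static uint64_t mock_staging_bytes(uint64_t width, uint64_t height,
                                   uint64_t bytes_per_sample,
                                   unsigned ring_size)
{
    return 2u * width * height * bytes_per_sample * ring_size;
}
```

With these assumptions, 1920 × 1080 × 1 byte × 2 buffers is 4,147,200 bytes (about 3.96 MiB) at depth 1 and 16,588,800 bytes (about 15.8 MiB) at the default depth of 4.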

0090 — cambi_vulkan extractor (T7-36 / ADR-0210)

  • ADR: ADR-0210; predecessor ADR-0205.
  • Touches:
  • core/src/feature/vulkan/cambi_vulkan.c (replaces the spike scaffold's init_stub/extract_stub/close_stub triple with the full Vulkan-aware lifecycle).
  • core/src/feature/vulkan/shaders/cambi_preprocess.comp (new), cambi_mask_dp.comp (new — unified row-SAT / col-SAT / threshold-compare via PASS=0/1/2 spec const).
  • core/src/feature/cambi.c — appends a small block of public trampolines (vmaf_cambi_*) at the bottom of the file that thinly wrap the file-static helpers. No upstream function-static code is renamed or moved; the entire upstream body of cambi.c above the trampolines stays byte-identical, which keeps Netflix sync straightforward.
  • core/src/feature/cambi_internal.h (new) — internal-only header exposing vmaf_cambi_calculate_c_values, vmaf_cambi_get_spatial_mask, etc., to the GPU twin.
  • core/src/vulkan/meson.build — registers the 5 cambi shaders in vulkan_shader_sources[] and cambi_vulkan.c in vulkan_sources.
  • core/src/feature/feature_extractor.c — adds the extern decl + registry entry for vmaf_fex_cambi_vulkan under #if HAVE_VULKAN.
  • scripts/ci/cross_backend_vif_diff.py — cambi row in FEATURE_METRICS so the cross-backend gate runs at places=4 against the CPU baseline.
  • docs/adr/0210-cambi-vulkan-integration.md, docs/research/0032-cambi-vulkan-integration.md, docs/backends/vulkan.md, CHANGELOG.md.
  • Invariant 1 — bit-exactness by construction. Every GPU phase is integer arithmetic (uint16 derivative, int32 SAT, > compare, stride-2 gather, 3-element mode3 lookup). The readback into the host VmafPicture pair is byte-identical to what the CPU would have written; the host residual then runs the unmodified CPU calculate_c_values + spatial pooling on those buffers. Any rebase that introduces float arithmetic into one of these GPU phases — e.g., a future Netflix change to the derivative kernel that adds a bilinear interpolation step — will silently break places=4 and must be caught at the cross-backend gate.
  • Invariant 2 — cambi_internal.h signatures must stay in lock-step with cambi.c's file-static helpers. The Vulkan twin calls vmaf_cambi_calculate_c_values, which trampolines to the file-static calculate_c_values. Any signature change to the latter (extra parameters, type changes) must update the trampoline + header in the same PR or the GPU build breaks.
  • On upstream sync: cambi.c's file-static helpers are sometimes renamed by upstream (e.g., a decimate → cambi_decimate rename could happen during a Netflix tidy-up). When rebasing, search cambi.c's tail for the trampoline block — the file-static helpers it calls (get_spatial_mask, decimate, filter_mode, calculate_c_values, spatial_pooling, weight_scores_per_scale, get_pixels_in_window, increment_range, decrement_range, get_derivative_data_for_row, cambi_preprocessing) need to match the upstream symbol names. Update the trampoline body if upstream renames; signatures should not need to change because the trampoline already takes the function-pointer-typedef form (VmafRangeUpdater etc.).
  • Re-test on rebase: python3 scripts/ci/cross_backend_vif_diff.py --backend vulkan --feature cambi --ref testdata/ref_576x324_48f.yuv --dist testdata/dis_576x324_48f.yuv --width 576 --height 324 --pixel-format 420 --bitdepth 8 --frames 48. Should emit places=4 PASS with max_abs_diff = 0.0. If it diverges, bisect the GPU phases by reading back individual buffers (image_buf / mask_buf / deriv_buf) and comparing against the CPU's in-place pic plane after the equivalent stage.
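
For readers unfamiliar with the places=N notation, the sketch below shows one plausible reading: two scores agree when their absolute difference is below half a unit in the N-th decimal place. This is an assumption about the gate's semantics, not a copy of the script's logic, and `agrees_to_places` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed semantics: |a - b| < 0.5 * 10^-places. */
static bool agrees_to_places(double a, double b, int places)
{
    assert(places >= 0);
    double tol = 0.5;
    for (int i = 0; i < places; i++)
        tol /= 10.0;
    double diff = a > b ? a - b : b - a;
    return diff < tol;
}
```

Under this reading, the two pooled values quoted in entry 0225 (76.667828 and 76.668904) differ by about 0.0011 and fail at places=4, while a bit-exact backend such as cambi_vulkan passes trivially with a difference of 0.0.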

The pre-ADR-0108 fork-local PRs are summarised by workstream rather than per-PR. Future PRs add entries individually.

0085 — Upstream c70debb1 partial port (adm_csf + barten_csf tests)

  • No ADR. Pure upstream cherry-pick per ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
  • Upstream source: c70debb1 (Kyle Swanson, 2026-04-28): "libvmaf/test: port new adm/vif/speed tests". The audit row that flagged the gap is T-NEW-2 in the 2026-04-29 quarterly upstream-backlog re-audit (PR #205).
  • Touches (additive only):
  • core/src/feature/adm_csf_tools.h — new header (verbatim from upstream); declares the inline adm_native_csf helper (DLM-paper CSF) used by the new test_adm_csf unit.
  • core/test/test_adm_csf.c — new unit (verbatim from upstream); 2 mu_assert cases on adm_native_csf(3, 3.0, 1080, {0, 45}).
  • core/test/test_barten_csf.c — new unit (verbatim from upstream); 23 mu_assert cases over barten_rod_cone_sens, barten_mtf, barten_csf, linear_interpolate, barten_watson_blend_csf (all symbols already on the fork).
  • core/test/meson.build — registers the two new executables + adds test('test_adm_csf', ...) and test('test_barten_csf', ...).
  • CHANGELOG.md Unreleased § Changed.
  • Deliberate scope cuts (the upstream commit's other halves are not portable verbatim):
  • test_vif_tools.c — depends on upstream symbols NUM_KERNELSCALES, the 21-entry valid_kernelscales table, vif_validate_kernelscale, vif_get_filter_size, vif_get_filter, speed_get_antialias_filter, and a [NUM_KERNELSCALES][5][65] filter table that the fork's vif_filter1d_table_s [11][4][65] does not match. Per Research-0024 Strategy E, the fork deliberately diverges from the upstream vif runtime-helper chain to preserve the ADR-0138 / 0139 / 0142 / 0143 SIMD bit-exactness contract. Porting this test requires porting the runtime helpers first.
  • test_speed_chroma.c — #includes feature/speed.c directly; the fork has no SpEED extractor (feature/speed.c does not exist). Pairs with audit row T-NEW-1 (port the SpEED extractor wholesale, or absorb it into the tiny-AI speed metric).
  • Invariants (rebase-relevant):
  • The new adm_csf_tools.h header is wholly additive and does not conflict with the existing fork adm_csf_s non-inline helper in adm_tools.h (different signature, different translation units).
  • The two new tests do not depend on Netflix golden YUVs — they evaluate the closed-form CSF math directly. No golden-data interaction.
  • On upstream sync: a future port of the upstream vif runtime-helper chain (Research-0024 Strategy A reversal) or the SpEED extractor (T-NEW-1) unlocks the deferred halves of this commit. Until then, fork-side test_vif_tools.c / test_speed_chroma.c stay absent.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu test_adm_csf test_barten_csf
meson test -C build-cpu test_adm_csf test_barten_csf

0084 — Embedded MCP server scaffold (T5-2, ADR-0209)

  • ADR: ADR-0209 (audit-first scaffold) on top of the ADR-0128 governance + Research-0005 design.
  • Upstream source: fork-local. Netflix/vmaf has no embedded MCP server (and no plans to add one — the workflow is agent-tooling-specific, well outside upstream's library scope).
  • Touches:
  • core/include/libvmaf/libvmaf_mcp.h — new public header.
  • core/include/core/meson.build — new if get_option('enable_mcp') install branch.
  • core/src/mcp/ — new directory: mcp.c (stub TU) + meson.build (exposes mcp_sources + mcp_defines).
  • core/src/meson.build — new is_mcp_enabled guard + subdir('mcp') block; mcp_sources threaded into the library('vmaf', ...) source list alongside dnn_sources.
  • core/test/meson.build — new if get_option('enable_mcp') block wiring test_mcp_smoke.
  • core/test/test_mcp_smoke.c — new 12-sub-test smoke.
  • core/meson_options.txt — new enable_mcp umbrella + three sub-flags (all default false).
  • Invariant: every public entry point in libvmaf_mcp.h (vmaf_mcp_init / _start_sse / _start_uds / _start_stdio / _stop / _close) returns -ENOSYS (or -EINVAL on bad arguments) until the T5-2b runtime PR lands. The smoke pins this contract — a runtime PR that flips a return code without flipping the smoke expectation regresses the gate.
  • On upstream sync: zero interaction with upstream files. Wholly additive directory + boolean build flags. The subdir('mcp') insertion in core/src/meson.build lives next to the existing subdir('dnn') / Vulkan blocks; an upstream conflict in that area would be confined to those few lines and is mechanical to resolve.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_mcp=false
ninja -C build-cpu && meson test -C build-cpu  # baseline still green

meson setup --reconfigure build-cpu libvmaf -Denable_mcp=true \
            -Denable_mcp_sse=true -Denable_mcp_uds=true -Denable_mcp_stdio=true
ninja -C build-cpu
meson test -C build-cpu test_mcp_smoke  # 12/12 sub-tests pass

0065 — T7-37 Netflix bench rerun + docs/benchmarks.md TBD fill

  • No ADR. Empirical fill of pre-existing TBD cells; no new decision. The bench script fixes that this rerun depends on shipped earlier under PR #169 (libvmaf/AGENTS.md backend-engagement foot-guns), PR #170 (--backend cuda actually engages CUDA), and PR #171 (testdata/bench_all.sh uses correct flags). Vulkan header install for SDK consumers is PR #175.
  • Touches (additive only): docs/benchmarks.md (every TBD cell replaced with measured numbers; hardware-profile table updated to the ryzen-4090-arc host the rerun was performed on; "How to reproduce" section now documents fixture acquisition for the gitignored BBB 4K 200-frame pair). CHANGELOG.md Unreleased § Changed entry.
  • Invariants (rebase-relevant): none. The numbers are tied to fork commit 41301496 and the ryzen-4090-arc profile; an upstream rebase that changes feature pipelines would invalidate the table but not break parsing.
  • On upstream sync: zero interaction. Pure docs.
  • Re-test on rebase: bash testdata/bench_all.sh (after a fresh fork build) — confirms the bench script still drives all four backends and that the per-row metrics-key counts (CPU=15, CUDA=12, SYCL/Vulkan=34) still distinguish them. If they collapse to one count, the new upstream broke a backend dispatcher silently.
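
The "collapse to one count" check can be expressed in a few lines. This is only a sketch: how bench_all.sh reports the key counts is not shown here, so the dictionary below is a hypothetical stand-in populated with the counts quoted above.

```python
def backends_collapsed(key_counts: dict) -> bool:
    """True when every backend reports the same number of metric keys."""
    return len(set(key_counts.values())) == 1


# Counts quoted in the re-test note; the dict shape itself is an assumption.
expected = {"cpu": 15, "cuda": 12, "sycl": 34, "vulkan": 34}
print(backends_collapsed(expected))
```

A `True` result would suggest that every row ran through the same dispatcher, which is the silent failure the re-test is meant to catch.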

0050 — float_adm_cuda + float_adm_sycl extractors (ADR-0202)

  • ADR: ADR-0202
  • Touches:
  • core/src/feature/cuda/float_adm/float_adm_score.cu (new)
  • core/src/feature/cuda/float_adm_cuda.{c,h} (new)
  • core/src/feature/sycl/float_adm_sycl.cpp (new)
  • core/src/meson.build — three changes: (1) new float_adm_score entry in cuda_cu_sources, (2) new cuda_cu_extra_flags dict that threads --fmad=false + -Xcompiler=-ffp-contract=off into the float_adm_score fatbin only, (3) new SYCL source in sycl_feature_sources.
  • core/src/feature/feature_extractor.c (extern decls + list entries for vmaf_fex_float_adm_cuda / vmaf_fex_float_adm_sycl under #if HAVE_CUDA / #if HAVE_SYCL).
  • Invariant 1 — --fmad=false for the float_adm fatbin only: the angle-flag dot product (ot_dp = oh*th + ov*tv) and the cube reductions (xa*xa*xa, csf_o*csf_o*csf_o) require IEEE-754 add/mul ordering to match the GLSL precise qualifier in float_adm.comp. NVCC's default -fmad=true fuses these and drifts past places=4 at scale 3 / adm2. The integer ADM kernels share cuda_flags but use int64 accumulators where FMA is irrelevant — keep the FMA-on default for them.
  • Invariant 2 — parent-LL dimension trap: stage 0 at scale > 0 reads the parent's LL band; the mirror/clamp bounds are scale_w/h[scale] (= parent's LL output dims = current scale's input dims), NOT scale_w/h[scale - 1] (= parent's full image dims). Both float_adm_cuda.c and float_adm_sycl.cpp cite this inline. Do not "simplify" by using the off-by-one neighbour.
  • Re-test:
CXX=icpx CC=icx meson setup build-cs -Denable_cuda=true \
     -Denable_sycl=true -Denable_vulkan=enabled \
     -Denable_float=true \
     -Dsycl_compiler=/opt/intel/oneapi/compiler/latest/bin/icpx
ninja -C build-cs
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-cs/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature float_adm \
  --backend cuda --places 4
# Same with --backend sycl on a host with a SYCL device.
# Both must report 0/N mismatches at places=4.
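
The parent-LL dimension trap from Invariant 2 is easier to see with numbers. A minimal sketch, assuming each DWT level halves the dimensions with round-up and that out-of-range indices are clamped; `scale_dims` and `clamp_index` are hypothetical helpers, not the fork's kernels:

```python
def scale_dims(width, height, num_scales=4):
    """Input dims of each scale, assuming every DWT level halves with round-up."""
    dims = [(width, height)]
    for _ in range(num_scales - 1):
        w, h = dims[-1]
        dims.append(((w + 1) // 2, (h + 1) // 2))
    return dims


def clamp_index(idx, sup):
    """Clamp idx into [0, sup - 1]."""
    return max(0, min(idx, sup - 1))


dims = scale_dims(576, 324)
scale = 1
# Correct bound: the LL band that scale 1 reads is dims[scale] wide.
right = clamp_index(300, dims[scale][0])
# Off-by-one neighbour: the parent's full image width lets index 300 through.
wrong = clamp_index(300, dims[scale - 1][0])
print(dims, right, wrong)
```

With the wrong bound the index is left untouched and would point past the end of the LL band, which is why the inline comments warn against the "simplification".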

0049 — float_adm_vulkan extractor (ADR-0199)

  • ADR: ADR-0199
  • Touches:
  • core/src/feature/vulkan/float_adm_vulkan.c (new)
  • core/src/feature/vulkan/shaders/float_adm.comp (new)
  • core/src/vulkan/meson.build (adds the .comp shader and the new .c source)
  • core/src/feature/feature_extractor.c (extern decl + list entry under #if HAVE_VULKAN)
  • scripts/ci/cross_backend_vif_diff.py (float_adm entry in FEATURE_METRICS)
  • .github/workflows/tests-and-quality-gates.yml (lavapipe float_adm step at places=4)
  • Invariant: float_adm GPU port uses the 2 * sup - idx - 1 mirror form on both axes — matches both the scalar adm_dwt2_s and the AVX2 float_adm_dwt2_avx2, which both consume the same dwt2_src_indices_filt_s index buffer. This is intentionally different from float_vif's GPU mirror (ADR-0197), which uses -2 because float_vif's AVX2 path takes a different code branch. Do not "fix" the asymmetry by analogy with float_vif.
  • Re-test:
meson setup build-vk -Denable_vulkan=enabled -Denable_cuda=false \
                     -Denable_sycl=false
ninja -C build-vk
meson test -C build-vk
VK_LOADER_DRIVERS_SELECT='*lvp*' python3 \
  scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-vk/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature float_adm --places 4
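
The two mirror forms differ only in whether the edge sample is repeated. The sketch below covers the upper edge only and reads the "-2" variant as `2 * sup - idx - 2`; that reading, and the helper name `mirror_upper`, are assumptions for illustration rather than the shader code:

```python
def mirror_upper(idx, sup, offset):
    """Reflect an index that ran past the upper edge back into [0, sup).

    offset=1 gives 2*sup - idx - 1 (edge sample repeated);
    offset=2 gives 2*sup - idx - 2 (edge sample not repeated).
    In-range indices pass through unchanged.
    """
    return idx if idx < sup else 2 * sup - idx - offset


sup = 8
for idx in (7, 8, 9):
    print(idx, mirror_upper(idx, sup, 1), mirror_upper(idx, sup, 2))
```

For `sup = 8`, index 8 maps to 7 under the float_adm form and to 6 under the other, so swapping one for the other shifts every border tap by one sample.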

0083 — SSIMULACRA 2 Vulkan kernel (ADR-0201)

meson setup core/build-vk-ss2 \
  -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false \
  libvmaf
ninja -C core/build-vk-ss2 tools/vmaf
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build-vk-ss2/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 \
  --feature ssimulacra2 --backend vulkan --places 1
# expected: max_abs_diff ≈ 1.59e-2, 0/48 mismatches at places=1
  • Follow-ups:
  • CUDA + SYCL twins (batch 3 parts 7b + 7c per ADR-0192).
  • Performance follow-up: re-bin multiple rows / columns per WG in the IIR blur (currently local_size = 1, one row/col per WG for correctness).
  • Optional: rename psnr_hvs_strict_shaders to strict_shaders in core/src/vulkan/meson.build (cosmetic — out of scope for this PR).

0001 — SIMD bit-identical reductions for float ADM

  • Workstream PRs: #18, commits 24c88a32, f082cfd3.
  • Touches: core/src/feature/integer_adm.c, core/src/feature/float_adm.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/arm64/adm_neon.c, upstream python/test/feature_extractor_test.py test expectations.
  • Invariant: sum_cube and csf_den_scale accumulate cubed values in double precision (via _mm256_cvtps_pd / _mm512_cvtps_pd) in scalar, AVX2, AVX-512, and NEON. Upstream accumulates in float, which produces ~8e-5 drift between scalar and SIMD. Test expectations were tightened to match the double-precision path; an upstream-side accumulator change would re-introduce the drift and break the tightened assertions.
  • Re-test: meson test -C build --suite=fast && python -m pytest python/test/feature_extractor_test.py -k adm.
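
Why the accumulator width matters can be shown without SIMD. The following is a sketch of the mechanism only, not the fork's kernel: it emulates single precision by rounding through `struct`, computes each cube in float in both variants, and varies only the accumulator type.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cube_f32(v):
    """v**3 with every intermediate rounded to single precision."""
    v = f32(v)
    return f32(f32(v * v) * v)


def sum_cubes_float(values):
    """Accumulate in single precision (the behaviour attributed to upstream)."""
    acc = 0.0
    for v in values:
        acc = f32(acc + cube_f32(v))
    return acc


def sum_cubes_double(values):
    """Accumulate the same float cubes in double precision."""
    acc = 0.0
    for v in values:
        acc += cube_f32(v)
    return acc


# 256**3 == 2**24; past that point a float accumulator can no longer add 1.0.
values = [256.0, 1.0, 1.0, 1.0, 1.0]
print(sum_cubes_float(values), sum_cubes_double(values))
```

The float accumulator stalls at 2**24 while the double accumulator keeps every term. Real inputs drift far less dramatically, but the cause is the same, and the size of the float error depends on summation order, which differs between scalar and vector code.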

0002 — CUDA ADM decouple-inline buffer elimination

  • Workstream PRs: commit 787e3382.
  • Touches: core/src/feature/cuda/integer_adm_cuda.cu, core/src/feature/cuda/adm_decouple_inline.cuh (new), core/src/feature/cuda/meson.build. Upstream's adm_decouple.cu is no longer compiled in the fork.
  • Invariant: CSF and CM CUDA kernels read ref / dis DWT2 buffers directly and compute decouple_r / decouple_a inline via __device__ helpers in adm_decouple_inline.cuh. The 6 intermediate buffers (decouple_r, decouple_a, csf_a × {scale-0 int16, scales 1-3 int32}) and the standalone adm_decouple.cu source are intentionally removed. ~107 MB GPU memory savings at 4K. An upstream change to adm_decouple.cu will look orphaned and a literal merge would re-introduce the buffer allocations.
  • Re-test: meson setup build -Denable_cuda=true && ninja -C build && meson test -C build --suite=cuda.

0003 — SYCL backend (USM pool / D3D11 import / vmaf_sycl_* API)

  • Workstream PRs: #33, #35, #5 (initial scaffolding), and the picture-pool deadlock fix that landed via #32.
  • Touches: core/include/libvmaf/libvmaf_sycl.h, core/src/sycl/, core/src/feature/sycl/, core/src/libvmaf.c (SYCL public-API entry points), meson_options.txt (enable_sycl).
  • Invariant: vmaf_sycl_preallocate_pictures constructs a real VmafSyclPicturePool honoring VmafSyclPicturePreallocationMethod (NONE / DEVICE / HOST); vmaf_sycl_picture_fetch dispatches to the pool when configured. The whole SYCL tree is fork-local and has no upstream counterpart — upstream changes to core/src/libvmaf.c near the SYCL entry-point block are likely to conflict. Picture-pool error paths in vmaf_read_pictures (libvmaf.c) must goto cleanup; rather than return err; to avoid leaking ref/dist pictures into the live-picture set (closes the always-on-pool deadlock fixed in #32 — see ADR-0104). See ADR-0101, ADR-0103, ADR-0104.
  • Re-test: meson setup build -Denable_sycl=true && ninja -C build && meson test -C build --suite=sycl (requires oneAPI / icpx).

0004 — DNN runtime + tiny-AI surfaces

  • Workstream PRs: #5, #8, #21, #22, #23, #31, #34, plus the pre-numbered DNN feat commits (9b985946, 1e5336d3, d122b721).
  • Touches: core/include/libvmaf/dnn.h, core/src/dnn/, core/src/feature/feature_lpips.c, model/tiny/, meson_options.txt (enable_onnxruntime).
  • Invariant: ordered EP selection (CUDA → DML → CPU) with graceful fallback (ADR-0102); fp16_io does host-side fp32↔fp16 cast on the scoring path; VMAF_TINY_MODEL_DIR enforces a path jail on model load (PR #31); the runtime op-allowlist (PR #21) walks the ONNX graph and rejects unknown ops + bounds Loop/If trip_count at 1024 (ADR-0036/0107). DNN tree is fork-local; upstream has no DNN code yet, so conflicts here are unlikely but the meson_options.txt and core/src/meson.build blocks near the DNN flag may collide.
  • Re-test: meson setup build -Denable_onnxruntime=true && ninja -C build && meson test -C build --suite=dnn.
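
The path-jail idea can be illustrated independently of the C implementation. This is a sketch of the general technique, assuming the jail is a single directory; `inside_jail` is a hypothetical name and does not mirror the fork's loader code:

```python
from pathlib import Path


def inside_jail(model_path: str, jail_dir: str) -> bool:
    """True if model_path resolves to a location under jail_dir.

    Both paths are resolved first so that '..' segments and symlinks
    cannot be used to escape the jail.
    """
    jail = Path(jail_dir).resolve()
    target = Path(model_path).resolve()
    return target == jail or jail in target.parents
```

Resolving before comparing is the important step: a plain string-prefix check would accept `jail/../elsewhere/model.onnx`.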

0005 — --precision CLI flag (IEEE-754 round-trip lossless)

  • Workstream PRs: commit c989fbd9.
  • Touches: core/tools/vmaf.c, core/tools/cli_parse.c, core/include/libvmaf/libvmaf.h (added vmaf_write_output_with_format), core/src/output.c.
  • Invariant: default --precision is %.17g (round-trip lossless); legacy opts back into upstream's %.6f; the public C API gained vmaf_write_output_with_format and the old vmaf_write_output routes through it with the %.17g default. ABI-breaking only if upstream adds a same-named function with a different signature. See ADR-0006.
  • Re-test: vmaf -r ref.yuv -d dis.yuv ... --precision=full and diff against --precision=legacy.
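
The difference between the two format strings can be checked with any double. A short sketch using Python's printf-style formatting, which follows the same conversion rules as C for `%.17g` and `%.6f`; the sample value is arbitrary:

```python
def round_trips(fmt: str, x: float) -> bool:
    """True when formatting x with fmt and parsing it back yields x exactly."""
    return float(fmt % x) == x


score = 76.66890519623612
print("%.17g" % score, round_trips("%.17g", score))
print("%.6f" % score, round_trips("%.6f", score))
```

Seventeen significant digits are enough to round-trip any IEEE-754 double, whereas six decimals discard the low-order bits of this value.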

0006 — Netflix golden tests preserved verbatim as required gate

  • Workstream PRs: across the fork's life; codified in ADR-0024.
  • Touches: python/test/quality_runner_test.py, python/test/vmafexec_test.py, python/test/vmafexec_feature_extractor_test.py, python/test/feature_extractor_test.py, python/test/result_test.py, python/test/resource/yuv/.
  • Invariant: assertAlmostEqual(...) golden values in the five upstream Python test files are never modified by this fork. Fork-added tests live in separate files (e.g. python/test/test_precision_flag.py). The CI gate "Netflix CPU golden tests (D24)" is required and blocks merge. Upstream changes to these files are accepted unless they relax the assertions.
  • Re-test: make test-netflix-golden.

0007 — Build system (CUDA 13.2, oneAPI 2025.3, MkDocs migration)

  • Workstream PRs: #7, #17, commit 8a995cb0.
  • Touches: meson.build, meson_options.txt, top-level Makefile, docs/ (Sphinx → MkDocs Material migration — docs/conf.py removed, mkdocs.yml added), docs/requirements.txt, Dockerfile.*, distro install scripts under scripts/.
  • Invariant: image pins are non-conservative (ADR-0027) — CUDA 13.2, oneAPI 2025.3, clang-format 22, black 26 — and ship experimental toolchain flags (--expt-relaxed-constexpr, etc.) deliberately. An upstream sync that pulls in a Dockerfile change targeted at older CUDA or older oneAPI must not relax the pins.
  • Re-test: meson setup build -Denable_cuda=true -Denable_sycl=true && ninja -C build && mkdocs build --strict.

0008 — Workspace / docs / MATLAB / resource-tree relocations

  • Workstream PRs: codified across ADR-0026, ADR-0029, ADR-0030, ADR-0031, ADR-0032, ADR-0033, ADR-0034, ADR-0038.
  • Touches: any path-walk in upstream's CI / scripts / docs that assumes the upstream layout (root-level workspace/, resource/, matlab/, root unittest script, root patches/).
  • Invariant: the fork's layout is python/vmaf/workspace/, python/vmaf/resource/, python/vmaf/matlab/, scripts/unittest, ffmpeg-patches/ only, .github/codeql-config.yml. Upstream moves to a different sub-tree (e.g. a hypothetical tools/workspace/) need to either be applied via a corresponding fork-side relocation or rejected with a rebase note.
  • Re-test: python -m pytest python/test/ -k golden (verifies the resource-tree path works); make test-netflix-golden.

0009 — License headers (Lusoris/Claude on wholly-new files, 2016–2026 on Netflix files)

  • Workstream PRs: commits c159761d, a185f8ef, 0e98c949, codified in ADR-0025 / ADR-0105.
  • Touches: every wholly-new fork file (notably the SYCL tree and core/src/dnn/) and every Netflix-touched file (year range 2016 → 2016–2026).
  • Invariant: wholly-new fork files carry Copyright 2026 Lusoris and Claude (Anthropic) under the same BSD-3-Clause-Plus-Patent license; mixed files use a dual-copyright notice. An upstream commit that resets a Netflix file's year range (e.g. back to 2016–2020) must be partially rejected — keep the fork's 2016–2026.
  • Re-test: check that wholly-new fork files retain the Lusoris/Claude header (grep -L "Copyright 2026 Lusoris" core/src/sycl/*.cpp; -L lists files that lack the string, so the expected output is empty).

0010 — .claude/ agent scaffolding + ADR tree + AGENTS.md / CLAUDE.md

  • Workstream PRs: #14, #24, #37, plus continuous additions.
  • Touches: .claude/, AGENTS.md, CLAUDE.md, docs/adr/, .github/PULL_REQUEST_TEMPLATE.md.
  • Invariant: this whole tree is fork-local and has no upstream counterpart. Upstream additions to .github/ (issue templates, workflows) need to merge cleanly with the fork's existing files rather than replacing them. The ADR tree's IDs ≤ 0099 are backfills; new decisions start at 0100 (ADR-0028 / ADR-0106).
  • Re-test: visual review of .github/ and docs/adr/README.md after the merge.

Pre-ADR-0108 entries above are the result of a one-shot backfill sweep on 2026-04-18; subsequent fork-local PRs add their own entries inline.

0011 — Nightly bisect-model-quality + fixture cache

  • Workstream PRs: closes #4; sticky tracker issue #40.
  • Touches: .github/workflows/nightly-bisect.yml, ai/scripts/build_bisect_cache.py, ai/testdata/bisect/{features.parquet, models/*.onnx, README.md}, scripts/ci/post-bisect-comment.py, docs/ai/bisect-model-quality.md, docs/adr/0109-nightly-bisect-model-quality.md, docs/research/0001-bisect-model-quality-cache.md, mkdocs.yml (nav).
  • Invariant: the committed parquet + ONNX bytes under ai/testdata/bisect/ must regenerate byte-identically from ai/scripts/build_bisect_cache.py with seeds FEATURE_SEED=20260418 and MODEL_SEED=20260419. The CI --check step asserts this before every bisect run, so any upstream pull that bumps pandas / pyarrow / onnx enough to change the serialiser bytes will fail the workflow until the cache is regenerated and committed.
  • Re-test:
python ai/scripts/build_bisect_cache.py --check
vmaf-train bisect-model-quality \
    ai/testdata/bisect/models/model_*.onnx \
    --features ai/testdata/bisect/features.parquet \
    --min-plcc 0.85 --input-name input
# Expected: "no regression in this range"; first_bad_index None.

Pure upstream code is not touched, so no Netflix-side conflict vector. Only fork-local files; risk is toolchain drift, not merge conflict.
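
The byte-identical regeneration check reduces to "rebuild from the seed, compare digests". A toy sketch of that idea, in which `build_blob` is a hypothetical stand-in for the real generator and the real `--check` implementation may differ:

```python
import hashlib
import random


def build_blob(seed: int, n: int = 1024) -> bytes:
    """Stand-in for a seeded artifact generator (hypothetical)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def check(committed: bytes, seed: int) -> bool:
    """True when regenerating from the seed reproduces the committed bytes."""
    regenerated = build_blob(seed, len(committed))
    return hashlib.sha256(regenerated).digest() == hashlib.sha256(committed).digest()


committed = build_blob(20260418)
print(check(committed, 20260418))
```

A dependency bump that changes the serialiser output plays the role of a changed generator here: the seed is the same, yet the digest no longer matches until the cache is rebuilt and committed.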

0012 — Upstream ADM port (Netflix 966be8d5)

  • Workstream PRs: this PR; ports a single upstream commit.
  • Touches: core/src/feature/integer_adm.{c,h}, core/src/feature/x86/adm_avx2.{c,h}, core/src/feature/x86/adm_avx512.{c,h}, core/src/feature/alias.c, core/src/feature/barten_csf_tools.h (new upstream file).
  • Invariant: the eight ADM files now mirror upstream's content byte-for-byte (modulo our clang-format-22 pass and the Netflix copyright-year bump on the new header). Future /sync-upstream runs can take new upstream ADM commits cleanly. Do not revert to a pre-966be8d5 ADM kernel without also reverting the call-site signatures in integer_compute_adm — upstream extended i4_adm_cm from 8 to 13 args.
  • Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --model version=vmaf_v0.6.1 -o /tmp/vmaf-port.json
grep '<metric name="vmaf"' /tmp/vmaf-port.json
# Expected: mean ≈ 76.66890 (golden 76.66890519623612, places=4 OK).

0013 — Upstream motion port (Netflix PR #1486 head 2aab9ef1)

  • Workstream PRs: this PR; ports upstream PR #1486 (4 commits on top of 966be8d5 ADM base, head 2aab9ef1). Sister to entry 0012.
  • Touches: core/src/feature/integer_motion.{c,h}, core/src/feature/motion_blend_tools.h (new upstream file), core/src/feature/x86/motion_avx2.c, core/src/feature/x86/motion_avx512.c, core/src/feature/alias.c (additive: integer_motion3 row), python/test/{quality_runner,vmafexec,feature_extractor,vmafexec_feature_extractor}_test.py (golden tolerance updates: places=4 → places=2 on motion-affected asserts; expected values unchanged).
  • Invariant: motion files mirror upstream byte-for-byte (modulo our clang-format-22 pass). The alias.c row for integer_motion3 was inserted surgically to avoid clobbering the AVX-512 ADM registration added by entry 0012; the new motion3 metric appears in default VMAF model output but is not standalone-loadable via --feature integer_motion3 (sub-feature only). Netflix golden VMAF mean shifts 76.668904824 → 76.667830213 (well within the places=2 tolerance the upstream PR loosened to). Do not restore places=4 on motion-touching assertions without also reverting the motion code.
  • Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --model version=vmaf_v0.6.1 -o /tmp/vmaf-motion-port.json
grep -E '<metric name="vmaf"|integer_motion3' /tmp/vmaf-motion-port.json
# Expected: vmaf mean ≈ 76.66783; integer_motion3 mean ≈ 3.98976.
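
The places=2 versus places=4 distinction follows directly from how `assertAlmostEqual` compares values. A small sketch applying that rule to the two means quoted in the invariant; `almost_equal` is a local helper written for illustration:

```python
def almost_equal(a: float, b: float, places: int) -> bool:
    """Same rule as unittest's assertAlmostEqual: round(|a - b|, places) == 0."""
    return round(abs(a - b), places) == 0


old_mean = 76.668904824
new_mean = 76.667830213
print(almost_equal(old_mean, new_mean, places=2))
print(almost_equal(old_mean, new_mean, places=4))
```

The shift of roughly 1.07e-3 rounds to zero at two places but not at four, which is why the motion-affected assertions cannot keep places=4 alongside the ported motion code.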

0014 — Coverage gate overhaul + upstream python/test/ reformat

  • Workstream PRs: this PR (coverage-gate overhaul + in-tree reformat of upstream-mirror Python tests).
  • Touches: .github/workflows/ci.yml (CPU + GPU coverage jobs: -Dc_args=-fprofile-update=atomic / -Dcpp_args=-fprofile-update=atomic, meson test --num-processes 1, -Denable_dnn=enabled, ORT install step on the CPU coverage job, lcov/geninfo replaced by gcovr with --json-summary / --xml / --txt output, artifact rename coverage-lcov-{cpu,gpu} → coverage-{cpu,gpu}), scripts/ci/coverage-check.sh (rewritten to parse gcovr JSON via python3 -c — same CLI signature), core/src/dnn/dnn_api.c + new core/src/dnn/dnn_attach_api.c (vmaf_use_tiny_model carved out into its own TU so the unit-test binaries — which pull in dnn_sources for feature_lpips.c but never link libvmaf.c — don't end up with an undefined reference to vmaf_ctx_dnn_attach once enable_dnn=enabled activates the real bodies), core/src/dnn/meson.build + core/src/meson.build (new dnn_libvmaf_only_sources list wired into libvmaf.so only), python/test/{feature_extractor,quality_runner,vmafexec,vmafexec_feature_extractor}_test.py (mechanical Black + isort reformat — no assertion values changed, imports regrouped, line wrapping normalised).
  • Invariant: coverage CI must keep all five pieces in lockstep — (a) -fprofile-update=atomic closes the intra-process counter race on SIMD inner loops (vif_avx2.c:673, motion_avx2, etc.) → negative counts → geninfo/gcovr abort; (b) --num-processes 1 closes the inter-process race where multiple parallel test binaries merge their counters into the same .gcda files for the shared libvmaf.so at process exit (per-thread atomicity does not cover this); (c) gcovr deduplicates .gcno files belonging to the same source compiled into multiple targets — without dedup, lcov sums hits across compilation units and yields impossible >100% values (dnn_api.c — 1176% was the smoking gun on the first attempt that had only (a)+(b)); (d) ORT install + enable_dnn=enabled in the coverage job is what makes core/src/dnn/*.c measurable in the first place — without ORT, the DNN tree compiles in stub branches and the 85% per-critical-file gate is meaningless; (e) vmaf_use_tiny_model lives in dnn_attach_api.c and is added to libvmaf.so only via dnn_libvmaf_only_sources — moving it back into dnn_api.c reintroduces the vmaf_ctx_dnn_attach undefined-reference link error in test_feature_extractor / test_lpips whenever enable_dnn=enabled, since those test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c. Lint scope: upstream-mirror Python tests are linted at the same standard as fork-added code; we accept that /sync-upstream and /port-upstream-commit will re-trigger Black/isort failures whenever upstream rewrites these files, and the fix is another in-tree reformat pass — never an exclusion. The fork's pyproject.toml and .pre-commit-config.yaml keep python/test/resource/ (binary fixtures only) excluded; python/test/*.py is in scope. See ADR-0110 (race fixes, superseded) and ADR-0111 (gcovr + ORT layer).

  • Re-test:
# Reproduce coverage path locally (requires gcc + python3-pip):
pip install --user 'gcovr>=8.0'
cd libvmaf
meson setup build-cov-test --buildtype=debug -Db_coverage=true \
    -Denable_avx512=true -Denable_float=true -Denable_dnn=disabled \
    -Dc_args=-fprofile-update=atomic -Dcpp_args=-fprofile-update=atomic
ninja -C build-cov-test
meson test -C build-cov-test --print-errorlogs --num-processes 1
~/.local/bin/gcovr --root .. \
    --filter 'src/.*' \
    --exclude '.*/test/.*' --exclude '.*/tests/.*' \
    --exclude '.*/subprojects/.*' \
    --gcov-ignore-parse-errors=negative_hits.warn \
    --gcov-ignore-parse-errors=suspicious_hits.warn \
    --print-summary --txt build-cov-test/coverage.txt \
    --json-summary build-cov-test/coverage.json \
    build-cov-test
grep -E 'dnn_api|model_loader' build-cov-test/coverage.txt
# Expected: gcovr completes without "Unexpected negative count" AND no
# per-file percentages exceed 100% (drop --num-processes 1 to reproduce
# the multi-process .gcda merge race; switch back to lcov to reproduce
# the dnn_api.c — 1176% over-count from compilation-unit summation).

# Lint smoke test for upstream-mirror tree:
pre-commit run --files python/test/quality_runner_test.py
# Expected: Black/isort/Ruff all PASS — files are reformatted in-tree
# to fork style and stay clean until the next upstream sync.
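
Piece (c) of the invariant, the over-100% readings, comes down to counting the same source lines once per compilation unit. The sketch below is a deliberately simplified model with made-up numbers; it is not how lcov or gcovr compute coverage internally:

```python
def pct_summed(units, total_lines):
    """Covered lines added up per compilation unit, divided by one file's lines."""
    return 100.0 * sum(len(covered) for covered in units) / total_lines


def pct_deduped(units, total_lines):
    """Covered lines merged across units before dividing."""
    return 100.0 * len(set().union(*units)) / total_lines


# One 10-line source file compiled into three targets, each covering lines 1-8.
units = [set(range(1, 9)) for _ in range(3)]
print(pct_summed(units, 10), pct_deduped(units, 10))
```

Summation reports 240% for a file that is 80% covered, while merging first keeps the result bounded by 100%.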

0015 — Tox doctest collection skips vmaf/resource/

  • Workstream PRs: this PR (fix(ci): skip pytest doctest collection of vmaf/resource/ data files). Surfaced once ADR-0115 consolidated CI triggers to master and tox actually started running on PRs.
  • Touches: python/tox.ini (single-line --ignore=vmaf/resource added to the pytest invocation, plus an explanatory comment block). Pure fork-local; no upstream Python file changes.
  • Invariant: pytest --doctest-modules must not attempt to import files under python/vmaf/resource/. Those are parameter / dataset / example-config .py files; several have dots in their stems (e.g. vmaf_v7.2_bootstrap.py) that make them unimportable as Python modules. None carry doctests, so the ignore is correctness rather than a workaround. Do not drop the --ignore=vmaf/resource flag without first verifying every file under that directory has been renamed to a dot-free stem and is importable.
  • Re-test:
cd python && tox -e py311 -- --collect-only --doctest-modules \
    --ignore=vmaf/resource 2>&1 | grep -c "ERROR collecting vmaf/resource"
# Expected: 0 (was 5 before the fix).

Pure upstream code is not touched, so no Netflix-side conflict vector. Risk is upstream renaming or removing files under python/vmaf/resource/ such that the directory disappears, in which case the --ignore becomes a harmless no-op.
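
Whether a file could be imported by name can be tested from its stem alone. A minimal sketch, which could be used to verify the "renamed to a dot-free stem" precondition before dropping the ignore flag; `importable_stem` is a hypothetical helper:

```python
import keyword
from pathlib import Path


def importable_stem(path: str) -> bool:
    """True when the file stem is a valid, non-keyword Python identifier."""
    stem = Path(path).stem
    return stem.isidentifier() and not keyword.iskeyword(stem)


for name in ("vmaf_v7.2_bootstrap.py", "result.py"):
    print(name, importable_stem(name))
```

The dotted stem fails because Python would interpret `vmaf_v7.2_bootstrap` as a package path rather than a single module name.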

  • Workstream PRs: this PR (fix(libvmaf): gate -fsycl link arg on icpx CXX, allow gcc/clang host linker). Surfaced once ADR-0115's CI consolidation added an Ubuntu SYCL job to PR-time CI that uses CXX=g++ (host linker) with sidecar icpx for SYCL .cpp compilation.
  • Touches: core/src/meson.build (the vmaf_link_args block immediately after the is_sycl_enabled flag handling — currently ~lines 696-712). Pure fork-local; no upstream Meson file changes expected.
  • Invariant: -fsycl is appended to vmaf_link_args only when meson.get_compiler('cpp').get_id() == 'intel-llvm' (icpx). Rationale: the documented project mode (see comment near is_sycl_enabled block at top of src/meson.build) compiles SYCL .cpp files via custom_target with icpx, while the project's CXX driver may be gcc / clang / msvc; in that mode the SPIR-V device code is already embedded in the icpx-compiled .o files at compile time, and the runtime libraries (libsycl + libsvml + libirc + libze_loader) declared as link dependencies resolve every symbol. Passing -fsycl to a non-icpx linker is a hard error (g++: error: unrecognized command-line option '-fsycl'). Do not remove the cpp.get_id() == 'intel-llvm' guard without first verifying every CI matrix leg uses icpx as the project CXX.
  • Re-test:
meson setup build -Denable_sycl=true \
    -Dcpp_link_args=-Wl,--no-undefined
ninja -C build src/libvmaf.so.3
# Expected: link succeeds; no `-fsycl` errors with gcc/clang host CXX.

Pure fork-local guard; no Netflix-side conflict vector.

0017 — CLI precision default %.6f (Netflix-compat) + frame-skip unref

  • Workstream PRs: this PR (fix(cli): revert precision default to %.6f and unref skipped frames). Reverts the default flipped by commit c989fbd9 (ADR-0006) per ADR-0119. Companion fix in core/tools/vmaf.c resolves the picture-pool exhaustion in the --frame_skip_ref/dist loops surfaced once the always-on picture pool (ADR-0104) made unref'ing skipped pictures mandatory.
  • Touches:
  • core/tools/cli_parse.c (VMAF_DEFAULT_PRECISION_FMT + VMAF_LOSSLESS_PRECISION_FMT macros, resolve_precision_fmt() body, --help text)
  • core/tools/cli_parse.h (field comments only; struct shape unchanged)
  • core/src/output.c (DEFAULT_SCORE_FORMAT macro)
  • core/tools/vmaf.c (skip loop bodies at the c.frame_skip_ref / c.frame_skip_dist for-loops)
  • python/vmaf/core/result.py (per-frame and aggregate :.6f formatters)
  • python/test/command_line_test.py is unmodified — Netflix golden assertions stay frozen per CLAUDE.md §8; the binary's output format adapts to them, not the other way around.
  • Invariant: vmaf CLI default score-output format is %.6f (matches upstream Netflix byte-for-byte). --precision=max|full selects %.17g (IEEE-754 round-trip lossless). --precision=legacy is a synonym for the default. The library default for vmaf_write_output_with_format(..., score_format=NULL) matches. Skipped frames in the --frame_skip_ref / --frame_skip_dist pre-loops are vmaf_picture_unref'd immediately after fetch so the preallocated picture pool is not exhausted before the main scoring loop runs. Do not flip the macros back to %.17g or remove the unrefs without a superseding ADR — both are golden-gate-load-bearing.
  • Re-test:
ninja -C core/build
python -m pytest -v \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping_unequal
# Expected: all three PASS in <1 s combined.
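
The pool-exhaustion failure behind the skip-loop invariant can be illustrated with a toy fixed-size pool. This is a minimal sketch, not libvmaf code: ToyPool, toy_fetch, toy_unref, and fetch_after_skip are hypothetical names, and the pool size of 4 is an arbitrary assumption. It only shows why skipped frames must release their slot before the main loop starts fetching.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a preallocated picture pool (hypothetical; the real
 * pool lives inside libvmaf and is released via vmaf_picture_unref). */
enum { POOL_SIZE = 4 };

typedef struct {
    int in_use[POOL_SIZE];
} ToyPool;

/* Returns a slot index, or -1 when every slot is still referenced. */
static int toy_fetch(ToyPool *pool)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Marks a previously fetched slot as free again. */
static void toy_unref(ToyPool *pool, int slot)
{
    assert(slot >= 0 && slot < POOL_SIZE);
    pool->in_use[slot] = 0;
}

/* Skips n_skip frames, then reports whether the next fetch succeeds.
 * With unref_skipped == 0 the skipped frames keep holding their slots. */
static int fetch_after_skip(int n_skip, int unref_skipped)
{
    ToyPool pool = {{0}};
    for (int i = 0; i < n_skip; i++) {
        int slot = toy_fetch(&pool);
        if (slot < 0)
            return 0; /* pool exhausted inside the skip loop */
        if (unref_skipped)
            toy_unref(&pool, slot);
    }
    return toy_fetch(&pool) >= 0;
}
```

Under these assumptions, fetch_after_skip(10, 1) succeeds because each skipped frame returns its slot, while fetch_after_skip(10, 0) fails once the four slots are held by skipped frames.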

Pure fork-local; no Netflix-side conflict vector. If upstream ever changes the default format string, treat their value as the new baseline and reconfirm the golden assertions before adopting.
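
The difference between the two format strings in the invariant can be checked directly. The sketch below is illustrative only: round_trips is a hypothetical helper, not part of the vmaf CLI, and it assumes IEEE-754 doubles with a correctly rounding C library. It formats a value, parses the text back with strtod, and reports whether the original double was recovered.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats `value` with `fmt`, parses the text back, and reports whether
 * the parsed double is identical to the input (1) or not (0).
 * `fmt` must consume exactly one double, e.g. "%.6f" or "%.17g". */
static int round_trips(double value, const char *fmt)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, fmt, value);
    /* The formats used here never come close to filling the buffer. */
    assert(n > 0 && (size_t)n < sizeof buf);
    return strtod(buf, NULL) == value;
}
```

With those assumptions, round_trips(1.0 / 3.0, "%.17g") holds, while round_trips(1.0 / 3.0, "%.6f") does not, because the six-decimal text 0.333333 discards the remaining digits. That is the trade-off the default accepts in exchange for matching upstream output.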

0018 — FFmpeg patches ship as ordered series.txt

  • Workstream PRs: this PR (fix(ci): drop dead sycl trigger + consolidate windows.yml into libvmaf.yml (ADR-0115)). Surfaced once ADR-0115's consolidation routed the docker / FFmpeg-SYCL jobs through the master-targeting CI gate for the first time on this branch — the standalone 0003-…sycl… apply broke because it referenced struct fields added by 0001-…tiny-model…, the Dockerfile only COPY'd 0003, and ffmpeg.yml referenced a stale ../patches/ path.
  • Touches: Dockerfile (lines ~86-95 — the FFmpeg patch-apply block), .github/workflows/ffmpeg.yml (the Build FFmpeg with SYCL patch series step), ffmpeg-patches/000{1,2,3}-*.patch (regenerated via real git format-patch -3 so they carry valid index <sha>..<sha> <mode> lines and committable SHAs). Pure fork-local; no upstream FFmpeg or Netflix file changes.
  • Invariant: both the Dockerfile and ffmpeg.yml walk ffmpeg-patches/series.txt line-by-line and apply each patch via git apply with a patch -p1 fallback. Do not ship a new patch without appending it to series.txt, and do not reorder existing entries — patch 0003 references LIBVMAFContext fields added by patch 0001, so any out-of-order apply breaks the build at hunk 2 of vf_libvmaf.c.
  • Four further fixes bundled in the same PR (three configure/flag-side, one missing include):
  • --enable-libvmaf-sycl is not a valid FFmpeg configure option. Patch 0003 uses check_pkg_config libvmaf_sycl … auto-detection (matching how libvmaf_cuda is wired) — it never registers the switch. Both Dockerfile and ffmpeg.yml used to pass the flag and configure rejected it with Unknown option "--enable-libvmaf-sycl". SYCL support is now controlled solely by -Denable_sycl=true at libvmaf build time; FFmpeg picks it up automatically when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
  • The Dockerfile now carries two nvcc-flag ARGs. NVCC_FLAGS (libvmaf) keeps four -gencode lines plus the experimental --extended-lambda / --expt-relaxed-constexpr / --expt-extended-lambda flags needed for Thrust/CUB host+device code. FFMPEG_NVCC_FLAGS (FFmpeg) carries a single -gencode arch=compute_75,code=sm_75 -O2 — FFmpeg's check_nvcc runs nvcc -ptx, which fails with nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures on multi-arch input, and --extended-lambda requires host+device compilation. compute_75 PTX is forward-compatible with all newer GPUs via driver JIT.
  • --enable-libnpp is no longer passed to FFmpeg's configure. FFmpeg n8.1's libnpp probe carries an explicit die "ERROR: libnpp support is deprecated, version 13.0 and up are not supported" (configure:7335-7336) that fires on the base image's CUDA 13.2 libnpp. We don't use scale_npp / transpose_npp / sharpen_npp in any VMAF workflow; cuvid + nvdec + nvenc + libvmaf-cuda is the actual GPU path. Revisit once we move to an FFmpeg release that supports CUDA 13 libnpp upstream.
  • Patch 0002 (add-vmaf_pre-filter) gained a missing #include "libavutil/imgutils.h" for av_image_copy_plane(). FFmpeg's libavfilter Makefile builds with -Werror=implicit-function-declaration so this fired during the actual compile (not configure). Caught by a local docker build rather than waiting for GitHub Actions — much faster iteration loop.
  • Re-test:
cd /tmp && rm -rf ffmpeg-test && \
    git clone -q --depth 1 -b n8.1 \
        https://git.ffmpeg.org/ffmpeg.git ffmpeg-test && \
    cd ffmpeg-test && \
    while IFS= read -r line; do \
        case "$line" in ''|\#*) continue ;; esac; \
        git apply "/path/to/vmaf/ffmpeg-patches/$line" \
            || patch -p1 < "/path/to/vmaf/ffmpeg-patches/$line"; \
    done < /path/to/vmaf/ffmpeg-patches/series.txt
# Expected: all three patches apply with no rejects; the resulting
# tree compiles with --enable-libvmaf. SYCL is auto-detected via
# check_pkg_config (patch 0003), so no explicit configure flag is
# required when libvmaf-sycl.pc is on PKG_CONFIG_PATH.

Pure fork-local series; no Netflix-side conflict vector. See ADR-0118.

0019 — Coverage Gate annotations: upload-artifact v7 + gcovr filter

  • Workstream PRs: this PR.
  • Touches: .github/workflows/ci.yml (CPU + GPU coverage steps: gcovr stderr piped through grep -vE 'Ignoring (suspicious|negative) hits' ... || true), .github/workflows/{ci,lint,nightly,nightly-bisect,supply-chain,libvmaf}.yml (actions/upload-artifact@v5|@v6 → @v7, actions/download-artifact@v5 → @v7 in supply-chain.yml). Note: windows.yml was consolidated into libvmaf.yml by ADR-0115 / PR #50, so the windows-side bump now lives in libvmaf.yml's build (MINGW64, …) job.
  • Invariant: Coverage Gate Annotations panel must finish empty on a clean run. The two pieces are coordinated — (a) @v7 for upload / download artifact actions silences GitHub's Node-20 deprecation banner ahead of the 2026-06-02 forced-Node-24 cutoff; (b) the gcovr stderr filter swallows the Ignoring (suspicious|negative) hits warnings that gcovr 8 emits for the legitimately-large hit counts in tight ANSNR / VIF / motion inner loops (e.g. ansnr_tools.c:207 at ~4.93 G hits across an HD multi-frame coverage suite — real, not gcov bug). The filter is regex-narrow and anchored to gcov's exact warning prefix; any other gcovr warning still surfaces. Upstream (Netflix/vmaf) does not maintain these CI files; rebase impact is limited to the unlikely case that an upstream sync touches the shared .github/workflows/ tree, which it currently does not. See ADR-0117.
  • Re-test:
# Verify gcovr filter locally (after a coverage build per entry 0014):
~/.local/bin/gcovr --root .. \
    --filter 'src/.*' \
    --exclude '.*/test/.*' --exclude '.*/tests/.*' \
    --exclude '.*/subprojects/.*' \
    --gcov-ignore-parse-errors=negative_hits.warn \
    --gcov-ignore-parse-errors=suspicious_hits.warn \
    --print-summary --txt build-cov-test/coverage.txt \
    build-cov-test \
  2> >(grep -vE 'Ignoring (suspicious|negative) hits' >&2 || true)
# Expected: stderr contains the gcovr summary block but NO
# "Ignoring (suspicious|negative) hits" lines. coverage.txt unchanged.

# Verify all upload/download-artifact instances are on @v7:
grep -rE 'actions/(upload|download)-artifact@v[0-6]' .github/workflows/
# Expected: empty output.

0020 — CI workflow file + display-name renames (Title Case sweep)

  • Workstream PRs: this PR; renames all six core .github/workflows/*.yml files to purpose-descriptive kebab-case and normalises every workflow name: and job name: to Title Case. See ADR-0116.
  • Touches: .github/workflows/{ci,lint,security,libvmaf,ffmpeg,docker}.yml (renamed via git mv to tests-and-quality-gates.yml, lint-and-format.yml, security-scans.yml, libvmaf-build-matrix.yml, ffmpeg-integration.yml, docker-image.yml), README.md (5 badge URLs + labels), docs/principles.md (line 5 workflow-tuple update), .claude/skills/add-gpu-backend/SKILL.md + scaffold.sh (filename refs), docs/adr/0116-*.md (new), docs/adr/README.md (index row), CHANGELOG.md.
  • Invariant: workflow files are purpose-named; their name: fields are Title Case sentences with em-dash axis tags; job-level name: strings are Title Case sentences (Build — / Pre-Commit / Coverage Gate / etc.). Required-status-check contexts in master branch protection are bound to job-level names — when renaming any job, re-pin via gh api --method PUT repos/VMAFx/vmafx/branches/master/protection. The 19 required gates' semantics are unchanged from ADR-0037; only their display strings move.
  • Re-test:
# Validate every workflow file parses and lists the expected job names.
cd .github/workflows
for f in tests-and-quality-gates.yml lint-and-format.yml security-scans.yml \
         libvmaf-build-matrix.yml ffmpeg-integration.yml docker-image.yml; do
    yq '.name, .jobs.[].name' "$f" || echo "PARSE FAIL: $f"
done
# Expected: each workflow prints its Title Case workflow name + job names;
# no PARSE FAIL lines.

0021 — DNN-enabled CI matrix legs (gcc + clang + macOS)

  • Workstream PRs: this PR; adds three new entries to the libvmaf-build matrix in .github/workflows/libvmaf-build-matrix.yml covering -Denable_dnn=enabled across Ubuntu/gcc, Ubuntu/clang, and macOS/clang. See ADR-0120.
  • Touches: .github/workflows/libvmaf-build-matrix.yml (3 new matrix entries + ORT install steps + dedicated dnn-suite test step), docs/adr/0120-ai-enabled-ci-matrix-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry).
  • Invariant: the DNN matrix legs install ONNX Runtime via the same pinned source as the dedicated Tiny AI job (tests-and-quality-gates.yml) — Linux: MS tarball at the version pinned by ORT_VERSION; macOS: Homebrew. When the Tiny AI job's pin changes, the matrix legs' ORT_VERSION env in their Install ONNX Runtime (linux, DNN leg) step must change to match; otherwise compiler/portability coverage drifts away from the gating leg's actual ABI.
  • Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.libvmaf-build.strategy.matrix.include[] | select(.dnn==true) | .name' \
    .github/workflows/libvmaf-build-matrix.yml
# Expected output (3 lines):
#   Build — Ubuntu gcc (CPU) + DNN
#   Build — Ubuntu clang (CPU) + DNN
#   Build — macOS clang (CPU) + DNN

# Local DNN build sanity (matches what each leg will run):
meson setup libvmaf core/build --buildtype release \
    --prefix $PWD/install -Denable_float=true -Denable_dnn=enabled
ninja -vC core/build install
meson test -C core/build --suite=dnn --print-errorlogs
  • Branch protection: the two Linux DNN legs are pinned as required status checks on master immediately after this PR's merge (19 → 21 contexts). The macOS leg stays informational (experimental: true) because Homebrew ORT floats. Re-pin command:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
    --input /tmp/protection-update.json

0022 — Windows GPU build-only matrix legs (MSVC + CUDA, MSVC + oneAPI SYCL)

  • Workstream PRs: this PR; adds a new top-level windows-gpu-build job to .github/workflows/libvmaf-build-matrix.yml with two matrix entries (CUDA, SYCL). See ADR-0121.
  • Touches:
  • .github/workflows/libvmaf-build-matrix.yml (new windows-gpu-build job)
  • docs/adr/0121-windows-gpu-build-only-legs.md (new)
  • docs/adr/README.md (index row)
  • CHANGELOG.md (Added entry)
  • core/src/compat/win32/pthread.h (new: Win32 pthread shim for MSVC; mirrors compat/gcc/stdatomic.h pattern)
  • core/src/feature/integer_adm.h (UPSTREAM: converted the dwt_7_9_YCbCr_threshold[3] designated initializer to positional form so MSVC/nvcc-on-Windows accepts the C++ parse; semantically identical, no behavioural change)
  • core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM: added #if defined(__cplusplus) && defined(_MSC_VER) branch around #include <stdatomic.h> so MSVC C++ TUs pull atomic_int via using std::atomic_int;; POSIX paths unchanged)
  • core/src/sycl/d3d11_import.cpp (fix non-existent <libvmaf/log.h> → "log.h")
  • core/src/sycl/dmabuf_import.cpp (move <unistd.h> inside #if HAVE_SYCL_DMABUF guard for non-VA-API hosts)
  • core/src/sycl/common.cpp (replace POSIX clock_gettime(CLOCK_MONOTONIC) with portable std::chrono::steady_clock)
  • core/src/feature/x86/motion_avx2.c (UPSTREAM: replace GCC vector-extension __m256i[N] indexing at line 529 with _mm256_extract_epi64; bit-exact)
  • core/src/feature/x86/adm_avx2.c (UPSTREAM: replace 6 (__m256i)(_mm256_cmp_ps(...)) casts with _mm256_castps_si256(...) and 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact)
  • core/src/feature/x86/adm_avx512.c (UPSTREAM: replace 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact)
  • core/src/log.c (UPSTREAM: gate <unistd.h> behind !_WIN32, include <io.h> + redirect isatty/fileno to _isatty/_fileno for MSVC)
  • core/src/feature/integer_vif.c (UPSTREAM: switch the aligned_malloc cursor from void * to uint8_t * with explicit typed-pointer casts so MSVC accepts the byte-wise pointer arithmetic)
  • core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM: drop unused <unistd.h> include)
  • core/src/dnn/model_loader.c (fork-added: Windows fallback definitions for POSIX S_ISDIR / S_ISREG path-classification macros)
  • .github/workflows/lint-and-format.yml (fork-added: set lfs: true on the pre-commit job's checkout so LFS-stored ONNX blobs resolve and don't appear as phantom pre-commit-induced diffs)
  • core/src/feature/x86/motion_avx512.c (UPSTREAM: replace 1 __m128i[N] reduction with _mm_extract_epi64; bit-exact)
  • core/src/feature/x86/{vif_statistic_avx2,ansnr_avx2,ansnr_avx512,float_adm_avx2,float_adm_avx512,float_psnr_avx2,float_psnr_avx512,ssim_avx2,ssim_avx512}.c (UPSTREAM: convert 17 sites of trailing __attribute__((aligned(N))) to leading C11 _Alignas(N); same alignment, MSVC-portable)
  • core/src/feature/mkdirp.c and core/src/feature/mkdirp.h (UPSTREAM third-party MIT-licensed micro-library: gate <unistd.h> to non-Windows, add <direct.h> + _mkdir for Windows, add mode_t typedef for MSVC)
  • core/meson.build (new pthread_dependency gated on cc.check_header('pthread.h') failing)
  • core/src/meson.build and core/test/meson.build (thread pthread_dependency into every target compiling pthread-using TUs)
  • Invariant: Windows GPU legs are pinned to the same toolchain versions as the corresponding Linux GPU legs (CUDA 13.0.0, oneAPI BaseKit 2025.3.0.372) so a Linux-vs-Windows divergence implies an MSVC ABI issue, not a tooling-version delta. When either Linux GPU leg bumps its toolchain, the Windows leg must move in lockstep — the Intel installer URL on Windows hard-codes the per-release directory id and the version string, so the bump is two-line edits in the SYCL Install Intel oneAPI (windows) step (the WINDOWS_BASEKIT_URL env var). Both legs additionally inject /experimental:c11atomics into CFLAGS / CXXFLAGS because libvmaf uses C11 atomics that MSVC's <stdatomic.h> rejects without that opt-in flag — when MSVC ships full C11 atomics support, the flag becomes unconditional and can be dropped. Two Windows-only dependency steps round out the parity: the CUDA leg's Jimver/cuda-toolkit sub-package list includes both crt (CUDA Runtime Library compile-time headers, ships crt/host_config.h; cuda_cccl is not a valid Windows sub-package name — installer rejects it) and nvvm (ships nvvm/bin/cicc.exe + nvvm/libdevice/libdevice.*.bc; without it, nvcc's .cu → PTX stage fails with The system cannot find the path specified. — on Linux apt pulls NVVM in transitively with cuda-nvcc-XY, Windows requires it explicitly); the SYCL leg builds the Level Zero loader from source (oneapi-src/level-zero v1.18.5 → cmake --build … --target install) because Windows oneAPI BaseKit ships the SYCL runtime but not ze_loader.lib, and libvmaf's meson cc.find_library('ze_loader') needs both the header and the import library. When the Linux apt level-zero-dev version moves, bump the L0 git tag to match. core/src/meson.build guards the explicit svml / irc cc.find_library calls behind host_machine.system() != 'windows' — those calls exist for the gcc/g++ + icpx Linux flow where the host linker is non-Intel; on Windows the host compiler is icx-cl itself and auto-injects the Intel runtime. 
Round-10 surfaced an additional Windows-only gap: ~14 libvmaf TUs #include <pthread.h> unconditionally, but MSVC and clang-cl ship no pthread (MinGW does, via winpthreads). The fork now ships a header-only Win32 shim at core/src/compat/win32/pthread.h mapping the in-use pthread subset (mutex / cond / thread create+join+detach) onto SRWLOCK + CONDITION_VARIABLE + _beginthreadex. The shim is wired in via pthread_dependency in core/meson.build, declared only when cc.check_header('pthread.h') fails — so MinGW and POSIX paths stay untouched. When upstream Netflix/vmaf adds new pthread surface (e.g., pthread_rwlock_*), extend compat/win32/pthread.h to cover it. Both nvcc fatbin custom_targets (CUDA) and icpx custom_targets (SYCL common.cpp / picture_sycl.cpp / dmabuf_import.cpp, plus the SYCL feature kernels) bypass meson's dependencies: plumbing and hand-roll their own -I lists, so the shim path must be threaded into both cuda_extra_includes and sycl_inc_flags explicitly on Windows. icpx-cl on Windows additionally rejects -fPIC (unsupported option for target 'x86_64-pc-windows-msvc') — so sycl_common_args and sycl_feature_args route their -fPIC token through sycl_pic_arg = host_machine.system() != 'windows' ? ['-fPIC'] : []. PIC is the default for Windows DLLs, so dropping the flag is the correct fix rather than a workaround. Round-14 surfaced a third Windows-only blocker: core/src/feature/integer_adm.h (an upstream Netflix file, last touched by upstream port d06dd6cf) initialises dwt_7_9_YCbCr_threshold[3] with C99 designated initializers ({.a = ..., .k = ..., .f0 = ..., .g = {...}}). The header is included from both integer_adm.c (C TU) and cuda/integer_adm/*.cu (C++ TU via nvcc); MSVC's C++ frontend (and nvcc's cudafe++ on Windows) rejects C99 designated initializers without /std:c++20. 
Converted to positional initialization in the same struct-member order (a / k / f0 / g[4]); the conversion is provably semantically identical and works in every C/C++ standard, so it costs nothing on the upstream-merge side beyond a trivial conflict marker if upstream Netflix later edits the same lines. Restore designated form post-merge if upstream has it. Round-17 surfaced four more Windows/MSVC-only SYCL blockers, two of which touch upstream-shared headers. (a) core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM) unconditionally #include <stdatomic.h> and use the atomic_int typedef in struct definitions. MSVC's <stdatomic.h> (added in 19.34) only declares the C11 symbols inside the global namespace under C; in C++ compilation (icpx-cl drives the SYCL TUs as C++) MSVC surfaces them only inside namespace std::. gcc/clang expose both via a GNU extension, so the upstream code works on every other platform. The fork now wraps both headers' #include <stdatomic.h> in #if defined(__cplusplus) && defined(_MSC_VER) → #include <atomic> + using std::atomic_int;, falling through to the original <stdatomic.h> line on every other configuration. ABI is unchanged: atomic_int resolves to the same underlying type. If upstream Netflix adds further C11 atomic typedefs in these headers (e.g., atomic_uint, atomic_size_t), extend the using std:: lines to cover them. (b) core/src/sycl/d3d11_import.cpp (fork-added) used <libvmaf/log.h> which doesn't exist; log.h lives at core/src/log.h and is internal. Switched to "log.h"; the icpx invocation already supplies the src-relative -I. (c) core/src/sycl/dmabuf_import.cpp (fork-added) included <unistd.h> at file scope, but POSIX close() is only used inside the #if HAVE_SYCL_DMABUF VA-API block. Moved the <unistd.h> include inside that guard so non-DMA-BUF builds (Windows MSVC, macOS) compile cleanly. (d) core/src/sycl/common.cpp (fork-added) called clock_gettime(CLOCK_MONOTONIC), which doesn't exist on Windows.
Replaced with std::chrono::steady_clock (guaranteed monotonic by the C++ standard, portable on every supported host). All four fixes preserve POSIX/Linux behaviour bit-identically and only change the Windows MSVC build path. Round-18 surfaced a fifth Windows blocker on the CUDA leg's CPU SIMD compile path: core/src/feature/x86/motion_avx2.c:529 (UPSTREAM, ported in commit 9371a0aa from Netflix PR #1486) computed final_accum[0] + final_accum[1] + final_accum[2] + final_accum[3] to extract the four int64 lanes from an __m256i. gcc/clang allow this via the GNU vector-extension treatment of __m256i (it carries __attribute__((vector_size(32)))); MSVC rejects it with C2088: built-in operator '[' cannot be applied to an operand of type '__m256i'. Replaced with _mm256_extract_epi64(final_accum, N) for N ∈ {0..3}, summed, which gives a bit-exact lane sum on every compiler. Restore the index form post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Round-19 surfaced the same MSVC pattern at 19 more call sites across the AVX2/AVX-512 ADM and motion files plus six GCC-style vector casts. core/src/feature/x86/adm_avx2.c (UPSTREAM): 6 lines (915-920) used (__m256i)(_mm256_cmp_ps(...)) C-style casts that gcc/clang accept via the GNU vector extension; replaced with the dedicated _mm256_castps_si256(...) bit-cast intrinsic. 12 lane-extract sites (r2_h[0]+r2_h[1], etc. at lines 2420 / 2425 / 2430 / 2893 / 2897 / 2901 / 4079 / 4084 / 4089 / 4627 / 4631 / 4635) replaced with _mm_extract_epi64(r2_X, N) summed pair. core/src/feature/x86/adm_avx512.c (UPSTREAM): 6 sister lane-extract sites (lines 4470 / 4477 / 4484 / 4625 / 4631 / 4637), same fix. The AVX-512 paths reduce a __m512i down to __m128i first (via _mm512_extracti64x4_epi64 → _mm256_extracti64x2_epi64) before the index, so only the final __m128i[N] step needed changing.
core/src/feature/x86/motion_avx512.c (UPSTREAM, ported in 9371a0aa from PR #1486): one final r2[0]+r2[1] reduction (line 448), same fix. All 19 lane-extract fixes plus the 6 cast fixes are bit-exact rewrites and only change the source-level syntax to MSVC-portable form. Restore the original forms post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Additionally core/src/sycl/d3d11_import.cpp (fork-added) switched from C-style COBJMACROS helpers (ID3D11Device_CreateTexture2D, …_Release, etc.) to C++ method-call syntax (device->CreateTexture2D, tex->Release) — d3d11.h gates COBJMACROS behind !defined(__cplusplus), so the C-style helpers aren't visible in this .cpp TU. The two forms are ABI-equivalent (both dispatch through the COM vtable); the choice is purely lexical and POSIX builds aren't affected (the whole TU is #ifdef _WIN32). Round-20 surfaced two more Windows-only blockers. (a) 17 sites across the x86 SIMD layer used GCC's float tmp[N] __attribute__((aligned(M))); form to align scratch buffers for _mm{256,512}_store_ps. MSVC rejects the trailing-attribute syntax with C2146: syntax error: missing ';' before identifier '__attribute__'. Replaced with the C11-standard _Alignas(M) float tmp[N]; (alignment specifier before the type) — works in gcc, clang and MSVC with /std:c11. Files touched (all UPSTREAM): vif_statistic_avx2.c (×2), ansnr_avx2.c (×2), ansnr_avx512.c (×2), float_adm_avx2.c (×2), float_adm_avx512.c (×2), float_psnr_avx2.c (×1), float_psnr_avx512.c (×1), ssim_avx2.c (×4), ssim_avx512.c (×4). The pre-existing vif_avx2.c / vif_avx512.c already define a portable ALIGNED(x) macro at file scope and position the attribute before the type, so they compile cleanly under MSVC and were not touched. 
(b) core/src/feature/mkdirp.c (UPSTREAM, third-party MIT-licensed copy of Stephen Mathieson's micro-library) included <unistd.h> unconditionally but never used POSIX unistd symbols (only mkdir via <sys/stat.h>/<direct.h>). Gated <unistd.h> to non-Windows and added <direct.h> for Windows; switched mkdir(pathname) → _mkdir(pathname) (the non-deprecated MSVC name). core/src/feature/mkdirp.h added a mode_t typedef under MSVC since neither <sys/types.h> nor <sys/stat.h> declare it on Windows; mode is ignored on the Windows path anyway. Round-21 surfaced two more blockers (the round-19 __m128i[N] sweep missed six sites) plus a pre-commit workflow checkout gap. (a) core/src/feature/x86/adm_avx512.c (UPSTREAM) had six further r2_X[0] + r2_X[1] reductions at lines 2128 / 2135 / 2142 / 2589 / 2595 / 2601 that reduce a __m512i accumulator down to __m128i before the lane index. Replaced with the same _mm_extract_epi64(r2_X, N) summed-pair pattern used in round 19 (bit-exact, MSVC-portable). (b) core/src/log.c (UPSTREAM) included <unistd.h> unconditionally to pick up POSIX isatty / fileno. On MSVC both live in <io.h> as _isatty / _fileno; gated the include and macro-redirected the names so the one call site at line 34 compiles on both sides without touching the POSIX path. (c) .github/workflows/lint-and-format.yml (fork-added) checks out without lfs: true, so the model/tiny/*.onnx files land as LFS pointer stubs. pre-commit's "changes made by hooks" reporter then diffs the stubs against HEAD's real blobs and fails the job even though no hook touched them. Added lfs: true to the pre-commit job's checkout. (d) The cuda_common_vmaf_lib static library in core/src/meson.build had no dependencies: list, so the Win32 pthread shim (wired in via pthread_dependency in core/meson.build) wasn't on its include path; cuda/common.h unconditionally #include <pthread.h> and MSVC failed with C1083. Added dependencies : [pthread_dependency]; this is a no-op on POSIX (empty list) and routes the shim path in on Windows.
(e) core/src/feature/integer_vif.c (UPSTREAM) walked one big aligned_malloc result as void *data and did data += pad_size / data += h * stride_16 etc. to carve the buffer into typed sub-pointers. gcc/clang accept pointer arithmetic on void * as a GNU extension (treating sizeof(void) == 1); MSVC rejects it with C2036: 'void *': unknown size. Replaced the cursor type with uint8_t * and added explicit casts at assignment sites that take a typed pointer (uint16_t *mu1, uint32_t *mu1_32, etc.). Byte offsets are identical, layout unchanged, bit-exact. If upstream Netflix edits the same loop, reabsorb the walk and re-apply the cursor-type + cast pattern. (f) core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM) included <unistd.h> at line 33 but used no POSIX symbols from it; MSVC failed with C1083. Dropped the unused include outright — simplest fix, no runtime change on any platform. (g) core/src/dnn/model_loader.c (fork-added) uses S_ISDIR / S_ISREG to classify resolved paths. MSVC ships the underlying S_IFMT / S_IFDIR / S_IFREG bit masks in <sys/stat.h> but not the POSIX classification macros. Added a Windows-only fallback (#ifndef S_ISDIR #define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) #endif, same for S_ISREG) guarded by #ifdef _WIN32. Semantically identical to the POSIX macro on Linux/macOS. Round-21e surfaced the final source-portability blockers once the DLL build passed preprocessing. (h) core/src/predict.c, core/src/libvmaf.c and core/src/read_json_model.c (all UPSTREAM) used C99 variable-length arrays — double scores[cnt] at predict.c:385, char name[name_sz] at predict.c:453 and libvmaf.c:1741, plus cfg_name[cfg_name_sz] and generated_key[generated_key_sz] in the .json model-collection parser. 
gcc/clang accept VLAs as a C11 optional feature; MSVC (even with /std:c11) rejects them outright with C2057: expected constant expression (plus C2466 and C2133 on the const size_t sized arrays — MSVC treats const as runtime-bounded, not a constant expression, even when the initialiser is literal like 4 + 1). Replaced each runtime-sized buffer with a small malloc + explicit free on every exit path (in predict.c and read_json_model.c a goto out; cleanup arm was introduced because the loops error-exit mid-function). The generated_key buffer in read_json_model.c uses the narrower fix — char generated_key[5]; — since its size (four decimal digits of the bootstrap sub-model index plus NUL) is a true compile-time constant. Buffers are a handful of bytes each (name_sz is the model-collection name length plus the fixed _ci_p95_lo suffix, scores holds ~20 doubles, cfg_name is the name plus _0000 suffix), so the heap round-trip is not performance-relevant; the new -ENOMEM failure mode is handled uniformly by existing callers. The read_json_model.c refactor also plugs a pre-existing leak of the name buffer on the early return -EINVAL when a JSON object key isn't a string — the goto out; path frees name + cfg_name on every exit. core/test/test_feature_extractor.c:56 (UPSTREAM) declared const unsigned n_threads = 8; and used it as the extent of VmafFeatureExtractorContext *fex_ctx[n_threads];. Converted to enum { n_threads = 8 }; so MSVC sees a constant-expression; every other compiler accepts enum constants identically. Re-absorb if upstream Netflix later edits the same loops and your toolchain matrix omits MSVC. (i) The Windows MSVC build-only legs now build the full tree — CLI tools, unit tests and libvmaf.dll — rather than the previous short cut of disabling -Denable_tools / -Denable_tests. 
Per user direction ("fix the code ffs"), the tree polyfills the remaining POSIX surfaces on MSVC instead: (core/tools/compat/win32/getopt.h + core/tools/compat/win32/getopt.c) a from-scratch POSIX/GNU-compatible getopt_long shim (short / long options, no_argument / required_argument / optional_argument, argv permutation for non-option operands, -- explicit stop, =-embedded values). The shim is fork-added (BSD-3-Clause-Plus-Patent, Copyright 2026 Lusoris and Claude) and declared via a single getopt_dependency in core/meson.build, gated on cc.check_header('getopt.h') failing. The dependency auto-propagates the shim .c into any consuming target via meson's sources: keyword, so both the vmaf CLI (core/tools/meson.build) and the test_cli_parse unit test (core/test/meson.build) pick it up uniformly. MinGW ships <getopt.h> via mingw-w64-crt, so check_header succeeds there and the shim stays out of the TU list. (j) Eleven test executables (test_log, test_dict, test_opt, test_cpu, test_ref, test_feature, test_ciede, test_luminance_tools, test_cli_parse, test_sycl, test_sycl_pic_preallocation) were missing pthread_dependency in their dependencies: lists at core/test/meson.build. On POSIX pthread_dependency is an empty list so the omission was invisible; on MSVC those TUs transitively include feature_collector.h → <pthread.h> and fail with C1083. Threaded the dependency through all eleven targets. test_cli_parse additionally lists getopt_dependency to pick up the shim. (k) Three additional VLA sites surfaced once the test harness built on MSVC: test_cambi.c:254 had unsigned w = 5, h = 5; uint16_t buffer[3 * w];; converted to enum { w = 5, h = 5 }; so the array extent is a constant expression. test_pic_preallocation.c:382 and test_pic_preallocation.c:506 had const int num_threads = N; pthread_t threads[num_threads]; and MSVC rejects const int as a non-constant expression. Converted to enum { num_threads = N, fetches_per_thread = M };.
(l) test_ring_buffer.c:23 and test_pic_preallocation.c:26 included <unistd.h> for usleep / sleep. Gated behind !_WIN32 with a Win32 fallback via <windows.h> + #define usleep(us) Sleep(((us) + 999) / 1000) / #define sleep(s) Sleep((s) * 1000). The conversion rounds sub-millisecond usleep inputs up, which is safe for these test paths (they use 100 µs jitter and 1 s waits). (m) core/tools/vmaf.c included <unistd.h> for isatty / fileno. Applied the same gating pattern used in log.c in round-21(b) — include <io.h> on MSVC and redirect isatty / fileno to _isatty / _fileno via #define. (n) __builtin_clz / __builtin_clzll are GCC intrinsics; MSVC ships __lzcnt / __lzcnt64 via <intrin.h> instead. The shim already lived in core/src/feature/integer_vif.h but integer_adm.c:939, x86/adm_avx2.c:1425 and x86/adm_avx512.c:1217 don't include that header. Extracted the shim into a dedicated core/src/feature/compat_builtin.h (fork-added) and included it from all four TUs. The guard is defined(_MSC_VER) && !defined(__clang__), so clang-cl / icx-cl (which provide the GCC intrinsics natively) skip the shim. (o) The SYCL leg's D3D11 import TU core/src/sycl/d3d11_import.cpp is C++ (icpx-cl drives it as C++ on Windows) but included the internal C header log.h without an extern "C" wrap. log.h is an upstream Netflix header with no __cplusplus guard, so vmaf_log got C++ name-mangled in the .cpp TU and failed to resolve against the C-linkage symbol produced by log.c at link time (LNK2019 from every test target that pulls in the SYCL static lib). Wrapped the #include "log.h" with extern "C" { ... } inside the fork-added .cpp rather than touching the upstream header — keeps log.h identical to upstream on every /sync-upstream. (p) The Windows MSVC legs build with --default-library=static. 
libvmaf's public API has no __declspec(dllexport) attributes (upstream Netflix is POSIX-shaped), so a vanilla MSVC shared build produces src/vmaf-3.dll with no exported symbols and the toolchain therefore never emits the companion vmaf.lib import library. Downstream tool targets then fail with LNK1181: cannot open input file 'src\vmaf.lib'. The MinGW matrix leg has used --default-library static since day one for the same reason (line 387); the MSVC legs now mirror that choice via matrix.include[].meson_extra. Downstream consumers that want a DLL can either add __declspec(dllexport) decorations to the public API or use a .def file; that is a separate decision and out of scope for the build-only gate.
  • Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.windows-gpu-build.strategy.matrix.include[].name' \
    .github/workflows/libvmaf-build-matrix.yml
# Expected output (2 lines):
#   Build — Windows MSVC + CUDA (build only)
#   Build — Windows MSVC + oneAPI SYCL (build only)
  • Branch protection: the two Windows GPU legs are pinned as required status checks on master immediately after this PR's merge. After ADR-0120's two Linux DNN legs the count moves 21 → 23. Re-pin via:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
    --input /tmp/protection-update.json
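
Items (k) and (l) of this entry can be illustrated with a minimal C sketch. The names usec_to_msec_ceil, fill_rows, W and H are hypothetical and are not taken from the tree; the helper only reproduces the round-up arithmetic of the Sleep(((us) + 999) / 1000) fallback without calling the Win32 API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the same round-up arithmetic as the usleep()
 * fallback macro, Sleep(((us) + 999) / 1000), minus the Win32 call.
 * Inputs close to UINT_MAX would wrap; the test paths never get there. */
unsigned usec_to_msec_ceil(unsigned us)
{
    return (us + 999u) / 1000u;
}

/* Item (k): enum constants are integer constant expressions, so the
 * array below is an ordinary fixed-size array, not a VLA. */
enum { W = 5, H = 5 };

/* Fills the buffer and returns its element count (3 * W). */
unsigned fill_rows(void)
{
    uint16_t buffer[3 * W];
    const unsigned n = (unsigned)(sizeof(buffer) / sizeof(buffer[0]));
    for (unsigned i = 0; i < n; i++)
        buffer[i] = (uint16_t)(i % H);
    assert(buffer[n - 1] == (uint16_t)((n - 1) % H));
    return n;
}
```

A 100 µs request becomes a 1 ms sleep under this arithmetic, which is the rounding-up behaviour the entry describes as safe for the test paths.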

0023 — CUDA gencode coverage (sm_86/sm_89/compute_80 PTX) + init hardening

  • Workstream PRs: the ADR-0122 PR (gencode + init hardening) and the ADR-0123 follow-up for the 32b115df post-cubin-load regression.
  • Touches:
  • core/src/meson.build — the gencode array in the if get_option('enable_nvcc') branch.
  • core/src/cuda/common.c: vmaf_cuda_state_init() error paths (multi-line actionable log, cuda_free_functions() + free(c) + *cu_state = NULL cleanup).
  • docs/backends/cuda/overview.md: ## Runtime requirements section and ### GPU architecture coverage table.
  • Invariant: the gencode array unconditionally emits cubins for sm_75 / sm_80 / sm_86 / sm_89 plus a compute_80 PTX, independent of host nvcc version. Upstream Netflix's gencode only ships cubins at Txx major boundaries (sm_75 / sm_80 / sm_90 / sm_100 / sm_120); a literal merge that replaces our array with upstream's would re-open the Ampere-sm_86 / Ada-sm_89 coverage hole. The sm_90 / sm_100 / sm_120 entries are still version-gated and should be preserved verbatim if upstream adds new gates. The init-path error messages are fork-local strings; upstream's terse "Error: failed to load CUDA functions" must NOT win a merge.
  • Re-test:
meson setup build -Denable_cuda=true -Denable_nvcc=true
ninja -C build 2>&1 | grep -E 'compute_(80|86|89)'
# Expect at least -gencode=arch=compute_86,code=sm_86 and
#                -gencode=arch=compute_89,code=sm_89 and
#                -gencode=arch=compute_80,code=compute_80

# Actionable init message (run without CUDA driver on the loader path):
LD_LIBRARY_PATH= ./build/tools/vmaf --help 2>&1 | grep -qi 'libcuda.so.1' || \
    echo "init log regressed"

0024 — vmaf_read_pictures null-guard for CUDA device-only path

  • Workstream PRs: the ADR-0123 follow-up landed atop the ADR-0122 gencode/init-hardening work.
  • Touches:
  • core/src/libvmaf.c — the non-threaded tail of vmaf_read_pictures at the prev_ref update site (line ~1428 in the fork; upstream equivalent is the tail added by f740276a).
  • Invariant: the prev_ref update is guarded by if (ref && ref->ref) so pure-CUDA extractor sets (where ref = &ref_host but ref_host was never populated by translate_picture_device) do not deref a NULL refcount. Upstream currently has the same unguarded tail; the bug is masked upstream only because the experimental VMAF_PICTURE_POOL gate from 32b115df is still in place. A literal upstream merge that removes our null-guard while upstream's experimental gate is still holding would pass tests but re-open the libvmaf_cuda ffmpeg crash the moment the gate flips default-on (which the fork did in 65460e3a, ADR-0104). Keep the guard until the upstream null-guard port lands.
  • Re-test:
# Unit tests cover the non-regression on the library side:
meson test -C build

# End-to-end regression: ffmpeg libvmaf_cuda must exit 0 on a
# CUDA-device-only extractor set (full recipe in ADR-0123).
./ffmpeg -init_hw_device cuda=cu:0 -filter_hw_device cu \
  -i /tmp/ref.mp4 -i /tmp/dis.mp4 \
  -lavfi "[0:v]format=yuv420p,hwupload_cuda[r];\
          [1:v]format=yuv420p,hwupload_cuda[d];\
          [r][d]libvmaf_cuda=log_path=/tmp/out.json:log_fmt=json" \
  -f null -
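
The guard described in the invariant can be sketched as follows. DemoRef, DemoPicture and demo_update_prev_ref are hypothetical stand-ins, not the libvmaf types; the sketch only shows why checking both the picture pointer and its refcount pointer keeps an unpopulated host picture from being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the picture type and its refcount. */
typedef struct { int cnt; } DemoRef;
typedef struct { DemoRef *ref; } DemoPicture;

/* Returns 1 when prev_ref was updated, 0 when the guard skipped it. */
int demo_update_prev_ref(DemoPicture *prev_ref, const DemoPicture *ref)
{
    assert(prev_ref != NULL);
    if (ref && ref->ref) {
        *prev_ref = *ref;
        prev_ref->ref->cnt++; /* safe: the refcount is known non-NULL */
        return 1;
    }
    return 0; /* device-only path: the host picture was never populated */
}
```

With a zero-initialised host picture (the pure-CUDA case in the entry) the function leaves prev_ref untouched instead of dereferencing a NULL refcount.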

0025 — VIF init() fail-path frees advanced byte-cursor

  • Workstream PRs: PR #47 (rewritten to leak-fix-only after master absorbed the void→uint8_t half via commit b0a4ac3a, entry 0022 §e). Ports the leak-fix half of upstream Netflix PR #1476.
  • Touches: core/src/feature/integer_vif.c (UPSTREAM — 2-line fix in the init() fail: handler).
  • Invariant: init() walks uint8_t *data forward through aligned_malloc's one allocation, advancing past each sub-pointer assignment. If vmaf_feature_name_dict_from_provided_features returns NULL the fail path must free the base pointer s->public.buf.data, never the advanced cursor data. Upstream master still has aligned_free(data) there — same bug — so this entry is the reminder to not let an upstream sync re-introduce the advanced-cursor form. If upstream lands PR #1476 or an equivalent, the sync can drop this entry.
  • Re-test:
meson test -C build --suite=fast
# Static check: ripgrep the pattern that must NOT return.
rg -n "aligned_free\(data\)" core/src/feature/integer_vif.c && \
    echo 'REGRESSED' || echo 'ok'
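
The base-pointer-versus-cursor distinction in this entry can be shown with a small sketch. It assumes plain malloc/free in place of aligned_malloc/aligned_free, and DemoBuf and demo_init are hypothetical names; the point is only that the fail path frees the stored base pointer, never the cursor that has been advanced past the sub-buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *base;    /* what the allocator returned; the only valid free() argument */
    uint8_t *plane_a; /* sub-pointers carved out of the single allocation */
    uint8_t *plane_b;
} DemoBuf;

/* simulate_failure stands in for a later init step returning NULL. */
int demo_init(DemoBuf *s, size_t plane_sz, int simulate_failure)
{
    assert(s != NULL && plane_sz > 0);
    uint8_t *data = malloc(2 * plane_sz);
    if (!data)
        return -1;
    s->base = data;
    s->plane_a = data; data += plane_sz;
    s->plane_b = data; data += plane_sz; /* data now points past the block */
    if (simulate_failure)
        goto fail;
    return 0;
fail:
    free(s->base); /* NOT free(data): the cursor is no longer the base */
    s->base = s->plane_a = s->plane_b = NULL;
    return -1;
}
```
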
  • Workstream PRs: this PR (ADR-0124 adoption). Closes the "rule-without-a-check" gap on ADR-0100 / 0105 / 0106 / 0108.
  • Touches (all FORK-ADDED — no upstream overlap): .github/workflows/rule-enforcement.yml (new), scripts/ci/check-copyright.sh (new), .pre-commit-config.yaml (appended local hook).
  • Invariant: the deep-dive-checklist job is blocking on every PR that is not an upstream port (exempt via port: title prefix or port/ branch). The other three gates (doc-substance-check, adr-backfill-check, copyright pre-commit) are advisory or pre-commit, never CI-blocking; this split is the whole point of ADR-0124 and an upstream sync must not move them into the required-status-check set without a follow-up ADR. The opt-out parser matches /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ per ADR-0108 §Opt-out-lines — if upstream ever changes PR-template phrasing (unlikely; this is fork-local), the regex and the template must move together.
  • Re-test:
# Lint the workflow + hook locally.
pre-commit run --files \
  .github/workflows/rule-enforcement.yml \
  scripts/ci/check-copyright.sh \
  .pre-commit-config.yaml

# Dry-run the copyright hook against a staged source file.
scripts/ci/check-copyright.sh core/src/libvmaf.c && echo ok

# Synthetic PR body that violates ADR-0108 should fail the parser;
# see docs/research/0002-automated-rule-enforcement.md §Verification
# plan for the three test cases.

0027 — SSIMULACRA 2 scalar extractor (libjxl FastGaussian IIR blur)

  • Workstream PRs: this PR (feat/ssimulacra2-scalar); proposal ADR in PR #67.
  • Touches: core/src/feature/ssimulacra2.c (fork-local, new), core/src/meson.build, core/src/feature/feature_extractor.c.
  • Invariant: the extractor embeds several tables that must track libjxl upstream — opsin absorbance matrix, MakePositiveXYB offsets, 108 pooling weights, polynomial-transform coefficients, and the FastGaussian coefficient-derivation formulas (radius = 3.2795·σ + 0.2546, Cramer's 3×3 solve for β, n2/d1 assignment per Charalampidis 2016 (33)). If libjxl ever changes any of these, update ssimulacra2.c in the same PR that syncs upstream. Self-consistency must stay at exactly 100.000000 for identical ref/dist inputs — this is the cheapest regression check.
  • Re-test:
meson test -C build --suite=fast
./build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 --feature ssimulacra2 -o /tmp/self.xml \
  && grep -q 'ssimulacra2="100.000000"' /tmp/self.xml \
  && echo "ok: self-consistency 100.0"

0028 — MS-SSIM separable decimate + AVX2/AVX-512/NEON SIMD

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 (supersedes the rebase-incompatible feat/ms-ssim-decimate-simd; AVX2/AVX-512, commits 7de8cd7f scalar separable, 5f93c864 AVX2, 73436438 AVX-512); feat/ms-ssim-decimate-neon-v2 (NEON follow-up, stacked).
  • Touches: core/src/feature/ms_ssim_decimate.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx2.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx512.{c,h} (NEW), core/src/feature/arm64/ms_ssim_decimate_neon.{c,h} (NEW), core/src/feature/ms_ssim.c (call-site change), core/src/meson.build (register new SIMD TUs), core/test/test_ms_ssim_decimate.c (NEW), core/test/meson.build (arm64 gating).
  • Invariant: the 9-tap 9/7 biorthogonal wavelet LPF coefficients (ms_ssim_lpf_h / ms_ssim_lpf_v) are duplicated verbatim in five TUs for bit-identity: the scalar ms_ssim_decimate.c, the AVX2 variant, the AVX-512 variant, the NEON variant, and upstream's g_lpf_h / g_lpf_v in ms_ssim.c. Any upstream change to the coefficient values or the KBND_SYMMETRIC mirror branch in iqa/convolve.c must be mirrored to all five. If not mirrored, SIMD paths and scalar diverge silently and the bit-equality memcmp in test_ms_ssim_decimate catches it — but only when that test runs, so diff the five files first.
  • Re-test (on each supported host arch):
# x86_64 host — native build.
meson test -C build
./build/test/test_ms_ssim_decimate

# aarch64 host OR aarch64 cross under qemu — see /tmp/aarch64-cross.txt.
meson setup build-arm64 libvmaf --cross-file /tmp/aarch64-cross.txt \
    -Denable_cuda=false -Denable_sycl=false
ninja -C build-arm64
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
    build-arm64/test/test_ms_ssim_decimate

# Netflix MS-SSIM golden — places=4 must still pass through SIMD.
.venv/bin/python -m pytest \
    python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor

0029 — KBND_SYMMETRIC period-based reflection in iqa/convolve.c

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
  • Touches: core/src/feature/iqa/convolve.c (upstream file, rewritten KBND_SYMMETRIC).
  • Invariant: KBND_SYMMETRIC(img, w, h, x, y, _) must use the period-based form (period = 2*w, period = 2*h) so that offsets with |x| > w or |y| > h still land in bounds. Upstream's single-reflect form was out-of-bounds whenever w < kernel_half or h < kernel_half; the latent bug did not reproduce in Netflix golden tests because MS-SSIM pyramids never decimate below ~60×34. Any upstream change that reverts to the single-reflect form must be rejected or re-ported.
  • Re-test:
./build/test/test_ms_ssim_decimate        # test_1x1 border case
.venv/bin/python -m pytest \
    python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
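
A period-based reflection of the kind the invariant requires can be sketched as below. demo_reflect_symmetric is a hypothetical helper, and the edge-repeating convention (-1 maps to 0, n maps to n-1) is an assumption about the boundary mode rather than a copy of the macro in iqa/convolve.c.

```c
#include <assert.h>

/* Hypothetical helper: map any integer offset x into [0, n) by symmetric
 * reflection with period 2*n (assumed convention: -1 -> 0, n -> n-1). */
int demo_reflect_symmetric(int x, int n)
{
    assert(n > 0);
    const int period = 2 * n;
    int m = x % period;   /* the C remainder keeps the sign of x */
    if (m < 0)
        m += period;      /* now 0 <= m < period */
    return (m < n) ? m : period - 1 - m;
}
```

For n = 3 and x = -4 the result is 2, whereas a single reflection such as -1 - x would give 3, one past the last valid index. That is the small-image case (offset magnitude larger than the dimension) that the entry says the single-reflect form mishandles.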

0030 — adm_decouple_s123_avx512 stack-array 64-byte alignment

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
  • Touches: core/src/feature/x86/adm_avx512.c (upstream file, one-line _Alignas(64) on int64_t angle_flag[16] at line 1317). core/test/test_pic_preallocation.c (upstream file, three vmaf_model_destroy(model) calls pairing the vmaf_model_load in test_picture_pool_basic / _small / _yuv444).
  • Invariant: the stack slot for angle_flag must be 64-byte aligned because two _mm512_loadu_si512(&angle_flag[0/8]) loads in the same scope may be promoted to aligned vmovdqa64 by LTO. Dropping the _Alignas(64) annotation re-introduces the SEGV under --buildtype=release -Db_lto=true -Db_sanitize=address. Debug / no-LTO builds keep vmovdqu64 and cannot flag the regression. See docs/development/known-upstream-bugs.md.
  • Re-test:
meson setup build-asan-lto libvmaf \
    -Denable_cuda=false -Denable_sycl=false \
    -Db_sanitize=address --buildtype=release -Db_lto=true
ninja -C build-asan-lto test/test_pic_preallocation
ASAN_OPTIONS=detect_leaks=1 \
    ./build-asan-lto/test/test_pic_preallocation
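
The alignment annotation itself is a one-liner; the sketch below (C11, hypothetical function name) only shows the declaration form and a runtime check that the stack slot really starts on a 64-byte boundary. It does not reproduce the LTO promotion described in the invariant.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the 64-byte-aligned stack array starts on a 64-byte
 * boundary. The array name mirrors the entry; the function is a demo. */
int demo_angle_flag_is_aligned(void)
{
    _Alignas(64) int64_t angle_flag[16] = { 0 };
    volatile int64_t *p = angle_flag; /* keep the array from being elided */
    p[0] = 1;
    return ((uintptr_t)(void *)angle_flag % 64u) == 0;
}
```

Without the _Alignas(64) an int64_t array is only guaranteed its natural alignment, so an aligned 64-byte load from it would be permitted to fault.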

0031 — Batch-A upstream-port small-fix sweep (ports of unmerged PRs)

  • Workstream PRs: feat/batch-a-upstream-small-fix-sweep — commits 546a40ee (T0-1), 8fed8ad1 (T4-4), 83a1db46 (T4-5), 34425dee (T4-6). ADRs 0131, 0132, 0134, 0135.
  • Touches:
  • core/src/cuda/picture_cuda.c (one-line cuMemFree port of Netflix#1382)
  • core/src/feature/feature_collector.c + core/test/test_feature_collector.c (mount/unmount bugfix port of Netflix#1406 + shared-helper test refactor)
  • core/src/meson.build (declare_dependency + override_dependency port of Netflix#1451)
  • core/include/libvmaf/model.h, core/src/model.c, core/test/test_model.c, docs/api/index.md (built-in model iterator port of Netflix#1424)
  • Invariant: each of the four upstream PRs is OPEN (unmerged) on the port date; when Netflix merges any of them, the fork's version is correction-bearing (T4-4 test refactor, T4-6 three defect fixes + Doxygen doc expansion), not line-identical. Resolution on upstream merge is always "keep fork version" because the fork's version already satisfies the PR's intent and additionally fixes the defects.
  • Netflix#1406 conflict will land in test_feature_collector.c — fork uses load_three_test_models() helper vs upstream's inline per-model VmafModel *m0, *m1, *m2; duplication.
  • Netflix#1424 conflict will land in core/src/model.c and core/test/test_model.c — fork uses else if guard + idx + 1 < CNT + const-qualified test types.
  • Netflix#1382 and Netflix#1451 are line-identical in substance; merge should be clean aside from trailing-comma style drift.
  • Re-test:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_feature_collector test/test_model
build/test/test_feature_collector
build/test/test_model
# Expected: 6/6 pass in test_feature_collector (mount/unmount
# 3-model sequences); 39/39 pass in test_model (includes
# test_version_next full-iteration invariant).

0032 — Thread-local locale handling for numeric I/O (port of Netflix/vmaf#1430)

  • Workstream PRs: port/netflix-1430-thread-locale (T4-3 from the "Batch-A follow-up" sweep, 2026-04-20).
  • Touches: core/src/thread_locale.h / core/src/thread_locale.c (new, upstream-authored); core/src/meson.build (two cdata.set('HAVE_USELOCALE'/'HAVE_XLOCALE_H') probes + src_dir + 'thread_locale.c' in libvmaf_sources); core/src/output.c (four writers gain push_c() + pop() bracket, preserving fork's ferror(outfile) ? -EIO : 0 return contract from ADR-0119); core/src/svm.cpp (drop <locale.h> include; replace setlocale/strdup/setlocale bracket with vmaf_thread_locale_push_c/pop; add buffer.imbue(std::locale::classic()) to both SVM parser ctors with fork's K&R + 4-space style); core/src/read_json_model.c (bracket model_parse with push/pop); core/test/meson.build (new test_locale_handling target + test registration); core/test/test_locale_handling.c (new, upstream-authored with three fork corrections for the score_format parameter).
  • Invariant: fork's output writers return ferror(outfile) ? -EIO : 0 — this must survive any upstream refactor of the writer bodies. The push_c() call MUST be paired with a pop() on every return path (writer bodies have a single tail return, so the pattern is locally push → body → pop → return ferror-check). Dropping pop() leaks a locale_t on POSIX and leaves the thread locked to "C" on Windows.
  • Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_locale_handling
# Repro the user-visible failure without the fix:
LC_ALL=de_DE.UTF-8 build/tools/vmaf --reference ref.yuv \
    --distorted dis.yuv --width 1920 --height 1080 \
    --pixel_format 420 --bitdepth 8 --output result.json \
    --json
# Assert output contains period decimals, not comma.
python -c "import json; d=json.load(open('result.json')); \
    assert all('.' in repr(v) for v in \
    [f['metrics']['vmaf'] for f in d['frames']])"
  • On upstream sync: when Netflix merges PR #1430, the (cherry picked from commit 054a97ed…) trailer in git log port/netflix-1430-thread-locale lets the next /sync-upstream skip this commit. If the upstream diff drifts, redo the three fork corrections listed in ADR-0137 §Decision.
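
The push → body → pop → return pattern from the invariant can be sketched with the POSIX per-thread locale API. This is a minimal sketch assuming a glibc-style platform where newlocale / uselocale / freelocale are declared in <locale.h>; DemoLocaleState, demo_push_c, demo_pop and demo_write_score are hypothetical names and are not the functions in thread_locale.c. The Windows side of the real helper is not covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>

typedef struct {
    locale_t c_locale; /* the temporary "C" locale object */
    locale_t previous; /* what the thread was using before the push */
} DemoLocaleState;

int demo_push_c(DemoLocaleState *st)
{
    st->c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (st->c_locale == (locale_t)0)
        return -1;
    st->previous = uselocale(st->c_locale); /* affects this thread only */
    return 0;
}

void demo_pop(DemoLocaleState *st)
{
    uselocale(st->previous);  /* restore before freeing the "C" object */
    freelocale(st->c_locale); /* skipping this leaks a locale_t */
}

/* Writer with one tail return, keeping the ferror-based return contract. */
int demo_write_score(FILE *outfile, double score)
{
    assert(outfile != NULL);
    DemoLocaleState st;
    if (demo_push_c(&st) != 0)
        return -ENOMEM; /* nothing was pushed, so nothing to pop */
    fprintf(outfile, "%.6f\n", score);
    demo_pop(&st);
    return ferror(outfile) ? -EIO : 0;
}
```

Because the writer body has a single tail return, the pop sits on the only exit path that follows a successful push.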

0033 — SSIM / MS-SSIM SIMD bit-exact to scalar via per-lane scalar double

  • Workstream PRs: feat/ms-ssim-decimate-neon (this PR — companion to the ADR-0138 convolve fast path).
  • Touches: core/src/feature/x86/ssim_avx2.c and core/src/feature/x86/ssim_avx512.c: ssim_accumulate_* rewritten. ssim_precompute_* and ssim_variance_* unchanged (they were already bit-exact). Plus the new bit-exact convolve_avx2.c / convolve_avx512.c and the upstream h-pass OOB fix at iqa/convolve.c:159.
  • Invariants (see ADR-0139 §Decision):
  • Convolve taps: single-rounded float*float → widen → double add, NO FMA. Mirrors scalar sum += img[i]*k[j] in iqa/convolve.c.
  • SSIM accumulate — scalar's 2.0 * literal (2.0 * ref_mu[i] * cmp_mu[i] + C1 and 2.0 * srsc + C2) is a C double literal. Both SIMD accumulators do the 2.0 * numerator + division + final l*c*s product per-lane in scalar double to match scalar type promotions byte-for-byte.
  • H-pass outer-loop bound: y < dst_h + vc - kh_even (not y < dst_h + vc); the - kh_even is load-bearing because the last cache row on even-tap kernels (e.g. box-8) is never read by the v-pass but was previously written OOB when image height equals kernel height.

Fork-local SSIM SIMD is NOT upstream. If upstream ever adds their own SSIM AVX2/AVX-512, keep the fork's version on conflict — it's the only variant verified bit-exact to scalar at --precision max.
  • Re-test:

meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_iqa_convolve test_ms_ssim_decimate
# Bit-exactness check across dispatch backends:
FIX=python/test/resource/yuv/checkerboard_1920_1080_10_3_0_0.yuv
DIS=python/test/resource/yuv/checkerboard_1920_1080_10_3_1_0.yuv
for m in 255 16 0; do
  build/tools/vmaf --cpumask $m --reference $FIX --distorted $DIS \
      --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
      --feature float_ssim --feature float_ms_ssim \
      --output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_16.xml)    # expect empty
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_0.xml)     # expect empty
  • On upstream sync: the AVX2/AVX-512 SSIM surface is entirely fork-local (upstream has VIF/ADM/motion/CAMBI SIMD but no SSIM). If upstream ever introduces SSIM SIMD, their kernel bodies will almost certainly compute l*c*s in vector float for throughput — do not adopt. The fork's per-lane-scalar-double reduction is required for the bit-exactness claim. Same applies to convolve_avx2/512 — they are fork-only; dispatch sits in ssim_tools.c via _iqa_convolve_set_dispatch.
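
The two arithmetic rules in this entry can be written down in scalar C. The sketch below is an illustration under stated assumptions, not the fork's kernels: demo_dot_widen_then_add and demo_lum_numerator are hypothetical names, and the float parameter types of the second function are assumed from the entry's expression.

```c
#include <assert.h>

/* Widen-then-add: each product is rounded once to float, then widened
 * and accumulated in double. Storing the product in a float variable
 * keeps the multiply and the add as two separate roundings (no FMA). */
double demo_dot_widen_then_add(const float *img, const float *k, int n)
{
    assert(img != 0 && k != 0 && n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        const float prod = img[i] * k[i]; /* single float rounding */
        sum += (double)prod;              /* widen, then double add */
    }
    return sum;
}

/* The 2.0 literal is a double, so the whole expression is evaluated in
 * double even though the operands are float. */
double demo_lum_numerator(float ref_mu, float cmp_mu, float c1)
{
    return 2.0 * ref_mu * cmp_mu + c1;
}
```

A vector kernel that keeps the 2.0 * term in float lanes would round differently from this promotion to double, which is why the entry does that step per lane in scalar double.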

0034 — SIMD DX framework + NEON SSIM/convolve bit-exact port

  • Workstream PRs: feat/simd-dx-framework (this PR, PR #A); ships the two demos on top of which PR #B will consume the framework (ssimulacra2, motion_v2, vif_statistic, ...).
  • Touches: core/src/feature/simd_dx.h (new header), core/src/feature/arm64/convolve_neon.c + convolve_neon.h (new NEON port), core/src/feature/arm64/ssim_neon.c (ssim_accumulate_neon rewritten for ADR-0139 bit-exactness; precompute + variance unchanged), core/src/feature/float_ssim.c + core/src/feature/float_ms_ssim.c (wire iqa_convolve_neon into the aarch64 dispatch setters), core/src/meson.build (arm64_sources += convolve_neon.c), core/test/meson.build (test_iqa_convolve arch filter extended to arm64 / aarch64), core/test/test_iqa_convolve.c (NEON variant check + aarch64 CPU flag detection), core/test/dnn/meson.build (test_cli.sh gated on not meson.is_cross_build() — bash invokes $VMAF_BIN directly so meson's exe_wrapper isn't applied), new build-aux/aarch64-linux-gnu.ini meson cross-file, .claude/skills/add-simd-path/SKILL.md (upgraded kernel-spec flags).
  • Invariants (see ADR-0140 §Decision):
  • simd_dx.h is fork-local. Keep the fork's version on upstream conflict. Macro names are ISA-suffixed (_AVX2_4L, _AVX512_8L, _NEON_4L) — do not collapse into a cross-ISA abstraction; the fork's SIMD policy (user-memory feedback_simd_dx_scope.md) rules out Highway / simde / xsimd.
  • The ADR-0138 widen-then-add rule (single-rounded float * float → widen → double add, NO FMA) applies to NEON exactly as to AVX2 / AVX-512. The NEON form uses paired float64x2_t accumulators (lo / hi) because NEON has no float64x4_t.
  • The ADR-0139 per-lane scalar-double reduction rule applies to ssim_accumulate_neon exactly as to the AVX2 / AVX-512 variants. The NEON implementation uses SIMD_ALIGNED_F32_BUF_NEON (_Alignas(16) float name[4]) + a 4-iteration scalar loop.
  • Re-test (requires aarch64-linux-gnu-gcc + qemu-user-static + aarch64 sysroot at /usr/aarch64-linux-gnu):
cd libvmaf
meson setup ../build-aarch64 \
  --cross-file ../build-aux/aarch64-linux-gnu.ini \
  -Denable_cuda=false -Denable_sycl=false -Denable_dnn=disabled
cd ..
ninja -C build-aarch64
meson test -C build-aarch64                       # expect 31/31 OK
# Bit-exactness check scalar vs NEON under QEMU:
REF=python/test/resource/yuv/src01_hrc00_576x324.yuv
DIS=python/test/resource/yuv/src01_hrc01_576x324.yuv
for m in 255 0; do
  LD_LIBRARY_PATH=$PWD/build-aarch64/src qemu-aarch64-static \
    -L /usr/aarch64-linux-gnu build-aarch64/tools/vmaf \
    --cpumask $m --reference $REF --distorted $DIS \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature float_ssim --feature float_ms_ssim \
    --output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_0.xml)     # expect empty
  • On upstream sync: upstream has no NEON SSIM and no NEON convolve for IQA. If they ever add one, keep the fork's version on conflict — the fork's NEON path is the only variant verified bit-exact to scalar at --precision max. The build-aux/aarch64-linux-gnu.ini cross-file has no upstream equivalent. The /add-simd-path skill is fork-only; upstream doesn't ship .claude/skills/.

0036 — Port Netflix generalised AVX convolve + ADR-0141 cleanup

  • Workstream PRs: port/upstream-f3a628b4-generalized-avx-convolve (this PR).
  • Upstream commit: f3a628b4 "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21).
  • Touches:
  • convolution.h — upstream-tracking: adds #define MAX_FWIDTH_AVX_CONV 17.
  • convolution_avx.c — upstream-tracking (2,500 LoC deletion) plus fork-delta cleanup per ADR-0141: four scanline helpers convolution_f32_avx_s_1d_* changed from external linkage to static (no other TU uses them after the specialised-path removal); stride parameters widened from int to ptrdiff_t in the helpers, with (ptrdiff_t) casts at public-function multiplication sites; #include <stddef.h> added for the type.
  • core/src/feature/vif_tools.c — upstream-tracking: three AVX dispatch sites drop the fwidth == 17 || ... == 3 whitelist in favour of fwidth <= MAX_FWIDTH_AVX_CONV.
  • python/test/quality_runner_test.py, python/test/vmafexec_test.py — upstream-authored loosening of two full-VMAF-score assertions from places=2 (±0.005) to places=1 (±0.05). Adopted per the ADR-0142 Netflix-authority precedent (project rule #1 addresses fork drift, not upstream-authored test updates the fork must track).
  • Invariants (see ADR-0143 §Decision):
  • Static linkage on scanline helpers — upstream leaves the four convolution_f32_avx_s_1d_*_scanline helpers with external linkage out of habit; the fork narrows them to static. On upstream sync: if upstream ever externs them from another TU, that's a flag to re-audit; keep the fork's static unless the reference is real.
  • ptrdiff_t strides inside helpers — the public convolution_f32_avx_*_s wrappers keep int strides (matching the upstream interface + convolution.h declarations). Helpers take ptrdiff_t to silence bugprone-implicit-widening-of-multiplication-result. If upstream changes the public interface to ptrdiff_t, drop the fork's wrapper-level casts.
  • MAX_FWIDTH_AVX_CONV = 17 — the ceiling is upstream's; if upstream bumps it, the fork must rebuild + re-run the VIF golden test pair.
  • Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build            # expect 32/32 OK
clang-tidy -p build core/src/feature/common/convolution_avx.c
# Zero warnings expected on the touched file.

Netflix CPU golden CI leg exercises the two loosened assertions; confirmed locally under meson test.
  • On upstream sync: upstream is the source of truth for convolution_avx.c, convolution.h, vif_tools.c dispatch, and the two python golden tolerances. On a rebase, prefer upstream for those files except:
  • Keep the fork's static on the four scanline helpers.
  • Keep the fork's ptrdiff_t helper signatures + multiplication-site casts (unless upstream adopts them too, in which case converge).
  • Keep the fork's #include <stddef.h>.
If upstream re-introduces a specialised fast path for common widths, evaluate on a per-fwidth perf profile — the fork's /profile-hotpath skill covers this.
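
The stride cast and the width ceiling from this entry can be sketched in a few lines. demo_row_ptr and demo_use_avx_path are hypothetical helpers; only the macro name and its value 17 come from the entry.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FWIDTH_AVX_CONV 17 /* value stated in the entry */

/* The cast widens before the multiply, so row * stride is computed in
 * ptrdiff_t and not in int. */
const float *demo_row_ptr(const float *base, int row, int stride)
{
    assert(base != NULL);
    return base + (ptrdiff_t)row * stride;
}

/* Generalised dispatch test: any width up to the ceiling qualifies. */
int demo_use_avx_path(int fwidth)
{
    return fwidth <= MAX_FWIDTH_AVX_CONV;
}
```

Casting one operand is enough: the other is converted to ptrdiff_t by the usual arithmetic conversions, which is what the clang-tidy check asks for.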

0038 — motion_v2 NEON SIMD (fork-local)

  • Workstream PR: port/motion-bundle-neon-and-updates (this PR).
  • Upstream: none — aarch64 NEON for motion_v2 is fork-local. Upstream scalar + AVX2 + AVX-512 variants exist; this PR adds the missing NEON fourth path. Scalar is the bit-exactness ground truth.
  • Touches (fork-local):
  • motion_v2_neon.c — new TU, ~300 LoC. 4-wide int32 SIMD over the 5-tap Gaussian pipeline. Five static inline helpers keep every function under the ADR-0141 60-line budget.
  • motion_v2_neon.h — new header declaring the two public entry points.
  • integer_motion_v2.c — dispatch update: adds an #if ARCH_AARCH64 block in init that selects the NEON variant when VMAF_ARM_CPU_FLAG_NEON is present, mirroring the existing x86 dispatch blocks.
  • core/src/meson.build — add arm64/motion_v2_neon.c to the arm64_sources list.
  • Invariants (see ADR-0145 §Decision):
  • Arithmetic right-shift throughout. The fork's AVX2 path uses _mm256_srlv_epi64 (logical) which can diverge from scalar on negative-diff pixels. The NEON port uses vshrq_n_s64(v, 16) for the known Phase-2 shift and vshlq_s64(v, -(int64_t)bpc) for the variable Phase-1 shift — both arithmetic, matching scalar C >> on signed integer. On rebase: keep the arithmetic forms; do NOT adopt vshrq_n_u64 or a logical emulation even if it runs faster.
  • 4-lane stride + mirror tails. SIMD stride = 4; scalar tails cover the remainder. The Phase-2 helper x_conv_row_sad_neon hands 4 lanes to x_conv_block4_neon and drops to scalar for both left/right edges (j < 2 and j + 6 > w). On rebase: preserve the 4-lane stride and the two-sided scalar tail.
  • Signature parity with AVX2. Both pipeline entry points match the AVX2 + AVX-512 variants' (const uint8_t *prev, ptrdiff_t, const uint8_t *cur, ptrdiff_t, int32_t *y_row, unsigned w, unsigned h, unsigned bpc) signature. On rebase: if upstream changes the signature, mirror the change here AND in the x86 variants in lockstep.
  • Re-test:
meson setup build-aarch64 libvmaf \
  --cross-file build-aux/aarch64-linux-gnu.ini \
  -Denable_cuda=false -Denable_sycl=false
ninja -C build-aarch64
meson test -C build-aarch64 --no-rebuild   # expect 31/31 OK
clang-tidy -p build-aarch64 \
  core/src/feature/arm64/motion_v2_neon.c
# Zero warnings expected on the touched file.

# NEON-vs-scalar bit-exact diff under QEMU:
YUV=python/test/resource/yuv
for mask in 0 255; do
  LD_LIBRARY_PATH=build-aarch64/src \
    qemu-aarch64-static -L /usr/aarch64-linux-gnu \
    build-aarch64/tools/vmaf \
    -r $YUV/src01_hrc00_576x324.yuv \
    -d $YUV/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 -n --feature motion_v2 \
    --cpumask $mask -o /tmp/mv2_$mask.xml --precision max
done
diff <(grep -v 'fps=' /tmp/mv2_0.xml) \
     <(grep -v 'fps=' /tmp/mv2_255.xml)  # expect empty
  • On upstream sync: upstream has no NEON motion_v2 and has not signalled plans to add one. If they ever do, diff their NEON against the fork's: on logical-vs-arithmetic shift, keep the fork's arithmetic form (matches scalar). On the function decomposition (the five helpers), adopt upstream's if it's smaller; the fork's layout is ADR-0141-driven, not a semantic contract.
  • Follow-up T7-32 (fixed 2026-05-09): The _mm256_srlv_epi64 (logical right shift) in motion_score_pipeline_16_avx2 was replaced with srav_epi64_imm, an AVX2-safe arithmetic-right-shift emulation: logical shift OR sign-fill mask via srai_epi32 + slli_epi64. Two bugs were closed in the same PR:
  • AVX2 logical-vs-arithmetic shift: _mm256_srlv_epi64 replaced by srav_epi64_imm in core/src/feature/x86/motion_v2_avx2.c. The emulation is bit-exact with scalar C >> bpc on signed int64_t.
  • Test scalar reference mirror: mirror_idx in core/test/test_motion_v2_simd.c used 2*size - idx - 1 instead of 2*size - idx - 2, diverging from integer_motion_v2.c::mirror(). Fixed to -2. All four adversarial fixtures (neg-diff bpc10/12, mixed-diff bpc10/12) now pass. meson test -C build 50/50 OK. On rebase: keep srav_epi64_imm; do not revert to _mm256_srlv_epi64. The rebase-time invariant is now: AVX2 path uses arithmetic shift (matching NEON and scalar).
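
The "logical shift OR sign-fill mask" idea behind the T7-32 fix can be shown in portable scalar C. This is a sketch of the technique only, with hypothetical names demo_srl64 and demo_sra64; it is not the srav_epi64_imm intrinsic sequence. The final conversion back to int64_t is implementation-defined before C23 for values with the top bit set, and is assumed here to wrap as it does on the mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeros. */
uint64_t demo_srl64(int64_t v, unsigned n)
{
    assert(n < 64);
    return (uint64_t)v >> n;
}

/* Arithmetic right shift built from a logical shift plus a sign-fill
 * mask covering the top n bits. */
int64_t demo_sra64(int64_t v, unsigned n)
{
    assert(n < 64);
    const uint64_t logical = (uint64_t)v >> n;
    const uint64_t sign = (v < 0) ? ~UINT64_C(0) : UINT64_C(0);
    const uint64_t fill = (n != 0) ? (sign << (64 - n)) : UINT64_C(0);
    return (int64_t)(logical | fill); /* assumed two's-complement wrap */
}
```

For a negative difference such as -65536 shifted by 16, the logical form yields a large positive value while the emulated arithmetic form yields -1. That is the negative-diff divergence the adversarial fixtures in the entry target.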

0039 — readability-function-size NOLINT sweep (ADR-0146)

  • ADR: ADR-0146
  • Touches:
  • core/src/dict.c
  • core/src/picture.c
  • core/src/picture_pool.c
  • core/src/predict.c
  • core/src/libvmaf.c
  • core/src/output.c
  • core/src/read_json_model.c
  • core/src/feature/feature_extractor.c
  • core/src/feature/feature_collector.c
  • core/src/feature/iqa/convolve.c
  • core/src/feature/iqa/ssim_tools.c
  • core/src/feature/x86/vif_statistic_avx2.c
  • Invariant: every readability-function-size NOLINT suppression has been replaced by a set of small static (or static inline, for the SIMD / IQA files) helpers. The helper names are stable interfaces the surrounding code depends on (e.g. iqa_convolve_1d_separable, iqa_convolve_2d, ssim_compute_stats, ssim_workspace_alloc / _free, vif_stat_simd8_compute / _reduce, struct vif_simd8_lane, read_pictures_extractor_loop, read_pictures_post_extractor, read_pictures_validate_and_prep, read_pictures_update_prev_ref). Upstream Netflix has no equivalent helpers; rebases touching any of these files will conflict against the fork's split shape.
  • On upstream sync:
  • If upstream lands a different decomposition of _iqa_convolve or _iqa_ssim, prefer upstream's shape only if it keeps the ADR-0138 / ADR-0139 bit-exactness invariants (single-rounded float mul → widen to double → double add; per-lane scalar-float reduction through aligned temp buffer). Otherwise keep the fork's split and re-document the divergence here.
  • The fork renamed _calc_scale → iqa_calc_scale to clear the bugprone-reserved-identifier check. If upstream modifies _calc_scale, keep the fork's name and port the behavioural change.
  • model_collection_parse_loop writes directly to cfg_name rather than through c->name — if upstream ever rewrites model_collection_parse, preserve the direct write (it's what lets the param stay non-const without a NOLINT).
  • Re-test on rebase (x86, any libsvm-less host):
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 -o /tmp/vmaf_$mask.xml
done
diff <(grep -v fyi /tmp/vmaf_0.xml) <(grep -v fyi /tmp/vmaf_255.xml)
# expect exit 0 (Netflix-golden-pair VMAF bit-identical scalar vs SIMD)

Also run clang-tidy -p build on every file in Touches; expect zero warnings.
  • Follow-up T7-6: decide whether to rename the _iqa_* API surface (convolve / ssim / decimate / img_filter / filter_pixel / get_pixel) across all callers to clear the remaining bugprone-reserved-identifier suppressions in ssim.c, ms_ssim.c, float_ms_ssim.c. Out of scope here.
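
The ADR-0138 accumulation shape quoted in this entry (single-rounded float mul, widen to double, double add) can be sketched in C as below. This is one reading of the invariant, not the fork's kernel: the function names and arguments are hypothetical, and the volatile store is only there to force the product to be rounded to float exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: each product is rounded once to float, widened to
 * double, and only then added into a double accumulator. */
static double dot_float_mul_double_add(const float *img, const float *k, size_t n)
{
    assert(img != NULL && k != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* volatile forces a float store, so the product is rounded exactly
         * once and cannot be contracted into an FMA with the add. */
        volatile float prod = img[i] * k[i];
        acc += (double)prod;
    }
    return acc;
}

/* Contrast: multiplying in double keeps product bits the float path drops. */
static double dot_double_mul(const float *img, const float *k, size_t n)
{
    assert(img != NULL && k != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)img[i] * (double)k[i];
    return acc;
}
```

The two functions can return different values for the same input, which is why a decomposition that changes where the product is rounded would not be bit-exact with the existing one.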

0040 — Thread-pool job recycling + inline data buffer (ADR-0147)

  • ADR: ADR-0147
  • Touches: core/src/thread_pool.c
  • Invariants:
  • VmafThreadPoolJob carries a fixed-size char inline_data[64] buffer. Payloads ≤ 64 bytes go through memcpy(job->inline_data, data, data_sz) + job->data = job->inline_data; payloads > 64 bytes take the legacy malloc path. The cleanup path MUST distinguish the two via job->data != job->inline_data — a naive free(job->data) would corrupt the slot. Enforced in vmaf_thread_pool_job_clear_data.
  • free_jobs list is protected by the existing queue.lock; enqueue pops from it before mallocing, runner recycles onto it after running a job. vmaf_thread_pool_destroy walks the list after vmaf_thread_pool_wait returns (all workers have exited → no lock needed). Any reorder that frees the queue lock before the free_jobs walk is a leak on shutdown.
  • Fork's void (*func)(void *data, void **thread_data) signature + per-worker VmafThreadPoolWorker are fork-local; upstream Netflix #1464 has func(void *data). Keep the fork's signature on any rebase — callers (src/libvmaf.c:threaded_enqueue_one etc.) depend on the two-arg form.
  • On upstream sync: Netflix PR #1464 is CLOSED (not merged) and bundles twelve unrelated optimizations. Only the thread-pool portion is ported here. If upstream ever reopens and merges #1464 (or a successor), cherry-pick only the pool mechanics; reject the payload-signature changes, the ADM / VIF / predict.c pieces (they conflict with ADR-0138 / 0139 / 0142 bit-exactness and with T7-5 predict.c refactor), and the feature-collector capacity bump (fork already capped at 8 for a reason — see src/feature/feature_collector.c).

  • Re-test on rebase (x86, any libsvm-less host):

ninja -C build && meson test -C build
for threads in 1 4; do
  for mask in 0 255; do
    VMAF_CPU_MASK=$mask ./build/tools/vmaf \
      --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
      --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
      --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
      -m version=vmaf_v0.6.1 --threads $threads -o /tmp/vmaf_${threads}_${mask}.xml
  done
done
# Expect bit-identical scores (attribute order may differ across
# --threads 1 vs --threads 4 because feature-collector emits in
# insertion order; the numeric values match).
diff <(grep -v fyi /tmp/vmaf_4_0.xml) <(grep -v fyi /tmp/vmaf_4_255.xml)
# expect exit 0 (scalar vs SIMD threaded)

Also run clang-tidy -p build core/src/thread_pool.c — expect zero warnings. Re-run the 500 000-job micro-benchmark from ADR-0147 §Decision if performance is under investigation.
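
The inline-buffer invariant above can be sketched as follows. This is a minimal illustration, assuming a trimmed-down job struct: Job, job_set_data and job_clear_data are hypothetical stand-ins for VmafThreadPoolJob and vmaf_thread_pool_job_clear_data, and only the two payload fields are modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_DATA_SZ 64

/* Hypothetical, trimmed-down job struct: only the two payload fields. */
typedef struct Job {
    void *data;                       /* points at inline_data or a heap block */
    char inline_data[INLINE_DATA_SZ]; /* fixed-size inline payload buffer */
} Job;

/* Copy the payload into the job: inline if it fits, heap otherwise.
 * Returns 0 on success, -1 on allocation failure. */
static int job_set_data(Job *job, const void *data, size_t data_sz)
{
    assert(job != NULL);
    job->data = NULL;
    if (data == NULL || data_sz == 0)
        return 0;
    if (data_sz <= INLINE_DATA_SZ) {
        memcpy(job->inline_data, data, data_sz);
        job->data = job->inline_data;
        return 0;
    }
    job->data = malloc(data_sz);
    if (job->data == NULL)
        return -1;
    memcpy(job->data, data, data_sz);
    return 0;
}

/* Free only heap payloads; an inline payload lives inside the job slot,
 * so passing it to free() would corrupt the slot. */
static void job_clear_data(Job *job)
{
    assert(job != NULL);
    if (job->data != NULL && job->data != (void *)job->inline_data)
        free(job->data);
    job->data = NULL;
}
```

Because the clear path resets data to NULL, a recycled job slot can be handed back to the enqueue side without carrying a stale pointer.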

0041 — IQA reserved-identifier rename + cleanup (ADR-0148)

  • ADR: ADR-0148
  • Touches: 21 files across core/src/feature/ (iqa/{convolve,decimate,ssim_tools}.{c,h}, iqa/ssim_simd.h, ssim.c, integer_ssim.c, ms_ssim.c, ms_ssim_decimate.h, float_ssim.c, float_ms_ssim.c, x86/convolve_avx2.{c,h}, x86/convolve_avx512.{c,h}, arm64/convolve_neon.{c,h}, AGENTS.md) plus core/test/test_iqa_convolve.c.
  • Invariants:
  • Every _iqa_* / _kernel / _ssim_int / _map_reduce / _map / _reduce / _context / _ms_ssim_* / _ssim_* / _alloc_buffers / _free_buffers symbol and the four underscore-prefixed header guards (_CONVOLVE_H_, _DECIMATE_H_, _SSIM_TOOLS_H_, __VMAF_MS_SSIM_DECIMATE_H__) is renamed to its non-reserved spelling. The fork's IQA surface no longer uses C's reserved-identifier name space.
  • The clang-analyzer-security.ArrayBound NOLINT bracket in ssim_accumulate_row and ssim_reduce_row_range (integer_ssim.c) is load-bearing — the inner kernel-loop k_min / k_max clamping is provably correct (k_min = max(0, hkernel_offs - x), k_max = min(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1))) but the analyzer can't follow it across helper boundaries. Do not collapse the bracket.
  • The clang-analyzer-unix.Malloc NOLINT bracket in test_iqa_convolve.c (check_simd_variant, check_case) is intentional — test exits process on failure path; small allocations leak by design at test end. Do not refactor to free-on-exit.
  • The cross-TU NOLINT pattern on compute_ssim (ssim.c) and compute_ms_ssim (ms_ssim.c) — clang-tidy misc-use-internal-linkage runs per-TU and can't see the header bridge to float_ssim.c / float_ms_ssim.c. Keep the inline justification comment.
  • On upstream sync:
  • The Netflix upstream IQA library (tjdistler/iqa) has been effectively abandoned (last meaningful commit pre-2020). Future rebases will conflict on every renamed symbol; drop the underscore-prefix on each conflict and mirror the fork's iqa_* naming.
  • If upstream Netflix/vmaf ever reincorporates the IQA naming wholesale, prefer the fork's spellings — this PR is a one-shot mechanical rename with no semantic content.
  • Re-test on rebase:
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 \
    --feature float_ssim --feature float_ms_ssim \
    -o /tmp/iqa_$mask.xml
done
diff <(grep -v fyi /tmp/iqa_0.xml) <(grep -v fyi /tmp/iqa_255.xml)
# expect exit 0 (bit-identical scalar vs SIMD on float_ssim/ms_ssim)

Also run clang-tidy -p build on every touched file (excluding arm64/); expect zero warnings.
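
The k_min / k_max clamp that the ArrayBound NOLINT bracket protects can be checked by brute force. The sketch below uses the two formulas quoted above; the odd kernel size (hkernel_sz = 2 * hkernel_offs + 1) and the tap indexing x + k - hkernel_offs are assumptions made for illustration, and taps_stay_in_row is a hypothetical name.

```c
#include <assert.h>

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Returns 1 if every tap index the clamped loop touches lies in [0, w). */
static int taps_stay_in_row(int w, int hkernel_offs)
{
    assert(w > 0 && hkernel_offs >= 0);
    const int hkernel_sz = 2 * hkernel_offs + 1; /* assumption: odd kernel */
    for (int x = 0; x < w; x++) {
        const int k_min = imax(0, hkernel_offs - x);
        const int k_max = imin(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1));
        for (int k = k_min; k < k_max; k++) {
            const int src = x + k - hkernel_offs; /* assumed tap indexing */
            if (src < 0 || src >= w)
                return 0;
        }
    }
    return 1;
}
```

Under those assumptions the lower clamp gives src >= 0 and the upper clamp gives src <= w - 1, including for rows narrower than the kernel, where the loop simply runs over fewer taps.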

0042 — Port Netflix #1376 — FIFO-hang fix via Semaphore (ADR-0149)

  • ADR: ADR-0149
  • Upstream commit: Netflix PR #1376, head 1c06ca4f1bb5da38b54db075a27c35ba8ea9d7b7 (OPEN upstream as of 2026-04-24).
  • Touches:
  • python/vmaf/core/executor.py — base Executor class + ExternalVmafExecutor-style subclass; delete _wait_for_workfiles / _wait_for_procfiles polling loops; rewrite _open_{work,proc}files_in_fifo_mode around multiprocessing.Semaphore(0); add open_sem=None kwarg to every _open_{ref,dis}_{work,proc}file and to the _open_workfile staticmethod; drop unused from time import sleep.
  • python/vmaf/core/raw_extractor.py — AssetExtractor + DisYUVRawVideoExtractor; add open_sem=None to _open_{ref,dis}_workfile overrides (release on entry since these are no-ops); delete _wait_for_workfiles overrides; drop unused from time import sleep.
  • Fork carve-outs (load-bearing on rebase):
  • python/vmaf/__init__.py:__version__ stays "3.0.0" — do NOT port upstream's bump to "4.0.0". The fork tracks its own versioning (v3.x.y-lusoris.N) per ADR-0025.
  • from time import sleep is dropped from both files — upstream leaves the import in place (unused after their patch); the fork removes it because ADR-0141 touched-file rule requires ruff F401 clean.
  • Upstream typo preserved: the subclass warning message contains "to be created to be created". Comments note the typo inline; do not silently fix on rebase — it's upstream-authored and project policy is verbatim port.
  • On upstream sync: upstream PR #1376 is still OPEN. When it merges, re-diff against the merged form; the touched hunks should be conflict-free because the fork now carries the same shape. Re-check whether upstream fixed the "to be created to be created" typo; if so, adopt the fix (it becomes a simple string update).
  • Re-test:
python3 -m py_compile python/vmaf/core/executor.py \
                       python/vmaf/core/raw_extractor.py
ruff check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
black --check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
# all silent

# No FIFO-mode unit test in the tree; end-to-end harness
# exercise (needs libsvm + ffmpeg + fixtures) goes via
#   make test-netflix-golden
# which doesn't exercise fifo_mode path but does verify the
# refactor didn't break executor.py imports.

0043 — Port Netflix #1472 — CUDA on Windows MSYS2/MinGW (ADR-0150)

  • ADR: ADR-0150
  • Upstream commits: Netflix PR #1472, commits 15745cdf (portability) + b7b65e64 (meson plumbing). Both OPEN upstream as of 2026-04-24.
  • Touches:
  • core/src/cuda/common.h — drop <pthread.h> include; rename reserved header guard __VMAF_SRC_CUDA_COMMON_H__ → VMAF_SRC_CUDA_COMMON_INCLUDED.
  • core/src/cuda/cuda_helper.cuh — #ifdef DEVICE_CODE guard around <cuda.h> vs <ffnvcodec/dynlink_loader.h>.
  • core/src/picture.h — #ifdef DEVICE_CODE guard around <cuda.h> + forward-declare VmafCudaState vs <ffnvcodec/*> + full libvmaf_cuda.h; rename reserved header guard.
  • core/src/feature/integer_adm.h — updated comment above dwt_7_9_YCbCr_threshold table noting the fork's positional-initializer shape vs upstream's #ifndef __CUDACC__ shape (see §Fork carve-outs).
  • core/src/feature/cuda/integer_adm/{adm_cm,adm_csf,adm_csf_den,adm_decouple,adm_dwt2}.cu — #ifndef DEVICE_CODE guard around #include "feature_collector.h".
  • core/src/meson.build — Windows nvcc plumbing (+70 LoC under host_machine.system() == 'windows'): vswhere-based cl.exe discovery, MSVC + Windows SDK include path injection, CUDA version detection via nvcc --version, nvcc_ccbin_flags + nvcc_host_includes threaded through every custom_target that invokes nvcc.
  • Fork carve-outs (load-bearing on rebase):
  • integer_adm.h uses positional initializers, NOT upstream's #ifndef __CUDACC__ wrap. Both shapes resolve the MSVC/nvcc C++-designated-initializer issue; the positional form is C++-portable and keeps the table available to future .cu consumers. Keep the fork's form on rebase.
  • cuda_static_lib keeps dependencies : [pthread_dependency]. Upstream drops it; the fork needs it because ring_buffer.c (built as part of cuda_static_lib) #includes <pthread.h> directly. On rebase: keep the fork's version.
  • meson.build gencode coverage block: the fork's ADR-0122 explicit cubin list (sm_75/80/86/89 + compute_80 PTX) sits after the new upstream nvcc-detect block. On rebase, re-assemble the same merged order: nvcc-detect first, then gencode coverage (both host-independent).
  • Header guards: _INCLUDED spellings are fork-local (ADR-0148 precedent). Upstream keeps reserved __VMAF_SRC_*_H__ spellings. On rebase, keep _INCLUDED.
  • On upstream sync: PR #1472 is still OPEN. When merged, re-diff the three conflict-resolved hunks against upstream's final form. Keep fork's version on the four carve-outs above unless upstream meaningfully reshapes those regions.
  • Re-test on rebase (Linux host with CUDA toolkit):
meson setup libvmaf core/build-cuda \
    -Denable_cuda=true -Denable_nvcc=true -Denable_sycl=false
ninja -C core/build-cuda && meson test -C core/build-cuda
# Expect 6 .fatbin files generated + CLI linked + 35/35 tests pass.

Windows validation is operator-driven — CI does not yet have a Windows + MSYS2 + MinGW + MSVC BuildTools + CUDA runner (tracked as T7-3 in .workingdir2/OPEN.md).
  • Prerequisites note (Windows only): nv-codec-headers must be built from git master commit 876af32 or later. The release tag n13.0.19.0 is missing cuMemFreeHost, cuStreamCreateWithPriority, cuLaunchHostFunc, and other CudaFunctions members libvmaf uses. Pre-existing issue, not scope of this port.

0058 — libvmaf.pc Cflags leak fix (ADR-0200)

  • ADR: ADR-0200; bug-fix follow-up to entry 0057.
  • Upstream source: fork-local. Netflix has no Vulkan backend.
  • Touches:
  • core/subprojects/packagefiles/volk/meson.build — drops -include volk_priv_remap.h from volk_dep.compile_args; keeps -DVK_NO_PROTOTYPES.
  • core/src/vulkan/meson.build — pulls volk_priv_remap_h_path from the volk subproject and appends ['-include', <path>] to vmaf_cflags_common (private c_args: on libvmaf's library() call).
  • Invariants (load-bearing):
  • -include MUST stay off volk_dep.compile_args — otherwise it leaks into static libvmaf.pc Cflags. Test on rebase: meson setup ... -Ddefault_library=static -Denable_vulkan=enabled, then grep Cflags meson-private/libvmaf.pc — must NOT contain volk_priv_remap or any build-dir absolute path.
  • -include MUST be applied to libvmaf's compile — every libvmaf TU that calls volk's vk* API needs the rename macros active. The vmaf_cflags_common injection covers this for all libvmaf sub-libraries (libvmaf_feature, libvmaf_cpu, etc.).
  • The path comes from subproject('volk').get_variable(...), not from a hardcoded string — survives volk wrap version bumps.
  • On upstream sync: zero upstream interaction.
  • Re-test on rebase / volk wrap bump:
meson setup build-vk-static-test libvmaf -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false -Ddefault_library=static
ninja -C build-vk-static-test src/libvmaf.a
grep Cflags build-vk-static-test/meson-private/libvmaf.pc
# Expected: no `volk_priv_remap` substring, no build-dir absolute path

0057 — Volk vk* priv-remap for static-archive builds (ADR-0198)

  • ADR: ADR-0198; follow-up to ADR-0185.
  • Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
  • Touches:
  • core/subprojects/packagefiles/volk/meson.build — overlay applied on top of the upstream volk wrap. Adds a custom_target that runs gen_priv_remap.py to produce volk_priv_remap.h from the upstream volk.h, and wires -include of the generated header into volk.c's c_args and volk_dep's compile_args.
  • core/subprojects/packagefiles/volk/gen_priv_remap.py — fork-added generator script (regex against extern PFN_vkXxx vkXxx; declarations).
  • Invariants (load-bearing):
  • Force-include must propagate to every libvmaf TU pulling in volk_dep — verified via meson dep graph. Removing the -include from compile_args re-introduces the static-link multi-def cascade.
  • Generator regex matches every vk* PFN declaration in volk.h — confirmed for volk-1.4.341 (784 declarations, 784 remaps). Bumping the volk wrap version: re-run the generator (it's a configure-time custom target, so it's automatic) and confirm the rename count printed to stdout matches the count of ^extern PFN_vk lines in the new volk.h.
  • The renamed symbols use the vmaf_priv_ prefix — chosen to match no upstream Netflix or Vulkan SDK identifier. Don't rename to _vk* (collides with reserved-identifier C namespace) or vkv_* etc.
  • On upstream sync: zero upstream interaction. The volk wrap is a libvmaf-managed subproject; Netflix doesn't ship a Vulkan backend.
  • Re-test on rebase / after any volk wrap bump:
meson setup build-vk-static libvmaf -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false \
    -Ddefault_library=static
ninja -C build-vk-static src/libvmaf.a
test "$(nm build-vk-static/src/libvmaf.a 2>/dev/null \
          | grep -cE '^[0-9a-f]* (T|D|B|R) vk[A-Z]')" = "0" \
    && echo OK

(Followed by the BtbN-style link reproducer in the ADR References section.)

0056 — SSIMULACRA 2 snapshot gate + fp-contract-off split (ADR-0164)

  • ADR: ADR-0164
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
  • Touches:
  • python/test/ssimulacra2_test.py — new fork-added Python test. Uses subprocess.call against ExternalProgram.vmafexec with --feature ssimulacra2; parses the --json output; asserts pooled + per-frame scores.
  • Invariants (load-bearing):
  • Pinned values are CPU-only — generated on master HEAD after PR #100 merge. Re-generate if the scalar or any SIMD path changes semantically (which per ADR-0161/0162/0163's bit-exactness contract, it shouldn't — any bit-exact refactor leaves pinned values unchanged).
  • Tolerance is 4 decimal places (places=4) — matches 1e-4. The CPU paths are bit-exact so actual drift should be 0; the tolerance is defensive.
  • -ffp-contract=off everywhere in the ssimulacra2 pipeline: libvmaf_ssimulacra2_static_lib (scalar extractor), x86_ssimulacra2_avx2_lib, x86_ssimulacra2_avx512_lib, and arm64_ssimulacra2_lib (from ADR-0161). All four split out of their umbrella libs so other extractors keep upstream's default FMA policy. Without this the CI GCC/clang hosts drifted ~2e-4 from my AVX-512 authoring host — GCC 10+ defaults -ffp-contract=fast on x86 with -mfma and on aarch64, fusing a*b+c in scalar glue around the SIMD calls. Do NOT remove any of these carve-outs on rebase.
  • Fixtures are already-checked-in — src01_hrc00/01_576x324 is also the primary Netflix golden fixture; the 160×90 derived one stresses the sub-176 pyramid-termination path.
  • Do NOT modify the Netflix golden assertions in quality_runner_test.py et al. — those are upstream-pinned. This test is a SEPARATE file that adds fork-specific scores.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future, cross-reference against their pinning if they add one.
  • Re-test on rebase / after any ssimulacra2 change:
cd python && python -m pytest test/ssimulacra2_test.py -v   # 2/2
  • Follow-ups:
  • Cross-reference gate against libjxl tools/ssimulacra2 when ssimulacra2_rs cargo install is fixed.
  • Expand fixture coverage if new YUV test assets land.
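
Why contraction of a*b+c matters for the pinned scores can be shown in a few lines of C. The sketch below is illustrative only: fused_like is a hypothetical stand-in that emulates a fused multiply-add by carrying the exact product in double (a real FMA rounds once; this rounds double to float at the end), and the volatile store in mul_then_add keeps the compiler from contracting the two operations regardless of the -ffp-contract setting.

```c
#include <assert.h>

/* Unfused: the product is rounded to float before the add. */
static float mul_then_add(float a, float b, float c)
{
    volatile float prod = a * b; /* forces one float rounding of a*b */
    return prod + c;
}

/* Fused-like: the product of two floats is exact in double, so the add
 * sees the unrounded product, as it would inside an FMA. */
static float fused_like(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Worked case: a*a = 1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 in float. */
static void contraction_demo(void)
{
    const float a = 1.0f + 0x1p-12f;
    const float c = -(1.0f + 0x1p-11f);
    assert(mul_then_add(a, a, c) == 0.0f);    /* low product bit was rounded away */
    assert(fused_like(a, a, c) == 0x1p-24f);  /* low product bit survives */
}
```

The two results differ in the last product bit, which is the kind of difference that accumulates into the drift described for hosts that fuse the scalar glue code.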

0055 — SSIMULACRA 2 picture_to_linear_rgb SIMD (ADR-0163)

  • ADR: ADR-0163
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
  • Touches:
  • ssimulacra2_avx2.{c,h} — new ssimulacra2_picture_to_linear_rgb_avx2 + helpers (read_plane_scalar_s2, srgb_to_linear_lane_avx2, compute_matrix_coefs).
  • ssimulacra2_avx512.{c,h} — 16-wide AVX-512 port.
  • ssimulacra2_neon.{c,h} — 4-wide aarch64 port.
  • ssimulacra2.c — new ptlr_fn field in Ssimu2State; dispatch wrapper convert_picture_to_linear_rgb unpacks VmafPicture into simd_plane_t[3]; init assigns AVX2/AVX-512/NEON pointers.
  • ssimulacra2_simd_common.h — new shared header declaring simd_plane_t. Decouples SIMD TUs from VmafPicture type.
  • test_ssimulacra2_simd.c — new test_ptlr_420_8, test_ptlr_420_10, test_ptlr_444_8, test_ptlr_444_10, test_ptlr_422_8 subtests + scalar references ref_read_plane, ref_srgb_to_linear, ref_picture_to_linear_rgb.
  • Invariants (load-bearing):
  • Scalar-order matmul — G = Yn + cb_g * Un + cr_g * Vn chained left-to-right in all three SIMD TUs. Regression test catches reordering drift (~1 ulp).
  • Per-lane scalar powf — vector polynomial approximation would drift scalar bit-exactness. Do not replace the lane spill/reload pattern with a vector libm.
  • simd_plane_t layout — {data, stride, w, h} ordering assumed by all three SIMD TUs. The dispatch wrapper builds this from VmafPicture fields; layout must match.
  • Bounds clamping in read_plane_scalar_* mirrors scalar reference verbatim (if (sx < 0) sx = 0; if (sx >= pw) sx = pw-1; etc.). Do not simplify — removes per-lane safety at plane edges.
  • Arbitrary chroma ratios fall through to the int64_t multiplication branch. Don't remove it — SSIMULACRA 2 is supposed to accept non-standard ratios gracefully.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides a SIMD YUV→RGB path, diff against the fork's — preserve the bit-exactness contract unless ADR-0142 Netflix-authority carve-out opens.
  • Re-test on rebase:
ninja -C build && build/test/test_ssimulacra2_simd     # 11/11
ninja -C build-aarch64 && \
  qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
    build-aarch64/test/test_ssimulacra2_simd            # 11/11
  • Follow-ups:
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — still pending (gated on tools/ssimulacra2 availability).
  • SSIMULACRA 2 now has zero scalar hot paths. T3-1 closes in full with phases 1+2+3 (ADR-0161, 0162, 0163).

0054 — SSIMULACRA 2 FastGaussian IIR blur SIMD (ADR-0162)

  • ADR: ADR-0162
  • Upstream source: fork-local. No SSIMULACRA 2 extractor in upstream Netflix/vmaf.
  • Touches:
  • ssimulacra2_avx2.{c,h} — new ssimulacra2_blur_plane_avx2 + 2 helpers (hblur_8rows_avx2, vblur_simd_8cols_avx2).
  • ssimulacra2_avx512.{c,h} — 16-wide port.
  • ssimulacra2_neon.{c,h} — 4-wide aarch64 port, uses vsetq_lane_f32 in place of gather.
  • ssimulacra2.c — adds blur_fn function pointer to Ssimu2State, dispatch in init_simd_dispatch(), call-site in blur_3plane.
  • test_ssimulacra2_simd.c — new test_blur + scalar reference (ref_blur_plane, ref_fast_gaussian_1d).
  • Invariants (load-bearing):
  • Row-batching lane layout — horizontal pass lane i MUST hold row (y_base + i). Gather index vector entries are (y_base + i) * w (stride-w). Changing this breaks bit-exactness vs scalar.
  • Scalar left-to-right summation order — n2_k * sum - d1_k * prev1_k - prev2_k chained sequentially; o0 + o1 + o2 at output time is (o0 + o1) + o2. Changing to (o0 + o2) + o1 or o0 + (o1 + o2) will drift ~1 ulp and the regression test catches it.
  • col_state is 6 * w contiguous floats — layout is [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2]. SIMD loads assume this layout; changing field order requires updating all three SIMD TUs in lockstep with blur_plane.
  • NEON lane-set pattern — aarch64 has no gather intrinsic; 4 explicit vsetq_lane_f32 calls per input vector. Do not replace with a ld1 {v.s}[lane]-style pseudo-gather without re-verifying bit-exactness.
  • Scalar tail in vertical pass matches scalar reference body verbatim. Any deviation breaks memcmp equality on widths that aren't multiples of the SIMD width.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides their own IIR blur SIMD, diff against the fork's and preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out is opened.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd  # 6/6
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
  build-aarch64/test/test_ssimulacra2_simd  # 6/6
  • Follow-ups:
  • picture_to_linear_rgb SIMD — last scalar hot path in the extractor. 2 calls / frame. Low ROI but mechanical.
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — still pending.
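
The two layout invariants in this entry (the col_state slot order and the row-batching lane mapping) can be written down as index helpers. This is a sketch under the assumption that each of the six slots is w contiguous floats in the order quoted above; the enum and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Slot order inside col_state, per the layout quoted in the invariants. */
enum { PREV1_0, PREV1_1, PREV1_2, PREV2_0, PREV2_1, PREV2_2, COL_STATE_SLOTS };

/* Offset (in floats) of column x within one slot of a 6 * w col_state. */
static size_t col_state_offset(int slot, size_t x, size_t w)
{
    assert(slot >= 0 && slot < COL_STATE_SLOTS && x < w);
    return (size_t)slot * w + x;
}

/* Gather base for horizontal-pass lane i: lane i holds row y_base + i,
 * so its row starts (y_base + i) * w floats into the plane. */
static size_t hblur_lane_row_offset(size_t y_base, size_t lane, size_t w)
{
    return (y_base + lane) * w;
}
```

Keeping such offsets in one place is one way to make a field-order change visible to all three SIMD TUs at once.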

0053 — SSIMULACRA 2 SIMD bit-exact ports (ADR-0161)

  • ADR: ADR-0161
  • Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor at all (fork-added in ADR-0130).
  • Touches:
  • ssimulacra2_avx2.c / .h — 5 AVX2 kernels + per-lane cbrtf helper.
  • ssimulacra2_avx512.c / .h — 5 AVX-512 kernels; mechanical 16-wide widening of the AVX2 path.
  • ssimulacra2_neon.c / .h — 5 NEON kernels; 4-wide aarch64 mirror.
  • ssimulacra2.c — adds function-pointer dispatch fields to Ssimu2State + init_simd_dispatch() helper, calls go through the pointers.
  • meson.build — registers the three SIMD TUs in x86_avx2_sources / x86_avx512_sources / arm64_sources.
  • test_ssimulacra2_simd.c and test/meson.build — new bit-exact test harness.
  • Invariants (load-bearing):
  • Byte-for-byte bit-exactness to scalar on all 5 vectorised kernels under FLT_EVAL_METHOD == 0. Regression caught pre-merge: naïve pairing (a+b)+(c+d) vs scalar ((a+b)+c)+d drifts by 1 ULP. Keep sequential scalar-order chains in all three SIMD TUs on rebase.
  • cbrtf is per-lane scalar libm, not a polynomial. Any replacement with a vector cbrt would drift the ssimulacra2 score and break the regression test. Keep the spill/reload pattern.
  • ssim_map / edge_diff_map reductions use the ADR-0139 per-lane double scalar tail. Do NOT SIMD-reduce float lanes then lift to double — summation order changes.
  • downsample_2x2 deinterleave uses ISA-appropriate ops: AVX2 vshufps+vpermpd, AVX-512 vpermt2ps, NEON vuzp1q_f32+vuzp2q_f32. After deinterleave, sum order is ((r0e+r0o)+r1e)+r1o matching scalar.
  • #pragma STDC FP_CONTRACT OFF at every TU header. Ignored by aarch64 GCC (non-fatal -Wunknown-pragmas); kept for portability (clang, MSVC).
  • IIR blur + picture_to_linear_rgb stay scalar in this PR. Follow-up PRs target these; when they land, re-verify bit-exactness via test_ssimulacra2_simd expansion.
  • Runtime dispatch order: AVX-512 > AVX2 on x86; NEON on aarch64; scalar fallback. Preserve on rebase.
  • On upstream sync:
  • Upstream has no SSIMULACRA 2 extractor; nothing to merge.
  • If Netflix adopts SSIMULACRA 2 in the future, diff their implementation against the fork's scalar + SIMD TUs; keep the fork's bit-exactness contract absent a specific Netflix-authority carve-out ADR.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd   # 5/5
clang-tidy -p build core/src/feature/x86/ssimulacra2_avx2.c \
                     core/src/feature/x86/ssimulacra2_avx512.c
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
  build-aarch64/test/test_ssimulacra2_simd   # 5/5
clang-tidy -p build-aarch64 \
  core/src/feature/arm64/ssimulacra2_neon.c
  • Follow-ups:
  • IIR blur vectorisation (blur_plane vertical-pass column batching) — the biggest frame-level wallclock win.
  • picture_to_linear_rgb per-lane powf — lower ROI but mechanical.
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — ADR-0130 deferred; still pending.
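
The pairing regression described in the invariants, (a+b)+(c+d) versus the scalar ((a+b)+c)+d, can be reproduced with four floats. The sketch below assumes FLT_EVAL_METHOD == 0 and round-to-nearest-even; the function names are hypothetical and the volatile temporaries only pin each intermediate to float precision.

```c
#include <assert.h>

/* Scalar order: ((a + b) + c) + d, one float rounding per step. */
static float sum_sequential(float a, float b, float c, float d)
{
    volatile float s = a + b;
    s = s + c;
    s = s + d;
    return s;
}

/* Pairwise order, the shape a SIMD horizontal add naturally produces. */
static float sum_pairwise(float a, float b, float c, float d)
{
    volatile float lo = a + b;
    volatile float hi = c + d;
    return lo + hi;
}

/* Worked case: each 2^-24 added to 1.0 alone is a tie that rounds back to
 * 1.0, but two of them paired first make 2^-23, which survives the add. */
static void pairing_demo(void)
{
    const float tiny = 0x1p-24f;
    assert(sum_sequential(1.0f, tiny, tiny, tiny) == 1.0f);
    assert(sum_pairwise(1.0f, tiny, tiny, tiny) == 1.0f + 0x1p-23f);
}
```

The two orders differ by exactly one unit in the last place of 1.0f here, matching the 1 ULP drift the regression test is meant to catch.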

0052 — psnr_hvs SIMD bit-exact ports (ADR-0159 AVX2, ADR-0160 NEON)

  • ADRs: ADR-0159 (AVX2), ADR-0160 (NEON sister port).
  • Upstream source: fork-local. Upstream Netflix/vmaf has no psnr_hvs SIMD path.
  • Touches:
  • core/src/feature/x86/psnr_hvs_avx2.c — AVX2 TU.
  • core/src/feature/x86/psnr_hvs_avx2.h — AVX2 header.
  • core/src/feature/arm64/psnr_hvs_neon.c — NEON TU (sister port, ADR-0160).
  • core/src/feature/arm64/psnr_hvs_neon.h — NEON header.
  • core/src/feature/third_party/xiph/psnr_hvs.c — add PsnrHvsState + runtime dispatch in init() (AVX2 under ARCH_X86, NEON under ARCH_AARCH64) + scoped NOLINTBEGIN/END around the upstream Xiph scalar block (kept verbatim as the bit-exact reference).
  • core/src/meson.build — add x86/psnr_hvs_avx2.c to x86_avx2_sources and arm64/psnr_hvs_neon.c to arm64_sources.
  • core/test/test_psnr_hvs_avx2.c, core/test/test_psnr_hvs_neon.c — bit-exact unit tests (x86 and aarch64 respectively).
  • core/test/meson.build — register both tests under enable_asm, arch-gated.
  • Invariants (load-bearing):
  • Bit-exactness to scalar: every od_coeff (int32) and every final psnr_hvs_{y,cb,cr,psnr_hvs} value the AVX2 path emits must be byte-identical to the scalar reference on the Netflix golden pairs. If a rebase introduces any pattern that breaks this (e.g. a floating-point horizontal reduce in the mask accumulator), the unit test test_psnr_hvs_avx2 will fail — don't relax the assertions; fix the SIMD path.
  • DCT butterfly layout: butterfly → transpose → butterfly → transpose. The transpose lives inside od_bin_fdct8x8_avx2. Do not move it.
  • Float accumulators stay scalar: means / variances / mask / error accumulation in calc_psnrhvs_avx2 use the same per-block scalar loop as scalar psnr_hvs — bit-exact by construction. Do not vectorize these with horizontal reductions without replicating ADR-0139's per-lane scalar-float reduction pattern. The cross-block error accumulator ret is threaded through accumulate_error() by pointer, not returned-then-summed: each of the 64 per-coefficient contributions per block must hit the outer ret directly, matching scalar's inline ret += ... at third_party/xiph/psnr_hvs.c line 355. IEEE-754 float add is non-associative — summing into a local float and then adding the per-block total to ret changes the summation tree and drifts the Netflix golden by ~5.5e-5.
  • #pragma STDC FP_CONTRACT OFF at the TU header disables FMA formation. Required: fmaf(a, b, c) can differ from (a*b)+c by 1 ulp, breaking bit-exactness. Do not remove the pragma; do not add -ffp-contract=fast to the build flags for this TU.
  • NOLINT suppressions are load-bearing — each cites ADR-0141 inline (bit-exactness scalar-diff auditability for the 30-butterfly function, scalar float→double promotion for sqrt, extractor-registry extern linkage for vmaf_fex_psnr_hvs, upstream-Xiph scoped block for rebase parity).
  • On upstream sync:
  • Upstream has no psnr_hvs SIMD as of 2026-04-24. Keep fork's version on conflict.
  • If upstream ever touches psnr_hvs.c for non-SIMD reasons (e.g. a masking-table update), rebase the AVX2 TU to match line-for-line and re-run test_psnr_hvs_avx2 to confirm bit-exactness survives.
  • NEON follow-up PR is a sister port; its arm64/psnr_hvs_neon.c will mirror this ADR's invariants. On rebase, the two SIMD TUs must stay in lock-step with the scalar reference.
  • Re-test on rebase:
ninja -C build
meson test -C build test_psnr_hvs_avx2
# Expect: 5/5 subtests pass (DCT bit-exact on 3 random seeds +
# delta + constant input).

# CLI-level bit-exactness on Netflix golden (requires the YUV
# fixtures in python/test/resource/yuv/):
# VMAF_CPU_MASK=0    (scalar)
# VMAF_CPU_MASK=255  (AVX2 enabled)
# Diff per-frame psnr_hvs_{y,cb,cr,psnr_hvs} XML fields; expect
# byte-identical across all 3 golden pairs.
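
The accumulator rule in the invariants above (thread ret through by pointer instead of summing a block locally and adding the total) changes the summation tree, as this sketch shows. The function names are hypothetical and are not the fork's accumulate_error(); FLT_EVAL_METHOD == 0 and round-to-nearest-even are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar-matching shape: every contribution is added straight into *ret. */
static void accumulate_direct(const float *contrib, size_t n, float *ret)
{
    assert(contrib != NULL && ret != NULL);
    for (size_t i = 0; i < n; i++) {
        volatile float next = *ret + contrib[i]; /* one float rounding per add */
        *ret = next;
    }
}

/* Drifting shape: sum the block locally, then add the block total once. */
static void accumulate_block_total(const float *contrib, size_t n, float *ret)
{
    assert(contrib != NULL && ret != NULL);
    volatile float block = 0.0f;
    for (size_t i = 0; i < n; i++)
        block = block + contrib[i];
    *ret = *ret + block;
}
```

With a running total of 1.0f and two contributions of 2^-24, the direct form leaves the total at 1.0f while the block-total form yields 1.0f + 2^-23, so the two shapes are not interchangeable under a bit-exactness contract.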

0051 — Netflix#1486 motion updates verified present (ADR-0158)

  • ADR: ADR-0158
  • Upstream source: Netflix upstream PR #1486 ("Port motion updates"), MERGED 2026-04-20 as commits a44e5e6 (code) + 62f47d5 (Netflix golden updates).
  • Touches: documentation-only; the actual code changes this ADR documents are already in the fork's master via earlier incremental motion3 / blend / five-frame-window commits.
  • Invariants (load-bearing for future /sync-upstream):
  • The edge_8 mirror fix (i_tap = height - (i_tap - height + 2)) is present at integer_motion.c:240, x86/motion_avx2.c:147, x86/motion_avx512.c:147. If upstream's mirror line ever diverges again, this is the hunk to watch.
  • The motion_max_val feature option is at integer_motion.c:57,118-120 with default 10000.0 and FEATURE_PARAM flag. Upstream's default = fork's default; don't drift.
  • VMAF_integer_feature_motion3_score output plumbing is in integer_motion.c + alias.c.
  • Fork-local motion extensions (five-frame-window, moving-average, blend, fps_weight) are ADDITIONS on top of Netflix#1486. They are not upstream. Upstream changes to motion extractor internals may conflict with them — diff against core/src/feature/integer_motion.c on every rebase and check that the fork's MIN(s->score * s->motion_fps_weight, s->motion_max_val) invocations are preserved (lines ~409, ~503).
  • On upstream sync: nothing to port from Netflix#1486 — it's absorbed. If a future upstream PR touches the same code paths, prefer upstream's version for the scalar/edge handling and the fork's version for the five-frame-window / blend extensions.
  • Re-test on rebase:
ninja -C build
meson test -C build
# Expect: 35/35 pass.

# Verify the upstream markers are still in place after rebase:
grep -n "height - (i_tap - height + 2)\|motion_max_val\|VMAF_integer_feature_motion3_score" \
    core/src/feature/integer_motion.c \
    core/src/feature/alias.c \
    core/src/feature/x86/motion_avx2.c \
    core/src/feature/x86/motion_avx512.c
# Expect: matches at all 4 files. If any missing, the rebase
# silently dropped the Netflix#1486 content — investigate.
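
For reference when reading the grep hits, the two expressions being guarded can be sketched as small helpers. This is an illustration, not the extractor code: mirror_bottom applies the quoted edge_8 formula only to taps past the bottom edge (top-edge handling is out of scope here), clamp_motion_score mirrors the MIN(score * fps_weight, max_val) call shape, and both names are hypothetical.

```c
#include <assert.h>

/* Bottom-edge mirror: taps inside the picture pass through unchanged;
 * taps at or past `height` reflect without repeating the last row. */
static int mirror_bottom(int i_tap, int height)
{
    assert(height >= 2 && i_tap >= 0 && i_tap <= 2 * height - 2);
    if (i_tap >= height)
        i_tap = height - (i_tap - height + 2);
    return i_tap;
}

/* Weighted score capped at max_val (the entry gives 10000.0 as default). */
static double clamp_motion_score(double score, double fps_weight, double max_val)
{
    const double weighted = score * fps_weight;
    return weighted < max_val ? weighted : max_val;
}
```

For height = 10 the helper maps tap 10 to row 8 and tap 11 to row 7, so the last row (9) is the reflection axis and is not duplicated.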

0050 — CUDA preallocation memory leak fix + vmaf_cuda_state_free (ADR-0157)

  • ADR: ADR-0157
  • Upstream source: Netflix upstream issue #1300 (OPEN since 2024; no maintainer fix as of 2026-04-24). User reports GPU memory rises monotonically across init/preallocate/fetch/close cycles.
  • Touches:
  • core/include/libvmaf/libvmaf_cuda.h — new public vmaf_cuda_state_free() API declaration.
  • core/src/cuda/common.c — new vmaf_cuda_state_free() implementation; vmaf_cuda_release() now calls cuda_free_functions(); vmaf_cuda_state_init() gets an outer failure unwind; init_with_primary_context() releases the retained primary context on fail_after_pop.
  • core/src/cuda/ring_buffer.c — vmaf_ring_buffer_close() now unlocks + destroys the mutex before freeing.
  • core/test/test_cuda_preallocation_leak.c — new GPU-gated reducer (10-cycle loop with full cleanup).
  • core/test/test_cuda_pic_preallocation.c, core/test/test_cuda_buffer_alloc_oom.c — add missing vmaf_cuda_state_free() + vmaf_model_destroy() calls after vmaf_close() in every test that allocates these.
  • core/test/meson.build — register the new reducer under enable_cuda guard.
  • Invariants (load-bearing):
  • Public contract: every caller of vmaf_cuda_state_init() MUST call vmaf_cuda_state_free() AFTER vmaf_close() on any VmafContext that imported the state. Informal free(cu_state) is a silent double-free hazard AFTER close (vmaf_close's vmaf_cuda_release already memsets + frees CudaFunctions internals; vmaf_cuda_state_free only frees the heap allocation itself).
  • vmaf_cuda_release() frees CudaFunctions via a saved pointer AFTER the memset. Order matters — memset first so cu_state->f is zeroed in the caller's struct, then free via the saved local. Do not re-order.
  • vmaf_ring_buffer_close() unlocks BEFORE destroying the mutex (POSIX requires the mutex be unlocked for destroy).
  • The cold-start unwind in init_with_primary_context releases cuDevicePrimaryCtxRetain's retained context if cuStreamCreateWithPriority fails.
  • The ADR-0122 / ADR-0123 is_cudastate_empty() null-guards at the top of every public vmaf_cuda_* entry must continue to compose with the new vmaf_cuda_state_free() (which accepts NULL directly and doesn't call through to the CUDA API).
  • The new free call order in callers is: vmaf_close(vmaf) → vmaf_cuda_state_free(cu_state) → vmaf_model_destroy(model). Reversing the first two produces a use-after-free.
  • On upstream sync:
  • Upstream has no vmaf_cuda_state_free() as of 2026-04-24. Keep the fork's version on any conflict. If upstream eventually lands the same API with a different spelling, prefer upstream's spelling and add a compat alias — but do not break the fork's ABI.
  • vmaf_cuda_release()'s cuda_free_functions() call is fork-local. On rebase, keep it.
  • The ring-buffer pthread_mutex_unlock + pthread_mutex_destroy pair is fork-local. On rebase, keep it.
  • If upstream refactors VmafCudaState ownership semantics (unlikely — their pattern has been "leaked state in a long-lived process is acceptable" historically), re-audit this ADR and the new public API.
  • Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 40/40 pass including test_cuda_preallocation_leak.

# ASan leak-check:
cd core && meson setup build-asan-cuda \
    -Db_sanitize=address -Denable_cuda=true -Denable_sycl=false \
    --buildtype=debug
ninja -C build-asan-cuda
ASAN_OPTIONS='detect_leaks=1:leak_check_at_exit=1' \
    build-asan-cuda/test/test_cuda_preallocation_leak
# Expect: 0 bytes leaked from core/src/* frames.
# (~180 bytes in libcuda.so.1 is expected — driver's process-
#  lifetime cuInit cache, does not grow per cycle.)

0049 — CUDA graceful error propagation (ADR-0156)

  • ADR: ADR-0156
  • Upstream source: Netflix upstream issue #1420 (OPEN as of 2026-04-24). Reports that two concurrent VMAF-CUDA processes crash the second one at vmaf_cuda_buffer_alloc due to CHECK_CUDA(cuMemAlloc) → assert(0) on OOM.
  • Touches:
  • core/src/cuda/cuda_helper.cuh — redefined CHECK_CUDA family. New macros CHECK_CUDA_GOTO + CHECK_CUDA_RETURN + helper vmaf_cuda_result_to_errno. Old assert(0) semantics removed entirely.
  • core/src/cuda/common.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c — all CHECK_CUDA(...) sites converted; cleanup labels added where contexts / buffers were pushed / allocated.
  • core/src/feature/cuda/integer_motion_cuda.c, integer_vif_cuda.c, integer_adm_cuda.c — same conversion; 12 static helpers promoted void → int.
  • core/test/test_cuda_buffer_alloc_oom.c — new GPU-gated reducer.
  • core/test/meson.build — register new test under enable_cuda guard.
  • Invariants (load-bearing):
  • CHECK_CUDA_GOTO / CHECK_CUDA_RETURN must never call assert(0) or abort() on a CUDA error. Any regression back to the upstream abort-on-error semantics re-introduces Netflix#1420 and the NDEBUG footgun.
  • Every CHECK_CUDA_GOTO target label must pop any previously-pushed CUDA context and free any partially-constructed buffers before returning the errno. The graceful path must not leak resources.
  • vmaf_cuda_result_to_errno uses numeric CUresult values directly (0 / 1 / 2 / 3 / 4 / 101 / 201 / 400) so host TUs that don't include <cuda.h> can transitively consume the mapping via the inline function. If upstream renumbers CUresult enum values (historically stable — they've been fixed since CUDA 1.0), re-audit the switch.
  • ADR-0122 / ADR-0123 is_cudastate_empty(...) guards at the top of every public vmaf_cuda_* entry point must stay — they run before the CUDA API is touched and compose cleanly with the new error propagation.
  • Twelve static helper signatures in the feature extractors are int-returning (was void): any upstream-port that restores the void return silently regresses the error path.
  • On upstream sync:
  • Upstream Netflix still uses assert(0) in CHECK_CUDA as of 2026-04-24. Keep the fork's macro definitions in cuda_helper.cuh on any upstream conflict — this file is fork-local behaviour.
  • If upstream eventually lands Netflix#1420 with a similar refactor, prefer the fork's version unless upstream's has identical semantics (no assert(0) / no abort() / translates CUresult to -errno). Re-verify test_cuda_buffer_alloc_oom after rebase.
  • If upstream adds new CHECK_CUDA(...) sites in a port, rewrite them to CHECK_CUDA_GOTO / CHECK_CUDA_RETURN as part of the port commit.
  • If upstream changes any of the 12 static helper signatures back to void, re-promote them to int during the merge.
  • Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 39/39 pass including test_cuda_buffer_alloc_oom.

# Reducer check — verify the OOM-to-errno path is live:
meson test -C core/build-cuda test_cuda_buffer_alloc_oom -v
# Expect subtests: request 1 TiB → -ENOMEM; request 0 bytes → 0.

clang-tidy -p core/build-cuda --quiet \
    core/src/cuda/common.c \
    core/src/cuda/picture_cuda.c \
    core/src/feature/cuda/integer_motion_cuda.c \
    core/src/feature/cuda/integer_vif_cuda.c \
    core/src/feature/cuda/integer_adm_cuda.c \
    core/src/libvmaf.c
# Expect exit 0 on every file.
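
The CUresult-to-errno translation described in the invariants can be sketched as a lookup table. This is a hedged Python illustration, not the contents of `cuda_helper.cuh`: the notes only confirm that success maps to 0 and that out-of-memory maps to -ENOMEM. Every other errno choice below, and the -EIO fallback for unknown codes, is an assumption made for the sketch. The numeric codes and their names follow the CUDA driver API.

```python
import errno

# Numeric CUresult values listed in the invariant above.
# Only the 0 -> 0 and 2 -> ENOMEM rows are confirmed by these notes;
# the remaining errno choices are assumptions for illustration.
_CURESULT_TO_ERRNO = {
    0: 0,               # CUDA_SUCCESS
    1: errno.EINVAL,    # CUDA_ERROR_INVALID_VALUE
    2: errno.ENOMEM,    # CUDA_ERROR_OUT_OF_MEMORY
    3: errno.ENODEV,    # CUDA_ERROR_NOT_INITIALIZED
    4: errno.ENODEV,    # CUDA_ERROR_DEINITIALIZED
    101: errno.ENODEV,  # CUDA_ERROR_INVALID_DEVICE
    201: errno.EINVAL,  # CUDA_ERROR_INVALID_CONTEXT
    400: errno.EINVAL,  # CUDA_ERROR_INVALID_HANDLE
}


def cuda_result_to_errno(cu_result: int) -> int:
    """Return 0 on success, otherwise a negative errno.

    Codes outside the table fall back to -EIO (an assumption of this sketch).
    """
    return -_CURESULT_TO_ERRNO.get(cu_result, errno.EIO)
```

Using bare integers rather than the enum names mirrors the stated design goal: a translation unit that never includes `<cuda.h>` can still consume the mapping.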

0049 — compute_motion / picture_copy signature changes (b949cebf upstream port)

  • Upstream commit: Netflix/vmaf b949cebf (feature/motion: port several feature extractor options)
  • Prerequisite commit: Netflix/vmaf d3647c73 (picture_copy: add channel parameter)
  • PR: upstream/port-b949cebf-motion

Rebase-sensitive invariants:

  1. compute_motion signature change — compute_motion() in core/src/feature/motion.c / motion.h now takes an extra int motion_decimate parameter (the motion_add_scale1 flag). Any new caller added in the fork that calls compute_motion() must pass this parameter. The SIMD integer motion callers (motion_avx2.c, motion_avx512.c) do NOT call compute_motion() — they use the SAD/convolution dispatch table directly and are unaffected.

  2. vmaf_image_sad_c signature change — similarly gains int motion_add_scale1. Any caller in the fork must be updated. Currently only called from compute_motion() internally.

  3. picture_copy signature change — gains int channel as the last parameter (0=Y, 1=U, 2=V). Every caller in the tree has been updated to pass 0 (luma). When adding new callers that need UV planes, pass 1 or 2. The fork's CUDA/SYCL/Vulkan callers have been updated in this PR.

  4. Default behavior preserved — all new options default to no-op values. motion_add_scale1=false, motion_add_uv=false, motion_blend_factor=1.0, motion_fps_weight=1.0, motion_filter_size=5 (= DEFAULT_MOTION_FILTER_SIZE). Integer and float motion2 scores are bit-identical to pre-port baseline.

  5. vif_scale_frame_s dependency avoided — the upstream b949cebf motion.c imports vif_scale_frame_s from vif_tools.h. The fork does not have this function yet (vif options chain is deferred, Research-0024 Strategy E). The bilinear downscaler for motion_add_scale1 is implemented as local static functions in motion.c (motion_scale_bilinear, motion_bilinear_interp, motion_mirror_f). When upstream's vif options chain is eventually ported, reconcile by replacing these local functions with vif_scale_frame_s.

Reproducer:

# verify bit-exactness (default options, scores must be identical):
./core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --model path=model/vmaf_v0.6.1.json \
  --feature motion --no_prediction --json --output /tmp/motion.json
# integer_motion2 scores must match pre-port baseline at 6 decimal places.

0048 — i4_adm_cm int32 rounding overflow deliberately preserved (ADR-0155)

  • ADR: ADR-0155
  • Upstream source: Netflix upstream issue #955 (OPEN since 2020; no maintainer response as of 2026-04-24). Reports that add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1)) in core/src/feature/integer_adm.c scales 1–3 overflows int32_t (1u << 31 = 0x80000000 wraps to -2147483648). Rounding term is sign-negated; ADM scales 1–3 biased low by ≈1 LSB per summed term.
  • Touches (documentation-only):
  • docs/adr/0155-adm-i4-rounding-deferred-netflix-955.md — new ADR (this entry's anchor).
  • core/src/feature/integer_adm.c — in-file warning comment above the overflow site (add_bef_shift_flt[] initialiser loop around line 1277). No code change.
  • core/src/feature/AGENTS.md — invariant note under "Rebase-sensitive invariants".
  • Invariants (load-bearing — do NOT silently "fix"):
  • integer_adm.c keeps int32_t add_bef_shift_flt[3] with the overflowing 1u << 31 assignment. The Netflix golden assertions (python/test/quality_runner_test.py, vmafexec_test.py, feature_extractor_test.py) encode the buggy ADM output. Project hard rule #1 (ADR-0024) prohibits changing those assertions.
  • Any "fix" that changes ADM numerical output must land together with a coordinated Netflix-authored golden-number update (the ADR-0142 Netflix-authority carve-out). Until Netflix#955 closes upstream, there is no authority to track.
  • On upstream sync:
  • If Netflix finally lands a fix for #955 (widening the rounding term to uint32_t or int64_t), sync the C-side fix AND the updated assertAlmostEqual values in the same merge. Re-run make test-netflix-golden and /cross-backend-diff on the golden pairs to verify the new numbers are consistent across CPU / CUDA / SYCL.
  • Remove the in-file warning comment above the add_bef_shift_flt initialiser loop, flip ADR-0155 to Superseded by ADR-NNNN, and drop this rebase-notes entry.
  • If upstream instead closes #955 as wont-fix, keep this entry verbatim and update the ADR status to note upstream's closure.
  • Re-test on rebase (gates the invariant by confirming the golden numbers are unchanged):
ninja -C build
make test-netflix-golden
# Expect: VMAF mean 76.66890… on src01_hrc00/01_576x324 golden
# pair — bit-identical to pre-rebase.
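
The wrap described in the upstream-source bullet can be reproduced with plain integer arithmetic. The following is a minimal Python sketch that emulates storing an unsigned shift result in an `int32_t` slot; `as_int32` and `add_bef_shift` are hypothetical helper names, and the shift values 31 and 32 are chosen only to show the last safe case next to the `1u << 31` case.

```python
import ctypes


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return ctypes.c_int32(value & 0xFFFFFFFF).value


def add_bef_shift(shift: int) -> int:
    """Rounding term (1u << (shift - 1)) as it lands in an int32_t slot."""
    return as_int32(1 << (shift - 1))


print(add_bef_shift(31))  # 1073741824: 1u << 30 still fits in int32_t
print(add_bef_shift(32))  # -2147483648: 1u << 31 wraps negative
```

The negative value is what makes the rounding term subtract instead of add, which is the bias the golden numbers encode.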

0047 — vmaf_score_pooled -EAGAIN for pending features (ADR-0154)

  • ADR: ADR-0154
  • Upstream source: Netflix upstream issue #755 (OPEN as of 2026-04-24). Upstream maintainer closed the door on the streaming use case in 2020 ("you cannot call vmaf_score_pooled() in a loop"); fork reopens it via error-code semantics without changing the retroactive-write design.
  • Touches:
  • core/src/feature/feature_collector.c — vmaf_feature_collector_get_score returns -EAGAIN (was -EINVAL) when the requested index is valid but not yet written.
  • core/src/feature/feature_collector.h — inline vmaf_feature_vector_get_score now returns -EINVAL for null/out-of-range and -EAGAIN for not-written (was -1 for both). Added #include <errno.h>. Renamed the reserved __VMAF_FEATURE_COLLECTOR_H__ guard to VMAF_FEATURE_COLLECTOR_INCLUDED.
  • core/test/test_score_pooled_eagain.c — new 4-subtest reducer.
  • core/test/meson.build — register the new test.
  • Invariants (load-bearing, enforced by the reducer):
  • vmaf_feature_collector_get_score(fc, name, &score, i) returns -EAGAIN iff the feature name is registered and i is in range but score[i].written == false.
  • The return stays -EINVAL for (a) null pointers, (b) i >= feature_vector->capacity, (c) unknown feature name.
  • The inline fast-path vmaf_feature_vector_get_score uses the same split.
  • On upstream sync: upstream has not changed the error semantics since 2020. If they do (unlikely), keep the fork's -EAGAIN — it is strictly more informative and downstream code depending on the split would regress.
  • Re-test on rebase:
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: 4/4 subtests pass.

# Reducer check:
git stash push core/src/feature/feature_collector.c core/src/feature/feature_collector.h
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: Fail: 1 (tests fail without -EAGAIN split).
git stash pop
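
The -EINVAL / -EAGAIN split in the invariants can be modelled as a small decision ladder. This is a Python sketch of the return-code contract only, not the C implementation: `Score`, `FeatureVector` and `get_score` are hypothetical names, a dict stands in for the collector, and the list length plays the role of `capacity`.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Score:
    written: bool = False
    value: float = 0.0


@dataclass
class FeatureVector:
    name: str
    score: list = field(default_factory=list)  # list of Score; len() acts as capacity


def get_score(vectors, name, index):
    """Return (rc, value) following the documented error split."""
    if vectors is None or name is None:
        return -errno.EINVAL, None          # (a) null pointers
    fv = vectors.get(name)
    if fv is None:
        return -errno.EINVAL, None          # (c) unknown feature name
    if index < 0 or index >= len(fv.score):
        return -errno.EINVAL, None          # (b) index out of range
    if not fv.score[index].written:
        return -errno.EAGAIN, None          # valid slot, not written yet
    return 0, fv.score[index].value
```

A streaming caller can then retry on -EAGAIN and treat -EINVAL as a programming error, which is the distinction the old single error code could not express.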

0046 — float_ms_ssim min-dim guard (ADR-0153)

  • ADR: ADR-0153
  • Upstream source: Netflix upstream issue #1414 (OPEN as of 2026-04-24). No upstream fix has landed; fork adds the guard independently.
  • Touches:
  • core/src/feature/float_ms_ssim.c — add #include "log.h" + #include "iqa/ssim_tools.h" + a min_dim = GAUSSIAN_LEN << (SCALES - 1) check at the start of init; extract SIMD dispatch into a new ms_ssim_init_simd_dispatch helper to keep init within the ADR-0141 60-line budget.
  • core/test/test_float_ms_ssim_min_dim.c — new 3-subtest reducer.
  • core/test/meson.build — register the new test executable.
  • Invariant (load-bearing, enforced by the reducer): float_ms_ssim.init returns -EINVAL when w < 176 || h < 176, where 176 is computed dynamically from the filter constants. The magic number is not hardcoded — changing SCALES or GAUSSIAN_LEN upstream will auto-update the minimum.
  • On upstream sync: if Netflix upstream lands a similar init-time guard, keep the fork's version — the helper name ms_ssim_init_simd_dispatch is fork-local (introduced to satisfy ADR-0141) and upstream's patch won't match. Both guards should be compatible; re-verify the reducer after rebase.
  • Re-test on rebase:
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: 3/3 subtests pass.

# Reducer check (confirms the guard is load-bearing):
git stash push core/src/feature/float_ms_ssim.c
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: Fail: 1 (tests fail without the guard).
git stash pop
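
The arithmetic behind the 176-pixel minimum is a single shift. The sketch below assumes `GAUSSIAN_LEN = 11` and `SCALES = 5`; those two values are not stated in this entry and are used here because they reproduce the documented 176. The function names are hypothetical.

```python
import errno

GAUSSIAN_LEN = 11  # assumed filter length (not stated in this entry)
SCALES = 5         # assumed number of MS-SSIM scales (not stated in this entry)


def ms_ssim_min_dim(gaussian_len: int = GAUSSIAN_LEN, scales: int = SCALES) -> int:
    """Smallest width/height that still leaves a full filter window at the coarsest scale."""
    return gaussian_len << (scales - 1)


def ms_ssim_init_check(w: int, h: int) -> int:
    """Return -EINVAL when either dimension is below the minimum, else 0."""
    min_dim = ms_ssim_min_dim()
    return -errno.EINVAL if (w < min_dim or h < min_dim) else 0


print(ms_ssim_min_dim())  # 176
```

Because the bound is derived rather than hardcoded, changing either constant moves the minimum with it: under the same formula, six scales would give 11 << 5 = 352.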

0045 — vmaf_read_pictures monotonic-index guard (ADR-0152)

  • ADR: ADR-0152
  • Upstream source: Netflix upstream issue #910 (OPEN as of 2026-04-24). No upstream fix has landed; the fork adds the guard independently, per the 2021-10-14 maintainer comment that recommended exactly this shape.
  • Touches:
  • core/src/libvmaf.c — add unsigned last_index + bool have_last_index fields to VmafContext; prepend a monotonic-index check inside read_pictures_validate_and_prep (returns -EINVAL on duplicates / regressions); update the two new fields at the tail of the same helper on success.
  • core/test/test_read_pictures_monotonic.c — new 3-subtest reducer covering the Netflix#910 sequence and the two classes of rejection (duplicate, out-of-order).
  • core/test/meson.build — register the new test executable.
  • Invariant (load-bearing, enforced by the reducer): vmaf_read_pictures(vmaf, ref, dist, index) returns -EINVAL when have_last_index && index <= last_index. Flush (vmaf_read_pictures(vmaf, NULL, NULL, 0)) routes to flush_context before the guard runs — flushing remains always-available independent of the last accepted index.
  • On upstream sync:
  • If Netflix upstream eventually lands a similar guard at the API boundary, keep the fork's version — the helper function name (read_pictures_validate_and_prep) is fork-local (ADR-0146), upstream's patch will target a different insertion point. Both guards should be compatible; re-verify the reducer after rebase.
  • If upstream instead lands an internal reordering mechanism (buffer-and-sort frames before dispatch), revisit this decision — the fork's API-level contract is stricter and may need to relax to match. Open a new ADR if so.
  • Re-test on rebase:
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: 3/3 subtests pass.

# Reducer check (confirms the guard is load-bearing):
git stash push core/src/libvmaf.c
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: Fail: 1 (the test rejects the un-guarded behaviour).
git stash pop
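
The guard's behaviour, including the flush bypass, can be sketched as a tiny state machine. This is a Python model of the documented contract, not the code in `libvmaf.c`: `ReadPicturesGuard` is a hypothetical name, and `ref` / `dist` are opaque placeholders where `None` stands in for the NULL pictures of a flush call.

```python
import errno


class ReadPicturesGuard:
    """Model of the monotonic-index check on read_pictures."""

    def __init__(self):
        self.have_last_index = False
        self.last_index = 0

    def read_pictures(self, ref, dist, index):
        # Flush is routed before the guard, so it is always accepted.
        if ref is None and dist is None:
            return 0
        # Reject duplicates and regressions once an index has been accepted.
        if self.have_last_index and index <= self.last_index:
            return -errno.EINVAL
        # Success path: record the accepted index at the tail.
        self.last_index = index
        self.have_last_index = True
        return 0
```

The separate `have_last_index` flag is what lets index 0 be accepted as the first frame while still rejecting a second 0.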

0044 — i686 (32-bit x86) build-only CI job (ADR-0151)

  • ADR: ADR-0151
  • Upstream source: Netflix upstream issue #1481 (OPEN as of 2026-04-24). Reports i686 compile failure on _mm256_extract_epi64. Workaround documented in the issue: -Denable_asm=false.
  • Touches:
  • build-aux/i686-linux-gnu.ini — new cross-file; gcc + -m32 + cpu_family = 'x86' / cpu = 'i686'. No exe_wrapper.
  • .github/workflows/libvmaf-build-matrix.yml — new matrix row with i686: true flag + new install-deps step for gcc-multilib + g++-multilib; existing "Run tests" + "Run tox tests (ubuntu)" steps widened with && !matrix.i686 guards.
  • Invariants:
  • The i686 matrix row pins -Denable_asm=false — this is the upstream-documented workaround for _mm256_extract_epi64's missing declaration on 32-bit x86 targets. Do NOT remove the flag without first gating every _mm256_extract_epi64 call site in core/src/feature/x86/adm_avx2.c + motion_avx2.c + adm_avx512.c on __x86_64__. Removing the flag naively will re-break the build.
  • No exe_wrapper in the cross-file: meson marks tests as SKIP 77 even though the host can run i686 binaries natively. Build-only gate by design.
  • On upstream sync:
  • If upstream Netflix fixes #1481 at source (by gating the intrinsic calls on __x86_64__ or by emulating via two _mm256_extract_epi32 halves), sync the fix and re-enable ASM on the i686 row (drop -Denable_asm=false from meson_extra). Re-verify bit-exactness via /cross-backend-diff on the x86_64 golden pair.
  • If upstream marks i686 unsupported in meson (e.g. via a hard error), the fork's i686 row should be removed or downgraded to continue-on-error: true.
  • Re-test on rebase (Ubuntu host with gcc-multilib):
meson setup core core/build-i686 \
    --cross-file=build-aux/i686-linux-gnu.ini \
    -Denable_asm=false \
    -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-i686
file core/build-i686/tools/vmaf
# Expect: ELF 32-bit LSB pie executable, Intel i386

CI runs this same sequence via the new matrix row.

0058 — Tiny-AI Netflix corpus training scaffold (ADR-0252)

  • ADR: ADR-0252.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training harness or MCP server.
  • Touches:
  • ai/ — training harness; NflxLocalDataset loader reads from --data-root (never from a hardcoded path).
  • docs/ai/training-data.md — corpus path convention and loader API docs; purely additive.
  • mcp-server/vmaf-mcp/tests/test_smoke_e2e.py — new e2e smoke test; references only committed golden fixtures.
  • Invariants (load-bearing):
  • Data path is local-only. .workingdir2/netflix/ is gitignored; no YUV from this corpus is ever committed. The --data-root CLI flag must remain the sole mechanism for locating the corpus.
  • Smoke test uses only committed fixtures. test_smoke_e2e.py references python/test/resource/yuv/src01_hrc00_576x324.yuv (a committed golden file), never the local corpus path. On upstream sync the golden YUV path must stay stable.
  • No Netflix golden assertion is modified. The places=4 tolerance in test_smoke_e2e.py asserts against the vmaf_v0.6.1 CPU reference; it is not a golden assertion and may be adjusted by /regen-snapshots with justification.
  • On upstream sync: zero interaction with Netflix upstream. The ai/ subtree and mcp-server/ are wholly fork-local; upstream merges are conflict-free here. If Netflix ever ships a training harness, reconcile separately.
  • Re-test on rebase:
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (vmaf binary)
# Skips automatically if binary or golden YUV is absent.

0085 — Research-0030 Phase-3b multi-seed validation (Gate 1 passed)

  • No ADR. Empirical research digest closing Gate 1 of the 3-gate v2 validation chain. Architecture decision unchanged.
  • Upstream source: fork-local. Netflix has no multi-seed validation surface for tiny-AI training.
  • Touches (additive only):
  • docs/research/0030-phase3b-multiseed-validation.md — per-seed PLCC tables + stability analysis + Gate 2/3 plan.
  • ai/scripts/phase3_subset_sweep.py — adds --seeds flag (comma-separated list) + per-seed result aggregation.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The +0.0175 Δ is multi-seed mean PLCC, not seed-0 PLCC. Don't cite the +0.0106 from Research-0029 once Research-0030 lands; the multi-seed number is more trustworthy.
  • Subset B is more stable than canonical-6 across seeds. Don't ship a v2 model citing single-seed numbers — always report multi-seed mean ± seed-mean-std for any tiny-AI metric in a future digest.
  • The --seeds flag aggregates by flattening (seed × fold) pairs. The reported mean_plcc is the mean of all n_seeds × n_folds measurements; seed_mean_plcc_std is the std across per-seed means, which is the right number for "is the result seed-stable".
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files reproduce from the canonical command.
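
The two aggregates described in the `--seeds` invariant can be sketched as follows. This is a hedged illustration, not the code in `phase3_subset_sweep.py`: `aggregate_seeds` is a hypothetical name, the input layout (seed mapped to a list of per-fold PLCC values) is assumed, and the sample standard deviation is used because the notes do not say whether the script uses the sample or the population form.

```python
from statistics import mean, stdev


def aggregate_seeds(plcc_by_seed):
    """plcc_by_seed maps seed -> list of per-fold PLCC values."""
    # Flatten every (seed, fold) pair into one list of measurements.
    flat = [p for folds in plcc_by_seed.values() for p in folds]
    # One mean per seed; their spread answers "is the result seed-stable".
    per_seed_means = [mean(folds) for folds in plcc_by_seed.values()]
    return {
        "mean_plcc": mean(flat),
        "seed_mean_plcc_std": stdev(per_seed_means) if len(per_seed_means) > 1 else 0.0,
        "n_measurements": len(flat),
    }
```

Fold-to-fold variation inside one seed does not enter `seed_mean_plcc_std`; it only reflects how much the per-seed averages move, which is why it is the stability number to report.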

0084 — Research-0029 Phase-3b StandardScaler retry (positive result)

  • No ADR. Empirical research digest; revives the Research-0026 hypothesis after the Research-0028 negative result. The architectural decision (ship vmaf_tiny_v2) is gated on three validation steps documented in the digest §"Required before shipping".
  • Upstream source: fork-local. Netflix has no tiny-AI preprocessing-sensitivity analysis surface.
  • Touches (additive only):
  • docs/research/0029-phase3b-standardscaler-results.md — per-fold tables + apples-to-apples comparison + 3-gate pre-shipping checklist.
  • ai/scripts/phase3_subset_sweep.py — adds --standardize flag + _standardize_inplace helper.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • StandardScaler statistics MUST be fit per-fold on the train split only. Fitting on the full data would leak held-out information into LOSO; the _standardize_inplace helper enforces this by taking only the train slice as input.
  • A shipped vmaf_tiny_v2.onnx MUST bundle its scaler (mean, std) in the sidecar JSON per ADR-0049 — otherwise inference applies different normalisation than training and the win evaporates. Currently UN-implemented; tracked as a §"Caveats" #5 follow-up.
  • Subset B's feature list is the load-bearing finding: adm2, adm_scale3, vif_scale2, motion2, ssimulacra2, psnr_hvs, float_ssim. Phase-3c experiments may shift the optimal arch / lr / epochs but should keep this set.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the --standardize invocation in §"Reproducer".
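
The train-only fitting rule can be sketched without any ML dependency. This is a minimal Python illustration, not the `_standardize_inplace` helper itself: the function names are hypothetical, rows are plain lists of floats, and the population standard deviation is used as an assumption (it matches the usual StandardScaler convention).

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Fit per-column mean and std on the train split only."""
    cols = list(zip(*train_rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero on constant columns
    return means, stds


def apply_scaler(rows, means, stds):
    """Standardise rows with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def standardize_fold(train_rows, heldout_rows):
    """Fit on train, then transform both splits with the same statistics."""
    means, stds = fit_scaler(train_rows)
    return apply_scaler(train_rows, means, stds), apply_scaler(heldout_rows, means, stds)
```

The held-out rows never reach `fit_scaler`, so nothing about the left-out source influences the statistics. The same `(means, stds)` pair is what a shipped model would need to carry in its sidecar for inference to match training.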

0082 — Research-0028 Phase-3 subset sweep (negative-result digest)

  • No ADR. Empirical research digest. The architectural decision (no v2 model ships from this Phase) is governed by Research-0027's pre-registered stopping rule.
  • Upstream source: fork-local. Netflix has no tiny-AI subset-sweep surface.
  • Touches (additive only):
  • docs/research/0028-phase3-subset-sweep.md — per-fold tables + headline + standardisation caveat + Phase-3b/c/d follow-ups.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • canonical-6 stays the default until Phase-3b lands a ≥ 0.005 PLCC win (per Research-0027 stopping rule).
  • The PLCC drop is most likely a feature-scale issue, not evidence the new features lack signal. Don't cite this digest to retire ssimulacra2 / adm_scale3 from the candidate pool; re-test with StandardScaler first.
  • Phase-3 results are seed=0 only. Any v2-shipping decision needs 3-seed mean±std and KoNViD cross-check.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; runs/ files are reproducible from the canonical command in §"Reproducer".

0081 — Research-0027 Phase-2 feature importance results

  • No ADR. Empirical research digest closing Research-0026 Phase 2; the architectural decision (Subset A / B / C) is deferred to Phase-3 results in a future digest.
  • Upstream source: fork-local. Netflix has no cross-metric feature-importance analysis surface.
  • Touches (additive only):
  • docs/research/0027-phase2-feature-importance.md — per-method top-10 + consensus + redundancy + Phase-3 subset recommendations.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Consensus top-10 is the load-bearing finding: adm2, adm_scale3, ssimulacra2, vif_scale2. Phase-3 candidate subsets MUST include all four.
  • The 11-pair redundancy table is corpus-specific: it was measured on the Netflix Public 9-source corpus. A KoNViD-1k cross-check is a Phase-3 prerequisite if Subsets B/C advance.
  • runs/full_features_netflix.parquet and runs/full_features_correlation.json stay gitignored. Reproducer in §"Reproducer" regenerates both.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the canonical commands.
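
The "MUST include all four" rule is easy to check mechanically. The sketch below is a hypothetical helper, not part of the fork's scripts; the consensus names come from this entry and the Subset B list is copied from the Research-0029 entry in these notes.

```python
# Consensus features named in this entry.
CONSENSUS = frozenset({"adm2", "adm_scale3", "ssimulacra2", "vif_scale2"})

# Subset B as listed in the Research-0029 entry.
SUBSET_B = ("adm2", "adm_scale3", "vif_scale2", "motion2",
            "ssimulacra2", "psnr_hvs", "float_ssim")


def missing_consensus(subset):
    """Return the consensus features absent from a candidate subset, sorted by name."""
    return sorted(CONSENSUS - set(subset))


print(missing_consensus(SUBSET_B))  # []
```

An empty result means the candidate satisfies the invariant; any name in the list is a consensus feature the subset dropped.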

0080 — Phase-2 analysis scripts (Research-0026 Phase 2 prep)

  • No ADR. Pure analysis scaffolding; the architectural decision (which features to ship in v2) is gated on Phase 2's numerical output via Research-0027.
  • Upstream source: fork-local. Netflix has no tiny-AI training nor cross-metric correlation tooling.
  • Touches (additive only):
  • ai/scripts/extract_full_features.py — parquet extractor over Netflix corpus with FULL_FEATURES. Per-clip JSON cache at $XDG_CACHE_HOME/vmaf-tiny-ai-full/<source>/<dis_stem>.json.
  • ai/scripts/feature_correlation.py — Pearson + MI + LASSO + consensus top-K analyser; outputs JSON.
  • ai/tests/test_feature_correlation.py — 5 pytest cases against synthetic parquet (no libvmaf dependency).
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The per-clip JSON cache and the FULL_FEATURES tuple must stay in lock-step. If the tuple grows (or shrinks), pre-existing cache files become stale and silently misalign their stored per_frame columns with the new tuple. The extractor MUST be re-run with a cleared cache when FULL_FEATURES changes. Regression hint: test_default_features_unchanged in test_feature_sets.py already guards the canonical 6; extend coverage to FULL_FEATURES if rebases touch it.
  • motion3 resolves to extractor motion_v2 in _METRIC_TO_EXTRACTOR, not motion3 (the upstream-canonical extractor name in the integer_motion_v2 module). The CLI --feature motion3 does NOT exist. The JSON output key is integer_motion3 which _lookup finds via the integer_ fallback.
  • adm and vif aggregates are NOT in FULL_FEATURES. The integer extractor emits integer_adm2 and integer_vif_scale0..3 but no bare adm/vif. Listing them produced all-NaN columns in v1 — fixed in PR #185 amend.
  • On upstream sync: zero interaction. Pure fork-side analysis tooling.
  • Re-test on rebase:
pytest ai/tests/test_feature_correlation.py ai/tests/test_feature_sets.py -v
# Expect: 14 passed in <1 s.

0079 — Tiny-AI feature-set registry (Research-0026 Phase 1)

  • No ADR. Pure additive extension of an existing module; the architectural decision (which features, which model) lives in Research-0026's go/no-go gate after Phase 2.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training pipeline.
  • Touches (additive only):
  • ai/data/feature_extractor.py — adds FULL_FEATURES (21 entries), FEATURE_SETS registry, resolve_feature_set() helper. _METRIC_TO_EXTRACTOR grew 11 → 25 entries.
  • ai/tests/test_feature_sets.py — new 9-test smoke suite.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant — these are load-bearing):
  • DEFAULT_FEATURES stays the canonical 6-tuple matching vmaf_v0.6.1's SVR input layout. Test test_default_features_unchanged is the regression guard; any quiet broadening would invalidate every shipped tiny-AI ONNX (input-dim baked into the model). If a future change must broaden the default, ship a paired model swap under ADR-0049 sidecar policy.
  • FULL_FEATURES excludes lpips and float_moment per Research-0026 §"Open questions" Q1. Test test_full_features_excludes_lpips_and_moment enforces. Adding either would re-classify the experiment from "tiny model on classical features" to "ensemble of DNNs".
  • Every entry in FULL_FEATURES MUST have an entry in _METRIC_TO_EXTRACTOR. Test test_every_full_feature_has_extractor_mapping is the guard — without the mapping the libvmaf CLI silently emits NaN columns for the missing metric.
  • On upstream sync: zero interaction. Fork-only training surface.
  • Re-test on rebase:
pytest ai/tests/test_feature_sets.py -v
# Expect: 9 passed in <1 s.
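
The mapping-coverage invariant reduces to a set-membership check. This is a toy Python sketch, not the body of `test_every_full_feature_has_extractor_mapping`: the helper name and the two-entry toy data are hypothetical, and only the `motion3` to `motion_v2` pair is taken from these notes.

```python
def missing_extractor_mappings(features, metric_to_extractor):
    """Return the features that have no extractor mapping, preserving order."""
    return [name for name in features if name not in metric_to_extractor]


# Toy data: only the motion3 -> motion_v2 pair comes from these notes.
toy_features = ("motion3", "ssimulacra2")
toy_mapping = {"motion3": "motion_v2"}

print(missing_extractor_mappings(toy_features, toy_mapping))  # ['ssimulacra2']
```

A guard test would assert that this list is empty for the real tuple and mapping, so a missing entry fails fast instead of surfacing later as a NaN column.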

0078 — Research-0026 cross-metric feature fusion plan

  • No ADR. Pure research-plan digest; the architectural decision (which features to add) is deferred to Research-0027 follow-up after Phase 2 numbers land.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training and no broader-feature-set hypothesis under investigation.
  • Touches (additive only):
  • docs/research/0026-cross-metric-feature-fusion.md — 4-phase experimental plan + cost estimate + go/no-go criteria.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The 6-feature canonical baseline (adm2, vif_scale0..3, motion2) stays the default. Any v2 model is opt-in via a new feature_set field in the sidecar JSON; existing vmaf_tiny_v1.onnx users get the same numbers.
  • lpips is OUT of the candidate pool (Phase 1/2). It's DNN-based and would blur the line between "tiny model on classical features" and "ensemble of DNNs". Revisit only if classical features can't close the gap.
  • On upstream sync: zero interaction. Pure fork-side research planning.
  • Re-test on rebase: documentation-only; no test surface.

0077 — Research-0025 FoxBird outlier resolved via KoNViD combined training

  • No ADR. Empirical research digest closing the open question in Research-0023 §5; no architecture or policy decision. Pure documentation of an empirical result.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training, no KoNViD-1k integration, and no LOSO eval surface.
  • Touches (additive only):
  • docs/research/0025-foxbird-resolved-via-konvid.md — per-clip table + comparison to Netflix-only baselines + interpretation + caveats + next-experiment list.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The training-fit per-clip numbers in §"Per-clip result" are NOT held-out generalisation metrics — FoxBird is in the training set. The proper validation is the LOSO sweep on the combined corpus (§"Next experiments" #1). Don't cite the 0.9936 FoxBird PLCC as a generalisation number; cite it as "training-fit on combined corpus, 5.4× RMSE improvement vs Netflix-only".
  • Combined trainer command line is canonical. The reproduction recipe in §"Setup" includes --seed 0, --konvid-val-fraction 0.1, --val-source Tennis, --val-mode netflix-source-and-konvid-holdout. Changing any knob invalidates the per-clip numbers.
  • runs/tiny_combined_canonical/ stays gitignored. The final ONNX is reproducible from the parquet + Netflix corpus + the canonical CLI; the durable record is the digest's table.
  • On upstream sync: zero interaction. Research digest is fork-only.
  • Re-test on rebase:
python ai/train/train_combined.py \
  --netflix-root .workingdir2/netflix \
  --konvid-parquet ai/data/konvid_vmaf_pairs.parquet \
  --model-arch mlp_small --epochs 30 --batch-size 256 --lr 1e-3 \
  --val-mode netflix-source-and-konvid-holdout \
  --val-source Tennis --konvid-val-fraction 0.1 --seed 0 \
  --out-dir runs/tiny_combined_canonical
# Expect: FoxBird PLCC ≈ 0.9936 ± 1e-3 (numerical-noise floor),
# mean PLCC ≥ 0.9983 across 9 Netflix clips.
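
The acceptance check in the comments above can be sketched in plain Python. This is only an illustration of what "PLCC within ± 1e-3 of 0.9936" means; the helper names `plcc` and `within_noise_floor` are hypothetical and are not part of `train_combined.py`.

```python
import math


def plcc(pred, target):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_p = sum(pred) / len(pred)
    mean_t = sum(target) / len(target)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("PLCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


def within_noise_floor(measured, expected=0.9936, tol=1e-3):
    """True when a re-run lands inside the numerical-noise band quoted above."""
    return abs(measured - expected) <= tol
```

A re-run that reports a FoxBird PLCC outside that band suggests one of the canonical knobs changed, not that the model regressed.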

0076 — Research-0024 vif/adm upstream-divergence digest (Strategy E doc)

  • No ADR. Pure documentation digest; the divergence decisions it ratifies are already governed by ADR-0138 / 0139 / 0142 / 0143 (vif SIMD bit-exactness contract) and ADR-0024 (Netflix golden-data immutability). The digest itself fits the per-PR research-digest deliverable bar from ADR-0108.
  • Upstream source: forward-looking — pre-emptively documents the fork's non-port of Netflix 4ad6e0ea / 41d42c9e / bc744aa3 / 8c645ce3 (vif chain) and 4dcc2f7c (float_adm chain). Strategy A on b949cebf motion chain stays approved.
  • Touches (additive only):
  • docs/research/0024-vif-upstream-divergence.md — 5-strategy decision matrix + numerical-risk analysis for each chain.
  • core/src/feature/AGENTS.md — two new "rebase-sensitive invariants" entries pinning the vif and adm divergences.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant — these are the whole point):
  • Do not port 4ad6e0ea (vif runtime helpers) or 8c645ce3 (vif prescale options) verbatim. They replace the precomputed vif_filter1d_table_s table whose frozen const float Gaussians make AVX2 == AVX-512 == NEON == scalar bit-for-bit. A future opt-in second-path port (Strategy C, runtime helpers behind --vif-prescale != 1) is allowed but must not touch the default code path.
  • Do not port 4dcc2f7c float_adm options chain. The 12-parameter compute_adm signature change cascades through SIMD (avx2 / avx512 / neon) and 3 GPU backends (vulkan / cuda / sycl). The new aim feature has no fork-side golden values; defer until concrete user demand.
  • Mirror bugfix 41d42c9e is a separate decision. Must come paired with places=4 → places=3 golden loosening per ADR-0142 Netflix-authority precedent. Not part of Strategy E; eligible for a focused single-purpose PR if any shipped model drifts more than places=3 because of the missing fix.
  • b949cebf motion chain port stays APPROVED under Strategy A (verbatim, float_motion-side only). Float_motion has no precomputed-table investment to protect; existing fork integer_motion already has 6/9 of these options; cheap to mirror onto float_motion.
  • On upstream sync: zero conflict — pure additions to research/ and AGENTS.md.
  • Re-test on rebase: documentation-only PR; rendered markdown is the only verification surface.
# Re-run the diff scan that produced the digest (catches new
# upstream commits since 9dac0a59):
git fetch upstream && git log --pretty=format:'%h %s' \
  upstream/master ^origin/master --since="2026-01-01" \
  -- core/src/feature/{float_,integer_,}{vif,motion,adm,cambi}*.{c,h} \
     core/src/feature/{vif,motion,adm,cambi}_options.h \
  | head -30
# If new vif / adm option ports appear, update Research-0024 §"Same
# divergence test for motion + float_adm" before deciding to port.

0075 — Upstream 798409e3 + 314db130 ports (CUDA null-deref + remove all.c)

  • No ADR. Pure upstream cherry-picks per ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
  • Upstream source:
  • 798409e3 (Lawrence Curtis, 2026-04-20): "Fix null deref crash on prev_ref update in pure CUDA pipelines"
  • 314db130 (Kyle Swanson, 2026-04-28): "libvmaf/feature: remove empty translation unit all.c"
  • Touches (additive / removal only):
  • core/src/libvmaf.c — adds if (ref && ref->ref) guard before vmaf_picture_ref(&vmaf->prev_ref, ref) at the two threaded paths (threaded_enqueue_one line 1057 and threaded_read_pictures_batch line 1105). Main path at line 1597 already has the guard.
  • core/src/feature/all.c — file deleted.
  • core/src/meson.build — drops the feature_src_dir + 'all.c' line.
  • core/src/feature/offset.c — updates the // NOLINTNEXTLINE comment to drop all.c from the list of per-feature consumers.
  • CHANGELOG.md Unreleased § Fixed (798409e3) + § Changed (314db130).
  • Invariants (rebase-relevant):
  • The fork has THREE prev_ref update sites; all need the if (ref && ref->ref) guard. The main vmaf_read_pictures path already had it (via read_pictures_update_prev_ref helper); the threaded paths (#ifdef VMAF_BATCH_THREADING) inherited the unguarded shape from upstream's old code. Future upstream rebases must preserve all three guards even if Netflix refactors the threaded paths.
  • all.c deletion is symbol-safe. All compute_* functions it forward-declared are reached via per-extractor TUs that #include the relevant <feature>.h. No external linker dependency on all.c's symbols.
  • On upstream sync: zero conflict expected — fork now matches upstream tip on these two surfaces.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=disabled
ninja -C build-cpu
meson test -C build-cpu  # 37 tests, all pass.

0074 — Combined Netflix + KoNViD-1k trainer driver

  • No ADR. Pure engineering follow-up; the architecture rationale is fully covered by ADR-0203 (training-prep architecture) and Research-0023 §5 (FoxBird-class outlier needs broader corpus).
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI trainer.
  • Stacks on the KoNViD-1k loader bridge (PR #178 / rebase-note 0073). Rebase order: land 0073 first.
  • Touches (additive only):
  • ai/train/train_combined.py — concatenating trainer that reuses _build_model / _train_loop / export_onnx from ai/train/train.py.
  • ai/tests/test_train_combined_smoke.py — 5 pytest cases (key splitter + --epochs 0 paths, no libvmaf or real corpus required).
  • docs/ai/training.md — "Combining KoNViD with the Netflix corpus" subsection rewritten from "follow-up" to runnable.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Reuse the canonical training-loop helpers. Don't fork _build_model / _train_loop / export_onnx into this file. Both trainers must share the model factory so a future change (e.g. adding mlp_large) lands in one place.
  • KoNViD train/val splits hold out whole clip keys, not random frames. A frame-level split would let frames from the same clip leak across train/val and inflate PLCC by 5-10 pp (well-known VQA pitfall — same reasoning as ADR-0203's Netflix 1-source-out split).
  • Missing data falls back, not errors. Missing --konvid-parquet → Netflix-only path. Missing --netflix-root → KoNViD-only path. Both missing → initial-weights ONNX export + rc=0 so the smoke command always produces a deterministic artefact.
  • On upstream sync: zero interaction; pure fork-local trainer.
  • Re-test on rebase:
pytest ai/tests/test_train_combined_smoke.py -v
# Expect: 5 passed (under ~3 s, no libvmaf required).
python ai/train/train_combined.py --epochs 0 \
  --netflix-root /tmp/missing --konvid-parquet /tmp/missing.parquet \
  --out-dir /tmp/combined_smoke
# Expect: <out-dir>/mlp_small_combined_final.onnx written, rc=0.
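
The whole-clip hold-out invariant above can be sketched as follows. This is a minimal illustration, not the splitter in `train_combined.py`: the function name `split_by_clip_key` and its arguments are assumptions, and the rounding rule for the validation count is chosen only for the example.

```python
import random


def split_by_clip_key(frame_keys, val_fraction=0.1, seed=0):
    """Split frame indices into (train, val) by holding out whole clip keys.

    frame_keys[i] is the clip key of frame i. No clip contributes frames to
    both sides, which is the property a frame-level random split would lose.
    """
    unique = sorted(set(frame_keys))
    if len(unique) < 2:
        raise ValueError("need at least two clips to hold one out")
    rng = random.Random(seed)      # seeded so the split is reproducible
    rng.shuffle(unique)
    n_val = max(1, round(len(unique) * val_fraction))
    val_keys = set(unique[:n_val])
    train_idx = [i for i, k in enumerate(frame_keys) if k not in val_keys]
    val_idx = [i for i, k in enumerate(frame_keys) if k in val_keys]
    return train_idx, val_idx
```

Shuffling the sorted unique keys, not the frames, is what keeps every frame of a held-out clip on the validation side.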

0073 — KoNViD-1k → VMAF-pair acquisition + loader bridge

  • No ADR. Acquisition + loader pieces are pure additions; the methodology fits inside ADR-0203 / Research-0019.
  • Upstream source: fork-local. KoNViD-1k integration is a fork-only training-data play.
  • Touches (additive only):
  • ai/scripts/konvid_to_vmaf_pairs.py — acquisition pipeline.
  • ai/train/konvid_pair_dataset.py — KoNViDPairDataset class mirroring NetflixFrameDataset's interface.
  • ai/tests/test_konvid_pair_dataset.py — 5 pytest cases.
  • docs/ai/training.md — new "C1 (KoNViD-1k corpus)" section.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • KoNViDPairDataset mirrors NetflixFrameDataset shape. feature_dim == 6, numpy_arrays() → (X, y) returns (n_frames, 6) + (n_frames,). If NetflixFrameDataset's feature order changes, mirror it here.
  • Acquisition parquet schema is fixed. Required columns: key, frame_index, vif_scale0..3, adm2, motion2, vmaf. Add freely; do NOT rename / drop those.
  • ai/data/konvid_vmaf_pairs.parquet and $VMAF_TINY_AI_CACHE/konvid-1k/ stay gitignored. They regenerate from raw KoNViD .mp4 sources.
  • On upstream sync: zero interaction.
  • Re-test on rebase:
pytest ai/tests/test_konvid_pair_dataset.py -v
# Expect: 5 passed
python ai/scripts/konvid_to_vmaf_pairs.py --max-clips 5
# Expect: ~7 s wall, ai/data/konvid_vmaf_pairs.parquet with
#         5 unique keys × ~200 frames each.
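
The fixed-schema invariant above lends itself to a small guard. The sketch below is an assumption about how such a check could look, not code from the acquisition script; `REQUIRED_COLUMNS` spells out the columns named in the invariant and `missing_required_columns` is a hypothetical helper.

```python
# Columns the invariant says must never be renamed or dropped.
REQUIRED_COLUMNS = (
    "key",
    "frame_index",
    "vif_scale0",
    "vif_scale1",
    "vif_scale2",
    "vif_scale3",
    "adm2",
    "motion2",
    "vmaf",
)


def missing_required_columns(columns):
    """Return the required columns absent from `columns`, in schema order.

    Extra columns are ignored, matching the "add freely" half of the rule.
    """
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

With pandas installed, the check could be driven by `missing_required_columns(pd.read_parquet(path).columns)`; an empty list means the parquet still satisfies the contract.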

0072 — Tiny-AI 3-arch LOSO eval harness + Research-0023

  • No ADR. Methodology fits inside Research-0023; ADR-0203 already covers the training-prep architecture and the three-arch sweep concept.
  • Research digest: docs/research/0023-loso-3arch-results.md.
  • Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
  • Touches (additive only):
  • ai/scripts/eval_loso_3arch.py — new harness; reuses the _load_session + _load_clip + CLIPS helpers from eval_loso_mlp_small.py (PR #165).
  • docs/research/0023-loso-3arch-results.md — methodology + per-fold tables for mlp_small / mlp_medium / linear.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Reuse the PR #165 helpers. Don't fork the _load_session external-data workaround into a copy — both scripts must keep using the same import. If a follow-up re-exports the shipped baselines with corrected external_data.location, both scripts deprecate the workaround simultaneously.
  • runs/ and model/tiny/training_runs/ stay gitignored. The harness writes runs/loso_eval/loso_3arch_eval.{json,md}; the durable record is the table in Research-0023 §2 + the per-fold tables in §3. Regenerate via the loop in §6 of the digest.
  • On upstream sync: zero interaction. Pure fork-local evaluation harness.
  • Re-test on rebase:
python ai/scripts/eval_loso_3arch.py
diff <(jq -r '.archs.mlp_small.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9808)
diff <(jq -r '.archs.mlp_medium.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9727)
diff <(jq -r '.archs.linear.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.3679)
# Expect: identical lines on a populated cache + identical fold ONNX.

0071 — T7-16 ADM Vulkan/SYCL drift verified-resolved (doc close)

  • No ADR. Verification-only close, sister of T7-15.
  • Upstream source: fork-local. ADM cross-backend gate is a fork-only test surface; Netflix/vmaf has no Vulkan or SYCL backend.
  • Touches (additive only):
  • docs/state.md — new "Recently closed" row for T7-16.
  • .workingdir2/BACKLOG.md — T7-16 row marked closed (local-only planning dossier; gitignored).
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • places=4 cross-backend ADM contract. Empirical adm_scale2 max_abs_diff is now 1e-6 (print floor; ULP=0) on Vulkan device 0 (NVIDIA), device 1 (Mesa anv on Arc), and SYCL device 0 (Arc); residual adm_scale1 ≈ 3.1e-5 and adm2 ≈ 5e-6 on 1/48 frames pass places=4 (5e-5 tolerance) but fail places=5. Hold the gate at places=4.
  • No ADM kernel source change. Fix is environmental (NVCC + driver + SYCL runtime).
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --feature adm --backend vulkan --device 0 --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324
# Expect: 0/48 mismatches across all 5 ADM metrics.
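
The places=4 versus places=5 reasoning above can be made concrete with a small sketch. It assumes the gate uses an absolute tolerance of 0.5 × 10^-places, which is what the quoted 5e-5 figure for places=4 implies; the real comparison inside `cross_backend_vif_diff.py` may differ, and both function names are hypothetical.

```python
def places_tolerance(places):
    """Absolute tolerance implied by `places` decimal places (assumed rule)."""
    return 0.5 * 10.0 ** (-places)


def passes_places(abs_diff, places):
    """True when an absolute difference fits inside the assumed tolerance."""
    return abs_diff <= places_tolerance(places)
```

Under that assumption the residual adm_scale1 difference of about 3.1e-5 passes at places=4 but not at places=5, which is why the gate is held where it is.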

0070 — T7-15 motion CUDA/SYCL drift verified-resolved (doc close)

  • No ADR. Verification-only close; no code change in PR #172.
  • Upstream source: fork-local. Cross-backend gate is a fork-only test surface; not in Netflix/vmaf.
  • Touches (additive only):
  • docs/state.md — "Recently closed" row for T7-15.
  • .workingdir2/BACKLOG.md — T7-15 row marked closed (local-only planning dossier; gitignored).
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • The places=4 cross-backend gate stays at places=4. Empirical max_abs_diff is currently 0.0 (CUDA) or 1e-6 (SYCL/Vulkan, JSON %f rounding floor); tightening to places=5 is tempting, but the 1e-6 print floor would then make the SYCL + Vulkan rows fail. Hold at places=4 until --precision=max is wired into the diff tool.
  • No motion-kernel source change. PR #172 didn't modify core/src/feature/cuda/integer_motion/*.cu or core/src/feature/sycl/integer_motion_sycl.cpp. The fix is environmental (NVCC + driver), so the next CI run on a fresh image needs to be re-verified against the gate.
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature motion --backend cuda \
  --places 4
# Expect: 0/48 mismatches, max_abs_diff = 0.0

0069 — libvmaf_vulkan.h installed under prefix (build bug)

  • No ADR. Build-system bug fix; matches existing CUDA / SYCL install conditions.
  • Upstream source: fork-local. Vulkan backend is fork-only; Netflix/vmaf has no libvmaf_vulkan.h.
  • Touches:
  • core/include/core/meson.build — adds an is_vulkan_enabled gate that handles the feature option's enabled / auto states; appends libvmaf_vulkan.h to platform_specific_headers when active.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • Install rule mirrors the CUDA / SYCL pattern but uses the feature-option API. The is_cuda_enabled = get_option('enable_cuda') == true boolean idiom doesn't apply to enable_vulkan because that's a feature option, not a boolean. Use .enabled() or .auto(). Don't "simplify" to == true — that would silently drop the install in the auto state.
  • Pairs with ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch which probes for the header via check_pkg_config libvmaf_vulkan "libvmaf >= 3.0.0" libvmaf/libvmaf_vulkan.h vmaf_vulkan_state_init_external. Removing the install rule re-introduces lawrence's 2026-04-28 symptom: FFmpeg silently drops the libvmaf_vulkan filter despite --enable-libvmaf-vulkan.
  • On upstream sync: zero interaction; Vulkan backend is fork-only.
  • Re-test on rebase:
cd libvmaf
CC=icx CXX=icpx meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=true -Denable_sycl=true -Db_lto=false
ninja -C build
meson install -C build --destdir /tmp/libvmaf-install
ls /tmp/libvmaf-install/usr/local/include/libvmaf/libvmaf_vulkan.h
# Expect: file exists.

0066 — --backend cuda inverted-gpumask fix (CLI bug)

  • No ADR. Bug fix; behaviour now matches the public-header VmafConfiguration::gpumask contract.
  • Upstream source: fork-local. The --backend CLI selector was added by the fork (Netflix/vmaf has no exclusive-backend selector).
  • Touches (additive + 1-line behavioural fix):
  • core/tools/cli_parse.c::parse_cli_args — --backend cuda branch sets gpumask = 0 (was gpumask = 1).
  • core/test/test_cli_parse.c — 5 new regression tests (test_backend_{cpu,cuda_engages_cuda,cuda_preserves_explicit_gpumask,sycl,vulkan}) plus run_aom_ctc_tests / run_backend_tests helper split to keep run_tests under the function-size budget.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • VmafConfiguration::gpumask semantics: if gpumask: disable CUDA. compute_fex_flags in src/libvmaf.c routes CUDA only when gpumask == 0. Any code path that sets a non-zero gpumask to "request CUDA" silently disables it. The CLI's --backend cuda branch must set gpumask = 0 and rely on use_gpumask = true to trigger vmaf_cuda_state_init. Do not "fix" this back to gpumask = 1 — it's the bug being fixed.
  • Explicit --gpumask=N --backend cuda preserves N. A user who passes --gpumask=2 already has use_gpumask = true, so the --backend cuda branch's defaulting block (gated on !settings->use_gpumask) is skipped. The test_backend_cuda_preserves_explicit_gpumask regression locks this in.
  • On upstream sync: zero interaction; --backend is fork-only.
  • Re-test on rebase:
./build/test/test_cli_parse | grep -E 'backend_'
# Expect: 5 backend tests pass.
build/tools/vmaf -r REF -d DIS -w 576 -h 324 -p 420 -b 8 \
  --model "path=model/vmaf_v0.6.1.json" --threads 1 \
  --backend cuda --output cuda.json --json -q
python3 -c "import json; d=json.load(open('cuda.json')); \
  assert len(d['frames'][0]['metrics']) == 12, 'CUDA not engaged'"
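
The two gpumask invariants above are easy to get backwards, so here is a toy Python model of them. It is not the C code in `cli_parse.c` or `libvmaf.c`; the `Settings` class and both functions are hypothetical stand-ins that only encode the rules as stated in this note.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    use_gpumask: bool = False   # True once the user passed --gpumask=N
    gpumask: int = 0


def apply_backend_cuda(settings):
    """Model of the --backend cuda defaulting block.

    Defaults are filled in only when no explicit --gpumask was given, so a
    user-supplied N is preserved.
    """
    if not settings.use_gpumask:
        settings.gpumask = 0        # 0 means "CUDA allowed", not "off"
        settings.use_gpumask = True
    return settings


def cuda_engaged(settings):
    """CUDA is initialised via use_gpumask and routed only when gpumask == 0."""
    return settings.use_gpumask and settings.gpumask == 0
```

In this model `--backend cuda` alone engages CUDA, while `--gpumask=2 --backend cuda` keeps the mask at 2 and therefore leaves CUDA dispatch disabled.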

0067 — Tiny-AI PTQ accuracy across Execution Providers (T5-3e)

  • No ADR. Investigation/measurement PR; ADR-0129 already governs the PTQ workstream. Findings update docs/research/0006-tinyai-ptq-accuracy-targets.md §"GPU-EP quantisation" — that section was previously a deferred-open-question; it is now the empirical landing spot.
  • Research digest: same file (Research-0006).
  • Upstream source: fork-local. Netflix/vmaf does not ship a PTQ harness or any tiny-AI ONNX path.
  • Touches (additive only):
  • ai/scripts/measure_quant_drop_per_ep.py — new sibling of measure_quant_drop.py. CPU+CUDA via ORT; Arc / OpenVINO-CPU via the native openvino Python runtime (no onnxruntime-openvino because no cp314 wheel exists). Reuses the _load_session rename workaround from PR #165 + a value_info-strip fix so dynamic-PTQ doesn't choke on the shipped MLP ONNX.
  • docs/ai/quant-eps.md — new user doc; linked from docs/ai/index.md.
  • docs/research/0006-tinyai-ptq-accuracy-targets.md — refreshed header, replaced "GPU-EP open question" with the measurement table, fixed pre-existing MD040/MD060 lints surfaced on the touched file.
  • docs/ai/index.md — added the quant-eps row, rewrapped to 80 cols.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant):
  • measure_quant_drop.py (the CI gate) is unchanged. The new script is purely additive. Any rebase that conflates the two scripts must keep the CI gate CPU-only — Arc int8 is broken, so a per-EP gate would red-light every PR.
  • value_info strip is required for vmaf_tiny_v1* dynamic PTQ. The shipped MLP ONNX duplicates weight tensors in value_info, which makes quantize_dynamic raise Inferred shape and existing shape differ. The fix is in _save_inlined. Don't remove it during a refactor unless the underlying ONNX is regenerated.
  • CUDA-12 ABI shim. ORT-GPU 1.25 wheels link libcublasLt.so.12 even on CUDA-13 hosts. The reproduction recipe pins the nvidia-*-cu12 wheels and prepends them to LD_LIBRARY_PATH. If a future ORT wheel drops the cu12 ABI we can cut the shim, but the script tolerates either since it doesn't import any CUDA symbol itself.
  • On upstream sync: zero interaction; entirely fork-local.
  • Re-test on rebase:
SP=$VIRTUAL_ENV/lib/python3.14/site-packages/nvidia
export LD_LIBRARY_PATH="$SP/cublas/lib:$SP/cudnn/lib:$SP/cuda_nvrtc/lib:$SP/cuda_runtime/lib:$SP/cufft/lib:$SP/curand/lib:$SP/cusolver/lib:$SP/cusparse/lib:$SP/cuda_cupti/lib:$SP/nvtx/lib:$SP/nvjitlink/lib"
python ai/scripts/measure_quant_drop_per_ep.py \
    --eps cpu cuda openvino \
    --extra-fp32 vmaf_tiny_v1.onnx vmaf_tiny_v1_medium.onnx \
    --out runs/quant-eps-$(date +%Y-%m-%d)
# Expected: CPU + CUDA PASS (drop ≤ 1.2e-4); OpenVINO Arc ERR
# (compile failure for Conv-int8) or NaN (MatMul-int8) until a
# newer intel_gpu plugin lands.

0065 — testdata/bench_all.sh correct backend-engagement flags

  • No ADR. Bug fix; no behavioural surface change beyond "the bench actually engages the backends it claims to now."
  • Upstream source: fork-local. testdata/bench_all.sh is a fork-only bench harness; not in Netflix/vmaf.
  • Touches (additive only):
  • testdata/bench_all.sh — switched per-row flag pattern from the disable-only singletons (--no_sycl for "CUDA", etc.) to the correct engagement form (--gpumask=0 --no_sycl --no_vulkan for CUDA, --sycl_device=0 --no_cuda --no_vulkan for SYCL, --vulkan_device=0 --no_cuda --no_sycl for Vulkan, and --no_cuda --no_sycl --no_vulkan for CPU). Added a 4th column (Vulkan) to the comparator. Honours $VMAF_BIN for the binary path and $VMAF_ONEAPI_SETVARS for the oneAPI install location.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • Disable-only singletons don't engage a backend. --no_sycl alone leaves CUDA available but unrequested. --no_cuda alone leaves SYCL available but unrequested. The CLI inits CUDA only when c.use_gpumask is set; SYCL only when c.sycl_device >= 0 or c.use_gpumask; Vulkan only when c.vulkan_device >= 0. Any change to those gates that drops one of the per-row flags will re-introduce the silent CPU fallback. Verify after a rebase by inspecting JSON frames[0].metrics key counts (CPU 14-15, CUDA 11-12, Vulkan ~34) — see libvmaf/AGENTS.md §"Backend-engagement foot-guns".
  • gpumask semantics are inverted from intuition. gpumask=0 enables CUDA dispatch; gpumask=1 disables it. The per-row CUDA flag is --gpumask=0, not --gpumask=1. Don't "fix" it to --gpumask=1 for symmetry with sycl_device/vulkan_device — that's the bug being fixed (parallel to PR #170).
  • On upstream sync: zero interaction; testdata/bench_all.sh is fork-only.
  • Re-test on rebase:
bash testdata/bench_all.sh    # smoke
# Verify each row's JSON keys match the expected per-backend count:
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cpu.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cuda.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_vulkan.json
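
The same key-count check can be scripted in Python when jq is not available. This sketch assumes only the JSON layout used by the jq commands above (`frames[0].metrics`); the helper names are hypothetical, and the expected ranges are the CPU and CUDA counts quoted in the invariant (the Vulkan figure is approximate, so it is left out).

```python
import json

# Key-count ranges quoted in the invariant above.
EXPECTED_KEY_COUNTS = {
    "cpu": range(14, 16),    # 14-15 keys
    "cuda": range(11, 13),   # 11-12 keys
}


def metric_key_count(path):
    """Number of metric keys on the first frame of a vmaf JSON report."""
    with open(path) as handle:
        report = json.load(handle)
    return len(report["frames"][0]["metrics"])


def backend_engaged(path, backend):
    """True when the report's key count matches the expected backend range."""
    return metric_key_count(path) in EXPECTED_KEY_COUNTS[backend]
```

A "CUDA" row whose report carries a CPU-sized key count is the silent CPU fallback the invariant warns about.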

0063 — Tiny-AI LOSO eval harness for mlp_small

  • No ADR. The methodology fits inside Research Digest 0022; ADR-0203 already covers the training-prep architecture.
  • Research digest: docs/research/0022-loso-mlp-small-results.md.
  • Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
  • Touches (additive only):
  • ai/scripts/eval_loso_mlp_small.py — new evaluation harness.
  • docs/ai/loso-eval.md — usage doc.
  • docs/research/0022-loso-mlp-small-results.md — methodology + results.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • _load_session workaround for renamed-baseline ONNX. The shipped baselines model/tiny/vmaf_tiny_v1*.onnx reference their pre-rename external_data.location values. The workaround in _load_session rewrites the entries before handing the proto to ORT. Removing the workaround breaks the baseline phase. The proper fix (re-export with matching names) is tracked as a follow-up; until then this code path is load-bearing.
  • runs/ and model/tiny/training_runs/ stay gitignored. The harness writes to runs/loso_eval/ by default; do NOT promote any of those outputs into the tree. The 9 fold ONNX and the per-clip JSON cache regenerate from the corpus + trainer + libvmaf CLI.
  • On upstream sync: zero interaction. Pure fork-local evaluation harness.
  • Re-test on rebase:
python ai/scripts/eval_loso_mlp_small.py
diff <(jq -r '.loso_aggregate.mean_plcc' runs/loso_eval/loso_mlp_small_eval.json) <(echo 0.9808)
# Expect: identical line on a populated cache + identical fold ONNX.
  • No ADR. Process / docs PR; rows trace back to the individually-cited ADRs / research digests in their own References columns.
  • Decision dossier: .workingdir2/decisions/section-a-decisions-2026-04-28.md.
  • Source audit: docs/backlog-audit-2026-04-28.md.
  • Upstream source: fork-local. Pure backlog hygiene PR; no Netflix code touched.
  • Touches (additive only):
  • .workingdir2/BACKLOG.md — 9 new rows: T3-17, T3-18, T5-3e, T5-4, T7-35, T7-36, T7-37, T7-38; T6-1a row extended with the bisect-cache fixture sub-bullet.
  • docs/research/0006-tinyai-ptq-accuracy-targets.md — drops the "defer until first user" framing on the GPU-EP quantisation open question per user direction; cross-links T5-3e.
  • docs/research/0020-cambi-gpu-strategies.md — v2 follow-up section now cites T7-36 as the gate for opening the v2 row.
  • docs/adr/0205-cambi-gpu-feasibility.md — Decision section's "follow-up integration PR" now cites T7-36.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant): none. Pure backlog text. Rebase-conflict risk is limited to the same BACKLOG.md table rows that any future row addition would touch; trivial to re-resolve.
  • On upstream sync: zero interaction.
  • Re-test on rebase: none — docs-only.

0062 — ssimulacra2 CUDA + SYCL twins (ADR-0206)

  • ADR: ADR-0206.
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2 GPU implementation; this PR adds the CUDA + SYCL twins of the fork's ADR-0201 Vulkan kernel.
  • Touches (additive + small wiring edits):
  • docs/adr/0206-ssimulacra2-cuda-sycl.md and the index row in docs/adr/README.md.
  • core/src/feature/cuda/ssimulacra2_cuda.{c,h} — new CUDA dispatch.
  • core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu and ssimulacra2_mul.cu — new CUDA fatbins.
  • core/src/feature/sycl/ssimulacra2_sycl.cpp — new SYCL extractor.
  • core/src/feature/feature_extractor.c — two new extern declarations + two new entries in feature_extractor_list[].
  • core/src/meson.build — adds ssimulacra2_blur + ssimulacra2_mul to cuda_cu_sources, introduces (or extends, if PR #157 / ADR-0202 landed first) the cuda_cu_extra_flags map with a ssimulacra2_blur entry, threads per_kernel_flags into the fatbin custom-target, and lists the two new C / CPP TUs.
  • core/src/cuda/AGENTS.md and core/src/sycl/AGENTS.md — rebase invariant notes for the per-kernel --fmad=false flag and the -fp-model=precise SYCL build flag.
  • docs/backends/cuda/overview.md, docs/backends/sycl/overview.md, docs/metrics/features.md — coverage matrix updates.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (load-bearing on rebase):
  • Per-kernel --fmad=false for ssimulacra2_blur. The IIR's o = n2 * sum - d1 * prev1 - prev2 must NOT fuse into FMAs — without the flag the recursive Gaussian's per-step rounding compounds across the 6-scale pyramid past places=4.
  • -fp-model=precise on the SYCL feature build line. Removing it drifts ssimulacra2_sycl past places=2 through the IIR.
  • Hybrid host/GPU split mirrors Vulkan. Host runs YUV→RGB, XYB, downsample, and SSIM/EdgeDiff combine in double; GPU runs only mul + IIR blur. Any future PR that ports XYB or YUV→RGB onto the GPU MUST land alongside an updated ADR-0206 and re-validate places=4 on every Netflix CPU pair.
  • CUDA fex uses .extract (synchronous), not .submit/.collect. Per-frame raw YUV is D2H-copied from picture_cuda's device-side VmafPicture.data[] into pinned host scratch via cuMemcpy2DAsync. Skipping the copy segfaults — direct host reads on a CUdeviceptr are the failure mode the prior agent's WIP hit.
  • On upstream sync: zero interaction with Netflix. The GPU coverage matrix for ssimulacra2 is wholly fork-local.
  • Re-test on rebase:
meson setup build_cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda

python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary ./build_cuda/tools/vmaf \
  --feature ssimulacra2 --backend cuda --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8
# Expect: 0/48 mismatches, max_abs_diff ~1e-6.
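
Why FMA contraction changes results at all can be shown with a small host-side sketch. It uses Python doubles and exact rational arithmetic instead of the float32 GPU kernel, so it only illustrates the rounding mechanism behind the --fmad=false invariant; the function names and the input values are chosen for the example.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    """a * b + c with two roundings: the product is rounded before the add."""
    product = a * b
    return product + c


def mul_add_fused(a, b, c):
    """a * b + c with one rounding, emulating a fused multiply-add.

    The exact value is formed with rationals and rounded to double once.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
print(mul_add_unfused(a, b, c), mul_add_fused(a, b, c))
```

The exact product here is 1 - 2^-60, which rounds to 1.0 on its own, so the unfused result is 0.0 while the fused result keeps the -2^-60 term. A recursive filter feeds such differences back into later steps, which is the compounding the invariant describes.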

0061 — cambi GPU feasibility spike (ADR-0205)

  • ADR: ADR-0205.
  • Research digest: docs/research/0020-cambi-gpu-strategies.md.
  • Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
  • Touches (additive only):
  • docs/adr/0205-cambi-gpu-feasibility.md, docs/research/0020-cambi-gpu-strategies.md, docs/adr/README.md index row.
  • core/src/feature/vulkan/cambi_vulkan.c — new dormant scaffold (not yet in vulkan_sources, not yet registered).
  • core/src/feature/vulkan/shaders/cambi_{derivative,decimate,filter_mode}.comp — new reference GLSL shaders, not yet in the build's shaders list.
  • core/src/feature/AGENTS.md invariants + CHANGELOG.md bullet.
  • Invariants (rebase-relevant):
  • Hybrid host/GPU port by decision. If Netflix upstream tightens the c-value formula or histogram update protocol, the host residual call site in the eventual cambi_vulkan.c::cambi_vulkan_extract must be updated alongside cambi.c::calculate_c_values — the same code is reused. Do NOT translate the c-values phase to GPU during any upstream-port PR; that optimisation belongs to the v2 strategy-III PR (deferred).
  • Scaffolds dormant in the spike PR. The cambi_vulkan.c extractor returns -ENOSYS from cambi_vulkan_init_stub until the integration follow-up wires it in. Do NOT register vmaf_fex_cambi_vulkan_scaffold in feature_extractor.c's list.
  • Shaders not in the build's shader list. Adding them to core/src/vulkan/meson.build's vulkan_shaders list before the integration PR produces orphaned *_spv.h headers. Leave them alone in this spike PR.
  • On upstream sync: zero interaction. cambi.c itself is upstream-mirrored — Netflix changes flow through port-upstream-commit; only the integration PR's host residual call site needs paired attention.
  • Re-test on rebase:

meson setup build -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build

0059 — Tiny-AI Netflix corpus training prep (ADR-0203)

  • ADR: ADR-0203.
  • Upstream source: fork-local. Netflix/vmaf has no equivalent training surface.
  • Touches:
  • ai/data/ — Netflix loader, libvmaf-CLI feature extractor, distillation scoring.
  • ai/train/ — PyTorch dataset, eval harness, Lightning-style training entry point.
  • ai/scripts/run_training.sh — convenience wrapper.
  • ai/tests/ — five new pytest modules (test_netflix_loader.py, test_dataset.py, test_eval.py, test_train_smoke.py, plus conftest.py).
  • docs/ai/training.md — new "C1 (Netflix corpus)" section; existing sections untouched.
  • ai/AGENTS.md — invariants section added.
  • Invariants (load-bearing):
  • Filename ladder regex is fork-specific. <source>_<quality>_<height>_<bitrate>.yuv (dis) + <source>_<fps>fps.yuv (ref). Upstream may publish a different naming convention later; do NOT merge them — keep this loader scoped to the Netflix corpus, add a sibling loader for any upstream alternative.
  • Per-clip cache schema is consumed by both dataset and any downstream tooling. Schema is {features:{feature_names, per_frame, n_frames}, scores:{per_frame, pooled}}. Any change must invalidate $VMAF_TINY_AI_CACHE (delete or version-tag the directory).
  • Smoke command stays runnable without a built vmaf binary. The _make_zero_payload helper in ai.train.dataset injects a fake payload for --epochs 0 so CI gates don't drag a libvmaf build into the Python test surface.
  • YUV size probe never silently guesses. probe_yuv_dims either matches the 1920x1080 default, returns ffprobe's answer, or raises. Tests pass assume_dims=(16, 16) explicitly for synthetic fixtures.
  • On upstream sync: no interaction with upstream. The ai/ subtree is wholly fork-local.
  • Re-test on rebase:
python -m pytest ai/tests/test_netflix_loader.py \
    ai/tests/test_dataset.py ai/tests/test_eval.py \
    ai/tests/test_train_smoke.py -v
python ai/train/train.py --epochs 0 --data-root /tmp/mock_corpus \
    --assume-dims 16x16 --val-source BetaSrc --out-dir /tmp/out
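
The per-clip cache schema named in the invariants above can be checked structurally before a payload is trusted. The sketch below is an assumption about how such a guard might look; `CACHE_SCHEMA` mirrors the key names from the note, and `schema_problems` is a hypothetical helper, not a function in `ai/`.

```python
# Section -> required keys, as listed in the cache-schema invariant.
CACHE_SCHEMA = {
    "features": {"feature_names", "per_frame", "n_frames"},
    "scores": {"per_frame", "pooled"},
}


def schema_problems(payload):
    """Return human-readable problems; an empty list means the shape is OK.

    Only key presence is checked, not value types or lengths.
    """
    problems = []
    for section, required in CACHE_SCHEMA.items():
        body = payload.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in sorted(required - set(body)):
            problems.append(f"missing key: {section}.{key}")
    return problems
```

A non-empty result on an existing cache entry is a sign the schema moved and $VMAF_TINY_AI_CACHE should be deleted or version-tagged, as the invariant requires.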

0073 — Tiny-AI QAT trainer + first per-model QAT pass (T5-4)

  • ADR: ADR-0207 (design), ADR-0208 (per-model impl).
  • Touches: ai/train/qat.py (new), ai/scripts/qat_train.py (rewrite from NotImplementedError scaffold), ai/configs/learned_filter_v1_qat.yaml (new), ai/tests/test_qat_smoke.py (new), docs/ai/quantization.md (QAT tier added). All paths are wholly fork-local; no upstream Netflix/vmaf interaction.
  • Invariants:
  • Two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) is load-bearing. Both the legacy ONNX exporter (quantized::conv2d) and the new TorchDynamo exporter (Conv2dPackedParamsBase.__obj_flatten__) refuse to consume convert_fx output on PyTorch 2.11. The bridge (state-dict diff to a fresh fp32 module + ORT static-quantize) is the only path that yields a QDQ ONNX. Do NOT collapse to a single-step convert_fx → torch.onnx.export until both PyTorch issues are fixed; re-check both exporters on each PyTorch upgrade.
  • State-dict transfer matches by submodule name + shape. _copy_qat_weights_into_fp32 walks fp32_state keys, finds the same key in the FX-prepared module, copies the tensor. Tiny-AI models today have stable submodule names (entry, body.*, exit); a model architecture that uses top-level nn.Sequential would break this because prepare_qat_fx renames Sequential children to numeric indices. The RuntimeError("0 tensors copied") guard catches the silent failure mode.
  • FX preparation runs on CPU. PyTorch 2.11's FX symbolic tracer is flaky on CUDA buffers; the trainer migrates the model to CPU before prepare_qat_fx and back to the accelerator for the fine-tune phase. The smoke test deliberately exercises the CPU path so this stays covered.
  • torch.ao.quantization deprecation will hard-fail in PyTorch 2.10. Migration target is torchao.quantization.pt2e (prepare_pt2e / convert_pt2e); the two-step pipeline is mostly pt2e-compatible — only the FX-prep call changes.
  • On upstream sync: no interaction with upstream. The ai/ subtree is fully fork-local.
  • Re-test on rebase:
python -m pytest ai/tests/test_qat_smoke.py -v
python ai/scripts/qat_train.py \
    --config ai/configs/learned_filter_v1_qat.yaml \
    --output /tmp/qat_smoke.int8.onnx --smoke
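
The name-plus-shape transfer and its zero-copy guard might look roughly like this. It is a sketch, not `_copy_qat_weights_into_fp32` itself: `copy_matching_weights` and the `Shaped` stand-in are hypothetical, and values only need a `.shape` attribute, so no PyTorch import is required to run it.

```python
from typing import Any, Dict, Mapping, Tuple


class Shaped:
    """Hypothetical stand-in for a tensor: only the shape matters here."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape


def copy_matching_weights(qat_state: Mapping[str, Any], fp32_state: Dict[str, Any]) -> int:
    """Copy entries into fp32_state where both key and shape agree.

    Returns the number of tensors copied; raises if nothing matched.
    """
    copied = 0
    for key, target in list(fp32_state.items()):
        source = qat_state.get(key)
        if source is None:
            continue  # key renamed or absent in the prepared module
        if tuple(source.shape) != tuple(target.shape):
            continue  # same name, different layout: leave the target alone
        fp32_state[key] = source
        copied += 1
    if copied == 0:
        # A total name mismatch must not pass silently.
        raise RuntimeError("0 tensors copied")
    return copied
```

With a top-level `nn.Sequential`, the prepared module would expose keys such as `0.weight` instead of `body.0.weight`; every lookup then misses and the guard fires instead of exporting untrained weights.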

0074 — GPU-parity matrix CI gate (T6-8 / ADR-0214)

  • Touched surfaces (fork-local): scripts/ci/cross_backend_parity_gate.py (new), .github/workflows/tests-and-quality-gates.yml (new vulkan-parity-matrix-gate job), docs/development/cross-backend-gate.md (new), docs/backends/index.md (cross-backend section), libvmaf/AGENTS.md (rebase-sensitive invariant note).
  • Why this matters on rebase: the CI lane and the matrix-gate script are entirely fork-local. Upstream Netflix/vmaf has no comparable gate; conflicts on rebase are restricted to the CI workflow file when upstream rearranges its own jobs. The gate's Python script lives outside core/src/ so the upstream-sync path doesn't see it.
  • Invariants the gate enforces:
  • Per-feature absolute tolerance is declared in one place (FEATURE_TOLERANCE in scripts/ci/cross_backend_parity_gate.py). Tightening a tolerance requires a measurement-driven follow-up ADR; loosening requires a justification ADR (CLAUDE.md §12 r1).
  • The legacy single-feature gate scripts/ci/cross_backend_vif_diff.py stays for one release cycle. Sister PRs in this session add to it; the T6-8b cleanup PR deletes it once the matrix gate has soaked.
  • CUDA / SYCL / hardware-Vulkan are advisory until a self-hosted runner is registered. The script supports them via --backends; flipping the CI lane to required is a follow-up wiring change, not a code change.
  • On upstream sync: no interaction with upstream tests-and-quality-gates.yml (the gate job is fork-added); rebase conflicts limited to insertion-order in the workflow file.
  • Re-test on rebase:
cd libvmaf && meson setup build \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build
cd ..
python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --backends cpu vulkan \
    --json-out /tmp/parity.json --md-out /tmp/parity.md
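
A single-table tolerance check of the kind described above could be sketched as follows. The table name, its entries, and the 5e-5 values are hypothetical placeholders; the real `FEATURE_TOLERANCE` lives in `scripts/ci/cross_backend_parity_gate.py` and may be shaped differently.

```python
from typing import Dict, List

# Hypothetical values for illustration only.
FEATURE_TOLERANCE_SKETCH: Dict[str, float] = {
    "vif": 5e-5,
    "motion": 5e-5,
}


def max_abs_diff(cpu: List[float], gpu: List[float]) -> float:
    """Largest per-frame absolute difference between two score series."""
    if len(cpu) != len(gpu):
        raise ValueError("frame counts differ")
    return max((abs(a - b) for a, b in zip(cpu, gpu)), default=0.0)


def gate(feature: str, cpu: List[float], gpu: List[float]) -> bool:
    """True when the backend stays within the declared tolerance."""
    # A KeyError here means the feature has no declared tolerance,
    # which is itself a gate failure rather than an implicit default.
    tolerance = FEATURE_TOLERANCE_SKETCH[feature]
    return max_abs_diff(cpu, gpu) <= tolerance
```

Keeping the lookup strict (no `.get()` default) means a new feature cannot enter the matrix without an explicit tolerance row.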

0220 — SYCL feature kernels are unconditionally fp64-free (T7-17)

  • Touches: core/src/sycl/common.cpp (init log line), core/src/sycl/AGENTS.md (new invariant row), all SYCL feature kernels under core/src/feature/sycl/ (no diff today, but the contract pins their shape going forward).
  • Invariant: every SYCL feature-kernel lambda captures and operates on float / integer types only. No double operand inside a parallel_for body, no sycl::reduction<double>, no sycl::plus<double>. A single fp64 instruction in the TU's SPIR-V module causes the Level Zero runtime to reject the entire module on Intel Arc A-series and other fp64-less devices, even when the offending kernel is never submitted. Host-side double (in extract / flush post-processing, score aggregation, log10 normalisation) remains fine. Concrete patterns in tree: ADM gain limiting via int64 Q31 (gain_limit_to_q31 + launch_decouple_csf<false> in integer_adm_sycl.cpp); VIF gain limiting via fp32 sycl::fmin; CIEDE / SSIM accumulators via sycl::reduction<int64_t> / sycl::plus<int64_t>.
  • On upstream sync: Netflix/vmaf has no SYCL backend upstream; conflicts cannot enter via git merge. The risk is a fork-local cherry-pick (e.g. a SYCL twin of a new CUDA kernel) bringing a double into a kernel lambda. Audit the lambda capture list and any sycl::reduce* calls against this invariant before merging.
  • Re-test on rebase:
# Build SYCL backend
CC=icx CXX=icpx meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl

# On an fp64-less device (e.g. Intel Arc A380), confirm the
# init log line is INFO-level and reads "device lacks native
# fp64 — kernels already use fp32 + int64 paths, no emulation
# overhead". The SYCL kernels must launch successfully (no
# SPIR-V module rejection from the Level Zero runtime).
build-sycl/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --backend sycl \
    --feature integer_vif --feature integer_adm \
    --output /tmp/sycl-fp64less.json --json
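
For the pre-merge audit, a coarse text scan can shortlist lines worth reading. The sketch below is a hypothetical helper, not a tool that exists in the tree: it cannot tell host-side `double` (which is allowed) from a `double` inside a kernel lambda, so every hit still needs a human look.

```python
import re
from typing import List, Tuple

# Patterns named in the invariant, plus any bare `double` token.
FP64_PATTERNS = [
    re.compile(r"sycl::reduction\s*<\s*double\s*>"),
    re.compile(r"sycl::plus\s*<\s*double\s*>"),
    re.compile(r"\bdouble\b"),
]


def find_fp64_tokens(source: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line mentioning fp64."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing line comments
        if any(pattern.search(code) for pattern in FP64_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

Running it over the diff of a cherry-picked SYCL twin gives a short list to check against the lambda capture lists and reductions.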

0091 — T6-9 model registry schema + --tiny-model-verify (ADR-0211)

  • No rebase impact: 100% fork-local surface. The registry (model/tiny/registry.json), its JSON Schema (model/tiny/registry.schema.json), the --tiny-model-verify CLI flag, and the vmaf_dnn_verify_signature() C entry point are entirely fork-local — none of these paths exist in upstream Netflix/vmaf. Listed here for completeness so a future /sync-upstream run sees the surface area was acknowledged.
  • Touches (additive only): model/tiny/registry.json, model/tiny/registry.schema.json, ai/scripts/validate_model_registry.py, core/src/dnn/model_loader.{c,h} (added vmaf_dnn_verify_signature()), core/include/libvmaf/dnn.h (public declaration), core/tools/cli_parse.{c,h} (ARG_TINY_MODEL_VERIFY + tiny_model_verify field), core/tools/vmaf.c (call site), core/test/dnn/test_tiny_model_verify.c, python/test/model_registry_schema_test.py, docs/ai/model-registry.md, docs/ai/inference.md, docs/ai/security.md, docs/adr/0209-...md, docs/adr/README.md (index row), CHANGELOG.md, core/src/dnn/AGENTS.md.
  • Invariants (rebase-relevant):
  • Schema is the contract. New registry fields land in registry.schema.json first, then in registry.json, then in any consumers (the C-side parser, the Python validator, the MCP). Reverse order causes mismatch.
  • schema_version is bounded. The schema accepts only {0, 1}; bump the enum and the loader's check together when adding 2.
  • Banned-function rule applies. The cosign invocation uses posix_spawnp(3p) with an explicit argv array. Do not replace with system(3) / popen(3) — both shell-parse the command and would re-introduce injection risk.
  • Bundle-file absence is fail-closed. When sigstore_bundle points at a not-yet-existing file (pre-release state), vmaf_dnn_verify_signature() returns -ENOENT. The CLI surfaces this as a load failure; do not "soften" to a warning without an explicit ADR.
  • Re-test on rebase:
python3 ai/scripts/validate_model_registry.py
python3 -m pytest python/test/model_registry_schema_test.py -v
meson test -C build-cpu --suite=dnn
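
The fail-closed and argv-array rules have a direct Python analogue, sketched here. It is not a port of `vmaf_dnn_verify_signature()`: the function name, the caller-supplied verifier command, and the `-EACCES` code for a rejected signature are assumptions; only the `-ENOENT` behaviour comes from the notes above.

```python
import errno
import os
import subprocess
import sys
from typing import Sequence


def verify_bundle_sketch(bundle_path: str, verifier_argv: Sequence[str]) -> int:
    """Return 0 on success, a negative errno-style code on failure."""
    # Fail closed: a missing bundle is an error, never a warning.
    if not os.path.isfile(bundle_path):
        return -errno.ENOENT
    # Explicit argv list with shell=False: no shell ever parses bundle_path,
    # so metacharacters in the path reach the verifier as literal bytes.
    result = subprocess.run(
        [*verifier_argv, bundle_path],
        shell=False,
        check=False,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # -EACCES is a hypothetical choice for "verifier rejected the bundle".
    return 0 if result.returncode == 0 else -errno.EACCES
```

A path such as `bundle; echo pwned.json` is passed through as one argument, which is the property `system(3)` and `popen(3)` cannot offer.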

0074 — HIP (AMD ROCm) backend scaffold (T7-10)

  • ADR: ADR-0212.
  • Upstream source: fork-local. HIP backend is fork-only; Netflix/vmaf has no libvmaf_hip.h and no enable_hip meson option.
  • Touches:
  • core/include/libvmaf/libvmaf_hip.h (new).
  • core/include/core/meson.build — adds the is_hip_enabled install gate, mirroring is_cuda_enabled / is_sycl_enabled boolean idioms.
  • core/meson_options.txt — new enable_hip boolean option (default false).
  • core/src/meson.build — new is_hip_enabled flag, conditional subdir('hip'), hip_sources + hip_deps threaded through libvmaf_feature_static_lib (alongside the existing CUDA / SYCL / Vulkan aggregations) and the top-level library('vmaf', ...) dependencies list.
  • core/src/hip/ (new directory: common.{c,h}, picture_hip.{c,h}, dispatch_strategy.{c,h}, meson.build).
  • core/src/feature/hip/ (new directory: adm_hip.c, vif_hip.c, motion_hip.c).
  • core/test/test_hip_smoke.c (new).
  • core/test/meson.build — registers the smoke test under if get_option('enable_hip') == true.
  • .github/workflows/libvmaf-build-matrix.yml — adds Build — Ubuntu HIP (T7-10 scaffold) row.
  • docs/backends/hip/overview.md (new), docs/backends/index.md (planned → scaffold row), docs/research/0033-hip-applicability.md (new), docs/adr/0212-hip-backend-scaffold.md (new), docs/adr/README.md (new index row).
  • libvmaf/AGENTS.md — new "HIP backend scaffold contract" rebase-sensitive invariant entry.
  • CHANGELOG.md — Unreleased § Added.
  • Invariants (rebase-relevant):
  • enable_hip is a boolean option, not a feature. Mirrors enable_cuda / enable_sycl; do not "harmonise" with enable_vulkan's feature / disabled form without an ADR amendment per ADR-0212 § "Decision".
  • Public C-API entry points return -ENOSYS for the scaffold. The smoke test core/test/test_hip_smoke.c pins this. A rebase that "succeeds" by accidentally enabling a code path (e.g. a refactor that early-returns 0 from vmaf_hip_state_init) breaks the smoke and the runtime PR's contract baseline.
  • hip_sources is added to libvmaf_feature_static_lib, NOT directly to the top-level library('vmaf', ...). The static lib is extracted into libvmaf via objects: [..., libvmaf_feature_static_lib.extract_all_objects(recursive: true), ...] at the bottom of core/src/meson.build. Adding hip_sources to the top library() too would double-link.
  • hip_deps IS added to the top library() dependencies: list. The runtime PR will populate hip_deps with the real dependency('hip-lang') linkage; threading it through the top library() ensures consumers see the transitive dependency.
  • Header purity: libvmaf_hip.h does not include <hip/hip_runtime.h>. HIP runtime types cross the public ABI as uintptr_t (matches the CUDA / Vulkan precedent; ADR-0212). Don't add <hip/...> includes to the public header during a rebase / runtime-PR bring-up.
  • No FFmpeg patch: the fork's ffmpeg-patches/ series does not currently consume the HIP API surface. CLAUDE.md §12 r14 only requires patch updates when an existing patch consumes the surface; the runtime PR (T7-10b) will add the hip_device filter option and the corresponding patch.
  • On upstream sync: zero interaction; HIP backend is fork-only.
  • Re-test on rebase:
cd libvmaf
meson setup build-hip -Denable_cuda=false -Denable_sycl=false \
                      -Denable_hip=true
ninja -C build-hip
meson test -C build-hip test_hip_smoke
# Expect: 9/9 pass.

# Default no-HIP build still works:
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=fast

0074 — SSIMULACRA 2 SVE2 SIMD parity (T7-38)

  • ADR: ADR-0213.
  • Touches: core/src/feature/arm64/ssimulacra2_sve2.{c,h} (new), core/src/feature/ssimulacra2.c (dispatch table override in init_simd_dispatch), core/src/arm/cpu.{c,h} (HWCAP2_SVE2 probe + new VMAF_ARM_CPU_FLAG_SVE2 enum value), core/src/meson.build (cc.compiles probe + optional arm64_ssimulacra2_sve2 static library), core/test/test_ssimulacra2_simd.c (SVE2 picker overrides on the arm64 path + dispatch diagnostic), build-aux/aarch64-linux-gnu-sve2.ini (new cross-file pinning qemu-aarch64-static -cpu max). All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
  • Invariants:
  • Fixed 4-lane SVE2 predicate. Every kernel uses svwhilelt_b32(0, 4) so SIMD arithmetic order is identical to the NEON sibling regardless of the runtime vector length. This keeps the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract intact. Do NOT widen the predicate to svptrue_b32() without a separate ADR + snapshot regen — variable-length lane reductions perturb the per-step rounding order.
  • NEON stays the fallback. SVE2 is purely additive; the dispatch table assigns NEON first and only overrides on VMAF_ARM_CPU_FLAG_SVE2. A toolchain that fails the cc.compiles(... -march=armv9-a+sve2) probe leaves HAVE_SVE2 unset and the legacy NEON-only build is unchanged.
  • -ffp-contract=off mirrors the NEON sibling. Without it GCC fuses the per-lane scalar tail's a*b+c patterns into fmla, drifting against the SIMD path by ~1 ulp. The arm64_ssimulacra2_sve2 static library carries the flag like its NEON counterpart.
  • On upstream sync: no interaction with upstream — arm64/ feature TUs and the arm/cpu.{c,h} flag enum are fork-local. An upstream sync that rewrites init_simd_dispatch in core/src/feature/ssimulacra2.c would also need the SVE2 cases preserved.
  • Re-test on rebase:
meson setup build-arm64-sve2 libvmaf \
    --cross-file=build-aux/aarch64-linux-gnu-sve2.ini -Denable_asm=true
ninja -C build-arm64-sve2 test/test_ssimulacra2_simd
meson test -C build-arm64-sve2 test_ssimulacra2_simd
# stderr should report `ssimulacra2 simd dispatch: NEON=1 SVE2=1`
# and 11/11 tests should pass.
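
Why the lane count matters for byte-exactness can be shown without any SIMD at all. The sketch below uses Python float64 values rather than the fp32 SVE2 kernels, and `lane_sum` is a hypothetical helper: it accumulates round-robin into a fixed number of partial sums and then reduces them left to right, as a fixed-width vector loop followed by a horizontal add would.

```python
from typing import List


def lane_sum(values: List[float], lanes: int) -> float:
    """Sum `values` using `lanes` independent accumulators."""
    acc = [0.0] * lanes
    for i, v in enumerate(values):
        acc[i % lanes] += v
    total = 0.0
    # Explicit loop on purpose: the built-in sum() may use compensated
    # summation on newer Python versions, which would hide the effect.
    for partial in acc:
        total += partial
    return total


values = [1e16, 1.0, -1e16, 1.0, 1.0, 1.0, 1.0, 1.0]
four_lane = lane_sum(values, 4)
eight_lane = lane_sum(values, 8)
print(four_lane, eight_lane, four_lane == eight_lane)
```

The same inputs produce different totals at 4 and 8 lanes because the rounding happens at different points; pinning the predicate to 4 lanes removes that degree of freedom.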

0075 — enable_lcs MS-SSIM extras on CUDA + Vulkan (T7-35 / ADR-0243)

  • Touched surfaces (fork-local): core/src/feature/cuda/integer_ms_ssim_cuda.c (added enable_lcs to MsSsimStateCuda + options[] + 15 host-side vmaf_feature_collector_append calls gated on the bool), core/src/feature/vulkan/ms_ssim_vulkan.c (rewrote enable_lcs help text + added emit_lcs_metrics helper + gated 15 vmaf_feature_collector_append calls), scripts/ci/cross_backend_vif_diff.py + scripts/ci/cross_backend_parity_gate.py (new float_ms_ssim_lcs pseudo-feature + FEATURE_ALIASES map + places=4 tolerance row).
  • Why this matters on rebase: the GPU MS-SSIM extractors are fork-local (Netflix upstream has no Vulkan or CUDA MS-SSIM kernel today). The enable_lcs semantic and the metric names (float_ms_ssim_{l,c,s}_scale{0..4}) must match the upstream CPU reference at core/src/feature/float_ms_ssim.c:189-221. If upstream ever renames or reorders those metrics, mirror the change on the GPU side in the same merge — public-API contract.
  • Invariants the contract enforces:
  • Default-path output (enable_lcs=false) stays bit-identical to the pre-T7-35 binary: only the host-side appends are gated; no kernel / shader / device-buffer changes.
  • Metric ordering is metric-wise (all l_scale* first, then c_*, then s_*) — matches the CPU emission order.
  • places=4 cross-backend tolerance per ADR-0190; enforced by the new float_ms_ssim_lcs cell in the parity matrix gate (ADR-0214).
  • On upstream sync: zero interaction; the GPU twins do not exist upstream. The CPU float_ms_ssim.c is shared with upstream but enable_lcs is upstream-stable since v3.0.0.
  • Re-test on rebase:
cd libvmaf && meson setup build-vulkan \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build-vulkan
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build-vulkan/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature float_ms_ssim_lcs --backend vulkan --places 4
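
The metric-wise emission order is easy to pin down in a few lines. This sketch only generates the 15 names from the `float_ms_ssim_{l,c,s}_scale{0..4}` pattern quoted above; the helper name is hypothetical.

```python
from typing import List


def lcs_metric_names(num_scales: int = 5) -> List[str]:
    """All l_scale* names first, then c_scale*, then s_scale*."""
    return [
        f"float_ms_ssim_{component}_scale{scale}"
        for component in ("l", "c", "s")
        for scale in range(num_scales)
    ]
```

Swapping the two `for` clauses would give scale-wise order instead, which is the kind of silent reordering the invariant is guarding against.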

0075 — 32-bit ADM/cpu fallbacks port (T-NEW-3)

  • Touched surfaces (upstream-mirror): core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/x86/cpu.c. Cherry-picks of upstream 8a289703 (Christopher Degawa, "adm: add fallback for extract_epi64 for 32-bit") and 1b6c3886 ("x86/cpu: remove limit of avx+ on 32-bit").
  • Why this matters on rebase: trivially conflict-free with any future upstream extract_epi64 work because we land upstream's exact extract_epi64 macro/inline-fn pair. The conflict surface is the fork's clang-format-100col layout in adm_avx2.c / adm_avx512.c and the _Alignas(64) LTO-correctness slot in adm_avx512.c (docs/development/known-upstream-bugs.md); both are preserved verbatim.
  • Invariants the port preserves:
  • _Alignas(64) int64_t angle_flag[16] in adm_decouple_s123_avx512 stays — without it, LTO can promote the unaligned load to vmovdqa64 and fault under --buildtype=release -Db_lto=true.
  • The extract_epi64 symbol must remain resolved on both __x86_64__ (macro to _mm256_extract_epi64) and 32-bit (fallback inline). If a future upstream change inlines the helper differently, keep the conditional definition.
  • On upstream sync: if Netflix ships further 32-bit fallbacks (motion / psnr — not in this port), expect a parallel extract_epi64-style helper at the top of each affected SIMD file. The fork should mirror those verbatim into the same files.
  • Re-test on rebase:
meson setup build-i686 libvmaf \
    --cross-file=build-aux/i686-linux-gnu.ini \
    -Denable_asm=false
ninja -C build-i686
meson setup build-cpu libvmaf -Denable_avx512=true
ninja -C build-cpu
meson test -C build-cpu

0076 — codec-aware FR regressor surface (T7-CODEC-AWARE / ADR-0235)

  • Touches: ai/src/vmaf_train/codec.py (new), ai/src/vmaf_train/models/fr_regressor.py (extended), ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/extract_full_features.py. No upstream-shared paths.
  • Invariant: CODEC_VOCAB in ai/src/vmaf_train/codec.py is closed and order-stable — the index of each codec is the one-hot column index baked into trained ONNX. Adding a codec appends to the tuple and bumps CODEC_VOCAB_VERSION; reordering silently invalidates every shipped fr_regressor_v2_*.onnx. FRRegressor(num_codecs=0) must remain the v1 single-input contract — flipping the default would break every existing model/tiny/fr_regressor_v1.onnx consumer.
  • Re-test: pytest ai/tests/test_codec_aware_fr.py -v (8 sub-tests covering vocabulary contract + alias table + back-compat). Pure fork-local addition; no upstream rebase impact for the next /sync-upstream.
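
The order-stability rule follows from how a one-hot column is derived from a tuple index. A minimal sketch, with a hypothetical three-entry vocabulary (the real `CODEC_VOCAB` contents are not reproduced here):

```python
from typing import List, Tuple

# Hypothetical vocabulary for illustration only.
CODEC_VOCAB_SKETCH: Tuple[str, ...] = ("x264", "x265", "libaom")


def one_hot(codec: str, vocab: Tuple[str, ...] = CODEC_VOCAB_SKETCH) -> List[float]:
    """One-hot vector whose hot column is the codec's position in `vocab`."""
    index = vocab.index(codec)  # ValueError for codecs outside the closed set
    return [1.0 if i == index else 0.0 for i in range(len(vocab))]


appended = CODEC_VOCAB_SKETCH + ("svtav1",)   # safe: existing columns keep their index
reordered = ("x265", "x264", "libaom")        # unsafe: columns 0 and 1 swap meaning
```

Appending leaves the first three columns of every existing encoding unchanged, whereas reordering feeds a trained model the wrong codec without any error.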

0075 — feature/speed extractors (T-NEW-1, upstream port d3647c73)

  • Touches: core/src/feature/speed.c (new), core/src/feature/picture_copy.{c,h} (signature change — added int channel parameter), core/src/feature/float_*.c call sites updated to pass channel=0, core/src/feature/feature_extractor.c registry block, core/src/feature/alias.c, core/src/meson.build, core/src/feature/vif_tools.{c,h} (helper-function port from upstream 4ad6e0ea).
  • Upstream source: verbatim cherry-pick of Netflix/vmaf d3647c73 ("feature/speed: port speed_chroma and speed_temporal extractors") with its dependency 4ad6e0ea ("feature/vif: port helper functions"). Both are pre-existing on Netflix master and enter the fork as part of the T7-4 audit catch-up.
  • Invariant: picture_copy() now takes a channel argument — every fork-local extractor that calls it (CUDA integer_ms_ssim, Vulkan ssim / ms_ssim) passes channel=0. If upstream later evolves the signature again (e.g. adds bit-depth or stride validation), update those fork-local call sites in lockstep. Speed extractors only register when VMAF_FLOAT_FEATURES=1 (build with -Denable_float=true).
  • On upstream sync: future Netflix commits in core/src/feature/speed.c apply cleanly because the file is now a verbatim mirror; conflict potential is limited to the registry block in feature_extractor.c (interleave with the fork's Vulkan / SYCL / CUDA blocks) and to any further picture_copy signature evolution.
  • Re-test on rebase:

```bash
meson setup build-cpu libvmaf -Denable_cuda=false \
    -Denable_sycl=false -Denable_float=true
ninja -C build-cpu
meson test -C build-cpu test_speed
meson test -C build-cpu              # full meson suite
make test-netflix-golden             # 3 CPU canonical pairs
```

0221 — CHANGELOG + ADR-index fragment-file pattern (T7-39 / ADR-0221)

  • What changed: the fork stopped editing CHANGELOG.md and docs/adr/README.md directly. Both files are now rendered from fragment trees:
  • changelog.d/<section>/<topic>.md (Keep-a-Changelog sections), plus the migration archive changelog.d/_pre_fragment_legacy.md.
  • docs/adr/_index_fragments/<NNNN-slug>.md, plus docs/adr/_index_fragments/_order.txt (frozen commit-merge order manifest) and docs/adr/_index_fragments/_header.md (table prelude).
  • Two scripts render the consolidated outputs:
  • scripts/release/concat-changelog-fragments.sh --check|--write
  • scripts/docs/concat-adr-index.sh --check|--write
  • On upstream sync: zero interaction — CHANGELOG.md is a fork-local Markdown surface (Netflix upstream doesn't ship a Keep-a-Changelog file in this format), and docs/adr/ is entirely fork-local. A /sync-upstream run will not touch the fragment trees.
  • Re-test on rebase:
bash scripts/release/concat-changelog-fragments.sh --check
bash scripts/docs/concat-adr-index.sh --check
# both must exit 0; otherwise run --write and re-stage.
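
The `--check` / `--write` contract can be sketched in a few lines. This is not a transcription of the two shell scripts: the function names, the newline handling, and the exit codes (0 for "in sync", 1 for "stale") are assumptions chosen to match the comment above.

```python
from pathlib import Path
from typing import List


def render(fragment_dir: Path, order: List[str], header: str = "") -> str:
    """Concatenate fragments in manifest order, one trailing newline overall."""
    parts = [header] if header else []
    for name in order:
        text = (fragment_dir / name).read_text(encoding="utf-8")
        parts.append(text.rstrip("\n"))
    return "\n".join(parts) + "\n"


def check_or_write(target: Path, rendered: str, write: bool) -> int:
    """--check: 0 if target matches, 1 if stale. --write: regenerate, return 0."""
    current = target.read_text(encoding="utf-8") if target.exists() else None
    if current == rendered:
        return 0
    if write:
        target.write_text(rendered, encoding="utf-8")
        return 0
    return 1
```

Because the order comes from an explicit manifest rather than a directory listing, two branches adding different fragments cannot conflict on the rendered file's line order.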

0077 — DISTS extractor proposal (T7-DISTS / ADR-0236)

  • What landed: ADR-0236 (Proposed) + Research-0043 design digest + ADR README index row + CHANGELOG entry.
  • Rebase impact: pure fork-local proposal-stage docs; no code, no Netflix-mirror file touched, no ffmpeg-patches change, no public C-API surface change.
  • Reproducer (when implementation lands as T7-DISTS):

```sh
vmaf --feature dists_sq=model_path=model/tiny/dists_sq.onnx \
    --reference ref.yuv --distorted dist.yuv \
    --width 1920 --height 1080 --pix_fmt yuv420p
```

0076 — GPU-gen ULP calibration head (proposal-stage, T7-GPU-ULP-CAL / ADR-0234)

  • What landed: ADR-0234 (Proposed), Research-0041, data-collection scaffold at ai/scripts/collect_gpu_calibration_data.py, forward-pointer in docs/usage/cli.md for the future --gpu-calibrated flag.
  • Rebase impact: pure fork-local (proposal docs + Python script); no upstream Netflix/vmaf code touched, no public C-API changes, no ffmpeg-patches changes.
  • Reproducer:

```sh
python3 ai/scripts/collect_gpu_calibration_data.py --smoke
```

0095 — Per-backend GPU kernel scaffolding templates (CUDA + Vulkan, ADR-0246)

  • ADR: ADR-0246.
  • Touches:
  • core/src/cuda/kernel_template.h (new, header-only).
  • core/src/vulkan/kernel_template.h (new, header-only).
  • core/src/cuda/AGENTS.md (new invariant row + dir listing).
  • core/src/vulkan/AGENTS.md (new file).
  • docs/backends/kernel-scaffolding.md (new).
  • docs/adr/0246-gpu-kernel-template.md (new).
  • CHANGELOG.md, docs/adr/README.md.
  • All paths are wholly fork-local. Upstream Netflix/vmaf has no Vulkan backend at all today and the CUDA backend uses different per-kernel scaffolding shapes; nothing here can collide on a pure upstream sync.
  • Invariants:
  • Templates are unused at PR-merge time. kernel_template.h in both core/src/cuda/ and core/src/vulkan/ lands with zero call-sites. Each future kernel migration is its own gated PR (places=4 cross-backend-diff per ADR-0214). Do not bulk-port existing kernels onto the templates in a single sync — that would short-circuit the per-kernel gate.
  • Per-backend, not cross-backend. Resist the urge to merge the two templates into a unified gpu/kernel_template.h. CUDA async-stream + event vs Vulkan command-buffer + fence + descriptor-pool share no concrete shape; a unified API would be lowest-common-denominator.
  • Helper functions, not macros. The header bodies are static inline functions for cuda-gdb / Nsight / RenderDoc step-debugging. The CHECK_CUDA_GOTO / CHECK_CUDA_RETURN macros in cuda_helper.cuh stay where they pay off (textual goto label), and the templates use them internally.
  • On upstream sync: no interaction with upstream paths. An upstream sync that touches core/src/cuda/common.h or picture_cuda.h may shift the helper signatures the template consumes (vmaf_cuda_buffer_alloc, vmaf_cuda_picture_get_stream, …); update the template if so.
  • Re-test on rebase:

```bash
# CUDA build (configure inside libvmaf/ — see CLAUDE.md §2 note).
meson setup core/build-cuda libvmaf \
    -Denable_cuda=true -Denable_nvcc=true \
    -Denable_vulkan=disabled -Denable_sycl=false
ninja -C core/build-cuda
meson test -C core/build-cuda

# Vulkan build.
meson setup core/build-vulkan libvmaf \
    -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-vulkan
meson test -C core/build-vulkan
```

0222 — vmaf-perShot per-shot CRF predictor sidecar (T6-3b)

  • Touches: core/tools/meson.build (new executable + test wiring), core/tools/vmaf_per_shot.c (new file — fork-local, no upstream sibling), core/tools/test/meson.build (test row), core/tools/test/test_vmaf_per_shot.sh (new smoke test), core/tools/AGENTS.md (sidecar invariants), docs/usage/cli.md (cross-link), docs/usage/vmaf-perShot.md (new user doc), docs/ai/roadmap.md (T6-3b row update).
  • Invariant: the sidecar must stay standalone — it does not link the libvmaf metric path. Any upstream patch that tries to fold per-shot CRF prediction into vmaf_score_* would collapse the encoder-hint vs. quality-score separation recorded in roadmap §2.4 and ADR-0222 §Decision. The CSV / JSON column set (shot_id, start_frame, end_frame, frames, mean_complexity, mean_motion, predicted_crf) is the public schema; downstream encoders consume it directly.
  • Conflict expectation on /sync-upstream: low. Upstream Netflix has no per-shot CRF predictor in tree, so there is no natural collision point — tools/meson.build is the only mutually-edited file and the new executable('vmaf-perShot', …) block is appended after vmaf_bench_deps, well clear of upstream's likely additions.
  • Reproducer:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=disabled
ninja -C build
meson test -C build test_vmaf_per_shot --print-errorlogs
./build/tools/vmaf-perShot \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --output /tmp/plan.csv
cat /tmp/plan.csv
```
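
A downstream consumer can pin the public column set with a strict header check. The sketch below assumes the CSV carries a header row with the columns in the order listed in the invariant; that ordering, the helper name, and the sample row are assumptions, not confirmed output of `vmaf-perShot`.

```python
import csv
import io
from typing import Dict, List

PER_SHOT_COLUMNS = [
    "shot_id", "start_frame", "end_frame", "frames",
    "mean_complexity", "mean_motion", "predicted_crf",
]


def read_plan(text: str) -> List[Dict[str, str]]:
    """Parse a per-shot plan, rejecting any header other than the public schema."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != PER_SHOT_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


# Hypothetical single-shot plan, values invented for the example.
sample = ",".join(PER_SHOT_COLUMNS) + "\n0,0,47,48,0.5,1.25,23\n"
rows = read_plan(sample)
```

Failing on an unexpected header, rather than reading columns by position, turns a schema drift into an immediate error on the encoder side.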

0075 — vmaf-roi sidecar binary (T6-2b / ADR-0247)

  • Touches:
  • core/tools/meson.build — adds the vmaf_roi executable target (after the existing vmaf target, before vmaf_bench). Append-only; no upstream-shared lines moved or removed.
  • core/test/meson.build — adds the test_vmaf_roi executable + test() registration. Append-only.
  • core/tools/vmaf_roi.c — wholly new, fork-local.
  • core/tools/vmaf_roi_core.h — wholly new, fork-local.
  • core/test/test_vmaf_roi.c — wholly new, fork-local.
  • Invariant: the vmaf-roi sidecar emits two byte-exact formats that downstream encoder drivers (x265 --qpfile, SVT-AV1 --roi-map-file) will hard-depend on:
  • x265 ASCII grid — two #-prefixed header lines (# vmaf-roi qpfile (x265, --qpfile-style) and # frame=N ctu=S cols=C rows=R strength=F.FFF), space-separated signed integers, one row per CTU row, \n terminator.
  • SVT-AV1 raw binary — exactly cols * rows bytes of int8_t, row-major, no header.
  • QP-offset clamp: ±12 (VMAF_ROI_CORE_QP_OFFSET_MAX).
  • Reduction — per-CTU mean (not max). Switching to max or a percentile changes every downstream encoder result and requires its own ADR.
  • Pure helpers in vmaf_roi_core.h — the per-CTU mean reducer and saliency-to-QP mapper are static inline in a header so test_vmaf_roi compiles them without dragging the libvmaf link surface in. Moving them into a .c TU breaks the test wiring.
  • On upstream sync: no interaction with upstream — tools/ is a fork-local surface from upstream's perspective (upstream ships vmaf.c only). An upstream sync that rewrites core/tools/meson.build should preserve the vmaf_roi executable block.
  • Re-test on rebase:

```bash
meson setup build-cpu libvmaf \
    -Denable_cuda=false -Denable_sycl=false -Denable_tools=true
ninja -C build-cpu tools/vmaf_roi test/test_vmaf_roi
meson test -C build-cpu test_vmaf_roi
./build-cpu/tools/vmaf_roi \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --frame 0 --output - \
    --encoder x265 --ctu-size 64 --strength 6.0 | head -3
# First two lines are the # comment header; row 1 of the grid
# should be "4 2 1 -1 -1 -1 1 2 4" (placeholder radial map).
```
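
The two output formats and the mean-then-clamp pipeline can be sketched end to end. Only the header text, the ±12 clamp, the per-CTU mean, and the int8 row-major layout come from the invariant above; the linear saliency-to-QP mapping and all function names are hypothetical, and this is not the code in `vmaf_roi_core.h`.

```python
import struct
from typing import List

QP_OFFSET_MAX = 12  # the ±12 clamp from the invariant


def ctu_means(saliency: List[List[float]], ctu: int) -> List[List[float]]:
    """Per-CTU mean of a row-major saliency map (edge CTUs may be partial)."""
    height, width = len(saliency), len(saliency[0])
    rows, cols = -(-height // ctu), -(-width // ctu)  # ceiling division
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [saliency[y][x]
                     for y in range(r * ctu, min((r + 1) * ctu, height))
                     for x in range(c * ctu, min((c + 1) * ctu, width))]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def to_qp_offsets(means: List[List[float]], strength: float) -> List[List[int]]:
    """Hypothetical mapping: saliency 1.0 -> -strength, 0.0 -> +strength, clamped."""
    return [[max(-QP_OFFSET_MAX, min(QP_OFFSET_MAX, round((0.5 - m) * 2.0 * strength)))
             for m in row] for row in means]


def x265_grid(offsets: List[List[int]], frame: int, ctu: int, strength: float) -> str:
    """Two '#' header lines, then one space-separated line per CTU row."""
    rows, cols = len(offsets), len(offsets[0])
    lines = [
        "# vmaf-roi qpfile (x265, --qpfile-style)",
        f"# frame={frame} ctu={ctu} cols={cols} rows={rows} strength={strength:.3f}",
    ]
    lines += [" ".join(str(v) for v in row) for row in offsets]
    return "\n".join(lines) + "\n"


def svtav1_bytes(offsets: List[List[int]]) -> bytes:
    """Exactly cols * rows signed bytes, row-major, no header."""
    flat = [v for row in offsets for v in row]
    return struct.pack(f"{len(flat)}b", *flat)
```

Replacing the mean in `ctu_means` with `max(block)` changes every offset downstream, which is why the reduction choice is treated as part of the contract.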

0219 — motion3 GPU coverage on Vulkan + CUDA + SYCL (T3-15(c) / ADR-0219)

  • What changed: The motion GPU twins (core/src/feature/vulkan/motion_vulkan.c, core/src/feature/cuda/integer_motion_cuda.c, core/src/feature/sycl/integer_motion_sycl.cpp) now emit VMAF_integer_feature_motion3_score in 3-frame window mode (default). Cross-backend gates extended (scripts/ci/cross_backend_*.py FEATURE_METRICS["motion"]).
  • Invariants:
  • motion3 = host-side scalar post-process of motion2. No device-side state changes; motion3 is computed on the host in extract() / collect() / flush() after the existing SAD reduction. The post-processing function (motion3_postprocess_*) mirrors CPU integer_motion.c lines 510-560 byte-for-byte: clip(motion_blend(motion2 * fps_weight, blend_factor, blend_offset), max_val) with optional moving-average against the unaveraged prior blended value.
  • motion_five_frame_window=true returns -ENOTSUP at init() on all three GPU backends. The 5-deep blur ring + second SAD-pair dispatch remain deferred. Do NOT silently fall back to the 3-frame path when the user enables the flag — fail loud per CERT C / CLAUDE.md §12 r4.
  • CPU motion3 algorithm is the source of truth. Any port of an upstream Netflix change to integer_motion.c that touches motion_blend(...), the motion_max_val clip, or the moving-average rule MUST be mirrored in motion3_postprocess_* across all three GPU files in the same PR. The cross-backend gate at places=4 will catch drift, but only after a full GPU run.
  • On upstream sync: Pure fork-local additions to GPU TUs. Upstream Netflix has no GPU motion extractor. The motion_blend_tools.h header is upstream-mirrored — if a sync rewrites the motion_blend() formula, regenerate the GPU snapshot and re-run the cross-backend gate.
  • Re-test on rebase:

```bash
# CPU sanity (motion3 emission unchanged)
./core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature motion --output /tmp/motion.json --json
python -c "import json; d=json.load(open('/tmp/motion.json')); \
print('motion3 frames:', sum(1 for f in d['frames'] \
if 'integer_motion3' in f.get('metrics', {})))"
# Expect 49 (one motion3 per frame).

# Cross-backend gate (Vulkan/lavapipe lane works on every host):
python scripts/ci/cross_backend_vif_diff.py \
    --feature motion --backend vulkan \
    --ref python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --dis python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --bitdepth 8 \
    --vmaf-bin core/build/tools/vmaf
# Expect: integer_motion / integer_motion2 / integer_motion3 all OK at places=4.
```

0216 — vmaf_tiny_v2 (Phase-3-validated tiny VMAF MLP)

  • Touches: model/tiny/registry.json, model/tiny/vmaf_tiny_v2.{onnx,json}, ai/scripts/{train,export,validate}_vmaf_tiny_v2.py, ai/AGENTS.md, core/test/dnn/{test_vmaf_tiny_v2.py,meson.build}, docs/ai/{models/vmaf_tiny_v2.md,inference.md,roadmap.md}, docs/adr/{0244-vmaf-tiny-v2.md,README.md}, CHANGELOG.md. All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
  • Invariants:
  • Bundled scaler stats are part of the trust root. The shipped ONNX bakes (input - mean) / std as Constant Sub + Div nodes that run before the MLP. Re-exporting must go through ai/scripts/export_vmaf_tiny_v2.py, which pulls mean / std from the trainer checkpoint and writes them as graph initialisers. Adding an out-of-band scaler step at runtime (e.g., a sidecar JSON consumed by the loader) is forbidden without a follow-up ADR — it splits the trust root and invalidates the registry sha256 contract.
  • Feature column order is fixed. The graph reads (adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2) in exactly this order; reordering breaks the bundled mean / std constants. Any change to the feature set requires a fresh Phase-3 chain (Research-0027 → 0028 → 0029 → 0030).
  • opset 17. Matches the sister tiny-AI models (learned_filter_v1, nr_metric_v1, fastdvdnet_pre) and the ORT op-allowlist baseline. Upgrading requires re-validating the Sub / Div / Gemm / Relu / Squeeze ops against op_allowlist.c.
  • On upstream sync: zero interaction. Netflix/vmaf has no equivalent surface; an upstream sync that touches core/src/dnn/ (op-allowlist or model-loader changes) needs to preserve Sub / Div / Gemm / Relu / Squeeze in the allowlist for opset 17.
  • Re-test on rebase:

```bash
bash core/test/dnn/test_registry.sh
python3 core/test/dnn/test_vmaf_tiny_v2.py
python3 ai/scripts/validate_vmaf_tiny_v2.py \
    --onnx model/tiny/vmaf_tiny_v2.onnx \
    --parquet runs/full_features_netflix.parquet \
    --rows 100 --min-plcc 0.97
meson test -C build-cpu --suite=dnn
```
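
The coupling between feature order and the baked scaler can be made concrete with a small reference computation. The sketch mirrors the `(input - mean) / std` step in plain Python; the helper name is hypothetical and no real mean / std values from the trainer checkpoint are reproduced.

```python
from typing import Dict, List, Sequence

# Column order stated in the invariant above.
FEATURE_ORDER = ("adm2", "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3", "motion2")


def standardize(features: Dict[str, float],
                mean: Sequence[float],
                std: Sequence[float]) -> List[float]:
    """Apply (x - mean) / std column by column, in the fixed feature order."""
    if len(mean) != len(FEATURE_ORDER) or len(std) != len(FEATURE_ORDER):
        raise ValueError("scaler stats must have one entry per feature")
    return [(features[name] - m) / s
            for name, m, s in zip(FEATURE_ORDER, mean, std)]
```

Since `mean[i]` and `std[i]` are tied to column `i`, reordering the inputs pairs each feature with another feature's statistics, and the model silently sees garbage.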

0094 — Tiny-AI extractor template (ADR-0250)

  • Touches: core/src/dnn/tiny_extractor_template.h (new), core/src/feature/feature_lpips.c, core/src/feature/fastdvdnet_pre.c, core/src/dnn/AGENTS.md, docs/ai/extractor-template.md (new), docs/adr/0250-tiny-ai-extractor-template.md (new).
  • Invariants:
  • Helper signatures are wire-format-stable. vmaf_tiny_ai_resolve_model_path(name, option, env_var) and vmaf_tiny_ai_open_session(name, path, &out) produce the user-facing log lines <name>: no model path … and <name>: vmaf_dnn_session_open(<path>) failed: <rc> — downstream tooling greps these. Don't rename or reorder the parameters without bumping every extractor + the recipe doc.
  • YUV→RGB is bit-exact. The shared vmaf_tiny_ai_yuv8_to_rgb8_planes is a literal move of the pre-existing feature_lpips.c body (BT.709 limited-range, nearest-neighbour chroma upsample). LPIPS / saliency / future colour-sensitive tiny-AI scores depend on byte-exact equality with the prior ad-hoc copies. Any change to the conversion constants or the rounding rule needs a separate ADR + a coordinated snapshot regen — model/tiny/ weights aren't re-trained against new colour math casually.
  • Option-table macro is plain text substitution. The VMAF_TINY_AI_MODEL_PATH_OPTION(state_t, help) macro emits a single struct literal — no control flow, no recursion, no variadic shenanigans (Power-of-10 rule 1 / rule 9). Don't extend it into a multi-option emitter without a fresh ADR.
  • On upstream sync: zero interaction with upstream — feature_lpips.c and fastdvdnet_pre.c are fork-only files, and the new dnn/tiny_extractor_template.h lives entirely under fork-introduced core/src/dnn/. An upstream sync that rewrites unrelated feature_*.c files won't conflict.
  • Re-test on rebase:
cd libvmaf
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=dnn
meson test -C build-cpu test_lpips test_fastdvdnet_pre
# All 10 dnn-suite + both extractor tests must pass.
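
The model-path helper can be pictured with a small Python model. This is a sketch under assumptions, not the C implementation: the entry only says the helper takes an option value and an environment-variable name, so the precedence shown here (explicit option first, environment second) is an assumption, and `EXAMPLE_MODEL_PATH` is a hypothetical variable name. The user-facing log line is left out because the source elides its tail.

```python
import os


def resolve_model_path(option, env_var, environ=os.environ):
    """Return the explicit option if set, else the environment value, else None.

    Assumption: the option wins over the environment variable.
    """
    if option:
        return option
    value = environ.get(env_var)
    return value if value else None


# Hypothetical usage with an injected environment mapping.
fake_env = {"EXAMPLE_MODEL_PATH": "/env/model.onnx"}
print(resolve_model_path(None, "EXAMPLE_MODEL_PATH", fake_env))
print(resolve_model_path("/opt/model.onnx", "EXAMPLE_MODEL_PATH", fake_env))
```

Keeping the parameter order fixed matters because every extractor calls the helper positionally.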

0095 — Vulkan ring-depth tunable (ADR-0251 follow-up #3)

  • PR: feat/t7-29-followup3-ring-tunable.
  • What rebases need to know: VmafVulkanConfiguration grew an additive unsigned max_outstanding_frames field. Existing zero-initialised configs continue to receive the canonical default (0 → VMAF_VULKAN_RING_DEFAULT == 4). The clamp helper vmaf_vulkan_clamp_ring_size moved from import.c (file-local static) to vulkan_internal.h (static inline) so state_init and lazy_alloc_ring share one definition; an upstream sync that re-introduces the static in import.c would shadow the header helper — drop the duplicate, keep the inline.
  • New public symbol: vmaf_vulkan_state_max_outstanding_frames(const VmafVulkanState *) — read-side accessor for the clamped value. Pure additive surface; no upstream collision.
  • On upstream sync: zero interaction. The ring is wholly fork-introduced (ADR-0251); upstream Netflix has no Vulkan backend.
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_hip=false -Denable_vulkan=disabled \
    -Denable_float=true
ninja -C build && meson test -C build
# 51/52 OK; 1 pre-existing T7-32 fail on test_motion_v2_simd
# (ADR-0038 follow-up)

# Smoke the new options + ENOTSUP guard:
build/tools/vmaf --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature 'motion_v2=motion_blend_factor=0.5' \
    --xml -o /tmp/r.xml --no_prediction
grep motion3_v2 /tmp/r.xml | head -3
# → 49 frames with VMAF_integer_feature_motion3_v2_score_mbf_0.5

# ENOTSUP guard:
build/tools/vmaf --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature 'motion_v2=motion_five_frame_window=1' \
    --xml -o /tmp/r2.xml --no_prediction 2>&1
# → "problem loading feature extractor: motion_v2"
# → stderr: "motion_v2: motion_five_frame_window=true is not supported …"
```
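
The ring-depth default described in entry 0095 (a zero-initialised `max_outstanding_frames` resolving to 4) can be modelled in a few lines of Python. This is only a sketch: `RING_MAX` is a hypothetical upper bound, since the entry does not state the real clamp limit, and `clamp_ring_size` is a stand-in name for the C helper.

```python
RING_DEFAULT = 4  # from the entry: 0 maps to VMAF_VULKAN_RING_DEFAULT == 4
RING_MAX = 8      # hypothetical upper bound; the real limit is not stated here


def clamp_ring_size(requested):
    """Map a requested ring depth to the value the state would actually use."""
    if requested < 0:
        raise ValueError("ring depth is unsigned")
    if requested == 0:
        return RING_DEFAULT  # zero-initialised configs keep the default
    return min(requested, RING_MAX)


for depth in (0, 2, 100):
    print(depth, clamp_ring_size(depth))
```

Having one shared definition means the init path and the lazy-allocation path cannot disagree about the clamped value, which is the reason the entry asks rebases to drop any re-introduced duplicate.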

ADR-index backfill 2026-05-08 (this PR)

  • Touches: docs/adr/_index_fragments/0235-codec-aware-fr-regressor.md (new), docs/adr/_index_fragments/0236-dists-extractor.md (new), docs/adr/_index_fragments/0238-vulkan-picture-preallocation.md (new), docs/adr/_index_fragments/0239-gpu-picture-pool-dedup.md (new), docs/adr/_index_fragments/0251-vulkan-async-pending-fence.md (new), docs/adr/_index_fragments/0279-fr-regressor-v2-probabilistic.md (new), docs/adr/_index_fragments/_order.txt (six slugs appended), docs/adr/README.md (eight rows appended; one duplicate ADR-0279 row deduplicated).
  • Invariant: no engine code touched; no upstream-shared paths. Pure fork-local index maintenance.
  • On upstream sync: no action required. docs/adr/ is a fork-local tree.
  • Coordination with #468 (27-ADR status sweep): both PRs touch ADR metadata. They do not conflict at the file level (#468 edits ADR bodies; this PR adds index fragments + appends README rows for the eight previously-unindexed ADRs). At merge time the README append-tail may overlap if #468 lands later index rows for its swept ADRs; whichever lands first, the second rebases by re-running scripts/docs/concat-adr-index.sh --check and inserting any newly-stale rows in commit-merge order.
  • Known finding (out of scope): scripts/docs/concat-adr-index.sh --check currently reports a much larger fragment-vs-README drift than this PR introduces — many ADRs have rows in README.md without corresponding _index_fragments/ files, and several _order.txt slugs have no fragment yet. Running --write blindly would drop ~37 README rows for ADRs unrelated to this PR. The ADR-0221 fragment-driven contract therefore could not be enforced via a clean --write here; eight new rows were appended directly to keep the change scoped. A separate sweep PR is needed to flush the residual drift.
  • Re-test on rebase:
for n in 0235 0236 0238 0239 0251 0276 0279 0315; do
  grep -cE "^\| \[ADR-$n\]" docs/adr/README.md  # must be ≥ 1
done
bash scripts/docs/concat-adr-index.sh >/dev/null  # must succeed
  -Denable_vulkan=enabled

ninja -C build
meson test -C build test_vulkan_async_pending_fence

# All 8 cases must pass: 4 v2-contract + 4 ring-tunable.

0096 — tools/vmaf-tune/ automation umbrella spec (ADR-0237 / Research-0044)

  • PR: feat/vmaf-tune-spec.
  • What rebases need to know: this PR ships only an umbrella ADR and research digest under docs/. No tracked source code, no tools/vmaf-tune/ directory yet, no Meson changes. An upstream sync touching ffmpeg-patches or libvmaf/ cannot collide with this PR.
  • On upstream sync: zero interaction. Spec-only PR.
  • Re-test on rebase:
# No build/test impact — verify the docs render and links are alive:
ls docs/adr/0237-quality-aware-encode-automation.md \
   docs/research/0044-quality-aware-encode-automation.md
grep -c '\[ADR-0237\]' docs/adr/README.md

0097 — test_speed gated on enable_float (fix default-build failure)

  • PR: fix/test-speed-chroma-registration.
  • What rebases need to know: core/test/meson.build now wraps the test_speed executable + test() registration in if get_option('enable_float'). The speed_chroma / speed_temporal extractors live in speed.c, which is only compiled when enable_float=true (the entries in feature_extractor.c are wrapped in #if VMAF_FLOAT_FEATURES), so the test's vmaf_get_feature_extractor_by_name("speed_chroma") returned NULL on a default build (enable_float=false).
  • On upstream sync: zero interaction. test_speed.c was added fork-side via the Netflix port commit d3647c73. The gating pattern matches test_vulkan_* (if get_option('enable_vulkan').enabled()).
  • Re-test on rebase:
# default (enable_float=false): test_speed must NOT be in the suite
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false --reconfigure
ninja -C build
meson test -C build  # expect: NO test_speed in the run

# CI shape (enable_float=true): test_speed must run + pass
meson setup build libvmaf -Denable_float=true --reconfigure
ninja -C build
meson test -C build test_speed  # expect: 5/5 pass

0098 — Vulkan picture preallocation surface (ADR-0238)

  • PR: feat/vulkan-picture-preallocation.
  • What rebases need to know: ABI grows additively. New public surface in core/include/libvmaf/libvmaf_vulkan.h: enum VmafVulkanPicturePreallocationMethod, VmafVulkanPictureConfiguration, vmaf_vulkan_preallocate_pictures, vmaf_vulkan_picture_fetch. New enumerator VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE in core/src/picture.h::VmafPictureBufferType. New TU core/src/vulkan/picture_vulkan_pool.c (~180 LOC); registered in core/src/vulkan/meson.build. Fork-internal accessor vmaf_vulkan_state_context() (declared in vulkan_internal.h) exposes the imported state's VkInstance/VkDevice to the pool — used only by libvmaf.c::vmaf_vulkan_preallocate_pictures.
  • VmafContext field added: vmaf->vulkan.pool next to vmaf->vulkan.state. The vmaf_close() teardown closes the pool before clearing the state pointer (matches SYCL).
  • On upstream sync: zero interaction. Vulkan backend is fork-only; upstream Netflix has no Vulkan integration.
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_pic_preallocation
# All 6 cases must pass under ASan/UBSan:
#   test_method_none_is_a_no_op
#   test_method_host_allocates_round_robins
#   test_method_device_allocates_round_robins
#   test_fetch_without_preallocate_falls_back
#   test_unknown_method_rejected
#   test_null_args_rejected
```

0099 — feature_mobilesal.c + transnet_v2.c migrated to tiny_extractor_template.h

  • PR: refactor/migrate-ai-to-template.
  • What rebases need to know: feature_mobilesal.c and transnet_v2.c previously open-coded the model-path resolution (getenv + log block), the YUV→RGB kernel (mobilesal only), the vmaf_dnn_session_open + log boilerplate, and the VmafOption[].model_path row. They now use the helpers from dnn/tiny_extractor_template.h (PR #251) — the same template feature_lpips.c and fastdvdnet_pre.c already consume. Net −98 LOC of identical boilerplate.
  • Behavior preserved: bit-exact YUV→RGB conversion (mobilesal used the literal copy of feature_lpips.c's body that the template hoisted), identical error-log strings, identical option-table flag/type/offset shape. The migrated mobilesal_options macro expands to the same struct literal the hand-rolled version produced.
  • On upstream sync: zero interaction. Both files are fork-introduced; upstream Netflix has neither extractor.

0100 — cuda/ring_buffer.{c,h} → gpu_picture_pool.{c,h} (ADR-0239)

  • PR: refactor/gpu-picture-pool-extract.
  • What rebases need to know: core/src/cuda/ring_buffer.c and ring_buffer.h are removed. The same callback-based round-robin pool lives at core/src/gpu_picture_pool.{c,h} under renamed symbols (VmafRingBuffer → VmafGpuPicturePool, vmaf_ring_buffer_* → vmaf_gpu_picture_pool_*, _fetch_next_picture → _fetch). All call sites in libvmaf.c migrated. core/test/test_ring_buffer.c renamed to test_gpu_picture_pool.c with the corresponding meson update.
  • Netflix-upstream interaction: minimal. Netflix's cuda/ring_buffer.{c,h} were last touched in commit cb1d49c6. An upstream sync that resurrects the old names should be redirected to the new ones; the file move is purely fork-local.
  • Netflix#1300 mutex-destroy-order fix preserved (ADR-0157) — moved verbatim to the new file; the fix remains attached to vmaf_gpu_picture_pool_close.
  • SYCL pool migration: vmaf_sycl_picture_pool_* keeps its public-internal API but now delegates to the generic pool. The SYCL wrapper struct (VmafSyclPicturePool) just owns the VmafSyclCookie storage. std::mutex drops out.
  • Vulkan pool migration: bundled into this PR after #264 merged. picture_vulkan_pool.c rewrites as a thin wrapper around the generic pool — wrapper struct owns per-pool state for the alloc/free callbacks; the generic pool owns the round-robin slots / mutex / unwind. Same pattern as the SYCL migration above.
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=dnn
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# All 11 dnn-suite + 4 extractor smoke tests must pass.
meson test -C build  # 47/47 pass under ASan/UBSan

# CUDA build (CI-only; pre-existing local nvcc include-path quirk):
meson setup build-cuda libvmaf -Denable_cuda=true
ninja -C build-cuda
meson test -C build-cuda test_gpu_picture_pool

# SYCL build:
meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
meson test -C build-sycl
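
The "callback-based round-robin pool" shape that entry 0100 describes can be illustrated with a toy Python model. It is not a port of `gpu_picture_pool.c`: the class and method names are illustrative, and the alloc/free callbacks here are plain Python callables rather than the C callback signatures.

```python
import threading


class PicturePool:
    """Toy callback-based round-robin pool (names are illustrative)."""

    def __init__(self, count, alloc, free):
        if count <= 0:
            raise ValueError("pool needs at least one slot")
        self._free = free
        self._lock = threading.Lock()
        self._next = 0
        self._slots = []
        try:
            for i in range(count):
                self._slots.append(alloc(i))
        except Exception:
            # Unwind: release whatever was allocated before the failure.
            self.close()
            raise

    def fetch(self):
        """Hand out slots in strict rotation under the pool mutex."""
        with self._lock:
            slot = self._slots[self._next]
            self._next = (self._next + 1) % len(self._slots)
            return slot

    def close(self):
        """Free every slot, most recently allocated first."""
        while self._slots:
            self._free(self._slots.pop())


freed = []
pool = PicturePool(3, alloc=lambda i: f"pic{i}", free=freed.append)
print([pool.fetch() for _ in range(5)])
pool.close()
print(freed)
```

In this model the backend-specific wrappers (SYCL, Vulkan) would only supply the two callbacks and their own per-pool state, while slot rotation, locking, and unwind stay in one place.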

0104 — psnr_vulkan.c migrated to vulkan/kernel_template.h

  • PR: refactor/migrate-psnr-vulkan-to-template.
  • What rebases need to know: vulkan/kernel_template.h (410 LOC, ADR-0246, PR #251) shipped with zero consumers. Its docstring designated psnr_vulkan.c as the reference implementation. This PR lands the migration as the first consumer of the Vulkan template — paired with PR #269 (the first CUDA template consumer). The 5 long-lived pipeline objects (descriptor-set layout, pipeline layout, shader module, compute pipeline, descriptor pool) collapse from individual struct fields to one VmafVulkanKernelPipeline pl bundle. create_pipeline() (~104 LOC) collapses to a single vmaf_vulkan_kernel_pipeline_create() call (~30 LOC) — the template owns the descriptor-set layout creation, pipeline layout, shader module, compute pipeline, and descriptor-pool sizing. close_fex()'s vkDeviceWaitIdle + 5×vkDestroy* sweep collapses to one vmaf_vulkan_kernel_pipeline_destroy() call.
  • Net LOC delta: −55 LOC on psnr_vulkan.c directly. Unlike the CUDA template (where helper-call boilerplate roughly matches the inline savings), the Vulkan template absorbs enough pipeline-creation boilerplate that even the first consumer comes out ahead.
  • Bit-exactness gates: spec-constants, push-constant struct, shader bytecode, dispatch grid math, and host-side reduction are byte-identical to the prior implementation. The template only owns descriptor-set layout / pipeline layout / shader module / compute pipeline creation / descriptor pool sizing — none of which affects the kernel's mathematical behaviour. Cross-backend parity gate (places=4) re-runs unchanged.
  • On upstream sync: zero interaction. psnr_vulkan.c is fork-introduced (T7-23 / ADR-0182 / ADR-0216).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe
# Cross-backend parity gate (places=4):
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
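
To make the `places=N` tolerance concrete, here is a small Python sketch. The entry does not describe how `cross_backend_parity_gate.py` compares scores, so this assumes the same rule as Python's `unittest.assertAlmostEqual` (the rounded absolute difference must be zero); `matches_at_places` is a hypothetical helper name and the scores are invented.

```python
def matches_at_places(a, b, places):
    """True when |a - b| rounds to zero at the given number of decimal places.

    Assumption: mirrors unittest.assertAlmostEqual, not necessarily the gate script.
    """
    return round(abs(a - b), places) == 0


cpu_score = 30.12345       # invented CPU result
gpu_close = 30.12349       # differs in the fifth decimal
gpu_far = 30.1236          # differs in the fourth decimal

print(matches_at_places(cpu_score, gpu_close, 4))
print(matches_at_places(30.1234, gpu_far, 4))
print(matches_at_places(12.341, 12.344, 2), matches_at_places(12.341, 12.344, 4))
```

Under this rule a looser gate such as places=2 or places=3 accepts differences that a places=4 gate would reject, which is why the per-feature places value is part of the contract.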

0105 — moment_vulkan.c + ciede_vulkan.c migrated to vulkan/kernel_template.h

  • PR: refactor/migrate-motion-vulkan-to-template (note: the branch name reflects the original intent; motion's two-pipeline shape didn't fit the template's single-pipeline contract, so this PR migrates moment + ciede instead).
  • What rebases need to know: second + third consumers of vulkan/kernel_template.h (after PR #270 = psnr_vulkan, the first consumer). Both files follow the identical migration pattern:
  • Replace 5 individual pipeline-object fields (dsl, pipeline_layout, shader, pipeline, desc_pool) with one VmafVulkanKernelPipeline pl bundle.
  • Replace ~100 LOC of create_pipeline() body (descriptor-set layout + pipeline layout + shader module + compute pipeline + descriptor pool boilerplate) with a single vmaf_vulkan_kernel_pipeline_create() call.
  • Replace close_fex()'s vkDeviceWaitIdle + 5×vkDestroy* sweep with one vmaf_vulkan_kernel_pipeline_destroy() call.
  • Per-file LOC deltas:
  • moment_vulkan.c: −60 LOC (450 → 390).
  • ciede_vulkan.c: −59 LOC (536 → 477).
  • Net: −119 LOC.
  • Bit-exactness preserved: spec-constants (width/height/bpc/subgroup_size identical across both), push-constant structs (MomentPushConsts, CiedePushConsts), shader bytecodes (moment_spv, ciede_spv), dispatch grid math, and host-side reductions are byte-identical to the prior implementation. Cross-backend parity gates (places=4 for moment integer reduce; places=2 for ciede transcendentals per ADR-0187) re-run unchanged.
  • motion_vulkan.c deferred: motion uses two pipelines (first frame vs subsequent) sharing one DSL + layout + shader + pool. The template's current shape produces one pipeline per descriptor; splitting motion across two VmafVulkanKernelPipeline instances would duplicate the shared objects. Tracked as a follow-up template extension (multi-pipeline support).
  • On upstream sync: zero interaction. Both files are fork-introduced (T7-23 / ADR-0182 / ADR-0187).
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe (under ASan/UBSan)
python scripts/ci/cross_backend_parity_gate.py --feature float_moment_ref1st --places 4
python scripts/ci/cross_backend_parity_gate.py --feature ciede2000 --places 2
```

0101 — GPU backend pattern doc (ADR-0240)

  • PR: docs/gpu-backend-template.
  • What rebases need to know: doc-only PR. Adds docs/development/gpu-backend-template.md (recipe new GPU backends follow) and core/include/libvmaf/AGENTS.md (public-headers-tree invariant note). No source code, no meson changes, no ABI impact.
  • On upstream sync: zero interaction. Both files are fork-introduced.
  • Re-test on rebase:

```bash
# Doc-only — verify links resolve:
test -f docs/development/gpu-backend-template.md
test -f core/include/libvmaf/AGENTS.md
grep -c 'gpu-backend-template' core/include/libvmaf/AGENTS.md
```

0102 — Tiny-AI test registration macro (tiny_ai_test_template.h)

  • PR: refactor/test-registration-macro.
  • What rebases need to know: new core/test/tiny_ai_test_template.h emits the four standard registration tests (<name>_is_registered, <name>_provides_primary_feature, <name>_options_table_well_formed, <name>_init_rejects_missing_model) via the VMAF_TINY_AI_DEFINE_REGISTRATION_TESTS(ext, feat, env, prefix) macro. The four per-extractor test files (test_lpips.c, test_mobilesal.c, test_transnet_v2.c, test_fastdvdnet_pre.c) shrank from ~140 LOC each to ~20-50 LOC. Net −286 LOC. Behavior bit-exact preserved (same assertions, same env-var save/restore dance, same setenv shim for MSVCRT). TransNet V2 keeps two extractor-specific extra tests (binary-flag round-trip + provided_features list-termination) that the macro doesn't cover.
  • On upstream sync: zero interaction. The four test files are fork-introduced (per ADR-0042 / ADR-0168 / ADR-0220 / ADR-0223 / ADR-0215).
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# 4/4 binaries pass; 18 individual tests total (4x4 standard + 2
# TransNet V2 extras).
```

0103 — integer_psnr_cuda.c migrated to cuda/kernel_template.h

  • PR: refactor/migrate-psnr-cuda-to-template.
  • What rebases need to know: cuda/kernel_template.h shipped with no consumers in PR #251 (ADR-0246). This PR migrates the first consumer (integer_psnr_cuda.c), the file the template's own docstring explicitly designated as the reference. The CUstream + CUevent + CUevent triple and the (VmafCudaBuffer device, void *host_pinned, size_t bytes) readback pair are now provided by the template helpers (vmaf_cuda_kernel_lifecycle_init/_close, vmaf_cuda_kernel_readback_alloc/_free, vmaf_cuda_kernel_submit_pre_launch, vmaf_cuda_kernel_collect_wait) instead of being open-coded. PsnrStateCuda shrinks: one VmafCudaKernelLifecycle replaces three fields (event + finished + str), and one VmafCudaKernelReadback replaces (sse + sse_host).
  • Net LOC delta: +8 LOC on integer_psnr_cuda.c alone — the helpers add per-call boilerplate. The dedup win materialises as more CUDA feature kernels (motion / moment / ssim / vif / adm) migrate one-at-a-time in follow-up PRs. Each subsequent migration saves ~15 LOC.
  • Bit-exactness gates: kernel launch + reduction logic unchanged. The migration only touches state-management boilerplate around the kernel; the SSE accumulator math, the per-bpc kernel function lookup, the host-side log10 score formula, and the dispatch grid-dim calculation are byte-identical to the prior implementation. Netflix golden gate + CPU/CUDA cross-backend parity gate (places=4) re-run unchanged.
  • On upstream sync: zero interaction. integer_psnr_cuda.c is fork-introduced (T7-23 / ADR-0182).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=true
ninja -C build
meson test -C build  # CUDA test suite must pass
# Cross-backend parity gate:
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4

0125 — Vulkan submit-side template + fence pool + descriptor pre-alloc bundle (ADR-0256)

  • Touches:
  • core/src/vulkan/kernel_template.h — fork-local. Output landing in runs/phase_a/ is gitignored — rerun the script to reproduce. VmafVulkanKernelSubmitPool struct + _create / _destroy / _acquire helpers + vmaf_vulkan_kernel_descriptor_sets_alloc helper. Upstream has no Vulkan backend — no merge surface.
  • core/src/feature/vulkan/{psnr_hvs,vif,float_vif,float_adm}_vulkan.c — fork-local kernel TUs, also no upstream peer.
  • Invariant: the four migrated kernels keep all per-frame VkFence + VkCommandBuffer + VkDescriptorSet resources alive across frames in the pool. Pre-bound descriptor sets rely on the kernel's VmafVulkanBuffer * handles being init-time stable (allocated in init(), freed only in close_fex). vmaf_vulkan_kernel_pipeline_destroy destroys the descriptor pool — pre-allocated sets are released implicitly via the pool; callers must NOT call vkFreeDescriptorSets on them.
  • Re-test on rebase:
meson setup build libvmaf -Denable_vulkan=enabled
ninja -C build
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
    meson test -C build test_vulkan_smoke \
                        test_vulkan_async_pending_fence \
                        test_vulkan_pic_preallocation
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature adm --backend vulkan --places 4

0107 — psnr_hvs_cuda async upload + persistent pinned staging (T-GPU-OPT-2/3)

  • Touches:
  • core/src/feature/cuda/integer_psnr_hvs_cuda.c — only consumer; fork-local from inception (T7-23 / ADR-0188 / ADR-0191). State adds upload_str (dedicated H2D stream), upload_done (cross-stream completion event), and per-plane persistent pinned h_uint_ref[3] / h_uint_dist[3] staging buffers allocated once in init_fex_cuda. The per-call helper upload_plane_cuda is split into issue_d2h_plane (pic-stream D2H), convert_plane (CPU normalise), and issue_h2d_plane (upload-stream H2D). submit_fex_cuda runs the three phases explicitly and records upload_done after the last H2D, then cuStreamWaitEvents on lc.str before kernel launches.
  • core/src/cuda/AGENTS.md — adds a rebase-sensitive invariant entry under §Rebase-sensitive invariants documenting the three-phase flow + persistent staging contract.
  • Invariant: the pinned h_uint_* and h_ref / h_dist buffers are never freed and re-allocated mid-stream; the H2Ds must run on upload_str (not on lc.str) so the cuStreamWaitEvent cross-stream link is meaningful; the upload_done event is recorded after the last H2D for the current frame and waited on once before the first kernel launch of that frame. CUDA graph capture (future T-GPU-OPT-N) depends on the no-per-frame-alloc invariant; collapsing the three-phase split or re-introducing per-frame vmaf_cuda_buffer_host_alloc calls breaks that follow-up. Bit-exactness gate is places=3 for psnr_hvs_y / cb / cr and the combined psnr_hvs (matches the existing matrix; not places=4).
  • On upstream sync: zero interaction. integer_psnr_hvs_cuda.c is fork-introduced (T7-23 / ADR-0188 / ADR-0191).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr_hvs --backend cuda --places 3

0227 — output.c writer-format unit tests (R3 of coverage-gap-2026-05-02)

  • Touches:
  • core/test/test_output.c (new) — exercises the four writers in core/src/output.c (XML / JSON / CSV / SUB) end-to-end via tmpfile()-backed sinks and a synthetic VmafFeatureCollector. Pure test-only; no production code change.
  • core/test/meson.build — registers test_output next to test_feature_collector (mirrors that test's wiring: link_with: libvmaf + libsvm objects + log/predict/metadata helpers).
  • Invariant: the test pulls libvmaf.c and output.c in via #include "*.c" (mirroring the precedent in test_feature_collector.c) so the per-translation-unit .gcno lands in the test build dir and gcovr aggregates output.c's coverage. The mu-test framework macro (mu_assert) deliberately early-returns from each static char *test_*() body, which is why every test body trips clang-analyzer-unix.Malloc "potential leak" notes (cleanup runs only on the success-tail path). This pattern is shared across every core/test/test_*.c file and is load-bearing (per ADR-0141 NOLINT carve-out): replacing it with goto-cleanup would obscure the per-assertion failure message.
  • On upstream sync: zero interaction. output.c is upstream-mirrored, but this PR doesn't touch it. The test only depends on the four public function signatures (vmaf_write_output_{xml,json,csv,sub}); if Netflix renames or reorders those, the test fails to compile and the rebase author updates it then.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && ./build/test/test_output

0126 — OSSF Scorecard policy (ADR-0263)

  • Touches: .github/workflows/scorecard.yml (line 45 — the github/codeql-action/upload-sarif@<sha> pin). The rest of the policy is doc-only (docs/adr/0263-*.md, docs/research/0053-*.md, changelog.d/security/). Upstream Netflix/vmaf does not ship a Scorecard workflow, so the path itself is fork-introduced and won't conflict.
  • Invariant: the upload-sarif SHA must point to a commit that currently exists in github/codeql-action's git tree. A SHA that was once v4 head but no longer exists in the action repository triggers Scorecard's "imposter commit" defence and breaks the workflow with a 400 error against api.scorecard.dev. Verify on every Dependabot bump by spot-checking gh api /repos/github/codeql-action/commits/<sha> returns 200.
  • On upstream sync: zero interaction.
  • Re-test on rebase:

```bash
# Confirm the pin still resolves to a real commit:
pin=$(grep -oE 'codeql-action/upload-sarif@[a-f0-9]{40}' \
    .github/workflows/scorecard.yml | head -1 | cut -d@ -f2)
gh api "/repos/github/codeql-action/commits/$pin" --jq '.sha'
# Then watch the next master push for a green Scorecard run:
gh run list --workflow scorecard --repo VMAFx/vmafx --limit 1
```
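
For tooling that prefers Python over `grep`, the pin extraction step can be sketched as below. This only mirrors the pattern used in the shell check; `extract_pin` is a hypothetical helper, the sample SHA is a placeholder rather than a real commit, and the sketch does not query the GitHub API.

```python
import re

# Same shape as the grep pattern: a full 40-hex SHA after the action name.
PIN_RE = re.compile(r"codeql-action/upload-sarif@([a-f0-9]{40})\b")


def extract_pin(workflow_text):
    """Return the pinned 40-character SHA, or None if the action is not SHA-pinned."""
    match = PIN_RE.search(workflow_text)
    return match.group(1) if match else None


# Placeholder SHA for illustration only; not a real codeql-action commit.
sample = "    uses: github/codeql-action/upload-sarif@" + "0123456789abcdef" * 2 + "01234567\n"
print(extract_pin(sample))
print(extract_pin("    uses: github/codeql-action/upload-sarif@v4\n"))
```

A tag reference such as `@v4` yields `None`, so a script built on this can flag an accidental move away from SHA pinning before the existence check runs.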

0228 — U-2-Net u2netp saliency replacement deferred (ADR-0265)

  • Touches: docs-only.
  • docs/adr/0265-u2netp-saliency-replacement-blocked.md — new ADR continuing the deferral chain started by ADR-0257.
  • docs/research/0055-u2netp-saliency-replacement-survey.md — new research digest (upstream survey + license + distribution + op-allowlist audit + alternatives walk).
  • docs/ai/models/mobilesal.md — pointer block updated to reference both ADR-0257 (first blocker) and ADR-0265 (second blocker).
  • model/tiny/registry.json: mobilesal_placeholder_v0 notes field updated to reference ADR-0265 alongside ADR-0257 (no schema / sha256 / file changes).
  • model/tiny/mobilesal.json — sidecar notes field updated in lockstep.
  • scripts/gen_mobilesal_placeholder_onnx.py — generator notes string updated so re-running is idempotent against the new sidecar / registry text.
  • CHANGELOG.md — Changed entry via changelog.d/changed/T6-2a-followup-u2netp-replacement-deferred.md.
  • docs/adr/README.md — index row via docs/adr/_index_fragments/0265-u2netp-saliency-replacement-blocked.md.
  • Invariant: zero C-side surface change. feature_mobilesal.c tensor-name contract (input input → output saliency_map, NCHW float32 [1, 3, H, W] → [1, 1, H, W]) is unchanged; the on-disk model/tiny/mobilesal.onnx (sha256 f1226310…) is unchanged; mobilesal_placeholder_v0's smoke: true flag is unchanged. Any future drop-in (U-2-Net via T6-2a-mirror-u2netp-via-release + T6-2a-widen-allowlist-resize, distilled student, or BASNet / PoolNet survey result) replaces the .onnx and bumps the registry sha256 without touching the C side.
  • On upstream sync: zero interaction. feature_mobilesal.c, the registry, the ADR, and the research digest are all fork-local (T6-2a; ADR-0218 / ADR-0257 / ADR-0265; not present in Netflix upstream).
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_mobilesal
python3 ai/scripts/validate_model_registry.py
bash scripts/docs/concat-adr-index.sh --check
bash scripts/release/concat-changelog-fragments.sh --check
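
The "replace the .onnx and bump the registry sha256" rule can be checked mechanically. The sketch below is not `validate_model_registry.py`: the registry layout it reads (a top-level `models` list with `id`, `onnx` and `sha256` keys) is a hypothetical schema chosen for illustration, and `stale_entries` is an invented helper name.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_entries(registry_path, model_dir):
    """Return ids whose recorded sha256 no longer matches the file on disk.

    Assumes a hypothetical schema: {"models": [{"id", "onnx", "sha256"}, ...]}.
    """
    registry = json.loads(Path(registry_path).read_text())
    stale = []
    for entry in registry["models"]:
        actual = file_sha256(Path(model_dir) / entry["onnx"])
        if actual != entry["sha256"]:
            stale.append(entry["id"])
    return stale
```

An empty result means every model file still matches its recorded digest; a non-empty one lists the rows that need a sha256 bump after a weights drop.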

0108 — ssim_accumulate_avx512 per-lane double reduction vectorised

  • ADR: ADR-0139 (existing; no new ADR — the per-lane reduction order is unchanged).
  • Touches:
  • core/src/feature/x86/ssim_avx512.c — the ssim_accumulate_block_avx512 body. The per-lane scalar ssim_accumulate_lane calls (16 of them) are replaced by two 8-wide __m512d passes that compute lv, cv, sv, and lv*cv*sv lane-wise in vector double. Aligned double[16] spill buffers replace the previous _Alignas(64) float[16]×6 spill, and the scalar accumulation loop now does 4×16 vaddsd instead of 16 invocations of the per-lane helper.
  • CHANGELOG.md — Changed entry.
  • This file — this entry.
  • Invariant (load-bearing for ADR-0139 bit-exactness):
  • Per-lane double computation order is byte-identical: ((2.0 * rm) * cm + C1) / l_den, then (2.0 * srsc + C2) / c_den, then (lv * cv) * sv. No FMA contraction (separate _mm512_mul_pd + _mm512_add_pd; _mm512_fmadd_pd is forbidden because it changes the rounding count and would diverge from scalar's two-step mul+add).
  • Float→double widening uses _mm512_cvtps_pd, which is IEEE-754-exact for finite floats (a float's 23-bit mantissa fits losslessly in a double's 52-bit mantissa).
  • Lane-by-lane left-to-right reduction order preserved: local_ssim += t_ssim[k] for k = 0..15. Tree reductions (pairwise add, dual-accumulator unroll) are forbidden — they break running-sum associativity against scalar.
  • AVX2 / NEON twins kept on the per-lane scalar path. Verified bit-identical against the new AVX-512 at --precision max on the Netflix src01_hrc00/01_576x324 and the checkerboard_1920_1080_10_3_*_0 pairs. The bit-exactness contract (ADR-0139) is per-lane, not per-ISA algorithm — so AVX2 / NEON stay scalar-per-lane until a dedicated PR vectorises them with the same care.
  • Rebase impact: zero conflict with Netflix upstream — the whole SSIM SIMD surface is fork-local (no upstream SSIM SIMD exists). Conflicts only arise if upstream changes ssim_accumulate_default_scalar in iqa/ssim_tools.c; in that case both the AVX2 / NEON per-lane helper and the AVX-512 vector-double block need a coordinated update preserving the three invariants above.
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
# Bit-exact at --precision max, scalar vs AVX2 vs AVX-512:
for MASK in 0 16 255; do
  core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --feature float_ms_ssim --feature float_ssim \
    --xml -o /tmp/m${MASK}.xml --precision max --cpumask $MASK
done
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m16.xml)   # empty
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m255.xml)  # empty
  • Why this matters on rebase: an upstream commit that touches core/src/feature/ssimulacra2.c could prompt a "let's also port the GPU XYB while we're here" follow-up. The ledger entry is the standing answer: don't, the measurement was redone on NVIDIA in May 2026 and the result still failed places=4 by five decades. See Research-0047.
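
The reduction-order and widening invariants listed for ssim_accumulate_block_avx512 above can be illustrated without any SIMD. The following is a minimal Python sketch, not fork code: the helper names and the four-element input are invented for the demonstration, and Python floats stand in for C doubles.

```python
import struct


def sequential_sum(values):
    """Left-to-right running sum, mirroring `local_ssim += t_ssim[k]`."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def pairwise_sum(values):
    """Tree reduction: sum each half recursively, then add the two halves."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


def to_f32(x):
    """Round x to float32, then widen it back to a Python float (a double)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical lane values chosen so that rounding depends on the add order.
lanes = [1e16, 1.0, -1e16, 1.0]
seq = sequential_sum(lanes)   # ((1e16 + 1) - 1e16) + 1
tree = pairwise_sum(lanes)    # (1e16 + 1) + (-1e16 + 1)
print(seq, tree)              # 1.0 0.0

# Widening a float32 to double and narrowing it again returns the same value.
f = to_f32(0.1)
print(to_f32(f) == f)         # True
```

The two sums differ even though both add the same four numbers, which is why a tree reduction cannot be substituted for the running sum when the result must match the scalar path bit for bit.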

0126 — FastDVDnet real upstream weights drop (ADR-0253)

  • What changed: replaces model/tiny/fastdvdnet_pre.onnx with the wrapped real upstream FastDVDnet checkpoint (sha256 eb9444cf6f07eefdc7f4f68d09131074dbd1dcee6f88a331ba684dd2fb5937d4, ~9.5 MiB), refreshes the sidecar model/tiny/fastdvdnet_pre.json, flips the registry row's smoke: true → false and adds license: "MIT" + the upstream commit pin c8fdf61. New exporter ai/scripts/export_fastdvdnet_pre.py (the older _placeholder.py exporter is retained for reference). New ADR docs/adr/0253-fastdvdnet-pre-real-weights.md; user-facing doc docs/ai/models/fastdvdnet_pre.md rewritten with provenance, license attribution, and reproduce-the-export instructions.
  • Upstream source: fork-local. Netflix/vmaf does not ship a FastDVDnet temporal pre-filter; the C extractor and ONNX surface are entirely fork-introduced (ADR-0215). The wrapped weights are attribution-only (upstream m-tassano/fastdvdnet MIT).
  • On upstream sync: zero interaction. Every file touched (ai/scripts/export_fastdvdnet_pre*.py, model/tiny/fastdvdnet_pre.*, docs/ai/models/fastdvdnet_pre.md, docs/adr/0253-*.md, CHANGELOG fragment, ADR index fragment) lives in fork-introduced trees.
  • Re-test on rebase:
# Re-derive the ONNX from the pinned upstream checkpoint.
mkdir -p /tmp/fastdvdnet_upstream && cd /tmp/fastdvdnet_upstream
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/model.pth
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/models.py
cd /path/to/vmaf
python3 ai/scripts/export_fastdvdnet_pre.py \
    --upstream-dir /tmp/fastdvdnet_upstream
python3 ai/scripts/validate_model_registry.py
meson test -C build --suite=fast --print-errorlogs test_fastdvdnet_pre
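
After re-deriving the ONNX, it can be useful to compare the file's digest against the pinned value by hand. The sketch below assumes the sha256 quoted in this entry is computed over the raw bytes of model/tiny/fastdvdnet_pre.onnx; the helper names are hypothetical and the demo hashes a throwaway empty file rather than the real model.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_hex):
    """True when the file's digest equals the pinned hex string."""
    return sha256_of(path) == pinned_hex.lower()


# Demo on an empty temporary file, whose SHA-256 is a well-known constant.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
demo_ok = matches_pin(demo_path, EMPTY_SHA256)
os.remove(demo_path)
print(demo_ok)  # True
```

For the real check, pass the regenerated ONNX path and the digest from the "What changed" bullet to matches_pin.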

0127 — ONNX op-allowlist gains Resize (ADR-0258)

  • Touches:
  • core/src/dnn/op_allowlist.c — fork-local file (no upstream counterpart). One new entry "Resize" under the /* convolutional */ block.
  • core/test/dnn/test_op_allowlist.c, core/test/dnn/test_onnx_scan.c — fork-local DNN tests.
  • ai/tests/test_op_allowlist.py — fork-local Python parity test.
  • Invariant: the C allowlist is the single source of truth; the Python regex parser in ai/src/vmaf_train/op_allowlist.py walks the same op_allowlist.c file. Any future entry only needs the C edit — Python symmetry is automatic.
  • Upstream source: fork-local. Netflix/vmaf has no ONNX op-allowlist surface; the entire core/src/dnn/ tree is fork-introduced.
  • On upstream sync: zero interaction. Every file touched lives in fork-introduced trees.
  • Re-test on rebase:
meson test -C build test_op_allowlist test_onnx_scan
PYTHONPATH=ai/src python -m pytest ai/tests/test_op_allowlist.py
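
The single-source-of-truth invariant above relies on a regex pass over the C file. The sketch below shows the general idea only: the array name, the comment layout, and the op names other than "Resize" are assumptions made for the demo, and the real parser in ai/src/vmaf_train/op_allowlist.py may work differently.

```python
import re

# Hypothetical excerpt standing in for op_allowlist.c (layout is assumed).
C_SNIPPET = '''
static const char *const op_allowlist[] = {
    /* convolutional */
    "Conv",
    "Resize",
    /* activations */
    "Relu",
};
'''


def parse_allowlist(c_source):
    """Return the quoted op names found in the first brace-delimited initializer."""
    match = re.search(r"\{(.*?)\};", c_source, re.S)
    if match is None:
        return []
    body = re.sub(r"/\*.*?\*/", "", match.group(1), flags=re.S)  # drop C comments
    return re.findall(r'"([A-Za-z0-9_]+)"', body)


ops = parse_allowlist(C_SNIPPET)
print(ops)  # ['Conv', 'Resize', 'Relu']
```

Because the Python side only reads the C source, adding an op to the C array is enough for both sides to agree.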

0231 — vif.comp + ciede.comp precise decorations (ADR-0269 / Step A of Vulkan 1.4 bump)

  • Touches: core/src/feature/vulkan/shaders/vif.comp (3 local-variable type qualifiers: g, sv_sq, gg_sigma_f → precise float), core/src/feature/vulkan/shaders/ciede.comp (yuv_to_rgb outputs, rgb_to_xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE).
  • Invariant: Both shaders are fork-local (Vulkan backend is fork-added; upstream Netflix/vmaf has no Vulkan compute kernels). The precise keyword is GLSL 4.50 standard syntax; glslc 2026.1 lowers it to per-result OpDecorate NoContraction. The decorations are load-bearing for the cross-backend gate on NVIDIA driver 595.71+ — removing them would re-introduce the 42/48 ciede regression at API 1.3 documented in research-0054.
  • On upstream sync: zero interaction. Both shader files are entirely fork-introduced; upstream has no Vulkan compute path.
  • Re-test on rebase:
# Re-confirm the cross-backend gate on a Vulkan-capable host.
meson setup core/build -Denable_vulkan=enabled
ninja -C core/build
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature ciede --backend vulkan --places 4
# Confirm SPIR-V still emits NoContraction post-rebase.
glslc --target-env=vulkan1.3 -O \
    core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
spirv-dis /tmp/vif.spv | grep -c NoContraction   # expect ≥ 60

Expected on NVIDIA 595.71+: vif 0/48 OK, ciede 5/48 FAIL (max abs 8.9e-05 — pre-existing fork debt at API 1.3, see ADR-0269). On RADV / lavapipe: bit-exact (precise is a no-op there).
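
To make the places=4 gate concrete, the sketch below assumes the gate is an absolute-difference threshold of 0.5 × 10^-places, in the style of a rounded decimal comparison. That assumption is consistent with the 1.78× threshold ratio quoted elsewhere in these notes for the same 8.9e-05 gap, but the actual logic of cross_backend_vif_diff.py is not shown here, and the score lists are invented.

```python
def places_threshold(places):
    """Largest absolute difference that still rounds to zero at `places` decimals."""
    return 0.5 * 10 ** (-places)


def count_mismatches(reference, candidate, places=4):
    """Count per-frame scores whose absolute difference reaches the threshold."""
    limit = places_threshold(places)
    return sum(1 for r, c in zip(reference, candidate) if abs(r - c) >= limit)


# Invented per-frame scores: one small gap that passes, one 8.9e-05 gap that fails.
cpu = [10.0, 20.0, 30.0]
gpu = [10.0 + 1e-05, 20.0 + 8.9e-05, 30.0]

print(places_threshold(4))              # 5e-05
print(count_mismatches(cpu, gpu))       # 1
print(8.9e-05 / places_threshold(4))    # about 1.78
```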

0229 — fr_regressor_v2 codec-aware scaffold (ADR-0272)

  • ADR: ADR-0272
  • Touches:
  • ai/scripts/train_fr_regressor_v2.py (new) — Phase A JSONL consumer; trains the codec-aware FRRegressor.
  • model/tiny/fr_regressor_v2.onnx (new, smoke) — placeholder ONNX from --smoke mode; re-baked on production training.
  • model/tiny/fr_regressor_v2.json (new) — sidecar.
  • model/tiny/registry.json — new entry with smoke: true.
  • docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md (new).
  • docs/adr/README.md — index row.
  • docs/research/0058-fr-regressor-v2-feasibility.md (new).
  • docs/ai/models/fr_regressor_v2.md (new) — model card.
  • ai/AGENTS.md — invariant note (codec block layout + ENCODER_VOCAB ordering).
  • CHANGELOG.md — Added entry.
  • Invariant: the 8-D codec block layout is [encoder_onehot(6), preset_norm, crf_norm] with ENCODER_VOCAB = (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown) in load-bearing order. CRF normaliser is /63 (union upper bound). Preset normaliser is /9. Bumping the vocabulary requires a re-train; existing checkpoints pin the order they were trained against via encoder_vocab_version in the sidecar. The two-input ONNX (features, codec) follows the LPIPS-Sq precedent (ADR-0040 / ADR-0041).
  • Rebase impact: entirely fork-local; pure additive; no upstream-mirror file is touched. Phase A schema (consumed by this trainer) is itself fork-local (tools/vmaf-tune/). No conflict expected on /sync-upstream.
  • Re-test on rebase:
python ai/scripts/train_fr_regressor_v2.py --smoke
python ai/scripts/validate_model_registry.py
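
The 8-D codec block described in the invariant can be sketched as follows. The vocabulary order and the /9 and /63 normalisers come from the invariant; the function name, the use of an integer preset index, and the fallback of unlisted encoders onto the "unknown" slot are assumptions for illustration.

```python
# Order is load-bearing: it fixes which one-hot slot each encoder occupies.
ENCODER_VOCAB = ("libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown")


def codec_block(encoder, preset_index, crf):
    """Build [encoder_onehot(6), preset_norm, crf_norm] as a list of 8 floats."""
    name = encoder if encoder in ENCODER_VOCAB else "unknown"  # assumed fallback
    onehot = [0.0] * len(ENCODER_VOCAB)
    onehot[ENCODER_VOCAB.index(name)] = 1.0
    return onehot + [preset_index / 9.0, crf / 63.0]


block = codec_block("libx265", preset_index=5, crf=23)
print(len(block))   # 8
print(block[:6])    # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Reordering ENCODER_VOCAB would silently move the hot slot, which is why existing checkpoints pin the order they were trained against.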

0311 — libFuzzer harness expansion: yuv_input + cli_parse (ADR-0311)

  • ADR: ADR-0311; parent ADR-0270.
  • Touches:
  • core/test/fuzz/fuzz_yuv_input.c (new)
  • core/test/fuzz/fuzz_cli_parse.c (new)
  • core/test/fuzz/meson.build — two new executable(...) blocks for the harnesses, plus a shared fuzz_vidinput_sources list.
  • core/test/fuzz/yuv_input_corpus/* (new — 6 seeds covering 8/10-bit × 4:2:0 / 4:2:2 / 4:4:4 plus a truncated-frame seed).
  • core/test/fuzz/cli_parse_corpus/* (new — 6 seeds covering the --feature, --model, --reference, YUV-flag, and --help shapes).
  • core/test/fuzz/README.md — Targets table extended.
  • .github/workflows/fuzz.yml — matrix gains fuzz_yuv_input + fuzz_cli_parse; per-harness wall-clock budget reduced from 300 s to 60 s so the 3-target matrix fits the existing timeout-minutes: 15 cap.
  • docs/development/fuzzing.md — runbook table + smoke commands extended.
  • docs/adr/0311-libfuzzer-harness-expansion.md (new)
  • docs/research/0083-libfuzzer-harness-expansion-target-survey.md (new)
  • libvmaf/AGENTS.md — new invariant block for the one-parser-one-harness rule.
  • CHANGELOG.md — Added entry.
  • Invariant:
  • The fuzz scaffold remains opt-in (-Dfuzz=true) — every default meson setup invocation must continue to skip it.
  • fuzz_yuv_input re-includes tools/yuv_input.c and the rest of the vidinput trio as build inputs. Upstream Netflix/vmaf splits or renames of those source files need the matching meson.build source-list update.
  • fuzz_cli_parse re-includes tools/cli_parse.c as a build input and links against libvmaf for vmaf_version() and feature-dictionary symbols. The -Wl,--wrap=exit link arg is load-bearing — without it, usage()'s exit(1) would terminate the fuzzer process on first bad input.
  • LLVMFuzzerTestOneInput keeps external linkage; the scaffold-wide // NOLINTNEXTLINE(misc-use-internal-linkage) pattern is correct for libFuzzer's name-resolved entry-point ABI.
  • Rebase impact: any upstream sync that touches core/tools/{yuv_input,cli_parse}.c must re-run the 60 s smoke per harness on the merged tip; record any new-found crash-* artefact under the matching <target>_known_crashes/ dir, not in <target>_corpus/. The __wrap_exit shim in fuzz_cli_parse.c is GNU-ld / lld-only; do not assume it works on Apple ld without an -undefined,dynamic_lookup fallback.
  • Re-test on rebase:
CC=clang CXX=clang++ \
  meson setup build-fuzz libvmaf \
    --buildtype=debug \
    -Db_sanitize=address \
    -Db_lundef=false \
    -Dfuzz=true \
    -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz \
    test/fuzz/fuzz_y4m_input \
    test/fuzz/fuzz_yuv_input \
    test/fuzz/fuzz_cli_parse
./build-fuzz/test/fuzz/fuzz_yuv_input \
    -seed=0 -runs=1000 \
    core/test/fuzz/yuv_input_corpus/
./build-fuzz/test/fuzz/fuzz_cli_parse \
    -seed=0 -runs=1000 \
    core/test/fuzz/cli_parse_corpus/
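
When adding seeds to yuv_input_corpus/, it helps to know how many bytes a complete raw frame occupies for each bit-depth and chroma combination. The sketch below assumes planar YUV with 10-bit samples stored in two bytes each; the function names are hypothetical and the contents of the existing seeds are not described in this entry.

```python
def frame_size(width, height, chroma, bitdepth):
    """Bytes in one planar YUV frame (luma plane plus two chroma planes)."""
    bytes_per_sample = 1 if bitdepth <= 8 else 2
    if chroma == "420":
        cw, ch = (width + 1) // 2, (height + 1) // 2
    elif chroma == "422":
        cw, ch = (width + 1) // 2, height
    elif chroma == "444":
        cw, ch = width, height
    else:
        raise ValueError(f"unsupported chroma format: {chroma}")
    return (width * height + 2 * cw * ch) * bytes_per_sample


def truncated_seed(width, height, chroma, bitdepth, missing=1):
    """Zero-filled buffer that is `missing` bytes short of a full frame."""
    return bytes(frame_size(width, height, chroma, bitdepth) - missing)


print(frame_size(576, 324, "420", 8))        # 279936
print(frame_size(576, 324, "444", 10))       # 1119744
print(len(truncated_seed(4, 4, "420", 8)))   # 23
```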

0229 — libFuzzer scaffold for the YUV4MPEG2 parser (ADR-0270)

  • ADR: ADR-0270
  • Touches:
  • core/test/fuzz/fuzz_y4m_input.c (new)
  • core/test/fuzz/meson.build (new)
  • core/test/fuzz/README.md (new)
  • core/test/fuzz/y4m_input_corpus/* (new — six seeds)
  • core/test/fuzz/y4m_input_known_crashes/* (new — one 411-chroma OOB reproducer; excluded from CI corpus)
  • core/test/meson.build — subdir('fuzz') line.
  • core/meson_options.txt — new option('fuzz', ...).
  • .github/workflows/fuzz.yml (new — nightly 5-minute job).
  • docs/development/fuzzing.md (new — operator runbook).
  • docs/adr/0270-fuzzing-scaffold.md (new)
  • docs/research/0059-libfuzzer-scaffold-y4m.md (new)
  • docs/state.md — new Open-bug row for the 411-chroma OOB write.
  • CHANGELOG.md — Added entry.
  • Invariant: the fuzz scaffold is opt-in — every default meson setup invocation must continue to skip it. The harness links statically against core/tools/{y4m_input,yuv_input,vidinput}.c rather than libvmaf.so so the public C-API surface stays unchanged.
  • Rebase impact: the harness re-includes core/tools/y4m_input.c as a build input. Any upstream Netflix/vmaf change that splits or renames the tool sources (e.g. moves the parser into core/src/) needs the corresponding meson.build source list update and the harness re-test below. The y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m reproducer is the regression gate for the parser fix; do not delete it on upstream sync — if upstream lands the same fix, port the reproducer back into y4m_input_corpus/ as a permanent seed.
  • Re-test on rebase:
CC=clang CXX=clang++ \
  meson setup build-fuzz libvmaf \
    --buildtype=debug \
    -Db_sanitize=address \
    -Db_lundef=false \
    -Dfuzz=true \
    -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz test/fuzz/fuzz_y4m_input
./build-fuzz/test/fuzz/fuzz_y4m_input \
    -max_total_time=60 \
    core/test/fuzz/y4m_input_corpus/
# Verify the known-crash reproducer still triggers (until the fix lands):
./build-fuzz/test/fuzz/fuzz_y4m_input \
    core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m

0231 — HIP seventh-consumer kernel float_motion_hip (ADR-0273)

  • ADR: ADR-0273
  • Touches:
  • core/src/feature/hip/float_motion_hip.c (new) — seventh consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_motion_cuda.c call-graph-for-call-graph; init/submit/collect/close invoke the kernel-template helpers in the same order; flush() callback for tail-frame motion2 emission; motion_force_zero short-circuit posture (fex->extract swap with submit / collect / flush / close nulled). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-WG SAD float partials directly, no atomic, no memset).
  • core/src/feature/hip/float_motion_hip.h (new)
  • core/src/hip/meson.build — new entry in hip_sources.
  • core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — new sub-test test_float_motion_hip_extractor_registered (also asserts the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit) and a row in test_table[].
  • docs/adr/0273-hip-seventh-consumer-float-motion.md (new)
  • docs/adr/README.md — index row.
  • docs/backends/hip/overview.md — seventh / eighth consumer note.
  • core/src/hip/AGENTS.md — invariant note.
  • CHANGELOG.md — Added entry (joint with ADR-0274).
  • Invariant — three-buffer ping-pong + motion_force_zero short-circuit are load-bearing. The state struct carries three uintptr_t buffer slots (ref_in, blur[2]) that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *ref_in + VmafCudaBuffer *blur[2] field shape. The motion_force_zero short-circuit (fex->extract swap, kernel-template helpers nulled) must stay aligned with the CUDA twin on every refactor — otherwise the runtime PR's helper-body flip diverges between the two backends. The submit_pre_launch bypass mirrors the CUDA twin; if a future PR adds a submit_pre_launch call to float_motion_cuda.c's submit path, the HIP twin must follow in the same PR.
  • Rebase impact: entirely fork-local. New files are HIP-specific. The only upstream-touching edit is feature_extractor.c, but the change sits inside an existing #if HAVE_HIP block (ADR-0241); upstream has no HAVE_HIP so no conflict is expected.
  • Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
  -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke

0232 — HIP eighth-consumer kernel float_ssim_hip (ADR-0274)

  • ADR: ADR-0274
  • Touches:
  • core/src/feature/hip/float_ssim_hip.c (new) — eighth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_ssim_cuda.c call-graph-for-call-graph (the CUDA file registers vmaf_fex_float_ssim_cuda despite its integer_ filename). First multi-dispatch HIP consumer (chars.n_dispatches_per_frame == 2). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-block float partials directly). State struct carries five uintptr_t intermediate float buffer slots (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) tracked outside the kernel-template's readback bundle. validate_dims_hip and init_dims_hip helpers extracted from init() to fit the readability-function-size budget.
  • core/src/feature/hip/float_ssim_hip.h (new)
  • core/src/hip/meson.build — new entry in hip_sources.
  • core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — new sub-test test_float_ssim_hip_extractor_registered (also asserts chars.n_dispatches_per_frame == 2) and a row in test_table[].
  • docs/adr/0274-hip-eighth-consumer-float-ssim.md (new)
  • docs/adr/README.md — index row.
  • docs/backends/hip/overview.md — seventh / eighth consumer note (joint).
  • core/src/hip/AGENTS.md — invariant note.
  • CHANGELOG.md — Added entry (joint with ADR-0273).
  • Invariant — multi-dispatch + five-slot buffer pyramid + v1 scale=1 validation are load-bearing. The state struct carries five uintptr_t intermediate float buffer slots that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *h_* field shape — any drift in the CUDA twin's slot count requires a paired update here. The chars.n_dispatches_per_frame == 2 characteristic is asserted in the smoke test; do not silently lower it. The v1 scale=1 -EINVAL validation surface (in validate_dims_hip) must stay aligned with the CUDA twin's compute_scale / vmaf_log chain. The HIP twin's validate_dims_hip / init_dims_hip extraction is intentional for the function-size budget; do not re-inline without verifying the budget still passes.
  • Rebase impact: entirely fork-local; same posture as ADR-0273.
  • Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
  -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke

0280 — vmaf-tune NVENC codec adapters (ADR-0290)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_nvenc,hevc_nvenc,av1_nvenc,_nvenc_common}.py (new). Wholly fork-local — no upstream Netflix/vmaf overlap.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry expanded.
  • tools/vmaf-tune/tests/test_codec_adapter_nvenc.py (new).
  • tools/vmaf-tune/tests/test_corpus.py — Phase-A registry assertion updated.
  • tools/vmaf-tune/AGENTS.md — invariant note expanded.
  • docs/usage/vmaf-tune.md — "Hardware encoders (NVENC)" section.
  • docs/adr/0290-vmaf-tune-nvenc-adapters.md (new) + docs/adr/README.md index row.
  • docs/research/0065-vmaf-tune-nvenc-adapters.md (new).
  • CHANGELOG.md — Added entry.
  • Invariant: known_codecs() returns the four-codec tuple ("av1_nvenc", "h264_nvenc", "hevc_nvenc", "libx264"); the mnemonic preset map (ultrafast/superfast/veryfast → p1, faster → p2, fast → p3, medium → p4, slow → p5, slower → p6, slowest/placebo → p7) is the canonical cross-codec preset alignment that downstream Phase B/C consumers assume. The CQ window is the hardware-permitted [0, 51]; the Phase A informative window is [15, 40].
  • Rebase impact: zero — tools/vmaf-tune/ is wholly fork-local and has no upstream Netflix/vmaf path overlap.
  • Re-test on rebase:
cd tools/vmaf-tune && python -m pytest tests/ -q

0228 — vmaf-tune libx265 codec adapter (ADR-0288)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry add), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions(stderr, encoder=…) gains a per-codec branch), tools/vmaf-tune/src/vmaftune/cli.py (help-text wording only), tools/vmaf-tune/tests/test_codec_adapter_x265.py (new), tools/vmaf-tune/tests/test_corpus.py (membership-based codec list assertion).
  • Invariant: the codec-adapter contract documented in tools/vmaf-tune/AGENTS.md (multi-codec from day one; the search loop never branches on codec identity). The parse_versions signature is still backward-compatible — encoder defaults to libx264 so callers from before this PR keep working.
  • Upstream source: fork-local. tools/vmaf-tune/ is fork-only; upstream Netflix/vmaf does not ship encode automation.
  • On upstream sync: zero interaction. Confirm the _index_fragments/_order.txt row for 0288-vmaf-tune-codec-adapter-x265 remains present after any cross-merge.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -x

0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry row + import), tools/vmaf-tune/tests/test_corpus.py (membership assertion relaxed from == ("libx264",) to "libx264" in known_codecs()), tools/vmaf-tune/tests/test_codec_adapter_libaom.py (new), tools/vmaf-tune/AGENTS.md (preset-vocabulary invariant).
  • Invariant: the cross-codec preset vocabulary (placebo, slowest, slower, slow, medium, fast, faster, veryfast, superfast, ultrafast) is shared across AV1-family adapters so one --preset axis covers x264 / x265 / svtav1 / libaom-av1. Each adapter maps the human name onto its codec-specific knob; do not introduce per-adapter preset names.
  • Upstream source: fork-local. tools/vmaf-tune/ is the fork-introduced quality-aware encode automation harness (ADR-0237); it has no upstream Netflix/vmaf counterpart.
  • On upstream sync: zero interaction with upstream/master. Self-contained in tools/vmaf-tune/ and docs/.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/

0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)

  • ADR: ADR-0275
  • Touches:
  • model/tiny/vmaf_tiny_v3.int8.onnx (new, 4 267 B)
  • model/tiny/vmaf_tiny_v4.int8.onnx (new, 7 769 B)
  • model/tiny/registry.json — new vmaf_tiny_v3 and vmaf_tiny_v4 rows with quant_mode, int8_sha256, quant_accuracy_budget_plcc fields.
  • model/tiny/vmaf_tiny_v3.json, model/tiny/vmaf_tiny_v4.json — same fields mirrored into the per-model sidecars.
  • docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md — new "Quantisation" sections.
  • docs/adr/0275-vmaf-tiny-v3-v4-ptq.md (new) and ADR index row.
  • CHANGELOG.md — Added entry.
  • Invariant: python ai/scripts/measure_quant_drop.py --all reports [PASS] for both vmaf_tiny_v3 (drop ≤ 0.001 on Netflix features) and vmaf_tiny_v4 (drop ≤ 0.001), inside the 0.01 per-model budget. The runtime redirect from ADR-0174 picks the .int8.onnx sibling when an operator's registry overlay declares quant_mode: dynamic.
  • Rebase impact: entirely fork-local — neither v3 nor v4 nor the dynamic-PTQ harness exists upstream. The new int8 ONNX bytes ship as committed binaries (mirroring learned_filter_v1 and nr_metric_v1); they are well below the few-MB external-data threshold and don't require the sigstore + .onnx.data pattern.
  • Re-test on rebase:

python ai/scripts/validate_model_registry.py
python ai/scripts/measure_quant_drop.py --all

0229 — NVIDIA-Vulkan ciede2000 places=4 fork debt root-cause (ADR-0273)

  • Touched files: docs-only.
  • docs/adr/0273-...precision-gap.md (new) + _index_fragments/ row + _order.txt append.
  • docs/research/0055-ciede-vulkan-nvidia-f32-f64-root-cause.md (new) + docs/research/README.md index row.
  • docs/state.md — Open-bugs row T-VK-CIEDE-F32-F64.
  • docs/backends/vulkan/overview.md — NVIDIA-hardware caveat.
  • changelog.d/changed/ciede-vulkan-nvidia-f32-f64-precision-gap.md (new).
  • core/src/vulkan/AGENTS.md — invariant cross-link.
  • Invariant: the ciede.comp shader's f32 precision contract is load-bearing — promoting to f64 would silently change scores on every Vulkan device that supports shaderFloat64 and create a per-device-feature-bit divergence (RTX 4090 has it; many consumer GPUs don't). The CPU ciede.c::get_lab_color doing its colour-space chain in double is upstream Netflix behaviour and must not be narrowed to f32 to "fix" the GPU gap (would change Netflix golden ground truth). The 5/48 NVIDIA places=4 mismatch on the highest-ΔE frames is expected and documented; do not attempt to "fix" it without re-reading ADR-0273 first.
  • Rebase impact: zero — docs-only. The CPU and shader sources this ADR analyses are unchanged by this PR. If a future upstream rebase touches ciede.c::get_lab_color (the double chain) the ADR's reasoning still holds; if upstream changes the CPU reference's precision posture, ADR-0273 needs a Status: Superseded entry.
  • Re-test on rebase: a manual NVIDIA-hardware run if available:

cd libvmaf && meson setup build \
    -Denable_vulkan=enabled -Denable_cuda=false && ninja -C build
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary $PWD/core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature ciede --backend vulkan --device 0 --places 4
# Expected post-PR-346 (when merged): 5/48 mismatches at 1.78× threshold.
# Expected pre-PR-346 (current master): 42/48 mismatches at higher ratio.
# If the count drops below 5/48 on NVIDIA, ADR-0273 should record the
# delta and consider closing T-VK-CIEDE-F32-F64.

0229 — tools/vmaf-tune fast Phase A.5 scaffold (ADR-0276)

  • Touches: tools/vmaf-tune/src/vmaftune/fast.py (new), tools/vmaf-tune/src/vmaftune/cli.py (new fast subcommand branch), tools/vmaf-tune/pyproject.toml (new [fast] extra), tools/vmaf-tune/tests/test_fast.py (new), tools/vmaf-tune/AGENTS.md (new invariants), docs/usage/vmaf-tune.md (new "Phase A.5" section), docs/adr/0276-vmaf-tune-fast-path.md (new ADR), docs/research/0060-vmaf-tune-fast-path.md (new digest).
  • Invariant: the fast subcommand is opt-in and never automatically replaces the Phase A grid path. The slow grid is the ground-truth corpus generator (ADR-0237 contract); fast-path is for the recommendation use case only. Optuna is a lazy-imported optional dep gated behind the [fast] extra — importing it at module scope outside fast.py (or its tests) breaks the zero-dep core install.
  • Rebase impact: entirely fork-local; the tool sits under tools/vmaf-tune/ which is fork-added, and no upstream files are touched. Upstream Netflix/vmaf has no analogous surface.
  • Re-test on rebase:
pip install -e 'tools/vmaf-tune[fast]'
pytest tools/vmaf-tune/tests/test_fast.py -v
vmaf-tune fast --smoke --target-vmaf 92
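
The lazy-import invariant above can be kept with a small helper that resolves the optional dependency only when the fast path actually runs. This is a sketch under assumptions, not the contents of fast.py: the helper names and the error wording are hypothetical.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency on demand, with an actionable error if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is needed for this subcommand; "
            f"install it with: pip install 'vmaf-tune[{extra}]'"
        ) from exc


def run_fast_search():
    """Hypothetical entry point: optuna is imported here, never at module scope."""
    optuna = require_optional("optuna", "fast")
    return optuna


# A stdlib module resolves normally; a missing module raises RuntimeError.
print(require_optional("json", "fast").__name__)  # json
```

Keeping the import inside the function means that importing the package itself never touches optuna, so the zero-dependency core install keeps working.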

0229 — vmaf-tune recommend subcommand (ADR-0237 Phase B-lite)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/recommend.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds recommend subparser; corpus subcommand untouched.
  • tools/vmaf-tune/tests/test_recommend.py (new). 13-case smoke suite, mocks all binaries; runs in <100 ms.
  • docs/usage/vmaf-tune.md — adds ## recommend section.
  • Invariant: recommend consumes the existing CORPUS_ROW_KEYS schema unchanged — vmaf_score, bitrate_kbps, crf, preset, encoder, exit_status. No schema bump. If a future PR bumps SCHEMA_VERSION, both the corpus writer and the recommend reader must be updated in lockstep; tests assert this via test_corpus_row_keys_match_init_contract.
  • Rebase impact: zero — tools/vmaf-tune/ is wholly fork-local; no upstream surface touches it.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/
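
A reader of corpus rows only needs the six keys named in the invariant. The sketch below shows one plausible selection rule, which is an assumption rather than the documented behaviour of recommend.py: among successful encodes that reach a target VMAF, pick the row with the lowest bitrate. The sample rows are invented.

```python
def recommend(rows, target_vmaf):
    """Return the cheapest successful row meeting the target, or None."""
    candidates = [
        r for r in rows
        if r["exit_status"] == 0 and r["vmaf_score"] >= target_vmaf
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["bitrate_kbps"])


# Invented rows using the CORPUS_ROW_KEYS fields listed above.
rows = [
    {"vmaf_score": 95.1, "bitrate_kbps": 4200, "crf": 20, "preset": "medium",
     "encoder": "libx264", "exit_status": 0},
    {"vmaf_score": 92.4, "bitrate_kbps": 2600, "crf": 24, "preset": "medium",
     "encoder": "libx264", "exit_status": 0},
    {"vmaf_score": 93.0, "bitrate_kbps": 2000, "crf": 25, "preset": "medium",
     "encoder": "libx264", "exit_status": 1},
    {"vmaf_score": 88.7, "bitrate_kbps": 1500, "crf": 28, "preset": "medium",
     "encoder": "libx264", "exit_status": 0},
]

best = recommend(rows, target_vmaf=92)
print(best["crf"], best["bitrate_kbps"])  # 24 2600
```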

0228 — integer_ms_ssim_cuda.c joins drain_batch (T-GPU-OPT-2 / ADR-0271)

  • Touches: core/src/feature/cuda/integer_ms_ssim_cuda.c. No upstream Netflix/vmaf changes expected here — the file is fork-added (CUDA twin of the upstream-port ms_ssim_score.cu) and the surface this PR redrew (per-scale l_partials[i] / c_partials[i] / s_partials[i] arrays + the per-scale h_l_partials[i] / h_c_partials[i] / h_s_partials[i] pinned host shadows + the submit() <→ collect() work redistribution + the cuEventRecord(s->lc.finished, s->lc.str) + vmaf_cuda_drain_batch_register(&s->lc) tail) is also entirely fork-local.
  • Invariant: the engine-scope drain-batch contract from ADR-0271 / drain_batch.h. The kernel-launch order on s->lc.str must stay stable: decimate (× 4) then for each scale i ∈ 0..4 horizvert_lcs ⇒ DtoH(l_partials[i]) ⇒ DtoH(c_partials[i]) ⇒ DtoH(s_partials[i]) then cuEventRecord(s->lc.finished, s->lc.str) then vmaf_cuda_drain_batch_register(&s->lc). Same-stream ordering is what makes the shared SSIM intermediates (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) safe across scales without explicit sync — any change that parallelises the per-scale work onto multiple streams breaks bit-exactness unless per-scale intermediates are also added.
  • On upstream sync: zero interaction (the file is fork-added). If a future upstream PR adds an integer_ms_ssim_cuda.c of its own, the merger must reconcile the per-scale partials topology + the drain_batch tail with whatever the new upstream shape brings.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build  # confirms the CPU build still links cleanly
# If the dev host has a working nvcc / host-compiler pair:
meson setup build_cuda -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda src/liblibvmaf_feature.a.p/feature_cuda_integer_ms_ssim_cuda.c.o
# Netflix CPU golden gate (CPU is the bit-exactness ground truth):
make test-netflix-golden
# Cross-backend parity (places=4 gate, ADR-0214):
/cross-backend-diff

0277 — ffmpeg-patches refresh against n8.1 — 2026-05-04 (ADR-0277)

  • Touches: ffmpeg-patches/ is unchanged (no content drift). Doc-only entries land in:
  • docs/adr/0277-ffmpeg-patches-refresh-2026-05-04.md — new ADR.
  • docs/adr/_index_fragments/0277-ffmpeg-patches-refresh-2026-05-04.md — index row.
  • docs/adr/_index_fragments/_order.txt — manifest append.
  • changelog.d/changed/ffmpeg-patches-refresh-2026-05-04.md — Changed entry.
  • This file — this entry.
  • Invariant: ffmpeg-patches/series.txt order is load-bearing — patches 0002…0006 build on each other and only apply cleanly cumulatively. The verification gate is a series replay, not a per-patch git apply --check (per ADR-0118 + CLAUDE.md §12 r14).
  • On upstream sync: zero interaction. Netflix/vmaf has no ffmpeg-patches/ tree; this is a fork-local integration surface.
  • Re-test on rebase (also: re-replay procedure for the next refresh):
# Clone pristine n8.1
git -C /tmp clone --depth 1 --branch n8.1 \
  https://github.com/FFmpeg/FFmpeg.git ff-replay-$(date +%F)
cd /tmp/ff-replay-$(date +%F)
git switch -c refresh-$(date +%F)
git config user.email refresh@local && git config user.name "Refresh Bot"

# Replay the series cumulatively
for p in /path/to/vmaf/ffmpeg-patches/000*-*.patch; do
  git am --3way "$p" || break
done

# Regenerate and compare to in-tree
mkdir -p /tmp/ff-regen-$(date +%F)
git format-patch n8.1.. -o /tmp/ff-regen-$(date +%F)/

# Diff old vs new excluding pure format-patch noise
for i in 1 2 3 4 5 6; do
  orig=$(ls /path/to/vmaf/ffmpeg-patches/000${i}-*.patch)
  regen=$(ls /tmp/ff-regen-$(date +%F)/000${i}-*.patch)
  diff -u \
    <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$orig") \
    <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$regen") \
    | head -40
done

If only stylistic diffs surface (PATCH N/M numbering, MIME headers, hunk-context counts, hunk offset shifts against cumulative state), keep originals — record a no-drift refresh ADR. If real content drift surfaces, regenerate and ship the refresh PR with the regenerated patches plus a content-summary ADR.

End-to-end vf_libvmaf smoke is best run from CI (ffmpeg-integration.yml) against an installed libvmaf prefix — the meson-uninstalled .pc does not satisfy FFmpeg's #include <libvmaf.h> probe (the headers live under libvmaf/libvmaf.h only; the system-installed .pc carries an extra -I${includedir}/libvmaf shortcut that the uninstalled .pc omits).

0229 — T7-5 NOLINT-sweep closeout (ADR-0278)

  • Touched files:
  • core/src/feature/integer_adm.c (1 NOLINT cite, line ~988 adm_decouple_s123 — upstream-mirror Netflix 966be8d5).
  • core/src/feature/cuda/ssimulacra2_cuda.c (3 NOLINT cites: ss2c_picture_to_linear_rgb, ss2c_host_combine, ss2c_run_scale_gpu / extract_fex_cuda).
  • core/src/feature/vulkan/ssimulacra2_vulkan.c (3 NOLINT cites: ss2v_setup_gaussian, ss2v_picture_to_linear_rgb, ss2v_run_scale).
  • core/src/feature/vulkan/cambi_vulkan.c (1 NOLINT cite: cambi_vk_extract).
  • core/src/feature/sycl/integer_adm_sycl.cpp (6 cites, SYCL kernel-launch entries).
  • core/src/feature/sycl/integer_motion_sycl.cpp (2 cites).
  • core/src/feature/sycl/integer_vif_sycl.cpp (4 cites).
  • core/tools/vmaf.c (3 cites: copy_picture_data, init_gpu_backends, main).
  • Invariant: zero behavioural change. Edits are inside comment blocks — appended (ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278) to existing prose justifications. No function bodies split. The 12 SYCL sites share an identical justification string verbatim; preserving the byte-for-byte duplicate is the load-bearing documentation pattern (grep-able across the SYCL TUs).
  • On upstream sync: minimal interaction. The cite-only edits live inside comment blocks above the function signatures; rebases will surface them as touched lines but the function bodies are unchanged. For integer_adm.c's upstream-mirror block (Netflix 966be8d5), the comment edit at line 984–991 is cosmetic — keep the fork's version on conflict (it merely names the ADR; the underlying prose is unchanged).
  • Re-test on rebase:

```bash
# 1. Programmatic audit must report 0 missing citations
python3 - <<'PY'
import re, os
paths = [os.path.join(r, f) for r, _, fs in os.walk('libvmaf/src')
         for f in fs if f.endswith(('.c','.cpp','.h'))]
paths.append('core/tools/vmaf.c')
miss = total = 0
for p in paths:
    with open(p) as fh:
        ls = fh.readlines()
    for i, line in enumerate(ls):
        if 'NOLINT' in line and 'readability-function-size' in line and 'NOLINTEND' not in line:
            total += 1
            # Walk up to 13 lines back through the contiguous comment block
            ctx = [line]; j = i - 1
            while j >= 0 and j > i - 14:
                s = ls[j].strip()
                if not s: break
                if s.startswith(('//','/*','*')):
                    ctx.insert(0, ls[j]); j -= 1
                else: break
            buf = ''.join(ctx)
            if 'ADR-' not in buf and not re.search(r'[Rr]esearch-?\d', buf):
                miss += 1
print(f"sites={total} missing={miss}")
PY

# 2. Build + Netflix golden gate
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
make test-netflix-golden
```

0231 — vmaf-tune score path decodes mp4 -> raw YUV

  • Touches: tools/vmaf-tune/src/vmaftune/score.py (new _decode_to_raw_yuv + _needs_decode helpers, run_score shells out to ffmpeg when req.distorted.suffix not in {.yuv, .y4m}); tools/vmaf-tune/tests/test_corpus.py (3 new regression tests + the smoke-end-to-end mock now also stubs the ffmpeg decode call).
  • Invariant: the decode-back is the contract the libvmaf CLI imposes — mp4/webm/etc. --distorted is silently rejected as raw-yuv with the wrong byte count, surfacing as exit_status=234. Future encoder adapters that emit non-raw containers inherit this decode automatically. Do not "optimise" the temp YUV away without first migrating the corpus pipeline to the ffmpeg+libvmaf filter (which can pipe an mp4 stream in directly).
  • On upstream sync: zero interaction. vmaf-tune is fork-only tooling; upstream Netflix/vmaf has no analogue.
  • Re-test on rebase:

```bash
cd tools/vmaf-tune && python3 -m pytest tests/
# plus an end-to-end smoke (needs a real raw YUV + ffmpeg + vmaf):
./vmaf-tune corpus --source /path/to/ref.yuv --width 1920 \
    --height 1080 --pix-fmt yuv420p --framerate 25 --duration 6 \
    --encoder libx264 --preset medium --crf 23 \
    --output /tmp/smoke.jsonl --no-source-hash
# expect: vmaf_score is a real number, not NaN.
```
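
The suffix check behind the decode-back contract can be illustrated with a short sketch. It is not the code in `score.py`: the names `needs_decode` and `decode_argv` are hypothetical, the case-insensitive suffix match is an assumption, and the ffmpeg arguments are one plausible way to produce raw planar video rather than the exact command the harness runs.

```python
from pathlib import Path

# Suffixes the entry lists as already-raw inputs.
RAW_SUFFIXES = {".yuv", ".y4m"}


def needs_decode(distorted: Path) -> bool:
    """True when the file must be decoded to raw YUV before scoring."""
    # Lower-casing the suffix is an assumption, not confirmed by the source.
    return distorted.suffix.lower() not in RAW_SUFFIXES


def decode_argv(src: Path, dst: Path, pix_fmt: str = "yuv420p") -> list[str]:
    """Build an ffmpeg command that writes raw video frames to dst."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-f", "rawvideo", "-pix_fmt", pix_fmt, str(dst)]
```

Keeping the check as a pure function over the path makes it easy to stub in tests without invoking ffmpeg.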

0232 — CUDA build pins nvcc --std c++20

  • Touches: core/src/meson.build line 686 (cuda_flags = [...]).
  • Invariant: nvcc 12.x clamps host C++ at C++17 by default; 13.x accepts up to C++20. When the host standard library is newer than nvcc's default dialect (any gcc >= 16, whose libstdc++ ships C++23 features), the host-side parse of <type_traits> / <bits/utility.h> breaks. Forcing --std c++20 on CUDA 13+ keeps the host headers parseable. Do not drop this flag without first checking the host gcc version against nvcc's default.
  • On upstream sync: zero interaction. Netflix/vmaf doesn't ship the cuda_flags list shape we use (their CUDA build is the original pre-fork pattern); a sync that touches core/src/meson.build around the is_cuda_enabled branch should keep the --std c++20 injection.
  • Re-test on rebase:
meson setup core/build-cuda -Denable_cuda=true \
    -Denable_sycl=false -Denable_vulkan=disabled
ninja -C core/build-cuda
# smoke
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
    -r .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -d .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -w 1920 -h 1080 -p 420 -b 8

0233 — CUDA motion flush_fex_cuda idempotency guard

  • Touches: core/src/feature/cuda/integer_motion_cuda.c — factored an append_if_unwritten helper and routed the two motion2 / motion3 final-frame writes through it.
  • Invariant: under T-GPU-OPT-1 (PR #312 / ADR-0242), the pending-collect inside flush_context_cuda may already have written motion2_score[s->index] / motion3_score[s->index] before flush_fex_cuda runs. Any future motion-cuda flush logic that emits the same (feature, index) pair must keep this idempotency contract or flush_context_cuda will mis-surface as "context could not be synchronized".
  • On upstream sync: the bug only exists because the fork's flush_context_cuda runs the pending-collect before the per-extractor flush. Netflix/vmaf upstream doesn't have the T-GPU-OPT-1 drain pattern, so the pre-#312 code path didn't duplicate-write. If Netflix lands a similar pattern, the fix shape mirrors what's done here.
  • Re-test on rebase:
ninja -C core/build-cuda
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 \
  --model path=model/vmaf_v0.6.1.json --threads 1 -q \
  --output /tmp/cuda.json --json
# Expect: clean run, no "cannot be overwritten" warning,
# no "problem flushing context" error.

0234 — hw_encoder_corpus.py Phase A real-corpus runner

  • Touches: new scripts/dev/hw_encoder_corpus.py (no existing caller; opt-in tooling); docs/development/intel-arc-vaapi-driver-priority.md. Output landing in runs/phase_a/ is gitignored; rerun the script to reproduce. […] stratified sample, 58 KiB).
  • Invariant: the script's QSV path expects env['LIBVA_DRIVER_NAME']='iHD' (exported by the calling shell, not set inside the script) when targeting /dev/dri/renderD129 on a multi-card host that has NVIDIA's libva-driver-nvidia shim installed. Without that, libva picks up NVIDIA's NVDEC-VAAPI translation and the MFX session handshake fails with -9. See the companion doc for the failure mode + fix.
  • On upstream sync: zero interaction. The script lives under scripts/dev/ (fork-only); upstream Netflix/vmaf has no comparable Phase A corpus tooling.
  • Re-test on rebase:
python3 scripts/dev/hw_encoder_corpus.py \
  --vmaf-bin core/build-cuda/tools/vmaf \
  --source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
  --width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
  --encoder h264_nvenc --cq 25 \
  --out /tmp/smoke.jsonl
# Expect: 1 cell × ~150 frames, per-frame canonical-6 + vmaf,
# encoder=h264_nvenc, cq=25.

0235 — fr_regressor_v2 ENCODER_VOCAB v2 (hw codec extension)

  • Touches: ai/scripts/train_fr_regressor_v2.py. ENCODER_VOCAB gains 6 hw-codec entries (3 NVENC + 3 QSV); ENCODER_VOCAB_VERSION bumps 1 -> 2; PRESET_ORDINAL gains 6 sub-tables for p1..p7 (NVENC) and the libx264-aligned QSV preset family.
  • Invariant: vocab order is load-bearing — index of every entry is baked into trained model graphs as a one-hot column position. New entries MUST be appended (never inserted into the middle), and the unknown sentinel MUST stay last (UNKNOWN_ENCODER_INDEX = N - 1). Bumping ENCODER_VOCAB_VERSION signals that any v1-graph ONNX needs re-export against v2 before consuming v2 training rows.
  • On upstream sync: zero interaction. train_fr_regressor_v2.py is fork-only (Phase B prereq, ADR-0237 / ADR-0272).
  • Re-test on rebase: python3 ai/scripts/train_fr_regressor_v2.py --corpus <jsonl> --epochs 200 --no-export — expect PLCC > 0.95 on a multi-codec corpus.
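
The append-only rule for the vocabulary can be checked mechanically. The following is a minimal sketch under stated assumptions: `check_vocab_extension` and `one_hot` are hypothetical helpers, the sentinel string `"unknown"` and the entry names in the usage are placeholders, and the real `ENCODER_VOCAB` contents are not reproduced here.

```python
def check_vocab_extension(old: tuple[str, ...], new: tuple[str, ...],
                          sentinel: str = "unknown") -> None:
    """Raise ValueError unless `new` is an append-only extension of `old`."""
    if old[-1] != sentinel or new[-1] != sentinel:
        raise ValueError("unknown sentinel must stay last")
    known_old, known_new = old[:-1], new[:-1]
    # Every pre-existing entry must keep its index (its one-hot column).
    if known_new[:len(known_old)] != known_old:
        raise ValueError("existing entries were reordered or removed")
    if len(set(new)) != len(new):
        raise ValueError("duplicate entry")


def one_hot(name: str, vocab: tuple[str, ...]) -> list[int]:
    """One-hot row; names outside the vocab map to the last (sentinel) column."""
    idx = vocab.index(name) if name in vocab else len(vocab) - 1
    return [1 if i == idx else 0 for i in range(len(vocab))]
```

Because the sentinel stays last, its column index moves whenever entries are appended, which is consistent with the note that v1-graph ONNX files need re-export against v2.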

0276 — vmaf_tiny_v5 corpus-expansion probe (ADR-0287) — defer

  • What changed: research-only addition. New scripts under ai/scripts/ (fetch_youtube_ugc_subset.py, extract_ugc_features.py, train_vmaf_tiny_v5.py, eval_loso_vmaf_tiny_v5.py), new ADR docs/adr/0276-*.md, new research digest docs/research/0057-*.md, and one CHANGELOG entry. No new ONNX artefact under model/tiny/, no registry change, no public C-API / CLI / meson_options change. The probe trained an architecturally identical mlp_small on a 5-corpus parquet (4-corpus + 27 000 UGC rows); the 1-σ ship gate did not clear (Δ PLCC = +0.00005), so the exporter that the prior agent had drafted (export_vmaf_tiny_v5.py) was discarded before the commit.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI corpus-expansion surface; nothing on the upstream side touches these files.
  • On upstream sync: zero interaction. The v5 surface lives entirely under ai/scripts/ + docs/adr/ + docs/research/, all of which are fork-introduced trees. The shipped v2 model (model/tiny/vmaf_tiny_v2.onnx) and its registry row are untouched.
  • Re-test on rebase:
# No code under test on rebase — purely research artefacts.
# If revisiting the corpus expansion, the reproducer is in the
# research digest:
python3 ai/scripts/fetch_youtube_ugc_subset.py \
    --out-dir .workingdir2/ugc/download \
    --n-stems 30 \
    --manifest .workingdir2/ugc/manifest.json
python3 ai/scripts/extract_ugc_features.py \
    --manifest .workingdir2/ugc/manifest.json \
    --yuv-dir .workingdir2/ugc/yuv \
    --vmaf-bin build-cpu/tools/vmaf \
    --out-parquet runs/full_features_ugc.parquet \
    --max-height 360 --max-frames 300 --threads 8
python3 ai/scripts/eval_loso_vmaf_tiny_v5.py \
    --parquet-base  runs/full_features_4corpus.parquet \
    --parquet-extra runs/full_features_ugc.parquet \
    --out-json      runs/vmaf_tiny_v5_loso_metrics.json

0227 — vmaf-tune Intel QSV codec adapters (ADR-0281)

  • What changed: fork-local additions under tools/vmaf-tune/src/vmaftune/codec_adapters/_qsv_common.py, h264_qsv.py, hevc_qsv.py, av1_qsv.py, plus registry rows in codec_adapters/__init__.py and a new test file tools/vmaf-tune/tests/test_codec_adapter_qsv.py. Doc updates: docs/usage/vmaf-tune.md (Hardware encoders section), docs/adr/0281-vmaf-tune-qsv-adapters.md, docs/research/0066-vmaf-tune-qsv-adapters.md, tools/vmaf-tune/AGENTS.md, CHANGELOG.md.
  • Upstream source: fork-local. tools/vmaf-tune/ is fork-introduced under ADR-0237; Netflix/vmaf has no corresponding tree.
  • On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths.
  • Invariant: the registry exposes exactly four codecs (av1_qsv, h264_qsv, hevc_qsv, libx264 — alphabetical), each adapter validates its (preset, quality) pair, and the QSV preset vocabulary is the seven x264-style names (veryslow…veryfast, no ultrafast / superfast). The encode pipeline (encode.py) remains x264-CRF-tied and will be widened in a separate PR — the QSV adapters are inert until then. Future codec families that share parameter shape (NVENC, AMF) follow the same _<family>_common.py + N thin adapters pattern.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/

0229 — vmaf-tune libvvenc + NN-VC codec adapter (ADR-0285)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/vvenc.py (new fork-only file), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry edit, fork-only), tools/vmaf-tune/tests/test_codec_adapter_vvenc.py (new), tools/vmaf-tune/tests/test_corpus.py (relaxes the known_codecs() == ("libx264",) assertion to "libx264" in known_codecs() since the registry now spans multiple codecs).
  • Invariant: the codec-adapter registry is fork-introduced (Phase A of ADR-0237) and lives entirely outside the upstream Netflix tree, so tools/vmaf-tune/ does not touch upstream paths. The only rebase-sensitive surface is the CORPUS_ROW_KEYS schema in src/vmaftune/__init__.py (per the Phase A invariant in tools/vmaf-tune/AGENTS.md); this PR adds the adapter without changing the schema.
  • Upstream interaction: none. tools/vmaf-tune/ is not in Netflix/vmaf upstream.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/
  • Status update 2026-05-09: the original nnvc_intra toggle was removed (it emitted a fabricated IntraNN key that does not exist in any released VVenC). Replaced with a curated 9-knob real-VVenC 1.14.0 tuning surface (PerceptQPA, InternalBitDepth, Tier, Tiles, MaxParallelFrames, RPR, SAO, ALF, CCALF). Defaults preserve the bit-exact Phase A grid baseline. adapter_version bumped to "2" so cache keys invalidate. See ADR-0285 §"Status update 2026-05-09". No rebase impact: fork-local file, no upstream-tree touch.

0228 — vmaf-tune Phase D scaffold (ADR-0276)

  • Touches: tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0276-vmaf-tune-phase-d-per-shot.md.
  • Invariant: scaffold-only. The module relies on a stable predicate signature (shot, target_vmaf, encoder) -> (crf, predicted_vmaf) that Phase B's bisect (PR #347) drops into later. Shot ranges are half-open [start_frame, end_frame) even though the C-side vmaf-perShot JSON/CSV sidecar uses an inclusive end_frame — normalisation happens at the parse boundary in _parse_per_shot_json / parse_per_shot_csv. vmaf-perShot schema lives in docs/usage/vmaf-perShot.md and is fork-local (ADR-0222), so upstream cannot drift it; the only rebase risk is fork-internal renames.
  • Upstream source: entirely fork-local. tools/vmaf-tune/ is fork-introduced (ADR-0237). Netflix/vmaf upstream has no encode-automation surface.
  • On upstream sync: zero interaction expected. No file in this PR overlaps an upstream-mirrored path.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q
python tools/vmaf-tune/vmaf-tune tune-per-shot --help
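
The range normalisation described in the invariant (inclusive `end_frame` in the sidecar, half-open ranges in the Python module) is a one-line conversion, sketched below. `Shot`, `from_inclusive`, and `frame_count` are hypothetical names for illustration; the actual parse helpers may be shaped differently.

```python
from typing import NamedTuple


class Shot(NamedTuple):
    start_frame: int
    end_frame: int  # exclusive: the shot covers [start_frame, end_frame)


def from_inclusive(start_frame: int, end_frame_inclusive: int) -> Shot:
    """Convert a sidecar-style inclusive range to a half-open Shot."""
    if end_frame_inclusive < start_frame:
        raise ValueError("end_frame precedes start_frame")
    return Shot(start_frame, end_frame_inclusive + 1)


def frame_count(shot: Shot) -> int:
    """Number of frames in a half-open shot."""
    return shot.end_frame - shot.start_frame
```

With half-open ranges, consecutive shots tile without overlap: the `end_frame` of one shot equals the `start_frame` of the next.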

0229 — vmaf-tune SVT-AV1 codec adapter (ADR-0278)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/svtav1.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions extended for the SVT-AV1 banner pattern), tools/vmaf-tune/src/vmaftune/corpus.py (optional ffmpeg_preset_token hook).
  • Invariant: PRESET_NAME_TO_INT is closed and order-stable; the integer values are baked into corpus rows that downstream fr_regressor_v2 (ADR-0235) trains on. Reordering or rewriting the table silently changes the integer SVT-AV1 receives. The codec key "libsvtav1" matches CODEC_VOCAB[2] in ai/src/vmaf_train/codec.py — keep them aligned on any rename.
  • Upstream source: fork-local. tools/vmaf-tune/ is a fork-introduced tree (see entry 0227 — Phase A scaffold). No Netflix/vmaf upstream interaction.
  • On upstream sync: zero interaction. Lives entirely under the fork-local tools/vmaf-tune/ tree.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -v

0230 — fr_regressor_v2 PROD ship (ADR-0352)

  • ADR: ADR-0352
  • Touches: model/tiny/fr_regressor_v2.onnx (binary, refreshed), model/tiny/fr_regressor_v2.json (sidecar, sha256 + metrics), model/tiny/registry.json (smoke flag flip, sha256 update), runs/phase_a/full_grid/per_frame_canonical6.jsonl (training corpus — fork-local artefact under runs/), companion docs.
  • Re-test recipe: see Research-0068 §Reproducer. Ship gate is LOSO PLCC ≥ 0.95 on the per-source folds; current run reports 0.9681 ± 0.0207.
  • Rebase invariant: the per-frame canonical-6 corpus must be rebuilt from runs/phase_a/{nvenc,qsv}_pf.jsonl (PR #392) before any retrain; do not re-train against the cell-only comprehensive.jsonl (it lacks the per-frame features and produces PLCC ≈ 0.7 — the smoke baseline).
  • No upstream interaction: fr_regressor_v2 is fork-local (ADR-0272).

0229 — vmaf-tune Phase E ladder generator (ADR-0295)

  • ADR: ADR-0295
  • Touches: entirely fork-local under tools/vmaf-tune/. New module tools/vmaf-tune/src/vmaftune/ladder.py, new test file tools/vmaf-tune/tests/test_ladder.py, two new subcommand blocks in tools/vmaf-tune/src/vmaftune/cli.py. No upstream-shared paths touched.
  • Invariant: vmaftune.ladder.convex_hull returns a strictly monotonic Pareto frontier (both bitrate and vmaf monotonically increasing); select_knees returns exactly min(n, len(hull)) rungs in ascending bitrate order; emit_manifest("hls") produces one #EXT-X-STREAM-INF per rung with monotonically-increasing BANDWIDTH= values. The default _default_sampler is intentionally NotImplementedError — production callers must inject a Phase B bisect-driven sampler. Phase B integration PR (gated on PR #347) swaps the default; the test suite continues to inject a synthetic stub.
  • Rebase impact: none — fork-local Python tool; upstream Netflix/vmaf does not ship a tools/vmaf-tune/ tree.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_ladder.py -v
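
The monotonicity property in the invariant can be illustrated with a simple Pareto filter over (bitrate, vmaf) points. This is a sketch of the property only, not the repository's `convex_hull`: `pareto_frontier` is a hypothetical name, and a true upper convex hull would additionally drop points that lie below the line between their neighbours.

```python
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep (bitrate, vmaf) points that strictly improve vmaf as bitrate rises."""
    frontier: list[tuple[float, float]] = []
    # Sort by bitrate ascending; for equal bitrate, best vmaf first.
    for bitrate, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if not frontier or vmaf > frontier[-1][1]:
            frontier.append((bitrate, vmaf))
    return frontier
```

Because a point is kept only when its score beats the last kept score, both coordinates of the result increase strictly, which is what lets an HLS manifest list rungs with increasing BANDWIDTH= values.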

0229 — fr_regressor_v2 probabilistic head scaffold (ADR-0279)

  • Touches:
  • ai/scripts/train_fr_regressor_v2_ensemble.py (new — fork-local).
  • ai/scripts/eval_probabilistic_proxy.py (new — fork-local).
  • model/tiny/fr_regressor_v2_ensemble_v1*.onnx, fr_regressor_v2_ensemble_v1.json (new artefacts; smoke probes).
  • model/tiny/registry.json — five new kind: "fr" rows (fr_regressor_v2_ensemble_v1_seed{0..4}); existing entries untouched.
  • ai/AGENTS.md — new "fr_regressor_v2_ensemble_v1 — probabilistic head" section pinning the per-member ONNX I/O contract, manifest-as-runtime-entry-point invariant, ensemble-size pin, confidence-rule one-of, codec-vocab parity, and smoke-artefact posture.
  • docs/ai/models/fr_regressor_v2_probabilistic.md (new model card).
  • docs/research/0067-fr-regressor-v2-probabilistic.md (new audit digest).
  • docs/adr/0279-fr-regressor-v2-probabilistic.md (new ADR; Proposed). Index row appended to docs/adr/README.md.
  • CHANGELOG.md: ### Added row under "Unreleased — lusoris fork".
  • Invariant: the per-member ONNX I/O contract (two inputs: features [N, 6] standardised + codec_onehot [N, NUM_CODECS]; one output score [N]) and the manifest's confidence rule (one-of "ensemble" / "ensemble+conformal") are the C-side adapter's load-bearing contract. Per-member ensembles are stock FRRegressor(num_codecs=NUM_CODECS) calls — flipping to a v1-shaped single-input graph silently invalidates the manifest. CODEC_VOCAB parity with ai/src/vmaf_train/codec.py is required.
  • On upstream sync: zero interaction expected. Wholly fork-local; no upstream Netflix/vmaf path overlap. The ai/ package is fork-introduced (see ADR-0021, ADR-0036) — upstream has no probabilistic-regressor surface. If upstream ever ships its own fr_regressor_v2 variant, do NOT merge — register both ids side-by-side.
  • Re-test on rebase:
python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke
python ai/scripts/eval_probabilistic_proxy.py --smoke
python ai/scripts/validate_model_registry.py

0287 — vmaf-tune saliency-aware ROI tuning (ADR-0293)

  • Touches: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py (new recommend subcommand), tools/vmaf-tune/AGENTS.md (saliency invariant), docs/usage/vmaf-tune.md (saliency section).
  • Upstream source: fork-local. The vmaf-tune tree was introduced in PR #329 (ADR-0237 Phase A) and has no upstream Netflix counterpart.
  • On upstream sync: zero interaction — pure fork-local Python package under tools/vmaf-tune/.
  • Invariant: the saliency-to-QP-offset signal blend (offset = (2*sal − 1) * foreground_offset, clamped to ±12) is bit-for-bit equivalent to vmaf-roi's C-side blend (ADR-0247). tests/test_saliency.py pins the contract; if vmaf-roi's C blend changes, saliency.py follows in the same PR. The test seam contract (session_factory=…, encode_runner=…) lets the suite run without onnxruntime or ffmpeg.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/ -q
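
The blend formula quoted in the invariant can be written out directly. A minimal sketch, assuming `sal` is a saliency value in [0, 1]; `qp_offset` is a hypothetical name, and whether the real code rounds the result to an integer QP is not stated in this entry, so the sketch returns a float.

```python
def qp_offset(sal: float, foreground_offset: float, clamp: float = 12.0) -> float:
    """offset = (2*sal - 1) * foreground_offset, clamped to [-clamp, +clamp]."""
    raw = (2.0 * sal - 1.0) * foreground_offset
    return max(-clamp, min(clamp, raw))
```

At `sal = 0.5` the offset is zero, fully salient pixels receive `foreground_offset`, and fully non-salient pixels receive its negation, each limited to ±12.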

0229 — tools/vmaf-roi-score/ Option C scaffold (ADR-0296)

  • ADR: ADR-0296
  • Touches:
  • tools/vmaf-roi-score/pyproject.toml (new)
  • tools/vmaf-roi-score/vmaf-roi-score (new console shim)
  • tools/vmaf-roi-score/src/vmafroiscore/__init__.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/cli.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/score.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/mask.py (new)
  • tools/vmaf-roi-score/tests/test_combine.py (new)
  • tools/vmaf-roi-score/README.md (new)
  • tools/vmaf-roi-score/AGENTS.md (new)
  • docs/adr/0296-vmaf-roi-saliency-weighted.md (new)
  • docs/adr/_index_fragments/0296-vmaf-roi-saliency-weighted.md (new)
  • docs/adr/_index_fragments/_order.txt — append-only.
  • docs/research/0069-vmaf-roi-saliency-weighted.md (new)
  • docs/usage/vmaf-roi-score.md (new)
  • changelog.d/added/T6-2c-vmaf-roi-score-scaffold.md (new)
  • Invariant: tools/vmaf-roi-score/ is wholly fork-local. No upstream Netflix/vmaf surface owns or interacts with this directory. The combine math is a pure linear blend on Python float; the JSON schema is pinned by ROI_RESULT_KEYS and SCHEMA_VERSION = 1. Schema bumps require an ADR-0288 supersession. Naming guard: do not confuse with core/tools/vmaf_roi.c (ADR-0247) — that's the encoder-steering binary. The scoring tool here is vmaf-roi-score; the names diverge deliberately.
  • Rebase impact: zero. Pure-Python tool under tools/; not part of the libvmaf C build, not part of any Netflix-mirrored surface.
  • Re-test on rebase:
pytest tools/vmaf-roi-score/tests

0228 — vmaf-tune compare codec-comparison mode (research-0061 Bucket #7)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/compare.py (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds the compare subparser and _run_compare router.
  • tools/vmaf-tune/tests/test_compare.py (new). Mocked predicate; no ffmpeg / vmaf binaries required.
  • tools/vmaf-tune/AGENTS.md — invariant note for the predicate seam and COMPARE_ROW_KEYS contract.
  • docs/usage/vmaf-tune.md — new "Codec comparison" section.
  • Invariant: compare.compare_codecs orchestrates per-codec ranking via an injected predicate(codec, src, target_vmaf) -> RecommendResult callable. The orchestration must not branch on codec name; new codecs land as one-file additions under codec_adapters/ and are picked up automatically by the registry. COMPARE_ROW_KEYS is the JSON / CSV column contract — same maintenance discipline as CORPUS_ROW_KEYS.
  • Rebase impact: entirely fork-local. The Phase A + Phase B recommend backend (ADR-0237) is fork-internal; upstream Netflix/vmaf has no tools/vmaf-tune/ tree.
  • Re-test on rebase:

```shell
pytest tools/vmaf-tune/tests/test_compare.py -v
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 --format markdown
```
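
The predicate seam described in the invariant can be sketched as follows. Everything here is illustrative: `Result` is a hypothetical stand-in for `RecommendResult` (whose real fields are not listed in this entry), `rank_codecs` is not the actual `compare_codecs`, and ranking by ascending bitrate is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Result:  # hypothetical stand-in for RecommendResult
    codec: str
    bitrate_kbps: float
    predicted_vmaf: float


Predicate = Callable[[str, str, float], Result]


def rank_codecs(codecs: Iterable[str], src: str, target_vmaf: float,
                predicate: Predicate) -> list[Result]:
    """Run the predicate once per codec and rank by ascending bitrate."""
    # No branch on the codec name: every codec goes through the same call.
    results = [predicate(codec, src, target_vmaf) for codec in codecs]
    return sorted(results, key=lambda r: r.bitrate_kbps)
```

Injecting the predicate is what allows a test suite to exercise the ranking with a mock and no ffmpeg or vmaf binaries.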

0229 — vmaf-tune --score-backend GPU score wiring (ADR-0299)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/score_backend.py (new). Wholly fork-local — tools/vmaf-tune/ has no upstream Netflix/vmaf overlap.
  • tools/vmaf-tune/src/vmaftune/{score,corpus,cli}.py (additive kwargs, no API removals).
  • tools/vmaf-tune/tests/test_score_backend.py (new).
  • docs/usage/vmaf-tune.md (new GPU section + flag row).
  • docs/adr/0299-vmaf-tune-gpu-score.md (new).
  • docs/research/0071-vmaf-tune-gpu-score-backend.md (new).
  • Invariant: the libvmaf CLI exposes --backend NAME with values auto|cpu|cuda|sycl|vulkan exactly. The help-text parser in score_backend.parse_supported_backends pins this format. If upstream renames the flag or reformats the help line on merge, the parser silently degrades to "CPU only"; the test fixtures in test_score_backend.py catch the format change, but only if they are re-run.
  • Upstream source: fork-local. Netflix upstream's CLI does not ship a --backend selector (CPU-only).
  • On upstream sync: zero interaction. vmaf-tune lives entirely in fork-introduced paths and consumes only the fork's --backend flag.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v
# If the libvmaf help text reformats, parse_supported_backends
# will return {"cpu"} on test_parse_full_backend_line_yields_all_four
# and the test fails loudly.
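
The degrade-to-CPU behaviour can be illustrated with a small parser sketch. It is not the code in `score_backend.py`: `parse_backends` is a hypothetical name, and the assumed help-line shape (a `--backend` flag followed on the same line by a `|`-separated list) is a guess at the format rather than the pinned fixture.

```python
import re

KNOWN_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")


def parse_backends(help_text: str) -> set[str]:
    """Extract backend names from a --backend help line; fall back to CPU only."""
    # First '|'-separated run of lowercase words on the --backend line.
    m = re.search(r"--backend\b[^\n]*?\b((?:[a-z]+\|)+[a-z]+)\b", help_text)
    if not m:
        return {"cpu"}
    found = set(m.group(1).split("|")) & set(KNOWN_BACKENDS)
    return found or {"cpu"}
```

The silent fallback is the reason the fixtures matter: a renamed flag produces a valid-looking `{"cpu"}` result instead of an error.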

0261 — vmaf-tune HDR-aware encode + score path (2026-05-03)

  • What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/hdr.py plus wiring into corpus.py / cli.py / score.py. Adds ffprobe-driven HDR detection, codec-specific HDR ffmpeg flag dispatch, schema-v2 corpus row keys (hdr_transfer, hdr_primaries, hdr_forced), and four --auto-hdr / --force-* CLI modes. See ADR-0300.
  • Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
  • On upstream sync: zero interaction. Upstream Netflix/vmaf ships no encode automation surface; this tree is entirely fork-local and lives outside libvmaf/ and python/.
  • Schema migration note: SCHEMA_VERSION bumped 1 → 2. The three new keys are additive — Phase B / C loaders treat missing keys as SDR for backward compat with v1 rows.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -q
python -m vmaftune.cli corpus --help  # confirm --auto-hdr surfaces

0298 — vmaf-tune content-addressed cache (ADR-0298)

  • What changed: fork-local. New module tools/vmaf-tune/src/vmaftune/cache.py; cache integration in tools/vmaf-tune/src/vmaftune/corpus.py (iter_rows now consults the cache before encode/score); new CLI flags --no-cache, --cache-dir, --cache-size-gb in cli.py. Codec-adapter Protocol gains adapter_version: str; the lone Phase-A x264 adapter pins "1".
  • Upstream source: none. tools/vmaf-tune/ is fork-introduced (ADR-0237) and has no upstream counterpart.
  • On upstream sync: zero interaction with Netflix/vmaf master. The module sits entirely under tools/vmaf-tune/, which upstream does not ship.
  • Invariant for future codec adapters: every CodecAdapter must declare adapter_version: str. Bump it whenever the adapter's argv shape, preset list, or quality range changes — otherwise the cache returns stale results post-upgrade. The contract is asserted by test_cache_key_diffs_on_each_field in tests/test_cache.py.
  • Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/test_cache.py -v
```
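
Why `adapter_version` must feed the cache key can be shown with a minimal content-addressed key sketch. The field set and the name `cache_key` are assumptions for illustration; only `adapter_version` is confirmed by this entry, and the real key in `cache.py` may hash different inputs.

```python
import hashlib
import json


def cache_key(source_sha256: str, codec: str, adapter_version: str,
              preset: str, quality: int) -> str:
    """Hash every field that can change the encode into one hex digest."""
    payload = json.dumps(
        {"source": source_sha256, "codec": codec,
         "adapter_version": adapter_version,
         "preset": preset, "quality": quality},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Bumping `adapter_version` changes the digest for otherwise identical requests, so entries written by the old adapter are simply never looked up again.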

0283 — vmaf-tune Apple VideoToolbox adapters (2026-05-05)

  • What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/codec_adapters/. New files: h264_videotoolbox.py, hevc_videotoolbox.py, _videotoolbox_common.py, plus the registry hook in __init__.py. See ADR-0283.
  • Update 2026-05-09: prores_videotoolbox.py adapter added to the same registry pattern (broadcast / prosumer ProRes intermediate). Quality knob differs — ProRes is a fixed-rate codec, so the harness's --crf slot carries the integer ProRes tier id (0=proxy → 5=xq) rather than a -q:v value. _videotoolbox_common.py extended with PRORES_PROFILE_* constants + validate_prores_videotoolbox() / prores_profile_name() helpers; profile ids verified against FFmpeg n8.1.1 libavcodec/videotoolboxenc.c. See the Status update appendix in ADR-0283.
  • Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py -q
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_prores_videotoolbox.py -q

0228 — vmaf-tune coarse-to-fine CRF search (ADR-0306)

  • What changed: fork-local tooling. Adds coarse_to_fine_search() to tools/vmaf-tune/src/vmaftune/corpus.py, plumbs new CLI flags onto vmaf-tune corpus (--coarse-to-fine, --coarse-step, --fine-radius, --fine-step, --target-vmaf), and ships a new vmaf-tune recommend subcommand. Widens tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py quality_range from (15, 40) to (0, 51). JSONL row schema unchanged (SCHEMA_VERSION=1).
  • Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation surface.
  • On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_corpus.py -k coarse_to_fine

0314 — vmaf-tune --score-backend=vulkan (ADR-0314)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/cli.py (additive argparse flag on corpus + recommend subparsers; resolves select_backend and catches BackendUnavailableError for clean exit-2).
  • tools/vmaf-tune/src/vmaftune/score.py (additive backend kwarg on build_vmaf_command and run_score; None = no flag emitted).
  • tools/vmaf-tune/src/vmaftune/corpus.py (new CorpusOptions.score_backend field, default None; forwarded into run_score).
  • tools/vmaf-tune/tests/test_score_backend.py (additive Vulkan-specific tests; pre-existing tests now pass after the backend= kwarg lands).
  • docs/adr/0314-vmaf-tune-score-backend-vulkan.md (new).
  • docs/usage/vmaf-tune.md (new "Vulkan score backend" subsection under the existing GPU-scoring section).
  • tools/vmaf-tune/AGENTS.md (invariant note: argparse choices stay in sync with libvmaf --backend vocabulary).
  • changelog.d/added/vmaf-tune-score-backend-vulkan.md (new).
  • Invariant: score_backend.ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan") is the exact set libvmaf's core/tools/cli_parse.c --backend alternation accepts. Adding a new harness-side value without the libvmaf-side wiring produces silent strict-mode failures on hosts that probe positively for it.
  • Upstream source: zero. Netflix upstream's CLI does not ship a --backend selector; both tools/vmaf-tune/ and core/src/vulkan/ are fork-introduced.
  • On upstream sync: zero interaction. No upstream-mirror file is touched.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v -k vulkan
pytest tools/vmaf-tune/tests/test_score_backend.py -v

Failures here usually indicate the libvmaf help-text format changed; score_backend.parse_supported_backends test fixtures pin the format and will fail loudly.

0303 — fr_regressor_v2 ensemble prod flip (ADR-0303)

  • ADR: ADR-0303
  • Touches: entirely fork-local.
  • ai/scripts/train_fr_regressor_v2_ensemble_loso.py (new — 9-fold LOSO trainer over the five ensemble seeds; emits loso_seed{N}.json artefacts).
  • scripts/ci/ensemble_prod_gate.py (new — reads five loso_seed{N}.json files, returns exit 0 iff mean(PLCC_i) ≥ 0.95 AND max - min ≤ 0.005).
  • ai/AGENTS.md — appended "Ensemble registry invariant" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
  • docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md (new), docs/research/0075-fr-regressor-v2-ensemble-prod-flip.md (new), changelog.d/added/fr-regressor-v2-ensemble-prod-flip.md (new).
  • Rebase invariant: the production ship gate is two-part: mean_i(PLCC_i) ≥ 0.95 AND max_i(PLCC_i) - min_i(PLCC_i) ≤ 0.005 over five seeds. The variance bound is load-bearing: removing it silently allows a one-seed-wins-four-seeds-tie configuration that invalidates the ensemble's predictive-distribution semantics. Both thresholds live in scripts/ci/ensemble_prod_gate.py; do not weaken either without superseding ADR-0303.
  • Rebase invariant (registry): the five fr_regressor_v2_ensemble_v1_seed{0..4} registry rows are smoke: true on master at this commit; flipping them to false is the follow-up flip PR's job, gated on a real-corpus LOSO run + the CI gate. Do not flip seed rows during a rebase merge conflict resolution.
  • Re-test on rebase:
python3 -c "import ast; ast.parse(open('ai/scripts/train_fr_regressor_v2_ensemble_loso.py').read())"
python3 -c "import ast; ast.parse(open('scripts/ci/ensemble_prod_gate.py').read())"
python ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
python scripts/ci/ensemble_prod_gate.py --help
  • Upstream source: zero. fr_regressor_v2 and its ensemble are fork-introduced (parent ADR-0272 / ADR-0279).
  • On upstream sync: zero interaction.
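
The two-part ship gate from the rebase invariant reduces to a few lines of arithmetic. A minimal sketch, assuming the per-seed PLCC values have already been read from the `loso_seed{N}.json` files; `ensemble_gate` is a hypothetical name, not the function in `ensemble_prod_gate.py`.

```python
def ensemble_gate(plcc: list[float], mean_min: float = 0.95,
                  spread_max: float = 0.005) -> bool:
    """Two-part gate: mean PLCC high enough AND seeds agree closely."""
    if not plcc:
        return False
    mean = sum(plcc) / len(plcc)
    spread = max(plcc) - min(plcc)
    return mean >= mean_min and spread <= spread_max
```

The second condition is what rejects a run where one strong seed lifts the mean while the other four lag behind.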

0313 — CI required-checks aggregator (2026-05-05)

  • What changed: fork-local CI policy. New .github/workflows/required-aggregator.yml — single workflow that runs on every non-draft PR and verifies the 23 named required checks reported success/skipped/neutral (or didn't appear at all, which is the path-filter-rejection semantics). Aggregator becomes the single branch-protection required check, replacing the 23-name list from ADR-0037.
  • Touches: .github/workflows/required-aggregator.yml (new), docs/adr/0313-ci-required-checks-aggregator.md (new), changelog.d/added/ci-required-checks-aggregator.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment file).
  • Upstream source: zero. Branch-protection policy is fork-only.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Manual operator step at adoption (uses PATCH, not PUT — corrected from the original ADR-0313 body which had the wrong verb):
echo '{"strict": false, "contexts": ["Required Checks Aggregator"]}' | \
  gh api -X PATCH "repos/VMAFx/vmafx/branches/master/protection/required_status_checks" --input -
  • Re-test on rebase:
# YAML lint passes
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/required-aggregator.yml'))"
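
The aggregator's pass/fail rule described above reduces to a small decision function. The sketch below covers only that rule, under the assumption that the check conclusions have already been fetched; the check names are placeholders and the real workflow's API calls are not reproduced here.

```python
# Conclusions the aggregator treats as passing, per the description above.
PASSING = {"success", "skipped", "neutral"}


def blocking_checks(required, reported):
    """Return the names of required checks that block the merge.

    `reported` maps check name -> conclusion. A required check missing from
    it is treated as path-filtered and therefore does not block.
    """
    return sorted(
        name for name in required
        if name in reported and reported[name] not in PASSING
    )


required = ["build-gcc", "build-clang", "lint"]  # hypothetical check names
reported = {"build-gcc": "success", "lint": "failure"}
print(blocking_checks(required, reported))  # ['lint']
```

Here build-clang never reported, so it is treated as path-filtered rather than failed.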

0305 — encoder knob-space Pareto analysis (2026-05-05)

  • What changed: fork-local. New analysis scaffold for the 12,636-cell encoder knob sweep that backs tools/vmaf-tune/codec_adapters/* recipe defaults. New files: ai/scripts/analyze_knob_sweep.py (per-(source, codec, rc_mode) Pareto hull on (bitrate_kbps, vmaf_score), encode_time_ms tiebreaker, regression-detection check), ai/tests/test_knob_sweep_analysis.py (synthetic 20-row JSONL fixture). Methodology + scaffolded findings: see ADR-0305 + Research-0077. Companion to Research-0063.
  • Touches: none upstream-shared. Sits entirely under ai/ (fork-local since the tiny-AI training surface was introduced, ADR-0021) and docs/{adr,research}/ (fork ledger).
  • Upstream source: zero. The 12,636-cell sweep, the Pareto scaffold, and the regression-detection invariant are fork-introduced; Netflix/vmaf master ships no encoder knob-sweep tooling.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Invariant for future codec adapter PRs: per the ai/AGENTS.md knob-sweep corpus invariant (ADR-0305), recipes that regress vs the bare encoder at matched bitrate within the same (source, codec, rc_mode) slice MUST NOT ship as adapter defaults. New adapter PRs cite the per-slice hull row from reports/summary.md (or "no hull entry yet — bare default") in their PR description. The comprehensive.jsonl sweep file is generated locally and lives under runs/phase_a/full_grid/ (gitignored — never committed).
  • Re-test on rebase:
pytest ai/tests/test_knob_sweep_analysis.py -v
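
For orientation, a per-slice Pareto selection on (bitrate_kbps, vmaf_score) with an encode_time_ms tiebreaker can be sketched as below. This assumes "hull" means the non-dominated front (lower bitrate and higher VMAF are both better) and that ties on both axes go to the faster encode; analyze_knob_sweep.py may differ in detail, and the `id` field is only there to label the toy rows.

```python
def pareto_front(rows):
    """Rows that no other row beats on both lower bitrate and higher VMAF."""
    ordered = sorted(
        rows,
        key=lambda r: (r["bitrate_kbps"], -r["vmaf_score"], r["encode_time_ms"]),
    )
    front, best_vmaf = [], float("-inf")
    for row in ordered:
        # Rows arrive by ascending bitrate, so a row survives only if it
        # improves on the best VMAF seen at any lower-or-equal bitrate.
        if row["vmaf_score"] > best_vmaf:
            front.append(row)
            best_vmaf = row["vmaf_score"]
    return front


rows = [
    {"id": "a", "bitrate_kbps": 1000, "vmaf_score": 90.0, "encode_time_ms": 500},
    {"id": "b", "bitrate_kbps": 1000, "vmaf_score": 90.0, "encode_time_ms": 300},
    {"id": "c", "bitrate_kbps": 1500, "vmaf_score": 89.0, "encode_time_ms": 200},
    {"id": "d", "bitrate_kbps": 2000, "vmaf_score": 95.0, "encode_time_ms": 900},
]
print([r["id"] for r in pareto_front(rows)])  # ['b', 'd']
```

Row a loses the tie to the faster b, and c is dominated because it costs more bitrate for a lower score.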

0302 — ENCODER_VOCAB v3 schema expansion (ADR-0302)

  • Touches: ai/scripts/train_fr_regressor_v2.py (adds an ENCODER_VOCAB_V3 parallel constant; does not modify the live ENCODER_VOCAB or ENCODER_VOCAB_VERSION).
  • Invariant: ENCODER_VOCAB is append-only and order-stable (per ADR-0235). The v3 scaffold preserves the v2 slot ordering verbatim — slots 0..12 are bit-identical to the v2 vocab; slots 13/14/15 append libsvtav1, h264_videotoolbox, hevc_videotoolbox. The live ENCODER_VOCAB_VERSION = 2 remains the source of truth until the follow-up retrain PR clears the LOSO PLCC ship gate.
  • Upstream interaction: zero. ai/scripts/train_fr_regressor_v2.py is fork-introduced (ADR-0272) and has no upstream counterpart.
  • Re-test on rebase:
python3 -c "
import importlib.util, pathlib
spec = importlib.util.spec_from_file_location(
    't', pathlib.Path('ai/scripts/train_fr_regressor_v2.py')
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
assert len(m.ENCODER_VOCAB_V3) == 16
assert m.ENCODER_VOCAB_VERSION == 2
print('OK')
"
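
The re-test above checks length and version only. The append-only, order-stable rule itself can be expressed as a prefix check, sketched here with placeholder vocabularies (not the real ENCODER_VOCAB contents) and a hypothetical helper name.

```python
def is_append_only(old_vocab, new_vocab):
    """True iff new_vocab keeps every old slot in place and only appends."""
    return tuple(new_vocab[: len(old_vocab)]) == tuple(old_vocab)


# Placeholder vocabularies for illustration only.
v2 = ("libx264", "libx265", "libaom-av1")
v3_ok = v2 + ("libsvtav1",)
v3_reordered = ("libx265", "libx264", "libaom-av1", "libsvtav1")

print(is_append_only(v2, v3_ok))         # True
print(is_append_only(v2, v3_reordered))  # False
```

A shortened vocabulary also fails the check, because the slice can no longer equal the old tuple.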

0304 — vmaf-tune fast-path prod wiring (ADR-0304)

  • Touches: tools/vmaf-tune/src/vmaftune/fast.py (replaces the ADR-0276 scaffold's NotImplementedError paths with concrete Optuna TPE + v2 proxy + GPU verify wiring); new module tools/vmaf-tune/src/vmaftune/proxy.py (centralised seam for fr_regressor_v2 ONNX inference); expanded tools/vmaf-tune/tests/test_fast.py. Doc-side: ADR-0304, Research-0076, tools/vmaf-tune/AGENTS.md invariant note.
  • Upstream source: zero. tools/vmaf-tune/ and model/tiny/fr_regressor_v2.onnx are both fork-introduced (ADR-0237 / ADR-0352).
  • Invariant: the production proxy is always fr_regressor_v2 (no smoke models in the production path) and a single GPU verify pass at recommend-end is mandatory — proxy alone never wins. The vmaftune.proxy.run_proxy helper is the single seam every fast-path consumer goes through; future probabilistic-head / ensemble migrations land in that one module. ENCODER_VOCAB v2 one-hot ordering is frozen by ADR-0352 and pinned in proxy.ENCODER_VOCAB_V2 — keep in sync with ai/scripts/train_fr_regressor_v2.py; drift raises ProxyError at inference time before bad predictions ship.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_fast.py -v

0307 — vmaf-tune ladder default sampler wiring (ADR-0307)

  • What changed: fork-local tooling. tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler no longer raises NotImplementedError; it composes corpus.iter_rows (Phase A encode + score) with recommend.pick_target_vmaf (smallest CRF clearing target VMAF) over DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38) at the adapter's mid-range preset. Module-level docstring + AGENTS.md invariant updated. New tests in tools/vmaf-tune/tests/test_ladder.py stub iter_rows via monkeypatch.setattr so no live ffmpeg / vmaf binaries are needed.
  • Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation / ladder surface.
  • On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
  • Rebase invariant: the 5-point sweep (18, 23, 28, 33, 38) is the load-bearing default; downstream Phase E callers size their wall-time budget against five encodes per (resolution, target_vmaf) cell. Do not widen / narrow it without an ADR-0307 follow-up. The SamplerFn seam stays open — callers needing finer grids pass an explicit sampler=.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_ladder.py -v

0309 — fr_regressor_v2 ensemble real-corpus retrain harness (ADR-0309)

  • ADR: ADR-0309
  • Touches: entirely fork-local.
  • ai/scripts/run_ensemble_v2_real_corpus_loso.sh (new — Bash wrapper that loops the five seeds over the existing train_fr_regressor_v2_ensemble_loso.py against .workingdir2/netflix/).
  • ai/scripts/validate_ensemble_seeds.py (new — calls the ADR-0303 gate and writes PROMOTE.json / HOLD.json with a corpus sha256 snapshot).
  • ai/tests/test_validate_ensemble_seeds.py (new — 7 tests, synthetic JSON fixtures for both verdict paths).
  • ai/AGENTS.md — appended "Registry-flip is a separate PR (ADR-0309)" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
  • docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md, docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (all new).
  • Rebase invariant: the harness is decoupled from the registry mutation. Neither the wrapper nor the validator touches model/tiny/registry.json; the registry flip is a separate follow-up PR gated on a passing PROMOTE.json. Auto-flipping on PROMOTE was rejected in ADR-0309's alternatives matrix specifically because rebase-time mutation of shipped registry rows is the foot-gun this invariant exists to prevent.
  • Re-test on rebase:
python -m pytest ai/tests/test_validate_ensemble_seeds.py -v
python ai/scripts/validate_ensemble_seeds.py --help
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
  • Upstream source: zero.
  • On upstream sync: zero interaction.
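
The validator's verdict step, as described above, amounts to hashing the corpus and writing one of two files. The following is a rough sketch under stated assumptions: the function name and the JSON field names are illustrative, not the real validate_ensemble_seeds.py schema.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_verdict(out_dir, corpus_path, gate_passed):
    """Write PROMOTE.json or HOLD.json containing a sha256 snapshot of the corpus."""
    digest = hashlib.sha256(Path(corpus_path).read_bytes()).hexdigest()
    target = Path(out_dir) / ("PROMOTE.json" if gate_passed else "HOLD.json")
    # Field names are illustrative; the real verdict schema may differ.
    target.write_text(json.dumps({"corpus_sha256": digest, "gate_passed": gate_passed}))
    return target


with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "corpus.jsonl"
    corpus.write_text('{"src": "clip0"}\n')
    verdict = write_verdict(tmp, corpus, gate_passed=False)
    print(verdict.name)  # HOLD.json
```

Consistent with the invariant, the sketch writes only the verdict file and never opens model/tiny/registry.json.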

0310 — BVI-DVC corpus ingestion for fr_regressor_v2 (ADR-0310)

  • Touches: ai/scripts/bvi_dvc_to_corpus_jsonl.py (new fork-only adapter), ai/scripts/merge_corpora.py (new fork-only shard merger), ai/tests/test_merge_corpora.py (new), docs/ai/bvi-dvc-corpus-ingestion.md (new), docs/adr/0310-bvi-dvc-corpus-ingestion.md (new), docs/research/0082-bvi-dvc-corpus-feasibility.md (new), ai/AGENTS.md (BVI-DVC invariant note).
  • Invariant: the BVI-DVC archive and any extracted artefacts (parquet, cached libvmaf JSON, JSONL corpus shard) are research-only and stay local — only derived fr_regressor_v2_*.onnx weights ship. The merge utility validates every row against the canonical vmaftune.CORPUS_ROW_KEYS tuple; the schema is the merge contract. Re-shape here is a pure transform on the cached libvmaf JSON; no ffmpeg / vmaf binary is invoked. The (src_sha256, encoder, preset, crf) natural key is load-bearing for de-duplication across mirrors and re-encodes.
  • Upstream interaction: none. ai/ is fork-introduced; BVI-DVC is not part of Netflix/vmaf upstream.
  • Re-test on rebase:
python -m pytest ai/tests/test_merge_corpora.py -v
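
A minimal sketch of the merge contract described above: every row is validated against a fixed key tuple, then de-duplicated on the natural key. ROW_KEYS is a shortened stand-in for vmaftune.CORPUS_ROW_KEYS, and keeping the first occurrence of a duplicate is an assumption about policy, not a statement about merge_corpora.py.

```python
# Shortened, hypothetical stand-in for vmaftune.CORPUS_ROW_KEYS.
ROW_KEYS = ("src_sha256", "encoder", "preset", "crf", "vmaf_score")
# Natural key quoted from the invariant above.
NATURAL_KEY = ("src_sha256", "encoder", "preset", "crf")


def merge_shards(*shards):
    """Merge row lists, rejecting schema drift and keeping the first duplicate."""
    merged, seen = [], set()
    for shard in shards:
        for row in shard:
            if set(row) != set(ROW_KEYS):
                raise ValueError(f"row keys {sorted(row)} do not match the schema")
            key = tuple(row[k] for k in NATURAL_KEY)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


row = {"src_sha256": "ab12", "encoder": "libx264", "preset": "medium",
       "crf": 23, "vmaf_score": 93.1}
mirror_copy = dict(row, vmaf_score=93.1)  # same natural key from another mirror
print(len(merge_shards([row], [mirror_copy])))  # 1
```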

ADR-0312 — ffmpeg-patches/ vmaf-tune integration (2026-05-05)

  • Files: ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch, ffmpeg-patches/0008-add-libvmaf_tune-filter.patch, ffmpeg-patches/0009-pass-autotune-cli-glue.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
  • Rebase invariant: patches 0007–0009 plug into the cumulative state after patches 0001–0006 apply against pristine n8.1. Per-patch git apply --check in isolation is the wrong gate; use the series-replay command in CLAUDE.md §12 r14 instead.
  • vmaf-tune patch invariant: the qpfile parser at libavcodec/qpfile_parser.{c,h} is shared across all three encoder adapters in patch 0007. Future encoders that grow a -qpfile AVOption inherit it; do not fork the parser. When tools/vmaf-tune/src/vmaftune/saliency.py's qpfile output format changes (new column, different frame-type alphabet, …), patch 0007 must change in the same PR (CLAUDE.md §12 r14).
  • vf_libvmaf_tune full-scoring promotion (2026-05-06): patch 0008 originally shipped as a scaffold (linear CRF↔VMAF interpolation, no libvmaf scoring) per ADR-0312's deferred-alternatives column. The filter now mirrors vf_libvmaf.c's CPU framesync pipeline end-to-end (vmaf_init + vmaf_model_load + vmaf_use_features_from_model in init(); per-frame vmaf_picture_alloc + memcpy + vmaf_read_pictures; flush + vmaf_score_pooled(MEAN) in uninit()). The CRF recommendation remains a piece-wise linear projection from the observed VMAF; per-clip Optuna TPE search stays in tools/vmaf-tune/src/vmaftune/recommend.py. Rebase-side: the new filter still depends only on libvmaf's CPU C-API (vmaf_init, vmaf_model_load, vmaf_use_features_from_model, vmaf_read_pictures, vmaf_score_pooled, vmaf_close, vmaf_picture_alloc/unref); zero new symbols beyond what vf_libvmaf.c already requires, so future libvmaf rebases that pass the existing libvmaf filter pass this one too. ADR-0312 sub-decision retired.
  • n7+ API migration (2026-05-06): patch 0008 originally referenced the removed AVFilterLink::frame_rate member directly (n6-era API); in n7+ that field moved off AVFilterLink onto a new FilterLink struct accessed via ff_filter_link(AVFilterLink *) from libavfilter/filters.h. Patch 0008 now uses ff_filter_link(outlink)->frame_rate = ff_filter_link(mainlink)->frame_rate; in config_output(), mirroring patches 0005/0006 which were already written against the post-n7 API. The bug slipped through CI because the FFmpeg-Vulkan lane only builds vf_libvmaf.o, not vf_libvmaf_tune.o; the full SYCL lane catches it now that PR #415 added ffmpeg-patches/** to the integration workflow's path filter. Discovery: PR #415 / ADR-0317.
  • Upstream source: zero. The vmaf-tune integration is fork-introduced; pure upstream syncs are unaffected.
  • On upstream sync: zero interaction with libvmaf master. FFmpeg-side rebases (an n8.1 → n8.x bump of FFMPEG_SHA in ffmpeg-patches/test/build-and-run.sh) are tracked separately under each refresh ADR (e.g., ADR-0277 for the 2026-05-04 refresh).
  • Re-test on rebase:
git -C /path/to/ffmpeg-8 reset --hard n8.1
for p in ffmpeg-patches/000*-*.patch; do
    git -C /path/to/ffmpeg-8 am --3way "$p" || break
done
# Build smoke (libvmaf-disabled — patches 0001–0006 skipped if libvmaf_dnn
# is not built). With libvmaf_dnn available:
cd /path/to/ffmpeg-8 && ./configure --enable-libvmaf --enable-libx264 --enable-libsvtav1 --enable-libaom --enable-gpl
make -j$(nproc) ffmpeg
./ffmpeg -hide_banner -h encoder=libx264 2>&1 | grep -i qpfile
  • 2026-05-06 update — patch 0007 SVT-AV1 ROI bridge promoted from scaffold to full impl: the libsvtav1 hunk now sets enc_params.enable_roi_map = true, builds one SvtAv1RoiMapEvt per qpfile frame upfront in eb_enc_init (per-MB qp_offsets averaged into per-64×64-SB b64_seg_map of up to 8 segment QPs; uniform binning when the value span exceeds the segment budget), and attaches each event as a ROI_MAP_EVENT priv-data node from eb_send_frame() with node->size = sizeof(SvtAv1RoiMapEvt*) (the validation contract enforced by SVT-AV1's resource_coordination_process.c). Lifetime invariant: events + maps live for the entire encode session because SVT-AV1 reads ROI_MAP_EVENT data via shallow-copied pointers on async pipeline threads (per enc_handle.c::copy_private_data_list); eb_enc_close frees them. Wiring is gated on SVT_AV1_CHECK_VERSION(1, 6, 0); older SVT-AV1 builds keep the log-and-continue fallback. libaom remains scaffold-only — its AOME_SET_ROI_MAP bridge stays a separate follow-up. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).

  • 2026-05-06 update — patch 0007 libaom-av1 ROI bridge promoted from scaffold to full impl: the libaom-av1 hunk now caches the parsed VmafTuneQpFile in AOMContext, allocates a segment-id map at libaom's mode-info grid (ALIGN_POWER_OF_TWO(dim, 8) >> 2, since av1/common/enums.h::MI_SIZE == 4), and on every encoded frame picks up to 8 segment QPs from the per-frame qp_offset value range (uniform linear binning when the span exceeds AOM_MAX_SEGMENTS == 8), paints the per-mi segment map by expanding each per-16×16-MB qp_offset into a 4×4 block of mi cells, and issues aom_codec_control(&ctx->encoder, AOME_SET_ROI_MAP, &roi_map). Lifetime invariant: libaom deep-copies the segment map and delta_q[] table on every control call (per av1/encoder/encoder.c::av1_set_roi_map memcpy), so a single buffer is reused across frames and freed in aom_free(). The qpfile is also freed there. Trade-off: the 8-segment cap rounds nearby qp_offsets together when the saliency model emits more than 8 distinct values per frame; finer granularity requires vmaf-tune corpus instead. This retires the libaom-av1 deferral noted under ADR-0312 — both AV1 encoder hooks (libsvtav1 and libaom-av1) are now full-impl. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
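
The 8-segment trade-off noted in both ROI bridge updates is easier to see with numbers. The sketch below illustrates uniform linear binning in Python rather than the patch's C; it assumes integer qp_offsets, a non-empty frame, and bin-centre segment QPs. The exact rounding used in patch 0007 is not reproduced here.

```python
MAX_SEGMENTS = 8  # AOM_MAX_SEGMENTS, as quoted above


def bin_qp_offsets(offsets):
    """Return (segment_qps, segment_ids) for one frame of integer qp_offsets."""
    lo, hi = min(offsets), max(offsets)
    if hi - lo + 1 <= MAX_SEGMENTS:
        # The value span fits the budget: one segment per integer value.
        return list(range(lo, hi + 1)), [v - lo for v in offsets]
    width = (hi - lo) / MAX_SEGMENTS
    # Assumed policy: each segment's QP is the centre of its bin.
    segment_qps = [round(lo + (i + 0.5) * width) for i in range(MAX_SEGMENTS)]
    ids = [min(int((v - lo) / width), MAX_SEGMENTS - 1) for v in offsets]
    return segment_qps, ids


qps, ids = bin_qp_offsets(list(range(-8, 9)))  # 17 distinct values
print(qps)  # [-7, -5, -3, -1, 1, 3, 5, 7]
print(ids)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7]
```

Seventeen distinct offsets collapse onto eight segment QPs, so neighbouring offsets share a segment; that is the rounding the trade-off note describes.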

0315 — Vendor-neutral VVC encode strategy (ADR-0315 / Research-0085)

  • ADR: ADR-0315
  • Digest: Research-0085
  • Touches: docs-only.
  • docs/research/0085-vendor-neutral-vvc-encode-landscape.md (new).
  • docs/adr/0315-vendor-neutral-vvc-encode-strategy.md (new).
  • docs/adr/_index_fragments/0315-vendor-neutral-vvc-encode-strategy.md (new).
  • docs/adr/_index_fragments/_order.txt (one-line append).
  • changelog.d/added/research-0085-vendor-neutral-vvc-encode.md (new).
  • docs/rebase-notes.md (this entry).
  • Rebase invariant: none. The research digest and ADR are pure surveys with no code dependencies; nothing in the fork's source tree references them in a way that breaks on upstream rebase.
  • Upstream source: zero. VVC encode strategy is a fork-local decision; upstream Netflix/vmaf has no codec adapter or encode-automation surface.
  • On upstream sync: zero interaction. Pure docs.
  • Re-test on rebase:
mkdocs build --strict 2>&1 | grep -E "(WARNING|ERROR)" || echo "docs build clean"
  • 2026-05-06 follow-up (Research-0085 verification pass):
  • docs/research/0085-vendor-neutral-vvc-encode-landscape.md flipped from Status: SKELETON to Status: Active. Most [UNVERIFIED] claims are now backed by primary-source URLs (NVIDIA SDK 13.0 docs, AMD AMF GitHub, Intel oneVPL GitHub + mfxstructures.h + CHANGELOG.md, Khronos registry, Phoronix Mesa/RADV coverage, VVenC issue tracker, ZLUDA repo).
  • ADR-0315's ## Context and ## Alternatives considered refreshed with the verified data points. Status stays Proposed.
  • [UNVERIFIED] count in the digest dropped 25 → 10; remaining items are legitimate gaps (NN-VC quality lift, vvenc per-kernel profile, HHI's non-public roadmap).
  • No code touched. No rebase impact beyond the existing docs-only posture.

0316 — cli_parse.c error() long-only-option fix (ADR-0316)

  • ADR: ADR-0316 (follow-up to ADR-0311).
  • Digest: none — bug-fix; fix shape fits in the ADR/commit body.
  • Touches:
  • core/tools/cli_parse.c (3 lines — call-site arg change at the ARG_THREADS / ARG_SUBSAMPLE / ARG_CPUMASK handlers).
  • core/test/fuzz/fuzz_cli_parse.c (removed known_assert_in_input early-reject filter).
  • core/test/fuzz/cli_parse_corpus/cli_threads_abbrev_assert.argv (promoted from cli_parse_known_crashes/).
  • core/test/test_cli_parse_long_only_args.c (new fork()-based regression test).
  • core/test/meson.build (new test wiring, gated off Windows alongside test_y4m_411_oob).
  • core/tools/AGENTS.md (added a long-only-options invariant note next to the existing cli_parse.c rules).
  • Rebase invariant: load-bearing. cli_parse.c is upstream-mirror with fork additions; the three handlers carry the fork-local shape of passing the ARG_* enum value (not 't' / 's' / 'c') to parse_unsigned(). If an upstream sync re-introduces the original short-option char shape, the assert returns and the parked-then-promoted reproducer (cli_parse_corpus/cli_threads_abbrev_assert.argv) will surface it in the next nightly fuzz run.
  • Upstream source: the bug shape exists in Netflix/vmaf master too (long-only options were added upstream with the same short-option-char placeholder). When the fork ports an upstream fix that overlaps these handlers, prefer the parse_unsigned(optarg, ARG_*, argv[0]) form already on the fork.
  • On upstream sync: re-apply the three-line change in cli_parse.c if upstream resets the call-site args. The unit test is fork-local and stays.
  • Re-test on rebase:
meson setup core/build libvmaf -Denable_tests=true \
    -Denable_cuda=false -Denable_sycl=false
ninja -C core/build test/test_cli_parse_long_only_args
meson test -C core/build test_cli_parse_long_only_args -v

ADR-0317 — CI flake fix: doc-only PR path-filter (2026-05-06)

  • Touched files:
  • .github/workflows/docker-image.yml — added paths: filter on both push: and pull_request: triggers.
  • .github/workflows/ffmpeg-integration.yml — added paths: filter on both push: and pull_request: triggers (covers all four matrix lanes: gcc, clang, SYCL, Vulkan).
  • docs/adr/0317-ci-doc-only-pr-flake-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ci-doc-only-pr-flakes.md.
  • Rebase invariant: not load-bearing. Workflow-only change. Both files are fork-local CI; upstream Netflix/vmaf does not ship a Docker workflow or an FFmpeg-integration matrix in this shape, so rebase conflicts are unlikely. If a future upstream sync introduces an overlapping docker-image.yml or FFmpeg matrix, prefer the fork's path-filtered form — the rationale (ADR-0313 aggregator posture, doc-only-PR runner-time burn) is fork-specific.
  • Upstream source: none — fork-local CI workflows.
  • On upstream sync: no action required. If reviewers later add new build inputs (e.g. a top-level docker-compose.yml, a new ffmpeg-patches/*.txt config file), extend the paths: lists in the same PR that adds the input.
  • Follow-up not in this ADR: patch ffmpeg-patches/0008-add-libvmaf_tune-filter.patch line 256 (outlink->frame_rate = mainlink->frame_rate;) needs to migrate to the ff_filter_link() accessor introduced in FFmpeg n7+, matching the pattern already in patches 0005 / 0006. Tracked separately; the path-filter does not hide it (any libvmaf/ or ffmpeg-patches/ PR will still trip the SYCL lane).
  • Re-test on rebase:
python3 -c "import yaml; \
  yaml.safe_load(open('.github/workflows/docker-image.yml')); \
  yaml.safe_load(open('.github/workflows/ffmpeg-integration.yml')); \
  print('OK')"

0319 — fr_regressor_v2 ensemble LOSO trainer — real loader + per-fold training (ADR-0319)

  • Touches: ai/scripts/train_fr_regressor_v2_ensemble_loso.py (real _load_corpus + _train_one_seed bodies), ai/scripts/run_ensemble_v2_real_corpus_loso.sh (wrapper argv fix), docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (Step 0 corpus-generation section), ai/AGENTS.md (canonical-6 schema invariant note), ai/tests/test_train_fr_regressor_v2_ensemble_loso_*.py (loader + train schema tests). Closes the deferrals tracked in rebase-notes §0303 + §0309.
  • Upstream source: none — fork-local ML training infrastructure. Netflix/vmaf upstream has no fr_regressor_v2 surface, no LOSO trainer, and no canonical-6 corpus tooling.
  • Invariant: the trainer's _load_corpus accepts the canonical-6 JSONL schema emitted by scripts/dev/hw_encoder_corpus.py bit-for-bit — required keys per row are (src, encoder, cq, frame_index, vmaf, adm2, vif_scale0..3, motion2). Codec block layout is 12-slot ENCODER_VOCAB v2 one-hot + constant preset_norm = 0.5 + crf_norm = (cq - cq_min) / (cq_max - cq_min). Schema changes require an ENCODER_VOCAB_VERSION bump and full ensemble retrain per the existing closed-vocabulary rule (ADR-0235 / ADR-0352). Fold-level StandardScaler is fit on the training rows only; leaking the held-out source's distribution into the scaler would silently inflate per-fold PLCC.
  • On upstream sync: no action required. If upstream Netflix/vmaf ever adds a competing LOSO trainer under python/vmaf/, do NOT merge them — keep the fork's training stack under ai/ per the AGENTS.md scope rule.
  • Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v2_ensemble_loso_loader.py \
       ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py -v
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
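
Two pieces of the invariant above lend themselves to a short illustration: the crf_norm formula and fitting the scaler on training rows only. The sketch uses the standard library instead of a StandardScaler object, and the helper names are hypothetical rather than taken from the trainer.

```python
from statistics import mean, pstdev


def crf_norm(cq, cq_min, cq_max):
    """Min-max normalise a quantiser value: (cq - cq_min) / (cq_max - cq_min)."""
    return (cq - cq_min) / (cq_max - cq_min)


def fit_scaler(train_values):
    """Fit mean / population std on training rows only, never the held-out source."""
    mu, sigma = mean(train_values), pstdev(train_values)
    return mu, (sigma if sigma > 0 else 1.0)


def transform(values, scaler):
    """Apply a previously fitted (mean, std) pair to any set of rows."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


print(crf_norm(30, 20, 40))           # 0.5
scaler = fit_scaler([1.0, 3.0])       # statistics come from the training fold
print(transform([5.0], scaler))       # [3.0]  held-out row, scaled with train stats
```

The held-out value is transformed with statistics it did not contribute to, which is what keeps per-fold PLCC honest.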

ADR-0323 — fr_regressor_v3 train + register on ENCODER_VOCAB v3 (2026-05-06)

  • Scope: ai/scripts/train_fr_regressor_v3.py (new), ai/tests/test_train_fr_regressor_v3.py (new), model/tiny/fr_regressor_v3.onnx (new, real-weight checkpoint from a 9-fold LOSO gate-pass at mean PLCC 0.9975), model/tiny/fr_regressor_v3.json (new sidecar with encoder_vocab_version: 3 and full per-fold trace), model/tiny/registry.json (new fr_regressor_v3 row, smoke: false), ai/AGENTS.md (v3 retrain invariant section gains a "Status" subsection recording the gate result), docs/ai/models/fr_regressor_v3.md (new model card), docs/adr/0323-fr-regressor-v3-train-and-register.md + index row, changelog.d/added/fr-regressor-v3-train-register.md.
  • Rebase impact: zero. Fork-local feature; no upstream Netflix/vmaf surface is touched. The 16-slot ENCODER_VOCAB_V3 imported from train_fr_regressor_v2.py was already landed by PR #401 (ADR-0302).
  • On upstream sync: no action required. The v3 model ships alongside v2 — fr_regressor_v2.onnx and its sidecar are unchanged; the v3 row is appended to the registry and sorted alphabetically. If a future upstream sync ever lands a competing fr_regressor_v3 model under python/vmaf/, do NOT cross-link them — the fork's training stack lives under ai/.
  • Watch out for: the live ENCODER_VOCAB_VERSION in ai/scripts/train_fr_regressor_v2.py stays at 2 (per ADR-0302's invariant). Do not bump it to 3 in this PR or in any downstream port; the in-place promotion of v3 over v2 is a separate "promote v3 to authoritative" PR per ADR-0302's production-flip checklist.
  • Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v3.py -v
bash core/test/dnn/test_registry.sh   # must report OK: 20+
python -c "import onnx; onnx.checker.check_model(onnx.load('model/tiny/fr_regressor_v3.onnx')); print('OK')"

ADR-0321 — fr_regressor_v2_ensemble_v1 full production flip (2026-05-06)

  • Scope: ai/scripts/export_ensemble_v2_seeds.py (new), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.onnx (real full-corpus-trained weights replacing the 3025-byte synthetic scaffold bytes), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.json (new per-seed sidecars), model/tiny/registry.json (sha256 + smoke: false on the five seed rows), ai/AGENTS.md (new invariant: the registry-flip is now done; future re-flips require a fresh PROMOTE.json + re-run of the export driver).
  • Rebase impact: zero. This is a fork-local production-flip; no upstream Netflix/vmaf surface is touched. The 12-slot ENCODER_VOCAB v2 carried in each sidecar is the same one the LOSO trainer (ADR-0319) bakes into the codec-block layout, so there is no rebase-time vocabulary drift to worry about.
  • Watch out for: if a future upstream sync ever introduces a competing fr_regressor_v2_ensemble_* model under python/vmaf/, do NOT cross-link them — the fork's ensemble weights are gated on runs/ensemble_v2_real/PROMOTE.json and are not portable to a different training stack.
  • Re-test on rebase:
bash core/test/dnn/test_registry.sh   # must report OK: 19
python -c "import onnx; \
  [onnx.checker.check_model(onnx.load(f'model/tiny/fr_regressor_v2_ensemble_v1_seed{i}.onnx')) \
   for i in range(5)]; print('OK')"

ADR-0324 — Ensemble training kit (2026-05-06)

  • Touches: tools/ensemble-training-kit/ (new), docs/adr/0324-ensemble-training-kit.md (new), docs/adr/README.md (index row), changelog.d/added/0324-ensemble-training-kit.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the kit assumes the LOSO wrapper hard-codes seeds (0 1 2 3 4). The orchestrator surfaces a warning if --seeds deviates but still hands off to the wrapper. If a future PR parameterises the wrapper's seed list, update both the wrapper and the kit's pass-through logic in lockstep.
  • On upstream sync: no action required. The kit lives entirely under tools/ensemble-training-kit/ (a fork-local path) and only invokes other fork-local scripts (ai/scripts/, scripts/dev/, scripts/ci/).
  • Re-test on rebase:
bash -n tools/ensemble-training-kit/*.sh
bash tools/ensemble-training-kit/make-distribution-tarball.sh /tmp/kit-test.tar.gz
tar -tzf /tmp/kit-test.tar.gz | grep -q "tools/ensemble-training-kit/run-full-pipeline.sh"

ADR-0332 — External-competitor benchmark harness (2026-05-08)

  • Touches: tools/external-bench/ (new), docs/adr/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated), changelog.d/added/external-bench-harness.md (new), docs/research/0087-external-bench-competitor-survey-2026-05-08.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the harness is wrapper-only — never vendor or link x264-pVMAF (GPL-2.0) into this fork. Future competitors follow the same pattern (tools/external-bench/<competitor>/run.sh invokes a user-installed binary via env var; output schema-shimmed into the canonical JSON shape). The output schema (frames[].{frame_idx, predicted_vmaf_or_mos, runtime_ms} + summary.{competitor, plcc, srocc, rmse, runtime_total_ms, params, gflops}) is the contract between every wrapper and compare.py. run_wrapper's runner parameter MUST stay resolved at call time (not via default-arg binding) so monkeypatch-based tests work.
  • On upstream sync: no action required. The harness lives entirely under tools/external-bench/ (a fork-local path) and never touches Netflix-shared code.

ADR-0331 — Skip CI on draft pull requests (2026-05-08)

  • Touches: .github/workflows/{docker-image,security-scans,lint-and-format,ffmpeg-integration,libvmaf-build-matrix,rule-enforcement,tests-and-quality-gates}.yml (per-job if: clause + pull_request.types list). required-aggregator.yml is unchanged — it already adopted the pattern under ADR-0313. No upstream-shared paths.
  • Invariant: every top-level job in the eight fork workflows that trigger on pull_request carries if: github.event_name != 'pull_request' || github.event.pull_request.draft == false. The pull_request: block lists ready_for_review in types: so promotion of a draft fires CI exactly once. The github.event_name != 'pull_request' clause keeps push: triggers (no PR object) intact. If an upstream merge introduces a new top-level job, that job MUST inherit the gate; otherwise drafts will silently consume one matrix slot per push.
  • On upstream sync: Netflix/vmaf upstream does not gate on draft state; if a sync brings in new pull_request workflow content, replay the gate on every newly-introduced top-level job. Composing with an existing if: follows the coverage-gpu pattern — wrap both predicates in ${{ ... && ( ... ) }}.

  • Re-test on rebase:

```bash
python3 -c "import yaml; names=['docker-image','security-scans','lint-and-format','required-aggregator','ffmpeg-integration','libvmaf-build-matrix','rule-enforcement','tests-and-quality-gates']; [yaml.safe_load(open(f'.github/workflows/{n}.yml')) for n in names]; print('OK')"
# Spot-check the gate is present on every top-level job:
for f in docker-image security-scans lint-and-format ffmpeg-integration \
         libvmaf-build-matrix rule-enforcement tests-and-quality-gates \
         required-aggregator; do
  grep -c "pull_request.draft == false" ".github/workflows/${f}.yml"
done
# Each must report >= 1.
```

SSIM extractor registration fix (2026-05-08)

  • Touches: core/src/feature/feature_extractor.c (upstream-mirror — adds one extern + one registry-array entry near the existing SSIM rows), core/src/feature/integer_ssim.c (upstream-mirror — adds #include "config.h" and refreshes the file-scope comment above vmaf_fex_ssim), core/src/meson.build (adds integer_ssim.c to the source list — fork-local diff), core/test/test_feature_extractor.c (adds one regression test alongside the existing tests), docs/metrics/features.md (table row + footnote ²), docs/state.md, changelog.d/fixed/ssim-extractor-registration.md.
  • Invariant on the upstream-mirror files: the registry-array entry must remain inside the unconditional CPU block (the same block as &vmaf_fex_float_ssim / &vmaf_fex_float_ms_ssim) — vmaf_fex_ssim is CPU-only with no SIMD or GPU twin. The config.h include in integer_ssim.c is load-bearing on Vulkan-enabled LTO builds because feature_extractor.c and integer_ssim.c must agree on HAVE_VULKAN / HAVE_CUDA / HAVE_SYCL for the VmafFeatureExtractor struct layout to match across TUs.
  • On upstream sync: if Netflix ever lands its own integer-SSIM registry row, drop the fork's row in favour of upstream's; the file structure is identical. If upstream removes integer_ssim.c entirely (the file has been dormant on master for years), revert the meson.build addition. Otherwise no action.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false && ninja -C build
./build/test/test_feature_extractor    # 5/5 pass, includes new ssim row
./build/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
                  --distorted testdata/dis_576x324_48f.yuv \
                  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
                  --feature ssim --output /tmp/ssim_smoke.json && \
  grep -q '<metric name="ssim"' /tmp/ssim_smoke.json
# Vulkan-enabled LTO build (-Wlto-type-mismatch must stay clean)
meson setup build-vulkan -Denable_vulkan=enabled --reconfigure && \
  ninja -C build-vulkan tools/vmaf

CI paths-ignore deny-list on heavy workflows (ADR-0341, 2026-05-09)

  • Touches: .github/workflows/libvmaf-build-matrix.yml (fork-local — paths-ignore: block under pull_request:), .github/workflows/tests-and-quality-gates.yml (fork-local — same block), docs/adr/0341-ci-paths-ignore-doc-only-prs.md + index fragment, changelog.d/changed/ci-paths-ignore-doc-only.md.
  • Invariant: the deny-list must stay strictly documentation-only (docs/**, **/*.md, changelog.d/**, CHANGELOG.md, .workingdir2/**). Any path that contributes to a build, test, or lint input — libvmaf/**, meson.build, meson_options.txt, subprojects/**, python/**, ai/**, mcp-server/**, model/**, testdata/**, .github/workflows/** — must NEVER appear in the deny-list, otherwise the corresponding required check is silently skipped on a code-touching PR. The Required Checks Aggregator (ADR-0313) catches only the doc-only case (no required check ever ran for any required name); a too-broad deny-list would lose build coverage without anyone noticing.
  • On upstream sync: Netflix/vmaf upstream does not carry these two workflow files (they are fork-local additions). No sync conflict expected.
  • Re-test on rebase:
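One way to guard the invariant above is a strict allow-list check over the paths-ignore entries. The sketch below is illustrative only: audit_paths_ignore and DOC_ONLY_PATTERNS are hypothetical names, not part of the fork's CI scripts, and reading the list out of the two workflow files is left out.

```python
# Hypothetical helper: flag any paths-ignore entry outside the doc-only set.
DOC_ONLY_PATTERNS = frozenset({
    "docs/**",
    "**/*.md",
    "changelog.d/**",
    "CHANGELOG.md",
    ".workingdir2/**",
})


def audit_paths_ignore(patterns):
    """Return the entries that are not in the documented doc-only deny-list."""
    return [p for p in patterns if p not in DOC_ONLY_PATTERNS]


if __name__ == "__main__":
    # Example input; a real check would read both workflow files instead.
    candidate = ["docs/**", "**/*.md", "python/**"]
    print("offending entries:", audit_paths_ignore(candidate))
```

An exact-match allow-list is deliberately stricter than glob matching: any new entry has to be added to the set by hand, which forces a review of whether it is really documentation-only.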

HDR VMAF model search — Path C documentation only (2026-05-09)

  • Files added (this fork only; upstream Netflix/vmaf has none of these):
  • model/vmaf_hdr_model_card.md — discoverable warning that the HDR scoring path falls back to the SDR vmaf_v0.6.1.json weights. Filename deliberately uses .md, not .json, so the vmaftune.hdr.select_hdr_vmaf_model glob (vmaf_hdr_*.json) keeps returning None.
  • docs/research/0089-hdr-vmaf-model-search.md — verbatim trail of the source-or-train survey (URLs + access dates).
  • changelog.d/added/hdr-vmaf-model-search.md — release-notes fragment per ADR-0221.
  • ADR-0300 grew an inline ### Status update 2026-05-09: HDR model status section.
  • Why no model JSON ships: Path A negative findings (no public Netflix HDR VMAF model exists; HDRMAX is a different algorithm not loadable by libvmaf's JSON path). Path B deferred behind gated subjective HDR corpora + multi-day training compute. No fabricated weights are introduced.
  • On upstream sync: if Netflix lands vmaf_hdr_*.json in Netflix/vmaf/model/, port via /port-upstream-commit; the resolver picks it up automatically with no vmaftune change. Then delete model/vmaf_hdr_model_card.md (or rewrite it as a normal model card describing the upstream weights). Watch https://github.com/Netflix/vmaf/issues/645 for the upstream release announcement.
  • Re-test on rebase: no behavioural change — pure docs. Sanity:
python3 -c "from pathlib import Path; \
  import sys; sys.path.insert(0,'tools/vmaf-tune/src'); \
  from vmaftune.hdr import select_hdr_vmaf_model; \
  print(select_hdr_vmaf_model(Path('model')))"
# Expect: None  — confirms the .md card does not match the glob
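Why the card's .md extension matters can be shown with a stand-in resolver. This is a sketch assuming the real select_hdr_vmaf_model does little more than glob for vmaf_hdr_*.json; pick_hdr_model below is a hypothetical stand-in, not the vmaftune code.

```python
import tempfile
from pathlib import Path


def pick_hdr_model(model_dir):
    """Hypothetical stand-in: first vmaf_hdr_*.json in model_dir, else None."""
    matches = sorted(Path(model_dir).glob("vmaf_hdr_*.json"))
    return matches[0] if matches else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "vmaf_hdr_model_card.md").write_text("HDR falls back to SDR weights\n")
        print(pick_hdr_model(root))  # the .md card does not match the glob
        (root / "vmaf_hdr_example.json").write_text("{}")
        print(pick_hdr_model(root))  # a .json file with the prefix does match
```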

ADR-0349 — fr_regressor_v3 namespace resolution (2026-05-09)

  • Rebase impact: none. Docs-only change — adds ADR-0349, an append-only status appendix on ADR-0302 per ADR-0028, a ## fr_regressor_* namespace map block in ai/AGENTS.md, and two changelog fragments. No upstream Netflix/vmaf surface touched; no fr_regressor_* registry rows touched (sha256s for _v1, _v2, _v2_ensemble_v1_seed{0..4}, _v3 all unchanged); no C / Python / ONNX bytes modified.
  • What to check after a rebase: nothing automated. The only drift risk is a future agent claiming fr_regressor_v3plus_features for an unrelated workstream — ai/AGENTS.md carries the reservation; reviewers verify the map row exists before approving any new fr_regressor_* registry id.
  • Reproducer:

```bash
# ADR + AGENTS.md namespace map present and consistent
# (-F because the heading contains a literal '*', which plain grep reads as a quantifier):
test -f docs/adr/0349-fr-regressor-v3-namespace.md
grep -qF "fr_regressor_* namespace map" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" ai/AGENTS.md docs/adr/0349-fr-regressor-v3-namespace.md
# Status appendix present on ADR-0302:
grep -q "Status update 2026-05-09: namespace collision resolved" \
  docs/adr/0302-encoder-vocab-v3-schema-expansion.md
# Existing v3 production row bit-identical (sha256 unchanged):
python3 -c "
import json
reg = json.load(open('model/tiny/registry.json'))
v3 = next(m for m in reg['models'] if m['id'] == 'fr_regressor_v3')
assert v3['sha256'] == 'eaa16d23461eda74940b2ed590edfcaf13428aade294e47792a5a15f4d3b999c', v3
assert v3['smoke'] is False
print('OK: fr_regressor_v3 production row unchanged')
"
# Registry test still passes:
bash core/test/dnn/test_registry.sh

0327 — Pre-push PR-body deliverables validator hook

  • Touches: scripts/ci/validate-pr-body.sh (new), scripts/git-hooks/pre-push (new), scripts/ci/test-validate-pr-body.sh (new), Makefile (hooks-install target adds the pre-push symlink). Re-uses scripts/ci/deliverables-check.sh parser verbatim — no upstream-shared file is modified.
  • Invariant: parser shape parity with .github/workflows/rule-enforcement.yml deep-dive-checklist gate (ADR-0108). The validator constructs a PATH shim that intercepts git diff --name-only calls only; every other git invocation falls through to the real binary.
  • On upstream sync: not applicable; these files are entirely fork-local and Netflix has no equivalent. If scripts/ci/deliverables-check.sh is ever rewritten or moved, the validator's exec path (scripts/ci/deliverables-check.sh) and the test harness's expected exit codes must follow.
bash scripts/ci/test-validate-pr-body.sh    # 8/8 cases pass

0320 — Semgrep # nosemgrep cites on Netflix-upstream Python harness (Research-0090)

  • Touches: python/vmaf/core/asset.py, python/vmaf/core/executor.py, python/vmaf/core/feature_extractor.py, python/vmaf/core/quality_runner.py, python/vmaf/core/result_store.py, python/vmaf/tools/decorator.py, python/test/command_line_test.py, python/test/feature_extractor_test.py, python/test/ssimulacra2_test.py, python/vmaf/config.py.
  • Invariant: every fork-added # nosemgrep: <rule-id> line is paired with an inline cite to Research-0090. The cite + rule-id pair is the load-bearing artifact (per memory feedback_no_guessing: every "false positive" claim ships its safety proof). If an upstream sync removes the cited line of code, drop the cite-comment block too. If upstream adds a defusedxml fix at the ElementTree.parse() site (feature_extractor.py:115, quality_runner.py:1496), keep upstream's fix and drop our suppressions.
  • config.py:40 (the SSL-bypass deletion) is a fork-exclusive security fix; if upstream resurrects ssl._create_unverified_context on a sync, do not re-merge it: the bypass clobbers the process-global default and is unjustified per Research-0090, F1.
semgrep scan --config=p/cwe-top-25 --config=p/c --config=p/python . \
  --metrics=off --json | jq '.results | length'
# expect 0: every legit finding either has a # nosemgrep cite or was fixed

0321 — Security-scans workflow registry-pack list (Research-0090)

  • Touches: .github/workflows/security-scans.yml, .github/workflows/lint-and-format.yml.
  • Invariant: the registry packs the workflow cites (p/cwe-top-25 + p/c + p/python) are validated against https://semgrep.dev/c/p/<pack>; the previously-cited p/cert-c-strict, p/cert-cpp-strict, and p/cpp packs were retired by Semgrep in 2025 and 404. The lint-and-format.yml pull of ${{ github.* }} into env: (clang-tidy + clang-tidy-sycl steps) defuses run-shell-injection; preserve the pattern on any edit. See Research-0090, F2/F3.
for pack in p/cwe-top-25 p/c p/python; do
  code=$(curl -sIL "https://semgrep.dev/c/${pack}" | head -1 | awk '{print $2}')
  [ "$code" = "200" ] && echo "${pack}: OK" || echo "${pack}: FAIL ($code)"
done

0320 — CodeQL C bulk sweep (78 deferred alerts → 60 fixed, 14 deferred to T7-5)

  • Touches: core/src/feature/{cambi.c,ciede.c,integer_adm.c,integer_psnr.c,adm_tools.h,third_party/xiph/psnr_hvs.c}, core/src/feature/x86/{adm_avx2.c,adm_avx512.c,ansnr_avx2.c,ansnr_avx512.c,vif_avx2.c,vif_avx512.c}, core/src/{pdjson.c,svm.cpp}, core/test/{test_cpu.c,test_model.c}, core/tools/{y4m_input.c,yuv_input.c,vmaf_bench.c}. All but vmaf_bench.c are upstream-mirror Netflix files.
  • Invariant: widening casts on integer multiplications ((size_t), (uint64_t), (double)) are LHS-prefixed before the multiply, never wrapped around the whole expression — the latter is a no-op against cpp/integer-multiplication-cast-to-long. Deleted commented-out blocks (e.g., the AVX-512 VP-loop dead variant in adm_avx512.c::adm_dwt2_inverse) are gone for good; if upstream brings them back, they reintroduce the alerts. iqa/convolve.c was deliberately left untouched: prefixing (double) on the float×float multiplications inside the scalar reference path breaks bit-exactness against the AVX2 path enforced by test_iqa_convolve — CodeQL alert deferred to a follow-up that updates both paths in lockstep.
  • On upstream sync: any upstream change that re-introduces the deleted comment blocks or rewrites the cast forms will surface the alerts again. The cambi_score signature change (CambiBuffers buffers → const CambiBuffers *buffers) is fork-local and likely to conflict with upstream patches that touch that function. The 14 deferred VifBuffer large-parameter alerts are tracked under T7-5 (multi-backend coordinated refactor including NEON).
  • Re-test on rebase:
cd libvmaf && meson test -C build    # all 50+ C tests
make test-netflix-golden             # upstream golden gate
# Re-run CodeQL on master afterwards; the 60 fixed alerts must stay closed.

CodeQL cpp/declaration-hides-variable sweep (2026-05-09)

  • What changed: Mechanical rename / scope-tighten / dedupe sweep closing 64 open cpp/declaration-hides-variable CodeQL alerts on master. Touched files: core/src/feature/cambi.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/x86/vif_avx2.c, core/src/feature/x86/vif_avx512.c. All five are upstream-mirror; the Netflix copyright header is preserved on each.
  • Renames adopted (semantic over _2 suffix):
  • cambi.c: inner int err shadowing function-scope err becomes mkdir_err (heatmaps init) and src_err (full-ref extract path).
  • adm_avx2.c / adm_avx512.c: the j == 0 first-column special-case block is wrapped in { ... } so its j0..j3 and s0..s3 stop being visible to the per-j tail loop. The inner duplicate __m256i add_shift_HP_vex = _mm256_set1_epi32(32768) (and 512-bit twin) is removed — bit-identical to the function-scope value already in scope. The __m256i rfactor1 that shadowed the function-scope float rfactor1[3] becomes rfactor_v0/_v1/_v2 (and the AVX-512 twin likewise).
  • vif_avx2.c / vif_avx512.c: tap-loop locals follow f_tap, r_top/r_bot, d_top/d_bot for the s0 stage, and f_tap0/f_tap1, r_back0/r_fwd0, etc. for the AVX-512 paired-tap stage. Inner per-fj __m256i fq / __m512i fq shadows of the centre-tap broadcast become f_tap. Inner-block duplicates of function-scope ref/dis/stride/ii (identical types and initialisers) are simply removed. The two scalar VifResiduals residuals declarations that shadowed function-scope Residuals512 residuals become tail_residuals. The two const uint16_t fcoeff declarations that shadowed function-scope __m512i fcoeff become fcoeff_scalar.
  • Invariant: bit-exactness gate — the rename sweep must not change any score. The Netflix CPU golden 3 (src01_hrc00, checkerboard_1, checkerboard_10) ran clean against this PR. All 76 VMAF-targeted Python tests pass; the 9 unrelated pre-existing failures (NIQE, PyPSNR, FileSystemResultStore) reproduce on a pristine origin/master checkout.
  • On upstream sync: Netflix has no equivalent renames on upstream master as of 2026-05-09. When syncing, prefer the fork's renamed identifiers (the CodeQL gate depends on them). If Netflix later renames the same locals differently, reconcile by keeping fork names and updating any imported chunks at port time.
  • Re-test on rebase:
meson test -C build --suite=fast
PYTHONPATH=$PWD/python python3 -m pytest \
  python/test/quality_runner_test.py -k test_run_vmaf \
  python/test/vmafexec_test.py \
  python/test/vmafexec_feature_extractor_test.py \
  -m "not slow" -q

ADR-0209 v1 stdio runtime (T5-2b) — Embedded MCP server (2026-05-08)

  • Touches: core/src/mcp/{mcp.c,dispatcher.c,transport_stdio.c,mcp_internal.h,meson.build,3rdparty/cJSON/{cJSON.c,cJSON.h,LICENSE}}, core/test/test_mcp_smoke.c, core/test/meson.build. All paths are fork-local. cJSON is vendored verbatim from upstream DaveGamble/cJSON@v1.7.18 under its MIT license.
  • Invariant: every TU under core/src/mcp/ (other than the vendored cJSON dir) is fork-local with the Copyright 2026 Lusoris and Claude (Anthropic) header; cJSON keeps its upstream MIT header verbatim. The public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged from T5-2 — only function bodies flipped from -ENOSYS to working implementations. SSE / UDS still return -ENOSYS so the v2 PR can wire them without touching the public surface.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface; the entire core/src/mcp/ subtree is fork-local. If upstream ever adds an MCP surface, expect a port-only sync since names will collide.
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
  -Denable_mcp=true -Denable_mcp_stdio=true
ninja -C build && meson test -C build test_mcp_smoke -v

ADR-0334 — state.md-touch-check CI gate (2026-05-08)

  • Touches: .github/workflows/rule-enforcement.yml (new top-level job state-md-touch-check), scripts/ci/state-md-touch-check.sh (new), scripts/ci/test-state-md-touch-check.sh (new), scripts/ci/AGENTS.md (new rebase-sensitive-surface row), .github/PULL_REQUEST_TEMPLATE.md (already carries the "Bug-status hygiene" section + no state delta: REASON opt-out — coupled to the script's regex). No upstream-shared paths.
  • Invariant: the gate's trigger predicate (Conventional-Commit fix: prefix, bare bug token in title, GitHub close-keywords closes/fixes/resolves #N, unchecked Bug-status-hygiene checkbox) and opt-out sentinel (no state delta: REASON) match the wording of the ## Bug-status hygiene section in .github/PULL_REQUEST_TEMPLATE.md. Reword the template only alongside the script. The job carries the pull_request.draft == false || github.event_name != 'pull_request' gate (ADR-0331 pattern) — keep that on any future hoist into the required-aggregator set.
  • On upstream sync: Netflix/vmaf has no equivalent rule. No conflict expected; the workflow file is fork-introduced.
  • Re-test on rebase:
bash scripts/ci/test-state-md-touch-check.sh
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/rule-enforcement.yml')); print('YAML OK')"
pre-commit run shellcheck --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
pre-commit run shfmt --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh

SYCL PSNR chroma extension (T3-15(b), 2026-05-09)

  • Touches: core/src/feature/sycl/integer_psnr_sycl.cpp (per-extractor chroma device buffers, per-plane SSE accumulators, and a provided_features extension to psnr_y / psnr_cb / psnr_cr), core/src/sycl/AGENTS.md (per-kernel rebase-sensitive invariant for the chroma-on-per-extractor-buffer arrangement), docs/metrics/features.md (footnote ¹ refresh — all three GPU PSNR extractors now emit chroma), docs/adr/0192-gpu-long-tail-batch-3.md References-section status update, changelog.d/added/sycl-psnr-chroma.md.
  • Invariant on the chroma upload path: chroma planes ride on per-extractor device buffers populated by host-side staging copies in the combined-graph pre_fn callback — NOT the SYCL state's shared frame buffer (vmaf_sycl_shared_frame_init), which is luma-only by design. Luma stays graph-recorded; chroma SSE kernels run direct in post_fn on the same in-order combined queue. The CUDA twin (PR #520 / commit 7f3d58a5) uses the existing CUDA per-plane picture infrastructure and therefore has no equivalent invariant.
  • On upstream sync: Netflix/vmaf upstream has no SYCL backend at all, so conflict probability is zero on psnr_sycl. If an upstream port to the fork's SYCL runtime someday extends vmaf_sycl_shared_frame_init to allocate chroma planes, the PSNR extension can be migrated onto it and the per-extractor chroma buffers retired, but only after a cross-backend gate run confirms bit-exactness against CPU at places=4 (ADR-0214).
source /opt/intel/oneapi/setvars.sh
CC=icx CXX=icpx meson setup build-sycl libvmaf \
  -Denable_sycl=true -Denable_cuda=false
ninja -C build-sycl
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-sycl/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr --backend sycl --device 0
# Expect 0/48 mismatches across psnr_y / psnr_cb / psnr_cr at places=4.

```text

Cppcheck nullPointer false-positive in dict.c (2026-05-09)

Files pinned:

  • core/src/dict.c:121 (one-line redundant-condition fix in dict_overwrite_existing).

Why this rebase-note exists: Master CI's Cppcheck (Whole Project) gate started failing on commit 14b5ffba (#537) and blocked every open PR because each PR rebases onto a broken master. The cppcheck finding was likely always present but masked by paths-ignore filtering on the prior workflow shape; PR #530 widened cppcheck's trigger surface and exposed it. The fix deletes the redundant && val guard, since val is already checked at the public entry-point vmaf_dictionary_set (dict.c:137). No behavior change; cppcheck flags the original as "either the val check is redundant or there's a possible null deref" because it can't prove the interprocedural guarantee.

Rebase-sensitivity: zero; the change is local to dict.c. A future upstream sync of this file should keep the fix, or re-run cppcheck locally to confirm the finding does not recur.

Aggregator timeout bump (2026-05-09)

Files pinned:

  • .github/workflows/required-aggregator.yml (deadline 30→90 min, job timeout 35→100 min)

Why: 41 PRs in flight on the morning of 2026-05-09 hit Aggregator timeouts while real CI eventually passed. Bumping both deadlines unblocks the train without touching the underlying matrix.

Rebase-sensitivity: zero; the workflow file is wholly fork-local.

ARC self-hosted runner pool — pilot Cppcheck routing (2026-05-09)

  • .github/workflows/lint-and-format.yml (Cppcheck runs-on: ternary).

Why: opt-in graceful migration; ADR-0359 + docs/development/ci-runners.md document the flip-the-variable recipe when the cluster is degraded.

Rebase-sensitivity: zero; the workflow file is fork-local.

ADR-0338 — macOS Vulkan-via-MoltenVK CI lane (2026-05-09)

  • Touches: .github/workflows/libvmaf-build-matrix.yml (fork-local — adds Build — macOS Vulkan via MoltenVK (advisory) lane, adds continue-on-error plumbing on matrix.experimental && matrix.moltenvk, adds Install MoltenVK + Vulkan loader/headers (macOS) step, adds Run Vulkan smoke tests (macOS MoltenVK) step, gates the existing test/cache/tox steps on !matrix.moltenvk), docs/backends/vulkan/moltenvk.md (new fork-local doc), docs/adr/0127-vulkan-compute-backend.md (status-update appendix per the ADR's Proposed status — body untouched), docs/adr/0338-macos-vulkan-via-moltenvk-lane.md (new), docs/adr/_index_fragments/0338-macos-vulkan-via-moltenvk-lane.md plus _order.txt append (new), docs/research/0089-moltenvk-feasibility-on-fork-shaders.md (new), changelog.d/added/macos-vulkan-via-moltenvk-lane.md (new).
  • Invariant on the upstream-mirror file: none — libvmaf-build-matrix.yml is fork-local. The new lane's continue-on-error clause MUST stay scoped to matrix.experimental == true && matrix.moltenvk == true so existing experimental: true matrix entries (e.g. the macOS DNN lane) keep their default fail-fast behaviour. VK_ICD_FILENAMES MUST point at /opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json — note the etc/vulkan segment, NOT share/vulkan (the homebrew formula's install layout uses etc/; verified against Formula/m/molten-vk.rb).
  • On upstream sync: Netflix upstream has no macOS Vulkan lane and no MoltenVK awareness; nothing to reconcile. If a future MoltenVK release drops support for GL_EXT_shader_atomic_int64 translation, moment.comp will fail on the lane; the fix path is in ADR-0338 §Decision (lane is continue-on-error so it does not block PRs) — update the known-limitations table in docs/backends/vulkan/moltenvk.md and either pin a working MoltenVK version in the brew install line or rewrite the shader.
  • Re-test on rebase:
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/libvmaf-build-matrix.yml'))" && \
  echo "YAML parse OK"
# Confirm the lane is still in the matrix:
grep -q "Build — macOS Vulkan via MoltenVK (advisory)" \
  .github/workflows/libvmaf-build-matrix.yml
# Confirm the lane is NOT promoted to required-aggregator until one
# green run on master (per ADR-0338):
! grep -q "macOS Vulkan via MoltenVK" \
  .github/workflows/required-aggregator.yml
# Confirm the ICD path is the etc/ one, not share/:
grep -q "etc/vulkan/icd.d/MoltenVK_icd.json" \
  .github/workflows/libvmaf-build-matrix.yml

ADR-0363 — Mend Renovate replaces Dependabot (2026-05-09)

  • Touches: renovate.json (new, repo-root), .github/workflows/renovate.yml (new), .github/dependabot.yml (deleted — renamed to .github/dependabot.yml.disabled), docs/development/dependency-bot.md (new operator playbook), changelog.d/changed/renovate-supersedes-dependabot.md (new), docs/adr/0363-renovate-replaces-dependabot.md (new), docs/adr/_index_fragments/0363-renovate-replaces-dependabot.md (new).
  • Invariant: .github/dependabot.yml no longer exists on master; the disabled copy is dependabot.yml.disabled. On upstream sync, if Netflix ever ships their own dependabot.yml, do NOT restore it — the fork intentionally uses Renovate. Merge the upstream file into dependabot.yml.disabled for reference only.
  • Upstream interaction: none. Netflix/vmaf upstream has no Renovate config. Conflict risk is zero unless upstream adds renovate.json or restores dependabot.yml.
  • Re-test on rebase:
# Verify the workflow SHA-pin is still present and non-floating:
grep -E 'renovatebot/github-action@[a-f0-9]{40}' .github/workflows/renovate.yml
# Verify dependabot.yml is still absent:
test ! -f .github/dependabot.yml && echo "ok: dependabot.yml absent"
# Validate renovate.json syntax (requires Node):
node -e "JSON.parse(require('fs').readFileSync('renovate.json','utf8')); console.log('JSON valid')"

ADR-0355 — Symphony-inspired agent-dispatch infrastructure (2026-05-09)

Files added (all fork-introduced, none mirror upstream):

  • .claude/workflows/_template.md, .claude/workflows/codeql-alert-sweep.md, .claude/workflows/simd-port.md, .claude/workflows/feature-extractor-port.md.
  • scripts/lib/__init__.py, scripts/lib/backlog_tracker.py, scripts/lib/AGENTS.md.
  • scripts/ci/agent-eligibility-precheck.py (new row in scripts/ci/AGENTS.md "Rebase-sensitive surfaces" table).
  • docs/development/agent-dispatch.md.

Why this rebase-note exists: pure additive; all paths are fork-only (.claude/, scripts/lib/, fork-only docs). Upstream Netflix/vmaf has no .claude/, no scripts/lib/, and no docs/development/agent-dispatch.md, so the merge surface is zero on /sync-upstream. The only coupling is internal, between scripts/ci/agent-eligibility-precheck.py and scripts/lib/backlog_tracker.py (sys.path import). Both files move together; this is documented in scripts/lib/AGENTS.md and a new row in scripts/ci/AGENTS.md.

Rebase-sensitivity: zero w.r.t. upstream. Internal-only: renaming BacklogItem field names or the BacklogTracker / GitHubTracker public method signatures is a breaking change for the precheck and any future state-audit script; guard via the smoke listed in Research-0091 §"Smoke results" before any rename PR.

Format-coupling note: the BACKLOG.md row regex (scripts/lib/backlog_tracker.py:_ID_PATTERN) is brittle against table-shape edits. If a future BACKLOG.md edit adds a column or renames a status word, the parser will silently mis-classify rows. The smoke parses 101 rows on master at 2026-05-09; expect ≥ 100 after any structural edit.
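The format-coupling risk can be illustrated with a toy parser. The row shape, ROW_PATTERN, and the status words below are invented for the example (the real _ID_PATTERN is not reproduced here); the point is only that a regex tied to the column count drops rows silently, and that a row-count floor turns the silent drop into a loud failure.

```python
import re

# Hypothetical three-column row: | ID | title | status |
ROW_PATTERN = re.compile(
    r"^\|\s*(T\d+-\d+)\s*\|\s*([^|]+?)\s*\|\s*(open|done|deferred)\s*\|\s*$"
)


def parse_rows(text):
    """Return (id, title, status) for every line matching ROW_PATTERN."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            rows.append(match.groups())
    return rows


def require_row_floor(text, floor):
    """Parse rows and raise if fewer than `floor` were recognised."""
    rows = parse_rows(text)
    if len(rows) < floor:
        raise ValueError(f"parsed {len(rows)} rows, expected at least {floor}")
    return rows


if __name__ == "__main__":
    table = (
        "| T3-9 | psnr_hvs re-bench | done |\n"
        "| T7-5 | VifBuffer refactor | deferred |\n"
    )
    print(len(require_row_floor(table, 2)))
    widened = table.replace(" |\n", " | owner |\n")  # a new trailing column
    print(len(parse_rows(widened)))  # rows no longer fit the three-column shape
```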

0350 — psnr_hvs AVX-512 ceiling re-bench (ADR-0350, T3-9 (a))

  • docs/adr/0350-psnr-hvs-avx512-ceiling.md — closure ADR.
  • docs/adr/0160-psnr-hvs-neon-bitexact.md — appended ### Status update 2026-05-09 appendix.
  • docs/research/0091-psnr-hvs-avx512-bench-2026-05-09.md (empirical companion: cycle share, Amdahl ceiling, reproducer).

Why this rebase-note exists: T3-9 (a) closes as AVX2 ceiling. The result has zero rebase-sensitivity by itself (no engine code changes), but the bit-exactness invariants that lock it to a ceiling do. The 78.42 % scalar tail in calc_psnrhvs_avx2 / calc_psnrhvs_neon is locked by ADR-0138 / ADR-0139's "per-lane-scalar float reduction" rule (carried by ADR-0159 / ADR-0160). If a future upstream sync of core/src/feature/third_party/xiph/psnr_hvs.c (the Xiph/Daala DCT) changes the per-block summation tree (e.g. partial folding, re-ordered means, vectorised mask reductions), the AVX2 + NEON TUs in core/src/feature/x86/psnr_hvs_avx2.c and core/src/feature/arm64/psnr_hvs_neon.c MUST be re-audited against the new scalar reference, and the ceiling argument in ADR-0350 must be re-run (because the 78 / 15 cycle-share split would shift).

Rebase-sensitivity: low for the ceiling decision itself (an empirical re-bench on a current host is cheap: 30 seconds via the reproducer in Research-0091 §7); high for the underlying bit-exactness invariants the decision rests on (Netflix golden trips on ≥ 5.5e-5 drift per ADR-0160 §Context). The ADR-0350 §Verification reproducer is the gate; re-run it if the cycle share shifts, the Netflix normal-pair fixture changes, or a new host class (e.g. wide-issue Granite Rapids) goes into CI.
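The ceiling argument is an application of Amdahl's law. The sketch below assumes the 78.42 % scalar tail cannot be accelerated at all and that everything else is the part a wider vector unit could speed up; both are simplifications, and amdahl_speedup is an illustrative helper rather than anything in the fork.

```python
def amdahl_speedup(accelerated_share, factor):
    """Overall speedup when `accelerated_share` of the runtime gets `factor` times faster."""
    if not 0.0 <= accelerated_share <= 1.0:
        raise ValueError("accelerated_share must be within [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_share) + accelerated_share / factor)


if __name__ == "__main__":
    scalar_tail = 0.7842              # cycle share quoted in this note
    vector_share = 1.0 - scalar_tail  # assumption: all of the rest is vectorisable
    print(f"2x wider vectors: {amdahl_speedup(vector_share, 2.0):.3f}x")
    print(f"upper bound:      {1.0 / scalar_tail:.3f}x")
```

Under these assumptions the whole-function gain stays below 1/0.7842, roughly 1.28x, however fast the vector part becomes.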

0320 — FFmpeg n8.1 → n8.1.1 base bump (2026-05-09)

  • Touches: ffmpeg-patches/series.txt (header comment), ffmpeg-patches/README.md (apply / verify / smoke sections), ffmpeg-patches/test/build-and-run.sh (FFMPEG_SHA default), scripts/ci/ffmpeg-patches-check.sh (header comment; FFMPEG_BRANCH env default unchanged at release/8.1 since the branch tracks point releases), docs/development/automated-rule-enforcement.md (gate description). The 9 .patch files themselves are unchanged — every patch in the series applied cleanly, cumulatively, against pristine n8.1.1 via git am --3way.
  • Upstream source: FFmpeg upstream point release n8.1.1 (commit 239f2c7 "Bump micro for 8.1.1") — bug-fix-only on top of n8.1, no API or AVOption breakage that the patch stack consumes.
  • Invariant: the patch stack continues to apply against the current tip of FFmpeg's release/8.1 branch. Per ADR-0118 and ADR-0186 §FFmpeg patch coupling, the verification gate is cumulative git am --3way against a pristine checkout, not per-patch standalone apply. The scripts/ci/ffmpeg-patches-check.sh local gate uses git apply (no commit) but accumulates state in the same way.
  • On upstream sync: no action required. If a future FFmpeg point release (n8.1.2 or n8.2) lands new hunks that conflict with one of the patches, regenerate the affected patches via git format-patch on the resolved state, bump the references in the five files listed under "Touches", and add a fresh rebase-notes entry citing the conflict file(s).
  • Re-test on rebase:
cd /tmp && rm -rf ffmpeg-n811 && \
  git clone --depth 1 --branch n8.1.1 \
    https://git.ffmpeg.org/ffmpeg.git ffmpeg-n811
git -C /tmp/ffmpeg-n811 config user.email agent@local
git -C /tmp/ffmpeg-n811 config user.name agent
for p in ffmpeg-patches/000*-*.patch; do
  git -C /tmp/ffmpeg-n811 am --3way "$p" || break
done
bash scripts/ci/ffmpeg-patches-check.sh

ADR-0281 follow-up — QSV install-matrix discoverability backfill (2026-05-08)

  • Touches: docs/getting-started/install/{arch,fedora,ubuntu,macos,windows}.md (new ## Intel QSV section per page), docs/adr/0281-vmaf-tune-qsv-adapters.md (status-update appendix per ADR-0028), changelog.d/changed/qsv-install-matrix-docs.md (new fragment). No code, no engine, no upstream-shared C / Python source touched. Pure documentation backfill closing the SYCL-audit research-0086 Topic C gap (issue #464).
  • Invariant: each per-OS QSV section pins the package names against verified upstream URLs with a Verified 2026-05-08 access date. The hardware-generation matrix is sourced from the public Wikipedia "Intel Quick Sync Video — Hardware decoding and encoding" table; if Intel revises which generation supports AV1 encode (e.g. backports the encoder to Lunar Lake / Meteor Lake silicon currently absent from the table), the matrix in all five pages must move in lockstep — the Arch / Fedora / Ubuntu / Windows pages all carry the same matrix verbatim. The macOS page deliberately omits the matrix (QSV unsupported on macOS).
  • On upstream sync: no action required — Netflix/vmaf upstream does not ship per-OS install pages under docs/getting-started/install/; that tree is fork-only.

# Lint the install pages (markdownlint via pre-commit):
pre-commit run --files docs/getting-started/install/*.md
# Verify each page (except alpine + macos) still carries the matrix:
for f in arch fedora ubuntu windows; do
  grep -q 'Arc Battlemage' "docs/getting-started/install/${f}.md" || echo "MISSING: ${f}"
done
# Confirm the macOS page documents QSV as unsupported:
grep -q 'Intel QSV. is unsupported on macOS' docs/getting-started/install/macos.md

0333 — vmaf-tune Phase F multi-pass encoding (ADR-0333)

Touches:

  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (CodecAdapter Protocol gains supports_two_pass: bool + two_pass_args(...))
  • tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (overrides both)
  • tools/vmaf-tune/src/vmaftune/encode.py (EncodeRequest gains pass_number / stats_path; build_ffmpeg_command adds the 2-pass argv splice + pass-1 null-muxer redirect; new run_two_pass_encode)
  • tools/vmaf-tune/src/vmaftune/corpus.py (CorpusOptions.two_pass, routing in iter_rows)
  • tools/vmaf-tune/src/vmaftune/cli.py (--two-pass flag on corpus / recommend subparsers)

Invariant: 2-pass encoding routes through the codec adapter via supports_two_pass + two_pass_args(pass_number, stats_path). The encode driver never branches on codec name. Adapters with supports_two_pass = False fall back to a single pass with a stderr warning rather than an error; the seam is open for sibling codec adapters (libx264, libsvtav1, libvvenc, libaom-av1) to opt in by overriding the two methods in their own adapter file. This is the fork-local extension to the ADR-0237 Phase A multi-codec contract; upstream Netflix/vmaf has no equivalent and does not own this code path.

Re-test:
cd tools/vmaf-tune
python -m pytest tests/test_codec_adapter_x265_two_pass.py -q

(Optional, requires ffmpeg + libx265 in the runner's PATH:)

VMAF_TUNE_INTEGRATION=1 python -m pytest \
  tests/test_codec_adapter_x265_two_pass.py::test_real_x265_two_pass_smoke -q

Rebase-sensitivity: zero from upstream — tools/vmaf-tune/ is fork-local. The only concern is the codec_adapters Protocol shape: a future upstream commit that adds a sibling codec adapter SHOULD inherit the supports_two_pass = False default and either explicitly opt in or leave the flag off. Downstream sibling-codec PRs in this fork should follow the ADR-0288 / ADR-0333 pattern: one adapter file, override the two methods, add a test file mirroring test_codec_adapter_x265_two_pass.py.
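The adapter seam described above can be sketched as follows. Only supports_two_pass and two_pass_args(pass_number, stats_path) come from this note; the return type, the X265Like and SinglePassOnly classes, plan_passes, and the exact argv are assumptions made for illustration, not the vmaftune implementation.

```python
import sys
from pathlib import Path
from typing import List, Protocol


class CodecAdapter(Protocol):
    supports_two_pass: bool

    def two_pass_args(self, pass_number: int, stats_path: Path) -> List[str]: ...


class X265Like:
    """Hypothetical adapter that opts in to 2-pass encoding."""

    supports_two_pass = True

    def two_pass_args(self, pass_number: int, stats_path: Path) -> List[str]:
        # Assumed argv shape: x265's pass/stats options via -x265-params.
        return ["-x265-params", f"pass={pass_number}:stats={stats_path}"]


class SinglePassOnly:
    """Hypothetical adapter that keeps the default (no 2-pass support)."""

    supports_two_pass = False

    def two_pass_args(self, pass_number: int, stats_path: Path) -> List[str]:
        return []


def plan_passes(adapter: CodecAdapter, stats_path: Path) -> List[List[str]]:
    """Return the extra argv for each pass without looking at the codec name."""
    if not adapter.supports_two_pass:
        print("warning: adapter has no 2-pass support, using a single pass",
              file=sys.stderr)
        return [[]]
    return [adapter.two_pass_args(n, stats_path) for n in (1, 2)]


if __name__ == "__main__":
    stats = Path("x265_2pass.log")
    print(plan_passes(X265Like(), stats))
    print(plan_passes(SinglePassOnly(), stats))
```

Because plan_passes only reads the flag and calls the method, a sibling adapter can opt in without any change to the driver, which is the property the invariant protects.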

ADR-0360 — CAMBI CUDA port (T3-15a, 2026-05-09)

Files pinned:

  • core/src/feature/cuda/integer_cambi_cuda.c (new)
  • core/src/feature/cuda/integer_cambi_cuda.h (new)
  • core/src/feature/cuda/integer_cambi/cambi_score.cu (new)
  • core/src/feature/feature_extractor.c (added vmaf_fex_cambi_cuda to list)
  • core/src/meson.build (added cambi_score to cuda_cu_sources, added integer_cambi_cuda.c to CUDA feature sources)

Why: The CUDA twin of vmaf_fex_cambi (Strategy II hybrid — three GPU kernels for the embarrassingly parallel stages; calculate_c_values + topK on CPU). Registers vmaf_fex_cambi_cuda under #if HAVE_CUDA guard.

Rebase-sensitivity: low. The three new files are wholly fork-local and will not conflict. The two upstream-shared files have small, self-contained hunks:

  • feature_extractor.c: the extern vmaf_fex_cambi_cuda declaration and the &vmaf_fex_cambi_cuda array entry are inside a #if HAVE_CUDA block. Upstream's additions to this file (new feature extractors, new dispatch flags) will not conflict unless Netflix adds their own CUDA twin for CAMBI (unlikely — they don't ship a CUDA backend).
  • meson.build: the cambi_score entry in the cuda_cu_sources dict and the integer_cambi_cuda.c line in the CUDA sources list. Any upstream changes to meson.build that restructure the cuda_cu_sources dict would require a manual merge; the dict entries are sorted alphabetically by key, so cambi_score lands between adm_score and motion_score.

If upstream adds cambi_cuda themselves: drop the fork copy and check for API divergence. Strategy II hybrid is the natural choice; the upstream implementation may differ if they choose Strategy III (fully-on-GPU calculate_c_values).

cambi_internal.h dependency: integer_cambi_cuda.c includes core/src/feature/cambi_internal.h (fork-added trampoline exposing cambi.c's static helpers). If upstream significantly refactors cambi.c (renames vmaf_cambi_preprocessing, vmaf_cambi_calculate_c_values, etc.), cambi_internal.h must be updated alongside. This is the same dependency the Vulkan twin (cambi_vulkan.c) has — see ADR-0210's rebase note for the full list of exposed functions.

Vulkan submit-pool PR-B: six secondary kernels (2026-05-09, ADR-0353)

Files changed:

  • core/src/feature/vulkan/ssim_vulkan.c
  • core/src/feature/vulkan/ciede_vulkan.c
  • core/src/feature/vulkan/ms_ssim_vulkan.c
  • core/src/feature/vulkan/motion_v2_vulkan.c
  • core/src/feature/vulkan/float_psnr_vulkan.c
  • core/src/feature/vulkan/float_motion_vulkan.c
  • core/src/feature/vulkan/AGENTS.md
  • docs/adr/0353-vulkan-submit-pool-pr-b-six-kernels.md

Why this rebase-note exists: six Vulkan host-glue TUs were migrated from per-frame command-buffer and descriptor-set allocation to the VmafVulkanKernelSubmitPool abstraction (ADR-0256). Any Netflix upstream sync that touches these same files (unlikely — they are fork-local) must preserve the VmafVulkanKernelSubmitPool fields in the state struct and the pool-destroy-before-pipeline-destroy ordering in close_fex().

Rebase-sensitivity: low. All six files are entirely fork-local; Netflix upstream does not have a Vulkan backend. The submit-pool API is defined in core/src/vulkan/kernel.h (also fork-local). No public header or C-API surface was changed; the FFmpeg patch series is unaffected.

Key invariant to preserve on rebase: vmaf_vulkan_kernel_submit_pool_destroy MUST be called before vmaf_vulkan_kernel_pipeline_destroy in every migrated kernel's close_fex(). See core/src/feature/vulkan/AGENTS.md §"Submit-pool ordering invariant".
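The ordering invariant above can be spot-checked mechanically after a rebase. The following is a minimal sketch, assuming the text of a kernel's close_fex() body has already been extracted as a string; the helper names and the sample C snippets (including their argument lists) are hypothetical and not taken from the fork's sources. It is a textual heuristic, not a substitute for the Vulkan test suite.

```python
import re

POOL_DESTROY = "vmaf_vulkan_kernel_submit_pool_destroy"
PIPELINE_DESTROY = "vmaf_vulkan_kernel_pipeline_destroy"


def call_offsets(name, source):
    """Character offsets of every call to `name` in `source`."""
    return [m.start() for m in re.finditer(re.escape(name) + r"\s*\(", source)]


def destroy_order_ok(close_fex_body):
    """True when every pool destroy appears before the first pipeline destroy."""
    pool = call_offsets(POOL_DESTROY, close_fex_body)
    pipeline = call_offsets(PIPELINE_DESTROY, close_fex_body)
    if not pool or not pipeline:
        return False  # a migrated kernel is expected to contain both calls
    return max(pool) < min(pipeline)


# Hypothetical close_fex() bodies; the argument names are made up for illustration.
GOOD = """
    vmaf_vulkan_kernel_submit_pool_destroy(s->ctx, &s->sub_pool);
    vmaf_vulkan_kernel_pipeline_destroy(s->ctx, &s->pipeline);
"""
BAD = """
    vmaf_vulkan_kernel_pipeline_destroy(s->ctx, &s->pipeline);
    vmaf_vulkan_kernel_submit_pool_destroy(s->ctx, &s->sub_pool);
"""

print(destroy_order_ok(GOOD), destroy_order_ok(BAD))
```

Comparing the last pool-destroy offset against the first pipeline-destroy offset also covers kernels that own several pipelines, which is the stricter reading of the invariant.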

0354 — Vulkan submit-pool PR-C: submit_pool_destroy-before-pipeline ordering

  • Touches: core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vulkan/float_ansnr_vulkan.c, core/src/feature/vulkan/moment_vulkan.c.
  • Invariant: In every migrated extractor, vmaf_vulkan_kernel_submit_pool_destroy() MUST precede every vmaf_vulkan_kernel_pipeline_destroy() call in close_fex(). Reversing the order frees the pool's command buffers after the pipeline's command pool is destroyed — undefined behaviour per Vulkan spec §6.2.
  • Re-test: meson test -C build --suite=vulkan passes. scripts/ci/cross_backend_vif_diff.py shows places=4 for all four extractors on all three target devices (RTX 4090, Arc A380, RADV iGPU).

0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0291)

0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0352)

  • Touches: core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/psnr_vulkan.c (all fork-local Vulkan kernels; no upstream C paths touched), changelog.d/changed/vulkan-submit-pool-pr-a-adm-motion-psnr.md, docs/adr/0291-vulkan-submit-pool-pr-a-adm-motion-psnr.md.
  • Invariant: Each migrated TU adds VmafVulkanKernelSubmitPool sub_pool and pre-allocated VkDescriptorSet field(s) to its state struct. The pool must be destroyed (vmaf_vulkan_kernel_submit_pool_destroy) before vmaf_vulkan_kernel_pipeline_destroy in close_fex(); reversing the order would destroy the descriptor pool while the submit pool still holds live command buffer + fence references. Descriptor sets allocated via vmaf_vulkan_kernel_descriptor_sets_alloc are freed implicitly by the descriptor pool tear-down — do NOT call vkFreeDescriptorSets on them in close_fex(). For motion_vulkan, the pre-allocated set is rebound once per frame via vkUpdateDescriptorSets because the blur ping-pong changes which blur[] slot is "current"; for adm_vulkan and psnr_vulkan the sets are stable after init() and require no per-frame update.
  • Upstream interaction: none. All three files are fork-local Vulkan kernel TUs not present in Netflix/vmaf upstream.
  • On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths. The Vulkan backend is entirely fork-introduced.
  • Re-test on rebase:
meson test -C build --suite=fast
# Cross-backend parity gate (places=4):
python python/test/cross_backend_diff.py \
    --features adm motion psnr \
    --backend vulkan cpu \
    --places 4 \
    --yuv testdata/yuv/src01_hrc00_576x324.yuv \
            testdata/yuv/src01_hrc01_576x324.yuv

ADR-0350 — FFmpeg libvmaf filter CUDA backend selector (0010 patch)

Patch: ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch.

  • libavfilter/vf_libvmaf.c — adds cuda AVOption + state field + init / cleanup / picture-pool wiring under CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER.
  • configure — adds --enable-libvmaf-cuda (EXTERNAL_LIBRARY_LIST entry + help text), promotes libvmaf_cuda from blanket-autodetect to gated enabled libvmaf_cuda && require_pkg_config + check, preserves the enabled libvmaf && check_pkg_config libvmaf_cuda in-filter probe so the new selector still works without the explicit flag when libvmaf ships CUDA.

Why this rebase-note exists: Patch 0010 extends the SYCL (0003) / Vulkan (0004) per-context backend selectors to CUDA on the regular libvmaf filter. The patch coexists with the upstream dedicated libvmaf_cuda filter (CONFIG_LIBVMAF_CUDA_FILTER) by gating its struct field and code paths on !CONFIG_LIBVMAF_CUDA_FILTER — the dedicated filter keeps owning its own cu_state field. CLAUDE.md §12 r14 makes the patch update mandatory because the change touches a filter consumer of the vmaf_cuda_state_init / _import_state / _state_free / _preallocate_pictures / _fetch_preallocated_picture C-API surface in libvmaf_cuda.h.

Rebase-sensitivity: low. The patch's vf_libvmaf.c hunks are context-anchored on the SYCL/Vulkan selector blocks; if upstream FFmpeg renames CONFIG_LIBVMAF_CUDA_FILTER or moves the libvmaf_cuda.h include, the include guard at the top of the file needs the corresponding update. The configure hunks are context-anchored on the existing --enable-libvmaf-sycl / --enable-libvmaf-vulkan lines — those have proven stable across n8.0 → n8.1 → n8.1.1, so drift risk is low. When VmafCudaConfiguration ever grows a device_index field upstream, swap the cuda boolean for an int cuda_device mirroring SYCL's shape (separate ADR + patch refresh).

Verification gate: cumulative git am --3way replay of ffmpeg-patches/000{1..9}-*.patch + 0010-* against pristine FFmpeg n8.1.1 PASS (2026-05-09). Build of libavfilter/vf_libvmaf.o PASS under both CONFIG_LIBVMAF_CUDA=0 (selector errors at filter-init time per #else branch) and CONFIG_LIBVMAF_CUDA=1 && !CONFIG_LIBVMAF_CUDA_FILTER (selector active, picture-pool wiring compiles).

0320 — Vulkan instance / VMA apiVersion bump to 1.4 (Step B)

  • Touches: core/src/vulkan/common.c, core/src/vulkan/vma_impl.cpp, core/src/vulkan/AGENTS.md.
  • Invariant: the four apiVersion sites (lines 54, 264, 374 of common.c; line 22 of vma_impl.cpp) request Vulkan 1.4, not 1.3. Together with the Step-A precise decorations in vif.comp / ciede.comp (PR #346) and the Phase-3 cross-subgroup release-acquire fix (PR #511), this gates the cross-backend places=4 contract on Arc + RADV. NVIDIA closure depends on Phase 3c (PR #512; block-on-merge until that lands). Netflix upstream does not carry a VMA dependency or a Vulkan backend; no upstream merge conflict expected on these files.
  • Re-test on rebase:
meson setup build -Denable_vulkan=enabled -Denable_cuda=false \
  -Denable_sycl=false --buildtype=release
ninja -C build
for D in 0 1 2; do
  python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --backends cpu vulkan --vulkan-device "$D" \
    --features vif ciede adm motion psnr
done
# All 0/N mismatches at places=4 once Phase 3c (PR #512) has landed.
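The "0/N mismatches at places=4" wording can be read as a per-frame comparison in the style of unittest's assertAlmostEqual. The sketch below assumes that reading; the gate script's actual comparison logic is not shown in these notes, and the function name and sample scores are hypothetical.

```python
def count_mismatches(cpu_scores, gpu_scores, places=4):
    """Count frames whose CPU/GPU scores differ once rounded to `places` decimals.

    Mirrors unittest.TestCase.assertAlmostEqual: two values agree when
    round(abs(a - b), places) == 0.
    """
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("score lists must cover the same frames")
    return sum(
        1
        for cpu, gpu in zip(cpu_scores, gpu_scores)
        if round(abs(cpu - gpu), places) != 0
    )


# Hypothetical per-frame scores: frame 0 differs by 4e-5 (within places=4),
# frame 1 differs by 2e-4 (outside), frame 2 is identical.
cpu = [0.5, 0.75, 1.0]
gpu = [0.50004, 0.7502, 1.0]
print(f"{count_mismatches(cpu, gpu)}/{len(cpu)} mismatches")
```

A gate built this way passes only when the count is zero, which matches the stance elsewhere in these notes that drift is fixed in the kernel and the tolerance is never relaxed.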

ADR-0332 v2 runtime (T5-2c) — Embedded MCP server UDS + real compute_vmaf (2026-05-09)

  • Touches: core/src/mcp/{mcp.c,dispatcher.c,mcp_internal.h,meson.build,compute_vmaf.c,transport_uds.c}, core/test/test_mcp_smoke.c. All paths are fork-local. No new third-party vendor drop in v2 — mongoose vendoring stays deferred to v3 with the SSE transport.
  • Invariant: same as ADR-0209 v1 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only function bodies flipped — vmaf_mcp_start_uds from -ENOSYS to a working AF_UNIX listener; compute_vmaf from a {"status":"deferred_to_v2"} placeholder to a real vmaf_score_pooled binding). Per ADR-0128 § operational guardrails the UDS socket file is created mode 0700; that chmod happens in vmaf_mcp_start_uds after bind and is a load-bearing security invariant — do NOT relax it on rebase. compute_vmaf runs on a per-call ephemeral VmafContext so the host's main scoring run is unperturbed; do NOT rewire it to reuse server->ctx because vmaf_score_pooled commits the model destructively to the context.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. If upstream adds one, expect a port-only sync since names will collide.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
                                -Denable_mcp=true -Denable_mcp_stdio=true \
                                -Denable_mcp_uds=true
ninja -C build && meson test -C build test_mcp_smoke -v
# Real-score smoke (single 576x324 pair):
build/test/test_mcp_smoke 2>&1 | tail -3   # expects "16 tests run, 16 passed"
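The bind-then-chmod step described in the invariant can be illustrated outside the C code. This is a Python analogue under stated assumptions, not the body of vmaf_mcp_start_uds: the function names and the socket file name are hypothetical, and only the sequence (bind, then chmod to 0700, then verify with stat) comes from the entry above.

```python
import os
import socket
import stat
import tempfile


def listen_uds_owner_only(path):
    """Bind an AF_UNIX listener and restrict the socket file to mode 0700."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)          # bind() creates the socket file
    os.chmod(path, 0o700)   # tighten permissions right after bind
    srv.listen(1)
    return srv


def socket_mode(path):
    """Permission bits of the socket file, e.g. 0o700."""
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    sock_path = os.path.join(tmp, "mcp.sock")  # hypothetical socket name
    server = listen_uds_owner_only(sock_path)
    print(oct(socket_mode(sock_path)))
    server.close()
```

A rebase check along these lines only needs the stat call: start the server, read the mode of the socket file, and fail if it is anything other than 0700.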

ADR-0332 v3 runtime (T5-2d) — Embedded MCP server SSE transport (2026-05-09)

  • Touches: core/src/mcp/{mcp.c,mcp_internal.h,meson.build,transport_sse.c}, core/meson_options.txt, core/test/test_mcp_smoke.c, docs/mcp/embedded.md, docs/adr/0332-mcp-runtime-v2.md (status-update appendix). All paths are fork-local. No third-party vendor drop in v3 — the originally-planned mongoose vendor was reversed because cesanta/mongoose 7.18 is GPL-2.0-only OR commercial, incompatible with the fork's BSD-3-Clause-Plus-Patent license (verified at upstream LICENSE 2026-05-09). The SSE transport is plain POSIX sockets in fork-owned C (~500 LOC).
  • Invariant: same as ADR-0209 / ADR-0332 v2 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only vmaf_mcp_start_sse's body flipped from -ENOSYS to a working AF_INET listener). The SSE listener binds INADDR_LOOPBACK only; do NOT switch to INADDR_ANY without a separate ADR + auth design (v3 ships intentionally without CORS/Bearer/per-session auth on the assumption of a same-host trust boundary). The SSE stop path uses shutdown(SHUT_RDWR) before close() — plain close() of an AF_INET listening fd from another thread does NOT unblock accept() on Linux; do NOT remove the shutdown call. enable_mcp_sse is now a feature option (default auto), not boolean false.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. Do NOT re-introduce mongoose (or any GPL-licensed HTTP library) on a future rebase without first amending CLAUDE §1 and adding a separate license-compatibility ADR.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
                                -Denable_mcp=true -Denable_mcp_stdio=true \
                                -Denable_mcp_uds=true \
                                -Denable_mcp_sse=enabled
ninja -C build && meson test -C build test_mcp_smoke -v
build/test/test_mcp_smoke 2>&1 | tail -3   # expects "17 tests run, 17 passed"

Status update 2026-05-09 — placeholder-ref hardening

  • Additional touches: same set as the 2026-05-08 ADR-0334 entry, no new files. The hardening adds a git diff -U0 ... -- docs/state.md call inside scripts/ci/state-md-touch-check.sh (case 4a) plus 10 additional fixture cases in scripts/ci/test-state-md-touch-check.sh.
  • New invariant: inserted lines in docs/state.md (lines starting with +, excluding the +++ b/... header) must not contain this PR / this commit / bare TBD / <PR> / #NNN. Canonical accept forms are PR #N and commit `<sha>`. The placeholder vocabulary is coupled to PR #541's audit findings — reword in lockstep with the ADR-0334 status-update appendix if the fork's row template changes.
  • Re-test on rebase: same bash scripts/ci/test-state-md-touch-check.sh run as the 2026-05-08 entry; the harness now reports 18/18 passed (was 8/8 passed).
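The new invariant amounts to a scan over the inserted lines of a zero-context diff. The sketch below shows that idea in Python; the real check lives in scripts/ci/state-md-touch-check.sh and its exact patterns are not reproduced in these notes, so the regular expression and function name here are assumptions.

```python
import re

# Placeholder vocabulary from the entry above; the precise matching rules of
# the shell check are an assumption.
PLACEHOLDER = re.compile(r"this PR|this commit|\bTBD\b|<PR>|#NNN")


def placeholder_lines(diff_text):
    """Return inserted lines (without the leading '+') that contain a placeholder."""
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++ "):
            continue  # skip deletions, hunk headers, and the '+++ b/...' header
        body = line[1:]
        if PLACEHOLDER.search(body):
            hits.append(body)
    return hits


# Hypothetical `git diff -U0 -- docs/state.md` output.
SAMPLE_DIFF = """\
+++ b/docs/state.md
@@ -10,0 +11,3 @@
+| T-1 | closed in this PR |
+| T-2 | closed in PR #541 |
+| T-3 | TBD |
-| T-0 | closed in this commit |
"""

for bad in placeholder_lines(SAMPLE_DIFF):
    print("placeholder:", bad)
```

Only the rows for T-1 and T-3 are flagged: the deleted line is ignored, and the canonical form PR #541 passes.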

0347 — Sanitizer matrix test-set scope (ADR-0347)

  • Touches: .github/workflows/tests-and-quality-gates.yml job sanitizers (build + test step), core/test/meson.build (no edits — the absence of any suite: 'unit' tag is the upstream state we now work with rather than against).
  • Invariant: the sanitizer job runs the full C unit-test set per sanitizer with a per-sanitizer deselect list driven by a case block on ${{ matrix.sanitizer }}. The deselect lists are load-bearing — each entry corresponds to a real bug tracked in docs/state.md. Under UBSan the build adds -Dc_args=-fno-sanitize=function -Dcpp_args=-fno-sanitize=function to suppress the K&R-prototype harness UB; the meson case branch must keep this build flag in sync with the test deselect entries. An upstream rebase that adds new test files via core/test/meson.build inherits full sanitizer coverage automatically (the workflow enumerates tests via meson test --list).
  • On upstream sync: if upstream Netflix lands a suite: 'unit' tagging convention, the workflow is robust to it (we already enumerate from meson test --list, not from --suite=unit). If upstream rewrites the harness to declare static char *test_X(void) with a (void) parameter, the -fno-sanitize=function flag becomes redundant — leave it in place (zero cost) until a deliberate cleanup PR reverts the suppression. If upstream lands a fix for any of the surfaced defects (SVMModelParser validation, feature_collector metadata leak, integer_adm::div_lookup race, framesync mutex mismatch), drop the corresponding deselect row from the workflow's case block in the same PR that pulls the upstream fix.
  • Re-test on rebase:
cd libvmaf
for SAN in address undefined thread; do
  EXTRA=()
  [ "$SAN" = undefined ] && EXTRA=( "-Dc_args=-fno-sanitize=function" "-Dcpp_args=-fno-sanitize=function" )
  rm -rf "build-$SAN"
  CC=clang CXX=clang++ LDFLAGS=-fuse-ld=lld \
    meson setup "build-$SAN" -Db_sanitize="$SAN" \
    -Denable_cuda=false -Denable_sycl=false --buildtype=debug \
    -Db_lto=false -Db_lundef=false "${EXTRA[@]}"
  meson compile -C "build-$SAN"
  case "$SAN" in
    address)   EXCLUDE='test_model$|test_predict$|test_float_ms_ssim_min_dim$' ;;
    undefined) EXCLUDE='test_model$' ;;
    thread)    EXCLUDE='test_model$|test_pic_preallocation$|test_framesync$' ;;
  esac
  TESTS=$(meson test -C "build-$SAN" --list \
    | grep '^libvmaf:' \
    | grep -vE "$EXCLUDE" \
    | sed 's/^libvmaf://')
  meson test -C "build-$SAN" --print-errorlogs $TESTS
done

CodeQL bulk mechanical sweep — Python tree (2026-05-09)

  • Why this matters on rebase: no rebase impact. The diff lives entirely in python/vmaf/ and one fork-local helper (core/src/vulkan/spv_embed.py). None of the touched Python modules have been changed by Netflix upstream in over four years; the closest churn is unrelated additions to python/vmaf/script/run_*.py driver flags. A future /sync-upstream will land on a clean tree.
  • What changed: dead imports removed; exit() → sys.exit() in seven CLI driver scripts; open(...) → with open(...) in python/vmaf/tools/decorator.py and core/src/vulkan/spv_embed.py; typed except KeyError: pass bodies got an explanatory one-line comment to satisfy py/empty-except; pass removed where it was a no-op tail statement; one commented-out debug block deleted from tools/misc.py.
  • Re-test on rebase: python3 -c "import ast; [ast.parse(open(f).read()) for f in (...)]" over the touched files; ruff check over the same set must produce no NEW errors versus master baseline.

0345 — cambi × {CUDA, SYCL, HIP} GPU port planning (ADR-0345, docs-only)

  • Touches: docs/research/0091-cambi-gpu-port-planning-2026-05-09.md (new), docs/adr/0345-cambi-gpu-port-strategy.md (new), docs/adr/_index_fragments/0345-cambi-gpu-port-strategy.md (new fragment), docs/adr/_index_fragments/_order.txt (append slot), changelog.d/changed/cambi-gpu-planning-digest.md (new). No code. Companion to the per-port PRs that follow per the digest's §6 ordered plan (CUDA → SYCL → HIP).
  • Upstream source: none — fork-local planning artefact. Netflix/vmaf upstream has no CUDA / SYCL / HIP cambi twin and no plans to add one on those backends.
  • Invariant: the planning round locks Strategy II host-staged hybrid for the three pending backends, inheriting verbatim from ADR-0205 §Decision and ADR-0210 §Decision. The cross-backend gate contract for cambi is places=4 from day one on all backends — by construction (integer-only GPU pre-passes; byte-identical readback; unmodified host residual). If any per-port PR sees empirical drift from CPU, fix the kernel — never relax the gate (memory feedback_no_test_weakening). The shared cambi_internal.h host residual surface (shipped with PR #196 for the Vulkan port) is the load-bearing reuse point — all four GPU twins (Vulkan, CUDA, SYCL, HIP) link against it and inherit any future CPU-side c-value formula change automatically.
  • On upstream sync: no action required. If a future upstream sync introduces a Netflix/vmaf cambi GPU twin (extremely unlikely — Netflix has no public CUDA / SYCL / HIP cambi work), evaluate whether to drop the fork's twin in favour of upstream's per the standard prefer-upstream rule; otherwise no action.
  • Re-test on rebase: docs-only — no compile / runtime gate. The Strategy III v2 follow-up (parked per ADR-0205 §Out of scope) gets its own ADR + rebase-notes entry when profile data lands.

0320 — Vulkan VIF API-1.4 NVIDIA residual Phase 3b (deferral)

  • Touches: core/src/feature/vulkan/shaders/vif.comp (comment-only update at the Phase-4 reduction site — documents the Phase-3b candidate-fix experiments and the driver-side hypothesis; no code logic change vs. PR #511); docs/adr/0269-vif-ciede-precise-step-a.md (appended Phase-3b status update appendix; ADR body remains frozen per ADR-0028); docs/research/0090-...md (new); docs/state.md (row T-VK-VIF-1.4-RESIDUAL-ARC retired in favour of T-VK-VIF-1.4-RESIDUAL-NVIDIA-DEFERRED after the hardware-mapping correction); core/src/vulkan/AGENTS.md (Phase 3b update + rebase invariant for cross-backend gate device-name selection); changelog.d/fixed/vif-arc-mesa-anv-int64-reduction.md (new fragment).
  • Invariant: the workgroup-scope memoryBarrierShared(); barrier(); pair PR #511 introduced is load-bearing for the Arc + RADV lanes at API 1.4 and stays. Phase 3b confirmed it cannot be downgraded back to a bare barrier() even if the NVIDIA residual ever closes — Arc's clean state is contingent on the workgroup-scope pair.
  • Cross-backend gate device-selection invariant (NEW): scripts that target a specific Vulkan vendor must select by deviceName substring, not by --vulkan_device <index>. vmaf_vulkan_context_new's device sort is stable inside the same devtype_score bucket and the vkEnumeratePhysicalDevices enumeration order is host-policy-dependent (driver registration order in /etc/vulkan/icd.d/, Mesa device-select layer, VK_LOADER_* env vars). PR #511's commit message inverted the device map on this fork's CI workstation; the empirical numbers it cited as "NVIDIA" actually came from Arc and vice versa. New cross-backend lanes targeting a specific vendor should not inherit the off-by-one.
  • On upstream sync: vif.comp is fork-local; no upstream Netflix/vmaf has a Vulkan path. Cherry-picks from upstream cannot reach this file.
  • Re-test on rebase (assumes a multi-GPU CI workstation with NVIDIA + Arc + RADV; lavapipe-only CI lanes are a no-op for the API-1.4 residual since lavapipe never reproduced the bug):

# Local API-1.4 bump (off-master reproducer; do NOT commit).
sed -i 's/VK_API_VERSION_1_3/VK_API_VERSION_1_4/g' \
    core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1003000/VMA_VULKAN_VERSION 1004000/' \
    core/src/vulkan/vma_impl.cpp
cd libvmaf && meson setup build -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false && ninja -C build
cd ..
# NVIDIA lane — expected 45/48 FAIL scale 2 until either the
# manual int64 subgroup-reduction patch lands or NVIDIA fixes
# the driver. Arc + RADV expected 0/48.
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature vif --backend vulkan --device <NVIDIA-index>
# Revert local bump after testing.
sed -i 's/VK_API_VERSION_1_4/VK_API_VERSION_1_3/g' \
    core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1004000/VMA_VULKAN_VERSION 1003000/' \
    core/src/vulkan/vma_impl.cpp

0230 — ssimulacra2_cuda GPU module unload + per-scale malloc removal (ADR-0356)

  • Touches: core/src/feature/cuda/ssimulacra2_cuda.c (fork-only — fork-added CUDA extractor), core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu (fork-only kernel), core/src/cuda/AGENTS.md (fork-local package guidance).
  • Invariant: every cuModuleLoadData in the fork's CUDA extractors must be paired with a guarded cuModuleUnload in the matching close_fex_cuda, between cuStreamSynchronize and cuStreamDestroy. The leak is invisible to compute-sanitizer --tool memcheck (the tool's leak-checker is scoped to cuMem*Alloc only). The XYB H2D / D2H byte counts shrink to the valid sub-region per scale; the device-side plane_full_pixels stride contract (kernels assume each plane starts at full-resolution offsets) stays unchanged. Pinned scratch reservations h_ref_lin_ds / h_dis_lin_ds are owned by ss2c_alloc_buffers and freed by close_fex_cuda via the existing SS2C_FREE_HOST macro.
  • Upstream interaction: none. ssimulacra2_cuda is fork-added per ADR-0206 and has no upstream Netflix/vmaf twin.
  • Re-test on rebase:
meson test -C core/build test_ssimulacra2_simd

Upstream-port-later batch — Research-0090 18-commit triage close-out (2026-05-09)

  • Touches: docs/state.md (one row in "Deferred (waiting on external trigger)"), this file, changelog.d/changed/upstream-port-later-batch-2026-05-09.md. No code touched. Companion to PR #446 (Research-0090) and the in-flight PRs #497 (MyTestCase super-PR), #443 / #444 (cambi-docs duplicate pair).
  • Per-commit classification (input set: 18 PORT_LATER SHAs from Research-0090):
| # | Upstream SHA | Subject (truncated) | Verdict | Reopen / forward path |
|---|---|---|---|---|
| 1 | 38e905d1 | adopt MyTestCase + reformat BD-rate test data | PORT_DEFERRED | Subsumed by PR #497 commit e1dbdc09; close out when #497 merges |
| 2 | 005988ea | adopt MyTestCase + port new tests + align fifo_mode | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 3 | 4679db83 | fix VMAFEXEC_score tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit 0004d2cf — must preserve fork's golden places= values byte-for-byte (CLAUDE §8 / ADR-0024) |
| 4 | 3e075107 | adopt MyTestCase + update score values in vmafexec tests | PORT_DEFERRED | Subsumed by PR #497 commit 0004d2cf; close out when #497 merges |
| 5 | e3827e4d | adopt MyTestCase + port new tests in asset/bootstrap/local_explainer | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 6 | 25ff9f18 | remove empty VmafossexecCommandLineTest stub | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 13-line deletion. PR #497 currently RE-EMITS the stub; once #497 lands, cherry-pick this commit standalone (zero-conflict against post-#497 tip). |
| 7 | 3a041a97 | adopt MyTestCase + update score values | PORT_DEFERRED | Subsumed by PR #497 commit d52d9221; close out when #497 merges |
| 8 | ead2d12b | fix vif_scale3 + adm3_egl_1 tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit b5a3f61b — Netflix-golden tolerance guard same as row 3 |
| 9 | 6c097fc4 | reduce ADM/VIF tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit f3881d5c — Netflix-golden tolerance guard same as row 3 |
| 10 | 7df50f3a | align testutil with full set of fixture functions | PORT_DEFERRED | Subsumed by PR #497 commit f1ae0495; close out when #497 merges |
| 11 | 322ca041 | replace temporal slicing with pre-sliced YUV fixtures | PORT_DEFERRED | Subsumed by PR #497 commit 7d9d9a10; close out when #497 merges. Sequencing matters: this commit must land before rows 12, 14, 15, 17 (the YUV-fixture consumers); #497 already orders them correctly. |
| 12 | 74bdce1b | align vmafexec_feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 07e7cb48; close out when #497 merges |
| 13 | a3776335 | align feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 15a6874d; close out when #497 merges |
| 14 | 0341f730 | remove duplicate test_run_vmaf_integer_fextractor | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 76-line deletion. Same disposition as row 6 — #497 currently re-emits the duplicate; cherry-pick standalone after #497. |
| 15 | 9fa593eb | port feature_extractor tests for aim/adm3/motion3 + new options | PORT_DEFERRED | Subsumed by PR #497 commit ab21b694; close out when #497 merges |
| 16 | d93495f5 | reduce tolerance for VMAF scores in quality_runner tests | PORT_DEFERRED w/ Netflix-golden guard | PR #497 — Netflix-golden tolerance guard same as row 3 |
| 17 | 7d1ad54b | port feature extractor tests for aim/adm3/motion3 | PORT_DEFERRED | Subsumed by PR #497 commit 44b9e626; close out when #497 merges |
| 18 | 721569bc | resource/doc: cambi_high_res_speedup + motion2 score | PORT_DEFERRED → DEDUP | Already in flight on TWO branches (PR #443 + PR #444). Maintainer picks one and abandons the other per Research-0090 §Recommended action #4. No third port-PR opened. |
  • Invariant: after PR #497 merges, the Research-0090 PORT_LATER bucket reduces to exactly two follow-up cherry-picks against post-#497 master:
      • git cherry-pick 25ff9f18 (delete empty VmafossexecCommandLineTest).
      • git cherry-pick 0341f730 (delete duplicate test_run_vmaf_integer_fextractor).
    Both are pure deletions on python/test/command_line_test.py and python/test/feature_extractor_test.py respectively; no score change, no Netflix-golden interaction. They were excluded from PR #497 because the v2 super-PR's diff state currently RE-EMITS those identifiers (likely because #497 cherry-picked from an earlier upstream tip than 25ff9f18 / 0341f730).
  • Netflix-golden guard (binding): per CLAUDE §8 / ADR-0024, the three Netflix CPU golden pairs in python/test/quality_runner_test.py, vmafexec_test.py, vmafexec_feature_extractor_test.py, feature_extractor_test.py, result_test.py (1 normal src01_hrc00↔hrc01 + 2 checkerboard) carry hard-coded assertAlmostEqual rows that are NEVER modified by a fork PR. Upstream commits 4679db83, ead2d12b, 6c097fc4, d93495f5 explicitly LOWER places= on a subset of those rows (their stated motivation is macOS FP precision drift, not a true score change). Reviewer of PR #497 must verify that the 3 golden pairs retain fork tolerances byte-for-byte; only non-golden rows may adopt the relaxations.
  • On upstream sync: future /sync-upstream runs that re-detect these 18 SHAs should match this entry via the SHA list and short-circuit Pass-2 classification (skip re-triage).
  • Re-test on rebase: none required at the time of this commit (no code touched); after the two follow-up cherry-picks (25ff9f18 + 0341f730) eventually land, run:
meson test -C build --suite=fast
make test-netflix-golden   # 3/3 CPU goldens still pass

ADR-0108: every fork-local PR that touches upstream-shared paths or establishes a rebase-sensitive invariant adds an entry here. PRs with no rebase impact state "no rebase impact" in the PR description and skip the entry.

The intended reader is whoever runs the next /sync-upstream (see ADR-0002 and .claude/skills/sync-upstream/). Read top-to-bottom before resolving conflicts.

Format

Each entry is a ### NNNN — short title heading with three fields:

  • Touches: paths likely to conflict on upstream merge.
  • Invariant: what the fork relies on that an upstream change could silently drop.
  • Re-test: the command(s) to run after the merge to confirm the invariant survived. Reproducer-style — no surrounding prose required.

IDs are assigned in commit order and never reused. A single entry may cover several PRs in one workstream; cross-link from the ID heading.

Entries (backfilled 2026-04-18 per ADR-0108 adoption)

0310 — Vulkan VIF int64 reduction race condition Phase 3 fix

  • Touches: core/src/feature/vulkan/shaders/vif.comp (replaces all three bare barrier() calls with explicit memoryBarrierShared(); barrier(); pairs covering the Phase-1 cooperative tile load, the Phase-2 vertical-conv shared write, and the Phase-4 cross-subgroup int64 reduction); plus documentation under docs/research/0089-...md (Phase 3 status appendix), docs/adr/0269-...md (Phase 3 status appendix), docs/state.md (T-VK-VIF-1.4-RESIDUAL closed; new T-VK-VIF-1.4-RESIDUAL-ARC opened), core/src/vulkan/AGENTS.md (Phase 3 update on the existing invariant row), changelog.d/fixed/vif-int64-reduction-race-condition.md. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the shader is zero. The entry exists because the fix is rebase-sensitive: any future cherry-pick that touches vif.comp and downgrades a memoryBarrierShared(); barrier(); pair back to a bare barrier() will silently re-introduce the NVIDIA Vulkan 1.4 race.
  • Invariant: vif.comp shared-memory ordering between cooperative-write phases must be release-acquire, not just a bare workgroup-execution barrier. NVIDIA's Vulkan 1.4 default memory model requires the explicit shared-memory release; bare barrier() works at API 1.3 by accident on this driver. SCALE is irrelevant — the fix applies to all four pipeline specialisations because the barrier sites are in the SCALE-shared code. Do NOT remove the explicit memoryBarrierShared() calls even if a perf review claims they are redundant under the GLSL spec wording: empirical real-hardware evidence in research-0089 2026-05-09 appendix shows otherwise on NVIDIA driver 595.71.05.
  • Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on an NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan --device 1 --places 4. Expect 0/48 across all four scales. Run the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3" against --vulkan_device 1; expect 5 identical (integer_vif_num_scale2, integer_vif_den_scale2) = (+2.494358e+04, +2.522523e+04) pairs at frame 5. Note that --vulkan_device 0 on this multi-GPU host is the Intel Arc A380 lane and will still fail at API 1.4 (separate T-VK-VIF-1.4-RESIDUAL-ARC row Open).
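The device indices quoted in this entry are specific to one host, and the Phase 3b entry asks scripts to select a vendor by deviceName substring instead. A minimal sketch of that lookup follows, assuming the caller has already obtained the enumerated device names in index order (how they are obtained is out of scope here); the function name and the sample device strings are hypothetical.

```python
def pick_device_index(device_names, name_substring):
    """Return the index of the single device whose name contains the substring.

    Matching is case-insensitive. Zero or multiple matches raise LookupError so
    a lane never silently runs on the wrong GPU.
    """
    needle = name_substring.lower()
    matches = [i for i, name in enumerate(device_names) if needle in name.lower()]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one device matching {name_substring!r}, "
            f"found {len(matches)} in {device_names!r}"
        )
    return matches[0]


# Hypothetical enumeration orders for the same two GPUs on two hosts.
host_a = ["Intel(R) Arc(tm) A380 Graphics (DG2)", "NVIDIA GeForce RTX 4090"]
host_b = list(reversed(host_a))

print(pick_device_index(host_a, "nvidia"), pick_device_index(host_b, "nvidia"))
```

The two hosts yield different indices for the same GPU, which is the failure mode a hard-coded index cannot detect.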

0309 — Vulkan VIF API-1.4 Phase 2 dump (T-VK-VIF-1.4-RESIDUAL)

  • Touches: docs/research/0089-vulkan-vif-fp-residual-bisect-2026-05-08.md (2026-05-09 status appendix with empirical numbers from the live RTX 4090), docs/state.md (T-VK-VIF-1.4-RESIDUAL row updated with the localisation), core/src/vulkan/AGENTS.md (new invariant row pinning the SCALE = 2 cross-subgroup-reduction memory-model finding), CHANGELOG.md (lusoris fork "Changed" entry). No code touched; the Phase 3 shader memory-model fix lands in a separate PR. Upstream Netflix/vmaf has no Vulkan backend so conflict probability for the AGENTS.md row is zero — entry exists because the empirical localisation flips the open state-row hypothesis from FP-precision to memory-model and retires the places=3 override path that earlier rebase scaffolding might have suggested.
  • Invariant: vif.comp SCALE = 2 specialisation's Phase-4 cross-subgroup int64 reduction is non-deterministic on NVIDIA driver 595.71.05 + Vulkan 1.4.341 (lines 547–592, subgroupAdd barrier() + thread-0 read of s_lmem). API 1.3 lane is fully deterministic on the same hardware. The four apiVersion pinning sites in core/src/vulkan/common.c + core/src/vulkan/vma_impl.cpp stay at 1.3 until Phase 3 lands the explicit memory-scope barrier and a 5-run determinism gate confirms run-to-run identical (num, den) plus places=4 0/48 on NVIDIA. The places=3 override path is eliminated from the unblock options.
  • Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on a NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run the gate and the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3". Expect 45/48 places=4 failures on integer_vif_scale2 (max abs 1.527e-02) AND 5 distinct (integer_vif_num_scale2, integer_vif_den_scale2) pairs across 5 runs of --feature 'vif_vulkan=debug=true'. Both observations reproduced bit-for-bit on this session's hardware lane (UUID e478b41b-5c4f-1ddb-f990-e44916aff4c8).
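The 5-run determinism gate described above reduces to an exact-equality check over the per-run (num, den) pairs. A minimal Python sketch of that check follows; the function name and the "unstable" sample values are hypothetical, and the stable pair is the one quoted in the neighbouring re-test note.

```python
from typing import Iterable, Tuple

Pair = Tuple[float, float]


def is_run_to_run_identical(pairs: Iterable[Pair]) -> bool:
    """Return True when every run produced exactly the same (num, den) pair.

    Comparison is exact on purpose: the gate is about determinism,
    not about closeness within a tolerance.
    """
    return len(set(pairs)) == 1


# Five identical pairs, as expected on a deterministic lane.
stable_runs = [(2.494358e+04, 2.522523e+04)] * 5

# Hypothetical values standing in for a non-deterministic lane.
unstable_runs = [(2.494358e+04 + i, 2.522523e+04) for i in range(5)]

print("stable lane deterministic:", is_run_to_run_identical(stable_runs))
print("unstable lane deterministic:", is_run_to_run_identical(unstable_runs))
```

An empty list of runs is reported as non-deterministic, which keeps a gate that collected no data from passing silently.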

0308 — encoder knob-sweep recipe-regression policy (ADR-0308, docs-only)

  • Touches: docs/research/0080-encoder-knob-sweep-findings.md, docs/adr/0308-encoder-knob-sweep-recipe-regression-policy.md, docs/adr/README.md (index row), ai/AGENTS.md (knob-sweep invariant section), changelog.d/changed/encoder-knob-sweep-findings.md. No code touched; companion to PR #400 (ADR-0305 + Research-0077 + ai/scripts/analyze_knob_sweep.py). Upstream Netflix/vmaf has no encoder-knob-sweep surface, so conflict probability is zero — this entry exists only because the policy threshold (7-of-9 structural cut) is rebase-sensitive on the corpus shape.
  • Invariant: the 7-of-9 source-count threshold from ADR-0308 §Decision point 1 is calibrated against the current 9-source Netflix Public Dataset corpus. If the corpus grows past 9 sources (e.g. UGC expansion per ADR-0287, or HDR additions), re-derive the absolute threshold as a fraction (≥7/9 ≈ 78 %). The structural cluster is sharp on the current corpus (top-15 cells all hit 9-of-9, no observed cells in 4-6 range), so a fractional cut at ~75 % is robust. Do NOT relax bitrate_tol_pct (default 5.0) or vmaf_tol (default 0.1) in ai/scripts/analyze_knob_sweep.py without an ADR — those tolerances are calibrated against the per-frame VMAF noise floor and bitrate quantisation in libavformat muxers.
  • Re-test: pytest ai/tests/test_knob_sweep_analysis.py -v (script logic; ships in PR #400). Policy gate is offline: regenerate runs/phase_a/full_grid/comprehensive.jsonl via tools/vmaf-tune/src/vmaftune/hw_encoder_corpus.py (3-hour run on a single host with NVENC + QSV) then re-run python ai/scripts/analyze_knob_sweep.py --jsonl <adapted.jsonl> --out-dir runs/phase_a/full_grid/reports/ and diff the resulting summary.md against docs/research/0080-encoder-knob-sweep-findings.md headline table. Structural cluster (top-15 cells, all 9-of-9) is the invariant to defend.
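If the corpus grows, the invariant above asks for the absolute threshold to be re-derived from the 7/9 fraction. One way to do that, sketched in Python with exact rational arithmetic; the function name is hypothetical and rounding up is an assumption (it keeps the cut at least as strict as 7-of-9).

```python
from fractions import Fraction
from math import ceil

# 7-of-9 on the current 9-source corpus, kept as an exact fraction.
BASE_FRACTION = Fraction(7, 9)


def structural_cut(n_sources: int, fraction: Fraction = BASE_FRACTION) -> int:
    """Smallest source count that meets `fraction` of an n-source corpus."""
    if n_sources <= 0:
        raise ValueError("n_sources must be positive")
    # Rounding up is an assumption: it never relaxes the cut below the fraction.
    return ceil(fraction * n_sources)


for n in (9, 12, 18):
    print(f"{n} sources -> cut at {structural_cut(n)}")
```

Using `Fraction` instead of a float avoids an off-by-one when the product lands exactly on an integer, as it does for 9 and 18 sources.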

0228 — Vulkan 1.4 bump deferred (ADR-0264, docs-only)

  • Touches: none (docs-only PR). Future Step A of T-VK-1.4-BUMP will touch core/src/feature/vulkan/shaders/vif.comp and core/src/feature/vulkan/shaders/ciede.comp; Step B will touch the three apiVersion sites in core/src/vulkan/common.c (lines 54, 264, 374) and the VMA_VULKAN_VERSION define in core/src/vulkan/vma_impl.cpp (line 22).
  • Invariant: master stays on VK_API_VERSION_1_3 and VMA_VULKAN_VERSION = 1003000. Lifting the constant in any future upstream sync (Netflix doesn't ship a Vulkan backend, so the conflict is improbable) without first auditing precise / OpDecorate ... NoContraction decoration on vif.comp and ciede.comp will reintroduce the NVIDIA-driver regression captured in research-0053. The psnr_hvs_strict_shaders -O0 list in core/src/vulkan/meson.build is the existing precedent for shader-side bit-exactness mitigations and should be the place a 1.4-era audit lands its results (potentially expanding to cover vif.comp + ciede.comp if the precise audit decides the optimizer is the right place to gate).
  • Re-test: when Step B lands, the gate is python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan and the same with --feature ciede against NVIDIA + RADV + lavapipe; max abs diff must stay ≤ 5.0e-05 (places=4) on all three.
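The 5.0e-05 bound is consistent with reading places=N as half a unit in the Nth decimal place. The Python sketch below assumes that reading; it is not a description of how scripts/ci/cross_backend_vif_diff.py is implemented, and all function names are hypothetical.

```python
from typing import Sequence


def places_tolerance(places: int) -> float:
    """Half a unit in the last kept decimal place (assumed meaning of places=N)."""
    return 0.5 * 10.0 ** (-places)


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest per-frame absolute difference between two score series."""
    if len(reference) != len(candidate):
        raise ValueError("score series must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def passes_gate(reference: Sequence[float], candidate: Sequence[float],
                places: int = 4) -> bool:
    """True when the candidate backend stays within the places tolerance."""
    return max_abs_diff(reference, candidate) <= places_tolerance(places)


print("places=4 tolerance:", places_tolerance(4))
```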

0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)

0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix

0228 — vmaf-tune resolution-aware model selection (ADR-0289)

0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)

0228 — tools/vmaf-tune/ codec-agnostic encode dispatcher (ADR-0294)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/encode.py — refactored to look up the codec adapter and delegate argv composition. Wholly fork-local.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, codec_adapters/x264.py — adapter contract gains ffmpeg_codec_args(preset, quality) and extra_params(). Both are duck-typed; missing methods fall back to the legacy x264-CRF shape.
  • tools/vmaf-tune/tests/test_encode_multi_codec.py — new 19-test suite pinning the dispatcher contract per codec.
  • docs/usage/vmaf-tune.md — new "Codec adapter contract" section.
  • Invariant: the harness (encode.py, corpus.py) must not branch on codec identity. The only codec-aware code is the per-adapter codec_adapters/*.py file. Any future change that adds an if adapter.encoder == "..." to the harness regresses ADR-0294's whole-purpose. The corpus row schema stays at SCHEMA_VERSION=1 — crf is preserved as the row column even when the underlying codec's quality knob is -cq / -qp / etc.; EncodeRequest.quality is a request-side property only. Adapters that don't yet expose ffmpeg_codec_args are intentionally permitted to fall back to the legacy x264-CRF shape; removing that fallback would break in-flight adapter PRs landing one-at-a-time.
  • Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/ -q   # 32 passed (13 existing + 19 multi-codec)

python -c "
from pathlib import Path
from vmaftune.encode import EncodeRequest, build_ffmpeg_command
req = EncodeRequest(
    source=Path('ref.yuv'), width=1920, height=1080,
    pix_fmt='yuv420p', framerate=24.0,
    encoder='libx264', preset='medium', crf=23,
    output=Path('out.mp4'),
)
cmd = build_ffmpeg_command(req)
assert cmd[cmd.index('-c:v') + 1] == 'libx264'
assert cmd[cmd.index('-preset') + 1] == 'medium'
assert cmd[cmd.index('-crf') + 1] == '23'
print('x264 dispatcher path OK')
"
```

0260 — vmaf-tune --sample-clip-seconds (ADR-0301)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/{cli,corpus,encode,score,__init__}.py — fork-local. No upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/tests/test_corpus.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md.
  • Invariant: corpus JSONL SCHEMA_VERSION bumped to 2 — additive clip_mode key only. Sample-clip windows are mirrored on both sides via FFmpeg input-side -ss/-t (encode) and libvmaf's --frame_skip_ref / --frame_cnt (score). The _resolve_sample_clip() helper is the single source of truth for the centre-anchored slice math; do not duplicate the computation elsewhere. Falls back silently to "full" when N >= duration_s.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep sample-clip
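The centre-anchored slice math and the silent fallback to "full" can be illustrated as follows. This is a sketch of the rule as stated in the invariant, not the body of `_resolve_sample_clip()`; the function names, the return shape, and the frame rounding are assumptions.

```python
from typing import Optional, Tuple


def centre_anchored_window(duration_s: float,
                           sample_s: float) -> Optional[Tuple[float, float]]:
    """Return (start_s, length_s) for a centred window, or None for "full" mode.

    None models the silent fallback when the requested sample is at least
    as long as the source.
    """
    if sample_s >= duration_s:
        return None
    start_s = (duration_s - sample_s) / 2.0
    return (start_s, sample_s)


def to_frame_args(window: Tuple[float, float], fps: float) -> Tuple[int, int]:
    """Convert a window to (frames to skip, frame count).

    Rounding to the nearest frame is an assumption of this sketch.
    """
    start_s, length_s = window
    return (int(round(start_s * fps)), int(round(length_s * fps)))


window = centre_anchored_window(duration_s=60.0, sample_s=10.0)
print("window:", window)
print("frame args at 24 fps:", to_frame_args(window, 24.0))
print("fallback:", centre_anchored_window(duration_s=8.0, sample_s=10.0))
```

The same (start, length) pair would feed both sides: the encode side as input-side -ss/-t, the score side as --frame_skip_ref / --frame_cnt after conversion to frames.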

0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_amf,hevc_amf,av1_amf,_amf_common}.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry extended with three AMF entries.
  • tools/vmaf-tune/tests/test_codec_adapter_amf.py (new).
  • tools/vmaf-tune/tests/test_corpus.py — Phase A test renamed from test_known_codecs_phase_a_is_x264_only to test_known_codecs_includes_x264_and_amf.
  • tools/vmaf-tune/AGENTS.md — adds AMF preset-compression invariant.
  • docs/usage/vmaf-tune.md — adds Hardware encoders section.
  • Invariant: the 7-into-3 preset compression table in _amf_common.py (_PRESET_TO_AMF) is the cross-codec axis Phase B / C consumers depend on. Every AMF adapter accepts the canonical 7 preset names (placebo through ultrafast) and maps them onto the three AMF rungs (quality / balanced / speed). Do not extend the preset vocabulary without amending ADR-0282 — registry uniformity (no codec-identity branching in the harness search loop) rests on every codec accepting the same names.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q

0228 — vmaf-tune resolution-aware model selection (ADR-0289)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/resolution.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/corpus.py — adds CorpusOptions.resolution_aware: bool = True and pipes the effective model through score_res.request.model into the JSONL row.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds --resolution-aware / --no-resolution-aware (BooleanOptionalAction, default on).
  • tools/vmaf-tune/tests/test_resolution.py (new).
  • docs/usage/vmaf-tune.md — new "Resolution-aware mode" section.
  • docs/adr/0289-vmaf-tune-resolution-aware.md (new) + docs/research/0064-vmaf-tune-resolution-aware.md (new).
  • tools/vmaf-tune/AGENTS.md — two new invariant notes.
  • Invariant: the height-only decision rule (height >= 2160 selects vmaf_4k_v0.6.1, else vmaf_v0.6.1) is the documented contract. The JSONL vmaf_model field is now per-row (not per-job) — mixed ladder corpora legitimately contain multiple distinct values across rows. Downstream consumers (Phase B / C / D) must group/filter by vmaf_model rather than assuming a constant. Width is accepted in the API for symmetry but ignored in the body; do not branch on it without a follow-up ADR.
  • Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep resolution-aware
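The height-only rule and the per-row grouping it forces on downstream consumers can be sketched in a few lines of Python. The function names and the sample rows are hypothetical; only the model names, the 2160 threshold, and the vmaf_model key come from the invariant above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def select_model(width: int, height: int) -> str:
    """Pick the VMAF model from the frame height only."""
    del width  # accepted for symmetry, deliberately ignored
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def group_by_model(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group corpus rows by their per-row vmaf_model value."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row["vmaf_model"]].append(row)
    return dict(groups)


# Hypothetical mixed-ladder rows.
ladder = [(3840, 2160), (1920, 1080), (1280, 720)]
rows = [{"width": w, "height": h, "vmaf_model": select_model(w, h)}
        for w, h in ladder]

for model, members in group_by_model(rows).items():
    print(model, [r["height"] for r in members])
```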

0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix

  • Touches:
  • core/tools/y4m_input.c — upstream-mirrored Daala-derived Y4M parser. The fix sits inside the 4:1:1 → 4:2:2-jpeg chroma upsample routine y4m_convert_411_422jpeg, lines ~500–530 in the function's three sub-loops. Upstream Netflix/vmaf carries the same shape; if upstream lands its own fix during a sync, prefer the upstream version and drop ours.
  • core/test/test_y4m_411_oob.c (new, fork-local) — drives the minimal W=2 H=4 4:1:1 stream through video_input_open + video_input_fetch_frame. Wholly fork-added; no upstream collision.
  • core/test/meson.build — adds test_y4m_411_oob executable + test() registration.
  • Invariant: the first two sub-loops of y4m_convert_411_422jpeg must guard _dst[(x << 1) | 1] writes with (x << 1 | 1) < dst_c_w, matching the third sub-loop's existing guard. Without the guard a 4:1:1 stream of width 2 (dst_c_w == 1) writes one byte past the destination chroma row.
  • Re-test:
  • cd libvmaf && meson setup ../build-asan --buildtype=debug -Db_sanitize=address -Db_lundef=false -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
  • ninja -C build-asan test/test_y4m_411_oob
  • ASAN_OPTIONS=detect_leaks=0 ./build-asan/test/test_y4m_411_oob — must report 1 tests run, 1 passed. Pre-fix the binary aborts with AddressSanitizer: heap-buffer-overflow … WRITE of size 1 at y4m_input.c:507.
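The effect of the guard can be seen in a simplified Python model of one upsample row. It only tracks which destination indices a loop over source chroma samples would write; it is not a port of y4m_convert_411_422jpeg, and the function name is hypothetical.

```python
from typing import List


def written_indices(src_c_w: int, dst_c_w: int, guarded: bool) -> List[int]:
    """Destination indices written when each source sample emits two outputs.

    With guarded=True the odd write is skipped unless (x << 1 | 1) < dst_c_w,
    mirroring the guard described in the invariant.
    """
    written: List[int] = []
    for x in range(src_c_w):
        written.append(x << 1)
        odd = (x << 1) | 1
        if not guarded or odd < dst_c_w:
            written.append(odd)
    return written


def out_of_bounds(indices: List[int], dst_c_w: int) -> List[int]:
    """Indices that fall past the end of the destination chroma row."""
    return [i for i in indices if i >= dst_c_w]


# Width-2 4:1:1 stream: one source chroma sample, destination chroma width 1.
for guarded in (False, True):
    idx = written_indices(src_c_w=1, dst_c_w=1, guarded=guarded)
    print("guarded" if guarded else "unguarded", idx, out_of_bounds(idx, 1))
```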

0270 — saliency_student_v1 fork-trained on DUTS-TR (ADR-0286)

  • Touches:
  • model/tiny/registry.json — adds the saliency_student_v1 row. Fork-local registry; no upstream overlap.
  • model/tiny/saliency_student_v1.onnx (+ .json sidecar) — new weights and metadata. Fork-local.
  • ai/scripts/train_saliency_student.py — new training script. Wholly fork-local under ai/, which has no upstream counterpart.
  • docs/ai/models/saliency_student_v1.md, docs/research/0062-saliency-student-from-scratch-on-duts.md, docs/adr/0286-saliency-student-fork-trained-on-duts.md — new docs under fork-local trees.
  • Invariant: the C-side feature_mobilesal.c extractor's tensor-name contract — input (NCHW [1, 3, H, W]) and saliency_map (NCHW [1, 1, H, W]) — must continue to match the ONNX graph for both saliency_student_v1.onnx and the legacy mobilesal.onnx placeholder. Future weights swaps can change the graph internals freely but must keep these names + shapes; the smoke test asserts the registration. The op-allowlist constraint (graph uses only ops in core/src/dnn/op_allowlist.c) carries over from ADR-0218 — Resize is not used; ConvTranspose is the upsample op for v1 to keep the graph load-clean against vanilla origin/master.
  • Re-test:
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python -c "
from ai.src.vmaf_train.op_allowlist import check_model
from pathlib import Path
r = check_model(Path('model/tiny/saliency_student_v1.onnx'))
assert r.ok, r.pretty()
print('allowlist OK')
"
meson test -C build --suite=fast mobilesal

0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)

  • Touches:
  • core/src/feature/hip/float_ansnr_hip.{c,h} (new) — fifth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_ansnr_cuda.c call-graph-for-call-graph; init/submit/collect/close invoke the kernel-template helpers in the same order; the submit body intentionally bypasses vmaf_hip_kernel_submit_pre_launch (no atomic, kernel writes per-block (sig, noise) interleaved float partials directly).
  • core/src/hip/meson.build — adds the new TU to hip_sources.
  • core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_float_ansnr_hip; declaration and the registry row under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — adds test_float_ansnr_hip_extractor_registered sub-test pinning the lookup contract.
  • Invariant — the submit_pre_launch bypass is load-bearing. The CUDA twin makes the same choice for the same reason. If a future PR adds a submit_pre_launch call to float_ansnr_cuda.c's submit path, the HIP twin must follow in the same PR. Likewise the readback shape (wg_count * 2u * sizeof(float)) and the bpc table (peak/psnr_max for 8/10/12/16-bit) mirror the CUDA twin verbatim — keep aligned on rebase.
  • Re-test on rebase:
cd libvmaf
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build  # 48/48 green (47 CPU + HIP smoke)
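The readback shape wg_count * 2u * sizeof(float) with interleaved (sig, noise) partials can be illustrated on the host side in Python. The sketch assumes little-endian 4-byte floats and a plain sum over workgroups; the actual host reduction in float_ansnr_hip.c is not described in this entry, and the function names are hypothetical.

```python
import struct
from typing import Tuple

FLOAT_SIZE = 4  # sizeof(float), assumed 4 bytes


def readback_bytes(wg_count: int) -> int:
    """Byte size of the partials buffer: two floats per workgroup."""
    return wg_count * 2 * FLOAT_SIZE


def reduce_partials(buf: bytes, wg_count: int) -> Tuple[float, float]:
    """Sum interleaved (sig, noise) float partials into two totals."""
    if len(buf) != readback_bytes(wg_count):
        raise ValueError("buffer size does not match wg_count * 2 * sizeof(float)")
    values = struct.unpack(f"<{wg_count * 2}f", buf)
    sig = sum(values[0::2])    # even slots: per-block signal partials
    noise = sum(values[1::2])  # odd slots: per-block noise partials
    return sig, noise


# Two workgroups with hypothetical partials (sig, noise), (sig, noise).
demo = struct.pack("<4f", 1.0, 0.5, 2.0, 0.25)
print(reduce_partials(demo, wg_count=2))
```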

0230 — HIP sixth-consumer kernel motion_v2_hip (ADR-0267)

  • Touches:
  • core/src/feature/hip/integer_motion_v2_hip.{c,h} (new) — sixth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_motion_v2_cuda.c call-graph-for-call-graph; carries the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag and a flush() callback. The state struct has a uintptr_t pix[2] ping-pong slot pair tracked outside the kernel-template (the template models a single device+host pair only).
  • core/src/hip/meson.build — adds the new TU to hip_sources.
  • core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_hip; declaration and the registry row under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — adds test_motion_v2_hip_extractor_registered sub-test pinning the lookup contract (extractor name is motion_v2_hip, matching the CUDA twin's motion_v2_cuda naming).
  • Invariant — temporal-extractor + ping-pong shape. The VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit, the flush() callback registration, and the uintptr_t pix[2] slot pair are load-bearing for the runtime PR (T7-10b). The runtime PR will swap uintptr_t pix[2] for a real device-buffer handle pair matching the CUDA twin's VmafCudaBuffer *pix[2]. On rebase: if the CUDA twin's flush-pass shape changes (currently min(score[i], score[i+1])), update the HIP twin's flush_fex_hip body in the same PR.
  • Re-test on rebase: same as 0229 — meson test -C build with enable_hip=true exercises the smoke contract.

0227 — ms_ssim_vulkan submit-side migrated to kernel_template (T-GPU-DEDUP-26)

  • Touches:
  • core/src/feature/vulkan/ms_ssim_vulkan.c: extract()'s raw VkCommandBuffer / VkFence / vkAllocateCommandBuffers / vkBeginCommandBuffer / vkCreateFence / vkQueueSubmit / vkWaitForFences / vkDestroyFence / vkFreeCommandBuffers blocks become VmafVulkanKernelSubmit triples (vmaf_vulkan_kernel_submit_begin / _submit_end_and_wait / _submit_free). One triple covers the decimate-pyramid command buffer; one triple per scale covers the per-scale SSIM submit. The pipeline-side bundles (pl_decimate 2-binding 4-variant + pl_ssim 10-binding 9-variant) and their _add_variant() chains are unchanged from the prior migration.
  • Invariant: any future submit-side template change (timeline semaphores, deferred fence release, queue-family parameterisation) must keep the helpers' synchronous-wait + per-frame fence + per-frame command-buffer contract intact, since ms_ssim_vulkan.c does host readback of the l_partials / c_partials / s_partials buffers immediately after _submit_end_and_wait returns. The submit-side contract is the same one already documented in core/src/vulkan/AGENTS.md's "Rebase-sensitive invariants" section for kernel_template.h.
  • Re-test:

```bash
cd libvmaf && meson test -C build
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature float_ms_ssim --backend vulkan --places 4
```

0231 — SHA-pin GitHub Actions (OSSF Pinned-Dependencies)

  • Touches: every workflow file under .github/workflows/. All 13 fork workflows (docker-image.yml, docs.yml, ffmpeg-integration.yml, libvmaf-build-matrix.yml, lint-and-format.yml, nightly-bisect.yml, nightly.yml, release-please.yml, rule-enforcement.yml, scorecard.yml, security-scans.yml, supply-chain.yml, tests-and-quality-gates.yml) had their uses: directives rewritten from <owner>/<repo>@vN[.M.K] to <owner>/<repo>@<40-char-sha> # vN.M.K. 97 references converted; the SLSA reusable-workflow ref in supply-chain.yml is the single documented holdout (see Invariant below).
  • Invariant — SHA-pin policy for uses:. Every action reference in .github/workflows/*.yml MUST be a 40-char commit SHA with the semver tag preserved as a trailing # vN.M.K comment. The OSSF Scorecard Pinned-Dependencies check parses both forms; it treats a floating tag (@vN) as unpinned and counts it against the aggregate score. Single permitted exception: the SLSA generator reusable workflow (slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml) must keep its vX.Y.Z tag form because GitHub Actions consumers cannot SHA-pin reusable-workflow refs in every code path; the exception is documented inline in supply-chain.yml and must be kept on each rebase. Why this matters on upstream sync: Netflix upstream does not ship the fork's CI tree, so a /sync-upstream run that drags new workflow content (e.g. via repository templates or bot-authored bumps) into .github/workflows/ can re-introduce floating-tag references unnoticed. The post-rebase check below is the standing gate: anything it prints must be re-pinned before merging the sync.
  • Re-test on rebase:
# Anything that prints is a regression — every uses: must be either
# already SHA-pinned (40 hex) or, for the documented SLSA exception,
# the slsa-github-generator reusable-workflow ref.
grep -hnE '^\s*(- )?uses:\s+[^@]+@[^ #]+\s*$' .github/workflows/*.yml \
  | grep -vE '@[a-f0-9]{40}' \
  | grep -v 'slsa-framework/slsa-github-generator/.github/workflows/'
# SHA-resolution sanity for any new pin (per-action):
gh api repos/<owner>/<repo>/git/ref/tags/<vN.M.K> --jq '.object.sha'
# If the result is a "tag" object (annotated tag), deref:
gh api repos/<owner>/<repo>/git/tags/<sha-from-prev> --jq '.object.sha'
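The same policy can be checked from Python, which is convenient inside a test suite. The sketch below is an alternative to the grep pipeline, not a replacement shipped by the fork; the function name is hypothetical, and the SHA and version tags in the sample are placeholders.

```python
import re
from typing import List, Tuple

# "uses: owner/repo[/path]@ref", optionally preceded by a list dash.
USES_RE = re.compile(r"^\s*(?:-\s+)?uses:\s+(?P<action>[^@\s]+)@(?P<ref>[^\s#]+)")
SHA_RE = re.compile(r"^[a-f0-9]{40}$")
SLSA_PREFIX = "slsa-framework/slsa-github-generator/.github/workflows/"


def unpinned_refs(workflow_text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every unpinned uses: reference."""
    offenders: List[Tuple[int, str]] = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match is None:
            continue  # not an action reference (or a local ./path action)
        if SHA_RE.match(match["ref"]):
            continue  # pinned to a 40-char commit SHA
        if match["action"].startswith(SLSA_PREFIX):
            continue  # documented SLSA reusable-workflow exception
        offenders.append((lineno, line.strip()))
    return offenders


# Placeholder SHA and tags, for illustration only.
sample = """\
steps:
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567 # v4.2.2
  - uses: actions/setup-python@v5
  - name: provenance
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.0.0
"""

for lineno, text in unpinned_refs(sample):
    print(f"line {lineno}: {text}")
```

Unlike the grep above, this version also flags a floating tag that carries a trailing comment, since it inspects the ref itself rather than the end of the line.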

0226 — CUDA drain-batch engine-loop opt (T-GPU-OPT-1)

  • Touches:
  • core/src/cuda/drain_batch.{h,c} (new) — TLS drain-batch table + shared drain stream + _open()/register/_flush()/_close() API.
  • core/src/libvmaf.c — engine-side per-frame loop now wraps submit/collect with _open() + _flush() so all CUDA extractor finished events are waited on a single shared drain stream.
  • All 12 CUDA feature kernels (core/src/feature/cuda/*.c) register their finished event + drained flag with the drain batch on submit; collect skips its private cuStreamSynchronize when drained is true.
  • Invariant — drained-flag contract. Every CUDA extractor's collect path must check the per-frame drained flag and skip its own cuStreamSynchronize when set; otherwise the drain batching is a no-op. The flag is reset to false per frame inside vmaf_cuda_drain_batch_register().
  • Re-test on rebase:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast cuda

Expected: all CUDA tests green; bench shows ≥5% wall-clock gain on a 7-extractor VMAF model (model.json with all feature extractors enabled).

0225 — Netflix bench snapshot regen (upstream a44e5e61 motion fix)

  • Touches:
  • testdata/netflix_benchmark_results.json — fork-added snapshot. CPU rows now reflect the post-fix motion feature; cuda / sycl rows from the previous regen are preserved unchanged because those backends were not exercised on this rerun (host-environment tooling — wrong renderD path, libvmaf_cuda not enabled in the local FFmpeg build). Future full regens should include cuda / sycl.
  • testdata/bench_all.sh — default VMAF= no longer points at /usr/local/bin/vmaf (which on most dev hosts is stuck at the pre-upstream-a44e5e61 v3.0.0); now defaults to the in-tree fork build at core/build/tools/vmaf.
  • testdata/benchmark_netflix.py: FFMPEG, YUVDIR and the hardcoded LD_LIBRARY_PATH=/usr/local/lib are now overridable via VMAF_FFMPEG, VMAF_YUVDIR and any caller-set LD_LIBRARY_PATH.
  • Invariant: the snapshot's CPU pooled VMAF for src01_576x324 is 76.667828 (post-fix), not 76.668904 (the upstream-buggy mirror). If /sync-upstream ever re-pulls a Netflix change that touches motion.c mirror-handling, this number is the reference.
  • Re-test:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
LD_LIBRARY_PATH=$(pwd)/build/src python3 \
    ../testdata/benchmark_netflix.py

Expected CPU pooled rows: 76.667828, 35.068672, 7.985899.

0224 — CUDA graph capture feasibility (research-0047, DEFER)

  • Touches: none — investigation-only; no code lands. The research digest docs/research/0047-cuda-graph-capture-feasibility.md documents why a CUDA graph capture path on the per-frame submit chain is deferred rather than shipped (realised wall-clock gain capped at ~1-3% vs. the predicted 10-20%, with a 4-slot picture-pool rotation that defeats single-graph capture and forces per-frame cuGraphExecKernelNodeSetParams rebinding for (ref, dis) device pointers).
  • Invariant: the kernel_template.h docstring keeps naming VmafCudaKernelLifecycle.finished as a graph-capture hook point. Don't prune that comment on rebase — leaving the door open in the template is free, and the digest's "what needs to be true for a future GO" section depends on the hook still being there.
  • Re-test on rebase:
# Confirm the docstring still references graph capture as the hook
# point — wording change is fine, removal is not.
grep -q "graph capture" core/src/cuda/kernel_template.h

0223 — ADR slug-drift repair in CHANGELOG / rebase-notes (PR #304 follow-up)

  • Touches: CHANGELOG.md, docs/rebase-notes.md. No code; no upstream-shared path; no public-API surface.
  • Invariant: every [ADR-NNNN](docs/adr/NNNN-slug.md) link in the fork's tracked docs resolves to an actual on-disk file under docs/adr/. Repaired 4 broken slugs that did not exist on disk (0138-iqa-convolve-avx2-bitexact-double → 0138-iqa-convolve-avx2-bitexact-double, 0140-simd-dx-framework → 0140-simd-dx-framework, 0190-ms-ssim-vulkan → 0190-ms-ssim-vulkan, 0178-vulkan-adm-kernel → 0178-vulkan-adm-kernel). All retained their cited NNNN per ADR-0028 (NNNN is immutable once Accepted).
  • Re-test on rebase: from repo root, the following must print no lines:
for ref in $(grep -ohE 'docs/adr/[0-9]{4}-[a-z0-9-]+\.md' \
    CHANGELOG.md docs/rebase-notes.md AGENTS.md docs/state.md \
    | sort -u); do
  test -f "$ref" || echo "MISSING: $ref"
done
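For use inside a Python test, the shell loop above translates roughly as follows. The regular expression is the one from the grep; the function names and the repo_root parameter are assumptions of this sketch.

```python
import re
from pathlib import Path
from typing import Iterable, List

ADR_REF_RE = re.compile(r"docs/adr/[0-9]{4}-[a-z0-9-]+\.md")


def adr_refs(text: str) -> List[str]:
    """Unique ADR paths referenced in a block of text, sorted."""
    return sorted(set(ADR_REF_RE.findall(text)))


def missing_adr_refs(doc_paths: Iterable[str], repo_root: str = ".") -> List[str]:
    """ADR paths referenced by the given docs that do not exist on disk."""
    root = Path(repo_root)
    refs = set()
    for doc in doc_paths:
        path = root / doc
        if path.is_file():
            refs.update(adr_refs(path.read_text(encoding="utf-8")))
    return sorted(ref for ref in refs if not (root / ref).is_file())


tracked = ["CHANGELOG.md", "docs/rebase-notes.md", "AGENTS.md", "docs/state.md"]
for ref in missing_adr_refs(tracked):
    print("MISSING:", ref)
```

Run from the repository root, this prints nothing when every referenced ADR file exists, matching the contract of the shell version.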

0125 — cambi_vulkan migrated to kernel_template (T-GPU-DEDUP-25, 5-bundle)

  • Touches:
  • core/src/feature/vulkan/cambi_vulkan.c — state's quintet (dsl_2bind + 5× pl_layout_* + shader_modules[CAMBI_PL_COUNT] + a shared desc_pool) collapses to five VmafVulkanKernelPipeline bundles (pl_trivial, pl_derivative, pl_filter_mode, pl_decimate, pl_mask_dp), each owning its own descriptor pool. The first slot of pipelines[] per stage aliases the bundle's base pipeline; CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, and CAMBI_PL_MASK_THRESHOLD are sibling variants built via vmaf_vulkan_kernel_pipeline_add_variant().
  • cambi_vk_alloc_set takes a bundle pointer (->desc_pool / ->dsl) — every dispatch site picks the bundle that matches its push-constant struct.
  • The cambi_vk_make_dsl / cambi_vk_make_pl / cambi_vk_create_shader / cambi_vk_build_pipeline helpers are dropped — the template subsumes them.
  • Invariant — variants destroyed before bundle, base alias must be skipped. Five distinct push-constant struct sizes (CambiVkPushTrivial / CambiVkPushDerivative / CambiVkPushFilterMode / CambiVkPushDecimate / CambiVkPushMaskDp) force five bundles even though every stage's DSL is 2-binding SSBO; _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots (CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, CAMBI_PL_MASK_THRESHOLD) before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle.
  • Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): cambi mean = 0.0, identical to pre-migration (the pair has no banding artifacts).
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no Vulkan backend, so there is nothing to merge against.

0124 — ssimulacra2_vulkan migrated to kernel_template (T-GPU-DEDUP-24, 4-bundle)

  • Touches:
  • core/src/feature/vulkan/ssimulacra2_vulkan.c — state's 16 long-lived pipeline-object fields (4× *_dsl + *_pl + *_shader + the shared desc_pool) collapse to four VmafVulkanKernelPipeline bundles (pl_xyb, pl_mul, pl_blur, pl_ssim), each owning its own descriptor pool. The first slot of each per-bundle pipeline array (xyb_pipelines[0], mul_pipelines[0], blur_pipelines_h[0], ssim_pipelines[0]) aliases the bundle's base VkPipeline; remaining per-scale / per-pass slots are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • ss2v_build_pipeline_int3 reroutes through _add_variant() instead of calling vkCreateComputePipelines directly; ss2v_alloc_set takes a bundle pointer (->desc_pool / ->dsl) instead of a separate DSL argument; descriptor-set free sites at the tail of ss2v_run_scale route to each bundle's pool.
  • The ss2v_make_dsl / ss2v_make_pl / ss2v_create_shader helpers are dropped — the template subsumes them.
  • Invariant — variants destroyed before bundle, slot 0 alias must be skipped. Four distinct DSL shapes (XYB = 6 SSBOs, MUL = 3, BLUR = 2, SSIM = 8) prevent collapsing to one bundle: _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots in xyb_pipelines[1..N-1], mul_pipelines[1..N-1], ssim_pipelines[1..N-1], blur_pipelines_h[1..N-1], and every slot of blur_pipelines_v[] before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle, and must skip slot 0 of the first three arrays + blur_pipelines_h to avoid double-freeing the aliased base.
  • Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): ssimulacra2 mean = 24.613842, identical to pre-migration.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no ssimulacra2 extractor and no Vulkan backend, so there is nothing to merge against.

0118 — psnr_hvs_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-18)

  • Touches:
  • core/src/feature/vulkan/psnr_hvs_vulkan.c — state's dsl + pipeline_layout + shader + desc_pool + pipeline[3] collapses to VmafVulkanKernelPipeline pl + VkPipeline pipeline_chroma_u + VkPipeline pipeline_chroma_v. Plane 0 is the template's base pipeline; planes 1+2 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • New psnr_hvs_plane_pipeline() accessor maps plane index to the right VkPipeline handle.
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the chroma U/V variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7.
  • Numerical contract: unchanged. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0119 — vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-19)

  • Touches:
  • core/src/feature/vulkan/vif_vulkan.c — state's dsl + pipeline_layout + shader + desc_pool + pipelines[4] collapses to VmafVulkanKernelPipeline pl + VkPipeline scale_variants[3]. Scale 0 is the template's base pipeline; scales 1, 2, 3 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • New vif_scale_pipeline() accessor maps scale index to the right VkPipeline handle (replaces s->pipelines[scale]).
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 3 scale variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7 and psnr_hvs_vulkan in T-GPU-DEDUP-18.
  • Numerical contract: unchanged. Same shaders, same spec-constants, same push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0120 — float_vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-20)

  • Touches:
  • core/src/feature/vulkan/float_vif_vulkan.c — state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[2][4] 2-D lookup table is preserved so the existing [mode][scale] dispatch path stays clean, but pipelines[0][0] aliases s->pl.pipeline (the template's base). The other 6 entries are sibling pipelines created via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 6 sibling variants (every (mode, scale) except (0, 0)) before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan / psnr_hvs_vulkan / vif_vulkan.
  • Invariant — pipelines[0][0] aliasing. The base pipeline handle is owned by s->pl.pipeline; we copy it into pipelines[0][0] after _create() so the dispatch path can use a uniform 2-D lookup. The destroy loop must skip (mode=0, scale=0) to avoid double-freeing the template's pipeline.
  • Numerical contract: unchanged. Same shaders, spec-constants (mode + scale), push-constants. Netflix-pair smoke matches integer_vif bit-identically to 4 decimals.
  • Rebase impact: low. Builds on top of PR #272's _add_variant() helper.

0122 — float_adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-22)

  • Touches:
  • core/src/feature/vulkan/float_adm_vulkan.c — twin to adm_vulkan (T-GPU-DEDUP-21); 16-pipeline 2-D [stage][scale] array. State collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl. pipelines[0][0] aliases s->pl.pipeline; the other 15 entries are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariants:
  • Variants destroyed before bundle.
  • pipelines[0][0] aliasing — destroy loop must skip (stage=0, scale=0).
  • Numerical contract: unchanged. Same float (_s suffix) primitives from adm_tools.c; same 5-element spec-constant tuple; same float partial accumulation reduced in double on the host.

0121 — adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-21)

  • Touches:
  • core/src/feature/vulkan/adm_vulkan.c — state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[4][4] 2-D lookup is preserved so the per-stage dispatch path stays clean. pipelines[0][0] aliases s->pl.pipeline (the template's base); the other 15 entries are sibling pipelines via vmaf_vulkan_kernel_pipeline_add_variant().
  • Invariants:
  • Variants destroyed before bundle (same rule as ssim_vulkan / psnr_hvs / vif / float_vif).
  • pipelines[0][0] aliasing — destroy loop must skip (stage=0, scale=0) to avoid double-freeing the template's pipeline.
  • Numerical contract: unchanged. Same shaders + 5-element spec-constant tuple (width, height, bpc, scale, stage) + push-constants.
  • Rebase impact: low. Builds on top of PR #272.
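The two invariants above can be illustrated with a toy model. This is a minimal sketch in plain Python, not real Vulkan: `ToyDevice`, `build_table` and the Python `close_fex` are hypothetical stand-ins, and the model assumes every slot of the 4×4 table is populated.

```python
class ToyDevice:
    """Tracks live pipeline handles and rejects a second destroy."""

    def __init__(self):
        self._next = 1
        self.live = set()

    def create(self):
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle

    def destroy(self, handle):
        if handle not in self.live:
            raise RuntimeError(f"double free of pipeline {handle}")
        self.live.remove(handle)


def build_table(dev, stages, scales):
    """Create the base pipeline plus one sibling for every other slot."""
    base = dev.create()  # stands in for s->pl.pipeline
    table = [[None] * scales for _ in range(stages)]
    for st in range(stages):
        for sc in range(scales):
            # Slot (0, 0) is only an alias of the base handle.
            table[st][sc] = base if (st, sc) == (0, 0) else dev.create()
    return base, table


def close_fex(dev, base, table):
    """Destroy siblings first, skipping the aliased slot, then the bundle."""
    for st, row in enumerate(table):
        for sc, handle in enumerate(row):
            if (st, sc) == (0, 0):
                continue  # owned by the bundle, not by the table
            dev.destroy(handle)
    dev.destroy(base)  # stands in for vmaf_vulkan_kernel_pipeline_destroy()
```

In this model a naive loop that destroys all 16 slots and then the bundle raises on the final call, which is the double free the skip rule exists to prevent.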

0123 — ms_ssim_vulkan 2-bundle migration (T-GPU-DEDUP-23)

  • Touches:
  • core/src/feature/vulkan/ms_ssim_vulkan.c — state collapses decimate_dsl + decimate_pl + decimate_shader + ssim_dsl + ssim_pl + ssim_shader + desc_pool (7 fields) to two bundles, VmafVulkanKernelPipeline pl_decimate + pl_ssim. Each bundle owns its own descriptor pool. The kernel has two distinct pipeline shapes (decimate = 2 SSBO bindings, ssim = 10 bindings), so two bundles is the minimum: _add_variant() can only create sibling pipelines under the same layout.
  • decimate_pipelines[0] aliases pl_decimate.pipeline (the template's base = scale 0). The remaining MS_SSIM_SCALES - 2 decimate variants (scales 1..3) are siblings via _add_variant().
  • ssim_pipeline_horiz[0] aliases pl_ssim.pipeline (base = scale 0, pass 0). The other 9 entries (4× ssim_pipeline_horiz for scales 1..4, plus 5× ssim_pipeline_vert for scales 0..4) are variants.
  • Invariant — variants destroyed before bundle. Same rule as ADR-0106 entry 0106: close_fex must destroy decimate_pipelines[1..3] and ssim_pipeline_horiz[1..4] + ssim_pipeline_vert[0..4] before calling vmaf_vulkan_kernel_pipeline_destroy() on pl_decimate / pl_ssim.
  • Invariant — [0] aliasing destroy-skip. decimate_pipelines[0] and ssim_pipeline_horiz[0] must not be passed to vkDestroyPipeline in close_fex; _destroy() already releases them via pl_decimate.pipeline / pl_ssim.pipeline. Double-free is UB. The destroy loops in close_fex start at i = 1 for decimate and skip i == 0 for ssim_horiz.
  • Invariant — per-bundle descriptor pool. The shared s->desc_pool is gone; alloc_descriptor_set now takes a const VmafVulkanKernelPipeline *bundle and uses bundle->desc_pool + bundle->dsl. Per-frame vkFreeDescriptorSets calls must target the matching pool (pl_decimate.desc_pool for decimate sets, pl_ssim.desc_pool for ssim sets) — mixing them is undefined behavior.
  • Numerical contract: unchanged. Same shaders, spec constants, push constants, and dispatch order as before. float_ms_ssim Netflix-pair smoke (576×324×48f) reports mean 0.963241; ssim pyramid intermediate values bit-identical to pre-migration run.
  • Rebase impact: low. Upstream Netflix has no Vulkan backend. Conflicts only against the parallel T-GPU-DEDUP-{18..22} PRs (#284–#288) on CHANGELOG.md / docs/rebase-notes.md — auto-resolve keeps both halves.
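The per-bundle descriptor-pool rule and the variant counts can be sketched as follows. This is a toy model under stated assumptions: `ToyPool` and `ToyBundle` are hypothetical, `MS_SSIM_SCALES = 5` is inferred from the "scales 0..4" wording above, and real Vulkan would not raise on a mismatched free but exhibit undefined behavior.

```python
MS_SSIM_SCALES = 5  # assumption: inferred from "scales 0..4" in the entry


def variant_counts(scales):
    """Return (decimate variants, ssim variants) excluding each base pipeline."""
    decimate_total = scales - 1   # one decimate step between adjacent scales
    ssim_total = 2 * scales       # horizontal + vertical pass per scale
    return decimate_total - 1, ssim_total - 1


class ToyPool:
    """Descriptor pool that remembers which sets it handed out."""

    def __init__(self, name):
        self.name = name
        self.sets = set()
        self._next = 0

    def allocate(self):
        self._next += 1
        handle = (self.name, self._next)
        self.sets.add(handle)
        return handle

    def free(self, handle):
        if handle not in self.sets:
            raise ValueError(f"set {handle} does not belong to pool {self.name}")
        self.sets.remove(handle)


class ToyBundle:
    """Stand-in for a kernel-pipeline bundle that owns its own pool."""

    def __init__(self, name, bindings):
        self.bindings = bindings
        self.desc_pool = ToyPool(name)


def alloc_descriptor_set(bundle):
    """Allocate from the bundle's own pool, mirroring the new signature."""
    return bundle.desc_pool.allocate()


def free_descriptor_set(bundle, handle):
    """Free against the bundle's pool; the wrong bundle is an error here."""
    bundle.desc_pool.free(handle)
```

With `pl_decimate = ToyBundle("decimate", 2)` and `pl_ssim = ToyBundle("ssim", 10)`, freeing a decimate set through `pl_ssim` fails loudly in the model, which is the mix-up the invariant warns about.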

0106 — Vulkan kernel template multi-pipeline + ssim/motion migration (T-GPU-DEDUP-7)

  • Touches:
  • core/src/vulkan/kernel_template.h — new vmaf_vulkan_kernel_pipeline_add_variant() helper. Takes the base pipeline bundle (DSL / pipeline layout / shader / pool owned by vmaf_vulkan_kernel_pipeline_create) plus a partial VkComputePipelineCreateInfo and produces a sibling VkPipeline re-using the same layout / shader. The base _create and _destroy entry points are unchanged; existing consumers (psnr, moment, ciede) keep working.
  • core/src/feature/vulkan/motion_vulkan.c — state collapses VkPipeline pipelines[2] (kept "for SYCL parity" but functionally identical because COMPUTE_SAD goes through push constants, not spec-constants) to a single VmafVulkanKernelPipeline pl. create_pipelines / close_fex shrink to template-driven create + destroy.
  • core/src/feature/vulkan/ssim_vulkan.c — state becomes VmafVulkanKernelPipeline pl + VkPipeline pipeline_vert. Pass 0 (horizontal) is the template's base pipeline; pass 1 (vertical) is created via _add_variant(). close_fex destroys the variant first, then calls vmaf_vulkan_kernel_pipeline_destroy() on the bundle.
  • Invariant — no spec-constant drift between base and variant. _add_variant() overwrites sType / stage.sType / stage.stage / stage.module / layout of the caller's VkComputePipelineCreateInfo so the variant is guaranteed to share the base's shader and layout. Callers control the variant's spec-constant via pSpecializationInfo. Reordering these overwrites lets a consumer accidentally bind a different shader module under the same layout — UB at descriptor-set time.
  • Invariant — variant destroyed before bundle. close_fex in ssim must vkDestroyPipeline(s->pipeline_vert) before vmaf_vulkan_kernel_pipeline_destroy(&s->pl). The bundle's _destroy releases the descriptor pool, and the descriptor sets that vkAllocateDescriptorSets issued against the variant pipeline's layout are dropped cleanly only when the variant pipeline is already gone.
  • Numerical contract: unchanged. Both kernels run identical shaders + spec-constants + push-constants as before; only the Vulkan boilerplate that creates / destroys the pipeline scaffolding moved to a shared owner. Cross-backend parity gate at places=4 holds — Netflix-pair float_ssim smoke (576×324×48f) reports mean 0.863, identical to pre-migration.
  • Rebase impact: low. The base pipeline-bundle helpers predate this change (PR #270 / #271); the new _add_variant is additive. Upstream Netflix has no Vulkan backend to conflict with.
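The "no spec-constant drift" invariant can be sketched with dictionaries standing in for the Vulkan structs. This is an illustrative model only: the Python `add_variant`, the string constants and the shader/layout names are hypothetical, and only the overwritten field names come from the entry above.

```python
def add_variant(base, create_info):
    """Return a copy of create_info forced onto the base's shader and layout.

    Everything the helper overwrites is taken from the base bundle; the
    caller keeps control of pSpecializationInfo only.
    """
    info = dict(create_info)
    stage = dict(info.get("stage", {}))

    info["sType"] = "COMPUTE_PIPELINE_CREATE_INFO"
    stage["sType"] = "PIPELINE_SHADER_STAGE_CREATE_INFO"
    stage["stage"] = "COMPUTE"
    stage["module"] = base["shader"]   # caller-supplied module is discarded
    info["stage"] = stage
    info["layout"] = base["layout"]    # caller-supplied layout is discarded
    return info


# Hypothetical base bundle and a caller that (wrongly) names another shader.
base_bundle = {"shader": "ssim_module", "layout": "ssim_layout"}
caller_info = {
    "stage": {"module": "other_module", "pSpecializationInfo": {"pass": 1}},
    "layout": "other_layout",
}
variant = add_variant(base_bundle, caller_info)
```

In the sketch the variant ends up with the base's module and layout while the caller's `pSpecializationInfo` survives untouched, and the caller's original dictionary is not modified.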

0111 — integer_ciede_cuda migrated to kernel_template (T-GPU-DEDUP-11)

  • Touches:
  • core/src/feature/cuda/integer_ciede_cuda.c — state's CUstream + CUevent + CUevent + VmafCudaBuffer + host-pinned float* quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. init / collect / close call the template's lifecycle_init / readback_alloc / collect_wait / lifecycle_close / readback_free helpers. submit keeps the pre-launch wait inline (intentional — ciede has no atomic, so the template's pre-launch memset is unnecessary).
  • Numerical contract: unchanged. Pure CUDA-boilerplate consolidation. The host-side reduction in collect still uses the same double accumulator over per-block float partials — places=4 (ADR-0187) holds.

0112 — integer_moment_cuda migrated to kernel_template (T-GPU-DEDUP-12)

  • Touches:
  • core/src/feature/cuda/integer_moment_cuda.c — state's stream/event/device-buffer/host-pinned quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. submit calls vmaf_cuda_kernel_submit_pre_launch (atomic counters require the device-side memset). init / collect / close call the matching template helpers.
  • Numerical contract: unchanged. Same per-frame atomic accumulators (4× uint64), same sums_host[i] / n_pixels host division.
  • Rebase impact: low. Upstream Netflix has no equivalent template; this consolidation is fork-local.
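The host-side division can be sketched as below. This is a minimal illustration, assuming the four accumulators hold the first and second moments of the reference and distorted planes in the order (ref 1st, dis 1st, ref 2nd, dis 2nd); that ordering and the function names are assumptions, not taken from the source file.

```python
def accumulate_sums(ref, dis):
    """Integer accumulation, standing in for the four uint64 atomic counters."""
    sums = [0, 0, 0, 0]
    for r, d in zip(ref, dis):
        sums[0] += r        # ref, 1st moment
        sums[1] += d        # dis, 1st moment
        sums[2] += r * r    # ref, 2nd moment
        sums[3] += d * d    # dis, 2nd moment
    return sums


def moments_from_sums(sums_host, n_pixels):
    """Host-side step: sums_host[i] / n_pixels for each of the four sums."""
    if len(sums_host) != 4:
        raise ValueError("expected exactly four accumulators")
    if n_pixels <= 0:
        raise ValueError("n_pixels must be positive")
    return [s / n_pixels for s in sums_host]
```

Because the accumulators are integers, the sums do not depend on the order in which pixels are added, which is why atomic accumulation does not disturb the numerical contract in this model.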

0113 — integer_motion_v2_cuda migrated to kernel_template (T-GPU-DEDUP-13)

  • Touches:
  • core/src/feature/cuda/integer_motion_v2_cuda.c — stream/event pair + sad device+host quintet collapses to lc + rb. Raw-pixel ping-pong pix[2] stays outside the bundle. submit keeps the memset on pic_stream inline rather than calling submit_pre_launch (the helper would move the memset to lc.str, which races with the kernel reading the accumulator). init / collect / close call the matching template helpers.
  • Numerical contract: unchanged. Same D2D copy, same conditional kernel launch on frame ≥ 1, same host-side min(score[i], score[i+1]) flush.
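The host-side min(score[i], score[i+1]) step can be sketched as follows. This is a simplified model: `flush_motion2` is a hypothetical name, and the handling of the final frame (it keeps its own score because it has no successor) is an assumption rather than something stated in the entry.

```python
def flush_motion2(scores):
    """Pair each frame's score with its successor and keep the smaller one."""
    if not scores:
        return []
    out = [min(scores[i], scores[i + 1]) for i in range(len(scores) - 1)]
    out.append(scores[-1])  # assumption: last frame has no successor
    return out
```

For example, per-frame scores `[0.0, 3.0, 1.0, 2.0]` become `[0.0, 1.0, 1.0, 2.0]` under this model.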

0114 — integer_ssim_cuda migrated to kernel_template (T-GPU-DEDUP-14)

  • Touches:
  • core/src/feature/cuda/integer_ssim_cuda.c — stream/event/partials device+host quintet collapses to lc + rb. Five intermediate float buffers (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) stay outside the bundle. submit keeps the cuStreamWaitEvent + horiz + vert + DtoH chain inline — SSIM writes one float per block (no atomic), so the template's submit_pre_launch memset is unnecessary. init / collect / close use the matching template helpers.
  • Numerical contract: unchanged. Same horiz-then-vert two-pass pipeline, same per-block float partial reduction in double on the host. places=4 (matching the ciede_cuda precision pattern) holds.
  • Rebase impact: low. Upstream Netflix has no equivalent; this is fork-added.

0115 — ms_ssim_cuda + psnr_hvs_cuda lifecycle migration (T-GPU-DEDUP-15)

  • Touches:
  • core/src/feature/cuda/integer_ms_ssim_cuda.c — stream + 2-event lifecycle replaced with VmafCudaKernelLifecycle lc; multi-level pyramid + SSIM intermediate + 3-partials buffers stay outside the template's single-pair readback bundle.
  • core/src/feature/cuda/integer_psnr_hvs_cuda.c — same shape; 3-plane ref/dist/partials triples remain inline.
  • Numerical contract: unchanged. The migration only affects init / close boilerplate; submit / collect dispatch and host reduction paths are untouched apart from the s->str → s->lc.str / s->event → s->lc.submit / s->finished → s->lc.finished field renames.

0116 — float_psnr/ansnr/motion cuda → kernel_template (T-GPU-DEDUP-16)

  • Touches:
  • core/src/feature/cuda/float_psnr_cuda.c — stream/event/partials quintet → lc + rb; input upload buffers ref_in / dis_in stay outside the bundle.
  • core/src/feature/cuda/float_ansnr_cuda.c — same shape; rb wraps the (sig, noise) interleaved partials.
  • core/src/feature/cuda/float_motion_cuda.c — same shape; rb wraps the SAD partials, blur[2] ping-pong stays outside.
  • Numerical contract: unchanged. Same dispatch geometry, same reduction order. Cross-backend parity gate at the kernels' contracted precision (places=3 per ADR-0192) holds.

0117 — float_adm + float_vif cuda lifecycle migration (T-GPU-DEDUP-17)

  • Touches:
  • core/src/feature/cuda/float_adm_cuda.c — stream + 2-event lifecycle replaced with VmafCudaKernelLifecycle lc; multi-stage DWT + CSF pipeline state stays outside the template's single-pair readback bundle.
  • core/src/feature/cuda/float_vif_cuda.c — same shape; 4-level pyramid + per-scale (num, den) pairs remain inline.
  • Numerical contract: unchanged. The migration only affects init / close stream-event boilerplate; submit / collect dispatch and host reduction paths are untouched apart from the field renames.
  • Rebase impact: low. Upstream Netflix has no equivalent template; this is fork-added.

0107 — float_psnr_vulkan migrated to kernel_template (T-GPU-DEDUP-8)

  • Touches:
  • core/src/feature/vulkan/float_psnr_vulkan.c — state's dsl + pipeline_layout + shader + pipeline + desc_pool quintet is collapsed into a single VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy. No shader changes, no spec-constant changes, no push-constant changes.
  • Numerical contract: unchanged. The migration is a pure Vulkan-boilerplate consolidation. Cross-backend parity gate at places=4 holds — Netflix-pair smoke reports float_psnr mean 30.755 dB, identical to pre-migration.

0109 — float_ansnr_vulkan + motion_v2_vulkan migrated to kernel_template (T-GPU-DEDUP-9)

  • Touches:
  • core/src/feature/vulkan/float_ansnr_vulkan.c — single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
  • core/src/feature/vulkan/motion_v2_vulkan.c — same shape.
  • Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. Cross-backend parity gate at the kernel's contracted precision holds — Netflix-pair smoke reports float_ansnr mean 23.51 dB and motion2_v2_score mean 3.895, identical to pre-migration.

0110 — float_motion_vulkan migrated to kernel_template (T-GPU-DEDUP-10)

  • Touches:
  • core/src/feature/vulkan/float_motion_vulkan.c — single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
  • Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. Netflix-pair smoke reports motion mean 4.049 / motion2 mean 3.894, identical to pre-migration.
  • Rebase impact: low. Upstream Netflix has no Vulkan backend.

0108 — Bristol VI-Lab feasibility digest + BVI-CC ingest ADR (Draft)

  • Touches:
  • docs/research/0046-bristol-vi-lab-feasibility.md (new) — nine-dataset survey + use-case fit + effort estimate.
  • docs/adr/0241-bristol-bvi-cc-ingest.md (new, Status: Draft) — proposal to ingest BVI-CC as the second tiny-AI corpus.
  • docs/adr/README.md — index row for ADR-0241.
  • CHANGELOG.md — Added entry.
  • Numerical contract: not applicable (docs-only).
  • Rebase impact: none. Pure research deliverables; upstream Netflix has no equivalent surface.

0094 — Vulkan VkImage import v2 async pending-fence (T7-29 part 4 / ADR-0251)

  • ADR: ADR-0251; predecessor ADR-0186.
  • Touches:
  • core/src/vulkan/import.c — full rewrite of the submission path. Single-fence submit_and_wait becomes per-slot submit_to_slot + drain_slot_fence; the new slot_alloc / slot_release helpers materialise / tear down a ring slot (staging-pair + cmd buffer + fence). vmaf_vulkan_import_image indexes into the ring by frame_index % ring_size; vmaf_vulkan_wait_compute drains every outstanding fence. vmaf_vulkan_state_build_pictures waits the slot's fence before exposing the host pointer. Public-API signatures are unchanged.
  • core/src/vulkan/vulkan_internal.h — new struct VmafVulkanImportSlot; VmafVulkanImportSlots becomes a fixed-capacity VmafVulkanImportSlot ring[VMAF_VULKAN_RING_MAX] plus geometry + ring_size. Two new defines — VMAF_VULKAN_RING_DEFAULT (4) and VMAF_VULKAN_RING_MAX (8). VmafVulkanState gains requested_ring_size.
  • core/src/vulkan/common.c: vmaf_vulkan_state_init and _state_init_external set requested_ring_size = VMAF_VULKAN_RING_DEFAULT.
  • core/test/test_vulkan_async_pending_fence.c (new, contract smoke for the v1 → v2 swap).
  • core/test/meson.build — registers the new test under the existing enable_vulkan guard.
  • core/src/vulkan/AGENTS.md (new) — pins the three rebase-sensitive ring invariants.
  • docs/adr/0251-vulkan-async-pending-fence.md (new), docs/research/0042-vulkan-async-pending-fence.md (new), docs/api/gpu.md, docs/backends/vulkan/overview.md, CHANGELOG.md, docs/rebase-notes.md.
  • ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch: unchanged. The v2 ring is fully internal to VmafVulkanState; the public ABI stays byte-identical so the filter consumes the new path transparently.
  • Invariant 1 — fixed ring depth at first import. lazy_alloc_ring is the only place that materialises the ring; once allocated the depth never changes for the lifetime of the VmafVulkanState. Any caller that needs a different depth has to free + re-init. The geometry pinning contract from v1 (ADR-0186) is preserved verbatim.
  • Invariant 2 — vkResetFences only after VK_SUCCESS from vkWaitForFences. Sole reset path lives in drain_slot_fence; fence_in_flight flips back to 0 only after the wait succeeds. A -EIO from the wait propagates up without resetting (so a retry would correctly re-wait rather than silently move on).
  • Invariant 3 — state_free drains before destroying. vmaf_vulkan_import_slots_free walks the ring and calls drain_slot_fence on every in-flight slot, then issues one vkQueueWaitIdle belt-and-braces (any feature kernel that submitted on the same queue may still be running). Reordering this triggers validation-layer "destroying in-use object" errors.
  • Numerical contract: unchanged. Async submission only changes when the host can read the staging buffer, not which bytes the GPU writes. Cross-backend parity gate (scripts/ci/cross_backend_parity_gate.py, places=4) holds.
  • Memory delta: staging arena scales 1 → ring_size per direction. At default depth and 1080p 8-bit Y, the per-state host-visible footprint grows from ~4 MiB to ~16 MiB. Documented in ADR-0251 §Consequences.
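The slot indexing, the fence-reset rule and the memory delta can be checked with a small model. This is a sketch under stated assumptions: `ToySlot` and `staging_bytes` are hypothetical, and "per direction" is read here as one staging buffer each for the reference and distorted planes (two buffers per slot).

```python
import errno

MIB = 1024 * 1024
VMAF_VULKAN_RING_DEFAULT = 4
VMAF_VULKAN_RING_MAX = 8


def slot_index(frame_index, ring_size):
    """Ring slot used by a frame: frame_index % ring_size."""
    return frame_index % ring_size


class ToySlot:
    """Models invariant 2: the fence is reset only after a successful wait."""

    def __init__(self):
        self.fence_in_flight = False

    def submit(self):
        self.fence_in_flight = True

    def drain(self, wait_ok):
        if not self.fence_in_flight:
            return 0
        if not wait_ok:
            return -errno.EIO        # propagate, leave the fence in flight
        self.fence_in_flight = False  # reset only after the wait succeeded
        return 0


def staging_bytes(width, height, bytes_per_sample, ring_size, directions=2):
    """Host-visible staging footprint for one plane per direction per slot."""
    return width * height * bytes_per_sample * directions * ring_size
```

Under these assumptions a 1920×1080 8-bit Y plane gives about 3.96 MiB at depth 1 and about 15.82 MiB at the default depth of 4, consistent with the "~4 MiB to ~16 MiB" figures above.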

0090 — cambi_vulkan extractor (T7-36 / ADR-0210)

  • ADR: ADR-0210; predecessor ADR-0205.
  • Touches:
  • core/src/feature/vulkan/cambi_vulkan.c (replaces the spike scaffold's init_stub/extract_stub/close_stub triple with the full Vulkan-aware lifecycle).
  • core/src/feature/vulkan/shaders/cambi_preprocess.comp (new), cambi_mask_dp.comp (new — unified row-SAT / col-SAT / threshold-compare via PASS=0/1/2 spec const).
  • core/src/feature/cambi.c — appends a small block of public trampolines (vmaf_cambi_*) at the bottom of the file that thinly wrap the file-static helpers. No upstream function-static code is renamed or moved; the entire upstream body of cambi.c above the trampolines stays byte-identical, which keeps Netflix sync straightforward.
  • core/src/feature/cambi_internal.h (new) — internal-only header exposing vmaf_cambi_calculate_c_values, vmaf_cambi_get_spatial_mask, etc., to the GPU twin.
  • core/src/vulkan/meson.build — registers the 5 cambi shaders in vulkan_shader_sources[] and cambi_vulkan.c in vulkan_sources.
  • core/src/feature/feature_extractor.c — adds the extern decl + registry entry for vmaf_fex_cambi_vulkan under #if HAVE_VULKAN.
  • scripts/ci/cross_backend_vif_diff.py: cambi row in FEATURE_METRICS so the cross-backend gate runs at places=4 against the CPU baseline.
  • docs/adr/0210-cambi-vulkan-integration.md, docs/research/0032-cambi-vulkan-integration.md, docs/backends/vulkan.md, CHANGELOG.md.
  • Invariant 1 — bit-exactness by construction. Every GPU phase is integer arithmetic (uint16 derivative, int32 SAT, > compare, stride-2 gather, 3-element mode3 lookup). The readback into the host VmafPicture pair is byte-identical to what the CPU would have written; the host residual then runs the unmodified CPU calculate_c_values + spatial pooling on those buffers. Any rebase that introduces float arithmetic into one of these GPU phases — e.g., a future Netflix change to the derivative kernel that adds a bilinear interpolation step — will silently break places=4 and must be caught at the cross-backend gate.
  • Invariant 2 — cambi_internal.h signatures must stay in lock-step with cambi.c's file-static helpers. The Vulkan twin calls vmaf_cambi_calculate_c_values, which trampolines to the file-static calculate_c_values. Any signature change to the latter (extra parameters, type changes) must update the trampoline + header in the same PR or the GPU build breaks.
  • On upstream sync: cambi.c's file-static helpers are sometimes renamed by upstream (e.g., decimate → cambi_decimate could happen during a Netflix tidy-up). When rebasing, search cambi.c's tail for the trampoline block; its eleven static calls (get_spatial_mask, decimate, filter_mode, calculate_c_values, spatial_pooling, weight_scores_per_scale, get_pixels_in_window, increment_range, decrement_range, get_derivative_data_for_row, cambi_preprocessing) need to match the upstream symbol names. Update the trampoline body if upstream renames; signatures should not need to change because the trampoline already takes the function-pointer-typedef form (VmafRangeUpdater etc.).
  • Re-test on rebase: python3 scripts/ci/cross_backend_vif_diff.py --backend vulkan --feature cambi --ref testdata/ref_576x324_48f.yuv --dist testdata/dis_576x324_48f.yuv --width 576 --height 324 --pixel-format 420 --bitdepth 8 --frames 48. Should emit places=4 PASS with max_abs_diff = 0.0. If it diverges, bisect the GPU phases by reading back individual buffers (image_buf / mask_buf / deriv_buf) and comparing against the CPU's in-place pic plane after the equivalent stage.
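What a places=N comparison means can be sketched as below. This is an assumption-laden illustration: it treats "places" like `unittest`'s `assertAlmostEqual` (the absolute difference rounds to zero at N decimals), which may not be exactly how the gate script defines it, and the function names are hypothetical.

```python
def agrees_at_places(cpu, gpu, places):
    """True when |cpu - gpu| rounds to zero at the given number of decimals."""
    return round(abs(cpu - gpu), places) == 0


def gate(cpu_scores, gpu_scores, places):
    """Return (max_abs_diff, mismatch_count) over paired per-frame scores."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("score lists must have the same length")
    diffs = [abs(c - g) for c, g in zip(cpu_scores, gpu_scores)]
    mismatches = sum(
        not agrees_at_places(c, g, places)
        for c, g in zip(cpu_scores, gpu_scores)
    )
    return max(diffs, default=0.0), mismatches
```

A bit-exact kernel yields `(0.0, 0)` from `gate`, matching the expected `max_abs_diff = 0.0` for cambi.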

The pre-ADR-0108 fork-local PRs are summarised by workstream rather than per-PR. Future PRs add entries individually.

0085 — Upstream c70debb1 partial port (adm_csf + barten_csf tests)

  • No ADR. Pure upstream cherry-pick per ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
  • Upstream source: c70debb1 (Kyle Swanson, 2026-04-28): "libvmaf/test: port new adm/vif/speed tests". The audit row that flagged the gap is T-NEW-2 in the 2026-04-29 quarterly upstream-backlog re-audit (PR #205).
  • Touches (additive only):
  • core/src/feature/adm_csf_tools.h — new header (verbatim from upstream); declares the inline adm_native_csf helper (DLM-paper CSF) used by the new test_adm_csf unit.
  • core/test/test_adm_csf.c — new unit (verbatim from upstream); 2 mu_assert cases on adm_native_csf(3, 3.0, 1080, {0, 45}).
  • core/test/test_barten_csf.c — new unit (verbatim from upstream); 23 mu_assert cases over barten_rod_cone_sens, barten_mtf, barten_csf, linear_interpolate, barten_watson_blend_csf (all symbols already on the fork).
  • core/test/meson.build — registers the two new executables + adds test('test_adm_csf', ...) and test('test_barten_csf', ...).
  • CHANGELOG.md Unreleased § Changed.
  • Deliberate scope cuts (the upstream commit's other halves are not portable verbatim):
  • test_vif_tools.c — depends on upstream symbols NUM_KERNELSCALES, the 21-entry valid_kernelscales table, vif_validate_kernelscale, vif_get_filter_size, vif_get_filter, speed_get_antialias_filter, and a [NUM_KERNELSCALES][5][65] filter table that the fork's vif_filter1d_table_s [11][4][65] does not match. Per Research-0024 Strategy E, the fork deliberately diverges from the upstream vif runtime-helper chain to preserve the ADR-0138 / 0139 / 0142 / 0143 SIMD bit-exactness contract. Porting this test requires porting the runtime helpers first.
  • test_speed_chroma.c: #includes feature/speed.c directly; the fork has no SpEED extractor (feature/speed.c does not exist). Pairs with audit row T-NEW-1 (port the SpEED extractor wholesale, or absorb it into the tiny-AI speed metric).
  • Invariants (rebase-relevant):
  • The new adm_csf_tools.h header is wholly additive and does not conflict with the existing fork adm_csf_s non-inline helper in adm_tools.h (different signature, different translation units).
  • The two new tests do not depend on Netflix golden YUVs — they evaluate the closed-form CSF math directly. No golden-data interaction.
  • On upstream sync: a future port of the upstream vif runtime-helper chain (Research-0024 Strategy A reversal) or the SpEED extractor (T-NEW-1) unlocks the deferred halves of this commit. Until then, fork-side test_vif_tools.c / test_speed_chroma.c stay absent.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu test_adm_csf test_barten_csf
meson test -C build-cpu test_adm_csf test_barten_csf

0084 — Embedded MCP server scaffold (T5-2, ADR-0209)

  • ADR: ADR-0209 (audit-first scaffold) on top of the ADR-0128 governance + Research-0005 design.
  • Upstream source: fork-local. Netflix/vmaf has no embedded MCP server (and no plans to add one — the workflow is agent-tooling-specific, well outside upstream's library scope).
  • Touches:
  • core/include/libvmaf/libvmaf_mcp.h — new public header.
  • core/include/core/meson.build — new if get_option('enable_mcp') install branch.
  • core/src/mcp/ — new directory: mcp.c (stub TU) + meson.build (exposes mcp_sources + mcp_defines).
  • core/src/meson.build — new is_mcp_enabled guard + subdir('mcp') block; mcp_sources threaded into the library('vmaf', ...) source list alongside dnn_sources.
  • core/test/meson.build — new if get_option('enable_mcp') block wiring test_mcp_smoke.
  • core/test/test_mcp_smoke.c — new 12-sub-test smoke.
  • core/meson_options.txt — new enable_mcp umbrella + three sub-flags (all default false).
  • Invariant: every public entry point in libvmaf_mcp.h (vmaf_mcp_init / _start_sse / _start_uds / _start_stdio / _stop / _close) returns -ENOSYS (or -EINVAL on bad arguments) until the T5-2b runtime PR lands. The smoke pins this contract — a runtime PR that flips a return code without flipping the smoke expectation regresses the gate.
  • On upstream sync: zero interaction with upstream files. Wholly additive directory + boolean build flags. The subdir('mcp') insertion in core/src/meson.build lives next to the existing subdir('dnn') / Vulkan blocks; an upstream conflict in that area would be confined to those few lines and is mechanical to resolve.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_mcp=false
ninja -C build-cpu && meson test -C build-cpu  # baseline still green

meson setup --reconfigure build-cpu libvmaf -Denable_mcp=true \
            -Denable_mcp_sse=true -Denable_mcp_uds=true -Denable_mcp_stdio=true
ninja -C build-cpu
meson test -C build-cpu test_mcp_smoke  # 12/12 sub-tests pass

0065 — T7-37 Netflix bench rerun + docs/benchmarks.md TBD fill

  • No ADR. Empirical fill of pre-existing TBD cells; no new decision. The bench-script fixes this rerun depends on shipped earlier in PR #169 (libvmaf/AGENTS.md backend-engagement foot-guns), PR #170 (--backend cuda actually engages CUDA), and PR #171 (testdata/bench_all.sh uses correct flags). Vulkan header install for SDK consumers is PR #175.
  • Touches (additive only): docs/benchmarks.md (every TBD cell replaced with measured numbers; hardware-profile table updated to the ryzen-4090-arc host the rerun was performed on; "How to reproduce" section now documents fixture acquisition for the gitignored BBB 4K 200-frame pair). CHANGELOG.md Unreleased § Changed entry.
  • Invariants (rebase-relevant): none. The numbers are tied to fork commit 41301496 and the ryzen-4090-arc profile; an upstream rebase that changes feature pipelines would invalidate the table but not break parsing.
  • On upstream sync: zero interaction. Pure docs.
  • Re-test on rebase: bash testdata/bench_all.sh (after a fresh fork build) — confirms the bench script still drives all four backends and that the per-row metrics-key counts (CPU=15, CUDA=12, SYCL/Vulkan=34) still distinguish them. If they collapse to one count, the new upstream broke a backend dispatcher silently.

0050 — float_adm_cuda + float_adm_sycl extractors (ADR-0202)

  • ADR: ADR-0202
  • Touches:
  • core/src/feature/cuda/float_adm/float_adm_score.cu (new)
  • core/src/feature/cuda/float_adm_cuda.{c,h} (new)
  • core/src/feature/sycl/float_adm_sycl.cpp (new)
  • core/src/meson.build — three changes: (1) new float_adm_score entry in cuda_cu_sources, (2) new cuda_cu_extra_flags dict that threads --fmad=false + -Xcompiler=-ffp-contract=off into the float_adm_score fatbin only, (3) new SYCL source in sycl_feature_sources.
  • core/src/feature/feature_extractor.c (extern decls + list entries for vmaf_fex_float_adm_cuda / vmaf_fex_float_adm_sycl under #if HAVE_CUDA / #if HAVE_SYCL).
  • Invariant 1 — --fmad=false for the float_adm fatbin only: the angle-flag dot product (ot_dp = oh*th + ov*tv) and the cube reductions (xa*xa*xa, csf_o*csf_o*csf_o) require IEEE-754 add/mul ordering to match the GLSL precise qualifier in float_adm.comp. NVCC's default --fmad=true fuses these and drifts past places=4 at scale 3 / adm2. The integer ADM kernels share cuda_flags but use int64 accumulators where FMA is irrelevant — keep the FMA-on default for them.
  • Invariant 2 — parent-LL dimension trap: stage 0 at scale > 0 reads the parent's LL band; the mirror/clamp bounds are scale_w/h[scale] (= parent's LL output dims = current scale's input dims), NOT scale_w/h[scale - 1] (= parent's full image dims). Both float_adm_cuda.c and float_adm_sycl.cpp cite this inline. Do not "simplify" by using the off-by-one neighbour.
  • Re-test:
CXX=icpx CC=icx meson setup build-cs -Denable_cuda=true \
     -Denable_sycl=true -Denable_vulkan=enabled \
     -Denable_float=true \
     -Dsycl_compiler=/opt/intel/oneapi/compiler/latest/bin/icpx
ninja -C build-cs
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-cs/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature float_adm \
  --backend cuda --places 4
# Same with --backend sycl on a host with a SYCL device.
# Both must report 0/N mismatches at places=4.
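Why a fused multiply-add changes a two-term dot product such as ot_dp = oh*th + ov*tv can be shown with exact rational arithmetic. This is a sketch in double precision rather than the kernel's float32, with hand-picked inputs; `dot2_fused` emulates a fused operation via `fractions.Fraction` and is not how NVCC implements it.

```python
from fractions import Fraction


def dot2_unfused(a, b, c, d):
    """a*b + c*d with every product and the sum rounded separately."""
    return a * b + c * d


def dot2_fused(a, b, c, d):
    """Emulates fma(a, b, c*d): a*b stays exact, one rounding at the end."""
    cd = c * d  # this product is still rounded on its own
    exact = Fraction(a) * Fraction(b) + Fraction(cd)
    return float(exact)  # single, correctly rounded conversion


# Inputs chosen so the low bits of a*a are lost by the separate rounding.
a = 1.0 + 2.0 ** -30
d = 1.0 + 2.0 ** -29
unfused = dot2_unfused(a, a, -1.0, d)
fused = dot2_fused(a, a, -1.0, d)
```

Here the unfused form cancels to exactly zero while the fused form keeps the residual 2^-60, so the two compilations of the same source expression disagree. The same mechanism, at float32 scale, is what the --fmad=false carve-out guards against.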

0049 — float_adm_vulkan extractor (ADR-0199)

  • ADR: ADR-0199
  • Touches:
  • core/src/feature/vulkan/float_adm_vulkan.c (new)
  • core/src/feature/vulkan/shaders/float_adm.comp (new)
  • core/src/vulkan/meson.build (adds the .comp shader and the new .c source)
  • core/src/feature/feature_extractor.c (extern decl + list entry under #if HAVE_VULKAN)
  • scripts/ci/cross_backend_vif_diff.py (float_adm entry in FEATURE_METRICS)
  • .github/workflows/tests-and-quality-gates.yml (lavapipe float_adm step at places=4)
  • Invariant: float_adm GPU port uses the 2 * sup - idx - 1 mirror form on both axes — matches both the scalar adm_dwt2_s and the AVX2 float_adm_dwt2_avx2, which both consume the same dwt2_src_indices_filt_s index buffer. This is intentionally different from float_vif's GPU mirror (ADR-0197), which uses the - 2 form (2 * sup - idx - 2) because float_vif's AVX2 path takes a different code branch. Do not "fix" the asymmetry by analogy with float_vif.
  • Re-test:
meson setup build-vk -Denable_vulkan=enabled -Denable_cuda=false \
                     -Denable_sycl=false
ninja -C build-vk
meson test -C build-vk
VK_LOADER_DRIVERS_SELECT='*lvp*' python3 \
  scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-vk/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature float_adm --places 4
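The difference between the two mirror forms is easiest to see on a short axis. This is a minimal sketch covering only the upper edge (idx >= sup); `mirror_upper` is a hypothetical helper, and the lower-edge handling is deliberately left out because the entry does not state it.

```python
def mirror_upper(idx, sup, offset):
    """Reflect an out-of-range index back into [0, sup).

    offset=1 gives 2*sup - idx - 1 (the float_adm form);
    offset=2 gives 2*sup - idx - 2 (the float_vif form).
    """
    if offset not in (1, 2):
        raise ValueError("offset must be 1 or 2")
    if idx < sup:
        return idx
    return 2 * sup - idx - offset


# With sup = 8, the first two out-of-range taps land on different samples.
adm_taps = [mirror_upper(i, 8, 1) for i in (8, 9)]
vif_taps = [mirror_upper(i, 8, 2) for i in (8, 9)]
```

With the - 1 form the edge sample is repeated (index 8 maps to 7), while the - 2 form reflects around the edge sample without repeating it (index 8 maps to 6), so swapping one for the other shifts every mirrored tap by one sample.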

0083 — SSIMULACRA 2 Vulkan kernel (ADR-0201)

meson setup core/build-vk-ss2 \
  -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false \
  libvmaf
ninja -C core/build-vk-ss2 tools/vmaf
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build-vk-ss2/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 \
  --feature ssimulacra2 --backend vulkan --places 1
# expected: max_abs_diff ≈ 1.59e-2, 0/48 mismatches at places=1
  • Follow-ups:
  • CUDA + SYCL twins (batch 3 parts 7b + 7c per ADR-0192).
  • Performance follow-up: re-bin multiple rows / columns per WG in the IIR blur (currently local_size = 1, one row/col per WG for correctness).
  • Optional: rename psnr_hvs_strict_shaders to strict_shaders in core/src/vulkan/meson.build (cosmetic — out of scope for this PR).

0001 — SIMD bit-identical reductions for float ADM

  • Workstream PRs: #18, commits 24c88a32, f082cfd3.
  • Touches: core/src/feature/integer_adm.c, core/src/feature/float_adm.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/arm64/adm_neon.c, upstream python/test/feature_extractor_test.py test expectations.
  • Invariant: sum_cube and csf_den_scale accumulate cubed values in double precision (via _mm256_cvtps_pd / _mm512_cvtps_pd) in scalar, AVX2, AVX-512, and NEON. Upstream accumulates in float, which produces ~8e-5 drift between scalar and SIMD. Test expectations were tightened to match the double-precision path; an upstream-side accumulator change would re-introduce the drift and break the tightened assertions.
  • Re-test: meson test -C build --suite=fast && python -m pytest python/test/feature_extractor_test.py -k adm.
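The scalar-versus-SIMD drift can be reproduced in miniature. This is a toy sketch, not the fork's code: float32 arithmetic is emulated with `struct`, a two-lane split stands in for the wider AVX2 / AVX-512 / NEON lanes, and the inputs are hand-picked so the effect is exact rather than statistical.

```python
import struct


def f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cube_f32(v):
    """v*v*v with float32 rounding after each multiply."""
    return f32(f32(v * v) * v)


def sum_cube_f32_sequential(values):
    """Scalar-style loop with a single float32 accumulator."""
    acc = 0.0
    for v in values:
        acc = f32(acc + cube_f32(v))
    return acc


def sum_cube_f32_lanes(values, lanes):
    """SIMD-style loop: one float32 accumulator per lane, reduced at the end."""
    accs = [0.0] * lanes
    for i, v in enumerate(values):
        accs[i % lanes] = f32(accs[i % lanes] + cube_f32(v))
    total = 0.0
    for a in accs:
        total = f32(total + a)
    return total


def sum_cube_f64_lanes(values, lanes):
    """Same lane split, but accumulating the float32 cubes in double."""
    accs = [0.0] * lanes
    for i, v in enumerate(values):
        accs[i % lanes] += cube_f32(v)
    return sum(accs)


values = [256.0] + [1.0] * 7  # 256**3 == 2**24, the edge of float32 integers
```

On these inputs the sequential float32 sum is 16777216.0, the two-lane float32 sum is 16777220.0, and the double accumulator gives 16777223.0 for any lane count, which illustrates why accumulating in double makes the scalar and SIMD paths agree.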

0002 — CUDA ADM decouple-inline buffer elimination

  • Workstream PRs: commit 787e3382.
  • Touches: core/src/feature/cuda/integer_adm_cuda.cu, core/src/feature/cuda/adm_decouple_inline.cuh (new), core/src/feature/cuda/meson.build. Upstream's adm_decouple.cu is no longer compiled in the fork.
  • Invariant: CSF and CM CUDA kernels read ref / dis DWT2 buffers directly and compute decouple_r / decouple_a inline via __device__ helpers in adm_decouple_inline.cuh. The 6 intermediate buffers (decouple_r, decouple_a, csf_a × {scale-0 int16, scales 1-3 int32}) and the standalone adm_decouple.cu source are intentionally removed. ~107 MB GPU memory savings at 4K. An upstream change to adm_decouple.cu will look orphaned and a literal merge would re-introduce the buffer allocations.
  • Re-test: meson setup build -Denable_cuda=true && ninja -C build && meson test -C build --suite=cuda.

0003 — SYCL backend (USM pool / D3D11 import / vmaf_sycl_* API)

  • Workstream PRs: #33, #35, #5 (initial scaffolding), and the picture-pool deadlock fix that landed via #32.
  • Touches: core/include/libvmaf/libvmaf_sycl.h, core/src/sycl/, core/src/feature/sycl/, core/src/libvmaf.c (SYCL public-API entry points), meson_options.txt (enable_sycl).
  • Invariant: vmaf_sycl_preallocate_pictures constructs a real VmafSyclPicturePool honoring VmafSyclPicturePreallocationMethod (NONE / DEVICE / HOST); vmaf_sycl_picture_fetch dispatches to the pool when configured. The whole SYCL tree is fork-local and has no upstream counterpart — upstream changes to core/src/libvmaf.c near the SYCL entry-point block are likely to conflict. Picture-pool error paths in vmaf_read_pictures (libvmaf.c) must goto cleanup; rather than return err; to avoid leaking ref/dist pictures into the live-picture set (closes the always-on-pool deadlock fixed in #32 — see ADR-0104). See ADR-0101, ADR-0103, ADR-0104.
  • Re-test: meson setup build -Denable_sycl=true && ninja -C build && meson test -C build --suite=sycl (requires oneAPI / icpx).

0004 — DNN runtime + tiny-AI surfaces

  • Workstream PRs: #5, #8, #21, #22, #23, #31, #34, plus the pre-numbered DNN feat commits (9b985946, 1e5336d3, d122b721).
  • Touches: core/include/libvmaf/dnn.h, core/src/dnn/, core/src/feature/feature_lpips.c, model/tiny/, meson_options.txt (enable_onnxruntime).
  • Invariant: ordered EP selection (CUDA → DML → CPU) with graceful fallback (ADR-0102); fp16_io does host-side fp32↔fp16 cast on the scoring path; VMAF_TINY_MODEL_DIR enforces a path jail on model load (PR #31); the runtime op-allowlist (PR #21) walks the ONNX graph and rejects unknown ops + bounds Loop/If trip_count at 1024 (ADR-0036/0107). DNN tree is fork-local; upstream has no DNN code yet, so conflicts here are unlikely but the meson_options.txt and core/src/meson.build blocks near the DNN flag may collide.
  • Re-test: meson setup build -Denable_onnxruntime=true && ninja -C build && meson test -C build --suite=dnn.
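
The ordered fallback can be pictured as a filter over a fixed preference list. This is a minimal Python sketch of the idea only; the fork's runtime lives in C under core/src/dnn/ and is not shown here. The provider strings are the names ONNX Runtime's Python API uses, and `select_providers` is a hypothetical helper introduced for illustration.

```python
# Preference order from the invariant above: CUDA, then DirectML, then CPU.
PREFERRED_PROVIDERS = (
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
)


def select_providers(available):
    """Return the preferred providers that are available, in priority order.

    CPU is always kept as the last resort so that a session can still be
    created when no accelerator is present.
    """
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


print(select_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
print(select_providers(["CPUExecutionProvider"]))
```

With the Python bindings installed, the result could be passed as the `providers` argument of `onnxruntime.InferenceSession`, using `onnxruntime.get_available_providers()` as the input list.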

0005 — --precision CLI flag (IEEE-754 round-trip lossless)

  • Workstream PRs: commit c989fbd9.
  • Touches: core/tools/vmaf.c, core/tools/cli_parse.c, core/include/libvmaf/libvmaf.h (added vmaf_write_output_with_format), core/src/output.c.
  • Invariant: default --precision is %.17g (round-trip lossless); legacy opts back into upstream's %.6f; the public C API gained vmaf_write_output_with_format and the old vmaf_write_output routes through it with the %.17g default. ABI-breaking only if upstream adds a same-named function with a different signature. See ADR-0006.
  • Re-test: vmaf -r ref.yuv -d dis.yuv ... --precision=full and diff against --precision=legacy.
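
The difference between the two format strings is easy to check in isolation. The short Python sketch below uses the same printf-style conversions on a sample double; the sample value is a VMAF mean of the kind quoted in these notes and is otherwise arbitrary.

```python
score = 76.66890519623612  # sample double; any finite value would do

full = "%.17g" % score    # 17 significant digits
legacy = "%.6f" % score   # 6 fractional digits

# Parsing the 17-significant-digit text gives back the identical double.
print(full, float(full) == score)
# The 6-decimal text is shorter but no longer identifies the original value.
print(legacy, float(legacy) == score)
```

Seventeen significant digits are enough to round-trip any IEEE-754 double, which is what "lossless" means here; `%.6f` trades that guarantee for byte-compatibility with upstream output.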

0006 — Netflix golden tests preserved verbatim as required gate

  • Workstream PRs: across the fork's life; codified in ADR-0024.
  • Touches: python/test/quality_runner_test.py, python/test/vmafexec_test.py, python/test/vmafexec_feature_extractor_test.py, python/test/feature_extractor_test.py, python/test/result_test.py, python/test/resource/yuv/.
  • Invariant: assertAlmostEqual(...) golden values in the five upstream Python test files are never modified by this fork. Fork-added tests live in separate files (e.g. python/test/test_precision_flag.py). The CI gate "Netflix CPU golden tests (D24)" is required and blocks merge. Upstream changes to these files are accepted unless they relax the assertions.
  • Re-test: make test-netflix-golden.

0007 — Build system (CUDA 13.2, oneAPI 2025.3, MkDocs migration)

  • Workstream PRs: #7, #17, commit 8a995cb0.
  • Touches: meson.build, meson_options.txt, top-level Makefile, docs/ (Sphinx → MkDocs Material migration — docs/conf.py removed, mkdocs.yml added), docs/requirements.txt, Dockerfile.*, distro install scripts under scripts/.
  • Invariant: image pins are non-conservative (ADR-0027) — CUDA 13.2, oneAPI 2025.3, clang-format 22, black 26 — and ship experimental toolchain flags (--expt-relaxed-constexpr, etc.) deliberately. An upstream sync that pulls in a Dockerfile change targeted at older CUDA or older oneAPI must not relax the pins.
  • Re-test: meson setup build -Denable_cuda=true -Denable_sycl=true && ninja -C build && mkdocs build --strict.

0008 — Workspace / docs / MATLAB / resource-tree relocations

  • Workstream PRs: codified across ADR-0026, ADR-0029, ADR-0030, ADR-0031, ADR-0032, ADR-0033, ADR-0034, ADR-0038.
  • Touches: any path-walk in upstream's CI / scripts / docs that assumes the upstream layout (root-level workspace/, resource/, matlab/, root unittest script, root patches/).
  • Invariant: the fork's layout is python/vmaf/workspace/, python/vmaf/resource/, python/vmaf/matlab/, scripts/unittest, ffmpeg-patches/ only, .github/codeql-config.yml. Upstream moves to a different sub-tree (e.g. a hypothetical tools/workspace/) need to either be applied via a corresponding fork-side relocation or rejected with a rebase note.
  • Re-test: python -m pytest python/test/ -k golden (verifies the resource-tree path works); make test-netflix-golden.

0009 — License headers (Lusoris/Claude on wholly-new files; 2016–2026 on Netflix files)

  • Workstream PRs: commits c159761d, a185f8ef, 0e98c949, codified in ADR-0025 / ADR-0105.
  • Touches: every wholly-new fork file (notably the SYCL tree and core/src/dnn/) and every Netflix-touched file (year range 2016 → 2016–2026).
  • Invariant: wholly-new fork files carry Copyright 2026 Lusoris and Claude (Anthropic) under the same BSD-3-Clause-Plus-Patent license; mixed files use a dual-copyright notice. An upstream commit that resets a Netflix file's year range (e.g. back to 2016–2020) must be partially rejected — keep the fork's 2016–2026.
  • Re-test: check that wholly-new fork files retain the Lusoris/Claude header (grep -L "Copyright 2026 Lusoris" core/src/sycl/*.cpp should print nothing, because -L lists only the files that lack a match).
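
The same check can be scripted when a grep one-liner is inconvenient, for example inside a Python-based CI helper. The sketch below is an assumption-laden illustration: `files_missing_header` is a hypothetical helper, and only the needle string comes from the entry above.

```python
from pathlib import Path

HEADER = "Copyright 2026 Lusoris"


def files_missing_header(paths, needle=HEADER):
    """Return the paths whose text does not contain `needle` (like grep -L)."""
    missing = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        if needle not in text:
            missing.append(str(path))
    return missing
```

A call such as `files_missing_header(Path("core/src/sycl").glob("*.cpp"))` would be expected to return an empty list on a healthy tree.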

0010 — .claude/ agent scaffolding + ADR tree + AGENTS.md / CLAUDE.md

  • Workstream PRs: #14, #24, #37, plus continuous additions.
  • Touches: .claude/, AGENTS.md, CLAUDE.md, docs/adr/, .github/PULL_REQUEST_TEMPLATE.md.
  • Invariant: this whole tree is fork-local and has no upstream counterpart. Upstream additions to .github/ (issue templates, workflows) need to merge cleanly with the fork's existing files rather than replacing them. The ADR tree's IDs ≤ 0099 are backfills; new decisions start at 0100 (ADR-0028 / ADR-0106).
  • Re-test: visual review of .github/ and docs/adr/README.md after the merge.

Pre-ADR-0108 entries above are the result of a one-shot backfill sweep on 2026-04-18; subsequent fork-local PRs add their own entries inline.

0011 — Nightly bisect-model-quality + fixture cache

  • Workstream PRs: closes #4; sticky tracker issue #40.
  • Touches: .github/workflows/nightly-bisect.yml, ai/scripts/build_bisect_cache.py, ai/testdata/bisect/{features.parquet, models/*.onnx, README.md}, scripts/ci/post-bisect-comment.py, docs/ai/bisect-model-quality.md, docs/adr/0109-nightly-bisect-model-quality.md, docs/research/0001-bisect-model-quality-cache.md, mkdocs.yml (nav).
  • Invariant: the committed parquet + ONNX bytes under ai/testdata/bisect/ must regenerate byte-identically from ai/scripts/build_bisect_cache.py with seeds FEATURE_SEED=20260418 and MODEL_SEED=20260419. The CI --check step asserts this before every bisect run, so any upstream pull that bumps pandas / pyarrow / onnx enough to change the serialiser bytes will fail the workflow until the cache is regenerated and committed.
  • Re-test:
python ai/scripts/build_bisect_cache.py --check
vmaf-train bisect-model-quality \
    ai/testdata/bisect/models/model_*.onnx \
    --features ai/testdata/bisect/features.parquet \
    --min-plcc 0.85 --input-name input
# Expected: "no regression in this range"; first_bad_index None.

Pure upstream code is not touched, so no Netflix-side conflict vector. Only fork-local files; risk is toolchain drift, not merge conflict.
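
The byte-identical invariant amounts to "regenerate from the seed, then compare digests". The sketch below shows that idea with a stand-in builder; it does not reproduce build_bisect_cache.py, and `build_blob` / `check` are hypothetical names. Only the seed value is taken from the entry above.

```python
import hashlib
import random

FEATURE_SEED = 20260418  # seed quoted in the invariant above


def build_blob(seed: int, size: int = 256) -> bytes:
    """Stand-in for a cache builder: the output depends only on the seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def check(committed: bytes, seed: int) -> bool:
    """Regenerate from the seed and compare SHA-256 digests."""
    regenerated = build_blob(seed)
    return hashlib.sha256(regenerated).digest() == hashlib.sha256(committed).digest()


committed = build_blob(FEATURE_SEED)
print(check(committed, FEATURE_SEED))      # same seed, same bytes
print(check(committed, FEATURE_SEED + 1))  # different seed, different bytes
```

In the real workflow the bytes also depend on the serialiser versions, which is why a pandas / pyarrow / onnx bump can break the check even though the seeds are unchanged.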

0012 — Upstream ADM port (Netflix 966be8d5)

  • Workstream PRs: this PR; ports a single upstream commit.
  • Touches: core/src/feature/integer_adm.{c,h}, core/src/feature/x86/adm_avx2.{c,h}, core/src/feature/x86/adm_avx512.{c,h}, core/src/feature/alias.c, core/src/feature/barten_csf_tools.h (new upstream file).
  • Invariant: the eight ADM files now mirror upstream's content byte-for-byte (modulo our clang-format-22 pass and the Netflix copyright-year bump on the new header). Future /sync-upstream runs can take new upstream ADM commits cleanly. Do not revert to a pre-966be8d5 ADM kernel without also reverting the call-site signatures in integer_compute_adm — upstream extended i4_adm_cm from 8 to 13 args.
  • Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --model version=vmaf_v0.6.1 -o /tmp/vmaf-port.json
grep '<metric name="vmaf"' /tmp/vmaf-port.json
# Expected: mean ≈ 76.66890 (golden 76.66890519623612, places=4 OK).

0013 — Upstream motion port (Netflix PR #1486 head 2aab9ef1)

  • Workstream PRs: this PR; ports upstream PR #1486 (4 commits on top of 966be8d5 ADM base, head 2aab9ef1). Sister to entry 0012.
  • Touches: core/src/feature/integer_motion.{c,h}, core/src/feature/motion_blend_tools.h (new upstream file), core/src/feature/x86/motion_avx2.c, core/src/feature/x86/motion_avx512.c, core/src/feature/alias.c (additive: integer_motion3 row), python/test/{quality_runner,vmafexec,feature_extractor,vmafexec_feature_extractor}_test.py (golden tolerance updates: places=4 → places=2 on motion-affected asserts; expected values unchanged).
  • Invariant: motion files mirror upstream byte-for-byte (modulo our clang-format-22 pass). The alias.c row for integer_motion3 was inserted surgically to avoid clobbering the AVX-512 ADM registration added by entry 0012; the new motion3 metric appears in default VMAF model output but is not standalone-loadable via --feature integer_motion3 (sub-feature only). Netflix golden VMAF mean shifts 76.668904824 → 76.667830213 (well within the places=2 tolerance that the upstream PR loosened to). Do not restore places=4 on motion-touching assertions without also reverting the motion code.
  • Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --model version=vmaf_v0.6.1 -o /tmp/vmaf-motion-port.json
grep -E '<metric name="vmaf"|integer_motion3' /tmp/vmaf-motion-port.json
# Expected: vmaf mean ≈ 76.66783; integer_motion3 mean ≈ 3.98976.
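
Why the shifted mean needs the looser tolerance follows from how `assertAlmostEqual` works: it rounds the absolute difference to `places` decimals and requires the result to be zero. The Python sketch below applies that rule to the two VMAF means quoted in the invariant; the `passes` helper is only for illustration.

```python
import unittest

case = unittest.TestCase()
before, after = 76.668904824, 76.667830213  # VMAF means quoted above


def passes(places: int) -> bool:
    """Report whether assertAlmostEqual accepts the pair at this tolerance."""
    try:
        case.assertAlmostEqual(before, after, places=places)
        return True
    except AssertionError:
        return False


print(passes(4))  # the difference rounds to 0.0011 at 4 places, so this fails
print(passes(2))  # the difference rounds to 0.0 at 2 places, so this passes
```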

0014 — Coverage gate overhaul + upstream python/test/ reformat

  • Workstream PRs: this PR (coverage-gate overhaul + in-tree reformat of upstream-mirror Python tests).
  • Touches: .github/workflows/ci.yml (CPU + GPU coverage jobs: -Dc_args=-fprofile-update=atomic / -Dcpp_args=-fprofile-update=atomic, meson test --num-processes 1, -Denable_dnn=enabled, ORT install step on the CPU coverage job, lcov/geninfo replaced by gcovr with --json-summary / --xml / --txt output, artifact rename coverage-lcov-{cpu,gpu} → coverage-{cpu,gpu}), scripts/ci/coverage-check.sh (rewritten to parse gcovr JSON via python3 -c — same CLI signature), core/src/dnn/dnn_api.c + new core/src/dnn/dnn_attach_api.c (vmaf_use_tiny_model carved out into its own TU so the unit-test binaries — which pull in dnn_sources for feature_lpips.c but never link libvmaf.c — don't end up with an undefined reference to vmaf_ctx_dnn_attach once enable_dnn=enabled activates the real bodies), core/src/dnn/meson.build + core/src/meson.build (new dnn_libvmaf_only_sources list wired into libvmaf.so only), python/test/{feature_extractor,quality_runner,vmafexec,vmafexec_feature_extractor}_test.py (mechanical Black + isort reformat — no assertion values changed, imports regrouped, line wrapping normalised).
  • Invariant: coverage CI must keep all five pieces in lockstep — (a) -fprofile-update=atomic closes the intra-process counter race on SIMD inner loops (vif_avx2.c:673, motion_avx2, etc.), which otherwise produces negative counts and makes geninfo/gcovr abort; (b) --num-processes 1 closes the inter-process race where multiple parallel test binaries merge their counters into the same .gcda files for the shared libvmaf.so at process exit (per-thread atomicity does not cover this); (c) gcovr deduplicates .gcno files belonging to the same source compiled into multiple targets — without dedup, lcov sums hits across compilation units and yields impossible >100% values (dnn_api.c — 1176% was the smoking gun on the first attempt that had only (a)+(b)); (d) ORT install + enable_dnn=enabled in the coverage job is what makes core/src/dnn/*.c measurable in the first place — without ORT, the DNN tree compiles in stub branches and the 85% per-critical-file gate is meaningless; (e) vmaf_use_tiny_model lives in dnn_attach_api.c and is added to libvmaf.so only via dnn_libvmaf_only_sources — moving it back into dnn_api.c reintroduces the vmaf_ctx_dnn_attach undefined-reference link error in test_feature_extractor / test_lpips whenever enable_dnn=enabled, since those test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c. Lint scope: upstream-mirror Python tests are linted at the same standard as fork-added code; we accept that /sync-upstream and /port-upstream-commit will re-trigger Black/isort failures whenever upstream rewrites these files, and the fix is another in-tree reformat pass — never an exclusion. The fork's pyproject.toml and .pre-commit-config.yaml keep python/test/resource/ (binary fixtures only) excluded; python/test/*.py is in scope. See ADR-0110 (race fixes, superseded) and ADR-0111 (gcovr + ORT layer).

  • Re-test:
# Reproduce coverage path locally (requires gcc + python3-pip):
pip install --user 'gcovr>=8.0'
cd core
meson setup build-cov-test --buildtype=debug -Db_coverage=true \
    -Denable_avx512=true -Denable_float=true -Denable_dnn=disabled \
    -Dc_args=-fprofile-update=atomic -Dcpp_args=-fprofile-update=atomic
ninja -C build-cov-test
meson test -C build-cov-test --print-errorlogs --num-processes 1
~/.local/bin/gcovr --root .. \
    --filter 'src/.*' \
    --exclude '.*/test/.*' --exclude '.*/tests/.*' \
    --exclude '.*/subprojects/.*' \
    --gcov-ignore-parse-errors=negative_hits.warn \
    --gcov-ignore-parse-errors=suspicious_hits.warn \
    --print-summary --txt build-cov-test/coverage.txt \
    --json-summary build-cov-test/coverage.json \
    build-cov-test
grep -E 'dnn_api|model_loader' build-cov-test/coverage.txt
# Expected: gcovr completes without "Unexpected negative count" AND no
# per-file percentages exceed 100% (drop --num-processes 1 to reproduce
# the multi-process .gcda merge race; switch back to lcov to reproduce
# the dnn_api.c — 1176% over-count from compilation-unit summation).

# Lint smoke test for upstream-mirror tree:
pre-commit run --files python/test/quality_runner_test.py
# Expected: Black/isort/Ruff all PASS — files are reformatted in-tree
# to fork style and stay clean until the next upstream sync.
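
The two failure modes the gate guards against (an impossible percentage and a critical file under the 85% threshold) can both be detected from a JSON summary. The sketch below is not scripts/ci/coverage-check.sh; it is a hedged illustration that assumes the summary exposes a `files` list with `filename` and `line_percent` keys, which is my understanding of gcovr's --json-summary output. Apart from the 1176% figure, the sample numbers and the tensor_io.c file name are invented.

```python
import json

GATE_PERCENT = 85.0  # per-critical-file threshold mentioned above


def check_summary(summary: dict, critical: list) -> list:
    """Return human-readable problems found in a gcovr-style JSON summary."""
    problems = []
    for entry in summary["files"]:
        name, pct = entry["filename"], entry["line_percent"]
        if pct > 100.0:
            problems.append(f"{name}: impossible {pct}% (over-count)")
        elif any(name.endswith(c) for c in critical) and pct < GATE_PERCENT:
            problems.append(f"{name}: {pct}% is below the {GATE_PERCENT}% gate")
    return problems


# Made-up sample for illustration only (tensor_io.c is a hypothetical file).
sample = json.loads("""
{"files": [
  {"filename": "src/dnn/dnn_api.c", "line_percent": 1176.0},
  {"filename": "src/dnn/model_loader.c", "line_percent": 91.2},
  {"filename": "src/dnn/tensor_io.c", "line_percent": 60.0}
]}
""")

for problem in check_summary(sample, ["dnn_api.c", "model_loader.c", "tensor_io.c"]):
    print(problem)
```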

0015 — Tox doctest collection skips vmaf/resource/

  • Workstream PRs: this PR (fix(ci): skip pytest doctest collection of vmaf/resource/ data files). Surfaced once ADR-0115 consolidated CI triggers to master and tox actually started running on PRs.
  • Touches: python/tox.ini (single-line --ignore=vmaf/resource added to the pytest invocation, plus an explanatory comment block). Pure fork-local; no upstream Python file changes.
  • Invariant: pytest --doctest-modules must not attempt to import files under python/vmaf/resource/. Those are parameter / dataset / example-config .py files; several have dots in their stems (e.g. vmaf_v7.2_bootstrap.py) that make them unimportable as Python modules. None carry doctests, so the ignore is correctness rather than a workaround. Do not drop the --ignore=vmaf/resource flag without first verifying every file under that directory has been renamed to a dot-free stem and is importable.
  • Re-test:
cd python && tox -e py311 -- --collect-only --doctest-modules \
    --ignore=vmaf/resource 2>&1 | grep -c "ERROR collecting vmaf/resource"
# Expected: 0 (was 5 before the fix).

Pure upstream code is not touched, so no Netflix-side conflict vector. Risk is upstream renaming or removing files under python/vmaf/resource/ such that the directory disappears, in which case the --ignore becomes a harmless no-op.
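
The import failure behind this entry is reproducible with a throwaway file. The Python sketch below creates a file with a dotted stem in a temporary directory and tries to import it by name; the file content is a placeholder, and the file name is the example given above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A data file with a dot in its stem, as in the example above.
    Path(tmp, "vmaf_v7.2_bootstrap.py").write_text("model_type = 'demo'\n")
    sys.path.insert(0, tmp)
    try:
        importlib.import_module("vmaf_v7.2_bootstrap")
        outcome = "imported"
    except ImportError as exc:
        # The dot is parsed as a package separator, so Python looks for a
        # package "vmaf_v7" with a submodule "2_bootstrap" and finds neither.
        outcome = f"failed: {exc}"
    finally:
        sys.path.remove(tmp)

print(outcome)
```

Because the module name can never resolve, doctest collection of such files fails at import time regardless of their content, which is why ignoring the directory is the appropriate fix.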

0016 — SYCL link arg -fsycl gated on icpx CXX

  • Workstream PRs: this PR (fix(libvmaf): gate -fsycl link arg on icpx CXX, allow gcc/clang host linker). Surfaced once ADR-0115's CI consolidation added an Ubuntu SYCL job to PR-time CI that uses CXX=g++ (host linker) with sidecar icpx for SYCL .cpp compilation.
  • Touches: core/src/meson.build (the vmaf_link_args block immediately after the is_sycl_enabled flag handling — currently ~lines 696-712). Pure fork-local; no upstream Meson file changes expected.
  • Invariant: -fsycl is appended to vmaf_link_args only when meson.get_compiler('cpp').get_id() == 'intel-llvm' (icpx). Rationale: the documented project mode (see comment near is_sycl_enabled block at top of src/meson.build) compiles SYCL .cpp files via custom_target with icpx, while the project's CXX driver may be gcc / clang / msvc; in that mode the SPIR-V device code is already embedded in the icpx-compiled .o files at compile time, and the runtime libraries (libsycl + libsvml + libirc + libze_loader) declared as link dependencies resolve every symbol. Passing -fsycl to a non-icpx linker is a hard error (g++: error: unrecognized command-line option '-fsycl'). Do not remove the cpp.get_id() == 'intel-llvm' guard without first verifying every CI matrix leg uses icpx as the project CXX.
  • Re-test:
meson setup build -Denable_sycl=true \
    -Dcpp_link_args=-Wl,--no-undefined
ninja -C build src/libvmaf.so.3
# Expected: link succeeds; no `-fsycl` errors with gcc/clang host CXX.

Pure fork-local guard; no Netflix-side conflict vector.

0017 — CLI precision default %.6f (Netflix-compat) + frame-skip unref

  • Workstream PRs: this PR (fix(cli): revert precision default to %.6f and unref skipped frames). Reverts the default flipped by commit c989fbd9 (ADR-0006) per ADR-0119. Companion fix in core/tools/vmaf.c resolves the picture-pool exhaustion in the --frame_skip_ref/dist loops surfaced once the always-on picture pool (ADR-0104) made unref'ing skipped pictures mandatory.
  • Touches:
  • core/tools/cli_parse.c (VMAF_DEFAULT_PRECISION_FMT + VMAF_LOSSLESS_PRECISION_FMT macros, resolve_precision_fmt() body, --help text)
  • core/tools/cli_parse.h (field comments only; struct shape unchanged)
  • core/src/output.c (DEFAULT_SCORE_FORMAT macro)
  • core/tools/vmaf.c (skip loop bodies at the c.frame_skip_ref / c.frame_skip_dist for-loops)
  • python/vmaf/core/result.py (per-frame and aggregate :.6f formatters)
  • python/test/command_line_test.py is unmodified — Netflix golden assertions stay frozen per CLAUDE.md §8; the binary's output format adapts to them, not the other way around.
  • Invariant: vmaf CLI default score-output format is %.6f (matches upstream Netflix byte-for-byte). --precision=max|full selects %.17g (IEEE-754 round-trip lossless). --precision=legacy is a synonym for the default. The library default for vmaf_write_output_with_format(..., score_format=NULL) matches. Skipped frames in the --frame_skip_ref / --frame_skip_dist pre-loops are vmaf_picture_unref'd immediately after fetch so the preallocated picture pool is not exhausted before the main scoring loop runs. Do not flip the macros back to %.17g or remove the unrefs without a superseding ADR — both are golden-gate-load-bearing.
  • Re-test:
ninja -C core/build
python -m pytest -v \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping \
    python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping_unequal
# Expected: all three PASS in <1 s combined.

Pure fork-local; no Netflix-side conflict vector. If upstream ever changes the default format string, treat their value as the new baseline and reconfirm the golden assertions before adopting.
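
The pool-exhaustion half of this entry can be modelled with a toy fixed-size pool. The sketch below is a Python illustration under simplifying assumptions: `PicturePool` and `skip_frames` are hypothetical names, and the toy pool raises where the real preallocated pool would instead stall the caller.

```python
class PicturePool:
    """Toy fixed-size pool; fetch() fails when every slot is still referenced."""

    def __init__(self, capacity: int):
        self.free = capacity

    def fetch(self) -> None:
        if self.free == 0:
            raise RuntimeError("pool exhausted")
        self.free -= 1

    def unref(self) -> None:
        self.free += 1


def skip_frames(pool: PicturePool, count: int, unref: bool) -> bool:
    """Fetch and discard `count` frames; return False if the pool runs dry."""
    try:
        for _ in range(count):
            pool.fetch()
            if unref:
                pool.unref()  # hand the slot back right after the fetch
        return True
    except RuntimeError:
        return False


print(skip_frames(PicturePool(2), 5, unref=False))  # pool runs dry
print(skip_frames(PicturePool(2), 5, unref=True))   # every slot is recycled
```

Skipping more frames than the pool has slots only works when each skipped picture is released immediately, which is the behaviour the invariant protects.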

0018 — FFmpeg patches ship as ordered series.txt

  • Workstream PRs: this PR (fix(ci): drop dead sycl trigger + consolidate windows.yml into libvmaf.yml (ADR-0115)). Surfaced once ADR-0115's consolidation routed the docker / FFmpeg-SYCL jobs through the master-targeting CI gate for the first time on this branch — the standalone 0003-…sycl… apply broke because it referenced struct fields added by 0001-…tiny-model…, the Dockerfile only COPY'd 0003, and ffmpeg.yml referenced a stale ../patches/ path.
  • Touches: Dockerfile (lines ~86-95 — the FFmpeg patch-apply block), .github/workflows/ffmpeg.yml (the Build FFmpeg with SYCL patch series step), ffmpeg-patches/000{1,2,3}-*.patch (regenerated via real git format-patch -3 so they carry valid index <sha>..<sha> <mode> lines and committable SHAs). Pure fork-local; no upstream FFmpeg or Netflix file changes.
  • Invariant: both the Dockerfile and ffmpeg.yml walk ffmpeg-patches/series.txt line-by-line and apply each patch via git apply with a patch -p1 fallback. Do not ship a new patch without appending it to series.txt, and do not reorder existing entries — patch 0003 references LIBVMAFContext fields added by patch 0001, so any out-of-order apply breaks the build at hunk 2 of vf_libvmaf.c.
  • Four fixes bundled in the same PR (three configure / nvcc flag fixes and one patch fix):
  • --enable-libvmaf-sycl is not a valid FFmpeg configure option. Patch 0003 uses check_pkg_config libvmaf_sycl … auto-detection (matching how libvmaf_cuda is wired) — it never registers the switch. Both Dockerfile and ffmpeg.yml used to pass the flag and configure rejected it with Unknown option "--enable-libvmaf-sycl". SYCL support is now controlled solely by -Denable_sycl=true at libvmaf build time; FFmpeg picks it up automatically when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
  • The Dockerfile now carries two nvcc-flag ARGs. NVCC_FLAGS (libvmaf) keeps four -gencode lines plus the experimental --extended-lambda / --expt-relaxed-constexpr / --expt-extended-lambda flags needed for Thrust/CUB host+device code. FFMPEG_NVCC_FLAGS (FFmpeg) carries a single -gencode arch=compute_75,code=sm_75 -O2 — FFmpeg's check_nvcc runs nvcc -ptx, which fails with nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures on multi-arch input, and --extended-lambda requires host+device compilation. compute_75 PTX is forward-compatible with all newer GPUs via driver JIT.
  • --enable-libnpp is no longer passed to FFmpeg's configure. FFmpeg n8.1's libnpp probe carries an explicit die "ERROR: libnpp support is deprecated, version 13.0 and up are not supported" (configure:7335-7336) that fires on the base image's CUDA 13.2 libnpp. We don't use scale_npp / transpose_npp / sharpen_npp in any VMAF workflow; cuvid + nvdec + nvenc + libvmaf-cuda is the actual GPU path. Revisit once we move to an FFmpeg release that supports CUDA 13 libnpp upstream.
  • Patch 0002 (add-vmaf_pre-filter) gained a missing #include "libavutil/imgutils.h" for av_image_copy_plane(). FFmpeg's libavfilter Makefile builds with -Werror=implicit-function-declaration so this fired during the actual compile (not configure). Caught by a local docker build rather than waiting for GitHub Actions — much faster iteration loop.
  • Re-test:
cd /tmp && rm -rf ffmpeg-test && \
    git clone -q --depth 1 -b n8.1 \
        https://git.ffmpeg.org/ffmpeg.git ffmpeg-test && \
    cd ffmpeg-test && \
    while IFS= read -r line; do \
        case "$line" in ''|\#*) continue ;; esac; \
        git apply "/path/to/vmaf/ffmpeg-patches/$line" \
            || patch -p1 < "/path/to/vmaf/ffmpeg-patches/$line"; \
    done < /path/to/vmaf/ffmpeg-patches/series.txt
# Expected: all three patches apply with no rejects; the resulting
# tree compiles with --enable-libvmaf. SYCL is auto-detected via
# check_pkg_config (patch 0003), so no explicit configure flag is
# required when libvmaf-sycl.pc is on PKG_CONFIG_PATH.

Pure fork-local series; no Netflix-side conflict vector. See ADR-0118.
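
The ordering rule is enforced by how series.txt is read: top to bottom, skipping blank lines and lines that start with `#`, exactly as the shell loop above does. The sketch below restates that parsing in Python; `parse_series` is a hypothetical helper and the patch names in the sample are placeholders, since the real file names are elided in this entry.

```python
def parse_series(text: str) -> list:
    """Return patch names in file order, skipping blank and '#' lines."""
    patches = []
    for line in text.splitlines():
        if line == "" or line.startswith("#"):
            continue
        patches.append(line)
    return patches


# Hypothetical series.txt contents with placeholder patch names.
example = """# applied in order
0001-example-a.patch

0002-example-b.patch
0003-example-c.patch
"""
print(parse_series(example))
```

Because the list preserves file order, appending a new patch at the end is safe, while reordering existing lines changes the apply order and can break later patches that depend on earlier ones.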

0019 — Coverage Gate annotations: upload-artifact v7 + gcovr filter

  • Workstream PRs: this PR.
  • Touches: .github/workflows/ci.yml (CPU + GPU coverage steps: gcovr stderr piped through grep -vE 'Ignoring (suspicious|negative) hits' ... || true), .github/workflows/{ci,lint,nightly,nightly-bisect,supply-chain,libvmaf}.yml (actions/upload-artifact@v5|@v6 → @v7, actions/download-artifact@v5 → @v7 in supply-chain.yml). Note: windows.yml was consolidated into libvmaf.yml by ADR-0115 / PR #50, so the windows-side bump now lives in libvmaf.yml's build (MINGW64, …) job.
  • Invariant: the Coverage Gate Annotations panel must finish empty on a clean run. The two pieces are coordinated — (a) @v7 for upload / download artifact actions silences GitHub's Node-20 deprecation banner ahead of the 2026-06-02 forced-Node-24 cutoff; (b) the gcovr stderr filter swallows the Ignoring (suspicious|negative) hits warnings that gcovr 8 emits for the legitimately-large hit counts in tight ANSNR / VIF / motion inner loops (e.g. ansnr_tools.c:207 at ~4.93 G hits across an HD multi-frame coverage suite — real, not a gcov bug). The filter is regex-narrow and anchored to gcov's exact warning prefix; any other gcovr warning still surfaces. Upstream (Netflix/vmaf) does not maintain these CI files; rebase impact is limited to the unlikely case that an upstream sync touches the shared .github/workflows/ tree, which it currently does not. See ADR-0117.
  • Re-test:
# Verify gcovr filter locally (after a coverage build per entry 0014):
~/.local/bin/gcovr --root .. \
    --filter 'src/.*' \
    --exclude '.*/test/.*' --exclude '.*/tests/.*' \
    --exclude '.*/subprojects/.*' \
    --gcov-ignore-parse-errors=negative_hits.warn \
    --gcov-ignore-parse-errors=suspicious_hits.warn \
    --print-summary --txt build-cov-test/coverage.txt \
    build-cov-test \
  2> >(grep -vE 'Ignoring (suspicious|negative) hits' >&2 || true)
# Expected: stderr contains the gcovr summary block but NO
# "Ignoring (suspicious|negative) hits" lines. coverage.txt unchanged.

# Verify all upload/download-artifact instances are on @v7:
grep -rE 'actions/(upload|download)-artifact@v[0-6]' .github/workflows/
# Expected: empty output.
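
Both checks in this entry are plain regular-expression filters, so their behaviour can be verified without running gcovr. The Python sketch below reuses the two patterns from the commands above; the sample log lines are invented for illustration, and real gcovr wording may differ.

```python
import re

NOISE = re.compile(r"Ignoring (suspicious|negative) hits")
OLD_ARTIFACT = re.compile(r"actions/(upload|download)-artifact@v[0-6]")


def filter_stderr(lines):
    """Drop only the known-noisy lines; everything else passes through."""
    return [line for line in lines if not NOISE.search(line)]


# Invented log lines; only the matched phrase comes from the entry above.
stderr = [
    "(WARNING) Ignoring suspicious hits in some_file.c",
    "(WARNING) some other warning that must stay visible",
]
print(filter_stderr(stderr))
print(bool(OLD_ARTIFACT.search("uses: actions/upload-artifact@v6")))
print(bool(OLD_ARTIFACT.search("uses: actions/upload-artifact@v7")))
```

Keeping the pattern this narrow is the point of the invariant: unrelated warnings still reach the Annotations panel.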

0020 — CI workflow file + display-name renames (Title Case sweep)

  • Workstream PRs: this PR; renames all six core .github/workflows/*.yml files to purpose-descriptive kebab-case and normalises every workflow name: and job name: to Title Case. See ADR-0116.
  • Touches: .github/workflows/{ci,lint,security,libvmaf,ffmpeg,docker}.yml (renamed via git mv to tests-and-quality-gates.yml, lint-and-format.yml, security-scans.yml, libvmaf-build-matrix.yml, ffmpeg-integration.yml, docker-image.yml), README.md (5 badge URLs + labels), docs/principles.md (line 5 workflow-tuple update), .claude/skills/add-gpu-backend/SKILL.md + scaffold.sh (filename refs), docs/adr/0116-*.md (new), docs/adr/README.md (index row), CHANGELOG.md.
  • Invariant: workflow files are purpose-named; their name: fields are Title Case sentences with em-dash axis tags; job-level name: strings are Title Case sentences (Build — / Pre-Commit / Coverage Gate / etc.). Required-status-check contexts in master branch protection are bound to job-level names — when renaming any job, re-pin via gh api --method PUT repos/VMAFx/vmafx/branches/master/protection. The 19 required gates' semantics are unchanged from ADR-0037; only their display strings move.
  • Re-test:
# Validate every workflow file parses and lists the expected job names.
cd .github/workflows
for f in tests-and-quality-gates.yml lint-and-format.yml security-scans.yml \
         libvmaf-build-matrix.yml ffmpeg-integration.yml docker-image.yml; do
    yq '.name, .jobs.[].name' "$f" || echo "PARSE FAIL: $f"
done
# Expected: each workflow prints its Title Case workflow name + job names;
# no PARSE FAIL lines.

0021 — DNN-enabled CI matrix legs (gcc + clang + macOS)

  • Workstream PRs: this PR; adds three new entries to the libvmaf-build matrix in .github/workflows/libvmaf-build-matrix.yml covering -Denable_dnn=enabled across Ubuntu/gcc, Ubuntu/clang, and macOS/clang. See ADR-0120.
  • Touches: .github/workflows/libvmaf-build-matrix.yml (3 new matrix entries + ORT install steps + dedicated dnn-suite test step), docs/adr/0120-ai-enabled-ci-matrix-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry).
  • Invariant: the DNN matrix legs install ONNX Runtime via the same pinned source as the dedicated Tiny AI job (tests-and-quality-gates.yml) — Linux: MS tarball at the version pinned by ORT_VERSION; macOS: Homebrew. When the Tiny AI job's pin changes, the matrix legs' ORT_VERSION env in their Install ONNX Runtime (linux, DNN leg) step must change to match; otherwise compiler/portability coverage drifts away from the gating leg's actual ABI.
  • Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.libvmaf-build.strategy.matrix.include[] | select(.dnn==true) | .name' \
    .github/workflows/libvmaf-build-matrix.yml
# Expected output (3 lines):
#   Build — Ubuntu gcc (CPU) + DNN
#   Build — Ubuntu clang (CPU) + DNN
#   Build — macOS clang (CPU) + DNN

# Local DNN build sanity (matches what each leg will run):
meson setup core core/build --buildtype release \
    --prefix $PWD/install -Denable_float=true -Denable_dnn=enabled
ninja -vC core/build install
meson test -C core/build --suite=dnn --print-errorlogs
  • Branch protection: the two Linux DNN legs are pinned as required status checks on master immediately after this PR's merge (19 → 21 contexts). The macOS leg stays informational (experimental: true) because Homebrew ORT floats. Re-pin command:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
    --input /tmp/protection-update.json

0022 — Windows GPU build-only matrix legs (MSVC + CUDA, MSVC + oneAPI SYCL)

  • Workstream PRs: this PR; adds a new top-level windows-gpu-build job to .github/workflows/libvmaf-build-matrix.yml with two matrix entries (CUDA, SYCL). See ADR-0121.
  • Touches: .github/workflows/libvmaf-build-matrix.yml (new windows-gpu-build job), docs/adr/0121-windows-gpu-build-only-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry), core/src/compat/win32/pthread.h (new — Win32 pthread shim for MSVC; mirrors compat/gcc/stdatomic.h pattern), core/src/feature/integer_adm.h (UPSTREAM — converted the dwt_7_9_YCbCr_threshold[3] designated initializer to positional form so MSVC/nvcc-on-Windows accepts the C++ parse; semantically identical, no behavioural change), core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM — added #if defined(__cplusplus) && defined(_MSC_VER) branch around #include <stdatomic.h> so MSVC C++ TUs pull atomic_int via using std::atomic_int;; POSIX paths unchanged), core/src/sycl/d3d11_import.cpp (fix non-existent <libvmaf/log.h> → "log.h"), core/src/sycl/dmabuf_import.cpp (move <unistd.h> inside #if HAVE_SYCL_DMABUF guard for non-VA-API hosts), core/src/sycl/common.cpp (replace POSIX clock_gettime(CLOCK_MONOTONIC) with portable std::chrono::steady_clock), core/src/feature/x86/motion_avx2.c (UPSTREAM — replace GCC vector-extension __m256i[N] indexing at line 529 with _mm256_extract_epi64; bit-exact), core/src/feature/x86/adm_avx2.c (UPSTREAM — replace 6 (__m256i)(_mm256_cmp_ps(...)) casts with _mm256_castps_si256(...) and 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact), core/src/feature/x86/adm_avx512.c (UPSTREAM — replace 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact), core/src/log.c (UPSTREAM — gate <unistd.h> behind !_WIN32, include <io.h> + redirect isatty/fileno to _isatty/_fileno for MSVC), core/src/feature/integer_vif.c (UPSTREAM — switch the aligned_malloc cursor from void * to uint8_t * with explicit typed-pointer casts so MSVC accepts the byte-wise pointer arithmetic), core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM — drop unused <unistd.h> include), core/src/dnn/model_loader.c (fork-added — Windows fallback definitions for POSIX S_ISDIR / S_ISREG path-classification macros), .github/workflows/lint-and-format.yml (fork-added — set lfs: true on the pre-commit job's checkout so LFS-stored ONNX blobs resolve and don't appear as phantom pre-commit-induced diffs), core/src/feature/x86/motion_avx512.c (UPSTREAM — replace 1 __m128i[N] reduction with _mm_extract_epi64; bit-exact), core/src/feature/x86/{vif_statistic_avx2,ansnr_avx2,ansnr_avx512,float_adm_avx2,float_adm_avx512,float_psnr_avx2,float_psnr_avx512,ssim_avx2,ssim_avx512}.c (UPSTREAM — convert 17 sites of trailing __attribute__((aligned(N))) to leading C11 _Alignas(N); same alignment, MSVC-portable), core/src/feature/mkdirp.c and core/src/feature/mkdirp.h (UPSTREAM third-party MIT-licensed micro-library — gate <unistd.h> to non-Windows, add <direct.h> + _mkdir for Windows, add mode_t typedef for MSVC), core/meson.build (new pthread_dependency gated on cc.check_header('pthread.h') failing), core/src/meson.build and core/test/meson.build (thread pthread_dependency into every target compiling pthread-using TUs).
  • Invariant: Windows GPU legs are pinned to the same toolchain versions as the corresponding Linux GPU legs (CUDA 13.0.0, oneAPI BaseKit 2025.3.0.372) so a Linux-vs-Windows divergence implies an MSVC ABI issue, not a tooling-version delta. When either Linux GPU leg bumps its toolchain, the Windows leg must move in lockstep — the Intel installer URL on Windows hard-codes the per-release directory id and the version string, so the bump is two-line edits in the SYCL Install Intel oneAPI (windows) step (the WINDOWS_BASEKIT_URL env var). Both legs additionally inject /experimental:c11atomics into CFLAGS / CXXFLAGS because libvmaf uses C11 atomics that MSVC's <stdatomic.h> rejects without that opt-in flag — when MSVC ships full C11 atomics support, the flag becomes unconditional and can be dropped. Two Windows-only dependency steps round out the parity: the CUDA leg's Jimver/cuda-toolkit sub-package list includes both crt (CUDA Runtime Library compile-time headers, ships crt/host_config.h; cuda_cccl is not a valid Windows sub-package name — installer rejects it) and nvvm (ships nvvm/bin/cicc.exe + nvvm/libdevice/libdevice.*.bc; without it, nvcc's .cu → PTX stage fails with The system cannot find the path specified. — on Linux apt pulls NVVM in transitively with cuda-nvcc-XY, Windows requires it explicitly); the SYCL leg builds the Level Zero loader from source (oneapi-src/level-zero v1.18.5 → cmake --build … --target install) because Windows oneAPI BaseKit ships the SYCL runtime but not ze_loader.lib, and libvmaf's meson cc.find_library('ze_loader') needs both the header and the import library. When the Linux apt level-zero-dev version moves, bump the L0 git tag to match. core/src/meson.build guards the explicit svml / irc cc.find_library calls behind host_machine.system() != 'windows' — those calls exist for the gcc/g++ + icpx Linux flow where the host linker is non-Intel; on Windows the host compiler is icx-cl itself and auto-injects the Intel runtime. 
Round-10 surfaced an additional Windows-only gap: ~14 libvmaf TUs #include <pthread.h> unconditionally, but MSVC and clang-cl ship no pthread (MinGW does, via winpthreads). The fork now ships a header-only Win32 shim at core/src/compat/win32/pthread.h mapping the in-use pthread subset (mutex / cond / thread create+join+detach) onto SRWLOCK + CONDITION_VARIABLE + _beginthreadex. The shim is wired in via pthread_dependency in core/meson.build, declared only when cc.check_header('pthread.h') fails — so MinGW and POSIX paths stay untouched. When upstream Netflix/vmaf adds new pthread surface (e.g., pthread_rwlock_*), extend compat/win32/pthread.h to cover it. Both nvcc fatbin custom_targets (CUDA) and icpx custom_targets (SYCL common.cpp / picture_sycl.cpp / dmabuf_import.cpp, plus the SYCL feature kernels) bypass meson's dependencies: plumbing and hand-roll their own -I lists, so the shim path must be threaded into both cuda_extra_includes and sycl_inc_flags explicitly on Windows. icpx-cl on Windows additionally rejects -fPIC (unsupported option for target 'x86_64-pc-windows-msvc') — so sycl_common_args and sycl_feature_args route their -fPIC token through sycl_pic_arg = host_machine.system() != 'windows' ? ['-fPIC'] : []. PIC is the default for Windows DLLs, so dropping the flag is the correct fix rather than a workaround. Round-14 surfaced a third Windows-only blocker: core/src/feature/integer_adm.h (an upstream Netflix file, last touched by upstream port d06dd6cf) initialises dwt_7_9_YCbCr_threshold[3] with C99 designated initializers ({.a = ..., .k = ..., .f0 = ..., .g = {...}}). The header is included from both integer_adm.c (C TU) and cuda/integer_adm/*.cu (C++ TU via nvcc); MSVC's C++ frontend (and nvcc's cudafe++ on Windows) rejects C99 designated initializers without /std:c++20. 
Converted to positional initialization in the same struct-member order (a / k / f0 / g[4]). The conversion is provably semantically identical and works in every C/C++ standard, so it costs nothing on the upstream-merge side beyond a trivial conflict marker if upstream Netflix later edits the same lines. Restore designated form post-merge if upstream has it. Round-17 surfaced four more Windows/MSVC-only SYCL blockers, two of which touch upstream-shared headers. (a) core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM) unconditionally #include <stdatomic.h> and use the atomic_int typedef in struct definitions. MSVC's <stdatomic.h> (added in 19.34) only declares the C11 symbols inside the global namespace under C; in C++ compilation (icpx-cl drives the SYCL TUs as C++) MSVC surfaces them only inside namespace std::. gcc/clang expose both via a GNU extension, so the upstream code works on every other platform. The fork now wraps both headers' #include <stdatomic.h> in #if defined(__cplusplus) && defined(_MSC_VER) → #include <atomic> + using std::atomic_int;, falling through to the original <stdatomic.h> line on every other configuration. ABI is unchanged: atomic_int resolves to the same underlying type. If upstream Netflix adds further C11 atomic typedefs in these headers (e.g., atomic_uint, atomic_size_t), extend the using std:: lines to cover them. (b) core/src/sycl/d3d11_import.cpp (fork-added) used <libvmaf/log.h>, which doesn't exist: log.h lives at core/src/log.h and is internal. Switched to "log.h"; the icpx invocation already supplies the src-relative -I. (c) core/src/sycl/dmabuf_import.cpp (fork-added) included <unistd.h> at file scope, but POSIX close() is only used inside the #if HAVE_SYCL_DMABUF VA-API block. Moved the <unistd.h> include inside that guard so non-DMA-BUF builds (Windows MSVC, macOS) compile cleanly. (d) core/src/sycl/common.cpp (fork-added) called clock_gettime(CLOCK_MONOTONIC), which doesn't exist on Windows. 
Replaced with std::chrono::steady_clock (guaranteed monotonic by the C++ standard, portable on every supported host). All four fixes preserve POSIX/Linux behaviour bit-identically and only change the Windows MSVC build path. Round-18 surfaced a fifth Windows blocker on the CUDA leg's CPU SIMD compile path: core/src/feature/x86/motion_avx2.c:529 (UPSTREAM, ported in commit 9371a0aa from Netflix PR #1486) computed final_accum[0] + final_accum[1] + final_accum[2] + final_accum[3] to extract the four int64 lanes from an __m256i. gcc/clang allow this via the GNU vector-extension treatment of __m256i (it carries __attribute__((vector_size(32)))); MSVC rejects it with C2088: built-in operator '[' cannot be applied to an operand of type '__m256i'. Replaced with _mm256_extract_epi64(final_accum, N) for N ∈ {0..3}, summed; this is a bit-exact lane sum on every compiler. Restore the index form post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Round-19 surfaced the same MSVC pattern at 19 more call sites across the AVX2/AVX-512 ADM and motion files plus six GCC-style vector casts. core/src/feature/x86/adm_avx2.c (UPSTREAM): 6 lines (915-920) used (__m256i)(_mm256_cmp_ps(...)) C-style casts that gcc/clang accept via the GNU vector extension; replaced with the dedicated _mm256_castps_si256(...) bit-cast intrinsic. 12 lane-extract sites (r2_h[0]+r2_h[1], etc. at lines 2420 / 2425 / 2430 / 2893 / 2897 / 2901 / 4079 / 4084 / 4089 / 4627 / 4631 / 4635) replaced with _mm_extract_epi64(r2_X, N) summed pair. core/src/feature/x86/adm_avx512.c (UPSTREAM): 6 sister lane-extract sites (lines 4470 / 4477 / 4484 / 4625 / 4631 / 4637), same fix. The AVX-512 paths reduce a __m512i down to __m128i first (via _mm512_extracti64x4_epi64 → _mm256_extracti64x2_epi64) before the index, so only the final __m128i[N] step needed changing. 
core/src/feature/x86/motion_avx512.c (UPSTREAM, ported in 9371a0aa from PR #1486): one final r2[0]+r2[1] reduction (line 448), same fix. All 19 lane-extract fixes plus the 6 cast fixes are bit-exact rewrites and only change the source-level syntax to MSVC-portable form. Restore the original forms post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Additionally core/src/sycl/d3d11_import.cpp (fork-added) switched from C-style COBJMACROS helpers (ID3D11Device_CreateTexture2D, …_Release, etc.) to C++ method-call syntax (device->CreateTexture2D, tex->Release) — d3d11.h gates COBJMACROS behind !defined(__cplusplus), so the C-style helpers aren't visible in this .cpp TU. The two forms are ABI-equivalent (both dispatch through the COM vtable); the choice is purely lexical and POSIX builds aren't affected (the whole TU is #ifdef _WIN32). Round-20 surfaced two more Windows-only blockers. (a) 17 sites across the x86 SIMD layer used GCC's float tmp[N] __attribute__((aligned(M))); form to align scratch buffers for _mm{256,512}_store_ps. MSVC rejects the trailing-attribute syntax with C2146: syntax error: missing ';' before identifier '__attribute__'. Replaced with the C11-standard _Alignas(M) float tmp[N]; (alignment specifier before the type) — works in gcc, clang and MSVC with /std:c11. Files touched (all UPSTREAM): vif_statistic_avx2.c (×2), ansnr_avx2.c (×2), ansnr_avx512.c (×2), float_adm_avx2.c (×2), float_adm_avx512.c (×2), float_psnr_avx2.c (×1), float_psnr_avx512.c (×1), ssim_avx2.c (×4), ssim_avx512.c (×4). The pre-existing vif_avx2.c / vif_avx512.c already define a portable ALIGNED(x) macro at file scope and position the attribute before the type, so they compile cleanly under MSVC and were not touched. 
(b) core/src/feature/mkdirp.c (UPSTREAM, third-party MIT-licensed copy of Stephen Mathieson's micro-library) included <unistd.h> unconditionally but never used POSIX unistd symbols (only mkdir via <sys/stat.h>/<direct.h>). Gated <unistd.h> to non-Windows and added <direct.h> for Windows; switched mkdir(pathname) → _mkdir(pathname) (the non-deprecated MSVC name). core/src/feature/mkdirp.h added a mode_t typedef under MSVC since neither <sys/types.h> nor <sys/stat.h> declares it on Windows; mode is ignored on the Windows path anyway. Round-21 surfaced two more blockers (the round-19 __m128i[N] sweep missed six sites) plus a pre-commit workflow checkout gap. (a) core/src/feature/x86/adm_avx512.c (UPSTREAM) had six further r2_X[0] + r2_X[1] reductions at lines 2128 / 2135 / 2142 / 2589 / 2595 / 2601 that reduce a __m512i accumulator down to __m128i before the lane index. Replaced with the same _mm_extract_epi64(r2_X, N) summed-pair pattern used in round 19 (bit-exact, MSVC-portable). (b) core/src/log.c (UPSTREAM) included <unistd.h> unconditionally to pick up POSIX isatty / fileno. On MSVC both live in <io.h> as _isatty / _fileno; gated the include and macro-redirected the names so the one call site at line 34 compiles on both sides without touching the POSIX path. (c) .github/workflows/lint-and-format.yml (fork-added) checked out without lfs: true, so the model/tiny/*.onnx files landed as LFS pointer stubs. pre-commit's "changes made by hooks" reporter then diffed the stubs against HEAD's real blobs and failed the job even though no hook touched them. Added lfs: true to the pre-commit job's checkout. (d) core/src/meson.build: the cuda_common_vmaf_lib static library had no dependencies: list, so the Win32 pthread shim (wired in via pthread_dependency in core/meson.build) wasn't on its include path; cuda/common.h unconditionally includes <pthread.h> and MSVC failed with C1083. Added dependencies : [pthread_dependency], which is a no-op on POSIX (empty list) and routes the shim path in on Windows. 
(e) core/src/feature/integer_vif.c (UPSTREAM) walked one big aligned_malloc result as void *data and did data += pad_size / data += h * stride_16 etc. to carve the buffer into typed sub-pointers. gcc/clang accept pointer arithmetic on void * as a GNU extension (treating sizeof(void) == 1); MSVC rejects it with C2036: 'void *': unknown size. Replaced the cursor type with uint8_t * and added explicit casts at assignment sites that take a typed pointer (uint16_t *mu1, uint32_t *mu1_32, etc.). Byte offsets are identical, layout unchanged, bit-exact. If upstream Netflix edits the same loop, reabsorb the walk and re-apply the cursor-type + cast pattern. (f) core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM) included <unistd.h> at line 33 but used no POSIX symbols from it; MSVC failed with C1083. Dropped the unused include outright — simplest fix, no runtime change on any platform. (g) core/src/dnn/model_loader.c (fork-added) uses S_ISDIR / S_ISREG to classify resolved paths. MSVC ships the underlying S_IFMT / S_IFDIR / S_IFREG bit masks in <sys/stat.h> but not the POSIX classification macros. Added a Windows-only fallback (#ifndef S_ISDIR #define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) #endif, same for S_ISREG) guarded by #ifdef _WIN32. Semantically identical to the POSIX macro on Linux/macOS. Round-21e surfaced the final source-portability blockers once the DLL build passed preprocessing. (h) core/src/predict.c, core/src/libvmaf.c and core/src/read_json_model.c (all UPSTREAM) used C99 variable-length arrays — double scores[cnt] at predict.c:385, char name[name_sz] at predict.c:453 and libvmaf.c:1741, plus cfg_name[cfg_name_sz] and generated_key[generated_key_sz] in the .json model-collection parser. 
gcc/clang accept VLAs as a C11 optional feature; MSVC (even with /std:c11) rejects them outright with C2057: expected constant expression (plus C2466 and C2133 on the const size_t sized arrays — MSVC treats const as runtime-bounded, not a constant expression, even when the initialiser is literal like 4 + 1). Replaced each runtime-sized buffer with a small malloc + explicit free on every exit path (in predict.c and read_json_model.c a goto out; cleanup arm was introduced because the loops error-exit mid-function). The generated_key buffer in read_json_model.c uses the narrower fix — char generated_key[5]; — since its size (four decimal digits of the bootstrap sub-model index plus NUL) is a true compile-time constant. Buffers are a handful of bytes each (name_sz is the model-collection name length plus the fixed _ci_p95_lo suffix, scores holds ~20 doubles, cfg_name is the name plus _0000 suffix), so the heap round-trip is not performance-relevant; the new -ENOMEM failure mode is handled uniformly by existing callers. The read_json_model.c refactor also plugs a pre-existing leak of the name buffer on the early return -EINVAL when a JSON object key isn't a string — the goto out; path frees name + cfg_name on every exit. core/test/test_feature_extractor.c:56 (UPSTREAM) declared const unsigned n_threads = 8; and used it as the extent of VmafFeatureExtractorContext *fex_ctx[n_threads];. Converted to enum { n_threads = 8 }; so MSVC sees a constant-expression; every other compiler accepts enum constants identically. Re-absorb if upstream Netflix later edits the same loops and your toolchain matrix omits MSVC. (i) The Windows MSVC build-only legs now build the full tree — CLI tools, unit tests and libvmaf.dll — rather than the previous short cut of disabling -Denable_tools / -Denable_tests. 
Per user direction ("fix the code ffs"), the tree polyfills the remaining POSIX surfaces on MSVC instead: core/tools/compat/win32/getopt.h + core/tools/compat/win32/getopt.c provide a from-scratch POSIX/GNU-compatible getopt_long shim (short / long options, no_argument / required_argument / optional_argument, argv permutation for non-option operands, -- explicit stop, =-embedded values). The shim is fork-added (BSD-3-Clause-Plus-Patent, Copyright 2026 Lusoris and Claude) and declared via a single getopt_dependency in core/meson.build, gated on cc.check_header('getopt.h') failing. The dependency auto-propagates the shim .c into any consuming target via meson's sources: keyword, so both the vmaf CLI (core/tools/meson.build) and the test_cli_parse unit test (core/test/meson.build) pick it up uniformly. MinGW ships <getopt.h> via mingw-w64-crt, so check_header succeeds there and the shim stays out of the TU list. (j) Eleven test executables (test_log, test_dict, test_opt, test_cpu, test_ref, test_feature, test_ciede, test_luminance_tools, test_cli_parse, test_sycl, test_sycl_pic_preallocation) were missing pthread_dependency in their dependencies: lists at core/test/meson.build. On POSIX pthread_dependency is an empty list so the omission was invisible; on MSVC those TUs transitively include feature_collector.h → <pthread.h> and fail with C1083. Threaded the dependency through all eleven targets. test_cli_parse additionally lists getopt_dependency to pick up the shim. (k) Three additional VLA sites surfaced once the test harness built on MSVC: test_cambi.c:254 had unsigned w = 5, h = 5; uint16_t buffer[3 * w]; and was converted to enum { w = 5, h = 5 }; so the array extent is a constant expression. test_pic_preallocation.c:382 and test_pic_preallocation.c:506 had const int num_threads = N; pthread_t threads[num_threads]; and MSVC rejects const int as a non-constant expression. Converted to enum { num_threads = N, fetches_per_thread = M };. 
(l) test_ring_buffer.c:23 and test_pic_preallocation.c:26 included <unistd.h> for usleep / sleep. Gated behind !_WIN32 with a Win32 fallback via <windows.h> + #define usleep(us) Sleep(((us) + 999) / 1000) / #define sleep(s) Sleep((s) * 1000). The conversion rounds sub-millisecond usleep inputs up, which is safe for these test paths (they use 100 µs jitter and 1 s waits). (m) core/tools/vmaf.c included <unistd.h> for isatty / fileno. Applied the same gating pattern used in log.c in round-21(b) — include <io.h> on MSVC and redirect isatty / fileno to _isatty / _fileno via #define. (n) __builtin_clz / __builtin_clzll are GCC intrinsics; MSVC ships __lzcnt / __lzcnt64 via <intrin.h> instead. The shim already lived in core/src/feature/integer_vif.h but integer_adm.c:939, x86/adm_avx2.c:1425 and x86/adm_avx512.c:1217 don't include that header. Extracted the shim into a dedicated core/src/feature/compat_builtin.h (fork-added) and included it from all four TUs. The guard is defined(_MSC_VER) && !defined(__clang__), so clang-cl / icx-cl (which provide the GCC intrinsics natively) skip the shim. (o) The SYCL leg's D3D11 import TU core/src/sycl/d3d11_import.cpp is C++ (icpx-cl drives it as C++ on Windows) but included the internal C header log.h without an extern "C" wrap. log.h is an upstream Netflix header with no __cplusplus guard, so vmaf_log got C++ name-mangled in the .cpp TU and failed to resolve against the C-linkage symbol produced by log.c at link time (LNK2019 from every test target that pulls in the SYCL static lib). Wrapped the #include "log.h" with extern "C" { ... } inside the fork-added .cpp rather than touching the upstream header — keeps log.h identical to upstream on every /sync-upstream. (p) The Windows MSVC legs build with --default-library=static. 
libvmaf's public API has no __declspec(dllexport) attributes (upstream Netflix is POSIX-shaped), so a vanilla MSVC shared build produces src/vmaf-3.dll with no exported symbols and the toolchain therefore never emits the companion vmaf.lib import library. Downstream tool targets then fail with LNK1181: cannot open input file 'src\vmaf.lib'. The MinGW matrix leg has used --default-library static since day one for the same reason (line 387); the MSVC legs now mirror that choice via matrix.include[].meson_extra. Downstream consumers that want a DLL can either add __declspec(dllexport) decorations to the public API or use a .def file; that is a separate decision and out of scope for the build-only gate.
  • Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.windows-gpu-build.strategy.matrix.include[].name' \
    .github/workflows/libvmaf-build-matrix.yml
# Expected output (2 lines):
#   Build — Windows MSVC + CUDA (build only)
#   Build — Windows MSVC + oneAPI SYCL (build only)
  • Branch protection: the two Windows GPU legs are pinned as required status checks on master immediately after this PR's merge. After ADR-0120's two Linux DNN legs the count moves 21 → 23. Re-pin via:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
    --input /tmp/protection-update.json
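
The VLA removal described in round-21e (h) can be sketched as follows. This is a minimal illustration, not the fork's code: the helper name build_ci_key and its signature are hypothetical, and only the _ci_p95_lo suffix is taken from the text above. It shows a runtime-sized scratch buffer moved to the heap, with a single out: label so that a mid-function error exit still frees it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes "<name>_ci_p95_lo" into out (capacity out_sz).
 * A C99 VLA such as `char key[key_sz]` is rejected by MSVC, so the scratch
 * buffer is heap-allocated and released on one shared cleanup path. */
static int build_ci_key(const char *name, char *out, size_t out_sz)
{
    static const char suffix[] = "_ci_p95_lo";
    int err = 0;

    assert(name && out);
    const size_t key_sz = strlen(name) + sizeof(suffix); /* sizeof counts the NUL */
    char *key = malloc(key_sz);
    if (!key)
        return -ENOMEM;

    snprintf(key, key_sz, "%s%s", name, suffix);
    if (key_sz > out_sz) {
        err = -EINVAL; /* error exit mid-function still reaches free() */
        goto out;
    }
    memcpy(out, key, key_sz);

out:
    free(key);
    return err;
}
```

Callers see one new failure mode, -ENOMEM, which matches the trade-off stated above: a few bytes of heap traffic in exchange for a declaration every compiler accepts.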

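The usleep / sleep fallback arithmetic from round-21e (l) is easy to check in isolation. The two functions below are hypothetical names that reproduce only the unit conversion inside the macros quoted above; the call to Sleep() itself is Windows-only and is left out so the sketch compiles anywhere.

```c
#include <assert.h>
#include <limits.h>

/* Same rounding as `#define usleep(us) Sleep(((us) + 999) / 1000)`:
 * Sleep() takes whole milliseconds, so microseconds are rounded up. */
static unsigned long usleep_us_to_sleep_ms(unsigned long us)
{
    assert(us <= ULONG_MAX - 999UL); /* keep the +999 from wrapping */
    return (us + 999UL) / 1000UL;
}

/* Same scaling as `#define sleep(s) Sleep((s) * 1000)`. */
static unsigned long sleep_s_to_sleep_ms(unsigned long s)
{
    assert(s <= ULONG_MAX / 1000UL);
    return s * 1000UL;
}
```

With these conversions the 100 µs jitter used by the tests becomes a 1 ms Sleep, and the 1 s wait becomes a 1000 ms Sleep, which is why rounding up is harmless for those paths.
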
0023 — CUDA gencode coverage (sm_86/sm_89/compute_80 PTX) + init hardening

  • Workstream PRs: the ADR-0122 PR (gencode + init hardening) and the ADR-0123 follow-up for the 32b115df post-cubin-load regression.
  • Touches:
  • core/src/meson.build: the gencode array in the if get_option('enable_nvcc') branch.
  • core/src/cuda/common.c: vmaf_cuda_state_init() error paths (multi-line actionable log, cuda_free_functions() + free(c) + *cu_state = NULL cleanup).
  • docs/backends/cuda/overview.md: ## Runtime requirements section and ### GPU architecture coverage table.
  • Invariant: the gencode array unconditionally emits cubins for sm_75 / sm_80 / sm_86 / sm_89 plus a compute_80 PTX, independent of host nvcc version. Upstream Netflix's gencode only ships cubins at Txx major boundaries (sm_75 / sm_80 / sm_90 / sm_100 / sm_120); a literal merge that replaces our array with upstream's would re-open the Ampere-sm_86 / Ada-sm_89 coverage hole. The sm_90 / sm_100 / sm_120 entries are still version-gated and should be preserved verbatim if upstream adds new gates. The init-path error messages are fork-local strings; upstream's terse "Error: failed to load CUDA functions" must NOT win a merge.
  • Re-test:
meson setup build -Denable_cuda=true -Denable_nvcc=true
ninja -C build 2>&1 | grep -E 'compute_(80|86|89)'
# Expect at least -gencode=arch=compute_86,code=sm_86 and
#                -gencode=arch=compute_89,code=sm_89 and
#                -gencode=arch=compute_80,code=compute_80

# Actionable init message (run without CUDA driver on the loader path):
LD_LIBRARY_PATH= ./build/tools/vmaf --help 2>&1 | grep -qi 'libcuda.so.1' || \
    echo "init log regressed"
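
The shape of the hardened init error path (log, release what was loaded, free the state, null the caller's pointer) can be sketched as below. Every Demo* / demo_* name is a hypothetical stand-in; the real vmaf_cuda_state_init(), its state type and its log strings are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real state type and function loader. */
typedef struct DemoCudaState { int functions_loaded; } DemoCudaState;

static int demo_load_functions(DemoCudaState *c, int driver_present)
{
    c->functions_loaded = driver_present;
    return driver_present ? 0 : -EINVAL;
}

static void demo_free_functions(DemoCudaState *c)
{
    c->functions_loaded = 0;
}

/* On failure: print an actionable multi-line message, release everything
 * acquired so far, and reset *cu_state so the caller cannot reuse or
 * double-free a dangling pointer. */
static int demo_state_init(DemoCudaState **cu_state, int driver_present)
{
    assert(cu_state);
    DemoCudaState *const c = calloc(1, sizeof(*c));
    if (!c)
        return -ENOMEM;
    *cu_state = c;

    const int err = demo_load_functions(c, driver_present);
    if (err) {
        fprintf(stderr,
                "demo: failed to load driver functions\n"
                "demo: check that the driver library is on the loader path\n");
        demo_free_functions(c);
        free(c);
        *cu_state = NULL;
        return err;
    }
    return 0;
}
```

Resetting the out-parameter is the part most easily lost in a merge, because the function still returns the right error code without it.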

0024 — vmaf_read_pictures null-guard for CUDA device-only path

  • Workstream PRs: the ADR-0123 follow-up landed atop the ADR-0122 gencode/init-hardening work.
  • Touches:
  • core/src/libvmaf.c — the non-threaded tail of vmaf_read_pictures at the prev_ref update site (line ~1428 in the fork; upstream equivalent is the tail added by f740276a).
  • Invariant: the prev_ref update is guarded by if (ref && ref->ref) so pure-CUDA extractor sets (where ref = &ref_host but ref_host was never populated by translate_picture_device) do not deref a NULL refcount. Upstream currently has the same unguarded tail; the bug is masked upstream only because the experimental VMAF_PICTURE_POOL gate from 32b115df is still in place. A literal upstream merge that removes our null-guard while upstream's experimental gate is still holding would pass tests but re-open the libvmaf_cuda ffmpeg crash the moment the gate flips default-on (which the fork did in 65460e3a, ADR-0104). Keep the guard until the upstream null-guard port lands.
  • Re-test:
# Unit tests cover the non-regression on the library side:
meson test -C build

# End-to-end regression: ffmpeg libvmaf_cuda must exit 0 on a
# CUDA-device-only extractor set (full recipe in ADR-0123).
./ffmpeg -init_hw_device cuda=cu:0 -filter_hw_device cu \
  -i /tmp/ref.mp4 -i /tmp/dis.mp4 \
  -lavfi "[0:v]format=yuv420p,hwupload_cuda[r];\
          [1:v]format=yuv420p,hwupload_cuda[d];\
          [r][d]libvmaf_cuda=log_path=/tmp/out.json:log_fmt=json" \
  -f null -
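
The guard itself is small enough to show in miniature. The types and the function demo_update_prev_ref below are hypothetical and far simpler than the real picture structures; the sketch only demonstrates why both ref and ref->ref must be checked before the refcount is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a ref-counted picture; field names are illustrative. */
typedef struct DemoRef { int cnt; } DemoRef;
typedef struct DemoPicture { DemoRef *ref; } DemoPicture;

/* Takes a reference on `ref` and stores it in `prev_ref`, but only when the
 * host picture was actually populated. Returns 1 if updated, 0 if skipped. */
static int demo_update_prev_ref(DemoPicture *prev_ref, DemoPicture *ref)
{
    assert(prev_ref);
    if (ref && ref->ref) {      /* device-only path leaves ref->ref NULL */
        ref->ref->cnt++;
        *prev_ref = *ref;
        return 1;
    }
    return 0;
}
```

A non-NULL ref pointing at an unpopulated picture is exactly the case a plain if (ref) check would miss.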

0025 — VIF init() fail-path frees advanced byte-cursor

  • Workstream PRs: PR #47 (rewritten to leak-fix-only after master absorbed the void→uint8_t half via commit b0a4ac3a, entry 0022 §e). Ports the leak-fix half of upstream Netflix PR #1476.
  • Touches: core/src/feature/integer_vif.c (UPSTREAM — 2-line fix in the init() fail: handler).
  • Invariant: init() walks uint8_t *data forward through aligned_malloc's one allocation, advancing past each sub-pointer assignment. If vmaf_feature_name_dict_from_provided_features returns NULL the fail path must free the base pointer s->public.buf.data, never the advanced cursor data. Upstream master still has aligned_free(data) there — same bug — so this entry is the reminder to not let an upstream sync re-introduce the advanced-cursor form. If upstream lands PR #1476 or an equivalent, the sync can drop this entry.
  • Re-test:
meson test -C build --suite=fast
# Static check: ripgrep the pattern that must NOT return.
rg -n "aligned_free\(data\)" core/src/feature/integer_vif.c && \
    echo 'REGRESSED' || echo 'ok'
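
The base-pointer versus advanced-cursor distinction can be sketched with a two-slot buffer. This is an illustration under stated assumptions: DemoBuf, demo_buf_init and the force_fail switch are hypothetical, and plain malloc / free stand in for the aligned_malloc / aligned_free pair used by the real init().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: one allocation carved into two typed sub-buffers. */
typedef struct DemoBuf {
    void *data;         /* base pointer: the only value that may be freed */
    uint32_t *mu1_32;
    uint16_t *mu1;
} DemoBuf;

static int demo_buf_init(DemoBuf *b, size_t n, int force_fail)
{
    assert(b && n > 0);
    const size_t mu1_32_sz = n * sizeof(uint32_t);
    const size_t mu1_sz = n * sizeof(uint16_t);

    uint8_t *data = malloc(mu1_32_sz + mu1_sz);
    if (!data)
        return -1;
    b->data = data;                 /* remember the base before advancing */

    b->mu1_32 = (uint32_t *)data;   /* uint8_t cursor needs an explicit cast */
    data += mu1_32_sz;
    b->mu1 = (uint16_t *)data;
    data += mu1_sz;                 /* cursor now points one past the block */

    if (force_fail)
        goto fail;
    return 0;

fail:
    free(b->data);                  /* base pointer, never the advanced cursor */
    b->data = NULL;
    b->mu1_32 = NULL;
    b->mu1 = NULL;
    return -1;
}
```

Passing the advanced cursor to free() would hand the allocator an address it never returned, which is undefined behaviour rather than a simple leak.
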
  • Workstream PRs: this PR (ADR-0124 adoption). Closes the "rule-without-a-check" gap on ADR-0100 / 0105 / 0106 / 0108.
  • Touches (all FORK-ADDED — no upstream overlap): .github/workflows/rule-enforcement.yml (new), scripts/ci/check-copyright.sh (new), .pre-commit-config.yaml (appended local hook).
  • Invariant: the deep-dive-checklist job is blocking on every PR that is not an upstream port (exempt via port: title prefix or port/ branch). The other three gates (doc-substance-check, adr-backfill-check, copyright pre-commit) are advisory or pre-commit, never CI-blocking; this split is the whole point of ADR-0124 and an upstream sync must not move them into the required-status-check set without a follow-up ADR. The opt-out parser matches /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ per ADR-0108 §Opt-out-lines — if upstream ever changes PR-template phrasing (unlikely; this is fork-local), the regex and the template must move together.
  • Re-test:
# Lint the workflow + hook locally.
pre-commit run --files \
  .github/workflows/rule-enforcement.yml \
  scripts/ci/check-copyright.sh \
  .pre-commit-config.yaml

# Dry-run the copyright hook against a staged source file.
scripts/ci/check-copyright.sh core/src/libvmaf.c && echo ok

# Synthetic PR body that violates ADR-0108 should fail the parser;
# see docs/research/0002-automated-rule-enforcement.md §Verification
# plan for the three test cases.
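
For a quick local check of the opt-out pattern, the regex can be exercised with POSIX <regex.h>. This is a sketch under two assumptions: the pattern is translated to POSIX extended syntax (which has no \s or (?:...)), and the sample lines used with it are invented for illustration, not quoted from the PR template. The function name is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `line` matches the opt-out pattern, 0 if not, -1 on a
 * regex compile error. POSIX ERE spells \s as [[:space:]] and has no
 * non-capturing groups, so a plain group is used instead. */
static int demo_is_opt_out_line(const char *line)
{
    static const char pattern[] =
        "^-?[[:space:]]*no .* (needed|impact|rebase-sensitive)";
    regex_t re;

    assert(line);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    const int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Note that the pattern requires at least a space on each side of the middle wildcard, so a line needs some word between "no" and the closing keyword to match.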

0027 — SSIMULACRA 2 scalar extractor (libjxl FastGaussian IIR blur)

  • Workstream PRs: this PR (feat/ssimulacra2-scalar); proposal ADR in PR #67.
  • Touches: core/src/feature/ssimulacra2.c (fork-local, new), core/src/meson.build, core/src/feature/feature_extractor.c.
  • Invariant: the extractor embeds several tables that must track libjxl upstream — opsin absorbance matrix, MakePositiveXYB offsets, 108 pooling weights, polynomial-transform coefficients, and the FastGaussian coefficient-derivation formulas (radius = 3.2795·σ + 0.2546, Cramer's 3×3 solve for β, n2/d1 assignment per Charalampidis 2016 (33)). If libjxl ever changes any of these, update ssimulacra2.c in the same PR that syncs upstream. Self-consistency must stay at exactly 100.000000 for identical ref/dist inputs — this is the cheapest regression check.
  • Re-test:
meson test -C build --suite=fast
./build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 --feature ssimulacra2 -o /tmp/self.xml \
  && grep -q 'ssimulacra2="100.000000"' /tmp/self.xml \
  && echo "ok: self-consistency 100.0"
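
Two of the derivation steps named in the invariant are simple enough to sketch. The radius function evaluates the quoted formula as written; the solver is a generic Cramer's-rule 3×3 routine, not the specific system that ssimulacra2.c solves for β (that matrix is not reproduced here). All function names are hypothetical.

```c
#include <assert.h>

/* radius = 3.2795 * sigma + 0.2546, as quoted in the invariant above. */
static double demo_fast_gaussian_radius(double sigma)
{
    assert(sigma > 0.0);
    return 3.2795 * sigma + 0.2546;
}

/* Determinant of a 3x3 matrix by cofactor expansion along the first row. */
static double demo_det3(double m[3][3])
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

/* Solves a * x = b by Cramer's rule: x[col] = det(a with column col
 * replaced by b) / det(a). Returns -1 when a is singular. */
static int demo_cramer3(double a[3][3], double b[3], double x[3])
{
    const double d = demo_det3(a);
    if (d == 0.0)
        return -1;
    for (int col = 0; col < 3; col++) {
        double m[3][3];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                m[r][c] = (c == col) ? b[r] : a[r][c];
        x[col] = demo_det3(m) / d;
    }
    return 0;
}
```

When libjxl changes a coefficient, recomputing these by hand against the embedded tables is a cheaper first check than a full score comparison.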

0028 — MS-SSIM separable decimate + AVX2/AVX-512/NEON SIMD

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 (supersedes the rebase-incompatible feat/ms-ssim-decimate-simd; AVX2/AVX-512, commits 7de8cd7f scalar separable, 5f93c864 AVX2, 73436438 AVX-512); feat/ms-ssim-decimate-neon-v2 (NEON follow-up, stacked).
  • Touches: core/src/feature/ms_ssim_decimate.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx2.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx512.{c,h} (NEW), core/src/feature/arm64/ms_ssim_decimate_neon.{c,h} (NEW), core/src/feature/ms_ssim.c (call-site change), core/src/meson.build (register new SIMD TUs), core/test/test_ms_ssim_decimate.c (NEW), core/test/meson.build (arm64 gating).
  • Invariant: the 9-tap 9/7 biorthogonal wavelet LPF coefficients (ms_ssim_lpf_h / ms_ssim_lpf_v) are duplicated verbatim in five TUs for bit-identity: the scalar ms_ssim_decimate.c, the AVX2 variant, the AVX-512 variant, the NEON variant, and upstream's g_lpf_h / g_lpf_v in ms_ssim.c. Any upstream change to the coefficient values or the KBND_SYMMETRIC mirror branch in iqa/convolve.c must be mirrored to all five. If not mirrored, SIMD paths and scalar diverge silently and the bit-equality memcmp in test_ms_ssim_decimate catches it — but only when that test runs, so diff the five files first.
  • Re-test (on each supported host arch):
# x86_64 host — native build.
meson test -C build
./build/test/test_ms_ssim_decimate

# aarch64 host OR aarch64 cross under qemu — see /tmp/aarch64-cross.txt.
meson setup build-arm64 libvmaf --cross-file /tmp/aarch64-cross.txt \
    -Denable_cuda=false -Denable_sycl=false
ninja -C build-arm64
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
    build-arm64/test/test_ms_ssim_decimate

# Netflix MS-SSIM golden — places=4 must still pass through SIMD.
.venv/bin/python -m pytest \
    python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor

0029 — KBND_SYMMETRIC period-based reflection in iqa/convolve.c

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
  • Touches: core/src/feature/iqa/convolve.c (upstream file, rewritten KBND_SYMMETRIC).
  • Invariant: KBND_SYMMETRIC(img, w, h, x, y, _) must use the period-based form (period = 2*w, period = 2*h) so that offsets with |x| > w or |y| > h still land in bounds. Upstream's single-reflect form was out-of-bounds whenever w < kernel_half or h < kernel_half; the latent bug did not reproduce in Netflix golden tests because MS-SSIM pyramids never decimate below ~60×34. Any upstream change that reverts to the single-reflect form must be rejected or re-ported.
  • Re-test:
./build/test/test_ms_ssim_decimate        # test_1x1 border case
.venv/bin/python -m pytest \
    python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
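
A period-based reflection of one coordinate can be sketched as follows. This is not the fork's KBND_SYMMETRIC body: demo_reflect_index is a hypothetical helper that handles a single axis and assumes half-sample symmetric mirroring (index -1 maps to 0, index n maps to n-1).

```c
#include <assert.h>

/* Maps any integer offset x onto [0, n) by symmetric reflection with
 * period 2*n: ... 1 0 | 0 1 ... n-1 | n-1 ... 1 0 | 0 1 ... */
static int demo_reflect_index(int x, int n)
{
    assert(n > 0);
    const int period = 2 * n;
    int m = x % period;     /* C remainder keeps the sign of x */
    if (m < 0)
        m += period;        /* now 0 <= m < period */
    return m < n ? m : period - 1 - m;
}
```

A single-reflect form agrees with this for offsets within one image width of the border, but for n = 1 or any offset further out only the periodic form stays in bounds, which is the 1x1 case the unit test exercises.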

0030 — adm_decouple_s123_avx512 stack-array 64-byte alignment

  • Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
  • Touches: core/src/feature/x86/adm_avx512.c (upstream file, one-line _Alignas(64) on int64_t angle_flag[16] at line 1317). core/test/test_pic_preallocation.c (upstream file, three vmaf_model_destroy(model) calls pairing the vmaf_model_load in test_picture_pool_basic / _small / _yuv444).
  • Invariant: the stack slot for angle_flag must be 64-byte aligned because two _mm512_loadu_si512(&angle_flag[0/8]) loads in the same scope may be promoted to aligned vmovdqa64 by LTO. Dropping the _Alignas(64) annotation re-introduces the SEGV under --buildtype=release -Db_lto=true -Db_sanitize=address. Debug / no-LTO builds keep vmovdqu64 and cannot flag the regression. See docs/development/known-upstream-bugs.md.
  • Re-test:
meson setup build-asan-lto libvmaf \
    -Denable_cuda=false -Denable_sycl=false \
    -Db_sanitize=address --buildtype=release -Db_lto=true
ninja -C build-asan-lto test/test_pic_preallocation
ASAN_OPTIONS=detect_leaks=1 \
    ./build-asan-lto/test/test_pic_preallocation
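
The effect of the leading _Alignas(64) can be checked with a few lines of C11. The array name and extent follow the text above; the surrounding function is a hypothetical demo, not the code in adm_avx512.c.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64-byte vector loads cover the whole array. */
static_assert(sizeof(int64_t) * 16 == 128, "angle_flag spans two 64-byte loads");

/* Returns 1 when a stack array declared with a leading _Alignas(64)
 * starts on a 64-byte boundary. */
static int demo_angle_flag_is_aligned(void)
{
    _Alignas(64) int64_t angle_flag[16];
    for (int i = 0; i < 16; i++)
        angle_flag[i] = i;
    return ((uintptr_t)(const void *)angle_flag % 64u) == 0 &&
           angle_flag[15] == 15;
}
```

Without the specifier an int64_t array is only guaranteed 8-byte alignment, so a load promoted to an aligned instruction may or may not fault depending on stack layout.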

0031 — Batch-A upstream-port small-fix sweep (ports of unmerged PRs)

  • Workstream PRs: feat/batch-a-upstream-small-fix-sweep — commits 546a40ee (T0-1), 8fed8ad1 (T4-4), 83a1db46 (T4-5), 34425dee (T4-6). ADRs 0131, 0132, 0134, 0135.
  • Touches:
  • core/src/cuda/picture_cuda.c (one-line cuMemFree port of Netflix#1382)
  • core/src/feature/feature_collector.c + core/test/test_feature_collector.c (mount/unmount bugfix port of Netflix#1406 + shared-helper test refactor)
  • core/src/meson.build (declare_dependency + override_dependency port of Netflix#1451)
  • core/include/libvmaf/model.h, core/src/model.c, core/test/test_model.c, docs/api/index.md (built-in model iterator port of Netflix#1424)
  • Invariant: each of the four upstream PRs is OPEN (unmerged) on the port date; when Netflix merges any of them, the fork's version is correction-bearing (T4-4 test refactor, T4-6 three defect fixes + Doxygen doc expansion), not line-identical. Resolution on upstream merge is always "keep fork version" because the fork's version already satisfies the PR's intent and additionally fixes the defects.
  • Netflix#1406 conflict will land in test_feature_collector.c — fork uses load_three_test_models() helper vs upstream's inline per-model VmafModel *m0, *m1, *m2; duplication.
  • Netflix#1424 conflict will land in core/src/model.c and core/test/test_model.c — fork uses else if guard + idx + 1 < CNT + const-qualified test types.
  • Netflix#1382 and Netflix#1451 are line-identical in substance; merge should be clean aside from trailing-comma style drift.
  • Re-test:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_feature_collector test/test_model
build/test/test_feature_collector
build/test/test_model
# Expected: 6/6 pass in test_feature_collector (mount/unmount
# 3-model sequences); 39/39 pass in test_model (includes
# test_version_next full-iteration invariant).

0032 — Thread-local locale handling for numeric I/O (port of Netflix/vmaf#1430)

  • Workstream PRs: port/netflix-1430-thread-locale (T4-3 from the "Batch-A follow-up" sweep, 2026-04-20).
  • Touches: core/src/thread_locale.h / core/src/thread_locale.c (new, upstream-authored); core/src/meson.build (two cdata.set('HAVE_USELOCALE'/'HAVE_XLOCALE_H') probes + src_dir + 'thread_locale.c' in libvmaf_sources); core/src/output.c (four writers gain push_c() + pop() bracket, preserving fork's ferror(outfile) ? -EIO : 0 return contract from ADR-0119); core/src/svm.cpp (drop <locale.h> include; replace setlocale/strdup/setlocale bracket with vmaf_thread_locale_push_c/pop; add buffer.imbue(std::locale::classic()) to both SVM parser ctors with fork's K&R + 4-space style); core/src/read_json_model.c (bracket model_parse with push/pop); core/test/meson.build (new test_locale_handling target + test registration); core/test/test_locale_handling.c (new, upstream-authored with three fork corrections for the score_format parameter).
  • Invariant: fork's output writers return ferror(outfile) ? -EIO : 0 — this must survive any upstream refactor of the writer bodies. The push_c() call MUST be paired with a pop() on every return path (writer bodies have a single tail return, so the pattern is locally push → body → pop → return ferror-check). Dropping pop() leaks a locale_t on POSIX and leaves the thread locked to "C" on Windows.
  • Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_locale_handling
# Repro the user-visible failure without the fix:
LC_ALL=de_DE.UTF-8 build/tools/vmaf --reference ref.yuv \
    --distorted dis.yuv --width 1920 --height 1080 \
    --pixel_format 420 --bitdepth 8 --output result.json \
    --json
# Assert output contains period decimals, not comma.
python -c "import json; d=json.load(open('result.json')); \
    assert all('.' in repr(v) for v in \
    [f['metrics']['vmaf'] for f in d['frames']])"
  • On upstream sync: when Netflix merges PR #1430, the (cherry picked from commit 054a97ed…) trailer in git log port/netflix-1430-thread-locale lets the next /sync-upstream skip this commit. If the upstream diff drifts, redo the three fork corrections listed in ADR-0137 §Decision.
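
The period-decimal assertion in the re-test parses the file before it asserts, so a comma-decimal file fails inside json.load rather than at the assert itself. A minimal sketch of a check that reports both cases as a plain boolean, assuming the frames → metrics → vmaf layout used by the one-liner (the function name is hypothetical):

```python
import json


def uses_period_decimals(json_text):
    """Return True if json_text parses and every per-frame vmaf score is a number.

    Assumes the frames[i]["metrics"]["vmaf"] layout from the re-test one-liner.
    """
    try:
        doc = json.loads(json_text)
    except json.JSONDecodeError:
        # A comma decimal such as 93,5 is not valid JSON, so a
        # locale-corrupted file is rejected at parse time.
        return False
    scores = [frame["metrics"]["vmaf"] for frame in doc["frames"]]
    return all(isinstance(score, (int, float)) for score in scores)


good = '{"frames": [{"metrics": {"vmaf": 93.5}}]}'
bad = '{"frames": [{"metrics": {"vmaf": 93,5}}]}'
print(uses_period_decimals(good), uses_period_decimals(bad))
```

Running it on result.json from the LC_ALL=de_DE.UTF-8 repro gives a single pass/fail answer instead of a traceback.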

0033 — SSIM / MS-SSIM SIMD bit-exact to scalar via per-lane scalar double

  • Workstream PRs: feat/ms-ssim-decimate-neon (this PR — companion to the ADR-0138 convolve fast path).
  • Touches: core/src/feature/x86/ssim_avx2.c and core/src/feature/x86/ssim_avx512.c (ssim_accumulate_* rewritten; ssim_precompute_* and ssim_variance_* unchanged, as they were already bit-exact). Plus the new bit-exact convolve_avx2.c / convolve_avx512.c and the upstream h-pass OOB fix at iqa/convolve.c:159.
  • Invariants (see ADR-0139 §Decision):
  • Convolve taps: single-rounded float*float → widen → double add, NO FMA. Mirrors scalar sum += img[i]*k[j] in iqa/convolve.c.
  • SSIM accumulate — scalar's 2.0 * literal (2.0 * ref_mu[i] * cmp_mu[i] + C1 and 2.0 * srsc + C2) is a C double literal. Both SIMD accumulators do the 2.0 * numerator + division + final l*c*s product per-lane in scalar double to match scalar type promotions byte-for-byte.
  • H-pass outer-loop bound: y < dst_h + vc - kh_even (not y < dst_h + vc); the - kh_even is load-bearing because the last cache row on even-tap kernels (e.g. box-8) is never read by the v-pass but was previously written OOB when image height equals kernel height.
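
The convolve-tap rule can be illustrated outside C. The sketch below is a Python model, not the fork's kernel: it emulates binary32 rounding with struct and shows that rounding each product to float before the double add gives a different sum than keeping the product at full precision. All function names are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_single_rounded(img, kernel):
    """float*float rounded once to binary32, widened, then added in double."""
    total = 0.0
    for a, b in zip(img, kernel):
        # The binary64 product of two binary32 values is exact, so the
        # to_f32 call below is the only rounding applied to the product.
        product = to_f32(to_f32(a) * to_f32(b))
        total += product  # accumulate in binary64
    return total


def dot_unrounded(img, kernel):
    """Contrast case: the product is never rounded to binary32."""
    total = 0.0
    for a, b in zip(img, kernel):
        total += to_f32(a) * to_f32(b)
    return total


print(dot_single_rounded([0.1], [0.3]) == dot_unrounded([0.1], [0.3]))
```

The two results differ for these inputs, which is why a SIMD path that changes where the product is rounded cannot stay bit-exact with the scalar loop.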

Fork-local SSIM SIMD is NOT upstream. If upstream ever adds their own SSIM AVX2/AVX-512, keep the fork's version on conflict; it's the only variant verified bit-exact to scalar at --precision max.
  • Re-test:

meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_iqa_convolve test_ms_ssim_decimate
# Bit-exactness check across dispatch backends:
FIX=python/test/resource/yuv/checkerboard_1920_1080_10_3_0_0.yuv
DIS=python/test/resource/yuv/checkerboard_1920_1080_10_3_1_0.yuv
for m in 255 16 0; do
  build/tools/vmaf --cpumask $m --reference $FIX --distorted $DIS \
      --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
      --feature float_ssim --feature float_ms_ssim \
      --output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_16.xml)    # expect empty
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_0.xml)     # expect empty
  • On upstream sync: the AVX2/AVX-512 SSIM surface is entirely fork-local (upstream has VIF/ADM/motion/CAMBI SIMD but no SSIM). If upstream ever introduces SSIM SIMD, their kernel bodies will almost certainly compute l*c*s in vector float for throughput — do not adopt. The fork's per-lane-scalar-double reduction is required for the bit-exactness claim. Same applies to convolve_avx2/512 — they are fork-only; dispatch sits in ssim_tools.c via _iqa_convolve_set_dispatch.

0034 — SIMD DX framework + NEON SSIM/convolve bit-exact port

  • Workstream PRs: feat/simd-dx-framework (this PR, PR #A); ships the two demos on top of which PR #B will consume the framework (ssimulacra2, motion_v2, vif_statistic, ...).
  • Touches: core/src/feature/simd_dx.h (new header), core/src/feature/arm64/convolve_neon.c + convolve_neon.h (new NEON port), core/src/feature/arm64/ssim_neon.c (ssim_accumulate_neon rewritten for ADR-0139 bit-exactness; precompute + variance unchanged), core/src/feature/float_ssim.c + core/src/feature/float_ms_ssim.c (wire iqa_convolve_neon into the aarch64 dispatch setters), core/src/meson.build (arm64_sources += convolve_neon.c), core/test/meson.build (test_iqa_convolve arch filter extended to arm64 / aarch64), core/test/test_iqa_convolve.c (NEON variant check + aarch64 CPU flag detection), core/test/dnn/meson.build (test_cli.sh gated on not meson.is_cross_build() — bash invokes $VMAF_BIN directly so meson's exe_wrapper isn't applied), new build-aux/aarch64-linux-gnu.ini meson cross-file, .claude/skills/add-simd-path/SKILL.md (upgraded kernel-spec flags).
  • Invariants (see ADR-0140 §Decision):
  • simd_dx.h is fork-local. Keep the fork's version on upstream conflict. Macro names are ISA-suffixed (_AVX2_4L, _AVX512_8L, _NEON_4L) — do not collapse into a cross-ISA abstraction; the fork's SIMD policy (user-memory feedback_simd_dx_scope.md) rules out Highway / simde / xsimd.
  • The ADR-0138 widen-then-add rule (single-rounded float * float → widen → double add, NO FMA) applies to NEON exactly as to AVX2 / AVX-512. The NEON form uses paired float64x2_t accumulators (lo / hi) because NEON has no float64x4_t.
  • The ADR-0139 per-lane scalar-double reduction rule applies to ssim_accumulate_neon exactly as to the AVX2 / AVX-512 variants. The NEON implementation uses SIMD_ALIGNED_F32_BUF_NEON (_Alignas(16) float name[4]) + a 4-iteration scalar loop.
  • Re-test (requires aarch64-linux-gnu-gcc + qemu-user-static + aarch64 sysroot at /usr/aarch64-linux-gnu):
cd libvmaf
meson setup ../build-aarch64 \
  --cross-file ../build-aux/aarch64-linux-gnu.ini \
  -Denable_cuda=false -Denable_sycl=false -Denable_dnn=disabled
cd ..
ninja -C build-aarch64
meson test -C build-aarch64                       # expect 31/31 OK
# Bit-exactness check scalar vs NEON under QEMU:
REF=python/test/resource/yuv/src01_hrc00_576x324.yuv
DIS=python/test/resource/yuv/src01_hrc01_576x324.yuv
for m in 255 0; do
  LD_LIBRARY_PATH=$PWD/build-aarch64/src qemu-aarch64-static \
    -L /usr/aarch64-linux-gnu build-aarch64/tools/vmaf \
    --cpumask $m --reference $REF --distorted $DIS \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature float_ssim --feature float_ms_ssim \
    --output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
     <(grep -v '<fyi fps' /tmp/ssim_0.xml)     # expect empty
  • On upstream sync: upstream has no NEON SSIM and no NEON convolve for IQA. If they ever add one, keep the fork's version on conflict — the fork's NEON path is the only variant verified bit-exact to scalar at --precision max. The build-aux/aarch64-linux-gnu.ini cross-file has no upstream equivalent. The /add-simd-path skill is fork-only; upstream doesn't ship .claude/skills/.

0036 — Port Netflix generalised AVX convolve + ADR-0141 cleanup

  • Workstream PRs: port/upstream-f3a628b4-generalized-avx-convolve (this PR).
  • Upstream commit: f3a628b4 "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21).
  • Touches:
  • convolution.h — upstream-tracking: adds #define MAX_FWIDTH_AVX_CONV 17.
  • convolution_avx.c — upstream-tracking (2,500 LoC deletion) plus fork-delta cleanup per ADR-0141: four scanline helpers convolution_f32_avx_s_1d_* changed from external linkage to static (no other TU uses them after the specialised-path removal); stride parameters widened from int to ptrdiff_t in the helpers, with (ptrdiff_t) casts at public-function multiplication sites; #include <stddef.h> added for the type.
  • core/src/feature/vif_tools.c — upstream-tracking: three AVX dispatch sites drop the fwidth == 17 || ... == 3 whitelist in favour of fwidth <= MAX_FWIDTH_AVX_CONV.
  • python/test/quality_runner_test.py, python/test/vmafexec_test.py — upstream-authored loosening of two full-VMAF-score assertions from places=2 (±0.005) to places=1 (±0.05). Adopted per the ADR-0142 Netflix-authority precedent (project rule #1 addresses fork drift, not upstream-authored test updates the fork must track).
  • Invariants (see ADR-0143 §Decision):
  • Static linkage on scanline helpers — upstream leaves the four convolution_f32_avx_s_1d_*_scanline helpers with external linkage out of habit; the fork narrows them to static. On upstream sync: if upstream ever externs them from another TU, that's a flag to re-audit; keep the fork's static unless the reference is real.
  • ptrdiff_t strides inside helpers: the public convolution_f32_avx_*_s wrappers keep int strides (matching the upstream interface + convolution.h declarations). Helpers take ptrdiff_t to silence bugprone-implicit-widening-of-multiplication-result. If upstream changes the public interface to ptrdiff_t, drop the fork's wrapper-level casts.
  • MAX_FWIDTH_AVX_CONV = 17 — the ceiling is upstream's; if upstream bumps it, the fork must rebuild + re-run the VIF golden test pair.
  • Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build            # expect 32/32 OK
clang-tidy -p build core/src/feature/common/convolution_avx.c
# Zero warnings expected on the touched file.

Netflix CPU golden CI leg exercises the two loosened assertions; confirmed locally under meson test.
  • On upstream sync: upstream is the source of truth for convolution_avx.c, convolution.h, vif_tools.c dispatch, and the two python golden tolerances. On a rebase, prefer upstream for those files except:
  • Keep the fork's static on the four scanline helpers.
  • Keep the fork's ptrdiff_t helper signatures + multiplication-site casts (unless upstream adopts them too, in which case converge).
  • Keep the fork's #include <stddef.h>.
If upstream re-introduces a specialised fast path for common widths, evaluate on a per-fwidth perf profile; the fork's /profile-hotpath skill covers this.
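
For reference, unittest's assertAlmostEqual(a, b, places=N) passes when the absolute difference rounds to zero at N decimal places, which is where the ±0.005 and ±0.05 bounds for the two loosened assertions come from. A small sketch with made-up scores (not values from the test suite):

```python
def almost_equal(first, second, places):
    """Mirror of the pass condition used by unittest's assertAlmostEqual."""
    return round(abs(first - second), places) == 0


expected = 76.70  # hypothetical golden score
measured = 76.73  # hypothetical score after the convolve change

print(almost_equal(measured, expected, places=2))  # tighter, pre-port tolerance
print(almost_equal(measured, expected, places=1))  # looser, post-port tolerance
```

A 0.03 drift fails at places=2 and passes at places=1, so the loosened tests still catch anything at or beyond the 0.05 boundary.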

0038 — motion_v2 NEON SIMD (fork-local)

  • Workstream PR: port/motion-bundle-neon-and-updates (this PR).
  • Upstream: none — aarch64 NEON for motion_v2 is fork-local. Upstream scalar + AVX2 + AVX-512 variants exist; this PR adds the missing NEON fourth path. Scalar is the bit-exactness ground truth.
  • Touches (fork-local):
  • motion_v2_neon.c — new TU, ~300 LoC. 4-wide int32 SIMD over the 5-tap Gaussian pipeline. Five static inline helpers keep every function under the ADR-0141 60-line budget.
  • motion_v2_neon.h — new header declaring the two public entry points.
  • integer_motion_v2.c — dispatch update: adds an #if ARCH_AARCH64 block in init that selects the NEON variant when VMAF_ARM_CPU_FLAG_NEON is present, mirroring the existing x86 dispatch blocks.
  • core/src/meson.build — add arm64/motion_v2_neon.c to the arm64_sources list.
  • Invariants (see ADR-0145 §Decision):
  • Arithmetic right-shift throughout. The fork's AVX2 path uses _mm256_srlv_epi64 (logical) which can diverge from scalar on negative-diff pixels. The NEON port uses vshrq_n_s64(v, 16) for the known Phase-2 shift and vshlq_s64(v, -(int64_t)bpc) for the variable Phase-1 shift — both arithmetic, matching scalar C >> on signed integer. On rebase: keep the arithmetic forms; do NOT adopt vshrq_n_u64 or a logical emulation even if it runs faster.
  • 4-lane stride + mirror tails. SIMD stride = 4; scalar tails cover the remainder. The Phase-2 helper x_conv_row_sad_neon hands 4 lanes to x_conv_block4_neon and drops to scalar for both left/right edges (j < 2 and j + 6 > w). On rebase: preserve the 4-lane stride and the two-sided scalar tail.
  • Signature parity with AVX2. Both pipeline entry points match the AVX2 + AVX-512 variants' (const uint8_t *prev, ptrdiff_t, const uint8_t *cur, ptrdiff_t, int32_t *y_row, unsigned w, unsigned h, unsigned bpc) signature. On rebase: if upstream changes the signature, mirror the change here AND in the x86 variants in lockstep.
  • Re-test:
meson setup build-aarch64 libvmaf \
  --cross-file build-aux/aarch64-linux-gnu.ini \
  -Denable_cuda=false -Denable_sycl=false
ninja -C build-aarch64
meson test -C build-aarch64 --no-rebuild   # expect 31/31 OK
clang-tidy -p build-aarch64 \
  core/src/feature/arm64/motion_v2_neon.c
# Zero warnings expected on the touched file.

# NEON-vs-scalar bit-exact diff under QEMU:
YUV=python/test/resource/yuv
for mask in 0 255; do
  LD_LIBRARY_PATH=build-aarch64/src \
    qemu-aarch64-static -L /usr/aarch64-linux-gnu \
    build-aarch64/tools/vmaf \
    -r $YUV/src01_hrc00_576x324.yuv \
    -d $YUV/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 -n --feature motion_v2 \
    --cpumask $mask -o /tmp/mv2_$mask.xml --precision max
done
diff <(grep -v 'fps=' /tmp/mv2_0.xml) \
     <(grep -v 'fps=' /tmp/mv2_255.xml)  # expect empty
  • On upstream sync: upstream has no NEON motion_v2 and has not signalled plans to add one. If they ever do, diff their NEON against the fork's: on logical-vs-arithmetic shift, keep the fork's arithmetic form (matches scalar). On the function decomposition (the five helpers), adopt upstream's if it's smaller; the fork's layout is ADR-0141-driven, not a semantic contract.
  • Follow-up T7-32 (fixed 2026-05-09): The _mm256_srlv_epi64 (logical right shift) in motion_score_pipeline_16_avx2 was replaced with srav_epi64_imm, an AVX2-safe arithmetic-right-shift emulation: logical shift OR sign-fill mask via srai_epi32 + slli_epi64. Two bugs were closed in the same PR:
  • AVX2 logical-vs-arithmetic shift: _mm256_srlv_epi64 replaced by srav_epi64_imm in core/src/feature/x86/motion_v2_avx2.c. The emulation is bit-exact with scalar C >> bpc on signed int64_t.
  • Test scalar reference mirror: mirror_idx in core/test/test_motion_v2_simd.c used 2*size - idx - 1 instead of 2*size - idx - 2, diverging from integer_motion_v2.c::mirror(). Fixed to -2. All four adversarial fixtures (neg-diff bpc10/12, mixed-diff bpc10/12) now pass. meson test -C build 50/50 OK. On rebase: keep srav_epi64_imm; do not revert to _mm256_srlv_epi64. The rebase-time invariant is now: AVX2 path uses arithmetic shift (matching NEON and scalar).
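
The logical-vs-arithmetic divergence and the "logical shift OR sign-fill mask" emulation can be modelled on plain integers. This is a sketch of the arithmetic only, not the AVX2 or NEON code: it treats one 64-bit lane as a Python int, and every name in it is hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed64(u):
    """Reinterpret an unsigned 64-bit pattern as a signed value."""
    return u - (1 << 64) if u & (1 << 63) else u


def srl64(value, shift):
    """Logical right shift of a 64-bit lane: zero-fills the top bits."""
    return to_signed64((value & MASK64) >> shift)


def sra64_emulated(value, shift):
    """Arithmetic right shift built as logical shift OR sign-fill mask."""
    logical = (value & MASK64) >> shift
    # For negative inputs the top `shift` bits must be ones, not zeros.
    sign_fill = (MASK64 << (64 - shift)) & MASK64 if value < 0 else 0
    return to_signed64(logical | sign_fill)


for bpc in (8, 10, 12):
    for diff in (-12345, -1, 0, 7, 1 << 40):
        # Python's >> on int is arithmetic, like scalar C >> on signed values.
        assert sra64_emulated(diff, bpc) == diff >> bpc

print(srl64(-16, 2), sra64_emulated(-16, 2))
```

Only negative inputs separate the two shifts, which is why the adversarial fixtures in test_motion_v2_simd.c are built around negative-diff pixels.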

0039 — readability-function-size NOLINT sweep (ADR-0146)

  • ADR: ADR-0146
  • Touches:
  • core/src/dict.c
  • core/src/picture.c
  • core/src/picture_pool.c
  • core/src/predict.c
  • core/src/libvmaf.c
  • core/src/output.c
  • core/src/read_json_model.c
  • core/src/feature/feature_extractor.c
  • core/src/feature/feature_collector.c
  • core/src/feature/iqa/convolve.c
  • core/src/feature/iqa/ssim_tools.c
  • core/src/feature/x86/vif_statistic_avx2.c
  • Invariant: every readability-function-size NOLINT suppression has been replaced by a set of small static (or static inline, for the SIMD / IQA files) helpers. The helper names are stable interfaces the surrounding code depends on (e.g. iqa_convolve_1d_separable, iqa_convolve_2d, ssim_compute_stats, ssim_workspace_alloc / _free, vif_stat_simd8_compute / _reduce, struct vif_simd8_lane, read_pictures_extractor_loop, read_pictures_post_extractor, read_pictures_validate_and_prep, read_pictures_update_prev_ref). Upstream Netflix has no equivalent helpers; rebases touching any of these files will conflict against the fork's split shape.
  • On upstream sync:
  • If upstream lands a different decomposition of _iqa_convolve or _iqa_ssim, prefer upstream's shape only if it keeps the ADR-0138 / ADR-0139 bit-exactness invariants (single-rounded float mul → widen to double → double add; per-lane scalar-double reduction through aligned temp buffer). Otherwise keep the fork's split and re-document the divergence here.
  • The fork renamed _calc_scale → iqa_calc_scale to clear the bugprone-reserved-identifier check. If upstream modifies _calc_scale, keep the fork's name and port the behavioural change.
  • model_collection_parse_loop writes directly to cfg_name rather than through c->name — if upstream ever rewrites model_collection_parse, preserve the direct write (it's what lets the param stay non-const without a NOLINT).
  • Re-test on rebase (x86, any libsvm-less host):
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 -o /tmp/vmaf_$mask.xml
done
diff <(grep -v fyi /tmp/vmaf_0.xml) <(grep -v fyi /tmp/vmaf_255.xml)
# expect exit 0 (Netflix-golden-pair VMAF bit-identical scalar vs SIMD)

Also run clang-tidy -p build on every file in Touches; expect zero warnings.
  • Follow-up T7-6: decide whether to rename the _iqa_* API surface (convolve / ssim / decimate / img_filter / filter_pixel / get_pixel) across all callers to clear the remaining bugprone-reserved-identifier suppressions in ssim.c, ms_ssim.c, float_ms_ssim.c. Out of scope here.

0040 — Thread-pool job recycling + inline data buffer (ADR-0147)

  • ADR: ADR-0147
  • Touches: core/src/thread_pool.c
  • Invariants:
  • VmafThreadPoolJob carries a fixed-size char inline_data[64] buffer. Payloads ≤ 64 bytes go through memcpy(job->inline_data, data, data_sz) + job->data = job->inline_data; payloads > 64 bytes take the legacy malloc path. The cleanup path MUST distinguish the two via job->data != job->inline_data — a naive free(job->data) would corrupt the slot. Enforced in vmaf_thread_pool_job_clear_data.
  • free_jobs list is protected by the existing queue.lock; enqueue pops from it before mallocing, runner recycles onto it after running a job. vmaf_thread_pool_destroy walks the list after vmaf_thread_pool_wait returns (all workers have exited → no lock needed). Any reorder that frees the queue lock before the free_jobs walk is a leak on shutdown.
  • Fork's void (*func)(void *data, void **thread_data) signature + per-worker VmafThreadPoolWorker are fork-local; upstream Netflix #1464 has func(void *data). Keep the fork's signature on any rebase — callers (src/libvmaf.c:threaded_enqueue_one etc.) depend on the two-arg form.
  • On upstream sync: Netflix PR #1464 is CLOSED (not merged) and bundles twelve unrelated optimizations. Only the thread-pool portion is ported here. If upstream ever reopens and merges #1464 (or a successor), cherry-pick only the pool mechanics; reject the payload-signature changes, the ADM / VIF / predict.c pieces (they conflict with ADR-0138 / 0139 / 0142 bit-exactness and with T7-5 predict.c refactor), and the feature-collector capacity bump (fork already capped at 8 for a reason — see src/feature/feature_collector.c).

  • Re-test on rebase (x86, any libsvm-less host):

ninja -C build && meson test -C build
for threads in 1 4; do
  for mask in 0 255; do
    VMAF_CPU_MASK=$mask ./build/tools/vmaf \
      --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
      --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
      --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
      -m version=vmaf_v0.6.1 --threads $threads -o /tmp/vmaf_${threads}_${mask}.xml
  done
done
# Expect bit-identical scores (attribute order may differ across
# --threads 1 vs --threads 4 because feature-collector emits in
# insertion order; the numeric values match).
diff <(grep -v fyi /tmp/vmaf_4_0.xml) <(grep -v fyi /tmp/vmaf_4_255.xml)
# expect exit 0 (scalar vs SIMD threaded)

Also run clang-tidy -p build core/src/thread_pool.c — expect zero warnings. Re-run the 500 000-job micro-benchmark from ADR-0147 §Decision if performance is under investigation.

0041 — IQA reserved-identifier rename + cleanup (ADR-0148)

  • ADR: ADR-0148
  • Touches: 21 files across core/src/feature/ (iqa/{convolve,decimate,ssim_tools}.{c,h}, iqa/ssim_simd.h, ssim.c, integer_ssim.c, ms_ssim.c, ms_ssim_decimate.h, float_ssim.c, float_ms_ssim.c, x86/convolve_avx2.{c,h}, x86/convolve_avx512.{c,h}, arm64/convolve_neon.{c,h}, AGENTS.md) plus core/test/test_iqa_convolve.c.
  • Invariants:
  • Every _iqa_* / _kernel / _ssim_int / _map_reduce / _map / _reduce / _context / _ms_ssim_* / _ssim_* / _alloc_buffers / _free_buffers symbol and the four underscore-prefixed header guards (_CONVOLVE_H_, _DECIMATE_H_, _SSIM_TOOLS_H_, __VMAF_MS_SSIM_DECIMATE_H__) is renamed to its non-reserved spelling. The fork's IQA surface no longer uses C's reserved-identifier name space.
  • The clang-analyzer-security.ArrayBound NOLINT bracket in ssim_accumulate_row and ssim_reduce_row_range (integer_ssim.c) is load-bearing — the inner kernel-loop k_min / k_max clamping is provably correct (k_min = max(0, hkernel_offs - x), k_max = min(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1))) but the analyzer can't follow it across helper boundaries. Do not collapse the bracket.
  • The clang-analyzer-unix.Malloc NOLINT bracket in test_iqa_convolve.c (check_simd_variant, check_case) is intentional — test exits process on failure path; small allocations leak by design at test end. Do not refactor to free-on-exit.
  • The cross-TU NOLINT pattern on compute_ssim (ssim.c) and compute_ms_ssim (ms_ssim.c) — clang-tidy misc-use-internal-linkage runs per-TU and can't see the header bridge to float_ssim.c / float_ms_ssim.c. Keep the inline justification comment.
  • On upstream sync:
  • The Netflix upstream IQA library (tjdistler/iqa) has been effectively abandoned (last meaningful commit pre-2020). Future rebases will conflict on every renamed symbol; drop the underscore-prefix on each conflict and mirror the fork's iqa_* naming.
  • If upstream Netflix/vmaf ever reincorporates the IQA naming wholesale, prefer the fork's spellings — this PR is a one-shot mechanical rename with no semantic content.
  • Re-test on rebase:
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 \
    --feature float_ssim --feature float_ms_ssim \
    -o /tmp/iqa_$mask.xml
done
diff <(grep -v fyi /tmp/iqa_0.xml) <(grep -v fyi /tmp/iqa_255.xml)
# expect exit 0 (bit-identical scalar vs SIMD on float_ssim/ms_ssim)

Also run clang-tidy -p build on every touched file (excluding arm64/); expect zero warnings.

0042 — Port Netflix #1376 — FIFO-hang fix via Semaphore (ADR-0149)

  • ADR: ADR-0149
  • Upstream commit: Netflix PR #1376, head 1c06ca4f1bb5da38b54db075a27c35ba8ea9d7b7 (OPEN upstream as of 2026-04-24).
  • Touches:
  • python/vmaf/core/executor.py — base Executor class + ExternalVmafExecutor-style subclass; delete _wait_for_workfiles / _wait_for_procfiles polling loops; rewrite _open_{work,proc}files_in_fifo_mode around multiprocessing.Semaphore(0); add open_sem=None kwarg to every _open_{ref,dis}_{work,proc}file and to the _open_workfile staticmethod; drop unused from time import sleep.
  • python/vmaf/core/raw_extractor.py: AssetExtractor + DisYUVRawVideoExtractor; add open_sem=None to _open_{ref,dis}_workfile overrides (release on entry since these are no-ops); delete _wait_for_workfiles overrides; drop unused from time import sleep.
  • Fork carve-outs (load-bearing on rebase):
  • python/vmaf/__init__.py:__version__ stays "3.0.0" — do NOT port upstream's bump to "4.0.0". The fork tracks its own versioning (v3.x.y-lusoris.N) per ADR-0025.
  • from time import sleep is dropped from both files — upstream leaves the import in place (unused after their patch); the fork removes it because ADR-0141 touched-file rule requires ruff F401 clean.
  • Upstream typo preserved: the subclass warning message contains "to be created to be created". Comments note the typo inline; do not silently fix on rebase, since it's upstream-authored and project policy is verbatim port.
  • On upstream sync: upstream PR #1376 is still OPEN. When it merges, re-diff against the merged form; the touched hunks should be conflict-free because the fork now carries the same shape. Re-check whether upstream fixed the "to be created to be created" typo; if so, adopt the fix (it becomes a simple string update).
  • Re-test:
python3 -m py_compile python/vmaf/core/executor.py \
                       python/vmaf/core/raw_extractor.py
ruff check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
black --check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
# all silent

# No FIFO-mode unit test in the tree; end-to-end harness
# exercise (needs libsvm + ffmpeg + fixtures) goes via
#   make test-netflix-golden
# which doesn't exercise fifo_mode path but does verify the
# refactor didn't break executor.py imports.
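
The shape of the Semaphore-based handshake can be sketched in isolation. This is not the upstream patch: it swaps multiprocessing.Semaphore(0) for threading.Semaphore(0) so it runs with plain threads, and open_workfile / open_all_in_fifo_mode are hypothetical stand-ins for the _open_*file methods. Only the open_sem=None keyword mirrors the entry above.

```python
import threading


def open_workfile(path, opened, open_sem=None):
    """Hypothetical opener: record the open, then signal the waiter."""
    opened.append(path)  # stand-in for the real blocking FIFO open
    if open_sem is not None:
        open_sem.release()


def open_all_in_fifo_mode(paths, timeout=5.0):
    """Start one opener per path and wait on the semaphore, not on sleep()."""
    open_sem = threading.Semaphore(0)  # starts at 0: nothing opened yet
    opened = []
    workers = [
        threading.Thread(target=open_workfile, args=(path, opened, open_sem))
        for path in paths
    ]
    for worker in workers:
        worker.start()
    # One acquire per opener; each call blocks until some opener has released.
    all_signalled = all(open_sem.acquire(timeout=timeout) for _ in paths)
    for worker in workers:
        worker.join()
    return all_signalled, sorted(opened)


print(open_all_in_fifo_mode(["ref.fifo", "dis.fifo"]))
```

The waiter blocks on acquire instead of polling for the files to appear, which is the property the removed _wait_for_workfiles / _wait_for_procfiles loops lacked.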

0043 — Port Netflix #1472 — CUDA on Windows MSYS2/MinGW (ADR-0150)

  • ADR: ADR-0150
  • Upstream commits: Netflix PR #1472, 15745cdf (portability) + b7b65e64 (meson plumbing). Both OPEN upstream as of 2026-04-24.
  • Touches:
  • core/src/cuda/common.h: drop <pthread.h> include; rename reserved header guard __VMAF_SRC_CUDA_COMMON_H__ → VMAF_SRC_CUDA_COMMON_INCLUDED.
  • core/src/cuda/cuda_helper.cuh: #ifdef DEVICE_CODE guard around <cuda.h> vs <ffnvcodec/dynlink_loader.h>.
  • core/src/picture.h: #ifdef DEVICE_CODE guard around <cuda.h> + forward-declare VmafCudaState vs <ffnvcodec/*> + full libvmaf_cuda.h; rename reserved header guard.
  • core/src/feature/integer_adm.h — updated comment above dwt_7_9_YCbCr_threshold table noting the fork's positional-initializer shape vs upstream's #ifndef __CUDACC__ shape (see §Fork carve-outs).
  • core/src/feature/cuda/integer_adm/{adm_cm,adm_csf,adm_csf_den,adm_decouple,adm_dwt2}.cu: #ifndef DEVICE_CODE guard around #include "feature_collector.h".
  • core/src/meson.build — Windows nvcc plumbing (+70 LoC under host_machine.system() == 'windows'): vswhere-based cl.exe discovery, MSVC + Windows SDK include path injection, CUDA version detection via nvcc --version, nvcc_ccbin_flags + nvcc_host_includes threaded through every custom_target that invokes nvcc.
  • Fork carve-outs (load-bearing on rebase):
  • integer_adm.h uses positional initializers, NOT upstream's #ifndef __CUDACC__ wrap. Both shapes resolve the MSVC/nvcc C++-designated-initializer issue; the positional form is C++-portable and keeps the table available to future .cu consumers. Keep the fork's form on rebase.
  • cuda_static_lib keeps dependencies : [pthread_dependency]. Upstream drops it; the fork needs it because ring_buffer.c (built as part of cuda_static_lib) #includes <pthread.h> directly. On rebase: keep the fork's version.
  • meson.build gencode coverage block: the fork's ADR-0122 explicit cubin list (sm_75/80/86/89 + compute_80 PTX) sits after the new upstream nvcc-detect block. On rebase, re-assemble the same merged order: nvcc-detect first, then gencode coverage (both host-independent).
  • Header guards: _INCLUDED spellings are fork-local (ADR-0148 precedent). Upstream keeps reserved __VMAF_SRC_*_H__ spellings. On rebase, keep _INCLUDED.
  • On upstream sync: PR #1472 is still OPEN. When merged, re-diff the three conflict-resolved hunks against upstream's final form. Keep fork's version on the four carve-outs above unless upstream meaningfully reshapes those regions.
  • Re-test on rebase (Linux host with CUDA toolkit):
meson setup libvmaf core/build-cuda \
    -Denable_cuda=true -Denable_nvcc=true -Denable_sycl=false
ninja -C core/build-cuda && meson test -C core/build-cuda
# Expect 6 .fatbin files generated + CLI linked + 35/35 tests pass.

Windows validation is operator-driven: CI does not yet have a Windows + MSYS2 + MinGW + MSVC BuildTools + CUDA runner (tracked as T7-3 in .workingdir2/OPEN.md).
  • Prerequisites note (Windows only): nv-codec-headers must be built from git master commit 876af32 or later. The release tag n13.0.19.0 is missing cuMemFreeHost, cuStreamCreateWithPriority, cuLaunchHostFunc, and other CudaFunctions members libvmaf uses. Pre-existing issue, not scope of this port.

0058 — libvmaf.pc Cflags leak fix (ADR-0200)

  • ADR: ADR-0200; bug-fix follow-up to entry 0057.
  • Upstream source: fork-local. Netflix has no Vulkan backend.
  • Touches:
  • core/subprojects/packagefiles/volk/meson.build — drops -include volk_priv_remap.h from volk_dep.compile_args; keeps -DVK_NO_PROTOTYPES.
  • core/src/vulkan/meson.build — pulls volk_priv_remap_h_path from the volk subproject and appends ['-include', <path>] to vmaf_cflags_common (private c_args: on libvmaf's library() call).
  • Invariants (load-bearing):
  • -include MUST stay off volk_dep.compile_args — otherwise it leaks into static libvmaf.pc Cflags. Test on rebase: meson setup ... -Ddefault_library=static -Denable_vulkan=enabled, then grep Cflags meson-private/libvmaf.pc — must NOT contain volk_priv_remap or any build-dir absolute path.
  • -include MUST be applied to libvmaf's compile — every libvmaf TU that calls volk's vk* API needs the rename macros active. The vmaf_cflags_common injection covers this for all libvmaf sub-libraries (libvmaf_feature, libvmaf_cpu, etc.).
  • The path comes from subproject('volk').get_variable(...), not from a hardcoded string — survives volk wrap version bumps.
  • On upstream sync: zero upstream interaction.
  • Re-test on rebase / volk wrap bump:
meson setup build-vk-static-test libvmaf -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false -Ddefault_library=static
ninja -C build-vk-static-test src/libvmaf.a
grep Cflags build-vk-static-test/meson-private/libvmaf.pc
# Expected: no `volk_priv_remap` substring, no build-dir absolute path
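
The grep check can also be scripted so it fails loudly in CI. A minimal sketch, assuming the .pc file has a single "Cflags:" line and that the caller passes the absolute build directory; the function name and the sample paths are hypothetical:

```python
def cflags_leaks(pc_text, build_dir):
    """Return the Cflags tokens that leak the remap header or the build dir.

    build_dir must be a non-empty absolute path, otherwise every token matches.
    """
    leaks = []
    for line in pc_text.splitlines():
        if not line.startswith("Cflags:"):
            continue
        for token in line[len("Cflags:"):].split():
            if "volk_priv_remap" in token or build_dir in token:
                leaks.append(token)
    return leaks


clean_pc = "Name: libvmaf\nCflags: -I${includedir} -DVK_NO_PROTOTYPES\n"
leaky_pc = (
    "Name: libvmaf\n"
    "Cflags: -I${includedir} -include /tmp/bld/subprojects/volk/volk_priv_remap.h\n"
)
print(cflags_leaks(clean_pc, "/tmp/bld"))
print(cflags_leaks(leaky_pc, "/tmp/bld"))
```

An empty list corresponds to the expected state; any returned token points at the flag that leaked out of volk_dep.compile_args.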

0057 — Volk vk* priv-remap for static-archive builds (ADR-0198)

  • ADR: ADR-0198; follow-up to ADR-0185.
  • Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
  • Touches:
  • core/subprojects/packagefiles/volk/meson.build — overlay applied on top of the upstream volk wrap. Adds a custom_target that runs gen_priv_remap.py to produce volk_priv_remap.h from the upstream volk.h, and wires -include of the generated header into volk.c's c_args and volk_dep's compile_args.
  • core/subprojects/packagefiles/volk/gen_priv_remap.py — fork-added generator script (regex against extern PFN_vkXxx vkXxx; declarations).
  • Invariants (load-bearing):
  • Force-include must propagate to every libvmaf TU pulling in volk_dep — verified via meson dep graph. Removing the -include from compile_args re-introduces the static-link multi-def cascade.
  • Generator regex matches every vk* PFN declaration in volk.h — confirmed for volk-1.4.341 (784 declarations, 784 remaps). Bumping the volk wrap version: re-run the generator (it's a configure-time custom target, so it's automatic) and confirm the rename count printed to stdout matches the count of ^extern PFN_vk lines in the new volk.h.
  • The renamed symbols use the vmaf_priv_ prefix — chosen to match no upstream Netflix or Vulkan SDK identifier. Don't rename to _vk* (collides with reserved-identifier C namespace) or vkv_* etc.
  • On upstream sync: zero upstream interaction. The volk wrap is a libvmaf-managed subproject; Netflix doesn't ship a Vulkan backend.
  • Re-test on rebase / after any volk wrap bump:
meson setup build-vk-static libvmaf -Denable_vulkan=enabled \
    -Denable_cuda=false -Denable_sycl=false \
    -Ddefault_library=static
ninja -C build-vk-static src/libvmaf.a
test "$(nm build-vk-static/src/libvmaf.a 2>/dev/null \
          | grep -cE '^[0-9a-f]* (T|D|B|R) vk[A-Z]')" = "0" \
    && echo OK

(Followed by the BtbN-style link reproducer in the ADR References section.)
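
The effect of a force-included rename header can be modelled in a few lines of plain C. This is an illustrative sketch, not the generated volk_priv_remap.h: the entry point vkDemoCall, the exact #define shape, and every demo_* helper are assumptions; only the vmaf_priv_ prefix comes from the notes above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical one-line excerpt of a rename header. The real header is
 * generated at configure time; its exact shape is an assumption here. */
#define vkDemoCall vmaf_priv_vkDemoCall

#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

typedef int (*PFN_vkDemoCall)(int);

/* With the macro active this defines vmaf_priv_vkDemoCall, so the object
 * file carries no global symbol named vkDemoCall. */
PFN_vkDemoCall vkDemoCall;

static int demo_impl(int x) { return x + 1; }

/* Returns the identifier the compiler actually sees for vkDemoCall. */
static const char *demo_symbol_name(void) { return DEMO_STR(vkDemoCall); }

/* Call sites keep writing vkDemoCall; the preprocessor redirects them. */
static int demo_call_through(int x)
{
    vkDemoCall = demo_impl;
    return vkDemoCall(x);
}
```

Because the rename happens in the preprocessor, it only works when every TU that touches the vk* names sees the same header, which is why the force-include has to reach all of them.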

0056 — SSIMULACRA 2 snapshot gate + fp-contract-off split (ADR-0164)

  • ADR: ADR-0164
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
  • Touches:
  • python/test/ssimulacra2_test.py — new fork-added Python test. Uses subprocess.call against ExternalProgram.vmafexec with --feature ssimulacra2; parses the --json output; asserts pooled + per-frame scores.
  • Invariants (load-bearing):
  • Pinned values are CPU-only, generated on master HEAD after the PR #100 merge. Re-generate them only if the scalar or any SIMD path changes semantically; under the ADR-0161/0162/0163 bit-exactness contract that should not happen, because a bit-exact refactor leaves the pinned values unchanged.
  • Tolerance is 4 decimal places (places=4) — matches 1e-4. The CPU paths are bit-exact so actual drift should be 0; the tolerance is defensive.
  • -ffp-contract=off everywhere in the ssimulacra2 pipeline: libvmaf_ssimulacra2_static_lib (scalar extractor), x86_ssimulacra2_avx2_lib, x86_ssimulacra2_avx512_lib, and arm64_ssimulacra2_lib (from ADR-0161). All four split out of their umbrella libs so other extractors keep upstream's default FMA policy. Without this the CI GCC/clang hosts drifted ~2e-4 from my AVX-512 authoring host — GCC 10+ defaults -ffp-contract=fast on x86 with -mfma and on aarch64, fusing a*b+c in scalar glue around the SIMD calls. Do NOT remove any of these carve-outs on rebase.
  • Fixtures are already checked in: src01_hrc00/01_576x324 is also the primary Netflix golden fixture; the 160×90 derived one stresses the sub-176 pyramid-termination path.
  • Do NOT modify the Netflix golden assertions in quality_runner_test.py et al. — those are upstream-pinned. This test is a SEPARATE file that adds fork-specific scores.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future, cross-reference against their pinning if they add one.
  • Re-test on rebase / after any ssimulacra2 change:
cd python && python -m pytest test/ssimulacra2_test.py -v   # 2/2
  • Follow-ups:
  • Cross-reference gate against libjxl tools/ssimulacra2 when ssimulacra2_rs cargo install is fixed.
  • Expand fixture coverage if new YUV test assets land.
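
Why contraction matters for the pinned scores can be shown with one multiply-add. The sketch below is a model under stated assumptions: the demo_* names are hypothetical, the fused result is emulated in double rather than produced by a real FMA instruction, and the inputs were picked so that the emulation is exact.

```c
#include <assert.h>

/* Two roundings: the product is rounded to float before the add.
 * The volatile store keeps the compiler from contracting a * b + c. */
static float demo_mul_add_unfused(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

/* One rounding, modelling what a fused multiply-add computes. A product
 * of two floats is exact in double (24 + 24 significand bits fit in 53),
 * so for the inputs used here only the final conversion rounds. */
static float demo_mul_add_fused_model(float a, float b, float c)
{
    const double exact = (double)a * (double)b + (double)c;
    return (float)exact;
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the unfused form rounds the product down to 1 + 2^-11 and returns 0, while the fused model keeps the 2^-24 tail. A difference of that kind inside scalar glue code is enough to move a pinned value.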

0055 — SSIMULACRA 2 picture_to_linear_rgb SIMD (ADR-0163)

  • ADR: ADR-0163
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
  • Touches:
  • ssimulacra2_avx2.{c,h} — new ssimulacra2_picture_to_linear_rgb_avx2 + helpers (read_plane_scalar_s2, srgb_to_linear_lane_avx2, compute_matrix_coefs).
  • ssimulacra2_avx512.{c,h} — 16-wide AVX-512 port.
  • ssimulacra2_neon.{c,h} — 4-wide aarch64 port.
  • ssimulacra2.c — new ptlr_fn field in Ssimu2State; dispatch wrapper convert_picture_to_linear_rgb unpacks VmafPicture into simd_plane_t[3]; init assigns AVX2/AVX-512/NEON pointers.
  • ssimulacra2_simd_common.h — new shared header declaring simd_plane_t. Decouples SIMD TUs from VmafPicture type.
  • test_ssimulacra2_simd.c — new test_ptlr_420_8, test_ptlr_420_10, test_ptlr_444_8, test_ptlr_444_10, test_ptlr_422_8 subtests + scalar references ref_read_plane, ref_srgb_to_linear, ref_picture_to_linear_rgb.
  • Invariants (load-bearing):
  • Scalar-order matmul: G = Yn + cb_g * Un + cr_g * Vn is chained left-to-right in all three SIMD TUs. The regression test catches reordering drift (~1 ulp).
  • Per-lane scalar powf — vector polynomial approximation would drift scalar bit-exactness. Do not replace the lane spill/reload pattern with a vector libm.
  • simd_plane_t layout: the {data, stride, w, h} field ordering is assumed by all three SIMD TUs. The dispatch wrapper builds this from VmafPicture fields; layout must match.
  • Bounds clamping in read_plane_scalar_* mirrors scalar reference verbatim (if (sx < 0) sx = 0; if (sx >= pw) sx = pw-1; etc.). Do not simplify — removes per-lane safety at plane edges.
  • Arbitrary chroma ratios fall through to the int64_t multiplication branch. Don't remove it — SSIMULACRA 2 is supposed to accept non-standard ratios gracefully.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides a SIMD YUV→RGB path, diff against the fork's — preserve the bit-exactness contract unless ADR-0142 Netflix-authority carve-out opens.
  • Re-test on rebase:
ninja -C build && build/test/test_ssimulacra2_simd     # 11/11
ninja -C build-aarch64 && \
  qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
    build-aarch64/test/test_ssimulacra2_simd            # 11/11
  • Follow-ups:
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — still pending (gated on tools/ssimulacra2 availability).
  • SSIMULACRA 2 now has zero scalar hot paths. T3-1 closes in full with phases 1+2+3 (ADR-0161, 0162, 0163).
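
The clamped plane read described in the invariants can be sketched as follows. The struct is a hypothetical stand-in for simd_plane_t: the field order follows the notes, but the field types, the 8-bit sample width, and the demo_* names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for simd_plane_t ({data, stride, w, h}). */
typedef struct {
    const void *data;
    ptrdiff_t stride; /* bytes per row (assumed unit) */
    unsigned w, h;
} demo_plane_t;

/* Reads one 8-bit sample with edge clamping written out branch by branch,
 * in the `if (sx < 0) sx = 0; if (sx >= pw) sx = pw - 1;` shape. */
static uint8_t demo_read_plane_clamped(const demo_plane_t *p, int sx, int sy)
{
    const int pw = (int)p->w;
    const int ph = (int)p->h;
    if (sx < 0) sx = 0;
    if (sx >= pw) sx = pw - 1;
    if (sy < 0) sy = 0;
    if (sy >= ph) sy = ph - 1;
    const uint8_t *row = (const uint8_t *)p->data + (ptrdiff_t)sy * p->stride;
    return row[sx];
}
```

Keeping the four branches explicit makes the SIMD helper easy to diff against the scalar reference line by line.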

0054 — SSIMULACRA 2 FastGaussian IIR blur SIMD (ADR-0162)

  • ADR: ADR-0162
  • Upstream source: fork-local. No SSIMULACRA 2 extractor in upstream Netflix/vmaf.
  • Touches:
  • ssimulacra2_avx2.{c,h} — new ssimulacra2_blur_plane_avx2 + 2 helpers (hblur_8rows_avx2, vblur_simd_8cols_avx2).
  • ssimulacra2_avx512.{c,h} — 16-wide port.
  • ssimulacra2_neon.{c,h} — 4-wide aarch64 port, uses vsetq_lane_f32 in place of gather.
  • ssimulacra2.c — adds blur_fn function pointer to Ssimu2State, dispatch in init_simd_dispatch(), call-site in blur_3plane.
  • test_ssimulacra2_simd.c — new test_blur + scalar reference (ref_blur_plane, ref_fast_gaussian_1d).
  • Invariants (load-bearing):
  • Row-batching lane layout — horizontal pass lane i MUST hold row (y_base + i). Gather index vector entries are (y_base + i) * w (stride-w). Changing this breaks bit-exactness vs scalar.
  • Scalar left-to-right summation order: n2_k * sum - d1_k * prev1_k - prev2_k is chained sequentially; o0 + o1 + o2 at output time is (o0 + o1) + o2. Changing to (o0 + o2) + o1 or o0 + (o1 + o2) will drift ~1 ulp and the regression test catches it.
  • col_state is 6 * w contiguous floats — layout is [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2]. SIMD loads assume this layout; changing field order requires updating all three SIMD TUs in lockstep with blur_plane.
  • NEON lane-set pattern — aarch64 has no gather intrinsic; 4 explicit vsetq_lane_f32 calls per input vector. Do not replace with a ld1 {v.s}[lane]-style pseudo-gather without re-verifying bit-exactness.
  • Scalar tail in vertical pass matches scalar reference body verbatim. Any deviation breaks memcmp equality on widths that aren't multiples of the SIMD width.
  • On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides their own IIR blur SIMD, diff against the fork's and preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out is opened.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd  # 6/6
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
  build-aarch64/test/test_ssimulacra2_simd  # 6/6
  • Follow-ups:
  • picture_to_linear_rgb SIMD — last scalar hot path in the extractor. 2 calls / frame. Low ROI but mechanical.
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — still pending.
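
The row-batching lane layout can be written down as a scalar model. This is a sketch only: DEMO_LANES, the demo_* helpers, and the flat float plane are assumptions, and no real gather intrinsic is used.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LANES 8 /* AVX2 float width; an assumption for the sketch */

/* Scalar model of the gather index vector: entry i is (y_base + i) * w,
 * so lane i addresses the start of row (y_base + i). */
static void demo_row_offsets(size_t y_base, size_t w, size_t idx[DEMO_LANES])
{
    for (size_t i = 0; i < DEMO_LANES; i++)
        idx[i] = (y_base + i) * w;
}

/* Scalar model of one gather: lane i receives column x of row y_base + i. */
static void demo_gather_column(const float *plane, size_t w, size_t y_base,
                               size_t x, float lanes[DEMO_LANES])
{
    size_t idx[DEMO_LANES];
    demo_row_offsets(y_base, w, idx);
    for (size_t i = 0; i < DEMO_LANES; i++)
        lanes[i] = plane[idx[i] + x];
}
```

On a 3-wide plane with y_base = 2 and x = 1, lane 0 reads element 7 and lane 7 reads element 28, i.e. one sample from each of rows 2 to 9.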

0053 — SSIMULACRA 2 SIMD bit-exact ports (ADR-0161)

  • ADR: ADR-0161
  • Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor at all (fork-added in ADR-0130).
  • Touches:
  • ssimulacra2_avx2.c / .h — 5 AVX2 kernels + per-lane cbrtf helper.
  • ssimulacra2_avx512.c / .h — 5 AVX-512 kernels; mechanical 16-wide widening of the AVX2 path.
  • ssimulacra2_neon.c / .h — 5 NEON kernels; 4-wide aarch64 mirror.
  • ssimulacra2.c — adds function-pointer dispatch fields to Ssimu2State + init_simd_dispatch() helper, calls go through the pointers.
  • meson.build — registers the three SIMD TUs in x86_avx2_sources / x86_avx512_sources / arm64_sources.
  • test_ssimulacra2_simd.c and test/meson.build — new bit-exact test harness.
  • Invariants (load-bearing):
  • Byte-for-byte bit-exactness to scalar on all 5 vectorised kernels under FLT_EVAL_METHOD == 0. Regression caught pre-merge: naïve pairing (a+b)+(c+d) vs scalar ((a+b)+c)+d drifts by 1 ULP. Keep sequential scalar-order chains in all three SIMD TUs on rebase.
  • cbrtf is per-lane scalar libm, not a polynomial. Any replacement with a vector cbrt would drift the ssimulacra2 score and break the regression test. Keep the spill/reload pattern.
  • ssim_map / edge_diff_map reductions use the ADR-0139 per-lane double scalar tail. Do NOT SIMD-reduce float lanes then lift to double — summation order changes.
  • downsample_2x2 deinterleave uses ISA-appropriate ops: AVX2 vshufps+vpermpd, AVX-512 vpermt2ps, NEON vuzp1q_f32+vuzp2q_f32. After deinterleave, sum order is ((r0e+r0o)+r1e)+r1o matching scalar.
  • #pragma STDC FP_CONTRACT OFF at every TU header. Ignored by aarch64 GCC (non-fatal -Wunknown-pragmas); kept for portability (clang, MSVC).
  • IIR blur + picture_to_linear_rgb stay scalar in this PR. Follow-up PRs target these; when they land, re-verify bit-exactness via test_ssimulacra2_simd expansion.
  • Runtime dispatch order: AVX-512 > AVX2 on x86; NEON on aarch64; scalar fallback. Preserve on rebase.
  • On upstream sync:
  • Upstream has no SSIMULACRA 2 extractor; nothing to merge.
  • If Netflix adopts SSIMULACRA 2 in the future, diff their implementation against the fork's scalar + SIMD TUs; keep the fork's bit-exactness contract absent a specific Netflix-authority carve-out ADR.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd   # 5/5
clang-tidy -p build core/src/feature/x86/ssimulacra2_avx2.c \
                     core/src/feature/x86/ssimulacra2_avx512.c
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
  build-aarch64/test/test_ssimulacra2_simd   # 5/5
clang-tidy -p build-aarch64 \
  core/src/feature/arm64/ssimulacra2_neon.c
  • Follow-ups:
  • IIR blur vectorisation (blur_plane vertical-pass column batching) — the biggest frame-level wallclock win.
  • picture_to_linear_rgb per-lane powf — lower ROI but mechanical.
  • T3-3 SSIMULACRA 2 snapshot-JSON regression test — ADR-0130 deferred; still pending.
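
The 1-ULP drift between the two summation trees is easy to reproduce in scalar C. The helpers below are hypothetical demo code, not taken from the SIMD TUs; the volatile store only forces each step to round to single precision.

```c
#include <assert.h>

/* Float add whose result is forced through a float store, so every step
 * rounds to single precision regardless of FLT_EVAL_METHOD. */
static float demo_addf(float x, float y)
{
    volatile float r = x + y;
    return r;
}

/* Scalar order: ((a + b) + c) + d. */
static float demo_sum_sequential(float a, float b, float c, float d)
{
    return demo_addf(demo_addf(demo_addf(a, b), c), d);
}

/* Naive pairing: (a + b) + (c + d). */
static float demo_sum_paired(float a, float b, float c, float d)
{
    return demo_addf(demo_addf(a, b), demo_addf(c, d));
}
```

For a = 1 and b = c = d = 2^-24, the sequential chain absorbs each small term and stays at 1.0, while the paired form first builds 2^-23 and lands exactly one ULP higher.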

0052 — psnr_hvs SIMD bit-exact ports (ADR-0159 AVX2, ADR-0160 NEON)

  • ADRs: ADR-0159 (AVX2), ADR-0160 (NEON sister port).
  • Upstream source: fork-local. Upstream Netflix/vmaf has no psnr_hvs SIMD path.
  • Touches:
  • core/src/feature/x86/psnr_hvs_avx2.c — AVX2 TU.
  • core/src/feature/x86/psnr_hvs_avx2.h — AVX2 header.
  • core/src/feature/arm64/psnr_hvs_neon.c — NEON TU (sister port, ADR-0160).
  • core/src/feature/arm64/psnr_hvs_neon.h — NEON header.
  • core/src/feature/third_party/xiph/psnr_hvs.c — add PsnrHvsState + runtime dispatch in init() (AVX2 under ARCH_X86, NEON under ARCH_AARCH64) + scoped NOLINTBEGIN/END around the upstream Xiph scalar block (kept verbatim as the bit-exact reference).
  • core/src/meson.build — add x86/psnr_hvs_avx2.c to x86_avx2_sources and arm64/psnr_hvs_neon.c to arm64_sources.
  • core/test/test_psnr_hvs_avx2.c, core/test/test_psnr_hvs_neon.c — bit-exact unit tests (x86 and aarch64 respectively).
  • core/test/meson.build — register both tests under enable_asm, arch-gated.
  • Invariants (load-bearing):
  • Bit-exactness to scalar: every od_coeff (int32) and every final psnr_hvs_{y,cb,cr,psnr_hvs} value the AVX2 path emits must be byte-identical to the scalar reference on the Netflix golden pairs. If a rebase introduces any pattern that breaks this (e.g. a floating-point horizontal reduce in the mask accumulator), the unit test test_psnr_hvs_avx2 will fail — don't relax the assertions; fix the SIMD path.
  • DCT butterfly layout: butterfly → transpose → butterfly → transpose. The transpose lives inside od_bin_fdct8x8_avx2. Do not move it.
  • Float accumulators stay scalar: means / variances / mask / error accumulation in calc_psnrhvs_avx2 use the same per-block scalar loop as scalar psnr_hvs — bit-exact by construction. Do not vectorize these with horizontal reductions without replicating ADR-0139's per-lane scalar-float reduction pattern. The cross-block error accumulator ret is threaded through accumulate_error() by pointer, not returned-then-summed: each of the 64 per-coefficient contributions per block must hit the outer ret directly, matching scalar's inline ret += ... at third_party/xiph/psnr_hvs.c line 355. IEEE-754 float add is non-associative — summing into a local float and then adding the per-block total to ret changes the summation tree and drifts the Netflix golden by ~5.5e-5.
  • #pragma STDC FP_CONTRACT OFF at the TU header disables FMA formation. Required: fmaf(a, b, c) can differ from (a*b)+c by 1 ulp, breaking bit-exactness. Do not remove the pragma; do not add -ffp-contract=fast to the build flags for this TU.
  • NOLINT suppressions are load-bearing — each cites ADR-0141 inline (bit-exactness scalar-diff auditability for the 30-butterfly function, scalar float→double promotion for sqrt, extractor-registry extern linkage for vmaf_fex_psnr_hvs, upstream-Xiph scoped block for rebase parity).
  • On upstream sync:
  • Upstream has no psnr_hvs SIMD as of 2026-04-24. Keep fork's version on conflict.
  • If upstream ever touches psnr_hvs.c for non-SIMD reasons (e.g. a masking-table update), rebase the AVX2 TU to match line-for-line and re-run test_psnr_hvs_avx2 to confirm bit-exactness survives.
  • NEON follow-up PR is a sister port; its arm64/psnr_hvs_neon.c will mirror this ADR's invariants. On rebase, the two SIMD TUs must stay in lock-step with the scalar reference.
  • Re-test on rebase:
ninja -C build
meson test -C build test_psnr_hvs_avx2
# Expect: 5/5 subtests pass (DCT bit-exact on 3 random seeds +
# delta + constant input).

# CLI-level bit-exactness on Netflix golden (requires the YUV
# fixtures in python/test/resource/yuv/):
# VMAF_CPU_MASK=0    (scalar)
# VMAF_CPU_MASK=255  (AVX2 enabled)
# Diff per-frame psnr_hvs_{y,cb,cr,psnr_hvs} XML fields; expect
# byte-identical across all 3 golden pairs.
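
The difference between accumulating through the outer pointer and summing per block first can be reproduced with a toy model. The names and the constant inputs below are hypothetical; the real accumulate_error() is not shown in these notes.

```c
#include <assert.h>

#define DEMO_COEFFS 64

/* Adds one contribution straight into the caller's accumulator, rounding
 * to float at every step. */
static void demo_accumulate(float *ret, float contribution)
{
    volatile float r = *ret + contribution;
    *ret = r;
}

/* Matches the inline `ret += ...` shape: 64 adds into the outer total. */
static float demo_block_direct(float ret, const float *c)
{
    for (int i = 0; i < DEMO_COEFFS; i++)
        demo_accumulate(&ret, c[i]);
    return ret;
}

/* Sums the block locally first, then adds the block total once. */
static float demo_block_local_then_add(float ret, const float *c)
{
    float local = 0.0f;
    for (int i = 0; i < DEMO_COEFFS; i++)
        demo_accumulate(&local, c[i]);
    demo_accumulate(&ret, local);
    return ret;
}
```

Starting from 2^24 with 64 contributions of 1.0, the direct form never moves (each add rounds back to 2^24) while the local form adds 64 in one step. The inputs are exaggerated, but the mechanism is the same one that shifts the golden scores when the summation tree changes.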

0051 — Netflix#1486 motion updates verified present (ADR-0158)

  • ADR: ADR-0158
  • Upstream source: Netflix upstream PR #1486 ("Port motion updates"), MERGED 2026-04-20 as commits a44e5e6 (code) + 62f47d5 (Netflix golden updates).
  • Touches: documentation-only; the actual code changes this ADR documents are already in the fork's master via earlier incremental motion3 / blend / five-frame-window commits.
  • Invariants (load-bearing for future /sync-upstream):
  • The edge_8 mirror fix (i_tap = height - (i_tap - height + 2)) is present at integer_motion.c:240, x86/motion_avx2.c:147, x86/motion_avx512.c:147. If upstream's mirror line ever diverges again, this is the hunk to watch.
  • The motion_max_val feature option is at integer_motion.c:57,118-120 with default 10000.0 and FEATURE_PARAM flag. Upstream's default = fork's default; don't drift.
  • VMAF_integer_feature_motion3_score output plumbing is in integer_motion.c + alias.c.
  • Fork-local motion extensions (five-frame-window, moving-average, blend, fps_weight) are ADDITIONS on top of Netflix#1486. They are not upstream. Upstream changes to motion extractor internals may conflict with them — diff against core/src/feature/integer_motion.c on every rebase and check that the fork's MIN(s->score * s->motion_fps_weight, s->motion_max_val) invocations are preserved (lines ~409, ~503).
  • On upstream sync: nothing to port from Netflix#1486 — it's absorbed. If a future upstream PR touches the same code paths, prefer upstream's version for the scalar/edge handling and the fork's version for the five-frame-window / blend extensions.
  • Re-test on rebase:
ninja -C build
meson test -C build
# Expect: 35/35 pass.

# Verify the upstream markers are still in place after rebase:
grep -n "height - (i_tap - height + 2)\|motion_max_val\|VMAF_integer_feature_motion3_score" \
    core/src/feature/integer_motion.c \
    core/src/feature/alias.c \
    core/src/feature/x86/motion_avx2.c \
    core/src/feature/x86/motion_avx512.c
# Expect: matches at all 4 files. If any missing, the rebase
# silently dropped the Netflix#1486 content — investigate.
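
The two expressions the grep looks for can be exercised in isolation. This is a minimal sketch: the demo_* helpers and the `i_tap >= height` guard around the mirror expression are assumptions, only the expressions themselves come from the notes.

```c
#include <assert.h>

/* Bottom-edge mirror: a tap at or past `height` is reflected back inside
 * the picture. Taps already in range pass through unchanged. */
static int demo_mirror_bottom(int i_tap, int height)
{
    if (i_tap >= height)
        i_tap = height - (i_tap - height + 2);
    return i_tap;
}

/* Score clamp in the MIN(score * fps_weight, max_val) shape. */
static double demo_clamped_motion(double score, double fps_weight,
                                  double max_val)
{
    const double weighted = score * fps_weight;
    return weighted < max_val ? weighted : max_val;
}
```

For height 10, taps 10 and 11 map to rows 8 and 7, so the last row is not repeated by the reflection.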

0050 — CUDA preallocation memory leak fix + vmaf_cuda_state_free (ADR-0157)

  • ADR: ADR-0157
  • Upstream source: Netflix upstream issue #1300 (OPEN since 2024; no maintainer fix as of 2026-04-24). User reports GPU memory rises monotonically across init/preallocate/fetch/close cycles.
  • Touches:
  • core/include/libvmaf/libvmaf_cuda.h — new public vmaf_cuda_state_free() API declaration.
  • core/src/cuda/common.c — new vmaf_cuda_state_free() implementation; vmaf_cuda_release() now calls cuda_free_functions(); vmaf_cuda_state_init() gets an outer failure unwind; init_with_primary_context() releases the retained primary context on fail_after_pop.
  • core/src/cuda/ring_buffer.c: vmaf_ring_buffer_close() now unlocks + destroys the mutex before freeing.
  • core/test/test_cuda_preallocation_leak.c — new GPU-gated reducer (10-cycle loop with full cleanup).
  • core/test/test_cuda_pic_preallocation.c, core/test/test_cuda_buffer_alloc_oom.c — add missing vmaf_cuda_state_free() + vmaf_model_destroy() calls after vmaf_close() in every test that allocates these.
  • core/test/meson.build — register the new reducer under enable_cuda guard.
  • Invariants (load-bearing):
  • Public contract: every caller of vmaf_cuda_state_init() MUST call vmaf_cuda_state_free() AFTER vmaf_close() on any VmafContext that imported the state. An informal free(cu_state) is a silent double-free hazard AFTER close (vmaf_close's vmaf_cuda_release already memsets the struct and frees the CudaFunctions internals; vmaf_cuda_state_free only frees the heap allocation itself).
  • vmaf_cuda_release() frees CudaFunctions via a saved pointer AFTER the memset. Order matters — memset first so cu_state->f is zeroed in the caller's struct, then free via the saved local. Do not re-order.
  • vmaf_ring_buffer_close() unlocks BEFORE destroying the mutex (POSIX requires the mutex be unlocked for destroy).
  • The cold-start unwind in init_with_primary_context releases cuDevicePrimaryCtxRetain's retained context if cuStreamCreateWithPriority fails.
  • The ADR-0122 / ADR-0123 is_cudastate_empty() null-guards at the top of every public vmaf_cuda_* entry must continue to compose with the new vmaf_cuda_state_free() (which accepts NULL directly and doesn't call through to the CUDA API).
  • The new free call order in callers is: vmaf_close(vmaf) → vmaf_cuda_state_free(cu_state) → vmaf_model_destroy(model). Reversing the first two produces a use-after-free.
  • On upstream sync:
  • Upstream has no vmaf_cuda_state_free() as of 2026-04-24. Keep the fork's version on any conflict. If upstream eventually lands the same API with a different spelling, prefer upstream's spelling and add a compat alias — but do not break the fork's ABI.
  • vmaf_cuda_release()'s cuda_free_functions() call is fork-local. On rebase, keep it.
  • The ring-buffer pthread_mutex_unlock + pthread_mutex_destroy pair is fork-local. On rebase, keep it.
  • If upstream refactors VmafCudaState ownership semantics (unlikely; their pattern has historically been "leaked state in a long-lived process is acceptable"), re-audit this ADR and the new public API.
  • Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 40/40 pass including test_cuda_preallocation_leak.

# ASan leak-check:
cd libvmaf && meson setup build-asan-cuda \
    -Db_sanitize=address -Denable_cuda=true -Denable_sycl=false \
    --buildtype=debug
ninja -C build-asan-cuda
ASAN_OPTIONS='detect_leaks=1:leak_check_at_exit=1' \
    build-asan-cuda/test/test_cuda_preallocation_leak
# Expect: 0 bytes leaked from core/src/* frames.
# (~180 bytes in libcuda.so.1 is expected — driver's process-
#  lifetime cuInit cache, does not grow per cycle.)
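
The release-then-free ownership split can be modelled without CUDA. Everything below is a hypothetical mock: DemoState, DemoFunctions, and the allocation counter stand in for the real types, and only the ordering (memset first, free via a saved pointer, separate NULL-safe state free) follows the invariants above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int demo_live_allocs; /* counts outstanding heap blocks */

typedef struct { int loaded; } DemoFunctions;
typedef struct { DemoFunctions *f; int ctx; } DemoState;

static void *demo_alloc(size_t n)
{
    void *p = calloc(1, n);
    if (p) demo_live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if (p) demo_live_allocs--;
    free(p);
}

static int demo_state_init(DemoState **out)
{
    DemoState *s = demo_alloc(sizeof(*s));
    if (!s) return -ENOMEM;
    s->f = demo_alloc(sizeof(*s->f));
    if (!s->f) { demo_free(s); return -ENOMEM; }
    *out = s;
    return 0;
}

/* Models the release step run by close: zero the struct first, then free
 * the function table through the pointer saved before the memset. */
static void demo_release(DemoState *s)
{
    DemoFunctions *saved = s->f;
    memset(s, 0, sizeof(*s));
    demo_free(saved);
}

/* Models the owner-side free: accepts NULL, frees only the state itself. */
static void demo_state_free(DemoState *s)
{
    demo_free(s);
}
```

In this model the state pointer stays valid between release and state free, which is what lets the caller run the two steps in the documented order and end with zero live allocations.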

0049 — CUDA graceful error propagation (ADR-0156)

  • ADR: ADR-0156
  • Upstream source: Netflix upstream issue #1420 (OPEN as of 2026-04-24). Reports that two concurrent VMAF-CUDA processes crash the second one at vmaf_cuda_buffer_alloc due to CHECK_CUDA(cuMemAlloc) → assert(0) on OOM.
  • Touches:
  • core/src/cuda/cuda_helper.cuh — redefined CHECK_CUDA family. New macros CHECK_CUDA_GOTO + CHECK_CUDA_RETURN + helper vmaf_cuda_result_to_errno. Old assert(0) semantics removed entirely.
  • core/src/cuda/common.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c — all CHECK_CUDA(...) sites converted; cleanup labels added where contexts / buffers were pushed / allocated.
  • core/src/feature/cuda/integer_motion_cuda.c, integer_vif_cuda.c, integer_adm_cuda.c — same conversion; 12 static helpers promoted void → int.
  • core/test/test_cuda_buffer_alloc_oom.c — new GPU-gated reducer.
  • core/test/meson.build — register new test under enable_cuda guard.
  • Invariants (load-bearing):
  • CHECK_CUDA_GOTO / CHECK_CUDA_RETURN must never call assert(0) or abort() on a CUDA error. Any regression back to the upstream abort-on-error semantics re-introduces Netflix#1420 and the NDEBUG footgun.
  • Every CHECK_CUDA_GOTO target label must pop any previously-pushed CUDA context and free any partially-constructed buffers before returning the errno. The graceful path must not leak resources.
  • vmaf_cuda_result_to_errno uses numeric CUresult values directly (0 / 1 / 2 / 3 / 4 / 101 / 201 / 400) so host TUs that don't include <cuda.h> can transitively consume the mapping via the inline function. If upstream renumbers CUresult enum values (historically stable — they've been fixed since CUDA 1.0), re-audit the switch.
  • ADR-0122 / ADR-0123 is_cudastate_empty(...) guards at the top of every public vmaf_cuda_* entry point must stay — they run before the CUDA API is touched and compose cleanly with the new error propagation.
  • Twelve static helper signatures in the feature extractors are int-returning (was void): any upstream-port that restores the void return silently regresses the error path.
  • On upstream sync:
  • Upstream Netflix still uses assert(0) in CHECK_CUDA as of 2026-04-24. Keep the fork's macro definitions in cuda_helper.cuh on any upstream conflict — this file is fork-local behaviour.
  • If upstream eventually lands Netflix#1420 with a similar refactor, prefer the fork's version unless upstream's has identical semantics (no assert(0) / no abort() / translates CUresult to -errno). Re-verify test_cuda_buffer_alloc_oom after rebase.
  • If upstream adds new CHECK_CUDA(...) sites in a port, rewrite them to CHECK_CUDA_GOTO / CHECK_CUDA_RETURN as part of the port commit.
  • If upstream changes any of the 12 static helper signatures back to void, re-promote them to int during the merge.
  • Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 39/39 pass including test_cuda_buffer_alloc_oom.

# Reducer check — verify the OOM-to-errno path is live:
meson test -C core/build-cuda test_cuda_buffer_alloc_oom -v
# Expect subtests: request 1 TiB → -ENOMEM; request 0 bytes → 0.

clang-tidy -p core/build-cuda --quiet \
    core/src/cuda/common.c \
    core/src/cuda/picture_cuda.c \
    core/src/feature/cuda/integer_motion_cuda.c \
    core/src/feature/cuda/integer_vif_cuda.c \
    core/src/feature/cuda/integer_adm_cuda.c \
    core/src/libvmaf.c
# Expect exit 0 on every file.
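
The goto-based propagation pattern can be sketched without the CUDA headers. This is not the fork's cuda_helper.cuh: DEMO_CHECK_GOTO, demo_result_to_errno, and the fake allocator are hypothetical, and apart from code 2 mapping to -ENOMEM the errno targets are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Maps a driver result code to 0 or a negative errno using plain
 * integers, so no CUDA header is needed. Only 2 -> -ENOMEM is grounded
 * in the notes; the other targets are assumptions. */
static inline int demo_result_to_errno(int cu_result)
{
    switch (cu_result) {
    case 0:   return 0;        /* success */
    case 1:   return -EINVAL;  /* invalid value */
    case 2:   return -ENOMEM;  /* out of memory */
    case 3:                    /* not initialized */
    case 4:   return -ENODEV;  /* deinitialized */
    case 101: return -ENODEV;  /* invalid device */
    case 201:                  /* invalid context */
    case 400: return -EINVAL;  /* invalid handle */
    default:  return -EIO;
    }
}

/* Expects an `int err` in scope; never asserts or aborts. */
#define DEMO_CHECK_GOTO(expr, label)                  \
    do {                                              \
        const int demo_res_ = (expr);                 \
        if (demo_res_ != 0) {                         \
            err = demo_result_to_errno(demo_res_);    \
            goto label;                               \
        }                                             \
    } while (0)

/* Fake allocator standing in for a driver call: fails with code 2 when
 * the request exceeds `budget`. */
static int demo_mem_alloc(void **out, size_t size, size_t budget)
{
    if (size > budget) return 2;
    *out = malloc(size ? size : 1);
    return *out ? 0 : 2;
}

/* Allocates two buffers; each failure label frees what was built so far. */
static int demo_two_buffers(size_t first, size_t second, size_t budget)
{
    int err = 0;
    void *a = NULL, *b = NULL;
    DEMO_CHECK_GOTO(demo_mem_alloc(&a, first, budget), fail);
    DEMO_CHECK_GOTO(demo_mem_alloc(&b, second, budget), free_a);
    free(b);
free_a:
    free(a);
fail:
    return err;
}
```

The success path falls through the same labels as the failure paths, so there is one place per resource where it is released and the caller only ever sees 0 or a negative errno.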

0049 — compute_motion / picture_copy signature changes (b949cebf upstream port)

  • Upstream commit: Netflix/vmaf b949cebf (feature/motion: port several feature extractor options)
  • Prerequisite commit: Netflix/vmaf d3647c73 (picture_copy: add channel parameter)
  • PR: upstream/port-b949cebf-motion

Rebase-sensitive invariants:

  1. compute_motion signature change: compute_motion() in core/src/feature/motion.c / motion.h now takes an extra int motion_decimate parameter (the motion_add_scale1 flag). Any new caller added in the fork that calls compute_motion() must pass this parameter. The SIMD integer motion callers (motion_avx2.c, motion_avx512.c) do NOT call compute_motion(); they use the SAD/convolution dispatch table directly and are unaffected.

  2. vmaf_image_sad_c signature change — similarly gains int motion_add_scale1. Any caller in the fork must be updated. Currently only called from compute_motion() internally.

  3. picture_copy signature change — gains int channel as the last parameter (0=Y, 1=U, 2=V). Every caller in the tree has been updated to pass 0 (luma). When adding new callers that need UV planes, pass 1 or 2. The fork's CUDA/SYCL/Vulkan callers have been updated in this PR.

  4. Default behavior preserved — all new options default to no-op values. motion_add_scale1=false, motion_add_uv=false, motion_blend_factor=1.0, motion_fps_weight=1.0, motion_filter_size=5 (= DEFAULT_MOTION_FILTER_SIZE). Integer and float motion2 scores are bit-identical to pre-port baseline.

  5. vif_scale_frame_s dependency avoided — the upstream b949cebf motion.c imports vif_scale_frame_s from vif_tools.h. The fork does not have this function yet (vif options chain is deferred, Research-0024 Strategy E). The bilinear downscaler for motion_add_scale1 is implemented as local static functions in motion.c (motion_scale_bilinear, motion_bilinear_interp, motion_mirror_f). When upstream's vif options chain is eventually ported, reconcile by replacing these local functions with vif_scale_frame_s.

Reproducer:

# verify bit-exactness (default options, scores must be identical):
./core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --model path=model/vmaf_v0.6.1.json \
  --feature motion --no_prediction --json --output /tmp/motion.json
# integer_motion2 scores must match pre-port baseline at 6 decimal places.

0048 — i4_adm_cm int32 rounding overflow deliberately preserved (ADR-0155)

  • ADR: ADR-0155
  • Upstream source: Netflix upstream issue #955 (OPEN since 2020; no maintainer response as of 2026-04-24). Reports that add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1)) in core/src/feature/integer_adm.c scales 1–3 overflows int32_t (1u << 31 = 0x80000000 wraps to -2147483648). Rounding term is sign-negated; ADM scales 1–3 biased low by ≈1 LSB per summed term.
  • Touches (documentation-only):
  • docs/adr/0155-adm-i4-rounding-deferred-netflix-955.md — new ADR (this entry's anchor).
  • core/src/feature/integer_adm.c — in-file warning comment above the overflow site (add_bef_shift_flt[] initialiser loop around line 1277). No code change.
  • core/src/feature/AGENTS.md — invariant note under "Rebase-sensitive invariants".
  • Invariants (load-bearing — do NOT silently "fix"):
  • integer_adm.c keeps int32_t add_bef_shift_flt[3] with the overflowing 1u << 31 assignment. The Netflix golden assertions (python/test/quality_runner_test.py, vmafexec_test.py, feature_extractor_test.py) encode the buggy ADM output. Project hard rule #1 (ADR-0024) prohibits changing those assertions.
  • Any "fix" that changes ADM numerical output must land together with a coordinated Netflix-authored golden-number update (the ADR-0142 Netflix-authority carve-out). Until Netflix#955 closes upstream, there is no authority to track.
  • On upstream sync:
  • If Netflix finally lands a fix for #955 (widening the rounding term to uint32_t or int64_t), sync the C-side fix AND the updated assertAlmostEqual values in the same merge. Re-run make test-netflix-golden and /cross-backend-diff on the golden pairs to verify the new numbers are consistent across CPU / CUDA / SYCL.
  • Remove the in-file warning comment above the add_bef_shift_flt initialiser loop, flip ADR-0155 to Superseded by ADR-NNNN, and drop this rebase-notes entry.
  • If upstream instead closes #955 as wont-fix, keep this entry verbatim and update the ADR status to note upstream's closure.
  • Re-test on rebase (gates the invariant by confirming the golden numbers are unchanged):
ninja -C build
make test-netflix-golden
# Expect: VMAF mean 76.66890… on src01_hrc00/01_576x324 golden
# pair — bit-identical to pre-rebase.
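
The sign flip of the rounding term can be reproduced in isolation. The sketch below is a model, not code from integer_adm.c: the demo_* helpers and the sample accumulator are hypothetical, and the narrowing conversion is implementation-defined in C (it wraps on two's-complement compilers such as GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Rounding term stored in 32 signed bits. For shift == 32 the value
 * 1u << 31 does not fit in int32_t and wraps to INT32_MIN on
 * two's-complement targets. */
static int32_t demo_round_term_i32(unsigned shift)
{
    return (int32_t)(1u << (shift - 1));
}

/* Same term widened to 64 bits: stays positive. */
static int64_t demo_round_term_i64(unsigned shift)
{
    return (int64_t)1 << (shift - 1);
}

/* Round-to-nearest right shift of a non-negative 64-bit accumulator. */
static int64_t demo_round_shift(int64_t acc, int64_t term, unsigned shift)
{
    return (acc + term) >> shift;
}
```

With an accumulator of 1.5 * 2^32, the widened term rounds to 2 while the wrapped term subtracts half an LSB instead of adding it and yields 1, which is the "biased low by about 1 LSB" effect described above.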

0047 — vmaf_score_pooled -EAGAIN for pending features (ADR-0154)

  • ADR: ADR-0154
  • Upstream source: Netflix upstream issue #755 (OPEN as of 2026-04-24). Upstream maintainer closed the door on the streaming use case in 2020 ("you cannot call vmaf_score_pooled() in a loop"); fork reopens it via error-code semantics without changing the retroactive-write design.
  • Touches:
  • core/src/feature/feature_collector.c: vmaf_feature_collector_get_score returns -EAGAIN (was -EINVAL) when the requested index is valid but not yet written.
  • core/src/feature/feature_collector.h — inline vmaf_feature_vector_get_score now returns -EINVAL for null/out-of-range and -EAGAIN for not-written (was -1 for both). Added #include <errno.h>. Rename reserved __VMAF_FEATURE_COLLECTOR_H__ guard to VMAF_FEATURE_COLLECTOR_INCLUDED.
  • core/test/test_score_pooled_eagain.c — new 4-subtest reducer.
  • core/test/meson.build — register the new test.
  • Invariants (load-bearing, enforced by the reducer):
  • vmaf_feature_collector_get_score(fc, name, &score, i) returns -EAGAIN iff the feature name is registered and i is in range but score[i].written == false.
  • The return stays -EINVAL for (a) null pointers, (b) i >= feature_vector->capacity, (c) unknown feature name.
  • The inline fast-path vmaf_feature_vector_get_score uses the same split.
  • On upstream sync: upstream has not changed the error semantics since 2020. If they do (unlikely), keep the fork's -EAGAIN — it is strictly more informative and downstream code depending on the split would regress.
  • Re-test on rebase:
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: 4/4 subtests pass.

# Reducer check:
git stash push core/src/feature/feature_collector.c core/src/feature/feature_collector.h
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: Fail: 1 (tests fail without -EAGAIN split).
git stash pop
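
The -EINVAL / -EAGAIN split can be illustrated with a miniature score vector. The types and the function below are hypothetical stand-ins, not the fork's feature_collector.h; they only mirror the return-code contract listed in the invariants.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a per-feature score vector. */
typedef struct { bool written; double value; } DemoScore;
typedef struct { DemoScore *score; unsigned capacity; } DemoFeatureVector;

/* -EINVAL: bad arguments or index out of range.
 * -EAGAIN: index is valid but the score has not been written yet. */
static int demo_vector_get_score(const DemoFeatureVector *fv,
                                 unsigned index, double *score)
{
    if (!fv || !score) return -EINVAL;
    if (index >= fv->capacity) return -EINVAL;
    if (!fv->score[index].written) return -EAGAIN;
    *score = fv->score[index].value;
    return 0;
}
```

A streaming caller can treat -EAGAIN as "retry after more frames are flushed" and still treat -EINVAL as a programming error.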

0046 — float_ms_ssim min-dim guard (ADR-0153)

  • ADR: ADR-0153
  • Upstream source: Netflix upstream issue #1414 (OPEN as of 2026-04-24). No upstream fix has landed; fork adds the guard independently.
  • Touches:
  • core/src/feature/float_ms_ssim.c — add #include "log.h" + #include "iqa/ssim_tools.h" + a min_dim = GAUSSIAN_LEN << (SCALES - 1) check at the start of init; extract SIMD dispatch into a new ms_ssim_init_simd_dispatch helper to keep init within the ADR-0141 60-line budget.
  • core/test/test_float_ms_ssim_min_dim.c — new 3-subtest reducer.
  • core/test/meson.build — register the new test executable.
  • Invariant (load-bearing, enforced by the reducer): float_ms_ssim.init returns -EINVAL when w < 176 || h < 176, where 176 is computed dynamically from the filter constants. The magic number is not hardcoded — changing SCALES or GAUSSIAN_LEN upstream will auto-update the minimum.
  • On upstream sync: if Netflix upstream lands a similar init-time guard, keep the fork's version — the helper name ms_ssim_init_simd_dispatch is fork-local (introduced to satisfy ADR-0141) and upstream's patch won't match. Both guards should be compatible; re-verify the reducer after rebase.
  • Re-test on rebase:
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: 3/3 subtests pass.

# Reducer check (confirms the guard is load-bearing):
git stash push core/src/feature/float_ms_ssim.c
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: Fail: 1 (tests fail without the guard).
git stash pop
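
The 176 threshold follows directly from the shift expression in the guard. The sketch below reproduces the arithmetic; the values GAUSSIAN_LEN = 11 and SCALES = 5 are assumptions chosen because they reproduce 176 (this entry states only the result), and init_guard is a hypothetical Python mirror of the C check.

```python
import errno

# Assumed constants: this entry only states the resulting minimum (176).
GAUSSIAN_LEN = 11
SCALES = 5


def min_dim(gaussian_len=GAUSSIAN_LEN, scales=SCALES):
    # Each of the (scales - 1) downsampling steps halves the frame, so the
    # coarsest scale must still fit one full Gaussian window.
    return gaussian_len << (scales - 1)


def init_guard(w, h):
    """Hypothetical mirror of the init-time check: reject undersized frames."""
    limit = min_dim()
    if w < limit or h < limit:
        return -errno.EINVAL
    return 0
```

Because the limit is derived rather than hardcoded, changing either constant moves the minimum automatically, for example min_dim(11, 4) gives 88.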

0045 — vmaf_read_pictures monotonic-index guard (ADR-0152)

  • ADR: ADR-0152
  • Upstream source: Netflix upstream issue #910 (OPEN as of 2026-04-24). No upstream fix has landed; the fork adds the guard independently, per the 2021-10-14 maintainer comment that recommended exactly this shape.
  • Touches:
  • core/src/libvmaf.c — add unsigned last_index + bool have_last_index fields to VmafContext; prepend a monotonic-index check inside read_pictures_validate_and_prep (returns -EINVAL on duplicates / regressions); update the two new fields at the tail of the same helper on success.
  • core/test/test_read_pictures_monotonic.c — new 3-subtest reducer covering the Netflix#910 sequence and the two classes of rejection (duplicate, out-of-order).
  • core/test/meson.build — register the new test executable.
  • Invariant (load-bearing, enforced by the reducer): vmaf_read_pictures(vmaf, ref, dist, index) returns -EINVAL when have_last_index && index <= last_index. Flush (vmaf_read_pictures(vmaf, NULL, NULL, 0)) routes to flush_context before the guard runs — flushing remains always-available independent of the last accepted index.
  • On upstream sync:
  • If Netflix upstream eventually lands a similar guard at the API boundary, keep the fork's version — the helper function name (read_pictures_validate_and_prep) is fork-local (ADR-0146), so upstream's patch will target a different insertion point. Both guards should be compatible; re-verify the reducer after rebase.
  • If upstream instead lands an internal reordering mechanism (buffer-and-sort frames before dispatch), revisit this decision — the fork's API-level contract is stricter and may need to relax to match. Open a new ADR if so.
  • Re-test on rebase:
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: 3/3 subtests pass.

# Reducer check (confirms the guard is load-bearing):
git stash push core/src/libvmaf.c
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: Fail: 1 (the test rejects the un-guarded behaviour).
git stash pop
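
The guard's contract can be illustrated with a small state machine. This is a sketch under stated assumptions: ReadPicturesModel is a hypothetical Python model, not the libvmaf API; only the have_last_index / last_index fields, the index <= last_index rejection, and the flush-before-guard ordering are taken from this entry.

```python
import errno


class ReadPicturesModel:
    """Hypothetical model of the monotonic-index guard."""

    def __init__(self):
        self.have_last_index = False
        self.last_index = 0
        self.flushed = False

    def read_pictures(self, ref, dist, index):
        # Flush is routed before the guard, so it is always available.
        if ref is None and dist is None:
            self.flushed = True
            return 0
        # Reject duplicates and regressions.
        if self.have_last_index and index <= self.last_index:
            return -errno.EINVAL
        # ... the frame pair would be dispatched to the extractors here ...
        # State is updated only on success, at the tail of the helper.
        self.last_index = index
        self.have_last_index = True
        return 0
```

Note that gaps are accepted (0 followed by 2 is fine); only a repeated or lower index is rejected.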

0044 — i686 (32-bit x86) build-only CI job (ADR-0151)

  • ADR: ADR-0151
  • Upstream source: Netflix upstream issue #1481 (OPEN as of 2026-04-24). Reports i686 compile failure on _mm256_extract_epi64. Workaround documented in the issue: -Denable_asm=false.
  • Touches:
  • build-aux/i686-linux-gnu.ini — new cross-file; gcc + -m32 + cpu_family = 'x86' / cpu = 'i686'. No exe_wrapper.
  • .github/workflows/libvmaf-build-matrix.yml — new matrix row with i686: true flag + new install-deps step for gcc-multilib + g++-multilib; existing "Run tests" + "Run tox tests (ubuntu)" steps widened with && !matrix.i686 guards.
  • Invariants:
  • The i686 matrix row pins -Denable_asm=false — this is the upstream-documented workaround for _mm256_extract_epi64's missing declaration on 32-bit x86 targets. Do NOT remove the flag without first gating every _mm256_extract_epi64 call site in core/src/feature/x86/adm_avx2.c + motion_avx2.c + adm_avx512.c on __x86_64__. Removing the flag naively will re-break the build.
  • No exe_wrapper in the cross-file: meson marks tests as SKIP 77 even though the host can run i686 binaries natively. Build-only gate by design.
  • On upstream sync:
  • If upstream Netflix fixes #1481 at source (by gating the intrinsic calls on __x86_64__ or by emulating via two _mm256_extract_epi32 halves), sync the fix and re-enable ASM on the i686 row (drop -Denable_asm=false from meson_extra). Re-verify bit-exactness via /cross-backend-diff on the x86_64 golden pair.
  • If upstream marks i686 unsupported in meson (e.g. via a hard error), the fork's i686 row should be removed or downgraded to continue-on-error: true.
  • Re-test on rebase (Ubuntu host with gcc-multilib):
meson setup libvmaf core/build-i686 \
    --cross-file=build-aux/i686-linux-gnu.ini \
    -Denable_asm=false \
    -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-i686
file core/build-i686/tools/vmaf
# Expect: ELF 32-bit LSB pie executable, Intel i386

CI runs this same sequence via the new matrix row.

0058 — Tiny-AI Netflix corpus training scaffold (ADR-0252)

  • ADR: ADR-0252.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training harness or MCP server.
  • Touches:
  • ai/ — training harness; NflxLocalDataset loader reads from --data-root (never from a hardcoded path).
  • docs/ai/training-data.md — corpus path convention and loader API docs; purely additive.
  • mcp-server/vmaf-mcp/tests/test_smoke_e2e.py — new e2e smoke test; references only committed golden fixtures.
  • Invariants (load-bearing):
  • Data path is local-only. .workingdir2/netflix/ is gitignored; no YUV from this corpus is ever committed. The --data-root CLI flag must remain the sole mechanism for locating the corpus.
  • Smoke test uses only committed fixtures. test_smoke_e2e.py references python/test/resource/yuv/src01_hrc00_576x324.yuv (a committed golden file), never the local corpus path. On upstream sync the golden YUV path must stay stable.
  • No Netflix golden assertion is modified. The places=4 tolerance in test_smoke_e2e.py asserts against the vmaf_v0.6.1 CPU reference; it is not a golden assertion and may be adjusted by /regen-snapshots with justification.
  • On upstream sync: zero interaction with Netflix upstream. The ai/ subtree and mcp-server/ are wholly fork-local; upstream merges are conflict-free here. If Netflix ever ships a training harness, reconcile separately.
  • Re-test on rebase:
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (vmaf binary)
# Skips automatically if binary or golden YUV is absent.

0085 — Research-0030 Phase-3b multi-seed validation (Gate 1 passed)

  • No ADR. Empirical research digest closing Gate 1 of the 3-gate v2 validation chain. Architecture decision unchanged.
  • Upstream source: fork-local. Netflix has no multi-seed validation surface for tiny-AI training.
  • Touches (additive only):
  • docs/research/0030-phase3b-multiseed-validation.md — per-seed PLCC tables + stability analysis + Gate 2/3 plan.
  • ai/scripts/phase3_subset_sweep.py — adds --seeds flag (comma-separated list) + per-seed result aggregation.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The +0.0175 Δ is multi-seed mean PLCC, not seed-0 PLCC. Don't cite the +0.0106 from Research-0029 once Research-0030 lands; the multi-seed number is more trustworthy.
  • Subset B is more stable than canonical-6 across seeds. Don't ship a v2 model citing single-seed numbers — always report multi-seed mean ± seed-mean-std for any tiny-AI metric in a future digest.
  • The --seeds flag aggregates by flattening (seed × fold) pairs. The reported mean_plcc is the mean of all n_seeds × n_folds measurements; seed_mean_plcc_std is the std across per-seed means, which is the right number for "is the result seed-stable".
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files reproduce from the canonical command.
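
The two aggregates described in the invariants can be sketched as follows. The function name aggregate, the input layout, and the PLCC values are all illustrative (the numbers are made up, not results from the digest), and the use of the sample standard deviation is an assumption, since this entry does not say which estimator the script uses.

```python
from statistics import mean, stdev


def aggregate(plcc_by_seed):
    """plcc_by_seed maps seed -> list of per-fold PLCC values."""
    # mean_plcc: flatten every (seed, fold) pair and average them all.
    flat = [p for folds in plcc_by_seed.values() for p in folds]
    # seed_mean_plcc_std: spread of the per-seed means (seed stability).
    per_seed_means = [mean(folds) for folds in plcc_by_seed.values()]
    spread = stdev(per_seed_means) if len(per_seed_means) > 1 else 0.0
    return {
        "mean_plcc": mean(flat),
        "seed_mean_plcc_std": spread,
        "n_measurements": len(flat),
    }


# Made-up numbers for illustration only: 3 seeds x 2 folds.
example = {0: [0.98, 0.96], 1: [0.97, 0.99], 2: [0.95, 0.97]}
result = aggregate(example)
```

The per-seed spread is deliberately computed on seed means rather than on the flattened list, so fold-to-fold variation does not get mistaken for seed instability.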

0084 — Research-0029 Phase-3b StandardScaler retry (positive result)

  • No ADR. Empirical research digest; revives the Research-0026 hypothesis after the Research-0028 negative result. The architectural decision (ship vmaf_tiny_v2) is gated on three validation steps documented in the digest §"Required before shipping".
  • Upstream source: fork-local. Netflix has no tiny-AI preprocessing-sensitivity analysis surface.
  • Touches (additive only):
  • docs/research/0029-phase3b-standardscaler-results.md — per-fold tables + apples-to-apples comparison + 3-gate pre-shipping checklist.
  • ai/scripts/phase3_subset_sweep.py — adds --standardize flag + _standardize_inplace helper.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • StandardScaler statistics MUST be fit per-fold on the train split only. Fitting on the full data would leak held-out information into LOSO; the _standardize_inplace helper enforces this by taking only the train slice as input.
  • A shipped vmaf_tiny_v2.onnx MUST bundle its scaler (mean, std) in the sidecar JSON per ADR-0049 — otherwise inference applies different normalisation than training and the win evaporates. Currently UN-implemented; tracked as a §"Caveats" #5 follow-up.
  • Subset B's feature list is the load-bearing finding: adm2, adm_scale3, vif_scale2, motion2, ssimulacra2, psnr_hvs, float_ssim. Phase-3c experiments may shift the optimal arch / lr / epochs but should keep this set.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the --standardize invocation in §"Reproducer".
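
The leak-free fitting rule above can be sketched without any dependencies. The helpers below are hypothetical (they are not the fork's _standardize_inplace); they only show the shape of the rule: statistics come from the train slice, and the same statistics are then applied to the held-out slice.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Fit per-column mean/std on the train split only."""
    columns = list(zip(*train_rows))
    means = [mean(c) for c in columns]
    # Constant columns have zero std; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def transform(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def standardize_fold(train_rows, heldout_rows):
    # The held-out rows are never passed to fit_scaler.
    means, stds = fit_scaler(train_rows)
    scaled_train = transform(train_rows, means, stds)
    scaled_heldout = transform(heldout_rows, means, stds)
    return scaled_train, scaled_heldout, (means, stds)
```

The returned (means, stds) pair is the piece a shipped model would need to carry alongside its weights so that inference applies the same normalisation as training.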

0082 — Research-0028 Phase-3 subset sweep (negative-result digest)

  • No ADR. Empirical research digest. The architectural decision (no v2 model ships from this Phase) is governed by Research-0027's pre-registered stopping rule.
  • Upstream source: fork-local. Netflix has no tiny-AI subset-sweep surface.
  • Touches (additive only):
  • docs/research/0028-phase3-subset-sweep.md — per-fold tables + headline + standardisation caveat + Phase-3b/c/d follow-ups.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • canonical-6 stays the default until Phase-3b lands a ≥ 0.005 PLCC win (per Research-0027 stopping rule).
  • The PLCC drop is most likely a feature-scale issue, not evidence the new features lack signal. Don't cite this digest to retire ssimulacra2 / adm_scale3 from the candidate pool; re-test with StandardScaler first.
  • Phase-3 results are seed=0 only. Any v2-shipping decision needs 3-seed mean±std and KoNViD cross-check.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; runs/ files are reproducible from the canonical command in §"Reproducer".

0081 — Research-0027 Phase-2 feature importance results

  • No ADR. Empirical research digest closing Research-0026 Phase 2; the architectural decision (Subset A / B / C) is deferred to Phase-3 results in a future digest.
  • Upstream source: fork-local. Netflix has no cross-metric feature-importance analysis surface.
  • Touches (additive only):
  • docs/research/0027-phase2-feature-importance.md — per-method top-10 + consensus + redundancy + Phase-3 subset recommendations.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Consensus top-10 is the load-bearing finding: adm2, adm_scale3, ssimulacra2, vif_scale2. Phase-3 candidate subsets MUST include all four.
  • The 11-pair redundancy table is corpus-specific — measured on the Netflix Public 9-source corpus. A KoNViD-1k cross-check is a Phase-3 prerequisite if Subsets B/C advance.
  • runs/full_features_netflix.parquet and runs/full_features_correlation.json stay gitignored. Reproducer in §"Reproducer" regenerates both.
  • On upstream sync: zero interaction. Fork-only research.
  • Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the canonical commands.

0080 — Phase-2 analysis scripts (Research-0026 Phase 2 prep)

  • No ADR. Pure analysis scaffolding; the architectural decision (which features to ship in v2) is gated on Phase 2's numerical output via Research-0027.
  • Upstream source: fork-local. Netflix has neither tiny-AI training nor cross-metric correlation tooling.
  • Touches (additive only):
  • ai/scripts/extract_full_features.py — parquet extractor over Netflix corpus with FULL_FEATURES. Per-clip JSON cache at $XDG_CACHE_HOME/vmaf-tiny-ai-full/<source>/<dis_stem>.json.
  • ai/scripts/feature_correlation.py — Pearson + MI + LASSO + consensus top-K analyser; outputs JSON.
  • ai/tests/test_feature_correlation.py — 5 pytest cases against synthetic parquet (no libvmaf dependency).
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The per-clip JSON cache and the FULL_FEATURES tuple must stay in lock-step. If the tuple grows (or shrinks), pre-existing cache files become stale and silently misalign their stored per_frame columns with the new tuple. The extractor MUST be re-run with a cleared cache when FULL_FEATURES changes. Regression hint: test_default_features_unchanged in test_feature_sets.py already guards the canonical 6; extend coverage to FULL_FEATURES if rebases touch it.
  • motion3 resolves to extractor motion_v2 in _METRIC_TO_EXTRACTOR, not motion3 (the upstream-canonical extractor name in the integer_motion_v2 module). The CLI --feature motion3 does NOT exist. The JSON output key is integer_motion3 which _lookup finds via the integer_ fallback.
  • adm and vif aggregates are NOT in FULL_FEATURES. The integer extractor emits integer_adm2 and integer_vif_scale0..3 but no bare adm/vif. Listing them produced all-NaN columns in v1 — fixed in PR #185 amend.
  • On upstream sync: zero interaction. Pure fork-side analysis tooling.
  • Re-test on rebase:
pytest ai/tests/test_feature_correlation.py ai/tests/test_feature_sets.py -v
# Expect: 14 passed in <1 s.
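
The motion3 / bare-adm behaviour described in the invariants can be sketched as a two-step key lookup. Everything here is hypothetical apart from the facts stated above (motion3 maps to the motion_v2 extractor, the JSON key is integer_motion3, and bare adm / vif keys are not emitted): the lookup function, the order in which keys are tried, and the NaN result for a missing key are assumptions made for illustration.

```python
# Excerpt only: the mapping stated in this entry.
METRIC_TO_EXTRACTOR = {"motion3": "motion_v2"}


def lookup(frame_metrics, name):
    """Try the bare key first, then the integer_ prefixed key (assumed order)."""
    if name in frame_metrics:
        return frame_metrics[name]
    prefixed = "integer_" + name
    if prefixed in frame_metrics:
        return frame_metrics[prefixed]
    # Assumption: a metric the extractor never emitted surfaces as NaN.
    return float("nan")


# Illustrative per-frame record with made-up values.
frame = {"integer_motion3": 1.5, "integer_adm2": 0.9}
```

With this record, lookup(frame, "motion3") resolves through the integer_ fallback, while lookup(frame, "adm") yields NaN, which is how a wrongly listed aggregate would show up as an all-NaN column.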

0079 — Tiny-AI feature-set registry (Research-0026 Phase 1)

  • No ADR. Pure additive extension of an existing module; the architectural decision (which features, which model) lives in Research-0026's go/no-go gate after Phase 2.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training pipeline.
  • Touches (additive only):
  • ai/data/feature_extractor.py — adds FULL_FEATURES (21 entries), FEATURE_SETS registry, resolve_feature_set() helper. _METRIC_TO_EXTRACTOR grew 11 → 25 entries.
  • ai/tests/test_feature_sets.py — new 9-test smoke suite.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant — these are load-bearing):
  • DEFAULT_FEATURES stays the canonical 6-tuple matching vmaf_v0.6.1's SVR input layout. Test test_default_features_unchanged is the regression guard; any quiet broadening would invalidate every shipped tiny-AI ONNX (input-dim baked into the model). If a future change must broaden the default, ship a paired model swap under ADR-0049 sidecar policy.
  • FULL_FEATURES excludes lpips and float_moment per Research-0026 §"Open questions" Q1. Test test_full_features_excludes_lpips_and_moment enforces. Adding either would re-classify the experiment from "tiny model on classical features" to "ensemble of DNNs".
  • Every entry in FULL_FEATURES MUST have an entry in _METRIC_TO_EXTRACTOR. Test test_every_full_feature_has_extractor_mapping is the guard — without the mapping the libvmaf CLI silently emits NaN columns for the missing metric.
  • On upstream sync: zero interaction. Fork-only training surface.
  • Re-test on rebase:
pytest ai/tests/test_feature_sets.py -v
# Expect: 9 passed in <1 s.
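
The two FULL_FEATURES invariants above amount to a pair of set checks. The sketch below is not the fork's test code: check_registry is a hypothetical helper, the feature tuple is a shortened illustrative subset rather than the real 21 entries, and the extractor names other than motion_v2 are placeholders.

```python
# Features excluded from FULL_FEATURES, as stated in the invariants.
EXCLUDED = ("lpips", "float_moment")

# Shortened, illustrative subset; the real tuple has 21 entries.
FULL_FEATURES_EXAMPLE = ("adm2", "vif_scale2", "motion2", "motion3")

# Placeholder extractor names, except motion3 -> motion_v2.
METRIC_TO_EXTRACTOR_EXAMPLE = {
    "adm2": "extractor_a",
    "vif_scale2": "extractor_b",
    "motion2": "extractor_c",
    "motion3": "motion_v2",
}


def check_registry(full_features, metric_to_extractor, excluded=EXCLUDED):
    """Return (missing_mappings, forbidden_entries); both empty means OK."""
    missing = [f for f in full_features if f not in metric_to_extractor]
    forbidden = [f for f in full_features if f in excluded]
    return missing, forbidden
```

Running the check against the real tuple and mapping after a rebase would flag both failure modes at once: an unmapped feature (silent NaN columns) and a DNN-based feature slipping into the classical pool.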

0078 — Research-0026 cross-metric feature fusion plan

  • No ADR. Pure research-plan digest; the architectural decision (which features to add) is deferred to Research-0027 follow-up after Phase 2 numbers land.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training and no broader-feature-set hypothesis under investigation.
  • Touches (additive only):
  • docs/research/0026-cross-metric-feature-fusion.md — 4-phase experimental plan + cost estimate + go/no-go criteria.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The 6-feature canonical baseline (adm2, vif_scale0..3, motion2) stays the default. Any v2 model is opt-in via a new feature_set field in the sidecar JSON; existing vmaf_tiny_v1.onnx users get the same numbers.
  • lpips is OUT of the candidate pool (Phase 1/2). It's DNN-based and would blur the line between "tiny model on classical features" and "ensemble of DNNs". Revisit only if classical features can't close the gap.
  • On upstream sync: zero interaction. Pure fork-side research planning.
  • Re-test on rebase: documentation-only; no test surface.

0077 — Research-0025 FoxBird outlier resolved via KoNViD combined training

  • No ADR. Empirical research digest closing the open question in Research-0023 §5; no architecture or policy decision. Pure documentation of an empirical result.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training, no KoNViD-1k integration, and no LOSO eval surface.
  • Touches (additive only):
  • docs/research/0025-foxbird-resolved-via-konvid.md — per-clip table + comparison to Netflix-only baselines + interpretation + caveats + next-experiment list.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • The training-fit per-clip numbers in §"Per-clip result" are NOT held-out generalisation metrics — FoxBird is in the training set. The proper validation is the LOSO sweep on the combined corpus (§"Next experiments" #1). Don't cite the 0.9936 FoxBird PLCC as a generalisation number; cite it as "training-fit on combined corpus, 5.4× RMSE improvement vs Netflix-only".
  • Combined trainer command line is canonical. The reproduction recipe in §"Setup" includes --seed 0, --konvid-val-fraction 0.1, --val-source Tennis, --val-mode netflix-source-and-konvid-holdout. Changing any knob invalidates the per-clip numbers.
  • runs/tiny_combined_canonical/ stays gitignored. The final ONNX is reproducible from the parquet + Netflix corpus + the canonical CLI; the durable record is the digest's table.
  • On upstream sync: zero interaction. Research digest is fork-only.
  • Re-test on rebase:
python ai/train/train_combined.py \
  --netflix-root .workingdir2/netflix \
  --konvid-parquet ai/data/konvid_vmaf_pairs.parquet \
  --model-arch mlp_small --epochs 30 --batch-size 256 --lr 1e-3 \
  --val-mode netflix-source-and-konvid-holdout \
  --val-source Tennis --konvid-val-fraction 0.1 --seed 0 \
  --out-dir runs/tiny_combined_canonical
# Expect: FoxBird PLCC ≈ 0.9936 ± 1e-3 (numerical-noise floor),
# mean PLCC ≥ 0.9983 across 9 Netflix clips.

0076 — Research-0024 vif/adm upstream-divergence digest (Strategy E doc)

  • No ADR. Pure documentation digest; the divergence decisions it ratifies are already governed by ADR-0138 / 0139 / 0142 / 0143 (vif SIMD bit-exactness contract) and ADR-0024 (Netflix golden-data immutability). The digest itself fits the per-PR research-digest deliverable bar from ADR-0108.
  • Upstream source: forward-looking — pre-emptively documents the fork's non-port of Netflix 4ad6e0ea / 41d42c9e / bc744aa3 / 8c645ce3 (vif chain) and 4dcc2f7c (float_adm chain). Strategy A on b949cebf motion chain stays approved.
  • Touches (additive only):
  • docs/research/0024-vif-upstream-divergence.md — 5-strategy decision matrix + numerical-risk analysis for each chain.
  • core/src/feature/AGENTS.md — two new "rebase-sensitive invariants" entries pinning the vif and adm divergences.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant — these are the whole point):
  • Do not port 4ad6e0ea (vif runtime helpers) or 8c645ce3 (vif prescale options) verbatim. They replace the precomputed vif_filter1d_table_s table whose frozen const float Gaussians make AVX2 == AVX-512 == NEON == scalar bit-for-bit. A future opt-in second-path port (Strategy C, runtime helpers behind --vif-prescale != 1) is allowed but must not touch the default code path.
  • Do not port 4dcc2f7c float_adm options chain. The 12-parameter compute_adm signature change cascades through SIMD (avx2 / avx512 / neon) and 3 GPU backends (vulkan / cuda / sycl). The new aim feature has no fork-side golden values; defer until concrete user demand.
  • Mirror bugfix 41d42c9e is a separate decision. Must come paired with places=4 → places=3 golden loosening per ADR-0142 Netflix-authority precedent. Not part of Strategy E; eligible for a focused single-purpose PR if any shipped model drifts more than places=3 because of the missing fix.
  • b949cebf motion chain port stays APPROVED under Strategy A (verbatim, float_motion-side only). Float_motion has no precomputed-table investment to protect; existing fork integer_motion already has 6/9 of these options; cheap to mirror onto float_motion.
  • On upstream sync: zero conflict — pure additions to research/ and AGENTS.md.
  • Re-test on rebase: documentation-only PR; rendered markdown is the only verification surface.
# Re-run the diff scan that produced the digest (catches new
# upstream commits since 9dac0a59):
git fetch upstream && git log --pretty=format:'%h %s' \
  upstream/master ^origin/master --since="2026-01-01" \
  -- core/src/feature/{float_,integer_,}{vif,motion,adm,cambi}*.{c,h} \
     core/src/feature/{vif,motion,adm,cambi}_options.h \
  | head -30
# If new vif / adm option ports appear, update Research-0024 §"Same
# divergence test for motion + float_adm" before deciding to port.

0075 — Upstream 798409e3 + 314db130 ports (CUDA null-deref + remove all.c)

  • No ADR. Pure upstream cherry-picks per ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
  • Upstream source:
  • 798409e3 (Lawrence Curtis, 2026-04-20): "Fix null deref crash on prev_ref update in pure CUDA pipelines"
  • 314db130 (Kyle Swanson, 2026-04-28): "libvmaf/feature: remove empty translation unit all.c"
  • Touches (additive / removal only):
  • core/src/libvmaf.c — adds if (ref && ref->ref) guard before vmaf_picture_ref(&vmaf->prev_ref, ref) at the two threaded paths (threaded_enqueue_one line 1057 and threaded_read_pictures_batch line 1105). Main path at line 1597 already has the guard.
  • core/src/feature/all.c — file deleted.
  • core/src/meson.build — drops the feature_src_dir + 'all.c' line.
  • core/src/feature/offset.c — updates the // NOLINTNEXTLINE comment to drop all.c from the list of per-feature consumers.
  • CHANGELOG.md Unreleased § Fixed (798409e3) + § Changed (314db130).
  • Invariants (rebase-relevant):
  • The fork has THREE prev_ref update sites; all need the if (ref && ref->ref) guard. The main vmaf_read_pictures path already had it (via read_pictures_update_prev_ref helper); the threaded paths (#ifdef VMAF_BATCH_THREADING) inherited the unguarded shape from upstream's old code. Future upstream rebases must preserve all three guards even if Netflix refactors the threaded paths.
  • all.c deletion is symbol-safe. All compute_* functions it forward-declared are reached via per-extractor TUs that #include the relevant <feature>.h. No external linker dependency on all.c's symbols.
  • On upstream sync: zero conflict expected — fork now matches upstream tip on these two surfaces.
  • Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=disabled
ninja -C build-cpu
meson test -C build-cpu  # 37 tests, all pass.

0074 — Combined Netflix + KoNViD-1k trainer driver

  • No ADR. Pure engineering follow-up; the architecture rationale is fully covered by ADR-0203 (training-prep architecture) and Research-0023 §5 (FoxBird-class outlier needs broader corpus).
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI trainer.
  • Stacks on the KoNViD-1k loader bridge (PR #178 / rebase-note 0073). Rebase order: land 0073 first.
  • Touches (additive only):
  • ai/train/train_combined.py — concatenating trainer that reuses _build_model / _train_loop / export_onnx from ai/train/train.py.
  • ai/tests/test_train_combined_smoke.py — 5 pytest cases (key splitter + --epochs 0 paths, no libvmaf or real corpus required).
  • docs/ai/training.md — "Combining KoNViD with the Netflix corpus" subsection rewritten from "follow-up" to runnable.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Reuse the canonical training-loop helpers. Don't fork _build_model / _train_loop / export_onnx into this file. Both trainers must share the model factory so a future change (e.g. adding mlp_large) lands in one place.
  • KoNViD train/val splits hold out whole clip keys, not random frames. A frame-level split would let frames from the same clip leak across train/val and inflate PLCC by 5-10 pp (well-known VQA pitfall — same reasoning as ADR-0203's Netflix 1-source-out split).
  • Missing data falls back, not errors. Missing --konvid-parquet → Netflix-only path. Missing --netflix-root → KoNViD-only path. Both missing → initial-weights ONNX export + rc=0 so the smoke command always produces a deterministic artefact.
  • On upstream sync: zero interaction; pure fork-local trainer.
  • Re-test on rebase:
pytest ai/tests/test_train_combined_smoke.py -v
# Expect: 5 passed (under ~3 s, no libvmaf required).
python ai/train/train_combined.py --epochs 0 \
  --netflix-root /tmp/missing --konvid-parquet /tmp/missing.parquet \
  --out-dir /tmp/combined_smoke
# Expect: <out-dir>/mlp_small_combined_final.onnx written, rc=0.
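
The whole-clip hold-out rule from the invariants can be sketched as a key-level split. split_by_key is a hypothetical helper, not the trainer's actual splitter; the assumptions are that each row carries a key field identifying its clip and that at least one key is always held out.

```python
import random


def split_by_key(rows, val_fraction, seed):
    """rows: list of dicts with a 'key' field. Holds out whole clip keys."""
    keys = sorted({r["key"] for r in rows})
    rng = random.Random(seed)          # seeded, so the split is reproducible
    rng.shuffle(keys)
    n_val = max(1, round(len(keys) * val_fraction)) if keys else 0
    val_keys = set(keys[:n_val])
    train = [r for r in rows if r["key"] not in val_keys]
    val = [r for r in rows if r["key"] in val_keys]
    return train, val


# Illustrative data: 10 clips x 3 frames.
rows = [{"key": f"clip{k}", "frame_index": i} for k in range(10) for i in range(3)]
train, val = split_by_key(rows, val_fraction=0.1, seed=0)
```

Because the split is made over keys before any frame is assigned, no clip can contribute frames to both sides, which is the leak a frame-level split would introduce.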

0073 — KoNViD-1k → VMAF-pair acquisition + loader bridge

  • No ADR. Acquisition + loader pieces are pure additions; the methodology fits inside ADR-0203 / Research-0019.
  • Upstream source: fork-local. KoNViD-1k integration is a fork-only training-data play.
  • Touches (additive only):
  • ai/scripts/konvid_to_vmaf_pairs.py — acquisition pipeline.
  • ai/train/konvid_pair_dataset.py — KoNViDPairDataset class mirroring NetflixFrameDataset's interface.
  • ai/tests/test_konvid_pair_dataset.py — 5 pytest cases.
  • docs/ai/training.md — new "C1 (KoNViD-1k corpus)" section.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • KoNViDPairDataset mirrors NetflixFrameDataset shape. feature_dim == 6, numpy_arrays() → (X, y) returns (n_frames, 6) + (n_frames,). If NetflixFrameDataset's feature order changes, mirror it here.
  • Acquisition parquet schema is fixed. Required columns: key, frame_index, vif_scale0..3, adm2, motion2, vmaf. Add freely; do NOT rename / drop those.
  • ai/data/konvid_vmaf_pairs.parquet and $VMAF_TINY_AI_CACHE/konvid-1k/ stay gitignored. They regenerate from raw KoNViD .mp4 sources.
  • On upstream sync: zero interaction.
  • Re-test on rebase:
pytest ai/tests/test_konvid_pair_dataset.py -v
# Expect: 5 passed
python ai/scripts/konvid_to_vmaf_pairs.py --max-clips 5
# Expect: ~7 s wall, ai/data/konvid_vmaf_pairs.parquet with
#         5 unique keys × ~200 frames each.
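
The fixed-schema invariant above is easy to check mechanically. The sketch below assumes the column names are available as a plain list (for example read from the parquet metadata); missing_columns is a hypothetical helper, and the required names are the ones listed in this entry with vif_scale0..3 expanded.

```python
REQUIRED_COLUMNS = (
    "key", "frame_index",
    "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3",
    "adm2", "motion2", "vmaf",
)


def missing_columns(columns):
    """Return the required columns absent from `columns`, in schema order."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS if c not in present]
```

Extra columns are ignored by design, matching the "add freely, do not rename or drop" rule.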

0072 — Tiny-AI 3-arch LOSO eval harness + Research-0023

  • No ADR. Methodology fits inside Research-0023; ADR-0203 already covers the training-prep architecture and the three-arch sweep concept.
  • Research digest: docs/research/0023-loso-3arch-results.md.
  • Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
  • Touches (additive only):
  • ai/scripts/eval_loso_3arch.py — new harness; reuses the _load_session + _load_clip + CLIPS helpers from eval_loso_mlp_small.py (PR #165).
  • docs/research/0023-loso-3arch-results.md — methodology + per-fold tables for mlp_small / mlp_medium / linear.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • Reuse the PR #165 helpers. Don't fork the _load_session external-data workaround into a copy — both scripts must keep using the same import. If a follow-up re-exports the shipped baselines with corrected external_data.location, both scripts deprecate the workaround simultaneously.
  • runs/ and model/tiny/training_runs/ stay gitignored. The harness writes runs/loso_eval/loso_3arch_eval.{json,md}; the durable record is the table in Research-0023 §2 + the per-fold tables in §3. Regenerate via the loop in §6 of the digest.
  • On upstream sync: zero interaction. Pure fork-local evaluation harness.
  • Re-test on rebase:
python ai/scripts/eval_loso_3arch.py
diff <(jq -r '.archs.mlp_small.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9808)
diff <(jq -r '.archs.mlp_medium.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9727)
diff <(jq -r '.archs.linear.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.3679)
# Expect: identical lines on a populated cache + identical fold ONNX.

0071 — T7-16 ADM Vulkan/SYCL drift verified-resolved (doc close)

  • No ADR. Verification-only close, sister of T7-15.
  • Upstream source: fork-local. ADM cross-backend gate is a fork-only test surface; Netflix/vmaf has no Vulkan or SYCL backend.
  • Touches (additive only):
  • docs/state.md — new "Recently closed" row for T7-16.
  • .workingdir2/BACKLOG.md — T7-16 row marked closed (local-only planning dossier; gitignored).
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • places=4 cross-backend ADM contract. Empirical adm_scale2 max_abs_diff is now 1e-6 (print floor; ULP=0) on Vulkan device 0 (NVIDIA), device 1 (Mesa anv on Arc), and SYCL device 0 (Arc); residual adm_scale1 ≈ 3.1e-5 and adm2 ≈ 5e-6 on 1/48 frames pass places=4 (5e-5 tolerance) but fail places=5. Hold the gate at places=4.
  • No ADM kernel source change. Fix is environmental (NVCC + driver + SYCL runtime).
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --feature adm --backend vulkan --device 0 --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324
# Expect: 0/48 mismatches across all 5 ADM metrics.
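
The places=4 versus places=5 distinction in the invariants reduces to a tolerance of half a unit in the last kept decimal. The sketch below assumes that comparison rule (it is inferred from the "5e-5 tolerance" wording, not read from the diff tool's source), and passes_places is a hypothetical helper.

```python
def tolerance(places):
    # Half a unit in the last kept decimal place: places=4 -> 5e-5.
    return 0.5 * 10 ** -places


def passes_places(max_abs_diff, places):
    """True when the difference is below the tolerance for `places`."""
    return max_abs_diff < tolerance(places)


# Residual quoted in this entry for adm_scale1.
ADM_SCALE1_RESIDUAL = 3.1e-5
```

Under this rule the adm_scale1 residual passes at places=4 and fails at places=5, which is why the gate is held at places=4.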

0070 — T7-15 motion CUDA/SYCL drift verified-resolved (doc close)

  • No ADR. Verification-only close; no code change in PR #172.
  • Upstream source: fork-local. Cross-backend gate is a fork-only test surface; not in Netflix/vmaf.
  • Touches (additive only):
  • docs/state.md — "Recently closed" row for T7-15.
  • .workingdir2/BACKLOG.md — T7-15 row marked closed (local-only planning dossier; gitignored).
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • The places=4 cross-backend gate stays at places=4. Empirical max_abs_diff is currently 0.0 (CUDA) or 1e-6 (SYCL/Vulkan, JSON %f rounding floor); tightening to places=5 is tempting, but the 1e-6 print-floor would then make the SYCL + Vulkan rows fail. Hold at places=4 until --precision=max is wired into the diff tool.
  • No motion-kernel source change. PR #172 didn't modify core/src/feature/cuda/integer_motion/*.cu or core/src/feature/sycl/integer_motion_sycl.cpp. The fix is environmental (NVCC + driver), so the next CI run on a fresh image needs to be re-verified against the gate.
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature motion --backend cuda \
  --places 4
# Expect: 0/48 mismatches, max_abs_diff = 0.0

0069 — libvmaf_vulkan.h installed under prefix (build bug)

  • No ADR. Build-system bug fix; matches existing CUDA / SYCL install conditions.
  • Upstream source: fork-local. Vulkan backend is fork-only; Netflix/vmaf has no libvmaf_vulkan.h.
  • Touches:
  • core/include/core/meson.build — adds an is_vulkan_enabled gate that handles the feature option's enabled / auto states; appends libvmaf_vulkan.h to platform_specific_headers when active.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • Install rule mirrors the CUDA / SYCL pattern but uses the feature-option API. The is_cuda_enabled = get_option('enable_cuda') == true boolean idiom doesn't apply to enable_vulkan because that's a feature option, not a boolean. Use .enabled() or .auto(). Don't "simplify" to == true — that would silently drop the install in the auto state.
  • Pairs with ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch which probes for the header via check_pkg_config libvmaf_vulkan "libvmaf >= 3.0.0" libvmaf/libvmaf_vulkan.h vmaf_vulkan_state_init_external. Removing the install rule re-introduces lawrence's 2026-04-28 symptom: FFmpeg silently drops the libvmaf_vulkan filter despite --enable-libvmaf-vulkan.
  • On upstream sync: zero interaction; Vulkan backend is fork-only.
  • Re-test on rebase:
cd libvmaf
CC=icx CXX=icpx meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=true -Denable_sycl=true -Db_lto=false
ninja -C build
meson install -C build --destdir /tmp/libvmaf-install
ls /tmp/libvmaf-install/usr/local/include/libvmaf/libvmaf_vulkan.h
# Expect: file exists.

0066 — --backend cuda inverted-gpumask fix (CLI bug)

  • No ADR. Bug fix; behaviour now matches the public-header VmafConfiguration::gpumask contract.
  • Upstream source: fork-local. The --backend CLI selector was added by the fork (Netflix/vmaf has no exclusive-backend selector).
  • Touches (additive + 1-line behavioural fix):
  • core/tools/cli_parse.c::parse_cli_args: the --backend cuda branch sets gpumask = 0 (was gpumask = 1).
  • core/test/test_cli_parse.c — 5 new regression tests (test_backend_{cpu,cuda_engages_cuda,cuda_preserves_explicit_gpumask,sycl,vulkan}) plus run_aom_ctc_tests / run_backend_tests helper split to keep run_tests under the function-size budget.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • VmafConfiguration::gpumask semantics: if gpumask: disable CUDA. compute_fex_flags in src/libvmaf.c routes CUDA only when gpumask == 0. Any code path that sets a non-zero gpumask to "request CUDA" silently disables it. The CLI's --backend cuda branch must set gpumask = 0 and rely on use_gpumask = true to trigger vmaf_cuda_state_init. Do not "fix" this back to gpumask = 1 — it's the bug being fixed.
  • Explicit --gpumask=N --backend cuda preserves N. A user who passes --gpumask=2 already has use_gpumask = true, so the --backend cuda branch's defaulting block (gated on !settings->use_gpumask) is skipped. The test_backend_cuda_preserves_explicit_gpumask regression locks this in.
  • On upstream sync: zero interaction; --backend is fork-only.
  • Re-test on rebase:
./build/test/test_cli_parse | grep -E 'backend_'
# Expect: 5 backend tests pass.
build/tools/vmaf -r REF -d DIS -w 576 -h 324 -p 420 -b 8 \
  --model "path=model/vmaf_v0.6.1.json" --threads 1 \
  --backend cuda --output cuda.json --json -q
python3 -c "import json; d=json.load(open('cuda.json')); \
  assert len(d['frames'][0]['metrics']) == 12, 'CUDA not engaged'"
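
The gpumask contract described in the invariants above can be modelled in a few lines. This is a Python sketch, not the fork's C code: `Settings`, `apply_backend` and `cuda_engaged` are hypothetical names, and the only rules taken from the notes are that CUDA is requested through use_gpumask, dispatched only when gpumask == 0, and that the defaulting block is skipped when use_gpumask is already set.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical stand-in for the CLI settings struct."""
    gpumask: int = 0
    use_gpumask: bool = False


def apply_backend(settings: Settings, backend: str) -> Settings:
    """Model of the --backend cuda defaulting block."""
    if backend == "cuda" and not settings.use_gpumask:
        # gpumask is a *disable* mask: 0 means "CUDA allowed".
        settings.gpumask = 0
        settings.use_gpumask = True
    return settings


def cuda_engaged(settings: Settings) -> bool:
    """CUDA runs only when it was requested and the mask is zero."""
    return settings.use_gpumask and settings.gpumask == 0


# --backend cuda with no explicit mask engages CUDA.
default_cuda = apply_backend(Settings(), "cuda")
# --gpumask=2 --backend cuda keeps the user's mask, so CUDA stays disabled.
explicit_mask = apply_backend(Settings(gpumask=2, use_gpumask=True), "cuda")
```

In this model, swapping the default back to `gpumask = 1` makes `cuda_engaged(default_cuda)` false, which is the silent CPU fallback the regression tests guard against.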

0067 — Tiny-AI PTQ accuracy across Execution Providers (T5-3e)

  • No ADR. Investigation/measurement PR; ADR-0129 already governs the PTQ workstream. Findings update docs/research/0006-tinyai-ptq-accuracy-targets.md §"GPU-EP quantisation"; that section was previously a deferred open question and now holds the empirical results.
  • Research digest: same file (Research-0006).
  • Upstream source: fork-local. Netflix/vmaf does not ship a PTQ harness or any tiny-AI ONNX path.
  • Touches (additive only):
  • ai/scripts/measure_quant_drop_per_ep.py — new sibling of measure_quant_drop.py. CPU+CUDA via ORT; Arc / OpenVINO-CPU via the native openvino Python runtime (no onnxruntime-openvino because no cp314 wheel exists). Reuses the _load_session rename workaround from PR #165 + a value_info-strip fix so dynamic-PTQ doesn't choke on the shipped MLP ONNX.
  • docs/ai/quant-eps.md — new user doc; linked from docs/ai/index.md.
  • docs/research/0006-tinyai-ptq-accuracy-targets.md — refreshed header, replaced "GPU-EP open question" with the measurement table, fixed pre-existing MD040/MD060 lints surfaced on the touched file.
  • docs/ai/index.md — added the quant-eps row, rewrapped to 80 cols.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant):
  • measure_quant_drop.py (the CI gate) is unchanged. The new script is purely additive. Any rebase that conflates the two scripts must keep the CI gate CPU-only — Arc int8 is broken, so a per-EP gate would red-light every PR.
  • value_info strip is required for vmaf_tiny_v1* dynamic PTQ. The shipped MLP ONNX duplicates weight tensors in value_info, which makes quantize_dynamic raise Inferred shape and existing shape differ. The fix is in _save_inlined. Don't remove it during a refactor unless the underlying ONNX is regenerated.
  • CUDA-12 ABI shim. ORT-GPU 1.25 wheels link libcublasLt.so.12 even on CUDA-13 hosts. The reproduction recipe pins the nvidia-*-cu12 wheels and prepends them to LD_LIBRARY_PATH. If a future ORT wheel drops the cu12 ABI we can cut the shim, but the script tolerates either since it doesn't import any CUDA symbol itself.
  • On upstream sync: zero interaction; entirely fork-local.
  • Re-test on rebase:
SP=$VIRTUAL_ENV/lib/python3.14/site-packages/nvidia
export LD_LIBRARY_PATH="$SP/cublas/lib:$SP/cudnn/lib:$SP/cuda_nvrtc/lib:$SP/cuda_runtime/lib:$SP/cufft/lib:$SP/curand/lib:$SP/cusolver/lib:$SP/cusparse/lib:$SP/cuda_cupti/lib:$SP/nvtx/lib:$SP/nvjitlink/lib"
python ai/scripts/measure_quant_drop_per_ep.py \
    --eps cpu cuda openvino \
    --extra-fp32 vmaf_tiny_v1.onnx vmaf_tiny_v1_medium.onnx \
    --out runs/quant-eps-$(date +%Y-%m-%d)
# Expected: CPU + CUDA PASS (drop ≤ 1.2e-4); OpenVINO Arc ERR
# (compile failure for Conv-int8) or NaN (MatMul-int8) until a
# newer intel_gpu plugin lands.
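
The long LD_LIBRARY_PATH export in the recipe above is easy to mistype. The following Python sketch builds the same colon-separated list from the package names that appear in the recipe; `cu12_library_path` and `NVIDIA_CU12_PACKAGES` are hypothetical helper names, not part of the repository.

```python
import os

# Sub-packages listed in the recipe above, in the same order.
NVIDIA_CU12_PACKAGES = [
    "cublas", "cudnn", "cuda_nvrtc", "cuda_runtime", "cufft", "curand",
    "cusolver", "cusparse", "cuda_cupti", "nvtx", "nvjitlink",
]


def cu12_library_path(nvidia_root: str, existing: str = "") -> str:
    """Join <nvidia_root>/<pkg>/lib for every package, prepended to `existing`."""
    dirs = [os.path.join(nvidia_root, pkg, "lib") for pkg in NVIDIA_CU12_PACKAGES]
    if existing:
        dirs.append(existing)
    return os.pathsep.join(dirs)
```

Passing the current value of LD_LIBRARY_PATH as `existing` keeps any paths that were already set; the shell recipe above overwrites the variable instead.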

0065 — testdata/bench_all.sh correct backend-engagement flags

  • No ADR. Bug fix; no behavioural surface change beyond "the bench actually engages the backends it claims to now."
  • Upstream source: fork-local. testdata/bench_all.sh is a fork-only bench harness; not in Netflix/vmaf.
  • Touches (additive only):
  • testdata/bench_all.sh — switched per-row flag pattern from the disable-only singletons (--no_sycl for "CUDA", etc.) to the correct engagement form (--gpumask=0 --no_sycl --no_vulkan for CUDA, --sycl_device=0 --no_cuda --no_vulkan for SYCL, --vulkan_device=0 --no_cuda --no_sycl for Vulkan, and --no_cuda --no_sycl --no_vulkan for CPU). Added a 4th column (Vulkan) to the comparator. Honours $VMAF_BIN for the binary path and $VMAF_ONEAPI_SETVARS for the oneAPI install location.
  • CHANGELOG.md Unreleased § Fixed.
  • Invariants (rebase-relevant):
  • Disable-only singletons don't engage a backend. --no_sycl alone leaves CUDA available but unrequested. --no_cuda alone leaves SYCL available but unrequested. The CLI inits CUDA only when c.use_gpumask is set; SYCL only when c.sycl_device >= 0 or c.use_gpumask; Vulkan only when c.vulkan_device >= 0. Any change to those gates that drops one of the per-row flags will re-introduce the silent CPU fallback. Verify after a rebase by inspecting JSON frames[0].metrics key counts (CPU 14-15, CUDA 11-12, Vulkan ~34) — see libvmaf/AGENTS.md §"Backend-engagement foot-guns".
  • gpumask semantics are inverted from intuition. gpumask=0 enables CUDA dispatch; gpumask=1 disables it. The per-row CUDA flag is --gpumask=0, not --gpumask=1. Don't "fix" it to --gpumask=1 for symmetry with sycl_device/vulkan_device — that's the bug being fixed (parallel to PR #170).
  • On upstream sync: zero interaction; testdata/bench_all.sh is fork-only.
  • Re-test on rebase:
bash testdata/bench_all.sh    # smoke
# Verify each row's JSON keys match the expected per-backend count:
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cpu.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cuda.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_vulkan.json
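
The jq checks above can also be scripted. This is a minimal Python sketch that assumes the JSON layout used by those commands (`frames[0].metrics`) and the key-count ranges quoted in the invariant; the window around the "~34" Vulkan figure is an assumption of the sketch, and the function names are hypothetical.

```python
import json

# Expected key-count ranges from the invariant above. The notes give
# Vulkan as "~34"; the 32..36 window is an assumption of this sketch.
EXPECTED_KEYS = {
    "cpu": range(14, 16),
    "cuda": range(11, 13),
    "vulkan": range(32, 37),
}


def metric_key_count(report: dict) -> int:
    """Number of metric keys on the first frame of a vmaf JSON report."""
    return len(report["frames"][0]["metrics"])


def backend_engaged(report: dict, backend: str) -> bool:
    """True when the first frame's key count falls in the backend's range."""
    return metric_key_count(report) in EXPECTED_KEYS[backend]


def check_file(path: str, backend: str) -> bool:
    """Load a report from disk and apply the key-count check."""
    with open(path, encoding="utf-8") as handle:
        return backend_engaged(json.load(handle), backend)
```

A CUDA row that reports a CPU-range key count is the signature of the silent CPU fallback described above.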

0063 — Tiny-AI LOSO eval harness for mlp_small

  • No ADR. The methodology fits inside Research Digest 0022; ADR-0203 already covers the training-prep architecture.
  • Research digest: docs/research/0022-loso-mlp-small-results.md.
  • Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
  • Touches (additive only):
  • ai/scripts/eval_loso_mlp_small.py — new evaluation harness.
  • docs/ai/loso-eval.md — usage doc.
  • docs/research/0022-loso-mlp-small-results.md — methodology + results.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (rebase-relevant):
  • _load_session workaround for renamed-baseline ONNX. The shipped baselines model/tiny/vmaf_tiny_v1*.onnx reference their pre-rename external_data.location values. The workaround in _load_session rewrites the entries before handing the proto to ORT. Removing the workaround breaks the baseline phase. The proper fix (re-export with matching names) is tracked as a follow-up; until then this code path is load-bearing.
  • runs/ and model/tiny/training_runs/ stay gitignored. The harness writes to runs/loso_eval/ by default; do NOT promote any of those outputs into the tree. The 9 fold ONNX and the per-clip JSON cache regenerate from the corpus + trainer + libvmaf CLI.
  • On upstream sync: zero interaction. Pure fork-local evaluation harness.
  • Re-test on rebase:
python ai/scripts/eval_loso_mlp_small.py
diff <(jq -r '.loso_aggregate.mean_plcc' runs/loso_eval/loso_mlp_small_eval.json) <(echo 0.9808)
# Expect: identical line on a populated cache + identical fold ONNX.
  • No ADR. Process / docs PR; rows trace back to the individually-cited ADRs / research digests in their own References columns.
  • Decision dossier: .workingdir2/decisions/section-a-decisions-2026-04-28.md.
  • Source audit: docs/backlog-audit-2026-04-28.md.
  • Upstream source: fork-local. Pure backlog hygiene PR; no Netflix code touched.
  • Touches (additive only):
  • .workingdir2/BACKLOG.md — 9 new rows: T3-17, T3-18, T5-3e, T5-4, T7-35, T7-36, T7-37, T7-38; T6-1a row extended with the bisect-cache fixture sub-bullet.
  • docs/research/0006-tinyai-ptq-accuracy-targets.md — drops the "defer until first user" framing on the GPU-EP quantisation open question per user direction; cross-links T5-3e.
  • docs/research/0020-cambi-gpu-strategies.md — v2 follow-up section now cites T7-36 as the gate for opening the v2 row.
  • docs/adr/0205-cambi-gpu-feasibility.md — Decision section's "follow-up integration PR" now cites T7-36.
  • CHANGELOG.md Unreleased § Changed.
  • Invariants (rebase-relevant): none. Pure backlog text. Rebase-conflict risk is limited to the same BACKLOG.md table rows that any future row addition would touch; trivial to re-resolve.
  • On upstream sync: zero interaction.
  • Re-test on rebase: none — docs-only.

0062 — ssimulacra2 CUDA + SYCL twins (ADR-0206)

  • ADR: ADR-0206.
  • Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2 GPU implementation; this PR adds the CUDA + SYCL twins of the fork's ADR-0201 Vulkan kernel.
  • Touches (additive + small wiring edits):
  • docs/adr/0206-ssimulacra2-cuda-sycl.md and the index row in docs/adr/README.md.
  • core/src/feature/cuda/ssimulacra2_cuda.{c,h} — new CUDA dispatch.
  • core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu and ssimulacra2_mul.cu — new CUDA fatbins.
  • core/src/feature/sycl/ssimulacra2_sycl.cpp — new SYCL extractor.
  • core/src/feature/feature_extractor.c — two new extern declarations + two new entries in feature_extractor_list[].
  • core/src/meson.build — adds ssimulacra2_blur + ssimulacra2_mul to cuda_cu_sources, introduces (or extends, if PR #157 / ADR-0202 landed first) the cuda_cu_extra_flags map with a ssimulacra2_blur entry, threads per_kernel_flags into the fatbin custom-target, and lists the two new C / CPP TUs.
  • core/src/cuda/AGENTS.md and core/src/sycl/AGENTS.md — rebase invariant notes for the per-kernel --fmad=false flag and the -fp-model=precise SYCL build flag.
  • docs/backends/cuda/overview.md, docs/backends/sycl/overview.md, docs/metrics/features.md — coverage matrix updates.
  • CHANGELOG.md Unreleased § Added.
  • Invariants (load-bearing on rebase):
  • Per-kernel --fmad=false for ssimulacra2_blur. The IIR's o = n2 * sum - d1 * prev1 - prev2 must NOT fuse into FMAs — without the flag the recursive Gaussian's per-step rounding compounds across the 6-scale pyramid past places=4.
  • -fp-model=precise on the SYCL feature build line. Removing it drifts ssimulacra2_sycl past places=2 through the IIR.
  • Hybrid host/GPU split mirrors Vulkan. Host runs YUV→RGB, XYB, downsample, and SSIM/EdgeDiff combine in double; GPU runs only mul + IIR blur. Any future PR that ports XYB or YUV→RGB onto the GPU MUST land alongside an updated ADR-0206 and re-validate places=4 on every Netflix CPU pair.
  • CUDA fex uses .extract (synchronous), not .submit/.collect. Per-frame raw YUV is D2H-copied from picture_cuda's device-side VmafPicture.data[] into pinned host scratch via cuMemcpy2DAsync. Skipping the copy segfaults — direct host reads on a CUdeviceptr are the failure mode the prior agent's WIP hit.
  • On upstream sync: zero interaction with Netflix. The GPU coverage matrix for ssimulacra2 is wholly fork-local.
  • Re-test on rebase:
meson setup build_cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda

python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary ./build_cuda/tools/vmaf \
  --feature ssimulacra2 --backend cuda --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8
# Expect: 0/48 mismatches, max_abs_diff ~1e-6.
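
The --fmad=false invariant rests on the fact that a fused multiply-add rounds once while a separate multiply and add round twice. The sketch below illustrates that difference in Python using double precision and exact rational arithmetic to emulate the fused result; it is an illustration of the rounding effect only, not the kernel's fp32 code, and the input values are chosen for the demonstration.

```python
from fractions import Fraction


def unfused(a: float, b: float, c: float) -> float:
    """a*b is rounded to double first, then the add is rounded again."""
    return a * b + c


def fused(a: float, b: float, c: float) -> float:
    """Emulate an FMA: compute a*b+c exactly, then round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

two_roundings = unfused(a, b, c)  # 0.0: the product rounds to exactly 1.0
one_rounding = fused(a, b, c)     # -(2 ** -60): the exact answer survives
```

In a recursive filter each output feeds the next step, so a per-step difference of this kind can accumulate rather than stay at one ulp.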

0061 — cambi GPU feasibility spike (ADR-0205)

  • ADR: ADR-0205.
  • Research digest: docs/research/0020-cambi-gpu-strategies.md.
  • Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
  • Touches (additive only):
  • docs/adr/0205-cambi-gpu-feasibility.md, docs/research/0020-cambi-gpu-strategies.md, docs/adr/README.md index row.
  • core/src/feature/vulkan/cambi_vulkan.c — new dormant scaffold (not yet in vulkan_sources, not yet registered).
  • core/src/feature/vulkan/shaders/cambi_{derivative,decimate,filter_mode}.comp — new reference GLSL shaders, not yet in the build's shaders list.
  • core/src/feature/AGENTS.md invariants + CHANGELOG.md bullet.
  • Invariants (rebase-relevant):
  • Hybrid host/GPU port by decision. If Netflix upstream tightens the c-value formula or histogram update protocol, the host residual call site in the eventual cambi_vulkan.c::cambi_vulkan_extract must be updated alongside cambi.c::calculate_c_values — the same code is reused. Do NOT translate the c-values phase to GPU during any upstream-port PR; that optimisation belongs to the v2 strategy-III PR (deferred).
  • Scaffolds dormant in the spike PR. The cambi_vulkan.c extractor returns -ENOSYS from cambi_vulkan_init_stub until the integration follow-up wires it in. Do NOT register vmaf_fex_cambi_vulkan_scaffold in feature_extractor.c's list.
  • Shaders not in the build's shader list. Adding them to core/src/vulkan/meson.build's vulkan_shaders list before the integration PR produces orphaned *_spv.h headers. Leave them alone in this spike PR.
  • On upstream sync: zero interaction. cambi.c itself is upstream-mirrored — Netflix changes flow through port-upstream-commit; only the integration PR's host residual call site needs paired attention.
  • Re-test on rebase:

meson setup build -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build

0059 — Tiny-AI Netflix corpus training prep (ADR-0203)

  • ADR: ADR-0203.
  • Upstream source: fork-local. Netflix/vmaf has no equivalent training surface.
  • Touches:
  • ai/data/ — Netflix loader, libvmaf-CLI feature extractor, distillation scoring.
  • ai/train/ — PyTorch dataset, eval harness, Lightning-style training entry point.
  • ai/scripts/run_training.sh — convenience wrapper.
  • ai/tests/ — five new pytest modules (test_netflix_loader.py, test_dataset.py, test_eval.py, test_train_smoke.py, plus conftest.py).
  • docs/ai/training.md — new "C1 (Netflix corpus)" section; existing sections untouched.
  • ai/AGENTS.md — invariants section added.
  • Invariants (load-bearing):
  • Filename ladder regex is fork-specific. <source>_<quality>_<height>_<bitrate>.yuv (dis) + <source>_<fps>fps.yuv (ref). Upstream may publish a different naming convention later; do NOT merge them — keep this loader scoped to the Netflix corpus, add a sibling loader for any upstream alternative.
  • Per-clip cache schema is consumed by both dataset and any downstream tooling. Schema is {features:{feature_names, per_frame, n_frames}, scores:{per_frame, pooled}}. Any change must invalidate $VMAF_TINY_AI_CACHE (delete or version-tag the directory).
  • Smoke command stays runnable without a built vmaf binary. The _make_zero_payload helper in ai.train.dataset injects a fake payload for --epochs 0 so CI gates don't drag a libvmaf build into the Python test surface.
  • YUV size probe never silently guesses. probe_yuv_dims either matches the 1920x1080 default, returns ffprobe's answer, or raises. Tests pass assume_dims=(16, 16) explicitly for synthetic fixtures.
  • On upstream sync: no interaction with upstream. The ai/ subtree is wholly fork-local.
  • Re-test on rebase:
python -m pytest ai/tests/test_netflix_loader.py \
    ai/tests/test_dataset.py ai/tests/test_eval.py \
    ai/tests/test_train_smoke.py -v
python ai/train/train.py --epochs 0 --data-root /tmp/mock_corpus \
    --assume-dims 16x16 --val-source BetaSrc --out-dir /tmp/out
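
The filename ladder described in the invariants above can be expressed as two regular expressions. This is a sketch under stated assumptions: the notes give only the field order, so treating quality, height, bitrate and fps as plain integers is an assumption, and `DIS_RE`, `REF_RE` and `classify` are hypothetical names rather than the loader's real ones.

```python
import re
from typing import Optional

# Distorted clips: <source>_<quality>_<height>_<bitrate>.yuv
# Assumption: quality, height and bitrate are plain integers.
DIS_RE = re.compile(
    r"^(?P<source>.+)_(?P<quality>\d+)_(?P<height>\d+)_(?P<bitrate>\d+)\.yuv$"
)
# Reference clips: <source>_<fps>fps.yuv
REF_RE = re.compile(r"^(?P<source>.+)_(?P<fps>\d+)fps\.yuv$")


def classify(filename: str) -> Optional[dict]:
    """Return the parsed fields plus a 'kind' key, or None when nothing matches."""
    for kind, pattern in (("ref", REF_RE), ("dis", DIS_RE)):
        match = pattern.match(filename)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None
```

Returning None for unknown names, rather than guessing, follows the same "never silently guess" stance the notes take for the YUV size probe.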

0073 — Tiny-AI QAT trainer + first per-model QAT pass (T5-4)

  • ADR: ADR-0207 (design), ADR-0208 (per-model impl).
  • Touches: ai/train/qat.py (new), ai/scripts/qat_train.py (rewrite from NotImplementedError scaffold), ai/configs/learned_filter_v1_qat.yaml (new), ai/tests/test_qat_smoke.py (new), docs/ai/quantization.md (QAT tier added). All paths are wholly fork-local; no upstream Netflix/vmaf interaction.
  • Invariants:
  • Two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) is load-bearing. Both the legacy ONNX exporter (quantized::conv2d) and the new TorchDynamo exporter (Conv2dPackedParamsBase.__obj_flatten__) refuse to consume convert_fx output on PyTorch 2.11. The bridge (state-dict diff to a fresh fp32 module + ORT static-quantize) is the only path that yields a QDQ ONNX. Do NOT collapse to a single-step convert_fx → torch.onnx.export until both PyTorch issues are fixed; re-check both exporters on each PyTorch upgrade.
  • State-dict transfer matches by submodule name + shape. _copy_qat_weights_into_fp32 walks fp32_state keys, finds the same key in the FX-prepared module, copies the tensor. Tiny-AI models today have stable submodule names (entry, body.*, exit); a model architecture that uses top-level nn.Sequential would break this because prepare_qat_fx renames Sequential children to numeric indices. The RuntimeError("0 tensors copied") guard catches the silent failure mode.
  • FX preparation runs on CPU. PyTorch 2.11's FX symbolic tracer is flaky on CUDA buffers; the trainer migrates the model to CPU before prepare_qat_fx and back to the accelerator for the fine-tune phase. The smoke test deliberately exercises the CPU path so this stays covered.
  • torch.ao.quantization deprecation will hard-fail in PyTorch 2.10. Migration target is torchao.quantization.pt2e (prepare_pt2e / convert_pt2e); the two-step pipeline is mostly pt2e-compatible — only the FX-prep call changes.
  • On upstream sync: no interaction with upstream. The ai/ subtree is fully fork-local.
  • Re-test on rebase:
python -m pytest ai/tests/test_qat_smoke.py -v
python ai/scripts/qat_train.py \
    --config ai/configs/learned_filter_v1_qat.yaml \
    --output /tmp/qat_smoke.int8.onnx --smoke
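
The name-and-shape matching rule and its "0 tensors copied" guard can be sketched without PyTorch. In the Python below a tensor is modelled as a `(shape, values)` pair and `copy_matching` is a hypothetical stand-in for `_copy_qat_weights_into_fp32`; only the matching rule and the guard message come from the notes.

```python
from typing import Dict, List, Tuple

# A "tensor" here is just (shape, flat values); real code would use torch.Tensor.
Tensor = Tuple[Tuple[int, ...], List[float]]


def copy_matching(qat_state: Dict[str, Tensor],
                  fp32_state: Dict[str, Tensor]) -> int:
    """Copy QAT-trained weights into the fp32 state by key name and shape."""
    copied = 0
    for name, (shape, _) in fp32_state.items():
        source = qat_state.get(name)
        if source is None or source[0] != shape:
            continue  # renamed or reshaped submodule: nothing to copy
        fp32_state[name] = (shape, list(source[1]))
        copied += 1
    if copied == 0:
        # Mirrors the guard described above: a silent no-op is a bug.
        raise RuntimeError("0 tensors copied")
    return copied
```

If every key on the QAT side has been renamed to a numeric index, nothing matches and the guard fires instead of exporting an untrained fp32 module.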

0074 — GPU-parity matrix CI gate (T6-8 / ADR-0214)

  • Touched surfaces (fork-local): scripts/ci/cross_backend_parity_gate.py (new), .github/workflows/tests-and-quality-gates.yml (new vulkan-parity-matrix-gate job), docs/development/cross-backend-gate.md (new), docs/backends/index.md (cross-backend section), libvmaf/AGENTS.md (rebase-sensitive invariant note).
  • Why this matters on rebase: the CI lane and the matrix-gate script are entirely fork-local. Upstream Netflix/vmaf has no comparable gate; conflicts on rebase are restricted to the CI workflow file when upstream rearranges its own jobs. The gate's Python script lives outside core/src/ so the upstream-sync path doesn't see it.
  • Invariants the gate enforces:
  • Per-feature absolute tolerance is declared in one place (FEATURE_TOLERANCE in scripts/ci/cross_backend_parity_gate.py). Tightening a tolerance requires a measurement-driven follow-up ADR; loosening requires a justification ADR (CLAUDE.md §12 r1).
  • The legacy single-feature gate scripts/ci/cross_backend_vif_diff.py stays for one release cycle. Sister PRs in this session add to it; the T6-8b cleanup PR deletes it once the matrix gate has soaked.
  • CUDA / SYCL / hardware-Vulkan are advisory until a self-hosted runner is registered. The script supports them via --backends; flipping the CI lane to required is a follow-up wiring change, not a code change.
  • On upstream sync: no interaction with upstream tests-and-quality-gates.yml (the gate job is fork-added); rebase conflicts limited to insertion-order in the workflow file.
  • Re-test on rebase:
cd libvmaf && meson setup build \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build
cd ..
python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --backends cpu vulkan \
    --json-out /tmp/parity.json --md-out /tmp/parity.md
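
Declaring per-feature tolerances in a single table, as the invariants above require, might look like the following Python sketch. The table name `FEATURE_TOLERANCE` comes from the notes; the entries, the values and the `compare_feature` helper are hypothetical and do not reproduce the real gate script.

```python
# Hypothetical tolerance table; the real values live in
# scripts/ci/cross_backend_parity_gate.py and are not reproduced here.
FEATURE_TOLERANCE = {
    "vif": 1e-4,
    "motion": 1e-4,
}


def compare_feature(feature, cpu_scores, gpu_scores):
    """Return (mismatch_count, max_abs_diff) under the feature's tolerance."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("frame count differs between backends")
    tol = FEATURE_TOLERANCE[feature]  # KeyError means no tolerance was declared
    diffs = [abs(c - g) for c, g in zip(cpu_scores, gpu_scores)]
    mismatches = sum(1 for d in diffs if d > tol)
    return mismatches, max(diffs, default=0.0)
```

Looking the tolerance up by key, with no fallback default, means a new feature cannot enter the matrix without an explicit tolerance decision.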

0220 — SYCL feature kernels are unconditionally fp64-free (T7-17)

  • Touches: core/src/sycl/common.cpp (init log line), core/src/sycl/AGENTS.md (new invariant row), all SYCL feature kernels under core/src/feature/sycl/ (no diff today, but the contract pins their shape going forward).
  • Invariant: every SYCL feature-kernel lambda captures and operates on float / integer types only. No double operand inside a parallel_for body, no sycl::reduction<double>, no sycl::plus<double>. A single fp64 instruction in the TU's SPIR-V module causes the Level Zero runtime to reject the entire module on Intel Arc A-series and other fp64-less devices, even when the offending kernel is never submitted. Host-side double (in extract / flush post-processing, score aggregation, log10 normalisation) remains fine. Concrete patterns in tree: ADM gain limiting via int64 Q31 (gain_limit_to_q31 + launch_decouple_csf<false> in integer_adm_sycl.cpp); VIF gain limiting via fp32 sycl::fmin; CIEDE / SSIM accumulators via sycl::reduction<int64_t> / sycl::plus<int64_t>.
  • On upstream sync: Netflix/vmaf has no SYCL backend upstream; conflicts cannot enter via git merge. The risk is a fork-local cherry-pick (e.g. a SYCL twin of a new CUDA kernel) bringing a double into a kernel lambda. Audit the lambda capture list and any sycl::reduce* calls against this invariant before merging.
  • Re-test on rebase:
# Build SYCL backend
meson setup build-sycl libvmaf -Denable_sycl=true CC=icx CXX=icpx
ninja -C build-sycl

# On an fp64-less device (e.g. Intel Arc A380), confirm the
# init log line is INFO-level and reads "device lacks native
# fp64 — kernels already use fp32 + int64 paths, no emulation
# overhead". The SYCL kernels must launch successfully (no
# SPIR-V module rejection from the Level Zero runtime).
build-sycl/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --backend sycl \
    --feature integer_vif --feature integer_adm \
    --output /tmp/sycl-fp64less.json --json
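
Part of the pre-merge audit can be automated with a text scan. The Python sketch below looks only for the two reduction patterns named in the invariant; it deliberately does not flag a bare `double`, because host-side double is allowed and telling it apart from a double inside a parallel_for body needs a real parser. `find_fp64_reductions` is a hypothetical helper name.

```python
import re

# Patterns named in the invariant above. A plain `double` is not flagged
# because host-side double is allowed; this scan is a first pass only.
FP64_PATTERNS = [
    re.compile(r"sycl::reduction\s*<\s*double\s*>"),
    re.compile(r"sycl::plus\s*<\s*double\s*>"),
]


def find_fp64_reductions(source: str):
    """Return (line_number, line) pairs that match a banned fp64 pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in FP64_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

An empty result does not prove a kernel is fp64-free; the lambda bodies still need a manual read, and the run on an fp64-less device remains the real test.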

0091 — T6-9 model registry schema + --tiny-model-verify (ADR-0211)

  • No rebase impact: 100% fork-local surface. The registry (model/tiny/registry.json), its JSON Schema (model/tiny/registry.schema.json), the --tiny-model-verify CLI flag, and the vmaf_dnn_verify_signature() C entry point are entirely fork-local — none of these paths exist in upstream Netflix/vmaf. Listed here for completeness so a future /sync-upstream run sees the surface area was acknowledged.
  • Touches (additive only): model/tiny/registry.json, model/tiny/registry.schema.json, ai/scripts/validate_model_registry.py, core/src/dnn/model_loader.{c,h} (added vmaf_dnn_verify_signature()), core/include/libvmaf/dnn.h (public declaration), core/tools/cli_parse.{c,h} (ARG_TINY_MODEL_VERIFY + tiny_model_verify field), core/tools/vmaf.c (call site), core/test/dnn/test_tiny_model_verify.c, python/test/model_registry_schema_test.py, docs/ai/model-registry.md, docs/ai/inference.md, docs/ai/security.md, docs/adr/0209-...md, docs/adr/README.md (index row), CHANGELOG.md, core/src/dnn/AGENTS.md.
  • Invariants (rebase-relevant):
  • Schema is the contract. New registry fields land in registry.schema.json first, then in registry.json, then in any consumers (the C-side parser, the Python validator, the MCP). Reverse order causes mismatch.
  • schema_version is bounded. The schema accepts only {0, 1}; bump the enum and the loader's check together when adding 2.
  • Banned-function rule applies. The cosign invocation uses posix_spawnp(3p) with an explicit argv array. Do not replace with system(3) / popen(3) — both shell-parse the command and would re-introduce injection risk.
  • Bundle-file absence is fail-closed. When sigstore_bundle points at a not-yet-existing file (pre-release state), vmaf_dnn_verify_signature() returns -ENOENT. The CLI surfaces this as a load failure; do not "soften" to a warning without an explicit ADR.
  • Re-test on rebase:
python3 ai/scripts/validate_model_registry.py
python3 -m pytest python/test/model_registry_schema_test.py -v
meson test -C build-cpu --suite=dnn
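
The reason the banned-function rule prefers an explicit argv array over a shell string can be shown with a Python analogue. This sketch uses `subprocess.run` with a list, which does not involve a shell, in place of the C-side posix_spawnp call; the child command and the hostile-looking file name are made up for the demonstration.

```python
import subprocess
import sys


def run_with_argv(argv):
    """Run a child with an explicit argv list; no shell parses the arguments."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


# A hostile-looking path is passed through as one literal argument.
hostile = "model.onnx; echo injected"
echoed = run_with_argv(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile]
)
```

The child sees the whole string, semicolon included, as a single argument. Passed through a shell, the same text would be split into two commands.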

0074 — HIP (AMD ROCm) backend scaffold (T7-10)

  • ADR: ADR-0212.
  • Upstream source: fork-local. HIP backend is fork-only; Netflix/vmaf has no libvmaf_hip.h and no enable_hip meson option.
  • Touches:
  • core/include/libvmaf/libvmaf_hip.h (new).
  • core/include/core/meson.build — adds the is_hip_enabled install gate, mirroring is_cuda_enabled / is_sycl_enabled boolean idioms.
  • core/meson_options.txt — new enable_hip boolean option (default false).
  • core/src/meson.build — new is_hip_enabled flag, conditional subdir('hip'), hip_sources + hip_deps threaded through libvmaf_feature_static_lib (alongside the existing CUDA / SYCL / Vulkan aggregations) and the top-level library('vmaf', ...) dependencies list.
  • core/src/hip/ (new directory: common.{c,h}, picture_hip.{c,h}, dispatch_strategy.{c,h}, meson.build).
  • core/src/feature/hip/ (new directory: adm_hip.c, vif_hip.c, motion_hip.c).
  • core/test/test_hip_smoke.c (new).
  • core/test/meson.build — registers the smoke test under if get_option('enable_hip') == true.
  • .github/workflows/libvmaf-build-matrix.yml — adds Build — Ubuntu HIP (T7-10 scaffold) row.
  • docs/backends/hip/overview.md (new), docs/backends/index.md (planned → scaffold row), docs/research/0033-hip-applicability.md (new), docs/adr/0212-hip-backend-scaffold.md (new), docs/adr/README.md (new index row).
  • libvmaf/AGENTS.md — new "HIP backend scaffold contract" rebase-sensitive invariant entry.
  • CHANGELOG.md — Unreleased § Added.
  • Invariants (rebase-relevant):
  • enable_hip is a boolean option, not a feature. Mirrors enable_cuda / enable_sycl; do not "harmonise" with enable_vulkan's feature / disabled form without an ADR amendment per ADR-0212 § "Decision".
  • Public C-API entry points return -ENOSYS for the scaffold. The smoke test core/test/test_hip_smoke.c pins this. A rebase that "succeeds" by accidentally enabling a code path (e.g. a refactor that early-returns 0 from vmaf_hip_state_init) breaks the smoke and the runtime PR's contract baseline.
  • hip_sources is added to libvmaf_feature_static_lib, NOT directly to the top-level library('vmaf', ...). The static lib is extracted into libvmaf via objects: [..., libvmaf_feature_static_lib.extract_all_objects(recursive: true), ...] at the bottom of core/src/meson.build. Adding hip_sources to the top library() too would double-link.
  • hip_deps IS added to the top library() dependencies: list. The runtime PR will populate hip_deps with the real dependency('hip-lang') linkage; threading it through the top library() ensures consumers see the transitive dependency.
  • Header purity: libvmaf_hip.h does not include <hip/hip_runtime.h>. HIP runtime types cross the public ABI as uintptr_t (matches the CUDA / Vulkan precedent; ADR-0212). Don't add <hip/...> includes to the public header during a rebase / runtime-PR bring-up.
  • No FFmpeg patch: the fork's ffmpeg-patches/ series does not currently consume the HIP API surface. CLAUDE §12 r14 only requires patch updates when an existing patch consumes the surface; the runtime PR (T7-10b) will add the hip_device filter option and the corresponding patch.
  • On upstream sync: zero interaction; HIP backend is fork-only.
  • Re-test on rebase:
cd libvmaf
meson setup build-hip -Denable_cuda=false -Denable_sycl=false \
                      -Denable_hip=true
ninja -C build-hip
meson test -C build-hip test_hip_smoke
# Expect: 9/9 pass.

# Default no-HIP build still works:
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=fast

0074 — SSIMULACRA 2 SVE2 SIMD parity (T7-38)

  • ADR: ADR-0213.
  • Touches: core/src/feature/arm64/ssimulacra2_sve2.{c,h} (new), core/src/feature/ssimulacra2.c (dispatch table override in init_simd_dispatch), core/src/arm/cpu.{c,h} (HWCAP2_SVE2 probe + new VMAF_ARM_CPU_FLAG_SVE2 enum value), core/src/meson.build (cc.compiles probe + optional arm64_ssimulacra2_sve2 static library), core/test/test_ssimulacra2_simd.c (SVE2 picker overrides on the arm64 path + dispatch diagnostic), build-aux/aarch64-linux-gnu-sve2.ini (new cross-file pinning qemu-aarch64-static -cpu max). All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
  • Invariants:
  • Fixed 4-lane SVE2 predicate. Every kernel uses svwhilelt_b32(0, 4) so SIMD arithmetic order is identical to the NEON sibling regardless of the runtime vector length. This keeps the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract intact. Do NOT widen the predicate to svptrue_b32() without a separate ADR + snapshot regen — variable-length lane reductions perturb the per-step rounding order.
  • NEON stays the fallback. SVE2 is purely additive; the dispatch table assigns NEON first and only overrides on VMAF_ARM_CPU_FLAG_SVE2. A toolchain that fails the cc.compiles(... -march=armv9-a+sve2) probe leaves HAVE_SVE2 unset and the legacy NEON-only build is unchanged.
  • -ffp-contract=off mirrors the NEON sibling. Without it GCC fuses the per-lane scalar tail's a*b+c patterns into fmla, drifting against the SIMD path by ~1 ulp. The arm64_ssimulacra2_sve2 static library carries the flag like its NEON counterpart.
  • On upstream sync: no interaction with upstream — arm64/ feature TUs and the arm/cpu.{c,h} flag enum are fork-local. An upstream sync that rewrites init_simd_dispatch in core/src/feature/ssimulacra2.c would also need the SVE2 cases preserved.
  • Re-test on rebase:
meson setup build-arm64-sve2 libvmaf \
    --cross-file=build-aux/aarch64-linux-gnu-sve2.ini -Denable_asm=true
ninja -C build-arm64-sve2 test/test_ssimulacra2_simd
meson test -C build-arm64-sve2 test_ssimulacra2_simd
# stderr should report `ssimulacra2 simd dispatch: NEON=1 SVE2=1`
# and 11/11 tests should pass.
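
Why the lane count is pinned can be illustrated with a toy accumulation. The Python sketch below models a vector sum as strided partial sums followed by an in-order add of the lanes; it is not the SVE2 kernel, and the data is chosen to make the rounding difference visible in double precision.

```python
def lane_sum(values, lanes):
    """Accumulate into `lanes` strided partial sums, then add the lanes in order."""
    partial = [0.0] * lanes
    for index, value in enumerate(values):
        partial[index % lanes] += value
    total = 0.0
    for lane_value in partial:
        total += lane_value
    return total


data = [1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0]
four_lane = lane_sum(data, 4)   # 6.0
eight_lane = lane_sum(data, 8)  # 3.0
```

The same inputs give different totals once the lane count changes, which is why a predicate that follows the runtime vector length would break a byte-exact contract with the 4-lane NEON path.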

0075 — enable_lcs MS-SSIM extras on CUDA + Vulkan (T7-35 / ADR-0243)

  • Touched surfaces (fork-local): core/src/feature/cuda/integer_ms_ssim_cuda.c (added enable_lcs to MsSsimStateCuda + options[] + 15 host-side vmaf_feature_collector_append calls gated on the bool), core/src/feature/vulkan/ms_ssim_vulkan.c (rewrote enable_lcs help text + added emit_lcs_metrics helper + gated 15 vmaf_feature_collector_append calls), scripts/ci/cross_backend_vif_diff.py + scripts/ci/cross_backend_parity_gate.py (new float_ms_ssim_lcs pseudo-feature + FEATURE_ALIASES map + places=4 tolerance row).
  • Why this matters on rebase: the GPU MS-SSIM extractors are fork-local (Netflix upstream has no Vulkan or CUDA MS-SSIM kernel today). The enable_lcs semantic and the metric names (float_ms_ssim_{l,c,s}_scale{0..4}) must match the upstream CPU reference at core/src/feature/float_ms_ssim.c:189-221. If upstream ever renames or reorders those metrics, mirror the change on the GPU side in the same merge — public-API contract.
  • Invariants the contract enforces:
  • Default-path output (enable_lcs=false) stays bit-identical to the pre-T7-35 binary: only the host-side appends are gated; no kernel / shader / device-buffer changes.
  • Metric ordering is metric-wise (all l_scale* first, then c_*, then s_*) — matches the CPU emission order.
  • places=4 cross-backend tolerance per ADR-0190; enforced by the new float_ms_ssim_lcs cell in the parity matrix gate (ADR-0214).
  • On upstream sync: zero interaction; the GPU twins do not exist upstream. The CPU float_ms_ssim.c is shared with upstream but enable_lcs is upstream-stable since v3.0.0.
  • Re-test on rebase:
cd libvmaf && meson setup build-vulkan \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build-vulkan
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build-vulkan/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature float_ms_ssim_lcs --backend vulkan --places 4
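
The metric-wise ordering invariant is easy to get wrong when mirroring a rename, so a small sketch may help. It expands the `float_ms_ssim_{l,c,s}_scale{0..4}` pattern quoted above into the 15 names in the order the notes describe (all `l`, then `c`, then `s`). The helper name `lcs_metric_names` is hypothetical and does not exist in the fork.

```python
def lcs_metric_names(num_scales=5):
    """Return the enable_lcs metric names in metric-wise order.

    Hypothetical helper: it only expands the name pattern quoted in the notes.
    """
    return [
        f"float_ms_ssim_{component}_scale{scale}"
        for component in ("l", "c", "s")   # metric-wise: all l, then c, then s
        for scale in range(num_scales)     # scale0 .. scale4
    ]


names = lcs_metric_names()
for name in names:
    print(name)
```

A scale-wise loop (scale outermost) would produce the same 15 names in a different order, which is exactly the drift the ordering invariant rules out.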

0075 — 32-bit ADM/cpu fallbacks port (T-NEW-3)

  • Touched surfaces (upstream-mirror): core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/x86/cpu.c. Cherry-picks of upstream 8a289703 (Christopher Degawa, "adm: add fallback for extract_epi64 for 32-bit") and 1b6c3886 ("x86/cpu: remove limit of avx+ on 32-bit").
  • Why this matters on rebase: trivially conflict-free with any future upstream extract_epi64 work because we land upstream's exact extract_epi64 macro/inline-fn pair. The conflict surface is the fork's clang-format-100col layout in adm_avx2.c / adm_avx512.c and the _Alignas(64) LTO-correctness slot in adm_avx512.c (docs/development/known-upstream-bugs.md); both are preserved verbatim.
  • Invariants the port preserves:
  • _Alignas(64) int64_t angle_flag[16] in adm_decouple_s123_avx512 stays — without it, LTO can promote the unaligned load to vmovdqa64 and fault under --buildtype=release -Db_lto=true.
  • The extract_epi64 symbol must remain resolved on both __x86_64__ (macro to _mm256_extract_epi64) and 32-bit (fallback inline). If a future upstream change inlines the helper differently, keep the conditional definition.
  • On upstream sync: if Netflix ships further 32-bit fallbacks (motion / psnr — not in this port), expect a parallel extract_epi64-style helper at the top of each affected SIMD file. The fork should mirror those verbatim into the same files.
  • Re-test on rebase:
meson setup build-i686 libvmaf \
    --cross-file=build-aux/i686-linux-gnu.ini \
    -Denable_asm=false
ninja -C build-i686
meson setup build-cpu libvmaf -Denable_avx512=true
ninja -C build-cpu
meson test -C build-cpu

0076 — codec-aware FR regressor surface (T7-CODEC-AWARE / ADR-0235)

  • Touches: ai/src/vmaf_train/codec.py (new), ai/src/vmaf_train/models/fr_regressor.py (extended), ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/extract_full_features.py. No upstream-shared paths.
  • Invariant: CODEC_VOCAB in ai/src/vmaf_train/codec.py is closed and order-stable — the index of each codec is the one-hot column index baked into trained ONNX. Adding a codec appends to the tuple and bumps CODEC_VOCAB_VERSION; reordering silently invalidates every shipped fr_regressor_v2_*.onnx. FRRegressor(num_codecs=0) must remain the v1 single-input contract — flipping the default would break every existing model/tiny/fr_regressor_v1.onnx consumer.
  • Re-test: pytest ai/tests/test_codec_aware_fr.py -v (8 sub-tests covering vocabulary contract + alias table + back-compat). Pure fork-local addition; no upstream rebase impact for the next /sync-upstream.
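
To see why the vocabulary has to be append-only, consider a minimal sketch of one-hot encoding by tuple index. The codec names below are placeholders (the real entries of `CODEC_VOCAB` are not reproduced in these notes) and `one_hot` is a hypothetical helper, not the function in `codec.py`.

```python
# Placeholder vocabulary for illustration; the real tuple lives in
# ai/src/vmaf_train/codec.py and its entries are not reproduced here.
CODEC_VOCAB = ("codec_a", "codec_b", "codec_c")


def one_hot(codec, vocab=CODEC_VOCAB):
    """Encode a codec as a one-hot row; the tuple index is the column index."""
    index = vocab.index(codec)  # ValueError for unknown codecs (closed vocabulary)
    return [1.0 if i == index else 0.0 for i in range(len(vocab))]


# Appending keeps every existing column where a trained model expects it.
appended = CODEC_VOCAB + ("codec_d",)
same_columns = one_hot("codec_a", appended)[:3] == one_hot("codec_a")

# Reordering moves codec_a to a different column without raising any error.
reordered = ("codec_b", "codec_a", "codec_c")
silently_changed = one_hot("codec_a", reordered) != one_hot("codec_a")

print(same_columns, silently_changed)
```

Both flags come out true: an append is compatible with shipped weights, while a reorder changes the input silently, which matches the invariant stated above.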

0075 — feature/speed extractors (T-NEW-1, upstream port d3647c73)

  • Touches: core/src/feature/speed.c (new), core/src/feature/picture_copy.{c,h} (signature change — added int channel parameter), core/src/feature/float_*.c call sites updated to pass channel=0, core/src/feature/feature_extractor.c registry block, core/src/feature/alias.c, core/src/meson.build, core/src/feature/vif_tools.{c,h} (helper-function port from upstream 4ad6e0ea).
  • Upstream source: verbatim cherry-pick of Netflix/vmaf d3647c73 ("feature/speed: port speed_chroma and speed_temporal extractors") with its dependency 4ad6e0ea ("feature/vif: port helper functions"). Both are pre-existing on Netflix master and enter the fork as part of the T7-4 audit catch-up.
  • Invariant: picture_copy() now takes a channel argument — every fork-local extractor that calls it (CUDA integer_ms_ssim, Vulkan ssim / ms_ssim) passes channel=0. If upstream later evolves the signature again (e.g. adds bit-depth or stride validation), update those fork-local call sites in lockstep. Speed extractors only register when VMAF_FLOAT_FEATURES=1 (build with -Denable_float=true).
  • On upstream sync: future Netflix commits in core/src/feature/speed.c apply cleanly because the file is now a verbatim mirror; conflict potential is limited to the registry block in feature_extractor.c (interleave with the fork's Vulkan / SYCL / CUDA blocks) and to any further picture_copy signature evolution.
  • Re-test on rebase:

```bash
meson setup build-cpu libvmaf -Denable_cuda=false \
    -Denable_sycl=false -Denable_float=true
ninja -C build-cpu
meson test -C build-cpu test_speed
meson test -C build-cpu            # full meson suite
make test-netflix-golden           # 3 CPU canonical pairs
```

0221 — CHANGELOG + ADR-index fragment-file pattern (T7-39 / ADR-0221)

  • What changed: the fork stopped editing CHANGELOG.md and docs/adr/README.md directly. Both files are now rendered from fragment trees:
  • changelog.d/<section>/<topic>.md (Keep-a-Changelog sections), plus the migration archive changelog.d/_pre_fragment_legacy.md.
  • docs/adr/_index_fragments/<NNNN-slug>.md, plus docs/adr/_index_fragments/_order.txt (frozen commit-merge order manifest) and docs/adr/_index_fragments/_header.md (table prelude).
  • Two scripts render the consolidated outputs:
  • scripts/release/concat-changelog-fragments.sh --check|--write
  • scripts/docs/concat-adr-index.sh --check|--write
  • On upstream sync: zero interaction — CHANGELOG.md is a fork-local Markdown surface (Netflix upstream doesn't ship a Keep-a-Changelog file in this format), and docs/adr/ is entirely fork-local. A /sync-upstream run will not touch the fragment trees.
  • Re-test on rebase:
bash scripts/release/concat-changelog-fragments.sh --check
bash scripts/docs/concat-adr-index.sh --check
# both must exit 0; otherwise run --write and re-stage.
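
The `--check` / `--write` split follows a common render-and-compare pattern. The sketch below is a Python illustration of that pattern only; it is not a port of the two bash scripts. The function names, the sorted-filename ordering, and the file names are all assumptions (the ADR index, for one, is ordered by `_order.txt` rather than by name).

```python
import tempfile
from pathlib import Path


def render(fragment_dir: Path) -> str:
    """Concatenate fragments in sorted filename order (ordering is an assumption)."""
    return "".join(p.read_text() for p in sorted(fragment_dir.glob("*.md")))


def check(fragment_dir: Path, output: Path) -> int:
    """Return 0 when the rendered output is up to date, 1 otherwise."""
    up_to_date = output.exists() and output.read_text() == render(fragment_dir)
    return 0 if up_to_date else 1


def write(fragment_dir: Path, output: Path) -> None:
    """Overwrite the rendered output from the fragment tree."""
    output.write_text(render(fragment_dir))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    frags = root / "fragments"
    frags.mkdir()
    (frags / "0001-first.md").write_text("- first entry\n")
    (frags / "0002-second.md").write_text("- second entry\n")
    out = root / "RENDERED.md"

    stale = check(frags, out)     # output missing: check fails
    write(frags, out)
    fresh = check(frags, out)     # freshly written: check passes
    (frags / "0003-third.md").write_text("- third entry\n")
    drifted = check(frags, out)   # new fragment, no re-render: check fails

print(stale, fresh, drifted)
```

The useful property is that `check` never mutates anything, so it is safe to run in CI, while `write` is the only step that needs re-staging afterwards.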

0077 — DISTS extractor proposal (T7-DISTS / ADR-0236)

  • What landed: ADR-0236 (Proposed) + Research-0043 design digest + ADR README index row + CHANGELOG entry.
  • Rebase impact: pure fork-local proposal-stage docs; no code, no Netflix-mirror file touched, no ffmpeg-patches change, no public C-API surface change.
  • Reproducer (when implementation lands as T7-DISTS):

```sh
vmaf --feature dists_sq=model_path=model/tiny/dists_sq.onnx \
    --reference ref.yuv --distorted dist.yuv \
    --width 1920 --height 1080 --pix_fmt yuv420p
```

0076 — GPU-gen ULP calibration head (proposal-stage, T7-GPU-ULP-CAL / ADR-0234)

  • What landed: ADR-0234 (Proposed), Research-0041, data-collection scaffold at ai/scripts/collect_gpu_calibration_data.py, forward-pointer in docs/usage/cli.md for the future --gpu-calibrated flag.
  • Rebase impact: pure fork-local (proposal docs + Python script); no upstream Netflix/vmaf code touched, no public C-API changes, no ffmpeg-patches changes.
  • Reproducer:

```sh
python3 ai/scripts/collect_gpu_calibration_data.py --smoke
```

0095 — Per-backend GPU kernel scaffolding templates (CUDA + Vulkan, ADR-0246)

  • ADR: ADR-0246.
  • Touches:
  • core/src/cuda/kernel_template.h (new, header-only).
  • core/src/vulkan/kernel_template.h (new, header-only).
  • core/src/cuda/AGENTS.md (new invariant row + dir listing).
  • core/src/vulkan/AGENTS.md (new file).
  • docs/backends/kernel-scaffolding.md (new).
  • docs/adr/0246-gpu-kernel-template.md (new).
  • CHANGELOG.md, docs/adr/README.md.
  • All paths are wholly fork-local. Upstream Netflix/vmaf has no Vulkan backend at all today and the CUDA backend uses different per-kernel scaffolding shapes; nothing here can collide on a pure upstream sync.
  • Invariants:
  • Templates are unused at PR-merge time. kernel_template.h in both core/src/cuda/ and core/src/vulkan/ lands with zero call-sites. Each future kernel migration is its own gated PR (places=4 cross-backend-diff per ADR-0214). Do not bulk-port existing kernels onto the templates in a single sync — that would short-circuit the per-kernel gate.
  • Per-backend, not cross-backend. Resist the urge to merge the two templates into a unified gpu/kernel_template.h. CUDA async-stream + event vs Vulkan command-buffer + fence + descriptor-pool share no concrete shape; a unified API would be lowest-common-denominator.
  • Helper functions, not macros. The header bodies are static inline functions for cuda-gdb / Nsight / RenderDoc step-debugging. The CHECK_CUDA_GOTO / CHECK_CUDA_RETURN macros in cuda_helper.cuh stay where they pay off (textual goto label), and the templates use them internally.
  • On upstream sync: no interaction with upstream paths. An upstream sync that touches core/src/cuda/common.h or picture_cuda.h may shift the helper signatures the template consumes (vmaf_cuda_buffer_alloc, vmaf_cuda_picture_get_stream, …); update the template if so.
  • Re-test on rebase:

```bash
# CUDA build (configure inside libvmaf/ — see CLAUDE.md §2 note).
meson setup core/build-cuda libvmaf \
    -Denable_cuda=true -Denable_nvcc=true \
    -Denable_vulkan=disabled -Denable_sycl=false
ninja -C core/build-cuda
meson test -C core/build-cuda

# Vulkan build.
meson setup core/build-vulkan libvmaf \
    -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-vulkan
meson test -C core/build-vulkan
```

0222 — vmaf-perShot per-shot CRF predictor sidecar (T6-3b)

  • Touches: core/tools/meson.build (new executable + test wiring), core/tools/vmaf_per_shot.c (new file — fork-local, no upstream sibling), core/tools/test/meson.build (test row), core/tools/test/test_vmaf_per_shot.sh (new smoke test), core/tools/AGENTS.md (sidecar invariants), docs/usage/cli.md (cross-link), docs/usage/vmaf-perShot.md (new user doc), docs/ai/roadmap.md (T6-3b row update).
  • Invariant: the sidecar must stay standalone — it does not link the libvmaf metric path. Any upstream patch that tries to fold per-shot CRF prediction into vmaf_score_* would collapse the encoder-hint vs. quality-score separation recorded in roadmap §2.4 and ADR-0222 §Decision. The CSV / JSON column set (shot_id, start_frame, end_frame, frames, mean_complexity, mean_motion, predicted_crf) is the public schema; downstream encoders consume it directly.
  • Conflict expectation on /sync-upstream: low. Upstream Netflix has no per-shot CRF predictor in tree, so there is no natural collision point — tools/meson.build is the only mutually-edited file and the new executable('vmaf-perShot', …) block is appended after vmaf_bench_deps, well clear of upstream's likely additions.
  • Reproducer:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=disabled
ninja -C build
meson test -C build test_vmaf_per_shot --print-errorlogs
./build/tools/vmaf-perShot \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --output /tmp/plan.csv
cat /tmp/plan.csv
```
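
Because the column set is the public schema, a downstream consumer can reasonably refuse any plan whose columns differ. The sketch below shows one way to do that in Python. It assumes the CSV carries a header row with the columns in the order listed above; the sample rows are made-up values rather than real tool output, and `read_plan` is a hypothetical name.

```python
import csv
import io

# Column set quoted in the invariant above; header row and order are assumptions.
EXPECTED_COLUMNS = [
    "shot_id", "start_frame", "end_frame", "frames",
    "mean_complexity", "mean_motion", "predicted_crf",
]


def read_plan(text):
    """Parse a per-shot plan and reject any unexpected column set."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


# Made-up sample rows for illustration only.
sample = (
    "shot_id,start_frame,end_frame,frames,mean_complexity,mean_motion,predicted_crf\n"
    "0,0,23,24,0.5,1.25,28\n"
    "1,24,47,24,0.75,2.5,26\n"
)

plan = read_plan(sample)
for shot in plan:
    print(shot["shot_id"], shot["predicted_crf"])
```

Failing on a header mismatch surfaces a schema change at the consumer boundary instead of letting a renamed column degrade into missing values.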

0075 — vmaf-roi sidecar binary (T6-2b / ADR-0247)

  • Touches:
  • core/tools/meson.build — adds the vmaf_roi executable target (after the existing vmaf target, before vmaf_bench). Append-only; no upstream-shared lines moved or removed.
  • core/test/meson.build — adds the test_vmaf_roi executable + test() registration. Append-only.
  • core/tools/vmaf_roi.c — wholly new, fork-local.
  • core/tools/vmaf_roi_core.h — wholly new, fork-local.
  • core/test/test_vmaf_roi.c — wholly new, fork-local.
  • Invariant: the vmaf-roi sidecar emits two byte-exact formats that downstream encoder drivers (x265 --qpfile, SVT-AV1 --roi-map-file) will hard-depend on:
  • x265 ASCII grid — two #-prefixed header lines (# vmaf-roi qpfile (x265, --qpfile-style) and # frame=N ctu=S cols=C rows=R strength=F.FFF), space-separated signed integers, one row per CTU row, \n terminator.
  • SVT-AV1 raw binary — exactly cols * rows bytes of int8_t, row-major, no header.
  • QP-offset clamp: ±12 (VMAF_ROI_CORE_QP_OFFSET_MAX).
  • Reduction — per-CTU mean (not max). Switching to max or a percentile changes every downstream encoder result and requires its own ADR.
  • Pure helpers in vmaf_roi_core.h — the per-CTU mean reducer and saliency-to-QP mapper are static inline in a header so test_vmaf_roi compiles them without dragging the libvmaf link surface in. Moving them into a .c TU breaks the test wiring.
  • On upstream sync: no interaction with upstream — tools/ is a fork-local surface from upstream's perspective (upstream ships vmaf.c only). An upstream sync that rewrites core/tools/meson.build should preserve the vmaf_roi executable block.
  • Re-test on rebase:

```bash
meson setup build-cpu libvmaf \
    -Denable_cuda=false -Denable_sycl=false -Denable_tools=true
ninja -C build-cpu tools/vmaf_roi test/test_vmaf_roi
meson test -C build-cpu test_vmaf_roi
./build-cpu/tools/vmaf_roi \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --frame 0 --output - \
    --encoder x265 --ctu-size 64 --strength 6.0 | head -3
# First two lines are the # comment header; row 1 of the grid
# should be "4 2 1 -1 -1 -1 1 2 4" (placeholder radial map).
```
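
The format invariants above can be sketched compactly. The Python below is an illustration under stated assumptions, not a port of `vmaf_roi_core.h`: all function names are hypothetical, edge CTUs are assumed to average over the pixels they actually contain, the saliency-to-QP mapping is left out because its formula is not given here, and the two x265 header lines are omitted.

```python
import struct

QP_OFFSET_MAX = 12  # the ±12 clamp described above


def ctu_means(saliency, width, height, ctu):
    """Per-CTU mean of a row-major saliency map (edge handling is an assumption)."""
    cols = (width + ctu - 1) // ctu
    rows = (height + ctu - 1) // ctu
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            ys = range(r * ctu, min((r + 1) * ctu, height))
            xs = range(c * ctu, min((c + 1) * ctu, width))
            total = sum(saliency[y * width + x] for y in ys for x in xs)
            row.append(total / (len(ys) * len(xs)))
        grid.append(row)
    return grid


def clamp_qp(offset):
    """Clamp a QP offset into [-12, +12]."""
    return max(-QP_OFFSET_MAX, min(QP_OFFSET_MAX, offset))


def x265_rows(qp_grid):
    """Grid body only: one line per CTU row, space-separated signed integers."""
    return "".join(" ".join(str(q) for q in row) + "\n" for row in qp_grid)


def svtav1_bytes(qp_grid):
    """Exactly cols * rows signed bytes, row-major, no header."""
    flat = [q for row in qp_grid for q in row]
    return struct.pack(f"{len(flat)}b", *flat)


# A 4x2 map with 2x2 CTUs: left CTU is all 0.0, right CTU is all 1.0.
means = ctu_means([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0], 4, 2, 2)

# QP offsets supplied directly, since the mapper is out of scope here.
qp_grid = [[clamp_qp(-20), clamp_qp(5)]]
ascii_body = x265_rows(qp_grid)
raw = svtav1_bytes(qp_grid)

print(means, repr(ascii_body), raw)
```

Keeping the reducer and the serialisers as pure functions mirrors the reason the real helpers stay `static inline` in a header: they can be tested without linking anything else.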

0219 — motion3 GPU coverage on Vulkan + CUDA + SYCL (T3-15(c) / ADR-0219)

  • What changed: The motion GPU twins (core/src/feature/vulkan/motion_vulkan.c, core/src/feature/cuda/integer_motion_cuda.c, core/src/feature/sycl/integer_motion_sycl.cpp) now emit VMAF_integer_feature_motion3_score in 3-frame window mode (default). Cross-backend gates extended (scripts/ci/cross_backend_*.py FEATURE_METRICS["motion"]).
  • Invariants:
  • motion3 = host-side scalar post-process of motion2. No device-side state changes; motion3 is computed on the host in extract() / collect() / flush() after the existing SAD reduction. The post-processing function (motion3_postprocess_*) mirrors CPU integer_motion.c lines 510-560 byte-for-byte: clip(motion_blend(motion2 * fps_weight, blend_factor, blend_offset), max_val) with optional moving-average against the unaveraged prior blended value.
  • motion_five_frame_window=true returns -ENOTSUP at init() on all three GPU backends. The 5-deep blur ring + second SAD-pair dispatch remain deferred. Do NOT silently fall back to the 3-frame path when the user enables the flag — fail loud per CERT C / CLAUDE.md §12 r4.
  • CPU motion3 algorithm is the source of truth. Any port of an upstream Netflix change to integer_motion.c that touches motion_blend(...), the motion_max_val clip, or the moving-average rule MUST be mirrored in motion3_postprocess_* across all three GPU files in the same PR. The cross-backend gate at places=4 will catch drift, but only after a full GPU run.
  • On upstream sync: Pure fork-local additions to GPU TUs. Upstream Netflix has no GPU motion extractor. The motion_blend_tools.h header is upstream-mirrored — if a sync rewrites the motion_blend() formula, regenerate the GPU snapshot and re-run the cross-backend gate.
  • Re-test on rebase:

```bash
# CPU sanity (motion3 emission unchanged)
./core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature motion --output /tmp/motion.json --json
python -c "import json; d=json.load(open('/tmp/motion.json')); \
print('motion3 frames:', sum(1 for f in d['frames'] \
    if 'integer_motion3' in f.get('metrics', {})))"
# Expect 49 (one motion3 per frame).

# Cross-backend gate (Vulkan/lavapipe lane works on every host):
python scripts/ci/cross_backend_vif_diff.py \
    --feature motion --backend vulkan \
    --ref python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --dis python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --bitdepth 8 \
    --vmaf-bin core/build/tools/vmaf
# Expect: integer_motion / integer_motion2 / integer_motion3 all OK at places=4.
```

0216 — vmaf_tiny_v2 (Phase-3-validated tiny VMAF MLP)

  • Touches: model/tiny/registry.json, model/tiny/vmaf_tiny_v2.{onnx,json}, ai/scripts/{train,export,validate}_vmaf_tiny_v2.py, ai/AGENTS.md, core/test/dnn/{test_vmaf_tiny_v2.py,meson.build}, docs/ai/{models/vmaf_tiny_v2.md,inference.md,roadmap.md}, docs/adr/{0244-vmaf-tiny-v2.md,README.md}, CHANGELOG.md. All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
  • Invariants:
  • Bundled scaler stats are part of the trust root. The shipped ONNX bakes (input - mean) / std as Constant Sub + Div nodes that run before the MLP. Re-exporting must go through ai/scripts/export_vmaf_tiny_v2.py, which pulls mean / std from the trainer checkpoint and writes them as graph initialisers. Adding an out-of-band scaler step at runtime (e.g., a sidecar JSON consumed by the loader) is forbidden without a follow-up ADR — it splits the trust root and invalidates the registry sha256 contract.
  • Feature column order is fixed. The graph reads (adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2) in exactly this order; reordering breaks the bundled mean / std constants. Any change to the feature set requires a fresh Phase-3 chain (Research-0027 → 0028 → 0029 → 0030).
  • opset 17. Matches the sister tiny-AI models (learned_filter_v1, nr_metric_v1, fastdvdnet_pre) and the ORT op-allowlist baseline. Upgrading requires re-validating the Sub / Div / Gemm / Relu / Squeeze ops against op_allowlist.c.
  • On upstream sync: zero interaction. Netflix/vmaf has no equivalent surface; an upstream sync that touches core/src/dnn/ (op-allowlist or model-loader changes) needs to preserve Sub / Div / Gemm / Relu / Squeeze in the allowlist for opset 17.
  • Re-test on rebase:

```bash
bash core/test/dnn/test_registry.sh
python3 core/test/dnn/test_vmaf_tiny_v2.py
python3 ai/scripts/validate_vmaf_tiny_v2.py \
    --onnx model/tiny/vmaf_tiny_v2.onnx \
    --parquet runs/full_features_netflix.parquet \
    --rows 100 --min-plcc 0.97
meson test -C build-cpu --suite=dnn
```
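
The coupling between the baked scaler and the feature column order can be illustrated in a few lines. The sketch applies `(input - mean) / std` per column, as the notes describe for the Sub + Div nodes. The `mean` / `std` values are placeholders rather than the shipped statistics, and `standardize_row` is a hypothetical helper.

```python
FEATURE_ORDER = (
    "adm2", "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3", "motion2",
)

# Placeholder statistics for illustration; not the values baked into the ONNX.
MEAN = [0.5, 0.5, 0.5, 0.5, 0.5, 4.0]
STD = [0.25, 0.25, 0.25, 0.25, 0.25, 2.0]


def standardize_row(row):
    """Apply (input - mean) / std column by column, in FEATURE_ORDER."""
    return [(x - m) / s for x, m, s in zip(row, MEAN, STD)]


# Row in the expected column order: adm2 first, motion2 last.
ordered = standardize_row([1.0, 0.5, 0.5, 0.5, 0.5, 8.0])

# Same values with adm2 and motion2 swapped: the wrong statistics get applied.
swapped = standardize_row([8.0, 0.5, 0.5, 0.5, 0.5, 1.0])

print(ordered)
print(swapped)
```

Nothing in the arithmetic detects the swap; the statistics are positional, which is why the column order is treated as part of the model contract.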

0094 — Tiny-AI extractor template (ADR-0250)

  • Touches: core/src/dnn/tiny_extractor_template.h (new), core/src/feature/feature_lpips.c, core/src/feature/fastdvdnet_pre.c, core/src/dnn/AGENTS.md, docs/ai/extractor-template.md (new), docs/adr/0250-tiny-ai-extractor-template.md (new).
  • Invariants:
  • Helper signatures are wire-format-stable. vmaf_tiny_ai_resolve_model_path(name, option, env_var) and vmaf_tiny_ai_open_session(name, path, &out) produce the user-facing log lines <name>: no model path … and <name>: vmaf_dnn_session_open(<path>) failed: <rc> — downstream tooling greps these. Don't rename or reorder the parameters without bumping every extractor + the recipe doc.
  • YUV→RGB is bit-exact. The shared vmaf_tiny_ai_yuv8_to_rgb8_planes is a literal move of the pre-existing feature_lpips.c body (BT.709 limited-range, nearest-neighbour chroma upsample). LPIPS / saliency / future colour-sensitive tiny-AI scores depend on byte-exact equality with the prior ad-hoc copies. Any change to the conversion constants or the rounding rule needs a separate ADR + a coordinated snapshot regen — model/tiny/ weights aren't re-trained against new colour math casually.
  • Option-table macro is plain text substitution. The VMAF_TINY_AI_MODEL_PATH_OPTION(state_t, help) macro emits a single struct literal — no control flow, no recursion, no variadic shenanigans (Power-of-10 rule 1 / rule 9). Don't extend it into a multi-option emitter without a fresh ADR.
  • On upstream sync: zero interaction with upstream — feature_lpips.c and fastdvdnet_pre.c are fork-only files, and the new dnn/tiny_extractor_template.h lives entirely under fork-introduced core/src/dnn/. An upstream sync that rewrites unrelated feature_*.c files won't conflict.
  • Re-test on rebase:
cd libvmaf
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=dnn
meson test -C build-cpu test_lpips test_fastdvdnet_pre
# All 10 dnn-suite + both extractor tests must pass.

0095 — Vulkan ring-depth tunable (ADR-0251 follow-up #3)

  • PR: feat/t7-29-followup3-ring-tunable.
  • What rebases need to know: VmafVulkanConfiguration grew an additive unsigned max_outstanding_frames field. Existing zero-initialised configs continue to receive the canonical default (0 → VMAF_VULKAN_RING_DEFAULT == 4). The clamp helper vmaf_vulkan_clamp_ring_size moved from import.c (file-local static) to vulkan_internal.h (static inline) so state_init and lazy_alloc_ring share one definition; an upstream sync that re-introduces the static in import.c would shadow the header helper — drop the duplicate, keep the inline.
  • New public symbol: vmaf_vulkan_state_max_outstanding_frames(const VmafVulkanState *) — read-side accessor for the clamped value. Pure additive surface; no upstream collision.
  • On upstream sync: zero interaction. The ring is wholly fork-introduced (ADR-0251); upstream Netflix has no Vulkan backend.
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_async_pending_fence
# All 8 cases must pass: 4 v2-contract + 4 ring-tunable.
```
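
The "zero means default" convention described above can be sketched as follows. This is a Python illustration of the behaviour, not the C helper itself: the default of 4 comes from the note, while the upper bound `RING_MAX` is a purely hypothetical value, since the real limit is not stated in these notes.

```python
RING_DEFAULT = 4  # VMAF_VULKAN_RING_DEFAULT, per the note above
RING_MAX = 8      # hypothetical upper bound; the real limit is not stated here


def clamp_ring_size(requested):
    """Map a zero-initialised field to the default and cap oversized requests.

    `requested` models an unsigned field, so negative values are not handled.
    """
    if requested == 0:
        return RING_DEFAULT
    return min(requested, RING_MAX)


for value in (0, 2, 100):
    print(value, "->", clamp_ring_size(value))
```

Treating 0 as "use the default" is what lets existing zero-initialised configurations keep their behaviour when the struct gains the new field.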

0096 — tools/vmaf-tune/ automation umbrella spec (ADR-0237 / Research-0044)

  • PR: feat/vmaf-tune-spec.
  • What rebases need to know: this PR ships only an umbrella ADR + research digest under docs/. No tracked source code, no tools/vmaf-tune/ directory yet, no Meson changes. An upstream sync touching ffmpeg-patches or libvmaf/ cannot collide with this PR.
  • On upstream sync: zero interaction. Spec-only PR.
  • Re-test on rebase:
# No build/test impact — verify the docs render and links are alive:
ls docs/adr/0237-quality-aware-encode-automation.md \
   docs/research/0044-quality-aware-encode-automation.md
grep -c '\[ADR-0237\]' docs/adr/README.md

0097 — test_speed gated on enable_float (fix default-build failure)

  • PR: fix/test-speed-chroma-registration.
  • What rebases need to know: core/test/meson.build now wraps the test_speed executable + test() registration in if get_option('enable_float'). The speed_chroma / speed_temporal extractors live in speed.c, which is only compiled when enable_float=true (the entries in feature_extractor.c are wrapped in #if VMAF_FLOAT_FEATURES), so the test's vmaf_get_feature_extractor_by_name("speed_chroma") returned NULL on a default build (enable_float=false).
  • On upstream sync: zero interaction. test_speed.c was added fork-side via the Netflix port commit d3647c73. The gating pattern matches test_vulkan_* (if get_option('enable_vulkan').enabled()).
  • Re-test on rebase:
# default (enable_float=false): test_speed must NOT be in the suite
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false --reconfigure
ninja -C build
meson test -C build  # expect: NO test_speed in the run

# CI shape (enable_float=true): test_speed must run + pass
meson setup build libvmaf -Denable_float=true --reconfigure
ninja -C build
meson test -C build test_speed  # expect: 5/5 pass

0098 — Vulkan picture preallocation surface (ADR-0238)

  • PR: feat/vulkan-picture-preallocation.
  • What rebases need to know: ABI grows additively. New public surface in core/include/libvmaf/libvmaf_vulkan.h: enum VmafVulkanPicturePreallocationMethod, VmafVulkanPictureConfiguration, vmaf_vulkan_preallocate_pictures, vmaf_vulkan_picture_fetch. New enumerator VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE in core/src/picture.h::VmafPictureBufferType. New TU core/src/vulkan/picture_vulkan_pool.c (~180 LOC); registered in core/src/vulkan/meson.build. Fork-internal accessor vmaf_vulkan_state_context() (declared in vulkan_internal.h) exposes the imported state's VkInstance/VkDevice to the pool — used only by libvmaf.c::vmaf_vulkan_preallocate_pictures.
  • VmafContext field added: vmaf->vulkan.pool next to vmaf->vulkan.state. The vmaf_close() teardown closes the pool before clearing the state pointer (matches SYCL).
  • On upstream sync: zero interaction. Vulkan backend is fork-only; upstream Netflix has no Vulkan integration.
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_pic_preallocation
# All 6 cases must pass under ASan/UBSan:
#   test_method_none_is_a_no_op
#   test_method_host_allocates_round_robins
#   test_method_device_allocates_round_robins
#   test_fetch_without_preallocate_falls_back
#   test_unknown_method_rejected
#   test_null_args_rejected
```

0099 — feature_mobilesal.c + transnet_v2.c migrated to tiny_extractor_template.h

  • PR: refactor/migrate-ai-to-template.
  • What rebases need to know: feature_mobilesal.c and transnet_v2.c previously open-coded the model-path resolution (getenv + log block), the YUV→RGB kernel (mobilesal only), the vmaf_dnn_session_open + log boilerplate, and the VmafOption[].model_path row. They now use the helpers from dnn/tiny_extractor_template.h (PR #251) — the same template feature_lpips.c and fastdvdnet_pre.c already consume. Net −98 LOC of identical boilerplate.
  • Behavior preserved: bit-exact YUV→RGB conversion (mobilesal used the literal copy of feature_lpips.c's body that the template hoisted), identical error-log strings, identical option-table flag/type/offset shape. The migrated mobilesal_options macro expands to the same struct literal the hand-rolled version produced.
  • On upstream sync: zero interaction. Both files are fork-introduced; upstream Netflix has neither extractor.

0100 — cuda/ring_buffer.{c,h} → gpu_picture_pool.{c,h} (ADR-0239)

  • PR: refactor/gpu-picture-pool-extract.
  • What rebases need to know: core/src/cuda/ring_buffer.c and ring_buffer.h are removed. The same callback-based round-robin pool lives at core/src/gpu_picture_pool.{c,h} under renamed symbols (VmafRingBuffer → VmafGpuPicturePool, vmaf_ring_buffer_* → vmaf_gpu_picture_pool_*, _fetch_next_picture → _fetch). All call sites in libvmaf.c migrated. core/test/test_ring_buffer.c renamed to test_gpu_picture_pool.c with the corresponding meson update.
  • Netflix-upstream interaction: minimal — Netflix's cuda/ring_buffer.{c,h} last touched in commit cb1d49c6. An upstream sync that resurrects the old names should be redirected to the new ones; the file move is purely fork-local.
  • Netflix#1300 mutex-destroy-order fix preserved (ADR-0157) — moved verbatim to the new file; the fix remains attached to vmaf_gpu_picture_pool_close.
  • SYCL pool migration: vmaf_sycl_picture_pool_* keeps its public-internal API but now delegates to the generic pool. The SYCL wrapper struct (VmafSyclPicturePool) just owns the VmafSyclCookie storage. std::mutex drops out.
  • Vulkan pool migration: bundled into this PR after #264 merged. picture_vulkan_pool.c rewrites as a thin wrapper around the generic pool — wrapper struct owns per-pool state for the alloc/free callbacks; the generic pool owns the round-robin slots / mutex / unwind. Same pattern as the SYCL migration above.
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=dnn
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# All 11 dnn-suite + 4 extractor smoke tests must pass.
meson test -C build  # 47/47 pass under ASan/UBSan

# CUDA build (CI-only; pre-existing local nvcc include-path quirk):
meson setup build-cuda libvmaf -Denable_cuda=true
ninja -C build-cuda
meson test -C build-cuda test_gpu_picture_pool

# SYCL build:
meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
meson test -C build-sycl
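
The division of labour between the generic pool and its per-backend wrappers can be sketched as below. This is a Python model of a callback-based round-robin pool, not the C implementation; the class and method names are hypothetical, and the reverse-order release on close is an assumption.

```python
import threading


class PicturePool:
    """Round-robin pool that owns slots, a lock, and unwind; callers own alloc/free."""

    def __init__(self, count, alloc, free):
        self._free = free
        self._lock = threading.Lock()
        self._slots = []
        self._next = 0
        try:
            for index in range(count):
                self._slots.append(alloc(index))
        except Exception:
            self.close()  # unwind the slots allocated so far
            raise

    def fetch(self):
        """Return the next slot and advance the cursor, wrapping at the end."""
        with self._lock:
            slot = self._slots[self._next]
            self._next = (self._next + 1) % len(self._slots)
            return slot

    def close(self):
        """Release every slot through the free callback, last allocated first."""
        with self._lock:
            while self._slots:
                self._free(self._slots.pop())
            self._next = 0


freed = []
pool = PicturePool(3, alloc=lambda i: f"pic{i}", free=freed.append)
fetched = [pool.fetch() for _ in range(4)]
pool.close()

print(fetched, freed)
```

A backend wrapper in this model only supplies `alloc` and `free`; the cursor, locking, and unwind live in one place, which is the point of extracting the pool from the CUDA tree.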

0104 — psnr_vulkan.c migrated to vulkan/kernel_template.h

  • PR: refactor/migrate-psnr-vulkan-to-template.
  • What rebases need to know: vulkan/kernel_template.h (410 LOC, ADR-0246, PR #251) shipped with zero consumers. Its docstring designated psnr_vulkan.c as the reference implementation. This PR lands the migration as the first consumer of the Vulkan template — paired with PR #269 (the first CUDA template consumer). The 5 long-lived pipeline objects (descriptor-set layout, pipeline layout, shader module, compute pipeline, descriptor pool) collapse from individual struct fields to one VmafVulkanKernelPipeline pl bundle. create_pipeline() (~104 LOC) collapses to a single vmaf_vulkan_kernel_pipeline_create() call (~30 LOC) — the template owns the descriptor-set layout creation, pipeline layout, shader module, compute pipeline, and descriptor-pool sizing. close_fex()'s vkDeviceWaitIdle + 5×vkDestroy* sweep collapses to one vmaf_vulkan_kernel_pipeline_destroy() call.
  • Net LOC delta: −55 LOC on psnr_vulkan.c directly. Unlike the CUDA template (where helper-call boilerplate roughly matches the inline savings), the Vulkan template's pipeline creation is dramatic enough that even the first consumer wins.
  • Bit-exactness gates: spec-constants, push-constant struct, shader bytecode, dispatch grid math, and host-side reduction are byte-identical to the prior implementation. The template only owns descriptor-set layout / pipeline layout / shader module / compute pipeline creation / descriptor pool sizing — none of which affects the kernel's mathematical behaviour. Cross-backend parity gate (places=4) re-runs unchanged.
  • On upstream sync: zero interaction. psnr_vulkan.c is fork-introduced (T7-23 / ADR-0182 / ADR-0216).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe
# Cross-backend parity gate (places=4):
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
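
For readers unfamiliar with a `places` tolerance, here is a minimal sketch of one common definition. It assumes the gate follows the convention of Python's `unittest.assertAlmostEqual` (round the difference to `places` decimals and compare with zero); the notes do not show the gate's actual comparison rule, and the scores below are made-up numbers.

```python
def within_places(a, b, places=4):
    """True when a and b agree after rounding their difference to `places` decimals.

    Assumed convention (same as unittest.assertAlmostEqual); the real gate may differ.
    """
    return round(abs(a - b), places) == 0


# Made-up scores for illustration only.
close_enough = within_places(34.12345, 34.12349)            # differ by about 4e-5
too_far = within_places(34.1234, 34.1236)                   # differ by about 2e-4
looser_gate = within_places(34.1234, 34.1236, places=2)     # passes at places=2

print(close_enough, too_far, looser_gate)
```

Under this convention places=4 tolerates differences below roughly 5e-5, and places=2 differences below roughly 5e-3.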

0105 — moment_vulkan.c + ciede_vulkan.c migrated to vulkan/kernel_template.h

  • PR: refactor/migrate-motion-vulkan-to-template (note: the branch name reflects the original intent; motion's two-pipeline shape didn't fit the template's single-pipeline contract, so this PR migrates moment + ciede instead).
  • What rebases need to know: second + third consumers of vulkan/kernel_template.h (after PR #270 = psnr_vulkan, the first consumer). Both files follow the identical migration pattern:
  • Replace 5 individual pipeline-object fields (dsl, pipeline_layout, shader, pipeline, desc_pool) with one VmafVulkanKernelPipeline pl bundle.
  • Replace ~100 LOC of create_pipeline() body (descriptor-set layout + pipeline layout + shader module + compute pipeline + descriptor pool boilerplate) with a single vmaf_vulkan_kernel_pipeline_create() call.
  • Replace close_fex()'s vkDeviceWaitIdle + 5×vkDestroy* sweep with one vmaf_vulkan_kernel_pipeline_destroy() call.
  • Per-file LOC deltas:
  • moment_vulkan.c: −60 LOC (450 → 390).
  • ciede_vulkan.c: −59 LOC (536 → 477).
  • Net: −119 LOC.
  • Bit-exactness preserved: spec-constants (width/height/bpc/subgroup_size identical across both), push-constant structs (MomentPushConsts, CiedePushConsts), shader bytecodes (moment_spv, ciede_spv), dispatch grid math, and host-side reductions are byte-identical to the prior implementation. Cross-backend parity gates (places=4 for moment integer reduce; places=2 for ciede transcendentals per ADR-0187) re-run unchanged.
  • motion_vulkan.c deferred: motion uses two pipelines (first frame vs subsequent) sharing one DSL + layout + shader + pool. The template's current shape produces one pipeline per descriptor; splitting motion across two VmafVulkanKernelPipeline instances would duplicate the shared objects. Tracked as a follow-up template extension (multi-pipeline support).
  • On upstream sync: zero interaction. Both files are fork-introduced (T7-23 / ADR-0182 / ADR-0187).
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe (under ASan/UBSan)
python scripts/ci/cross_backend_parity_gate.py --feature float_moment_ref1st --places 4
python scripts/ci/cross_backend_parity_gate.py --feature ciede2000 --places 2
```

0101 — GPU backend pattern doc (ADR-0240)

  • PR: docs/gpu-backend-template.
  • What rebases need to know: doc-only PR. Adds docs/development/gpu-backend-template.md (recipe new GPU backends follow) and core/include/libvmaf/AGENTS.md (public-headers-tree invariant note). No source code, no meson changes, no ABI impact.
  • On upstream sync: zero interaction. Both files are fork-introduced.
  • Re-test on rebase:

```bash
# Doc-only — verify links resolve:
test -f docs/development/gpu-backend-template.md
test -f core/include/libvmaf/AGENTS.md
grep -c 'gpu-backend-template' core/include/libvmaf/AGENTS.md
```

0102 — Tiny-AI test registration macro (tiny_ai_test_template.h)

  • PR: refactor/test-registration-macro.
  • What rebases need to know: new core/test/tiny_ai_test_template.h emits the four standard registration tests (<name>_is_registered, <name>_provides_primary_feature, <name>_options_table_well_formed, <name>_init_rejects_missing_model) via the VMAF_TINY_AI_DEFINE_REGISTRATION_TESTS(ext, feat, env, prefix) macro. The four per-extractor test files (test_lpips.c, test_mobilesal.c, test_transnet_v2.c, test_fastdvdnet_pre.c) shrank from ~140 LOC each to ~20-50 LOC. Net −286 LOC. Behavior bit-exact preserved (same assertions, same env-var save/restore dance, same setenv shim for MSVCRT). TransNet V2 keeps two extractor-specific extra tests (binary-flag round-trip + provided_features list-termination) that the macro doesn't cover.
  • On upstream sync: zero interaction. The four test files are fork-introduced (per ADR-0042 / ADR-0168 / ADR-0220 / ADR-0223 / ADR-0215).
  • Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# 4/4 binaries pass; 18 individual tests total (4x4 standard + 2
# TransNet V2 extras).
```

0103 — integer_psnr_cuda.c migrated to cuda/kernel_template.h

  • PR: refactor/migrate-psnr-cuda-to-template.
  • What rebases need to know: cuda/kernel_template.h shipped with no consumers in PR #251 (ADR-0246). This PR migrates the first consumer (integer_psnr_cuda.c) — the file the template's own docstring explicitly designated as the reference. The CUstream + CUevent + CUevent triple and the (VmafCudaBuffer device, void *host_pinned, size_t bytes) readback pair are now dispensed by the template helpers (vmaf_cuda_kernel_lifecycle_init/_close, vmaf_cuda_kernel_readback_alloc/_free, vmaf_cuda_kernel_submit_pre_launch, vmaf_cuda_kernel_collect_wait) instead of being open-coded. PsnrStateCuda shrinks: one VmafCudaKernelLifecycle replaces three fields (event + finished + str), and one VmafCudaKernelReadback replaces two (sse + sse_host).
  • Net LOC delta: +8 LOC on integer_psnr_cuda.c alone — the helpers add per-call boilerplate. The dedup win materialises as more CUDA feature kernels (motion / moment / ssim / vif / adm) migrate one-at-a-time in follow-up PRs. Each subsequent migration saves ~15 LOC.
  • Bit-exactness gates: kernel launch + reduction logic unchanged. The migration only touches state-management boilerplate around the kernel; the SSE accumulator math, the per-bpc kernel function lookup, the host-side log10 score formula, and the dispatch grid-dim calculation are byte-identical to the prior implementation. Netflix golden gate + CPU/CUDA cross-backend parity gate (places=4) re-run unchanged.
  • On upstream sync: zero interaction. integer_psnr_cuda.c is fork-introduced (T7-23 / ADR-0182).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=true
ninja -C build
meson test -C build  # CUDA test suite must pass
# Cross-backend parity gate:
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4

0125 — Vulkan submit-side template + fence pool + descriptor pre-alloc bundle (ADR-0256)

  • Touches:
  • core/src/vulkan/kernel_template.h — fork-local. Adds the VmafVulkanKernelSubmitPool struct + _create / _destroy / _acquire helpers + the vmaf_vulkan_kernel_descriptor_sets_alloc helper. Upstream has no Vulkan backend — no merge surface.
  • core/src/feature/vulkan/{psnr_hvs,vif,float_vif,float_adm}_vulkan.c — fork-local kernel TUs, also no upstream peer.
  • Invariant: the four migrated kernels keep all per-frame VkFence + VkCommandBuffer + VkDescriptorSet resources alive across frames in the pool. Pre-bound descriptor sets rely on the kernel's VmafVulkanBuffer * handles being init-time stable (allocated in init(), freed only in close_fex). vmaf_vulkan_kernel_pipeline_destroy destroys the descriptor pool — pre-allocated sets are released implicitly via the pool; callers must NOT call vkFreeDescriptorSets on them.
  • Re-test on rebase:
meson setup build libvmaf -Denable_vulkan=enabled
ninja -C build
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
    meson test -C build test_vulkan_smoke \
                        test_vulkan_async_pending_fence \
                        test_vulkan_pic_preallocation
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature adm --backend vulkan --places 4

0107 — psnr_hvs_cuda async upload + persistent pinned staging (T-GPU-OPT-2/3)

  • Touches:
  • core/src/feature/cuda/integer_psnr_hvs_cuda.c — only consumer; fork-local from inception (T7-23 / ADR-0188 / ADR-0191). State adds upload_str (dedicated H2D stream), upload_done (cross-stream completion event), and per-plane persistent pinned h_uint_ref[3] / h_uint_dist[3] staging buffers allocated once in init_fex_cuda. The per-call helper upload_plane_cuda is split into issue_d2h_plane (pic-stream D2H), convert_plane (CPU normalise), and issue_h2d_plane (upload-stream H2D). submit_fex_cuda runs the three phases explicitly, records upload_done after the last H2D, then calls cuStreamWaitEvent on lc.str before the kernel launches.
  • core/src/cuda/AGENTS.md — adds a rebase-sensitive invariant entry under §Rebase-sensitive invariants documenting the three-phase flow + persistent staging contract.
  • Invariant: the pinned h_uint_* and h_ref / h_dist buffers are never freed and re-allocated mid-stream; the H2Ds must run on upload_str (not on lc.str) so the cuStreamWaitEvent cross-stream link is meaningful; the upload_done event is recorded after the last H2D for the current frame and waited on once before the first kernel launch of that frame. CUDA graph capture (future T-GPU-OPT-N) depends on the no-per-frame-alloc invariant; collapsing the three-phase split or re-introducing per-frame vmaf_cuda_buffer_host_alloc calls breaks that follow-up. Bit-exactness gate is places=3 for psnr_hvs_y / cb / cr and the combined psnr_hvs (matches the existing matrix; not places=4).
  • On upstream sync: zero interaction. integer_psnr_hvs_cuda.c is fork-introduced (T7-23 / ADR-0188 / ADR-0191).
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr_hvs --backend cuda --places 3

0227 — output.c writer-format unit tests (R3 of coverage-gap-2026-05-02)

  • Touches:
  • core/test/test_output.c (new) — exercises the four writers in core/src/output.c (XML / JSON / CSV / SUB) end-to-end via tmpfile()-backed sinks and a synthetic VmafFeatureCollector. Pure test-only; no production code change.
  • core/test/meson.build — registers test_output next to test_feature_collector (mirrors that test's wiring: link_with: libvmaf + libsvm objects + log/predict/metadata helpers).
  • Invariant: the test pulls libvmaf.c and output.c in via #include "*.c" (mirroring the precedent in test_feature_collector.c) so the per-translation-unit .gcno lands in the test build dir and gcovr aggregates output.c's coverage. The mu-test framework macro (mu_assert) deliberately early-returns from each static char *test_*() body — that's why every test body trips clang-analyzer-unix.Malloc "potential leak" notes (cleanup runs only on the success-tail path). This pattern is shared across every core/test/test_*.c file and is load-bearing (per ADR-0141 NOLINT carve-out): replacing it with goto-cleanup would obscure the per-assertion failure message.
  • On upstream sync: zero interaction. output.c is upstream-mirrored, but this PR doesn't touch it. The test only depends on the four public function signatures (vmaf_write_output_{xml,json,csv,sub}); if Netflix renames or reorders those, the test fails to compile and the rebase author updates it then.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && ./build/test/test_output

0126 — OSSF Scorecard policy (ADR-0263)

  • Touches: .github/workflows/scorecard.yml (line 45 — the github/codeql-action/upload-sarif@<sha> pin). The rest of the policy is doc-only (docs/adr/0263-*.md, docs/research/0053-*.md, changelog.d/security/). Upstream Netflix/vmaf does not ship a Scorecard workflow, so the path itself is fork-introduced and won't conflict.
  • Invariant: the upload-sarif SHA must point to a commit that currently exists in github/codeql-action's git tree. A SHA that was once v4 head but no longer exists in the action repository triggers Scorecard's "imposter commit" defence and breaks the workflow with a 400 error against api.scorecard.dev. Verify on every Dependabot bump by spot-checking gh api /repos/github/codeql-action/commits/<sha> returns 200.
  • On upstream sync: zero interaction.
  • Re-test on rebase:

```bash
# Confirm the pin still resolves to a real commit:
pin=$(grep -oE 'codeql-action/upload-sarif@[a-f0-9]{40}' \
    .github/workflows/scorecard.yml | head -1 | cut -d@ -f2)
gh api "/repos/github/codeql-action/commits/$pin" --jq '.sha'
# Then watch the next master push for a green Scorecard run:
gh run list --workflow scorecard --repo VMAFx/vmafx --limit 1
```
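
The pin-extraction step can also be sketched in Python for use in a local pre-check. This is a minimal sketch using the same `[a-f0-9]{40}` pattern as the `grep` call; the function name and the sample workflow text (including its SHA) are made up for illustration, and the remote existence check via `gh api` is not reproduced here.

```python
import re

# Same pattern as the grep call: a full 40-hex-digit commit SHA after the '@'.
PIN_RE = re.compile(r"codeql-action/upload-sarif@([a-f0-9]{40})")


def extract_pin(workflow_text: str):
    """Return the first pinned upload-sarif SHA, or None if no SHA pin exists."""
    match = PIN_RE.search(workflow_text)
    return match.group(1) if match else None


# Hypothetical workflow fragments, not copied from scorecard.yml.
SAMPLE_PINNED = (
    "      - uses: github/codeql-action/upload-sarif@"
    "0123456789abcdef0123456789abcdef01234567\n"
)
SAMPLE_TAGGED = "      - uses: github/codeql-action/upload-sarif@v4\n"
```

A tag reference such as `@v4` yields `None`, which makes an accidental un-pinning visible before the workflow runs.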

0228 — U-2-Net u2netp saliency replacement deferred (ADR-0265)

  • Touches: docs-only.
  • docs/adr/0265-u2netp-saliency-replacement-blocked.md — new ADR continuing the deferral chain started by ADR-0257.
  • docs/research/0055-u2netp-saliency-replacement-survey.md — new research digest (upstream survey + license + distribution + op-allowlist audit + alternatives walk).
  • docs/ai/models/mobilesal.md — pointer block updated to reference both ADR-0257 (first blocker) and ADR-0265 (second blocker).
  • model/tiny/registry.json — mobilesal_placeholder_v0 notes field updated to reference ADR-0265 alongside ADR-0257 (no schema / sha256 / file changes).
  • model/tiny/mobilesal.json — sidecar notes field updated in lockstep.
  • scripts/gen_mobilesal_placeholder_onnx.py — generator notes string updated so re-running is idempotent against the new sidecar / registry text.
  • CHANGELOG.md — Changed entry via changelog.d/changed/T6-2a-followup-u2netp-replacement-deferred.md.
  • docs/adr/README.md — index row via docs/adr/_index_fragments/0265-u2netp-saliency-replacement-blocked.md.
  • Invariant: zero C-side surface change. feature_mobilesal.c tensor-name contract (input input → output saliency_map, NCHW float32 [1, 3, H, W] → [1, 1, H, W]) is unchanged; the on-disk model/tiny/mobilesal.onnx (sha256 f1226310…) is unchanged; mobilesal_placeholder_v0's smoke: true flag is unchanged. Any future drop-in (U-2-Net via T6-2a-mirror-u2netp-via-release + T6-2a-widen-allowlist-resize, distilled student, or BASNet / PoolNet survey result) replaces the .onnx and bumps the registry sha256 without touching the C side.
  • On upstream sync: zero interaction. feature_mobilesal.c, the registry, the ADR, and the research digest are all fork-local (T6-2a; ADR-0218 / ADR-0257 / ADR-0265; not present in Netflix upstream).
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_mobilesal
python3 ai/scripts/validate_model_registry.py
bash scripts/docs/concat-adr-index.sh --check
bash scripts/release/concat-changelog-fragments.sh --check

0108 — ssim_accumulate_avx512 per-lane double reduction vectorised

  • ADR: ADR-0139 (existing; no new ADR — the per-lane reduction order is unchanged).
  • Touches:
  • core/src/feature/x86/ssim_avx512.c — the ssim_accumulate_block_avx512 body. The per-lane scalar ssim_accumulate_lane calls (16 of them) are replaced by two 8-wide __m512d passes that compute lv, cv, sv, and lv*cv*sv lane-wise in vector double. Aligned double[16] spill buffers replace the previous _Alignas(64) float[16]×6 spill, and the scalar accumulation loop now does 4×16 vaddsd instead of 16 invocations of the per-lane helper.
  • CHANGELOG.md — Changed entry.
  • This file — this entry.
  • Invariant (load-bearing for ADR-0139 bit-exactness):
  • Per-lane double computation order is byte-identical: ((2.0 * rm) * cm + C1) / l_den, then (2.0 * srsc + C2) / c_den, then (lv * cv) * sv. No FMA contraction: the code uses separate _mm512_mul_pd + _mm512_add_pd; _mm512_fmadd_pd is forbidden because it changes the rounding count and would diverge from scalar's two-step mul+add.
  • Float→double widening uses _mm512_cvtps_pd, which is IEEE-754-exact for finite floats (the float's 23-bit mantissa fits losslessly in the double's 52-bit mantissa).
  • Lane-by-lane left-to-right reduction order preserved: local_ssim += t_ssim[k] for k = 0..15. Tree reductions (pairwise add, dual-accumulator unroll) are forbidden — they break running-sum associativity against scalar.
  • AVX2 / NEON twins kept on the per-lane scalar path. Verified bit-identical against the new AVX-512 at --precision max on the Netflix src01_hrc00/01_576x324 and the checkerboard_1920_1080_10_3_*_0 pairs. The bit-exactness contract (ADR-0139) is per-lane, not per-ISA algorithm — so AVX2 / NEON stay scalar-per-lane until a dedicated PR vectorises them with the same care.
  • Rebase impact: zero conflict with Netflix upstream — the whole SSIM SIMD surface is fork-local (no upstream SSIM SIMD exists). Conflicts only arise if upstream changes ssim_accumulate_default_scalar in iqa/ssim_tools.c; in that case both the AVX2 / NEON per-lane helper and the AVX-512 vector-double block need a coordinated update preserving the three invariants above.
  • Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
# Bit-exact at --precision max, scalar vs AVX2 vs AVX-512:
for MASK in 0 16 255; do
  build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --feature float_ms_ssim --feature float_ssim \
    --xml -o /tmp/m${MASK}.xml --precision max --cpumask $MASK
done
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m16.xml)   # empty
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m255.xml)  # empty
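
Two of the invariants above (no tree reductions, no FMA contraction) come down to how many times and in what order results are rounded. The following is a small illustration in plain Python doubles, not the SIMD code itself; the lane values and the `Fraction`-based FMA emulation are chosen purely to make the effect visible.

```python
from fractions import Fraction


def sum_left_to_right(values):
    """Running sum in index order, like `local_ssim += t_ssim[k]`."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Tree reduction: reduce each half first, then combine the partial sums."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


def mul_then_add(a, b, c):
    """Two roundings: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_mul_add(a, b, c):
    """One rounding: exact a*b+c rounded once (emulates an FMA instruction)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Illustrative inputs only; real SSIM lanes are not this extreme.
LANES = [1e16, 1.0, -1e16, 1.0]
A = B = 1.0 + 2.0 ** -27
C = -(1.0 + 2.0 ** -26)

print(sum_left_to_right(LANES), sum_pairwise(LANES))
print(mul_then_add(A, B, C), fused_mul_add(A, B, C))
```

With these inputs the in-order sum and the pairwise sum disagree, and so do the two-step and fused multiply-add, which is why a vectorised path has to keep both the operation sequence and the reduction order of the scalar reference.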
  • Why this matters on rebase: an upstream commit that touches core/src/feature/ssimulacra2.c could prompt a "let's also port the GPU XYB while we're here" follow-up. The ledger entry is the standing answer: don't, the measurement was redone on NVIDIA in May 2026 and the result still failed places=4 by five decades. See Research-0047.

0126 — FastDVDnet real upstream weights drop (ADR-0253)

  • What changed: replaces model/tiny/fastdvdnet_pre.onnx with the wrapped real upstream FastDVDnet checkpoint (sha256 eb9444cf6f07eefdc7f4f68d09131074dbd1dcee6f88a331ba684dd2fb5937d4, ~9.5 MiB), refreshes the sidecar model/tiny/fastdvdnet_pre.json, flips the registry row's smoke: true → false and adds license: "MIT" + the upstream commit pin c8fdf61. New exporter ai/scripts/export_fastdvdnet_pre.py (the older _placeholder.py exporter is retained for reference). New ADR docs/adr/0253-fastdvdnet-pre-real-weights.md; user-facing doc docs/ai/models/fastdvdnet_pre.md rewritten with provenance, license attribution, and reproduce-the-export instructions.
  • Upstream source: fork-local. Netflix/vmaf does not ship a FastDVDnet temporal pre-filter; the C extractor and ONNX surface are entirely fork-introduced (ADR-0215). The wrapped weights are attribution-only (upstream m-tassano/fastdvdnet MIT).
  • On upstream sync: zero interaction. Every file touched (ai/scripts/export_fastdvdnet_pre*.py, model/tiny/fastdvdnet_pre.*, docs/ai/models/fastdvdnet_pre.md, docs/adr/0253-*.md, CHANGELOG fragment, ADR index fragment) lives in fork-introduced trees.
  • Re-test on rebase:
# Re-derive the ONNX from the pinned upstream checkpoint.
mkdir -p /tmp/fastdvdnet_upstream && cd /tmp/fastdvdnet_upstream
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/model.pth
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/models.py
cd /path/to/vmaf
python3 ai/scripts/export_fastdvdnet_pre.py \
    --upstream-dir /tmp/fastdvdnet_upstream
python3 ai/scripts/validate_model_registry.py
meson test -C build --suite=fast --print-errorlogs test_fastdvdnet_pre
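
After a re-export it can help to confirm that the file on disk still matches the sha256 recorded for it. The sketch below uses only `hashlib`; the function names are hypothetical, and how `validate_model_registry.py` actually performs this check is not shown in these notes.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry(path, expected_sha256):
    """Compare a file's digest with the value recorded for it (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()
```

Called as `matches_registry("model/tiny/fastdvdnet_pre.onnx", "<sha256 from the registry row>")`, a `False` result means the export is no longer byte-identical to the pinned artefact.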

0127 — ONNX op-allowlist gains Resize (ADR-0258)

  • Touches:
  • core/src/dnn/op_allowlist.c — fork-local file (no upstream counterpart). One new entry "Resize" under the /* convolutional */ block.
  • core/test/dnn/test_op_allowlist.c, core/test/dnn/test_onnx_scan.c — fork-local DNN tests.
  • ai/tests/test_op_allowlist.py — fork-local Python parity test.
  • Invariant: the C allowlist is the single source of truth; the Python regex parser in ai/src/vmaf_train/op_allowlist.py walks the same op_allowlist.c file. Any future entry only needs the C edit — Python symmetry is automatic.
  • Upstream source: fork-local. Netflix/vmaf has no ONNX op-allowlist surface; the entire core/src/dnn/ tree is fork-introduced.
  • On upstream sync: zero interaction. Every file touched lives in fork-introduced trees.
  • Re-test on rebase:
meson test -C build test_op_allowlist test_onnx_scan
PYTHONPATH=ai/src python -m pytest ai/tests/test_op_allowlist.py
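
The "C file is the single source of truth" invariant can be illustrated with a small regex walk over a C string array. This is a sketch under assumptions: the array name `op_allowlist`, the sample C text, and the regexes are hypothetical and are not taken from `op_allowlist.c` or from `ai/src/vmaf_train/op_allowlist.py`.

```python
import re

# Hypothetical C fragment; the real file's array name and layout may differ.
SAMPLE_C = '''
static const char *const op_allowlist[] = {
    /* convolutional */
    "Conv",
    "Resize",
    /* activation: "NotAnOp" in a comment must be ignored */
    "Relu",
    NULL,
};
'''

ARRAY_RE = re.compile(r"op_allowlist\s*\[\]\s*=\s*\{(.*?)\};", re.DOTALL)
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
STRING_RE = re.compile(r'"([A-Za-z0-9_]+)"')


def parse_allowlist(c_source: str):
    """Return the op names listed in the C array, in source order."""
    match = ARRAY_RE.search(c_source)
    if match is None:
        raise ValueError("allowlist array not found")
    body = COMMENT_RE.sub("", match.group(1))  # drop block comments first
    return STRING_RE.findall(body)
```

Because the Python side derives its list from the C text, adding an entry such as `"Resize"` in C is enough for both sides to agree.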

0231 — vif.comp + ciede.comp precise decorations (ADR-0269 / Step A of Vulkan 1.4 bump)

  • Touches: core/src/feature/vulkan/shaders/vif.comp (3 local-variable type qualifiers: g, sv_sq, gg_sigma_f → precise float), core/src/feature/vulkan/shaders/ciede.comp (yuv_to_rgb outputs, rgb_to_xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE).
  • Invariant: Both shaders are fork-local (Vulkan backend is fork-added; upstream Netflix/vmaf has no Vulkan compute kernels). The precise keyword is GLSL 4.50 standard syntax; glslc 2026.1 lowers it to per-result OpDecorate NoContraction. The decorations are load-bearing for the cross-backend gate on NVIDIA driver 595.71+ — removing them would re-introduce the 42/48 ciede regression at API 1.3 documented in research-0054.
  • On upstream sync: zero interaction. Both shader files are entirely fork-introduced; upstream has no Vulkan compute path.
  • Re-test on rebase:
# Re-confirm the cross-backend gate on a Vulkan-capable host.
meson setup core/build -Denable_vulkan=enabled
ninja -C core/build
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature ciede --backend vulkan --places 4
# Confirm SPIR-V still emits NoContraction post-rebase.
glslc --target-env=vulkan1.3 -O \
    core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
spirv-dis /tmp/vif.spv | grep -c NoContraction   # expect ≥ 60

Expected on NVIDIA 595.71+: vif 0/48 OK, ciede 5/48 FAIL (max abs 8.9e-05 — pre-existing fork debt at API 1.3, see ADR-0269). On RADV / lavapipe: bit-exact (precise is a no-op there).

0229 — fr_regressor_v2 codec-aware scaffold (ADR-0272)

  • ADR: ADR-0272
  • Touches:
  • ai/scripts/train_fr_regressor_v2.py (new) — Phase A JSONL consumer; trains the codec-aware FRRegressor.
  • model/tiny/fr_regressor_v2.onnx (new, smoke) — placeholder ONNX from --smoke mode; re-baked on production training.
  • model/tiny/fr_regressor_v2.json (new) — sidecar.
  • model/tiny/registry.json — new entry with smoke: true.
  • docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md (new).
  • docs/adr/README.md — index row.
  • docs/research/0058-fr-regressor-v2-feasibility.md (new).
  • docs/ai/models/fr_regressor_v2.md (new) — model card.
  • ai/AGENTS.md — invariant note (codec block layout + ENCODER_VOCAB ordering).
  • CHANGELOG.md — Added entry.
  • Invariant: the 8-D codec block layout is [encoder_onehot(6), preset_norm, crf_norm] with ENCODER_VOCAB = (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown) in load-bearing order. CRF normaliser is /63 (union upper bound). Preset normaliser is /9. Bumping the vocabulary requires a re-train; existing checkpoints pin the order they were trained against via encoder_vocab_version in the sidecar. The two-input ONNX (features, codec) follows the LPIPS-Sq precedent (ADR-0040 / ADR-0041).
  • Rebase impact: entirely fork-local; pure additive; no upstream-mirror file is touched. Phase A schema (consumed by this trainer) is itself fork-local (tools/vmaf-tune/). No conflict expected on /sync-upstream.
  • Re-test on rebase:
python ai/scripts/train_fr_regressor_v2.py --smoke
python ai/scripts/validate_model_registry.py
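
The 8-D codec block described in the invariant can be sketched as follows. The vocabulary order and the `/9` and `/63` normalisers come from the invariant; the function name, the assumption that the preset arrives as an integer ordinal, and the fallback of unlisted encoders to the `unknown` slot are illustrative guesses rather than the trainer's actual code.

```python
ENCODER_VOCAB = (
    "libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown",
)
PRESET_NORM = 9.0   # preset normaliser from the invariant
CRF_NORM = 63.0     # CRF normaliser (union upper bound) from the invariant


def codec_block(encoder: str, preset_ordinal: int, crf: float):
    """Build [encoder_onehot(6), preset_norm, crf_norm] as a list of 8 floats.

    Assumptions: preset_ordinal is already an integer ordinal, and encoders
    missing from the vocabulary fall back to the "unknown" slot.
    """
    name = encoder if encoder in ENCODER_VOCAB else "unknown"
    onehot = [0.0] * len(ENCODER_VOCAB)
    onehot[ENCODER_VOCAB.index(name)] = 1.0
    return onehot + [preset_ordinal / PRESET_NORM, crf / CRF_NORM]
```

Since the one-hot index is positional, reordering `ENCODER_VOCAB` would silently change the meaning of every stored vector, which is why the order is treated as load-bearing.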

0311 — libFuzzer harness expansion: yuv_input + cli_parse (ADR-0311)

  • ADR: ADR-0311; parent ADR-0270.
  • Touches:
  • core/test/fuzz/fuzz_yuv_input.c (new)
  • core/test/fuzz/fuzz_cli_parse.c (new)
  • core/test/fuzz/meson.build — two new executable(...) blocks for the harnesses, plus a shared fuzz_vidinput_sources list.
  • core/test/fuzz/yuv_input_corpus/* (new — 6 seeds covering 8/10-bit × 4:2:0 / 4:2:2 / 4:4:4 plus a truncated-frame seed).
  • core/test/fuzz/cli_parse_corpus/* (new — 6 seeds covering the --feature, --model, --reference, YUV-flag, and --help shapes).
  • core/test/fuzz/README.md — Targets table extended.
  • .github/workflows/fuzz.yml — matrix gains fuzz_yuv_input + fuzz_cli_parse; per-harness wall-clock budget reduced from 300 s to 60 s so the 3-target matrix fits the existing timeout-minutes: 15 cap.
  • docs/development/fuzzing.md — runbook table + smoke commands extended.
  • docs/adr/0311-libfuzzer-harness-expansion.md (new)
  • docs/research/0083-libfuzzer-harness-expansion-target-survey.md (new)
  • libvmaf/AGENTS.md — new invariant block for the one-parser-one-harness rule.
  • CHANGELOG.md — Added entry.
  • Invariant:
  • The fuzz scaffold remains opt-in (-Dfuzz=true) — every default meson setup invocation must continue to skip it.
  • fuzz_yuv_input re-includes tools/yuv_input.c and the rest of the vidinput trio as build inputs. Upstream Netflix/vmaf splits or renames of those source files need the matching meson.build source-list update.
  • fuzz_cli_parse re-includes tools/cli_parse.c as a build input and links against libvmaf for vmaf_version() and feature-dictionary symbols. The -Wl,--wrap=exit link arg is load-bearing — without it, usage()'s exit(1) would terminate the fuzzer process on first bad input.
  • LLVMFuzzerTestOneInput keeps external linkage; the scaffold-wide // NOLINTNEXTLINE(misc-use-internal-linkage) pattern is correct for libFuzzer's name-resolved entry-point ABI.
  • Rebase impact: any upstream sync that touches core/tools/{yuv_input,cli_parse}.c must re-run the 60 s smoke per harness on the merged tip; record any new-found crash-* artefact under the matching <target>_known_crashes/ dir, not in <target>_corpus/. The __wrap_exit shim in fuzz_cli_parse.c is GNU-ld / lld-only; do not assume it works on Apple ld without an -undefined,dynamic_lookup fallback.
  • Re-test on rebase:
CC=clang CXX=clang++ \
  meson setup build-fuzz libvmaf \
    --buildtype=debug \
    -Db_sanitize=address \
    -Db_lundef=false \
    -Dfuzz=true \
    -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz \
    test/fuzz/fuzz_y4m_input \
    test/fuzz/fuzz_yuv_input \
    test/fuzz/fuzz_cli_parse
./build-fuzz/test/fuzz/fuzz_yuv_input \
    -seed=0 -runs=1000 \
    core/test/fuzz/yuv_input_corpus/
./build-fuzz/test/fuzz/fuzz_cli_parse \
    -seed=0 -runs=1000 \
    core/test/fuzz/cli_parse_corpus/

0229 — libFuzzer scaffold for the YUV4MPEG2 parser (ADR-0270)

  • ADR: ADR-0270
  • Touches:
  • core/test/fuzz/fuzz_y4m_input.c (new)
  • core/test/fuzz/meson.build (new)
  • core/test/fuzz/README.md (new)
  • core/test/fuzz/y4m_input_corpus/* (new — six seeds)
  • core/test/fuzz/y4m_input_known_crashes/* (new — one 411-chroma OOB reproducer; excluded from CI corpus)
  • core/test/meson.build — subdir('fuzz') line.
  • core/meson_options.txt — new option('fuzz', ...).
  • .github/workflows/fuzz.yml (new — nightly 5-minute job).
  • docs/development/fuzzing.md (new — operator runbook).
  • docs/adr/0270-fuzzing-scaffold.md (new)
  • docs/research/0059-libfuzzer-scaffold-y4m.md (new)
  • docs/state.md — new Open-bug row for the 411-chroma OOB write.
  • CHANGELOG.md — Added entry.
  • Invariant: the fuzz scaffold is opt-in — every default meson setup invocation must continue to skip it. The harness links statically against core/tools/{y4m_input,yuv_input,vidinput}.c rather than libvmaf.so so the public C-API surface stays unchanged.
  • Rebase impact: the harness re-includes core/tools/y4m_input.c as a build input. Any upstream Netflix/vmaf change that splits or renames the tool sources (e.g. moves the parser into core/src/) needs the corresponding meson.build source list update and the harness re-test below. The y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m reproducer is the regression gate for the parser fix; do not delete it on upstream sync — if upstream lands the same fix, port the reproducer back into y4m_input_corpus/ as a permanent seed.
  • Re-test on rebase:
CC=clang CXX=clang++ \
  meson setup build-fuzz libvmaf \
    --buildtype=debug \
    -Db_sanitize=address \
    -Db_lundef=false \
    -Dfuzz=true \
    -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz test/fuzz/fuzz_y4m_input
./build-fuzz/test/fuzz/fuzz_y4m_input \
    -max_total_time=60 \
    core/test/fuzz/y4m_input_corpus/
# Verify the known-crash reproducer still triggers (until the fix lands):
./build-fuzz/test/fuzz/fuzz_y4m_input \
    core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m
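
When a new corpus seed is needed, a tiny YUV4MPEG2 stream can be generated in memory. This is a minimal sketch, assuming an 8-bit 4:2:0 layout with the `C420jpeg` chroma tag and a fixed 25 fps header; the helper name is hypothetical and the existing seeds in `y4m_input_corpus/` may have been produced differently.

```python
def make_y4m_seed(width: int, height: int, frames: int = 1, fill: int = 0x80) -> bytes:
    """Build a small 8-bit 4:2:0 YUV4MPEG2 stream filled with a constant value."""
    header = f"YUV4MPEG2 W{width} H{height} F25:1 Ip A1:1 C420jpeg\n".encode("ascii")
    # 4:2:0 halves the chroma planes in both directions, rounding up.
    chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
    payload = bytes([fill]) * (width * height + 2 * chroma_w * chroma_h)
    return header + (b"FRAME\n" + payload) * frames


def truncate_last_frame(stream: bytes, missing_bytes: int) -> bytes:
    """Drop trailing bytes to mimic a truncated-frame seed."""
    return stream[:-missing_bytes] if missing_bytes > 0 else stream
```

Writing `make_y4m_seed(2, 2)` to a file gives a seed of a few dozen bytes, and `truncate_last_frame` produces the short-read variant that exercises the parser's end-of-file handling.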

0231 — HIP seventh-consumer kernel float_motion_hip (ADR-0273)

  • ADR: ADR-0273
  • Touches:
  • core/src/feature/hip/float_motion_hip.c (new) — seventh consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_motion_cuda.c call-graph-for-call-graph; init/submit/collect/close invoke the kernel-template helpers in the same order; flush() callback for tail-frame motion2 emission; motion_force_zero short-circuit posture (fex->extract swap with submit / collect / flush / close nulled). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-WG SAD float partials directly, no atomic, no memset).
  • core/src/feature/hip/float_motion_hip.h (new)
  • core/src/hip/meson.build — new entry in hip_sources.
  • core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — new sub-test test_float_motion_hip_extractor_registered (also asserts the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit) and a row in test_table[].
  • docs/adr/0273-hip-seventh-consumer-float-motion.md (new)
  • docs/adr/README.md — index row.
  • docs/backends/hip/overview.md — seventh / eighth consumer note.
  • core/src/hip/AGENTS.md — invariant note.
  • CHANGELOG.md — Added entry (joint with ADR-0274).
  • Invariant — three-buffer ping-pong + motion_force_zero short-circuit are load-bearing. The state struct carries three uintptr_t buffer slots (ref_in, blur[2]) that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *ref_in + VmafCudaBuffer *blur[2] field shape. The motion_force_zero short-circuit (fex->extract swap, kernel-template helpers nulled) must stay aligned with the CUDA twin on every refactor — otherwise the runtime PR's helper-body flip diverges between the two backends. The submit_pre_launch bypass mirrors the CUDA twin; if a future PR adds a submit_pre_launch call to float_motion_cuda.c's submit path, the HIP twin must follow in the same PR.
  • Rebase impact: entirely fork-local. New files are HIP-specific. The only upstream-touching edit is feature_extractor.c, but the change sits inside an existing #if HAVE_HIP block (ADR-0241); upstream has no HAVE_HIP so no conflict is expected.
  • Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
  -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke

0232 — HIP eighth-consumer kernel float_ssim_hip (ADR-0274)

  • ADR: ADR-0274
  • Touches:
  • core/src/feature/hip/float_ssim_hip.c (new) — eighth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_ssim_cuda.c call-graph-for-call-graph (the CUDA file registers vmaf_fex_float_ssim_cuda despite its integer_ filename). First multi-dispatch HIP consumer (chars.n_dispatches_per_frame == 2). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-block float partials directly). State struct carries five uintptr_t intermediate float buffer slots (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) tracked outside the kernel-template's readback bundle. validate_dims_hip and init_dims_hip helpers extracted from init() to fit the readability-function-size budget.
  • core/src/feature/hip/float_ssim_hip.h (new)
  • core/src/hip/meson.build — new entry in hip_sources.
  • core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — new sub-test test_float_ssim_hip_extractor_registered (also asserts chars.n_dispatches_per_frame == 2) and a row in test_table[].
  • docs/adr/0274-hip-eighth-consumer-float-ssim.md (new)
  • docs/adr/README.md — index row.
  • docs/backends/hip/overview.md — seventh / eighth consumer note (joint).
  • core/src/hip/AGENTS.md — invariant note.
  • CHANGELOG.md — Added entry (joint with ADR-0273).
  • Invariant — multi-dispatch + five-slot buffer pyramid + v1 scale=1 validation are load-bearing. The state struct carries five uintptr_t intermediate float buffer slots that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *h_* field shape — any drift in the CUDA twin's slot count requires a paired update here. The chars.n_dispatches_per_frame == 2 characteristic is asserted in the smoke test; do not silently lower it. The v1 scale=1 -EINVAL validation surface (in validate_dims_hip) must stay aligned with the CUDA twin's compute_scale / vmaf_log chain. The HIP twin's validate_dims_hip / init_dims_hip extraction is intentional for the function-size budget; do not re-inline without verifying the budget still passes.
  • Rebase impact: entirely fork-local; same posture as ADR-0273.
  • Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
  -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke

0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)

0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)

0228 — vmaf-tune libx265 codec adapter (ADR-0288)

0280 — vmaf-tune NVENC codec adapters (ADR-0290)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_nvenc,hevc_nvenc,av1_nvenc,_nvenc_common}.py (new). Wholly fork-local — no upstream Netflix/vmaf overlap.
  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry expanded.
  • tools/vmaf-tune/tests/test_codec_adapter_nvenc.py (new).
  • tools/vmaf-tune/tests/test_corpus.py — Phase-A registry assertion updated.
  • tools/vmaf-tune/AGENTS.md — invariant note expanded.
  • docs/usage/vmaf-tune.md — "Hardware encoders (NVENC)" section.
  • docs/adr/0290-vmaf-tune-nvenc-adapters.md (new) + docs/adr/README.md index row.
  • docs/research/0065-vmaf-tune-nvenc-adapters.md (new).
  • CHANGELOG.md — Added entry.
  • Invariant: known_codecs() returns the four-codec tuple ("av1_nvenc", "h264_nvenc", "hevc_nvenc", "libx264"); the mnemonic preset map (ultrafast/superfast/veryfast → p1, faster → p2, fast → p3, medium → p4, slow → p5, slower → p6, slowest/placebo → p7) is the canonical cross-codec preset alignment that downstream Phase B/C consumers assume. The CQ window is the hardware-permitted [0, 51]; the Phase A informative window is [15, 40].
  • Rebase impact: zero — tools/vmaf-tune/ is wholly fork-local and has no upstream Netflix/vmaf path overlap.
  • Re-test on rebase:
cd tools/vmaf-tune && python -m pytest tests/ -q
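
The preset alignment and the two CQ windows from the invariant above can be sketched as a lookup table plus a range check. This is a minimal illustration, not the adapter's actual code: the names NVENC_PRESET_MAP, nvenc_preset and check_cq are hypothetical, and only the mapping and the window bounds come from the note.

```python
# Hypothetical table mirroring the mnemonic preset alignment in the invariant.
NVENC_PRESET_MAP = {
    "ultrafast": "p1", "superfast": "p1", "veryfast": "p1",
    "faster": "p2", "fast": "p3", "medium": "p4",
    "slow": "p5", "slower": "p6", "slowest": "p7", "placebo": "p7",
}
CQ_HARDWARE_WINDOW = (0, 51)      # range the hardware permits
CQ_INFORMATIVE_WINDOW = (15, 40)  # Phase A informative range


def nvenc_preset(name: str) -> str:
    """Translate a cross-codec preset name into an NVENC p1..p7 token."""
    try:
        return NVENC_PRESET_MAP[name]
    except KeyError:
        raise ValueError(f"unknown preset {name!r}") from None


def check_cq(cq: int) -> bool:
    """Raise outside the hardware window; return True inside the informative one."""
    lo, hi = CQ_HARDWARE_WINDOW
    if not lo <= cq <= hi:
        raise ValueError(f"cq {cq} outside hardware window [{lo}, {hi}]")
    info_lo, info_hi = CQ_INFORMATIVE_WINDOW
    return info_lo <= cq <= info_hi
```

Keeping the table as one flat dict makes a reordering or a dropped alias show up as a one-line diff on rebase.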

0228 — vmaf-tune libx265 codec adapter (ADR-0288)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry add), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions(stderr, encoder=…) gains a per-codec branch), tools/vmaf-tune/src/vmaftune/cli.py (help-text wording only), tools/vmaf-tune/tests/test_codec_adapter_x265.py (new), tools/vmaf-tune/tests/test_corpus.py (membership-based codec list assertion).
  • Invariant: the codec-adapter contract documented in tools/vmaf-tune/AGENTS.md (multi-codec from day one; the search loop never branches on codec identity). The parse_versions signature is still backward-compatible — encoder defaults to libx264 so callers from before this PR keep working.
  • Upstream source: fork-local. tools/vmaf-tune/ is fork-only; upstream Netflix/vmaf does not ship encode automation.
  • On upstream sync: zero interaction. Confirm the _index_fragments/_order.txt row for 0288-vmaf-tune-codec-adapter-x265 remains present after any cross-merge.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -x
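
The backward-compatible signature described in the invariant can be sketched as a keyword argument with a libx264 default and a per-codec lookup. This is an illustration under assumptions: the banner regexes, the return type, and the "unknown" fallback are hypothetical, and the real parse_versions in encode.py may differ on all three.

```python
import re

# Hypothetical banner patterns; the real adapter regexes may differ.
_BANNERS = {
    "libx264": re.compile(r"x264 - core (\d+)"),
    "libx265": re.compile(r"x265 \[info\]: HEVC encoder version (\S+)"),
}


def parse_versions(stderr: str, encoder: str = "libx264") -> str:
    """Return the encoder version found in stderr, or "unknown".

    The default keeps callers that predate the per-codec branch working.
    """
    pattern = _BANNERS.get(encoder)
    match = pattern.search(stderr) if pattern else None
    return match.group(1) if match else "unknown"
```

Older call sites that pass only stderr keep resolving to the libx264 branch, which is the compatibility property the note asks to preserve.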

0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry row + import), tools/vmaf-tune/tests/test_corpus.py (membership assertion relaxed from == ("libx264",) to "libx264" in known_codecs()), tools/vmaf-tune/tests/test_codec_adapter_libaom.py (new), tools/vmaf-tune/AGENTS.md (preset-vocabulary invariant).
  • Invariant: the cross-codec preset vocabulary (placebo, slowest, slower, slow, medium, fast, faster, veryfast, superfast, ultrafast) is shared across AV1-family adapters so one --preset axis covers x264 / x265 / svtav1 / libaom-av1. Each adapter maps the human name onto its codec-specific knob; do not introduce per-adapter preset names.
  • Upstream source: fork-local. tools/vmaf-tune/ is the fork-introduced quality-aware encode automation harness (ADR-0237); it has no upstream Netflix/vmaf counterpart.
  • On upstream sync: zero interaction with upstream/master. Self-contained in tools/vmaf-tune/ and docs/.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/

0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)

  • ADR: ADR-0275
  • Touches:
  • model/tiny/vmaf_tiny_v3.int8.onnx (new, 4 267 B)
  • model/tiny/vmaf_tiny_v4.int8.onnx (new, 7 769 B)
  • model/tiny/registry.json — new vmaf_tiny_v3 and vmaf_tiny_v4 rows with quant_mode, int8_sha256, quant_accuracy_budget_plcc fields.
  • model/tiny/vmaf_tiny_v3.json, model/tiny/vmaf_tiny_v4.json — same fields mirrored into the per-model sidecars.
  • docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md — new "Quantisation" sections.
  • docs/adr/0275-vmaf-tiny-v3-v4-ptq.md (new) and ADR index row.
  • CHANGELOG.md — Added entry.
  • Invariant: python ai/scripts/measure_quant_drop.py --all reports [PASS] for both vmaf_tiny_v3 (drop ≤ 0.001 on Netflix features) and vmaf_tiny_v4 (drop ≤ 0.001), inside the 0.01 per-model budget. The runtime redirect from ADR-0174 picks the .int8.onnx sibling when an operator's registry overlay declares quant_mode: dynamic.
  • Rebase impact: entirely fork-local — neither v3 nor v4 nor the dynamic-PTQ harness exists upstream. The new int8 ONNX bytes ship as committed binaries (mirroring learned_filter_v1 and nr_metric_v1); they are well below the few-MB external-data threshold and don't require the sigstore + .onnx.data pattern.
  • Re-test on rebase:

python ai/scripts/validate_model_registry.py
python ai/scripts/measure_quant_drop.py --all
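
The budget check in the invariant can be sketched as a PLCC difference between the fp32 and int8 predictions. This is a minimal illustration that assumes "drop" means PLCC(fp32) minus PLCC(int8) against the same targets; the function names are hypothetical and measure_quant_drop.py may compute it differently.

```python
import math


def plcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def quant_drop_passes(targets, fp32_pred, int8_pred, budget=0.01):
    """Return (passed, drop); drop is the PLCC lost by quantisation (assumed definition)."""
    drop = plcc(targets, fp32_pred) - plcc(targets, int8_pred)
    return drop <= budget, drop
```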

0229 — NVIDIA-Vulkan ciede2000 places=4 fork debt root-cause (ADR-0273)

  • Touched files: docs-only.
  • docs/adr/0273-...precision-gap.md (new) + _index_fragments/ row + _order.txt append.
  • docs/research/0055-ciede-vulkan-nvidia-f32-f64-root-cause.md (new) + docs/research/README.md index row.
  • docs/state.md — Open-bugs row T-VK-CIEDE-F32-F64.
  • docs/backends/vulkan/overview.md — NVIDIA-hardware caveat.
  • changelog.d/changed/ciede-vulkan-nvidia-f32-f64-precision-gap.md (new).
  • core/src/vulkan/AGENTS.md — invariant cross-link.
  • Invariant: the ciede.comp shader's f32 precision contract is load-bearing — promoting to f64 would silently change scores on every Vulkan device that supports shaderFloat64 and create a per-device-feature-bit divergence (RTX 4090 has it; many consumer GPUs don't). The CPU ciede.c::get_lab_color doing its colour-space chain in double is upstream Netflix behaviour and must not be narrowed to f32 to "fix" the GPU gap (would change Netflix golden ground truth). The 5/48 NVIDIA places=4 mismatch on the highest-ΔE frames is expected and documented; do not attempt to "fix" it without re-reading ADR-0273 first.
  • Rebase impact: zero — docs-only. The CPU and shader sources this ADR analyses are unchanged by this PR. If a future upstream rebase touches ciede.c::get_lab_color (the double chain) the ADR's reasoning still holds; if upstream changes the CPU reference's precision posture, ADR-0273 needs a Status: Superseded entry.
  • Re-test on rebase: a manual NVIDIA-hardware run if available:

cd libvmaf && meson setup build \
  -Denable_vulkan=enabled -Denable_cuda=false && ninja -C build
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary $PWD/core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature ciede --backend vulkan --device 0 --places 4
# Expected post-PR-346 (when merged): 5/48 mismatches at 1.78× threshold.
# Expected pre-PR-346 (current master): 42/48 mismatches at higher ratio.
# If the count drops below 5/48 on NVIDIA, ADR-0273 should record the
# delta and consider closing T-VK-CIEDE-F32-F64.
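
A places=4 mismatch count like the 5/48 figure above can be sketched as follows. This assumes the gate uses the same rule as unittest's assertAlmostEqual (the absolute difference rounded to the given number of decimal places must be zero); the helper name is hypothetical and the real cross_backend_vif_diff.py may apply a different tolerance rule.

```python
def mismatches_at_places(cpu_scores, gpu_scores, places=4):
    """Indices of frames whose scores differ once the delta is rounded to `places`."""
    return [
        index
        for index, (cpu, gpu) in enumerate(zip(cpu_scores, gpu_scores))
        if round(abs(cpu - gpu), places) != 0
    ]
```

Under this rule a frame passes while the delta stays below roughly 5e-5, so a small f32 versus f64 divergence only surfaces on the frames with the largest accumulated error.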

0229 — tools/vmaf-tune fast Phase A.5 scaffold (ADR-0276)

  • Touches: tools/vmaf-tune/src/vmaftune/fast.py (new), tools/vmaf-tune/src/vmaftune/cli.py (new fast subcommand branch), tools/vmaf-tune/pyproject.toml (new [fast] extra), tools/vmaf-tune/tests/test_fast.py (new), tools/vmaf-tune/AGENTS.md (new invariants), docs/usage/vmaf-tune.md (new "Phase A.5" section), docs/adr/0276-vmaf-tune-fast-path.md (new ADR), docs/research/0060-vmaf-tune-fast-path.md (new digest).
  • Invariant: the fast subcommand is opt-in and never automatically replaces the Phase A grid path. The slow grid is the ground-truth corpus generator (ADR-0237 contract); fast-path is for the recommendation use case only. Optuna is a lazy-imported optional dep gated behind the [fast] extra — importing it at module scope outside fast.py (or its tests) breaks the zero-dep core install.
  • Rebase impact: entirely fork-local; the tool sits under tools/vmaf-tune/ which is fork-added, and no upstream files are touched. Upstream Netflix/vmaf has no analogous surface.
  • Re-test on rebase:
pip install -e 'tools/vmaf-tune[fast]'
pytest tools/vmaf-tune/tests/test_fast.py -v
vmaf-tune fast --smoke --target-vmaf 92
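
The lazy-import rule in the invariant can be sketched as a small loader that is only called from inside the fast path. This is an illustration, not the contents of fast.py: load_optional and run_fast are hypothetical names, and the only Optuna calls used are create_study, Study.optimize and Study.best_params.

```python
import importlib


def load_optional(module_name: str, extra: str):
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required here; install it with "
            f"pip install -e 'tools/vmaf-tune[{extra}]'"
        ) from exc


def run_fast(objective, n_trials: int = 20):
    """Run a search; Optuna is imported here, never at module scope."""
    optuna = load_optional("optuna", extra="fast")
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

Because the import happens inside run_fast, importing the module itself never touches Optuna, so the zero-dependency core install keeps working.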

0229 — vmaf-tune recommend subcommand (ADR-0237 Phase B-lite)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/recommend.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds recommend subparser; corpus subcommand untouched.
  • tools/vmaf-tune/tests/test_recommend.py (new). 13-case smoke suite, mocks all binaries; runs in <100 ms.
  • docs/usage/vmaf-tune.md — adds ## recommend section.
  • Invariant: recommend consumes the existing CORPUS_ROW_KEYS schema unchanged — vmaf_score, bitrate_kbps, crf, preset, encoder, exit_status. No schema bump. If a future PR bumps SCHEMA_VERSION, both the corpus writer and the recommend reader must be updated in lockstep; tests assert this via test_corpus_row_keys_match_init_contract.
  • Rebase impact: zero — tools/vmaf-tune/ is wholly fork-local; no upstream surface touches it.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/
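
How a reader of the unchanged corpus schema might pick a row can be sketched as below. The selection rule (cheapest successful row that meets the target) is an assumption made for illustration, as is the function shape; only the row keys come from the invariant above.

```python
import json


def recommend(jsonl_lines, target_vmaf):
    """Return the lowest-bitrate successful row meeting target_vmaf, or None."""
    rows = [json.loads(line) for line in jsonl_lines if line.strip()]
    candidates = [
        row for row in rows
        if row["exit_status"] == 0 and row["vmaf_score"] >= target_vmaf
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda row: row["bitrate_kbps"])
```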

0228 — integer_ms_ssim_cuda.c joins drain_batch (T-GPU-OPT-2 / ADR-0271)

  • Touches: core/src/feature/cuda/integer_ms_ssim_cuda.c. No upstream Netflix/vmaf changes expected here — the file is fork-added (CUDA twin of the upstream-port ms_ssim_score.cu) and the surface this PR redrew (per-scale l_partials[i] / c_partials[i] / s_partials[i] arrays + the per-scale h_l_partials[i] / h_c_partials[i] / h_s_partials[i] pinned host shadows + the submit() ↔ collect() work redistribution + the cuEventRecord(s->lc.finished, s->lc.str) + vmaf_cuda_drain_batch_register(&s->lc) tail) is also entirely fork-local.
  • Invariant: the engine-scope drain-batch contract from ADR-0271 / drain_batch.h. The kernel-launch order on s->lc.str must stay stable: decimate (× 4), then for each scale i ∈ 0..4 horizvert_lcs ⇒ DtoH(l_partials[i]) ⇒ DtoH(c_partials[i]) ⇒ DtoH(s_partials[i]), then cuEventRecord(s->lc.finished, s->lc.str), then vmaf_cuda_drain_batch_register(&s->lc). Same-stream ordering is what makes the shared SSIM intermediates (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) safe across scales without explicit sync — any change that parallelises the per-scale work onto multiple streams breaks bit-exactness unless per-scale intermediates are also added.
  • On upstream sync: zero interaction (the file is fork-added). If a future upstream PR adds an integer_ms_ssim_cuda.c of its own, the merger must reconcile the per-scale partials topology + the drain_batch tail with whatever the new upstream shape brings.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build  # confirms the CPU build still links cleanly
# If the dev host has a working nvcc / host-compiler pair:
meson setup build_cuda -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda src/liblibvmaf_feature.a.p/feature_cuda_integer_ms_ssim_cuda.c.o
# Netflix CPU golden gate (CPU is the bit-exactness ground truth):
make test-netflix-golden
# Cross-backend parity (places=4 gate, ADR-0214):
/cross-backend-diff

0277 — ffmpeg-patches refresh against n8.1 — 2026-05-04 (ADR-0277)

  • Touches: ffmpeg-patches/ is unchanged (no content drift). Doc-only entries land in:
  • docs/adr/0277-ffmpeg-patches-refresh-2026-05-04.md — new ADR.
  • docs/adr/_index_fragments/0277-ffmpeg-patches-refresh-2026-05-04.md — index row.
  • docs/adr/_index_fragments/_order.txt — manifest append.
  • changelog.d/changed/ffmpeg-patches-refresh-2026-05-04.md — Changed entry.
  • This file — this entry.
  • Invariant: ffmpeg-patches/series.txt order is load-bearing — patches 0002…0006 build on each other and only apply cleanly cumulatively. The verification gate is a series replay, not a per-patch git apply --check (per ADR-0118 + CLAUDE.md §12 r14).
  • On upstream sync: zero interaction. Netflix/vmaf has no ffmpeg-patches/ tree; this is a fork-local integration surface.
  • Re-test on rebase (also: re-replay procedure for the next refresh):
# Clone pristine n8.1
git -C /tmp clone --depth 1 --branch n8.1 \
  https://github.com/FFmpeg/FFmpeg.git ff-replay-$(date +%F)
cd /tmp/ff-replay-$(date +%F)
git switch -c refresh-$(date +%F)
git config user.email refresh@local && git config user.name "Refresh Bot"

# Replay the series cumulatively
for p in /path/to/vmaf/ffmpeg-patches/000*-*.patch; do
  git am --3way "$p" || break
done

# Regenerate and compare to in-tree
mkdir -p /tmp/ff-regen-$(date +%F)
git format-patch n8.1.. -o /tmp/ff-regen-$(date +%F)/

# Diff old vs new excluding pure format-patch noise
for i in 1 2 3 4 5 6; do
  orig=$(ls /path/to/vmaf/ffmpeg-patches/000${i}-*.patch)
  regen=$(ls /tmp/ff-regen-$(date +%F)/000${i}-*.patch)
  diff -u \
    <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$orig") \
    <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$regen") \
    | head -40
done

If only stylistic diffs surface (PATCH N/M numbering, MIME headers, hunk-context counts, hunk offset shifts against cumulative state), keep originals — record a no-drift refresh ADR. If real content drift surfaces, regenerate and ship the refresh PR with the regenerated patches plus a content-summary ADR.
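
The noise-filtered comparison from the shell loop above can also be sketched in Python. It strips the same three line prefixes the grep invocation excludes and diffs what remains; the function names are hypothetical, and the other stylistic differences listed above (PATCH N/M numbering, hunk offsets) still need judging by eye.

```python
import difflib
import re

# Same prefixes the shell loop filters out with grep -v.
_NOISE = re.compile(r"^(From [0-9a-f]|Date:|index )")


def content_lines(patch_text: str):
    """Drop format-patch noise lines, keep everything else in order."""
    return [line for line in patch_text.splitlines() if not _NOISE.match(line)]


def content_drift(orig_text: str, regen_text: str):
    """Unified diff of two patches after noise removal; empty list means no drift."""
    return list(
        difflib.unified_diff(
            content_lines(orig_text),
            content_lines(regen_text),
            fromfile="orig",
            tofile="regen",
            lineterm="",
        )
    )
```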

End-to-end vf_libvmaf smoke is best run from CI (ffmpeg-integration.yml) against an installed libvmaf prefix — the meson-uninstalled .pc does not satisfy FFmpeg's #include <libvmaf.h> probe (the headers live under libvmaf/libvmaf.h only; the system-installed .pc carries an extra -I${includedir}/libvmaf shortcut that the uninstalled .pc omits).

0229 — T7-5 NOLINT-sweep closeout (ADR-0278)

  • Touched files:
  • core/src/feature/integer_adm.c (1 NOLINT cite, line ~988 adm_decouple_s123 — upstream-mirror Netflix 966be8d5).
  • core/src/feature/cuda/ssimulacra2_cuda.c (3 NOLINT cites: ss2c_picture_to_linear_rgb, ss2c_host_combine, ss2c_run_scale_gpu / extract_fex_cuda).
  • core/src/feature/vulkan/ssimulacra2_vulkan.c (3 NOLINT cites: ss2v_setup_gaussian, ss2v_picture_to_linear_rgb, ss2v_run_scale).
  • core/src/feature/vulkan/cambi_vulkan.c (1 NOLINT cite: cambi_vk_extract).
  • core/src/feature/sycl/integer_adm_sycl.cpp (6 cites, SYCL kernel-launch entries).
  • core/src/feature/sycl/integer_motion_sycl.cpp (2 cites).
  • core/src/feature/sycl/integer_vif_sycl.cpp (4 cites).
  • core/tools/vmaf.c (3 cites: copy_picture_data, init_gpu_backends, main).
  • Invariant: zero behavioural change. Edits are inside comment blocks — appended (ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278) to existing prose justifications. No function bodies split. The 12 SYCL sites share an identical justification string verbatim; preserving the byte-for-byte duplicate is the load-bearing documentation pattern (grep-able across the SYCL TUs).
  • On upstream sync: minimal interaction. The cite-only edits live inside comment blocks above the function signatures; rebases will surface them as touched lines but the function bodies are unchanged. For integer_adm.c's upstream-mirror block (Netflix 966be8d5), the comment edit at line 984–991 is cosmetic — keep the fork's version on conflict (it merely names the ADR; the underlying prose is unchanged).
  • Re-test on rebase:

# 1. Programmatic audit must report 0 missing citations
python3 - <<'PY'
import re, os
paths = [os.path.join(r, f) for r, _, fs in os.walk('libvmaf/src')
         for f in fs if f.endswith(('.c','.cpp','.h'))]
paths.append('core/tools/vmaf.c')
miss = total = 0
for p in paths:
    with open(p) as fh:
        ls = fh.readlines()
    for i, line in enumerate(ls):
        if 'NOLINT' in line and 'readability-function-size' in line and 'NOLINTEND' not in line:
            total += 1
            # Walk back over the contiguous comment block above the NOLINT site.
            ctx = [line]; j = i - 1
            while j >= 0 and j > i - 14:
                s = ls[j].strip()
                if not s: break
                if s.startswith(('//','/*','*')):
                    ctx.insert(0, ls[j]); j -= 1
                else: break
            buf = ''.join(ctx)
            if 'ADR-' not in buf and not re.search(r'[Rr]esearch-?\d', buf):
                miss += 1
print(f"sites={total} missing={miss}")
PY

# 2. Build + Netflix golden gate
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
make test-netflix-golden

0231 — vmaf-tune score path decodes mp4 -> raw YUV

  • Touches: tools/vmaf-tune/src/vmaftune/score.py (new _decode_to_raw_yuv + _needs_decode helpers, run_score shells out to ffmpeg when req.distorted.suffix not in {.yuv, .y4m}); tools/vmaf-tune/tests/test_corpus.py (3 new regression tests + the smoke-end-to-end mock now also stubs the ffmpeg decode call).
  • Invariant: the decode-back is the contract the libvmaf CLI imposes — mp4/webm/etc. --distorted is silently rejected as raw-yuv with the wrong byte count, surfacing as exit_status=234. Future encoder adapters that emit non-raw containers inherit this decode automatically. Do not "optimise" the temp YUV away without first migrating the corpus pipeline to the ffmpeg+libvmaf filter (which can pipe an mp4 stream in directly).
  • On upstream sync: zero interaction. vmaf-tune is fork-only tooling; upstream Netflix/vmaf has no analogue.
  • Re-test on rebase:

cd tools/vmaf-tune && python3 -m pytest tests/
# plus an end-to-end smoke (needs a real raw YUV + ffmpeg + vmaf):
./vmaf-tune corpus --source /path/to/ref.yuv --width 1920 \
  --height 1080 --pix-fmt yuv420p --framerate 25 --duration 6 \
  --encoder libx264 --preset medium --crf 23 \
  --output /tmp/smoke.jsonl --no-source-hash
# expect: vmaf_score is a real number, not NaN.
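
The decode-back step described in the invariant can be sketched as a suffix check plus an ffmpeg command line. This is an illustration only: needs_decode and decode_command are hypothetical stand-ins for the private helpers in score.py, and the ffmpeg flags shown (-y, -i, -f rawvideo, -pix_fmt) are one plausible way to produce raw YUV, not necessarily the ones the tool uses.

```python
from pathlib import Path

RAW_SUFFIXES = {".yuv", ".y4m"}  # inputs the vmaf CLI can read directly


def needs_decode(distorted: Path) -> bool:
    """True when the distorted file is a container that must be decoded first."""
    return distorted.suffix not in RAW_SUFFIXES


def decode_command(src: Path, dst: Path, pix_fmt: str = "yuv420p") -> list[str]:
    """Build an ffmpeg argv that decodes src into a raw video file at dst."""
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-f", "rawvideo",
        "-pix_fmt", pix_fmt,
        str(dst),
    ]
```

Passing the argv as a list to subprocess.run avoids shell quoting problems with paths that contain spaces.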

0232 — CUDA build pins nvcc --std c++20

  • Touches: core/src/meson.build line 686 (cuda_flags = [...]).
  • Invariant: nvcc 12.x clamps host C++ at C++17 by default; 13.x accepts up to C++20. Bumping the host stdlib past nvcc's default (any gcc >= 16, libstdc++ ships C++23 features) breaks the host-side parse in <type_traits> / <bits/utility.h>. Forcing --std c++20 on CUDA 13+ keeps the host headers parseable. Do not drop this flag without first checking the host gcc version against nvcc's default.
  • On upstream sync: zero interaction. Netflix/vmaf doesn't ship the cuda_flags list shape we use (their CUDA build is the original pre-fork pattern); a sync that touches core/src/meson.build around the is_cuda_enabled branch should keep the --std c++20 injection.
  • Re-test on rebase:
meson setup core/build-cuda -Denable_cuda=true \
    -Denable_sycl=false -Denable_vulkan=disabled
ninja -C core/build-cuda
# smoke
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
    -r .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -d .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -w 1920 -h 1080 -p 420 -b 8

0233 — CUDA motion flush_fex_cuda idempotency guard

  • Touches: core/src/feature/cuda/integer_motion_cuda.c — factored an append_if_unwritten helper and routed the two motion2 / motion3 final-frame writes through it.
  • Invariant: under T-GPU-OPT-1 (PR #312 / ADR-0242), the pending-collect inside flush_context_cuda may already have written motion2_score[s->index] / motion3_score[s->index] before flush_fex_cuda runs. Any future motion-cuda flush logic that emits the same (feature, index) pair must keep this idempotency contract or flush_context_cuda will mis-surface as "context could not be synchronized".
  • On upstream sync: the bug only exists because the fork's flush_context_cuda runs the pending-collect before the per-extractor flush. Netflix/vmaf upstream doesn't have the T-GPU-OPT-1 drain pattern, so the pre-#312 code path didn't duplicate-write. If Netflix lands a similar pattern, the fix shape mirrors what's done here.
  • Re-test on rebase:
ninja -C core/build-cuda
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 \
  --model path=model/vmaf_v0.6.1.json --threads 1 -q \
  --output /tmp/cuda.json --json
# Expect: clean run, no "cannot be overwritten" warning,
# no "problem flushing context" error.
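
The idempotency contract can be modelled in a few lines of Python. This is a behavioural sketch of the guard, not a port of the C helper: FeatureCollector is a hypothetical stand-in for the feature collector, and it assumes a plain append rejects a second write to the same (feature, index) pair.

```python
class FeatureCollector:
    """Toy collector that refuses to overwrite an existing score."""

    def __init__(self):
        self._scores = {}

    def has(self, feature: str, index: int) -> bool:
        return (feature, index) in self._scores

    def get(self, feature: str, index: int) -> float:
        return self._scores[(feature, index)]

    def append(self, feature: str, index: int, value: float) -> None:
        if self.has(feature, index):
            raise ValueError(f"{feature}[{index}] cannot be overwritten")
        self._scores[(feature, index)] = value


def append_if_unwritten(collector, feature, index, value) -> bool:
    """Write only when the slot is empty; return whether a write happened."""
    if collector.has(feature, index):
        return False  # an earlier pending-collect already wrote this slot
    collector.append(feature, index, value)
    return True
```

The second caller becomes a no-op instead of an error, which is what lets the flush run cleanly after the pending-collect has already emitted the final frame.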

0234 — hw_encoder_corpus.py Phase A real-corpus runner

  • Touches: new scripts/dev/hw_encoder_corpus.py (no existing caller; opt-in tooling) and the companion doc docs/development/intel-arc-vaapi-driver-priority.md. Output landing in runs/phase_a/ (stratified sample, 58 KiB) is gitignored — rerun the script to reproduce.
  • Invariant: the script's QSV path forces env['LIBVA_DRIVER_NAME']='iHD' (set by the calling shell, not inside the script) when targeting /dev/dri/renderD129 on a multi-card host that has NVIDIA's libva-driver-nvidia shim installed. Without that, libva picks up NVIDIA's NVDEC-VAAPI translation and the MFX session handshake fails with -9. See the companion doc for the failure mode + fix.
  • On upstream sync: zero interaction. The script lives under scripts/dev/ (fork-only); upstream Netflix/vmaf has no comparable Phase A corpus tooling.
  • Re-test on rebase:
python3 scripts/dev/hw_encoder_corpus.py \
  --vmaf-bin core/build-cuda/tools/vmaf \
  --source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
  --width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
  --encoder h264_nvenc --cq 25 \
  --out /tmp/smoke.jsonl
# Expect: 1 cell × ~150 frames, per-frame canonical-6 + vmaf,
# encoder=h264_nvenc, cq=25.

0235 — fr_regressor_v2 ENCODER_VOCAB v2 (hw codec extension)

  • Touches: ai/scripts/train_fr_regressor_v2.py — ENCODER_VOCAB gains 6 hw-codec entries (3 NVENC + 3 QSV); ENCODER_VOCAB_VERSION bumps 1 -> 2; PRESET_ORDINAL gains 6 sub-tables for p1..p7 (NVENC) and the libx264-aligned QSV preset family.
  • Invariant: vocab order is load-bearing — index of every entry is baked into trained model graphs as a one-hot column position. New entries MUST be appended (never inserted into the middle), and the unknown sentinel MUST stay last (UNKNOWN_ENCODER_INDEX = N - 1). Bumping ENCODER_VOCAB_VERSION signals that any v1-graph ONNX needs re-export against v2 before consuming v2 training rows.
  • On upstream sync: zero interaction. train_fr_regressor_v2.py is fork-only (Phase B prereq, ADR-0237 / ADR-0272).
  • Re-test on rebase: python3 ai/scripts/train_fr_regressor_v2.py --corpus <jsonl> --epochs 200 --no-export — expect PLCC > 0.95 on a multi-codec corpus.
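
The append-only rule can be sketched as a prefix check over the known entries, with the unknown sentinel pinned to the last slot. The vocab contents below are assumptions for illustration (the note gives only the counts, not the v1 list or the entry order), and both helper names are hypothetical.

```python
UNKNOWN = "unknown"

# Assumed contents; only "3 NVENC + 3 QSV appended" comes from the note.
ENCODER_VOCAB_V1 = ("libx264", UNKNOWN)
ENCODER_VOCAB_V2 = (
    "libx264",
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_qsv", "hevc_qsv", "av1_qsv",
    UNKNOWN,
)


def is_append_only_extension(old, new) -> bool:
    """True when new keeps every old index and the sentinel stays last."""
    if old[-1] != UNKNOWN or new[-1] != UNKNOWN:
        return False
    known_old, known_new = old[:-1], new[:-1]
    return known_new[: len(known_old)] == known_old and len(set(new)) == len(new)


def one_hot(encoder: str, vocab) -> list:
    """One-hot column for encoder; unseen names fall back to the sentinel slot."""
    index = vocab.index(encoder) if encoder in vocab else len(vocab) - 1
    return [1.0 if i == index else 0.0 for i in range(len(vocab))]
```

Note that "appended" here means inserted just before the sentinel: the sentinel's own index moves with N, which is one reason a version bump forces a re-export.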

0276 — vmaf_tiny_v5 corpus-expansion probe (ADR-0287) — defer

  • What changed: research-only addition. New scripts under ai/scripts/ (fetch_youtube_ugc_subset.py, extract_ugc_features.py, train_vmaf_tiny_v5.py, eval_loso_vmaf_tiny_v5.py), new ADR docs/adr/0276-*.md, new research digest docs/research/0057-*.md, and one CHANGELOG entry. No new ONNX artefact under model/tiny/, no registry change, no public C-API / CLI / meson_options change. The probe trained an architecturally identical mlp_small on a 5-corpus parquet (4-corpus + 27 000 UGC rows); the 1-σ ship gate did not clear (Δ PLCC = +0.00005), so the exporter that the prior agent had drafted (export_vmaf_tiny_v5.py) was discarded before the commit.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI corpus-expansion surface; nothing on the upstream side touches these files.
  • On upstream sync: zero interaction. The v5 surface lives entirely under ai/scripts/ + docs/adr/ + docs/research/, all of which are fork-introduced trees. The shipped v2 model (model/tiny/vmaf_tiny_v2.onnx) and its registry row are untouched.
  • Re-test on rebase:
# No code under test on rebase — purely research artefacts.
# If revisiting the corpus expansion, the reproducer is in the
# research digest:
python3 ai/scripts/fetch_youtube_ugc_subset.py \
    --out-dir .workingdir2/ugc/download \
    --n-stems 30 \
    --manifest .workingdir2/ugc/manifest.json
python3 ai/scripts/extract_ugc_features.py \
    --manifest .workingdir2/ugc/manifest.json \
    --yuv-dir .workingdir2/ugc/yuv \
    --vmaf-bin build-cpu/tools/vmaf \
    --out-parquet runs/full_features_ugc.parquet \
    --max-height 360 --max-frames 300 --threads 8
python3 ai/scripts/eval_loso_vmaf_tiny_v5.py \
    --parquet-base  runs/full_features_4corpus.parquet \
    --parquet-extra runs/full_features_ugc.parquet \
    --out-json      runs/vmaf_tiny_v5_loso_metrics.json

0227 — vmaf-tune Intel QSV codec adapters (ADR-0281)

  • What changed: fork-local additions under tools/vmaf-tune/src/vmaftune/codec_adapters/_qsv_common.py, h264_qsv.py, hevc_qsv.py, av1_qsv.py, plus registry rows in codec_adapters/__init__.py and a new test file tools/vmaf-tune/tests/test_codec_adapter_qsv.py. Doc updates: docs/usage/vmaf-tune.md (Hardware encoders section), docs/adr/0281-vmaf-tune-qsv-adapters.md, docs/research/0066-vmaf-tune-qsv-adapters.md, tools/vmaf-tune/AGENTS.md, CHANGELOG.md.
  • Upstream source: fork-local. tools/vmaf-tune/ is fork-introduced under ADR-0237; Netflix/vmaf has no corresponding tree.
  • On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths.
  • Invariant: the registry exposes exactly four codecs (av1_qsv, h264_qsv, hevc_qsv, libx264 — alphabetical), each adapter validates its (preset, quality) pair, and the QSV preset vocabulary is the seven x264-style names (veryslow…veryfast, no ultrafast / superfast). The encode pipeline (encode.py) remains x264-CRF-tied and will be widened in a separate PR — the QSV adapters are inert until then. Future codec families that share parameter shape (NVENC, AMF) follow the same _<family>_common.py + N thin adapters pattern.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/

0229 — vmaf-tune libvvenc + NN-VC codec adapter (ADR-0285)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/vvenc.py (new fork-only file), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry edit, fork-only), tools/vmaf-tune/tests/test_codec_adapter_vvenc.py (new), tools/vmaf-tune/tests/test_corpus.py (relaxes the known_codecs() == ("libx264",) assertion to "libx264" in known_codecs() since the registry now spans multiple codecs).
  • Invariant: the codec-adapter registry is fork-introduced (Phase A of ADR-0237) and lives entirely outside the upstream Netflix tree, so tools/vmaf-tune/ does not touch upstream paths. The only rebase-sensitive surface is the CORPUS_ROW_KEYS schema in src/vmaftune/__init__.py (per the Phase A invariant in tools/vmaf-tune/AGENTS.md); this PR adds the adapter without changing the schema.
  • Upstream interaction: none. tools/vmaf-tune/ is not in Netflix/vmaf upstream.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/
  • Status update 2026-05-09: the original nnvc_intra toggle was removed (it emitted a fabricated IntraNN key that does not exist in any released VVenC). Replaced with a curated 9-knob real-VVenC 1.14.0 tuning surface (PerceptQPA, InternalBitDepth, Tier, Tiles, MaxParallelFrames, RPR, SAO, ALF, CCALF). Defaults preserve the bit-exact Phase A grid baseline. adapter_version bumped to "2" so cache keys invalidate. See ADR-0285 §"Status update 2026-05-09". No rebase impact: fork-local file, no upstream-tree touch.

0228 — vmaf-tune Phase D scaffold (ADR-0276)

  • Touches: tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0276-vmaf-tune-phase-d-per-shot.md.
  • Invariant: scaffold-only. The module relies on a stable predicate signature (shot, target_vmaf, encoder) -> (crf, predicted_vmaf) that Phase B's bisect (PR #347) drops into later. Shot ranges are half-open [start_frame, end_frame) even though the C-side vmaf-perShot JSON/CSV sidecar uses an inclusive end_frame — normalisation happens at the parse boundary in _parse_per_shot_json / parse_per_shot_csv. vmaf-perShot schema lives in docs/usage/vmaf-perShot.md and is fork-local (ADR-0222), so upstream cannot drift it; the only rebase risk is fork-internal renames.
  • Upstream source: entirely fork-local. tools/vmaf-tune/ is fork-introduced (ADR-0237). Netflix/vmaf upstream has no encode-automation surface.
  • On upstream sync: zero interaction expected. No file in this PR overlaps an upstream-mirrored path.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q
python tools/vmaf-tune/vmaf-tune tune-per-shot --help
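
The normalisation at the parse boundary can be sketched as a one-step conversion from the sidecar's inclusive end_frame to the half-open range used internally. This is a minimal illustration: normalise_shots is a hypothetical name, and the dict layout assumes each sidecar entry exposes start_frame and end_frame as integers.

```python
def normalise_shots(inclusive_shots):
    """Convert inclusive [start_frame, end_frame] entries to half-open ranges."""
    ranges = []
    for shot in inclusive_shots:
        start, end_inclusive = shot["start_frame"], shot["end_frame"]
        if end_inclusive < start:
            raise ValueError(f"shot ends before it starts: {shot}")
        ranges.append((start, end_inclusive + 1))  # half-open [start, end)
    return ranges
```

With half-open ranges the shot length is simply end minus start, and adjacent shots share a boundary value instead of differing by one.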

0229 — vmaf-tune SVT-AV1 codec adapter (ADR-0278)

  • Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/svtav1.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions extended for the SVT-AV1 banner pattern), tools/vmaf-tune/src/vmaftune/corpus.py (optional ffmpeg_preset_token hook).
  • Invariant: PRESET_NAME_TO_INT is closed and order-stable; the integer values are baked into corpus rows that downstream fr_regressor_v2 (ADR-0235) trains on. Reordering or rewriting the table silently changes the integer SVT-AV1 receives. The codec key "libsvtav1" matches CODEC_VOCAB[2] in ai/src/vmaf_train/codec.py — keep them aligned on any rename.
  • Upstream source: fork-local. tools/vmaf-tune/ is a fork-introduced tree (see entry 0227 — Phase A scaffold). No Netflix/vmaf upstream interaction.
  • On upstream sync: zero interaction. Lives entirely under the fork-local tools/vmaf-tune/ tree.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -v

0230 — fr_regressor_v2 PROD ship (ADR-0352)

  • ADR: ADR-0352
  • Touches: model/tiny/fr_regressor_v2.onnx (binary, refreshed), model/tiny/fr_regressor_v2.json (sidecar, sha256 + metrics), model/tiny/registry.json (smoke flag flip, sha256 update), runs/phase_a/full_grid/per_frame_canonical6.jsonl (training corpus — fork-local artefact under runs/), companion docs.
  • Re-test recipe: see Research-0068 §Reproducer. Ship gate is LOSO PLCC ≥ 0.95 on the per-source folds; current run reports 0.9681 ± 0.0207.
  • Rebase invariant: the per-frame canonical-6 corpus must be rebuilt from runs/phase_a/{nvenc,qsv}_pf.jsonl (PR #392) before any retrain; do not re-train against the cell-only comprehensive.jsonl (it lacks the per-frame features and produces PLCC ≈ 0.7 — the smoke baseline).
  • No upstream interaction: fr_regressor_v2 is fork-local (ADR-0272).

0229 — vmaf-tune Phase E ladder generator (ADR-0295)

  • ADR: ADR-0295
  • Touches: entirely fork-local under tools/vmaf-tune/. New module tools/vmaf-tune/src/vmaftune/ladder.py, new test file tools/vmaf-tune/tests/test_ladder.py, two new subcommand blocks in tools/vmaf-tune/src/vmaftune/cli.py. No upstream-shared paths touched.
  • Invariant: vmaftune.ladder.convex_hull returns a strictly monotonic Pareto frontier (both bitrate and vmaf monotonically increasing); select_knees returns exactly min(n, len(hull)) rungs in ascending bitrate order; emit_manifest("hls") produces one #EXT-X-STREAM-INF per rung with monotonically-increasing BANDWIDTH= values. The default _default_sampler is intentionally NotImplementedError — production callers must inject a Phase B bisect-driven sampler. Phase B integration PR (gated on PR #347) swaps the default; the test suite continues to inject a synthetic stub.
  • Rebase impact: none — fork-local Python tool; upstream Netflix/vmaf does not ship a tools/vmaf-tune/ tree.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_ladder.py -v
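
The HLS half of the invariant above can be illustrated with a small sketch. This is not the code in ladder.py: the Rung record, the emit_hls_master name, and the placeholder variant URIs are all hypothetical, and the sketch enforces strictly increasing BANDWIDTH values, which may be stricter than what emit_manifest("hls") actually does.

```python
from typing import NamedTuple


class Rung(NamedTuple):
    """Hypothetical rung record; the real ladder.py type may differ."""
    width: int
    height: int
    bitrate_kbps: float
    vmaf: float


def emit_hls_master(rungs: list[Rung]) -> str:
    """Render an HLS master playlist with one variant entry per rung."""
    ordered = sorted(rungs, key=lambda r: r.bitrate_kbps)
    lines = ["#EXTM3U"]
    last_bandwidth = 0
    for rung in ordered:
        # HLS BANDWIDTH is expressed in bits per second.
        bandwidth = int(round(rung.bitrate_kbps * 1000))
        if bandwidth <= last_bandwidth:
            raise ValueError("BANDWIDTH values must be strictly increasing")
        last_bandwidth = bandwidth
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f"RESOLUTION={rung.width}x{rung.height}"
        )
        # Placeholder URI; a real manifest points at the variant playlist.
        lines.append(f"{rung.height}p.m3u8")
    return "\n".join(lines) + "\n"
```

Sorting inside the emitter keeps the output order independent of how the caller assembled the rungs, so a test can pin the BANDWIDTH sequence directly.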

0229 — fr_regressor_v2 probabilistic head scaffold (ADR-0279)

  • Touches:
  • ai/scripts/train_fr_regressor_v2_ensemble.py (new — fork-local).
  • ai/scripts/eval_probabilistic_proxy.py (new — fork-local).
  • model/tiny/fr_regressor_v2_ensemble_v1*.onnx, fr_regressor_v2_ensemble_v1.json (new artefacts; smoke probes).
  • model/tiny/registry.json — five new kind: "fr" rows (fr_regressor_v2_ensemble_v1_seed{0..4}); existing entries untouched.
  • ai/AGENTS.md — new "fr_regressor_v2_ensemble_v1 — probabilistic head" section pinning the per-member ONNX I/O contract, manifest-as-runtime-entry-point invariant, ensemble-size pin, confidence-rule one-of, codec-vocab parity, and smoke-artefact posture.
  • docs/ai/models/fr_regressor_v2_probabilistic.md (new model card).
  • docs/research/0067-fr-regressor-v2-probabilistic.md (new audit digest).
  • docs/adr/0279-fr-regressor-v2-probabilistic.md (new ADR; Proposed). Index row appended to docs/adr/README.md.
  • CHANGELOG.md: ### Added row under "Unreleased — lusoris fork".
  • Invariant: the per-member ONNX I/O contract (two inputs: features [N, 6] standardised + codec_onehot [N, NUM_CODECS]; one output score [N]) and the manifest's confidence rule (one-of "ensemble" / "ensemble+conformal") are the C-side adapter's load-bearing contract. Per-member ensembles are stock FRRegressor(num_codecs=NUM_CODECS) calls — flipping to a v1-shaped single-input graph silently invalidates the manifest. CODEC_VOCAB parity with ai/src/vmaf_train/codec.py is required.
  • On upstream sync: zero interaction expected. Wholly fork-local; no upstream Netflix/vmaf path overlap. The ai/ package is fork-introduced (see ADR-0021, ADR-0036) — upstream has no probabilistic-regressor surface. If upstream ever ships its own fr_regressor_v2 variant, do NOT merge — register both ids side-by-side.
  • Re-test on rebase:
python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke
python ai/scripts/eval_probabilistic_proxy.py --smoke
python ai/scripts/validate_model_registry.py

0287 — vmaf-tune saliency-aware ROI tuning (ADR-0293)

  • Touches: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py (new recommend subcommand), tools/vmaf-tune/AGENTS.md (saliency invariant), docs/usage/vmaf-tune.md (saliency section).
  • Upstream source: fork-local. The vmaf-tune tree was introduced in PR #329 (ADR-0237 Phase A) and has no upstream Netflix counterpart.
  • On upstream sync: zero interaction — pure fork-local Python package under tools/vmaf-tune/.
  • Invariant: the saliency-to-QP-offset signal blend (offset = (2*sal − 1) * foreground_offset, clamped to ±12) is bit-for-bit equivalent to vmaf-roi's C-side blend (ADR-0247). tests/test_saliency.py pins the contract; if vmaf-roi's C blend changes, saliency.py follows in the same PR. The test seam contract (session_factory=…, encode_runner=…) lets the suite run without onnxruntime or ffmpeg.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/ -q
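
The blend formula quoted in the invariant can be written out as a few lines of Python. This is a sketch of the stated formula only, not a copy of saliency.py: the function name is hypothetical, the saliency value is assumed to lie in [0, 1], and the result is left as a float because the notes do not say whether the real code rounds it.

```python
def saliency_qp_offset(sal: float, foreground_offset: float,
                       clamp: float = 12.0) -> float:
    """offset = (2*sal - 1) * foreground_offset, clamped to +/-clamp."""
    raw = (2.0 * sal - 1.0) * foreground_offset
    return max(-clamp, min(clamp, raw))


if __name__ == "__main__":
    # sal = 0.5 is the neutral point: the offset is zero whatever the
    # foreground_offset is.
    print(saliency_qp_offset(0.5, -4.0))
    # Full saliency applies the whole foreground offset.
    print(saliency_qp_offset(1.0, -4.0))
    # Large offsets saturate at the clamp.
    print(saliency_qp_offset(0.0, -20.0))
```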

0229 — tools/vmaf-roi-score/ Option C scaffold (ADR-0296)

  • ADR: ADR-0296
  • Touches:
  • tools/vmaf-roi-score/pyproject.toml (new)
  • tools/vmaf-roi-score/vmaf-roi-score (new console shim)
  • tools/vmaf-roi-score/src/vmafroiscore/__init__.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/cli.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/score.py (new)
  • tools/vmaf-roi-score/src/vmafroiscore/mask.py (new)
  • tools/vmaf-roi-score/tests/test_combine.py (new)
  • tools/vmaf-roi-score/README.md (new)
  • tools/vmaf-roi-score/AGENTS.md (new)
  • docs/adr/0296-vmaf-roi-saliency-weighted.md (new)
  • docs/adr/_index_fragments/0296-vmaf-roi-saliency-weighted.md (new)
  • docs/adr/_index_fragments/_order.txt — append-only.
  • docs/research/0069-vmaf-roi-saliency-weighted.md (new)
  • docs/usage/vmaf-roi-score.md (new)
  • changelog.d/added/T6-2c-vmaf-roi-score-scaffold.md (new)
  • Invariant: tools/vmaf-roi-score/ is wholly fork-local. No upstream Netflix/vmaf surface owns or interacts with this directory. The combine math is a pure linear blend on Python float; the JSON schema is pinned by ROI_RESULT_KEYS and SCHEMA_VERSION = 1. Schema bumps require an ADR-0288 supersession. Naming guard: do not confuse with core/tools/vmaf_roi.c (ADR-0247) — that's the encoder-steering binary. The scoring tool here is vmaf-roi-score; the names diverge deliberately.
  • Rebase impact: zero. Pure-Python tool under tools/; not part of the libvmaf C build, not part of any Netflix-mirrored surface.
  • Re-test on rebase:
pytest tools/vmaf-roi-score/tests

0228 — vmaf-tune compare codec-comparison mode (research-0061 Bucket #7)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/compare.py (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  • tools/vmaf-tune/src/vmaftune/cli.py — adds the compare subparser and _run_compare router.
  • tools/vmaf-tune/tests/test_compare.py (new). Mocked predicate; no ffmpeg / vmaf binaries required.
  • tools/vmaf-tune/AGENTS.md — invariant note for the predicate seam and COMPARE_ROW_KEYS contract.
  • docs/usage/vmaf-tune.md — new "Codec comparison" section.
  • Invariant: compare.compare_codecs orchestrates per-codec ranking via an injected predicate(codec, src, target_vmaf) -> RecommendResult callable. The orchestration must not branch on codec name; new codecs land as one-file additions under codec_adapters/ and are picked up automatically by the registry. COMPARE_ROW_KEYS is the JSON / CSV column contract — same maintenance discipline as CORPUS_ROW_KEYS.
  • Rebase impact: entirely fork-local. The Phase A + Phase B recommend backend (ADR-0237) is fork-internal; upstream Netflix/vmaf has no tools/vmaf-tune/ tree.
  • Re-test on rebase:

pytest tools/vmaf-tune/tests/test_compare.py -v
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 --format markdown

0229 — vmaf-tune --score-backend GPU score wiring (ADR-0299)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/score_backend.py (new). Wholly fork-local — tools/vmaf-tune/ has no upstream Netflix/vmaf overlap.
  • tools/vmaf-tune/src/vmaftune/{score,corpus,cli}.py (additive kwargs, no API removals).
  • tools/vmaf-tune/tests/test_score_backend.py (new).
  • docs/usage/vmaf-tune.md (new GPU section + flag row).
  • docs/adr/0299-vmaf-tune-gpu-score.md (new).
  • docs/research/0071-vmaf-tune-gpu-score-backend.md (new).
  • Invariant: the libvmaf CLI exposes --backend NAME with values auto|cpu|cuda|sycl|vulkan exactly. Help-text parser in score_backend.parse_supported_backends pins this format. If upstream renames the flag or reformats the help line on merge, the parser silently degrades to "CPU only" — the test fixtures in test_score_backend.py will catch the format change but only if re-run.
  • Upstream source: fork-local. Netflix upstream's CLI does not ship a --backend selector (CPU-only).
  • On upstream sync: zero interaction. vmaf-tune lives entirely in fork-introduced paths and consumes only the fork's --backend flag.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v
# If the libvmaf help text reformats, parse_supported_backends
# will return {"cpu"} on test_parse_full_backend_line_yields_all_four
# and the test fails loudly.
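
The silent-degradation failure mode described above is easier to see in code. The sketch below is a stand-in for score_backend.parse_supported_backends, not its actual implementation: the regular expression, the function name, and the help-line layout in the usage example are assumptions.

```python
import re

KNOWN_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")

# Matches "--backend NAME ... a|b|c": the flag, its metavar, then the
# first pipe-separated alternation on the same line.
_BACKEND_RE = re.compile(r"--backend\s+\S+.*?((?:\w+\|)+\w+)")


def parse_backends(help_text: str) -> set[str]:
    """Return the backends advertised by the help text, or {"cpu"}."""
    for line in help_text.splitlines():
        match = _BACKEND_RE.search(line)
        if match:
            found = {tok for tok in match.group(1).split("|")
                     if tok in KNOWN_BACKENDS}
            if found:
                return found
    # Unknown or reformatted help text: degrade to CPU only.
    return {"cpu"}


if __name__ == "__main__":
    print(parse_backends("  --backend NAME   auto|cpu|cuda|sycl|vulkan"))
    print(parse_backends("  --backend NAME   one of: auto, cpu, cuda"))
```

The second call shows why the fixtures matter: a reformatted help line does not raise, it just reports CPU only, so only a pinned expected set catches the change.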

0261 — vmaf-tune HDR-aware encode + score path (2026-05-03)

  • What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/hdr.py plus wiring into corpus.py / cli.py / score.py. Adds ffprobe-driven HDR detection, codec-specific HDR ffmpeg flag dispatch, schema-v2 corpus row keys (hdr_transfer, hdr_primaries, hdr_forced), and four --auto-hdr / --force-* CLI modes. See ADR-0300.
  • Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
  • On upstream sync: zero interaction. Upstream Netflix/vmaf ships no encode automation surface; this tree is entirely fork-local and lives outside libvmaf/ and python/.
  • Schema migration note: SCHEMA_VERSION bumped 1 → 2. The three new keys are additive — Phase B / C loaders treat missing keys as SDR for backward compat with v1 rows.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -q
python -m vmaftune.cli corpus --help  # confirm --auto-hdr surfaces

HP-2 — vmaf-tune HDR iter_rows integration (2026-05-08)

  • What changed: fork-local. tools/vmaf-tune/src/vmaftune/corpus.py now imports vmaftune.hdr and wires detect_hdr / hdr_codec_args / select_hdr_vmaf_model into the per-source encode + score loop. The 0300 PR landed hdr.py and the four CLI flags but never imported the module — PQ sources silently encoded as SDR. Schema bumps v2 → v3 because the originally-promised hdr_transfer / hdr_primaries / hdr_forced row columns finally land. See ADR-0300 § Status update 2026-05-08.
  • Upstream source: zero. Fork-only.
  • On upstream sync: zero interaction. tools/vmaf-tune/ is fork-introduced.
  • Schema migration note: SCHEMA_VERSION 2 → 3 (additive). The three HDR keys default to "" / "" / False for SDR rows; Phase B / C loaders that ignore unknown keys keep working against v3 rows.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_hdr.py -q
python -m pytest tools/vmaf-tune/tests/test_corpus.py::test_corpus_row_keys_match_init_contract -q
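
The additive-migration rule (missing HDR keys mean SDR) can be sketched as a small row normaliser. The three key names and their defaults come from the notes above; the helper names and the is_sdr rule are hypothetical and only illustrate how a loader might stay compatible with older rows.

```python
# Defaults for SDR rows, as stated in the schema migration note.
HDR_DEFAULTS = {"hdr_transfer": "", "hdr_primaries": "", "hdr_forced": False}


def with_hdr_defaults(row: dict) -> dict:
    """Return a copy of row with any missing HDR key filled in."""
    out = dict(HDR_DEFAULTS)
    out.update(row)  # keys present in the row win over the defaults
    return out


def is_sdr(row: dict) -> bool:
    """Assumed rule: no transfer tag and no forced flag means SDR."""
    full = with_hdr_defaults(row)
    return full["hdr_transfer"] == "" and not full["hdr_forced"]


if __name__ == "__main__":
    old_row = {"encoder": "libx264", "crf": 23}
    print(with_hdr_defaults(old_row))
    print(is_sdr(old_row))
```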

0298 — vmaf-tune content-addressed cache (ADR-0298)

  • What changed: fork-local. New module tools/vmaf-tune/src/vmaftune/cache.py; cache integration in tools/vmaf-tune/src/vmaftune/corpus.py (iter_rows now consults the cache before encode/score); new CLI flags --no-cache, --cache-dir, --cache-size-gb in cli.py. Codec-adapter Protocol gains adapter_version: str; the lone Phase-A x264 adapter pins "1".
  • Upstream source: none. tools/vmaf-tune/ is fork-introduced (ADR-0237) and has no upstream counterpart.
  • On upstream sync: zero interaction with Netflix/vmaf master. The module sits entirely under tools/vmaf-tune/, which upstream does not ship.
  • Invariant for future codec adapters: every CodecAdapter must declare adapter_version: str. Bump it whenever the adapter's argv shape, preset list, or quality range changes — otherwise the cache returns stale results post-upgrade. The contract is asserted by test_cache_key_diffs_on_each_field in tests/test_cache.py.
  • Re-test on rebase:

pytest tools/vmaf-tune/tests/test_cache.py -v

0283 — vmaf-tune Apple VideoToolbox adapters (2026-05-05)

  • What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/codec_adapters/. New files: h264_videotoolbox.py, hevc_videotoolbox.py, _videotoolbox_common.py, plus the registry hook in __init__.py. See ADR-0283.
  • Update 2026-05-09: prores_videotoolbox.py adapter added to the same registry pattern (broadcast / prosumer ProRes intermediate). Quality knob differs — ProRes is a fixed-rate codec, so the harness's --crf slot carries the integer ProRes tier id (0=proxy → 5=xq) rather than a -q:v value. _videotoolbox_common.py extended with PRORES_PROFILE_* constants + validate_prores_videotoolbox() / prores_profile_name() helpers; profile ids verified against FFmpeg n8.1.1 libavcodec/videotoolboxenc.c. See the Status update appendix in ADR-0283.
  • Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
  • On upstream sync: zero interaction.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py -q
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_prores_videotoolbox.py -q

0228 — vmaf-tune coarse-to-fine CRF search (ADR-0306)

  • What changed: fork-local tooling. Adds coarse_to_fine_search() to tools/vmaf-tune/src/vmaftune/corpus.py, plumbs new CLI flags onto vmaf-tune corpus (--coarse-to-fine, --coarse-step, --fine-radius, --fine-step, --target-vmaf), and ships a new vmaf-tune recommend subcommand. Widens tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py quality_range from (15, 40) to (0, 51). JSONL row schema unchanged (SCHEMA_VERSION=1).
  • Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation surface.
  • On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_corpus.py -k coarse_to_fine
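
A coarse-to-fine sweep of this shape can be sketched as follows. This is an illustration, not coarse_to_fine_search() itself: the signature, the default step sizes, and the rule for choosing the refinement centre (the coarse CRF whose score is closest to the target) are assumptions; only the (0, 51) range and the flag names come from the notes.

```python
from typing import Callable


def coarse_to_fine(score_at: Callable[[int], float], target_vmaf: float,
                   lo: int = 0, hi: int = 51, coarse_step: int = 10,
                   fine_radius: int = 5, fine_step: int = 1) -> dict[int, float]:
    """Probe a coarse CRF grid, then refine around the most promising point.

    Returns every (crf -> score) pair that was evaluated.
    """
    visited: dict[int, float] = {}

    def probe(crf: int) -> None:
        if crf not in visited:  # never encode the same CRF twice
            visited[crf] = score_at(crf)

    for crf in range(lo, hi + 1, coarse_step):
        probe(crf)

    # Assumed selection rule: refine around the coarse point nearest target.
    centre = min(visited, key=lambda c: abs(visited[c] - target_vmaf))

    fine_lo = max(lo, centre - fine_radius)
    fine_hi = min(hi, centre + fine_radius)
    for crf in range(fine_lo, fine_hi + 1, fine_step):
        probe(crf)
    return visited


if __name__ == "__main__":
    # Synthetic stand-in for an encode + score round trip.
    points = coarse_to_fine(lambda crf: 100.0 - crf, target_vmaf=92.0)
    print(len(points), "encodes instead of", 52)
```

With the synthetic scorer the sweep costs 16 evaluations rather than the 52 a full (0, 51) grid would need, which is the point of the two-pass design.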

0314 — vmaf-tune --score-backend=vulkan (ADR-0314)

  • Touches:
  • tools/vmaf-tune/src/vmaftune/cli.py (additive argparse flag on corpus + recommend subparsers; resolves select_backend and catches BackendUnavailableError for clean exit-2).
  • tools/vmaf-tune/src/vmaftune/score.py (additive backend kwarg on build_vmaf_command and run_score; None = no flag emitted).
  • tools/vmaf-tune/src/vmaftune/corpus.py (new CorpusOptions.score_backend field, default None; forwarded into run_score).
  • tools/vmaf-tune/tests/test_score_backend.py (additive Vulkan-specific tests; pre-existing tests now pass after the backend= kwarg lands).
  • docs/adr/0314-vmaf-tune-score-backend-vulkan.md (new).
  • docs/usage/vmaf-tune.md (new "Vulkan score backend" subsection under the existing GPU-scoring section).
  • tools/vmaf-tune/AGENTS.md (invariant note: argparse choices stay in sync with libvmaf --backend vocabulary).
  • changelog.d/added/vmaf-tune-score-backend-vulkan.md (new).
  • Invariant: score_backend.ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan") is the exact set libvmaf's core/tools/cli_parse.c --backend alternation accepts. Adding a new harness-side value without the libvmaf-side wiring produces silent strict-mode failures on hosts that probe positively for it.
  • Upstream source: zero. Netflix upstream's CLI does not ship a --backend selector; both tools/vmaf-tune/ and core/src/vulkan/ are fork-introduced.
  • On upstream sync: zero interaction. No upstream-mirror file is touched.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v -k vulkan
pytest tools/vmaf-tune/tests/test_score_backend.py -v

Failures here usually indicate the libvmaf help-text format changed; score_backend.parse_supported_backends test fixtures pin the format and will fail loudly.

0303 — fr_regressor_v2 ensemble prod flip (ADR-0303)

  • ADR: ADR-0303
  • Touches: entirely fork-local.
  • ai/scripts/train_fr_regressor_v2_ensemble_loso.py (new — 9-fold LOSO trainer over the five ensemble seeds; emits loso_seed{N}.json artefacts).
  • scripts/ci/ensemble_prod_gate.py (new — reads five loso_seed{N}.json files, returns exit 0 iff mean(PLCC_i) ≥ 0.95 AND max - min ≤ 0.005).
  • ai/AGENTS.md — appended "Ensemble registry invariant" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
  • docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md (new), docs/research/0075-fr-regressor-v2-ensemble-prod-flip.md (new), changelog.d/added/fr-regressor-v2-ensemble-prod-flip.md (new).
  • Rebase invariant: the production ship gate is two-part: mean_i(PLCC_i) ≥ 0.95 AND max_i(PLCC_i) - min_i(PLCC_i) ≤ 0.005 over five seeds. The variance bound is load-bearing: removing it silently allows a one-seed-wins-four-seeds-tie configuration that invalidates the ensemble's predictive-distribution semantics. Both thresholds live in scripts/ci/ensemble_prod_gate.py; do not weaken either without superseding ADR-0303.
  • Rebase invariant (registry): the five fr_regressor_v2_ensemble_v1_seed{0..4} registry rows are smoke: true on master at this commit; flipping them to false is the follow-up flip PR's job, gated on a real-corpus LOSO run + the CI gate. Do not flip seed rows during a rebase merge conflict resolution.
  • Re-test on rebase:
python3 -c "import ast; ast.parse(open('ai/scripts/train_fr_regressor_v2_ensemble_loso.py').read())"
python3 -c "import ast; ast.parse(open('scripts/ci/ensemble_prod_gate.py').read())"
python ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
python scripts/ci/ensemble_prod_gate.py --help
  • Upstream source: zero. fr_regressor_v2 and its ensemble are fork-introduced (parent ADR-0272 / ADR-0279).
  • On upstream sync: zero interaction.
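
The two-part gate reduces to a few lines. The sketch below is not scripts/ci/ensemble_prod_gate.py: the function name is hypothetical and it takes the five PLCC values directly instead of reading the loso_seed{N}.json files, whose layout is not described here. The two thresholds are the ones stated above.

```python
from statistics import fmean


def ensemble_gate(plcc_by_seed: list[float], mean_floor: float = 0.95,
                  max_spread: float = 0.005) -> bool:
    """True iff mean(PLCC) >= mean_floor AND max - min <= max_spread."""
    if len(plcc_by_seed) != 5:
        raise ValueError("the gate is defined over exactly five seeds")
    spread = max(plcc_by_seed) - min(plcc_by_seed)
    return fmean(plcc_by_seed) >= mean_floor and spread <= max_spread


if __name__ == "__main__":
    # Passes both parts.
    print(ensemble_gate([0.960, 0.961, 0.962, 0.963, 0.964]))
    # Mean clears 0.95, but one seed is far ahead of the other four.
    print(ensemble_gate([0.990, 0.950, 0.950, 0.950, 0.950]))
```

The second case is the configuration the spread bound exists to reject: the mean alone would let it through.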

0313 — CI required-checks aggregator (2026-05-05)

  • What changed: fork-local CI policy. New .github/workflows/required-aggregator.yml: a single workflow that runs on every non-draft PR and verifies that each of the 23 named required checks either reported success/skipped/neutral or did not appear at all (the path-filter-rejection case). The aggregator becomes the single branch-protection required check, replacing the 23-name list from ADR-0037.
  • Touches: .github/workflows/required-aggregator.yml (new), docs/adr/0313-ci-required-checks-aggregator.md (new), changelog.d/added/ci-required-checks-aggregator.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment file).
  • Upstream source: zero. Branch-protection policy is fork-only.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Manual operator step at adoption (uses PATCH, not PUT — corrected from the original ADR-0313 body which had the wrong verb):
echo '{"strict": false, "contexts": ["Required Checks Aggregator"]}' | \
  gh api -X PATCH "repos/VMAFx/vmafx/branches/master/protection/required_status_checks" --input -
  • Re-test on rebase:
# YAML parses cleanly (syntax check only)
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/required-aggregator.yml'))"
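
The aggregator's verdict rule can be modelled in a few lines of Python. This is a sketch of the semantics described above, not the workflow's actual script: the function name and input shape are hypothetical, and treating every other conclusion (including a still-pending check) as blocking is an assumption.

```python
PASSING = {"success", "skipped", "neutral"}


def blocking_checks(required: list[str],
                    conclusions: dict[str, str]) -> list[str]:
    """Return the required checks that should fail the aggregator.

    A required check that never appeared is treated as passing, which
    mirrors the path-filter-rejection semantics.
    """
    return [
        name for name in required
        if name in conclusions and conclusions[name] not in PASSING
    ]


if __name__ == "__main__":
    required = ["build", "lint", "docs"]
    reported = {"build": "success", "lint": "failure"}
    # "docs" was filtered out by a path filter and never reported.
    print(blocking_checks(required, reported))
```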

0305 — encoder knob-space Pareto analysis (2026-05-05)

  • What changed: fork-local. New analysis scaffold for the 12,636-cell encoder knob sweep that backs tools/vmaf-tune/codec_adapters/* recipe defaults. New files: ai/scripts/analyze_knob_sweep.py (per-(source, codec, rc_mode) Pareto hull on (bitrate_kbps, vmaf_score), encode_time_ms tiebreaker, regression-detection check), ai/tests/test_knob_sweep_analysis.py (synthetic 20-row JSONL fixture). Methodology + scaffolded findings: see ADR-0305 + Research-0077. Companion to Research-0063.
  • Touches: none upstream-shared. Sits entirely under ai/ (fork-local since the tiny-AI training surface, ADR-0021) and docs/{adr,research}/ (fork ledger).
  • Upstream source: zero. The 12,636-cell sweep, the Pareto scaffold, and the regression-detection invariant are fork-introduced; Netflix/vmaf master ships no encoder knob-sweep tooling.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Invariant for future codec adapter PRs: per the ai/AGENTS.md knob-sweep corpus invariant (ADR-0305), recipes that regress vs the bare encoder at matched bitrate within the same (source, codec, rc_mode) slice MUST NOT ship as adapter defaults. New adapter PRs cite the per-slice hull row from reports/summary.md (or "no hull entry yet — bare default") in their PR description. The comprehensive.jsonl sweep file is generated locally and lives under runs/phase_a/full_grid/ (gitignored — never committed).
  • Re-test on rebase:
pytest ai/tests/test_knob_sweep_analysis.py -v
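
The per-slice frontier can be sketched as below. This is not analyze_knob_sweep.py: the function name and the "source" row key are assumptions, and the sketch computes the non-dominated (Pareto) set rather than a geometric convex hull, which may differ from what the script calls the hull.

```python
from collections import defaultdict


def pareto_by_slice(rows: list[dict]) -> dict[tuple, list[dict]]:
    """Group rows by (source, codec, rc_mode) and keep the Pareto set."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        groups[(row["source"], row["codec"], row["rc_mode"])].append(row)

    frontiers: dict[tuple, list[dict]] = {}
    for key, members in groups.items():
        # Ascending bitrate; at equal bitrate prefer higher VMAF, then
        # the faster encode (the encode_time_ms tiebreaker).
        ordered = sorted(
            members,
            key=lambda r: (r["bitrate_kbps"], -r["vmaf_score"],
                           r["encode_time_ms"]),
        )
        frontier, best = [], float("-inf")
        for row in ordered:
            # Keep a row only if it beats every cheaper row on VMAF.
            if row["vmaf_score"] > best:
                frontier.append(row)
                best = row["vmaf_score"]
        frontiers[key] = frontier
    return frontiers
```

A recipe row that is absent from its slice's frontier is dominated by some other row at equal or lower bitrate, which is the kind of regression the invariant forbids shipping as a default.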

0302 — ENCODER_VOCAB v3 schema expansion (ADR-0302)

  • Touches: ai/scripts/train_fr_regressor_v2.py (adds an ENCODER_VOCAB_V3 parallel constant; does not modify the live ENCODER_VOCAB or ENCODER_VOCAB_VERSION).
  • Invariant: ENCODER_VOCAB is append-only and order-stable (per ADR-0235). The v3 scaffold preserves the v2 slot ordering verbatim — slots 0..12 are bit-identical to the v2 vocab; slots 13/14/15 append libsvtav1, h264_videotoolbox, hevc_videotoolbox. The live ENCODER_VOCAB_VERSION = 2 remains the source of truth until the follow-up retrain PR clears the LOSO PLCC ship gate.
  • Upstream interaction: zero. ai/scripts/train_fr_regressor_v2.py is fork-introduced (ADR-0272) and has no upstream counterpart.
  • Re-test on rebase:
python3 -c "
import importlib.util, pathlib
spec = importlib.util.spec_from_file_location(
    't', pathlib.Path('ai/scripts/train_fr_regressor_v2.py')
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
assert len(m.ENCODER_VOCAB_V3) == 16
assert m.ENCODER_VOCAB_VERSION == 2
print('OK')
"

0304 — vmaf-tune fast-path prod wiring (ADR-0304)

  • Touches: tools/vmaf-tune/src/vmaftune/fast.py (replaces the ADR-0276 scaffold's NotImplementedError paths with concrete Optuna TPE + v2 proxy + GPU verify wiring); new module tools/vmaf-tune/src/vmaftune/proxy.py (centralised seam for fr_regressor_v2 ONNX inference); expanded tools/vmaf-tune/tests/test_fast.py. Doc-side: ADR-0304, Research-0076, tools/vmaf-tune/AGENTS.md invariant note.
  • Upstream source: zero. tools/vmaf-tune/ and model/tiny/fr_regressor_v2.onnx are both fork-introduced (ADR-0237 / ADR-0352).
  • Invariant: the production proxy is always fr_regressor_v2 (no smoke models in the production path) and a single GPU verify pass at recommend-end is mandatory — proxy alone never wins. The vmaftune.proxy.run_proxy helper is the single seam every fast-path consumer goes through; future probabilistic-head / ensemble migrations land in that one module. ENCODER_VOCAB v2 one-hot ordering is frozen by ADR-0352 and pinned in proxy.ENCODER_VOCAB_V2 — keep in sync with ai/scripts/train_fr_regressor_v2.py; drift raises ProxyError at inference time before bad predictions ship.
  • On upstream sync: zero interaction with Netflix/vmaf master.
  • Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_fast.py -v
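
The vocab-drift guard can be illustrated with a short sketch. It is not the code in proxy.py: VocabDriftError stands in for the ProxyError named above, the helper names are hypothetical, and the three-entry vocabulary is a toy placeholder rather than the real 13-slot ENCODER_VOCAB_V2.

```python
class VocabDriftError(ValueError):
    """Stand-in for the ProxyError the notes describe."""


def check_vocab_prefix(frozen: tuple[str, ...], live: tuple[str, ...]) -> None:
    """Fail if the live vocab reorders or drops any frozen slot."""
    if tuple(live[:len(frozen)]) != tuple(frozen):
        raise VocabDriftError("live vocab no longer starts with the frozen slots")


def one_hot(encoder: str, vocab: tuple[str, ...]) -> list[float]:
    """One-hot row for encoder; raises ValueError for unknown names."""
    index = vocab.index(encoder)
    return [1.0 if i == index else 0.0 for i in range(len(vocab))]


if __name__ == "__main__":
    frozen = ("enc_a", "enc_b", "enc_c")   # toy placeholder vocab
    appended = frozen + ("enc_d",)         # append-only growth is fine
    check_vocab_prefix(frozen, appended)
    print(one_hot("enc_b", frozen))
```

Checking the prefix rather than full equality lets the training script append slots without tripping the guard, while any reorder still fails before inference runs.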

0307 — vmaf-tune ladder default sampler wiring (ADR-0307)

  • What changed: fork-local tooling. tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler no longer raises NotImplementedError; it composes corpus.iter_rows (Phase A encode + score) with recommend.pick_target_vmaf (smallest CRF clearing target VMAF) over DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38) at the adapter's mid-range preset. Module-level docstring + AGENTS.md invariant updated. New tests in tools/vmaf-tune/tests/test_ladder.py stub iter_rows via monkeypatch.setattr so no live ffmpeg / vmaf binaries are needed.
  • Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation / ladder surface.
  • On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
  • Rebase invariant: the 5-point sweep (18, 23, 28, 33, 38) is the load-bearing default; downstream Phase E callers size their wall-time budget against five encodes per (resolution, target_vmaf) cell. Do not widen / narrow it without an ADR-0307 follow-up. The SamplerFn seam stays open — callers needing finer grids pass an explicit sampler=.
  • Re-test on rebase:
pytest tools/vmaf-tune/tests/test_ladder.py -v

0309 — fr_regressor_v2 ensemble real-corpus retrain harness (ADR-0309)

  • ADR: ADR-0309
  • Touches: entirely fork-local.
  • ai/scripts/run_ensemble_v2_real_corpus_loso.sh (new — Bash wrapper that loops the five seeds over the existing train_fr_regressor_v2_ensemble_loso.py against .workingdir2/netflix/).
  • ai/scripts/validate_ensemble_seeds.py (new — calls the ADR-0303 gate and writes PROMOTE.json / HOLD.json with a corpus sha256 snapshot).
  • ai/tests/test_validate_ensemble_seeds.py (new — 7 tests, synthetic JSON fixtures for both verdict paths).
  • ai/AGENTS.md — appended "Registry-flip is a separate PR (ADR-0309)" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
  • docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md, docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (all new).
  • Rebase invariant: the harness is decoupled from the registry mutation. Neither the wrapper nor the validator touches model/tiny/registry.json; the registry flip is a separate follow-up PR gated on a passing PROMOTE.json. Auto-flipping on PROMOTE was rejected in ADR-0309's alternatives matrix specifically because rebase-time mutation of shipped registry rows is the foot-gun this invariant exists to prevent.
  • Re-test on rebase:
python -m pytest ai/tests/test_validate_ensemble_seeds.py -v
python ai/scripts/validate_ensemble_seeds.py --help
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
  • Upstream source: zero.
  • On upstream sync: zero interaction.

0310 — BVI-DVC corpus ingestion for fr_regressor_v2 (ADR-0310)

  • Touches: ai/scripts/bvi_dvc_to_corpus_jsonl.py (new fork-only adapter), ai/scripts/merge_corpora.py (new fork-only shard merger), ai/tests/test_merge_corpora.py (new), docs/ai/bvi-dvc-corpus-ingestion.md (new), docs/adr/0310-bvi-dvc-corpus-ingestion.md (new), docs/research/0082-bvi-dvc-corpus-feasibility.md (new), ai/AGENTS.md (BVI-DVC invariant note).
  • Invariant: the BVI-DVC archive and any extracted artefacts (parquet, cached libvmaf JSON, JSONL corpus shard) are research-only and stay local — only derived fr_regressor_v2_*.onnx weights ship. The merge utility validates every row against the canonical vmaftune.CORPUS_ROW_KEYS tuple; the schema is the merge contract. Re-shape here is a pure transform on the cached libvmaf JSON; no ffmpeg / vmaf binary is invoked. The (src_sha256, encoder, preset, crf) natural key is load-bearing for de-duplication across mirrors and re-encodes.
  • Upstream interaction: none. ai/ is fork-introduced; BVI-DVC is not part of Netflix/vmaf upstream.
  • Re-test on rebase:
python -m pytest ai/tests/test_merge_corpora.py -v
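
The merge contract (schema validation plus natural-key de-duplication) can be sketched as follows. This is not merge_corpora.py: the function name is hypothetical, the required-key tuple is passed in rather than imported from vmaftune, and keeping the first occurrence of a duplicate key is an assumption.

```python
from typing import Iterable

NATURAL_KEY = ("src_sha256", "encoder", "preset", "crf")


def merge_rows(shards: Iterable[Iterable[dict]],
               required_keys: tuple[str, ...]) -> list[dict]:
    """Validate every row, then de-duplicate on the natural key."""
    merged: dict[tuple, dict] = {}
    for shard in shards:
        for row in shard:
            missing = [k for k in required_keys if k not in row]
            if missing:
                raise ValueError(f"row is missing keys: {missing}")
            key = tuple(row[k] for k in NATURAL_KEY)
            # Assumed policy: the first occurrence wins.
            merged.setdefault(key, row)
    return list(merged.values())
```

Because the key includes src_sha256 rather than a file path, the same clip fetched from two mirrors collapses to one row.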

ADR-0312 — ffmpeg-patches/ vmaf-tune integration (2026-05-05)

  • Files: ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch, ffmpeg-patches/0008-add-libvmaf_tune-filter.patch, ffmpeg-patches/0009-pass-autotune-cli-glue.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
  • Rebase invariant: patches 0007–0009 plug into the cumulative state after patches 0001–0006 apply against pristine n8.1. Per-patch git apply --check in isolation is the wrong gate; use the series-replay command in CLAUDE.md §12 r14 instead.
  • vmaf-tune patch invariant: the qpfile parser at libavcodec/qpfile_parser.{c,h} is shared across all three encoder adapters in patch 0007. Future encoders that grow a -qpfile AVOption inherit it; do not fork the parser. When tools/vmaf-tune/src/vmaftune/saliency.py's qpfile output format changes (new column, different frame-type alphabet, …), patch 0007 must change in the same PR (CLAUDE.md §12 r14).
  • vf_libvmaf_tune full-scoring promotion (2026-05-06): patch 0008 originally shipped as a scaffold (linear CRF↔VMAF interpolation, no libvmaf scoring) per ADR-0312's deferred-alternatives column. The filter now mirrors vf_libvmaf.c's CPU framesync pipeline end-to-end (vmaf_init + vmaf_model_load + vmaf_use_features_from_model in init(); per-frame vmaf_picture_alloc + memcpy + vmaf_read_pictures; flush + vmaf_score_pooled(MEAN) in uninit()). The CRF recommendation remains a piece-wise linear projection from the observed VMAF; per-clip Optuna TPE search stays in tools/vmaf-tune/src/vmaftune/recommend.py. Rebase-side: the new filter still depends only on libvmaf's CPU C-API (vmaf_init, vmaf_model_load, vmaf_use_features_from_model, vmaf_read_pictures, vmaf_score_pooled, vmaf_close, vmaf_picture_alloc/unref); zero new symbols beyond what vf_libvmaf.c already requires, so future libvmaf rebases that pass the existing libvmaf filter pass this one too. ADR-0312 sub-decision retired.
  • n7+ API migration (2026-05-06): patch 0008 originally referenced the removed AVFilterLink::frame_rate member directly (n6-era API); in n7+ that field moved off AVFilterLink onto a new FilterLink struct accessed via ff_filter_link(AVFilterLink *) from libavfilter/filters.h. Patch 0008 now uses ff_filter_link(outlink)->frame_rate = ff_filter_link(mainlink)->frame_rate; in config_output(), mirroring patches 0005/0006 which were already written against the post-n7 API. The bug slipped through CI because the FFmpeg-Vulkan lane only builds vf_libvmaf.o, not vf_libvmaf_tune.c; the full SYCL lane catches it now that PR #415 added ffmpeg-patches/** to the integration workflow's path filter. Discovery: PR #415 / ADR-0317.
  • Upstream source: zero. The vmaf-tune integration is fork-introduced; pure upstream syncs are unaffected.
  • On upstream sync: zero interaction with libvmaf master. FFmpeg-side rebases (an n8.1 → n8.x bump of FFMPEG_SHA in ffmpeg-patches/test/build-and-run.sh) are tracked separately under each refresh ADR (e.g., ADR-0277 for the 2026-05-04 refresh).
  • Re-test on rebase:
git -C /path/to/ffmpeg-8 reset --hard n8.1
for p in ffmpeg-patches/000*-*.patch; do
    git -C /path/to/ffmpeg-8 am --3way "$p" || break
done
# Build smoke (libvmaf-disabled — patches 0001–0006 skipped if libvmaf_dnn
# is not built). With libvmaf_dnn available:
cd /path/to/ffmpeg-8 && ./configure --enable-libvmaf --enable-libx264 --enable-libsvtav1 --enable-libaom --enable-gpl
make -j$(nproc) ffmpeg
./ffmpeg -hide_banner -h encoder=libx264 2>&1 | grep -i qpfile
  • 2026-05-06 update — patch 0007 SVT-AV1 ROI bridge promoted from scaffold to full impl: the libsvtav1 hunk now sets enc_params.enable_roi_map = true, builds one SvtAv1RoiMapEvt per qpfile frame upfront in eb_enc_init (per-MB qp_offsets averaged into per-64×64-SB b64_seg_map of up to 8 segment QPs; uniform binning when the value span exceeds the segment budget), and attaches each event as a ROI_MAP_EVENT priv-data node from eb_send_frame() with node->size = sizeof(SvtAv1RoiMapEvt*) (the validation contract enforced by SVT-AV1's resource_coordination_process.c). Lifetime invariant: events + maps live for the entire encode session because SVT-AV1 reads ROI_MAP_EVENT data via shallow-copied pointers on async pipeline threads (per enc_handle.c::copy_private_data_list); eb_enc_close frees them. Wiring is gated on SVT_AV1_CHECK_VERSION(1, 6, 0); older SVT-AV1 builds keep the log-and-continue fallback. libaom remains scaffold-only — its AOME_SET_ROI_MAP bridge stays a separate follow-up. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).

  • 2026-05-06 update — patch 0007 libaom-av1 ROI bridge promoted from scaffold to full impl: the libaom-av1 hunk now caches the parsed VmafTuneQpFile in AOMContext, allocates a segment-id map at libaom's mode-info grid (ALIGN_POWER_OF_TWO(dim, 8) >> 2, since av1/common/enums.h::MI_SIZE == 4), and on every encoded frame picks up to 8 segment QPs from the per-frame qp_offset value range (uniform linear binning when the span exceeds AOM_MAX_SEGMENTS == 8), paints the per-mi segment map by expanding each per-16×16-MB qp_offset into a 4×4 block of mi cells, and issues aom_codec_control(&ctx->encoder, AOME_SET_ROI_MAP, &roi_map). Lifetime invariant: libaom deep-copies the segment map and delta_q[] table on every control call (per av1/encoder/encoder.c::av1_set_roi_map memcpy), so a single buffer is reused across frames and freed in aom_free(). The qpfile is also freed there. Trade-off: the 8-segment cap rounds nearby qp_offsets together when the saliency model emits more than 8 distinct values per frame; finer granularity requires vmaf-tune corpus instead. This retires the libaom-av1 deferral noted under ADR-0312 — both AV1 encoder hooks (libsvtav1 and libaom-av1) are now full-impl. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
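
The 8-segment trade-off can be illustrated in Python. This models the idea only and is not a port of the C in patch 0007: the function name is hypothetical, the switch to binning is keyed on the number of distinct offsets (a simplification of the span test described above), and the representative QP chosen per bin (the bin midpoint) is an assumption.

```python
def bin_qp_offsets(offsets: list[int],
                   max_segments: int = 8) -> tuple[list[int], list[int]]:
    """Map per-block QP offsets onto at most max_segments segment QPs.

    Returns (segment_qps, segment_id_per_block).
    """
    distinct = sorted(set(offsets))
    if len(distinct) <= max_segments:
        # Few enough values: every offset gets its own segment.
        index = {q: i for i, q in enumerate(distinct)}
        return distinct, [index[q] for q in offsets]

    # Otherwise split [lo, hi] into equal-width bins.
    lo, hi = distinct[0], distinct[-1]
    width = (hi - lo) / max_segments
    seg_ids = [min(int((q - lo) / width), max_segments - 1) for q in offsets]
    seg_qps = [round(lo + (i + 0.5) * width) for i in range(max_segments)]
    return seg_qps, seg_ids


if __name__ == "__main__":
    qps, ids = bin_qp_offsets(list(range(-8, 9)))
    # 17 distinct offsets collapse onto 8 segments, so neighbours merge.
    print(qps, ids)
```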

0315 — Vendor-neutral VVC encode strategy (ADR-0315 / Research-0085)

  • ADR: ADR-0315
  • Digest: Research-0085
  • Touches: docs-only.
  • docs/research/0085-vendor-neutral-vvc-encode-landscape.md (new).
  • docs/adr/0315-vendor-neutral-vvc-encode-strategy.md (new).
  • docs/adr/_index_fragments/0315-vendor-neutral-vvc-encode-strategy.md (new).
  • docs/adr/_index_fragments/_order.txt (one-line append).
  • changelog.d/added/research-0085-vendor-neutral-vvc-encode.md (new).
  • docs/rebase-notes.md (this entry).
  • Rebase invariant: none. The research digest and ADR are pure surveys with no code dependencies; nothing in the fork's source tree references them in a way that breaks on upstream rebase.
  • Upstream source: zero. VVC encode strategy is a fork-local decision; upstream Netflix/vmaf has no codec adapter or encode-automation surface.
  • On upstream sync: zero interaction. Pure docs.
  • Re-test on rebase:
mkdocs build --strict 2>&1 | grep -E "(WARNING|ERROR)" || echo "docs build clean"
  • 2026-05-06 follow-up (Research-0085 verification pass):
  • docs/research/0085-vendor-neutral-vvc-encode-landscape.md flipped from Status: SKELETON to Status: Active. Most [UNVERIFIED] claims are now backed by primary-source URLs (NVIDIA SDK 13.0 docs, AMD AMF GitHub, Intel oneVPL GitHub + mfxstructures.h + CHANGELOG.md, Khronos registry, Phoronix Mesa/RADV coverage, VVenC issue tracker, ZLUDA repo).
  • ADR-0315's ## Context and ## Alternatives considered refreshed with the verified data points. Status stays Proposed.
  • [UNVERIFIED] count in the digest dropped 25 → 10; remaining items are legitimate gaps (NN-VC quality lift, vvenc per-kernel profile, HHI's non-public roadmap).
  • No code touched. No rebase impact beyond the existing docs-only posture.

0316 — cli_parse.c error() long-only-option fix (ADR-0316)

  • ADR: ADR-0316 (follow-up to ADR-0311).
  • Digest: none — bug-fix; fix shape fits in the ADR/commit body.
  • Touches:
  • core/tools/cli_parse.c (3 lines — call-site arg change at the ARG_THREADS / ARG_SUBSAMPLE / ARG_CPUMASK handlers).
  • core/test/fuzz/fuzz_cli_parse.c (removed known_assert_in_input early-reject filter).
  • core/test/fuzz/cli_parse_corpus/cli_threads_abbrev_assert.argv (promoted from cli_parse_known_crashes/).
  • core/test/test_cli_parse_long_only_args.c (new fork()-based regression test).
  • core/test/meson.build (new test wiring, gated off Windows alongside test_y4m_411_oob).
  • core/tools/AGENTS.md (added a long-only-options invariant note next to the existing cli_parse.c rules).
  • Rebase invariant: load-bearing. cli_parse.c is upstream-mirror with fork additions; the three handlers carry the fork-local shape of passing the ARG_* enum value (not 't' / 's' / 'c') to parse_unsigned(). If an upstream sync re-introduces the original short-option char shape, the assert returns and the parked-then-promoted reproducer (cli_parse_corpus/cli_threads_abbrev_assert.argv) will surface it in the next nightly fuzz run.
  • Upstream source: the bug shape exists in Netflix/vmaf master too (long-only options were added upstream with the same short-option-char placeholder). When the fork ports an upstream fix that overlaps these handlers, prefer the parse_unsigned(optarg, ARG_*, argv[0]) form already on the fork.
  • On upstream sync: re-apply the three-line change in cli_parse.c if upstream resets the call-site args. The unit test is fork-local and stays.
  • Re-test on rebase:
meson setup core/build libvmaf -Denable_tests=true \
    -Denable_cuda=false -Denable_sycl=false
ninja -C core/build test/test_cli_parse_long_only_args
meson test -C core/build test_cli_parse_long_only_args -v

ADR-0317 — CI flake fix: doc-only PR path-filter (2026-05-06)

  • Touched files:
  • .github/workflows/docker-image.yml — added paths: filter on both push: and pull_request: triggers.
  • .github/workflows/ffmpeg-integration.yml — added paths: filter on both push: and pull_request: triggers (covers all four matrix lanes: gcc, clang, SYCL, Vulkan).
  • docs/adr/0317-ci-doc-only-pr-flake-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ci-doc-only-pr-flakes.md.
  • Rebase invariant: not load-bearing. Workflow-only change. Both files are fork-local CI; upstream Netflix/vmaf does not ship a Docker workflow or an FFmpeg-integration matrix in this shape, so rebase conflicts are unlikely. If a future upstream sync introduces an overlapping docker-image.yml or FFmpeg matrix, prefer the fork's path-filtered form — the rationale (ADR-0313 aggregator posture, doc-only-PR runner-time burn) is fork-specific.
  • Upstream source: none — fork-local CI workflows.
  • On upstream sync: no action required. If reviewers later add new build inputs (e.g. a top-level docker-compose.yml, a new ffmpeg-patches/*.txt config file), extend the paths: lists in the same PR that adds the input.
  • Follow-up not in this ADR: patch ffmpeg-patches/0008-add-libvmaf_tune-filter.patch line 256 (outlink->frame_rate = mainlink->frame_rate;) needs to migrate to the ff_filter_link() accessor introduced in FFmpeg n7+, matching the pattern already in patches 0005 / 0006. Tracked separately; the path-filter does not hide it (any libvmaf/ or ffmpeg-patches/ PR will still trip the SYCL lane).
  • Re-test on rebase:
python3 -c "import yaml; \
  yaml.safe_load(open('.github/workflows/docker-image.yml')); \
  yaml.safe_load(open('.github/workflows/ffmpeg-integration.yml')); \
  print('OK')"

0319 — fr_regressor_v2 ensemble LOSO trainer — real loader + per-fold training (ADR-0319)

  • Touches: ai/scripts/train_fr_regressor_v2_ensemble_loso.py (real _load_corpus + _train_one_seed bodies), ai/scripts/run_ensemble_v2_real_corpus_loso.sh (wrapper argv fix), docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (Step 0 corpus-generation section), ai/AGENTS.md (canonical-6 schema invariant note), ai/tests/test_train_fr_regressor_v2_ensemble_loso_*.py (loader + train schema tests). Closes the deferrals tracked in rebase-notes §0303 + §0309.
  • Upstream source: none — fork-local ML training infrastructure. Netflix/vmaf upstream has no fr_regressor_v2 surface, no LOSO trainer, and no canonical-6 corpus tooling.
  • Invariant: the trainer's _load_corpus accepts the canonical-6 JSONL schema emitted by scripts/dev/hw_encoder_corpus.py bit-for-bit — required keys per row are (src, encoder, cq, frame_index, vmaf, adm2, vif_scale0..3, motion2). Codec block layout is 12-slot ENCODER_VOCAB v2 one-hot + constant preset_norm = 0.5 + crf_norm = (cq - cq_min) / (cq_max - cq_min). Schema changes require an ENCODER_VOCAB_VERSION bump and full ensemble retrain per the existing closed-vocabulary rule (ADR-0235 / ADR-0352). Fold-level StandardScaler is fit on the training rows only; leaking the held-out source's distribution into the scaler would silently inflate per-fold PLCC.
  • On upstream sync: no action required. If upstream Netflix/vmaf ever adds a competing LOSO trainer under python/vmaf/, do NOT merge them — keep the fork's training stack under ai/ per the AGENTS.md scope rule.
  • Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v2_ensemble_loso_loader.py \
       ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py -v
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
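A minimal sketch of the codec-block layout and the fold-level scaling rule from the invariant above, using only the standard library. The vocabulary entries, helper names, and row shape are placeholders rather than the trainer's real identifiers; the actual trainer uses StandardScaler, which this sketch approximates with a population mean and standard deviation.

```python
from statistics import mean, pstdev

# Placeholder 12-slot vocabulary; the real ENCODER_VOCAB v2 names differ.
ENCODER_VOCAB = [f"encoder_{i}" for i in range(12)]
PRESET_NORM = 0.5  # constant, per the invariant above


def codec_block(encoder, cq, cq_min, cq_max):
    """12-slot one-hot + preset_norm + crf_norm (14 values in total)."""
    one_hot = [1.0 if name == encoder else 0.0 for name in ENCODER_VOCAB]
    crf_norm = (cq - cq_min) / (cq_max - cq_min)
    return one_hot + [PRESET_NORM, crf_norm]


def loso_folds(rows):
    """Yield (held_out_src, train_rows, test_rows), one fold per source."""
    for held_out in sorted({r["src"] for r in rows}):
        train = [r for r in rows if r["src"] != held_out]
        test = [r for r in rows if r["src"] == held_out]
        yield held_out, train, test


def fit_scaler(train_rows, key):
    """Fit mean/std on the training rows only, never on the held-out source."""
    values = [r[key] for r in train_rows]
    return mean(values), (pstdev(values) or 1.0)


def standardize(value, mu, sigma):
    return (value - mu) / sigma
```

Held-out rows are transformed with the statistics fitted on the training rows, so an outlier source cannot shift the scaler it is later evaluated against.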

ADR-0323 — fr_regressor_v3 train + register on ENCODER_VOCAB v3 (2026-05-06)

  • Scope: ai/scripts/train_fr_regressor_v3.py (new), ai/tests/test_train_fr_regressor_v3.py (new), model/tiny/fr_regressor_v3.onnx (new, real-weight checkpoint from a 9-fold LOSO gate-pass at mean PLCC 0.9975), model/tiny/fr_regressor_v3.json (new sidecar with encoder_vocab_version: 3 and full per-fold trace), model/tiny/registry.json (new fr_regressor_v3 row, smoke: false), ai/AGENTS.md (v3 retrain invariant section gains a "Status" subsection recording the gate result), docs/ai/models/fr_regressor_v3.md (new model card), docs/adr/0323-fr-regressor-v3-train-and-register.md + index row, changelog.d/added/fr-regressor-v3-train-register.md.
  • Rebase impact: zero. Fork-local feature; no upstream Netflix/vmaf surface is touched. The 16-slot ENCODER_VOCAB_V3 imported from train_fr_regressor_v2.py was already landed by PR #401 (ADR-0302).
  • On upstream sync: no action required. The v3 model ships alongside v2 — fr_regressor_v2.onnx and its sidecar are unchanged; the v3 row is appended to the registry and sorted alphabetically. If a future upstream sync ever lands a competing fr_regressor_v3 model under python/vmaf/, do NOT cross-link them — the fork's training stack lives under ai/.
  • Watch out for: the live ENCODER_VOCAB_VERSION in ai/scripts/train_fr_regressor_v2.py stays at 2 (per ADR-0302's invariant). Do not bump it to 3 in this PR or in any downstream port; the in-place promotion of v3 over v2 is a separate "promote v3 to authoritative" PR per ADR-0302's production-flip checklist.
  • Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v3.py -v
bash core/test/dnn/test_registry.sh   # must report OK: 20+
python -c "import onnx; onnx.checker.check_model(onnx.load('model/tiny/fr_regressor_v3.onnx')); print('OK')"

ADR-0321 — fr_regressor_v2_ensemble_v1 full production flip (2026-05-06)

  • Scope: ai/scripts/export_ensemble_v2_seeds.py (new), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.onnx (real full-corpus-trained weights replacing the 3025-byte synthetic scaffold bytes), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.json (new per-seed sidecars), model/tiny/registry.json (sha256 + smoke: false on the five seed rows), ai/AGENTS.md (new invariant: the registry-flip is now done; future re-flips require a fresh PROMOTE.json + re-run of the export driver).
  • Rebase impact: zero. This is a fork-local production-flip; no upstream Netflix/vmaf surface is touched. The 12-slot ENCODER_VOCAB v2 carried in each sidecar is the same one the LOSO trainer (ADR-0319) bakes into the codec-block layout, so there is no rebase-time vocabulary drift to worry about.
  • Watch out for: if a future upstream sync ever introduces a competing fr_regressor_v2_ensemble_* model under python/vmaf/, do NOT cross-link them — the fork's ensemble weights are gated on runs/ensemble_v2_real/PROMOTE.json and are not portable to a different training stack.
  • Re-test on rebase:
bash core/test/dnn/test_registry.sh   # must report OK: 19
python -c "import onnx; \
  [onnx.checker.check_model(onnx.load(f'model/tiny/fr_regressor_v2_ensemble_v1_seed{i}.onnx')) \
   for i in range(5)]; print('OK')"

ADR-0324 — Ensemble training kit (2026-05-06)

  • Touches: tools/ensemble-training-kit/ (new), docs/adr/0324-ensemble-training-kit.md (new), docs/adr/README.md (index row), changelog.d/added/0324-ensemble-training-kit.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the kit assumes the LOSO wrapper hard-codes seeds (0 1 2 3 4). The orchestrator surfaces a warning if --seeds deviates but still hands off to the wrapper. If a future PR parameterises the wrapper's seed list, update both the wrapper and the kit's pass-through logic in lockstep.
  • On upstream sync: no action required. The kit lives entirely under tools/ensemble-training-kit/ (a fork-local path) and only invokes other fork-local scripts (ai/scripts/, scripts/dev/, scripts/ci/).
  • Re-test on rebase:
bash -n tools/ensemble-training-kit/*.sh
bash tools/ensemble-training-kit/make-distribution-tarball.sh /tmp/kit-test.tar.gz
tar -tzf /tmp/kit-test.tar.gz | grep -q "tools/ensemble-training-kit/run-full-pipeline.sh"

ADR-0332 — External-competitor benchmark harness (2026-05-08)

  • Touches: tools/external-bench/ (new), docs/adr/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated), changelog.d/added/external-bench-harness.md (new), docs/research/0087-external-bench-competitor-survey-2026-05-08.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the harness is wrapper-only — never vendor or link x264-pVMAF (GPL-2.0) into this fork. Future competitors follow the same pattern (tools/external-bench/<competitor>/run.sh invokes a user-installed binary via env var; output schema-shimmed into the canonical JSON shape). The output schema (frames[].{frame_idx, predicted_vmaf_or_mos, runtime_ms} + summary.{competitor, plcc, srocc, rmse, runtime_total_ms, params, gflops}) is the contract between every wrapper and compare.py. run_wrapper's runner parameter MUST stay resolved at call time (not via default-arg binding) so monkeypatch-based tests work.
  • On upstream sync: no action required. The harness lives entirely under tools/external-bench/ (a fork-local path) and never touches Netflix-shared code.
  • Re-test on rebase:
python3 -c "import yaml; names=['docker-image','security-scans','lint-and-format','required-aggregator','ffmpeg-integration','libvmaf-build-matrix','rule-enforcement','tests-and-quality-gates']; [yaml.safe_load(open(f'.github/workflows/{n}.yml')) for n in names]; print('OK')"
# Spot-check the gate is present on every top-level job:
for f in docker-image security-scans lint-and-format ffmpeg-integration \
         libvmaf-build-matrix rule-enforcement tests-and-quality-gates \
         required-aggregator; do
  grep -c "pull_request.draft == false" ".github/workflows/${f}.yml"
done  # Each must report >= 1.
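The run_wrapper invariant in the ADR-0332 entry above (resolve the runner at call time, not through default-argument binding) can be illustrated with a small stand-alone sketch. The function bodies and the stand-in runner are hypothetical and are not the harness code; a real runner would presumably invoke the competitor binary.

```python
def default_runner(cmd):
    """Stand-in for a subprocess-based runner (hypothetical)."""
    return "real:" + " ".join(cmd)


def run_wrapper_early_bound(cmd, runner=default_runner):
    # Anti-pattern: the default is captured once, when the function is defined,
    # so later patching of the module attribute has no effect here.
    return runner(cmd)


def run_wrapper(cmd, runner=None):
    if runner is None:
        # Looked up in module globals on every call, so a patched
        # default_runner is picked up.
        runner = default_runner
    return runner(cmd)


def demo():
    """Patch the module-level runner the way a monkeypatch fixture would."""
    g = globals()
    original = g["default_runner"]
    g["default_runner"] = lambda cmd: "patched"
    try:
        return run_wrapper(["x"]), run_wrapper_early_bound(["x"])
    finally:
        g["default_runner"] = original
```

In `demo()`, only the call-time variant sees the patched runner; the early-bound variant still calls the original function.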

SSIM extractor registration fix (2026-05-08)

  • Touches: core/src/feature/feature_extractor.c (upstream-mirror — adds one extern + one registry-array entry near the existing SSIM rows), core/src/feature/integer_ssim.c (upstream-mirror — adds #include "config.h" and refreshes the file-scope comment above vmaf_fex_ssim), core/src/meson.build (adds integer_ssim.c to the source list — fork-local diff), core/test/test_feature_extractor.c (adds one regression test alongside the existing tests), docs/metrics/features.md (table row + footnote ²), docs/state.md, changelog.d/fixed/ssim-extractor-registration.md.
  • Invariant on the upstream-mirror files: the registry-array entry must remain inside the unconditional CPU block (the same block as &vmaf_fex_float_ssim / &vmaf_fex_float_ms_ssim) — vmaf_fex_ssim is CPU-only with no SIMD or GPU twin. The config.h include in integer_ssim.c is load-bearing on Vulkan-enabled LTO builds because feature_extractor.c and integer_ssim.c must agree on HAVE_VULKAN / HAVE_CUDA / HAVE_SYCL for the VmafFeatureExtractor struct layout to match across TUs.
  • On upstream sync: if Netflix ever lands its own integer-SSIM registry row, drop the fork's row in favour of upstream's; the file structure is identical. If upstream removes integer_ssim.c entirely (the file has been dormant on master for years), revert the meson.build addition. Otherwise no action.
  • Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false && ninja -C build
./build/test/test_feature_extractor    # 5/5 pass, includes new ssim row
./build/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
                  --distorted testdata/dis_576x324_48f.yuv \
                  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
                  --feature ssim --output /tmp/ssim_smoke.json && \
  grep -q '<metric name="ssim"' /tmp/ssim_smoke.json
# Vulkan-enabled LTO build (-Wlto-type-mismatch must stay clean)
meson setup build-vulkan -Denable_vulkan=enabled --reconfigure && \
  ninja -C build-vulkan tools/vmaf
python3 -m pytest tools/external-bench/tests/ -q   # must report 7 passed
bash -n tools/external-bench/*/run.sh

0327 — Conformal-VQA prediction surface for vmaf-tune (ADR-0279)

  • Touches: tools/vmaf-tune/src/vmaftune/conformal.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (Predictor.predict_vmaf_with_uncertainty), tools/vmaf-tune/src/vmaftune/cli.py (predict subcommand gains --with-uncertainty / --calibration-sidecar / --alpha), tools/vmaf-tune/tests/test_conformal.py (new), docs/ai/conformal-vqa.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the conformal wrapper sits outside the ONNX graph and adds no new runtime dependency — conformal.py imports only the standard library (math, statistics, dataclasses, json, warnings). Future calibration-sidecar shapes use the method discriminator string for versioning; do not rename "split-conformal" / "cv-plus" without bumping the loader. The Predictor.predict_vmaf_with_uncertainty signature is the Python-API contract consumed by vmaf-tune predict --with-uncertainty; renaming or reordering its keyword args breaks the CLI in lockstep.
  • On upstream sync: no action required. vmaf-tune is a fork-local tool; upstream Netflix/vmaf has no per-shot prediction surface.
  • Re-test on rebase:
python3 -m pytest tools/vmaf-tune/tests/test_conformal.py -q
python3 -m pytest tools/vmaf-tune/tests/test_predictor.py -q
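For orientation, textbook split-conformal prediction needs nothing beyond the standard library, which fits the no-new-dependency invariant above. The sketch below is a generic version of the technique, not the contents of conformal.py; the function names and the clamp to a 0..100 score range are assumptions.

```python
import math


def split_conformal_quantile(residuals, alpha):
    """Conformal quantile of absolute calibration residuals.

    Uses the standard finite-sample rank ceil((n + 1) * (1 - alpha)); when
    the calibration set is too small for the requested coverage, the
    quantile is infinite.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("calibration set is empty")
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(abs(r) for r in residuals)[rank - 1]


def predict_interval(point, q, lo=0.0, hi=100.0):
    """Symmetric interval around a point prediction, clamped to [lo, hi]."""
    return max(lo, point - q), min(hi, point + q)
```

Because the interval is computed from the point prediction and a stored quantile, a wrapper of this shape can sit entirely outside the ONNX graph.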

CI paths-ignore deny-list on heavy workflows (ADR-0341, 2026-05-09)

  • Touches: .github/workflows/libvmaf-build-matrix.yml (fork-local — paths-ignore: block under pull_request:), .github/workflows/tests-and-quality-gates.yml (fork-local — same block), docs/adr/0341-ci-paths-ignore-doc-only-prs.md + index fragment, changelog.d/changed/ci-paths-ignore-doc-only.md.
  • Invariant: the deny-list must stay strictly documentation-only (docs/**, **/*.md, changelog.d/**, CHANGELOG.md, .workingdir2/**). Any path that contributes to a build, test, or lint input — libvmaf/**, meson.build, meson_options.txt, subprojects/**, python/**, ai/**, mcp-server/**, model/**, testdata/**, .github/workflows/** — must NEVER appear in the deny-list, otherwise the corresponding required check is silently skipped on a code-touching PR. The Required Checks Aggregator (ADR-0313) catches only the doc-only case (no required check ever ran for any required name); a too-broad deny-list would lose build coverage without anyone noticing.
  • On upstream sync: Netflix/vmaf upstream does not carry these two workflow files (they are fork-local additions). No sync conflict expected.
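One way to guard the deny-list invariant is an allow-list audit: any paths-ignore pattern outside the documentation-only set is reported. This is a hypothetical check, not an existing script in the fork; it takes the patterns as a plain list so it does not depend on a YAML parser.

```python
# Documentation-only patterns named in the invariant above.
DOC_ONLY_PATTERNS = frozenset({
    "docs/**",
    "**/*.md",
    "changelog.d/**",
    "CHANGELOG.md",
    ".workingdir2/**",
})


def audit_paths_ignore(patterns):
    """Return the patterns that are not on the documentation-only allow-list."""
    return sorted(p for p in patterns if p not in DOC_ONLY_PATTERNS)
```

An empty result means the deny-list is still documentation-only; anything returned (for example `ai/**`) would let a code-touching PR skip a required check.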

0332 — mkdocs --strict validation policy (ADR-0332)

  • Touches: mkdocs.yml (validation block + exclude_docs:), docs/mcp/embedded.md (one anchor fix), docs/research/0055-...md (one anchor fix), docs/{index,state,rebase-notes}.md (small bare-relative-dir-link sweep). All fork-local — no upstream-shared paths touched.
  • Upstream source: none — Netflix/vmaf upstream uses Sphinx / GitHub-rendered Markdown, not mkdocs. The mkdocs.yml config is wholly fork-local.
  • Invariant: mkdocs.yml validation: must keep links.{not_found,unrecognized_links}: info until either (a) ADR-0028 / ADR-0106 are superseded by a less-strict immutability rule that allows refreshing renamed-ADR cross-refs in frozen ADR bodies, or (b) the ~820 cross-tree-pointer links from docs into source-tree files (../../core/src/..., ../../scripts/ci/..., ../../.github/workflows/...) are migrated to absolute GitHub URLs or moved into docs_dir-resident generated content. Promoting either category to warn while those conditions hold turns the docs lane permanently red.
  • On upstream sync: no action — the lane is fork-local.
  • Re-test on rebase:

HDR VMAF model search — Path C documentation only (2026-05-09)

  • Files added (this fork only; upstream Netflix/vmaf has none of these):
  • model/vmaf_hdr_model_card.md — discoverable warning that the HDR scoring path falls back to the SDR vmaf_v0.6.1.json weights. Filename deliberately uses .md, not .json, so the vmaftune.hdr.select_hdr_vmaf_model glob (vmaf_hdr_*.json) keeps returning None.
  • docs/research/0089-hdr-vmaf-model-search.md — verbatim trail of the source-or-train survey (URLs + access dates).
  • changelog.d/added/hdr-vmaf-model-search.md — release-notes fragment per ADR-0221.
  • ADR-0300 grew an inline ### Status update 2026-05-09: HDR model status section.
  • Why no model JSON ships: Path A negative findings (no public Netflix HDR VMAF model exists; HDRMAX is a different algorithm not loadable by libvmaf's JSON path). Path B deferred behind gated subjective HDR corpora + multi-day training compute. No fabricated weights are introduced.
  • On upstream sync: if Netflix lands vmaf_hdr_*.json in Netflix/vmaf/model/, port via /port-upstream-commit; the resolver picks it up automatically with no vmaftune change. Then delete model/vmaf_hdr_model_card.md (or rewrite it as a normal model card describing the upstream weights). Watch https://github.com/Netflix/vmaf/issues/645 for the upstream release announcement.
  • Re-test on rebase: no behavioural change — pure docs. Sanity:
python3 -c "from pathlib import Path; \
  import sys; sys.path.insert(0,'tools/vmaf-tune/src'); \
  from vmaftune.hdr import select_hdr_vmaf_model; \
  print(select_hdr_vmaf_model(Path('model')))"
# Expect: None  — confirms the .md card does not match the glob

mkdocs build --strict   # must EXIT=0 with no WARNING lines
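The filename choice for the model card can be checked in isolation. The resolver below is a simplified stand-in for vmaftune.hdr.select_hdr_vmaf_model, assuming it does little more than glob for vmaf_hdr_*.json; the file name vmaf_hdr_example.json is made up for the demonstration.

```python
import tempfile
from pathlib import Path


def select_hdr_model_sketch(model_dir):
    """Return the first vmaf_hdr_*.json under model_dir, or None."""
    matches = sorted(Path(model_dir).glob("vmaf_hdr_*.json"))
    return matches[0] if matches else None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "vmaf_hdr_model_card.md").write_text("card")
        (root / "vmaf_v0.6.1.json").write_text("{}")
        before = select_hdr_model_sketch(root)  # the .md card must not match
        (root / "vmaf_hdr_example.json").write_text("{}")
        after = select_hdr_model_sketch(root).name
        return before, after
```

The first lookup returns None because neither the .md card nor the SDR model matches the glob; once a matching JSON file appears, a resolver of this shape picks it up without any code change.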

ADR-0349 — fr_regressor_v3 namespace resolution (2026-05-09)

  • Rebase impact: none. Docs-only change — adds ADR-0349, an append-only status appendix on ADR-0302 per ADR-0028, a ## fr_regressor_* namespace map block in ai/AGENTS.md, and two changelog fragments. No upstream Netflix/vmaf surface touched; no fr_regressor_* registry rows touched (sha256s for _v1, _v2, _v2_ensemble_v1_seed{0..4}, _v3 all unchanged); no C / Python / ONNX bytes modified.
  • What to check after a rebase: nothing automated. The only drift risk is a future agent claiming fr_regressor_v3plus_features for an unrelated workstream — ai/AGENTS.md carries the reservation; reviewers verify the map row exists before approving any new fr_regressor_* registry id.
  • Reproducer:

```bash
# ADR + AGENTS.md namespace map present and consistent:
test -f docs/adr/0349-fr-regressor-v3-namespace.md
grep -q "fr_regressor_* namespace map" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" ai/AGENTS.md docs/adr/0349-fr-regressor-v3-namespace.md
# Status appendix present on ADR-0302:
grep -q "Status update 2026-05-09: namespace collision resolved" \
    docs/adr/0302-encoder-vocab-v3-schema-expansion.md
# Existing v3 production row bit-identical (sha256 unchanged):
python3 -c "
import json
reg = json.load(open('model/tiny/registry.json'))
v3 = next(m for m in reg['models'] if m['id'] == 'fr_regressor_v3')
assert v3['sha256'] == 'eaa16d23461eda74940b2ed590edfcaf13428aade294e47792a5a15f4d3b999c', v3
assert v3['smoke'] is False
print('OK: fr_regressor_v3 production row unchanged')
"
# Registry test still passes:
bash core/test/dnn/test_registry.sh
```

0327 — Pre-push PR-body deliverables validator hook

  • Touches: scripts/ci/validate-pr-body.sh (new), scripts/git-hooks/pre-push (new), scripts/ci/test-validate-pr-body.sh (new), Makefile (hooks-install target adds the pre-push symlink). Re-uses scripts/ci/deliverables-check.sh parser verbatim — no upstream-shared file is modified.
  • Invariant: parser shape parity with .github/workflows/rule-enforcement.yml deep-dive-checklist gate (ADR-0108). The validator constructs a PATH shim that intercepts git diff --name-only calls only; every other git invocation falls through to the real binary.
  • On upstream sync: not applicable. These files are entirely fork-local and Netflix has no equivalent. If scripts/ci/deliverables-check.sh is ever rewritten or moved, the validator's exec path (scripts/ci/deliverables-check.sh) and the test harness's expected exit codes must follow.
  • Re-test on rebase:
bash scripts/ci/test-validate-pr-body.sh   # 8/8 cases pass

0320 — Semgrep # nosemgrep cites on Netflix-upstream Python harness (Research-0090)

  • Touches: python/vmaf/core/asset.py, python/vmaf/core/executor.py, python/vmaf/core/feature_extractor.py, python/vmaf/core/quality_runner.py, python/vmaf/core/result_store.py, python/vmaf/tools/decorator.py, python/test/command_line_test.py, python/test/feature_extractor_test.py, python/test/ssimulacra2_test.py, python/vmaf/config.py.
  • Invariant: every fork-added # nosemgrep: <rule-id> line is paired with an inline cite to Research-0090. The cite + rule-id pair is the load-bearing artifact (per memory feedback_no_guessing: every "false positive" claim ships its safety proof). If an upstream sync removes the cited line of code, drop the cite-comment block too. If upstream adds a defusedxml fix at the ElementTree.parse() site (feature_extractor.py:115, quality_runner.py:1496), keep upstream's fix and drop our suppressions.
  • config.py:40 (the SSL-bypass deletion) is a fork-exclusive security fix; if upstream resurrects ssl._create_unverified_context on a sync, do not re-merge it: the bypass clobbers the process-global default and is unjustified per Research-0090, F1.
  • Re-test on rebase:
semgrep scan --config=p/cwe-top-25 --config=p/c --config=p/python . \
    --metrics=off --json | jq '.results | length'
# expect 0: every legit finding either has a # nosemgrep cite or was fixed

0321 — Security-scans workflow registry-pack list (Research-0090)

  • Touches: .github/workflows/security-scans.yml, .github/workflows/lint-and-format.yml.
  • Invariant: the registry packs the workflow cites (p/cwe-top-25 + p/c + p/python) are validated against https://semgrep.dev/c/p/<pack>; the previously cited p/cert-c-strict, p/cert-cpp-strict, and p/cpp packs were retired by Semgrep in 2025 and now return 404. The lint-and-format.yml pull of ${{ github.* }} into env: (clang-tidy + clang-tidy-sycl steps) defuses run-shell-injection; preserve the pattern on any edit. See Research-0090, F2/F3.
  • Re-test on rebase:
for pack in p/cwe-top-25 p/c p/python; do
  code=$(curl -sIL "https://semgrep.dev/c/${pack}" | head -1 | awk '{print $2}')
  [ "$code" = "200" ] && echo "${pack}: OK" || echo "${pack}: FAIL ($code)"
done

0320 — CodeQL C bulk sweep (78 deferred alerts → 60 fixed, 14 deferred to T7-5)

  • Touches: core/src/feature/{cambi.c,ciede.c,integer_adm.c,integer_psnr.c,adm_tools.h,third_party/xiph/psnr_hvs.c}, core/src/feature/x86/{adm_avx2.c,adm_avx512.c,ansnr_avx2.c,ansnr_avx512.c,vif_avx2.c,vif_avx512.c}, core/src/{pdjson.c,svm.cpp}, core/test/{test_cpu.c,test_model.c}, core/tools/{y4m_input.c,yuv_input.c,vmaf_bench.c}. All but vmaf_bench.c are upstream-mirror Netflix files.
  • Invariant: widening casts on integer multiplications ((size_t), (uint64_t), (double)) are LHS-prefixed before the multiply, never wrapped around the whole expression — the latter is a no-op against cpp/integer-multiplication-cast-to-long. Deleted commented-out blocks (e.g., the AVX-512 VP-loop dead variant in adm_avx512.c::adm_dwt2_inverse) are gone for good; if upstream brings them back, they reintroduce the alerts. iqa/convolve.c was deliberately left untouched: prefixing (double) on the float×float multiplications inside the scalar reference path breaks bit-exactness against the AVX2 path enforced by test_iqa_convolve — CodeQL alert deferred to a follow-up that updates both paths in lockstep.
  • On upstream sync: any upstream change that re-introduces the deleted comment blocks or rewrites the cast forms will surface the alerts again. The cambi_score signature change (CambiBuffers buffers → const CambiBuffers *buffers) is fork-local and likely to conflict with upstream patches that touch that function. The 14 deferred VifBuffer large-parameter alerts are tracked under T7-5 (multi-backend coordinated refactor including NEON).
  • Re-test on rebase:
cd libvmaf && meson test -C build   # all 50+ C tests
make test-netflix-golden            # upstream golden gate
# Re-run CodeQL on master afterwards; the 60 fixed alerts must stay closed.

CodeQL cpp/declaration-hides-variable sweep (2026-05-09)

  • What changed: Mechanical rename / scope-tighten / dedupe sweep closing 64 open cpp/declaration-hides-variable CodeQL alerts on master. Touched files: core/src/feature/cambi.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/x86/vif_avx2.c, core/src/feature/x86/vif_avx512.c. All five are upstream-mirror; the Netflix copyright header is preserved on each.
  • Renames adopted (semantic over _2 suffix):
  • cambi.c: inner int err shadowing function-scope err becomes mkdir_err (heatmaps init) and src_err (full-ref extract path).
  • adm_avx2.c / adm_avx512.c: the j == 0 first-column special-case block is wrapped in { ... } so its j0..j3 and s0..s3 stop being visible to the per-j tail loop. The inner duplicate __m256i add_shift_HP_vex = _mm256_set1_epi32(32768) (and 512-bit twin) is removed — bit-identical to the function-scope value already in scope. The __m256i rfactor1 that shadowed the function-scope float rfactor1[3] becomes rfactor_v0/_v1/_v2 (and the AVX-512 twin likewise).
  • vif_avx2.c / vif_avx512.c: tap-loop locals follow f_tap, r_top/r_bot, d_top/d_bot for the s0 stage, and f_tap0/f_tap1, r_back0/r_fwd0, etc. for the AVX-512 paired-tap stage. Inner per-fj __m256i fq / __m512i fq shadows of the centre-tap broadcast become f_tap. Inner-block duplicates of function-scope ref/dis/stride/ii (identical types and initialisers) are simply removed. The two scalar VifResiduals residuals declarations that shadowed function-scope Residuals512 residuals become tail_residuals. The two const uint16_t fcoeff declarations that shadowed function-scope __m512i fcoeff become fcoeff_scalar.
  • Invariant: bit-exactness gate — the rename sweep must not change any score. The Netflix CPU golden 3 (src01_hrc00, checkerboard_1, checkerboard_10) ran clean against this PR. All 76 VMAF-targeted Python tests pass; the 9 unrelated pre-existing failures (NIQE, PyPSNR, FileSystemResultStore) reproduce on a pristine origin/master checkout.
  • On upstream sync: Netflix has no equivalent renames on upstream master as of 2026-05-09. When syncing, prefer the fork's renamed identifiers (the CodeQL gate depends on them). If Netflix later renames the same locals differently, reconcile by keeping fork names and updating any imported chunks at port time.
  • Re-test on rebase:
meson test -C build --suite=fast
PYTHONPATH=$PWD/python python3 -m pytest \
    python/test/quality_runner_test.py -k test_run_vmaf \
    python/test/vmafexec_test.py \
    python/test/vmafexec_feature_extractor_test.py \
    -m "not slow" -q

ADR-0209 v1 stdio runtime (T5-2b) — Embedded MCP server (2026-05-08)

  • Touches: core/src/mcp/{mcp.c,dispatcher.c,transport_stdio.c,mcp_internal.h,meson.build,3rdparty/cJSON/{cJSON.c,cJSON.h,LICENSE}}, core/test/test_mcp_smoke.c, core/test/meson.build. All paths are fork-local. cJSON is vendored verbatim from upstream DaveGamble/cJSON@v1.7.18 under its MIT license.
  • Invariant: every TU under core/src/mcp/ (other than the vendored cJSON dir) is fork-local with the Copyright 2026 Lusoris and Claude (Anthropic) header; cJSON keeps its upstream MIT header verbatim. The public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged from T5-2 — only function bodies flipped from -ENOSYS to working implementations. SSE / UDS still return -ENOSYS so the v2 PR can wire them without touching the public surface.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface; the entire core/src/mcp/ subtree is fork-local. If upstream ever adds an MCP surface, expect a port-only sync since names will collide.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
    -Denable_mcp=true -Denable_mcp_stdio=true
ninja -C build && meson test -C build test_mcp_smoke -v

ADR-0334 — state.md-touch-check CI gate (2026-05-08)

  • Touches: .github/workflows/rule-enforcement.yml (new top-level job state-md-touch-check), scripts/ci/state-md-touch-check.sh (new), scripts/ci/test-state-md-touch-check.sh (new), scripts/ci/AGENTS.md (new rebase-sensitive-surface row), .github/PULL_REQUEST_TEMPLATE.md (already carries the "Bug-status hygiene" section + no state delta: REASON opt-out — coupled to the script's regex). No upstream-shared paths.
  • Invariant: the gate's trigger predicate (Conventional-Commit fix: prefix, bare bug token in title, GitHub close-keywords closes/fixes/resolves #N, unchecked Bug-status-hygiene checkbox) and opt-out sentinel (no state delta: REASON) match the wording of the ## Bug-status hygiene section in .github/PULL_REQUEST_TEMPLATE.md. Reword the template only alongside the script. The job carries the pull_request.draft == false || github.event_name != 'pull_request' gate (ADR-0331 pattern) — keep that on any future hoist into the required-aggregator set.
  • On upstream sync: Netflix/vmaf has no equivalent rule. No conflict expected; the workflow file is fork-introduced.
  • Re-test on rebase:
bash scripts/ci/test-state-md-touch-check.sh
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/rule-enforcement.yml')); print('YAML OK')"
pre-commit run shellcheck --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
pre-commit run shfmt --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh

SYCL PSNR chroma extension (T3-15(b), 2026-05-09)

  • Touches: core/src/feature/sycl/integer_psnr_sycl.cpp (per-extractor chroma device buffers, per-plane SSE accumulators, and a provided_features extension to psnr_y / psnr_cb / psnr_cr), core/src/sycl/AGENTS.md (per-kernel rebase-sensitive invariant for the chroma-on-per-extractor-buffer arrangement), docs/metrics/features.md (footnote ¹ refresh — all three GPU PSNR extractors now emit chroma), docs/adr/0192-gpu-long-tail-batch-3.md References-section status update, changelog.d/added/sycl-psnr-chroma.md.
  • Invariant on the chroma upload path: chroma planes ride on per-extractor device buffers populated by host-side staging copies in the combined-graph pre_fn callback — NOT the SYCL state's shared frame buffer (vmaf_sycl_shared_frame_init), which is luma-only by design. Luma stays graph-recorded; chroma SSE kernels run directly in post_fn on the same in-order combined queue. The CUDA twin (PR #520 / commit 7f3d58a5) uses the existing CUDA per-plane picture infrastructure and therefore has no equivalent invariant.
  • On upstream sync: Netflix/vmaf upstream has no SYCL backend at all, so conflict probability is zero on psnr_sycl. If an upstream port to the fork's SYCL runtime someday extends vmaf_sycl_shared_frame_init to allocate chroma planes, the PSNR extension can be migrated onto it and the per-extractor chroma buffers retired — but only after a cross-backend gate run confirms bit-exactness against CPU at places=4 (ADR-0214).
source /opt/intel/oneapi/setvars.sh
CC=icx CXX=icpx meson setup build-sycl libvmaf \
  -Denable_sycl=true -Denable_cuda=false
ninja -C build-sycl
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-sycl/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr --backend sycl --device 0

# Expect 0/48 mismatches across psnr_y / psnr_cb / psnr_cr at places=4.
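
What "0/48 mismatches at places=4" means can be sketched as below. This assumes the gate follows the same convention as Python's `unittest.assertAlmostEqual` (two values agree when their absolute difference rounds to zero at the given number of decimal places); the actual comparison in scripts/ci/cross_backend_vif_diff.py is not shown here, and the function names are hypothetical.

```python
def agree_at_places(a: float, b: float, places: int = 4) -> bool:
    """True when |a - b| rounds to zero at `places` decimal places."""
    return round(abs(a - b), places) == 0


def count_mismatches(cpu_scores, gpu_scores, places: int = 4) -> int:
    """Count per-frame scores that disagree between two backends."""
    if len(cpu_scores) != len(gpu_scores):
        raise ValueError("backends produced a different number of frames")
    return sum(
        1 for a, b in zip(cpu_scores, gpu_scores)
        if not agree_at_places(a, b, places)
    )
```

Under this reading, a drift of 1e-5 in a per-frame score passes while a drift of 2e-4 counts as a mismatch.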


Cppcheck nullPointer false-positive in dict.c (2026-05-09)

Files pinned:

  • core/src/dict.c:121 (one-line redundant-condition fix in dict_overwrite_existing).

Why this rebase-note exists: Master CI's Cppcheck (Whole Project) gate started failing on commit 14b5ffba (#537) and blocked every open PR because each PR rebases onto a broken master. The cppcheck finding was likely always present but masked by paths-ignore filtering on the prior workflow shape; PR #530 widened cppcheck's trigger surface and exposed it. Deleted the redundant && val guard since val is already checked at the public entry-point vmaf_dictionary_set (dict.c:137). No behavior change; cppcheck flags the original as "either the val check is redundant or there's a possible null deref" because it can't prove the interprocedural guarantee.

Rebase-sensitivity: zero — change is local to dict.c. Future upstream sync of this file should keep the fix or re-run cppcheck locally to confirm absence of recurrence.

Aggregator timeout bump (2026-05-09)

Files pinned:

  • .github/workflows/required-aggregator.yml (deadline 30→90 min, job timeout 35→100 min)

Why: 41 PRs in flight 2026-05-09 morning hit Aggregator timeouts while real CI eventually passed. Bumping both deadlines unblocks the train without touching the underlying matrix.

Rebase-sensitivity: zero — workflow file is wholly fork-local.

ARC self-hosted runner pool — pilot Cppcheck routing (2026-05-09)

  • .github/workflows/lint-and-format.yml (Cppcheck runs-on: ternary).

Why: opt-in graceful migration; ADR-0359 + docs/development/ci-runners.md document the flip-the-variable recipe when the cluster is degraded.

Rebase-sensitivity: zero — workflow file is fork-local.

ADR-0338 — macOS Vulkan-via-MoltenVK CI lane (2026-05-09)

  • Touches: .github/workflows/libvmaf-build-matrix.yml (fork-local — adds Build — macOS Vulkan via MoltenVK (advisory) lane, adds continue-on-error plumbing on matrix.experimental && matrix.moltenvk, adds Install MoltenVK + Vulkan loader/headers (macOS) step, adds Run Vulkan smoke tests (macOS MoltenVK) step, gates the existing test/cache/tox steps on !matrix.moltenvk), docs/backends/vulkan/moltenvk.md (new fork-local doc), docs/adr/0127-vulkan-compute-backend.md (status-update appendix per the ADR's Proposed status — body untouched), docs/adr/0338-macos-vulkan-via-moltenvk-lane.md (new), docs/adr/_index_fragments/0338-macos-vulkan-via-moltenvk-lane.md plus _order.txt append (new), docs/research/0089-moltenvk-feasibility-on-fork-shaders.md (new), changelog.d/added/macos-vulkan-via-moltenvk-lane.md (new).
  • Invariant on the upstream-mirror file: none — libvmaf-build-matrix.yml is fork-local. The new lane's continue-on-error clause MUST stay scoped to matrix.experimental == true && matrix.moltenvk == true so existing experimental: true matrix entries (e.g. the macOS DNN lane) keep their default fail-fast behaviour. VK_ICD_FILENAMES MUST point at /opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json — note the etc/vulkan segment, NOT share/vulkan (the homebrew formula's install layout uses etc/; verified against Formula/m/molten-vk.rb).
  • On upstream sync: Netflix upstream has no macOS Vulkan lane and no MoltenVK awareness; nothing to reconcile. If a future MoltenVK release drops support for GL_EXT_shader_atomic_int64 translation, moment.comp will fail on the lane; the fix path is in ADR-0338 §Decision (lane is continue-on-error so it does not block PRs) — update the known-limitations table in docs/backends/vulkan/moltenvk.md and either pin a working MoltenVK version in the brew install line or rewrite the shader.
  • Re-test on rebase:
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/libvmaf-build-matrix.yml'))" && \
  echo "YAML parse OK"
# Confirm the lane is still in the matrix:
grep -q "Build — macOS Vulkan via MoltenVK (advisory)" \
  .github/workflows/libvmaf-build-matrix.yml
# Confirm the lane is NOT promoted to required-aggregator until one
# green run on master (per ADR-0338):
! grep -q "macOS Vulkan via MoltenVK" \
  .github/workflows/required-aggregator.yml
# Confirm the ICD path is the etc/ one, not share/:
grep -q "etc/vulkan/icd.d/MoltenVK_icd.json" \
  .github/workflows/libvmaf-build-matrix.yml

ADR-0363 — Mend Renovate replaces Dependabot (2026-05-09)

  • Touches: renovate.json (new, repo-root), .github/workflows/renovate.yml (new), .github/dependabot.yml (deleted — renamed to .github/dependabot.yml.disabled), docs/development/dependency-bot.md (new operator playbook), changelog.d/changed/renovate-supersedes-dependabot.md (new), docs/adr/0363-renovate-replaces-dependabot.md (new), docs/adr/_index_fragments/0363-renovate-replaces-dependabot.md (new).
  • Invariant: .github/dependabot.yml no longer exists on master; the disabled copy is dependabot.yml.disabled. On upstream sync, if Netflix ever ships their own dependabot.yml, do NOT restore it — the fork intentionally uses Renovate. Merge the upstream file into dependabot.yml.disabled for reference only.
  • Upstream interaction: none. Netflix/vmaf upstream has no Renovate config. Conflict risk is zero unless upstream adds renovate.json or restores dependabot.yml.
  • Re-test on rebase:
# Verify the workflow SHA-pin is still present and non-floating:
grep -E 'renovatebot/github-action@[a-f0-9]{40}' .github/workflows/renovate.yml
# Verify dependabot.yml is still absent:
test ! -f .github/dependabot.yml && echo "ok: dependabot.yml absent"
# Validate renovate.json syntax (requires Node):
node -e "JSON.parse(require('fs').readFileSync('renovate.json','utf8')); console.log('JSON valid')"
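
The SHA-pin check in the first command above can also be expressed as a small Python helper that reports which references are floating. This is a sketch only: `unpinned_refs` is a hypothetical name, and it assumes the action is referenced as `renovatebot/github-action@<ref>` with the ref terminated by whitespace.

```python
import re

# Capture whatever follows the "@" up to the next whitespace.
USES = re.compile(r"renovatebot/github-action@(\S+)")
FULL_SHA = re.compile(r"[a-f0-9]{40}")


def unpinned_refs(workflow_text: str) -> list[str]:
    """Return every ref that is not a full 40-character commit SHA."""
    return [
        ref for ref in USES.findall(workflow_text)
        if not FULL_SHA.fullmatch(ref)
    ]
```

An empty list means every reference is pinned; a tag such as `v40` or a shortened SHA would be returned for review.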

ADR-0355 — Symphony-inspired agent-dispatch infrastructure (2026-05-09)

Files added (all fork-introduced, none mirror upstream):

  • .claude/workflows/_template.md, .claude/workflows/codeql-alert-sweep.md, .claude/workflows/simd-port.md, .claude/workflows/feature-extractor-port.md.
  • scripts/lib/__init__.py, scripts/lib/backlog_tracker.py, scripts/lib/AGENTS.md.
  • scripts/ci/agent-eligibility-precheck.py (new row in scripts/ci/AGENTS.md "Rebase-sensitive surfaces" table).
  • docs/development/agent-dispatch.md.

Why this rebase-note exists: pure additive, all paths are fork-only (.claude/, scripts/lib/, fork-only docs). Upstream Netflix/vmaf has no .claude/, no scripts/lib/, and no docs/development/agent-dispatch.md, so the merge surface is zero on /sync-upstream. The only coupling is internal between scripts/ci/agent-eligibility-precheck.py and scripts/lib/backlog_tracker.py (sys.path import). Both files move together; documented in scripts/lib/AGENTS.md and a new row in scripts/ci/AGENTS.md.

Rebase-sensitivity: zero w.r.t. upstream. Internal-only: renaming BacklogItem field names or the BacklogTracker / GitHubTracker public method signatures is a breaking change for the precheck and any future state-audit script — guard via the smoke listed in Research-0091 §"Smoke results" before any rename PR.

Format-coupling note: the BACKLOG.md row regex (scripts/lib/backlog_tracker.py:_ID_PATTERN) is brittle against table-shape edits. If a future BACKLOG.md edit adds a column or renames a status word, the parser will silently mis-classify rows — the smoke parses 101 rows on master at 2026-05-09; expect ≥ 100 after any structural edit.

0350 — psnr_hvs AVX-512 ceiling re-bench (ADR-0350, T3-9 (a))

  • docs/adr/0350-psnr-hvs-avx512-ceiling.md — closure ADR.
  • docs/adr/0160-psnr-hvs-neon-bitexact.md — appended ### Status update 2026-05-09 appendix.
  • docs/research/0091-psnr-hvs-avx512-bench-2026-05-09.md — empirical companion (cycle share, Amdahl ceiling, reproducer).

Why this rebase-note exists: T3-9 (a) closes as AVX2 ceiling. The result has zero rebase-sensitivity by itself — no engine code changes — but the bit-exactness invariants that lock it to a ceiling do. The 78.42 % scalar tail in calc_psnrhvs_avx2 / calc_psnrhvs_neon is locked by ADR-0138 / ADR-0139's "per-lane-scalar float reduction" rule (carried by ADR-0159 / ADR-0160). If a future upstream sync of core/src/feature/third_party/xiph/psnr_hvs.c (the Xiph/Daala DCT) changes the per-block summation tree — e.g. partial folding, re-ordered means, vectorised mask reductions — the AVX2 + NEON TUs in core/src/feature/x86/psnr_hvs_avx2.c and core/src/feature/arm64/psnr_hvs_neon.c MUST be re-audited against the new scalar reference, and the ceiling argument in ADR-0350 must be re-run (because the 78 / 15 cycle-share split would shift).

Rebase-sensitivity: low for the ceiling decision itself (empirical re-bench on a current host is cheap — 30 seconds via the reproducer in Research-0091 §7); high for the underlying bit-exactness invariants the decision rests on (Netflix golden trips on ≥ 5.5e-5 drift per ADR-0160 §Context). The ADR-0350 §Verification reproducer is the gate — re-run it if the cycle share shifts, the Netflix normal-pair fixture changes, or a new host class (e.g. wide-issue Granite Rapids) goes into CI.
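
The ceiling argument is an application of Amdahl's law, which can be sketched as below. The sketch treats the 78.42 % scalar tail as the fraction that a wider vector unit cannot accelerate; that framing is an assumption made for illustration, and the exact accounting in ADR-0350 and Research-0091 (including the 78 / 15 split) is not reproduced here.

```python
def amdahl_ceiling(serial_fraction: float,
                   speedup_of_rest: float = float("inf")) -> float:
    """Overall speedup when only the non-serial share gets faster.

    serial_fraction: share of cycles that stays scalar (0 < f <= 1).
    speedup_of_rest: factor by which the remaining share is accelerated;
                     the default of infinity gives the theoretical upper bound.
    """
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    accelerated_share = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + accelerated_share / speedup_of_rest)


# Upper bound with a 78.42 % scalar tail, and the case where the rest runs 2x faster.
upper_bound = amdahl_ceiling(0.7842)
doubled_rest = amdahl_ceiling(0.7842, speedup_of_rest=2.0)
```

With these inputs the upper bound stays below 1.28x even if the vectorised share cost nothing, which is why a shift in the cycle-share split is the signal to re-run the reproducer.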

0320 — FFmpeg n8.1 → n8.1.1 base bump (2026-05-09)

  • Touches: ffmpeg-patches/series.txt (header comment), ffmpeg-patches/README.md (apply / verify / smoke sections), ffmpeg-patches/test/build-and-run.sh (FFMPEG_SHA default), scripts/ci/ffmpeg-patches-check.sh (header comment; FFMPEG_BRANCH env default unchanged at release/8.1 since the branch tracks point releases), docs/development/automated-rule-enforcement.md (gate description). The 9 .patch files themselves are unchanged — every patch in the series applied cleanly, cumulatively, against pristine n8.1.1 via git am --3way.
  • Upstream source: FFmpeg upstream point release n8.1.1 (commit 239f2c7 "Bump micro for 8.1.1") — bug-fix-only on top of n8.1, no API or AVOption breakage that the patch stack consumes.
  • Invariant: the patch stack continues to apply against the current tip of FFmpeg's release/8.1 branch. Per ADR-0118 and ADR-0186 §FFmpeg patch coupling, the verification gate is cumulative git am --3way against a pristine checkout, not per-patch standalone apply. The scripts/ci/ffmpeg-patches-check.sh local gate uses git apply (no commit) but accumulates state in the same way.
  • On upstream sync: no action required. If a future FFmpeg point release (n8.1.2 or n8.2) lands new hunks that conflict with one of the patches, regenerate the affected patches via git format-patch on the resolved state, bump the references in the five files listed under "Touches", and add a fresh rebase-notes entry citing the conflict file(s).
  • Re-test on rebase:
cd /tmp && rm -rf ffmpeg-n811 && \
  git clone --depth 1 --branch n8.1.1 \
    https://git.ffmpeg.org/ffmpeg.git ffmpeg-n811
git -C /tmp/ffmpeg-n811 config user.email agent@local
git -C /tmp/ffmpeg-n811 config user.name agent
for p in ffmpeg-patches/000*-*.patch; do
  git -C /tmp/ffmpeg-n811 am --3way "$p" || break
done
bash scripts/ci/ffmpeg-patches-check.sh

ADR-0281 follow-up — QSV install-matrix discoverability backfill (2026-05-08)

  • Touches: docs/getting-started/install/{arch,fedora,ubuntu,macos,windows}.md (new ## Intel QSV section per page), docs/adr/0281-vmaf-tune-qsv-adapters.md (status-update appendix per ADR-0028), changelog.d/changed/qsv-install-matrix-docs.md (new fragment). No code, no engine, no upstream-shared C / Python source touched. Pure documentation backfill closing the SYCL-audit research-0086 Topic C gap (issue #464).
  • Invariant: each per-OS QSV section pins the package names against verified upstream URLs with a Verified 2026-05-08 access date. The hardware-generation matrix is sourced from the public Wikipedia "Intel Quick Sync Video — Hardware decoding and encoding" table; if Intel revises which generation supports AV1 encode (e.g. backports the encoder to Lunar Lake / Meteor Lake silicon currently absent from the table), the matrix in all five pages must move in lockstep — the Arch / Fedora / Ubuntu / Windows pages all carry the same matrix verbatim. The macOS page deliberately omits the matrix (QSV unsupported on macOS).
  • On upstream sync: no action required — Netflix/vmaf upstream does not ship per-OS install pages under docs/getting-started/install/; that tree is fork-only.

# Lint the install pages (markdownlint via pre-commit):
pre-commit run --files docs/getting-started/install/*.md
# Verify each page (except alpine + macos) still carries the matrix:
for f in arch fedora ubuntu windows; do
  grep -q 'Arc Battlemage' "docs/getting-started/install/${f}.md" || echo "MISSING: ${f}"
done
# Confirm the macOS page documents QSV as unsupported:
grep -q 'Intel QSV. is unsupported on macOS' docs/getting-started/install/macos.md

0333 — vmaf-tune Phase F multi-pass encoding (ADR-0333)

Touches:

  • tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (CodecAdapter Protocol gains supports_two_pass: bool + two_pass_args(...))
  • tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (overrides both)
  • tools/vmaf-tune/src/vmaftune/encode.py (EncodeRequest gains pass_number / stats_path; build_ffmpeg_command adds the 2-pass argv splice + pass-1 null-muxer redirect; new run_two_pass_encode)
  • tools/vmaf-tune/src/vmaftune/corpus.py (CorpusOptions.two_pass, routing in iter_rows)
  • tools/vmaf-tune/src/vmaftune/cli.py (--two-pass flag on corpus / recommend subparsers)

Invariant: 2-pass encoding routes through the codec adapter via supports_two_pass + two_pass_args(pass_number, stats_path). The encode driver never branches on codec name. Adapters with supports_two_pass = False are honoured without failing (single-pass fallback with a stderr warning); the seam is open for sibling codec adapters (libx264, libsvtav1, libvvenc, libaom-av1) to opt in by overriding the two methods on their adapter file alone. This is the fork-local extension to the ADR-0237 Phase A multi-codec contract; upstream Netflix/vmaf has no equivalent and does not own this code path.

Re-test:
cd tools/vmaf-tune
python -m pytest tests/test_codec_adapter_x265_two_pass.py -q

(Optional, requires ffmpeg + libx265 in the runner's PATH:)

VMAF_TUNE_INTEGRATION=1 python -m pytest \
  tests/test_codec_adapter_x265_two_pass.py::test_real_x265_two_pass_smoke -q

Rebase-sensitivity: zero from upstream — tools/vmaf-tune/ is fork-local. The only concern is the codec_adapters Protocol shape: a future upstream commit that adds a sibling codec adapter SHOULD inherit the supports_two_pass = False default and either explicitly opt in or leave the flag off. Downstream sibling-codec PRs in this fork should follow the ADR-0288 / ADR-0333 pattern: one adapter file, override the two methods, add a test file mirroring test_codec_adapter_x265_two_pass.py.
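
The adapter seam described above can be sketched as follows. Everything here is illustrative: the class names, `plan_passes`, and the exact argv are hypothetical, and the real Protocol in codec_adapters/__init__.py may carry more members or a different signature. The `-x265-params pass=N:stats=PATH` spelling is an assumption about how an x265 adapter might express the two passes, not a copy of what x265.py emits.

```python
import sys
from pathlib import Path
from typing import Protocol, Sequence


class CodecAdapter(Protocol):
    """Sketch of the two members the invariant names."""
    supports_two_pass: bool

    def two_pass_args(self, pass_number: int, stats_path: Path) -> Sequence[str]: ...


class SinglePassOnly:
    """Hypothetical adapter that keeps the default and never opts in."""
    supports_two_pass = False

    def two_pass_args(self, pass_number: int, stats_path: Path) -> Sequence[str]:
        return ()


class X265Like:
    """Hypothetical adapter that opts in by overriding both members."""
    supports_two_pass = True

    def two_pass_args(self, pass_number: int, stats_path: Path) -> Sequence[str]:
        return ("-x265-params", f"pass={pass_number}:stats={stats_path}")


def plan_passes(adapter: CodecAdapter, stats_path: Path) -> list[tuple[str, ...]]:
    """Return the extra argv for each pass without inspecting the codec name."""
    if not adapter.supports_two_pass:
        print("two-pass not supported; falling back to single pass", file=sys.stderr)
        return [()]
    return [tuple(adapter.two_pass_args(n, stats_path)) for n in (1, 2)]
```

The driver only reads the flag and calls the method, so a sibling adapter can opt in from its own file without any change to the encode path.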

ADR-0360 — CAMBI CUDA port (T3-15a, 2026-05-09)

Files pinned:

  • core/src/feature/cuda/integer_cambi_cuda.c (new)
  • core/src/feature/cuda/integer_cambi_cuda.h (new)
  • core/src/feature/cuda/integer_cambi/cambi_score.cu (new)
  • core/src/feature/feature_extractor.c (added vmaf_fex_cambi_cuda to list)
  • core/src/meson.build (added cambi_score to cuda_cu_sources, added integer_cambi_cuda.c to CUDA feature sources)

Why: The CUDA twin of vmaf_fex_cambi (Strategy II hybrid — three GPU kernels for the embarrassingly parallel stages; calculate_c_values + topK on CPU). Registers vmaf_fex_cambi_cuda under #if HAVE_CUDA guard.

Rebase-sensitivity: low. The three new files are wholly fork-local and will not conflict. The two upstream-shared files have small, self-contained hunks:

  • feature_extractor.c: the extern vmaf_fex_cambi_cuda declaration and the &vmaf_fex_cambi_cuda array entry are inside a #if HAVE_CUDA block. Upstream's additions to this file (new feature extractors, new dispatch flags) will not conflict unless Netflix adds their own CUDA twin for CAMBI (unlikely — they don't ship a CUDA backend).
  • meson.build: the cambi_score entry in the cuda_cu_sources dict and the integer_cambi_cuda.c line in the CUDA sources list. Any upstream changes to meson.build that restructure the cuda_cu_sources dict would require a manual merge; the dict entries are sorted alphabetically by key, so cambi_score lands between adm_score and motion_score.

If upstream adds cambi_cuda themselves: drop the fork copy and check for API divergence. Strategy II hybrid is the natural choice; the upstream implementation may differ if they choose Strategy III (fully-on-GPU calculate_c_values).

cambi_internal.h dependency: integer_cambi_cuda.c includes core/src/feature/cambi_internal.h (fork-added trampoline exposing cambi.c's static helpers). If upstream significantly refactors cambi.c (renames vmaf_cambi_preprocessing, vmaf_cambi_calculate_c_values, etc.), cambi_internal.h must be updated alongside. This is the same dependency the Vulkan twin (cambi_vulkan.c) has — see ADR-0210's rebase note for the full list of exposed functions.

Vulkan submit-pool PR-B: six secondary kernels (2026-05-09, ADR-0353)

Files changed:

  • core/src/feature/vulkan/ssim_vulkan.c
  • core/src/feature/vulkan/ciede_vulkan.c
  • core/src/feature/vulkan/ms_ssim_vulkan.c
  • core/src/feature/vulkan/motion_v2_vulkan.c
  • core/src/feature/vulkan/float_psnr_vulkan.c
  • core/src/feature/vulkan/float_motion_vulkan.c
  • core/src/feature/vulkan/AGENTS.md
  • docs/adr/0353-vulkan-submit-pool-pr-b-six-kernels.md

Why this rebase-note exists: six Vulkan host-glue TUs were migrated from per-frame command-buffer and descriptor-set allocation to the VmafVulkanKernelSubmitPool abstraction (ADR-0256). Any Netflix upstream sync that touches these same files (unlikely — they are fork-local) must preserve the VmafVulkanKernelSubmitPool fields in the state struct and the pool-destroy-before-pipeline-destroy ordering in close_fex().

Rebase-sensitivity: low. All six files are entirely fork-local; Netflix upstream does not have a Vulkan backend. The submit-pool API is defined in core/src/vulkan/kernel.h (also fork-local). No public header or C-API surface was changed; the FFmpeg patch series is unaffected.

Key invariant to preserve on rebase: vmaf_vulkan_kernel_submit_pool_destroy MUST be called before vmaf_vulkan_kernel_pipeline_destroy in every migrated kernel's close_fex(). See core/src/feature/vulkan/AGENTS.md §"Submit-pool ordering invariant".

0354 — Vulkan submit-pool PR-C: submit_pool_destroy-before-pipeline ordering

  • Touches: core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vulkan/float_ansnr_vulkan.c, core/src/feature/vulkan/moment_vulkan.c.
  • Invariant: In every migrated extractor, vmaf_vulkan_kernel_submit_pool_destroy() MUST precede every vmaf_vulkan_kernel_pipeline_destroy() call in close_fex(). Reversing the order frees the pool's command buffers after the pipeline's command pool is destroyed — undefined behaviour per Vulkan spec §6.2.
  • Re-test: meson test -C build --suite=vulkan passes. scripts/ci/cross_backend_vif_diff.py shows places=4 for all four extractors on all three target devices (RTX 4090, Arc A380, RADV iGPU).
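
The ordering invariant lends itself to a cheap textual check alongside the runtime tests. The helper below is a hypothetical sketch, not part of the repository: it assumes it is given only the body of a kernel's close_fex() and simply compares the positions of the two destroy calls.

```python
import re

POOL_DESTROY = re.compile(r"vmaf_vulkan_kernel_submit_pool_destroy\s*\(")
PIPELINE_DESTROY = re.compile(r"vmaf_vulkan_kernel_pipeline_destroy\s*\(")


def destroy_order_ok(close_fex_body: str) -> bool:
    """True when every submit-pool destroy precedes every pipeline destroy."""
    pool_calls = [m.start() for m in POOL_DESTROY.finditer(close_fex_body)]
    pipeline_calls = [m.start() for m in PIPELINE_DESTROY.finditer(close_fex_body)]
    if not pool_calls or not pipeline_calls:
        # A migrated kernel is expected to contain both calls.
        return False
    return max(pool_calls) < min(pipeline_calls)
```

A check like this only catches textual reordering; it does not replace the meson test suite or the cross-backend run listed in the re-test bullet.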

0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0291)

0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0352)

  • Touches: core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/psnr_vulkan.c (all fork-local Vulkan kernels; no upstream C paths touched), changelog.d/changed/vulkan-submit-pool-pr-a-adm-motion-psnr.md, docs/adr/0291-vulkan-submit-pool-pr-a-adm-motion-psnr.md.
  • Invariant: Each migrated TU adds VmafVulkanKernelSubmitPool sub_pool and pre-allocated VkDescriptorSet field(s) to its state struct. The pool must be destroyed (vmaf_vulkan_kernel_submit_pool_destroy) before vmaf_vulkan_kernel_pipeline_destroy in close_fex(); reversing the order would destroy the descriptor pool while the submit pool still holds live command buffer + fence references. Descriptor sets allocated via vmaf_vulkan_kernel_descriptor_sets_alloc are freed implicitly by the descriptor pool tear-down — do NOT call vkFreeDescriptorSets on them in close_fex(). For motion_vulkan, the pre-allocated set is rebound once per frame via vkUpdateDescriptorSets because the blur ping-pong changes which blur[] slot is "current"; for adm_vulkan and psnr_vulkan the sets are stable after init() and require no per-frame update.
  • Upstream interaction: none. All three files are fork-local Vulkan kernel TUs not present in Netflix/vmaf upstream.
  • On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths. The Vulkan backend is entirely fork-introduced.
  • Re-test on rebase:
meson test -C build --suite=fast
# Cross-backend parity gate (places=4):
python python/test/cross_backend_diff.py \
    --features adm motion psnr \
    --backend vulkan cpu \
    --places 4 \
    --yuv testdata/yuv/src01_hrc00_576x324.yuv \
            testdata/yuv/src01_hrc01_576x324.yuv

ADR-0350 — FFmpeg libvmaf filter CUDA backend selector (0010 patch)

Patch: ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch.

  • libavfilter/vf_libvmaf.c — adds cuda AVOption + state field + init / cleanup / picture-pool wiring under CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER.
  • configure — adds --enable-libvmaf-cuda (EXTERNAL_LIBRARY_LIST entry + help text), promotes libvmaf_cuda from blanket-autodetect to gated enabled libvmaf_cuda && require_pkg_config + check, preserves the enabled libvmaf && check_pkg_config libvmaf_cuda in-filter probe so the new selector still works without the explicit flag when libvmaf ships CUDA.

Why this rebase-note exists: Patch 0010 extends the SYCL (0003) / Vulkan (0004) per-context backend selectors to CUDA on the regular libvmaf filter. The patch coexists with the upstream dedicated libvmaf_cuda filter (CONFIG_LIBVMAF_CUDA_FILTER) by gating its struct field and code paths on !CONFIG_LIBVMAF_CUDA_FILTER — the dedicated filter keeps owning its own cu_state field. CLAUDE.md §12 r14 makes the patch update mandatory because the change touches a filter consumer of the vmaf_cuda_state_init / _import_state / _state_free / _preallocate_pictures / _fetch_preallocated_picture C-API surface in libvmaf_cuda.h.

Rebase-sensitivity: low. The patch's vf_libvmaf.c hunks are context-anchored on the SYCL/Vulkan selector blocks; if upstream FFmpeg renames CONFIG_LIBVMAF_CUDA_FILTER or moves the libvmaf_cuda.h include, the include guard at the top of the file needs the corresponding update. The configure hunks are context-anchored on the existing --enable-libvmaf-sycl / --enable-libvmaf-vulkan lines — those have proven stable across n8.0 → n8.1 → n8.1.1, so drift risk is low. When VmafCudaConfiguration ever grows a device_index field upstream, swap the cuda boolean for an int cuda_device mirroring SYCL's shape (separate ADR + patch refresh).

Verification gate: cumulative git am --3way replay of ffmpeg-patches/000{1..9}-*.patch + 0010-* against pristine FFmpeg n8.1.1 PASS (2026-05-09). Build of libavfilter/vf_libvmaf.o PASS under both CONFIG_LIBVMAF_CUDA=0 (selector errors at filter-init time per #else branch) and CONFIG_LIBVMAF_CUDA=1 && !CONFIG_LIBVMAF_CUDA_FILTER (selector active, picture-pool wiring compiles).

0320 — Vulkan instance / VMA apiVersion bump to 1.4 (Step B)

  • Touches: core/src/vulkan/common.c, core/src/vulkan/vma_impl.cpp, core/src/vulkan/AGENTS.md.
  • Invariant: the four apiVersion sites (lines 54, 264, 374 of common.c; line 22 of vma_impl.cpp) request Vulkan 1.4, not 1.3. Together with the Step-A precise decorations in vif.comp / ciede.comp (PR #346) and the Phase-3 cross-subgroup release-acquire fix (PR #511), this gates the cross-backend places=4 contract on Arc + RADV. NVIDIA closure depends on Phase 3c (PR #512; block-on-merge until that lands). Netflix upstream does not carry a VMA dependency or a Vulkan backend; no upstream merge conflict expected on these files.
  • Re-test on rebase:
meson setup build -Denable_vulkan=enabled -Denable_cuda=false \
  -Denable_sycl=false --buildtype=release
ninja -C build
for D in 0 1 2; do
  python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --backends cpu vulkan --vulkan-device "$D" \
    --features vif ciede adm motion psnr
done
# All 0/N mismatches at places=4 once Phase 3c (PR #512) has landed.

ADR-0332 v2 runtime (T5-2c) — Embedded MCP server UDS + real compute_vmaf (2026-05-09)

  • Touches: core/src/mcp/{mcp.c,dispatcher.c,mcp_internal.h,meson.build,compute_vmaf.c,transport_uds.c}, core/test/test_mcp_smoke.c. All paths are fork-local. No new third-party vendor drop in v2 — mongoose vendoring stays deferred to v3 with the SSE transport.
  • Invariant: same as ADR-0209 v1 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only function bodies flipped — vmaf_mcp_start_uds from -ENOSYS to a working AF_UNIX listener; compute_vmaf from a {"status":"deferred_to_v2"} placeholder to a real vmaf_score_pooled binding). Per ADR-0128 § operational guardrails the UDS socket file is created mode 0700; that chmod happens in vmaf_mcp_start_uds after bind and is a load-bearing security invariant — do NOT relax it on rebase. compute_vmaf runs on a per-call ephemeral VmafContext so the host's main scoring run is unperturbed; do NOT rewire it to reuse server->ctx because vmaf_score_pooled commits the model destructively to the context.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. If upstream adds one, expect a port-only sync since names will collide.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
                                -Denable_mcp=true -Denable_mcp_stdio=true \
                                -Denable_mcp_uds=true
ninja -C build && meson test -C build test_mcp_smoke -v
# Real-score smoke (single 576x324 pair):
build/test/test_mcp_smoke 2>&1 | tail -3   # expects "16 tests run, 16 passed"
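
The bind-then-chmod sequence behind the 0700 invariant can be illustrated with a Python analogue. This is not the fork's C implementation of vmaf_mcp_start_uds; the function name and the backlog of 1 are hypothetical, and the sketch assumes a POSIX host with AF_UNIX support.

```python
import os
import socket


def start_uds_listener(path: str) -> socket.socket:
    """Bind an AF_UNIX listener and restrict the socket file to its owner."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)        # creates the socket file using the process umask
        os.chmod(path, 0o700)  # tighten permissions after bind
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock
```

Because bind() derives the initial mode from the umask, dropping the chmod would leave the socket as permissive as the environment allows, which is why that step must survive a rebase.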

ADR-0332 v3 runtime (T5-2d) — Embedded MCP server SSE transport (2026-05-09)

  • Touches: core/src/mcp/{mcp.c,mcp_internal.h,meson.build,transport_sse.c}, core/meson_options.txt, core/test/test_mcp_smoke.c, docs/mcp/embedded.md, docs/adr/0332-mcp-runtime-v2.md (status-update appendix). All paths are fork-local. No third-party vendor drop in v3 — the originally-planned mongoose vendor was reversed because cesanta/mongoose 7.18 is GPL-2.0-only OR commercial, incompatible with the fork's BSD-3-Clause-Plus-Patent license (verified at upstream LICENSE 2026-05-09). The SSE transport is plain POSIX sockets in fork-owned C (~500 LOC).
  • Invariant: same as ADR-0209 / ADR-0332 v2 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only vmaf_mcp_start_sse's body flipped from -ENOSYS to a working AF_INET listener). The SSE listener binds INADDR_LOOPBACK only; do NOT switch to INADDR_ANY without a separate ADR + auth design (v3 ships intentionally without CORS/Bearer/per-session auth on the assumption of a same-host trust boundary). The SSE stop path uses shutdown(SHUT_RDWR) before close() — plain close() of an AF_INET listening fd from another thread does NOT unblock accept() on Linux; do NOT remove the shutdown call. enable_mcp_sse is now a feature option (default auto), not boolean false.
  • On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. Do NOT re-introduce mongoose (or any GPL-licensed HTTP library) on a future rebase without first amending CLAUDE §1 and adding a separate license-compatibility ADR.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
                                -Denable_mcp=true -Denable_mcp_stdio=true \
                                -Denable_mcp_uds=true \
                                -Denable_mcp_sse=enabled
ninja -C build && meson test -C build test_mcp_smoke -v
build/test/test_mcp_smoke 2>&1 | tail -3   # expects "17 tests run, 17 passed"

Status update 2026-05-09 — placeholder-ref hardening

  • Additional touches: same set as the 2026-05-08 ADR-0334 entry, no new files. The hardening adds a git diff -U0 ... -- docs/state.md call inside scripts/ci/state-md-touch-check.sh (case 4a) plus 10 additional fixture cases in scripts/ci/test-state-md-touch-check.sh.
  • New invariant: inserted lines in docs/state.md (lines starting with +, excluding the +++ b/... header) must not contain this PR / this commit / bare TBD / <PR> / #NNN. Canonical accept forms are PR #N and commit `<sha>`. The placeholder vocabulary is coupled to PR #541's audit findings — reword in lockstep with the ADR-0334 status-update appendix if the fork's row template changes.
  • Re-test on rebase: same bash scripts/ci/test-state-md-touch-check.sh run as the 2026-05-08 entry; the harness now reports 18/18 passed (was 8/8 passed).
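
The placeholder rule can be sketched as a scan over `git diff -U0` output. The regex and function name below are hypothetical and case-sensitive by choice; the authoritative vocabulary and matching rules are the ones in scripts/ci/state-md-touch-check.sh.

```python
import re

# Hypothetical rendering of the placeholder vocabulary listed in the invariant.
PLACEHOLDER = re.compile(r"this PR|this commit|\bTBD\b|<PR>|#NNN")


def placeholder_lines(diff_text: str) -> list[str]:
    """Return inserted lines (without the leading '+') that carry a placeholder."""
    hits = []
    for line in diff_text.splitlines():
        if line.startswith("+++"):
            continue  # file header such as "+++ b/docs/state.md"
        if line.startswith("+") and PLACEHOLDER.search(line[1:]):
            hits.append(line[1:])
    return hits
```

Removed lines and context are ignored, so a stale placeholder that a PR deletes does not trip the check; only newly inserted rows are held to the canonical `PR #N` form.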

0347 — Sanitizer matrix test-set scope (ADR-0347)

  • Touches: .github/workflows/tests-and-quality-gates.yml job sanitizers (build + test step), core/test/meson.build (no edits — the absence of any suite: 'unit' tag is the upstream state we now work with rather than against).
  • Invariant: the sanitizer job runs the full C unit-test set per sanitizer with a per-sanitizer deselect list driven by a case block on ${{ matrix.sanitizer }}. The deselect lists are load-bearing — each entry corresponds to a real bug tracked in docs/state.md. Under UBSan the build adds -Dc_args=-fno-sanitize=function -Dcpp_args=-fno-sanitize=function to suppress the K&R-prototype harness UB; the meson case branch must keep this build flag in sync with the test deselect entries. An upstream rebase that adds new test files via core/test/meson.build inherits full sanitizer coverage automatically (the workflow enumerates tests via meson test --list).
  • On upstream sync: if upstream Netflix lands a suite: 'unit' tagging convention, the workflow is robust to it (we already enumerate from meson test --list, not from --suite=unit). If upstream rewrites the harness to declare static char *test_X(void) with a (void) parameter, the -fno-sanitize=function flag becomes redundant — leave it in place (zero cost) until a deliberate cleanup PR reverts the suppression. If upstream lands a fix for any of the surfaced defects (SVMModelParser validation, feature_collector metadata leak, integer_adm::div_lookup race, framesync mutex mismatch), drop the corresponding deselect row from the workflow's case block in the same PR that pulls the upstream fix.
cd libvmaf
for SAN in address undefined thread; do
  EXTRA=()
  [ "$SAN" = undefined ] && EXTRA=( "-Dc_args=-fno-sanitize=function" "-Dcpp_args=-fno-sanitize=function" )
  rm -rf "build-$SAN"
  CC=clang CXX=clang++ LDFLAGS=-fuse-ld=lld \
    meson setup "build-$SAN" -Db_sanitize="$SAN" \
    -Denable_cuda=false -Denable_sycl=false --buildtype=debug \
    -Db_lto=false -Db_lundef=false "${EXTRA[@]}"
  meson compile -C "build-$SAN"
  case "$SAN" in
    address) EXCLUDE='test_model$|test_predict$|test_float_ms_ssim_min_dim$' ;;
    undefined) EXCLUDE='test_model$' ;;
    thread) EXCLUDE='test_model$|test_pic_preallocation$|test_framesync$' ;;
  esac
  TESTS=$(meson test -C "build-$SAN" --list \
    | grep '^libvmaf:' \
    | grep -vE "$EXCLUDE" \
    | sed 's/^libvmaf://')
  meson test -C "build-$SAN" --print-errorlogs $TESTS
done
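
The per-sanitizer deselect step (the grep / grep -vE / sed pipeline) can be mirrored in Python to reason about which tests a given list removes. This is a sketch: `select_tests` is a hypothetical helper, and it assumes `meson test --list` prints one `libvmaf:<test name>` entry per line, as the shell pipeline implies.

```python
import re

# Deselect patterns copied from the case block in the reproducer.
EXCLUDE = {
    "address": r"test_model$|test_predict$|test_float_ms_ssim_min_dim$",
    "undefined": r"test_model$",
    "thread": r"test_model$|test_pic_preallocation$|test_framesync$",
}
PREFIX = "libvmaf:"


def select_tests(listing: str, sanitizer: str) -> list[str]:
    """Return the test names left after applying the sanitizer's deselect list."""
    exclude = re.compile(EXCLUDE[sanitizer])
    selected = []
    for line in listing.splitlines():
        if not line.startswith(PREFIX):
            continue  # same effect as grep '^libvmaf:'
        if exclude.search(line):
            continue  # same effect as grep -vE "$EXCLUDE"
        selected.append(line[len(PREFIX):])  # same effect as the sed prefix strip
    return selected
```

The `$` anchors matter: `test_model$` removes `test_model` but keeps a longer name such as `test_model_extra`, so a new upstream test is only excluded if its name ends exactly like a listed one.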

CodeQL bulk mechanical sweep — Python tree (2026-05-09)

  • Why this matters on rebase: no rebase impact. The diff lives entirely in python/vmaf/ and one fork-local helper (core/src/vulkan/spv_embed.py). None of the touched Python modules have been changed by Netflix upstream in over four years; the closest churn is unrelated additions to python/vmaf/script/run_*.py driver flags. A future /sync-upstream will land on a clean tree.
  • What changed: dead imports removed; exit() → sys.exit() in seven CLI driver scripts; open(...) → with open(...) in python/vmaf/tools/decorator.py and core/src/vulkan/spv_embed.py; typed except KeyError: pass bodies got an explanatory one-line comment to satisfy py/empty-except; pass removed where it was a no-op tail statement; one commented-out debug block deleted from tools/misc.py.
  • Re-test on rebase: python3 -c "import ast; [ast.parse(open(f).read()) for f in (...)]" over the touched files; ruff check over the same set must produce no NEW errors versus master baseline.
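
The one-liner in the re-test step can be expanded into a small helper that reports which file failed to parse. This is a sketch only: the helper name `syntax_failures` is an assumption, and the temporary files stand in for the touched-file list, which the entry above elides.

```python
import ast
import tempfile
from pathlib import Path


def syntax_failures(paths):
    """Parse each file with ast and return {path: message} for files that fail."""
    failures = {}
    for path in paths:
        try:
            # Context manager closes the handle, matching the sweep's own fix.
            with open(path, encoding="utf-8") as fh:
                ast.parse(fh.read(), filename=str(path))
        except SyntaxError as exc:
            failures[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return failures


# Demonstration on throwaway files (stand-ins for the real touched files).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    bad = Path(tmp) / "bad.py"
    good.write_text("import sys\nsys.exit(0)\n", encoding="utf-8")
    bad.write_text("def broken(:\n    pass\n", encoding="utf-8")
    report = syntax_failures([good, bad])
    print(sorted(Path(p).name for p in report))
```

A non-empty result means the sweep broke a file; an empty dict is the pass condition.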

0345 — cambi × {CUDA, SYCL, HIP} GPU port planning (ADR-0345, docs-only)

  • Touches: docs/research/0091-cambi-gpu-port-planning-2026-05-09.md (new), docs/adr/0345-cambi-gpu-port-strategy.md (new), docs/adr/_index_fragments/0345-cambi-gpu-port-strategy.md (new fragment), docs/adr/_index_fragments/_order.txt (append slot), changelog.d/changed/cambi-gpu-planning-digest.md (new). No code. Companion to the per-port PRs that follow per the digest's §6 ordered plan (CUDA → SYCL → HIP).
  • Upstream source: none — fork-local planning artefact. Netflix/vmaf upstream has no CUDA / SYCL / HIP cambi twin and no plans to add one on those backends.
  • Invariant: the planning round locks Strategy II host-staged hybrid for the three pending backends, inheriting verbatim from ADR-0205 §Decision and ADR-0210 §Decision. The cross-backend gate contract for cambi is places=4 from day one on all backends — by construction (integer-only GPU pre-passes; byte-identical readback; unmodified host residual). If any per-port PR sees empirical drift from CPU, fix the kernel — never relax the gate (memory feedback_no_test_weakening). The shared cambi_internal.h host residual surface (shipped with PR #196 for the Vulkan port) is the load-bearing reuse point — all four GPU twins (Vulkan, CUDA, SYCL, HIP) link against it and inherit any future CPU-side c-value formula change automatically.
  • On upstream sync: no action required. If a future upstream sync introduces a Netflix/vmaf cambi GPU twin (extremely unlikely — Netflix has no public CUDA / SYCL / HIP cambi work), evaluate whether to drop the fork's twin in favour of upstream's per the standard prefer-upstream rule; otherwise no action.
  • Re-test on rebase: docs-only — no compile / runtime gate. The Strategy III v2 follow-up (parked per ADR-0205 §Out of scope) gets its own ADR + rebase-notes entry when profile data lands.
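
As an illustration of what a places=4 gate means numerically, the sketch below assumes the gate uses the same rule as Python's unittest assertAlmostEqual (round the absolute difference to the given number of decimal places and compare with zero). The function name and the sample scores are hypothetical; the actual gate script may differ in detail.

```python
def agrees_to_places(cpu_score: float, gpu_score: float, places: int = 4) -> bool:
    """True when the two scores differ by nothing after rounding to `places` decimals."""
    return round(abs(cpu_score - gpu_score), places) == 0


# Hypothetical scores: a 1e-5 drift passes places=4, a 2e-4 drift does not.
print(agrees_to_places(1.23456, 1.23457))
print(agrees_to_places(1.2345, 1.2347))
```

Under this rule any drift that reaches the fourth decimal place fails, which is why the entry insists on fixing the kernel rather than lowering places.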

0320 — Vulkan VIF API-1.4 NVIDIA residual Phase 3b (deferral)

  • Touches: core/src/feature/vulkan/shaders/vif.comp (comment-only update at the Phase-4 reduction site — documents the Phase-3b candidate-fix experiments and the driver-side hypothesis; no code logic change vs. PR #511); docs/adr/0269-vif-ciede-precise-step-a.md (appended Phase-3b status update appendix; ADR body remains frozen per ADR-0028); docs/research/0090-...md (new); docs/state.md (row T-VK-VIF-1.4-RESIDUAL-ARC retired in favour of T-VK-VIF-1.4-RESIDUAL-NVIDIA-DEFERRED after the hardware-mapping correction); core/src/vulkan/AGENTS.md (Phase 3b update + rebase invariant for cross-backend gate device-name selection); changelog.d/fixed/vif-arc-mesa-anv-int64-reduction.md (new fragment).
  • Invariant: the workgroup-scope memoryBarrierShared(); barrier(); pair PR #511 introduced is load-bearing for the Arc + RADV lanes at API 1.4 and stays. Phase 3b confirmed it cannot be downgraded back to a bare barrier() even if the NVIDIA residual ever closes — Arc's clean state is contingent on the workgroup-scope pair.
  • Cross-backend gate device-selection invariant (NEW): scripts that target a specific Vulkan vendor must select by deviceName substring, not by --vulkan_device <index>. vmaf_vulkan_context_new's device sort is stable inside the same devtype_score bucket and the vkEnumeratePhysicalDevices enumeration order is host-policy-dependent (driver registration order in /etc/vulkan/icd.d/, Mesa device-select layer, VK_LOADER_* env vars). PR #511's commit message inverted the device map on this fork's CI workstation; the empirical numbers it cited as "NVIDIA" actually came from Arc and vice versa. New cross-backend lanes targeting a specific vendor should not inherit the off-by-one.
  • On upstream sync: vif.comp is fork-local; no upstream Netflix/vmaf has a Vulkan path. Cherry-picks from upstream cannot reach this file.
  • Re-test on rebase (assumes a multi-GPU CI workstation with NVIDIA + Arc + RADV; lavapipe-only CI lanes are a no-op for the API-1.4 residual since lavapipe never reproduced the bug):

# Local API-1.4 bump (off-master reproducer; do NOT commit).
sed -i 's/VK_API_VERSION_1_3/VK_API_VERSION_1_4/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1003000/VMA_VULKAN_VERSION 1004000/' \
  core/src/vulkan/vma_impl.cpp
cd libvmaf && meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=false -Denable_sycl=false && ninja -C build
cd ..

# NVIDIA lane: expected 45/48 FAIL scale 2 until either the
# manual int64 subgroup-reduction patch lands or NVIDIA fixes
# the driver. Arc + RADV expected 0/48.
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature vif --backend vulkan --device

# Revert local bump after testing.
sed -i 's/VK_API_VERSION_1_4/VK_API_VERSION_1_3/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1004000/VMA_VULKAN_VERSION 1003000/' \
  core/src/vulkan/vma_impl.cpp
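
The device-selection invariant in this entry (pick by deviceName substring, never by enumeration index) can be sketched as follows. The helper name and the device-name strings are hypothetical examples; a real script would obtain the names from the Vulkan loader rather than a literal list.

```python
def pick_device_index(device_names: list[str], wanted: str) -> int:
    """Return the index of the single device whose name contains `wanted` (case-insensitive)."""
    needle = wanted.lower()
    matches = [i for i, name in enumerate(device_names) if needle in name.lower()]
    if len(matches) != 1:
        # Refuse to guess: zero or several matches means the substring is not specific enough.
        raise LookupError(f"{wanted!r} matched {len(matches)} devices: {device_names}")
    return matches[0]


# Two hypothetical enumeration orders for the same workstation.
order_a = ["NVIDIA GeForce RTX 4090", "Intel(R) Arc(tm) A380 Graphics (DG2)",
           "AMD Radeon RX 7900 XTX (RADV NAVI31)"]
order_b = ["Intel(R) Arc(tm) A380 Graphics (DG2)", "NVIDIA GeForce RTX 4090",
           "AMD Radeon RX 7900 XTX (RADV NAVI31)"]
print(pick_device_index(order_a, "arc"), pick_device_index(order_b, "arc"))
```

The same substring resolves to different indices under the two orders, which is the failure mode a hard-coded index would hit.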

Upstream-port-later batch — Research-0090 18-commit triage close-out (2026-05-09)

  • Touches: docs/state.md (one row in "Deferred (waiting on external trigger)"), this file, changelog.d/changed/upstream-port-later-batch-2026-05-09.md. No code touched. Companion to PR #446 (Research-0090) and the in-flight PRs #497 (MyTestCase super-PR), #443 / #444 (cambi-docs duplicate pair).
  • Per-commit classification (input set: 18 PORT_LATER SHAs from Research-0090):
| # | Upstream SHA | Subject (truncated) | Verdict | Reopen / forward path |
|---|---|---|---|---|
| 1 | 38e905d1 | adopt MyTestCase + reformat BD-rate test data | PORT_DEFERRED | Subsumed by PR #497 commit e1dbdc09; close out when #497 merges |
| 2 | 005988ea | adopt MyTestCase + port new tests + align fifo_mode | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 3 | 4679db83 | fix VMAFEXEC_score tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit 0004d2cf; must preserve fork's golden places= values byte-for-byte (CLAUDE §8 / ADR-0024) |
| 4 | 3e075107 | adopt MyTestCase + update score values in vmafexec tests | PORT_DEFERRED | Subsumed by PR #497 commit 0004d2cf; close out when #497 merges |
| 5 | e3827e4d | adopt MyTestCase + port new tests in asset/bootstrap/local_explainer | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 6 | 25ff9f18 | remove empty VmafossexecCommandLineTest stub | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 13-line deletion. PR #497 currently RE-EMITS the stub; once #497 lands, cherry-pick this commit standalone (zero-conflict against post-#497 tip). |
| 7 | 3a041a97 | adopt MyTestCase + update score values | PORT_DEFERRED | Subsumed by PR #497 commit d52d9221; close out when #497 merges |
| 8 | ead2d12b | fix vif_scale3 + adm3_egl_1 tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit b5a3f61b; Netflix-golden tolerance guard same as row 3 |
| 9 | 6c097fc4 | reduce ADM/VIF tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit f3881d5c; Netflix-golden tolerance guard same as row 3 |
| 10 | 7df50f3a | align testutil with full set of fixture functions | PORT_DEFERRED | Subsumed by PR #497 commit f1ae0495; close out when #497 merges |
| 11 | 322ca041 | replace temporal slicing with pre-sliced YUV fixtures | PORT_DEFERRED | Subsumed by PR #497 commit 7d9d9a10; close out when #497 merges. Sequencing matters: this commit must land before rows 12, 14, 15, 17 (the YUV-fixture consumers); #497 already orders them correctly. |
| 12 | 74bdce1b | align vmafexec_feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 07e7cb48; close out when #497 merges |
| 13 | a3776335 | align feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 15a6874d; close out when #497 merges |
| 14 | 0341f730 | remove duplicate test_run_vmaf_integer_fextractor | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 76-line deletion. Same disposition as row 6: #497 currently re-emits the duplicate; cherry-pick standalone after #497. |
| 15 | 9fa593eb | port feature_extractor tests for aim/adm3/motion3 + new options | PORT_DEFERRED | Subsumed by PR #497 commit ab21b694; close out when #497 merges |
| 16 | d93495f5 | reduce tolerance for VMAF scores in quality_runner tests | PORT_DEFERRED w/ Netflix-golden guard | PR #497; Netflix-golden tolerance guard same as row 3 |
| 17 | 7d1ad54b | port feature extractor tests for aim/adm3/motion3 | PORT_DEFERRED | Subsumed by PR #497 commit 44b9e626; close out when #497 merges |
| 18 | 721569bc | resource/doc: cambi_high_res_speedup + motion2 score | PORT_DEFERRED → DEDUP | Already in flight on TWO branches (PR #443 + PR #444). Maintainer picks one and abandons the other per Research-0090 §Recommended action #4. No third port-PR opened. |
  • Invariant: after PR #497 merges, the Research-0090 PORT_LATER bucket reduces to exactly two follow-up cherry-picks against post-#497 master:
    • git cherry-pick 25ff9f18 (delete empty VmafossexecCommandLineTest).
    • git cherry-pick 0341f730 (delete duplicate test_run_vmaf_integer_fextractor).
    Both are pure deletions on python/test/command_line_test.py and python/test/feature_extractor_test.py respectively; no score change, no Netflix-golden interaction. They were excluded from PR #497 because the v2 super-PR's diff state currently RE-EMITS those identifiers (likely because #497 cherry-picked from an earlier upstream tip than 25ff9f18 / 0341f730).
  • Netflix-golden guard (binding): per CLAUDE §8 / ADR-0024, the three Netflix CPU golden pairs in python/test/quality_runner_test.py, vmafexec_test.py, vmafexec_feature_extractor_test.py, feature_extractor_test.py, result_test.py (1 normal src01_hrc00↔hrc01 + 2 checkerboard) carry hard-coded assertAlmostEqual rows that are NEVER modified by a fork PR. Upstream commits 4679db83, ead2d12b, 6c097fc4, d93495f5 explicitly LOWER places= on a subset of those rows (their stated motivation is macOS FP precision drift, not a true score change). Reviewer of PR #497 must verify that the 3 golden pairs retain fork tolerances byte-for-byte; only non-golden rows may adopt the relaxations.
  • On upstream sync: future /sync-upstream runs that re-detect these 18 SHAs should match this entry via the SHA list and short-circuit Pass-2 classification (skip re-triage).
  • Re-test on rebase: none required at the time of this commit (no code touched); after the two follow-up cherry-picks (25ff9f18 + 0341f730) eventually land, run:
meson test -C build --suite=fast
make test-netflix-golden   # 3/3 CPU goldens still pass
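
The short-circuit described under "On upstream sync" amounts to a prefix lookup against the 18 abbreviated SHAs classified in this entry. The sketch below is an assumption about how such a check could look, not the /sync-upstream implementation; the function name and the candidate SHAs in the demo are made up.

```python
# The 18 abbreviated upstream SHAs classified in this entry.
TRIAGED_PREFIXES = (
    "38e905d1", "005988ea", "4679db83", "3e075107", "e3827e4d", "25ff9f18",
    "3a041a97", "ead2d12b", "6c097fc4", "7df50f3a", "322ca041", "74bdce1b",
    "a3776335", "0341f730", "9fa593eb", "d93495f5", "7d1ad54b", "721569bc",
)


def split_by_triage(candidate_shas):
    """Split SHAs into (already triaged here, needs fresh Pass-2 classification)."""
    known, fresh = [], []
    for sha in candidate_shas:
        lowered = sha.lower()
        if any(lowered.startswith(prefix) for prefix in TRIAGED_PREFIXES):
            known.append(sha)
        else:
            fresh.append(sha)
    return known, fresh


# Made-up full-length SHAs: one extends a triaged prefix, one does not.
known, fresh = split_by_triage(["25ff9f18" + "ab" * 16, "0123456789abcdef" * 2 + "01234567"])
print(len(known), len(fresh))
```

Matching by prefix lets the check accept full 40-character SHAs as well as the 8-character forms recorded in the table.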

ADR-0357 — Vulkan readback buffer VMA flag separation (PR pending)

What changed: picture_vulkan.{c,h} now exposes two sibling allocation functions: vmaf_vulkan_buffer_alloc (UPLOAD, unchanged) and vmaf_vulkan_buffer_alloc_readback (READBACK, HOST_ACCESS_RANDOM). A new vmaf_vulkan_buffer_invalidate wraps vmaInvalidateAllocation. All 17 feature kernel files under core/src/feature/vulkan/ are updated to use the readback variant for accumulator and partial-sum buffers.

  • core/src/vulkan/picture_vulkan.c: two new functions + shared helper.
  • core/src/vulkan/picture_vulkan.h: two new declarations.
  • All 17 core/src/feature/vulkan/*.c files: alloc and invalidate call sites.

Rebase-sensitivity: low. This is entirely fork-local Vulkan backend code with no upstream Netflix counterpart. If an upstream sync adds new files to core/src/vulkan/ or core/src/feature/vulkan/, new buffers in those files must be classified (UPLOAD vs READBACK) and use the correct allocator per the table in ADR-0350. Conflict risk on the 17 feature files is zero (upstream doesn't touch them).

ADR-0356 — ffmpeg-patches surface-sync CI gate (2026-05-09)

Files added:

  • scripts/ci/ffmpeg-patches-surface-check.sh (new gate script).
  • .github/workflows/rule-enforcement.yml (new ffmpeg-patches-surface-check job).
  • docs/adr/0356-ffmpeg-patches-surface-gate.md (decision record).
  • docs/development/automated-rule-enforcement.md (user-facing doc update).

Why this rebase-note exists: the gate is fork-local CI; it does not touch any upstream-shared file, so an upstream merge cannot drop its enforcement. However, whoever runs the next /sync-upstream should be aware that ffmpeg-patches/ integrity is now machine-checked on every PR — if a future libvmaf header rename slips through during conflict resolution and breaks the patch stack, the gate will fire on the post-sync PR and surface the omission immediately rather than at the next sync.

Rebase-sensitivity: zero on the upstream-merge path. Indirect benefit: the gate hardens ffmpeg-patches/ against silent drift, so the patch-stack invariants tracked elsewhere in this file (entries referencing ffmpeg-patches/0001…0009) are now machine-defended.

0320 — HIP CI lane apt-installs ROCm runtime (ADR-0212 status update)

  • Touches: .github/workflows/libvmaf-build-matrix.yml (HIP lane if: matrix.hip install step + base-deps gate), .github/workflows/required-aggregator.yml (HIP lane added to required-check allow-list). Upstream Netflix/vmaf has no HIP backend and no equivalent CI matrix; conflict probability against upstream/master is zero. Entry exists to flag the rebase-sensitive ROCm-version pin for future maintainers.
  • Invariant: the ROCm version pin (ROCM_VERSION: "7.2.3") in the Install ROCm / HIP runtime step must match the version the maintainer's local box runs against. The apt URL is https://repo.radeon.com/rocm/apt/<ver> — the version is part of the path, so AMD effectively snapshots each ROCm release as its own apt repo. Bumping the pin is a one-line change but requires re-validating that rocm-hip-runtime-dev still pulls the same symbol set; in particular, amdhip64 major-version changes have historically broken dlopen consumers. noble is the codename for ubuntu-24.04, which is what ubuntu-latest resolves to on GitHub-hosted runners as of 2024-04. If ubuntu-latest rolls forward to a newer LTS, the apt repo path component (https://repo.radeon.com/rocm/apt/<ver> <codename> main) needs to be re-checked against https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/install-methods/package-manager/package-manager-ubuntu.html for the current AMD-supported codename list.
  • Re-test on rebase:
# Locally, mirror what CI does (assumes ROCm /opt/rocm install on dev box):
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
./build/test/test_hip_smoke   # passes with device_count == 0
# Apt-side: verify the URL still resolves (versioned path)
curl -sfI https://repo.radeon.com/rocm/apt/7.2.3/dists/noble/Release \
  && echo OK || echo "ROCm apt URL drifted — bump ROCM_VERSION"
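
Because the ROCm version is a path component, the URL probed above is a pure function of the version pin and the Ubuntu codename. A minimal sketch, assuming the layout shown in the curl command; the helper name is hypothetical and the function only builds the string, it performs no network access.

```python
def rocm_release_url(version: str, codename: str) -> str:
    """Build the apt Release URL for one ROCm version snapshot and Ubuntu codename."""
    return f"https://repo.radeon.com/rocm/apt/{version}/dists/{codename}/Release"


# Values taken from the entry above: ROCM_VERSION "7.2.3" and the noble codename.
print(rocm_release_url("7.2.3", "noble"))
```

Bumping the pin or moving to a newer LTS therefore changes the URL, so the curl probe is worth re-running whenever either input changes.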

RN-2026-05-08-cambi-cluster — port 9 of 10 upstream cambi commits

  • Tracked by: ADR-0328, PR feat/upstream-port-cambi-cluster-2026-05-08.
  • Cluster: Netflix upstream commits d655cefe, 9fad7317, 767a6780, 8c60dc9e, bd278ea6, 1091b0c1, 77474251, 933cccb4, 984f281f ported verbatim. 41bacc83 ("move shared code to cambi.h") explicitly skipped.
  • Touches: core/src/feature/cambi.c, core/src/feature/x86/cambi_avx2.c, core/src/feature/x86/cambi_avx2.h, core/test/test_cambi.c. cambi_reciprocal_lut.h stays (fork commit ef6d33e6 already added it before upstream).
  • Invariant: the fork uses a CAMBI_CALC_C_VALUES_BODY macro in cambi.c to share the calculate_c_values loop nest across calculate_c_values (scalar), calculate_c_values_avx2, and calculate_c_values_neon. Upstream keeps the three variants as separate function definitions in cambi.c (scalar) and cambi_avx2.c (AVX-2) with the helpers exposed via cambi.h. The fork's macro keeps the three drivers in lockstep without externalising the helpers.
  • Twin-update gaps:
    • AVX-512: no calculate_c_values_row_avx512 exists; the AVX-512 dispatch path falls through to calculate_c_values_avx2. Tracked as a perf follow-up: bit-exactness is preserved, only throughput is affected.
    • NEON: calculate_c_values_neon uses scalar calculate_c_values_row (no NEON calculate_c_values_row_neon exists yet). Tracked as a perf follow-up.
    • CUDA / SYCL: cambi has no GPU twin in those backends (the only existing twin is Vulkan, ADR-0205 Strategy II). The Vulkan twin's host-residual shim vmaf_cambi_calculate_c_values was updated in port 933cccb4 to drop the inc/dec range-updater parameters (now (void)-cast since calculate_c_values self-dispatches its updaters); ABI-compatible with cambi_internal.h callers.
  • On upstream sync: when re-syncing cambi, expect conflicts on the calculate_c_values_avx2 body — upstream keeps it as a function in cambi_avx2.c, the fork keeps it inside cambi.c via the macro. The translation is mechanical: take any inner-loop change from upstream's body, apply it once inside CAMBI_CALC_C_VALUES_BODY. The fork's calculate_c_values_neon has no upstream counterpart and stays fork-local.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && build/test/test_cambi
# Optional GPU-parity gate when available:
# ./scripts/cross-backend-diff.sh --feature cambi

ADR-0336 — KonViD MOS head v1 (2026-05-08)

  • Touches: ai/scripts/train_konvid_mos_head.py (new), ai/tests/test_train_konvid_mos_head.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (adds Predictor.predict_mos + the optional konvid_mos_head_v1.onnx loader; _DEFAULT_COEFFS and _predict_analytical are unchanged), tools/vmaf-tune/tests/test_predict_mos.py (new), model/konvid_mos_head_v1.onnx (new), model/konvid_mos_head_v1_card.md (new), model/konvid_mos_head_v1.json (new manifest sidecar), docs/adr/0336-konvid-mos-head-v1.md (new), docs/research/0090-konvid-mos-head-design.md (new), docs/state.md (T-MOS-HEAD-PRODFLIP row), changelog.d/added/0336-konvid-mos-head-v1.md (new). All paths are fork-local; upstream Netflix/vmaf has no MOS-head surface and the predictor lives entirely under tools/vmaf-tune/.
  • Invariant: the MOS-head ONNX I/O contract is two-input named tensors (features shape (N, 11); encoder_onehot shape (N, 1)) -> one output tensor (mos shape (N,)) with the range [1.0, 5.0] baked into the graph via 1 + 4 * sigmoid(raw). The 11 feature columns are (adm2, vif_scale0..3, motion2, saliency_mean, saliency_var, shot_count_norm, shot_mean_len_norm, shot_cut_density) in that exact order; they line up with train_konvid_mos_head.FEATURE_COLUMNS and the predictor's _predict_mos_via_head zero-fill layout. ENCODER_VOCAB v4 ships a single "ugc-mixed" slot; multi-slot expansion is append-only. Predictor.predict_mos falls back to mos = (predicted_vmaf - 30) / 14 clamped to [1, 5] whenever the ONNX is missing or onnxruntime is unavailable; that fallback is the documented behaviour, not a bug.
  • On upstream sync: no action required. The trainer + predictor + MOS head + tests live entirely under fork-local paths (ai/, tools/vmaf-tune/, model/); upstream syncs cannot touch them. tools/vmaf-tune/src/vmaftune/predictor.py is fork-local but co-evolves with vmaf-tune; if a future ADR re-shapes ShotFeatures, replay the MOS-head feature-column map in lockstep.
  • Re-test on rebase:

python3 -m pytest ai/tests/test_train_konvid_mos_head.py tools/vmaf-tune/tests/test_predict_mos.py -v
python3 ai/scripts/train_konvid_mos_head.py --smoke --no-export   # gate must report PASS
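
The two score mappings named in the invariant can be written out directly. This is a sketch of the arithmetic only, not the predictor's code: the function names are hypothetical, the column list expands vif_scale0..3 into four names, and the sigmoid is written in a numerically stable form as an implementation choice of this example.

```python
import math

# Column order as listed in the invariant above (vif_scale0..3 expanded).
FEATURE_COLUMNS = (
    "adm2", "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3", "motion2",
    "saliency_mean", "saliency_var",
    "shot_count_norm", "shot_mean_len_norm", "shot_cut_density",
)


def head_mos(raw: float) -> float:
    """Map an unbounded head output into [1, 5] via 1 + 4 * sigmoid(raw)."""
    if raw >= 0:
        sig = 1.0 / (1.0 + math.exp(-raw))
    else:
        z = math.exp(raw)  # avoids overflow for very negative inputs
        sig = z / (1.0 + z)
    return 1.0 + 4.0 * sig


def fallback_mos(predicted_vmaf: float) -> float:
    """Documented fallback: (predicted_vmaf - 30) / 14, clamped to [1, 5]."""
    return min(5.0, max(1.0, (predicted_vmaf - 30.0) / 14.0))


print(len(FEATURE_COLUMNS), head_mos(0.0), fallback_mos(86.0), fallback_mos(10.0))
```

Both paths land in the same [1, 5] range, so callers do not need to know whether the ONNX head or the fallback produced a given value.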

ADR-0335 — AdaptiveCpp as a second SYCL toolchain (2026-05-08)

  • Touches: core/src/feature/sycl/sycl_compat.h (new), core/src/feature/sycl/*.cpp (10 attribute call sites in 9 files switched from [[intel::reqd_sub_group_size(N)]] to VMAF_SYCL_REQD_SG_SIZE(N)), core/src/meson.build (toolchain branch in the SYCL block + the feature-kernel block), core/meson_options.txt (description bump on sycl_compiler + new sycl_acpp_targets option), docs/development/sycl-toolchains.md (new), docs/adr/0335-adaptivecpp-second-sycl-toolchain.md (new), docs/adr/_index_fragments/0335-adaptivecpp-second-sycl-toolchain.md (new), docs/adr/_index_fragments/_order.txt (append), docs/adr/README.md (regenerated by concat-adr-index.sh --write), docs/adr/0217-sycl-toolchain-cleanup.md (status-update appendix per ADR-0028), core/src/sycl/AGENTS.md (invariant row), changelog.d/added/0335-adaptivecpp-second-sycl-toolchain.md (new). No upstream-shared paths in core/src/feature/sycl/*.cpp are touched on upstream/master (those TUs are fork-local SYCL twins).
  • Invariant: Intel icpx stays the primary toolchain. AdaptiveCpp is opt-in via -Dsycl_compiler=acpp. Any new Intel-specific SYCL kernel attribute (e.g. a future [[intel::*]] decoration, sycl::ext::oneapi::experimental::* use) must land behind a new macro in core/src/feature/sycl/sycl_compat.h rather than appear inline. AdaptiveCpp output is not bit-identical to icpx and not bit-identical to scalar CPU (consistent with the existing CPU-only golden gate). The canonical AdaptiveCpp identification macros are SYCL_IMPLEMENTATION_ACPP and the legacy SYCL_IMPLEMENTATION_HIPSYCL, both auto-defined by <sycl/sycl.hpp>.
  • On upstream sync: if a Netflix upstream cherry-pick lands a bare [[intel::reqd_sub_group_size(N)]] (or any Intel-specific SYCL attribute) on a kernel lambda, wrap the attribute in the appropriate VMAF_SYCL_* compat macro before merging. Upstream has no SYCL backend today, so the conflict surface is small.
  • Re-test on rebase:
# Plumbing parses cleanly with the icpx default still selected:
meson setup /tmp/build-sycl-icpx libvmaf -Denable_sycl=false
# And the macro count is consistent (10 sites under acpp guard):
grep -rl 'VMAF_SYCL_REQD_SG_SIZE' core/src/feature/sycl | wc -l
# → 9 files (the compat header itself defines the macro;
#    9 kernel TUs consume it.)

ADR-0212 §Status update — HIP runtime (T7-10b, 2026-05-08)

  • Touches: core/src/hip/common.c, core/src/hip/kernel_template.c, core/src/hip/meson.build, core/test/test_hip_smoke.c, core/test/meson.build (added hip_deps everywhere vulkan_deps already appears so test executables that statically pull the feature lib resolve hipMemsetAsync / hipFree).
  • Invariant: the kernel_template.c helpers and common.c public API both store HIP runtime handles (hipStream_t, hipEvent_t) as uintptr_t in the structs that cross the public ABI. The header-purity contract documented in core/src/hip/kernel_template.h is load-bearing — moving the cast site (or replacing uintptr_t with void *) breaks every consumer TU and the public libvmaf_hip.h no-<hip/...> guarantee. The fallback find_library('amdhip64', dirs: hip_search_paths) exists because ROCm 7.x publishes no hip-lang.pc and the cmake config breaks under meson's CMake probe — the fallback is the supported path on ROCm 7.x.
  • Re-test on rebase:
PATH=/opt/rocm/bin:$PATH meson setup build --reconfigure \
    -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_hip_smoke

The smoke test self-skips the device-resident assertions when vmaf_hip_device_count() == 0, so it stays portable across CI runners that don't expose an AMD GPU.

saliency_student_v2 — Resize-decoder ablation (ADR-0364, 2026-05-09)

  • Touches: ai/scripts/train_saliency_student_v2.py (new), model/tiny/saliency_student_v2.{onnx,json} (new), model/tiny/saliency_student_v2_card.md (new), model/tiny/registry.json (new row), docs/ai/models/saliency_student_v2.md (new), docs/adr/0364-saliency-student-v2-resize-decoder.md (new), docs/research/0089-saliency-student-v2-resize-decoder.md (new), changelog.d/added/saliency-student-v2.md (new). All paths are fork-only — no upstream-mirrored files touched.
  • Invariant: v1 (saliency_student_v1.onnx, registry id saliency_student_v1, smoke: false) stays as the production weights for the C-side mobilesal extractor. v2 is a parallel artefact under model/tiny/; promotion to production is a separate PR. The trainer's _ResizeConv module produces an ONNX graph with Resize (mode=linear, coordinate_transformation_mode=half_pixel) — every op stays on core/src/dnn/op_allowlist.c post-ADR-0258.
  • On upstream sync: no rebase impact — Netflix has no parallel saliency-student model, no consumer of Resize in the upstream ONNX surface, and no model/tiny/ registry in the upstream tree. If Netflix ever lands a saliency model, the fork's saliency_student_v{1,2} rows stay independent.
  • Re-test on rebase:
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python - <<'EOF'
import onnx
g = onnx.load('model/tiny/saliency_student_v2.onnx')
ops = sorted({n.op_type for n in g.graph.node})
assert 'Resize' in ops and 'ConvTranspose' not in ops, ops
print('v2 ONNX op-set:', ops)
EOF

Predictor v2 — real-corpus LOSO trainer + ADR-0303 gate (2026-05-08)

  • Touches: ai/scripts/train_predictor_v2_realcorpus.py (new), ai/scripts/run_predictor_v2_training.sh (new), ai/tests/test_train_predictor_v2_realcorpus.py (new), docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md (Status-update appendix only — body frozen per ADR-0028), changelog.d/added/predictor-v2-realcorpus-trainer.md (new). No upstream-shared paths; the trainer lives entirely under fork-local ai/scripts/.
  • Invariant: the gate constants SHIP_GATE_MEAN_PLCC = 0.95, SHIP_GATE_PLCC_SPREAD_MAX = 0.005, SHIP_GATE_PER_FOLD_MIN = 0.95, LOSO_FOLD_COUNT = 5 mirror ADR-0303 §Decision and the constants in scripts/ci/ensemble_prod_gate.py. They MUST stay in lockstep; if a future ADR changes the gate, update both files (the predictor trainer + the ensemble CI gate) and re-run test_gate_constants_match_adr_0303. The 14-codec list in _resolve_codecs() is sourced from vmaftune.predictor._DEFAULT_COEFFS when PR #450 is on the path; the hard-coded fallback exists for the bootstrap case where this script lands before #450 merges. Drift between the two is asserted at runtime — adding a 15th codec means updating the mirror.
  • On upstream sync: no action required. The trainer + tests live entirely under fork-local paths (ai/scripts/, ai/tests/); upstream Netflix/vmaf has no equivalent surface. PR #450 (the predictor train pipeline) is itself fork-local; an upstream sync that reorganises ai/scripts/ would invalidate the relative imports — re-run the test suite if that happens.
  • Re-test on rebase:

python -m pytest ai/tests/test_train_predictor_v2_realcorpus.py -q
bash -n ai/scripts/run_predictor_v2_training.sh
python ai/scripts/train_predictor_v2_realcorpus.py --synthetic-smoke --report-out /tmp/p2.json
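
To make the gate constants concrete, the sketch below evaluates a list of per-fold PLCC values against them. The constant values come from the invariant above; the function name, the definition of spread as max minus min, and the use of inclusive comparisons are assumptions of this example rather than statements about ensemble_prod_gate.py.

```python
SHIP_GATE_MEAN_PLCC = 0.95
SHIP_GATE_PLCC_SPREAD_MAX = 0.005
SHIP_GATE_PER_FOLD_MIN = 0.95
LOSO_FOLD_COUNT = 5


def passes_ship_gate(fold_plcc: list[float]) -> bool:
    """Check mean, spread (assumed max - min) and per-fold floor over the LOSO folds."""
    if len(fold_plcc) != LOSO_FOLD_COUNT:
        return False  # a missing or extra fold is treated as a gate failure
    mean = sum(fold_plcc) / len(fold_plcc)
    spread = max(fold_plcc) - min(fold_plcc)
    return (
        mean >= SHIP_GATE_MEAN_PLCC
        and spread <= SHIP_GATE_PLCC_SPREAD_MAX
        and min(fold_plcc) >= SHIP_GATE_PER_FOLD_MIN
    )


# Hypothetical fold results, not measured data.
print(passes_ship_gate([0.960, 0.962, 0.961, 0.963, 0.964]))
print(passes_ship_gate([0.96, 0.97, 0.96, 0.96, 0.96]))
```

Keeping the four constants as module-level names is what makes a lockstep test against the CI gate practical.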

ADR-0332 — OpenVINO NPU EP wired into tiny-AI dispatch (2026-05-08)

  • Touches: core/include/libvmaf/dnn.h, core/src/dnn/ort_backend.{c,h}, core/tools/vmaf.c, core/tools/cli_parse.{c,h}, core/test/dnn/test_ep_fp16.c, core/test/dnn/test_cli.sh, docs/ai/inference.md, docs/usage/cli.md, docs/development/oneapi-install.md, docs/adr/0332-openvino-npu-ep-wiring.md (new), docs/adr/_index_fragments/0332-openvino-npu-ep-wiring.md (new), changelog.d/added/openvino-npu-ep.md (new). The libvmaf dnn/ and tools surfaces are fork-local additions; upstream Netflix/vmaf has no tiny-AI / ONNX Runtime dispatch layer, so conflict probability on dnn/ is zero.
  • Invariant: VmafDnnDevice enum values 9..11 (OPENVINO_NPU / OPENVINO_CPU / OPENVINO_GPU) are appended after CoreML 5..8. ABI requires these values stay stable across releases — append-only; never renumber. The --tiny-device validator in cli_parse.c::ARG_TINY_DEVICE enumerates the keyword set; new keywords append to the validator AND to the help string AND to resolve_tiny_device() in vmaf.c together. The vmaf_dnn_session_attached_ep() stable-string list (docs/ai/inference.md + dnn.h doxygen) gains "OpenVINO:NPU" — consumers asserting on the returned string MUST update.
  • On upstream sync: no action required for upstream Netflix/vmaf. If a future Netflix sync introduces an unrelated tiny-AI surface (unlikely), reconcile the EP-name list at the merge.
  • Re-test on rebase:
cd libvmaf && \
  CC=icx CXX=icpx meson setup build -Denable_sycl=true -Denable_cuda=false && \
  ninja -C build && \
  ./build/test/dnn/test_ep_fp16 && \
  ./build/tools/vmaf --tiny-device=openvino-npu --tiny-device=openvino-cpu \
    --tiny-device=openvino-gpu  # validator must accept all three keywords
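
The append-only rule for VmafDnnDevice can be checked mechanically by comparing the name-to-value map before and after a change. The sketch below is illustrative: the OpenVINO names and values follow the invariant above, while the COREML_SLOT_* names are placeholders (the entry gives only the range 5..8) and the helper name is hypothetical.

```python
def append_only_violations(old: dict[str, int], new: dict[str, int]) -> list[str]:
    """List every way `new` breaks append-only evolution relative to `old`."""
    problems = []
    for name, value in old.items():
        if name not in new:
            problems.append(f"{name} removed")
        elif new[name] != value:
            problems.append(f"{name} renumbered {value} -> {new[name]}")
    ceiling = max(old.values(), default=-1)
    for name, value in new.items():
        if name not in old and value <= ceiling:
            problems.append(f"{name} inserted at {value}, not appended")
    return problems


# Placeholder CoreML names; only the 5..8 range is given in the entry.
before = {f"COREML_SLOT_{v}": v for v in range(5, 9)}
after = dict(before, OPENVINO_NPU=9, OPENVINO_CPU=10, OPENVINO_GPU=11)
print(append_only_violations(before, after))
print(append_only_violations(before, dict(before, OPENVINO_NPU=5)))
```

An empty list means existing values kept their numbers and every new value landed above the old maximum.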

ADR-0365 — CoreML execution provider wiring (2026-05-09)

  • Touches: core/include/libvmaf/dnn.h, core/src/dnn/ort_backend.{c,h}, core/tools/cli_parse.{c,h}, core/tools/vmaf.c, core/test/dnn/test_ep_fp16.c, core/test/dnn/test_cli.sh, docs/ai/inference.md, docs/usage/cli.md. Coordinates with ADR-0332 (OpenVINO NPU EP, PR #496) — both touch the same files; conflicts are mechanical (adjacent enum values, adjacent switch cases, adjacent CLI keyword strings). OpenVINO NPU/CPU/GPU values are 9..11 (after CoreML 5..8).
  • Invariant: VmafDnnDevice enum is append-only. CoreML values are 5..8; OpenVINO pinned variants are 9..11. The SessionOptionsAppendExecutionProvider("CoreMLExecutionProvider", …) generic form is deliberate so the Linux build needs no coreml_provider_factory.h include. The MLComputeUnits key string values (CPUAndNeuralEngine / CPUAndGPU / CPUOnly) are part of the CoreML EP public contract — upstream renames would break the wiring. The AUTO chain inserts CoreML at the last position (after CUDA / OpenVINO / ROCm); reordering changes the Apple-silicon AUTO outcome.
  • Re-test on rebase:
cd libvmaf && meson setup build -Denable_dnn=auto \
  -Denable_cuda=false -Denable_sycl=false \
  -Dbuilt_in_models=false && \
  ninja -C build && \
  ./build/test/dnn/test_ep_fp16 && \
  VMAF_BIN=$PWD/build/tools/vmaf bash test/dnn/test_cli.sh && \
  ./build/tools/vmaf --tiny-device coreml-ane 2>&1 | \
    grep -q 'Reference' && \
  ./build/tools/vmaf --tiny-device bogus 2>&1 | \
    grep -q 'coreml'

python3 -m pytest tools/external-bench/tests/ -q   # must report 7 passed
bash -n tools/external-bench/*/run.sh

0361 — Metal (Apple Silicon) backend scaffold (ADR-0361)

  • Touches:
  • core/include/libvmaf/libvmaf_metal.h (new, fork-local) — public C-API for the Metal backend (vmaf_metal_state_init / _import_state / _state_free / vmaf_metal_list_devices / vmaf_metal_available). Mirrors the HIP / Vulkan / SYCL / CUDA public-header convention; opaque runtime types cross the ABI as uintptr_t per ADR-0361 / ADR-0212 / ADR-0184.
  • core/src/metal/{common,picture_metal,dispatch_strategy,kernel_template}.{c,h}
    • AGENTS.md + meson.build (new, fork-local) — backend tree. Every entry point returns -ENOSYS. The kernel_template field shape mirrors the HIP twin modulo the unified-memory buffer collapse (one MTLBuffer with MTLResourceStorageModeShared instead of the (device, pinned-host) readback pair).
  • core/src/feature/metal/integer_motion_v2_metal.c (new, fork-local) — first kernel-template consumer. Mirrors feature/hip/integer_motion_v2_hip.c call-graph-for-call-graph modulo the single-buffer prev-ref slot (vs the HIP twin's pix[2] ping-pong).
  • core/test/test_metal_smoke.c (new, fork-local) — 14-sub-test smoke pinning the -ENOSYS contract. Mirrors test_hip_smoke.c.
  • core/meson_options.txt — new enable_metal feature option (default auto). On auto the parent meson resolves to host_machine.system() == 'darwin' so non-macOS hosts compile cleanly without the frameworks; enabled forces linkage and fails on non-macOS. Type-feature matches enable_dnn's auto-resolve shape (Metal on macOS is always available, like DNN on a host with ONNX Runtime); the GPU-vendor-pair boolean-default-off triad (enable_cuda / enable_sycl / enable_hip) does not fit because Metal has no comparable "wrong-host silent flip" risk.
  • core/src/meson.build — is_metal_enabled resolution + subdir('metal') + metal_sources / metal_deps threaded through libvmaf_feature_static_lib and libvmaf library() calls alongside CUDA / SYCL / Vulkan / HIP / DNN aggregations.
  • core/test/meson.build — test_metal_smoke executable wired under the same auto-on-macOS / explicit-enabled gate.
  • core/src/feature/feature_extractor.c — adds extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_metal; + registry entry under #if HAVE_METAL.
  • .github/workflows/libvmaf-build-matrix.yml — new lane Build — macOS Metal (T8-1 scaffold) on macos-latest with -Denable_metal=enabled. The macos-latest runner ships the Metal SDK as part of the system framework set; no extra install step is needed.
  • docs/backends/metal/index.md (new, fork-local) + docs/backends/index.md (row added) + ADR-0361 + index fragment + changelog.d/added/metal-backend-scaffold.md + docs/state.md row T8-1b.
  • Upstream-port footprint: zero — Netflix/vmaf does not ship a Metal backend; this is a wholly fork-local addition. No upstream file is touched. Same posture as the HIP scaffold (T7-10) and the Vulkan scaffold (T5-1).
  • Rebase invariants (mirror the HIP scaffold's invariant set):
  • metal/kernel_template.h mirrors hip/kernel_template.h modulo the unified-memory buffer collapse (single MTLBuffer slot vs the HIP (device, pinned-host) pair). On rebase, if the HIP twin's lifecycle struct gains a third event slot, the Metal twin must follow in the same PR.
  • feature/metal/integer_motion_v2_metal.c mirrors feature/hip/integer_motion_v2_hip.c call-graph-for-call-graph modulo the single-prev_ref-slot collapse (vs the HIP twin's pix[2] ping-pong). On rebase, drift in the HIP twin's submit body (e.g. an added submit_pre_launch call) requires a paired update here.
  • vmaf_fex_integer_motion_v2_metal registers without the VMAF_FEATURE_EXTRACTOR_METAL flag bit set. The flag bit is reserved for the runtime PR (T8-1b) which adds the VMAF_PICTURE_BUFFER_TYPE_METAL_DEVICE tag and then sets the flag. Same posture as the HIP twin's VMAF_FEATURE_EXTRACTOR_HIP-deferral; on rebase, leave the flags at VMAF_FEATURE_EXTRACTOR_TEMPORAL only until T8-1b.
  • Re-test (on macOS only — Linux dev sessions cannot run this lane locally):
meson setup build -Denable_metal=enabled
ninja -C build
meson test -C build test_metal_smoke

And on every host (Linux / Windows included): the default-build gate must stay green — the auto-probe resolves to disabled on non-macOS hosts so meson setup build && ninja -C build runs unchanged.
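
The enable_metal resolution described above is a three-way decision table. The following is a minimal Python model of that table only, not the meson code; `resolve_metal` is a hypothetical helper name and the error text is an assumption.

```python
def resolve_metal(option: str, host_system: str) -> bool:
    """Model the enable_metal feature-option resolution (hypothetical helper)."""
    is_darwin = host_system == "darwin"
    if option == "auto":
        # auto follows the host: on for macOS, off everywhere else.
        return is_darwin
    if option == "enabled":
        if not is_darwin:
            # enabled forces linkage, so a non-macOS host must fail loudly.
            raise RuntimeError("enable_metal=enabled requires a macOS host")
        return True
    if option == "disabled":
        return False
    raise ValueError(f"unknown feature value: {option!r}")
```

Under this model a default Linux or Windows configure resolves to False, which is why the default-build gate stays unchanged on those hosts.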

0327 — Conformal-VQA prediction surface for vmaf-tune (ADR-0279)

  • Touches: tools/vmaf-tune/src/vmaftune/conformal.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (Predictor.predict_vmaf_with_uncertainty), tools/vmaf-tune/src/vmaftune/cli.py (predict subcommand gains --with-uncertainty / --calibration-sidecar / --alpha), tools/vmaf-tune/tests/test_conformal.py (new), docs/ai/conformal-vqa.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the conformal wrapper sits outside the ONNX graph and adds no new runtime dependency — conformal.py imports only the standard library (math, statistics, dataclasses, json, warnings). Future calibration-sidecar shapes use the method discriminator string for versioning; do not rename "split-conformal" / "cv-plus" without bumping the loader. The Predictor.predict_vmaf_with_uncertainty signature is the Python-API contract consumed by vmaf-tune predict --with-uncertainty; renaming or reordering its keyword args breaks the CLI in lockstep.
  • On upstream sync: no action required. vmaf-tune is a fork-local tool; upstream Netflix/vmaf has no per-shot prediction surface.
  • Re-test on rebase:
python3 -m pytest tools/vmaf-tune/tests/test_conformal.py -q
python3 -m pytest tools/vmaf-tune/tests/test_predictor.py -q
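
For orientation, a stdlib-only interval in the "split-conformal" style can be sketched as below. This is the textbook construction, not the contents of conformal.py; the function name, argument names, and the choice of absolute residuals as the nonconformity score are assumptions.

```python
import math

def split_conformal_interval(calib_residuals, point_pred, alpha=0.1):
    """Return (lo, hi) around point_pred from held-out calibration residuals."""
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    # Finite-sample rank: ceil((n + 1) * (1 - alpha)), 1-indexed.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (point_pred - q, point_pred + q)
```

The wrapper needs nothing beyond the point prediction and a list of residuals, which is consistent with keeping it outside the ONNX graph.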

ADR-0364 — vmaf-tune auto Phase F.1 + F.2 short-circuits (2026-05-08)

  • Touches: tools/vmaf-tune/src/vmaftune/auto.py (new), tools/vmaf-tune/src/vmaftune/cli.py (added auto subparser + dispatcher), tools/vmaf-tune/tests/test_auto_short_circuits.py (new), tools/vmaf-tune/AGENTS.md (invariant row), docs/usage/vmaf-tune.md (## auto section), docs/adr/0364-vmaf-tune-phase-f-auto.md (status update — already-accepted body untouched per ADR-0028; appended a ### Status update block under ## References). No upstream-shared paths.

ADR-0371 — Shared CorpusIngestBase (2026-05-10)

No rebase impact: pure Python refactor under ai/ — no C/header/patch changes, no upstream-shared paths touched. All six MOS-corpus adapter scripts now use from corpus.base import CorpusIngestBase (with PYTHONPATH=ai/src); if a future upstream sync adds a corpus/ directory under ai/, the import path may collide, but the risk is negligible (Netflix/vmaf does not carry an ai/ subtree).

ADR-0325 — vmaf-tune auto Phase F.1 + F.2 short-circuits (2026-05-08)

  • Touches: tools/vmaf-tune/src/vmaftune/auto.py (new), tools/vmaf-tune/src/vmaftune/cli.py (added auto subparser + dispatcher), tools/vmaf-tune/tests/test_auto_short_circuits.py (new), tools/vmaf-tune/AGENTS.md (invariant row), docs/usage/vmaf-tune.md (## auto section), docs/adr/0325-vmaf-tune-phase-f-auto.md (status update — already-accepted body untouched per ADR-0028; appended a ### Status update block under ## References). No upstream-shared paths.
  • Invariant: SHORT_CIRCUIT_PREDICATES in auto.py is an ordered tuple, not a set. The seven entries appear in the canonical order LADDER_SINGLE_RUNG, CODEC_PINNED, PREDICTOR_GOSPEL, SKIP_SALIENCY, SDR_SKIP, SAMPLE_CLIP_PROPAGATE, SKIP_PER_SHOT. The JSON schema records short-circuits in this order under plan.metadata.short_circuits; downstream consumers (CI corpus collector, post-hoc speedup analysis) parse the canonical-order list. Adding an eighth short-circuit (F.3+) appends; never reorder. The Phase D thresholds (PHASE_D_DURATION_GATE_S = 300.0, PHASE_D_SHOT_VARIANCE_GATE = 0.15) are placeholders pending F.3 empirical fit.
  • On upstream sync: no action required. Module is fork-local (tools/vmaf-tune/ is fork-only). The vmaf-tune umbrella ADR-0237 explicitly carves Phases B–F out of upstream scope.
  • Re-test on rebase:
(cd tools/vmaf-tune && python -m pytest tests/test_auto_short_circuits.py -v)
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli auto \
    --src /dev/null --target-vmaf 93 --max-budget-bitrate 5000 \
    --allow-codecs libx264 --sample-clip-seconds 10 --smoke
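
The append-only rule for SHORT_CIRCUIT_PREDICATES can be expressed as a prefix check. A minimal sketch, assuming the entries can be compared as plain names (the real tuple in auto.py may hold other objects); `is_append_only` is a hypothetical helper.

```python
# Canonical order as listed in the invariant above; names as plain strings
# are an assumption made for this sketch.
SHORT_CIRCUIT_PREDICATES = (
    "LADDER_SINGLE_RUNG",
    "CODEC_PINNED",
    "PREDICTOR_GOSPEL",
    "SKIP_SALIENCY",
    "SDR_SKIP",
    "SAMPLE_CLIP_PROPAGATE",
    "SKIP_PER_SHOT",
)

def is_append_only(old: tuple, new: tuple) -> bool:
    """True when `new` keeps every `old` entry at the same index."""
    return len(new) >= len(old) and new[: len(old)] == old
```

A check of this shape passes when an eighth entry is appended and fails on any reorder or removal, which is the property downstream consumers of plan.metadata.short_circuits rely on.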

ADR-0325 — vmaf-tune auto Phase F.3 confidence-aware fallbacks (2026-05-08)

  • Touches: tools/vmaf-tune/src/vmaftune/auto.py (F.3 helpers, _confidence_aware_escalation, ConfidenceThresholds, ConfidenceDecision, load_confidence_thresholds, per-cell wiring in run_auto), tools/vmaf-tune/tests/test_auto_confidence_aware.py (new, 28 tests), tools/vmaf-tune/AGENTS.md (invariant note), docs/usage/vmaf-tune.md (new ### Confidence-aware fallbacks (F.3) subsection under ## auto), docs/adr/0325-vmaf-tune-phase-f-auto.md (status update appended per ADR-0028; already-Accepted body untouched), changelog.d/added/phase-f3-confidence-aware-fallbacks.md (new). No upstream-shared paths.
  • Invariant: DEFAULT_TIGHT_INTERVAL_MAX_WIDTH = 2.0 and DEFAULT_WIDE_INTERVAL_MIN_WIDTH = 5.0 are an emergency floor (Research-0067), not a target. The production values come from a JSON calibration sidecar produced by the conformal-VQA pipeline (ADR-0279) with the canonical keys tight_interval_max_width and wide_interval_min_width. load_confidence_thresholds falls back to the defaults with a one-line WARNING when no sidecar is found; do not silence the warning. _confidence_aware_escalation is a pure function of its three inputs and is exposed in __all__ so downstream tools (the MCP server's auto proxy, the CI corpus collector) can embed it directly. The JSON schema records per-cell decisions in plan.metadata.confidence_aware_escalations[] (one entry per (rung, codec) cell with keys rung, codec, verdict, interval_width, decision); each cell in plan.cells[] also carries confidence_decision + interval_width so consumers don't need to cross-reference the metadata array index. Adding a fourth ConfidenceDecision value is a schema bump — coordinate with downstream JSON consumers.
  • On upstream sync: no action required. tools/vmaf-tune/ is fork-only; the conformal-VQA prediction surface (ADR-0279) and the F.1 + F.2 scaffold (ADR-0325) are both fork-local.
  • Re-test on rebase:
cd tools/vmaf-tune && python -m pytest \
    tests/test_auto_confidence_aware.py \
    tests/test_auto_short_circuits.py \
    tests/test_conformal.py -v
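
The fallback behaviour of the threshold loader can be sketched as follows. This is an illustration under stated assumptions, not load_confidence_thresholds itself: the function name, the logger name, the return shape, and the warning text are hypothetical; the two JSON keys and the default values are the ones named in the invariant.

```python
import json
import logging
from pathlib import Path

DEFAULT_TIGHT_INTERVAL_MAX_WIDTH = 2.0
DEFAULT_WIDE_INTERVAL_MIN_WIDTH = 5.0
log = logging.getLogger("vmaftune.sketch")  # hypothetical logger name

def load_thresholds_sketch(sidecar):
    """Return (tight_max, wide_min); fall back to the floor defaults."""
    path = Path(sidecar) if sidecar is not None else None
    if path is None or not path.is_file():
        # One-line warning on fallback; the notes say not to silence it.
        log.warning("no calibration sidecar found; using emergency-floor defaults")
        return (DEFAULT_TIGHT_INTERVAL_MAX_WIDTH, DEFAULT_WIDE_INTERVAL_MIN_WIDTH)
    data = json.loads(path.read_text(encoding="utf-8"))
    return (float(data["tight_interval_max_width"]),
            float(data["wide_interval_min_width"]))
```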

ADR-0325 — vmaf-tune auto Phase F.4 per-content-type recipe overrides (2026-05-09)

  • Touches: tools/vmaf-tune/src/vmaftune/auto.py (added _apply_recipe_override, _CONTENT_RECIPE_TABLE, get_recipe_for_class, the four _<class>_recipe factories, and the RECIPE_CLASS_* constants; integrated the override into run_auto and added recipe_applied / effective_predictor_target_vmaf to the JSON metadata), tools/vmaf-tune/tests/test_auto_recipe_overrides.py (new — 37 assertions), tools/vmaf-tune/tests/test_auto_short_circuits.py (one test updated for the F.4 force-single-rung semantics on animation sources), tools/vmaf-tune/AGENTS.md (invariant row), docs/usage/vmaf-tune.md (### Per-content-type recipes (F.4) subsection), docs/adr/0325-vmaf-tune-phase-f-auto.md (status update appended; already-accepted body untouched per ADR-0028), changelog.d/added/phase-f4-content-recipes.md. No upstream-shared paths.
  • Invariant: _CONTENT_RECIPE_TABLE stores factory callables, not literal dicts. Every get_recipe_for_class / _apply_recipe_override call returns a fresh override dict so caller mutations cannot leak between runs. The four override keys honoured by the driver are tight_interval_max_width, force_single_rung, saliency_intensity, target_vmaf_offset; the _RECIPE_KEYS allowlist filters anything else as defence-in-depth. The target_vmaf_offset shifts only effective_predictor_target_vmaf; the input target_vmaf (production-flip gate) is preserved verbatim. Every threshold value at F.4 is provisional pending F.5 calibration — do not promote a placeholder to "calibrated" in a drive-by edit.
  • On upstream sync: no action required. tools/vmaf-tune/ is fork-local; ADR-0237 explicitly carves Phases B–F out of upstream scope.
  • Re-test on rebase:
PYTHONPATH=tools/vmaf-tune/src python -m pytest \
  tools/vmaf-tune/tests/test_auto_recipe_overrides.py \
  tools/vmaf-tune/tests/test_auto_short_circuits.py \
  tools/vmaf-tune/tests/test_auto_confidence_aware.py -v
PYTHONPATH=tools/vmaf-tune/src python -c \
  "from pathlib import Path; from vmaftune.auto import run_auto, SourceMeta; \
   m = SourceMeta(height=1080, width=1920, content_class='animation', duration_s=120, shot_variance=0.05); \
   p = run_auto(src=Path('/dev/null'), target_vmaf=93.0, max_budget_kbps=5000.0, \
                allow_codecs=('libx264',), smoke=True, meta_override=m); \
   assert p.metadata['recipe_applied'] == 'animation'; \
   assert p.metadata['target_vmaf'] == 93.0; \
   assert p.metadata['effective_predictor_target_vmaf'] == 95.0; \
   print('F.4 smoke OK')"
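
The factory-table and allowlist invariants can be illustrated with a small sketch. The recipe values below are illustrative only (the 2.0 offset is chosen to match the 93.0 to 95.0 shift asserted in the smoke test above); `get_recipe`, `effective_target`, and the table name are hypothetical stand-ins for the helpers in auto.py.

```python
_RECIPE_KEYS = frozenset({
    "tight_interval_max_width", "force_single_rung",
    "saliency_intensity", "target_vmaf_offset",
})

def _animation_recipe():
    # Illustrative values; "debug" is a deliberately unknown key that the
    # allowlist is expected to drop.
    return {"force_single_rung": True, "target_vmaf_offset": 2.0, "debug": 1}

_TABLE = {"animation": _animation_recipe}  # factories, not literal dicts

def get_recipe(content_class):
    """Build a fresh, allowlist-filtered override dict on every call."""
    factory = _TABLE.get(content_class)
    raw = factory() if factory else {}
    return {k: v for k, v in raw.items() if k in _RECIPE_KEYS}

def effective_target(target_vmaf, recipe):
    """Shift only the predictor target; the input target stays untouched."""
    return target_vmaf + recipe.get("target_vmaf_offset", 0.0)
```

Because each call invokes the factory, a caller that mutates its returned dict cannot affect the next run.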

ADR-0325 — vmaf-tune auto Phase F.5 calibrated recipe overrides (2026-05-09)

  • Touches: ai/scripts/calibrate_phase_f_recipes.py (new), ai/data/phase_f_recipes_calibrated.json (new — tracked via the .gitignore !ai/data/phase_f_recipes_calibrated.json allow rule), tools/vmaf-tune/src/vmaftune/auto.py (added _F4_PLACEHOLDER_RECIPES, _CALIBRATED_RECIPES_FILENAME, _find_calibrated_recipes_path, _load_calibrated_recipes, _CALIBRATED_RECIPES; the four _<class>_recipe factories now read from _CALIBRATED_RECIPES), tools/vmaf-tune/tests/test_calibrated_recipes.py (new — 14 assertions), docs/usage/vmaf-tune.md (calibrated table replaces the F.4 placeholder table in the ### Per-content-type recipes (F.4) subsection), docs/adr/0325-vmaf-tune-phase-f-auto.md (### Status update 2026-05-09: F.5 calibrated appended; already-accepted body untouched per ADR-0028), changelog.d/changed/phase-f5-calibrated-recipes.md, .gitignore (one allow rule for the JSON file). No upstream-shared paths.
  • Invariant: the _CONTENT_RECIPE_TABLE factories now consume _CALIBRATED_RECIPES snapshotted at module import. The runtime load is a single read; reloading at runtime requires importlib.reload(vmaftune.auto). Every get_recipe_for_class / _apply_recipe_override call still returns a fresh dict — the read-only invariant from F.4 is preserved by dict(_CALIBRATED_RECIPES[<cls>]). The _load_calibrated_recipes loader strips every _provenance sub-dict and filters every key against _RECIPE_KEYS so a malicious or malformed JSON cannot inject unknown keys into a recipe. Per memory feedback_no_test_weakening, the calibration cannot widen the production-flip gate beyond the ConfidenceThresholds wide-interval ceiling — the regression test test_calibrated_ugc_width_below_wide_gate_ceiling locks this in.
  • On upstream sync: no action required. tools/vmaf-tune/, ai/scripts/, ai/data/ are all fork-local; ADR-0237 explicitly carves Phases B–F out of upstream scope.
  • Re-test on rebase:
PYTHONPATH=tools/vmaf-tune/src python -m pytest \
  tools/vmaf-tune/tests/test_calibrated_recipes.py \
  tools/vmaf-tune/tests/test_auto_recipe_overrides.py -v
python ai/scripts/calibrate_phase_f_recipes.py \
  --corpus .workingdir2/konvid-150k/konvid_150k.jsonl \
  --out /tmp/recipes_smoke.json \
  --max-rows 10000
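
The sanitising step of the calibrated-recipe loader can be sketched like this. The JSON layout (content class mapped to a recipe dict that may carry a _provenance sub-dict) is an assumption, as is the function name; only the strip-and-allowlist behaviour comes from the invariant.

```python
import json

_RECIPE_KEYS = frozenset({
    "tight_interval_max_width", "force_single_rung",
    "saliency_intensity", "target_vmaf_offset",
})

def sanitize_recipes(raw_json: str) -> dict:
    """Parse calibrated recipes, dropping _provenance and unknown keys."""
    parsed = json.loads(raw_json)
    clean = {}
    for cls, recipe in parsed.items():
        if cls.startswith("_") or not isinstance(recipe, dict):
            continue  # metadata or malformed entry: skip rather than guess
        clean[cls] = {k: v for k, v in recipe.items()
                      if k != "_provenance" and k in _RECIPE_KEYS}
    return clean
```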

ADR-0335 — Hardware-capability priors (2026-05-08)

  • Touches: ai/data/hardware_caps.csv (new), ai/scripts/hardware_caps_loader.py (new), ai/tests/test_hardware_caps.py (new), ai/AGENTS.md (one new bullet under "Rebase-sensitive invariants"), docs/ai/hardware-capability-priors.md (new), docs/research/0088-hardware-capability-priors-2026-05-08.md (new), docs/adr/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/_order.txt (one-line append), CHANGELOG.md (Added bullet under [Unreleased] — lusoris fork). No upstream-shared paths.
  • Invariant: the table is prior-only. The schema check in hardware_caps_loader.py rejects benchmark-shaped header columns (fps_*, throughput, mbps, latency, watts, tdp, score_*, vmaf_*), community-wiki source URLs (wikipedia.org, wikichip.org), empty fields, and rows with encoding_blocks=0. Adding throughput / quality columns is forbidden — that pathology was the contributor-pack digest's category-1 NO-GO finding. Schema extensions need a new ADR, not a silent column bump. The cap_vector_for() return-dict shape is load-bearing: trainers / corpus writers consume hwcap_* columns by name; reordering or renaming silently breaks downstream parquet schemas.
  • On upstream sync: no action required. The whole surface lives under ai/ and docs/ — Netflix upstream has no equivalent.
  • Re-test on rebase:

```bash
python -m pytest ai/tests/test_hardware_caps.py -v   # must report 23 passed
python ai/scripts/hardware_caps_loader.py            # JSON dump, 6+ rows
```
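
The prior-only schema check can be sketched as two small validators. This is not the code in hardware_caps_loader.py: the function names are hypothetical and `source_url` is an assumed column name; the forbidden patterns, hosts, and the encoding_blocks rule are the ones listed in the invariant.

```python
from fnmatch import fnmatchcase

FORBIDDEN_COLUMNS = ("fps_*", "throughput", "mbps", "latency",
                     "watts", "tdp", "score_*", "vmaf_*")
FORBIDDEN_HOSTS = ("wikipedia.org", "wikichip.org")

def check_header(columns):
    """Reject benchmark-shaped column names."""
    bad = [c for c in columns
           if any(fnmatchcase(c.lower(), pat) for pat in FORBIDDEN_COLUMNS)]
    if bad:
        raise ValueError(f"benchmark-shaped columns are forbidden: {bad}")

def check_row(row):
    """Reject empty fields, zero encoding blocks, and community-wiki sources."""
    if any(str(v).strip() == "" for v in row.values()):
        raise ValueError("empty field")
    if int(row["encoding_blocks"]) == 0:
        raise ValueError("encoding_blocks must be non-zero")
    # `source_url` is an assumed column name for this sketch.
    if any(host in row.get("source_url", "") for host in FORBIDDEN_HOSTS):
        raise ValueError("community-wiki sources are not accepted")
```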

ADR-0367 — LSVQ corpus ingestion (2026-05-08)

  • Touches: ai/scripts/lsvq_to_corpus_jsonl.py (new), ai/tests/test_lsvq.py (new), docs/adr/0367-lsvq-corpus-ingestion.md (new), docs/adr/README.md (regenerated index), docs/ai/lsvq-ingestion.md (new), docs/research/0090-lsvq-corpus-feasibility.md (new), changelog.d/added/0367-lsvq-ingestion.md (new). No engine code touched; no upstream-shared paths.
  • Invariant: the JSONL row schema emitted by this adapter is byte-identical to the KonViD-150k Phase 2 adapter (ai/scripts/konvid_150k_to_corpus_jsonl.py) modulo the corpus and corpus_version literals. If a future PR widens the row contract (new column, type change), the LSVQ adapter must follow in lockstep — the trainer-side data loader consumes both shards through one schema.
  • On upstream sync: no action required. The adapter lives entirely under fork-local paths (ai/scripts/, ai/tests/) and only consumes a fork-local CSV manifest.
  • Re-test on rebase:
pytest ai/tests/test_lsvq.py -v
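
One way to spot-check the lockstep row contract is to compare two rows' key order and value types while ignoring the two literals that are allowed to differ. A minimal sketch; the helper names are hypothetical and the example column names used with it (`src`, `mos`) are placeholders, not the real schema.

```python
import json

VARIANT_KEYS = {"corpus", "corpus_version"}

def row_shape(jsonl_line):
    """Ordered (key, type-name) pairs, minus the per-corpus literals."""
    row = json.loads(jsonl_line)
    return [(k, type(v).__name__) for k, v in row.items()
            if k not in VARIANT_KEYS]

def same_schema(line_a, line_b):
    """True when both rows share key order and value types."""
    return row_shape(line_a) == row_shape(line_b)
```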

ADR-0325 — Local sidecar training scaffold (2026-05-08)

  • Touches: tools/vmaf-tune/src/vmaftune/sidecar.py (new), tools/vmaf-tune/tests/test_sidecar.py (new), docs/adr/0325-local-sidecar-training.md (new), docs/adr/_index_fragments/0325-local-sidecar-training.md (new), docs/adr/_index_fragments/_order.txt (append), docs/adr/README.md (index row), docs/research/0086-local-sidecar-feasibility.md (new), docs/ai/local-sidecar-training.md (new), changelog.d/added/local-sidecar-training-scaffold.md (new), tools/vmaf-tune/AGENTS.md (sidecar invariant note). No engine code touched; no upstream-shared paths.
  • Invariant: the sidecar's on-disk state schema (SIDECAR_SCHEMA_VERSION = 1, FEATURE_DIM = 14, the column order in _feature_vector) is the load-bearing pin. Adding columns or reordering them must bump SIDECAR_SCHEMA_VERSION; otherwise saved state from older harness versions silently aligns mismatched columns to the wrong feature. The SidecarConfig.predictor_version tag is the load-bearing pin against shipped-predictor upgrades — bumping it is the contract that invalidates stale corrections without operator intervention.
  • On upstream sync: no action required. The sidecar lives entirely under tools/vmaf-tune/ (fork-local) and only consumes the existing Predictor / ShotFeatures surface. Upstream Netflix/vmaf does not ship a vmaf-tune analogue; conflict probability is zero.
  • Re-test on rebase:

```bash
cd tools/vmaf-tune && python -m pytest tests/test_sidecar.py -v
```
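
The two pins described in the invariant (schema version and predictor version) suggest a load path along these lines. A sketch only: the on-disk key names (`schema_version`, `predictor_version`, `weights`) and the function name are assumptions, not the sidecar module's actual format.

```python
SIDECAR_SCHEMA_VERSION = 1
FEATURE_DIM = 14

def load_state_sketch(state: dict, predictor_version: str):
    """Return the saved weights, None if stale, or raise on schema mismatch."""
    if state.get("schema_version") != SIDECAR_SCHEMA_VERSION:
        # A column reorder without a version bump would misalign features.
        raise ValueError("sidecar schema version mismatch")
    if state.get("predictor_version") != predictor_version:
        return None  # corrections trained against another predictor are stale
    weights = state["weights"]
    if len(weights) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} weights, got {len(weights)}")
    return list(weights)
```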

ADR-0368 — YouTube UGC corpus ingestion (2026-05-08)

  • Touches: ai/scripts/youtube_ugc_to_corpus_jsonl.py (new), ai/tests/test_youtube_ugc.py (new), docs/adr/0368-youtube-ugc-corpus-ingestion.md (new), docs/adr/_index_fragments/0368-youtube-ugc-corpus-ingestion.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated index), docs/ai/youtube-ugc-ingestion.md (new), docs/research/0091-youtube-ugc-corpus-feasibility.md (new), changelog.d/added/0368-youtube-ugc-ingestion.md (new), ai/AGENTS.md (one-paragraph invariant). No engine code touched; no upstream-shared paths.
  • Invariant: the JSONL row schema emitted by this adapter is byte-identical to the LSVQ adapter (ai/scripts/lsvq_to_corpus_jsonl.py, ADR-0367) and the KonViD-150k Phase 2 adapter modulo the corpus and corpus_version literals. If a future PR widens the row contract (new column, type change), all adapters must follow in lockstep.

ADR-0369 — Waterloo IVC 4K-VQA corpus ingestion (2026-05-08)

  • Touches: ai/scripts/waterloo_ivc_to_corpus_jsonl.py (new), ai/tests/test_waterloo_ivc.py (new), docs/adr/0369-waterloo-ivc-4k-corpus-ingestion.md (new), docs/adr/_index_fragments/0369-waterloo-ivc-4k-corpus-ingestion.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated index), docs/ai/waterloo-ivc-4k-ingestion.md (new), docs/research/0091-waterloo-ivc-4k-corpus-feasibility.md (new), changelog.d/added/0369-waterloo-ivc-4k-ingestion.md (new), ai/AGENTS.md (one-paragraph invariant). No engine code touched; no upstream-shared paths.
  • Invariant: JSONL row schema is byte-identical to the LSVQ (ADR-0367) and YouTube-UGC (ADR-0368) adapters modulo the corpus and corpus_version literals. All adapters must change in lockstep on schema widening.

  • On upstream sync: no action required.

  • Re-test on rebase:

```bash
pytest ai/tests/test_youtube_ugc.py -v
pytest ai/tests/test_waterloo_ivc.py -v
```

ADR-0325 — predictor stub-models policy (2026-05-08)

  • Touches: tools/vmaf-tune/src/vmaftune/predictor_train.py (new), model/predictor_<codec>.onnx × 14 (new), model/predictor_<codec>_card.md × 14 (new), tools/vmaf-tune/tests/test_predictor_train.py (new), docs/ai/predictor.md (new), docs/adr/0325-predictor-stub-models-policy.md (new), docs/adr/README.md + _index_fragments/0325-*.md + _order.txt (index rows), changelog.d/added/predictor-train-pipeline.md (new). No engine code; no upstream-shared paths.
  • Invariant: the trainer's CODECS tuple is sourced from predictor._DEFAULT_COEFFS so the two stay in lockstep. Any new codec adapter that lands in predictor._DEFAULT_COEFFS must (a) ship a matching synthetic-stub model + card under model/predictor_<codec>.{onnx,_card.md} in the same PR, and (b) re-run the trainer to refresh the artefact set. The shipped-model smoke test (test_predictor_loads_each_shipped_model) parameterises over CODECS and will fail if either condition is missed.
  • On upstream sync: no action required. The predictor + trainer live entirely under tools/vmaf-tune/ (a fork-local path); the model artefacts live under model/ but use a predictor_<codec>.onnx naming scheme that does not collide with any upstream model/vmaf_*.{json,pkl} or model/tiny/*.onnx path.
  • Re-test on rebase:

```bash
python3 -m pytest tools/vmaf-tune/tests/test_predictor_train.py -q
python3 -c "
import sys
sys.path.insert(0, 'tools/vmaf-tune/src')
from vmaftune.predictor_train import main
sys.exit(main(['--output-dir', '/tmp/predictor-rebase', '--epochs', '20']))
"
```
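
Condition (a) of the invariant, a model plus a card per codec, can be spot-checked with a helper like the one below. `missing_artifacts` is a hypothetical name; the two file-name patterns follow the predictor_<codec>.onnx and predictor_<codec>_card.md entries listed under Touches.

```python
from pathlib import Path

def missing_artifacts(codecs, model_dir):
    """List the model/card files that are absent for the given codecs."""
    root = Path(model_dir)
    missing = []
    for codec in codecs:
        for name in (f"predictor_{codec}.onnx", f"predictor_{codec}_card.md"):
            if not (root / name).is_file():
                missing.append(name)
    return missing
```

An empty return value for the full CODECS tuple is the condition the shipped-model smoke test is described as enforcing.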

ADR-0326 — vmaf-tune Phase B target-VMAF bisect (2026-05-08)

  • Touches: tools/vmaf-tune/src/vmaftune/bisect.py (new), tools/vmaf-tune/src/vmaftune/compare.py (default-predicate error string), tools/vmaf-tune/tests/test_bisect.py (new), tools/vmaf-tune/tests/test_compare.py (renamed default-predicate assertion), tools/vmaf-tune/AGENTS.md (Phase B invariant), docs/adr/0326-vmaf-tune-phase-b-bisect.md (new), docs/adr/_index_fragments/0326-vmaf-tune-phase-b-bisect.md (new), docs/adr/_index_fragments/_order.txt (append), docs/research/0090-vmaf-tune-phase-b-bisect-feasibility.md (new), docs/usage/vmaf-tune-bisect.md (new), changelog.d/added/vmaf-tune-phase-b-bisect.md (new). No upstream Netflix/vmaf surface is touched.
  • Invariant: the bisect assumes monotone-decreasing VMAF in CRF. Two non-adjacent samples that violate this contract abort the call with a clear error rather than falling back to a different search strategy. Do NOT add a fallback path on rebase — the AGENTS.md Phase B note is load-bearing.
  • Companion seam: compare._default_predicate no longer raises NotImplementedError("Phase B pending"); it returns a well-formed RecommendResult(ok=False, error=...) pointing callers at make_bisect_predicate. Any downstream tests that asserted "Phase B pending" verbatim need updating.
  • On upstream sync: no action required. The module lives entirely under tools/vmaf-tune/ (a fork-local path).
  • Re-test on rebase:
python3 -m pytest tools/vmaf-tune/tests/test_bisect.py -v
python3 -m pytest tools/vmaf-tune/tests/test_compare.py -v
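
The monotonicity contract can be illustrated with a small integer bisect that aborts instead of falling back. This is a sketch, not bisect.py: the function names and signature are hypothetical, and reading "non-adjacent" as "CRF values more than one step apart" is an assumption.

```python
def _check_monotone(samples):
    """Abort when a higher CRF scored above a lower, non-adjacent CRF."""
    ordered = sorted(samples.items())
    for i, (c0, v0) in enumerate(ordered):
        for c1, v1 in ordered[i + 1:]:
            if c1 - c0 > 1 and v1 > v0:
                raise ValueError(
                    f"VMAF not monotone in CRF: crf {c0} -> {v0}, crf {c1} -> {v1}")

def bisect_crf(measure_vmaf, target, lo, hi):
    """Largest CRF in [lo, hi] with VMAF >= target, or None if none qualifies."""
    samples = {}

    def probe(crf):
        if crf not in samples:
            samples[crf] = measure_vmaf(crf)
            _check_monotone(samples)  # no fallback strategy: raise instead
        return samples[crf]

    if probe(lo) < target:
        return None
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid) >= target:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best
```

With a toy curve such as `lambda crf: 100.0 - crf` and a target of 70, the search settles on the highest CRF that still meets the target.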

feat/sycl-integer-cambi-port — CAMBI SYCL twin (T3-15 / ADR-0371, 2026-05-10)

  • Touches: core/src/feature/sycl/integer_cambi_sycl.cpp (new file), core/src/feature/feature_extractor.c (extern declaration + list entry under #if HAVE_SYCL), core/src/meson.build (source addition to the SYCL feature list), core/test/test_integer_cambi_sycl.c (new smoke test), core/test/meson.build (test target + gpu_all_deps refactor), docs/backends/sycl/overview.md (Known gaps update), docs/adr/0371-cambi-sycl-port.md (new ADR).
  • Invariant: vmaf_fex_cambi_sycl must remain registered before any Vulkan or CUDA CAMBI extractor in feature_extractor_list[] so SYCL is preferred when the runtime selects a GPU backend. The ordering #if HAVE_SYCL … &vmaf_fex_cambi_sycl before #if HAVE_VULKAN / #if HAVE_CUDA is load-bearing. Additionally: the host residual calls vmaf_cambi_calculate_c_values and vmaf_cambi_spatial_pooling via the cambi_internal.h trampoline; if upstream Netflix ever renames or removes those symbols, the SYCL twin will stop compiling.
  • Upstream conflict probability: low. Netflix upstream does not carry a core/src/feature/sycl/ directory. The only upstream-shared paths touched are feature_extractor.c (extern + list entry) and cambi_internal.h (consumed, not modified). A conflict on feature_extractor.c would be an upstream addition of a new extractor; resolve by re-inserting the vmaf_fex_cambi_sycl entry under #if HAVE_SYCL.
  • Re-test on rebase:
meson setup build -Denable_sycl=true -Denable_cuda=false && ninja -C build
meson test -C build --suite=fast

fix/float-adm-extractor-loading — enable_float default flip (2026-05-09)

No rebase-sensitive invariants. The change is a single default-value flip in core/meson_options.txt (enable_float: false → enable_float: true) and a prose update to docs/development/build-flags.md. No C source was modified; no build-system paths changed; no new symbols were added.

  • On upstream sync: if Netflix upstream ever adds their own enable_float default change, prefer theirs and drop this entry.
  • Re-test on rebase: run the reproducer — ./build/tools/vmaf --feature float_adm --no_prediction ... — and confirm it no longer prints "problem loading feature extractor".

ADR-0326 — MyTestCase upstream migration (partial port, Batch E, 2026-05-08)

  • Touches: python/test/testutil.py, python/test/bd_rate_calculator_test.py, python/test/asset_test.py, python/test/bootstrap_train_test_model_test.py, python/test/local_explainer_test.py, python/test/cy_test.py, python/test/executor_test.py, python/test/raw_extractor_test.py, python/test/cross_validation_test.py, python/test/niqe_train_test_model_test.py, python/vmaf/script/run_testing.py, python/vmaf/tools/misc.py, python/vmaf/tools/testutils.py. Five Netflix golden-pinned files (quality_runner_test.py, vmafexec_test.py, vmafexec_feature_extractor_test.py, feature_extractor_test.py, result_test.py) are deliberately untouched.
  • Invariant: every assertAlmostEqual(key, value) pair in the five golden-pinned files remains byte-identical to the fork's pre-port state per ADR-0024. Verified via /tmp/mytestcase-port/verify_golden.py against the multiset baseline /tmp/mytestcase-port/baseline-pairs.json: all 310 + 183 + 37 + 113 + 17 = 660 pairs PASS post-port. CLAUDE.md §1 / §8 forbid altering them.
  • Deferred upstream commits (still need porting in a future session, in chronological order): 7d1ad54b (port aim/adm3/motion3 fextractor tests), 9fa593eb (more aim/adm3/motion3 + new options), 0341f730 (remove duplicate test_run_vmaf_integer_fextractor), a3776335 + 74bdce1b (align fork tests with upstream layout for aim/adm3/motion3), 322ca041 (replace temporal slicing with pre-sliced YUV fixtures), 6c097fc4 + ead2d12b + 4679db83 (macOS FP tolerance widenings — many of these are no-ops for our fork because the affected lines do not exist in the fork's current state), 005988ea (routine_test MyTestCase + fifo_mode), 3a041a97 + 3e075107 (per the user's 2026-05-08 instruction list, these "update score values" upstream commits are PERMANENTLY skipped — porting them would violate ADR-0024). The 3cbf352d + eb3374d0 and a333ba4c + 403dafed revert-pairs are no-ops upstream and require no port. The MyTestCase mixin itself is already in python/vmaf/tools/misc.py from a prior fork-local sync.
  • Watch out for: when retrying the deferred port, the fork's feature_extractor_test.py test method order (psnr -> ansnr -> ssim -> ssim_flat -> ms_ssim) differs from upstream's post-cluster layout (psnr -> ssim -> ms_ssim -> ansnr); the cherry-pick conflicts cluster around this reordering. The aim/adm3/motion3 additive blocks should be transplanted as new test methods rather than merged into existing ones. The verifier script will catch any (key, value) pair drop.
  • Re-test on rebase:

```bash
python3 /tmp/mytestcase-port/verify_golden.py   # OVERALL: PASS required
pytest python/test/bd_rate_calculator_test.py -v
pytest python/test/asset_test.py -v
pytest python/test/quality_runner_test.py python/test/vmafexec_test.py \
       python/test/vmafexec_feature_extractor_test.py \
       python/test/feature_extractor_test.py python/test/result_test.py \
       --collect-only -q   # 173 tests collected, no errors
```
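
The multiset comparison behind the golden-pair check can be sketched with collections.Counter. This is not verify_golden.py, whose implementation is not shown here; `diff_pairs` is a hypothetical helper and the pair representation is an assumption.

```python
from collections import Counter

def diff_pairs(baseline, current):
    """Return (dropped, added) multisets of (key, value) pairs."""
    before = Counter(map(tuple, baseline))
    after = Counter(map(tuple, current))
    # Counter subtraction keeps only positive counts, so duplicates matter.
    return before - after, after - before
```

Both results being empty corresponds to a PASS; a pair that appears twice in the baseline but once after the port shows up in `dropped`, which a plain set comparison would miss.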

ADR-0318 — fr_regressor_v2 ensemble retrain harness fix (2026-05-06)

  • Touched files:
  • ai/scripts/run_ensemble_v2_real_corpus_loso.sh — wrapper passes --corpus "$CORPUS_JSONL" + --out-dir "$out_dir", drops --corpus-root / --output. JSONL-existence check replaces the YUV-directory hard-fail (YUV check is informational).
  • docs/ai/ensemble-v2-real-corpus-retrain-runbook.md — adds step 0. Generate the Phase A canonical-6 corpus, expands prereqs table with the JSONL row + Phase A wall-time estimate.
  • docs/adr/0318-ensemble-retrain-harness-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ensemble-retrain-harness-interface.md.
  • Rebase invariant: not load-bearing. Wrapper-script + doc-only change. The trainer ai/scripts/train_fr_regressor_v2_ensemble_loso.py CLI is the authoritative interface (frozen here as part of the decision); any future change to it must update this wrapper in the same PR.
  • Upstream source: none — fork-local AI training harness.
  • On upstream sync: no action required. Path is entirely under ai/scripts/ + docs/ai/ + docs/adr/; upstream Netflix/vmaf does not ship these directories.
  • Re-test on rebase:

```bash
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
python3 ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
mkdir -p runs/phase_a/full_grid && \
  touch runs/phase_a/full_grid/per_frame_canonical6.jsonl && \
  bash ai/scripts/run_ensemble_v2_real_corpus_loso.sh 2>&1 | \
  grep -q "unrecognized arguments" && echo "REGRESSED" || echo "OK"
rm -rf runs/phase_a runs/ensemble_v2_real
```

0320 — fr_regressor_v2 ensemble seeds — production flip (ADR-0320)

  • Touches: model/tiny/registry.json (five fr_regressor_v2_ensemble_v1_seed{0..4} rows flipped from smoke: true to smoke: false), model/tiny/fr_regressor_v2_ensemble_v1_seed_flip_PROMOTE.json (new — committed verdict from the ADR-0319 harness run), ai/AGENTS.md (registry-flip invariant updated to record the flip + the going-forward "fresh PROMOTE.json required" rule), docs/state.md (Recently closed row), docs/adr/0320-fr-regressor-v2-ensemble-seed-flip.md (new ADR). Closes the deferral tracked in rebase-notes §0303 / §0309 / §0319.
  • Upstream source: none — fork-local registry mutation honouring ADR-0303's flip contract. Netflix/vmaf upstream has no fr_regressor_v2 ensemble surface.
  • Invariant: any future change to the fr_regressor_v2_ensemble_v1_seed{0..4} registry rows (sha256 bump after retraining, smoke-flag mutation, ONNX path change) requires a fresh runs/ensemble_v2_real/PROMOTE.json verdict with mean per-seed LOSO PLCC ≥ 0.95 AND max - min spread ≤ 0.005 — the same two-part gate ADR-0303 defined and ADR-0320 honoured. Never mutate these rows during a /sync-upstream rebase or as a side-effect of any other PR; the harness emits the verdict file but does not mutate the registry. The committed verdict at model/tiny/fr_regressor_v2_ensemble_v1_seed_flip_PROMOTE.json is the audit-trail anchor for the 2026-05-06 flip.
  • On upstream sync: no action required. The five rows live in model/tiny/registry.json which is fork-local; upstream has no competing entries. If upstream ever ships its own fr_regressor_v2_ensemble_v1_* registry rows, stop and consult the rebase reviewer — naming collision implies an architectural divergence that needs a Supersedes-ADR, not a mechanical merge.
  • Re-test on rebase:
python3 -c "import json; \
  d = json.load(open('model/tiny/registry.json')); \
  seeds = [m for m in d['models'] \
           if m['id'].startswith('fr_regressor_v2_ensemble_v1_seed')]; \
  assert len(seeds) == 5, seeds; \
  assert all(m['smoke'] is False for m in seeds), seeds; \
  print('OK: 5 ensemble seeds at smoke=false')"
python3 -c "import json; \
  v = json.load(open('model/tiny/fr_regressor_v2_ensemble_v1_seed_flip_PROMOTE.json')); \
  assert v['verdict'] == 'PROMOTE'; \
  assert v['gate']['passed'] is True; \
  print('OK: verdict still PROMOTE')"
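
The two-part gate (mean per-seed LOSO PLCC ≥ 0.95 and max - min spread ≤ 0.005) can be written down directly. A sketch under assumptions: `promote_gate` and the "HOLD" label are hypothetical; the `verdict` and `gate.passed` keys mirror the fields the re-test commands read, but the real PROMOTE.json may carry more.

```python
from statistics import mean

PLCC_MEAN_FLOOR = 0.95
PLCC_SPREAD_CEILING = 0.005

def promote_gate(per_seed_plcc):
    """Evaluate the mean and spread conditions over per-seed PLCC values."""
    avg = mean(per_seed_plcc)
    spread = max(per_seed_plcc) - min(per_seed_plcc)
    passed = avg >= PLCC_MEAN_FLOOR and spread <= PLCC_SPREAD_CEILING
    return {
        "verdict": "PROMOTE" if passed else "HOLD",  # "HOLD" is a placeholder
        "gate": {"passed": passed, "mean_plcc": avg, "spread": spread},
    }
```

Both conditions must hold together: a high mean with one outlier seed fails on spread, and a tight cluster below 0.95 fails on the mean.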

ADR-0372 — HIP batch-1: integer_psnr_hip + float_ansnr_hip real kernels (2026-05-10)

  • Touches: core/src/feature/hip/integer_psnr_hip.c (full rewrite), core/src/feature/hip/float_ansnr_hip.c (full rewrite), core/src/feature/hip/integer_psnr_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/float_ansnr_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/integer_psnr/psnr_score.hip (new device kernel), core/src/feature/hip/float_ansnr/float_ansnr_score.hip (new device kernel), core/src/hip/kernel_template.{h,c} (vmaf_hip_kernel_submit_post_record — also in PR #612; on merge conflict keep one copy and drop the duplicate), core/src/meson.build (hip_hsaco_sources HSACO build pipeline — also in PR #612), docs/backends/hip/overview.md (status table update), docs/adr/0372-hip-batch1-integer-psnr-float-ansnr.md (new).
  • Invariant (HAVE_HIPCC dual-path): all device-state fields in PsnrStateHip / AnsnrStateHip and all hipModule_t / hipFunction_t member declarations live under #ifdef HAVE_HIPCC. Without the flag the scaffold -ENOSYS contract is preserved and the host TU compiles without ROCm SDK headers. On rebase or refactor, never move device-state fields outside the #ifdef HAVE_HIPCC guard — it breaks the CPU-only build.
  • Invariant (float_ansnr no-memset bypass): float_ansnr_hip's submit() does not call vmaf_hip_kernel_submit_pre_launch. The device kernel writes per-block (sig, noise) float partials directly into an output buffer (partials[2*block_idx+0] and [+1]); no atomic accumulation means no memset is needed. The partials buffer is sized wg_count * 2u * sizeof(float) at init. On rebase: if a future PR adds a submit_pre_launch call to float_ansnr_cuda.c, the HIP twin must follow in the same PR.
  • Invariant (integer_psnr uint64 split shuffle): the PSNR kernel splits a uint64 warp-reduction into two uint32 __shfl_down calls (HIP warp size = 64, no native uint64 shuffle). On rebase: if ROCm adds native uint64 shuffle primitives in a future release, the kernel can be simplified — but verify the cross-backend numeric gate meson test -C build --suite=hip-parity still passes before landing.
  • Merge-conflict risk with PR #612: vmaf_hip_kernel_submit_post_record in kernel_template.{h,c} and the hip_hsaco_sources meson pipeline are also being added by PR #612 (float_psnr_hip). When the two PRs merge, keep one copy of each and discard the duplicate. The bodies are identical so either direction is safe.
  • On upstream sync: no action required for PSNR or ANSNR logic (fork-local kernels). If upstream adds its own HIP backend with conflicting meson.build variables, resolve manually against the hip_hsaco_sources pattern documented in this PR.
  • Re-test on rebase:
# CPU-only build (no ROCm required): must compile clean
meson setup build_hip_cpu -Denable_hip=true -Denable_hipcc=false \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_cpu

# With ROCm + hipcc: HSACO pipeline must produce .hsaco + _hsaco.c
meson setup build_hip_full -Denable_hip=true -Denable_hipcc=true \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_full

# Cross-backend numeric gate (requires AMD GPU)
meson test -C build_hip_full --suite=hip-parity
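The uint64 split shuffle described in the integer_psnr invariant can be modelled on the host. The sketch below simulates a 64-lane wavefront as a Python list; `shfl_down_u32`, `shfl_down_u64` and `warp_reduce_sum_u64` are hypothetical names, and the model assumes that a lane whose source index falls past the end of the wavefront keeps its own value. It is not the kernel code, only a way to see why two 32-bit shuffles recombine into a correct 64-bit reduction.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
WARP_SIZE = 64  # wavefront width quoted in the invariant


def shfl_down_u32(values, offset):
    """Lane i reads lane i + offset; out-of-range lanes keep their own value."""
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i] for i in range(n)]


def shfl_down_u64(values, offset):
    """Emulate a 64-bit shuffle with two independent 32-bit shuffles."""
    lo = shfl_down_u32([v & MASK32 for v in values], offset)
    hi = shfl_down_u32([(v >> 32) & MASK32 for v in values], offset)
    return [(h << 32) | l for h, l in zip(hi, lo)]


def warp_reduce_sum_u64(lanes):
    """Tree reduction over one wavefront; the total ends up in lane 0."""
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down_u64(vals, offset)
        vals = [(a + b) & MASK64 for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


if __name__ == "__main__":
    lanes = [(1 << 40) + i for i in range(WARP_SIZE)]
    print(warp_reduce_sum_u64(lanes) == sum(lanes))
```

The per-lane values above exceed 32 bits on purpose, so a reduction that dropped the high half would give a different total.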

ADR-0373 — HIP batch-2: float_motion_hip real kernel (2026-05-10)

  • Touches: core/src/feature/hip/float_motion_hip.c (full rewrite to #ifdef HAVE_HIPCC dual-path; uintptr_t opaque slots replaced with real void * device pointers), core/src/feature/hip/float_motion/float_motion_score.hip (device kernel — already present, no change in this PR), core/src/meson.build (float_motion_score added to hip_kernel_sources), docs/backends/hip/overview.md (status update to 4/11), docs/adr/0373-hip-batch2-float-motion.md (new).
  • Invariant (HAVE_HIPCC dual-path): hipModule_t module, hipFunction_t funcbpc8/funcbpc16, and void *ref_in, void *blur[2] live under #ifdef HAVE_HIPCC. Without the flag the scaffold -ENOSYS contract is preserved. Never move these fields outside the guard.
  • Invariant (temporal blur ping-pong): cur_blur alternates 0/1 in both submit() and collect(). The kernel reads blur[1 - s->cur_blur] as "prev" and writes blur[s->cur_blur] as "cur". collect() flips cur_blur after consuming the partials. On rebase: if the CUDA twin changes the ping-pong direction, the HIP twin must follow in the same PR.
  • Invariant (first-frame compute_sad=0): submit() passes compute_sad=0 when index == 0. The kernel still writes cur_blur but sets all partials to 0.0. collect() emits motion_score=0 and motion2_score=0 for index==0 without SAD accumulation.
  • Invariant (flush tail): flush() emits VMAF_feature_motion2_score = prev_motion_score at s->index (the last frame) and returns 1. If s->index == 0 it returns 1 immediately. Mirrors flush_fex_cuda shape exactly.
  • On upstream sync: no action required (fork-local kernel). If Netflix adds a HIP backend with conflicting float_motion logic, resolve against this invariant set.
  • Re-test on rebase:
# CPU-only build (no ROCm required): must compile clean
meson setup build_hip_cpu -Denable_hip=true -Denable_hipcc=false \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_cpu

# With ROCm + hipcc: HSACO pipeline must produce float_motion_score.hsaco
meson setup build_hip_full -Denable_hip=true -Denable_hipcc=true \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_full

# Cross-backend numeric gate (requires AMD GPU)
meson test -C build_hip_full --suite=hip-parity
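The ping-pong and first-frame invariants above can be illustrated with a host-side model. In this sketch `motion_sad_sequence` is a hypothetical name, the blur step is replaced by an identity copy, and submit() and collect() are collapsed into one loop body; it only shows which slot is read as "prev", which is written as "cur", and why frame 0 reports 0.

```python
def motion_sad_sequence(frames):
    """Model the two-slot blur ping-pong; returns one SAD value per frame."""
    blur = [None, None]  # the two ping-pong slots
    cur_blur = 0
    scores = []
    for index, frame in enumerate(frames):
        # submit(): always write the "cur" slot, even for the first frame.
        blur[cur_blur] = list(frame)  # identity stand-in for the real blur
        prev = blur[1 - cur_blur]
        compute_sad = index != 0  # first frame has no predecessor
        if compute_sad:
            sad = sum(abs(c - p) for c, p in zip(blur[cur_blur], prev))
        else:
            sad = 0
        # collect(): consume the result, then flip the slot index.
        scores.append(sad)
        cur_blur = 1 - cur_blur
    return scores


if __name__ == "__main__":
    print(motion_sad_sequence([[1, 2], [2, 4], [2, 4], [0, 0]]))
```

If the flip were skipped, frame N would be compared against itself and every score after the first would collapse to 0, which is why the direction must stay in step with the CUDA twin.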

ADR-0375 — HIP batch-3: float_moment_hip + float_ssim_hip real kernels (2026-05-10)

  • Touches: core/src/feature/hip/float_moment_hip.c (full rewrite to #ifdef HAVE_HIPCC dual-path), core/src/feature/hip/float_moment_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/float_moment/moment_score.hip (new device kernel), core/src/feature/hip/float_ssim_hip.c (full rewrite to #ifdef HAVE_HIPCC dual-path), core/src/feature/hip/float_ssim_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/float_ssim/ssim_score.hip (new device kernel), core/src/meson.build (moment_score and ssim_score added to hip_kernel_sources), docs/backends/hip/overview.md (status update to 6/11), docs/adr/0375-hip-batch3-float-moment-float-ssim.md (new).
  • Invariant (HAVE_HIPCC dual-path): hipModule_t module, hipFunction_t funcbpc8/16, void *ref_in/dis_in (moment) and all five void *d_* intermediate buffers + void *ref_in/cmp_in (SSIM) live under #ifdef HAVE_HIPCC in the respective structs. Free helpers (moment_hip_module_free, ssim_hip_bufs_free) are defined outside the guard with internal #ifdef HAVE_HIPCC bodies (mirrors float_psnr_hip_module_free). On rebase: never move device-state fields outside the guard — it breaks the CPU-only build.
  • Invariant (moment 7-arg kernel): calculate_moment_hip_kernel_8bpc and _16bpc both take 7 arguments (ref, dis, ref_stride, dis_stride, sums, width, height) — the 16bpc kernel does NOT take a bpc arg. The host launch must pass 7 args to both functions. If the CUDA twin adds a bpc arg to moment_score_16bpc, the HIP twin must follow.
  • Invariant (SSIM two-pass stream ordering): both calculate_ssim_hip_horiz_* and calculate_ssim_hip_vert_combine run on the same s->lc.str stream. Implicit stream ordering provides the happens-before between Pass 1 writes and Pass 2 reads — no explicit event is needed between the two launches. On rebase: if the CUDA twin adds an explicit inter-pass sync event, evaluate whether GCN/RDNA's stream ordering guarantees are equivalent before mirroring.
  • Invariant (SSIM WARPS_PER_BLOCK=2): SSIM_WARPS_PER_BLOCK = SSIM_BLOCK_SIZE / SSIM_WARP_SIZE = 128 / 64 = 2 (vs CUDA's 4 = 128/32). The shared-memory array s_warp_sums[SSIM_WARPS_PER_BLOCK] must be sized 2. On rebase: if the block size or warp size changes, update ssim_score.hip accordingly.
  • Invariant (SSIM scale=1 only): init_fex_hip rejects scale != 1 with -EINVAL. Mirrors the CUDA and Vulkan SSIM twins. Lifting this constraint is a future batch item and requires a new ADR.
  • On upstream sync: no action required (fork-local kernels).
  • Re-test on rebase:
# CPU-only build (no ROCm required): must compile clean
meson setup build_hip_cpu -Denable_hip=true -Denable_hipcc=false \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_cpu

# With ROCm + hipcc: HSACO pipeline must produce moment_score.hsaco + ssim_score.hsaco
meson setup build_hip_full -Denable_hip=true -Denable_hipcc=true \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_full

# Cross-backend numeric gate (requires AMD GPU)
meson test -C build_hip_full --suite=hip-parity

feat/hip-float-psnr-first-real — T7-10b: float_psnr_hip first real kernel (ADR-0254)

Touches:

  • core/src/feature/hip/float_psnr_hip.c — complete rewrite from scaffold stub to functional kernel consumer (HIP Module API pattern: hipModuleLoadData + hipModuleLaunchKernel).
  • core/src/feature/hip/float_psnr_hip.h — HSACO symbol extern declarations (float_psnr_score_hsaco[], float_psnr_score_hsaco_len).
  • core/src/feature/hip/float_psnr/float_psnr_score.hip — new HIP device kernel file (warp-64 reduction, GCN/RDNA-specific __shfl_down).
  • core/src/hip/kernel_template.{h,c} — new vmaf_hip_kernel_submit_post_record helper for the post-launch event.
  • core/src/hip/meson.build — float_psnr_hip.c added to hip_sources.
  • core/src/meson.build — hip_hsaco_sources list + enable_hipcc guard + hipcc / xxd custom-target pipeline.
  • core/meson_options.txt — new enable_hipcc boolean option.
  • core/src/feature/feature_extractor.c — vmaf_fex_float_psnr_hip registration under #if HAVE_HIP.
  • core/test/test_hip_smoke.c — test_float_psnr_hip_extractor_registered.

Invariants:

  1. vmaf_hip_kernel_submit_post_record call ordering — must be called after hipMemcpyAsync DtoH and before the collect-side vmaf_hip_kernel_collect_wait. Any reorder breaks the event-fencing contract (the finished event records the end of the readback copy, not the end of the kernel launch). If a future PR refactors kernel_template.c, preserve this ordering constraint.

  2. #ifdef HAVE_HIPCC wraps all device-dependent state — hipModule_t, hipFunction_t, staging-buffer allocs, and the kernel launch chain are all inside #ifdef HAVE_HIPCC guards so the file compiles cleanly without a ROCm SDK. The #ifndef HAVE_HIPCC paths return -ENOSYS. Any future real-kernel port must preserve this dual-path pattern.

  3. Warp size 64 (GCN/RDNA) — FPSNR_WARPS_PER_BLOCK = 4 for block size 256 (vs CUDA's 8 warps at warp size 32). The .hip kernel uses __shfl_down(v, off) without a mask (HIP convention; CUDA uses __shfl_down_sync(0xffffffff, v, off)). Any future port of the CUDA twin's warp-reduction that changes these constants must update both backends.

  4. enable_hipcc=false default — the meson option defaults to false so CI / downstream builds without a ROCm toolchain still compile cleanly. The enable_hipcc=true path requires hipcc + xxd in PATH and a ROCm 6+ SDK. Gate any build-system change on both codepaths.

Re-test on rebase:

# CPU-only (no ROCm needed):
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build
meson test -C build --suite=fast  # test_float_psnr_hip_extractor_registered passes

# With hipcc (ROCm 6+):
meson setup build -Denable_hip=true -Denable_hipcc=true -Denable_cuda=false libvmaf
ninja -C build
# Confirm kernel launches on device by running vmaf with --feature float_psnr_hip
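The warp-count constants in invariant 3 follow directly from block size divided by warp size. A minimal sketch of that arithmetic, with `warps_per_block` as a hypothetical helper name:

```python
def warps_per_block(block_size, warp_size):
    """Number of warps (wavefronts) in one thread block."""
    if block_size % warp_size != 0:
        raise ValueError("block size must be a multiple of the warp size")
    return block_size // warp_size


if __name__ == "__main__":
    print(warps_per_block(256, 64))  # HIP value quoted above
    print(warps_per_block(256, 32))  # CUDA value quoted above
```

Any shared-memory array indexed by warp must be sized from this quotient, so a change to either operand has to be mirrored in the kernel.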

HIP batch-4 -- ciede_hip and integer_motion_v2_hip real kernels (ADR-0377)

Rebase-sensitive invariants:

  1. Arithmetic shift on int32/int64 in motion_v2_score.hip — the inner filter right-shifts (>> shift_y, >> shift_x) operate on signed types (int32_t, int64_t). They MUST remain arithmetic (signed) shifts. Converting to logical shifts (e.g., >> (unsigned), or using bitwise ops) diverges from the CPU reference for negative values. This was the root cause of the AVX2 srlv_epi64 regression in PR #587. The CUDA twin documents the same constraint.

  2. Mirror padding diverges from motion_hip — motion_v2_score.hip uses reflective mirror (2 * size - idx - 1) while motion_hip's kernel uses skip-boundary mirror (2 * size - idx - 2). Both match their respective CPU references. Do not unify them on rebase.

  3. Six YUV staging buffers for ciede_hip — ciede_hip_bufs_alloc allocates ref_y/u/v + dis_y/u/v separately. The chroma buffers are sized at chroma_w * chroma_h, not luma_w * luma_h. If the HIP picture-buffer API changes on rebase (e.g., VMAF_FEATURE_EXTRACTOR_HIP flag lands and pictures arrive on-device), these staging copies and their hipMalloc/hipFree calls must be removed or made conditional.

  4. #ifdef HAVE_HIPCC dual-path preserved — same invariant as float_psnr_hip (see entry above). All device-dependent state and kernel launches are inside #ifdef HAVE_HIPCC guards.
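The arithmetic-shift invariant (item 1) is easy to demonstrate numerically. Python's `>>` on integers is an arithmetic shift, so the sketch below emulates a 64-bit logical shift by masking first; the function names are hypothetical and the sketch only shows how the two shifts disagree for negative inputs.

```python
MASK64 = (1 << 64) - 1


def arithmetic_shift_right(value, shift):
    """Sign-preserving shift, as required for the signed filter sums."""
    return value >> shift


def logical_shift_right_64(value, shift):
    """Zero-filling shift on the 64-bit two's-complement bit pattern."""
    return (value & MASK64) >> shift


if __name__ == "__main__":
    print(arithmetic_shift_right(-5, 1))
    print(logical_shift_right_64(-5, 1))
```

For non-negative values both shifts agree, which is why a logical-shift regression can pass tests on inputs that never produce a negative intermediate.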

Re-test on rebase:

# CPU-only (no ROCm needed):
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build
meson test -C build  # 54/54 pass including test_hip_smoke
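The two mirror formulas from invariant 2 differ by one sample at the far edge. The sketch below evaluates both for indices past the end of a row; the function names are hypothetical, and only the far-edge branch quoted in these notes is modelled (the near-edge handling is not described here).

```python
def mirror_reflective(idx, size):
    """motion_v2 form: the edge sample is repeated."""
    return 2 * size - idx - 1 if idx >= size else idx


def mirror_skip_boundary(idx, size):
    """motion form: the edge sample is skipped."""
    return 2 * size - idx - 2 if idx >= size else idx


if __name__ == "__main__":
    size = 5
    for idx in (5, 6):
        print(idx, mirror_reflective(idx, size), mirror_skip_boundary(idx, size))
```

With size = 5, index 5 maps to 4 under the reflective form and to 3 under the skip-boundary form, so unifying the two would shift every padded tap by one sample in one of the extractors.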

speed_qa -- real SpEED-QA implementation (ADR-0253)

core/src/feature/speed_qa.c went from a 71-line placeholder scaffold to a ~380-line real implementation. The extractor now sets VMAF_FEATURE_EXTRACTOR_TEMPORAL and carries priv_size = sizeof(SpeedQaState); the registration entry in feature_extractor_list[] is unchanged (always unconditional, outside the #if VMAF_FLOAT_FEATURES block).

No upstream rebase conflict expected. The scaffold was fork-local; Netflix upstream has no speed_qa.c. The upstream speed.c is unmodified.

Rebase invariant: vmaf_fex_speed_qa must stay outside the VMAF_FLOAT_FEATURES guard in feature_extractor.c -- speed_qa.c is compiled unconditionally (no float dependency). If a future Netflix commit lands a speed_qa.c, audit for algorithm conflicts before merging.

Re-test on rebase:

meson setup build_test libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build_test test/test_speed_qa
meson test -C build_test test_speed_qa --verbose
# Expected: 5 tests run, 5 passed

0378 — picture-upload stream CU_STREAM_NON_BLOCKING (PR #702, ADR-0378)

Touches: core/src/cuda/picture_cuda.c (one line in vmaf_cuda_picture_alloc).

Invariant: priv->cuda.str must be created with CU_STREAM_NON_BLOCKING (via cuStreamCreateWithPriority). The CUDA implicit null-stream serialisation rule makes CU_STREAM_DEFAULT a per-frame context barrier; at sub-4K this reduces CUDA motion throughput to ~0.55x CPU. If a future upstream commit touches vmaf_cuda_picture_alloc and reverts the stream flag, the performance regression returns silently.

Re-test:

meson setup build-cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build-cuda
./build-cuda/tools/vmaf_bench --resolution 576x324 --gpu-only --frames 20
# motion (CUDA) @ 576x324 must be >= 30 fps (>= 1x CPU baseline)

ADR-0376 — Vulkan buffer-invalidate void→int fix (GCC 16, 2026-05-10)

  • Touches: core/src/feature/vulkan/float_ansnr_vulkan.c (function reduce_partials, call site in extract), core/src/feature/vulkan/cambi_vulkan.c (functions cambi_vk_readback_image, cambi_vk_readback_mask, call sites in cambi_vk_extract).
  • Invariant: reduce_partials, cambi_vk_readback_image, and cambi_vk_readback_mask are now static int with error-propagating call sites. If an upstream sync brings a competing refactor of these functions (e.g., a signature change or a different coherency-flush strategy), the static int contract and call-site error checks must be preserved.
  • On upstream sync: no risk from Netflix/vmaf upstream — these files are 100% fork-local (Vulkan feature extractors do not exist upstream). The rebase risk is internal: if a fork-local PR changes the Vulkan buffer-API surface (e.g., a new vmaf_vulkan_buffer_invalidate variant), verify the call-site error propagation pattern still applies.
  • Re-test on rebase:
meson setup build-vk-retest -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=true
ninja -C build-vk-retest
# Must compile cleanly under GCC 16 with no -Wreturn-mismatch error

PR-fix-cuda-picture-widening — CUDA picture_cuda.c integer-precision fixes (round-5 clang-tidy)

  • Touches: core/src/cuda/picture_cuda.c — upstream-shared CUDA picture allocation and transfer path.
  • Invariant: Three WidthInBytes / cuMemAllocPitch width arguments now use (size_t) casts to prevent silent 32-bit multiplication overflow before widening to size_t. aligned_y / aligned_c are unsigned with an explicit 1u mask literal. vmaf_ref_load() result is stored as long. If an upstream sync modifies these expressions, ensure the (size_t) casts and unsigned types are preserved.
  • Re-test on rebase:
clang-tidy \
  -checks='-*,bugprone-narrowing-conversions,bugprone-implicit-widening-of-multiplication-result' \
  -p core/build-cuda \
  core/src/cuda/picture_cuda.c
# Must produce zero warnings for the named checks.
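The reason for the (size_t) casts is that a product of two 32-bit operands is computed in 32 bits and wraps before it is widened. The sketch below models that with masks; the function names and the deliberately extreme width are hypothetical, and a 32-bit unsigned and a 64-bit size_t are assumed.

```python
UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def width_in_bytes_narrow(width, bytes_per_sample):
    """Multiply in 32 bits, then widen: the wrap has already happened."""
    return (width * bytes_per_sample) & UINT32_MASK


def width_in_bytes_widened(width, bytes_per_sample):
    """Widen one operand first (the cast), then multiply in 64 bits."""
    return (width * bytes_per_sample) & UINT64_MASK


if __name__ == "__main__":
    w = 1 << 31  # extreme on purpose, to force the wrap
    print(width_in_bytes_narrow(w, 2), width_in_bytes_widened(w, 2))
```

For realistic frame widths both forms agree, which is why the defect is silent until an unusually large dimension reaches the allocator.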

PR-fix-cuda-dispatch-getenv — CUDA dispatch_strategy.c getenv() thread-safety fix

  • Touches: core/src/cuda/dispatch_strategy.c — fork-local TU (no upstream equivalent).
  • Invariant: g_env_once / cache_env_dispatch / g_env_disp must remain as the single canonical read path for VMAF_CUDA_DISPATCH. If a future PR needs to re-read the variable (e.g., for unit-test reset), it must reset g_env_once via pthread_once_t g_env_once = PTHREAD_ONCE_INIT; in a test fixture, not call getenv() directly from vmaf_cuda_select_strategy.
  • Re-test on rebase:
clang-tidy \
  -checks='-*,concurrency-*' \
  -p core/build-cuda \
  core/src/cuda/dispatch_strategy.c
# Must produce zero concurrency-mt-unsafe warnings.

0103 — -fvisibility=hidden + VMAF_EXPORT public-API annotation (ADR-0379, Research-0092)

  • Touches: core/src/meson.build (vmaf_cflags_common), core/include/libvmaf/*.h (all public headers), core/include/libvmaf/macros.h (new file), core/include/core/meson.build (header install list), core/src/dnn/model_loader.h (vmaf_dnn_verify_signature declaration).
  • Invariant: -fvisibility=hidden is in vmaf_cflags_common. Any new public vmaf_* function added by an upstream sync — whether in libvmaf.c, picture.c, dict.c, or any other source — must also have VMAF_EXPORT on its declaration in the matching public header, otherwise it will be hidden in libvmaf.so and downstream callers will get a link error. Gate: nm -D --defined-only build/src/libvmaf.so.* | grep ' [TW] ' | grep -v ' vmaf_' | wc -l must be 0.
  • On upstream sync: upstream Netflix/vmaf does NOT use -fvisibility=hidden. Any new public entry point in an upstream commit (typically added to core/src/libvmaf.c + core/include/libvmaf/libvmaf.h) will compile to a hidden symbol on the fork without VMAF_EXPORT. The merge author must:
      • Add VMAF_EXPORT to the new declaration in the public header.
      • Run the nm -D gate (above) — it must return 0.
      • Run meson test -C build — all tests must pass.
  • Re-test on rebase:
meson setup build-vis libvmaf -Denable_cuda=false -Denable_sycl=false --wipe
ninja -C build-vis
nm -D --defined-only build-vis/src/libvmaf.so.* | grep ' [TW] ' | grep -v ' vmaf_' | wc -l
# Must print 0
meson test -C build-vis
# All tests must pass
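The nm gate can also be expressed as a small filter, which is convenient when the check has to run inside a Python test harness. The sketch below mirrors the `grep ' [TW] ' | grep -v ' vmaf_'` pipeline on text that has already been captured from `nm -D --defined-only`; `count_stray_exports` is a hypothetical name and the sample symbols are invented for the example.

```python
def count_stray_exports(nm_output):
    """Count exported text/weak symbols that do not start with vmaf_."""
    stray = 0
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _address, sym_type, name = parts
        if sym_type in ("T", "W") and not name.startswith("vmaf_"):
            stray += 1
    return stray


SAMPLE = """\
0000000000012340 T vmaf_init
0000000000012400 T vmaf_close
0000000000013000 T helper_internal
0000000000014000 W vmaf_weak_thing
0000000000015000 D some_data
"""

if __name__ == "__main__":
    print(count_stray_exports(SAMPLE))
```

The gate passes only when the count is 0; in the invented sample the one stray entry is the text symbol without the vmaf_ prefix.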

PR-fix-cuda-switch-defaults — CUDA feature extractor defensive fixes (round-5 clang-tidy)

  • Touches: core/src/feature/cuda/integer_adm_cuda.c, core/src/feature/cuda/integer_vif_cuda.c.
  • Invariant: default: break; clauses added to three switch(scale) statements. If an upstream sync adds new scale cases to ADM (scales 1–3) or VIF (scales 0–3), the default clause remains valid but no longer exhausts all cases — update the comment accordingly. The RES_BUFFER_SIZE macro now has parentheses; any fork-local addition to that macro must preserve them.
  • Re-test on rebase:
clang-tidy \
  -checks='-*,bugprone-macro-parentheses,bugprone-switch-missing-default-case' \
  -p core/build-cuda \
  core/src/feature/cuda/integer_adm_cuda.c \
  core/src/feature/cuda/integer_vif_cuda.c
# Must produce zero warnings for those checks.

PR-fix-picture-align-unsigned-narrowing — integer-sanitizer narrowing/overflow fixes in picture.c, libvmaf.c, tensor_io.c (round-5 -fsanitize=integer sweep)

  • Touches: core/src/picture.c (upstream-shared picture geometry), core/src/libvmaf.c (upstream-shared vmaf_init), and core/src/dnn/tensor_io.c (fork-added f16 ↔ f32 converter).
  • Invariant: Three narrowing/overflow defects corrected: (1) aligned_y/aligned_c are now unsigned with DATA_ALIGN - 1u mask — if an upstream sync touches picture_compute_geometry, ensure the unsigned type and 1u literal are preserved. (2) vmaf_set_cpu_flags_mask call site in vmaf_init uses (unsigned)(~cfg.cpumask) — if cpumask type changes upstream, revisit the cast. (3) f16_to_f32_one subnormal path uses a signed int32_t exp_adj counter — if the f16 converter is reworked, verify no unsigned wrap is reintroduced.
  • Re-test on rebase:
CC=clang CXX=clang++ meson setup /tmp/build-isan-retest libvmaf \
  -Denable_cuda=false -Denable_sycl=false \
  --buildtype=debugoptimized -Db_sanitize=integer -Db_lundef=false -Db_lto=false
ninja -C /tmp/build-isan-retest
UBSAN_OPTIONS="halt_on_error=0:abort_on_error=0" \
  /tmp/build-isan-retest/test/test_picture 2>&1 | grep "runtime error"
UBSAN_OPTIONS="halt_on_error=0:abort_on_error=0" \
  /tmp/build-isan-retest/test/test_read_pictures_monotonic 2>&1 | grep "runtime error"
UBSAN_OPTIONS="halt_on_error=0:abort_on_error=0" \
  /tmp/build-isan-retest/test/dnn/test_tensor_io 2>&1 | grep "runtime error"
# All three must produce zero "runtime error" lines.
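Defect (3) concerns the subnormal branch of the f16 to f32 conversion, where the exponent adjustment counts downward from zero and therefore needs a signed counter. The sketch below is a reference converter written for illustration, not a transcription of f16_to_f32_one: the name `f16_bits_to_f32` and the bias constants are this sketch's own, and the result can be compared against Python's built-in half-precision codec.

```python
import struct


def f16_bits_to_f32(h):
    """Convert a 16-bit IEEE half bit pattern to a Python float via f32 bits."""
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    mant = h & 0x3FF
    if exp == 0:
        if mant == 0:
            bits = sign << 31  # signed zero
        else:
            # Subnormal: normalise the mantissa. exp_adj goes negative, so in C
            # it must be a signed type or the decrement wraps.
            exp_adj = 0
            while (mant & 0x400) == 0:
                mant <<= 1
                exp_adj -= 1
            mant &= 0x3FF
            bits = (sign << 31) | ((113 + exp_adj) << 23) | (mant << 13)
    elif exp == 0x1F:
        bits = (sign << 31) | (0xFF << 23) | (mant << 13)  # inf / NaN
    else:
        bits = (sign << 31) | ((exp + 112) << 23) | (mant << 13)  # rebias 15 -> 127
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(f16_bits_to_f32(0x0001), f16_bits_to_f32(0x3C00))
```

Sweeping every non-NaN bit pattern against `struct.unpack("<e", ...)` is a cheap way to confirm that a reworked converter has not reintroduced a wrap in the subnormal path.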

PR-fix-cuda-pinned-alloc-null-deref — CWE-476 null-deref in vmaf_cuda_picture_alloc_pinned (round-6 cross-PR audit)

  • Touches: core/src/cuda/picture_cuda.c — CUDA host TU; no upstream equivalent.
  • Invariant: The sequential check pattern (err = vmaf_picture_priv_init(pic); if (err) goto free_data;) must be preserved on any rebase or future modification. The |= idiom evaluates the right-hand side unconditionally regardless of prior failure — PR #700 fixed the identical pattern in picture.c (CWE-476); this fix closes the same class in the CUDA path. If upstream ever adds a similar pinned-picture allocation function, apply the same sequential-check discipline. Secondary: DATA_ALIGN_PINNED - 1u (with the u suffix) must be preserved on both sides of the alignment mask expression to match the picture.c pattern fixed by PR #708.
  • Re-test on rebase:
gcc -fanalyzer -Wno-analyzer-too-complex \
  -Ilibvmaf/src -Ilibvmaf/include \
  core/src/cuda/picture_cuda.c 2>&1 | grep "CWE-476"
# Must produce zero CWE-476 warnings for vmaf_cuda_picture_alloc_pinned.
meson test -C build --suite=fast
# Must be green.
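The difference between the `|=` idiom and the sequential check is that the former keeps calling later steps after an earlier one has failed. The sketch below models both with a list of callables; all names are hypothetical, and the second step stands in for an initialiser that would dereference memory the failed first step never allocated.

```python
def run_with_or_accumulation(steps):
    """err |= step(): every step runs, even after a failure."""
    err = 0
    calls = []
    for name, step in steps:
        calls.append(name)
        err |= step()
    return err, calls


def run_with_sequential_checks(steps):
    """err = step(); if (err) goto cleanup: stop at the first failure."""
    calls = []
    for name, step in steps:
        calls.append(name)
        err = step()
        if err:
            return err, calls
    return 0, calls


if __name__ == "__main__":
    steps = [("alloc", lambda: -12), ("priv_init", lambda: 0)]
    print(run_with_or_accumulation(steps))
    print(run_with_sequential_checks(steps))
```

Both variants report the failure, but only the sequential form guarantees that the second step never runs on the partially initialised object.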

0380 — FFmpeg HIP backend selector patch (ADR-0380, ffmpeg-patches 0011)

  • Touches: ffmpeg-patches/0011-libvmaf-wire-hip-backend-selector.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
  • Invariant: The patch is authored against FFmpeg n8.1.1 as the cumulative base (0001..0010 stack applied). Context lines in the patch reference VmafCudaState *cu_state / cuda_pool_initialised (added by patch 0010) and #include <vulkan/vulkan.h> / #endif (added by patch 0006). If the series is ever rebased to a newer FFmpeg tag (n8.2, n9.x, etc.) the surrounding context in vf_libvmaf.c may shift; run the full series replay against the new tag and regenerate conflicting patches. The HIP cleanup path (vmaf_hip_state_free(&s->hip_state)) uses a double pointer, unlike the CUDA path (vmaf_cuda_state_free(s->cu_state)) which uses a single pointer — this asymmetry is intentional and matches libvmaf_hip.h; preserve it.
  • Error-code invariant (fix/hip-averror-propagation-0011, 2026-05-10): Both HIP error sites in init() must use return AVERROR(-err), not return AVERROR(EINVAL). AVERROR(-err) maps the libvmaf-supplied errno (e.g. -ENODEV = -19, -ENOSYS = -38) to the correct FFmpeg error string ("No such device", "Function not implemented"). The AVERROR(EINVAL) form was the original patch text; it was corrected during the full 0001–0011 e2e test run. If the patch context is regenerated or the hunk is split, verify the fix is preserved.
  • Re-test on rebase:
git -C /tmp/ffmpeg-8 reset --hard n8.1.1
for p in ffmpeg-patches/00[0-9][0-9]-*.patch; do
    git -C /tmp/ffmpeg-8 am --3way "$p" || break
done
# All 11 patches must apply without conflict.
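The error-code invariant comes down to sign handling: libvmaf returns a negative errno, and FFmpeg's AVERROR(e) negates a positive errno on platforms where errno values are positive. A minimal sketch of that mapping, with `averror` and `map_libvmaf_error` as hypothetical Python stand-ins for the C macro and the call site:

```python
import errno


def averror(e):
    """Model of AVERROR(e) on platforms with positive errno values."""
    return -e


def map_libvmaf_error(err):
    """err is a negative errno from libvmaf; preserve it through AVERROR."""
    return averror(-err)


if __name__ == "__main__":
    print(map_libvmaf_error(-errno.ENODEV) == -errno.ENODEV)
    print(averror(errno.EINVAL) == -errno.ENODEV)
```

The hard-coded EINVAL form discards the original cause, so a missing device and an unimplemented function would both surface as "Invalid argument".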

Research-0094 — integer_motion_v2 flush() dict-leak fix

  • Touches: core/src/feature/integer_motion_v2.c.
  • Invariant: The dict_locally_owned flag in flush() (introduced in this fix) relies on the invariant that s->feature_name_dict is NULL at flush() entry only in the registered-context (threaded dispatch) path, and non-NULL only when extract() has already run on this context (serial / pool-instance path). If a future upstream change causes extract() to clear s->feature_name_dict mid-run (e.g., per-scene re-init), the flag will incorrectly take the locally-owned path and free the dict prematurely. The companion unit test (test in test_feature_extractor) guards this via the existing motion_v2 code path.
  • Re-test on rebase:
meson test -C build --suite=fast
# 53/53 must pass, including test_feature_extractor and test_motion_v2_simd.
ASAN_OPTIONS='detect_leaks=1' ./build-leak/tools/vmaf \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 --feature motion_v2 \
  --output /dev/null --threads 4 2>&1 | grep -E 'leak|SUMMARY'
# Must produce no output (clean).

0382 — Y4M negative-dimension rejection (ADR-0382, T-FUZZ-Y4M-NEG-WIDTH-SEGV)

  • Touches: core/tools/y4m_input.c (internal static y4m_input_open_impl), core/test/fuzz/y4m_input_known_crashes/ (new corpus seed).
  • Invariant: The guard if (_y4m->pic_w <= 0 || _y4m->pic_h <= 0) must stay between the y4m_parse_tags() call and the chroma-type dispatch block. If upstream restructures y4m_input_open_impl or moves the tag parser, the guard must migrate with it so no allocation occurs before the check. The y4m_neg_width_null_deref.y4m seed must be replayed on every rebase to confirm the parser returns clean -1 rather than SEGV.
  • No rebase impact on public API or ffmpeg-patches: the fix is internal to y4m_input_open_impl (a static function); no public header is changed; no ffmpeg-patches patch is affected.
  • Re-test on rebase:
CC=clang meson setup build-fuzz libvmaf \
    -Dfuzz=true -Db_sanitize=address \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=disabled --buildtype=debug
ninja -C build-fuzz core/test/fuzz/fuzz_y4m_input
./build-fuzz/core/test/fuzz/fuzz_y4m_input \
    core/test/fuzz/y4m_input_known_crashes/y4m_neg_width_null_deref.y4m
# Pre-fix: SEGV on address 0x000000000000 inside fread.
# Post-fix: exits 0; stderr prints
#   "Invalid YUV4MPEG2 dimensions: W=-8 H=4 (must be > 0)."
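The placement rule for the guard (after tag parsing, before anything is sized from the dimensions) can be sketched with a toy header parser. This is not the y4m_input.c logic: `parse_y4m_dimensions` is a hypothetical name, only the W and H tags are interpreted, and the error text is modelled on the message quoted above.

```python
def parse_y4m_dimensions(header_line):
    """Parse W/H tags from a YUV4MPEG2 header and reject non-positive sizes."""
    tokens = header_line.strip().split(" ")
    if not tokens or tokens[0] != "YUV4MPEG2":
        raise ValueError("not a YUV4MPEG2 header")
    width = height = None
    for tok in tokens[1:]:
        if tok.startswith("W"):
            width = int(tok[1:])
        elif tok.startswith("H"):
            height = int(tok[1:])
    # Guard sits here: after the tags are parsed, before any buffer is sized.
    if width is None or height is None or width <= 0 or height <= 0:
        raise ValueError(
            f"Invalid YUV4MPEG2 dimensions: W={width} H={height} (must be > 0)."
        )
    return width, height


if __name__ == "__main__":
    print(parse_y4m_dimensions("YUV4MPEG2 W576 H324 F25:1 Ip A1:1 C420"))
```

Because the check precedes every use of the dimensions, a negative width can never reach a size computation.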

fix/picture-odd-dim-chroma-ceiling — picture_compute_geometry ceiling division for odd luma dims

  • Touches: core/src/picture.c, core/src/cuda/picture_cuda.c, core/src/feature/cuda/integer_psnr_cuda.c, core/src/feature/cuda/integer_psnr_hvs_cuda.c, core/src/feature/integer_psnr.c, core/test/test_picture.c.
  • Invariant: All geometry computations for chroma plane dimensions use ceiling division (dim + ss) >> ss (where ss is 0 or 1). If upstream adds a new allocator or copies the geometry pattern, it must use the same ceiling form. The regression test test_picture_odd_dim_chroma_ceiling pins this: 577 × 323 YUV420 must produce pic.w[1]==289, pic.h[1]==162.
  • Re-test on rebase:
meson test -C build --suite=fast
# test_picture must pass; it includes test_picture_odd_dim_chroma_ceiling.
# Additionally, the ASan smoke:
python3 -c "
W, H = 577, 323
luma = bytes([128] * W * H)
cw, ch = (W+1)>>1, (H+1)>>1
chroma = bytes([128] * cw * ch)
open('/tmp/odd.yuv','wb').write((luma+chroma+chroma)*3)
"
ASAN_OPTIONS=halt_on_error=1 ./build/tools/vmaf \
  --reference /tmp/odd.yuv --distorted /tmp/odd.yuv \
  --width 577 --height 323 --pixel_format 420 --bitdepth 8 \
  --feature ciede --threads 4
# Must exit 0 with no ASan reports.
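The ceiling form in the invariant can be checked by hand against the pinned regression values. A minimal sketch, with `chroma_dim` as a hypothetical helper name and `ss` as the subsampling shift (0 or 1):

```python
def chroma_dim(luma_dim, ss):
    """Ceiling division by 2**ss for ss in {0, 1}: (dim + ss) >> ss."""
    return (luma_dim + ss) >> ss


if __name__ == "__main__":
    print(chroma_dim(577, 1), chroma_dim(323, 1))
```

The floor form `dim >> ss` would give 288 and 161 for the same inputs, one sample short in each direction, which is the under-allocation the regression test pins.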

fix/motion-mirror-padding-min-dim — 5-tap filter minimum-dimension guard in all motion extractors

  • Touches: core/src/feature/integer_motion.c, core/src/feature/integer_motion_v2.c, core/src/feature/float_motion.c, core/src/feature/cuda/integer_motion_cuda.c, core/src/feature/cuda/float_motion_cuda.c, core/src/feature/cuda/integer_motion_v2_cuda.c, core/src/feature/sycl/integer_motion_sycl.cpp, core/src/feature/sycl/float_motion_sycl.cpp, core/src/feature/sycl/integer_motion_v2_sycl.cpp, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/motion_v2_vulkan.c, core/src/feature/vulkan/float_motion_vulkan.c, core/src/feature/hip/integer_motion_v2_hip.c, core/src/feature/hip/float_motion_hip.c, core/test/test_motion_min_dim.c, core/test/meson.build.
  • Invariant: Every motion init() rejects w < 3 || h < 3 with -EINVAL before any buffer allocation. The reflect-101 mirror formula height - (i_tap - height + 2) requires height ≥ filter_width/2 + 1 = 3. If upstream Netflix/vmaf modifies the convolution core to support smaller frames (e.g. by switching to a clamp-to-edge formula), the guard should be re-evaluated. If upstream adds a new motion extractor that also uses the 5-tap kernel, add the same guard to its init().
  • Re-test on rebase:
meson test -C build --suite=fast
# test_motion_min_dim must pass (13/13 cases).
# Reproducer:
python3 -c "
plane = bytes([128]*1*1)
chroma = bytes([128]*1*1)
frame = plane + chroma + chroma
with open('/tmp/1x1.yuv','wb') as f: f.write(frame*3)
"
./build/tools/vmaf --reference /tmp/1x1.yuv --distorted /tmp/1x1.yuv \
  --width 1 --height 1 --pixel_format 420 --bitdepth 8 \
  --feature motion --threads 1 2>&1 | grep -E 'EINVAL|minimum|below'
# Must print the "frame 1x1 is below the 5-tap filter minimum" message.
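The minimum of 3 follows from evaluating the quoted mirror formula for the taps that overhang the far edge. The sketch below does that for small heights; `mirrored_row` and `far_edge_in_bounds` are hypothetical names, and only the far-edge branch quoted in the invariant is modelled.

```python
FILTER_WIDTH = 5
RADIUS = FILTER_WIDTH // 2  # taps reach 2 rows past the last row


def mirrored_row(i_tap, height):
    """Reflect-101 mapping quoted above for taps at or past the far edge."""
    if i_tap < height:
        return i_tap
    return height - (i_tap - height + 2)


def far_edge_in_bounds(height):
    """True when every overhanging tap maps back inside [0, height)."""
    return all(
        0 <= mirrored_row(i, height) < height
        for i in range(height, height + RADIUS)
    )


if __name__ == "__main__":
    for h in (1, 2, 3, 4):
        print(h, far_edge_in_bounds(h))
```

For heights 1 and 2 the outermost tap maps to row -1, so the guard has to run before any buffer indexed by that formula is allocated or read.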

0381 — Vulkan VIF scale 2/3 numerical saturation fix (ADR-0381, PR #718)

  • Touches: core/src/feature/vulkan/shaders/float_vif.comp, core/src/feature/vulkan/vif_vulkan.c, core/src/vulkan/meson.build.
  • Invariant 1 — float_vif.comp must remain in psnr_hvs_strict_shaders. meson.build's psnr_hvs_strict_shaders list controls which shaders compile with glslc -O0 (strict) vs -O (optimised). float_vif.comp belongs to this list because the SPIR-V optimizer's FMA-contraction and reassociation of sigma1_sq = xx - mu1*mu1 triggers catastrophic cancellation at scales 2 and 3 (small local variance), saturating the per-scale score to 1.0 and inflating VMAF by ~+1.07. If the list is re-ordered or float_vif.comp is accidentally removed, restore it.
  • Invariant 2 — precise qualifiers on float_vif.comp accumulators. The vertical-pass accumulators (a_mu1, a_mu2, a_xx, a_yy, a_xy), the horizontal-pass accumulators (mu1, mu2, xx, yy, xy), and the sigma expressions (sigma1_sq, sigma2_sq, sigma12) carry precise qualifiers. These map to OpDecorate NoContraction in SPIR-V and defend against driver-side FMA contraction (Vulkan 1.4 NVIDIA / newer MoltenVK). Do not remove the precise qualifiers without re-running places=4 on every CI hardware lane.
  • Invariant 3 — integer VIF rd buffer ceiling division. vif_vulkan.c::alloc_buffers() allocates the per-scale rd buffers with ceiling division: ((w + 1u) / 2u) * ((h + 1u) / 2u). For odd input dimensions (e.g. h=81 at scale 2 of a 576×324 input), the shader writes rd_y indices up to h/2 = 40 (inclusive), requiring 41 rows. Floor division h/2 = 40 under-allocates by one row (72 uint32 slots), corrupting the adjacent per-WG int64 accumulator buffer. The ceiling form must be preserved on any refactor or upstream merge touching alloc_buffers.
  • Re-test on rebase:
# Build with Vulkan enabled
meson setup build-vk-vif libvmaf -Denable_vulkan=true -Denable_cuda=false \
  -Denable_sycl=false --buildtype=release
ninja -C build-vk-vif

# Run per-scale VIF parity check on the Netflix 576x324 golden pair
build-vk-vif/tools/vmaf \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 --feature float_vif_vulkan --backend vulkan \
  --output /tmp/vk_vif_out.json
# All per-scale VIF scores must be < 1.0; VMAF must be within ±0.5 of CPU.
# Per-scale delta must be < 1e-3 from CPU reference.
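
The arithmetic behind Invariant 3 can be checked with a small sketch. It is written in Python rather than C for brevity, the helper name `rd_slots` is hypothetical, and the scale-2 width of 144 is inferred from 576 / 4 (the note itself only states h=81 and the 72-slot row).

```python
def rd_slots(w: int, h: int, *, ceil: bool) -> int:
    """uint32 slots needed for a half-resolution rd buffer."""
    if ceil:
        # Same form as the note: ((w + 1u) / 2u) * ((h + 1u) / 2u)
        return ((w + 1) // 2) * ((h + 1) // 2)
    return (w // 2) * (h // 2)


# Scale 2 of a 576x324 input (width 144 is an inferred value).
w, h = 144, 81
ceil_slots = rd_slots(w, h, ceil=True)    # 72 columns * 41 rows
floor_slots = rd_slots(w, h, ceil=False)  # 72 columns * 40 rows
shortfall = ceil_slots - floor_slots      # the missing row
print(ceil_slots, floor_slots, shortfall)
```

The shortfall equals one full row, which matches the 72-slot under-allocation described above.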

fix/recal-adm-f1f2-post-pr731 — Recalibrate fork-local adm_f1f2 assertion after PR #731 AIM port

No rebase impact: this change touches only python/test/feature_extractor_test.py (a single assertAlmostEqual value for the fork-local adm_f1s/f2s feature), with a documentation comment explaining the recalibration. No C sources, no public headers, no Meson options, no FFmpeg patch stack entries were modified. If upstream Netflix/vmaf adds its own adm_f1s/f2s noise-weight test in a future sync, verify that the expected value (0.8872294166666667) still matches the post-PR-#731 CPU scalar path output for the src01_hrc00_576x324.yuv / src01_hrc01_576x324.yuv pair with the f1s/f2s parameters listed in test_run_vmaf_fextractor_adm_f1f2.

  • Re-test: PYTHONPATH=$PWD/python python3 -m pytest python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_vmaf_fextractor_adm_f1f2 -v — must report 1 passed.

feat/adm-gpu-param-sync — ADM noise_weight/csf_scale/csf_diag_scale GPU extension

  • Touches: core/src/feature/cuda/float_adm_cuda.c, core/src/feature/cuda/integer_adm_cuda.c, core/src/feature/sycl/float_adm_sycl.cpp, core/src/feature/sycl/integer_adm_sycl.cpp, core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/float_adm_vulkan.c.
  • Invariant 1 — three-param parity with CPU. Every GPU ADM backend (float_adm_cuda, integer_adm_cuda, float_adm_sycl, integer_adm_sycl, adm_vulkan, float_adm_vulkan) exposes adm_csf_scale, adm_csf_diag_scale, and noise_weight with the same defaults (1.0, 1.0, 0.03125) as the CPU scalar path added by PR #731. If upstream Netflix ever adds or renames these parameters in integer_adm.c / float_adm.c, the corresponding GPU files must be updated in the same PR.
  • Invariant 2 — integer CUDA must NOT include adm_options.h directly. core/src/feature/cuda/integer_adm_cuda.c must NOT include feature/adm_options.h directly. DEFAULT_ADM_NOISE_WEIGHT, DEFAULT_ADM_CSF_SCALE, DEFAULT_ADM_CSF_DIAG_SCALE, and the full 4-member enum ADM_CSF_MODE arrive transitively via cuda/integer_adm_cuda.h → feature/integer_adm.h. A direct include reintroduces the 2-member enum ADM_CSF_MODE from adm_options.h and produces a redeclaration error.
  • Invariant 3 — Vulkan integer fast-path gated on CSF-scale defaults. adm_vulkan.c contains a hard-coded i_rfactor fast-path for the 3.0 * 1080 default viewing geometry. It is gated by: bool csf_default = (fabs(s->adm_csf_scale - 1.0) < 1e-9) && (fabs(s->adm_csf_diag_scale - 1.0) < 1e-9). If the fast-path is ever updated, the CSF-default guard must be updated to match; removing or loosening the guard will produce wrong rfactors when non-default CSF scales are in use.
  • Re-test on rebase:
# CPU-only build + golden test
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=disabled
ninja -C build-cpu
make test-netflix-golden

# Verify default params produce unchanged scores
build-cpu/tools/vmaf \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 \
  --feature adm=noise_weight=0.03125:adm_csf_scale=1.0:adm_csf_diag_scale=1.0 \
  --output /tmp/adm_param_default.json
# adm2 must match the no-param baseline (places=4).

0383 — K150K parallel CPU driver + feature_extractor_list dedup fix (ADR-0383)

  • Touches:
  • ai/scripts/extract_k150k_features.py — driver redesign.
  • ai/AGENTS.md — K150K-A invariant note updated.
  • core/src/feature/feature_extractor.c — duplicate CUDA extractor registration removed (lines 239–240 deduplicated: six CUDA extractors that were registered twice).
  • docs/ai/datasets/k150k.md — user-facing docs updated.
  • docs/adr/0383-k150k-parallel-cpu-driver.md — new ADR.
  • docs/research/0096-k150k-gpu-driver-investigation-2026-05-10.md — new digest.
  • Invariant 1 — feature_extractor_list[] must have no duplicate entries. The dedup in feature_extractor_vector_append() is by extractor name, not by provided-feature name. Duplicate entries in feature_extractor_list[] result in both extractors being registered and both writing the same feature-collector slots. If upstream Netflix/vmaf modifies core/src/feature/feature_extractor.c to add new backend entries, verify that no extractor is registered more than once.
  • Invariant 2 — CUDA binary double-write via default model auto-load. When --model is not specified and --no-prediction is absent, the CLI auto-loads vmaf_v0.6.1, which registers CUDA twins via vmaf_use_features_from_model(). A subsequent --feature adm call registers the CPU "adm" extractor in addition; both run and double-write. This is a latent bug in the CLI model-auto-load / explicit-feature interaction path. The K150K pipeline works around it by using the CPU binary (no CUDA context). See Research-0096 for full root-cause analysis.
  • Re-test on rebase:
# Verify no duplicate entries exist in feature_extractor_list[]
grep -c "vmaf_fex_integer_adm_cuda" core/src/feature/feature_extractor.c
# Must print 2 (one extern declaration + one list entry)

# Verify CPU driver produces correct output
python ai/scripts/extract_k150k_features.py --limit 5 --threads-cuda 2 \
  --out /tmp/smoke5.parquet && python3 -c \
  "import pandas; df=pandas.read_parquet('/tmp/smoke5.parquet'); \
   print(df.shape, df.columns.tolist()[:5])"
# Must print (5, 48) and the first five column names.
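
The `grep -c` check above covers one symbol only. A broader duplicate scan could look like the following sketch. It assumes the list is a plain C initialiser of `&vmaf_fex_*` entries terminated by `};`; the function name and the sample text are made up for illustration.

```python
import re
from collections import Counter


def duplicate_list_entries(c_source: str) -> dict[str, int]:
    """Return {symbol: count} for every symbol listed more than once."""
    match = re.search(
        r"feature_extractor_list\[\]\s*=\s*\{(.*?)\};", c_source, re.S
    )
    if match is None:
        raise ValueError("feature_extractor_list[] initialiser not found")
    counts = Counter(re.findall(r"&\s*(vmaf_fex_\w+)", match.group(1)))
    return {name: n for name, n in counts.items() if n > 1}


# Illustrative input only, not the real file contents.
sample = """
static VmafFeatureExtractor *feature_extractor_list[] = {
    &vmaf_fex_integer_adm_cuda,
    &vmaf_fex_integer_vif_cuda,
    &vmaf_fex_integer_adm_cuda,
    NULL
};
"""
dupes = duplicate_list_entries(sample)
print(dupes)
```

An empty result means no extractor symbol appears twice in the initialiser. Entries hidden behind preprocessor conditionals are counted regardless of whether they are compiled in.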

fix/ci-master-shfmt-cppcheck-semgrep — CI gate fixes (ADR-0384)

  • Touches: .pre-commit-config.yaml, .github/workflows/lint-and-format.yml, core/src/feature/adm.c, core/src/feature/ansnr.c, core/src/feature/offset.c, core/src/feature/vif.c, scripts/ci/check-agent-worktree-drift.sh.
  • Invariant: No rebase-sensitive invariants. The .pre-commit-config.yaml and workflow changes are fork-infrastructure. The (void *) cast fix in the four feature files is a portable C idiom; upstream may or may not carry its own version of these files. The semgrep-comment reword in check-agent-worktree-drift.sh is fork-only.
  • Re-test on rebase:
# Verify semgrep is clean
semgrep scan --config=.semgrep.yml --error

# Verify cppcheck finds no invalidPointerCast in the four files
cppcheck --enable=portability core/src/feature/adm.c \
  core/src/feature/ansnr.c core/src/feature/offset.c \
  core/src/feature/vif.c 2>&1 | grep invalidPointerCast
# Must produce no output.

No rebase impact: the CI infrastructure files are fork-local; the C source changes are minimal (cast through void*) and will trivially survive any upstream rebase that doesn't rewrite these specific functions.

fix/thread-pool-pthread-create-unchecked — thread pool pthread_create error handling + n_workers_created race fix

  • Touches: core/src/thread_pool.c.
  • Invariant: VmafThreadPool now has an n_workers_created field (written once at creation, never decremented) alongside the existing n_threads counter (decremented by each exiting worker). Any upstream change to thread_pool.c that adds or renames struct fields or changes the pthread_create call site must be reconciled against the fork's error-handling block (lines ~170–192) and the n_workers_created field initialisation.
  • Re-test on rebase:
meson setup /tmp/build-tp-rebase libvmaf \
  -Denable_cuda=false -Denable_sycl=false --buildtype=debugoptimized
meson test -C /tmp/build-tp-rebase
# Must report 54/54 (or more) OK.

ai/tiny-netflix-training-scaffold — tiny-AI Netflix corpus training scaffold draft PR (ADR-0417)

  • Touches: docs/adr/0417-tiny-ai-netflix-training-scaffold-pr.md, docs/research/0099-tiny-ai-netflix-training-update.md, docs/adr/_index_fragments/0417-tiny-ai-netflix-training-scaffold-pr.md, changelog.d/added/0417-tiny-ai-netflix-training-scaffold-pr.md, docs/ai/training-data.md (See-also links only).
  • Invariant: the corpus path .workingdir2/netflix/ is gitignored and must never be committed. The --data-root CLI flag and VMAF_DATA_ROOT environment variable are the only sanctioned ways to point the training scripts at the corpus. Any rebase or upstream sync that modifies ai/ or mcp-server/vmaf-mcp/ must preserve this invariant; verify with git check-ignore -v .workingdir2/netflix/ref/ (must return the root .gitignore entry).
  • Re-test on rebase:
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (for the vmaf binary).
# test_list_tools_returns_expected_names  PASSED
# test_list_tools_each_has_input_schema   PASSED
# test_call_tool_list_models_returns_list PASSED
# test_call_tool_list_backends_includes_cpu PASSED
# test_call_tool_unknown_name_returns_error_json PASSED
# test_call_tool_vmaf_score_golden_pair   PASSED (requires build/tools/vmaf)

No rebase impact on libvmaf C sources: this branch is doc-only (ADR-0417, Research Digest 0099, changelog fragment, ADR index fragment). The MCP smoke test and training-data.md are already in master and untouched by this branch.

0420 — Metal (Apple Silicon) backend runtime (T8-1b / ADR-0420)

  • Touches:
  • core/src/metal/common.mm (new, replaces common.c) — MTLDevice + MTLCommandQueue lifecycle; MTLCreateSystemDefaultDevice for auto-pick; MTLCopyAllDevices for explicit indexing; Apple-Family-7 gate.
  • core/src/metal/picture_metal.mm (new, replaces picture_metal.c) — MTLBuffer allocator with MTLResourceStorageModeShared (zero-copy).
  • core/src/metal/kernel_template.mm (new, replaces kernel_template.c) — private MTLCommandQueue + two MTLSharedEvent handles; blit-fill accumulator zero; cross-queue encodeWaitForEvent; waitUntilCompleted drain.
  • core/src/metal/common.h — two new internal accessors: vmaf_metal_context_device_handle() + vmaf_metal_context_queue_handle().
  • core/src/metal/meson.build: dependency('Foundation'/'Metal', required: true); -fobjc-arc project arg for objcpp; source list flipped from .c to .mm.
  • core/test/test_metal_smoke.c — smoke expectations flipped from -ENOSYS pin to runtime (0 on Apple7+, -ENODEV elsewhere).
  • docs/adr/0420-metal-backend-runtime-t8-1b.md + docs/adr/_index_fragments/0420-metal-backend-runtime-t8-1b.md + changelog.d/changed/metal-backend-runtime.md.
  • Upstream-port footprint: zero — Netflix/vmaf has no Metal backend.
  • Rebase invariants:
  • Header purity: no <Metal/Metal.h> in any header or pure-C consumer. Metal handles cross the boundary as void * / uintptr_t. Do not promote a Metal type into a header on rebase.
  • ARC bridge-cast discipline: __bridge_retained to stash (+1 retain), __bridge_transfer to release (−1), __bridge to borrow (no refcount). A missing _retained leaks; a missing _transfer double-frees.
  • Struct privacy: struct VmafMetalContext is defined only in common.mm. Consumers use the accessor pair — never struct-layout introspection.
  • HIP twin parity for kernel_template: any PR that grows the HIP kernel_template.c lifecycle must propagate the same change to kernel_template.mm in the same PR.
  • Re-test on rebase (macOS, Apple-Family-7+):
meson setup build libvmaf -Denable_metal=enabled \
    -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_metal_smoke   # must PASS

On Linux: meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false && ninja -C build (Metal subdir not entered; no Metal test registered).

ADR-0422 — CLI HIP and Metal backend selectors (2026-05-11)

Files touched: core/tools/cli_parse.h, core/tools/cli_parse.c, core/tools/vmaf.c, core/test/test_cli_parse.c, core/include/libvmaf/libvmaf_metal.h.

Rebase impact: none — this PR only adds new CLI flags and branches to the standalone vmaf tool. No libvmaf public C API symbols changed; no meson_options.txt entries added; the ffmpeg-patches stack is unaffected. The libvmaf_metal.h change is docstring-only (no API surface delta).

Invariants to preserve on rebase:

  • CLISettings in cli_parse.h has no_hip, hip_device, no_metal, metal_device fields. If upstream adds its own HIP/Metal CLI flags in the same struct, resolve the merge by keeping the fork's field names (they match our header convention) and dropping any upstream stub.
  • --backend cpu disables all five GPU backends (no_cuda, no_sycl, no_vulkan, no_hip, no_metal). If upstream extends the backend enum, ensure the cpu branch stays exhaustive.
  • init_gpu_backends() signature in vmaf.c passes hip_state/hip_active and metal_state/metal_active by reference under #ifdef HAVE_HIP / #ifdef HAVE_METAL guards. Preserve both the guards and the by-reference convention on rebase.

Smoke-test after rebase:

meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false --buildtype=debug
ninja -C build
meson test -C build test_cli_parse   # all 5 new tests must pass

2026-05-14 — vmaf-tune recommend --from-corpus Row Filtering

Files touched: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_recommend.py, docs/usage/vmaf-tune.md, docs/state.md.

Rebase impact: low. This only aligns the CLI --from-corpus path with the existing vmaftune.recommend.recommend() filtering contract. No corpus schema, encode path, score path, or model artefact changes.

Invariant to preserve on rebase: both CLI and library corpus recommendation must filter through RecommendRequest / recommend() so failed rows, NaN rows, and non-matching encoder / preset rows cannot win from the CLI.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_recommend.py -q
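
The filtering contract can be pictured with a minimal sketch. It is not the `recommend()` implementation: the row keys (`exit_status`, `vmaf_score`, `encoder`, `preset`) and the function name are hypothetical stand-ins for whatever the corpus schema actually uses.

```python
import math


def eligible_rows(rows, encoder, preset):
    """Keep only rows that are allowed to win a recommendation."""
    kept = []
    for row in rows:
        if row.get("exit_status", 0) != 0:
            continue  # failed row
        score = row.get("vmaf_score")
        if not isinstance(score, (int, float)) or math.isnan(score):
            continue  # missing or NaN score
        if row.get("encoder") != encoder or row.get("preset") != preset:
            continue  # non-matching encoder / preset
        kept.append(row)
    return kept


# Illustrative rows only.
rows = [
    {"encoder": "libx264", "preset": "medium", "vmaf_score": 93.1, "exit_status": 0},
    {"encoder": "libx264", "preset": "medium", "vmaf_score": float("nan"), "exit_status": 0},
    {"encoder": "libx264", "preset": "medium", "vmaf_score": 99.0, "exit_status": 1},
    {"encoder": "libx265", "preset": "medium", "vmaf_score": 97.0, "exit_status": 0},
]
kept = eligible_rows(rows, "libx264", "medium")
print(len(kept), kept[0]["vmaf_score"])
```

The point of the invariant is that the CLI and the library both pass through one such filter, so a higher-scoring but failed or mismatched row can never be selected from either entry point.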

2026-05-14 — vmaf-tune Usage-Doc Scaffold Label Cleanup

Files touched: docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-coarse-to-fine.md, docs/usage/vmaf-tune-bitrate-ladder.md, docs/usage/vmaf-tune-ladder-default-sampler.md, docs/usage/vmaf-tune-saliency-aware.md.

Rebase impact: documentation-only. The implementation already lives in tools/vmaf-tune/src/vmaftune/corpus.py, ladder.py, saliency.py, and fast.py; this PR removes stale user-facing stub labels that survived after those surfaces were wired.

Invariant to preserve on rebase: usage docs are implementation-status contracts, not backlog labels. If a command is wired and tested, do not call it a stub/scaffold in docs/usage/; describe the shipped path and name any remaining production limit precisely.

Smoke-test after rebase:

rg -n 'scaffold-only|Status: scaffold only|\(stub\)|\*\*Stub\*\*|recommend --saliency-aware|advisory in scaffold' \
  docs/usage/vmaf-tune.md \
  docs/usage/vmaf-tune-coarse-to-fine.md \
  docs/usage/vmaf-tune-bitrate-ladder.md \
  docs/usage/vmaf-tune-ladder-default-sampler.md \
  docs/usage/vmaf-tune-saliency-aware.md

2026-05-14 — vmaf-tune fast --time-budget-s Timeout Wiring

Files touched: tools/vmaf-tune/src/vmaftune/fast.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_fast.py, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-fast-path.md.

Rebase impact: low. This is a fast-path user-surface fix only; no libvmaf public C API, model schema, or FFmpeg patch stack is touched.

Invariant to preserve on rebase: time_budget_s is a soft Optuna timeout. Do not revert it to metadata-only. The JSON n_trials field reports completed trials, which may be fewer than the requested --n-trials when the timeout fires.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_fast.py \
  tools/vmaf-tune/tests/test_cli_fast.py -q
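
What "soft timeout" means for the reported trial count can be sketched without Optuna. The loop below is a dependency-free stand-in, not the fast-path code: the budget is checked only between trials, and the reported `n_trials` is the number actually completed. The function name, the `requested_n_trials` key, and the injectable `clock` are assumptions made for the example.

```python
import time


def run_trials(objective, n_trials, time_budget_s=None, clock=time.monotonic):
    """Run up to n_trials, stopping between trials once the budget is spent."""
    start = clock()
    results = []
    for index in range(n_trials):
        if time_budget_s is not None and clock() - start >= time_budget_s:
            break  # soft stop: a running trial is never interrupted
        results.append(objective(index))
    return {
        "n_trials": len(results),        # completed trials
        "requested_n_trials": n_trials,  # hypothetical key
        "results": results,
    }


# Deterministic fake clock: one reading at start, then one per budget check.
ticks = iter([0.0, 0.0, 1.0, 2.0, 3.0, 4.0])
report = run_trials(lambda i: i * i, n_trials=10, time_budget_s=2.5,
                    clock=lambda: next(ticks))
print(report["n_trials"], report["results"])
```

With the fake clock, the fourth budget check reads 3.0 s against a 2.5 s budget, so three of the ten requested trials complete.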

2026-05-14 — vmaf-tune Public Doc Stub-Label Sweep

Files touched: docs/usage/vmaf-tune-resolution-aware.md, docs/ai/ensemble-training-kit.md, docs/ai/models/vmaf_tiny_v5.md, docs/ai/per-pr-doc-bar.md, docs/ai/predictor.md, docs/development/ffmpeg-patches-refresh.md, docs/development/ossf-scorecard.md, tools/vmaf-tune/README.md, tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py, tools/vmaf-tune/src/vmaftune/per_shot.py.

Rebase impact: low. This PR updates stale public wording and docstrings after already-shipped implementations. It does not change the vmaf-tune row schema, CLI arguments, model defaults, or libvmaf public API.

Invariant to preserve on rebase: user-facing docs describe shipped implementation status, not old backlog labels. Keep intentional scaffold warnings only where the backing implementation or required external artefact is still genuinely missing.

Smoke-test after rebase:

rg -n '^# .*\(stub\)|^# .*stub|> \*\*Stub\*\*|0276-vmaf-tune-phase-d|full prose follows|later PR' \
  docs/usage docs/ai docs/development tools/vmaf-tune/README.md -g '*.md'
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_resolution.py \
  tools/vmaf-tune/tests/test_per_shot.py \
  tools/vmaf-tune/tests/test_encode_dispatcher_per_adapter.py -q
.venv/bin/python -m ruff check \
  tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py \
  tools/vmaf-tune/src/vmaftune/per_shot.py

2026-05-14 — vmaf-tune Predictor Directory-Corpus Training

Files touched: tools/vmaf-tune/src/vmaftune/predictor_train.py, tools/vmaf-tune/tests/test_predictor_train.py, docs/usage/vmaf-tune.md, tools/vmaf-tune/README.md.

Rebase impact: low. This only broadens the trainer's corpus input resolver from a single JSONL file to a file-or-directory source. The corpus row schema, predictor input vector, shipped model defaults, and libvmaf public surface are unchanged.

Invariant to preserve on rebase: directory corpus traversal is recursive and sorted. Keep that determinism so repeated training over .workingdir2/corpus_run/ sees the same row order across filesystems.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_predictor_train.py \
  -q
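
A deterministic file-or-directory resolver might look like the sketch below. The function name and the `*.jsonl` suffix filter are assumptions; the actual trainer may select files differently.

```python
import tempfile
from pathlib import Path


def corpus_files(source):
    """Resolve a corpus source to an ordered list of JSONL files."""
    source = Path(source)
    if source.is_file():
        return [source]
    files = (p for p in source.rglob("*.jsonl") if p.is_file())
    # Sort on the relative POSIX path so the order does not depend on
    # the directory-listing order of the filesystem.
    return sorted(files, key=lambda p: p.relative_to(source).as_posix())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("b/x.jsonl", "a/y.jsonl", "z.jsonl", "a/notes.txt"):
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).write_text("{}\n")
    found = [p.relative_to(root).as_posix() for p in corpus_files(root)]
print(found)
```

Without the explicit sort, `rglob` yields entries in whatever order the operating system returns them, which is why row order could otherwise differ between machines.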

2026-05-14 — vmaf-tune benchmark Phase-G Corpus Report

Files touched: tools/vmaf-tune/src/vmaftune/benchmark.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_benchmark.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0424-vmaf-tune-corpus-benchmark.md, docs/research/0106-vmaf-tune-corpus-benchmark.md.

Rebase impact: low. The new command is a read-only consumer of the existing Phase-A JSONL row schema. It does not change CORPUS_ROW_KEYS, libvmaf public API, FFmpeg patches, or encode/scoring behaviour.

Invariant to preserve on rebase: vmaf-tune benchmark must stay offline. It reads corpus rows and reports matched-quality encoder summaries; live encode comparisons remain owned by vmaf-tune compare.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_benchmark.py -q

2026-05-14 — vmaf-tune auto winner selection

Files touched: tools/vmaf-tune/src/vmaftune/auto.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, docs/usage/vmaf-tune.md, tools/vmaf-tune/AGENTS.md.

Rebase impact: low. The Phase F JSON schema now includes metadata.winner and a per-cell selected boolean, but corpus rows and libvmaf public APIs are unchanged.

Invariant to preserve on rebase: keep metadata.winner aligned with exactly one cells[].selected == true row. The selector remains quality/budget ordered per ADR-0428.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_auto_short_circuits.py -q
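
A consistency check for this invariant could be sketched as follows. The note does not say how `metadata.winner` identifies its cell, so this sketch only checks that a winner is present and that exactly one cell is selected; the function name and the sample fields are hypothetical.

```python
def check_winner(report):
    """Return the single selected cell, or raise if the report is inconsistent."""
    cells = report.get("cells", [])
    selected = [cell for cell in cells if cell.get("selected") is True]
    if len(selected) != 1:
        raise ValueError(
            f"expected exactly one selected cell, found {len(selected)}"
        )
    if report.get("metadata", {}).get("winner") is None:
        raise ValueError("metadata.winner is missing")
    return selected[0]


# Illustrative report shapes only.
good = {
    "metadata": {"winner": {"codec": "libx264"}},
    "cells": [
        {"codec": "libx264", "selected": True},
        {"codec": "libx265", "selected": False},
    ],
}
bad = {
    "metadata": {"winner": {"codec": "libx264"}},
    "cells": [
        {"codec": "libx264", "selected": True},
        {"codec": "libx265", "selected": True},
    ],
}
picked = check_winner(good)
try:
    check_winner(bad)
    bad_rejected = False
except ValueError:
    bad_rejected = True
print(picked["codec"], bad_rejected)
```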

2026-05-14 — testdata bench_perf portability

Files touched: testdata/bench_perf.py, testdata/test_bench_perf.py, docs/benchmarks.md.

Rebase impact: low. The performance JSON snapshots are unchanged; only the FFmpeg lavfi benchmark harness gains configuration and hardware-free smoke surfaces.

Invariant to preserve on rebase: bench_perf.py must not reintroduce mandatory machine-local paths. The MP4 decode test remains opt-in through --bbb-mp4-ref / VMAF_BBB_MP4_REF, while --require-all is the strict mode.

Smoke-test after rebase:

PYTHONPATH=. .venv/bin/python -m pytest testdata/test_bench_perf.py -q
.venv/bin/python testdata/bench_perf.py --list-tests
.venv/bin/python testdata/bench_perf.py --backend cpu --dry-run

2026-05-14 — CHUG HDR Corpus Ingestion + Feature Materialisation

Files touched: ai/scripts/chug_to_corpus_jsonl.py, ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, scripts/dev/training_discovery_report.py, docs/ai/chug-ingestion.md, docs/ai/mos-corpora.md, docs/research/0101-training-discovery-synthesis-2026-05-14.md, docs/adr/0426-chug-hdr-corpus-ingestion.md, docs/adr/0427-chug-hdr-feature-materialisation.md, and ai/AGENTS.md.

Rebase impact: low to medium. The new CHUG adapter is fork-local and local-only, but it intentionally widens the MOS-corpus family with an HDR dataset and optional chug_* JSONL metadata fields.

Invariant to preserve on rebase: CHUG media and labels stay out of git. The adapter stores CHUG's raw mos_j as mos_raw_0_100 and maps the trainer-facing mos to [1, 5]; do not silently change that scale. The feature materialiser pairs distorted rows to the matching chug_content_name reference and scales distorted clips to reference geometry before extraction. Keep the license posture non-commercial/share-alike until the README/license mismatch is clarified upstream.

Smoke-test after rebase:

PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_chug.py -q
python3 scripts/dev/training_discovery_report.py --output /tmp/training_discovery_report.md

2026-05-14 — vmaf-tune ladder Uncertainty CLI Wiring

Files touched: tools/vmaf-tune/src/vmaftune/ladder.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_ladder.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, and docs/usage/vmaf-tune-ladder.md.

Rebase impact: low. The normal point-estimate ladder path is unchanged. When --with-uncertainty is set, corpus rows that contain a vmaf_interval object now flow through apply_uncertainty_recipe() before select_knees(). Rows without intervals use the active wide_interval_min_width as a conservative centred fallback interval so point-only corpora still participate in midpoint insertion.

Invariant to preserve on rebase: the uncertainty transform stays post-hull and pre-knee-selection. Do not run it before convex_hull(), or synthetic midpoint rungs can distort the Pareto filtering stage.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_ladder.py \
  tools/vmaf-tune/tests/test_ladder_uncertainty.py -q

2026-05-14 — vmaf-tune libaom-av1 saliency ROI Dispatch

Files touched: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py, tools/vmaf-tune/src/vmaftune/cli.py, vmaf-tune saliency tests, and the matching usage docs/state/changelog notes.

Rebase impact: low. The change only adds libaom-av1 to the existing saliency ROI dispatch table and uses the FFmpeg patch stack's top-level -qpfile <path> option. It does not alter scoring, predictor inputs, model files, or libvmaf public ABI.

Invariant to preserve on rebase: libaom-av1 saliency uses the shared x264-style 16x16 qpfile writer, but passes it as separate argv tokens ("-qpfile", path). Keep ephemeral cleanup aware of both key=path params and -qpfile path pairs.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_saliency.py \
  tools/vmaf-tune/tests/test_saliency_roi_adapters.py \
  tools/vmaf-tune/tests/test_saliency_roi_codec.py \
  -q
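
The cleanup requirement (recognising both argument shapes) can be illustrated with a small scanner. This is a sketch under assumptions: the `qpfile` key name inside `key=path` params, the colon-separated param syntax, and the function name are all hypothetical, and paths containing `:` are not handled.

```python
def ephemeral_qpfiles(argv, key="qpfile"):
    """Collect qpfile paths from `key=path` params and `-qpfile path` pairs."""
    paths = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token == "-qpfile" and i + 1 < len(argv):
            paths.append(argv[i + 1])  # separate argv tokens
            i += 2
            continue
        for part in token.split(":"):
            name, sep, value = part.partition("=")
            if sep and name == key:
                paths.append(value)    # embedded key=path param
        i += 1
    return paths


# Illustrative command line only.
argv = [
    "ffmpeg", "-i", "in.mp4",
    "-x264-params", "aq-mode=0:qpfile=/tmp/a.qp",
    "-qpfile", "/tmp/b.qp",
    "out.mkv",
]
to_delete = ephemeral_qpfiles(argv)
print(to_delete)
```

A cleanup pass that only understood one of the two shapes would leave temporary qpfiles behind for the other adapter family.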

2026-05-14 — Metal Dispatch Support Table

Files touched: core/src/metal/dispatch_strategy.c, core/src/metal/dispatch_strategy.h, core/test/test_metal_smoke.c, core/src/metal/AGENTS.md, docs/backends/metal/index.md.

Rebase impact: low. The dispatch predicate now reflects the Metal kernels already compiled into the backend; it does not change kernel math, picture layout, metallib embedding, or public libvmaf_metal.h symbols.

Invariant to preserve on rebase: every newly-landed Metal extractor must append both its extractor name and its provided feature keys to g_metal_features. Unknown features, NULL contexts, and NULL names must keep returning 0.

Smoke-test after rebase:

meson setup build-metal -Denable_metal=enabled
ninja -C build-metal test_metal_smoke
meson test -C build-metal test_metal_smoke

2026-05-14 — Tiny-AI Bisect Cache Real-Feature Bridge

Files touched: ai/scripts/build_bisect_cache.py, ai/tests/test_build_bisect_cache.py, ai/testdata/bisect/README.md, docs/ai/bisect-model-quality.md, ai/AGENTS.md.

Rebase impact: low. The committed nightly cache remains generated from the existing deterministic synthetic seeds unless callers pass --source-features. The real-feature path only broadens the generator to materialise an operator-provided parquet into the same features.parquet + linear-ONNX timeline layout.

Invariant to preserve on rebase: the output feature order stays adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2, and the output target column stays named mos even when the source uses dmos, target, or score.

Smoke-test after rebase:

PYTHONPATH=ai/src .venv/bin/python -m pytest \
  ai/tests/test_build_bisect_cache.py \
  ai/tests/test_bisect_model_quality.py -q
PYTHONPATH=ai/src .venv/bin/python ai/scripts/build_bisect_cache.py --check
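
The column contract can be sketched with plain dictionaries. The real generator works on parquet; the function name and the alias precedence (first match in `mos`, `dmos`, `target`, `score`) are assumptions made for this example.

```python
FEATURE_ORDER = ["adm2", "vif_scale0", "vif_scale1", "vif_scale2",
                 "vif_scale3", "motion2"]
TARGET_ALIASES = ["mos", "dmos", "target", "score"]  # precedence is assumed


def normalise_row(row):
    """Emit features in the fixed order, with the target always named 'mos'."""
    target_key = next((k for k in TARGET_ALIASES if k in row), None)
    if target_key is None:
        raise KeyError("no target column among " + ", ".join(TARGET_ALIASES))
    out = {name: float(row[name]) for name in FEATURE_ORDER}
    out["mos"] = float(row[target_key])
    return out


# Source row with scrambled column order and a 'dmos' target.
source_row = {"motion2": 3.5, "dmos": 42.0, "vif_scale3": 0.9, "adm2": 0.95,
              "vif_scale1": 0.7, "vif_scale0": 0.6, "vif_scale2": 0.8}
normalised = normalise_row(source_row)
print(list(normalised))
```

Because Python dictionaries preserve insertion order, the output keys come out in the fixed feature order followed by `mos`, whatever the source layout was.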

2026-05-14 — Vulkan VIF Manual Int64 Subgroup Reduction

Files touched: core/src/feature/vulkan/shaders/vif.comp, core/src/vulkan/AGENTS.md, docs/adr/0269-vif-ciede-precise-step-a.md, docs/research/0108-vulkan-vif-int64-subgroup-reduction-2026-05-14.md, docs/state.md.

Rebase impact: medium. The shader semantic change is intentionally small, but it is load-bearing for Vulkan API-1.4 parity on NVIDIA. Do not simplify the Phase-4 VIF accumulator path back to subgroupAdd(int64_t) when resolving upstream shader conflicts.

Invariant to preserve on rebase: vif.comp must keep GL_KHR_shader_subgroup_shuffle and the manual reduce_i64_subgroup(...) helper for all seven int64 accumulator fields. The helper exists because NVIDIA RTX 4090 + driver 595.71.05 produced non-deterministic integer_vif_scale2 output through subgroupAdd(int64_t) at Vulkan API 1.4.

Smoke-test after rebase:

glslc --target-env=vulkan1.3 -O \
  core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
ninja -C build-vulkan-int64 tools/vmaf
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary "$PWD/build-vulkan-int64/tools/vmaf" \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --feature vif --backend vulkan \
  --device 0 --places 4

2026-05-14 — Saliency RGB ingest + SSIMULACRA2 public docs

Files touched: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/tests/test_saliency.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/metrics/ssimulacra2.md, docs/metrics/features.md, docs/adr/0430-saliency-rgb-ingest-and-ssimulacra2-docs.md, docs/research/0112-public-doc-gap-batch-2026-05-14.md.

Rebase impact: low. The changed saliency preprocessing is fork-local and keeps the same ONNX model input shape ([1, 3, H, W]).

Invariant to preserve on rebase: compute_saliency_map() must keep Y/U/V yuv420p ingest, BT.709 limited-range YUV-to-RGB conversion, and ImageNet normalisation before invoking saliency_student_v1. The old luma-replicated RGB path is no longer the user-facing contract.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_saliency.py -q
scripts/docs/concat-adr-index.sh --check
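
The per-pixel part of that preprocessing can be sketched in plain Python. This is not `compute_saliency_map()`: it assumes 8-bit samples, uses the standard BT.709 coefficients and the usual ImageNet mean/std constants, and leaves out 4:2:0 chroma upsampling and tensor layout. Function names are hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def yuv709_limited_to_rgb01(y, u, v):
    """8-bit limited-range BT.709 Y'CbCr to R'G'B' in [0, 1]."""
    yn = (y - 16) / 219.0   # luma: 16..235
    cb = (u - 128) / 224.0  # chroma: 16..240, centred on 128
    cr = (v - 128) / 224.0
    r = yn + 1.5748 * cr
    g = yn - 0.1873 * cb - 0.4681 * cr
    b = yn + 1.8556 * cb
    return tuple(min(1.0, max(0.0, c)) for c in (r, g, b))


def imagenet_normalise(rgb):
    """Apply per-channel (x - mean) / std."""
    return tuple((c - m) / s
                 for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


white = yuv709_limited_to_rgb01(235, 128, 128)
black = yuv709_limited_to_rgb01(16, 128, 128)
white_norm = imagenet_normalise(white)
print(white, black, white_norm)
```

A luma-replicated path would instead feed the same value into all three channels, which discards the chroma information this conversion keeps.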

2026-05-14 — test_score_pooled_eagain Sanitizer Deselect Retired

Files touched: .github/workflows/tests-and-quality-gates.yml, core/src/feature/x86/adm_avx2.c, docs/state.md.

Rebase impact: low. The sanitizer workflow now dispatches test_score_pooled_eagain again in ASan, UBSan, and TSan lanes. The AVX2 ADM helper keeps scalar-path parity for direct-LUT-range values: temp < 32768 returns temp and shift 0, while larger values still use the rounded 15-bit reduction. The remaining T-SANITIZER-DEFECTS-REVEALED-758 exclusions stay in place.

Invariant to preserve on rebase: sanitizer deselect regexes should contain only tests with an active state row. Do not re-add test_score_pooled_eagain unless a fresh sanitizer report is captured and tracked. Do not call __builtin_clz() for ADM direct-LUT values below 32768.

Smoke-test after rebase:

ASAN_OPTIONS=detect_leaks=1:halt_on_error=1 \
  ./core/build-asan-score/test/test_score_pooled_eagain
UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 \
  ./core/build-ubsan-score/test/test_score_pooled_eagain
TSAN_OPTIONS=halt_on_error=1 \
  ./core/build-tsan-score/test/test_score_pooled_eagain
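
The direct-LUT rule can be modelled in a few lines. This is a Python model, not the AVX2 helper: the `17 - clz` shift used for the large-value branch is an assumption about what "rounded 15-bit reduction" means here, and both function names are hypothetical.

```python
def clz32(x: int) -> int:
    """Leading zeros of a non-zero 32-bit value (like __builtin_clz)."""
    if not 0 < x < 2**32:
        raise ValueError("clz32 is only defined for 1 <= x < 2**32")
    return 32 - x.bit_length()


def best15_from32(temp: int) -> tuple[int, int]:
    """Return (reduced value, shift) for a 32-bit magnitude."""
    if temp < 32768:
        # Direct-LUT range: pass through with shift 0 and never call clz.
        return temp, 0
    k = 17 - clz32(temp)                      # bits to drop (assumed form)
    return (temp + (1 << (k - 1))) >> k, k    # round to nearest, then shift


small = best15_from32(100)
edge = best15_from32(32768)
large = best15_from32(1 << 20)
print(small, edge, large)
```

For values below 32768 the assumed `17 - clz` expression would be zero or negative, and `clz` of zero is undefined in C, which is consistent with the instruction to skip the call in that range.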

2026-05-14 — test_feature_collector Sanitizer Deselect Retired

Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md.

Rebase impact: low. The sanitizer workflow now dispatches test_feature_collector again in ASan, UBSan, and TSan lanes. The remaining T-SANITIZER-DEFECTS-REVEALED-758 exclusions stay in place.

Invariant to preserve on rebase: sanitizer deselect regexes should contain only tests with an active state row. Do not re-add test_feature_collector unless a fresh sanitizer report is captured and tracked.

Smoke-test after rebase:

ASAN_OPTIONS=detect_leaks=1:halt_on_error=1 \
  ./core/build-asan-score/test/test_feature_collector
UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 \
  ./core/build-ubsan-score/test/test_feature_collector
TSAN_OPTIONS=halt_on_error=1 \
  ./core/build-tsan-score/test/test_feature_collector

2026-05-14 — test_pic_preallocation Sanitizer Deselect Retired

Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md.

Rebase impact: low. The sanitizer workflow now dispatches test_pic_preallocation again in ASan, UBSan, and TSan lanes. The remaining T-SANITIZER-DEFECTS-REVEALED-758 exclusions stay in place.

Invariant to preserve on rebase: sanitizer deselect regexes should contain only tests with an active state row. Do not re-add test_pic_preallocation unless a fresh sanitizer report is captured and tracked.

Smoke-test after rebase:

ASAN_OPTIONS=detect_leaks=1:halt_on_error=1 \
  ./core/build-asan-score/test/test_pic_preallocation
UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 \
  ./core/build-ubsan-score/test/test_pic_preallocation
TSAN_OPTIONS=halt_on_error=1 \
  ./core/build-tsan-score/test/test_pic_preallocation

2026-05-14 — vmaf-tune libx265 encoder-stats parser

Files touched: tools/vmaf-tune/src/vmaftune/encoder_stats.py, tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py, tools/vmaf-tune/tests/test_encoder_stats_parser_x264.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md.

Rebase impact: low. The corpus row schema stays at the existing v3 ten-column enc_internal_* contract; this only teaches the parser x265's pass-1 aliases (q-aq, icu, pcu, scu) and fractional CTU counts.

Invariant to preserve on rebase: x264 imb / pmb / smb and x265 icu / pcu / scu must continue to feed the same intra / predicted / skip ratio columns. Do not split the public corpus schema per codec.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_encoder_stats_parser_x264.py -q
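
The shared-column rule can be sketched as an alias table. The output names (`intra`, `predicted`, `skip`), the `key:value` token parsing, the normalisation by the sum of the three counts, and the sample lines are assumptions for illustration; they are not the parser's actual schema or input.

```python
ALIASES = {
    "imb": "intra", "icu": "intra",          # x264 / x265 intra blocks
    "pmb": "predicted", "pcu": "predicted",  # x264 / x265 predicted blocks
    "smb": "skip", "scu": "skip",            # x264 / x265 skipped blocks
}


def block_ratios(stats_line):
    """Map either codec's per-frame counts onto one set of ratio columns."""
    counts = {"intra": 0.0, "predicted": 0.0, "skip": 0.0}
    for token in stats_line.replace(";", " ").split():
        key, sep, value = token.partition(":")
        if sep and key in ALIASES:
            counts[ALIASES[key]] += float(value)  # float: counts may be fractional
    total = sum(counts.values())
    if total == 0:
        return dict(counts)
    return {name: value / total for name, value in counts.items()}


# Made-up lines in the general shape of pass-1 stats output.
x264_ratios = block_ratios("in:0 out:0 type:I q:23.00 imb:60 pmb:30 smb:10;")
x265_ratios = block_ratios("in:1 out:1 type:P q:25.10 q-aq:24.30 icu:12.5 pcu:25.0 scu:62.5 ;")
print(x264_ratios, x265_ratios)
```

Both codecs land in the same three keys, so downstream consumers never need to branch on the encoder.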

2026-05-14 — vmaf-tune predictor directory-corpus orchestration

Files touched: tools/vmaf-tune/src/vmaftune/predictor_train.py, tools/vmaf-tune/tests/test_predictor_train.py, docs/usage/vmaf-tune.md.

Rebase impact: low. The loader already supported recursive JSONL directories; this change removes stale is_file() gates in the trainer orchestration so CLI/API callers get the documented real-corpus path. Model format, feature order, corpus row schema, and shipped model bytes are unchanged.

Invariant to preserve on rebase: --corpus <directory> and train_all_codecs(corpus_path=<directory>) must call the same load_corpus() path as single-file inputs. Do not reintroduce file-only guards above the loader.
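
A sketch of the "one loader path" shape, under stated assumptions: the function names load_corpus and train_all_codecs come from the note above, but both bodies and the return value are stand-ins written for illustration, not the code in predictor_train.py.

```python
import json
from pathlib import Path


def load_corpus(corpus_path):
    """Load JSONL rows from one file or, recursively, from a directory."""
    root = Path(corpus_path)
    files = sorted(root.rglob("*.jsonl")) if root.is_dir() else [root]
    rows = []
    for path in files:
        with path.open(encoding="utf-8") as handle:
            rows.extend(json.loads(line) for line in handle if line.strip())
    return rows


def train_all_codecs(corpus_path):
    # No is_file() gate here: the loader alone decides how to read the path.
    rows = load_corpus(corpus_path)
    return {"rows": len(rows)}   # placeholder result for the sketch
```

Keeping the file-versus-directory decision inside the loader means the orchestration layer cannot drift from what the loader actually supports.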

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_predictor_train.py -q

2026-05-15 — vmaf-tune sidecar CLI wiring

Files touched: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_cli_sidecar.py, tools/vmaf-tune/tests/test_sidecar.py, tools/vmaf-tune/AGENTS.md, docs/ai/local-sidecar-training.md, docs/usage/vmaf-tune.md, docs/research/0122-vmaf-tune-sidecar-cli-2026-05-15.md.

Rebase impact: low. This adds one top-level vmaf-tune sidecar subcommand group and does not change corpus row schemas, predictor ONNX schemas, codec adapters, libvmaf public APIs, or FFmpeg patches.

Invariant to preserve on rebase: the CLI must remain a thin wrapper over vmaftune.sidecar.SidecarPredictor. It must keep the same cache layout (<cache>/<predictor-version>/<codec>/state.json), random host UUID posture, and ShotFeatures feature names as the Python API. Do not add upload, hostname-derived IDs, or predictor mutation to this surface.

Smoke-test after rebase:

cd tools/vmaf-tune && ../../.venv/bin/python -m pytest \
  tests/test_cli_sidecar.py tests/test_sidecar.py -q

2026-05-14 — vmaf-tune Phase-B Bisect Sample Clips

Files touched: tools/vmaf-tune/src/vmaftune/bisect.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_bisect.py, tools/vmaf-tune/tests/test_compare.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-bisect.md, docs/research/0109-vmaf-tune-bisect-sample-clip-2026-05-14.md.

Rebase impact: low. The public addition is one vmaf-tune compare flag and one Python bisect argument. It does not change the compare report schema, codec adapter registry, libvmaf public API, or FFmpeg patch stack.

Invariant to preserve on rebase: sample_clip_seconds in bisect_target_vmaf, make_bisect_predicate, and vmaf-tune compare must compute one centre-anchored sample window and thread it into both EncodeRequest (sample_clip_start_s / sample_clip_seconds) and ScoreRequest (frame_skip_ref / frame_cnt). Bitrate must be normalised against the sample duration when sample-clip mode is active. Unknown duration, non-positive framerate, or samples not shorter than the source remain full-source mode.
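
The window arithmetic can be sketched as follows. The SampleWindow type, the centre_window and bitrate_kbps helpers, the round-to-nearest frame conversion, and treating a non-positive sample length as "off" are all assumptions made for illustration; only the field names in the comments come from the note above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleWindow:
    start_s: float        # would feed EncodeRequest.sample_clip_start_s
    seconds: float        # would feed EncodeRequest.sample_clip_seconds
    frame_skip_ref: int   # would feed ScoreRequest.frame_skip_ref
    frame_cnt: int        # would feed ScoreRequest.frame_cnt


def centre_window(duration_s: Optional[float], fps: float,
                  sample_clip_seconds: float) -> Optional[SampleWindow]:
    """Return the centred window, or None to signal full-source mode."""
    if duration_s is None or duration_s <= 0 or fps <= 0:
        return None
    if sample_clip_seconds <= 0 or sample_clip_seconds >= duration_s:
        return None
    start_s = (duration_s - sample_clip_seconds) / 2.0
    return SampleWindow(start_s, sample_clip_seconds,
                        int(round(start_s * fps)),
                        int(round(sample_clip_seconds * fps)))


def bitrate_kbps(encoded_bytes: int, encoded_seconds: float) -> float:
    """Bitrate over the duration that was actually encoded."""
    return encoded_bytes * 8 / 1000.0 / encoded_seconds
```

Computing the window once and deriving both the encode and score fields from it is what keeps the two requests aligned; in sample-clip mode the bitrate divisor is the window length, not the source duration.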

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_bisect.py \
  tools/vmaf-tune/tests/test_compare.py -q

2026-05-14 — vmaf-tune ladder spacing alias fix

Files touched: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/ladder.py, tools/vmaf-tune/tests/test_ladder.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md.

Rebase impact: low. This only keeps the Phase-E CLI choices and library spacing modes aligned. The ladder hull math, default sampler, manifest schema, and encode/scoring behaviour are unchanged.

Invariant to preserve on rebase: argparse choices for vmaf-tune ladder --spacing must stay in lockstep with ladder.select_knees(). vmaf is the documented perceptual spacing mode; uniform remains a backwards-compatible alias for that same mode.
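
One way to make the lockstep structural is to derive the argparse choices from the same table the library consults. This is a sketch under assumptions: SPACING_ALIASES, canonical_spacing, and build_parser are hypothetical names, and the table lists only the two spellings named above, which may not be the full set in ladder.py.

```python
import argparse

# Hypothetical single source of truth: CLI spelling -> canonical mode.
SPACING_ALIASES = {"vmaf": "vmaf", "uniform": "vmaf"}


def canonical_spacing(name):
    """Resolve a CLI spelling to the mode the library implements."""
    return SPACING_ALIASES[name]


def build_parser():
    parser = argparse.ArgumentParser(prog="vmaf-tune ladder")
    # choices are derived from the table, so they cannot drift from it
    parser.add_argument("--spacing", choices=sorted(SPACING_ALIASES))
    return parser


args = build_parser().parse_args(["--spacing", "uniform"])
mode = canonical_spacing(args.spacing)
```

With this shape, adding a spacing mode means touching one table, and an alias can never be accepted by the CLI without the library knowing how to resolve it.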

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_ladder.py -q

2026-05-15 — Tiny-AI real-weight limitation docs

Files touched: docs/ai/roadmap.md, docs/ai/models/fastdvdnet_pre.md, docs/metrics/features.md, core/src/dnn/AGENTS.md, core/src/feature/AGENTS.md.

Rebase impact: low. This is a documentation / invariant-note cleanup that aligns user-facing docs with the already-shipped smoke: false FastDVDnet and TransNet V2 registry entries. Model bytes, registry schema, extractor I/O names, and runtime behaviour are unchanged.

Invariant to preserve on rebase: do not reintroduce placeholder-only wording for fastdvdnet_pre or transnet_v2. The remaining follow-ups are the FFmpeg temporal-filter consumer, luma-native FastDVDnet retrain, per-shot CRF aggregation, and true RGB / bilinear TransNet thumbnails.

Smoke-test after rebase:

rg -n "real upstream weights are tracked|ADR-0246|0253-fastdvdnet" \
  docs/ai docs/metrics core/src/dnn/AGENTS.md core/src/feature/AGENTS.md

2026-05-15 — vmaf-roi High-Bit-Depth Input

Files touched: core/tools/vmaf_roi.c, core/tools/test/meson.build, core/tools/test/test_vmaf_roi_high_bitdepth.sh, core/tools/AGENTS.md, docs/usage/vmaf-roi.md, docs/research/0123-vmaf-roi-high-bitdepth-2026-05-15.md.

Rebase impact: low. This extends an existing CLI flag and does not change libvmaf public APIs, encoder sidecar schemas, or FFmpeg patches.

Invariant to preserve on rebase: vmaf-roi --bitdepth 10|12|16 must seek using full planar YUV frame bytes, including chroma planes and 16-bit sample containers, then downscale luma to the existing luma8 saliency-model contract. Unsupported depths such as 9-bit remain rejected.

Smoke-test after rebase:

meson test -C core/build-roi-hbd test_vmaf_roi_high_bitdepth --print-errorlogs

2026-05-15 — vmaf-perShot 4:2:2 / 4:4:4 Input

Files touched: core/tools/vmaf_per_shot.c, core/tools/test/test_vmaf_per_shot.sh, core/tools/AGENTS.md, docs/usage/vmaf-perShot.md, docs/research/0124-vmaf-pershot-422-444-2026-05-15.md.

Rebase impact: low. This extends one existing CLI option and does not change the CSV / JSON plan schema, libvmaf public APIs, or FFmpeg patch stack.

Invariant to preserve on rebase: vmaf-perShot remains luma-only for detection and CRF prediction, but --pixel_format 420|422|444 must count the selected planar chroma layout when skipping to the next frame. --bitdepth remains limited to 8|10|12|16.
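
The frame-skip arithmetic behind this invariant can be sketched as below. The helper names are hypothetical, and rounding odd chroma dimensions up is an assumption of this sketch rather than something the note confirms for vmaf_per_shot.c.

```python
def chroma_dims(width, height, pixel_format):
    """Chroma plane size for a planar layout, rounding odd sizes up."""
    if pixel_format == "420":
        return (width + 1) // 2, (height + 1) // 2
    if pixel_format == "422":
        return (width + 1) // 2, height
    if pixel_format == "444":
        return width, height
    raise ValueError(f"unsupported pixel format: {pixel_format}")


def frame_bytes(width, height, pixel_format, bitdepth):
    """Bytes occupied by one planar YUV frame (luma plus two chroma planes)."""
    if bitdepth not in (8, 10, 12, 16):
        raise ValueError(f"unsupported bit depth: {bitdepth}")
    bytes_per_sample = 1 if bitdepth == 8 else 2   # >8-bit uses 16-bit containers
    chroma_w, chroma_h = chroma_dims(width, height, pixel_format)
    samples = width * height + 2 * chroma_w * chroma_h
    return samples * bytes_per_sample


def frame_offset(frame_index, width, height, pixel_format, bitdepth):
    """Byte offset of a frame in a headerless planar YUV file."""
    return frame_index * frame_bytes(width, height, pixel_format, bitdepth)
```

Even though detection only reads luma, the seek distance has to include both chroma planes; otherwise every frame after the first is read from the wrong offset.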

Smoke-test after rebase:

meson test -C core/build-pershot-pixfmt test_vmaf_per_shot --print-errorlogs

2026-05-15 — CUDA psnr_hvs DCT Parallelisation

Files touched: core/src/feature/cuda/integer_psnr_hvs/psnr_hvs_score.cu, core/src/feature/cuda/AGENTS.md, docs/backends/cuda/overview.md, docs/research/0130-cuda-psnr-hvs-dct-parallel-2026-05-15.md.

Rebase impact: low. The host lifecycle, feature names, CLI surface, and public APIs are unchanged. This is a CUDA-kernel scheduling optimisation for an existing extractor.

Invariant to preserve on rebase: only the integer 8x8 DCT passes run across the first eight CUDA threads. Float means, variances, masking, and masked-error accumulation stay thread-0 serial in CPU scan order; do not convert them to warp/block reductions without a separate numeric-contract ADR and cross-backend tolerance update.
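
The reason accumulation order is part of the numeric contract is that floating-point addition is not associative. The sketch below illustrates this with Python doubles (the kernel works in single precision, where the effect is larger); serial_sum and tree_sum are illustrative helpers, not code from psnr_hvs_score.cu.

```python
def serial_sum(values):
    """Accumulate left to right, as one thread walking scan order would."""
    total = 0.0
    for value in values:
        total += value
    return total


def tree_sum(values):
    """Pairwise reduction, the shape a warp or block reduction takes."""
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])   # odd element carries to the next round
        values = paired
    return values[0]


samples = [1e16, 1.0, -1e16, 1.0]
serial_result = serial_sum(samples)
tree_result = tree_sum(samples)
```

The two functions add exactly the same numbers and still disagree, which is why a change of reduction shape needs its own tolerance review rather than being treated as a pure scheduling change.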

Smoke-test after rebase:

python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary "$PWD/core/build-cuda/tools/vmaf" \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --feature psnr_hvs --backend cuda --places 3

2026-05-15 — test_cli_parse Sanitizer Deselect Retired

Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md, changelog.d/fixed/sanitizer-cli-parse.md.

Rebase impact: low. This only narrows the ADR-0347 sanitizer deselect regexes after re-verifying test_cli_parse on current master; the CLI parser behavior and public options are unchanged.

Invariant to preserve on rebase: keep test_cli_parse out of the ASan / UBSan / TSan EXCLUDE regexes unless a new sanitizer report is captured and tracked in docs/state.md.

Smoke-test after rebase:

ASAN_OPTIONS=detect_leaks=1:halt_on_error=1:abort_on_error=1:print_summary=1 ./core/build-asan-cli/test/test_cli_parse
UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ./core/build-ubsan-cli/test/test_cli_parse
TSAN_OPTIONS=halt_on_error=1 ./core/build-tsan-cli/test/test_cli_parse

2026-05-15 — test_predict Sanitizer Deselect Retired

Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md, changelog.d/fixed/sanitizer-predict.md.

Rebase impact: low. This only narrows the ADR-0347 sanitizer deselect regexes after re-verifying test_predict on current master; prediction logic, model loading, and output scores are unchanged.

Invariant to preserve on rebase: keep test_predict out of the ASan / UBSan / TSan EXCLUDE regexes unless a new sanitizer report is captured and tracked in docs/state.md.

Smoke-test after rebase:

ASAN_OPTIONS=detect_leaks=1:halt_on_error=1:abort_on_error=1:print_summary=1 ./core/build-asan-predict/test/test_predict
UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ./core/build-ubsan-predict/test/test_predict
TSAN_OPTIONS=halt_on_error=1 ./core/build-tsan-predict/test/test_predict

2026-05-15 — vmaf-tune HDR Dispatch Coverage

Files touched: tools/vmaf-tune/src/vmaftune/hdr.py, tools/vmaf-tune/tests/test_hdr.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-hdr-and-sampling.md, docs/research/0126-vmaf-tune-hdr-dispatch-coverage-2026-05-15.md.

Rebase impact: low. This extends the existing ADR-0300 dispatch table only; it does not change corpus schema, codec-adapter quality knobs, or HDR model lookup.

Invariant to preserve on rebase: hdr_codec_args() remains the single HDR argv contract. Hardware HEVC rows should emit p010le + main10 plus global color tags; hardware AV1 rows should emit p010le plus global color tags; codec-private mastering-display / MaxCLL flags stay limited to verified families.

Smoke-test after rebase:

PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
  tools/vmaf-tune/tests/test_hdr.py \
  tools/vmaf-tune/tests/test_auto_short_circuits.py -q

2026-05-15 — Docs Pages Strict-Anchor Repair

Files touched: docs/ai/quantization.md, docs/api/gpu.md, docs/metrics/ssimulacra2.md, docs/usage/vmaf-tune.md, changelog.d/fixed/docs-pages-anchor-strict.md.

Rebase impact: low. This only corrects MkDocs-rendered internal anchors and adds the missing docs/api/gpu.md HIP / Metal section targets consumed by docs/api/index.md.

Invariant to preserve on rebase: mkdocs build --strict must stay green on master before a docs-affecting PR is merged; do not weaken validation.links.anchors: warn to hide anchor drift.

Smoke-test after rebase:

.venv/bin/python -m mkdocs build --strict

2026-05-15 — CHUG FULL_FEATURES Parquet Metadata Enrichment

Files touched: ai/scripts/enrich_k150k_parquet_metadata.py, ai/tests/test_enrich_k150k_parquet_metadata.py, ai/AGENTS.md, docs/ai/chug-ingestion.md, and docs/ai/datasets/k150k.md.

Rebase impact: low. This adds a recovery utility for local FULL_FEATURES parquet jobs that predate --metadata-jsonl; it does not change the extraction schema or feature column order.

Invariant to preserve on rebase: the enrichment utility matches rows by clip_name, fills missing metadata cells by default, writes parquet atomically, and leaves feature/MOS columns unchanged unless the operator passes --overwrite-metadata.
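
The fill-by-default and atomic-write behaviour can be sketched as follows. This is an illustration under assumptions, not the enrichment script: rows are plain dicts, JSON stands in for parquet, and enrich_rows and write_atomically are hypothetical helper names.

```python
import json
import os
import tempfile


def enrich_rows(rows, metadata_by_clip, metadata_keys, overwrite=False):
    """Fill metadata cells by clip_name; other columns are never touched."""
    enriched = []
    for row in rows:
        updated = dict(row)
        extra = metadata_by_clip.get(row["clip_name"], {})
        for key in metadata_keys:
            if key in extra and (overwrite or updated.get(key) is None):
                updated[key] = extra[key]
        enriched.append(updated)
    return enriched


def write_atomically(path, rows):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(rows, handle)
        os.replace(tmp_path, path)   # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Restricting updates to an explicit list of metadata keys is what guarantees that feature and MOS columns survive untouched, and the rename step means an interrupted run leaves the previous file intact.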

Smoke-test after rebase:

PYTHONPATH=ai/src .venv/bin/python -m pytest \
  ai/tests/test_enrich_k150k_parquet_metadata.py \
  ai/tests/test_extract_k150k_features.py -q

fix/saliency-per-mb-eval-2026-05-15 — CLI short-opt + bench atoi fix (Batch 5)

Branch: fix/saliency-per-mb-eval-2026-05-15

Files touched: core/tools/cli_parse.c, core/tools/vmaf_bench.c, core/test/test_cli_parse.c, docs/usage/cli.md.

Rebase impact: low. cli_parse.c and vmaf_bench.c are upstream-shared files; if Netflix/vmaf ever adds a new short option or touches the same switch block, the case 'c': fall-through arm may need to be re-applied. The invariant comment (INVARIANT (ADR-0438)) marks the intent clearly. vmaf_bench.c changes are in the SYCL-gated #if defined(HAVE_SYCL) block; upstream is unlikely to add atoi back.

Invariant to preserve on rebase: every entry in short_opts[] in core/tools/cli_parse.c must have a matching case arm in the switch (o) block inside cli_parse(). If Netflix adds a new short option upstream without a case, the same silent-drop bug recurs.
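
A check along these lines could flag the silent-drop case mechanically. It is a sketch that assumes short_opts[] is a getopt-style option string and that handlers are written as plain case 'x': arms; the helper names are hypothetical and the regex does not understand comments or nested switches.

```python
import re


def short_option_letters(optstring):
    """Option letters in a getopt-style string, ignoring ':' markers."""
    return {ch for ch in optstring if ch != ":"}


def case_letters(switch_source):
    """Characters handled by `case 'x':` arms in a C switch body."""
    return set(re.findall(r"case\s+'(.)'\s*:", switch_source))


def unhandled_short_options(optstring, switch_source):
    """Letters that the option string accepts but no case arm handles."""
    return sorted(short_option_letters(optstring) - case_letters(switch_source))
```

An empty result means every accepted short option reaches a handler; any letter returned is an option the parser would accept and then ignore.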

Smoke-test after rebase:

meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_cli_parse
./build/test/test_cli_parse   # expect: 18 tests run, 18 passed

fix/motion-fps-weight-all-gpu-backends — motion_fps_weight parity across all GPU twins

Branch: fix/saliency-per-mb-eval-2026-05-15 (squash PR #863)

Files touched: core/src/feature/cuda/integer_motion_v2_cuda.c, core/src/feature/sycl/integer_motion_v2_sycl.cpp, core/src/feature/vulkan/motion_v2_vulkan.c, core/src/feature/hip/integer_motion_v2_hip.c, core/src/feature/metal/integer_motion_v2_metal.mm, core/src/feature/cuda/float_motion_cuda.c, core/src/feature/sycl/float_motion_sycl.cpp, core/src/feature/vulkan/float_motion_vulkan.c, core/src/feature/hip/float_motion_hip.c, core/src/feature/metal/float_motion_metal.mm.

Rebase impact: low. All touched files are fork-local or fork-added GPU twins; upstream Netflix/vmaf does not maintain any GPU motion extractor files. No upstream-shared path is modified.

Invariant to preserve on rebase: motion_fps_weight must remain present in every motion-family GPU twin's VmafOption options[] table and applied identically (see canonical note in core/src/feature/cuda/AGENTS.md). If a future PR introduces a new motion GPU backend or a new motion-related option, the same option table and application math must be replicated across all twins in the same PR.

Smoke-test after rebase:

meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast

core/test/meson.build — suite-tagging invariant (fix/meson-suite-fast)

Files touched: core/test/meson.build, core/test/AGENTS.md.

Rebase impact: moderate. Upstream Netflix/vmaf periodically adds new test() calls to core/test/meson.build without suite: arguments (that is the upstream convention). Every upstream sync or port-upstream-commit cherry-pick that touches this file must be followed by:

grep "^test(" core/test/meson.build | grep -v "suite :"

Any output is a missing tag — add the appropriate suite: before merging. Failure to do so silently breaks meson test -C build --suite=fast (the pre-push gate) because Meson's --suite filter matches only tests that declare the named suite; untagged tests are invisible to the filter and the command exits 0 with zero tests run.

Invariant to preserve on rebase: every test(...) call in core/test/meson.build carries a suite: keyword argument. The fast suite is the pre-push gate; simd and gpu are secondary selectors for CI matrix jobs. See core/test/AGENTS.md for the full tag matrix.
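
A line-based grep can misreport test() calls that span several lines, so a call-aware check may be a useful complement. The sketch below is an assumption-laden illustration, not a script from the repository: untagged_tests is a hypothetical helper, it accepts both the suite: and suite : spellings, and it does not handle parentheses inside strings or comments.

```python
import re


def untagged_tests(meson_source):
    """Return the first argument of every test(...) call lacking `suite`."""
    missing = []
    pos = 0
    while True:
        start = meson_source.find("test(", pos)
        if start == -1:
            return missing
        # Skip matches inside longer identifiers such as my_test(
        prev = meson_source[start - 1] if start > 0 else ""
        if prev.isalnum() or prev == "_":
            pos = start + 1
            continue
        depth = 0
        end = start + len("test")
        for end in range(start + len("test"), len(meson_source)):
            ch = meson_source[end]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break
        call = meson_source[start:end + 1]
        if not re.search(r"\bsuite\s*:", call):
            missing.append(call[len("test("):].split(",")[0].strip())
        pos = end + 1
```

Running it over the contents of core/test/meson.build and requiring an empty list gives the same guarantee as the invariant, independent of how each call is wrapped across lines.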

Smoke-test after rebase:

meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast --list   # must print >20 tests, not 0

perf/cambi-calculate-c-values-avx512-neon-2026-05-16 (ADR-0452)

What changed: Added calculate_c_values_row_avx512 and calculate_c_values_row_neon as siblings of the existing calculate_c_values_row_avx2. Updated cambi.c dispatch to assign calculate_c_values_avx512 on AVX-512 hosts and corrected the NEON wrapper to call calculate_c_values_row_neon instead of the scalar fallback.

Rebase impact: low. All modified files are fork-local additions to cambi SIMD infrastructure; upstream Netflix/vmaf does not maintain AVX-512 or NEON CAMBI kernels. No public API surface is changed.

Invariant to preserve on rebase: The twin-update rule (x86/AGENTS.md, arm64/AGENTS.md) now requires that every cambi inner-loop function ported to AVX2 ships with AVX-512 + NEON siblings in the same PR. Do not merge a cambi AVX2 kernel without the matching AVX-512 + NEON files and a dispatch update in cambi.c.


refactor/gpu-dispatch-parse-dedup — shared GPU dispatch env tokenizer (ADR-0483)

Branch: refactor/gpu-dispatch-parse-dedup

Files touched: core/src/gpu_dispatch_parse.h (new), core/src/cuda/dispatch_strategy.c, core/src/sycl/dispatch_strategy.cpp, core/src/vulkan/dispatch_strategy.c.

Rebase impact: low. The three dispatch_strategy TUs are fork-local; upstream Netflix/vmaf does not have dispatch_strategy.c files. No public headers, no meson sources, and no link-time symbols change — the new gpu_dispatch_parse.h is a header-only static inline and is not added to any meson source list.

Invariant to preserve on rebase: k_<backend>_strategy_names[] index 0 must equal the backend enum's default value (e.g. VMAF_CUDA_DISPATCH_DIRECT = 0). When adding new strategy enum values, append to both the enum and the table; never reorder either.
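
The index-equals-enum-value contract can be modelled in a few lines. Everything named here is illustrative: CudaDispatch, the GRAPH member, the name strings, and the fall-back-to-default behaviour for unknown tokens are assumptions of the sketch, not the contents of gpu_dispatch_parse.h.

```python
from enum import IntEnum


class CudaDispatch(IntEnum):
    DIRECT = 0      # default; mirrors VMAF_CUDA_DISPATCH_DIRECT = 0
    GRAPH = 1       # hypothetical second strategy


# Table position must equal the enum value it names.
STRATEGY_NAMES = ["direct", "graph"]


def parse_strategy(token, default=CudaDispatch.DIRECT):
    """Map an environment token to a strategy; unknown tokens fall back."""
    try:
        return CudaDispatch(STRATEGY_NAMES.index(token.strip().lower()))
    except ValueError:
        return default


def table_matches_enum():
    """True when the table order mirrors the enum's definition order."""
    return [member.name.lower() for member in CudaDispatch] == STRATEGY_NAMES
```

Because the tokenizer returns a table index that is then used as an enum value, reordering either side silently remaps every name; appending to both keeps existing values stable.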


perf/chug-drop-ssimulacra2-cuda-self-vs-self-2026-05-16 — K150K/CHUG self-vs-self extraction schema v2

Branch: perf/chug-drop-ssimulacra2-cuda-self-vs-self-2026-05-16

Files touched: ai/scripts/extract_k150k_features.py, ai/AGENTS.md, ai/tests/test_extract_k150k_no_ssimulacra2.py.

Rebase impact: low. All touched files are fork-local tiny-AI infrastructure; upstream Netflix/vmaf does not maintain ai/ or K150K extraction pipelines. No upstream-shared C/C++/headers are modified.

Invariant to preserve on rebase: the K150K extraction script (extract_k150k_features.py) is a fork-only feature. If upstream adds its own extract_features.py or similar, keep them separate under different package names; do not merge them. The parquet schema v2 (21-feature, no ssimulacra2) is now authoritative for new K150K/CHUG extraction runs. Existing v1 parquets (22-feature, with ssimulacra2) are grandfathered in; loaders must handle both by detecting feature count at runtime or reading a schema-version sidecar (future work).
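
Runtime detection by feature count could look like the sketch below. The detect_schema_version helper and the assumption that the dropped column is literally named ssimulacra2 are hypothetical; the 22 and 21 feature counts come from the note above.

```python
def detect_schema_version(feature_columns):
    """Classify a parquet's feature columns as schema v1 or v2."""
    columns = list(feature_columns)
    has_ssimulacra2 = "ssimulacra2" in columns
    if len(columns) == 22 and has_ssimulacra2:
        return 1
    if len(columns) == 21 and not has_ssimulacra2:
        return 2
    raise ValueError(f"unrecognised feature layout: {len(columns)} columns")
```

Checking both the count and the presence of the column makes the loader fail loudly on a layout that matches neither schema, rather than guessing.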

Smoke-test after rebase:

python -m pytest ai/tests/test_extract_k150k_no_ssimulacra2.py -v
# Expected: 3/3 PASS

feat/psnr-hvs-vulkan-enable-chroma-2026-05-16 — enable_chroma option for psnr_hvs_vulkan (ADR-0461)

  • Touches: core/src/feature/vulkan/psnr_hvs_vulkan.c
  • Invariant: enable_chroma defaults to true; do not flip. When false, n_planes=1, chroma pipelines are not created, and the combined psnr_hvs score is suppressed. close_fex() relies on VK_NULL_HANDLE guards for the chroma pipeline variants.
  • No rebase impact on upstream files (fork-local Vulkan extractor).

Smoke-test after rebase:

# Default (enable_chroma=true): expect psnr_hvs_y + psnr_hvs_cb + psnr_hvs_cr + psnr_hvs
./build/tools/vmaf --reference src01_hrc00_576x324.yuv \
    --distorted src01_hrc01_576x324.yuv \
    --width 576 --height 324 --feature psnr_hvs_vulkan
# Luma-only: expect only psnr_hvs_y
./build/tools/vmaf --reference src01_hrc00_576x324.yuv \
    --distorted src01_hrc01_576x324.yuv \
    --width 576 --height 324 \
    --feature-opts 'psnr_hvs_vulkan=enable_chroma=false'

feat/hip-float-adm-real-2026-05-16 — HIP float_adm ninth consumer (ADR-0468)

Branch: feat/hip-float-adm-real-2026-05-16

Files touched: core/src/feature/hip/float_adm_hip.c (new), core/src/feature/hip/float_adm_hip.h (new), core/src/feature/hip/float_adm/float_adm_score.hip (new), core/src/hip/meson.build (add TU to hip_sources), core/src/meson.build (add float_adm_score to hip_kernel_sources), core/src/feature/feature_extractor.c (extern decl + #if HAVE_HIP list row), docs/adr/0468-hip-float-adm-real-kernel.md (new), docs/adr/README.md (index row), changelog.d/added/hip-float-adm-real-kernel.md (new).

Rebase impact: low. All new files are fork-local HIP infrastructure; upstream Netflix/vmaf does not maintain a HIP backend. The only upstream-shared file touched is feature_extractor.c, where the change is limited to adding an extern declaration and a single list entry inside #if HAVE_HIP — a block upstream does not have.

Invariant to preserve on rebase: float_adm_hip.c must track float_adm_cuda.c semantically. Any change to the four pipeline stages (DWT coefficients, decouple angle flag parenthesisation, CM threshold 8-neighbour sum, border factor) must be mirrored in both the CUDA and HIP TUs. The warp-size difference (CUDA=32 vs HIP=64) means the shared-memory partial arrays differ in size (FADM_WARPS_PER_BLOCK = 8 vs 4); this is correct and must not be unified.


perf/adm-p-norm-fast-path-vif-arm64-malloc-2026-05-16 (ADR-0463)

What changed: Added adm_cm_s_p3, adm_csf_den_scale_s_p3, and adm_sum_cube_s_p3 fast-path variants in adm_tools.c; dispatch added in adm.c:compute_adm. Removed per-call aligned_malloc from the scalar fallback paths of vif_filter1d_s, vif_filter1d_sq_s, and vif_filter1d_xy_s in vif_tools.c — the caller-supplied tmpbuf is used instead.

Rebase impact: low. All modified files (adm_tools.c, adm_tools.h, adm.c, vif_tools.c) are shared with upstream Netflix/vmaf. The ADM changes add new symbols (no existing signatures altered). The VIF changes only remove local malloc/free; the function signatures and caller-supplied tmpbuf contract are unchanged.

Invariant to preserve on rebase: When upstream Netflix/vmaf modifies adm_cm_s, adm_csf_den_scale_s, or adm_sum_cube_s, the corresponding _p3 variants in the fork must receive the same logic change (minus the powf path). When upstream modifies vif_filter1d_* scalar fallbacks, ensure they do not reintroduce aligned_malloc in the fallback body. See core/src/feature/AGENTS.md performance-invariant section.

fix/dispatch-strategy-registry-audit-2026-05-15 — dispatch registry deduplication + HIP/Metal fixes

Touches: core/src/feature/feature_extractor.c (SYCL/Vulkan sections of feature_extractor_list[]), core/src/hip/dispatch_strategy.c, core/src/metal/dispatch_strategy.c.

Rebase impact: low for the SYCL/Vulkan deduplication (purely cosmetic — first-match semantics mean behaviour is unchanged). Medium for HIP and Metal dispatch-supports: if an upstream sync adds new feature_extractor_list[] entries for HIP or Metal extractors, they must also be added to g_hip_features[] / g_metal_features[] in the same commit.

Invariant to preserve on rebase: every vmaf_fex_*_hip extractor registered in feature_extractor_list[] must appear in g_hip_features[] in core/src/hip/dispatch_strategy.c. Every vmaf_fex_*_metal extractor must appear (by extractor .name and all provided_features[] keys) in g_metal_features[] in core/src/metal/dispatch_strategy.c. The build does not enforce this — run scripts/ci/check-dispatch-registry.sh after any kernel addition.

Smoke-test after rebase:

meson setup build -Denable_hip=true -Denable_hipcc=false
ninja -C build
# Must compile without errors; vmaf_fex_float_adm_hip must be registered.
# With a ROCm 6+ toolchain:
meson setup build -Denable_hip=true -Denable_hipcc=true
ninja -C build
meson test -C build --suite=fast

perf/vif-cuda-smem-staging-2026-05-16 (ADR-0454)

Files touched: core/src/feature/cuda/integer_vif/filter1d.cu, core/src/feature/cuda/AGENTS.md.

Rebase impact: low. filter1d.cu is a fork-local CUDA kernel file (Netflix/vmaf does not maintain CUDA implementations). AGENTS.md is fork-local. No upstream-shared path, public header, or build file is modified.

Invariant to preserve on rebase: the __shared__ smem staging in all four filter template functions must be preserved verbatim (see canonical note added to AGENTS.md). If a future upstream commit adds a filter1d.cu-equivalent (unlikely — Netflix does not ship GPU VIF), reconcile by keeping the smem staging on our side. Do not remove the __syncthreads() between the cooperative load and the compute phase — that barrier is the only thing ordering the smem writes from all threads before any thread reads.


perf/ssimulacra2-cuda-blur-fusion-transpose — 3-channel kernel fusion + V-pass transpose (ADR-0456)

  • Touches: core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu, core/src/feature/cuda/ssimulacra2_cuda.c, core/src/feature/cuda/AGENTS.md.
  • Invariant: ssimulacra2_blur.cu must export exactly 5 kernel symbols: ssimulacra2_transpose, ssimulacra2_blur_h, ssimulacra2_blur_h3, ssimulacra2_blur_v, ssimulacra2_blur_v3_transposed. The host dispatch in ssimulacra2_cuda.c looks up all 5 via cuModuleGetFunction at init time; removing or renaming any symbol causes a hard init failure. The fused kernels use gridDim.z = 3 with plane_stride = width * height (full-resolution constant stride, NOT scale-adjusted stride); any change to the stride contract must propagate to both the kernel and the three host dispatch helpers (ss2c_launch_blur_h3, ss2c_launch_transpose, ss2c_launch_blur_v3_transposed). The --fmad=false flag in the cuda_cu_extra_flags map in core/src/meson.build is load-bearing for the places=4 cross-backend parity gate; do not remove it.
  • Re-test:
meson setup core/build_cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C core/build_cuda tools/vmaf
# Correctness gate:
./core/build_cuda/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --feature ssimulacra2_cuda --output /tmp/cuda_out.xml --backend cuda
# Cross-backend parity:
python3 scripts/ci/cross_backend_parity_gate.py \
    --features ssimulacra2 --backends cpu cuda --places 4

perf/adm-cm-cuda-warp-reduce-fusion — ADM CM i4 warp-reduce fusion (2026-05-16)

What changed: in integer_adm/adm_cm.cu, i4_adm_cm_line_kernel (writes INT32 per-thread values to accum_per_thread global scratch) + adm_cm_reduce_line_kernel_4 (reads scratch, cubic-accumulates, warp-reduces) were replaced by a single i4_adm_cm_line_kernel_fused kernel that does all three steps internally and writes via atomicAdd_int64. integer_adm_cuda.c updated: one cuLaunchKernel per scale (was two), func_adm_cm_reduce_line_kernel_4 removed from AdmStateCuda.

Rebase impact: low. All touched files are fork-local CUDA kernels and their host glue; Netflix upstream does not maintain GPU ADM CM kernels.

Invariant to preserve on rebase: the fused kernel's shift constants (shift_sq=30, add_shift_sq=1<<29, shift_cub=ceil(log2(w)), shift_inner_accum=ceil(log2(h))) must match those used by adm_cm_reduce_line_kernel in the same file for scale != 0. If the reduce kernel's constants are ever changed, the fused kernel's constants must be updated in lockstep.
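
For reference, the four constants can be evaluated for a given plane size as below. The helper names are hypothetical, and w and h are simply whatever width and height the kernel is launched with at that scale; the formulas themselves are the ones quoted above.

```python
def ceil_log2(n):
    """ceil(log2(n)) for n >= 1, computed without floating point."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1).bit_length()


def cm_shift_constants(w, h):
    """Shift constants quoted in the invariant, for one plane size."""
    shift_sq = 30
    return {
        "shift_sq": shift_sq,
        "add_shift_sq": 1 << (shift_sq - 1),   # half of 1 << shift_sq, i.e. 1 << 29
        "shift_cub": ceil_log2(w),
        "shift_inner_accum": ceil_log2(h),
    }


constants = cm_shift_constants(576, 324)
```

Using an integer bit-length for the logarithm avoids floating-point rounding at exact powers of two, which is worth keeping in mind when comparing constants between the fused and reduce kernels.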

Smoke-test after rebase:

meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast
python3 scripts/ci/cross_backend_parity_gate.py \
    --features vif --backends cpu cuda --places 4

python3 scripts/ci/cross_backend_parity_gate.py --features adm --backends cpu cuda --places 4

---

Ghost moment_vulkan.c removed (fix/drop-ghost-moment-vulkan-c)

PR #1067 re-introduced the pre-rename moment_vulkan.c alongside the new float_moment_vulkan.c that PR #1046 had established. The ghost file was deleted and core/src/vulkan/meson.build updated to reference float_moment_vulkan.c exclusively.

Rebase impact: any branch that modified moment_vulkan.c must be re-targeted to float_moment_vulkan.c instead.

Smoke-test after rebase:

ninja -C build && echo "no duplicate symbol error"

perf/cache-rfe-hw-flags — cache rfe_hw_flags bitmask (F2-B)

File changed: core/src/libvmaf.c (VmafContext struct + vmaf_init + vmaf_use_feature + vmaf_read_pictures).

No rebase impact: the change is entirely internal to libvmaf.c; no public header touched, no FFmpeg-patch surface changed.

Invariant: rfe_hw_flags_dirty must be set to true in vmaf_init (after the memset zeroes it to false). If a future refactor moves the memset or adds a second init path, the dirty flag must be set at every init site.
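
The dirty-flag pattern behind this invariant can be modelled as below. This is a hypothetical Python model, not the C code in libvmaf.c: the Context class, its field layout, the recompute counter, and the assumption that registering a feature also marks the cache dirty are illustrative.

```python
class Context:
    def __init__(self):
        self.extractor_flags = []        # per-extractor hardware flag bits
        self.rfe_hw_flags = 0            # cached OR of the list above
        self.rfe_hw_flags_dirty = False  # the state a zeroing memset leaves
        self.recomputes = 0              # instrumentation for this sketch


def init(ctx):
    # Zero-initialisation leaves dirty == False, so set it explicitly.
    ctx.rfe_hw_flags_dirty = True


def use_feature(ctx, hw_flags):
    ctx.extractor_flags.append(hw_flags)
    ctx.rfe_hw_flags_dirty = True        # cache no longer reflects the list


def read_pictures(ctx):
    if ctx.rfe_hw_flags_dirty:
        mask = 0
        for flags in ctx.extractor_flags:
            mask |= flags
        ctx.rfe_hw_flags = mask
        ctx.rfe_hw_flags_dirty = False
        ctx.recomputes += 1
    return ctx.rfe_hw_flags
```

If an init path skips setting the flag, the first read returns the zeroed cache without ever walking the extractor list, which is the failure the invariant guards against.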

Smoke-test after rebase:

meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build src/liblibvmaf.a.p/libvmaf_src_libvmaf.c.o
# Expected: compiles without error or warning

PR #1067 clobbered four GPU feature options (fix/enable-chroma-pr1067-regression)

PR #1067 (bootstrap name-builder refactor) merged a stale base that pre-dated four option additions and overwrote them:

  • integer_psnr_metal.mm: lost enable_chroma field + option entry + n_planes guard (PR #986)
  • float_psnr_metal.mm: lost enable_chroma + per-plane dispatch loop + n_planes (PR #978)
  • psnr_vulkan.c: ceiling division reverted to floor division for chroma geometry (PR #878)
  • vif_vulkan.c: lost vif_skip_scale0 field + option entry + score-suppression guards (PR #1057)

Rebase impact: any branch that adds options to these four files and was branched before PR #1067 merged must be rebased onto master (post-fix) to avoid re-clobbering these options.

Smoke-test after rebase:

meson setup build -Denable_cuda=false -Denable_sycl=false --wipe
ninja -C build
meson test -C build --suite=fast

perf/chug-sidecar-bit-depth-key-f6b (2026-05-17)

File: core/src/feature/hip/integer_psnr_hip.c

No rebase impact: the change is additive (new option + plane-loop). If upstream later changes the HIP PSNR submit/collect call-graph, re-check that the per-plane loop in submit_fex_hip and collect_fex_hip matches whatever new structure upstream introduces. The kernel (psnr_score.hip) is unchanged.


fix/float-vif-skip-scale0-hip-metal

Files: core/src/feature/sycl/float_vif_sycl.cpp, core/src/feature/hip/float_vif_hip.c, core/src/feature/metal/float_vif_metal.mm

No rebase impact: the changes are additive (new field + option + host-side guard in the collect path). GPU kernels are unchanged. If upstream later changes the float_vif collect path or adds vif_skip_scale0 natively, re-check that scale-0 suppression in all three backends matches the CPU implementation in float_vif.c.


fix/adm-metal-missing-options

File: core/src/feature/metal/integer_adm_metal.mm

No rebase impact: the change moves three implicit defaults from init_fex_metal into the options table. The struct fields and kernel dispatch are unchanged. If upstream adds its own Metal ADM options or renames the default macros, re-check that DEFAULT_ADM_CSF_SCALE, DEFAULT_ADM_CSF_DIAG_SCALE, and DEFAULT_ADM_NOISE_WEIGHT still resolve correctly.


fix/docs-pr-strict-check-batch18

Files: .github/workflows/lint-and-format.yml, .github/workflows/required-aggregator.yml

No rebase impact: the change adds a new CI job (docs-lint) and a new entry in the required-aggregator check list. Both are purely additive and contain no fork-local logic that upstream could change. If upstream adds its own docs-lint CI, dedup by dropping our job or merging the two.


chore/svm-h-remove-orphaned-xxx-marker

File: core/src/svm.h

No rebase impact: removes an empty /* XXX */ comment from the vendored libsvm header and folds two trailing comment lines on free_sv into one two-line block. If upstream libsvm updates svm.h, re-apply by re-removing the marker (it originates in libsvm upstream and may reappear).


model/tiny/vmaf_tiny_v1_medium.onnx

Files: model/tiny/vmaf_tiny_v1_medium.onnx (binary, inline repack), model/tiny/vmaf_tiny_v1_medium.onnx.data (deleted), changelog.d/fixed/model-tiny-v1-medium-external-data.md.

No rebase impact: model-only binary change, no C-API or registry schema change. If upstream Netflix ever ships a file with this name, treat as a conflict and keep the fork's version (it is a fork-local model, not an upstream artifact).

docs/motion-dedicated-page

Files: docs/metrics/motion.md (new), docs/metrics/features.md, docs/adr/0491-motion-dedicated-doc-page.md, docs/adr/README.md, changelog.d/added/motion-dedicated-doc-page.md.

No rebase impact: doc-only addition. If upstream Netflix adds a motion extractor or renames existing ones, update docs/metrics/motion.md to match — no code change required.


fix/vulkan-vif-shader-fp64-for-bit-exact

Files: core/src/feature/vulkan/shaders/vif.comp, core/src/vulkan/common.c, docs/adr/0492-vulkan-vif-shader-fp64-g-computation.md, docs/backends/vulkan/overview.md, changelog.d/fixed/vulkan-vif-fp64-g-computation.md.

Rebase sensitivity (medium): vif.comp carries the fp64 extension declaration at line 68 and the revised g/sv_sq block at ~line 540. If upstream Netflix modifies the VIF computation path in integer_vif.c, re-verify that the double-precision GLSL block still mirrors the CPU reference exactly (especially the eps constant and the int32 truncation order for sv_sq). The common.c shaderFloat64 probe must stay in sync with any new device-feature guards added to the same function.


fix/test-output-portable-tempfile

Files: core/test/test_output.c, changelog.d/fixed/test-output-portable-tempfile.md.

No rebase impact: core/test/test_output.c is fork-local (added by PR #963, fork-only coverage gap follow-up; Netflix upstream has no equivalent file). The Windows-portable make_temp_path() helper sits inside that file and is not exported. Upstream syncs do not touch it.


fix/mcp-probe-findings-2026-05-17

Files: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, mcp-server/vmaf-mcp/tests/test_probe_findings_2026_05_17.py, mcp-server/vmaf-mcp/tests/test_backend_dispatch.py, testdata/bench_all.sh, docs/adr/0495-mcp-probe-bug-fixes.md, docs/mcp/tools.md, docs/state.md, changelog.d/fixed/mcp-probe-bug-cluster-2026-05-17.md.

Rebase sensitivity (low): all changes live in the fork-only mcp-server/ tree and the fork-added testdata/bench_all.sh (Netflix upstream has neither). The new _BACKEND_DISABLE / _BACKEND_PROBE_CACHE helpers and the --no_<backend> flag plumbing in _run_vmaf_score assume the libvmaf CLI continues to advertise --no_<backend> switches in --help and to accept them on the command line. If upstream ever removes them (the new --backend $NAME exclusive selector landed in the fork on 2026-04-28; see bench_all.sh header comments), swap _probe_backends to parse the selector grammar instead. No upstream-mirrored file is touched.
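The probe assumption above can be illustrated with a minimal sketch. It is not the server.py implementation: the function name, the KNOWN_BACKENDS allowlist, and the sample help text are all hypothetical, chosen only to show how --no_<backend> switches could be discovered from --help output while ignoring unrelated --no_* flags.

```python
import re

# Assumed allowlist; the real helper may track a different set of backends.
KNOWN_BACKENDS = ("cuda", "sycl", "vulkan", "hip", "metal")

# Matches switches of the form --no_<word>, e.g. --no_cuda.
_NO_FLAG = re.compile(r"--no_([a-z0-9_]+)\b")


def disable_flags_from_help(help_text):
    """Map backend name -> its --no_<backend> switch, as advertised in help_text."""
    found = {}
    for match in _NO_FLAG.finditer(help_text):
        name = match.group(1)
        if name in KNOWN_BACKENDS:  # skip unrelated --no_* switches
            found[name] = match.group(0)
    return found


def flags_to_disable_others(help_text, keep):
    """Return the switches that turn off every advertised backend except `keep`."""
    flags = disable_flags_from_help(help_text)
    return [flag for name, flag in sorted(flags.items()) if name != keep]


# Hypothetical help excerpt, used only to exercise the parser.
SAMPLE_HELP = """
 --no_prediction  extract features only
 --no_cuda        disable the CUDA backend
 --no_sycl        disable the SYCL backend
 --no_vulkan      disable the Vulkan backend
"""
```

If the --no_<backend> switches ever disappear from --help, a parser of this shape returns an empty mapping, which is the signal to move to the selector grammar.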


fix/bbb-e2e-v2-bug-cluster-2026-05-18

Files: core/tools/vmaf.c (init_gpu_backends explicit-backend gating + amend_json_with_backend_used helper), tools/vmaf-tune/src/vmaftune/{score,bisect,corpus,ladder,report,encode,cli}.py, tools/vmaf-tune/tests/test_bbb_e2e_v2_bug_cluster.py, dev/Containerfile (matplotlib), dev/scripts/dev-mcp-entrypoint.sh (mkdir -p /tmp), docs/adr/0498-vmaf-tune-bbb-e2e-v2-bug-cluster.md, docs/adr/README.md, docs/state.md, docs/usage/vmaf-tune.md, docs/backends/index.md, docs/development/dev-mcp.md, changelog.d/fixed/vmaf-tune-bbb-e2e-v2-bug-cluster.md.

Rebase sensitivity (medium for core/tools/vmaf.c, low for the rest): the C-side change is bolted on at the end of each backend's state_init failure stanza inside init_gpu_backends; an upstream refactor that restructures that helper (Netflix has no equivalent function — the fork extracted it as ADR-0141 §2 with NOLINTNEXTLINE) would need the explicit-backend if (...) return -1; gates re-applied per backend. The amend_json_with_backend_used helper is fork-local (operates on the file libvmaf wrote — no API change) and survives upstream syncs verbatim. The vmaf-tune fixes live entirely in tools/vmaf-tune/ which is fork-added; no upstream-mirror file is touched. The ScoreRequest.duration_s and CorpusJob.{src_width, src_height} field additions are optional with safe defaults so older test fixtures still compile. ffmpeg-patches are unaffected — no public C API surface changed.

fix/vmaf-tune-ladder-reference-decode-v3

Files: tools/vmaf-tune/src/vmaftune/{corpus,score}.py, tools/vmaf-tune/tests/test_bbb_e2e_v3_bug_cluster.py, docs/adr/0499-vmaf-tune-ladder-reference-decode-v3.md, docs/adr/README.md, docs/usage/vmaf-tune.md, changelog.d/fixed/vmaf-tune-ladder-reference-decode-v3.md.

Rebase sensitivity (none): all changes live in tools/vmaf-tune/ which is fork-added — no upstream Netflix file is touched. The new _maybe_decode_reference helper and _decode_source_to_yuv shared building block are private module functions; no public API was added or renamed. Dropping .y4m from _VMAF_RAW_SUFFIXES / VMAF_RAW_SUFFIXES is a behaviour change inside the wrapper that matches what the libvmaf CLI has always done (raw_input_open rejects Y4M files when use_yuv=true — see core/tools/cli_parse.c and the regression test test_vmaf_raw_suffixes_matches_libvmaf_cli_source which cross-checks the table against the CLI source). ffmpeg-patches unaffected. No effect on bisect.py (already decodes the reference per ADR-0498) — the regression test test_bisect_decodes_reference_too pins the existing invariant.


perf/vif-lut-shrink-and-filter-cache (ADR-0500)

Touches upstream-mirrored files: core/src/feature/integer_vif.h, integer_vif.c, vif.c, vif.h, float_vif.c, x86/vif_avx512.c.

Rebase note: when pulling upstream changes to any of these files, verify that:

  1. VifPublicState.log2_table (now 32768 entries) is not reverted to 65537 entries.
  2. The compute_vif signature addition (precomputed_filters, precomputed_filter_widths) does not conflict with upstream signature changes.
  3. The three _mm512_i32gather_epi64 gather sites in vif_avx512.c retain the _mm256_and_si256 index mask with VIF_LOG2_TABLE_SIZE - 1.
  4. log_generate fills indices [0..32767] with log2f(32768+i)*2048 (not the original log2f(i)*2048 for i in [32767..65535]).

If upstream changes any of the above, a new reconciliation pass is needed.
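Checks 1, 3, and 4 can be sanity-tested against a small Python model of the table. This is a sketch under stated assumptions: it uses double-precision math.log2 instead of the C log2f, assumes round-to-nearest when storing entries, and the lookup helper is hypothetical.

```python
import math

VIF_LOG2_TABLE_SIZE = 32768  # entry count stated in check 1
SCALE = 2048                 # multiplier from the formula in check 4


def log_generate():
    """Fill index i with round(log2(32768 + i) * 2048) for i in [0, 32767]."""
    return [round(math.log2(VIF_LOG2_TABLE_SIZE + i) * SCALE)
            for i in range(VIF_LOG2_TABLE_SIZE)]


def lookup(table, x):
    """Look up a normalised value x in [32768, 65535] through the size-1 mask."""
    if not 32768 <= x <= 65535:
        raise ValueError("x must be normalised to [32768, 65535]")
    # The mask drops bit 15, so x = 32768 + i lands on index i.
    return table[x & (VIF_LOG2_TABLE_SIZE - 1)]
```

The mask is what keeps a gather index inside the shrunken table: without it, a normalised input of 32768 or more would index past entry 32767.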

fix/bbb-e2e-v4-bug-cluster-2026-05-18 (ADR-0501)

Files: tools/vmaf-tune/src/vmaftune/{corpus,ladder,cli}.py, tools/vmaf-tune/tests/{test_bbb_e2e_v2_bug_cluster,test_bbb_e2e_v4_bug_cluster}.py, docs/adr/0501-vmaf-tune-bbb-e2e-v4-bug-cluster.md, docs/adr/README.md, docs/usage/vmaf-tune.md, changelog.d/fixed/0501-vmaf-tune-bbb-e2e-v4-bug-cluster.md.

Rebase sensitivity (none): all changes live in tools/vmaf-tune/ (fork-added) and fork-added docs/changelog files — no upstream Netflix file is touched. The corpus / ladder / cli modules are not present in Netflix master. The optional target_width / target_height kwargs added to _decode_source_to_yuv and _maybe_decode_reference default to None so older test fixtures and the ADR-0499 single-resolution code path round-trip unchanged. The samples= kwarg added to emit_manifest / _emit_json is keyword-only with a None default; HLS / DASH emitters silently ignore it. _run_report's stdout JSON grows two new fields (degraded, codec_rows_unavailable) — a strict schema consumer that asserts on absence would need updating, but the existing fields stay populated. ffmpeg-patches are unaffected; no public C API surface changed.
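A consumer that should survive both the old and the new report payload can read the two fields defensively. The sketch below is illustrative only: the function name is hypothetical, the field types are not specified in these notes, and absent fields are simply reported as None.

```python
import json

# The two field names come from the note above; everything else is illustrative.
NEW_FIELDS = ("degraded", "codec_rows_unavailable")


def split_report(stdout_json):
    """Split a report payload into (pre-existing fields, newer fields)."""
    payload = json.loads(stdout_json)
    newer = {key: payload.get(key) for key in NEW_FIELDS}  # None when absent
    legacy = {key: value for key, value in payload.items()
              if key not in NEW_FIELDS}
    return legacy, newer
```

A strict consumer would instead compare the payload keys against a fixed set, which is the case the note warns about.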


perf/adm-decouple-gather-locality-2026-05-18 (ADR-0502)

Files touched: core/src/feature/x86/adm_avx512.c (upstream-mirror), core/src/feature/x86/AGENTS.md, docs/adr/0502-adm-decouple-gather-prefetch.md, docs/adr/README.md, docs/research/0435-adm-decouple-gather-locality.md, changelog.d/performance/adm-decouple-gather-prefetch.md.

Rebase sensitivity (low): the only upstream-shared file touched is adm_avx512.c. The change is a self-contained block (16 lines, guarded by if (j + 32 < right_mod16)) inserted before the three vpgatherdd lines. Conflicts arise only if Netflix upstream modifies adm_decouple_avx512 — resolution: apply the prefetch block to the updated gather cluster in the upstream version. The adm_div_lookup LUT signature is unchanged; the adm_div_lookup[val + 32768] access pattern is identical to scalar. No public C-API surface, no header, no meson build files touched.

ADR-0503: vif_subsample_rd_8_avx512 loop fission (2026-05-18)

File: core/src/feature/x86/vif_avx512.c

If Netflix upstream modifies vif_subsample_rd_8_avx512, the two noinline helpers (vif_subsample_rd_8_vert_j, vif_subsample_rd_8_horiz_j) and their parameter structs (VifVertCoeffs8, VifHorizCoeffs8) must be kept in sync with any upstream changes to the accumulation order or filter constant initialisation. The struct fields map 1:1 to the local variables in the original monolithic body (f0-f4, mask2/3/x for vertical; fcoeff-fcoeff4, addnum, mask1 for horizontal). No public C-API surface, no header, no meson build files touched.


fix/bbb-e2e-v5-bug-cluster-2026-05-18 (ADR-0505)

Files: tools/vmaf-tune/src/vmaftune/{corpus,ladder,cli}.py, tools/vmaf-tune/tests/{test_bbb_e2e_v4_bug_cluster,test_bbb_e2e_v5_bug_cluster}.py, docs/adr/0505-vmaf-tune-bbb-e2e-v5-bug-cluster.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0505-vmaf-tune-bbb-e2e-v5-bug-cluster.md.

Rebase sensitivity (none): all changes live in fork-added tools/vmaf-tune/ plus fork-added docs/changelog files — no upstream Netflix file is touched. The new source_is_container wiring in corpus.iter_rows consumes an existing EncodeRequest field (added in earlier fork work); container-detection logic is a suffix-set membership test against the fork-local _VMAF_RAW_SUFFIXES. The new cloud_sink kwarg on make_default_sampler and _default_sampler defaults to None so every existing caller round-trips unchanged.
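The suffix-set membership test is small enough to sketch. The contents of RAW_SUFFIXES below are an assumption (the real _VMAF_RAW_SUFFIXES table is not reproduced in these notes), as is the case-insensitive comparison and the function name.

```python
from pathlib import Path

# Assumed contents; stands in for the fork-local _VMAF_RAW_SUFFIXES table.
RAW_SUFFIXES = frozenset({".yuv"})


def source_is_container(path):
    """Treat any source whose suffix is not in the raw set as a container."""
    return Path(path).suffix.lower() not in RAW_SUFFIXES
```

Under this assumed table a .y4m source counts as a container, which is consistent with the ADR-0499 entry dropping .y4m from the raw suffixes.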


fix/ladder-duration-clip-ffmpeg-t-flag (ADR-0508)

Files: tools/vmaf-tune/src/vmaftune/encode.py, tools/vmaf-tune/tests/test_bbb_e2e_v8_bug_cluster.py, docs/adr/0508-vmaf-tune-ladder-pass1-stats-duration-clip.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0508-vmaf-tune-ladder-pass1-stats-duration-clip.md.

Rebase sensitivity (none): all changes live in fork-added tools/vmaf-tune/ plus fork-added docs/changelog files — no upstream Netflix file is touched. The fix adds a six-line fallback to build_pass1_stats_command that reads the existing EncodeRequest.duration_s field (introduced by ADR-0506 V6-1) and emits an input-side -t duration_s when the caller did not opt into sample-clip mode. Sample-clip precedence is preserved so existing tests that pin the sample-clip argv shape continue to pass unchanged. No public surface of the libvmaf C API changes; no ffmpeg-patches file consumes tools/vmaf-tune/ Python helpers.
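The precedence rule can be shown with a short sketch. PassOneRequest, its sample_clip_seconds field, and pass1_input_args are hypothetical stand-ins; the real sample-clip argv shape in encode.py is not described here and may differ. Only the placement of -t before -i (an input-side option in ffmpeg) follows the note.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PassOneRequest:
    """Hypothetical stand-in for EncodeRequest; only the fields this sketch needs."""
    source: str
    duration_s: float = 0.0                      # 0.0 means no duration was requested
    sample_clip_seconds: Optional[float] = None  # hypothetical sample-clip opt-in


def pass1_input_args(req: PassOneRequest) -> List[str]:
    """Return the ffmpeg arguments up to and including -i for a pass-1 stats run."""
    args: List[str] = []
    if req.sample_clip_seconds is not None:
        # Sample-clip mode wins, so its argv shape stays as it was.
        args += ["-t", f"{req.sample_clip_seconds:g}"]
    elif req.duration_s > 0:
        # Fallback: bound the input with an input-side -t.
        args += ["-t", f"{req.duration_s:g}"]
    return args + ["-i", req.source]
```

Because -t precedes -i, ffmpeg limits how much of the input is read rather than how much output is written.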

fix/bbb-e2e-v6-bug-cluster-2026-05-18 (ADR-0506)

Files: tools/vmaf-tune/src/vmaftune/{corpus,encode,cli}.py, tools/vmaf-tune/tests/test_bbb_e2e_v6_bug_cluster.py, docs/adr/0506-vmaf-tune-bbb-e2e-v6-bug-cluster.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0506-vmaf-tune-bbb-e2e-v6-bug-cluster.md.

Rebase sensitivity (none): all changes live in fork-added tools/vmaf-tune/ plus fork-added docs/changelog files — no upstream Netflix file is touched. EncodeRequest gains a new duration_s: float = 0.0 field with a back-compatible default so every existing caller round-trips unchanged. _decode_source_to_yuv gains four new kwargs (source_is_raw, source_width, source_height, source_framerate) all defaulting to None/False; the container-source path (which is what every v3/v4/v5 test exercises) takes the legacy branch and emits an identical argv. _maybe_decode_reference and iter_rows wire the new kwargs through. No public surface of the libvmaf C API changes; no ffmpeg-patches file consumes the modified tools/vmaf-tune/ Python helpers.
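The back-compatibility claim for the four new kwargs can be sketched as follows. decode_input_args is a hypothetical helper, not the real _decode_source_to_yuv; it assumes raw sources are described to ffmpeg with the rawvideo demuxer options -video_size and -framerate, and that the container branch emits nothing beyond -i.

```python
from typing import List, Optional


def decode_input_args(
    source: str,
    *,
    source_is_raw: bool = False,
    source_width: Optional[int] = None,
    source_height: Optional[int] = None,
    source_framerate: Optional[float] = None,
) -> List[str]:
    """Return the input-side ffmpeg arguments for a container or raw source."""
    if not source_is_raw:
        # Legacy branch: callers that pass no new kwargs get the same argv as before.
        return ["-i", source]
    if source_width is None or source_height is None or source_framerate is None:
        raise ValueError("raw sources need width, height and framerate")
    return [
        "-f", "rawvideo",
        "-video_size", f"{source_width}x{source_height}",
        "-framerate", f"{source_framerate:g}",
        "-i", source,
    ]
```

Keeping the new parameters keyword-only with inert defaults is what lets every existing positional call site continue to work.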

fix/mcp-backend-probe-allowlist-ladder-score-backend (ADR-0511)

Files: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, mcp-server/vmaf-mcp/tests/test_backend_probe_and_allowlist_0509.py, tools/vmaf-tune/src/vmaftune/{cli,ladder}.py, tools/vmaf-tune/tests/test_ladder_score_backend_0509.py, docs/adr/0511-mcp-backend-probe-allowlist-and-ladder-backend.md, docs/adr/README.md, docs/state.md, docs/mcp/backends.md, docs/usage/vmaf-tune.md, docs/rebase-notes.md, mcp-server/AGENTS.md, tools/vmaf-tune/AGENTS.md, changelog.d/fixed/mcp-and-ladder-backend.md.

Rebase sensitivity (none): every touched file is fork-local — the MCP server (mcp-server/) and the vmaf-tune CLI (tools/vmaf-tune/) are wholly fork-added trees with no upstream counterpart. The libvmaf C surface is untouched (the probe shells out to the existing vmaf --help flag table, no new CLI surface added there), and no ffmpeg-patches/ patch consumes tools/vmaf-tune/ or mcp-server/ Python helpers. make_default_sampler and _default_sampler gain a new score_backend: str | None = None kwarg with a back-compatible default, so every existing caller round-trips unchanged. _run_tune_per_shot is deliberately not touched — the auto → None → libvmaf-picks predicate contract is preserved as documented inline.

fix/windows-mingw64-build-repair (ADR-0515)

Rebase sensitivity (none): test-only change confined to a fork-added file (core/test/test_public_api_score.c, added 2026-05-16 from the C-API coverage audit). The Win32 #ifdef branch mirrors the pre-existing pattern in core/test/dnn/test_model_loader.c::test_sidecar_parses; no public C-API or ffmpeg-patches/ surface touched. No upstream-Netflix counterpart for this test file, so upstream rebases cannot collide with the helper.

fix/tiny-model-loader-external-data-and-feature-rank (ADR-0518)

Files: core/src/dnn/{model_loader.h,model_loader.c,dnn_ctx.h,ort_backend.h,ort_backend.c,AGENTS.md}, core/src/libvmaf.c, core/test/dnn/{meson.build,test_cli.sh,test_model_loader.c}, docs/adr/0518-tiny-model-loader-external-data-and-feature-rank.md, docs/adr/README.md, docs/ai/inference.md, docs/state.md, docs/research/0518-tiny-model-loader-feature-rank.md, docs/rebase-notes.md, changelog.d/fixed/tiny-model-loader.md.

Rebase sensitivity (low — fork-local dnn surface, additive): All touched files are fork-local except core/src/libvmaf.c, where the changes are confined to the tiny-AI bridge (the VmafContext::dnn struct in the file-private definitions block, vmaf_ctx_dnn_free, vmaf_ctx_dnn_attach, and vmaf_ctx_dnn_run_frame — all fork-added per ADR-0040 / ADR-0042). The struct grows by four fields (in_rank, n_features, extra_in_width, extra_in_buf); no existing offset shifts since the new fields are appended inside the dnn substruct, which is private to libvmaf.c. VmafModelSidecar (declared in core/src/dnn/model_loader.h) grows by n_features / feature_names[VMAF_DNN_MAX_FEATURE_NAMES] / feature_mean[] / feature_std[] / has_feature_scaler — this is a header consumed only by the dnn TU set and the DNN tests; consumers outside core/src/dnn/ or core/test/dnn/ should not depend on the struct layout (it's an internal sidecar contract, not a public API). vmaf_ort_input_shape_at() is a new public symbol on ort_backend.h; the existing vmaf_ort_input_shape() remains as the slot == 0 shortcut. No ffmpeg-patches file consumes any of the changed symbols.

fix/hip-import-state-implementation (ADR-0519)

Files: core/src/hip/common.c (delete vmaf_hip_import_state stub body — 9 lines removed), core/src/libvmaf.c (add HAVE_HIP block: header include, hip field on VmafContext, real implementation of vmaf_hip_import_state, cleanup in vmaf_close), core/test/test_hip_smoke.c (replace test_import_state_returns_enosys with test_import_state_validates_arguments + test_import_state_succeeds_with_real_state), docs/adr/0519-hip-import-state-implementation.md, docs/adr/README.md, docs/backends/hip/overview.md, docs/state.md, docs/research/0519-hip-import-state.md, docs/rebase-notes.md, changelog.d/fixed/hip-import-state.md.

Rebase sensitivity (low — fork-local HIP surface, additive): The VmafContext struct in core/src/libvmaf.c grows by one field — a hip substruct holding a single VmafHipState * pointer — gated by #ifdef HAVE_HIP. The field is appended after the existing metal substruct (the last existing GPU substruct), so no offsets shift in CPU-only / CUDA-only / etc. builds. The new vmaf_hip_import_state definition matches the existing public declaration in core/include/libvmaf/libvmaf_hip.h field-for-field; the header is unchanged, so consumers (the fork's core/tools/vmaf.c and any future ffmpeg-patches/ HIP consumer) recompile against the same ABI. core/src/hip/common.c loses the 9-line vmaf_hip_import_state stub; no other in-file functions are touched. On upstream rebase the patch is trivially applicable because Netflix/vmaf master ships no HIP backend; the entire core/src/hip/ tree is fork-local (ADR-0212). No ffmpeg-patches file consumes vmaf_hip_import_state directly today — vf_libvmaf reaches HIP only through the CPU-path fallback for now.

2026-05-18 — --tiny-codec / --tiny-preset / --tiny-crf populate codec block (ADR-0522, PR #TBD)

Fork-local. Adds three CLI flags + one public C-API (vmaf_dnn_set_codec_context) that override the ADR-0518 "unknown" codec pre-seed for codec-aware tiny models (fr_regressor_v2). Touched: core/include/libvmaf/dnn.h (public header — new export), core/src/dnn/dnn_attach_api.c (public symbol), core/src/dnn/dnn_ctx.h (bridge), core/src/libvmaf.c (bridge implementation — vmaf_ctx_dnn_set_codec_context), core/src/dnn/model_loader.{c,h} (sidecar encoder_vocab[] parsing + vmaf_dnn_codec_block_fill helper + VmafModelSidecar grows by n_encoder_vocab / encoder_vocab[VMAF_DNN_MAX_ENCODER_VOCAB] / codec_aware), core/tools/cli_parse.{c,h} (three new flags + three new CLISettings fields: tiny_codec, tiny_preset, tiny_crf), core/tools/vmaf.c (call site after vmaf_use_tiny_model), core/test/dnn/test_model_loader.c (8 new tests).

Rebase sensitivity (low — fork-local additive): All touched files are fork-local. VmafModelSidecar and the VmafContext::dnn substruct grow by additive fields only — no existing offset shifts. vmaf_dnn_set_codec_context() is a new VMAF_EXPORT symbol on the public libvmaf/dnn.h surface; the ffmpeg-patches stack does NOT currently consume tiny-model inference (the vf_libvmaf filter wires through the classic metric collector, not the tiny-AI surface), so no patch update is required for this PR (per CLAUDE.md §12 r14: "Does NOT apply to … kernel implementations behind an existing public surface"). The C-side codec_block_preset_ordinal table is a duplicate of train_fr_regressor_v2.py::PRESET_ORDINAL; the core/src/dnn/AGENTS.md invariant note flags both files as a co-edit pair. The sidecar encoder_vocab array is the single source of truth for the vocabulary; vocab bumps (e.g. ADR-0302 v3) only require a new sidecar JSON, no C recompile.

fix/cli-threads-parse-safety-v2 (ADR-0528)

Files touched: core/test/test_cli_parse_long_only_args.c, core/tools/cli_parse.c.

Rebase sensitivity (low; fork-local additive): Both files are fork-local. The test (test_cli_parse_long_only_args.c) has no upstream twin; it was added in PR #408 / ADR-0316 to lock down the long-only short-option synthesis bug. cli_parse.c is upstream-adjacent (Netflix maintains its own error() with the same assert(long_opts[n].name) shape and the same sprintf(optname, …) calls). On a future upstream sync, expect a merge conflict on error() if Netflix changes the assert / sprintf lines independently: keep the fork's if (!found) usage(…); return; + snprintf shape and drop the <assert.h> include. No public API surface changes; the ffmpeg-patches stack is untouched.

2026-05-18 — HIP integer_motion flag promotion + HIP_DEVICE buffer enum (ADR-0530, PR #TBD)

Extends ADR-0519. Promotes VMAF_FEATURE_EXTRACTOR_HIP on vmaf_fex_integer_motion_hip so the model-driven dispatch picks the HIP kernel instead of the CPU twin when a HIP state is imported. Adds VMAF_PICTURE_BUFFER_TYPE_HIP_DEVICE to the picture-buffer enum (reserved for the future HIP picture pool; HIP TUs still accept HOST and do their own HtoD copy). Wires compute_fex_flags() for HIP, adds a CPU-twin fallback in vmaf_get_feature_extractor_by_feature_name(), drains HIP-flagged extractors' gpu_pending final-frame collect in flush_context_serial(), and routes the HIP integer_motion collect/flush writes through feature_name_dict so the encoded option-aware key matches the predict-side lookup.

Touched: core/src/picture.h (new enum entry), core/src/feature/feature_extractor.c (dispatch buffer-type check + _by_feature_name fallback + new extern + registry row), core/src/libvmaf.c (compute_fex_flags HIP slot + flush_context_serial HIP drain), core/src/feature/hip/integer_motion_hip.c (flag bit set + dict-aware writes), core/src/feature/hip/integer_vif_hip.c (flag bit cleared with citation — un-promotes pending kernel-level fix), core/src/hip/meson.build (compile integer_motion_hip.c), core/src/meson.build (motion_score.hip HSACO), core/src/hip/AGENTS.md (invariant rewrite), core/test/test_hip_smoke.c (registration + flag-dispatch tests), docs/backends/hip/overview.md, docs/rebase-notes.md, changelog.d/added/0530-hip-integer-motion-flag-promotion.md, docs/state.md.

Rebase sensitivity (medium — touches upstream-mirror feature_extractor.c dispatch site):

The dispatch-time HIP buffer-type check is a NEW symmetric block right after the existing CUDA buffer-type check. Any upstream port that touches the CUDA block needs a paired update to the HIP block to keep them symmetric. The CPU-twin fallback pass in _by_feature_name is a documented contract going forward (ADR-0530) — future GPU backend work cannot assume "flag set ⇒ full coverage"; treat the fallback as the established behaviour, not as a bug to fix.

The compute_fex_flags() HIP slot mirrors the existing Vulkan / SYCL slots field-for-field; the flush_context_serial() HIP drain mirrors the SYCL flush_context_sycl drain. Any upstream refactor that relocates either function needs to move all three GPU slots / drains together.

vmaf_fex_integer_vif_hip had VMAF_FEATURE_EXTRACTOR_HIP set speculatively in its batch-1 commit; this PR clears it with an inline citation. Do NOT re-enable on a future rebase without a kernel-level GPU-memory-access-fault fix and an ADR-0530-style per-extractor reproducer.

No public-header change → no ffmpeg-patches/ update required (per CLAUDE.md §12 r14: the new picture-buffer-type enum lives in the libvmaf-private src/picture.h, not the public include/libvmaf/picture.h; the ffmpeg vf_libvmaf filter hands HOST buffers to libvmaf and is unaffected).

feat/hip-register-all-extractors (ADR-0533)

Files touched: core/src/hip/meson.build, core/src/feature/feature_extractor.c, core/test/test_hip_smoke.c, core/src/hip/AGENTS.md, docs/backends/hip/overview.md, docs/state.md, docs/adr/0533-hip-all-extractors-registration-sweep.md, docs/adr/README.md, changelog.d/fixed/hip-register-all-extractors.md.

Rebase sensitivity (low — fork-local only): All edits sit in fork-additive HIP plumbing. Upstream Netflix/vmaf ships no HIP backend, so the #if HAVE_HIP blocks in feature_extractor.c are entirely fork-local; the extern + registry entries land inside the same #if HAVE_HIP regions ADR-0523 already extended. core/src/hip/meson.build is a fork-added file (the subdir('hip') invocation is gated on enable_hip). No public C-API surface changes — vmaf_get_feature_extractor_by_name already existed; the sweep only adds rows to the table it reads. The ffmpeg-patches stack is untouched (no new LIBVMAFContext field, no new CLI flag, no new meson_options.txt entry). On a future upstream sync, expect zero conflicts — Netflix never touches HIP files. If a future PR adds a new HIP feature TU, the invariant pinned in core/src/hip/AGENTS.md (every vmaf_fex_*_hip symbol must appear in hip_sources + the extern/registry blocks) must be honoured or the registration drops out silently.

ADR-0538 — per-shot predicate bitrate sidecar (PR #1290 follow-up)

No rebase impact: the change is entirely internal to tools/vmaf-tune/src/vmaftune/cli.py (_build_per_shot_bisect_predicate return type change + call-site unpack + dataclasses.replace patch loop) and the corresponding test file. No public API surface, no C code, no meson_options.txt entry, no ffmpeg-patches entry, no new public Python symbol. Netflix upstream never touches vmaf-tune, so a future upstream sync should produce zero conflicts.

ADR-0539 — HIP integer_moment HSACO blob registration

No rebase impact: change is one new row in hip_kernel_sources inside core/src/meson.build, gated by the fork-only enable_hip flag. Netflix upstream has no HIP backend and never touches hip_kernel_sources, the four feature/hip/integer_moment/* paths, or the hip_hsaco_stubs.c TU. The ffmpeg-patches stack is untouched (no new LIBVMAFContext field, no new CLI flag, no new meson_options.txt entry). On a future upstream sync, expect zero conflicts.

If a future PR adds yet another HIP host TU that consumes a <name>_hsaco symbol distinct from any existing meson key, the invariant pinned in core/src/feature/hip/AGENTS.md (HSACO symbol naming) must be honoured to avoid the same class of link error this ADR closed.

feat/dev-container-ffmpeg-av1-hwaccel (ADR-0543)

Files touched: dev/Containerfile (stage 3.5 apt list + SVT-AV1 / libaom / vvenc / AMF source builds + FFmpeg configure flags + build-time encoder probe), dev/AGENTS.md (four new "FFmpeg encoder exposure invariants"), docs/development/dev-mcp.md (encoder matrix + runtime failure modes + full-sweep reproducer), core/src/meson.build (one-line follow-up to ADR-0523: add 'motion_score' to the hip_kernel_sources dict; surfaced as a stage-3 link blocker during ADR-0543's container rebuild verification), ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch (one-line addition: #include <stdbool.h> to libavcodec/libsvtav1.c so enable_roi_map = true compiles; surfaced when the in-image FFmpeg was first built with --enable-libsvtav1; the patched code was never previously exercised because no prior dev-image enabled libsvtav1), docs/adr/0529-…md (+ index fragment), changelog.d/added/0529-…md, docs/state.md (two state rows), this file.

Rebase sensitivity (none — container-only fork-local additive plus one fork-local libvmaf wiring fix): Every touched file lives under dev/, docs/, changelog.d/, or core/src/meson.build. The libvmaf hunk adds one dict entry referencing a fork-local .hip source (ADR-0523 lineage); no upstream conflict possible because Netflix has no HIP backend. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry. The CLAUDE.md §12 r14 patch-stack rule does not apply — the FFmpeg configure-line change happens in the in-image build only and is orthogonal to the host-side patch series under ffmpeg-patches/. Pin bumps (VVenC v1.12.0, AMF v1.4.36, FFmpeg n8.1.1 via FFMPEG_TAG build-arg) are visible in the ARG lines of dev/Containerfile; bumping them is a local container change.


ADR-0543 — ADR-0498 enforcement hardening (exit code 100 + JSON error + per-feature gate)

Summary: Hardens the explicit-backend gate that ADR-0498 introduced in core/tools/vmaf.c. Adds three orthogonal contracts: dedicated exit code 100 (VMAF_EXIT_BACKEND_INIT_FAILED) for --backend NAME init failures, structured JSON error descriptor at the --output path when format is JSON, and a per-feature gate that hard-fails GPU-pinned feature names (*_cuda / *_sycl / *_vulkan / *_hip / *_metal) when the matching backend isn't active.
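The per-feature gate can be modelled in a few lines of Python. This is a sketch of the idea only: the C helpers feature_backend_suffix and backend_active live in core/tools/vmaf.c, and the function signatures, the blocked_features helper, and the sample feature names below are hypothetical.

```python
# Suffix list taken from the summary above.
GPU_SUFFIXES = ("cuda", "sycl", "vulkan", "hip", "metal")


def feature_backend_suffix(feature_name):
    """Return the backend a feature name is pinned to, or None for unpinned names."""
    for backend in GPU_SUFFIXES:
        if feature_name.endswith("_" + backend):
            return backend
    return None


def blocked_features(feature_names, active_backends):
    """List the GPU-pinned feature names whose backend is not active."""
    blocked = []
    for name in feature_names:
        backend = feature_backend_suffix(name)
        if backend is not None and backend not in active_backends:
            blocked.append(name)
    return blocked
```

A non-empty result corresponds to the hard-fail case; unpinned names never trip the gate.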

Files touched: core/tools/vmaf.c (new constants, new helpers write_backend_error_json / feature_backend_suffix / backend_active, new bool *cuda_active_out parameter on init_gpu_backends, per-feature gate in the feature-loading loop, simplified backend_used echo), docs/adr/0543-adr-0498-enforcement-hardening.md (+ index row in docs/adr/README.md), changelog.d/fixed/0543-adr-0498-enforcement-hardening.md, tools/vmaf-tune/tests/test_adr_0543_backend_enforcement.py (13 integration + source-level tests), docs/state.md (Recently closed row), this file.

Rebase sensitivity (none — fork-local additive against an already-fork-local helper): The only C source touched is core/tools/vmaf.c, and only inside the init_gpu_backends helper + its caller — both of which are fork-local additions that do not exist in Netflix/vmaf upstream (Netflix has no SYCL / HIP / Vulkan / Metal backends and no --backend selector). The new bool *cuda_active_out parameter on init_gpu_backends is guarded by #ifdef HAVE_CUDA and only affects the in-tree caller. No upstream conflict possible.

ffmpeg-patches/ impact (none): No public libvmaf C-API entry points added, renamed, or removed. No meson_options.txt flag added. No LIBVMAFContext field added. No vf_libvmaf.c filter variant added. The new exit code is a CLI-level contract observed by wrappers (vmaf-tune, MCP) — FFmpeg's libvmaf filter consumes libvmaf via the C API and is not impacted. CLAUDE.md §12 r14 does not apply.
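On the wrapper side, the exit-code contract might be observed roughly as below. The constant's value comes from the summary above; BackendInitError and run_checked are hypothetical names, and this is not the vmaf-tune or MCP implementation.

```python
import subprocess

VMAF_EXIT_BACKEND_INIT_FAILED = 100  # value stated in the ADR-0543 summary


class BackendInitError(RuntimeError):
    """Raised when the CLI reports that an explicitly requested backend failed to init."""


def run_checked(argv):
    """Run a command, mapping exit code 100 to BackendInitError."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == VMAF_EXIT_BACKEND_INIT_FAILED:
        raise BackendInitError(proc.stderr.strip() or "backend init failed")
    proc.check_returncode()  # any other non-zero code raises CalledProcessError
    return proc
```

Separating code 100 from other failures lets a wrapper decide whether to retry on another backend or surface the error unchanged.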

fix/feature-extractor-list-dedup (ADR-0544)

Removes 61 duplicate &vmaf_fex_* entries from core/src/feature/feature_extractor.c's static feature_extractor_list[] (55 Vulkan + 6 SYCL) and adds vmaf_feature_extractor_list_audit(), called from vmaf_init(), that returns -EINVAL if any extractor name or pointer is seen twice.
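The audit's contract (reject a list in which any name or pointer repeats) can be modelled in Python as a sketch. The real check is C code in feature_extractor.c; here object identity stands in for pointer equality, and list_audit is a hypothetical name.

```python
import errno


def list_audit(extractors):
    """Return 0 if every (name, obj) pair is unique by name and by identity,
    else -EINVAL, mirroring the contract described above."""
    seen_names = set()
    seen_ids = set()
    for name, obj in extractors:
        if name in seen_names or id(obj) in seen_ids:
            return -errno.EINVAL
        seen_names.add(name)
        seen_ids.add(id(obj))
    return 0
```

Checking both keys matters: the same extractor registered twice under one name and one extractor aliased under two names are different mistakes, and both should fail.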

Rebase sensitivity (low — fork-local hunks only): The duplicated rows lived in fork-local #if HAVE_VULKAN / #if HAVE_SYCL blocks — both backends are absent upstream. The deduped arrangement keeps the same row ordering Netflix would expect for the CPU + CUDA paths (untouched) so the inevitable next sync-upstream sees no diff there. The new public header line in core/src/feature/feature_extractor.h (the vmaf_feature_extractor_list_audit() declaration) is appended after the existing fork-local symbols and before the VmafFeatureExtractorContextFlags block, isolating it from upstream hunks. The vmaf_init() call site lives in a fork-local block (right after vmaf_set_log_level) that already differs from upstream because of the HIP/Vulkan plumbing — a conflict is only possible if Netflix adds a new init-time call there, in which case the resolution is trivial (preserve both calls; the audit is order-independent w.r.t. other init steps).

Touched files: core/src/feature/feature_extractor.{c,h}, core/src/libvmaf.c, core/test/test_feature_extractor.c, docs/adr/0541-*.md, docs/adr/_index_fragments/0541-*.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md (regenerated), docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0541-*.md. No ffmpeg-patches/, meson_options.txt, or meson.build change (test is exercised by an existing test_feature_extractor target).

chore/wire-or-delete-dead-extractor-files (ADR-0545)

Deletes 18 dead Vulkan / Metal feature-extractor source files plus 14 paired orphan shaders (.comp / .metal) from core/src/feature/{vulkan,metal}/ that were never wired into their backend's meson.build. Wires one previously-unwired Metal TU (float_ms_ssim_metal.mm + float_ms_ssim.metal, ADR-0490) into core/src/metal/meson.build. Removes one dead extern in core/src/feature/feature_extractor.c (vmaf_fex_integer_adm_metal) and refreshes the core/src/feature/{vulkan,metal}/AGENTS.md rebase-sensitive invariants section to forbid re-introducing the deleted scaffolds.

Rebase sensitivity (none — pure fork-local housekeeping): All deleted files were fork-local scaffolds added in commit 302bd1673 (2026-05-18, "docs(rules): default to vmaf-dev-mcp container"). Netflix upstream has no Vulkan or Metal backend, so no upstream conflict is possible on the deletes. The lone wired file (float_ms_ssim_metal.mm) is fork-original, references no upstream identifier, and lives under fork-only core/src/metal/. The retained adm_vulkan.c legacy shim is out of scope per ADR-0468. No CPU-path C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry, no Python-binding change — CLAUDE.md §12 r14 (FFmpeg patch-stack sync) does not apply.

feat/vmaf-tune-full-file-and-no-bisect (ADR-0548)

Files touched: tools/vmaf-tune/src/vmaftune/cli.py (auto-probe block in _run_tune_per_shot; new _run_compare_crf_sweep function; --no-bisect / --crf-sweep argparse flags in the compare subparser; --width / --height / --framerate made optional in the tune-per-shot subparser), tools/vmaf-tune/tests/test_tune_per_shot_container_src.py (new — Fix A smoke tests), tools/vmaf-tune/tests/test_compare_no_bisect.py (new — Fix B smoke tests), tools/vmaf-tune/AGENTS.md (two new invariant notes), docs/adr/0548-vmaf-tune-full-file-and-no-bisect.md (+ index row in docs/adr/README.md), changelog.d/added/0548-…md, docs/usage/vmaf-tune.md (Fix A and Fix B documentation), this file.

Rebase sensitivity (none — vmaf-tune Python only): All touched files live under tools/vmaf-tune/, docs/, or changelog.d/. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry is touched. The cli.py changes are purely additive: new function _run_compare_crf_sweep, new optional args in existing subparsers, and an early-return dispatch at the top of _run_compare. No existing API surface is renamed or removed. The probe block at the top of _run_tune_per_shot only executes when args.width is None or args.height is None or args.framerate is None — callers that pass explicit geometry are unaffected. No upstream Netflix/vmaf path is touched.
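The condition that triggers the probe can be sketched with argparse. The parser below is a hypothetical reduction of the tune-per-shot subparser (the argument types are assumed); only the three optional flags and the None-based predicate follow the note.

```python
import argparse


def build_parser():
    """Hypothetical reduction of the tune-per-shot subparser to its geometry flags."""
    parser = argparse.ArgumentParser(prog="tune-per-shot-sketch")
    parser.add_argument("--width", type=int, default=None)
    parser.add_argument("--height", type=int, default=None)
    parser.add_argument("--framerate", type=float, default=None)
    return parser


def needs_probe(args):
    """Probe only when at least one geometry value was left unset."""
    return args.width is None or args.height is None or args.framerate is None
```

Callers that pass all three values never reach the probe, which is why explicit-geometry invocations are unaffected.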

ADR-0539 — HIP hip_cu_extra_flags dispatch + ssimulacra2_blur -ffp-contract=off

No rebase impact: the change is entirely additive in core/src/meson.build inside the if get_option('enable_hipcc') block — a new hip_cu_extra_flags dict and one extra per_kernel_flags list interpolated into the existing hipcc custom_target command. The fall-through (.get(name, [])) keeps the command line byte-identical for every kernel not listed. Netflix upstream ships no HIP backend, so the entire enable_hipcc block is fork-local; upstream syncs will not conflict. The dict mechanism extends naturally — when porting a future CUDA kernel that lists flags in cuda_cu_extra_flags, mirror the entry in hip_cu_extra_flags per core/src/feature/hip/AGENTS.md. No public API surface, no meson_options.txt entry, no ffmpeg-patches entry. On a future upstream sync, expect zero conflicts.
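
A Python analogue of the `.get(name, [])` fall-through may make the byte-identical claim easier to check. The dict name and the ssimulacra2_blur flag come from this entry; the base command and the second kernel name are made up for illustration and do not reproduce the real meson custom_target.

```python
# Python analogue of the meson dict lookup (assumption: the real build does
# the equivalent inside core/src/meson.build).
hip_cu_extra_flags = {
    "ssimulacra2_blur": ["-ffp-contract=off"],
}

# Illustrative base command, not the real hipcc command line.
BASE_CMD = ["hipcc", "-O3"]


def kernel_command(name: str) -> list:
    # Kernels without an entry fall through to an empty list, so their
    # command line is the same as before the dict existed.
    per_kernel_flags = hip_cu_extra_flags.get(name, [])
    return BASE_CMD + per_kernel_flags + [name + ".hip"]


print(kernel_command("ssimulacra2_blur"))
print(kernel_command("some_other_kernel"))
```

Only kernels listed in the dict see a changed command; every other kernel gets `BASE_CMD` plus its source file, unchanged.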

ADR-0539 — integer ADM HIP kernels (real impl, removes ADR-0536 weak stubs)

No rebase impact: every touched file is fork-local — the four .hip kernel sources under core/src/feature/hip/integer_adm/ are fork-additive (Netflix ships no HIP backend), the hip_kernel_sources meson dict additions live inside the if is_hip_enabled and is_hipcc_enabled block (also fork-local), and the hip_hsaco_stubs.c weak-fallback file is wholly fork-added under ADR-0536. No public C-API surface changes — kernel symbol names match the GET_FN calls in integer_adm_hip.c exactly, host TU is untouched. No meson_options.txt flag added or renamed (re-uses enable_hip + enable_hipcc). No ffmpeg-patches entry needs an update (no new LIBVMAFContext field, no new CLI flag). On a future upstream sync, expect zero conflicts. If a future PR re-introduces a CUDA-only helper into one of the four kernels (re-breaking the standalone build), do NOT re-add a weak HSACO stub — fix the kernel (invariant pinned in core/src/feature/hip/AGENTS.md).

ADR-0546 — Codec-adapter two_pass_args real implementations

Rebase impact: none. The change is entirely fork-local — all modified files live under tools/vmaf-tune/src/vmaftune/codec_adapters/ (fork-added Phase A/F vmaf-tune package) and docs/. No upstream Netflix/vmaf file is touched, no ffmpeg-patches file is touched, no public core/include/ header is touched, no meson_options.txt key is added.

Touched files: tools/vmaf-tune/src/vmaftune/codec_adapters/{svtav1,libaom,vvenc,_nvenc_common,_qsv_common,_amf_common,_videotoolbox_common,h264_videotoolbox,hevc_videotoolbox,av1_videotoolbox,prores_videotoolbox}.py, tools/vmaf-tune/tests/test_codec_adapter_two_pass_real.py, docs/adr/0546-codec-adapter-two-pass-real.md, docs/adr/README.md (one index row), docs/research/0546-codec-adapter-two-pass-real.md, docs/usage/vmaf-tune.md (codec support matrix refresh), docs/state.md (Recently-closed row), changelog.d/added/0546-codec-adapter-two-pass-real.md, docs/rebase-notes.md (this entry).

chore/ai-tooling-env-overrides-split (ADR-0547)

Rebase sensitivity (low — fork-local only):

  • ai/scripts/*.py: every file is fork-local. The edits add an os.environ.get(...) wrap around the existing default-path literal. No upstream conflict possible.
  • .gitignore: appends *.bak / *.orig. Trivial to resolve if upstream ever touches the same lines (unlikely — these are universal editor-backup patterns).
  • docs/ai/scripts-env-vars.md (new file), mkdocs.yml nav entry: fork-local docs tree. No upstream conflict possible.
  • tools/vmaf-tune/src/vmaftune/cli.py.bak: untracked file deletion; no git history impact.
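
The `os.environ.get(...)` wrap mentioned for ai/scripts/*.py follows a common pattern, sketched below. The variable name `EXAMPLE_CORPUS_DIR` and the default path are hypothetical; the real names are documented in docs/ai/scripts-env-vars.md.

```python
import os
from pathlib import Path

# Hypothetical default path, standing in for a script's existing literal.
DEFAULT_CORPUS_DIR = "/data/corpus"


def corpus_dir() -> Path:
    # The pre-existing literal stays as the fallback; the environment
    # variable (hypothetical name) takes precedence when it is set.
    return Path(os.environ.get("EXAMPLE_CORPUS_DIR", DEFAULT_CORPUS_DIR))


print(corpus_dir())
```

Because the literal remains the fallback, behaviour is unchanged for anyone who does not set the variable.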

Touched files: .gitignore, fifteen ai/scripts/*.py files, docs/ai/scripts-env-vars.md (new), mkdocs.yml, docs/adr/0547-ai-script-env-vars.md (new), docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/changed/0547-ai-script-env-vars.md (new). No ffmpeg-patches/ change (no C-API, CLI flag, or meson_options.txt consumed by a patch — CLAUDE.md §12 r14 exempt).

chore/audit-cleanup-bundle-2 (ADR-0549)

No rebase impact. All changes are confined to fork-local files:

  • core/src/feature/cuda/integer_{ssim,ms_ssim,psnr,moment}_cuda.c (comment addition only — no functional change; upstream parity intact).
  • core/src/feature/sycl/integer_{ssim,ms_ssim,psnr,moment}_sycl.cpp (comment addition only).
  • dev/Containerfile (fork-local; whole file is fork-added).
  • .gitignore (adds .claude/worktrees/ line; no upstream conflict).
  • docs/state.md (fork-only doc tree).
  • python/test/vmafexec_test.py (comment deletion only; assertion value and places argument are unchanged — no golden-gate impact).
  • docs/adr/0549-audit-cleanup-bundle-2.md, docs/adr/README.md, changelog.d/changed/0549-audit-cleanup-bundle-2.md, docs/rebase-notes.md (this entry).

Touched files: core/src/feature/cuda/integer_{ssim,ms_ssim,psnr,moment}_cuda.c, core/src/feature/sycl/integer_{ssim,ms_ssim,psnr,moment}_sycl.cpp, dev/Containerfile, .gitignore, docs/state.md, python/test/vmafexec_test.py, docs/adr/0549-audit-cleanup-bundle-2.md, docs/adr/README.md (one index row), changelog.d/changed/0549-audit-cleanup-bundle-2.md, docs/rebase-notes.md (this entry).

ADR-0546 — audit bundle (Vulkan-01 / saliency-tune-01 / ai-01)

No rebase impact for Vulkan-01: adding &vmaf_fex_integer_motion_vulkan_impl to feature_extractor_list[] in feature_extractor.c is a purely additive change under the existing #if HAVE_VULKAN guard. Netflix has no Vulkan backend, so upstream syncs produce zero conflicts.

No rebase impact for saliency-tune-01: all touched files (tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py) are fork-local. No public C-API, no meson_options.txt entry, no ffmpeg-patches entry.

No rebase impact for ai-01: tools/vmaf-tune/src/vmaftune/predictor_train.py is fork-local. The --emit-stub-card-only flag is additive.

ADR-0550 -- Cross-backend parity matrix 2026-05-18

Touches: docs/adr/0550-cross-backend-parity-matrix-2026-05-18.md, docs/research/0550-cross-backend-parity-matrix-2026-05-18.md, docs/adr/README.md (index row), docs/state.md (audit-closed row), changelog.d/added/0550-cross-backend-parity-matrix.md, this file.

Rebase sensitivity (none -- docs-only, no C source): All touched files are under docs/ and changelog.d/. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry. No numerical-correctness risk: this is a read-only audit that produces only documentation artefacts.

ADR-0561 — HIP gfx_targets fallback widening

Branch: fix/hip-gfx-targets-fallback-widening

Rebase impact: low — touches only core/src/meson.build, core/meson_options.txt, and docs/backends/hip/overview.md. No kernel code, no public API change.

Rebase-sensitive invariant: The fallback string 'gfx90a,gfx1030,gfx1036,gfx1100' at the end of the four-step probe chain in core/src/meson.build must not regress to 'gfx90a' alone. If a meson.build rebase conflict arises in that region, prefer the wider fallback. The comment block above the fallback explains the rationale.

Touched files: core/src/meson.build (fallback string + comment), core/meson_options.txt (description update), docs/backends/hip/overview.md (-Dhip_gfx_targets section), docs/adr/0561-hip-gfx-targets-fallback-widening.md, docs/adr/README.md (one index row), docs/research/0561-hip-gfx-targets-fallback-widening.md, docs/state.md (Recently-closed row T-HIP-GFX-TARGETS-FALLBACK-2026-05-18), changelog.d/fixed/0561-hip-gfx-targets-fallback-widening.md, docs/rebase-notes.md (this entry).

ADR-0562 — VCQ-223 local-explainer hang fix

Rebase impact: none. The change is entirely fork-local — all modified files live under python/ (Python wrapper test harness) and docs/. No upstream Netflix/vmaf C file is touched, no ffmpeg-patches file is touched, no public core/include/ header is touched, no meson_options.txt key is added.

Touched files: python/vmaf/core/quality_runner_extra.py, python/test/local_explainer_test.py, docs/adr/0562-local-explainer-hang-fix.md, docs/adr/README.md (one index row), docs/state.md (Recently-closed row), changelog.d/fixed/vcq-223-local-explainer-hang.md, docs/rebase-notes.md (this entry).

ADR-0559 — Feature coverage audit: speed_chroma + speed_temporal in extraction scripts

Rebase impact: minimal. Changes are entirely fork-local: the modified files are ai/data/feature_extractor.py, ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/extract_full_features.py (docstring only), and files under docs/. No upstream Netflix/vmaf file is touched, no ffmpeg-patches file is touched, no public core/include/ header is touched, no meson_options.txt key is added.

If a future upstream sync adds speed_chroma or speed_temporal to the upstream FULL_FEATURES equivalent, this fork's tuple will have them already; check for duplicates on merge.

Touched files: ai/data/feature_extractor.py (FULL_FEATURES + _METRIC_TO_EXTRACTOR), ai/scripts/bvi_dvc_to_full_features.py (local FULL_FEATURES + EXTRACTORS), ai/scripts/extract_full_features.py (docstring only), docs/ai/models/konvid_mos_head_v1.md (coverage-gap note), docs/adr/0559-feature-coverage-audit.md, docs/adr/README.md (one index row), docs/research/feature-coverage-audit-2026-05-18.md, changelog.d/added/0559-feature-coverage-audit-speed-features.md, docs/rebase-notes.md (this entry).

ADR-0566 — HIP VIF per-feature places=4 gate (supersedes ADR-0537 follow-up)

Branch: fix/hip-vif-svm-amplification-places4-gate

Rebase impact: documentation-only — touches only docs/adr/, docs/state.md, docs/rebase-notes.md, and changelog.d/. No kernel code, no meson.build change, no public API change.

Rebase-sensitive invariant: If ADR-0537 is amended in a future PR, ensure the "places=3 is acceptable" follow-up clause is not reintroduced. The supersession is recorded in ADR-0566 and in the Recently-closed row T-HIP-VIF-PLACES3-GATE-INCORRECT-2026-05-18.

Touched files: docs/adr/0566-hip-vif-per-feature-places4-gate.md, docs/adr/README.md (one index row), docs/state.md (Recently-closed row T-HIP-VIF-PLACES3-GATE-INCORRECT-2026-05-18), changelog.d/fixed/0566-hip-vif-per-feature-places4-gate.md, docs/rebase-notes.md (this entry).

ADR-0552 — HIP VIF deterministic wavefront reduction

Branch: fix/hip-vif-deterministic-reduce

Rebase impact: low — only touches core/src/feature/hip/integer_vif/vif_statistics.hip and documentation files. No public API change. No meson.build change.

Rebase-sensitive invariant: The wavefront_reduce_i64 helper uses __shfl_xor with strides 32, 16, 8, 4, 2, 1 (for AMD 64-lane wavefronts). Do NOT merge with a CUDA-style __shfl_down_sync port that uses strides 16, 8, 4, 2, 1 (32-lane) — the stride list is wrong for AMD and will under-reduce, leaving 32-thread partial sums in the accumulator.
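
The effect of the shorter stride list can be checked with a small host-side simulation. This sketch models only the XOR butterfly exchange pattern in plain Python; it is not the HIP kernel, and it uses XOR partners for both stride lists to isolate the effect of dropping stride 32.

```python
def xor_butterfly_reduce(values, strides):
    """Simulate a __shfl_xor-style butterfly sum across one wavefront."""
    lanes = list(values)
    for stride in strides:
        # Every lane adds the value held by its XOR partner at this stride.
        lanes = [lanes[i] + lanes[i ^ stride] for i in range(len(lanes))]
    return lanes


WAVEFRONT = 64
per_lane = list(range(WAVEFRONT))  # lane i contributes the value i
full_sum = sum(per_lane)

amd = xor_butterfly_reduce(per_lane, [32, 16, 8, 4, 2, 1])
cuda_style = xor_butterfly_reduce(per_lane, [16, 8, 4, 2, 1])

print(amd[0], full_sum)   # 2016 2016: lane 0 holds the full wavefront sum
print(cuda_style[0])      # 496: only lanes 0..31 were summed
print(cuda_style[32])     # 1520: lanes 32..63 form a second partial sum
```

With the 32-lane stride list, lane 0 ends up with the sum of only half the wavefront, which is the under-reduction the invariant warns about.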

Conflict scenario: If a rebase brings in a change to vif_statistics.hip from the CUDA parity sweep or a vif_hori_16_body template refactor, verify that:

  1. The outer if (x < w && y < h) guard is preserved (not replaced by early return).
  2. wavefront_reduce_accums(thr) is called before the atomicAdd block.
  3. The atomicAdd block is inside if ((threadIdx.x % AMD_WAVEFRONT_SIZE) == 0).

Touched files: core/src/feature/hip/integer_vif/vif_statistics.hip, docs/adr/0552-hip-integer-vif-deterministic-reduce.md, docs/adr/README.md (one index row), docs/research/0552-hip-vif-deterministic-reduce.md, docs/state.md (Recently-closed row T-HIP-VIF-PARITY-PLACES4-2026-05-18), changelog.d/fixed/0552-hip-vif-deterministic-reduce.md, docs/rebase-notes.md (this entry).

fix/python-mcp-ai-audit-p0-p1-2026-05-18 (ADR-0556)

Python / MCP / AI silent-fallback audit. No upstream-shared paths modified; all fixes are in fork-local Python harness files (tools/vmaf-tune/, mcp-server/, ai/scripts/) or documentation. No rebase-sensitive invariants introduced — the score.py JSONDecodeError guard is a pure additive safety wrapper around an existing json.load call, and the bvi_dvc_to_full_features.py empty-entries guards are early-exits before any loop body runs.

Files touched: tools/vmaf-tune/src/vmaftune/score.py, tools/vmaf-tune/src/vmaftune/auto.py, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/validate_model_registry.py, docs/adr/0556-python-mcp-ai-audit-2026-05-18.md, docs/adr/README.md (one index row), docs/research/python-mcp-ai-audit-2026-05-18.md, changelog.d/fixed/0556-python-mcp-ai-audit.md, docs/state.md (5 T-rows added to Open section), docs/rebase-notes.md (this entry).
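
A guard of the kind described for score.py might look like the sketch below. The helper name `load_scores` and the choice to re-raise as `RuntimeError` are assumptions made for illustration; the entry only states that the guard wraps an existing json.load call.

```python
import json
import tempfile
from pathlib import Path


def load_scores(path: Path) -> dict:
    # Hypothetical helper: wrap the json.load call and surface a clear
    # error instead of an unhandled JSONDecodeError.
    try:
        with path.open("r", encoding="utf-8") as fh:
            return json.load(fh)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"malformed JSON in {path}: {exc}") from exc


with tempfile.TemporaryDirectory() as tmp:
    bad = Path(tmp) / "truncated.json"
    bad.write_text('{"vmaf": ', encoding="utf-8")
    try:
        load_scores(bad)
    except RuntimeError as err:
        print("guard triggered:", type(err.__cause__).__name__)
```

The wrapper leaves the success path untouched, which is what makes it additive.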

ADR-0567 — upstream port USE_DIRECT_READ zero-copy input path (Netflix/vmaf@30a6e2a8d)

Branch: chore/upstream-port-direct-read-and-speed-wrappers

Rebase impact: low — all touched files are upstream-shared paths that the upstream commit also modifies. On the next sync-upstream, these changes should merge cleanly because the fork's version is a strict superset of the upstream diff (same logic, plus fork style conventions: (void)fprintf, explicit (int) casts on fread returns, memcmp(…) != 0).

Rebase-sensitive invariant: The new fetch_into_vmaf_picture vtable field is the last member of video_input_vtbl. Any future vtable extension must append after it or update both the vtable struct and all initialisers (YUV_INPUT_VTBL, Y4M_INPUT_VTBL) simultaneously. The upstream port order is: open_raw, open, get_info, fetch_frame, close, fetch_into_vmaf_picture.

Touched files: core/tools/vidinput.h, core/tools/vidinput.c, core/tools/yuv_input.c, core/tools/y4m_input.c, core/tools/vmaf.c, docs/adr/0567-upstream-port-direct-read.md, docs/adr/README.md (one index row), changelog.d/perf/0567-upstream-port-direct-read.md, docs/rebase-notes.md (this entry).

ADR-0568 — sycl_icpx_aot_targets default

Rebase impact: low. Adds a new sycl_icpx_aot_targets string option to core/meson_options.txt and wires the corresponding AOT flags in core/src/meson.build. Any upstream Netflix/vmaf change that also touches those two files will produce a trivial two-hunk conflict. Resolution: preserve both the upstream hunk and the fork-added sycl_icpx_aot_targets option + toolchain-flag block. No public API header is touched; no ffmpeg-patches file is touched.

Touched files: core/meson_options.txt (new option), core/src/meson.build (AOT flag wiring, icpx branch), docs/backends/sycl/overview.md (new AOT section), dev/Containerfile (doc comment near meson invocation), docs/adr/0568-sycl-icpx-aot-targets-default.md (new ADR), docs/adr/README.md (one index row), changelog.d/added/0568-sycl-icpx-aot-targets-default.md, docs/state.md (no-bug note), docs/rebase-notes.md (this entry).

chore/sdk-version-bumps-may-2026 (ADR-0569)

No rebase-sensitive invariants. All changes are version-string edits in dev/Containerfile ARG lines, .pre-commit-config.yaml rev fields, .github/workflows/supply-chain.yml action SHA pins, and python/requirements.txt ceiling. No API changes, no C/Python logic changes.

Conflict scenarios:

  • dev/Containerfile: The Ubuntu 26.04 base-image PR (in-flight) touches different ARG blocks. If a rebase conflict occurs, keep both sets of version edits — they are in disjoint sections of the file.
  • .pre-commit-config.yaml: If a concurrent PR bumps the same tools, prefer the higher version.
  • python/requirements.txt: If the Ubuntu 26.04 PR widens the numpy ceiling in the same PR, both edits are independent; apply both.

Touched files: dev/Containerfile, .pre-commit-config.yaml, .github/workflows/supply-chain.yml, python/requirements.txt, ai/pyproject.toml (comment only), docs/adr/0569-sdk-version-bumps-2026-05-18.md, docs/adr/README.md (one index row), changelog.d/changed/0569-sdk-version-bumps-2026-05-18.md, dev/AGENTS.md (invariant notes), docs/development/dev-mcp.md (version table), docs/rebase-notes.md (this entry).

ADR-0574 — CUDA twins for HDR-model aim and adm3 sub-features (Phase 1)

Branch: feat/hdr-features-cuda-twins

Rebase impact: CUDA kernel and host files only — touches core/src/feature/cuda/float_adm/float_adm_score.cu and core/src/feature/cuda/float_adm_cuda.c. No meson.build change, no public C-API change, no Python/model change.

Rebase-sensitive invariant: FADM_ACCUM_SLOTS = 9 must remain identical in both files. The .cu unit defines the per-WG slot layout ([0..2]=csf_den, [3..5]=cm_num, [6..8]=aim_cm); the .c host uses it for buffer allocation, D2H copy size, and accumulator reads. If a rebase replaces the .cu with a pre-ADR-0574 version (FADM_ACCUM_SLOTS = 6), update float_adm_cuda.c to match in the same commit — a mismatch silently corrupts host memory. The --fmad=false nvcc flag covers all six kernels; do not remove it.
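
To see why a slot-count mismatch is silent, the sketch below models the accumulator buffer as a flat array. The addressing scheme (each work-group owning a contiguous run of `slots_per_wg` entries) and the work-group count are assumptions of this sketch, not facts taken from the kernel source; only the values 9 and 6 come from this entry.

```python
def flat_index(wg: int, slot: int, slots_per_wg: int) -> int:
    # Assumed addressing: work-group wg owns a contiguous run of slots.
    return wg * slots_per_wg + slot


def decode(index: int, slots_per_wg: int):
    # Inverse of flat_index: returns (work-group, slot).
    return divmod(index, slots_per_wg)


HOST_SLOTS = 9      # float_adm_cuda.c after ADR-0574
KERNEL_SLOTS = 6    # a pre-ADR-0574 .cu brought back by a bad rebase
NUM_WGS = 4         # illustrative work-group count

# The host wants work-group 1, slot 0 and computes index 9.
idx = flat_index(1, 0, HOST_SLOTS)
print(idx, decode(idx, KERNEL_SLOTS))  # 9 (1, 3): the kernel stored slot 3 there

# The two sides also disagree on how many entries exist.
print(HOST_SLOTS * NUM_WGS, KERNEL_SLOTS * NUM_WGS)  # 36 24
```

Nothing fails at build time in this model: the host reads a valid address that holds a different quantity, which is why both files need to change in the same commit.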

Touched files: core/src/feature/cuda/float_adm/float_adm_score.cu, core/src/feature/cuda/float_adm_cuda.c, docs/adr/0574-hdr-features-cuda-twins-phase-1.md, docs/adr/README.md (one index row), core/src/feature/cuda/AGENTS.md (slot-sync invariant note), docs/research/netflix-upstream-feature-additions-since-sync-2026-05-18.md, docs/metrics/features.md (aim/adm3 sub-feature docs + footnote ⁶), docs/state.md (Recently-closed row T-CUDA-AIM-ADM3-2026-05-18), changelog.d/added/0574-hdr-features-cuda-twins-aim-adm3.md, docs/rebase-notes.md (this entry).

ADR-0606 — macOS SIGSEGV deep-fix in output.c writers (PR #1403 follow-up)

Rebase-sensitive invariant: i >= fc->feature_vector[j]->capacity (not >) in all seven frame-iteration bounds checks in core/src/output.c. If upstream Netflix ever backports a fix to the same comparison sites and uses > (the old, buggy form), the rebase must preserve the >= — the > form is a heap buffer overread UB that surfaces on macOS with MALLOC_PERTURB_=198.
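
The difference between the two comparisons is a classic off-by-one, modelled here in Python rather than C. The function names are hypothetical; the point is only that index `capacity` itself is the first invalid index.

```python
def out_of_range_buggy(i: int, capacity: int) -> bool:
    return i > capacity    # old form: lets i == capacity through


def out_of_range_fixed(i: int, capacity: int) -> bool:
    return i >= capacity   # valid indices are 0 .. capacity - 1


capacity = 4
i = capacity  # one past the last valid index

print(out_of_range_buggy(i, capacity))  # False: the C code would go on to read element 4
print(out_of_range_fixed(i, capacity))  # True: the frame is treated as out of range
```

In C the buggy form reads one element past the allocation instead of raising an error, so the fault only shows up under allocator poisoning.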

The fps computation guard (if (vmaf->pic_cnt == 0 || timer_elapsed == 0)) in core/src/libvmaf.c is similarly rebase-sensitive: if upstream modifies the fps block, preserve the guard before dividing so import-only callers (those that use vmaf_import_feature_score without vmaf_read_pictures) do not produce 0.0/0.0 which may SIGFPE on Apple platforms.
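
The shape of the fps guard can be sketched as below. The condition mirrors the one quoted above; the function name and the choice to return 0.0 in the guarded case are assumptions of this sketch, not a description of what libvmaf.c returns.

```python
def compute_fps(pic_cnt: int, timer_elapsed: float) -> float:
    # Mirrors the quoted guard: bail out before dividing when either
    # operand is zero. Returning 0.0 here is an assumption of this sketch.
    if pic_cnt == 0 or timer_elapsed == 0:
        return 0.0
    return pic_cnt / timer_elapsed


print(compute_fps(0, 0.0))   # 0.0: import-only caller, no division attempted
print(compute_fps(48, 2.0))  # 24.0
```

The guard has to sit before the division, so a rebase that moves it after the fps block would defeat it.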

Touched files: core/src/output.c (7 bounds-check sites, json pool-score + frames comma fixes), core/src/libvmaf.c (fps defensive computation), docs/adr/0606-macos-vmaf-write-output-segv-deep-fix.md, docs/adr/README.md (one index row), docs/state.md (Recently-closed row), changelog.d/fixed/0606-macos-vmaf-write-output-segv-deep-fix.md, docs/rebase-notes.md (this entry).

ADR-0607 — vmaf-tune compare: decode reference YUV once (shared-ref fix)

No rebase impact: all touched files are fork-local Python harness files. No upstream C sources, no public headers, no FFmpeg patch series involved.

Touched files: tools/vmaf-tune/src/vmaftune/compare.py (pre_decoded_ref param on compare_codecs and compare_codecs_sweep), tools/vmaf-tune/src/vmaftune/cli.py (decode-once block + try/finally in _run_compare; imports _decode_to_raw_yuv from .score), tools/vmaf-tune/tests/test_bbb_e2e_v15_shared_ref.py (7 acceptance tests), docs/adr/0607-vmaftune-shared-ref-yuv-decode-once.md, docs/adr/README.md (one index row), changelog.d/fixed/0607-vmaftune-shared-ref-yuv-decode-once.md, docs/rebase-notes.md (this entry).

ADR-0612 — Tiny-AI Netflix corpus training scaffold (2026-05-19 iteration)

No rebase-sensitive invariants introduced by this PR — all changes are documentation, research digest, and CHANGELOG fragment. No C/CUDA/SIMD paths modified; no loader or test code changed.

The one invariant worth noting for future rebases: the .workingdir2/netflix/ corpus path is local-only and gitignored. If a future rebase touches .gitignore, confirm that the *.yuv and .workingdir2/ entries remain in place. Training scripts must continue to accept --data-root as an explicit CLI flag rather than hard-coding the path.

Touched files: docs/adr/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/adr/_index_fragments/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/adr/_index_fragments/_order.txt, docs/research/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/ai/training-data.md (cross-reference links), changelog.d/added/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/rebase-notes.md (this entry).

ADR-0626 — SSH debug session on macOS CI failure (tmate)

No rebase-sensitive invariants — the change is limited to .github/workflows/libvmaf-build-matrix.yml (one new step) and docs/. No C, CUDA, SYCL, HIP, or Python paths touched.

If an upstream sync touches libvmaf-build-matrix.yml, confirm that the SSH debug session on test failure step is preserved after the merge and that its if: condition still references runner.os == 'macOS', failure(), and github.event_name == 'workflow_dispatch'. The action SHA pin (c0afd6f790e3a5564914980036ebf83216678101) will be bumped automatically by Renovate when a new mxschmitt/action-tmate release is tagged.

Touched files: .github/workflows/libvmaf-build-matrix.yml (one new step after "Run tests"), docs/development/ci-tmate-debug.md (new operator guide), docs/adr/0626-macos-ci-tmate-debug-on-failure.md, docs/adr/README.md (one index row), changelog.d/added/0626-macos-ci-tmate-debug-on-failure.md, docs/rebase-notes.md (this entry).

ADR-0628 — Remote-aware ADR number allocator

No rebase impact: all touched files are fork-local tooling and CI configuration. No upstream C sources, public headers, or FFmpeg patch series involved.

scripts/adr/next-free.sh is a fork-added script with no upstream analogue; it will never conflict on a Netflix upstream sync. The .github/workflows/rule-enforcement.yml change adds a new step to an existing job — this file does not exist upstream, so no conflict is expected. The CLAUDE.md update extends §12 r8 prose only.

Touched files: scripts/adr/next-free.sh (remote-aware allocator + .git/adr-claims/ side-pointer), scripts/adr/tests/test-next-free-remote-aware.sh (new acceptance tests), .github/workflows/rule-enforcement.yml (phase-2 open-PR collision check), CLAUDE.md (§12 r8 extended allocator description), docs/adr/0628-adr-allocator-remote-aware.md, docs/adr/README.md (one index row), changelog.d/fixed/0628-adr-allocator-remote-aware.md, docs/rebase-notes.md (this entry).

ADR-0608 — MCP P0 fixes: isError, probe_backend, vmaf_version, vmaf_score_encoded

No rebase impact: all touched files are fork-local MCP server Python files and docs. No upstream C sources, no public headers, no FFmpeg patch series involved.

Touched files: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py (isError fix in _call_tool; new _probe_backend, _vmaf_version, _run_vmaf_score_encoded, _ffprobe_geometry, _decode_to_yuv functions; three new Tool registrations), mcp-server/vmaf-mcp/tests/test_mcp_p0_adr0608.py (11 new regression tests), mcp-server/vmaf-mcp/tests/test_smoke_e2e.py (2 test updates for new behavior), mcp-server/vmaf-mcp/README.md (tools table updated to 10 tools, 6 backends), docs/mcp/tools.md (new tool sections, corrected list_backends description and response body, updated error conventions table), docs/adr/0608-mcp-p0-iserror-and-probe-version-encoded.md, docs/adr/README.md (one index row), changelog.d/fixed/adr0608-mcp-p0-iserror-probe-version-encoded.md, changelog.d/added/adr0608-mcp-probe-backend-vmaf-version-encoded.md, docs/rebase-notes.md (this entry).

Master CI repair — DNN coverage, MCP smoke, and formatter drift (2026-05-19)

No rebase-sensitive invariants introduced — this PR is CI/test/doc hygiene only. No public C API, backend implementation, model artifact, FFmpeg patch, or Netflix golden-data assertion changed.

If a future rebase touches the DNN tiny-model smoke tests, preserve these contracts:

  • core/test/dnn/test_cli.sh uses --tiny-resize bilinear for the nr_metric_v1.onnx no-reference smoke because the shipped NR model is 224x224 and strict resize mode intentionally rejects the 576x324 fixture.
  • The same CLI smoke caps tiny-model inference at --frame_cnt 1; it is a load/run smoke, not a full-clip numerical benchmark.
  • mcp-server/vmaf-mcp/tests/test_smoke_e2e.py expects unknown tool names to raise, matching the ADR-0613 isError contract.
  • The Netflix golden and lavapipe parity gates keep per-command timeout wrappers plus step-level timeout-minutes; if a runner/backend hangs, CI must fail diagnostically instead of waiting for the full job-level timeout.
  • The Netflix golden CI lane intentionally invokes only QualityRunnerTest::test_run_vmaf_runner and QualityRunnerTest::test_run_vmaf_runner_checkerboard: together they cover the D24 normal pair plus the checkerboard 10-px and 1-px distorted pairs. Do not expand this lane into the broad Python quality/feature suites; those are separate test coverage, not the golden-data gate.
  • Keep the D24 normal and checkerboard invocations as separate workflow steps; this preserves a clear failure surface when the 1080p checkerboard pair is slow or stuck. The normal-pair step still needs a multi-minute budget on cold GitHub-hosted runners because the Python runner invokes the feature binary with ADM/VIF/motion debug output. The normal-pair budget is 21 minutes inside a 22-minute step; lowering it back to 7 or 11 minutes has timed out on cold 2026-05-19 GitHub-hosted runners before assertions completed.
  • The lavapipe Vulkan VIF cross-backend lane has the same cold-runner constraint. Keep the required VIF step at a 15-minute command wrapper inside a 16-minute step; the old 8-minute wrapper timed out exactly on GitHub-hosted Ubuntu before the diff script could report a result.
  • python/vmaf/routine.py::run_test_on_dataset() only passes bootstrap stats kwargs when the runner exposes the full bootstrap score-key getter set. Normal VMAF and PSNR runners do not have get_bagging_score_key() / CI95 / all-model prediction fields; macOS tox exercises those normal runners through run_testing.py, so do not reintroduce unconditional bootstrap-key access.
  • Python doctests under python/vmaf/tools/ must not rely on platform/version scalar reprs or assertion traceback formatting. NumPy 2 can display scalar values as np.float64(...), and Python 3.14 can append assert-expression details; keep examples explicit with float(...), string formatting, or first-line exception-message printing.
  • core/src/thread_locale.c uses duplocale(LC_GLOBAL_LOCALE) as the base for newlocale(LC_NUMERIC_MASK, "C", base). Do not restore newlocale(LC_ALL_MASK, "C", NULL): macOS allocator poisoning can expose poisoned internal locale pointers as SIGSEGV in the output-writer tests.
  • core/test/test_output.c must not include libvmaf.c or output.c directly while also linking libvmaf. Use core/src/libvmaf_priv.h::vmaf_feature_collector_get() for the internal collector access instead; duplicate implementation TUs have crashed Apple ld64 + LTO macOS jobs under allocator poisoning.
  • core/src/dnn/model_loader.c::vmaf_dnn_sidecar_load() rejects oversized sidecars with stat() before opening them. Preserve this cheap metadata-only guard so the test_vmaf_use_tiny_model oversized-sidecar case does not enter platform stdio on the expected -EFBIG path. The regression test copies model/tiny/smoke_v0.onnx rather than synthesising an invalid ONNX blob; that keeps a missed sidecar gate as an assertion failure instead of an ORT invalid-model crash on macOS.
  • core/src/output.c flushes each writer's stream before calling vmaf_thread_locale_pop(). Keep the flush inside the locale lifetime: path-based vmaf_write_output() uses fdopen() and may otherwise defer the stream flush to fclose() after the temporary C numeric locale has been restored/freed, which is the macOS-only writer SIGSEGV shape.
  • core/test/meson.build defines libvmaf_public_link so public ABI tests link the shared library when default_library=both. Do not route test_public_api_score or test_vmaf_use_tiny_model back through libvmaf.get_static_lib() on macOS: Apple ld64 + LTO folds the public call into the test executable and reproduces the writer/DNN SIGSEGV shape. The internal test_output target keeps its static link for vmaf_feature_collector_get() but disables LTO at the target on Darwin only; Linux clang static-archive links still need -flto because src/libvmaf.a contains LLVM bitcode there.
  • python/vmaf/core/asset.py::ORDERED_FILTER_LIST includes fps and format between pad and gblur. Keep that order stable: it controls both FFmpeg preprocessing command composition and the slugified Asset string identity. The corresponding properties are fps_cmd, ref_fps_cmd, and dis_fps_cmd; format is intentionally accessed via get_filter_cmd("format", target) like the generic filter-only keys.
  • core/src/feature/feature_extractor.h includes generated config.h before defining struct VmafFeatureExtractor. Keep that include in the header, not just in selected consumer TUs: backend-enabled LTO builds need every extractor definition and the registry to see identical HAVE_CUDA / HAVE_SYCL / HAVE_VULKAN macro state, otherwise GCC emits -Wlto-type-mismatch and may misoptimise extractor globals.
  • core/src/feature/common/macros.h::FORCE_INLINE already expands to an inline specifier on GCC/Clang. Do not re-add a second literal inline to CSF / CAMBI / motion helper declarations; Clang reports the duplicate specifier throughout the build matrix.
  • core/tools/vmaf.c::fetch_picture() owns a preallocated picture slot as soon as vmaf_fetch_preallocated_picture() succeeds. Preserve the EOF/read-error cleanup that unrefs that slot before returning 1 or -1, and preserve the run_frame_loop() cleanup for the opposite side when only one input read succeeded. Without both, the CLI can score and write output successfully, then hang forever in vmaf_close() while the picture pool waits for unread slots to return.
  • scripts/ci/cross_backend_{parity_gate,vif_diff}.py must keep the backend-specific extractor alias ("adm", "vulkan") -> "integer_adm_vulkan". ADR-0586 renamed Vulkan integer ADM to the canonical extractor name while CPU/CUDA/SYCL retained the historical adm, adm_cuda, and adm_sycl names. Dropping the alias makes the lavapipe parity gate invoke the retired adm_vulkan compatibility name and fail before comparing scores.
  • core/src/feature/common/convolution_avx512.c vertical scanlines must use _mm512_loadu_ps / _mm512_storeu_ps. MAX_ALIGN is 32 bytes, not 64 bytes; the stride can be a 64-byte multiple while the row base is still only 32-byte aligned. Reintroducing aligned AVX-512 memory ops can crash float_vif on AVX-512-capable CPU runners.
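
The alignment argument in the last bullet is plain modular arithmetic and can be checked without any SIMD code. The base address and stride below are illustrative values chosen to match the stated conditions (row base 32-byte aligned, stride a multiple of 64 bytes); they are not taken from the source.

```python
MAX_ALIGN = 32   # bytes, per the note
ZMM_ALIGN = 64   # alignment that aligned 512-bit loads and stores require


def row_addresses(base: int, stride: int, rows: int) -> list:
    return [base + r * stride for r in range(rows)]


base = 0x1000 + MAX_ALIGN   # illustrative: 32-byte aligned, not 64-byte aligned
stride = 4 * ZMM_ALIGN      # illustrative: a multiple of 64 bytes

rows = row_addresses(base, stride, 4)
print([addr % MAX_ALIGN for addr in rows])  # [0, 0, 0, 0]
print([addr % ZMM_ALIGN for addr in rows])  # [32, 32, 32, 32]
```

A 64-byte stride preserves whatever misalignment the base has, so every row start stays 32 bytes off a 64-byte boundary and only the unaligned load/store intrinsics are safe.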

Touched files: .github/workflows/tests-and-quality-gates.yml, docs/development/zed-migration-plan-2026-05-19.md, docs/metrics/features.md, docs/usage/python.md, core/src/dnn/AGENTS.md, core/src/dnn/model_loader.c, core/src/dnn/ort_backend.c, core/src/AGENTS.md, core/src/feature/AGENTS.md, core/src/feature/adm_csf_tools.h, core/src/feature/arm64/moment_sve2.c, core/src/feature/arm64/psnr_hvs_neon.c, core/src/feature/arm64/ssimulacra2_host_neon.c, core/src/feature/arm64/ssimulacra2_neon.c, core/src/feature/arm64/ssimulacra2_sve2.c, core/src/feature/barten_csf_tools.h, core/src/feature/cambi.c, core/src/feature/feature_dists.c, core/src/feature/feature_extractor.h, core/src/feature/feature_lpips.c, core/src/feature/feature_mobilesal.c, core/src/feature/fastdvdnet_pre.c, core/src/feature/integer_motion.c, core/src/feature/motion_blend_tools.h, core/src/feature/ssimulacra2.c, core/src/feature/transnet_v2.c, core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/float_vif_vulkan.c, core/src/feature/vulkan/integer_adm_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vif_tools.c, core/src/feature/x86/psnr_hvs_avx2.c, core/src/feature/x86/ssimulacra2_avx2.c, core/src/feature/x86/ssimulacra2_avx512.c, core/src/feature/x86/ssimulacra2_host_avx2.c, core/src/feature/x86/vif_avx512.c, core/src/libvmaf.c, core/src/libvmaf_priv.h, core/src/framesync.c, core/src/model.c, core/src/output.c, core/src/picture.c, core/src/thread_locale.c, core/src/vulkan/vma_impl.cpp, core/tools/AGENTS.md, core/tools/vmaf.c, core/test/AGENTS.md, core/test/dnn/test_cli.sh, core/test/dnn/test_dnn_session_api.c, core/test/dnn/test_model_loader.c, core/test/dnn/test_ort_internals.c, core/test/dnn/test_tensor_io.c, core/test/dnn/test_vmaf_use_tiny_model.c, core/test/dnn/meson.build, core/test/meson.build, core/test/test_feature_extractor.c, core/test/test_framesync.c, core/test/test_model.c, core/test/test_output.c, 
core/test/test_predict.c, core/test/test_psnr_hvs_simd.c, core/test/test.c, mcp-server/vmaf-mcp/tests/test_smoke_e2e.py, python/test/asset_test.py, python/vmaf/core/asset.py, core/tools/vmaf_bench.c, core/src/feature/common/AGENTS.md, core/src/feature/common/convolution.h, core/src/feature/common/convolution_avx512.c, core/test/test_vif_simd.c, scripts/ci/AGENTS.md, scripts/ci/cross_backend_parity_gate.py, scripts/ci/cross_backend_vif_diff.py, scripts/ci/test_cross_backend_feature_names.py, docs/state.md, changelog.d/fixed/master-ci-dnn-mcp-coverage-2026-05-19.md, docs/rebase-notes.md (this entry).

ADR-0640 — Tiny-AI Netflix corpus training scaffold (2026-05-20 iteration)

No rebase impact on upstream C sources or FFmpeg patches. All touched files are fork-local docs, Python test infrastructure, and the changelog fragment tree.

Key invariants (track when upstream Netflix/vmaf adds its own training surface):

  • .workingdir2/netflix/ is gitignored and the 37 GB corpus is never committed. The corpus location must be supplied through the --data-root flag or the VMAF_DATA_ROOT environment variable; any training script that hard-codes the corpus path violates this invariant.
  • mcp-server/vmaf-mcp/tests/test_smoke_e2e.py runs against committed fixtures only (python/test/resource/yuv/src01_hrc00_576x324.yuv). Do not change the smoke test to reference .workingdir2/netflix/.
  • Architecture selection and the actual training run are deferred to a follow-up PR; do not trigger training from the scaffold branch.
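
A minimal sketch of the corpus-path invariant, assuming a hypothetical resolve_data_root() helper (not a function in the repo) and assuming the flag takes precedence over the environment variable. The point it illustrates is that there is no built-in default path:

```python
import argparse
import os
from pathlib import Path


def resolve_data_root(argv=None, env=None):
    """Return the corpus root from --data-root or VMAF_DATA_ROOT, else exit.

    Hypothetical helper for illustration; flag-over-env precedence is an
    assumption, not something the notes above specify.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="hypothetical training scaffold")
    # Deliberately no default: a hard-coded corpus path would break the invariant.
    parser.add_argument("--data-root", type=Path, default=None)
    args = parser.parse_args(argv)

    if args.data_root is not None:
        return args.data_root
    if env.get("VMAF_DATA_ROOT"):
        return Path(env["VMAF_DATA_ROOT"])
    # parser.error() prints usage to stderr and raises SystemExit(2).
    parser.error("pass --data-root or set VMAF_DATA_ROOT")
```

A script built this way fails fast when neither source is set instead of silently reading .workingdir2/netflix/.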

Touched files: docs/adr/0640-tiny-ai-netflix-training-scaffold-2026-05-20.md, docs/adr/_index_fragments/0640-tiny-ai-netflix-training-scaffold-2026-05-20.md, docs/adr/_index_fragments/_order.txt, docs/research/0615-tiny-ai-netflix-training-2026-05-20.md, docs/ai/training-data.md (See also section extended), changelog.d/added/0640-tiny-ai-netflix-training-scaffold-2026-05-20.md, docs/rebase-notes.md (this entry).

ADR-0643 — vmaf-tune encoder-profile report contract

No rebase impact on upstream libvmaf C sources. This change touches fork-local vmaf-tune Python code, docs, tests, and the FFmpeg patch stack. The FFmpeg integration is advisory CLI glue only.

Key invariants:

  • ReportData.to_dict() embeds encoder_profile.schema == "vmaftune.encoder_profile.v1". Future report-shape changes should be additive or should bump the profile schema.
  • vmaf-tune encode-profile must read raw JSON, HTML, and Markdown reports. In HTML reports the raw JSON is HTML-escaped inside a <pre> element and is intentionally unescaped before parsing.
  • The profile reader selects one recommendation by --codec, --target-vmaf, and/or --recommendation-index; it must not implicitly encode every codec or ladder rung.
  • FFmpeg patch 0015-vmaf-tune-profile-cli-glue.patch stays advisory. Do not duplicate vmaf-tune's JSON/profile selection logic in FFmpeg.
  • FFmpeg 8.x.x base: upstream tags were fetched on 2026-05-20 and the latest released 8.x.x tag was n8.1.1 (n8.2-dev is a dev tag). The full ffmpeg-patches/000*-*.patch series replayed cleanly against a temporary pristine n8.1.1 worktree.
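
The HTML read path and the schema check can be sketched as follows. This is an illustration under assumptions, not the encoder_profile.py implementation: load_report_from_html() is a hypothetical name, the report is assumed to keep the schema string at report["encoder_profile"]["schema"], and the first <pre> element is assumed to hold the raw JSON.

```python
import html
import json
import re

EXPECTED_SCHEMA = "vmaftune.encoder_profile.v1"


def load_report_from_html(page: str) -> dict:
    """Extract the escaped raw JSON from a <pre> element and validate its schema."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <pre> block with raw JSON found")
    # The JSON was HTML-escaped for display, so undo that before parsing.
    report = json.loads(html.unescape(match.group(1)))
    schema = report.get("encoder_profile", {}).get("schema")
    if schema != EXPECTED_SCHEMA:
        # A future schema bump should fail loudly rather than be misread.
        raise ValueError(f"unsupported encoder_profile schema: {schema!r}")
    return report
```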

Touched files: tools/vmaf-tune/src/vmaftune/report.py, tools/vmaf-tune/src/vmaftune/encoder_profile.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_report.py, tools/vmaf-tune/tests/test_encoder_profile.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-ffmpeg.md, ffmpeg-patches/0015-vmaf-tune-profile-cli-glue.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md, docs/adr/0643-vmaf-tune-encoder-profile-contract.md, docs/adr/_index_fragments/0643-vmaf-tune-encoder-profile-contract.md, docs/adr/_index_fragments/_order.txt, docs/research/0643-vmaf-tune-encoder-profile-contract.md, changelog.d/added/vmaf-tune-encoder-profile.md, docs/rebase-notes.md (this entry).

ADR-0644 — vmaf-tune codec runtime variants

No upstream Netflix C-source rebase impact. The change is confined to the fork-local tools/vmaf-tune Python CLI/report schema, usage docs, and ADR metadata.

Key invariants:

  • ADAPTER@VARIANT is a compare display token. The base ADAPTER still routes through the codec-adapter registry and FFmpeg -c:v encoder name.
  • --encoder-ffmpeg-bin TOKEN=PATH is an exact-token binding. Unknown binding keys are rejected rather than silently falling back to the global --ffmpeg-bin.
  • Compare JSON/CSV rows now include adapter, runtime_variant, and ffmpeg_bin. Keep these fields together if future schema work touches COMPARE_ROW_KEYS.
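
A sketch of the exact-token binding rule, assuming hypothetical helper names and token values (the real parsing lives in encoder_runtime.py and may differ). It also assumes that a known token without a binding uses the global binary; only unknown binding keys are rejected:

```python
def parse_encoder_bins(bindings, known_tokens):
    """Parse TOKEN=PATH items, rejecting malformed or unknown tokens."""
    resolved = {}
    for item in bindings:
        token, sep, path = item.partition("=")
        if not sep or not token or not path:
            raise ValueError(f"expected TOKEN=PATH, got {item!r}")
        if token not in known_tokens:
            # Reject instead of silently falling back to the global binary.
            raise ValueError(f"unknown encoder token {token!r}")
        resolved[token] = path
    return resolved


def ffmpeg_bin_for(token, resolved, global_bin):
    """Return the bound binary for a token, else the global --ffmpeg-bin value."""
    return resolved.get(token, global_bin)
```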

Touched files: tools/vmaf-tune/src/vmaftune/encoder_runtime.py, tools/vmaf-tune/src/vmaftune/compare.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_encoder_runtime.py, tools/vmaf-tune/tests/test_compare.py, tools/vmaf-tune/tests/test_compare_no_bisect.py, tools/vmaf-tune/tests/test_compare_rate_quality_sweep.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-codec-adapters.md, docs/adr/0644-vmaf-tune-codec-runtime-variants.md, docs/adr/_index_fragments/0644-vmaf-tune-codec-runtime-variants.md, docs/adr/_index_fragments/_order.txt, docs/research/0644-vmaf-tune-codec-runtime-variants.md, changelog.d/added/0644-vmaf-tune-codec-runtime-variants.md, docs/rebase-notes.md (this entry).

ADR-0645 — Integer ADM p-norm SIMD callback ABI

When rebasing any upstream change that touches integer ADM contrast-measure callbacks, keep adm_p_norm threaded through the scalar and x86 SIMD twins.

Touched ABI group: core/src/feature/integer_adm.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/x86/adm_avx2.h, core/src/feature/x86/adm_avx512.h.

Invariant: adm_cm and i4_adm_cm must accept the p-norm parameter and the final powf exponent must be 1.0f / (float)adm_p_norm in every twin. The default 3.0 path is the Netflix-compatible path; do not split SIMD dispatch back to a hard-coded exponent when resolving conflicts.
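
To show why the final exponent must track the parameter, here is a simplified Python sketch of p-norm pooling. It is not the adm_cm kernel: the real callbacks are C, operate on wavelet bands, and use single-precision powf; p_norm_pool() is a hypothetical name.

```python
def p_norm_pool(values, adm_p_norm=3.0):
    """Pool magnitudes with a p-norm; the final root uses 1 / adm_p_norm."""
    accum = sum(abs(v) ** adm_p_norm for v in values)
    # Counterpart of powf(accum, 1.0f / (float)adm_p_norm) in the C twins.
    return accum ** (1.0 / adm_p_norm)


def p_norm_pool_hardcoded(values, adm_p_norm=3.0):
    """Wrong variant: accumulates with p but always takes a cube root."""
    accum = sum(abs(v) ** adm_p_norm for v in values)
    return accum ** (1.0 / 3.0)
```

With adm_p_norm = 3.0 the two functions agree, which is why a hard-coded exponent can survive a conflict resolution unnoticed; any other value makes them diverge.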

ADR-0648 — CHUG HDR MOS trainer entry point

CHUG HDR subjective-MOS experiments use ai/scripts/train_chug_hdr_mos_head.py and local chug_hdr_mos_head_v1 manifests. Keep CHUG operator docs on that entry point; do not reintroduce instructions that pass CHUG shards through train_konvid_mos_head.py's KonViD-named flags. The wrapper may reuse the shared MOS-head implementation, but it must pass explicit non-existent KonViD paths so CHUG runs cannot accidentally mix local KonViD rows with HDR MOS shards.

ADR-0649 — CHUG HDR wide MOS feature schema

train_chug_hdr_mos_head.py defaults to --feature-schema chug-hdr-wide-v1. That schema is CHUG-local and currently has 34 columns: canonical-6 means, p10, p90, and std temporal aggregates, and HDR ladder and geometry metadata. Do not edit the KonViD FEATURE_COLUMNS order to implement CHUG experiments; keep the shipped konvid_mos_head_v1 ONNX on the konvid-v1 11-column schema. Downstream CHUG experiment scripts must read feature_schema and feature_order from the manifest instead of assuming 11 inputs.
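
A sketch of a manifest-driven reader, assuming the manifest is a JSON object with top-level feature_schema and feature_order keys (the key names come from the note above; their placement and the helper names are assumptions):

```python
import json


def load_feature_layout(manifest_text):
    """Return (schema, ordered column names) from a manifest JSON string."""
    manifest = json.loads(manifest_text)
    schema = manifest["feature_schema"]
    order = list(manifest["feature_order"])
    if len(set(order)) != len(order):
        raise ValueError("feature_order contains duplicate columns")
    return schema, order


def row_to_vector(row, order):
    """Build the model input in manifest order; the width is never hard-coded."""
    return [float(row[name]) for name in order]
```

The input width is then len(order), whatever schema the manifest names.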

ADR-0331 — rule-enforcement ready-for-review trigger repair

.github/workflows/rule-enforcement.yml must include ready_for_review in its pull_request.types list, alongside edited. Without ready_for_review, draft-to-ready promotion leaves the ADR-0108, ADR-0100, FFmpeg-surface, ADR-number, backfill, and docs/state.md gates stuck on their draft-time skipped check runs while the heavier workflows rerun correctly. Keep edited as well so PR-body fixes can rerun only the rule-enforcement workflow without burning the full matrix again.

Test build graph — generated vcs_version.h dependency

core/test/test_feature_collector.c directly includes core/src/libvmaf.c, and libvmaf.c includes the generated vcs_version.h header. Keep rev_target listed in the test_feature_collector executable sources in core/test/meson.build; otherwise fresh parallel Ninja builds can compile the test before include/vcs_version.h exists and fail nondeterministically.

Vulkan lavapipe CI — motion probes stay out of the VIF job

The Vulkan VIF Cross-Backend (lavapipe, places=4) job should not run the known-broken motion / motion_v2 lavapipe probes with continue-on-error: true. GitHub still emits ##[error] annotations for those advisory failures, which makes a passing PR look broken. Keep the documented T-VULKAN-MOTION-LAVAPIPE-INIT debt in docs/state.md and keep the required GPU-Parity Matrix Gate skip list until the Vulkan motion lavapipe bug is actually fixed; do not reintroduce advisory failing steps inside the named VIF gate.

ADR-0647 — fr_regressor_v1 Netflix refresh

No upstream Netflix C-source rebase impact. This is a fork-local model artifact refresh: model/tiny/fr_regressor_v1.onnx, its sidecar, registry row, model card, ADR/research docs, and state/changelog metadata.

Key invariants:

  • The ADR-0249 model recipe and PLCC ship gate stay unchanged. Do not use this refresh as precedent for changing architecture, feature order, or gate threshold.
  • Refreshes must train from a dated current full-feature table, not from stale runs/full_features_netflix.parquet.
  • fr_regressor_v1.onnx stores its weights inline after export. If the exporter rewrites the stale sibling .onnx.data file while the ONNX has no external initializers, restore the orphan sidecar rather than expanding the model diff.

Touched files: model/tiny/fr_regressor_v1.onnx, model/tiny/fr_regressor_v1.json, model/tiny/registry.json, docs/ai/models/fr_regressor_v1.md, docs/adr/0647-ai-fr-regressor-v1-refresh-20260520.md, docs/adr/_index_fragments/0647-ai-fr-regressor-v1-refresh-20260520.md, docs/adr/_index_fragments/_order.txt, docs/research/0647-ai-fr-regressor-v1-refresh-20260520.md, ai/AGENTS.md, docs/state.md, changelog.d/changed/0647-ai-fr-regressor-v1-refresh-20260520.md, docs/rebase-notes.md (this entry).

ADR-0651 — CHUG HDR row metadata

No upstream Netflix C-source rebase impact. This is a fork-local AI corpus-materialisation schema extension in ai/scripts/chug_extract_features.py.

Key invariants:

  • CHUG feature rows now preserve feature_ref_* and feature_dis_* ffprobe HDR/display metadata for the matched reference and distorted clip.
  • Unknown ffprobe fields remain explicit as unknown or null; do not infer panel/display capability in the materialiser.
  • The existing --audit-output corpus preflight remains the aggregate health check; the row fields are the model-facing copy.

Touched files: ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, docs/ai/chug-ingestion.md, docs/adr/0651-chug-hdr-row-metadata.md, docs/adr/_index_fragments/0651-chug-hdr-row-metadata.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0651-chug-hdr-row-metadata.md, ai/AGENTS.md, changelog.d/added/0651-chug-hdr-row-metadata.md, docs/rebase-notes.md (this entry).

ADR-0652 — CHUG visual-signal primitives

No upstream Netflix C-source rebase impact. This is a fork-local AI feature-row schema extension in ai/scripts/chug_extract_features.py.

Key invariants:

  • CHUG feature rows now include feature_ref_*, feature_dis_*, and feature_delta_* luma-domain visual-signal primitives for luma_std, sharpness_laplacian_var, highfreq_abs_mean, and noise_lap_mad.
  • These are deterministic diagnostic blur/noise/grain proxies computed from sampled decoded YUV10 luma frames. Do not treat them as a trained no-reference VQA model.
  • The visual-signal cache lives beside the existing CHUG feature cache and must be regenerated if the primitive definitions change.

Touched files: ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, docs/ai/chug-ingestion.md, docs/adr/0652-chug-visual-signal-primitives.md, docs/adr/_index_fragments/0652-chug-visual-signal-primitives.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0652-chug-visual-signal-primitives.md, ai/AGENTS.md, changelog.d/added/0652-chug-visual-signal-primitives.md, docs/rebase-notes.md (this entry).

ADR-0653 — CHUG display-profile training

No upstream Netflix C-source rebase impact. This is a fork-local CHUG HDR MOS training-schema extension under ai/ plus docs and DDD material.

Key invariants:

  • chug-hdr-wide-v1 remains the no-profile CHUG default.
  • --display-profile-json selects chug-hdr-display-v1 only when the caller did not explicitly pass --feature-schema.
  • Row-local display fields override the target profile so future multi-display HDR corpora keep their panel axis.
  • The display profile is recorded in the emitted manifest with normalized feature values and source sha256.
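
The schema-selection precedence in the first two invariants can be sketched with argparse. select_feature_schema() is a hypothetical name; the trick is a None default so that an explicit --feature-schema can be told apart from the implicit default:

```python
import argparse


def select_feature_schema(argv):
    """Resolve the CHUG feature schema from the two flags named above."""
    parser = argparse.ArgumentParser()
    # default=None distinguishes "not passed" from "passed the default value".
    parser.add_argument("--feature-schema", default=None)
    parser.add_argument("--display-profile-json", default=None)
    args = parser.parse_args(argv)

    if args.feature_schema is not None:
        return args.feature_schema  # an explicit choice always wins
    if args.display_profile_json is not None:
        return "chug-hdr-display-v1"
    return "chug-hdr-wide-v1"  # no-profile default
```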

Touched files: ai/scripts/train_konvid_mos_head.py, ai/scripts/train_chug_hdr_mos_head.py, ai/tests/test_train_konvid_mos_head.py, docs/ai/chug-ingestion.md, docs/ai/mos-corpora.md, docs/ai/models/konvid_mos_head_v1.md, docs/adr/0653-chug-display-profile-training.md, docs/adr/_index_fragments/0653-chug-display-profile-training.md, docs/adr/_index_fragments/_order.txt, docs/research/0653-chug-display-profile-training.md, ai/AGENTS.md, changelog.d/added/0653-chug-display-profile-training.md, docs/rebase-notes.md (this entry).

ADR-0657 — Second-opinion feature materializer

No upstream Netflix C-source rebase impact. This is a fork-local AI feature-table enrichment utility under ai/scripts/.

Key invariants:

  • ai/scripts/materialize_second_opinion_features.py stays table-side: it joins already-generated scorer JSON/JSONL and must not invoke or vendor third-party VQA projects.
  • Output columns remain namespaced as second_opinion_<scorer>_* so downstream audits and trainers can detect NR/MOS evidence without colliding with native corpus columns.
  • Duplicate (scorer, key) rows are rejected; they usually indicate stale reruns or mismatched row keys and must not be averaged silently.
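
A sketch of the namespacing and duplicate-rejection rules, assuming each scorer row is a dict with scorer and key fields plus arbitrary score fields (this row layout and both helper names are hypothetical):

```python
def index_scorer_rows(rows):
    """Index rows by (scorer, key), rejecting duplicates instead of averaging."""
    indexed = {}
    for row in rows:
        ident = (row["scorer"], row["key"])
        if ident in indexed:
            raise ValueError(f"duplicate (scorer, key) row: {ident}")
        indexed[ident] = row
    return indexed


def namespaced_columns(row):
    """Rename score fields to second_opinion_<scorer>_<field>."""
    scorer = row["scorer"]
    return {
        f"second_opinion_{scorer}_{name}": value
        for name, value in row.items()
        if name not in ("scorer", "key")
    }
```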

Touched files: ai/scripts/materialize_second_opinion_features.py, ai/scripts/signal_mix_audit.py, ai/tests/test_second_opinion_features.py, docs/ai/second-opinion-features.md, docs/ai/signal-mix-audit.md, docs/ai/index.md, docs/adr/0657-second-opinion-feature-materializer.md, docs/adr/_index_fragments/0657-second-opinion-feature-materializer.md, docs/adr/_index_fragments/_order.txt, docs/research/0657-second-opinion-feature-materializer.md, ai/AGENTS.md, mkdocs.yml, changelog.d/added/0657-second-opinion-feature-materializer.md, docs/rebase-notes.md (this entry).

ADR-0658 — Project modernization audit

No upstream Netflix C-source rebase impact. This is a fork-local developer-tooling audit under scripts/dev/.

Key invariants:

  • scripts/dev/project_modernization_audit.py is read-only. It may emit JSON and Markdown, but it must not rewrite .workingdir2/OPEN.md, .workingdir2/BACKLOG.md, docs/state.md, changelog fragments, or PR bodies.
  • The scanner is advisory queue shaping, not a required CI gate. Its marker matches are intentionally text-based and need human triage.
  • Archived scratch remains skipped by default; include it only with --include-archives during deliberate archaeology.

Touched files: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, docs/development/project-modernization-audit.md, docs/adr/0658-project-modernization-audit.md, docs/adr/_index_fragments/0658-project-modernization-audit.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0658-project-modernization-audit.md, scripts/AGENTS.md, mkdocs.yml, changelog.d/added/0658-project-modernization-audit.md, docs/rebase-notes.md (this entry).

ADR-0659 — Modernization audit false-positive filter

No upstream Netflix C-source rebase impact. This is a fork-local developer-tooling precision fix under scripts/dev/.

Key invariants:

  • Live Python raise NotImplementedError(...) rows remain high-severity audit findings.
  • Historical closeout prose such as "replaced the NotImplementedError scaffold", Python except NotImplementedError handlers, and custom NotImplementedError exception subclasses are not modernization gaps.
  • Documented -ENOSYS optional-build contracts are not modernization gaps; bare return -ENOSYS; rows outside such context still are.
  • Add future suppressions as narrow line-context tests; avoid file-level suppressions that could hide new real debt.
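
A narrow line-context rule for the Python cases above might look like the sketch below. The regex and function name are illustrative, not the scanner's actual implementation, and the -ENOSYS rules are left out:

```python
import re

# Only a statement that starts with `raise NotImplementedError` is live debt.
RAISE_RE = re.compile(r"^\s*raise\s+NotImplementedError\b")


def is_live_not_implemented(line):
    """True for live raise statements; False for prose, handlers, subclasses."""
    return RAISE_RE.match(line) is not None
```

Because the rule is per-line, it can be pinned by one test per suppressed context instead of a file-level skip.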

Touched files: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, docs/development/project-modernization-audit.md, docs/adr/0659-modernization-audit-false-positive-filter.md, docs/adr/_index_fragments/0659-modernization-audit-false-positive-filter.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0659-modernization-audit-false-positive-filter.md, scripts/AGENTS.md, changelog.d/fixed/0659-modernization-audit-false-positive-filter.md, docs/rebase-notes.md (this entry).

ADR-0661 — AI run manifest provenance

No upstream Netflix C-source rebase impact. This is fork-local AI tooling under ai/ plus human-facing docs.

Key invariants:

  • New AI training/export sidecars should use aiutils.run_manifest.build_run_provenance() instead of hand-rolled path hashing or argument JSON.
  • CHUG MOS wrapper runs record train_chug_hdr_mos_head.py as the user-facing entrypoint, even though they delegate into the shared KonViD training loop.
  • Add shared_trainer when wrapper identity and implementation script differ.

Touched files: ai/src/aiutils/run_manifest.py, ai/src/aiutils/__init__.py, ai/scripts/train_konvid_mos_head.py, ai/scripts/train_chug_hdr_mos_head.py, ai/tests/test_run_manifest.py, ai/tests/test_train_konvid_mos_head.py, ai/AGENTS.md, ai/src/aiutils/AGENTS.md, docs/ai/training.md, docs/ai/models/konvid_mos_head_v1.md, docs/ai/chug-ingestion.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/adr/_index_fragments/0661-ai-run-manifest-provenance.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0661-ai-run-manifest-provenance.md, changelog.d/added/0661-ai-run-manifest-provenance.md, docs/rebase-notes.md (this entry).

ADR-0662 — Vulkan motion lavapipe parity

Rebase-sensitive feature-extractor impact. This changes fork-local GPU motion twins and CI parity routing; keep these invariants when resolving any upstream sync that touches motion, feature registration, or parity scripts.

Key invariants:

  • integer_motion_vulkan stays before legacy motion_vulkan in the Vulkan registry block so model feature-name dispatch chooses the lavapipe-stable canonical twin.
  • Both parity scripts keep BACKEND_EXTRACTOR_ALIASES[("motion", "vulkan")] = "integer_motion_vulkan".
  • CUDA, SYCL, and Vulkan motion_v2 kernels use the CPU integer_motion_v2.c::mirror high-edge literal 2 * size - idx - 2; do not restore the stale -1 formula from old ADR-0193 prose.
  • integer_motion_vulkan defaults debug=true, matching CPU, CUDA, and the legacy Vulkan motion extractor, so the raw integer_motion metric is emitted for parity.
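
The high-edge literal can be checked with a small Python model of the index mirror. Only the high-edge branch is taken from the note above; the low-edge branch (-idx) is an assumption added to make the sketch complete, and the function names are hypothetical:

```python
def mirror(idx, size):
    """Reflect an out-of-range index without repeating the edge sample."""
    if idx < 0:
        return -idx  # assumed low-edge rule, not stated in these notes
    if idx >= size:
        return 2 * size - idx - 2  # high-edge literal the GPU kernels must match
    return idx


def stale_mirror_high(idx, size):
    """The old -1 formula: it repeats the last sample at the high edge."""
    return 2 * size - idx - 1
```

For size 5, mirror(5, 5) is 3 while the stale formula gives 4, so the two disagree on every tap that crosses the high edge.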

Touched files: .github/workflows/tests-and-quality-gates.yml, core/src/feature/feature_extractor.c, core/src/feature/vulkan/integer_motion_vulkan.c, core/src/feature/vulkan/shaders/motion_v2.comp, core/src/feature/cuda/integer_motion_v2/motion_v2_score.cu, core/src/feature/sycl/integer_motion_v2_sycl.cpp, scripts/ci/cross_backend_vif_diff.py, scripts/ci/cross_backend_parity_gate.py, docs/metrics/motion.md, docs/metrics/features.md, docs/backends/vulkan/overview.md, docs/api/gpu.md, docs/development/cross-backend-gate.md, docs/adr/0193-motion-v2-vulkan.md, docs/adr/0662-vulkan-motion-lavapipe-parity.md, docs/adr/_index_fragments/0193-motion-v2-vulkan.md, docs/adr/_index_fragments/0662-vulkan-motion-lavapipe-parity.md, docs/adr/_index_fragments/_order.txt, docs/research/0662-vulkan-motion-lavapipe-parity.md, core/src/feature/AGENTS.md, core/src/feature/vulkan/AGENTS.md, core/src/feature/cuda/AGENTS.md, scripts/ci/AGENTS.md, changelog.d/fixed/0662-vulkan-motion-lavapipe-parity.md, docs/rebase-notes.md (this entry).

ADR-0663 — MOS label materializer

No upstream Netflix C-source rebase impact. This is fork-local AI training/data-prep plumbing under ai/scripts/.

Key invariants:

  • ai/scripts/materialize_mos_labels.py stays table-side: it joins subjective MOS labels onto already-extracted feature tables and must not extract features, download corpora, or train models.
  • Real MOS-head training must not silently synthesize data when explicit real-corpus paths produce zero labelled rows. --smoke is the documented synthetic path.
  • Conflicting duplicate label keys are rejected; low unique-key coverage fails by default so stale key joins do not become training inputs.
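
A sketch of the two join guards, assuming labels arrive as (key, mos) pairs and assuming a hypothetical min_coverage threshold of 0.9 (the real default is not stated here):

```python
def join_labels(feature_keys, labels, min_coverage=0.9):
    """Map feature keys to MOS labels, failing on conflicts or low coverage."""
    by_key = {}
    for key, mos in labels:
        if key in by_key and by_key[key] != mos:
            raise ValueError(f"conflicting MOS labels for key {key!r}")
        by_key[key] = mos

    unique = set(feature_keys)
    matched = {key: by_key[key] for key in unique if key in by_key}
    coverage = len(matched) / len(unique) if unique else 0.0
    if coverage < min_coverage:
        # Low coverage usually means the key columns do not line up.
        raise ValueError(f"unique-key coverage {coverage:.2f} below {min_coverage}")
    return matched
```

Identical repeated labels are tolerated in this sketch; only disagreeing values for the same key are treated as conflicts.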

Touched files: ai/scripts/materialize_mos_labels.py, ai/scripts/train_konvid_mos_head.py, ai/tests/test_materialize_mos_labels.py, ai/tests/test_train_konvid_mos_head.py, docs/ai/mos-label-materializer.md, docs/ai/mos-corpora.md, docs/ai/models/konvid_mos_head_v1.md, docs/ai/index.md, docs/adr/0663-mos-label-materializer.md, docs/adr/_index_fragments/0663-mos-label-materializer.md, docs/adr/_index_fragments/_order.txt, docs/research/0663-mos-label-materializer.md, ai/AGENTS.md, mkdocs.yml, changelog.d/added/0663-mos-label-materializer.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — AI model sidecar run provenance

AI-sidecar provenance impact. This widens ADR-0661 from MOS-head trainers to the FR regressor training family and the vmaf_tiny exporter family.

Key invariants:

  • train_fr_regressor.py, train_fr_regressor_v2.py, and train_fr_regressor_v3.py sidecars carry run_provenance built by aiutils.run_manifest.build_run_provenance().
  • v1/v2 metrics JSON carries the same block, including gate-failed runs where no ONNX export is written.
  • export_vmaf_tiny_v2.py, export_vmaf_tiny_v3.py, and export_vmaf_tiny_v4.py sidecars carry the same block with the checkpoint input and ONNX/sidecar output targets.
  • Do not replace this with per-script argument/path JSON when rebasing AI trainer changes; extend the shared helper instead.

Touched files: ai/scripts/export_vmaf_tiny_v2.py, ai/scripts/export_vmaf_tiny_v3.py, ai/scripts/export_vmaf_tiny_v4.py, ai/scripts/train_fr_regressor.py, ai/scripts/train_fr_regressor_v2.py, ai/scripts/train_fr_regressor_v3.py, ai/tests/test_fr_regressor_run_provenance.py, ai/tests/test_vmaf_tiny_export_run_provenance.py, docs/ai/training.md, docs/ai/models/fr_regressor_v1.md, docs/ai/models/fr_regressor_v2.md, docs/ai/models/fr_regressor_v3.md, docs/ai/models/vmaf_tiny_v2.md, docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0664-ai-fr-regressor-run-provenance.md, changelog.d/added/0664-ai-fr-regressor-run-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — AI eval report run provenance

AI-eval/validate provenance impact. This widens ADR-0661 from model-producing sidecars to the tiny-VMAF evaluation and validation report families.

Key invariants:

  • eval_loso_vmaf_tiny_v3.py, eval_loso_vmaf_tiny_v4.py, eval_loso_vmaf_tiny_v5.py, and eval_multiseed_v3_v4.py report JSON files carry run_provenance built by aiutils.run_manifest.build_run_provenance().
  • Evaluation reports record the feature parquet input(s), parsed eval hyperparameters, original argv, and report_target output path.
  • validate_ensemble_seeds.py verdict JSON files carry the same schema and record loso_dir, corpus_root, seed list, gate thresholds, and the PROMOTE.json / HOLD.json output path.
  • Do not restore per-script json.dumps(...).write_text(...) report writers when rebasing eval-script changes; use write_manifest_json() so the JSON shape and newline handling stay shared.

Touched files: ai/scripts/eval_loso_vmaf_tiny_v3.py, ai/scripts/eval_loso_vmaf_tiny_v4.py, ai/scripts/eval_loso_vmaf_tiny_v5.py, ai/scripts/eval_multiseed_v3_v4.py, ai/scripts/validate_ensemble_seeds.py, ai/tests/test_eval_report_run_provenance.py, ai/tests/test_validate_ensemble_seeds.py, docs/ai/training.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md, docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md, docs/ai/models/vmaf_tiny_v5.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0665-ai-eval-report-run-provenance.md, ai/src/aiutils/AGENTS.md, changelog.d/added/0665-ai-eval-report-run-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — Legacy AI eval report run provenance

Legacy eval/report provenance impact. This widens ADR-0661 adoption from the refreshed v3/v4/v5 eval family to older durable AI evaluation reports.

Key invariants:

  • eval_loso_mlp_small.py and eval_loso_3arch.py JSON reports carry run_provenance built by aiutils.run_manifest.build_run_provenance().
  • eval_probabilistic_proxy.py --metrics-out writes the same block with the ensemble manifest input, optional held-out parquet, and metrics output path.
  • eval_saliency_per_mb.py writes the same block for CLI output, recording the predicted and ground-truth mask directories plus block settings.
  • Do not restore direct json.dump() writers for these durable reports when rebasing old eval-script changes; use write_manifest_json() for stable sorting and newline handling.
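
The reason for routing every report through one writer is byte-stable output. The sketch below is a stand-in, not the repo's write_manifest_json(): write_sorted_json() is a hypothetical name and the indent width is an assumption.

```python
import json
from pathlib import Path


def write_sorted_json(path, payload):
    """Write JSON with sorted keys and exactly one trailing newline."""
    text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Two reports with the same content then produce identical files regardless of dict insertion order, which keeps diffs and hashes meaningful.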

Touched files: ai/scripts/eval_loso_mlp_small.py, ai/scripts/eval_loso_3arch.py, ai/scripts/eval_probabilistic_proxy.py, ai/scripts/eval_saliency_per_mb.py, ai/tests/test_legacy_eval_report_run_provenance.py, ai/tests/test_eval_saliency_per_mb.py, docs/ai/training.md, docs/ai/loso-eval.md, docs/ai/saliency-per-mb-eval.md, docs/ai/models/fr_regressor_v2_probabilistic.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0666-ai-legacy-eval-report-provenance.md, changelog.d/added/0666-ai-legacy-eval-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — Predictor v2 real-corpus report provenance

Predictor-v2 report provenance impact. This widens ADR-0661 adoption to the per-codec real-corpus gate report used before predictor-v2 model-card updates.

Key invariants:

  • ai/scripts/train_predictor_v2_realcorpus.py writes runs/predictor_v2_realcorpus/report.json with a run_provenance block built by aiutils.run_manifest.build_run_provenance().
  • The report records the trainer entrypoint, original argv, parsed arguments, explicit corpus files, corpus roots, resolved JSONL files, and report target.
  • Keep ADR-0303 gate constants untouched; provenance makes failed or insufficient reports reproducible, but it does not change pass/fail logic.
  • Do not restore direct Path.write_text(json.dumps(...)) report output here; use write_manifest_json() so JSON sorting and trailing-newline behavior stay shared with the other AI provenance reports.

Touched files: ai/scripts/train_predictor_v2_realcorpus.py, ai/tests/test_train_predictor_v2_realcorpus.py, docs/ai/predictor-v2-realcorpus-training.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0667-predictor-v2-report-provenance.md, changelog.d/added/0667-predictor-v2-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — vmaf_tiny train stats provenance

vmaf_tiny training provenance impact. This widens ADR-0661 adoption to the pre-export stats JSON files emitted by the vmaf_tiny trainer family.

Key invariants:

  • train_vmaf_tiny_v2.py, train_vmaf_tiny_v3.py, train_vmaf_tiny_v4.py, and train_vmaf_tiny_v5.py write their --out-stats JSON with a run_provenance block built by aiutils.run_manifest.build_run_provenance().
  • v2/v3/v4 stats record the parquet input, checkpoint target, stats target, argv, and parsed hyperparameters.
  • v5 stats record both parquet_base and parquet_extra, plus checkpoint and stats output targets.
  • Do not restore direct Path.write_text(json.dumps(...)) stats output here; use write_manifest_json() so JSON sorting and trailing-newline behavior stay shared with the other AI provenance reports.

Touched files: ai/scripts/train_vmaf_tiny_v2.py, ai/scripts/train_vmaf_tiny_v3.py, ai/scripts/train_vmaf_tiny_v4.py, ai/scripts/train_vmaf_tiny_v5.py, ai/tests/test_vmaf_tiny_train_run_provenance.py, docs/ai/training.md, docs/ai/models/vmaf_tiny_v2.md, docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md, docs/ai/models/vmaf_tiny_v5.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0668-vmaf-tiny-train-stats-provenance.md, changelog.d/added/0668-vmaf-tiny-train-stats-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — AI materializer audit provenance

Materializer audit provenance impact. This widens ADR-0661 adoption to feature-table materializers and signal-mix audit reports that feed retraining or model-mix decisions.

Key invariants:

  • materialize_mos_labels.py --audit-json and materialize_second_opinion_features.py --audit-json include run_provenance in their audit JSON outputs.
  • materialize_saliency_features.py --audit-json writes row counters, the effective config, and run_provenance; use it for saliency-enriched tables that feed retraining.
  • signal_mix_audit.py --out-json includes run_provenance for audited table paths, thresholds, argv, JSON output, and Markdown output.
  • Do not reintroduce bespoke path hashing or direct audit JSON writers on these surfaces; use aiutils.run_manifest.build_run_provenance() and write_manifest_json().

Touched files: ai/scripts/materialize_mos_labels.py, ai/scripts/materialize_second_opinion_features.py, ai/scripts/materialize_saliency_features.py, ai/scripts/signal_mix_audit.py, ai/tests/test_materialize_mos_labels.py, ai/tests/test_second_opinion_features.py, ai/tests/test_materialize_saliency_features.py, ai/tests/test_signal_mix_audit.py, docs/ai/training.md, docs/ai/mos-label-materializer.md, docs/ai/second-opinion-features.md, docs/ai/saliency-feature-materializer.md, docs/ai/signal-mix-audit.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0669-ai-materializer-audit-provenance.md, changelog.d/added/0669-ai-materializer-audit-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — Ensemble seed export provenance

Ensemble export provenance impact. This widens ADR-0661 adoption to the production seed exporter for fr_regressor_v2_ensemble_v1_seed* sidecars.

Key invariants:

  • ai/scripts/export_ensemble_v2_seeds.py builds one run_provenance block per invocation with the corpus, PROMOTE verdict, parsed export args, argv, per-seed ONNX/sidecar targets, and optional registry target.
  • Each fresh fr_regressor_v2_ensemble_v1_seed{N}.json sidecar receives that block.
  • Sidecar and optional registry writes use write_manifest_json() so the JSON formatting contract matches the other ADR-0661 adopters.

Touched files: ai/scripts/export_ensemble_v2_seeds.py, ai/tests/test_export_ensemble_v2_seeds_provenance.py, docs/ai/training.md, docs/ai/models/fr_regressor_v2_probabilistic.md, docs/ai/ensemble-training-kit.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0670-ensemble-seed-export-provenance.md, changelog.d/added/0670-ensemble-seed-export-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — Ensemble LOSO report provenance

Ensemble LOSO provenance impact. This widens ADR-0661 adoption to the per-seed loso_seed{N}.json reports emitted by the production ensemble LOSO trainer.

Key invariants:

  • ai/scripts/train_fr_regressor_v2_ensemble_loso.py writes each loso_seed{N}.json through write_manifest_json().
  • Each report includes run_provenance with the trainer entrypoint, original argv, parsed training args, corpus JSONL input, and per-seed report target.
  • The existing gate keys (mean_plcc, min_plcc, max_plcc, folds, and seed metadata) remain unchanged for scripts/ci/ensemble_prod_gate.py and ai/scripts/validate_ensemble_seeds.py.

Touched files: ai/scripts/train_fr_regressor_v2_ensemble_loso.py, ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py, docs/ai/training.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md, docs/ai/ensemble-training-kit.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0671-ensemble-loso-report-provenance.md, changelog.d/added/0671-ensemble-loso-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — vmaf-train CLI report provenance

vmaf-train report provenance impact. This widens ADR-0661 adoption to the user-facing vmaf-train --json report surfaces.

Key invariants:

  • ai/src/vmaf_train/cli.py uses _write_cli_report_json() for durable report commands that accept --json.
  • Covered subcommands: validate-norm, profile, audit-learned-filter, quantize-int8, cross-backend, and bisect-model-quality.
  • The provenance block records the CLI entrypoint, argv, parsed options, model/feature/calibration/frame inputs, JSON report output, and generated model output where a command writes one.

Touched files: ai/src/vmaf_train/cli.py, ai/tests/test_tune_cli.py, docs/usage/vmaf-train.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0672-vmaf-train-cli-report-provenance.md, changelog.d/added/0672-vmaf-train-cli-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — Feature-correlation report provenance

Feature-correlation provenance impact. This widens ADR-0661 adoption to the feature-ranking report emitted by ai/scripts/feature_correlation.py.

Key invariants:

  • feature_correlation.py --out writes the JSON report through write_manifest_json().
  • The report includes run_provenance with the analyzer entrypoint, original argv, parsed target / redundancy / top-K arguments, source parquet input, and JSON report target.
  • The analytic payload keys (pearson, redundant_pairs, importances, per_method_topk, and consensus_topk) remain unchanged for downstream research/audit readers.

Touched files: ai/scripts/feature_correlation.py, ai/tests/test_feature_correlation.py, docs/research/0027-phase2-feature-importance.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0673-feature-correlation-report-provenance.md, changelog.d/added/0673-feature-correlation-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — Phase-3 subset-sweep report provenance

Phase-3 sweep provenance impact. This widens ADR-0661 adoption to the model-selection JSON emitted by ai/scripts/phase3_subset_sweep.py.

Key invariants:

  • phase3_subset_sweep.py --out writes the JSON report through write_manifest_json().
  • The report keeps existing subset result keys and adds top-level run_provenance with the analyzer entrypoint, original argv, parsed subset / seed / standardization arguments, source parquet input, and JSON report target.
  • The subset result payload (features, per_seed, summary) remains unchanged for each requested subset.

Touched files: ai/scripts/phase3_subset_sweep.py, ai/tests/test_phase3_subset_sweep.py, docs/research/0028-phase3-subset-sweep.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0674-phase3-subset-report-provenance.md, changelog.d/added/0674-phase3-subset-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — Quantisation report provenance

Quantisation provenance impact. This widens ADR-0661 adoption to the int8 producer/gate scripts used for model-card promotion evidence.

Key invariants:

  • ai/scripts/ptq_dynamic.py --report-out and ai/scripts/ptq_static.py --report-out write JSON reports with fp32/int8 sizes, selected quantisation settings, and run_provenance.
  • ai/scripts/qat_train.py --report-out writes QAT output/report metadata for the fp32 bridge and final int8 ONNX artifact.
  • ai/scripts/measure_quant_drop.py --out-json preserves per-model gate rows and run_provenance without changing stdout or exit codes.

Touched files: ai/scripts/ptq_dynamic.py, ai/scripts/ptq_static.py, ai/scripts/qat_train.py, ai/scripts/measure_quant_drop.py, ai/tests/test_ptq_scripts.py, ai/tests/test_qat_smoke.py, docs/ai/quantization.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0682-quantization-report-provenance.md, changelog.d/added/0682-quantization-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0661 follow-up — CHUG extraction report provenance

CHUG extraction provenance impact. This widens ADR-0661 adoption to the local CHUG split manifest and HDR metadata audit JSON emitted before HDR MOS training.

Key invariants:

  • ai/scripts/chug_extract_features.py --split-manifest keeps the existing content-level split payload and adds top-level run_provenance.
  • ai/scripts/chug_extract_features.py --audit-output keeps the existing HDR audit counters/malformed-row payload and adds top-level run_provenance.
  • Feature JSONL rows are unchanged; this PR only stamps the durable split/audit JSON evidence with extractor command and input/output context.

Touched files: ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, docs/ai/chug-ingestion.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0684-chug-extraction-report-provenance.md, changelog.d/added/0684-chug-extraction-report-provenance.md, docs/rebase-notes.md (this entry).

ADR-0668 — AI derived table provenance

Derived-table provenance impact. This extends the ADR-0661 manifest pattern from trainer/report JSONs down to the local FULL_FEATURES parquet builders that feed refreshed AI models.

Key invariants:

  • ai/scripts/extract_k150k_features.py writes <out>.manifest.json by default with feature order, CPU/CUDA extractor split, restart counters, backend worker counts, parquet row count, and shared run_provenance.
  • ai/scripts/combine_full_feature_parquets.py writes <out>.manifest.json by default with input labels, per-input row counts, missing-feature fill lists, corpus distribution, output column order, and shared run_provenance.
  • ai/scripts/enrich_k150k_parquet_metadata.py writes <out>.manifest.json by default with metadata match/update counters, available metadata keys, overwrite policy, and shared run_provenance.
  • Existing parquet row schemas are unchanged; the manifest is a sibling local evidence artifact.

Touched files: ai/scripts/extract_k150k_features.py, ai/scripts/combine_full_feature_parquets.py, ai/scripts/enrich_k150k_parquet_metadata.py, ai/tests/test_extract_k150k_features.py, ai/tests/test_combine_full_feature_parquets.py, ai/tests/test_enrich_k150k_parquet_metadata.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/chug-ingestion.md, docs/adr/0668-ai-derived-table-provenance.md, docs/research/0688-ai-derived-table-provenance.md, changelog.d/added/0668-ai-derived-table-provenance.md, docs/rebase-notes.md (this entry).

ADR-0669 — AI corpus JSONL provenance

Corpus-JSONL provenance impact. This extends the ADR-0661 manifest pattern to the corpus JSONL boundary before trainers consume merged or aggregated row streams.

Key invariants:

  • ai/scripts/aggregate_corpora.py writes <output>.manifest.json by default with MOS scale conversions, optional corpus-source overrides, aggregate counters, and shared run_provenance.
  • ai/scripts/merge_corpora.py writes <output>.manifest.json by default with required vmaf-tune corpus keys, natural dedup key, merge counters, and shared run_provenance.
  • JSONL row schemas are unchanged; run-level evidence belongs in the sidecar.
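
A minimal sketch of the sidecar idea, assuming that `<output>.manifest.json` means the suffix is appended to the full output file name (the real scripts may derive the path differently). The function names and the manifest contents are hypothetical. Rows go to the JSONL file unchanged; run-level evidence goes only to the sibling manifest.

```python
import json
import tempfile
from pathlib import Path


def sidecar_manifest_path(output_path):
    """Return '<output>.manifest.json' next to the output file (assumed naming)."""
    output_path = Path(output_path)
    return output_path.with_name(output_path.name + ".manifest.json")


def write_jsonl_with_sidecar(output_path, rows, run_summary):
    """Write rows as JSONL and the run-level summary as a sibling manifest."""
    output_path = Path(output_path)
    with output_path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, sort_keys=True) + "\n")
    manifest_path = sidecar_manifest_path(output_path)
    manifest_path.write_text(
        json.dumps(run_summary, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return manifest_path


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "merged.jsonl"
    rows = [{"clip": "a", "mos": 3.5}, {"clip": "b", "mos": 4.1}]  # toy rows
    manifest = write_jsonl_with_sidecar(out, rows, {"merge_counters": {"rows": 2}})
    written_rows = [json.loads(line) for line in out.read_text().splitlines()]
    manifest_name = manifest.name
```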

Touched files: ai/scripts/aggregate_corpora.py, ai/scripts/merge_corpora.py, ai/tests/test_aggregate_corpora.py, ai/tests/test_merge_corpora.py, ai/AGENTS.md, docs/ai/mos-corpora.md, docs/ai/multi-corpus-aggregation.md, docs/ai/training.md, docs/adr/0669-ai-corpus-jsonl-provenance.md, docs/research/0689-ai-corpus-jsonl-provenance.md, changelog.d/added/0669-ai-corpus-jsonl-provenance.md, docs/rebase-notes.md (this entry).

ADR-0670 — AI legacy corpus extraction manifests

Legacy trainer-input provenance impact. This extends the ADR-0661 manifest pattern to older corpus/extraction scripts that directly create local trainer-input parquets or vmaf-tune JSONL.

Key invariants:

  • ai/scripts/extract_full_features.py writes <out>.manifest.json by default with Netflix corpus/cache inputs, VMAF binary evidence, feature list, pair count, row count, and shared run_provenance.
  • ai/scripts/konvid_to_vmaf_pairs.py writes <out>.manifest.json by default with KoNViD root, VMAF/model inputs, cache policy, CRF, feature list, clip / frame counters, failed clip IDs, and shared run_provenance.
  • ai/scripts/bvi_dvc_to_corpus_jsonl.py writes <output>.manifest.json by default with cache inputs, row schema version, adapter labels, row/cache counters, and shared run_provenance.
  • BVI-DVC JSONL rows must include the current vmaf-tune v3 additive keys; unavailable HDR, shot, canonical-feature aggregate, and encoder-internal values are explicit defaults, not missing columns.

Touched files: ai/scripts/extract_full_features.py, ai/scripts/konvid_to_vmaf_pairs.py, ai/scripts/bvi_dvc_to_corpus_jsonl.py, ai/tests/test_legacy_corpus_extraction_manifests.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/mos-corpora.md, docs/adr/0670-ai-legacy-corpus-extraction-manifests.md, docs/research/0690-ai-legacy-corpus-extraction-manifests.md, changelog.d/added/0670-ai-legacy-corpus-extraction-manifests.md, docs/rebase-notes.md (this entry).

ADR-0673 — Saliency materializer batch manifest

Saliency table refresh impact. This adds a batch orchestration layer over ai/scripts/materialize_saliency_features.py. Rebase work that changes SaliencyMaterializeConfig, saliency status values, or table read/write semantics must update both the single-table script and the batch manifest runner/tests together.

Key invariants:

  • ai/scripts/batch_materialize_saliency_features.py must import and reuse the single-table materializer functions; it must not duplicate FFmpeg decode, ffprobe fallback, saliency inference, or row status semantics.
  • Batch manifests carry shared defaults plus per-table overrides. Relative paths resolve from the manifest directory unless --base-dir is supplied.
  • Batch reports use schema saliency-materializer-batch-v1 and include ADR-0661 run_provenance.
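
The path-resolution and override rules for batch manifests can be illustrated roughly like this. The function names and the manifest keys are hypothetical, and PurePosixPath is used only to keep the example platform-independent; the real runner may behave differently in edge cases.

```python
from pathlib import PurePosixPath


def resolve_manifest_path(raw, manifest_file, base_dir=None):
    """Resolve a path from a batch manifest.

    Absolute paths pass through. Relative paths resolve from --base-dir when
    it is supplied, otherwise from the directory that holds the manifest.
    """
    path = PurePosixPath(raw)
    if path.is_absolute():
        return path
    root = PurePosixPath(base_dir) if base_dir is not None else PurePosixPath(manifest_file).parent
    return root / path


def effective_table_config(defaults, table):
    """Shared defaults first, then per-table overrides win on conflicts."""
    merged = dict(defaults)
    merged.update(table)
    return merged


manifest_file = "/data/batches/saliency.json"
from_manifest_dir = resolve_manifest_path("tables/a.parquet", manifest_file)
from_base_dir = resolve_manifest_path("tables/a.parquet", manifest_file, base_dir="/mnt/corpus")
config = effective_table_config({"device": "cpu", "overwrite": False}, {"overwrite": True})
```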

Touched files: ai/scripts/batch_materialize_saliency_features.py, ai/tests/test_batch_materialize_saliency_features.py, ai/AGENTS.md, docs/ai/saliency-feature-materializer.md, docs/adr/0673-saliency-materializer-batch-manifest.md, docs/research/0693-saliency-materializer-batch-manifest.md, changelog.d/added/0673-saliency-materializer-batch-manifest.md, docs/rebase-notes.md (this entry).

ADR-0674 — Second-opinion materializer batch manifest

Second-opinion table refresh impact. This adds a batch orchestration layer over ai/scripts/materialize_second_opinion_features.py. Rebase work that changes score-sidecar parsing, join-key policy, missing-score semantics, or run-provenance fields must update both the single-table joiner and the batch manifest runner/tests together.

Key invariants:

  • ai/scripts/batch_materialize_second_opinion_features.py must import and reuse materialize_second_opinion_features.materialize(); external scorer execution remains outside this repo.
  • Batch manifests carry shared defaults plus per-table overrides. Relative paths resolve from the manifest directory unless --base-dir is supplied.
  • Batch reports use schema second-opinion-materializer-batch-v1 and include ADR-0661 run_provenance.

Touched files: ai/scripts/batch_materialize_second_opinion_features.py, ai/tests/test_batch_materialize_second_opinion_features.py, ai/AGENTS.md, docs/ai/second-opinion-features.md, docs/adr/0674-second-opinion-materializer-batch-manifest.md, docs/research/0694-second-opinion-materializer-batch-manifest.md, changelog.d/added/0674-second-opinion-materializer-batch-manifest.md, docs/rebase-notes.md (this entry).

ADR-0675 — MOS label materializer batch manifest

MOS-labelled table refresh impact. This adds a batch orchestration layer over ai/scripts/materialize_mos_labels.py. Rebase work that changes MOS column inference, key-normalisation, match-rate enforcement, overwrite policy, or run-provenance fields must update both the single-table materializer and the batch manifest runner/tests together.

Key invariants:

  • ai/scripts/batch_materialize_mos_labels.py must import and reuse materialize_mos_labels.materialize(); it must not parse MOS rows, extract features, or train models.
  • Batch manifests carry shared defaults plus per-table overrides. Relative paths resolve from the manifest directory unless --base-dir is supplied.
  • Batch reports use schema mos-label-materializer-batch-v1 and include ADR-0661 run_provenance.

Touched files: ai/scripts/batch_materialize_mos_labels.py, ai/tests/test_batch_materialize_mos_labels.py, ai/AGENTS.md, docs/ai/mos-label-materializer.md, docs/adr/0675-mos-label-materializer-batch-manifest.md, docs/research/0695-mos-label-materializer-batch-manifest.md, changelog.d/added/0675-mos-label-materializer-batch-manifest.md, docs/rebase-notes.md (this entry).

ADR-0679 — CI draft auto-merge gate

Merge-train safety impact. The single required branch-protection context, Required Checks Aggregator, must not be skipped on draft PRs. It now runs on drafts and fails intentionally so GitHub cannot treat a draft-era skipped check as sufficient for auto-merge after the PR is marked ready.

Key invariants:

  • Keep required-aggregator.yml free of a job-level draft skip. Expensive sibling workflows may still skip draft PRs, but the required aggregate status must fail drafts and rerun on ready_for_review.
  • The aggregator ignores sibling check runs older than the current workflow registration window and chooses the newest run per check name. This prevents stale draft-era skipped checks on the same SHA from masking ready-run checks.
  • The ADR collision guard phase 1 compares against BASE_SHA, not live origin/master, so a fast post-merge workflow cannot self-collide against the PR's own ADR.
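
The "newest run per check name, ignoring runs older than the window" rule from the second invariant could look roughly like the sketch below. It is not the workflow's actual script: the function name is made up, and the only assumption about the input is that each check run is a dict with `name`, `started_at`, and `conclusion` fields, as in GitHub's check-runs API responses.

```python
from datetime import datetime, timezone


def _parse_ts(value):
    """Parse an ISO-8601 timestamp, accepting the trailing 'Z' GitHub uses."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_current_runs(check_runs, window_start):
    """Keep the newest run per check name, dropping runs older than the window."""
    newest = {}
    for run in check_runs:
        started = _parse_ts(run["started_at"])
        if started < window_start:
            continue  # stale, e.g. a skipped run from the draft period
        kept = newest.get(run["name"])
        if kept is None or started > _parse_ts(kept["started_at"]):
            newest[run["name"]] = run
    return newest


runs = [
    {"name": "build", "started_at": "2026-05-21T10:00:00Z", "conclusion": "skipped"},
    {"name": "build", "started_at": "2026-05-21T12:00:00Z", "conclusion": "success"},
    {"name": "lint", "started_at": "2026-05-21T10:30:00Z", "conclusion": "skipped"},
]
window_start = datetime(2026, 5, 21, 11, 0, tzinfo=timezone.utc)
current = select_current_runs(runs, window_start)
```

Here the draft-era skipped `build` run on the same SHA no longer counts, and `lint` has no run inside the window at all, so an aggregator built this way would treat it as missing rather than as passed.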

Touched files: .github/workflows/required-aggregator.yml, .github/workflows/rule-enforcement.yml, .github/AGENTS.md, docs/adr/0679-ci-draft-automerge-gate.md, docs/research/0699-ci-draft-automerge-gate.md, changelog.d/fixed/0679-ci-draft-automerge-gate.md, docs/rebase-notes.md (this entry).

ADR-0680 — Shared AI CLI helper pattern

AI script helper impact. Batch manifest runners now share parser and raw argv boilerplate through aiutils.cli_helpers. Rebase work that changes the standard manifest/report/fail-fast flags should update the helper and all batch runner tests together instead of editing each runner independently.

Key invariants:

  • collect_cli_argv() is the canonical raw-argument capture for ADR-0661 provenance in scripts that accept an injectable argv.
  • add_batch_manifest_arguments() owns --manifest, --base-dir, --report-json, --report-md, --fail-fast, and optional --allow-row-failures for batch manifest runners.
  • Table-specific manifest schemas and materializer semantics stay in the individual runner modules.
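
For orientation, a shared helper of this kind might look like the sketch below. The two function names and the flag names come from the invariants above; the signatures, defaults, and bodies are assumptions made for illustration and may not match ai/src/aiutils/cli_helpers.py.

```python
import argparse
import sys


def collect_cli_argv(argv=None):
    """Return the raw arguments as a list, falling back to sys.argv[1:]."""
    return list(sys.argv[1:] if argv is None else argv)


def add_batch_manifest_arguments(parser, allow_row_failures=False):
    """Register the flags shared by batch manifest runners (assumed defaults)."""
    parser.add_argument("--manifest", required=True)
    parser.add_argument("--base-dir", default=None)
    parser.add_argument("--report-json", default=None)
    parser.add_argument("--report-md", default=None)
    parser.add_argument("--fail-fast", action="store_true")
    if allow_row_failures:
        parser.add_argument("--allow-row-failures", action="store_true")
    return parser


def parse_batch_args(argv=None, allow_row_failures=False):
    """Capture raw argv for provenance, then parse the same list."""
    raw_argv = collect_cli_argv(argv)
    parser = add_batch_manifest_arguments(
        argparse.ArgumentParser(), allow_row_failures=allow_row_failures
    )
    return raw_argv, parser.parse_args(raw_argv)


raw, args = parse_batch_args(["--manifest", "batch.json", "--fail-fast"])
```

Capturing the raw list before parsing is what lets a runner record the original argv in its provenance block even when tests inject the arguments.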

Touched files: ai/src/aiutils/cli_helpers.py, ai/scripts/batch_materialize_saliency_features.py, ai/scripts/batch_materialize_second_opinion_features.py, ai/scripts/batch_materialize_mos_labels.py, ai/tests/test_cli_helpers.py, ai/AGENTS.md, ai/src/aiutils/AGENTS.md, .claude/skills/ai-run-manifest/SKILL.md, docs/ai/training.md, docs/adr/0680-ai-cli-helper-pattern.md, docs/research/0700-ai-cli-helper-pattern.md, changelog.d/added/0680-ai-cli-helper-pattern.md, docs/rebase-notes.md (this entry).

ADR-0681 — AI script bootstrap helper

AI script import impact. Directly executable ai/scripts/*.py files now use ai/scripts/_script_bootstrap.py::bootstrap_ai_script(__file__) for repo-local imports before they import aiutils, sibling materializers, or vmaf-tune helpers.

Key invariants:

  • aiutils must remain free of startup path mutation; the bootstrap lives in ai/scripts because it has to run before ai/src is importable.
  • New ad hoc sys.path.insert(...) blocks in AI scripts should be avoided. If a script needs a new repo-local root, extend _script_bootstrap.py and ai/tests/test_script_bootstrap.py.
  • The helper only owns import roots; artifact schemas, materializer rules, and report contents stay in the individual scripts.
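
A bootstrap helper along these lines is one way to centralise the import-root setup. The function name matches the note above, but the body, the `extra_roots` parameter, and the assumed layout (`<repo>/ai/scripts/<script>.py`) are illustrative guesses, not the contents of _script_bootstrap.py.

```python
import sys
from pathlib import Path


def bootstrap_ai_script(script_file, extra_roots=("ai/src",)):
    """Put repo-local import roots on sys.path before other imports run.

    Assumes the calling script lives at <repo>/ai/scripts/<name>.py, so the
    repository root is two directories above the script's folder.
    """
    script_path = Path(script_file).resolve()
    repo_root = script_path.parents[2]
    added = []
    for rel in extra_roots:
        root = str(repo_root / rel)
        if root not in sys.path:
            sys.path.insert(0, root)
            added.append(root)
    return added


# Demo with a made-up script location; the directory does not need to exist.
first_call = bootstrap_ai_script("/nonexistent_repo/ai/scripts/demo.py")
second_call = bootstrap_ai_script("/nonexistent_repo/ai/scripts/demo.py")
```

The second call adds nothing, which is the property that makes a single shared helper safer than repeated ad hoc sys.path.insert(...) blocks.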

Touched files: ai/scripts/_script_bootstrap.py, ai/scripts/batch_materialize_saliency_features.py, ai/scripts/batch_materialize_second_opinion_features.py, ai/scripts/batch_materialize_mos_labels.py, ai/scripts/enrich_k150k_parquet_metadata.py, ai/scripts/combine_full_feature_parquets.py, ai/scripts/extract_k150k_features.py, ai/tests/test_script_bootstrap.py, ai/AGENTS.md, ai/src/aiutils/AGENTS.md, .claude/skills/ai-run-manifest/SKILL.md, docs/ai/training.md, docs/adr/0681-ai-script-bootstrap-helper.md, docs/research/0701-ai-script-bootstrap-helper.md, changelog.d/changed/0681-ai-script-bootstrap-helper.md, docs/rebase-notes.md (this entry).

fix/mcp-cjson-banned-functions (ADR-0683)

No upstream rebase impact: core/src/mcp/3rdparty/cJSON/ is fork-local; upstream Netflix/vmaf does not vendor cJSON. There is no rebase conflict risk from the Netflix side.

Invariant: if this directory is synced to a newer cJSON upstream release, verify that no banned functions (sprintf, strcpy) have been re-introduced, and re-apply the fixes documented in ADR-0683. The AGENTS.md in this directory carries the exact grep command to check.
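
As a rough illustration of the re-sync check (this is not the grep command recorded in the directory's AGENTS.md, which stays authoritative), a small scanner for the two banned calls named above could look like this. It is deliberately naive: it also reports matches inside comments and string literals.

```python
import re

BANNED = ("sprintf", "strcpy")
# \b keeps snprintf and strncpy from matching.
_CALL = re.compile(r"\b(" + "|".join(BANNED) + r")\s*\(")


def find_banned_calls(source):
    """Return (line_number, function_name) pairs for banned calls in C source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = (
    "snprintf(buf, sizeof(buf), \"%d\", n);\n"
    "strcpy(dst, src);\n"
    "strncpy(dst, src, len);\n"
)
hits = find_banned_calls(sample)
```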

Smoke: ninja -C build && meson test -C build --suite=fast (no dedicated cJSON unit test; the MCP smoke covers the JSON paths).

Touched files: core/src/mcp/3rdparty/cJSON/cJSON.c, core/src/mcp/3rdparty/cJSON/AGENTS.md, docs/adr/0683-cjson-banned-function-remediation.md, docs/adr/README.md, changelog.d/fixed/0683-mcp-cjson-banned-functions.md, docs/rebase-notes.md (this entry).

2026-05-21 follow-up — contract-noise filter widening

No upstream Netflix C-source rebase impact. This stays within ADR-0659's scanner-precision policy.

Key invariant: suppress only context-bound false positives: optional-backend contracts that name HAVE_*, enable_*=false, missing loader/runtime, or CPU fallback; unit-test stub prose; and ADR allocator .md.stub reservation wording. Non-implementation "stub" uses such as Python type-stub packages, driver-stub diagnostics, and ABI-pinning disabled-build stub comments are also filtered. Do not add broad file-level allowlists.

Touched files: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, docs/development/project-modernization-audit.md, docs/research/0685-modernization-audit-contract-noise.md, scripts/AGENTS.md, changelog.d/fixed/0685-modernization-audit-contract-noise.md, docs/rebase-notes.md (this entry).

ADR-0682 — Tiny-AI Netflix corpus training scaffold — 2026-05-22 prep scope

  • ADR: ADR-0682.
  • Upstream source: fork-local. Netflix/vmaf has no tiny-AI training surface.
  • Branch: ai/tiny-netflix-training-scaffold.

Key invariants:

  • Data path is local-only. .workingdir2/netflix/ is gitignored; YUV files are never committed. Every training script must accept --data-root (or the VMAF_DATA_ROOT environment variable) as the sole corpus entry point.

  • Branch name is the routine's idempotency key. Once ai/tiny-netflix-training-scaffold exists on origin, the daily prep-scaffolding routine exits silently. Do not rename or delete the branch until the follow-up architecture-selection PR has merged.
  • Netflix golden pairs are held-out only. The 3 pairs in python/test/resource/yuv/ (see CLAUDE.md §8) are correctness gates; they are never used as training data.
  • Architecture selection is deferred. ADR-0682 and ADR-0242 document the alternatives table but do not pick an architecture. The follow-up PR must resolve questions (A), (B), (C) from ADR-0242 before any training run.

Touched files: docs/adr/0682-tiny-ai-netflix-training-scaffold-2026-05-22.md, docs/adr/_index_fragments/0682-tiny-ai-netflix-training-scaffold-2026-05-22.md, docs/research/0706-tiny-ai-netflix-training-prep-2026-05-22.md, changelog.d/added/0682-tiny-ai-netflix-training-scaffold-2026-05-22.md, docs/rebase-notes.md (this entry).
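
The data-path invariant (a --data-root flag or the VMAF_DATA_ROOT environment variable as the sole corpus entry point) could be honoured with a resolver like the sketch below. The function name is hypothetical, and letting the flag take precedence over the environment variable is an assumption; the ADR may specify a different order.

```python
import argparse
import os
from pathlib import Path


def resolve_data_root(argv, environ=None):
    """Return the corpus root from --data-root, else from VMAF_DATA_ROOT."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--data-root", default=None)
    args, _unknown = parser.parse_known_args(argv)
    raw = args.data_root or environ.get("VMAF_DATA_ROOT")
    if not raw:
        raise SystemExit("pass --data-root or set VMAF_DATA_ROOT")
    return Path(raw)


from_flag = resolve_data_root(["--data-root", "/corpora/a"], {"VMAF_DATA_ROOT": "/corpora/b"})
from_env = resolve_data_root([], {"VMAF_DATA_ROOT": "/corpora/b"})
```

Failing fast when neither source is set avoids a script silently falling back to a hard-coded path.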

feat/bindings-rust-vmafx-sys (ADR-0706) — fork-only Rust crate, no Netflix upstream impact

No upstream rebase impact: bindings/rust/vmafx-sys, the root Cargo.toml, .github/workflows/rust-ci.yml, and docs/development/rust.md are wholly fork-local. Netflix/vmaf upstream has no Rust surface; upstream cherry-picks and port-upstream-commit syncs are unaffected. The libvmaf C public headers consumed by bindgen remain at core/include/libvmaf/ (ADR-0700 path); any future upstream header change that adds or removes a symbol is handled automatically by re-running cargo build (bindgen regenerates on every build).


feat/vmafx-phase4b-distributed-platform-adr-0709 — fork-only architectural decision, no Netflix upstream impact

No upstream rebase impact: ADR-0709 and the Phase 4b architecture diagram (docs/architecture/phase4b-distributed-platform.md) are wholly fork-local documents. Netflix/vmaf upstream has no controller/node/operator architecture, no Go or Rust binaries, and no rclone/eBPF integration. Upstream cherry-picks and port-upstream-commit syncs are unaffected.

The C ABI break decision (Phase 4b.8) will require updating ffmpeg-patches/ when the implementation PR lands; that PR's docs/rebase-notes.md entry will detail the specific patch files affected. This umbrella ADR does not touch any C source files.

Touched files: docs/adr/0709-vmafx-phase4b-distributed-platform.md, docs/architecture/phase4b-distributed-platform.md, changelog.d/added/vmafx-phase4b-umbrella-adr.md, docs/state.md, docs/rebase-notes.md (this entry), docs/adr/README.md.


docs/research-netflix-pipeline-backlog-audit (Research-0732) — research digest only, no Netflix upstream impact

No rebase impact: this PR adds only docs/research/0732-netflix-pipeline-backlog-audit.md and a changelog fragment. No C sources, headers, build files, or test fixtures are touched. Netflix/vmaf upstream cherry-picks and port-upstream-commit syncs are unaffected.

Touched files: docs/research/0732-netflix-pipeline-backlog-audit.md, changelog.d/added/0732-netflix-pipeline-backlog-audit.md, docs/state.md, docs/rebase-notes.md (this entry).


refactor/cpp23-pilot-metadata-handler — no upstream Netflix conflict

No rebase impact. The metadata_handler.c conversion is a fork-local refactor: Netflix/vmaf upstream still has the file at libvmaf/src/metadata_handler.c (the pre-rename path), and the rename to .cpp is fork-local (upstream stays .c). If an upstream commit touches libvmaf/src/metadata_handler.c, the port must:

  1. Apply the upstream diff content to core/src/metadata_handler.cpp manually (the C code is still valid C++ after the conversion).
  2. Verify the extern "C" guards in metadata_handler.h are not disturbed.
  3. Rebuild and re-run make test-netflix-golden to confirm scores unchanged.

The meson.build change (replacing the src_dir + 'metadata_handler.c' entry with the metadata_handler_cpp20_lib static lib) is entirely fork-local and has no upstream equivalent.

Touched files: core/src/metadata_handler.cpp (was metadata_handler.c), core/src/metadata_handler.h (added extern "C" guards), core/src/meson.build (isolated static lib for C++20), core/test/meson.build (updated .c -> .cpp references), docs/adr/0708-vmafx-cpp23-internals-pilot.md, docs/research/0732-vmafx-cpp23-internals-migration-plan.md, changelog.d/changed/0708-cpp23-internals-pilot.md, docs/state.md (this entry), docs/rebase-notes.md (this entry).

ADR-0707 — TAD Rust pilot (cbindgen integration) — 2026-05-28

  • ADR: ADR-0707.
  • Upstream source: fork-local. Netflix/vmaf has no Rust feature extractors.
  • Branch: feat/tad-rust-pilot

Key rebase invariants:

  1. core/src/feature/feature_extractor.c gains #if HAVE_RUST_TAD guards around the vmaf_fex_tad extern and list entry. On upstream sync, ensure these guards are preserved; do not merge the upstream version of this file without re-applying the guards.
  2. core/src/meson.build has a cargo build --release custom_target and a declare_dependency for the Rust archive. These are entirely fork-local additions; upstream's meson.build will not have them. The additions appear after the libvmaf_feature_sources list and before the libvmaf = library() call.
  3. tad_rust.c is compiled as a DIRECT source of the libvmaf library target (not into libvmaf_feature.a). This is an intentional architectural choice; do not move it into libvmaf_feature_sources on rebase.
  4. Cargo.toml at the repo root is the workspace manifest. Upstream will never have this file; no merge conflict expected.
  5. The enable_rust_features meson option in core/meson_options.txt is fork-local; preserve on upstream merges.

Touched files: Cargo.toml (repo root, new), core/meson_options.txt, core/src/meson.build, core/src/feature/feature_extractor.c, core/src/feature/tad_rust.c (new), core/src/feature/rust/tad/ (new crate directory), core/test/meson.build, core/test/test_tad_rust.c (new), docs/adr/0707-vmafx-rust-pilot-feature.md, docs/metrics/tad.md, changelog.d/added/tad-rust-pilot.md,

CAMBI Python compat-layer sync v0.5 → v0.8 — 2026-05-28

  • ADR: no ADR required — 1:1 upstream port with no fork-local divergence.
  • Upstream source: Netflix/vmaf CambiFeatureExtractor version history through v0.8 (Research-0732 item #4).
  • Branch: chore/cambi-python-v0.8-sync

Rebase notes: The fork is now at parity with upstream Netflix/vmaf for the Python CAMBI wrappers as of 2026-05-28. Future Netflix syncs of compat/python-vmaf/core/cambi_feature_extractor.py, compat/python-vmaf/core/cambi_quality_runner.py, and python/test/cambi_test.py should merge cleanly. No fork-local divergence was introduced; this was a pure upstream port.

Touched files: compat/python-vmaf/core/cambi_feature_extractor.py, compat/python-vmaf/core/cambi_quality_runner.py, python/test/cambi_test.py, changelog.d/changed/cambi-python-v0.8-sync.md,


vmafx-node Go worker binary (ADR-0713)

No rebase impact: fork-only addition. All new files live under cmd/vmafx-node/, pkg/gpu/, pkg/ai/, gen/go/controller/, docker/Dockerfile.node*, and deploy/helm/vmafx/templates/node.yaml. No C sources and no upstream-mirror files are touched. pkg/encoder/discover.go and pkg/encoder/hardware.go are new fork-local files; pkg/encoder/encoder.go (already fork-local from ADR-0705) is not modified.

Files added: cmd/vmafx-node/main.go (new), cmd/vmafx-node/executor.go (new), cmd/vmafx-node/main_test.go (new), gen/go/controller/controller.pb.go (new), gen/go/controller/controller_grpc.pb.go (new), pkg/gpu/detect.go (new), pkg/gpu/detect_test.go (new), pkg/ai/infer.go (new), pkg/ai/infer_test.go (new), pkg/encoder/discover.go (new), pkg/encoder/hardware.go (new), docker/Dockerfile.node (new), docker/Dockerfile.node-cpu (new), docker/Dockerfile.node-cuda12 (new), docker/Dockerfile.node-rocm6 (new), docker/Dockerfile.node-sycl-oneapi2026 (new), deploy/helm/vmafx/templates/node.yaml (new), deploy/helm/vmafx/templates/_helpers.tpl (extended), deploy/helm/vmafx/values.yaml (extended — .Values.node section added), docs/server/node.md (new), docs/adr/0713-vmafx-node-impl.md (new), changelog.d/added/vmafx-node.md (new).


Research-0733 — VMAFX eBPF optimization target — 2026-05-28

No rebase impact: docs-only PR. All touched files (docs/research/, changelog.d/, docs/state.md, docs/rebase-notes.md) are fork-local with no upstream Netflix/vmaf equivalent. No C source, no build system, no test assertions changed.

Touched files: docs/research/0733-vmafx-ebpf-optimization-target.md (new), changelog.d/changed/ebpf-research.md (new), docs/state.md (new row), docs/rebase-notes.md (this entry).


cmd/vmafx-operator — Kubernetes Operator kubebuilder skeleton (ADR-0714)

No rebase impact on upstream C/Python code: the operator is entirely fork-local (api/vmafx/v1/, cmd/vmafx-operator/, config/crd/, config/rbac/, deploy/helm/vmafx/crds/, deploy/helm/vmafx/templates/operator-*.yaml, go.mod, go.sum). None of these paths overlap with Netflix/vmaf upstream.

If a future upstream sync adds a Go module or touches go.mod, merge the dependency lists in go.mod and regenerate go.sum.

Fork-local files: api/vmafx/v1/ (new), cmd/vmafx-operator/ (new), config/crd/bases/ (new), config/rbac/role.yaml (new), deploy/helm/vmafx/crds/ (new), deploy/helm/vmafx/templates/operator-deployment.yaml (new), deploy/helm/vmafx/templates/operator-rbac.yaml (new), deploy/helm/vmafx/values.yaml (operator.* section added), docs/adr/0714-vmafx-operator-skeleton.md, docs/development/operator.md, changelog.d/added/vmafx-operator-skeleton.md,


core/src/feature/cuda/AGENTS.md: __mul24 prohibition invariant (Research-0734, 2026-05-28)

The 2026-05-28 audit confirmed zero __mul24 / __umul24 / __mul24hi usages in the fork's CUDA kernel tree. A prohibition invariant was added to core/src/feature/cuda/AGENTS.md. On upstream sync: if Netflix/vmaf ever adds a CUDA kernel that uses these intrinsics, the prohibition invariant requires the caller to either remove the intrinsic (replace with *) or obtain CODEOWNERS sign-off documenting the minimum-CUDA-13.3 constraint (see the AGENTS.md note for the full acceptance criteria).

No upstream file is currently in conflict; this note exists to alert future sync agents that the invariant file was intentionally added by the fork and should be preserved through rebases.


Research-0734 — CUDA 13.3 fix-list deep audit

No rebase impact on upstream C/Python code: this PR is docs-only (research digest, changelog fragment, state.md row, rebase-notes entry). No C source, .cu kernel, or build file is modified.

If a future upstream sync changes dev/Containerfile or Dockerfile CUDA base-image pins, verify that the new pin is >= 13.3 to ensure the NVCC thread-reconvergence fix [6156910] is included.

Fork-local files: docs/research/0734-cuda-13.3-fix-list-deep-audit.md (new), changelog.d/changed/cuda-13.3-fix-list-deep-audit.md (new), docs/state.md (new row), docs/rebase-notes.md (this entry).


scripts/dev/cleanup-agent-state.sh — agent-state cleanup utility

No rebase impact on upstream C/Python code: the script is entirely fork-local developer tooling that touches no compiled sources, tests, or public API.

If a future upstream sync adds a scripts/dev/ directory, merge manually (name collision is the only risk; no logic conflict).

Fork-local files added: scripts/dev/cleanup-agent-state.sh (new), docs/development/agent-worktree-discipline.md (cleanup section added), changelog.d/added/dev-cleanup-script.md (new).


Research-0734 — CUDA VIF filter1d ncu hotpath (no rebase impact)

No rebase impact: pure research digest; no source files modified.

Research-0744 cross-backend baseline (2026-05-28) — no rebase impact

This PR adds only docs/research/0744-cuda-cross-backend-baseline-pre-ncu-perf.md, changelog.d/perf/cuda-cross-backend-baseline.md, and a docs/state.md row. No C, header, Python, or build files are modified. No upstream sync action is required.

docs/research/0734–0738 — CUDA ADM/motion/SSIM/MS-SSIM ncu hotpath profiles (2026-05-28)

No rebase impact: research-only documents, no source code changes. The profiling findings (Research-0734 through 0738) are advisory; no kernel modifications were made in this PR. When a follow-up PR implements the integer_ssim_score.cu extern "C" fix (Research-0736 recommendation 1), that PR must also update ssim_cuda.c host glue and verify bit-exact parity against the CPU integer_ssim extractor on the Netflix golden fixture.

C++23 wave adversarial review (2026-05-28)

Read-only review of PRs #41, #43, #44, #45, #48, #51, #54, #56, #58. No files were modified by this review. The review digest is in docs/research/cpp23-wave-adversarial-review-20260528.md.

Critical issues that must be fixed before merge:

  • PR #43 opt.cpp: strtol/strtod on potentially non-NUL-terminated string_view::data()
  • PR #48 dict.cpp: strtof (float) assigned to double — precision loss on option values
  • PR #54 model.cpp: strlen(model->name) - 5U unsigned underflow → heap overflow
  • PR #58 ref.cpp: make_unique / C-caller free() allocator mismatch

No rebase impact from the review itself; all findings are fixes required in those PRs.
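
Two of the findings above are arithmetic pitfalls that can be reproduced outside C. The sketch below models them in Python under stated assumptions: a 64-bit size_t for the `strlen(model->name) - 5U` case, and IEEE-754 single precision for the strtof case. The helper names are made up for the illustration and do not appear in the reviewed PRs.

```python
import struct

SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def size_t_sub(a, b):
    """Model unsigned size_t subtraction, which wraps instead of going negative."""
    return (a - b) & SIZE_MAX


def through_float32(x):
    """Round a double to single precision and widen it back, as strtof -> double does."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A 3-character name minus 5 wraps to a huge length instead of -2.
wrapped_len = size_t_sub(3, 5)

# An option value parsed as float and stored in a double is no longer the
# value that strtod would have produced for the same text.
as_double = 0.1
via_float = through_float32(0.1)
precision_lost = via_float != as_double
```

In C the wrapped length then flows into a copy or allocation size, which is how a short model name turns into the heap overflow named in the finding.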

core/src/feature/cuda/integer_ssim/: extern "C" on new kernels (ADR-0747)

Any upstream or fork PR that adds a new __global__ kernel to a .cu file under core/src/feature/cuda/ or core/src/cuda/ must wrap the entry point in extern "C" { } if it is also referenced by cuModuleGetFunction in the host .c glue.

The invariant is enforced by scripts/dev/check-cuda-extern-c.sh. Run it locally before pushing. On upstream sync, if Netflix adds new CUDA kernels to their libvmaf/src/feature/cuda/ tree, check whether those kernels use extern "C" in the upstream source and mirror the pattern here.

This invariant was formalised after the audit that found integer_ssim/integer_ssim_score.cu missing extern "C", silently breaking --feature ssim --backend cuda since introduction (PR #77 fixed the analogous break in ssim_score.cu; ADR-0747 fixes integer_ssim_score.cu).
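
The reason for the invariant is that cuModuleGetFunction looks a kernel up by its exact symbol name, and a `__global__` function without C linkage gets a C++-mangled name. A simplified checker in that spirit is sketched below. It is not scripts/dev/check-cuda-extern-c.sh: the regexes are heuristics that assume `__global__ void name(` declarations and string-literal kernel names in the host glue, and they will miss other declaration styles (for example attributes placed between `__global__` and the name).

```python
import re

_KERNEL = re.compile(r'(extern\s+"C"\s+)?__global__\s+void\s+(\w+)\s*\(')
_LOOKUP = re.compile(r'cuModuleGetFunction\s*\([^;]*?"(\w+)"')
_BLOCK = re.compile(r'extern\s+"C"\s*\{')


def extern_c_spans(cu_source):
    """Return (start, end) offsets of extern "C" { ... } blocks via brace counting."""
    spans = []
    for match in _BLOCK.finditer(cu_source):
        depth, i = 1, match.end()
        while i < len(cu_source) and depth:
            if cu_source[i] == "{":
                depth += 1
            elif cu_source[i] == "}":
                depth -= 1
            i += 1
        spans.append((match.end(), i))
    return spans


def missing_extern_c(cu_source, host_source):
    """Kernels that the host loads by name but that lack C linkage."""
    spans = extern_c_spans(cu_source)
    all_kernels, c_linkage = set(), set()
    for match in _KERNEL.finditer(cu_source):
        name = match.group(2)
        all_kernels.add(name)
        in_block = any(start <= match.start() < end for start, end in spans)
        if match.group(1) or in_block:
            c_linkage.add(name)
    looked_up = set(_LOOKUP.findall(host_source))
    return sorted((looked_up & all_kernels) - c_linkage)


cu = (
    'extern "C" __global__ void a(int *p) {}\n'
    '__global__ void b(int *p) {}\n'
    'extern "C" {\n__global__ void c(int *p) { if (p) { } }\n}\n'
)
host = (
    'cuModuleGetFunction(&fa, mod, "a");\n'
    'cuModuleGetFunction(&fb, mod, "b");\n'
    'cuModuleGetFunction(&fc, mod, "c");\n'
)
problems = missing_extern_c(cu, host)
```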

core/src/feature/cuda/integer_vif/filter1d.cu — ADR-0743 __launch_bounds__ + __ldg

No rebase-sensitive invariants for downstream callers — the changes are confined to the device-side kernel body and the FILTER1D_8_HORI macro. The symbol name filter1d_8_horizontal_kernel_2_17_9 is unchanged; the C host file integer_vif_cuda.c continues to load and dispatch it by name.

If an upstream Netflix/vmaf sync introduces changes to filter1d.cu:

  1. The __launch_bounds__(128, 10) annotation on FILTER1D_8_HORI must be preserved (or re-applied) — upstream does not carry this hint.
  2. The __ldg() calls on the 7 buf.tmp.* loads must be preserved.
  3. If upstream changes val_per_thread or HORI_TILE_W, recheck the smem budget constraint (14812 B/block at vpt=4 is smem-limited on sm_89 — see ADR-0743 for derivation).
  4. The ptxas advisory "minnctapersm out of range, ignored" for sm_75/sm_80/sm_86 is expected and benign; do not treat it as a gate failure.

research-0748 / PR #76 1080p re-measurement — no new rebase invariants

The 1080p re-measurement (research-0748) validates PR #76 at production resolution. No new rebase-sensitive invariants beyond those already documented in the ADR-0743 __launch_bounds__ + __ldg entry above. The register budget (48 regs/thread) and __ldg annotations must be preserved on any upstream sync that touches filter1d.cu per the existing note.
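
One way the 48 regs/thread figure can arise from __launch_bounds__(128, 10) is sketched below in C. The constants are assumptions, not values quoted from ADR-0743: 65536 32-bit registers per SM, per-thread register counts rounded down to a multiple of 8, and a 255-register per-thread ceiling. regs_per_thread_cap is a hypothetical helper.

```c
#include <assert.h>

/* Assumed hardware constants (not taken from ADR-0743). */
enum {
    REGS_PER_SM = 65536,      /* 32-bit registers available per SM */
    REG_ALLOC_UNIT = 8,       /* per-thread counts round to multiples of 8 */
    MAX_REGS_PER_THREAD = 255 /* architectural per-thread ceiling */
};

/* Hypothetical helper: largest per-thread register count that still lets
 * `min_blocks_per_sm` blocks of `max_threads_per_block` threads be resident. */
static int regs_per_thread_cap(int max_threads_per_block, int min_blocks_per_sm)
{
    assert(max_threads_per_block > 0 && min_blocks_per_sm > 0);
    const int resident_threads = max_threads_per_block * min_blocks_per_sm;
    int cap = REGS_PER_SM / resident_threads; /* 65536 / 1280 = 51 */
    cap -= cap % REG_ALLOC_UNIT;              /* 51 rounds down to 48 */
    return cap > MAX_REGS_PER_THREAD ? MAX_REGS_PER_THREAD : cap;
}
```

Under these assumptions, regs_per_thread_cap(128, 10) evaluates to 48. If an upstream sync changes the block size, re-running the same arithmetic gives a quick sanity check before consulting the ADR.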

One-off container SYCL device-access pattern (--device /dev/dri --group-add 988)

No rebase impact on upstream C/Python code.

When running vmaf-dev-mcp:cuda13.3 as a one-off docker run with SYCL needed:

  1. --device /dev/dri is not sufficient. The Level Zero GPU ICD requires /dev/dri/by-path/pci-XXXX:YY:ZZ.W-render symlinks to enumerate Intel devices. These symlinks are not passed by --device /dev/dri; they require an explicit -v /dev/dri/by-path:/dev/dri/by-path:ro bind-mount.
  2. --group-add render fails because render is not a group name inside the container. Use --group-add 988 (the host render GID, confirmed on this machine).
  3. Source setvars.sh inside the container before invoking sycl-ls or vmaf --backend sycl.

The docker compose deployment (dev/docker-compose.yml) already carries the by-path bind-mount per ADR-0514; this note covers one-off docker run usage.

Fork-local files: docs/research/0734-cross-backend-baseline-with-sycl-20260528.md (new), changelog.d/changed/cross-backend-baseline-with-sycl.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).

ADR-0752 — Multi-resolution perf benchmark baseline

No rebase impact on upstream C/Python code.

New fork-local files only:

  • scripts/perf/bench-multi-resolution.sh — benchmark harness
  • testdata/perf_multi_resolution.json — baseline snapshot (schema_version=1)
  • docs/development/perf.md — usage docs
  • docs/research/research-0752-perf-bench-multi-resolution-baseline.md
  • docs/adr/0752-perf-bench-multi-resolution.md
  • changelog.d/added/perf-bench-multi-resolution.md
  • core/AGENTS.md (new invariant appended)

The upscaled fixture cache files (testdata/ref_1920x1080_48f.yuv, etc.) are generated on first run and should be .gitignored (they are reproducible from the 576×324 native fixture via ffmpeg -vf scale=W:H:flags=bilinear).


perf/cuda-ssim-vert-combine-ldg-launch-bounds-leak-20260529 (ADR-0754)

No rebase impact on upstream C/Python code.

core/src/feature/cuda/integer_ssim/ssim_score.cu and core/src/feature/cuda/integer_ssim_cuda.c are wholly fork-local files with no upstream Netflix equivalents. The VmafCudaBuffer struct and the vmaf_cuda_kernel_readback_free / vmaf_cuda_buffer_host_free helpers are fork-local CUDA infrastructure. No Netflix upstream commit will collide with these changes on sync-upstream.

Fork-local files modified: core/src/feature/cuda/integer_ssim/ssim_score.cu (F2 + F4 — __ldg() + __launch_bounds__), core/src/feature/cuda/integer_ssim_cuda.c (F6 per-caller save+free DROPPED — superseded by helper fix in PR #94), core/src/feature/cuda/AGENTS.md (invariant notes), docs/adr/0754-cuda-ssim-vert-combine-ldg-pinned-leak.md (new), docs/adr/README.md (new row), docs/research/0754-cuda-ssim-vert-combine-ldg-launch-bounds-2026-05-29.md (new), changelog.d/perf/cuda-ssim-vert-combine.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).


Research-0755 — HIP backend audit (2026-05-29)

No rebase impact on upstream C/Python code.

All files modified are fork-local: core/src/feature/hip/AGENTS.md (invariant notes), docs/research/0755-hip-backend-audit-20260529.md (new), changelog.d/changed/hip-backend-audit.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).

No source files were modified (audit-only). No Netflix upstream commit will collide with these additions on sync-upstream.

research/cuda-f3-struct-by-value-audit-20260529 (2026-05-29)

No rebase impact: this PR adds documentation-only files (research digest, ADR, changelog fragment, state.md row). No CUDA source files are modified. VmafCudaBuffer, VmafPicture, and AdmBufferCuda definitions are unchanged; no upstream Netflix/vmaf commit will collide with this PR's diff.

Fork-local files added/modified: docs/research/research-0756-cuda-f3-struct-by-value-audit.md (new), docs/adr/0756-cuda-f3-struct-by-value-audit.md (new), docs/adr/README.md (new row), changelog.d/perf/cuda-f3-struct-by-value-audit.md (new), docs/state.md (new row).

ADR-0755: C++23 Wave 7 — activate cpu.cpp (PR on 2026-05-29)

No rebase impact on upstream C/Python code.

core/src/cpu.c was deleted and core/src/meson.build updated to compile cpu.cpp. The file cpu.cpp is wholly fork-local (no upstream Netflix equivalent). No Netflix upstream commit will collide with this deletion.

Fork-local files modified: core/src/cpu.c (deleted), core/src/meson.build (cpu.c → cpu.cpp in libvmaf_cpu_sources), docs/adr/0755-cpp23-wave7-single-file.md (new), docs/adr/README.md (new row), changelog.d/changed/0755-cpp23-wave7-cpu-cpp.md (new), docs/rebase-notes.md (this entry).

research/cuda-motion-ncu-profile-20260529

No rebase impact: research-only commit. No source files modified. Files added: docs/research/0760-cuda-motion-ncu-multi-resolution-20260529.md, changelog.d/perf/cuda-motion-ncu-multi-resolution.md, docs/adr/0760-cuda-motion-ncu-multi-resolution.md (research ADR). No upstream collision risk.

HIP ADM buffer-by-pointer refactor (ADR-0759, 2026-05-29)

Files touched: core/src/feature/hip/integer_adm/adm_csf.hip, core/src/feature/hip/integer_adm/adm_cm.hip, core/src/feature/hip/integer_adm_hip.c, core/src/feature/hip/AGENTS.md

Rebase impact: None. All touched files are fork-added; no upstream Netflix/vmaf file is modified. The HIP backend does not exist in upstream. No rebase conflict is possible with upstream syncs.

The changed kernel signatures are internal to the HIP dispatch path and are not part of any public API.


ADR-0762 — CUDA CIEDE2000 __ldg() F3 fix (2026-05-29)

No rebase impact on upstream C/Python code.

All files modified are fork-local: core/src/feature/cuda/integer_ciede/ciede_score.cu (F3 fix — __ldg + __launch_bounds__), core/src/feature/cuda/integer_vif_cuda.c (resolve pre-existing merge-conflict stub from 24bb5daf89), docs/adr/0762-cuda-ciede-ldg.md (new), changelog.d/perf/cuda-ciede-ldg.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).

ciede_score.cu is entirely fork-local (Netflix upstream has no CUDA ciede kernel). A sync-upstream that adds a CUDA ciede kernel upstream would need to incorporate this __ldg() pattern. The integer_vif_cuda.c conflict resolution keeps the HEAD side (ADR-0743 comment block); no Netflix upstream content was discarded.


ADR-0764 — psnr_hvs CUDA kernel F3 __ldg() + __launch_bounds__(64) (2026-05-29)

No rebase impact on upstream C/Python code: psnr_hvs_score.cu is entirely fork-local. integer_psnr_hvs_cuda.c is unchanged.

If an upstream sync changes the psnr_hvs CPU reference in core/src/feature/third_party/xiph/psnr_hvs.c, verify the CUDA kernel's cooperative tile load and reduction order in psnr_hvs_score.cu are still byte-for-byte equivalent to the CPU's calc_psnrhvs computation pattern.

All files modified are fork-local: core/src/feature/cuda/integer_psnr_hvs/psnr_hvs_score.cu (pointer extraction + __ldg() + __launch_bounds__(64)), docs/adr/0764-psnr-hvs-ldg-launch-bounds.md (new), docs/research/0764-cuda-psnr-hvs-ldg-launch-bounds-2026-05-29.md (new), changelog.d/perf/cuda-psnr-hvs-ldg-launch-bounds.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).


ADR-0787 — libvmaf API error-path audit (2026-05-29)

No rebase impact: this PR adds only documentation files (research digest, ADR, changelog fragment) and no C/Python source changes.

All files modified are fork-local: docs/research/research-0787-libvmaf-api-error-path-audit.md (new), docs/adr/0787-libvmaf-api-error-path-audit.md (new), docs/adr/README.md (new row), changelog.d/fixed/0787-libvmaf-api-error-path-audit.md (new), docs/rebase-notes.md (this entry).

The six implementation fixes recommended by the audit (vmaf_write_output_with_format errno, vmaf_cuda_state_init error codes, vmaf_close unchecked returns, CUDA EBUSY guard, vmaf_init error propagation) will land in a separate fix PR that will carry its own rebase-notes entry. The vmaf_cuda_state_free ABI-normalisation is deferred to a major-version PR.


ADR-0815 — vmafx-operator + vmafx-node distroless Dockerfiles (2026-05-29)

No rebase impact on upstream C/Python code.

All files added are fork-local: docker/Dockerfile.operator (new), .github/workflows/docker-publish-operator-node.yml (new), docs/adr/0815-operator-node-distroless-dockerfiles.md (new), changelog.d/added/0815-operator-node-distroless-dockerfiles.md (new), docs/backends/operator.md (new), docs/rebase-notes.md (this entry).

No upstream Netflix/vmaf files are touched. A sync-upstream cannot conflict with these additions. The docker/Dockerfile.node file was already in-tree (ADR-0717); this PR only adds the CI workflow that publishes it.

no rebase impact: REASON — changes are confined to config files (.clang-tidy, .pre-commit-config.yaml, pyproject.toml), fork-owned Python sources in ai/ and scripts/ (UP auto-fixes), and docs. No upstream Netflix/vmaf C source is touched; the HeaderFilterRegex fix has no effect on any upstream file.

ADR-0795 — prev_ref thread-safety hardening — 2026-05-29

No rebase impact: all changes are in core/src/libvmaf.c (comments, a rename from fex to shared_fex, and a defensive assert). No logic change; no new symbols; no API change. The modified functions (threaded_extract_func, threaded_extract_batch_func) are fork-local dispatch paths not present in upstream Netflix/vmaf.

Fork-local files: core/src/libvmaf.c (comments + assert), docs/adr/0795-prev-ref-thread-safety.md, changelog.d/fixed/prev-ref-batch-thread-safety.md.

ADR-0882 — fuzz target audit (json_model + dnn_sidecar) — 2026-05-30

no rebase impact: REASON — all new files (core/test/fuzz/fuzz_json_model.c, core/test/fuzz/fuzz_dnn_sidecar.c, seed corpora under core/test/fuzz/json_model_corpus/ + core/test/fuzz/dnn_sidecar_corpus/, the known-crash reproducer under core/test/fuzz/json_model_known_crashes/, and ADR-0882 + changelog fragment) are fork-local. Upstream Netflix/vmaf has no libFuzzer harnesses at all (the entire core/test/fuzz/ subtree is fork-added per ADR-0270 + ADR-0311). The core/test/fuzz/meson.build edits sit in a subdir guarded by if not get_option('fuzz'), which upstream does not descend into. The .github/workflows/fuzz.yml matrix addition extends a fork-only workflow file. The only edits to files that also carry upstream-mirror notes are doc edits (docs/state.md, docs/rebase-notes.md, docs/adr/README.md), and those always follow the fork-local row pattern.

ADR-0887 — vmaf_model_destroy slopes-OOB fix — 2026-05-30

Low rebase impact, but not zero. Touches two upstream-mirrored files:

  • core/src/read_json_model.c — adds sync_n_features helper, replaces parse_feature_names' unconditional n_features++ with a per-iteration sync_n_features(model, i) call, adds the same call to parse_slopes / parse_intercepts / parse_feature_opts_dicts, and inserts validate_feature_arrays before parse_model_dict returns.
  • core/src/model.c::vmaf_model_destroy — flips the destroy walk bound from max(feature_cap, n_features) to min(feature_cap, n_features).

On upstream sync, if Netflix has independently changed parse_feature_names or vmaf_model_destroy, take the upstream changes for unrelated lines and re-apply this fork's hunks (the new sync_n_features helper, the validate_feature_arrays call, and the min bound in destroy). Upstream Netflix does not currently have this validation pass, so a conflict means upstream changed an adjacent surface — re-applying the fork's hunks post-upstream is mechanical.
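
The reason the destroy walk must use min rather than max can be shown with a toy model in C. ToyModel, its field names, and toy_model_destroy are hypothetical stand-ins; they are not the real VmafModel layout or the fork's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the model struct; field names are illustrative. */
typedef struct {
    char **feature_names;
    size_t feature_cap; /* slots actually allocated */
    size_t n_features;  /* count claimed by the parsed JSON */
} ToyModel;

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Frees every populated slot and returns how many slots were walked.
 * Bounding by min() keeps the walk inside the allocation even when a
 * malformed model claims more features than were ever allocated. */
static size_t toy_model_destroy(ToyModel *m)
{
    assert(m != NULL);
    const size_t bound = min_size(m->feature_cap, m->n_features);
    for (size_t i = 0; i < bound; i++)
        free(m->feature_names[i]);
    free(m->feature_names);
    m->feature_names = NULL;
    return bound;
}
```

With feature_cap = 2 and n_features = 5, the walk visits two slots; a max bound would read three pointers past the end of the array and pass them to free.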

Fork-local files (no rebase impact): docs/adr/0887-*.md, docs/research/0887-*.md, core/test/test_model.c regression tests, changelog.d/fixed/vmaf-model-destroy-slopes-oob.md, docs/state.md row.

Feature-extractor coverage round 2 (ADR-0938, 2026-05-31)

no rebase impact: REASON — the change consists of seven new fork-local test files under core/test/ (core/test/test_integer_psnr_coverage.c, core/test/test_integer_motion_coverage.c, core/test/test_integer_motion_v2_coverage.c, core/test/test_integer_vif_log2.c, core/test/test_iqa_convolve_coverage.c, core/test/test_barten_csf_coverage.c, core/test/test_ms_ssim_decimate_coverage.c) plus seven additive blocks in core/test/meson.build; none of them touches an upstream-mirrored test file. The only contact surface with upstream is the consumed public C-API and the public feature/integer_vif.h / feature/barten_csf_tools.h / feature/iqa/convolve.h headers, which are upstream-mirrored but read-only from these tests. On upstream sync, conflicts are restricted to the meson.build insertion points; reapply the seven test_*_coverage = executable(...) blocks and the matching test('test_*_coverage', ...) rows post-rebase.

Feature-extractor coverage round 3 (ADR-0948, 2026-05-31)

no rebase impact: REASON — additions are confined to fork-local test binaries under core/test/ (test_integer_motion_edge16_coverage, test_adm_csf_tools_coverage, test_feature_collector_coverage) and three append-only entries in core/test/meson.build. No upstream-mirrored source touched; no public API delta. On upstream sync the new tests apply cleanly regardless of what Netflix does to the underlying production files because the tests link against the existing libvmaf static target and import public + internal headers that already existed before round 3.

SYCL kernel coverage round 2 (ADR-0884, 2026-05-30)

no rebase impact: REASON — all changes are confined to fork-added test files (core/test/test_sycl_adm_parity.c, core/test/test_sycl_ciede_parity.c, core/test/test_sycl_ssim_parity.c, core/test/test_sycl_ms_ssim_parity.c, core/test/test_sycl_motion_v2_parity.c), the meson wiring for those files in core/test/meson.build, and docs / changelog / core/src/feature/sycl/AGENTS.md companion notes. No upstream Netflix/vmaf C source is touched. The SYCL backend itself is fork-original (Netflix/vmaf has no SYCL path), so there is no upstream rebase surface for these tests at all.

CUDA kernel parity tests — round 2 (ADR-0886, 2026-05-30)

no rebase impact: REASON — adds five new fork-local test files under core/test/ (test_cuda_adm_parity.c, test_cuda_motion_v2_parity.c, test_cuda_cambi_parity.c, test_cuda_psnr_hvs_parity.c, test_cuda_ssim_parity.c) and wires them through core/test/meson.build inside the existing if cuda_dependency.found() guard. The tests exercise the public C API (vmaf_init / vmaf_use_feature / vmaf_cuda_state_init / vmaf_feature_score_at_index); upstream Netflix/vmaf does not own any of the touched files. Conflict surface on sync is limited to the core/test/meson.build stanza ordering, which is mechanical.

Fork-local files: core/test/test_cuda_adm_parity.c, core/test/test_cuda_motion_v2_parity.c, core/test/test_cuda_cambi_parity.c, core/test/test_cuda_psnr_hvs_parity.c, core/test/test_cuda_ssim_parity.c, core/test/meson.build (new stanzas only), docs/adr/0886-cuda-kernel-coverage-round2.md, docs/research/cuda-kernel-coverage-round2-2026-05-30.md, changelog.d/added/0886-cuda-kernel-coverage-round2.md.

macOS CI ansnr-residual cleanup (ADR-0749 follow-up, 2026-05-30)

no rebase impact: REASON — changes are confined to fork-mirrored upstream test files (python/test/feature_extractor_test.py, python/test/quality_runner_test.py, python/test/routine_test.py) where assertions referencing the legacy ansnr / anpsnr keys are dropped or the tests are skipped per ADR-0749 (ansnr feature sunset). The @unittest.skip reasons cite ADR-0749, so conflict resolution on upstream sync is mechanical. If Netflix upstream still has the legacy assertions, keep the skips: those assertions were calibrated against float_ansnr output that this fork no longer produces. If Netflix upstream has removed the legacy assertions itself (matching this fork's direction), drop the local skips.

CI scripts: rebrand-proof assertion-density + tempfile trap (ADR-0968, 2026-05-31)

no rebase impact: fork-local — scripts/ci/assertion-density.sh and scripts/release/concat-changelog-fragments.sh are entirely fork-introduced; Netflix upstream has no equivalent files in either path. The only rebase risk is a new upstream scripts/ entry shadowing the directory, which would surface as an explicit conflict rather than a silent behaviour change.

compat/python-vmaf leaf-utility coverage (2026-05-31)

no rebase impact: REASON — the new test file lives entirely under python/test/compat_python_vmaf_coverage_test.py (fork-local test directory that Netflix upstream never touches) and imports leaf utilities by their existing public names. No production module under compat/python-vmaf/ is modified; only compat/python-vmaf/AGENTS.md gains one paragraph documenting which leaves carry coverage tests and warning about the latent sha1 bug in tools/decorator.py's persist helpers. Upstream syncs do not own compat/python-vmaf/AGENTS.md (fork-only file).

Master CI regressions — Metal MS-SSIM fixture + ssimulacra2 icpx XYB (ADR-0973, 2026-05-31)

no rebase impact: REASON — all touched files are fork-additions with no upstream conflict surface:

  • core/test/test_metal_float_ms_ssim_parity.c — fork-added in T8-2a; Netflix upstream has no Metal backend.
  • core/test/test_ssimulacra2_simd.c — fork-added SIMD bit-exactness test; Netflix upstream has no SSIMULACRA 2 SIMD paths.
  • docs/adr/0973-*.md, docs/research/0973-*.md, changelog.d/fixed/0973-*.md, core/test/AGENTS.md — fork-only governance / docs.

The fix adds a file-scope #pragma clang fp contract(off) block to test_ssimulacra2_simd.c. If a future contributor refactors the file's scalar reference functions out into a helper header, the pragma block must move with them or the icx FMA contraction returns and test_xyb fails under the all-backends matrix leg.
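
A minimal C sketch of what "the pragma must move with them" means for a helper header follows. The function names are hypothetical and the code is not copied from test_ssimulacra2_simd.c; the __clang__ guard is an addition so the snippet also builds with non-clang compilers, which do not recognise the pragma.

```c
#include <assert.h>

/* Hypothetical helper-header content. The pragma sits at file scope above
 * the scalar reference so it applies to every function defined after it. */
#if defined(__clang__)
#pragma clang fp contract(off)
#endif

/* Scalar reference: the multiply and the add should round separately so the
 * result can match a SIMD path built from distinct mul and add instructions.
 * Without the pragma, a clang-based compiler may fuse this into one FMA. */
static float scalar_mul_add(float a, float b, float c)
{
    return a * b + c;
}

/* Smoke check on exactly representable inputs, where fused and unfused
 * evaluation agree; it only confirms the helper is wired up. */
static void check_scalar_mul_add(void)
{
    assert(scalar_mul_add(2.0f, 3.0f, 4.0f) == 10.0f);
}
```

If the reference functions were moved into a header and the pragma stayed behind in the .c file below the #include, the moved functions would be compiled with the compiler's default contraction setting.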

test_gpu_picture_pool.c Round 27 D.3 + D.4 cleanup (ADR-0970, 2026-05-31)

no rebase impact: REASON — core/test/test_gpu_picture_pool.c is a fork-local test file (it was introduced in this fork's PR #266 / ADR-0239; Netflix upstream has no equivalent file). The two changes (remove unused .state malloc, delete dead /* ... */ block) affect only lines that Netflix upstream never touches. core/test/AGENTS.md is also fork-only.


ADR-0922 — coverage ratchet + per-PR delta gate — 2026-05-31

No rebase impact. All touched files are fork-local CI / docs infrastructure:

  • scripts/ci/coverage-check.sh (raised OVERALL_MIN 37 → 60, CRITICAL_MIN 85 → 90, tightened each PER_FILE_MIN entry by +5pp).
  • scripts/ci/coverage-delta-check.sh (new — per-PR delta gate).
  • .github/workflows/tests-and-quality-gates.yml (Coverage Gate job: new floor numbers + two new steps that compute base-branch coverage and run the delta gate on pull-request events).
  • docs/adr/0922-coverage-ratchet-aggressive.md, docs/adr/_index_fragments/0922-coverage-ratchet-aggressive.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md (regenerated by scripts/docs/concat-adr-index.sh).
  • changelog.d/changed/0922-coverage-ratchet-aggressive.md.

Upstream Netflix/vmaf has no coverage gate, so on sync there is nothing to reconcile. The per-PR delta gate's fetch-depth: 0 checkout requirement is worth flagging if the workflow ever gets restructured: a shallow checkout breaks git merge-base HEAD "$BASE_REF".

Metal kernel coverage round 4 — closeout (2026-05-31, ADR-0959)

no rebase impact: REASON — every new file path is fork-local Metal-only (Netflix upstream has no Metal backend at all per rebase-notes.md §"feat/libvmaf-metal-filter-iosurface" lineage). The single existing-file edit, core/test/meson.build, appends one executable() + test() block inside the existing enable_metal guard introduced by ADR-0361 (no boundary change, no upstream-mirrored line touched). Upstream sync resolution is trivially "keep theirs" everywhere except inside the if metal_test_opt.enabled() … block, which is fork-only by construction.

Fork-local additions (no rebase impact): core/test/test_metal_kernel_coverage_audit.c, docs/adr/0959-metal-kernel-coverage-round4-closeout.md, docs/research/0959-metal-kernel-coverage-round4-closeout.md, changelog.d/added/metal-kernel-coverage-round4.md, the new audit row in docs/adr/README.md, the T-METAL-KERNEL-PARITY-ROUND4-2026-05-31 row in docs/state.md.

CUDA kernel parity coverage — round 4 (ADR-0956, 2026-05-31)

no rebase impact: REASON — all five new files (core/test/test_cuda_float_adm_parity.c, core/test/test_cuda_float_motion_parity.c, core/test/test_cuda_float_ssim_parity.c, core/test/test_cuda_speed_chroma_smoke.c, core/test/test_cuda_speed_temporal_smoke.c) live entirely under fork-local test directories that Netflix upstream never touches. The only modified shared file is core/test/meson.build, where the round 4 block is appended after the existing round 3 / ADR-0541 / motion3 parity blocks inside the existing if get_option('enable_cuda') guard. On upstream sync, if Netflix has independently added test binaries in the same enable_cuda block the conflict is a trivial append-vs-append three-way merge (no shared lines change). Fork-local documentation files (ADR-0956, the round 4 research digest, the changelog fragment, this rebase-notes row) are never authored upstream.

speed_internal.c + SpEED GPU twin wiring (ADR-0964, 2026-05-31)

Will bite a rebase. This PR adds core/src/feature/speed_internal.c (a fork-local TU that duplicates ~600 LOC of pure math — eigendecomposition, QR factorisation, matrix helpers — from speed.c). When /sync-upstream ports any change to speed.c's static helpers (compute_eigenvalues, matrix_qr_decomposition, solve_triangular_system, convert_to_tridiagonal, compute_eigenvalues_tridiagonal, compute_covariance_matrix, filter_and_downscale, ...), the same change must be mirrored into speed_internal.c. Symptom of drift: test_sycl_speed_chroma_parity / test_sycl_speed_temporal_parity flag a places=4 violation between CPU and SYCL on Intel Arc.
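
For readers unfamiliar with the places=4 wording, the C sketch below shows one common reading of it: two scores agree if their difference is below half a unit in the fourth decimal place. agrees_to_places is a hypothetical helper, and whether the parity tests use exactly this rule is an assumption.

```c
#include <assert.h>

/* Hypothetical comparison helper. Assumes the "places" convention where two
 * values agree when |a - b| < 0.5 * 10^-places. */
static int agrees_to_places(double a, double b, int places)
{
    assert(places >= 0);
    double tol = 0.5;
    for (int i = 0; i < places; i++)
        tol /= 10.0; /* ends at 0.5 * 10^-places */
    const double diff = a > b ? a - b : b - a;
    return diff < tol;
}
```

Under this reading a CPU score of 0.912345 and a SYCL score of 0.912349 still agree at places=4, while 0.9123 against 0.9125 would be flagged as drift.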

Also fork-local:

  • core/src/feature/hip/speed_{chroma,temporal}_hip.c (already in tree, newly wired into core/src/hip/meson.build).
  • core/src/feature/sycl/speed_{chroma,temporal}_sycl.cpp (already in tree, newly wired into core/src/meson.build sycl_feature_sources).
  • core/src/feature/feature_extractor.c externs + registry rows for the four new GPU extractor symbols, gated on #if HAVE_HIP / #if HAVE_SYCL.
  • core/test/test_sycl_speed_chroma_parity.c + core/test/test_sycl_speed_temporal_parity.c.

If Netflix upstream ever ships its own SpEED GPU implementation that takes a different code-sharing approach (e.g. exposing speed.c helpers via a non-static-prefix), the fork should consider migrating to upstream's pattern; until then speed_internal.c is the canonical location for the shared helpers and the GPU TUs depend on its function names.

CUDA twins (speed_chroma_cuda, speed_temporal_cuda) are NOT wired in this PR — the TUs reference symbols (CHECK_CUDA, CudaFunctions->cuMemAllocHost) that do not exist; they need a repair pass. Tracked as T-CUDA-SPEED-TU-REPAIR-2026-05-31 in docs/state.md.

CUDA SpEED TU repair + wiring (ADR-0965, 2026-05-31)

No new rebase risk beyond ADR-0964. This repair PR fixes the latent bugs in speed_chroma_cuda.c and speed_temporal_cuda.c and wires them into meson. The changes are:

  • CHECK_CUDA(cu_f, CALL) replaced with CHECK_CUDA_GOTO(cu_f, CALL, fail) throughout both TUs.
  • cuMemAllocHost(ptr, sz) replaced with cuMemHostAlloc(ptr, sz, 0x01u) in the ALLOC_HOST macros (both TUs).

Both changes are mechanical; no algorithmic content was altered. The rebase-note from ADR-0964 above covers the speed_internal.c drift risk (mirror fixes between speed.c and speed_internal.c).

New additions:

  • core/src/feature/feature_extractor.c externs + registry rows for vmaf_fex_speed_chroma_cuda / vmaf_fex_speed_temporal_cuda under #if HAVE_CUDA.
  • core/test/test_cuda_speed_chroma_parity.c + core/test/test_cuda_speed_temporal_parity.c.

Go errors.Join cleanup paths + slog key standardisation (ADR-0935, 2026-05-31)

no rebase impact: REASON — every file touched lives in fork-original Phase 4b Go subtree (pkg/bisect/, pkg/encoder/, pkg/storage/, cmd/vmafx-controller/queue/, cmd/vmafx-node/). Netflix upstream ships no Go code under these paths, so a future upstream/master sync cannot conflict here. The cmd/vmafx-tune/AGENTS.md invariant addition is also fork-original. If a follow-up port-PR introduces upstream Go code, the errors.Join discipline documented in cmd/vmafx-tune/AGENTS.md §7 applies on entry.

Generic registry for vmafx-controller (ADR-0925, 2026-05-31)

no rebase impact: REASON — touched files are 100 % fork-only Go sources (pkg/registry/registry.go, pkg/registry/registry_test.go, cmd/vmafx-controller/nodes/registry.go, pkg/observability/observability.go). Netflix upstream is a pure C / Python tree; the cmd/ and pkg/ Go trees do not exist there.


VmafPicture v2 design scaffold (ADR-0928, 2026-05-31)

Files touched: core/include/libvmaf/picture_v2.h (new), docs/adr/0928-vmaf-picture-v2-explicit-backend-state.md (new), docs/architecture/vmaf-picture-v2-migration.md (new), docs/adr/README.md + docs/adr/_index_fragments/ (index row), changelog.d/added/vmaf-picture-v2-design.md (fragment).

Rebase impact: None for this PR. The new header is declared but not yet wired into meson.build, and v1 (core/include/libvmaf/picture.h) is preserved bit-for-bit — every existing consumer (FFmpeg patches 0002–0006, MCP server, Rust binding scaffold, Python wheels) still sees the v1 surface unchanged. Upstream Netflix/vmaf has no v2 counterpart on the deprecation horizon, so no sync conflict is expected.

Lifecycle (per ADR-0928):

  • Cycle N (this PR): header declared, design + scaffold only.
  • Cycle N+1: header wired into meson, converters implemented in core/src/picture.c, v1 marked __attribute__((deprecated)).
  • Cycle N+2: in-tree backends + ffmpeg-patches/0002-0006 switched to v2 (coordinated per CLAUDE.md §12 r14).
  • Cycle N+3 (≈ 12 months, target VMAFX v4.0.0): v1 removed, SONAME bump libvmaf.so.3 → .4.

If upstream Netflix independently adds a VmafPicture v2 of their own before cycle N+3, reconcile by adopting upstream's naming (VmafPicture2 is intentionally generic) and remap our converters; otherwise the cycle-N+3 v1-removal commit is the natural ABI break window.

pathlib sweep + ruff PTH guard (ADR-0936, 2026-05-31)

no rebase impact: changes are confined to fork-owned Python — the two console-shim files under tools/vmaf-*/ (fork-added, no upstream twin), fork-owned ai/scripts/, ai/src/corpus/, mcp-server/, scripts/ci/, and tools/vmaf-tune/src/ modules. The pyproject.toml ruff config delta adds PTH to select and lists it in the existing per-file ignores for the upstream-mirror trees (python/**, compat/python-vmaf/**, testdata/**). Upstream Netflix Python is covered by those ignores; an upstream sync will not see the PTH rule applied to their files.

iter.Seq[T] companion APIs for Go packages (ADR-0932, 2026-05-31)

no rebase impact: REASON — every touched file is fork-original Go code under pkg/bisect/, pkg/ladder/, pkg/ai/, and cmd/vmafx-controller/nodes/. None of these paths exist upstream (Netflix/vmaf has no Go module), so an upstream sync cannot conflict with the new IterSamples / IterCloud / IterHull / AllSeq / ListModelsSeq surfaces. The deprecated Registry.All / Registry.ListModels shims are likewise fork-local. If a future Netflix upstream adds Go bindings, the conflict is resolution-only at the package-tree level (different directory layout, no symbol overlap).

Skills library expansion — /add-mcp-tool, /add-k8s-resource, /audit-modernization, bisect-common (ADR-0939, 2026-05-31)

no rebase impact: all new files land under .claude/skills/, which is fork-local infrastructure (the upstream Netflix/vmaf repo does not ship a .claude/ directory). The accompanying ADR, index fragment, changelog fragment, and research digest are likewise fork-local. The two existing bisect skills (bisect-regression, bisect-model-quality) gain scaffold.sh driver scripts that source .claude/skills/lib/bisect-common.sh — still all fork-local. No upstream files touched.

If upstream Netflix ever adopts .claude/ skills (unlikely — different agent tooling), revisit whether the three new scaffolds should be promoted or stay fork-only. The bisect-common library has no upstream analogue either, so the merge surface is zero.

ai/ dataclass → pydantic v2 migration (ADR-0934, 2026-05-31)

no rebase impact: REASON — touched files are entirely fork-local. Upstream Netflix/vmaf does not ship ai/src/vmaf_train/ at all (the package is fork-added — Tiny-AI surface, ADR-0042). TrainConfig (train.py), ModelMetadata (registry.py), and ManifestEntry (data/datasets.py) become pydantic.BaseModels; pydantic>=2.13.4 added to ai/pyproject.toml (already in tree via mcp-server/vmaf-mcp). Sidecar JSON layout byte-identical (ModelMetadata.to_json() uses model_dump(mode="json") + json.dumps(indent=2, sort_keys=True)). On upstream sync the diff cannot conflict — Netflix has no equivalent file to merge into.

Vendored libsvm + IQA test-coverage uplift (2026-05-31, ADR-0952)

core/test/test_svm_api.c and core/test/test_iqa_helpers.c are pure fork-local test additions. They link against the vendored libsvm_static_lib (for svm) and libvmaf_feature_static_lib + libvmaf_cpu_static_lib (for iqa) via their extract_all_objects recipes — the same pattern used by test_iqa_convolve.c, test_feature_extractor.c, and PR #381's test_svm_parser.c. No vendored source is touched.

Rebase impact: when Netflix upstream re-pins libsvm (3.24 → 3.36 or later) or when the IQA helpers gain new public functions:

  • test_svm_api.c assertions on inspector outputs and on the C-SVC / EPSILON-SVR predict round-trip are functional invariants of the libsvm public API; a major-version bump that changes them is a semantic break and should land its own ADR.
  • test_iqa_helpers.c _round() and _cmp_float() assertions document the current asymmetric rounding rule ("trunc toward zero, add sign when |frac| >= 0.5"). If upstream tdistler.com (or Netflix's 2016 update) ever rewrites those helpers to IEEE-754 round-half-to-even, the tests will fail — that is by design; the failure surfaces the unintended numerical change at the rebase diff, not at the integration SSIM result.
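
The rounding rule quoted above ("trunc toward zero, add sign when |frac| >= 0.5") can be sketched in C as follows. round_half_away is a hypothetical illustration of the rule as described in this note, not a copy of the vendored _round() source.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical illustration: truncate toward zero, then step one unit away
 * from zero when the discarded fraction has magnitude >= 0.5. */
static int round_half_away(double a)
{
    assert(a > (double)INT_MIN && a < (double)INT_MAX); /* cast must not overflow */
    const int sign = a > 0.0 ? 1 : -1;
    const int truncated = (int)a;               /* C casts truncate toward zero */
    const double frac = a - (double)truncated;  /* carries the sign of a */
    const double abs_frac = frac < 0.0 ? -frac : frac;
    return abs_frac >= 0.5 ? truncated + sign : truncated;
}
```

This rule sends 2.5 to 3 and -2.5 to -3, whereas IEEE-754 round-half-to-even sends both to the nearest even integer (2 and -2). That difference on exact halves is the kind of change the helper tests are meant to surface.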

The meson wiring in core/test/meson.build inserts two new executables above test_feature_extractor and registers them in the fast suite. Both fragments are isolated; the only adjacency to upstream code is the alphabetical position in the test list.

PR companion to ADR-0889 (PR #381, libsvm parser audit) — the two PRs can land in either order without conflict.

external-bench test coverage backfill (ADR-0332 follow-up, 2026-05-31)

no rebase impact: REASON — changes are confined to fork-only files (tools/external-bench/tests/test_compare.py, changelog.d/added/*, docs/research/*). The tools/external-bench/ tree is fork-only per ADR-0332 (no upstream counterpart); coverage backfill (14 new tests for BVI-DVC discovery edge cases, Netflix discovery edge cases, validator rejection paths, run_wrapper missing-output guard, and main() --limit + per-item skip flow) cannot conflict on upstream sync.

HIP motion3 parity test ENOSYS-skip (ADR-0949, 2026-05-31)

no rebase impact: REASON — the only file touched in the libvmaf source tree is core/test/test_hip_motion3_parity.c, which is wholly fork-added (no Netflix upstream counterpart — Netflix/vmaf does not ship a HIP backend). The skip-on--ENOSYS change is self-contained inside the test's HIP-path helper; CPU baseline, tolerance, fixture geometry, and end-of-stream handling are unchanged. Upstream sync cannot conflict because no upstream file touches this test path or the motion_hip extractor's scaffold-vs-runtime split.

GitHub Actions custom-action + reusable-workflow audit (ADR-0951, 2026-05-31)

no rebase impact: REASON — audit-only PR. No code under core/, python/, ai/, mcp-server/, or tools/ is touched. The only edited files are: docs/adr/0951-github-actions-custom-audit.md (new ADR), docs/adr/README.md + docs/adr/_index_fragments/_order.txt (index rows), docs/research/0951-github-actions-custom-audit.md (digest), changelog.d/changed/github-actions-custom-audit.md (fragment), and this rebase-notes row. The fork-wide SHA-pin invariant in .github/AGENTS.md lines 100–144 is unchanged. On upstream sync the audit conclusions remain valid until Netflix introduces its own .github/actions/ tree or workflow_call: workflow; re-run the three reproducer commands in the research digest to confirm.

fix(mcp-server): NamedTemporaryFile (ADR-0975) — no rebase impact

Replaces a local variable assignment in _run_vmaf_score. No C surface, no public API, no upstream-mirrored file touched. On rebase against upstream Netflix/vmaf, this change applies cleanly to the MCP server layer which is entirely fork-local.

ADR-0945 — HIP kernel parity coverage round 3 — 2026-05-31

no rebase impact: REASON — the 4 new test files (core/test/test_hip_cambi_parity.c, core/test/test_hip_float_adm_parity.c, core/test/test_hip_float_motion_parity.c, core/test/test_hip_float_psnr_parity.c) live entirely under the fork-only HIP backend tree. Upstream Netflix/vmaf has no HIP backend, no parity tests, and no test_hip_* files; the if get_option('enable_hip') == true block in core/test/meson.build is fork-local (added by the HIP scaffold landing in ADR-0212). Wiring lives strictly inside that block. The only non-test files touched are docs/adr/README.md (index row), docs/adr/_index_fragments/_order.txt, the changelog.d/added/hip-kernel-coverage-round3.md fragment, and the companion docs/research/hip-kernel-coverage-round3-2026-05-31.md audit — all fork-only.

ADR-0918 — LLVM IR diff harness — 2026-05-31

no rebase impact: harness is fork-local tooling (scripts/perf/check-ir-diff.sh, scripts/perf/ir-diff-config.yaml, testdata/ir-snapshots/, make ir-diff / make ir-diff-update targets). It snapshots LLVM IR for fork-added SIMD sources only; Netflix upstream never touches these paths. The only upstream coupling is the SIMD source files themselves (core/src/feature/x86/*.c) — if a future upstream sync changes the scalar reference for psnr_hvs / ms_ssim_decimate / ssimulacra2 and the AVX2 twin must change in lockstep, the snapshot regen step (make ir-diff-update) is a normal part of the port — same discipline as the score JSON snapshots under /regen-snapshots. The new core/src/feature/x86/AGENTS.md invariant note flags this for the next sync agent.

vmafx-operator functional test coverage uplift (2026-05-31)

no rebase impact: REASON — all four new test files live under cmd/vmafx-operator/internal/controller/ which is fork-added per ADR-0714 (vmafx-operator kubebuilder skeleton). Upstream Netflix/vmaf ships no Go sources and no Kubernetes operator surface; there is nothing to merge against.

Fork-local files: cmd/vmafx-operator/internal/controller/vmafxnode_controller_test.go (new), cmd/vmafx-operator/internal/controller/vmafxjob_controller_branch_test.go (new), cmd/vmafx-operator/internal/controller/vmafxmodeltraining_controller_branch_test.go (new), cmd/vmafx-operator/internal/controller/setup_with_manager_test.go (new), changelog.d/added/operator-functional-coverage.md (new).

ADR-0913 — CHANGELOG.md renderer splice contract + 44 k-line drift sweep — 2026-05-31

no rebase impact (upstream): REASON — fork-local infrastructure only. The renderer (scripts/release/concat-changelog-fragments.sh), the rendered file (CHANGELOG.md), and the fragment tree (changelog.d/) are all fork-added; Netflix/vmaf upstream uses a hand-edited CHANGELOG.md with no fragment system.

In-flight fork-branch impact (medium): every in-flight fork branch that added a fragment under the old ## Section / ### Section shape will conflict on its fragment file at rebase time. Resolution is mechanical — keep the bullet content, drop the redundant first-line section header (the renderer emits ### Section itself). Branches that added perf entries under changelog.d/perf/ or changelog.d/performance/ need to rename to changelog.d/changed/perf-<topic>.md (the same convention PR #384 / ADR-0892 introduces). On rebase the renderer's new stderr WARNING surfaces the wrong directory immediately; bash scripts/release/concat-changelog-fragments.sh --check then verifies the fix.

__init__.py export-completeness audit (ADR-0911, 2026-05-31)

no rebase impact: REASON — all eight modified __init__.py files are fork-added (ai/__init__.py, ai/data/__init__.py, ai/train/__init__.py, ai/src/vmaf_train/__init__.py, ai/src/vmaf_train/data/__init__.py, dev-llm/src/vmaf_dev_llm/__init__.py, mcp-server/vmaf-mcp/src/vmaf_mcp/__init__.py, scripts/lib/__init__.py). Upstream-mirror packages (compat/python-vmaf/**, python/test/__init__.py) were deliberately left byte-identical per the upstream-mirror rebase-hygiene rule. No upstream Netflix/vmaf file is touched.

ADR-0907 — Wall-clock perf regression gate (2026-05-30)

No rebase impact on upstream C/Python code.

New fork-local files only:

  • scripts/perf/check-regression.py — gate script (stdlib-only)
  • scripts/perf/test_check_regression.py — smoke tests
  • docs/adr/0907-perf-regression-gate-wall-clock.md (new)
  • docs/adr/_index_fragments/0907-perf-regression-gate-wall-clock.md (new)
  • changelog.d/added/perf-regression-gate.md (new)
  • .github/workflows/tests-and-quality-gates.yml (new perf-regression job; the disabled cross-backend job's broken bench_all.sh --backend=cpu --snapshot-only --tolerance-ulp=2 invocation is replaced with a no-op placeholder echo since bench_all.sh does not parse those flags)

No upstream Netflix collision risk — the gate consumes only the fork-added testdata/perf_multi_resolution.json baseline (ADR-0752, fork-local).

Slow-test audit (ADR-0908, 2026-05-30)

no rebase impact: REASON — all touched files are fork-local. A new ADR (docs/adr/0908-slow-test-audit-2026-05-30.md), a new research digest (docs/research/slow-test-audit-2026-05-30.md), fork-added pytest configuration in three pyproject.toml files registering the slow marker (tools/vmaf-tune/pyproject.toml, ai/pyproject.toml, mcp-server/vmaf-mcp/pyproject.toml), and fork-added test files (tools/vmaf-tune/tests/test_bbb_e2e_v5_bug_cluster.py, tools/vmaf-tune/tests/test_bbb_e2e_v14_bug_cluster.py). None are mirrored from upstream Netflix/vmaf.

ADR status-field drift sweep (2026-05-30)

no rebase impact: changes are confined to fork-local ADR markdown files under docs/adr/. Status-field flips on ADR-0573 (→ Superseded by ADR-0738) and Status normalisation on ADR-0105 / ADR-0106 / ADR-0107 (Supersedes-in-Status → Accepted + explicit Supersedes line). Netflix upstream has no docs/adr/ tree; nothing to reconcile on sync. Audit methodology and the full decision matrix live in docs/research/adr-status-drift-audit-2026-05-30.md.

ADR-0903 — Codecov upload wiring (2026-05-30)

no rebase impact: REASON — all changes are confined to fork-only files: .github/workflows/tests-and-quality-gates.yml is a fork-added CI workflow (upstream Netflix/vmaf has no equivalent gcovr-based Coverage Gate), docs/adr/0903-wire-codecov-upload.md is fork-only documentation, and changelog.d/added/wire-codecov-upload.md is a fork-only changelog fragment per ADR-0221. The added codecov/codecov-action steps depend only on the Cobertura XML the existing gcovr step already produces; upstream sync cannot break this wiring because the gcovr job itself is fork-only.

ADR-0904 — cargo-machete build-dep ignores (2026-05-30)

no rebase impact: REASON — Netflix/vmaf upstream has no Rust workspace. Both touched Cargo.toml files (bindings/rust/vmafx-sys/Cargo.toml, core/src/feature/rust/tad/Cargo.toml) live entirely in fork-local trees (ADR-0702, ADR-0707). The [package.metadata.cargo-machete] blocks add no-op metadata (cargo ignores keys it doesn't know) and cannot conflict with anything upstream might add later.

Signing and attestation audit (ADR-0902, 2026-05-30)

no rebase impact: REASON — changes are confined to fork-local CI infrastructure (.github/workflows/docker-publish-production.yml, docs/development/release.md, docs/adr/0902-*.md, docs/research/signing-and-attestation-audit-2026-05-30.md, changelog.d/security/signing-and-attestation-audit.md). The supply-chain workflow (supply-chain.yml) is itself fork-additive (Netflix upstream does not ship a Sigstore + SLSA + SBOM release channel); upstream syncs never touch any of these files.

Doxygen public-API clean (ADR-0953, 2026-05-31)

Low rebase impact: every edit lands as additional doxygen comments or per-member /**< desc */ annotations inside the public headers under core/include/libvmaf/. Two of the headers touched are Netflix-upstream-mirrored — picture.h and model.h — but the edits are pure documentation; no struct layout, function signature, or symbol name moves. An upstream sync that re-touches either file should accept its hunks unchanged and let the fork's doc comments remain in place. The four fork-added headers (libvmaf_mcp.h, dnn.h, libvmaf_metal.h, libvmaf_hip.h) are fork-local and have no upstream counterpart. The new core/doc/Doxyfile.public-api, .github/workflows/doxygen-public-api.yml, ADR-0953, research digest, changelog fragment, and AGENTS.md invariant note are fork-local — zero rebase exposure.

governance-audit (2026-05-30, ADR-0901)

No rebase impact — all changes are fork-local governance files that upstream Netflix/vmaf does not ship:

  • GOVERNANCE.md (new), MAINTAINERS.md (new) — top-level fork-only.
  • .github/CODEOWNERS: append-only additions below the existing rows (the rename of the existing /libvmaf/... rows to /core/... is owned by in-flight PR #321, not this PR).
  • CONTRIBUTING.md — fork-specific block extended with branch-naming, ADR-0108 deliverables, ADR-allocator pointer, governance pointer. The inherited Netflix upstream contribution-guide block at the bottom is unchanged.
  • docs/adr/0901-governance-audit.md, docs/adr/_index_fragments/0901-governance-audit.md, docs/adr/_index_fragments/_order.txt (one-line append), docs/research/governance-audit-2026-05-30.md, changelog.d/added/governance-audit.md — all fork-only paths.

On upstream sync, no conflict is expected. If CODEOWNERS shows a textual conflict because PR #321 landed in-between, the resolution is trivial: keep PR #321's renamed /core/... rows AND keep this PR's new append-only rows. Both edits are non-overlapping at the line level.

ADR-0893 — Pre-commit config audit — 2026-05-30

no rebase impact: REASON — .pre-commit-config.yaml is a fork-local config file. Upstream Netflix/vmaf does not ship pre-commit configuration; all revisions and hooks listed are fork-owned. Touches one fork-owned Python file via isort 6.0.1 auto-fix (tools/vmaf-tune/tests/test_codec_adapter_av1_videotoolbox.py), which is itself outside the upstream tree.

libsvm vendored audit — extend SAN-MODEL-MALLOC-OOB to row-ordering (ADR-0889, 2026-05-30)

Touches the vendored libsvm parser core/src/svm.cpp, which is wrapped in a file-level NOLINTBEGIN / NOLINTEND cordon. On Netflix-vmaf upstream sync the file is part of the fork-mirrored set: Netflix upstream has not refreshed its vendored libsvm copy since 2020-11 either, so a Netflix-only sync has near-zero conflict risk on this file.

On an upstream libsvm (Chih-Chung Chang / Chih-Jen Lin) sync — deliberately deferred per ADR-0889 — the fork carries three patch families that must be re-applied:

  1. Thread-locale isolation (ADR-0137) — buffer.imbue(std::locale::classic()) in both SVMModelParserFileSource and SVMModelParserBufferSource constructors.
  2. JSON in-memory entry point — svm_parse_model_from_buffer plus the SVMModelParserBufferSource template instantiation. Consumed by read_json_model.c; removing it breaks JSON-embedded SVM model loading.
  3. SAN-MODEL-MALLOC-OOB hardening — VMAF_SVM_MAX_AXIS_COUNT (1 << 24) bound, nr_class / total_sv axis-size asserts in parse_header() and parse_support_vectors(), sv_buffer.empty() post-parse guard, plus the row-ordering preconditions (exceptAssert(model->nr_class > 0, ...)) on rho, label, probA, probB, nr_sv added by ADR-0889.
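The axis bound and the row-ordering precondition from item 3 can be pictured with a small sketch. It is an illustration under stated assumptions, not the fork's parser: `SKETCH_MAX_AXIS_COUNT` and `rho_row_length` are hypothetical names, the error value -1 is arbitrary, and the only libsvm fact relied on is that the rho row holds nr_class * (nr_class - 1) / 2 entries.

```c
/* Hypothetical mirror of the (1 << 24) axis bound described above. */
#define SKETCH_MAX_AXIS_COUNT (1 << 24)

/* Computes how many entries a rho row must hold.
 * Returns 0 on success, or -1 when nr_class has not been set to a sane
 * value yet (for example because the rho row appeared before the header
 * supplied nr_class, or the header carried an out-of-range count). */
static int rho_row_length(int nr_class, unsigned long long *out_len)
{
    if (nr_class <= 0 || nr_class > SKETCH_MAX_AXIS_COUNT)
        return -1;

    /* Widen before multiplying: the product does not fit in a 32-bit int
     * near the upper bound. */
    unsigned long long k = (unsigned long long)nr_class;
    *out_len = k * (k - 1) / 2;
    return 0;
}
```

Checking the count before sizing any allocation is what keeps a malformed or reordered model file from driving an out-of-bounds write. A port of a newer libsvm parser needs an equivalent check at each row that is sized by nr_class or total_sv.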

Regression coverage at core/test/test_svm_parser.c (suite fast). On sync, re-run that test plus test_predict and test_model before merging. See core/src/AGENTS.md §10 for the full invariant list.

CI concurrency + cost audit (ADR-0890, 2026-05-30)

no rebase impact: REASON — CI-only changes to .github/workflows/ files that are wholly fork-local. Netflix upstream's CI is one .github/workflows/ file with a different name and structure; the five files modified here (ffmpeg-integration.yml, sanitizers.yml, security-scans.yml, lint-and-format.yml, plus the ADR / changelog / state.md surface) have no upstream counterpart. No source / header / patch surface touched; the ffmpeg-patches/ series is unaffected.

ADR-0883 — HIP kernel parity coverage round 2 — 2026-05-30

no rebase impact: REASON — the 5 new test files (core/test/test_hip_ciede_parity.c, test_hip_psnr_hvs_parity.c, test_hip_motion_parity.c, test_hip_ssim_parity.c, test_hip_ms_ssim_parity.c) live entirely under the fork-only HIP backend tree. Upstream Netflix/vmaf has no HIP backend, no parity tests, and no test_hip_* files; the if get_option('enable_hip') == true block in core/test/meson.build is fork-local (enable_hip was added by the HIP scaffold landing in ADR-0212). Wiring lives strictly inside that block. The only non-test files touched are docs/adr/README.md (index row) and docs/adr/_index_fragments/_order.txt — both fork-only.

ADR-0876 — printf-format portability sweep (CERT FIO47-C) — 2026-05-30

Low rebase impact, scoped to fork-added log / debug call sites. The four touched source files (core/src/libvmaf.c, core/src/sycl/common.cpp, core/src/sycl/dmabuf_import.cpp, core/test/test_motion_v2_simd.c) either are fork-added (the SYCL TUs + the AVX2 test) or contain a fork-added block inside an upstream-mirror file (the tiny-model loader in libvmaf.c, which is post-ADR-0700 fork-edited per git blame). The format-string changes are mechanical: (unsigned long)x + %lu → x + %" PRIu64 " for uint64_t; (long long)x + %lld → x + %" PRId64 " for int64_t; (unsigned long long)x + %llx → x + %" PRIx64 " for uint64_t hex prints. Three call sites in upstream-mirror code (core/src/feature/x86/adm_avx512.c print_128_64 debug macro) and POSIX-off_t / Windows-DWORD sites were intentionally not changed — see docs/research/0876-printf-format-portability-audit.md §2 Class C for the rationale. Future upstream syncs that touch the same lines will conflict trivially; resolve in favour of the PRI-macro form for fixed-width types.
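As a rough sketch of the PRI-macro form to keep when resolving such conflicts (the function names and the buffer-based signature are illustrative, not taken from the fork's call sites):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Cast-based form being replaced, shown only as a comment:
 *     snprintf(buf, len, "%lu", (unsigned long)value);
 * On LLP64 targets unsigned long is 32 bits wide, so the cast truncates. */

/* Unsigned 64-bit decimal: the PRIu64 macro expands to the right length
 * modifier for the platform, so no cast is needed. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%" PRIu64, value);
}

/* Signed 64-bit decimal. */
static int format_i64(char *buf, size_t len, int64_t value)
{
    return snprintf(buf, len, "%" PRId64, value);
}

/* Unsigned 64-bit lowercase hex. */
static int format_x64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%" PRIx64, value);
}
```

The macros come from the standard `<inttypes.h>` header, so the same source line formats correctly on both LP64 and LLP64 platforms.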

ADR-0877 — error-code consistency audit (MS-SSIM decimate) — 2026-05-30

no rebase impact: the four touched TUs (core/src/feature/ms_ssim_decimate.{c,h}, core/src/feature/x86/ms_ssim_decimate_{avx2,avx512}.c, core/src/feature/arm64/ms_ssim_decimate_neon.c) are fork-added 2026-04-20; they have no upstream Netflix/vmaf counterpart. The change converts the malloc-failure branch from bare return -1 to return -ENOMEM and tightens the header docstring to match — no logic change on the hot path. Bit-exactness across scalar / AVX2 / AVX-512 / NEON is preserved (only the cold malloc-failure branch is touched).
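The shape of that change can be sketched as below. Everything here is hypothetical scaffolding (`scratch_init`, `alloc_fn`, `failing_alloc`); the injectable allocator is only an assumption made so the cold branch can be exercised deterministically, and is not how the fork's decimate code is structured.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* malloc-compatible allocator signature, so a test can inject a failure. */
typedef void *(*alloc_fn)(size_t);

/* Allocates a scratch buffer of `count` floats.
 * Returns 0 on success or -ENOMEM when the allocator fails. A bare -1
 * would be indistinguishable from other failures at the call site. */
static int scratch_init(float **out, size_t count, alloc_fn alloc)
{
    float *buf = alloc(count * sizeof(*buf));
    if (!buf)
        return -ENOMEM;
    *out = buf;
    return 0;
}

/* Test double that always fails, standing in for an exhausted heap. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

A caller can then propagate the negative errno value unchanged, which matches the convention used for the other error returns mentioned in these notes (for example -ENOSYS).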

ADR-0875 — GitHub Actions hardening audit — 2026-05-30

no rebase impact: REASON — all changes are confined to fork-local CI workflows under .github/workflows/ (go-ci.yml, rust-ci.yml, sanitizers.yml, supply-chain.yml). Upstream Netflix/vmaf has a completely different CI pipeline; none of these files exist upstream. Adds top-level permissions: contents: read to the two Go/Rust workflows and persist-credentials: false to five actions/checkout steps. No source code touched.

Fork-local files: .github/workflows/go-ci.yml, .github/workflows/rust-ci.yml, .github/workflows/sanitizers.yml, .github/workflows/supply-chain.yml, docs/adr/0875-github-actions-audit-2026-05-30.md, docs/research/github-actions-audit-2026-05-30.md, changelog.d/security/github-actions-audit-2026-05-30.md.

ADR-0873 — ARM64 NEON bit-exactness audit — 2026-05-30

Rebase impact: low, limited to build system and one test file.

core/src/meson.build lines 581–643: the arm64_v8 static lib is split into arm64_v8 (integer-only TUs, unchanged compile flags) and arm64_v8_fp (float-arithmetic TUs, new -ffp-contract=off flag). If upstream Netflix/vmaf adds new NEON TUs to this region, they must be classified as integer or float and placed in the correct lib.

core/src/feature/arm64/float_adm_neon.c: float_adm_sum_cube_neon and float_adm_csf_den_scale_neon now accumulate into float64x2_t instead of float32x4_t. This is a numeric change — if upstream modifies these functions, the double-accumulation pattern must be preserved.
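A scalar model of why the accumulator width matters, offered as an illustration only: `sum_cube_f32` and `sum_cube_f64` are hypothetical names, and the real functions operate on NEON vectors rather than plain loops.

```c
#include <stddef.h>

/* Single-precision accumulator: small terms are lost once the running sum
 * is large, because a float carries only a 24-bit significand. */
static float sum_cube_f32(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i] * x[i];
    return acc;
}

/* Double-precision accumulator: each input is widened before cubing, so
 * both the products and the running sum keep 53 bits of significand. */
static double sum_cube_f64(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = (double)x[i];
        acc += v * v * v;
    }
    return acc;
}
```

For the input {1000, 1, -1000} the exact sum of cubes is 1. The double version returns exactly that, while on targets that evaluate float arithmetic in single precision (x86-64 with SSE, AArch64) the float version absorbs the middle term into 1e9 and ends at 0. Reverting the accumulator type during a merge would reintroduce this kind of loss.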

core/src/feature/adm.c: comment-only change (ADR-0873 follow-up note).

core/test/test_motion_v2_simd.c: fill_adversarial_neg and fill_adversarial_mixed moved outside #if ARCH_X86; NEON test arm added for motion_score_pipeline_16_neon. On upstream sync, ensure the x86 test body still compiles.

Fork-local files: core/src/meson.build (lib split), core/src/feature/arm64/float_adm_neon.c (reduction stability), core/src/feature/arm64/AGENTS.md (invariant note), core/src/feature/adm.c (comment), core/test/test_motion_v2_simd.c (NEON test arm), docs/adr/0873-arm64-neon-bit-exactness-audit.md, changelog.d/fixed/arm64-neon-bit-exactness-audit.md.

Logging consistency audit — 2026-05-30

No rebase impact on upstream. All routed sites are fork-local: core/src/libvmaf.c (vmaf_write_output — fork-added entry point introduced by the --precision/output overhaul), core/src/sycl/dispatch_strategy.cpp (fork-only file, ADR-0181), core/src/sycl/common.cpp (fork-only file). Vendored core/src/svm.cpp and upstream-mirror feature extractors (vif.c, adm.c, ms_ssim.c, motion.c, ssim.c) are explicitly deferred precisely because they carry upstream-sync invariants — leaving them untouched preserves the rebase story.

Fork-local files: core/src/libvmaf.c, core/src/sycl/dispatch_strategy.cpp, core/src/sycl/common.cpp, docs/research/logging-consistency-audit-2026-05-30.md, changelog.d/changed/logging-consistency-audit.md.

ADR-0870 — Helm values.schema.json + dev-MCP path drift — 2026-05-30

no rebase impact: all touched files are fork-additions (deploy/helm/, dev/Containerfile, dev/docker-compose.yml, .dockerignore, docs/adr/0870-*.md, docs/adr/README.md, docs/adr/_index_fragments/_order.txt, docs/development/k8s-deployment.md, changelog.d/added/0870-*.md, changelog.d/fixed/0870-*.md, docs/state.md). None of these have upstream Netflix/vmaf counterparts. The Containerfile path fixes (libvmaf/ → core/) are the downstream of ADR-0700's repo rename; future rebases against a hypothetical upstream that re-introduced a libvmaf/ directory at the repo root would need their own audit, but no such state exists or is planned.

ADR-0868 — GPU backend kernel coverage gap-fill — 2026-05-30

No rebase impact: all changes are net-new fork-local test files under core/test/test_{cuda,hip,sycl,metal}_*_parity*.c plus their meson wiring. No upstream Netflix/vmaf source is touched. The tests target fork-added GPU extractor names (psnr_cuda, ciede_cuda, psnr_hip, vif_hip, psnr_sycl, vif_sycl, the 8 *_metal extractors) which do not exist upstream. Mirrors the existing test_{cuda,hip,sycl}_motion3_parity.c pattern (already fork-local).

Fork-local files: core/test/test_cuda_psnr_parity.c, core/test/test_cuda_ciede_parity.c, core/test/test_hip_psnr_parity.c, core/test/test_hip_vif_parity.c, core/test/test_sycl_psnr_parity.c, core/test/test_sycl_vif_parity.c, core/test/test_metal_kernel_registration.c, core/test/meson.build (additive blocks only, no upstream-touching hunks), docs/adr/0868-gpu-backend-kernel-coverage.md, docs/research/gpu-backend-kernel-coverage-audit-2026-05-30.md, changelog.d/added/0868-gpu-backend-kernel-coverage.md.

test/feature-extractor-coverage-push — 2026-05-30

no rebase impact: REASON — test-only changes confined to fork-local files under core/test/. New test file core/test/test_mkdirp.c is wholly fork-added (no upstream equivalent); the four touched files (core/test/test_luminance_tools.c, core/test/test_feature.c, core/test/test_feature_extractor.c, core/test/meson.build) gain only new static char *test_… functions and registrations — no existing logic edited. No production source under core/src/ is touched. The test_mkdirp binary compiles core/src/feature/mkdirp.c directly into its TU; this is the same pattern other test binaries already use (test_ref compiles ../src/ref.c, test_thread_pool compiles ../src/thread_pool.c, etc.), so the linkage introduces no new precedent.

Fork-local files: core/test/test_mkdirp.c (new), core/test/test_luminance_tools.c, core/test/test_feature.c, core/test/test_feature_extractor.c, core/test/meson.build, changelog.d/added/feature-extractor-coverage-push.md.

fix/simd-bug-audit-20260531 — 2026-05-31

no rebase impact: fork-local SIMD entry points only. The two patched files (core/src/feature/x86/float_adm_avx2.c, core/src/feature/arm64/float_adm_neon.c) are fork-added SIMD ports of upstream adm_dwt2_s; they are not yet wired through compute_adm (ADR-0873 follow-up). Upstream Netflix/vmaf has neither file. The change harmonises the NULL-allocation guard with the already-shipped AVX-512 sibling (float_adm_avx512.c) which has been the de-facto reference since master tip; no upstream merge can collide. The third file (core/src/feature/arm64/ssimulacra2_host_neon.c) is wholly fork-added (SSIMULACRA 2 is a fork extractor) and the edit is comment-only.

Fork-local files: core/src/feature/x86/float_adm_avx2.c, core/src/feature/arm64/float_adm_neon.c, core/src/feature/arm64/ssimulacra2_host_neon.c, changelog.d/fixed/simd-float-adm-dwt2-unchecked-aligned-malloc.md.

ai/ tempfile + path-safety bandit sweep — 2026-05-30

no rebase impact: REASON — every touched file lives under ai/scripts/ or ai/tests/, all of which are wholly fork-local (Netflix upstream ships no tiny-AI training, dataset acquisition, or ONNX export pipeline). No upstream Netflix/vmaf file is touched. Fork-local files: ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/konvid_to_full_features.py, ai/scripts/export_tiny_models.py, ai/scripts/export_u2netp_mirror.py, ai/scripts/export_vmaf_tiny_v{2,3,4}.py, ai/scripts/fetch_konvid_1k.py, ai/scripts/fetch_youtube_ugc_subset.py, ai/tests/test_corpus_base.py, ai/tests/test_feature_extractor_defaults.py, ai/tests/test_merge_corpora.py, ai/tests/test_train_predictor_v2_realcorpus.py, changelog.d/security/ai-tempfile-and-path-safety.md.

Go controller / server / MCP test coverage expansion (2026-05-30)

No rebase impact: all touched files are fork-added Go tests under cmd/ — none have an upstream Netflix/vmaf counterpart (upstream ships no Go sources). The PR adds:

  • cmd/vmafx-controller/main_extra_test.go (new)
  • cmd/vmafx-controller/nodes/registry_edge_test.go (new)
  • cmd/vmafx-server/main_extra_test.go (new)
  • cmd/vmafx-mcp/impl_test.go (new)
  • changelog.d/added/go-controller-mcp-coverage.md (new)
  • docs/state.md (one _Updated: annotation line; no row change)
  • docs/rebase-notes.md (this entry)

No upstream-mirror file is touched.

ADR-0848 — Per-surface doc compliance audit (2026-05-29)

Rebase impact: none. This PR adds only docs/research/, docs/adr/, changelog.d/, and docs/state.md changes. No code, no meson, no public headers.

Future rebases: If PRs that fix the three gaps (Issue A / B / C from Research-0848) are in flight, note:

  • Issue A (Vulkan removal docs): no conflict expected — docs/backends/vulkan/, docs/metrics/features.md, docs/development/build-flags.md are rarely touched.
  • Issue B (deprecations.md): docs/development/deprecations.md is append-only.

Changelog fragment consolidation (2026-05-29)

no rebase impact: changelog-only — scripts/release/concat-changelog-fragments.sh awk fix + changelog.d/ fragment moves do not touch any upstream Netflix/vmaf source file. No C, Python, or test changes.

float_adm AVX2/AVX-512 F2+F3 precision fix (ADR-0844, 2026-05-29)

Rebase invariant: if an upstream Netflix/vmaf commit changes float_adm_csf_den_scale_s, float_adm_sum_cube_s, or any other reduction function in core/src/feature/float_adm.c, the corresponding AVX2 and AVX-512 variants in core/src/feature/x86/float_adm_avx2.c and float_adm_avx512.c must be updated to preserve the double-precision widening contract (ADR-0844 / ADR-0139). The hadd_pd4 and hsum_ps_to_double helpers are static inline and duplicated across TUs intentionally — do not merge them into a shared header. The -ffp-contract=off per-TU static library carve-out in core/src/meson.build (the x86_float_adm_avx2_lib and x86_float_adm_avx512_lib targets) must be preserved on any rebase that touches the meson.build AVX2/AVX-512 build block; they mirror the ssimulacra2 carve-out already in tree.

AVX-512 motion parity tests (ADR-0854, 2026-05-29)

no rebase impact: REASON — changes are confined to new test files (core/test/test_motion_avx512_parity.c, changelog.d/added/motion-avx512-parity-tests.md, docs/adr/0854-motion-avx512-parity-tests.md) and additive changes to core/test/simd_bitexact_test.h (new helper function) and core/test/meson.build (new test registration). No upstream Netflix/vmaf production source is modified; no existing test is changed; no golden assertions are touched.

ADR-0852 — HIP speed extractor wiring (2026-05-29)

no rebase impact: the three changed files (core/src/meson.build, core/src/hip/meson.build, core/src/feature/feature_extractor.c) are fork-owned; no upstream Netflix/vmaf C source is touched. The only upstream-adjacent file is feature_extractor.c whose #if HAVE_HIP block is a fork-added section; conflicts are only possible with other HIP-wiring PRs.

Dependency audit 2026-05-30 — golang.org/x/net + x/sys bump

No rebase impact: the only changed files are go.mod / go.sum, plus a changelog fragment and a research digest. The Go workspace is a fork-only addition (Netflix/vmaf upstream does not ship Go modules); there is no upstream baseline to rebase against. Versions: golang.org/x/net v0.53.0 -> v0.55.0, golang.org/x/sys v0.43.0 -> v0.45.0, golang.org/x/term v0.42.0 -> v0.43.0, golang.org/x/text v0.36.0 -> v0.37.0 (minimum-version selection).

Fork-local files: go.mod, go.sum, changelog.d/security/dependency-audit-2026-05-30.md, docs/research/dependency-audit-2026-05-30.md.

CodeQL Go coverage + config conflict resolution (ADR-0811, 2026-05-29)

no rebase impact: CI-config-only change; no public API surface affected. All changes are confined to .github/codeql-config.yml (Go paths addition + gen/go exclusion), .github/workflows/security-scans.yml (new codeql-go job), docs/adr/0811-security-codeql-go-pvr.md, and the changelog fragment. No upstream Netflix/vmaf files are touched; no C/Python/Go production code is modified. On upstream sync, the CodeQL workflow additions apply cleanly regardless of upstream changes.

Fork-local files: .github/codeql-config.yml, .github/workflows/security-scans.yml, docs/adr/0811-security-codeql-go-pvr.md, changelog.d/security/0811-codeql-go-config-fix.md.

release-please draft mode

no rebase impact: release-tooling-only change (release-please-config.json "draft": true). No C sources, headers, or test logic modified.

Coverage-overrides audit — tighten tiny_extractor_template.h (ADR-0881, 2026-05-30)

no rebase impact: REASON — changes are confined to fork-only files: scripts/ci/coverage-check.sh (fork-only CI gate), the new docs/adr/0881-*.md ADR, the new docs/research/0881-*.md digest, the ADR index fragment under docs/adr/_index_fragments/, and the changelog.d/changed/ fragment. The threshold ratchet only tightens an existing override (10 → 75) — does not introduce a new path Netflix upstream might also override. Future audits per the codified rule (see ADR-0881 §Decision) are also fork-only since coverage-check.sh itself is fork-only (Netflix upstream has no equivalent gate).

vmafx-operator envtest etcd setup (2026-05-30)

no rebase impact: REASON — all changes are in fork-added paths only. Files touched: Makefile (new setup-envtest + setup-envtest-env targets in the Go workspace section, ADR-0702 scope), .github/workflows/go-ci.yml (new pre-test step installing sigs.k8s.io/controller-runtime/tools/setup-envtest@latest + exporting KUBEBUILDER_ASSETS), cmd/vmafx-operator/internal/controller/suite_test.go (top-of-TestControllers t.Skip() guard + nil-testEnv bailout in AfterSuite), cmd/vmafx-operator/AGENTS.md (new invariant #6 documenting the skip-safe envtest pattern), and the changelog.d/fixed/ fragment. cmd/vmafx-operator/ is fork-added per ADR-0714 — upstream Netflix/vmaf ships no Go sources, so no upstream merge can reach these files.

log.c → log.cpp C++23 pilot (ADR-0708 Wave 1, 2026-05-30)

Upstream Netflix libvmaf/src/log.c is fork-renamed to core/src/log.cpp. Future port-upstream-commit runs that touch libvmaf/src/log.c must apply changes to core/src/log.cpp instead — the fork-rename mapping is recorded here.

Public C ABI is preserved: core/src/log.h retains the same two function prototypes (vmaf_log, vmaf_set_log_level) and now carries extern "C" guards so it is includable from both C and C++ TUs. The C-mangled exported symbols are unchanged (nm libvmaf.so shows vmaf_log and vmaf_set_log_level with the same C-mangling as the prior log.c build).
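The guard pattern referred to here looks roughly like the sketch below. The `sketch_` names and the `int` level type are placeholders rather than the real log.h prototypes, and the definitions at the bottom exist only to make the fragment a complete translation unit.

```c
/* Header portion (hypothetical sketch_log.h). */
#ifndef SKETCH_LOG_H_
#define SKETCH_LOG_H_

#ifdef __cplusplus
extern "C" { /* C++ TUs see these declarations with C linkage */
#endif

void sketch_set_log_level(int level);
int sketch_get_log_level(void);

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* SKETCH_LOG_H_ */

/* Implementation portion. In a C++ implementation file the definitions
 * inherit C linkage from the declarations above, so the exported symbol
 * names stay unmangled. */
static int sketch_level;

void sketch_set_log_level(int level)
{
    sketch_level = level;
}

int sketch_get_log_level(void)
{
    return sketch_level;
}
```

A declaration added outside the `extern "C"` block would be mangled when the header is included from a C++ TU, which would break linking against C callers. That is why new upstream declarations have to land inside the guard.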

Meson wiring: log.cpp compiles in an isolated log_cpp23_lib static_library with override_options: ['cpp_std=' + libvmaf_cpu_cpp_std], mirroring the metadata_handler_cpp20_lib pattern (ADR-0708 metadata_handler pilot). Test executables that previously direct-compiled ../src/log.c (test_lpips, test_dists, test_feature_extractor, test_speed, ...) now pick up log symbols via the shared log_cpp23_test_objects aggregate in core/test/meson.build.

Fork-local files: core/src/log.cpp (was: core/src/log.c, removed), core/src/log.h (added extern "C" guards), core/src/meson.build (replaced log.c source entry with log_cpp23_lib), core/test/meson.build (added log_cpp23_test_objects, removed inline '../src/log.c' source entries from ~20 test execs, wired test_log into the fast suite), docs/adr/0708-vmafx-cpp23-internals-pilot.md (consequences cross-link), changelog.d/changed/log-c-to-cpp23.md.

extern "C" guards added: log.h, model.h, read_json_model.h, opt.h. Any upstream commit that adds new declarations to these headers must place them inside the guard block for correctness. Flag in the port if upstream adds a declaration outside the guard block.

port/upstream-netflix-may-jun-2026 — 2026-06-01

Five Netflix upstream commits ported. Each reduces the diff against upstream and therefore reduces future rebase friction.

  1. e4b93c6ed (fetch_picture direct-read): core/tools/vmaf.c no longer has a #ifdef USE_DIRECT_READ branch. Future upstreams that touch vmaf.c will now merge cleanly without the compile-guard conflict.

  2. a4a1492d3 (integer_motion rename): core/src/feature/integer_motion.c and core/src/feature/x86/motion_avx2.{c,h} / motion_avx512.{c,h} are now at upstream parity. integer_motion_v2.c and motion_v2_avx2/512 are fork-local (GPU build paths); any future upstream touch of those names should check whether the GPU backends have been updated to the renamed API.

  3. c2155d6cd (2160p CSF): core/src/feature/barten_csf_tools.h is now at upstream parity. core/test/test_barten_csf.c has new upstream tests.

  4. 9a078011c (ADM SIMD fix): core/src/feature/integer_adm.c and core/src/feature/x86/adm_avx2.c + adm_avx512.c are now at upstream parity.

  5. 30f472b14 (Speed_chroma AVX): core/src/feature/speed.c, core/src/feature/x86/speed_avx2.{c,h}, core/src/feature/x86/speed_avx512.{c,h} are new upstream-mirror files. Future upstream touches to speed.c may need to propagate compute_cov_kernel into the GPU speed_chroma extractors.

Fork-local files touched: core/tools/vmaf.c (commit #1 — call-site updates for signature change), core/src/feature/feature_extractor.c (commit #2 — remove CPU v2), core/src/meson.build (commits #2, #5 — add speed_avx2/512, remove motion_v2 CPU build).


ADR-0700 Dockerfile path residuals — 2026-05-30

no rebase impact: REASON — touches only fork-added Dockerfiles (docker/Dockerfile.production, docker/Dockerfile.production-gpu, docker/dev/{alpine-3.20,arch,fedora-40}.Dockerfile) and the fork-added dev/Containerfile. None of these files has an upstream Netflix/vmaf counterpart. The change is a literal libvmaf/ → core/ substitution at source-tree positions (meson setup … core, COPY core/, cd core); install-path / package / filter-name occurrences (/usr/local/include/libvmaf/, libvmaf.so, libvmaf-dev, --enable-libvmaf*) are deliberately preserved because they describe the shipped library / package / ffmpeg-filter surface, not the source layout.


ADR-0709 residual ANSNR references in docs + ai/data — 2026-05-30

no rebase impact: REASON — all changes are fork-local. Touched files are ai/data/feature_extractor.py (fork-added Python helper, no upstream counterpart), docs/metrics/ansnr.md, docs/backends/index.md, docs/backends/cuda/overview.md, docs/backends/hip/overview.md. All four docs pages are fork-only. No upstream Netflix/vmaf source is touched. The cleanup closes residual references left over after PR #38 (ADR-0709) removed the float_ansnr extractor from every backend.

fix/post-rename-post-vulkan-sweep — 2026-06-01

no rebase impact: post-rename cleanup only. The Containerfile change adds a pkg-install line that cannot conflict with upstream (upstream has no Containerfile). The score_backend.py change is fork-local code with no upstream counterpart. The test and doc updates are purely fork-local.

ADR-0777 — Thread-Safety Audit: CUDA / SYCL / HIP Backends (2026-05-29)

no rebase impact: docs/research + docs/adr only; no source files were changed.

Python dep freshness sweep (ADR-0879, 2026-05-30)

no rebase impact: all touched files are fork-local — ai/pyproject.toml, mcp-server/vmaf-mcp/pyproject.toml, dev-llm/pyproject.toml, tools/vmaf-tune/pyproject.toml, tools/vmaf-roi-score/pyproject.toml, python/test/requirements.txt. Netflix upstream does not ship the ai/, mcp-server/, dev-llm/, or tools/vmaf-* trees; the only file shared with upstream (python/test/requirements.txt) gained a >=7.1.0 floor on pytest-cov which is purely additive over upstream's bare pytest-cov line. On rebase, keep the bumped floor; if upstream introduces its own ceiling on pytest-cov, intersect rather than overwrite.

pyright-strict-audit (2026-05-30, ADR-0888)

no rebase impact: REASON — all touched files are fork-added Python sources under ai/src/ and tools/vmaf-tune/src/ (and the CodecAdapter Protocol in tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, also fork-added). No upstream Netflix/vmaf file is touched. The annotation tightening (TYPE_CHECKING torch import, assert-based Optional narrowing, cast through stub gaps, dropped dead Optional comparisons) does not change runtime behaviour — all 12 fixes are pure type-checker compliance. The audit's companion file pyrightconfig.audit.json is intentionally gitignored so this PR doesn't introduce a CI gate before the long-tail cleanup is done.

cuda-ms-ssim-double-precision-lcs (2026-06-03, ADR-0990)

no rebase impact: REASON — all touched files are fork-added CUDA sources (core/src/feature/cuda/integer_ms_ssim/ms_ssim_score.cu, core/src/feature/cuda/integer_ms_ssim_cuda.c) and docs. Netflix upstream does not ship a CUDA ms_ssim kernel. The invariant to preserve on any future port of ssim_accumulate_default_scalar changes from upstream: the ms_ssim_vert_lcs kernel must keep double for my_l/my_c/my_s, the shared-memory warp partial arrays, and the c1/c2/c3 parameters — these are load-bearing for the places=4 parity gate (ADR-0990 / ADR-0139). If upstream changes the scalar 2.0 * to 2.0f * (regressing to float), do NOT mirror that change into the CUDA kernel without also updating the parity test tolerance.
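Why the accumulator width matters can be shown with a host-side C analogue. This is a sketch only, not the CUDA kernel: the function names are hypothetical, and the inputs are chosen to make the effect visible rather than to reproduce the parity gate.

```c
#include <assert.h>
#include <stddef.h>

/* Sum with a single-precision accumulator: every partial sum is
 * rounded to 24 bits of mantissa. */
float sketch_sum_f32(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Same inputs, but the partial sums are kept in double precision. */
double sketch_sum_f64(const float *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)v[i];
    return acc;
}
```

With the inputs {16777216.0f, 1.0f, 1.0f} on an IEEE-754 target without excess precision, the float sum stays at 16777216 because each added 1 is rounded away, while the double sum reaches 16777218. Partial sums in a reduction can lose low-order bits in the same way, which is the kind of drift a tight parity tolerance will notice.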

sycl-float-ssim-ssimulacra2-parity-research (2026-06-03, ADR-0985)

no rebase impact: REASON — changes are: (1) a new test file core/test/test_sycl_float_ssim_parity.c (fork-added, no upstream equivalent), (2) a meson.build test entry, (3) a research document, (4) an ADR, (5) a clarifying comment in integer_ssim_sycl.cpp (fork-added GPU kernel), and (6) state.md / changelog.d fragment updates. No CPU scalar, no public API, no Netflix upstream file is touched.

perf/arm64-float-moment-sve2 (2026-06-03, ADR-0584)

Rebase note: core/src/feature/float_moment.c gains an #if HAVE_SVE2 block that selects compute_1st_moment_sve2 / compute_2nd_moment_sve2 over the NEON fallback when VMAF_ARM_CPU_FLAG_SVE2 is set. The scalar default and the NEON path are unchanged; SVE2 is purely additive. core/src/meson.build gains a new arm64_moment_sve2_lib static library inside the existing if is_sve2_supported block. core/test/test_moment_simd.c gains four SVE2 test functions guarded by #if HAVE_SVE2. docs/backends/arm/overview.md updates the per-feature coverage table. No upstream Netflix/vmaf file is touched; no public C API or CLI flag changes.

shared-strict-json-helpers (2026-06-03, ADR-0988)

no rebase impact: REASON — all touched files are fork-added Python modules (tools/vmaf-tune/src/vmaftune/compare.py, report.py, benchmark.py; mcp-server/vmaf-mcp/src/vmaf_mcp/server.py) with no upstream Netflix/vmaf equivalent. The changes are import additions and private-function removals; no public API, no C sources, no Netflix golden-data files are touched.

sycl-motion-add-uv (2026-06-03, ADR-0989)

no rebase impact: REASON — all changed files are fork-added GPU backends (integer_motion_sycl.cpp, integer_motion_cuda.c, motion_vulkan.c, integer_motion_hip.c, integer_motion_metal.mm). The upstream Netflix integer_motion.c is not modified. If upstream adds motion_add_uv to integer_motion.c in a future sync, check whether the SYCL per-plane normalization formula remains consistent.

avx512-float-moment (2026-06-03, ADR-0987)

no rebase impact: REASON — all touched files are fork-added x86 SIMD sources (core/src/feature/x86/moment_avx512.c, moment_avx512.h) and the dispatch addition inside float_moment.c is guarded by HAVE_AVX512 / VMAF_X86_CPU_FLAG_AVX512 ifdefs that are invisible on any non-AVX-512 build path. The four new parity test cases in test_moment_simd.c are also guarded by HAVE_AVX512 and do not touch any upstream Netflix file. No public C API, no meson_options.txt entry, no CLI flag, no ffmpeg patch is changed.

CUDA motion 8-frame SAD batching (ADR-0845, 2026-05-29)

core/src/feature/cuda/integer_motion_cuda.c — structural change to MotionStateCuda (sad ring, score_ring, last_batch_boundary fields) and rewrite of submit/collect/flush.

Rebase impact: MEDIUM. If an upstream Netflix commit touches integer_motion_cuda.c, expect a conflict in submit_fex_cuda and collect_fex_cuda. Resolution rules:

  1. Keep the batching structure (sad[] ring, batch-boundary sync in collect).

  2. Apply upstream logic changes (e.g., score normalization formula, new options) to the batch-emit paths rather than the old per-frame paths.

  3. Verify the ADR-0358 invariant: cuMemsetD8Async is always on pic_stream, NOT s->str.

  4. The emit_batch_scores() frame_index save/restore must survive; dropping it breaks motion3 moving-average correctness.

core/src/feature/cuda/AGENTS.md — new section "Motion SAD batch fencing": keep verbatim on rebase.

ADR-0930 — Helm NetworkPolicy + PSS baseline — 2026-05-31

no rebase impact: REASON — every touched file lives entirely under deploy/helm/vmafx/ (a fork-local directory; Netflix upstream ships no Helm chart), plus fork-local documentation under docs/development/, docs/adr/, docs/research/, docs/state.md, docs/rebase-notes.md (this file), and changelog.d/added/. None of the production C, Go, Python, FFmpeg-patch, or Meson surfaces are touched. Future upstream syncs cannot conflict with this change.

Fork-local files:

  • deploy/helm/vmafx/values.yaml (UID + seccomp + networkPolicy block)
  • deploy/helm/vmafx/templates/networkpolicy.yaml (new)
  • deploy/helm/vmafx/templates/operator-deployment.yaml (inherit from .Values)
  • deploy/helm/vmafx/templates/tests/test-connection.yaml (inherit from .Values)
  • deploy/helm/vmafx/templates/NOTES.txt (PSS / NP hints)
  • docs/development/k8s-deployment.md (Pod security + NetworkPolicy sections)
  • docs/adr/0930-helm-networkpolicy-pss.md and matching _index_fragments/ entry
  • docs/research/0930-helm-networkpolicy-pss.md
  • changelog.d/added/helm-networkpolicy-pss.md
  • docs/state.md (closed row)

fix(hip): integer_ms_ssim_hip picture_copy normalization — 2026-06-03

no rebase impact: REASON — changes touch only core/src/feature/hip/integer_ms_ssim_hip.c (fork-only HIP backend, no upstream equivalent in Netflix/vmaf), docs/state.md (fork-local bug tracker), changelog.d/fixed/ (fragment), and docs/rebase-notes.md (this entry). core/test/test_hip_ms_ssim_parity.c and the corresponding meson.build entry were already present on master (merged from gpu-runtime-bug-audit). No CPU scalar path, no public header, no Netflix upstream file is touched.


feat(simd): integer-ssim-avx2 (ADR-0784)

Branch: feat/integer-ssim-avx2
Touches: core/src/feature/integer_ssim.c, core/src/feature/x86/integer_ssim_avx2.{c,h}, core/src/meson.build (x86_avx2_sources), core/test/test_integer_ssim_simd.c, docs/adr/0784-integer-ssim-avx2.md, docs/backends/x86/integer-ssim-avx2.md.

Adds AVX2 dispatch for the horizontal moment accumulation pass in integer_ssim.c. The dispatch uses function pointers in IntegerSsimState; the integer_ssim_moments_t struct in integer_ssim_avx2.h must stay layout-identical to ssim_moments in integer_ssim.c. Any upstream refactor of ssim_moments field order requires a matching update in the AVX2 header.
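A function-pointer dispatch of this shape can be sketched as follows. All names here (SketchMoments, SketchState, the sketch_* functions) and the two-field accumulator are hypothetical, and the "wide" path is a plain unrolled loop standing in for a SIMD kernel; the sketch only shows the state struct selecting an implementation once and both paths producing identical integer results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator; the real struct has a different field set. */
typedef struct {
    int64_t sum_x;
    int64_t sum_xx;
} SketchMoments;

typedef void (*sketch_accum_fn)(const uint8_t *row, size_t w, SketchMoments *m);

/* Scalar reference path. */
static void sketch_accum_scalar(const uint8_t *row, size_t w, SketchMoments *m)
{
    for (size_t i = 0; i < w; i++) {
        m->sum_x += row[i];
        m->sum_xx += (int64_t)row[i] * row[i];
    }
}

/* Stand-in for a SIMD path: two samples per iteration plus a scalar tail.
 * It must match the scalar path bit for bit. */
static void sketch_accum_wide(const uint8_t *row, size_t w, SketchMoments *m)
{
    size_t i = 0;
    for (; i + 1 < w; i += 2) {
        m->sum_x += row[i] + row[i + 1];
        m->sum_xx += (int64_t)row[i] * row[i] + (int64_t)row[i + 1] * row[i + 1];
    }
    for (; i < w; i++) {
        m->sum_x += row[i];
        m->sum_xx += (int64_t)row[i] * row[i];
    }
}

typedef struct {
    sketch_accum_fn accum_row; /* chosen once at init, called per row */
} SketchState;

/* `have_wide` stands in for a runtime CPU-flag check. */
void sketch_init(SketchState *s, int have_wide)
{
    s->accum_row = have_wide ? sketch_accum_wide : sketch_accum_scalar;
}
```

Because both paths write through the same accumulator type, any change to that type's layout has to be made for every implementation the pointer can select.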


chore(ci): ci-workflow-name-shortening (ADR-0995)

Branch: chore/ci-shorten-workflow-names

no rebase impact: pure CI display-name rename; no C/C++/Python source touched.


fix(dnn): add missing vmaf_ort_internal_input/output_elem_type accessors

Branch: fix/dnn-ort-internals-missing-elem-type-accessors
Touches: core/src/dnn/ort_backend_internal.h, core/src/dnn/ort_backend.c, changelog.d/fixed/dnn-ort-internals-elem-type-accessors.md.

No rebase impact: the added symbols (VmafOrtElemType, vmaf_ort_internal_input_elem_type, vmaf_ort_internal_output_elem_type) are fork-local internal-test helpers with no upstream Netflix/vmaf analogue. The VmafOrtSession.input_elem_types / .output_elem_types fields and the VMAF_HAVE_DNN guard structure they read from are also fork-local. Conflict probability on these files with upstream is zero.


chore(scripts): modernization-audit scanner — reduce false-positive noise

Branch: chore/modernization-audit-false-positive-filter
Touches: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, changelog.d/fixed/modernization-audit-calibration-and-closed-row-noise.md.

No rebase impact: changes are confined to the developer-tools scanner and its test file. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The new module-level constants (CALIBRATION_PLACEHOLDER_PATHS, CLOSED_SECTION_HEADINGS_RE, CLOSED_ROW_RE) and the updated scan_state_files / _marker_suppressed functions have no upstream analogue; there is no merge conflict possible.


fix(rust): vmafx-sys Default trait + Rust CI re-trigger

Branch: fix/rust-ci-vmafx-sys-build-dep

no rebase impact: adds impl Default for VmafContext in bindings/rust/vmafx-sys/src/safe.rs. Fork-local Rust crate with no upstream analogue; no C source, public header, or Python file is touched.


fix(perf): scaffold perf gate baseline + advisory threshold (ADR-1005)

Branch: fix/perf-gate-advisory-threshold-adr1005

no rebase impact: adds --advisory and --skip-if-no-baseline flags to scripts/perf/check-regression.py, updates the CI workflow step comment and flags, and adds docs/development/perf-gate.md. No C source, public header, Netflix golden assertion, or upstream-mirrored Python file is touched.


test(c): CPU feature extractor coverage push — round 3

Branch: test/cpu-extractor-coverage-push

no rebase impact: adds four new test-only .c files under core/test/ and wires them into core/test/meson.build. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The new tests exercise existing extractor paths; no new symbols are introduced.


chore(cppcheck): audit + cite all cppcheck-suppress comments

Files touched: core/src/feature/vif.c, changelog.d/chore/cppcheck-suppress-cite-audit.md, docs/rebase-notes.md.

No rebase impact: comment-only edit to vif.c; adds [MISRA-C:2012-11.3/EXP36-C] citations to 10 bare cppcheck-suppress invalidPointerCast annotations. No logic changed; no public header, Netflix golden assertion, or upstream-mirrored symbol is affected.


test(compat-python-vmaf): coverage push round 2 — Asset + ResultStore + crossval

Branch: test/compat-python-vmaf-coverage-push (or equivalent worktree branch)

no rebase impact: all four new test files (test_asset.py, test_result_store.py, test_cross_validation.py, test_tools_misc.py) live exclusively under compat/python-vmaf/tests/. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The tests exercise existing public APIs only and add no new production code paths.


chore(adr-0726): final Vulkan residual scrub — config flags + Docker + comments

Branch: chore/adr-0726-vulkan-residual-scrub

no rebase impact: removes dead Vulkan build-matrix rows and updates stale Vulkan references in docs, CLAUDE.md, and AGENTS.md files to past tense. No C source files changed. No public headers changed. No upstream-mirrored Python files changed. No Netflix golden assertions touched. The only structural change is removing two dead CI matrix rows that would fail anyway (meson rejects the unknown enable_vulkan option).

ADR-1011 — CUDA symbol visibility (2026-06-04)

No rebase impact. Adding static to TU-internal functions has no ABI or behaviour effect — all call sites are function-pointer assignments within the same translation unit. No public headers changed.

ADR-1010 — MCP server JSON parse guards (2026-06-04)

No rebase impact. Error-handling only — wraps two json.loads calls. No protocol, API, or tool-schema changes. Output format on the success path is unchanged.

ADR-1008 — C lifecycle + test correctness fixes (2026-06-04)

No rebase impact. pic_cnt increment timing change only affects error-retry callers (extremely uncommon path). Div-by-zero guard only fires when n_subsample covers all frames in range (degenerate caller). Test fixes in test_feature_collector.c and test_framesync.c are test-only with no production code change.

ADR-1007 — C string/numeric UB fixes (2026-06-04)

No rebase impact. All changes are guarded code paths that only fire for unusual caller-supplied values (NULL string defaults, model names shorter than 5 chars, tiny ADM frame dimensions). No public API, golden assertions, or ABI touched.

ADR-1012 — Go queue state-machine guards (2026-06-04)

No rebase impact. Both changes affect only the internal SQLite write path of the controller queue. No public proto/gRPC API change. Callers that receive the new 'job was cancelled before assignment' error from PullWork should retry — the controller's own retry loop already does this.

ADR-1009 — Go shutdown goroutine fixes (2026-06-04)

No rebase impact. WaitForShutdown drain-window change only affects shutdown timing (returns up to 30s earlier on clean shutdown). GracefulStop hard-stop fallback only fires on stuck streaming RPCs. No public API, ABI, or golden assertion touched.


fix(observability): Prometheus registry isolation + timer leak (ADR-1014)

Branch: fix/r5-prometheus-registry

no rebase impact: all changes are in pkg/observability/observability.go and its test file. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The struct gains two private fields (reg prometheus.Registerer, sourcesOnce sync.Once) which are zero-valued before NewMetrics is called — no caller needs updating. The WaitForShutdown time.After → time.NewTimer + defer Stop() change is behaviour-equivalent; the only observable difference is the timer being released promptly on early return rather than at GC time.


fix(operator): Go operator resource-allocation — http.Client + gRPC dial (ADR-1017)

Branch: fix/r5-go-timer-ctx

no rebase impact: all changes are in cmd/vmafx-operator/internal/controller/. No public Go API, CRD schema, RBAC manifest, Helm template, or C source is touched. The SetupWithManager change is additive (adds an if r.HTTPClient == nil guard). No Netflix golden assertions touched.


fix(mcp,controller): exec.CommandContext + gRPC panic recovery (ADR-1018)

Branch: fix/r5-mcp-exec-ctx

no rebase impact: changes are in cmd/vmafx-mcp/impl.go and cmd/vmafx-controller/grpc_server.go. The runVmafScore Go signature change is internal to cmd/vmafx-mcp (all three callers are in the same file). No public MCP tool schema, JSON-RPC protocol, or gRPC proto file changes. No C source, public header, or Netflix golden assertions touched.


fix/y4m-dst-buf-read-sz-overflow (2026-06-04)

Files touched: core/tools/y4m_input.c, docs/adr/1022-y4m-dst-buf-read-sz-overflow.md, changelog.d/fixed/1022-y4m-dst-buf-read-sz-overflow.md

no rebase impact: the fix adds (size_t) casts to five arithmetic expressions in y4m_input_open_impl(). No public header is changed. No API surface changes. The upstream y4m_input.c source differs from this fork's copy (earlier ADR-0977 fixes are already in tree); cherry-picks of upstream Y4M changes will need to re-apply the same cast pattern to any newly introduced chroma branches.
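The cast pattern to re-apply on cherry-picks can be sketched as below. The helper name and its parameters are hypothetical and do not correspond to anything in y4m_input.c; the sketch only shows that each operand is widened to size_t before the multiplication, so the product is never computed in int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for one plane of w x h samples.
 * Casting each operand to size_t BEFORE multiplying keeps the arithmetic
 * out of int, where a large product would overflow (undefined behaviour).
 * Writing (size_t)(w * h) would be too late: the overflow has already
 * happened inside the parentheses. */
size_t sketch_plane_bytes(int w, int h, int bytes_per_sample)
{
    if (w <= 0 || h <= 0 || bytes_per_sample <= 0)
        return 0; /* reject non-positive dimensions up front */
    return (size_t)w * (size_t)h * (size_t)bytes_per_sample;
}
```

For example, 50000 x 50000 single-byte samples is 2500000000 bytes, which exceeds INT_MAX on platforms with 32-bit int but is computed correctly here. On targets with a 32-bit size_t, larger products can still wrap, so a separate upper-bound check would be needed there.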


fix(auth,nodes): constant-time session-token compare + JWT nbf validation (ADR-1021, 2026-06-04)

Files touched: cmd/vmafx-controller/nodes/registry.go, cmd/vmafx-controller/auth/middleware.go, mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py, cmd/vmafx-controller/nodes/registry_test.go, cmd/vmafx-controller/auth/middleware_test.go, docs/adr/1021-session-token-const-time-compare.md, changelog.d/fixed/r5-crypto-const-time-session-token.md

no rebase impact: security-only bug-fix with no public API changes. All modified symbols are internal (non-exported comparison logic, JWT payload struct field). No C source, public C API header, upstream-mirrored Python, Netflix golden assertion, or ffmpeg-patches file is touched.
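The idea behind a constant-time compare can be sketched in C, used here only to keep one language across the examples in these notes (the fix itself is in Go and Python files). The function name is hypothetical. The loop visits every byte and folds differences into one value, so its running time does not reveal the position of the first mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the two n-byte buffers are equal, 0 otherwise.
 * Every byte is always visited and differences are OR-ed together,
 * so there is no early exit at the first mismatching byte. */
int sketch_ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```

An optimizing compiler is free to restructure a hand-written loop like this, so production code normally relies on a vetted library routine instead, for example crypto/subtle.ConstantTimeCompare in Go or hmac.compare_digest in Python. This note does not state which routine the fix uses.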


fix/mcp-asyncio-adr1023 (2026-06-04)

Files touched: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, docs/adr/1023-mcp-asyncio-correctness.md, docs/adr/README.md, changelog.d/fixed/mcp-asyncio-correctness.md

no rebase impact: changes are isolated to the MCP server Python module and docs. No C source, public C header, Netflix golden-assertion file, or upstream-mirrored Python harness is touched. The changes are pure async-safety fixes inside coroutines that already existed; no public API or tool schema changes.


fix/r5-memory-ordering (2026-06-04, ADR-1020)

Files touched: core/src/ref.c, core/src/ref.cpp, core/src/ref.h, core/src/feature/feature_collector.h, core/src/feature/feature_collector.cpp, core/src/picture_pool.c

If an upstream Netflix commit touches any of these files, review the following invariants before accepting:

  • ref.c / ref.cpp: the decrement must remain memory_order_acq_rel; any upstream change that reverts to bare atomic_fetch_sub must be re-annotated.
  • feature_collector.h: the destroyed field must survive struct layout changes; all new public entry points that lock feature_collector->lock must add the destroyed-guard pattern immediately after the lock call.
  • picture_pool.c fetch path: the pool->pictures[idx] copy must happen before the unlock; if upstream refactors the fetch function, preserve that ordering.
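The first invariant can be sketched with C11 atomics. The SketchRef type and sketch_* functions are hypothetical, and the relaxed ordering on the increment is an assumption of this sketch rather than something these notes specify; only the acq_rel decrement mirrors the stated rule.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reference counter, not the fork's actual type. */
typedef struct {
    atomic_int cnt;
} SketchRef;

void sketch_ref_init(SketchRef *r)
{
    atomic_init(&r->cnt, 1); /* the creator holds the first reference */
}

void sketch_ref_inc(SketchRef *r)
{
    /* Assumption: taking an extra reference needs no ordering with
     * other memory, because the caller already holds one. */
    atomic_fetch_add_explicit(&r->cnt, 1, memory_order_relaxed);
}

/* Returns 1 when the caller dropped the last reference and may free. */
int sketch_ref_dec(SketchRef *r)
{
    /* acq_rel: the release half publishes this thread's writes to the
     * object; the acquire half lets the final owner observe the writes
     * of every thread that released before it. */
    return atomic_fetch_sub_explicit(&r->cnt, 1, memory_order_acq_rel) == 1;
}
```

A reviewer checking an upstream change against this invariant would look for the explicit memory_order argument on the decrement and for the "previous value was 1" test that gates the free.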

fix/integer-ssim-moments-type-non-x86 (2026-06-04, ADR-1040)

Files touched: core/src/feature/integer_ssim.h (new), core/src/feature/integer_ssim.c, core/src/feature/x86/integer_ssim_avx2.h

If an upstream Netflix commit adds or renames fields in the SSIM accumulation buffer, the shared header integer_ssim.h must be updated to match. The layout invariant (six consecutive int64_t fields, identical to the private ssim_moments struct) is documented in ADR-0784 and ADR-1040; any upstream layout change that breaks the direct cast in accum_row_scalar_8 / accum_row_scalar_16 requires a coordinated update to both the typedef and the cast sites.
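One way to make such a layout invariant fail at compile time instead of at run time is a set of static assertions, sketched below. Both struct definitions and all field names are hypothetical; the real ssim_moments fields are not listed in these notes. The sketch copies with memcpy rather than casting, purely to keep the example free of aliasing questions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical private struct; field names are invented. */
struct sketch_private_moments {
    int64_t a, b, c, d, e, f;
};

/* Hypothetical shared typedef that must mirror the struct above. */
typedef struct {
    int64_t a, b, c, d, e, f;
} sketch_shared_moments_t;

/* A field added, removed, or resized on either side breaks the build. */
_Static_assert(sizeof(struct sketch_private_moments) == 6 * sizeof(int64_t),
               "private struct must be six packed int64_t fields");
_Static_assert(sizeof(sketch_shared_moments_t) == sizeof(struct sketch_private_moments),
               "shared typedef must match the private struct size");
_Static_assert(offsetof(sketch_shared_moments_t, f) ==
                   offsetof(struct sketch_private_moments, f),
               "last field must sit at the same offset in both");

/* Byte-for-byte copy, valid only while the assertions above hold. */
sketch_shared_moments_t sketch_to_shared(const struct sketch_private_moments *p)
{
    sketch_shared_moments_t out;
    memcpy(&out, p, sizeof out);
    return out;
}
```

Size and offset checks do not catch a pure reordering of same-sized fields, so a field-order change upstream still needs a manual review of both definitions.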


fix/ci-go-rust-red-adr1041

no rebase impact: CI configuration change (go-ci.yml) and test/meson.build build guard. Neither touches public API or upstream-mirrored C code.


feat/0804-vmaf-context-get-backend

no rebase impact: purely additive public C API addition (new enum VmafBackend and new function vmaf_context_get_backend()). No upstream-mirrored code is modified; no existing entry points are changed. ADR-0804.


fix/dev-cuda-gpu-passthrough

no rebase impact: dev/docker-compose.yml only; changes default runtime: and expands capabilities for the NVIDIA passthrough. No C sources, public API, or upstream-mirrored files are touched. ADR-1053.


fix/cppcheck-vif-suppression-syntax

no rebase impact: comment-only change in core/src/feature/vif.c. Corrects cppcheck suppression delimiter from [...] to ; ... in 10 inline comments. No logic, no public API, no upstream-mirrored code changed.


chore/state-md-stale-open-rows-sweep-20260606

no rebase impact: docs/state.md only — removes 2 stale Open rows already in Recently Closed, updates 1 Open row's owner reference, and fixes a duplicate Recently Closed row. No C sources, public API, or upstream-mirrored files are touched.

test/go-vmafx-node-coverage-r6

no rebase impact: test-only additions to cmd/vmafx-node/executor_extra_test.go and cmd/vmafx-node/bpf/bypass_unit_test.go. No C sources, public API, upstream-mirrored files, or production Go code is changed.

fix/vendored-cjson-pdjson-depth-overflow (ADR-1061, 2026-06-06)

Touches two vendored sources: core/src/pdjson.c and core/src/mcp/3rdparty/cJSON/cJSON.c. Neither file exists in the Netflix upstream tree (pdjson and cJSON are fork-local additions). Rebase against Netflix/vmaf master has zero conflict risk from this change.

The PDJSON_STACK_MAX constant added to pdjson.c is a #define at the top of the file; any future vendor sync that replaces the file will need to re-apply the same define or find a better integration point.
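What a nesting cap of this kind guards against can be sketched independently of pdjson. The constant SKETCH_STACK_MAX, its value of 4, and the function below are hypothetical; the real PDJSON_STACK_MAX value is not given in this note, and this scanner deliberately ignores brackets inside string literals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit, chosen small so the behaviour is easy to test. */
#define SKETCH_STACK_MAX 4

/* Returns 0 if bracket nesting stays within SKETCH_STACK_MAX and is
 * balanced, -1 otherwise. Simplification: string literals are not
 * parsed, so brackets inside them are counted as structure. */
int sketch_check_depth(const char *json)
{
    size_t depth = 0;
    for (const char *p = json; *p != '\0'; p++) {
        if (*p == '[' || *p == '{') {
            if (depth == SKETCH_STACK_MAX)
                return -1; /* refuse before the stack grows past the cap */
            depth++;
        } else if (*p == ']' || *p == '}') {
            if (depth == 0)
                return -1; /* closing bracket without an opener */
            depth--;
        }
    }
    return depth == 0 ? 0 : -1;
}
```

The check happens before the depth is incremented, so a hostile input with thousands of opening brackets is rejected at the first level beyond the cap instead of growing a parser stack without bound.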

test/svm-multiclass-realloc-and-compose-dri-lint (PR #TBD, 2026-06-06, ADR-1066)

no rebase impact: adds new test file core/test/test_svm_multiclass.c, new lint script scripts/ci/check-compose-dri-writable.sh, and a step in .github/workflows/dev-container-build.yml. No existing C source, public API, or upstream-mirrored file is modified.

fix/go-staticcheck-r10-timer-body (ADR-1065)

no rebase impact: all changes are fork-local Go files (pkg/storage/, cmd/vmafx-controller/, cmd/vmafx-server/) with no upstream-mirrored C or Python code affected. The time.NewTicker refactor is a semantic no-op for rebases; the MaxBytesReader + ReadTimeout additions are internal to the HTTP handler and do not touch any public API surface.

fix/sanitizer-exclusions-huge-alloc-tests

no rebase impact: only .github/workflows/sanitizers.yml and a changelog fragment are modified. No C source, public API, or upstream-mirrored file is touched.


fix/pic-prealloc-asan-leak (2026-06-06)

no rebase impact: single-line change in core/src/libvmaf.c setting fex_ctx->is_initialized = true before the batch-flush loop in flush_context_threaded. The change is additive — it enables the existing vmaf_feature_extractor_context_close teardown path to run correctly on shared (never-initialized) contexts. No upstream-mirrored file is modified, no public API is affected, no test fixtures change.


fix/win32-pthread-once-redefinition (2026-06-06)

no rebase impact: removes a duplicate block from core/src/compat/win32/pthread.h (typedef, macro, BOOL CALLBACK, and inline function). The surviving first definition is unchanged. No upstream-mirrored file is modified, no public API is affected, no test fixtures change.


fix/ubsan-enum-invalid-value-log-opt (2026-06-06)

no rebase impact: both changes are one-liner casts (static_cast<int>) in core/src/log.cpp and core/src/opt.cpp. Neither file is upstream-mirrored (both are C++23 rewrites of upstream C originals — ADR-0708 and ADR-0772), no public API is affected, and no test fixtures change.


test/operator-controller-coverage (2026-06-06)

no rebase impact: all changes are confined to cmd/vmafx-operator/internal/controller/ test files and the fix to vmafxnode_controller.go (removes the status.lastHeartbeat write — additive correctness only). No public API is affected, no upstream-mirrored file is touched, no test fixtures change. Depends on PR #759 (ADR-1069 CRD schema fix) for the envtest assertions to pass end-to-end with a real API server.


fix/disable-recurring-flaky-tests (2026-06-07)

no rebase impact: the change is two should_fail: true additions to core/test/meson.build and a new ADR file. No source files, no public API, no upstream-mirrored files, and no test fixtures are modified. The test binaries remain compiled; only their expected-failure polarity flips in the Meson test registry.

fix/dnn-onnx-domain-bypass (2026-06-07, ADR-1089)

no rebase impact: all changes are confined to fork-local files (core/src/dnn/onnx_scan.c, core/src/dnn/onnx_scan.h, core/test/dnn/test_onnx_scan.c). Netflix/vmaf has no ONNX DNN surface; no upstream-mirrored file is touched. The scanner's internal enum gains NODE_DOMAIN_FIELD = 7; the scan_node loop gains a new branch; a 50-line read_domain() helper is added. No public API or CLI surface changes.

Re-test: meson test -C build test_onnx_scan (26/26 pass).

fix/r13-gpu-dispatch-env-fast-path-data-race (2026-06-06)

no rebase impact: changes confined to core/src/gpu_dispatch_env.cpp. Adds a std::atomic<bool> ready publication flag to each slot. No public header or ABI change; the AtomicBool type is internal to the translation unit.
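A per-slot publication flag follows a standard release/acquire pattern, sketched here with C11 atomics as the C counterpart of std::atomic<bool>. The SketchSlot type, its int payload, and the sketch_* functions are hypothetical, and the sketch assumes a single writer per slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cache slot; names are invented for illustration. */
typedef struct {
    int value;         /* payload, written before `ready` is set */
    atomic_bool ready; /* publication flag */
} SketchSlot;

void sketch_slot_init(SketchSlot *s)
{
    s->value = 0;
    atomic_init(&s->ready, false);
}

/* Writer: fill the payload first, then publish with release ordering
 * so the payload write cannot be observed after the flag. */
void sketch_slot_publish(SketchSlot *s, int value)
{
    s->value = value;
    atomic_store_explicit(&s->ready, true, memory_order_release);
}

/* Reader fast path: touches the payload only after the flag has been
 * observed with acquire ordering. Returns false if not yet published. */
bool sketch_slot_try_get(SketchSlot *s, int *out)
{
    if (!atomic_load_explicit(&s->ready, memory_order_acquire))
        return false;
    *out = s->value;
    return true;
}
```

Without the flag, a reader on the fast path could read a partially written payload; with it, a reader either sees "not ready" and takes the slow path or sees the fully written value.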

worktree-wf_b08e0c22-717-2 / fix/framesync-producer-death-deadlock (2026-06-07)

no rebase impact: changes confined to core/src/framesync.c and core/src/framesync.h. Adds vmaf_framesync_abort() and aborted flag. The new function is internal to libvmaf; no public header change.

fix/cuda-stream-event-leak-paths (2026-06-07)

no rebase impact: changes confined to core/src/cuda/picture_cuda.c and four CUDA feature extractors. Graduated cleanup labels only; no new public API or ABI change.

fix/metal-buffer-ownership-leaks (2026-06-07)

no rebase impact: changes confined to core/src/feature/metal/float_ms_ssim_metal.mm and core/src/metal/picture_import.mm. MTLBuffer retain-count fixes only; no public header or ABI change.

fix/mcp-http-edge-cases (2026-06-07)

no rebase impact: changes confined to mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py and a new test file. No C API or public header change.

fix/vmaftune-corner-cases-r14 (2026-06-07)

no rebase impact: changes confined to tools/vmaf-tune/src/vmaftune/cli.py and encode.py. No C API or public header change.

fix/ci-yaml-concurrency-timeout (2026-06-07)

no rebase impact: changes confined to 18 .github/workflows/*.yml files. No source, header, or build file is modified.

fix/ci-workflow-permissions-least-privilege (2026-06-07)

no rebase impact: changes confined to two .github/workflows/ files. No source, header, or build file is modified.

cov/vmafx-controller-queue-nodes-auth (2026-06-07)

no rebase impact: changes confined to cmd/vmafx-controller/queue/queue.go and test files. No public API or ABI change.

fix/operator-crd-status-schema-gaps (2026-06-07)

no rebase impact: changes confined to cmd/vmafx-operator/ and CRD YAML files. No C API or libvmaf surface changed.

fix/ms-ssim-hip-adr0990-precision-parity (2026-06-07)

no rebase impact: changes confined to core/src/feature/hip/integer_ms_ssim_hip.c and ms_ssim_score.hip. No public header or ABI change.

fix/ms-ssim-option-parity-hip-sycl (2026-06-07)

no rebase impact: changes confined to CUDA/HIP/SYCL ms-ssim extractors. No public header or ABI change.

fix/cargo-deny-bsd2-patent-allowlist (2026-06-07)

no rebase impact: changes confined to deny.toml. No source, header, or build file is modified.

fix/rust-pilot-clippy (2026-06-07)

no rebase impact: changes confined to bindings/rust/ and core/src/feature/rust/tad/. No C API or public header change.

fix/coverage-pkg-storage (2026-06-07)

no rebase impact: new test files only (pkg/storage/coverage_test.go, cmd/vmafx-node/bpf/coverage_test.go) + changelog fragments. No production source or public header change.

fix/mcp-streaming-backpressure-disconnect (2026-06-07)

no rebase impact: changes confined to cmd/vmafx-mcp/impl*.go and mcp-server/vmaf-mcp/src/vmaf_mcp/server.py. No C API or public header change.

test/r12-thread-safety-batch-tsan (2026-06-07)

no rebase impact: adds new test file core/test/test_thread_safety_batch.c and updates core/test/meson.build. No production source or public header change.

fix/r12-picture-ref-unref-error-path-coverage (2026-06-07)

no rebase impact: adds test coverage to core/test/test_picture.c only. No production source or public header change.

fix/r14-yuv-input-edge-cases (2026-06-07)

no rebase impact: changes confined to core/tools/y4m_input.c. No public header or ABI change.

fix/r14-cli-flag-parsing (2026-06-07)

no rebase impact: changes confined to core/tools/cli_parse.c and cli_parse.cpp. No public header or ABI change.

fix/test-malloc-leak-r12 (2026-06-07)

no rebase impact: test-only changes to free malloc'd buffers on early exit in test_framesync.c and test_pic_preallocation.c. No production source touched.

fix/test-framework-mu-assert-stderr-output (2026-06-07)

no rebase impact: changes confined to core/test/test.c and core/test/test.h. Fix mu_report writing to stdout instead of stderr; add missing include guard.

worktree-wf_392e91a3-897-12 / fix/ci-action-sha-consistency (2026-06-07)

no rebase impact: corrects inconsistent action SHAs in e2e-k8s, go-ci, and rust-ci workflows. No source, header, or build file is modified.

fix/ort-error-message-logging (2026-06-07)

no rebase impact: changes confined to core/src/dnn/ort_backend.c. No public header or ABI change.

fix/bench-clock-unchecked-returns (2026-06-07)

no rebase impact: changes confined to core/tools/vmaf.c and core/tools/vmaf_bench.c. No public header or ABI change.

fix/msvc-windows-portability-hygiene (2026-06-07)

no rebase impact: dead code removal in core/src/dnn/model_loader.c and core/src/feature/x86/vif_avx2.c / vif_avx512.c. No public header, ABI, or numeric change.

fix/roi-frame-bytes-odd-dims (2026-06-07)

no rebase impact: changes confined to core/tools/vmaf_roi.c and core/test/test_vmaf_roi.c. No public header or ABI change.

fix/vmaf-per-shot-correctness (2026-06-07)

no rebase impact: changes confined to core/tools/vmaf_per_shot.c and core/tools/test/test_vmaf_per_shot.sh. No public header or ABI change.

fix/compat-python-vmaf-mode-shim (2026-06-07)

no rebase impact: changes confined to compat/python-vmaf/__init__.py, compat/python-vmaf/config.py, and compat/python-vmaf/core/matlab_feature_extractor.py. No C source, public header, or ABI change.

fix/ffmpeg-vmaf-pre-device-full-enum (2026-06-07)

no rebase impact: changes confined to ffmpeg-patches/0002-*.patch and ffmpeg-patches/0014-*.patch. No C source or public header in-tree is modified.

fix/observability-otel-trace-context (2026-06-07)

no rebase impact: changes confined to cmd/vmafx-server/grpc_server.go, pkg/observability/otel_instruments.go, and pkg/score/grpc_client.go. No C source, public header, or ABI change.

worktree-wf_392e91a3-897-1 / fix/ai-atomic-writes (2026-06-07)

no rebase impact: changes confined to ai/src/aiutils/file_utils.py, ai/src/aiutils/run_manifest.py, and five AI scripts. No C source, public header, or build file is modified.

fix/helm-rolling-update-correctness (2026-06-07)

no rebase impact: changes confined to deploy/helm/vmafx/ Helm chart templates and values. No C source, public header, or Go source is modified.

fix/r12-dead-code-and-unused-var-after-pr-train (2026-06-07)

no rebase impact: removes dead code and unused variables from core/src/feature/integer_motion.c and core/test/test_framesync.c. No public header or ABI change.

fix/motion-coverage-picture-ref-include (2026-06-07)

no rebase impact: changes confined to core/test/test_integer_motion_coverage.c. Test-only change. No production source or public header modified.

fix/state-sweep-fix — CI build-matrix + ASan + motion_v2 coverage (2026-06-07)

libvmaf-build-matrix.yml: four meson setup / ninja invocations updated from libvmaf to core — ADR-0700 rename follow-up. Conflicts possible if another branch also edits those four lines; resolve by keeping core as the source dir. tests-and-quality-gates.yml: ASAN_OPTIONS: allocator_may_return_null=1 added to the sanitizer step env; no conflict risk. core/src/picture.c and core/src/picture.h: vmaf_picture_pool_flush() added; no conflict risk (new symbol). core/test/test_integer_motion_v2_coverage.c: manual prev_ref assignments and memset calls removed; tests now call extract in a plain loop. Conflicts possible if another branch edits the same test functions; resolve by keeping the wrapper-managed approach (no manual prev_ref assignment).

fix/ffmpeg-vulkan-ci-job-removal (2026-06-07)

no rebase impact: only .github/workflows/ffmpeg-integration.yml modified (dead job removed) and docs/state.md + changelog.d/fixed/ffmpeg-vulkan-ci-job-removal.md added. No production source, public header, or meson build files touched.

docs/phase4b9-container-only-publishing (2026-06-08)

no rebase impact: docs-only change. CLAUDE.md §15 updated with a new publishing bullet; docs/development/publishing.md and docs/adr/1102-*.md added; docs/adr/README.md gets one new index row; changelog.d/added/1102-*.md added. No production source, public header, meson build files, or test files touched.

fix/codeql-large-parameter-const-pointer (2026-06-12)

Rebase-sensitive: function signatures changed across VIF, SpEED, and CAMBI.
  • VifBuffer: PADDING_SQ_DATA, PADDING_SQ_DATA_2 (integer_vif.h), pad_top_and_bottom, decimate_and_pad, subsample_rd_8/16 (integer_vif.c), vif_subsample_rd_8/16_avx2 + vif_filter1d_8/16_avx2 (x86/vif_avx2.h/.c), vif_subsample_rd_8/16_avx512 (x86/vif_avx512.h/.c), vif_subsample_rd_8/16_neon (arm64/vif_neon.h/.c): all now take const VifBuffer * instead of VifBuffer by value. The function-pointer typedef in VifState was updated accordingly. Call sites pass &buf or &s->public.buf as appropriate.
  • SpeedDimensions (speed.c): 11 static functions now take const SpeedDimensions *. Call sites in speed_extract_score pass &s->dimensions.
  • SpeedInternalDimensions (speed_internal.h/.c): speed_internal_filter_and_downscale and speed_internal_compute_cov_matrix now take const SpeedInternalDimensions *. All GPU backend call sites (cuda/hip/sycl speed twins) pass &s->dim.
  • CambiBuffers (cambi.c): cambi_score now takes const CambiBuffers *. Call site passes &s->buffers.

Conflicts possible if upstream or another branch edits these function signatures or adds new call sites; resolve by carrying the const * form forward.

docs/index.md — Vulkan image-import list item removed (docs/remove-stale-vulkan-image-import-ref)

no rebase impact: docs-only removal of a stale list item; no code or nav structure changed.

fix/functional-matrix-broken-17 (2026-06-12)

no rebase impact: all changes are bug fixes in independent files (bench_all.sh, bisect.py, server.py, cli.py, op_allowlist.c, float_adm_cuda.c, dnn_api.c, Containerfile) with no shared function-signature changes, no renamed symbols, and no upstream-mirrored path modifications.

rc/scaffold-stub-completion — picture_v2 implementation + ai/scripts exit-code fix

Files touched: core/src/picture_v2.c (new), core/test/test_picture_v2.c (new), core/include/libvmaf/picture_v2.h, core/src/meson.build, core/include/libvmaf/meson.build, core/test/meson.build, docs/architecture/vmaf-picture-v2-migration.md, ai/scripts/{gen_calibration,quantize_int8,build_calibration_set,eval_loso_fr_regressor_v2,external_benchmark_pvmaf,fetch_lsvq,gen_dists_sq_placeholder_onnx,gen_mobilesal_placeholder_onnx,gen_ssimulacra2_eotf_lut,hdrsdr_vqa_to_corpus_jsonl,my_corpus_to_corpus_jsonl,train_fr_regressor_v4,train_video_saliency_student}.py.

Rebase impact: low. picture_v2.c is a new file; no upstream file was modified. If upstream ever adds its own picture_v2.c (unlikely given it is a fork-local concept), resolve by keeping the fork's implementation. The ai/scripts exit-code fix is purely script-internal; no C or build system conflict possible with upstream.

fix/rc-gate-three-infra-test-bugs (2026-06-13)

no rebase impact: all changes are test-only. cmd/vmafx-mcp/server_test.go drops t.Parallel() from one test function (no C/API change). core/test/meson.build adds a TSAN_OPTIONS env entry alongside an existing ASAN_OPTIONS entry for one test. core/test/dnn/test_cli.sh adds a DNN-availability probe near the top of the script. None of these files exist in upstream Netflix/vmaf; no upstream merge conflict is possible.

chore/remove-vulkan-moltenvk-dead-leftovers (2026-06-13)

no rebase impact: deletions only (subproject wraps, Docker stages, CI job bodies, moltenvk.md). No shared function signatures changed, no symbols renamed, no upstream-mirrored C paths modified. ABI-reserved enum gaps preserved.

fix/hip-vif-mirror2-boundary (2026-06-13)

no rebase impact for upstream syncs: touches only fork-local HIP files (core/src/feature/hip/integer_vif/vif_statistics.hip) and the fork-local HIP VIF parity test (core/test/test_hip_vif_parity.c). Neither file exists in upstream Netflix/vmaf. The docs/adr/ and docs/backends/hip/overview.md changes are also fork-local. Any future upstream sync that adds upstream files under core/src/feature/hip/ would require manual review of boundary semantics, but no mechanical conflict is possible.

feat/pelorus-vendor-interop-abi (2026-06-14)

no rebase impact for upstream Netflix/vmaf syncs: every file is fork-local and has no upstream counterpart. New vendored mirror under core/src/interop/pelorus_*.c + core/include/libvmaf/pelorus/*.h (sourced from VMAFx/pelorus@835e097, NOT Netflix upstream), the conformance fixture core/test/test_pelorus_interop.c, scripts/sync-pelorus-interop.sh, and the docs/ADR/changelog/state deliverables. The core/src/meson.build and core/test/meson.build edits append to fork-local lists (the libvmaf source list and the test registrations) and do not touch upstream-mirrored build logic.

Rebase-sensitive invariant (cross-repo, NOT upstream): the vendored files are a byte-identical mirror pinned to VMAFx/pelorus@835e097. Never hand-edit them — clang-format/clang-tidy are deliberately excluded for these paths (dir-local .clang-tidy, .cppcheck-suppressions.txt, the make format path filters, the assertion-density skip, and the auto-format-on-edit.sh PostToolUse hook skip). A pelorus ABI bump is re-synced via scripts/sync-pelorus-interop.sh --update (which also bumps the pin), never by editing the mirror in place. If a future change rewrites these files, re-run the sync guard + the conformance fixture before merging.

feat/pelorus-autotune-control-plane (2026-06-14)

no rebase impact for upstream syncs: touches only fork-local files under tools/vmaf-tune/ — a new src/vmaftune/filter_adapters/ package (__init__.py, pelorus_deband.py), a new src/vmaftune/prefilter.py module, three new test files under tests/, plus additive edits to src/vmaftune/cli.py (new imports, a prefilter subparser, a _run_prefilter handler, and one dispatch line). None of these exist in upstream Netflix/vmaf — tools/vmaf-tune/ is entirely fork-added. No shared function signatures changed and no upstream-mirrored paths were touched. The cli.py edits are append-only at well-separated sites (import block, subparser-registration block, handler block, dispatch block), so even a fork-internal rebase against a newer cli.py resolves cleanly. External coupling note: the 10 deband knobs are a verbatim copy of the Pelorus ADR-0110 control-plane contract; a contract change on the Pelorus side requires a matching edit to filter_adapters/pelorus_deband.py in a coordinated two-repo PR (the conformance test fails on drift).

feat/golusoris-server (2026-06-14)

no rebase impact for upstream Netflix/vmaf syncs: every file touched is fork-local Go and has no upstream counterpart. The change rewrites cmd/vmafx-server/*.go (the Go gRPC + HTTP scoring service) onto the golusoris fx framework (ADR-1119), updates Dockerfile.go-server, docs/server/grpc.md, and docs/usage/env-vars.md for the env-var rename, and adds an app_test.go fxtest lifecycle test. None of these exist in upstream; libvmaf's C sources, public headers, and the Netflix golden gate are untouched.

Rebase-sensitive invariants (fork-internal, NOT upstream):
  • R1 cgo-lifetime stop order. The composition root forces the *libvmaf.Scorer to be constructed BEFORE the golusoris *grpc.Server (an fx.Invoke(func(_ *libvmaf.Scorer) {}) registered ahead of the gRPC service-registration invoke, plus scorer-first arg order in that invoke). fx runs OnStop hooks in reverse construction order, so this guarantees the gRPC server's GracefulStop drains in-flight Score calls before the scorer's Close() releases C resources. TestStopOrderScorerAfterGRPC pins this; do not reorder those invokes or flip the arg order without re-deriving the ordering (see the empirical probe rationale in the PR).
  • go.mod pin. github.com/golusoris/golusoris stays at v0.3.1. The fx migration only adds transitive // indirect deps (go-grpc-middleware/v2) and promotes go-chi/chi/v5 from indirect to direct via go mod tidy; it does not bump the golusoris pin or touch internal/app/bootstrap.
  • Env-var contract. VMAFX_HTTP_ADDR / VMAFX_GRPC_LISTEN map to the golusoris http.addr / grpc.listen keys under the VMAFX_ prefix. If golusoris renames those keys, the server's documented env contract must follow.
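
The stop-order argument can be illustrated without fx. This is a minimal sketch, assuming a hypothetical lifecycle type that records stop hooks in construction order and runs them in reverse; it is a stand-in for the behaviour described above, not the fx or golusoris API, and the hook names are labels only.

```go
package main

import "fmt"

// lifecycle is a hypothetical stand-in for an fx-style lifecycle: stop hooks
// are recorded in construction order and run in reverse on Stop.
type lifecycle struct {
	stops []func()
}

// Append registers a stop hook at the moment a component is constructed.
func (l *lifecycle) Append(stop func()) {
	l.stops = append(l.stops, stop)
}

// Stop runs the registered hooks, last constructed first.
func (l *lifecycle) Stop() {
	for i := len(l.stops) - 1; i >= 0; i-- {
		l.stops[i]()
	}
}

// stopOrder constructs the scorer before the gRPC server and returns the
// order in which their stop hooks fire.
func stopOrder() []string {
	var order []string
	lc := &lifecycle{}
	// Scorer is constructed first, so its Close runs last.
	lc.Append(func() { order = append(order, "scorer.Close") })
	// gRPC server is constructed second, so its GracefulStop runs first.
	lc.Append(func() { order = append(order, "grpc.GracefulStop") })
	lc.Stop()
	return order
}

func main() {
	fmt.Println(stopOrder())
}
```

Swapping the two Append calls flips the result, which is the failure mode the pinned test exists to catch.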

feat/golusoris-node (2026-06-15)

no rebase impact for upstream Netflix/vmaf syncs: every file touched is fork-local Go and has no upstream counterpart. The change rewrites cmd/vmafx-node/main.go (the Go gRPC worker root) onto the golusoris fx framework (ADR-1119, Phase-1 PR-3), adds cmd/vmafx-node/providers.go (the fx domain providers) and cmd/vmafx-node/scoring_handler.go (the VmafxScoring impl moved out of the now-removed cmd/vmafx-node/server package), refactors cmd/vmafx-node/online_feedback.go (Start/Close lifecycle), updates docs/usage/env-vars.md for the env-var rename, and adds app_test.go + app_scorestream_test.go (fxtest lifecycle + end-to-end ScoreStream). None of these exist in upstream; libvmaf's C sources, public headers, and the Netflix golden gate are untouched. The eBPF loader under cmd/vmafx-node/bpf/ is unrelated to golusoris and was not touched.

Rebase-sensitive invariants (fork-internal, NOT upstream):
  • R-node lifecycle stop order. The composition root forces the *libvmaf.Scorer to be constructed first, then the *FeedbackClient + *Executor (a lazy-provider guard fx.Invoke(func(_ *FeedbackClient, _ *Executor) {})), then the golusoris *grpc.Server (a standalone fx.Invoke(func(_ *grpc.Server) {}) lazy-provider guard). fx runs OnStop hooks in reverse construction order, so this guarantees: gRPC GracefulStop drains in-flight Score / ScoreStream calls → FeedbackClient drainer stops → scorer Close(). TestStopOrderNode (app_test.go) pins this against the REAL hook firing order; do not reorder those invokes or flip arg order.
  • Lazy-provider listener guard. grpc.Module's listener binds in an OnStart hook that only runs if *grpc.Server is consumed. The standalone fx.Invoke(func(_ *grpc.Server) {}) is load-bearing: remove it and the node serves nothing. TestAppStartsAndBinds dials the bound addr to prove it.
  • FeedbackClient drainer lifetime. NewFeedbackClient(log) no longer takes a context or spawns a goroutine; Start() launches the drainer (bound to an internal, Close-owned context) and Close() stops + awaits it. Both are idempotent. Wired to fx OnStart/OnStop in provideFeedbackClient.
  • go.mod pin. github.com/golusoris/golusoris stays at v0.4.0. The fx migration only adds the transitive // indirect dep go-grpc-middleware/v2 v2.3.3 via go mod tidy; it does not bump the golusoris pin or touch internal/app/bootstrap.
  • Env-var contract. VMAFX_GRPC_LISTEN maps to the golusoris grpc.listen key under the VMAFX_ prefix (replaces VMAFX_NODE_ADDR). If golusoris renames that key, the node's documented env contract must follow.
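
The drainer lifetime described above (a goroutine launched by Start, bound to a context that only Close cancels, with both calls idempotent) can be sketched roughly as follows. The drainer type and its fields are hypothetical and stand in for the FeedbackClient; the real implementation may differ.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// drainer is a hypothetical stand-in for a client that owns one background
// goroutine; it is not the FeedbackClient implementation.
type drainer struct {
	mu       sync.Mutex
	cancel   context.CancelFunc
	done     chan struct{}
	started  bool
	closed   bool
	launches int // goroutines actually launched, kept for demonstration
}

// Start launches the goroutine on the first call and is a no-op afterwards
// (or once Close has run).
func (d *drainer) Start() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.started || d.closed {
		return
	}
	d.started = true
	d.launches++
	// The context is created here and cancelled only by Close.
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	d.cancel, d.done = cancel, done
	go func() {
		defer close(done)
		<-ctx.Done() // a real drainer would also service its queue here
	}()
}

// Close stops the goroutine and waits for it to exit. Repeat calls return
// immediately.
func (d *drainer) Close() {
	d.mu.Lock()
	if d.closed {
		d.mu.Unlock()
		return
	}
	d.closed = true
	cancel, done := d.cancel, d.done
	d.mu.Unlock()
	if cancel == nil {
		return // never started, nothing to stop
	}
	cancel()
	<-done
}

func main() {
	d := &drainer{}
	d.Start()
	d.Start()
	d.Close()
	d.Close()
	fmt.Println("goroutines launched:", d.launches)
}
```

Because the constructor takes no context, the only way to stop the goroutine is Close, which is what lets an OnStop hook own the shutdown.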

feat/golusoris-operator (2026-06-15)

no rebase impact for upstream Netflix/vmaf syncs: every file is fork-local and has no upstream counterpart. cmd/vmafx-operator/main.go is rewritten from a hand-rolled controller-runtime entry point onto the golusoris fx framework (ADR-1119 Phase 1), and cmd/vmafx-operator/main_test.go adds fx-graph validation. The reconcilers under cmd/vmafx-operator/internal/controller/ and the webhooks under cmd/vmafx-operator/internal/webhook/ are unchanged; only their wiring (Setup-against-manager) moved into fx.Invoke hooks. None of these files exist upstream.

Rebase-sensitive invariant (cross-repo, NOT Netflix upstream): this migration requires github.com/golusoris/golusoris >= v0.4.0, because the github.com/golusoris/golusoris/k8s/operator module (introduced by golusoris commit 3df9f1a / PR #224) first appears in tag v0.4.0 and is ABSENT in v0.3.1. The foundation commit (afd66c7ef) pins v0.3.1, which predates k8s/operator — so cmd/vmafx-operator/main.go does not compile until the go.mod golusoris pin is bumped to v0.4.0+. The pin bump is intentionally NOT part of this branch (it is a shared go.mod change owned by the migration orchestrator).

golusoris#227 note: the in-tree main.go does NOT add an app-level ctrl.SetLogger shim — golusoris v0.4.0's operator.Module already calls ctrl.SetLogger(loggerFromSlog(logger)) inside newManager, so a second SetLogger from the app would be redundant. If a future golusoris release reverts that (regressing #227), re-add the shim as an fx.Invoke(func(l *slog.Logger){ ctrl.SetLogger(logr.FromSlogHandler(l.Handler())) }). Likewise webhooks are wired via operator.Options.WebhookPort (also added post-v0.3.1); if that field disappears upstream, the app must stand up its own webhook.NewServer and add it to the manager.

feat/golusoris-mcp (2026-06-15)

no rebase impact for upstream Netflix/vmaf syncs: this PR rewrites only cmd/vmafx-mcp/main.go (the Go MCP server composition root) onto the golusoris fx framework (ADR-1119, Phase-1 PR-5), plus the fork-local docs/changelog/AGENTS deliverables. cmd/vmafx-mcp/ is entirely fork-added and has no upstream counterpart. The MCP tool surface (tools.go, impl.go, impl_direct.go, server.go) is byte-unchanged, and no test file changed.

  • Composition root. The hand-rolled flag.Parse + signal.NotifyContext + bespoke stdio/HTTP transport loops + custom observability.InitOTel are replaced by fx.New(bootstrap.Base, fx.Replace(config.Options{...}), bootstrap.FxLogger(), fx.Provide(buildMCPServer), fx.Invoke(runMCPTransport)).Run(). Mirrors cmd/vmafx-server/main.go and cmd/vmafx-node/main.go. Because the MCP server is NOT a golusoris server module (golusoris ships no MCP module), the transport is owned in the runMCPTransport lifecycle hook rather than by a framework module; if golusoris later adds an MCP module, fold the hook into it.
  • bootstrap dependency. This PR consumes internal/app/bootstrap.Base and bootstrap.FxLogger() but does NOT modify them; it shares the bootstrap stanza with the sibling fx migrations (#932/#934/#935/#936). A rebase that reshapes bootstrap.Base (e.g. when golusoris#226 ships a version module, or golusoris#234's LOG_LEVEL prefix-read lands and the env bridge can be deleted) must re-check main() here too.
  • Env bridge (interim). main() bridges VMAFX_LOG_LEVEL → LOG_LEVEL and VMAFX_LOG_FORMAT → LOG_FORMAT before fx.New, identical to the sibling binaries (golusoris#234). Delete all four bridges across the cmd/ tree in one sweep once the carrying golusoris tag lands.
  • Env-var / flag contract change. --transport / --port flags removed; replaced by VMAFX_MCP_TRANSPORT (mcp.transport, default stdio) and VMAFX_MCP_HTTP_ADDR (mcp.http.addr, default :3000). VMAF_BIN and VMAFX_MCP_DIRECT are read directly by the tool handlers (not via koanf) and are unchanged.
  • Rebase-sensitive invariant — stdio-stdout purity. Nothing in the fx graph may write to stdout in stdio mode (the JSON-RPC framing owns it). golusoris log → stderr, otel.Module is OTLP-gRPC (no stdout), bootstrap.FxLogger() → slog → stderr. A future rebase that adds an fx.Print-style logger, a stdout OTel exporter, or any fmt.Println to the composition root MUST gate it off in stdio mode. See cmd/vmafx-mcp/AGENTS.md invariant #11.
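
The interim env bridge from the list above can be sketched as a pure planning step plus an apply step. This is a hedged illustration: planBridge and the bridges table are hypothetical names, and the choice to leave an already-set target untouched is an assumption, not something these notes specify.

```go
package main

import (
	"fmt"
	"os"
)

// bridges lists the prefixed variables and the unprefixed names they feed.
var bridges = [][2]string{
	{"VMAFX_LOG_LEVEL", "LOG_LEVEL"},
	{"VMAFX_LOG_FORMAT", "LOG_FORMAT"},
}

// planBridge returns the variables to set for a given environment snapshot.
// A target that is already set is left alone; that precedence is an
// assumption made for this sketch.
func planBridge(env map[string]string) map[string]string {
	out := map[string]string{}
	for _, b := range bridges {
		src, dst := b[0], b[1]
		v, ok := env[src]
		if !ok {
			continue
		}
		if _, taken := env[dst]; taken {
			continue
		}
		out[dst] = v
	}
	return out
}

func main() {
	// Snapshot only the variables the bridge cares about.
	env := map[string]string{}
	for _, b := range bridges {
		for _, name := range b {
			if v, ok := os.LookupEnv(name); ok {
				env[name] = v
			}
		}
	}
	for name, value := range planBridge(env) {
		if err := os.Setenv(name, value); err != nil {
			// Diagnostics go to stderr so stdout stays clean in stdio mode.
			fmt.Fprintln(os.Stderr, "env bridge:", err)
			os.Exit(1)
		}
	}
	// The framework would be constructed after this point.
}
```

Keeping the mapping in one table makes the eventual one-sweep deletion a matter of removing the table and its caller.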

feat/golusoris-controller (2026-06-15)

no rebase impact for upstream Netflix/vmaf syncs: every file touched is fork-local Go and has no upstream counterpart. The change rewrites cmd/vmafx-controller/*.go (the Go gRPC + HTTP controller: SQLite job queue + node registry + FIFO scheduler + JWT auth) onto the golusoris fx framework (ADR-1119, Phase-1 PR-2), refactors cmd/vmafx-controller/nodes/registry.go, adds an app_test.go fxtest lifecycle suite, and updates docs/usage/env-vars.md for the env-var rename. libvmaf's C sources, public headers, and the Netflix golden gate are untouched; no ffmpeg-patch impact.

Rebase-sensitive invariants (fork-internal, NOT upstream):
  • golusoris pin → v0.4.1. The PR depends on golusoris#225 (grpc.ProvideServerOption, used to chain the JWT auth interceptors), which is NOT in the v0.4.0 tag the shared go.mod currently pins. The committed go.mod keeps the v0.4.0 pin (no replace directive); the binary + app_test.go will not compile until the orchestrator bumps the pin to the tag carrying #225 (expected v0.4.1). go mod tidy against v0.4.0 is otherwise clean: the migration only adds the transitive // indirect go-grpc-middleware/v2 (golusoris HEAD's grpc.Module recovery/logging interceptors).
  • R1 stop order. The composition root forces the *libvmaf.Scorer, the SQLite queue.Queue, and the *nodes.Registry to be constructed BEFORE the golusoris *grpc.Server (an fx.Invoke(func(_ *libvmaf.Scorer, _ queue.Queue, _ *nodes.Registry) {}) registered ahead of the gRPC service-registration invoke). fx runs OnStop hooks in reverse construction order, so this guarantees the gRPC GracefulStop drains in-flight RPCs before the queue Close, the node-registry reaper stop, and the scorer Close. TestStopOrder pins this; do not reorder those invokes without re-deriving the ordering.
  • nodes.Registry lifecycle. NewRegistry(log) no longer takes a context or spawns the reaper at construction; the reaper is launched by Start(ctx) (fx OnStart) and stopped + awaited by Close() (fx OnStop). Every call site (production + tests) must drive Start/Close via the lifecycle rather than passing a caller context.
  • gen/go/controller proto types are hand-written. Unlike the protoc-generated gen/go (scoring) types, gen/go/controller/*.pb.go are hand-maintained and do NOT implement the protobuf-v2 reflection interface, so VmafxController messages cannot be marshaled by the standard gRPC wire codec. In-process handler tests are unaffected; over-the-wire fxtests therefore use the VmafxScoring service. See the orchestrator note below: regenerating gen/go/controller with buf/protoc is the proper fix and is a candidate follow-up.
  • Env-var contract. VMAFX_HTTP_ADDR / VMAFX_GRPC_LISTEN map to the golusoris http.addr / grpc.listen keys; auth.tenant_claim / auth.roles_claim are golusoris CompoundKeys (preserve the underscore). If golusoris renames those keys, the controller's documented env contract must follow.

fix/mcp-probe-parity (2026-06-15)

no rebase impact: edits the fork-only MCP servers (cmd/vmafx-mcp/{impl.go,tools.go,impl_test.go,AGENTS.md}, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py) + docs/mcp/tools.md + docs/state.md + changelog. No libvmaf C-API / CLI / meson_options.txt / public-header change → no ffmpeg-patch impact. Rebase-sensitive invariant, probe_backend parity (cmd/vmafx-mcp/AGENTS.md invariant #12): the Go handleProbeBackend and the Python _probe_backend MUST share the same 64×64 (≥36px/dim, CUDA-ADM minimum) synthetic probe frame AND the same runtime_healthy predicate (null/non-finite vmaf.mean → runtime_healthy=false, error "vmaf returned exit 0 but score was null"). A rebase that touches either probe handler must keep the two in lock-step.
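
A minimal sketch of the shared health predicate, assuming the mean score arrives as an optional float (nil standing for a JSON null). The function name and signature are hypothetical; only the error string is taken from the note above.

```go
package main

import (
	"fmt"
	"math"
)

// nullScoreError is the message quoted in the parity invariant.
const nullScoreError = "vmaf returned exit 0 but score was null"

// runtimeHealthy reports whether a probe's mean score is usable. mean is nil
// when the JSON value was null. The name and signature are hypothetical.
func runtimeHealthy(mean *float64) (healthy bool, errMsg string) {
	if mean == nil || math.IsNaN(*mean) || math.IsInf(*mean, 0) {
		return false, nullScoreError
	}
	return true, ""
}

func main() {
	score := 93.5
	ok, msg := runtimeHealthy(&score)
	fmt.Println(ok, msg)

	ok, msg = runtimeHealthy(nil)
	fmt.Println(ok, msg)
}
```

Whatever form the two handlers take, the same three inputs (null, NaN, infinity) have to map to the same unhealthy result on both sides.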

fix/bughunt-feature-cpu — CIEDE 4:2:2 chroma-upsample flag swap + cambi init leak (2026-06-27)

Rebase impact: DIVERGES from upstream on ciede.c — read carefully on the next sync. core/src/feature/ciede.c scale_chroma_planes / scale_chroma_planes_hbd is a near-verbatim upstream Netflix file, and upstream carries the identical transposition bug: the horizontal sample index keys off ss_ver and the vertical row advance keys off ss_hor. This fix swaps them to the correct chroma-subsample math (horizontal → ss_hor, vertical → ss_ver). Rebase-sensitive invariant for the next syncer: do NOT let an upstream sync silently revert this — if Netflix re-pulls the transposed lines, keep the fork's corrected flags. The bug only manifests on YUV422P (heap OOB + wrong ciede2000 scores); YUV420P is a no-op (both flags set) and YUV444P never calls the function, so the Netflix golden CIEDE2000 pair (420P) is bit-identical either way. Guarded by test_ciede_scale_chroma_422_8b / _16b in core/test/test_ciede.c, which fail against the buggy/upstream form. The core/src/feature/cambi.c change is a fork-internal error-path unwind (route init() failures through close_cambi()); cambi is upstream-mirrored but the change is confined to the -ENOMEM / -EINVAL error paths with no success-path or scoring delta — a sync that re-pulls cambi init() should re-apply the goto fail unwind. No public header, CLI, meson-option, or ffmpeg-patch surface changes.
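
Why the swap only bites on 4:2:2 can be shown with a small index model. This is an illustrative sketch, not the ciede.c code: chromaCoord, transposedCoord, and inBounds are hypothetical helpers that model the subsampling flags as booleans.

```go
package main

import "fmt"

// chromaCoord maps luma pixel (x, y) to the chroma sample covering it.
// ssHor / ssVer say whether chroma is halved horizontally / vertically.
func chromaCoord(x, y int, ssHor, ssVer bool) (cx, cy int) {
	cx, cy = x, y
	if ssHor {
		cx = x / 2
	}
	if ssVer {
		cy = y / 2
	}
	return cx, cy
}

// transposedCoord models the flag mix-up: the horizontal index keys off
// ssVer and the vertical index keys off ssHor.
func transposedCoord(x, y int, ssHor, ssVer bool) (cx, cy int) {
	return chromaCoord(x, y, ssVer, ssHor)
}

// inBounds reports whether (cx, cy) lies inside the chroma plane that
// belongs to a w x h luma plane.
func inBounds(cx, cy, w, h int, ssHor, ssVer bool) bool {
	cw, ch := w, h
	if ssHor {
		cw = (w + 1) / 2
	}
	if ssVer {
		ch = (h + 1) / 2
	}
	return cx >= 0 && cx < cw && cy >= 0 && cy < ch
}

func main() {
	const w, h = 8, 4
	// 4:2:2: chroma is halved horizontally only.
	cx, cy := chromaCoord(w-1, h-1, true, false)
	fmt.Println("4:2:2 correct:", cx, cy, inBounds(cx, cy, w, h, true, false))
	tx, ty := transposedCoord(w-1, h-1, true, false)
	fmt.Println("4:2:2 transposed:", tx, ty, inBounds(tx, ty, w, h, true, false))
}
```

With both flags set (4:2:0) the two functions return the same coordinates, which matches the observation that the 420P golden pair is unaffected.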

fix/bughunt-simd (2026-06-27)

no rebase impact: edits fork-added SIMD float_moment paths only (core/src/feature/arm64/moment_sve2.c, core/src/feature/x86/moment_avx2.c, core/src/feature/x86/moment_avx512.c, core/src/feature/arm64/moment_neon.c) + the fork test core/test/test_moment_simd.c. No libvmaf C-API / CLI / meson_options.txt / public-header change -> no ffmpeg-patch impact. No Netflix golden assertion touched. Rebase-sensitive invariant — moment SVE2 lane mapping (core/src/feature/arm64/AGENTS.md): the SVE FCVT .s->.d (svcvt_f64_f32) widens the EVEN-indexed f32 lanes (source element 2i), NOT the lower contiguous lanes; the odd lanes must be widened with the SVE2 FCVTLT (svcvtlt_f64_f32, source element 2i+1). Any future edit to moment_sve2.c must keep the even+odd dual-convert (stepping a full svcntw() register) or it will silently double-count even lanes and drop odd lanes on >128-bit SVE.
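
The lane mapping can be modelled on plain slices. This is a scalar simulation under stated assumptions, not SVE intrinsics: widenEven stands in for svcvt_f64_f32 (source element 2i), widenOdd for svcvtlt_f64_f32 (source element 2i+1), and the slice plays the role of one full f32 register.

```go
package main

import "fmt"

// widenEven models the SVE .s->.d convert: output lane i comes from source
// lane 2i.
func widenEven(v []float32) []float64 {
	out := make([]float64, len(v)/2)
	for i := range out {
		out[i] = float64(v[2*i])
	}
	return out
}

// widenOdd models the SVE2 "convert long top" form: output lane i comes from
// source lane 2i+1.
func widenOdd(v []float32) []float64 {
	out := make([]float64, len(v)/2)
	for i := range out {
		out[i] = float64(v[2*i+1])
	}
	return out
}

// sum adds the lanes of a widened vector.
func sum(v []float64) float64 {
	total := 0.0
	for _, x := range v {
		total += x
	}
	return total
}

// sumRegister accumulates every f32 lane of one register in f64 by
// converting the even and the odd lanes separately.
func sumRegister(v []float32) float64 {
	return sum(widenEven(v)) + sum(widenOdd(v))
}

func main() {
	reg := []float32{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println("even+odd:", sumRegister(reg))
	fmt.Println("even twice:", 2*sum(widenEven(reg)))
}
```

Converting the even lanes twice yields a different total from the even+odd pair, which is the double-count and drop pattern the invariant warns about.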

fix/bughunt-dnn (2026-06-27)

no rebase impact: edits the fork-only DNN tiny-AI / ONNX Runtime path (core/src/dnn/tensor_io.c, core/src/dnn/ort_backend.c) and its fork-only tests (core/test/dnn/test_tensor_io.c, core/test/dnn/test_ort_internals.c) + docs/state.md + changelog. No libvmaf public-header / CLI / meson_options.txt change, so no ffmpeg-patch impact. The whole core/src/dnn/ tree is fork-added (not present upstream Netflix/vmaf), so there is no upstream-parity conflict surface to track on a future sync.

fix/bughunt-ai (2026-06-27)

no rebase impact: edits training-harness Python only (ai/scripts/{aggregate_corpora,extract_k150k_features,materialize_saliency_features}.py, ai/train/konvid_pair_dataset.py + ai/tests/). No libvmaf C-API / CLI / meson_options.txt / public-header change → no ffmpeg-patch impact. No rebase-sensitive invariants.

fix/bughunt-cli (2026-06-27)

no ffmpeg-patch impact: edits the CLI (core/tools/vmaf.cpp, cli_parse.cpp) only. Deleted dead core/tools/vmaf.c (unreferenced; superseded by vmaf.cpp) + re-pointed 8 stale config/doc refs. Invariant: cli_parse.c is NOT dead — it is the TU compiled into test_cli_parse / test_cli_parse_long_only_args / fuzz_cli_parse; do not delete it on rebase. --help→stdout/exit0, no-frames→exit 101 (VMAF_EXIT_NO_FRAMES_DECODED).

chore/version-3.2.0 (2026-06-27)

no rebase impact: bumps the core/meson.build version (x-release-please-version) and the "." entry of .release-please-manifest.json to 3.2.0 / 3.2.0-lusoris.0 to track the upstream libvmaf 3.2.0 SONAME. On an upstream sync, keep the fork's libvmaf version aligned with Netflix's (<upstream-X.Y.Z>-lusoris.N).
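
The version convention can be checked mechanically. The sketch below is a hypothetical helper, not part of the repository; it assumes the fork version always has the form X.Y.Z-lusoris.N with purely numeric components.

```go
package main

import (
	"fmt"
	"regexp"
)

// forkVersion matches X.Y.Z-lusoris.N and captures the upstream part.
var forkVersion = regexp.MustCompile(`^(\d+\.\d+\.\d+)-lusoris\.(\d+)$`)

// upstreamOf returns the upstream X.Y.Z component of a fork version string.
func upstreamOf(v string) (string, bool) {
	m := forkVersion.FindStringSubmatch(v)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// aligned reports whether forkVer tracks the given upstream version.
func aligned(forkVer, upstreamVer string) bool {
	up, ok := upstreamOf(forkVer)
	return ok && up == upstreamVer
}

func main() {
	fmt.Println(aligned("3.2.0-lusoris.0", "3.2.0"))
	fmt.Println(aligned("3.1.0-lusoris.4", "3.2.0"))
}
```

A check like this could run during an upstream sync to flag a fork version that still carries the previous upstream number.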

fix/round3-build-gpu-batch (2026-06-27)

no ffmpeg-patch impact. R3-6 HIP integer_vif uninit-err (init err=0). R3-9 NVTX libdl → cc.find_library('dl'). R3-10 ssim AVX2 carve-out + _x86_simd_strict_fp_extra (icx -fp-model=precise; no-op on gcc/clang). Invariant: every x86 SIMD carve-out lib that needs bit-exactness under icx must carry _x86_simd_strict_fp_extra; keep the ssim carve-out aligned with its psnr_hvs/ms_ssim/ssimulacra2 siblings.