Rebase notes
fix/codeql-quality-batch: code-scanning hygiene (2026-06-27)
Small behaviour-neutral quality fixes. Upstream-mirror touches to re-apply on the next upstream sync: removed an unused from collections.abc import Hashable in compat/python-vmaf/tools/decorator.py, and unused pytest/tempfile imports in two compat/python-vmaf/tests/ files. Fork files: core/tools/vmaf.cpp (2-label switch→if), and new include guards on core/src/feature/moment.h / alias.h. The bulk of the code-scanning backlog was resolved by dismissal (verified false-positive/intentional via the GitHub code-scanning API), not code change — see docs/state.md T-CODEQL-QUALITY-BATCH.
fix/round4-ffmpeg-patches: libvmaf_sycl filter leak + QSV NULL guard (2026-06-27)
ffmpeg-patches/0005-libvmaf-add-libvmaf-sycl-filter.patch gained two fixes in its libavfilter/vf_libvmaf.c hunk (hunk-header new-line count 335→353). First, uninit_sycl now calls vmaf_sycl_state_free() after vmaf_close(). Second, do_vmaf_sycl NULL-guards the QSV mfxHDLPair chain. The patch was regenerated surgically: only those two added (+) blocks and the hunk-header recount changed, and the configure/Makefile/allfilters hunks are byte-unchanged. Do NOT let git format-patch/git am --3way regenerate the whole patch. A full regeneration fuzzed the configure probe: >= 3.0.0 became 2.0.0, and libvmaf/libvmaf_sycl.h became libvmaf_sycl.h. Verified by a full 16-patch git apply --3way series replay against n8.1.1. Keep these two + blocks on re-sync. Finding #21 (a redundant but idempotent check_pkg_config libvmaf_sycl configure probe) was intentionally left in place.
fix/round4-cli-build-go: round-4 audit bug-fix bundle (2026-06-27)
All fork-added/fork-modified surfaces (no upstream-mirror conflict risk):
- core/src/meson.build: added _x86_simd_strict_fp_extra to the x86_float_adm_avx2 / x86_float_adm_avx512 carve-outs (icx fp-model parity; no-op on gcc/clang). Keep when re-syncing the meson SIMD carve-out block.
- core/tools/vmaf.cpp, core/tools/vmaf_bench.c: fork-added CLI timing helpers (wall_time_s / now_ms): zero-init + cached static QPF frequency.
- core/tools/meson.build: comment-only path fix.
- pkg/libvmaf/paths.go: fork-added Go MCP path allowlist (AllowedRoots fail-closed via discoverRepoRoot). RepoRoot() signature unchanged.
fix/round4-c-bundle: round-4 audit bug-fix bundle (2026-06-27)
Audit-derived bug fixes; several touch upstream-mirror files, so the next upstream sync must preserve these hunks (they are not in Netflix/vmaf):
- core/src/feature/ciede.c: init returns -EINVAL (not -ENOMEM) for an unsupported bitdepth. Close now uses two independent vmaf_picture_unref guards; it was a single conjunctive guard that leaked s->ref on partial alloc.
- core/src/feature/cambi.c: close guards vmaf_picture_unref on s->pics[i].ref so never-allocated slots don't poison err.
- core/src/feature/integer_ssim.c: comment-only correction of the GPU-twin note (the const double sm fix itself is unchanged).
- core/src/read_json_model.c: partial-collection teardown on the non-string-key early return in model_collection_parse_loop.
- core/src/model.c: the vmaf_model_collection_append short-name path now does goto fail_model to free mc->model.
Fork-added files (no upstream-sync concern): core/src/feature/cuda/speed_*, core/src/dnn/ort_backend.c.
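The ciede.c close-path change can be sketched with hypothetical stand-ins. MockPicture, MockState and mock_unref stand in for the real picture state and vmaf_picture_unref, and the field names are assumptions. The sketch shows why one conjunctive guard leaks on partial allocation, while two independent guards release whatever was actually allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ref-counted picture; only the buffer matters. */
typedef struct {
    void *data;
} MockPicture;

/* Hypothetical two-slot extractor state (field names are assumptions). */
typedef struct {
    MockPicture ref;
    MockPicture dist;
} MockState;

static int g_released = 0; /* counts mock_unref calls */

/* Hypothetical stand-in for vmaf_picture_unref(). */
static void mock_unref(MockPicture *pic)
{
    assert(pic != NULL && pic->data != NULL);
    free(pic->data);
    pic->data = NULL;
    g_released++;
}

/* Old shape: one conjunctive guard. If only ref was allocated, nothing
 * is released and ref leaks. */
static void close_conjunctive(MockState *s)
{
    if (s->ref.data != NULL && s->dist.data != NULL) {
        mock_unref(&s->ref);
        mock_unref(&s->dist);
    }
}

/* Fixed shape: each slot has its own guard. */
static void close_independent(MockState *s)
{
    if (s->ref.data != NULL)
        mock_unref(&s->ref);
    if (s->dist.data != NULL)
        mock_unref(&s->dist);
}
```

Take a state where only ref was allocated. close_conjunctive releases nothing, and close_independent then frees that one slot.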
gorust-rederive: bound GPU/AI probe subprocesses + Rust -sys picture double-free footgun (2026-06-27)
Rebase impact: none on upstream Netflix/vmaf — all fork-local. Every touched surface (pkg/gpu/detect.go, pkg/ai/infer.go, bindings/rust/vmafx-sys/src/safe.rs, core/src/meson.build) is fork-added Go/Rust/build code with no upstream counterpart, so a future /sync-upstream sees no conflict here.
Cross-crate invariant: vmafx-sys::safe::VmafContext::read_pictures and the higher-level vmafx::Context::read_pictures must keep aligned picture-ownership semantics — both consume pictures by value (move) and neither manually unrefs on the error path (the libvmaf contract takes ownership for the call's duration; a second unref is a use-after-free against a CUDA-enabled libvmaf). The vmafx crate side was settled by PR #1056 (round-3 R3-2); this change brings the -sys crate to the same contract. Do not revert either to a borrowing signature or re-add an error-path unref. See bindings/rust/vmafx-sys/AGENTS.md.
Note: this change also restores two files that were truncated to 0 bytes on master. docs/state.md was wiped by PR #1055, the pelorus ABI re-vendor. docs/rebase-notes.md itself was wiped by PR #1060, the FMA-ADM fix. The two wipes were unrelated and accidental; both files are recovered here from their last-good blobs.
feat/pelorus-abi-minor3-consume: re-pin vendored Pelorus ABI to minor-3 + consume PEL_SEC_COMPLEXITY (2026-06-27)
Rebase impact: none on upstream Netflix/vmaf; all fork-local (ADR-1120, builds on ADR-1113 + ADR-1118).

Cross-repo ABI parity invariant: the vendored Pelorus interop mirror is single-sourced in VMAFx/pelorus (ADR-0103). It is pinned by PELORUS_VENDOR_SHA in scripts/sync-pelorus-interop.sh, now 818d844 (ABI 1.3; was 835e097 / ABI 1.0). The drift-guard CI gate (sync-pelorus-interop.sh without --update) fails on any divergence from the pin. A future maintainer must therefore NOT hand-edit the vendored files: core/include/libvmaf/pelorus/*.h, core/src/interop/pelorus_*.c, and the body of core/test/test_pelorus_interop.c from its first vendored #include on. Fix defects upstream in pelorus and re-vendor via --update.
- Manifest invariant: the script's manifest array, core/src/meson.build, and the test_pelorus_interop target in core/test/meson.build must stay in lockstep with the pelorus source set. Minor-3 added pelorus/denoise.h + pelorus_denoise_params.c + pelorus_qp_report_csv.c; the last is REQUIRED to link the fixture (pel_x265_csv_parse).
- --update now re-vendors the fixture body (previously only the six manifest files), preserving the Lusoris-authored header before the first vendored include. The drift check compares the body whitespace-insensitively.
- Vendored files are lint/format-excluded by prefix glob (core/src/interop/pelorus_, core/include/libvmaf/pelorus/) in .pre-commit-config.yaml, Makefile, and scripts/ci/assertion-density.sh. New vendored files matching those prefixes are covered automatically; no new exclusion entries are needed.
- Complexity modulation (golden-isolation invariant, rebase-sensitive): perceptual_weight.c::complexity_modulation MUST return exactly 1.0 when PEL_SEC_COMPLEXITY is absent or complexity is non-finite. That is what keeps the no-side-data golden path bit-exact (Netflix 576×324 pair = 76.667831). The guard is test_complexity_modulates_weight/_grid_zero.
fix/bughunt-cuda: CUDA pinned-buffer leaks, motion SAD precision, errno fidelity (2026-06-27)
Rebase impact: none on upstream — all fork-local. The CUDA backend (core/src/feature/cuda/) is a fork addition with no Netflix/vmaf counterpart. Touches only core/src/feature/cuda/{float_vif_cuda.c,float_adm_cuda.c, integer_motion_cuda.c,speed_temporal_cuda.c,speed_chroma_cuda.c}. No public header, CLI flag, meson-option, ffmpeg-patch, or Netflix golden-gate surface changes (the golden gate is CPU-only; SpEED is not in the golden pairs). The leak fixes fire only in close_fex_cuda / init-error paths (no success-path behaviour change); the errno fixes only change the value returned on an already-failing CUDA error path (-EIO → the mapped errno); the integer_motion_cuda precision change brings GPU SAD output closer to the CPU double-precision reference (GPU-only, not bit-exact with CPU by design). Rebase-sensitive invariant for the next syncer: in speed_temporal_cuda.c / speed_chroma_cuda.c, every fail: label reached from CHECK_CUDA_GOTO must return _cuda_err; (the macro-mapped errno), not a literal -EIO — matching the CHECK_CUDA_RETURN convention in cuda_helper.cuh. The two manual cuMemcpyDtoH / cuCtxPushCurrent boolean checks deliberately keep their literal -EIO.
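The CHECK_CUDA_GOTO convention can be illustrated without a GPU. MockResult, mock_result_to_errno, CHECK_MOCK_GOTO and init_sketch are hypothetical stand-ins for CUresult, the errno mapping in cuda_helper.cuh and an extractor init. The specific error-to-errno pairs are assumptions. Three things matter: the macro records the mapped errno in _cuda_err before jumping, the fail: label returns that value, and a manual boolean check still sets a literal -EIO.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical driver status codes standing in for CUresult. */
typedef enum {
    MOCK_SUCCESS = 0,
    MOCK_INVALID_VALUE = 1,
    MOCK_OUT_OF_MEMORY = 2,
} MockResult;

/* Hypothetical status-to-errno mapping; the real pairs may differ. */
static int mock_result_to_errno(MockResult r)
{
    switch (r) {
    case MOCK_SUCCESS:
        return 0;
    case MOCK_INVALID_VALUE:
        return -EINVAL;
    case MOCK_OUT_OF_MEMORY:
        return -ENOMEM;
    default:
        return -EIO;
    }
}

/* CHECK_*_GOTO-style macro: store the mapped errno, then jump. */
#define CHECK_MOCK_GOTO(call, label)                  \
    do {                                              \
        MockResult _r = (call);                       \
        if (_r != MOCK_SUCCESS) {                     \
            _cuda_err = mock_result_to_errno(_r);     \
            goto label;                               \
        }                                             \
    } while (0)

static MockResult fake_call(MockResult outcome) { return outcome; }

static int init_sketch(MockResult first, MockResult second, int copy_ok)
{
    int _cuda_err = 0;
    CHECK_MOCK_GOTO(fake_call(first), fail);
    CHECK_MOCK_GOTO(fake_call(second), fail);
    if (!copy_ok) {
        /* Manual boolean check: keeps a literal -EIO by design. */
        _cuda_err = -EIO;
        goto fail;
    }
    return 0;

fail:
    /* Cleanup of partially acquired resources would go here. */
    assert(_cuda_err != 0);
    return _cuda_err; /* the mapped errno, not a hard-coded -EIO */
}
```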
fix/bughunt-core-engine: core-engine error-path fixes (2026-06-27)
Rebase impact: low; three upstream-mirror files are touched, all on error/cleanup paths. core/src/libvmaf.c, core/src/feature/feature_collector.c and core/src/model.c have Netflix counterparts, so a future /sync-upstream may produce small conflicts here. The changes are fork-local divergences confined to failure paths:
- threaded_read_pictures_batch (libvmaf.c) is a fork-added threaded-batch helper (not in upstream), so its enqueue-failure unref fix carries no upstream conflict risk. Two adjacent doc comments in the same function were tightened to keep it under the fork's readability-function-size LineThreshold (60). This is purely cosmetic, with no behaviour change.
- aggregate_vector_append (feature_collector.c): one-line -EINVAL→-ENOMEM on the feature-name malloc-failure path. Mirrors the fork's own .cpp twin. If upstream rewrites this allocation, prefer the -ENOMEM semantics.
- vmaf_model_collection_append (model.c): a grow-path realloc failure no longer takes the shared fail: label (which nulls *model_collection); it returns -ENOMEM inline. Rebase-sensitive invariant: only the fresh-allocation failures may null the caller's out-param. The grow path must leave the still-valid existing collection, and the caller's handle, intact.
No public header, CLI flag, meson-option, ffmpeg-patch or Netflix golden-gate surface changes. All three edits fire only on malloc/realloc/enqueue failure, so success-path scores are unchanged (golden gate verified green).
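A sketch of the model.c grow-path rule follows. It uses a hypothetical Collection type and an injectable ReallocFn so the failure can be exercised; neither reflects the real VmafModelCollection layout. Fresh-allocation failures take the shared fail: label, which nulls the out-param. A grow failure instead returns -ENOMEM inline and leaves the caller's handle pointing at the still-valid collection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable collection; not the real VmafModelCollection. */
typedef struct {
    int *items;
    size_t cnt;
    size_t cap;
} Collection;

/* Injectable allocator so the grow failure can be simulated. */
typedef void *(*ReallocFn)(void *ptr, size_t size);

static int collection_append(Collection **coll, int item, ReallocFn grow)
{
    assert(coll != NULL && grow != NULL);
    Collection *c = *coll;

    if (c == NULL) {
        /* Fresh allocation: failures here may null the caller's handle. */
        c = calloc(1, sizeof(*c));
        if (c == NULL)
            goto fail;
        c->items = malloc(sizeof(*c->items));
        if (c->items == NULL) {
            free(c);
            goto fail;
        }
        c->cap = 1;
        *coll = c;
    } else if (c->cnt == c->cap) {
        /* Grow path: the existing collection is still valid, so return
         * -ENOMEM inline and leave *coll untouched. */
        int *p = grow(c->items, c->cap * 2 * sizeof(*c->items));
        if (p == NULL)
            return -ENOMEM;
        c->items = p;
        c->cap *= 2;
    }
    c->items[c->cnt++] = item;
    return 0;

fail:
    *coll = NULL;
    return -ENOMEM;
}

/* Test helper: an allocator that always fails. */
static void *failing_realloc(void *ptr, size_t size)
{
    (void)ptr;
    (void)size;
    return NULL;
}
```

Suppose the grow failure were routed through fail: instead. It would set *coll to NULL while the old collection is still allocated, and the caller would lose its only handle to it.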
fix/bughunt-mcp: MCP Go↔Python parity + HTTP hardening (2026-06-27)
No rebase impact. The change edits the fork-only MCP servers (cmd/vmafx-mcp/{main.go,impl.go,impl_direct.go} + new cmd/vmafx-mcp/http_security.go, mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py), plus tests, docs/state.md and the changelog. No libvmaf C-API / CLI / meson_options.txt / public-header change → no ffmpeg-patch impact.

Rebase-sensitive invariant, HTTP transport security parity (cmd/vmafx-mcp/AGENTS.md invariant #13): the Go side (securityMiddleware and the bind logic in http_security.go) and the Python side (_make_security_middleware / _resolve_bind_host in http_transport.py) MUST share the same ADR-0967 env contract:
- VMAFX_MCP_HTTP_TOKEN: constant-time bearer check.
- VMAFX_MCP_HTTP_NO_AUTH=1: opt-out.
- When neither is set, every request is refused with 401.
- 4 MiB body limit.
- VMAFX_MCP_HTTP_BIND defaults to 127.0.0.1.

Precision-default parity (ADR-0119 / ADR-1117): both servers default vmaf_score precision to legacy (%.6f) on every transport / dispatch path. A rebase touching either server must keep both in lock-step.
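The auth half of the HTTP contract can be illustrated in C, although both servers are written in Go and Python. token_equal_ct, auth_decide and AuthDecision are hypothetical names, and header parsing (the Bearer prefix) is omitted. The precedence when both VMAFX_MCP_HTTP_TOKEN and VMAFX_MCP_HTTP_NO_AUTH=1 are set is an assumption of this sketch (the token wins). In real server code, a vetted primitive such as Go's crypto/subtle.ConstantTimeCompare or Python's hmac.compare_digest is the usual choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare without exiting early on the first mismatching byte. Only the
 * length comparison can leak through timing. */
static int token_equal_ct(const char *presented, const char *expected)
{
    assert(presented != NULL && expected != NULL);
    size_t n = strlen(expected);
    if (strlen(presented) != n)
        return 0;
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(presented[i] ^ expected[i]);
    return diff == 0;
}

typedef enum { AUTH_ALLOW, AUTH_DENY_401 } AuthDecision;

/* token: value of VMAFX_MCP_HTTP_TOKEN (NULL if unset).
 * no_auth: nonzero if VMAFX_MCP_HTTP_NO_AUTH=1.
 * presented: bearer token from the request (NULL if absent). */
static AuthDecision auth_decide(const char *token, int no_auth,
                                const char *presented)
{
    if (token != NULL && token[0] != '\0')
        return (presented != NULL && token_equal_ct(presented, token))
                   ? AUTH_ALLOW
                   : AUTH_DENY_401;
    if (no_auth)
        return AUTH_ALLOW;
    return AUTH_DENY_401; /* neither configured: refuse every request */
}
```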
fix/mcp-probe-backend-required: MCP probe_backend required-arg message (2026-06-20)
Rebase impact: none on upstream — fork-local. The MCP server (mcp-server/vmaf-mcp/) is a fork addition with no Netflix/vmaf counterpart. One-line change in _call_tool_dispatch's probe_backend branch (removes a redundant explicit guard, relies on the existing KeyError→ValueError wrapper). No public C-API / CLI / header impact.
fix/speed-extractor-oob-deadlock-heap-corruption: GPU SpEED covariance + eigenbasis correctness + safety (2026-06-20)
Rebase impact: none on upstream — all fork-local. The SpEED feature (speed_chroma / speed_temporal) and all of its GPU backends are fork-additions with no Netflix/vmaf counterpart. Touches the fork-only GPU extractors (core/src/feature/cuda/{speed_chroma_cuda.c,speed_temporal_cuda.c, speed/speed_score.cu}, core/src/feature/hip/{speed_chroma_hip.c, speed_temporal_hip.c,speed/speed_score.hip}, core/src/feature/sycl/{speed_chroma_sycl.cpp,speed_temporal_sycl.cpp}), the fork-only core/src/feature/speed.c CPU host (init-return propagation only — the global covariance math itself is unchanged), and the GPU parity test fixture. No public header, CLI, meson-option, ffmpeg-patch, or Netflix golden-gate surface changes (SpEED is not in the golden pairs). Rebase-sensitive invariant for the next syncer: the GPU means/cov kernels must stay on the CPU's global covariance formulation (means[25] over the full phase-shifted submatrix, NOT per-tile means[25*num_blocks]), and the ref/dis paths must keep separate covariance + eigenvalue bases — recorded in core/src/feature/cuda/AGENTS.md and verified by test_cuda_speed_{chroma,temporal}_parity at 1e-4. If a future change touches any one backend's kernels, mirror it across all four (CPU + CUDA + HIP + SYCL).
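The global formulation amounts to a plain sample covariance over all n vectors with one shared mean vector. global_covariance is a hypothetical helper, not the SpEED kernel, and the (n − 1) normalisation is an assumption. A per-tile variant would subtract a separate mean for each block, which removes the between-tile variation that the single global mean keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Global sample covariance of n vectors of dimension d (row-major in x).
 * One mean vector over ALL samples, mirroring "means[25] over the full
 * submatrix"; normalising by (n - 1) is an assumption of this sketch. */
static void global_covariance(const double *x, size_t n, size_t d,
                              double *mean, double *cov)
{
    assert(x != NULL && mean != NULL && cov != NULL && n > 1 && d > 0);
    for (size_t j = 0; j < d; j++) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i * d + j];
        mean[j] = s / (double)n;
    }
    for (size_t a = 0; a < d; a++) {
        for (size_t b = 0; b < d; b++) {
            double s = 0.0;
            for (size_t i = 0; i < n; i++)
                s += (x[i * d + a] - mean[a]) * (x[i * d + b] - mean[b]);
            cov[a * d + b] = s / (double)(n - 1);
        }
    }
}
```

Per the invariant, the ref and dis paths would each make their own call and keep their own mean, covariance and eigenbasis.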
fix/audit-runtime-bugs-batch: 18 audit runtime bugs covering SYCL/CUDA init leaks, AI crash-hardening, MCP parity (2026-06-20)
Rebase impact: none on upstream. All 18 fixes are fork-local and touch only fork-added files with no upstream Netflix/vmaf counterpart: core/src/feature/sycl/integer_motion_sycl.cpp, core/src/feature/cuda/integer_ssim_cuda.c, core/src/feature/cuda/integer_vif_cuda.c, core/src/feature/ssimulacra2.c (fork-added SSIMULACRA2 extractor), cmd/vmafx-mcp/impl.go, and five ai/ extraction/training scripts (bvi_dvc_to_full_features.py, extract_full_features.py, konvid_to_full_features.py, train_fr_regressor_v2.py, vmaf_train/datamodule.py). No public header, CLI flag, meson-option, ffmpeg-patch, or golden-gate surface changes; all C/SYCL/CUDA edits fire only on already-failing error/OOM paths so success-path behaviour and scores are unchanged. Rebase-sensitive note for the next person syncing: the cmd/vmafx-mcp/impl.go change deletes the last vulkan backend reference in the Go MCP server to keep it byte-compatible with the Python MCP server after the Vulkan removal (ADR-0726); if a sync re-introduces a vulkan keyword in either MCP server, both must move together. No new rebase-sensitive invariants worth a dedicated AGENTS.md entry beyond the existing SYCL/CUDA error-path notes.
fix/sycl-psnr-hvs-chroma-ceiling: SYCL psnr_hvs odd-dimension chroma geometry (2026-06-20)
Rebase impact: none on upstream. Fork-local one-line correctness fix in the fork-only SYCL feature extractor core/src/feature/sycl/integer_psnr_hvs_sycl.cpp, which has no upstream Netflix/vmaf counterpart. init_fex_sycl now derives the 4:2:0 / 4:2:2 chroma plane dims with ceiling division ((w + 1U) >> 1) instead of floor (w >> 1), matching picture.c / the CPU reference / the CUDA + HIP twins. No public-header, CLI, meson-option, ffmpeg-patch, or golden-gate surface changes; even-dimension behaviour is byte-identical to before. Rebase-sensitive note for the next person syncing: the picture allocator's ceiling subsample convention ((dim + ss) >> ss) is the single source of truth for chroma plane dims — any new GPU feature extractor that re-derives plane dimensions in its own init must use the ceiling form, not floor; the floor form only agrees on even dimensions and silently drops the last chroma block strip otherwise. This is the same class of bug as the PSNR and Vulkan chroma ceiling fixes already in tree.
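The ceiling rule reduces to one expression. chroma_dim_ceil and chroma_dim_floor are hypothetical helper names. ss is the subsampling shift for one axis: 1 for a subsampled axis of 4:2:0 / 4:2:2, 0 otherwise. For those shifts, (dim + ss) >> ss equals ceil(dim / 2^ss). The floor form drops the last column or row whenever dim is odd.

```c
#include <assert.h>

/* Allocator-style ceiling form: (dim + ss) >> ss, valid for ss in {0, 1}. */
static unsigned chroma_dim_ceil(unsigned dim, unsigned ss)
{
    assert(ss <= 1);
    return (dim + ss) >> ss;
}

/* Floor form: agrees with the ceiling form only for even dims. */
static unsigned chroma_dim_floor(unsigned dim, unsigned ss)
{
    assert(ss <= 1);
    return dim >> ss;
}
```

For a 1919-pixel-wide luma plane, the ceiling form gives 960 chroma columns and the floor form gives 959.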
fix/metal-drain-motion2: Metal end-of-stream drain + frame-0 motion2 (2026-06-20)
Rebase impact: none on upstream. All changes are fork-local (the Metal backend has no upstream Netflix/vmaf counterpart) plus one additive bit in the shared flush path. Touches: core/src/feature/feature_extractor.h (adds VMAF_FEATURE_EXTRACTOR_METAL = 1 << 7 to the VmafFeatureExtractorFlags enum — a fork-added enum; bit 7 is the next free slot after the fork's HIP bit 6), core/src/libvmaf.c (a new #ifdef HAVE_METAL drain branch in flush_context_serial, gated so non-Metal builds are byte-unchanged), and 8 fork-only core/src/feature/metal/*.mm extractors (set the new flag; + float_motion_metal.mm collect-index fix).
Rebase-sensitive notes for the next person syncing:
- flush_context_serial is a fork-local rewrite of the upstream flush. If an upstream sync re-touches the end-of-stream flush, the fork's per-backend drain blocks (CUDA / HIP / SYCL / Metal) must be re-applied. Each GPU backend whose extractors carry a VMAF_FEATURE_EXTRACTOR_<BACKEND> flag needs its pending gpu_pending final-frame collect() drained before its flush() runs, or the last frame's score is dropped. Do not drop the Metal branch.
- Frame-0 motion2 contract. Every motion-family extractor (CPU + all GPU twins) appends motion2 = 0.0 at index 0 and does nothing at index 1. At index ≥ 2 it emits min(prev, cur) at index − 1. float_motion_metal now matches this exactly. Keep it aligned with integer_motion_metal and the HIP / CUDA twins on any future motion2 change (cross-backend invariant, see core/src/feature/metal/AGENTS.md).
- Darwin-only. Not buildable / not exercised on the Linux dev or CI lane; re-validate on Apple Silicon after any upstream flush-path sync.
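The frame-0 motion2 contract can be written as a per-index loop. motion2_per_frame is a hypothetical helper. Reading prev/cur as the motion scores of frames index − 1 and index is an assumption about the note's shorthand. Slots the contract does not write stay NAN; in particular, the last slot belongs to the end-of-stream flush, which the note does not specify.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* motion[i]: motion score of frame i (assumed meaning of prev/cur).
 * Slots not written by the contract stay NAN; for n >= 2 that is the
 * last slot, which belongs to the end-of-stream flush. */
static void motion2_per_frame(const double *motion, double *motion2, size_t n)
{
    assert(motion != NULL && motion2 != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        motion2[i] = NAN;

    for (size_t index = 0; index < n; index++) {
        if (index == 0) {
            motion2[0] = 0.0; /* frame 0: append 0.0 */
        } else if (index == 1) {
            continue;         /* frame 1: no-op */
        } else {
            double prev = motion[index - 1];
            double cur = motion[index];
            motion2[index - 1] = prev < cur ? prev : cur; /* min(prev, cur) */
        }
    }
}
```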
fix/k150k-training-data-integrity: fail-loud on empty-frame clips + MOS-join key mismatch (2026-06-20)
Rebase impact: none on upstream. All changes are fork-local in ai/scripts/extract_k150k_features.py (a fork-added training script with no upstream Netflix counterpart). Two defensive guards: _process_clip raises on an empty frame list instead of writing an all-NaN row + marking the clip done; the MOS-label join gains an mp4.stem fallback + an up-front coverage hard-fail. Invariants recorded in ai/AGENTS.md (do not revert the lookup to a single mos_map.get(clip_name, NaN); keep the staging-first .done ordering). No C-library, ABI, golden-data, or ffmpeg-patch surface touched.
fix/speed-gpu-registry: restore orphaned GPU SpEED registrations + delete dead feature_extractor.c (2026-06-19)
Rebase impact: none on upstream. All changes are fork-local. Touches the fork-only registry file core/src/feature/feature_extractor.cpp (adds six externs + array entries under the existing #if HAVE_{CUDA,SYCL,HIP} blocks), deletes the fork-only dead twin core/src/feature/feature_extractor.c (orphaned by PR #875's .c→.cpp split; meson compiled only the .cpp), and adds by-name resolution asserts to core/src/feature/../test/test_feature_extractor.c. Rebase-sensitive note for the next person syncing: there is now exactly ONE registry file (feature_extractor.cpp); if an upstream/Netflix sync re-introduces a feature_extractor.c it must be reconciled into the .cpp, not kept alongside it — the split-brain is what this fix removes. Stale feature_extractor.c references remain in ~40 sibling source comments (Metal .mm, HIP .c) and in historical docs/adr/* / docs/research/* (audit trail — do NOT rewrite); the live comment sweep for the non-ADR consumer files is deferred to the RC LOW doc-hygiene PR and coordinated with the in-flight Metal PR #986.
fix/sycl-init-leaks-exception-safety: SYCL init error-path + exception-boundary hardening (2026-06-19)
Rebase impact: none on upstream — fork-local SYCL error-path + exception-boundary hardening. Touches only fork-added SYCL sources (core/src/feature/sycl/integer_adm_sycl.cpp, core/src/feature/sycl/integer_vif_sycl.cpp, core/src/sycl/common.cpp, core/src/sycl/dmabuf_import.cpp), none of which have an upstream Netflix/vmaf counterpart. No public header, CLI, meson-option, ffmpeg-patch, or golden-gate surface changes; success-path behaviour is unchanged (cleanup/exception handling only fires on already-failing paths).
fix/cuda-init-submit-leaks: CUDA error-path resource frees (2026-06-19)
Rebase impact: none on upstream — fork-local CUDA error-path hardening. Touches only fork-added CUDA feature extractors under core/src/feature/cuda/ (integer_ms_ssim_cuda.c, integer_psnr_hvs_cuda.c, ssimulacra2_cuda.c, speed_chroma_cuda.c); adds NULL-guarded frees / cleanup-goto routing on init + submit failure paths only. No public-header, meson, CLI, ffmpeg-patch, or golden-gate impact, and success-path behaviour is unchanged.
fix/hip-chroma-mcp-parity: psnr_hip enable_chroma option + MCP Go/Python parity (2026-06-20)
Rebase impact: none on upstream — all fork-local. Touches the fork-only HIP extractor (core/src/feature/hip/integer_psnr_hip.c: add an enable_chroma VmafOption), the fork-only Go MCP server (cmd/vmafx-mcp/impl.go + impl_test.go: drop the dead vulkan backend value, drop the unsupported --format from tune-per-shot), and the fork-only Python MCP server (server.py: probe_backend ValueError guard). No upstream Netflix/vmaf file is touched; no public C-API/CLI surface changes (the enable_chroma option already exists on the CPU/CUDA psnr twins).
fix/tox-py314-scipy-118: tox env py311→py314 (2026-06-20)
Rebase impact: none on upstream — fork-local CI config only. Touches python/tox.ini (envlist py311→py314, matching the CI setup-python 3.14.5, since the fork's requirements.txt deps now require ≥3.12) plus a changelog fragment + state.md row. No source or test code changed.
feat/upstream-v1.0.16-models (2026-06-20)
Rebase impact: low (additive model data + one C registry block + one meson embed block; no public-header / CLI / ffmpeg-patch / golden-gate change). Verbatim port of Netflix upstream commit 4718b4f5f ("Add VMAF v1.0.16 SDR models, documentation, and tests"). Because it is a pure upstream port, it is exempt from the ADR-0108 six-deliverable rule (CLAUDE §12 r11); the changelog fragment + this rebase note are still provided.
What was ported and how it was adapted to the fork's diverged layout (ADR-0700 libvmaf/ → core/):
- libvmaf/src/model.c → core/src/model.c: added the 8 extern decls + 8 built_in_models[] registry entries. They mirror the existing vmaf_v0.6.1/vmaf_4k_v0.6.1neg idiom byte-for-byte (the fork's struct is the same VmafBuiltInModel {version, data, data_len}).
- libvmaf/src/meson.build → core/src/meson.build: added two foreach blocks embedding the v1.0.16 + v1.0.16_hfr JSONs. They use the same xxd -i -n src_@PLAINNAME@ custom_target the fork already uses for the v0 models. The v1 models live in their own model/vmaf_v1.0.16{,_hfr}/ subdirectories, so a dedicated dir prefix is used (matching upstream).
- model/vmaf_v1.0.16/*.json + model/vmaf_v1.0.16_hfr/*.json (8 files): copied verbatim via git checkout 4718b4f5f -- ….
- python/test/vmaf_v1_quality_runner_test.py: copied verbatim (path NOT renamed). 46 new golden assertions; no pre-existing assertion touched.
- Three upstream doc changes do not map: resource/doc/models_v1.md, the models.md → models_v0.md rename, and the README "News" line. The fork has no resource/doc/ model docs (it consolidated them under docs/models/). The new doc therefore lives at docs/models/v1.md, added to the mkdocs nav with a cross-link from docs/models/overview.md. The README change is moot: the fork's README diverged and no longer links resource/doc/models.md.
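The registry pattern described for model.c pairs each xxd -i byte array with a {version, data, data_len} entry. Everything in this sketch is a placeholder. The array contents, the example_v1.0.16 version string, BuiltInModelSketch and find_built_in are hypothetical, and the real VmafBuiltInModel field types may differ. The length is stored by pointer because in C a const-qualified object is not a constant expression. A static initializer can use its address, but not its value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shape of what `xxd -i -n <name>` emits: a byte array plus a length.
 * Name and contents are hypothetical placeholders. */
static const unsigned char src_example_model_json[] = { 0x7b, 0x7d }; /* "{}" */
static const unsigned int src_example_model_json_len = 2;

/* Registry entry with a {version, data, data_len} shape. */
typedef struct {
    const char *version;
    const unsigned char *data;
    const unsigned int *data_len;
} BuiltInModelSketch;

static const BuiltInModelSketch built_in_models_sketch[] = {
    { "example_v1.0.16", src_example_model_json, &src_example_model_json_len },
};

/* Linear lookup by version string; NULL if not registered. */
static const BuiltInModelSketch *find_built_in(const char *version)
{
    assert(version != NULL);
    size_t n = sizeof(built_in_models_sketch) / sizeof(built_in_models_sketch[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(built_in_models_sketch[i].version, version) == 0)
            return &built_in_models_sketch[i];
    return NULL;
}
```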
Deliberately NOT ported (rebase-sensitive — re-check on the next upstream sync): the upstream commit also bundled an unrelated feature-source reorg in meson.build (moving speed.c, common/convolution.c, vif_tools.c out of the float_enabled block into the always-on list). The fork already wires those sources differently, so applying the upstream hunk would conflict / double-list. If a future sync touches that region, reconcile against the fork's current libvmaf_feature_sources layout, not the upstream diff.
Known fork gap (load-bearing invariant): the 4 _hfr models embed and register but cannot be scored until motion_five_frame_window=true + motion_moving_average=true are implemented (the prev_prev_ref 5-frame plumbing deferred per ADR-0337). The 4 non-HFR models score correctly (1080p 3H == upstream golden VMAF 82.816059). Do not "fix" the HFR runtime error by deleting the option from the model JSONs — the JSONs are verbatim Netflix data; the fix is to land the 5-frame motion plumbing.
feat/golusoris-tune (2026-06-15)
Rebase impact: low (Go-only, additive + in-place rewrite of one binary's composition root). Phase-1 of the golusoris adoption (ADR-1119): migrates the cmd/vmafx-tune CLI from a hand-built cobra.Command root onto the golusoris clikit (cobra + fx) framework. Touches only cmd/vmafx-tune/* + its docs / changelog / rebase-notes; no C / meson / public-header / ffmpeg-patch / golden-gate impact, and the Python tools/vmaf-tune harness is untouched. cmd/vmafx-tune is non-cgo (no pkg/libvmaf import), so no libvmaf.so build is needed to compile or test it.
Files: new cmd/vmafx-tune/cmd/golusoris.go (the withGolusoris adapter + configOptions + levelledLogger); cmd/vmafx-tune/cmd/root.go rewritten to clikit.New + clikit.Command; compare.go / ladder.go / report.go subcommand builders re-wired through clikit and their run* functions now take (ctx, deps, flags); new cmd/vmafx-tune/cmd/root_test.go; existing tests updated for the new run* signatures; cmd/vmafx-tune/AGENTS.md invariants extended; docs/usage/vmafx-tune-go.md documents the VMAFX_LOG_LEVEL / VMAFX_LOG_FORMAT surface.
Rebase-sensitive invariants for follow-up PRs and any golusoris bump:
- clikit WithFx is long-running, not one-shot. clikit.WithFx builds an fx.App and calls app.Run(), which blocks until a signal, and it never surfaces an fx.Invoke error as the exit code. One-shot tuning subcommands therefore use clikit.WithRunE(withGolusoris(fn)). withGolusoris builds the graph from bootstrap.Base, fx.Populates the deps, runs fn, and returns its error. Do not "simplify" these to clikit.WithFx(golusoris.Core, fx.Invoke(fn)): the CLI would block and lose its exit code.
- fx.NopLogger is deliberate. A one-shot CLI must not print fx provide/invoke/lifecycle chatter on every run. bootstrap.FxLogger() (which routes fx events onto the app logger) is for long-running services only. The injected *slog.Logger still carries domain diagnostics.
- levelledLogger compensates for a golusoris v0.4.0 scoping gap. A root-scope fx.Replace(config.Options{EnvPrefix:"VMAFX_"}) reaches root-scope consumers, so our domain code reads the right config. It does not penetrate the golusoris.log submodule's own config.Options dependency, however. The auto-built logger therefore falls back to the default APP_ prefix and stays at LevelInfo. withGolusoris adds fx.Decorate(levelledLogger) to rebuild the *slog.Logger from the root config at the VMAFX_-configured level/format. Delete this decorator once golusoris makes the root config override penetrate submodules (track upstream alongside golusoris #234). The TestGolusorisInjection_ConfigDrivesLogLevel test guards the behavior.
- VMAFX_ env prefix. configOptions() sets EnvPrefix:"VMAFX_" to match the fork-wide env contract (ADR-1119). golusoris splits every underscore into the config delimiter, so VMAFX_LOG_LEVEL → log.level.
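The documented env-to-key mapping can be restated as a small function. This is not golusoris code (golusoris is a Go framework), and env_to_config_key is a hypothetical helper. Lower-casing the remainder is an assumption consistent with the VMAFX_LOG_LEVEL → log.level example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Strip the "VMAFX_" prefix, lower-case the rest, and turn every
 * underscore into a '.' path delimiter. Returns 0 on success, -1 if the
 * prefix does not match or the output buffer is too small. */
static int env_to_config_key(const char *env, char *out, size_t out_len)
{
    const char *prefix = "VMAFX_";
    size_t plen = strlen(prefix);
    assert(env != NULL && out != NULL);
    if (strncmp(env, prefix, plen) != 0)
        return -1;
    const char *rest = env + plen;
    size_t n = strlen(rest);
    if (n + 1 > out_len)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char)rest[i];
        out[i] = (ch == '_') ? '.' : (char)tolower(ch);
    }
    out[n] = '\0';
    return 0;
}
```

Splitting on every underscore has a consequence under this rule: a config key segment cannot itself contain an underscore when it is set through the environment.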
feat/golusoris-foundation (2026-06-14)
Rebase impact: low (Go-only, additive). Phase 0 of the golusoris fx framework adoption (ADR-1119). Adds github.com/golusoris/golusoris v0.3.1 to go.mod/go.sum (and the widened transitive closure — fx/dig, koanf, chi, river transitives; go build ./... + all test binaries compile clean, no version-skew since both repos already pinned identical shared-dep versions), one new package internal/app/bootstrap/bootstrap.go (Base fx module set + FxLogger), and an Info/Get() addition to pkg/version/version.go (the interim stand-in for golusoris#226). No C/meson/CLI/public-header change → no ffmpeg-patch impact, no golden-gate impact. No binary is migrated in this PR — the six cmd/vmafx-* composition roots are rewritten in the subsequent phased PRs (vmafx-server first; vmafx-controller gated on golusoris#225). Rebase-sensitive note for the follow-up PRs: each binary's fx.New(...) must lead with fx.Replace(config.Options{EnvPrefix:"VMAFX_"}) before golusoris.Core to preserve the VMAFX_ env contract, and the cgo libvmaf.Scorer provider must order its OnStop after the gRPC server's (drain before Close()). Docs: docs/adr/1119-* + fragment + _order.txt + README row, docs/research/1119-*, changelog.d/chore/1119-golusoris-foundation.md.
feat/metric-brisque (2026-06-14)
Rebase impact: low. Adds four fork-only files (core/src/feature/brisque.c, brisque_math.h, brisque_model.h, core/test/test_brisque.c) plus the vendored model + provenance (model/other_models/brisque_live.model, NOTICE-brisque, model/brisque_live_card.md) — no upstream twin (BRISQUE is fork-added; the model is the LIVE-lab allmodel bundled under a documented research-use exception, ADR-1115). The model is embedded into the binary at build time via an xxd -i Meson custom_target (the same mechanism libvmaf's JSON models use; brisque_model.h only declares the generated src_brisque_live_model[] / _len externs), so the giant byte array never enters the tree — keeping it under the 1 MB large-file gate. Additive registration: one extern VmafFeatureExtractor vmaf_fex_brisque; + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the LIVE C++23 registry, NOT the dead feature_extractor.c twin), a model embed custom_target + one source line in core/src/meson.build, one executable() (linking the generated model TU) + one test() in core/test/meson.build. Edits docs/metrics/brisque.md (new), mkdocs.yml (BRISQUE nav row), docs/state.md, docs/rebase-notes.md, changelog.d/, core/src/feature/AGENTS.md (invariant note), testdata/scores_cpu_brisque.json (new), docs/research/1101-brisque-nr-metric.md (new), and the ADR index (docs/adr/1115-brisque-nr-metric.md + fragment + _order.txt + README row). CPU-only scalar extractor; no public C-API / ABI / CLI flag / meson_options.txt change → no ffmpeg-patch impact (reachable via the existing generic --feature path). First feature-extractor consumer of the vendored libsvm (core/src/svm.cpp/svm.h) — if a future upstream sync changes the libsvm parser/predict ABI, brisque.c's svm_parse_model_from_buffer + svm_predict calls must be re-checked alongside predict.c. 
Load-bearing invariants if the algorithm is ever touched: GGD (not AGGD) for the MSCN field, Gaussian sigma=7/6 (not 1.166), MATLAB antialiased bicubic (not INTER_CUBIC), the inline range arrays (not allrange), no output clamp — all required for parity with the bundled trained model (see core/src/feature/AGENTS.md and ADR-1115).
feat/metric-y-funque-plus (2026-06-14)
Rebase impact: low. Fork-only additive metric — no upstream twin. Adds two fork-only files (core/src/feature/y_funque_plus.c, core/test/test_y_funque_plus.c). Additive registration only: one extern VmafFeatureExtractor vmaf_fex_y_funque_plus; + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the live C++23 registry — NOT the dead feature_extractor.c twin), a dedicated libvmaf_y_funque_plus_static_lib static_library() + one extract_all_objects() line in core/src/meson.build (mirrors the ssimulacra2 -ffp-contract=off carve-out), and one executable() + one test() in core/test/meson.build. Edits docs/metrics/y-funque-plus.md (new), docs/metrics/features.md (one new row), docs/state.md, docs/rebase-notes.md, changelog.d/, and the ADR index (docs/adr/1114-y-funque-plus-atoms.md + fragment + _order.txt + README.md row). CPU-only scalar extractor; no public C-API / ABI / CLI flag / meson_options.txt / public-header change → no ffmpeg-patch impact (CLAUDE §12 r14 N/A; reachable via the generic --feature path). Load-bearing invariants if the algorithm is ever touched: the Haar butterfly uses the pywt 'haar' convention cH=(a+b-c-d)/2, cV=(a-b+c-d)/2 (NOT the H/V-swapped form the design dossier text mistakenly listed — pywt was verified directly); the DLM numerator pools rest^3 WITHOUT abs while the denominator pools the ref detail WITH abs (pyr_features.py:54/61); the 2x downscale is OpenCV INTER_CUBIC (Keys cubic a=-0.75), the dominant cross-host parity risk — keep -ffp-contract=off.
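The Haar butterfly on a single 2x2 block [a b; c d] can be written directly from the quoted convention. haar2x2 and HaarBlock are hypothetical names. cH and cV are taken verbatim from the note. cA and cD are derived here from the same first-minus-second sign pattern and are not stated in the note.

```c
#include <assert.h>

/* One level of the 2-D Haar transform on a single 2x2 block
 *   [ a b ]
 *   [ c d ]
 * cH and cV follow the quoted pywt 'haar' convention; cA and cD are
 * derived from the same sign pattern. */
typedef struct {
    double cA, cH, cV, cD;
} HaarBlock;

static HaarBlock haar2x2(double a, double b, double c, double d)
{
    HaarBlock h;
    h.cA = (a + b + c + d) / 2.0;
    h.cH = (a + b - c - d) / 2.0;
    h.cV = (a - b + c - d) / 2.0;
    h.cD = (a - b - c + d) / 2.0;
    return h;
}
```

For [1 2; 3 4] this yields cA = 5, cH = -2, cV = -1 and cD = 0. Swapping the cH and cV formulas, the error the note warns about, would exchange the -2 and -1.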
feat/pelorus-sidedata-reader (2026-06-14)
Rebase impact: low-to-medium. Fork-only additive feature (ADR-1118), built on the vendored Pelorus interop ABI (ADR-1113). Adds three fork-only files with no upstream twin: core/include/libvmaf/perceptual_weight.h (public C-API), core/src/feature/perceptual_weight.{c,h} (the weight module and internal contract), and core/test/test_perceptual_weight.c (the golden-isolation test). Edits to shared files, all additive:
- core/src/libvmaf.c is the rebase-sensitive one. It adds (1) two #includes, (2) a VmafPerceptualWeightStore perceptual; field at the tail of struct VmafContext (after dnn), (3) a vmaf_perceptual_weight_store_destroy call in vmaf_close after vmaf_ctx_dnn_free, (4) three new public entry points after vmaf_import_feature_score, and (5) the weighting branch inside vmaf_feature_score_pooled, plus a new static pool_reduce helper and PoolAccumulators struct just above it. Load-bearing invariant: the no-side-data path through vmaf_feature_score_pooled MUST stay byte-identical to upstream. The weighted accumulators are only summed when vmaf_perceptual_weight_active() is true, and the MEAN/HARMONIC_MEAN reduce runs the literal upstream expression when weighting is inactive. A rebase that refactors this function must preserve that bit-exactness (the golden gate depends on it; test_perceptual_weight.c guards it).
- core/src/meson.build: one source line (feature/perceptual_weight.c) in libvmaf_sources (NOT the feature static lib; it is a pooling helper, not a registered extractor).
- core/include/libvmaf/meson.build: one install_headers entry.
- core/test/meson.build: one executable() + one test().
- ffmpeg: ffmpeg-patches/0017-libvmaf-read-pelorus-sidedata.patch, appended at the tail of ffmpeg-patches/series.txt. vf_libvmaf consumes the new public C-API and gains a new perceptual_weight AVOption, so this has ffmpeg-patch impact per CLAUDE §12 r14. The patch anchors on stable post-0016 context (the score_fmt option line, the VmafContext *vmaf; struct line, and the do_vmaf vmaf_read_pictures call); CI validates it via a full series replay against a clean n8.1 checkout (git am --3way), not a standalone git apply.
- Docs/index: docs/api/perceptual-weight.md (new), mkdocs.yml (api nav row + ADR-1118 nav row), docs/state.md, docs/research/1102-*.md (new), changelog.d/added/, and the ADR index (docs/adr/1118-perceptual-sidedata-weighting.md + fragment + _order.txt + regenerated README.md).
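The bit-exactness invariant is structural, and a minimal sketch shows its shape. It assumes a plain arithmetic mean as the unweighted reduce and a normalized weighted mean as the weighted one. PoolAccSketch and pool_mean_sketch are hypothetical and do not reproduce the real pool_reduce / PoolAccumulators code, and the fallback when the weights sum to zero is also an assumption. The point is that when weighting is inactive the weighted accumulators are never touched (the weight array is not even read) and the returned expression is the unweighted one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accumulators; the real PoolAccumulators layout may differ. */
typedef struct {
    double sum;     /* unweighted running sum */
    double w_sum;   /* sum of w[i] * x[i]; only updated when weighting is active */
    double w_total; /* sum of w[i]; only updated when weighting is active */
} PoolAccSketch;

/* Mean reduce sketch. With weighting inactive only acc.sum is used, so the
 * result is exactly the unweighted mean and `w` may be NULL. */
static double pool_mean_sketch(const double *x, const double *w, size_t n,
                               bool weighting_active)
{
    assert(n > 0);
    PoolAccSketch acc = {0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; i++) {
        acc.sum += x[i];
        if (weighting_active) {
            acc.w_sum += w[i] * x[i];
            acc.w_total += w[i];
        }
    }
    if (!weighting_active)
        return acc.sum / (double)n; /* unweighted path: untouched by weighting */
    /* Assumed fallback: degenerate all-zero weights reduce to the plain mean. */
    return acc.w_total > 0.0 ? acc.w_sum / acc.w_total : acc.sum / (double)n;
}
```

Keeping the inactive branch as a separate return, rather than feeding unit weights through the weighted formula, avoids any change in floating-point operation order on the no-side-data path.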
feat/mcp-tiny-ai-feature-coverage (2026-06-14)¶
no rebase impact: all touched code is fork-local. The MCP servers (cmd/vmafx-mcp/{tools.go,impl.go,impl_direct.go,main.go,score_extras_test.go} and mcp-server/vmaf-mcp/src/vmaf_mcp/server.py + tests/test_score_extras_adr1117.py) do not exist in upstream Netflix/vmaf, so no mechanical merge conflict is possible. The change adds optional scoring parameters that shell out to existing vmaf CLI flags — it does not add, rename, or remove any public C-API entry point, CLI flag, meson_options.txt entry, public header, or LIBVMAFContext field, so per CLAUDE.md §12 r14 there is no ffmpeg-patch impact (the patches under ffmpeg-patches/ consume libvmaf symbols, not the MCP servers). Rebase-sensitive invariant the diff must preserve: the Go (scoringExtraProperties()) and Python (_scoring_extra_properties()) schema generators MUST stay byte-identical (same keys/enums/defaults/descriptions) per cmd/vmafx-mcp/AGENTS.md §1 — the parity tests (server_test.go::TestToolSchemasMatchPython, test_score_extras_adr1117.py) and the source-of-truth flag names in core/tools/cli_parse.c are the backstop. Also edits docs/mcp/tools.md, docs/state.md, changelog.d/, the ADR index (ADR-1117 + fragment + _order.txt + regenerated README.md), and a research digest — all fork-local docs.
feat/metric-niqe (2026-06-14)¶
Rebase impact: low. Adds four fork-only files (core/src/feature/niqe.c, niqe_math.h, niqe_model.h, core/test/test_niqe.c) — no upstream twin (NIQE is fork-trained against model/other_models/niqe_v0.1.pkl). Additive registration: one extern VmafFeatureExtractor vmaf_fex_niqe; + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the LIVE C++23 registry, NOT the dead feature_extractor.c twin), one source line in core/src/meson.build, one executable() + one test() in core/test/meson.build. Edits docs/metrics/niqe.md (new), mkdocs.yml (NIQE nav row + regenerated ADR-nav block via scripts/docs/generate-adr-nav.sh), docs/state.md, docs/rebase-notes.md, changelog.d/, testdata/scores_cpu_niqe.json (new), and the ADR index (docs/adr/1112-niqe-nr-metric.md + fragment + _order.txt + regenerated README.md). CPU-only scalar extractor; no public C-API / ABI / CLI flag / meson_options.txt change → no ffmpeg-patch impact (the feature is reachable via the existing generic --feature path). Load-bearing invariants if the algorithm is ever touched: the AGGD N keeps the trailing *aggdratio factor and the MSCN maps + PIL bicubic half-res output stay float32-rounded — both are required for parity with the pkl the model was trained against (see core/src/feature/AGENTS.md and ADR-1112).
feat/metal-standalone-batch (2026-06-14)¶
Rebase impact: low. Adds 12 fork-only files (4 kernels × {.metal,_metal.mm,test}) for integer_ciede / integer_psnr_hvs / integer_cambi / ssimulacra2; none has an upstream twin. Additive registration (4 externs + 4 list entries in feature_extractor.c inside #if HAVE_METAL; 4 .mm sources + 4 custom_targets + 4 metal_air_files in core/src/metal/meson.build; a foreach test block in core/test/meson.build). Edits docs/metrics/features.md (+Metal on the 4 rows), docs/state.md, and the changelog. cambi is a Strategy-II hybrid (GPU kernels + exact-CPU host residual via cambi_internal.h), matching ADR-0205. Metal-only; no public C-API/CLI change → no ffmpeg-patch impact.
feat/metric-delta-e-itp (2026-06-14)¶
Rebase impact: low. Fork-only additive metric — no upstream twin. Adds three new files (core/src/feature/delta_e_itp.c, core/src/feature/delta_e_itp_math.h, core/test/test_delta_e_itp.c). Additive registration only: one extern + one feature_extractor_list[] entry in core/src/feature/feature_extractor.cpp (the live C++23 file — NOT the stale feature_extractor.c twin, which is dead per ADR-0846 and is a separate cleanup), one source line in core/src/meson.build (next to ciede.c), one executable() + one test() block in core/test/meson.build (next to test_ciede), and one nav entry in mkdocs.yml. No public C-API / ABI / CLI flag / meson_options.txt / public-header change → no ffmpeg-patch impact (CLAUDE §12 r14 N/A). The metric mirrors the CPU ciede.c structure (chroma-upsample helpers, 8/16-bit reads, double-precision frame sum); if the upstream ciede.c chroma-upsampling helpers are ever refactored, the copied-verbatim scale_chroma_planes / scale_chroma_planes_hbd in delta_e_itp.c are independent and need no follow-up. Compiled unconditionally (CPU); no backend flag.
feat/metric-pu21 (2026-06-14)¶
Rebase impact: low (fork-only additive). Adds five fork-only files (core/src/feature/pu21.c, pu21_math.h, pu21_ssim.c, pu21_ssim.h, core/test/test_pu21.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.cpp (the active C++ registry — NOT the dead feature_extractor.c, which the build does not compile), pu21.c + pu21_ssim.c added to the unconditional libvmaf_feature_sources in core/src/meson.build (next to ciede.c), test block + test() row in core/test/meson.build. Edits docs/metrics/pu21.md (new), mkdocs.yml, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/. Reuses only the read-only iqa Gaussian-convolve helper (iqa/convolve.c) and the read-only Gaussian window table (iqa/ssim_tools.h); the golden float_ssim/iqa_ssim (L=255) is untouched — PU21 ships its own L=256 SSIM. No public C-API / ABI / CLI flag / meson_options.txt change → no ffmpeg-patch impact. If the iqa convolve layout (output packed at the reduced stride w-kw+1) ever changes, pu21_ssim.c's reduction stride must follow.
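The reduced-stride layout that pu21_ssim.c depends on can be pictured with a small valid-mode filter. This is a sketch under assumptions: conv_valid_sketch and sum_valid_sketch are hypothetical names, not the iqa API; the input is assumed tightly packed (stride w); and the kernel is applied without flipping, which makes no difference for a symmetric Gaussian window. What matters is that output rows are packed at ow = w - kw + 1, so any consumer must index with ow rather than w.

```c
#include <assert.h>
#include <stddef.h>

/* Valid-mode 2-D filter: output is (w-kw+1) x (h-kh+1), packed at stride ow. */
static void conv_valid_sketch(const float *img, size_t w, size_t h,
                              const float *k, size_t kw, size_t kh, float *out)
{
    assert(kw <= w && kh <= h);
    const size_t ow = w - kw + 1, oh = h - kh + 1;
    for (size_t y = 0; y < oh; y++) {
        for (size_t x = 0; x < ow; x++) {
            float acc = 0.0f;
            for (size_t v = 0; v < kh; v++)
                for (size_t u = 0; u < kw; u++)
                    acc += img[(y + v) * w + (x + u)] * k[v * kw + u];
            out[y * ow + x] = acc; /* packed: row stride is ow, not w */
        }
    }
}

/* A consumer (e.g. a per-pixel SSIM map reduction) must walk the same stride. */
static float sum_valid_sketch(const float *out, size_t w, size_t h,
                              size_t kw, size_t kh)
{
    const size_t ow = w - kw + 1, oh = h - kh + 1;
    float s = 0.0f;
    for (size_t y = 0; y < oh; y++)
        for (size_t x = 0; x < ow; x++)
            s += out[y * ow + x];
    return s;
}
```

With a 3x3 input and a 2x2 kernel the output is 2x2 at stride 2; a reduction that kept using stride 3 would read past the packed rows.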
feat/metal-integer-adm (2026-06-14)¶
Rebase impact: low. Adds three fork-only files (core/src/feature/metal/integer_adm.metal, integer_adm_metal.mm, core/test/test_metal_integer_adm_parity.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), .mm source + custom_target + metal_air_files entry in core/src/metal/meson.build, test block in core/test/meson.build. Edits docs/metrics/features.md (adm fixed-point GPU column += SYCL/HIP/Metal), docs/state.md, docs/rebase-notes.md, changelog.d/. Metal-only (-Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Mirrors the CPU integer_adm.c fixed-point DWT pipeline — if that algorithm changes, the Metal twin must follow.
fix/mcp-schema-bitdepth-vulkan (2026-06-14)¶
no rebase impact: edits the fork-only MCP servers (cmd/vmafx-mcp/tools.go, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py) + docs/mcp/tools.md + state.md + changelog. No libvmaf C-API/CLI change. The bitdepth enum + backend enum must stay in sync between the Python and Go MCP tool schemas (byte-compatible pair).
feat/metal-integer-vif (2026-06-14)¶
Rebase impact: low. Adds three fork-only files (core/src/feature/metal/integer_vif.metal, integer_vif_metal.mm, core/test/test_metal_integer_vif_parity.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), .mm source + custom_target + metal_air_files entry in core/src/metal/meson.build, test block in core/test/meson.build. Edits docs/metrics/vif.md, docs/state.md, docs/rebase-notes.md, changelog.d/. Metal-only (-Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Mirrors the CPU integer_vif.c fixed-point arithmetic + the float_vif_metal scaffold; if the CPU integer-VIF math changes, the Metal twin must follow.
feat/metal-float-adm (2026-06-14)¶
Rebase impact: low. Adds three fork-only files (core/src/feature/metal/float_adm.metal, float_adm_metal.mm, core/test/test_metal_float_adm_parity.c) — no upstream twin. Additive registration: extern + list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), .mm source + custom_target + metal_air_files entry in core/src/metal/meson.build, test block in core/test/meson.build. Edits docs/metrics/features.md (float_adm GPU column += Metal), docs/state.md, docs/rebase-notes.md, changelog.d/. Metal-only (-Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Core-VMAF kernel; mirrors the CUDA float_adm/ DWT+CSF+CM pipeline — if a future change alters that algorithm, the Metal twin must follow.
feat/metal-float-vif (2026-06-14)¶
Rebase impact: low. Adds three fork-only files (core/src/feature/metal/float_vif.metal, float_vif_metal.mm, core/test/test_metal_float_vif_parity.c) — no upstream twin. Additive registration: one extern + one list entry in core/src/feature/feature_extractor.c (#if HAVE_METAL), one .mm source + one custom_target + one metal_air_files entry in core/src/metal/meson.build, one test block in core/test/meson.build. Edits docs/metrics/vif.md, docs/state.md, docs/rebase-notes.md, changelog.d/ — keep both additive hunks on concurrent-branch conflict. Metal-only (compiles under -Denable_metal=enabled); no public C-API / ABI / CLI / meson_options.txt change → no ffmpeg-patch impact. Part of the Metal full-parity sweep (9 real kernels); float_vif is core-VMAF.
feat/metal-integer-ssim (2026-06-14)¶
Rebase impact: low. Adds three fork-only files (core/src/feature/metal/integer_ssim.metal, integer_ssim_metal.mm, core/test/test_metal_integer_ssim_parity.c) — no upstream twin. Registration edits are additive: one extern + one list entry in core/src/feature/feature_extractor.c (inside the #if HAVE_METAL block), one .mm source + one custom_target + one metal_air_files entry in core/src/metal/meson.build, one test block in core/test/meson.build. Edits docs/metrics/ssim.md, docs/state.md, docs/rebase-notes.md, changelog.d/ — keep both additive hunks if a concurrent branch also edits them. The Metal kernel only compiles under -Denable_metal=enabled (macOS); no public libvmaf C-API / ABI / CLI / meson_options.txt change, so no ffmpeg-patch (CLAUDE §12 r14) impact. Scope note: Metal full-parity is 9 real kernels (not 11) — integer_moment/integer_ms_ssim are not distinct extractors.
feat/gpu-motion3-v2-twins (2026-06-14)¶
Rebase impact: low. Touches three fork-added GPU wrappers (core/src/feature/{sycl,hip,metal}/integer_motion_v2_{sycl.cpp,hip.c,metal.mm}) — none have an upstream twin, so no upstream-sync conflict — plus their fork-only parity tests (core/test/test_{sycl,hip,metal}_motion_v2_parity.c). Each mirrors the merged CUDA flush_fex_cuda motion3_v2 post-process (fix/cuda-motion-v2-motion3-emission, ADR-1108) byte-for-byte and reuses the shared motion_blend_tools.h helper; no GPU kernel is modified. Edits docs/metrics/motion.md, docs/state.md, docs/rebase-notes.md, changelog.d/ — keep both additive hunks if a concurrent branch also edits them. No public libvmaf C-API, ABI, header, CLI, or meson_options.txt change, so no ffmpeg-patch (CLAUDE §12 r14) impact. If a future change alters the CPU integer_motion_v2.c::flush blend/clip/seed/moving-average logic, all four GPU twins (cuda/sycl/hip/metal) must be updated in the same PR to keep the places=4 parity gate green.
fix/cuda-motion-v2-motion3-emission (2026-06-13)¶
Rebase impact: low. Touches core/src/feature/cuda/integer_motion_v2_cuda.c (fork-added CUDA wrapper — no upstream twin, so no upstream-sync conflict), core/test/test_cuda_motion_v2_parity.c (fork-only test), and core/src/feature/cuda/AGENTS.md (fork doc). Adds docs/adr/1108-*.md, changelog.d/fixed/1108-*.md. Edits docs/metrics/motion.md, docs/adr/README.md, and docs/state.md — these can conflict with a concurrent branch that also edits the same doc; keep both additive hunks (the motion3_v2 rows/paragraph here plus whatever the other branch adds). The motion3_v2 emission reuses the existing motion_blend_tools.h host helper and the established vmaf_feature_collector_append_with_dict API — no public libvmaf C-API, ABI, header, CLI, or meson_options.txt surface change, so no ffmpeg-patch (CLAUDE §12 r14) impact. The CUDA kernel itself is unchanged; only the host-side flush + option table grew.
feat/vmafx-scorestream-phase2 (2026-06-13)¶
no rebase impact: all changes are in fork-local Go files that do not exist in upstream Netflix/vmaf — pkg/libvmaf/stream.go (+ test), pkg/libvmaf/libvmaf.go (adds the exported Scorer.ResolveModel wrapper), cmd/vmafx-server/grpc_server.go, cmd/vmafx-node/server/server.go, cmd/vmafx-node/main.go, and their tests, plus docs/ and changelog.d/. The cgo path links against the public libvmaf C ABI (vmaf_picture_alloc / vmaf_read_pictures / vmaf_score_at_index / vmaf_score_pooled / vmaf_feature_score_at_index) — all stable upstream entry points in core/include/libvmaf/libvmaf.h; no upstream-mirrored C source is modified, so no mechanical conflict is possible. If a future upstream sync renamed any of those public functions, pkg/libvmaf/{direct,stream}.go would need the same one-line follow per the existing cgo-coupling invariant.
fix/json-model-feature-name-leak (2026-06-13)¶
Rebase impact: low. Touches core/src/read_json_model.c (upstream-mirrored, libvmaf/src/read_json_model.c upstream) and its fork-only C++23 twin core/src/read_json_model.cpp (ADR-0761 / ADR-0846 Wave 8) — both gain one free(model->feature[index].name) line plus a comment inside append_feature_name, immediately before the strdup. Also adds one test function + one registration line to core/test/test_model.c. Upstream lacks the duplicate-key overwrite guard, so a future sync that rewrites append_feature_name in the .c file will conflict on that hunk only; keep the fork's free-before-strdup (it fixes a real leak the upstream code shares). The .cpp twin is fork-only and never receives upstream hunks. No public API, ABI, header, or CLI surface changes — no ffmpeg-patch impact.
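The free-before-strdup fix follows a common ownership pattern: release the string already stored in a slot before storing a fresh copy, so a duplicate JSON key does not orphan the first allocation. The sketch below uses a hypothetical FeatureSlotSketch in place of the model's feature array; it is not the actual append_feature_name code. _POSIX_C_SOURCE is defined because strdup is POSIX (it was only added to ISO C in C23).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one model feature slot; only the name matters here. */
typedef struct {
    char *name;
} FeatureSlotSketch;

/* Store `name` at `index`, freeing any previous copy first. A duplicate key
 * hits the same slot twice; without the free the first strdup'd string leaks. */
static int set_feature_name_sketch(FeatureSlotSketch *slots, size_t cnt,
                                   size_t index, const char *name)
{
    assert(index < cnt);
    free(slots[index].name);          /* free(NULL) is a no-op on first use */
    slots[index].name = strdup(name); /* on failure the slot is NULL, not dangling */
    return slots[index].name ? 0 : -1;
}
```

Writing the strdup result straight back into the slot means an allocation failure leaves a NULL name rather than a pointer to freed memory.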
fix/golden-cpu-regression-restore (2026-06-13)¶
Rebase impact: low. Touches core/src/feature/vif_tools.c (removes #if HAVE_AVX512 dispatch blocks from the three float VIF functions). This file also exists in upstream Netflix/vmaf. Future upstream syncs that modify vif_tools.c will see a clean merge on any hunk that does not overlap with the three removed dispatch blocks. If upstream ever adds AVX-512 float VIF dispatch, the upstream version must be audited for Netflix golden parity before enabling it on this fork.
docs/rc-deferred-closeout (2026-06-13)¶
no rebase impact: changes confined to docs/state.md (move T-DOC-LEGACY-RUNNER from Open to Recently Closed) and docs/metrics/cambi.md (remove stale Vulkan section, doc-only). Conflicts with a concurrent branch that also edits state.md: keep both state-update rows; the row order within Recently Closed does not matter.
test/float-extractor-cpu-coverage — Float extractor CPU-path unit tests (2026-06-13)¶
no rebase impact: test-only addition. New files: core/test/test_float_{psnr,moment,ssim,ms_ssim,vif,adm,motion}_coverage.c. Edit to core/test/meson.build adds 7 new executable targets after line 1705 (test_float_vif_min_dim). No conflict risk unless a concurrent branch also adds tests immediately after that same line; in that case, append both blocks in source order.
fix/hip-meson-speed-tus-dedup — Remove duplicate HIP speed TU entries (2026-06-12)¶
no rebase impact: change confined to core/src/hip/meson.build (build config only). Removes the duplicate speed_chroma_hip.c / speed_temporal_hip.c wiring block that ADR-0852 introduced when ADR-0964 had already included those TUs earlier in the same hip_sources list. If a concurrent branch also edits core/src/hip/meson.build, keep both sets of changes; the resolved file must contain each speed TU exactly once.
fix/docker-ffmpeg-tag-n811-pin — Docker FFMPEG_TAG n8.1 → n8.1.1 + patch 0016 context (2026-06-12)¶
no rebase impact: Dockerfile and Dockerfile.ffmpeg pin changes are build-config only; patch 0016 context line fix is an ffmpeg-patches-internal correction with no effect on C API, ABI, or libvmaf source. Files touched: Dockerfile, Dockerfile.ffmpeg, .pre-commit-config.yaml, ffmpeg-patches/0016-libvmaf-wire-score-fmt-on-all-vmaf-filters.patch.
chore/codeql-cpp-cleanup-bundle — CodeQL C++ note-level cleanup (2026-06-12)¶
no rebase impact: all changes are local variable renames, dead-code removals, and comment additions. Files touched: adm_avx2.c, adm_avx512.c, integer_adm.c, feature_collector.c, feature_collector.cpp, feature_name.c, libvmaf.c, mkdirp.c, speed.c, vif_tools.c, pdjson.c, predict.c, svm.cpp, test_score_pooled_eagain.c, test_tensor_io.c, test_cambi.c, test_integer_adm_simd.c, test_svm_api.c. No API or ABI changes; no semantic changes to score computation. If a concurrent branch modifies any of these files, resolve by keeping both sets of changes — variable renames are local and non-conflicting.
chore/bundle-fable-5-findings — 4 Fable deep-hunt fixes (2026-06-12)¶
- core/src/feature/x86/integer_ssim_avx2.c: reorders w*(s*s) to (w*s)*s for the 16-bit accumulation; only the integer_ssim AVX2 16-bit path is affected. No conflict risk on other branches unless they also modify the integer_ssim_avx2.c accumulation order.
- core/src/libvmaf.c: three independent hunks (bpc &&→|| in validate_pic_params; Phase 2 CUDA PREV_REF now uses vmaf_picture_ref instead of a bare copy; dist translate error propagation in read_pictures_cuda_translate). If a concurrent branch edits libvmaf.c in those functions, keep all three fixes.
- core/test/test_validate_pic_params_bpc.c and core/test/meson.build: new test file and its meson registration. No conflict risk unless another branch adds a test with the same name.
- cmd/vmafx-server/concurrency.go, concurrency_test.go, grpc_server.go, http_server.go, main.go: ScoreLimiter addition. If a concurrent branch also modifies main.go flag parsing or the grpc_server.go/http_server.go handler signatures, preserve the WithLimiter constructors and the --max-concurrent-scores flag.
fix/master-855-tip-3-reds — bootstrap-test recal + Dockerfile ldconfig (2026-06-08, no ADR)¶
no rebase impact: python/test/local_explainer_test.py line 276 expected value and places argument changed (fork-local test, not Netflix golden data); Dockerfile gains a single RUN ldconfig line after make install. If a concurrent branch also edits python/test/local_explainer_test.py lines 271-277, resolve by keeping places=3 and the # ADR-0418 macOS-libm Δ relax comments. If a concurrent branch edits Dockerfile around the libvmaf build block, ensure RUN ldconfig is present immediately after the make install line.
fix/containerfile-gid-and-stale-rename — GID/UID 1000 → 2000 (2026-06-08, ADR-1101)¶
no rebase impact: changes confined to dev/Containerfile (GID/UID values), docs/adr/1101-containerfile-gid-uid-2000.md (new ADR), and changelog.d/fixed/1101-containerfile-gid-uid-2000.md (new fragment). No production C source, public header, meson build files, or Python package modified. If a concurrent branch also edits dev/Containerfile, the only conflict will be in the groupadd/useradd lines; resolve by keeping GID/UID 2000.
fix/matrix-5-real-bugs (2026-06-08, no ADR — 5 correctness bug fixes)¶
- core/src/feature/hip/integer_vif/vif_statistics.hip: removed #define AMD_WAVEFRONT_SIZE 64; the reduction loop and lane guards now use the warpSize device variable. Conflicts are possible if another branch edits the same wavefront-reduce section; keep the warpSize-based version.
- core/src/feature/hip/float_vif/float_vif_score.hip, float_motion/float_motion_score.hip, float_psnr/float_psnr_score.hip, float_moment/moment_score.hip: same pattern; shared-memory arrays are sized for the minimum warp size (32) and loops use the runtime warpSize. Conflict risk is low (only these wavefront-size definitions changed); keep the warpSize-based version.
- core/src/libvmaf.c: ref = &ref_host; dist = &dist_host is now guarded by if (hw_flags & HW_FLAG_HOST). Conflicts are possible if another branch modifies the same #ifdef HAVE_CUDA block; keep the HW_FLAG_HOST guard.
- mcp-server/vmaf-mcp/src/vmaf_mcp/server.py: _PROBE_YUV_WIDTH/HEIGHT bumped from 32 to 64; runtime_healthy is now set to score is not None. Low conflict risk.
- ffmpeg-patches/0005-libvmaf-add-libvmaf-sycl-filter.patch: FILTER_SINGLE_PIXFMT replaced by FILTER_PIXFMTS; do_vmaf_sycl and config_props_sycl split on AV_PIX_FMT_QSV. Conflicts are possible if another branch edits patch 0005; apply this version first, then rebase the other.
- dev/Containerfile: RUN bash .../fetch-test-yuvs.sh layer added. Low conflict risk.
test/ai-scripts-coverage-round3 (2026-06-06, no ADR — test-only)¶
no rebase impact: adds two new test files (ai/tests/test_calibrate_phase_f_recipes_unit.py and ai/tests/test_analyze_knob_sweep_unit.py) and one changelog fragment. No existing C source, public API, upstream-mirrored Python, or golden assertion is modified.
docs/r12-c-api-doc-completeness (2026-06-06, no ADR — doc-only)¶
no rebase impact: comment-only changes to core/include/libvmaf/libvmaf_cuda.h, core/include/libvmaf/libvmaf_sycl.h, core/include/libvmaf/dnn.h, core/include/libvmaf/picture_v2.h, core/include/libvmaf/libvmaf.h, and core/include/libvmaf/model.h. No C sources, build files, or public API signatures touched — Doxygen comment additions only.
docs/doxygen-private-headers-r4 (2026-06-07)¶
no rebase impact: purely additive Doxygen comment blocks inserted into 10 internal headers under core/src/. No include paths, struct layouts, or function signatures are changed. Conflicts only if another branch inserts text at the same line positions in these headers.
fix/pic-pool-odr-cuda-gpumask-cov-floor (2026-06-08)¶
core/src/meson.build: adds cpp_args to picture_pool_cpp23_lib. Conflicts possible if another branch modifies the same static_library() block; resolve by keeping both the cpp_args line and the other change. core/tools/test/test_vmaf_cuda_gpumask.sh and core/tools/test/meson.build: shell guard + timeout added; low conflict risk. scripts/ci/coverage-check.sh and .github/workflows/tests-and-quality-gates.yml: per-file floor and pytest timeout changed; low conflict risk (numeric/string values only).
fix/cuda-done-path-double-unref-ort-coverage (2026-06-07)¶
no rebase impact: changes confined to core/src/libvmaf.c (split read_pictures_cuda_cleanup into full and _device_only variants inside the existing #ifdef HAVE_CUDA block — non-CUDA builds are unchanged; the call site at the done=true branch is guarded by the same #ifdef HAVE_CUDA) and core/src/dnn/ort_backend.c (collapse a dead else branch into a single-line ternary in ort_log_and_release_status — no behaviour change on any exercised code path, coverage-only impact).
fix/ci-multi-platform-bundle-838 (2026-06-07)¶
no rebase impact: changes confined to core/tools/cli_parse.cpp (const-qualifier on local strsep parameter — isolated #ifndef HAVE_STRSEP compat block), core/src/opt.cpp (replace static_cast<int> with memcpy in a single switch statement — no surrounding context dependency), core/src/feature/feature_extractor.cpp (add extern "C" wrappers around existing extern declarations — purely syntactic, no semantic change), core/src/libvmaf.c (add #ifdef HAVE_CUDA cleanup call in the done=true branch of vmaf_read_pictures — guarded by HAVE_CUDA; non-CUDA builds are unchanged), and python/test/vmafexec_feature_extractor_test.py (lower places=6 to places=4 on 5 per-frame assertions).
fix/go-rust-ci-red-bundle (2026-06-07)¶
no rebase impact: changes confined to .github/workflows/go-ci.yml (env var addition to go test step), cmd/vmafx-operator/internal/controller/vmafxnode_controller_test.go (timestamp truncation), cmd/vmafx-mcp/impl_direct.go (restore ValidatePath calls), and bindings/rust/vmafx-sys/Cargo.toml (add [lib] doctest = false). No C source, public header, or upstream-mirrored code modified.
fix/build-matrix-macos-windows-fixes (2026-06-07)¶
Rebase-sensitive (meson.build): core/src/meson.build gains dependencies : [pthread_dependency] on both picture_pool_cpp23_lib and gpu_picture_pool_cpp23_lib static library targets (~lines 1768–1788). If a concurrent branch adds other fields to those static_library() calls, merge both sets of fields.
Other changes are not rebase-sensitive:
- core/src/feature/arm64/motion_v2_neon.c: rewrite of neon_any_nonzero_s32 (isolated function, no surrounding context).
- compat/python-vmaf/__init__.py: two call sites of --cpumask changed from "-1" to "4294967295".
- .github/workflows/libvmaf-build-matrix.yml: two Vulkan matrix rows removed; if a concurrent branch also removes the Vulkan step bodies (Install Vulkan SDK, Cache meson subprojects (Vulkan wraps), Run Vulkan smoke tests (macOS MoltenVK), etc.), take both removals.
- python/test/python_harness_coverage_test.py: test expectation update (--cpumask -1 → --cpumask 4294967295; the test_run_preserves_user_env expected dict gains LC_ALL/LANG).
fix/nightly-bisect-tracker-issue (2026-06-07)¶
no rebase impact: changes confined to .github/workflows/nightly-bisect.yml, scripts/ci/post-bisect-comment.py, docs/state.md, and changelog.d/fixed/nightly-bisect-tracker-issue.md. No C source, public header, Go source, or test logic modified.
fix/feature-extractor-flags-zero-skip-gpu (2026-06-07)¶
no rebase impact: the change is confined to a single function body in core/src/feature/feature_extractor.c (lines 443–473). No header changes, no meson.build changes, no new files except the ADR and changelog fragment. The only other file touched is core/test/test_picture.c (missing <string.h> include added). Neither file is a high-contention rebase target.
fix/sycl-fsycl-link-propagation (2026-06-07)¶
Rebase-sensitive: modifies core/src/meson.build and core/test/meson.build — two high-contention build files that accumulate edits from most GPU-backend PRs.
In core/src/meson.build:
- The sycl_dependency declare_dependency block gains link_args: ['-fsycl'].
- The vmaf_link_args += ['-fsycl'] line and its surrounding comment block are replaced with a shorter comment referencing sycl_dependency. If a concurrent branch adds entries to vmaf_link_args, take that branch's additions and keep the updated comment.
In core/test/meson.build:
- The test('test_sycl_motion_add_uv_parity', ...) call loses should_fail: true and its accompanying ADR-1093 comment block. If a concurrent branch adds new SYCL test executables nearby, no conflict is expected; should_fail on other tests is unaffected.
In core/test/test_sycl_motion_add_uv_parity.c:
- Feature-name queries updated (integer_motion2_mau, float_motion2_mau). Conflicts only if another branch edits the same query lines.
fix/mcp-resource-uri-validation (2026-06-07)¶
no rebase impact: single-function change in cmd/vmafx-mcp/impl_direct.go (resolveModelArgToPath) and one new test in cmd/vmafx-mcp/impl_direct_test.go. Only the Go cmd/vmafx-mcp package is touched; no C sources, no public headers, no test fixtures, no build files. Conflicts only if another branch edits resolveModelArgToPath or adds tests to impl_direct_test.go.
fix/cross-platform-path-list-separator (2026-06-06)¶
no rebase impact: single-line change in pkg/libvmaf/paths.go replacing strings.Split(extra, ":") with filepath.SplitList(extra). Only the Go pkg/libvmaf package is touched; no C sources, no public headers, no test fixtures, no build files. Conflicts only if another branch edits the same AllowedRoots function in that file.
fix/neon-motion-zero-skip (2026-06-06)¶
no rebase impact: single-file change to core/src/feature/arm64/motion_v2_neon.c. Replaces the neon_hadd_s32 (signed horizontal sum) early-exit check with neon_any_nonzero_s32 (bitwise OR-fold) in both motion_score_pipeline_8_neon and motion_score_pipeline_16_neon. No public API, no header, no test data, no upstream-mirrored file is modified. Conflicts only if another branch edits the same static helper region of that file.
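Why the OR-fold is the safer zero test is easiest to see in scalar form: a signed sum of lanes can cancel to zero while individual lanes are nonzero (e.g. +5 and -5), so a sum-based "all zero" early exit can skip work it should do. The sketch below emulates four int32 lanes in plain C. The function names are hypothetical (the real helpers operate on NEON int32x4_t registers), and the 64-bit sum is a simplification to avoid overflow; a real 32-bit lane sum can additionally wrap to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scalar stand-in for a signed horizontal-sum check: "nonzero" iff the sum is. */
static bool lanes_hadd_is_nonzero(const int32_t v[4])
{
    int64_t s = (int64_t)v[0] + v[1] + v[2] + v[3]; /* opposite signs can cancel */
    return s != 0;
}

/* Scalar stand-in for a bitwise OR-fold: zero only if every lane is zero. */
static bool lanes_any_nonzero(const int32_t v[4])
{
    uint32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc |= (uint32_t)v[i];
    return acc != 0;
}
```

For lanes {5, -5, 0, 0} the sum-based check reports "all zero" while the OR-fold correctly reports a nonzero lane; for an all-zero vector both agree.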
fix/helm-values-completeness-adr-1074 (ADR-1074, 2026-06-06)¶
no rebase impact: changes are confined to deploy/helm/vmafx/values.yaml, deploy/helm/vmafx/values.schema.json, and three templates (templates/statefulset.yaml, templates/node.yaml, templates/networkpolicy.yaml). No C source, public header, upstream-mirrored file, Python test, or golden-data assertion is touched. Conflict risk exists only if another branch edits those same Helm files concurrently.
test/coverage-pkg-observability (2026-06-06)¶
no rebase impact: changes are confined to pkg/observability/coverage_gaps_test.go (new test file), pkg/observability/AGENTS.md (invariant notes), and changelog.d/added/observability-coverage-gaps.md (fragment). No production source, public header, or build file is modified. Conflicts only if another branch edits the same lines in AGENTS.md or rebase-notes.md.
fix/sanitizer-deselect-tests-and-quality-gates (2026-06-06)¶
no rebase impact: CI-only change to .github/workflows/tests-and-quality-gates.yml adding test_gpu_picture_pool_uaf, test_integer_motion_v2_coverage, and test_pic_preallocation to the ADR-0347 per-sanitizer EXCLUDE patterns for address, undefined, and thread. No source, header, test, or build file is modified. Conflicts only if another branch edits the same case block in that workflow file.
fix/mcp-score-at-index-eagain-guard (ADR-1073, 2026-06-06)¶
no rebase impact: changes are confined to core/src/libvmaf.c (vmaf_score_at_index guard condition), core/src/mcp/compute_vmaf.c (n_threads restored to 1u, debug code removed), and core/test/test_mcp_smoke.c (fixture dimensions 64→192, debug print removed). No public API surface, no upstream-mirrored file is modified. The guard change is a one-line fix that does not affect the call signature or semantics observable to callers that never encounter multi-frame pools.
fix/skip-motion-five-frame-window-adr-0337 (ADR-0337, 2026-06-06)¶
no rebase impact: only python/test/feature_extractor_test.py is modified — 9 test methods gain @unittest.skip decorators. No C source, public header, upstream-mirrored file, or golden-data assertion is touched. Rebase against Netflix/vmaf master or any feature branch has zero conflict risk.
fix/prev-ref-batch-refcount-and-motion-score (ADR-1072, 2026-06-06)¶
Files touched: core/src/libvmaf.c (two sites in threaded_extract_batch_func and one in threaded_extract_func), core/test/test_hip_ms_ssim_parity.c (FIXTURE_H 144→192), core/test/test_cuda_float_ms_ssim_parity.c (FIXTURE_H 144→192), core/test/test_hip_motion_parity.c (add debug=1 opts, add feature.h include), docs/adr/1072-prev-ref-batch-refcount-leak.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/1072-prev-ref-batch-refcount-leak.md.
Rebase impact: The libvmaf.c hunks add vmaf_picture_unref + memset + memset(f->prev_ref) inside the VMAF_FEATURE_EXTRACTOR_PREV_REF block in threaded_extract_batch_func. If a concurrent branch modifies the same block or the unref: label region, resolve by keeping both the concurrent change and the new unref-before-memset + zero-f->prev_ref logic from this branch. The test fixture changes (144→192) and the debug-flag addition are self-contained with no shared invariants. No public-API, ABI, or upstream-mirrored file changes.
fix/test-failures-macos-dnn (2026-06-06, no ADR — bug fixes)¶
Files touched: core/src/gpu_picture_pool.{c,cpp}, core/src/libvmaf.c, core/src/feature/integer_motion.c, core/src/feature/feature_extractor.cpp, core/test/test_framesync.c, core/test/test_integer_motion_coverage.c, changelog.d/fixed/macos-dnn-test-failures-6-fixes.md, docs/rebase-notes.md.
Rebase impact: All changes are internal bug fixes with no public-API or ABI changes. If a concurrent branch modifies vmaf_score_at_index (libvmaf.c), vmaf_gpu_picture_pool_init (gpu_picture_pool.{c,cpp}), or integer_motion.c init(), resolve conflicts by keeping both the concurrent change and the err != -EAGAIN / *pool = NULL / w < 3 || h < 3 guards from this branch. The test fixes in test_framesync.c and test_integer_motion_coverage.c are self-contained; no invariants span other branches.
no rebase impact on public API, build flags, or upstream-mirrored files.
docs/doxygen-public-header-drift (2026-06-06, no ADR — doc-only fix)¶
no rebase impact: comment-only changes to core/include/libvmaf/libvmaf_cuda.h, core/include/libvmaf/libvmaf_vulkan.h, and core/include/libvmaf/libvmaf_sycl.h. No C sources, build files, or public API signatures touched.
chore/ci-workflow-audit-sha-pin-dead-jobs (2026-06-06, no ADR — workflow hygiene)¶
no rebase impact: changes are entirely in .github/workflows/ (SHA pin, dead-job removal, comment correction). No C sources, public API, build flags, or upstream-mirrored files are touched.
test/go-vmafx-mcp-handler-coverage (2026-06-06, no ADR — test-only)¶
Files touched: cmd/vmafx-mcp/impl_handlers_test.go (new), cmd/vmafx-mcp/AGENTS.md, changelog.d/added/go-vmafx-mcp-handler-coverage.md, docs/rebase-notes.md.
Rebase impact: test-only addition; no production code changed. If a concurrent branch adds a new tool handler to impl.go, add a corresponding error-path test to impl_handlers_test.go following the established pattern (t.Setenv("VMAF_BIN", "/nonexistent/...") for binary-dependent handlers).
fix/r10-cpp23-wave-error-paths (2026-06-06, ADR-1060)¶
Files touched: core/src/feature/feature_extractor.cpp, core/src/read_json_model.cpp
Rebase impact: none. All changes are internal to existing functions, with no public-API or header changes. Branches that also touch feature_extractor.cpp should verify that the free_fex_list label and the context-create parse-options error path merge cleanly.
fix/helm-chart-security-hardening (2026-06-06, ADR-1058)¶
Files touched: deploy/helm/vmafx/templates/pdb.yaml (new), deploy/helm/vmafx/templates/operator-rbac.yaml, deploy/helm/vmafx/templates/networkpolicy.yaml, deploy/helm/vmafx/values.yaml, deploy/helm/vmafx/values.schema.json
Rebase impact: The operator RBAC resource names changed: *-operator-role (ClusterRole) is replaced by *-operator-crds (ClusterRole) + *-operator-ns (Role). Any branch that patches operator-rbac.yaml will conflict on the resource name. Run helm upgrade (not in-place patch) when applying to existing operator installs. The networkPolicy.allow schema is now additionalProperties: false; any branch that adds a new allow.* key must also enumerate it in values.schema.json.
fix/rust-clippy-library-strictness (2026-06-06, ADR-1063)¶
Files touched: bindings/rust/vmafx-sys/src/lib.rs, bindings/rust/vmafx-sys/src/safe.rs, bindings/rust/vmafx/src/lib.rs, bindings/rust/vmafx/src/picture.rs, bindings/rust/vmafx/src/error.rs, core/src/feature/rust/tad/src/lib.rs
Rebase impact: vmafx-sys/src/lib.rs no longer uses crate-level #![allow(clippy::all)]; the generated bindings are now in a private mod bindings with the allow scoped to that module. Any branch that adds new hand-written code to vmafx-sys/src/lib.rs or safe.rs must write clippy-clean code. The VmafContext::default() call is gone — branches that depend on it must use VmafContext::new() instead. The #![deny(unsafe_op_in_unsafe_fn)] in safe.rs and tad/src/lib.rs will cause a compile error on any in-flight branch that adds a bare unsafe operation inside an unsafe fn without an explicit unsafe {} block.
fix/msvc-cpp-std-vc-latest-1056 (2026-06-06, ADR-1056)¶
Files touched: core/meson.build, core/AGENTS.md
Rebase impact: core/meson.build no longer carries cpp_std=c++23 in default_options. Any branch that adds cpp_std=... to default_options will conflict with this change. The add_project_arguments('-std=c++23') block must remain beneath the cxx = meson.get_compiler('cpp') line and above the first cc.check_header call. The get_option('cpp_std') == 'none' guard must be preserved; removing it would cause the SYCL leg to receive both -Dcpp_std=c++14 (from the workflow) and -std=c++23 (from the else branch), which is a compile error.
fix/ci-pin-cuda-132-jimver (2026-06-06, no ADR — CI configuration pin fix)¶
no rebase impact: CI-only change (.github/workflows/build.yml, .github/workflows/libvmaf-build-matrix.yml). No C sources, public API, or upstream-mirrored files are touched.
fix/macos-docker-platform-unblock (2026-06-04, no ADR — build bug fix)¶
no rebase impact: adds <string_view> include to core/tools/vmaf.cpp (no logic change) and replaces VmafCudaFunctions with CudaFunctions in 13 CUDA close callbacks (correct type name, no ABI/API change). Neither modification touches upstream-mirrored code paths or public API signatures.
revert/float-adm-simd-dispatch-neon-fma (2026-06-06, ADR-1057)¶
no rebase impact: removes adm_prime_simd_dispatch() from adm_tools.h and adm_tools.c; removes the call site added to float_adm.c::init() by PR #685; deletes core/test/test_float_adm_simd.c and its meson.build entries. Any in-flight branch that rebases onto a version of adm_tools.h that still contains adm_prime_simd_dispatch() will see a merge conflict at the declaration — resolve by simply not including the declaration (the function no longer exists after this revert). The SIMD kernel files (adm_tools_avx2.c, adm_tools_neon.c, etc.) are untouched; the functions remain compiled and linkable for a future re-dispatch PR.
fix/core-test-regressions-pr-train (2026-06-04, no ADR — bug fixes)¶
Files touched: core/src/gpu_picture_pool.cpp, core/src/feature/feature_extractor.cpp, core/src/feature/integer_motion.c, core/src/predict.c, core/test/test_framesync.c, core/test/test_integer_motion_coverage.c, core/test/test_score_pooled_eagain.c
Rebase impact: Any concurrent branch that also edits feature_extractor_list[] must preserve the &vmaf_fex_integer_motion_v2 entry. Any branch that adds a new motion extractor with VMAF_FEATURE_EXTRACTOR_PREV_REF flag benefits from the context_extract prev_ref management added here. Branches modifying predict_load_feature_score must not regress the -EAGAIN vs -EINVAL distinction for unwritten feature vectors (Netflix#755 / ADR-0154).
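A minimal sketch of the error-code distinction, assuming -EAGAIN means "the feature is known but the score at this index has not been written yet" and -EINVAL means bad arguments or an unknown feature. FeatureVector and load_feature_score() are hypothetical; predict_load_feature_score() has its own signature and lookup logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-feature score vector. */
typedef struct {
    unsigned      capacity;  /* number of slots allocated so far */
    const bool   *written;   /* written[i] is true once score i is stored */
    const double *score;
} FeatureVector;

/* fv == NULL models "no such feature was ever registered". */
static int load_feature_score(const FeatureVector *fv, unsigned index,
                              double *out)
{
    if (!out || !fv)
        return -EINVAL;          /* caller error or unknown feature */
    if (index >= fv->capacity || !fv->written[index])
        return -EAGAIN;          /* known feature, score not written yet */
    *out = fv->score[index];
    return 0;
}
```

In this sketch a caller can treat -EAGAIN as "retry once more frames are flushed" and any other negative value as a hard error; collapsing both into -EINVAL is the kind of regression the note warns against.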
fix/legacy-runner-import-stub-adr0749 (2026-06-04, no ADR — bug fix)¶
Files touched: compat/python-vmaf/core/quality_runner.py, docs/state.md, changelog.d/fixed/legacy-runner-import-stub-adr0749.md
Rebase impact: The VmafLegacyQualityRunner stub is fork-local and does not conflict with upstream Netflix/vmaf (which never had this class). No upstream sync will touch compat/python-vmaf/core/quality_runner.py in a way that removes the stub; if upstream adds a class with the same name, the stub must be removed rather than overwritten.
fix/arm-motion-v2-re-register-and-test-order¶
Files touched: core/src/meson.build, core/src/feature/feature_extractor.c, core/test/test_integer_motion_coverage.c
Rebase impact: no rebase impact from other branches expected. If a concurrent PR touches feature_extractor_list[] or meson.build's CPU source list, preserve integer_motion_v2.c registration and the &vmaf_fex_integer_motion_v2 list entry — removing them breaks all "motion_v2" lookups on CPU-only builds.
ci/dev-container-gate-adr0819 (2026-06-04, ADR-0819)¶
no rebase impact: adds .github/workflows/dev-container-build.yml and docs/adr/0819-dev-container-ci-gate.md. No C source, public C API, upstream-mirrored Python, Netflix golden-assertion file, or ffmpeg-patches file is touched.
docs/mkdocs-strict-nav-conformance¶
no rebase impact: changes are isolated to mkdocs.yml nav entries and a changelog fragment. No C source, public C API, upstream-mirrored Python, Netflix golden-assertion file, or ffmpeg-patches file is touched.
ci/promote-gpu-coverage-gate-required¶
no rebase impact: changes are isolated to the CI workflow file and docs. No C source, public C API, upstream-mirrored Python, Netflix golden-assertion file, or ffmpeg-patches file is touched.
fix/containerfile-user-hardening-adr1042¶
no rebase impact: container hardening changes only (USER directive and ARG/ENV scoping). No public API or upstream-mirrored C code touched.
fix/r9-helm-vmaftune-grpc-bugs (2026-06-04)
no rebase impact: changes are confined to deploy/helm/vmafx/ (Helm chart config only), tools/vmaf-tune/src/vmaftune/cli.py (Python), and cmd/vmafx-node/online_feedback.go (fork-local Go binary). None of these files has an upstream Netflix/vmaf counterpart.
fix/r6-sycl-kernel-correctness (2026-06-04)¶
Files touched: core/src/feature/sycl/integer_vif_sycl.cpp, core/src/feature/sycl/integer_motion_sycl.cpp, core/src/feature/sycl/integer_adm_sycl.cpp
Rebase impact: no rebase impact — all files are fork-local SYCL paths with no upstream counterparts.
fix/r6-cuda-hip-kernel-correctness (2026-06-04)¶
Files touched: core/src/feature/cuda/integer_vif/filter1d.cu, core/src/feature/cuda/integer_adm/adm_cm.cu, core/src/feature/hip/integer_adm/adm_decouple.hip, core/src/feature/hip/integer_vif/vif_statistics.hip
Rebase impact: no rebase impact — all four files are fork-local GPU paths with no upstream counterparts. The CUDA files are in feature/cuda/ which Netflix upstream does not ship; the HIP files are fully fork-added.
fix/r6-metric-scoring-guards (2026-06-04)¶
Files touched: core/src/feature/integer_psnr.c, core/src/feature/x86/psnr_avx2.c, core/src/feature/x86/psnr_avx512.c, core/src/feature/arm64/psnr_neon.c, core/src/feature/adm.c, core/src/feature/integer_adm.c, core/src/feature/float_adm.c
Rebase impact: no rebase impact — all fixes are in error-path / edge-case branches that upstream has not touched since the fork. The APSNR cap formula change (* 2 removed) only affects scores on nearly-perfect sequences; it is not a Netflix golden-data assertion value.
fix/r7-ci-wf-concurrency-timeout (2026-06-04, ADR-1035)¶
Files touched: .github/workflows/nightly.yml, .github/workflows/nightly-bisect.yml, .github/workflows/supply-chain.yml, .github/workflows/release-please.yml, .github/workflows/scorecard.yml, .github/workflows/rust-ci.yml, .github/workflows/go-ci.yml, .github/workflows/e2e-k8s.yml
No rebase impact: pure CI configuration changes with no code-path dependencies. Upstream Netflix/vmaf does not carry these workflows.
fix/r7-docs-broken-links-mkdocs-nav (2026-06-04)¶
Files touched: docs/development/build-flags.md, docs/metrics/features.md, mkdocs.yml
No rebase impact: documentation-only changes. No C library, public header, or Netflix golden-assertion file is touched.
fix/r7-mcp-precision-subsample-drift (2026-06-04, ADR-1038)¶
Files touched: cmd/vmafx-mcp/impl.go, cmd/vmafx-mcp/tools.go, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py
No rebase impact: pure default-value changes. No C library, public header, upstream Python harness, or Netflix golden-assertion file is touched.
fix/r7-vendored-svm-realloc-oom (2026-06-04, ADR-1039)¶
Files touched: core/src/svm.cpp
no rebase impact: three internal realloc safety patches. No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. If an upstream Netflix/vmaf commit also fixes these same three sites, take the upstream version (which is also a MEM04-C fix) and drop this patch at rebase time.
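The defensive realloc idiom can be sketched in C as follows. This is a generic illustration, not the three svm.cpp sites: grow_array() is a hypothetical helper. It assigns the result to a temporary so the original block is not lost on failure, refuses zero-length requests (the MEM04-C concern), and checks the size multiplication for overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Resize *buf to n elements of elem_size bytes. On any failure *buf is left
 * untouched (still valid and still owned by the caller) and -1 is returned. */
static int grow_array(void **buf, size_t n, size_t elem_size)
{
    if (n == 0 || elem_size == 0)
        return -1;                       /* no zero-length realloc */
    if (n > SIZE_MAX / elem_size)
        return -1;                       /* n * elem_size would overflow */

    void *tmp = realloc(*buf, n * elem_size);
    if (!tmp)
        return -1;                       /* original block not leaked */
    *buf = tmp;
    return 0;
}
```

The naive form `p = realloc(p, n)` overwrites the only pointer to the old block with NULL on failure, which is the leak this idiom avoids.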
fix/r7-licensing-spdx-svm-copyright (2026-06-04)¶
Files touched: Cargo.toml, bindings/rust/vmafx/Cargo.toml, ai/pyproject.toml, mcp-server/vmaf-mcp/pyproject.toml, dev-llm/pyproject.toml, python/pyproject.toml, tools/ensemble-training-kit/pyproject.toml, tools/vmaf-roi-score/pyproject.toml, tools/vmaf-tune/pyproject.toml, core/src/svm.cpp
No rebase impact: license field corrections and copyright header additions have no effect on build or test outputs. Upstream Netflix/vmaf does not carry Cargo.toml or any of these pyproject.toml files.
fix/sycl-speed-incomplete-type-access (2026-06-04)¶
Files touched: core/src/feature/sycl/speed_chroma_sycl.cpp, core/src/feature/sycl/speed_temporal_sycl.cpp
no rebase impact: internal build-fix replacing direct struct member dereferences with the existing public API call vmaf_sycl_get_queue_ptr(). No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. If an upstream commit adds a SYCL speed extractor, ensure it also uses vmaf_sycl_get_queue_ptr() rather than direct struct access.
fix/cli-narrowing-casts-vmaf-cpp (2026-06-04)¶
Files touched: core/tools/vmaf.cpp
no rebase impact: three static_cast<unsigned>(...) wrappers added to the VmafPictureConfiguration initializer at line ~1360. No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. If an upstream commit modifies the VmafPictureConfiguration initializer or adds new pic_params fields, verify the cast pattern is preserved.
fix/release-please-config-json-parse-error (2026-06-04)¶
no rebase impact: removes a duplicate array element from release-please-config.json. No C source, public header, Python harness, or Netflix golden-assertion file is touched. Any in-flight branch that modifies release-please-config.json should simply ensure the ai package's changelog-sections array no longer contains two chore entries.
fix/simd-psnr-16bit-scalar-tail-overflow (2026-06-04)¶
Files touched: core/src/feature/x86/psnr_avx2.c, core/src/feature/x86/psnr_avx512.c, core/src/feature/arm64/psnr_neon.c
no rebase impact: internal arithmetic fix in scalar tail loops. No public C API, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The change affects only the 16-bit PSNR scalar-remainder path in the three SIMD backends; the SIMD main loop is unchanged. When porting an upstream commit that modifies the scalar tail in these files, verify that the (uint32_t)abs(...) pattern is preserved.
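Why the cast matters can be shown with a short C sketch of a 16-bit sum-of-squared-errors tail. sse_tail_u16() is hypothetical and the uint64_t accumulator is an assumption; only the (uint32_t)abs(...) cast comes from the note above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar tail: sum of squared errors for 16-bit samples. */
static uint64_t sse_tail_u16(const uint16_t *ref, const uint16_t *dis,
                             size_t n)
{
    uint64_t sse = 0;
    for (size_t i = 0; i < n; i++) {
        /* The difference lies in [-65535, 65535]. Squaring it as int can
         * reach 65535 * 65535 = 4294836225 > INT_MAX, which is signed
         * overflow (UB). Taking |diff| as uint32_t first keeps the product
         * in range, since 4294836225 <= UINT32_MAX. */
        const uint32_t e = (uint32_t)abs((int)ref[i] - (int)dis[i]);
        sse += (uint64_t)(e * e);
    }
    return sse;
}
```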
fix/r6-cpu-scoring-nan-ub-guards (2026-06-04)¶
Files touched: core/src/feature/integer_psnr.c, core/src/feature/ms_ssim.c, core/src/feature/float_ssim.c, core/src/feature/float_ms_ssim.c, core/src/feature/iqa/ssim_tools.c, core/src/feature/adm.c, core/src/feature/integer_adm.c, core/src/feature/float_adm.c, core/src/feature/motion.c, core/src/feature/cambi.c, docs/adr/1033-cpu-scoring-nan-ub-guards.md, changelog.d/fixed/1033-cpu-scoring-nan-ub-guards.md
no rebase impact: all changes are internal correctness fixes inside CPU-path scoring functions. No public C API headers, no meson_options.txt entries, no ffmpeg-patches/ series entries, and no Netflix golden-assertion files are touched. Rebasing on top of any upstream commit that modifies these same source files may produce minor context conflicts in the guard blocks; resolve by keeping both the upstream change and the NaN guard. ADR-1033.
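The general shape of such a guard, useful for recognising it while resolving context conflicts, is sketched below. finite_or() and pooled_mean() are hypothetical helpers, and the fallback value is an assumption: each scoring function in this branch picks its own replacement value, which this sketch does not reproduce.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Replace NaN or +/-Inf with a caller-chosen fallback. */
static double finite_or(double x, double fallback)
{
    return isfinite(x) ? x : fallback;
}

/* Mean of per-frame scores: guard the empty case explicitly instead of
 * computing 0.0 / 0, which yields NaN on IEEE-754 platforms. */
static double pooled_mean(const double *v, size_t n, double fallback)
{
    if (!v || n == 0)
        return fallback;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return finite_or(sum / (double)n, fallback);
}
```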
fix/vmaf-init-double-init-guard-vmaf-close-pointer-contract (2026-06-04, ADR-1032)¶
Files touched: core/src/libvmaf.c, core/src/dnn/dnn_api.c, core/include/libvmaf/libvmaf.h, core/test/test_context.c
no rebase impact: all changes are fork-local bug-fixes with no upstream equivalents. vmaf_init guard is a new branch (no upstream logic removed), vmaf_close header change is documentation-only, and the DNN fallback path touches a fork-added sidecar-loading block that does not exist in Netflix upstream. No Netflix golden assertions or upstream-mirrored Python are touched.
fix/cuda-vif-filter1d-adm-cm-opprec (2026-06-04)¶
Files touched: core/src/feature/cuda/integer_vif/filter1d.cu, core/src/feature/cuda/integer_adm/adm_cm.cu
no rebase impact: pure kernel arithmetic fixes. No public C API header, no meson build option, no FFmpeg patch surface, and no upstream-mirrored Python file is touched. The fixes correct two silent arithmetic defects (a typo in the rd-filter upper-bound guard in filter1d.cu and a missing parenthesis pair in two x_sq reduction loops in adm_cm.cu). The cross-backend SYCL/HIP/Vulkan ADM and VIF twins do not carry the same expressions and are unaffected.
fix/sycl-vif-rd-stride-motion-uv-sync (2026-06-04)¶
Files touched: core/src/feature/sycl/integer_vif_sycl.cpp, core/src/feature/sycl/integer_motion_sycl.cpp, docs/adr/1034-sycl-vif-rd-stride-motion-uv-sync.md, changelog.d/fixed/sycl-vif-rd-stride-motion-uv-sync.md
Rebase impact: If this branch is rebased onto a commit that changes the rd_stride or rd_size allocation in integer_vif_sycl.cpp, re-verify that both the scalar (SIMD-32) and SIMD-16 kernel variants use (e_w + 1U) / 2U as the stride and that the allocation uses ((w + 1U) / 2U) * ((h + 1U) / 2U). If a future PR routes UV H2D copies through copy_queue and updates last_upload_event, the vmaf_sycl_queue_wait(state) added in submit_fex_sycl can be removed in favour of the GPU-side barrier; track this as a follow-up optimization.
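The sizing rule can be checked with a small C sketch. half_up(), rd_alloc_elems() and rd_layout_fits() are hypothetical helpers; treating e_w as an effective width no larger than w, and assuming the kernel walks (h + 1) / 2 rows, are both assumptions rather than facts from integer_vif_sycl.cpp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round-up halving: (v + 1) / 2, so an odd dimension keeps its last
 * sample instead of truncating it away as v / 2 would. */
static unsigned half_up(unsigned v)
{
    return (v + 1U) / 2U;
}

/* Element count of the half-resolution rd buffer: ((w+1)/2) * ((h+1)/2). */
static size_t rd_alloc_elems(unsigned w, unsigned h)
{
    return (size_t)half_up(w) * (size_t)half_up(h);
}

/* Assumed invariant: a kernel striding by (e_w + 1) / 2 over (h + 1) / 2
 * rows stays inside the allocation whenever e_w <= w. */
static bool rd_layout_fits(unsigned e_w, unsigned w, unsigned h)
{
    size_t needed = (size_t)half_up(e_w) * (size_t)half_up(h);
    return needed <= rd_alloc_elems(w, h);
}
```

For an odd width such as 1921, truncating division gives 960 columns while the round-up form gives 961, which is the off-by-one this branch guards against.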
fix(hip,metal): HIP adm_decouple dangling body + VIF wavefront carry + Metal motion vertical halo (ADR-1030, 2026-06-04)¶
Files touched: core/src/feature/hip/integer_adm/adm_decouple.hip, core/src/feature/hip/integer_vif/vif_statistics.hip, core/src/feature/metal/float_motion.metal, docs/adr/1030-hip-metal-kernel-correctness.md, changelog.d/fixed/hip-metal-kernel-correctness-1030.md
Rebase impact: low. These are self-contained correctness fixes inside GPU-only kernel files. No public C API, no CPU feature extractor, no CLI flag, and no Netflix golden assertion is touched. Any branch that also modifies adm_decouple.hip will need to re-apply the dangling-body removal; branches touching vif_statistics.hip wavefront_reduce_i64 will need to keep the integer-addition reassembly. Metal float_motion.metal conflicts are straightforward to resolve by preserving TILE_H=20 and the - HALF_FW origin offsets.
docs/vulkan-overview-mark-removed-adr0726 (2026-06-04)¶
Files touched: docs/backends/vulkan/overview.md, docs/api/vulkan-image-import.md, docs/state.md, changelog.d/chore/vulkan-docs-mark-removed.md
no rebase impact: docs-only changes. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The changes add removal notices to two Vulkan documentation files that still described the backend as active after ADR-0726 removed it.
docs(post-rename): scrub residual libvmaf/ paths (ADR-0700)¶
no rebase impact: doc-only path corrections. All changed files are under docs/, AGENTS.md, CONTRIBUTING.md, and one comment in core/include/libvmaf/libvmaf_mcp.h. No C source files changed. No public headers changed (the comment in libvmaf_mcp.h is prose, not an include path). No Netflix golden assertions touched.
docs(usage,api): correct backend auto-priority + Doxygen drift in public headers¶
Files touched: docs/usage/vmafx-cli.md, docs/usage/vmaf-tune-score-backend.md, docs/usage/vmaf-tune.md, docs/usage/bench.md, docs/usage/ffmpeg.md, core/include/libvmaf/libvmaf.h, core/include/libvmaf/libvmaf_hip.h, core/include/libvmaf/AGENTS.md, core/include/libvmaf/model.h, changelog.d/changed/backend-autopriority-doxygen-drift.md, docs/rebase-notes.md.
No rebase impact: doc-only and Doxygen-only edits. No C source, public C symbol, ABI surface, Netflix golden assertion, or upstream-mirrored implementation is affected. The model.h change replaces a @field block with per-member inline comments — comment-only; no struct layout change.
docs/mcp-tools-audit-fixes¶
Files touched: docs/mcp/index.md, docs/mcp/tools.md, docs/mcp/http-transport.md, docs/mcp/release-channel.md, changelog.d/changed/mcp-tools-catalogue-audit-fixes.md, docs/rebase-notes.md.
No rebase impact: doc-only scrub. No C source, public headers, Netflix golden assertions, MCP server Python/Go source, or upstream-mirrored symbols are touched. No branch logic changed.
docs(post-vulkan-drop): residual scrub + fix -Denable_vulkan=true (invalid Meson) → =enabled¶
Branch: docs/post-vulkan-drop-residual-scrub
no rebase impact: docs-only change. Fixes stale Vulkan references in docs/ai/datasets/k150k.md, docs/mcp/tools.md, docs/api/index.md, and docs/api/gpu.md. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched.
docs(rebrand): scrub residual Lusoris-fork references¶
Files touched: docs/usage/cli.md, docs/ai/mos-corpora.md, docs/ai/konvid-1k-ingestion.md, docs/ai/konvid-150k-ingestion.md, docs/development/release.md, docs/development/automated-rule-enforcement.md, docs/mcp/index.md, docs/architecture/c4-context.md, docs/architecture/c4-container.md, docs/metrics/bad-cases.md, CONTRIBUTING.md, AGENTS.md, changelog.d/changed/scrub-lusoris-fork-refs.md, docs/rebase-notes.md.
No rebase impact: doc-only text substitutions (branding strings, fork issue URL, HIP status text). No C source, public header, Netflix golden assertion, upstream-mirrored symbol, version string, or copyright header was modified.
docs(versions): bump stale Go + required-checks-count + Python pins¶
Files touched: CLAUDE.md, docs/development/languages.md, docs/development/release.md, docs/architecture/c4-context.md, docs/mcp/index.md, docs/getting-started/install/windows.md, docs/ai/training.md, changelog.d/changed/bump-stale-docs-go-checks-python-pins.md, docs/rebase-notes.md.
No rebase impact: docs-only scrub; no C source, public header, Netflix golden assertion, or upstream-mirrored symbol is affected.
docs(copyright): drop "and Claude (Anthropic)" from fork headers — residual sweep¶
Files touched: README.md, dev-llm/src/vmaf_dev_llm/__init__.py, scripts/lib/__init__.py, changelog.d/changed/copyright-drop-anthropic-residuals.md, docs/rebase-notes.md.
No rebase impact: text-only copyright-line change in three files missed by the ADR-0861 / ADR-0776 sweeps. No C source, public header, Netflix golden assertion, upstream-mirrored symbol, or build system touched.
docs(post-ansnr): scrub residual ANSNR references (PR #38 follow-up)¶
Files touched: docs/api/gpu.md, docs/backends/hip/overview.md, docs/backends/index.md, docs/backends/arm/overview.md, docs/backends/metal/index.md, docs/development/build-flags.md, docs/development/cross-backend-gate.md, docs/metrics/features.md, docs/mcp/tools.md, README.md, core/src/feature/metal/AGENTS.md, core/src/hip/AGENTS.md, core/src/feature/cuda/AGENTS.md, AGENTS.md, changelog.d/changed/post-ansnr-doc-scrub.md.
No rebase impact: doc-only changes (no C source, public header, Netflix golden assertions, or upstream-mirrored symbols affected). If an upstream Netflix/vmaf PR adds float_ansnr back, take the upstream side only in the C sources; the fork's doc changes apply only to fork-specific backend docs.
fix(rebrand): correct C++ badge (c++11→c++23) + drop Vulkan from GPU badge¶
Files touched: README.md, changelog.d/fixed/readme-badges-cpp23-drop-vulkan.md, docs/rebase-notes.md.
No rebase impact: doc-only edit to README.md badge lines; no C source, public header, Netflix golden assertion, or upstream-mirrored symbol is affected.
test(hip): parity coverage round 5 — speed_chroma + speed_temporal (2026-06-04, ADR-1004)¶
Files touched: core/test/test_hip_speed_chroma_parity.c, core/test/test_hip_speed_temporal_parity.c, core/test/meson.build, docs/adr/1004-hip-kernel-coverage-round5.md, docs/adr/README.md, docs/state.md, changelog.d/added/1004-hip-kernel-coverage-round5.md
no rebase impact: the two new test TUs are fork-local additions with no upstream analogue. The meson.build additions are append-only within the if hip_enabled block. No C source, public header, Netflix golden assertion, or upstream-mirrored Python file is modified.
chore/build-cpp-std-c23-bump (2026-06-04, ADR-1003)¶
Files touched: core/meson.build, core/AGENTS.md, core/test/meson.build, docs/adr/1003-cpp-std-c23-bump.md, docs/adr/README.md, changelog.d/changed/cpp-std-c23-bump.md
Rebase impact: Low. The cpp_std=c++11 → cpp_std=c++23 change in core/meson.build may conflict with any upstream Netflix/vmaf PR that also touches default_options. Netflix upstream still uses c++11; on conflict, keep c++23 (the fork's stated standard). The core/test/meson.build fix for test_feature_collector_coverage is fork-local; take the fork side on any conflict.
test(mcp-server): coverage push round 4¶
Files touched: mcp-server/vmaf-mcp/tests/test_coverage_round4.py, changelog.d/added/mcp-server-coverage-round4.md, docs/rebase-notes.md.
Rebase impact: None. Fork-local Python test file with no upstream analogue; no C source, public header, or Netflix golden-assertion file is touched.
test(sycl): parity coverage round 5 — CAMBI parity gate¶
Branch: test/sycl-parity-round5-cambi
no rebase impact: adds core/test/test_sycl_cambi_parity.c (new file, no upstream analogue), one meson.build registration block, ADR-1001, and a changelog fragment. No C source, public header, feature extractor implementation, or Netflix golden-assertion file is touched.
chore(rust): bump bindgen 0.69 → 0.72 + workspace edition 2021 → 2024 (ADR-1002)¶
Branch: chore/rust-edition-2024-bindgen-072
Touches: Cargo.toml, Cargo.lock, bindings/rust/vmafx/Cargo.toml, bindings/rust/vmafx-sys/Cargo.toml, core/src/feature/rust/tad/src/lib.rs, docs/adr/1002-rust-edition-2024-bindgen-072.md, changelog.d/chore/rust-edition-2024-bindgen-072.md.
No rebase impact on upstream Netflix/vmaf code. All changed files are fork-local Rust crates (vmafx-sys, vmafx, vmafx-tad) with no upstream analogue. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. A future upstream port cannot conflict with Rust workspace settings, since Netflix/vmaf has no Rust code. The header paths consumed by bindgen remain at core/include/libvmaf/ (ADR-0700 path); any future upstream header change that adds or removes a symbol is picked up automatically on the next cargo build, because bindgen regenerates the bindings on every build.
fix(cppcheck): resolve Whole-Project warnings¶
Files touched: core/src/feature/integer_ssim.c, core/src/picture_pool.cpp, core/src/read_json_model.cpp, core/tools/vmaf.cpp, core/test/test_ssimulacra2_simd.c, core/test/dnn/test_tensor_io.c, .cppcheck-suppressions.txt, changelog.d/fixed/cppcheck-whole-project-warnings.md
Rebase impact: None for upstream Netflix/vmaf cherry-picks. All changes are either fork-local files (picture_pool.cpp, opt.cpp suppression) or minimal defensive additions (null checks, format-specifier corrections, struct-member initialisation) that do not alter external behaviour. The %d → %u format fixes in vmaf.cpp and read_json_model.cpp are cosmetic; the VmafModel{} initialisation is semantically equivalent to memset(m, 0, …) on any IEEE-754 platform.
docs(coverage): ADR-0922 coverage-gate runbook (2026-06-04)¶
Files touched: docs/development/coverage-gate.md (new), changelog.d/added/coverage-gate-runbook.md (new)
Rebase impact: None. Documentation-only addition; no source, build, or CI files are modified.
fix(cppcheck): motion_avx512 missing sub-kernel functions¶
Files touched: core/src/feature/x86/motion_avx512.c, core/src/feature/x86/motion_avx512.h
Rebase impact: None. Both files are fork-local SIMD additions. The four new public symbols (sad_avx512, y_convolution_8_avx512, y_convolution_16_avx512, x_convolution_16_avx512) are additive and have no upstream Netflix/vmaf equivalents. No existing symbol is renamed, removed, or ABI-changed.
chore/tech-stack-badges-go-pin-bump (2026-06-04, ADR-1000)¶
Files touched: README.md, go.mod, .github/workflows/go-ci.yml, docs/adr/1000-tech-stack-badges-go-rust-pins.md, docs/adr/_index_fragments/1000-tech-stack-badges-go-rust-pins.md, docs/adr/_index_fragments/_order.txt, changelog.d/changed/tech-stack-badges-go-pin-bump.md
Rebase impact: None for C/SYCL/CUDA/HIP/Vulkan/Rust code. go.mod minimum version is bumped 1.25.0 → 1.26.4; this only affects builds that run go build / go test. Upstream Netflix/vmaf has no Go code, so no upstream cherry-pick will conflict with this change. The README badge block change is purely additive; no upstream port touches the README badge section.
fix/tsan-framesync-stdatomic-cxx (2026-06-04, ADR-0999)¶
Files touched: core/src/framesync.h, core/src/ref.h
Rebase impact: None. Both files are upstream-mirror headers touched only in the preprocessor guard section; no function signatures or struct members are changed. Upstream Netflix/vmaf does not compile feature_extractor.cpp as C++ (they use a C-only build), so this guard addition will not conflict with any upstream cherry-pick. ref.h guard widening from _MSC_VER to all C++ is backward-compatible: non-MSVC C compilers are unchanged (#if defined(__cplusplus) is false in C mode).
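The shape of a __cplusplus-keyed guard can be sketched as follows. This is hypothetical: the actual framesync.h and ref.h guard bodies are not reproduced here, and counter_bump() exists only to show that shared code compiles under either branch. Keying on __cplusplus rather than _MSC_VER means a C compiler never takes the C++ branch, while every C++ compiler does.

```c
#include <assert.h>

/* Hypothetical guard: C++ translation units get std::atomic, C units get
 * <stdatomic.h>, and shared code uses the common free-function names. */
#if defined(__cplusplus)
#include <atomic>
using std::atomic_int;
using std::atomic_load;
using std::atomic_store;
using std::atomic_fetch_add;
#else
#include <stdatomic.h>
#endif

/* Compiles unchanged under both branches of the guard. */
static int counter_bump(atomic_int *c)
{
    return atomic_fetch_add(c, 1) + 1;
}
```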
fix(metal): hoist feature_extractor.h above extern "C" in Metal .mm files¶
Files touched: core/src/feature/metal/float_moment_metal.mm, core/src/feature/metal/float_motion_metal.mm, core/src/feature/metal/float_ms_ssim_metal.mm, core/src/feature/metal/float_psnr_metal.mm, core/src/feature/metal/float_ssim_metal.mm, core/src/feature/metal/integer_motion_metal.mm, core/src/feature/metal/integer_motion_v2_metal.mm, core/src/feature/metal/integer_psnr_metal.mm
Rebase impact: None. All changed files are fork-local Metal backend sources. No upstream Netflix/vmaf files are touched. The change is purely an include-order fix (moves feature_extractor.h above its enclosing extern "C" block); no API, ABI, or algorithm change.
fix(arm64): guard framesync.h stdatomic include for C++ mode¶
Branch: fix/arm64-clang-stdatomic-cxx-conflict
Files touched: changelog.d/fixed/arm64-clang-stdatomic-cxx-framesync.md, docs/state.md, docs/rebase-notes.md.
no rebase impact: The framesync.h guard is already present via ADR-0999 (fix/tsan-framesync-stdatomic-cxx); this PR adds the ARM64-specific changelog fragment and state.md tracking row.
port/upstream-speed-chroma-simd-30f472b14 (2026-06-03, upstream 30f472b14)¶
Files touched: core/src/feature/x86/speed_avx2.c, core/src/feature/x86/speed_avx2.h, core/src/feature/x86/speed_avx512.c, core/src/feature/x86/speed_avx512.h, core/src/feature/speed.c, core/src/meson.build, core/test/test_speed_simd.c, core/test/meson.build
Rebase impact: Reduces delta — this port lands the upstream commit verbatim (new AVX2 + AVX-512 covariance-sum kernels, function-pointer dispatch). Future /sync-upstream passes that touch speed.c will see a smaller diff because the kernel dispatch pattern is now present on both sides. The compute_cov_kernel_fn typedef and SpeedState::compute_cov_kernel field are fork additions; any upstream change to the compute_covariance signature must also update the typedef here.
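The function-pointer dispatch pattern can be sketched in C. The signature of compute_cov_kernel_fn below is an assumption (the real typedef in speed.c may take different arguments), and SpeedStateSketch, the CPU_FLAG_* values and the *_stub kernels are hypothetical placeholders for the real AVX2 and AVX-512 kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: sum of element-wise products of two
 * (assumed mean-removed) blocks. The real typedef in speed.c may differ. */
typedef double (*compute_cov_kernel_fn)(const float *a, const float *b,
                                        size_t n);

static double cov_sum_scalar(const float *a, const float *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (double)a[i] * (double)b[i];
    return s;
}

/* Placeholders: a real build would point these at the AVX2 / AVX-512
 * kernels; here they reuse the scalar loop so the sketch stays portable. */
static double cov_sum_avx2_stub(const float *a, const float *b, size_t n)
{
    return cov_sum_scalar(a, b, n);
}

static double cov_sum_avx512_stub(const float *a, const float *b, size_t n)
{
    return cov_sum_scalar(a, b, n);
}

enum { CPU_FLAG_AVX2 = 1 << 0, CPU_FLAG_AVX512 = 1 << 1 }; /* hypothetical */

typedef struct {
    compute_cov_kernel_fn compute_cov_kernel;
} SpeedStateSketch;

/* Pick once at init; the hot loop then calls through the pointer. */
static void speed_select_kernel(SpeedStateSketch *s, unsigned cpu_flags)
{
    s->compute_cov_kernel = cov_sum_scalar;
    if (cpu_flags & CPU_FLAG_AVX2)
        s->compute_cov_kernel = cov_sum_avx2_stub;
    if (cpu_flags & CPU_FLAG_AVX512)
        s->compute_cov_kernel = cov_sum_avx512_stub;
}
```

Because every kernel must match the typedef, an upstream change to the compute_covariance signature surfaces as a compile error at the assignment sites rather than as a silent mismatch.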
test/go-coverage-push (2026-06-04)¶
Files touched: cmd/vmafx-controller/{grpc_server.go,grpc_server_test.go,http_cancel_test.go,main_test.go,main_extra_test.go,auth/grpc_interceptor.go,auth/middleware.go,queue/queue_listall_test.go}, cmd/vmafx-mcp/impl.go, cmd/vmafx-node/{executor_test.go,main_test.go,online_feedback_pump_test.go}, cmd/vmafx-operator/internal/controller/{vmafxjob_applystatus_test.go,vmafxmodeltraining_applystatus_test.go,vmafxmodeltraining_controller.go,vmafxnode_controller.go}, cmd/vmafx-server/{grpc_server.go,http_cancel_test.go,main_extra_test.go}, pkg/observability/otel_instruments_test.go, pkg/score/grpc_client_unary_test.go
Rebase impact: Low. All changes are either test files (no rebase conflict possible on pure test additions) or targeted bug fixes in production code (grpc_server.go undefined-var fix, operator int32 type cast, MCP Vulkan backend dispatch). The auth ContextWithClaims export and probeHealthz method are additive. No public header or proto changes.
test/compat-python-vmaf-coverage-push (2026-06-03)¶
Files touched: compat/python-vmaf/tests/ (new directory), pyproject.toml (testpaths + pythonpath additions)
Rebase impact: None. Pure test addition; no production code changed. The pyproject.toml diff only appends to testpaths and pythonpath; if a concurrent branch adds entries in the same section, resolve the trivial conflict by keeping both sets of entries.
vmafx-title-rebrand (2026-06-03, no ADR)¶
Files touched: README.md, mkdocs.yml, pyproject.toml, CONTRIBUTING.md
Rebase impact: None. All four files are fork-local metadata surfaces (project title, site name, package description, contributor heading). Upstream Netflix/vmaf does not touch any of these files; no merge conflict is possible on rebase.
feat(vmaf-tune): ADR-0498 follow-up #7 — encoder stats, x264 detection, backend dispatch, codec-list parser¶
Files touched: tools/vmaf-tune/src/vmaftune/encode.py, tools/vmaf-tune/src/vmaftune/fast.py, tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, tools/vmaf-tune/tests/test_encode_dispatcher_per_adapter.py, tools/vmaf-tune/tests/test_adr_0498_followup7.py
Rebase impact: None. All changed files are fork-local to tools/vmaf-tune/; no upstream Netflix/vmaf files are touched. The _VERSION_PROBE_PATTERNS dict is additive (new keys only). The parse_available_codecs function is new; no existing symbol is renamed or removed. The _build_production_sample_extractor signature change (new backend=None kwarg) is backward-compatible. The test_encode_dispatcher_per_adapter.py fix (capture first call only) resolves a test fragility introduced by the probe-cache expansion; no merge conflict expected against Netflix upstream since that test is fork-added.
fix/cuda-duplicate-csf-r-definitions (2026-06-03)¶
Files touched: core/src/feature/cuda/integer_adm/adm_cm.cu
Rebase impact: None. Purely removes a duplicate code block introduced by a merge-order accident (PR #565 admin-merged while master already had the same helpers). No upstream file is touched; no public header changes.
feat/ai-run-manifest-12-scripts (ADR-0668 follow-up)¶
No rebase impact. Pure Python change to ai/scripts/train_konvid.py. No C/header files modified. No upstream Netflix/vmaf files touched. The only observable change is the addition of a train_konvid.manifest.json sidecar emitted after training completes.
cuda-adm-decouple-inline-ldg (2026-05-29, ADR-0773)¶
Files touched: core/src/feature/cuda/integer_adm/adm_csf.cu, core/src/feature/cuda/integer_adm/adm_cm.cu
Rebase impact: None. Both files are fork-added CUDA kernel translation units that do not exist in upstream Netflix/vmaf master (ADM CUDA port is fork-local). No rebase conflict is possible.
The change is a pure performance annotation: const T *__restrict__ pointer extraction before hot inner loops and __ldg() on all per-pixel DWT2 band reads. If upstream Netflix ever adds their own ADM CUDA port, these files will need to be re-reviewed against theirs; the F3 pattern should carry forward.
feat/vmafx-tune-go-stage4-report (ADR-0770)¶
No rebase impact: pure Go CLI and pkg/report additions. No upstream C/Python files modified. Files added: cmd/vmafx-tune/cmd/report.go, pkg/report/multi.go, pkg/report/multi_test.go, docs/adr/0770-vmafx-tune-go-stage4-report.md, changelog.d/added/vmafx-tune-go-stage4-report.md. Files modified: cmd/vmafx-tune/cmd/root.go (register report + ladder), cmd/vmafx-tune/AGENTS.md (invariants 8–9), docs/usage/vmafx-tune-go.md (Stage-4 section), docs/adr/README.md (new row), docs/rebase-notes.md (this entry).
doxygen-thread-safety-tags (2026-05-29, ADR-0788)¶
Files touched: core/include/libvmaf/libvmaf.h, core/include/libvmaf/picture.h, core/include/libvmaf/feature.h, core/include/libvmaf/model.h, core/include/libvmaf/dnn.h
Rebase impact: Low. These are comment-only additions. An upstream sync that modifies the same function signatures may create minor merge-fuzz on the Doxygen blocks; resolve by re-applying the @thread-safety tags to whatever the upstream version of the comment looks like.
containerfile-layer-optimization (ADR-0790, 2026-05-29)¶
Files touched: dev/Containerfile
Rebase impact: None. dev/Containerfile is fork-local (not present in upstream Netflix/vmaf). No rebase conflict is possible.
phase-4b8-c-abi-break-scoping (2026-05-29)¶
Files touched: docs/adr/0767-phase-4b8-c-abi-break-scoping.md, docs/research/research-0752-phase-4b8-c-abi-break-scoping.md, docs/adr/README.md, changelog.d/changed/0767-phase-4b8-c-abi-break-scoping.md
Rebase impact: None. This is a scoping/design document with no source changes. The implementation PR (when it lands) will touch core/include/libvmaf/*.h and every ffmpeg-patches/ file; that implementation PR will carry its own rebase note cataloguing the specific header and patch changes. When upstream Netflix/vmaf adds symbols to libvmaf.h or model.h between now and the v4 implementation, check the ADR-0767 removal list against the upstream additions to avoid removing a symbol upstream has just added.
docs/hip-picture-stub-comment-closeout (ADR-0613, 2026-06-03)¶
core/src/picture.h — comment on VMAF_PICTURE_BUFFER_TYPE_HIP_DEVICE updated to reflect that picture_hip.{c,h} is fully implemented (ADR-0613); the old text described it as a stub.
Rebase impact: NONE — comment-only change; no logic, no ABI delta.
chore/cambi-drop-vulkan-scaffold — remove CAMBI Vulkan scaffolding per ADR-0726 (2026-06-03)¶
No rebase impact on upstream C/Python code.
Files modified are fork-local: core/src/feature/vulkan/cambi_vulkan.c (deleted), core/src/feature/vulkan/shaders/cambi_{preprocess,derivative,filter_mode,decimate,mask_dp}.comp (deleted), core/test/test_cambi_vulkan.c (deleted), core/src/vulkan/meson.build (CAMBI source + shader entries removed), core/src/feature/cambi_internal.h (comment updated), core/src/feature/cuda/integer_cambi_cuda.c (comments updated), core/src/feature/hip/integer_cambi_hip.c (comment updated), changelog.d/removed/cambi-vulkan-scaffold.md (new).
Rebase impact: None on upstream sync (no Netflix file touched).
CI scaffold-comment refresh (2026-06-03)¶
.github/workflows/fuzz.yml — header comment updated: ADR-0882 citation added alongside ADR-0270/0311. .github/workflows/libvmaf-build-matrix.yml — Metal matrix lane comment and name: field updated from "T8-1 scaffold" to "runtime" (ADR-0420 landed).
no rebase impact: comment-only change; no logic or structure altered.
Single ledger of fork-local changes that need attention when this fork syncs from upstream/master (Netflix/vmaf). Required by ADR-0108: every fork-local
Second-opinion batch smoke scaffold + pytest path fix (ADR-0991, 2026-06-03)¶
Files touched: ai/pyproject.toml (add pythonpath = ["scripts"] to pytest config), ai/testdata/smoke-second-opinion-batch/ (new: batch.json, fixtures/*.jsonl, README.md), docs/adr/0991-second-opinion-batch-runs.md (new), docs/research/research-0991-second-opinion-batch-2026-06-03.md (new), changelog.d/fixed/0991-second-opinion-batch-pytest-path.md (new).
Rebase impact: None on upstream sync (no Netflix/vmaf upstream file touched). The ai/pyproject.toml addition is additive; no conflict risk.
controller-multi-tenant-auth-gateway (2026-05-29, ADR-0794)¶
Files touched: cmd/vmafx-controller/auth/ (new package), cmd/vmafx-controller/main.go, cmd/vmafx-controller/grpc_server.go, cmd/vmafx-controller/http_server.go, cmd/vmafx-controller/queue/queue.go, cmd/vmafx-controller/queue/schema.sql, deploy/helm/vmafx/crds/vmafx.dev_vmafxtenants.yaml (new), deploy/helm/vmafx/templates/tenant-crd-config.yaml (new), deploy/helm/vmafx/templates/deployment.yaml, deploy/helm/vmafx/values.yaml, docs/server/auth.md (new), docs/adr/0794-controller-multi-tenant-auth-gateway.md (new).
Rebase impact: None. All touched files are fork-local additions (vmafx-controller, Helm chart, docs) that do not exist in upstream Netflix/vmaf. The SQLite schema change (tenant_id column) is additive and non-breaking. No upstream rebase conflict is possible.
KoNViD / UGC / BVI-DVC saliency batch manifests (ADR-0993, 2026-06-03)¶
Files touched: ai/batch-manifests/saliency/konvid-150k.json (new), ai/batch-manifests/saliency/ugc.json (new), ai/batch-manifests/saliency/bvi-dvc.json (new), docs/ai/saliency-feature-materializer.md (corpus-specific manifests section), docs/adr/0993-konvid-ugc-bvi-saliency-batch-launch.md (new), docs/adr/README.md (index row), changelog.d/added/konvid-ugc-bvi-saliency-batch-manifests.md (new).
Rebase impact: None on upstream sync (no Netflix file touched). All new files are fork-local; no upstream path conflicts.
ADR-0992 — MOS-label batch-run manifests for KonViD and CHUG¶
Files touched: ai/configs/mos-label-batch-konvid.json (new), ai/configs/mos-label-batch-chug.json (new), ai/tests/test_mos_label_batch_runs_smoke.py (new), ai/tests/test_batch_materialize_mos_labels.py (sys.path bug fix), docs/ai/mos-label-materializer.md, docs/adr/0992-mos-label-batch-runs.md (new), docs/adr/README.md, changelog.d/added/0992-mos-label-batch-runs.md (new), and this file.
Rebase impact: No rebase impact on upstream sync (all touched files are fork-local; no Netflix/vmaf source file is modified). No cross-branch impact: the new ai/configs/*.json files are independent and will not conflict with any in-flight branch.
Changelog-fragment section hygiene (2026-05-30)¶
Files touched: changelog.d/perf/*.md → changelog.d/changed/perf-*.md (27 renames), changelog.d/performance/*.md → changelog.d/changed/perf-*.md (5 renames), changelog.d/README.md, release-please-config.json, docs/adr/0892-conventional-commits-and-changelog-fragment-hygiene.md (new), docs/research/0892-conventional-commits-audit-2026-05-30.md (new), changelog.d/fixed/conventional-commits-audit.md (new).
Rebase impact: None on upstream sync (no Netflix file touched). Cross-branch impact on fork: any in-flight feature branch holding a changelog.d/perf/*.md or changelog.d/performance/*.md file will hit a rename-detection conflict on rebase. git rebase with default -X settings detects the rename cleanly; if a conflict surfaces, the fix is to drop the in-flight branch's copy of the file and re-add the content under changelog.d/changed/perf-<topic>.md. The migrated files had their leading ### Performance / ## perf(…) headings stripped (renderer adds ### Changed itself); in-flight branches that added a new perf/ fragment should follow the same pattern.
See ADR-0892.
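For in-flight branches that still add a perf/ fragment, the migration above amounts to a rename plus stripping one leading heading. A Python sketch, assuming the only headings to strip are the `### Performance` and `## perf(…)` forms named above; `strip_leading_heading`, `destination_for`, and `migrate_fragment` are hypothetical helpers, not tooling that exists in the repo.

```python
import re
from pathlib import Path

# The two heading shapes the migration stripped; the renderer adds "### Changed" itself.
_PERF_HEADING = re.compile(r"^(###\s+Performance\s*|##\s+perf\(.*)$")


def strip_leading_heading(text: str) -> str:
    """Drop leading blank lines and one perf heading (plus blanks after it), if present."""
    lines = text.splitlines()
    i = 0
    while i < len(lines) and not lines[i].strip():
        i += 1
    if i < len(lines) and _PERF_HEADING.match(lines[i].strip()):
        i += 1
        while i < len(lines) and not lines[i].strip():
            i += 1
    body = lines[i:]
    return "\n".join(body) + "\n" if body else ""


def destination_for(src: Path, changelog_dir: Path) -> Path:
    """changelog.d/perf/<topic>.md -> changelog.d/changed/perf-<topic>.md"""
    return changelog_dir / "changed" / f"perf-{src.stem}.md"


def migrate_fragment(src: Path, changelog_dir: Path) -> Path:
    """Move one fragment to its new home with the leading heading removed."""
    dst = destination_for(src, changelog_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(strip_leading_heading(src.read_text(encoding="utf-8")), encoding="utf-8")
    src.unlink()
    return dst
```

Doing the rename and heading strip in one step keeps the re-added file identical in shape to the 32 fragments that were already migrated.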
fix/ci-docs-pr-trigger — docs.yml PR trigger (2026-06-03, ADR-0986)¶
No rebase impact on upstream C/Python code.
Files modified are fork-local: .github/workflows/docs.yml (trigger + permissions update), docs/adr/0986-ci-docs-pr-trigger.md (new), docs/adr/_index_fragments/0986-ci-docs-pr-trigger.md (new), docs/adr/_index_fragments/_order.txt (appended), changelog.d/fixed/ci-docs-pr-trigger-0986.md (new), docs/rebase-notes.md (this entry).
Netflix upstream ships no GitHub Actions workflows. No rebase conflict is possible.
Research-0760 — Rust crate audit (docs + ADR-0707 correction, 2026-05-29)¶
No rebase impact on upstream C/Python code.
All files modified are fork-local: docs/research/research-0760-rust-crate-audit.md (new), changelog.d/added/rust-crate-audit-0760.md (new), docs/adr/0707-vmafx-rust-pilot-feature.md (corrected enable_rust_features default description from "true" to "false"), docs/rebase-notes.md (this entry).
None of core/meson_options.txt, core/src/meson.build, or any C/Rust source is modified. No Netflix upstream file is touched. No rebase conflict is possible.
fix/helm-node-deployment-deduplicate (2026-05-30, ADR-0713 / ADR-0719)¶
Files touched: deploy/helm/vmafx/templates/node.yaml (modified), deploy/helm/vmafx/templates/node-deployment.yaml (deleted).
Rebase impact: None. The deploy/helm/ tree is fork-only — Netflix upstream ships no Helm chart. The duplicate-Deployment collision and its fix live entirely within fork-added templates.
The two templates both rendered a Deployment named {{ include "vmafx.fullname" . }}-node under .Values.node.enabled, which made helm install fail with a duplicate-resource error and left Phase 4b distributed scoring uninstallable. The richer node.yaml (liveness/readiness probes, GPU resource injection, metrics port + Service, VMAFX_NODE_ID per ADR-0713) is kept; the rclone Secret mount + storage-mode / model-dir env vars from the deleted node-deployment.yaml were folded into node.yaml.
libvmaf.Score / ScoreDirect ctx.Context plumbing (2026-05-31, fix/libvmaf-score-ctx)¶
Files touched: pkg/libvmaf/libvmaf.go (Score signature: ctx as first param; exec.CommandContext + WaitDelay = 2s), pkg/libvmaf/direct.go (ScoreDirect signature: ctx as first param; per-frame ctx.Err() check at the top of the read+queue loop; rename of local ctx *C.VmafContext -> vmafCtx to avoid shadowing), pkg/libvmaf/libvmaf_test.go, pkg/libvmaf/direct_test.go (call-site updates + new cancel tests), cmd/vmafx-server/{http_server.go,grpc_server.go,http_cancel_test.go}, cmd/vmafx-controller/{http_server.go,grpc_server.go,http_cancel_test.go}, cmd/vmafx-node/executor.go, cmd/vmafx-mcp/impl_direct.go.
Rebase impact: All fork-local. pkg/libvmaf/ is a fork-only Go wrapper around the public libvmaf C ABI; cmd/vmafx-* are entirely fork-local binaries with no upstream counterparts. No headers in core/include/ were changed and no upstream-mirrored C source was touched, so upstream syncs cannot collide.
Action on next upstream sync: None. The C API surface (vmaf_init / vmaf_read_pictures / vmaf_score_pooled / vmaf_close) the Go layer wraps is unchanged; we only renamed a local C.VmafContext* variable inside Go.
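The Go change follows the usual context pattern: the subprocess path relies on exec.CommandContext (plus cmd.WaitDelay to bound the post-kill wait), and ScoreDirect checks ctx.Err() at the top of every frame iteration. As a language-neutral illustration, here is a minimal Python sketch of the per-frame check only, with a threading.Event standing in for ctx; `score_frames`, `Cancelled`, and the `score_one` callback are hypothetical names, not part of pkg/libvmaf.

```python
import threading
from typing import Callable, Iterable, List


class Cancelled(Exception):
    """Raised when the cancellation token fires (stand-in for a non-nil ctx.Err())."""


def score_frames(
    frames: Iterable[bytes],
    cancel: threading.Event,
    score_one: Callable[[bytes], float],
) -> List[float]:
    scores: List[float] = []
    for frame in frames:
        # Check at the top of each iteration, before reading/queueing the next
        # frame, mirroring the per-frame ctx.Err() check described for ScoreDirect.
        if cancel.is_set():
            raise Cancelled(f"cancelled after {len(scores)} frames")
        scores.append(score_one(frame))
    return scores


# Usage: another thread calls cancel.set() to stop scoring early.
```

Checking before each read+queue step, rather than only after the loop, bounds cancellation latency to roughly one frame's worth of work.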
vmafx-tune-go deep bug audit (2026-05-31, fix/vmafx-tune-go-audit-20260531)¶
Files touched: pkg/report/report.go, pkg/report/sanitize_test.go (new), pkg/bisect/bisect.go, pkg/bisect/nan_parse_test.go (new), pkg/bisect/timeout_test.go (new), pkg/encoder/encoder.go, pkg/encoder/discover.go, pkg/encoder/discover_test.go, pkg/encoder/discover_cache_test.go (new), pkg/encoder/timeout_test.go (new), cmd/vmafx-tune/cmd/compare.go, cmd/vmafx-tune/cmd/ladder.go, cmd/vmafx-tune/cmd/ladder_nan_test.go (new), changelog.d/fixed/0979-vmafx-tune-go-deep-bug-audit.md (new).
Rebase impact: Fork-local only. Every file lives under pkg/{report,bisect,encoder} or cmd/vmafx-tune/, which are 100% fork additions (the vmafx-tune-go Stage-1 surface from ADR-0705 / ADR-0713; no Netflix upstream counterpart exists). An upstream sync will not encounter conflicts on any of these files.
On-disk surface changes (relevant to in-tree callers):
- New public helper report.SanitizeBisectSamples([]bisect.Sample) []any, exported so the schema-v2 sweep emitter in cmd/vmafx-tune/cmd.emitSweepJSON can apply the same nested NaN→null coercion the Python emitter (_nan_to_none in tools/vmaf-tune/src/vmaftune/compare.py) has used since the RFC-8259 hardening of 2026-05-17.
- New env-var knobs VMAFX_TUNE_ENCODE_TIMEOUT (default 60m), VMAFX_TUNE_SCORE_TIMEOUT (default 30m), and VMAFX_TUNE_PROBE_TIMEOUT (default 30s) set the upper bounds for the ffmpeg / vmaf / ffprobe subprocesses. Operators can lower these in CI to fail fast instead of hanging a job.
- The codec-discovery cache key is now the binary path, not a one-shot sync.Once. Callers that depended on the old "first probe wins forever" shape (none in tree as of this PR) will see a re-probe when the binary path changes.
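A sketch of the nested NaN→null coercion described above, written in Python because that is the emitter the Go helper is matching. This is not the fork's `_nan_to_none`; it is an assumed equivalent. It also maps ±Inf to None, since RFC 8259 JSON has no literal for either; whether the Go and Python helpers do the same is not stated in this note.

```python
import json
import math
from typing import Any


def nan_to_none(value: Any) -> Any:
    """Recursively replace non-finite floats with None (serialized as JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(item) for item in value]
    return value


samples = [{"crf": 23, "vmaf": float("nan"), "extra": [1.5, float("inf")]}]
# allow_nan=False makes json.dumps reject NaN/Inf instead of emitting the
# non-standard NaN/Infinity tokens, so any value the walk missed fails loudly.
print(json.dumps(nan_to_none(samples), allow_nan=False))
# prints [{"crf": 23, "vmaf": null, "extra": [1.5, null]}]
```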
Python-surfaces bug-audit bundle (2026-05-31, fix/python-surfaces-bug-audit)¶
no rebase impact: fork-local Python files only. Touches: ai/src/corpus/base.py (fork-added, ADR-0371), ai/src/vmaf_train/data/{datasets,manifest_scan,feature_dump,frame_dataset,frame_loader}.py (fork-added tiny-AI training surface), and mcp-server/vmaf-mcp/src/vmaf_mcp/server.py (fork-added MCP server, no upstream equivalent). No core/src/ or upstream-mirror file is touched.
Fork-local files: ai/src/corpus/base.py, ai/src/vmaf_train/data/datasets.py, ai/src/vmaf_train/data/manifest_scan.py, ai/src/vmaf_train/data/feature_dump.py, ai/src/vmaf_train/data/frame_dataset.py, ai/src/vmaf_train/data/frame_loader.py, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, mcp-server/vmaf-mcp/tests/test_server.py, ai/tests/test_python_surfaces_bug_audit.py (new), mcp-server/vmaf-mcp/tests/test_python_surfaces_bug_audit.py (new), changelog.d/fixed/python-surfaces-bug-audit-2026-05-31.md (new), docs/research/0983-python-surfaces-bug-audit-2026-05-31.md (new).
chore/gosec-findings-fix-v2 (2026-06-01, ADR-0983)¶
no rebase impact: the Go surface (cmd/, pkg/, gen/, api/vmafx/v1/) is wholly fork-local. Netflix/vmaf has no Go code. The sweep touches only Go files plus .github/workflows/go-ci.yml, docs/adr/, docs/research/, changelog.d/security/, and the regression test cmd/vmafx-mcp/impl_gosec_test.go. No C, no SIMD, no GPU, no upstream-mirror file is touched.
Re-run of the earlier chore/gosec-findings-fix (PR #509, closed without merge) against the post-#505 / post-#508 master tip. The prior PR conflicted with PR #505's pkg/bisect/bisect.go + pkg/encoder/encoder.go exec.CommandContext + per-stage timeout plumbing; this v2 sweep applies the same security-hardening fixes while preserving the ctx + timeout. Same set of touched files; same single real bug fixed (describeModel path traversal). No markdownlint / formatter regression; both parser CI scripts green.
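The describeModel fix is described only as a path-traversal repair. One common shape for such a fix is to resolve the requested path against an allowed root and reject anything whose canonical form leaves it. The Python sketch below shows that check under that assumption; `resolve_under_root` is a hypothetical name, and the real Go code may use a different mechanism (for example filepath.Rel).

```python
import os


def resolve_under_root(root: str, requested: str) -> str:
    """Return the canonical path of `requested` if it stays inside `root`.

    Raises ValueError for '..' escapes, absolute paths outside the root, and
    symlinks that resolve outside it (realpath follows them).
    """
    root_real = os.path.realpath(root)
    # os.path.join discards root_real when `requested` is absolute; the
    # containment check below then rejects it.
    candidate = os.path.realpath(os.path.join(root_real, requested))
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"model path escapes allowed root: {requested!r}")
    return candidate
```

Comparing canonical paths, rather than scanning the input for "..", also catches absolute paths and symlinks that point outside the root.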
test_svm_parser link + vmafx-operator audit (2026-05-31, fix/test-svm-parser-link-plus-operator-audit)¶
Files touched: core/test/meson.build (added ../src/thread_locale.c to the test_svm_parser source list), api/vmafx/v1/vmafxjob_types.go, api/vmafx/v1/vmafxnode_types.go, api/vmafx/v1/vmafxmodeltraining_types.go, config/crd/bases/vmafx.dev_vmafxjobs.yaml, config/crd/bases/vmafx.dev_vmafxnodes.yaml, config/crd/bases/vmafx.dev_vmafxmodeltrainings.yaml, deploy/helm/vmafx/crds/*.yaml (synced copies), cmd/vmafx-operator/internal/controller/vmafxnode_controller.go, cmd/vmafx-operator/internal/controller/vmafxnode_probehealthz_test.go (new), cmd/vmafx-operator/internal/controller/vmafxmodeltraining_controller_branch_test.go (int32 casts).
Rebase impact: All changes are fork-local — the operator, vmafx.dev/v1 CRDs, and Helm chart are 100% additions on this fork (no upstream counterparts). The single upstream-mirrored file is core/test/meson.build; the change there is one-line additive (append '../src/thread_locale.c'), no conflict surface. No public C ABI is touched; the libsvm vendor remains observation-only per ADR-0889.
No load-bearing invariants; no AGENTS.md rebase-pin required.
core/src lifecycle + memory audit (2026-05-31, fix/core-lifecycle-memory-audit)¶
Files touched: core/src/picture_pool.c, core/src/model.c, core/src/model.cpp, core/src/predict.c, core/src/output.c, core/src/dict.c, core/src/feature/feature_collector.cpp, core/test/test_predict.c (new test case), core/test/test_model.c (new test case), core/test/test_output.c (new test case).
Rebase impact: Touch points are all upstream-mirrored TUs. Each fix is a narrow correctness patch (NULL guard, errno sign, errno code, missing return-value propagation, free-on-error path) — none of them changes the public C ABI, the entry-point list, or the data layout of any struct.
On upstream sync:
- picture_pool.c::pool_preallocate_pictures cleanup: trivial vmaf_picture_unref → aligned_free swap on a fork-only code path (Netflix has no pool_preallocate_pictures in this form).
- model.{c,cpp}::vmaf_model_load + vmaf_model_collection_load NULL guards: add the if (!version) return -EINVAL; block at the top of each function. Conflicts only if upstream reorders the body.
- predict.c sign and propagation fixes: small textual deltas on upstream-mirrored functions. If upstream changes the sign convention, fall in line with upstream.
- output.c CSV/SUB NULL guards: paste the same three-line guard the XML/JSON writers already have (ADR-0602).
- dict.c::dict_normalize_numeric: one-word change strtof → strtod. Conflicts only if upstream switches to a different parser entirely.
- feature_collector.cpp::aggregate_vector_append: one-word change -EINVAL → -ENOMEM.
No load-bearing invariants; no AGENTS.md rebase-pin required.
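Why the strtof → strtod change in dict_normalize_numeric matters, assuming the normalized value ends up stored or compared as a double: routing the text through single precision first discards everything past roughly 7 significant digits. The Python sketch emulates the two C parsers; `parse_as_float32` packs through IEEE-754 single with struct, which double-rounds and so can differ from a true strtof in rare halfway cases.

```python
import struct


def parse_as_float32(text: str) -> float:
    """Approximate C strtof: round the decimal to the nearest IEEE-754 single.

    Goes text -> double -> single, so it can double-round in rare halfway
    cases where a true strtof would not; fine for illustrating precision loss.
    """
    return struct.unpack("<f", struct.pack("<f", float(text)))[0]


def parse_as_float64(text: str) -> float:
    """C strtod equivalent: Python's float() rounds to the nearest double."""
    return float(text)


for text in ("0.1", "16777217"):
    print(text, parse_as_float32(text), parse_as_float64(text))
# prints:
# 0.1 0.10000000149011612 0.1
# 16777217 16777216.0 16777217.0
```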
Markdown-lint full-ruleset discharge (2026-05-31, ADR-0980)¶
Files touched: ~1,400 .md files across docs/, .claude/, core/, ai/, tools/, bindings/, mcp-server/, scripts/, cmd/, top-level README/CONTRIBUTING/CODE_OF_CONDUCT. .markdownlint.json is unchanged.
Rebase impact: None for upstream-mirrored TUs. The added <!-- markdownlint-disable ... --> comments live only in fork-added / fork-modified .md files; upstream-vendored .md files in subprojects/, core/test/data/, python/test/resource/, compat/python-vmaf/resource/, compat/python-vmaf/matlab/, model/, and testdata/ are excluded from the gate (see .pre-commit-config.yaml markdownlint-cli2 exclude: regex) and are not touched by this PR.
On upstream sync, no resolution is required for .md files. If a future upstream PR adds a new fork-mirrored .md file that brings new violations, either fix the content or extend the per-file disable comment for that file; do not modify .markdownlint.json.
vmafx-server + pkg/score bug-audit (2026-05-31, ADR-0978)¶
Files touched: pkg/observability/observability.go, pkg/observability/observability_test.go, pkg/score/grpc_client.go, pkg/score/grpc_client_test.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/http_server.go, cmd/vmafx-server/main_test.go, cmd/vmafx-server/grpc_recovery_test.go (new).
Rebase impact: None. All touched surfaces are fork-local Go code:
- pkg/observability/ is a fork-added package; Netflix/vmaf has no equivalent.
- pkg/score/ is a fork-added wrapper around the fork's vmafx.v1 proto; Netflix/vmaf does not ship a gRPC client.
- cmd/vmafx-server/ is a fork-added binary (ADR-0703); Netflix/vmaf has no equivalent gRPC + HTTP scoring service.
No C ABI, no public header, no upstream-mirrored TU touched. Upstream syncs do not interact with this change.
core/tools input-reader safety (2026-05-31, ADR-0977)¶
Files touched: core/tools/y4m_input.c, core/tools/yuv_input.c, core/tools/vmaf_bench.c, core/test/test_y4m_alloc_failure.c (new), core/test/meson.build.
Rebase impact: PARTIAL. y4m_input.c and yuv_input.c are vendored from Daala via upstream Netflix/vmaf; upstream still carries the unchecked malloc returns and the int-precision dst_buf_sz arithmetic. On any upstream sync that touches the y4m / yuv parsers, keep our (size_t) casts and the explicit if (!_y4m->dst_buf) return -1; block — the diff is localised (the size-arithmetic stanza lines and the return 0; tail of y4m_input_open_impl).
vmaf_bench.c is fork-only (no upstream churn).
Sync action: Mechanical merge if upstream touches the same lines: prefer the fork side at the size-arithmetic stanza and the malloc-failure block in y4m_input_open_impl, prefer the fork bench_cleanup label structure in vmaf_bench::bench_feature.
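The dst_buf_sz issue is ordinary 32-bit overflow. The sketch uses a hypothetical 8-bit 4:2:0 size formula (one luma plane plus two half-size chroma planes), which may not match y4m_input.c exactly, and emulates the wraparound that `int` arithmetic commonly produces. In C, signed overflow is formally undefined behaviour, which is one more reason to compute in size_t.

```python
def as_int32(x: int) -> int:
    """Wrap to 32-bit two's complement, the result `int` overflow commonly yields."""
    return (x + 2**31) % 2**32 - 2**31


def dst_buf_sz_420(w: int, h: int) -> int:
    """Hypothetical 8-bit 4:2:0 size: full luma plane + two half-size chroma planes."""
    chroma_w = (w + 1) // 2
    chroma_h = (h + 1) // 2
    return w * h + 2 * chroma_w * chroma_h


for w, h in ((1920, 1080), (40000, 40000)):
    exact = dst_buf_sz_420(w, h)  # what size_t arithmetic computes on a 64-bit target
    print(w, h, exact, as_int32(exact))
# prints:
# 1920 1080 3110400 3110400
# 40000 40000 2400000000 -1894967296
```

With the (size_t) casts, the large case yields the full 2,400,000,000-byte request instead of a negative size; malloc may then fail cleanly, which is the case the added NULL check handles.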
Test suite: NULL-check malloc sweep (2026-05-31, ADR-0971)¶
Files touched: core/test/test_ssimulacra2_simd.c, core/test/test_framesync.c, core/test/test_pic_preallocation.c, core/test/AGENTS.md.
Rebase impact: None. All changes are purely additive NULL checks in test-only files. test_ssimulacra2_simd.c and test_pic_preallocation.c are fork-added and absent upstream; test_framesync.c already carries fork-local modifications. No C API or public ABI is touched. Subsequent upstream syncs do not interact with this change.
Public-header ISO-reserved include guards renamed (2026-05-31)¶
Files touched: core/include/libvmaf/libvmaf.h, core/include/libvmaf/picture.h, core/include/libvmaf/feature.h, core/include/libvmaf/model.h, core/include/libvmaf/macros.h, core/include/libvmaf/vmaf_assert.h, core/include/libvmaf/dnn.h, core/include/libvmaf/libvmaf_cuda.h, core/include/libvmaf/libvmaf_sycl.h.
Rebase impact: REAL. Six of the nine renamed headers (libvmaf.h, picture.h, feature.h, model.h, libvmaf_cuda.h, plus arguably dnn.h if upstream ever ports the tiny-AI surface) are upstream-mirrored from Netflix/vmaf. Upstream still ships the ISO-reserved __VMAF_*__ guard pattern that SEI CERT DCL37-C bans (ADR-0972).
Sync action: On any upstream sync that touches these six headers, keep our LIBVMAF_<BASENAME>_H lines and drop the upstream __VMAF_*__ ones. The diff is mechanical (3 lines per header — the #ifndef, the #define, and the closing #endif comment); no semantic merge required. The full guard-rename table is in ADR-0972 §Decision.
The remaining three renamed headers (macros.h, vmaf_assert.h, libvmaf_sycl.h) are fork-only and never receive upstream churn.
If a future upstream sync changes the LIBVMAF_* pattern itself (e.g. Netflix adopts the same fix with a different spelling), reopen ADR-0972 to decide whether to converge.
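Because the header-guard merge is mechanical, it can be scripted. A minimal Python sketch, assuming the upstream guards follow the `__VMAF_*__` spelling described above: it finds the old guard on the `#ifndef` line and rewrites every occurrence (the #ifndef, the #define, and the #endif comment). The new guard is passed in explicitly because ADR-0972's table, not a derivation rule, is authoritative; `rename_guard` is a hypothetical helper, not a script in the tree.

```python
import re

# Matches the ISO-reserved upstream guard on its #ifndef line, e.g. __VMAF_PICTURE_H__.
_OLD_GUARD = re.compile(r"^#ifndef\s+(__VMAF_\w*__)\s*$", re.MULTILINE)


def rename_guard(header_text: str, new_guard: str) -> str:
    """Replace every use of the upstream __VMAF_*__ guard with `new_guard`.

    Returns the text unchanged if no ISO-reserved guard is present (already renamed).
    """
    if new_guard.startswith("_"):
        # Stricter than DCL37-C on purpose: reject any leading underscore.
        raise ValueError(f"refusing potentially reserved guard name: {new_guard}")
    match = _OLD_GUARD.search(header_text)
    if match is None:
        return header_text
    old_guard = re.escape(match.group(1))
    # The lookarounds keep longer identifiers that merely contain the guard intact.
    return re.sub(rf"(?<!\w){old_guard}(?!\w)", new_guard, header_text)
```

Running it twice is a no-op, so it is safe to apply across all six upstream-mirrored headers after every sync.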
Rust vmafx safe binding crate scaffold (2026-05-31)¶
Files touched: Cargo.toml (workspace), bindings/rust/vmafx/ (new crate).
Rebase impact: None. Netflix/vmaf has no Rust bindings upstream. The new crate is a pure addition under bindings/rust/, parallel to the existing vmafx-sys crate (ADR-0706). The workspace Cargo.toml gains one members entry; no upstream file is touched. Subsequent upstream syncs do not interact with this code.
If a future upstream PR adds a Rust workspace (extremely unlikely), the fork's bindings/rust/vmafx/ and bindings/rust/vmafx-sys/ paths must not collide with the upstream layout. As of n8.1 there is no precedent.
gRPC ScoreStream Phase 1 (2026-05-31)¶
Files touched: proto/vmafx.proto, gen/go/vmafx.pb.go, gen/go/vmafx_grpc.pb.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/AGENTS.md, pkg/score/grpc_client.go, pkg/score/grpc_client_test.go, pkg/score/AGENTS.md, docs/architecture/grpc-streaming.md, docs/architecture/index.md, docs/adr/0933-grpc-streaming-multi-frame-scoring.md, docs/adr/_index_fragments/0933-grpc-streaming-multi-frame-scoring.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, changelog.d/added/0933-grpc-streaming-phase1.md.
Rebase impact: None against upstream — this surface is entirely fork-local (Netflix/vmaf has no Go gRPC service). The proto package stays vmafx.v1; the unary Score / Health RPCs are unchanged. ScoreStream is purely additive. The Phase 1 server handler returns codes.Unimplemented after validating the opening StreamConfig.
If a future upstream port touches core/ in a way that changes the public C API consumed by pkg/libvmaf, the Phase 2 wiring of ScoreStream to libvmaf will need to mirror that change — but Phase 1 is server-stub-only and doesn't reach the C surface yet.
Native bash pre-commit hook (ADR-0924, 2026-05-31)¶
no rebase impact: all paths are fork-local — scripts/githooks/ (new directory), docs/development/pre-commit-hooks.md, docs/adr/0924-*.md, docs/research/0924-*.md, changelog.d/added/native-pre-commit-hooks.md. The Makefile changes rename hooks-install → install-hooks (with the old name kept as a legacy alias), in a fork-only target that upstream Netflix does not define. No upstream-mirrored file is touched.
Metal kernel parity tests round 3 (2026-05-31)¶
Files touched: core/test/meson.build, core/test/test_metal_integer_motion_parity.c (new), core/test/test_metal_float_motion_parity.c (new), core/test/test_metal_float_moment_parity.c (new), core/test/test_metal_float_ms_ssim_parity.c (new)
Rebase impact: None. Closes the per-kernel parity coverage gap for the remaining four Metal extractors after PR #351 (registration audit) and PR #379 (round-2 parity: motion_v2, integer_psnr, float_psnr, float_ssim). All four new files live under the existing fork-local enable_metal block in core/test/meson.build (the entire Metal backend is fork-added — ADR-0361 / ADR-0421 / ADR-0589 / T8-2a — and absent from upstream Netflix/vmaf). The block edit appended four new executable() + test() pairs immediately after the round-2 block (PR #379 has since merged); the surrounding endif boundaries are untouched so upstream syncs cannot conflict here.
If upstream ever ports a Metal backend, the test files would need re-pointing at the upstream kernel names; the synthetic-fixture + -ENODEV skip pattern carries forward unchanged.
vmaf-tune coverage push — lowest-covered modules (2026-05-31)¶
Files touched: tools/vmaf-tune/tests/test_coverage_push_lowcov_modules.py, changelog.d/added/vmaf-tune-coverage-push.md.
Rebase impact: None. tools/vmaf-tune/ is fork-only (no upstream Netflix counterpart); the new test file imports only public + underscore-prefixed seams that already existed in the package. The 92 added tests are pure unit-level (no subprocess / no ffmpeg / no ONNX / no GPU) and exercise documented error paths in uncertainty.py, _gop_common.py, proxy.py, predictor_features.py, benchmark.py, encoder_profile.py, and fast.py. If a future refactor renames any of the targeted internal helpers (_parse_fps, _run_probe_encode, _run_signalstats, _parse_frame_sizes, _mean, _resolve_baseline, _row_encode_fps, _row_score_fps, _resolve_model_path), update the corresponding import in this single test file.
Core MCP transport coverage push (2026-05-31)¶
Files touched: core/test/test_mcp_coverage.c (new), core/test/meson.build, changelog.d/added/core-mcp-coverage-push.md, docs/research/core-mcp-coverage-push-2026-05-31.md.
Rebase impact: None. The embedded MCP server (core/src/mcp/, core/include/libvmaf/libvmaf_mcp.h) is fork-only — upstream Netflix/vmaf has no MCP surface — so this test-only push is fully self-contained and never lands on a Netflix file. If upstream ever adds an MCP-shaped surface, treat the test as canonical fork-side coverage and reconcile by name. Companion: ADR-0108 deliverables in docs/research/core-mcp-coverage-push-2026-05-31.md.
phase3-subset-sweep readonly-view fix (2026-05-31)¶
Files touched: ai/scripts/phase3_subset_sweep.py, ai/tests/test_phase3_subset_sweep_unit.py.
Rebase impact: None — ai/scripts/phase3_subset_sweep.py is fork-original (Research-0027 Phase-3 tooling, no upstream Netflix analogue). The fix tightens an internal contract (_standardize_inplace now refuses read-only inputs and the caller forces a writeable copy via to_numpy(copy=True)); there is no public API change and no coupling to upstream files. Safe to carry through any upstream sync.
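A sketch of the tightened contract, assuming `_standardize_inplace` does per-column z-scoring (the note does not say what it computes). Under pandas Copy-on-Write, DataFrame.to_numpy() can hand back a read-only view, so the function checks `flags.writeable` up front instead of failing midway, and the caller passes `to_numpy(copy=True)`. The function name here is a stand-in, not the fork's code.

```python
import numpy as np


def standardize_inplace(x: np.ndarray) -> None:
    """Z-score each column of a float array in place; refuse read-only input."""
    if not x.flags.writeable:
        # Fail up front with a clear message instead of a mid-way numpy error.
        raise ValueError("standardize_inplace needs a writeable array; pass a copy")
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros rather than NaN
    x -= mean
    x /= std


# Caller side: force a writeable copy, e.g.
# features = df[columns].to_numpy(dtype=float, copy=True)
```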
GPU runtime error-path leak fixes (ADR-0960, 2026-05-31)¶
no rebase impact: all changes are in fork-local error paths of core/src/cuda/common.c (new fail_after_stream label between two existing labels) and core/src/picture_pool.c (one pthread_cond_signal call and two pic->priv = NULL assignments). No upstream Netflix/vmaf logic is altered. The new test file core/test/test_picture_pool_error_paths.c is wholly fork-added with no upstream counterpart.
queue PullWork rollback on post-update Get failure (2026-05-31, ADR-0961)¶
no rebase impact: pure Go controller-internal fix. cmd/vmafx-controller/queue/ is entirely fork-added (no upstream Netflix/vmaf equivalent); upstream syncs do not touch this subtree.
ai/src NaN propagation guards — eval.correlations + tune._read_best_metric (2026-05-31, ADR-0963)¶
Files touched: ai/src/vmaf_train/eval.py, ai/src/vmaf_train/tune.py, ai/tests/test_eval_correlations.py, ai/tests/test_tune_objective.py.
Rebase impact: None — ai/src/vmaf_train/ is entirely fork-local with no upstream Netflix/vmaf equivalent. No C surface is touched. No upstream coupling.
Helm chart seccompProfile + node-deployment image helper (2026-05-31, ADR-0969)¶
no rebase impact: both changes are entirely within deploy/helm/vmafx/, which is fork-added infrastructure with no upstream counterpart in Netflix/vmaf. Netflix upstream does not ship a Helm chart; upstream syncs never touch this directory. PR #439 (ADR-0930) has since merged cleanly on top (it modified values.yaml in a non-conflicting block and did not touch node-deployment.yaml).
MCP HTTP transport security hardening (2026-05-31, ADR-0967)¶
no rebase impact: changes are confined to the fork-local MCP server subtree (mcp-server/vmaf-mcp/). Netflix upstream has no MCP server; this entire subtree will never merge upstream. The security middleware, auth helpers, and bind-host resolver are fork-invented code with no upstream counterpart.
HIP kernel parity-test coverage round 4 (2026-05-31, ADR-0958)¶
Files touched: core/test/test_hip_ssimulacra2_parity.c, core/test/test_hip_float_ssim_parity.c, core/test/meson.build.
Rebase impact: Low — the 2 new tests are fork-added consumers of fork-added HIP feature extractors (ssimulacra2_hip, float_ssim_hip). Upstream Netflix has no HIP backend, so neither the test sources nor the meson registration block has an upstream-mirror analogue. The skip-on--ENOSYS contract matches the round-1/2/3 template (PR #351 / PR #372 / PR #443) — if upstream ever ships a HIP backend the tests can be kept verbatim; their CPU side calls only public C-API entry points (vmaf_init, vmaf_use_feature, vmaf_read_pictures, vmaf_feature_score_at_index, vmaf_close) that are upstream-stable.
The round-4 plan also covered speed_chroma_hip / speed_temporal_hip parity gates, but those were deferred when the container build surfaced a pre-existing latent link defect — the helpers speed_internal_init_dimensions / speed_internal_float_stride are declared in core/src/feature/speed_internal.h but never defined. The same defect blocks the analogous CUDA / SYCL speed-family TUs from linking (none are currently wired into their respective meson archives). A follow-up PR adding core/src/feature/speed_internal.c will unblock all three GPU backends simultaneously. Tracked as T-HIP-SPEED-INTERNAL-IMPL-MISSING-2026-05-31 in docs/state.md.
Companion: docs/adr/0958-hip-kernel-coverage-round4.md, docs/research/0958-hip-kernel-coverage-round4-2026-05-31.md, changelog.d/added/0958-hip-kernel-coverage-round4.md.
Controller infrastructure fixes — StreamJobs + reaper stop signal (2026-05-31, ADR-0962)¶
No rebase impact: all changes are confined to the fork-local controller package (cmd/vmafx-controller/) and the Queue interface in cmd/vmafx-controller/queue/queue.go. Netflix upstream does not own these paths (the controller is a Phase 4b addition, not a port of Netflix code). The nodes.Registry context-propagation change is entirely within fork-local code and has no interaction with libvmaf C sources.
vmaf_mcp_stop() idempotent (CAS instead of exchange) (2026-05-31)¶
Files touched: core/src/mcp/mcp.c, core/test/test_mcp_stop_idempotent.c, core/test/meson.build.
Rebase impact: None — core/src/mcp/ is fork-only (Netflix has no MCP surface). The fix replaces three atomic_exchange(running, 2) + dual-value-guard pairs with three atomic_compare_exchange_strong(expected=1, desired=2) calls, keeping the existing 3-state state machine semantics intact and matching the CAS pattern already used by vmaf_mcp_start_{stdio,uds,sse}. The new regression test (test_mcp_stop_idempotent.c) is also fork-only. Sync impact: no Netflix file references vmaf_mcp_* symbols.
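A minimal Python sketch of why compare-and-swap makes stop idempotent. Python has no C11 atomics, so `compare_exchange_strong` is emulated here with a lock (in C, `atomic_compare_exchange_strong` does the compare and the swap as one atomic step). Only the "1 = running, 2 = stopping" values come from the note above; the `IDLE = 0` state, the class, and the function names are hypothetical.

```python
import threading


class AtomicInt:
    """Emulates a C11 atomic_int; a lock makes each operation indivisible."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def store(self, value: int) -> None:
        with self._lock:
            self._value = value

    def compare_exchange_strong(self, expected: int, desired: int) -> bool:
        """Set the value to `desired` only if it currently equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = desired
                return True
            return False


# Hypothetical state encoding: 1 = running and 2 = stopping follow the note;
# 0 = idle is assumed for illustration.
IDLE, RUNNING, STOPPING = 0, 1, 2


def mcp_stop(state: AtomicInt) -> bool:
    """Return True only for the caller that wins RUNNING -> STOPPING.

    Any repeated or concurrent stop finds a value other than RUNNING, so the
    CAS fails and the call is a no-op. An unconditional exchange(STOPPING)
    would clobber IDLE and then need a guard on the old value to repair it.
    """
    if not state.compare_exchange_strong(RUNNING, STOPPING):
        return False
    # ... join worker threads, close the transport (elided) ...
    state.store(IDLE)  # assumed: return to idle once teardown completes
    return True
```

Calling `mcp_stop` twice on a running state returns `True` and then `False`; with several racing threads, exactly one caller wins the transition.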
compat/python-vmaf/ scanf + ProcessRunner locale fixes (2026-05-31, ADR-0955)¶
Files touched: compat/python-vmaf/tools/scanf.py, compat/python-vmaf/__init__.py, python/test/python_harness_scanf_locale_bugs_test.py (new fork-only test).
Rebase impact: Medium. Both fixes live inside the upstream-mirror tree (compat/python-vmaf/), so a future upstream sync may overwrite them.
tools/scanf.py::makeFormattedHandler.applyWidth — the upstream code has an inverted width guard:

```python
def applyWidth(handler):
    if width is None:
        return makeWidthLimitedHandler(handler, width, ignoreWhitespace=True)
    return handler
```
The fork swaps the branches so implicit-width converters return handler and explicit-width converters return the capped wrapper. When porting an upstream commit that re-touches this function, verify the swapped semantics are preserved. If Netflix has independently fixed the same bug, drop the fork delta and update ADR-0955's status to Superseded by upstream.
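For comparison, the fork-side branch order can be sketched as follows. This is an illustration, not the fork's code: `makeWidthLimitedHandler` is replaced by a hypothetical stand-in that simply truncates input to `width` characters, and `makeApplyWidth` is a hypothetical wrapper that supplies the enclosing `width` the real closure captures.

```python
from typing import Callable, Optional

Handler = Callable[[str], str]


def makeWidthLimitedHandler(handler: Handler, width: int, ignoreWhitespace: bool = False) -> Handler:
    """Hypothetical stand-in: feed at most `width` characters to `handler`."""
    def limited(text: str) -> str:
        if ignoreWhitespace:
            text = text.lstrip()
        return handler(text[:width])
    return limited


def makeApplyWidth(width: Optional[int]) -> Callable[[Handler], Handler]:
    """Mirror of the fork's closure; `width` comes from the parsed format spec."""
    def applyWidth(handler: Handler) -> Handler:
        if width is None:
            # Implicit width (e.g. "%d"): leave the converter unbounded.
            return handler
        # Explicit width (e.g. "%3d"): cap how much input the converter sees.
        return makeWidthLimitedHandler(handler, width, ignoreWhitespace=True)
    return applyWidth
```

With an identity converter, `makeApplyWidth(None)` passes `"12345"` through unchanged, while `makeApplyWidth(3)` yields `"123"`.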
__init__.py::ProcessRunner.run — upstream sets the C locale via env.setdefault("LC_ALL", "C") / env.setdefault("LANG", "C"). The fork replaces both setdefault calls with unconditional assignment (env["LC_ALL"] = "C" / env["LANG"] = "C") so a parent shell with a non-English LC_ALL / LANG cannot defeat the override. When porting an upstream commit that re-touches ProcessRunner.run, preserve the unconditional assignment pattern.
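The difference between the two patterns is easy to see in isolation. The function names below are hypothetical; in ProcessRunner the parent mapping would be the inherited environment and the result would be handed to the child process.

```python
from typing import Dict, Mapping


def upstream_env(parent: Mapping[str, str]) -> Dict[str, str]:
    """Upstream pattern: setdefault keeps any locale the parent already exported."""
    env = dict(parent)
    env.setdefault("LC_ALL", "C")
    env.setdefault("LANG", "C")
    return env


def fork_env(parent: Mapping[str, str]) -> Dict[str, str]:
    """Fork pattern: unconditional assignment always forces the C locale."""
    env = dict(parent)
    env["LC_ALL"] = "C"
    env["LANG"] = "C"
    return env
```

With a parent exporting `LC_ALL=de_DE.UTF-8`, `upstream_env` keeps the German locale (which, for a child that honours LC_NUMERIC, can mean a decimal comma in printed scores), whereas `fork_env` always yields `C`. Both copy the mapping, so the parent environment is never mutated.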
The regression test python/test/python_harness_scanf_locale_bugs_test.py exercises both code paths and will fail if either fix regresses during an upstream sync.
GPU dispatch-runtime host-only unit test (2026-05-31, ADR-0954)¶
Files touched: core/test/test_gpu_dispatch_runtime.c (new), core/test/meson.build.
Rebase impact: Low. The new test executable is fork-local — upstream Netflix/vmaf does not ship the gpu_dispatch_env, gpu_dispatch_parse, or per-backend dispatch_strategy TUs targeted by the test (those are all ADR-0181 / ADR-0461 / ADR-0483 fork additions). The wiring in core/test/meson.build lives in the fork-added test region near other test_* entries; no upstream collision is possible. If upstream ever adds dispatch-strategy abstractions of its own, the test would coexist by name.
Python harness coverage push round 2 (2026-05-31)¶
Files touched: python/test/python_harness_coverage_test.py (new — 82 cases).
Rebase impact: None. The new test file lives under python/test/, exercises only fork-touched modules under compat/python-vmaf/, and does not modify any Netflix golden assertAlmostEqual value (CLAUDE.md §8). Upstream Netflix has no analogue at the compat/ path (that subtree exists because of ADR-0700). When /sync-upstream runs, this file is fork-only and needs no re-baselining. Companion: PR #412 (test/compat-python-vmaf-coverage) round 1, PR #413 (fix/decorator-persist-encode); neither overlaps.
HIP ADM parity test feature-name + ENOSYS skip (ADR-0950, 2026-05-31)¶
Files touched: core/test/test_hip_adm_parity.c.
Rebase impact: None. Netflix/vmaf upstream has no HIP backend at all (HIP is a fork-exclusive backend per ADR-0212); the adm_hip extractor and its parity test exist only on this fork, so there is no upstream counterpart to reconcile during sync. Companion fix to ADR-0949 (motion3 sibling); both tests now follow the same two-axis (enable_hip × enable_hipcc) skip predicate. Companion docs: docs/adr/0950-hip-adm-parity-feature-name-and-enosys-skip.md, changelog.d/fixed/0950-test-hip-adm-parity-feature-name-and-enosys-skip.md.
go-services-coverage-round2 (2026-05-31)¶
Files touched: cmd/vmafx-tune/cmd/unit_internal_test.go, cmd/vmafx-tune/cmd/unit_internal_fixtures_test.go, cmd/vmafx-controller/grpc_server_test.go, cmd/vmafx-controller/queue/queue_extra_test.go, pkg/encoder/version_extract_test.go, changelog.d/added/go-services-coverage-round2.md.
Rebase impact: None. The Go cmd/ and pkg/ trees are wholly fork-added — upstream Netflix/vmaf has no Go layer. All new files are test-only and never enter the libvmaf C build, the Python harness, or the FFmpeg patch stack. No production code is touched, so the upstream rebase boundary is unaffected. The cmd/vmafx-controller grpc_server tests carry the //go:build cgo tag mirroring the production source file, so they compile only when cgo is enabled (matching the existing main_test.go invariant).
dev/Containerfile libvmaf → core path fix (2026-05-31, ADR-0966)¶
No rebase impact: pure path fix, no upstream coupling. dev/Containerfile is entirely fork-local and the only change is substituting three occurrences of the old source-directory name libvmaf/ with core/ following the ADR-0700 rename. If a future sync touches dev/Containerfile (unlikely — Netflix does not ship a dev container), re-run grep -n 'libvmaf/' dev/Containerfile to confirm no stale references were re-introduced by the merge. The library output name (libvmaf.so) and stage name (libvmaf-build) are intentionally preserved as references to the product, not the source directory.
CUDA kernel parity coverage round 3 (2026-05-31)¶
Files touched: core/test/test_cuda_float_psnr_parity.c, core/test/test_cuda_float_vif_parity.c, core/test/test_cuda_float_ms_ssim_parity.c, core/test/test_cuda_float_moment_parity.c, core/test/test_cuda_ssimulacra2_parity.c, core/test/meson.build (+5 executable() + test() blocks under the existing if get_option('enable_cuda') guard, suite ['fast', 'gpu']), docs/adr/0947-cuda-kernel-coverage-round3.md, docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line), docs/research/cuda-kernel-coverage-round3-2026-05-31.md, changelog.d/added/cuda-kernel-coverage-round3.md.
Rebase impact: None. All five test files are fork-local (test_cuda_*_parity.c pattern is fork-only; upstream Netflix/vmaf has no equivalent test scaffold). core/test/meson.build edits are additive blocks inside the existing enable_cuda guard — no upstream file in this region. If upstream Netflix adds new CUDA kernels with matching names (float_psnr_cuda, float_vif_cuda, float_ms_ssim_cuda, float_moment_cuda, ssimulacra2_cuda), the parity tests continue to work unchanged. If upstream adds new test files near test_integer_vif_cpu_cuda_parity (the closest neighbour in meson.build) the additive blocks may need re-anchoring — trivial 3-way merge.
PRs #351 (round 1) and #374 (round 2) both inserted test entries under the same enable_cuda guard in core/test/meson.build and have since merged; the sequential three-way merges resolved cleanly at landing time.
ADR template — optional supply-chain / SBOM / carbon sections (2026-05-31)¶
Files touched: docs/adr/0000-template.md, docs/adr/README.md
Rebase impact: None. Upstream Netflix/vmaf does not maintain an ADR template; the entire docs/adr/ tree is fork-local. The new optional sections (## Supply-chain impact, ## SBOM delta, ## Carbon / footprint) appear between ## Consequences and ## References. No upstream conflict surface.
vmafx-operator zap → slog uniformity (2026-05-31)¶
Files touched: cmd/vmafx-operator/main.go, cmd/vmafx-operator/internal/controller/suite_test.go, cmd/vmafx-operator/AGENTS.md, go.mod, go.sum.
Rebase impact: None against Netflix/vmaf (the operator is a fork-only Go package; upstream ships no Kubernetes operator). Rebase impact does exist against the kubebuilder v4 template itself: future scaffold upgrades will re-introduce sigs.k8s.io/controller-runtime/pkg/log/zap imports in main.go and suite_test.go. When re-running kubebuilder edit / operator-sdk init, re-apply the slog bridge:
- main.go: replace the zap.Options block with slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: ...}) passed through logr.FromSlogHandler.
- suite_test.go: replace zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(true)) with slog.NewTextHandler(GinkgoWriter, &slog.HandlerOptions{Level: slog.LevelDebug}) through logr.FromSlogHandler.
The cmd/vmafx-operator/AGENTS.md invariant #6 documents this; check it before merging any upstream-template re-sync PR.
MCP server cgo direct path Phase 1 (2026-05-31, ADR-0931)¶
Files touched: pkg/libvmaf/direct.go, pkg/libvmaf/errors.go, pkg/libvmaf/direct_test.go, pkg/libvmaf/errors_test.go, pkg/libvmaf/AGENTS.md, cmd/vmafx-mcp/impl.go, cmd/vmafx-mcp/impl_direct.go, cmd/vmafx-mcp/impl_direct_test.go, cmd/vmafx-mcp/AGENTS.md.
Rebase impact: None against Netflix upstream. The change is entirely fork-local: it adds a new in-process cgo scoring path (ScoreDirect, ValidateModel) to pkg/libvmaf/ (which does not exist upstream) and wires two MCP tool handlers (vmaf_score, describe_model) in cmd/vmafx-mcp/ (which also does not exist upstream) to take that path when VMAFX_MCP_DIRECT=1. The libvmaf public C ABI used (vmaf_init / vmaf_use_features_from_model / vmaf_read_pictures / vmaf_score_pooled / vmaf_model_load_from_path / vmaf_picture_alloc / vmaf_picture_unref / vmaf_model_destroy / vmaf_close) is the canonical entry-point set documented in core/include/libvmaf/; the upstream signatures change rarely and any rename would already break core/tools/vmaf.c, so this code rides along.
If upstream renames or removes any of those entry points, update pkg/libvmaf/direct.go to match, then run the unit suite (LD_LIBRARY_PATH=$(pwd)/core/build-cpu/src go test ./pkg/libvmaf/ ./cmd/vmafx-mcp/).
OpenTelemetry tracing full roll-out — ADR-0782 (2026-06-03)¶
Files touched: pkg/observability/otel_instruments.go (new), cmd/vmafx-controller/grpc_server.go, cmd/vmafx-node/executor.go, cmd/vmafx-node/main.go, cmd/vmafx-server/main.go, cmd/vmafx-mcp/main.go, cmd/vmafx-controller/queue/queue.go, deploy/grafana/vmafx-overview.json (new), deploy/helm/vmafx/templates/otel-collector-sidecar.yaml (new), docs/observability/otel.md (new), docs/adr/0782-otel-tracing.md (new).
Rebase impact: None on the Netflix/vmaf C tree. Entirely fork-local Go instrumentation. No upstream C surfaces touched. The otel_instruments.go file is a pure addition; span call-sites follow the StartSpan/EndSpan pattern and do not change function signatures. If a future port touches executor.go or grpc_server.go, the span stanzas are additive and do not conflict with upstream semantics.
OpenTelemetry traces + metrics — Phase 1 (2026-05-31)¶
Files touched: pkg/observability/otel.go (new), pkg/observability/otel_test.go (new), pkg/observability/AGENTS.md (new), cmd/vmafx-controller/main.go, cmd/vmafx-controller/grpc_server.go, docs/development/observability.md (new), docs/adr/0927-opentelemetry-traces-metrics-phase1.md (new), go.mod, go.sum.
Rebase impact: None on the Netflix/vmaf C tree. The change is entirely fork-local Go code under pkg/observability and cmd/vmafx-controller. Upstream Netflix/vmaf has no Go services, so there is no cross-repo file to reconcile on sync. The added OTel dependencies (go.opentelemetry.io/otel, otelgrpc, OTLP HTTP exporters) live in go.mod and do not touch the C build.
When Phase 2 wires OTel into vmafx-node / vmafx-server / vmafx-mcp / vmafx-tune, follow the call-site pattern documented in pkg/observability/AGENTS.md (the 5 s bounded shutdown is mandatory). Each subsequent service ships as its own PR with its own ADR.
mkdocs ADR nav restructure + by-tag generator (2026-05-31)¶
Files touched: mkdocs.yml, scripts/docs/generate-adr-nav.sh, scripts/docs/generate-adr-by-tag.sh, docs/adr/by-tag/*.md (auto-generated, 443 files), docs/adr/0937-mkdocs-nav-decade-buckets.md, docs/adr/_index_fragments/0937-*.md, docs/adr/_index_fragments/_order.txt (append).
Rebase impact: None. All files are fork-only:
- Upstream Netflix/vmaf has no mkdocs.yml, no docs/adr/ tree, and no scripts/docs/ directory.
- The sentinel-bounded splice region in mkdocs.yml (# >>> ADR-NAV-GENERATED / # <<< ADR-NAV-GENERATED) is fork-local and unaffected by any upstream doc reorganisation.
- The docs/adr/by-tag/ tree is regenerated by scripts/docs/generate-adr-by-tag.sh --write; on every ADR add / edit / tag-edit, re-run the script (or rely on the --check CI gate once wired into .github/workflows/docs.yml).
If the per-hundred bucket labels in LABELS inside scripts/docs/generate-adr-nav.sh drift away from the actual bucket themes (e.g., the 0800s and 0900s fill out with a clear topic), edit the dict and re-run --write.
BuildKit cache mounts on container build matrix (2026-05-31)¶
Files touched: Dockerfile, docker/Dockerfile.production-gpu, dev/Containerfile, Dockerfile.go-server, docs/adr/0923-buildkit-cache-mounts.md, changelog.d/changed/buildkit-cache-mounts.md.
Rebase impact: None. These four Dockerfiles are fork-local (Netflix's upstream has only the top-level Dockerfile which we already heavily customise; the production-gpu / dev / go-server trio are wholly fork-added). The change introduces three patterns worth preserving across rebases:
- # syntax=docker/dockerfile:1.7 header at the top of each file.
- RUN --mount=type=cache,target=/var/cache/apt,sharing=locked --mount=type=cache,target=/var/lib/apt,sharing=locked apt-get ... on every apt invocation, with the matching rm -rf /var/lib/apt/lists/* cleanup REMOVED.
- RUN --mount=type=cache,target=$CCACHE_DIR,sharing=locked CCACHE_DIR=... <build command> around every meson/ninja/cmake invocation; ccache installed as a build dependency; FFmpeg gets --cc='ccache gcc' --cxx='ccache g++'; cmake gets -DCMAKE_{C,CXX}_COMPILER_LAUNCHER=ccache.
If upstream Netflix adds new RUN apt-get install lines to the top-level Dockerfile, prepend the apt cache mount pair. If they add new C/C++ compile steps, wrap them with the ccache mount + env var.
The vmaf user uid/gid is now explicitly pinned to 1000 in dev/Containerfile so BuildKit --mount=...,uid=1000,gid=1000 directives resolve to the same identity that runs the build — preserve that pin on rebase.
Pre-existing test failures across ai/, vmaf-tune, mcp-server (2026-05-30)¶
Files touched: ai/tests/conftest.py, ai/tests/test_codec_aware_fr.py, ai/tests/test_dnn_exporter_run_provenance.py, ai/tests/test_export_roundtrip.py, ai/tests/test_qat_smoke.py, ai/tests/test_registry.py, ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py, ai/tests/test_train_fr_regressor_v3.py, ai/tests/test_tune_cli.py, ai/tests/test_variance_mode.py, ai/tests/test_conftest_pytorch_lightning_guard.py (new), ai/pyproject.toml, tools/vmaf-tune/src/vmaftune/ladder.py, tools/vmaf-tune/tests/test_ladder.py, mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py, mcp-server/vmaf-mcp/tests/test_http_transport.py.
Rebase impact: None. All three touched subsystems are fork-local:
- ai/ — entirely fork-added (tiny-AI training); upstream Netflix/vmaf has no Python training package.
- tools/vmaf-tune/ — fork-added recommendation tool; upstream has no equivalent.
- mcp-server/vmaf-mcp/ — fork-added MCP JSON-RPC server; upstream has no equivalent.
No cross-repo conflict possible. The requires_pytorch_lightning() helper in ai/tests/conftest.py is a generic environment-probe pattern that will keep working unchanged for any future torch / torchvision / torchmetrics ABI drift; the only knob to revisit is whether to widen the broad except Exception if some future failure mode warrants more specific handling.
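A minimal sketch of such an environment probe, assuming it only tries the import and reports whether it succeeded. This is an illustration of the documented intent, not the fork's helper (the `module_name` parameter is hypothetical). The broad `except Exception` matters because torch-family ABI drift can surface at import time as `RuntimeError`, `AttributeError` or `OSError`, not only `ImportError`.

```python
import importlib


def requires_pytorch_lightning(module_name: str = "pytorch_lightning") -> bool:
    """Return True if `module_name` imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except Exception:  # deliberately broad: ABI drift is not always ImportError
        return False
    return True
```

A test module could then gate itself with `pytest.mark.skipif(not requires_pytorch_lightning(), reason="pytorch_lightning unavailable")`, so a broken install skips rather than errors at collection.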
Unified Python test orchestrator — top-level noxfile.py (2026-05-31, ADR-0914)¶
Files touched: noxfile.py (new), docs/development/python-test-orchestrator.md (new), docs/adr/0914-unified-python-test-orchestrator.md (new), docs/adr/_index_fragments/0914-unified-python-test-orchestrator.md (new), docs/adr/_index_fragments/_order.txt, docs/research/0914-python-test-orchestrator-audit-2026-05-31.md (new), changelog.d/added/0914-unified-python-test-orchestrator.md (new).
Rebase impact: None. The orchestrator is entirely fork-local — upstream Netflix/vmaf ships only the python/ legacy harness and its python/tox.ini, neither of which this change modifies. The new noxfile.py lives at repo root, a path upstream does not occupy. If upstream ever adds its own noxfile.py, treat the conflict as fork-takes-priority: our file delegates to upstream's python/tox.ini via the python_harness session, so behaviour is preserved.
clang-tidy modernize-* family enablement (2026-05-31)¶
Files touched: .clang-tidy, core/src/feature/feature_collector.cpp, core/src/metadata_handler.cpp.
Rebase impact: Low. .clang-tidy is fork-local; upstream Netflix does not ship one. feature_collector.cpp is fork-renamed from upstream .c under ADR-0725-family migrations — if an upstream sync brings a new .c patch that touches feature_collector, the patch likely applies cleanly to the .cpp (extern "C" linkage is preserved) but should be replayed in the C++ idiom (nullptr not NULL, <cstring> not <string.h>). metadata_handler.cpp is wholly fork-local with no upstream counterpart.
When syncing: keep the four -modernize-* opt-outs in .clang-tidy (noise / C-ABI hostility rationale documented in ADR-0915). If upstream ever ships their own clang-tidy config, merge by union — drop our opt-outs only with an explicit ADR.
cargo-deny supply-chain policy (2026-05-31)¶
Files touched: deny.toml (new), .github/workflows/rust-ci.yml (new cargo-deny job + deny.toml / core/src/feature/rust/** path filters), core/src/feature/rust/tad/Cargo.toml (publish = false).
Rebase impact: None against upstream Netflix/vmaf — deny.toml, the cargo-deny CI job, and the Rust workspace itself are all fork-local additions. Upstream does not maintain a Rust workspace, so no merge surface exists. The publish = false change to core/src/feature/rust/tad/Cargo.toml is also fork-local (core/src/feature/rust/ is an ADR-0707 pilot directory that does not exist upstream).
If a future upstream sync starts shipping a Rust workspace of its own, reconcile by extending deny.toml's [graph] members implicit-include behaviour (cargo-deny picks up workspace members automatically) and audit whether upstream's choice of licenses / banned-crate stance differs from ours. See ADR-0917.
Pixel-format edge coverage test (2026-05-31)¶
Files touched: core/test/test_pixel_format_edge_coverage.c (new), core/test/meson.build (one executable + one test() registration).
Rebase impact: Low. The new test file is wholly fork-local and only links against the public extractor / picture / collector C surface (no internal-source #include). If upstream Netflix renames any of the API entry points the test uses (vmaf_get_feature_extractor_by_name, vmaf_feature_extractor_context_create / _extract / _close / _destroy, vmaf_feature_collector_init / _get_score / _destroy, vmaf_picture_alloc / _unref), update the test accordingly. The meson.build additions sit between the existing test_psnr block and test_framesync; no upstream core/test/meson.build reordering should conflict, since the inserted block is immediately adjacent to fork-only neighbours.
See ADR-0912.
ADR README drift sweep (2026-05-31)¶
Files touched: docs/adr/README.md, docs/adr/_index_fragments/_order.txt, 35 new + 7 rewritten files under docs/adr/_index_fragments/[0-9]*.md, 3 orphan fragments removed under docs/adr/_index_fragments/, changelog.d/fixed/adr-readme-regen.md.
Rebase impact: None. The fragment tree and README.md are entirely fork-local (upstream Netflix/vmaf has no ADR directory). The sweep only re-aligns three fork-local index sources against the already-authoritative docs/adr/[0-9]*-*.md ADR file set, with no content changes to any ADR body. Future regenerations are mechanical via scripts/docs/concat-adr-index.sh --write.
codespell sweep + .codespellrc (2026-05-31)¶
Files touched: .codespellrc (new), CONTRIBUTING.md, docs/metrics/cambi.md, docs/adr/0910-codespell-sweep-config.md (new), changelog.d/fixed/codespell-sweep.md (new).
Rebase impact: Low. .codespellrc skip-list explicitly excludes every Netflix-author / vendored / upstream-mirrored file enumerated in ADR-0910 §Context (e.g. compat/python-vmaf/*, python/test/*, core/src/feature/{x86,arm64,cuda,hip,common,metal}/*, core/src/svm.cpp, core/src/pdjson.c, core/tools/y4m_input.c, core/tools/cli_parse.c, core/README.md, core/tools/README.md, core/test/test_picture.c), so re-running codespell after a sync surfaces only newly-introduced fork typos. If upstream lands new files under the skipped trees that the fork later adopts as fork-local (e.g. a new feature extractor we then modify), drop the matching skip row and re-run codespell to catch any latent typos.
If upstream changes path layout (rename core/ back to libvmaf/, etc.), update the skip-list paths in .codespellrc to match. ignore-words-list is independent of upstream layout.
Re-run: codespell --config .codespellrc (or just codespell from the repo root — picks up .codespellrc automatically). Expected output: no findings on a clean tree.
.gitignore staleness audit (ADR-0905, 2026-05-30)¶
Files touched: .gitignore, python/.gitignore.
Rebase impact: None. Both files are fork-local (the rules trimmed or rewired all originate from fork additions and the post-ADR-0700 directory rename). Upstream Netflix/vmaf maintains its own .gitignore independently; the matlab MEX block, the Cython adm_dwt2_cy block, and the legacy python/.gitignore scope were fork-only artefacts of the rename and never tracked upstream. On the next /sync-upstream, Netflix's .gitignore will merge cleanly because the trimmed rules (.gradle/, .pypirc) and the rewired matlab paths (compat/python-vmaf/matlab/**/*.mex*) do not overlap any upstream rule.
cpp const/noexcept/nodiscard annotation sweep (2026-05-30)¶
Files touched: core/src/dict.cpp, core/src/feature/feature_collector.cpp, core/src/feature/feature_name.cpp, core/src/fex_ctx_vector.cpp, core/src/opt.cpp.
Rebase impact: None. All annotations are added to fork-local TU-internal static helpers and one TU-local lambda in C++23 files that were introduced by the ADR-0723 / ADR-0727 / ADR-0729 / ADR-0731 C++ migration waves. The extern "C" public-ABI entry points are untouched, so no upstream header rebase is affected. When new fork-only C++ static helpers are added, apply the same [[nodiscard]] / noexcept discipline so the lint posture stays uniform.
libvmaf-public-header-doc-gaps-round3 (2026-05-30)¶
Files touched:
- core/include/libvmaf/picture.h (doc comments on enum + opaque typedef + 2 entry points, plus NOLINT-cited include guard)
- core/include/libvmaf/libvmaf.h (doc comments on 2 enums + opaque typedef + 1 struct, plus NOLINT-cited include guard)
- core/include/libvmaf/libvmaf_cuda.h (doc comments on opaque typedef + config struct + enum + 1 picture-config struct, plus NOLINT-cited include guard)
Rebase impact: Low. The doc-comment additions land above unchanged upstream-mirror declarations; any future Netflix upstream that touches the same function signatures, enum bodies, or struct definitions will produce a tractable 3-way merge — the doc text is fork-local and git merge will preserve our /** ... */ block above whatever upstream rewrites the declaration to. No identifier renames; no ABI/source impact.
The NOLINT annotations on __VMAF_H__ / __VMAF_PICTURE_H__ / __VMAF_CUDA_H__ are inline comments only — they do not alter the include guard symbols themselves, so upstream's preprocessor identity remains intact. Same pattern PR #327 (round 2) used for feature.h / model.h / dnn.h. If a future upstream sync changes the guard form (unlikely — these have been stable for years), the NOLINT cites become redundant and can be removed in a follow-on cleanup.
libvmaf-public-header-doc-gaps-round2 (2026-05-30)¶
Files touched:
- core/include/libvmaf/feature.h (doc comments + NOLINT-cited guard)
- core/include/libvmaf/model.h (doc comments + NOLINT-cited guard)
- core/include/libvmaf/dnn.h (vmaf_dnn_session_close doc + NOLINT-cited guard)
Rebase impact: Low. The doc-comment additions land above unchanged upstream-mirror declarations; any future Netflix upstream that touches the same function signatures will produce a tractable 3-way merge — the doc text is fork-local and git merge will preserve our /** ... */ block above whatever upstream rewrites the signature to. No identifier renames; no ABI/source impact.
The NOLINT annotations on __VMAF_FEATURE_H__ / __VMAF_MODEL_H__ / __VMAF_DNN_H__ are inline comments only — they do not alter the include guard symbols themselves, so upstream's preprocessor identity remains intact. If a future upstream sync changes the guard form (unlikely — these have been stable for years), the NOLINT cites become redundant and can be removed in a follow-on cleanup. No source or binary symbol renames; consumers of the patch stack (ffmpeg-patches/) and the Go/Rust bindings see identical declarations.
Bash strict-mode + trap-cleanup sweep (2026-05-30, ADR-0899)¶
Files touched: scripts/run_unittests.sh, scripts/ai/fetch-tiny-blobs.sh, dev/scripts/smoke-probe-loop.sh, scripts/ci/check-agent-worktree-drift.sh, scripts/ci/test_check_agent_worktree_drift.sh, scripts/ci/check-adr-numbering.sh, scripts/ci/check-dispatch-registry.sh, scripts/adr/next-free.sh, tools/ensemble-training-kit/_platform_detect.sh.
Rebase impact: None. All 9 files are fork-local (Netflix upstream has neither scripts/adr/, scripts/ci/check-*-drift*, scripts/ai/fetch-tiny-blobs.sh, dev/scripts/smoke-probe-loop.sh, tools/ensemble-training-kit/, nor the in-tree scripts/run_unittests.sh in this form). No conflict risk on sync-upstream.
Conflict watchpoints (none expected): if a future upstream sync introduces a Netflix-side scripts/run_unittests.sh, the strict-mode set -eu block at the top of our version is the only carrier of fork-specific behaviour and trivially survives a 3-way merge.
Metal kernel parity tests round 2 (2026-05-30)¶
Files touched: core/test/meson.build, core/test/test_metal_motion_v2_parity.c (new), core/test/test_metal_integer_psnr_parity.c (new), core/test/test_metal_float_psnr_parity.c (new), core/test/test_metal_float_ssim_parity.c (new)
Rebase impact: None. All four new files live under the existing fork-local enable_metal block in core/test/meson.build (the entire Metal backend is fork-added — ADR-0361 / ADR-0421 / ADR-0589 — and absent from upstream Netflix/vmaf). The block edit appends four new executable() + test() pairs immediately after the test_metal_install_header block; the surrounding endif boundaries are untouched so upstream syncs cannot conflict here.
If upstream ever ports a Metal backend, the test files would need re-pointing at the upstream kernel names; the synthetic-fixture + -ENODEV skip pattern from test_sycl_motion3_parity.c carries forward unchanged.
.claude/skills/ — ADR-0700 path drift cleanup (2026-05-30)¶
Files touched: .claude/skills/add-gpu-backend/scaffold.sh, .claude/skills/build-vmaf/build.sh, .claude/skills/build-vmaf/SKILL.md, .claude/skills/regen-docs/SKILL.md, .claude/skills/add-simd-path/templates/simd_feature.c.template
Rebase impact: None. Files are entirely fork-local (the .claude/ tree does not exist upstream — see ADR-0331 / ADR-0700). The change rewrites four residual libvmaf/ source-tree references to core/ to match the post-ADR-0700 layout. Public install-path references (core/include/libvmaf/..., libvmaf.so) are unchanged.
When syncing from upstream Netflix/vmaf, these files need no attention; the conflict surface is empty.
ADR-0871 — SSIM SIMD dispatch pthread_once guard — 2026-05-30¶
Low rebase impact. The fix sits in two fork-added zones:
- core/src/feature/iqa/ssim_tools.c — the file is a Tom Distler BSD-2011 import, but the four globals (g_ssim_precompute, g_ssim_variance, g_ssim_accumulate, g_iqa_convolve), the setter functions, and the new iqa_ssim_install_dispatch_once helper are fork additions (Distler's 2011 import has no SIMD dispatch). The pthread_once guard and atomic-installer publish are appended to the existing fork-added block. A future re-import of Tom Distler's IQA would not collide because the new code lives in fork-added territory.
- core/src/feature/iqa/ssim_simd.h — fork-added header (Netflix/vmaf has no equivalent); appends one declaration.
- core/src/feature/float_ssim.c and core/src/feature/float_ms_ssim.c — the dispatch-install bodies are fork additions; the change factors them into a callback and routes the call through the once-helper. The Netflix-upstream init() bodies are unchanged beyond the dispatch block, so a future upstream change to the init() prologue would merge cleanly.
Fork-local files: core/src/feature/iqa/ssim_tools.c (fork-added dispatch zone), core/src/feature/iqa/ssim_simd.h (fork-added header), core/src/feature/float_ssim.c (fork-added SIMD-install block), core/src/feature/float_ms_ssim.c (fork-added SIMD-install block), docs/adr/0871-ssim-dispatch-pthread-once.md, docs/research/tsan-race-audit-2026-05-30.md, changelog.d/fixed/tsan-race-audit.md.
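The run-once install pattern behind iqa_ssim_install_dispatch_once can be sketched in Python as follows. This is an analogy under stated assumptions: `dispatch`, `install_dispatch_once` and the installer callback are hypothetical stand-ins for the C globals and helper, and a lock plus a done-flag plays the role of pthread_once().

```python
import threading
from typing import Callable, Dict

# Hypothetical stand-in for the fork's dispatch globals
# (g_ssim_precompute, g_ssim_variance, g_ssim_accumulate, g_iqa_convolve).
dispatch: Dict[str, Callable[[], str]] = {}

_once_lock = threading.Lock()
_once_done = False


def install_dispatch_once(installer: Callable[[Dict[str, Callable[[], str]]], None]) -> None:
    """Run `installer` exactly once across all threads, like pthread_once().

    Callers that arrive while the first install is in progress block on the
    lock, so nobody observes a half-written table; later callers see the
    done-flag and return without re-running the installer.
    """
    global _once_done
    with _once_lock:
        if _once_done:
            return
        installer(dispatch)
        _once_done = True
```

If many extractor `init()` calls race, the installer still runs once, which is what removes the data race on the shared function pointers.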
sanitizer-pass-cleanup (2026-05-30, ADR-0869)¶
Files touched:
- core/src/feature/cambi.c — adds two int shadow slots (window_size_opt, max_log_contrast_opt) to CambiState; the options table targets them; init() copies into the existing uint16_t runtime fields.
- core/src/feature/x86/adm_avx2.c — moves the uint32_t cast inside the shift in four DWT2 filter-packing expressions.
- core/src/feature/x86/adm_avx512.c — same as AVX2.
Rebase impact:
- CAMBI: upstream Netflix's CambiState does not have the _opt shadow slots. On upstream sync, expect a context conflict on the struct definition and on the two option-table entries. Resolution is to keep the fork's shadow slots and the init-bridge assignments; upstream's option entries should be re-pointed at the _opt shadows.
- ADM AVX2/AVX-512: the four filter-packing expressions are upstream-mirrored code. On upstream sync, a textual conflict is possible at every occurrence; the fork's resolution is the inside-cast (((uint32_t)filter[k] << 16)). Bit-exact with upstream output; safe to keep.
Verified clean under ASan+UBSan against the full unit-test suite (63 tests OK) and the vmaf CLI on 4:2:0 8-bit, 4:2:2 10-bit, 4:2:0 12-bit. Cambi tuned-options feature-name derivation (cambi_mlc_3_ws_63) works.
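Why the ADM inside-cast is bit-exact: in C, left-shifting a negative signed int is undefined behaviour (the kind of thing UBSan reports), whereas shifting a uint32_t is defined modulo 2^32. The Python sketch below models both orderings with explicit 32-bit masking and checks that they agree over the whole signed 16-bit range; that coefficient range, and the "shift-then-cast" form of the old expression, are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def as_uint32(x: int) -> int:
    """C conversion of an integer to uint32_t: reduce modulo 2**32."""
    return x & MASK32


def shift_then_cast(coeff: int) -> int:
    """Models (uint32_t)(coeff << 16): the shift happens on a signed int.

    Undefined in C for negative coeff; the value computed here is what a
    two's-complement compiler typically produces in practice.
    """
    return as_uint32(coeff << 16)


def cast_then_shift(coeff: int) -> int:
    """Models ((uint32_t)coeff << 16): well-defined unsigned arithmetic."""
    return as_uint32(as_uint32(coeff) << 16)
```

For example, `cast_then_shift(-1)` is `0xFFFF0000`, the same bit pattern the old expression produced on two's-complement targets, but now without relying on undefined behaviour.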
SIMD bit-exactness round-2 — SSIMULACRA 2 FMA unification + lib-FP-model extension (2026-05-30, ADR-0891)¶
Files touched: core/src/meson.build, core/src/feature/x86/ssimulacra2_avx2.c, core/src/feature/x86/ssimulacra2_avx512.c, core/test/test_ssimulacra2_simd.c.
Rebase impact: Low — SSIMULACRA 2 is fork-added (no upstream coupling) and the meson helper _libvmaf_feature_icx_args mirrors the existing _x86_simd_strict_fp_extra pattern from ADR-0339 (round-1). If upstream Netflix ever adds an intel-llvm build matrix and ships scalar references inside libvmaf_feature_static_lib that participate in SIMD bit-exactness tests, reuse _libvmaf_feature_icx_args rather than minting a new helper. The FMA-based picture_to_linear_rgb colour matrix is fully self-contained inside the SSIMULACRA 2 TUs; no upstream Netflix file references those symbols. Companion: docs/adr/0891-simd-bit-exact-round2-fmaf-libvmaf-feature-icx.md, changelog.d/fixed/0891-simd-bit-exact-round2.md.
SIMD strict-FP flags for icx (2026-05-30)
Files touched: core/src/meson.build, core/test/meson.build, core/src/feature/AGENTS.md
Rebase impact: Low. The changes add an icx-specific compile flag (-fp-model=precise) to x86 SIMD carve-out static libs and to the three SIMD bit-exactness test executables (test_psnr_hvs_simd, test_ms_ssim_decimate, test_ssimulacra2_simd). The flag is added only when cc.get_id() returns 'intel-llvm' or 'intel-llvm-cl', so GCC and vanilla Clang builds are unaffected.
If upstream Netflix adds new SIMD carve-out static libs, apply the same _x86_simd_strict_fp_extra pattern to them so the icx build stays green. If Netflix adds new SIMD test executables that compare a scalar reference against SIMD output, add _simd_strict_fp_args to their c_args.
Coverage Gate ORT accessor coverage (2026-05-30)
Files touched: core/test/dnn/test_ort_internals.c, changelog.d/fixed/coverage-gate-ort-backend-accessor.md.
Rebase impact: None. The added test exercises a fork-only public accessor (vmaf_ort_output_name_at) on a fork-only file (core/src/dnn/ort_backend.c); the test TU itself is fork-only under ADR-0112's testability surface. Upstream Netflix/vmaf has no ORT backend, so there is no cross-repo file to reconcile on sync. The ADR-0114 per-file floor override (PER_FILE_MIN["core/src/dnn/ort_backend.c"]=78) stays in place. Raising covered lines from 409 to 413 of 526 (78.5 %) restores the per-file safety margin after PR #129 grew the denominator with unreachable error-handling code.
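The coverage figures can be checked with integer arithmetic. This sketch assumes the gate passes a file when covered * 100 >= PER_FILE_MIN * total, and that the 409-line figure was measured against the same 526-line denominator. The helper names are hypothetical and are not the gate's actual code.

```c
/* Hypothetical helpers mirroring a per-file line-coverage floor check. */

/* Coverage percentage, for display only. */
static double coverage_pct(int covered, int total)
{
    return 100.0 * covered / total;
}

/* Exact integer comparison: covered / total >= floor_pct / 100. */
static int meets_floor(int covered, int total, int floor_pct)
{
    return (long)covered * 100 >= (long)floor_pct * total;
}

/* Smallest covered-line count that still passes the floor (ceiling division). */
static int min_covered_for_floor(int floor_pct, int total)
{
    return (int)(((long)floor_pct * total + 99) / 100);
}
```

With a floor of 78 and 526 lines, 411 covered lines are the minimum. Under these assumptions, 413 leaves a two-line margin, while 409 (about 77.8 %) fails.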
unused-testdata-debug-scripts-cleanup (2026-05-30, ADR-0880)
Files touched: testdata/check_borders.py (deleted), testdata/compare_a380.py (deleted), testdata/scores_sycl_b580_576_mq.json (deleted).
Rebase impact: None. All three files were fork-added and not present in upstream Netflix/vmaf. No upstream patch context references them. Future /sync-upstream runs will not surface any conflicts on these paths.
trivy-container-scan-baseline (2026-05-30, ADR-0878)
Files touched: docker/Dockerfile.production, docker/Dockerfile.production-gpu
Rebase impact: None. Both files are fork-added (no upstream Netflix/vmaf equivalents — Netflix ships no production Dockerfile). The added USER nonroot:nonroot directive on each final stage will not conflict on any future upstream sync. If upstream ever publishes their own Dockerfile, the fork's containers stay separate (the GHCR namespace is vmafx/).
go-nilness-staticcheck-audit (2026-05-30)
Files touched: cmd/vmafx-server/{main.go,http_server.go}, cmd/vmafx-controller/{main.go,http_server.go}, cmd/vmafx-mcp/impl.go, cmd/vmafx-node/main_test.go, pkg/ai/infer_test.go, pkg/bisect/bisect_test.go.
Rebase impact: None. Every modified file is fork-original Go code under cmd/vmafx-* / pkg/*; Netflix/vmaf upstream does not ship Go code in these paths. No upstream conflict possible.
iwyu-audit (2026-05-30) — fork-only files, append-only direct includes
Files touched: 16 fork-authored sources under core/src/feature/, core/src/feature/x86/, core/test/, core/tools/.
Rebase impact: None. All modified files carry the Lusoris-only license header (filtered explicitly during scope selection — files with a Netflix header were skipped to preserve upstream-parity per CLAUDE.md §12 r12). The diff consists of removing dead #include directives and adding direct includes for symbols previously reached transitively. Upstream Netflix/vmaf does not contain any of these files in the form modified here, so there is no conflict surface for a future sync-upstream to navigate.
Follow-up: A second-phase IWYU pass on core/src/dnn/*, core/src/{cuda,sycl,hip,vulkan}/, and the DNN-gated feature extractors is owed (the host CPU-only build cannot exercise VMAF_HAVE_DNN because ONNX Runtime is not installed locally). That pass will run inside the vmaf-dev-mcp container per CLAUDE.md §12 r15.
magic-number-audit cert-int07c (2026-05-30, ADR-0874)
Files touched: core/src/mcp/{mcp_internal.h,mcp.c,compute_vmaf.c,transport_sse.c}, core/src/picture.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c.
Rebase impact: Low. The core/src/mcp/* files and core/src/cuda/picture_cuda.c are fork-added; upstream Netflix/vmaf has neither MCP nor a CUDA picture allocator with these bounds. core/src/picture.c and core/src/libvmaf.c are fork-mirrored. The renames there touch fork-added helpers (dnn_*_output_feature_name) and the fork's VMAF_PIC_BPC_{MIN,MAX} hardening, which was originally a fork-local guard against bpc < 8 || bpc > 16. If a future upstream sync re-introduces a raw 8/16 predicate on those lines, keep the fork's named constants. They carry the same values, so keeping them does not affect bit-exactness or behaviour. No new public C-API symbols are introduced.
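If a sync conflict re-introduces the raw predicate, the resolved form should read roughly like this sketch. The values come from the guard quoted above (bpc < 8 || bpc > 16). This entry does not state whether the fork spells VMAF_PIC_BPC_MIN and VMAF_PIC_BPC_MAX as macros or enum constants, and vmaf_pic_bpc_in_range() is a hypothetical helper name.

```c
#include <stdbool.h>

/* Named bounds replacing the magic numbers 8 and 16. */
#define VMAF_PIC_BPC_MIN 8
#define VMAF_PIC_BPC_MAX 16

/* Same predicate as the original raw guard, expressed with named constants:
 * rejects bpc < 8 || bpc > 16. */
static bool vmaf_pic_bpc_in_range(unsigned bpc)
{
    return bpc >= VMAF_PIC_BPC_MIN && bpc <= VMAF_PIC_BPC_MAX;
}
```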
eintr-and-io-error-audit (2026-05-30, ADR-0872)
Files touched: core/src/mcp/transport_stdio.c, core/src/mcp/transport_uds.c, core/src/libvmaf.c, core/src/feature/cambi.c, core/src/sycl/dmabuf_import.cpp, core/tools/vmaf_vpl.c.
Rebase impact: Low. The MCP transports are fully fork-local (no upstream peer). libvmaf.c, cambi.c, and vmaf_vpl.c carry fork-local hunks (vmaf_write_output, heatmaps close() fail-path, VPL VA-API init) that are already non-shared with upstream — the new (void) casts sit inside those hunks. dmabuf_import.cpp is wholly fork-added (no upstream file). No upstream conflict expected on the next sync; if Netflix ever adds their own MCP transport, the EINTR retry pattern should be ported there too.
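The EINTR retry pattern that the note suggests porting can be sketched in POSIX C. write_all() is a hypothetical name and is not the fork's transport code. It retries only when write() fails with EINTR, and it continues after short writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes to fd. Returns 0 on success or -errno on failure.
 * write() fails with EINTR only when a signal arrives before any byte is
 * transferred, so retrying is safe; a short write is continued from where it
 * stopped. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

On fail paths where a cleanup result cannot be acted on, casting the call to void (for example (void)close(fd);) documents that the result is ignored on purpose. This matches the audit's convention.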
adr-0100-per-surface-doc-audit (2026-05-30)
Files touched: docs/development/build-flags.md, docs/api/dnn.md, docs/usage/cli.md, changelog.d/added/adr-0100-per-surface-doc-audit.md.
Rebase impact: None. All four files are fork-added (the upstream Netflix/vmaf tree has no docs/development/build-flags.md, no docs/api/dnn.md, no docs/usage/cli.md at the fork's depth, and no changelog.d/). The audit closes per-surface doc gaps for fork-local surfaces (codec-context DNN API, codec/preset/CRF/resize CLI flags, six Meson options) that originated in fork ADRs (ADR-0335, ADR-0361, ADR-0519, ADR-0550, ADR-0568, ADR-0623, ADR-0707, ADR-0726). No upstream file is touched; no rebase conflict possible.
go-pkg-coverage-push (2026-05-30)
Files touched: pkg/observability/observability_test.go, pkg/report/report_test.go, pkg/encoder/discover_test.go, pkg/libvmaf/paths_test.go, pkg/gpu/parsers_test.go, pkg/gpu/probe_shim_test.go, pkg/bisect/parse_test.go, pkg/storage/internals_test.go, changelog.d/added/go-pkg-coverage-push.md.
Rebase impact: None. The Go pkg/ tree is wholly fork-added — upstream Netflix/vmaf has no Go layer. All new files are test-only and never enter the libvmaf C build, the Python harness, or the FFmpeg patch stack. No production code is touched, so the upstream rebase boundary is unaffected.
python-type-annotations-audit (2026-05-30)
Files touched: ai/src/aiutils/{__init__,jsonl_utils,parquet_utils}.py, ai/src/corpus/base.py, mcp-server/vmaf-mcp/src/vmaf_mcp/{server,http_transport}.py, tools/vmaf-tune/src/vmaftune/{auto,benchmark,corpus,encoder_profile,fr_from_nr_adapter,hdr,predictor_features,report,saliency,score,score_backend,sidecar}.py, tools/vmaf-tune/src/vmaftune/codec_adapters/_gop_common.py, pyproject.toml.
Rebase impact: None. Every touched file is fork-added (ai/, mcp-server/, tools/vmaf-tune/) or fork-only mypy config (pyproject.toml [tool.mypy.overrides]). Upstream Netflix/vmaf does not ship any of these trees; on a future upstream sync there is no conflict surface.
Apart from one dead-code removal, the change is a pure type-annotation tightening with no change to runtime semantics. The exception is a duplicate _run_benchmark() definition in mcp-server/vmaf-mcp/src/vmaf_mcp/server.py. The deleted copy was silently shadowed at import time by the progress-token-aware implementation 575 lines later, so removing it preserves behaviour.
openapi-rest-schema (2026-05-29, ADR-0797)
Files touched: api/openapi/vmafx-server-v1.yaml, api/openapi/oapi-codegen.yaml, gen/go/oapi/vmafx_server_v1.gen.go, cmd/vmafx-server/rest_adapter.go, cmd/vmafx-server/swagger_ui.go, cmd/vmafx-server/http_server.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/main.go, docs/server/rest.md
Rebase impact: None. All touched files are fork-local additions in the Go server layer (cmd/vmafx-server/, api/, gen/go/) that do not exist in upstream Netflix/vmaf. No rebase conflicts are possible.
The newHTTPServer signature gained a *grpcServer parameter; any fork-local branch that calls newHTTPServer with the old 4-argument form will fail to compile and must add the grpcServer argument.
ADR-0783 — Kubernetes e2e integration test harness (2026-05-29)
No rebase impact on upstream C/Python code.
All files are wholly fork-local additions: test/e2e/kind-cluster.sh, test/e2e/fixtures/gen-tiny-yuv.sh, test/e2e/fixtures/ref.yuv, test/e2e/fixtures/dist.yuv, test/e2e/kuttl-tests/ (all test case YAML), .github/workflows/e2e-k8s.yml, docs/k8s/integration-tests.md, docs/adr/0783-k8s-e2e-integration-test-harness.md, changelog.d/added/k8s-e2e-integration-test-harness.md.
Netflix upstream has no Kubernetes test infrastructure; no merge conflict risk. A sync-upstream that adds an upstream e2e directory would not conflict with this harness because Netflix uses libvmaf/ path roots that the fork has renamed to core/ (ADR-0700).
cuda-ms-ssim-vert-lcs-horiz-ldg (2026-05-29, ADR-0757)
Files touched: core/src/feature/cuda/integer_ms_ssim/ms_ssim_score.cu
Rebase impact: None. The modified file is a fork-added CUDA kernel TU that does not exist in upstream Netflix/vmaf master (ms_ssim CUDA port is fork-local). No rebase conflict is possible.
The change is a pure performance annotation: __launch_bounds__(128), const float *__restrict__ pointer extraction, and __ldg() on inner-loop loads. If upstream Netflix ever adds their own ms_ssim CUDA port, this file will need to be re-reviewed against theirs; the F3 pattern should carry forward.
cpp23 orphan .c sweep — metadata_handler.c (2026-05-29)
Files touched: core/src/metadata_handler.c (deleted)
Rebase impact: None. The file was dead source — never referenced by any meson.build after ADR-0708 renamed it to metadata_handler.cpp. Upstream Netflix/vmaf still uses metadata_handler.c; on future upstream sync, the upstream .c file will reappear in the patch context but meson.build will continue to reference only .cpp. No conflict possible: the deletion only affects the fork-local tree.
Rule for future cpp23 conversions: when renaming foo.c → foo.cpp in meson.build, always git rm core/src/foo.c in the same commit. Leaving both files in tree causes the source tree to diverge from the build definition.
cuda-readback-free-host-pinned-leak sweep (2026-05-29)
Files touched: core/src/cuda/kernel_template.h, docs/backends/kernel-scaffolding.md
Rebase impact: None. The fix is entirely in fork-added files (kernel_template.h is a Lusoris-added header; kernel-scaffolding.md is fork-added documentation). No upstream Netflix/vmaf file is modified.
The changed function (vmaf_cuda_kernel_readback_free) did not exist in upstream — it was introduced by the fork's kernel-template ADR. No rebase conflict is possible.
ADR-0753 — CUDA resolution-aware dispatch scaffold (2026-05-29)
Files touched (initial + extended scope):
- core/src/feature/cuda/resolution_dispatch.{h,c} (new)
- core/src/feature/cuda/integer_adm/adm_cm.cu (two kernel macros)
- core/src/feature/cuda/integer_adm_cuda.c (include, struct field, init, dispatch)
- core/src/feature/cuda/integer_vif/filter1d.cu (FILTER1D_8_HORI_NO_BOUNDS macro + instantiation)
- core/src/feature/cuda/integer_vif_cuda.c (struct field, init, resolution-aware dispatch in filter1d_8)
- core/src/feature/cuda/integer_ssim/ssim_score.cu (calculate_ssim_vert_combine_no_bounds)
- core/src/feature/cuda/integer_ssim_cuda.c (struct field, init, resolution-aware dispatch in submit_fex_cuda)
- core/src/feature/cuda/AGENTS.md (invariant notes + verified wirings table)
- docs/adr/0753-cuda-resolution-aware-dispatch.md (new; extended policy table)
- docs/backends/cuda/overview.md (kernel dispatch table extended)
- docs/research/0753-cuda-resolution-aware-dispatch-design.md (new)
- changelog.d/added/cuda-resolution-aware-dispatch.md (new)
Rebase impact: Low on resolution_dispatch.{h,c} — these are wholly new fork-local files; no upstream conflict possible.
adm_cm.cu: The ADM_CM_LINE macro was split into ADM_CM_LINE_BOUNDED and ADM_CM_LINE_NO_BOUNDS. If upstream Netflix modifies adm_cm.cu after the fork diverges, the split needs to be reapplied around the new macro body. The extern "C" wrapping (ADR-0747) must be preserved for both entries.
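One way to keep the bounded and unbounded variants from drifting apart is to expand a single body macro twice. An upstream change to the body then only has to be applied once. This is a C analogue under stated assumptions, not the fork's CUDA code. CM_LINE_BODY, cm_line_bounded() and cm_line_no_bounds() are hypothetical stand-ins for ADM_CM_LINE_BOUNDED and ADM_CM_LINE_NO_BOUNDS, and the three-tap sum stands in for the real kernel arithmetic. The real adm_cm.cu may duplicate the body instead.

```c
#include <stdint.h>

/* Shared body: CHECK is a compile-time 0/1 that enables edge handling.
 * With CHECK == 0 the branch is constant-false and can be optimised away. */
#define CM_LINE_BODY(CHECK)                                        \
    int32_t acc = 0;                                               \
    for (int dx = -1; dx <= 1; dx++) {                             \
        int xx = x + dx;                                           \
        if ((CHECK) && (xx < 0 || xx >= w))                        \
            continue; /* tap falls outside the row: skip it */     \
        acc += row[xx];                                            \
    }                                                              \
    return acc;

/* Safe at any x in [0, w). */
static int32_t cm_line_bounded(const int32_t *row, int w, int x)
{
    CM_LINE_BODY(1)
}

/* Caller guarantees 1 <= x && x + 1 < w, i.e. all taps are in range. */
static int32_t cm_line_no_bounds(const int32_t *row, int w, int x)
{
    CM_LINE_BODY(0)
}
```

The unbounded variant gives the same result only where the caller's precondition holds. Therefore the dispatcher, not the kernel, has to decide when it is safe to select it.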
integer_adm_cuda.c: The AdmStateCuda struct grew one field (func_adm_cm_line_kernel_8_no_bounds). If upstream adds fields to the struct in the same location, resolve the merge conflict by keeping both additions. The new #include "feature/cuda/resolution_dispatch.h" line must survive any upstream shuffle of the include block.
On rebase: verify that both cuModuleGetFunction calls in the init block still reference valid kernel symbol names from adm_cm.cu.
Research-0751 4K baseline + PR #79 adm_cm A/B (2026-05-29)
Files touched: docs/research/0751-cross-backend-4k-baseline-and-pr79-adm-cm-4k-measure.md, changelog.d/changed/cross-backend-4k-baseline.md
Rebase impact: None. Research-only digest; no source code changed. No upstream conflict possible — these are fork-added measurement artifacts.
CI round-3 fix — .semgrepignore, .gitleaks.toml, codeql-config.yml, compat/python-vmaf/ (2026-05-28)
Files touched: .semgrepignore, .gitleaks.toml, .github/codeql-config.yml, compat/python-vmaf/core/feature_extractor.py, core/test/test_hip_smoke.c, ai/src/aiutils/jsonl_utils.py, ai/src/vmaf_train/registry.py, .github/workflows/libvmaf-build-matrix.yml.
Rebase impact: Low. All changes are either CI config fixes (path corrections post-ADR-0700 rename) or code fixes for missing functions and removed extractors.
On upstream sync:
- .semgrepignore and .gitleaks.toml are fork-local; no upstream conflict expected.
- codeql-config.yml is fork-local; no upstream conflict expected.
- compat/python-vmaf/core/feature_extractor.py: if Netflix upstream modifies python/vmaf/core/feature_extractor.py (old path), the rename-shim must preserve the removal of float_ansnr from VmafIntegerFeatureExtractor's features list. The legacy path (VmafFeatureExtractor, line 301) may still reference float_ansnr if upstream restores it; that is intentional pending the legacy-runner sunset decision.
- core/test/test_hip_smoke.c: if upstream adds float_ansnr_hip back, the removed test function must be restored.
docs/research/0734-r610-driver-changelog-audit-2026-05-28.md — R610 driver audit
No rebase impact. This is a documentation-only research digest; it does not touch any C sources, build files, or API surfaces. No upstream sync conflict expected.
docs/research/0734-cudnn-version-audit-20260528.md — cuDNN/ORT audit (doc-only)
No rebase impact on upstream C/Python code: this PR adds only doc and changelog files. No C source, header, or Python source is modified.
If a future upstream sync adds cuDNN pinning or onnxruntime-gpu to any Python requirement, re-check dev/Containerfile lines 529–539 (ORT install) and ai/pyproject.toml for compatibility with the then-current cuDNN series.
Fork-local files added: docs/research/0734-cudnn-version-audit-20260528.md (new), changelog.d/changed/docs-cudnn-version-audit.md (new), docs/rebase-notes.md (this entry), docs/state.md (new deferred row).
Periodic drift sweep — upstream syncs may reintroduce libvmaf/ refs
After every Netflix/vmaf upstream sync, run the inventory grep from PR chore/post-rename-drift-sweep-20260528 to catch any new libvmaf/[a-z] or python/vmaf/ directory references outside ADR bodies and CHANGELOG.md. Files to recheck: Makefile, Dockerfile, .github/codeql-config.yml, IDE settings, skill scripts, and any newly added utility under scripts/. See the changelog fragment changelog.d/fixed/post-rename-drift-sweep.md for the full inventory commands.
port/upstream-batch-threading-picture-pool (2026-06-04)
Files touched: core/src/libvmaf.c, core/src/meson.build
Rebase impact: if a future upstream commit adds more #ifdef VMAF_BATCH_THREADING blocks, those blocks must be removed in the same port PR — the fork no longer uses the flag. The non-batch threaded_read_pictures path was removed; it is not recoverable from the fork without re-introducing the old per-extractor thread pool enqueue pattern.
.github/workflows/tests-and-quality-gates.yml — coverage job deselects slow vifks360 test
The coverage job's --deselect list includes python/test/quality_runner_test.py::QualityRunnerTest::test_run_vmaf_runner_float_vifks360o97 because the test exceeds the 60 s per-test limit on GitHub-hosted runners and truncates the suite. If upstream Netflix/vmaf adds a test with a similar name in a future sync, verify it does not also use a very large vif_kernelscale before removing the deselect. The deselect is CI-only; the test runs in the Netflix golden gate without a per-test timeout.
.github/workflows/ — post-ADR-0700 path rename (libvmaf/ → core/)
If an upstream Netflix/vmaf sync or cherry-pick brings new CI references to libvmaf/ (path filters, cd libvmaf, find libvmaf/src), they must be remapped to core/ in the same PR. The fork's source tree is rooted at core/ per ADR-0700; any upstream workflow or Makefile that still hardcodes libvmaf/ as a source directory will silently build from a non-existent path on this fork. Additionally, replace any gitleaks/gitleaks-action usage with the direct gitleaks CLI binary — the action requires a GITLEAKS_LICENSE for org repos even when public.
docker/Dockerfile.node — vmafx-node worker image + ffmpeg n8.2 (ADR-0717)
The ffmpeg-patches/ series is now validated against both n8.1.1 and n8.2. The node Dockerfile pins FFMPEG_TAG=n8.2. When the next upstream sync lands, confirm:
- ffmpeg-patches/ still applies against the new tag. Run FFMPEG_SHA=<new-tag> bash ffmpeg-patches/test/build-and-run.sh.
- If patches fail, rebase the affected patches and update FFMPEG_TAG in both dev/Containerfile and docker/Dockerfile.node in the same PR (CLAUDE.md §12 r14).
- pkg/encoder/encoder.go shells out to ffmpeg. If a new FFmpeg version changes a codec's CLI flag, update the encoder package to match.
Touched files: docker/Dockerfile.node, cmd/vmafx-node/main.go, cmd/vmafx-node/probe/probe.go, cmd/vmafx-node/probe/probe_test.go, cmd/vmafx-node/server/server.go, cmd/vmafx-node/server/server_test.go, docs/adr/0717-vmafx-node-ffmpeg-latest.md, docs/development/vmafx-node.md, changelog.d/added/node-ffmpeg-latest.md, docs/state.md (this entry), docs/rebase-notes.md (this entry).
feat/speed-python-compat-extractors (Research-0732, item #2) — low-conflict upstream port
No structural rebase impact. This PR adds fork-local content to paths (compat/python-vmaf/core/feature_extractor.py, compat/python-vmaf/core/quality_runner.py, python/test/feature_extractor_test.py, docs/metrics/speed_qa.md) that are already diverged from upstream (python/vmaf/core/… in Netflix/vmaf). When syncing from upstream:
- If Netflix/vmaf updates SpeedChromaFeatureExtractor or SpeedTemporalFeatureExtractor (e.g. bumps VERSION), apply the equivalent change to compat/python-vmaf/core/feature_extractor.py.
- If Netflix/vmaf adds new SpEED QualityRunner subclasses, port them to compat/python-vmaf/core/quality_runner.py.
- The compat harness mirrors Netflix's class hierarchy intentionally; keep TYPE, VERSION, and ATOM_FEATURES_TO_VMAFEXEC_KEY_DICT in sync.
cmd/vmafx-server — Go gRPC + HTTP server (ADR-0703)
No rebase impact on upstream C/Python code: the Go server is entirely fork-local (cmd/, pkg/, gen/, proto/, go.mod, go.sum, Dockerfile.go-server, buf.gen.yaml). None of these paths overlap with Netflix/vmaf upstream.
If a future upstream sync touches model/ (model JSON schema changes) or core/include/libvmaf/libvmaf.h (public ABI), review:
- pkg/libvmaf/libvmaf.go: the cgo #include and the JSON parsing in parseOutput.
- The ScoreResponse.features map keys (derived from pooled_metrics keys in the vmaf CLI JSON output; key names are stable but new keys may appear).
Touched files: cmd/vmafx-server/main.go, cmd/vmafx-server/grpc_server.go, cmd/vmafx-server/http_server.go, cmd/vmafx-server/main_test.go, pkg/libvmaf/libvmaf.go, pkg/libvmaf/libvmaf_test.go, pkg/observability/observability.go, proto/vmafx.proto, proto/buf.yaml, buf.gen.yaml, gen/go/vmafx.pb.go, gen/go/vmafx_grpc.pb.go, go.mod, go.sum, Dockerfile.go-server, docs/server/grpc.md, docs/adr/0703-vmafx-server-go-grpc.md, changelog.d/added/vmafx-server-go.md, docs/state.md, deploy/helm/vmafx/values.yaml (image repository update).
Every PR that touches upstream-shared paths or establishes a rebase-sensitive invariant adds an entry here. PRs with no rebase impact state "no rebase impact" in the PR description and skip the entry.
docs/hw-backend-audit-2026-05-28 — doc-only, no rebase impact
No upstream rebase impact: this PR adds a research digest (docs/research/0733-hardware-backend-audit-2026-05-28.md), a changelog fragment, and a docs/state.md update. No C source, build system, or upstream-shared path is touched. Netflix/vmaf upstream syncs are unaffected.
feat/vmafx-phase4-language-modernization-foundation (ADR-0702) — fork-only, no Netflix conflict
No upstream rebase impact. The files added in this PR (go.mod, Cargo.toml, pkg/, cmd/, bindings/, .github/workflows/go-ci.yml, .github/workflows/rust-ci.yml) are entirely fork-local. Netflix/vmaf upstream does not have a Go or Rust surface; cherry-picks from upstream are unaffected.
The docs/principles.md, docs/development/languages.md, .gitignore, and Makefile additions are additive; the Makefile targets are named distinctly (go-build, go-test, rust-build, rust-test) and do not conflict with any upstream Makefile target.
feat/vmafx-tune-go-stage1 (ADR-0705) — fork-only, no Netflix conflict
No upstream rebase impact: the Go port lives entirely under cmd/vmafx-tune/, pkg/encoder/, pkg/bisect/, and pkg/report/. These directories do not exist in upstream Netflix/vmaf. The Python tools/vmaf-tune/ is unchanged. go.mod and go.sum are fork-local additions that upstream does not carry. Cherry-picks from upstream that touch tools/vmaf-tune/ Python source files are unaffected by this PR.
feat/vmafx-mcp-go-port (ADR-0704) — fork-only, no Netflix conflict
No upstream rebase impact: this PR adds cmd/vmafx-mcp/, pkg/libvmaf/, go.mod, and go.sum — all entirely fork-local. The Python MCP server at mcp-server/vmaf-mcp/ is unchanged. Netflix/vmaf upstream does not contain any Go code or an MCP server. Cherry-picks from upstream are unaffected.
chore/post-cutover-url-sweep — fork-only URL change, no Netflix conflict
No upstream rebase impact: this change replaces all occurrences of the lusoris/vmaf GitHub repository slug with VMAFx/vmafx following the GitHub org cutover. All affected strings are fork-local (CI workflow URLs, GHCR image paths, ADR cross-references, doc URLs). Netflix/vmaf upstream does not contain any of these references. Cherry-picks from upstream are unaffected.
refactor/vmafx-repo-layout (ADR-0700) — IMPORTANT: breaks all in-flight PRs
Upstream sync strategy: upstream Netflix/vmaf patches arrive with libvmaf/ paths. When cherry-picking or porting upstream commits after ADR-0700 merged, rewrite paths in the patch stream:
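# A plain s|libvmaf/|core/|g would also rewrite public header paths
# (include/libvmaf/ and #include <libvmaf/...> / "libvmaf/..." directives),
# which keep their name after ADR-0700. Mask those, remap the source roots
# (libvmaf/ -> core/, python/vmaf/ -> compat/python-vmaf/), then unmask.
REMAP='s#(include/|include[[:space:]]*[<"])libvmaf/#\1@KEEP@/#g; s#libvmaf/#core/#g; s#python/vmaf/#compat/python-vmaf/#g; s#@KEEP@/#libvmaf/#g'
# Single commit
git format-patch -1 <upstream-sha> --stdout \
| sed -E "$REMAP" \
| git am --3way
# Range of commits
git format-patch <base>..<tip> --stdout \
| sed -E "$REMAP" \
| git am --3way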
In-flight PR rebase recipe: after git rebase origin/master, resolve each libvmaf/ path conflict by renaming to core/, and each python/vmaf/ conflict by renaming to compat/python-vmaf/.
Python import compatibility: import vmaf continues to work via the compat/vmaf symlink (→ python-vmaf/) when compat/ is on sys.path, and via the python/vmaf/__init__.py shim when python/ is on sys.path. Existing imports of the form from vmaf.<module> import ... need no changes.
What stays the same: libvmaf.so, libvmaf.pc, <libvmaf/...> C install-path headers, all public C symbols (VmafContext, vmaf_init, etc.), ffmpeg filter names.
Touched files: all source-tree path references across CI workflows, Makefile, scripts, docs, agent configs, and the libvmaf/ and python/vmaf/ directories themselves.
feat/ai-run-manifest-helper (ADR-0678)
No upstream rebase impact: this touches fork-local AI helper code, AI scripts, tests, Claude skills, docs, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship these AI provenance helpers or local training utilities.
Invariant: new standalone AI artifact sidecars use aiutils.run_manifest.write_run_manifest() so the shared envelope and run_provenance block stay deduplicated. Existing stable report schemas may continue embedding build_run_provenance() directly.
Smoke: .venv/bin/python -m pytest ai/tests/test_run_manifest.py ai/tests/test_build_bisect_cache.py ai/tests/test_legacy_extractor_manifests.py ai/tests/test_ptq_scripts.py ai/tests/test_qat_smoke.py -q
Touched files: ai/src/aiutils/run_manifest.py, ai/scripts/ptq_dynamic.py, ai/scripts/ptq_static.py, ai/scripts/qat_train.py, ai/scripts/build_bisect_cache.py, ai/scripts/collect_gpu_calibration_data.py, ai/scripts/extract_ugc_features.py, ai/scripts/extract_konvid_frames.py, AI tests, .claude/skills/ai-run-manifest/SKILL.md, AI package/Claude guidance, docs/ai/*.md, docs/adr/0678-*.md, docs/research/0699-*.md, changelog.d/added/0678-*.md, and this file.
feat/ai-dataset-fetch-manifests (ADR-0677)
No upstream rebase impact: this touches fork-local AI dataset fetch helpers, tests, docs, package AGENTS notes, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship these local downloader scripts.
Invariant: dataset fetch helpers that seed later AI JSONL/parquet builders write deterministic ADR-0661 run-manifest sidecars before conversion. fetch_konvid_1k.py defaults to <root>/fetch_manifest.json; fetch_youtube_ugc_subset.py keeps --manifest as the content manifest and defaults the run sidecar to <manifest>.run-manifest.json.
Smoke: .venv/bin/python -m pytest ai/tests/test_dataset_fetch_manifests.py -q
Touched files: ai/scripts/fetch_konvid_1k.py, ai/scripts/fetch_youtube_ugc_subset.py, ai/tests/test_dataset_fetch_manifests.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/training-data.md, docs/ai/konvid-1k-ingestion.md, docs/ai/youtube-ugc-ingestion.md, docs/ai/mos-corpora.md, docs/adr/0677-*.md, docs/research/0698-*.md, changelog.d/added/0677-*.md, and this file.
feat/mos-corpus-adapter-manifests (ADR-0676)
No upstream rebase impact: this touches fork-local AI MOS corpus adapters, tests, docs, package AGENTS notes, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship these local MOS-corpus ingestion scripts.
Invariant: CHUG, KoNViD-1k, KoNViD-150k, YouTube-UGC, LSVQ, LIVE-VQC, and Waterloo-IVC source adapters write <output>.manifest.json by default using corpus.base.write_ingest_manifest() and ADR-0661 run_provenance. Keep new MOS adapter CLIs on this sidecar contract before their JSONL rows feed aggregation, model-card refreshes, or signal-mix audits.
Smoke: .venv/bin/python -m pytest ai/tests/test_corpus_base.py ai/tests/test_chug.py ai/tests/test_konvid_1k.py ai/tests/test_konvid_150k.py ai/tests/test_lsvq.py ai/tests/test_live_vqc.py ai/tests/test_waterloo_ivc.py ai/tests/test_youtube_ugc.py -q
Touched files: ai/src/corpus/base.py, ai/scripts/chug_to_corpus_jsonl.py, ai/scripts/konvid_1k_to_corpus_jsonl.py, ai/scripts/konvid_150k_to_corpus_jsonl.py, ai/scripts/youtube_ugc_to_corpus_jsonl.py, ai/scripts/lsvq_to_corpus_jsonl.py, ai/scripts/live_vqc_to_corpus_jsonl.py, ai/scripts/waterloo_ivc_to_corpus_jsonl.py, ai/tests/test_corpus_base.py, ai/tests/test_chug.py, ai/AGENTS.md, docs/ai/*.md ingestion docs, docs/adr/0676-*.md, docs/research/0697-*.md, changelog.d/added/0676-*.md, and this file.
feat/full-feature-exporter-manifests (ADR-0668 follow-up)
No upstream rebase impact: this touches fork-local AI corpus exporters, tests, docs, package AGENTS notes, a research digest, and a changelog fragment. Upstream Netflix/vmaf does not ship these KoNViD or BVI-DVC training-table builders.
Invariant: ai/scripts/konvid_to_full_features.py and ai/scripts/bvi_dvc_to_full_features.py write <out>.manifest.json by default using aiutils.run_manifest. Keep the manifest beside refreshed local parquets so later model cards can prove source roots, cache/model inputs, feature order, and row/clip counts.
Smoke: .venv/bin/python -m pytest ai/tests/test_konvid_full_features.py ai/tests/test_bvi_dvc_dir_mode.py -q
Touched files: ai/scripts/konvid_to_full_features.py, ai/scripts/bvi_dvc_to_full_features.py, ai/tests/test_konvid_full_features.py, ai/tests/test_bvi_dvc_dir_mode.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/bvi-dvc-corpus-ingestion.md, docs/research/0696-full-feature-exporter-manifests.md, changelog.d/added/0696-full-feature-exporter-manifests.md, and this file.
feat/u2netp-mirror-exporter (ADR-0671)
No upstream rebase impact: this touches fork-local tiny-AI exporter tooling, tests, docs, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship the U2NetP mirror workflow.
Invariant: ai/scripts/export_u2netp_mirror.py imports an audited local xuebinqin/U-2-Net checkout and writes a gitignored ONNX plus manifest. Do not vendor upstream U-2-Net source here, do not accept non-Apache license text, and do not commit model/u2netp_mirror.onnx.
Smoke: .venv/bin/python -m pytest ai/tests/test_export_u2netp_mirror.py -q
Touched files: ai/scripts/export_u2netp_mirror.py, ai/tests/test_export_u2netp_mirror.py, ai/AGENTS.md, docs/ai/u2netp-mirror.md, docs/ai/models/u2netp_mirror_card.md, docs/ai/training.md, docs/adr/0671-*.md, docs/adr/_index_fragments/0671-*.md, docs/research/0691-*.md, changelog.d/added/0671-*.md, and this file.
feat/tune-score-backend-native-priority (ADR-0667)
No upstream rebase impact: this touches fork-local vmaf-tune backend-selection code, docs, tests, AGENTS notes, and ADR/research notes. Upstream Netflix/vmaf does not ship the fork vmaf-tune automation harness.
Invariant: tools/vmaf-tune/src/vmaftune/score_backend.py keeps DEFAULT_FALLBACKS = ("cuda", "sycl", "hip", "cpu"). The Vulkan entry was removed when ADR-0726 dropped the Vulkan backend; do not re-add it during backend-selector rebases. CPU remains the final fallback.
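The ordering contract can be illustrated with a short C sketch. The real selector is the Python DEFAULT_FALLBACKS tuple in score_backend.py. pick_backend() and the availability callback are hypothetical. Treating cpu as always available is an assumption, consistent with "CPU remains the final fallback".

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the Python tuple; "vulkan" must not reappear (ADR-0726). */
static const char *const DEFAULT_FALLBACKS[] = {"cuda", "sycl", "hip", "cpu"};
#define N_FALLBACKS (sizeof DEFAULT_FALLBACKS / sizeof DEFAULT_FALLBACKS[0])

/* Hypothetical probe: returns true if the named backend can run here. */
typedef bool (*backend_available_fn)(const char *name);

/* Return the first available accelerator in priority order; fall back to the
 * last entry (cpu) unconditionally. */
static const char *pick_backend(backend_available_fn available)
{
    for (size_t i = 0; i + 1 < N_FALLBACKS; i++)
        if (available(DEFAULT_FALLBACKS[i]))
            return DEFAULT_FALLBACKS[i];
    return DEFAULT_FALLBACKS[N_FALLBACKS - 1];
}
```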
Smoke: .venv/bin/python -m pytest tools/vmaf-tune/tests/test_score_backend.py -q
Touched files: tools/vmaf-tune/src/vmaftune/score_backend.py, tools/vmaf-tune/tests/test_score_backend.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-score-backend.md, docs/adr/0667-*.md, docs/adr/_index_fragments/0667-*.md, docs/research/0687-*.md, changelog.d/changed/0667-*.md, and this file.
feat/tune-report-quick-takeaways (ADR-0666)
No upstream rebase impact: this touches fork-local vmaf-tune report rendering, tests, user docs, ADR/research notes, AGENTS notes, and a changelog fragment. Upstream Netflix/vmaf does not ship the fork vmaf-tune profile-card renderer.
Smoke: .venv/bin/python -m pytest tools/vmaf-tune/tests/test_report.py -q
Touched files: tools/vmaf-tune/src/vmaftune/report.py, tools/vmaf-tune/tests/test_report.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0666-*.md, docs/adr/_index_fragments/0666-*.md, docs/research/0686-*.md, changelog.d/added/0666-*.md, and this file.
fix/fast-nr-calibration-quality-guard (ADR-0665)
No upstream rebase impact: this touches fork-local tiny-AI calibration tooling, vmaf-tune docs, package AGENTS notes, ADR/research notes, and a changelog fragment. Upstream Netflix/vmaf does not ship the fork nr_metric_v1 fast-NR sidecar calibration workflow.
Smoke: .venv/bin/python -m pytest ai/tests/test_calibrate_nr_threshold.py -q
Touched files: ai/scripts/calibrate_nr_threshold.py, ai/tests/test_calibrate_nr_threshold.py, ai/AGENTS.md, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune-fast-nr.md, docs/ai/training.md, docs/adr/0665-*.md, docs/adr/_index_fragments/0665-*.md, docs/research/0685-*.md, changelog.d/fixed/0665-*.md, and this file.
feat/ai-validation-report-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local tiny-AI validation tooling, model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the fork tiny-model registry or saliency-student validation surfaces.
Smoke: .venv/bin/python -m pytest ai/tests/test_validation_report_provenance.py -q
Touched files: ai/scripts/validate_model_registry.py, ai/scripts/validate_saliency_student.py, ai/tests/test_validation_report_provenance.py, docs/ai/model-registry.md, docs/ai/models/saliency_student_*.md, docs/ai/training.md, docs/research/0683-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0683-*.md, and this file.
feat/vmaf-tiny-validator-report-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local tiny-AI validator tooling, model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the v2/v3/v4 tiny-VMAF validator CLI family.
Smoke: .venv/bin/python -m pytest ai/tests/test_vmaf_tiny_validator_reports.py -q
Touched files: ai/scripts/validate_vmaf_tiny_v*.py, ai/tests/test_vmaf_tiny_validator_reports.py, docs/ai/models/vmaf_tiny_v*.md, docs/ai/training.md, docs/research/0681-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0681-*.md, and this file.
feat/saliency-student-metrics-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local AI saliency training tooling, model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the DUTS-trained saliency student metrics surface.
Smoke: .venv/bin/python -m pytest ai/tests/test_saliency_student_metrics_provenance.py -q
Touched files: ai/scripts/train_saliency_student.py, ai/scripts/train_saliency_student_v2.py, ai/tests/test_saliency_student_metrics_provenance.py, docs/ai/models/saliency_student_*.md, docs/ai/training.md, docs/research/0680-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0680-*.md, and this file.
feat/dnn-exporter-manifest-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local AI exporter tooling, tiny-model docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship these DNN feature-model exporter sidecars.
Smoke: .venv/bin/python -m pytest ai/tests/test_dnn_exporter_run_provenance.py -q
Touched files: ai/scripts/export_tiny_models.py, ai/scripts/export_fastdvdnet_pre.py, ai/scripts/export_fastdvdnet_pre_placeholder.py, ai/scripts/export_transnet_v2.py, ai/scripts/export_transnet_v2_placeholder.py, ai/tests/test_dnn_exporter_run_provenance.py, docs/ai/models/*.md, docs/ai/training.md, docs/research/0679-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0679-*.md, and this file.
feat/ensemble-manifest-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local AI ensemble training tooling, docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the fr_regressor_v2_ensemble_v1 trainer/manifest surface.
Smoke: .venv/bin/python -m pytest ai/tests/test_train_fr_regressor_v2_ensemble.py -q
Touched files: ai/scripts/train_fr_regressor_v2_ensemble.py, ai/tests/test_train_fr_regressor_v2_ensemble.py, docs/ai/models/fr_regressor_v2_probabilistic.md, docs/ai/training.md, docs/research/0678-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0678-*.md, and this file.
feat/nr-threshold-calibration-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local AI/vmaf-tune calibration tooling, docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the vmaf-tune --fast-nr NR threshold calibration path.
Smoke: .venv/bin/python -m pytest ai/tests/test_calibrate_nr_threshold.py -q
Touched files: ai/scripts/calibrate_nr_threshold.py, ai/tests/test_calibrate_nr_threshold.py, docs/usage/vmaf-tune-fast-nr.md, docs/ai/training.md, docs/research/0677-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0677-*.md, and this file.
feat/phase-f-calibration-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local AI/vmaf-tune calibration tooling, docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship the vmaf-tune auto Phase F recipe calibration path.
Smoke: .venv/bin/python -m pytest ai/tests/test_calibrate_phase_f_recipes.py -q
Touched files: ai/scripts/calibrate_phase_f_recipes.py, ai/tests/test_calibrate_phase_f_recipes.py, docs/usage/vmaf-tune.md, docs/ai/training.md, docs/research/0676-*.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0676-*.md, and this file.
feat/quant-ep-report-provenance (ADR-0661)¶
No upstream rebase impact: this touches fork-local AI investigation tooling, AI docs, ADR/research notes, and changelog fragments. Upstream Netflix/vmaf does not ship this per-EP quantisation harness.
Smoke: .venv/bin/python -m pytest ai/tests/test_measure_quant_drop_per_ep.py -q
Touched files: ai/scripts/measure_quant_drop_per_ep.py, ai/tests/test_measure_quant_drop_per_ep.py, docs/ai/quant-eps.md, docs/research/0006-tinyai-ptq-accuracy-targets.md, docs/research/0675-quant-ep-report-provenance.md, docs/adr/0661-ai-run-manifest-provenance.md, changelog.d/added/0675-*.md, and this file.
fix/windows-cuda-toolkit-installer (ADR-0664)¶
High CI rebase impact: this touches the fork-local build matrix workflow. Upstream Netflix/vmaf does not ship these Windows GPU build-only legs, but workflow syncs can silently restore older action-based setup patterns.
Rebase-sensitive fork invariant:
- The Build — Windows MSVC + CUDA (build only) leg installs CUDA 13.2.0 directly from NVIDIA's Windows network installer and verifies nvcc.exe --version. Do not restore Jimver/cuda-toolkit on this Windows leg without a superseding ADR and a green required Windows CUDA run.
- Linux CUDA legs remain unchanged and still use Jimver/cuda-toolkit.
Smoke: gh pr checks <pr> --watch --required
Touched files: .github/workflows/libvmaf-build-matrix.yml, .github/AGENTS.md, docs/development/ci-runners.md, docs/adr/0664-*.md, docs/research/0664-*.md, changelog.d/fixed/0664-*.md, and this file.
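A small check over the leg's steps can stop a workflow sync from quietly reverting it. This sketch assumes the steps have already been loaded from the workflow YAML (for example with PyYAML's `yaml.safe_load`) into a list of dicts. Each dict uses the standard GitHub Actions `name` / `uses` / `run` step keys. The note does not say how the leg is located inside `libvmaf-build-matrix.yml`, and `check_windows_cuda_steps()` is a hypothetical helper.

```python
def check_windows_cuda_steps(steps):
    """Return invariant violations for the Windows MSVC + CUDA leg's steps."""
    problems = []
    for step in steps:
        # Compare the action name without its @ref suffix.
        action = (step.get("uses") or "").split("@", 1)[0]
        if action == "Jimver/cuda-toolkit":
            problems.append(f"step {step.get('name', '?')!r} uses Jimver/cuda-toolkit")
    # The note requires the leg to verify the toolkit with nvcc.exe --version.
    if not any("nvcc.exe --version" in (step.get("run") or "") for step in steps):
        problems.append("no step runs nvcc.exe --version")
    return problems
```

An empty list means the leg still matches the invariant. Any entry names the step that reintroduced the action-based setup.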
fix/external-bench-wrapper-schema (ADR-0656)¶
No upstream rebase impact: all touched implementation paths are fork-local external-bench tooling, docs, tests, ADR/research, and changelog fragments. Upstream Netflix/vmaf does not ship this benchmark harness.
Rebase-sensitive fork invariant:
- summary.competitor emitted by every tools/external-bench/*/run.sh wrapper must exactly match the registry key in compare.WRAPPERS. Model/version labels belong in optional metadata, not in this identity field; otherwise validate_wrapper_output() rejects the result before aggregation.
Smoke: .venv/bin/python -m pytest tools/external-bench/tests/ -q
Touched files: tools/external-bench/, docs/ai/external-bench.md, docs/adr/0656-*.md, docs/research/0656-*.md, changelog.d/fixed/0656-*.md, mkdocs.yml, and this file.
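Here is a sketch of the identity check the invariant describes. `validate_competitor()` is a stand-in, not the real `validate_wrapper_output()`. The registry keys and the optional `metadata` field used for model/version labels are hypothetical.

```python
# Hypothetical registry keys; the real ones live in compare.WRAPPERS.
WRAPPERS = {"wrapper_a": object(), "wrapper_b": object()}


def validate_competitor(result, wrappers=WRAPPERS):
    """Return summary.competitor if it is exactly a registry key, else raise."""
    competitor = result.get("summary", {}).get("competitor")
    if competitor not in wrappers:
        raise ValueError(
            f"summary.competitor={competitor!r} is not one of {sorted(wrappers)}; "
            "put model/version labels in optional metadata instead"
        )
    return competitor
```

An exact match, rather than a prefix match, keeps every wrapper's results aggregated under one identity. A result such as `"wrapper_a-v2"` is therefore rejected, while `{"summary": {"competitor": "wrapper_a"}, "metadata": {"model_version": "v2"}}` passes.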
fix/tiny-ai-disabled-runtime-gate (ADR-0660)¶
Low upstream rebase impact: the touched C files are fork-local tiny-AI extractors and helper tests. Upstream Netflix/vmaf does not ship these DNN feature extractors, but conflicts are possible if upstream changes the feature registry or libvmaf's optional-DNN surface.
Rebase-sensitive fork invariant:
- Every tiny-AI feature extractor calls vmaf_tiny_ai_require_runtime(<feature>) after pixel-format / bit-depth validation and before vmaf_tiny_ai_resolve_model_path(). Disabled-DNN builds must return -ENOSYS before path probing; DNN-enabled builds keep missing model paths as -EINVAL.
Smoke: meson test -C build --suite=fast --print-errorlogs test_lpips test_dists test_fastdvdnet_pre test_mobilesal test_transnet_v2
Touched files: core/src/dnn/tiny_extractor_template.h, core/src/feature/feature_{lpips,dists,mobilesal}.c, core/src/feature/{fastdvdnet_pre,transnet_v2}.c, core/test/tiny_ai_test_template.h, core/src/dnn/AGENTS.md, docs/ai/, docs/metrics/features.md, docs/adr/0660-*.md, docs/research/0660-*.md, changelog.d/fixed/0660-*.md, and this file.
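The required ordering can be modelled in a few lines. This is a Python model of the C control flow, not the extractor code. `init_tiny_extractor()` and its arguments are hypothetical. The error code for a failed pixel-format check is an assumption, because the note does not state it.

```python
import errno


def init_tiny_extractor(format_ok, runtime_available, probe_model_path):
    """Model of the init order: validate, require runtime, then resolve the model path."""
    if not format_ok:
        return -errno.EINVAL  # assumption: code for a pixel-format / bit-depth failure
    if not runtime_available:
        # vmaf_tiny_ai_require_runtime(): disabled-DNN builds stop here, before any path probing.
        return -errno.ENOSYS
    if not probe_model_path():
        # vmaf_tiny_ai_resolve_model_path(): a missing model stays -EINVAL.
        return -errno.EINVAL
    return 0
```

In a disabled-DNN build the probe callback is never invoked. A missing model file therefore cannot mask the `-ENOSYS` that tells callers the runtime is absent.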
feat/saliency-feature-materializer (ADR-0655)¶
No upstream rebase impact: the implementation is fork-local AI tooling (ai/scripts/, ai/tests/) plus fork-local documentation and changelog files. Upstream Netflix/vmaf does not ship the fork's saliency training-table materializer.
Rebase-sensitive fork invariants:
- ai/scripts/materialize_saliency_features.py owns bulk saliency enrichment for existing JSONL/parquet feature tables; trainers consume the resulting saliency_mean / saliency_var columns instead of silently running saliency inference inside training loops.
- The status column remains row-local and human-readable (ok, skipped-existing, missing-source, missing-geometry, decode-failed, model-failed) so large local sweeps can be audited without scraping stderr.
- SaliencyMaterializeConfig.default_width / default_height (added in the PR that fixed the Netflix refresh materializer) provide fallback geometry for raw YUV corpora without container headers. Netflix corpus YUVs are always 1920×1080 at rest.
- For .yuv sources, the ffmpeg decode prepends -f rawvideo -video_size WxH -pix_fmt yuv420p before -i; do not remove this, or raw-YUV support breaks.
- The in-process per-file saliency cache in materialize_rows() avoids redundant decodes for per-frame tables; the cache is scoped to one materialize_rows() call and does not persist across batch table boundaries.
Smoke: PYTHONPATH=. .venv/bin/python -m pytest ai/tests/test_materialize_saliency_features.py -q
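Here is a sketch of building the decode command for the .yuv rule. `build_decode_argv()` is hypothetical. The trailing output arguments (raw yuv420p frames to stdout) and `-nostdin` / `-v error` are illustrative choices, not the script's exact command. Only the input-side `-f rawvideo -video_size WxH -pix_fmt yuv420p` prefix comes from the note.

```python
from pathlib import Path


def build_decode_argv(source, width=None, height=None):
    """Build an ffmpeg argv; raw .yuv inputs need explicit demuxer options before -i."""
    src = Path(source)
    argv = ["ffmpeg", "-nostdin", "-v", "error"]
    if src.suffix.lower() == ".yuv":
        if not (width and height):
            # Mirrors the missing-geometry status: raw YUV carries no header.
            raise ValueError("missing-geometry")
        argv += ["-f", "rawvideo", "-video_size", f"{width}x{height}", "-pix_fmt", "yuv420p"]
    argv += ["-i", str(src), "-f", "rawvideo", "-pix_fmt", "yuv420p", "-"]
    return argv
```

Container inputs such as .mkv skip the prefix because ffmpeg reads their geometry from the container. A .yuv input without a width and height fails fast instead of decoding garbage.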
feat/signal-mix-audit (ADR-0650)¶
No upstream rebase impact: all implementation paths are fork-local AI tooling, tests, and documentation. Upstream Netflix/vmaf does not ship this training/audit package or the associated docs.
Rebase-sensitive fork invariants:
- ai/scripts/signal_mix_audit.py remains table-only and side-effect free: no feature extraction, checkpoint export, corpus mutation, or default CI gate.
- Signal-family regexes and docs/ai/signal-mix-audit.md must be updated together when new metric families or table columns are introduced.
- Missing candidate metrics in the Markdown report are advisory work selectors, not proof that a candidate should be promoted without a corpus run.
Smoke: .venv/bin/python -m pytest ai/tests/test_signal_mix_audit.py -q
Touched files: ai/scripts/signal_mix_audit.py, ai/tests/test_signal_mix_audit.py, ai/AGENTS.md, docs/ai/signal-mix-audit.md, docs/adr/0650-*.md, docs/research/0650-*.md, changelog.d/added/0650-*.md, and this file.
fix/dnn-attached-multi-output (ADR-0646)¶
Low upstream rebase impact: the implementation touches fork-local DNN runtime plumbing plus libvmaf's context bridge. Upstream Netflix/vmaf does not ship the fork's ONNX Runtime attached tiny-AI surface, but conflicts are possible if upstream changes core/src/libvmaf.c near the per-frame pipeline.
Rebase-sensitive fork invariants:
- Single-output attached tiny models keep the historical collector key exactly. Do not append _score or an ONNX output suffix for one-output models.
- Multi-output attached models route through vmaf_ort_run(), not vmaf_ort_infer(). The latter is intentionally a single-output helper.
- Sidecar output_names[] wins only when its count matches the ONNX output count; otherwise the ONNX output names are used and sanitized.
- Attached mode remains scalar-only. Vector or image output tensors must still use vmaf_dnn_session_run() until a future ADR defines feature-name flattening.
Smoke: docker exec vmaf-dev-mcp bash -lc 'cd /workspace && rm -rf /tmp/vmaf-dnn-multi-output-build && meson setup /tmp/vmaf-dnn-multi-output-build core -Denable_dnn=enabled -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled -Denable_hip=false -Denable_metal=disabled && meson test -C /tmp/vmaf-dnn-multi-output-build --suite=dnn --print-errorlogs'
Touched files: core/src/libvmaf.c, core/src/dnn/model_loader.*, core/src/dnn/ort_backend.*, core/test/dnn/*, model/tiny/smoke_multi_output_v0.*, scripts/gen_multi_output_smoke_onnx.py, docs/api/dnn.md, docs/ai/, docs/adr/0646-*.md, docs/research/0646-*.md, changelog.d/fixed/0646-*.md, and this file.
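The naming rules above can be sketched as a single resolver. `resolve_output_names()` and `sanitize()` are hypothetical, and the sanitizer's character rule is an assumption. The note says only that ONNX names are sanitized; it does not say how, or how resolved names combine with the collector key.

```python
import re


def sanitize(name):
    """Hypothetical rule: map anything outside [0-9A-Za-z_] to '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    return cleaned or "output"


def resolve_output_names(collector_key, onnx_outputs, sidecar_names=None):
    """Pick feature names for an attached model's scalar outputs."""
    if len(onnx_outputs) == 1:
        # Single-output models keep the historical key exactly: no suffix.
        return [collector_key]
    if sidecar_names is not None and len(sidecar_names) == len(onnx_outputs):
        return list(sidecar_names)
    # Count mismatch (or no sidecar): fall back to sanitized ONNX names.
    return [sanitize(n) for n in onnx_outputs]
```

A sidecar list with the wrong length is ignored rather than partially applied. A partial mapping could silently attach a label to the wrong output tensor.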
fix/ai-refresh-defaults-and-konvid-full-features (ADR-0642)¶
No upstream rebase impact: all touched implementation files live under fork-local ai/ tooling. Upstream Netflix/vmaf does not ship these training scripts, model-refresh docs, or local corpus ledgers.
Rebase-sensitive fork invariants:
- AI feature extraction defaults point at core/build-cpu/tools/vmaf. Do not regress to /usr/local/bin/vmaf or the ambiguous build/tools/vmaf; stale binaries have previously lacked fork-only extractors.
- ai/scripts/konvid_to_full_features.py owns regeneration of both runs/full_features_konvid.parquet and runs/full_features_konvid_with_folds.parquet. The folded output's source=fold0..fold4 assignment is a deterministic balanced hash over clip keys and feeds eval_multiseed_v3_v4.py.
- BVI-DVC full-feature dir mode accepts .mkv, .mp4, and .yuv. The local .mkv lossless bundle is the known-good refresh input; the raw-YUV copy produced all-zero VMAF in a one-clip smoke.
- ai/scripts/extract_ugc_features.py emits the current FULL_FEATURES schema with an explicit vmaf_v0.6.1 model path. Do not restore the historical canonical-6-only UGC table when refreshing full_features_5corpus.
- Aggregate full-feature training tables are rebuilt with ai/scripts/combine_full_feature_parquets.py; the normalized schema is corpus, source, frame_index, codec, <FULL_FEATURES>, vmaf.
Smoke: .venv/bin/python -m pytest ai/tests/test_konvid_full_features.py ai/tests/test_extract_ugc_features.py ai/tests/test_combine_full_feature_parquets.py ai/tests/test_feature_extractor_defaults.py ai/tests/test_bvi_dvc_dir_mode.py -q
Touched files: ai/data/feature_extractor.py, ai/scripts/*full_features*.py, ai/scripts/konvid_to_full_features.py, ai/src/vmaf_train/, ai/tests/test_*, ai/AGENTS.md, docs/ai/, docs/adr/0642-*.md, docs/research/0642-*.md, changelog.d/added/, and .workingdir2/AI_REFRESH_2026-05-20.md (ignored local ledger).
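A deterministic, balanced fold assignment can be built by ordering clip keys by a stable hash and dealing them round-robin. This is one way to get both properties, not necessarily what `konvid_to_full_features.py` does. `assign_folds()` and the choice of SHA-256 are assumptions.

```python
import hashlib


def assign_folds(clip_keys, n_folds=5):
    """Map each distinct clip key to fold0..fold{n_folds-1}, deterministically and balanced."""
    # Order by a stable content hash; Python's built-in hash() of str is salted per process.
    ranked = sorted(
        set(clip_keys),
        key=lambda k: (hashlib.sha256(k.encode("utf-8")).hexdigest(), k),
    )
    # Dealing round-robin keeps fold sizes within one clip of each other.
    return {key: f"fold{i % n_folds}" for i, key in enumerate(ranked)}
```

Because the order depends only on the keys, rerunning on a reshuffled or duplicated key list reproduces the same folds. That is what lets `eval_multiseed_v3_v4.py` compare runs fairly.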
fix/dev-container-encoder-probes (ADR-0641)¶
Low upstream rebase impact: implementation changes are fork-local dev-container / vmaf-tune files (dev/, tools/vmaf-tune/, docs, ADR, research, changelog) plus one fork-local FFmpeg integration patch. Upstream Netflix/vmaf does not ship vmaf-tune or this dev-MCP compose stack. The only upstream-adjacent file is ffmpeg-patches/0003-*, which targets FFmpeg n8.1.1 rather than Netflix/vmaf.
Rebase-sensitive fork invariants:
- dev/Containerfile must keep the pinned intel/vpl-gpu-rt source build and the post-install /usr/lib/x86_64-linux-gnu/libmfx-gen.so check whenever FFmpeg keeps --enable-libvpl. libvpl-dev alone exposes QSV encoders but cannot create an Arc/iGPU session, and installing the runtime outside the dispatcher search path revives the same MFX_ERR_NOT_FOUND failure.
- dev/docker-compose.yml must keep the dev-mcp healthcheck aligned with the stdio entrypoint (vmaf --version), not /sockets/vmaf-mcp.sock.
- vmaf-tune compare defaults to the production CPU set libx265, libsvtav1; archival software codecs remain explicit via --encoders.
- QSV VA-API device selection defaults to auto and uses Intel sysfs vendor-ID discovery; explicit --vaapi-device paths still override.
- ffmpeg-patches/0003-* must call vmaf_sycl_state_free(&s->sycl_state). The public SYCL API frees and nulls a VmafSyclState **; using the older single-pointer call breaks the in-container FFmpeg build with -Wincompatible-pointer-types.
Touched files: dev/Containerfile, dev/docker-compose.yml, dev/AGENTS.md, tools/vmaf-tune/src/vmaftune/{bisect.py,cli.py,compare.py,hw_devices.py}, tools/vmaf-tune/src/vmaftune/codec_adapters/_qsv_common.py, tools/vmaf-tune/tests/, ffmpeg-patches/0003-*, docs/usage/vmaf-tune.md, docs/development/dev-mcp.md, docs/state.md, docs/adr/0641-*.md, docs/research/0641-*.md, changelog.d/fixed/0641-*.md, and this file.
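Here is a sketch of vendor-ID discovery through sysfs. Intel's PCI vendor ID is 0x8086, and Linux exposes it at `/sys/class/drm/renderD*/device/vendor`. The function names, the sorted scan order, and returning `None` when no Intel node exists are assumptions, not the vmaf-tune implementation.

```python
from pathlib import Path

INTEL_PCI_VENDOR_ID = "0x8086"


def discover_intel_render_node(sysfs_root="/sys/class/drm", dev_root="/dev/dri"):
    """Return the first render node whose PCI vendor is Intel, or None."""
    for node in sorted(Path(sysfs_root).glob("renderD*")):
        try:
            vendor = (node / "device" / "vendor").read_text().strip().lower()
        except OSError:
            continue  # node without a readable vendor file (e.g. a virtual device)
        if vendor == INTEL_PCI_VENDOR_ID:
            return str(Path(dev_root) / node.name)
    return None


def select_vaapi_device(requested="auto", sysfs_root="/sys/class/drm", dev_root="/dev/dri"):
    """An explicit --vaapi-device path wins; "auto" triggers sysfs discovery."""
    if requested != "auto":
        return requested
    return discover_intel_render_node(sysfs_root, dev_root)
```

On a host with a discrete non-Intel GPU at renderD128 and an Intel iGPU at renderD129, `auto` picks renderD129. A fixed default of `/dev/dri/renderD128` would pick the wrong device.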
chore/ci-warning-omnibus (ADR-0635)¶
No rebase impact: all touched files are fork-local CI workflow YAML (.github/workflows/libvmaf-build-matrix.yml), fork-added docs (docs/mcp/tools.md, docs/adr/, docs/research/, changelog.d/), and this file. Upstream Netflix/vmaf does not use GitHub Actions workflows that overlap with these changes. No C sources, no public headers, and no FFmpeg patch series are involved.
Touched files: .github/workflows/libvmaf-build-matrix.yml (ilammy→TheMrMilchmann action swap; windows-latest→windows-2025; vulkaninfo stderr redirect + debug demotion; ccache-v2 key prefix), docs/mcp/tools.md (run_benchmark heading backtick removal + a-id drop), docs/adr/0635-ci-warning-omnibus-2026-05-19.md, docs/adr/README.md (one index row), docs/research/ci-warning-omnibus-2026-05-19.md, changelog.d/fixed/0635-ci-warning-omnibus.md, docs/rebase-notes.md (this entry).
ADR-0672 — Saliency materializer temporal controls¶
Saliency-table provenance impact. This widens the ADR-0655 materializer from historical mean-only saliency to the same temporal reducer family exposed by vmaf-tune.
Key invariants:
- ai/scripts/materialize_saliency_features.py forwards --temporal-aggregator and --ema-alpha into vmaftune.saliency.compute_saliency_map().
- Newly computed rows record saliency_model_id, saliency_aggregator, and saliency_ema_alpha by default.
- Rows skipped because they already contain finite saliency columns must not get invented model/reducer metadata; use --overwrite for intentional replacement.
Touched files: ai/scripts/materialize_saliency_features.py, ai/tests/test_materialize_saliency_features.py, ai/AGENTS.md, docs/ai/saliency-feature-materializer.md, docs/ai/u2netp-mirror.md, docs/adr/0672-saliency-materializer-temporal-controls.md, docs/research/0692-saliency-materializer-temporal-controls.md, changelog.d/added/0672-saliency-materializer-temporal-controls.md, docs/rebase-notes.md (this entry).
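The skip-versus-stamp rule can be sketched per row. `materialize_row()`, its `compute` callback, and the example model ID are hypothetical. The column names come from the ADR-0655 and ADR-0672 notes, as do the `ok` / `skipped-existing` status values.

```python
import math

SALIENCY_COLUMNS = ("saliency_mean", "saliency_var")


def has_finite_saliency(row):
    """True when every saliency column already holds a finite number."""
    return all(
        isinstance(row.get(col), (int, float)) and math.isfinite(row[col])
        for col in SALIENCY_COLUMNS
    )


def materialize_row(row, compute, model_id, aggregator, ema_alpha, overwrite=False):
    """Stamp provenance only on rows whose saliency is computed in this run."""
    if has_finite_saliency(row) and not overwrite:
        # Existing values are kept and no model/reducer metadata is invented.
        return {**row, "status": "skipped-existing"}
    mean, var = compute(aggregator, ema_alpha)  # forwards the reducer settings
    return {
        **row,
        "saliency_mean": mean,
        "saliency_var": var,
        "saliency_model_id": model_id,
        "saliency_aggregator": aggregator,
        "saliency_ema_alpha": ema_alpha,
        "status": "ok",
    }
```

A row holding NaN saliency is treated as missing and recomputed. A row with finite values is only replaced under `overwrite=True`, which mirrors `--overwrite`.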
ADR-0654 — Predictor saliency signals¶
vmaf-tune predict --use-saliency is a predictor-feature switch, not the ROI/QP sidecar path. Preserve the temporary raw-yuv420p decode in predictor_features._compute_saliency() before calling saliency.compute_saliency_map(raw_path, width, height, ...); the saliency helper remains raw-YUV-only even though the public predict source can be any FFmpeg-readable container.
predictor_train.project_row() must keep the 14-column predictor input layout stable. When real corpora carry probe_*_avg_bytes, saliency_mean, saliency_var, frame_diff_mean, y_avg, or y_var, preserve those finite values. Only legacy rows should fall back to bitrate-derived probe bytes and zero saliency / signalstats values.
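Here is a sketch of the preserve-finite-else-fallback rule for the optional predictor inputs. The helper names are hypothetical, and the fallback is applied per value here. The bitrate-derived probe-byte fallback is passed in rather than computed, because the note does not give its formula. The sketch does not reproduce the full 14-column layout.

```python
import math

# Optional per-row signals named in the note; legacy rows may lack them.
OPTIONAL_SIGNALS = ("saliency_mean", "saliency_var", "frame_diff_mean", "y_avg", "y_var")


def finite_or(value, fallback):
    """Keep a finite numeric value; otherwise use the fallback."""
    if isinstance(value, (int, float)) and not isinstance(value, bool) and math.isfinite(value):
        return float(value)
    return fallback


def project_optional_signals(row):
    """Real corpus values survive; only missing or non-finite ones fall back to 0.0."""
    return [finite_or(row.get(name), 0.0) for name in OPTIONAL_SIGNALS]


def project_probe_bytes(row, probe_columns, legacy_bytes):
    """probe_*_avg_bytes columns; legacy_bytes stands in for the bitrate-derived fallback."""
    return [finite_or(row.get(col), legacy_bytes) for col in probe_columns]
```

The column order stays fixed regardless of which values are present. That fixed order is what keeps the predictor's input layout stable across real and legacy rows.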
fix/ci-test-failures-omnibus (ADR-0637)¶
No rebase impact: all touched files are fork-local CI configuration (.github/workflows/tests-and-quality-gates.yml), MCP server tests (mcp-server/vmaf-mcp/tests/test_smoke_e2e.py), ADR files, and changelog fragments. No upstream C sources, no public headers, no FFmpeg patch series involved. The timeout and coverage-floor edits are fork-CI-specific and have no upstream equivalent.
fix/scaffold-audit-p0-silent-correctness (ADR-0620)¶
No rebase impact: all touched files are fork-local Python harness files and docs. No upstream C sources, no public headers, no FFmpeg patch series involved. The three fixed Python files (routine.py, train_test_model.py, local_explainer.py) are also present upstream, but the specific exception-handling changes are in fork-added call paths (extended-stats bagging, plot_scatter visualisation, local-explainer model dispatch). If upstream lands a conflicting change to these exact lines, the merge resolution is straightforward: keep the raise paths and update context if the upstream change affects surrounding logic.
Touched files: python/vmaf/tools/exceptions.py (3 new exception classes), python/vmaf/routine.py (P0-1 fix + CalibrationError import), python/vmaf/core/train_test_model.py (P0-2 fix + MissingLabelStddevError import), python/vmaf/core/local_explainer.py (P0-3 fix + EnsembleNotSupportedError import), python/test/test_adr0620_scaffold_audit_p0.py (16 regression tests), docs/adr/0620-scaffold-audit-p0-silent-correctness-fixes.md, docs/adr/README.md (one index row), docs/state.md (3 rows moved from Open to Recently closed), changelog.d/fixed/adr0620-scaffold-audit-p0-silent-correctness.md,
fix/scaffold-audit-p1-feature-plumbing (ADR-0613)¶
Touches core/src/hip/picture_hip.c, core/src/feature/feature_mobilesal.c, and core/src/libvmaf.c. Upstream Netflix/vmaf does not have a HIP backend, the mobilesal extractor, or the DNN multi-output guard — so no rebase conflict is expected on any of the C-side changes.
Touches tools/vmaf-tune/src/vmaftune/cli.py — upstream does not have vmaf-tune. No rebase conflict expected.
Doc paths (docs/api/dnn.md, docs/ai/models/mobilesal.md, docs/state.md, docs/adr/README.md) are fork-local only.
Rebase-sensitive invariant (C): picture_hip.c now compiles in two branches: #ifdef HAVE_HIPCC (real hipMalloc) and #else (-ENOSYS). Any upstream change to picture_hip.h's function signatures must be reflected in both branches.
Touched files: core/src/hip/picture_hip.c, core/src/feature/feature_mobilesal.c, core/src/libvmaf.c (comment-only at lines 1115, 1214), tools/vmaf-tune/src/vmaftune/cli.py, docs/api/dnn.md, docs/ai/models/mobilesal.md, docs/state.md, docs/adr/0639-scaffold-audit-p1-feature-plumbing-fixes.md, docs/adr/README.md, changelog.d/fixed/adr-0613-scaffold-audit-p1.md, docs/rebase-notes.md (this entry).
feat/zed-editor-project-config (ADR-0608)¶
No rebase-sensitive invariants: only .zed/ (new directory), .gitignore (.zed/local/ exclusion), docs/development/ide-setup.md (Zed section), docs/adr/0608-zed-editor-project-config.md (ADR), and the supporting ADR index-fragment and changelog files are touched. None of these paths overlap with upstream Netflix/vmaf. .vscode/ is unchanged.
Touched files: .zed/settings.json, .zed/tasks.json, .zed/debug.json (new), .gitignore (.zed/local/ entry), docs/development/ide-setup.md (Zed section appended), docs/adr/0608-zed-editor-project-config.md, docs/adr/_index_fragments/0608-zed-editor-project-config.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md (regenerated), changelog.d/added/0608-zed-editor-project-config.md, docs/rebase-notes.md (this entry).
plan/netflix-grade-encoding-roadmap (ADR-0613 – ADR-0618)¶
No rebase-sensitive invariants — all changes are planning documents only: six ADRs, six research digests, one roadmap overview, one changelog fragment, and ADR index rows in docs/adr/README.md. No C sources, headers, build files, or Python implementation files are touched. No upstream-shared paths are modified.
Touched files: docs/adr/0613-dynamic-optimizer.md, docs/adr/0614-per-shot-abr-rendition.md, docs/adr/0615-fast-nr-prescoring.md, docs/adr/0616-vmaf-neg-integration.md, docs/adr/0617-cross-shot-complexity-weighting.md, docs/adr/0618-content-aware-classifier.md, docs/adr/README.md (index rows), docs/research/0609-dynamic-optimizer-research.md, docs/research/0610-per-shot-abr-rendition-research.md, docs/research/0611-fast-nr-prescoring-research.md, docs/research/0612-vmaf-neg-integration-research.md, docs/research/0613-cross-shot-complexity-weighting-research.md, docs/research/0614-content-aware-classifier-research.md, docs/development/netflix-grade-encoding-pipeline-roadmap-2026-05-19.md, changelog.d/added/netflix-grade-encoding-pipeline-roadmap.md.
chore/scaffold-audit-p3-cleanup (ADR-0621)¶
No rebase-sensitive invariants. All changes are in fork-local files (ai/scripts/, scripts/dev/, python/test/, .semgrepignore, docs/ai/model-registry.md, docs/adr/, docs/state.md, changelog.d/). None of the touched Python test files are shared with Netflix upstream (Netflix does not ship asset_test.py or quality_runner_test.py). The python/test/*.py files the PR touches carry fork-added tests or skip-decorator updates; no upstream test assertions are modified.
Touched files: scripts/dev/permutation_importance.py, ai/scripts/*.py (13 files), python/test/result_test.py, python/test/routine_test.py, python/test/asset_test.py, python/test/feature_extractor_test.py, python/test/quality_runner_test.py, .semgrepignore, docs/ai/model-registry.md, docs/adr/0621-scaffold-audit-p3-cleanup.md, docs/adr/README.md, docs/state.md, changelog.d/fixed/0621-scaffold-audit-p3-cleanup.md.
feat/mcp-p1-vmaftune-extractors-models-progress (ADR-0608)¶
No rebase-sensitive invariants. The only changed files are:
- mcp-server/vmaf-mcp/src/vmaf_mcp/server.py: fork-local MCP server, never in Netflix upstream.
- mcp-server/vmaf-mcp/tests/: fork-local tests.
- mcp-server/vmaf-mcp/tests/test_smoke_e2e.py: updated expected tool-name set.
- docs/mcp/tools.md, docs/adr/0608-*.md, docs/adr/README.md, docs/rebase-notes.md: docs.
- changelog.d/added/0608-*.md: changelog fragment.
No C sources, public headers, meson_options.txt, ffmpeg-patches/, or build files are touched.
chore/renovate-customManagers-dev-image (ADR-0605)¶
No rebase-sensitive invariants — the only change is to renovate.json (adding eight new customManagers entries for Containerfile ARG-pinned deps; extending the FFmpeg manager's managerFilePatterns to also scan dev/Containerfile). renovate.json is fork-local and never appears in upstream Netflix/vmaf. No C sources, headers, or build files are touched.
Touched files: renovate.json (customManagers + packageRules), docs/adr/0605-renovate-custommgr-dev-image.md, docs/adr/README.md (one index row), changelog.d/changed/0605-renovate-custommgr-dev-image.md, docs/rebase-notes.md (this entry).
chore/rocm-7-13-bump-and-renovate-manager (ADR-0604)¶
No rebase-sensitive invariants — the only change is to renovate.json (adding a customManagers entry and customDatasources block for ROCm). renovate.json is fork-local and never appears in upstream Netflix/vmaf. dev/Containerfile is unchanged (7.2.3 remains the correct pin).
Touched files: renovate.json (customManagers + customDatasources), docs/adr/0604-rocm-renovate-manager.md, docs/adr/README.md (one index row), docs/research/rocm-version-audit-2026-05-19.md, changelog.d/changed/0604-rocm-renovate-manager.md, docs/rebase-notes.md (this entry).
fix/ubuntu-26-04-fallout (ADR-0603)¶
No rebase-sensitive invariants — all changes are in the build/CI layer (dev/Containerfile, CI workflow YAML, meson.build nvcc flags, pyproject.toml ceiling bumps) and do not touch any upstream-shared C sources, public headers, or Python test assertions.
The one meson.build addition (-D__MATH_NO_INLINES in cuda_flags) is additive and harmless on any glibc version; if upstream Netflix touches the CUDA flags block in core/src/meson.build, preserve the -D__MATH_NO_INLINES entry alongside whatever upstream adds.
Touched files: dev/Containerfile, core/src/meson.build (cuda_flags), tools/vmaf-tune/pyproject.toml (requires-python ceiling), ai/pyproject.toml (requires-python ceiling), .github/workflows/libvmaf-build-matrix.yml (CUDA version pin), docs/adr/0603-ubuntu-26-04-fallout-fixes.md, docs/adr/README.md (index row), changelog.d/fixed/ubuntu-26-04-fallout.md, docs/rebase-notes.md (this entry).
fix/macos-vmaf-write-output-segv (ADR-0602)¶
No rebase-sensitive invariants — the changes are purely defensive guards (NULL checks, pic_cnt > 0 guards) added to existing functions in core/src/libvmaf.c and core/src/output.c, and a new test in core/test/test_output.c. If upstream Netflix merges any change to vmaf_write_output_with_format or vmaf_write_output_json, re-apply the three guards (vmaf-NULL, feature_collector-NULL, output_path-NULL) and the pic_cnt > 0 guards in json_write_pooled_entry / xml_write_one_metric_pools to the merged version.
Touched files: core/src/libvmaf.c (NULL guards at top of vmaf_write_output_with_format), core/src/output.c (pic_cnt > 0 guards, NULL guards in JSON writer, split xml_write_pooled_and_aggregate into three helpers, remove unused n_frames variables), core/test/test_output.c (test_write_output_pic_cnt_zero regression test), docs/adr/0602-macos-vmaf-write-output-segv.md, docs/adr/README.md (index row), docs/state.md (Recently-closed row), docs/rebase-notes.md (this entry), changelog.d/fixed/0602-macos-vmaf-write-output-segv.md.
fix/vmaftune-qsv-amf-hw-init-and-probe-size (ADR-0601)¶
Rebase impact: tools/vmaf-tune/ only — no libvmaf C sources, public headers, or meson_options.txt touched. Zero upstream conflict surface.
Rebase-sensitive invariants:
- compare._QSV_ENCODERS must stay in sync with the set of QSV encoder names registered in codec_adapters/. If a new QSV adapter is added (e.g. vp9_qsv), add its encoder string to _QSV_ENCODERS in the same commit; omitting it silently skips the VA-API init chain for that encoder.
- BaseQsvAdapter.qsv_hw_init_args() and compare._hw_init_args_for_encoder() must produce identical flag sequences. If one is updated, update the other. A test in test_bbb_e2e_v14_bug_cluster.py verifies this invariant.
- The default _DEFAULT_VAAPI_DEVICE = "/dev/dri/renderD128" is also the default in BaseQsvAdapter.qsv_hw_init_args. Keep them in sync.
Touched files: tools/vmaf-tune/src/vmaftune/compare.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/codec_adapters/_qsv_common.py, tools/vmaf-tune/src/vmaftune/codec_adapters/_amf_common.py, tools/vmaf-tune/tests/test_bbb_e2e_v14_bug_cluster.py, docs/adr/0601-vmaftune-qsv-amf-hw-init-and-probe-fix.md, docs/adr/README.md (one index row), docs/usage/vmaf-tune.md (--vaapi-device flag + QSV init docs), docs/state.md (T-BBB-V14-HW-ENCODER-PROBE-QSV-INIT-2026-05-18 row), changelog.d/fixed/0601-vmaftune-qsv-amf-hw-init-and-probe-fix.md, docs/rebase-notes.md (this entry).
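One way to make the parity invariant hold by construction is to have both call sites delegate to one helper. The flag sequence below follows FFmpeg's documented pattern of deriving a QSV device from a VA-API device. It is illustrative only: the fork's actual flags, the encoder set, and the helper names are assumptions.

```python
_DEFAULT_VAAPI_DEVICE = "/dev/dri/renderD128"
# Hypothetical encoder set; the real one must mirror codec_adapters/.
_QSV_ENCODERS = frozenset({"h264_qsv", "hevc_qsv", "av1_qsv"})


def qsv_hw_init_args(vaapi_device=_DEFAULT_VAAPI_DEVICE):
    """VA-API device first, then a QSV device derived from it."""
    return [
        "-init_hw_device", f"vaapi=va:{vaapi_device}",
        "-init_hw_device", "qsv=qs@va",
        "-filter_hw_device", "qs",
    ]


def hw_init_args_for_encoder(encoder, vaapi_device=_DEFAULT_VAAPI_DEVICE):
    """compare-side entry point; delegates so both paths emit identical flags."""
    if encoder in _QSV_ENCODERS:
        return qsv_hw_init_args(vaapi_device)
    return []


def unregistered_qsv_encoders(adapter_encoders):
    """QSV encoders known to the adapters but missing from _QSV_ENCODERS."""
    return {e for e in adapter_encoders if e.endswith("_qsv")} - _QSV_ENCODERS
```

With a single source of flags, the parity test can never fail from drift. A non-empty result from `unregistered_qsv_encoders()` points to an adapter whose encoder would silently skip the init chain.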
chore/ffmpeg-patches-n811-full-feature-exposure-sync (ADR-0576)¶
Rebase impact: ffmpeg-patches/ only — no libvmaf C sources, public headers, or meson_options.txt touched. Upstream Netflix/vmaf does not ship ffmpeg-patches/; no rebase conflict surface.
Rebase-sensitive invariants:
- Patch 0014 targets the LIBVMAFContext struct and VmafConfiguration init blocks introduced cumulatively by patches 0003–0013. It must remain the final patch in the series (or be rebased against whichever patch last touches those init blocks if the series is reordered).
- The cpumask / gpumask AVOption names must match the field names in VmafConfiguration from core/include/libvmaf/libvmaf.h. If a future libvmaf refactor renames those fields, patch 0014's struct designators (.cpumask =, .gpumask =) must be updated to match.
- The feature= passthrough in the stock libvmaf filter continues to cover all extractors in feature_extractor_list[]; no patch is needed for new extractor additions unless they require a dedicated C-API init call (e.g., a new vmaf_<backend>_state_init() entry point).
fix/ffmpeg-patches-score-fmt-gap (ADR-1064)¶
Rebase impact: ffmpeg-patches/ only — adds patch 0016 and updates series.txt and README.md. No libvmaf C sources, public headers, or meson_options.txt touched.
Rebase-sensitive invariants:
- Patch 0016 must come after patch 0014 (which adds cpumask / gpumask to LIBVMAFContext). Patch 0016 adds score_fmt immediately after the int64_t gpumask field; if 0014 is reordered or the struct layout changes, the context lines in 0016's struct hunk must be updated.
- The vmaf_write_output_with_format symbol must be present in the libvmaf version checked by pkg-config. If a future refactor renames this entry point, all four uninit paths in patch 0016 must be updated.
- Patch 0016 requires a git am --3way replay against all 15 preceding patches before verifying a clean apply against n8.1.1 (the patch series is cumulative).
Re-test on rebase:
git clone --depth 1 --branch n8.1.1 https://git.ffmpeg.org/ffmpeg.git /tmp/ffmpeg-retest
git -C /tmp/ffmpeg-retest config user.email "lusoris@pm.me"
git -C /tmp/ffmpeg-retest config user.name "Lusoris"
for p in ffmpeg-patches/*.patch; do
  # git -C resolves relative patch paths inside the clone, so pass an absolute path.
  git -C /tmp/ffmpeg-retest am --3way "$PWD/$p" || { echo "FAILED: $p"; break; }
done
# Expect all 16 patches to apply cleanly, with no conflicts.
Upstream Netflix/vmaf has no ffmpeg-patches/; no rebase conflict surface against upstream/master. All patches in the series are fork-local.
feat/vmaftune-bisect-concurrency-cap (ADR-0577)¶
Rebase impact: pure Python — touches only tools/vmaf-tune/ and docs/. No C surface, no meson.build change, no public C-API change, no GPU path change.
Rebase-sensitive invariant: none. The _decode_semaphore singleton and set_decode_semaphore setter are new module-level additions in vmaftune/bisect.py; they do not conflict with any existing upstream pattern. The decode_semaphore keyword argument added to bisect_target_vmaf and make_bisect_predicate is backwards-compatible (defaults to None, falling back to the module-level semaphore).
Touched files: tools/vmaf-tune/src/vmaftune/bisect.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_bisect_concurrency_cap.py (new), tools/vmaf-tune/tests/test_bisect.py (exports check update), tools/vmaf-tune/tests/test_compare.py (semaphore kwarg assertion), docs/adr/0577-vmaftune-bisect-concurrency-cap-and-aggressive-cleanup.md (new), docs/adr/README.md (one index row), docs/usage/vmaf-tune.md (--max-concurrent-decodes docs + disk-mgmt section), changelog.d/fixed/vmaf-tune-bisect-concurrency-cap-enospc.md (new), docs/rebase-notes.md (this entry).
fix/windows-ci-sdk-pin-22621 (ADR-0575)¶
Rebase impact: tools only — touches core/tools/yuv_input.c. No meson.build change, no public C-API change, no GPU path change.
Rebase-sensitive invariant: #include <sys/stat.h> must remain before the #ifdef _MSC_VER macro block in yuv_input.c. If a rebase reorders these lines (e.g. by re-applying a prior ADR-0521 patch that placed the macros before the include), the MinGW64 and MSVC+SDK-26100 redefinition errors will recur.
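Here is a coarse grep-based guard for this include-order invariant, as a sketch. The name stat_include_first is hypothetical. The function only compares the first #include <sys/stat.h> line with the first #ifdef _MSC_VER line, so an unrelated earlier _MSC_VER block would make it report a false failure.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the first "#include <sys/stat.h>"
# appears before the first "#ifdef _MSC_VER" in the given C file.
# Fails if either line is missing or the order is wrong.
stat_include_first() {
    local file=$1 inc ifd
    inc=$(grep -n -m1 -E '^[[:space:]]*#[[:space:]]*include[[:space:]]*<sys/stat\.h>' "$file" | cut -d: -f1)
    ifd=$(grep -n -m1 -E '^[[:space:]]*#[[:space:]]*ifdef[[:space:]]+_MSC_VER' "$file" | cut -d: -f1)
    [ -n "$inc" ] && [ -n "$ifd" ] && [ "$inc" -lt "$ifd" ]
}

# Example: stat_include_first core/tools/yuv_input.c || echo "include order regressed" >&2
```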
Touched files: core/tools/yuv_input.c, docs/adr/0575-windows-msvc-stat-compat-include-order.md, docs/adr/README.md (one index row), docs/state.md (Updated note + T-WINDOWS-STAT-COMPAT row in Recently closed), changelog.d/fixed/0575-windows-stat-compat-include-order.md, docs/rebase-notes.md (this entry).
feat/integer-ssim-gpu-real-kernels (ADR-0564)¶
Rebase impact: low. The change touches two upstream-shared files:
- core/src/feature/feature_extractor.c: adds three extern declarations and three list entries (vmaf_fex_integer_ssim_cuda, vmaf_fex_integer_ssim_sycl, and a comment update). On rebase, apply after any upstream changes to this file.
- core/src/meson.build: adds one entry to the cuda_cu_sources dict and one entry to the C source list. The meson.build is append-only per fork coordination rules.
- core/src/feature/hip/integer_ssim_hip.c: full rewrite of the host glue. The pre-existing upstream file used float intermediates; this branch rewrites it to int64. If upstream ever ships a real integer_ssim HIP extractor, it will conflict — prefer the upstream version and re-test.
- core/src/feature/sycl/integer_ssim_sycl.cpp: appends a new extractor after the existing float_ssim_sycl code. On rebase, confirm the append point is still a clean } /* extern "C" */ boundary.
All new files (ssim_cuda.c, ssim_cuda.h, integer_ssim_score.cu) are fork-local with no upstream equivalent; no conflict expected.
Invariant: vmaf_fex_integer_ssim_cuda in ssim_cuda.c provides "ssim". The pre-existing vmaf_fex_integer_ssim_cuda in integer_ssim_cuda.c provides "float_ssim" — the naming is a historical misnomer kept for link-compat. Do not merge or rename without updating feature_extractor.c to match.
Touched files: core/src/feature/cuda/integer_ssim/integer_ssim_score.cu (new), core/src/feature/cuda/ssim_cuda.c (new), core/src/feature/cuda/ssim_cuda.h (new), core/src/feature/hip/integer_ssim_hip.c (rewritten), core/src/feature/sycl/integer_ssim_sycl.cpp (appended), core/src/feature/feature_extractor.c (extern + list entries), core/src/meson.build (PTX + C source entries), docs/adr/0564-integer-ssim-gpu-real-kernels.md, docs/adr/README.md (one index row), docs/research/0564-integer-ssim-gpu-real-kernels.md, docs/state.md (Recently-closed row), changelog.d/added/0564-integer-ssim-gpu-real-kernels.md, docs/rebase-notes.md (this entry).
fix/vmaftune-workdir-tmpfs-enospc (ADR-0549)¶
No rebase impact. All changes are confined to fork-local files:
- tools/vmaf-tune/src/vmaftune/bisect.py (fork-added tool).
- tools/vmaf-tune/src/vmaftune/cli.py (fork-added tool).
- tools/vmaf-tune/tests/test_workdir_enospc.py (new test file).
- tools/vmaf-tune/tests/test_compare.py (update expected kwargs).
- dev/Containerfile (fork-local; whole file is fork-added).
- dev/scripts/dev-mcp-entrypoint.sh (fork-local).
- docs/adr/0549-vmaftune-workdir-relocation.md, docs/state.md, docs/usage/vmaf-tune.md, docs/adr/README.md, changelog.d/fixed/vmaf-tune-enospc-workdir.md, docs/rebase-notes.md (fork-only doc tree).
No upstream-shared paths touched. VMAFTUNE_WORKDIR is a new fork-local environment variable; it has no upstream counterpart and poses no rebase conflict risk.
docs/vcq-223-local-explainer-hang-diagnosis (ADR-0563)¶
No rebase impact. All changes are confined to fork-local documentation:
- docs/adr/0551-local-explainer-hang-diagnosis.md (new file, fork-local).
- docs/research/0551-local-explainer-hang.md (new file, fork-local).
- docs/state.md — updated T-VCQ-223-LOCAL-EXPLAINER-HANG row (fork-local).
- changelog.d/fixed/0551-local-explainer-hang-diagnosis.md (new file, fork-local).
- docs/adr/README.md — new index row (fork-local).
- docs/rebase-notes.md — this entry (fork-local).
No upstream-shared C sources, Python sources, or build files are touched. The @unittest.skip decorator in python/test/local_explainer_test.py is explicitly not removed in this PR — that is a follow-up code change.
chore/hip-cuda-orphan-tu-cleanup (ADR-0546)¶
No rebase impact. All deleted files (adm_hip.c, motion_hip.c, vif_hip.c, feature_hip.h, integer_ciede_hip.c, integer_moment_hip.c, float_ssim_cuda.c) are fork-local additions with no upstream analogue. If upstream ever adds a file with the same name to core/src/feature/hip/ or core/src/feature/cuda/, a sync-upstream cherry-pick will restore it; the deletion here does not create a rebase conflict because the upstream tree never had these paths. The core/src/hip/meson.build edit is entirely fork-local. core/src/feature/hip/AGENTS.md and core/src/feature/cuda/AGENTS.md are fork-local files.
chore/hip-extractor-audit-verify-9 (ADR-0563)¶
No rebase impact. This PR is documentation and audit closure only. All changed files are fork-local:
- docs/adr/0563-hip-extractor-audit-verification.md (new ADR, fork-local).
- docs/research/0563-hip-extractor-audit-verification.md (new research digest, fork-local).
- docs/state.md (fork-local tracking ledger).
- docs/adr/README.md (fork-local ADR index).
- docs/rebase-notes.md (this file, fork-local).
- changelog.d/changed/0551-hip-extractor-audit-close.md (fork-local fragment).
No upstream Netflix/vmaf file is touched. No libvmaf/ source is touched. No ffmpeg-patches/ file is touched. No meson_options.txt key is added. No new rebase-sensitive invariant is introduced.
fix/dev-container-sycl-hip-runtime (ADR-0543)¶
No rebase impact. All changes are confined to:
- dev/Containerfile (fork-local; whole file is fork-added).
- dev/scripts/dev-mcp-entrypoint.sh (fork-local).
- dev/AGENTS.md (fork-local invariant note).
- docs/adr/0541-dev-container-sycl-hip-runtime-fix.md, docs/state.md, docs/development/dev-mcp.md, changelog.d/fixed/0541-dev-container-sycl-hip-runtime.md, docs/adr/README.md (fork-only doc tree).
No upstream-shared paths touched. The container's pinned NEO_VER / IGC_VER / GMMLIB_VER / ROCM_VER ARGs become a recurring maintenance item: when a future host kernel revs the i915 / xe / KFD UAPI, bump the relevant ARG. The dev-mcp-entrypoint.sh visibility probe surfaces such regressions in ≤ 30 s of container start so future bumps are easy to identify.
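When a host-kernel update forces a runtime bump, it helps to see the current pins at a glance. This is a minimal sketch. It assumes the pins are declared as ARG NAME=value lines with default values, and the helper name list_gpu_pins is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NEO_VER / IGC_VER / GMMLIB_VER / ROCM_VER
# pins declared as "ARG NAME=value" in a Containerfile, one "NAME=value"
# per line. Pins declared without a default value are not reported.
list_gpu_pins() {
    grep -E '^[[:space:]]*ARG[[:space:]]+(NEO_VER|IGC_VER|GMMLIB_VER|ROCM_VER)=' "$1" \
        | sed -E 's/^[[:space:]]*ARG[[:space:]]+//'
}

# Example: list_gpu_pins dev/Containerfile
```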
fix/dev-container-full-gpu-plumbing (ADR-0542)¶
No upstream-mirror paths touched. Modifies:
- dev/Containerfile (stage 1 apt list: +intel-media-va-driver-non-free, mesa-va-drivers; revised Vulkan ICD selection comment block).
- dev/docker-compose.yml (common-env: +HSA_OVERRIDE_GFX_VERSION, HSA_ENABLE_SDMA, +ROCR_VISIBLE_DEVICES; expanded NVIDIA_DRIVER_CAPABILITIES documentation comment).
- dev/scripts/dev-mcp-entrypoint.sh (entrypoint-time VK_DRIVER_FILES rewrite to exclude lavapipe whenever any real ICD is present).
- dev/AGENTS.md (new GPU-plumbing invariant section).
- docs/development/dev-mcp.md (backend matrix + env-var contract + HSA override documentation).
- docs/adr/0541-…md (+ index row), changelog.d/fixed/0541-…md, docs/state.md (one recently-closed row), this file.
Rebase sensitivity (none — container infra fork-local additive plus documentation): Every touched file lives under dev/, docs/, or changelog.d/. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry. The CLAUDE.md §12 r14 patch-stack rule does not apply (no libvmaf surface touched). Netflix upstream has no container infra under dev/ to conflict with.
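The entrypoint rule in this entry excludes lavapipe from VK_DRIVER_FILES whenever a real ICD is present. A small sketch makes that rule concrete. It is an illustration under assumptions, not the code in dev/scripts/dev-mcp-entrypoint.sh. It assumes the ICD manifests are *.json files in one directory and that lavapipe's manifest name starts with lvp_icd. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only (not the fork's entrypoint code): build a
# colon-separated VK_DRIVER_FILES value from the ICD manifests in a
# directory, dropping lavapipe (lvp_icd*.json) whenever another ICD exists.
select_vk_driver_files() {
    local dir=$1 f
    local -a real=() lvp=() chosen=()
    for f in "$dir"/*.json; do
        [ -e "$f" ] || continue            # glob matched nothing
        case ${f##*/} in
            lvp_icd*) lvp+=("$f") ;;
            *)        real+=("$f") ;;
        esac
    done
    if [ ${#real[@]} -gt 0 ]; then
        chosen=("${real[@]}")              # hardware ICDs win
    else
        chosen=("${lvp[@]}")               # software-only fallback
    fi
    local IFS=:                            # join array elements with ':'
    printf '%s\n' "${chosen[*]}"
}

# Example: export VK_DRIVER_FILES="$(select_vk_driver_files /usr/share/vulkan/icd.d)"
```

Keeping lavapipe only as a fallback means a container with no GPU still gets a working Vulkan device. On a host with a real GPU, the software ICD cannot be picked ahead of the hardware one.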
fix/integer-vif-cuda-chroma-plane (ADR-0547)¶
Touches upstream-mirror path. Modifies:
- core/src/feature/cuda/integer_vif_cuda.c (upstream-mirror — comment and option-help-text clarifications plus a one-shot warn-on-true block for the vestigial enable_chroma option; no kernel changes, no behaviour changes for any caller that doesn't set enable_chroma=true).
Why fork-local. Upstream Netflix/vmaf's CUDA VIF (verified at Netflix/vmaf@32780bd9b6:core/src/feature/cuda/integer_vif_cuda.c) neither carries the enable_chroma option nor has an n_planes field — it hardcodes data[0]-only access. The option was added by the fork-local PR #949 and the abandoned PR #948 attempted to mirror it on CPU; only the CUDA option landed and it was always a no-op.
Sync rule. If upstream ever adds genuine multi-plane VIF (would be a significant departure from the Sheikh & Bovik 2006 definition), revisit this clarification:
- Drop the vmaf_log VMAF_LOG_LEVEL_WARNING block from init_fex_cuda.
- Restore s->enable_chroma = false; to the active-clamp form OR plumb enable_chroma into the dispatch loop (depending on upstream's shape).
- Update docs/metrics/vif.md to advertise the per-chroma-plane features that newly exist.
- Move the docs/state.md row from "Confirmed not-affected" to a normal closed-bug row.
Until then, sync conflicts on this file should keep both: the fork-local warn-on-true block in init_fex_cuda (search for the ADR-0541 comment anchor) and any incoming upstream changes to the neighbouring kernel-load paths.
Test reference. core/test/test_integer_vif_cpu_cuda_parity.c (suite fast/gpu) is the regression gate; it must continue to pass after any sync.
feat/hip-float-vif-score-kernel-real (ADR-0539)¶
No rebase impact. Touches:
- core/src/feature/hip/hip_hsaco_stubs.c — fork-local TU; removes one VMAF_HSACO_WEAK_STUB(float_vif_score_hsaco) line. Upstream Netflix/vmaf has no HIP backend so no conflict possible.
- docs/adr/0539-hip-float-vif-stub-removal.md, docs/adr/README.md, docs/state.md, docs/backends/hip/overview.md, core/src/feature/hip/AGENTS.md, changelog.d/fixed/hip-float-vif-stub-removal.md — fork-local docs.
Rebase invariant: the moment another .hip kernel under feature/hip/<extractor>/ becomes standalone-buildable, the same one-line removal must happen in hip_hsaco_stubs.c for its symbol. The AGENTS.md note added by this PR captures the pattern.
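Here is a sketch of a rebase-time check for this one-line-removal pattern. It assumes each stub is written as VMAF_HSACO_WEAK_STUB(symbol) at the start of a line. has_hsaco_weak_stub is a hypothetical helper, not part of the tree.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the stubs file $1 still declares a weak
# stub for symbol $2. The symbol is interpolated into the regex, so it is
# assumed to contain only identifier characters.
has_hsaco_weak_stub() {
    grep -qE "^[[:space:]]*VMAF_HSACO_WEAK_STUB\\([[:space:]]*$2[[:space:]]*\\)" "$1"
}

# Example: has_hsaco_weak_stub core/src/feature/hip/hip_hsaco_stubs.c float_vif_score_hsaco \
#     && echo "stub resurrected by the rebase" >&2
```

After this PR, the check should fail for float_vif_score_hsaco. If it succeeds, a rebase has brought the removed line back.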
fix/hip-integer-vif-kernel-crash (ADR-0538)¶
Touches upstream-mirror paths. Modifies:
- core/src/feature/hip/integer_vif/vif_statistics.hip (fork-local — HIP backend addition; no upstream conflict expected).
- core/src/feature/hip/integer_vif_hip.c (fork-local — added by ADR-0379 / PR #...).
- core/src/meson.build (upstream-shared — adds entries to the fork-local hip_kernel_sources dict, which itself is inside the fork-local if is_hipcc_enabled and is_hip_enabled block; conflict risk only if upstream lands a totally different HIP build pipeline, which is implausible).
- core/src/hip/meson.build (fork-local).
- core/src/feature/hip/AGENTS.md (fork-local).
- core/src/feature/hip/hip_hsaco_stubs.c (NEW — fork-local).
No verbatim upstream code paths altered. Rebase invariant: if upstream ever adds an integer_vif HIP port, drop the fork's integer_vif/vif_statistics.hip and integer_vif_hip.c and re-evaluate whether the four ADR-0538 defects exist in their port too — three of the four are subtle (filter-half-width parsing, missing rd-write, host-pointer kernel arg) and an upstream re-implementation may well have the same blind spots.
fix/per-shot-bitrate-and-last-shot-chart (ADR-0531)¶
No rebase impact. All changes are confined to tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/report.py, tools/vmaf-tune/tests/test_per_shot.py, tools/vmaf-tune/tests/test_report.py, docs/adr/0531-*.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, and changelog.d/fixed/per-shot-bitrate-and-last-shot-chart.md. The tools/vmaf-tune/ tree does not exist in upstream Netflix/vmaf. No conflict risk on sync.
fix/per-shot-segments-readonly-cwd (ADR-0532)¶
fix/per-shot-segments-readonly-cwd (ADR-0530)¶
No rebase impact. All changes are confined to tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0530-*.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, and changelog.d/fixed/0532-per-shot-segments-readonly-cwd.md. The tools/vmaf-tune/ tree does not exist in upstream Netflix/vmaf. No conflict risk on sync.
fix/dev-container-dri-bind (ADR-0528)¶
No rebase impact. The only changed files are dev/docker-compose.yml, dev/AGENTS.md, docs/development/dev-mcp.md, docs/adr/0528-*.md, docs/adr/README.md, docs/rebase-notes.md, docs/state.md, and changelog.d/fixed/dev-container-dri-bind.md. None of these paths exist in upstream Netflix/vmaf. No conflict risk on sync.
fix/compare-rate-quality-chart-from-bisect-samples (ADR-0534)¶
No rebase impact. All changes are confined to fork-local files:
- tools/vmaf-tune/src/vmaftune/bisect.py — added BisectSample dataclass + BisectResult.samples field; bisect loop appends a sample per successful probe; to_recommend_result projects samples into the RecommendResult.bisect_samples tuple.
- tools/vmaf-tune/src/vmaftune/compare.py — RecommendResult gained optional bisect_samples field; to_row emits bisect_samples only when populated (additive v2 schema change); CSV writer pinned to extrasaction="ignore".
- tools/vmaf-tune/src/vmaftune/report.py — BisectSamplePoint added; CodecSweepPoint gained optional bisect_samples; _sweep_plot_fn rewrites the chart to render from samples when available (legacy connect-the-dots path retained with caveat note when samples absent).
- tools/vmaf-tune/src/vmaftune/cli.py — --target-vmafs default flipped to 75,80,85,90,93; both --target-vmaf and --target-vmafs wrapped with _TrackedDefaultAction so the v1 single-target back-compat path activates only when --target-vmaf NN is explicit and --target-vmafs is at its default; _sweep_point_from_json parses the new field.
None of these paths exist in upstream Netflix/vmaf (the entire tools/vmaf-tune/ tree is fork-local). No conflict risk on sync.
ffmpeg-patch stack: no impact (this PR doesn't touch any libvmaf C-API, public header, or meson_options.txt entry).
tooling/adr-atomic-allocator (ADR-0535)¶
No rebase impact. All changes are confined to scripts/adr/next-free.sh, scripts/adr/test-next-free.sh, docs/adr/0535-adr-atomic-allocator.md, docs/adr/README.md, docs/adr/0000-template.md, docs/state.md, docs/rebase-notes.md, docs/development/adr-workflow.md, changelog.d/added/0535-adr-atomic-allocator.md, CLAUDE.md, and AGENTS.md. None of these paths exist in upstream Netflix/vmaf. No conflict risk on sync.
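For contrast with the atomic allocator, the sketch below shows the naive allocation rule: take the highest existing number and add one. It is an illustration only, not the interface of scripts/adr/next-free.sh. It is deliberately not atomic: two branches running it at the same time get the same number. That is the kind of collision an atomic allocator is meant to prevent.

```shell
#!/usr/bin/env bash
# Naive, NON-atomic illustration (hypothetical function): print
# max(NNNN)+1 over the NNNN-*.md files in an ADR directory.
naive_next_adr() {
    local dir=${1:-docs/adr} max=0 f n
    for f in "$dir"/[0-9][0-9][0-9][0-9]-*.md; do
        [ -e "$f" ] || continue            # glob matched nothing
        n=${f##*/}                         # strip directory
        n=$((10#${n%%-*}))                 # leading digits, forced base 10
        if [ "$n" -gt "$max" ]; then max=$n; fi
    done
    printf '%04d\n' $((max + 1))
}
```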
fix/premium-vmaf-target-defaults (ADR-0538)¶
No rebase impact. All changes are confined to fork-local files:
- tools/vmaf-tune/src/vmaftune/cli.py — flip the --target-vmafs default from 75,80,85,90,93 (ADR-0534) to 94,96,97,98 and update the help text + supersession note.
- tools/vmaf-tune/src/vmaftune/bisect.py — add _ABSOLUTE_CRF_RANGE_BY_NAME + _absolute_crf_range(adapter); default the bisect search window to that absolute range; bypass adapter.validate's CRF gate inside _encode_and_score in favour of an explicit absolute-range check.
- tools/vmaf-tune/tests/test_bisect.py — update test_crf_range_defaults_to_adapter_quality_range -> test_crf_range_defaults_to_encoder_absolute_range; add three regression tests pinning libx264 / libx265 / libsvtav1 premium-archival targets at ok=True with achieved VMAF within 0.5 of target.
- tools/vmaf-tune/tests/test_compare_rate_quality_sweep.py — update the default-target-vmafs assertion to 94,96,97,98.
- tools/vmaf-tune/AGENTS.md — rewrite the --target-vmafs default rebase-sensitive-invariant note; add a new invariant for the bisect's encoder-absolute-range default.
- docs/usage/vmaf-tune.md — supersede the rate-quality-sweep section's defaults / rationale; add the High-VMAF bisect contract subsection with the per-codec absolute-range table.
- docs/adr/0538-premium-vmaf-target-defaults-and-bisect.md, docs/adr/0534-...md (status flip), docs/adr/README.md, docs/research/0537-...md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0538-premium-vmaf-target-defaults-and-bisect.md.
None of these paths exist in upstream Netflix/vmaf (the entire tools/vmaf-tune/ tree is fork-local). No conflict risk on sync.
ffmpeg-patch stack: no impact (this PR does not touch any libvmaf C-API, public header, or meson_options.txt entry).
feat/bvi-dvc-pre-extracted-input (ADR-0527)¶
No rebase impact. All changes are confined to ai/scripts/bvi_dvc_to_full_features.py, ai/tests/test_bvi_dvc_dir_mode.py, and doc / AGENTS.md / changelog files. None of these paths exist in upstream Netflix/vmaf. No conflict risk on sync.
fix/hip-motion-extractor-register (ADR-0523)¶
No rebase impact. The only changed file is core/src/feature/feature_extractor.c, which is a fork-local file (the HIP and Metal extractor blocks it contains have no upstream equivalent). Upstream Netflix/vmaf does not ship integer_motion_hip and the #if HAVE_HIP block does not exist in upstream. No conflict risk on sync.
fix/dnn-symbolic-batch-dim (ADR-0524)¶
Rebase-sensitive — core/src/libvmaf.c carries the fork-local tiny-AI loader path (vmaf_ctx_dnn_attach and the two helpers dnn_attach_nchw / dnn_attach_feature_vector); Netflix upstream does not ship a tiny-model surface. Changes:
- dnn_attach_nchw accepts in_shape[0] ∈ {1, -1} (symbolic batch folded to 1). The in_shape[1] != 1 (channels) reject is now separated from the batch check so each surface has its own diagnostic. The H/W reject message was sharpened to call out symbolic dims explicitly.
- dnn_attach_feature_vector gained the same batch policy before the feature-width check; the optional rank-2 second-input shape probe (extra_shape) follows the same rule.
- Per-frame inference (vmaf_ctx_dnn_run_frame_nchw and the feature-vector run path) is unchanged — both already emit shape[0] = 1 on the ORT Run call, so symbolic batch is purely a load-time concern.
core/src/dnn/AGENTS.md gained an "Invariant — symbolic batch dim acceptance (ADR-0524)" section. Reverting the batch acceptance breaks every shipped NR tiny model (model/tiny/nr_metric_v1*.onnx) plus any future trainer using the PyTorch dynamic_axes default.
ffmpeg-patch stack: no impact. The tiny-AI loader sits behind vmaf_use_tiny_model, which the in-tree FFmpeg patches do not touch.
Test fixture: model/tiny/smoke_v0_symbolic_batch.onnx is a fork-local 166-byte Identity graph with dim_param='batch' on dim 0. The fixture has no sidecar (loader handles -ENOENT gracefully) and is not listed in model/tiny/registry.json (which catalogues shipped models, not test fixtures).
fix/cli-no-reference-wire (ADR-0520)¶
Rebase-sensitive — core/tools/cli_parse.c + core/tools/vmaf.c are upstream-shared paths. Changes:
- cli_parse.c: the reference-required gate at the end of cli_parse() is now conditional on !settings->no_reference; the new branch requires tiny_model_path and force-enables no_prediction. If upstream Netflix reintroduces an unconditional if (!settings->path_ref) (the original shape pre-PR), restore the guard. The no_reference field has been in tree since the tiny-AI surface landed, so the merge conflict is a literal hunk replace.
- vmaf.c: in the main() body the file_ref = fopen(c.path_ref, ...) call now opens c.path_dist when c.no_reference is true. If an upstream sync collapses the open into a helper, propagate the conditional. open_input_videos also gained a no_reference-aware error message (uses c->path_dist when ref is being faked).
- core/tools/AGENTS.md: new ADR-0519 entry under "Governing ADRs" documents the CLI gate invariant + frame-loop invariant. Keep the entry through future merges.
core/src/libvmaf.c is not touched; the public API (vmaf_read_pictures, vmaf_use_tiny_model) is unchanged. The rank-4 DNN dispatch in vmaf_ctx_dnn_run_frame_nchw is upstream-internal and consumes ref argument bytes without caring about the slot semantics — the CLI's open-twice strategy works precisely because that dispatch is slot-agnostic. If an upstream refactor changes the dispatch to consult both ref and dist (e.g. for FR-only dual-input models), the fork-side wiring needs to either pass dist explicitly or expose a public vmaf_read_pictures_nr API.
ffmpeg-patch stack: no impact. The fork's FFmpeg filter does not surface NR-mode wiring today.
Netflix upstream does not ship --no-reference; the flag is a fork-local addition.
fix/msvc-unistd-gating (ADR-0521)¶
Rebase sensitivity: low — targeted portability guards on upstream-shared files.
Two files touched: core/src/feature/x86/vif_avx512.c and core/tools/yuv_input.c.
vif_avx512.c is a fork-local AVX-512 TU (no Netflix/vmaf upstream equivalent). The VMAF_NOINLINE_NOCLONE macro is added at the TU level and does not affect public headers or the ABI.
yuv_input.c has an upstream counterpart in Netflix/vmaf. The _WIN32 shims (fstat → _fstat64, S_ISREG, off_t) are added inside the existing #ifdef _WIN32 block, immediately after the already-present _fileno alias. On upstream sync: check whether Netflix has independently added MSVC portability to yuv_input.c; if so, prefer their solution and drop the fork-local block. The change is a four-line addition inside an existing guarded block — low merge conflict risk.
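The upstream check in the sync rule above can be scripted. This sketch assumes two things: the Netflix remote is named upstream, and upstream's own MSVC portability code, if it exists, would mention _fstat64 or define S_ISREG. The helper name is hypothetical, and the match is only a heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: read a copy of yuv_input.c on stdin and succeed
# if it already appears to carry MSVC stat shims of its own.
upstream_has_msvc_stat_shims() {
    grep -qE '_fstat64|#[[:space:]]*define[[:space:]]+S_ISREG'
}

# Example (remote name "upstream" is an assumption):
# git show upstream/master:core/tools/yuv_input.c | upstream_has_msvc_stat_shims \
#     && echo "upstream has its own shims: prefer them and drop the fork block"
```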
No ffmpeg-patches file touches either file. No public API change.
fix/per-shot-scene-threshold-and-1-shot-chart (ADR-0513)¶
No rebase impact. Changes confined to fork-local trees: tools/vmaf-tune/src/vmaftune/per_shot.py (new split_long_shots helper + diff_threshold / framerate / max_shot_duration_sec kwargs on detect_shots), tools/vmaf-tune/src/vmaftune/cli.py (--scene-threshold + --max-shot-duration flags on tune-per-shot), tools/vmaf-tune/src/vmaftune/report.py (_shot_plot_fn uses ax.hlines bands instead of a step plot), tools/vmaf-tune/tests/test_per_shot.py + test_report.py (6 new regression tests). The C-side core/tools/vmaf_per_shot.c is untouched — the new --scene-threshold flag passes through to the existing --diff-threshold C option that has been in tree since ADR-0222. Docs: ADR-0512, docs/adr/README.md index row, docs/usage/vmaf-tune.md flag rows + "Tuning scene sensitivity" section, docs/state.md Recently-closed rows, changelog.d/fixed/per-shot-scene-threshold-and-1-shot-chart.md. Netflix upstream does not ship tools/vmaf-tune/.
feat/compare-rate-quality-sweep — ADR-0516¶
No rebase impact. Changes confined to fork-local files: tools/vmaf-tune/src/vmaftune/compare.py (new compare_codecs_sweep, SweepReport, probe_encoder_available, detect_schema_version, v2 emitters, DEFAULT_CPU_ENCODERS, HARDWARE_ENCODERS, SCHEMA_VERSION_V1, SCHEMA_VERSION_V2), tools/vmaf-tune/src/vmaftune/cli.py (the _run_compare runner gains --target-vmafs parsing + sweep dispatch, the _run_report runner ingests v2 JSON into CodecSweepPoint), tools/vmaf-tune/src/vmaftune/report.py (new CodecSweepPoint, compute_pareto_frontier, _sweep_plot_fn per-codec line chart, v2 summary table renderers in both markdown + HTML), tools/vmaf-tune/tests/test_compare_rate_quality_sweep.py (new file, 24 regression tests), tools/vmaf-tune/AGENTS.md (v1 vs v2 schema invariant note + per-target bisect predicate construction rule), docs/usage/vmaf-tune.md (multi-target sweep section + flag table update + schema migration note), docs/adr/0516-vmaf-tune-compare-rate-quality-sweep.md (new), docs/adr/README.md (index row), docs/state.md (Recently closed row), changelog.d/added/compare-rate-quality-sweep.md (new fragment). Netflix upstream does not ship tools/vmaf-tune/; no upstream-shared C sources, public headers, Meson options, or ffmpeg-patches/ patches are touched.
fix/compare-source-is-container-plumbing (ADR-0509)¶
No rebase impact. Changes confined to tools/vmaf-tune/ (fork-local package) — src/vmaftune/cli.py (the _run_compare runner, the new _TrackedDefaultAction argparse action, _stamp_tracked_default_sentinels, and the _resolve_compare_source_geometry helper) and tests/test_compare.py (7 new regression tests). Netflix upstream does not ship tools/vmaf-tune/; no upstream-shared C sources, public headers, or build files are modified. The ADR (0509) and changelog fragment are fork-local docs only.
fix/chug-extract-vmaf-alignment — ADR-0510¶
No rebase impact. Changes confined to fork-local files: ai/scripts/extract_k150k_features.py, ai/scripts/chug_extract_features.py, ai/tests/test_extract_k150k_features.py, ai/tests/test_chug.py, ai/tests/test_chug_extract_features_smoke.py (new), docs/adr/0510-chug-extract-vmaf-alignment-fr-from-nr-guard.md (new), docs/adr/README.md (index row), docs/rebase-notes.md (this entry), docs/state.md (Recently closed row), ai/AGENTS.md (K150K-A invariant update), changelog.d/fixed/0509-*.md (new). The entire ai/ package and the FR-from-NR adapter pattern are fork-local — Netflix upstream has no CHUG ingestion, no K150K-A extractor, and no FR-from-NR adapter. No upstream-shared code, headers, build files, public C-API, or feature extractors are modified; the libvmaf CLI and all backends are unchanged.
fix/vulkan-two-variant-vif-shader (ADR-0512, supersedes ADR-0492)¶
No rebase impact on Netflix upstream — the Vulkan backend and its GLSL compute shaders are entirely fork-local (the core/src/vulkan/ and core/src/feature/vulkan/ trees do not exist in upstream). Fork-internal rebase invariants:
- core/src/feature/vulkan/shaders/vif.comp was renamed into vif_fp64.comp + new sibling vif_fp32.comp (the original file is removed). Any future patch series that targets vif.comp by name must be retargeted onto both variants — kernel changes touch BOTH in lockstep (see core/src/feature/vulkan/AGENTS.md).
- VmafVulkanContext gained an int has_float64 field (core/src/vulkan/vulkan_internal.h). Wire-compatible: feature TUs read it via the internal header, not the public ABI.
- VmafVulkanConfiguration gained a public int require_fp64 field (core/include/libvmaf/libvmaf_vulkan.h). Append-only ABI extension — existing zero-initialised callers get the auto-fallback default.
- New internal entry point vmaf_vulkan_context_new_with_opts(out, device_index, require_fp64); the original vmaf_vulkan_context_new is preserved as a wrapper that passes require_fp64 = 0.
- New CLI flag --vulkan-require-fp64 (and underscore alias --vulkan_require_fp64); the usage string was split across two fprintf calls to stay under the C99 4095-char string-literal limit.
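Any patch that still names the removed vif.comp needs retargeting onto both variants. This is a minimal sketch of a search for such patches. The name find_stale_vif_comp_refs is hypothetical, and the function only does a textual match on the path fragment shaders/vif.comp.

```shell
#!/usr/bin/env bash
# Hypothetical check: print the files under directory $1 that still
# reference shaders/vif.comp (not vif_fp64.comp / vif_fp32.comp).
# Returns 0 only when no such file is found.
find_stale_vif_comp_refs() {
    local hits
    hits=$(grep -rlE 'shaders/vif\.comp([^A-Za-z0-9_]|$)' "$1" 2>/dev/null)
    [ -z "$hits" ] && return 0
    printf '%s\n' "$hits"
    return 1
}

# Example: find_stale_vif_comp_refs path/to/patch-series || echo "retarget the files listed above" >&2
```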
fix/dev-container-backend-exposure (ADR-0514)¶
No rebase impact. dev/Containerfile, dev/docker-compose.yml, and dev/AGENTS.md are entirely fork-local — upstream Netflix/vmaf does not ship the vmaf-dev-mcp container stack. If upstream ever ships its own dev container, merge by adopting upstream's image discipline and re-applying the four invariants documented in dev/AGENTS.md (tcm/latest/lib on LD_LIBRARY_PATH, no VK_ICD_FILENAMES pin, /dev/dri/by-path bind-mount, build-time backend probe).
fix/mcp-run-benchmark-repair — no rebase impact¶
All changed files (mcp-server/, testdata/bench_all.sh, docs/adr/0513-*, docs/mcp/, changelog.d/, docs/state.md) are fork-local. bench_all.sh is a fork-local benchmarking helper not present in Netflix/vmaf upstream. No rebase action required on upstream sync.
fix/restore-cuda-kernel-lifecycle-helpers¶
No rebase impact. Investigation confirmed VmafCudaKernelLifecycle, VmafCudaKernelReadback, and helper functions are intact in core/src/cuda/kernel_template.h (fork-local, ADR-0246). Changes confined to docs/state.md and changelog.d/ -- both fork-local, not present in Netflix upstream.
refactor/aiutils-vmaftune-corpus-dedup — no rebase impact¶
tools/vmaf-tune/ is fork-local. ai/src/aiutils/ is fork-local. No upstream-shared files are touched; no rebase action required.
fix/saliency-per-mb-eval-2026-05-15 — integer_vif enable_chroma¶
No rebase impact: doc-only change. docs/state.md is fork-local and not present in Netflix upstream; upstream syncs do not touch it.
refactor/gpu-dispatch-env-pthread-once (ADR-0461)¶
No rebase impact: adds core/src/gpu_dispatch_env.{h,c} (new fork-local files) and modifies cuda/dispatch_strategy.c, vulkan/dispatch_strategy.c, sycl/dispatch_strategy.cpp — all fork-local TUs with no Netflix upstream equivalents. If upstream Netflix ever introduces their own dispatch env handling, merge by adopting their approach and dropping this helper.
test/output-public-api-coverage-2026-05-16¶
No rebase impact. All changes are confined to core/test/test_output.c and changelog.d/. The test file is fork-local; upstream Netflix/vmaf does not ship test_output.c. No upstream-shared C sources, public headers, or build files are modified.
fix/sycl-motion-fps-weight-vulkan-import-status-2026-05-16¶
Sub-task B -- integer_motion_v2_sycl.cpp: adds motion_fps_weight to MotionV2StateSycl struct and options_motion_v2_sycl[]. If upstream Netflix ever adds motion_fps_weight to integer_motion_v2.c (the CPU reference), both the SYCL and CUDA motion_v2 twins should pick it up in the same PR per the invariant added to core/src/feature/sycl/AGENTS.md.
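The twin invariant can be spot-checked textually during a sync. In this sketch, files_missing_option is hypothetical, and a plain substring match stands in for real parsing of the options tables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each file in "$@" (after the option name $1)
# that never mentions the option; return 1 if any file is missing it.
files_missing_option() {
    local opt=$1 f missing=0
    shift
    for f in "$@"; do
        if ! grep -qF -- "$opt" "$f"; then
            echo "$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (placeholder paths):
# files_missing_option motion_fps_weight path/to/integer_motion_v2_sycl.cpp path/to/cuda_motion_v2_twin.c
```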
Sub-task A -- libvmaf_vulkan.h: removes the stale "-ENOSYS until T7-29 part 2 lands" note from the @return lines of vmaf_vulkan_import_image, vmaf_vulkan_wait_compute, and vmaf_vulkan_read_imported_pictures. No upstream rebase conflict expected -- the public Vulkan header is fork-local.
No rebase impact: fix/dev-mcp-stage3-and-bundled-fixes-2026-05-16 touches only dev/Containerfile, dev/AGENTS.md, docs/research/0135-*, and changelog.d/fixed/dev-mcp-container-stage-3.md. These are all fork-local infra files; no upstream-shared code, headers, build files, or feature extractors are modified. No sync-upstream conflicts expected.
No rebase impact: audit/t3-9b-ssimulacra2-ulp-audit — doc-only PR (ADR-0467, changelog fragment, BACKLOG update). No C files touched. No upstream-shared paths modified.
No rebase impact: feat/tiny-ai-registry-ci-and-saliency-v2-promotion-2026-05-15 touches model/tiny/registry.json (fork-local tiny-AI registry), docs/ai/models/ (fork-local model cards), docs/adr/0444-* (fork-local ADR), and the registry-validate CI job (fork-local CI). No upstream-shared code, headers, build files, or feature extractors are modified; the saliency model change is registry and docs only — the C-side mobilesal extractor is unaffected. Sync-upstream conflicts in this area are not expected.
No rebase impact: fix/mcp-embedded-docs-live-2026-05-14 updates fork-local MCP documentation and tools/vmaf-tune auto-planner code only; it does not touch upstream-shared code, headers, build files, or rebase-sensitive invariants.
The intended reader is whoever runs the next /sync-upstream (see ADR-0002 and .claude/skills/sync-upstream/). Read top-to-bottom before resolving conflicts.
Format¶
Each entry is a ### NNNN — short title heading with three fields:
- Touches: paths likely to conflict on upstream merge.
- Invariant: what the fork relies on that an upstream change could silently drop.
- Re-test: the command(s) to run after the merge to confirm the invariant survived. Reproducer-style — no surrounding prose required.
IDs are assigned in commit order and never reused. A single entry may cover several PRs in one workstream; cross-link from the ID heading.
Entries (backfilled 2026-04-18 per ADR-0108 adoption)
perf/vif-cpu-workspace-hoist-2026-05-16 — VifState scratch buffer hoist (ADR-0452)
- Touches: core/src/feature/vif.c, core/src/feature/float_vif.c, core/src/feature/vif.h.
- Invariant: VifState gains a float *vif_buf field (VIF_SCRATCH_BUF_CNT × scaled_float_stride × scaled_hbytes, allocated in init, freed in close). compute_vif's signature gains a trailing float *data_buf parameter; callers must pass a buffer of at least 10 × ALIGN_CEIL(w * sizeof(float)) × hbytes. If an upstream Netflix commit modifies compute_vif's signature or adds fields to the implicit scratch layout, reconcile the fork's extra parameter with the upstream change. The fork does NOT carry the upstream per-frame allocation; if upstream adds a new scratch sub-plane, extend VIF_SCRATCH_BUF_CNT and the VifState::vif_buf allocation size in the same PR.
- Re-test:
```shell
ninja -C build
meson test -C build 2>&1 | grep -E "Ok|Fail"  # Confirm 0 failures
```
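When reconciling an upstream compute_vif change, the minimum data_buf size from the invariant above can be recomputed with a small helper. This is a hypothetical Python sketch, not fork code. It assumes VIF_SCRATCH_BUF_CNT equals the "10 ×" factor, sizeof(float) is 4, and ALIGN_CEIL rounds up to a 32-byte boundary; check the alignment constant in the fork's headers before relying on it.

```python
VIF_SCRATCH_BUF_CNT = 10  # assumed to equal the "10 x" factor in the invariant
SIZEOF_FLOAT = 4          # assumed 32-bit IEEE float


def align_ceil(nbytes: int, align: int = 32) -> int:
    """Round nbytes up to the next multiple of align (32 is an assumption)."""
    return (nbytes + align - 1) // align * align


def vif_scratch_bytes(w: int, hbytes: int, align: int = 32) -> int:
    """Minimum data_buf size: 10 x ALIGN_CEIL(w * sizeof(float)) x hbytes."""
    stride = align_ceil(w * SIZEOF_FLOAT, align)
    return VIF_SCRATCH_BUF_CNT * stride * hbytes


if __name__ == "__main__":
    # 576x324 test fixture: the float row stride is already 32-byte aligned.
    print(vif_scratch_bytes(576, 324))
```

If upstream adds a scratch sub-plane, bumping VIF_SCRATCH_BUF_CNT is the only change this arithmetic needs, which is why the note ties it to the VifState::vif_buf allocation.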
perf/cambi-sycl-event-chain-2026-05-16 — CAMBI SYCL GPU-to-GPU event chains (SY-1)
- Touches: core/src/feature/sycl/integer_cambi_sycl.cpp, core/src/feature/sycl/AGENTS.md, docs/adr/0471-cambi-sycl-event-chain.md.
- Invariant: launch_spatial_mask, launch_decimate, and launch_filter_mode now return sycl::event; all of them except launch_spatial_mask (which has no predecessor) also accept a sycl::event dep parameter. If upstream Netflix ever rewrites the CAMBI SYCL port (unlikely, since SYCL is fork-only), preserve the event-chain structure and keep the two semantically required q.wait() points (post-H2D and post-D2H). The CUDA twin (integer_cambi_cuda.c) keeps its synchronous v1 posture and is not affected by this change.
- Re-test:
meson test -C build --suite=fast cambi_sycl
python3 scripts/ci/cross_backend_parity_gate.py --feature cambi --places 4
fix/psnr-enable-chroma-gpu-parity-2026-05-16 — PSNR enable_chroma option GPU parity
- Touches: core/src/feature/cuda/integer_psnr_cuda.c, core/src/feature/sycl/integer_psnr_sycl.cpp, core/src/feature/vulkan/psnr_vulkan.c, docs/metrics/features.md, docs/research/0135-*, docs/adr/0452-*, changelog.d/fixed/psnr-enable-chroma-cross-backend.md, docs/research/0136-psnr-enable-chroma-cross-backend-2026-05-16.md, docs/adr/0453-psnr-enable-chroma-gpu-parity.md.
- Invariant: The enable_chroma option defaults to true on all backends. The n_planes clamp in GPU init() must stay in this order: (1) pix_fmt == YUV400P sets n_planes = 1; (2) !enable_chroma && n_planes > 1 also clamps to 1. If upstream Netflix ever adds option-table support to the CUDA/Vulkan twins, port any new options but preserve the enable_chroma entry and its default_val.b = true exactly; a default flip to false would silently suppress chroma output and break the cross-backend parity gate.
- Re-test:
python3 scripts/ci/cross_backend_parity_gate.py \
--backends cpu cuda --features psnr --places 4
python3 scripts/ci/cross_backend_parity_gate.py \
--backends cpu cuda --features psnr --places 4 \
--feature-opts 'psnr=enable_chroma=false' \
--feature-opts 'psnr_cuda=enable_chroma=false'
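The expected plane counts for the two parity-gate runs can be tabulated with a tiny model of the clamp. This is a hypothetical Python sketch with made-up names; the real clamp is C code in each backend's init().

```python
def effective_n_planes(pix_fmt: str, enable_chroma: bool = True, n_planes: int = 3) -> int:
    """Model of the two-step n_planes clamp described for GPU init()."""
    # Step 1: a luma-only input never has chroma planes.
    if pix_fmt == "YUV400P":
        n_planes = 1
    # Step 2: with chroma disabled, any multi-plane input is clamped to luma only.
    if not enable_chroma and n_planes > 1:
        n_planes = 1
    return n_planes


if __name__ == "__main__":
    for fmt in ("YUV420P", "YUV444P", "YUV400P"):
        for chroma in (True, False):
            print(fmt, chroma, effective_n_planes(fmt, chroma))
```

The default argument mirrors default_val.b = true: a YUV420P run without options should report all three planes, which is what the first parity-gate command checks.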
fix/vmaf-tune-temporal-saliency-2026-05-15 — recommend-saliency temporal aggregation
- Touches: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_saliency.py, and docs/usage/vmaf-tune.md.
- Invariant: mean remains the default compatibility reducer for recommend-saliency --saliency-aggregator. Changing the default changes user-visible saliency ROI behaviour and needs an ADR-0396 follow-up plus a usage-doc update.
- Re-test:
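To see why the default reducer is user-visible, here is a minimal sketch of pixel-wise temporal aggregation. The reducer table, function name, and flat-list map layout are hypothetical and not the vmaftune.saliency API; only "mean as the default" comes from the invariant.

```python
from statistics import fmean

# Hypothetical reducer table; the real names live in vmaftune/saliency.py.
REDUCERS = {
    "mean": fmean,
    "max": max,
}
DEFAULT_AGGREGATOR = "mean"  # the compatibility default the invariant pins


def aggregate_temporal(maps, aggregator=DEFAULT_AGGREGATOR):
    """Reduce per-frame saliency maps (equal-length flat lists) pixel-wise."""
    if not maps:
        raise ValueError("need at least one saliency map")
    reduce = REDUCERS[aggregator]
    # zip(*maps) yields one tuple per pixel holding that pixel's value in every frame.
    return [reduce(column) for column in zip(*maps)]
```

Swapping mean for max turns a pixel that is salient in a single frame into a fully salient ROI, which is exactly the behaviour change that needs the ADR follow-up.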
fix/saliency-per-mb-eval-2026-05-15 — saliency per-block IoU evaluator
- Touches: ai/scripts/eval_saliency_per_mb.py, ai/tests/test_eval_saliency_per_mb.py, ai/AGENTS.md, docs/ai/saliency-per-mb-eval.md, docs/ai/index.md, docs/ai/roadmap.md, and mkdocs.yml.
- Invariant: video-saliency model promotion should be measured at the encoder ROI block grid, not only by full-resolution pixel IoU. Keep the evaluator dependency-light (numpy plus .npy / PGM loaders) so training sandboxes can run it without Pillow or OpenCV.
- Re-test:
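Block-grid IoU differs from pixel IoU in that each block is first collapsed to a single salient/non-salient bit. The following is a stdlib-only sketch under stated assumptions: 16×16 blocks, a block counts as salient when at least half its pixels are set, and two empty masks score 1.0. None of these choices are taken from eval_saliency_per_mb.py.

```python
def block_mask(mask, width, height, block=16, threshold=0.5):
    """Collapse a row-major 0/1 pixel mask to a per-block boolean grid.

    Edge blocks are partial when width/height are not multiples of block.
    """
    blocks = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            total = hits = 0
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    total += 1
                    hits += mask[y * width + x]
            row.append(hits / total >= threshold)
        blocks.append(row)
    return blocks


def block_iou(pred, truth, width, height, block=16, threshold=0.5):
    """IoU of predicted vs ground-truth saliency measured on the block grid."""
    p = block_mask(pred, width, height, block, threshold)
    t = block_mask(truth, width, height, block, threshold)
    inter = union = 0
    for prow, trow in zip(p, t):
        for a, b in zip(prow, trow):
            inter += a and b
            union += a or b
    # Assumption: two empty masks agree perfectly.
    return inter / union if union else 1.0
```

Because the encoder can only act per block, a prediction that lights up a quarter of a block is effectively "off" for ROI purposes, which this metric reflects and pixel IoU does not.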
fix/chug-hdr-audit-splits-2026-05-15 — CHUG HDR audit and content-safe splits
- Touches: ai/scripts/chug_extract_features.py, ai/scripts/train_konvid_mos_head.py, ai/scripts/extract_k150k_features.py, ai/tests/test_chug.py, ai/tests/test_train_konvid_mos_head.py, ai/tests/test_extract_k150k_features.py, ai/AGENTS.md, docs/ai/chug-ingestion.md, and docs/ai/datasets/k150k.md.
- Invariant: CHUG train/validation/test partitions are keyed by chug_content_name, not by individual bitrate-ladder rows. The materialiser writes split, chug_split_key, and chug_split_policy into every feature row. Preserve the --audit-output ffprobe HDR metadata audit as a pre-training guard. train_konvid_mos_head.py consumes explicit splits when available instead of silently re-shuffling CHUG rows. The FR-from-NR parquet extractor preserves CHUG side metadata when --metadata-jsonl is supplied.
- Re-test:
PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_chug.py ai/tests/test_train_konvid_mos_head.py ai/tests/test_extract_k150k_features.py -q
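The point of keying by content is that every bitrate-ladder row of one source clip must land in the same partition, so the model never sees a clip in training and a re-encode of it in test. A minimal sketch of one way to get that property follows. The SHA-256 bucketing and the 10%/10% fractions are assumptions for illustration, not the fork's chug_split_policy.

```python
import hashlib


def split_for_content(content_name: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically assign a split from the content name alone."""
    digest = hashlib.sha256(content_name.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a float in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "validation"
    return "train"


def assign_splits(rows):
    """Tag every bitrate-ladder row with the split of its source content."""
    for row in rows:
        row["split"] = split_for_content(row["chug_content_name"])
    return rows
```

A hash of the name is stable across re-runs and machines, so re-materialising the features never reshuffles clips between partitions.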
fix/tiny-ai-rgb-high-bitdepth-2026-05-15 — LPIPS / DISTS high-bit-depth input
- Touches: core/src/dnn/tiny_extractor_template.h, core/src/feature/feature_lpips.c, core/src/feature/feature_dists.c, core/test/test_dists.c, core/src/dnn/AGENTS.md, core/src/feature/AGENTS.md, docs/ai/extractor-template.md, docs/ai/models/lpips_sq.md, docs/ai/models/dists_sq.md, docs/metrics/dists.md, and docs/metrics/features.md.
- Invariant: LPIPS and DISTS-Sq accept planar 8/10/12/16-bit YUV while keeping the ONNX tensor ABI unchanged: ImageNet-normalised RGB8, NCHW [1,3,H,W], named inputs ref/dist, scalar output score. High-bit-depth samples are little-endian 16-bit containers rounded into the 8-bit domain before the shared BT.709 limited-range RGB conversion.
- Re-test:
meson test -C core/build-tiny-rgb-hbd test_dists test_lpips --print-errorlogs
fix/mcp-runtime-doc-status-2026-05-15 — embedded MCP runtime docs
- Touches: docs/api/mcp.md, docs/development/build-flags.md, core/meson_options.txt, and libvmaf/AGENTS.md.
- Invariant: embedded MCP is no longer an all-entrypoint -ENOSYS scaffold. Preserve the runtime contract when rebasing: stdio / UDS / loopback-SSE transports are live when their build flags are enabled; compute_vmaf uses a per-call ephemeral VmafContext; mutating measurement-thread tools still wait on the future SPSC bridge; enable_mcp remains default-off until that bridge lands.
- Re-test:
meson setup /tmp/vmaf-mcp-doc-check -Denable_mcp=true -Denable_mcp_stdio=true -Denable_mcp_uds=true -Denable_mcp_sse=enabled && ninja -C /tmp/vmaf-mcp-doc-check test_mcp_smoke && meson test -C /tmp/vmaf-mcp-doc-check test_mcp_smoke
fix/mcp-compute-vmaf-high-bitdepth-2026-05-15 — MCP compute_vmaf bitdepth
- Touches: core/src/mcp/compute_vmaf.c, core/src/mcp/dispatcher.c, core/test/test_mcp_smoke.c, core/src/mcp/AGENTS.md, docs/api/mcp.md, and docs/mcp/embedded.md.
- Invariant: embedded MCP compute_vmaf accepts YUV420p at 8/10/12/16 bpc and defaults to 8 when bitdepth is omitted. High-bit-depth raw samples are little-endian 16-bit words read directly into libvmaf picture storage. Do not silently add YUV422P or YUV444P without extending the tool schema with an explicit pixel_format argument and matching docs/tests.
- Re-test:
meson test -C core/build-mcp-hbd test_mcp_smoke --print-errorlogs
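When building a high-bit-depth fixture for this tool, the expected raw frame size and sample decoding follow from the invariant: one byte per sample at 8 bpc, one little-endian 16-bit word otherwise. The helpers below are a hypothetical Python sketch, not part of the MCP server. Rounding chroma dimensions up for odd sizes is an assumption.

```python
import struct


def yuv420p_frame_bytes(width: int, height: int, bpc: int = 8) -> int:
    """Raw size of one YUV420p frame; samples above 8 bpc use 2-byte containers."""
    if bpc not in (8, 10, 12, 16):
        raise ValueError(f"unsupported bitdepth {bpc}")
    bytes_per_sample = 1 if bpc == 8 else 2
    # 4:2:0 chroma is half resolution in both directions (rounded up, assumed).
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    samples = width * height + 2 * chroma_w * chroma_h
    return samples * bytes_per_sample


def read_le16_samples(buf: bytes):
    """Decode little-endian 16-bit sample words from a raw buffer."""
    count = len(buf) // 2
    return list(struct.unpack(f"<{count}H", buf[: count * 2]))
```

A 576x324 clip is therefore 279936 bytes per frame at 8 bpc and twice that at 10 bpc; a file whose size is not a multiple of the frame size usually means the wrong bitdepth was passed.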
fix/chug-cuda-feature-split-2026-05-15 — FR-from-NR CUDA feature split
- Touches: ai/scripts/extract_k150k_features.py, ai/tests/test_extract_k150k_features.py, ai/AGENTS.md, docs/ai/datasets/k150k.md, and docs/ai/chug-ingestion.md.
- Invariant: CUDA mode in the FR-from-NR extractor uses explicit CUDA feature names for the stable CUDA pass and --cpu-vmaf-bin for the residual CPU feature pass (float_ssim, cambi). Do not collapse this back into one generic all-feature --backend cuda invocation; on local CHUG 10-bit clips that path reproduced duplicate feature-key writes and CUDA context synchronization failures.
- Re-test:
PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_extract_k150k_features.py -q
fix/vmaf-tune-libvpx-adapter-2026-05-14 — vmaf-tune libvpx-vp9 adapter
- Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, tools/vmaf-tune/src/vmaftune/codec_adapters/libvpx.py, tools/vmaf-tune/src/vmaftune/encode.py, and docs/usage/vmaf-tune*.md.
- Invariant: libvpx-vp9 stays a normal codec-adapter registry entry. Do not add VP9 branches to the corpus / encode search loops; the adapter owns -deadline good, -cpu-used, -crf, -b:v 0, -row-mt 1, and the FFmpeg-native -pass / -passlogfile wiring. supports_encoder_stats remains false until a binary VP9 first-pass stats parser lands.
- Re-test:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_codec_adapter_libvpx.py tools/vmaf-tune/tests/test_encode_multi_codec.py -q
fix/ai-frame-loader-color-pixfmt-2026-05-14 — packed colour frame loader
- Touches: ai/src/vmaf_train/data/frame_loader.py, ai/tests/test_frame_loader.py, docs/ai/training.md, and ai/AGENTS.md.
- Invariant: frame-loader support is limited to byte-contiguous formats with an unambiguous tensor shape: gray -> HxW, and rgb24 / bgr24 / rgba / bgra -> HxWxC. Planar or subsampled formats such as yuv420p must keep failing before ffmpeg is spawned until a PR adds explicit plane semantics.
- Re-test:
PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_frame_loader.py -q
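The accepted set can be expressed as a lookup that fails before any subprocess starts. This is a hypothetical sketch, not the frame_loader.py API; the format list and shapes come from the invariant, and the function names are made up.

```python
# Channel counts for the packed formats the loader accepts (None = single plane).
PACKED_CHANNELS = {"gray": None, "rgb24": 3, "bgr24": 3, "rgba": 4, "bgra": 4}


def frame_shape(pix_fmt: str, width: int, height: int) -> tuple:
    """Return the tensor shape for one frame, rejecting planar/subsampled formats.

    Meant to be called before any ffmpeg subprocess is spawned.
    """
    if pix_fmt not in PACKED_CHANNELS:
        raise ValueError(
            f"unsupported pix_fmt {pix_fmt!r}: planar/subsampled formats need explicit plane semantics"
        )
    channels = PACKED_CHANNELS[pix_fmt]
    return (height, width) if channels is None else (height, width, channels)


def frame_nbytes(pix_fmt: str, width: int, height: int) -> int:
    """Bytes per frame for a byte-contiguous format (one byte per sample)."""
    n = 1
    for dim in frame_shape(pix_fmt, width, height):
        n *= dim
    return n
```

Because every accepted format stores one byte per sample with no plane boundaries, the frame size is just the product of the shape, which is what makes reading fixed-size chunks from ffmpeg's stdout safe.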
fix/mkdocs-strict-pre-push-2026-05-15 — mkdocs strict-mode pre-push hook
- Touches: scripts/git-hooks/pre-push-mkdocs-strict.sh (new), scripts/git-hooks/pre-push (delegation call appended), .pre-commit-config.yaml (new mkdocs-strict local hook entry), docs/adr/0466-mkdocs-strict-pre-push-hook.md (new ADR).
- Invariant: The hook gate mirrors the CI docs.yml lane (ADR-0403): mkdocs build --strict --quiet with the repo-root mkdocs.yml. Keeping the hook's config-file flag pointed at mkdocs.yml in the repo root is load-bearing; if mkdocs.yml is ever moved, update pre-push-mkdocs-strict.sh in the same PR. The SKIP=mkdocs-strict bypass token is the per-hook escape hatch; preserve it across rebases so the CI-gate-mirror contract (which also respects SKIP) stays coherent.
- Re-test:
# Touch a docs file with a known-good anchor, push — hook should pass:
touch docs/index.md && git push
# Touch docs/index.md, add a broken anchor ref, push — hook should block:
echo "[bad](#nonexistent)" >> docs/index.md && git push
# Restore the file afterwards:
git checkout -- docs/index.md
fix/dists-extractor-2026-05-14 — DISTS-Sq extractor smoke surface
- Touches: core/src/feature/feature_extractor.c, core/src/feature/feature_dists.c, core/src/meson.build, core/test/meson.build, .gitattributes, model/tiny/registry.json, and docs/metrics/dists.md.
- Invariant: dists_sq is a registered tiny-AI full-reference extractor that mirrors LPIPS' two-input ABI: model_path option, VMAF_DISTS_SQ_MODEL_PATH environment fallback, ONNX inputs ref/dist, scalar output score, and emitted feature key dists_sq. model/tiny/dists_sq.onnx is a smoke placeholder marked dists_sq_placeholder_v0; do not present it as production DISTS weights.
- Re-test:
meson test -C build-dists test_dists && .venv/bin/python ai/scripts/validate_model_registry.py model/tiny/registry.json
fix/backlog-gap-pass-10-2026-05-14 — KonViD-150k split score ingestion
- Touches: ai/scripts/konvid_150k_to_corpus_jsonl.py, ai/tests/test_konvid_150k.py, docs/ai/konvid-150k-ingestion.md, ai/AGENTS.md.
- Invariant: konvid_150k_to_corpus_jsonl.py accepts both the URL-manifest layout (manifest.csv + clips/) and the staged split score layout (k150ka_scores.csv / k150kb_scores.csv plus k150ka_extracted/ / k150kb_extracted/). An explicit --manifest-csv stays strict and must not silently fall back. The output JSONL schema remains unchanged.
- Re-test:
PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_konvid_150k.py -q
fix/backlog-gap-pass-11-2026-05-14 — vmaf-tune auto source probe
- Touches: tools/vmaf-tune/src/vmaftune/auto.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, docs/usage/vmaf-tune.md, and tools/vmaf-tune/AGENTS.md.
- Invariant: run_auto(smoke=False, meta_override=None) is not a scaffold. It probes source geometry, duration, and HDR once through _probe_source_meta, using a single subprocess-runner seam for testability. A probe failure must degrade to conservative defaults rather than raising or reintroducing NotImplementedError.
- Re-test:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_auto_short_circuits.py \
tools/vmaf-tune/tests/test_auto_confidence_aware.py \
tools/vmaf-tune/tests/test_auto_recipe_overrides.py \
tools/vmaf-tune/tests/test_auto_phase_f1_f2.py -q
.venv/bin/python -m ruff check \
tools/vmaf-tune/src/vmaftune/auto.py \
tools/vmaf-tune/src/vmaftune/cli.py \
tools/vmaf-tune/tests/test_auto_short_circuits.py
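The "probe once, degrade on failure" contract with an injectable runner can be sketched as below. This is a hypothetical stand-in for _probe_source_meta, not the fork's code: the default values are made up, and treating the PQ (smpte2084) and HLG (arib-std-b67) transfer characteristics as HDR is an assumption. The ffprobe flags are standard ffprobe options.

```python
import json
import subprocess

# Hypothetical conservative defaults; the fork's real values live in auto.py.
DEFAULT_META = {"width": 1920, "height": 1080, "duration_s": 0.0, "hdr": False}


def probe_source_meta(path, runner=subprocess.run):
    """Probe geometry, duration and HDR once; fall back to defaults on any failure."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=width,height,color_transfer:format=duration",
        "-of", "json", path,
    ]
    try:
        proc = runner(cmd, capture_output=True, text=True, check=True)
        data = json.loads(proc.stdout)
        stream = data["streams"][0]
        transfer = stream.get("color_transfer", "")
        return {
            "width": int(stream["width"]),
            "height": int(stream["height"]),
            "duration_s": float(data.get("format", {}).get("duration", 0.0)),
            # Assumed HDR test: PQ or HLG transfer characteristics.
            "hdr": transfer in ("smpte2084", "arib-std-b67"),
        }
    except (OSError, subprocess.SubprocessError, ValueError, KeyError, IndexError):
        # Missing ffprobe, non-zero exit, bad JSON, or no video stream: degrade, never raise.
        return dict(DEFAULT_META)
```

Passing the runner as a parameter is what lets tests feed canned ffprobe JSON or simulate a missing binary without touching the filesystem.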
fix/backlog-gap-pass-12-2026-05-14 — MCP docs + SSIMULACRA2 snapshot hardening
- Touches: python/test/ssimulacra2_test.py, docs/mcp/index.md, docs/mcp/embedded.md, docs/mcp/release-channel.md, mcp-server/vmaf-mcp/README.md, mcp-server/AGENTS.md.
- Invariant: the SSIMULACRA2 snapshot gate remains fork-local. It pins current extractor output for the 576x324 fixture with explicit x86_64 and arm64/aarch64 baselines, pins the shared 160x90 tail fixture, and must invoke the repo vmaf binary with an argv list, not a shell string. The external MCP server docs list all seven live tools. The embedded MCP docs describe the v3 runtime accurately: stdio, UDS, and loopback SSE are live; list_features and compute_vmaf are live; the SPSC measurement-thread drain and mutating tools remain future work.
- Re-test:
PYTHONPATH=python .venv/bin/python -m pytest python/test/ssimulacra2_test.py -q && PYTHONPATH=mcp-server/vmaf-mcp/src .venv/bin/python -m pytest mcp-server/vmaf-mcp/tests/test_server.py -q
fix/read-json-model-dynamic-limits-2026-05-14 — dynamic JSON model arrays
- Touches: core/src/read_json_model.c, core/src/model.h, core/src/model.c, and core/test/test_model.c.
- Invariant: JSON model loading grows VmafModel.feature and score_transform.knots.list from the payload. Do not restore the old fixed MAX_FEATURE_COUNT / MAX_KNOT_COUNT parser caps; models with 65+ features or 11+ score-transform knots must parse when the JSON is otherwise valid.
- Re-test:
meson test -C build test_model --print-errorlogs.
fix/real-scaffold-gap-pass-4-2026-05-14 — vmaf-tune x264 two-pass
- Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py, tools/vmaf-tune/src/vmaftune/encode.py consumers, and docs/usage/vmaf-tune.md.
- Invariant: libx264 opts into the shared Phase F two-pass seam through supports_two_pass = True and two_pass_args() -> ("-pass", N, "-passlogfile", path). The encode driver must stay adapter-driven; do not add an x264 branch in build_ffmpeg_command.
- Re-test:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_codec_adapter_x265_two_pass.py tools/vmaf-tune/tests/test_auto_phase_f1_f2.py -q
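The "adapter-driven, no codec branches" rule can be illustrated with a minimal sketch. The adapter class, the two_pass_args parameters, and this build_ffmpeg_command body are hypothetical and far simpler than vmaftune's. Only the capability flag and the returned argument tuple follow the invariant; the pass number is stringified here so the result is a valid argv.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class X264Adapter:
    """Hypothetical adapter; only the two-pass seam is modelled."""

    encoder: str = "libx264"
    supports_two_pass: bool = True

    def two_pass_args(self, pass_num: int, logfile: str) -> tuple:
        return ("-pass", str(pass_num), "-passlogfile", logfile)


def build_ffmpeg_command(adapter, src, dst, pass_num=None, logfile=None):
    """Codec-agnostic driver: no per-codec branches, only adapter hooks."""
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", adapter.encoder]
    # Capability check instead of `if codec == "libx264"`: any adapter can opt in.
    if pass_num is not None and getattr(adapter, "supports_two_pass", False):
        cmd += list(adapter.two_pass_args(pass_num, logfile))
    cmd.append(dst)
    return cmd
```

A new two-pass codec then only needs its own adapter with supports_two_pass = True; the driver stays untouched, which is the property the invariant protects during rebases.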
fix/backlog-gap-pass-8-2026-05-14 — CUDA psnr_hvs drain-batch integration
- Touches: core/src/feature/cuda/integer_psnr_hvs_cuda.c, core/src/feature/cuda/AGENTS.md, docs/backends/cuda/overview.md, docs/development/cuda-profile-2026-05-03.md.
- Invariant: integer_psnr_hvs_cuda.c enqueues all three plane-partial DtoH copies on s->lc.str during submit, calls vmaf_cuda_kernel_submit_post_record(&s->lc, fex->cu_state), and uses vmaf_cuda_kernel_collect_wait(&s->lc, fex->cu_state) in collect before reading h_partials[]. Do not move the readback + raw cuStreamSynchronize(s->lc.str) back into collect.
- Re-test:
meson setup build-cuda-drain libvmaf -Denable_cuda=true -Denable_sycl=false -Denable_vulkan=disabled --buildtype=debug && ninja -C build-cuda-drain src/libvmaf.so.3.0.0 && python3 scripts/ci/cross_backend_vif_diff.py --vmaf-binary "$PWD/build-cuda-drain/tools/vmaf" --reference testdata/ref_576x324_48f.yuv --distorted testdata/dis_576x324_48f.yuv --width 576 --height 324 --feature psnr_hvs --backend cuda --places 3
If the local CUDA fatbin build is blocked by toolkit include-path drift, at minimum compile the touched host TU:
ninja -C build-cuda-drain src/liblibvmaf_feature.a.p/feature_cuda_integer_psnr_hvs_cuda.c.o
fix/tune-scaffold-gap-pass-2-2026-05-14 — vmaf-tune per-shot real bisect CLI
- Touches: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0392-vmaf-tune-phase-d-per-shot.md, tools/vmaf-tune/AGENTS.md.
- Invariant: the CLI default for vmaf-tune tune-per-shot is the real Phase-B bisect backend. It extracts each detected half-open shot to temporary raw YUV, passes explicit geometry into bisect_target_vmaf, and emits measured per-shot VMAF in the JSON plan. --predicate-module MODULE:CALLABLE is the only CLI path that bypasses real bisect; the adapter-default predicate remains library-only dry-run behaviour.
- Re-test:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q.
fix/scaffold-gap-pass-2026-05-14b — vmaf-tune compare real bisect CLI
- Touches: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/compare.py, tools/vmaf-tune/tests/test_compare.py, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-bisect.md, tools/vmaf-tune/AGENTS.md.
- Invariant: the CLI default for vmaf-tune compare is the real Phase-B bisect backend when source geometry is supplied. The programmatic compare_codecs() default may still return ok=False because it lacks geometry, but the CLI must not silently rank using a placeholder predicate. --predicate-module MODULE:CALLABLE is the explicit custom/test escape hatch.
- Re-test:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_compare.py -q.
fix/scaffold-gap-pass-2026-05-14 — vmaf-tune hardware predictor real weights
- Touches: tools/vmaf-tune/src/vmaftune/predictor_train.py, model/predictor_{h264,hevc,av1}_{nvenc,qsv}.onnx, matching model cards, docs/ai/predictor.md, tools/vmaf-tune/AGENTS.md.
- Invariant: the trainer accepts canonical Phase-A rows and historical hardware-sweep aliases. Do not add an external corpus-conversion script for runs/phase_a/full_grid/comprehensive.jsonl; the loader is the compatibility seam.
- Re-test:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest tools/vmaf-tune/tests/test_predictor_train.py -q.
fix/vmaf-tune-ai-scaffold-state-cleanup — auto HDR dispatch + ensemble seed registry flip (2026-05-14)
- Touches: tools/vmaf-tune/src/vmaftune/auto.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, tools/vmaf-tune/tests/test_auto_recipe_overrides.py, model/tiny/registry.json, python/test/model_registry_schema_test.py, docs/usage/vmaf-tune.md, docs/state.md, docs/research/0100-vmaf-tune-ai-scaffold-audit-2026-05-14.md, changelog.d/fixed/vmaf-tune-ai-scaffold-state-cleanup.md.
- Invariant: vmaf-tune auto must use vmaftune.hdr.hdr_codec_args(codec, info) per HDR cell; a single generic PQ tuple is not valid because x265/SVT-AV1/NVENC/VVenC carry HDR signalling through different ffmpeg flag families. The recipe-adjusted effective_thresholds from _apply_recipe_override must be the thresholds used for F.3 decisions and JSON metadata. The five fr_regressor_v2_ensemble_v1_seed{0..4} registry rows are production entries (smoke: false) only while their sidecars carry matching SHA-256s and a passing PROMOTE gate.
- Re-test on rebase:
PYTHONPATH=tools/vmaf-tune/src python -m pytest \
tools/vmaf-tune/tests/test_auto_short_circuits.py \
tools/vmaf-tune/tests/test_auto_recipe_overrides.py \
tools/vmaf-tune/tests/test_auto_confidence_aware.py \
tools/vmaf-tune/tests/test_hdr.py -v
PYTHONPATH=python python -m pytest python/test/model_registry_schema_test.py -v
bash core/test/dnn/test_registry.sh
feat/libvmaf-metal-filter-iosurface — Metal IOSurface zero-copy import (ADR-0423)
- Touches: core/include/libvmaf/libvmaf_metal.h (new VmafMetalExternalHandles + four entry points appended), core/src/metal/picture_import.mm (new TU implementing the IOSurfaceLock + memcpy ring), core/src/metal/state_priv.h (shared struct defs between common.mm and picture_import.mm), core/src/metal/import.h (internal bridge for the libvmaf.c HAVE_METAL block), core/src/metal/common.mm (state-free hook for the import ring), core/src/libvmaf.c (HAVE_METAL block: vmaf_metal_import_state / vmaf_metal_read_imported_pictures), core/src/metal/meson.build (one-line TU registration), core/test/test_metal_smoke.c (input-validation + device-default skip semantics), ffmpeg-patches/0013-libvmaf-add-libvmaf-metal-filter.patch (new), ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
- Invariant: the import path is geometry-pinned to the first (w, h, bpc) tuple seen; subsequent imports with a different geometry return -EINVAL. Ring depth is 2 slots (VMAF_METAL_IMPORT_RING); a slot is identified by index % VMAF_METAL_IMPORT_RING and discarded if the caller's index no longer matches the stored one. The CPU memcpy path is synchronous, so vmaf_metal_wait_compute is a no-op (returns 0); do not promote it to an MTLSharedEvent drain without first switching the import body to an async MTLCommandBuffer submission. The Apple-Family-7+ gate is enforced inside vmaf_metal_state_init_external via [device supportsFamily:MTLGPUFamilyApple7] → -ENODEV on non-Apple hosts; the ffmpeg patch surfaces this as AVERROR(ENODEV) at config_props_metal time. Symbol names are load-bearing for the check_pkg_config probe in patch 0013; do not rename them without simultaneously updating the patch.
- Re-test on rebase:
meson setup build libvmaf -Denable_metal=enabled \
-Denable_cuda=false -Denable_sycl=false
ninja -C build
nm build/libvmaf/libvmaf.dylib | grep vmaf_metal_picture_import
git -C ffmpeg-8 reset --hard n8.1.1
for p in ffmpeg-patches/[0-9][0-9][0-9][0-9]-*.patch; do  # 000*-* would skip 0010+ (including 0013)
git -C ffmpeg-8 am --3way "$p" || break
done
Upstream Netflix/vmaf has no Metal backend; no rebase conflict surface against upstream/master. The 0013 patch is fork-local.
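The slot and geometry rules in the invariant are easy to get subtly wrong when resolving a conflict in picture_import.mm, so here they are as a small model. This is a hypothetical Python sketch of the described behaviour, not a binding to the Objective-C++ code. The class and method names are made up, and returning None stands in for "discarded".

```python
import errno

VMAF_METAL_IMPORT_RING = 2  # ring depth stated in the invariant


class ImportRing:
    """Hypothetical model of the geometry-pinned, index-tagged import ring."""

    def __init__(self):
        self.geometry = None
        self.slots = [None] * VMAF_METAL_IMPORT_RING  # each slot holds (index, payload)

    def import_picture(self, index, w, h, bpc, payload):
        if self.geometry is None:
            self.geometry = (w, h, bpc)  # pin to the first (w, h, bpc) tuple seen
        elif self.geometry != (w, h, bpc):
            return -errno.EINVAL
        self.slots[index % VMAF_METAL_IMPORT_RING] = (index, payload)
        return 0

    def read(self, index):
        entry = self.slots[index % VMAF_METAL_IMPORT_RING]
        # A slot overwritten by a newer index no longer belongs to this caller.
        if entry is None or entry[0] != index:
            return None
        return entry[1]
```

With a depth of 2, importing index 2 reuses index 0's slot, so a late read of index 0 must be rejected rather than silently returning frame 2.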
fix/saliency-per-mb-eval-2026-05-15 (Batch 4) — Metal install + header fix (ADR-0437)
- Touches: core/include/core/meson.build (adds is_metal_enabled guard + libvmaf_metal.h to platform_specific_headers), core/test/meson.build (adds test_metal_install_header under host_machine.system() == 'darwin'), core/test/test_metal_install_header.c (new compile+link smoke test), docs/api/gpu.md (Metal + HIP symbol corrections, IOSurface sub-API table), docs/adr/0437-*.md, docs/adr/_index_fragments/0437-*.md, changelog.d/fixed/metal-public-header-install-and-import-state.md, docs/state.md.
- Invariant: The is_metal_enabled guard mirrors the Vulkan guard (is_vulkan_enabled): both treat enabled and auto as "install the header". Do not change this to install only on enabled; auto on macOS resolves to a real Metal build, and the header must be present for FFmpeg's check_pkg_config to succeed (same rationale as ADR-0192 for Vulkan).
- No rebase conflict surface: upstream Netflix/vmaf has no Metal backend; core/include/core/meson.build diverges from upstream at the first is_cuda_enabled line. The only conflict risk is a batch that also edits platform_specific_headers; resolve it by keeping both additions.
- Re-test on rebase:
meson setup build -Denable_metal=enabled \
-Denable_cuda=false -Denable_sycl=false
ninja -C build
meson install -C build --destdir /tmp/vmaf-test-install
ls /tmp/vmaf-test-install/usr/local/include/libvmaf/libvmaf_metal.h
fix/metal-includes-and-ffmpeg-patch — Metal kernel batch T8-1c–k (ADR-0421)
- Touches: core/src/feature/metal/*.metal (7 new kernel files), core/src/feature/metal/*_metal.mm (7 new dispatch files replacing the *_metal.c scaffolds), core/src/metal/meson.build (.air custom_targets + metallib pipeline), ffmpeg-patches/0012-*.
- Invariant: no atomic_ulong in any .metal file; Apple MSL silently drops 64-bit atomic updates (CI run 25685703780). All kernels use a per-WG float/uint partials array indexed by bid.y * grid_groups.x + bid.x, and the host reduces in double. float_moment_metal.mm corrects provided_features (the scaffold had float_moment1/2/std; the correct names are float_moment_ref1st/dis1st/ref2nd/dis2nd).
- Re-test on rebase (macOS, Apple-Family-7+):
meson setup build libvmaf -Denable_metal=enabled \
-Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_metal_smoke
On Linux: same build without -Denable_metal (Metal subdir excluded); no Metal tests registered. No upstream rebase conflict surface.
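The partials layout that replaces 64-bit atomics can be checked on the host side in a few lines. This is a hypothetical Python sketch of the indexing and reduction the invariant describes, not the fork's Objective-C++ host code. The array module's "f" typecode stands in for the kernel's 32-bit float partials.

```python
from array import array


def partial_index(bid_x: int, bid_y: int, grid_groups_x: int) -> int:
    """Flat slot for work-group (bid_x, bid_y): bid.y * grid_groups.x + bid.x."""
    return bid_y * grid_groups_x + bid_x


def host_reduce(partials) -> float:
    """Sum per-WG float32 partials in double precision (Python floats are doubles)."""
    total = 0.0
    for value in partials:
        total += float(value)
    return total


if __name__ == "__main__":
    groups_x, groups_y = 3, 2
    partials = array("f", [0.0] * (groups_x * groups_y))  # one float32 slot per WG
    for by in range(groups_y):
        for bx in range(groups_x):
            partials[partial_index(bx, by, groups_x)] = 0.1
    print(host_reduce(partials))
```

Each work-group writes only its own slot, so no atomics are needed at all, and accumulating the slots in double on the host avoids compounding float32 rounding across many groups.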
feat/metal-runtime-t8-1b — Metal backend runtime PR (ADR-0420)
- Touches: core/src/metal/common.{c→mm,h}, core/src/metal/picture_metal.{c→mm}, core/src/metal/kernel_template.{c→mm}, core/src/metal/meson.build, core/test/test_metal_smoke.c, core/src/metal/AGENTS.md.
- Invariant: the Metal backend ships three Objective-C++ TUs (common.mm, picture_metal.mm, kernel_template.mm) instead of the T8-1 pure-C scaffold. The public ABI in core/include/libvmaf/libvmaf_metal.h is unchanged. Internal metal/common.h gained two accessor declarations (vmaf_metal_context_{device,queue}_handle) that consumer TUs call to retrieve bridge-retained void * Metal handles. The Obj-C++ TUs compile with -fobjc-arc via add_project_arguments(language: 'objcpp'). ARC + __bridge_retained / __bridge_transfer casts manage the +1 retain that lives on each C-struct slot. Upstream Netflix/vmaf has no Metal backend; there is no rebase conflict surface against upstream/master.
- Re-test: meson setup build -Denable_metal=enabled on a recent macOS host (Apple Silicon preferred), then meson compile -C build and meson test -C build test_metal_smoke. On Apple-7+ the smoke test exercises real-device paths; on Intel Macs it short-circuits cleanly on -ENODEV. Non-Darwin builds are unaffected; subdir('metal') is already gated to Darwin.
fix/sve2-probe-darwin-gate — SVE2 build probe gated to non-Darwin hosts (ADR-0419)
- Touches: core/src/meson.build (the SVE2 cc.compiles() probe block).
- Invariant: is_sve2_supported = false is forced when host_machine.system() == 'darwin', mirroring the runtime __linux__ gate in core/src/arm/cpu.c::vmaf_get_cpu_flags_arm(). Apple Silicon (M1–M4) is ARMv8.x without SVE2 hardware, so the build-time and runtime gates must stay in lockstep.
- Re-test: if upstream Netflix ever introduces its own SVE2 probe in core/src/meson.build, drop the fork-local Darwin short-circuit in favour of theirs if it matches the darwin ⇒ false invariant; otherwise layer the Darwin guard on top. Reverse the gate only when (a) Apple ships an Arm part with SVE2 (no public roadmap as of 2026-05) and (b) the runtime probe in arm/cpu.c grows a Darwin branch (e.g. sysctlbyname("hw.optional.arm.FEAT_SVE2", ...)).
fix/macos-test-recal-post-vif-sync — macOS Python test assertions recalibrated for post-bf9ad333 VMAF/ADM values (ADR-0418)
- Touches: python/test/local_explainer_test.py, python/test/vmafexec_test.py, python/test/vmafexec_feature_extractor_test.py.
- Invariant: 9+ assertions in those files were updated to the post-VIF-sync values that the macOS-libm binary actually produces, because Netflix upstream only shipped recalibration fixtures for the test_run_vmaf_* tests (via 142c0671 / 7209110e / d93495f5 / fe756c9f), not for local_explainer_test::test_explain_vmaf_results, vmafexec_test::test_run_vmafexec_runner_akiyo_*, or the 5× vmafexec_feature_extractor::test_run_float_adm_fextractor_adm_* cases. Each updated line carries an inline # post-VIF-sync (#758) recal comment so the divergence is greppable. Affected values (old → new):
  - local_explainer_test.py line 103: 76.68425574067017 → 76.66740228116836
  - vmafexec_test.py line 871: 132.732952 → 132.732323
  - vmafexec_test.py lines 926, 1032, 1086: 88.030463 → 88.030322
  - vmafexec_feature_extractor_test.py line 1834: 0.9420788125 → 0.9185737499999999
  - vmafexec_feature_extractor_test.py line 1897: 0.9517253541666667 → 0.8902739375
  - vmafexec_feature_extractor_test.py line 1960: 0.9554477708333334 → 0.8780868749999998
  - vmafexec_feature_extractor_test.py line 2023: 0.9662835416666665 → 0.8407157499999999
  - vmafexec_feature_extractor_test.py line 3030: 0.96851 → 0.962086
- Re-test: after the next /sync-upstream, if upstream has shipped recalibrated fixtures for any of the listed test names, prefer the upstream values over the fork-recalibrated ones in this entry. Mechanically: git grep "post-VIF-sync (#758) recal" enumerates every divergence; for each row, diff against the upstream value at the same line. If upstream still hasn't shipped the fixtures, leave the fork values in place; they are verified against the on-the-fly-VIF binary at the macOS-libm precision floor.
fix/vif-upstream-onthefly-filter-sync — VIF synced to Netflix upstream bf9ad333 + 8c645ce3
- Touches: core/src/feature/vif.c, vif.h, vif_tools.c, vif_tools.h, vif_options.h, float_vif.c; python/test/quality_runner_test.py, feature_extractor_test.py, result_test.py, vmafexec_test.py.
- Invariant: the fork's VIF C side now matches upstream HEAD verbatim for the listed files. The only fork-local divergence is float_vif.c::extract() passing s->vif_skip_scale0 ? 1 : 0 for the new compute_vif() parameter (instead of the upstream pattern of reading from a flag set in init). Test cherry-picks took upstream values for VIF score assertions and fork values for VMAF_legacy_score / VMAF_score where the fork's binary diverges from upstream's at places=4 (already pre-loosened).
- Re-test: after the next /sync-upstream, run meson test -C build (must remain 54/54 OK) and PYTHONPATH=python python -m pytest python/test/feature_extractor_test.py python/test/quality_runner_test.py -q (must show 0 failures, excluding the niqe_runner skimage environment issue). If upstream reverts or further modifies on-the-fly filter generation, re-sync this entry's invariant rather than carrying a fork-local divergence.
fix/master-build-failures-sycl-vulkan — SYCL macro collision + Vulkan SDK fallback + Cambi FR atom rename
- Touches: core/src/feature/sycl/integer_adm_sycl.cpp, core/src/vulkan/common.c, python/vmaf/core/cambi_feature_extractor.py, python/test/cambi_test.py.
- Invariant 1 (SYCL): adm_options.h defines ADM_BORDER_FACTOR as a C macro; the #undef before the constexpr redeclaration must remain if upstream ever changes the macro name or value. If upstream removes the macro entirely, the #ifdef-guarded #undef is a no-op and safe.
- Invariant 2 (Vulkan): the #ifndef VK_API_VERSION_1_4 guard must remain until Ubuntu 22.04 is retired from CI (or the minimum Vulkan Headers version is bumped past 1.3.280). Track via ADR-0264 (NVIDIA driver regression gate).
- Invariant 3 (Cambi FR atom feature): CambiFullReferenceFeatureExtractor uses "cambi_encbd" (not "cambi") as the atom feature name for the distorted CAMBI score. If upstream changes the enc_bitdepth option alias from "encbd" to something else, the vmafexec XML key changes and the Python extractor's wildcard prefix must be updated to match.
- Re-test:
python3 -m pytest python/test/cambi_test.py -k "full_reference or fullref" -v
fix/precommit-onnx-binary-exclude — ADR collision sweep + pre-commit hook hardening
- Touches: docs/adr/*.md (28 files renumbered to 0388–0415), docs/adr/README.md, docs/adr/_index_fragments/, scripts/ci/check-adr-numbering.sh, .pre-commit-config.yaml, tools/vmaf-tune/tests/test_hdr.py.
- Invariant: No rebase impact on libvmaf C sources. The ADR renumbering affects documentation only; no code paths reference ADR numbers at runtime. Any in-flight branches that reference the old ADR numbers (0241-vmaf-tiny-v3, 0279-fr-regressor-v2-probabilistic, etc.) will need their references updated to the new numbers after rebasing onto master.
- Re-test: bash scripts/ci/check-adr-numbering.sh must print "ADR numbering check passed." The three hooks must then pass; pre-commit run accepts a single hook id per invocation, so run them separately:
pre-commit run end-of-file-fixer --all-files
pre-commit run ruff-check --all-files
pre-commit run check-adr-numbering --all-files
fix/round8-mcp-tmpdir-leak — MCP describe_worst_frames tmp-dir cleanup¶
No rebase impact: this change is MCP-server-only (mcp-server/vmaf-mcp/src/vmaf_mcp/server.py), touches no libvmaf C source, no public C API headers, no Meson build files, and no FFmpeg patch stack entries. Upstream Netflix/vmaf does not have the MCP server. The change adds a shutil.rmtree before the per-invocation PNG generation loop.
- Re-test:
PYTHONPATH=mcp-server/vmaf-mcp/src python -m pytest mcp-server/vmaf-mcp/tests/test_server.py::test_describe_worst_frames_tmpdir_cleared_on_next_call (must report 1 passed).
fix/round8-opt-nan-bypass — NaN rejection in set_option_double¶
- Touches: `core/src/opt.c` adds `#include <math.h>` and an `isnan(n)` guard in `set_option_double`.
- Invariant: all callers of `vmaf_option_set` with `VMAF_OPT_TYPE_DOUBLE` must receive `-EINVAL` when the value string parses to NaN. Upstream Netflix/vmaf's `opt.c` does not yet have this guard. If Netflix merges a version of `opt.c` that modifies `set_option_double` (e.g. to add a new type or change the `strtod` flow), verify that the `isnan` guard is preserved and still sits before the `n < min` / `n > max` checks. Every ordered comparison with NaN is false, so the range checks alone cannot reject it.
- Re-test: `meson setup core/build-test libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_tests=true && ninja -C core/build-test test/test_opt && core/build-test/test/test_opt` must report 25/25 passed, including `test_double_nan_is_rejected` and `test_double_inf_rejected_when_max_finite`.
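The ordering matters because NaN compares false against both bounds. A naive range check therefore accepts it. The Python below mirrors the control flow as a sketch only. `parse_double_option` is a hypothetical name, and `float()` stands in for C `strtod` (both accept `"nan"` and `"inf"`).

```python
import errno
import math


def parse_double_option(text: str, min_val: float, max_val: float):
    """Return (status, value): status 0 on success, -EINVAL on rejection.

    Sketch of the set_option_double flow; not the libvmaf C code.
    """
    try:
        n = float(text)  # strtod-like: accepts "nan", "inf", "1e3", ...
    except ValueError:
        return -errno.EINVAL, None
    if math.isnan(n):  # must precede the range check below
        return -errno.EINVAL, None
    if n < min_val or n > max_val:  # NaN would pass both comparisons
        return -errno.EINVAL, None
    return 0, n
```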
fix/fex-dedup-by-provided-feature — feature-extractor dedup by provided-feature names (ADR-0385)
- Touches: `core/src/fex_ctx_vector.c` (new `provided_features_overlap()` helper, updated `feature_extractor_vector_append()` dedup logic); `core/test/test_feature_extractor.c` (new regression test); `core/test/meson.build` (adds `fex_ctx_vector.c` to the test target sources).
- Rebase impact: Low. The change is entirely internal to `fex_ctx_vector.c`. It touches no public C API headers, no `core/include/` files, no `meson_options.txt`, and no FFmpeg patch stack entries. If upstream Netflix/vmaf rewrites `fex_ctx_vector.c` in a future sync, port the `provided_features_overlap()` helper and its two-stage dedup logic forward. Reverting to name-only dedup re-opens T-CUDA-FEATURE-EXTRACTOR-DOUBLE-WRITE on every GPU binary that combines `--feature <name>` with a default model load.
- Re-test:
```bash
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu test_feature_extractor
# Expect: 6/6 tests passed, including
#   test_fex_vector_dedup_by_provided_feature_name: pass
# Verify no "cannot be overwritten" warnings:
build-cpu/tools/vmaf \
  -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
  -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
  -w 576 -h 324 -p 420 -b 8 --feature adm --threads 1 \
  2>&1 | grep "cannot be overwritten" | wc -l
# → 0
```
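The two-stage rule works as follows. A new extractor is skipped if one with the same name is already present, or if any feature it provides is already provided by an existing entry. This is a hypothetical Python model of the logic in `fex_ctx_vector.c`, not its C code. The `Extractor` type, the function names, and the feature names in the test are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extractor:
    name: str
    provided_features: tuple


def provided_features_overlap(a: Extractor, b: Extractor) -> bool:
    return bool(set(a.provided_features) & set(b.provided_features))


def append_deduped(vector: list, fex: Extractor) -> bool:
    """Append fex unless it duplicates an existing entry; return True if appended."""
    for existing in vector:
        if existing.name == fex.name:  # stage 1: name-only dedup
            return False
        if provided_features_overlap(existing, fex):  # stage 2: provided-feature overlap
            return False
    vector.append(fex)
    return True
```

With name-only dedup, the third call in the test below would succeed. Two extractors would then write the same feature, which is the double-write this entry guards against.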
fix/pypsnr-ast-eval — JSON log serialization in PyFeatureExtractorMixin
No rebase impact: this change is Python-only (`python/vmaf/core/feature_extractor.py`). It touches no C/CUDA/SYCL/HIP/Vulkan/Metal source, no public C API headers, no Meson build files, and no FFmpeg patch stack entries. The log files written by `_generate_result` are transient per-run scratch files (under `workdir/`), so the format change from Python `repr` to JSON is invisible to callers. If upstream Netflix/vmaf modifies `PyFeatureExtractorMixin._get_feature_scores` or `_generate_result` in a future sync, verify that neither side re-introduces `str()` / `ast.literal_eval`. That pairing is the root cause of T-PYPSNR-AST-EVAL: under numpy 2.x, scalar reprs such as `np.float64(1.5)` are no longer literals that `ast.literal_eval` accepts.
- Re-test: `PYTHONPATH=$PWD/python python3 -m pytest python/test/feature_extractor_test.py -k pypsnr` must report 8/8 passed.
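The failure mode is that `str()` of a list of numpy 2.x scalars embeds reprs such as `np.float64(1.5)`. `ast.literal_eval` rejects these because a call expression is not a literal. This minimal sketch shows the JSON alternative using hypothetical `write_scores` / `read_scores` helpers. It simulates the numpy-2 repr with a plain string, so numpy is not required.

```python
import ast
import json
from pathlib import Path


def write_scores(path: Path, scores) -> None:
    # float() turns numpy scalars (or plain numbers) into JSON-safe floats.
    path.write_text(json.dumps([float(s) for s in scores]))


def read_scores(path: Path) -> list:
    return json.loads(path.read_text())


def literal_eval_rejects(text: str) -> bool:
    """True if ast.literal_eval cannot parse text (the old read path)."""
    try:
        ast.literal_eval(text)
    except ValueError:
        return True
    return False
```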
fix/pypsnr-feature-extractor-import — PyPsnrFeatureExtractor class hierarchy restoration
No rebase impact: this change is Python-only (`python/vmaf/core/feature_extractor.py`). It touches no C/CUDA/SYCL/HIP/Vulkan/Metal source, no public C API headers, no Meson build files, and no FFmpeg patch stack entries. If upstream Netflix/vmaf adds or removes `PyPsnrFeatureExtractor` / `PypsnrFeatureExtractor` in a future sync, audit `feature_extractor.py` lines 722–830 to ensure the primary-vs-deprecated alias relationship is preserved (primary = `PyPsnr*`, deprecated = `Pypsnr*`).
0086 — TransNet shot-metadata columns + HDR VMAF model port slot (Research-0086, ADR-0300 follow-up)
- Touches: `tools/vmaf-tune/src/vmaftune/__init__.py` (`CORPUS_ROW_KEYS` additive trio, no `SCHEMA_VERSION` bump), `tools/vmaf-tune/src/vmaftune/per_shot.py` (`summarise_shots`, `_detect_shots_with_status`, `ShotMetadata`), `tools/vmaf-tune/src/vmaftune/corpus.py` (`_resolve_shot_metadata`, row population, new `shot_runner` kwarg on `iter_rows`), `tools/vmaf-tune/src/vmaftune/hdr.py` (transfer-aware `select_hdr_vmaf_model`, `hdr_model_name_for`, `HDR_MODEL_FILENAME`, single-shot warning helper), tests + docs.
- Invariant: `iter_rows` runs `vmaf-perShot` exactly once per source, because TransNet inference is too expensive to pay per (preset, crf) cell. If a future PR moves shot detection inside the cell loop, corpus-generation wall time roughly doubles. Keep the per-source resolution at the top of `iter_rows` and pass `ShotMetadata` down to `_row_for`. Additionally, `_detect_shots_with_status` is the only call site that distinguishes a "real one-shot source" from a "fallback because the binary failed". The public `detect_shots` shape cannot carry that boolean, and downstream consumers depend on the `(shots, ok)` tuple to emit `(0, 0.0, 0.0)` sentinel rows.
- Upstream conflict probability: zero. Upstream Netflix/vmaf does not carry a `vmaf-tune` directory, an `hdr.py`, or a shot-detection harness. The HDR VMAF model port slot (`vmaf_hdr_v0.6.1.json`) is fork-internal scaffolding; Netflix publishes the canonical artefact outside their public `model/` tree. No upstream rebase will touch any of these files.
- Re-test: `pytest tools/vmaf-tune/tests/test_hdr.py tools/vmaf-tune/tests/test_shot_metadata_columns.py tools/vmaf-tune/tests/test_per_shot.py`.
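The once-per-source invariant amounts to hoisting shot detection out of the (preset, crf) loop. This is a hypothetical sketch. `detect` stands in for `_detect_shots_with_status` and returns `(shots, ok)`. The three summary fields and the `(0, 0.0, 0.0)` sentinel follow the text above, but the field meanings used here (count, mean length, max length) are assumptions.

```python
def summarise(shots, ok):
    """Hypothetical trio: (shot_count, mean_len_s, max_len_s)."""
    if not ok or not shots:  # detector failed (or found nothing): sentinel values
        return (0, 0.0, 0.0)
    lengths = [end - start for start, end in shots]
    return (len(shots), sum(lengths) / len(lengths), max(lengths))


def iter_rows(sources, presets, crfs, detect):
    """Yield (source, preset, crf, *shot_meta); detect() runs once per source."""
    for src in sources:
        shot_meta = summarise(*detect(src))  # hoisted out of the cell loop
        for preset in presets:
            for crf in crfs:
                yield (src, preset, crf) + shot_meta
```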
0358 — CUDA motion race + leak + motion2/motion3 precision parity (ADR-0358)
- Touches: `core/src/feature/cuda/integer_motion_cuda.c`: memset moved from `s->str` to `pic_stream`; `motion2_score` emission switched to the CPU's `MIN(score * motion_fps_weight, motion_max_val)` post-process in collect + flush; `motion3_postprocess_cuda` guard relaxed to `frame_index > 2` for the pre-incremented frame counter; `vmaf_cuda_buffer_host_free(s->sad_host)` added to `close_fex_cuda` and the `init_fex_cuda` error unwind. `core/src/feature/cuda/integer_motion/motion_score.cu`: shared-tile inner stride padded from `TILE_W` to `TILE_PITCH = TILE_W + 1`; `__launch_bounds__(BLOCK_X * BLOCK_Y, 8)` added to both bpc kernels. `core/src/feature/cuda/integer_motion_v2/motion_v2_score.cu`: same padding + launch bounds for the v2 twin. Docs: `docs/adr/0358-...md`, `docs/adr/README.md` (index row), `docs/backends/cuda/overview.md` (motion bit-exact-at-places=4 appendix to "Numerical tolerance vs the CPU scalar path"), `docs/state.md` (Recently-closed row), `changelog.d/fixed/cuda-motion-race-leak-precision.md`. Upstream Netflix/vmaf does not currently ship the motion3-on-CUDA host post-processing surface (`motion3_postprocess_cuda` is fork-local per ADR-0219), so a future rebase touching `integer_motion_cuda.c` is unlikely to touch the same lines. However, a pure-upstream port that resets the post-process to its naive form will silently un-fix bugs 3 + 4.
- Invariant: the SAD `cuMemsetD8Async` runs on `pic_stream`, NOT on the drain stream `s->str`. The kernel's `atomicAdd` lives on `pic_stream`. Both streams are `CU_STREAM_NON_BLOCKING` and no event links them, so co-locating memset + kernel on the same stream is the only thing that orders them. This mirrors the verbatim pattern at `integer_motion_v2_cuda.c:188`. The `motion2_score` row emitted to the feature collector is the weighted-and-clipped value `MIN(score * motion_fps_weight, motion_max_val)`, NOT the raw `min(prev, cur)` SAD score; this matches `integer_motion.c:563`. The `motion3_postprocess_cuda` moving-average guard reads `frame_index > 2`, NOT `> 1`, because `frame_index` is pre-incremented before the helper is called.
- Re-test: build with `cd libvmaf && meson setup build-cuda -Denable_cuda=true -Denable_sycl=false --buildtype=release && ninja -C build-cuda`, then run `meson test -C build-cuda` (expect 55/55). Next, run `tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv -d python/test/resource/yuv/src01_hrc01_576x324.yuv -w 576 -h 324 -p 420 -b 8 --backend cuda --feature motion_cuda --output /tmp/cuda.json --json`, and the same with `--backend cpu --feature motion --output /tmp/cpu.json --json`. A `places=4` diff over `integer_motion`, `integer_motion2`, and `integer_motion3` should report `0/144` mismatches at `max_abs = 0.00e+00`. Post-fix, `compute-sanitizer --tool memcheck --leak-check full tools/vmaf ... --backend cuda --feature motion_cuda` reports `LEAK SUMMARY: 0 bytes leaked in 0 allocations`, and `compute-sanitizer --tool racecheck` reports `0 hazards`.
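Two parts of the CPU-parity contract can be stated as arithmetic. First, the emitted `motion2_score` is `MIN(score * motion_fps_weight, motion_max_val)` applied to the raw `min(prev, cur)` SAD score. Second, a frame matches at `places=4` when the difference rounds to zero at four decimals, assuming the diff script follows the `unittest.assertAlmostEqual` convention. This is a hedged Python sketch, and the function names are hypothetical.

```python
def motion2_emitted(prev_sad: float, cur_sad: float,
                    motion_fps_weight: float, motion_max_val: float) -> float:
    raw = min(prev_sad, cur_sad)  # raw motion2 SAD score
    return min(raw * motion_fps_weight, motion_max_val)  # weighted, then clipped


def places_mismatches(cpu, gpu, places: int = 4) -> int:
    """Count pairs that differ at `places` decimals (assertAlmostEqual rule)."""
    return sum(1 for a, b in zip(cpu, gpu) if round(a - b, places) != 0)
```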
0326 — vmaf-tune codec-adapter dispatcher pivot (ADR-0326, HP-1)
- Touches: `tools/vmaf-tune/src/vmaftune/encode.py` (`build_ffmpeg_command` + new `_resolve_codec_args` / `_legacy_codec_args` helpers), `tools/vmaf-tune/src/vmaftune/per_shot.py` (`_segment_command` signature + body, new `_default_segment_preset`), `tools/vmaf-tune/src/vmaftune/codec_adapters/` (11 adapters gain `ffmpeg_codec_args` + `extra_params`; libaom slice normalised), `tools/vmaf-tune/tests/test_encode_dispatcher_per_adapter.py` (new), `tools/vmaf-tune/tests/test_codec_adapter_libaom.py` (slice expectations updated for the new contract). Upstream Netflix/vmaf has no `vmaf-tune` surface, so conflict probability is zero. This entry exists because the dispatcher contract is fork-local, and any future adapter PR must land both `ffmpeg_codec_args` and a matching fixture row.
- Invariant: every entry in `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py::_REGISTRY` ships an `ffmpeg_codec_args(preset, quality) -> list[str]` method that returns the codec-correct argv slice, with `-c:v <encoder>` as the first two tokens. The runtime contract is enforced by `tests/test_encode_dispatcher_per_adapter.py::test_fixture_table_covers_every_registered_adapter`; adding an adapter without a fixture row fails this meta-test. x264 and x265 argv shapes stay byte-for-byte `["-c:v", encoder, "-preset", preset, "-crf", str(quality)]`, defended by the `test_x{264,265}_argv_byte_for_byte_legacy_shape` pinning tests. The legacy fallback in `encode._resolve_codec_args` returns the historic libx264 shape for unregistered encoders, so callers that bypass the registry remain invocable.
- Re-test: `cd tools/vmaf-tune && pytest tests/test_encode_dispatcher_per_adapter.py tests/test_per_shot.py tests/test_codec_adapter_libaom.py -v` (36 + 16 + 12 = 64 tests green on this branch). For a wider check, run the full vmaf-tune suite. Pre-existing failures in `test_recommend.py`, `test_resolution.py`, `test_encode_multi_codec.py` (`parse_versions(encoder=...)` + `encoder_runner=`), and `test_codec_adapter_{x265,svtav1}.py` (`parse_versions(encoder=...)`) are unrelated and predate HP-1.
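The contract reduces to three checks: registered adapters build their own argv, unregistered encoders get the legacy CRF-shaped argv, and every registry key has a fixture row. This sketch uses hypothetical stand-ins (`REGISTRY`, `FIXTURES`, `X264Adapter`) for `_REGISTRY` and the test's fixture table. Passing the requested encoder name through in the fallback is an assumption.

```python
class X264Adapter:
    encoder = "libx264"

    def ffmpeg_codec_args(self, preset: str, quality: int) -> list:
        # x264/x265 keep the legacy argv shape byte-for-byte.
        return ["-c:v", self.encoder, "-preset", preset, "-crf", str(quality)]


REGISTRY = {"libx264": X264Adapter()}
FIXTURES = {"libx264": ["-c:v", "libx264", "-preset", "medium", "-crf", "23"]}


def resolve_codec_args(encoder: str, preset: str, quality: int) -> list:
    adapter = REGISTRY.get(encoder)
    if adapter is None:  # unregistered encoder: legacy libx264-shaped argv
        return ["-c:v", encoder, "-preset", preset, "-crf", str(quality)]
    return adapter.ffmpeg_codec_args(preset, quality)


def adapters_missing_fixtures(registry, fixtures) -> set:
    """What the meta-test asserts is empty."""
    return set(registry) - set(fixtures)
```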
0310 — Vulkan VIF int64 reduction race condition Phase 3 fix
- Touches: `core/src/feature/vulkan/shaders/vif.comp` replaces all three bare `barrier()` calls with explicit `memoryBarrierShared(); barrier();` pairs. These cover the Phase-1 cooperative tile load, the Phase-2 vertical-conv shared write, and the Phase-4 cross-subgroup int64 reduction. Documentation changes: `docs/research/0089-...md` (Phase 3 status appendix), `docs/adr/0269-...md` (Phase 3 status appendix), `docs/state.md` (T-VK-VIF-1.4-RESIDUAL closed; new T-VK-VIF-1.4-RESIDUAL-ARC opened), `core/src/vulkan/AGENTS.md` (Phase 3 update on the existing invariant row), `changelog.d/fixed/vif-int64-reduction-race-condition.md`. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the shader is zero. The entry exists because the fix is rebase-sensitive: any future cherry-pick that touches `vif.comp` and downgrades a `memoryBarrierShared(); barrier();` pair back to a bare `barrier()` will silently re-introduce the NVIDIA Vulkan 1.4 race.
- Invariant: `vif.comp` shared-memory ordering between cooperative-write phases must be release-acquire, not just a bare workgroup-execution barrier. NVIDIA's Vulkan 1.4 default memory model requires the explicit shared-memory release; bare `barrier()` works at API 1.3 on this driver only by accident. SCALE is irrelevant: the fix applies to all four pipeline specialisations because the barrier sites are in the SCALE-shared code. Do NOT remove the explicit `memoryBarrierShared()` calls, even if a perf review claims they are redundant under the GLSL spec wording. Empirical real-hardware evidence in the research-0089 2026-05-09 appendix shows otherwise on NVIDIA driver 595.71.05.
- Re-test: apply the local API-1.4 bump (`core/src/vulkan/common.c` 3 sites + `vma_impl.cpp` `VMA_VULKAN_VERSION 1004000`) on an NVIDIA RTX 4090 + driver 595.71+ machine, and build with `meson setup ... -Denable_vulkan=enabled`. Then run `python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan --device 1 --places 4` and expect 0/48 mismatches across all four scales. Run the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3" against `--vulkan_device 1`, and expect 5 identical `(integer_vif_num_scale2, integer_vif_den_scale2) = (+2.494358e+04, +2.522523e+04)` pairs at frame 5. Note that `--vulkan_device 0` on this multi-GPU host is the Intel Arc A380 lane, which still fails at API 1.4 (tracked by the separate, Open `T-VK-VIF-1.4-RESIDUAL-ARC` row).
0309 — Vulkan VIF API-1.4 Phase 2 dump (T-VK-VIF-1.4-RESIDUAL)
- Touches: `docs/research/0089-vulkan-vif-fp-residual-bisect-2026-05-08.md` (2026-05-09 status appendix with empirical numbers from the live RTX 4090), `docs/state.md` (T-VK-VIF-1.4-RESIDUAL row updated with the localisation), `core/src/vulkan/AGENTS.md` (new invariant row pinning the SCALE = 2 cross-subgroup-reduction memory-model finding), `CHANGELOG.md` (lusoris fork "Changed" entry). No code touched; the Phase 3 shader memory-model fix lands in a separate PR. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the AGENTS.md row is zero. The entry exists because the empirical localisation flips the open state-row hypothesis from FP precision to memory model, and it retires the `places=3` override path that earlier rebase scaffolding might have suggested.
- Invariant: the `vif.comp` SCALE = 2 specialisation's Phase-4 cross-subgroup int64 reduction is non-deterministic on NVIDIA driver 595.71.05 + Vulkan 1.4.341 (lines 547–592: `subgroupAdd`, `barrier()`, and the thread-0 read of `s_lmem`). The API 1.3 lane is fully deterministic on the same hardware. The four `apiVersion` pinning sites in `core/src/vulkan/common.c` + `core/src/vulkan/vma_impl.cpp` stay at 1.3 until two conditions hold: Phase 3 lands the explicit memory-scope barrier, and a 5-run determinism gate confirms run-to-run identical `(num, den)` plus `places=4` 0/48 on NVIDIA. The `places=3` override path is eliminated from the unblock options.
- Re-test: apply the local API-1.4 bump (`core/src/vulkan/common.c` 3 sites + `vma_impl.cpp` `VMA_VULKAN_VERSION 1004000`) on an NVIDIA RTX 4090 + driver 595.71+ machine, and build with `meson setup ... -Denable_vulkan=enabled`. Then run the gate and the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3". Expect 45/48 `places=4` failures on `integer_vif_scale2` (max abs `1.527e-02`) AND 5 distinct `(integer_vif_num_scale2, integer_vif_den_scale2)` pairs across 5 runs of `--feature 'vif_vulkan=debug=true'`. Both observations reproduced bit-for-bit on this session's hardware lane (UUID `e478b41b-5c4f-1ddb-f990-e44916aff4c8`).
0309 — vmaf-tune fast CLI surface (ADR-0276 status-update appendix, HP-3)
- Touches: `tools/vmaf-tune/src/vmaftune/cli.py` (new `fast` subparser + `_run_fast` + `_build_fast_sample_extractor` + `_build_fast_encode_runner` + `_parse_canonical6_means`), `tools/vmaf-tune/tests/test_cli_fast.py` (new), `tools/vmaf-tune/AGENTS.md` (new exit-code-contract invariant bullet under the fast-path section), `docs/adr/0276-vmaf-tune-fast-path.md` (status-update appendix), `docs/usage/vmaf-tune.md` (new `## fast` section). Upstream Netflix/vmaf has no fast-path surface, so conflict probability is zero. The entry exists because the canonical-6 parsing path off `pooled_metrics.<feature>.mean` is sensitive to libvmaf's JSON output shape.
- Invariant: the libvmaf JSON layout that the `_parse_canonical6_means` helper consumes is the `pooled_metrics.<feature>.mean` shape (modern libvmaf 3.x), with a per-frame `frames[].metrics.<feature>` fallback. `parse_vmaf_json` covers both shapes for the headline VMAF score in `score.py`, and the canonical-6 means re-use the same surface. If upstream changes the JSON schema (e.g. nests `pooled_metrics` under a new key), update `_parse_canonical6_means` in the same PR, because the fast-path proxy depends on the canonical-6 vector being correctly extracted from libvmaf's output. The OOD-gap exit code `3` from `_run_fast` is the documented fall-back signal in `docs/usage/vmaf-tune.md` § "Fall-back idiom"; do not silently downgrade it to `0`. The CLI is the only seam that injects `sample_extractor` and `encode_runner` into `fast.fast_recommend`; the Python API still raises `NotImplementedError` when called without them.
- Re-test: `PYTHONPATH=tools/vmaf-tune/src python -m pytest tools/vmaf-tune/tests/test_cli_fast.py tools/vmaf-tune/tests/test_fast.py -v` (21 tests). For an end-to-end smoke test without ffmpeg / ONNX / GPU, `vmaf-tune fast --target-vmaf 92 --smoke --n-trials 8` should emit a JSON payload whose `smoke` is `true` and `verify_vmaf` is `null`. `vmaf-tune fast --help` lists every flag in `_DOCUMENTED_FAST_FLAGS` from `test_cli_fast.py`.
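The lookup order described above tries the pooled mean first and falls back to averaging the per-frame values. This is a sketch: `parse_means` is a hypothetical name, returning `NaN` for a feature found in neither place is an assumption, and the real canonical-6 list and error handling live in `_parse_canonical6_means`.

```python
import math


def parse_means(report: dict, features) -> dict:
    """Prefer pooled_metrics.<f>.mean; else average frames[].metrics.<f>."""
    pooled = report.get("pooled_metrics", {})
    frames = report.get("frames", [])
    means = {}
    for feat in features:
        if "mean" in pooled.get(feat, {}):
            means[feat] = float(pooled[feat]["mean"])
            continue
        values = [fr["metrics"][feat] for fr in frames
                  if feat in fr.get("metrics", {})]
        # Assumption: a feature absent from both shapes maps to NaN.
        means[feat] = sum(values) / len(values) if values else math.nan
    return means
```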
0366 — vmaf-tune corpus schema v3 (ADR-0366)
- Touches: `tools/vmaf-tune/src/vmaftune/__init__.py` (`SCHEMA_VERSION` 2 → 3, +12 canonical-6 aggregate keys), `tools/vmaf-tune/src/vmaftune/score.py` (new `parse_feature_aggregates`, `ScoreResult.feature_means` / `_stds`), `tools/vmaf-tune/src/vmaftune/corpus.py` (writer projects the aggregates into row keys; new `read_jsonl` with v2 back-compat), `ai/scripts/train_fr_regressor_v[23].py` (consume the new columns directly from the corpus DataFrame). All paths are wholly fork-local (`tools/vmaf-tune/` and `ai/scripts/` are not mirrored upstream), so rebase impact is zero.
- Invariant: Phase B/C/D consumers and the FR-regressor trainers rely on two properties of the canonical-6 `<feature>_mean` columns. They are present on `schema_version >= 3` rows, and they are `NaN` (never `0.0`) when libvmaf does not expose the feature. Keep the writer-side `NaN` contract intact during any future widening; trainers drop NaN rows before fitting `StandardScaler`. The reader (`read_jsonl`) preserves the on-disk `schema_version`, so trainers can filter to `>= 3` if they need real per-feature data.
- Re-test:
```bash
# Subshell so the second command still runs from the repo root.
(cd tools/vmaf-tune && python -m pytest \
    tests/test_corpus.py tests/test_corpus_schema_v3.py \
    tests/test_corpus_v2_back_compat.py -q)
python -m pytest ai/tests/test_train_fr_regressor_v3.py -q
```
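The writer-side contract is that a feature libvmaf did not report becomes `NaN` in its `<feature>_mean` / `<feature>_std` columns, never `0.0`. Trainers can then drop those rows instead of fitting on fake zeros. This is a sketch with hypothetical helper names. The six feature names are an assumption (the classic VMAF v0.6.1 model inputs), not necessarily the fork's canonical-6 list.

```python
import math

# Assumption: canonical-6 = the VMAF v0.6.1 model inputs.
CANONICAL6 = ("adm2", "motion2", "vif_scale0", "vif_scale1",
              "vif_scale2", "vif_scale3")


def project_aggregates(means: dict, stds: dict) -> dict:
    """Project per-feature aggregates into 12 row keys, NaN when missing."""
    row = {}
    for feat in CANONICAL6:
        row[f"{feat}_mean"] = means.get(feat, math.nan)  # NaN, never 0.0
        row[f"{feat}_std"] = stds.get(feat, math.nan)
    return row


def drop_nan_rows(rows, columns):
    """What a trainer does before fitting a scaler."""
    return [r for r in rows if not any(math.isnan(r[c]) for c in columns)]
```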
0308 — encoder knob-sweep recipe-regression policy (ADR-0308, docs-only)
- Touches: `docs/research/0080-encoder-knob-sweep-findings.md`, `docs/adr/0308-encoder-knob-sweep-recipe-regression-policy.md`, `docs/adr/README.md` (index row), `ai/AGENTS.md` (knob-sweep invariant section), `changelog.d/changed/encoder-knob-sweep-findings.md`. No code touched; this is a companion to PR #400 (ADR-0305 + Research-0077 + `ai/scripts/analyze_knob_sweep.py`). Upstream Netflix/vmaf has no encoder-knob-sweep surface, so conflict probability is zero. This entry exists only because the policy threshold (7-of-9 structural cut) is rebase-sensitive on the corpus shape.
- Invariant: the 7-of-9 source-count threshold from ADR-0308 §Decision point 1 is calibrated against the current 9-source Netflix Public Dataset corpus. If the corpus grows past 9 sources (e.g. UGC expansion per ADR-0287, or HDR additions), re-derive the absolute threshold as a fraction (≥7/9 ≈ 78 %). The structural cluster is sharp on the current corpus (top-15 cells all hit 9-of-9, with no observed cells in the 4–6 range), so a fractional cut at ~75 % is robust. Do NOT relax `bitrate_tol_pct` (default 5.0) or `vmaf_tol` (default 0.1) in `ai/scripts/analyze_knob_sweep.py` without an ADR. Those tolerances are calibrated against the per-frame VMAF noise floor and bitrate quantisation in libavformat muxers.
- Re-test: `pytest ai/tests/test_knob_sweep_analysis.py -v` (script logic; ships in PR #400). The policy gate is offline. Regenerate `runs/phase_a/full_grid/comprehensive.jsonl` via `tools/vmaf-tune/src/vmaftune/hw_encoder_corpus.py` (a 3-hour run on a single host with NVENC + QSV). Then re-run `python ai/scripts/analyze_knob_sweep.py --jsonl <adapted.jsonl> --out-dir runs/phase_a/full_grid/reports/` and diff the resulting `summary.md` against the headline table in `docs/research/0080-encoder-knob-sweep-findings.md`. The structural cluster (top-15 cells, all 9-of-9) is the invariant to defend.
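Re-deriving the cut as a fraction means the absolute threshold becomes the smallest source count that reaches 7/9 of the corpus. This sketch uses exact rational arithmetic so that a 9-source corpus reproduces 7 exactly. `required_sources` is a hypothetical helper, and rounding up is an assumption about how the fraction should be applied.

```python
import math
from fractions import Fraction

STRUCTURAL_FRACTION = Fraction(7, 9)  # the ADR-0308 7-of-9 cut, as a fraction


def required_sources(n_sources: int, fraction: Fraction = STRUCTURAL_FRACTION) -> int:
    # Fraction avoids float error (7/9 * 9 must come out as exactly 7).
    return math.ceil(fraction * n_sources)
```

For example, a corpus grown to 12 sources would need a cell to regress on 10 of them.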
0228 — Vulkan 1.4 bump deferred (ADR-0264, docs-only)
- Touches: none (docs-only PR). Future Step A of T-VK-1.4-BUMP will touch `core/src/feature/vulkan/shaders/vif.comp` and `core/src/feature/vulkan/shaders/ciede.comp`. Step B will touch the three `apiVersion` sites in `core/src/vulkan/common.c` (lines 54, 264, 374) and the `VMA_VULKAN_VERSION` define in `core/src/vulkan/vma_impl.cpp` (line 22).
- Invariant: `master` stays on `VK_API_VERSION_1_3` and `VMA_VULKAN_VERSION = 1003000`. Lifting the constant in any future upstream sync (Netflix doesn't ship a Vulkan backend, so the conflict is improbable) without first auditing the `precise` / `OpDecorate ... NoContraction` decorations on `vif.comp` and `ciede.comp` will reintroduce the NVIDIA-driver regression captured in research-0053. The `psnr_hvs_strict_shaders` `-O0` list in `core/src/vulkan/meson.build` is the existing precedent for shader-side bit-exactness mitigations. It should be where a 1.4-era audit lands its results, potentially expanding to cover `vif.comp` + `ciede.comp` if the `precise` audit decides the optimizer is the right place to gate.
- Re-test: when Step B lands, the gate is `python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan`, plus the same with `--feature ciede`, against NVIDIA + RADV + lavapipe. Max abs diff must stay ≤ `5.0e-05` (`places=4`) on all three.
0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)
0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix
0228 — vmaf-tune resolution-aware model selection (ADR-0289)
0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)
0228 — tools/vmaf-tune/ codec-agnostic encode dispatcher (ADR-0294)
- Touches: `tools/vmaf-tune/src/vmaftune/encode.py`, refactored to look up the codec adapter and delegate argv composition (wholly fork-local); `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` and `codec_adapters/x264.py`, where the adapter contract gains `ffmpeg_codec_args(preset, quality)` and `extra_params()` (both duck-typed; missing methods fall back to the legacy x264-CRF shape); `tools/vmaf-tune/tests/test_encode_multi_codec.py`, a new 19-test suite pinning the dispatcher contract per codec; `docs/usage/vmaf-tune.md`, with a new "Codec adapter contract" section.
- Invariant: the harness (`encode.py`, `corpus.py`) must not branch on codec identity. The only codec-aware code is the per-adapter `codec_adapters/*.py` file. Any future change that adds an `if adapter.encoder == "..."` to the harness defeats the whole purpose of ADR-0294. The corpus row schema stays at `SCHEMA_VERSION=1`: `crf` is preserved as the row column even when the underlying codec's quality knob is `-cq` / `-qp` / etc., and `EncodeRequest.quality` is a request-side property only. Adapters that don't yet expose `ffmpeg_codec_args` are intentionally permitted to fall back to the legacy x264-CRF shape; removing that fallback would break in-flight adapter PRs that land one at a time.
- Re-test on rebase:
```bash
pytest tools/vmaf-tune/tests/ -q   # 32 passed (13 existing + 19 multi-codec)

python -c "
from pathlib import Path
from vmaftune.encode import EncodeRequest, build_ffmpeg_command

req = EncodeRequest(
    source=Path('ref.yuv'),
    width=1920,
    height=1080,
    pix_fmt='yuv420p',
    framerate=24.0,
    encoder='libx264',
    preset='medium',
    crf=23,
    output=Path('out.mp4'),
)
cmd = build_ffmpeg_command(req)
assert cmd[cmd.index('-c:v') + 1] == 'libx264'
assert cmd[cmd.index('-preset') + 1] == 'medium'
assert cmd[cmd.index('-crf') + 1] == '23'
print('x264 dispatcher path OK')
"
```
0260 — vmaf-tune --sample-clip-seconds (ADR-0301)
- Touches: `tools/vmaf-tune/src/vmaftune/{cli,corpus,encode,score,__init__}.py` (fork-local; no upstream Netflix/vmaf path overlap), `tools/vmaf-tune/tests/test_corpus.py`, `tools/vmaf-tune/AGENTS.md`, `docs/usage/vmaf-tune.md`, `docs/adr/0301-vmaf-tune-sample-clip.md`, `docs/adr/_index_fragments/0301-vmaf-tune-sample-clip.md`, `docs/adr/_index_fragments/_order.txt`, `docs/adr/README.md`.
- Invariant: corpus JSONL `SCHEMA_VERSION` is bumped to `2`, adding only the `clip_mode` key. Sample-clip windows are mirrored on both sides: FFmpeg input-side `-ss` / `-t` on the encode side, and libvmaf's `--frame_skip_ref` / `--frame_cnt` on the score side. The `_resolve_sample_clip()` helper is the single source of truth for the centre-anchored slice math; do not duplicate the computation elsewhere. It falls back silently to `"full"` when `N >= duration_s`.
- Re-test:
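The centre-anchored slice math for the invariant above can be illustrated as follows. The window starts at `(duration_s - N) / 2` and lasts `N` seconds, with a fall-back to the full clip when `N >= duration_s`. On the scoring side, the same window becomes a frame offset and a frame count. This is a hypothetical stand-in for `_resolve_sample_clip()`, not its code. The return shape and the `round()` conversion to whole frames are assumptions.

```python
def resolve_sample_clip(duration_s: float, n_s: float, fps: float):
    """Return ("full", None) or ("sample", (ss, t, frame_skip, frame_cnt))."""
    if n_s >= duration_s:
        return "full", None  # silent fall-back to the whole source
    ss = (duration_s - n_s) / 2.0  # centre-anchored start, seconds (ffmpeg -ss)
    frame_skip = round(ss * fps)   # libvmaf --frame_skip_ref
    frame_cnt = round(n_s * fps)   # libvmaf --frame_cnt
    return "sample", (ss, n_s, frame_skip, frame_cnt)  # n_s -> ffmpeg -t
```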
0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)
- Touches: `tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_amf,hevc_amf,av1_amf,_amf_common}.py` (new; wholly fork-local, no upstream Netflix/vmaf path overlap); `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` (registry extended with three AMF entries); `tools/vmaf-tune/tests/test_codec_adapter_amf.py` (new); `tools/vmaf-tune/tests/test_corpus.py` (Phase A test renamed from `test_known_codecs_phase_a_is_x264_only` to `test_known_codecs_includes_x264_and_amf`); `tools/vmaf-tune/AGENTS.md` (adds the AMF preset-compression invariant); `docs/usage/vmaf-tune.md` (adds a Hardware encoders section).
- Invariant: the 7-into-3 preset compression table in `_amf_common.py` (`_PRESET_TO_AMF`) is the cross-codec axis that Phase B / C consumers depend on. Every AMF adapter accepts the canonical 7 preset names (`placebo`…`ultrafast`) and maps them onto the three AMF rungs (`quality` / `balanced` / `speed`). Do not extend the preset vocabulary without amending ADR-0282. Registry uniformity (no codec-identity branching in the harness search loop) rests on every codec accepting the same names.
- Re-test:
0228 — vmaf-tune resolution-aware model selection (ADR-0289)
- Touches: `tools/vmaf-tune/src/vmaftune/resolution.py` (new; wholly fork-local, no upstream Netflix/vmaf path overlap); `tools/vmaf-tune/src/vmaftune/corpus.py` (adds `CorpusOptions.resolution_aware: bool = True` and pipes the effective model through `score_res.request.model` into the JSONL row); `tools/vmaf-tune/src/vmaftune/cli.py` (adds `--resolution-aware` / `--no-resolution-aware`, a `BooleanOptionalAction` that defaults to on); `tools/vmaf-tune/tests/test_resolution.py` (new); `docs/usage/vmaf-tune.md` (new "Resolution-aware mode" section); `docs/adr/0289-vmaf-tune-resolution-aware.md` (new) + `docs/research/0064-vmaf-tune-resolution-aware.md` (new); `tools/vmaf-tune/AGENTS.md` (two new invariant notes).
- Invariant: the height-only decision rule (`height >= 2160` → `vmaf_4k_v0.6.1`, else `vmaf_v0.6.1`) is the documented contract. The JSONL `vmaf_model` field is now per-row (not per-job), so mixed ladder corpora legitimately contain multiple distinct values across rows. Downstream consumers (Phase B / C / D) must group or filter by `vmaf_model` rather than assume a constant. Width is accepted in the API for symmetry but ignored in the body; do not branch on it without a follow-up ADR.
- Re-test:
```bash
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep resolution-aware
```
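The height-only rule fits in a single expression. In this sketch, `select_vmaf_model` is a hypothetical name for whatever `resolution.py` exposes. Width is accepted and deliberately ignored, matching the invariant.

```python
def select_vmaf_model(width: int, height: int) -> str:
    del width  # accepted for API symmetry; the contract is height-only
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"
```

A 4096×1716 scope-cropped 4K master therefore scores with `vmaf_v0.6.1`, which is one reason the per-row `vmaf_model` column matters.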
0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix
- Touches: `core/tools/y4m_input.c`, the upstream-mirrored Daala-derived Y4M parser. The fix sits inside the 4:1:1 → 4:2:2-jpeg chroma upsample routine `y4m_convert_411_422jpeg` (lines ~500–530), in the function's three sub-loops. Upstream Netflix/vmaf carries the same shape; if upstream lands its own fix during a sync, prefer the upstream version and drop ours. `core/test/test_y4m_411_oob.c` (new, fork-local) drives the minimal W=2 H=4 4:1:1 stream through `video_input_open` + `video_input_fetch_frame`; it is wholly fork-added, with no upstream collision. `core/test/meson.build` adds the `test_y4m_411_oob` executable + `test()` registration.
- Invariant: the first two sub-loops of `y4m_convert_411_422jpeg` must guard `_dst[(x << 1) | 1]` writes with `(x << 1 | 1) < dst_c_w`, matching the third sub-loop's existing guard. Without the guard, a 4:1:1 stream of width 2 (`dst_c_w == 1`) writes one byte past the destination chroma row.
- Re-test: run `cd libvmaf && meson setup ../build-asan --buildtype=debug -Db_sanitize=address -Db_lundef=false -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled`, then `ninja -C build-asan test/test_y4m_411_oob`, then `ASAN_OPTIONS=detect_leaks=0 ./build-asan/test/test_y4m_411_oob`, which must report `1 tests run, 1 passed`. Pre-fix, the binary aborts with `AddressSanitizer: heap-buffer-overflow … WRITE of size 1` at `y4m_input.c:507`.
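The off-by-one is easiest to see by modelling one chroma row. Each 4:1:1 source sample feeds destination columns `x << 1` and `(x << 1) | 1`. For a width-2 frame `dst_c_w == 1`, so the odd column does not exist. This sketch keeps only the bounds logic and, as an intentional simplification, replaces the Daala filter taps with plain sample duplication. `upsample_row` is hypothetical. A Python list raises `IndexError` where the C code silently wrote past the row.

```python
def upsample_row(src_row, dst_c_w: int):
    """Model one 4:1:1 -> 4:2:2 chroma row; duplication replaces the real filter."""
    dst = [0] * dst_c_w
    for x, sample in enumerate(src_row):
        dst[x << 1] = sample
        odd = (x << 1) | 1
        if odd < dst_c_w:  # the guard missing from the first two C sub-loops
            dst[odd] = sample
    return dst
```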
0270 — saliency_student_v1 fork-trained on DUTS-TR (ADR-0286)
- Touches: `model/tiny/registry.json` (adds the `saliency_student_v1` row; fork-local registry, no upstream overlap); `model/tiny/saliency_student_v1.onnx` (+ `.json` sidecar), new fork-local weights and metadata; `ai/scripts/train_saliency_student.py`, a new training script, wholly fork-local under `ai/`, which has no upstream counterpart; `docs/ai/models/saliency_student_v1.md`, `docs/research/0062-saliency-student-from-scratch-on-duts.md`, `docs/adr/0286-saliency-student-fork-trained-on-duts.md`, new docs under fork-local trees.
- Invariant: the C-side `feature_mobilesal.c` extractor's tensor-name contract must continue to match the ONNX graph for both `saliency_student_v1.onnx` and the legacy `mobilesal.onnx` placeholder. The contract is `input` (NCHW `[1, 3, H, W]`) and `saliency_map` (NCHW `[1, 1, H, W]`). Future weight swaps can change the graph internals freely but must keep these names + shapes; the smoke test asserts the registration. The op-allowlist constraint (the graph uses only ops in `core/src/dnn/op_allowlist.c`) carries over from ADR-0218. `Resize` is not used; `ConvTranspose` is the upsample op for v1, which keeps the graph load-clean against vanilla `origin/master`.
- Re-test:
```bash
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python -c "
from ai.src.vmaf_train.op_allowlist import check_model
from pathlib import Path
r = check_model(Path('model/tiny/saliency_student_v1.onnx'))
assert r.ok, r.pretty()
print('allowlist OK')
"
meson test -C build --suite=fast mobilesal
```
0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)
- Touches: `core/src/feature/hip/float_ansnr_hip.{c,h}` (new), the fifth consumer of `core/src/hip/kernel_template.h`. It mirrors `core/src/feature/cuda/float_ansnr_cuda.c` call-graph-for-call-graph: `init` / `submit` / `collect` / `close` invoke the kernel-template helpers in the same order. The submit body intentionally bypasses `vmaf_hip_kernel_submit_pre_launch` (there is no atomic; the kernel writes per-block interleaved (sig, noise) float partials directly). `core/src/hip/meson.build` adds the new TU to `hip_sources`. `core/src/feature/feature_extractor.c` adds the `extern VmafFeatureExtractor vmaf_fex_float_ansnr_hip;` declaration and the registry row under `#if HAVE_HIP`. `core/test/test_hip_smoke.c` adds the `test_float_ansnr_hip_extractor_registered` sub-test pinning the lookup contract.
- Invariant: the `submit_pre_launch` bypass is load-bearing. The CUDA twin makes the same choice for the same reason. If a future PR adds a `submit_pre_launch` call to the submit path of `float_ansnr_cuda.c`, the HIP twin must follow in the same PR. Likewise, the readback shape (`wg_count * 2u * sizeof(float)`) and the bpc table (peak / psnr_max for 8/10/12/16-bit) mirror the CUDA twin verbatim; keep them aligned on rebase.
- Re-test on rebase:
```bash
cd libvmaf
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build   # 48/48 green (47 CPU + HIP smoke)
```
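The readback contract is `wg_count * 2` floats, interleaved per work-group as (sig, noise). This hypothetical host-side sketch de-interleaves and sums such a buffer with `struct`. Little-endian packing and summing in double precision are assumptions. Turning the two sums into the final ANSNR value follows the CUDA twin and is not reproduced here.

```python
import struct


def reduce_partials(buf: bytes, wg_count: int):
    """Sum interleaved per-work-group (sig, noise) float partials."""
    expected = wg_count * 2 * struct.calcsize("<f")
    if len(buf) != expected:  # readback shape: wg_count * 2 * sizeof(float)
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    values = struct.unpack(f"<{wg_count * 2}f", buf)
    sig = sum(values[0::2])    # even slots: per-block signal partials
    noise = sum(values[1::2])  # odd slots: per-block noise partials
    return sig, noise
```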
0230 — HIP sixth-consumer kernel motion_v2_hip (ADR-0267)
- Touches: `core/src/feature/hip/integer_motion_v2_hip.{c,h}` (new), the sixth consumer of `core/src/hip/kernel_template.h`. It mirrors `core/src/feature/cuda/integer_motion_v2_cuda.c` call-graph-for-call-graph and carries the `VMAF_FEATURE_EXTRACTOR_TEMPORAL` flag and a `flush()` callback. The state struct has a `uintptr_t pix[2]` ping-pong slot pair tracked outside the kernel template, because the template models only a single device+host pair. `core/src/hip/meson.build` adds the new TU to `hip_sources`. `core/src/feature/feature_extractor.c` adds the `extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_hip;` declaration and the registry row under `#if HAVE_HIP`. `core/test/test_hip_smoke.c` adds the `test_motion_v2_hip_extractor_registered` sub-test pinning the lookup contract (the extractor name is `motion_v2_hip`, matching the CUDA twin's `motion_v2_cuda` naming).
- Invariant (temporal-extractor + ping-pong shape): the `VMAF_FEATURE_EXTRACTOR_TEMPORAL` flag bit, the `flush()` callback registration, and the `uintptr_t pix[2]` slot pair are load-bearing for the runtime PR (T7-10b). The runtime PR will swap `uintptr_t pix[2]` for a real device-buffer handle pair matching the CUDA twin's `VmafCudaBuffer *pix[2]`. On rebase, if the CUDA twin's flush-pass shape changes (currently `min(score[i], score[i+1])`), update the HIP twin's `flush_fex_hip` body in the same PR.
- Re-test on rebase: same as 0229. `meson test -C build` with `enable_hip=true` exercises the smoke contract.
0227 — ms_ssim_vulkan submit-side migrated to kernel_template (T-GPU-DEDUP-26)
- Touches: `core/src/feature/vulkan/ms_ssim_vulkan.c`. The raw `VkCommandBuffer` / `VkFence` / `vkAllocateCommandBuffers` / `vkBeginCommandBuffer` / `vkCreateFence` / `vkQueueSubmit` / `vkWaitForFences` / `vkDestroyFence` / `vkFreeCommandBuffers` blocks in `extract()` become `VmafVulkanKernelSubmit` triples (`vmaf_vulkan_kernel_submit_begin` / `_submit_end_and_wait` / `_submit_free`). One triple covers the decimate-pyramid command buffer; one triple per scale covers the per-scale SSIM submit. The pipeline-side bundles (`pl_decimate` 2-binding 4-variant + `pl_ssim` 10-binding 9-variant) and their `_add_variant()` chains are unchanged from the prior migration.
- Invariant: any future submit-side template change (timeline semaphores, deferred fence release, queue-family parameterisation) must keep the helpers' synchronous-wait + per-frame fence + per-frame command-buffer contract intact. `ms_ssim_vulkan.c` does host readback of the `l_partials` / `c_partials` / `s_partials` buffers immediately after `_submit_end_and_wait` returns. The submit-side contract is the same one already documented for `kernel_template.h` in the "Rebase-sensitive invariants" section of `core/src/vulkan/AGENTS.md`.
- Re-test:
```bash cd libvmaf && meson test -C build python scripts/ci/cross_backend_vif_diff.py \ --vmaf-binary core/build/tools/vmaf \ --reference testdata/ref_576x324_48f.yuv \ --distorted testdata/dis_576x324_48f.yuv \ --width 576 --height 324 \ --feature float_ms_ssim --backend vulkan --places 4
0231 — SHA-pin GitHub Actions (OSSF Pinned-Dependencies)
- Touches: every workflow file under .github/workflows/. All 13 fork workflows (docker-image.yml, docs.yml, ffmpeg-integration.yml, libvmaf-build-matrix.yml, lint-and-format.yml, nightly-bisect.yml, nightly.yml, release-please.yml, rule-enforcement.yml, scorecard.yml, security-scans.yml, supply-chain.yml, tests-and-quality-gates.yml) had their uses: directives rewritten from <owner>/<repo>@vN[.M.K] to <owner>/<repo>@<40-char-sha> # vN.M.K. 97 references were converted; the SLSA reusable-workflow ref in supply-chain.yml is the single documented holdout (see Invariant below).
- Invariant — SHA-pin policy for uses:. Every action reference in .github/workflows/*.yml MUST be a 40-char commit SHA with the semver tag preserved as a trailing # vN.M.K comment. The OSSF Scorecard Pinned-Dependencies check parses both forms; a floating tag (@vN) is treated as unpinned and counts against the aggregate score. Single permitted exception: the SLSA generator reusable workflow (slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml) must keep its vX.Y.Z tag form because GitHub Actions consumers cannot SHA-pin reusable-workflow refs in every code path; the exception is documented inline in supply-chain.yml and survives on each rebase. Why this matters on upstream sync: Netflix upstream does not ship the fork's CI tree, so a /sync-upstream run that drags new workflow content (e.g. via repository templates or bot-authored bumps) into .github/workflows/ can re-introduce floating-tag references unnoticed. The post-rebase check below is the standing gate; anything that lights up needs to be re-pinned before merging the sync.
- Re-test on rebase:

```bash
# Anything that prints is a regression: every uses: must be either
# already SHA-pinned (40 hex) or, for the documented SLSA exception,
# the slsa-github-generator reusable-workflow ref. A trailing
# "# vN.M.K" comment does not hide a floating tag from this check.
grep -nE '^\s*(- )?uses:\s+[^@[:space:]]+@' .github/workflows/*.yml \
  | grep -vE '@[a-f0-9]{40}([[:space:]]|$)' \
  | grep -v 'slsa-framework/slsa-github-generator/.github/workflows/'

# SHA-resolution sanity for any new pin (per action); prints the
# object type ("commit" or "tag") followed by its SHA:
gh api repos/<owner>/<repo>/git/ref/tags/<vN.M.K> --jq '.object.type + " " + .object.sha'
# If the type is "tag" (annotated tag), dereference to the commit SHA:
gh api repos/<owner>/<repo>/git/tags/<sha-from-prev> --jq '.object.sha'
```
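If the pin check ever needs to live in a test harness rather than a grep pipeline, the same policy reduces to a small predicate. The sketch below is hypothetical: uses_ref_is_pinned is not part of the repository. It assumes its input is the text after uses: with leading whitespace stripped. It accepts a ref in three cases:

- exactly 40 lowercase hex digits follow the '@', optionally followed by whitespace and a # vN.M.K comment;
- there is no '@' at all (a local ./ action);
- the ref is the documented SLSA reusable-workflow exception.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Documented holdout: the SLSA generator reusable workflow keeps its tag. */
static const char SLSA_PREFIX[] =
    "slsa-framework/slsa-github-generator/.github/workflows/";

/* Hypothetical check for the text after "uses:" (leading whitespace
 * already stripped), e.g. "actions/checkout@<40-hex-sha> # v4.1.1". */
static bool uses_ref_is_pinned(const char *ref)
{
    if (strncmp(ref, SLSA_PREFIX, sizeof SLSA_PREFIX - 1) == 0)
        return true;

    const char *at = strchr(ref, '@');
    if (at == NULL)
        return true; /* no '@': a local action such as ./.github/actions/x */

    /* Count lowercase hex digits directly after '@'. */
    const char *sha = at + 1;
    size_t n = 0;
    while (isdigit((unsigned char)sha[n]) || (sha[n] >= 'a' && sha[n] <= 'f'))
        n++;
    if (n != 40)
        return false; /* floating tag, short SHA, or over-long ref */

    /* Only whitespace (e.g. before "# vN.M.K") or end of line may follow. */
    return sha[n] == '\0' || isspace((unsigned char)sha[n]);
}
```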
0226 — CUDA drain-batch engine-loop opt (T-GPU-OPT-1)
- Touches:
  - core/src/cuda/drain_batch.{h,c} (new) — TLS drain-batch table + shared drain stream + _open() / register / _flush() / _close() API.
  - core/src/libvmaf.c — the engine-side per-frame loop now wraps submit/collect with _open() + _flush() so all CUDA extractor finished events are waited on a single shared drain stream.
  - All 12 CUDA feature kernels (core/src/feature/cuda/*.c) register their finished event + drained flag with the drain batch on submit; collect skips its private cuStreamSynchronize when drained is true.
- Invariant — drained-flag contract. Every CUDA extractor's collect path must check the per-frame drained flag and skip its own cuStreamSynchronize when set; otherwise the drain batching is a no-op. The flag is reset to false per frame inside vmaf_cuda_drain_batch_register().
- Re-test on rebase:

```bash
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast cuda
```

Expected: all CUDA tests green; bench shows ≥5% wall-clock gain on a 7-extractor VMAF model (model.json with all feature extractors enabled).
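The drained-flag contract can be modelled without CUDA. Everything in this sketch is hypothetical. DrainSlot, DrainBatch, drain_register, drain_flush and extractor_collect stand in for the real drain_batch.{h,c} API, whose exact signatures are not reproduced in this note. A counter replaces the per-extractor cuStreamSynchronize.

The sketch shows the required ordering:

1. Register resets the flag each frame.
2. The shared flush sets it.
3. Collect skips its private sync only when the flag is set.

```c
#include <stdbool.h>
#include <stddef.h>

#define DRAIN_MAX 16 /* hypothetical capacity of the per-thread table */

/* Hypothetical per-extractor slot; private_syncs counts the
 * cuStreamSynchronize calls collect would otherwise issue. */
typedef struct {
    bool drained;
    int private_syncs;
} DrainSlot;

typedef struct {
    DrainSlot *slots[DRAIN_MAX];
    size_t n;
} DrainBatch;

/* Submit side: reset the flag for this frame, then join the batch.
 * If the table is full the slot is simply left to sync privately. */
static void drain_register(DrainBatch *b, DrainSlot *s)
{
    s->drained = false;
    if (b->n < DRAIN_MAX)
        b->slots[b->n++] = s;
}

/* Engine side: after one shared wait (elided), mark every registered
 * slot as drained and empty the batch for the next frame. */
static void drain_flush(DrainBatch *b)
{
    for (size_t i = 0; i < b->n; i++)
        b->slots[i]->drained = true;
    b->n = 0;
}

/* Collect side: skip the private sync only when the batch drained us. */
static void extractor_collect(DrainSlot *s)
{
    if (!s->drained)
        s->private_syncs++; /* stand-in for cuStreamSynchronize() */
}
```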
0225 — Netflix bench snapshot regen (upstream a44e5e61 motion fix)
- Touches:
  - testdata/netflix_benchmark_results.json — fork-added snapshot. CPU rows now reflect the post-fix motion feature. The cuda / sycl rows from the previous regen are preserved unchanged because those backends were not exercised on this rerun (host-environment tooling: wrong renderD path, libvmaf_cuda not enabled in the local FFmpeg build). Future full regens should include cuda / sycl.
  - testdata/bench_all.sh — the default VMAF= no longer points at /usr/local/bin/vmaf (which on most dev hosts is stuck at the pre-upstream-a44e5e61 v3.0.0); it now defaults to the in-tree fork build at core/build/tools/vmaf.
  - testdata/benchmark_netflix.py — FFMPEG, YUVDIR and the hardcoded LD_LIBRARY_PATH=/usr/local/lib are now overridable via VMAF_FFMPEG, VMAF_YUVDIR and any caller-set LD_LIBRARY_PATH.
- Invariant: the snapshot's CPU pooled VMAF for src01_576x324 is 76.667828 (post-fix), not 76.668904 (the upstream-buggy mirror). If /sync-upstream ever re-pulls a Netflix change that touches motion.c mirror-handling, this number is the reference.
- Re-test:

```bash
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
LD_LIBRARY_PATH=$(pwd)/build/src python3 \
    ../testdata/benchmark_netflix.py
```

Expected CPU pooled rows: 76.667828, 35.068672, 7.985899.
0224 — CUDA graph capture feasibility (research-0047, DEFER)
- Touches: none (investigation-only; no code lands). The research digest docs/research/0047-cuda-graph-capture-feasibility.md documents why a CUDA graph capture path on the per-frame submit chain is deferred rather than shipped. The realised wall-clock gain is capped at ~1-3% vs. the predicted 10-20%. In addition, the 4-slot picture-pool rotation defeats single-graph capture and forces per-frame cuGraphExecKernelNodeSetParams rebinding for the (ref, dis) device pointers.
- Invariant: the kernel_template.h docstring keeps naming VmafCudaKernelLifecycle.finished as a graph-capture hook point. Don't prune that comment on rebase. Leaving the door open in the template is free, and the digest's "what needs to be true for a future GO" section depends on the hook still being there.
- Re-test on rebase:

```bash
# Confirm the docstring still references graph capture as the hook
# point; a wording change is fine, removal is not.
grep -qiE 'graph[- ]capture' core/src/cuda/kernel_template.h \
  || echo "MISSING: graph-capture hook comment in kernel_template.h"
```
0223 — ADR slug-drift repair in CHANGELOG / rebase-notes (PR #304 follow-up)
- Touches: CHANGELOG.md, docs/rebase-notes.md. No code; no upstream-shared path; no public-API surface.
- Invariant: every [ADR-NNNN](docs/adr/NNNN-slug.md) link in the fork's tracked docs resolves to an actual on-disk file under docs/adr/. Repaired 4 links whose slugs did not exist on disk. The corrected targets are 0138-iqa-convolve-avx2-bitexact-double, 0140-simd-dx-framework, 0190-ms-ssim-vulkan and 0178-vulkan-adm-kernel. All four retained their cited NNNN per ADR-0028 (NNNN is immutable once Accepted).
- Re-test on rebase: from the repo root, the following must print no lines:

```bash
for ref in $(grep -ohE 'docs/adr/[0-9]{4}-[a-z0-9-]+\.md' \
    CHANGELOG.md docs/rebase-notes.md AGENTS.md docs/state.md \
    | sort -u); do
  test -f "$ref" || echo "MISSING: $ref"
done
```
0125 — cambi_vulkan migrated to kernel_template (T-GPU-DEDUP-25, 5-bundle)
- Touches:
  - core/src/feature/vulkan/cambi_vulkan.c — the state's quintet (dsl_2bind + 5× pl_layout_* + shader_modules[CAMBI_PL_COUNT] + shared desc_pool) collapses to five VmafVulkanKernelPipeline bundles (pl_trivial, pl_derivative, pl_filter_mode, pl_decimate, pl_mask_dp), each owning its own descriptor pool. The first slot of pipelines[] per stage aliases the bundle's base pipeline. CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL and CAMBI_PL_MASK_THRESHOLD are sibling variants built via vmaf_vulkan_kernel_pipeline_add_variant().
  - The shared cambi_vk_alloc_set takes a bundle pointer (->desc_pool / ->dsl); every dispatch site picks the bundle that matches its push-constant struct.
  - The cambi_vk_make_dsl / cambi_vk_make_pl / cambi_vk_create_shader / cambi_vk_build_pipeline helpers are dropped; the template subsumes them.
- Invariant — variants destroyed before bundle, base alias must be skipped. Five distinct push-constant struct sizes (CambiVkPushTrivial / CambiVkPushDerivative / CambiVkPushFilterMode / CambiVkPushDecimate / CambiVkPushMaskDp) force five bundles even though every stage's DSL is a 2-binding SSBO layout: _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots (CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, CAMBI_PL_MASK_THRESHOLD) before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle.
- Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): cambi mean = 0.0, identical to pre-migration (the pair has no banding artifacts).
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no Vulkan backend, so there is nothing to merge against.
0124 — ssimulacra2_vulkan migrated to kernel_template (T-GPU-DEDUP-24, 4-bundle)
- Touches:
  - core/src/feature/vulkan/ssimulacra2_vulkan.c — the state's 16 long-lived pipeline-object fields (4× *_dsl + *_pl + *_shader, plus the shared desc_pool) collapse to four VmafVulkanKernelPipeline bundles (pl_xyb, pl_mul, pl_blur, pl_ssim), each owning its own descriptor pool.
    - The first slot of each per-bundle pipeline array (xyb_pipelines[0], mul_pipelines[0], blur_pipelines_h[0], ssim_pipelines[0]) aliases the bundle's base VkPipeline; the remaining per-scale / per-pass slots are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
    - ss2v_build_pipeline_int3 reroutes through _add_variant() instead of calling vkCreateComputePipelines directly.
    - ss2v_alloc_set takes a bundle pointer (->desc_pool / ->dsl) instead of a separate DSL argument.
    - The descriptor-set free sites at the tail of ss2v_run_scale route to each bundle's pool.
  - The ss2v_make_dsl / ss2v_make_pl / ss2v_create_shader helpers are dropped; the template subsumes them.
- Invariant — variants destroyed before bundle, slot 0 alias must be skipped. Four distinct DSL shapes (XYB = 6 SSBOs, MUL = 3, BLUR = 2, SSIM = 8) prevent collapsing to one bundle: _add_variant() only siblings pipelines under the same layout. Before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle, close_fex must vkDestroyPipeline() the following:
  - the variant slots in xyb_pipelines[1..N-1], mul_pipelines[1..N-1], ssim_pipelines[1..N-1] and blur_pipelines_h[1..N-1];
  - every slot of blur_pipelines_v[].

  It must skip slot 0 of the first three arrays + blur_pipelines_h to avoid double-freeing the aliased base.
- Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): ssimulacra2 mean = 24.613842, identical to pre-migration.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no ssimulacra2 extractor and no Vulkan backend, so there is nothing to merge against.
0118 — psnr_hvs_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-18)
- Touches:
  - core/src/feature/vulkan/psnr_hvs_vulkan.c — the state's dsl + pipeline_layout + shader + desc_pool + pipeline[3] collapses to VmafVulkanKernelPipeline pl + VkPipeline pipeline_chroma_u + VkPipeline pipeline_chroma_v. Plane 0 is the template's base pipeline; planes 1 and 2 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  - New psnr_hvs_plane_pipeline() accessor maps the plane index to the right VkPipeline handle.
- Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the chroma U/V variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7.
- Numerical contract: unchanged. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper.
0119 — vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-19)
- Touches:
  - core/src/feature/vulkan/vif_vulkan.c — the state's dsl + pipeline_layout + shader + desc_pool + pipelines[4] collapses to VmafVulkanKernelPipeline pl + VkPipeline scale_variants[3]. Scale 0 is the template's base pipeline; scales 1, 2 and 3 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
  - New vif_scale_pipeline() accessor maps the scale index to the right VkPipeline handle (replaces s->pipelines[scale]).
- Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 3 scale variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7 and psnr_hvs_vulkan in T-GPU-DEDUP-18.
- Numerical contract: unchanged. Same shaders, same spec-constants, same push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper.
0120 — float_vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-20)
- Touches:
  - core/src/feature/vulkan/float_vif_vulkan.c — the state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl. The VkPipeline pipelines[2][4] 2-D lookup table is preserved so the existing [mode][scale] dispatch path stays clean, but pipelines[0][0] aliases s->pl.pipeline (the template's base). The other 7 entries are sibling pipelines created via vmaf_vulkan_kernel_pipeline_add_variant().
- Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 7 sibling variants (every (mode, scale) except (0, 0)) before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan / psnr_hvs_vulkan / vif_vulkan.
- Invariant — pipelines[0][0] aliasing. The base pipeline handle is owned by s->pl.pipeline; we copy it into pipelines[0][0] after _create() so the dispatch path can use a uniform 2-D lookup. The destroy loop must skip (mode=0, scale=0) to avoid double-freeing the template's pipeline.
- Numerical contract: unchanged. Same shaders, spec-constants (mode + scale) and push-constants. The Netflix-pair smoke matches integer_vif bit-identically to 4 decimals.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper.
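The destroy-skip and ordering rules for an aliased 2-D table can be checked with a small mock. The sketch uses integer stand-ins for VkPipeline and a boolean for the bundle's destroyed state. State, Bundle, destroy_variant and close_state are hypothetical names, not the fork's float_vif_vulkan.c code.

The mock works in two steps:

1. It destroys every [mode][scale] slot except the aliased [0][0], flagging any destroy that happens after the bundle is gone or that targets the base handle.
2. Only then does it mark the bundle as released.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_MODES  2
#define N_SCALES 4

typedef int Pipeline; /* hypothetical stand-in for VkPipeline */

typedef struct {
    Pipeline pipeline; /* base pipeline, owned by the bundle */
    bool destroyed;
} Bundle;

typedef struct {
    Bundle pl;
    Pipeline pipelines[N_MODES][N_SCALES]; /* [0][0] aliases pl.pipeline */
    int variant_destroys;                  /* vkDestroyPipeline stand-ins */
    bool order_violated;                   /* variant destroyed after bundle */
    bool double_free;                      /* aliased base passed to destroy */
} State;

static void destroy_variant(State *s, Pipeline p)
{
    if (s->pl.destroyed)
        s->order_violated = true;
    if (p == s->pl.pipeline)
        s->double_free = true;
    s->variant_destroys++;
}

static void close_state(State *s)
{
    /* 1. Destroy every sibling variant, skipping the aliased [0][0] slot. */
    for (size_t m = 0; m < N_MODES; m++) {
        for (size_t k = 0; k < N_SCALES; k++) {
            if (m == 0 && k == 0)
                continue;
            destroy_variant(s, s->pipelines[m][k]);
        }
    }
    /* 2. Only then release the bundle, which frees the base exactly once. */
    s->pl.destroyed = true;
}
```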
0122 — float_adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-22)
- Touches:
  - core/src/feature/vulkan/float_adm_vulkan.c — twin to adm_vulkan (T-GPU-DEDUP-21); 16-pipeline 2-D [stage][scale] array. The state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl. pipelines[0][0] aliases s->pl.pipeline; the other 15 entries are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
- Invariants:
  - Variants destroyed before bundle.
  - pipelines[0][0] aliasing — the destroy loop must skip (stage=0, scale=0).
- Numerical contract: unchanged. Same float (_s suffix) primitives from adm_tools.c; same 5-element spec-constant tuple; same float partial accumulation reduced in double on the host.
0121 — adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-21)
- Touches:
  - core/src/feature/vulkan/adm_vulkan.c — the state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl. The VkPipeline pipelines[4][4] 2-D lookup is preserved so the per-stage dispatch path stays clean. pipelines[0][0] aliases s->pl.pipeline (the template's base); the other 15 entries are sibling pipelines via vmaf_vulkan_kernel_pipeline_add_variant().
- Invariants:
  - Variants destroyed before bundle (same rule as ssim_vulkan / psnr_hvs / vif / float_vif).
  - pipelines[0][0] aliasing — the destroy loop must skip (stage=0, scale=0) to avoid double-freeing the template's pipeline.
- Numerical contract: unchanged. Same shaders + 5-element spec-constant tuple (width, height, bpc, scale, stage) + push-constants.
- Rebase impact: low. Builds on top of PR #272.
0123 — ms_ssim_vulkan 2-bundle migration (T-GPU-DEDUP-23)
- Touches:
  - core/src/feature/vulkan/ms_ssim_vulkan.c — the state collapses decimate_dsl + decimate_pl + decimate_shader + ssim_dsl + ssim_pl + ssim_shader + desc_pool (7 fields) to two bundles, VmafVulkanKernelPipeline pl_decimate + pl_ssim. Each bundle owns its own descriptor pool. The kernel has two distinct pipeline shapes (decimate = 2 SSBO bindings, ssim = 10 bindings), so two bundles is the minimum: _add_variant() only siblings pipelines under the same layout.
    - decimate_pipelines[0] aliases pl_decimate.pipeline (the template's base = scale 0). The remaining MS_SSIM_SCALES - 2 decimate variants (scales 1..3) are siblings via _add_variant().
    - ssim_pipeline_horiz[0] aliases pl_ssim.pipeline (base = scale 0, pass 0). The other 9 entries (4× ssim_pipeline_horiz for scales 1..4, plus 5× ssim_pipeline_vert for scales 0..4) are variants.
- Invariant — variants destroyed before bundle. Same rule as ADR-0106 entry 0106: close_fex must destroy decimate_pipelines[1..3] and ssim_pipeline_horiz[1..4] + ssim_pipeline_vert[0..4] before calling vmaf_vulkan_kernel_pipeline_destroy() on pl_decimate / pl_ssim.
- Invariant — [0] aliasing destroy-skip. decimate_pipelines[0] and ssim_pipeline_horiz[0] must not be passed to vkDestroyPipeline in close_fex; _destroy() already releases them via pl_decimate.pipeline / pl_ssim.pipeline. Double-free is UB. The destroy loops in close_fex start at i = 1 for decimate and skip i == 0 for ssim_horiz.
- Invariant — per-bundle descriptor pool. The shared s->desc_pool is gone; alloc_descriptor_set now takes a const VmafVulkanKernelPipeline *bundle and uses bundle->desc_pool + bundle->dsl. Per-frame vkFreeDescriptorSets calls must target the matching pool (pl_decimate.desc_pool for decimate sets, pl_ssim.desc_pool for ssim sets). Mixing them is undefined behavior.
- Numerical contract: unchanged. Same shaders, spec constants, push constants, and dispatch order as before. The float_ms_ssim Netflix-pair smoke (576×324×48f) reports mean 0.963241; ssim pyramid intermediate values are bit-identical to the pre-migration run.
- Rebase impact: low. Upstream Netflix has no Vulkan backend. Conflicts only against the parallel T-GPU-DEDUP-{18..22} PRs (#284–#288) on CHANGELOG.md / docs/rebase-notes.md; auto-resolve keeps both halves.
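The per-bundle descriptor-pool rule can be made concrete with a mock where each descriptor set remembers the pool it came from. This is a hypothetical sketch. DescPool, KernelBundle, DescSet, mock_alloc_set and mock_free_set are illustrative stand-ins, deliberately named apart from the real alloc_descriptor_set. In Vulkan, freeing a set into its own pool is a valid-usage requirement that the driver is not obliged to check at runtime. That is why the invariant has to be upheld by the caller. The mock simply rejects a mismatch so the violation is visible.

```c
#include <stddef.h>

/* Hypothetical stand-ins: each bundle owns a pool, and each mock set
 * remembers which pool it was allocated from. */
typedef struct {
    int live_sets;
} DescPool;

typedef struct {
    DescPool desc_pool;
} KernelBundle;

typedef struct {
    DescPool *owner;
} DescSet;

/* Allocate from the bundle whose layout the dispatch uses. */
static DescSet mock_alloc_set(KernelBundle *bundle)
{
    bundle->desc_pool.live_sets++;
    DescSet set = { &bundle->desc_pool };
    return set;
}

/* Free into an explicit pool; a mismatched pool is refused (-1) rather
 * than silently corrupting the other pool's accounting. */
static int mock_free_set(DescPool *pool, DescSet *set)
{
    if (set->owner != pool)
        return -1;
    pool->live_sets--;
    set->owner = NULL;
    return 0;
}
```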
0106 — Vulkan kernel template multi-pipeline + ssim/motion migration (T-GPU-DEDUP-7)
- Touches:
  - core/src/vulkan/kernel_template.h — new vmaf_vulkan_kernel_pipeline_add_variant() helper. It takes the base pipeline bundle (DSL / pipeline layout / shader / pool owned by vmaf_vulkan_kernel_pipeline_create) plus a partial VkComputePipelineCreateInfo, and produces a sibling VkPipeline re-using the same layout / shader. The base _create and _destroy entry points are unchanged; existing consumers (psnr, moment, ciede) keep working.
  - core/src/feature/vulkan/motion_vulkan.c — the state collapses VkPipeline pipelines[2] to a single VmafVulkanKernelPipeline pl. The two pipelines were kept "for SYCL parity" but were functionally identical, because COMPUTE_SAD goes through push constants, not spec-constants. create_pipelines / close_fex shrink to template-driven create + destroy.
  - core/src/feature/vulkan/ssim_vulkan.c — the state becomes VmafVulkanKernelPipeline pl + VkPipeline pipeline_vert. Pass 0 (horizontal) is the template's base pipeline; pass 1 (vertical) is created via _add_variant(). close_fex destroys the variant first, then calls vmaf_vulkan_kernel_pipeline_destroy() on the bundle.
- Invariant — no spec-constant drift between base and variant. _add_variant() overwrites sType / stage.sType / stage.stage / stage.module / layout of the caller's VkComputePipelineCreateInfo, so the variant is guaranteed to share the base's shader and layout. Callers control the variant's spec-constants via pSpecializationInfo. Dropping or reordering these overwrites lets a consumer accidentally bind a different shader module under the same layout, which is UB at descriptor-set time.
- Invariant — variant destroyed before bundle. close_fex in ssim must vkDestroyPipeline(s->pipeline_vert) before vmaf_vulkan_kernel_pipeline_destroy(&s->pl). The bundle's _destroy releases the descriptor pool, and descriptor sets allocated with vkAllocateDescriptorSets against the variant pipeline's layout are only dropped cleanly once the variant pipeline is already gone.
- Numerical contract: unchanged. Both kernels run identical shaders + spec-constants + push-constants as before; only the Vulkan boilerplate that creates / destroys the pipeline scaffolding moved to a shared owner. The cross-backend parity gate at places=4 holds: the Netflix-pair float_ssim smoke (576×324×48f) reports mean 0.863, identical to pre-migration.
- Rebase impact: low. The base pipeline-bundle helpers predate this change (PR #270 / #271); the new _add_variant is additive. Upstream Netflix has no Vulkan backend to conflict with.
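The field-pinning part of _add_variant() reduces to a few assignments. This is a hypothetical, Vulkan-free sketch. BaseBundle and VariantCreateInfo stand in for the template bundle and VkComputePipelineCreateInfo, and pin_variant_to_base is an illustrative name. The real helper also overwrites sType, stage.sType and stage.stage, which are omitted here. The shader module and layout are forced to the base's values, while the caller's specialisation info passes through unchanged.

```c
/* Hypothetical, Vulkan-free stand-ins for the base bundle and for the
 * two VkComputePipelineCreateInfo fields this sketch models. */
typedef struct {
    int shader_module;
    int pipeline_layout;
} BaseBundle;

typedef struct {
    int stage_module;
    int layout;
    const void *spec_info; /* caller-controlled specialisation constants */
} VariantCreateInfo;

/* Force the variant onto the base's shader module and layout; the
 * caller's specialisation info is deliberately left untouched. */
static void pin_variant_to_base(const BaseBundle *base, VariantCreateInfo *ci)
{
    ci->stage_module = base->shader_module;
    ci->layout = base->pipeline_layout;
}
```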
0111 — integer_ciede_cuda migrated to kernel_template (T-GPU-DEDUP-11)
- Touches:
  - core/src/feature/cuda/integer_ciede_cuda.c — the state's CUstream + CUevent + CUevent + VmafCudaBuffer + host-pinned float* quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. init / collect / close call the template's lifecycle_init / readback_alloc / collect_wait / lifecycle_close / readback_free helpers. submit keeps the pre-launch wait inline. This is intentional: ciede has no atomic, so the template's pre-launch memset is unnecessary.
- Numerical contract: unchanged. Pure CUDA-boilerplate consolidation. The host-side reduction in collect still uses the same double accumulator over per-block float partials; places=4 (ADR-0187) holds.
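A host-side reduction of per-block float partials into a double accumulator looks roughly like the sketch below. sum_partials_double is a hypothetical name, not the ciede_cuda function. Accumulating in double matters once the running sum outgrows float's 24-bit significand. For example, 2^24 + 1 is not representable in float and rounds back to 2^24, whereas a double accumulator keeps it exactly.

```c
#include <stddef.h>

/* Hypothetical host-side reduction: per-block float partials summed in a
 * double accumulator so large running sums do not lose small terms. */
static double sum_partials_double(const float *partials, size_t n_blocks)
{
    double acc = 0.0;
    for (size_t i = 0; i < n_blocks; i++)
        acc += (double)partials[i];
    return acc;
}
```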
0112 — integer_moment_cuda migrated to kernel_template (T-GPU-DEDUP-12)
- Touches:
  - core/src/feature/cuda/integer_moment_cuda.c — the state's stream/event/device-buffer/host-pinned quintet collapses to VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb. submit calls vmaf_cuda_kernel_submit_pre_launch, because the atomic counters require the device-side memset. init / collect / close call the matching template helpers.
- Numerical contract: unchanged. Same per-frame atomic accumulators (4× uint64), same sums_host[i] / n_pixels host division.
- Rebase impact: low. Upstream Netflix has no equivalent template; this consolidation is fork-local.
0113 — integer_motion_v2_cuda migrated to kernel_template (T-GPU-DEDUP-13)
- Touches:
  - core/src/feature/cuda/integer_motion_v2_cuda.c — the stream/event pair + SAD device+host quintet collapses to lc + rb. The raw-pixel ping-pong pix[2] stays outside the bundle. submit keeps the memset on pic_stream inline rather than calling submit_pre_launch; the helper would move the memset to lc.str, which races with the kernel reading the accumulator. init / collect / close call the matching template helpers.
- Numerical contract: unchanged. Same D2D copy, same conditional kernel launch on frame ≥ 1, same host-side min(score[i], score[i+1]) flush.
0114 — integer_ssim_cuda migrated to kernel_template (T-GPU-DEDUP-14)
- Touches:
  - core/src/feature/cuda/integer_ssim_cuda.c — the stream/event/partials device+host quintet collapses to lc + rb. Five intermediate float buffers (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) stay outside the bundle. submit keeps the cuStreamWaitEvent + horiz + vert + DtoH chain inline. SSIM writes one float per block (no atomic), so the template's submit_pre_launch memset is unnecessary. init / collect / close use the matching template helpers.
- Numerical contract: unchanged. Same horiz-then-vert two-pass pipeline, same per-block float partial reduction in double on the host. places=4 (matching the ciede_cuda precision pattern) holds.
- Rebase impact: low. Upstream Netflix has no equivalent; this is fork-added.
0115 — ms_ssim_cuda + psnr_hvs_cuda lifecycle migration (T-GPU-DEDUP-15)
- Touches:
  - core/src/feature/cuda/integer_ms_ssim_cuda.c — the stream + 2-event lifecycle is replaced with VmafCudaKernelLifecycle lc. The multi-level pyramid, SSIM intermediate and 3-partials buffers stay outside the template's single-pair readback bundle.
  - core/src/feature/cuda/integer_psnr_hvs_cuda.c — same shape; the 3-plane ref/dist/partials triples remain inline.
- Numerical contract: unchanged. The migration only affects init / close boilerplate. Submit / collect dispatch and host reduction paths are untouched apart from the s->str → s->lc.str / s->event → s->lc.submit / s->finished → s->lc.finished field renames.
0116 — float_psnr/ansnr/motion cuda → kernel_template (T-GPU-DEDUP-16)
- Touches:
  - core/src/feature/cuda/float_psnr_cuda.c — stream/event/partials quintet → lc + rb; the input upload buffers ref_in / dis_in stay outside the bundle.
  - core/src/feature/cuda/float_ansnr_cuda.c — same shape; rb wraps the (sig, noise) interleaved partials.
  - core/src/feature/cuda/float_motion_cuda.c — same shape; rb wraps the SAD partials, and the blur[2] ping-pong stays outside.
- Numerical contract: unchanged. Same dispatch geometry, same reduction order. The cross-backend parity gate at the kernels' contracted precision (places=3 per ADR-0192) holds.
0117 — float_adm + float_vif cuda lifecycle migration (T-GPU-DEDUP-17)
- Touches:
  - core/src/feature/cuda/float_adm_cuda.c — the stream + 2-event lifecycle is replaced with VmafCudaKernelLifecycle lc; the multi-stage DWT + CSF pipeline state stays outside the template's single-pair readback bundle.
  - core/src/feature/cuda/float_vif_cuda.c — same shape; the 4-level pyramid + per-scale (num, den) pairs remain inline.
- Numerical contract: unchanged. The migration only affects init / close stream-event boilerplate. Submit / collect dispatch and host reduction paths are untouched apart from the field renames.
- Rebase impact: low. Upstream Netflix has no equivalent template; this is fork-added.
0107 — float_psnr_vulkan migrated to kernel_template (T-GPU-DEDUP-8)
- Touches:
  - core/src/feature/vulkan/float_psnr_vulkan.c — the state's dsl + pipeline_layout + shader + pipeline + desc_pool quintet collapses into a single VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy. No shader changes, no spec-constant changes, no push-constant changes.
- Numerical contract: unchanged. The migration is a pure Vulkan-boilerplate consolidation. The cross-backend parity gate at places=4 holds: the Netflix-pair smoke reports float_psnr mean 30.755 dB, identical to pre-migration.
0109 — float_ansnr_vulkan + motion_v2_vulkan migrated to kernel_template (T-GPU-DEDUP-9)
- Touches:
  - core/src/feature/vulkan/float_ansnr_vulkan.c — the single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
  - core/src/feature/vulkan/motion_v2_vulkan.c — same shape.
- Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. The cross-backend parity gate at the kernel's contracted precision holds: the Netflix-pair smoke reports float_ansnr mean 23.51 dB and motion2_v2_score mean 3.895, identical to pre-migration.
0110 — float_motion_vulkan migrated to kernel_template (T-GPU-DEDUP-10)
- Touches:
  - core/src/feature/vulkan/float_motion_vulkan.c — the single-pipeline state collapses to VmafVulkanKernelPipeline pl; create_pipelines and close_fex shrink to template-driven create + destroy.
- Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. The Netflix-pair smoke reports motion mean 4.049 / motion2 mean 3.894, identical to pre-migration.
- Rebase impact: low. Upstream Netflix has no Vulkan backend.
0108 — Bristol VI-Lab feasibility digest + BVI-CC ingest ADR (Draft)
- Touches:
  - docs/research/0046-bristol-vi-lab-feasibility.md (new) — nine-dataset survey + use-case fit + effort estimate.
  - docs/adr/0241-bristol-bvi-cc-ingest.md (new, Status: Draft) — proposal to ingest BVI-CC as the second tiny-AI corpus.
  - docs/adr/README.md — index row for ADR-0241.
  - CHANGELOG.md — Added entry.
- Numerical contract: not applicable (docs-only).
- Rebase impact: none. Pure research deliverables; upstream Netflix has no equivalent surface.
0094 — Vulkan VkImage import v2 async pending-fence (T7-29 part 4 / ADR-0251)
- ADR: ADR-0251; predecessor ADR-0186.
- Touches:
  - core/src/vulkan/import.c — full rewrite of the submission path.
    - The single-fence submit_and_wait becomes per-slot submit_to_slot + drain_slot_fence.
    - The new slot_alloc / slot_release helpers materialise / tear down a ring slot (staging pair + command buffer + fence).
    - vmaf_vulkan_import_image indexes into the ring by frame_index % ring_size; vmaf_vulkan_wait_compute drains every outstanding fence.
    - vmaf_vulkan_state_build_pictures waits on the slot's fence before exposing the host pointer.
    - Public-API signatures are unchanged.
  - core/src/vulkan/vulkan_internal.h — new struct VmafVulkanImportSlot. VmafVulkanImportSlots becomes a fixed-capacity VmafVulkanImportSlot ring[VMAF_VULKAN_RING_MAX] plus geometry + ring_size. Two new defines: VMAF_VULKAN_RING_DEFAULT (4) and VMAF_VULKAN_RING_MAX (8). VmafVulkanState gains requested_ring_size.
  - core/src/vulkan/common.c — vmaf_vulkan_state_init and _state_init_external set requested_ring_size = VMAF_VULKAN_RING_DEFAULT.
  - core/test/test_vulkan_async_pending_fence.c (new; contract smoke for the v1 → v2 swap).
  - core/test/meson.build — registers the new test under the existing enable_vulkan guard.
  - core/src/vulkan/AGENTS.md (new) — pins the three rebase-sensitive ring invariants.
  - docs/adr/0251-vulkan-async-pending-fence.md (new), docs/research/0042-vulkan-async-pending-fence.md (new), docs/api/gpu.md, docs/backends/vulkan/overview.md, CHANGELOG.md, docs/rebase-notes.md.
  - ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch — unchanged. The v2 ring is fully internal to VmafVulkanState; the public ABI stays byte-identical, so the filter consumes the new path transparently.
- Invariant 1 — fixed ring depth at first import. lazy_alloc_ring is the only place that materialises the ring; once allocated, the depth never changes for the lifetime of the VmafVulkanState. Any caller that needs a different depth has to free + re-init. The geometry-pinning contract from v1 (ADR-0186) is preserved verbatim.
- Invariant 2 — vkResetFences only after VK_SUCCESS from vkWaitForFences. The sole reset path lives in drain_slot_fence, and fence_in_flight flips back to 0 only after the wait succeeds. A -EIO from the wait propagates up without resetting, so a retry correctly re-waits rather than silently moving on.
- Invariant 3 — state_free drains before destroying. vmaf_vulkan_import_slots_free walks the ring and calls drain_slot_fence on every in-flight slot. It then issues one vkQueueWaitIdle as a belt-and-braces measure, because any feature kernel that submitted on the same queue may still be running. Reordering this triggers validation-layer "destroying in-use object" errors.
- Numerical contract: unchanged. Async submission only changes when the host can read the staging buffer, not which bytes the GPU writes. The cross-backend parity gate (scripts/ci/cross_backend_parity_gate.py, places=4) holds.
- Memory delta: the staging arena scales from 1 to ring_size per direction. At default depth and 1080p 8-bit Y, the per-state host-visible footprint grows from ~4 MiB to ~16 MiB. Documented in ADR-0251 §Consequences.
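The first two ring invariants and the memory delta can be sketched in plain C. Slot, Ring, slot_for_frame, drain_slot and staging_bytes are hypothetical mocks rather than the fork's import.c code. An integer wait_result replaces vkWaitForFences, and clearing fence_in_flight stands in for vkResetFences.

For the footprint arithmetic, the sketch assumes that "per direction" means one ref and one dis luma plane per slot. On that assumption:

- at depth 1: 1920 × 1080 × 1 byte × 2 = 4,147,200 bytes (about 4 MiB);
- at the default depth of 4: 16,588,800 bytes (about 16 MiB).

Both figures match the memory delta above.

```c
#include <errno.h>
#include <stddef.h>

#define RING_DEFAULT 4 /* mirrors VMAF_VULKAN_RING_DEFAULT */
#define RING_MAX     8 /* mirrors VMAF_VULKAN_RING_MAX */

/* Hypothetical mock of one ring slot: wait_result is what the mocked
 * vkWaitForFences would report (0 on success, -EIO on failure). */
typedef struct {
    int fence_in_flight;
    int wait_result;
} Slot;

typedef struct {
    Slot ring[RING_MAX];
    unsigned ring_size; /* fixed once the ring is first materialised */
} Ring;

/* Invariant 1: a fixed depth gives a stable frame -> slot mapping. */
static Slot *slot_for_frame(Ring *r, unsigned frame_index)
{
    return &r->ring[frame_index % r->ring_size];
}

/* Invariant 2: clear the in-flight flag (the vkResetFences stand-in)
 * only after a successful wait; propagate errors without resetting. */
static int drain_slot(Slot *s)
{
    if (!s->fence_in_flight)
        return 0;
    if (s->wait_result != 0)
        return s->wait_result;
    s->fence_in_flight = 0;
    return 0;
}

/* Host-visible staging bytes: one plane per direction per ring slot. */
static size_t staging_bytes(size_t width, size_t height, size_t bytes_per_px,
                            size_t directions, size_t ring_size)
{
    return width * height * bytes_per_px * directions * ring_size;
}
```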
0090 — cambi_vulkan extractor (T7-36 / ADR-0210)¶
- ADR: ADR-0210; predecessor ADR-0205.
- Touches:
  - core/src/feature/vulkan/cambi_vulkan.c (replaces the spike scaffold's init_stub/extract_stub/close_stub triple with the full Vulkan-aware lifecycle).
  - core/src/feature/vulkan/shaders/cambi_preprocess.comp (new), cambi_mask_dp.comp (new; unified row-SAT / col-SAT / threshold-compare selected via the PASS=0/1/2 spec constant).
  - core/src/feature/cambi.c: appends a small block of public trampolines (vmaf_cambi_*) at the bottom of the file that thinly wrap the file-static helpers. No upstream function-static code is renamed or moved. The entire upstream body of cambi.c above the trampolines stays byte-identical, which keeps Netflix sync straightforward.
  - core/src/feature/cambi_internal.h (new): internal-only header exposing vmaf_cambi_calculate_c_values, vmaf_cambi_get_spatial_mask, etc., to the GPU twin.
  - core/src/vulkan/meson.build: registers the 5 cambi shaders in vulkan_shader_sources[] and cambi_vulkan.c in vulkan_sources.
  - core/src/feature/feature_extractor.c: adds the extern declaration and registry entry for vmaf_fex_cambi_vulkan under #if HAVE_VULKAN.
  - scripts/ci/cross_backend_vif_diff.py: adds a cambi row to FEATURE_METRICS so the cross-backend gate runs at places=4 against the CPU baseline.
  - docs/adr/0210-cambi-vulkan-integration.md, docs/research/0032-cambi-vulkan-integration.md, docs/backends/vulkan.md, CHANGELOG.md.
- Invariant 1 — bit-exactness by construction. Every GPU phase is integer arithmetic (uint16 derivative, int32 SAT, > compare, stride-2 gather, 3-element mode3 lookup). The readback into the host VmafPicture pair is byte-identical to what the CPU would have written. The host residual then runs the unmodified CPU calculate_c_values and spatial pooling on those buffers. Any rebase that introduces float arithmetic into one of these GPU phases will silently break places=4 and must be caught at the cross-backend gate. One example would be a future Netflix change that adds a bilinear interpolation step to the derivative kernel.
- Invariant 2 — cambi_internal.h signatures must stay in lock-step with cambi.c's file-static helpers. The Vulkan twin calls vmaf_cambi_calculate_c_values, which trampolines to the file-static calculate_c_values. Any signature change to the latter (extra parameters, type changes) must update the trampoline and header in the same PR, or the GPU build breaks.
- On upstream sync: upstream sometimes renames cambi.c's file-static helpers; a rename such as decimate → cambi_decimate could happen during a Netflix tidy-up. When rebasing, search the tail of cambi.c for the trampoline block. The file-static helpers it calls must match the upstream symbol names: get_spatial_mask, decimate, filter_mode, calculate_c_values, spatial_pooling, weight_scores_per_scale, get_pixels_in_window, increment_range, decrement_range, get_derivative_data_for_row, cambi_preprocessing. Update the trampoline bodies if upstream renames anything. The signatures should not need to change, because the trampolines already take the function-pointer-typedef form (VmafRangeUpdater etc.).
- Re-test on rebase: python3 scripts/ci/cross_backend_vif_diff.py --backend vulkan --feature cambi --ref testdata/ref_576x324_48f.yuv --dist testdata/dis_576x324_48f.yuv --width 576 --height 324 --pixel-format 420 --bitdepth 8 --frames 48. It should emit places=4 PASS with max_abs_diff = 0.0. If it diverges, bisect the GPU phases: read back the individual buffers (image_buf / mask_buf / deriv_buf) and compare each against the CPU's in-place pic plane after the equivalent stage.
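Invariant 1 rests on a simple property: integer addition is associative, so splitting a box sum into a row pass and a column pass cannot change the result. The sketch below illustrates this with a generic summed-area table (SAT). It builds the table as a row prefix-sum pass followed by a column prefix-sum pass, then checks window queries against direct sums. It is not the shader code. The function names and the plain-list layout are illustrative, and the PASS=2 threshold compare is not modelled.

```python
from typing import List

def summed_area_table(img: List[List[int]]) -> List[List[int]]:
    """Two-pass SAT: row prefix sums (pass 0), then column prefix sums (pass 1)."""
    h, w = len(img), len(img[0])
    sat = [[0] * w for _ in range(h)]
    for y in range(h):                  # pass 0: row-wise prefix sums
        acc = 0
        for x in range(w):
            acc += img[y][x]
            sat[y][x] = acc
    for x in range(w):                  # pass 1: column-wise prefix sums
        for y in range(1, h):
            sat[y][x] += sat[y - 1][x]
    return sat

def window_sum(sat: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> int:
    """Inclusive box sum over [x0, x1] x [y0, y1] from four SAT lookups."""
    total = sat[y1][x1]
    if x0 > 0:
        total -= sat[y1][x0 - 1]
    if y0 > 0:
        total -= sat[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += sat[y0 - 1][x0 - 1]
    return total
```

Every step is integer addition or subtraction, so any evaluation order gives the same bits. This holds whether the work is spread across GPU workgroups or done in the CPU loop. Adding float interpolation to a phase would remove that guarantee.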
The pre-ADR-0108 fork-local PRs are summarised by workstream rather than per-PR. Future PRs add entries individually.
0085 — Upstream c70debb1 partial port (adm_csf + barten_csf tests)
- No ADR. Pure upstream cherry-pick per the ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
- Upstream source: c70debb1 (Kyle Swanson, 2026-04-28): "libvmaf/test: port new adm/vif/speed tests". The audit row that flagged the gap is T-NEW-2 in the 2026-04-29 quarterly upstream-backlog re-audit (PR #205).
- Touches (additive only):
  - core/src/feature/adm_csf_tools.h: new header (verbatim from upstream). It declares the inline adm_native_csf helper (DLM-paper CSF) used by the new test_adm_csf unit.
  - core/test/test_adm_csf.c: new unit (verbatim from upstream); 2 mu_assert cases on adm_native_csf(3, 3.0, 1080, {0, 45}).
  - core/test/test_barten_csf.c: new unit (verbatim from upstream); 23 mu_assert cases over barten_rod_cone_sens, barten_mtf, barten_csf, linear_interpolate, barten_watson_blend_csf (all symbols already on the fork).
  - core/test/meson.build: registers the two new executables and adds test('test_adm_csf', ...) and test('test_barten_csf', ...).
  - CHANGELOG.md: Unreleased § Changed.
- Deliberate scope cuts (the upstream commit's other halves are not portable verbatim):
  - test_vif_tools.c depends on several upstream symbols: NUM_KERNELSCALES, the 21-entry valid_kernelscales table, vif_validate_kernelscale, vif_get_filter_size, vif_get_filter, speed_get_antialias_filter, and a [NUM_KERNELSCALES][5][65] filter table. That table does not match the fork's vif_filter1d_table_s[11][4][65]. Per Research-0024 Strategy E, the fork deliberately diverges from the upstream vif runtime-helper chain to preserve the ADR-0138 / 0139 / 0142 / 0143 SIMD bit-exactness contract. Porting this test requires porting the runtime helpers first.
  - test_speed_chroma.c #includes feature/speed.c directly, but the fork has no SpEED extractor (feature/speed.c does not exist). Pairs with audit row T-NEW-1 (port the SpEED extractor wholesale, or absorb it into the tiny-AI speed metric).
- Invariants (rebase-relevant):
  - The new adm_csf_tools.h header is wholly additive. It does not conflict with the existing fork adm_csf_s non-inline helper in adm_tools.h (different signature, different translation units).
  - The two new tests do not depend on Netflix golden YUVs; they evaluate the closed-form CSF math directly. No golden-data interaction.
- On upstream sync: a future port of either of the following unlocks the deferred halves of this commit: the upstream vif runtime-helper chain (Research-0024 Strategy A reversal), or the SpEED extractor (T-NEW-1). Until then, fork-side test_vif_tools.c / test_speed_chroma.c stay absent.
- Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu test_adm_csf test_barten_csf
meson test -C build-cpu test_adm_csf test_barten_csf
0084 — Embedded MCP server scaffold (T5-2, ADR-0209)
- ADR: ADR-0209 (audit-first scaffold) on top of the ADR-0128 governance + Research-0005 design.
- Upstream source: fork-local. Netflix/vmaf has no embedded MCP server (and no plans to add one — the workflow is agent-tooling-specific, well outside upstream's library scope).
- Touches:
  - core/include/libvmaf/libvmaf_mcp.h: new public header.
  - core/include/core/meson.build: new if get_option('enable_mcp') install branch.
  - core/src/mcp/: new directory containing mcp.c (stub TU) and meson.build (exposes mcp_sources + mcp_defines).
  - core/src/meson.build: new is_mcp_enabled guard + subdir('mcp') block. mcp_sources is threaded into the library('vmaf', ...) source list alongside dnn_sources.
  - core/test/meson.build: new if get_option('enable_mcp') block wiring test_mcp_smoke.
  - core/test/test_mcp_smoke.c: new 12-sub-test smoke.
  - core/meson_options.txt: new enable_mcp umbrella + three sub-flags (all default false).
- Invariant: every public entry point in libvmaf_mcp.h (vmaf_mcp_init/_start_sse/_start_uds/_start_stdio/_stop/_close) returns -ENOSYS (or -EINVAL on bad arguments) until the T5-2b runtime PR lands. The smoke test pins this contract. A runtime PR that flips a return code without flipping the smoke expectation regresses the gate.
- On upstream sync: zero interaction with upstream files (wholly additive directory + boolean build flags). The subdir('mcp') insertion in core/src/meson.build sits next to the existing subdir('dnn') / Vulkan blocks. An upstream conflict in that area would be confined to those few lines and is mechanical to resolve.
- Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_mcp=false
ninja -C build-cpu && meson test -C build-cpu # baseline still green
meson setup --reconfigure build-cpu libvmaf -Denable_mcp=true \
-Denable_mcp_sse=true -Denable_mcp_uds=true -Denable_mcp_stdio=true
ninja -C build-cpu
meson test -C build-cpu test_mcp_smoke # 12/12 sub-tests pass
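The contract that the smoke test pins can be modelled in a few lines. The sketch below is a hypothetical Python stand-in, not the C header. The function names only echo the libvmaf_mcp.h entry points, and the argument checks are assumptions. It shows the two return codes every stub may produce, plus a smoke loop that reports each entry point returning anything else.

```python
import errno

# Hypothetical stand-ins for the libvmaf_mcp.h stubs: validate arguments,
# then report "not implemented" until the runtime PR lands.
def mcp_init(cfg):
    if cfg is None:
        return -errno.EINVAL
    return -errno.ENOSYS

def mcp_start_uds(server, path):
    if server is None or not path:
        return -errno.EINVAL
    return -errno.ENOSYS

ALLOWED = {-errno.ENOSYS, -errno.EINVAL}

def smoke(entry_points) -> list:
    """Return the names of entry points that broke the stub contract."""
    failures = []
    for name, call, args in entry_points:
        if call(*args) not in ALLOWED:
            failures.append(name)
    return failures
```

A runtime implementation that starts returning 0 shows up as a smoke failure. That is the point of pinning the stub codes until the smoke expectations are updated in the same PR.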
0065 — T7-37 Netflix bench rerun + docs/benchmarks.md TBD fill
- No ADR. Empirical fill of pre-existing TBD cells; no new decision. The bench-script fixes this rerun depends on shipped earlier:
  - PR #169: libvmaf/AGENTS.md backend-engagement foot-guns.
  - PR #170: --backend cuda actually engages CUDA.
  - PR #171: testdata/bench_all.sh uses correct flags.

  The Vulkan header install for SDK consumers is PR #175.
- Touches (additive only):
  - docs/benchmarks.md:
    - Every TBD cell is replaced with measured numbers.
    - The hardware-profile table is updated to the ryzen-4090-arc host the rerun was performed on.
    - The "How to reproduce" section now documents fixture acquisition for the gitignored BBB 4K 200-frame pair.
  - CHANGELOG.md: Unreleased § Changed entry.
- Invariants (rebase-relevant): none. The numbers are tied to fork commit 41301496 and the ryzen-4090-arc profile. An upstream rebase that changes feature pipelines would invalidate the table but not break parsing.
- On upstream sync: zero interaction. Pure docs.
- Re-test on rebase: run bash testdata/bench_all.sh after a fresh fork build. This confirms that the bench script still drives all four backends. It also confirms that the per-row metrics-key counts (CPU=15, CUDA=12, SYCL/Vulkan=34) still distinguish them. If they collapse to one count, the new upstream silently broke a backend dispatcher.
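The collapse check in the re-test step can be automated. This is a minimal sketch with several assumptions. Each backend's run is taken to be written with the vmaf CLI's JSON output, meaning a top-level "frames" list whose entries carry a "metrics" dict. The backend labels and helper names are illustrative. The expected counts are the ones quoted above.

```python
import json
from typing import Dict

EXPECTED = {"cpu": 15, "cuda": 12, "sycl": 34, "vulkan": 34}

def load_report(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def metric_key_count(report: dict) -> int:
    """Number of per-frame metric keys in a vmaf JSON report (first frame)."""
    return len(report["frames"][0]["metrics"])

def check_backends(reports: Dict[str, dict]) -> Dict[str, int]:
    """Return {backend: count} for every backend that looks wrong."""
    counts = {name: metric_key_count(r) for name, r in reports.items()}
    if len(counts) > 1 and len(set(counts.values())) == 1:
        # Every backend produced the same key set: a dispatcher has likely
        # fallen back to a single path silently.
        return counts
    return {n: c for n, c in counts.items() if EXPECTED.get(n) != c}
```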
0050 — float_adm_cuda + float_adm_sycl extractors (ADR-0202)
- ADR: ADR-0202
- Touches:
  - core/src/feature/cuda/float_adm/float_adm_score.cu (new)
  - core/src/feature/cuda/float_adm_cuda.{c,h} (new)
  - core/src/feature/sycl/float_adm_sycl.cpp (new)
  - core/src/meson.build, three changes:
    - (1) a new float_adm_score entry in cuda_cu_sources;
    - (2) a new cuda_cu_extra_flags dict that threads --fmad=false + -Xcompiler=-ffp-contract=off into the float_adm_score fatbin only;
    - (3) a new SYCL source in sycl_feature_sources.
  - core/src/feature/feature_extractor.c: extern declarations + list entries for vmaf_fex_float_adm_cuda / vmaf_fex_float_adm_sycl under #if HAVE_CUDA / #if HAVE_SYCL.
- Invariant 1 — --fmad=false for the float_adm fatbin only. Two computations require IEEE-754 add/mul ordering to match the GLSL precise qualifier in float_adm.comp: the angle-flag dot product (ot_dp = oh*th + ov*tv) and the cube reductions (xa*xa*xa, csf_o*csf_o*csf_o). NVCC's default -fmad=true fuses these into FMAs and drifts past places=4 at scale 3 / adm2. The integer ADM kernels share cuda_flags but use int64 accumulators, where FMA is irrelevant; keep the FMA-on default for them.
- Invariant 2 — parent-LL dimension trap. At scale > 0, stage 0 reads the parent's LL band. The mirror/clamp bounds are therefore scale_w/h[scale] (= the parent's LL output dims = the current scale's input dims), NOT scale_w/h[scale - 1] (= the parent's full image dims). Both float_adm_cuda.c and float_adm_sycl.cpp cite this inline. Do not "simplify" by using the off-by-one neighbour.
- Re-test:
CXX=icpx CC=icx meson setup build-cs -Denable_cuda=true \
-Denable_sycl=true -Denable_vulkan=enabled \
-Denable_float=true \
-Dsycl_compiler=/opt/intel/oneapi/compiler/latest/bin/icpx
ninja -C build-cs
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary build-cs/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --feature float_adm \
--backend cuda --places 4
# Same with --backend sycl on a host with a SYCL device.
# Both must report 0/N mismatches at places=4.
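To see why a fused multiply-add can move results, the sketch below evaluates a*b + c in binary32 in two ways:

- The product is rounded before the add, which is the ordering that --fmad=false and the GLSL precise qualifier preserve.
- The whole expression is rounded once, which is what an FMA does.

It simulates binary32 with struct round-trips rather than running any GPU code. The operands are chosen so that the product lands exactly on a rounding tie.

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mul_then_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to binary32 before the add.
    return f32(f32(a * b) + c)

def fused_mul_add(a: float, b: float, c: float) -> float:
    # One rounding: a*b + c is evaluated exactly with rationals, then
    # rounded once (exact in binary64 here, so no double rounding occurs).
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return f32(float(exact))

a = b = 1.0 + 2.0**-12          # both exactly representable in binary32
c = -(1.0 + 2.0**-11)
separate = mul_then_add(a, b, c)   # product rounds to 1 + 2**-11, sum is 0
fused = fused_mul_add(a, b, c)     # keeps the 2**-24 the rounding discarded
```

The two results differ by 2^-24 on a single operation. Repeated through dot products and cube reductions, differences of this kind are what the entry reports as drift past places=4.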
0049 — float_adm_vulkan extractor (ADR-0199)
- ADR: ADR-0199
- Touches:
  - core/src/feature/vulkan/float_adm_vulkan.c (new)
  - core/src/feature/vulkan/shaders/float_adm.comp (new)
  - core/src/vulkan/meson.build (adds the .comp shader and the new .c source)
  - core/src/feature/feature_extractor.c (extern declaration + list entry under #if HAVE_VULKAN)
  - scripts/ci/cross_backend_vif_diff.py (float_adm entry in FEATURE_METRICS)
  - .github/workflows/tests-and-quality-gates.yml (lavapipe float_adm step at places=4)
- Invariant: the float_adm GPU port uses the 2 * sup - idx - 1 mirror form on both axes. This matches both the scalar adm_dwt2_s and the AVX2 float_adm_dwt2_avx2, which both consume the same dwt2_src_indices_filt_s index buffer. It is intentionally different from float_vif's GPU mirror (ADR-0197), which uses -2 because float_vif's AVX2 path takes a different code branch. Do not "fix" the asymmetry by analogy with float_vif.
- Re-test:
meson setup build-vk -Denable_vulkan=enabled -Denable_cuda=false \
-Denable_sycl=false
ninja -C build-vk
meson test -C build-vk
VK_LOADER_DRIVERS_SELECT='*lvp*' python3 \
scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary build-vk/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --feature float_adm --places 4
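The two mirror forms differ by one sample at the upper edge. This minimal sketch covers only indices at or past sup, because the entry does not spell out the lower-edge form. It shows where each form lands. The function names are illustrative, not the shader's.

```python
def mirror_adm(idx: int, sup: int) -> int:
    """Upper-edge reflection used by the float_adm port: 2*sup - idx - 1.

    The edge sample is repeated (sup maps to sup - 1).
    """
    return 2 * sup - idx - 1 if idx >= sup else idx

def mirror_vif(idx: int, sup: int) -> int:
    """Upper-edge reflection of the float_vif GPU port: 2*sup - idx - 2.

    The edge sample is the pivot and is not repeated (sup maps to sup - 2).
    """
    return 2 * sup - idx - 2 if idx >= sup else idx
```

Every tap that reads past the edge picks a different sample under the two forms. "Fixing" one to match the other therefore changes the filtered output, not just the code.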
0083 — SSIMULACRA 2 Vulkan kernel (ADR-0201)
- ADR: ADR-0201
- Upstream source: fork-local. No SSIMULACRA 2 extractor in upstream Netflix/vmaf — fully fork-local feature.
- Touches:
  - core/src/feature/vulkan/ssimulacra2_vulkan.c (new file).
  - core/src/feature/vulkan/shaders/ssimulacra2_xyb.comp, ssimulacra2_blur.comp, ssimulacra2_mul.comp, ssimulacra2_ssim.comp (4 new shader files).
  - core/src/vulkan/meson.build:
    - added 4 shaders to vulkan_shader_sources and 1 source to vulkan_sources;
    - added all 4 ssimulacra2 shaders to psnr_hvs_strict_shaders (the -O0 strict-mode list, which keeps its legacy name).
  - core/src/feature/feature_extractor.c: registered vmaf_fex_ssimulacra2_vulkan in the Vulkan branch of the extractor list (between psnr_hvs_vulkan and the CUDA block).
  - scripts/ci/cross_backend_vif_diff.py: added ssimulacra2 to FEATURE_METRICS.
- Rebase impact: low. The change is fully additive. The only upstream-shared file it modifies is feature_extractor.c's registry array, which grows with every new extractor and is not a rebase pain point.
- Verification command:
meson setup core/build-vk-ss2 \
-Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false \
libvmaf
ninja -C core/build-vk-ss2 tools/vmaf
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build-vk-ss2/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 \
--feature ssimulacra2 --backend vulkan --places 1
# expected: max_abs_diff ≈ 1.59e-2, 0/48 mismatches at places=1
- Follow-ups:
- CUDA + SYCL twins (batch 3 parts 7b + 7c per ADR-0192).
- Performance follow-up: re-bin multiple rows / columns per WG in the IIR blur (currently local_size = 1, one row/col per WG, for correctness).
- Optional: rename psnr_hvs_strict_shaders to strict_shaders in core/src/vulkan/meson.build (cosmetic; out of scope for this PR).
0001 — SIMD bit-identical reductions for float ADM
- Workstream PRs: #18, commits 24c88a32, f082cfd3.
- Touches:
  - core/src/feature/integer_adm.c
  - core/src/feature/float_adm.c
  - core/src/feature/x86/adm_avx2.c
  - core/src/feature/x86/adm_avx512.c
  - core/src/feature/arm64/adm_neon.c
  - the test expectations in upstream python/test/feature_extractor_test.py
- Invariant: sum_cube and csf_den_scale accumulate cubed values in double precision in the scalar, AVX2, AVX-512, and NEON paths (via _mm256_cvtps_pd / _mm512_cvtps_pd on x86). Upstream accumulates in float, which produces ~8e-5 drift between scalar and SIMD. Test expectations were tightened to match the double-precision path. An upstream-side accumulator change would reintroduce the drift and break the tightened assertions.
- Re-test: meson test -C build --suite=fast && python -m pytest python/test/feature_extractor_test.py -k adm.
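The scalar-vs-SIMD drift described above comes from accumulating many cubed terms in binary32. The sketch below compares a binary32 running sum with a binary64 one, measuring both against an exact rational reference. Binary32 is simulated with struct round-trips. The input data is synthetic, not ADM coefficients. The exact widening order used by the fork's kernels is an assumption.

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_cube_f32(xs):
    acc = 0.0
    for x in xs:
        cube = f32(f32(x * x) * x)  # cube computed in binary32
        acc = f32(acc + cube)       # accumulator kept in binary32
    return acc

def sum_cube_f64(xs):
    acc = 0.0
    for x in xs:
        # x is a binary32 value held in a binary64; cube and sum in binary64
        # (assumed order: widen first, then cube and accumulate).
        acc += x * x * x
    return acc

xs = [f32((i % 97) / 97.0) for i in range(4096)]
exact = sum(Fraction(x) ** 3 for x in xs)
err32 = abs(Fraction(sum_cube_f32(xs)) - exact)
err64 = abs(Fraction(sum_cube_f64(xs)) - exact)
```

The binary32 error grows with the number of terms, because every add rounds to a 24-bit significand. That accumulated error is why the SIMD and scalar paths can disagree unless both widen.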
0002 — CUDA ADM decouple-inline buffer elimination
- Workstream PRs: commit 787e3382.
- Touches: core/src/feature/cuda/integer_adm_cuda.cu, core/src/feature/cuda/adm_decouple_inline.cuh (new), core/src/feature/cuda/meson.build. Upstream's adm_decouple.cu is no longer compiled in the fork.
- Invariant: the CSF and CM CUDA kernels read the ref/dis DWT2 buffers directly. They compute decouple_r / decouple_a inline via __device__ helpers in adm_decouple_inline.cuh. The following are intentionally removed, saving ~107 MB of GPU memory at 4K:
  - the 6 intermediate buffers (decouple_r, decouple_a, csf_a × {scale-0 int16, scales 1-3 int32});
  - the standalone adm_decouple.cu source.

  An upstream change to adm_decouple.cu will look orphaned, and a literal merge would reintroduce the buffer allocations.
- Re-test: meson setup build -Denable_cuda=true && ninja -C build && meson test -C build --suite=cuda.
0003 — SYCL backend (USM pool / D3D11 import / vmaf_sycl_* API)
- Workstream PRs: #33, #35, #5 (initial scaffolding), and the picture-pool deadlock fix that landed via #32.
- Touches: core/include/libvmaf/libvmaf_sycl.h, core/src/sycl/, core/src/feature/sycl/, core/src/libvmaf.c (SYCL public-API entry points), meson_options.txt (enable_sycl).
- Invariant: vmaf_sycl_preallocate_pictures constructs a real VmafSyclPicturePool honoring VmafSyclPicturePreallocationMethod (NONE / DEVICE / HOST). vmaf_sycl_picture_fetch dispatches to the pool when one is configured. The whole SYCL tree is fork-local and has no upstream counterpart. Upstream changes to core/src/libvmaf.c near the SYCL entry-point block are therefore likely to conflict. Picture-pool error paths in vmaf_read_pictures (libvmaf.c) must use goto cleanup; rather than return err;. This avoids leaking ref/dist pictures into the live-picture set, which caused the always-on-pool deadlock fixed in #32 (see ADR-0104). See ADR-0101, ADR-0103, ADR-0104.
- Re-test: meson setup build -Denable_sycl=true && ninja -C build && meson test -C build --suite=sycl (requires oneAPI / icpx).
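The deadlock mechanism can be reproduced with a toy pool. The sketch below is a hypothetical Python model, not the SYCL pool API. It compares two readers:

- A reader that returns early on error without releasing the pictures it already fetched. This drains the fixed-size pool.
- A reader that always routes through a cleanup path. This keeps the pool whole.

A real pool would block on the next fetch; the model raises instead so the failure is observable.

```python
class PicturePool:
    """Fixed-capacity pool; fetch() raises instead of blocking when empty."""

    def __init__(self, capacity: int):
        self.free = capacity

    def fetch(self):
        if self.free == 0:
            raise RuntimeError("pool exhausted (a real pool would block here)")
        self.free -= 1
        return object()

    def unref(self, pic) -> None:
        self.free += 1

def read_pictures_early_return(pool: PicturePool, fail: bool) -> int:
    ref = pool.fetch()
    dist = pool.fetch()
    if fail:
        return -1              # like `return err;`: ref and dist leak
    pool.unref(ref)            # simplified: the real code hands them off
    pool.unref(dist)
    return 0

def read_pictures_cleanup(pool: PicturePool, fail: bool) -> int:
    ref = pool.fetch()
    dist = pool.fetch()
    err = -1 if fail else 0
    # like `goto cleanup;`: every exit path releases what it fetched
    pool.unref(ref)
    pool.unref(dist)
    return err
```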
0004 — DNN runtime + tiny-AI surfaces
- Workstream PRs: #5, #8, #21, #22, #23, #31, #34, plus the pre-numbered DNN feat commits (9b985946, 1e5336d3, d122b721).
- Touches: core/include/libvmaf/dnn.h, core/src/dnn/, core/src/feature/feature_lpips.c, model/tiny/, meson_options.txt (enable_onnxruntime).
- Invariant:
  - Ordered execution-provider (EP) selection (CUDA → DML → CPU) with graceful fallback (ADR-0102).
  - fp16_io does a host-side fp32↔fp16 cast on the scoring path.
  - VMAF_TINY_MODEL_DIR enforces a path jail on model load (PR #31).
  - The runtime op-allowlist (PR #21) walks the ONNX graph, rejects unknown ops, and bounds the Loop/If trip_count at 1024 (ADR-0036/0107).

  The DNN tree is fork-local and upstream has no DNN code yet, so conflicts here are unlikely. The meson_options.txt and core/src/meson.build blocks near the DNN flag may still collide.
- Re-test: meson setup build -Denable_onnxruntime=true && ninja -C build && meson test -C build --suite=dnn.
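The ordered fallback is essentially a loop over candidate execution providers. The minimal sketch below uses a caller-supplied try_init callback as a hypothetical stand-in for the real ONNX Runtime session setup. The fork's C code and ADR-0102 remain the authority on the details.

```python
from typing import Callable, Optional, Sequence, Tuple

EP_ORDER: Tuple[str, ...] = ("CUDA", "DML", "CPU")

def select_ep(try_init: Callable[[str], bool],
              order: Sequence[str] = EP_ORDER) -> Optional[str]:
    """Return the first provider whose initialisation succeeds.

    A failed provider is not fatal: the loop falls through to the next
    one, ending at CPU.
    """
    for ep in order:
        if try_init(ep):
            return ep
    return None
```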
0005 — --precision CLI flag (IEEE-754 round-trip lossless)
- Workstream PRs: commit c989fbd9.
- Touches: core/tools/vmaf.c, core/tools/cli_parse.c, core/include/libvmaf/libvmaf.h (added vmaf_write_output_with_format), core/src/output.c.
- Invariant: the default --precision is %.17g (round-trip lossless); legacy opts back into upstream's %.6f. The public C API gained vmaf_write_output_with_format, and the old vmaf_write_output routes through it with the %.17g default. This is ABI-breaking only if upstream adds a same-named function with a different signature. See ADR-0006. The default was later reverted to %.6f per ADR-0119; see entry 0017.
- Re-test: vmaf -r ref.yuv -d dis.yuv ... --precision=full and diff against --precision=legacy.
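The difference between the two formats is easy to check, because Python's %-formatting follows the same printf conversions. The sketch below formats a golden VMAF mean from this document (76.66890519623612) both ways and parses each string back. The helper name is illustrative.

```python
def round_trips(value: float, fmt: str) -> bool:
    """True if printing `value` with `fmt` and parsing it back is lossless."""
    return float(fmt % value) == value

golden = 76.66890519623612
legacy = "%.6f" % golden   # '76.668905': six decimals, lossy
full = "%.17g" % golden    # 17 significant digits
```

Seventeen significant digits are enough to round-trip any IEEE-754 binary64 value, which is why %.17g is described as lossless and %.6f is not.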
0006 — Netflix golden tests preserved verbatim as required gate
- Workstream PRs: across the fork's life; codified in ADR-0024.
- Touches:
  - python/test/quality_runner_test.py
  - python/test/vmafexec_test.py
  - python/test/vmafexec_feature_extractor_test.py
  - python/test/feature_extractor_test.py
  - python/test/result_test.py
  - python/test/resource/yuv/
- Invariant: this fork never modifies the assertAlmostEqual(...) golden values in the five upstream Python test files. Fork-added tests live in separate files (e.g. python/test/test_precision_flag.py). The CI gate "Netflix CPU golden tests (D24)" is required and blocks merge. Upstream changes to these files are accepted unless they relax the assertions.
- Re-test: make test-netflix-golden.
0007 — Build system (CUDA 13.2, oneAPI 2025.3, MkDocs migration)
- Workstream PRs: #7, #17, commit 8a995cb0.
- Touches:
  - meson.build
  - meson_options.txt
  - the top-level Makefile
  - docs/ (Sphinx → MkDocs Material migration: docs/conf.py removed, mkdocs.yml added)
  - docs/requirements.txt
  - Dockerfile.*
  - the distro install scripts under scripts/
- Invariant: image pins are deliberately non-conservative (ADR-0027): CUDA 13.2, oneAPI 2025.3, clang-format 22, black 26. The images also ship experimental toolchain flags (--expt-relaxed-constexpr, etc.) on purpose. An upstream sync that pulls in a Dockerfile change targeted at older CUDA or older oneAPI must not relax the pins.
- Re-test: meson setup build -Denable_cuda=true -Denable_sycl=true && ninja -C build && mkdocs build --strict.
0008 — Workspace / docs / MATLAB / resource-tree relocations
- Workstream PRs: codified across ADR-0026, ADR-0029, ADR-0030, ADR-0031, ADR-0032, ADR-0033, ADR-0034, ADR-0038.
- Touches: any path-walk in upstream's CI / scripts / docs that assumes the upstream layout (root-level workspace/, resource/, matlab/, the root unittest script, root patches/).
- Invariant: the fork's layout is:
  - python/vmaf/workspace/
  - python/vmaf/resource/
  - python/vmaf/matlab/
  - scripts/unittest
  - ffmpeg-patches/ only
  - .github/codeql-config.yml

  An upstream move to a different sub-tree (e.g. a hypothetical tools/workspace/) must either be applied via a corresponding fork-side relocation or be rejected with a rebase note.
- Re-test: python -m pytest python/test/ -k golden (verifies that the resource-tree path works); make test-netflix-golden.
0009 — License headers (Lusoris/Claude on wholly-new files, 2016–2026 on Netflix files)
- Workstream PRs: commits c159761d, a185f8ef, 0e98c949, codified in ADR-0025 / ADR-0105.
- Touches: every wholly-new fork file (notably the SYCL tree and core/src/dnn/) and every Netflix-touched file (year range 2016 → 2016–2026).
- Invariant: wholly-new fork files carry Copyright 2026 Lusoris and Claude (Anthropic) under the same BSD-3-Clause-Plus-Patent license. Mixed files use a dual-copyright notice. An upstream commit that resets a Netflix file's year range (e.g. back to 2016–2020) must be partially rejected: keep the fork's 2016–2026.
- Re-test: grep that wholly-new fork files retain the Lusoris/Claude header. Running grep -L "Copyright 2026 Lusoris" core/src/sycl/*.cpp should print nothing, because -L lists only the files without a match.
0010 — .claude/ agent scaffolding + ADR tree + AGENTS.md / CLAUDE.md
- Workstream PRs: #14, #24, #37, plus continuous additions.
- Touches: .claude/, AGENTS.md, CLAUDE.md, docs/adr/, .github/PULL_REQUEST_TEMPLATE.md.
- Invariant: this whole tree is fork-local and has no upstream counterpart. Upstream additions to .github/ (issue templates, workflows) must merge cleanly with the fork's existing files rather than replacing them. ADR IDs ≤ 0099 are backfills; new decisions start at 0100 (ADR-0028 / ADR-0106).
- Re-test: visual review of .github/ and docs/adr/README.md after the merge.
Pre-ADR-0108 entries above are the result of a one-shot backfill sweep on 2026-04-18; subsequent fork-local PRs add their own entries inline.
0011 — Nightly bisect-model-quality + fixture cache
- Workstream PRs: closes #4; sticky tracker issue #40.
- Touches:
  - .github/workflows/nightly-bisect.yml
  - ai/scripts/build_bisect_cache.py
  - ai/testdata/bisect/{features.parquet, models/*.onnx, README.md}
  - scripts/ci/post-bisect-comment.py
  - docs/ai/bisect-model-quality.md
  - docs/adr/0109-nightly-bisect-model-quality.md
  - docs/research/0001-bisect-model-quality-cache.md
  - mkdocs.yml (nav)
- Invariant: the committed parquet + ONNX bytes under ai/testdata/bisect/ must regenerate byte-identically from ai/scripts/build_bisect_cache.py with seeds FEATURE_SEED=20260418 and MODEL_SEED=20260419. The CI --check step asserts this before every bisect run. Any upstream pull that bumps pandas / pyarrow / onnx enough to change the serialiser bytes will therefore fail the workflow until the cache is regenerated and committed.
- Re-test:
python ai/scripts/build_bisect_cache.py --check
vmaf-train bisect-model-quality \
ai/testdata/bisect/models/model_*.onnx \
--features ai/testdata/bisect/features.parquet \
--min-plcc 0.85 --input-name input
# Expected: "no regression in this range"; first_bad_index None.
Pure upstream code is not touched, so there is no Netflix-side conflict vector. Only fork-local files change; the risk is toolchain drift, not merge conflict.
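The --check idea is to rebuild the cache from fixed seeds and compare the bytes with what is committed. That idea is sketched below. Everything here is hypothetical: the toy generator stands in for the parquet/ONNX writers, and only the seed value comes from this entry.

```python
import hashlib
import random

FEATURE_SEED = 20260418

def build_features(seed: int, n: int = 64) -> bytes:
    """Toy deterministic 'cache' generator: same seed, same bytes."""
    rng = random.Random(seed)
    rows = [f"{i},{rng.random():.17g}\n" for i in range(n)]
    return "".join(rows).encode("utf-8")

def check(committed: bytes, seed: int) -> bool:
    """Regenerate from the seed and compare digests with the committed bytes."""
    fresh = build_features(seed)
    return hashlib.sha256(fresh).digest() == hashlib.sha256(committed).digest()
```

In the real pipeline the same comparison also fails when a serialiser upgrade changes the output bytes while the data stays the same. That is the toolchain drift the entry warns about.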
0012 — Upstream ADM port (Netflix 966be8d5)
- Workstream PRs: this PR; ports a single upstream commit.
- Touches:
  - core/src/feature/integer_adm.{c,h}
  - core/src/feature/x86/adm_avx2.{c,h}
  - core/src/feature/x86/adm_avx512.{c,h}
  - core/src/feature/alias.c
  - core/src/feature/barten_csf_tools.h (new upstream file)
- Invariant: the eight ADM files now mirror upstream's content byte-for-byte, modulo our clang-format-22 pass and the Netflix copyright-year bump on the new header. Future /sync-upstream runs can take new upstream ADM commits cleanly. Do not revert to a pre-966be8d5 ADM kernel without also reverting the call-site signatures in integer_compute_adm: upstream extended i4_adm_cm from 8 to 13 args.
- Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model version=vmaf_v0.6.1 -o /tmp/vmaf-port.json
grep '<metric name="vmaf"' /tmp/vmaf-port.json
# Expected: mean ≈ 76.66890 (golden 76.66890519623612, places=4 OK).
0013 — Upstream motion port (Netflix PR #1486 head 2aab9ef1)
- Workstream PRs: this PR; ports upstream PR #1486 (4 commits on top of the 966be8d5 ADM base, head 2aab9ef1). Sister to entry 0012.
- Touches:
  - core/src/feature/integer_motion.{c,h}
  - core/src/feature/motion_blend_tools.h (new upstream file)
  - core/src/feature/x86/motion_avx2.c
  - core/src/feature/x86/motion_avx512.c
  - core/src/feature/alias.c (additive: integer_motion3 row)
  - python/test/{quality_runner,vmafexec,feature_extractor,vmafexec_feature_extractor}_test.py (golden tolerance updates: places=4 → places=2 on motion-affected asserts; expected values unchanged)
- Invariant: the motion files mirror upstream byte-for-byte, modulo our clang-format-22 pass.
  - The alias.c row for integer_motion3 was inserted surgically, to avoid clobbering the AVX-512 ADM registration added by entry 0012.
  - The new motion3 metric appears in the default VMAF model output. It is not standalone-loadable via --feature integer_motion3 (sub-feature only).
  - The Netflix golden VMAF mean shifts from 76.668904824 to 76.667830213. That shift is well within the places=2 tolerance the upstream PR loosened the assertions to.
  - Do not restore places=4 on motion-touching assertions without also reverting the motion code.
- Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model version=vmaf_v0.6.1 -o /tmp/vmaf-motion-port.json
grep -E '<metric name="vmaf"|integer_motion3' /tmp/vmaf-motion-port.json
# Expected: vmaf mean ≈ 76.66783; integer_motion3 mean ≈ 3.98976.
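The places=4 → places=2 loosening is easiest to judge with unittest's own rule: assertAlmostEqual(first, second, places=p) passes when round(first - second, p) == 0. The sketch below applies that rule to the two golden means quoted in this entry. The test-case name is illustrative.

```python
import unittest

OLD_MEAN = 76.668904824  # golden VMAF mean before the motion port
NEW_MEAN = 76.667830213  # golden VMAF mean after the motion port

class GoldenTolerance(unittest.TestCase):
    def test_places_2_accepts_shift(self):
        self.assertAlmostEqual(NEW_MEAN, OLD_MEAN, places=2)

    def test_places_4_rejects_shift(self):
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(NEW_MEAN, OLD_MEAN, places=4)
```

The shift is about 1.07e-3. It rounds to zero at two places but not at four, which is why restoring places=4 fails unless the motion code is reverted too.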
0014 — Coverage gate overhaul + upstream python/test/ reformat
- Workstream PRs: this PR (coverage-gate overhaul + in-tree reformat of upstream-mirror Python tests).
- Touches:
  - .github/workflows/ci.yml (CPU + GPU coverage jobs):
    - -Dc_args=-fprofile-update=atomic / -Dcpp_args=-fprofile-update=atomic;
    - meson test --num-processes 1;
    - -Denable_dnn=enabled;
    - an ORT install step on the CPU coverage job;
    - lcov/geninfo replaced by gcovr with --json-summary / --xml / --txt output;
    - the artifact rename coverage-lcov-{cpu,gpu} → coverage-{cpu,gpu}.
  - scripts/ci/coverage-check.sh: rewritten to parse gcovr JSON via python3 -c (same CLI signature).
  - core/src/dnn/dnn_api.c + new core/src/dnn/dnn_attach_api.c: vmaf_use_tiny_model is carved out into its own TU. The unit-test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c. Without the split, they would get an undefined reference to vmaf_ctx_dnn_attach once enable_dnn=enabled activates the real bodies.
  - core/src/dnn/meson.build + core/src/meson.build: new dnn_libvmaf_only_sources list wired into libvmaf.so only.
  - python/test/{feature_extractor,quality_runner,vmafexec,vmafexec_feature_extractor}_test.py: mechanical Black + isort reformat. No assertion values changed; imports are regrouped and line wrapping is normalised.
- Invariant: coverage CI must keep all five pieces in lockstep.
  - (a) -fprofile-update=atomic closes the intra-process counter race on SIMD inner loops (vif_avx2.c:673, motion_avx2, etc.). That race otherwise produces negative counts, which make geninfo/gcovr abort.
  - (b) --num-processes 1 closes the inter-process race at process exit. Without it, multiple parallel test binaries merge their counters into the same .gcda files for the shared libvmaf.so. Per-thread atomicity does not cover this.
  - (c) gcovr deduplicates .gcno files belonging to the same source compiled into multiple targets. Without dedup, lcov sums hits across compilation units and yields impossible >100% values. dnn_api.c at 1176% was the smoking gun on the first attempt, which had only (a) + (b).
  - (d) The ORT install + enable_dnn=enabled in the coverage job is what makes core/src/dnn/*.c measurable in the first place. Without ORT, the DNN tree compiles in stub branches and the 85% per-critical-file gate is meaningless.
  - (e) vmaf_use_tiny_model lives in dnn_attach_api.c and is added to libvmaf.so only, via dnn_libvmaf_only_sources. Moving it back into dnn_api.c reintroduces the vmaf_ctx_dnn_attach undefined-reference link error in test_feature_extractor / test_lpips whenever enable_dnn=enabled. Those test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c.
- Lint scope: upstream-mirror Python tests are linted to the same standard as fork-added code.
  - We accept that /sync-upstream and /port-upstream-commit will re-trigger Black/isort failures whenever upstream rewrites these files. The fix is another in-tree reformat pass, never an exclusion.
  - The fork's pyproject.toml and .pre-commit-config.yaml keep python/test/resource/ (binary fixtures only) excluded; python/test/*.py is in scope.
  - See ADR-0110 (race fixes, superseded) and ADR-0111 (gcovr + ORT layer).
- Re-test:
# Reproduce coverage path locally (requires gcc + python3-pip):
pip install --user 'gcovr>=8.0'
cd libvmaf
meson setup build-cov-test --buildtype=debug -Db_coverage=true \
-Denable_avx512=true -Denable_float=true -Denable_dnn=disabled \
-Dc_args=-fprofile-update=atomic -Dcpp_args=-fprofile-update=atomic
ninja -C build-cov-test
meson test -C build-cov-test --print-errorlogs --num-processes 1
~/.local/bin/gcovr --root .. \
--filter 'src/.*' \
--exclude '.*/test/.*' --exclude '.*/tests/.*' \
--exclude '.*/subprojects/.*' \
--gcov-ignore-parse-errors=negative_hits.warn \
--gcov-ignore-parse-errors=suspicious_hits.warn \
--print-summary --txt build-cov-test/coverage.txt \
--json-summary build-cov-test/coverage.json \
build-cov-test
grep -E 'dnn_api|model_loader' build-cov-test/coverage.txt
# Expected: gcovr completes without "Unexpected negative count" AND no
# per-file percentages exceed 100% (drop --num-processes 1 to reproduce
# the multi-process .gcda merge race; switch back to lcov to reproduce
# the dnn_api.c — 1176% over-count from compilation-unit summation).
# Lint smoke test for upstream-mirror tree:
pre-commit run --files python/test/quality_runner_test.py
# Expected: Black/isort/Ruff all PASS — files are reformatted in-tree
# to fork style and stay clean until the next upstream sync.
0015 — Tox doctest collection skips vmaf/resource/
- Workstream PRs: this PR (fix(ci): skip pytest doctest collection of vmaf/resource/ data files). The problem surfaced once ADR-0115 consolidated CI triggers to master and tox actually started running on PRs.
- Touches: python/tox.ini (a single-line --ignore=vmaf/resource added to the pytest invocation, plus an explanatory comment block). Pure fork-local; no upstream Python file changes.
- Invariant: pytest --doctest-modules must not attempt to import files under python/vmaf/resource/. Those are parameter / dataset / example-config .py files. Several have dots in their stems (e.g. vmaf_v7.2_bootstrap.py), which makes them unimportable as Python modules. None carry doctests, so the ignore is a correctness fix rather than a workaround. Do not drop the --ignore=vmaf/resource flag without first verifying that every file under that directory has been renamed to a dot-free stem and is importable.
- Re-test:
cd python && tox -e py311 -- --collect-only --doctest-modules \
--ignore=vmaf/resource 2>&1 | grep -c "ERROR collecting vmaf/resource"
# Expected: 0 (was 5 before the fix).
Pure upstream code is not touched, so there is no Netflix-side conflict vector. The only risk is that upstream renames or removes the files under python/vmaf/resource/ and the directory disappears; in that case the --ignore becomes a harmless no-op.
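It is quick to confirm why a dotted stem cannot be imported. A top-level module name must be a valid identifier, and the import system reads each dot as a package separator. The check below is a sketch: the helper name is illustrative, it does not reject Python keywords, and vmaf_v7.2_bootstrap.py is the example file named in this entry.

```python
from pathlib import Path

def importable_stem(filename: str) -> bool:
    """True if the file's stem is a valid identifier (keywords not checked)."""
    return Path(filename).stem.isidentifier()
```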
0016 — SYCL -fsycl link-arg gated on icpx CXX
- Workstream PRs: this PR (fix(libvmaf): gate -fsycl link arg on icpx CXX, allow gcc/clang host linker). The problem surfaced once ADR-0115's CI consolidation added an Ubuntu SYCL job to PR-time CI. That job uses CXX=g++ (host linker) with a sidecar icpx for SYCL .cpp compilation.
- Touches: core/src/meson.build (the vmaf_link_args block immediately after the is_sycl_enabled flag handling, currently ~lines 696-712). Pure fork-local; no upstream Meson file changes expected.
- Invariant: -fsycl is appended to vmaf_link_args only when meson.get_compiler('cpp').get_id() == 'intel-llvm' (icpx). Rationale:
  - The documented project mode (see the comment near the is_sycl_enabled block at the top of src/meson.build) compiles SYCL .cpp files via custom_target with icpx. The project's CXX driver, however, may be gcc / clang / msvc.
  - In that mode the SPIR-V device code is already embedded in the icpx-compiled .o files at compile time. The runtime libraries declared as link dependencies (libsycl + libsvml + libirc + libze_loader) resolve every symbol.
  - Passing -fsycl to a non-icpx linker is a hard error (g++: error: unrecognized command-line option '-fsycl').

  Do not remove the cpp.get_id() == 'intel-llvm' guard without first verifying that every CI matrix leg uses icpx as the project CXX.
- Re-test:
meson setup build -Denable_sycl=true \
-Dcpp_link_args=-Wl,--no-undefined
ninja -C build src/libvmaf.so.3
# Expected: link succeeds; no `-fsycl` errors with gcc/clang host CXX.
Pure fork-local guard; no Netflix-side conflict vector.
0017 — CLI precision default %.6f (Netflix-compat) + frame-skip unref
- Workstream PRs: this PR (fix(cli): revert precision default to %.6f and unref skipped frames). Reverts the default flipped by commit c989fbd9 (ADR-0006), per ADR-0119. A companion fix in core/tools/vmaf.c resolves the picture-pool exhaustion in the --frame_skip_ref / --frame_skip_dist loops, which surfaced once the always-on picture pool (ADR-0104) made unref'ing skipped pictures mandatory.
- Touches:
  - core/tools/cli_parse.c (VMAF_DEFAULT_PRECISION_FMT + VMAF_LOSSLESS_PRECISION_FMT macros, resolve_precision_fmt() body, --help text)
  - core/tools/cli_parse.h (field comments only; struct shape unchanged)
  - core/src/output.c (DEFAULT_SCORE_FORMAT macro)
  - core/tools/vmaf.c (skip loop bodies at the c.frame_skip_ref / c.frame_skip_dist for-loops)
  - python/vmaf/core/result.py (per-frame and aggregate :.6f formatters)
  - python/test/command_line_test.py is unmodified: Netflix golden assertions stay frozen per CLAUDE.md §8; the binary's output format adapts to them, not the other way around.
- Invariant: the vmaf CLI default score-output format is %.6f (matches upstream Netflix byte-for-byte). --precision=max|full selects %.17g (IEEE-754 round-trip lossless). --precision=legacy is a synonym for the default. The library default for vmaf_write_output_with_format(..., score_format=NULL) matches. Skipped frames in the --frame_skip_ref / --frame_skip_dist pre-loops are vmaf_picture_unref'd immediately after fetch, so the preallocated picture pool is not exhausted before the main scoring loop runs. Do not flip the macros back to %.17g or remove the unrefs without a superseding ADR; both are golden-gate-load-bearing.
- Re-test:
ninja -C core/build
python -m pytest -v \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping_unequal
# Expected: all three PASS in <1 s combined.
Pure fork-local; no Netflix-side conflict vector. If upstream ever changes the default format string, treat their value as the new baseline and reconfirm the golden assertions before adopting.
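The precision mapping in the invariant can be sketched as follows. This is an illustration, not the code in cli_parse.c. The function resolve_precision_fmt_sketch, its NULL return for unrecognised values, and the two local macro names are all assumptions. The round_trips helper shows why %.17g is labelled lossless: printing a double with 17 significant digits and parsing it back with strtod recovers the identical value, which %.6f generally does not.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the macro values described above (names assumed). */
#define SKETCH_DEFAULT_FMT  "%.6f"   /* Netflix-compatible default */
#define SKETCH_LOSSLESS_FMT "%.17g"  /* IEEE-754 round-trip safe */

/* Maps a --precision argument to a printf format. NULL (flag absent) and
 * "legacy" give the default; "max" and "full" give the lossless format.
 * Any other value returns NULL in this sketch. */
static const char *resolve_precision_fmt_sketch(const char *arg)
{
    if (arg == NULL || strcmp(arg, "legacy") == 0)
        return SKETCH_DEFAULT_FMT;
    if (strcmp(arg, "max") == 0 || strcmp(arg, "full") == 0)
        return SKETCH_LOSSLESS_FMT;
    return NULL;
}

/* Returns 1 if formatting `v` with `fmt` and parsing the text back with
 * strtod reproduces exactly the same double, 0 otherwise. */
static int round_trips(double v, const char *fmt)
{
    char buf[64];
    assert(fmt != NULL);
    snprintf(buf, sizeof buf, fmt, v);
    return strtod(buf, NULL) == v;
}
```

For example, 0.1 + 0.2 prints as 0.300000 under the default but as 0.30000000000000004 under %.17g. Only the latter parses back to the same double, which is why the lossless mode exists even though the default stays byte-compatible with Netflix.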
0018 — FFmpeg patches ship as ordered series.txt
- Workstream PRs: this PR (fix(ci): drop dead sycl trigger + consolidate windows.yml into libvmaf.yml (ADR-0115)). Surfaced once ADR-0115's consolidation routed the docker / FFmpeg-SYCL jobs through the master-targeting CI gate for the first time on this branch. The standalone 0003-…sycl… apply broke because it referenced struct fields added by 0001-…tiny-model…, the Dockerfile only COPY'd 0003, and ffmpeg.yml referenced a stale ../patches/ path.
- Touches: Dockerfile (lines ~86-95, the FFmpeg patch-apply block), .github/workflows/ffmpeg.yml (the Build FFmpeg with SYCL patch series step), ffmpeg-patches/000{1,2,3}-*.patch (regenerated via a real git format-patch -3 so they carry valid index <sha>..<sha> <mode> lines and committable SHAs). Pure fork-local; no upstream FFmpeg or Netflix file changes.
- Invariant: both the Dockerfile and ffmpeg.yml walk ffmpeg-patches/series.txt line by line and apply each patch via git apply with a patch -p1 fallback. Do not ship a new patch without appending it to series.txt, and do not reorder existing entries: patch 0003 references LIBVMAFContext fields added by patch 0001, so any out-of-order apply breaks the build at hunk 2 of vf_libvmaf.c.
- Related fixes bundled in the same PR:
  - --enable-libvmaf-sycl is not a valid FFmpeg configure option. Patch 0003 uses check_pkg_config libvmaf_sycl … auto-detection (matching how libvmaf_cuda is wired) and never registers the switch. Both the Dockerfile and ffmpeg.yml used to pass the flag, and configure rejected it with Unknown option "--enable-libvmaf-sycl". SYCL support is now controlled solely by -Denable_sycl=true at libvmaf build time; FFmpeg picks it up automatically when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
  - The Dockerfile now carries two nvcc-flag ARGs. NVCC_FLAGS (libvmaf) keeps four -gencode lines plus the lambda/constexpr extension flags (--extended-lambda / --expt-relaxed-constexpr / --expt-extended-lambda) needed for Thrust/CUB host+device code. FFMPEG_NVCC_FLAGS (FFmpeg) carries a single -gencode arch=compute_75,code=sm_75 -O2. FFmpeg's check_nvcc runs nvcc -ptx, which fails with nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures on multi-arch input, and --extended-lambda requires host+device compilation. compute_75 PTX is forward-compatible with all newer GPUs via driver JIT.
  - --enable-libnpp is no longer passed to FFmpeg's configure. FFmpeg n8.1's libnpp probe carries an explicit die "ERROR: libnpp support is deprecated, version 13.0 and up are not supported" (configure:7335-7336) that fires on the base image's CUDA 13.2 libnpp. No VMAF workflow uses scale_npp / transpose_npp / sharpen_npp; cuvid + nvdec + nvenc + libvmaf-cuda is the actual GPU path. Revisit once we move to an FFmpeg release that supports CUDA 13 libnpp upstream.
  - Patch 0002 (add-vmaf_pre-filter) gained a missing #include "libavutil/imgutils.h" for av_image_copy_plane(). FFmpeg's libavfilter Makefile builds with -Werror=implicit-function-declaration, so this fired during the actual compile (not configure). It was caught by a local docker build rather than by waiting for GitHub Actions, which is a much faster iteration loop.
- Re-test:
cd /tmp && rm -rf ffmpeg-test && \
git clone -q --depth 1 -b n8.1 \
https://git.ffmpeg.org/ffmpeg.git ffmpeg-test && \
cd ffmpeg-test && \
while IFS= read -r line; do \
case "$line" in ''|\#*) continue ;; esac; \
git apply "/path/to/vmaf/ffmpeg-patches/$line" \
|| patch -p1 < "/path/to/vmaf/ffmpeg-patches/$line"; \
done < /path/to/vmaf/ffmpeg-patches/series.txt
# Expected: all three patches apply with no rejects; the resulting
# tree compiles with --enable-libvmaf. SYCL is auto-detected via
# check_pkg_config (patch 0003), so no explicit configure flag is
# required when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
Pure fork-local series; no Netflix-side conflict vector. See ADR-0118.
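The ordering invariant can also be linted before a series change is pushed. The check below is a hypothetical sketch; no such lint is described above. It assumes that each entry's leading number tracks dependency order, as with 0001 before 0003. It uses the same skip rule as the shell loop (blank lines and lines starting with #) and rejects any entry whose numeric prefix does not increase.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lint for a series.txt-style buffer. Entries are the lines the
 * shell loop would apply: blank lines and lines starting with '#' are skipped.
 * Returns 1 if every entry begins with a decimal number strictly greater than
 * the previous entry's, 0 otherwise (including entries with no numeric prefix). */
static int series_is_ordered(const char *text)
{
    long prev = -1;
    const char *p;

    assert(text != NULL);
    for (p = text; *p != '\0';) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len > 0 && p[0] != '#') {
            if (!isdigit((unsigned char)p[0]))
                return 0;
            /* strtol stops at the first non-digit, e.g. the '-' in "0003-...". */
            long num = strtol(p, NULL, 10);
            if (num <= prev)
                return 0;
            prev = num;
        }
        if (eol == NULL)
            break;
        p = eol + 1;
    }
    return 1;
}
```

A CI step could pass it the contents of ffmpeg-patches/series.txt. A result of 0 means an entry was inserted out of order or is not a numbered patch. The lint only inspects the listing, not whether each patch applies, so the git apply replay above remains the real test.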
0019 — Coverage Gate annotations: upload-artifact v7 + gcovr filter
- Workstream PRs: this PR.
- Touches: .github/workflows/ci.yml (CPU + GPU coverage steps: gcovr stderr piped through grep -vE 'Ignoring (suspicious|negative) hits' ... || true), .github/workflows/{ci,lint,nightly,nightly-bisect,supply-chain,libvmaf}.yml (actions/upload-artifact@v5|@v6 → @v7, actions/download-artifact@v5 → @v7 in supply-chain.yml). Note: windows.yml was consolidated into libvmaf.yml by ADR-0115 / PR #50, so the Windows-side bump now lives in libvmaf.yml's build (MINGW64, …) job.
- Invariant: the Coverage Gate Annotations panel must finish empty on a clean run. Two changes work together to achieve that: (a) @v7 for the upload / download artifact actions silences GitHub's Node-20 deprecation banner ahead of the 2026-06-02 forced-Node-24 cutoff; (b) the gcovr stderr filter swallows the Ignoring (suspicious|negative) hits warnings that gcovr 8 emits for the legitimately large hit counts in tight ANSNR / VIF / motion inner loops (e.g. ansnr_tools.c:207 at ~4.93 G hits across an HD multi-frame coverage suite; the counts are real, not a gcov bug). The regex is narrow: it matches only gcovr's Ignoring suspicious hits / Ignoring negative hits phrases, so any other gcovr warning still surfaces. Upstream (Netflix/vmaf) does not maintain these CI files; rebase impact is limited to the unlikely case that an upstream sync touches the shared .github/workflows/ tree, which it currently does not. See ADR-0117.
- Re-test:
# Verify gcovr filter locally (after a coverage build per entry 0014):
~/.local/bin/gcovr --root .. \
--filter 'src/.*' \
--exclude '.*/test/.*' --exclude '.*/tests/.*' \
--exclude '.*/subprojects/.*' \
--gcov-ignore-parse-errors=negative_hits.warn \
--gcov-ignore-parse-errors=suspicious_hits.warn \
--print-summary --txt build-cov-test/coverage.txt \
build-cov-test \
2> >(grep -vE 'Ignoring (suspicious|negative) hits' >&2 || true)
# Expected: stderr contains the gcovr summary block but NO
# "Ignoring (suspicious|negative) hits" lines. coverage.txt unchanged.
# Verify all upload/download-artifact instances are on @v7:
grep -rE 'actions/(upload|download)-artifact@v[0-6]' .github/workflows/
# Expected: empty output.
0020 — CI workflow file + display-name renames (Title Case sweep)
- Workstream PRs: this PR; renames all six core .github/workflows/*.yml files to purpose-descriptive kebab-case and normalises every workflow name: and job name: to Title Case. See ADR-0116.
- Touches: .github/workflows/{ci,lint,security,libvmaf,ffmpeg,docker}.yml (renamed via git mv to tests-and-quality-gates.yml, lint-and-format.yml, security-scans.yml, libvmaf-build-matrix.yml, ffmpeg-integration.yml, docker-image.yml), README.md (5 badge URLs + labels), docs/principles.md (line 5 workflow-tuple update), .claude/skills/add-gpu-backend/SKILL.md + scaffold.sh (filename refs), docs/adr/0116-*.md (new), docs/adr/README.md (index row), CHANGELOG.md.
- Invariant: workflow files are purpose-named; their name: fields are Title Case sentences with em-dash axis tags; job-level name: strings are Title Case sentences (Build — / Pre-Commit / Coverage Gate / etc.). Required-status-check contexts in master branch protection are bound to job-level names, so when renaming any job, re-pin via gh api --method PUT repos/VMAFx/vmafx/branches/master/protection. The 19 required gates' semantics are unchanged from ADR-0037; only their display strings move.
- Re-test:
# Validate every workflow file parses and lists the expected job names.
cd .github/workflows
for f in tests-and-quality-gates.yml lint-and-format.yml security-scans.yml \
libvmaf-build-matrix.yml ffmpeg-integration.yml docker-image.yml; do
yq '.name, .jobs.[].name' "$f" || echo "PARSE FAIL: $f"
done
# Expected: each workflow prints its Title Case workflow name + job names;
# no PARSE FAIL lines.
0021 — DNN-enabled CI matrix legs (gcc + clang + macOS)
- Workstream PRs: this PR; adds three new entries to the libvmaf-build matrix in .github/workflows/libvmaf-build-matrix.yml covering -Denable_dnn=enabled across Ubuntu/gcc, Ubuntu/clang, and macOS/clang. See ADR-0120.
- Touches: .github/workflows/libvmaf-build-matrix.yml (3 new matrix entries + ORT install steps + a dedicated dnn-suite test step), docs/adr/0120-ai-enabled-ci-matrix-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry).
- Invariant: the DNN matrix legs install ONNX Runtime from the same source as the dedicated Tiny AI job (tests-and-quality-gates.yml). Linux uses the Microsoft release tarball at the version pinned by ORT_VERSION; macOS uses Homebrew, which is not version-pinned. When the Tiny AI job's pin changes, the ORT_VERSION env in the matrix legs' Install ONNX Runtime (linux, DNN leg) step must change to match; otherwise compiler/portability coverage drifts away from the gating leg's actual ABI.
- Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.libvmaf-build.strategy.matrix.include[] | select(.dnn==true) | .name' \
.github/workflows/libvmaf-build-matrix.yml
# Expected output (3 lines):
# Build — Ubuntu gcc (CPU) + DNN
# Build — Ubuntu clang (CPU) + DNN
# Build — macOS clang (CPU) + DNN
# Local DNN build sanity (matches what each leg will run):
meson setup core/build core --buildtype release \
--prefix "$PWD/install" -Denable_float=true -Denable_dnn=enabled
ninja -vC core/build install
meson test -C core/build --suite=dnn --print-errorlogs
- Branch protection: the two Linux DNN legs are pinned as required status checks on master immediately after this PR's merge (19 → 21 contexts). The macOS leg stays informational (experimental: true) because Homebrew ORT floats. Re-pin command:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
--input /tmp/protection-update.json
0022 — Windows GPU build-only matrix legs (MSVC + CUDA, MSVC + oneAPI SYCL)
- Workstream PRs: this PR; adds a new top-level windows-gpu-build job to .github/workflows/libvmaf-build-matrix.yml with two matrix entries (CUDA, SYCL). See ADR-0121.
- Touches:
  - .github/workflows/libvmaf-build-matrix.yml (new windows-gpu-build job)
  - docs/adr/0121-windows-gpu-build-only-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry)
  - core/src/compat/win32/pthread.h (new; Win32 pthread shim for MSVC, mirrors the compat/gcc/stdatomic.h pattern)
  - core/src/feature/integer_adm.h (UPSTREAM; converted the dwt_7_9_YCbCr_threshold[3] designated initializer to positional form so MSVC/nvcc-on-Windows accepts the C++ parse; semantically identical, no behavioural change)
  - core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM; added an #if defined(__cplusplus) && defined(_MSC_VER) branch around #include <stdatomic.h> so MSVC C++ TUs pull atomic_int via using std::atomic_int;; POSIX paths unchanged)
  - core/src/sycl/d3d11_import.cpp (fix the non-existent <libvmaf/log.h> → "log.h")
  - core/src/sycl/dmabuf_import.cpp (move <unistd.h> inside the #if HAVE_SYCL_DMABUF guard for non-VA-API hosts)
  - core/src/sycl/common.cpp (replace POSIX clock_gettime(CLOCK_MONOTONIC) with the portable std::chrono::steady_clock)
  - core/src/feature/x86/motion_avx2.c (UPSTREAM; replace GCC vector-extension __m256i[N] indexing at line 529 with _mm256_extract_epi64; bit-exact)
  - core/src/feature/x86/adm_avx2.c (UPSTREAM; replace 6 (__m256i)(_mm256_cmp_ps(...)) casts with _mm256_castps_si256(...) and 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact)
  - core/src/feature/x86/adm_avx512.c (UPSTREAM; replace 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact)
  - core/src/log.c (UPSTREAM; gate <unistd.h> behind !_WIN32, include <io.h> and redirect isatty/fileno to _isatty/_fileno for MSVC)
  - core/src/feature/integer_vif.c (UPSTREAM; switch the aligned_malloc cursor from void * to uint8_t * with explicit typed-pointer casts so MSVC accepts the byte-wise pointer arithmetic)
  - core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM; drop the unused <unistd.h> include)
  - core/src/dnn/model_loader.c (fork-added; Windows fallback definitions for the POSIX S_ISDIR/S_ISREG path-classification macros)
  - .github/workflows/lint-and-format.yml (fork-added; set lfs: true on the pre-commit job's checkout so LFS-stored ONNX blobs resolve and don't appear as phantom pre-commit-induced diffs)
  - core/src/feature/x86/motion_avx512.c (UPSTREAM; replace 1 __m128i[N] reduction with _mm_extract_epi64; bit-exact)
  - core/src/feature/x86/{vif_statistic_avx2,ansnr_avx2,ansnr_avx512,float_adm_avx2,float_adm_avx512,float_psnr_avx2,float_psnr_avx512,ssim_avx2,ssim_avx512}.c (UPSTREAM; convert 17 sites of trailing __attribute__((aligned(N))) to leading C11 _Alignas(N); same alignment, MSVC-portable)
  - core/src/feature/mkdirp.c and core/src/feature/mkdirp.h (UPSTREAM third-party MIT-licensed micro-library; gate <unistd.h> to non-Windows, add <direct.h> + _mkdir for Windows, add a mode_t typedef for MSVC)
  - core/meson.build (new pthread_dependency, gated on cc.check_header('pthread.h') failing)
  - core/src/meson.build and core/test/meson.build (thread pthread_dependency into every target compiling pthread-using TUs)
- Invariant: Windows GPU legs are pinned to the same toolchain versions as the corresponding Linux GPU legs (CUDA 13.0.0, oneAPI BaseKit 2025.3.0.372), so a Linux-vs-Windows divergence implies an MSVC ABI issue rather than a tooling-version delta. When either Linux GPU leg bumps its toolchain, the Windows leg must move in lockstep. The Intel installer URL on Windows hard-codes the per-release directory id and the version string, so the bump is a two-line edit in the SYCL Install Intel oneAPI (windows) step (the WINDOWS_BASEKIT_URL env var). Both legs additionally inject /experimental:c11atomics into CFLAGS/CXXFLAGS because libvmaf uses C11 atomics that MSVC's <stdatomic.h> rejects without that opt-in flag; once MSVC supports C11 atomics without the opt-in, the flag becomes unnecessary and can be dropped.
  Two Windows-only dependency steps round out the parity. The CUDA leg's Jimver/cuda-toolkit sub-package list includes both crt (CUDA Runtime Library compile-time headers, ships crt/host_config.h; cuda_cccl is not a valid Windows sub-package name and the installer rejects it) and nvvm (ships nvvm/bin/cicc.exe + nvvm/libdevice/libdevice.*.bc; without it, nvcc's .cu → PTX stage fails with The system cannot find the path specified.). On Linux, apt pulls NVVM in transitively with cuda-nvcc-XY; Windows requires it explicitly. The SYCL leg builds the Level Zero loader from source (oneapi-src/level-zero v1.18.5 → cmake --build … --target install) because Windows oneAPI BaseKit ships the SYCL runtime but not ze_loader.lib, and libvmaf's meson cc.find_library('ze_loader') needs both the header and the import library. When the Linux apt level-zero-dev version moves, bump the L0 git tag to match. core/src/meson.build guards the explicit svml/irc cc.find_library calls behind host_machine.system() != 'windows'. Those calls exist for the gcc/g++ + icpx Linux flow where the host linker is non-Intel; on Windows the host compiler is icx-cl itself and auto-injects the Intel runtime.
  Round-10 surfaced an additional Windows-only gap: ~14 libvmaf TUs #include <pthread.h> unconditionally, but MSVC and clang-cl ship no pthread (MinGW does, via winpthreads). The fork now ships a header-only Win32 shim at core/src/compat/win32/pthread.h that maps the in-use pthread subset (mutex / cond / thread create+join+detach) onto SRWLOCK + CONDITION_VARIABLE + _beginthreadex.
  The shim is wired in via pthread_dependency in core/meson.build, declared only when cc.check_header('pthread.h') fails, so MinGW and POSIX paths stay untouched. When upstream Netflix/vmaf adds new pthread surface (e.g. pthread_rwlock_*), extend compat/win32/pthread.h to cover it. Both the nvcc fatbin custom_targets (CUDA) and the icpx custom_targets (SYCL common.cpp / picture_sycl.cpp / dmabuf_import.cpp, plus the SYCL feature kernels) bypass meson's dependencies: plumbing and hand-roll their own -I lists, so the shim path must be threaded into both cuda_extra_includes and sycl_inc_flags explicitly on Windows. icpx-cl on Windows additionally rejects -fPIC (unsupported option for target 'x86_64-pc-windows-msvc'), so sycl_common_args and sycl_feature_args route their -fPIC token through sycl_pic_arg = host_machine.system() != 'windows' ? ['-fPIC'] : []. PIC is the default for Windows DLLs, so dropping the flag is the correct fix rather than a workaround.
  Round-14 surfaced a third Windows-only blocker: core/src/feature/integer_adm.h (an upstream Netflix file, last touched by upstream port d06dd6cf) initialises dwt_7_9_YCbCr_threshold[3] with C99 designated initializers ({.a = ..., .k = ..., .f0 = ..., .g = {...}}). The header is included from both integer_adm.c (C TU) and cuda/integer_adm/*.cu (C++ TU via nvcc); MSVC's C++ frontend (and nvcc's cudafe++ on Windows) rejects C99 designated initializers without /std:c++20. Converted to positional initialization in the same struct-member order (a / k / f0 / g[4]). The conversion is provably semantically identical and works in every C/C++ standard, so it costs nothing on the upstream-merge side beyond a trivial conflict marker if upstream Netflix later edits the same lines. Restore the designated form post-merge only if upstream has it and your toolchain matrix doesn't include MSVC.
  Round-17 surfaced four more Windows/MSVC-only SYCL blockers; the first one touches two upstream-shared headers.
  (a) core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM) unconditionally #include <stdatomic.h> and use the atomic_int typedef in struct definitions. MSVC's <stdatomic.h> (added in 19.34) only declares the C11 symbols in the global namespace under C; in C++ compilation (icpx-cl drives the SYCL TUs as C++) MSVC surfaces them only inside namespace std::. gcc/clang expose both via a GNU extension, so the upstream code works on every other platform. The fork now wraps both headers' #include <stdatomic.h> in #if defined(__cplusplus) && defined(_MSC_VER) → #include <atomic> + using std::atomic_int;, falling through to the original <stdatomic.h> line on every other configuration. ABI is unchanged: atomic_int resolves to the same underlying type. If upstream Netflix adds further C11 atomic typedefs in these headers (e.g. atomic_uint, atomic_size_t), extend the using std:: lines to cover them.
  (b) core/src/sycl/d3d11_import.cpp (fork-added) used <libvmaf/log.h>, which doesn't exist: log.h lives at core/src/log.h and is internal. Switched to "log.h"; the icpx invocation already supplies the src-relative -I.
  (c) core/src/sycl/dmabuf_import.cpp (fork-added) included <unistd.h> at file scope, but POSIX close() is only used inside the #if HAVE_SYCL_DMABUF VA-API block. Moved the <unistd.h> include inside that guard so non-DMA-BUF builds (Windows MSVC, macOS) compile cleanly.
  (d) core/src/sycl/common.cpp (fork-added) called clock_gettime(CLOCK_MONOTONIC), which the MSVC runtime doesn't provide. Replaced with std::chrono::steady_clock (guaranteed monotonic by the C++ standard, portable on every supported host). All four fixes preserve POSIX/Linux behaviour bit-identically and only change the Windows MSVC build path.
  Round-18 surfaced another Windows blocker on the CUDA leg's CPU SIMD compile path: core/src/feature/x86/motion_avx2.c:529 (UPSTREAM, ported in commit 9371a0aa from Netflix PR #1486) computed final_accum[0] + final_accum[1] + final_accum[2] + final_accum[3] to extract the four int64 lanes from an __m256i.
  gcc/clang allow this via the GNU vector-extension treatment of __m256i (it carries __attribute__((vector_size(32)))); MSVC rejects it with C2088: built-in operator '[' cannot be applied to an operand of type '__m256i'. Replaced with _mm256_extract_epi64(final_accum, N) for N ∈ {0..3}, summed, which gives a bit-exact lane sum on every compiler. Restore the index form post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC.
  Round-19 surfaced the same MSVC pattern at 19 more call sites across the AVX2/AVX-512 ADM and motion files, plus six GCC-style vector casts. core/src/feature/x86/adm_avx2.c (UPSTREAM): 6 lines (915-920) used (__m256i)(_mm256_cmp_ps(...)) C-style casts that gcc/clang accept via the GNU vector extension; these now use the dedicated _mm256_castps_si256(...) bit-cast intrinsic. Its 12 lane-extract sites (r2_h[0] + r2_h[1], etc. at lines 2420 / 2425 / 2430 / 2893 / 2897 / 2901 / 4079 / 4084 / 4089 / 4627 / 4631 / 4635) were replaced with a summed pair of _mm_extract_epi64(r2_X, N). core/src/feature/x86/adm_avx512.c (UPSTREAM): 6 sister lane-extract sites (lines 4470 / 4477 / 4484 / 4625 / 4631 / 4637) got the same fix. The AVX-512 paths reduce a __m512i down to __m128i first (via _mm512_extracti64x4_epi64 → _mm256_extracti64x2_epi64) before the index, so only the final __m128i[N] step needed changing. core/src/feature/x86/motion_avx512.c (UPSTREAM, ported in 9371a0aa from PR #1486): one final r2[0] + r2[1] reduction (line 448) got the same fix. All 19 lane-extract fixes plus the 6 cast fixes are bit-exact rewrites that only change the source-level syntax to an MSVC-portable form. Restore the original forms post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Additionally, core/src/sycl/d3d11_import.cpp (fork-added) switched from the C-style COBJMACROS helpers (ID3D11Device_CreateTexture2D, …_Release, etc.) to C++ method-call syntax (device->CreateTexture2D, tex->Release). d3d11.h gates COBJMACROS behind !defined(__cplusplus), so the C-style helpers aren't visible in this .cpp TU. The two forms are ABI-equivalent (both dispatch through the COM vtable); the choice is purely lexical, and POSIX builds aren't affected (the whole TU is #ifdef _WIN32).
  Round-20 surfaced two more Windows-only blockers.
  (a) 17 sites across the x86 SIMD layer used GCC's float tmp[N] __attribute__((aligned(M))); form to align scratch buffers for _mm{256,512}_store_ps. MSVC rejects the trailing-attribute syntax with C2146: syntax error: missing ';' before identifier '__attribute__'. Replaced with the C11-standard _Alignas(M) float tmp[N]; (alignment specifier before the type), which works in gcc, clang and MSVC with /std:c11. Files touched (all UPSTREAM): vif_statistic_avx2.c (×2), ansnr_avx2.c (×2), ansnr_avx512.c (×2), float_adm_avx2.c (×2), float_adm_avx512.c (×2), float_psnr_avx2.c (×1), float_psnr_avx512.c (×1), ssim_avx2.c (×4), ssim_avx512.c (×4). The pre-existing vif_avx2.c / vif_avx512.c already define a portable ALIGNED(x) macro at file scope and position the attribute before the type, so they compile cleanly under MSVC and were not touched.
  (b) core/src/feature/mkdirp.c (UPSTREAM, third-party MIT-licensed copy of Stephen Mathieson's micro-library) included <unistd.h> unconditionally but never used POSIX unistd symbols (only mkdir via <sys/stat.h> / <direct.h>). Gated <unistd.h> to non-Windows and added <direct.h> for Windows; switched mkdir(pathname) → _mkdir(pathname) (the non-deprecated MSVC name). core/src/feature/mkdirp.h added a mode_t typedef under MSVC, since neither <sys/types.h> nor <sys/stat.h> declares it on Windows; mode is ignored on the Windows path anyway.
  Round-21 surfaced several more blockers (starting with six sites the round-19 __m128i[N] sweep missed) plus a pre-commit workflow checkout gap.
  (a) core/src/feature/x86/adm_avx512.c (UPSTREAM) had six further r2_X[0] + r2_X[1] reductions at lines 2128 / 2135 / 2142 / 2589 / 2595 / 2601 that reduce a __m512i accumulator down to __m128i before the lane index. Replaced with the same summed _mm_extract_epi64(r2_X, N) pair used in round 19: bit-exact and MSVC-portable.
  (b) core/src/log.c (UPSTREAM) included <unistd.h> unconditionally to pick up POSIX isatty/fileno. On MSVC both live in <io.h> as _isatty/_fileno; gated the include and macro-redirected the names so the one call site at line 34 compiles on both sides without touching the POSIX path.
  (c) .github/workflows/lint-and-format.yml (fork-added) checked out without lfs: true, so the model/tiny/*.onnx files landed as LFS pointer stubs. pre-commit's "changes made by hooks" reporter then diffed the stubs against HEAD's real blobs and failed the job even though no hook touched them. Added lfs: true to the pre-commit job's checkout.
  (d) core/src/meson.build: the cuda_common_vmaf_lib static library had no dependencies: list, so the Win32 pthread shim (wired in via pthread_dependency in core/meson.build) wasn't on its include path; cuda/common.h does #include <pthread.h> unconditionally, and MSVC failed with C1083. Added dependencies : [pthread_dependency], a no-op on POSIX (empty list) that routes the shim path in on Windows.
  (e) core/src/feature/integer_vif.c (UPSTREAM) walked one big aligned_malloc result as void *data and did data += pad_size / data += h * stride_16 etc. to carve the buffer into typed sub-pointers. gcc/clang accept pointer arithmetic on void * as a GNU extension (treating sizeof(void) == 1); MSVC rejects it with C2036: 'void *': unknown size. Replaced the cursor type with uint8_t * and added explicit casts at assignment sites that take a typed pointer (uint16_t *mu1, uint32_t *mu1_32, etc.). Byte offsets are identical and the layout is unchanged, so the result is bit-exact. If upstream Netflix edits the same loop, reabsorb the walk and re-apply the cursor-type + cast pattern.
  (f) core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM) included <unistd.h> at line 33 but used no POSIX symbols from it; MSVC failed with C1083. Dropped the unused include outright: the simplest fix, with no runtime change on any platform.
  (g) core/src/dnn/model_loader.c (fork-added) uses S_ISDIR / S_ISREG to classify resolved paths. MSVC ships the underlying S_IFMT / S_IFDIR / S_IFREG bit masks in <sys/stat.h> but not the POSIX classification macros. Added a Windows-only fallback (#ifndef S_ISDIR #define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) #endif, same for S_ISREG) guarded by #ifdef _WIN32. The fallback is semantically identical to the POSIX macros, and Linux/macOS keep using their system definitions.
  Round-21e surfaced the final source-portability blockers once the DLL build passed preprocessing.
  (h) core/src/predict.c, core/src/libvmaf.c and core/src/read_json_model.c (all UPSTREAM) used C99 variable-length arrays: double scores[cnt] at predict.c:385, char name[name_sz] at predict.c:453 and libvmaf.c:1741, plus cfg_name[cfg_name_sz] and generated_key[generated_key_sz] in the .json model-collection parser. gcc/clang accept VLAs (mandatory in C99, optional since C11); MSVC (even with /std:c11) rejects them outright with C2057: expected constant expression. It also reports C2466 and C2133 on the const size_t sized arrays: in C a const-qualified variable is never a constant expression, even when its initialiser is a literal like 4 + 1, so gcc/clang quietly compile such arrays as VLAs while MSVC refuses them. Replaced each runtime-sized buffer with a small malloc + explicit free on every exit path (in predict.c and read_json_model.c a goto out; cleanup arm was introduced because the loops error-exit mid-function). The generated_key buffer in read_json_model.c uses the narrower fix, char generated_key[5];, since its size (four decimal digits of the bootstrap sub-model index plus NUL) is a true compile-time constant.
  Buffers are a handful of bytes each (name_sz is the model-collection name length plus the fixed _ci_p95_lo suffix, scores holds ~20 doubles, cfg_name is the name plus a _0000 suffix), so the heap round-trip is not performance-relevant; the new -ENOMEM failure mode is handled uniformly by existing callers. The read_json_model.c refactor also plugs a pre-existing leak of the name buffer on the early return -EINVAL when a JSON object key isn't a string: the goto out; path frees name + cfg_name on every exit. core/test/test_feature_extractor.c:56 (UPSTREAM) declared const unsigned n_threads = 8; and used it as the extent of VmafFeatureExtractorContext *fex_ctx[n_threads];. Converted to enum { n_threads = 8 }; so MSVC sees a constant expression; every other compiler accepts enum constants identically. Re-absorb if upstream Netflix later edits the same loops and your toolchain matrix omits MSVC.
  (i) The Windows MSVC build-only legs now build the full tree (CLI tools, unit tests and libvmaf.dll) rather than taking the previous shortcut of disabling -Denable_tools / -Denable_tests. Per user direction ("fix the code ffs"), the tree instead polyfills the remaining POSIX surfaces on MSVC. core/tools/compat/win32/getopt.h + core/tools/compat/win32/getopt.c provide a from-scratch POSIX/GNU-compatible getopt_long shim (short / long options, no_argument / required_argument / optional_argument, argv permutation for non-option operands, -- explicit stop, =-embedded values). The shim is fork-added (BSD-3-Clause-Plus-Patent, Copyright 2026 Lusoris and Claude) and declared via a single getopt_dependency in core/meson.build, gated on cc.check_header('getopt.h') failing. The dependency auto-propagates the shim .c into any consuming target via meson's sources: keyword, so both the vmaf CLI (core/tools/meson.build) and the test_cli_parse unit test (core/test/meson.build) pick it up uniformly. MinGW ships <getopt.h> via mingw-w64-crt, so check_header succeeds there and the shim stays out of the TU list.
(j) Eleven test executables (test_log,test_dict,test_opt,test_cpu,test_ref,test_feature,test_ciede,test_luminance_tools,test_cli_parse,test_sycl,test_sycl_pic_preallocation) were missingpthread_dependencyin theirdependencies:lists atcore/test/meson.build. On POSIXpthread_dependencyis an empty list so the omission was invisible; on MSVC those TUs transitively includefeature_collector.h→<pthread.h>and fail with C1083. Threaded the dependency through all eleven targets.test_cli_parseadditionally listsgetopt_dependencyto pick up the shim. (k) Three additional VLA sites surfaced once the test harness built on MSVC:test_cambi.c:254hadunsigned w = 5, h = 5; uint16_t buffer[3 * w];; converted toenum { w = 5, h = 5 };so the array extent is a constant expression.test_pic_preallocation.c:382andtest_pic_preallocation.c:506hadconst int num_threads = N; pthread_t threads[num_threads];— MSVC rejectsconst intas non-constant-expression. Converted toenum { num_threads = N, fetches_per_thread = M };. (l)test_ring_buffer.c:23(since removed; the ring-buffer test logic was folded into the CUDA-buffer / pic-preallocation suites) andtest_pic_preallocation.c:26included<unistd.h>forusleep/sleep. Gated behind!_WIN32with a Win32 fallback via<windows.h>+#define usleep(us) Sleep(((us) + 999) / 1000)/#define sleep(s) Sleep((s) * 1000). The conversion rounds sub-millisecondusleepinputs up, which is safe for these test paths (they use 100 µs jitter and 1 s waits). (m)core/tools/vmaf.cincluded<unistd.h>forisatty/fileno. Applied the same gating pattern used inlog.cin round-21(b) — include<io.h>on MSVC and redirectisatty/filenoto_isatty/_filenovia#define. (n)__builtin_clz/__builtin_clzllare GCC intrinsics; MSVC ships__lzcnt/__lzcnt64via<intrin.h>instead. The shim already lived incore/src/feature/integer_vif.hbutinteger_adm.c:939,x86/adm_avx2.c:1425andx86/adm_avx512.c:1217don't include that header. 
The shim was extracted into a dedicated `core/src/feature/compat_builtin.h` (fork-added) and is included from all four TUs. The guard is `defined(_MSC_VER) && !defined(__clang__)`, so clang-cl / icx-cl (which provide the GCC intrinsics natively) skip the shim.

(o) The SYCL leg's D3D11 import TU `core/src/sycl/d3d11_import.cpp` is C++ (icpx-cl drives it as C++ on Windows) but included the internal C header `log.h` without an `extern "C"` wrap. `log.h` is an upstream Netflix header with no `__cplusplus` guard, so `vmaf_log` got C++ name-mangled in the .cpp TU and failed to resolve against the C-linkage symbol produced by `log.c` at link time (`LNK2019` from every test target that pulls in the SYCL static lib). The `#include "log.h"` is now wrapped in `extern "C" { ... }` inside the fork-added .cpp rather than touching the upstream header, which keeps `log.h` identical to upstream on every `/sync-upstream`.

(p) The Windows MSVC legs build with `--default-library=static`. libvmaf's public API has no `__declspec(dllexport)` attributes (upstream Netflix is POSIX-shaped), so a vanilla MSVC shared build produces `src/vmaf-3.dll` with no exported symbols, and the toolchain therefore never emits the companion `vmaf.lib` import library. Downstream tool targets then fail with `LNK1181: cannot open input file 'src\vmaf.lib'`. The MinGW matrix leg has used `--default-library static` since day one for the same reason (line 387); the MSVC legs now mirror that choice via `matrix.include[].meson_extra`. Downstream consumers that want a DLL can either add `__declspec(dllexport)` decorations to the public API or use a `.def` file; that is a separate decision and out of scope for the build-only gate.
- Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.windows-gpu-build.strategy.matrix.include[].name' \
.github/workflows/libvmaf-build-matrix.yml
# Expected output (2 lines):
# Build — Windows MSVC + CUDA (build only)
# Build — Windows MSVC + oneAPI SYCL (build only)
- Branch protection: the two Windows GPU legs are pinned as required status checks on `master` immediately after this PR's merge. With ADR-0120's two Linux DNN legs already counted, this moves the required-check count from 21 to 23. Re-pin via:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
--input /tmp/protection-update.json
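The VLA fix in (k) and the `usleep` fallback in (l) reduce to two small C idioms, sketched below under the assumption of a C11 compiler. `fill_lanes` and `us_to_ms_ceil` are hypothetical helpers for illustration only; the tree itself uses the `enum` declarations and `#define` macros quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* An enum constant is an integer constant expression, so this extent is
 * fixed on every compiler, including MSVC (which has no VLA support).
 * `const int n_threads = 8;` as an extent would make the array a VLA in C. */
enum { n_threads = 8 };

/* Hypothetical helper: fill a fixed-extent array and return its length. */
static unsigned fill_lanes(unsigned out[n_threads])
{
    for (unsigned i = 0; i < n_threads; i++)
        out[i] = i * i;
    return n_threads;
}

/* Same arithmetic as `#define usleep(us) Sleep(((us) + 999) / 1000)`:
 * round microseconds UP to whole milliseconds, so a 100 us jitter
 * becomes a 1 ms sleep rather than a 0 ms (yield-only) sleep. */
static uint32_t us_to_ms_ceil(uint32_t us)
{
    assert(us <= UINT32_MAX - 999u); /* keep the +999 from wrapping */
    return (us + 999u) / 1000u;
}
```

The round-up is the conservative choice for test waits: sleeping slightly longer never breaks a timing-jitter test, whereas truncating 100 µs to 0 ms would turn the sleep into a bare yield.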
0023 — CUDA gencode coverage (sm_86/sm_89/compute_80 PTX) + init hardening
- Workstream PRs: the ADR-0122 PR (gencode + init hardening) and the ADR-0123 follow-up for the `32b115df` post-cubin-load regression.
- Touches:
  - `core/src/meson.build`: the `gencode` array in the `if get_option('enable_nvcc')` branch.
  - `core/src/cuda/common.c`: `vmaf_cuda_state_init()` error paths (multi-line actionable log, `cuda_free_functions()` + `free(c)` + `*cu_state = NULL` cleanup).
  - `docs/backends/cuda/overview.md`: `## Runtime requirements` section and `### GPU architecture coverage` table.
- Invariant: the `gencode` array unconditionally emits cubins for `sm_75` / `sm_80` / `sm_86` / `sm_89` plus a `compute_80` PTX, independent of the host `nvcc` version. Upstream Netflix's gencode only ships cubins at major-architecture boundaries (`sm_75` / `sm_80` / `sm_90` / `sm_100` / `sm_120`); a literal merge that replaces our array with upstream's would re-open the Ampere `sm_86` / Ada `sm_89` coverage hole. The `sm_90` / `sm_100` / `sm_120` entries are still version-gated and should be preserved verbatim if upstream adds new gates. The init-path error messages are fork-local strings; upstream's terse `"Error: failed to load CUDA functions"` must NOT win a merge.
- Re-test:
meson setup build -Denable_cuda=true -Denable_nvcc=true
ninja -C build -v 2>&1 | grep -E 'compute_(80|86|89)'   # -v: ninja hides command lines by default
# Expect at least -gencode=arch=compute_86,code=sm_86 and
# -gencode=arch=compute_89,code=sm_89 and
# -gencode=arch=compute_80,code=compute_80
# Actionable init message (run without CUDA driver on the loader path):
LD_LIBRARY_PATH= ./build/tools/vmaf --help 2>&1 | grep -qi 'libcuda.so.1' || \
echo "init log regressed"
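The `vmaf_cuda_state_init()` cleanup contract (free the loaded function table, free the state, and null the caller's pointer on every error path) can be sketched as below. All types, the error text and the `driver_present` switch are hypothetical stand-ins; the real code uses `cuda_free_functions()` and the fork-local log strings, which this sketch does not reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the CUDA function table and state. */
typedef struct { int loaded; } HypoFunctions;
typedef struct { HypoFunctions *f; } HypoCudaState;

static void hypo_free_functions(HypoFunctions **f)
{
    free(*f);
    *f = NULL;
}

/* driver_present simulates whether libcuda.so.1 could be loaded. */
static int hypo_cuda_state_init(HypoCudaState **cu_state, int driver_present)
{
    assert(cu_state != NULL);
    *cu_state = NULL;

    HypoCudaState *c = calloc(1, sizeof(*c));
    if (!c)
        return -ENOMEM;
    c->f = calloc(1, sizeof(*c->f));
    if (!c->f) {
        free(c);
        return -ENOMEM;
    }
    if (!driver_present) {
        /* Illustrative multi-line, actionable message (not the fork's text). */
        fprintf(stderr, "cuda: could not load libcuda.so.1\n"
                        "cuda: install the NVIDIA driver or fix the loader path\n");
        goto fail;
    }
    c->f->loaded = 1;
    *cu_state = c;
    return 0;

fail:
    hypo_free_functions(&c->f); /* mirrors cuda_free_functions() */
    free(c);                    /* mirrors free(c) */
    *cu_state = NULL;           /* caller never sees a dangling state */
    return -EINVAL;
}
```

Nulling `*cu_state` on failure is what lets a caller unconditionally pass the pointer to its own teardown without a double free.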
0024 — vmaf_read_pictures null-guard for CUDA device-only path
- Workstream PRs: the ADR-0123 follow-up, landed atop the ADR-0122 gencode/init-hardening work.
- Touches: `core/src/libvmaf.c`, the non-threaded tail of `vmaf_read_pictures` at the `prev_ref` update site (line ~1428 in the fork; the upstream equivalent is the tail added by `f740276a`).
- Invariant: the `prev_ref` update is guarded by `if (ref && ref->ref)`, so pure-CUDA extractor sets (where `ref = &ref_host` but `ref_host` was never populated by `translate_picture_device`) do not dereference a NULL refcount. Upstream currently has the same unguarded tail; the bug is masked upstream only because the experimental `VMAF_PICTURE_POOL` gate from `32b115df` is still in place. A literal upstream merge that removes our null-guard while upstream's experimental gate still holds would pass tests but re-open the `libvmaf_cuda` ffmpeg crash the moment the gate flips default-on (as the fork already did in `65460e3a`, ADR-0104). Keep the guard until the upstream null-guard port lands.
- Re-test:
# Unit tests cover the non-regression on the library side:
meson test -C build
# End-to-end regression: ffmpeg libvmaf_cuda must exit 0 on a
# CUDA-device-only extractor set (full recipe in ADR-0123).
./ffmpeg -init_hw_device cuda=cu:0 -filter_hw_device cu \
-i /tmp/ref.mp4 -i /tmp/dis.mp4 \
-lavfi "[0:v]format=yuv420p,hwupload_cuda[r];\
[1:v]format=yuv420p,hwupload_cuda[d];\
[r][d]libvmaf_cuda=log_path=/tmp/out.json:log_fmt=json" \
-f null -
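A minimal sketch of the guarded `prev_ref` update, using hypothetical stand-in types rather than libvmaf's `VmafPicture` / `VmafRef`. It shows why a zero-initialised host picture (the pure-CUDA case, where `translate_picture_device` never fills `ref_host`) must be skipped instead of referenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a picture and its refcount. */
typedef struct { int cnt; } HypoRef;
typedef struct { void *data; HypoRef *ref; } HypoPicture;

/* Hypothetical ref helper: dereferences src->ref unconditionally. */
static void hypo_picture_ref(HypoPicture *dst, const HypoPicture *src)
{
    src->ref->cnt++;
    *dst = *src;
}

/* Guarded tail: update prev_ref only when the host picture was populated.
 * Returns 1 if prev_ref was updated, 0 if the update was skipped. */
static int hypo_update_prev_ref(HypoPicture *prev_ref, const HypoPicture *ref)
{
    assert(prev_ref != NULL);
    if (ref && ref->ref) {
        hypo_picture_ref(prev_ref, ref);
        return 1;
    }
    return 0; /* device-only path: ref_host is empty, nothing to ref */
}
```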
0025 — VIF init() fail-path frees advanced byte-cursor
- Workstream PRs: PR #47 (rewritten to leak-fix-only after master absorbed the `void` → `uint8_t` half via commit `b0a4ac3a`, entry 0022 §e). Ports the leak-fix half of upstream Netflix PR #1476.
- Touches: `core/src/feature/integer_vif.c` (UPSTREAM; 2-line fix in the `init()` `fail:` handler).
- Invariant: `init()` walks `uint8_t *data` forward through `aligned_malloc`'s single allocation, advancing past each sub-pointer assignment. If `vmaf_feature_name_dict_from_provided_features` returns NULL, the fail path must free the base pointer `s->public.buf.data`, never the advanced cursor `data`. Upstream master still has `aligned_free(data)` there (the same bug), so this entry is the reminder not to let an upstream sync re-introduce the advanced-cursor form. If upstream lands PR #1476 or an equivalent, the sync can drop this entry.
- Re-test:
meson test -C build --suite=fast
# Static check: ripgrep the pattern that must NOT return.
rg -n "aligned_free\(data\)" core/src/feature/integer_vif.c && \
echo 'REGRESSED' || echo 'ok'
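The base-versus-cursor rule can be illustrated with plain `malloc` standing in for `aligned_malloc`; the struct, field names and `fail_late` flag below are hypothetical. The point is that the carving cursor ends one past the last sub-buffer, so only the saved base pointer is a valid argument to the free call.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical state: three sub-buffers carved from ONE allocation. */
typedef struct {
    void *base; /* plays the role of s->public.buf.data */
    float *a, *b, *c;
} HypoVifState;

/* fail_late simulates a NULL return from the feature-name dict helper
 * after the carving loop has already advanced the cursor. */
static int hypo_init(HypoVifState *s, size_t n, int fail_late)
{
    assert(n > 0);
    uint8_t *data = malloc(3 * n * sizeof(float));
    if (!data)
        return -ENOMEM;
    s->base = data;                                   /* remember the base */
    s->a = (float *)data; data += n * sizeof(float);
    s->b = (float *)data; data += n * sizeof(float);
    s->c = (float *)data; data += n * sizeof(float);  /* cursor: one past end */
    if (fail_late)
        goto fail;
    return 0;

fail:
    free(s->base); /* correct; free(data) here would be undefined behaviour */
    s->base = NULL;
    return -ENOMEM;
}
```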
0026 — Automated rule-enforcement workflow + copyright pre-commit hook
- Workstream PRs: this PR (ADR-0124 adoption). Closes the "rule-without-a-check" gap on ADR-0100 / 0105 / 0106 / 0108.
- Touches (all FORK-ADDED; no upstream overlap): `.github/workflows/rule-enforcement.yml` (new), `scripts/ci/check-copyright.sh` (new), `.pre-commit-config.yaml` (appended local hook).
- Invariant: the `deep-dive-checklist` job is blocking on every PR that is not an upstream port (exempt via a `port:` title prefix or a `port/` branch). The other three gates (`doc-substance-check`, `adr-backfill-check`, copyright pre-commit) are advisory or pre-commit, never CI-blocking; this split is the whole point of ADR-0124, and an upstream sync must not move them into the required-status-check set without a follow-up ADR. The opt-out parser matches `/^-?\s*no .* (?:needed|impact|rebase-sensitive)/` per ADR-0108 §Opt-out-lines. If the PR-template phrasing ever changes (unlikely, since the template is fork-local), the regex and the template must move together.
- Re-test:
# Lint the workflow + hook locally.
pre-commit run --files \
.github/workflows/rule-enforcement.yml \
scripts/ci/check-copyright.sh \
.pre-commit-config.yaml
# Dry-run the copyright hook against a staged source file.
scripts/ci/check-copyright.sh core/src/libvmaf.c && echo ok
# Synthetic PR body that violates ADR-0108 should fail the parser;
# see docs/research/0002-automated-rule-enforcement.md §Verification
# plan for the three test cases.
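To sanity-check candidate opt-out lines outside CI, the ADR-0108 pattern can be approximated with POSIX `regcomp` / `regexec`. This is an assumption-laden translation, not the parser the workflow runs: POSIX ERE has no `\s` or `(?:...)`, so they become `[[:space:]]` and a plain group, and `is_opt_out_line` is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* ERE approximation of /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ */
static const char *const OPT_OUT_ERE =
    "^-?[[:space:]]*no .* (needed|impact|rebase-sensitive)";

static bool is_opt_out_line(const char *line)
{
    assert(line != NULL);
    regex_t re;
    if (regcomp(&re, OPT_OUT_ERE, REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* treat a broken pattern as "no opt-out" */
    bool match = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return match;
}
```

As written, the pattern needs a space before the trailing keyword in addition to the one after `no`, so a bare `no impact` line does not opt out; that is worth knowing before editing the PR template.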
0027 — SSIMULACRA 2 scalar extractor (libjxl FastGaussian IIR blur)
- Workstream PRs: this PR (`feat/ssimulacra2-scalar`); proposal ADR in PR #67.
- Touches: `core/src/feature/ssimulacra2.c` (fork-local, new), `core/src/meson.build`, `core/src/feature/feature_extractor.c`.
- Invariant: the extractor embeds several tables that must track libjxl upstream: the opsin absorbance matrix, the `MakePositiveXYB` offsets, the 108 pooling weights, the polynomial-transform coefficients, and the FastGaussian coefficient-derivation formulas (radius = `3.2795·σ + 0.2546`, Cramer's-rule 3×3 solve for β, n2/d1 assignment per Charalampidis 2016 (33)). If libjxl ever changes any of these, update `ssimulacra2.c` in the same PR that syncs upstream. Self-consistency must stay at exactly `100.000000` for identical ref/dist inputs; this is the cheapest regression check.
- Re-test:
meson test -C build --suite=fast
./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc00_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 --feature ssimulacra2 -o /tmp/self.xml \
&& grep -q 'ssimulacra2="100.000000"' /tmp/self.xml \
&& echo "ok: self-consistency 100.0"
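The radius term of the FastGaussian derivation is simple enough to spot-check by hand when syncing libjxl. The sketch below evaluates the formula quoted above and rounds to the nearest integer; the rounding step is an assumption to confirm against libjxl's source, σ = 1.5 is just an example input, and `fast_gaussian_radius` is a hypothetical name.

```c
#include <assert.h>

/* radius = 3.2795 * sigma + 0.2546, rounded to nearest (assumed).
 * Round-half-up via +0.5 is exact for the positive values seen here
 * and avoids a libm dependency. */
static int fast_gaussian_radius(double sigma)
{
    assert(sigma > 0.0);
    double r = 3.2795 * sigma + 0.2546;
    return (int)(r + 0.5);
}
```

For σ = 1.5 the unrounded value is 5.17385 and for σ = 1.0 it is 3.5341, so the rounded radii are 5 and 4.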
0028 — MS-SSIM separable decimate + AVX2/AVX-512/NEON SIMD
- Workstream PRs: `feat/ms-ssim-decimate-simd-v2` (supersedes the rebase-incompatible `feat/ms-ssim-decimate-simd`; AVX2/AVX-512, commits `7de8cd7f` scalar separable, `5f93c864` AVX2, `73436438` AVX-512); `feat/ms-ssim-decimate-neon-v2` (NEON follow-up, stacked).
- Touches: `core/src/feature/ms_ssim_decimate.{c,h}` (NEW), `core/src/feature/x86/ms_ssim_decimate_avx2.{c,h}` (NEW), `core/src/feature/x86/ms_ssim_decimate_avx512.{c,h}` (NEW), `core/src/feature/arm64/ms_ssim_decimate_neon.{c,h}` (NEW), `core/src/feature/ms_ssim.c` (call-site change), `core/src/meson.build` (register new SIMD TUs), `core/test/test_ms_ssim_decimate.c` (NEW), `core/test/meson.build` (arm64 gating).
- Invariant: the 9-tap 9/7 biorthogonal wavelet LPF coefficients (`ms_ssim_lpf_h` / `ms_ssim_lpf_v`) are duplicated verbatim in five TUs for bit-identity: the scalar `ms_ssim_decimate.c`, the AVX2 variant, the AVX-512 variant, the NEON variant, and upstream's `g_lpf_h` / `g_lpf_v` in `ms_ssim.c`. Any upstream change to the coefficient values or to the `KBND_SYMMETRIC` mirror branch in `iqa/convolve.c` must be mirrored to all five. Otherwise the SIMD and scalar paths diverge; the bit-equality `memcmp` in `test_ms_ssim_decimate` catches this only when that test runs, so diff the five files first.
- Re-test (on each supported host arch):
# x86_64 host — native build.
meson test -C build
./build/test/test_ms_ssim_decimate
# aarch64 host OR aarch64 cross under qemu — see /tmp/aarch64-cross.txt.
meson setup build-arm64 libvmaf --cross-file /tmp/aarch64-cross.txt \
-Denable_cuda=false -Denable_sycl=false
ninja -C build-arm64
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
build-arm64/test/test_ms_ssim_decimate
# Netflix MS-SSIM golden — places=4 must still pass through SIMD.
.venv/bin/python -m pytest \
python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
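A scalar sketch of what "separable decimate" means here: a 9-tap low-pass applied horizontally at even columns, then vertically at even rows, computing only the samples that survive 2× decimation. The function name, the clamp-to-edge boundary and the float accumulation are simplifications for illustration; the fork's TUs use `KBND_SYMMETRIC`-style mirroring and their own accumulation order, which is exactly what the bit-identity requirement constrains.

```c
#include <assert.h>
#include <stddef.h>

enum { TAPS = 9, HALF = TAPS / 2 };

/* Simplified boundary: clamp to edge (the real code mirrors). */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical sketch: separable 9-tap low-pass + 2x decimation.
 * tmp must hold h * ((w + 1) / 2) floats; dst holds the decimated image. */
static void decimate2_separable(const float *src, int w, int h,
                                const float k[TAPS], float *tmp, float *dst)
{
    assert(w > 0 && h > 0);
    const int dw = (w + 1) / 2, dh = (h + 1) / 2;

    /* Horizontal pass: every row, even output columns only. */
    for (int y = 0; y < h; y++)
        for (int ox = 0; ox < dw; ox++) {
            float sum = 0.0f;
            for (int t = 0; t < TAPS; t++)
                sum += k[t] * src[(size_t)y * w + clampi(2 * ox + t - HALF, 0, w - 1)];
            tmp[(size_t)y * dw + ox] = sum;
        }

    /* Vertical pass: even output rows only, over the filtered rows. */
    for (int oy = 0; oy < dh; oy++)
        for (int ox = 0; ox < dw; ox++) {
            float sum = 0.0f;
            for (int t = 0; t < TAPS; t++)
                sum += k[t] * tmp[(size_t)clampi(2 * oy + t - HALF, 0, h - 1) * dw + ox];
            dst[(size_t)oy * dw + ox] = sum;
        }
}
```

With a delta kernel the output is simply every other sample, which makes a convenient smoke test independent of the real coefficient tables.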
0029 — KBND_SYMMETRIC period-based reflection in iqa/convolve.c
- Workstream PRs: `feat/ms-ssim-decimate-simd-v2` follow-up (CI triage on PR #69, 2026-04-20).
- Touches: `core/src/feature/iqa/convolve.c` (upstream file, `KBND_SYMMETRIC` rewritten).
- Invariant: `KBND_SYMMETRIC(img, w, h, x, y, _)` must use the period-based form (`period = 2*w`, `period = 2*h`) so that offsets with `|x| > w` or `|y| > h` still land in bounds. Upstream's single-reflect form went out of bounds whenever `w < kernel_half` or `h < kernel_half`; the latent bug did not reproduce in the Netflix golden tests because MS-SSIM pyramids never decimate below ~60×34. Any upstream change that reverts to the single-reflect form must be rejected or re-ported.
- Re-test:
./build/test/test_ms_ssim_decimate # test_1x1 border case
.venv/bin/python -m pytest \
python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
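The difference between the two reflection forms shows up only at tiny widths. This sketch assumes the edge-duplicating symmetric extension (`-1 - x` on the left, `2*w - 1 - x` on the right) and uses hypothetical function names; it is not the `KBND_SYMMETRIC` macro itself.

```c
#include <assert.h>

/* Single-reflect form: mirrors once about each edge. In bounds only
 * while the offset stays within one image width of the edge. */
static int reflect_once(int x, int w)
{
    if (x < 0)
        return -1 - x;
    if (x >= w)
        return 2 * w - 1 - x;
    return x;
}

/* Period-based form: the symmetric extension repeats with period 2*w,
 * so reduce modulo 2*w first, then fold the upper half back. Lands in
 * [0, w) for every integer x, including w == 1. */
static int reflect_periodic(int x, int w)
{
    assert(w > 0);
    const int period = 2 * w;
    int m = x % period;
    if (m < 0)
        m += period; /* C's % keeps the sign of the dividend */
    return m < w ? m : period - 1 - m;
}
```

Inside the range where a single reflection suffices the two agree; beyond it only the periodic form stays in bounds, which is the `test_1x1` border case.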
0030 — adm_decouple_s123_avx512 stack-array 64-byte alignment
- Workstream PRs: `feat/ms-ssim-decimate-simd-v2` follow-up (CI triage on PR #69, 2026-04-20).
- Touches:
  - `core/src/feature/x86/adm_avx512.c` (upstream file): one-line `_Alignas(64)` on `int64_t angle_flag[16]` at line 1317.
  - `core/test/test_pic_preallocation.c` (upstream file): three `vmaf_model_destroy(model)` calls pairing the `vmaf_model_load` in `test_picture_pool_basic` / `_small` / `_yuv444`.
- Invariant: the stack slot for `angle_flag` must be 64-byte aligned because the two `_mm512_loadu_si512(&angle_flag[0/8])` loads in the same scope may be promoted to aligned `vmovdqa64` by LTO. Dropping the `_Alignas(64)` annotation re-introduces the SEGV under `--buildtype=release -Db_lto=true -Db_sanitize=address`. Debug / no-LTO builds keep `vmovdqu64` and cannot flag the regression. See `docs/development/known-upstream-bugs.md`.
- Re-test:
meson setup build-asan-lto libvmaf \
-Denable_cuda=false -Denable_sycl=false \
-Db_sanitize=address --buildtype=release -Db_lto=true
ninja -C build-asan-lto test/test_pic_preallocation
ASAN_OPTIONS=detect_leaks=1 \
./build-asan-lto/test/test_pic_preallocation
0031 — Batch-A upstream-port small-fix sweep (ports of unmerged PRs)
- Workstream PRs: `feat/batch-a-upstream-small-fix-sweep`, commits `546a40ee` (T0-1), `8fed8ad1` (T4-4), `83a1db46` (T4-5), `34425dee` (T4-6). ADRs 0131, 0132, 0134, 0135.
- Touches:
  - `core/src/cuda/picture_cuda.c` (one-line `cuMemFree` port of Netflix#1382)
  - `core/src/feature/feature_collector.c` + `core/test/test_feature_collector.c` (mount/unmount bugfix port of Netflix#1406 + shared-helper test refactor)
  - `core/src/meson.build` (`declare_dependency` + `override_dependency` port of Netflix#1451)
  - `core/include/libvmaf/model.h`, `core/src/model.c`, `core/test/test_model.c`, `docs/api/index.md` (built-in model iterator port of Netflix#1424)
- Invariant: each of the four upstream PRs was OPEN (unmerged) on the port date. When Netflix merges any of them, the fork's version is correction-bearing (T4-4 test refactor, T4-6 three defect fixes + Doxygen doc expansion), not line-identical. Resolution on upstream merge is always "keep fork version", because the fork's version already satisfies the PR's intent and additionally fixes the defects.
  - The Netflix#1406 conflict will land in `test_feature_collector.c`: the fork uses a `load_three_test_models()` helper vs upstream's inline per-model `VmafModel *m0, *m1, *m2;` duplication.
  - The Netflix#1424 conflict will land in `core/src/model.c` and `core/test/test_model.c`: the fork uses an `else if` guard + `idx + 1 < CNT` + const-qualified test types.
  - Netflix#1382 and Netflix#1451 are line-identical in substance; the merge should be clean aside from trailing-comma style drift.
- Re-test:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_feature_collector test/test_model
build/test/test_feature_collector
build/test/test_model
# Expected: 6/6 pass in test_feature_collector (mount/unmount
# 3-model sequences); 39/39 pass in test_model (includes
# test_version_next full-iteration invariant).
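The `else if` guard and the `idx + 1 < CNT` bound in the ported model iterator follow a common NULL-terminated iteration shape, sketched below. The table, type and function names are hypothetical and the version strings are example data only; this is not the `model.c` implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical built-in table. */
typedef struct { const char *version; } HypoBuiltinModel;
static const HypoBuiltinModel hypo_builtin[] = {
    { "vmaf_v0.6.1" }, { "vmaf_v0.6.1neg" }, { "vmaf_4k_v0.6.1" },
};
#define HYPO_CNT (sizeof(hypo_builtin) / sizeof(hypo_builtin[0]))

/* Pass NULL to start; returns NULL after the last entry. Checking
 * idx + 1 < CNT before advancing never forms a past-the-end result. */
static const HypoBuiltinModel *hypo_model_next(const HypoBuiltinModel *prev)
{
    assert(HYPO_CNT > 0);
    if (!prev)
        return &hypo_builtin[0];
    else if ((size_t)(prev - hypo_builtin) + 1 < HYPO_CNT)
        return prev + 1;
    return NULL; /* prev was the last entry */
}
```

A full-iteration test then walks from `NULL` until `NULL` and checks that it visited exactly `HYPO_CNT` entries.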
0032 — Thread-local locale handling for numeric I/O (port of Netflix/vmaf#1430)
- Workstream PRs: `port/netflix-1430-thread-locale` (T4-3 from the "Batch-A follow-up" sweep, 2026-04-20).
- Touches:
  - `core/src/thread_locale.h` / `core/src/thread_locale.c` (new, upstream-authored)
  - `core/src/meson.build` (two `cdata.set('HAVE_USELOCALE'/'HAVE_XLOCALE_H')` probes + `src_dir + 'thread_locale.c'` in `libvmaf_sources`)
  - `core/src/output.c` (four writers gain a `push_c()` + `pop()` bracket, preserving the fork's `ferror(outfile) ? -EIO : 0` return contract from ADR-0119)
  - `core/src/svm.cpp` (drop the `<locale.h>` include; replace the `setlocale` / `strdup` / `setlocale` bracket with `vmaf_thread_locale_push_c` / `pop`; add `buffer.imbue(std::locale::classic())` to both SVM parser ctors, in the fork's K&R + 4-space style)
  - `core/src/read_json_model.c` (bracket `model_parse` with push/pop)
  - `core/test/meson.build` (new `test_locale_handling` target + test registration)
  - `core/test/test_locale_handling.c` (new, upstream-authored, with three fork corrections for the `score_format` parameter)
- Invariant: the fork's output writers return `ferror(outfile) ? -EIO : 0`; this must survive any upstream refactor of the writer bodies. Every `push_c()` call MUST be paired with a `pop()` on every return path (the writer bodies have a single tail return, so the pattern is locally `push → body → pop → return ferror-check`). Dropping `pop()` leaks a `locale_t` on POSIX and leaves the thread locked to "C" on Windows.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_locale_handling
# Repro the user-visible failure without the fix:
LC_ALL=de_DE.UTF-8 build/tools/vmaf --reference ref.yuv \
--distorted dis.yuv --width 1920 --height 1080 \
--pixel_format 420 --bitdepth 8 --output result.json \
--json
# Assert output contains period decimals, not comma.
python -c "import json; d=json.load(open('result.json')); \
assert all('.' in repr(v) for v in \
[f['metrics']['vmaf'] for f in d['frames']])"
- On upstream sync: when Netflix merges PR #1430, the `(cherry picked from commit 054a97ed…)` trailer in `git log port/netflix-1430-thread-locale` lets the next `/sync-upstream` skip this commit. If the upstream diff drifts, redo the three fork corrections listed in ADR-0137 §Decision.
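The push/pop bracket can be sketched with POSIX `newlocale()` / `uselocale()` / `freelocale()`. The names below are hypothetical; the fork's API is `vmaf_thread_locale_push_c()` / `pop()`, macOS builds may additionally need `<xlocale.h>` (the `HAVE_XLOCALE_H` probe above), and Windows uses a different mechanism that this sketch does not cover.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>

typedef struct { locale_t c_loc; locale_t prev; } HypoLocaleState;

/* Switch only the calling thread to a "C" numeric locale. */
static int hypo_push_c(HypoLocaleState *st)
{
    st->c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (st->c_loc == (locale_t)0)
        return -ENOMEM;
    st->prev = uselocale(st->c_loc);
    if (st->prev == (locale_t)0) {
        freelocale(st->c_loc);
        return -EINVAL;
    }
    return 0;
}

static void hypo_pop(HypoLocaleState *st)
{
    assert(st->c_loc != (locale_t)0);
    uselocale(st->prev);   /* restore; may be LC_GLOBAL_LOCALE */
    freelocale(st->c_loc); /* skipping this leaks the locale_t */
}

/* Writer shaped like the invariant: push -> body -> pop -> single return. */
static int hypo_write_score(FILE *out, double score)
{
    HypoLocaleState st;
    int err = hypo_push_c(&st);
    if (err)
        return err;
    fprintf(out, "%.6f\n", score); /* '.' decimal point regardless of LC_ALL */
    hypo_pop(&st);
    return ferror(out) ? -EIO : 0;
}
```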
0033 — SSIM / MS-SSIM SIMD bit-exact to scalar via per-lane scalar double
- Workstream PRs: `feat/ms-ssim-decimate-neon` (this PR, companion to the ADR-0138 convolve fast path).
- Touches: `core/src/feature/x86/ssim_avx2.c` and `core/src/feature/x86/ssim_avx512.c`, with `ssim_accumulate_*` rewritten; `ssim_precompute_*` and `ssim_variance_*` are unchanged (they were already bit-exact). Plus the new bit-exact `convolve_avx2.c` / `convolve_avx512.c` and the upstream h-pass OOB fix at `iqa/convolve.c:159`.
- Invariants (see ADR-0139 §Decision):
  - Convolve taps: single-rounded `float*float` → widen → `double` add, NO FMA. Mirrors the scalar `sum += img[i]*k[j]` in `iqa/convolve.c`.
  - SSIM accumulate: the scalar `2.0 *` literal (`2.0 * ref_mu[i] * cmp_mu[i] + C1` and `2.0 * srsc + C2`) is a C `double` literal. Both SIMD accumulators do the `2.0 *` numerator, the division and the final `l*c*s` product per lane in scalar double to match the scalar type promotions byte-for-byte.
  - H-pass outer-loop bound: `y < dst_h + vc - kh_even` (not `y < dst_h + vc`). The `- kh_even` is load-bearing: on even-tap kernels (e.g. box-8) the last cache row is never read by the v-pass, but it was previously written OOB when the image height equals the kernel height.

Fork-local SSIM SIMD is NOT upstream. If upstream ever adds its own SSIM AVX2/AVX-512, keep the fork's version on conflict; it is the only variant verified bit-exact to scalar at `--precision max`.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_iqa_convolve test_ms_ssim_decimate
# Bit-exactness check across dispatch backends:
FIX=python/test/resource/yuv/checkerboard_1920_1080_10_3_0_0.yuv
DIS=python/test/resource/yuv/checkerboard_1920_1080_10_3_1_0.yuv
for m in 255 16 0; do
build/tools/vmaf --cpumask $m --reference $FIX --distorted $DIS \
--width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
--feature float_ssim --feature float_ms_ssim \
--output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_16.xml) # expect empty
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_0.xml) # expect empty
- On upstream sync: the AVX2/AVX-512 SSIM surface is entirely fork-local (upstream has VIF/ADM/motion/CAMBI SIMD but no SSIM). If upstream ever introduces SSIM SIMD, its kernel bodies will almost certainly compute `l*c*s` in vector float for throughput; do not adopt them. The fork's per-lane scalar-double reduction is required for the bit-exactness claim. The same applies to `convolve_avx2/512`: they are fork-only, and dispatch sits in `ssim_tools.c` via `_iqa_convolve_set_dispatch`.
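The "single-rounded float product, widened, added in double" rule can be demonstrated with inputs whose exact product lands on a float rounding tie. The function name is hypothetical; the loop mirrors the scalar order described above, and the check assumes IEEE-754 arithmetic with `FLT_EVAL_METHOD == 0` (e.g. x86-64 SSE) and no `-ffast-math`.

```c
#include <assert.h>

/* One convolution output in the ADR-0138 order: each tap product is
 * rounded once to float, then widened and accumulated in double.
 * No FMA: a fused multiply-add would keep the unrounded product. */
static double conv_taps_widen_then_add(const float *img, const float *k, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int j = 0; j < n; j++) {
        float prod = img[j] * k[j]; /* single-rounded float product */
        sum += (double)prod;        /* widen, then double add */
    }
    return sum;
}
```

With `a = 1 + 2^-12`, the exact product `a*a = 1 + 2^-11 + 2^-24` is a tie in float and rounds to `1 + 2^-11`, while a double-precision product (as an FMA-style path would effectively use) keeps the `2^-24` term; the two results therefore differ in the last bits.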
0034 — SIMD DX framework + NEON SSIM/convolve bit-exact port
- Workstream PRs: `feat/simd-dx-framework` (this PR, PR #A); ships the two demos on top of which PR #B will consume the framework (ssimulacra2, motion_v2, vif_statistic, ...).
- Touches:
  - `core/src/feature/simd_dx.h` (new header)
  - `core/src/feature/arm64/convolve_neon.c` + `convolve_neon.h` (new NEON port)
  - `core/src/feature/arm64/ssim_neon.c` (`ssim_accumulate_neon` rewritten for ADR-0139 bit-exactness; `precompute` + `variance` unchanged)
  - `core/src/feature/float_ssim.c` + `core/src/feature/float_ms_ssim.c` (wire `iqa_convolve_neon` into the aarch64 dispatch setters)
  - `core/src/meson.build` (`arm64_sources += convolve_neon.c`)
  - `core/test/meson.build` (`test_iqa_convolve` arch filter extended to `arm64` / `aarch64`)
  - `core/test/test_iqa_convolve.c` (NEON variant check + aarch64 CPU flag detection)
  - `core/test/dnn/meson.build` (`test_cli.sh` gated on `not meson.is_cross_build()`, because bash invokes `$VMAF_BIN` directly and meson's exe_wrapper isn't applied)
  - new `build-aux/aarch64-linux-gnu.ini` meson cross-file
  - `.claude/skills/add-simd-path/SKILL.md` (upgraded kernel-spec flags)
- Invariants (see ADR-0140 §Decision):
  - `simd_dx.h` is fork-local; keep the fork's version on upstream conflict. Macro names are ISA-suffixed (`_AVX2_4L`, `_AVX512_8L`, `_NEON_4L`); do not collapse them into a cross-ISA abstraction. The fork's SIMD policy (user-memory `feedback_simd_dx_scope.md`) rules out Highway / simde / xsimd.
  - The ADR-0138 widen-then-add rule (single-rounded `float * float` → widen → `double` add, NO FMA) applies to NEON exactly as to AVX2 / AVX-512. The NEON form uses paired `float64x2_t` accumulators (lo / hi) because NEON has no `float64x4_t`.
  - The ADR-0139 per-lane scalar-double reduction rule applies to `ssim_accumulate_neon` exactly as to the AVX2 / AVX-512 variants. The NEON implementation uses `SIMD_ALIGNED_F32_BUF_NEON` (`_Alignas(16) float name[4]`) + a 4-iteration scalar loop.
- Re-test (requires `aarch64-linux-gnu-gcc` + `qemu-user-static` + an aarch64 sysroot at `/usr/aarch64-linux-gnu`):
cd libvmaf
meson setup ../build-aarch64 \
--cross-file ../build-aux/aarch64-linux-gnu.ini \
-Denable_cuda=false -Denable_sycl=false -Denable_dnn=disabled
cd ..
ninja -C build-aarch64
meson test -C build-aarch64 # expect 31/31 OK
# Bit-exactness check scalar vs NEON under QEMU:
REF=python/test/resource/yuv/src01_hrc00_576x324.yuv
DIS=python/test/resource/yuv/src01_hrc01_576x324.yuv
for m in 255 0; do
LD_LIBRARY_PATH=$PWD/build-aarch64/src qemu-aarch64-static \
-L /usr/aarch64-linux-gnu build-aarch64/tools/vmaf \
--cpumask $m --reference $REF --distorted $DIS \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--feature float_ssim --feature float_ms_ssim \
--output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_0.xml) # expect empty
- On upstream sync: upstream has no NEON SSIM and no NEON convolve for IQA. If it ever adds one, keep the fork's version on conflict; the fork's NEON path is the only variant verified bit-exact to scalar at `--precision max`. The `build-aux/aarch64-linux-gnu.ini` cross-file has no upstream equivalent. The `/add-simd-path` skill is fork-only; upstream doesn't ship `.claude/skills/`.
0036 — Port Netflix generalised AVX convolve + ADR-0141 cleanup
- Workstream PRs: `port/upstream-f3a628b4-generalized-avx-convolve` (this PR).
- Upstream commit: `f3a628b4` "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21).
- Touches:
  - `convolution.h`: upstream-tracking; adds `#define MAX_FWIDTH_AVX_CONV 17`.
  - `convolution_avx.c`: upstream-tracking (2,500 LoC deletion) plus a fork-delta cleanup per ADR-0141: the four scanline helpers `convolution_f32_avx_s_1d_*` changed from external linkage to `static` (no other TU uses them after the specialised-path removal); stride parameters widened from `int` to `ptrdiff_t` in the helpers, with `(ptrdiff_t)` casts at the public functions' multiplication sites; `#include <stddef.h>` added for the type.
  - `core/src/feature/vif_tools.c`: upstream-tracking; the three AVX dispatch sites drop the `fwidth == 17 || ... == 3` whitelist in favour of `fwidth <= MAX_FWIDTH_AVX_CONV`.
  - `python/test/quality_runner_test.py`, `python/test/vmafexec_test.py`: upstream-authored loosening of two full-VMAF-score assertions from `places=2` (±0.005) to `places=1` (±0.05). Adopted per the ADR-0142 Netflix-authority precedent (project rule #1 addresses fork drift, not upstream-authored test updates the fork must track).
- Invariants (see ADR-0143 §Decision):
  - Static linkage on scanline helpers: upstream leaves the four `convolution_f32_avx_s_1d_*_scanline` helpers with external linkage out of habit; the fork narrows them to `static`. On upstream sync: if upstream ever references them from another TU, treat that as a flag to re-audit; keep the fork's `static` unless the reference is real.
  - `ptrdiff_t` strides inside helpers: the public `convolution_f32_avx_*_s` wrappers keep `int` strides (matching the upstream interface + `convolution.h` declarations). The helpers take `ptrdiff_t` to silence `bugprone-implicit-widening-of-multiplication-result`. If upstream changes the public interface to `ptrdiff_t`, drop the fork's wrapper-level casts.
  - `MAX_FWIDTH_AVX_CONV = 17`: the ceiling is upstream's; if upstream bumps it, the fork must rebuild + re-run the VIF golden test pair.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build # expect 32/32 OK
clang-tidy -p build core/src/feature/common/convolution_avx.c
# Zero warnings expected on the touched file.
The Netflix CPU golden CI leg exercises the two loosened assertions; confirmed locally under `meson test`.
- On upstream sync: upstream is the source of truth for `convolution_avx.c`, `convolution.h`, the `vif_tools.c` dispatch, and the two python golden tolerances. On a rebase, prefer upstream for those files, except:
  - Keep the fork's `static` on the four scanline helpers.
  - Keep the fork's `ptrdiff_t` helper signatures + multiplication-site casts (unless upstream adopts them too, in which case converge).
  - Keep the fork's `#include <stddef.h>`.

  If upstream re-introduces a specialised fast path for common widths, evaluate it on a per-`fwidth` perf profile; the fork's `/profile-hotpath` skill covers this.
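Two of the invariants above, the width ceiling in the dispatch predicate and the `ptrdiff_t` multiplication inside helpers, are sketched below. The function names are hypothetical; only `MAX_FWIDTH_AVX_CONV` and its value come from the entry, and the overflow example assumes a 64-bit `ptrdiff_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FWIDTH_AVX_CONV 17 /* value quoted from convolution.h */

/* After f3a628b4: any width up to the ceiling takes the AVX path,
 * replacing the old `fwidth == 17 || ... == 3` whitelist. */
static bool use_avx_conv(int fwidth)
{
    return fwidth <= MAX_FWIDTH_AVX_CONV;
}

/* Helper-style signature: the multiply happens in ptrdiff_t, so large
 * row * stride products cannot overflow int first. */
static ptrdiff_t row_offset(ptrdiff_t y, ptrdiff_t stride)
{
    return y * stride;
}

/* Wrapper-style signature: keeps int parameters, casts at the call. */
static ptrdiff_t row_offset_int(int y, int stride)
{
    assert(y >= 0 && stride >= 0);
    return row_offset((ptrdiff_t)y, (ptrdiff_t)stride);
}
```

Had the wrapper computed `y * stride` in `int` and then widened, the 70000 × 70000 case would overflow before the conversion; that is the pattern the clang-tidy check flags.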
0037 — Float convolution AVX-512 port (ADR-0504, fork-local)
- Workstream PR: `perf/float-convolution-avx512-port-2026-05-18`.
- Upstream: upstream has no AVX-512 float convolution path; this is fork-local (ADR-0504).
- Touches (fork-local):
  - `convolution_avx512.c`: new TU with four static scanline helpers and three public wrappers, all ported from `convolution_avx.c` (`__m256` → `__m512`, FMA added).
  - `convolution.h`: adds three `convolution_f32_avx512_*_s` declarations.
  - `vif_tools.c`: dispatch updated to test `VMAF_X86_CPU_FLAG_AVX512` before `AVX2` in all three `vif_filter1d_*_s` functions.
  - `core/src/meson.build`: adds `convolution_avx512.c` to `x86_avx512_sources`.
- Rebase risk: LOW. `convolution_avx512.c` is entirely fork-local; upstream changes to `convolution_avx.c` or `convolution.h` may need to be mirrored here, but the AVX-512 file itself has no upstream conflict surface.
- Gate: `meson test -C build` (63/63). Netflix CPU golden tests pass.
0038 — motion_v2 NEON SIMD (fork-local)
- Workstream PR: `port/motion-bundle-neon-and-updates` (this PR).
- Upstream: none; aarch64 NEON for `motion_v2` is fork-local. Upstream scalar + AVX2 + AVX-512 variants exist; this PR adds the missing fourth (NEON) path. Scalar is the bit-exactness ground truth.
- Touches (fork-local):
  - `motion_v2_neon.c`: new TU, ~300 LoC. 4-wide int32 SIMD over the 5-tap Gaussian pipeline. Five `static inline` helpers keep every function under the ADR-0141 60-line budget.
  - `motion_v2_neon.h`: new header declaring the two public entry points.
  - `integer_motion_v2.c`: dispatch update; adds an `#if ARCH_AARCH64` block in `init` that selects the NEON variant when `VMAF_ARM_CPU_FLAG_NEON` is present, mirroring the existing x86 dispatch blocks.
  - `core/src/meson.build`: adds `arm64/motion_v2_neon.c` to the `arm64_sources` list.
- Invariants (see ADR-0145 §Decision):
  - Arithmetic right-shift throughout. The fork's AVX2 path used `_mm256_srlv_epi64` (logical; since replaced, see Follow-up T7-32), which can diverge from scalar on negative-diff pixels. The NEON port uses `vshrq_n_s64(v, 16)` for the known Phase-2 shift and `vshlq_s64` by a broadcast negative count (`-(int64_t)bpc`) for the variable Phase-1 shift; both are arithmetic, matching scalar C `>>` on signed integers. On rebase: keep the arithmetic forms; do NOT adopt `vshrq_n_u64` or a logical emulation even if it runs faster.
  - 4-lane stride + mirror tails. SIMD stride = 4; scalar tails cover the remainder. The Phase-2 helper `x_conv_row_sad_neon` hands 4 lanes to `x_conv_block4_neon` and drops to scalar for both the left and right edges (`j < 2` and `j + 6 > w`). On rebase: preserve the 4-lane stride and the two-sided scalar tail.
  - Signature parity with AVX2. Both pipeline entry points match the AVX2 + AVX-512 variants' `(const uint8_t *prev, ptrdiff_t, const uint8_t *cur, ptrdiff_t, int32_t *y_row, unsigned w, unsigned h, unsigned bpc)` signature. On rebase: if upstream changes the signature, mirror the change here AND in the x86 variants in lockstep.
- Re-test:
meson setup build-aarch64 libvmaf \
--cross-file build-aux/aarch64-linux-gnu.ini \
-Denable_cuda=false -Denable_sycl=false
ninja -C build-aarch64
meson test -C build-aarch64 --no-rebuild # expect 31/31 OK
clang-tidy -p build-aarch64 \
core/src/feature/arm64/motion_v2_neon.c
# Zero warnings expected on the touched file.
# NEON-vs-scalar bit-exact diff under QEMU:
YUV=python/test/resource/yuv
for mask in 0 255; do
LD_LIBRARY_PATH=build-aarch64/src \
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
build-aarch64/tools/vmaf \
-r $YUV/src01_hrc00_576x324.yuv \
-d $YUV/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 -n --feature motion_v2 \
--cpumask $mask -o /tmp/mv2_$mask.xml --precision max
done
diff <(grep -v 'fps=' /tmp/mv2_0.xml) \
<(grep -v 'fps=' /tmp/mv2_255.xml) # expect empty
- On upstream sync: upstream has no NEON `motion_v2` and has not signalled plans to add one. If it ever does, diff its NEON against the fork's. On logical-vs-arithmetic shift, keep the fork's arithmetic form (it matches scalar). On the function decomposition (the five helpers), adopt upstream's if it's smaller; the fork's layout is ADR-0141-driven, not a semantic contract.
- Follow-up T7-32 (fixed 2026-05-09): the `_mm256_srlv_epi64` (logical right shift) in `motion_score_pipeline_16_avx2` was replaced with `srav_epi64_imm`, an AVX2-safe arithmetic-right-shift emulation (logical shift OR a sign-fill mask built via `srai_epi32` + `slli_epi64`). Two bugs were closed in the same PR:
  - AVX2 logical-vs-arithmetic shift: `_mm256_srlv_epi64` replaced by `srav_epi64_imm` in `core/src/feature/x86/motion_v2_avx2.c`. The emulation is bit-exact with scalar C `>> bpc` on signed `int64_t`.
  - Test scalar reference mirror: `mirror_idx` in `core/test/test_motion_v2_simd.c` used `2*size - idx - 1` instead of `2*size - idx - 2`, diverging from `integer_motion_v2.c::mirror()`. Fixed to `-2`. All four adversarial fixtures (neg-diff bpc10/12, mixed-diff bpc10/12) now pass; `meson test -C build` is 50/50 OK.

  On rebase: keep `srav_epi64_imm`; do not revert to `_mm256_srlv_epi64`. The rebase-time invariant is now: the AVX2 path uses an arithmetic shift (matching NEON and scalar).
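A scalar model of the two T7-32 fixes: the sign-fill trick behind `srav_epi64_imm`, and a mirror helper using the `- 2` form. Both functions are hypothetical illustrations; in particular, the negative-index branch of `mirror_idx` is an assumed reflect-101 shape, not a copy of `integer_motion_v2.c::mirror()`.

```c
#include <assert.h>
#include <stdint.h>

/* AVX2 has no 64-bit arithmetic right shift, so emulate one: take the
 * logical shift, then OR in a mask with the top s bits set when x < 0. */
static int64_t sra64_emulated(int64_t x, unsigned s)
{
    assert(s < 64);
    uint64_t logical = (uint64_t)x >> s;
    uint64_t sign_fill = x < 0 ? ~(UINT64_MAX >> s) : 0;
    /* uint64 -> int64 of values above INT64_MAX is implementation-defined;
     * GCC, Clang and MSVC wrap, which gives the intended two's complement. */
    return (int64_t)(logical | sign_fill);
}

/* Assumed reflect-101 mirror: the edge sample is not repeated, hence
 * 2*size - idx - 2 rather than 2*size - idx - 1. */
static int mirror_idx(int idx, int size)
{
    assert(size > 1);
    if (idx < 0)
        return -idx;
    if (idx >= size)
        return 2 * size - idx - 2;
    return idx;
}
```

The plain logical shift turns `-65536 >> 16` into a large positive number, while the emulation and scalar `>>` both give `-1`; that divergence on negative diffs is exactly what the adversarial fixtures exercise.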
0039 — readability-function-size NOLINT sweep (ADR-0146)
- ADR: ADR-0146
- Touches: `core/src/dict.c`, `core/src/picture.c`, `core/src/picture_pool.c`, `core/src/predict.c`, `core/src/libvmaf.c`, `core/src/output.c`, `core/src/read_json_model.c`, `core/src/feature/feature_extractor.c`, `core/src/feature/feature_collector.c`, `core/src/feature/iqa/convolve.c`, `core/src/feature/iqa/ssim_tools.c`, `core/src/feature/x86/vif_statistic_avx2.c`.
- Invariant: every `readability-function-size` NOLINT suppression has been replaced by a set of small `static` (or `static inline`, for the SIMD / IQA files) helpers. The helper names are stable interfaces the surrounding code depends on (e.g. `iqa_convolve_1d_separable`, `iqa_convolve_2d`, `ssim_compute_stats`, `ssim_workspace_alloc` / `_free`, `vif_stat_simd8_compute` / `_reduce`, `struct vif_simd8_lane`, `read_pictures_extractor_loop`, `read_pictures_post_extractor`, `read_pictures_validate_and_prep`, `read_pictures_update_prev_ref`). Upstream Netflix has no equivalent helpers; rebases touching any of these files will conflict against the fork's split shape.
- On upstream sync:
  - If upstream lands a different decomposition of `_iqa_convolve` or `_iqa_ssim`, prefer upstream's shape only if it keeps the ADR-0138 / ADR-0139 bit-exactness invariants (single-rounded float mul → widen to double → double add; per-lane scalar-double reduction through an aligned float temp buffer). Otherwise keep the fork's split and re-document the divergence here.
  - The fork renamed `_calc_scale` → `iqa_calc_scale` to clear the `bugprone-reserved-identifier` check. If upstream modifies `_calc_scale`, keep the fork's name and port the behavioural change.
  - `model_collection_parse_loop` writes directly to `cfg_name` rather than through `c->name`; if upstream ever rewrites `model_collection_parse`, preserve the direct write (it's what lets the param stay non-const without a NOLINT).
- Re-test on rebase (x86, any libsvm-less host):
ninja -C build && meson test -C build
for mask in 0 255; do
VMAF_CPU_MASK=$mask ./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
-m version=vmaf_v0.6.1 -o /tmp/vmaf_$mask.xml
done
diff <(grep -v fyi /tmp/vmaf_0.xml) <(grep -v fyi /tmp/vmaf_255.xml)
# expect exit 0 (Netflix-golden-pair VMAF bit-identical scalar vs SIMD)
Also run clang-tidy -p build on every file in Touches; expect zero warnings.
- Follow-up T7-6: decide whether to rename the _iqa_* API surface (convolve / ssim / decimate / img_filter / filter_pixel / get_pixel) across all callers to clear the remaining bugprone-reserved-identifier suppressions in ssim.c, ms_ssim.c and float_ms_ssim.c. Out of scope here.
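The ADR-0138 accumulation order that any upstream decomposition must keep (single-rounded float multiply, widen to double, double add) looks roughly like this in scalar form. This is only a sketch. dot_f32_widen is a hypothetical name, not a helper in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of the "float mul -> widen -> double add" order.
 * Each product is rounded once to float before being widened, so a
 * refactor that multiplied in double instead would change the result. */
static double dot_f32_widen(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        float prod = a[i] * b[i];   /* single-rounded float multiply */
        acc += (double)prod;        /* widen, then accumulate in double */
    }
    return acc;
}
```

For x = 1 + 2^-12, x * x rounds to 1 + 2^-11 in float, so the function returns exactly 1 + 2^-11. Multiplying in double first would keep an extra 2^-24 term. A re-decomposition must not introduce that kind of silent divergence.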
0040 — Thread-pool job recycling + inline data buffer (ADR-0147)¶
- ADR: ADR-0147
- Touches:
  - core/src/thread_pool.c
- Invariants:
  - VmafThreadPoolJob carries a fixed-size char inline_data[64] buffer. Payloads ≤ 64 bytes go through memcpy(job->inline_data, data, data_sz) + job->data = job->inline_data, and payloads > 64 bytes take the legacy malloc path. The cleanup path MUST distinguish the two via job->data != job->inline_data. A naive free(job->data) on an inline payload would hand free() a pointer into the job itself and corrupt the slot. This is enforced in vmaf_thread_pool_job_clear_data.
  - The free_jobs list is protected by the existing queue.lock. Enqueue pops from it before mallocing, and the runner recycles onto it after running a job. vmaf_thread_pool_destroy walks the list after vmaf_thread_pool_wait returns (all workers have exited, so no lock is needed). Any reorder that frees the queue lock before the free_jobs walk leaks on shutdown.
  - The fork's void (*func)(void *data, void **thread_data) signature and per-worker VmafThreadPoolWorker are fork-local; upstream Netflix #1464 has func(void *data). Keep the fork's signature on any rebase, because callers (src/libvmaf.c:threaded_enqueue_one etc.) depend on the two-argument form.
- On upstream sync: Netflix PR #1464 is CLOSED (not merged) and bundles twelve unrelated optimizations; only the thread-pool portion is ported here. If upstream ever reopens and merges #1464 (or a successor), cherry-pick only the pool mechanics. Reject the following:
  - the payload-signature changes;
  - the ADM / VIF / predict.c pieces, which conflict with ADR-0138 / 0139 / 0142 bit-exactness and with the T7-5 predict.c refactor;
  - the feature-collector capacity bump. The fork caps it at 8 for a reason; see src/feature/feature_collector.c.
- Re-test on rebase (x86, any libsvm-less host):
ninja -C build && meson test -C build
for threads in 1 4; do
for mask in 0 255; do
VMAF_CPU_MASK=$mask ./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
-m version=vmaf_v0.6.1 --threads $threads -o /tmp/vmaf_${threads}_${mask}.xml
done
done
# Expect bit-identical scores (attribute order may differ across
# --threads 1 vs --threads 4 because feature-collector emits in
# insertion order; the numeric values match).
diff <(grep -v fyi /tmp/vmaf_4_0.xml) <(grep -v fyi /tmp/vmaf_4_255.xml)
# expect exit 0 (scalar vs SIMD threaded)
Also run clang-tidy -p build core/src/thread_pool.c — expect zero warnings. Re-run the 500 000-job micro-benchmark from ADR-0147 §Decision if performance is under investigation.
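The inline-versus-heap payload rule can be sketched as follows, assuming a job struct reduced to its two payload fields. struct job, job_set_data and job_clear_data are hypothetical stand-ins for VmafThreadPoolJob and vmaf_thread_pool_job_clear_data. The sketch leaves out the free_jobs recycling and the queue.lock handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_DATA_SZ 64

/* Hypothetical stand-in for VmafThreadPoolJob: only the payload fields. */
struct job {
    void *data;
    char inline_data[INLINE_DATA_SZ];
};

/* Small payloads are copied into the job itself; large ones are malloc'd. */
static int job_set_data(struct job *job, const void *data, size_t data_sz)
{
    assert(data != NULL);
    if (data_sz <= INLINE_DATA_SZ) {
        memcpy(job->inline_data, data, data_sz);
        job->data = job->inline_data;
        return 0;
    }
    job->data = malloc(data_sz);
    if (job->data == NULL)
        return -1;
    memcpy(job->data, data, data_sz);
    return 0;
}

/* Only heap payloads may be freed; inline_data belongs to the job itself. */
static void job_clear_data(struct job *job)
{
    if (job->data != job->inline_data)
        free(job->data);
    job->data = NULL;
}
```

The pointer comparison in job_clear_data carries the whole invariant. An inline payload's data pointer aims into the job itself, so it must never reach free().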
0041 — IQA reserved-identifier rename + cleanup (ADR-0148)¶
- ADR: ADR-0148
- Touches: 21 files across core/src/feature/ (iqa/{convolve,decimate,ssim_tools}.{c,h}, iqa/ssim_simd.h, ssim.c, integer_ssim.c, ms_ssim.c, ms_ssim_decimate.h, float_ssim.c, float_ms_ssim.c, x86/convolve_avx2.{c,h}, x86/convolve_avx512.{c,h}, arm64/convolve_neon.{c,h}, AGENTS.md) plus core/test/test_iqa_convolve.c.
- Invariants:
  - Every _iqa_* / _kernel / _ssim_int / _map_reduce / _map / _reduce / _context / _ms_ssim_* / _ssim_* / _alloc_buffers / _free_buffers symbol is renamed to its non-reserved spelling, and so are the four underscore-prefixed header guards (_CONVOLVE_H_, _DECIMATE_H_, _SSIM_TOOLS_H_, __VMAF_MS_SSIM_DECIMATE_H__). The fork's IQA surface no longer uses C's reserved-identifier name space.
  - The clang-analyzer-security.ArrayBound NOLINT bracket in ssim_accumulate_row and ssim_reduce_row_range (integer_ssim.c) is load-bearing. The inner kernel-loop k_min / k_max clamping is provably correct (k_min = max(0, hkernel_offs - x), k_max = min(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1))), but the analyzer can't follow it across helper boundaries. Do not collapse the bracket.
  - The clang-analyzer-unix.Malloc NOLINT bracket in test_iqa_convolve.c (check_simd_variant, check_case) is intentional. The test exits the process on the failure path, and its small allocations leak by design at test end. Do not refactor it to free-on-exit.
  - The cross-TU NOLINT on compute_ssim (ssim.c) and compute_ms_ssim (ms_ssim.c) is intentional. clang-tidy misc-use-internal-linkage runs per TU and can't see the header bridge to float_ssim.c / float_ms_ssim.c. Keep the inline justification comment.
- On upstream sync:
  - The Netflix upstream IQA library (tjdistler/iqa) has been effectively abandoned (last meaningful commit pre-2020). Future rebases will conflict on every renamed symbol. Drop the underscore prefix on each conflict and mirror the fork's iqa_* naming.
  - If upstream Netflix/vmaf ever reincorporates the IQA naming wholesale, prefer the fork's spellings. This PR is a one-shot mechanical rename with no semantic content.
- Re-test on rebase:
ninja -C build && meson test -C build
for mask in 0 255; do
VMAF_CPU_MASK=$mask ./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
-m version=vmaf_v0.6.1 \
--feature float_ssim --feature float_ms_ssim \
-o /tmp/iqa_$mask.xml
done
diff <(grep -v fyi /tmp/iqa_0.xml) <(grep -v fyi /tmp/iqa_255.xml)
# expect exit 0 (bit-identical scalar vs SIMD on float_ssim/ms_ssim)
Also run clang-tidy -p build on every touched file (excluding arm64/); expect zero warnings.
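To see why the ArrayBound NOLINT is safe, the k_min / k_max clamping quoted above can be checked in a small harness. This sketch assumes that tap k reads source column x + k - hkernel_offs and that the kernel is odd, with hkernel_offs == hkernel_sz / 2. in_bounds_taps is a hypothetical helper, not code from integer_ssim.c.

```c
#include <assert.h>

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Count the kernel taps that stay inside [0, w) for output column x, using
 * the k_min / k_max clamping from the invariant. Assumed indexing: tap k
 * reads source column x + k - hkernel_offs (odd kernel, offs = sz / 2). */
static int in_bounds_taps(int x, int w, int hkernel_sz, int hkernel_offs)
{
    int k_min = imax(0, hkernel_offs - x);
    int k_max = imin(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1));
    for (int k = k_min; k < k_max; k++) {
        int src = x + k - hkernel_offs;
        assert(src >= 0 && src < w);   /* the property the analyzer cannot prove */
    }
    return k_max - k_min;
}
```

With w = 10 and a 5-tap kernel, columns 0 and 9 keep 3 taps each and interior columns keep all 5. The assert inside the loop states the property that the analyzer cannot derive across helper boundaries.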
0042 — Port Netflix #1376 — FIFO-hang fix via Semaphore (ADR-0149)¶
- ADR: ADR-0149
- Upstream commit: Netflix PR #1376, head 1c06ca4f1bb5da38b54db075a27c35ba8ea9d7b7 (OPEN upstream as of 2026-04-24).
- Touches:
  - python/vmaf/core/executor.py (base Executor class + ExternalVmafExecutor-style subclass):
    - delete the _wait_for_workfiles / _wait_for_procfiles polling loops;
    - rewrite _open_{work,proc}files_in_fifo_mode around multiprocessing.Semaphore(0);
    - add an open_sem=None kwarg to every _open_{ref,dis}_{work,proc}file and to the _open_workfile static method;
    - drop the unused from time import sleep.
  - python/vmaf/core/raw_extractor.py (AssetExtractor + DisYUVRawVideoExtractor):
    - add open_sem=None to the _open_{ref,dis}_workfile overrides, which release on entry since they are no-ops;
    - delete the _wait_for_workfiles overrides;
    - drop the unused from time import sleep.
- Fork carve-outs (load-bearing on rebase):
  - python/vmaf/__init__.py: __version__ stays "3.0.0". Do NOT port upstream's bump to "4.0.0". The fork tracks its own versioning (v3.x.y-lusoris.N) per ADR-0025.
  - from time import sleep is dropped from both files. Upstream leaves the import in place (unused after their patch). The fork removes it because the ADR-0141 touched-file rule requires ruff F401 to be clean.
  - Upstream typo preserved: the subclass warning message contains "to be created to be created". Comments note the typo inline. Do not silently fix it on rebase: it is upstream-authored, and project policy is a verbatim port.
- On upstream sync: upstream PR #1376 is still OPEN. When it merges, re-diff against the merged form; the touched hunks should be conflict-free because the fork now carries the same shape. Re-check whether upstream fixed the "to be created to be created" typo; if so, adopt the fix (it becomes a simple string update).
- Re-test:
python3 -m py_compile python/vmaf/core/executor.py \
python/vmaf/core/raw_extractor.py
ruff check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
black --check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
# all silent
# No FIFO-mode unit test in the tree; end-to-end harness
# exercise (needs libsvm + ffmpeg + fixtures) goes via
# make test-netflix-golden
# which doesn't exercise fifo_mode path but does verify the
# refactor didn't break executor.py imports.
0043 — Port Netflix #1472 — CUDA on Windows MSYS2/MinGW (ADR-0150)¶
- ADR: ADR-0150
- Upstream commits: Netflix PR #1472, 15745cdf (portability) + b7b65e64 (meson plumbing). The PR is still OPEN upstream as of 2026-04-24.
- Touches:
  - core/src/cuda/common.h: drop the <pthread.h> include; rename the reserved header guard __VMAF_SRC_CUDA_COMMON_H__ → VMAF_SRC_CUDA_COMMON_INCLUDED.
  - core/src/cuda/cuda_helper.cuh: #ifdef DEVICE_CODE guard around <cuda.h> vs <ffnvcodec/dynlink_loader.h>.
  - core/src/picture.h: #ifdef DEVICE_CODE guard around <cuda.h> + a forward-declared VmafCudaState vs <ffnvcodec/*> + the full libvmaf_cuda.h; rename the reserved header guard.
  - core/src/feature/integer_adm.h: updated comment above the dwt_7_9_YCbCr_threshold table noting the fork's positional-initializer shape vs upstream's #ifndef __CUDACC__ shape (see Fork carve-outs below).
  - core/src/feature/cuda/integer_adm/{adm_cm,adm_csf,adm_csf_den,adm_decouple,adm_dwt2}.cu: #ifndef DEVICE_CODE guard around #include "feature_collector.h".
  - core/src/meson.build: Windows nvcc plumbing (+70 LoC under host_machine.system() == 'windows'). This covers vswhere-based cl.exe discovery, MSVC + Windows SDK include-path injection, and CUDA version detection via nvcc --version. It also threads nvcc_ccbin_flags + nvcc_host_includes through every custom_target that invokes nvcc.
- Fork carve-outs (load-bearing on rebase):
  - integer_adm.h uses positional initializers, NOT upstream's #ifndef __CUDACC__ wrap. Both shapes resolve the MSVC/nvcc C++ designated-initializer issue. The positional form is C++-portable and keeps the table available to future .cu consumers. Keep the fork's form on rebase.
  - cuda_static_lib keeps dependencies : [pthread_dependency]. Upstream drops it, but the fork needs it because ring_buffer.c (built as part of cuda_static_lib) #includes <pthread.h> directly. On rebase, keep the fork's version.
  - meson.build gencode coverage block: the fork's ADR-0122 explicit cubin list (sm_75/80/86/89 + compute_80 PTX) sits after the new upstream nvcc-detect block. On rebase, re-assemble the same merged order: nvcc-detect first, then gencode coverage (both host-independent).
  - Header guards: the _INCLUDED spellings are fork-local (ADR-0148 precedent). Upstream keeps the reserved __VMAF_SRC_*_H__ spellings. On rebase, keep _INCLUDED.
- On upstream sync: PR #1472 is still OPEN. When it merges, re-diff the three conflict-resolved hunks against upstream's final form. Keep the fork's version of the four carve-outs above unless upstream meaningfully reshapes those regions.
- Re-test on rebase (Linux host with CUDA toolkit):
meson setup libvmaf core/build-cuda \
-Denable_cuda=true -Denable_nvcc=true -Denable_sycl=false
ninja -C core/build-cuda && meson test -C core/build-cuda
# Expect 6 .fatbin files generated + CLI linked + 35/35 tests pass.
Windows validation is operator-driven. CI does not yet have a Windows + MSYS2 + MinGW + MSVC BuildTools + CUDA runner (tracked as T7-3 in .workingdir2/OPEN.md).
- Prerequisites note (Windows only): nv-codec-headers must be built from git master commit 876af32 or later. The release tag n13.0.19.0 is missing cuMemFreeHost, cuStreamCreateWithPriority, cuLaunchHostFunc and other CudaFunctions members that libvmaf uses. This is a pre-existing issue, outside the scope of this port.
0058 — libvmaf.pc Cflags leak fix (ADR-0200)¶
- ADR: ADR-0200; bug-fix follow-up to entry 0057.
- Upstream source: fork-local. Netflix has no Vulkan backend.
- Touches:
  - core/subprojects/packagefiles/volk/meson.build: drops -include volk_priv_remap.h from volk_dep.compile_args; keeps -DVK_NO_PROTOTYPES.
  - core/src/vulkan/meson.build: pulls volk_priv_remap_h_path from the volk subproject and appends ['-include', <path>] to vmaf_cflags_common (private c_args on libvmaf's library() call).
- Invariants (load-bearing):
  - -include MUST stay off volk_dep.compile_args; otherwise it leaks into the static libvmaf.pc Cflags. Test on rebase: meson setup ... -Ddefault_library=static -Denable_vulkan=enabled, then grep Cflags meson-private/libvmaf.pc. The output must NOT contain volk_priv_remap or any build-dir absolute path.
  - -include MUST be applied to libvmaf's compile, because every libvmaf TU that calls volk's vk* API needs the rename macros active. The vmaf_cflags_common injection covers this for all libvmaf sub-libraries (libvmaf_feature, libvmaf_cpu, etc.).
  - The path comes from subproject('volk').get_variable(...), not from a hardcoded string, so it survives volk wrap version bumps.
- On upstream sync: zero upstream interaction.
- Re-test on rebase / volk wrap bump:
meson setup build-vk-static-test libvmaf -Denable_vulkan=enabled \
-Denable_cuda=false -Denable_sycl=false -Ddefault_library=static
ninja -C build-vk-static-test src/libvmaf.a
grep Cflags build-vk-static-test/meson-private/libvmaf.pc
# Expected: no `volk_priv_remap` substring, no build-dir absolute path
0057 — Volk vk* priv-remap for static-archive builds (ADR-0198)¶
- ADR: ADR-0198; follow-up to ADR-0185.
- Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
- Touches:
  - core/subprojects/packagefiles/volk/meson.build: overlay applied on top of the upstream volk wrap. It adds a custom_target that runs gen_priv_remap.py to produce volk_priv_remap.h from the upstream volk.h. It then wires -include of the generated header into volk.c's c_args and volk_dep's compile_args.
  - core/subprojects/packagefiles/volk/gen_priv_remap.py: fork-added generator script (regex against extern PFN_vkXxx vkXxx; declarations).
- Invariants (load-bearing):
  - The force-include must propagate to every libvmaf TU pulling in volk_dep (verified via the meson dep graph). Removing the -include from compile_args re-introduces the static-link multi-def cascade. Entry 0058 (ADR-0200) supersedes this in part: the libvmaf-side -include now lives in vmaf_cflags_common rather than volk_dep.compile_args. What must still hold is that every libvmaf TU sees the rename macros.
  - The generator regex matches every vk* PFN declaration in volk.h; this is confirmed for volk-1.4.341 (784 declarations, 784 remaps). When bumping the volk wrap version, re-run the generator. It is a configure-time custom target, so this happens automatically. Then confirm that the rename count printed to stdout matches the count of ^extern PFN_vk lines in the new volk.h.
  - The renamed symbols use the vmaf_priv_ prefix, chosen to match no upstream Netflix or Vulkan SDK identifier. Don't rename to _vk* (it collides with C's reserved-identifier namespace) or to vkv_* and the like.
- On upstream sync: zero upstream interaction. The volk wrap is a libvmaf-managed subproject, and Netflix doesn't ship a Vulkan backend.
- Re-test on rebase / after any volk wrap bump:
meson setup build-vk-static libvmaf -Denable_vulkan=enabled \
-Denable_cuda=false -Denable_sycl=false \
-Ddefault_library=static
ninja -C build-vk-static src/libvmaf.a
test "$(nm build-vk-static/src/libvmaf.a 2>/dev/null \
| grep -cE '^[0-9a-f]* (T|D|B|R) vk[A-Z]')" = "0" \
&& echo OK
(Followed by the BtbN-style link reproducer in the ADR References section.)
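The effect of the force-included remap header can be shown in a few lines of C. This is a sketch under stated assumptions. The generated header is taken to contain one #define vkXxx vmaf_priv_vkXxx line per PFN declaration. vkDemoFunction is a made-up name, not a real Vulkan entry point.

```c
#include <assert.h>
#include <string.h>

/* One illustrative line of the force-included remap header
 * (the real header is generated by gen_priv_remap.py). */
#define vkDemoFunction vmaf_priv_vkDemoFunction

/* Stand-in for a volk-style "PFN_vkXxx vkXxx;" global. The macro only
 * replaces whole tokens, so the typedef name PFN_vkDemoFunction is untouched. */
typedef void (*PFN_vkDemoFunction)(void);
PFN_vkDemoFunction vkDemoFunction;   /* actually defines vmaf_priv_vkDemoFunction */

#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* The symbol name that "vkDemoFunction" becomes after remapping. */
static const char *exported_name(void)
{
    return STRINGIFY(vkDemoFunction);
}
```

Every vk* global is renamed at preprocessing time, so the archive defines no vk[A-Z] symbols. That is why the nm count above is expected to be 0.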
0056 — SSIMULACRA 2 snapshot gate + fp-contract-off split (ADR-0164)¶
- ADR: ADR-0164
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
- Touches:
  - python/test/ssimulacra2_test.py: new fork-added Python test. It uses subprocess.call against ExternalProgram.vmafexec with --feature ssimulacra2, parses the --json output, and asserts pooled + per-frame scores.
- Invariants (load-bearing):
  - Pinned values are CPU-only, generated on master HEAD after the PR #100 merge. Re-generate them if the scalar path or any SIMD path changes semantically. Per ADR-0161/0162/0163's bit-exactness contract that shouldn't happen, because any bit-exact refactor leaves the pinned values unchanged.
  - Tolerance is 4 decimal places (places=4, so the difference must round to zero at four decimals). The CPU paths are bit-exact, so actual drift should be 0; the tolerance is defensive.
  - -ffp-contract=off applies everywhere in the ssimulacra2 pipeline: libvmaf_ssimulacra2_static_lib (scalar extractor), x86_ssimulacra2_avx2_lib, x86_ssimulacra2_avx512_lib, and arm64_ssimulacra2_lib (from ADR-0161). All four are split out of their umbrella libs so that other extractors keep upstream's default FMA policy. Without this, the CI GCC/clang hosts drifted ~2e-4 from my AVX-512 authoring host. The cause is that GCC 10+ defaults to -ffp-contract=fast on x86 with -mfma and on aarch64, which fuses a*b+c in the scalar glue around the SIMD calls. Do NOT remove any of these carve-outs on rebase.
  - The fixtures are already checked in. src01_hrc00/01_576x324 is also the primary Netflix golden fixture. The 160×90 derived one stresses the sub-176 pyramid-termination path.
  - Do NOT modify the Netflix golden assertions in quality_runner_test.py et al., because those are upstream-pinned. This test is a SEPARATE file that adds fork-specific scores.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future, cross-reference against their pinning if they add one.
- Re-test on rebase / after any ssimulacra2 change:
- Follow-ups:
  - Cross-reference the gate against libjxl tools/ssimulacra2 once the ssimulacra2_rs cargo install is fixed.
  - Expand fixture coverage if new YUV test assets land.
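One chosen input is enough to show why FMA contraction breaks bit-exactness. This is a minimal sketch with hypothetical helper names. The fused result is emulated in double, which is exact for these inputs, instead of calling fmaf, so the example needs no libm.

```c
#include <assert.h>

/* Product rounded to float before the add: the result -ffp-contract=off
 * guarantees. The volatile store forces the rounding even if the compiler
 * would otherwise fuse the operations or keep excess precision. */
static float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;
    return p + c;
}

/* Single-rounding result, the value an FMA produces. Emulated in double:
 * a float * float product is exact in double (24 + 24 <= 53 significand
 * bits), and for the inputs used below the double add is exact too. */
static float fused_mul_add(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = 1 + 2^-12, the exact product a * a is 1 + 2^-11 + 2^-24. Rounding it to float first drops the 2^-24 term (round-half-to-even), so with c = -(1 + 2^-11) the separately rounded result is 0 while the fused one is 2^-24.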
0055 — SSIMULACRA 2 picture_to_linear_rgb SIMD (ADR-0163)¶
- ADR: ADR-0163
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
- Touches:
  - ssimulacra2_avx2.{c,h}: new ssimulacra2_picture_to_linear_rgb_avx2 + helpers (read_plane_scalar_s2, srgb_to_linear_lane_avx2, compute_matrix_coefs).
  - ssimulacra2_avx512.{c,h}: 16-wide AVX-512 port.
  - ssimulacra2_neon.{c,h}: 4-wide aarch64 port.
  - ssimulacra2.c: new ptlr_fn field in Ssimu2State. The dispatch wrapper convert_picture_to_linear_rgb unpacks VmafPicture into simd_plane_t[3], and init assigns the AVX2/AVX-512/NEON pointers.
  - ssimulacra2_simd_common.h: new shared header declaring simd_plane_t, which decouples the SIMD TUs from the VmafPicture type.
  - test_ssimulacra2_simd.c: new test_ptlr_420_8, test_ptlr_420_10, test_ptlr_444_8, test_ptlr_444_10, test_ptlr_422_8 subtests + scalar references ref_read_plane, ref_srgb_to_linear, ref_picture_to_linear_rgb.
- Invariants (load-bearing):
  - Scalar-order matmul: G = Yn + cb_g * Un + cr_g * Vn is chained left-to-right in all three SIMD TUs. The regression test catches reordering drift (~1 ulp).
  - Per-lane scalar powf: a vector polynomial approximation would break scalar bit-exactness. Do not replace the lane spill/reload pattern with a vector libm.
  - simd_plane_t layout: all three SIMD TUs assume the {data, stride, w, h} ordering. The dispatch wrapper builds it from VmafPicture fields, so the layout must match.
  - Bounds clamping in read_plane_scalar_* mirrors the scalar reference verbatim (if (sx < 0) sx = 0; if (sx >= pw) sx = pw-1; etc.). Do not simplify it, because that removes per-lane safety at plane edges.
  - Arbitrary chroma ratios fall through to the int64_t multiplication branch. Don't remove it: SSIMULACRA 2 is supposed to accept non-standard ratios gracefully.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides a SIMD YUV→RGB path, diff it against the fork's. Preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out opens.
- Re-test on rebase:
ninja -C build && build/test/test_ssimulacra2_simd # 11/11
ninja -C build-aarch64 && \
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
build-aarch64/test/test_ssimulacra2_simd # 11/11
- Follow-ups:
  - T3-3 SSIMULACRA 2 snapshot-JSON regression test: still pending (gated on tools/ssimulacra2 availability).
  - SSIMULACRA 2 now has zero scalar hot paths. T3-1 closes in full with phases 1+2+3 (ADR-0161, 0162, 0163).
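A minimal scalar sketch of the two edge behaviours above: per-axis clamping, and an int64_t product for arbitrary chroma ratios. plane_view, read_clamped and luma_to_chroma_x are hypothetical names. The field types and the integer mapping formula are assumptions of this sketch, not the fork's exact expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative plane view. The field order follows the {data, stride, w, h}
 * layout described above; the field types are assumptions of this sketch. */
typedef struct {
    const uint8_t *data;
    ptrdiff_t stride;   /* bytes per row */
    int w, h;
} plane_view;

/* Edge-clamped read: out-of-range coordinates snap to the nearest edge
 * sample, using the same per-axis if-chains as the scalar reference. */
static float read_clamped(const plane_view *p, int sx, int sy)
{
    assert(p->w > 0 && p->h > 0);
    if (sx < 0) sx = 0;
    if (sx >= p->w) sx = p->w - 1;
    if (sy < 0) sy = 0;
    if (sy >= p->h) sy = p->h - 1;
    return (float)p->data[(ptrdiff_t)sy * p->stride + sx];
}

/* Map a luma column onto a chroma plane of arbitrary width. The product is
 * taken in int64_t so large frames cannot overflow int. */
static int luma_to_chroma_x(int x, int luma_w, int chroma_w)
{
    return (int)((int64_t)x * chroma_w / luma_w);
}
```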
0054 — SSIMULACRA 2 FastGaussian IIR blur SIMD (ADR-0162)¶
- ADR: ADR-0162
- Upstream source: fork-local. No SSIMULACRA 2 extractor in upstream Netflix/vmaf.
- Touches:
  - ssimulacra2_avx2.{c,h}: new ssimulacra2_blur_plane_avx2 + 2 helpers (hblur_8rows_avx2, vblur_simd_8cols_avx2).
  - ssimulacra2_avx512.{c,h}: 16-wide port.
  - ssimulacra2_neon.{c,h}: 4-wide aarch64 port; uses vsetq_lane_f32 in place of gather.
  - ssimulacra2.c: adds a blur_fn function pointer to Ssimu2State, dispatch in init_simd_dispatch(), and the call site in blur_3plane.
  - test_ssimulacra2_simd.c: new test_blur + scalar reference (ref_blur_plane, ref_fast_gaussian_1d).
- Invariants (load-bearing):
  - Row-batching lane layout: horizontal-pass lane i MUST hold row (y_base + i). Gather index vector entries are (y_base + i) * w (stride w). Changing this breaks bit-exactness vs scalar.
  - Scalar left-to-right summation order: n2_k * sum - d1_k * prev1_k - prev2_k is chained sequentially, and o0 + o1 + o2 at output time is (o0 + o1) + o2. Changing to (o0 + o2) + o1 or o0 + (o1 + o2) can drift by ~1 ulp, which the regression test catches.
  - col_state is 6 * w contiguous floats laid out as [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2]. SIMD loads assume this layout, so changing the field order requires updating all three SIMD TUs in lockstep with blur_plane.
  - NEON lane-set pattern: aarch64 has no gather intrinsic, so each input vector takes 4 explicit vsetq_lane_f32 calls. Do not replace them with an ld1 {v.s}[lane]-style pseudo-gather without re-verifying bit-exactness.
  - The scalar tail in the vertical pass matches the scalar reference body verbatim. Any deviation breaks memcmp equality on widths that aren't multiples of the SIMD width.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides their own IIR blur SIMD, diff it against the fork's. Preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out is opened.
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd # 6/6
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
build-aarch64/test/test_ssimulacra2_simd # 6/6
- Follow-ups:
  - picture_to_linear_rgb SIMD: the last scalar hot path in the extractor (2 calls / frame). Low ROI but mechanical.
  - T3-3 SSIMULACRA 2 snapshot-JSON regression test: still pending.
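The row-batching and col_state layout invariants amount to two index formulas. This sketch assumes a plane whose stride equals its width w. hblur_gather_offsets and col_state_at are hypothetical helpers, not functions in the SIMD TUs.

```c
#include <assert.h>
#include <stddef.h>

/* Horizontal-pass row batching: lane i of a batch starting at row y_base
 * holds row (y_base + i), so its gather offset is (y_base + i) * w for a
 * plane whose stride equals its width w. */
static void hblur_gather_offsets(int *offsets, int lanes, int y_base, int w)
{
    for (int i = 0; i < lanes; i++)
        offsets[i] = (y_base + i) * w;
}

/* col_state is 6 * w floats laid out as
 * [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2], w floats each.
 * Returns the slot for column x of prev1_k (is_prev2 == 0) or prev2_k. */
static float *col_state_at(float *col_state, size_t w, int is_prev2, int k, size_t x)
{
    assert(k >= 0 && k < 3 && x < w);
    return &col_state[((size_t)(is_prev2 ? 3 : 0) + (size_t)k) * w + x];
}
```

A SIMD vertical pass that loads prev1_k or prev2_k for several adjacent columns reads a contiguous run starting at col_state_at(..., x). That is why reordering the six segments forces a lockstep change in every TU.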
0053 — SSIMULACRA 2 SIMD bit-exact ports (ADR-0161)¶
- ADR: ADR-0161
- Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor at all (fork-added in ADR-0130).
- Touches:
  - ssimulacra2_avx2.c / .h: 5 AVX2 kernels + a per-lane cbrtf helper.
  - ssimulacra2_avx512.c / .h: 5 AVX-512 kernels; a mechanical 16-wide widening of the AVX2 path.
  - ssimulacra2_neon.c / .h: 5 NEON kernels; a 4-wide aarch64 mirror.
  - ssimulacra2.c: adds function-pointer dispatch fields to Ssimu2State + an init_simd_dispatch() helper; calls go through the pointers.
  - meson.build: registers the three SIMD TUs in x86_avx2_sources / x86_avx512_sources / arm64_sources.
  - test_ssimulacra2_simd.c and test/meson.build: new bit-exact test harness.
- Invariants (load-bearing):
  - Byte-for-byte bit-exactness to scalar on all 5 vectorised kernels under FLT_EVAL_METHOD == 0. A regression was caught pre-merge: naïve pairing (a+b)+(c+d) drifted by 1 ULP vs scalar ((a+b)+c)+d. Keep sequential scalar-order chains in all three SIMD TUs on rebase.
  - cbrtf is per-lane scalar libm, not a polynomial. Any replacement with a vector cbrt would drift the ssimulacra2 score and break the regression test. Keep the spill/reload pattern.
  - The ssim_map / edge_diff_map reductions use the ADR-0139 per-lane double scalar tail. Do NOT SIMD-reduce float lanes and then lift to double, because that changes the summation order.
  - The downsample_2x2 deinterleave uses ISA-appropriate ops: AVX2 vshufps + vpermpd, AVX-512 vpermt2ps, NEON vuzp1q_f32 + vuzp2q_f32. After the deinterleave, the sum order is ((r0e+r0o)+r1e)+r1o, matching scalar.
  - #pragma STDC FP_CONTRACT OFF sits at every TU header. aarch64 GCC ignores it (non-fatal -Wunknown-pragmas); it is kept for portability (clang, MSVC).
  - IIR blur + picture_to_linear_rgb stay scalar in this PR. Follow-up PRs target these. When they land, re-verify bit-exactness by expanding test_ssimulacra2_simd.
  - Runtime dispatch order: AVX-512 > AVX2 on x86; NEON on aarch64; scalar fallback. Preserve it on rebase.
- On upstream sync:
  - Upstream has no SSIMULACRA 2 extractor; nothing to merge.
  - If Netflix adopts SSIMULACRA 2 in the future, diff their implementation against the fork's scalar + SIMD TUs. Keep the fork's bit-exactness contract absent a specific Netflix-authority carve-out ADR.
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd # 5/5
clang-tidy -p build core/src/feature/x86/ssimulacra2_avx2.c \
core/src/feature/x86/ssimulacra2_avx512.c
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
build-aarch64/test/test_ssimulacra2_simd # 5/5
clang-tidy -p build-aarch64 \
core/src/feature/arm64/ssimulacra2_neon.c
- Follow-ups:
  - IIR blur vectorisation (blur_plane vertical-pass column batching): the biggest frame-level wall-clock win.
  - picture_to_linear_rgb per-lane powf: lower ROI but mechanical.
  - T3-3 SSIMULACRA 2 snapshot-JSON regression test: deferred by ADR-0130; still pending.
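Here is a scalar sketch of one output row of downsample_2x2. It shows the even/odd split that the SIMD deinterleave produces and the fixed sum order. The function name is hypothetical, and the 0.25f averaging factor is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar sketch of one output row of a 2x2 downsample. r0 and r1 are two
 * consecutive source rows of width 2 * out_w. The even/odd split mirrors
 * what the SIMD deinterleave produces; the sum order is fixed as
 * ((r0e + r0o) + r1e) + r1o. The 0.25f averaging factor is assumed. */
static void downsample_row_2x2(const float *r0, const float *r1, float *out, size_t out_w)
{
    assert(r0 != NULL && r1 != NULL && out != NULL);
    for (size_t x = 0; x < out_w; x++) {
        float r0e = r0[2 * x], r0o = r0[2 * x + 1];   /* even / odd lanes, row 0 */
        float r1e = r1[2 * x], r1o = r1[2 * x + 1];   /* even / odd lanes, row 1 */
        out[x] = (((r0e + r0o) + r1e) + r1o) * 0.25f;
    }
}
```

A SIMD port may produce r0e / r0o / r1e / r1o with any shuffle it likes. Only the order of the three additions is pinned.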
0052 — psnr_hvs SIMD bit-exact ports (ADR-0159 AVX2, ADR-0160 NEON)¶
- ADRs: ADR-0159 (AVX2), ADR-0160 (NEON sister port).
- Upstream source: fork-local. Upstream Netflix/vmaf has no psnr_hvs SIMD path.
- Touches:
  - core/src/feature/x86/psnr_hvs_avx2.c: AVX2 TU.
  - core/src/feature/x86/psnr_hvs_avx2.h: AVX2 header.
  - core/src/feature/arm64/psnr_hvs_neon.c: NEON TU (sister port, ADR-0160).
  - core/src/feature/arm64/psnr_hvs_neon.h: NEON header.
  - core/src/feature/third_party/xiph/psnr_hvs.c: adds PsnrHvsState + runtime dispatch in init() (AVX2 under ARCH_X86, NEON under ARCH_AARCH64). It also adds a scoped NOLINTBEGIN/END around the upstream Xiph scalar block, which is kept verbatim as the bit-exact reference.
  - core/src/meson.build: adds x86/psnr_hvs_avx2.c to x86_avx2_sources and arm64/psnr_hvs_neon.c to arm64_sources.
  - core/test/test_psnr_hvs_avx2.c, core/test/test_psnr_hvs_neon.c: bit-exact unit tests (x86 and aarch64 respectively).
  - core/test/meson.build: registers both tests under enable_asm, arch-gated.
- Invariants (load-bearing):
  - Bit-exactness to scalar: every od_coeff (int32) and every final psnr_hvs_{y,cb,cr,psnr_hvs} value the AVX2 path emits must be byte-identical to the scalar reference on the Netflix golden pairs. If a rebase introduces any pattern that breaks this (e.g. a floating-point horizontal reduce in the mask accumulator), the unit test test_psnr_hvs_avx2 will fail. Don't relax the assertions; fix the SIMD path.
  - DCT butterfly layout: butterfly → transpose → butterfly → transpose. The transpose lives inside od_bin_fdct8x8_avx2. Do not move it.
  - Float accumulators stay scalar. The means / variances / mask / error accumulation in calc_psnrhvs_avx2 uses the same per-block scalar loop as scalar psnr_hvs, so it is bit-exact by construction. Do not vectorize these with horizontal reductions without replicating ADR-0139's per-lane scalar-float reduction pattern.
  - The cross-block error accumulator ret is threaded through accumulate_error() by pointer, not returned-then-summed. Each of the 64 per-coefficient contributions per block must hit the outer ret directly, matching scalar's inline ret += ... at third_party/xiph/psnr_hvs.c line 355. IEEE-754 float addition is non-associative. Summing into a local float and then adding the per-block total to ret changes the summation tree and drifts the Netflix golden by ~5.5e-5.
  - #pragma STDC FP_CONTRACT OFF at the TU header disables FMA formation. It is required because fmaf(a, b, c) can differ from (a*b)+c by 1 ulp, which breaks bit-exactness. Do not remove the pragma, and do not add -ffp-contract=fast to the build flags for this TU.
  - NOLINT suppressions are load-bearing; each cites ADR-0141 inline. The reasons covered are:
    - bit-exactness scalar-diff auditability for the 30-butterfly function;
    - scalar float→double promotion for sqrt;
    - extractor-registry extern linkage for vmaf_fex_psnr_hvs;
    - the upstream-Xiph scoped block, kept for rebase parity.
- On upstream sync:
  - Upstream has no psnr_hvs SIMD as of 2026-04-24. Keep the fork's version on conflict.
  - If upstream ever touches psnr_hvs.c for non-SIMD reasons (e.g. a masking-table update), rebase the AVX2 TU to match line-for-line. Then re-run test_psnr_hvs_avx2 to confirm that bit-exactness survives.
  - The NEON sister port (arm64/psnr_hvs_neon.c, ADR-0160) mirrors this ADR's invariants. On rebase, the two SIMD TUs must stay in lock-step with the scalar reference.
- Re-test on rebase:
ninja -C build
meson test -C build test_psnr_hvs_avx2
# Expect: 5/5 subtests pass (DCT bit-exact on 3 random seeds +
# delta + constant input).
# CLI-level bit-exactness on Netflix golden (requires the YUV
# fixtures in python/test/resource/yuv/):
# VMAF_CPU_MASK=0 (scalar)
# VMAF_CPU_MASK=255 (AVX2 enabled)
# Diff per-frame psnr_hvs_{y,cb,cr,psnr_hvs} XML fields; expect
# byte-identical across all 3 golden pairs.
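Two contributions that are each half an ulp of 1.0f are enough to demonstrate the accumulate-by-pointer invariant. This is a minimal sketch. add_per_coeff and add_per_block are hypothetical names, not the fork's accumulate_error().

```c
#include <assert.h>
#include <stddef.h>

/* Required order: every per-coefficient contribution is added straight
 * into the running total, like the inline "ret += ..." in the reference. */
static void add_per_coeff(float *ret, const float *contrib, size_t n)
{
    assert(ret != NULL && contrib != NULL);
    for (size_t i = 0; i < n; i++)
        *ret += contrib[i];
}

/* Forbidden order: sum the block locally, then add the block total once. */
static void add_per_block(float *ret, const float *contrib, size_t n)
{
    float block = 0.0f;
    for (size_t i = 0; i < n; i++)
        block += contrib[i];
    *ret += block;
}
```

Start from ret = 1.0f with two contributions of 2^-24. Adding each one directly leaves ret at 1.0f, because each add rounds back down (half-to-even). Summing the block first gives 2^-23, which survives the final add. This is the mechanism behind the ~5.5e-5 golden drift noted above.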
0051 — Netflix#1486 motion updates verified present (ADR-0158)¶
- ADR: ADR-0158
- Upstream source: Netflix upstream PR #1486 ("Port motion updates"), MERGED 2026-04-20 as commits a44e5e6 (code) + 62f47d5 (Netflix golden updates).
- Touches: documentation-only. The code changes this ADR documents are already in the fork's master via earlier incremental motion3 / blend / five-frame-window commits.
- Invariants (load-bearing for future /sync-upstream):
  - The edge_8 mirror fix (i_tap = height - (i_tap - height + 2)) is present at integer_motion.c:240, x86/motion_avx2.c:147 and x86/motion_avx512.c:147. If upstream's mirror line ever diverges again, this is the hunk to watch.
  - The motion_max_val feature option is at integer_motion.c:57,118-120 with default 10000.0 and the FEATURE_PARAM flag. Upstream's default equals the fork's; don't let them drift.
  - VMAF_integer_feature_motion3_score output plumbing is in integer_motion.c + alias.c.
  - Fork-local motion extensions (five-frame-window, moving-average, blend, fps_weight) are ADDITIONS on top of Netflix#1486; they are not upstream. Upstream changes to motion extractor internals may conflict with them. Diff against core/src/feature/integer_motion.c on every rebase, and check that the fork's MIN(s->score * s->motion_fps_weight, s->motion_max_val) invocations are preserved (lines ~409, ~503).
- On upstream sync: nothing to port from Netflix#1486, since it's absorbed. If a future upstream PR touches the same code paths, prefer upstream's version for the scalar/edge handling and the fork's version for the five-frame-window / blend extensions.
- Re-test on rebase:
ninja -C build
meson test -C build
# Expect: 35/35 pass.
# Verify the upstream markers are still in place after rebase:
grep -n "height - (i_tap - height + 2)\|motion_max_val\|VMAF_integer_feature_motion3_score" \
core/src/feature/integer_motion.c \
core/src/feature/alias.c \
core/src/feature/x86/motion_avx2.c \
core/src/feature/x86/motion_avx512.c
# Expect: matches at all 4 files. If any missing, the rebase
# silently dropped the Netflix#1486 content — investigate.
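The clamp that the rebase check looks for reduces to the following. This sketch uses a hypothetical function name and plain double arguments in place of the extractor's state struct.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Frame-rate-weighted motion score capped at motion_max_val (default
 * 10000.0 per the notes above), in the MIN(...) shape to preserve. */
static double weighted_motion_score(double score, double motion_fps_weight,
                                    double motion_max_val)
{
    assert(motion_max_val > 0.0);
    return MIN(score * motion_fps_weight, motion_max_val);
}
```

The weighting is applied before the cap. A rebase that moved the cap inside the weighting, or dropped it, would change scores only once the weighted value exceeds motion_max_val.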
0050 — CUDA preallocation memory leak fix + vmaf_cuda_state_free (ADR-0157)¶
- ADR: ADR-0157
- Upstream source: Netflix upstream issue #1300 (OPEN since 2024; no maintainer fix as of 2026-04-24). User reports GPU memory rises monotonically across init/preallocate/fetch/close cycles.
- Touches:
  - core/include/libvmaf/libvmaf_cuda.h: new public vmaf_cuda_state_free() API declaration.
  - core/src/cuda/common.c:
    - new vmaf_cuda_state_free() implementation;
    - vmaf_cuda_release() now calls cuda_free_functions();
    - vmaf_cuda_state_init() gets an outer failure unwind;
    - init_with_primary_context() releases the retained primary context on fail_after_pop.
  - core/src/cuda/ring_buffer.c: vmaf_ring_buffer_close() then unlocked + destroyed the mutex before freeing. This file has since been folded into the per-stream dispatch + drain machinery; see core/src/cuda/dispatch_strategy.c and core/src/cuda/drain_batch.c.
  - core/test/test_cuda_preallocation_leak.c: new GPU-gated reducer (10-cycle loop with full cleanup).
  - core/test/test_cuda_pic_preallocation.c, core/test/test_cuda_buffer_alloc_oom.c: add the missing vmaf_cuda_state_free() + vmaf_model_destroy() calls after vmaf_close() in every test that allocates these.
  - core/test/meson.build: registers the new reducer under the enable_cuda guard.
- Invariants (load-bearing):
  - Public contract: every caller of vmaf_cuda_state_init() MUST call vmaf_cuda_state_free() AFTER vmaf_close() on any VmafContext that imported the state. An informal free(cu_state) is a silent double-free hazard AFTER close. vmaf_close's vmaf_cuda_release already memsets + frees the CudaFunctions internals, and vmaf_cuda_state_free only frees the heap allocation itself.
  - vmaf_cuda_release() frees CudaFunctions via a saved pointer AFTER the memset. Order matters: the memset runs first so cu_state->f is zeroed in the caller's struct, then the free goes through the saved local. Do not re-order.
  - vmaf_ring_buffer_close() unlocks BEFORE destroying the mutex (POSIX requires the mutex to be unlocked for destroy).
  - The cold-start unwind in init_with_primary_context releases cuDevicePrimaryCtxRetain's retained context if cuStreamCreateWithPriority fails.
  - The ADR-0122 / ADR-0123 is_cudastate_empty() null-guards at the top of every public vmaf_cuda_* entry must continue to compose with the new vmaf_cuda_state_free(). That function accepts NULL directly and doesn't call through to the CUDA API.
  - The new free call order in callers is vmaf_close(vmaf) → vmaf_cuda_state_free(cu_state) → vmaf_model_destroy(model). Reversing the first two produces a use-after-free.
- On upstream sync:
  - Upstream has no vmaf_cuda_state_free() as of 2026-04-24. Keep the fork's version on any conflict. If upstream eventually lands the same API with a different spelling, prefer upstream's spelling and add a compat alias, but do not break the fork's ABI.
  - vmaf_cuda_release()'s cuda_free_functions() call is fork-local. On rebase, keep it.
  - The ring-buffer pthread_mutex_unlock + pthread_mutex_destroy pair is fork-local. On rebase, keep it.
  - If upstream refactors VmafCudaState ownership semantics, re-audit this ADR and the new public API. This is unlikely, since their historical pattern has been "leaked state in a long-lived process is acceptable".
- Re-test on rebase:
```shell
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 40/40 pass including test_cuda_preallocation_leak.
# ASan leak-check:
cd libvmaf && meson setup build-asan-cuda \
  -Db_sanitize=address -Denable_cuda=true -Denable_sycl=false \
  --buildtype=debug
ninja -C build-asan-cuda
ASAN_OPTIONS='detect_leaks=1:leak_check_at_exit=1' \
  build-asan-cuda/test/test_cuda_preallocation_leak
# Expect: 0 bytes leaked from core/src/* frames.
# (~180 bytes in libcuda.so.1 is expected: the driver's process-lifetime
# cuInit cache, which does not grow per cycle.)
```
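Two of the 0050 ordering rules can be modelled in plain C. The first is to free the function table through a saved pointer only after the memset. The second is to unlock the ring-buffer mutex before destroying it. This is a minimal sketch with hypothetical DemoState / DemoRing types standing in for VmafCudaState and the old ring buffer. It is not the fork's code and does not touch the CUDA API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the fork's CudaFunctions / VmafCudaState /
 * ring-buffer types. They only model the two ordering rules. */
typedef struct DemoFunctions { int loaded; } DemoFunctions;
typedef struct DemoState { DemoFunctions *f; int device; } DemoState;
typedef struct DemoRing { pthread_mutex_t lock; } DemoRing;

/* Models the vmaf_cuda_release() rule: save the pointer, zero the
 * caller-visible struct first (so s->f reads back as NULL), then free
 * through the saved local. */
static void demo_release(DemoState *s)
{
    DemoFunctions *saved = s->f;
    memset(s, 0, sizeof(*s));
    free(saved);
}

/* Models the vmaf_ring_buffer_close() rule: the caller holds the lock on
 * entry; unlock it before destroying, because POSIX leaves destroying a
 * locked mutex undefined. */
static int demo_ring_close_locked(DemoRing *r)
{
    int err = pthread_mutex_unlock(&r->lock);
    if (err != 0)
        return err;
    return pthread_mutex_destroy(&r->lock);
}
```

Reversing either pair would leave a dangling s->f visible to the caller, or destroy a mutex that is still held.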
0049 — CUDA graceful error propagation (ADR-0156)
- ADR: ADR-0156
- Upstream source: Netflix upstream issue #1420 (OPEN as of 2026-04-24). Reports that when two VMAF-CUDA processes run concurrently, the second one crashes in vmaf_cuda_buffer_alloc because CHECK_CUDA(cuMemAlloc) hits assert(0) on OOM.
- Touches:
  - core/src/cuda/cuda_helper.cuh — redefined the CHECK_CUDA family: new macros CHECK_CUDA_GOTO + CHECK_CUDA_RETURN + helper vmaf_cuda_result_to_errno. The old assert(0) semantics are removed entirely.
  - core/src/cuda/common.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c — all CHECK_CUDA(...) sites converted; cleanup labels added where contexts / buffers were pushed / allocated.
  - core/src/feature/cuda/integer_motion_cuda.c, integer_vif_cuda.c, integer_adm_cuda.c — same conversion; 12 static helpers promoted from void to int.
  - core/test/test_cuda_buffer_alloc_oom.c — new GPU-gated reducer.
  - core/test/meson.build — register the new test under the enable_cuda guard.
- Invariants (load-bearing):
  - CHECK_CUDA_GOTO / CHECK_CUDA_RETURN must never call assert(0) or abort() on a CUDA error. Any regression back to the upstream abort-on-error semantics re-introduces Netflix#1420 and the NDEBUG footgun (under NDEBUG, assert(0) compiles away and the failed call is silently ignored).
  - Every CHECK_CUDA_GOTO target label must pop any previously pushed CUDA context and free any partially constructed buffers before returning the errno. The graceful path must not leak resources.
  - vmaf_cuda_result_to_errno uses numeric CUresult values directly (0 / 1 / 2 / 3 / 4 / 101 / 201 / 400) so host TUs that don't include <cuda.h> can transitively consume the mapping via the inline function. If upstream renumbers CUresult enum values (historically stable; they have been fixed since CUDA 1.0), re-audit the switch.
  - The ADR-0122 / ADR-0123 is_cudastate_empty(...) guards at the top of every public vmaf_cuda_* entry point must stay; they run before the CUDA API is touched and compose cleanly with the new error propagation.
  - Twelve static helper signatures in the feature extractors are int-returning (previously void): any upstream port that restores the void return silently regresses the error path.
- On upstream sync:
  - Upstream Netflix still uses assert(0) in CHECK_CUDA as of 2026-04-24. Keep the fork's macro definitions in cuda_helper.cuh on any upstream conflict; this file is fork-local behaviour.
  - If upstream eventually fixes Netflix#1420 with a similar refactor, prefer the fork's version unless upstream's has identical semantics (no assert(0), no abort(), translates CUresult to -errno). Re-verify test_cuda_buffer_alloc_oom after rebase.
  - If upstream adds new CHECK_CUDA(...) sites in a port, rewrite them to CHECK_CUDA_GOTO / CHECK_CUDA_RETURN as part of the port commit.
  - If upstream changes any of the 12 static helper signatures back to void, re-promote them to int during the merge.
- Re-test on rebase:
```shell
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 39/39 pass including test_cuda_buffer_alloc_oom.
# Reducer check — verify the OOM-to-errno path is live:
meson test -C core/build-cuda test_cuda_buffer_alloc_oom -v
# Expect subtests: request 1 TiB → -ENOMEM; request 0 bytes → 0.
clang-tidy -p core/build-cuda --quiet \
  core/src/cuda/common.c \
  core/src/cuda/picture_cuda.c \
  core/src/feature/cuda/integer_motion_cuda.c \
  core/src/feature/cuda/integer_vif_cuda.c \
  core/src/feature/cuda/integer_adm_cuda.c \
  core/src/libvmaf.c
# Expect exit 0 on every file.
```
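The sketch below is a minimal illustration of the goto-cleanup shape that CHECK_CUDA_GOTO enables. Everything in it is hypothetical:

- demo_curesult stands in for CUresult. Only the values 0, 1 and 2 are given meaning, and they match CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE and CUDA_ERROR_OUT_OF_MEMORY in the CUDA driver API.
- demo_mem_alloc fakes cuMemAlloc with an arbitrary 1 GiB cap.
- The errno mapping, including the -EIO default, is illustrative and is not the fork's vmaf_cuda_result_to_errno table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CUresult (0 = success, 1 = invalid value,
 * 2 = out of memory, as in the CUDA driver API). */
typedef int demo_curesult;

/* Illustrative mapping only; not the fork's vmaf_cuda_result_to_errno. */
static inline int demo_result_to_errno(demo_curesult r)
{
    switch (r) {
    case 0:  return 0;
    case 1:  return -EINVAL;
    case 2:  return -ENOMEM;
    default: return -EIO;
    }
}

/* Same shape as CHECK_CUDA_GOTO: store the errno and jump to a cleanup
 * label. No assert(0), so NDEBUG builds behave identically. */
#define DEMO_CHECK_GOTO(call, err, label)   \
    do {                                    \
        (err) = demo_result_to_errno(call); \
        if ((err) != 0)                     \
            goto label;                     \
    } while (0)

/* Fake cuMemAlloc: reports out-of-memory above an arbitrary 1 GiB cap. */
static demo_curesult demo_mem_alloc(void **out, size_t bytes)
{
    if (bytes > ((size_t)1 << 30))
        return 2;
    *out = malloc(bytes ? bytes : 1);
    return *out ? 0 : 2;
}

/* Allocate two buffers; on failure free whatever was built and return a
 * negative errno instead of aborting. */
static int demo_alloc_pair(size_t a, size_t b, void **pa, void **pb)
{
    int err = 0;
    *pa = NULL;
    *pb = NULL;
    DEMO_CHECK_GOTO(demo_mem_alloc(pa, a), err, fail);
    DEMO_CHECK_GOTO(demo_mem_alloc(pb, b), err, fail);
    return 0;
fail:
    free(*pa);
    free(*pb);
    *pa = NULL;
    *pb = NULL;
    return err;
}
```

The failing second allocation in demo_alloc_pair mirrors what test_cuda_buffer_alloc_oom checks at a larger scale. The call returns -ENOMEM and the first buffer is released rather than leaked.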
0049 — compute_motion / picture_copy signature changes (b949cebf upstream port)
- Upstream commit: Netflix/vmaf b949cebf (feature/motion: port several feature extractor options)
- Prerequisite commit: Netflix/vmaf d3647c73 (picture_copy: add channel parameter)
- PR: upstream/port-b949cebf-motion
Rebase-sensitive invariants:
- compute_motion signature change — compute_motion() in core/src/feature/motion.c / motion.h now takes an extra int motion_decimate parameter (the motion_add_scale1 flag). Any new caller added in the fork must pass this parameter. The SIMD integer motion callers (motion_avx2.c, motion_avx512.c) do NOT call compute_motion(); they use the SAD/convolution dispatch table directly and are unaffected.
- vmaf_image_sad_c signature change — similarly gains int motion_add_scale1. Any caller in the fork must be updated. It is currently called only from compute_motion() internally.
- picture_copy signature change — gains int channel as the last parameter (0 = Y, 1 = U, 2 = V). Every caller in the tree has been updated to pass 0 (luma). When adding new callers that need UV planes, pass 1 or 2. The fork's CUDA/SYCL/Vulkan callers have been updated in this PR.
- Default behavior preserved — all new options default to no-op values: motion_add_scale1=false, motion_add_uv=false, motion_blend_factor=1.0, motion_fps_weight=1.0, motion_filter_size=5 (= DEFAULT_MOTION_FILTER_SIZE). Integer and float motion2 scores are bit-identical to the pre-port baseline.
- vif_scale_frame_s dependency avoided — upstream b949cebf motion.c uses vif_scale_frame_s from vif_tools.h. The fork does not have this function yet (the vif options chain is deferred; Research-0024 Strategy E). The bilinear downscaler for motion_add_scale1 is implemented as local static functions in motion.c (motion_scale_bilinear, motion_bilinear_interp, motion_mirror_f). When upstream's vif options chain is eventually ported, reconcile by replacing these local functions with vif_scale_frame_s.
Reproducer:
```shell
# verify bit-exactness (default options, scores must be identical):
./core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --model path=model/vmaf_v0.6.1.json \
  --feature motion --no_prediction --json --output /tmp/motion.json
# integer_motion2 scores must match pre-port baseline at 6 decimal places.
```
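When the local downscaler is later reconciled with vif_scale_frame_s, it helps to have the technique written out. The sketch below is a 2× bilinear downscale with mirrored borders. The names demo_mirror, demo_bilinear and demo_downscale_2x are hypothetical. The fork's motion_scale_bilinear, motion_bilinear_interp and motion_mirror_f may use a different sampling-centre convention or mirror rule, so treat this only as an illustration of the technique.

```c
#include <assert.h>

/* Reflect an out-of-range index back into [0, n) without repeating the
 * edge sample: -1 -> 1 and n -> n - 2. Valid for overshoots of up to
 * n - 1 samples. */
static int demo_mirror(int i, int n)
{
    assert(n >= 2);
    if (i < 0)
        i = -i;
    if (i >= n)
        i = 2 * (n - 1) - i;
    return i;
}

/* Bilinear sample of a w x h float plane at (x, y). Coordinates are
 * non-negative in this sketch, so (int) truncation equals floor. */
static float demo_bilinear(const float *img, int w, int h, float x, float y)
{
    int x0 = (int)x, y0 = (int)y;
    float fx = x - (float)x0, fy = y - (float)y0;
    int xa = demo_mirror(x0, w), xb = demo_mirror(x0 + 1, w);
    int ya = demo_mirror(y0, h), yb = demo_mirror(y0 + 1, h);
    float top = img[ya * w + xa] * (1.0f - fx) + img[ya * w + xb] * fx;
    float bot = img[yb * w + xa] * (1.0f - fx) + img[yb * w + xb] * fx;
    return top * (1.0f - fy) + bot * fy;
}

/* Halve both dimensions; output pixel (x, y) samples the source at its
 * footprint centre, (2x + 0.5, 2y + 0.5) in source coordinates. */
static void demo_downscale_2x(const float *src, int w, int h, float *dst)
{
    int dw = w / 2, dh = h / 2;
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++)
            dst[y * dw + x] = demo_bilinear(src, w, h, 2.0f * x + 0.5f,
                                            2.0f * y + 0.5f);
}
```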
0048 — i4_adm_cm int32 rounding overflow deliberately preserved (ADR-0155)
- ADR: ADR-0155
- Upstream source: Netflix upstream issue #955 (OPEN since 2020; no maintainer response as of 2026-04-24). Reports that, for ADM scales 1–3, add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1)) in core/src/feature/integer_adm.c overflows int32_t (1u << 31 = 0x80000000 wraps to -2147483648). The rounding term is therefore sign-negated, and ADM scales 1–3 are biased low by ≈1 LSB per summed term.
- Touches (documentation-only):
  - docs/adr/0155-adm-i4-rounding-deferred-netflix-955.md — new ADR (this entry's anchor).
  - core/src/feature/integer_adm.c — in-file warning comment above the overflow site (the add_bef_shift_flt[] initialiser loop around line 1277). No code change.
  - core/src/feature/AGENTS.md — invariant note under "Rebase-sensitive invariants".
- Invariants (load-bearing — do NOT silently "fix"):
  - integer_adm.c keeps int32_t add_bef_shift_flt[3] with the overflowing 1u << 31 assignment. The Netflix golden assertions (python/test/quality_runner_test.py, vmafexec_test.py, feature_extractor_test.py) encode the buggy ADM output. Project hard rule #1 (ADR-0024) prohibits changing those assertions.
  - Any "fix" that changes ADM numerical output must land together with a coordinated Netflix-authored golden-number update (the ADR-0142 Netflix-authority carve-out). Until Netflix#955 closes upstream, there is no authority to track.
- On upstream sync:
  - If Netflix finally lands a fix for #955 (widening the rounding term to uint32_t or int64_t), sync the C-side fix AND the updated assertAlmostEqual values in the same merge. Re-run make test-netflix-golden and /cross-backend-diff on the golden pairs to verify that the new numbers are consistent across CPU / CUDA / SYCL.
  - In that case, also remove the in-file warning comment above the add_bef_shift_flt initialiser loop, flip ADR-0155 to Superseded by ADR-NNNN, and drop this rebase-notes entry.
  - If upstream instead closes #955 as won't-fix, keep this entry verbatim and update the ADR status to note upstream's closure.
- Re-test on rebase (gates the invariant by confirming the golden numbers are unchanged):
```shell
ninja -C build
make test-netflix-golden
# Expect: VMAF mean 76.66890… on src01_hrc00/01_576x324 golden
# pair — bit-identical to pre-rebase.
```
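The sketch below is a small C model of the Netflix#955 overflow. It assumes shift_flt[idx] == 32, which is what makes the term 1u << 31. It shows the stored rounding term flipping sign and the resulting 1-LSB low bias after a round-then-shift. The demo_* names are hypothetical. The int64 accumulator is an assumption about the shape of the summation, and int64_t is only one of the widenings an upstream fix might choose. Converting 0x80000000 to int32_t is implementation-defined in C; gcc and clang document it as wrap-around modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Shape of the Netflix#955 initialiser: the rounding term for a shift of
 * `shift` bits is stored in an int32_t. For shift == 32 the value
 * 0x80000000 does not fit; gcc/clang wrap it to INT32_MIN. */
static int32_t demo_round_term_i32(unsigned shift)
{
    return (int32_t)(1u << (shift - 1));
}

/* A widened term, as an upstream fix might use (int64_t is an assumption). */
static int64_t demo_round_term_i64(unsigned shift)
{
    return (int64_t)1 << (shift - 1);
}

/* Round-to-nearest fixed-point normalisation: add half, then shift. */
static int64_t demo_round_shift(int64_t acc, int64_t term, unsigned shift)
{
    return (acc + term) >> shift;
}
```

With an accumulator worth 1.5 in Q32, the correct term rounds up to 2. The sign-flipped int32_t term lands on 1 instead, one LSB low, which matches the bias described above.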
0047 — vmaf_score_pooled -EAGAIN for pending features (ADR-0154)
- ADR: ADR-0154
- Upstream source: Netflix upstream issue #755 (OPEN as of 2026-04-24). The upstream maintainer closed the door on the streaming use case in 2020 ("you cannot call vmaf_score_pooled() in a loop"); the fork reopens it via error-code semantics without changing the retroactive-write design.
- Touches:
  - core/src/feature/feature_collector.c — vmaf_feature_collector_get_score returns -EAGAIN (was -EINVAL) when the requested index is valid but not yet written.
  - core/src/feature/feature_collector.h — the inline vmaf_feature_vector_get_score now returns -EINVAL for null/out-of-range and -EAGAIN for not-written (was -1 for both). Added #include <errno.h>. Renamed the reserved __VMAF_FEATURE_COLLECTOR_H__ include guard to VMAF_FEATURE_COLLECTOR_INCLUDED.
  - core/test/test_score_pooled_eagain.c — new 4-subtest reducer.
  - core/test/meson.build — register the new test.
- Invariants (load-bearing, enforced by the reducer):
  - vmaf_feature_collector_get_score(fc, name, &score, i) returns -EAGAIN iff the feature name is registered and i is in range but score[i].written == false.
  - The return stays -EINVAL for (a) null pointers, (b) i >= feature_vector->capacity, (c) an unknown feature name.
  - The inline fast path vmaf_feature_vector_get_score uses the same split.
- On upstream sync: upstream has not changed the error semantics since 2020. If they do (unlikely), keep the fork's -EAGAIN; it is strictly more informative, and downstream code that depends on the split would regress.
- Re-test on rebase:
```shell
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: 4/4 subtests pass.
# Reducer check:
git stash push core/src/feature/feature_collector.c core/src/feature/feature_collector.h
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: Fail: 1 (tests fail without -EAGAIN split).
git stash pop
```
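The -EINVAL / -EAGAIN split lets a streaming caller tell "not written yet, try again after more frames" apart from a real error. Below is a toy model of the split using hypothetical DemoVector / DemoScore types. It does not reproduce the fork's feature-vector layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy feature vector; not the fork's real layout. */
typedef struct DemoScore { bool written; double value; } DemoScore;
typedef struct DemoVector { DemoScore *score; unsigned capacity; } DemoVector;

/* Same error split as the 0047 invariant: -EINVAL for bad arguments or an
 * out-of-range index, -EAGAIN for a valid index that is not written yet. */
static int demo_get_score(const DemoVector *fv, unsigned i, double *out)
{
    if (!fv || !out)
        return -EINVAL;
    if (i >= fv->capacity)
        return -EINVAL;
    if (!fv->score[i].written)
        return -EAGAIN;
    *out = fv->score[i].value;
    return 0;
}
```

A polling loop would treat -EAGAIN as a cue to feed more pictures and retry, and would stop on any other negative value.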
0046 — float_ms_ssim min-dim guard (ADR-0153)
- ADR: ADR-0153
- Upstream source: Netflix upstream issue #1414 (OPEN as of 2026-04-24). No upstream fix has landed; the fork adds the guard independently.
- Touches:
  - core/src/feature/float_ms_ssim.c — add #include "log.h" + #include "iqa/ssim_tools.h" + a min_dim = GAUSSIAN_LEN << (SCALES - 1) check at the start of init; extract the SIMD dispatch into a new ms_ssim_init_simd_dispatch helper to keep init within the ADR-0141 60-line budget.
  - core/test/test_float_ms_ssim_min_dim.c — new 3-subtest reducer.
  - core/test/meson.build — register the new test executable.
- Invariant (load-bearing, enforced by the reducer):
  - float_ms_ssim.init returns -EINVAL when w < 176 || h < 176, where 176 is computed from the filter constants rather than hardcoded. Changing SCALES or GAUSSIAN_LEN upstream therefore updates the minimum automatically.
- On upstream sync: if Netflix upstream lands a similar init-time guard, keep the fork's version. The helper name ms_ssim_init_simd_dispatch is fork-local (introduced to satisfy ADR-0141), so upstream's patch won't match. Both guards should be compatible; re-verify the reducer after rebase.
- Re-test on rebase:
```shell
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: 3/3 subtests pass.
# Reducer check (confirms the guard is load-bearing):
git stash push core/src/feature/float_ms_ssim.c
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: Fail: 1 (tests fail without the guard).
git stash pop
```
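The formula presumably encodes the pyramid constraint. Each of the SCALES levels halves the image, and the coarsest level must still fit one GAUSSIAN_LEN-tap window, so each side needs at least GAUSSIAN_LEN × 2^(SCALES - 1) pixels. The sketch below implements that check. It assumes GAUSSIAN_LEN = 11 and SCALES = 5, which is consistent with the 176 quoted above; confirm the real values against iqa/ssim_tools.h and float_ms_ssim.c.

```c
#include <assert.h>
#include <errno.h>

/* Assumed constants: 11 << (5 - 1) == 176 matches the quoted minimum. */
#define DEMO_GAUSSIAN_LEN 11u
#define DEMO_SCALES 5u

/* Reject inputs whose coarsest pyramid level would be narrower or shorter
 * than one Gaussian window. */
static int demo_ms_ssim_check_dims(unsigned w, unsigned h)
{
    const unsigned min_dim = DEMO_GAUSSIAN_LEN << (DEMO_SCALES - 1u);
    if (w < min_dim || h < min_dim)
        return -EINVAL;
    return 0;
}
```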
0045 — vmaf_read_pictures monotonic-index guard (ADR-0152)
- ADR: ADR-0152
- Upstream source: Netflix upstream issue #910 (OPEN as of 2026-04-24). No upstream fix has landed; the fork adds the guard independently, following the 2021-10-14 maintainer comment that recommended exactly this shape.
- Touches:
  - core/src/libvmaf.c — add unsigned last_index + bool have_last_index fields to VmafContext; prepend a monotonic-index check inside read_pictures_validate_and_prep (returns -EINVAL on duplicates / regressions); update the two new fields at the tail of the same helper on success.
  - core/test/test_read_pictures_monotonic.c — new 3-subtest reducer covering the Netflix#910 sequence and the two classes of rejection (duplicate, out-of-order).
  - core/test/meson.build — register the new test executable.
- Invariant (load-bearing, enforced by the reducer):
  - vmaf_read_pictures(vmaf, ref, dist, index) returns -EINVAL when have_last_index && index <= last_index. A flush (vmaf_read_pictures(vmaf, NULL, NULL, 0)) routes to flush_context before the guard runs, so flushing stays available regardless of the last accepted index.
- On upstream sync:
  - If Netflix upstream eventually lands a similar guard at the API boundary, keep the fork's version. The helper function name (read_pictures_validate_and_prep) is fork-local (ADR-0146), so upstream's patch will target a different insertion point. Both guards should be compatible; re-verify the reducer after rebase.
  - If upstream instead lands an internal reordering mechanism (buffer-and-sort frames before dispatch), revisit this decision; the fork's API-level contract is stricter and may need to relax to match. Open a new ADR if so.
- Re-test on rebase:
```shell
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: 3/3 subtests pass.
# Reducer check (confirms the guard is load-bearing):
git stash push core/src/libvmaf.c
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: Fail: 1 (the test rejects the un-guarded behaviour).
git stash pop
```
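The guard reduces to a two-field state machine. The minimal sketch below uses a hypothetical DemoCtx that carries only last_index and have_last_index. In the fork the check lives inside read_pictures_validate_and_prep, and the flush call never reaches it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cut-down context: only the two fields the 0045 entry adds. */
typedef struct DemoCtx {
    unsigned last_index;
    bool have_last_index;
} DemoCtx;

/* Reject duplicate or regressing indices; record the index only when it is
 * accepted. Gaps (e.g. 0, 1, 5) are allowed by the invariant. */
static int demo_accept_index(DemoCtx *ctx, unsigned index)
{
    if (ctx->have_last_index && index <= ctx->last_index)
        return -EINVAL;
    ctx->last_index = index;
    ctx->have_last_index = true;
    return 0;
}
```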
0044 — i686 (32-bit x86) build-only CI job (ADR-0151)
- ADR: ADR-0151
- Upstream source: Netflix upstream issue #1481 (OPEN as of 2026-04-24). Reports an i686 compile failure on _mm256_extract_epi64. Workaround documented in the issue: -Denable_asm=false.
- Touches:
  - build-aux/i686-linux-gnu.ini — new cross-file; gcc + -m32 + cpu_family = 'x86' / cpu = 'i686'. No exe_wrapper.
  - .github/workflows/libvmaf-build-matrix.yml — new matrix row with an i686: true flag + new install-deps step for gcc-multilib + g++-multilib; the existing "Run tests" + "Run tox tests (ubuntu)" steps are widened with && !matrix.i686 guards.
- Invariants:
  - The i686 matrix row pins -Denable_asm=false; this is the upstream-documented workaround for _mm256_extract_epi64's missing declaration on 32-bit x86 targets. Do NOT remove the flag without first gating every _mm256_extract_epi64 call site in core/src/feature/x86/adm_avx2.c + motion_avx2.c + adm_avx512.c on __x86_64__. Removing the flag naively will re-break the build.
  - No exe_wrapper in the cross-file: meson marks tests as SKIP 77 even though the host can run i686 binaries natively. This is a build-only gate by design.
- On upstream sync:
  - If upstream Netflix fixes #1481 at source (by gating the intrinsic calls on __x86_64__ or by emulating them via two _mm256_extract_epi32 halves), sync the fix and re-enable ASM on the i686 row (drop -Denable_asm=false from meson_extra). Re-verify bit-exactness via /cross-backend-diff on the x86_64 golden pair.
  - If upstream marks i686 unsupported in meson (e.g. via a hard error), remove the fork's i686 row or downgrade it to continue-on-error: true.
- Re-test on rebase (Ubuntu host with gcc-multilib):
```shell
meson setup libvmaf core/build-i686 \
  --cross-file=build-aux/i686-linux-gnu.ini \
  -Denable_asm=false \
  -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-i686
file core/build-i686/tools/vmaf
# Expect: ELF 32-bit LSB pie executable, Intel i386
```
CI runs this same sequence via the new matrix row.
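If upstream takes the "two _mm256_extract_epi32 halves" route mentioned above, the lane arithmetic looks roughly like the sketch below. DEMO_MM256_EXTRACT_EPI64 and demo_combine_halves are hypothetical names, not code in the fork or upstream. The intrinsic path compiles only when __AVX2__ is defined; the checks exercise only the portable helper.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild a 64-bit lane from its two 32-bit halves. x86 is little-endian,
 * so 64-bit lane i of a 256-bit vector is 32-bit lanes 2*i (low) and
 * 2*i + 1 (high). */
static uint64_t demo_combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

#if defined(__AVX2__)
#include <immintrin.h>
/* Hypothetical i686-safe stand-in for _mm256_extract_epi64(v, i): per
 * #1481 the 64-bit extract is missing on 32-bit x86, while
 * _mm256_extract_epi32 is available there. i must be a constant in [0, 3]. */
#define DEMO_MM256_EXTRACT_EPI64(v, i)                    \
    ((int64_t)demo_combine_halves(                        \
        (uint32_t)_mm256_extract_epi32((v), 2 * (i)),     \
        (uint32_t)_mm256_extract_epi32((v), 2 * (i) + 1)))
#endif
```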
0058 — Tiny-AI Netflix corpus training scaffold (ADR-0252)
- ADR: ADR-0252.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training harness or MCP server.
- Touches:
  - ai/ — training harness; the NflxLocalDataset loader reads from --data-root (never from a hardcoded path).
  - docs/ai/training-data.md — corpus path convention and loader API docs; purely additive.
  - mcp-server/vmaf-mcp/tests/test_smoke_e2e.py — new e2e smoke test; references only committed golden fixtures.
- Invariants (load-bearing):
  - Data path is local-only. .workingdir2/netflix/ is gitignored; no YUV from this corpus is ever committed. The --data-root CLI flag must remain the sole mechanism for locating the corpus.
  - Smoke test uses only committed fixtures. test_smoke_e2e.py references python/test/resource/yuv/src01_hrc00_576x324.yuv (a committed golden file), never the local corpus path. On upstream sync the golden YUV path must stay stable.
  - No Netflix golden assertion is modified. The places=4 tolerance in test_smoke_e2e.py asserts against the vmaf_v0.6.1 CPU reference; it is not a golden assertion and may be adjusted by /regen-snapshots with justification.
- On upstream sync: zero interaction with Netflix upstream. The ai/ subtree and mcp-server/ are wholly fork-local; upstream merges are conflict-free here. If Netflix ever ships a training harness, reconcile separately.
- Re-test on rebase:
```shell
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (vmaf binary)
# Skips automatically if binary or golden YUV is absent.
```
0085 — Research-0030 Phase-3b multi-seed validation (Gate 1 passed)
- No ADR. Empirical research digest closing Gate 1 of the 3-gate v2 validation chain. Architecture decision unchanged.
- Upstream source: fork-local. Netflix has no multi-seed validation surface for tiny-AI training.
- Touches (additive only):
  - docs/research/0030-phase3b-multiseed-validation.md — per-seed PLCC tables + stability analysis + Gate 2/3 plan.
  - ai/scripts/phase3_subset_sweep.py — adds a --seeds flag (comma-separated list) + per-seed result aggregation.
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
  - The +0.0175 Δ is the multi-seed mean PLCC, not the seed-0 PLCC. Don't cite the +0.0106 from Research-0029 once Research-0030 lands; the multi-seed number is more trustworthy.
  - Subset B is more stable than canonical-6 across seeds. Don't ship a v2 model citing single-seed numbers; always report multi-seed mean ± seed-mean-std for any tiny-AI metric in a future digest.
  - The --seeds flag aggregates by flattening (seed × fold) pairs. The reported mean_plcc is the mean of all n_seeds × n_folds measurements; seed_mean_plcc_std is the std across per-seed means, which is the right number for "is the result seed-stable".
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the runs/ files reproduce from the canonical command.
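The two aggregates reported by --seeds can be written out explicitly. The sketch below makes three assumptions: a row-major plcc[seed * n_folds + fold] layout, equal fold counts per seed, and a population standard deviation (ddof = 0) for seed_mean_plcc_std. The script's exact std convention is not stated here. A short Newton-iteration square root keeps the sketch free of a libm link dependency; real code would call sqrt().

```c
#include <assert.h>

/* Newton's method from above sqrt(x); stops once the iterate no longer
 * decreases. Returns 0 for x <= 0. */
static double demo_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (;;) {
        double next = 0.5 * (r + x / r);
        if (next >= r)
            return r;
        r = next;
    }
}

/* plcc[seed * n_folds + fold]. mean_plcc averages all n_seeds * n_folds
 * values; seed_mean_plcc_std is the population std of the per-seed means. */
static void demo_aggregate(const double *plcc, int n_seeds, int n_folds,
                           double *mean_plcc, double *seed_mean_plcc_std)
{
    double sum = 0.0;
    for (int i = 0; i < n_seeds * n_folds; i++)
        sum += plcc[i];
    double grand = sum / (n_seeds * n_folds);
    double ss = 0.0;
    for (int s = 0; s < n_seeds; s++) {
        double m = 0.0;
        for (int f = 0; f < n_folds; f++)
            m += plcc[s * n_folds + f];
        m /= n_folds;
        /* With equal fold counts, the mean of per-seed means equals grand. */
        ss += (m - grand) * (m - grand);
    }
    *mean_plcc = grand;
    *seed_mean_plcc_std = demo_sqrt(ss / n_seeds);
}
```

With two seeds whose fold PLCCs are {0.90, 0.92} and {0.94, 0.96}, mean_plcc is 0.93. The per-seed means 0.91 and 0.95 give seed_mean_plcc_std = 0.02.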
0084 — Research-0029 Phase-3b StandardScaler retry (positive result)
- No ADR. Empirical research digest; revives the Research-0026 hypothesis after the Research-0028 negative result. The architectural decision (ship vmaf_tiny_v2) is gated on three validation steps documented in the digest §"Required before shipping".
- Upstream source: fork-local. Netflix has no tiny-AI preprocessing-sensitivity analysis surface.
- Touches (additive only):
  - docs/research/0029-phase3b-standardscaler-results.md — per-fold tables + apples-to-apples comparison + 3-gate pre-shipping checklist.
  - ai/scripts/phase3_subset_sweep.py — adds a --standardize flag + _standardize_inplace helper.
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
  - StandardScaler statistics MUST be fit per-fold on the train split only. Fitting on the full data would leak held-out information into LOSO; the _standardize_inplace helper enforces this by taking only the train slice as input.
  - A shipped vmaf_tiny_v2.onnx MUST bundle its scaler (mean, std) in the sidecar JSON; otherwise inference applies different normalisation than training and the win evaporates. Currently unimplemented; tracked as §"Caveats" #5 follow-up.
  - Subset B's feature list is the load-bearing finding: adm2, adm_scale3, vif_scale2, motion2, ssimulacra2, psnr_hvs, float_ssim. Phase-3c experiments may shift the optimal arch / lr / epochs but should keep this set.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the --standardize invocation in §"Reproducer".
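The whole leakage guard is to fit the scaler on the train slice only and reuse those statistics for the held-out fold. The one-feature C sketch below uses hypothetical DemoScaler helpers. Like scikit-learn's StandardScaler, it uses the population standard deviation and leaves zero-variance features unscaled. It is not the _standardize_inplace implementation, and a Newton-iteration square root again avoids linking libm.

```c
#include <assert.h>

/* Newton's method from above sqrt(x); returns 0 for x <= 0. */
static double demo_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (;;) {
        double next = 0.5 * (r + x / r);
        if (next >= r)
            return r;
        r = next;
    }
}

typedef struct DemoScaler { double mean, scale; } DemoScaler;

/* Fit on the training slice ONLY; held-out rows never enter these sums. */
static DemoScaler demo_fit(const double *train, int n)
{
    double sum = 0.0, ss = 0.0;
    for (int i = 0; i < n; i++)
        sum += train[i];
    double mean = sum / n;
    for (int i = 0; i < n; i++)
        ss += (train[i] - mean) * (train[i] - mean);
    double sd = demo_sqrt(ss / n);                 /* population std (ddof = 0) */
    DemoScaler sc = { mean, sd > 0.0 ? sd : 1.0 }; /* zero variance: no scaling */
    return sc;
}

/* Apply the train-fitted statistics in place (train or held-out rows). */
static void demo_apply(const DemoScaler *sc, double *x, int n)
{
    for (int i = 0; i < n; i++)
        x[i] = (x[i] - sc->mean) / sc->scale;
}
```

The same (mean, scale) pair is what a shipped model's sidecar would need to carry so that inference matches training.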
0082 — Research-0028 Phase-3 subset sweep (negative-result digest)
- No ADR. Empirical research digest. The architectural decision (no v2 model ships from this phase) is governed by Research-0027's pre-registered stopping rule.
- Upstream source: fork-local. Netflix has no tiny-AI subset-sweep surface.
- Touches (additive only):
  - docs/research/0028-phase3-subset-sweep.md — per-fold tables + headline + standardisation caveat + Phase-3b/c/d follow-ups.
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
  - canonical-6 stays the default until Phase-3b lands a ≥ 0.005 PLCC win (per the Research-0027 stopping rule).
  - The PLCC drop is most likely a feature-scale issue, not evidence that the new features lack signal. Don't cite this digest to retire ssimulacra2 / adm_scale3 from the candidate pool; re-test with StandardScaler first.
  - Phase-3 results are seed=0 only. Any v2-shipping decision needs a 3-seed mean ± std and a KoNViD cross-check.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the canonical command in §"Reproducer".
0081 — Research-0027 Phase-2 feature importance results
- No ADR. Empirical research digest closing Research-0026 Phase 2; the architectural decision (Subset A / B / C) is deferred to Phase-3 results in a future digest.
- Upstream source: fork-local. Netflix has no cross-metric feature-importance analysis surface.
- Touches (additive only):
  - docs/research/0027-phase2-feature-importance.md — per-method top-10 + consensus + redundancy + Phase-3 subset recommendations.
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
  - The consensus top-10 is the load-bearing finding: adm2, adm_scale3, ssimulacra2, vif_scale2. Phase-3 candidate subsets MUST include all four.
  - The 11-pair redundancy table is corpus-specific (measured on the Netflix Public 9-source corpus). A KoNViD-1k cross-check is a Phase-3 prerequisite if Subsets B/C advance.
  - runs/full_features_netflix.parquet and runs/full_features_correlation.json stay gitignored. The reproducer in §"Reproducer" regenerates both.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the canonical commands.
0080 — Phase-2 analysis scripts (Research-0026 Phase 2 prep)
- No ADR. Pure analysis scaffolding; the architectural decision (which features to ship in v2) is gated on Phase 2's numerical output via Research-0027.
- Upstream source: fork-local. Netflix has neither tiny-AI training nor cross-metric correlation tooling.
- Touches (additive only):
  - ai/scripts/extract_full_features.py — parquet extractor over the Netflix corpus with FULL_FEATURES. Per-clip JSON cache at $XDG_CACHE_HOME/vmaf-tiny-ai-full/<source>/<dis_stem>.json.
  - ai/scripts/feature_correlation.py — Pearson + MI + LASSO + RF + consensus top-K analyser; outputs JSON.
  - ai/tests/test_feature_correlation.py — 5 pytest cases against synthetic parquet (no libvmaf dependency).
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
  - The per-clip JSON cache and the FULL_FEATURES tuple must stay in lock-step. If the tuple grows (or shrinks), pre-existing cache files become stale and silently misalign their stored per_frame columns with the new tuple. The extractor MUST be re-run with a cleared cache whenever FULL_FEATURES changes. Regression hint: test_default_features_unchanged in test_feature_sets.py already guards the canonical 6; extend coverage to FULL_FEATURES if rebases touch it.
  - motion3 resolves to extractor motion_v2 (the upstream-canonical extractor name in the integer_motion_v2 module) in _METRIC_TO_EXTRACTOR, not to motion3. The CLI --feature motion3 does NOT exist. The JSON output key is integer_motion3, which _lookup finds via the integer_ fallback.
  - The adm and vif aggregates are NOT in FULL_FEATURES. The integer extractor emits integer_adm2 and integer_vif_scale0..3 but no bare adm / vif. Listing them produced all-NaN columns in v1; fixed in the PR #185 amend.
- On upstream sync: zero interaction. Pure fork-side analysis tooling.
- Re-test on rebase:
```shell
pytest ai/tests/test_feature_correlation.py ai/tests/test_feature_sets.py -v
# Expect: 14 passed in <1 s.
```
0079 — Tiny-AI feature-set registry (Research-0026 Phase 1)
- No ADR. Pure additive extension of an existing module; the architectural decision (which features, which model) lives in Research-0026's go/no-go gate after Phase 2.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training pipeline.
- Touches (additive only):
  - ai/data/feature_extractor.py — adds FULL_FEATURES (21 entries), a FEATURE_SETS registry and a resolve_feature_set() helper. _METRIC_TO_EXTRACTOR grew from 11 to 25 entries.
  - ai/tests/test_feature_sets.py — new 9-test smoke suite.
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant — these are load-bearing):
  - DEFAULT_FEATURES stays the canonical 6-tuple matching vmaf_v0.6.1's SVR input layout. The test test_default_features_unchanged is the regression guard; any quiet broadening would invalidate every shipped tiny-AI ONNX (the input dimension is baked into the model). If a future change must broaden the default, ship a paired model swap under the ADR-0049 sidecar policy.
  - FULL_FEATURES excludes lpips and float_moment per Research-0026 §"Open questions" Q1, enforced by test_full_features_excludes_lpips_and_moment. Adding either would re-classify the experiment from "tiny model on classical features" to "ensemble of DNNs".
  - Every entry in FULL_FEATURES MUST have an entry in _METRIC_TO_EXTRACTOR. test_every_full_feature_has_extractor_mapping is the guard; without the mapping the libvmaf CLI silently emits NaN columns for the missing metric.
- On upstream sync: zero interaction. Fork-only training surface.
- Re-test on rebase:
0078 — Research-0026 cross-metric feature fusion plan
- No ADR. Pure research-plan digest; the architectural decision (which features to add) is deferred to a Research-0027 follow-up after Phase 2 numbers land.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training and no broader-feature-set hypothesis under investigation.
- Touches (additive only):
  - docs/research/0026-cross-metric-feature-fusion.md — 4-phase experimental plan + cost estimate + go/no-go criteria.
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
  - The 6-feature canonical baseline (adm2, vif_scale0..3, motion2) stays the default. Any v2 model is opt-in via a new feature_set field in the sidecar JSON; existing vmaf_tiny_v1.onnx users get the same numbers.
  - lpips is OUT of the candidate pool (Phase 1/2). It is DNN-based and would blur the line between "tiny model on classical features" and "ensemble of DNNs". Revisit only if classical features can't close the gap.
- On upstream sync: zero interaction. Pure fork-side research planning.
- Re-test on rebase: documentation-only; no test surface.
0077 — Research-0025 FoxBird outlier resolved via KoNViD combined training
- No ADR. Empirical research digest closing the open question in Research-0023 §5; no architecture or policy decision. Pure documentation of an empirical result.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training, no KoNViD-1k integration, and no LOSO eval surface.
- Touches (additive only):
  - docs/research/0025-foxbird-resolved-via-konvid.md — per-clip table + comparison to Netflix-only baselines + interpretation + caveats + next-experiment list.
  - CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
  - The training-fit per-clip numbers in §"Per-clip result" are NOT held-out generalisation metrics, because FoxBird is in the training set. The proper validation is the LOSO sweep on the combined corpus (§"Next experiments" #1). Don't cite the 0.9936 FoxBird PLCC as a generalisation number; cite it as "training-fit on combined corpus, 5.4× RMSE improvement vs Netflix-only".
  - The combined trainer command line is canonical. The reproduction recipe in §"Setup" includes --seed 0, --konvid-val-fraction 0.1, --val-source Tennis and --val-mode netflix-source-and-konvid-holdout. Changing any knob invalidates the per-clip numbers.
  - runs/tiny_combined_canonical/ stays gitignored. The final ONNX is reproducible from the parquet + Netflix corpus + the canonical CLI; the durable record is the digest's table.
- On upstream sync: zero interaction. The research digest is fork-only.
- Re-test on rebase:
```shell
python ai/train/train_combined.py \
  --netflix-root .workingdir2/netflix \
  --konvid-parquet ai/data/konvid_vmaf_pairs.parquet \
  --model-arch mlp_small --epochs 30 --batch-size 256 --lr 1e-3 \
  --val-mode netflix-source-and-konvid-holdout \
  --val-source Tennis --konvid-val-fraction 0.1 --seed 0 \
  --out-dir runs/tiny_combined_canonical
# Expect: FoxBird PLCC ≈ 0.9936 ± 1e-3 (numerical-noise floor),
# mean PLCC ≥ 0.9983 across 9 Netflix clips.
```
0076 — Research-0024 vif/adm upstream-divergence digest (Strategy E doc)
- No ADR. Pure documentation digest; the divergence decisions it ratifies are already governed by ADR-0138 / 0139 / 0142 / 0143 (vif SIMD bit-exactness contract) and ADR-0024 (Netflix golden-data immutability). The digest itself fits the per-PR research-digest deliverable bar from ADR-0108.
- Upstream source: forward-looking; pre-emptively documents the fork's non-port of Netflix 4ad6e0ea / 41d42c9e / bc744aa3 / 8c645ce3 (vif chain) and 4dcc2f7c (float_adm chain). Strategy A on the b949cebf motion chain stays approved.
- Touches (additive only):
  - docs/research/0024-vif-upstream-divergence.md — 5-strategy decision matrix + numerical-risk analysis for each chain.
  - core/src/feature/AGENTS.md — two new "rebase-sensitive invariants" entries pinning the vif and adm divergences.
  - CHANGELOG.md Unreleased § Changed.
- Invariants (rebase-relevant — these are the whole point):
  - Do not port 4ad6e0ea (vif runtime helpers) or 8c645ce3 (vif prescale options) verbatim. They replace the precomputed vif_filter1d_table_s table, whose frozen const float Gaussians make AVX2 == AVX-512 == NEON == scalar bit-for-bit. A future opt-in second-path port (Strategy C, runtime helpers behind --vif-prescale != 1) is allowed but must not touch the default code path.
  - Do not port the 4dcc2f7c float_adm options chain. The 12-parameter compute_adm signature change cascades through SIMD (avx2 / avx512 / neon) and 3 GPU backends (vulkan / cuda / sycl). The new aim feature has no fork-side golden values; defer until there is concrete user demand.
  - Mirroring bugfix 41d42c9e is a separate decision. It must come paired with a places=4 → places=3 golden loosening per the ADR-0142 Netflix-authority precedent. It is not part of Strategy E, and is eligible for a focused single-purpose PR if any shipped model drifts by more than places=3 because of the missing fix.
  - The b949cebf motion chain port stays APPROVED under Strategy A (verbatim, float_motion side only). float_motion has no precomputed-table investment to protect, and the fork's integer_motion already has 6/9 of these options, so mirroring them onto float_motion is cheap.
- On upstream sync: zero conflict; pure additions to research/ and AGENTS.md.
- Re-test on rebase: documentation-only PR; rendered markdown is the only verification surface.
```bash
# Re-run the diff scan that produced the digest (catches new
# upstream commits since 9dac0a59):
git fetch upstream && git log --pretty=format:'%h %s' \
  upstream/master ^origin/master --since="2026-01-01" \
  -- core/src/feature/{float_,integer_,}{vif,motion,adm,cambi}*.{c,h} \
     core/src/feature/{vif,motion,adm,cambi}_options.h \
  | head -30
# If new vif / adm option ports appear, update Research-0024 §"Same
# divergence test for motion + float_adm" before deciding to port.
```
## 0075 — Upstream `798409e3` + `314db130` ports (CUDA null-deref + remove `all.c`)
- No ADR. Pure upstream cherry-picks per the ADR-0108 carve-out ("pure upstream syncs and `port-upstream-commit` PRs are exempt").
- Upstream source:
  - `798409e3` (Lawrence Curtis, 2026-04-20): "Fix null deref crash on prev_ref update in pure CUDA pipelines"
  - `314db130` (Kyle Swanson, 2026-04-28): "libvmaf/feature: remove empty translation unit all.c"
- Touches (additive / removal only):
  - `core/src/libvmaf.c` — adds an `if (ref && ref->ref)` guard before `vmaf_picture_ref(&vmaf->prev_ref, ref)` at the two threaded paths (`threaded_enqueue_one` line 1057 and `threaded_read_pictures_batch` line 1105). The main path at line 1597 already has the guard.
  - `core/src/feature/all.c` — file deleted.
  - `core/src/meson.build` — drops the `feature_src_dir + 'all.c'` line.
  - `core/src/feature/offset.c` — updates the `// NOLINTNEXTLINE` comment to drop `all.c` from the list of per-feature consumers.
  - `CHANGELOG.md` Unreleased § Fixed (`798409e3`) + § Changed (`314db130`).
- Invariants (rebase-relevant):
  - The fork has THREE `prev_ref` update sites; all need the `if (ref && ref->ref)` guard. The main `vmaf_read_pictures` path already had it (via the `read_pictures_update_prev_ref` helper); the threaded paths (`#ifdef VMAF_BATCH_THREADING`) inherited the unguarded shape from upstream's old code. Future upstream rebases must preserve all three guards even if Netflix refactors the threaded paths.
  - `all.c` deletion is symbol-safe. All `compute_*` functions it forward-declared are reached via per-extractor TUs that `#include` the relevant `<feature>.h`. No external linker dependency on `all.c`'s symbols.
- On upstream sync: zero conflict expected — fork now matches upstream tip on these two surfaces.
- Re-test on rebase:
```bash
# The meson source root is core/ (see core/src/meson.build).
meson setup build-cpu core -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=disabled
ninja -C build-cpu
meson test -C build-cpu   # 37 tests, all pass.
```
## 0074 — Combined Netflix + KoNViD-1k trainer driver
- No ADR. Pure engineering follow-up; the architecture rationale is fully covered by ADR-0203 (training-prep architecture) and Research-0023 §5 (FoxBird-class outlier needs broader corpus).
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI trainer.
- Stacks on the KoNViD-1k loader bridge (PR #178 / rebase-note 0073). Rebase order: land 0073 first.
- Touches (additive only):
  - `ai/train/train_combined.py` — concatenating trainer that reuses `_build_model` / `_train_loop` / `export_onnx` from `ai/train/train.py`.
  - `ai/tests/test_train_combined_smoke.py` — 5 pytest cases (key splitter + `--epochs 0` paths, no libvmaf or real corpus required).
  - `docs/ai/training.md` — "Combining KoNViD with the Netflix corpus" subsection rewritten from "follow-up" to runnable.
  - `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
  - Reuse the canonical training-loop helpers. Don't fork `_build_model` / `_train_loop` / `export_onnx` into this file. Both trainers must share the model factory so a future change (e.g. adding `mlp_large`) lands in one place.
  - KoNViD train/val splits hold out whole clip keys, not random frames. A frame-level split would let frames from the same clip leak across train/val and inflate PLCC by 5-10 pp (well-known VQA pitfall — same reasoning as ADR-0203's Netflix 1-source-out split).
  - Missing data falls back, not errors. Missing `--konvid-parquet` → Netflix-only path. Missing `--netflix-root` → KoNViD-only path. Both missing → initial-weights ONNX export + `rc=0`, so the smoke command always produces a deterministic artefact.
- On upstream sync: zero interaction; pure fork-local trainer.
- Re-test on rebase:
```bash
pytest ai/tests/test_train_combined_smoke.py -v
# Expect: 5 passed (under ~3 s, no libvmaf required).
python ai/train/train_combined.py --epochs 0 \
  --netflix-root /tmp/missing --konvid-parquet /tmp/missing.parquet \
  --out-dir /tmp/combined_smoke
# Expect: <out-dir>/mlp_small_combined_final.onnx written, rc=0.
```
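The clip-key holdout rule in this entry is easy to get subtly wrong, so here is a minimal sketch of a key-level split. `split_by_clip_key` is a hypothetical helper written for illustration, not the splitter in `train_combined.py`; the 20 % validation fraction and the seeded shuffle are assumptions.

```python
import random


def split_by_clip_key(frame_keys, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) frame indices, holding out whole clips.

    frame_keys[i] is the clip key of frame i. Every frame sharing a key ends
    up on the same side, so no clip leaks across train/val.
    (Hypothetical helper for illustration, not the fork's implementation.)
    """
    unique_keys = sorted(set(frame_keys))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique_keys)
    n_val = max(1, round(len(unique_keys) * val_fraction))
    val_keys = set(unique_keys[:n_val])
    train_idx = [i for i, key in enumerate(frame_keys) if key not in val_keys]
    val_idx = [i for i, key in enumerate(frame_keys) if key in val_keys]
    return train_idx, val_idx
```

Splitting on keys rather than frame indices is what keeps near-identical neighbouring frames of one clip from appearing on both sides of the split.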
## 0073 — KoNViD-1k → VMAF-pair acquisition + loader bridge
- No ADR. Acquisition + loader pieces are pure additions; the methodology fits inside ADR-0203 / Research-0019.
- Upstream source: fork-local. KoNViD-1k integration is a fork-only training-data play.
- Touches (additive only):
  - `ai/scripts/konvid_to_vmaf_pairs.py` — acquisition pipeline.
  - `ai/train/konvid_pair_dataset.py` — `KoNViDPairDataset` class mirroring `NetflixFrameDataset`'s interface.
  - `ai/tests/test_konvid_pair_dataset.py` — 5 pytest cases.
  - `docs/ai/training.md` — new "C1 (KoNViD-1k corpus)" section.
  - `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
  - `KoNViDPairDataset` mirrors the `NetflixFrameDataset` shape: `feature_dim == 6`, and `numpy_arrays() → (X, y)` returns `(n_frames, 6)` + `(n_frames,)`. If `NetflixFrameDataset`'s feature order changes, mirror it here.
  - The acquisition parquet schema is fixed. Required columns: `key`, `frame_index`, `vif_scale0..3`, `adm2`, `motion2`, `vmaf`. Add freely; do NOT rename / drop those.
  - `ai/data/konvid_vmaf_pairs.parquet` and `$VMAF_TINY_AI_CACHE/konvid-1k/` stay gitignored. They regenerate from the raw KoNViD `.mp4` sources.
- On upstream sync: zero interaction.
- Re-test on rebase:
```bash
pytest ai/tests/test_konvid_pair_dataset.py -v
# Expect: 5 passed
python ai/scripts/konvid_to_vmaf_pairs.py --max-clips 5
# Expect: ~7 s wall, ai/data/konvid_vmaf_pairs.parquet with
# 5 unique keys × ~200 frames each.
```
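A cheap guard for the fixed parquet schema is to check column names before training. The sketch below is hypothetical (`REQUIRED_COLUMNS`, `FEATURE_COLUMNS`, and `check_pair_schema` are illustrative names) and assumes the six model inputs are the four `vif_scale*` columns plus `adm2` and `motion2`, which is consistent with `feature_dim == 6`; the real feature order must still follow `NetflixFrameDataset`.

```python
REQUIRED_COLUMNS = (
    "key", "frame_index",
    "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3",
    "adm2", "motion2",
    "vmaf",
)
# Assumed model inputs: the six columns between the identifiers and the target.
FEATURE_COLUMNS = REQUIRED_COLUMNS[2:8]


def check_pair_schema(columns):
    """Raise if any required column is missing; extra columns are allowed."""
    present = set(columns)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"KoNViD pair parquet is missing columns: {missing}")
    return list(FEATURE_COLUMNS)
```

With pandas installed, `check_pair_schema(pd.read_parquet(path).columns)` would run the same check against the generated file.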
## 0072 — Tiny-AI 3-arch LOSO eval harness + Research-0023
- No ADR. Methodology fits inside Research-0023; ADR-0203 already covers the training-prep architecture and the three-arch sweep concept.
- Research digest: `docs/research/0023-loso-3arch-results.md`.
- Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
- Touches (additive only):
  - `ai/scripts/eval_loso_3arch.py` — new harness; reuses the `_load_session` + `_load_clip` + `CLIPS` helpers from `eval_loso_mlp_small.py` (PR #165).
  - `docs/research/0023-loso-3arch-results.md` — methodology + per-fold tables for `mlp_small` / `mlp_medium` / `linear`.
  - `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
  - Reuse the PR #165 helpers. Don't fork the `_load_session` external-data workaround into a copy — both scripts must keep using the same import. If a follow-up re-exports the shipped baselines with corrected `external_data.location`, both scripts deprecate the workaround simultaneously.
  - `runs/` and `model/tiny/training_runs/` stay gitignored. The harness writes `runs/loso_eval/loso_3arch_eval.{json,md}`; the durable record is the table in Research-0023 §2 + the per-fold tables in §3. Regenerate via the loop in §6 of the digest.
- On upstream sync: zero interaction. Pure fork-local evaluation harness.
- Re-test on rebase:
```bash
python ai/scripts/eval_loso_3arch.py
diff <(jq -r '.archs.mlp_small.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9808)
diff <(jq -r '.archs.mlp_medium.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9727)
diff <(jq -r '.archs.linear.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.3679)
# Expect: identical lines on a populated cache + identical fold ONNX.
```
## 0071 — T7-16 ADM Vulkan/SYCL drift verified-resolved (doc close)
- No ADR. Verification-only close, sister of T7-15.
- Upstream source: fork-local. ADM cross-backend gate is a fork-only test surface; Netflix/vmaf has no Vulkan or SYCL backend.
- Touches (additive only):
  - `docs/state.md` — new "Recently closed" row for T7-16.
  - `.workingdir2/BACKLOG.md` — T7-16 row marked closed (local-only planning dossier; gitignored).
  - `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
  - `places=4` cross-backend ADM contract. Empirical `adm_scale2` max_abs_diff is now 1e-6 (print floor; ULP=0) on Vulkan device 0 (NVIDIA), device 1 (Mesa anv on Arc), and SYCL device 0 (Arc); the residual `adm_scale1 ≈ 3.1e-5` and `adm2 ≈ 5e-6` on 1/48 frames pass `places=4` (5e-5 tolerance) but fail `places=5`. Hold the gate at `places=4`.
  - No ADM kernel source change. The fix is environmental (NVCC + driver + SYCL runtime).
- On upstream sync: zero interaction.
- Re-test on rebase:
```bash
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --feature adm --backend vulkan --device 0 --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324
# Expect: 0/48 mismatches across all 5 ADM metrics.
```
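The `places=N` wording follows Python's `unittest.assertAlmostEqual` convention, under which a pair passes when the difference rounds to zero at N decimal places; that matches the 5e-5 figure quoted for `places=4`. Assuming the gate applies the same rule (check `cross_backend_vif_diff.py` before relying on it), the `adm_scale1` residual from this entry behaves as follows. `within_places` is a hypothetical helper.

```python
def within_places(a, b, places):
    """unittest-style closeness: the difference rounds to zero at `places` decimals."""
    return round(abs(a - b), places) == 0


# adm_scale1 residual of ~3.1e-5 from the note above:
print(within_places(0.0, 3.1e-5, 4))  # True: inside the 5e-5 band
print(within_places(0.0, 3.1e-5, 5))  # False: outside the 5e-6 band
```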
## 0070 — T7-15 motion CUDA/SYCL drift verified-resolved (doc close)
- No ADR. Verification-only close; no code change in PR #172.
- Upstream source: fork-local. Cross-backend gate is a fork-only test surface; not in Netflix/vmaf.
- Touches (additive only):
  - `docs/state.md` — "Recently closed" row for T7-15.
  - `.workingdir2/BACKLOG.md` — T7-15 row marked closed (local-only planning dossier; gitignored).
  - `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
  - The `places=4` cross-backend gate stays at `places=4`. Empirical max_abs_diff is currently 0.0 (CUDA) or 1e-6 (SYCL / Vulkan, the JSON `%f` rounding floor); tightening to `places=5` could be tempting, but the 1e-6 print floor would then make the SYCL + Vulkan rows fail. Hold at `places=4` until `--precision=max` is wired into the diff tool.
  - No motion-kernel source change. PR #172 didn't modify `core/src/feature/cuda/integer_motion/*.cu` or `core/src/feature/sycl/integer_motion_sycl.cpp`. The fix is environmental (NVCC + driver), so the next CI run on a fresh image needs to be re-verified against the gate.
- On upstream sync: zero interaction.
- Re-test on rebase:
```bash
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --feature motion --backend cuda \
  --places 4
# Expect: 0/48 mismatches, max_abs_diff = 0.0
```
## 0069 — `libvmaf_vulkan.h` installed under prefix (build bug)
- No ADR. Build-system bug fix; matches existing CUDA / SYCL install conditions.
- Upstream source: fork-local. The Vulkan backend is fork-only; Netflix/vmaf has no `libvmaf_vulkan.h`.
- Touches:
  - `core/include/core/meson.build` — adds an `is_vulkan_enabled` gate that handles the `feature` option's `enabled` / `auto` states; appends `libvmaf_vulkan.h` to `platform_specific_headers` when active.
  - `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
  - The install rule mirrors the CUDA / SYCL pattern but uses the feature-option API. The `is_cuda_enabled = get_option('enable_cuda') == true` boolean idiom doesn't apply to `enable_vulkan` because that's a feature option, not a boolean. Use `.enabled() or .auto()`. Don't "simplify" to `== true` — that would silently drop the install in the `auto` state.
  - Pairs with `ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch`, which probes for the header via `check_pkg_config libvmaf_vulkan "libvmaf >= 3.0.0" libvmaf/libvmaf_vulkan.h vmaf_vulkan_state_init_external`. Removing the install rule re-introduces lawrence's 2026-04-28 symptom: FFmpeg silently drops the `libvmaf_vulkan` filter despite `--enable-libvmaf-vulkan`.
- On upstream sync: zero interaction; the Vulkan backend is fork-only.
- Re-test on rebase:
```bash
cd core   # meson source root (core/src/, core/include/)
CC=icx CXX=icpx meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=true -Denable_sycl=true -Db_lto=false
ninja -C build
meson install -C build --destdir /tmp/libvmaf-install
ls /tmp/libvmaf-install/usr/local/include/libvmaf/libvmaf_vulkan.h
# Expect: file exists.
```
## 0066 — `--backend cuda` inverted-gpumask fix (CLI bug)
- No ADR. Bug fix; behaviour now matches the public-header `VmafConfiguration::gpumask` contract.
- Upstream source: fork-local. The `--backend` CLI selector was added by the fork (Netflix/vmaf has no exclusive-backend selector).
- Touches (additive + 1-line behavioural fix):
  - `core/tools/cli_parse.c::parse_cli_args` — the `--backend cuda` branch sets `gpumask = 0` (was `gpumask = 1`).
  - `core/test/test_cli_parse.c` — 5 new regression tests (`test_backend_{cpu,cuda_engages_cuda,cuda_preserves_explicit_gpumask,sycl,vulkan}`) plus a `run_aom_ctc_tests` / `run_backend_tests` helper split to keep `run_tests` under the function-size budget.
  - `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
  - `VmafConfiguration::gpumask` semantics: `if gpumask: disable CUDA`. `compute_fex_flags` in `src/libvmaf.c` routes CUDA only when `gpumask == 0`. Any code path that sets a non-zero `gpumask` to "request CUDA" silently disables it. The CLI's `--backend cuda` branch must set `gpumask = 0` and rely on `use_gpumask = true` to trigger `vmaf_cuda_state_init`. Do not "fix" this back to `gpumask = 1` — it's the bug being fixed.
  - Explicit `--gpumask=N --backend cuda` preserves N. A user who passes `--gpumask=2` already has `use_gpumask = true`, so the `--backend cuda` branch's defaulting block (gated on `!settings->use_gpumask`) is skipped. The `test_backend_cuda_preserves_explicit_gpumask` regression locks this in.
- On upstream sync: zero interaction; `--backend` is fork-only.
- Re-test on rebase:
```bash
./build/test/test_cli_parse | grep -E 'backend_'
# Expect: 5 backend tests pass.
build/tools/vmaf -r REF -d DIS -w 576 -h 324 -p 420 -b 8 \
  --model "path=model/vmaf_v0.6.1.json" --threads 1 \
  --backend cuda --output cuda.json --json -q
python3 -c "import json; d=json.load(open('cuda.json')); \
  assert len(d['frames'][0]['metrics']) == 12, 'CUDA not engaged'"
```
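The inverted mask is easiest to see as a truth table. The sketch below models only the rules stated in this entry (CUDA state is initialised when `use_gpumask` is set, CUDA routing requires `gpumask == 0`, and `--backend cuda` defaults the mask only when `--gpumask` was not passed). `Settings`, `apply_backend_cuda`, and `cuda_dispatch_enabled` are hypothetical Python stand-ins, not the C code in `cli_parse.c`.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical stand-in for the CLI settings touched by --backend cuda."""
    use_gpumask: bool = False  # True once --gpumask or --backend cuda is seen
    gpumask: int = 0           # non-zero means "disable CUDA", not "use CUDA"


def apply_backend_cuda(settings: Settings) -> Settings:
    # Default the mask only when the user did not pass --gpumask explicitly.
    if not settings.use_gpumask:
        settings.use_gpumask = True
        settings.gpumask = 0  # the fix: 0 keeps CUDA routing on; 1 was the bug
    return settings


def cuda_dispatch_enabled(settings: Settings) -> bool:
    # CUDA state is initialised when use_gpumask is set, and CUDA feature
    # routing happens only when gpumask == 0.
    return settings.use_gpumask and settings.gpumask == 0


print(cuda_dispatch_enabled(apply_backend_cuda(Settings())))         # True
print(cuda_dispatch_enabled(Settings(use_gpumask=True, gpumask=1)))  # False (the old bug)
explicit = apply_backend_cuda(Settings(use_gpumask=True, gpumask=2))
print(explicit.gpumask)  # 2: an explicit --gpumask is preserved
```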
## 0067 — Tiny-AI PTQ accuracy across Execution Providers (T5-3e)
- No ADR. Investigation/measurement PR; ADR-0129 already governs the PTQ workstream. Findings update `docs/research/0006-tinyai-ptq-accuracy-targets.md` §"GPU-EP quantisation"; that section was previously a deferred open question and is now the empirical landing spot.
- Research digest: same file (Research-0006).
- Upstream source: fork-local. Netflix/vmaf does not ship a PTQ harness or any tiny-AI ONNX path.
- Touches (additive only):
  - `ai/scripts/measure_quant_drop_per_ep.py` — new sibling of `measure_quant_drop.py`. CPU + CUDA via ORT; Arc / OpenVINO-CPU via the native `openvino` Python runtime (no `onnxruntime-openvino`, because no cp314 wheel exists). Reuses the `_load_session` rename workaround from PR #165 plus a `value_info`-strip fix so dynamic PTQ doesn't choke on the shipped MLP ONNX.
  - `docs/ai/quant-eps.md` — new user doc; linked from `docs/ai/index.md`.
  - `docs/research/0006-tinyai-ptq-accuracy-targets.md` — refreshed header, replaced the "GPU-EP open question" with the measurement table, fixed pre-existing MD040/MD060 lints surfaced on the touched file.
  - `docs/ai/index.md` — added the quant-eps row, rewrapped to 80 cols.
  - `CHANGELOG.md` Unreleased § Changed.
- Invariants (rebase-relevant):
  - `measure_quant_drop.py` (the CI gate) is unchanged. The new script is purely additive. Any rebase that conflates the two scripts must keep the CI gate CPU-only — Arc int8 is broken, so a per-EP gate would red-light every PR.
  - The `value_info` strip is required for `vmaf_tiny_v1*` dynamic PTQ. The shipped MLP ONNX duplicates weight tensors in `value_info`, which makes `quantize_dynamic` raise `Inferred shape and existing shape differ`. The fix is in `_save_inlined`. Don't remove it during a refactor unless the underlying ONNX is regenerated.
  - CUDA-12 ABI shim. ORT-GPU 1.25 wheels link `libcublasLt.so.12` even on CUDA-13 hosts. The reproduction recipe pins the `nvidia-*-cu12` wheels and prepends them to `LD_LIBRARY_PATH`. If a future ORT wheel drops the cu12 ABI we can cut the shim, but the script tolerates either since it doesn't import any CUDA symbol itself.
- On upstream sync: zero interaction; entirely fork-local.
- Re-test on rebase:
```bash
SP=$VIRTUAL_ENV/lib/python3.14/site-packages/nvidia
# Prepend the cu12 wheel libraries, keeping any existing LD_LIBRARY_PATH entries.
export LD_LIBRARY_PATH="$SP/cublas/lib:$SP/cudnn/lib:$SP/cuda_nvrtc/lib:$SP/cuda_runtime/lib:$SP/cufft/lib:$SP/curand/lib:$SP/cusolver/lib:$SP/cusparse/lib:$SP/cuda_cupti/lib:$SP/nvtx/lib:$SP/nvjitlink/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
python ai/scripts/measure_quant_drop_per_ep.py \
  --eps cpu cuda openvino \
  --extra-fp32 vmaf_tiny_v1.onnx vmaf_tiny_v1_medium.onnx \
  --out runs/quant-eps-$(date +%Y-%m-%d)
# Expected: CPU + CUDA PASS (drop ≤ 1.2e-4); OpenVINO Arc ERR
# (compile failure for Conv-int8) or NaN (MatMul-int8) until a
# newer intel_gpu plugin lands.
```
## 0065 — `testdata/bench_all.sh` correct backend-engagement flags
- No ADR. Bug fix; no behavioural surface change beyond "the bench actually engages the backends it claims to now."
- Upstream source: fork-local. `testdata/bench_all.sh` is a fork-only bench harness; not in Netflix/vmaf.
- Touches (additive only):
  - `testdata/bench_all.sh` — switched the per-row flag pattern from the disable-only singletons (`--no_sycl` for "CUDA", etc.) to the correct engagement form (`--gpumask=0 --no_sycl --no_vulkan` for CUDA, `--sycl_device=0 --no_cuda --no_vulkan` for SYCL, `--vulkan_device=0 --no_cuda --no_sycl` for Vulkan, and `--no_cuda --no_sycl --no_vulkan` for CPU). Added a 4th column (Vulkan) to the comparator. Honours `$VMAF_BIN` for the binary path and `$VMAF_ONEAPI_SETVARS` for the oneAPI install location.
  - `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
  - Disable-only singletons don't engage a backend. `--no_sycl` alone leaves CUDA available but unrequested; `--no_cuda` alone leaves SYCL available but unrequested. The CLI inits CUDA only when `c.use_gpumask` is set; SYCL only when `c.sycl_device >= 0` or `c.use_gpumask`; Vulkan only when `c.vulkan_device >= 0`. Any change to those gates that drops one of the per-row flags will re-introduce the silent CPU fallback. Verify after a rebase by inspecting JSON `frames[0].metrics` key counts (CPU 14-15, CUDA 11-12, Vulkan ~34) — see `libvmaf/AGENTS.md` §"Backend-engagement foot-guns".
  - `gpumask` semantics are inverted from intuition. `gpumask=0` enables CUDA dispatch; `gpumask=1` disables it. The per-row CUDA flag is `--gpumask=0`, not `--gpumask=1`. Don't "fix" it to `--gpumask=1` for symmetry with `sycl_device` / `vulkan_device` — that's the bug being fixed (parallel to PR #170).
- On upstream sync: zero interaction; `testdata/bench_all.sh` is fork-only.
- Re-test on rebase:
```bash
bash testdata/bench_all.sh   # smoke
# Verify each row's JSON keys match the expected per-backend count:
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cpu.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cuda.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_vulkan.json
```
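The key-count check can also be scripted. This sketch copies the per-row flag sets and the CPU / CUDA key-count ranges from this entry; the Vulkan count is only given as approximately 34, so it is left out. `ROW_FLAGS`, `EXPECTED_KEY_COUNTS`, and `looks_engaged` are hypothetical names, not part of `bench_all.sh`.

```python
# Engagement flags per bench row, as listed in this note.
ROW_FLAGS = {
    "cpu": ["--no_cuda", "--no_sycl", "--no_vulkan"],
    "cuda": ["--gpumask=0", "--no_sycl", "--no_vulkan"],
    "sycl": ["--sycl_device=0", "--no_cuda", "--no_vulkan"],
    "vulkan": ["--vulkan_device=0", "--no_cuda", "--no_sycl"],
}

# frames[0].metrics key counts quoted in this note (Vulkan "~34" omitted).
EXPECTED_KEY_COUNTS = {"cpu": {14, 15}, "cuda": {11, 12}}


def looks_engaged(backend, result):
    """True if a parsed vmaf JSON result has the key count expected for backend.

    A CUDA row that silently fell back to CPU reports 14-15 keys and fails.
    """
    n_keys = len(result["frames"][0]["metrics"])
    return n_keys in EXPECTED_KEY_COUNTS[backend]
```

For a real run, pass the parsed file, e.g. `looks_engaged("cuda", json.load(open("testdata/bbb/results/t1_cuda.json")))`.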
## 0063 — Tiny-AI LOSO eval harness for `mlp_small`
- No ADR. The methodology fits inside Research Digest 0022; ADR-0203 already covers the training-prep architecture.
- Research digest: `docs/research/0022-loso-mlp-small-results.md`.
- Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
- Touches (additive only):
  - `ai/scripts/eval_loso_mlp_small.py` — new evaluation harness.
  - `docs/ai/loso-eval.md` — usage doc.
  - `docs/research/0022-loso-mlp-small-results.md` — methodology + results.
  - `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
  - `_load_session` workaround for renamed-baseline ONNX. The shipped baselines `model/tiny/vmaf_tiny_v1*.onnx` reference their pre-rename `external_data.location` values. The workaround in `_load_session` rewrites the entries before handing the proto to ORT. Removing the workaround breaks the baseline phase. The proper fix (re-export with matching names) is tracked as a follow-up; until then this code path is load-bearing.
  - `runs/` and `model/tiny/training_runs/` stay gitignored. The harness writes to `runs/loso_eval/` by default; do NOT promote any of those outputs into the tree. The 9 fold ONNX and the per-clip JSON cache regenerate from the corpus + trainer + libvmaf CLI.
- On upstream sync: zero interaction. Pure fork-local evaluation harness.
- Re-test on rebase:
```bash
python ai/scripts/eval_loso_mlp_small.py
diff <(jq -r '.loso_aggregate.mean_plcc' runs/loso_eval/loso_mlp_small_eval.json) <(echo 0.9808)
# Expect: identical line on a populated cache + identical fold ONNX.
```
## 0064 — Section-A audit: 9 backlog rows + ADR cross-links
- No ADR. Process / docs PR; rows trace back to the individually-cited ADRs / research digests in their own References columns.
- Decision dossier: `.workingdir2/decisions/section-a-decisions-2026-04-28.md`.
- Source audit: `docs/backlog-audit-2026-04-28.md`.
- Upstream source: fork-local. Pure backlog hygiene PR; no Netflix code touched.
- Touches (additive only):
  - `.workingdir2/BACKLOG.md` — 9 rows: 8 new (T3-17, T3-18, T5-3e, T5-4, T7-35, T7-36, T7-37, T7-38) plus the existing T6-1a row extended with the bisect-cache fixture sub-bullet.
  - `docs/research/0006-tinyai-ptq-accuracy-targets.md` — drops the "defer until first user" framing on the GPU-EP quantisation open question per user direction; cross-links T5-3e.
  - `docs/research/0020-cambi-gpu-strategies.md` — the v2 follow-up section now cites T7-36 as the gate for opening the v2 row.
  - `docs/adr/0205-cambi-gpu-feasibility.md` — the Decision section's "follow-up integration PR" now cites T7-36.
  - `CHANGELOG.md` Unreleased § Changed.
- Invariants (rebase-relevant): none. Pure backlog text. Rebase-conflict risk is limited to the same `BACKLOG.md` table rows that any future row addition would touch; trivial to re-resolve.
- On upstream sync: zero interaction.
- Re-test on rebase: none — docs-only.
## 0062 — ssimulacra2 CUDA + SYCL twins (ADR-0206)
- ADR: ADR-0206.
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2 GPU implementation; this PR adds the CUDA + SYCL twins of the fork's ADR-0201 Vulkan kernel.
- Touches (additive + small wiring edits):
  - `docs/adr/0206-ssimulacra2-cuda-sycl.md` and the index row in `docs/adr/README.md`.
  - `core/src/feature/cuda/ssimulacra2_cuda.{c,h}` — new CUDA dispatch.
  - `core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu` and `ssimulacra2_mul.cu` — new CUDA fatbins.
  - `core/src/feature/sycl/ssimulacra2_sycl.cpp` — new SYCL extractor.
  - `core/src/feature/feature_extractor.c` — two new extern declarations + two new entries in `feature_extractor_list[]`.
  - `core/src/meson.build` — adds `ssimulacra2_blur` + `ssimulacra2_mul` to `cuda_cu_sources`, introduces (or extends, if PR #157 / ADR-0202 landed first) the `cuda_cu_extra_flags` map with an `ssimulacra2_blur` entry, threads `per_kernel_flags` into the fatbin custom-target, and lists the two new C / CPP TUs.
  - `core/src/cuda/AGENTS.md` and `core/src/sycl/AGENTS.md` — rebase invariant notes for the per-kernel `--fmad=false` flag and the `-fp-model=precise` SYCL build flag.
  - `docs/backends/cuda/overview.md`, `docs/backends/sycl/overview.md`, `docs/metrics/features.md` — coverage matrix updates.
  - `CHANGELOG.md` Unreleased § Added.
- Invariants (load-bearing on rebase):
  - Per-kernel `--fmad=false` for `ssimulacra2_blur`. The IIR's `o = n2 * sum - d1 * prev1 - prev2` must NOT fuse into FMAs — without the flag the recursive Gaussian's per-step rounding compounds across the 6-scale pyramid past `places=4`.
  - `-fp-model=precise` on the SYCL feature build line. Removing it drifts `ssimulacra2_sycl` past `places=2` through the IIR.
  - The hybrid host/GPU split mirrors Vulkan. The host runs YUV→RGB, XYB, downsample, and the SSIM/EdgeDiff combine in double; the GPU runs only mul + IIR blur. Any future PR that ports XYB or YUV→RGB onto the GPU MUST land alongside an updated ADR-0206 and re-validate `places=4` on every Netflix CPU pair.
  - The CUDA fex uses `.extract` (synchronous), not `.submit` / `.collect`. Per-frame raw YUV is D2H-copied from `picture_cuda`'s device-side `VmafPicture.data[]` into pinned host scratch via `cuMemcpy2DAsync`. Skipping the copy segfaults — direct host reads on a `CUdeviceptr` are the failure mode the prior agent's WIP hit.
- On upstream sync: zero interaction with Netflix. The GPU coverage matrix for `ssimulacra2` is wholly fork-local.
- Re-test on rebase:

```bash
meson setup build_cuda core -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary ./build_cuda/tools/vmaf \
  --feature ssimulacra2 --backend cuda --places 4 \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8
# Expect: 0/48 mismatches, max_abs_diff ~1e-6.
```
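Why `--fmad=false` matters can be shown with a single recursion step. The sketch below uses Python doubles rather than the kernel's own types, emulates a fused multiply-add with exact `fractions.Fraction` arithmetic (one rounding at the end), and picks one contraction that a compiler could legally choose. It illustrates the rounding difference only; it is not a model of the CUDA kernel, and the helper names are hypothetical.

```python
from fractions import Fraction


def fma(a, b, c):
    """a * b + c rounded once, emulated with exact rational arithmetic."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def iir_step_separate(n2, s, d1, prev1, prev2):
    # Every multiply and subtract rounds on its own, as --fmad=false requires.
    return n2 * s - d1 * prev1 - prev2


def iir_step_contracted(n2, s, d1, prev1, prev2):
    # One contraction a compiler may choose when FMA fusion is allowed:
    # (n2 * s) - d1 * prev1 becomes fma(-d1, prev1, n2 * s).
    return fma(-d1, prev1, n2 * s) - prev2


args = (-1.0, 1.0, -0.1, 10.0, 0.0)  # n2, sum, d1, prev1, prev2
print(iir_step_separate(*args))    # 0.0
print(iir_step_contracted(*args))  # 5.551115123125783e-17 (2**-54)
```

One such discrepancy is tiny, but in a recursive filter each output feeds the next step, so per-step differences of this kind carry forward through the pyramid instead of washing out.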
## 0061 — cambi GPU feasibility spike (ADR-0205)
- ADR: ADR-0205.
- Research digest: `docs/research/0020-cambi-gpu-strategies.md`.
- Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
- Touches (additive only):
  - `docs/adr/0205-cambi-gpu-feasibility.md`, `docs/research/0020-cambi-gpu-strategies.md`, `docs/adr/README.md` index row.
  - `core/src/feature/vulkan/cambi_vulkan.c` — new dormant scaffold (not yet in `vulkan_sources`, not yet registered).
  - `core/src/feature/vulkan/shaders/cambi_{derivative,decimate,filter_mode}.comp` — new reference GLSL shaders, not yet in the build's `shaders` list.
  - `core/src/feature/AGENTS.md` invariants + `CHANGELOG.md` bullet.
- Invariants (rebase-relevant):
  - Hybrid host/GPU port by decision. If Netflix upstream tightens the c-value formula or histogram update protocol, the host residual call site in the eventual `cambi_vulkan.c::cambi_vulkan_extract` must be updated alongside `cambi.c::calculate_c_values` — the same code is reused. Do NOT translate the c-values phase to GPU during any upstream-port PR; that optimisation belongs to the v2 strategy-III PR (deferred).
  - Scaffolds stay dormant in the spike PR. The `cambi_vulkan.c` extractor returns `-ENOSYS` from `cambi_vulkan_init_stub` until the integration follow-up wires it in. Do NOT register `vmaf_fex_cambi_vulkan_scaffold` in `feature_extractor.c`'s list.
  - Shaders are not in the build's shader list. Adding them to `core/src/vulkan/meson.build`'s `vulkan_shaders` list before the integration PR produces orphaned `*_spv.h` headers. Leave them alone in this spike PR.
- On upstream sync: zero interaction. `cambi.c` itself is upstream-mirrored — Netflix changes flow through `port-upstream-commit`; only the integration PR's host residual call site needs paired attention.
- Re-test on rebase:
```bash
meson setup build -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
```
## 0059 — Tiny-AI Netflix corpus training prep (ADR-0203)
- ADR: ADR-0203.
- Upstream source: fork-local. Netflix/vmaf has no equivalent training surface.
- Touches:
  - `ai/data/` — Netflix loader, libvmaf-CLI feature extractor, distillation scoring.
  - `ai/train/` — PyTorch dataset, eval harness, Lightning-style training entry point.
  - `ai/scripts/run_training.sh` — convenience wrapper.
  - `ai/tests/` — five new pytest modules (`test_netflix_loader.py`, `test_dataset.py`, `test_eval.py`, `test_train_smoke.py`, plus `conftest.py`).
  - `docs/ai/training.md` — new "C1 (Netflix corpus)" section; existing sections untouched.
  - `ai/AGENTS.md` — invariants section added.
- Invariants (load-bearing):
  - The filename ladder regex is fork-specific: `<source>_<quality>_<height>_<bitrate>.yuv` (dis) + `<source>_<fps>fps.yuv` (ref). Upstream may publish a different naming convention later; do NOT merge them — keep this loader scoped to the Netflix corpus and add a sibling loader for any upstream alternative.
  - The per-clip cache schema is consumed by both the dataset and any downstream tooling. Schema is `{features:{feature_names, per_frame, n_frames}, scores:{per_frame, pooled}}`. Any change must invalidate `$VMAF_TINY_AI_CACHE` (delete or version-tag the directory).
  - The smoke command stays runnable without a built `vmaf` binary. The `_make_zero_payload` helper in `ai.train.dataset` injects a fake payload for `--epochs 0` so CI gates don't drag a libvmaf build into the Python test surface.
  - The YUV size probe never silently guesses. `probe_yuv_dims` either matches the 1920x1080 default, returns ffprobe's answer, or raises. Tests pass `assume_dims=(16, 16)` explicitly for synthetic fixtures.
- On upstream sync: no interaction with upstream. The `ai/` subtree is wholly fork-local.
- Re-test on rebase:

```bash
python -m pytest ai/tests/test_netflix_loader.py \
  ai/tests/test_dataset.py ai/tests/test_eval.py \
  ai/tests/test_train_smoke.py -v
python ai/train/train.py --epochs 0 --data-root /tmp/mock_corpus \
  --assume-dims 16x16 --val-source BetaSrc --out-dir /tmp/out
```
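The two filename shapes in this entry can be matched with a pair of anchored patterns. This is a hypothetical sketch (`DIS_RE`, `REF_RE`, and `parse_clip_name` are illustrative, not the loader's actual regex): the source name may itself contain underscores, the ref pattern is tried first, and anything that fits neither shape raises instead of being guessed.

```python
import re

# Hypothetical patterns written from the documented naming convention.
DIS_RE = re.compile(
    r"^(?P<source>.+)_(?P<quality>[^_]+)_(?P<height>\d+)_(?P<bitrate>\d+)\.yuv$"
)
REF_RE = re.compile(r"^(?P<source>.+)_(?P<fps>\d+)fps\.yuv$")


def parse_clip_name(filename):
    """Classify a corpus file name as ref or dis; refuse anything else."""
    m = REF_RE.match(filename)
    if m:
        return {"role": "ref", "source": m["source"], "fps": int(m["fps"])}
    m = DIS_RE.match(filename)
    if m:
        return {
            "role": "dis",
            "source": m["source"],
            "quality": m["quality"],
            "height": int(m["height"]),
            "bitrate": int(m["bitrate"]),
        }
    raise ValueError(f"not a Netflix-corpus ladder name: {filename!r}")


print(parse_clip_name("BigBuckBunny_25fps.yuv"))
# {'role': 'ref', 'source': 'BigBuckBunny', 'fps': 25}
```

Because the dis shape must end in two numeric fields and the ref shape must end in `fps.yuv`, the two patterns cannot both match the same name.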
## 0073 — Tiny-AI QAT trainer + first per-model QAT pass (T5-4)
- ADR: ADR-0207 (design), ADR-0208 (per-model impl).
- Touches: `ai/train/qat.py` (new), `ai/scripts/qat_train.py` (rewrite from the `NotImplementedError` scaffold), `ai/configs/learned_filter_v1_qat.yaml` (new), `ai/tests/test_qat_smoke.py` (new), `docs/ai/quantization.md` (QAT tier added). All paths are wholly fork-local; no upstream Netflix/vmaf interaction.
- Invariants:
  - The two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) is load-bearing. Both the legacy ONNX exporter (`quantized::conv2d`) and the new TorchDynamo exporter (`Conv2dPackedParamsBase.__obj_flatten__`) refuse to consume `convert_fx` output on PyTorch 2.11. The bridge (state-dict diff to a fresh fp32 module + ORT static-quantize) is the only path that yields a QDQ ONNX. Do NOT collapse to a single-step `convert_fx → torch.onnx.export` until both PyTorch issues are fixed; re-check both exporters on each PyTorch upgrade.
  - State-dict transfer matches by submodule name + shape. `_copy_qat_weights_into_fp32` walks the `fp32_state` keys, finds the same key in the FX-prepared module, and copies the tensor. Tiny-AI models today have stable submodule names (`entry`, `body.*`, `exit`); a model architecture that uses a top-level `nn.Sequential` would break this because `prepare_qat_fx` renames Sequential children to numeric indices. The `RuntimeError("0 tensors copied")` guard catches the silent failure mode.
  - FX preparation runs on CPU. PyTorch 2.11's FX symbolic tracer is flaky on CUDA buffers; the trainer migrates the model to CPU before `prepare_qat_fx` and back to the accelerator for the fine-tune phase. The smoke test deliberately exercises the CPU path so this stays covered.
  - `torch.ao.quantization` deprecation will hard-fail in PyTorch 2.10. The migration target is `torchao.quantization.pt2e` (`prepare_pt2e` / `convert_pt2e`); the two-step pipeline is mostly pt2e-compatible — only the FX-prep call changes.
- On upstream sync: no interaction with upstream. The `ai/` subtree is fully fork-local.
- Re-test on rebase:

```bash
python -m pytest ai/tests/test_qat_smoke.py -v
python ai/scripts/qat_train.py \
  --config ai/configs/learned_filter_v1_qat.yaml \
  --output /tmp/qat_smoke.int8.onnx --smoke
```
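The name-plus-shape transfer and its zero-copy guard can be sketched without PyTorch. In this hypothetical illustration, plain `(shape, value)` tuples stand in for tensors and `copy_matching_weights` is not `_copy_qat_weights_into_fp32`; it only shows why a renamed module tree fails loudly instead of producing an fp32 model with untouched weights.

```python
def copy_matching_weights(fp32_state, qat_state):
    """Copy entries from qat_state into fp32_state by exact name + shape match.

    Both dicts map a parameter name to a (shape, value) pair; tuples stand in
    for tensors. Extra QAT-only entries (e.g. fake-quant scales) are ignored.
    Returns the number of entries copied.
    """
    copied = 0
    for name, (shape, _old) in list(fp32_state.items()):
        candidate = qat_state.get(name)
        if candidate is not None and candidate[0] == shape:
            fp32_state[name] = candidate
            copied += 1
    if copied == 0:
        # A renamed module tree (e.g. Sequential children turned into "0", "1")
        # matches nothing; fail instead of exporting untrained fp32 weights.
        raise RuntimeError("0 tensors copied")
    return copied
```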
## 0074 — GPU-parity matrix CI gate (T6-8 / ADR-0214)
- Touched surfaces (fork-local): `scripts/ci/cross_backend_parity_gate.py` (new), `.github/workflows/tests-and-quality-gates.yml` (new `vulkan-parity-matrix-gate` job), `docs/development/cross-backend-gate.md` (new), `docs/backends/index.md` (cross-backend section), `libvmaf/AGENTS.md` (rebase-sensitive invariant note).
- Why this matters on rebase: the CI lane and the matrix-gate script are entirely fork-local. Upstream Netflix/vmaf has no comparable gate; conflicts on rebase are restricted to the CI workflow file when upstream rearranges its own jobs. The gate's Python script lives outside `core/src/`, so the upstream-sync path doesn't see it.
- Invariants the gate enforces:
  - Per-feature absolute tolerance is declared in one place (`FEATURE_TOLERANCE` in `scripts/ci/cross_backend_parity_gate.py`). Tightening a tolerance requires a measurement-driven follow-up ADR; loosening requires a justification ADR (CLAUDE.md §12 r1).
  - The legacy single-feature gate `scripts/ci/cross_backend_vif_diff.py` stays for one release cycle. Sister PRs in this session add to it; the T6-8b cleanup PR deletes it once the matrix gate has soaked.
  - CUDA / SYCL / hardware-Vulkan are advisory until a self-hosted runner is registered. The script supports them via `--backends`; flipping the CI lane to required is a follow-up wiring change, not a code change.
- On upstream sync: no interaction with upstream `tests-and-quality-gates.yml` (the gate job is fork-added); rebase conflicts are limited to insertion order in the workflow file.
- Re-test on rebase:
```bash
# Build inside core/ so the binary lands at core/build/tools/vmaf.
cd core && meson setup build \
  -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled -Denable_float=true \
  --buildtype=release && ninja -C build
cd ..
python3 scripts/ci/cross_backend_parity_gate.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --backends cpu vulkan \
  --json-out /tmp/parity.json --md-out /tmp/parity.md
```
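The one-table rule can be pictured with the sketch below. The feature names and values in `EXAMPLE_TOLERANCE` are placeholders, not the contents of `FEATURE_TOLERANCE`, and `parity_failures` is a hypothetical helper that compares per-frame scores from two backends against one absolute tolerance per feature.

```python
# Placeholder values for illustration; the authoritative table is
# FEATURE_TOLERANCE in scripts/ci/cross_backend_parity_gate.py.
EXAMPLE_TOLERANCE = {"vif": 5e-5, "adm": 5e-5, "motion": 5e-5}


def parity_failures(reference, candidate, tolerance=EXAMPLE_TOLERANCE):
    """List (feature, frame, abs_diff) for every frame outside its tolerance.

    reference / candidate map feature name -> per-frame scores from two backends.
    """
    failures = []
    for feature, atol in tolerance.items():
        if feature not in reference or feature not in candidate:
            continue  # feature not computed in this run
        ref, cand = reference[feature], candidate[feature]
        if len(ref) != len(cand):
            raise ValueError(f"{feature}: frame count differs ({len(ref)} vs {len(cand)})")
        for frame, (a, b) in enumerate(zip(ref, cand)):
            diff = abs(a - b)
            if diff > atol:
                failures.append((feature, frame, diff))
    return failures
```

Keeping every threshold in one mapping means a tolerance change shows up as a single-line diff, which is what makes the ADR requirement above practical to review.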
## 0220 — SYCL feature kernels are unconditionally fp64-free (T7-17)
- Touches: `core/src/sycl/common.cpp` (init log line), `core/src/sycl/AGENTS.md` (new invariant row), all SYCL feature kernels under `core/src/feature/sycl/` (no diff today, but the contract pins their shape going forward).
- Invariant: every SYCL feature-kernel lambda captures and operates on `float` / integer types only. No `double` operand inside a `parallel_for` body, no `sycl::reduction<double>`, no `sycl::plus<double>`. A single fp64 instruction in the TU's SPIR-V module causes the Level Zero runtime to reject the entire module on Intel Arc A-series and other fp64-less devices, even when the offending kernel is never submitted. Host-side `double` (in `extract` / `flush` post-processing, score aggregation, log10 normalisation) remains fine. Concrete patterns in tree: ADM gain limiting via int64 Q31 (`gain_limit_to_q31` + `launch_decouple_csf<false>` in `integer_adm_sycl.cpp`); VIF gain limiting via fp32 `sycl::fmin`; CIEDE / SSIM accumulators via `sycl::reduction<int64_t>` / `sycl::plus<int64_t>`.
- On upstream sync: Netflix/vmaf has no SYCL backend upstream; conflicts cannot enter via `git merge`. The risk is a fork-local cherry-pick (e.g. a SYCL twin of a new CUDA kernel) bringing a `double` into a kernel lambda. Audit the lambda capture list and any `sycl::reduce*` calls against this invariant before merging.
- Re-test on rebase:
# Build SYCL backend
CC=icx CXX=icpx meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
# On an fp64-less device (e.g. Intel Arc A380), confirm the
# init log line is INFO-level and reads "device lacks native
# fp64 — kernels already use fp32 + int64 paths, no emulation
# overhead". The SYCL kernels must launch successfully (no
# SPIR-V module rejection from the Level Zero runtime).
build-sycl/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --backend sycl \
--feature integer_vif --feature integer_adm \
--output /tmp/sycl-fp64less.json --json
0091 — T6-9 model registry schema + --tiny-model-verify (ADR-0211)
- No rebase impact: this is a 100% fork-local surface. The following are entirely fork-local; none of these paths exist in upstream Netflix/vmaf:
  - the registry (`model/tiny/registry.json`);
  - its JSON Schema (`model/tiny/registry.schema.json`);
  - the `--tiny-model-verify` CLI flag;
  - the `vmaf_dnn_verify_signature()` C entry point.

  They are listed here for completeness, so a future `/sync-upstream` run sees the surface area was acknowledged.
- Touches (additive only):
  - Registry and validation: `model/tiny/registry.json`, `model/tiny/registry.schema.json`, `ai/scripts/validate_model_registry.py`.
  - DNN library: `core/src/dnn/model_loader.{c,h}` (added `vmaf_dnn_verify_signature()`), `core/include/libvmaf/dnn.h` (public declaration).
  - CLI: `core/tools/cli_parse.{c,h}` (`ARG_TINY_MODEL_VERIFY` + `tiny_model_verify` field), `core/tools/vmaf.c` (call site).
  - Tests: `core/test/dnn/test_tiny_model_verify.c`, `python/test/model_registry_schema_test.py`.
  - Docs and indexes: `docs/ai/model-registry.md`, `docs/ai/inference.md`, `docs/ai/security.md`, `docs/adr/0209-...md`, `docs/adr/README.md` (index row), `CHANGELOG.md`, `core/src/dnn/AGENTS.md`.
- Invariants (rebase-relevant):
  - Schema is the contract. New registry fields land in `registry.schema.json` first, then in `registry.json`, then in any consumers (the C-side parser, the Python validator, the MCP). Reverse order causes a mismatch.
  - `schema_version` is bounded. The schema accepts only `{0, 1}`; bump the enum and the loader's check together when adding `2`.
  - Banned-function rule applies. The `cosign` invocation uses `posix_spawnp(3p)` with an explicit argv array. Do not replace it with `system(3)` / `popen(3)`: both shell-parse the command and would re-introduce injection risk.
  - Bundle-file absence is fail-closed. When `sigstore_bundle` points at a not-yet-existing file (pre-release state), `vmaf_dnn_verify_signature()` returns `-ENOENT`. The CLI surfaces this as a load failure; do not "soften" it to a warning without an explicit ADR.
- Re-test on rebase:
python3 ai/scripts/validate_model_registry.py
python3 -m pytest python/test/model_registry_schema_test.py -v
meson test -C build-cpu --suite=dnn
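The three rebase-relevant rules can be mirrored in a short Python sketch, for example when reviewing a Python-side consumer: a bounded `schema_version`, a fail-closed bundle lookup, and no shell in the `cosign` call. The real check is C code in `vmaf_dnn_verify_signature()`. The helper names, the `-EACCES` code for a failed verification, and the `cosign` argv are assumptions; these notes do not list the exact arguments the fork passes.

```python
import errno
import os
import subprocess

ALLOWED_SCHEMA_VERSIONS = (0, 1)  # bump together with the schema enum


def check_schema_version(entry):
    """Reject registry entries whose schema_version is outside the bounded enum."""
    version = entry.get("schema_version")
    # bool is an int subclass in Python, so True would otherwise pass as 1.
    if isinstance(version, bool) or version not in ALLOWED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return version


def verify_signature(model_path, bundle_path):
    """Return 0 on success or a negative errno; never degrade to a warning."""
    if not os.path.exists(bundle_path):
        return -errno.ENOENT  # missing bundle (pre-release state) is fail-closed
    # Explicit argv list with shell=False, mirroring the posix_spawnp() rule:
    # no shell ever parses model_path or bundle_path. Arguments are illustrative.
    argv = ["cosign", "verify-blob", "--bundle", bundle_path, model_path]
    try:
        result = subprocess.run(argv, shell=False, check=False,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return -errno.ENOENT  # the cosign binary itself is not installed
    return 0 if result.returncode == 0 else -errno.EACCES  # assumed error code
```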
0074 — HIP (AMD ROCm) backend scaffold (T7-10)
- ADR: ADR-0212.
- Upstream source: fork-local. The HIP backend is fork-only; Netflix/vmaf has no `libvmaf_hip.h` and no `enable_hip` meson option.
- Touches:
  - `core/include/libvmaf/libvmaf_hip.h` (new).
  - `core/include/core/meson.build`: adds the `is_hip_enabled` install gate, mirroring the `is_cuda_enabled` / `is_sycl_enabled` boolean idioms.
  - `core/meson_options.txt`: new `enable_hip` boolean option (default false).
  - `core/src/meson.build`:
    - new `is_hip_enabled` flag and conditional `subdir('hip')`;
    - `hip_sources` + `hip_deps` threaded through `libvmaf_feature_static_lib`, alongside the existing CUDA / SYCL / Vulkan aggregations;
    - `hip_deps` also added to the `dependencies` list of the top-level `library('vmaf', ...)`.
  - `core/src/hip/` (new directory: `common.{c,h}`, `picture_hip.{c,h}`, `dispatch_strategy.{c,h}`, `meson.build`).
  - `core/src/feature/hip/` (new directory: `adm_hip.c`, `vif_hip.c`, `motion_hip.c`).
  - `core/test/test_hip_smoke.c` (new).
  - `core/test/meson.build`: registers the smoke test under `if get_option('enable_hip') == true`.
  - `.github/workflows/libvmaf-build-matrix.yml`: adds the `Build — Ubuntu HIP (T7-10 scaffold)` row.
  - `docs/backends/hip/overview.md` (new), `docs/backends/index.md` (planned → scaffold row), `docs/research/0033-hip-applicability.md` (new), `docs/adr/0212-hip-backend-scaffold.md` (new), `docs/adr/README.md` (new index row).
  - `libvmaf/AGENTS.md`: new "HIP backend scaffold contract" rebase-sensitive invariant entry.
  - `CHANGELOG.md`: Unreleased § Added.
- Invariants (rebase-relevant):
  - `enable_hip` is a `boolean` option, not a `feature`. It mirrors `enable_cuda` / `enable_sycl`. Do not "harmonise" it with `enable_vulkan`'s `feature` / `disabled` form without an ADR amendment per ADR-0212 § "Decision".
  - Public C-API entry points return `-ENOSYS` for the scaffold. The smoke test `core/test/test_hip_smoke.c` pins this. Watch for a rebase that "succeeds" by accidentally enabling a code path, e.g. a refactor that early-returns 0 from `vmaf_hip_state_init`. That breaks the smoke test and the runtime PR's contract baseline.
  - `hip_sources` is added to `libvmaf_feature_static_lib`, NOT directly to the top-level `library('vmaf', ...)`. The static lib is extracted into libvmaf at the bottom of `core/src/meson.build` via `objects: [..., libvmaf_feature_static_lib.extract_all_objects(recursive: true), ...]`. Adding `hip_sources` to the top-level `library()` too would double-link.
  - `hip_deps` IS added to the `dependencies:` list of the top-level `library()`. The runtime PR will populate `hip_deps` with the real `dependency('hip-lang')` linkage. Threading it through the top-level `library()` ensures consumers see the transitive dependency.
  - Header purity: `libvmaf_hip.h` does not include `<hip/hip_runtime.h>`. HIP runtime types cross the public ABI as `uintptr_t`, matching the CUDA / Vulkan precedent (ADR-0212). Don't add `<hip/...>` includes to the public header during a rebase or runtime-PR bring-up.
  - No FFmpeg patch: the fork's `ffmpeg-patches/` series does not currently consume the HIP API surface. CLAUDE §12 r14 only requires patch updates when an existing patch consumes the surface. The runtime PR (T7-10b) will add the `hip_device` filter option and the corresponding patch.
- On upstream sync: zero interaction; the HIP backend is fork-only.
- Re-test on rebase:
cd libvmaf
meson setup build-hip -Denable_cuda=false -Denable_sycl=false \
-Denable_hip=true
ninja -C build-hip
meson test -C build-hip test_hip_smoke
# Expect: 9/9 pass.
# Default no-HIP build still works:
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=fast
0074 — SSIMULACRA 2 SVE2 SIMD parity (T7-38)
- ADR: ADR-0213.
- Touches:
  - `core/src/feature/arm64/ssimulacra2_sve2.{c,h}` (new).
  - `core/src/feature/ssimulacra2.c`: dispatch table override in `init_simd_dispatch`.
  - `core/src/arm/cpu.{c,h}`: HWCAP2_SVE2 probe + new `VMAF_ARM_CPU_FLAG_SVE2` enum value.
  - `core/src/meson.build`: `cc.compiles` probe + optional `arm64_ssimulacra2_sve2` static library.
  - `core/test/test_ssimulacra2_simd.c`: SVE2 picker overrides on the arm64 path + dispatch diagnostic.
  - `build-aux/aarch64-linux-gnu-sve2.ini`: new cross-file pinning `qemu-aarch64-static -cpu max`.

  All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
- Invariants:
  - Fixed 4-lane SVE2 predicate. Every kernel uses `svwhilelt_b32(0, 4)`, so SIMD arithmetic order is identical to the NEON sibling regardless of the runtime vector length. This keeps the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract intact. Do NOT widen the predicate to `svptrue_b32()` without a separate ADR + snapshot regen: variable-length lane reductions perturb the per-step rounding order.
  - NEON stays the fallback. SVE2 is purely additive. The dispatch table assigns NEON first and only overrides on `VMAF_ARM_CPU_FLAG_SVE2`. A toolchain that fails the `cc.compiles(... -march=armv9-a+sve2)` probe leaves `HAVE_SVE2` unset, and the legacy NEON-only build is unchanged.
  - `-ffp-contract=off` mirrors the NEON sibling. Without it, GCC fuses the `a*b+c` patterns of the per-lane scalar tail into `fmla`, drifting against the SIMD path by ~1 ulp. The `arm64_ssimulacra2_sve2` static library carries the flag like its NEON counterpart.
- On upstream sync: no interaction with upstream. The `arm64/` feature TUs and the `arm/cpu.{c,h}` flag enum are fork-local. An upstream sync that rewrites `init_simd_dispatch` in `core/src/feature/ssimulacra2.c` must preserve the SVE2 cases.
- Re-test on rebase:
meson setup build-arm64-sve2 libvmaf \
--cross-file=build-aux/aarch64-linux-gnu-sve2.ini -Denable_asm=true
ninja -C build-arm64-sve2 test/test_ssimulacra2_simd
meson test -C build-arm64-sve2 test_ssimulacra2_simd
# stderr should report `ssimulacra2 simd dispatch: NEON=1 SVE2=1`
# and 11/11 tests should pass.
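Why widening the predicate breaks byte-exactness can be seen in a small emulation. It rounds every step to float32 with Python's `struct` module and sums the same eight inputs with 4 and then with 8 lane accumulators. It models lane-wise accumulation in general, not the SVE2 kernel; the sequential horizontal reduction is an assumption.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def lane_sum(values, lanes):
    """Emulate a SIMD sum: one float32 accumulator per lane, then a
    sequential horizontal reduction starting at lane 0."""
    acc = [0.0] * lanes
    for i, v in enumerate(values):
        acc[i % lanes] = f32(acc[i % lanes] + f32(v))
    total = 0.0
    for a in acc:
        total = f32(total + a)
    return total


data = [1e8, 3.0, 3.0, 3.0, 0.0, 3.0, 3.0, 3.0]  # exact sum: 100000018
print(lane_sum(data, 4))  # 100000024.0: fixed 4 lanes, same on any vector length
print(lane_sum(data, 8))  # 100000000.0: lane count widened to an 8-lane vector
```

With the predicate pinned to 4 lanes, a wider SVE2 implementation still computes the first result. That is what keeps it in step with NEON.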
0075 — enable_lcs MS-SSIM extras on CUDA + Vulkan (T7-35 / ADR-0243)
- Touched surfaces (fork-local):
  - `core/src/feature/cuda/integer_ms_ssim_cuda.c`: added `enable_lcs` to `MsSsimStateCuda` + `options[]`, plus 15 host-side `vmaf_feature_collector_append` calls gated on the bool.
  - `core/src/feature/vulkan/ms_ssim_vulkan.c`: rewrote the `enable_lcs` help text, added an `emit_lcs_metrics` helper, and gated 15 `vmaf_feature_collector_append` calls.
  - `scripts/ci/cross_backend_vif_diff.py` and `scripts/ci/cross_backend_parity_gate.py`: new `float_ms_ssim_lcs` pseudo-feature + `FEATURE_ALIASES` map + `places=4` tolerance row.
- Why this matters on rebase: the GPU MS-SSIM extractors are fork-local (Netflix upstream has no Vulkan or CUDA MS-SSIM kernel today). The `enable_lcs` semantic and the metric names (`float_ms_ssim_{l,c,s}_scale{0..4}`) must match the upstream CPU reference at `core/src/feature/float_ms_ssim.c:189-221`. If upstream ever renames or reorders those metrics, mirror the change on the GPU side in the same merge; the names are a public-API contract.
- Invariants the contract enforces:
  - Default-path output (`enable_lcs=false`) stays bit-identical to the pre-T7-35 binary. Only the host-side appends are gated; there are no kernel / shader / device-buffer changes.
  - Metric ordering is metric-wise: all `l_scale*` first, then `c_*`, then `s_*`. This matches the CPU emission order.
  - `places=4` cross-backend tolerance per ADR-0190, enforced by the new `float_ms_ssim_lcs` cell in the parity matrix gate (ADR-0214).
- On upstream sync: zero interaction; the GPU twins do not exist upstream. The CPU `float_ms_ssim.c` is shared with upstream, but `enable_lcs` has been upstream-stable since v3.0.0.
- Re-test on rebase:
cd libvmaf && meson setup build-vulkan \
-Denable_cuda=false -Denable_sycl=false \
-Denable_vulkan=enabled -Denable_float=true \
--buildtype=release && ninja -C build-vulkan
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build-vulkan/tools/vmaf \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 \
--feature float_ms_ssim_lcs --backend vulkan --places 4
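The 15 gated appends and their metric-wise order can be written down as a short Python generator. This is a reading aid that assumes the `float_ms_ssim_{l,c,s}_scale{0..4}` naming quoted above, not code from the fork.

```python
def lcs_metric_names(scales=5):
    """Metric-wise order: every l_scale*, then every c_scale*, then every s_scale*."""
    return [f"float_ms_ssim_{component}_scale{scale}"
            for component in ("l", "c", "s")
            for scale in range(scales)]


names = lcs_metric_names()
print(len(names))          # 15
print(names[4], names[5])  # float_ms_ssim_l_scale4 float_ms_ssim_c_scale0
```

A GPU twin that iterated scale-major instead (`l0, c0, s0, l1, ...`) would emit the same set of names in a different sequence. That would still break the CPU emission order.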
0075 — 32-bit ADM/cpu fallbacks port (T-NEW-3)
- Touched surfaces (upstream-mirror): `core/src/feature/x86/adm_avx2.c`, `core/src/feature/x86/adm_avx512.c`, `core/src/x86/cpu.c`. These are cherry-picks of two upstream commits:
  - `8a289703` (Christopher Degawa, "adm: add fallback for extract_epi64 for 32-bit");
  - `1b6c3886` ("x86/cpu: remove limit of avx+ on 32-bit").
- Why this matters on rebase: the port is trivially conflict-free with any future upstream `extract_epi64` work, because we land upstream's exact `extract_epi64` macro / inline-function pair. The conflict surface is limited to two fork-side details, both preserved verbatim:
  - the fork's clang-format 100-column layout in `adm_avx2.c` / `adm_avx512.c`;
  - the `_Alignas(64)` LTO-correctness slot in `adm_avx512.c` (`docs/development/known-upstream-bugs.md`).
- Invariants the port preserves:
  - `_Alignas(64) int64_t angle_flag[16]` in `adm_decouple_s123_avx512` stays. Without it, LTO can promote the unaligned load to `vmovdqa64` and fault under `--buildtype=release -Db_lto=true`.
  - The `extract_epi64` symbol must remain resolved on both `__x86_64__` (macro to `_mm256_extract_epi64`) and 32-bit (fallback inline). If a future upstream change inlines the helper differently, keep the conditional definition.
- On upstream sync: Netflix may ship further 32-bit fallbacks (motion / psnr; not in this port). If so, expect a parallel `extract_epi64`-style helper at the top of each affected SIMD file. The fork should mirror those verbatim into the same files.
- Re-test on rebase:
meson setup build-i686 libvmaf \
--cross-file=build-aux/i686-linux-gnu.ini \
-Denable_asm=false
ninja -C build-i686
meson setup build-cpu libvmaf -Denable_avx512=true
ninja -C build-cpu
meson test -C build-cpu
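On 32-bit x86, `_mm256_extract_epi64` is unavailable, so a fallback has to rebuild each 64-bit lane from two 32-bit pieces. The Python emulation below shows that recombination. It assumes the fallback reads lane `i` as 32-bit lanes `2i` (low) and `2i+1` (high); that describes the general approach and is not a copy of upstream's `extract_epi64`.

```python
import struct


def extract_epi64_from_epi32(lanes32, index):
    """Rebuild signed 64-bit lane `index` from two signed 32-bit lanes
    (little-endian: low half at 2*index, high half at 2*index + 1)."""
    lo = lanes32[2 * index] & 0xFFFFFFFF  # mask: the low half must not sign-extend
    hi = lanes32[2 * index + 1]           # the high half carries the sign
    return (hi << 32) | lo


# Reinterpret four int64 lanes as eight int32 lanes, like a __m256i register.
original = [-2, 1 << 40, 0x7FFFFFFFFFFFFFFF, -(1 << 63)]
lanes32 = list(struct.unpack("<8i", struct.pack("<4q", *original)))
print([extract_epi64_from_epi32(lanes32, i) for i in range(4)] == original)  # True
```

The masking step is the part that is easy to get wrong in C. Widening a negative low half to `int64_t` without masking would smear its sign bit across the high word.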
0076 — codec-aware FR regressor surface (T7-CODEC-AWARE / ADR-0235)
- Touches: `ai/src/vmaf_train/codec.py` (new), `ai/src/vmaf_train/models/fr_regressor.py` (extended), `ai/scripts/bvi_dvc_to_full_features.py`, `ai/scripts/extract_full_features.py`. No upstream-shared paths.
- Invariants:
  - `CODEC_VOCAB` in `ai/src/vmaf_train/codec.py` is closed and order-stable. The index of each codec is the one-hot column index baked into trained ONNX.
  - Adding a codec appends to the tuple and bumps `CODEC_VOCAB_VERSION`. Reordering silently invalidates every shipped `fr_regressor_v2_*.onnx`.
  - `FRRegressor(num_codecs=0)` must remain the v1 single-input contract. Flipping the default would break every existing `model/tiny/fr_regressor_v1.onnx` consumer.
- Re-test: `pytest ai/tests/test_codec_aware_fr.py -v` (8 sub-tests covering the vocabulary contract, the alias table, and back-compat). This is a pure fork-local addition with no upstream rebase impact for the next `/sync-upstream`.
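The order-stability rule is easiest to see in a one-hot encoder keyed on tuple position. The vocabulary contents below are hypothetical, since these notes do not list the real `CODEC_VOCAB`. Only the append-only-tuple shape is taken from the invariant.

```python
# Hypothetical vocabulary: what matters is that it is an append-only tuple.
CODEC_VOCAB = ("x264", "x265", "libaom", "svt-av1")
CODEC_VOCAB_VERSION = 1  # bump whenever a codec is appended


def codec_one_hot(codec):
    """One-hot row whose column index is the codec's position in CODEC_VOCAB."""
    try:
        index = CODEC_VOCAB.index(codec)
    except ValueError:
        raise ValueError(f"codec {codec!r} not in closed vocabulary") from None
    row = [0.0] * len(CODEC_VOCAB)
    row[index] = 1.0
    return row


print(codec_one_hot("x265"))  # [0.0, 1.0, 0.0, 0.0]
```

Appending a fifth codec adds a column and leaves every existing index untouched. Inserting it at the front would shift every codec's column, which is exactly how shipped weights get silently invalidated.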
0075 — feature/speed extractors (T-NEW-1, upstream port d3647c73)
- Touches:
  - `core/src/feature/speed.c` (new).
  - `core/src/feature/picture_copy.{c,h}`: signature change, adding an `int channel` parameter.
  - `core/src/feature/float_*.c`: call sites updated to pass `channel=0`.
  - `core/src/feature/feature_extractor.c`: registry block.
  - `core/src/feature/alias.c`, `core/src/meson.build`.
  - `core/src/feature/vif_tools.{c,h}`: helper-function port from upstream `4ad6e0ea`.
- Upstream source: verbatim cherry-pick of Netflix/vmaf `d3647c73` ("feature/speed: port speed_chroma and speed_temporal extractors"), together with its dependency `4ad6e0ea` ("feature/vif: port helper functions"). Both are pre-existing on Netflix master and enter the fork as part of the T7-4 audit catch-up.
- Invariants:
  - `picture_copy()` now takes a `channel` argument. Every fork-local extractor that calls it passes `channel=0`: CUDA `integer_ms_ssim`, Vulkan `ssim` / `ms_ssim`. If upstream later evolves the signature again (e.g. adds bit-depth or stride validation), update those fork-local call sites in lockstep.
  - Speed extractors only register when `VMAF_FLOAT_FEATURES=1` (build with `-Denable_float=true`).
- On upstream sync: future Netflix commits in `core/src/feature/speed.c` apply cleanly, because the file is now a verbatim mirror. Conflict potential is limited to:
  - the registry block in `feature_extractor.c`, which interleaves with the fork's Vulkan / SYCL / CUDA blocks;
  - any further evolution of the `picture_copy` signature.
- Re-test on rebase:
```bash
meson setup build-cpu libvmaf -Denable_cuda=false \
  -Denable_sycl=false -Denable_float=true
ninja -C build-cpu
meson test -C build-cpu test_speed
meson test -C build-cpu   # full meson suite
make test-netflix-golden  # 3 CPU canonical pairs
```
0221 — CHANGELOG + ADR-index fragment-file pattern (T7-39 / ADR-0221)
- What changed: the fork stopped editing `CHANGELOG.md` and `docs/adr/README.md` directly. Both files are now rendered from fragment trees:
  - `changelog.d/<section>/<topic>.md` (Keep-a-Changelog sections), plus the migration archive `changelog.d/_pre_fragment_legacy.md`.
  - `docs/adr/_index_fragments/<NNNN-slug>.md`, plus `docs/adr/_index_fragments/_order.txt` (frozen commit-merge order manifest) and `docs/adr/_index_fragments/_header.md` (table prelude).

  Two scripts render the consolidated outputs:
  - `scripts/release/concat-changelog-fragments.sh --check|--write`
  - `scripts/docs/concat-adr-index.sh --check|--write`
- On upstream sync: zero interaction. `CHANGELOG.md` is a fork-local Markdown surface; Netflix upstream doesn't ship a Keep-a-Changelog file in this format. `docs/adr/` is entirely fork-local. A `/sync-upstream` run will not touch the fragment trees.
- Re-test on rebase:
bash scripts/release/concat-changelog-fragments.sh --check
bash scripts/docs/concat-adr-index.sh --check
# both must exit 0; otherwise run --write and re-stage.
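The `--check` / `--write` contract boils down to: render deterministically, then compare byte-for-byte. A minimal Python model of the changelog half is below. The real tool is a shell script. The lowercase section directories, the `## Unreleased` heading, and the blank-line joining are assumptions, not its documented output. Sorting fragment filenames is what makes the render independent of filesystem order.

```python
import pathlib
import tempfile

# Keep-a-Changelog section order; lowercase directory names are an assumption.
SECTIONS = ("Added", "Changed", "Deprecated", "Removed", "Fixed", "Security")


def render_changelog(fragment_root):
    """Concatenate changelog.d/<section>/<topic>.md fragments deterministically."""
    parts = ["## Unreleased"]
    for section in SECTIONS:
        section_dir = pathlib.Path(fragment_root) / section.lower()
        topics = sorted(section_dir.glob("*.md")) if section_dir.is_dir() else []
        if not topics:
            continue  # empty sections are omitted
        parts.append(f"### {section}")
        parts.extend(topic.read_text().strip() for topic in topics)
    return "\n\n".join(parts) + "\n"


def check(fragment_root, rendered_path):
    """--check semantics: True only if the committed file matches a fresh render."""
    path = pathlib.Path(rendered_path)
    return path.exists() and path.read_text() == render_changelog(fragment_root)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "changelog.d"
    (root / "added").mkdir(parents=True)
    (root / "added" / "a.md").write_text("- first entry\n")
    out = pathlib.Path(tmp) / "CHANGELOG.md"
    print(check(root, out))                 # False: nothing rendered yet
    out.write_text(render_changelog(root))  # the --write step
    print(check(root, out))                 # True: render is up to date
```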
0077 — DISTS extractor proposal (T7-DISTS / ADR-0236)
- What landed: ADR-0236 (Proposed), the Research-0043 design digest, an ADR README index row, and a CHANGELOG entry.
- Rebase impact: pure fork-local proposal-stage docs; no code, no Netflix-mirror file touched, no ffmpeg-patches change, no public C-API surface change.
- Reproducer (when implementation lands as T7-DISTS):
```sh
vmaf --feature dists_sq=model_path=model/tiny/dists_sq.onnx \
  --reference ref.yuv --distorted dist.yuv \
  --width 1920 --height 1080 --pixel_format 420 --bitdepth 8
```
0076 — GPU-gen ULP calibration head (proposal-stage, T7-GPU-ULP-CAL / ADR-0234)
- What landed:
  - ADR-0234 (Proposed) and Research-0041;
  - a data-collection scaffold at `ai/scripts/collect_gpu_calibration_data.py`;
  - a forward pointer in `docs/usage/cli.md` for the future `--gpu-calibrated` flag.
- Rebase impact: pure fork-local (proposal docs + Python script). No upstream Netflix/vmaf code touched, no public C-API changes, no ffmpeg-patches changes.
- Reproducer:
```sh
python3 ai/scripts/collect_gpu_calibration_data.py --smoke
```
0095 — Per-backend GPU kernel scaffolding templates (CUDA + Vulkan, ADR-0246)
- ADR: ADR-0246.
- Touches:
  - `core/src/cuda/kernel_template.h` (new, header-only).
  - `core/src/vulkan/kernel_template.h` (new, header-only).
  - `core/src/cuda/AGENTS.md` (new invariant row + dir listing).
  - `core/src/vulkan/AGENTS.md` (new file).
  - `docs/backends/kernel-scaffolding.md` (new).
  - `docs/adr/0246-gpu-kernel-template.md` (new).
  - `CHANGELOG.md`, `docs/adr/README.md`.

  All paths are wholly fork-local. Upstream Netflix/vmaf has no Vulkan backend at all today, and the CUDA backend uses different per-kernel scaffolding shapes. Nothing here can collide on a pure upstream sync.
- Invariants:
  - Templates are unused at PR-merge time. `kernel_template.h` lands in both `core/src/cuda/` and `core/src/vulkan/` with zero call sites. Each future kernel migration is its own gated PR (`places=4` cross-backend diff per ADR-0214). Do not bulk-port existing kernels onto the templates in a single sync; that would short-circuit the per-kernel gate.
  - Per-backend, not cross-backend. Resist the urge to merge the two templates into a unified `gpu/kernel_template.h`. CUDA's async stream + event model and Vulkan's command buffer + fence + descriptor pool share no concrete shape, so a unified API would be lowest-common-denominator.
  - Helper functions, not macros. The header bodies are `static inline` functions, so they can be stepped through in cuda-gdb / Nsight / RenderDoc. The `CHECK_CUDA_GOTO` / `CHECK_CUDA_RETURN` macros in `cuda_helper.cuh` stay where they pay off (textual `goto label`), and the templates use them internally.
- On upstream sync: no interaction with upstream paths. An upstream sync that touches `core/src/cuda/common.h` or `picture_cuda.h` may shift the helper signatures the template consumes (`vmaf_cuda_buffer_alloc`, `vmaf_cuda_picture_get_stream`, …); update the template if so.
- Re-test on rebase:
```bash
# CUDA build (configure inside libvmaf/; see CLAUDE.md §2 note).
meson setup core/build-cuda libvmaf \
  -Denable_cuda=true -Denable_nvcc=true \
  -Denable_vulkan=disabled -Denable_sycl=false
ninja -C core/build-cuda
meson test -C core/build-cuda

# Vulkan build.
meson setup core/build-vulkan libvmaf \
  -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-vulkan
meson test -C core/build-vulkan
```
0222 — vmaf-perShot per-shot CRF predictor sidecar (T6-3b)
- Touches:
  - `core/tools/meson.build`: new executable + test wiring.
  - `core/tools/vmaf_per_shot.c`: new file; fork-local, no upstream sibling.
  - `core/tools/test/meson.build` (test row) and `core/tools/test/test_vmaf_per_shot.sh` (new smoke test).
  - `core/tools/AGENTS.md`: sidecar invariants.
  - Docs: `docs/usage/cli.md` (cross-link), `docs/usage/vmaf-perShot.md` (new user doc), `docs/ai/roadmap.md` (T6-3b row update).
- Invariants:
  - The sidecar must stay standalone: it does not link the libvmaf metric path. Any upstream patch that tries to fold per-shot CRF prediction into `vmaf_score_*` would collapse the separation between encoder hints and quality scores. That separation is recorded in roadmap §2.4 and ADR-0222 §Decision.
  - The CSV / JSON column set is the public schema, and downstream encoders consume it directly: `shot_id`, `start_frame`, `end_frame`, `frames`, `mean_complexity`, `mean_motion`, `predicted_crf`.
- Conflict expectation on `/sync-upstream`: low. Upstream Netflix has no per-shot CRF predictor in tree, so there is no natural collision point. `tools/meson.build` is the only mutually-edited file, and the new `executable('vmaf-perShot', …)` block is appended after `vmaf_bench_deps`, well clear of upstream's likely additions.
- Reproducer:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=disabled
ninja -C build
meson test -C build test_vmaf_per_shot --print-errorlogs
./build/tools/vmaf-perShot \
  --reference testdata/ref_576x324_48f.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --output /tmp/plan.csv
cat /tmp/plan.csv
```
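A downstream consumer that hard-depends on the public column set might guard against drift along these lines. The sketch assumes the CSV carries a header row in the order listed above and that `predicted_crf` parses as a float. The sample row is made up.

```python
import csv
import io

# Column set quoted in the invariant above; order assumed to match it.
PER_SHOT_COLUMNS = ("shot_id", "start_frame", "end_frame", "frames",
                    "mean_complexity", "mean_motion", "predicted_crf")


def read_shot_plan(text):
    """Parse a vmaf-perShot CSV and fail loudly on any schema drift."""
    reader = csv.DictReader(io.StringIO(text))
    if tuple(reader.fieldnames or ()) != PER_SHOT_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [{"shot_id": int(row["shot_id"]),
             "start": int(row["start_frame"]),
             "end": int(row["end_frame"]),
             "frames": int(row["frames"]),
             "crf": float(row["predicted_crf"])}
            for row in reader]


sample = ("shot_id,start_frame,end_frame,frames,mean_complexity,mean_motion,"
          "predicted_crf\n0,0,23,24,0.41,1.7,28.5\n")
print(read_shot_plan(sample))
```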
0075 — vmaf-roi sidecar binary (T6-2b / ADR-0247)
- Touches:
  - `core/tools/meson.build`: adds the `vmaf_roi` executable target (after the existing `vmaf` target, before `vmaf_bench`). Append-only; no upstream-shared lines moved or removed.
  - `core/test/meson.build`: adds the `test_vmaf_roi` executable + `test()` registration. Append-only.
  - `core/tools/vmaf_roi.c`: wholly new, fork-local.
  - `core/tools/vmaf_roi_core.h`: wholly new, fork-local.
  - `core/test/test_vmaf_roi.c`: wholly new, fork-local.
- Invariant: the `vmaf-roi` sidecar emits two byte-exact formats that downstream encoder drivers (x265 `--qpfile`, SVT-AV1 `--roi-map-file`) will hard-depend on:
  - x265 ASCII grid:
    - two `#`-prefixed header lines, `# vmaf-roi qpfile (x265, --qpfile-style)` and `# frame=N ctu=S cols=C rows=R strength=F.FFF`;
    - then space-separated signed integers, one row per CTU row, each row terminated by `\n`.
  - SVT-AV1 raw binary: exactly `cols * rows` bytes of `int8_t`, row-major, no header.
  - QP-offset clamp: `+-12` (`VMAF_ROI_CORE_QP_OFFSET_MAX`).
  - Reduction: per-CTU mean (not max). Switching to max or a percentile changes every downstream encoder result and requires its own ADR.
- Pure helpers in `vmaf_roi_core.h`: the per-CTU mean reducer and the saliency-to-QP mapper are `static inline` in a header. This lets `test_vmaf_roi` compile them without dragging in the libvmaf link surface. Moving them into a `.c` TU breaks the test wiring.
- On upstream sync: no interaction with upstream. From upstream's perspective `tools/` is a fork-local surface (upstream ships `vmaf.c` only). An upstream sync that rewrites `core/tools/meson.build` should preserve the `vmaf_roi` executable block.
- Re-test on rebase:
```bash
meson setup build-cpu libvmaf \
  -Denable_cuda=false -Denable_sycl=false -Denable_tools=true
ninja -C build-cpu tools/vmaf_roi test/test_vmaf_roi
meson test -C build-cpu test_vmaf_roi
./build-cpu/tools/vmaf_roi \
  --reference testdata/ref_576x324_48f.yuv \
  --width 576 --height 324 --frame 0 --output - \
  --encoder x265 --ctu-size 64 --strength 6.0 | head -3
# The first two lines are the # comment header; row 1 of the grid
# should be "4 2 1 -1 -1 -1 1 2 4" (placeholder radial map).
```
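The mean reduction, the clamp, and both byte-exact layouts can be sketched in Python. The per-CTU mean, the `+-12` clamp, and both output formats follow the invariant above. The saliency-to-QP formula in `to_qp_offset` is a hypothetical stand-in: it centres at 0.5 and flips the sign so salient areas get negative offsets. The averaging of partial edge CTUs over only their covered pixels is also an assumption.

```python
import struct

QP_OFFSET_MAX = 12  # mirrors VMAF_ROI_CORE_QP_OFFSET_MAX (+-12)


def ctu_means(saliency, width, height, ctu):
    """Per-CTU mean of a row-major saliency map; edge CTUs average only the
    pixels they actually cover (assumption)."""
    cols = (width + ctu - 1) // ctu
    rows = (height + ctu - 1) // ctu
    means = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * ctu, min((r + 1) * ctu, height))
            xs = range(c * ctu, min((c + 1) * ctu, width))
            total = sum(saliency[y * width + x] for y in ys for x in xs)
            means.append(total / (len(ys) * len(xs)))
    return cols, rows, means


def to_qp_offset(mean, strength):
    """Hypothetical mapping: centre saliency at 0.5, scale, round, clamp."""
    offset = round((0.5 - mean) * 2.0 * strength)
    return max(-QP_OFFSET_MAX, min(QP_OFFSET_MAX, offset))


def x265_grid(frame, ctu, cols, rows, strength, offsets):
    """Two '#' header lines, then one space-separated row per CTU row."""
    lines = ["# vmaf-roi qpfile (x265, --qpfile-style)",
             f"# frame={frame} ctu={ctu} cols={cols} rows={rows} "
             f"strength={strength:.3f}"]
    for r in range(rows):
        lines.append(" ".join(str(o) for o in offsets[r * cols:(r + 1) * cols]))
    return "\n".join(lines) + "\n"


def svt_av1_map(offsets):
    """Raw row-major int8 bytes, no header."""
    return struct.pack(f"{len(offsets)}b", *offsets)
```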
0219 — motion3 GPU coverage on Vulkan + CUDA + SYCL (T3-15(c) / ADR-0219)
- What changed: the `motion` GPU twins now emit `VMAF_integer_feature_motion3_score` in 3-frame window mode (the default):
  - `core/src/feature/vulkan/motion_vulkan.c`
  - `core/src/feature/cuda/integer_motion_cuda.c`
  - `core/src/feature/sycl/integer_motion_sycl.cpp`

  Cross-backend gates were extended accordingly (`FEATURE_METRICS["motion"]` in `scripts/ci/cross_backend_*.py`).
- Invariants:
  - motion3 is a host-side scalar post-process of motion2. There are no device-side state changes. motion3 is computed on the host in `extract()` / `collect()` / `flush()`, after the existing SAD reduction.
  - The post-processing function (`motion3_postprocess_*`) mirrors lines 510-560 of the CPU `integer_motion.c` byte-for-byte: `clip(motion_blend(motion2 * fps_weight, blend_factor, blend_offset), max_val)`. An optional moving average is taken against the unaveraged prior blended value.
  - `motion_five_frame_window=true` returns `-ENOTSUP` at `init()` on all three GPU backends. The 5-deep blur ring and the second SAD-pair dispatch remain deferred. Do NOT silently fall back to the 3-frame path when the user enables the flag; fail loud per CERT C / CLAUDE.md §12 r4.
  - The CPU motion3 algorithm is the source of truth. Some upstream Netflix changes to `integer_motion.c` touch `motion_blend(...)`, the `motion_max_val` clip, or the moving-average rule. Any port of such a change MUST be mirrored in `motion3_postprocess_*` across all three GPU files in the same PR. The cross-backend gate at `places=4` will catch drift, but only after a full GPU run.
- On upstream sync: pure fork-local additions to GPU TUs. Upstream Netflix has no GPU motion extractor. The `motion_blend_tools.h` header is upstream-mirrored. If a sync rewrites the `motion_blend()` formula, regenerate the GPU snapshot and re-run the cross-backend gate.
- Re-test on rebase:
```bash
# CPU sanity (motion3 emission unchanged)
./core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --feature motion --output /tmp/motion.json --json
python -c "import json; d=json.load(open('/tmp/motion.json')); \
print('motion3 frames:', sum(1 for f in d['frames'] \
if 'integer_motion3' in f.get('metrics', {})))"
# Expect 49 (one motion3 per frame).

# Cross-backend gate (Vulkan/lavapipe lane works on every host):
python scripts/ci/cross_backend_vif_diff.py \
  --feature motion --backend vulkan \
  --ref python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --dis python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --bitdepth 8 \
  --vmaf-bin core/build/tools/vmaf
# Expect: integer_motion / integer_motion2 / integer_motion3 all OK at places=4.
```
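The host-side shape of the motion3 post-process can be sketched as below. The blend function is passed in as a callable because its formula is not quoted in these notes. The plain `min()` clip, the two-tap average, and the first-frame behaviour are assumptions. The point of the sketch is the state update: the carried value is the blended score before averaging, so averaging never compounds across frames.

```python
def motion3_postprocess(motion2_scores, fps_weight, blend, max_val,
                        moving_average):
    """Host-side motion2 -> motion3 mapping over a sequence of frames (sketch).

    `blend` stands in for motion_blend(x, blend_factor, blend_offset), whose
    formula lives in the upstream-mirrored motion_blend_tools.h.
    """
    out = []
    prev_blended = None  # unaveraged blended value of the previous frame
    for motion2 in motion2_scores:
        blended = min(blend(motion2 * fps_weight), max_val)  # clip to max_val
        if moving_average and prev_blended is not None:
            score = (blended + prev_blended) / 2.0  # assumed two-tap average
        else:
            score = blended
        out.append(score)
        prev_blended = blended  # carry the value *before* averaging
    return out


def identity(x):
    """Placeholder blend used only for the demo."""
    return x


print(motion3_postprocess([1.0, 3.0, 10.0], 1.0, identity, 5.0, True))
# [1.0, 2.0, 4.0]
```

Averaging against the previous *averaged* score instead would give 3.5 for the last frame here. Differences like that would surface in the `places=4` cross-backend gate.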
0216 — vmaf_tiny_v2 (Phase-3-validated tiny VMAF MLP)
- Touches:
  - Models: `model/tiny/registry.json`, `model/tiny/vmaf_tiny_v2.{onnx,json}`.
  - Scripts: `ai/scripts/{train,export,validate}_vmaf_tiny_v2.py`, `ai/AGENTS.md`.
  - Tests: `core/test/dnn/{test_vmaf_tiny_v2.py,meson.build}`.
  - Docs: `docs/ai/{models/vmaf_tiny_v2.md,inference.md,roadmap.md}`, `docs/adr/{0244-vmaf-tiny-v2.md,README.md}`, `CHANGELOG.md`.

  All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
- Invariants:
  - Bundled scaler stats are part of the trust root. The shipped ONNX bakes `(input - mean) / std` as constant `Sub` + `Div` nodes that run before the MLP.
    - Re-exporting must go through `ai/scripts/export_vmaf_tiny_v2.py`, which pulls `mean` / `std` from the trainer checkpoint and writes them as graph initialisers.
    - Adding an out-of-band scaler step at runtime (e.g. a sidecar JSON consumed by the loader) is forbidden without a follow-up ADR. It splits the trust root and invalidates the registry sha256 contract.
  - Feature column order is fixed. The graph reads `(adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2)` in exactly this order; reordering breaks the bundled `mean` / `std` constants. Any change to the feature set requires a fresh Phase-3 chain (Research-0027 → 0028 → 0029 → 0030).
  - opset 17. This matches the sister tiny-AI models (`learned_filter_v1`, `nr_metric_v1`, `fastdvdnet_pre`) and the ORT op-allowlist baseline. Upgrading requires re-validating the `Sub` / `Div` / `Gemm` / `Relu` / `Squeeze` ops against `op_allowlist.c`.
- On upstream sync: zero interaction. Netflix/vmaf has no equivalent surface. An upstream sync that touches `core/src/dnn/` (op-allowlist or model-loader changes) needs to preserve `Sub` / `Div` / `Gemm` / `Relu` / `Squeeze` in the allowlist for opset 17.
- Re-test on rebase:
```bash
bash core/test/dnn/test_registry.sh
python3 core/test/dnn/test_vmaf_tiny_v2.py
python3 ai/scripts/validate_vmaf_tiny_v2.py \
  --onnx model/tiny/vmaf_tiny_v2.onnx \
  --parquet runs/full_features_netflix.parquet \
  --rows 100 --min-plcc 0.97
meson test -C build-cpu --suite=dnn
```
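A plain-Python reference for the baked-in scaler can help when checking a re-export by hand. The `mean` / `std` values are placeholders; the real ones come from the trainer checkpoint. The dict-keyed input is an assumption. The sketch pins that the fixed column order, not the order in which features arrive, decides which constant applies to which feature.

```python
# Column order quoted above; the mean/std numbers used below are placeholders.
FEATURE_ORDER = ("adm2", "vif_scale0", "vif_scale1", "vif_scale2",
                 "vif_scale3", "motion2")


def standardize(features, mean, std):
    """Build the input row in FEATURE_ORDER and apply (x - mean) / std,
    the same transform the exported graph bakes in as Sub + Div nodes."""
    if len(mean) != len(FEATURE_ORDER) or len(std) != len(FEATURE_ORDER):
        raise ValueError("mean/std must have one entry per feature column")
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [(features[name] - m) / s
            for name, m, s in zip(FEATURE_ORDER, mean, std)]


# Keys deliberately arrive out of order; the output follows FEATURE_ORDER.
sample = {"motion2": 4.0, "adm2": 0.9, "vif_scale0": 0.5, "vif_scale1": 0.7,
          "vif_scale2": 0.8, "vif_scale3": 0.85}
print(standardize(sample, [0.0] * 6, [1.0] * 6))
```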
0094 — Tiny-AI extractor template (ADR-0250)
- Touches:
  - `core/src/dnn/tiny_extractor_template.h` (new), `core/src/dnn/AGENTS.md`.
  - `core/src/feature/feature_lpips.c`, `core/src/feature/fastdvdnet_pre.c`.
  - `docs/ai/extractor-template.md` (new), `docs/adr/0250-tiny-ai-extractor-template.md` (new).
- Invariants:
  - Helper signatures are wire-format-stable. The two helpers produce user-facing log lines that downstream tooling greps for:
    - `vmaf_tiny_ai_resolve_model_path(name, option, env_var)` logs `<name>: no model path …`;
    - `vmaf_tiny_ai_open_session(name, path, &out)` logs `<name>: vmaf_dnn_session_open(<path>) failed: <rc>`.

    Don't rename or reorder the parameters without bumping every extractor + the recipe doc.
  - YUV→RGB is bit-exact. The shared `vmaf_tiny_ai_yuv8_to_rgb8_planes` is a literal move of the pre-existing `feature_lpips.c` body (BT.709 limited-range, nearest-neighbour chroma upsample).
    - LPIPS, saliency, and future colour-sensitive tiny-AI scores depend on byte-exact equality with the prior ad-hoc copies.
    - Any change to the conversion constants or the rounding rule needs a separate ADR + a coordinated snapshot regen. The `model/tiny/` weights are not casually re-trained against new colour math.
  - The option-table macro is plain text substitution. `VMAF_TINY_AI_MODEL_PATH_OPTION(state_t, help)` emits a single struct literal: no control flow, no recursion, no variadic shenanigans (Power-of-10 rule 1 / rule 9). Don't extend it into a multi-option emitter without a fresh ADR.
- On upstream sync: zero interaction with upstream. `feature_lpips.c` and `fastdvdnet_pre.c` are fork-only files. The new `dnn/tiny_extractor_template.h` lives entirely under the fork-introduced `core/src/dnn/`. An upstream sync that rewrites unrelated `feature_*.c` files won't conflict.
- Re-test on rebase:
cd libvmaf
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=dnn
meson test -C build-cpu test_lpips test_fastdvdnet_pre
# All 10 dnn-suite + both extractor tests must pass.
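For orientation when reviewing `vmaf_tiny_ai_yuv8_to_rgb8_planes`, here is roughly what a textbook BT.709 limited-range conversion with nearest-neighbour 4:2:0 chroma upsampling looks like. It is a reference sketch only. The constants and rounding are the common published ones, not necessarily the fork's, and it returns RGB tuples rather than planes. It must not be used to regenerate snapshots.

```python
def clamp8(v):
    """Round to nearest and clamp to the 8-bit range."""
    return max(0, min(255, int(round(v))))


def yuv420_to_rgb(y_plane, u_plane, v_plane, width, height):
    """BT.709 limited-range 8-bit YUV 4:2:0 -> list of (R, G, B) tuples.

    Chroma is upsampled by nearest neighbour: pixel (x, y) reads chroma
    sample (x // 2, y // 2). Textbook constants, not the fork's exact math.
    """
    cw = (width + 1) // 2  # chroma plane width
    rgb = []
    for y in range(height):
        for x in range(width):
            luma = 1.164 * (y_plane[y * width + x] - 16)
            cb = u_plane[(y // 2) * cw + x // 2] - 128
            cr = v_plane[(y // 2) * cw + x // 2] - 128
            r = luma + 1.793 * cr
            g = luma - 0.213 * cb - 0.533 * cr
            b = luma + 2.112 * cb
            rgb.append((clamp8(r), clamp8(g), clamp8(b)))
    return rgb


# 2x2 frame sharing one chroma sample: limited-range black and white.
print(yuv420_to_rgb([16, 235, 16, 235], [128], [128], 2, 2))
```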
0095 — Vulkan ring-depth tunable (ADR-0251 follow-up #3)
- PR: feat/t7-29-followup3-ring-tunable.
- What rebases need to know:
  - `VmafVulkanConfiguration` grew an additive `unsigned max_outstanding_frames` field. Existing zero-initialised configs continue to receive the canonical default (`0 → VMAF_VULKAN_RING_DEFAULT == 4`).
  - The clamp helper `vmaf_vulkan_clamp_ring_size` moved from `import.c` (file-local `static`) to `vulkan_internal.h` (`static inline`), so `state_init` and `lazy_alloc_ring` share one definition.
  - An upstream sync that re-introduces the `static` in `import.c` would shadow the header helper. Drop the duplicate and keep the inline.
- New public symbol: `vmaf_vulkan_state_max_outstanding_frames(const VmafVulkanState *)`, a read-side accessor for the clamped value. Pure additive surface; no upstream collision.
- On upstream sync: zero interaction. The ring is wholly fork-introduced (ADR-0251); upstream Netflix has no Vulkan backend.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_async_pending_fence
# All 8 cases must pass: 4 v2-contract + 4 ring-tunable.
```
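The defaulting rule for `max_outstanding_frames` can be illustrated with a tiny Python model of the clamp. Only the `0 → 4` default comes from this note. `RING_MIN` / `RING_MAX` are placeholders, since the actual bounds of `vmaf_vulkan_clamp_ring_size` are not given here. A single shared definition matters because `state_init` and `lazy_alloc_ring` must agree on the ring depth.

```python
VMAF_VULKAN_RING_DEFAULT = 4  # canonical default quoted above
RING_MIN, RING_MAX = 1, 16    # hypothetical bounds, not the fork's values


def clamp_ring_size(requested):
    """0 selects the default; any other request is clamped into [RING_MIN, RING_MAX]."""
    if requested == 0:
        return VMAF_VULKAN_RING_DEFAULT
    return max(RING_MIN, min(RING_MAX, requested))


print([clamp_ring_size(n) for n in (0, 2, 100)])  # [4, 2, 16]
```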
0096 — tools/vmaf-tune/ automation umbrella spec (ADR-0237 / Research-0044)
- PR: feat/vmaf-tune-spec.
- What rebases need to know: this PR ships only an umbrella ADR + research digest under `docs/`. There is no tracked source code, no `tools/vmaf-tune/` directory yet, and no Meson change. An upstream sync touching ffmpeg-patches or `libvmaf/` cannot collide with this PR.
- On upstream sync: zero interaction. Spec-only PR.
- Re-test on rebase:
# No build/test impact — verify the docs render and links are alive:
ls docs/adr/0237-quality-aware-encode-automation.md \
docs/research/0044-quality-aware-encode-automation.md
grep -c '\[ADR-0237\]' docs/adr/README.md
0097 — test_speed gated on enable_float (fix default-build failure)
- PR: fix/test-speed-chroma-registration.
- What rebases need to know: `core/test/meson.build` now wraps the `test_speed` executable + `test()` registration in `if get_option('enable_float')`.
  - Why: the `speed_chroma` / `speed_temporal` extractors live in `speed.c`, which is only compiled when `enable_float=true`. Their entries in `feature_extractor.c` are wrapped in `#if VMAF_FLOAT_FEATURES`.
  - Symptom before the fix: on a default build (`enable_float=false`), the test's `vmaf_get_feature_extractor_by_name("speed_chroma")` returned NULL.
- On upstream sync: zero interaction. `test_speed.c` was added fork-side via the Netflix port commit `d3647c73`. The gating pattern matches `test_vulkan_*` (`if get_option('enable_vulkan').enabled()`).
- Re-test on rebase:
```bash
# default (enable_float=false): test_speed must NOT be in the suite.
# Pass enable_float=false explicitly: --reconfigure keeps previously set options.
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_float=false --reconfigure
ninja -C build
meson test -C build  # expect: NO test_speed in the run
# CI shape (enable_float=true): test_speed must run + pass
meson setup build libvmaf -Denable_float=true --reconfigure
ninja -C build
meson test -C build test_speed  # expect: 5/5 pass
```
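A tiny model of an #if-gated table shows why the lookup returned NULL instead of failing to link. The struct, the table contents, and find_extractor are hypothetical stand-ins, not libvmaf's feature_extractor.c. When the file is built without VMAF_FLOAT_FEATURES set to a non-zero value, the speed rows are compiled out and the lookup falls through to its NULL return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a feature-extractor registry. */
typedef struct {
    const char *name;
} Extractor;

static const Extractor registry[] = {
    { "psnr" },
#if defined(VMAF_FLOAT_FEATURES) && VMAF_FLOAT_FEATURES
    { "speed_chroma" },   /* present only when float features are built */
    { "speed_temporal" },
#endif
    { NULL },             /* sentinel */
};

/* Returns the matching entry, or NULL if unknown or compiled out. */
static const Extractor *find_extractor(const char *name)
{
    assert(name != NULL);
    for (const Extractor *e = registry; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

Gating the test() registration in meson.build keeps the test off the default build. It cannot exercise code that the preprocessor never compiled, so it should not run there.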
0098 — Vulkan picture preallocation surface (ADR-0238)
- PR: feat/vulkan-picture-preallocation.
- What rebases need to know: ABI grows additively.
  - New public surface in core/include/libvmaf/libvmaf_vulkan.h: enum VmafVulkanPicturePreallocationMethod, VmafVulkanPictureConfiguration, vmaf_vulkan_preallocate_pictures, vmaf_vulkan_picture_fetch.
  - New enumerator VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE in core/src/picture.h::VmafPictureBufferType.
  - New TU core/src/vulkan/picture_vulkan_pool.c (~180 LOC); registered in core/src/vulkan/meson.build.
  - Fork-internal accessor vmaf_vulkan_state_context() (declared in vulkan_internal.h) exposes the imported state's VkInstance/VkDevice to the pool. It is used only by libvmaf.c::vmaf_vulkan_preallocate_pictures.
  - VmafContext field added: vmaf->vulkan.pool next to vmaf->vulkan.state. The vmaf_close() teardown closes the pool before clearing the state pointer (matches SYCL).
- On upstream sync: zero interaction. The Vulkan backend is fork-only; upstream Netflix has no Vulkan integration.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_pic_preallocation
# All 6 cases must pass under ASan/UBSan:
#   test_method_none_is_a_no_op
#   test_method_host_allocates_round_robins
#   test_method_device_allocates_round_robins
#   test_fetch_without_preallocate_falls_back
#   test_unknown_method_rejected
#   test_null_args_rejected
```
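The six test names suggest a small contract:

- NONE is a no-op.
- HOST and DEVICE hand out slots round-robin.
- A fetch without preallocation falls back.
- Unknown methods and NULL arguments are rejected.

Below is a hedged model of that contract. The enum, PicPool, pool_preallocate and pool_fetch are hypothetical names, and integer slot indices stand in for pictures. Treating -1 from fetch as "allocate a fresh picture" is an assumption about what the fallback means.

```c
#include <assert.h>
#include <stddef.h>

enum PreallocMethod { PREALLOC_NONE = 0, PREALLOC_HOST, PREALLOC_DEVICE };

typedef struct {
    enum PreallocMethod method;
    unsigned count; /* slots created by the preallocate call */
    unsigned next;  /* round-robin cursor */
} PicPool;

/* Returns 0 on success, -1 for NULL args or an unknown method. */
static int pool_preallocate(PicPool *p, int method, unsigned count)
{
    if (p == NULL)
        return -1; /* NULL args rejected */
    if (method < PREALLOC_NONE || method > PREALLOC_DEVICE)
        return -1; /* unknown method rejected */
    p->method = (enum PreallocMethod)method;
    p->count = (method == PREALLOC_NONE) ? 0 : count; /* NONE: no-op */
    p->next = 0;
    return 0;
}

/* Returns the next slot index, or -1 to tell the caller to allocate a
 * fresh picture because nothing was preallocated. */
static int pool_fetch(PicPool *p)
{
    assert(p != NULL);
    if (p->count == 0)
        return -1;
    int slot = (int)p->next;
    p->next = (p->next + 1) % p->count;
    return slot;
}
```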
0099 — feature_mobilesal.c + transnet_v2.c migrated to tiny_extractor_template.h
- PR: refactor/migrate-ai-to-template.
- What rebases need to know: feature_mobilesal.c and transnet_v2.c previously open-coded the model-path resolution (getenv + log block), the YUV→RGB kernel (mobilesal only), the vmaf_dnn_session_open + log boilerplate, and the VmafOption[].model_path row. They now use the helpers from dnn/tiny_extractor_template.h (PR #251), the same template that feature_lpips.c and fastdvdnet_pre.c already consume. Net −98 LOC of identical boilerplate.
- Behavior preserved: bit-exact YUV→RGB conversion (mobilesal used a literal copy of the feature_lpips.c body that the template hoisted), identical error-log strings, identical option-table flag/type/offset shape. The migrated mobilesal_options macro expands to the same struct literal the hand-rolled version produced.
- On upstream sync: zero interaction. Both files are fork-introduced; upstream Netflix has neither extractor.
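The claim that the macro "expands to the same struct literal" can be checked mechanically. The sketch below uses a hypothetical, reduced OptionRow type and a hypothetical TINY_MODEL_PATH_OPTION macro; libvmaf's real VmafOption carries more fields. It compares a macro-generated row with a hand-rolled one field by field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option row (no help text, defaults, bounds). */
typedef struct {
    const char *name;
    size_t offset;
    int type;
} OptionRow;

enum { OPT_TYPE_STRING = 1 };

typedef struct {
    int other_field;        /* hypothetical, gives model_path a non-zero offset */
    const char *model_path;
} DemoState;

/* Hypothetical template macro: one model_path row for any state type. */
#define TINY_MODEL_PATH_OPTION(StateT) \
    { "model_path", offsetof(StateT, model_path), OPT_TYPE_STRING }

static const OptionRow hand_rolled[] = {
    { "model_path", offsetof(DemoState, model_path), OPT_TYPE_STRING },
    { NULL, 0, 0 },
};

static const OptionRow via_macro[] = {
    TINY_MODEL_PATH_OPTION(DemoState),
    { NULL, 0, 0 },
};

/* Field-by-field comparison of two non-sentinel rows. */
static int rows_equal(const OptionRow *a, const OptionRow *b)
{
    assert(a->name != NULL && b->name != NULL);
    return strcmp(a->name, b->name) == 0 && a->offset == b->offset &&
           a->type == b->type;
}
```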
0100 — cuda/ring_buffer.{c,h} → gpu_picture_pool.{c,h} (ADR-0239)
- PR: refactor/gpu-picture-pool-extract.
- What rebases need to know: core/src/cuda/ring_buffer.c and ring_buffer.h are removed. The same callback-based round-robin pool lives at core/src/gpu_picture_pool.{c,h} under renamed symbols (VmafRingBuffer → VmafGpuPicturePool, vmaf_ring_buffer_* → vmaf_gpu_picture_pool_*, _fetch_next_picture → _fetch). All call sites in libvmaf.c migrated. core/test/test_ring_buffer.c was renamed to test_gpu_picture_pool.c with the corresponding meson update.
- Netflix-upstream interaction: minimal. Netflix's cuda/ring_buffer.{c,h} was last touched in commit cb1d49c6. An upstream sync that resurrects the old names should be redirected to the new ones; the file move is purely fork-local. The Netflix#1300 mutex-destroy-order fix is preserved (ADR-0157): it moved verbatim to the new file and remains attached to vmaf_gpu_picture_pool_close.
- SYCL pool migration: vmaf_sycl_picture_pool_* keeps its public-internal API but now delegates to the generic pool. The SYCL wrapper struct (VmafSyclPicturePool) just owns the VmafSyclCookie storage. std::mutex drops out.
- Vulkan pool migration: bundled into this PR after #264 merged. picture_vulkan_pool.c is rewritten as a thin wrapper around the generic pool: the wrapper struct owns per-pool state for the alloc/free callbacks, and the generic pool owns the round-robin slots / mutex / unwind. Same pattern as the SYCL migration above.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=dnn
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# All 11 dnn-suite + 4 extractor smoke tests must pass.
meson test -C build  # 47/47 pass under ASan/UBSan
# CUDA build (CI-only; pre-existing local nvcc include-path quirk):
meson setup build-cuda libvmaf -Denable_cuda=true
ninja -C build-cuda
meson test -C build-cuda test_gpu_picture_pool
# SYCL build:
meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
meson test -C build-sycl
```
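Both the SYCL and Vulkan wrappers now feed alloc/free callbacks plus a cookie into one generic pool that owns the slots and the unwind path. Below is a single-threaded sketch of that shape. Pool, pool_init, pool_fetch, pool_close and the counting_* callbacks are hypothetical names, not the VmafGpuPicturePool API, and the real pool's mutex is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

typedef int (*PoolAllocFn)(void *cookie, void **out);
typedef void (*PoolFreeFn)(void *cookie, void *pic);

typedef struct {
    void **slots;
    unsigned count;
    unsigned next;      /* round-robin cursor */
    void *cookie;       /* backend-owned state (the wrapper struct's role) */
    PoolFreeFn free_cb;
} Pool;

/* Allocates `count` slots via the callback; on any failure, frees the
 * slots already created and leaves p->slots NULL. */
static int pool_init(Pool *p, unsigned count, PoolAllocFn alloc_cb,
                     PoolFreeFn free_cb, void *cookie)
{
    assert(p != NULL && count > 0);
    p->slots = calloc(count, sizeof(*p->slots));
    if (p->slots == NULL)
        return -1;
    p->count = count;
    p->next = 0;
    p->cookie = cookie;
    p->free_cb = free_cb;
    for (unsigned i = 0; i < count; i++) {
        if (alloc_cb(cookie, &p->slots[i]) != 0) {
            while (i-- > 0) /* unwind only the slots that exist */
                free_cb(cookie, p->slots[i]);
            free(p->slots);
            p->slots = NULL;
            return -1;
        }
    }
    return 0;
}

static void *pool_fetch(Pool *p)
{
    void *pic = p->slots[p->next];
    p->next = (p->next + 1) % p->count;
    return pic;
}

static void pool_close(Pool *p)
{
    for (unsigned i = 0; i < p->count; i++)
        p->free_cb(p->cookie, p->slots[i]);
    free(p->slots);
    p->slots = NULL;
}

/* Example backend: the cookie tracks live slots and simulates OOM once
 * its budget is spent, which exercises the unwind path. */
typedef struct { int live; int budget; } Counter;

static int counting_alloc(void *cookie, void **out)
{
    Counter *c = cookie;
    if (c->budget <= 0)
        return -1;
    *out = malloc(16);
    if (*out == NULL)
        return -1;
    c->budget--;
    c->live++;
    return 0;
}

static void counting_free(void *cookie, void *pic)
{
    Counter *c = cookie;
    free(pic);
    c->live--;
}
```

Keeping the backend-specific state behind an opaque cookie is what lets one pool body serve CUDA, SYCL and Vulkan without any per-backend branches.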
0104 — psnr_vulkan.c migrated to vulkan/kernel_template.h
- PR: refactor/migrate-psnr-vulkan-to-template.
- What rebases need to know: vulkan/kernel_template.h (410 LOC, ADR-0246, PR #251) shipped with zero consumers. Its docstring designated psnr_vulkan.c as the reference implementation. This PR lands that migration as the first consumer of the Vulkan template, paired with PR #269 (the first CUDA template consumer). The 5 long-lived pipeline objects (descriptor-set layout, pipeline layout, shader module, compute pipeline, descriptor pool) collapse from individual struct fields into one VmafVulkanKernelPipeline pl bundle. create_pipeline() (~104 LOC) collapses to a single vmaf_vulkan_kernel_pipeline_create() call (~30 LOC); the template owns the descriptor-set layout creation, pipeline layout, shader module, compute pipeline, and descriptor-pool sizing. close_fex()'s vkDeviceWaitIdle + 5× vkDestroy* sweep collapses to one vmaf_vulkan_kernel_pipeline_destroy() call.
- Net LOC delta: −55 LOC on psnr_vulkan.c directly. Unlike the CUDA template (where helper-call boilerplate roughly matches the inline savings), the Vulkan template removes enough pipeline-creation code that even the first consumer comes out ahead.
- Bit-exactness gates: spec-constants, push-constant struct, shader bytecode, dispatch grid math, and host-side reduction are byte-identical to the prior implementation. The template only owns descriptor-set layout / pipeline layout / shader module / compute pipeline creation / descriptor pool sizing, none of which affects the kernel's mathematical behaviour. The cross-backend parity gate (places=4) re-runs unchanged.
- On upstream sync: zero interaction. psnr_vulkan.c is fork-introduced (T7-23 / ADR-0182 / ADR-0216).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe
# Cross-backend parity gate (places=4):
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
```
0105 — moment_vulkan.c + ciede_vulkan.c migrated to vulkan/kernel_template.h
- PR: refactor/migrate-motion-vulkan-to-template (note: the branch name reflects the original intent; motion's two-pipeline shape didn't fit the template's single-pipeline contract, so this PR migrates moment + ciede instead).
- What rebases need to know: second and third consumers of vulkan/kernel_template.h (after PR #270 = psnr_vulkan, the first consumer). Both files follow the identical migration pattern:
  - Replace 5 individual pipeline-object fields (dsl, pipeline_layout, shader, pipeline, desc_pool) with one VmafVulkanKernelPipeline pl bundle.
  - Replace ~100 LOC of create_pipeline() body (descriptor-set layout + pipeline layout + shader module + compute pipeline + descriptor pool boilerplate) with a single vmaf_vulkan_kernel_pipeline_create() call.
  - Replace close_fex()'s vkDeviceWaitIdle + 5× vkDestroy* sweep with one vmaf_vulkan_kernel_pipeline_destroy() call.
- Per-file LOC deltas:
  - moment_vulkan.c: −60 LOC (450 → 390).
  - ciede_vulkan.c: −59 LOC (536 → 477).
  - Net: −119 LOC.
- Bit-exactness preserved: spec-constants (width/height/bpc/subgroup_size identical across both), push-constant structs (MomentPushConsts, CiedePushConsts), shader bytecodes (moment_spv, ciede_spv), dispatch grid math, and host-side reductions are byte-identical to the prior implementation. Cross-backend parity gates (places=4 for the moment integer reduce; places=2 for ciede transcendentals per ADR-0187) re-run unchanged.
- motion_vulkan.c deferred: motion uses two pipelines (first frame vs subsequent) sharing one DSL + layout + shader + pool. The template's current shape produces one pipeline per descriptor; splitting motion across two VmafVulkanKernelPipeline instances would duplicate the shared objects. Tracked as a follow-up template extension (multi-pipeline support).
- On upstream sync: zero interaction. Both files are fork-introduced (T7-23 / ADR-0182 / ADR-0187).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe (under ASan/UBSan)
python scripts/ci/cross_backend_parity_gate.py --feature float_moment_ref1st --places 4
python scripts/ci/cross_backend_parity_gate.py --feature ciede2000 --places 2
```
0101 — GPU backend pattern doc (ADR-0240)
- PR: docs/gpu-backend-template.
- What rebases need to know: doc-only PR. Adds docs/development/gpu-backend-template.md (the recipe new GPU backends follow) and core/include/libvmaf/AGENTS.md (public-headers-tree invariant note). No source code, no meson changes, no ABI impact.
- On upstream sync: zero interaction. Both files are fork-introduced.
- Re-test on rebase:
```bash
# Doc-only — verify links resolve:
test -f docs/development/gpu-backend-template.md
test -f core/include/libvmaf/AGENTS.md
grep -c 'gpu-backend-template' core/include/libvmaf/AGENTS.md
```
0102 — Tiny-AI test registration macro (tiny_ai_test_template.h)
- PR: refactor/test-registration-macro.
- What rebases need to know: the new core/test/tiny_ai_test_template.h emits the four standard registration tests (<name>_is_registered, <name>_provides_primary_feature, <name>_options_table_well_formed, <name>_init_rejects_missing_model) via the VMAF_TINY_AI_DEFINE_REGISTRATION_TESTS(ext, feat, env, prefix) macro. The four per-extractor test files (test_lpips.c, test_mobilesal.c, test_transnet_v2.c, test_fastdvdnet_pre.c) shrank from ~140 LOC each to ~20-50 LOC. Net −286 LOC. Behavior is preserved bit-exactly (same assertions, same env-var save/restore dance, same setenv shim for MSVCRT). TransNet V2 keeps two extractor-specific extra tests (binary-flag round-trip + provided_features list-termination) that the macro doesn't cover.
- On upstream sync: zero interaction. The four test files are fork-introduced (per ADR-0042 / ADR-0168 / ADR-0220 / ADR-0223 / ADR-0215).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# 4/4 binaries pass; 18 individual tests total (4x4 standard + 2
# TransNet V2 extras).
```
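The macro depends on token pasting (##) and stringising (#) to stamp out uniquely named test functions for each extractor. Below is a two-test miniature of that mechanism. The registry, the demo_* extractor and feature names, and DEFINE_REGISTRATION_TESTS are all hypothetical. The real macro emits four tests and also takes an env parameter. Each generated test returns NULL on pass or a message on failure, in the spirit of the mu-test char * convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *primary_feature;
} Ext;

static const Ext registry[] = {
    { "demo_a", "demo_a_score" },
    { "demo_b", "demo_b_score" },
    { NULL, NULL },
};

static const Ext *lookup(const char *name)
{
    for (size_t i = 0; registry[i].name != NULL; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}

/* feat must be a string literal so it can be concatenated into messages. */
#define DEFINE_REGISTRATION_TESTS(ext, feat, prefix)                    \
    static const char *prefix##_is_registered(void)                     \
    {                                                                    \
        return lookup(#ext) != NULL ? NULL : #ext " is not registered"; \
    }                                                                    \
    static const char *prefix##_provides_primary_feature(void)          \
    {                                                                    \
        const Ext *e = lookup(#ext);                                     \
        return (e != NULL && strcmp(e->primary_feature, feat) == 0)      \
                   ? NULL                                                \
                   : #ext " does not provide " feat;                    \
    }

DEFINE_REGISTRATION_TESTS(demo_a, "demo_a_score", demo_a)
DEFINE_REGISTRATION_TESTS(demo_b, "demo_b_score", demo_b)
DEFINE_REGISTRATION_TESTS(demo_missing, "none", demo_missing)
```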
0103 — integer_psnr_cuda.c migrated to cuda/kernel_template.h
- PR: refactor/migrate-psnr-cuda-to-template.
- What rebases need to know: cuda/kernel_template.h shipped with no consumers in PR #251 (ADR-0246). This PR migrates the first consumer, integer_psnr_cuda.c, which the template's own docstring explicitly designated as the reference. The CUstream + CUevent + CUevent triple and the (VmafCudaBuffer device, void *host_pinned, size_t bytes) readback pair are now dispensed by the template helpers (vmaf_cuda_kernel_lifecycle_init/_close, vmaf_cuda_kernel_readback_alloc/_free, vmaf_cuda_kernel_submit_pre_launch, vmaf_cuda_kernel_collect_wait) instead of being open-coded. PsnrStateCuda shrinks: one VmafCudaKernelLifecycle replaces three fields (event + finished + str), and one VmafCudaKernelReadback replaces (sse + sse_host).
- Net LOC delta: +8 LOC on integer_psnr_cuda.c alone, because the helpers add per-call boilerplate. The dedup win materialises as more CUDA feature kernels (motion / moment / ssim / vif / adm) migrate one at a time in follow-up PRs; each subsequent migration saves ~15 LOC.
- Bit-exactness gates: kernel launch + reduction logic unchanged. The migration only touches state-management boilerplate around the kernel; the SSE accumulator math, the per-bpc kernel function lookup, the host-side log10 score formula, and the dispatch grid-dim calculation are byte-identical to the prior implementation. The Netflix golden gate + CPU/CUDA cross-backend parity gate (places=4) re-run unchanged.
- On upstream sync: zero interaction. integer_psnr_cuda.c is fork-introduced (T7-23 / ADR-0182).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=true
ninja -C build
meson test -C build  # CUDA test suite must pass
# Cross-backend parity gate:
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
```
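The bit-exactness argument rests on the math staying put. The migration moves lifecycle plumbing, not the SSE accumulation or the per-bpc kernel choice. The plain-C sketch below shows those two pieces so it is clear what "unchanged" covers. sse_8bit, sse_16bit and sse_for_bpc are hypothetical names rather than the fork's kernels. The host-side log10 step that turns SSE into a PSNR is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*SseFn)(const void *ref, const void *dis, size_t n);

/* Sum of squared differences over byte samples (8-bit planes). */
static uint64_t sse_8bit(const void *ref, const void *dis, size_t n)
{
    const uint8_t *r = ref, *d = dis;
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t e = (int64_t)r[i] - (int64_t)d[i];
        acc += (uint64_t)(e * e);
    }
    return acc;
}

/* Same accumulation over 16-bit containers (9..16-bit planes). */
static uint64_t sse_16bit(const void *ref, const void *dis, size_t n)
{
    const uint16_t *r = ref, *d = dis;
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t e = (int64_t)r[i] - (int64_t)d[i];
        acc += (uint64_t)(e * e);
    }
    return acc;
}

/* Per-bpc lookup; returns NULL for unsupported depths. */
static SseFn sse_for_bpc(unsigned bpc)
{
    if (bpc == 8)
        return sse_8bit;
    if (bpc > 8 && bpc <= 16)
        return sse_16bit;
    return NULL;
}
```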
0125 — Vulkan submit-side template + fence pool + descriptor pre-alloc bundle (ADR-0256)
- Touches:
  - core/src/vulkan/kernel_template.h — fork-local. Output landing in runs/phase_a/ is gitignored — rerun the script to reproduce. VmafVulkanKernelSubmitPool struct + _create/_destroy/_acquire helpers + vmaf_vulkan_kernel_descriptor_sets_alloc helper. Upstream has no Vulkan backend, so there is no merge surface.
  - core/src/feature/vulkan/{psnr_hvs,vif,float_vif,float_adm}_vulkan.c — fork-local kernel TUs, also with no upstream peer.
- Invariant: the four migrated kernels keep all per-frame VkFence + VkCommandBuffer + VkDescriptorSet resources alive across frames in the pool. Pre-bound descriptor sets rely on the kernel's VmafVulkanBuffer * handles being init-time stable (allocated in init(), freed only in close_fex). vmaf_vulkan_kernel_pipeline_destroy destroys the descriptor pool, so pre-allocated sets are released implicitly via the pool; callers must NOT call vkFreeDescriptorSets on them.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_vulkan=enabled
ninja -C build
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
    meson test -C build test_vulkan_smoke \
    test_vulkan_async_pending_fence \
    test_vulkan_pic_preallocation
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature adm --backend vulkan --places 4
```
0107 — psnr_hvs_cuda async upload + persistent pinned staging (T-GPU-OPT-2/3)
- Touches:
  - core/src/feature/cuda/integer_psnr_hvs_cuda.c — only consumer; fork-local from inception (T7-23 / ADR-0188 / ADR-0191). State adds upload_str (dedicated H2D stream), upload_done (cross-stream completion event), and per-plane persistent pinned h_uint_ref[3] / h_uint_dist[3] staging buffers allocated once in init_fex_cuda. The per-call helper upload_plane_cuda is split into issue_d2h_plane (pic-stream D2H), convert_plane (CPU normalise), and issue_h2d_plane (upload-stream H2D). submit_fex_cuda runs the three phases explicitly and records upload_done after the last H2D, then calls cuStreamWaitEvent on lc.str before kernel launches.
  - core/src/cuda/AGENTS.md — adds a rebase-sensitive invariant entry under §Rebase-sensitive invariants documenting the three-phase flow + persistent staging contract.
- Invariant: the pinned h_uint_* and h_ref/h_dist buffers are never freed and re-allocated mid-stream; the H2Ds must run on upload_str (not on lc.str) so the cuStreamWaitEvent cross-stream link is meaningful; the upload_done event is recorded after the last H2D for the current frame and waited on once before the first kernel launch of that frame. CUDA graph capture (future T-GPU-OPT-N) depends on the no-per-frame-alloc invariant; collapsing the three-phase split or re-introducing per-frame vmaf_cuda_buffer_host_alloc calls breaks that follow-up. The bit-exactness gate is places=3 for psnr_hvs_y / cb / cr and the combined psnr_hvs (matches the existing matrix; not places=4).
- On upstream sync: zero interaction. integer_psnr_hvs_cuda.c is fork-introduced (T7-23 / ADR-0188 / ADR-0191).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature psnr_hvs --backend cuda --places 3
```
0227 — output.c writer-format unit tests (R3 of coverage-gap-2026-05-02)
- Touches:
  - core/test/test_output.c (new) — exercises the four writers in core/src/output.c (XML / JSON / CSV / SUB) end-to-end via tmpfile()-backed sinks and a synthetic VmafFeatureCollector. Pure test-only; no production code change.
  - core/test/meson.build — registers test_output next to test_feature_collector (mirrors that test's wiring: link_with: libvmaf + libsvm objects + log/predict/metadata helpers).
- Invariant: the test pulls libvmaf.c and output.c in via #include "*.c" (mirroring the precedent in test_feature_collector.c) so the per-translation-unit .gcno lands in the test build dir and gcovr aggregates output.c's coverage. The mu-test framework macro (mu_assert) deliberately early-returns from each static char *test_*() body. That is why every test body trips clang-analyzer-unix.Malloc "potential leak" notes (cleanup runs only on the success-tail path). This pattern is shared across every core/test/test_*.c file and is load-bearing (per the ADR-0141 NOLINT carve-out): replacing it with goto-cleanup would obscure the per-assertion failure message.
- On upstream sync: zero interaction. output.c is upstream-mirrored, but this PR doesn't touch it. The test only depends on the four public function signatures (vmaf_write_output_{xml,json,csv,sub}); if Netflix renames or reorders those, the test fails to compile and the rebase author updates it then.
- Re-test on rebase:
```bash
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && ./build/test/test_output
```
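The tmpfile()-backed sink pattern lets a writer target a real FILE * and the test read the bytes back for comparison. Below is a minimal sketch using a hypothetical one-row CSV writer (write_csv_row), which is not libvmaf's vmaf_write_output_csv. The capture helper is likewise illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical writer standing in for one vmaf_write_output_* sink. */
static int write_csv_row(FILE *out, unsigned frame, double score)
{
    return fprintf(out, "%u,%.6f\n", frame, score) < 0 ? -1 : 0;
}

/* Runs the writer against an anonymous temp file, reads the bytes back
 * into buf (NUL-terminated), and returns the byte count or -1. */
static long capture(char *buf, size_t cap, unsigned frame, double score)
{
    assert(buf != NULL && cap > 0);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (write_csv_row(f, frame, score) != 0) {
        fclose(f);
        return -1;
    }
    rewind(f);
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f); /* tmpfile() storage is removed automatically on close */
    return (long)n;
}
```

Because the sink is a real stream, the writer under test runs unmodified, with no string-buffer shim that could mask FILE-level behaviour.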
0126 — OSSF Scorecard policy (ADR-0263)
- Touches: .github/workflows/scorecard.yml (line 45, the github/codeql-action/upload-sarif@<sha> pin). The rest of the policy is doc-only (docs/adr/0263-*.md, docs/research/0053-*.md, changelog.d/security/). Upstream Netflix/vmaf does not ship a Scorecard workflow, so the path itself is fork-introduced and won't conflict.
- Invariant: the upload-sarif SHA must point to a commit that currently exists in github/codeql-action's git tree. A SHA that was once the v4 head but no longer exists in the action repository triggers Scorecard's "imposter commit" defence and breaks the workflow with a 400 error against api.scorecard.dev. Verify on every Dependabot bump by spot-checking that gh api /repos/github/codeql-action/commits/<sha> returns 200.
- On upstream sync: zero interaction.
- Re-test on rebase:
```bash
# Confirm the pin still resolves to a real commit:
pin=$(grep -oE 'codeql-action/upload-sarif@[a-f0-9]{40}' \
    .github/workflows/scorecard.yml | head -1 | cut -d@ -f2)
gh api "/repos/github/codeql-action/commits/$pin" --jq '.sha'
# Then watch the next master push for a green Scorecard run:
gh run list --workflow scorecard --repo VMAFx/vmafx --limit 1
```
0228 — U-2-Net u2netp saliency replacement deferred (ADR-0265)
- Touches: docs-only.
  - docs/adr/0265-u2netp-saliency-replacement-blocked.md — new ADR continuing the deferral chain started by ADR-0257.
  - docs/research/0055-u2netp-saliency-replacement-survey.md — new research digest (upstream survey + license + distribution-allowlist audit + alternatives walk).
  - docs/ai/models/mobilesal.md — pointer block updated to reference both ADR-0257 (first blocker) and ADR-0265 (second blocker).
  - model/tiny/registry.json — mobilesal_placeholder_v0 notes field updated to reference ADR-0265 alongside ADR-0257 (no schema / sha256 / file changes).
  - model/tiny/mobilesal.json — sidecar notes field updated in lockstep.
  - scripts/gen_mobilesal_placeholder_onnx.py — generator notes string updated so re-running is idempotent against the new sidecar / registry text.
  - CHANGELOG.md — Changed entry via changelog.d/changed/T6-2a-followup-u2netp-replacement-deferred.md.
  - docs/adr/README.md — index row via docs/adr/_index_fragments/0265-u2netp-saliency-replacement-blocked.md.
- Invariant: zero C-side surface change. The feature_mobilesal.c tensor-name contract (input input → output saliency_map, NCHW float32 [1, 3, H, W] → [1, 1, H, W]) is unchanged; the on-disk model/tiny/mobilesal.onnx (sha256 f1226310…) is unchanged; mobilesal_placeholder_v0's smoke: true flag is unchanged. Any future drop-in (U-2-Net via T6-2a-mirror-u2netp-via-release + T6-2a-widen-allowlist-resize, a distilled student, or a BASNet / PoolNet survey result) replaces the .onnx and bumps the registry sha256 without touching the C side.
- On upstream sync: zero interaction. feature_mobilesal.c, the registry, the ADR, and the research digest are all fork-local (T6-2a; ADR-0218 / ADR-0257 / ADR-0265; not present in Netflix upstream).
- Re-test on rebase:
```bash
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_mobilesal
python3 ai/scripts/validate_model_registry.py
bash scripts/docs/concat-adr-index.sh --check
bash scripts/release/concat-changelog-fragments.sh --check
```
0108 — ssim_accumulate_avx512 per-lane double reduction vectorised
- ADR: ADR-0139 (existing; no new ADR — the per-lane reduction order is unchanged).
- Touches:
  - core/src/feature/x86/ssim_avx512.c — the ssim_accumulate_block_avx512 body. The per-lane scalar ssim_accumulate_lane calls (16 of them) are replaced by two 8-wide __m512d passes that compute lv, cv, sv, and lv*cv*sv lane-wise in vector double. Aligned double[16] spill buffers replace the previous _Alignas(64) float[16]×6 spill, and the scalar accumulation loop now does 4×16 vaddsd instead of 16 invocations of the per-lane helper.
  - CHANGELOG.md — Changed entry.
  - This file — this entry.
- Invariant (load-bearing for ADR-0139 bit-exactness):
  - Per-lane double computation order is byte-identical: ((2.0 * rm) * cm + C1) / l_den, then (2.0 * srsc + C2) / c_den, then (lv * cv) * sv. No FMA contraction: separate _mm512_mul_pd + _mm512_add_pd are used, and _mm512_fmadd_pd is forbidden because it changes the rounding count and would diverge from scalar's two-step mul + add.
  - Float→double widening uses _mm512_cvtps_pd, which is IEEE-754-exact for finite floats (a double's 52-bit mantissa holds a float's 23-bit mantissa losslessly).
  - Lane-by-lane left-to-right reduction order preserved: local_ssim += t_ssim[k] for k = 0..15. Tree reductions (pairwise add, dual-accumulator unroll) are forbidden because they break running-sum associativity against scalar.
  - AVX2 / NEON twins kept on the per-lane scalar path. Verified bit-identical against the new AVX-512 at --precision max on the Netflix src01_hrc00/01_576x324 and the checkerboard_1920_1080_10_3_*_0 pairs. The bit-exactness contract (ADR-0139) is per-lane, not per-ISA algorithm, so AVX2 / NEON stay scalar-per-lane until a dedicated PR vectorises them with the same care.
- Rebase impact: zero conflict with Netflix upstream, because the whole SSIM SIMD surface is fork-local (no upstream SSIM SIMD exists). Conflicts only arise if upstream changes ssim_accumulate_default_scalar in iqa/ssim_tools.c; in that case both the AVX2 / NEON per-lane helper and the AVX-512 vector-double block need a coordinated update preserving the three invariants above.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
# Bit-exact at --precision max, scalar vs AVX2 vs AVX-512:
for MASK in 0 16 255; do
    core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
        -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
        -w 576 -h 324 -p 420 -b 8 \
        --feature float_ms_ssim --feature float_ssim \
        --xml -o /tmp/m${MASK}.xml --precision max --cpumask $MASK
done
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m16.xml)   # empty
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m255.xml)  # empty
```
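Both ordering rules can be checked in portable C. The sketch below writes the per-lane term in the pinned operation order. It then contrasts the required left-to-right sum with the forbidden pairwise sum, on inputs chosen so that the two orders round differently in IEEE-754 double.

The function names are illustrative. The lane inputs are taken as given rather than derived the way the real kernel derives them. The sketch assumes plain SSE2 double arithmetic (x86-64), with no x87 extended precision and no -ffast-math. Note that plain C compilers may fuse a * b + c into an FMA unless contraction is disabled (for example with -ffp-contract=off); the intrinsics path sidesteps this by issuing separate mul and add.

```c
#include <assert.h>
#include <stddef.h>

/* Per-lane SSIM term in the exact operation order from the notes. */
static double ssim_lane(double rm, double cm, double srsc, double sv,
                        double l_den, double c_den, double C1, double C2)
{
    double lv = ((2.0 * rm) * cm + C1) / l_den; /* mul, then add: no FMA */
    double cv = (2.0 * srsc + C2) / c_den;
    return (lv * cv) * sv;
}

/* Required: strict left-to-right running sum over the lanes. */
static double reduce_left_to_right(const double *t, size_t n)
{
    double acc = 0.0;
    for (size_t k = 0; k < n; k++)
        acc += t[k];
    return acc;
}

/* Forbidden alternative: pairwise (tree) sum; n must be a power of two. */
static double reduce_pairwise(const double *t, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1)
        return t[0];
    return reduce_pairwise(t, n / 2) + reduce_pairwise(t + n / 2, n / 2);
}
```

With {1e16, 1, -1e16, 1}, the running sum rounds the first 1 away against 1e16 but keeps the last one, giving 1. The tree rounds both 1s away, giving 0. This is the kind of divergence the "no tree reductions" rule exists to prevent.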
- Why this matters on rebase: an upstream commit that touches core/src/feature/ssimulacra2.c could prompt a "let's also port the GPU XYB while we're here" follow-up. The ledger entry is the standing answer: don't. The measurement was redone on NVIDIA in May 2026 and the result still failed places=4 by five decades. See Research-0047.
0126 — FastDVDnet real upstream weights drop (ADR-0253)
- What changed: replaces model/tiny/fastdvdnet_pre.onnx with the wrapped real upstream FastDVDnet checkpoint (sha256 eb9444cf6f07eefdc7f4f68d09131074dbd1dcee6f88a331ba684dd2fb5937d4, ~9.5 MiB), refreshes the sidecar model/tiny/fastdvdnet_pre.json, flips the registry row's smoke: true → false, and adds license: "MIT" + the upstream commit pin c8fdf61. New exporter ai/scripts/export_fastdvdnet_pre.py (the older _placeholder.py exporter is retained for reference). New ADR docs/adr/0255-fastdvdnet-pre-real-weights.md; the user-facing doc docs/ai/models/fastdvdnet_pre.md is rewritten with provenance, license attribution, and reproduce-the-export instructions.
- Upstream source: fork-local. Netflix/vmaf does not ship a FastDVDnet temporal pre-filter; the C extractor and ONNX surface are entirely fork-introduced (ADR-0215). The wrapped weights are attribution-only (upstream m-tassano/fastdvdnet, MIT).
- On upstream sync: zero interaction. Every file touched (ai/scripts/export_fastdvdnet_pre*.py, model/tiny/fastdvdnet_pre.*, docs/ai/models/fastdvdnet_pre.md, docs/adr/0253-*.md, CHANGELOG fragment, ADR index fragment) lives in fork-introduced trees.
- Re-test on rebase:
```bash
# Re-derive the ONNX from the pinned upstream checkpoint.
mkdir -p /tmp/fastdvdnet_upstream && cd /tmp/fastdvdnet_upstream
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/model.pth
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/models.py
cd /path/to/vmaf
python3 ai/scripts/export_fastdvdnet_pre.py \
    --upstream-dir /tmp/fastdvdnet_upstream
python3 ai/scripts/validate_model_registry.py
meson test -C build --suite=fast --print-errorlogs test_fastdvdnet_pre
```
0127 — ONNX op-allowlist gains Resize (ADR-0258)
- Touches:
  - core/src/dnn/op_allowlist.c — fork-local file (no upstream counterpart). One new entry "Resize" under the /* convolutional */ block.
  - core/test/dnn/test_op_allowlist.c, core/test/dnn/test_onnx_scan.c — fork-local DNN tests.
  - ai/tests/test_op_allowlist.py — fork-local Python parity test.
- Invariant: the C allowlist is the single source of truth; the Python regex parser in ai/src/vmaf_train/op_allowlist.py walks the same op_allowlist.c file. Any future entry only needs the C edit, and Python symmetry follows automatically.
- Upstream source: fork-local. Netflix/vmaf has no ONNX op-allowlist surface; the entire core/src/dnn/ tree is fork-introduced.
- On upstream sync: zero interaction. Every file touched lives in fork-introduced trees.
- Re-test on rebase:
```bash
meson test -C build test_op_allowlist test_onnx_scan
PYTHONPATH=ai/src python -m pytest ai/tests/test_op_allowlist.py
```
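A flat, NULL-terminated table with one string literal per line is easy for a second-language parser to scrape, which is plausibly what keeps the Python regex parser in sync. Below is a hedged sketch of such a table and its lookup. Only "Resize" comes from these notes. The other entries and the op_is_allowed name are placeholders, not the fork's real list or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature allowlist; entries other than "Resize" are
 * placeholders. One literal per line keeps it regex-friendly. */
static const char *const op_allowlist[] = {
    /* convolutional */
    "Conv",
    "Resize",
    /* elementwise */
    "Add",
    "Relu",
    NULL, /* sentinel */
};

/* Returns 1 if op_type is allowed, 0 otherwise (including NULL).
 * Matching is exact and case-sensitive, like ONNX op_type strings. */
static int op_is_allowed(const char *op_type)
{
    if (op_type == NULL)
        return 0;
    for (size_t i = 0; op_allowlist[i] != NULL; i++)
        if (strcmp(op_allowlist[i], op_type) == 0)
            return 1;
    return 0;
}
```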
0231 — vif.comp + ciede.comp precise decorations (ADR-0269 / Step A of Vulkan 1.4 bump)
- Touches:
  - core/src/feature/vulkan/shaders/vif.comp (3 local-variable type qualifiers: g, sv_sq, gg_sigma_f → precise float).
  - core/src/feature/vulkan/shaders/ciede.comp (yuv_to_rgb outputs, rgb_to_xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE).
- Invariant: both shaders are fork-local (the Vulkan backend is fork-added; upstream Netflix/vmaf has no Vulkan compute kernels). The precise keyword is GLSL 4.50 standard syntax; glslc 2026.1 lowers it to per-result OpDecorate NoContraction. The decorations are load-bearing for the cross-backend gate on NVIDIA driver 595.71+; removing them would re-introduce the 42/48 ciede regression at API 1.3 documented in research-0054.
- On upstream sync: zero interaction. Both shader files are entirely fork-introduced; upstream has no Vulkan compute path.
- Re-test on rebase:
```bash
# Re-confirm the cross-backend gate on a Vulkan-capable host.
meson setup core/build -Denable_vulkan=enabled
ninja -C core/build
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature vif --backend vulkan --places 4
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
    --feature ciede --backend vulkan --places 4
# Confirm SPIR-V still emits NoContraction post-rebase.
glslc --target-env=vulkan1.3 -O \
    core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
spirv-dis /tmp/vif.spv | grep -c NoContraction  # expect ≥ 60
```
Expected on NVIDIA 595.71+: vif 0/48 OK, ciede 5/48 FAIL (max abs 8.9e-05 — pre-existing fork debt at API 1.3, see ADR-0269). On RADV / lavapipe: bit-exact (precise is a no-op there).
0229 — fr_regressor_v2 codec-aware scaffold (ADR-0272)
- ADR: ADR-0272
- Touches:
  - ai/scripts/train_fr_regressor_v2.py (new) — Phase A JSONL consumer; trains the codec-aware FRRegressor.
  - model/tiny/fr_regressor_v2.onnx (new, smoke) — placeholder ONNX from --smoke mode; re-baked on production training.
  - model/tiny/fr_regressor_v2.json (new) — sidecar.
  - model/tiny/registry.json — new entry with smoke: true.
  - docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md (new).
  - docs/adr/README.md — index row.
  - docs/research/0058-fr-regressor-v2-feasibility.md (new).
  - docs/ai/models/fr_regressor_v2.md (new) — model card.
  - ai/AGENTS.md — invariant note (codec block layout + ENCODER_VOCAB ordering).
  - CHANGELOG.md — Added entry.
- Invariant: the 8-D codec block layout is [encoder_onehot(6), preset_norm, crf_norm] with ENCODER_VOCAB = (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown) in load-bearing order. The CRF normaliser is /63 (union upper bound). The preset normaliser is /9. Bumping the vocabulary requires a re-train; existing checkpoints pin the order they were trained against via encoder_vocab_version in the sidecar. The two-input ONNX (features, codec) follows the LPIPS-Sq precedent (ADR-0040 / ADR-0041).
- Rebase impact: entirely fork-local; pure additive; no upstream-mirror file is touched. The Phase A schema (consumed by this trainer) is itself fork-local (tools/vmaf-tune/). No conflict expected on /sync-upstream.
- Re-test on rebase:
0311 — libFuzzer harness expansion: yuv_input + cli_parse (ADR-0311)
- ADR: ADR-0311; parent ADR-0270.
- Touches:
  - core/test/fuzz/fuzz_yuv_input.c (new)
  - core/test/fuzz/fuzz_cli_parse.c (new)
  - core/test/fuzz/meson.build — two new executable(...) blocks for the harnesses, plus a shared fuzz_vidinput_sources list.
  - core/test/fuzz/yuv_input_corpus/* (new — 6 seeds covering 8/10-bit × 4:2:0 / 4:2:2 / 4:4:4 plus a truncated-frame seed).
  - core/test/fuzz/cli_parse_corpus/* (new — 6 seeds covering the --feature, --model, --reference, YUV-flag, and --help shapes).
  - core/test/fuzz/README.md — Targets table extended.
  - .github/workflows/fuzz.yml — matrix gains fuzz_yuv_input + fuzz_cli_parse; per-harness wall-clock budget reduced from 300 s to 60 s so the 3-target matrix fits the existing timeout-minutes: 15 cap.
  - docs/development/fuzzing.md — runbook table + smoke commands extended.
  - docs/adr/0311-libfuzzer-harness-expansion.md (new)
  - docs/research/0083-libfuzzer-harness-expansion-target-survey.md (new)
  - libvmaf/AGENTS.md — new invariant block for the one-parser-one-harness rule.
  - CHANGELOG.md — Added entry.
- Invariant:
  - The fuzz scaffold remains opt-in (-Dfuzz=true); every default meson setup invocation must continue to skip it.
  - fuzz_yuv_input re-includes tools/yuv_input.c and the rest of the vidinput trio as build inputs. Upstream Netflix/vmaf splits or renames of those source files need the matching meson.build source-list update.
  - fuzz_cli_parse re-includes tools/cli_parse.c as a build input and links against libvmaf for vmaf_version() and feature-dictionary symbols. The -Wl,--wrap=exit link arg is load-bearing: without it, usage()'s exit(1) would terminate the fuzzer process on the first bad input.
  - LLVMFuzzerTestOneInput keeps external linkage; the scaffold-wide // NOLINTNEXTLINE(misc-use-internal-linkage) pattern is correct for libFuzzer's name-resolved entry-point ABI.
- Rebase impact: any upstream sync that touches core/tools/{yuv_input,cli_parse}.c must re-run the 60 s smoke per harness on the merged tip; record any newly found crash-* artefact under the matching <target>_known_crashes/ dir, not in <target>_corpus/. The __wrap_exit shim in fuzz_cli_parse.c is GNU-ld / lld-only; do not assume it works on Apple ld without an -undefined,dynamic_lookup fallback.
- Re-test on rebase:
CC=clang CXX=clang++ \
meson setup build-fuzz libvmaf \
--buildtype=debug \
-Db_sanitize=address \
-Db_lundef=false \
-Dfuzz=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz \
test/fuzz/fuzz_y4m_input \
test/fuzz/fuzz_yuv_input \
test/fuzz/fuzz_cli_parse
./build-fuzz/test/fuzz/fuzz_yuv_input \
-seed=0 -runs=1000 \
core/test/fuzz/yuv_input_corpus/
./build-fuzz/test/fuzz/fuzz_cli_parse \
-seed=0 -runs=1000 \
core/test/fuzz/cli_parse_corpus/
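The YUV seed matrix follows from how many bytes one raw planar frame occupies, which is also what makes a truncated-frame seed detectable. A minimal Python sketch, assuming planar layout, round-up chroma dimensions for odd sizes, and 2-byte storage for 10-bit samples; raw_frame_bytes and is_truncated are hypothetical helpers, not functions from core/tools/yuv_input.c:

```python
# Chroma subsampling factors (horizontal, vertical) per planar layout.
_SUBSAMPLING = {"420": (2, 2), "422": (2, 1), "444": (1, 1)}


def raw_frame_bytes(width: int, height: int, layout: str, bitdepth: int) -> int:
    """Bytes in one planar raw YUV frame (hypothetical seed-sizing helper)."""
    sx, sy = _SUBSAMPLING[layout]
    # Assumption: odd luma dimensions round the chroma plane size up.
    cw = (width + sx - 1) // sx
    ch = (height + sy - 1) // sy
    # Assumption: samples above 8 bits are stored in 2 bytes each.
    bytes_per_sample = 1 if bitdepth <= 8 else 2
    return (width * height + 2 * cw * ch) * bytes_per_sample


def is_truncated(blob: bytes, width: int, height: int, layout: str, bitdepth: int) -> bool:
    """True when the seed does not hold a whole number of frames."""
    return len(blob) % raw_frame_bytes(width, height, layout, bitdepth) != 0


if __name__ == "__main__":
    # One line per cell of the 8/10-bit x 4:2:0/4:2:2/4:4:4 seed matrix, at 16x16.
    for bits in (8, 10):
        for layout in ("420", "422", "444"):
            print(bits, layout, raw_frame_bytes(16, 16, layout, bits))
```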
0229 — libFuzzer scaffold for the YUV4MPEG2 parser (ADR-0270)
- ADR: ADR-0270
- Touches:
  - core/test/fuzz/fuzz_y4m_input.c (new)
  - core/test/fuzz/meson.build (new)
  - core/test/fuzz/README.md (new)
  - core/test/fuzz/y4m_input_corpus/* (new — six seeds)
  - core/test/fuzz/y4m_input_known_crashes/* (new — one 411-chroma OOB reproducer; excluded from CI corpus)
  - core/test/meson.build — subdir('fuzz') line.
  - core/meson_options.txt — new option('fuzz', ...).
  - .github/workflows/fuzz.yml (new — nightly 5-minute job).
  - docs/development/fuzzing.md (new — operator runbook).
  - docs/adr/0270-fuzzing-scaffold.md (new)
  - docs/research/0059-libfuzzer-scaffold-y4m.md (new)
  - docs/state.md — new Open-bug row for the 411-chroma OOB write.
  - CHANGELOG.md — Added entry.
- Invariant: the fuzz scaffold is opt-in; every default meson setup invocation must continue to skip it. The harness links statically against core/tools/{y4m_input,yuv_input,vidinput}.c rather than libvmaf.so, so the public C-API surface stays unchanged.
- Rebase impact: the harness re-includes core/tools/y4m_input.c as a build input. Any upstream Netflix/vmaf change that splits or renames the tool sources (e.g. moves the parser into core/src/) needs the corresponding meson.build source-list update and the harness re-test below. The y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m reproducer is the regression gate for the parser fix; do not delete it on upstream sync. If upstream lands the same fix, port the reproducer back into y4m_input_corpus/ as a permanent seed.
- Re-test on rebase:
CC=clang CXX=clang++ \
meson setup build-fuzz libvmaf \
--buildtype=debug \
-Db_sanitize=address \
-Db_lundef=false \
-Dfuzz=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz test/fuzz/fuzz_y4m_input
./build-fuzz/test/fuzz/fuzz_y4m_input \
-max_total_time=60 \
core/test/fuzz/y4m_input_corpus/
# Verify the known-crash reproducer still triggers (until the fix lands):
./build-fuzz/test/fuzz/fuzz_y4m_input \
core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m
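The reproducer's name (y4m_411_w2_h4_oob_dst) points at a 2×4 frame with 4:1:1 chroma, where a 2-pixel-wide luma row still needs one chroma sample. A minimal Python sketch of header parsing and chroma sizing that shows how floor-division sizing can yield a zero-width destination; it illustrates the failure mode under that assumption and is not the C fix in y4m_input.c. Both helper names are hypothetical:

```python
# (horizontal, vertical) chroma subsampling for a few YUV4MPEG2 'C' tags.
_CHROMA = {"420jpeg": (2, 2), "420mpeg2": (2, 2), "420paldv": (2, 2), "420": (2, 2),
           "422": (2, 1), "444": (1, 1), "411": (4, 1)}


def parse_y4m_header(line: bytes) -> dict:
    """Parse the W, H and C fields of a YUV4MPEG2 stream header (minimal sketch)."""
    tokens = line.rstrip(b"\n").split(b" ")
    if tokens[0] != b"YUV4MPEG2":
        raise ValueError("missing YUV4MPEG2 magic")
    hdr = {"W": 0, "H": 0, "C": "420jpeg"}  # C defaults to 4:2:0 when absent
    for tok in tokens[1:]:
        if not tok:
            continue
        key, val = chr(tok[0]), tok[1:].decode("ascii")
        if key in ("W", "H"):
            hdr[key] = int(val)
        elif key == "C":
            hdr["C"] = val
        # F (frame rate), I (interlace), A (aspect) and X (extensions) are ignored here.
    if hdr["W"] <= 0 or hdr["H"] <= 0:
        raise ValueError("width and height must be positive")
    if hdr["C"] not in _CHROMA:
        raise ValueError(f"unsupported chroma tag {hdr['C']!r}")
    return hdr


def chroma_plane_dims(width: int, height: int, tag: str) -> tuple:
    """Chroma plane size, rounded up so no luma column is left without chroma."""
    sx, sy = _CHROMA[tag]
    return (width + sx - 1) // sx, (height + sy - 1) // sy


hdr = parse_y4m_header(b"YUV4MPEG2 W2 H4 F25:1 Ip A1:1 C411\n")
cw, ch = chroma_plane_dims(hdr["W"], hdr["H"], hdr["C"])
# Round-up sizing gives a 1x4 chroma plane; floor division (2 // 4) would give
# a zero-width buffer that any chroma write then overruns.
print(cw, ch, hdr["W"] // 4)
```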
0231 — HIP seventh-consumer kernel float_motion_hip (ADR-0273)
- ADR: ADR-0273
- Touches:
  - core/src/feature/hip/float_motion_hip.c (new) — seventh consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_motion_cuda.c call-graph-for-call-graph; init/submit/collect/close invoke the kernel-template helpers in the same order; flush() callback for tail-frame motion2 emission; motion_force_zero short-circuit posture (fex->extract swap with submit / collect / flush / close nulled). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-WG SAD float partials directly, no atomic, no memset).
  - core/src/feature/hip/float_motion_hip.h (new)
  - core/src/hip/meson.build — new entry in hip_sources.
  - core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — new sub-test test_float_motion_hip_extractor_registered (also asserts the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit) and a row in test_table[].
  - docs/adr/0273-hip-seventh-consumer-float-motion.md (new)
  - docs/adr/README.md — index row.
  - docs/backends/hip/overview.md — seventh / eighth consumer note.
  - core/src/hip/AGENTS.md — invariant note.
  - CHANGELOG.md — Added entry (joint with ADR-0274).
- Invariant — three-buffer ping-pong + motion_force_zero short-circuit are load-bearing. The state struct carries three uintptr_t buffer slots (ref_in, blur[2]) that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *ref_in + VmafCudaBuffer *blur[2] field shape. The motion_force_zero short-circuit (fex->extract swap, kernel-template helpers nulled) must stay aligned with the CUDA twin on every refactor; otherwise the runtime PR's helper-body flip diverges between the two backends. The submit_pre_launch bypass mirrors the CUDA twin; if a future PR adds a submit_pre_launch call to float_motion_cuda.c's submit path, the HIP twin must follow in the same PR.
- Rebase impact: entirely fork-local. New files are HIP-specific. The only upstream-touching edit is feature_extractor.c, but the change sits inside an existing #if HAVE_HIP block (ADR-0241); upstream has no HAVE_HIP, so no conflict is expected.
- Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke
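The blur[2] ping-pong and the atomics-free partial reduction can be modelled on the host in a few lines. A minimal Python sketch, assuming each work-group covers a band of rows and that the motion score is the mean absolute difference between consecutive blurred frames; MotionState and sad_partials are hypothetical, the blur itself (what the ref_in slot feeds) is left out, and this is not the HIP kernel:

```python
from typing import List

Frame = List[List[float]]  # row-major, already-blurred luma samples


def sad_partials(cur: Frame, prev: Frame, rows_per_group: int) -> List[float]:
    """One SAD partial per work-group (a band of rows); no shared accumulator."""
    partials: List[float] = []
    for start in range(0, len(cur), rows_per_group):
        band = 0.0
        for r in range(start, min(start + rows_per_group, len(cur))):
            band += sum(abs(a - b) for a, b in zip(cur[r], prev[r]))
        partials.append(band)
    return partials


class MotionState:
    """Two blurred-frame slots used alternately, like the blur[2] pair."""

    def __init__(self) -> None:
        self.blur: List[Frame] = [[], []]
        self.index = 0

    def submit(self, blurred: Frame, rows_per_group: int = 2) -> float:
        cur_slot = self.index % 2
        prev_slot = 1 - cur_slot
        self.blur[cur_slot] = blurred
        self.index += 1
        if self.index == 1:
            return 0.0  # first frame: no predecessor to compare against
        partials = sad_partials(blurred, self.blur[prev_slot], rows_per_group)
        # Host-side reduction of the per-group partials, normalised per pixel.
        return sum(partials) / (len(blurred) * len(blurred[0]))
```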
0232 — HIP eighth-consumer kernel float_ssim_hip (ADR-0274)
- ADR: ADR-0274
- Touches:
  - core/src/feature/hip/float_ssim_hip.c (new) — eighth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_ssim_cuda.c call-graph-for-call-graph (the CUDA file registers vmaf_fex_float_ssim_cuda despite its integer_ filename). First multi-dispatch HIP consumer (chars.n_dispatches_per_frame == 2). Submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (kernel writes per-block float partials directly). State struct carries five uintptr_t intermediate float buffer slots (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) tracked outside the kernel-template's readback bundle. validate_dims_hip and init_dims_hip helpers extracted from init() to fit the readability-function-size budget.
  - core/src/feature/hip/float_ssim_hip.h (new)
  - core/src/hip/meson.build — new entry in hip_sources.
  - core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — new sub-test test_float_ssim_hip_extractor_registered (also asserts chars.n_dispatches_per_frame == 2) and a row in test_table[].
  - docs/adr/0274-hip-eighth-consumer-float-ssim.md (new)
  - docs/adr/README.md — index row.
  - docs/backends/hip/overview.md — seventh / eighth consumer note (joint).
  - core/src/hip/AGENTS.md — invariant note.
  - CHANGELOG.md — Added entry (joint with ADR-0273).
- Invariant — multi-dispatch + five-slot buffer pyramid + v1 scale=1 validation are load-bearing. The state struct carries five uintptr_t intermediate float buffer slots that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *h_* field shape; any drift in the CUDA twin's slot count requires a paired update here. The chars.n_dispatches_per_frame == 2 characteristic is asserted in the smoke test; do not silently lower it. The v1 scale=1 -EINVAL validation surface (in validate_dims_hip) must stay aligned with the CUDA twin's compute_scale / vmaf_log chain. The HIP twin's validate_dims_hip / init_dims_hip extraction is intentional for the function-size budget; do not re-inline without verifying the budget still passes.
- Rebase impact: entirely fork-local; same posture as ADR-0273.
- Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke
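Judging by their names, the five intermediate slots hold the locally averaged statistics SSIM is built from: the two means, the two second moments, and the cross moment. A minimal one-dimensional Python sketch of how the score is formed from them, assuming a box window and the textbook constants K1 = 0.01, K2 = 0.03, L = 255 (the fork's kernels use their own window and dispatch split). All names are hypothetical:

```python
from typing import List, Sequence, Tuple


def box_filter(v: Sequence[float], radius: int) -> List[float]:
    """Local mean over a window clipped at the borders."""
    out = []
    for i in range(len(v)):
        lo, hi = max(0, i - radius), min(len(v), i + radius + 1)
        out.append(sum(v[lo:hi]) / (hi - lo))
    return out


def local_moments(x: Sequence[float], y: Sequence[float],
                  radius: int = 1) -> Tuple[List[float], ...]:
    """The five filtered buffers: mu_ref, mu_cmp, ref_sq, cmp_sq, refcmp."""
    return (box_filter(x, radius), box_filter(y, radius),
            box_filter([a * a for a in x], radius),
            box_filter([b * b for b in y], radius),
            box_filter([a * b for a, b in zip(x, y)], radius))


def ssim_from_moments(mu_x, mu_y, ex2, ey2, exy,
                      L: float = 255.0, k1: float = 0.01, k2: float = 0.03) -> float:
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    total = 0.0
    for mx, my, xx, yy, xy in zip(mu_x, mu_y, ex2, ey2, exy):
        var_x, var_y, cov = xx - mx * mx, yy - my * my, xy - mx * my
        num = (2 * mx * my + c1) * (2 * cov + c2)
        den = (mx * mx + my * my + c1) * (var_x + var_y + c2)
        total += num / den
    return total / len(mu_x)  # mean of the SSIM map
```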
0280 — vmaf-tune NVENC codec adapters (ADR-0290)
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_nvenc,hevc_nvenc,av1_nvenc,_nvenc_common}.py (new). Wholly fork-local; no upstream Netflix/vmaf overlap.
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry expanded.
  - tools/vmaf-tune/tests/test_codec_adapter_nvenc.py (new).
  - tools/vmaf-tune/tests/test_corpus.py — Phase-A registry assertion updated.
  - tools/vmaf-tune/AGENTS.md — invariant note expanded.
  - docs/usage/vmaf-tune.md — "Hardware encoders (NVENC)" section.
  - docs/adr/0290-vmaf-tune-nvenc-adapters.md (new) + docs/adr/README.md index row.
  - docs/research/0065-vmaf-tune-nvenc-adapters.md (new).
  - CHANGELOG.md — Added entry.
- Invariant: known_codecs() returns the four-codec tuple ("av1_nvenc", "h264_nvenc", "hevc_nvenc", "libx264"); the mnemonic preset map (ultrafast/superfast/veryfast → p1, faster → p2, fast → p3, medium → p4, slow → p5, slower → p6, slowest/placebo → p7) is the canonical cross-codec preset alignment that downstream Phase B/C consumers assume. The CQ window is the hardware-permitted [0, 51]; the Phase A informative window is [15, 40].
- Rebase impact: zero. tools/vmaf-tune/ is wholly fork-local and has no upstream Netflix/vmaf path overlap.
- Re-test on rebase:
0227 — ffmpeg-patches/ series re-verified against n8.1 (2026-05-03)
0228 — vmaf-tune libx265 codec adapter (ADR-0288)
- Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry add), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions(stderr, encoder=…) gains a per-codec branch), tools/vmaf-tune/src/vmaftune/cli.py (help-text wording only), tools/vmaf-tune/tests/test_codec_adapter_x265.py (new), tools/vmaf-tune/tests/test_corpus.py (membership-based codec list assertion).
- Invariant: the codec-adapter contract documented in tools/vmaf-tune/AGENTS.md (multi-codec from day one; the search loop never branches on codec identity). The parse_versions signature is still backward-compatible: encoder defaults to libx264, so callers from before this PR keep working.
- Upstream source: fork-local. tools/vmaf-tune/ is fork-only; upstream Netflix/vmaf does not ship encode automation.
- On upstream sync: zero interaction. Confirm the _index_fragments/_order.txt row for 0288-vmaf-tune-codec-adapter-x265 remains present after any cross-merge.
- Re-test on rebase:
0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)
- Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry row + import), tools/vmaf-tune/tests/test_corpus.py (membership assertion relaxed from == ("libx264",) to "libx264" in known_codecs()), tools/vmaf-tune/tests/test_codec_adapter_libaom.py (new), tools/vmaf-tune/AGENTS.md (preset-vocabulary invariant).
- Invariant: the cross-codec preset vocabulary (placebo, slowest, slower, slow, medium, fast, faster, veryfast, superfast, ultrafast) is shared across AV1-family adapters so one --preset axis covers x264 / x265 / svtav1 / libaom-av1. Each adapter maps the human name onto its codec-specific knob; do not introduce per-adapter preset names.
- Upstream source: fork-local. tools/vmaf-tune/ is the fork-introduced quality-aware encode automation harness (ADR-0237); it has no upstream Netflix/vmaf counterpart.
- On upstream sync: zero interaction with upstream/master. Self-contained in tools/vmaf-tune/ and docs/.
- Re-test on rebase:
0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)
- ADR: ADR-0275
- Touches:
  - model/tiny/vmaf_tiny_v3.int8.onnx (new, 4 267 B)
  - model/tiny/vmaf_tiny_v4.int8.onnx (new, 7 769 B)
  - model/tiny/registry.json — new vmaf_tiny_v3 and vmaf_tiny_v4 rows with quant_mode, int8_sha256, quant_accuracy_budget_plcc fields.
  - model/tiny/vmaf_tiny_v3.json, model/tiny/vmaf_tiny_v4.json — same fields mirrored into the per-model sidecars.
  - docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md — new "Quantisation" sections.
  - docs/adr/0275-vmaf-tiny-v3-v4-ptq.md (new) and ADR index row.
  - CHANGELOG.md — Added entry.
- Invariant: python ai/scripts/measure_quant_drop.py --all reports [PASS] for both vmaf_tiny_v3 (drop ≤ 0.001 on Netflix features) and vmaf_tiny_v4 (drop ≤ 0.001), inside the 0.01 per-model budget. The runtime redirect from ADR-0174 picks the .int8.onnx sibling when an operator's registry overlay declares quant_mode: dynamic.
- Rebase impact: entirely fork-local; neither v3 nor v4 nor the dynamic-PTQ harness exists upstream. The new int8 ONNX bytes ship as committed binaries (mirroring learned_filter_v1 and nr_metric_v1); they are well below the few-MB external-data threshold and don't require the sigstore + .onnx.data pattern.
- Re-test on rebase:
```bash
python ai/scripts/validate_model_registry.py
python ai/scripts/measure_quant_drop.py --all
```
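The [PASS] verdict is a budget comparison on correlation. A minimal Python sketch, assuming "drop" is the PLCC against reference scores that is lost when the fp32 model is replaced by its .int8.onnx sibling, and that the budget is the 0.01 recorded in quant_accuracy_budget_plcc; plcc and quant_gate are hypothetical, not the code in measure_quant_drop.py:

```python
import math
from typing import Sequence, Tuple


def plcc(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def quant_gate(target: Sequence[float], fp32_pred: Sequence[float],
               int8_pred: Sequence[float], budget_plcc: float = 0.01) -> Tuple[float, bool]:
    """PLCC lost by swapping fp32 for int8, and whether the loss fits the budget."""
    drop = plcc(target, fp32_pred) - plcc(target, int8_pred)
    return drop, drop <= budget_plcc
```

With reported drops of at most 0.001, both models would sit at least a factor of ten inside that budget.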
0229 — NVIDIA-Vulkan ciede2000 places=4 fork debt root-cause (ADR-0273)
- Touched files: docs-only.
  - docs/adr/0273-...precision-gap.md (new) + _index_fragments/ row + _order.txt append.
  - docs/research/0055-ciede-vulkan-nvidia-f32-f64-root-cause.md (new) + docs/research/README.md index row.
  - docs/state.md — Open-bugs row T-VK-CIEDE-F32-F64.
  - docs/backends/vulkan/overview.md — NVIDIA-hardware caveat.
  - changelog.d/changed/ciede-vulkan-nvidia-f32-f64-precision-gap.md (new).
  - core/src/vulkan/AGENTS.md — invariant cross-link.
- Invariant: the ciede.comp shader's f32 precision contract is load-bearing. Promoting it to f64 would silently change scores on every Vulkan device that supports shaderFloat64 and create a per-device-feature-bit divergence (RTX 4090 has it; many consumer GPUs don't). The CPU ciede.c::get_lab_color doing its colour-space chain in double is upstream Netflix behaviour and must not be narrowed to f32 to "fix" the GPU gap (that would change Netflix golden ground truth). The 5/48 NVIDIA places=4 mismatch on the highest-ΔE frames is expected and documented; do not attempt to "fix" it without re-reading ADR-0273 first.
- Rebase impact: zero (docs-only). The CPU and shader sources this ADR analyses are unchanged by this PR. If a future upstream rebase touches ciede.c::get_lab_color (the double chain), the ADR's reasoning still holds; if upstream changes the CPU reference's precision posture, ADR-0273 needs a Status: Superseded entry.
- Re-test on rebase: a manual NVIDIA-hardware run if available:
```bash
cd libvmaf && meson setup build \
  -Denable_vulkan=enabled -Denable_cuda=false && ninja -C build
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary $PWD/core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature ciede --backend vulkan --device 0 --places 4
# Expected post-PR-346 (when merged): 5/48 mismatches at 1.78× threshold.
# Expected pre-PR-346 (current master): 42/48 mismatches at higher ratio.
# If the count drops below 5/48 on NVIDIA, ADR-0273 should record the
# delta and consider closing T-VK-CIEDE-F32-F64.
```
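The 5/48 and 42/48 figures count frames whose ciede score fails the places=4 comparison. A minimal Python sketch of that count, assuming the gate uses the same rule as unittest's assertAlmostEqual (round(a - b, places) == 0); both helpers are hypothetical and not taken from cross_backend_vif_diff.py:

```python
from typing import List, Sequence


def places_mismatches(cpu: Sequence[float], gpu: Sequence[float],
                      places: int = 4) -> List[int]:
    """Frame indices where round(cpu - gpu, places) != 0 (assertAlmostEqual rule)."""
    if len(cpu) != len(gpu):
        raise ValueError("frame counts differ")
    return [i for i, (c, g) in enumerate(zip(cpu, gpu)) if round(c - g, places) != 0]


def summarize(cpu: Sequence[float], gpu: Sequence[float], places: int = 4) -> str:
    """Render the mismatch count as 'k/N', the form used in the expectations above."""
    return f"{len(places_mismatches(cpu, gpu, places))}/{len(cpu)}"
```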
0229 — tools/vmaf-tune fast Phase A.5 scaffold (ADR-0276)
- Touches: tools/vmaf-tune/src/vmaftune/fast.py (new), tools/vmaf-tune/src/vmaftune/cli.py (new fast subcommand branch), tools/vmaf-tune/pyproject.toml (new [fast] extra), tools/vmaf-tune/tests/test_fast.py (new), tools/vmaf-tune/AGENTS.md (new invariants), docs/usage/vmaf-tune.md (new "Phase A.5" section), docs/adr/0276-vmaf-tune-fast-path.md (new ADR), docs/research/0060-vmaf-tune-fast-path.md (new digest).
- Invariant: the fast subcommand is opt-in and never automatically replaces the Phase A grid path. The slow grid is the ground-truth corpus generator (ADR-0237 contract); the fast path is for the recommendation use case only. Optuna is a lazy-imported optional dependency gated behind the [fast] extra; importing it at module scope outside fast.py (or its tests) breaks the zero-dependency core install.
- Rebase impact: entirely fork-local; the tool sits under tools/vmaf-tune/, which is fork-added, and no upstream files are touched. Upstream Netflix/vmaf has no analogous surface.
- Re-test on rebase:
pip install -e 'tools/vmaf-tune[fast]'
pytest tools/vmaf-tune/tests/test_fast.py -v
vmaf-tune fast --smoke --target-vmaf 92
0229 — vmaf-tune recommend subcommand (ADR-0237 Phase B-lite)
- Touches:
  - tools/vmaf-tune/src/vmaftune/recommend.py (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  - tools/vmaf-tune/src/vmaftune/cli.py — adds recommend subparser; corpus subcommand untouched.
  - tools/vmaf-tune/tests/test_recommend.py (new). 13-case smoke suite, mocks all binaries; runs in <100 ms.
  - docs/usage/vmaf-tune.md — adds ## recommend section.
- Invariant: recommend consumes the existing CORPUS_ROW_KEYS schema unchanged (vmaf_score, bitrate_kbps, crf, preset, encoder, exit_status). No schema bump. If a future PR bumps SCHEMA_VERSION, both the corpus writer and the recommend reader must be updated in lockstep; tests assert this via test_corpus_row_keys_match_init_contract.
- Rebase impact: zero. tools/vmaf-tune/ is wholly fork-local; no upstream surface touches it.
- Re-test on rebase:
0228 — integer_ms_ssim_cuda.c joins drain_batch (T-GPU-OPT-2 / ADR-0271)
- Touches: core/src/feature/cuda/integer_ms_ssim_cuda.c. No upstream Netflix/vmaf changes are expected here: the file is fork-added (CUDA twin of the upstream-port ms_ssim_score.cu), and the surface this PR redrew is also entirely fork-local. That surface comprises the per-scale l_partials[i] / c_partials[i] / s_partials[i] arrays, the per-scale h_l_partials[i] / h_c_partials[i] / h_s_partials[i] pinned host shadows, the submit() ↔ collect() work redistribution, and the cuEventRecord(s->lc.finished, s->lc.str) + vmaf_cuda_drain_batch_register(&s->lc) tail.
- Invariant: the engine-scope drain-batch contract from ADR-0271 / drain_batch.h. The kernel-launch order on s->lc.str must stay stable: decimate (× 4), then for each scale i ∈ 0..4: horiz ⇒ vert_lcs ⇒ DtoH(l_partials[i]) ⇒ DtoH(c_partials[i]) ⇒ DtoH(s_partials[i]), then cuEventRecord(s->lc.finished, s->lc.str), then vmaf_cuda_drain_batch_register(&s->lc). Same-stream ordering is what makes the shared SSIM intermediates (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) safe across scales without explicit sync; any change that parallelises the per-scale work onto multiple streams breaks bit-exactness unless per-scale intermediates are also added.
- On upstream sync: zero interaction (the file is fork-added). If a future upstream PR adds an integer_ms_ssim_cuda.c of its own, the merger must reconcile the per-scale partials topology + the drain_batch tail with whatever the new upstream shape brings.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build # confirms the CPU build still links cleanly
# If the dev host has a working nvcc / host-compiler pair:
meson setup build_cuda -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda src/liblibvmaf_feature.a.p/feature_cuda_integer_ms_ssim_cuda.c.o
# Netflix CPU golden gate (CPU is the bit-exactness ground truth):
make test-netflix-golden
# Cross-backend parity (places=4 gate, ADR-0214):
/cross-backend-diff
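On the host, the per-scale l/c/s partials are ultimately folded into a single MS-SSIM score. A minimal Python sketch of that combine using the standard Wang, Simoncelli and Bovik (2003) formulation: luminance only at the coarsest scale, contrast and structure at all five, with the published exponents. The weights, the ordering convention (index 0 = full resolution) and all names are assumptions, not a reading of integer_ms_ssim_cuda.c:

```python
from typing import Sequence

# Published MS-SSIM exponents, finest to coarsest scale (assumed, not read from the fork).
MS_SSIM_WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)


def scale_mean(partials: Sequence[float], n_pixels: int) -> float:
    """Collapse one scale's per-block partial sums into a per-pixel mean."""
    return sum(partials) / n_pixels


def ms_ssim(lum: Sequence[float], con: Sequence[float], stc: Sequence[float]) -> float:
    """Combine per-scale luminance, contrast and structure means."""
    if not len(lum) == len(con) == len(stc) == len(MS_SSIM_WEIGHTS):
        raise ValueError("expected one l/c/s triplet per scale")
    score = 1.0
    for c, s, w in zip(con, stc, MS_SSIM_WEIGHTS):
        score *= (c ** w) * (s ** w)
    # Luminance contributes only at the coarsest scale.
    return score * lum[-1] ** MS_SSIM_WEIGHTS[-1]
```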
0277 — ffmpeg-patches refresh against n8.1 — 2026-05-04 (ADR-0277)
- Touches: ffmpeg-patches/ is unchanged (no content drift). Doc-only entries land in:
  - docs/adr/0277-ffmpeg-patches-refresh-2026-05-04.md — new ADR.
  - docs/adr/_index_fragments/0277-ffmpeg-patches-refresh-2026-05-04.md — index row.
  - docs/adr/_index_fragments/_order.txt — manifest append.
  - changelog.d/changed/ffmpeg-patches-refresh-2026-05-04.md — Changed entry.
  - This file — this entry.
- Invariant: ffmpeg-patches/series.txt order is load-bearing; patches 0002…0006 build on each other and only apply cleanly cumulatively. The verification gate is a series replay, not a per-patch git apply --check (per ADR-0118 + CLAUDE.md §12 r14).
- On upstream sync: zero interaction. Netflix/vmaf has no ffmpeg-patches/ tree; this is a fork-local integration surface.
- Re-test on rebase (also: re-replay procedure for the next refresh):
# Clone pristine n8.1
git -C /tmp clone --depth 1 --branch n8.1 \
https://github.com/FFmpeg/FFmpeg.git ff-replay-$(date +%F)
cd /tmp/ff-replay-$(date +%F)
git switch -c refresh-$(date +%F)
git config user.email refresh@local && git config user.name "Refresh Bot"
# Replay the series cumulatively
for p in /path/to/vmaf/ffmpeg-patches/000*-*.patch; do
git am --3way "$p" || break
done
# Regenerate and compare to in-tree
mkdir -p /tmp/ff-regen-$(date +%F)
git format-patch n8.1.. -o /tmp/ff-regen-$(date +%F)/
# Diff old vs new excluding pure format-patch noise
for i in 1 2 3 4 5 6; do
orig=$(ls /path/to/vmaf/ffmpeg-patches/000${i}-*.patch)
regen=$(ls /tmp/ff-regen-$(date +%F)/000${i}-*.patch)
diff -u \
<(grep -v "^From [0-9a-f]\|^Date:\|^index " "$orig") \
<(grep -v "^From [0-9a-f]\|^Date:\|^index " "$regen") \
| head -40
done
If only stylistic diffs surface (PATCH N/M numbering, MIME headers, hunk-context counts, hunk offset shifts against cumulative state), keep originals — record a no-drift refresh ADR. If real content drift surfaces, regenerate and ship the refresh PR with the regenerated patches plus a content-summary ADR.
End-to-end vf_libvmaf smoke is best run from CI (ffmpeg-integration.yml) against an installed libvmaf prefix — the meson-uninstalled .pc does not satisfy FFmpeg's #include <libvmaf.h> probe (the headers live under libvmaf/libvmaf.h only; the system-installed .pc carries an extra -I${includedir}/libvmaf shortcut that the uninstalled .pc omits).
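The noise filter in the replay loop (drop the From <sha>, Date: and index lines, then diff) can also be expressed in Python, for example where bash process substitution is unavailable. A minimal sketch; strip_noise and content_diff are hypothetical helpers, and the pattern mirrors the grep -v expression used in the loop:

```python
import difflib
import re
from typing import List

# Lines that differ between two format-patch runs without any content change.
_NOISE = re.compile(r"^(?:From [0-9a-f]|Date:|index )")


def strip_noise(text: str) -> List[str]:
    """Drop commit-sha, date and blob-index lines, keeping line endings."""
    return [ln for ln in text.splitlines(keepends=True) if not _NOISE.match(ln)]


def content_diff(orig: str, regen: str, limit: int = 40) -> List[str]:
    """First `limit` lines of a unified diff between two noise-stripped patches."""
    diff = difflib.unified_diff(strip_noise(orig), strip_noise(regen), "orig", "regen")
    return list(diff)[:limit]
```

An empty result corresponds to the no-drift case; any remaining hunk is a candidate for the content-drift path.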
0229 — T7-5 NOLINT-sweep closeout (ADR-0278)
- Touched files:
  - core/src/feature/integer_adm.c (1 NOLINT cite, line ~988 adm_decouple_s123 — upstream-mirror Netflix 966be8d5).
  - core/src/feature/cuda/ssimulacra2_cuda.c (3 NOLINT cites: ss2c_picture_to_linear_rgb, ss2c_host_combine, ss2c_run_scale_gpu/extract_fex_cuda).
  - core/src/feature/vulkan/ssimulacra2_vulkan.c (3 NOLINT cites: ss2v_setup_gaussian, ss2v_picture_to_linear_rgb, ss2v_run_scale).
  - core/src/feature/vulkan/cambi_vulkan.c (1 NOLINT cite: cambi_vk_extract).
  - core/src/feature/sycl/integer_adm_sycl.cpp (6 cites, SYCL kernel-launch entries).
  - core/src/feature/sycl/integer_motion_sycl.cpp (2 cites).
  - core/src/feature/sycl/integer_vif_sycl.cpp (4 cites).
  - core/tools/vmaf.c (3 cites: copy_picture_data, init_gpu_backends, main).
- Invariant: zero behavioural change. Edits are inside comment blocks: (ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278) is appended to existing prose justifications. No function bodies are split. The 12 SYCL sites share an identical justification string verbatim; preserving the byte-for-byte duplicate is the load-bearing documentation pattern (grep-able across the SYCL TUs).
- On upstream sync: minimal interaction. The cite-only edits live inside comment blocks above the function signatures; rebases will surface them as touched lines, but the function bodies are unchanged. For integer_adm.c's upstream-mirror block (Netflix 966be8d5), the comment edit at line 984–991 is cosmetic; keep the fork's version on conflict (it merely names the ADR; the underlying prose is unchanged).
- Re-test on rebase:
```bash
# 1. Programmatic audit must report 0 missing citations
python3 - <<'PY'
import re, os
paths = [os.path.join(r, f) for r, _, fs in os.walk('core/src')
         for f in fs if f.endswith(('.c', '.cpp', '.h'))]
paths.append('core/tools/vmaf.c')
miss = total = 0
for p in paths:
    with open(p) as fh:
        ls = fh.readlines()
    for i, line in enumerate(ls):
        if ('NOLINT' in line and 'readability-function-size' in line
                and 'NOLINTEND' not in line):
            total += 1
            # Walk up through the contiguous comment block above the NOLINT.
            ctx = [line]; j = i - 1
            while j >= 0 and j > i - 14:
                s = ls[j].strip()
                if not s:
                    break
                if s.startswith(('//', '/*', '*')):
                    ctx.insert(0, ls[j]); j -= 1
                else:
                    break
            buf = ''.join(ctx)
            if 'ADR-' not in buf and not re.search(r'[Rr]esearch-?\d', buf):
                miss += 1
print(f"sites={total} missing={miss}")
PY
# 2. Build + Netflix golden gate
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
make test-netflix-golden
```
0231 — vmaf-tune score path decodes mp4 -> raw YUV
- Touches: tools/vmaf-tune/src/vmaftune/score.py (new _decode_to_raw_yuv + _needs_decode helpers; run_score shells out to ffmpeg when req.distorted.suffix not in {.yuv, .y4m}); tools/vmaf-tune/tests/test_corpus.py (3 new regression tests; the smoke end-to-end mock now also stubs the ffmpeg decode call).
- Invariant: the decode-back is the contract the libvmaf CLI imposes. An mp4/webm/etc. file passed as --distorted is silently read as raw YUV and rejected for its wrong byte count, surfacing only as exit_status=234. Future encoder adapters that emit non-raw containers inherit this decode automatically. Do not "optimise" the temp YUV away without first migrating the corpus pipeline to the ffmpeg + libvmaf filter (which can pipe an mp4 stream in directly).
- On upstream sync: zero interaction. vmaf-tune is fork-only tooling; upstream Netflix/vmaf has no analogue.
- Re-test on rebase:
```bash
cd tools/vmaf-tune && python3 -m pytest tests/
# plus an end-to-end smoke (needs a real raw YUV + ffmpeg + vmaf):
./vmaf-tune corpus --source /path/to/ref.yuv --width 1920 \
  --height 1080 --pix-fmt yuv420p --framerate 25 --duration 6 \
  --encoder libx264 --preset medium --crf 23 \
  --output /tmp/smoke.jsonl --no-source-hash
# expect: vmaf_score is a real number, not NaN.
```
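The decode-back rule amounts to a suffix check plus one ffmpeg invocation. A minimal Python sketch, assuming yuv420p output and the standard ffmpeg options -f rawvideo and -pix_fmt; needs_decode, decode_cmd and decode_to_raw_yuv are hypothetical restatements of the _needs_decode / _decode_to_raw_yuv helpers, not their actual code:

```python
import subprocess
from pathlib import Path
from typing import List

# Inputs the libvmaf CLI can read directly.
RAW_SUFFIXES = {".yuv", ".y4m"}


def needs_decode(distorted: Path) -> bool:
    """True when the file is a container (mp4, webm, ...) rather than raw video."""
    return distorted.suffix not in RAW_SUFFIXES


def decode_cmd(src: Path, dst: Path, pix_fmt: str) -> List[str]:
    """ffmpeg argv that decodes a container to headerless planar YUV."""
    return ["ffmpeg", "-y", "-loglevel", "error", "-i", str(src),
            "-f", "rawvideo", "-pix_fmt", pix_fmt, str(dst)]


def decode_to_raw_yuv(src: Path, dst: Path, pix_fmt: str = "yuv420p") -> Path:
    """Run the decode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(decode_cmd(src, dst, pix_fmt), check=True)
    return dst
```

Keeping argv construction separate from subprocess.run lets the suffix and argv logic be tested without ffmpeg installed.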
0232 — CUDA build pins nvcc --std c++20
- Touches: core/src/meson.build line 686 (cuda_flags = [...]).
- Invariant: nvcc 12.x clamps host C++ at C++17 by default; 13.x accepts up to C++20. Bumping the host stdlib past nvcc's default (any gcc >= 16, whose libstdc++ ships C++23 features) breaks the host-side parse in <type_traits> / <bits/utility.h>. Forcing --std c++20 on CUDA 13+ keeps the host headers parseable. Do not drop this flag without first checking the host gcc version against nvcc's default.
- On upstream sync: zero interaction. Netflix/vmaf doesn't ship the cuda_flags list shape we use (their CUDA build is the original pre-fork pattern); a sync that touches core/src/meson.build around the is_cuda_enabled branch should keep the --std c++20 injection.
- Re-test on rebase:
meson setup core/build-cuda -Denable_cuda=true \
-Denable_sycl=false -Denable_vulkan=disabled
ninja -C core/build-cuda
# smoke
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
-r .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
-d .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
-w 1920 -h 1080 -p 420 -b 8
0233 — CUDA motion flush_fex_cuda idempotency guard
- Touches: core/src/feature/cuda/integer_motion_cuda.c — factored an append_if_unwritten helper and routed the two motion2 / motion3 final-frame writes through it.
- Invariant: under T-GPU-OPT-1 (PR #312 / ADR-0242), the pending-collect inside flush_context_cuda may already have written motion2_score[s->index] / motion3_score[s->index] before flush_fex_cuda runs. Any future motion-cuda flush logic that emits the same (feature, index) pair must keep this idempotency contract, or flush_context_cuda will mis-surface as "context could not be synchronized".
- On upstream sync: the bug only exists because the fork's flush_context_cuda runs the pending-collect before the per-extractor flush. Netflix/vmaf upstream doesn't have the T-GPU-OPT-1 drain pattern, so the pre-#312 code path didn't duplicate-write. If Netflix lands a similar pattern, the fix shape mirrors what's done here.
- Re-test on rebase:
ninja -C core/build-cuda
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
-r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model path=model/vmaf_v0.6.1.json --threads 1 -q \
--output /tmp/cuda.json --json
# Expect: clean run, no "cannot be overwritten" warning,
# no "problem flushing context" error.
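The idempotency contract reduces to one rule: each (feature, index) pair is written at most once, and a second writer skips instead of failing. A minimal Python model of that rule; FeatureCollector, append and append_if_unwritten here are hypothetical stand-ins for the C feature-collector API, with the error text echoing the warning named in the re-test comment:

```python
from typing import Dict, Tuple


class FeatureCollector:
    """Toy per-(feature, index) score store that refuses overwrites."""

    def __init__(self) -> None:
        self.scores: Dict[Tuple[str, int], float] = {}

    def append(self, name: str, index: int, value: float) -> None:
        key = (name, index)
        if key in self.scores:
            raise ValueError(f"{name}[{index}] cannot be overwritten")
        self.scores[key] = value

    def append_if_unwritten(self, name: str, index: int, value: float) -> bool:
        """Write only if absent; returns True when a write happened."""
        if (name, index) in self.scores:
            return False  # e.g. the pending-collect already emitted this frame
        self.append(name, index, value)
        return True
```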
0234 — hw_encoder_corpus.py Phase A real-corpus runner
- Touches: new scripts/dev/hw_encoder_corpus.py (no existing caller; opt-in tooling) and docs/development/intel-arc-vaapi-driver-priority.md. Output landing in runs/phase_a/ is gitignored; rerun the script to reproduce.
- Invariant: the script's QSV path requires env['LIBVA_DRIVER_NAME']='iHD' (set by the calling shell, not inside the script) when targeting /dev/dri/renderD129 on a multi-card host that has NVIDIA's libva-driver-nvidia shim installed. Without that, libva picks up NVIDIA's NVDEC-VAAPI translation and the MFX session handshake fails with -9. See the companion doc for the failure mode + fix.
- On upstream sync: zero interaction. The script lives under scripts/dev/ (fork-only); upstream Netflix/vmaf has no comparable Phase A corpus tooling.
- Re-test on rebase:
python3 scripts/dev/hw_encoder_corpus.py \
--vmaf-bin core/build-cuda/tools/vmaf \
--source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
--width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
--encoder h264_nvenc --cq 25 \
--out /tmp/smoke.jsonl
# Expect: 1 cell × ~150 frames, per-frame canonical-6 + vmaf,
# encoder=h264_nvenc, cq=25.
0235 — fr_regressor_v2 ENCODER_VOCAB v2 (hw codec extension)
- Touches: ai/scripts/train_fr_regressor_v2.py — ENCODER_VOCAB gains 6 hw-codec entries (3 NVENC + 3 QSV); ENCODER_VOCAB_VERSION bumps 1 -> 2; PRESET_ORDINAL gains 6 sub-tables for p1..p7 (NVENC) and the libx264-aligned QSV preset family.
- Invariant: vocab order is load-bearing; the index of every entry is baked into trained model graphs as a one-hot column position. New entries MUST be appended (never inserted into the middle), and the unknown sentinel MUST stay last (UNKNOWN_ENCODER_INDEX = N - 1). Bumping ENCODER_VOCAB_VERSION signals that any v1-graph ONNX needs re-export against v2 before consuming v2 training rows.
- On upstream sync: zero interaction. train_fr_regressor_v2.py is fork-only (Phase B prereq, ADR-0237 / ADR-0272).
- Re-test on rebase: python3 ai/scripts/train_fr_regressor_v2.py --corpus <jsonl> --epochs 200 --no-export — expect PLCC > 0.95 on a multi-codec corpus.
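The append-only rule and the [encoder_onehot(N), preset_norm, crf_norm] layout described for this trainer fit in one small sketch. It assumes the six hardware entries are h264_nvenc, hevc_nvenc, av1_nvenc, h264_qsv, hevc_qsv and av1_qsv (the notes only say 3 NVENC + 3 QSV) and that presets arrive as ordinals; every name below is hypothetical rather than taken from train_fr_regressor_v2.py:

```python
from typing import List, Sequence, Tuple

UNKNOWN = "unknown"
# v1 order as listed in these notes; the sentinel must stay last.
ENCODER_VOCAB_V1 = ("libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9", UNKNOWN)
# Assumed names for the 3 NVENC + 3 QSV additions.
HW_ENCODERS = ("h264_nvenc", "hevc_nvenc", "av1_nvenc", "h264_qsv", "hevc_qsv", "av1_qsv")


def extend_vocab(old: Tuple[str, ...], new: Sequence[str]) -> Tuple[str, ...]:
    """Append entries just before the sentinel; non-sentinel indices never move."""
    if old[-1] != UNKNOWN:
        raise ValueError("sentinel must be last")
    additions = tuple(e for e in new if e not in old)
    return old[:-1] + additions + (UNKNOWN,)


def encoder_onehot(encoder: str, vocab: Tuple[str, ...]) -> List[float]:
    """One-hot column for the encoder; unseen encoders map to the sentinel (N - 1)."""
    idx = vocab.index(encoder) if encoder in vocab else len(vocab) - 1
    return [1.0 if i == idx else 0.0 for i in range(len(vocab))]


def codec_features(encoder: str, preset_ordinal: int, crf: float,
                   vocab: Tuple[str, ...]) -> List[float]:
    """[encoder_onehot(N), preset_norm, crf_norm] using the /9 and /63 normalisers."""
    return encoder_onehot(encoder, vocab) + [preset_ordinal / 9, crf / 63]
```

Applied to the v1 tuple, extend_vocab keeps indices 0 to 4 in place but widens the one-hot from 6 to 12 columns and moves the sentinel from index 5 to 11, which is why a v1 graph cannot consume v2 rows without re-export.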
0276 — vmaf_tiny_v5 corpus-expansion probe (ADR-0287) — defer
- What changed: research-only addition. New scripts under ai/scripts/ (fetch_youtube_ugc_subset.py, extract_ugc_features.py, train_vmaf_tiny_v5.py, eval_loso_vmaf_tiny_v5.py), new ADR docs/adr/0276-*.md, new research digest docs/research/0057-*.md, and one CHANGELOG entry. No new ONNX artefact under model/tiny/, no registry change, no public C-API / CLI / meson_options change. The probe trained an architecturally identical mlp_small on a 5-corpus parquet (4-corpus + 27 000 UGC rows); the 1-σ ship gate did not clear (Δ PLCC = +0.00005), so the exporter that the prior agent had drafted (export_vmaf_tiny_v5.py) was discarded before the commit.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI corpus-expansion surface; nothing on the upstream side touches these files.
- On upstream sync: zero interaction. The v5 surface lives entirely under ai/scripts/ + docs/adr/ + docs/research/, all of which are fork-introduced trees. The shipped v2 model (model/tiny/vmaf_tiny_v2.onnx) and its registry row are untouched.
- Re-test on rebase:
# No code under test on rebase — purely research artefacts.
# If revisiting the corpus expansion, the reproducer is in the
# research digest:
python3 ai/scripts/fetch_youtube_ugc_subset.py \
--out-dir .workingdir2/ugc/download \
--n-stems 30 \
--manifest .workingdir2/ugc/manifest.json
python3 ai/scripts/extract_ugc_features.py \
--manifest .workingdir2/ugc/manifest.json \
--yuv-dir .workingdir2/ugc/yuv \
--vmaf-bin build-cpu/tools/vmaf \
--out-parquet runs/full_features_ugc.parquet \
--max-height 360 --max-frames 300 --threads 8
python3 ai/scripts/eval_loso_vmaf_tiny_v5.py \
--parquet-base runs/full_features_4corpus.parquet \
--parquet-extra runs/full_features_ugc.parquet \
--out-json runs/vmaf_tiny_v5_loso_metrics.json
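The "1-σ ship gate" can be read as: the candidate's mean PLCC gain must exceed the spread of the baseline's LOSO folds. A minimal Python sketch under that assumption (the exact gate definition lives in the research digest and may differ); clears_one_sigma is hypothetical:

```python
import statistics
from typing import Sequence


def clears_one_sigma(base_folds: Sequence[float], cand_folds: Sequence[float]) -> bool:
    """True if the mean per-fold PLCC gain exceeds one sample standard deviation
    of the baseline's folds (assumed reading of the 1-sigma ship gate)."""
    delta = statistics.mean(cand_folds) - statistics.mean(base_folds)
    return delta > statistics.stdev(base_folds)
```

Under this reading, a Δ PLCC of +0.00005 passes only if the baseline folds vary by less than 0.00005.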
0227 — vmaf-tune Intel QSV codec adapters (ADR-0281)
- What changed: fork-local additions under `tools/vmaf-tune/src/vmaftune/codec_adapters/`: `_qsv_common.py`, `h264_qsv.py`, `hevc_qsv.py`, `av1_qsv.py`, plus registry rows in `codec_adapters/__init__.py` and a new test file `tools/vmaf-tune/tests/test_codec_adapter_qsv.py`. Doc updates: `docs/usage/vmaf-tune.md` (Hardware encoders section), `docs/adr/0281-vmaf-tune-qsv-adapters.md`, `docs/research/0066-vmaf-tune-qsv-adapters.md`, `tools/vmaf-tune/AGENTS.md`, `CHANGELOG.md`.
- Upstream source: fork-local. `tools/vmaf-tune/` is fork-introduced under ADR-0237; Netflix/vmaf has no corresponding tree.
- On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths.
- Invariant: the registry exposes exactly four codecs (`av1_qsv`, `h264_qsv`, `hevc_qsv`, `libx264`, in alphabetical order). Each adapter validates its `(preset, quality)` pair. The QSV preset vocabulary is the seven x264-style names (`veryslow` … `veryfast`; no `ultrafast` or `superfast`). The encode pipeline (`encode.py`) remains tied to x264 CRF and will be widened in a separate PR, so the QSV adapters are inert until then. Future codec families that share the same parameter shape (NVENC, AMF) follow the same pattern: one `_<family>_common.py` plus N thin adapters.
- Re-test on rebase:
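The following illustrates the `(preset, quality)` validation described above and is not part of the re-test recipe. It is a minimal sketch: `QSV_PRESETS`, `QSV_QUALITY_RANGE`, and `validate_qsv` are hypothetical names, not the fork's `_qsv_common.py` API. The 1..51 quality range is an assumption modelled on ICQ-style `global_quality` values.

```python
# Hypothetical sketch; names and the quality range are assumptions.
from typing import Tuple

QSV_PRESETS: Tuple[str, ...] = (
    "veryslow", "slower", "slow", "medium", "fast", "faster", "veryfast",
)  # seven x264-style names; no "ultrafast" / "superfast"
QSV_QUALITY_RANGE: Tuple[int, int] = (1, 51)  # assumed inclusive bounds


def validate_qsv(preset: str, quality: int) -> None:
    """Raise ValueError unless (preset, quality) is a legal QSV pair."""
    if preset not in QSV_PRESETS:
        raise ValueError(f"unsupported QSV preset {preset!r}; expected one of {QSV_PRESETS}")
    lo, hi = QSV_QUALITY_RANGE
    if not lo <= quality <= hi:
        raise ValueError(f"QSV quality {quality} outside [{lo}, {hi}]")


validate_qsv("medium", 23)  # accepted; "ultrafast" would raise ValueError
```

A closed tuple keeps validation order-stable and makes the "no ultrafast/superfast" rule a membership check rather than a special case.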
0230 — K150K-A corpus extraction script (ADR-0362)
- Touches: `ai/scripts/extract_k150k_features.py` (new fork-only file), `ai/AGENTS.md` (K150K invariant note appended), `docs/adr/README.md` (ADR-0362 index row), `CHANGELOG.md`, `docs/rebase-notes.md`.
- Invariant: `extract_k150k_features.py` requires `build-cpu/tools/vmaf`, a fork build with `ssimulacra2` + `motion_v2`. If upstream Netflix adds these extractors to its own release binary, the `--vmaf-bin` default may be switched to the system binary. Do that only after verifying that the metric JSON key names match the aliases in `_METRIC_ALIASES`. The `FEATURE_NAMES` tuple is column-order-locked to the parquet schema; any reorder invalidates trained checkpoints that consume the parquet.
- Upstream interaction: none. The script is fork-only; the K150K clips and parquet are gitignored. Upstream Netflix/vmaf does not ship a K150K extractor.
- Re-test on rebase:
0229 — vmaf-tune libvvenc + NN-VC codec adapter (ADR-0285)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/vvenc.py` (new fork-only file).
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` (registry edit, fork-only).
  - `tools/vmaf-tune/tests/test_codec_adapter_vvenc.py` (new).
  - `tools/vmaf-tune/tests/test_corpus.py`: relaxes the `known_codecs() == ("libx264",)` assertion to `"libx264" in known_codecs()`, since the registry now spans multiple codecs.
- Invariant: the codec-adapter registry is fork-introduced (Phase A of ADR-0237) and lives entirely outside the upstream Netflix tree, so `tools/vmaf-tune/` does not touch upstream paths. The only rebase-sensitive surface is the `CORPUS_ROW_KEYS` schema in `src/vmaftune/__init__.py`, per the Phase A invariant in `tools/vmaf-tune/AGENTS.md`. This PR adds the adapter without changing the schema.
- Upstream interaction: none. `tools/vmaf-tune/` is not in Netflix/vmaf upstream.
- Re-test on rebase:
- Status update 2026-05-09: the original `nnvc_intra` toggle was removed because it emitted a fabricated `IntraNN` key that does not exist in any released VVenC. It was replaced with a curated 9-knob tuning surface of real VVenC 1.14.0 options: `PerceptQPA`, `InternalBitDepth`, `Tier`, `Tiles`, `MaxParallelFrames`, `RPR`, `SAO`, `ALF`, `CCALF`. Defaults preserve the bit-exact Phase A grid baseline. `adapter_version` was bumped to `"2"` so cache keys invalidate. See ADR-0285 §"Status update 2026-05-09". No rebase impact (fork-local file, no upstream-tree touch).
0228 — vmaf-tune Phase D scaffold (ADR-0276)
- Touches: `tools/vmaf-tune/src/vmaftune/per_shot.py`, `tools/vmaf-tune/src/vmaftune/cli.py`, `tools/vmaf-tune/tests/test_per_shot.py`, `docs/usage/vmaf-tune.md`, `docs/adr/0276-vmaf-tune-phase-d-per-shot.md`.
- Invariant: scaffold-only. The module relies on a stable predicate signature, `(shot, target_vmaf, encoder) -> (crf, predicted_vmaf)`, which Phase B's bisect (PR #347) drops into later.
  - `Shot` ranges are half-open (`[start_frame, end_frame)`), even though the C-side `vmaf-perShot` JSON/CSV sidecar uses an inclusive `end_frame`. Normalisation happens at the parse boundary in `_parse_per_shot_json` / `parse_per_shot_csv`.
  - The `vmaf-perShot` schema lives in `docs/usage/vmaf-perShot.md` and is fork-local (ADR-0222), so upstream cannot drift it. The only rebase risk is fork-internal renames.
- Upstream source: entirely fork-local. `tools/vmaf-tune/` is fork-introduced (ADR-0237). Netflix/vmaf upstream has no encode-automation surface.
- On upstream sync: zero interaction expected. No file in this PR overlaps an upstream-mirrored path.
- Re-test on rebase:

```shell
python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q
python tools/vmaf-tune/vmaf-tune tune-per-shot --help
```
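The inclusive-to-half-open normalisation mentioned above amounts to adding 1 to the sidecar's end frame. This is a minimal sketch, assuming a frozen dataclass with the two field names used in the text. `shot_from_sidecar` is a hypothetical helper, not the fork's `_parse_per_shot_json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Half-open frame range [start_frame, end_frame)."""

    start_frame: int
    end_frame: int

    @property
    def num_frames(self) -> int:
        return self.end_frame - self.start_frame


def shot_from_sidecar(start_frame: int, end_frame_inclusive: int) -> Shot:
    """Convert an inclusive vmaf-perShot range to the half-open in-memory form."""
    if end_frame_inclusive < start_frame:
        raise ValueError("end_frame precedes start_frame")
    return Shot(start_frame, end_frame_inclusive + 1)


# A sidecar row covering frames 0..119 inclusive is 120 frames long.
print(shot_from_sidecar(0, 119))  # Shot(start_frame=0, end_frame=120)
```

With half-open ranges, consecutive shots tile without gaps or overlaps: one shot's `end_frame` equals the next shot's `start_frame`.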
0229 — vmaf-tune SVT-AV1 codec adapter (ADR-0278)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/svtav1.py` (new).
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` (registry).
  - `tools/vmaf-tune/src/vmaftune/encode.py` (`parse_versions` extended for the SVT-AV1 banner pattern).
  - `tools/vmaf-tune/src/vmaftune/corpus.py` (optional `ffmpeg_preset_token` hook).
- Invariant: `PRESET_NAME_TO_INT` is closed and order-stable. Its integer values are baked into corpus rows that the downstream `fr_regressor_v2` (ADR-0235) trains on, so reordering or rewriting the table silently changes the integer SVT-AV1 receives. The codec key `"libsvtav1"` matches `CODEC_VOCAB[2]` in `ai/src/vmaf_train/codec.py`; keep the two aligned on any rename.
- Upstream source: fork-local. `tools/vmaf-tune/` is a fork-introduced tree (see entry 0227, Phase A scaffold). No Netflix/vmaf upstream interaction.
- On upstream sync: zero interaction. The adapter lives entirely under the fork-local `tools/vmaf-tune/` tree.
- Re-test on rebase:
0230 — fr_regressor_v2 PROD ship (ADR-0352)
- ADR: ADR-0352

0230 — fr_regressor_v2 PROD ship (ADR-0291)
- ADR: ADR-0291
- Touches:
  - `model/tiny/fr_regressor_v2.onnx` (binary, refreshed).
  - `model/tiny/fr_regressor_v2.json` (sidecar, sha256 + metrics).
  - `model/tiny/registry.json` (smoke flag flip, sha256 update).
  - `runs/phase_a/full_grid/per_frame_canonical6.jsonl` (training corpus; a fork-local artefact under `runs/`).
  - Companion docs.
- Re-test recipe: see Research-0068 §Reproducer. The ship gate is LOSO PLCC ≥ 0.95 on the per-source folds; the current run reports 0.9681 ± 0.0207.
- Rebase invariant: rebuild the per-frame canonical-6 corpus from `runs/phase_a/{nvenc,qsv}_pf.jsonl` (PR #392) before any retrain. Do not retrain against the cell-only `comprehensive.jsonl`: it lacks the per-frame features and produces PLCC ≈ 0.7, the smoke baseline.
- No upstream interaction: `fr_regressor_v2` is fork-local (ADR-0272).
0229 — vmaf-tune Phase E ladder generator (ADR-0295)
- ADR: ADR-0295
- Touches: entirely fork-local under `tools/vmaf-tune/`:
  - new module `tools/vmaf-tune/src/vmaftune/ladder.py`;
  - new test file `tools/vmaf-tune/tests/test_ladder.py`;
  - two new subcommand blocks in `tools/vmaf-tune/src/vmaftune/cli.py`.

  No upstream-shared paths are touched.
- Invariants:
  - `vmaftune.ladder.convex_hull` returns a strictly monotonic Pareto frontier: bitrate and VMAF both strictly increase.
  - `select_knees` returns exactly `min(n, len(hull))` rungs in ascending bitrate order.
  - `emit_manifest("hls")` produces one `#EXT-X-STREAM-INF` per rung, with monotonically increasing `BANDWIDTH=` values.
  - The default `_default_sampler` intentionally raises `NotImplementedError`; production callers must inject a Phase B bisect-driven sampler. The Phase B integration PR (gated on PR #347) swaps the default; the test suite continues to inject a synthetic stub.
- Rebase impact: none. This is a fork-local Python tool; upstream Netflix/vmaf does not ship a `tools/vmaf-tune/` tree.
- Re-test on rebase:
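The following illustrates the three ladder invariants above and is not part of the re-test recipe. It is a minimal sketch under stated assumptions. `pareto_frontier`, `select_knees`, and `emit_hls` are hypothetical stand-ins, not the `vmaftune.ladder` API. The frontier here is a dominance filter rather than a geometric convex hull. Knees are picked by even index spacing, and the media-playlist URIs are placeholders. The sample points are synthetic.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, vmaf)


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Drop dominated points; result has bitrate and VMAF strictly increasing."""
    frontier: List[Point] = []
    # Ascending bitrate; at equal bitrate the highest VMAF comes first.
    for kbps, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if frontier and kbps == frontier[-1][0]:
            continue  # same bitrate, no better score
        if frontier and vmaf <= frontier[-1][1]:
            continue  # costs more without scoring higher
        frontier.append((kbps, vmaf))
    return frontier


def select_knees(hull: List[Point], n: int) -> List[Point]:
    """Pick min(n, len(hull)) rungs spread evenly by index, ascending bitrate."""
    k = min(n, len(hull))
    if k == 0:
        return []
    if k == 1:
        return [hull[-1]]  # single rung: keep the top of the hull (arbitrary here)
    # Integer spacing keeps indices distinct and includes both endpoints.
    return [hull[i * (len(hull) - 1) // (k - 1)] for i in range(k)]


def emit_hls(rungs: List[Point]) -> str:
    """Master playlist with one #EXT-X-STREAM-INF per rung (BANDWIDTH in bit/s)."""
    lines = ["#EXTM3U"]
    for i, (kbps, _vmaf) in enumerate(rungs):
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={int(kbps * 1000)}")
        lines.append(f"rung_{i}.m3u8")  # placeholder media-playlist URI
    return "\n".join(lines) + "\n"


samples = [(500, 70.0), (1000, 80.0), (1000, 78.0), (1500, 79.0), (2000, 88.0), (3000, 93.0)]
hull = pareto_frontier(samples)
print(hull)  # [(500, 70.0), (1000, 80.0), (2000, 88.0), (3000, 93.0)]
print(emit_hls(select_knees(hull, 3)))
```

Because the frontier is strictly increasing in bitrate, any index-ordered subset of it automatically satisfies the "monotonically increasing `BANDWIDTH=`" invariant.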
0229 — fr_regressor_v2 probabilistic head scaffold (ADR-0279)
- Touches:
  - `ai/scripts/train_fr_regressor_v2_ensemble.py` (new, fork-local).
  - `ai/scripts/eval_probabilistic_proxy.py` (new, fork-local).
  - `model/tiny/fr_regressor_v2_ensemble_v1*.onnx`, `fr_regressor_v2_ensemble_v1.json` (new artefacts; smoke probes).
  - `model/tiny/registry.json`: five new `kind: "fr"` rows (`fr_regressor_v2_ensemble_v1_seed{0..4}`); existing entries untouched.
  - `ai/AGENTS.md`: new "fr_regressor_v2_ensemble_v1 — probabilistic head" section. It pins the per-member ONNX I/O contract, the manifest-as-runtime-entry-point invariant, the ensemble-size pin, the confidence-rule one-of, codec-vocab parity, and the smoke-artefact posture.
  - `docs/ai/models/fr_regressor_v2_probabilistic.md` (new model card).
  - `docs/research/0067-fr-regressor-v2-probabilistic.md` (new audit digest).
  - `docs/adr/0279-fr-regressor-v2-probabilistic.md` (new ADR; Proposed), with an index row appended to `docs/adr/README.md`.
  - `CHANGELOG.md`: `### Added` row under "Unreleased — lusoris fork".
- Invariant: two things form the C-side adapter's load-bearing contract:
  - the per-member ONNX I/O contract: two inputs (`features [N, 6]`, standardised, and `codec_onehot [N, NUM_CODECS]`) and one output (`score [N]`);
  - the manifest's `confidence` rule, which must be one of `"ensemble"` or `"ensemble+conformal"`.

  Each ensemble member is a stock `FRRegressor(num_codecs=NUM_CODECS)` instance. Flipping to a v1-shaped single-input graph silently invalidates the manifest. `CODEC_VOCAB` parity with `ai/src/vmaf_train/codec.py` is required.
- On upstream sync: zero interaction expected. The change is wholly fork-local, with no upstream Netflix/vmaf path overlap. The `ai/` package is fork-introduced (see ADR-0021, ADR-0036), and upstream has no probabilistic-regressor surface. If upstream ever ships its own `fr_regressor_v2` variant, do NOT merge them; register both ids side by side.
- Re-test on rebase:

```shell
python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke
python ai/scripts/eval_probabilistic_proxy.py --smoke
python ai/scripts/validate_model_registry.py
```
0287 — vmaf-tune saliency-aware ROI tuning (ADR-0293)
- Touches: `tools/vmaf-tune/src/vmaftune/saliency.py`, `tools/vmaf-tune/src/vmaftune/cli.py` (new `recommend` subcommand), `tools/vmaf-tune/AGENTS.md` (saliency invariant), `docs/usage/vmaf-tune.md` (saliency section).
- Upstream source: fork-local. The `vmaf-tune` tree was introduced in PR #329 (ADR-0237 Phase A) and has no upstream Netflix counterpart.
- On upstream sync: zero interaction. This is a pure fork-local Python package under `tools/vmaf-tune/`.
- Invariant: the saliency-to-QP-offset signal blend (`offset = (2*sal - 1) * foreground_offset`, clamped to ±12) is bit-for-bit equivalent to the C-side blend in `vmaf-roi` (ADR-0247).
  - `tests/test_saliency.py` pins the contract. If the `vmaf-roi` C blend changes, `saliency.py` follows in the same PR.
  - The test-seam contract (`session_factory=…`, `encode_runner=…`) lets the suite run without `onnxruntime` or `ffmpeg`.
- Re-test on rebase:
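The following spells out the saliency blend formula above and is not part of the re-test recipe. It is a minimal sketch. `saliency_to_qp_offset` and `offsets_for_blocks` are hypothetical names. The sketch does not reproduce the C side's exact float operations or any integer rounding, so it makes no bit-exactness claim against `vmaf-roi`.

```python
from typing import List


def saliency_to_qp_offset(sal: float, foreground_offset: float, limit: float = 12.0) -> float:
    """offset = (2*sal - 1) * foreground_offset, clamped to [-limit, +limit]."""
    offset = (2.0 * sal - 1.0) * foreground_offset
    return max(-limit, min(limit, offset))


def offsets_for_blocks(saliency: List[float], foreground_offset: float) -> List[float]:
    """Apply the blend to each block's saliency value in [0, 1]."""
    return [saliency_to_qp_offset(s, foreground_offset) for s in saliency]


print(offsets_for_blocks([0.0, 0.5, 1.0], 8.0))  # [-8.0, 0.0, 8.0]
```

The mapping is symmetric around `sal = 0.5`, which maps to offset 0. The clamp only bites when `foreground_offset` exceeds 12.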
0229 — tools/vmaf-roi-score/ Option C scaffold (ADR-0296)
- ADR: ADR-0296
- Touches:
  - `tools/vmaf-roi-score/pyproject.toml` (new)
  - `tools/vmaf-roi-score/vmaf-roi-score` (new console shim)
  - `tools/vmaf-roi-score/src/vmafroiscore/__init__.py` (new)
  - `tools/vmaf-roi-score/src/vmafroiscore/cli.py` (new)
  - `tools/vmaf-roi-score/src/vmafroiscore/score.py` (new)
  - `tools/vmaf-roi-score/src/vmafroiscore/mask.py` (new)
  - `tools/vmaf-roi-score/tests/test_combine.py` (new)
  - `tools/vmaf-roi-score/README.md` (new)
  - `tools/vmaf-roi-score/AGENTS.md` (new)
  - `docs/adr/0296-vmaf-roi-saliency-weighted.md` (new)
  - `docs/adr/_index_fragments/0296-vmaf-roi-saliency-weighted.md` (new)
  - `docs/adr/_index_fragments/_order.txt` (append-only)
  - `docs/research/0069-vmaf-roi-saliency-weighted.md` (new)
  - `docs/usage/vmaf-roi-score.md` (new)
  - `changelog.d/added/T6-2c-vmaf-roi-score-scaffold.md` (new)
- Invariant: `tools/vmaf-roi-score/` is wholly fork-local. No upstream Netflix/vmaf surface owns or interacts with this directory.
  - The combine math is a pure linear blend on Python `float`.
  - The JSON schema is pinned by `ROI_RESULT_KEYS` and `SCHEMA_VERSION = 1`. Schema bumps require an ADR-0288 supersession.
  - Naming guard: do not confuse this tool with `core/tools/vmaf_roi.c` (ADR-0247), which is the encoder-steering binary. The scoring tool here is `vmaf-roi-score`; the names diverge deliberately.
- Rebase impact: zero. This is a pure-Python tool under `tools/`, not part of the libvmaf C build and not part of any Netflix-mirrored surface.
- Re-test on rebase:
0228 — vmaf-tune compare codec-comparison mode (research-0061 Bucket #7)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/compare.py` (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  - `tools/vmaf-tune/src/vmaftune/cli.py`: adds the `compare` subparser and the `_run_compare` router.
  - `tools/vmaf-tune/tests/test_compare.py` (new). Uses a mocked predicate; no `ffmpeg`/`vmaf` binaries required.
  - `tools/vmaf-tune/AGENTS.md`: invariant note for the predicate seam and the `COMPARE_ROW_KEYS` contract.
  - `docs/usage/vmaf-tune.md`: new "Codec comparison" section.
- Invariant: `compare.compare_codecs` orchestrates per-codec ranking through an injected callable, `predicate(codec, src, target_vmaf) -> RecommendResult`.
  - The orchestration must not branch on codec name. New codecs land as one-file additions under `codec_adapters/` and are picked up automatically by the registry.
  - `COMPARE_ROW_KEYS` is the JSON / CSV column contract and follows the same maintenance discipline as `CORPUS_ROW_KEYS`.
- Rebase impact: entirely fork-local. The Phase A + Phase B recommend backend (ADR-0237) is fork-internal; upstream Netflix/vmaf has no `tools/vmaf-tune/` tree.
- Re-test on rebase:

```shell
pytest tools/vmaf-tune/tests/test_compare.py -v
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
  --src /tmp/ref.yuv --target-vmaf 92 --format markdown
```
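The following illustrates the predicate seam described above and is not part of the re-test recipe. It is a minimal sketch: the `RecommendResult` fields and the stub predicate are hypothetical. The codec names `codec_a`, `codec_b`, and `codec_c` and their bitrates are made up, so no claim is made about real encoders. The point is that ranking never inspects the codec name.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class RecommendResult:  # hypothetical fields, for illustration only
    codec: str
    crf: int
    bitrate_kbps: float
    vmaf: float


Predicate = Callable[[str, str, float], RecommendResult]


def compare_codecs(codecs: Iterable[str], src: str, target_vmaf: float,
                   predicate: Predicate) -> List[RecommendResult]:
    """Rank codecs by the bitrate each needs to reach target_vmaf.

    There is no per-codec branching: every codec goes through the same
    injected predicate, so a new registry entry needs no change here.
    """
    results = [predicate(codec, src, target_vmaf) for codec in codecs]
    return sorted(results, key=lambda r: r.bitrate_kbps)


# Stub predicate with made-up bitrates, in the spirit of the mocked test suite.
FAKE_KBPS = {"codec_a": 4200.0, "codec_b": 2600.0, "codec_c": 2300.0}


def fake_predicate(codec: str, src: str, target_vmaf: float) -> RecommendResult:
    return RecommendResult(codec, 30, FAKE_KBPS[codec], target_vmaf)


ranked = compare_codecs(FAKE_KBPS, "/tmp/ref.yuv", 92.0, fake_predicate)
print([r.codec for r in ranked])  # ['codec_c', 'codec_b', 'codec_a']
```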
0229 — vmaf-tune --score-backend GPU score wiring (ADR-0299)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/score_backend.py` (new). Wholly fork-local; `tools/vmaf-tune/` has no upstream Netflix/vmaf overlap.
  - `tools/vmaf-tune/src/vmaftune/{score,corpus,cli}.py` (additive kwargs, no API removals).
  - `tools/vmaf-tune/tests/test_score_backend.py` (new).
  - `docs/usage/vmaf-tune.md` (new GPU section + flag row).
  - `docs/adr/0299-vmaf-tune-gpu-score.md` (new).
  - `docs/research/0071-vmaf-tune-gpu-score-backend.md` (new).
- Invariant: the libvmaf CLI exposes `--backend NAME` with exactly the values `auto|cpu|cuda|sycl|vulkan`.
  - The help-text parser in `score_backend.parse_supported_backends` pins this format.
  - If upstream renames the flag or reformats the help line on merge, the parser silently degrades to "CPU only". The test fixtures in `test_score_backend.py` catch the format change, but only if they are re-run.
- Upstream source: fork-local. Netflix upstream's CLI does not ship a `--backend` selector (CPU-only).
- On upstream sync: zero interaction. `vmaf-tune` lives entirely in fork-introduced paths and consumes only the fork's `--backend` flag.
- Re-test on rebase:

```shell
pytest tools/vmaf-tune/tests/test_score_backend.py -v
# If the libvmaf help text reformats, parse_supported_backends
# returns {"cpu"} in test_parse_full_backend_line_yields_all_four
# and the test fails loudly.
```
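The following shows why a help-text reformat degrades silently rather than crashing. It is not part of the re-test recipe. It is a minimal sketch, not the fork's `parse_supported_backends`. The help-line shape (`--backend NAME … auto|cpu|cuda|sycl|vulkan`) is an assumption. Treating `auto` as a selector rather than a backend is also an assumption, chosen so that a full line yields four names.

```python
import re
from typing import Set

# Assumed help-line shape: "--backend NAME ... auto|cpu|cuda|sycl|vulkan".
_BACKEND_LINE = re.compile(r"--backend\b.*?((?:[a-z0-9]+\|)+[a-z0-9]+)")


def parse_supported_backends(help_text: str) -> Set[str]:
    """Backends advertised by the CLI help; falls back to {"cpu"} if unparsable."""
    for line in help_text.splitlines():
        match = _BACKEND_LINE.search(line)
        if match:
            names = set(match.group(1).split("|"))
            names.discard("auto")  # a selector, not a concrete backend
            return names | {"cpu"}
    return {"cpu"}  # unrecognised format: degrade to CPU only


help_text = "  --backend NAME      scoring backend: auto|cpu|cuda|sycl|vulkan\n"
print(sorted(parse_supported_backends(help_text)))  # ['cpu', 'cuda', 'sycl', 'vulkan']
```

The `{"cpu"}` fallback keeps scoring working when the format drifts. That is exactly why only a fixture test that expects all four backends can notice the drift.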
0261 — vmaf-tune HDR-aware encode + score path (2026-05-03)
- What changed: fork-local addition of `tools/vmaf-tune/src/vmaftune/hdr.py`, plus wiring into `corpus.py` / `cli.py` / `score.py`. It adds:
  - ffprobe-driven HDR detection;
  - codec-specific HDR ffmpeg flag dispatch;
  - schema-v2 corpus row keys (`hdr_transfer`, `hdr_primaries`, `hdr_forced`);
  - four `--auto-hdr` / `--force-*` CLI modes.

  See ADR-0300.
- Upstream source: zero. `tools/vmaf-tune/` is fork-introduced (Phase A under ADR-0237).
- On upstream sync: zero interaction. Upstream Netflix/vmaf ships no encode-automation surface; this tree is entirely fork-local and lives outside `libvmaf/` and `python/`.
- Schema migration note: `SCHEMA_VERSION` bumped 1 → 2. The three new keys are additive; Phase B / C loaders treat missing keys as SDR for backward compatibility with v1 rows.
- Re-test on rebase:

```shell
python -m pytest tools/vmaf-tune/tests/ -q
python -m vmaftune.cli corpus --help  # confirm --auto-hdr surfaces
```
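The following illustrates the backward-compatibility rule above (a missing HDR key means SDR) and is not part of the re-test recipe. It is a minimal sketch. The default values (`None`, `False`) and the helper names are assumptions, not the fork's loader code. The two transfer names are FFmpeg's `color_transfer` identifiers for PQ and HLG.

```python
from typing import Any, Dict

# Hypothetical v2 defaults: absent HDR keys mean "SDR, not forced".
HDR_DEFAULTS: Dict[str, Any] = {
    "hdr_transfer": None,
    "hdr_primaries": None,
    "hdr_forced": False,
}
# FFmpeg/ffprobe color_transfer names for PQ and HLG.
HDR_TRANSFERS = frozenset({"smpte2084", "arib-std-b67"})


def upgrade_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a schema-v2 view of a corpus row; v1 rows gain SDR defaults."""
    upgraded = dict(HDR_DEFAULTS)
    upgraded.update(row)  # keys present in the row always win
    return upgraded


def is_hdr(row: Dict[str, Any]) -> bool:
    return upgrade_row(row)["hdr_transfer"] in HDR_TRANSFERS


v1_row = {"encoder": "libx264", "crf": 23}
print(is_hdr(v1_row))  # False
```

Copying rather than mutating the input keeps v1 rows byte-identical on disk, so the migration remains purely additive.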
0298 — vmaf-tune content-addressed cache (ADR-0298)
- What changed (fork-local):
  - new module `tools/vmaf-tune/src/vmaftune/cache.py`;
  - cache integration in `tools/vmaf-tune/src/vmaftune/corpus.py`, where `iter_rows` now consults the cache before encode/score;
  - new CLI flags `--no-cache`, `--cache-dir`, `--cache-size-gb` in `cli.py`;
  - the codec-adapter `Protocol` gains `adapter_version: str`, and the lone Phase A x264 adapter pins `"1"`.
- Upstream source: none. `tools/vmaf-tune/` is fork-introduced (ADR-0237) and has no upstream counterpart.
- On upstream sync: zero interaction with Netflix/vmaf master. The module sits entirely under `tools/vmaf-tune/`, which upstream does not ship.
- Invariant for future codec adapters: every `CodecAdapter` must declare `adapter_version: str`. Bump it whenever the adapter's argv shape, preset list, or quality range changes; otherwise the cache returns stale results after an upgrade. The contract is asserted by `test_cache_key_diffs_on_each_field` in `tests/test_cache.py`.
- Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/test_cache.py -v
```
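The following shows why bumping `adapter_version` invalidates cached entries and is not part of the re-test recipe. It is a minimal sketch of a content-addressed key. The field set and `cache_key` itself are hypothetical, not the fork's `cache.py`. The idea is that every input affecting the encode/score result is hashed, so changing any one of them yields a new key.

```python
import hashlib
import json


def cache_key(**fields: object) -> str:
    """SHA-256 over a canonical JSON encoding of every input that affects the result."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = dict(
    src_sha256="0" * 64,  # placeholder source digest
    codec="libx264",
    adapter_version="1",
    preset="medium",
    crf=23,
)
old = cache_key(**base)
new = cache_key(**{**base, "adapter_version": "2"})
print(old != new)  # True: bumping adapter_version invalidates the entry
```

`sort_keys=True` makes the key independent of argument order, so two call sites building the same field set always hit the same entry.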
0283 — vmaf-tune Apple VideoToolbox adapters (2026-05-05)
- What changed: fork-local addition under `tools/vmaf-tune/src/vmaftune/codec_adapters/`. New files: `h264_videotoolbox.py`, `hevc_videotoolbox.py`, `_videotoolbox_common.py`, plus the registry hook in `__init__.py`. See ADR-0283.
- Update 2026-05-09: a `prores_videotoolbox.py` adapter was added following the same registry pattern (broadcast / prosumer ProRes intermediate).
  - The quality knob differs. ProRes is a fixed-rate codec, so the harness's `--crf` slot carries the integer ProRes tier id (`0` = proxy through `5` = xq) rather than a `-q:v` value.
  - `_videotoolbox_common.py` is extended with `PRORES_PROFILE_*` constants plus `validate_prores_videotoolbox()` / `prores_profile_name()` helpers.
  - The profile ids were verified against FFmpeg n8.1.1 `libavcodec/videotoolboxenc.c`. See the Status update appendix in ADR-0283.
- Upstream source: zero. `tools/vmaf-tune/` is fork-introduced (Phase A under ADR-0237).
- On upstream sync: zero interaction.
- Re-test on rebase:

```shell
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py -q
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_prores_videotoolbox.py -q
```
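The following illustrates how an integer tier travels through the `--crf` slot and is not part of the re-test recipe. It is a minimal sketch. The endpoints (0 = proxy, 5 = xq) come from the entry above. The intermediate ordering (lt, standard, hq, 4444) is assumed to follow FFmpeg's `AV_PROFILE_PRORES_*` constants. `tier_to_profile_name` is a hypothetical helper, not the fork's `prores_profile_name()`.

```python
from typing import Tuple

# Index == integer tier carried in the --crf slot (assumed FFmpeg profile ordering).
PRORES_TIERS: Tuple[str, ...] = ("proxy", "lt", "standard", "hq", "4444", "xq")


def tier_to_profile_name(tier: int) -> str:
    """Validate a ProRes tier id and return its profile name."""
    if not 0 <= tier < len(PRORES_TIERS):
        raise ValueError(f"ProRes tier must be 0..{len(PRORES_TIERS) - 1}, got {tier}")
    return PRORES_TIERS[tier]


print(tier_to_profile_name(3))  # hq
```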
0228 — vmaf-tune coarse-to-fine CRF search (ADR-0306)
- What changed (fork-local tooling):
  - adds `coarse_to_fine_search()` to `tools/vmaf-tune/src/vmaftune/corpus.py`;
  - plumbs new CLI flags onto `vmaf-tune corpus` (`--coarse-to-fine`, `--coarse-step`, `--fine-radius`, `--fine-step`, `--target-vmaf`);
  - ships a new `vmaf-tune recommend` subcommand;
  - widens `quality_range` in `tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py` from `(15, 40)` to `(0, 51)`.

  The JSONL row schema is unchanged (`SCHEMA_VERSION=1`).
- Upstream source: fork-local. The whole `tools/vmaf-tune/` tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation surface.
- On upstream sync: zero interaction. `tools/vmaf-tune/` is not mirrored from upstream.
- Re-test on rebase:
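The following illustrates the coarse-to-fine idea behind the new flags and is not part of the re-test recipe. It is a minimal sketch, not the fork's `coarse_to_fine_search()`. The parameter names mirror the CLI flags, but the default values are assumptions. The sketch assumes VMAF falls monotonically as CRF rises, and it looks for the highest (cheapest) CRF that still meets the target. `score` stands in for one encode + score pass; the rate-quality curve in the example is synthetic.

```python
from typing import Callable, Dict, List, Optional


def coarse_to_fine_search(
    score: Callable[[int], float],
    target_vmaf: float,
    lo: int = 0,
    hi: int = 51,
    coarse_step: int = 8,   # assumed default
    fine_radius: int = 8,   # assumed default; >= coarse_step - 1 covers the gap
    fine_step: int = 1,     # assumed default
) -> Optional[int]:
    """Highest CRF (cheapest encode) whose VMAF still meets target_vmaf."""
    seen: Dict[int, float] = {}

    def vmaf_at(crf: int) -> float:
        if crf not in seen:  # each CRF is encoded + scored at most once
            seen[crf] = score(crf)
        return seen[crf]

    coarse_hits = [c for c in range(lo, hi + 1, coarse_step) if vmaf_at(c) >= target_vmaf]
    if not coarse_hits:
        return None  # target unreachable inside [lo, hi]
    anchor = max(coarse_hits)
    window = set(range(max(lo, anchor - fine_radius), min(hi, anchor + fine_radius) + 1, fine_step))
    window.add(anchor)
    return max(c for c in window if vmaf_at(c) >= target_vmaf)


calls: List[int] = []


def synthetic_vmaf(crf: int) -> float:
    calls.append(crf)
    return 100.0 - crf  # synthetic monotone curve, not real data


best = coarse_to_fine_search(synthetic_vmaf, target_vmaf=70.0)
print(best, len(calls))  # 30 21
```

In this example the search needs 21 evaluations, compared with 52 for an exhaustive sweep of CRF 0..51. Memoisation means samples shared by the coarse and fine passes are not re-encoded.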
0314 — vmaf-tune --score-backend=vulkan (ADR-0314)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/cli.py`: additive argparse flag on the `corpus` + `recommend` subparsers. It resolves `select_backend` and catches `BackendUnavailableError` for a clean exit 2.
  - `tools/vmaf-tune/src/vmaftune/score.py`: additive `backend` kwarg on `build_vmaf_command` and `run_score`; `None` means no flag is emitted.
  - `tools/vmaf-tune/src/vmaftune/corpus.py`: new `CorpusOptions.score_backend` field, default `None`, forwarded into `run_score`.
  - `tools/vmaf-tune/tests/test_score_backend.py`: additive Vulkan-specific tests. Pre-existing tests now pass once the `backend=` kwarg lands.
  - `docs/adr/0314-vmaf-tune-score-backend-vulkan.md` (new).
  - `docs/usage/vmaf-tune.md`: new "Vulkan score backend" subsection under the existing GPU-scoring section.
  - `tools/vmaf-tune/AGENTS.md`: invariant note that the argparse choices stay in sync with the libvmaf `--backend` vocabulary.
  - `changelog.d/added/vmaf-tune-score-backend-vulkan.md` (new).
- Invariant: `score_backend.ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")` is exactly the set accepted by the `--backend` alternation in libvmaf's `core/tools/cli_parse.c`. Adding a new harness-side value without the libvmaf-side wiring produces silent strict-mode failures on hosts that probe positively for it.
- Upstream source: zero. Netflix upstream's CLI does not ship a `--backend` selector; both `tools/vmaf-tune/` and `core/src/vulkan/` are fork-introduced.
- On upstream sync: zero interaction. No upstream-mirror file is touched.
- Re-test on rebase:

```shell
pytest tools/vmaf-tune/tests/test_score_backend.py -v -k vulkan
pytest tools/vmaf-tune/tests/test_score_backend.py -v
```

Failures here usually mean the libvmaf help-text format changed. The test fixtures for `score_backend.parse_supported_backends` pin that format and will fail loudly.
0303 — fr_regressor_v2 ensemble prod flip (ADR-0303)
- ADR: ADR-0303
- Touches: entirely fork-local.
  - `ai/scripts/train_fr_regressor_v2_ensemble_loso.py` (new): a 9-fold LOSO trainer over the five ensemble seeds that emits `loso_seed{N}.json` artefacts.
  - `scripts/ci/ensemble_prod_gate.py` (new): reads the five `loso_seed{N}.json` files and returns exit 0 iff `mean(PLCC_i) ≥ 0.95` AND `max - min ≤ 0.005`.
  - `ai/AGENTS.md`: appended an "Ensemble registry invariant" paragraph under the existing `fr_regressor_v2_ensemble_v1` section.
  - `docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md` (new), `docs/research/0075-fr-regressor-v2-ensemble-prod-flip.md` (new), `changelog.d/added/fr-regressor-v2-ensemble-prod-flip.md` (new).
- Rebase invariant: the production ship gate has two parts, both evaluated over five seeds: `mean_i(PLCC_i) ≥ 0.95` AND `max_i(PLCC_i) - min_i(PLCC_i) ≤ 0.005`.
  - The spread bound is load-bearing. Without it, the gate silently admits a configuration in which one outlier seed carries the mean past the threshold, which invalidates the ensemble's predictive-distribution semantics.
  - Both thresholds live in `scripts/ci/ensemble_prod_gate.py`; do not weaken either without superseding ADR-0303.
- Rebase invariant (registry): the five `fr_regressor_v2_ensemble_v1_seed{0..4}` registry rows are `smoke: true` on master at this commit. Flipping them to `false` is the job of the follow-up flip PR, gated on a real-corpus LOSO run plus the CI gate. Do not flip seed rows while resolving a rebase merge conflict.
- Re-test on rebase:

```shell
python3 -c "import ast; ast.parse(open('ai/scripts/train_fr_regressor_v2_ensemble_loso.py').read())"
python3 -c "import ast; ast.parse(open('scripts/ci/ensemble_prod_gate.py').read())"
python ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
python scripts/ci/ensemble_prod_gate.py --help
```

- Upstream source: zero. `fr_regressor_v2` and its ensemble are fork-introduced (parent ADR-0272 / ADR-0279).
- On upstream sync: zero interaction.
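The two-part gate above reduces to one predicate over the five per-seed PLCC values. This is a minimal sketch: the thresholds come from the entry, but `ensemble_gate` is a hypothetical name. It omits reading the `loso_seed{N}.json` files and mapping the verdict to an exit code, which `scripts/ci/ensemble_prod_gate.py` handles in its own way. The PLCC values in the example are illustrative, not measured.

```python
from statistics import mean
from typing import Sequence

MEAN_FLOOR = 0.95       # mean_i(PLCC_i) must reach this
SPREAD_CEILING = 0.005  # max_i(PLCC_i) - min_i(PLCC_i) must not exceed this


def ensemble_gate(plcc: Sequence[float], n_seeds: int = 5) -> bool:
    """Two-part ship gate: high mean AND tight spread across seeds."""
    if len(plcc) != n_seeds:
        raise ValueError(f"expected {n_seeds} per-seed PLCC values, got {len(plcc)}")
    return mean(plcc) >= MEAN_FLOOR and (max(plcc) - min(plcc)) <= SPREAD_CEILING


print(ensemble_gate([0.961, 0.962, 0.960, 0.963, 0.961]))  # True
print(ensemble_gate([0.995, 0.950, 0.951, 0.950, 0.952]))  # False: spread too wide
```

The second call shows what the spread bound protects against. Its mean (0.9596) clears 0.95 only because one seed is an outlier, so a mean-only gate would have promoted it.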
0313 — CI required-checks aggregator (2026-05-05)
- What changed: fork-local CI policy. The new `.github/workflows/required-aggregator.yml` is a single workflow that runs on every non-draft PR.
  - It verifies that each of the 23 named required checks reported `success`, `skipped`, or `neutral`, or did not appear at all (the path-filter-rejection semantics).
  - The aggregator becomes the single branch-protection required check, replacing the 23-name list from ADR-0037.
- Touches: `.github/workflows/required-aggregator.yml` (new), `docs/adr/0313-ci-required-checks-aggregator.md` (new), `changelog.d/added/ci-required-checks-aggregator.md` (new), `docs/adr/README.md` (+1 row), `docs/adr/_index_fragments/_order.txt` (+1 line, plus a new fragment file).
- Upstream source: zero. Branch-protection policy is fork-only.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Manual operator step at adoption. This uses `PATCH`, not `PUT`; the original ADR-0313 body had the wrong verb and has been corrected.

```shell
echo '{"strict": false, "contexts": ["Required Checks Aggregator"]}' | \
  gh api -X PATCH "repos/VMAFx/vmafx/branches/master/protection/required_status_checks" --input -
```

- Re-test on rebase:

```shell
# YAML parses (a syntax check, not a full lint)
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/required-aggregator.yml'))"
```
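The aggregator's pass/fail rule can be modelled as a small function over check-run conclusions. This is a minimal sketch, not the workflow's actual YAML or script. `blocking_checks` and the check names are hypothetical. The conclusion strings are GitHub check-run conclusions. Treating a still-running check (`None`) as blocking is an assumption of the sketch.

```python
from typing import Dict, Iterable, List, Optional

PASSING = {"success", "skipped", "neutral"}


def blocking_checks(required: Iterable[str],
                    conclusions: Dict[str, Optional[str]]) -> List[str]:
    """Required checks that block the PR; an empty list means the aggregator passes.

    A required check that never reported (absent from `conclusions`) is treated
    as rejected by a path filter and does not block. A check that reported but
    has no conclusion yet (None, still running) does block.
    """
    return [
        name for name in required
        if name in conclusions and conclusions[name] not in PASSING
    ]


required = ["build", "lint", "docs"]
print(blocking_checks(required, {"build": "success", "lint": "skipped"}))  # []
print(blocking_checks(required, {"build": "failure", "docs": None}))        # ['build', 'docs']
```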
0305 — encoder knob-space Pareto analysis (2026-05-05)
- What changed: fork-local. A new analysis scaffold for the 12,636-cell encoder knob sweep that backs the `tools/vmaf-tune/codec_adapters/*` recipe defaults. New files:
  - `ai/scripts/analyze_knob_sweep.py`: a per-`(source, codec, rc_mode)` Pareto hull on `(bitrate_kbps, vmaf_score)`, with `encode_time_ms` as the tiebreaker, plus a regression-detection check.
  - `ai/tests/test_knob_sweep_analysis.py`: a synthetic 20-row JSONL fixture.

  For methodology and scaffolded findings, see ADR-0305 and Research-0077. Companion to Research-0063.
- Touches: nothing upstream-shared. Everything sits under `ai/` (fork-local since the tiny-AI training surface, ADR-0021) and `docs/{adr,research}/` (fork ledger).
- Upstream source: zero. The 12,636-cell sweep, the Pareto scaffold, and the regression-detection invariant are fork-introduced; Netflix/vmaf master ships no encoder knob-sweep tooling.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Invariant for future codec adapter PRs: per the `ai/AGENTS.md` knob-sweep corpus invariant (ADR-0305), a recipe that regresses against the bare encoder at matched bitrate, within the same `(source, codec, rc_mode)` slice, MUST NOT ship as an adapter default.
  - New adapter PRs cite the per-slice hull row from `reports/summary.md` (or "no hull entry yet — bare default") in their PR description.
  - The `comprehensive.jsonl` sweep file is generated locally and lives under `runs/phase_a/full_grid/`; it is gitignored and never committed.
- Re-test on rebase:
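The following illustrates the "regresses vs the bare encoder at matched bitrate" rule and is not part of the re-test recipe. It is a minimal sketch, assuming that "matched bitrate" means linearly interpolating the bare encoder's rate-quality curve at each recipe point's bitrate. That assumption, the optional `tolerance`, and both function names are hypothetical; `analyze_knob_sweep.py` may match bitrates differently. The curves are synthetic.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Curve = List[Tuple[float, float]]  # (bitrate_kbps, vmaf), sorted by bitrate


def vmaf_at_bitrate(curve: Curve, kbps: float) -> Optional[float]:
    """Linearly interpolate VMAF on a rate curve; None outside its bitrate span."""
    rates = [r for r, _ in curve]
    if not curve or kbps < rates[0] or kbps > rates[-1]:
        return None
    i = bisect_left(rates, kbps)
    if rates[i] == kbps:
        return curve[i][1]
    (r0, v0), (r1, v1) = curve[i - 1], curve[i]
    return v0 + (v1 - v0) * (kbps - r0) / (r1 - r0)


def regresses(recipe: Curve, bare: Curve, tolerance: float = 0.0) -> bool:
    """True if any recipe point scores below the bare encoder at the same bitrate."""
    for kbps, vmaf in recipe:
        ref = vmaf_at_bitrate(bare, kbps)
        if ref is not None and vmaf < ref - tolerance:
            return True
    return False


bare = [(1000.0, 80.0), (2000.0, 90.0)]
print(vmaf_at_bitrate(bare, 1500.0))         # 85.0
print(regresses([(1500.0, 84.0)], bare))     # True
```

Points outside the bare curve's bitrate span are skipped rather than extrapolated. This keeps the check conservative at the ends of the sweep.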
0302 — ENCODER_VOCAB v3 schema expansion (ADR-0302)
- Touches: `ai/scripts/train_fr_regressor_v2.py`. It adds an `ENCODER_VOCAB_V3` parallel constant and does not modify the live `ENCODER_VOCAB` or `ENCODER_VOCAB_VERSION`.
- Invariant: `ENCODER_VOCAB` is append-only and order-stable (per ADR-0235). The v3 scaffold preserves the v2 slot ordering verbatim:
  - slots 0..12 are bit-identical to the v2 vocab;
  - slots 13/14/15 append `libsvtav1`, `h264_videotoolbox`, `hevc_videotoolbox`.

  The live `ENCODER_VOCAB_VERSION = 2` remains the source of truth until the follow-up retrain PR clears the LOSO PLCC ship gate.
- Upstream interaction: zero. `ai/scripts/train_fr_regressor_v2.py` is fork-introduced (ADR-0272) and has no upstream counterpart.
- Re-test on rebase:

```shell
python3 -c "
import importlib.util, pathlib
spec = importlib.util.spec_from_file_location(
    't', pathlib.Path('ai/scripts/train_fr_regressor_v2.py')
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
assert len(m.ENCODER_VOCAB_V3) == 16
assert m.ENCODER_VOCAB_VERSION == 2
print('OK')
"
```
0304 — vmaf-tune fast-path prod wiring (ADR-0304)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/fast.py`: replaces the ADR-0276 scaffold's `NotImplementedError` paths with concrete Optuna TPE + v2 proxy + GPU verify wiring.
  - New module `tools/vmaf-tune/src/vmaftune/proxy.py`: the centralised seam for `fr_regressor_v2` ONNX inference.
  - Expanded `tools/vmaf-tune/tests/test_fast.py`.
  - Doc side: ADR-0304, Research-0076, and a `tools/vmaf-tune/AGENTS.md` invariant note.
- Upstream source: zero. `tools/vmaf-tune/` and `model/tiny/fr_regressor_v2.onnx` are both fork-introduced (ADR-0237 / ADR-0352).
- Invariants:
  - The production proxy is always `fr_regressor_v2`; no smoke models appear in the production path.
  - A single GPU verify pass at the end of a recommend run is mandatory; the proxy alone never wins.
  - The `vmaftune.proxy.run_proxy` helper is the single seam every fast-path consumer goes through. Future probabilistic-head / ensemble migrations land in that one module.
  - The ENCODER_VOCAB v2 one-hot ordering is frozen by ADR-0352 and pinned in `proxy.ENCODER_VOCAB_V2`. Keep it in sync with `ai/scripts/train_fr_regressor_v2.py`; drift raises `ProxyError` at inference time, before bad predictions ship.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Re-test on rebase:
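The following shows why the encoder vocab must stay append-only and frozen (see the ADR-0302 and ADR-0304 invariants), and is not part of the re-test recipe. It is a minimal sketch: `check_append_only` and `one_hot` are hypothetical helpers, not the fork's `proxy.py` code. The 13 v2 slot names are placeholders; the real names live in `ai/scripts/train_fr_regressor_v2.py`. Because a one-hot vector encodes an encoder purely by position, any reorder silently feeds the model a different encoder.

```python
from typing import List, Sequence


def check_append_only(old: Sequence[str], new: Sequence[str]) -> None:
    """Raise ValueError unless `new` keeps every slot of `old` and only appends."""
    if len(new) < len(old):
        raise ValueError(f"vocab shrank from {len(old)} to {len(new)} slots")
    for slot, (before, after) in enumerate(zip(old, new)):
        if before != after:
            raise ValueError(f"slot {slot} changed: {before!r} -> {after!r}")
    if len(set(new)) != len(new):
        raise ValueError("duplicate encoder names")


def one_hot(vocab: Sequence[str], encoder: str) -> List[float]:
    """Position-based encoding: the slot index is the only thing the model sees."""
    vec = [0.0] * len(vocab)
    vec[list(vocab).index(encoder)] = 1.0
    return vec


# Placeholder names for the 13 v2 slots.
V2 = tuple(f"enc_{i:02d}" for i in range(13))
V3 = V2 + ("libsvtav1", "h264_videotoolbox", "hevc_videotoolbox")
check_append_only(V2, V3)
print(len(V3), one_hot(V3, "libsvtav1").index(1.0))  # 16 13
```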
0307 — vmaf-tune ladder default sampler wiring (ADR-0307)
- What changed: fork-local tooling. `tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler` no longer raises `NotImplementedError`.
  - It composes `corpus.iter_rows` (Phase A encode + score) with `recommend.pick_target_vmaf` (smallest CRF clearing the target VMAF).
  - It sweeps `DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the adapter's mid-range preset.
  - The module-level docstring and the AGENTS.md invariant are updated.
  - New tests in `tools/vmaf-tune/tests/test_ladder.py` stub `iter_rows` via `monkeypatch.setattr`, so no live ffmpeg / vmaf binaries are needed.
- Upstream source: fork-local. The whole `tools/vmaf-tune/` tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation / ladder surface.
- On upstream sync: zero interaction. `tools/vmaf-tune/` is not mirrored from upstream.
- Rebase invariant: the 5-point sweep `(18, 23, 28, 33, 38)` is the load-bearing default. Downstream Phase E callers size their wall-time budget against five encodes per `(resolution, target_vmaf)` cell. Do not widen or narrow it without an ADR-0307 follow-up. The `SamplerFn` seam stays open; callers needing finer grids pass an explicit `sampler=`.
- Re-test on rebase:
0309 — fr_regressor_v2 ensemble real-corpus retrain harness (ADR-0309)
- ADR: ADR-0309
- Touches: entirely fork-local.
  - `ai/scripts/run_ensemble_v2_real_corpus_loso.sh` (new): a Bash wrapper that loops the five seeds over the existing `train_fr_regressor_v2_ensemble_loso.py` against `.workingdir2/netflix/`.
  - `ai/scripts/validate_ensemble_seeds.py` (new): calls the ADR-0303 gate and writes `PROMOTE.json` / `HOLD.json` with a corpus sha256 snapshot.
  - `ai/tests/test_validate_ensemble_seeds.py` (new): 7 tests, with synthetic JSON fixtures for both verdict paths.
  - `ai/AGENTS.md`: appended a "Registry-flip is a separate PR (ADR-0309)" paragraph under the existing `fr_regressor_v2_ensemble_v1` section.
  - `docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md`, `docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md`, `docs/ai/ensemble-v2-real-corpus-retrain-runbook.md` (all new).
- Rebase invariant: the harness is decoupled from the registry mutation.
  - Neither the wrapper nor the validator touches `model/tiny/registry.json`. The registry flip is a separate follow-up PR, gated on a passing `PROMOTE.json`.
  - Auto-flipping on PROMOTE was rejected in ADR-0309's alternatives matrix. Rebase-time mutation of shipped registry rows is exactly the foot-gun this invariant exists to prevent.
- Re-test on rebase:

```shell
python -m pytest ai/tests/test_validate_ensemble_seeds.py -v
python ai/scripts/validate_ensemble_seeds.py --help
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
```

- Upstream source: zero.
- On upstream sync: zero interaction.
0310 — BVI-DVC corpus ingestion for fr_regressor_v2 (ADR-0310)
- Touches:
  - `ai/scripts/bvi_dvc_to_corpus_jsonl.py` (new fork-only adapter).
  - `ai/scripts/merge_corpora.py` (new fork-only shard merger).
  - `ai/tests/test_merge_corpora.py` (new).
  - `docs/ai/bvi-dvc-corpus-ingestion.md` (new).
  - `docs/adr/0310-bvi-dvc-corpus-ingestion.md` (new).
  - `docs/research/0082-bvi-dvc-corpus-feasibility.md` (new).
  - `ai/AGENTS.md` (BVI-DVC invariant note).
- Invariants:
  - The BVI-DVC archive and any extracted artefacts (parquet, cached libvmaf JSON, JSONL corpus shard) are research-only and stay local. Only the derived `fr_regressor_v2_*.onnx` weights ship.
  - The merge utility validates every row against the canonical `vmaftune.CORPUS_ROW_KEYS` tuple; the schema is the merge contract.
  - The re-shape is a pure transform on the cached libvmaf JSON; no ffmpeg / vmaf binary is invoked.
  - The `(src_sha256, encoder, preset, crf)` natural key is load-bearing for de-duplication across mirrors and re-encodes.
- Upstream interaction: none. `ai/` is fork-introduced; BVI-DVC is not part of Netflix/vmaf upstream.
- Re-test on rebase:
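The following illustrates the merge contract above and is not part of the re-test recipe. It is a minimal sketch: schema validation on every row plus de-duplication on the `(src_sha256, encoder, preset, crf)` natural key. `merge_shards` is hypothetical, not `merge_corpora.py`. `ROW_KEYS` is a stand-in for `vmaftune.CORPUS_ROW_KEYS`. The "first occurrence wins" policy is assumed, and the sample rows (digest `"aa"`, VMAF values) are made up.

```python
import json
from typing import Dict, Iterable, List, Sequence, Tuple

NATURAL_KEY: Tuple[str, ...] = ("src_sha256", "encoder", "preset", "crf")


def merge_shards(shards: Iterable[Iterable[str]], row_keys: Sequence[str]) -> List[Dict]:
    """Merge JSONL shards: reject rows missing schema keys, de-duplicate on the natural key."""
    seen = set()
    merged: List[Dict] = []
    for shard in shards:
        for line in shard:
            if not line.strip():
                continue
            row = json.loads(line)
            missing = [k for k in row_keys if k not in row]
            if missing:
                raise ValueError(f"row missing schema keys: {missing}")
            key = tuple(row[k] for k in NATURAL_KEY)
            if key in seen:
                continue  # first occurrence wins (a policy assumed here)
            seen.add(key)
            merged.append(row)
    return merged


ROW_KEYS = NATURAL_KEY + ("vmaf",)  # stand-in for vmaftune.CORPUS_ROW_KEYS
shard_a = ['{"src_sha256": "aa", "encoder": "libx264", "preset": "medium", "crf": 23, "vmaf": 91.2}']
shard_b = [
    '{"src_sha256": "aa", "encoder": "libx264", "preset": "medium", "crf": 23, "vmaf": 91.3}',
    '{"src_sha256": "aa", "encoder": "libx264", "preset": "medium", "crf": 28, "vmaf": 86.0}',
]
rows = merge_shards([shard_a, shard_b], ROW_KEYS)
print(len(rows), rows[0]["vmaf"])  # 2 91.2
```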
ADR-0312 — ffmpeg-patches/ vmaf-tune integration (2026-05-05)
- Files: `ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch`, `ffmpeg-patches/0008-add-libvmaf_tune-filter.patch`, `ffmpeg-patches/0009-pass-autotune-cli-glue.patch`, `ffmpeg-patches/series.txt`, `ffmpeg-patches/README.md`.
- Rebase invariant: patches `0007`–`0009` apply on top of the cumulative state left by patches `0001`–`0006` applied to pristine `n8.1`. Running `git apply --check` on each patch in isolation is the wrong gate; use the series-replay command in CLAUDE.md §12 r14 instead.
- vmaf-tune patch invariant: the qpfile parser at `libavcodec/qpfile_parser.{c,h}` is shared by all three encoder adapters in patch 0007.
  - Future encoders that grow a `-qpfile` AVOption inherit the parser; do not fork it.
  - When the qpfile output format of `tools/vmaf-tune/src/vmaftune/saliency.py` changes (new column, different frame-type alphabet, …), patch 0007 must change in the same PR (CLAUDE.md §12 r14).
- vf_libvmaf_tune full-scoring promotion (2026-05-06): patch 0008 originally shipped as a scaffold, per ADR-0312's deferred-alternatives column. That scaffold used linear CRF↔VMAF interpolation and did no libvmaf scoring. The filter now mirrors the CPU framesync pipeline of `vf_libvmaf.c` end to end:
  - `vmaf_init` + `vmaf_model_load` + `vmaf_use_features_from_model` in `init()`;
  - per frame: `vmaf_picture_alloc` + memcpy + `vmaf_read_pictures`;
  - flush + `vmaf_score_pooled(MEAN)` in `uninit()`.

  The CRF recommendation remains a piecewise-linear projection from the observed VMAF; per-clip Optuna TPE search stays in `tools/vmaf-tune/src/vmaftune/recommend.py`.

  On the rebase side, the new filter still depends only on libvmaf's CPU C API (`vmaf_init`, `vmaf_model_load`, `vmaf_use_features_from_model`, `vmaf_read_pictures`, `vmaf_score_pooled`, `vmaf_close`, `vmaf_picture_alloc` / `vmaf_picture_unref`). It needs no symbols beyond what `vf_libvmaf.c` already requires, so any future libvmaf rebase that passes the existing libvmaf filter passes this one too. The ADR-0312 sub-decision is retired.
- n7+ API migration (2026-05-06): patch 0008 originally referenced the removed `AVFilterLink::frame_rate` member directly (n6-era API).
  - In n7+ that field moved off `AVFilterLink` onto a new `FilterLink` struct, accessed via `ff_filter_link(AVFilterLink *)` from `libavfilter/filters.h`.
  - Patch 0008 now uses `ff_filter_link(outlink)->frame_rate = ff_filter_link(mainlink)->frame_rate;` in `config_output()`. This mirrors patches 0005/0006, which were already written against the post-n7 API.
  - The bug slipped through CI because the FFmpeg-Vulkan lane only builds `vf_libvmaf.o`, not `vf_libvmaf_tune.c`. The full SYCL lane catches it now that PR #415 added `ffmpeg-patches/**` to the integration workflow's path filter.
  - Discovery: PR #415 / ADR-0317.
- Upstream source: zero. The vmaf-tune integration is fork-introduced; pure upstream syncs are unaffected.
- On upstream sync: zero interaction with libvmaf master. FFmpeg-side rebases, which happen when `FFMPEG_SHA` in `ffmpeg-patches/test/build-and-run.sh` moves from n8.1 to a later n8.x, are tracked separately under each refresh ADR (e.g., ADR-0277 for the 2026-05-04 refresh).
- Re-test on rebase (run from the repository root; `$PWD` keeps the patch paths valid after `git -C` changes directory):

```shell
git -C /path/to/ffmpeg-8 reset --hard n8.1
for p in ffmpeg-patches/000*-*.patch; do
  git -C /path/to/ffmpeg-8 am --3way "$PWD/$p" || break
done
# Build smoke. Without libvmaf_dnn (libvmaf disabled), patches 0001–0006 are
# skipped. With libvmaf_dnn available:
cd /path/to/ffmpeg-8 && ./configure --enable-libvmaf --enable-libx264 --enable-libsvtav1 --enable-libaom --enable-gpl
make -j"$(nproc)" ffmpeg
./ffmpeg -hide_banner -h encoder=libx264 2>&1 | grep -i qpfile
```
- 2026-05-06 update — patch 0007 SVT-AV1 ROI bridge promoted from scaffold to full impl: the libsvtav1 hunk now sets enc_params.enable_roi_map = true, builds one SvtAv1RoiMapEvt per qpfile frame upfront in eb_enc_init (per-MB qp_offsets averaged into a per-64×64-SB b64_seg_map of up to 8 segment QPs; uniform binning when the value span exceeds the segment budget), and attaches each event as a ROI_MAP_EVENT priv-data node from eb_send_frame() with node->size = sizeof(SvtAv1RoiMapEvt*) (the validation contract enforced by SVT-AV1's resource_coordination_process.c). Lifetime invariant: events + maps live for the entire encode session because SVT-AV1 reads ROI_MAP_EVENT data via shallow-copied pointers on async pipeline threads (per enc_handle.c::copy_private_data_list); eb_enc_close frees them. Wiring is gated on SVT_AV1_CHECK_VERSION(1, 6, 0); older SVT-AV1 builds keep the log-and-continue fallback. libaom remains scaffold-only; its AOME_SET_ROI_MAP bridge stays a separate follow-up. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
- 2026-05-06 update — patch 0007 libaom-av1 ROI bridge promoted from scaffold to full impl: the libaom-av1 hunk now caches the parsed VmafTuneQpFile in AOMContext and allocates a segment-id map at libaom's mode-info grid (ALIGN_POWER_OF_TWO(dim, 8) >> 2, since av1/common/enums.h::MI_SIZE == 4). On every encoded frame it picks up to 8 segment QPs from the per-frame qp_offset value range (uniform linear binning when the span exceeds AOM_MAX_SEGMENTS == 8), paints the per-mi segment map by expanding each per-16×16-MB qp_offset into a 4×4 block of mi cells, and issues aom_codec_control(&ctx->encoder, AOME_SET_ROI_MAP, &roi_map). Lifetime invariant: libaom deep-copies the segment map and delta_q[] table on every control call (per the av1/encoder/encoder.c::av1_set_roi_map memcpy), so a single buffer is reused across frames and freed in aom_free(). The qpfile is also freed there. Trade-off: the 8-segment cap rounds nearby qp_offsets together when the saliency model emits more than 8 distinct values per frame; finer granularity requires vmaf-tune corpus instead. This retires the libaom-av1 deferral noted under ADR-0312; both AV1 encoder hooks (libsvtav1 and libaom-av1) are now full-impl. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
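The libaom-av1 segment-map painting described above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the patch's C code: the helper names are hypothetical, the mi grid uses the ALIGN_POWER_OF_TWO(dim, 8) >> 2 formula exactly as quoted, each segment's delta_q is taken as the centre of its equal-width bin (the note does not say how representative QPs are chosen), and edge macroblocks are simply extended into the aligned padding.

```python
from __future__ import annotations

MAX_SEGMENTS = 8  # AOM_MAX_SEGMENTS, as quoted in the note


def align_pow2(value: int, n: int) -> int:
    """Round value up to a multiple of 2**n (like ALIGN_POWER_OF_TWO)."""
    return (value + (1 << n) - 1) & ~((1 << n) - 1)


def mi_grid(width: int, height: int) -> tuple[int, int]:
    """(rows, cols) of 4x4 mode-info cells, using the formula quoted above."""
    return align_pow2(height, 8) >> 2, align_pow2(width, 8) >> 2


def bin_offsets(offsets: list[int]) -> tuple[list[int], dict[int, int]]:
    """Map distinct qp_offsets to at most MAX_SEGMENTS segment ids.

    Returns (delta_q per segment, {qp_offset: segment_id}).
    """
    distinct = sorted(set(offsets))
    if len(distinct) <= MAX_SEGMENTS:
        # Few enough values: one segment per value, nothing gets rounded.
        return distinct, {v: i for i, v in enumerate(distinct)}
    lo, span = distinct[0], distinct[-1] - distinct[0]
    # Uniform linear binning of [lo, hi] into MAX_SEGMENTS equal-width bins.
    seg_of = {v: min((v - lo) * MAX_SEGMENTS // span, MAX_SEGMENTS - 1) for v in distinct}
    # Hypothetical choice: each segment's delta_q is the centre of its bin.
    delta_q = [round(lo + (i + 0.5) * span / MAX_SEGMENTS) for i in range(MAX_SEGMENTS)]
    return delta_q, seg_of


def paint_segment_map(mb_offsets, mb_rows, mb_cols, width, height):
    """Expand per-16x16-MB qp_offsets (row-major) into a per-mi segment map."""
    mi_rows, mi_cols = mi_grid(width, height)
    delta_q, seg_of = bin_offsets(mb_offsets)
    seg_map = [0] * (mi_rows * mi_cols)
    for r in range(mi_rows):
        mb_r = min(r // 4, mb_rows - 1)  # one 16x16 MB spans 4 mi rows...
        for c in range(mi_cols):
            mb_c = min(c // 4, mb_cols - 1)  # ...and 4 mi columns
            seg_map[r * mi_cols + c] = seg_of[mb_offsets[mb_r * mb_cols + mb_c]]
    return seg_map, delta_q, (mi_rows, mi_cols)
```

Because libaom copies the map on every AOME_SET_ROI_MAP call, a C implementation can keep one buffer shaped like seg_map and overwrite it per frame, which is the reuse the lifetime invariant relies on.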
0315 — Vendor-neutral VVC encode strategy (ADR-0315 / Research-0085)¶
- ADR: ADR-0315
- Digest: Research-0085
- Touches: docs-only.
docs/research/0085-vendor-neutral-vvc-encode-landscape.md (new), docs/adr/0315-vendor-neutral-vvc-encode-strategy.md (new), docs/adr/_index_fragments/0315-vendor-neutral-vvc-encode-strategy.md (new), docs/adr/_index_fragments/_order.txt (one-line append), changelog.d/added/research-0085-vendor-neutral-vvc-encode.md (new), docs/rebase-notes.md (this entry).
- Rebase invariant: none. The research digest and ADR are pure surveys with no code dependencies; nothing in the fork's source tree references them in a way that breaks on upstream rebase.
- Upstream source: zero. VVC encode strategy is a fork-local decision; upstream Netflix/vmaf has no codec adapter or encode-automation surface.
- On upstream sync: zero interaction. Pure docs.
- Re-test on rebase:
- 2026-05-06 follow-up (Research-0085 verification pass):
docs/research/0085-vendor-neutral-vvc-encode-landscape.md flipped from Status: SKELETON to Status: Active. Most [UNVERIFIED] claims are now backed by primary-source URLs (NVIDIA SDK 13.0 docs, AMD AMF GitHub, Intel oneVPL GitHub + mfxstructures.h + CHANGELOG.md, Khronos registry, Phoronix Mesa/RADV coverage, VVenC issue tracker, ZLUDA repo).
- ADR-0315's ## Context and ## Alternatives considered sections were refreshed with the verified data points. Status stays Proposed. The [UNVERIFIED] count in the digest dropped 25 → 10; the remaining items are legitimate gaps (NN-VC quality lift, vvenc per-kernel profile, HHI's non-public roadmap).
- No code touched. No rebase impact beyond the existing docs-only posture.
0316 — cli_parse.c error() long-only-option fix (ADR-0316)¶
- ADR: ADR-0316 (follow-up to ADR-0311).
- Digest: none — bug-fix; fix shape fits in the ADR/commit body.
- Touches:
core/tools/cli_parse.c (3 lines; call-site arg change at the ARG_THREADS/ARG_SUBSAMPLE/ARG_CPUMASK handlers), core/test/fuzz/fuzz_cli_parse.c (removed the known_assert_in_input early-reject filter), core/test/fuzz/cli_parse_corpus/cli_threads_abbrev_assert.argv (promoted from cli_parse_known_crashes/), core/test/test_cli_parse_long_only_args.c (new fork()-based regression test), core/test/meson.build (new test wiring, gated off Windows alongside test_y4m_411_oob), core/tools/AGENTS.md (added a long-only-options invariant note next to the existing cli_parse.c rules).
- Rebase invariant: load-bearing. cli_parse.c is upstream-mirror with fork additions; the three handlers carry the fork-local shape of passing the ARG_* enum value (not 't'/'s'/'c') to parse_unsigned(). If an upstream sync re-introduces the original short-option char shape, the assert returns and the parked-then-promoted reproducer (cli_parse_corpus/cli_threads_abbrev_assert.argv) will surface it in the next nightly fuzz run.
- Upstream source: the bug shape exists in Netflix/vmaf master too (long-only options were added upstream with the same short-option-char placeholder). When the fork ports an upstream fix that overlaps these handlers, prefer the parse_unsigned(optarg, ARG_*, argv[0]) form already on the fork.
- On upstream sync: re-apply the three-line change in cli_parse.c if upstream resets the call-site args. The unit test is fork-local and stays.
- Re-test on rebase:
meson setup core/build libvmaf -Denable_tests=true \
-Denable_cuda=false -Denable_sycl=false
ninja -C core/build test/test_cli_parse_long_only_args
meson test -C core/build test_cli_parse_long_only_args -v
ADR-0317 — CI flake fix: doc-only PR path-filter (2026-05-06)¶
- Touched files:
.github/workflows/docker-image.yml (added a paths: filter on both the push: and pull_request: triggers), .github/workflows/ffmpeg-integration.yml (added a paths: filter on both the push: and pull_request: triggers; covers all four matrix lanes: gcc, clang, SYCL, Vulkan), docs/adr/0317-ci-doc-only-pr-flake-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ci-doc-only-pr-flakes.md.
- Rebase invariant: not load-bearing. Workflow-only change. Both files are fork-local CI; upstream Netflix/vmaf does not ship a Docker workflow or an FFmpeg-integration matrix in this shape, so rebase conflicts are unlikely. If a future upstream sync introduces an overlapping docker-image.yml or FFmpeg matrix, prefer the fork's path-filtered form; the rationale (ADR-0313 aggregator posture, doc-only-PR runner-time burn) is fork-specific.
- Upstream source: none (fork-local CI workflows).
- On upstream sync: no action required. If reviewers later add new build inputs (e.g. a top-level docker-compose.yml or a new ffmpeg-patches/*.txt config file), extend the paths: lists in the same PR that adds the input.
- Follow-up not in this ADR: patch ffmpeg-patches/0008-add-libvmaf_tune-filter.patch line 256 (outlink->frame_rate = mainlink->frame_rate;) needs to migrate to the ff_filter_link() accessor introduced in FFmpeg n7+, matching the pattern already in patches 0005 / 0006. Tracked separately; the path filter does not hide it (any libvmaf/ or ffmpeg-patches/ PR will still trip the SYCL lane).
- Re-test on rebase:
python3 -c "import yaml; \
yaml.safe_load(open('.github/workflows/docker-image.yml')); \
yaml.safe_load(open('.github/workflows/ffmpeg-integration.yml')); \
print('OK')"
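The YAML-parse smoke above only proves the files load. A slightly stronger check, sketched here as an illustration (the function name is hypothetical, not an existing repo script), confirms that both push and pull_request actually carry a paths: list. One PyYAML quirk matters: it follows YAML 1.1, so a bare on: key loads as the boolean True rather than the string "on".

```python
def missing_path_filters(workflow: dict) -> list[str]:
    """Return the triggers among push / pull_request that lack a paths: list.

    PyYAML implements YAML 1.1, where a bare `on` key loads as the boolean
    True, so look the trigger map up under both spellings.
    """
    triggers = workflow.get("on", workflow.get(True))
    if not isinstance(triggers, dict):
        # `on: push` or `on: [push, pull_request]` carry no per-trigger filters.
        return ["push", "pull_request"]
    missing = []
    for name in ("push", "pull_request"):
        cfg = triggers.get(name)
        if not isinstance(cfg, dict) or not cfg.get("paths"):
            missing.append(name)
    return missing
```

Fed the result of yaml.safe_load(...) for docker-image.yml and ffmpeg-integration.yml, an empty list means both triggers are filtered as ADR-0317 requires.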
0319 — fr_regressor_v2 ensemble LOSO trainer — real loader + per-fold training (ADR-0319)¶
- Touches:
ai/scripts/train_fr_regressor_v2_ensemble_loso.py (real _load_corpus + _train_one_seed bodies), ai/scripts/run_ensemble_v2_real_corpus_loso.sh (wrapper argv fix), docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (Step 0 corpus-generation section), ai/AGENTS.md (canonical-6 schema invariant note), ai/tests/test_train_fr_regressor_v2_ensemble_loso_*.py (loader + train schema tests). Closes the deferrals tracked in rebase-notes §0303 + §0309.
- Upstream source: none (fork-local ML training infrastructure). Netflix/vmaf upstream has no fr_regressor_v2 surface, no LOSO trainer, and no canonical-6 corpus tooling.
- Invariant: the trainer's _load_corpus accepts the canonical-6 JSONL schema emitted by scripts/dev/hw_encoder_corpus.py bit-for-bit; the required keys per row are (src, encoder, cq, frame_index, vmaf, adm2, vif_scale0..3, motion2). The codec block layout is a 12-slot ENCODER_VOCAB v2 one-hot + a constant preset_norm = 0.5 + crf_norm = (cq - cq_min) / (cq_max - cq_min). Schema changes require an ENCODER_VOCAB_VERSION bump and a full ensemble retrain per the existing closed-vocabulary rule (ADR-0235 / ADR-0352). The fold-level StandardScaler is fit on the training rows only; leaking the held-out source's distribution into the scaler would silently inflate per-fold PLCC.
- On upstream sync: no action required. If upstream Netflix/vmaf ever adds a competing LOSO trainer under python/vmaf/, do NOT merge them; keep the fork's training stack under ai/ per the AGENTS.md scope rule.
- Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v2_ensemble_loso_loader.py \
ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py -v
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
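The codec-block layout and the fold-level scaler rule can be illustrated with a standard-library sketch. It is not the trainer's code: the function names are hypothetical, cq_min/cq_max are passed in because the note does not say where they come from, and the scaler mimics scikit-learn's StandardScaler (population standard deviation, zero-variance columns scaled by 1.0). The point it shows is that fit_scaler only ever sees rows from the training sources of each leave-one-source-out fold.

```python
from statistics import fmean, pstdev

CANONICAL_6 = ("adm2", "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3", "motion2")


def codec_block(encoder, cq, vocab, cq_min, cq_max):
    """One-hot over the encoder vocabulary, then preset_norm and crf_norm."""
    one_hot = [1.0 if name == encoder else 0.0 for name in vocab]
    crf_norm = (cq - cq_min) / (cq_max - cq_min)
    return one_hot + [0.5, crf_norm]  # preset_norm is the constant 0.5


def fit_scaler(matrix):
    """Per-column mean / population std, like StandardScaler (zero std -> 1.0)."""
    cols = list(zip(*matrix))
    return [fmean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(matrix, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in matrix]


def loso_splits(rows, keys=CANONICAL_6):
    """Yield (held_out_src, train_X, test_X) with the scaler fit on train only."""
    for held_out in sorted({r["src"] for r in rows}):
        train = [[r[k] for k in keys] for r in rows if r["src"] != held_out]
        test = [[r[k] for k in keys] for r in rows if r["src"] == held_out]
        means, stds = fit_scaler(train)  # never sees the held-out source
        yield held_out, scale(train, means, stds), scale(test, means, stds)
```

Fitting the scaler on all rows before splitting would pull the held-out source's mean and spread into every fold, which is exactly the PLCC inflation the invariant warns about.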
ADR-0323 — fr_regressor_v3 train + register on ENCODER_VOCAB v3 (2026-05-06)¶
- Scope:
ai/scripts/train_fr_regressor_v3.py (new), ai/tests/test_train_fr_regressor_v3.py (new), model/tiny/fr_regressor_v3.onnx (new, real-weight checkpoint from a 9-fold LOSO gate-pass at mean PLCC 0.9975), model/tiny/fr_regressor_v3.json (new sidecar with encoder_vocab_version: 3 and the full per-fold trace), model/tiny/registry.json (new fr_regressor_v3 row, smoke: false), ai/AGENTS.md (the v3 retrain invariant section gains a "Status" subsection recording the gate result), docs/ai/models/fr_regressor_v3.md (new model card), docs/adr/0323-fr-regressor-v3-train-and-register.md + index row, changelog.d/added/fr-regressor-v3-train-register.md.
- Rebase impact: zero. Fork-local feature; no upstream Netflix/vmaf surface is touched. The 16-slot ENCODER_VOCAB_V3 imported from train_fr_regressor_v2.py was already landed by PR #401 (ADR-0302).
- On upstream sync: no action required. The v3 model ships alongside v2: fr_regressor_v2.onnx and its sidecar are unchanged, and the v3 row is added to the registry in alphabetical order. If a future upstream sync ever lands a competing fr_regressor_v3 model under python/vmaf/, do NOT cross-link them; the fork's training stack lives under ai/.
- Watch out for: the live ENCODER_VOCAB_VERSION in ai/scripts/train_fr_regressor_v2.py stays at 2 (per ADR-0302's invariant). Do not bump it to 3 in this PR or in any downstream port; the in-place promotion of v3 over v2 is a separate "promote v3 to authoritative" PR per ADR-0302's production-flip checklist.
- Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v3.py -v
bash core/test/dnn/test_registry.sh # must report OK: 20+
python -c "import onnx; onnx.checker.check_model(onnx.load('model/tiny/fr_regressor_v3.onnx')); print('OK')"
ADR-0321 — fr_regressor_v2_ensemble_v1 full production flip (2026-05-06)¶
- Scope:
ai/scripts/export_ensemble_v2_seeds.py (new), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.onnx (real full-corpus-trained weights replacing the 3025-byte synthetic scaffold bytes), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.json (new per-seed sidecars), model/tiny/registry.json (sha256 + smoke: false on the five seed rows), ai/AGENTS.md (new invariant: the registry flip is now done; future re-flips require a fresh PROMOTE.json + a re-run of the export driver).
- Rebase impact: zero. This is a fork-local production flip; no upstream Netflix/vmaf surface is touched. The 12-slot ENCODER_VOCAB v2 carried in each sidecar is the same one the LOSO trainer (ADR-0319) bakes into the codec-block layout, so there is no rebase-time vocabulary drift to worry about.
- Watch out for: if a future upstream sync ever introduces a competing fr_regressor_v2_ensemble_* model under python/vmaf/, do NOT cross-link them; the fork's ensemble weights are gated on runs/ensemble_v2_real/PROMOTE.json and are not portable to a different training stack.
- Re-test on rebase:
bash core/test/dnn/test_registry.sh # must report OK: 19
python -c "import onnx; \
[onnx.checker.check_model(onnx.load(f'model/tiny/fr_regressor_v2_ensemble_v1_seed{i}.onnx')) \
for i in range(5)]; print('OK')"
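The ONNX checker validates graph structure but not that the registry still points at these exact bytes. A hedged sketch of that second check follows; the registry layout (a models list of rows with id, sha256 and smoke) matches the ADR-0349 reproducer in this file, while the model_dir / f"{id}.onnx" file naming and the function names are assumptions.

```python
import hashlib
from pathlib import Path

SEED_IDS = [f"fr_regressor_v2_ensemble_v1_seed{i}" for i in range(5)]


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large checkpoints are not read at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def check_flipped_rows(registry: dict, model_dir: Path, ids=SEED_IDS) -> list[str]:
    """Return problems for the given registry ids (an empty list means OK).

    Assumes each id's weights live at model_dir / f"{id}.onnx"; adjust if a
    registry row names its file explicitly.
    """
    rows = {m["id"]: m for m in registry["models"]}
    problems = []
    for model_id in ids:
        row = rows.get(model_id)
        if row is None:
            problems.append(f"{model_id}: missing registry row")
            continue
        if row.get("smoke") is not False:
            problems.append(f"{model_id}: smoke flag is not false")
        onnx_path = model_dir / f"{model_id}.onnx"
        if not onnx_path.is_file():
            problems.append(f"{model_id}: {onnx_path} not found")
        elif sha256_of(onnx_path) != row.get("sha256"):
            problems.append(f"{model_id}: sha256 mismatch")
    return problems
```

Typical use would be check_flipped_rows(json.load(open("model/tiny/registry.json")), Path("model/tiny")), expecting an empty list after a clean rebase.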
ADR-0324 — Ensemble training kit (2026-05-06)¶
- Touches:
tools/ensemble-training-kit/ (new), docs/adr/0324-ensemble-training-kit.md (new), docs/adr/README.md (index row), changelog.d/added/0324-ensemble-training-kit.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the kit assumes the LOSO wrapper hard-codes seeds (0 1 2 3 4). The orchestrator surfaces a warning if --seeds deviates but still hands off to the wrapper. If a future PR parameterises the wrapper's seed list, update both the wrapper and the kit's pass-through logic in lockstep.
- On upstream sync: no action required. The kit lives entirely under tools/ensemble-training-kit/ (a fork-local path) and only invokes other fork-local scripts (ai/scripts/, scripts/dev/, scripts/ci/).
- Re-test on rebase:
bash -n tools/ensemble-training-kit/*.sh
bash tools/ensemble-training-kit/make-distribution-tarball.sh /tmp/kit-test.tar.gz
tar -tzf /tmp/kit-test.tar.gz | grep -q "tools/ensemble-training-kit/run-full-pipeline.sh"
ADR-0335 — Hardware-capability priors (2026-05-08)¶
- Touches:
ai/data/hardware_caps.csv (new), ai/scripts/hardware_caps_loader.py (new), ai/tests/test_hardware_caps.py (new), ai/AGENTS.md (one new bullet under "Rebase-sensitive invariants"), docs/ai/hardware-capability-priors.md (new), docs/research/0088-hardware-capability-priors-2026-05-08.md (new), docs/adr/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/_order.txt (one-line append), CHANGELOG.md (Added bullet under [Unreleased] — lusoris fork). No upstream-shared paths.
- Invariant: the table is prior-only. The schema check in hardware_caps_loader.py rejects benchmark-shaped header columns (fps_*, throughput, mbps, latency, watts, tdp, score_*, vmaf_*), community-wiki source URLs (wikipedia.org, wikichip.org), empty fields, and rows with encoding_blocks=0. Adding throughput / quality columns is forbidden; that pathology was the contributor-pack digest's category-1 NO-GO finding. Schema extensions need a new ADR, not a silent column bump. The cap_vector_for() return-dict shape is load-bearing: trainers / corpus writers consume hwcap_* columns by name, so reordering or renaming them silently breaks downstream parquet schemas.
- On upstream sync: no action required. The whole surface lives under ai/ and docs/; Netflix upstream has no equivalent.
- Re-test on rebase:
python -m pytest ai/tests/test_hardware_caps.py -v # must report 23 passed
python ai/scripts/hardware_caps_loader.py # JSON dump, 6+ rows
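The four rejection rules above can be sketched as one validator over csv.DictReader-style rows. This is an illustration, not hardware_caps_loader.py itself: the source_url column name and the function names are assumptions, and matching wiki hosts by domain suffix is one reasonable reading of "community-wiki source URLs".

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

FORBIDDEN_HEADERS = ("fps_*", "throughput", "mbps", "latency", "watts", "tdp", "score_*", "vmaf_*")
WIKI_HOSTS = ("wikipedia.org", "wikichip.org")


def _is_wiki(url: str) -> bool:
    """True for wikipedia.org / wikichip.org and any of their subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in WIKI_HOSTS)


def schema_errors(header, rows, source_col="source_url"):
    """Collect every prior-only violation instead of stopping at the first."""
    errors = [
        f"benchmark-shaped column: {col}"
        for col in header
        if any(fnmatchcase(col.lower(), pat) for pat in FORBIDDEN_HEADERS)
    ]
    for i, row in enumerate(rows):
        empty = [k for k in header if not str(row.get(k, "")).strip()]
        if empty:
            errors.append(f"row {i}: empty fields {empty}")
        if _is_wiki(str(row.get(source_col, ""))):
            errors.append(f"row {i}: community-wiki source")
        if str(row.get("encoding_blocks", "")).strip() == "0":
            errors.append(f"row {i}: encoding_blocks=0")
    return errors
```

Checking header patterns, rather than a fixed list of allowed columns, is what turns a silent throughput or quality column into a hard failure instead of a schema drift.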
ADR-0332 — External-competitor benchmark harness (2026-05-08)¶
- Touches:
tools/external-bench/ (new), docs/adr/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated), changelog.d/added/external-bench-harness.md (new), docs/research/0087-external-bench-competitor-survey-2026-05-08.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the harness is wrapper-only; never vendor or link x264-pVMAF (GPL-2.0) into this fork. Future competitors follow the same pattern: tools/external-bench/<competitor>/run.sh invokes a user-installed binary via an env var, and the output is schema-shimmed into the canonical JSON shape. The output schema (frames[].{frame_idx, predicted_vmaf_or_mos, runtime_ms} + summary.{competitor, plcc, srocc, rmse, runtime_total_ms, params, gflops}) is the contract between every wrapper and compare.py. run_wrapper's runner parameter MUST stay resolved at call time (not via default-arg binding) so monkeypatch-based tests work.
- On upstream sync: no action required. The harness lives entirely under tools/external-bench/ (a fork-local path) and never touches Netflix-shared code.
- Re-test on rebase:
python3 -m pytest tools/external-bench/tests/ -q # must report 7 passed
bash -n tools/external-bench/*/run.sh
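Both invariants are easy to break silently, so a small sketch may help. The harness namespace and function names below are hypothetical stand-ins, not the real compare.py or run_wrapper: the first pair shows why a default argument bound at def time ignores a later monkeypatch while a call-time lookup honours it, and schema_problems checks the documented output keys.

```python
import subprocess
import types

# Stand-in for the harness module; tests monkeypatch attributes on it.
harness = types.SimpleNamespace(run=subprocess.run)


def run_wrapper_bound(cmd, runner=harness.run):
    # Anti-pattern: the default is captured once, when `def` executes,
    # so a later `harness.run = fake` never reaches this function.
    return runner(cmd)


def run_wrapper(cmd, runner=None):
    # Call-time resolution: look the runner up on every call.
    if runner is None:
        runner = harness.run
    return runner(cmd)


FRAME_KEYS = {"frame_idx", "predicted_vmaf_or_mos", "runtime_ms"}
SUMMARY_KEYS = {"competitor", "plcc", "srocc", "rmse", "runtime_total_ms", "params", "gflops"}


def schema_problems(doc: dict) -> list[str]:
    """Report missing keys against the canonical wrapper output schema."""
    problems = []
    if "frames" not in doc:
        problems.append("missing frames")
    for i, frame in enumerate(doc.get("frames", [])):
        missing = FRAME_KEYS - set(frame)
        if missing:
            problems.append(f"frames[{i}] missing {sorted(missing)}")
    missing = SUMMARY_KEYS - set(doc.get("summary", {}))
    if missing:
        problems.append(f"summary missing {sorted(missing)}")
    return problems
```

In a pytest test, monkeypatch.setattr(harness, "run", fake) then affects run_wrapper but not run_wrapper_bound, which is the failure mode the invariant guards against.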
0327 — Conformal-VQA prediction surface for vmaf-tune (ADR-0279)¶
- Touches:
tools/vmaf-tune/src/vmaftune/conformal.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (Predictor.predict_vmaf_with_uncertainty), tools/vmaf-tune/src/vmaftune/cli.py (the predict subcommand gains --with-uncertainty / --calibration-sidecar / --alpha), tools/vmaf-tune/tests/test_conformal.py (new), docs/ai/conformal-vqa.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the conformal wrapper sits outside the ONNX graph and adds no new runtime dependency; conformal.py imports only the standard library (math, statistics, dataclasses, json, warnings). Future calibration-sidecar shapes use the method discriminator string for versioning; do not rename "split-conformal" / "cv-plus" without bumping the loader. The Predictor.predict_vmaf_with_uncertainty signature is the Python-API contract consumed by vmaf-tune predict --with-uncertainty; renaming or reordering its keyword args breaks the CLI in lockstep.
- On upstream sync: no action required. vmaf-tune is a fork-local tool; upstream Netflix/vmaf has no per-shot prediction surface.
- Re-test on rebase:
python3 -m pytest tools/vmaf-tune/tests/test_conformal.py -q
python3 -m pytest tools/vmaf-tune/tests/test_predictor.py -q
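For readers unfamiliar with the "split-conformal" method named above, a standard-library sketch of the textbook construction follows. It is not conformal.py's actual code: the function names are hypothetical, and clipping the interval to the VMAF range [0, 100] is an added assumption. The radius is the ⌈(n+1)(1−α)⌉-th smallest absolute residual on a held-out calibration set, and it is infinite when the calibration set is too small for the requested α.

```python
import math


def split_conformal_radius(residuals, alpha):
    """Half-width from absolute calibration residuals |y - y_hat|.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)); if that rank
    exceeds n, no finite interval reaches the requested coverage.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]


def predict_interval(point, radius, lo=0.0, hi=100.0):
    """Symmetric interval around the point prediction, clipped to [lo, hi]."""
    return max(lo, point - radius), min(hi, point + radius)
```

Because only sorted residuals and a rank are involved, the method wraps any point predictor (including an ONNX model) without touching its graph, which matches the no-new-dependency invariant.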
CI paths-ignore deny-list on heavy workflows (ADR-0341, 2026-05-09)¶
- Touches:
.github/workflows/libvmaf-build-matrix.yml (fork-local; paths-ignore: block under pull_request:), .github/workflows/tests-and-quality-gates.yml (fork-local; same block), docs/adr/0341-ci-paths-ignore-doc-only-prs.md + index fragment, changelog.d/changed/ci-paths-ignore-doc-only.md.
- Invariant: the deny-list must stay strictly documentation-only (docs/**, **/*.md, changelog.d/**, CHANGELOG.md, .workingdir2/**). Any path that contributes to a build, test, or lint input (libvmaf/**, meson.build, meson_options.txt, subprojects/**, python/**, ai/**, mcp-server/**, model/**, testdata/**, .github/workflows/**) must NEVER appear in the deny-list; otherwise the corresponding required check is silently skipped on a code-touching PR. The Required Checks Aggregator (ADR-0313) catches only the doc-only case (no required check ever ran for any required name); a too-broad deny-list would lose build coverage without anyone noticing.
- On upstream sync: Netflix/vmaf upstream does not carry these two workflow files (they are fork-local additions). No sync conflict expected.
- Re-test on rebase:
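One way to enforce the doc-only invariant mechanically is an allow-list comparison rather than a block-list, so any new non-doc glob fails even if nobody thought to forbid it. A minimal Python sketch, assuming the paths-ignore entries have already been read out of the two workflow files (the function name is hypothetical):

```python
DOC_ONLY = frozenset({"docs/**", "**/*.md", "changelog.d/**", "CHANGELOG.md", ".workingdir2/**"})


def deny_list_violations(paths_ignore):
    """Return every paths-ignore entry outside the documented doc-only set."""
    return [p for p in paths_ignore if p not in DOC_ONLY]
```

An empty result for both workflows means the deny-list still matches the set listed in the invariant above.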
HDR VMAF model search — Path C documentation only (2026-05-09)¶
- Files added (this fork only; upstream Netflix/vmaf has none of these):
  - model/vmaf_hdr_model_card.md: discoverable warning that the HDR scoring path falls back to the SDR vmaf_v0.6.1.json weights. The filename deliberately uses .md, not .json, so the vmaftune.hdr.select_hdr_vmaf_model glob (vmaf_hdr_*.json) keeps returning None.
  - docs/research/0089-hdr-vmaf-model-search.md: verbatim trail of the source-or-train survey (URLs + access dates).
  - changelog.d/added/hdr-vmaf-model-search.md: release-notes fragment per ADR-0221.
- ADR-0300 grew an inline ### Status update 2026-05-09: HDR model status section.
- Why no model JSON ships: Path A returned negative findings (no public Netflix HDR VMAF model exists; HDRMAX is a different algorithm not loadable by libvmaf's JSON path). Path B is deferred behind gated subjective HDR corpora + multi-day training compute. No fabricated weights are introduced.
- On upstream sync: if Netflix lands vmaf_hdr_*.json in Netflix/vmaf/model/, port it via /port-upstream-commit; the resolver picks it up automatically with no vmaftune change. Then delete model/vmaf_hdr_model_card.md (or rewrite it as a normal model card describing the upstream weights). Watch https://github.com/Netflix/vmaf/issues/645 for the upstream release announcement.
- Re-test on rebase: no behavioural change (pure docs). Sanity:
python3 -c "from pathlib import Path; \
import sys; sys.path.insert(0,'tools/vmaf-tune/src'); \
from vmaftune.hdr import select_hdr_vmaf_model; \
print(select_hdr_vmaf_model(Path('model')))"
# Expect: None — confirms the .md card does not match the glob
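The reason the .md card stays invisible is plain glob semantics. The sketch below is an approximation of a glob-based resolver (first sorted match, else None); it is not vmaftune.hdr's implementation, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def pick_hdr_model(filenames, pattern="vmaf_hdr_*.json"):
    """Approximate a glob-based resolver: first sorted match, else None."""
    matches = sorted(f for f in filenames if fnmatchcase(f, pattern))
    return matches[0] if matches else None
```

With only vmaf_v0.6.1.json and vmaf_hdr_model_card.md present, nothing matches; an upstream vmaf_hdr_*.json dropped into model/ would be picked up without code changes, as the upstream-sync note expects.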
ADR-0349 — fr_regressor_v3 namespace resolution (2026-05-09)¶
- Rebase impact: none. Docs-only change: adds ADR-0349, an append-only status appendix on ADR-0302 per ADR-0028, a ## fr_regressor_* namespace map block in ai/AGENTS.md, and two changelog fragments. No upstream Netflix/vmaf surface touched; no fr_regressor_* registry rows touched (sha256s for _v1, _v2, _v2_ensemble_v1_seed{0..4}, _v3 all unchanged); no C / Python / ONNX bytes modified.
- What to check after a rebase: nothing automated. The only drift risk is a future agent claiming fr_regressor_v3plus_features for an unrelated workstream; ai/AGENTS.md carries the reservation, and reviewers verify the map row exists before approving any new fr_regressor_* registry id.
- Reproducer:
# ADR + AGENTS.md namespace map present and consistent:
test -f docs/adr/0349-fr-regressor-v3-namespace.md
# -F: match literally; as a regex, "_*" would mean "zero or more underscores"
grep -qF "fr_regressor_* namespace map" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" docs/adr/0349-fr-regressor-v3-namespace.md
# Status appendix present on ADR-0302:
grep -q "Status update 2026-05-09: namespace collision resolved" \
  docs/adr/0302-encoder-vocab-v3-schema-expansion.md
# Existing v3 production row bit-identical (sha256 unchanged):
python3 -c "
import json
reg = json.load(open('model/tiny/registry.json'))
v3 = next(m for m in reg['models'] if m['id'] == 'fr_regressor_v3')
assert v3['sha256'] == 'eaa16d23461eda74940b2ed590edfcaf13428aade294e47792a5a15f4d3b999c', v3
assert v3['smoke'] is False
print('OK: fr_regressor_v3 production row unchanged')
"
# Registry test still passes:
bash core/test/dnn/test_registry.sh
0327 — Pre-push PR-body deliverables validator hook¶
- Touches:
scripts/ci/validate-pr-body.sh (new), scripts/git-hooks/pre-push (new), scripts/ci/test-validate-pr-body.sh (new), Makefile (the hooks-install target adds the pre-push symlink). Re-uses the scripts/ci/deliverables-check.sh parser verbatim; no upstream-shared file is modified.
- Invariant: parser shape parity with the .github/workflows/rule-enforcement.yml deep-dive-checklist gate (ADR-0108). The validator constructs a PATH shim that intercepts git diff --name-only calls only; every other git invocation falls through to the real binary.
- On upstream sync: not applicable; these files are entirely fork-local and Netflix has no equivalent. If scripts/ci/deliverables-check.sh is ever rewritten or moved, the validator's exec path (scripts/ci/deliverables-check.sh) and the test harness's expected exit codes must follow.
- Re-test on rebase:
bash scripts/ci/test-validate-pr-body.sh  # 8/8 cases pass
0320 — Semgrep # nosemgrep cites on Netflix-upstream Python harness (Research-0090)¶
- Touches:
python/vmaf/core/asset.py, python/vmaf/core/executor.py, python/vmaf/core/feature_extractor.py, python/vmaf/core/quality_runner.py, python/vmaf/core/result_store.py, python/vmaf/tools/decorator.py, python/test/command_line_test.py, python/test/feature_extractor_test.py, python/test/ssimulacra2_test.py, python/vmaf/config.py.
- Invariant: every fork-added # nosemgrep: <rule-id> line is paired with an inline cite to Research-0090. The cite + rule-id pair is the load-bearing artifact (per memory feedback_no_guessing: every "false positive" claim ships its safety proof). If an upstream sync removes the cited line of code, drop the cite-comment block too. If upstream adds a defusedxml fix at the ElementTree.parse() site (feature_extractor.py:115, quality_runner.py:1496), keep upstream's fix and drop our suppressions. config.py:40 (the SSL-bypass deletion) is a fork-exclusive security fix; if upstream resurrects ssl._create_unverified_context on a sync, do not re-merge it: the bypass clobbers the process-global default and is unjustified per Research-0090, F1.
- Re-test on rebase:
semgrep scan --config=p/cwe-top-25 --config=p/c --config=p/python . \
  --metrics=off --json | jq '.results | length'
# expect 0 — every legit finding either has a # nosemgrep cite or was fixed
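The pairing invariant can also be linted directly. The sketch below is hypothetical (not an existing repo script) and assumes the Research-0090 cite sits on the suppression line itself or within the few lines of comment block above it; the window size is an assumption to tune against the real cite blocks.

```python
import re

NOSEMGREP = re.compile(r"#\s*nosemgrep:\s*(\S+)")


def uncited_suppressions(source: str, cite: str = "Research-0090", window: int = 3):
    """Return (line_no, rule_id) for each nosemgrep lacking a nearby cite."""
    lines = source.splitlines()
    problems = []
    for i, line in enumerate(lines):
        match = NOSEMGREP.search(line)
        if not match:
            continue
        # The suppression line plus up to `window` preceding comment lines.
        context = lines[max(0, i - window): i + 1]
        if not any(cite in ctx for ctx in context):
            problems.append((i + 1, match.group(1)))
    return problems
```

Running it over the ten touched files after a sync would flag any suppression that upstream code motion separated from its safety proof.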
0321 — Security-scans workflow registry-pack list (Research-0090)¶
- Touches:
.github/workflows/security-scans.yml, .github/workflows/lint-and-format.yml.
- Invariant: the registry packs the workflow cites (p/cwe-top-25 + p/c + p/python) are validated against https://semgrep.dev/c/p/<pack>; the previously cited p/cert-c-strict, p/cert-cpp-strict, and p/cpp packs were retired by Semgrep in 2025 and now 404. Moving ${{ github.* }} values into env: in lint-and-format.yml (clang-tidy + clang-tidy-sycl steps) defuses run-shell-injection; preserve the pattern on any edit. See Research-0090, F2/F3.
- Re-test on rebase:
for pack in p/cwe-top-25 p/c p/python; do
  # -w '%{http_code}' reports the final status after redirects, not the first hop
  code=$(curl -sL -o /dev/null -w '%{http_code}' "https://semgrep.dev/c/${pack}")
  [ "$code" = "200" ] && echo "${pack}: OK" || echo "${pack}: FAIL ($code)"
done
0320 — CodeQL C bulk sweep (78 deferred alerts → 60 fixed, 14 deferred to T7-5)¶
- Touches:
core/src/feature/{cambi.c,ciede.c,integer_adm.c,integer_psnr.c,adm_tools.h,third_party/xiph/psnr_hvs.c}, core/src/feature/x86/{adm_avx2.c,adm_avx512.c,ansnr_avx2.c,ansnr_avx512.c,vif_avx2.c,vif_avx512.c}, core/src/{pdjson.c,svm.cpp}, core/test/{test_cpu.c,test_model.c}, core/tools/{y4m_input.c,yuv_input.c,vmaf_bench.c}. All but vmaf_bench.c are upstream-mirror Netflix files.
- Invariant: widening casts on integer multiplications ((size_t), (uint64_t), (double)) are LHS-prefixed before the multiply, never wrapped around the whole expression; the latter is a no-op against cpp/integer-multiplication-cast-to-long. Deleted commented-out blocks (e.g., the AVX-512 VP-loop dead variant in adm_avx512.c::adm_dwt2_inverse) are gone for good; if upstream brings them back, they reintroduce the alerts. iqa/convolve.c was deliberately left untouched: prefixing (double) on the float×float multiplications inside the scalar reference path breaks bit-exactness against the AVX2 path enforced by test_iqa_convolve. That CodeQL alert is deferred to a follow-up that updates both paths in lockstep.
- On upstream sync: any upstream change that re-introduces the deleted comment blocks or rewrites the cast forms will surface the alerts again. The cambi_score signature change (CambiBuffers buffers → const CambiBuffers *buffers) is fork-local and likely to conflict with upstream patches that touch that function. The 14 deferred VifBuffer large-parameter alerts are tracked under T7-5 (multi-backend coordinated refactor including NEON).
- Re-test on rebase:
cd libvmaf && meson test -C build  # all 50+ C tests
make test-netflix-golden           # upstream golden gate
# Re-run CodeQL on master afterwards; the 60 fixed alerts must stay closed.
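Why the cast position matters can be shown by emulating C's unsigned fixed-width arithmetic in Python, where integers never overflow on their own. This sketch uses uint32_t operands (whose wraparound is well-defined in C, unlike signed int overflow); the helper names are illustrative only.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_u32(a: int, b: int) -> int:
    """uint32_t * uint32_t in C: the product wraps modulo 2**32."""
    return (a * b) & MASK32


def mul_u64(a: int, b: int) -> int:
    """uint64_t * uint64_t in C: the product wraps modulo 2**64."""
    return (a * b) & MASK64


w = h = 65536
# (uint64_t)(w * h): the multiply runs in 32 bits and has already wrapped
# to 0 before the cast widens the result.
cast_whole_expr = mul_u32(w, h)
# (uint64_t)w * h: h is converted to uint64_t first, so nothing wraps.
cast_lhs_first = mul_u64(w, h)
```

cast_whole_expr ends up 0 while cast_lhs_first is 2**32, which is why wrapping the cast around the whole expression does not satisfy cpp/integer-multiplication-cast-to-long.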
CodeQL cpp/declaration-hides-variable sweep (2026-05-09)¶
- What changed: a mechanical rename / scope-tighten / dedupe sweep closing 64 open cpp/declaration-hides-variable CodeQL alerts on master. Touched files: core/src/feature/cambi.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/x86/vif_avx2.c, core/src/feature/x86/vif_avx512.c. All five are upstream-mirror; the Netflix copyright header is preserved on each.
- Renames adopted (semantic names preferred over a _2 suffix):
  - cambi.c: the inner int err shadowing the function-scope err becomes mkdir_err (heatmaps init) and src_err (full-ref extract path).
  - adm_avx2.c / adm_avx512.c: the j == 0 first-column special-case block is wrapped in { ... } so its j0..j3 and s0..s3 stop being visible to the per-j tail loop. The inner duplicate __m256i add_shift_HP_vex = _mm256_set1_epi32(32768) (and its 512-bit twin) is removed; it is bit-identical to the function-scope value already in scope. The __m256i rfactor1 that shadowed the function-scope float rfactor1[3] becomes rfactor_v0/_v1/_v2 (and the AVX-512 twin likewise).
  - vif_avx2.c / vif_avx512.c: tap-loop locals follow f_tap, r_top/r_bot, d_top/d_bot for the s0 stage, and f_tap0/f_tap1, r_back0/r_fwd0, etc. for the AVX-512 paired-tap stage. Inner per-fj __m256i fq / __m512i fq shadows of the centre-tap broadcast become f_tap. Inner-block duplicates of the function-scope ref/dis/stride/ii (identical types and initialisers) are simply removed. The two scalar VifResiduals residuals declarations that shadowed the function-scope Residuals512 residuals become tail_residuals. The two const uint16_t fcoeff declarations that shadowed the function-scope __m512i fcoeff become fcoeff_scalar.
- Invariant: bit-exactness gate; the rename sweep must not change any score. The Netflix CPU golden 3 (src01_hrc00, checkerboard_1, checkerboard_10) ran clean against this PR. All 76 VMAF-targeted Python tests pass; the 9 unrelated pre-existing failures (NIQE, PyPSNR, FileSystemResultStore) reproduce on a pristine origin/master checkout.
- On upstream sync: Netflix has no equivalent renames on upstream master as of 2026-05-09. When syncing, prefer the fork's renamed identifiers (the CodeQL gate depends on them). If Netflix later renames the same locals differently, reconcile by keeping the fork names and updating any imported chunks at port time.
- Re-test on rebase:
meson test -C build --suite=fast
PYTHONPATH=$PWD/python python3 -m pytest \
  python/test/quality_runner_test.py -k test_run_vmaf \
  python/test/vmafexec_test.py \
  python/test/vmafexec_feature_extractor_test.py \
  -m "not slow" -q
ADR-0209 v1 stdio runtime (T5-2b) — Embedded MCP server (2026-05-08)¶
- Touches:
core/src/mcp/{mcp.c,dispatcher.c,transport_stdio.c,mcp_internal.h,meson.build,3rdparty/cJSON/{cJSON.c,cJSON.h,LICENSE}}, core/test/test_mcp_smoke.c, core/test/meson.build. All paths are fork-local. cJSON is vendored verbatim from upstream DaveGamble/cJSON@v1.7.18 under its MIT license.
- Invariant: every TU under core/src/mcp/ (other than the vendored cJSON dir) is fork-local with the Copyright 2026 Lusoris and Claude (Anthropic) header; cJSON keeps its upstream MIT header verbatim. The public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged from T5-2; only function bodies flipped from -ENOSYS to working implementations. SSE / UDS still return -ENOSYS so the v2 PR can wire them without touching the public surface.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface; the entire core/src/mcp/ subtree is fork-local. If upstream ever adds an MCP surface, expect a port-only sync since names will collide.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
  -Denable_mcp=true -Denable_mcp_stdio=true
ninja -C build && meson test -C build test_mcp_smoke -v
ADR-0334 — state.md-touch-check CI gate (2026-05-08)¶
- Touches:
.github/workflows/rule-enforcement.yml(new top-level jobstate-md-touch-check),scripts/ci/state-md-touch-check.sh(new),scripts/ci/test-state-md-touch-check.sh(new),scripts/ci/AGENTS.md(new rebase-sensitive-surface row),.github/PULL_REQUEST_TEMPLATE.md(already carries the "Bug-status hygiene" section +no state delta: REASONopt-out — coupled to the script's regex). No upstream-shared paths. - Invariant: the gate's trigger predicate (Conventional-Commit
fix:prefix, barebugtoken in title, GitHub close-keywordscloses/fixes/resolves#N, unchecked Bug-status-hygiene checkbox) and opt-out sentinel (no state delta: REASON) match the wording of the## Bug-status hygienesection in.github/PULL_REQUEST_TEMPLATE.md. Reword the template only alongside the script. The job carries thepull_request.draft == false || github.event_name != 'pull_request'gate (ADR-0331 pattern) — keep that on any future hoist into the required-aggregator set. - On upstream sync: Netflix/vmaf has no equivalent rule. No conflict expected; the workflow file is fork-introduced.
- Re-test on rebase:
bash scripts/ci/test-state-md-touch-check.sh
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/rule-enforcement.yml')); print('YAML OK')"
pre-commit run shellcheck --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
pre-commit run shfmt --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
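A Python sketch of the trigger predicate makes its coupling with the PR template concrete. It is a minimal illustration, not the gate: the real check is the Bash script scripts/ci/state-md-touch-check.sh. The regexes, the scoping of the checkbox test to the hygiene section, and the name needs_state_md_touch are all assumptions.

```python
import re

# Hypothetical mirror of the gate's predicate; the real gate is a Bash script.
FIX_PREFIX = re.compile(r"^fix(\([^)]*\))?!?:", re.IGNORECASE)
BUG_TOKEN = re.compile(r"\bbug\b", re.IGNORECASE)  # "bug", not "debug" or "bugfix"
CLOSE_KEYWORD = re.compile(r"\b(closes|fixes|resolves)\s+#\d+", re.IGNORECASE)
UNCHECKED_BOX = re.compile(r"^\s*- \[ \]", re.MULTILINE)
OPT_OUT = re.compile(r"no state delta:\s*\S+", re.IGNORECASE)


def hygiene_section(body: str) -> str:
    """Text of the '## Bug-status hygiene' section, or '' if it is absent."""
    match = re.search(
        r"^## Bug-status hygiene[ \t]*$(.*?)(?=^## |\Z)",
        body,
        re.MULTILINE | re.DOTALL,
    )
    return match.group(1) if match else ""


def needs_state_md_touch(title: str, body: str) -> bool:
    """True when the PR must also edit docs/state.md (sketch only)."""
    if OPT_OUT.search(body):
        return False  # explicit "no state delta: REASON" opt-out
    return bool(
        FIX_PREFIX.search(title)
        or BUG_TOKEN.search(title)
        or CLOSE_KEYWORD.search(f"{title}\n{body}")
        or UNCHECKED_BOX.search(hygiene_section(body))
    )
```

The opt-out is checked first, so a PR that states a reason never trips the gate. Any one of the four triggers is enough to require a state.md edit.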
SYCL PSNR chroma extension (T3-15(b), 2026-05-09)¶
- Touches:
core/src/feature/sycl/integer_psnr_sycl.cpp (per-extractor chroma device buffers, per-plane SSE accumulators, and a provided_features extension to psnr_y / psnr_cb / psnr_cr), core/src/sycl/AGENTS.md (per-kernel rebase-sensitive invariant for the chroma-on-per-extractor-buffer arrangement), docs/metrics/features.md (footnote ¹ refresh: all three GPU PSNR extractors now emit chroma), docs/adr/0192-gpu-long-tail-batch-3.md (References-section status update), changelog.d/added/sycl-psnr-chroma.md.
- Invariant on the chroma upload path: chroma planes ride on per-extractor device buffers populated by host-side staging copies in the combined-graph pre_fn callback, NOT on the SYCL state's shared frame buffer (vmaf_sycl_shared_frame_init), which is luma-only by design. Luma stays graph-recorded; chroma SSE kernels run directly in post_fn on the same in-order combined queue. The CUDA twin (PR #520 / commit 7f3d58a5) uses the existing CUDA per-plane picture infrastructure and therefore has no equivalent invariant.
- On upstream sync: Netflix/vmaf upstream has no SYCL backend at all, so conflict probability is zero on psnr_sycl. If an upstream port to the fork's SYCL runtime someday extends vmaf_sycl_shared_frame_init to allocate chroma planes, the PSNR extension can be migrated onto it and the per-extractor chroma buffers retired, but only after a cross-backend gate run confirms bit-exactness against CPU at places=4 (ADR-0214).
source /opt/intel/oneapi/setvars.sh
CC=icx CXX=icpx meson setup build-sycl libvmaf \
  -Denable_sycl=true -Denable_cuda=false
ninja -C build-sycl
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-sycl/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr --backend sycl --device 0
# Expect 0/48 mismatches across psnr_y / psnr_cb / psnr_cr at places=4.
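This Python sketch counts per-frame mismatches per feature to make the places=4 criterion concrete. It assumes the unittest assertAlmostEqual-style rule, under which two scores agree when round(a - b, places) == 0. The helper names and the dict-of-lists input shape are hypothetical, and the cross-backend script may compare scores differently.

```python
def count_mismatches(cpu: list[float], gpu: list[float], places: int = 4) -> int:
    """Frames whose scores differ at `places` decimals (assertAlmostEqual rule)."""
    if len(cpu) != len(gpu):
        raise ValueError("frame counts differ")
    return sum(1 for a, b in zip(cpu, gpu) if round(a - b, places) != 0)


def mismatch_report(
    cpu: dict[str, list[float]],
    gpu: dict[str, list[float]],
    places: int = 4,
) -> dict[str, int]:
    """Per-feature mismatch counts, e.g. for psnr_y / psnr_cb / psnr_cr."""
    return {name: count_mismatches(cpu[name], gpu[name], places) for name in cpu}
```

A drift of 1e-5 rounds away at four places, but a drift of 1e-3 counts as a mismatch.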
Cppcheck nullPointer false-positive in dict.c (2026-05-09)¶
Files pinned:
core/src/dict.c:121 (one-line redundant-condition fix in dict_overwrite_existing).
Why this rebase-note exists: Master CI's Cppcheck (Whole Project) gate started failing on commit 14b5ffba (#537). Because each PR rebases onto master, the failure blocked every open PR. The cppcheck finding was likely always present but masked by paths-ignore filtering on the prior workflow shape; PR #530 widened cppcheck's trigger surface and exposed it. The fix deletes the redundant && val guard, since val is already checked at the public entry point vmaf_dictionary_set (dict.c:137). There is no behavior change. Cppcheck flagged the original as "either the val check is redundant or there's a possible null deref" because it cannot prove the interprocedural guarantee.
Rebase-sensitivity: zero; the change is local to dict.c. A future upstream sync of this file should keep the fix, or re-run cppcheck locally to confirm the finding does not recur.
Aggregator timeout bump (2026-05-09)¶
Files pinned:
.github/workflows/required-aggregator.yml (deadline 30→90 min, job timeout 35→100 min).
Why: on the morning of 2026-05-09, 41 PRs in flight hit Aggregator timeouts even though the real CI eventually passed. Bumping both deadlines unblocks the train without touching the underlying matrix.
Rebase-sensitivity: zero; the workflow file is wholly fork-local.
ARC self-hosted runner pool — pilot Cppcheck routing (2026-05-09)¶
.github/workflows/lint-and-format.yml (Cppcheck runs-on: ternary).
Why: opt-in graceful migration. ADR-0359 and docs/development/ci-runners.md document the flip-the-variable recipe to use when the cluster is degraded.
Rebase-sensitivity: zero; the workflow file is fork-local.
ADR-0338 — macOS Vulkan-via-MoltenVK CI lane (2026-05-09)¶
- Touches:
.github/workflows/libvmaf-build-matrix.yml (fork-local), which:
  - adds the "Build — macOS Vulkan via MoltenVK (advisory)" lane;
  - adds continue-on-error plumbing on matrix.experimental && matrix.moltenvk;
  - adds the "Install MoltenVK + Vulkan loader/headers (macOS)" step;
  - adds the "Run Vulkan smoke tests (macOS MoltenVK)" step;
  - gates the existing test/cache/tox steps on !matrix.moltenvk.
Also touched: docs/backends/vulkan/moltenvk.md (new fork-local doc), docs/adr/0127-vulkan-compute-backend.md (status-update appendix per the ADR's Proposed status; body untouched), docs/adr/0338-macos-vulkan-via-moltenvk-lane.md (new), docs/adr/_index_fragments/0338-macos-vulkan-via-moltenvk-lane.md plus an _order.txt append (new), docs/research/0089-moltenvk-feasibility-on-fork-shaders.md (new), changelog.d/added/macos-vulkan-via-moltenvk-lane.md (new).
- Invariant on the upstream-mirror file: none; libvmaf-build-matrix.yml is fork-local. The new lane's continue-on-error clause MUST stay scoped to matrix.experimental == true && matrix.moltenvk == true, so that existing experimental: true matrix entries (e.g. the macOS DNN lane) keep their default fail-fast behaviour. VK_ICD_FILENAMES MUST point at /opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json. Note the etc/vulkan segment, NOT share/vulkan: the Homebrew formula's install layout uses etc/ (verified against Formula/m/molten-vk.rb).
- On upstream sync: Netflix upstream has no macOS Vulkan lane and no MoltenVK awareness; nothing to reconcile. If a future MoltenVK release drops support for GL_EXT_shader_atomic_int64 translation, moment.comp will fail on the lane. The lane is continue-on-error, so this does not block PRs. The fix path is in ADR-0338 §Decision: update the known-limitations table in docs/backends/vulkan/moltenvk.md, then either pin a working MoltenVK version in the brew install line or rewrite the shader.
- Re-test on rebase:
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/libvmaf-build-matrix.yml'))" && \
echo "YAML parse OK"
# Confirm the lane is still in the matrix:
grep -q "Build — macOS Vulkan via MoltenVK (advisory)" \
.github/workflows/libvmaf-build-matrix.yml
# Confirm the lane is NOT promoted to required-aggregator until one
# green run on master (per ADR-0338):
! grep -q "macOS Vulkan via MoltenVK" \
.github/workflows/required-aggregator.yml
# Confirm the ICD path is the etc/ one, not share/:
grep -q "etc/vulkan/icd.d/MoltenVK_icd.json" \
.github/workflows/libvmaf-build-matrix.yml
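A tiny truth-table sketch shows why the continue-on-error clause must name both matrix keys. The lane names and row shapes are hypothetical; only the experimental and moltenvk keys come from the workflow. Python's `is True` stands in for the Actions expression, where a missing key compared to true evaluates false.

```python
def continue_on_error(entry: dict) -> bool:
    """Mirror of: matrix.experimental == true && matrix.moltenvk == true."""
    return entry.get("experimental") is True and entry.get("moltenvk") is True


def experimental_only(entry: dict) -> bool:
    """The over-broad scope the invariant forbids: matrix.experimental == true."""
    return entry.get("experimental") is True


# Hypothetical matrix rows; only the two key names come from the workflow.
LANES = {
    "macos-vulkan-moltenvk": {"experimental": True, "moltenvk": True},
    "macos-dnn": {"experimental": True},
    "linux-gcc": {},
}

tolerated = [name for name, row in LANES.items() if continue_on_error(row)]
too_broad = [name for name, row in LANES.items() if experimental_only(row)]
```

With the scoped clause only the MoltenVK lane is tolerated. The experimental-only scope would also silently swallow a DNN-lane failure.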
ADR-0363 — Mend Renovate replaces Dependabot (2026-05-09)¶
- Touches:
renovate.json (new, repo root), .github/workflows/renovate.yml (new), .github/dependabot.yml (deleted; renamed to .github/dependabot.yml.disabled), docs/development/dependency-bot.md (new operator playbook), changelog.d/changed/renovate-supersedes-dependabot.md (new), docs/adr/0363-renovate-replaces-dependabot.md (new), docs/adr/_index_fragments/0363-renovate-replaces-dependabot.md (new).
- Invariant: .github/dependabot.yml no longer exists on master; the disabled copy is dependabot.yml.disabled. On upstream sync, if Netflix ever ships its own dependabot.yml, do NOT restore it; the fork intentionally uses Renovate. Merge the upstream file into dependabot.yml.disabled for reference only.
- Upstream interaction: none. Netflix/vmaf upstream has no Renovate config. Conflict risk is zero unless upstream adds renovate.json or restores dependabot.yml.
- Re-test on rebase:
# Verify the workflow SHA-pin is still present and non-floating:
grep -E 'renovatebot/github-action@[a-f0-9]{40}' .github/workflows/renovate.yml
# Verify dependabot.yml is still absent:
test ! -f .github/dependabot.yml && echo "ok: dependabot.yml absent"
# Validate renovate.json syntax (requires Node):
node -e "JSON.parse(require('fs').readFileSync('renovate.json','utf8')); console.log('JSON valid')"
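The three checks can be bundled into one Python function, for example for use in a pre-push hook. This is a sketch under assumptions: renovate_problems is a hypothetical helper. The `\b` after the 40-hex group is an addition that also rejects over-long SHAs, which the grep above would accept.

```python
import json
import re
from pathlib import Path

SHA_PIN = re.compile(r"renovatebot/github-action@[a-f0-9]{40}\b")


def renovate_problems(root: Path) -> list[str]:
    """Return human-readable invariant violations for a checkout at `root`."""
    problems: list[str] = []
    workflow = root / ".github" / "workflows" / "renovate.yml"
    if not workflow.is_file() or not SHA_PIN.search(workflow.read_text(encoding="utf-8")):
        problems.append("renovate.yml missing or action not pinned to a 40-hex SHA")
    if (root / ".github" / "dependabot.yml").exists():
        problems.append("dependabot.yml must stay absent")
    try:
        json.loads((root / "renovate.json").read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        problems.append(f"renovate.json unreadable or invalid: {exc}")
    return problems
```

An empty list means all three invariants hold. Unlike the shell version, it reports every violation in one pass instead of stopping at the first.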
ADR-0355 — Symphony-inspired agent-dispatch infrastructure (2026-05-09)¶
Files added (all fork-introduced, none mirror upstream):
.claude/workflows/_template.md, .claude/workflows/codeql-alert-sweep.md, .claude/workflows/simd-port.md, .claude/workflows/feature-extractor-port.md.
scripts/lib/__init__.py, scripts/lib/backlog_tracker.py, scripts/lib/AGENTS.md.
scripts/ci/agent-eligibility-precheck.py (new row in the scripts/ci/AGENTS.md "Rebase-sensitive surfaces" table).
docs/development/agent-dispatch.md.
Why this rebase-note exists: the change is purely additive, and all paths are fork-only (.claude/, scripts/lib/, fork-only docs). Upstream Netflix/vmaf has no .claude/, no scripts/lib/, and no docs/development/agent-dispatch.md, so the merge surface on /sync-upstream is zero. The only coupling is internal, between scripts/ci/agent-eligibility-precheck.py and scripts/lib/backlog_tracker.py (a sys.path import). Both files move together; this is documented in scripts/lib/AGENTS.md and in a new row in scripts/ci/AGENTS.md.
Rebase-sensitivity: zero w.r.t. upstream. Internally, renaming BacklogItem field names or the BacklogTracker / GitHubTracker public method signatures is a breaking change for the precheck and for any future state-audit script. Guard via the smoke listed in Research-0091 §"Smoke results" before any rename PR.
Format-coupling note: the BACKLOG.md row regex (scripts/lib/backlog_tracker.py:_ID_PATTERN) is brittle against table-shape edits. If a future BACKLOG.md edit adds a column or renames a status word, the parser will silently misclassify rows. The smoke parses 101 rows on master as of 2026-05-09; expect ≥ 100 after any structural edit.
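The format-coupling risk is easiest to see with a parser that fails loudly instead of silently. The sketch is hypothetical: the row regex, the three-column shape, and the status vocabulary are stand-ins, not the real _ID_PATTERN. A row-count floor (the ≥ 100 guard above) catches a column insertion that makes every row stop matching. A known-status set catches a renamed status word.

```python
import re

# Hypothetical three-column row: | ID | status | title |
ROW = re.compile(
    r"^\|\s*(?P<id>T[\w-]+)\s*\|\s*(?P<status>[\w-]+)\s*\|\s*(?P<title>[^|]*?)\s*\|\s*$"
)
KNOWN_STATUSES = {"open", "in-progress", "blocked", "done"}  # hypothetical vocabulary


def parse_backlog(text: str) -> list[dict[str, str]]:
    """Rows that match the assumed shape; anything else is skipped."""
    return [m.groupdict() for line in text.splitlines() if (m := ROW.match(line))]


def backlog_problems(text: str, min_rows: int = 100) -> list[str]:
    """Fail loudly instead of silently misclassifying rows."""
    rows = parse_backlog(text)
    problems: list[str] = []
    if len(rows) < min_rows:
        problems.append(f"parsed {len(rows)} rows, expected >= {min_rows}")
    unknown = sorted({row["status"] for row in rows} - KNOWN_STATUSES)
    if unknown:
        problems.append(f"unknown status words: {unknown}")
    return problems
```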
0350 — psnr_hvs AVX-512 ceiling re-bench (ADR-0350, T3-9 (a))¶
docs/adr/0350-psnr-hvs-avx512-ceiling.md: closure ADR.
docs/adr/0160-psnr-hvs-neon-bitexact.md: appended ### Status update 2026-05-09 appendix.
docs/research/0091-psnr-hvs-avx512-bench-2026-05-09.md: empirical companion (cycle share, Amdahl ceiling, reproducer).
Why this rebase-note exists: T3-9 (a) closes as an AVX2 ceiling. The result itself has zero rebase-sensitivity, since it changes no engine code. The bit-exactness invariants that lock it to a ceiling, however, are rebase-sensitive. The 78.42 % scalar tail in calc_psnrhvs_avx2 / calc_psnrhvs_neon is locked by the "per-lane-scalar float reduction" rule of ADR-0138 / ADR-0139 (carried by ADR-0159 / ADR-0160). A future upstream sync of core/src/feature/third_party/xiph/psnr_hvs.c (the Xiph/Daala DCT) might change the per-block summation tree, e.g. through partial folding, re-ordered means, or vectorised mask reductions. In that case the AVX2 and NEON TUs in core/src/feature/x86/psnr_hvs_avx2.c and core/src/feature/arm64/psnr_hvs_neon.c MUST be re-audited against the new scalar reference. The ceiling argument in ADR-0350 must also be re-run, because the 78 / 15 cycle-share split would shift.
Rebase-sensitivity: low for the ceiling decision itself, since an empirical re-bench on a current host is cheap (30 seconds via the reproducer in Research-0091 §7). High for the underlying bit-exactness invariants the decision rests on: the Netflix golden trips on ≥ 5.5e-5 drift (ADR-0160 §Context). The ADR-0350 §Verification reproducer is the gate. Re-run it if the cycle share shifts, if the Netflix normal-pair fixture changes, or if a new host class (e.g. wide-issue Granite Rapids) goes into CI.
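A rough Amdahl bound shows why the locked scalar tail caps any further SIMD work. As a simplification, assume that only the non-tail fraction f of AVX2-build cycles can be sped up further, by some factor s (for example by moving it to AVX-512):

$$
S(s) = \frac{1}{(1 - f) + f/s}, \qquad \lim_{s \to \infty} S(s) = \frac{1}{1 - f}.
$$

With the 78.42 % tail held scalar, f ≤ 1 − 0.7842 = 0.2158. Even an unbounded s therefore gives S < 1 / 0.7842 ≈ 1.275 over the AVX2 build, and any real s lands below that. This is only an upper bound. The measured figures behind the closure are in the research companion.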
0320 — FFmpeg n8.1 → n8.1.1 base bump (2026-05-09)¶
- Touches:
ffmpeg-patches/series.txt (header comment), ffmpeg-patches/README.md (apply / verify / smoke sections), ffmpeg-patches/test/build-and-run.sh (FFMPEG_SHA default), scripts/ci/ffmpeg-patches-check.sh (header comment; the FFMPEG_BRANCH env default stays at release/8.1 because the branch tracks point releases), docs/development/automated-rule-enforcement.md (gate description). The 9 .patch files themselves are unchanged: every patch in the series applied cleanly, cumulatively, against pristine n8.1.1 via git am --3way.
- Upstream source: FFmpeg upstream point release n8.1.1 (commit 239f2c7 "Bump micro for 8.1.1"). It is bug-fix-only on top of n8.1, with no API or AVOption breakage that the patch stack consumes.
- Invariant: the patch stack continues to apply against the current tip of FFmpeg's release/8.1 branch. Per ADR-0118 and ADR-0186 §FFmpeg patch coupling, the verification gate is a cumulative git am --3way against a pristine checkout, not a per-patch standalone apply. The local gate, scripts/ci/ffmpeg-patches-check.sh, uses git apply (no commit) but accumulates state in the same way.
- On upstream sync: no action required. A future FFmpeg point release (n8.1.2 or n8.2) may land new hunks that conflict with one of the patches. In that case, regenerate the affected patches via git format-patch on the resolved state, bump the references in the five files listed under "Touches", and add a fresh rebase-notes entry citing the conflicting file(s).
- Re-test on rebase:
rm -rf /tmp/ffmpeg-n811 && \
git clone --depth 1 --branch n8.1.1 \
  https://git.ffmpeg.org/ffmpeg.git /tmp/ffmpeg-n811
git -C /tmp/ffmpeg-n811 config user.email agent@local
git -C /tmp/ffmpeg-n811 config user.name agent
# Use absolute patch paths: git -C resolves relative arguments from the checkout.
for p in ffmpeg-patches/000*-*.patch; do
  git -C /tmp/ffmpeg-n811 am --3way "$PWD/$p" || { echo "FAILED: $p"; break; }
done
bash scripts/ci/ffmpeg-patches-check.sh
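The cumulative replay can also be driven from Python. This makes the "cumulative, not per-patch standalone" semantics explicit: every patch lands on the tree left by its predecessors, and the run stops at the first failure. This is a sketch; replay_series and git_am are hypothetical helpers. Patch paths are resolved to absolute ones because git -C resolves relative arguments against the checkout, not the caller's directory.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence


def git_am(checkout: Path, patch: Path) -> bool:
    """Apply one patch onto `checkout` with `git am --3way`; True on success."""
    result = subprocess.run(
        ["git", "-C", str(checkout), "am", "--3way", str(patch.resolve())],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0


def replay_series(patches: Sequence[Path], apply: Callable[[Path], bool]) -> Optional[Path]:
    """Apply patches in file-name order onto one tree; return the first failure."""
    for patch in sorted(patches, key=lambda p: p.name):
        if not apply(patch):
            return patch
    return None
```

For example, `replay_series(list(Path("ffmpeg-patches").glob("0*-*.patch")), lambda p: git_am(Path("/tmp/ffmpeg-n811"), p))` returns None when the whole series applies. Sorting by file name keeps 0010 after 0009 because the prefixes are zero-padded.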
ADR-0281 follow-up — QSV install-matrix discoverability backfill (2026-05-08)¶
- Touches:
docs/getting-started/install/{arch,fedora,ubuntu,macos,windows}.md (new ## Intel QSV section per page), docs/adr/0281-vmaf-tune-qsv-adapters.md (status-update appendix per ADR-0028), changelog.d/changed/qsv-install-matrix-docs.md (new fragment). No code, no engine changes, and no upstream-shared C / Python source touched. This is a pure documentation backfill closing the SYCL-audit research-0086 Topic C gap (issue #464).
- Invariant: each per-OS QSV section pins the package names against verified upstream URLs with a Verified 2026-05-08 access date. The hardware-generation matrix is sourced from the "Hardware decoding and encoding" table in the public Wikipedia article "Intel Quick Sync Video". Intel may revise which generation supports AV1 encode, e.g. by backporting the encoder to Lunar Lake / Meteor Lake silicon currently absent from the table. If so, every page that carries the matrix must move in lockstep: the Arch / Fedora / Ubuntu / Windows pages all carry the same matrix verbatim. The macOS page deliberately omits the matrix, because QSV is unsupported on macOS.
- On upstream sync: no action required. Netflix/vmaf upstream does not ship per-OS install pages under docs/getting-started/install/; that tree is fork-only.
# Lint the install pages (markdownlint via pre-commit):
pre-commit run --files docs/getting-started/install/*.md
# Verify each page (except alpine + macos) still carries the matrix:
for f in arch fedora ubuntu windows; do grep -q 'Arc Battlemage' "docs/getting-started/install/${f}.md" || echo "MISSING: ${f}"; done
# Confirm the macOS page documents QSV as unsupported:
grep -q 'Intel QSV. is unsupported on macOS' docs/getting-started/install/macos.md
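The lockstep requirement can be checked mechanically rather than by eye. This Python sketch takes each page's ## Intel QSV section and pulls out the table containing a marker string. It then compares that table against the first page's. Several page-layout details are assumptions: the section and table conventions, and the 'Arc Battlemage' marker (borrowed from the grep above). Package tables in the same section may legitimately differ per OS, so they are ignored.

```python
import re


def qsv_section(markdown: str) -> str:
    """Text between '## Intel QSV' and the next level-2 heading."""
    match = re.search(
        r"^## Intel QSV[ \t]*$(.*?)(?=^## |\Z)", markdown, re.MULTILINE | re.DOTALL
    )
    return match.group(1) if match else ""


def tables(text: str) -> list[list[str]]:
    """Group consecutive '|' lines into separate tables."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.lstrip().startswith("|"):
            current.append(line.strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def hw_matrix(markdown: str, marker: str = "Arc Battlemage") -> list[str]:
    """The QSV-section table that mentions `marker`, or [] if none does."""
    for block in tables(qsv_section(markdown)):
        if any(marker in row for row in block):
            return block
    return []


def out_of_lockstep(pages: dict[str, str]) -> list[str]:
    """Names of pages whose hardware matrix differs from the first page's."""
    names = list(pages)
    reference = hw_matrix(pages[names[0]])
    return [name for name in names[1:] if hw_matrix(pages[name]) != reference]
```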
0333 — vmaf-tune Phase F multi-pass encoding (ADR-0333)¶
Touches:
tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (CodecAdapter Protocol gains supports_two_pass: bool + two_pass_args(...))
tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (overrides both)
tools/vmaf-tune/src/vmaftune/encode.py (EncodeRequest gains pass_number / stats_path; build_ffmpeg_command adds the 2-pass argv splice + pass-1 null-muxer redirect; new run_two_pass_encode)
tools/vmaf-tune/src/vmaftune/corpus.py (CorpusOptions.two_pass, routing in iter_rows)
tools/vmaf-tune/src/vmaftune/cli.py (--two-pass flag on the corpus / recommend subparsers)
Invariant: 2-pass encoding routes through the codec adapter via supports_two_pass + two_pass_args(pass_number, stats_path). The encode driver never branches on codec name. Adapters with supports_two_pass = False fall back to single-pass encoding with a stderr warning rather than failing. The seam is open for sibling codec adapters (libx264, libsvtav1, libvvenc, libaom-av1) to opt in by overriding the two methods in their own adapter file alone. This is the fork-local extension to the ADR-0237 Phase A multi-codec contract; upstream Netflix/vmaf has no equivalent and does not own this code path.
Re-test:
(Optional, requires ffmpeg + libx265 in the runner's PATH:)
VMAF_TUNE_INTEGRATION=1 python -m pytest \
tests/test_codec_adapter_x265_two_pass.py::test_real_x265_two_pass_smoke -q
Rebase-sensitivity: zero from upstream — tools/vmaf-tune/ is fork-local. The only concern is the codec_adapters Protocol shape: a future upstream commit that adds a sibling codec adapter SHOULD inherit the supports_two_pass = False default and either explicitly opt in or leave the flag off. Downstream sibling-codec PRs in this fork should follow the ADR-0288 / ADR-0333 pattern: one adapter file, override the two methods, add a test file mirroring test_codec_adapter_x265_two_pass.py.
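A Python sketch of the adapter seam shows how the driver stays codec-agnostic. Everything here is a hypothetical reduction of the real contract: the class names, encode_argvs, and the ffmpeg argv layout are illustrative. Passing pass/stats through -x265-params is an assumption about how the x265 adapter spells its two-pass arguments. The driver branches only on supports_two_pass, never on the encoder name.

```python
import sys
from dataclasses import dataclass
from typing import Callable, Protocol


class CodecAdapter(Protocol):
    """Hypothetical slice of the adapter contract (the real Protocol has more)."""

    encoder: str
    supports_two_pass: bool

    def two_pass_args(self, pass_number: int, stats_path: str) -> list[str]: ...


@dataclass
class X265Adapter:
    encoder: str = "libx265"
    supports_two_pass: bool = True

    def two_pass_args(self, pass_number: int, stats_path: str) -> list[str]:
        # Assumption: pass/stats travel through -x265-params.
        return ["-x265-params", f"pass={pass_number}:stats={stats_path}"]


@dataclass
class SinglePassAdapter:
    encoder: str = "libexample"  # hypothetical encoder without 2-pass support
    supports_two_pass: bool = False

    def two_pass_args(self, pass_number: int, stats_path: str) -> list[str]:
        return []


def _stderr(message: str) -> None:
    print(message, file=sys.stderr)


def encode_argvs(adapter: CodecAdapter, src: str, dst: str, stats_path: str,
                 warn: Callable[[str], None] = _stderr) -> list[list[str]]:
    """One argv per pass; branches on the capability flag, not the codec name."""
    base = ["ffmpeg", "-y", "-i", src, "-c:v", adapter.encoder]
    if not adapter.supports_two_pass:
        warn(f"{adapter.encoder}: no two-pass support, falling back to single pass")
        return [base + [dst]]
    # Pass 1 only writes stats, so its output goes to the null muxer.
    pass1 = base + adapter.two_pass_args(1, stats_path) + ["-f", "null", "-"]
    pass2 = base + adapter.two_pass_args(2, stats_path) + [dst]
    return [pass1, pass2]
```

A sibling adapter opts in by setting supports_two_pass and returning its own argv fragment from two_pass_args; encode_argvs needs no change.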
ADR-0360 — CAMBI CUDA port (T3-15a, 2026-05-09)¶
Files pinned:
core/src/feature/cuda/integer_cambi_cuda.c (new)
core/src/feature/cuda/integer_cambi_cuda.h (new)
core/src/feature/cuda/integer_cambi/cambi_score.cu (new)
core/src/feature/feature_extractor.c (added vmaf_fex_cambi_cuda to the list)
core/src/meson.build (added cambi_score to cuda_cu_sources; added integer_cambi_cuda.c to the CUDA feature sources)
Why: The CUDA twin of vmaf_fex_cambi (Strategy II hybrid — three GPU kernels for the embarrassingly parallel stages; calculate_c_values + topK on CPU). Registers vmaf_fex_cambi_cuda under #if HAVE_CUDA guard.
Rebase-sensitivity: low. The three new files are wholly fork-local and will not conflict. The two upstream-shared files have small, self-contained hunks:
feature_extractor.c: the extern vmaf_fex_cambi_cuda declaration and the &vmaf_fex_cambi_cuda array entry sit inside a #if HAVE_CUDA block. Upstream's additions to this file (new feature extractors, new dispatch flags) will not conflict unless Netflix adds its own CUDA twin for CAMBI. That is unlikely, since upstream's CUDA backend does not ship a CAMBI extractor.
meson.build: the cambi_score entry in the cuda_cu_sources dict and the integer_cambi_cuda.c line in the CUDA sources list. Any upstream change to meson.build that restructures the cuda_cu_sources dict would require a manual merge. The dict entries are sorted alphabetically by key, so cambi_score lands between adm_score and motion_score.
If upstream adds cambi_cuda themselves: drop the fork copy and check for API divergence. Strategy II hybrid is the natural choice; the upstream implementation may differ if they choose Strategy III (fully-on-GPU calculate_c_values).
cambi_internal.h dependency: integer_cambi_cuda.c includes core/src/feature/cambi_internal.h (fork-added trampoline exposing cambi.c's static helpers). If upstream significantly refactors cambi.c (renames vmaf_cambi_preprocessing, vmaf_cambi_calculate_c_values, etc.), cambi_internal.h must be updated alongside. This is the same dependency the Vulkan twin (cambi_vulkan.c) has — see ADR-0210's rebase note for the full list of exposed functions.
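The alphabetical-order convention for cuda_cu_sources is easy to verify after a manual merge. This Python sketch assumes a one-key-per-line meson dict written as 'key' : value. cu_source_keys and keys_sorted are hypothetical helpers and will not cope with other layouts.

```python
import re

KEY = re.compile(r"^\s*'(?P<key>\w+)'\s*:", re.MULTILINE)


def cu_source_keys(meson_text: str, dict_name: str = "cuda_cu_sources") -> list[str]:
    """Keys of a `name = { 'key' : value, ... }` dict, one key per line (sketch)."""
    match = re.search(
        rf"{re.escape(dict_name)}\s*=\s*\{{(.*?)^\s*\}}",
        meson_text,
        re.MULTILINE | re.DOTALL,
    )
    return KEY.findall(match.group(1)) if match else []


def keys_sorted(meson_text: str) -> bool:
    """True if the dict exists, is non-empty, and its keys are in sorted order."""
    keys = cu_source_keys(meson_text)
    return bool(keys) and keys == sorted(keys)
```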
Vulkan submit-pool PR-B: six secondary kernels (2026-05-09, ADR-0353)¶
Files changed:
core/src/feature/vulkan/ssim_vulkan.c
core/src/feature/vulkan/ciede_vulkan.c
core/src/feature/vulkan/ms_ssim_vulkan.c
core/src/feature/vulkan/motion_v2_vulkan.c
core/src/feature/vulkan/float_psnr_vulkan.c
core/src/feature/vulkan/float_motion_vulkan.c
core/src/feature/vulkan/AGENTS.md
docs/adr/0353-vulkan-submit-pool-pr-b-six-kernels.md
Why this rebase-note exists: six Vulkan host-glue TUs were migrated from per-frame command-buffer and descriptor-set allocation to the VmafVulkanKernelSubmitPool abstraction (ADR-0256). Any Netflix upstream sync that touches these same files (unlikely — they are fork-local) must preserve the VmafVulkanKernelSubmitPool fields in the state struct and the pool-destroy-before-pipeline-destroy ordering in close_fex().
Rebase-sensitivity: low. All six files are entirely fork-local; Netflix upstream does not have a Vulkan backend. The submit-pool API is defined in core/src/vulkan/kernel.h (also fork-local). No public header or C-API surface was changed; the FFmpeg patch series is unaffected.
Key invariant to preserve on rebase: vmaf_vulkan_kernel_submit_pool_destroy MUST be called before vmaf_vulkan_kernel_pipeline_destroy in every migrated kernel's close_fex(). See core/src/feature/vulkan/AGENTS.md §"Submit-pool ordering invariant".
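Because the ordering invariant is textual, a cheap lint can catch a reversed close_fex() during a rebase. The Python sketch below is an approximation built on assumptions:

- It finds close_fex() by regex.
- It extracts the function body by brace counting, ignoring braces in strings and comments.
- It requires every submit-pool destroy call to appear before the first pipeline destroy call.

The argument lists in the test fixtures are made up.

```python
import re

POOL_DESTROY = re.compile(r"\bvmaf_vulkan_kernel_submit_pool_destroy\s*\(")
PIPELINE_DESTROY = re.compile(r"\bvmaf_vulkan_kernel_pipeline_destroy\s*\(")


def close_fex_body(c_source: str) -> str:
    """Body of close_fex() via brace counting; ignores braces in strings/comments."""
    match = re.search(r"\bclose_fex\s*\([^)]*\)\s*\{", c_source)
    if not match:
        return ""
    depth = 1
    for index in range(match.end(), len(c_source)):
        char = c_source[index]
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return c_source[match.end():index]
    return ""


def destroy_order_ok(c_source: str) -> bool:
    """True if every submit-pool destroy precedes the first pipeline destroy."""
    body = close_fex_body(c_source)
    pools = [m.start() for m in POOL_DESTROY.finditer(body)]
    pipelines = [m.start() for m in PIPELINE_DESTROY.finditer(body)]
    if not pipelines:
        return True  # nothing to order against
    return bool(pools) and max(pools) < min(pipelines)
```

A migrated kernel with a pipeline destroy but no pool destroy also fails the check, which is intentional for files expected to use the submit pool.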
0354 — Vulkan submit-pool PR-C: submit_pool_destroy-before-pipeline ordering¶
- Touches:
core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vulkan/float_ansnr_vulkan.c, core/src/feature/vulkan/moment_vulkan.c.
- Invariant: in every migrated extractor, vmaf_vulkan_kernel_submit_pool_destroy() MUST precede every vmaf_vulkan_kernel_pipeline_destroy() call in close_fex(). Reversing the order frees the pool's command buffers after the pipeline's command pool is destroyed, which is undefined behaviour per Vulkan spec §6.2.
- Re-test:
meson test -C build --suite=vulkan passes.
scripts/ci/cross_backend_vif_diff.py shows places=4 for all four extractors on all three target devices (RTX 4090, Arc A380, RADV iGPU).
0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0291)¶
- Touches:
core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/psnr_vulkan.c (all fork-local Vulkan kernels; no upstream C paths touched), changelog.d/changed/vulkan-submit-pool-pr-a-adm-motion-psnr.md, docs/adr/0291-vulkan-submit-pool-pr-a-adm-motion-psnr.md.
- Invariant: each migrated TU adds a VmafVulkanKernelSubmitPool sub_pool and pre-allocated VkDescriptorSet field(s) to its state struct. The pool must be destroyed (vmaf_vulkan_kernel_submit_pool_destroy) before vmaf_vulkan_kernel_pipeline_destroy in close_fex(). Reversing the order would destroy the descriptor pool while the submit pool still holds live command-buffer and fence references. Descriptor sets allocated via vmaf_vulkan_kernel_descriptor_sets_alloc are freed implicitly by the descriptor-pool tear-down; do NOT call vkFreeDescriptorSets on them in close_fex(). For motion_vulkan, the pre-allocated set is rebound once per frame via vkUpdateDescriptorSets, because the blur ping-pong changes which blur[] slot is "current". For adm_vulkan and psnr_vulkan the sets are stable after init() and need no per-frame update.
- Upstream interaction: none. All three files are fork-local Vulkan kernel TUs not present in Netflix/vmaf upstream.
- On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths. The Vulkan backend is entirely fork-introduced.
- Re-test on rebase:
meson test -C build --suite=fast
# Cross-backend parity gate (places=4):
python python/test/cross_backend_diff.py \
--features adm motion psnr \
--backend vulkan cpu \
--places 4 \
--yuv testdata/yuv/src01_hrc00_576x324.yuv \
testdata/yuv/src01_hrc01_576x324.yuv
ADR-0350 — FFmpeg libvmaf filter CUDA backend selector (0010 patch)¶
Patch: ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch.
libavfilter/vf_libvmaf.c: adds the cuda AVOption, a state field, and init / cleanup / picture-pool wiring, all under CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER.
configure:
  - adds --enable-libvmaf-cuda (EXTERNAL_LIBRARY_LIST entry + help text);
  - promotes libvmaf_cuda from blanket autodetect to a gated enabled libvmaf_cuda && require_pkg_config check;
  - preserves the enabled libvmaf && check_pkg_config libvmaf_cuda in-filter probe, so the new selector still works without the explicit flag when libvmaf ships CUDA.
Why this rebase-note exists: patch 0010 extends the SYCL (0003) / Vulkan (0004) per-context backend selectors to CUDA on the regular libvmaf filter. The patch coexists with the upstream dedicated libvmaf_cuda filter (CONFIG_LIBVMAF_CUDA_FILTER) by gating its struct field and code paths on !CONFIG_LIBVMAF_CUDA_FILTER; the dedicated filter keeps owning its own cu_state field. CLAUDE.md §12 r14 makes the patch update mandatory. The reason is that the change touches a filter consumer of the vmaf_cuda_state_init / _import_state / _state_free / _preallocate_pictures / _fetch_preallocated_picture C-API surface in libvmaf_cuda.h.
Rebase-sensitivity: low. The patch's vf_libvmaf.c hunks are context-anchored on the SYCL/Vulkan selector blocks. If upstream FFmpeg renames CONFIG_LIBVMAF_CUDA_FILTER or moves the libvmaf_cuda.h include, the include guard at the top of the file needs the corresponding update. The configure hunks are context-anchored on the existing --enable-libvmaf-sycl / --enable-libvmaf-vulkan lines. Those lines have proven stable across n8.0 → n8.1 → n8.1.1, so drift risk is low. If VmafCudaConfiguration ever grows a device_index field upstream, swap the cuda boolean for an int cuda_device mirroring SYCL's shape (separate ADR + patch refresh).
Verification gate:
  - Cumulative git am --3way replay of ffmpeg-patches/000{1..9}-*.patch + 0010-* against pristine FFmpeg n8.1.1: PASS (2026-05-09).
  - Build of libavfilter/vf_libvmaf.o under CONFIG_LIBVMAF_CUDA=0: PASS (the selector errors at filter-init time per the #else branch).
  - Build of libavfilter/vf_libvmaf.o under CONFIG_LIBVMAF_CUDA=1 && !CONFIG_LIBVMAF_CUDA_FILTER: PASS (selector active, picture-pool wiring compiles).
0320 — Vulkan instance / VMA apiVersion bump to 1.4 (Step B)¶
- Touches:
core/src/vulkan/common.c, core/src/vulkan/vma_impl.cpp, core/src/vulkan/AGENTS.md.
- Invariant: the four apiVersion sites (lines 54, 264, 374 of common.c; line 22 of vma_impl.cpp) request Vulkan 1.4, not 1.3. This bump gates the cross-backend places=4 contract on Arc + RADV, together with two earlier changes: the Step-A precise decorations in vif.comp / ciede.comp (PR #346) and the Phase-3 cross-subgroup release-acquire fix (PR #511). NVIDIA closure depends on Phase 3c (PR #512; block on merge until that lands). Netflix upstream carries neither a VMA dependency nor a Vulkan backend, so no upstream merge conflict is expected on these files.
- Re-test on rebase:
meson setup build -Denable_vulkan=enabled -Denable_cuda=false \
-Denable_sycl=false --buildtype=release
ninja -C build
for D in 0 1 2; do
python3 scripts/ci/cross_backend_parity_gate.py \
--vmaf-binary build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--backends cpu vulkan --vulkan-device "$D" \
--features vif ciede adm motion psnr
done
# All 0/N mismatches at places=4 once Phase 3c (PR #512) has landed.
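A quick source scan can confirm that all four apiVersion sites still request 1.4 after a rebase. The sketch assumes the sites spell the version as VK_API_VERSION_1_x or VK_MAKE_API_VERSION(0, 1, x, 0). Both are real Vulkan header macros, but the fork's exact spelling is an assumption. The expected_sites count (three in common.c, one in vma_impl.cpp) must be passed per file.

```python
import re

API_VERSION = re.compile(
    r"VK_API_VERSION_1_(\d+)\b"
    r"|VK_MAKE_API_VERSION\(\s*0\s*,\s*1\s*,\s*(\d+)\s*,"
)


def requested_minors(source: str) -> list[int]:
    """Vulkan 1.x minor versions requested anywhere in one source file."""
    return [int(a or b) for a, b in API_VERSION.findall(source)]


def requests_1_4(source: str, expected_sites: int) -> bool:
    """True if exactly `expected_sites` sites exist and all request 1.4."""
    minors = requested_minors(source)
    return len(minors) == expected_sites and all(minor == 4 for minor in minors)
```

Checking the site count as well as the value catches both a reverted bump and a site that moved to a spelling the scan does not recognise.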
ADR-0332 v2 runtime (T5-2c) — Embedded MCP server UDS + real compute_vmaf (2026-05-09)¶
- Touches:
core/src/mcp/{mcp.c,dispatcher.c,mcp_internal.h,meson.build,compute_vmaf.c,transport_uds.c}, core/test/test_mcp_smoke.c. All paths are fork-local. There is no new third-party vendor drop in v2; mongoose vendoring stays deferred to v3 with the SSE transport.
- Invariant: same as ADR-0209 v1. The entire core/src/mcp/ subtree is fork-local. The public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged; only two function bodies flipped:
  - vmaf_mcp_start_uds, from -ENOSYS to a working AF_UNIX listener;
  - compute_vmaf, from a {"status":"deferred_to_v2"} placeholder to a real vmaf_score_pooled binding.
  Per ADR-0128 § operational guardrails, the UDS socket file is created with mode 0700. That chmod happens in vmaf_mcp_start_uds after bind and is a load-bearing security invariant; do NOT relax it on rebase. compute_vmaf runs on a per-call ephemeral VmafContext so the host's main scoring run is unperturbed. Do NOT rewire it to reuse server->ctx, because vmaf_score_pooled commits the model destructively to the context.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. If upstream adds one, expect a port-only sync since names will collide.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
-Denable_mcp=true -Denable_mcp_stdio=true \
-Denable_mcp_uds=true
ninja -C build && meson test -C build test_mcp_smoke -v
# Real-score smoke (single 576x324 pair):
build/test/test_mcp_smoke 2>&1 | tail -3 # expects "16 tests run, 16 passed"
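The bind-then-chmod ordering for the UDS socket can be illustrated in Python; the C implementation in vmaf_mcp_start_uds is not reproduced here. This sketch assumes a POSIX host. start_uds_listener is a hypothetical name, and unlinking a stale socket path first is an addition not described in the notes.

```python
import os
import socket


def start_uds_listener(path: str, backlog: int = 8) -> socket.socket:
    """Bind an AF_UNIX stream listener, then restrict the socket file to 0700."""
    if os.path.exists(path):
        os.unlink(path)  # stale socket file from an earlier run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)        # creates the socket file (mode derived from umask)
        os.chmod(path, 0o700)  # load-bearing: owner-only access after bind
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```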
ADR-0332 v3 runtime (T5-2d) — Embedded MCP server SSE transport (2026-05-09)¶
- Touches:
core/src/mcp/{mcp.c,mcp_internal.h,meson.build,transport_sse.c}, core/meson_options.txt, core/test/test_mcp_smoke.c, docs/mcp/embedded.md, docs/adr/0332-mcp-runtime-v2.md (status-update appendix). All paths are fork-local. There is no third-party vendor drop in v3. The originally planned mongoose vendor was reversed because cesanta/mongoose 7.18 is GPL-2.0-only OR commercial, which is incompatible with the fork's BSD-3-Clause-Plus-Patent license (verified against the upstream LICENSE on 2026-05-09). The SSE transport is plain POSIX sockets in fork-owned C (~500 LOC).
- Invariant: same as ADR-0209 / ADR-0332 v2. The entire core/src/mcp/ subtree is fork-local. The public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged; only vmaf_mcp_start_sse's body flipped, from -ENOSYS to a working AF_INET listener. Three further rules apply:
  - The SSE listener binds INADDR_LOOPBACK only. Do NOT switch to INADDR_ANY without a separate ADR + auth design. v3 intentionally ships without CORS / Bearer / per-session auth, on the assumption of a same-host trust boundary.
  - The SSE stop path calls shutdown(SHUT_RDWR) before close(). On Linux, a plain close() of an AF_INET listening fd from another thread does NOT unblock accept(), so do NOT remove the shutdown call.
  - enable_mcp_sse is now a feature option (default auto), not boolean false.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. Do NOT re-introduce mongoose (or any GPL-licensed HTTP library) on a future rebase without first amending CLAUDE §1 and adding a separate license-compatibility ADR.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
-Denable_mcp=true -Denable_mcp_stdio=true \
-Denable_mcp_uds=true \
-Denable_mcp_sse=enabled
ninja -C build && meson test -C build test_mcp_smoke -v
build/test/test_mcp_smoke 2>&1 | tail -3 # expects "17 tests run, 17 passed"
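The shutdown-before-close pattern is easy to get wrong. This Python sketch shows a loopback-only listener whose stop() wakes a thread blocked in accept(). It targets Linux, the platform the invariant describes, and every name in it is hypothetical. On other platforms shutdown() on a listening socket may fail; the sketch tolerates that but does not work around it.

```python
import socket
import threading
from typing import Optional


class LoopbackListener:
    """Loopback-only TCP listener whose stop() wakes a thread blocked in accept()."""

    def __init__(self) -> None:
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind(("127.0.0.1", 0))  # INADDR_LOOPBACK only; ephemeral port
        self.sock.listen(8)
        self.accept_error: Optional[OSError] = None
        self.thread = threading.Thread(target=self._accept_loop, daemon=True)
        self.thread.start()

    def _accept_loop(self) -> None:
        while True:
            try:
                conn, _addr = self.sock.accept()
            except OSError as exc:  # raised once stop() shuts the socket down
                self.accept_error = exc
                return
            conn.close()  # a real server would hand the stream off here

    def stop(self, timeout: float = 5.0) -> None:
        try:
            self.sock.shutdown(socket.SHUT_RDWR)  # wakes accept() on Linux
        except OSError:
            pass  # some platforms refuse shutdown() on a listening socket
        self.sock.close()  # close() alone would leave accept() blocked on Linux
        self.thread.join(timeout)
```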
Status update 2026-05-09 — placeholder-ref hardening¶
- Additional touches: the same set as the 2026-05-08 ADR-0334 entry, with no new files. The hardening adds a git diff -U0 ... -- docs/state.md call inside scripts/ci/state-md-touch-check.sh (case 4a), plus 10 additional fixture cases in scripts/ci/test-state-md-touch-check.sh.
- New invariant: inserted lines in docs/state.md (lines starting with +, excluding the +++ b/... header) must not contain this PR / this commit / bare TBD / <PR> / #NNN. The canonical accepted forms are PR #N and commit `<sha>`. The placeholder vocabulary is coupled to PR #541's audit findings. If the fork's row template changes, reword it in lockstep with the ADR-0334 status-update appendix.
- Re-test on rebase: the same bash scripts/ci/test-state-md-touch-check.sh run as in the 2026-05-08 entry. The harness now reports 18/18 passed (was 8/8 passed).
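The placeholder rule can be expressed as a small filter over git diff -U0 output. This Python sketch uses an assumed regex built from the vocabulary listed above. The script's exact matching (case sensitivity, word boundaries) may differ.

```python
import re

# Assumed vocabulary from the rule above; the case handling is a guess.
PLACEHOLDER = re.compile(r"(?i:this PR|this commit)|\bTBD\b|<PR>|#NNN")


def placeholder_hits(diff_text: str) -> list[str]:
    """Inserted lines (leading '+', not the '+++ ' header) that hold a placeholder."""
    hits: list[str] = []
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++ "):
            if PLACEHOLDER.search(line[1:]):
                hits.append(line[1:])
    return hits
```

Removed lines ("-" prefix) are ignored, so cleaning up an old placeholder never trips the check.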
0347 — Sanitizer matrix test-set scope (ADR-0347)¶
- Touches:
.github/workflows/tests-and-quality-gates.yml job sanitizers (build + test step), core/test/meson.build (no edits: the absence of any suite: 'unit' tag is the upstream state we now work with rather than against).
- Invariant: the sanitizer job runs the full C unit-test set per sanitizer, with a per-sanitizer deselect list driven by a case block on ${{ matrix.sanitizer }}. The deselect lists are load-bearing: each entry corresponds to a real bug tracked in docs/state.md. Under UBSan the build adds -Dc_args=-fno-sanitize=function -Dcpp_args=-fno-sanitize=function to suppress the K&R-prototype harness UB. The workflow's case branch must keep this build flag in sync with the test deselect entries. An upstream rebase that adds new test files via core/test/meson.build inherits full sanitizer coverage automatically, because the workflow enumerates tests via meson test --list.
- On upstream sync: if upstream Netflix lands a suite: 'unit' tagging convention, the workflow is robust to it, since it enumerates from meson test --list rather than from --suite=unit. If upstream rewrites the harness to declare static char *test_X(void) with a (void) parameter, the -fno-sanitize=function flag becomes redundant. Leave it in place (zero cost) until a deliberate cleanup PR reverts the suppression. Upstream may also land a fix for one of the surfaced defects: SVMModelParser validation, the feature_collector metadata leak, the integer_adm::div_lookup race, or the framesync mutex mismatch. In that case, drop the corresponding deselect row from the workflow's case block in the same PR that pulls the upstream fix.
cd libvmaf
for SAN in address undefined thread; do
  EXTRA=()
  [ "$SAN" = undefined ] && EXTRA=(
    "-Dc_args=-fno-sanitize=function"
    "-Dcpp_args=-fno-sanitize=function"
  )
  rm -rf "build-$SAN"
  CC=clang CXX=clang++ LDFLAGS=-fuse-ld=lld \
    meson setup "build-$SAN" -Db_sanitize="$SAN" \
      -Denable_cuda=false -Denable_sycl=false --buildtype=debug \
      -Db_lto=false -Db_lundef=false "${EXTRA[@]}"
  meson compile -C "build-$SAN"
  case "$SAN" in
    address)   EXCLUDE='test_model$|test_predict$|test_float_ms_ssim_min_dim$' ;;
    undefined) EXCLUDE='test_model$' ;;
    thread)    EXCLUDE='test_model$|test_pic_preallocation$|test_framesync$' ;;
  esac
  TESTS=$(meson test -C "build-$SAN" --list \
    | grep '^libvmaf:' \
    | grep -vE "$EXCLUDE" \
    | sed 's/^libvmaf://')
  meson test -C "build-$SAN" --print-errorlogs $TESTS
done
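The grep / sed pipeline above can be mirrored in Python to check what a deselect regex actually removes. The `$` anchor matters: test_model$ drops test_model but keeps a hypothetical test_model_io. The EXCLUDES table copies the case block above. selected_tests is a hypothetical helper that assumes meson test --list prints libvmaf:-prefixed names, as the pipeline does.

```python
import re

# Copied from the workflow's case block.
EXCLUDES = {
    "address": r"test_model$|test_predict$|test_float_ms_ssim_min_dim$",
    "undefined": r"test_model$",
    "thread": r"test_model$|test_pic_preallocation$|test_framesync$",
}


def selected_tests(list_output: str, sanitizer: str) -> list[str]:
    """Keep libvmaf: tests that the sanitizer's deselect regex does not match."""
    exclude = re.compile(EXCLUDES[sanitizer])
    return [
        line.removeprefix("libvmaf:")
        for line in list_output.splitlines()
        if line.startswith("libvmaf:") and not exclude.search(line)
    ]
```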
### CodeQL bulk mechanical sweep — Python tree (2026-05-09)
- Why this matters on rebase: no rebase impact. The diff lives entirely in `python/vmaf/` and one fork-local helper (`core/src/vulkan/spv_embed.py`). None of the touched Python modules have been changed by Netflix upstream in over four years; the closest churn is unrelated additions to `python/vmaf/script/run_*.py` driver flags. A future `/sync-upstream` will land on a clean tree.
- What changed: dead imports removed; `exit()` → `sys.exit()` in seven CLI driver scripts; `open(...)` → `with open(...)` in `python/vmaf/tools/decorator.py` and `core/src/vulkan/spv_embed.py`; typed `except KeyError: pass` bodies got an explanatory one-line comment to satisfy `py/empty-except`; `pass` removed where it was a no-op tail statement; one commented-out debug block deleted from `tools/misc.py`.
- Re-test on rebase: `python3 -c "import ast; [ast.parse(open(f).read()) for f in (...)]"` over the touched files; `ruff check` over the same set must produce no NEW errors versus master baseline.
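
The AST one-liner only proves that the touched files still parse. A slightly stronger local check also flags any bare `exit()` call that slipped back in. This matters because `exit` is injected by the `site` module and is not guaranteed to exist (for example under `python -S`). The helpers below are hypothetical and not part of the repository.

```python
import ast


def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def bare_exit_calls(source: str) -> list[int]:
    """Line numbers of calls to the site-provided `exit()` (not `sys.exit()`)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        # `exit(...)` is a Call whose func is a plain Name; `sys.exit(...)`
        # is a Call whose func is an Attribute, so it is not matched here.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "exit"):
            lines.append(node.lineno)
    return sorted(lines)
```

To run the check over files, read each one with `pathlib.Path(p).read_text()` and pass the text to both helpers.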
### 0345 — cambi × {CUDA, SYCL, HIP} GPU port planning (ADR-0345, docs-only)
- Touches: `docs/research/0091-cambi-gpu-port-planning-2026-05-09.md` (new), `docs/adr/0345-cambi-gpu-port-strategy.md` (new), `docs/adr/_index_fragments/0345-cambi-gpu-port-strategy.md` (new fragment), `docs/adr/_index_fragments/_order.txt` (append slot), `changelog.d/changed/cambi-gpu-planning-digest.md` (new). No code. Companion to the per-port PRs that follow the digest's §6 ordered plan (CUDA → SYCL → HIP).
- Upstream source: none; this is a fork-local planning artefact. Netflix/vmaf upstream has no CUDA / SYCL / HIP cambi twin and no plans to add one on those backends.
- Invariant: the planning round locks Strategy II (host-staged hybrid) for the three pending backends, inheriting verbatim from ADR-0205 §Decision and ADR-0210 §Decision. The cross-backend gate contract for cambi is `places=4` from day one on all backends. This holds by construction: integer-only GPU pre-passes, byte-identical readback, and an unmodified host residual. If any per-port PR sees empirical drift from CPU, fix the kernel; never relax the gate (memory `feedback_no_test_weakening`). The shared `cambi_internal.h` host residual surface (shipped with PR #196 for the Vulkan port) is the load-bearing reuse point. All four GPU twins (Vulkan, CUDA, SYCL, HIP) link against it and inherit any future CPU-side c-value formula change automatically.
- On upstream sync: no action required. If a future upstream sync introduces a Netflix/vmaf cambi GPU twin (extremely unlikely, since Netflix has no public CUDA / SYCL / HIP cambi work), evaluate whether to drop the fork's twin in favour of upstream's per the standard prefer-upstream rule; otherwise no action.
- Re-test on rebase: docs-only — no compile / runtime gate. The Strategy III v2 follow-up (parked per ADR-0205 §Out of scope) gets its own ADR + rebase-notes entry when profile data lands.
### 0320 — Vulkan VIF API-1.4 NVIDIA residual Phase 3b (deferral)
- Touches:
  - `core/src/feature/vulkan/shaders/vif.comp`: comment-only update at the Phase-4 reduction site documenting the Phase-3b candidate-fix experiments and the driver-side hypothesis; no code logic change vs. PR #511.
  - `docs/adr/0269-vif-ciede-precise-step-a.md`: appended a Phase-3b status-update appendix; the ADR body remains frozen per ADR-0028.
  - `docs/research/0090-...md`: new.
  - `docs/state.md`: row `T-VK-VIF-1.4-RESIDUAL-ARC` retired in favour of `T-VK-VIF-1.4-RESIDUAL-NVIDIA-DEFERRED` after the hardware-mapping correction.
  - `core/src/vulkan/AGENTS.md`: Phase 3b update + rebase invariant for cross-backend gate device-name selection.
  - `changelog.d/fixed/vif-arc-mesa-anv-int64-reduction.md`: new fragment.
- Invariant: the workgroup-scope `memoryBarrierShared(); barrier();` pair that PR #511 introduced is load-bearing for the Arc + RADV lanes at API 1.4 and stays. Phase 3b confirmed it cannot be downgraded back to a bare `barrier()` even if the NVIDIA residual ever closes, because Arc's clean state is contingent on the workgroup-scope pair.
- Cross-backend gate device-selection invariant (NEW): scripts that target a specific Vulkan vendor must select by `deviceName` substring, not by `--vulkan_device <index>`. The reason is that `vmaf_vulkan_context_new`'s device sort is stable inside the same `devtype_score` bucket. Devices of the same type therefore keep the `vkEnumeratePhysicalDevices` enumeration order, and that order is host-policy-dependent. It depends on driver registration order in `/etc/vulkan/icd.d/`, the Mesa device-select layer, and `VK_LOADER_*` env vars.
  - PR #511's commit message inverted the device map on this fork's CI workstation: the empirical numbers it cited as "NVIDIA" actually came from Arc, and vice versa.
  - New cross-backend lanes targeting a specific vendor should not inherit that off-by-one index mapping.
- On upstream sync: `vif.comp` is fork-local. Upstream Netflix/vmaf has no Vulkan path, so cherry-picks from upstream cannot reach this file.
- Re-test on rebase: assumes a multi-GPU CI workstation with NVIDIA + Arc + RADV. Lavapipe-only CI lanes are a no-op for the API-1.4 residual, since lavapipe never reproduced the bug.

```bash
# Local API-1.4 bump (off-master reproducer; do NOT commit).
sed -i 's/VK_API_VERSION_1_3/VK_API_VERSION_1_4/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1003000/VMA_VULKAN_VERSION 1004000/' \
  core/src/vulkan/vma_impl.cpp
cd libvmaf && meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=false -Denable_sycl=false && ninja -C build
cd ..

# NVIDIA lane: expected 45/48 FAIL at scale 2 until either the
# manual int64 subgroup-reduction patch lands or NVIDIA fixes
# the driver. Arc + RADV expected 0/48.
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature vif --backend vulkan --device

# Revert local bump after testing.
sed -i 's/VK_API_VERSION_1_4/VK_API_VERSION_1_3/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1004000/VMA_VULKAN_VERSION 1003000/' \
  core/src/vulkan/vma_impl.cpp
```
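
The sketch below shows selection by name instead of by index. It assumes the script already has the device names in the same order that `--vulkan_device` indexes them; how to obtain that list is left open. `pick_device_index` is a hypothetical helper rather than an existing API. It fails loudly on zero or multiple matches, because an ambiguous substring would otherwise reintroduce the silent device swap described above.

```python
def pick_device_index(device_names: list[str], substring: str) -> int:
    """Return the index of the single device whose name contains `substring`.

    Matching is case-insensitive. Raises ValueError when no device or more
    than one device matches, so a lane never silently runs on the wrong vendor.
    """
    needle = substring.lower()
    matches = [i for i, name in enumerate(device_names) if needle in name.lower()]
    if not matches:
        raise ValueError(f"no Vulkan device matches {substring!r}: {device_names}")
    if len(matches) > 1:
        ambiguous = [device_names[i] for i in matches]
        raise ValueError(f"{substring!r} is ambiguous: {ambiguous}")
    return matches[0]
```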
### Upstream-port-later batch — Research-0090 18-commit triage close-out (2026-05-09)
- Touches: `docs/state.md` (one row in "Deferred (waiting on external trigger)"), this file, `changelog.d/changed/upstream-port-later-batch-2026-05-09.md`. No code touched. Companion to PR #446 (Research-0090) and the in-flight PRs #497 (MyTestCase super-PR) and #443 / #444 (cambi-docs duplicate pair).
- Per-commit classification (input set: 18 PORT_LATER SHAs from Research-0090):

| # | Upstream SHA | Subject (truncated) | Verdict | Reopen / forward path |
|---|---|---|---|---|
| 1 | 38e905d1 | adopt MyTestCase + reformat BD-rate test data | PORT_DEFERRED | Subsumed by PR #497 commit e1dbdc09; close out when #497 merges |
| 2 | 005988ea | adopt MyTestCase + port new tests + align fifo_mode | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 3 | 4679db83 | fix VMAFEXEC_score tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit 0004d2cf — must preserve fork's golden places= values byte-for-byte (CLAUDE §8 / ADR-0024) |
| 4 | 3e075107 | adopt MyTestCase + update score values in vmafexec tests | PORT_DEFERRED | Subsumed by PR #497 commit 0004d2cf; close out when #497 merges |
| 5 | e3827e4d | adopt MyTestCase + port new tests in asset/bootstrap/local_explainer | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 6 | 25ff9f18 | remove empty VmafossexecCommandLineTest stub | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 13-line deletion. PR #497 currently RE-EMITS the stub; once #497 lands, cherry-pick this commit standalone (zero-conflict against post-#497 tip). |
| 7 | 3a041a97 | adopt MyTestCase + update score values | PORT_DEFERRED | Subsumed by PR #497 commit d52d9221; close out when #497 merges |
| 8 | ead2d12b | fix vif_scale3 + adm3_egl_1 tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit b5a3f61b — Netflix-golden tolerance guard same as row 3 |
| 9 | 6c097fc4 | reduce ADM/VIF tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit f3881d5c — Netflix-golden tolerance guard same as row 3 |
| 10 | 7df50f3a | align testutil with full set of fixture functions | PORT_DEFERRED | Subsumed by PR #497 commit f1ae0495; close out when #497 merges |
| 11 | 322ca041 | replace temporal slicing with pre-sliced YUV fixtures | PORT_DEFERRED | Subsumed by PR #497 commit 7d9d9a10; close out when #497 merges. Sequencing matters: this commit must land before rows 12, 14, 15, 17 (the YUV-fixture consumers); #497 already orders them correctly. |
| 12 | 74bdce1b | align vmafexec_feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 07e7cb48; close out when #497 merges |
| 13 | a3776335 | align feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 15a6874d; close out when #497 merges |
| 14 | 0341f730 | remove duplicate test_run_vmaf_integer_fextractor | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 76-line deletion. Same disposition as row 6 — #497 currently re-emits the duplicate; cherry-pick standalone after #497. |
| 15 | 9fa593eb | port feature_extractor tests for aim/adm3/motion3 + new options | PORT_DEFERRED | Subsumed by PR #497 commit ab21b694; close out when #497 merges |
| 16 | d93495f5 | reduce tolerance for VMAF scores in quality_runner tests | PORT_DEFERRED w/ Netflix-golden guard | PR #497 — Netflix-golden tolerance guard same as row 3 |
| 17 | 7d1ad54b | port feature extractor tests for aim/adm3/motion3 | PORT_DEFERRED | Subsumed by PR #497 commit 44b9e626; close out when #497 merges |
| 18 | 721569bc | resource/doc: cambi_high_res_speedup + motion2 score | PORT_DEFERRED → DEDUP | Already in flight on TWO branches (PR #443 + PR #444). Maintainer picks one and abandons the other per Research-0090 §Recommended action #4. No third port-PR opened. |
- Invariant: after PR #497 merges, the Research-0090 PORT_LATER bucket reduces to exactly two follow-up cherry-picks against post-#497 master:
  - `git cherry-pick 25ff9f18`: delete empty `VmafossexecCommandLineTest`.
  - `git cherry-pick 0341f730`: delete duplicate `test_run_vmaf_integer_fextractor`.

  Both are pure deletions, on `python/test/command_line_test.py` and `python/test/feature_extractor_test.py` respectively. Neither changes a score or interacts with the Netflix goldens. They were excluded from PR #497 because the v2 super-PR's diff state currently RE-EMITS those identifiers. The likely cause is that #497 cherry-picked from an earlier upstream tip than `25ff9f18` / `0341f730`.
- Netflix-golden guard (binding): per CLAUDE §8 / ADR-0024, the three Netflix CPU golden pairs (1 normal `src01_hrc00` ↔ `hrc01` + 2 checkerboard) carry hard-coded `assertAlmostEqual` rows that are NEVER modified by a fork PR. Those rows live in `python/test/quality_runner_test.py`, `vmafexec_test.py`, `vmafexec_feature_extractor_test.py`, `feature_extractor_test.py`, and `result_test.py`.
  - Upstream commits `4679db83`, `ead2d12b`, `6c097fc4`, `d93495f5` explicitly LOWER `places=` on a subset of those rows. Their stated motivation is macOS FP precision drift, not a true score change.
  - The reviewer of PR #497 must verify that the 3 golden pairs retain fork tolerances byte-for-byte. Only non-golden rows may adopt the relaxations.
- On upstream sync: future `/sync-upstream` runs that re-detect these 18 SHAs should match this entry via the SHA list and short-circuit Pass-2 classification (skip re-triage).
- Re-test on rebase: none required at the time of this commit (no code touched). After the two follow-up cherry-picks (`25ff9f18` + `0341f730`) eventually land, run:

```bash
meson test -C build --suite=fast
make test-netflix-golden  # 3/3 CPU goldens still pass
```
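
The short-circuit rule amounts to a set-membership test on the 8-character SHA prefixes in the table. The sketch below is illustrative only: `TRIAGED_2026_05_09`, `FOLLOW_UP_CHERRY_PICKS`, and `needs_retriage` are hypothetical names, not part of the `/sync-upstream` skill. The SHAs themselves are copied from the table.

```python
# The 18 PORT_LATER SHAs triaged in this entry (Research-0090), table order.
TRIAGED_2026_05_09 = frozenset({
    "38e905d1", "005988ea", "4679db83", "3e075107", "e3827e4d", "25ff9f18",
    "3a041a97", "ead2d12b", "6c097fc4", "7df50f3a", "322ca041", "74bdce1b",
    "a3776335", "0341f730", "9fa593eb", "d93495f5", "7d1ad54b", "721569bc",
})

# Rows marked "CHERRY-PICK after #497": the only follow-ups once #497 merges.
FOLLOW_UP_CHERRY_PICKS = ("25ff9f18", "0341f730")


def needs_retriage(detected: set[str]) -> set[str]:
    """Return detected upstream SHAs that this entry does not already cover.

    Detected SHAs may be full 40-character hashes, so compare on the
    8-character prefix used in the table.
    """
    return {sha for sha in detected if sha[:8] not in TRIAGED_2026_05_09}
```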
### 0356 — Vulkan two-level GPU reduction for VIF / ADM / motion
- Touches: `core/src/feature/vulkan/vif_vulkan.c`, `adm_vulkan.c`, `motion_vulkan.c`, `core/src/vulkan/picture_vulkan.{h,c}`, `core/src/vulkan/meson.build`, `core/src/feature/vulkan/shaders/vif_reduce.comp`, `adm_reduce.comp`, `motion_reduce.comp`.
- Invariant: the reducer shaders have fixed accumulator layouts:
  - `vif_reduce.comp`: `ACCUM_FIELDS=7`.
  - `adm_reduce.comp`: `ACCUM_SLOTS=6`.
  - `motion_reduce.comp`: single-field.

  If an upstream sync adds or removes fields from the per-WG accumulator layout in `vif.comp` / `adm.comp` / `motion.comp`, update the corresponding reducer shader and the host-side `VIF_ACCUM_FIELDS` / `ADM_ACCUM_SLOTS_PER_WG` constants in lockstep. A mismatch is a silent miscompute.
- Re-test: `meson test -C build --suite=fast`, then run `scripts/ci/cross-backend-diff.sh --backend=vulkan --places=4` against the Netflix normal pair on any available Vulkan device.
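
A small model of the two-level reduction shows why the field counts must move in lockstep. The Python below assumes each workgroup writes its accumulators contiguously (`partials[wg * fields + f]`). That layout and both helper names are illustrative assumptions, not the shaders' actual code. Reading the buffer with a stale field count still yields plausible integers, which is why a mismatch is a silent miscompute rather than a crash.

```python
def first_level(per_wg_samples: list[list[tuple[int, ...]]], fields: int) -> list[int]:
    """Level 1: each workgroup sums its samples into `fields` accumulators.

    Returns the flat buffer a GPU pass would write: workgroup 0's fields,
    then workgroup 1's fields, and so on.
    """
    flat: list[int] = []
    for samples in per_wg_samples:
        acc = [0] * fields
        for sample in samples:
            for f in range(fields):
                acc[f] += sample[f]
        flat.extend(acc)
    return flat


def reduce_partials(flat: list[int], fields: int) -> list[int]:
    """Level 2: sum each field across all workgroups using stride `fields`."""
    totals = [0] * fields
    for i, value in enumerate(flat):
        totals[i % fields] += value
    return totals
```

With 2 workgroups and 3 fields, the correct reduction gives `[15, 27, 39]`. Reading the same buffer with a stale count of 2 gives `[34, 47]` and raises no error.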
Single ledger of fork-local changes that need attention when this fork syncs from upstream/master (Netflix/vmaf). Required by ADR-0108: every fork-local PR that touches upstream-shared paths or establishes a rebase-sensitive invariant adds an entry here. PRs with no rebase impact state "no rebase impact" in the PR description and skip the entry.
The intended reader is whoever runs the next /sync-upstream (see ADR-0002 and .claude/skills/sync-upstream/). Read top-to-bottom before resolving conflicts.
## Format
Each entry is a `### NNNN — short title` heading with three fields:
- Touches: paths likely to conflict on upstream merge.
- Invariant: what the fork relies on that an upstream change could silently drop.
- Re-test: the command(s) to run after the merge to confirm the invariant survived. Reproducer-style — no surrounding prose required.
IDs are assigned in commit order and never reused. A single entry may cover several PRs in one workstream; cross-link from the ID heading.
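
A minimal linter for this format could look like the sketch below. It is a hypothetical helper, not an existing repository script. It prefix-matches the field labels, so the variants the ledger actually uses (such as `- Invariants:` and `- Re-test on rebase:`) are accepted. It also reports IDs that appear on more than one heading. Headings that do not start with a four-digit ID (such as the `ADR-NNNN` headings) are skipped.

```python
import re
from collections import Counter

# "### 0320 <em dash> title": four-digit ID, then U+2014 as the separator.
HEADING = re.compile(r"^### (\d{4}) \u2014 .+$")
REQUIRED = ("- Touches", "- Invariant", "- Re-test")


def lint_ledger(text: str) -> list[str]:
    """Return human-readable problems found in a rebase-notes ledger."""
    entries: list[tuple[str, list[str]]] = []
    current = None
    for line in text.splitlines():
        if line.startswith("### "):
            match = HEADING.match(line)
            current = (match.group(1), []) if match else None
            if current is not None:
                entries.append(current)
        elif current is not None:
            current[1].append(line)

    problems = []
    for entry_id, body in entries:
        for label in REQUIRED:
            if not any(line.startswith(label) for line in body):
                problems.append(f"{entry_id}: missing '{label}' field")
    for entry_id, count in Counter(e for e, _ in entries).items():
        if count > 1:
            problems.append(f"{entry_id}: ID used {count} times")
    return problems
```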
## Entries (backfilled 2026-04-18 per ADR-0108 adoption)
### 0332 — Agent worktree-drift hard guard (ADR-0332)
- Touches:
  - `.pre-commit-config.yaml`: one new `local` hook id.
  - `Makefile`: `hooks-install` target; comment-only edit.
  - `scripts/ci/check-agent-worktree-drift.sh`: new.
  - `scripts/ci/test_check_agent_worktree_drift.sh`: new.
  - `AGENTS.md`: new section §12a.
  - `docs/development/agent-worktree-discipline.md`: new.
  - `docs/adr/0332-*.md`: new.
  - `docs/adr/_index_fragments/`: new fragment + `_order.txt` append.
  - `changelog.d/added/agent-worktree-drift-guard.md`: new.
- Invariant on upstream-mirror files: none; every touched path is fork-local. The pre-commit hook ID `agent-worktree-drift-guard` is unique to this fork. Upstream Netflix/vmaf has no `.pre-commit-config.yaml` at all, so there is no upstream `local` hook block to collide with.
- On upstream sync: no expected conflict. The `.pre-commit-config.yaml` block is fork-only. If Netflix ever introduces its own pre-commit config, rebase the `agent-worktree-drift-guard` `local` hook on top of upstream's blocks; the YAML structure is independent.
- Re-test on rebase:

```bash
bash scripts/ci/test_check_agent_worktree_drift.sh

# End-to-end: refused commit from main with active agent.
cd "$(git -C . rev-parse --show-toplevel)" && \
  bash scripts/ci/check-agent-worktree-drift.sh ; echo "exit=$?"

# Allowed commit from inside an agent worktree.
cd "$(git -C . rev-parse --show-toplevel)/.claude/worktrees/agent-
```
### 0320 — `psnr_cuda` chroma extension (ADR-0351)
- Touches:
  - `core/src/feature/cuda/integer_psnr/psnr_score.cu`: kernel; fork-only file, BSD-3-Clause-Plus-Patent / Lusoris+Claude header.
  - `core/src/feature/cuda/integer_psnr_cuda.c`: host glue; fork-only file, same header.

  Neither path is in upstream Netflix/vmaf. The PTX module key (`psnr_score`) and the `nvcc` extra-flags table in `core/src/meson.build` are unchanged by this PR.
- Invariant: the kernel signature now takes a `plane` parameter (`unsigned`) appended after `(width, height)`.
  - The host file's `psnr_cuda_dispatch` packs `kernelParams[]` in the exact `(ref, dis, sse, &width, &height, &plane)` order. Any refactor that reorders or drops the trailing argument silently breaks the chroma path, because `cuLaunchKernel` cannot validate argument types.
  - The host also relies on the `picture_cuda` upload path having uploaded all 3 planes for non-`YUV400P` inputs (see `libvmaf.c::translate_picture_host`'s `upload_mask`).
  - A future "minimise upload" optimisation must consult the extractor's `chars` to decide which planes can be skipped, not assume luma-only.
- Upstream interaction: none, because the CUDA backend is not in Netflix/vmaf upstream. Cherry-picks from upstream that touch `core/src/feature/integer_psnr.c` (the CPU twin) need attention only if they change `psnr_name[]`, `mse_name[]`, or the `enable_chroma` semantics. The CUDA path mirrors those conventions byte-for-byte to keep the cross-backend gate clean.
- Re-test on rebase:

```bash
cd libvmaf && meson setup build -Denable_cuda=true && ninja -C build
python ../scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build/tools/vmaf \
  --reference ../testdata/ref_576x324_48f.yuv \
  --distorted ../testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr --backend cuda --places 4
```
### ADR-0994 — coverage build-break fix: remove `vmaf_fex_integer_motion_v2` from `feature_extractor.cpp` + guard `motion_five_frame_window` in `integer_motion.c` (2026-06-03)
- Touches: `core/src/feature/integer_motion.c`, `core/src/feature/feature_extractor.cpp`, `docs/adr/0994-coverage-build-fix-motion-v2-ref.md` (new), `docs/adr/README.md`, `docs/state.md`, `docs/rebase-notes.md`, `changelog.d/fixed/0994-coverage-build-fix-motion-v2-ref.md` (new).
- No rebase-sensitive invariants: bug fix only. Both source files touched are production C/C++; no option table or ABI surface changed.
- Relation to ADR-0337: when the `prev_prev_ref` picture-pool refactor (ADR-0337 deferred hunk) lands, do the following:
  - Flip the `-ENOTSUP` guard in `integer_motion.c::init()`.
  - Restore the `fex->prev_prev_ref` reference in `extract()`.
  - Remove the `motion_five_frame_window` comment block added here.
### ADR-0337 — motion_v2 public option surface duplication (2026-05-09)
- Touches:
  - `core/src/feature/integer_motion_v2.c`, `core/src/feature/x86/motion_v2_avx2.c`, `core/src/feature/x86/motion_v2_avx512.c`, `core/src/feature/arm64/motion_v2_neon.c`.
  - `docs/adr/0337-motion-v2-public-api-options.md` (new), `docs/adr/_index_fragments/0337-motion-v2-public-api-options.md` (new), `docs/adr/_index_fragments/_order.txt` (one-line append), `docs/adr/README.md` (regenerated).
  - `changelog.d/added/motion-v2-public-api-options.md` (new), `changelog.d/fixed/motion-v2-mirror-off-by-one.md` (new).
  - `docs/state.md` (deferral row update), `core/src/feature/AGENTS.md` (invariant note).
- Upstream cluster ported from Netflix/vmaf:
  - `856d3835`: mirror off-by-one fix, propagated to scalar + AVX2 + AVX-512 + fork-local NEON.
  - `c17dd898`: `motion_max_val` option.
  - `a2b59b77`: `motion_five_frame_window` option, partial; see the deferred hunks below.
  - `4e469601`: remaining options + `motion3_v2_score` provided feature, manual port adapted to 3-frame mode only.
- Architectural decision: ADR-0337 picks A1, duplicating the option surface between motion v1 and motion_v2.
  - v1 (`integer_motion.c`) and v2 (`integer_motion_v2.c`) each register their own `VmafOption[]` table.
  - The seven option names match upstream byte-for-byte, so future `/sync-upstream` runs find no behavioural delta.
  - The duplication is purely textual (~80 LOC of option-table rows + 7 struct fields). Touching one extractor's help string requires touching the other; ADR-0141 catches drift on the next edit.
- Invariants:
  - `motion_five_frame_window=true` returns `-ENOTSUP` at `init()` on motion_v2, mirroring the GPU motion3 precedent in ADR-0219 §Decision. The 3-frame default mode is fully supported. When the picture-pool plumbing follow-up lands, the `-ENOTSUP` guard flips to a `prev_prev_ref` lookup. Until then, any caller passing `=true` sees a hard error.
  - motion v1's option surface is the source of truth for the seven shared option names. ADR-0158 carries v1's history; ADR-0337 carries v2's. Both extractors emit independently into the feature collector under `VMAF_integer_feature_motion*_score` (v1) and `VMAF_integer_feature_motion*_v2_score` (v2). There is no shared output namespace.
  - GPU twins (CUDA / SYCL / HIP / Vulkan) of `motion_v2` do NOT yet register the option surface in this PR. Their `motion3_v2_score` emission is out of scope per ADR-0337 §Consequences. Whether the GPU twins gain the same options will be decided when a model needs the score there.
  - The mirror off-by-one fix in `856d3835` is propagated only to scalar + AVX2 + AVX-512 + NEON in this PR. The CUDA / SYCL / HIP / Vulkan mirror formulae stay on the pre-fix `2*size - idx - 1` form and document the divergence. The refresh is tracked as a follow-up.
- Deferred upstream hunks: present in the upstream commits but not ported in this PR, and tracked here so the next `/sync-upstream` finds them.
  - `a2b59b77` hunks in three files:
    - `core/src/feature/feature_extractor.h`: adds `VmafPicture prev_prev_ref` to the per-extractor framework struct.
    - `core/src/libvmaf.c`: picture-pool sizing `n_threads * 2 + 2`; `prev_prev_ref` plumbing through `threaded_extract_func` / `threaded_extract_batch_func` / `threaded_read_pictures` / `threaded_read_pictures_batch`; `vmaf_close` cleanup.
    - `core/tools/vmaf.c`: CLI picture-pool sizing.

    Conflicts in 8 regions on the fork's `read_pictures*` decomposition (ADR-0152 monotonic-index gate) make an in-PR port unsafe. The picture-pool refactor will land as its own PR, with a five-frame-window fixture under `python/test/`, once the framework changes are reviewed in isolation.
  - `a2b59b77`'s 5-frame branch in `extract()` (uses `fex->prev_prev_ref` directly) lands together with the framework hunks above.
  - `4e469601`'s 5-frame `flush()` branch (variable `stride` / `min_idx = 2`, `lo_idx` / `hi_idx` window) is collapsed in this PR to the `min_idx = 1` constant for 3-frame mode only. The branch is structurally one `if (s->motion_five_frame_window)` away from full upstream parity. Reinstate it when the `prev_prev_ref` plumbing lands.
- On upstream sync: future `/sync-upstream` runs against `Netflix/vmaf master` will report all four commits as already ported (cherry-picked or hand-ported with `(cherry picked from …)` trailers). When the picture-pool refactor PR lands, this ledger row gains a "5-frame mode now wired" sub-bullet and the `-ENOTSUP` guard flips. Avoid re-discovering the four commits as pending: the deferred hunks are tracked above, not the whole commits.
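
For reviewers checking a future sync, the current motion_v2 behaviour described above reduces to two small rules, modelled here in Python. This is a sketch, not the C implementation, and the function names are hypothetical. It encodes only what this entry states: the `-ENOTSUP` guard at `init()`, and a `min_idx` of 1 for 3-frame mode versus 2 for upstream's 5-frame branch. When the picture-pool refactor lands, the guard should return success and the fork's `flush()` should follow the upstream rule.

```python
import errno

# Hypothetical model of motion_v2 behaviour as described in this entry.


def init_motion_v2(motion_five_frame_window: bool = False) -> int:
    """init() guard: 5-frame mode is rejected until the prev_prev_ref
    picture-pool plumbing lands; the 3-frame default is supported."""
    if motion_five_frame_window:
        return -errno.ENOTSUP
    return 0


def upstream_flush_min_idx(motion_five_frame_window: bool) -> int:
    """min_idx in upstream 4e469601's flush(): 2 for the 5-frame window,
    1 for the 3-frame window."""
    return 2 if motion_five_frame_window else 1


# The fork's flush() currently hard-codes the 3-frame value.
FORK_FLUSH_MIN_IDX = 1
```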
### 0310 — Vulkan VIF int64 reduction race condition Phase 3 fix
- Touches:
  - `core/src/feature/vulkan/shaders/vif.comp`: replaces all three bare `barrier()` calls with explicit `memoryBarrierShared(); barrier();` pairs. The pairs cover the Phase-1 cooperative tile load, the Phase-2 vertical-conv shared write, and the Phase-4 cross-subgroup int64 reduction.
  - `docs/research/0089-...md`: Phase 3 status appendix.
  - `docs/adr/0269-...md`: Phase 3 status appendix.
  - `docs/state.md`: T-VK-VIF-1.4-RESIDUAL closed; new T-VK-VIF-1.4-RESIDUAL-ARC opened.
  - `core/src/vulkan/AGENTS.md`: Phase 3 update on the existing invariant row.
  - `changelog.d/fixed/vif-int64-reduction-race-condition.md`.

  Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the shader is zero. The entry exists because the fix is rebase-sensitive. Any future cherry-pick that touches `vif.comp` and downgrades a `memoryBarrierShared(); barrier();` pair back to a bare `barrier()` will silently re-introduce the NVIDIA Vulkan 1.4 race.
- Invariant: `vif.comp` shared-memory ordering between cooperative-write phases must be release-acquire, not just a bare workgroup-execution barrier.
  - NVIDIA's Vulkan 1.4 default memory model requires the explicit shared-memory release. A bare `barrier()` works at API 1.3 on this driver only by accident.
  - SCALE is irrelevant: the fix applies to all four pipeline specialisations because the barrier sites are in the SCALE-shared code.
  - Do NOT remove the explicit `memoryBarrierShared()` calls even if a perf review claims they are redundant under the GLSL spec wording. The empirical real-hardware evidence in the research-0089 2026-05-09 appendix shows otherwise on NVIDIA driver 595.71.05.
- Re-test:
  1. Apply the local API-1.4 bump (`core/src/vulkan/common.c` 3 sites + `vma_impl.cpp` `VMA_VULKAN_VERSION 1004000`) on an NVIDIA RTX 4090 + driver 595.71+ machine.
  2. Build with `meson setup ... -Denable_vulkan=enabled`.
  3. Run `python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan --device 1 --places 4`. Expect 0/48 failures across all four scales.
  4. Run the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3" against `--vulkan_device 1`. Expect 5 identical `(integer_vif_num_scale2, integer_vif_den_scale2) = (+2.494358e+04, +2.522523e+04)` pairs at frame 5.

  Note that `--vulkan_device 0` on this multi-GPU host is the Intel Arc A380 lane. It will still fail at API 1.4, which is tracked by the separate `T-VK-VIF-1.4-RESIDUAL-ARC` row (still Open).
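
The determinism criterion can be stated directly as code: all five runs must yield the same `(num, den)` pair at the chosen frame. The sketch assumes the pairs have already been extracted from each run's debug output. `is_deterministic` is hypothetical, and the expected pair is the one quoted in this entry. Exact equality is intentional, since the claim is bit-identical output rather than closeness.

```python
def is_deterministic(pairs: list[tuple[float, float]]) -> bool:
    """True when at least one run exists and every run produced the same (num, den)."""
    return len(pairs) > 0 and len(set(pairs)) == 1


# Frame-5 (integer_vif_num_scale2, integer_vif_den_scale2) quoted in this entry.
EXPECTED_FRAME5 = (2.494358e04, 2.522523e04)
```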
### 0309 — Vulkan VIF API-1.4 Phase 2 dump (T-VK-VIF-1.4-RESIDUAL)
- Touches:
  - `docs/research/0089-vulkan-vif-fp-residual-bisect-2026-05-08.md`: 2026-05-09 status appendix with empirical numbers from the live RTX 4090.
  - `docs/state.md`: T-VK-VIF-1.4-RESIDUAL row updated with the localisation.
  - `core/src/vulkan/AGENTS.md`: new invariant row pinning the SCALE = 2 cross-subgroup-reduction memory-model finding.
  - `CHANGELOG.md`: lusoris fork "Changed" entry.

  No code touched; the Phase 3 shader memory-model fix lands in a separate PR. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the AGENTS.md row is zero. The entry exists because the empirical localisation flips the open state-row hypothesis from FP precision to memory model. It also retires the `places=3` override path that earlier rebase scaffolding might have suggested.
- Invariant: the `vif.comp` SCALE = 2 specialisation's Phase-4 cross-subgroup int64 reduction is non-deterministic on NVIDIA driver 595.71.05 + Vulkan 1.4.341. The affected code is lines 547–592: `subgroupAdd` + `barrier()` + thread-0 read of `s_lmem`.
  - The API 1.3 lane is fully deterministic on the same hardware.
  - The four `apiVersion` pinning sites in `core/src/vulkan/common.c` + `core/src/vulkan/vma_impl.cpp` stay at 1.3 until two conditions hold. First, Phase 3 lands the explicit memory-scope barrier. Second, a 5-run determinism gate confirms run-to-run identical `(num, den)` plus `places=4` 0/48 on NVIDIA.
  - The `places=3` override path is eliminated from the unblock options.
- Re-test:
  1. Apply the local API-1.4 bump (`core/src/vulkan/common.c` 3 sites + `vma_impl.cpp` `VMA_VULKAN_VERSION 1004000`) on an NVIDIA RTX 4090 + driver 595.71+ machine.
  2. Build with `meson setup ... -Denable_vulkan=enabled`.
  3. Run the gate and the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3".

  Expect 45/48 `places=4` failures on `integer_vif_scale2` (max abs `1.527e-02`). Also expect 5 distinct `(integer_vif_num_scale2, integer_vif_den_scale2)` pairs across 5 runs of `--feature 'vif_vulkan=debug=true'`. Both observations reproduced bit-for-bit on this session's hardware lane (UUID `e478b41b-5c4f-1ddb-f990-e44916aff4c8`).
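
The `places=4` gate follows the rule of Python's `unittest.TestCase.assertAlmostEqual`: two values agree when `round(abs(a - b), places) == 0`, that is, roughly when they differ by less than `5e-05`. The sketch below shows the per-feature tally behind figures such as "45/48 failures, max abs `1.527e-02`". It assumes equal-length per-frame CPU and Vulkan score lists. `places_gate` is a hypothetical helper, not the interface of `cross_backend_vif_diff.py`.

```python
def places_gate(cpu: list[float], gpu: list[float], places: int = 4) -> tuple[int, float]:
    """Return (number of failing frames, max absolute difference).

    A frame fails when round(|cpu - gpu|, places) != 0, which is the same
    test unittest's assertAlmostEqual applies.
    """
    if len(cpu) != len(gpu):
        raise ValueError("frame counts differ")
    diffs = [abs(a - b) for a, b in zip(cpu, gpu)]
    failures = sum(1 for d in diffs if round(d, places) != 0)
    return failures, max(diffs, default=0.0)
```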
### 0308 — encoder knob-sweep recipe-regression policy (ADR-0308, docs-only)
- Touches:
  - `docs/research/0080-encoder-knob-sweep-findings.md`.
  - `docs/adr/0308-encoder-knob-sweep-recipe-regression-policy.md`.
  - `docs/adr/README.md`: index row.
  - `ai/AGENTS.md`: knob-sweep invariant section.
  - `changelog.d/changed/encoder-knob-sweep-findings.md`.

  No code touched; companion to PR #400 (ADR-0305 + Research-0077 + `ai/scripts/analyze_knob_sweep.py`). Upstream Netflix/vmaf has no encoder-knob-sweep surface, so conflict probability is zero. This entry exists only because the policy threshold (7-of-9 structural cut) is rebase-sensitive on the corpus shape.
- Invariant: the 7-of-9 source-count threshold from ADR-0308 §Decision point 1 is calibrated against the current 9-source Netflix Public Dataset corpus.
  - If the corpus grows past 9 sources (e.g. UGC expansion per ADR-0287, or HDR additions), re-derive the absolute threshold as a fraction (≥7/9 ≈ 78 %).
  - The structural cluster is sharp on the current corpus: the top-15 cells all hit 9-of-9, and no observed cells fall in the 4–6 range. A fractional cut at ~75 % is therefore robust.
  - Do NOT relax `bitrate_tol_pct` (default 5.0) or `vmaf_tol` (default 0.1) in `ai/scripts/analyze_knob_sweep.py` without an ADR. Those tolerances are calibrated against the per-frame VMAF noise floor and bitrate quantisation in libavformat muxers.
- Re-test: `pytest ai/tests/test_knob_sweep_analysis.py -v` (script logic; ships in PR #400). The policy gate is offline:
  1. Regenerate `runs/phase_a/full_grid/comprehensive.jsonl` via `tools/vmaf-tune/src/vmaftune/hw_encoder_corpus.py` (3-hour run on a single host with NVENC + QSV).
  2. Re-run `python ai/scripts/analyze_knob_sweep.py --jsonl <adapted.jsonl> --out-dir runs/phase_a/full_grid/reports/`.
  3. Diff the resulting `summary.md` against the `docs/research/0080-encoder-knob-sweep-findings.md` headline table.

  The structural cluster (top-15 cells, all 9-of-9) is the invariant to defend.
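
Re-deriving the cut for a different corpus size is a ceiling over the chosen fraction. The sketch uses exact rational arithmetic, so 7/9 of 9 sources is exactly 7 with no float rounding. The function names are hypothetical, and `ai/scripts/analyze_knob_sweep.py` may implement the cut differently.

```python
import math
from fractions import Fraction

# ADR-0308: 7-of-9 on the current 9-source corpus.
STRUCTURAL_FRACTION = Fraction(7, 9)


def structural_threshold(n_sources: int, fraction: Fraction = STRUCTURAL_FRACTION) -> int:
    """Smallest source count that meets the fractional cut."""
    return math.ceil(fraction * n_sources)


def is_structural(hits: int, n_sources: int, fraction: Fraction = STRUCTURAL_FRACTION) -> bool:
    """True when a cell regresses on enough sources to count as structural."""
    return hits >= structural_threshold(n_sources, fraction)
```

On a hypothetical 12-source corpus, the 7/9 rule requires 10 sources, while a 75 % cut (`Fraction(3, 4)`) requires 9.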
### 0228 — Vulkan 1.4 bump deferred (ADR-0264, docs-only)
- Touches: none (docs-only PR). Future work on T-VK-1.4-BUMP is split into two steps:
  - Step A will touch `core/src/feature/vulkan/shaders/vif.comp` and `core/src/feature/vulkan/shaders/ciede.comp`.
  - Step B will touch the three `apiVersion` sites in `core/src/vulkan/common.c` (lines 54, 264, 374) and the `VMA_VULKAN_VERSION` define in `core/src/vulkan/vma_impl.cpp` (line 22).
- Invariant: `master` stays on `VK_API_VERSION_1_3` and `VMA_VULKAN_VERSION = 1003000`.
  - Netflix doesn't ship a Vulkan backend, so a conflict on these constants during an upstream sync is improbable.
  - Lifting the constant without first auditing the `precise` / `OpDecorate ... NoContraction` decoration on `vif.comp` and `ciede.comp` will reintroduce the NVIDIA-driver regression captured in research-0053.
  - The `psnr_hvs_strict_shaders` `-O0` list in `core/src/vulkan/meson.build` is the existing precedent for shader-side bit-exactness mitigations. A 1.4-era audit should land its results there. The list may expand to cover `vif.comp` + `ciede.comp` if the `precise` audit decides the optimizer is the right place to gate.
- Re-test: when Step B lands, the gate is `python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan`, plus the same command with `--feature ciede`, against NVIDIA + RADV + lavapipe. Max abs diff must stay ≤ `5.0e-05` (`places=4`) on all three.
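
`VMA_VULKAN_VERSION` packs the Vulkan version as `major * 1000000 + minor * 1000 + patch`, so `1003000` is 1.3.0 and `1004000` is 1.4.0. A pin check along the lines of the sketch below could guard this invariant in CI. The check is hypothetical. Its regexes assume the `VK_API_VERSION_1_x` spelling and a `#define VMA_VULKAN_VERSION <n>` line; the `sed` recipes in this ledger only show the `VMA_VULKAN_VERSION <n>` part.

```python
import re

PIN_MAJOR, PIN_MINOR = 1, 3


def decode_vma_version(value: int) -> tuple[int, int, int]:
    """Split a VMA_VULKAN_VERSION value such as 1003000 into (1, 3, 0)."""
    return value // 1_000_000, (value // 1_000) % 1_000, value % 1_000


def pin_violations(common_c: str, vma_impl_cpp: str) -> list[str]:
    """Report any apiVersion or VMA version that is not pinned to 1.3."""
    problems = []
    for minor in re.findall(r"VK_API_VERSION_1_(\d+)", common_c):
        if int(minor) != PIN_MINOR:
            problems.append(f"common.c uses VK_API_VERSION_1_{minor}")
    match = re.search(r"#define\s+VMA_VULKAN_VERSION\s+(\d+)", vma_impl_cpp)
    if match is None:
        problems.append("vma_impl.cpp: VMA_VULKAN_VERSION define not found")
    elif decode_vma_version(int(match.group(1)))[:2] != (PIN_MAJOR, PIN_MINOR):
        problems.append(f"vma_impl.cpp: VMA_VULKAN_VERSION {match.group(1)}")
    return problems
```

Call it with the text of `core/src/vulkan/common.c` and `core/src/vulkan/vma_impl.cpp`. An empty list means both pins hold.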
### 0229 — HIP fifth-consumer kernel `float_ansnr_hip` (ADR-0266)
### 0228 — `y4m_convert_411_422jpeg` 1-byte heap-buffer-overflow fix
### 0228 — vmaf-tune resolution-aware model selection (ADR-0289)
### 0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)
### 0228 — `tools/vmaf-tune/` codec-agnostic encode dispatcher (ADR-0294)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/encode.py`: refactored to look up the codec adapter and delegate argv composition. Wholly fork-local.
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py`, `codec_adapters/x264.py`: the adapter contract gains `ffmpeg_codec_args(preset, quality)` and `extra_params()`. Both are duck-typed; missing methods fall back to the legacy x264-CRF shape.
  - `tools/vmaf-tune/tests/test_encode_multi_codec.py`: new 19-test suite pinning the dispatcher contract per codec.
  - `docs/usage/vmaf-tune.md`: new "Codec adapter contract" section.
- Invariant: the harness (`encode.py`, `corpus.py`) must not branch on codec identity.
  - The only codec-aware code is the per-adapter `codec_adapters/*.py` file. Any future change that adds an `if adapter.encoder == "..."` to the harness defeats the whole purpose of ADR-0294.
  - The corpus row schema stays at `SCHEMA_VERSION=1`. `crf` is preserved as the row column even when the underlying codec's quality knob is `-cq` / `-qp` / etc.; `EncodeRequest.quality` is a request-side property only.
  - Adapters that don't yet expose `ffmpeg_codec_args` are intentionally permitted to fall back to the legacy x264-CRF shape. Removing that fallback would break in-flight adapter PRs that are landing one at a time.
- Re-test on rebase:
```bash
pytest tools/vmaf-tune/tests/ -q  # 32 passed (13 existing + 19 multi-codec)

python -c "
from pathlib import Path
from vmaftune.encode import EncodeRequest, build_ffmpeg_command
req = EncodeRequest(
    source=Path('ref.yuv'), width=1920, height=1080,
    pix_fmt='yuv420p', framerate=24.0,
    encoder='libx264', preset='medium', crf=23,
    output=Path('out.mp4'),
)
cmd = build_ffmpeg_command(req)
assert cmd[cmd.index('-c:v') + 1] == 'libx264'
assert cmd[cmd.index('-preset') + 1] == 'medium'
assert cmd[cmd.index('-crf') + 1] == '23'
print('x264 dispatcher path OK')
"
```
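
The sketch below shows duck-typed dispatch with the legacy fallback. Only the method names `ffmpeg_codec_args(preset, quality)` and `extra_params()` come from this entry. `codec_args`, both adapter classes, and the `-cq` argv of the example adapter are hypothetical, and the real `encode.py` may assemble the command differently. The helper never branches on `adapter.encoder` to choose a code path, which is the invariant ADR-0294 protects.

```python
from typing import Any


def codec_args(adapter: Any, preset: str, quality: int) -> list[str]:
    """Build the codec part of the ffmpeg argv without branching on codec identity."""
    build = getattr(adapter, "ffmpeg_codec_args", None)
    if build is not None:
        args = list(build(preset, quality))
    else:
        # Legacy x264-CRF shape for adapters that predate the contract.
        args = ["-c:v", adapter.encoder, "-preset", preset, "-crf", str(quality)]
    extra = getattr(adapter, "extra_params", None)
    if extra is not None:
        args.extend(extra())
    return args


class LegacyAdapter:
    """Hypothetical adapter that only exposes `encoder`."""
    encoder = "libx264"


class ExampleCqAdapter:
    """Hypothetical adapter implementing the full contract."""
    encoder = "example_hw"

    def ffmpeg_codec_args(self, preset: str, quality: int) -> list[str]:
        return ["-c:v", self.encoder, "-preset", preset, "-cq", str(quality)]

    def extra_params(self) -> list[str]:
        return []
```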
### 0260 — vmaf-tune `--sample-clip-seconds` (ADR-0301)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/{cli,corpus,encode,score,__init__}.py`: fork-local; no upstream Netflix/vmaf path overlap.
  - `tools/vmaf-tune/tests/test_corpus.py`, `tools/vmaf-tune/AGENTS.md`, `docs/usage/vmaf-tune.md`, `docs/adr/0301-vmaf-tune-sample-clip.md`, `docs/adr/_index_fragments/0301-vmaf-tune-sample-clip.md`, `docs/adr/_index_fragments/_order.txt`, `docs/adr/README.md`.
- Invariant: corpus JSONL `SCHEMA_VERSION` bumped to `2` (additive `clip_mode` key only).
  - Sample-clip windows are mirrored on both sides: FFmpeg input-side `-ss` / `-t` (encode) and libvmaf's `--frame_skip_ref` / `--frame_cnt` (score).
  - The `_resolve_sample_clip()` helper is the single source of truth for the centre-anchored slice math. Do not duplicate the computation elsewhere.
  - It falls back silently to `"full"` when `N >= duration_s`.
- Re-test:
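
The centre-anchored window can be sketched as below. This is a hypothetical stand-in for `_resolve_sample_clip()`, whose exact rounding and return shape are not documented here. It assumes the window is centred on the source midpoint and converts seconds to frames by rounding. It also uses `"sample"` as an illustrative label for the non-full `clip_mode`.

```python
def resolve_sample_clip(duration_s: float, clip_s: float, fps: float) -> dict:
    """Centre a clip_s-second window in a duration_s-second source.

    Returns the FFmpeg-side window (-ss / -t, in seconds) and the libvmaf-side
    window (--frame_skip_ref / --frame_cnt, in frames). Falls back to the full
    source when the requested clip is not shorter than the source.
    """
    if clip_s >= duration_s:
        return {"clip_mode": "full", "ss": 0.0, "t": duration_s,
                "frame_skip_ref": 0, "frame_cnt": round(duration_s * fps)}
    start = (duration_s - clip_s) / 2.0  # equal margins before and after
    return {"clip_mode": "sample",  # hypothetical label
            "ss": start, "t": clip_s,
            "frame_skip_ref": round(start * fps),
            "frame_cnt": round(clip_s * fps)}
```

Because both the encode and score windows are derived from one `start`, they cannot drift apart. This is the property the "single source of truth" invariant protects.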
0227 — vmaf-tune AMF codec adapters (h264_amf / hevc_amf / av1_amf)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_amf,hevc_amf,av1_amf,_amf_common}.py` (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` — registry extended with three AMF entries.
  - `tools/vmaf-tune/tests/test_codec_adapter_amf.py` (new).
  - `tools/vmaf-tune/tests/test_corpus.py` — Phase A test renamed from `test_known_codecs_phase_a_is_x264_only` to `test_known_codecs_includes_x264_and_amf`.
  - `tools/vmaf-tune/AGENTS.md` — adds the AMF preset-compression invariant.
  - `docs/usage/vmaf-tune.md` — adds a Hardware encoders section.
- Invariant: the 7-into-3 preset compression table in `_amf_common.py` (`_PRESET_TO_AMF`) is the cross-codec axis that Phase B / C consumers depend on. Every AMF adapter accepts the canonical 7 preset names (`placebo`…`ultrafast`) and maps them onto the three AMF rungs (`quality` / `balanced` / `speed`). Do not extend the preset vocabulary without amending ADR-0282: registry uniformity (no codec-identity branching in the harness search loop) rests on every codec accepting the same names.
- Re-test:
0227 — vmaf-tune resolution-aware model selection (ADR-0289)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/resolution.py` (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  - `tools/vmaf-tune/src/vmaftune/corpus.py` — adds `CorpusOptions.resolution_aware: bool = True` and pipes the effective model through `score_res.request.model` into the JSONL row.
  - `tools/vmaf-tune/src/vmaftune/cli.py` — adds `--resolution-aware` / `--no-resolution-aware` (`BooleanOptionalAction`, default on).
  - `tools/vmaf-tune/tests/test_resolution.py` (new).
  - `docs/usage/vmaf-tune.md` — new "Resolution-aware mode" section.
  - `docs/adr/0289-vmaf-tune-resolution-aware.md` (new) + `docs/research/0064-vmaf-tune-resolution-aware.md` (new).
  - `tools/vmaf-tune/AGENTS.md` — two new invariant notes.
- Invariant: the height-only decision rule (`height >= 2160` → `vmaf_4k_v0.6.1`, else `vmaf_v0.6.1`) is the documented contract. The JSONL `vmaf_model` field is now per-row (not per-job): mixed ladder corpora legitimately contain multiple distinct values across rows. Downstream consumers (Phase B / C / D) must group or filter by `vmaf_model` rather than assume a constant. Width is accepted in the API for symmetry but ignored in the body; do not branch on it without a follow-up ADR.
- Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep resolution-aware
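A minimal sketch of the height-only rule and of the per-row grouping that downstream consumers are expected to do. `pick_vmaf_model` and `group_by_model` are hypothetical names. The fork's real logic lives in `resolution.py`, and its signature is not reproduced here.

```python
from collections import defaultdict


def pick_vmaf_model(width: int, height: int) -> str:
    # Height-only contract: width is accepted for symmetry and ignored.
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def group_by_model(rows):
    """Bucket JSONL rows by their per-row vmaf_model instead of assuming one value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["vmaf_model"]].append(row)
    return dict(groups)
```

A 2160p rung and a 1080p rung of the same ladder land in different buckets. That is exactly why a Phase B / C / D consumer must not treat `vmaf_model` as a per-job constant.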
0227 — Y4M 4:1:1 chroma upsample out-of-bounds write fix
- Touches:
  - `core/tools/y4m_input.c` — upstream-mirrored Daala-derived Y4M parser. The fix sits inside the 4:1:1 → 4:2:2-jpeg chroma upsample routine `y4m_convert_411_422jpeg`, lines ~500–530, in the function's three sub-loops. Upstream Netflix/vmaf carries the same shape; if upstream lands its own fix during a sync, prefer the upstream version and drop ours.
  - `core/test/test_y4m_411_oob.c` (new, fork-local) — drives a minimal W=2 H=4 4:1:1 stream through `video_input_open` + `video_input_fetch_frame`. Wholly fork-added; no upstream collision.
  - `core/test/meson.build` — adds the `test_y4m_411_oob` executable + `test()` registration.
- Invariant: the first two sub-loops of `y4m_convert_411_422jpeg` must guard `_dst[(x << 1) | 1]` writes with `(x << 1 | 1) < dst_c_w`, matching the third sub-loop's existing guard. Without the guard, a 4:1:1 stream of width 2 (`dst_c_w == 1`) writes one byte past the destination chroma row.
- Re-test:
cd libvmaf && meson setup build-asan --buildtype=debug -Db_sanitize=address -Db_lundef=false -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-asan test/test_y4m_411_oob
ASAN_OPTIONS=detect_leaks=0 ./build-asan/test/test_y4m_411_oob
The test must report `1 tests run, 1 passed`. Before the fix, the binary aborts with `AddressSanitizer: heap-buffer-overflow … WRITE of size 1` at `y4m_input.c:507`.
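The guard arithmetic can be checked with a small model of the write pattern. It assumes the Daala/libvpx `y4minput` convention that the 4:1:1 source chroma width is `(W + 3) >> 2` and the 4:2:2 destination chroma width is `(W + 1) >> 1`. It also flattens the three sub-loops into a single loop. The functions are hypothetical illustrations, not the C code.

```python
def chroma_widths(pic_w: int):
    src_c_w = (pic_w + 3) >> 2  # assumed 4:1:1 chroma width (ceil(W / 4))
    dst_c_w = (pic_w + 1) >> 1  # assumed 4:2:2 chroma width (ceil(W / 2))
    return src_c_w, dst_c_w


def out_of_bounds_writes(pic_w: int, guarded: bool):
    """Destination indices written at or past dst_c_w for one chroma row."""
    src_c_w, dst_c_w = chroma_widths(pic_w)
    bad = []
    for x in range(src_c_w):
        even = x << 1                      # _dst[x << 1] is always in range
        odd = (x << 1) | 1                 # _dst[(x << 1) | 1] needs the guard
        for idx in (even, odd):
            if idx is odd and guarded and not odd < dst_c_w:
                continue                   # guarded write skipped
            if idx >= dst_c_w:
                bad.append(idx)
    return bad
```

For the W=2 stream the test drives, `dst_c_w == 1` and the unguarded odd write lands on index 1. This is the one-byte overflow ASan reports.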
0270 — saliency_student_v1 fork-trained on DUTS-TR (ADR-0286)
- Touches:
  - `model/tiny/registry.json` — adds the `saliency_student_v1` row. Fork-local registry; no upstream overlap.
  - `model/tiny/saliency_student_v1.onnx` (+ `.json` sidecar) — new weights and metadata. Fork-local.
  - `ai/scripts/train_saliency_student.py` — new training script. Wholly fork-local under `ai/`, which has no upstream counterpart.
  - `docs/ai/models/saliency_student_v1.md`, `docs/research/0062-saliency-student-from-scratch-on-duts.md`, `docs/adr/0286-saliency-student-fork-trained-on-duts.md` — new docs under fork-local trees.
- Invariant: the C-side `feature_mobilesal.c` extractor's tensor-name contract (`input`, NCHW `[1, 3, H, W]`, and `saliency_map`, NCHW `[1, 1, H, W]`) must continue to match the ONNX graph for both `saliency_student_v1.onnx` and the legacy `mobilesal.onnx` placeholder. Future weight swaps can change the graph internals freely but must keep these names and shapes; the smoke test asserts the registration. The op-allowlist constraint (the graph uses only ops in `core/src/dnn/op_allowlist.c`) carries over from ADR-0218: `Resize` is not used, and `ConvTranspose` is the upsample op for v1 to keep the graph load-clean against vanilla origin/master.
- Re-test:
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python -c "
from ai.src.vmaf_train.op_allowlist import check_model
from pathlib import Path
r = check_model(Path('model/tiny/saliency_student_v1.onnx'))
assert r.ok, r.pretty()
print('allowlist OK')
"
meson test -C build --suite=fast mobilesal
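The re-test above checks the op allowlist, not the tensor-name contract. The sketch below is one way to check names and channel counts when swapping weights. `io_signature` uses the real `onnx` Python API (`onnx.load`, `graph.input` / `graph.output`, `tensor_type.shape.dim`) and needs that package installed. `check_contract` is a hypothetical pure-Python checker. It validates names, rank and channel count but deliberately ignores the batch dimension, in case an export makes it symbolic.

```python
def io_signature(path):
    """Return (inputs, outputs) as lists of (name, dims) from an ONNX file."""
    import onnx  # imported lazily; requires the `onnx` package

    model = onnx.load(path)
    initializers = {init.name for init in model.graph.initializer}

    def sig(value_info):
        dims = [
            d.dim_value if d.HasField("dim_value") else d.dim_param
            for d in value_info.type.tensor_type.shape.dim
        ]
        return value_info.name, dims

    # Older exports list initializers as graph inputs; skip those.
    inputs = [sig(v) for v in model.graph.input if v.name not in initializers]
    outputs = [sig(v) for v in model.graph.output]
    return inputs, outputs


def check_contract(inputs, outputs):
    """Return contract violations for the mobilesal I/O names (empty means OK)."""
    problems = []

    def expect(sigs, name, channels, role):
        if len(sigs) != 1 or sigs[0][0] != name:
            problems.append(f"{role}: expected one tensor {name!r}, got {[n for n, _ in sigs]}")
            return
        dims = sigs[0][1]
        if len(dims) != 4:
            problems.append(f"{role} {name!r}: expected rank 4 (NCHW), got {len(dims)}")
        elif dims[1] != channels:
            problems.append(f"{role} {name!r}: expected C={channels}, got {dims[1]}")

    expect(inputs, "input", 3, "input")
    expect(outputs, "saliency_map", 1, "output")
    return problems
```

Usage: `check_contract(*io_signature("model/tiny/saliency_student_v1.onnx"))` should return an empty list for both the new weights and the legacy `mobilesal.onnx` placeholder.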
0229 — HIP fifth-consumer kernel float_ansnr_hip
- Touches:
  - `core/src/feature/hip/float_ansnr_hip.{c,h}` (new) — fifth consumer of `core/src/hip/kernel_template.h`. Mirrors `core/src/feature/cuda/float_ansnr_cuda.c` call-graph-for-call-graph; `init` / `submit` / `collect` / `close` invoke the kernel-template helpers in the same order; the submit body intentionally bypasses `vmaf_hip_kernel_submit_pre_launch` (no atomic: the kernel writes per-block (sig, noise) interleaved float partials directly).
  - `core/src/hip/meson.build` — adds the new TU to `hip_sources`.
  - `core/src/feature/feature_extractor.c` — adds the `extern VmafFeatureExtractor vmaf_fex_float_ansnr_hip;` declaration and the registry row under `#if HAVE_HIP`.
  - `core/test/test_hip_smoke.c` — adds the `test_float_ansnr_hip_extractor_registered` sub-test pinning the lookup contract.
- Invariant — the `submit_pre_launch` bypass is load-bearing. The CUDA twin makes the same choice for the same reason. If a future PR adds a `submit_pre_launch` call to `float_ansnr_cuda.c`'s submit path, the HIP twin must follow in the same PR. Likewise, the readback shape (`wg_count * 2u * sizeof(float)`) and the bpc table (`peak` / `psnr_max` for 8/10/12/16-bit) mirror the CUDA twin verbatim; keep them aligned on rebase.
- Re-test on rebase:
cd libvmaf
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build # 48/48 green (47 CPU + HIP smoke)
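A sketch of what "no atomic, interleaved partials" means for the host side. The readback buffer holds `wg_count` (sig, noise) float pairs, and the host sums each lane in double. `readback_bytes` and `reduce_sig_noise` are hypothetical names. Little-endian float32 layout is an assumption about the host.

```python
import struct


def readback_bytes(wg_count: int) -> int:
    # Documented readback shape: wg_count * 2u * sizeof(float).
    return wg_count * 2 * struct.calcsize("<f")


def reduce_sig_noise(raw: bytes, wg_count: int):
    """Sum interleaved per-block (sig, noise) float32 partials in double."""
    if len(raw) != readback_bytes(wg_count):
        raise ValueError("readback size does not match wg_count")
    values = struct.unpack(f"<{2 * wg_count}f", raw)
    sig = sum(values[0::2])    # even slots: per-block signal partials
    noise = sum(values[1::2])  # odd slots: per-block noise partials
    return sig, noise
```

Because every block owns its own slot pair, nothing needs zeroing before the launch. That is why the `submit_pre_launch` memset can be skipped.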
0230 — HIP sixth-consumer kernel motion_v2_hip (ADR-0267)
- Touches:
  - `core/src/feature/hip/integer_motion_v2_hip.{c,h}` (new) — sixth consumer of `core/src/hip/kernel_template.h`. Mirrors `core/src/feature/cuda/integer_motion_v2_cuda.c` call-graph-for-call-graph; carries the `VMAF_FEATURE_EXTRACTOR_TEMPORAL` flag and a `flush()` callback. The state struct has a `uintptr_t pix[2]` ping-pong slot pair tracked outside the kernel template (the template models a single device+host pair only).
  - `core/src/hip/meson.build` — adds the new TU to `hip_sources`.
  - `core/src/feature/feature_extractor.c` — adds the `extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_hip;` declaration and the registry row under `#if HAVE_HIP`.
  - `core/test/test_hip_smoke.c` — adds the `test_motion_v2_hip_extractor_registered` sub-test pinning the lookup contract (the extractor name is `motion_v2_hip`, matching the CUDA twin's `motion_v2_cuda` naming).
- Invariant — temporal-extractor + ping-pong shape. The `VMAF_FEATURE_EXTRACTOR_TEMPORAL` flag bit, the `flush()` callback registration, and the `uintptr_t pix[2]` slot pair are load-bearing for the runtime PR (T7-10b). The runtime PR will swap `uintptr_t pix[2]` for a real device-buffer handle pair matching the CUDA twin's `VmafCudaBuffer *pix[2]`. On rebase: if the CUDA twin's flush-pass shape changes (currently `min(score[i], score[i+1])`), update the HIP twin's `flush_fex_hip` body in the same PR.
- Re-test on rebase: same as 0229; `meson test -C build` with `enable_hip=true` exercises the smoke contract.
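The flush-pass shape the two twins must agree on is small enough to state as code. This sketch (hypothetical name `flush_motion`) applies `min(score[i], score[i+1])` per frame. Keeping the last frame's raw score unchanged is an assumption, since the entry only documents the pairwise minimum.

```python
def flush_motion(raw):
    """Per-frame min(score[i], score[i+1]); the last frame keeps its raw score (assumed)."""
    out = []
    for i, cur in enumerate(raw):
        nxt = raw[i + 1] if i + 1 < len(raw) else cur
        out.append(min(cur, nxt))
    return out
```

This look-ahead is why the extractor is temporal and needs a `flush()` callback. The final value for frame `i` is not known until frame `i + 1` has been scored.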
0227 — ms_ssim_vulkan submit-side migrated to kernel_template (T-GPU-DEDUP-26)
- Touches:
  - `core/src/feature/vulkan/ms_ssim_vulkan.c` — `extract()`'s raw `VkCommandBuffer` / `VkFence` / `vkAllocateCommandBuffers` / `vkBeginCommandBuffer` / `vkCreateFence` / `vkQueueSubmit` / `vkWaitForFences` / `vkDestroyFence` / `vkFreeCommandBuffers` blocks become `VmafVulkanKernelSubmit` triples (`vmaf_vulkan_kernel_submit_begin` / `_submit_end_and_wait` / `_submit_free`). One triple covers the decimate-pyramid command buffer; one triple per scale covers the per-scale SSIM submit. The pipeline-side bundles (`pl_decimate` 2-binding 4-variant + `pl_ssim` 10-binding 9-variant) and their `_add_variant()` chains are unchanged from the prior migration.
- Invariant: any future submit-side template change (timeline semaphores, deferred fence release, queue-family parameterisation) must keep the helpers' synchronous-wait + per-frame fence + per-frame command-buffer contract intact, since `ms_ssim_vulkan.c` does host readback of the `l_partials` / `c_partials` / `s_partials` buffers immediately after `_submit_end_and_wait` returns. The submit-side contract is the same one already documented in `core/src/vulkan/AGENTS.md`'s "Rebase-sensitive invariants" section for `kernel_template.h`.
- Re-test:
```bash
(cd libvmaf && meson test -C build)
python scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature float_ms_ssim --backend vulkan --places 4
```
0231 — SHA-pin GitHub Actions (OSSF Pinned-Dependencies)
- Touches: every workflow file under `.github/workflows/`. All 13 fork workflows (`docker-image.yml`, `docs.yml`, `ffmpeg-integration.yml`, `libvmaf-build-matrix.yml`, `lint-and-format.yml`, `nightly-bisect.yml`, `nightly.yml`, `release-please.yml`, `rule-enforcement.yml`, `scorecard.yml`, `security-scans.yml`, `supply-chain.yml`, `tests-and-quality-gates.yml`) had their `uses:` directives rewritten from `<owner>/<repo>@vN[.M.K]` to `<owner>/<repo>@<40-char-sha> # vN.M.K`. 97 references were converted; the SLSA reusable-workflow ref in `supply-chain.yml` is the single documented holdout (see the invariant below).
- Invariant — SHA-pin policy for `uses:`. Every action reference in `.github/workflows/*.yml` MUST be a 40-char commit SHA with the semver tag preserved as a trailing `# vN.M.K` comment. The OSSF Scorecard `Pinned-Dependencies` check parses both forms and treats a floating tag (`@vN`) as unpinned, which counts against the aggregate score. Single permitted exception: the SLSA generator reusable workflow (`slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml`) must keep its `vX.Y.Z` tag form, because the SLSA project requires its builders and generators to be referenced by tag so that `slsa-verifier` can verify the trusted builder's ref. The exception is documented inline in `supply-chain.yml` and must survive each rebase. Why this matters on upstream sync: Netflix upstream does not ship the fork's CI tree, so a `/sync-upstream` run that drags new workflow content (e.g. via repository templates or bot-authored bumps) into `.github/workflows/` can re-introduce floating-tag references unnoticed. The post-rebase check below is the standing gate: anything that lights up needs to be re-pinned before merging the sync.
- Re-test on rebase:
# Anything that prints is a regression: every uses: must be either
# already SHA-pinned (40 hex) or, for the documented SLSA exception,
# the slsa-github-generator reusable-workflow ref.
# No end-of-line anchor on purpose: a floating tag followed by a
# "# vN" comment must still be caught. -H prints the offending file.
grep -HnE '^\s*(- )?uses:\s+[^@]+@[^ #]+' .github/workflows/*.yml \
  | grep -vE '@[a-f0-9]{40}' \
  | grep -v 'slsa-framework/slsa-github-generator/.github/workflows/'
# SHA-resolution sanity for any new pin (per-action):
gh api repos/<owner>/<repo>/git/ref/tags/<vN.M.K> --jq '.object.sha'
# If the result is a "tag" object (annotated tag), deref:
gh api repos/<owner>/<repo>/git/tags/<sha-from-prev> --jq '.object.sha'
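If the same gate is wanted outside a shell pipeline (for example in a pre-commit hook), a Python equivalent is straightforward. `unpinned_refs` is a hypothetical helper, not part of the fork's tooling. It only handles single-line `uses:` values, and it ignores local (`./...`) and `docker://` references.

```python
import re

USES = re.compile(r"""^\s*(?:-\s+)?uses:\s*['"]?([^@\s'"]+)@([^\s'"#]+)""")
SHA = re.compile(r"^[0-9a-f]{40}$")
SLSA_EXCEPTION = "slsa-framework/slsa-github-generator/.github/workflows/"


def unpinned_refs(text: str):
    """Yield (line_no, action, ref) for uses: refs that are not SHA-pinned."""
    for n, line in enumerate(text.splitlines(), start=1):
        m = USES.match(line)
        if not m:
            continue  # not a uses: line, or a local action without @ref
        action, ref = m.groups()
        if action.startswith("docker://"):
            continue  # container refs are out of scope for this check
        if SHA.match(ref) or action.startswith(SLSA_EXCEPTION):
            continue  # pinned, or the documented SLSA tag exception
        yield n, action, ref
```

Unlike an end-anchored regex, this reports `actions/checkout@v4 # v4` as unpinned, because the trailing comment does not change what the ref resolves to.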
0226 — CUDA drain-batch engine-loop opt (T-GPU-OPT-1)
- Touches:
  - `core/src/cuda/drain_batch.{h,c}` (new) — TLS drain-batch table + shared drain stream + `_open()` / `register` / `_flush()` / `_close()` API.
  - `core/src/libvmaf.c` — the engine-side per-frame loop now wraps submit/collect with `_open()` + `_flush()`, so all CUDA extractor `finished` events are waited on a single shared drain stream.
  - All 12 CUDA feature kernels (`core/src/feature/cuda/*.c`) register their `finished` event + `drained` flag with the drain batch on submit; collect skips its private `cuStreamSynchronize` when `drained` is true.
- Invariant — drained-flag contract. Every CUDA extractor's collect path must check the per-frame `drained` flag and skip its own `cuStreamSynchronize` when set; otherwise the drain batching is a no-op. The flag is reset to `false` per frame inside `vmaf_cuda_drain_batch_register()`.
- Re-test on rebase:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast cuda
Expected: all CUDA tests green; bench shows ≥5% wall-clock gain on a 7-extractor VMAF model (model.json with all feature extractors enabled).
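To make the drained-flag contract concrete, here is a toy Python model of one frame's lifecycle. It is not the C API. `DrainBatch` and `Extractor` are hypothetical, and `private_syncs` stands in for calls to `cuStreamSynchronize`. It shows the two properties the invariant relies on: registration resets the flag every frame, and collect skips its private sync only after a flush.

```python
class DrainBatch:
    """Toy model of the per-frame drain batch (not the C API)."""

    def __init__(self):
        self.entries = []

    def open(self):
        self.entries.clear()

    def register(self, ext):
        ext.drained = False          # reset per frame on registration
        self.entries.append(ext)

    def flush(self):
        for ext in self.entries:     # one shared wait covers every event
            ext.drained = True


class Extractor:
    def __init__(self):
        self.drained = False
        self.private_syncs = 0       # stands in for cuStreamSynchronize calls

    def submit(self, batch):
        batch.register(self)

    def collect(self):
        if not self.drained:
            self.private_syncs += 1  # fall back to a private per-stream sync
```

If a collect path ignored `drained`, every extractor would still sync privately after the shared wait. That is the "drain batching is a no-op" failure mode the invariant warns about.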
0225 — Netflix bench snapshot regen (upstream a44e5e61 motion fix)
- Touches:
  - `testdata/netflix_benchmark_results.json` — fork-added snapshot. CPU rows now reflect the post-fix motion feature; cuda / sycl rows from the previous regen are preserved unchanged because those backends were not exercised on this rerun (host-environment tooling: wrong renderD path, `libvmaf_cuda` not enabled in the local FFmpeg build). Future full regens should include cuda / sycl.
  - `testdata/bench_all.sh` — the default `VMAF=` no longer points at `/usr/local/bin/vmaf` (which on most dev hosts is stuck at v3.0.0, predating upstream a44e5e61); it now defaults to the in-tree fork build at `core/build/tools/vmaf`.
  - `testdata/benchmark_netflix.py` — `FFMPEG`, `YUVDIR`, and the hardcoded `LD_LIBRARY_PATH=/usr/local/lib` are now overridable via `VMAF_FFMPEG`, `VMAF_YUVDIR`, and any caller-set `LD_LIBRARY_PATH`.
- Invariant: the snapshot's CPU pooled VMAF for `src01_576x324` is 76.667828 (post-fix), not 76.668904 (the upstream-buggy mirror). If `/sync-upstream` ever re-pulls a Netflix change that touches `motion.c` mirror handling, this number is the reference.
- Re-test:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
LD_LIBRARY_PATH=$(pwd)/build/src python3 \
../testdata/benchmark_netflix.py
Expected CPU pooled rows: 76.667828, 35.068672, 7.985899.
0224 — CUDA graph capture feasibility (research-0047, DEFER)
- Touches: none; investigation only, no code lands. The research digest `docs/research/0047-cuda-graph-capture-feasibility.md` documents why a CUDA graph capture path on the per-frame submit chain is deferred rather than shipped. The realised wall-clock gain is capped at ~1–3% vs. the predicted 10–20%, and a 4-slot picture-pool rotation defeats single-graph capture, forcing per-frame `cuGraphExecKernelNodeSetParams` rebinding of the `(ref, dis)` device pointers.
- Invariant: the `kernel_template.h` docstring keeps naming `VmafCudaKernelLifecycle.finished` as a graph-capture hook point. Don't prune that comment on rebase: leaving the door open in the template is free, and the digest's "what needs to be true for a future GO" section depends on the hook still being there.
- Re-test on rebase:
# Confirm the docstring still references graph capture as the hook
# point — wording change is fine, removal is not.
grep -q "graph capture" core/src/cuda/kernel_template.h
0223 — ADR slug-drift repair in CHANGELOG / rebase-notes (PR #304 follow-up)
- Touches: `CHANGELOG.md`, `docs/rebase-notes.md`. No code; no upstream-shared path; no public-API surface.
- Invariant: every `[ADR-NNNN](docs/adr/NNNN-slug.md)` link in the fork's tracked docs resolves to an actual on-disk file under `docs/adr/`. Repaired 4 links whose slugs did not exist on disk; they now point at `0138-iqa-convolve-avx2-bitexact-double`, `0140-simd-dx-framework`, `0190-ms-ssim-vulkan`, and `0178-vulkan-adm-kernel`. All retained their cited NNNN per ADR-0028 (NNNN is immutable once Accepted).
- Re-test on rebase: from the repo root, the following must print no lines:
for ref in $(grep -ohE 'docs/adr/[0-9]{4}-[a-z0-9-]+\.md' \
CHANGELOG.md docs/rebase-notes.md AGENTS.md docs/state.md \
| sort -u); do
test -f "$ref" || echo "MISSING: $ref"
done
0125 — cambi_vulkan migrated to kernel_template (T-GPU-DEDUP-25, 5-bundle)
- Touches:
  - `core/src/feature/vulkan/cambi_vulkan.c` — the state's quintet (`dsl_2bind` + 5× `pl_layout_*` + `shader_modules[CAMBI_PL_COUNT]` + shared `desc_pool`) collapses to five `VmafVulkanKernelPipeline` bundles (`pl_trivial`, `pl_derivative`, `pl_filter_mode`, `pl_decimate`, `pl_mask_dp`), each owning its own descriptor pool. The first slot of `pipelines[]` per stage aliases the bundle's base pipeline; `CAMBI_PL_FILTER_MODE_V`, `CAMBI_PL_MASK_SAT_COL`, and `CAMBI_PL_MASK_THRESHOLD` are sibling variants built via `vmaf_vulkan_kernel_pipeline_add_variant()`. `cambi_vk_alloc_set` takes a bundle pointer (`->desc_pool` / `->dsl`); every dispatch site picks the bundle that matches its push-constant struct.
  - The `cambi_vk_make_dsl` / `cambi_vk_make_pl` / `cambi_vk_create_shader` / `cambi_vk_build_pipeline` helpers are dropped; the template subsumes them.
- Invariant — variants destroyed before bundle, base alias must be skipped. Five distinct push-constant struct sizes (`CambiVkPushTrivial` / `CambiVkPushDerivative` / `CambiVkPushFilterMode` / `CambiVkPushDecimate` / `CambiVkPushMaskDp`) force five bundles even though every stage's DSL is 2-binding SSBO, because `_add_variant()` only siblings pipelines under the same layout. `close_fex` must `vkDestroyPipeline()` the variant slots (`CAMBI_PL_FILTER_MODE_V`, `CAMBI_PL_MASK_SAT_COL`, `CAMBI_PL_MASK_THRESHOLD`) before calling `vmaf_vulkan_kernel_pipeline_destroy()` on each bundle, and must not pass the aliased per-stage first slots to `vkDestroyPipeline()`.
- Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): `cambi` mean = 0.0, identical to pre-migration (the pair has no banding artifacts).
- Rebase impact: low. Builds on top of PR #272's `_add_variant()` helper. Upstream Netflix/vmaf has no Vulkan backend, so there is nothing to merge against.
0124 — ssimulacra2_vulkan migrated to kernel_template (T-GPU-DEDUP-24, 4-bundle)
- Touches:
  - `core/src/feature/vulkan/ssimulacra2_vulkan.c` — the state's 16 long-lived pipeline-object fields (4× `*_dsl` + `*_pl` + `*_shader`, plus the shared `desc_pool`) collapse to four `VmafVulkanKernelPipeline` bundles (`pl_xyb`, `pl_mul`, `pl_blur`, `pl_ssim`), each owning its own descriptor pool. The first slot of each per-bundle pipeline array (`xyb_pipelines[0]`, `mul_pipelines[0]`, `blur_pipelines_h[0]`, `ssim_pipelines[0]`) aliases the bundle's base `VkPipeline`; the remaining per-scale / per-pass slots are siblings via `vmaf_vulkan_kernel_pipeline_add_variant()`. `ss2v_build_pipeline_int3` reroutes through `_add_variant()` instead of calling `vkCreateComputePipelines` directly; `ss2v_alloc_set` takes a bundle pointer (`->desc_pool` / `->dsl`) instead of a separate DSL argument; descriptor-set free sites at the tail of `ss2v_run_scale` route to each bundle's pool.
  - The `ss2v_make_dsl` / `ss2v_make_pl` / `ss2v_create_shader` helpers are dropped; the template subsumes them.
- Invariant — variants destroyed before bundle, slot 0 alias must be skipped. Four distinct DSL shapes (XYB = 6 SSBOs, MUL = 3, BLUR = 2, SSIM = 8) prevent collapsing to one bundle, because `_add_variant()` only siblings pipelines under the same layout. `close_fex` must `vkDestroyPipeline()` the variant slots in `xyb_pipelines[1..N-1]`, `mul_pipelines[1..N-1]`, `ssim_pipelines[1..N-1]`, `blur_pipelines_h[1..N-1]`, and every slot of `blur_pipelines_v[]` before calling `vmaf_vulkan_kernel_pipeline_destroy()` on each bundle. It must also skip slot 0 of `xyb_pipelines`, `mul_pipelines`, `ssim_pipelines`, and `blur_pipelines_h` to avoid double-freeing the aliased base.
- Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): `ssimulacra2` mean = 24.613842, identical to pre-migration.
- Rebase impact: low. Builds on top of PR #272's `_add_variant()` helper. Upstream Netflix/vmaf has no ssimulacra2 extractor and no Vulkan backend, so there is nothing to merge against.
0118 — psnr_hvs_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-18)
- Touches:
  - `core/src/feature/vulkan/psnr_hvs_vulkan.c` — the state's `dsl + pipeline_layout + shader + desc_pool + pipeline[3]` collapses to `VmafVulkanKernelPipeline pl + VkPipeline pipeline_chroma_u + VkPipeline pipeline_chroma_v`. Plane 0 is the template's base pipeline; planes 1 and 2 are siblings via `vmaf_vulkan_kernel_pipeline_add_variant()`.
  - New `psnr_hvs_plane_pipeline()` accessor maps a plane index to the right `VkPipeline` handle.
- Invariant — variants destroyed before bundle. `close_fex` must `vkDestroyPipeline()` the chroma U/V variants before calling `vmaf_vulkan_kernel_pipeline_destroy(&s->pl)`, the same rule as ssim_vulkan in T-GPU-DEDUP-7.
- Numerical contract: unchanged. Same shaders, spec-constants, and push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
- Rebase impact: low. Builds on top of PR #272's `_add_variant()` helper.
0119 — vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-19)
- Touches:
  - `core/src/feature/vulkan/vif_vulkan.c` — the state's `dsl + pipeline_layout + shader + desc_pool + pipelines[4]` collapses to `VmafVulkanKernelPipeline pl + VkPipeline scale_variants[3]`. Scale 0 is the template's base pipeline; scales 1, 2, 3 are siblings via `vmaf_vulkan_kernel_pipeline_add_variant()`.
  - New `vif_scale_pipeline()` accessor maps a scale index to the right `VkPipeline` handle (replaces `s->pipelines[scale]`).
- Invariant — variants destroyed before bundle. `close_fex` must `vkDestroyPipeline()` the 3 scale variants before calling `vmaf_vulkan_kernel_pipeline_destroy(&s->pl)`, the same rule as ssim_vulkan in T-GPU-DEDUP-7 and psnr_hvs_vulkan in T-GPU-DEDUP-18.
- Numerical contract: unchanged. Same shaders, same spec-constants, same push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
- Rebase impact: low. Builds on top of PR #272's `_add_variant()` helper.
0120 — float_vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-20)
- Touches:
  - `core/src/feature/vulkan/float_vif_vulkan.c` — state collapses `dsl + pipeline_layout + shader + desc_pool` to `VmafVulkanKernelPipeline pl`. The `VkPipeline pipelines[2][4]` 2-D lookup table is preserved so the existing `[mode][scale]` dispatch path stays clean, but `pipelines[0][0]` aliases `s->pl.pipeline` (the template's base). The other 7 entries are sibling pipelines created via `vmaf_vulkan_kernel_pipeline_add_variant()`.
- Invariant — variants destroyed before bundle. `close_fex` must `vkDestroyPipeline()` the 7 sibling variants (every `(mode, scale)` except `(0, 0)`) before calling `vmaf_vulkan_kernel_pipeline_destroy(&s->pl)`, the same rule as ssim_vulkan / psnr_hvs_vulkan / vif_vulkan.
- Invariant — `pipelines[0][0]` aliasing. The base pipeline handle is owned by `s->pl.pipeline`; it is copied into `pipelines[0][0]` after `_create()` so the dispatch path can use a uniform 2-D lookup. The destroy loop must skip `(mode=0, scale=0)` to avoid double-freeing the template's pipeline.
- Numerical contract: unchanged. Same shaders, spec-constants (`mode` + `scale`), and push-constants. The Netflix-pair smoke matches `integer_vif` to 4 decimal places.
- Rebase impact: low. Builds on top of PR #272's `_add_variant()` helper.
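The ownership rule shared by the `[mode][scale]` / `[stage][scale]` migrations can be modelled in a few lines. The bundle owns the base handle aliased at `[0][0]`, so teardown destroys every other slot first and then the bundle. `variant_slots` and `teardown_order` are hypothetical helpers that describe the order; they are not the C `close_fex` code.

```python
from itertools import product


def variant_slots(n_outer: int, n_scale: int):
    """Slots close_fex must vkDestroyPipeline(): every (i, scale) except (0, 0)."""
    return [
        (i, s)
        for i, s in product(range(n_outer), range(n_scale))
        if (i, s) != (0, 0)   # aliased base: released by the bundle's _destroy
    ]


def teardown_order(n_outer: int, n_scale: int):
    """Variants first, then the bundle that owns the aliased base pipeline."""
    steps = [("vkDestroyPipeline", slot) for slot in variant_slots(n_outer, n_scale)]
    steps.append(("vmaf_vulkan_kernel_pipeline_destroy", "&s->pl"))
    return steps
```

For a `[2][4]` table this yields seven explicit destroys. For the `[4][4]` ADM tables it yields fifteen. In both cases the template's destroy call comes last and frees the base exactly once.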
0122 — float_adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-22)
- Touches:
  - `core/src/feature/vulkan/float_adm_vulkan.c` — twin to adm_vulkan (T-GPU-DEDUP-21); 16-pipeline 2-D `[stage][scale]` array. State collapses `dsl + pipeline_layout + shader + desc_pool` to `VmafVulkanKernelPipeline pl`. `pipelines[0][0]` aliases `s->pl.pipeline`; the other 15 entries are siblings via `vmaf_vulkan_kernel_pipeline_add_variant()`.
- Invariants:
  - Variants destroyed before bundle.
  - `pipelines[0][0]` aliasing: the destroy loop must skip `(stage=0, scale=0)`.
- Numerical contract: unchanged. Same float (`_s` suffix) primitives from `adm_tools.c`; same 5-element spec-constant tuple; same float partial accumulation reduced in double on the host.
0121 — adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-21)
- Touches:
  - `core/src/feature/vulkan/adm_vulkan.c` — state collapses `dsl + pipeline_layout + shader + desc_pool` to `VmafVulkanKernelPipeline pl`; the `VkPipeline pipelines[4][4]` 2-D lookup is preserved so the per-stage dispatch path stays clean. `pipelines[0][0]` aliases `s->pl.pipeline` (the template's base); the other 15 entries are sibling pipelines via `vmaf_vulkan_kernel_pipeline_add_variant()`.
- Invariants:
  - Variants destroyed before bundle (same rule as ssim_vulkan / psnr_hvs / vif / float_vif).
  - `pipelines[0][0]` aliasing: the destroy loop must skip `(stage=0, scale=0)` to avoid double-freeing the template's pipeline.
- Numerical contract: unchanged. Same shaders + 5-element spec-constant tuple (width, height, bpc, scale, stage) + push-constants.
- Rebase impact: low. Builds on top of PR #272.
0123 — ms_ssim_vulkan 2-bundle migration (T-GPU-DEDUP-23)
- Touches:
  - `core/src/feature/vulkan/ms_ssim_vulkan.c` — state collapses `decimate_dsl + decimate_pl + decimate_shader + ssim_dsl + ssim_pl + ssim_shader + desc_pool` (7 fields) to two bundles, `VmafVulkanKernelPipeline pl_decimate` + `pl_ssim`. Each bundle owns its own descriptor pool. The kernel has two distinct pipeline shapes (decimate = 2 SSBO bindings, ssim = 10 bindings), so two bundles is the minimum: `_add_variant()` only siblings pipelines under the same layout.
  - `decimate_pipelines[0]` aliases `pl_decimate.pipeline` (the template's base = scale 0). The remaining `MS_SSIM_SCALES - 2` decimate variants (scales 1..3) are siblings via `_add_variant()`.
  - `ssim_pipeline_horiz[0]` aliases `pl_ssim.pipeline` (base = scale 0, pass 0). The other 9 entries (4× `ssim_pipeline_horiz` for scales 1..4, plus 5× `ssim_pipeline_vert` for scales 0..4) are variants.
- Invariant — variants destroyed before bundle. Same rule as entry 0106: `close_fex` must destroy `decimate_pipelines[1..3]` and `ssim_pipeline_horiz[1..4]` + `ssim_pipeline_vert[0..4]` before calling `vmaf_vulkan_kernel_pipeline_destroy()` on `pl_decimate` / `pl_ssim`.
- Invariant — `[0]` aliasing destroy-skip. `decimate_pipelines[0]` and `ssim_pipeline_horiz[0]` must not be passed to `vkDestroyPipeline` in `close_fex`; `_destroy()` already releases them via `pl_decimate.pipeline` / `pl_ssim.pipeline`. Double-free is UB. The destroy loops in `close_fex` start at `i = 1` for decimate and skip `i == 0` for ssim_horiz.
- Invariant — per-bundle descriptor pool. The shared `s->desc_pool` is gone; `alloc_descriptor_set` now takes a `const VmafVulkanKernelPipeline *bundle` and uses `bundle->desc_pool` + `bundle->dsl`. Per-frame `vkFreeDescriptorSets` calls must target the matching pool (`pl_decimate.desc_pool` for decimate sets, `pl_ssim.desc_pool` for ssim sets); mixing them is undefined behavior.
- Numerical contract: unchanged. Same shaders, spec constants, push constants, and dispatch order as before. The `float_ms_ssim` Netflix-pair smoke (576×324×48f) reports mean 0.963241; ssim pyramid intermediate values are bit-identical to the pre-migration run.
- Rebase impact: low. Upstream Netflix has no Vulkan backend. Conflicts only against the parallel `T-GPU-DEDUP-{18..22}` PRs (#284–#288) on `CHANGELOG.md` / `docs/rebase-notes.md`; auto-resolve keeps both halves.
0106 — Vulkan kernel template multi-pipeline + ssim/motion migration (T-GPU-DEDUP-7)
- Touches:
  - `core/src/vulkan/kernel_template.h` — new `vmaf_vulkan_kernel_pipeline_add_variant()` helper. It takes the base pipeline bundle (DSL / pipeline layout / shader / pool owned by `vmaf_vulkan_kernel_pipeline_create`) plus a partial `VkComputePipelineCreateInfo` and produces a sibling `VkPipeline` that reuses the same layout and shader. The base `_create` and `_destroy` entry points are unchanged; existing consumers (psnr, moment, ciede) keep working.
  - `core/src/feature/vulkan/motion_vulkan.c` — state collapses `VkPipeline pipelines[2]` (kept "for SYCL parity" but functionally identical, because COMPUTE_SAD goes through push constants, not spec-constants) to a single `VmafVulkanKernelPipeline pl`. `create_pipelines` / `close_fex` shrink to template-driven create + destroy.
  - `core/src/feature/vulkan/ssim_vulkan.c` — state becomes `VmafVulkanKernelPipeline pl + VkPipeline pipeline_vert`. Pass 0 (horizontal) is the template's base pipeline; pass 1 (vertical) is created via `_add_variant()`. `close_fex` destroys the variant first, then calls `vmaf_vulkan_kernel_pipeline_destroy()` on the bundle.
- Invariant — no spec-constant drift between base and variant. `_add_variant()` overwrites `sType` / `stage.sType` / `stage.stage` / `stage.module` / `layout` of the caller's `VkComputePipelineCreateInfo`, so the variant is guaranteed to share the base's shader and layout. Callers control the variant's spec-constants via `pSpecializationInfo`. Reordering these overwrites lets a consumer accidentally bind a different shader module under the same layout, which is UB at descriptor-set time.
- Invariant — variant destroyed before bundle. `close_fex` in ssim must `vkDestroyPipeline(s->pipeline_vert)` before `vmaf_vulkan_kernel_pipeline_destroy(&s->pl)`. The bundle's `_destroy` releases the descriptor pool, and the descriptor sets allocated from it against the variant pipeline's layout are only dropped cleanly once the variant pipeline is already gone.
- Numerical contract: unchanged. Both kernels run identical shaders + spec-constants + push-constants as before; only the Vulkan boilerplate that creates / destroys the pipeline scaffolding moved to a shared owner. The cross-backend parity gate at `places=4` holds: the Netflix-pair `float_ssim` smoke (576×324×48f) reports mean 0.863, identical to pre-migration.
- Rebase impact: low. The base pipeline-bundle helpers predate this change (PR #270 / #271); the new `_add_variant` is additive. Upstream Netflix has no Vulkan backend to conflict with.
0111 — integer_ciede_cuda migrated to kernel_template (T-GPU-DEDUP-11)
- Touches:
  - `core/src/feature/cuda/integer_ciede_cuda.c` — the state's `CUstream + CUevent + CUevent + VmafCudaBuffer + host-pinned float*` quintet collapses to `VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb`. init / collect / close call the template's `lifecycle_init` / `readback_alloc` / `collect_wait` / `lifecycle_close` / `readback_free` helpers. submit keeps the pre-launch wait inline (intentional: ciede has no atomic, so the template's pre-launch memset is unnecessary).
- Numerical contract: unchanged. Pure CUDA-boilerplate consolidation. The host-side reduction in collect still uses the same `double` accumulator over per-block float partials, so `places=4` (ADR-0187) holds.
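Several CUDA entries rely on "float partials, double accumulator" to hold their `places=N` gates. The sketch below illustrates why. It models float32 accumulation by rounding every intermediate sum through `struct`, and compares that with plain Python (double) accumulation over the same float32 partials. The 0.1-valued partials are synthetic test data, not kernel output.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python (double) float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_in_f32(partials):
    acc = 0.0
    for p in partials:
        acc = to_f32(acc + p)   # models a float32 accumulator
    return acc


def sum_in_double(partials):
    acc = 0.0
    for p in partials:
        acc += p                # Python floats are IEEE-754 doubles
    return acc
```

With 100 000 partials of `float32(0.1)`, the double accumulator reproduces the exact total. The float32 accumulator drifts by more than a thousandth, which is already enough to break a `places=4` comparison.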
0112 — integer_moment_cuda migrated to kernel_template (T-GPU-DEDUP-12)
- Touches:
  - `core/src/feature/cuda/integer_moment_cuda.c` — the state's stream/event/device-buffer/host-pinned quintet collapses to `VmafCudaKernelLifecycle lc + VmafCudaKernelReadback rb`. submit calls `vmaf_cuda_kernel_submit_pre_launch` (the atomic counters require the device-side memset). init / collect / close call the matching template helpers.
- Numerical contract: unchanged. Same per-frame atomic accumulators (4× uint64), same `sums_host[i] / n_pixels` host division.
- Rebase impact: low. Upstream Netflix has no equivalent template; this consolidation is fork-local.
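A host-side model of the moment contract: four integer accumulators per frame, then one `sums[i] / n_pixels` division each. The accumulator order (ref first moment, ref second moment, dis first moment, dis second moment) and the key names are assumptions for illustration. Python ints stand in for the device's uint64 counters.

```python
def moments(ref, dis):
    """Toy moment features: 4 integer sums, divided by n_pixels once at the end."""
    n_pixels = len(ref)
    sums = [0, 0, 0, 0]            # stand-ins for the 4x uint64 atomic counters
    for r, d in zip(ref, dis):
        sums[0] += r               # ref 1st moment
        sums[1] += r * r           # ref 2nd moment
        sums[2] += d               # dis 1st moment
        sums[3] += d * d           # dis 2nd moment
    names = ("ref1st", "ref2nd", "dis1st", "dis2nd")  # hypothetical key names
    return {k: s / n_pixels for k, s in zip(names, sums)}
```

Because the sums are exact integers, the only rounding happens in the final division. For scale, a 3840×2160 frame of 16-bit samples keeps the squared sum below 2^64, so uint64 counters do not overflow.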
0113 — integer_motion_v2_cuda migrated to kernel_template (T-GPU-DEDUP-13)
- Touches:
  - `core/src/feature/cuda/integer_motion_v2_cuda.c` — the stream/event pair + SAD device+host quintet collapses to `lc + rb`. The raw-pixel ping-pong `pix[2]` stays outside the bundle. submit keeps the memset on `pic_stream` inline rather than calling `submit_pre_launch` (the helper would move the memset to `lc.str`, which races with the kernel reading the accumulator). init / collect / close call the matching template helpers.
- Numerical contract: unchanged. Same D2D copy, same conditional kernel launch on frame ≥ 1, same host-side `min(score[i], score[i+1])` flush.
0114 — integer_ssim_cuda migrated to kernel_template (T-GPU-DEDUP-14)
- Touches:
  - `core/src/feature/cuda/integer_ssim_cuda.c` — the stream/event/partials device+host quintet collapses to `lc + rb`. Five intermediate float buffers (`h_ref_mu`, `h_cmp_mu`, `h_ref_sq`, `h_cmp_sq`, `h_refcmp`) stay outside the bundle. submit keeps the `cuStreamWaitEvent + horiz + vert + DtoH` chain inline: SSIM writes one float per block (no atomic), so the template's `submit_pre_launch` memset is unnecessary. init / collect / close use the matching template helpers.
- Numerical contract: unchanged. Same horiz-then-vert two-pass pipeline, same per-block float partial reduction in double on the host. `places=4` (matching the ciede_cuda precision pattern) holds.
- Rebase impact: low. Upstream Netflix has no equivalent; this is fork-added.
0115 — ms_ssim_cuda + psnr_hvs_cuda lifecycle migration (T-GPU-DEDUP-15)
- Touches:
  - `core/src/feature/cuda/integer_ms_ssim_cuda.c` — the stream + 2-event lifecycle is replaced with `VmafCudaKernelLifecycle lc`. The multi-level pyramid, SSIM intermediate, and 3-partials buffers stay outside the template's single-pair readback bundle.
  - `core/src/feature/cuda/integer_psnr_hvs_cuda.c` — same shape; the 3-plane ref/dist/partials triples remain inline.
- Numerical contract: unchanged. The migration only affects init / close boilerplate. The submit / collect dispatch and host reduction paths are untouched apart from these field renames: `s->str` → `s->lc.str`, `s->event` → `s->lc.submit`, `s->finished` → `s->lc.finished`.
0116 — float_psnr/ansnr/motion cuda → kernel_template (T-GPU-DEDUP-16)
- Touches:
  - `core/src/feature/cuda/float_psnr_cuda.c` — the stream/event/partials quintet → `lc + rb`. The input upload buffers `ref_in` / `dis_in` stay outside the bundle.
  - `core/src/feature/cuda/float_ansnr_cuda.c` — same shape; `rb` wraps the interleaved (sig, noise) partials.
  - `core/src/feature/cuda/float_motion_cuda.c` — same shape; `rb` wraps the SAD partials, and the `blur[2]` ping-pong stays outside.
- Numerical contract: unchanged. Same dispatch geometry, same reduction order. The cross-backend parity gate at the kernels' contracted precision (`places=3` per ADR-0192) holds.
0117 — float_adm + float_vif cuda lifecycle migration (T-GPU-DEDUP-17)
- Touches:
  - `core/src/feature/cuda/float_adm_cuda.c` — the stream + 2-event lifecycle is replaced with `VmafCudaKernelLifecycle lc`. The multi-stage DWT + CSF pipeline state stays outside the template's single-pair readback bundle.
  - `core/src/feature/cuda/float_vif_cuda.c` — same shape; the 4-level pyramid and per-scale (num, den) pairs remain inline.
- Numerical contract: unchanged. The migration only affects the init / close stream-event boilerplate. The submit / collect dispatch and host reduction paths are untouched apart from the field renames.
- Rebase impact: low. Upstream Netflix has no equivalent template; this is fork-added.
0107 — float_psnr_vulkan migrated to kernel_template (T-GPU-DEDUP-8)
- Touches:
  - `core/src/feature/vulkan/float_psnr_vulkan.c` — the state's `dsl + pipeline_layout + shader + pipeline + desc_pool` quintet is collapsed into a single `VmafVulkanKernelPipeline pl`. `create_pipelines` and `close_fex` shrink to a template-driven create + destroy. No shader, spec-constant, or push-constant changes.
- Numerical contract: unchanged. The migration is a pure Vulkan-boilerplate consolidation. The cross-backend parity gate at `places=4` holds: the Netflix-pair smoke reports a `float_psnr` mean of 30.755 dB, identical to pre-migration.
0109 — float_ansnr_vulkan + motion_v2_vulkan migrated to kernel_template (T-GPU-DEDUP-9)
- Touches:
  - `core/src/feature/vulkan/float_ansnr_vulkan.c` — single-pipeline state collapses to `VmafVulkanKernelPipeline pl`; `create_pipelines` and `close_fex` shrink to a template-driven create + destroy.
  - `core/src/feature/vulkan/motion_v2_vulkan.c` — same shape.
- Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. The cross-backend parity gate at the kernel's contracted precision holds. The Netflix-pair smoke reports a `float_ansnr` mean of 23.51 dB and a `motion2_v2_score` mean of 3.895, identical to pre-migration.
0110 — float_motion_vulkan migrated to kernel_template (T-GPU-DEDUP-10)
- Touches:
  - `core/src/feature/vulkan/float_motion_vulkan.c` — single-pipeline state collapses to `VmafVulkanKernelPipeline pl`; `create_pipelines` and `close_fex` shrink to a template-driven create + destroy.
- Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. The Netflix-pair smoke reports a `motion` mean of 4.049 and a `motion2` mean of 3.894, identical to pre-migration.
- Rebase impact: low. Upstream Netflix has no Vulkan backend.
0108 — Bristol VI-Lab feasibility digest + BVI-CC ingest ADR (Draft)
- Touches:
  - `docs/research/0046-bristol-vi-lab-feasibility.md` (new) — nine-dataset survey + use-case fit + effort estimate.
  - `docs/adr/0241-bristol-bvi-cc-ingest.md` (new, Status: Draft) — proposal to ingest BVI-CC as the second tiny-AI corpus.
  - `docs/adr/README.md` — index row for ADR-0241.
  - `CHANGELOG.md` — Added entry.
- Numerical contract: not applicable (docs-only).
- Rebase impact: none. Pure research deliverables; upstream Netflix has no equivalent surface.
0094 — Vulkan VkImage import v2 async pending-fence (T7-29 part 4 / ADR-0251)
- ADR: ADR-0251; predecessor ADR-0186.
- Touches:
  - `core/src/vulkan/import.c` — full rewrite of the submission path.
    - The single-fence `submit_and_wait` becomes a per-slot `submit_to_slot` + `drain_slot_fence`.
    - The new `slot_alloc` / `slot_release` helpers materialise / tear down a ring slot (staging pair + command buffer + fence).
    - `vmaf_vulkan_import_image` indexes into the ring by `frame_index % ring_size`.
    - `vmaf_vulkan_wait_compute` drains every outstanding fence.
    - `vmaf_vulkan_state_build_pictures` waits on the slot's fence before exposing the host pointer.
    - Public-API signatures are unchanged.
  - `core/src/vulkan/vulkan_internal.h` — new `struct VmafVulkanImportSlot`. `VmafVulkanImportSlots` becomes a fixed-capacity `VmafVulkanImportSlot ring[VMAF_VULKAN_RING_MAX]` plus geometry and `ring_size`. Two new defines: `VMAF_VULKAN_RING_DEFAULT` (4) and `VMAF_VULKAN_RING_MAX` (8). `VmafVulkanState` gains `requested_ring_size`.
  - `core/src/vulkan/common.c` — `vmaf_vulkan_state_init` and `_state_init_external` set `requested_ring_size = VMAF_VULKAN_RING_DEFAULT`.
  - `core/test/test_vulkan_async_pending_fence.c` (new, contract smoke for the v1 → v2 swap).
  - `core/test/meson.build` — registers the new test under the existing `enable_vulkan` guard.
  - `core/src/vulkan/AGENTS.md` (new) — pins the three rebase-sensitive ring invariants.
  - `docs/adr/0251-vulkan-async-pending-fence.md` (new), `docs/research/0042-vulkan-async-pending-fence.md` (new), `docs/api/gpu.md`, `docs/backends/vulkan/overview.md`, `CHANGELOG.md`, `docs/rebase-notes.md`.
  - `ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch` — unchanged. The v2 ring is fully internal to `VmafVulkanState`, and the public ABI stays byte-identical, so the filter consumes the new path transparently.
- Invariant 1 — fixed ring depth at first import. `lazy_alloc_ring` is the only place that materialises the ring. Once allocated, the depth never changes for the lifetime of the `VmafVulkanState`; a caller that needs a different depth has to free and re-init. The geometry-pinning contract from v1 (ADR-0186) is preserved verbatim.
- Invariant 2 — `vkResetFences` only after `VK_SUCCESS` from `vkWaitForFences`. The sole reset path lives in `drain_slot_fence`, and `fence_in_flight` flips back to 0 only after the wait succeeds. A `-EIO` from the wait propagates up without resetting, so a retry correctly re-waits rather than silently moving on.
- Invariant 3 — `state_free` drains before destroying. `vmaf_vulkan_import_slots_free` walks the ring and calls `drain_slot_fence` on every in-flight slot. It then issues one belt-and-braces `vkQueueWaitIdle`, because any feature kernel that submitted on the same queue may still be running. Reordering this triggers validation-layer "destroying in-use object" errors.
- Numerical contract: unchanged. Async submission only changes when the host can read the staging buffer, not which bytes the GPU writes. The cross-backend parity gate (`scripts/ci/cross_backend_parity_gate.py`, `places=4`) holds.
- Memory delta: the staging arena scales from 1 to `ring_size` per direction. At the default depth with 1080p 8-bit Y, the per-state host-visible footprint grows from ~4 MiB to ~16 MiB. Documented in ADR-0251 §Consequences.
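The following is a host-only model of the ring bookkeeping, with no Vulkan calls. It shows slot selection by `frame_index % ring_size`, the reset-only-after-successful-wait rule of Invariant 2, and the staging arithmetic behind the ~4 MiB → ~16 MiB figure. `Slot`, `Ring`, `slot_for_frame`, `drain_slot` and `staging_bytes` are hypothetical. The `fence_signalled` flag stands in for what `vkWaitForFences` would report. The staging pair is assumed to be one ref and one dist luma plane.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DEFAULT 4 /* mirrors VMAF_VULKAN_RING_DEFAULT */
#define RING_MAX 8     /* mirrors VMAF_VULKAN_RING_MAX */

/* Hypothetical stand-in for VmafVulkanImportSlot: fence state only. */
typedef struct {
    bool fence_in_flight;
    bool fence_signalled; /* what a real vkWaitForFences would observe */
    unsigned resets;      /* counts simulated vkResetFences calls */
} Slot;

typedef struct {
    Slot ring[RING_MAX];
    unsigned ring_size; /* pinned at first import (Invariant 1) */
} Ring;

/* Slot selection by frame index, wrapping around the ring. */
static Slot *slot_for_frame(Ring *r, uint64_t frame_index)
{
    assert(r->ring_size > 0 && r->ring_size <= RING_MAX);
    return &r->ring[frame_index % r->ring_size];
}

/* Invariant 2: reset only after the wait succeeds. On failure the slot
 * stays in flight, so a retry re-waits instead of moving on. */
static int drain_slot(Slot *s)
{
    if (!s->fence_in_flight)
        return 0;
    if (!s->fence_signalled)
        return -EIO; /* wait failed: propagate, do not reset */
    s->resets++;
    s->fence_in_flight = false;
    return 0;
}

/* Host-visible staging bytes: ring_size slots x (ref + dist) planes. */
static size_t staging_bytes(unsigned ring_size, size_t w, size_t h,
                            size_t bytes_per_sample)
{
    return (size_t)ring_size * 2u * w * h * bytes_per_sample;
}
```

Take a 1080p 8-bit luma plane. At ring depth 1, `staging_bytes` gives 4,147,200 bytes (≈ 3.96 MiB). At the default depth of 4 it gives 16,588,800 bytes (≈ 15.8 MiB), consistent with the delta quoted above.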
0090 — cambi_vulkan extractor (T7-36 / ADR-0210)
- ADR: ADR-0210; predecessor ADR-0205.
- Touches:
  - `core/src/feature/vulkan/cambi_vulkan.c` — replaces the spike scaffold's `init_stub` / `extract_stub` / `close_stub` triple with the full Vulkan-aware lifecycle.
  - `core/src/feature/vulkan/shaders/cambi_preprocess.comp` (new), `cambi_mask_dp.comp` (new; unified row-SAT / col-SAT / threshold-compare selected via the `PASS=0/1/2` spec constant).
  - `core/src/feature/cambi.c` — appends a small block of public trampolines (`vmaf_cambi_*`) at the bottom of the file that thinly wrap the file-static helpers. No upstream function-static code is renamed or moved. The entire upstream body of cambi.c above the trampolines stays byte-identical, which keeps Netflix sync straightforward.
  - `core/src/feature/cambi_internal.h` (new) — internal-only header exposing `vmaf_cambi_calculate_c_values`, `vmaf_cambi_get_spatial_mask`, etc., to the GPU twin.
  - `core/src/vulkan/meson.build` — registers the 5 cambi shaders in `vulkan_shader_sources[]` and `cambi_vulkan.c` in `vulkan_sources`.
  - `core/src/feature/feature_extractor.c` — adds the extern decl + registry entry for `vmaf_fex_cambi_vulkan` under `#if HAVE_VULKAN`.
  - `scripts/ci/cross_backend_vif_diff.py` — `cambi` row in `FEATURE_METRICS`, so the cross-backend gate runs at `places=4` against the CPU baseline.
  - `docs/adr/0210-cambi-vulkan-integration.md`, `docs/research/0032-cambi-vulkan-integration.md`, `docs/backends/vulkan.md`, `CHANGELOG.md`.
- Invariant 1 — bit-exactness by construction. Every GPU phase is integer arithmetic: `uint16` derivative, `int32` SAT, `>` compare, stride-2 gather, and the 3-element `mode3` lookup. The readback into the host `VmafPicture` pair is byte-identical to what the CPU would have written. The host residual then runs the unmodified CPU `calculate_c_values` + spatial pooling on those buffers. A rebase that introduces float arithmetic into one of these GPU phases will silently break `places=4` and must be caught at the cross-backend gate. One example would be a future Netflix change that adds a bilinear interpolation step to the derivative kernel.
- Invariant 2 — `cambi_internal.h` signatures must stay in lock-step with cambi.c's file-static helpers. The Vulkan twin calls `vmaf_cambi_calculate_c_values`, which trampolines to the file-static `calculate_c_values`. Any signature change to the latter (extra parameters, type changes) must update the trampoline + header in the same PR, or the GPU build breaks.
- On upstream sync: upstream sometimes renames cambi.c's file-static helpers (e.g. a Netflix tidy-up could rename `decimate` → `cambi_decimate`). When rebasing, search cambi.c's tail for the trampoline block. The static helpers it calls must match the upstream symbol names: `get_spatial_mask`, `decimate`, `filter_mode`, `calculate_c_values`, `spatial_pooling`, `weight_scores_per_scale`, `get_pixels_in_window`, `increment_range`, `decrement_range`, `get_derivative_data_for_row`, `cambi_preprocessing`. Update the trampoline body if upstream renames them. Signatures should not need to change, because the trampoline already takes the function-pointer-typedef form (`VmafRangeUpdater`, etc.).
- Re-test on rebase: `python3 scripts/ci/cross_backend_vif_diff.py --backend vulkan --feature cambi --ref testdata/ref_576x324_48f.yuv --dist testdata/dis_576x324_48f.yuv --width 576 --height 324 --pixel-format 420 --bitdepth 8 --frames 48`. It should emit `places=4 PASS` with `max_abs_diff = 0.0`. If it diverges, bisect the GPU phases: read back individual buffers (`image_buf` / `mask_buf` / `deriv_buf`) and compare each against the CPU's in-place `pic` plane after the equivalent stage.
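Invariant 1 can be illustrated with a CPU-side sketch of the row-SAT / col-SAT / threshold-compare sequence that `cambi_mask_dp.comp` selects via `PASS=0/1/2`. This is not the shader's code: the flat row-major layout, the function names, and the exact threshold semantics are assumptions. Every step is integer addition, subtraction, or comparison, so the result cannot depend on evaluation order or on floating-point behaviour. That property is what makes a byte-identical readback possible.

```c
#include <assert.h>
#include <stdint.h>

/* Pass 0 (row-SAT): running sum along each row of a w x h mask. */
static void row_sat(const uint16_t *in, int32_t *sat, int w, int h)
{
    for (int y = 0; y < h; y++) {
        int32_t acc = 0;
        for (int x = 0; x < w; x++) {
            acc += in[y * w + x];
            sat[y * w + x] = acc;
        }
    }
}

/* Pass 1 (col-SAT): running sum down each column, in place. */
static void col_sat(int32_t *sat, int w, int h)
{
    for (int x = 0; x < w; x++)
        for (int y = 1; y < h; y++)
            sat[y * w + x] += sat[(y - 1) * w + x];
}

/* Inclusive box sum over [x0..x1] x [y0..y1] read back from the SAT. */
static int32_t box_sum(const int32_t *sat, int w, int x0, int y0, int x1,
                       int y1)
{
    assert(0 <= x0 && x0 <= x1 && 0 <= y0 && y0 <= y1);
    int32_t s = sat[y1 * w + x1];
    if (x0 > 0)
        s -= sat[y1 * w + (x0 - 1)];
    if (y0 > 0)
        s -= sat[(y0 - 1) * w + x1];
    if (x0 > 0 && y0 > 0)
        s += sat[(y0 - 1) * w + (x0 - 1)];
    return s;
}

/* Pass 2: integer threshold compare on the window sum. */
static int box_exceeds(const int32_t *sat, int w, int x0, int y0, int x1,
                       int y1, int32_t threshold)
{
    return box_sum(sat, w, x0, y0, x1, y1) > threshold;
}
```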
The pre-ADR-0108 fork-local PRs are summarised by workstream rather than per-PR. Future PRs add entries individually.
0085 — Upstream c70debb1 partial port (adm_csf + barten_csf tests)
- No ADR. Pure upstream cherry-pick under the ADR-0108 carve-out ("pure upstream syncs and `port-upstream-commit` PRs are exempt").
- Upstream source: `c70debb1` (Kyle Swanson, 2026-04-28): "libvmaf/test: port new adm/vif/speed tests". The audit row that flagged the gap is T-NEW-2 in the 2026-04-29 quarterly upstream-backlog re-audit (PR #205).
- Touches (additive only):
  - `core/src/feature/adm_csf_tools.h` — new header (verbatim from upstream). It declares the inline `adm_native_csf` helper (DLM-paper CSF) used by the new `test_adm_csf` unit.
  - `core/test/test_adm_csf.c` — new unit (verbatim from upstream); 2 `mu_assert` cases on `adm_native_csf(3, 3.0, 1080, {0, 45})`.
  - `core/test/test_barten_csf.c` — new unit (verbatim from upstream); 23 `mu_assert` cases over `barten_rod_cone_sens`, `barten_mtf`, `barten_csf`, `linear_interpolate`, `barten_watson_blend_csf` (all symbols already on the fork).
  - `core/test/meson.build` — registers the two new executables and adds `test('test_adm_csf', ...)` and `test('test_barten_csf', ...)`.
  - `CHANGELOG.md` Unreleased § Changed.
- Deliberate scope cuts (the upstream commit's other halves are not portable verbatim):
  - `test_vif_tools.c` — depends on several upstream symbols: `NUM_KERNELSCALES`, the 21-entry `valid_kernelscales` table, `vif_validate_kernelscale`, `vif_get_filter_size`, `vif_get_filter`, `speed_get_antialias_filter`, and a `[NUM_KERNELSCALES][5][65]` filter table. The fork's `vif_filter1d_table_s[11][4][65]` does not match that table. Per Research-0024 Strategy E, the fork deliberately diverges from the upstream `vif` runtime-helper chain to preserve the ADR-0138 / 0139 / 0142 / 0143 SIMD bit-exactness contract. Porting this test requires porting the runtime helpers first.
  - `test_speed_chroma.c` — `#include`s `feature/speed.c` directly, but the fork has no SpEED extractor (`feature/speed.c` does not exist). Pairs with audit row T-NEW-1 (port the SpEED extractor wholesale, or absorb it into the tiny-AI speed metric).
- Invariants (rebase-relevant):
  - The new `adm_csf_tools.h` header is wholly additive. It does not conflict with the existing fork `adm_csf_s` non-inline helper in `adm_tools.h` (different signature, different translation units).
  - The two new tests do not depend on Netflix golden YUVs; they evaluate the closed-form CSF math directly. No golden-data interaction.
- On upstream sync: either of two future ports unlocks the deferred halves of this commit: the upstream `vif` runtime-helper chain (Research-0024 Strategy A reversal) or the SpEED extractor (T-NEW-1). Until then, fork-side `test_vif_tools.c` / `test_speed_chroma.c` stay absent.
- Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu test_adm_csf test_barten_csf
meson test -C build-cpu test_adm_csf test_barten_csf
0084 — Embedded MCP server scaffold (T5-2, ADR-0209)
- ADR: ADR-0209 (audit-first scaffold) on top of the ADR-0128 governance + Research-0005 design.
- Upstream source: fork-local. Netflix/vmaf has no embedded MCP server (and no plans to add one — the workflow is agent-tooling-specific, well outside upstream's library scope).
- Touches:
  - `core/include/libvmaf/libvmaf_mcp.h` — new public header.
  - `core/include/core/meson.build` — new `if get_option('enable_mcp')` install branch.
  - `core/src/mcp/` — new directory: `mcp.c` (stub TU) + `meson.build` (exposes `mcp_sources` + `mcp_defines`).
  - `core/src/meson.build` — new `is_mcp_enabled` guard + `subdir('mcp')` block. `mcp_sources` is threaded into the `library('vmaf', ...)` source list alongside `dnn_sources`.
  - `core/test/meson.build` — new `if get_option('enable_mcp')` block wiring `test_mcp_smoke`.
  - `core/test/test_mcp_smoke.c` — new 12-sub-test smoke.
  - `core/meson_options.txt` — new `enable_mcp` umbrella + three sub-flags (all default `false`).
- Invariant: every public entry point in `libvmaf_mcp.h` (`vmaf_mcp_init` / `_start_sse` / `_start_uds` / `_start_stdio` / `_stop` / `_close`) returns `-ENOSYS` (or `-EINVAL` on bad arguments) until the T5-2b runtime PR lands. The smoke pins this contract. A runtime PR that flips a return code without flipping the smoke expectation regresses the gate.
- On upstream sync: zero interaction with upstream files. The change is a wholly additive directory plus boolean build flags. The `subdir('mcp')` insertion in `core/src/meson.build` lives next to the existing `subdir('dnn')` / Vulkan blocks. An upstream conflict there would be confined to those few lines and is mechanical to resolve.
- Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_mcp=false
ninja -C build-cpu && meson test -C build-cpu # baseline still green
meson setup --reconfigure build-cpu libvmaf -Denable_mcp=true \
-Denable_mcp_sse=true -Denable_mcp_uds=true -Denable_mcp_stdio=true
ninja -C build-cpu
meson test -C build-cpu test_mcp_smoke # 12/12 sub-tests pass
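The scaffold-phase contract can be pictured as a stub that validates its arguments and then reports "not implemented". The type and function names below are hypothetical; the real `libvmaf_mcp.h` signatures are not reproduced here. The asserts mirror what a smoke test pinning `-ENOSYS` / `-EINVAL` would check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opaque handle; the real header's types may differ. */
typedef struct McpServer McpServer;

/* Scaffold-phase stub: reject bad arguments first, then report
 * "not implemented" so callers can detect the missing runtime. */
static int mcp_start_stub(McpServer *srv, const char *endpoint)
{
    if (srv == NULL || endpoint == NULL)
        return -EINVAL;
    return -ENOSYS;
}
```

Checking arguments before returning `-ENOSYS` keeps the `-EINVAL` half of the contract testable even while nothing is implemented. A runtime PR then only has to change the success path and its smoke expectations.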
0065 — T7-37 Netflix bench rerun + docs/benchmarks.md TBD fill
- No ADR. Empirical fill of pre-existing `TBD` cells; no new decision. This rerun depends on bench-script fixes that shipped earlier:
  - PR #169 (`libvmaf/AGENTS.md` backend-engagement foot-guns)
  - PR #170 (`--backend cuda` actually engages CUDA)
  - PR #171 (`testdata/bench_all.sh` uses correct flags)

  The Vulkan header install for SDK consumers is PR #175.
- Touches (additive only):
  - `docs/benchmarks.md` — every `TBD` cell replaced with measured numbers. The hardware-profile table is updated to the `ryzen-4090-arc` host the rerun was performed on. The "How to reproduce" section now documents fixture acquisition for the gitignored BBB 4K 200-frame pair.
  - `CHANGELOG.md` Unreleased § Changed entry.
- Invariants (rebase-relevant): none. The numbers are tied to fork commit `41301496` and the `ryzen-4090-arc` profile. An upstream rebase that changes feature pipelines would invalidate the table but not break parsing.
- On upstream sync: zero interaction. Pure docs.
- Re-test on rebase: run `bash testdata/bench_all.sh` after a fresh fork build. It confirms that the bench script still drives all four backends and that the per-row metrics-key counts (CPU=15, CUDA=12, SYCL/Vulkan=34) still distinguish them. If the counts collapse to one value, the new upstream silently broke a backend dispatcher.
0050 — float_adm_cuda + float_adm_sycl extractors (ADR-0202)
- ADR: ADR-0202
- Touches:
  - `core/src/feature/cuda/float_adm/float_adm_score.cu` (new)
  - `core/src/feature/cuda/float_adm_cuda.{c,h}` (new)
  - `core/src/feature/sycl/float_adm_sycl.cpp` (new)
  - `core/src/meson.build` — three changes:
    1. a new `float_adm_score` entry in `cuda_cu_sources`;
    2. a new `cuda_cu_extra_flags` dict that threads `--fmad=false` + `-Xcompiler=-ffp-contract=off` into the `float_adm_score` fatbin only;
    3. a new SYCL source in `sycl_feature_sources`.
  - `core/src/feature/feature_extractor.c` — extern decls + list entries for `vmaf_fex_float_adm_cuda` / `vmaf_fex_float_adm_sycl` under `#if HAVE_CUDA` / `#if HAVE_SYCL`.
- Invariant 1 — `--fmad=false` for the float_adm fatbin only. Two computations require IEEE-754 add/mul ordering to match the GLSL `precise` qualifier in `float_adm.comp`: the angle-flag dot product (`ot_dp = oh*th + ov*tv`) and the cube reductions (`xa*xa*xa`, `csf_o*csf_o*csf_o`). NVCC's default `-fmad=true` contracts these multiply-add chains into FMAs and drifts past `places=4` at scale 3 / adm2. The integer ADM kernels share `cuda_flags` but use `int64` accumulators, where FMA is irrelevant, so keep the FMA-on default for them.
- Invariant 2 — parent-LL dimension trap. Stage 0 at `scale > 0` reads the parent's LL band. The mirror/clamp bounds are therefore `scale_w/h[scale]` (= the parent's LL output dims = the current scale's input dims), NOT `scale_w/h[scale - 1]` (= the parent's full image dims). Both `float_adm_cuda.c` and `float_adm_sycl.cpp` cite this inline. Do not "simplify" by using the off-by-one neighbour.
- Re-test:
CXX=icpx CC=icx meson setup build-cs -Denable_cuda=true \
-Denable_sycl=true -Denable_vulkan=enabled \
-Denable_float=true \
-Dsycl_compiler=/opt/intel/oneapi/compiler/latest/bin/icpx
ninja -C build-cs
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary build-cs/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --feature float_adm \
--backend cuda --places 4
# Same with --backend sycl on a host with an SYCL device.
# Both must report 0/N mismatches at places=4.
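A small C experiment shows why the multiply-add contraction in Invariant 1 matters. It is not float_adm code. The inputs are hand-picked so that the float-rounded product and the exact product differ by half an ulp. The fused path is emulated in double rather than by calling a hardware FMA; that is exact for these inputs, because a float × float product fits in a double's 53-bit significand. With these inputs a single contraction turns an exact cancellation to 0 into a residual of 2^-24. Differences of this kind are what can accumulate across a band.

```c
#include <assert.h>

/* Unfused: the product is rounded to float before the add, which is what
 * --fmad=false / -ffp-contract=off preserve. The volatile store stops the
 * C compiler from contracting a * b + c into an FMA on its own. */
static float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;
    return p + c;
}

/* Fused-multiply-add emulation: (double)a * b is exact, so only one
 * rounding (to float, at the end) remains for the inputs used here. */
static float fused_emulated(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With `a = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact value of `a*a` is `1 + 2^-11 + 2^-24`. Rounding that to float (ties-to-even) yields `1 + 2^-11`, so the unfused path returns 0 and the fused path returns 2^-24.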
0049 — float_adm_vulkan extractor (ADR-0199)
- ADR: ADR-0199
- Touches:
  - `core/src/feature/vulkan/float_adm_vulkan.c` (new)
  - `core/src/feature/vulkan/shaders/float_adm.comp` (new)
  - `core/src/vulkan/meson.build` (adds the .comp shader and the new .c source)
  - `core/src/feature/feature_extractor.c` (extern decl + list entry under `#if HAVE_VULKAN`)
  - `scripts/ci/cross_backend_vif_diff.py` (`float_adm` entry in `FEATURE_METRICS`)
  - `.github/workflows/tests-and-quality-gates.yml` (lavapipe `float_adm` step at `places=4`)
- Invariant: the float_adm GPU port uses the `2 * sup - idx - 1` mirror form on both axes. This matches both the scalar `adm_dwt2_s` and the AVX2 `float_adm_dwt2_avx2`, which both consume the same `dwt2_src_indices_filt_s` index buffer. It is intentionally different from float_vif's GPU mirror (ADR-0197), which uses `-2` because float_vif's AVX2 path takes a different code branch. Do not "fix" the asymmetry by analogy with float_vif.
- Re-test:
meson setup build-vk -Denable_vulkan=enabled -Denable_cuda=false \
-Denable_sycl=false
ninja -C build-vk
meson test -C build-vk
VK_LOADER_DRIVERS_SELECT='*lvp*' python3 \
scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary build-vk/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --feature float_adm --places 4
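The two mirror forms differ by exactly one sample at every out-of-range tap, which is why "unifying" them changes the border coefficients. A minimal sketch of the upper-edge case follows. It assumes `sup` is the number of valid samples and that an index overshoots by less than `sup`. Lower-edge handling is omitted, and both helper names are hypothetical.

```c
#include <assert.h>

/* Half-sample symmetric reflection: the edge sample is repeated
 * (..., s[sup-1], s[sup-1], s[sup-2], ...). This is the float_adm form. */
static int mirror_minus1(int idx, int sup)
{
    assert(sup > 0);
    return idx < sup ? idx : 2 * sup - idx - 1;
}

/* Whole-sample symmetric reflection: the edge sample is not repeated
 * (..., s[sup-1], s[sup-2], ...). This is the "-2" form used by
 * float_vif's GPU mirror. */
static int mirror_minus2(int idx, int sup)
{
    assert(sup > 1);
    return idx < sup ? idx : 2 * sup - idx - 2;
}
```

For `sup = 8`, index 8 maps to 7 under the `-1` form and to 6 under the `-2` form.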
0083 — SSIMULACRA 2 Vulkan kernel (ADR-0201)
- ADR: ADR-0201
- Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor; this is a fully fork-local feature.
- Touches:
  - `core/src/feature/vulkan/ssimulacra2_vulkan.c` (new file).
  - `core/src/feature/vulkan/shaders/ssimulacra2_xyb.comp`, `ssimulacra2_blur.comp`, `ssimulacra2_mul.comp`, `ssimulacra2_ssim.comp` (4 new shader files).
  - `core/src/vulkan/meson.build` — added the 4 shaders to `vulkan_shader_sources` and 1 source to `vulkan_sources`. Also added all 4 ssimulacra2 shaders to `psnr_hvs_strict_shaders` (the `-O0` strict-mode list, which keeps its legacy name).
  - `core/src/feature/feature_extractor.c` — registered `vmaf_fex_ssimulacra2_vulkan` in the Vulkan branch of the extractor list (between `psnr_hvs_vulkan` and the CUDA block).
  - `scripts/ci/cross_backend_vif_diff.py` — added `ssimulacra2` to `FEATURE_METRICS`.
- Rebase impact: low. Fully additive; no upstream-shared files are modified beyond `feature_extractor.c`'s registry array. That array grows with every new extractor and is not a rebase pain point.
- Verification command:
meson setup core/build-vk-ss2 \
-Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false \
libvmaf
ninja -C core/build-vk-ss2 tools/vmaf
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build-vk-ss2/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 \
--feature ssimulacra2 --backend vulkan --places 1
# expected: max_abs_diff ≈ 1.59e-2, 0/48 mismatches at places=1
- Follow-ups:
  - CUDA + SYCL twins (batch 3 parts 7b + 7c per ADR-0192).
  - Performance follow-up: re-bin multiple rows / columns per WG in the IIR blur. It currently uses `local_size = 1`, one row/col per WG, for correctness.
  - Optional: rename `psnr_hvs_strict_shaders` to `strict_shaders` in `core/src/vulkan/meson.build` (cosmetic; out of scope for this PR).
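The `--places` threshold explains why a `max_abs_diff` of 1.59e-2 passes at `places=1` but would fail at `places=2`. The sketch below assumes that `cross_backend_vif_diff.py` follows Python `unittest`'s `assertAlmostEqual` convention, which is an assumption because the script is not reproduced here. Under that convention a pair passes when the difference rounds to zero at N decimal places, i.e. when it is below 0.5 × 10^-N. `almost_equal_places` is a hypothetical helper, and it ignores the exact-tie case, where Python rounds half to even.

```c
#include <assert.h>

/* places=N check in the style of assertAlmostEqual:
 * passes when |a - b| < 0.5 * 10^-N. */
static int almost_equal_places(double a, double b, int places)
{
    assert(places >= 0);
    double tol = 0.5;
    for (int i = 0; i < places; i++)
        tol /= 10.0;
    double diff = a > b ? a - b : b - a;
    return diff < tol;
}
```

The same rule covers the golden shift described in entry 0013: a difference of about 1.07e-3 passes at `places=2` (threshold 5e-3) but fails at `places=4` (threshold 5e-5).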
0001 — SIMD bit-identical reductions for float ADM
- Workstream PRs: #18, commits `24c88a32`, `f082cfd3`.
- Touches: `core/src/feature/integer_adm.c`, `core/src/feature/float_adm.c`, `core/src/feature/x86/adm_avx2.c`, `core/src/feature/x86/adm_avx512.c`, `core/src/feature/arm64/adm_neon.c`, and the upstream `python/test/feature_extractor_test.py` test expectations.
- Invariant: `sum_cube` and `csf_den_scale` accumulate cubed values in double precision (via `_mm256_cvtps_pd` / `_mm512_cvtps_pd`) in the scalar, AVX2, AVX-512, and NEON paths. Upstream accumulates in float, which produces ~8e-5 drift between scalar and SIMD. Test expectations were tightened to match the double-precision path. An upstream-side accumulator change would re-introduce the drift and break the tightened assertions.
- Re-test: `meson test -C build --suite=fast && python -m pytest python/test/feature_extractor_test.py -k adm`.
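The sketch below models why double accumulation shrinks the scalar-vs-SIMD gap. It is a model, not the fork's kernels. The 8-way lane split stands in for a SIMD reduction order, the LCG only supplies deterministic inputs, and all function names are illustrative. Each variant forms the cube the same way; only the summation order differs. With double accumulators, that reordering moves the result by far less than any ADM test tolerance. With float accumulators, both orders carry rounding error on the order of 1e-3 relative in the worst case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Deterministic inputs in [0, 1) from a 32-bit LCG. */
static float lcg_unit(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (float)(*state >> 8) / 16777216.0f; /* top 24 bits / 2^24 */
}

/* Scalar order, double accumulator: one running sum of cubes. */
static double cubes_seq_f64(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * x[i] * x[i]; /* cube formed in double */
    return acc;
}

/* SIMD-like order, double accumulator: 8 partial sums combined last. */
static double cubes_lanes_f64(const float *x, size_t n)
{
    double lane[8] = {0.0};
    for (size_t i = 0; i < n; i++)
        lane[i % 8] += (double)x[i] * x[i] * x[i];
    double acc = 0.0;
    for (int l = 0; l < 8; l++)
        acc += lane[l];
    return acc;
}

/* The same two orders with float accumulators (the upstream style). */
static float cubes_seq_f32(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i] * x[i];
    return acc;
}

static float cubes_lanes_f32(const float *x, size_t n)
{
    float lane[8] = {0.0f};
    for (size_t i = 0; i < n; i++)
        lane[i % 8] += x[i] * x[i] * x[i];
    float acc = 0.0f;
    for (int l = 0; l < 8; l++)
        acc += lane[l];
    return acc;
}
```

For 65,536 inputs, standard summation error bounds limit the gap between the two double results to roughly 1e-11 relative.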
0002 — CUDA ADM decouple-inline buffer elimination
- Workstream PRs: commit `787e3382`.
- Touches: `core/src/feature/cuda/integer_adm_cuda.cu`, `core/src/feature/cuda/adm_decouple_inline.cuh` (new), `core/src/feature/cuda/meson.build`. Upstream's `adm_decouple.cu` is no longer compiled in the fork.
- Invariant: the CSF and CM CUDA kernels read the `ref` / `dis` DWT2 buffers directly. They compute `decouple_r` / `decouple_a` inline via `__device__` helpers in `adm_decouple_inline.cuh`. Two things are intentionally removed: the standalone `adm_decouple.cu` source, and the 6 intermediate buffers (`decouple_r`, `decouple_a`, `csf_a` × {scale-0 int16, scales 1-3 int32}). This saves ~107 MB of GPU memory at 4K. An upstream change to `adm_decouple.cu` will look orphaned, and a literal merge would re-introduce the buffer allocations.
- Re-test: `meson setup build -Denable_cuda=true && ninja -C build && meson test -C build --suite=cuda`.
0003 — SYCL backend (USM pool / D3D11 import / vmaf_sycl_* API)
- Workstream PRs: #33, #35, #5 (initial scaffolding), and the picture-pool deadlock fix that landed via #32.
- Touches: `core/include/libvmaf/libvmaf_sycl.h`, `core/src/sycl/`, `core/src/feature/sycl/`, `core/src/libvmaf.c` (SYCL public-API entry points), `meson_options.txt` (`enable_sycl`).
- Invariant: `vmaf_sycl_preallocate_pictures` constructs a real `VmafSyclPicturePool` honoring `VmafSyclPicturePreallocationMethod` (`NONE` / `DEVICE` / `HOST`). `vmaf_sycl_picture_fetch` dispatches to the pool when configured. The whole SYCL tree is fork-local and has no upstream counterpart, so upstream changes to `core/src/libvmaf.c` near the SYCL entry-point block are likely to conflict. Picture-pool error paths in `vmaf_read_pictures` (`libvmaf.c`) must `goto cleanup;` rather than `return err;`, to avoid leaking ref/dist pictures into the live-picture set. This closes the always-on-pool deadlock fixed in #32 (see ADR-0104). See also ADR-0101 and ADR-0103.
- Re-test: `meson setup build -Denable_sycl=true && ninja -C build && meson test -C build --suite=sycl` (requires oneAPI / icpx).
0004 — DNN runtime + tiny-AI surfaces
- Workstream PRs: #5, #8, #21, #22, #23, #31, #34, plus the pre-numbered DNN feat commits (`9b985946`, `1e5336d3`, `d122b721`).
- Touches: `core/include/libvmaf/dnn.h`, `core/src/dnn/`, `core/src/feature/feature_lpips.c`, `model/tiny/`, `meson_options.txt` (`enable_onnxruntime`).
- Invariant: the DNN runtime guarantees four behaviours:
  - ordered EP selection (CUDA → DML → CPU) with graceful fallback (ADR-0102);
  - a host-side fp32↔fp16 cast by `fp16_io` on the scoring path;
  - a path jail on model load, enforced by `VMAF_TINY_MODEL_DIR` (PR #31);
  - a runtime op-allowlist (PR #21) that walks the ONNX graph, rejects unknown ops, and bounds Loop/If `trip_count` at 1024 (ADR-0036/0107).

  The DNN tree is fork-local and upstream has no DNN code yet, so conflicts here are unlikely. The `meson_options.txt` and `core/src/meson.build` blocks near the DNN flag may still collide.
- Re-test: `meson setup build -Denable_onnxruntime=true && ninja -C build && meson test -C build --suite=dnn`.
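Ordered EP selection with graceful fallback reduces to a first-success walk over a fixed preference list, with CPU as the terminal provider. The sketch below is illustrative only. `ExecProvider`, `EpProbe` and `select_ep` are hypothetical names. Availability comes from a table instead of a real attempt to register the provider with an ONNX Runtime session, and CPU is assumed to always be available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EP_CUDA, EP_DML, EP_CPU } ExecProvider;

/* Hypothetical probe: a real runtime would try to attach the provider
 * to the session and report whether that succeeded. */
typedef bool (*EpProbe)(ExecProvider ep, const bool *available);

static bool probe_from_table(ExecProvider ep, const bool *available)
{
    return available[ep];
}

/* Walk the preference order and take the first provider that probes OK;
 * CPU is the terminal fallback and is never probed. */
static ExecProvider select_ep(EpProbe probe, const bool *available)
{
    static const ExecProvider order[] = {EP_CUDA, EP_DML};
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++)
        if (probe(order[i], available))
            return order[i];
    return EP_CPU;
}
```

Keeping the probe as a function pointer lets a test drive every branch of the fallback without any GPU present.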
0005 — --precision CLI flag (IEEE-754 round-trip lossless)
- Workstream PRs: commit `c989fbd9`.
- Touches: `core/tools/vmaf.c`, `core/tools/cli_parse.c`, `core/include/libvmaf/libvmaf.h` (added `vmaf_write_output_with_format`), `core/src/output.c`.
- Invariant: the default `--precision` is `%.17g` (round-trip lossless), and `legacy` opts back into upstream's `%.6f`. The public C API gained `vmaf_write_output_with_format`, and the old `vmaf_write_output` routes through it with the `%.17g` default. This is ABI-breaking only if upstream adds a same-named function with a different signature. See ADR-0006.
- Re-test: `vmaf -r ref.yuv -d dis.yuv ... --precision=full` and diff against `--precision=legacy`.
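Why `%.17g` is lossless: 17 significant decimal digits suffice to round-trip any IEEE-754 double (C11 names this bound `DBL_DECIMAL_DIG`). `%.6f` keeps only six fractional digits. The sketch formats the vmaf_v0.6.1 golden mean from entry 0012 both ways and parses it back. The helper names are hypothetical, and the sketch assumes a C library whose `printf` and `strtod` round correctly, as glibc's do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Print with %.17g, then parse back: exact for every finite double. */
static double reparse_17g(double v)
{
    char buf[64];
    int len = snprintf(buf, sizeof buf, "%.17g", v);
    assert(len > 0 && (size_t)len < sizeof buf);
    return strtod(buf, NULL);
}

/* Print with the legacy %.6f, then parse back: loses digits. */
static double reparse_6f(double v)
{
    char buf[64];
    int len = snprintf(buf, sizeof buf, "%.6f", v);
    assert(len > 0 && (size_t)len < sizeof buf);
    return strtod(buf, NULL);
}
```

For 76.66890519623612, the `%.17g` round trip returns the identical double. The legacy format returns 76.668905, which is what a `--precision=legacy` diff surfaces.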
0006 — Netflix golden tests preserved verbatim as required gate
- Workstream PRs: across the fork's life; codified in ADR-0024.
- Touches: `python/test/quality_runner_test.py`, `python/test/vmafexec_test.py`, `python/test/vmafexec_feature_extractor_test.py`, `python/test/feature_extractor_test.py`, `python/test/result_test.py`, `python/test/resource/yuv/`.
- Invariant: the `assertAlmostEqual(...)` golden values in the five upstream Python test files are never modified by this fork. Fork-added tests live in separate files (e.g. `python/test/test_precision_flag.py`). The CI gate "Netflix CPU golden tests (D24)" is required and blocks merge. Upstream changes to these files are accepted unless they relax the assertions.
- Re-test: `make test-netflix-golden`.
0007 — Build system (CUDA 13.2, oneAPI 2025.3, MkDocs migration)
- Workstream PRs: #7, #17, commit `8a995cb0`.
- Touches: `meson.build`, `meson_options.txt`, top-level `Makefile`, `docs/` (Sphinx → MkDocs Material migration: `docs/conf.py` removed, `mkdocs.yml` added), `docs/requirements.txt`, `Dockerfile.*`, distro install scripts under `scripts/`.
- Invariant: image pins are non-conservative (ADR-0027): CUDA 13.2, oneAPI 2025.3, clang-format 22, black 26. The images also deliberately ship experimental toolchain flags (`--expt-relaxed-constexpr`, etc.). An upstream sync that pulls in a Dockerfile change targeted at older CUDA or older oneAPI must not relax the pins.
- Re-test: `meson setup build -Denable_cuda=true -Denable_sycl=true && ninja -C build && mkdocs build --strict`.
0008 — Workspace / docs / MATLAB / resource-tree relocations
- Workstream PRs: codified across ADR-0026, ADR-0029, ADR-0030, ADR-0031, ADR-0032, ADR-0033, ADR-0034, ADR-0038.
- Touches: any path-walk in upstream's CI / scripts / docs that assumes the upstream layout (root-level `workspace/`, `resource/`, `matlab/`, the root `unittest` script, root `patches/`).
- Invariant: the fork's layout is `python/vmaf/workspace/`, `python/vmaf/resource/`, `python/vmaf/matlab/`, `scripts/unittest`, `ffmpeg-patches/` only (no root `patches/`), and `.github/codeql-config.yml`. An upstream move to a different sub-tree (e.g. a hypothetical `tools/workspace/`) must either be applied via a corresponding fork-side relocation or be rejected with a rebase note.
- Re-test: `python -m pytest python/test/ -k golden` (verifies the resource-tree path works); `make test-netflix-golden`.
0009 — License headers (Lusoris/Claude on wholly-new files; 2016–2026 on Netflix files)
- Workstream PRs: commits `c159761d`, `a185f8ef`, `0e98c949`, codified in ADR-0025 / ADR-0105.
- Touches: every wholly-new fork file (notably the SYCL tree and `core/src/dnn/`) and every Netflix-touched file (year range `2016` → `2016–2026`).
- Invariant: wholly-new fork files carry `Copyright 2026 Lusoris and Claude (Anthropic)` under the same BSD-3-Clause-Plus-Patent license; mixed files use a dual-copyright notice. An upstream commit that resets a Netflix file's year range (e.g. back to `2016–2020`) must be partially rejected, keeping the fork's `2016–2026`.
- Re-test: check that wholly-new fork files retain the Lusoris/Claude header. `grep -L "Copyright 2026 Lusoris" core/src/sycl/*.cpp` lists files that lack it, so it is expected to print nothing.
0010 — .claude/ agent scaffolding + ADR tree + AGENTS.md / CLAUDE.md
- Workstream PRs: #14, #24, #37, plus continuous additions.
- Touches: `.claude/`, `AGENTS.md`, `CLAUDE.md`, `docs/adr/`, `.github/PULL_REQUEST_TEMPLATE.md`.
- Invariant: this whole tree is fork-local and has no upstream counterpart. Upstream additions to `.github/` (issue templates, workflows) need to merge cleanly with the fork's existing files rather than replacing them. ADR IDs ≤ 0099 are backfills; new decisions start at 0100 (ADR-0028 / ADR-0106).
- Re-test: visual review of `.github/` and `docs/adr/README.md` after the merge.
Pre-ADR-0108 entries above are the result of a one-shot backfill sweep on 2026-04-18; subsequent fork-local PRs add their own entries inline.
0011 — Nightly bisect-model-quality + fixture cache
- Workstream PRs: closes #4; sticky tracker issue #40.
- Touches: `.github/workflows/nightly-bisect.yml`, `ai/scripts/build_bisect_cache.py`, `ai/testdata/bisect/{features.parquet, models/*.onnx, README.md}`, `scripts/ci/post-bisect-comment.py`, `docs/ai/bisect-model-quality.md`, `docs/adr/0109-nightly-bisect-model-quality.md`, `docs/research/0001-bisect-model-quality-cache.md`, `mkdocs.yml` (nav).
- Invariant: the committed parquet + ONNX bytes under `ai/testdata/bisect/` must regenerate byte-identically from `ai/scripts/build_bisect_cache.py` with seeds `FEATURE_SEED=20260418` and `MODEL_SEED=20260419`. The CI `--check` step asserts this before every bisect run. Any upstream pull that bumps `pandas` / `pyarrow` / `onnx` enough to change the serialiser bytes will therefore fail the workflow until the cache is regenerated and committed.
- Re-test:
python ai/scripts/build_bisect_cache.py --check
vmaf-train bisect-model-quality \
ai/testdata/bisect/models/model_*.onnx \
--features ai/testdata/bisect/features.parquet \
--min-plcc 0.85 --input-name input
# Expected: "no regression in this range"; first_bad_index None.
Pure upstream code is not touched, so there is no Netflix-side conflict vector. Only fork-local files change, and the risk is toolchain drift, not merge conflict.
0012 — Upstream ADM port (Netflix 966be8d5)
- Workstream PRs: this PR; ports a single upstream commit.
- Touches: `core/src/feature/integer_adm.{c,h}`, `core/src/feature/x86/adm_avx2.{c,h}`, `core/src/feature/x86/adm_avx512.{c,h}`, `core/src/feature/alias.c`, `core/src/feature/barten_csf_tools.h` (new upstream file).
- Invariant: the eight ADM files now mirror upstream's content byte-for-byte, modulo our clang-format-22 pass and the Netflix copyright-year bump on the new header. Future `/sync-upstream` runs can take new upstream ADM commits cleanly. Do not revert to a pre-`966be8d5` ADM kernel without also reverting the call-site signatures in `integer_compute_adm`: upstream extended `i4_adm_cm` from 8 to 13 args.
- Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model version=vmaf_v0.6.1 -o /tmp/vmaf-port.json
grep '<metric name="vmaf"' /tmp/vmaf-port.json
# Expected: mean ≈ 76.66890 (golden 76.66890519623612, places=4 OK).
0013 — Upstream motion port (Netflix PR #1486 head 2aab9ef1)
- Workstream PRs: this PR; ports upstream PR #1486 (4 commits on top of the 966be8d5 ADM base, head 2aab9ef1). Sister to entry 0012.
- Touches: core/src/feature/integer_motion.{c,h}, core/src/feature/motion_blend_tools.h (new upstream file), core/src/feature/x86/motion_avx2.c, core/src/feature/x86/motion_avx512.c, core/src/feature/alias.c (additive: integer_motion3 row), python/test/{quality_runner,vmafexec,feature_extractor,vmafexec_feature_extractor}_test.py (golden tolerance updates: places=4 → places=2 on motion-affected asserts; expected values unchanged).
- Invariant: motion files mirror upstream byte-for-byte (modulo our clang-format-22 pass). The alias.c row for integer_motion3 was inserted surgically to avoid clobbering the AVX-512 ADM registration added by entry 0012; the new motion3 metric appears in default VMAF model output but is not standalone-loadable via --feature integer_motion3 (sub-feature only). The Netflix golden VMAF mean shifts 76.668904824 → 76.667830213, well within the places=2 tolerance the upstream PR loosened to. Do not restore places=4 on motion-touching assertions without also reverting the motion code.
- Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model version=vmaf_v0.6.1 -o /tmp/vmaf-motion-port.json
grep -E '<metric name="vmaf"|integer_motion3' /tmp/vmaf-motion-port.json
# Expected: vmaf mean ≈ 76.66783; integer_motion3 mean ≈ 3.98976.
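Python's unittest assertAlmostEqual(first, second, places=N) passes when round(first - second, N) == 0. The C approximation below (a hypothetical helper; it treats a difference below half a unit in the N-th decimal place as zero, so it can disagree with Python's round-half-to-even only on exact ties) shows why the golden shift quoted in this entry needed places=2.

```c
#include <assert.h>

/* Approximates unittest's assertAlmostEqual(a, b, places=N):
 * passes when (a - b) rounded to N decimal places is zero,
 * i.e. when |a - b| * 10^N < 0.5. Avoids libm on purpose. */
static int almost_equal_places(double a, double b, int places)
{
    double scale = 1.0;
    for (int i = 0; i < places; i++)
        scale *= 10.0;
    double scaled = (a - b) * scale;
    return scaled > -0.5 && scaled < 0.5;
}
```

With the figures above, the 76.668904824 → 76.667830213 difference is about 0.00107: that rounds to 0.0011 at four places (fail) but to 0.00 at two places (pass).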
0014 — Coverage gate overhaul + upstream python/test/ reformat
- Workstream PRs: this PR (coverage-gate overhaul + in-tree reformat of upstream-mirror Python tests).
- Touches: .github/workflows/ci.yml (CPU + GPU coverage jobs: -Dc_args=-fprofile-update=atomic / -Dcpp_args=-fprofile-update=atomic, meson test --num-processes 1, -Denable_dnn=enabled, ORT install step on the CPU coverage job, lcov/geninfo replaced by gcovr with --json-summary / --xml / --txt output, artifact rename coverage-lcov-{cpu,gpu} → coverage-{cpu,gpu}), scripts/ci/coverage-check.sh (rewritten to parse gcovr JSON via python3 -c; same CLI signature), core/src/dnn/dnn_api.c + new core/src/dnn/dnn_attach_api.c (vmaf_use_tiny_model carved out into its own TU so the unit-test binaries, which pull in dnn_sources for feature_lpips.c but never link libvmaf.c, don't end up with an undefined reference to vmaf_ctx_dnn_attach once enable_dnn=enabled activates the real bodies), core/src/dnn/meson.build + core/src/meson.build (new dnn_libvmaf_only_sources list wired into libvmaf.so only), python/test/{feature_extractor,quality_runner,vmafexec,vmafexec_feature_extractor}_test.py (mechanical Black + isort reformat: no assertion values changed, imports regrouped, line wrapping normalised).
- Invariant: coverage CI must keep all five pieces in lockstep. (a) -fprofile-update=atomic closes the intra-process counter race on SIMD inner loops (vif_avx2.c:673, motion_avx2, etc.) that otherwise produces negative counts and aborts geninfo/gcovr. (b) --num-processes 1 closes the inter-process race where multiple parallel test binaries merge their counters into the same .gcda files for the shared libvmaf.so at process exit (per-thread atomicity does not cover this). (c) gcovr deduplicates .gcno files belonging to the same source compiled into multiple targets; without dedup, lcov sums hits across compilation units and yields impossible >100% values (dnn_api.c — 1176% was the smoking gun on the first attempt, which had only (a) + (b)). (d) The ORT install + enable_dnn=enabled in the coverage job is what makes core/src/dnn/*.c measurable in the first place; without ORT, the DNN tree compiles in stub branches and the 85% per-critical-file gate is meaningless. (e) vmaf_use_tiny_model lives in dnn_attach_api.c and is added to libvmaf.so only via dnn_libvmaf_only_sources; moving it back into dnn_api.c reintroduces the vmaf_ctx_dnn_attach undefined-reference link error in test_feature_extractor / test_lpips whenever enable_dnn=enabled, since those test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c. Lint scope: upstream-mirror Python tests are linted at the same standard as fork-added code. We accept that /sync-upstream and /port-upstream-commit will re-trigger Black/isort failures whenever upstream rewrites these files; the fix is another in-tree reformat pass, never an exclusion. The fork's pyproject.toml and .pre-commit-config.yaml keep python/test/resource/ (binary fixtures only) excluded; python/test/*.py is in scope. See ADR-0110 (race fixes, superseded) and ADR-0111 (gcovr + ORT layer).
- Re-test:
# Reproduce coverage path locally (requires gcc + python3-pip):
pip install --user 'gcovr>=8.0'
cd libvmaf
meson setup build-cov-test --buildtype=debug -Db_coverage=true \
-Denable_avx512=true -Denable_float=true -Denable_dnn=disabled \
-Dc_args=-fprofile-update=atomic -Dcpp_args=-fprofile-update=atomic
ninja -C build-cov-test
meson test -C build-cov-test --print-errorlogs --num-processes 1
~/.local/bin/gcovr --root .. \
--filter 'src/.*' \
--exclude '.*/test/.*' --exclude '.*/tests/.*' \
--exclude '.*/subprojects/.*' \
--gcov-ignore-parse-errors=negative_hits.warn \
--gcov-ignore-parse-errors=suspicious_hits.warn \
--print-summary --txt build-cov-test/coverage.txt \
--json-summary build-cov-test/coverage.json \
build-cov-test
grep -E 'dnn_api|model_loader' build-cov-test/coverage.txt
# Expected: gcovr completes without "Unexpected negative count" AND no
# per-file percentages exceed 100% (drop --num-processes 1 to reproduce
# the multi-process .gcda merge race; switch back to lcov to reproduce
# the dnn_api.c — 1176% over-count from compilation-unit summation).
# Lint smoke test for upstream-mirror tree:
pre-commit run --files python/test/quality_runner_test.py
# Expected: Black/isort/Ruff all PASS — files are reformatted in-tree
# to fork style and stay clean until the next upstream sync.
0015 — Tox doctest collection skips vmaf/resource/
- Workstream PRs: this PR (fix(ci): skip pytest doctest collection of vmaf/resource/ data files). Surfaced once ADR-0115 consolidated CI triggers to master and tox actually started running on PRs.
- Touches: python/tox.ini (single-line --ignore=vmaf/resource added to the pytest invocation, plus an explanatory comment block). Pure fork-local; no upstream Python file changes.
- Invariant: pytest --doctest-modules must not attempt to import files under python/vmaf/resource/. Those are parameter / dataset / example-config .py files; several have dots in their stems (e.g. vmaf_v7.2_bootstrap.py) that make them unimportable as Python modules. None carry doctests, so ignoring them is correct rather than a workaround. Do not drop the --ignore=vmaf/resource flag without first verifying that every file under that directory has been renamed to a dot-free stem and is importable.
- Re-test:
cd python && tox -e py311 -- --collect-only --doctest-modules \
--ignore=vmaf/resource 2>&1 | grep -c "ERROR collecting vmaf/resource"
# Expected: 0 (was 5 before the fix).
No upstream code is touched, so there is no Netflix-side conflict vector. The only risk is upstream renaming or removing files under python/vmaf/resource/ so that the directory disappears, in which case the --ignore becomes a harmless no-op.
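A Python module name must be a valid identifier, and a dot in the file stem breaks that. The hypothetical helper below sketches the rule for ASCII names only; Python also accepts Unicode identifiers and rejects keywords, neither of which is modelled here, and neither tox nor pytest exposes such a function.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `filename` ends in ".py" and its stem is
 * an ASCII identifier ([A-Za-z_][A-Za-z0-9_]*), else 0. */
static int is_importable_stem(const char *filename)
{
    size_t len = strlen(filename);
    if (len <= 3 || strcmp(filename + len - 3, ".py") != 0)
        return 0;
    size_t stem_len = len - 3;
    unsigned char first = (unsigned char)filename[0];
    if (!(isalpha(first) || first == '_'))
        return 0;
    for (size_t i = 1; i < stem_len; i++) {
        unsigned char c = (unsigned char)filename[i];
        if (!(isalnum(c) || c == '_'))
            return 0; /* e.g. the '.' in "vmaf_v7.2_bootstrap" */
    }
    return 1;
}
```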
0016 — SYCL -fsycl link-arg gated on icpx CXX
- Workstream PRs: this PR (fix(libvmaf): gate -fsycl link arg on icpx CXX, allow gcc/clang host linker). Surfaced once ADR-0115's CI consolidation added an Ubuntu SYCL job to PR-time CI that uses CXX=g++ (host linker) with sidecar icpx for SYCL .cpp compilation.
- Touches: core/src/meson.build (the vmaf_link_args block immediately after the is_sycl_enabled flag handling, currently ~lines 696-712). Pure fork-local; no upstream Meson file changes expected.
- Invariant: -fsycl is appended to vmaf_link_args only when meson.get_compiler('cpp').get_id() == 'intel-llvm' (icpx). Rationale: the documented project mode (see the comment near the is_sycl_enabled block at the top of src/meson.build) compiles SYCL .cpp files via custom_target with icpx, while the project's CXX driver may be gcc / clang / msvc. In that mode the SPIR-V device code is already embedded in the icpx-compiled .o files at compile time, and the runtime libraries (libsycl + libsvml + libirc + libze_loader) declared as link dependencies resolve every symbol. Passing -fsycl to a non-icpx linker is a hard error (g++: error: unrecognized command-line option '-fsycl'). Do not remove the cpp.get_id() == 'intel-llvm' guard without first verifying that every CI matrix leg uses icpx as the project CXX.
- Re-test:
meson setup build -Denable_sycl=true \
-Dcpp_link_args=-Wl,--no-undefined
ninja -C build src/libvmaf.so.3
# Expected: link succeeds; no `-fsycl` errors with gcc/clang host CXX.
Pure fork-local guard; no Netflix-side conflict vector.
0017 — CLI precision default %.6f (Netflix-compat) + frame-skip unref
- Workstream PRs: this PR (fix(cli): revert precision default to %.6f and unref skipped frames). Reverts the default flipped by commit c989fbd9 (ADR-0006) per ADR-0119. A companion fix in core/tools/vmaf.c resolves the picture-pool exhaustion in the --frame_skip_ref / --frame_skip_dist loops, surfaced once the always-on picture pool (ADR-0104) made unref'ing skipped pictures mandatory.
- Touches:
  - core/tools/cli_parse.c (VMAF_DEFAULT_PRECISION_FMT + VMAF_LOSSLESS_PRECISION_FMT macros, resolve_precision_fmt() body, --help text)
  - core/tools/cli_parse.h (field comments only; struct shape unchanged)
  - core/src/output.c (DEFAULT_SCORE_FORMAT macro)
  - core/tools/vmaf.c (skip loop bodies at the c.frame_skip_ref / c.frame_skip_dist for-loops)
  - python/vmaf/core/result.py (per-frame and aggregate :.6f formatters)
  - python/test/command_line_test.py is unmodified: Netflix golden assertions stay frozen per CLAUDE.md §8; the binary's output format adapts to them, not the other way around.
- Invariant: the vmaf CLI default score-output format is %.6f (matches upstream Netflix byte-for-byte). --precision=max|full selects %.17g (IEEE-754 round-trip lossless). --precision=legacy is a synonym for the default. The library default for vmaf_write_output_with_format(..., score_format=NULL) matches. Skipped frames in the --frame_skip_ref / --frame_skip_dist pre-loops are vmaf_picture_unref'd immediately after fetch so the preallocated picture pool is not exhausted before the main scoring loop runs. Do not flip the macros back to %.17g or remove the unrefs without a superseding ADR; both are golden-gate-load-bearing.
- Re-test:
ninja -C core/build
python -m pytest -v \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping_unequal
# Expected: all three PASS in <1 s combined.
Pure fork-local; no Netflix-side conflict vector. If upstream ever changes the default format string, treat their value as the new baseline and reconfirm the golden assertions before adopting.
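The lossless claim for --precision=max|full rests on a standard IEEE-754 property: 17 significant decimal digits (%.17g) always round-trip a binary64 double, whereas %.6f keeps only six fractional digits. The sketch below checks that property. pick_precision_fmt() is a hypothetical stand-in for resolve_precision_fmt(), whose real signature is not shown in these notes; only the two format strings come from this entry.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format strings as documented in entry 0017. */
#define DEFAULT_FMT  "%.6f"  /* Netflix-compatible default */
#define LOSSLESS_FMT "%.17g" /* IEEE-754 round-trip */

/* Hypothetical mapping of the --precision argument; NULL means "not given". */
static const char *pick_precision_fmt(const char *arg)
{
    if (arg == NULL || strcmp(arg, "legacy") == 0)
        return DEFAULT_FMT;
    if (strcmp(arg, "max") == 0 || strcmp(arg, "full") == 0)
        return LOSSLESS_FMT;
    return NULL; /* unknown value: the caller would report an error */
}

/* Returns 1 if printing `x` with `fmt` and parsing it back yields `x`. */
static int round_trips(double x, const char *fmt)
{
    char buf[64];
    snprintf(buf, sizeof buf, fmt, x);
    return strtod(buf, NULL) == x;
}
```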
0018 — FFmpeg patches ship as ordered series.txt
- Workstream PRs: this PR (fix(ci): drop dead sycl trigger + consolidate windows.yml into libvmaf.yml (ADR-0115)). Surfaced once ADR-0115's consolidation routed the docker / FFmpeg-SYCL jobs through the master-targeting CI gate for the first time on this branch: the standalone 0003-…sycl… apply broke because it referenced struct fields added by 0001-…tiny-model…, the Dockerfile only COPY'd 0003, and ffmpeg.yml referenced a stale ../patches/ path.
- Touches: Dockerfile (lines ~86-95, the FFmpeg patch-apply block), .github/workflows/ffmpeg.yml (the Build FFmpeg with SYCL patch series step), ffmpeg-patches/000{1,2,3}-*.patch (regenerated via a real git format-patch -3 so they carry valid index <sha>..<sha> <mode> lines and committable SHAs). Pure fork-local; no upstream FFmpeg or Netflix file changes.
- Invariant: both the Dockerfile and ffmpeg.yml walk ffmpeg-patches/series.txt line-by-line and apply each patch via git apply with a patch -p1 fallback. Do not ship a new patch without appending it to series.txt, and do not reorder existing entries: patch 0003 references LIBVMAFContext fields added by patch 0001, so any out-of-order apply breaks the build at hunk 2 of vf_libvmaf.c.
- Four further fixes bundled in the same PR (three flag-side, one in a patch):
  - --enable-libvmaf-sycl is not a valid FFmpeg configure option. Patch 0003 uses check_pkg_config libvmaf_sycl … auto-detection (matching how libvmaf_cuda is wired) and never registers the switch. Both the Dockerfile and ffmpeg.yml used to pass the flag, and configure rejected it with Unknown option "--enable-libvmaf-sycl". SYCL support is now controlled solely by -Denable_sycl=true at libvmaf build time; FFmpeg picks it up automatically when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
  - The Dockerfile now carries two nvcc-flag ARGs. NVCC_FLAGS (libvmaf) keeps four -gencode lines plus the experimental --extended-lambda / --expt-relaxed-constexpr / --expt-extended-lambda flags needed for Thrust/CUB host+device code. FFMPEG_NVCC_FLAGS (FFmpeg) carries a single -gencode arch=compute_75,code=sm_75 -O2, because FFmpeg's check_nvcc runs nvcc -ptx, which fails with nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures on multi-arch input, and --extended-lambda requires host+device compilation. compute_75 PTX is forward-compatible with all newer GPUs via driver JIT.
  - --enable-libnpp is no longer passed to FFmpeg's configure. FFmpeg n8.1's libnpp probe carries an explicit die "ERROR: libnpp support is deprecated, version 13.0 and up are not supported" (configure:7335-7336) that fires on the base image's CUDA 13.2 libnpp. We don't use scale_npp / transpose_npp / sharpen_npp in any VMAF workflow; cuvid + nvdec + nvenc + libvmaf-cuda is the actual GPU path. Revisit once we move to an FFmpeg release that supports CUDA 13 libnpp upstream.
  - Patch 0002 (add-vmaf_pre-filter) gained a missing #include "libavutil/imgutils.h" for av_image_copy_plane(). FFmpeg's libavfilter Makefile builds with -Werror=implicit-function-declaration, so this fired during the actual compile (not configure). It was caught by a local docker build rather than by waiting for GitHub Actions, a much faster iteration loop.
- Re-test:
cd /tmp && rm -rf ffmpeg-test && \
git clone -q --depth 1 -b n8.1 \
https://git.ffmpeg.org/ffmpeg.git ffmpeg-test && \
cd ffmpeg-test && \
while IFS= read -r line; do \
case "$line" in ''|\#*) continue ;; esac; \
git apply "/path/to/vmaf/ffmpeg-patches/$line" \
|| patch -p1 < "/path/to/vmaf/ffmpeg-patches/$line"; \
done < /path/to/vmaf/ffmpeg-patches/series.txt
# Expected: all three patches apply with no rejects; the resulting
# tree compiles with --enable-libvmaf. SYCL is auto-detected via
# check_pkg_config (patch 0003), so no explicit configure flag is
# required when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
Pure fork-local series; no Netflix-side conflict vector. See ADR-0118.
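The series.txt contract that both consumers rely on is: one patch filename per line, blank lines and #-comment lines skipped, file order preserved (which is what keeps 0001 ahead of 0003). A C sketch of a reader honouring that contract; next_patch() is hypothetical, and the real consumers are the shell loop in the re-test above and the Dockerfile block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reader: copies the next patch name from `series` into `out`
 * and returns 1, or returns 0 at end of file. Blank lines and lines starting
 * with '#' are skipped, mirroring the shell `case ''|\#*) continue` rule.
 * Lines longer than `out_size - 1` would be split; fine for a sketch. */
static int next_patch(FILE *series, char *out, size_t out_size)
{
    while (fgets(out, (int)out_size, series) != NULL) {
        out[strcspn(out, "\r\n")] = '\0'; /* strip the line ending */
        if (out[0] == '\0' || out[0] == '#')
            continue;
        return 1;
    }
    return 0;
}
```

Returning names one at a time keeps the apply loop in the caller, so it can stop at the first failing patch instead of applying later patches onto a broken tree.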
0019 — Coverage Gate annotations: upload-artifact v7 + gcovr filter
- Workstream PRs: this PR.
- Touches: .github/workflows/ci.yml (CPU + GPU coverage steps: gcovr stderr piped through grep -vE 'Ignoring (suspicious|negative) hits' ... || true), .github/workflows/{ci,lint,nightly,nightly-bisect,supply-chain,libvmaf}.yml (actions/upload-artifact@v5|@v6 → @v7, actions/download-artifact@v5 → @v7 in supply-chain.yml). Note: windows.yml was consolidated into libvmaf.yml by ADR-0115 / PR #50, so the Windows-side bump now lives in libvmaf.yml's build (MINGW64, …) job.
- Invariant: the Coverage Gate Annotations panel must finish empty on a clean run. Two coordinated pieces make that happen. (a) @v7 of the upload / download artifact actions silences GitHub's Node-20 deprecation banner ahead of the 2026-06-02 forced-Node-24 cutoff. (b) The gcovr stderr filter swallows the Ignoring (suspicious|negative) hits warnings that gcovr 8 emits for the legitimately large hit counts in tight ANSNR / VIF / motion inner loops (e.g. ansnr_tools.c:207 at ~4.93 G hits across an HD multi-frame coverage suite; the counts are real, not a gcov bug). The filter regex is narrow and matches only that exact warning text, so any other gcovr warning still surfaces. Upstream (Netflix/vmaf) does not maintain these CI files; rebase impact is limited to the unlikely case that an upstream sync touches the shared .github/workflows/ tree, which it currently does not. See ADR-0117.
- Re-test:
# Verify gcovr filter locally (after a coverage build per entry 0014):
~/.local/bin/gcovr --root .. \
--filter 'src/.*' \
--exclude '.*/test/.*' --exclude '.*/tests/.*' \
--exclude '.*/subprojects/.*' \
--gcov-ignore-parse-errors=negative_hits.warn \
--gcov-ignore-parse-errors=suspicious_hits.warn \
--print-summary --txt build-cov-test/coverage.txt \
build-cov-test \
2> >(grep -vE 'Ignoring (suspicious|negative) hits' >&2 || true)
# Expected: stderr contains the gcovr summary block but NO
# "Ignoring (suspicious|negative) hits" lines. coverage.txt unchanged.
# Verify all upload/download-artifact instances are on @v7:
grep -rE 'actions/(upload|download)-artifact@v[0-6]' .github/workflows/
# Expected: empty output.
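The stderr filter is an inverted extended-regex match: a line is dropped only when it contains "Ignoring suspicious hits" or "Ignoring negative hits", and everything else passes through. The same predicate written against POSIX <regex.h>, as a sketch; keep_stderr_line() is a hypothetical name, and the workflow itself uses grep -vE.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Same extended regex as the workflow's `grep -vE`. */
static const char *const kNoisePattern = "Ignoring (suspicious|negative) hits";

/* Returns 1 if the gcovr stderr line should be kept, 0 if it is the known
 * noise, or -1 if the pattern fails to compile. */
static int keep_stderr_line(const char *line)
{
    regex_t re;
    if (regcomp(&re, kNoisePattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matched = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return !matched;
}
```

Compiling the pattern on every call keeps the sketch short; a long-running filter would compile it once and reuse it.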
0020 — CI workflow file + display-name renames (Title Case sweep)
- Workstream PRs: this PR; renames all six core .github/workflows/*.yml files to purpose-descriptive kebab-case and normalises every workflow name: and job name: to Title Case. See ADR-0116.
- Touches: .github/workflows/{ci,lint,security,libvmaf,ffmpeg,docker}.yml (renamed via git mv to tests-and-quality-gates.yml, lint-and-format.yml, security-scans.yml, libvmaf-build-matrix.yml, ffmpeg-integration.yml, docker-image.yml), README.md (5 badge URLs + labels), docs/principles.md (line 5 workflow-tuple update), .claude/skills/add-gpu-backend/SKILL.md + scaffold.sh (filename refs), docs/adr/0116-*.md (new), docs/adr/README.md (index row), CHANGELOG.md.
- Invariant: workflow files are purpose-named; their name: fields are Title Case sentences with em-dash axis tags; job-level name: strings are Title Case sentences (Build — / Pre-Commit / Coverage Gate / etc.). Required-status-check contexts in master branch protection are bound to job-level names, so when renaming any job, re-pin via gh api --method PUT repos/VMAFx/vmafx/branches/master/protection. The 19 required gates' semantics are unchanged from ADR-0037; only their display strings move.
- Re-test:
# Validate every workflow file parses and lists the expected job names.
cd .github/workflows
for f in tests-and-quality-gates.yml lint-and-format.yml security-scans.yml \
libvmaf-build-matrix.yml ffmpeg-integration.yml docker-image.yml; do
yq '.name, .jobs.[].name' "$f" || echo "PARSE FAIL: $f"
done
# Expected: each workflow prints its Title Case workflow name + job names;
# no PARSE FAIL lines.
0021 — DNN-enabled CI matrix legs (gcc + clang + macOS)
- Workstream PRs: this PR; adds three new entries to the libvmaf-build matrix in .github/workflows/libvmaf-build-matrix.yml covering -Denable_dnn=enabled across Ubuntu/gcc, Ubuntu/clang, and macOS/clang. See ADR-0120.
- Touches: .github/workflows/libvmaf-build-matrix.yml (3 new matrix entries + ORT install steps + dedicated dnn-suite test step), docs/adr/0120-ai-enabled-ci-matrix-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry).
- Invariant: the DNN matrix legs install ONNX Runtime from the same sources as the dedicated Tiny AI job (tests-and-quality-gates.yml): on Linux, the MS tarball at the version pinned by ORT_VERSION; on macOS, Homebrew. When the Tiny AI job's pin changes, the ORT_VERSION env in the matrix legs' Install ONNX Runtime (linux, DNN leg) step must change to match; otherwise compiler/portability coverage drifts away from the gating leg's actual ABI.
- Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs["libvmaf-build"].strategy.matrix.include[] | select(.dnn == true) | .name' \
.github/workflows/libvmaf-build-matrix.yml
# Expected output (3 lines):
# Build — Ubuntu gcc (CPU) + DNN
# Build — Ubuntu clang (CPU) + DNN
# Build — macOS clang (CPU) + DNN
# Local DNN build sanity (matches what each leg will run):
meson setup libvmaf core/build --buildtype release \
--prefix $PWD/install -Denable_float=true -Denable_dnn=enabled
ninja -vC core/build install
meson test -C core/build --suite=dnn --print-errorlogs
- Branch protection: the two Linux DNN legs are pinned as required status checks on master immediately after this PR's merge (19 → 21 contexts). The macOS leg stays informational (experimental: true) because Homebrew ORT floats. Re-pin command:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
--input /tmp/protection-update.json
0022 — Windows GPU build-only matrix legs (MSVC + CUDA, MSVC + oneAPI SYCL)
- Workstream PRs: this PR; adds a new top-level windows-gpu-build job to .github/workflows/libvmaf-build-matrix.yml with two matrix entries (CUDA, SYCL). See ADR-0121.
- Touches: .github/workflows/libvmaf-build-matrix.yml (new windows-gpu-build job), docs/adr/0121-windows-gpu-build-only-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry), core/src/compat/win32/pthread.h (new: Win32 pthread shim for MSVC; mirrors the compat/gcc/stdatomic.h pattern), core/src/feature/integer_adm.h (UPSTREAM: converted the dwt_7_9_YCbCr_threshold[3] designated initializer to positional form so MSVC/nvcc-on-Windows accepts the C++ parse; semantically identical, no behavioural change), core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM: added an #if defined(__cplusplus) && defined(_MSC_VER) branch around #include <stdatomic.h> so MSVC C++ TUs pull atomic_int via a using std::atomic_int; declaration; POSIX paths unchanged), core/src/sycl/d3d11_import.cpp (fix non-existent <libvmaf/log.h> → "log.h"), core/src/sycl/dmabuf_import.cpp (move <unistd.h> inside the #if HAVE_SYCL_DMABUF guard for non-VA-API hosts), core/src/sycl/common.cpp (replace POSIX clock_gettime(CLOCK_MONOTONIC) with portable std::chrono::steady_clock), core/src/feature/x86/motion_avx2.c (UPSTREAM: replace GCC vector-extension __m256i[N] indexing at line 529 with _mm256_extract_epi64; bit-exact), core/src/feature/x86/adm_avx2.c (UPSTREAM: replace 6 (__m256i)(_mm256_cmp_ps(...)) casts with _mm256_castps_si256(...) and 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact), core/src/feature/x86/adm_avx512.c (UPSTREAM: replace 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact), core/src/log.c (UPSTREAM: gate <unistd.h> behind !_WIN32, include <io.h> + redirect isatty/fileno to _isatty/_fileno for MSVC), core/src/feature/integer_vif.c (UPSTREAM: switch the aligned_malloc cursor from void * to uint8_t * with explicit typed-pointer casts so MSVC accepts the byte-wise pointer arithmetic), core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM: drop unused <unistd.h> include), core/src/dnn/model_loader.c (fork-added: Windows fallback definitions for the POSIX S_ISDIR/S_ISREG path-classification macros), .github/workflows/lint-and-format.yml (fork-added: set lfs: true on the pre-commit job's checkout so LFS-stored ONNX blobs resolve and don't appear as phantom pre-commit-induced diffs), core/src/feature/x86/motion_avx512.c (UPSTREAM: replace 1 __m128i[N] reduction with _mm_extract_epi64; bit-exact), core/src/feature/x86/{vif_statistic_avx2,ansnr_avx2,ansnr_avx512,float_adm_avx2,float_adm_avx512,float_psnr_avx2,float_psnr_avx512,ssim_avx2,ssim_avx512}.c (UPSTREAM: convert 17 sites of trailing __attribute__((aligned(N))) to leading C11 _Alignas(N); same alignment, MSVC-portable), core/src/feature/mkdirp.c and core/src/feature/mkdirp.h (UPSTREAM third-party MIT-licensed micro-library: gate <unistd.h> to non-Windows, add <direct.h> + _mkdir for Windows, add a mode_t typedef for MSVC), core/meson.build (new pthread_dependency gated on cc.check_header('pthread.h') failing), core/src/meson.build and core/test/meson.build (thread pthread_dependency into every target compiling pthread-using TUs).
- Invariant: Windows GPU legs are pinned to the same toolchain versions as the corresponding Linux GPU legs (CUDA 13.0.0, oneAPI BaseKit 2025.3.0.372), so a Linux-vs-Windows divergence implies an MSVC ABI issue, not a tooling-version delta. When either Linux GPU leg bumps its toolchain, the Windows leg must move in lockstep: the Intel installer URL on Windows hard-codes the per-release directory id and the version string, so the bump is a two-line edit in the SYCL Install Intel oneAPI (windows) step (the WINDOWS_BASEKIT_URL env var). Both legs additionally inject /experimental:c11atomics into CFLAGS/CXXFLAGS because libvmaf uses C11 atomics that MSVC's <stdatomic.h> rejects without that opt-in flag; once MSVC supports C11 atomics without the opt-in, the flag can be dropped. Two Windows-only dependency steps round out the parity. The CUDA leg's Jimver/cuda-toolkit sub-package list includes both crt (CUDA Runtime Library compile-time headers, ships crt/host_config.h; cuda_cccl is not a valid Windows sub-package name and the installer rejects it) and nvvm (ships nvvm/bin/cicc.exe + nvvm/libdevice/libdevice.*.bc; without it, nvcc's .cu → PTX stage fails with The system cannot find the path specified.; on Linux, apt pulls NVVM in transitively with cuda-nvcc-XY, whereas Windows requires it explicitly). The SYCL leg builds the Level Zero loader from source (oneapi-src/level-zero v1.18.5 → cmake --build … --target install) because Windows oneAPI BaseKit ships the SYCL runtime but not ze_loader.lib, and libvmaf's meson cc.find_library('ze_loader') needs both the header and the import library. When the Linux apt level-zero-dev version moves, bump the L0 git tag to match. core/src/meson.build guards the explicit svml/irc cc.find_library calls behind host_machine.system() != 'windows': those calls exist for the gcc/g++ + icpx Linux flow where the host linker is non-Intel, while on Windows the host compiler is icx-cl itself and auto-injects the Intel runtime. Round-10 surfaced an additional Windows-only gap: ~14 libvmaf TUs #include <pthread.h> unconditionally, but MSVC and clang-cl ship no pthread (MinGW does, via winpthreads). The fork now ships a header-only Win32 shim at core/src/compat/win32/pthread.h mapping the in-use pthread subset (mutex / cond / thread create+join+detach) onto SRWLOCK + CONDITION_VARIABLE + _beginthreadex.
The shim is wired in via pthread_dependency in core/meson.build, declared only when cc.check_header('pthread.h') fails, so MinGW and POSIX paths stay untouched. When upstream Netflix/vmaf adds new pthread surface (e.g. pthread_rwlock_*), extend compat/win32/pthread.h to cover it. Both nvcc fatbin custom_targets (CUDA) and icpx custom_targets (SYCL common.cpp / picture_sycl.cpp / dmabuf_import.cpp, plus the SYCL feature kernels) bypass meson's dependencies: plumbing and hand-roll their own -I lists, so the shim path must be threaded into both cuda_extra_includes and sycl_inc_flags explicitly on Windows. icpx-cl on Windows additionally rejects -fPIC (unsupported option for target 'x86_64-pc-windows-msvc'), so sycl_common_args and sycl_feature_args route their -fPIC token through sycl_pic_arg = host_machine.system() != 'windows' ? ['-fPIC'] : []. PIC is the default for Windows DLLs, so dropping the flag is the correct fix rather than a workaround. Round-14 surfaced a third Windows-only blocker: core/src/feature/integer_adm.h (an upstream Netflix file, last touched by upstream port d06dd6cf) initialises dwt_7_9_YCbCr_threshold[3] with C99 designated initializers ({.a = ..., .k = ..., .f0 = ..., .g = {...}}). The header is included from both integer_adm.c (C TU) and cuda/integer_adm/*.cu (C++ TU via nvcc); MSVC's C++ frontend (and nvcc's cudafe++ on Windows) rejects C99 designated initializers without /std:c++20. It was converted to positional initialization in the same struct-member order (a / k / f0 / g[4]). The conversion is provably semantically identical and works in every C/C++ standard, so it costs nothing on the upstream-merge side beyond a trivial conflict marker if upstream Netflix later edits the same lines. Restore the designated form post-merge if upstream has it. Round-17 surfaced four more Windows/MSVC-only SYCL blockers, two of which touch upstream-shared headers.
(a) core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM) unconditionally #include <stdatomic.h> and use the atomic_int typedef in struct definitions. MSVC's <stdatomic.h> (added in 19.34) only declares the C11 symbols in the global namespace under C; in C++ compilation (icpx-cl drives the SYCL TUs as C++) MSVC surfaces them only inside namespace std::. gcc/clang expose both via a GNU extension, so the upstream code works on every other platform. The fork now wraps both headers' #include <stdatomic.h> in #if defined(__cplusplus) && defined(_MSC_VER) → #include <atomic> + using std::atomic_int;, falling through to the original <stdatomic.h> line on every other configuration. ABI is unchanged: atomic_int resolves to the same underlying type. If upstream Netflix adds further C11 atomic typedefs in these headers (e.g. atomic_uint, atomic_size_t), extend the using std:: lines to cover them. (b) core/src/sycl/d3d11_import.cpp (fork-added) used <libvmaf/log.h>, which doesn't exist: log.h lives at core/src/log.h and is internal. Switched to "log.h"; the icpx invocation already supplies the src-relative -I. (c) core/src/sycl/dmabuf_import.cpp (fork-added) included <unistd.h> at file scope, but POSIX close() is only used inside the #if HAVE_SYCL_DMABUF VA-API block. Moved the <unistd.h> include inside that guard so non-DMA-BUF builds (Windows MSVC, macOS) compile cleanly. (d) core/src/sycl/common.cpp (fork-added) called clock_gettime(CLOCK_MONOTONIC), which doesn't exist on Windows. Replaced with std::chrono::steady_clock (guaranteed monotonic by the C++ standard, portable on every supported host). All four fixes preserve POSIX/Linux behaviour bit-identically and only change the Windows MSVC build path. Round-18 surfaced a fifth Windows blocker on the CUDA leg's CPU SIMD compile path: core/src/feature/x86/motion_avx2.c:529 (UPSTREAM, ported in commit 9371a0aa from Netflix PR #1486) computed final_accum[0] + final_accum[1] + final_accum[2] + final_accum[3] to extract the four int64 lanes from an __m256i.
gcc/clang allow this via the GNU vector-extension treatment of__m256i(it carries__attribute__((vector_size(32)))); MSVC rejects it withC2088: built-in operator '[' cannot be applied to an operand of type '__m256i'. Replaced with_mm256_extract_epi64(final_accum, N)for N ∈ {0..3}, summed — bit-exact lane sum on every compiler. Restore the index form post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Round-19 surfaced the same MSVC pattern at 19 more call sites across the AVX2/AVX-512 ADM and motion files plus six GCC-style vector casts.core/src/feature/x86/adm_avx2.c(UPSTREAM): 6 lines (915-920) used(__m256i)(_mm256_cmp_ps(...))C-style casts that gcc/clang accept via the GNU vector extension; replaced with the dedicated_mm256_castps_si256(...)bit-cast intrinsic. 12 lane-extract sites (r2_h[0]+r2_h[1], etc. at lines 2420 / 2425 / 2430 / 2893 / 2897 / 2901 / 4079 / 4084 / 4089 / 4627 / 4631 / 4635) replaced with_mm_extract_epi64(r2_X, N)summed pair.core/src/feature/x86/adm_avx512.c(UPSTREAM): 6 sister lane-extract sites (lines 4470 / 4477 / 4484 / 4625 / 4631 / 4637) — same fix. The AVX-512 paths reduce a__m512idown to__m128ifirst (via_mm512_extracti64x4_epi64→_mm256_extracti64x2_epi64) before the index, so only the final__m128i[N]step needed changing.core/src/feature/x86/motion_avx512.c(UPSTREAM, ported in 9371a0aa from PR #1486): one finalr2[0]+r2[1]reduction (line 448), same fix. All 19 lane-extract fixes plus the 6 cast fixes are bit-exact rewrites and only change the source-level syntax to MSVC-portable form. Restore the original forms post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC. Additionallycore/src/sycl/d3d11_import.cpp(fork-added) switched from C-style COBJMACROS helpers (ID3D11Device_CreateTexture2D,…_Release, etc.) 
to C++ method-call syntax (device->CreateTexture2D, tex->Release): d3d11.h gates COBJMACROS behind !defined(__cplusplus), so the C-style helpers aren't visible in this .cpp TU. The two forms are ABI-equivalent (both dispatch through the COM vtable); the choice is purely lexical, and POSIX builds aren't affected (the whole TU is #ifdef _WIN32).
Round-20 surfaced two more Windows-only blockers. (a) 17 sites across the x86 SIMD layer used GCC's float tmp[N] __attribute__((aligned(M))); form to align scratch buffers for _mm{256,512}_store_ps. MSVC rejects the trailing-attribute syntax with C2146: syntax error: missing ';' before identifier '__attribute__'. Replaced with the C11-standard _Alignas(M) float tmp[N]; (alignment specifier before the type), which works in gcc, clang and MSVC with /std:c11. Files touched (all UPSTREAM): vif_statistic_avx2.c (×2), ansnr_avx2.c (×2), ansnr_avx512.c (×2), float_adm_avx2.c (×2), float_adm_avx512.c (×2), float_psnr_avx2.c (×1), float_psnr_avx512.c (×1), ssim_avx2.c (×4), ssim_avx512.c (×4). The pre-existing vif_avx2.c / vif_avx512.c already define a portable ALIGNED(x) macro at file scope and position the attribute before the type, so they compile cleanly under MSVC and were not touched. (b) core/src/feature/mkdirp.c (UPSTREAM, third-party MIT-licensed copy of Stephen Mathieson's micro-library) included <unistd.h> unconditionally but never used POSIX unistd symbols (only mkdir via <sys/stat.h> / <direct.h>). Gated <unistd.h> to non-Windows and added <direct.h> for Windows; switched mkdir(pathname) → _mkdir(pathname) (the non-deprecated MSVC name). core/src/feature/mkdirp.h added a mode_t typedef under MSVC, since neither <sys/types.h> nor <sys/stat.h> declares it on Windows; mode is ignored on the Windows path anyway.
Round-21 surfaced two more blockers (the round-19 __m128i[N] sweep missed six sites) plus a pre-commit workflow checkout gap.
(a) core/src/feature/x86/adm_avx512.c (UPSTREAM) had six further r2_X[0] + r2_X[1] reductions at lines 2128 / 2135 / 2142 / 2589 / 2595 / 2601 that reduce an __m512i accumulator down to an __m128i before the lane index. Replaced with the same summed-pair _mm_extract_epi64(r2_X, N) pattern used in round 19: bit-exact and MSVC-portable. (b) core/src/log.c (UPSTREAM) included <unistd.h> unconditionally to pick up POSIX isatty / fileno. On MSVC both live in <io.h> as _isatty / _fileno; gated the include and macro-redirected the names so the one call site at line 34 compiles on both sides without touching the POSIX path. (c) .github/workflows/lint-and-format.yml (fork-added) checks out without lfs: true, so the model/tiny/*.onnx files land as LFS pointer stubs. pre-commit's "changes made by hooks" reporter then diffs the stubs against HEAD's real blobs and fails the job even though no hook touched them. Added lfs: true to the pre-commit job's checkout. (d) core/src/meson.build: the cuda_common_vmaf_lib static library had no dependencies: list, so the Win32 pthread shim (wired in via pthread_dependency in core/meson.build) wasn't on its include path; cuda/common.h unconditionally includes <pthread.h>, and MSVC failed with C1083. Added dependencies : [pthread_dependency], which is a no-op on POSIX (empty list) and routes the shim path in on Windows. (e) core/src/feature/integer_vif.c (UPSTREAM) walked one big aligned_malloc result as void *data and did data += pad_size / data += h * stride_16, etc., to carve the buffer into typed sub-pointers. gcc/clang accept pointer arithmetic on void * as a GNU extension (treating sizeof(void) == 1); MSVC rejects it with C2036: 'void *': unknown size. Replaced the cursor type with uint8_t * and added explicit casts at assignment sites that take a typed pointer (uint16_t *mu1, uint32_t *mu1_32, etc.). Byte offsets are identical, the layout is unchanged, and the result is bit-exact. If upstream Netflix edits the same loop, reabsorb the walk and re-apply the cursor-type + cast pattern.
(f) core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM) included <unistd.h> at line 33 but used no POSIX symbols from it; MSVC failed with C1083. Dropped the unused include outright: the simplest fix, with no runtime change on any platform. (g) core/src/dnn/model_loader.c (fork-added) uses S_ISDIR / S_ISREG to classify resolved paths. MSVC ships the underlying S_IFMT / S_IFDIR / S_IFREG bit masks in <sys/stat.h> but not the POSIX classification macros. Added a Windows-only fallback (#ifndef S_ISDIR #define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) #endif, same for S_ISREG) guarded by #ifdef _WIN32. It computes the same result as the POSIX macro on Linux/macOS.
Round-21e surfaced the final source-portability blockers once the DLL build passed preprocessing. (h) core/src/predict.c, core/src/libvmaf.c and core/src/read_json_model.c (all UPSTREAM) used C99 variable-length arrays: double scores[cnt] at predict.c:385, char name[name_sz] at predict.c:453 and libvmaf.c:1741, plus cfg_name[cfg_name_sz] and generated_key[generated_key_sz] in the .json model-collection parser. gcc/clang accept VLAs as a C11 optional feature; MSVC (even with /std:c11) rejects them outright with C2057: expected constant expression (plus C2466 and C2133 on the const size_t-sized arrays, because MSVC treats a const variable as runtime-bounded rather than a constant expression, even when the initialiser is a literal such as 4 + 1). Replaced each runtime-sized buffer with a small malloc + explicit free on every exit path (in predict.c and read_json_model.c a goto out; cleanup arm was introduced because the loops error-exit mid-function). The generated_key buffer in read_json_model.c uses the narrower fix, char generated_key[5];, since its size (four decimal digits of the bootstrap sub-model index plus NUL) is a true compile-time constant.
Buffers are a handful of bytes each (name_sz is the model-collection name length plus the fixed _ci_p95_lo suffix, scores holds ~20 doubles, cfg_name is the name plus a _0000 suffix), so the heap round-trip is not performance-relevant; the new -ENOMEM failure mode is handled uniformly by existing callers. The read_json_model.c refactor also plugs a pre-existing leak of the name buffer on the early return -EINVAL when a JSON object key isn't a string: the goto out; path frees name + cfg_name on every exit. core/test/test_feature_extractor.c:56 (UPSTREAM) declared const unsigned n_threads = 8; and used it as the extent of VmafFeatureExtractorContext *fex_ctx[n_threads];. Converted to enum { n_threads = 8 }; so MSVC sees a constant expression; every other compiler accepts enum constants identically. Re-absorb if upstream Netflix later edits the same loops and your toolchain matrix omits MSVC. (i) The Windows MSVC build-only legs now build the full tree (CLI tools, unit tests and libvmaf.dll) rather than taking the previous shortcut of disabling -Denable_tools / -Denable_tests. Per user direction ("fix the code ffs"), the tree instead polyfills the remaining POSIX surfaces on MSVC: core/tools/compat/win32/getopt.h + core/tools/compat/win32/getopt.c provide a from-scratch POSIX/GNU-compatible getopt_long shim (short and long options, no_argument / required_argument / optional_argument, argv permutation for non-option operands, explicit -- stop, =-embedded values). The shim is fork-added (BSD-3-Clause-Plus-Patent, Copyright 2026 Lusoris and Claude) and declared via a single getopt_dependency in core/meson.build, gated on cc.check_header('getopt.h') failing. The dependency auto-propagates the shim .c into any consuming target via meson's sources: keyword, so both the vmaf CLI (core/tools/meson.build) and the test_cli_parse unit test (core/test/meson.build) pick it up uniformly. MinGW ships <getopt.h> via mingw-w64-crt, so check_header succeeds there and the shim stays out of the TU list.
(j) Eleven test executables (test_log, test_dict, test_opt, test_cpu, test_ref, test_feature, test_ciede, test_luminance_tools, test_cli_parse, test_sycl, test_sycl_pic_preallocation) were missing pthread_dependency in their dependencies: lists in core/test/meson.build. On POSIX pthread_dependency is an empty list, so the omission was invisible; on MSVC those TUs transitively include feature_collector.h → <pthread.h> and fail with C1083. Threaded the dependency through all eleven targets; test_cli_parse additionally lists getopt_dependency to pick up the shim. (k) Three additional VLA sites surfaced once the test harness built on MSVC. test_cambi.c:254 had unsigned w = 5, h = 5; uint16_t buffer[3 * w];, now converted to enum { w = 5, h = 5 }; so the array extent is a constant expression. test_pic_preallocation.c:382 and test_pic_preallocation.c:506 had const int num_threads = N; pthread_t threads[num_threads];, which MSVC rejects because a const int is not a constant expression; converted to enum { num_threads = N, fetches_per_thread = M };. (l) test_ring_buffer.c:23 and test_pic_preallocation.c:26 included <unistd.h> for usleep / sleep. Gated the include behind !_WIN32 with a Win32 fallback via <windows.h> + #define usleep(us) Sleep(((us) + 999) / 1000) / #define sleep(s) Sleep((s) * 1000). The conversion rounds sub-millisecond usleep inputs up, which is safe for these test paths (they use 100 µs jitter and 1 s waits). (m) core/tools/vmaf.c included <unistd.h> for isatty / fileno. Applied the same gating pattern used in log.c in round-21 (b): include <io.h> on MSVC and redirect isatty / fileno to _isatty / _fileno via #define. (n) __builtin_clz / __builtin_clzll are GCC intrinsics; MSVC ships __lzcnt / __lzcnt64 via <intrin.h> instead. The shim already lived in core/src/feature/integer_vif.h, but integer_adm.c:939, x86/adm_avx2.c:1425 and x86/adm_avx512.c:1217 don't include that header. Extracted the shim into a dedicated core/src/feature/compat_builtin.h (fork-added) and included it from all four TUs.
The guard is defined(_MSC_VER) && !defined(__clang__), so clang-cl / icx-cl (which provide the GCC intrinsics natively) skip the shim. (o) The SYCL leg's D3D11 import TU core/src/sycl/d3d11_import.cpp is C++ (icpx-cl drives it as C++ on Windows) but included the internal C header log.h without an extern "C" wrap. log.h is an upstream Netflix header with no __cplusplus guard, so vmaf_log got C++ name-mangled in the .cpp TU and failed to resolve against the C-linkage symbol produced by log.c at link time (LNK2019 from every test target that pulls in the SYCL static lib). Wrapped the #include "log.h" in extern "C" { ... } inside the fork-added .cpp rather than touching the upstream header, which keeps log.h identical to upstream on every /sync-upstream. (p) The Windows MSVC legs build with --default-library=static. libvmaf's public API has no __declspec(dllexport) attributes (upstream Netflix is POSIX-shaped), so a vanilla MSVC shared build produces src/vmaf-3.dll with no exported symbols, and the toolchain therefore never emits the companion vmaf.lib import library. Downstream tool targets then fail with LNK1181: cannot open input file 'src\vmaf.lib'. The MinGW matrix leg has used --default-library static since day one for the same reason (line 387); the MSVC legs now mirror that choice via matrix.include[].meson_extra. Downstream consumers that want a DLL can either add __declspec(dllexport) decorations to the public API or use a .def file; that is a separate decision and out of scope for the build-only gate.
- Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs["windows-gpu-build"].strategy.matrix.include[].name' \
.github/workflows/libvmaf-build-matrix.yml
# Expected output (2 lines):
# Build — Windows MSVC + CUDA (build only)
# Build — Windows MSVC + oneAPI SYCL (build only)
- Branch protection: the two Windows GPU legs are pinned as required status checks on master immediately after this PR's merge. After ADR-0120's two Linux DNN legs, the count moves 21 → 23. Re-pin via:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
--input /tmp/protection-update.json
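Several of the MSVC rewrites above reduce to a handful of portable C idioms. The sketch below is a hypothetical, self-contained illustration of four of them: the S_ISDIR / S_ISREG fallback from (g), a heap buffer with a single cleanup label in place of a VLA from (h), an enum constant as an array extent from (k), and the round-up microsecond-to-millisecond conversion behind the usleep fallback in (l). The helper names is_directory, make_suffixed_name, n_slots and us_to_ms_ceil are illustrative only, not the fork's code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* (g) MSVC ships the S_IF* masks but not the POSIX classification
 * macros; on POSIX hosts these are already defined and skipped. */
#ifndef S_ISDIR
#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)
#endif
#ifndef S_ISREG
#define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
#endif

/* Returns 1 if path names a directory, 0 otherwise (or on error). */
static int is_directory(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* (h) Heap buffer instead of a VLA such as `char name[name_sz];`, with
 * one cleanup label reached from every exit. Returns 0 or -ENOMEM; on
 * success the caller owns *out and must free it. */
static int make_suffixed_name(const char *name, const char *suffix, char **out)
{
    int err = 0;
    const size_t name_len = strlen(name);
    const size_t suffix_len = strlen(suffix);
    char *buf = malloc(name_len + suffix_len + 1);
    if (!buf) {
        err = -ENOMEM;
        goto out;
    }
    memcpy(buf, name, name_len);
    memcpy(buf + name_len, suffix, suffix_len + 1); /* copies the NUL too */
    *out = buf;
    buf = NULL; /* ownership moved to the caller */
out:
    free(buf); /* free(NULL) is a no-op, so the label is safe on every path */
    return err;
}

/* (k) An enum constant is an integer constant expression, so it is a
 * valid array extent even where `const int n = 4; int a[n];` would be
 * rejected as a VLA. */
enum { n_slots = 4 };
static int slots[n_slots];

/* (l) Microseconds to whole milliseconds, rounding up, matching
 * #define usleep(us) Sleep(((us) + 999) / 1000). */
static unsigned long us_to_ms_ceil(unsigned long us)
{
    return (us + 999) / 1000;
}
```

On a POSIX host both #ifndef blocks are skipped because <sys/stat.h> already provides the macros; only a toolchain that lacks them picks up the fallback definitions.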
0023 — CUDA gencode coverage (sm_86/sm_89/compute_80 PTX) + init hardening
- Workstream PRs: the ADR-0122 PR (gencode + init hardening) and the ADR-0123 follow-up for the 32b115df post-cubin-load regression.
- Touches:
  - core/src/meson.build: the gencode array in the if get_option('enable_nvcc') branch.
  - core/src/cuda/common.c: vmaf_cuda_state_init() error paths (multi-line actionable log, cuda_free_functions() + free(c) + *cu_state = NULL cleanup).
  - docs/backends/cuda/overview.md: the ## Runtime requirements section and the ### GPU architecture coverage table.
- Invariant: the gencode array unconditionally emits cubins for sm_75/sm_80/sm_86/sm_89 plus a compute_80 PTX, independent of the host nvcc version. Upstream Netflix's gencode only ships cubins at Txx major boundaries (sm_75/sm_80/sm_90/sm_100/sm_120); a literal merge that replaces our array with upstream's would re-open the Ampere sm_86 / Ada sm_89 coverage hole. The sm_90/sm_100/sm_120 entries are still version-gated and should be preserved verbatim if upstream adds new gates. The init-path error messages are fork-local strings; upstream's terse "Error: failed to load CUDA functions" must NOT win a merge.
- Re-test:
meson setup build -Denable_cuda=true -Denable_nvcc=true
ninja -C build 2>&1 | grep -E 'compute_(80|86|89)'
# Expect at least -gencode=arch=compute_86,code=sm_86 and
# -gencode=arch=compute_89,code=sm_89 and
# -gencode=arch=compute_80,code=compute_80
# Actionable init message (run without CUDA driver on the loader path):
LD_LIBRARY_PATH= ./build/tools/vmaf --help 2>&1 | grep -qi 'libcuda.so.1' || \
echo "init log regressed"
0024 — vmaf_read_pictures null-guard for CUDA device-only path
- Workstream PRs: the ADR-0123 follow-up landed atop the ADR-0122 gencode/init-hardening work.
- Touches:
  - core/src/libvmaf.c: the non-threaded tail of vmaf_read_pictures at the prev_ref update site (line ~1428 in the fork; the upstream equivalent is the tail added by f740276a).
- Invariant: the prev_ref update is guarded by if (ref && ref->ref) so pure-CUDA extractor sets (where ref = &ref_host but ref_host was never populated by translate_picture_device) do not dereference a NULL refcount. Upstream currently has the same unguarded tail; the bug is masked upstream only because the experimental VMAF_PICTURE_POOL gate from 32b115df is still in place. A literal upstream merge that removes our null-guard while upstream's experimental gate is still in place would pass tests but re-open the libvmaf_cuda ffmpeg crash the moment the gate flips default-on (which the fork did in 65460e3a, ADR-0104). Keep the guard until the upstream null-guard port lands.
- Re-test:
# Unit tests cover the non-regression on the library side:
meson test -C build
# End-to-end regression: ffmpeg libvmaf_cuda must exit 0 on a
# CUDA-device-only extractor set (full recipe in ADR-0123).
./ffmpeg -init_hw_device cuda=cu:0 -filter_hw_device cu \
-i /tmp/ref.mp4 -i /tmp/dis.mp4 \
-lavfi "[0:v]format=yuv420p,hwupload_cuda[r];\
[1:v]format=yuv420p,hwupload_cuda[d];\
[r][d]libvmaf_cuda=log_path=/tmp/out.json:log_fmt=json" \
-f null -
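The guard can be modelled in isolation. This is a sketch under stated assumptions: Picture, picture_ref and update_prev_ref are hypothetical stand-ins (libvmaf's real picture type and ref-counting helpers differ); the point is only that the update is skipped both for a NULL ref and for a ref whose refcount pointer was never populated.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted picture: ref stays NULL until
 * the picture has been populated (e.g. by a host-side translation). */
typedef struct {
    int *ref; /* refcount; NULL when never populated */
} Picture;

/* Hypothetical helper: share src with dst and bump the refcount. */
static void picture_ref(Picture *dst, const Picture *src)
{
    *dst = *src;
    (*dst->ref)++;
}

/* Mirrors the guarded tail: only update prev_ref when ref exists AND
 * was populated. Returns 1 if prev_ref was updated, 0 if skipped. */
static int update_prev_ref(Picture *prev_ref, const Picture *ref)
{
    if (ref && ref->ref) {
        picture_ref(prev_ref, ref);
        return 1;
    }
    return 0; /* device-only path: nothing to reference */
}
```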
0025 — VIF init() fail-path frees advanced byte-cursor
- Workstream PRs: PR #47 (rewritten to leak-fix-only after master absorbed the void → uint8_t half via commit b0a4ac3a, entry 0022 §e). Ports the leak-fix half of upstream Netflix PR #1476.
- Touches: core/src/feature/integer_vif.c (UPSTREAM; 2-line fix in the init() fail: handler).
- Invariant: init() walks uint8_t *data forward through aligned_malloc's single allocation, advancing past each sub-pointer assignment. If vmaf_feature_name_dict_from_provided_features returns NULL, the fail path must free the base pointer s->public.buf.data, never the advanced cursor data. Upstream master still has aligned_free(data) there (the same bug), so this entry is the reminder not to let an upstream sync re-introduce the advanced-cursor form. If upstream lands PR #1476 or an equivalent, the sync can drop this entry.
- Re-test:
meson test -C build --suite=fast
# Static check: ripgrep the pattern that must NOT return.
rg -n "aligned_free\(data\)" core/src/feature/integer_vif.c && \
echo 'REGRESSED' || echo 'ok'
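A minimal sketch of the carve-then-free pattern, assuming plain malloc/free in place of aligned_malloc/aligned_free and a hypothetical two-buffer layout (Buffers and buffers_init are illustrative names). The uint8_t cursor makes the byte arithmetic legal on every compiler (the entry 0022 §e fix), and the fail path frees the saved base pointer rather than the advanced cursor. The wider element type is placed first so both sub-pointers stay naturally aligned.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-buffer layout carved out of one allocation. */
typedef struct {
    void *base;       /* start of the allocation: the only pointer to free */
    uint32_t *mu1_32; /* first n uint32_t */
    uint16_t *mu1;    /* next n uint16_t */
} Buffers;

static int buffers_init(Buffers *b, size_t n, int simulate_failure)
{
    uint8_t *data = malloc(n * sizeof(uint32_t) + n * sizeof(uint16_t));
    if (!data)
        return -1;
    b->base = data;

    /* Byte arithmetic on a uint8_t cursor is standard C; on void * it is
     * a GNU extension that MSVC rejects (C2036). */
    b->mu1_32 = (uint32_t *)data;
    data += n * sizeof(uint32_t);
    b->mu1 = (uint16_t *)data;
    data += n * sizeof(uint16_t); /* cursor now sits one past the block */

    if (simulate_failure)
        goto fail; /* stands in for a later step returning NULL */
    return 0;

fail:
    free(b->base); /* never free(data): it no longer points at the block */
    b->base = NULL;
    return -1;
}
```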
0026 — Automated rule-enforcement workflow + copyright pre-commit hook
- Workstream PRs: this PR (ADR-0124 adoption). Closes the "rule-without-a-check" gap on ADR-0100 / 0105 / 0106 / 0108.
- Touches (all FORK-ADDED, no upstream overlap): .github/workflows/rule-enforcement.yml (new), scripts/ci/check-copyright.sh (new), .pre-commit-config.yaml (appended local hook).
- Invariant: the deep-dive-checklist job is blocking on every PR that is not an upstream port (exempt via a port: title prefix or a port/ branch). The other three gates (doc-substance-check, adr-backfill-check, copyright pre-commit) are advisory or pre-commit, never CI-blocking; this split is the whole point of ADR-0124, and an upstream sync must not move them into the required-status-check set without a follow-up ADR. The opt-out parser matches /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ per ADR-0108 §Opt-out-lines; if upstream ever changes PR-template phrasing (unlikely, since this is fork-local), the regex and the template must move together.
- Re-test:
# Lint the workflow + hook locally.
pre-commit run --files \
.github/workflows/rule-enforcement.yml \
scripts/ci/check-copyright.sh \
.pre-commit-config.yaml
# Dry-run the copyright hook against a staged source file.
scripts/ci/check-copyright.sh core/src/libvmaf.c && echo ok
# Synthetic PR body that violates ADR-0108 should fail the parser;
# see docs/research/0002-automated-rule-enforcement.md §Verification
# plan for the three test cases.
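To check by hand which PR-body lines the opt-out parser accepts, the pattern can be transliterated to POSIX extended regular expressions. This C sketch assumes the original is a PCRE/JavaScript-style pattern, so (?:...) becomes a plain group and \s becomes [[:space:]]. The helper is_opt_out_line is hypothetical; the workflow's actual parser is not reproduced here.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX ERE transliteration of the opt-out pattern
 *   /^-?\s*no .* (?:needed|impact|rebase-sensitive)/
 * Matching is case-sensitive, like the original pattern without flags. */
static const char *const OPT_OUT_ERE =
    "^-?[[:space:]]*no .* (needed|impact|rebase-sensitive)";

/* Returns 1 if line is an opt-out line, 0 if not, -1 if the regex fails
 * to compile. */
static int is_opt_out_line(const char *line)
{
    regex_t re;
    if (regcomp(&re, OPT_OUT_ERE, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    const int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Note that the pattern requires a space before the keyword, so the text between "no " and the keyword must itself be non-empty.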
0027 — SSIMULACRA 2 scalar extractor (libjxl FastGaussian IIR blur)
- Workstream PRs: this PR (feat/ssimulacra2-scalar); proposal ADR in PR #67.
- Touches: core/src/feature/ssimulacra2.c (fork-local, new), core/src/meson.build, core/src/feature/feature_extractor.c.
- Invariant: the extractor embeds several tables that must track libjxl upstream: the opsin absorbance matrix, the MakePositiveXYB offsets, 108 pooling weights, polynomial-transform coefficients, and the FastGaussian coefficient-derivation formulas (radius = 3.2795·σ + 0.2546, Cramer's 3×3 solve for β, n2/d1 assignment per Charalampidis 2016 (33)). If libjxl ever changes any of these, update ssimulacra2.c in the same PR that syncs upstream. Self-consistency must stay at exactly 100.000000 for identical ref/dist inputs; this is the cheapest regression check.
- Re-test:
meson test -C build --suite=fast
./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc00_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 --feature ssimulacra2 -o /tmp/self.xml \
&& grep -q 'ssimulacra2="100.000000"' /tmp/self.xml \
&& echo "ok: self-consistency 100.0"
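The radius formula quoted above is easy to sanity-check in isolation. This sketch evaluates it for an example σ of 1.5; rounding to the nearest integer before the coefficient solve is an assumption about libjxl's FastGaussian derivation, and both helper names are hypothetical.

```c
/* FastGaussian support radius from the formula quoted above. */
static double fast_gaussian_radius(double sigma)
{
    return 3.2795 * sigma + 0.2546;
}

/* Assumption: the radius is rounded to the nearest integer before the
 * IIR coefficients are derived. Round-half-up suffices for sigma > 0. */
static int fast_gaussian_radius_int(double sigma)
{
    return (int)(fast_gaussian_radius(sigma) + 0.5);
}
```

For σ = 1.5 this gives 3.2795 × 1.5 + 0.2546 = 5.17385, i.e. an integer radius of 5.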
0028 — MS-SSIM separable decimate + AVX2/AVX-512/NEON SIMD
- Workstream PRs: feat/ms-ssim-decimate-simd-v2 (supersedes the rebase-incompatible feat/ms-ssim-decimate-simd; AVX2/AVX-512, commits 7de8cd7f scalar separable, 5f93c864 AVX2, 73436438 AVX-512); feat/ms-ssim-decimate-neon-v2 (NEON follow-up, stacked).
- Touches: core/src/feature/ms_ssim_decimate.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx2.{c,h} (NEW), core/src/feature/x86/ms_ssim_decimate_avx512.{c,h} (NEW), core/src/feature/arm64/ms_ssim_decimate_neon.{c,h} (NEW), core/src/feature/ms_ssim.c (call-site change), core/src/meson.build (register new SIMD TUs), core/test/test_ms_ssim_decimate.c (NEW), core/test/meson.build (arm64 gating).
- Invariant: the 9-tap 9/7 biorthogonal wavelet LPF coefficients (ms_ssim_lpf_h / ms_ssim_lpf_v) are duplicated verbatim in five TUs for bit-identity: the scalar ms_ssim_decimate.c, the AVX2 variant, the AVX-512 variant, the NEON variant, and upstream's g_lpf_h / g_lpf_v in ms_ssim.c. Any upstream change to the coefficient values or to the KBND_SYMMETRIC mirror branch in iqa/convolve.c must be mirrored to all five. If it is not, the SIMD paths and the scalar path diverge silently; the bit-equality memcmp in test_ms_ssim_decimate catches it, but only when that test runs, so diff the five files first.
- Re-test (on each supported host arch):
# x86_64 host — native build.
meson test -C build
./build/test/test_ms_ssim_decimate
# aarch64 host OR aarch64 cross under qemu — see /tmp/aarch64-cross.txt.
meson setup build-arm64 libvmaf --cross-file /tmp/aarch64-cross.txt \
-Denable_cuda=false -Denable_sycl=false
ninja -C build-arm64
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
build-arm64/test/test_ms_ssim_decimate
# Netflix MS-SSIM golden — places=4 must still pass through SIMD.
.venv/bin/python -m pytest \
python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
0029 — KBND_SYMMETRIC period-based reflection in iqa/convolve.c
- Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
- Touches: core/src/feature/iqa/convolve.c (upstream file, rewritten KBND_SYMMETRIC).
- Invariant: KBND_SYMMETRIC(img, w, h, x, y, _) must use the period-based form (period = 2*w, period = 2*h) so that offsets with |x| > w or |y| > h still land in bounds. Upstream's single-reflect form was out of bounds whenever w < kernel_half or h < kernel_half; the latent bug did not reproduce in Netflix golden tests because MS-SSIM pyramids never decimate below ~60×34. Any upstream change that reverts to the single-reflect form must be rejected or re-ported.
- Re-test:
./build/test/test_ms_ssim_decimate # test_1x1 border case
.venv/bin/python -m pytest \
python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
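The difference between the two reflection forms shows up as soon as an offset exceeds the image extent. The sketch below assumes the half-sample symmetric convention (x = -1 maps to 0, x = w maps to w - 1) and uses hypothetical helper names; it compares the single-reflect index mapping with the period-based one that the invariant requires.

```c
/* Single-reflect form: correct only while -w <= x < 2*w. */
static int reflect_once(int x, int w)
{
    if (x < 0)
        return -1 - x;
    if (x >= w)
        return 2 * w - 1 - x;
    return x;
}

/* Period-based form: the symmetric extension repeats every 2*w samples,
 * so reduce x into [0, 2*w) first, then mirror the upper half. The
 * result is always in [0, w). */
static int reflect_periodic(int x, int w)
{
    const int period = 2 * w;
    int m = x % period;
    if (m < 0)
        m += period; /* C's % keeps the sign of the dividend */
    return m < w ? m : period - 1 - m;
}
```

With w = 3, reflect_once(-4, 3) returns 3, one past the last valid column, while reflect_periodic(-4, 3) returns 2; for a 1-pixel-wide plane the periodic form maps every offset to 0.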
0030 — adm_decouple_s123_avx512 stack-array 64-byte alignment
- Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
- Touches:
  - core/src/feature/x86/adm_avx512.c (upstream file; one-line _Alignas(64) on int64_t angle_flag[16] at line 1317).
  - core/test/test_pic_preallocation.c (upstream file; three vmaf_model_destroy(model) calls pairing the vmaf_model_load in test_picture_pool_basic / _small / _yuv444).
- Invariant: the stack slot for angle_flag must be 64-byte aligned because two _mm512_loadu_si512(&angle_flag[0/8]) loads in the same scope may be promoted to aligned vmovdqa64 by LTO. Dropping the _Alignas(64) annotation re-introduces the SEGV under --buildtype=release -Db_lto=true -Db_sanitize=address. Debug / no-LTO builds keep vmovdqu64 and cannot flag the regression. See docs/development/known-upstream-bugs.md.
- Re-test:
meson setup build-asan-lto libvmaf \
-Denable_cuda=false -Denable_sycl=false \
-Db_sanitize=address --buildtype=release -Db_lto=true
ninja -C build-asan-lto test/test_pic_preallocation
ASAN_OPTIONS=detect_leaks=1 \
./build-asan-lto/test/test_pic_preallocation
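A quick way to confirm that an _Alignas(64) annotation is honoured for a stack array is to check the address at run time. This is a sketch, not the fork's test: angle_flag_misalignment is a hypothetical helper that mirrors the int64_t angle_flag[16] declaration. Aligned 512-bit loads such as vmovdqa64 fault on addresses that are not multiples of 64, which is why the annotation matters once LTO may promote the loads.

```c
#include <stdint.h>

/* C11 alignment specifier before the type, as in adm_avx512.c; gcc,
 * clang and MSVC (/std:c11) all accept this form. Returns the array's
 * misalignment in bytes (0 when 64-byte aligned). */
static uintptr_t angle_flag_misalignment(void)
{
    _Alignas(64) int64_t angle_flag[16];
    for (int i = 0; i < 16; ++i)
        angle_flag[i] = i;
    volatile int64_t sink = angle_flag[15]; /* keep the array live */
    (void)sink;
    return (uintptr_t)angle_flag % 64;
}
```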
0031 — Batch-A upstream-port small-fix sweep (ports of unmerged PRs)
- Workstream PRs: feat/batch-a-upstream-small-fix-sweep, commits 546a40ee (T0-1), 8fed8ad1 (T4-4), 83a1db46 (T4-5), 34425dee (T4-6). ADRs 0131, 0132, 0134, 0135.
- Touches:
  - core/src/cuda/picture_cuda.c (one-line cuMemFree port of Netflix#1382)
  - core/src/feature/feature_collector.c + core/test/test_feature_collector.c (mount/unmount bugfix port of Netflix#1406 + shared-helper test refactor)
  - core/src/meson.build (declare_dependency + override_dependency port of Netflix#1451)
  - core/include/libvmaf/model.h, core/src/model.c, core/test/test_model.c, docs/api/index.md (built-in model iterator port of Netflix#1424)
- Invariant: each of the four upstream PRs was OPEN (unmerged) on the port date. When Netflix merges any of them, the fork's version is correction-bearing (T4-4 test refactor, T4-6 three defect fixes + Doxygen doc expansion), not line-identical. Resolution on upstream merge is always "keep fork version", because the fork's version already satisfies the PR's intent and additionally fixes the defects.
  - The Netflix#1406 conflict will land in test_feature_collector.c: the fork uses a load_three_test_models() helper vs upstream's inline per-model VmafModel *m0, *m1, *m2; duplication.
  - The Netflix#1424 conflict will land in core/src/model.c and core/test/test_model.c: the fork uses an else if guard + idx + 1 < CNT + const-qualified test types.
  - Netflix#1382 and Netflix#1451 are line-identical in substance; the merge should be clean aside from trailing-comma style drift.
- Re-test:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_feature_collector test/test_model
build/test/test_feature_collector
build/test/test_model
# Expected: 6/6 pass in test_feature_collector (mount/unmount
# 3-model sequences); 39/39 pass in test_model (includes
# test_version_next full-iteration invariant).
0032 — Thread-local locale handling for numeric I/O (port of Netflix/vmaf#1430)
- Workstream PRs: port/netflix-1430-thread-locale (T4-3 from the "Batch-A follow-up" sweep, 2026-04-20).
- Touches:
  - core/src/thread_locale.h / core/src/thread_locale.c (new, upstream-authored);
  - core/src/meson.build (two cdata.set('HAVE_USELOCALE' / 'HAVE_XLOCALE_H') probes + src_dir + 'thread_locale.c' in libvmaf_sources);
  - core/src/output.c (four writers gain a push_c() + pop() bracket, preserving the fork's ferror(outfile) ? -EIO : 0 return contract from ADR-0119);
  - core/src/svm.cpp (drop the <locale.h> include; replace the setlocale / strdup / setlocale bracket with vmaf_thread_locale_push_c / pop; add buffer.imbue(std::locale::classic()) to both SVM parser ctors in the fork's K&R + 4-space style);
  - core/src/read_json_model.c (bracket model_parse with push/pop);
  - core/test/meson.build (new test_locale_handling target + test registration);
  - core/test/test_locale_handling.c (new, upstream-authored, with three fork corrections for the score_format parameter).
- Invariant: the fork's output writers return ferror(outfile) ? -EIO : 0; this must survive any upstream refactor of the writer bodies. Every push_c() call MUST be paired with a pop() on every return path (writer bodies have a single tail return, so the pattern is locally push → body → pop → return ferror-check). Dropping pop() leaks a locale_t on POSIX and leaves the thread locked to "C" on Windows.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_locale_handling
# Repro the user-visible failure without the fix:
LC_ALL=de_DE.UTF-8 build/tools/vmaf --reference ref.yuv \
--distorted dis.yuv --width 1920 --height 1080 \
--pixel_format 420 --bitdepth 8 --output result.json \
--json
# Assert output contains period decimals, not comma.
python -c "import json; d=json.load(open('result.json')); \
assert all('.' in repr(v) for v in \
[f['metrics']['vmaf'] for f in d['frames']])"
- On upstream sync: when Netflix merges PR #1430, the (cherry picked from commit 054a97ed…) trailer in git log port/netflix-1430-thread-locale lets the next /sync-upstream skip this commit. If the upstream diff drifts, redo the three fork corrections listed in ADR-0137 §Decision.
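The push/pop bracket relies on POSIX 2008 per-thread locales (newlocale / uselocale / freelocale). The sketch below is a simplified, hypothetical version of that pattern, not the upstream thread_locale.c: ThreadLocaleState, thread_locale_push_c, thread_locale_pop and write_score are illustrative names, newlocale failure is mapped to -ENOMEM for brevity, and macOS would additionally need <xlocale.h> (the reason for the HAVE_XLOCALE_H probe). It shows the writer shape the invariant requires: push, body, pop, then the ferror check.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical per-call state for a thread-local "C" locale bracket. */
typedef struct {
    locale_t c_loc;
    locale_t prev;
} ThreadLocaleState;

static int thread_locale_push_c(ThreadLocaleState *st)
{
    st->c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (st->c_loc == (locale_t)0)
        return -ENOMEM;
    st->prev = uselocale(st->c_loc); /* may be LC_GLOBAL_LOCALE */
    return 0;
}

static void thread_locale_pop(ThreadLocaleState *st)
{
    uselocale(st->prev);   /* restore first ... */
    freelocale(st->c_loc); /* ... then free: never free the active locale */
}

/* Writer with a single tail return: push -> body -> pop -> error check.
 * Only the calling thread switches locale; other threads are unaffected. */
static int write_score(FILE *f, double score)
{
    ThreadLocaleState st;
    const int err = thread_locale_push_c(&st);
    if (err)
        return err;
    fprintf(f, "%f\n", score); /* always uses '.' as decimal separator */
    thread_locale_pop(&st);
    return ferror(f) ? -EIO : 0;
}
```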
0033 — SSIM / MS-SSIM SIMD bit-exact to scalar via per-lane scalar double
- Workstream PRs: feat/ms-ssim-decimate-neon (this PR; companion to the ADR-0138 convolve fast path).
- Touches: core/src/feature/x86/ssim_avx2.c and core/src/feature/x86/ssim_avx512.c (ssim_accumulate_* rewritten; ssim_precompute_* and ssim_variance_* unchanged, since they were already bit-exact), plus the new bit-exact convolve_avx2.c / convolve_avx512.c and the upstream h-pass OOB fix at iqa/convolve.c:159.
- Invariants (see ADR-0139 §Decision):
  - Convolve taps: single-rounded float*float → widen → double add, NO FMA. This mirrors the scalar sum += img[i]*k[j] in iqa/convolve.c.
  - SSIM accumulate: the 2.0 * literal in the scalar code (2.0 * ref_mu[i] * cmp_mu[i] + C1 and 2.0 * srsc + C2) is a C double literal. Both SIMD accumulators do the 2.0 * numerator, the division and the final l*c*s product per lane in scalar double to match the scalar type promotions byte-for-byte.
  - H-pass outer-loop bound: y < dst_h + vc - kh_even (not y < dst_h + vc). The - kh_even is load-bearing: the last cache row on even-tap kernels (e.g. box-8) is never read by the v-pass, but was previously written out of bounds when the image height equals the kernel height.
  Fork-local SSIM SIMD is NOT upstream. If upstream ever adds its own SSIM AVX2/AVX-512, keep the fork's version on conflict: it's the only variant verified bit-exact to scalar at --precision max.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_iqa_convolve test_ms_ssim_decimate
# Bit-exactness check across dispatch backends:
FIX=python/test/resource/yuv/checkerboard_1920_1080_10_3_0_0.yuv
DIS=python/test/resource/yuv/checkerboard_1920_1080_10_3_1_0.yuv
for m in 255 16 0; do
build/tools/vmaf --cpumask $m --reference $FIX --distorted $DIS \
--width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
--feature float_ssim --feature float_ms_ssim \
--output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_16.xml) # expect empty
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_0.xml) # expect empty
- On upstream sync: the AVX2/AVX-512 SSIM surface is entirely fork-local (upstream has VIF/ADM/motion/CAMBI SIMD but no SSIM). If upstream ever introduces SSIM SIMD, their kernel bodies will almost certainly compute l*c*s in vector float for throughput; do not adopt that. The fork's per-lane scalar-double reduction is required for the bit-exactness claim. The same applies to convolve_avx2/512, which are fork-only; dispatch sits in ssim_tools.c via _iqa_convolve_set_dispatch.
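The widen-then-add rule is easiest to see on a single tap. In the sketch below (hypothetical helper names), both functions compute acc + img * k, but only the first rounds the product to float before widening, as the scalar reference does. With img = k = 1 + 2^-12, the float product rounds to 1 + 2^-11, while the widen-first variant keeps the exact 1 + 2^-11 + 2^-24, so the two disagree in the last bits. The same reasoning is why the SSIM accumulators keep the 2.0 * ... arithmetic per lane in scalar double rather than in vector float.

```c
/* One convolution tap under the widen-then-add rule: round the product
 * once in float, widen it to double, then accumulate in double. */
static double tap_widen_then_add(double acc, float img, float k)
{
    const float prod = img * k; /* single float rounding */
    return acc + (double)prod;  /* widen, then double add */
}

/* A plausible-looking shortcut that widens the operands first and
 * multiplies in double: the product is never rounded to float, so the
 * result can differ from the scalar reference. */
static double tap_widen_first(double acc, float img, float k)
{
    return acc + (double)img * (double)k;
}
```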
0034 — SIMD DX framework + NEON SSIM/convolve bit-exact port
- Workstream PRs: feat/simd-dx-framework (this PR, PR #A); it ships two demos of the framework, which PR #B will then consume (ssimulacra2, motion_v2, vif_statistic, ...).
- Touches:
  - core/src/feature/simd_dx.h (new header)
  - core/src/feature/arm64/convolve_neon.c + convolve_neon.h (new NEON port)
  - core/src/feature/arm64/ssim_neon.c (ssim_accumulate_neon rewritten for ADR-0139 bit-exactness; precompute + variance unchanged)
  - core/src/feature/float_ssim.c + core/src/feature/float_ms_ssim.c (wire iqa_convolve_neon into the aarch64 dispatch setters)
  - core/src/meson.build (arm64_sources += convolve_neon.c)
  - core/test/meson.build (test_iqa_convolve arch filter extended to arm64/aarch64)
  - core/test/test_iqa_convolve.c (NEON variant check + aarch64 CPU flag detection)
  - core/test/dnn/meson.build (test_cli.sh gated on not meson.is_cross_build(), because bash invokes $VMAF_BIN directly, so meson's exe_wrapper isn't applied)
  - new build-aux/aarch64-linux-gnu.ini meson cross-file
  - .claude/skills/add-simd-path/SKILL.md (upgraded kernel-spec flags)
- Invariants (see ADR-0140 §Decision):
  - simd_dx.h is fork-local. Keep the fork's version on upstream conflict. Macro names are ISA-suffixed (_AVX2_4L, _AVX512_8L, _NEON_4L); do not collapse them into a cross-ISA abstraction, since the fork's SIMD policy (user-memory feedback_simd_dx_scope.md) rules out Highway / simde / xsimd.
  - The ADR-0138 widen-then-add rule (single-rounded float * float → widen → double add, NO FMA) applies to NEON exactly as to AVX2 / AVX-512. The NEON form uses paired float64x2_t accumulators (lo / hi) because NEON has no float64x4_t.
  - The ADR-0139 per-lane scalar-double reduction rule applies to ssim_accumulate_neon exactly as to the AVX2 / AVX-512 variants. The NEON implementation uses SIMD_ALIGNED_F32_BUF_NEON (_Alignas(16) float name[4]) + a 4-iteration scalar loop.
- Re-test (requires aarch64-linux-gnu-gcc + qemu-user-static + an aarch64 sysroot at /usr/aarch64-linux-gnu):
cd libvmaf
meson setup ../build-aarch64 \
--cross-file ../build-aux/aarch64-linux-gnu.ini \
-Denable_cuda=false -Denable_sycl=false -Denable_dnn=disabled
cd ..
ninja -C build-aarch64
meson test -C build-aarch64 # expect 31/31 OK
# Bit-exactness check scalar vs NEON under QEMU:
REF=python/test/resource/yuv/src01_hrc00_576x324.yuv
DIS=python/test/resource/yuv/src01_hrc01_576x324.yuv
for m in 255 0; do
LD_LIBRARY_PATH=$PWD/build-aarch64/src qemu-aarch64-static \
-L /usr/aarch64-linux-gnu build-aarch64/tools/vmaf \
--cpumask $m --reference $REF --distorted $DIS \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--feature float_ssim --feature float_ms_ssim \
--output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_0.xml) # expect empty
- On upstream sync: upstream has no NEON SSIM and no NEON convolve for IQA. If upstream ever adds one, keep the fork's version on conflict: the fork's NEON path is the only variant verified bit-exact to scalar at --precision max. The build-aux/aarch64-linux-gnu.ini cross-file has no upstream equivalent. The /add-simd-path skill is fork-only; upstream doesn't ship .claude/skills/.
0036 — Port Netflix generalised AVX convolve + ADR-0141 cleanup
- Workstream PRs: port/upstream-f3a628b4-generalized-avx-convolve (this PR).
- Upstream commit: f3a628b4 "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21).
- Touches:
  - convolution.h: upstream-tracking; adds #define MAX_FWIDTH_AVX_CONV 17.
  - convolution_avx.c: upstream-tracking (2,500 LoC deletion) plus a fork-delta cleanup per ADR-0141. The four scanline helpers convolution_f32_avx_s_1d_* changed from external linkage to static (no other TU uses them after the specialised-path removal); stride parameters were widened from int to ptrdiff_t in the helpers, with (ptrdiff_t) casts at the public-function multiplication sites; #include <stddef.h> was added for the type.
  - core/src/feature/vif_tools.c: upstream-tracking; the three AVX dispatch sites drop the fwidth == 17 || ... == 3 whitelist in favour of fwidth <= MAX_FWIDTH_AVX_CONV.
  - python/test/quality_runner_test.py, python/test/vmafexec_test.py: upstream-authored loosening of two full-VMAF-score assertions from places=2 (±0.005) to places=1 (±0.05). Adopted per the ADR-0142 Netflix-authority precedent (project rule #1 addresses fork drift, not upstream-authored test updates the fork must track).
- Invariants (see ADR-0143 §Decision):
  - Static linkage on scanline helpers: upstream leaves the four convolution_f32_avx_s_1d_*_scanline helpers with external linkage out of habit; the fork narrows them to static. On upstream sync, if upstream ever externs them from another TU, that's a flag to re-audit; keep the fork's static unless the reference is real.
  - ptrdiff_t strides inside helpers: the public convolution_f32_avx_*_s wrappers keep int strides (matching the upstream interface + convolution.h declarations). Helpers take ptrdiff_t to silence bugprone-implicit-widening-of-multiplication-result. If upstream changes the public interface to ptrdiff_t, drop the fork's wrapper-level casts.
  - MAX_FWIDTH_AVX_CONV = 17: the ceiling is upstream's; if upstream bumps it, the fork must rebuild + re-run the VIF golden test pair.
- Re-test:
```sh
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build   # expect 32/32 OK
clang-tidy -p build core/src/feature/common/convolution_avx.c
# Zero warnings expected on the touched file.
```
The Netflix CPU golden CI leg exercises the two loosened assertions; confirmed locally under meson test.
- On upstream sync: upstream is the source of truth for convolution_avx.c, convolution.h, the vif_tools.c dispatch, and the two Python golden tolerances. On a rebase, prefer upstream for those files, with these exceptions:
  - Keep the fork's static on the four scanline helpers.
  - Keep the fork's ptrdiff_t helper signatures and multiplication-site casts. If upstream adopts them too, converge.
  - Keep the fork's #include <stddef.h>.
  If upstream re-introduces a specialised fast path for common widths, evaluate it on a per-fwidth perf profile; the fork's /profile-hotpath skill covers this.
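The stride and dispatch invariants above can be illustrated with a scalar stand-in for the AVX code. This is a sketch under stated assumptions. hconv_scanline, hconv_rows and use_avx_conv are hypothetical names, and the plain float loop replaces the AVX intrinsics. The sketch shows three things: a static helper that takes ptrdiff_t strides, an int-stride public wrapper that widens once at the call, and the fwidth <= MAX_FWIDTH_AVX_CONV predicate that replaced the width whitelist.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FWIDTH_AVX_CONV 17  /* value added to convolution.h per the note above */

/* Hypothetical file-local helper: static linkage, ptrdiff_t strides, so
 * row * stride is computed in ptrdiff_t instead of overflowing int. */
static void hconv_scanline(const float *f, int fwidth, const float *src,
                           float *dst, ptrdiff_t src_stride,
                           ptrdiff_t dst_stride, int row, int w)
{
    const float *in = src + (ptrdiff_t)row * src_stride;
    float *out = dst + (ptrdiff_t)row * dst_stride;
    const int radius = fwidth / 2;
    for (int x = radius; x < w - radius; x++) {   /* interior pixels only */
        float acc = 0.0f;
        for (int k = 0; k < fwidth; k++)
            acc += f[k] * in[x - radius + k];
        out[x] = acc;
    }
}

/* Hypothetical public wrapper: keeps int strides like the upstream
 * interface and widens them once when calling the helper. */
void hconv_rows(const float *f, int fwidth, const float *src, float *dst,
                int src_stride, int dst_stride, int w, int h)
{
    assert(fwidth % 2 == 1 && fwidth <= w);
    for (int y = 0; y < h; y++)
        hconv_scanline(f, fwidth, src, dst, (ptrdiff_t)src_stride,
                       (ptrdiff_t)dst_stride, y, w);
}

/* Dispatch predicate: any odd width up to the ceiling takes the AVX path. */
int use_avx_conv(int fwidth)
{
    return fwidth <= MAX_FWIDTH_AVX_CONV;
}
```

Keeping the wrapper signature in int means the convolution.h declarations stay identical to upstream. Only the file-local helpers differ.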
0038 — motion_v2 NEON SIMD (fork-local)
- Workstream PR: port/motion-bundle-neon-and-updates (this PR).
- Upstream: none. aarch64 NEON for motion_v2 is fork-local. Upstream has scalar, AVX2, and AVX-512 variants; this PR adds the missing fourth (NEON) path. Scalar is the bit-exactness ground truth.
- Touches (fork-local):
  - motion_v2_neon.c: new TU, ~300 LoC. It applies 4-wide int32 SIMD over the 5-tap Gaussian pipeline. Five static inline helpers keep every function under the ADR-0141 60-line budget.
  - motion_v2_neon.h: new header declaring the two public entry points.
  - integer_motion_v2.c: dispatch update. Adds an #if ARCH_AARCH64 block in init that selects the NEON variant when VMAF_ARM_CPU_FLAG_NEON is present, mirroring the existing x86 dispatch blocks.
  - core/src/meson.build: adds arm64/motion_v2_neon.c to the arm64_sources list.
- Invariants (see ADR-0145 §Decision):
  - Arithmetic right shift throughout. The fork's AVX2 path originally used _mm256_srlv_epi64 (logical), which can diverge from scalar on negative-diff pixels; this has since been fixed (see Follow-up T7-32). The NEON port uses vshrq_n_s64(v, 16) for the known Phase-2 shift. For the variable Phase-1 shift it uses vshlq_s64 with a negative per-lane count of -(int64_t)bpc. Both forms are arithmetic, matching scalar C >> on signed integers. On rebase: keep the arithmetic forms. Do NOT adopt vshrq_n_u64 or a logical emulation, even if it runs faster.
  - 4-lane stride + mirror tails. The SIMD stride is 4; scalar tails cover the remainder. The Phase-2 helper x_conv_row_sad_neon hands 4 lanes to x_conv_block4_neon and drops to scalar at both the left and right edges (j < 2 and j + 6 > w). On rebase: preserve the 4-lane stride and the two-sided scalar tail.
  - Signature parity with AVX2. Both pipeline entry points match the signature of the AVX2 and AVX-512 variants: (const uint8_t *prev, ptrdiff_t, const uint8_t *cur, ptrdiff_t, int32_t *y_row, unsigned w, unsigned h, unsigned bpc). On rebase: if upstream changes the signature, mirror the change here AND in the x86 variants in lockstep.
- Re-test:
```sh
meson setup build-aarch64 libvmaf \
  --cross-file build-aux/aarch64-linux-gnu.ini \
  -Denable_cuda=false -Denable_sycl=false
ninja -C build-aarch64
meson test -C build-aarch64 --no-rebuild   # expect 31/31 OK
clang-tidy -p build-aarch64 \
  core/src/feature/arm64/motion_v2_neon.c
# Zero warnings expected on the touched file.
# NEON-vs-scalar bit-exact diff under QEMU:
YUV=python/test/resource/yuv
for mask in 0 255; do
  LD_LIBRARY_PATH=build-aarch64/src \
    qemu-aarch64-static -L /usr/aarch64-linux-gnu \
    build-aarch64/tools/vmaf \
    -r $YUV/src01_hrc00_576x324.yuv \
    -d $YUV/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 -n --feature motion_v2 \
    --cpumask $mask -o /tmp/mv2_$mask.xml --precision max
done
diff <(grep -v 'fps=' /tmp/mv2_0.xml) \
     <(grep -v 'fps=' /tmp/mv2_255.xml)   # expect empty
```
- On upstream sync: upstream has no NEON motion_v2 and has not signalled plans to add one. If they ever do, diff their NEON against the fork's. On logical vs arithmetic shift, keep the fork's arithmetic form, which matches scalar. On the function decomposition (the five helpers), adopt upstream's if it is smaller; the fork's layout is ADR-0141-driven, not a semantic contract.
- Follow-up T7-32 (fixed 2026-05-09): the _mm256_srlv_epi64 (logical right shift) in motion_score_pipeline_16_avx2 was replaced with srav_epi64_imm. This is an AVX2-safe arithmetic-right-shift emulation: a logical shift OR'd with a sign-fill mask built via srai_epi32 + slli_epi64. Two bugs were closed in the same PR:
  - AVX2 logical-vs-arithmetic shift: _mm256_srlv_epi64 was replaced by srav_epi64_imm in core/src/feature/x86/motion_v2_avx2.c. The emulation is bit-exact with scalar C >> bpc on signed int64_t.
  - Test scalar reference mirror: mirror_idx in core/test/test_motion_v2_simd.c used 2*size - idx - 1 instead of 2*size - idx - 2, diverging from integer_motion_v2.c::mirror(). It is now fixed to -2. All four adversarial fixtures (neg-diff bpc10/12, mixed-diff bpc10/12) now pass, and meson test -C build reports 50/50 OK.
  On rebase: keep srav_epi64_imm and do not revert to _mm256_srlv_epi64. The rebase-time invariant is now that the AVX2 path uses an arithmetic shift, matching NEON and scalar.
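Both T7-32 fixes can be checked with a small scalar model. This is a sketch, not the fork's code: sra64_emulated and mirror_ref are hypothetical names. AVX2 has no 64-bit arithmetic right shift (_mm256_srai_epi64 is AVX-512 only), which is why srav_epi64_imm combines a logical shift with a sign-fill mask. The model builds that mask directly from the sign bit. The lower branch of mirror_ref (-idx) is an assumption that it follows the same edge-not-repeated convention as the corrected 2*size - idx - 2 upper branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of srav_epi64_imm for 0 < n < 64:
 * an arithmetic right shift built from a logical shift plus sign fill. */
static int64_t sra64_emulated(int64_t v, unsigned n)
{
    assert(n > 0 && n < 64);
    const uint64_t u = (uint64_t)v;
    const uint64_t logical = u >> n;               /* top n bits become 0 */
    const uint64_t sign = (uint64_t)0 - (u >> 63); /* all ones if v < 0 */
    const uint64_t fill = sign << (64 - n);        /* ones in the top n bits */
    return (int64_t)(logical | fill);
}

/* Hypothetical reflect index mirror that does not repeat the edge
 * sample, matching the corrected upper branch 2*size - idx - 2. */
static int mirror_ref(int idx, int size)
{
    if (idx < 0)
        return -idx;                  /* assumed lower branch */
    if (idx >= size)
        return 2 * size - idx - 2;
    return idx;
}
```

For example, a logical shift of -17 by 2 yields a large positive value, while the emulated arithmetic shift yields -5, which is what scalar >> gives. The old 2*size - idx - 1 mirror returns 4 for idx = 5, size = 5, repeating the edge sample. The corrected form returns 3.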
0039 — readability-function-size NOLINT sweep (ADR-0146)
- ADR: ADR-0146
- Touches: core/src/dict.c, core/src/picture.c, core/src/picture_pool.c, core/src/predict.c, core/src/libvmaf.c, core/src/output.c, core/src/read_json_model.c, core/src/feature/feature_extractor.c, core/src/feature/feature_collector.c, core/src/feature/iqa/convolve.c, core/src/feature/iqa/ssim_tools.c, core/src/feature/x86/vif_statistic_avx2.c
- Invariant: every readability-function-size NOLINT suppression has been replaced by a set of small static helpers (static inline for the SIMD / IQA files). The helper names are stable interfaces that the surrounding code depends on, for example:
  - iqa_convolve_1d_separable, iqa_convolve_2d
  - ssim_compute_stats, ssim_workspace_alloc/_free
  - vif_stat_simd8_compute/_reduce, struct vif_simd8_lane
  - read_pictures_extractor_loop, read_pictures_post_extractor, read_pictures_validate_and_prep, read_pictures_update_prev_ref
  Upstream Netflix has no equivalent helpers, so rebases touching any of these files will conflict against the fork's split shape.
- On upstream sync:
  - If upstream lands a different decomposition of _iqa_convolve or _iqa_ssim, prefer upstream's shape only if it keeps the ADR-0138 / ADR-0139 bit-exactness invariants: single-rounded float mul → widen to double → double add, and per-lane scalar reduction in double through an aligned float temp buffer. Otherwise keep the fork's split and re-document the divergence here.
  - The fork renamed _calc_scale → iqa_calc_scale to clear the bugprone-reserved-identifier check. If upstream modifies _calc_scale, keep the fork's name and port the behavioural change.
  - model_collection_parse_loop writes directly to cfg_name rather than through c->name. If upstream ever rewrites model_collection_parse, preserve the direct write; it is what lets the parameter stay non-const without a NOLINT.
- Re-test on rebase (x86, any libsvm-less host):
```sh
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 -o /tmp/vmaf_$mask.xml
done
diff <(grep -v fyi /tmp/vmaf_0.xml) <(grep -v fyi /tmp/vmaf_255.xml)
# expect exit 0 (Netflix golden-pair VMAF bit-identical, scalar vs SIMD)
```
Also run clang-tidy -p build on every file in Touches; expect zero warnings.
- Follow-up T7-6: decide whether to rename the _iqa_* API surface (convolve / ssim / decimate / img_filter / filter_pixel / get_pixel) across all callers to clear the remaining bugprone-reserved-identifier suppressions in ssim.c, ms_ssim.c, and float_ms_ssim.c. Out of scope here.
0040 — Thread-pool job recycling + inline data buffer (ADR-0147)
- ADR: ADR-0147
- Touches: core/src/thread_pool.c
- Invariants:
  - VmafThreadPoolJob carries a fixed-size char inline_data[64] buffer. Payloads ≤ 64 bytes go through memcpy(job->inline_data, data, data_sz) + job->data = job->inline_data; payloads > 64 bytes take the legacy malloc path. The cleanup path MUST distinguish the two via job->data != job->inline_data. A naive free(job->data) would pass the inline buffer to free(), which is undefined behaviour. This is enforced in vmaf_thread_pool_job_clear_data.
  - The free_jobs list is protected by the existing queue.lock. Enqueue pops from it before mallocing, and the runner recycles each job onto it after running it. vmaf_thread_pool_destroy walks the list after vmaf_thread_pool_wait returns; all workers have exited by then, so no lock is needed. Any reorder that frees the queue lock before the free_jobs walk leaks on shutdown.
  - The fork's void (*func)(void *data, void **thread_data) signature and per-worker VmafThreadPoolWorker are fork-local; upstream Netflix #1464 has func(void *data). Keep the fork's signature on any rebase, because callers (src/libvmaf.c:threaded_enqueue_one etc.) depend on the two-argument form.
- On upstream sync: Netflix PR #1464 is CLOSED (not merged) and bundles twelve unrelated optimizations. Only the thread-pool portion is ported here. If upstream ever reopens and merges #1464 (or a successor), cherry-pick only the pool mechanics. Reject the following:
  - the payload-signature changes;
  - the ADM / VIF / predict.c pieces, which conflict with ADR-0138 / 0139 / 0142 bit-exactness and with the T7-5 predict.c refactor;
  - the feature-collector capacity bump. The fork already caps it at 8 for a reason; see src/feature/feature_collector.c.
- Re-test on rebase (x86, any libsvm-less host):
```sh
ninja -C build && meson test -C build
for threads in 1 4; do
  for mask in 0 255; do
    VMAF_CPU_MASK=$mask ./build/tools/vmaf \
      --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
      --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
      --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
      -m version=vmaf_v0.6.1 --threads $threads -o /tmp/vmaf_${threads}_${mask}.xml
  done
done
# Expect bit-identical scores. Attribute order may differ between
# --threads 1 and --threads 4 because the feature collector emits in
# insertion order; the numeric values match.
diff <(grep -v fyi /tmp/vmaf_4_0.xml) <(grep -v fyi /tmp/vmaf_4_255.xml)
# expect exit 0 (scalar vs SIMD, threaded)
```
Also run clang-tidy -p build core/src/thread_pool.c — expect zero warnings. Re-run the 500 000-job micro-benchmark from ADR-0147 §Decision if performance is under investigation.
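The inline-buffer and free-list invariants can be shown in a stripped-down model. This is a minimal sketch, not thread_pool.c. Only the 64-byte inline_data field and the data != inline_data test come from the note. Job, job_set_data, job_clear_data, job_get, job_put and free_list_destroy are hypothetical names, and locking is omitted; the real pool holds queue.lock around the free-list operations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define JOB_INLINE_DATA_SZ 64

/* Hypothetical stand-in for VmafThreadPoolJob. */
typedef struct Job {
    void *data;
    char inline_data[JOB_INLINE_DATA_SZ];
    struct Job *next;                 /* free-list / queue link */
} Job;

/* Payloads that fit are copied inline; larger ones take the malloc path. */
static int job_set_data(Job *job, const void *data, size_t data_sz)
{
    if (data_sz <= sizeof(job->inline_data)) {
        job->data = job->inline_data;
    } else {
        job->data = malloc(data_sz);
        if (!job->data)
            return -1;
    }
    memcpy(job->data, data, data_sz);
    return 0;
}

/* Free only heap payloads; free(job->inline_data) would be undefined. */
static void job_clear_data(Job *job)
{
    if (job->data != job->inline_data)
        free(job->data);              /* free(NULL) is a no-op */
    job->data = NULL;
}

/* Pop a recycled job, or allocate a new one if the list is empty. */
static Job *job_get(Job **free_jobs)
{
    Job *job = *free_jobs;
    if (job)
        *free_jobs = job->next;
    else
        job = malloc(sizeof(*job));
    if (job) {
        job->data = NULL;
        job->next = NULL;
    }
    return job;
}

/* After a job runs: drop its payload and push it onto the free list. */
static void job_put(Job **free_jobs, Job *job)
{
    job_clear_data(job);
    job->next = *free_jobs;
    *free_jobs = job;
}

/* Shutdown walk, done only after every worker has exited. */
static void free_list_destroy(Job **free_jobs)
{
    while (*free_jobs) {
        Job *job = *free_jobs;
        *free_jobs = job->next;
        free(job);
    }
}
```

The pointer comparison in job_clear_data is the single place that has to know about the two storage modes. Keeping it there lets the enqueue and recycle paths stay unaware of payload size.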
0041 — IQA reserved-identifier rename + cleanup (ADR-0148)
- ADR: ADR-0148
- Touches: 21 files across core/src/feature/ (iqa/{convolve,decimate,ssim_tools}.{c,h}, iqa/ssim_simd.h, ssim.c, integer_ssim.c, ms_ssim.c, ms_ssim_decimate.h, float_ssim.c, float_ms_ssim.c, x86/convolve_avx2.{c,h}, x86/convolve_avx512.{c,h}, arm64/convolve_neon.{c,h}, AGENTS.md) plus core/test/test_iqa_convolve.c.
- Invariants:
  - Every _iqa_* / _kernel / _ssim_int / _map_reduce / _map / _reduce / _context / _ms_ssim_* / _ssim_* / _alloc_buffers / _free_buffers symbol is renamed to its non-reserved spelling. So are the four underscore-prefixed header guards (_CONVOLVE_H_, _DECIMATE_H_, _SSIM_TOOLS_H_, __VMAF_MS_SSIM_DECIMATE_H__). The fork's IQA surface no longer uses C's reserved-identifier namespace.
  - The clang-analyzer-security.ArrayBound NOLINT bracket in ssim_accumulate_row and ssim_reduce_row_range (integer_ssim.c) is load-bearing. The inner kernel-loop k_min/k_max clamping is provably correct (k_min = max(0, hkernel_offs - x), k_max = min(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1))), but the analyzer cannot follow it across helper boundaries. Do not collapse the bracket.
  - The clang-analyzer-unix.Malloc NOLINT bracket in test_iqa_convolve.c (check_simd_variant, check_case) is intentional. The test exits the process on the failure path, and small allocations leak by design at test end. Do not refactor to free-on-exit.
  - The cross-TU NOLINT pattern on compute_ssim (ssim.c) and compute_ms_ssim (ms_ssim.c) exists because clang-tidy's misc-use-internal-linkage runs per TU and cannot see the header bridge to float_ssim.c / float_ms_ssim.c. Keep the inline justification comment.
- On upstream sync:
  - The IQA library that Netflix vendors (tjdistler/iqa) has been effectively abandoned; its last meaningful commit predates 2020. Future rebases will conflict on every renamed symbol. On each conflict, drop the underscore prefix and mirror the fork's iqa_* naming.
  - If Netflix/vmaf ever reincorporates the IQA naming wholesale, prefer the fork's spellings. This PR is a one-shot mechanical rename with no semantic content.
- Re-test on rebase:
```sh
ninja -C build && meson test -C build
for mask in 0 255; do
  VMAF_CPU_MASK=$mask ./build/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    -m version=vmaf_v0.6.1 \
    --feature float_ssim --feature float_ms_ssim \
    -o /tmp/iqa_$mask.xml
done
diff <(grep -v fyi /tmp/iqa_0.xml) <(grep -v fyi /tmp/iqa_255.xml)
# expect exit 0 (bit-identical scalar vs SIMD on float_ssim / float_ms_ssim)
```
Also run clang-tidy -p build on every touched file (excluding arm64/); expect zero warnings.
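The k_min/k_max clamps quoted above can be checked in isolation. This is a sketch under stated assumptions: a centred odd kernel with hkernel_offs == hkernel_sz / 2, where tap k reads input sample x - hkernel_offs + k and out-of-image taps are skipped. clamped_row_sum is a hypothetical name, not the integer_ssim.c helper. The assert marks the bound that the analyzer cannot prove across helper boundaries.

```c
#include <assert.h>

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Hypothetical 1-D kernel application at output x using the quoted
 * clamps. Assumes hkernel_sz is odd and hkernel_offs == hkernel_sz / 2. */
static float clamped_row_sum(const float *row, int w, int x,
                             const float *kernel, int hkernel_sz)
{
    const int hkernel_offs = hkernel_sz / 2;
    const int k_min = imax(0, hkernel_offs - x);
    const int k_max = imin(hkernel_sz,
                           hkernel_sz - (x + hkernel_offs - w + 1));
    float acc = 0.0f;
    for (int k = k_min; k < k_max; k++) {
        const int idx = x - hkernel_offs + k;
        assert(idx >= 0 && idx < w);  /* holds for every x in [0, w) */
        acc += kernel[k] * row[idx];
    }
    return acc;
}
```

Under the centred-kernel assumption, the largest index touched is x - hkernel_offs + k_max - 1. This simplifies to w - 1 whenever the right-hand clamp is active, and to x + hkernel_offs otherwise. Either way the read stays inside the row, even when the kernel is wider than the image.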
0042 — Port Netflix #1376 — FIFO-hang fix via Semaphore (ADR-0149)
- ADR: ADR-0149
- Upstream commit: Netflix PR #1376, head 1c06ca4f1bb5da38b54db075a27c35ba8ea9d7b7 (OPEN upstream as of 2026-04-24).
- Touches:
  - python/vmaf/core/executor.py: base Executor class + ExternalVmafExecutor-style subclass. Changes:
    - delete the _wait_for_workfiles / _wait_for_procfiles polling loops;
    - rewrite _open_{work,proc}files_in_fifo_mode around multiprocessing.Semaphore(0);
    - add an open_sem=None kwarg to every _open_{ref,dis}_{work,proc}file and to the _open_workfile staticmethod;
    - drop the unused from time import sleep.
  - python/vmaf/core/raw_extractor.py: AssetExtractor + DisYUVRawVideoExtractor. Changes:
    - add open_sem=None to the _open_{ref,dis}_workfile overrides, releasing on entry because these are no-ops;
    - delete the _wait_for_workfiles overrides;
    - drop the unused from time import sleep.
- Fork carve-outs (load-bearing on rebase):
  - python/vmaf/__init__.py: __version__ stays "3.0.0". Do NOT port upstream's bump to "4.0.0"; the fork tracks its own versioning (v3.x.y-lusoris.N) per ADR-0025.
  - from time import sleep is dropped from both files. Upstream leaves the import in place (unused after their patch). The fork removes it because the ADR-0141 touched-file rule requires ruff F401 to be clean.
  - Upstream typo preserved: the subclass warning message contains "to be created to be created". Comments note the typo inline. Do not silently fix it on rebase; it is upstream-authored, and project policy is a verbatim port.
- On upstream sync: upstream PR #1376 is still OPEN. When it merges, re-diff against the merged form. The touched hunks should be conflict-free because the fork now carries the same shape. Re-check whether upstream fixed the "to be created to be created" typo; if so, adopt the fix as a simple string update.
- Re-test:
```sh
python3 -m py_compile python/vmaf/core/executor.py \
  python/vmaf/core/raw_extractor.py
ruff check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
black --check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
# All silent.
# There is no FIFO-mode unit test in the tree. The end-to-end harness
# (needs libsvm + ffmpeg + fixtures) runs via
#   make test-netflix-golden
# which doesn't exercise the fifo_mode path but does verify that the
# refactor didn't break executor.py imports.
```
0043 — Port Netflix #1472 — CUDA on Windows MSYS2/MinGW (ADR-0150)
- ADR: ADR-0150
- Upstream commits: Netflix PR #1472, 15745cdf (portability) + b7b65e64 (meson plumbing). Both OPEN upstream as of 2026-04-24.
- Touches:
  - core/src/cuda/common.h: drop the <pthread.h> include; rename the reserved header guard __VMAF_SRC_CUDA_COMMON_H__ → VMAF_SRC_CUDA_COMMON_INCLUDED.
  - core/src/cuda/cuda_helper.cuh: #ifdef DEVICE_CODE guard selecting <cuda.h> vs <ffnvcodec/dynlink_loader.h>.
  - core/src/picture.h: #ifdef DEVICE_CODE guard selecting <cuda.h> + a forward-declared VmafCudaState vs <ffnvcodec/*> + the full libvmaf_cuda.h; rename the reserved header guard.
  - core/src/feature/integer_adm.h: updated comment above the dwt_7_9_YCbCr_threshold table, noting the fork's positional-initializer shape vs upstream's #ifndef __CUDACC__ shape (see Fork carve-outs).
  - core/src/feature/cuda/integer_adm/{adm_cm,adm_csf,adm_csf_den,adm_decouple,adm_dwt2}.cu: #ifndef DEVICE_CODE guard around #include "feature_collector.h".
  - core/src/meson.build: Windows nvcc plumbing (+70 LoC under host_machine.system() == 'windows'):
    - vswhere-based cl.exe discovery;
    - MSVC + Windows SDK include-path injection;
    - CUDA version detection via nvcc --version;
    - nvcc_ccbin_flags + nvcc_host_includes threaded through every custom_target that invokes nvcc.
- Fork carve-outs (load-bearing on rebase):
  - integer_adm.h uses positional initializers, NOT upstream's #ifndef __CUDACC__ wrap. Both shapes resolve the MSVC/nvcc C++ designated-initializer issue. The positional form is C++-portable and keeps the table available to future .cu consumers. Keep the fork's form on rebase.
  - cuda_static_lib keeps dependencies : [pthread_dependency]. Upstream drops it, but the fork needs it because ring_buffer.c (built as part of cuda_static_lib) #includes <pthread.h> directly. On rebase: keep the fork's version.
  - meson.build gencode coverage block: the fork's ADR-0122 explicit cubin list (sm_75/80/86/89 + compute_80 PTX) sits after the new upstream nvcc-detect block. On rebase, re-assemble the same merged order: nvcc-detect first, then gencode coverage (both host-independent).
  - Header guards: the _INCLUDED spellings are fork-local (ADR-0148 precedent), while upstream keeps the reserved __VMAF_SRC_*_H__ spellings. On rebase, keep _INCLUDED.
- On upstream sync: PR #1472 is still OPEN. When it merges, re-diff the three conflict-resolved hunks against upstream's final form. Keep the fork's version on the four carve-outs above unless upstream meaningfully reshapes those regions.
- Re-test on rebase (Linux host with CUDA toolkit):
```sh
meson setup libvmaf core/build-cuda \
  -Denable_cuda=true -Denable_nvcc=true -Denable_sycl=false
ninja -C core/build-cuda && meson test -C core/build-cuda
# Expect 6 .fatbin files generated, the CLI linked, and 35/35 tests passing.
```
Windows validation is operator-driven: CI does not yet have a Windows + MSYS2 + MinGW + MSVC BuildTools + CUDA runner (tracked as T7-3 in .workingdir2/OPEN.md).
- Prerequisites note (Windows only): nv-codec-headers must be built from git master commit 876af32 or later. The release tag n13.0.19.0 is missing cuMemFreeHost, cuStreamCreateWithPriority, cuLaunchHostFunc, and other CudaFunctions members that libvmaf uses. This is a pre-existing issue, not in scope for this port.
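The positional-initializer carve-out is easier to see on a toy table. This is an illustration only: band_threshold and example_threshold are hypothetical, and the real type and values of dwt_7_9_YCbCr_threshold are not given in this note. A designated-initializer table is valid C99. Pre-C++20 C++ does not accept it, and C++20 accepts only in-order member designators without array designators. The positional form compiles as both C and C++, so an nvcc (C++) translation unit could include it unchanged.

```c
#include <assert.h>

/* Hypothetical threshold record; not the real integer_adm.h type. */
typedef struct {
    float a;
    float k;
    float f0;
} band_threshold;

/* C99-only shape, rejected when the header is compiled as C++:
 *   static const band_threshold tbl[3] = { [0] = { .a = 1.0f, ... }, ... };
 */

/* Positional shape: the same data, accepted by C and C++ compilers. */
static const band_threshold example_threshold[3] = {
    { 1.0f, 2.0f, 3.0f },   /* Y  */
    { 4.0f, 5.0f, 6.0f },   /* Cb */
    { 7.0f, 8.0f, 9.0f },   /* Cr */
};
```

The trade-off is that positional entries rely on member order. A comment above the table, like the one the fork added in integer_adm.h, is what keeps that order reviewable.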
0058 — libvmaf.pc Cflags leak fix (ADR-0200)
- ADR: ADR-0200; bug-fix follow-up to entry 0057.
- Upstream source: fork-local. Netflix has no Vulkan backend.
- Touches:
  - core/subprojects/packagefiles/volk/meson.build: drops -include volk_priv_remap.h from volk_dep.compile_args; keeps -DVK_NO_PROTOTYPES.
  - core/src/vulkan/meson.build: pulls volk_priv_remap_h_path from the volk subproject and appends ['-include', <path>] to vmaf_cflags_common (the private c_args: on libvmaf's library() call).
- Invariants (load-bearing):
  - -include MUST stay off volk_dep.compile_args; otherwise it leaks into the static libvmaf.pc Cflags. Test on rebase with meson setup ... -Ddefault_library=static -Denable_vulkan=enabled, then grep Cflags meson-private/libvmaf.pc. The output must NOT contain volk_priv_remap or any build-dir absolute path.
  - -include MUST be applied to libvmaf's own compile. Every libvmaf TU that calls volk's vk* API needs the rename macros active. The vmaf_cflags_common injection covers this for all libvmaf sub-libraries (libvmaf_feature, libvmaf_cpu, etc.).
  - The path comes from subproject('volk').get_variable(...), not from a hardcoded string, so it survives volk wrap version bumps.
- On upstream sync: zero upstream interaction.
- Re-test on rebase / volk wrap bump:
```sh
meson setup build-vk-static-test libvmaf -Denable_vulkan=enabled \
  -Denable_cuda=false -Denable_sycl=false -Ddefault_library=static
ninja -C build-vk-static-test src/libvmaf.a
grep Cflags build-vk-static-test/meson-private/libvmaf.pc
# Expected: no `volk_priv_remap` substring, no build-dir absolute path
```
0057 — Volk vk* priv-remap for static-archive builds (ADR-0198)
- ADR: ADR-0198; follow-up to ADR-0185.
- Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
- Touches:
  - core/subprojects/packagefiles/volk/meson.build: overlay applied on top of the upstream volk wrap. It adds a custom_target that runs gen_priv_remap.py to produce volk_priv_remap.h from the upstream volk.h. It also wires -include of the generated header into volk.c's c_args and volk_dep's compile_args.
  - core/subprojects/packagefiles/volk/gen_priv_remap.py: fork-added generator script (a regex against extern PFN_vkXxx vkXxx; declarations).
- Invariants (load-bearing):
  - The force-include must propagate to every libvmaf TU that pulls in volk_dep (verified via the meson dep graph). Removing it re-introduces the static-link multi-def cascade. Entry 0058 (ADR-0200) later moved this -include off volk_dep.compile_args and into vmaf_cflags_common to stop it leaking into libvmaf.pc; the requirement that every libvmaf TU sees the remap still holds.
  - The generator regex matches every vk* PFN declaration in volk.h. This is confirmed for volk-1.4.341 (784 declarations, 784 remaps). When bumping the volk wrap version, the custom target regenerates the header automatically. Confirm that the rename count it prints to stdout matches the count of ^extern PFN_vk lines in the new volk.h.
  - The renamed symbols use the vmaf_priv_ prefix, chosen to match no upstream Netflix or Vulkan SDK identifier. Don't rename to _vk* (collides with C's reserved-identifier namespace) or to vkv_* or similar.
- On upstream sync: zero upstream interaction. The volk wrap is a libvmaf-managed subproject; Netflix doesn't ship a Vulkan backend.
- Re-test on rebase / after any volk wrap bump:
```sh
meson setup build-vk-static libvmaf -Denable_vulkan=enabled \
  -Denable_cuda=false -Denable_sycl=false \
  -Ddefault_library=static
ninja -C build-vk-static src/libvmaf.a
# No defined (T/D/B/R) vk* symbols may remain in the archive.
test "$(nm build-vk-static/src/libvmaf.a 2>/dev/null \
  | grep -cE '^[0-9a-f]* (T|D|B|R) vk[A-Z]')" = "0" \
  && echo OK
```
(Followed by the BtbN-style link reproducer in the ADR References section.)
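The generator is Python, and this note gives neither its exact regex nor its output format. The C model below sketches the per-line transform under one loudly labelled assumption: each matching extern PFN_vkXxx vkXxx; declaration becomes #define vkXxx vmaf_priv_vkXxx. remap_line is a hypothetical name. The point is that once the header is force-included, both volk.c's definitions and every call site are spelled vmaf_priv_vk*. That is the property the nm check above verifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one generator step. Returns 1 and writes
 * "#define vkXxx vmaf_priv_vkXxx" into out when decl looks like
 * "extern PFN_vkXxx vkXxx;", otherwise returns 0. */
static int remap_line(const char *decl, char *out, size_t out_sz)
{
    char pfn[128];
    char name[128];
    /* " extern" skips leading blanks; %127[...] stops before ';'. */
    if (sscanf(decl, " extern %127s %127[A-Za-z0-9_]", pfn, name) != 2)
        return 0;
    if (strncmp(pfn, "PFN_vk", 6) != 0 || strncmp(name, "vk", 2) != 0)
        return 0;
    const int n = snprintf(out, out_sz, "#define %s vmaf_priv_%s", name, name);
    return n > 0 && (size_t)n < out_sz;
}
```

Counting the lines for which remap_line returns 1 gives the "rename count" to compare against the ^extern PFN_vk line count after a volk wrap bump.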
0056 — SSIMULACRA 2 snapshot gate + fp-contract-off split (ADR-0164)
- ADR: ADR-0164
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
- Touches:
  - python/test/ssimulacra2_test.py: new fork-added Python test. It uses subprocess.call against ExternalProgram.vmafexec with --feature ssimulacra2, parses the --json output, and asserts pooled + per-frame scores.
- Invariants (load-bearing):
  - Pinned values are CPU-only and were generated on master HEAD after the PR #100 merge. Re-generate them only if the scalar or any SIMD path changes semantically. Per the ADR-0161/0162/0163 bit-exactness contract that shouldn't happen; any bit-exact refactor leaves the pinned values unchanged.
  - Tolerance is 4 decimal places (places=4). unittest's assertAlmostEqual then requires round(a - b, 4) == 0, i.e. |a - b| below about 5e-5. The CPU paths are bit-exact, so actual drift should be 0; the tolerance is defensive.
  - -ffp-contract=off applies everywhere in the ssimulacra2 pipeline, across four libs:
    - libvmaf_ssimulacra2_static_lib (scalar extractor);
    - x86_ssimulacra2_avx2_lib;
    - x86_ssimulacra2_avx512_lib;
    - arm64_ssimulacra2_lib (from ADR-0161).
    All four are split out of their umbrella libs so other extractors keep upstream's default FMA policy. Without this, the CI GCC/clang hosts drifted ~2e-4 from the AVX-512 authoring host. GCC 10+ defaults to -ffp-contract=fast on x86 with -mfma and on aarch64, fusing a*b+c in the scalar glue around the SIMD calls. Do NOT remove any of these carve-outs on rebase.
  - Fixtures are already checked in. src01_hrc00/01_576x324 is also the primary Netflix golden fixture; the derived 160×90 fixture stresses the sub-176 pyramid-termination path.
  - Do NOT modify the Netflix golden assertions in quality_runner_test.py et al.; those are upstream-pinned. This test is a SEPARATE file that adds fork-specific scores.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and adds pinned scores, cross-reference them against the fork's.
- Re-test on rebase / after any ssimulacra2 change:
- Follow-ups:
  - Cross-reference the gate against libjxl tools/ssimulacra2 once the ssimulacra2_rs cargo install is fixed.
  - Expand fixture coverage if new YUV test assets land.
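A minimal example shows why contraction changes results. A fused multiply-add rounds once; a separate multiply and add round twice. This sketch models the fused result without relying on libm's fmaf. The double product of two floats is exact (24 + 24 significand bits fit in 53), and for the inputs below the double addition is exact too, so one final rounding to float remains. The volatile store keeps the compiler from contracting the two-rounding path. Both function names are hypothetical.

```c
#include <assert.h>

/* Two roundings: the product is rounded to float before the add.
 * The volatile store prevents the compiler from fusing these into an FMA. */
static float mul_then_add(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

/* Single rounding, modelled in double: a*b is exact in double for float
 * inputs, and for the test values below the add is exact as well. */
static float fused_model(float a, float b, float c)
{
    const double exact = (double)a * (double)b + (double)c;
    return (float)exact;
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the two-rounding path returns 0.0f. The float product 1 + 2^-11 + 2^-24 ties to even and rounds to 1 + 2^-11. The fused form keeps the 2^-24 term. Differences of this kind, accumulated across a frame, are what the -ffp-contract=off carve-outs keep out of the scalar glue.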
0055 — SSIMULACRA 2 picture_to_linear_rgb SIMD (ADR-0163)
- ADR: ADR-0163
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
- Touches:
  - ssimulacra2_avx2.{c,h}: new ssimulacra2_picture_to_linear_rgb_avx2 + helpers (read_plane_scalar_s2, srgb_to_linear_lane_avx2, compute_matrix_coefs).
  - ssimulacra2_avx512.{c,h}: 16-wide AVX-512 port.
  - ssimulacra2_neon.{c,h}: 4-wide aarch64 port.
  - ssimulacra2.c: new ptlr_fn field in Ssimu2State. The dispatch wrapper convert_picture_to_linear_rgb unpacks VmafPicture into simd_plane_t[3], and init assigns the AVX2/AVX-512/NEON pointers.
  - ssimulacra2_simd_common.h: new shared header declaring simd_plane_t, which decouples the SIMD TUs from the VmafPicture type.
  - test_ssimulacra2_simd.c: new test_ptlr_420_8, test_ptlr_420_10, test_ptlr_444_8, test_ptlr_444_10, test_ptlr_422_8 subtests + scalar references ref_read_plane, ref_srgb_to_linear, ref_picture_to_linear_rgb.
- Invariants (load-bearing):
  - Scalar-order matmul. G = Yn + cb_g * Un + cr_g * Vn is chained left to right in all three SIMD TUs. The regression test catches reordering drift (~1 ulp).
  - Per-lane scalar powf. A vector polynomial approximation would break scalar bit-exactness. Do not replace the lane spill/reload pattern with a vector libm.
  - simd_plane_t layout. All three SIMD TUs assume the {data, stride, w, h} ordering. The dispatch wrapper builds the struct from VmafPicture fields, so the layout must match.
  - Bounds clamping in read_plane_scalar_* mirrors the scalar reference verbatim (if (sx < 0) sx = 0; if (sx >= pw) sx = pw-1; etc.). Do not simplify it; doing so removes the per-lane safety at plane edges.
  - Arbitrary chroma ratios fall through to the int64_t multiplication branch. Don't remove it; SSIMULACRA 2 is supposed to accept non-standard ratios gracefully.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides a SIMD YUV→RGB path, diff it against the fork's. Preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out opens.
- Re-test on rebase:
```sh
ninja -C build && build/test/test_ssimulacra2_simd   # 11/11
ninja -C build-aarch64 && \
  qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
    build-aarch64/test/test_ssimulacra2_simd        # 11/11
```
- Follow-ups:
  - T3-3 SSIMULACRA 2 snapshot-JSON regression test: still pending (gated on tools/ssimulacra2 availability).
  - SSIMULACRA 2 now has zero scalar hot paths. T3-1 closes in full with phases 1+2+3 (ADR-0161, 0162, 0163).
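The plane descriptor and edge clamping above can be sketched in scalar form. Only three things come from the note: the {data, stride, w, h} field order, the clamp statements, and the existence of an int64_t fallback for arbitrary chroma ratios. Everything else is an assumption: the field types, the uint8_t sample type, read_plane_clamped itself, and the luma-to-chroma coordinate mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape: only the {data, stride, w, h} order is documented. */
typedef struct {
    const uint8_t *data;
    ptrdiff_t stride;       /* bytes per row */
    int w;
    int h;
} simd_plane_t;

/* Hypothetical clamped read of plane p at luma coordinate (x, y) of a
 * lw x lh picture. 1:1 and 2:1 ratios use shifts; any other ratio falls
 * through to an int64_t multiply-divide. */
static int read_plane_clamped(const simd_plane_t *p, int x, int y,
                              int lw, int lh)
{
    int sx, sy;
    if (p->w == lw)
        sx = x;
    else if (2 * p->w == lw)
        sx = x >> 1;
    else
        sx = (int)(((int64_t)x * p->w) / lw);
    if (p->h == lh)
        sy = y;
    else if (2 * p->h == lh)
        sy = y >> 1;
    else
        sy = (int)(((int64_t)y * p->h) / lh);
    /* Edge clamps, written the same way as the scalar reference. */
    if (sx < 0) sx = 0;
    if (sx >= p->w) sx = p->w - 1;
    if (sy < 0) sy = 0;
    if (sy >= p->h) sy = p->h - 1;
    return p->data[(ptrdiff_t)sy * p->stride + sx];
}
```

Widening to int64_t before multiplying keeps x * p->w from overflowing int on large frames with unusual ratios. This is the case the fallback branch exists for.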
0054 — SSIMULACRA 2 FastGaussian IIR blur SIMD (ADR-0162)
- ADR: ADR-0162
- Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor.
- Touches:
  - ssimulacra2_avx2.{c,h}: new ssimulacra2_blur_plane_avx2 + 2 helpers (hblur_8rows_avx2, vblur_simd_8cols_avx2).
  - ssimulacra2_avx512.{c,h}: 16-wide port.
  - ssimulacra2_neon.{c,h}: 4-wide aarch64 port; uses vsetq_lane_f32 in place of a gather.
  - ssimulacra2.c: adds a blur_fn function pointer to Ssimu2State, dispatch in init_simd_dispatch(), and the call site in blur_3plane.
  - test_ssimulacra2_simd.c: new test_blur + scalar reference (ref_blur_plane, ref_fast_gaussian_1d).
- Invariants (load-bearing):
  - Row-batching lane layout. In the horizontal pass, lane i MUST hold row (y_base + i). Gather index vector entries are (y_base + i) * w (stride w). Changing this breaks bit-exactness vs scalar.
  - Scalar left-to-right summation order. n2_k * sum - d1_k * prev1_k - prev2_k is chained sequentially, and o0 + o1 + o2 at output time is (o0 + o1) + o2. Changing to (o0 + o2) + o1 or o0 + (o1 + o2) drifts ~1 ulp, and the regression test catches it.
  - col_state is 6 * w contiguous floats, laid out as [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2]. The SIMD loads assume this layout. Changing the field order requires updating all three SIMD TUs in lockstep with blur_plane.
  - NEON lane-set pattern. aarch64 has no gather intrinsic, so each input vector is built with 4 explicit vsetq_lane_f32 calls. Do not replace these with an ld1 {v.s}[lane]-style pseudo-gather without re-verifying bit-exactness.
  - The scalar tail in the vertical pass matches the scalar reference body verbatim. Any deviation breaks memcmp equality on widths that aren't multiples of the SIMD width.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides their own IIR blur SIMD, diff it against the fork's. Preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out is opened.
- Re-test on rebase:
```sh
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd   # 6/6
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
  build-aarch64/test/test_ssimulacra2_simd   # 6/6
```
- Follow-ups:
  - picture_to_linear_rgb SIMD: the last scalar hot path in the extractor, at 2 calls / frame. Low ROI but mechanical.
  - T3-3 SSIMULACRA 2 snapshot-JSON regression test: still pending.
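The col_state layout and summation order above can be written as one scalar column step. This is a sketch, not blur_plane. The coefficients n2_k / d1_k and the way sum is formed are not given in this note, so they are parameters here. prev1_idx, prev2_idx and iir_column_step are hypothetical names. The sketch assumes a build with -ffp-contract=off, matching the ADR-0164 carve-out.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets into the 6*w col_state: [prev1_0|prev1_1|prev1_2|prev2_0|prev2_1|prev2_2]. */
static size_t prev1_idx(int k, int x, int w) { return (size_t)k * w + x; }
static size_t prev2_idx(int k, int x, int w) { return (size_t)(3 + k) * w + x; }

/* One vertical step for column x. Each IIR term uses the left-to-right chain
 * n2_k * sum - d1_k * prev1_k - prev2_k; the output is (o0 + o1) + o2. */
static float iir_column_step(float *col_state, int x, int w, float sum,
                             const float n2[3], const float d1[3])
{
    assert(x >= 0 && x < w);
    float o[3];
    for (int k = 0; k < 3; k++) {
        float *p1 = &col_state[prev1_idx(k, x, w)];
        float *p2 = &col_state[prev2_idx(k, x, w)];
        o[k] = n2[k] * sum - d1[k] * *p1 - *p2;
        *p2 = *p1;                    /* shift history: prev2 <- prev1 */
        *p1 = o[k];                   /*                prev1 <- out   */
    }
    return (o[0] + o[1]) + o[2];
}
```

Because each slot holds w contiguous floats, a SIMD pass can load prev1_k or prev2_k for 4, 8 or 16 adjacent columns with a single vector load. That is why the field order is part of the cross-TU contract.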
0053 — SSIMULACRA 2 SIMD bit-exact ports (ADR-0161)
- ADR: ADR-0161
- Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor at all; the fork added it in ADR-0130.
- Touches:
  - ssimulacra2_avx2.c / .h: 5 AVX2 kernels + a per-lane cbrtf helper.
  - ssimulacra2_avx512.c / .h: 5 AVX-512 kernels; a mechanical 16-wide widening of the AVX2 path.
  - ssimulacra2_neon.c / .h: 5 NEON kernels; a 4-wide aarch64 mirror.
  - ssimulacra2.c: adds function-pointer dispatch fields to Ssimu2State plus an init_simd_dispatch() helper; calls go through the pointers.
  - meson.build: registers the three SIMD TUs in x86_avx2_sources / x86_avx512_sources / arm64_sources.
  - test_ssimulacra2_simd.c and test/meson.build: new bit-exact test harness.
- Invariants (load-bearing):
  - Byte-for-byte bit-exactness to scalar on all 5 vectorised kernels under FLT_EVAL_METHOD == 0. A regression caught pre-merge: naïve pairing (a+b)+(c+d) vs scalar ((a+b)+c)+d drifts by 1 ULP. Keep sequential scalar-order chains in all three SIMD TUs on rebase.
  - cbrtf is per-lane scalar libm, not a polynomial. Any replacement with a vector cbrt would drift the ssimulacra2 score and break the regression test. Keep the spill/reload pattern.
  - ssim_map / edge_diff_map reductions use the ADR-0139 per-lane double scalar tail. Do NOT SIMD-reduce float lanes and then lift to double, because that changes the summation order.
  - The downsample_2x2 deinterleave uses ISA-appropriate ops: AVX2 vshufps + vpermpd, AVX-512 vpermt2ps, NEON vuzp1q_f32 + vuzp2q_f32. After deinterleave, the sum order is ((r0e+r0o)+r1e)+r1o, matching scalar.
  - #pragma STDC FP_CONTRACT OFF sits at the top of every TU. GCC does not implement this pragma on any target, not only aarch64; it ignores it with a non-fatal -Wunknown-pragmas warning, which is why ADR-0164 adds -ffp-contract=off. The pragma is kept for portability (clang, MSVC).
  - IIR blur + picture_to_linear_rgb stay scalar in this PR. Follow-up PRs target these; when they land, re-verify bit-exactness by expanding test_ssimulacra2_simd.
  - Runtime dispatch order: AVX-512 > AVX2 on x86; NEON on aarch64; scalar fallback. Preserve it on rebase.
- On upstream sync:
  - Upstream has no SSIMULACRA 2 extractor, so there is nothing to merge.
  - If Netflix adopts SSIMULACRA 2 in the future, diff their implementation against the fork's scalar + SIMD TUs. Keep the fork's bit-exactness contract unless a specific Netflix-authority carve-out ADR says otherwise.
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd # 5/5
clang-tidy -p build core/src/feature/x86/ssimulacra2_avx2.c \
core/src/feature/x86/ssimulacra2_avx512.c
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
build-aarch64/test/test_ssimulacra2_simd # 5/5
clang-tidy -p build-aarch64 \
core/src/feature/arm64/ssimulacra2_neon.c
- Follow-ups:
- IIR blur vectorisation (blur_plane vertical-pass column batching) — the biggest frame-level wall-clock win.
- picture_to_linear_rgb per-lane powf — lower ROI but mechanical.
- T3-3 SSIMULACRA 2 snapshot-JSON regression test — ADR-0130 deferred; still pending.
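The downsample_2x2 ordering invariant can be illustrated with a scalar sketch. The function names are hypothetical and the 0.25f box-average scale is an assumption (it is a power of two, so it does not change rounding). The inputs are chosen so that, with FLT_EVAL_METHOD == 0, the scalar chain and the naïve pairing give different results.

```c
#include <assert.h>

/* Hypothetical helpers illustrating the ssimulacra2 downsample_2x2
 * summation-order invariant; not the fork's actual kernel. */

/* Scalar-order chain: ((r0e + r0o) + r1e) + r1o. Every SIMD port must
 * reproduce exactly this association after deinterleaving. */
static float downsample_2x2_px(float r0e, float r0o, float r1e, float r1o)
{
    float s = r0e + r0o;
    s = s + r1e;
    s = s + r1o;
    return s * 0.25f; /* assumed box-average scale; exact (power of two) */
}

/* Naive pairing (a + b) + (c + d): what a careless horizontal add does. */
static float downsample_2x2_px_paired(float r0e, float r0o, float r1e,
                                      float r1o)
{
    return ((r0e + r0o) + (r1e + r1o)) * 0.25f;
}
```

With inputs (1e8f, 1.0f, -1e8f, 1.0f) the chain yields 0.25f while the pairing yields 0.0f, because each +1.0f is absorbed by a different large operand. A SIMD port is bit-exact only if every lane performs the same three additions in the same order as downsample_2x2_px.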
0052 — psnr_hvs SIMD bit-exact ports (ADR-0159 AVX2, ADR-0160 NEON)¶
- ADRs: ADR-0159 (AVX2), ADR-0160 (NEON sister port).
- Upstream source: fork-local. Upstream Netflix/vmaf has no psnr_hvs SIMD path.
- Touches:
- core/src/feature/x86/psnr_hvs_avx2.c — AVX2 TU.
- core/src/feature/x86/psnr_hvs_avx2.h — AVX2 header.
- core/src/feature/arm64/psnr_hvs_neon.c — NEON TU (sister port, ADR-0160).
- core/src/feature/arm64/psnr_hvs_neon.h — NEON header.
- core/src/feature/third_party/xiph/psnr_hvs.c — add PsnrHvsState + runtime dispatch in init() (AVX2 under ARCH_X86, NEON under ARCH_AARCH64) + scoped NOLINTBEGIN/END around the upstream Xiph scalar block (kept verbatim as the bit-exact reference).
- core/src/meson.build — add x86/psnr_hvs_avx2.c to x86_avx2_sources and arm64/psnr_hvs_neon.c to arm64_sources.
- core/test/test_psnr_hvs_avx2.c, core/test/test_psnr_hvs_neon.c — bit-exact unit tests (x86 and aarch64 respectively).
- core/test/meson.build — register both tests under enable_asm, arch-gated.
- Invariants (load-bearing):
- Bit-exactness to scalar: every od_coeff (int32) and every final psnr_hvs_{y,cb,cr,psnr_hvs} value the AVX2 path emits must be byte-identical to the scalar reference on the Netflix golden pairs. If a rebase introduces any pattern that breaks this (e.g. a floating-point horizontal reduce in the mask accumulator), the unit test test_psnr_hvs_avx2 will fail — don't relax the assertions; fix the SIMD path.
- DCT butterfly layout: butterfly → transpose → butterfly → transpose. The transpose lives inside od_bin_fdct8x8_avx2. Do not move it.
- Float accumulators stay scalar: means / variances / mask / error accumulation in calc_psnrhvs_avx2 use the same per-block scalar loop as scalar psnr_hvs — bit-exact by construction. Do not vectorize these with horizontal reductions without replicating ADR-0139's per-lane scalar-float reduction pattern. The cross-block error accumulator ret is threaded through accumulate_error() by pointer, not returned-then-summed: each of the 64 per-coefficient contributions per block must hit the outer ret directly, matching scalar's inline ret += ... at third_party/xiph/psnr_hvs.c line 355. IEEE-754 float add is non-associative — summing into a local float and then adding the per-block total to ret changes the summation tree and drifts the Netflix golden by ~5.5e-5.
- #pragma STDC FP_CONTRACT OFF at the TU header disables FMA formation. Required: fmaf(a, b, c) can differ from (a*b)+c by 1 ulp, breaking bit-exactness. Do not remove the pragma; do not add -ffp-contract=fast to the build flags for this TU.
- NOLINT suppressions are load-bearing — each cites ADR-0141 inline (bit-exactness scalar-diff auditability for the 30-butterfly function, scalar float→double promotion for sqrt, extractor-registry extern linkage for vmaf_fex_psnr_hvs, upstream-Xiph scoped block for rebase parity).
- On upstream sync:
- Upstream has no psnr_hvs SIMD as of 2026-04-24. Keep the fork's version on conflict.
- If upstream ever touches psnr_hvs.c for non-SIMD reasons (e.g. a masking-table update), rebase the AVX2 TU to match line-for-line and re-run test_psnr_hvs_avx2 to confirm bit-exactness survives.
- The NEON sister port (ADR-0160) in arm64/psnr_hvs_neon.c mirrors this ADR's invariants. On rebase, the two SIMD TUs must stay in lock-step with the scalar reference.
- Re-test on rebase:
ninja -C build
meson test -C build test_psnr_hvs_avx2
# Expect: 5/5 subtests pass (DCT bit-exact on 3 random seeds +
# delta + constant input).
# CLI-level bit-exactness on Netflix golden (requires the YUV
# fixtures in python/test/resource/yuv/):
# VMAF_CPU_MASK=0 (scalar)
# VMAF_CPU_MASK=255 (AVX2 enabled)
# Diff per-frame psnr_hvs_{y,cb,cr,psnr_hvs} XML fields; expect
# byte-identical across all 3 golden pairs.
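A minimal sketch of the accumulator rule, with hypothetical signatures and 8 contributions instead of the real 64 per block. Starting from a large running total, adding each contribution directly loses every small term to rounding, while summing them locally first does not, so the two orders produce different floats from identical inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar-faithful: each per-coefficient contribution is added straight into
 * the caller's running total, matching an inline `ret += ...` loop. */
static void accumulate_error(float *ret, const float *contrib, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *ret += contrib[i];
}

/* Anti-pattern: sum the block locally, then fold the block total into ret.
 * Same terms, different summation tree, so the float result can differ. */
static float accumulate_error_local(const float *contrib, size_t n)
{
    float local = 0.0f;
    for (size_t i = 0; i < n; i++)
        local += contrib[i];
    return local;
}
```

With ret starting at 1e8f and eight contributions of 1.0f, the direct form leaves ret at 1e8f (each +1 rounds away), while adding the local total gives 100000008.0f. Neither is "more correct"; only the first matches the scalar Xiph reference that the golden numbers encode.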
0051 — Netflix#1486 motion updates verified present (ADR-0158)¶
- ADR: ADR-0158
- Upstream source: Netflix upstream PR #1486 ("Port motion updates"), MERGED 2026-04-20 as commits a44e5e6 (code) + 62f47d5 (Netflix golden updates).
- Touches: documentation-only; the actual code changes this ADR documents are already in the fork's master via earlier incremental motion3 / blend / five-frame-window commits.
- Invariants (load-bearing for future /sync-upstream):
- The edge_8 mirror fix (i_tap = height - (i_tap - height + 2)) is present at integer_motion.c:240, x86/motion_avx2.c:147, x86/motion_avx512.c:147. If upstream's mirror line ever diverges again, this is the hunk to watch.
- The motion_max_val feature option is at integer_motion.c:57,118-120 with default 10000.0 and the FEATURE_PARAM flag. Upstream's default = fork's default; don't drift.
- VMAF_integer_feature_motion3_score output plumbing is in integer_motion.c + alias.c.
- Fork-local motion extensions (five-frame-window, moving-average, blend, fps_weight) are ADDITIONS on top of Netflix#1486. They are not upstream. Upstream changes to motion extractor internals may conflict with them — diff against core/src/feature/integer_motion.c on every rebase and check that the fork's MIN(s->score * s->motion_fps_weight, s->motion_max_val) invocations are preserved (lines ~409, ~503).
- On upstream sync: nothing to port from Netflix#1486 — it's absorbed. If a future upstream PR touches the same code paths, prefer upstream's version for the scalar/edge handling and the fork's version for the five-frame-window / blend extensions.
- Re-test on rebase:
ninja -C build
meson test -C build
# Expect: 35/35 pass.
# Verify the upstream markers are still in place after rebase:
grep -n "height - (i_tap - height + 2)\|motion_max_val\|VMAF_integer_feature_motion3_score" \
core/src/feature/integer_motion.c \
core/src/feature/alias.c \
core/src/feature/x86/motion_avx2.c \
core/src/feature/x86/motion_avx512.c
# Expect: matches at all 4 files. If any missing, the rebase
# silently dropped the Netflix#1486 content — investigate.
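The edge_8 mirror rule as a stand-alone sketch. mirror_tap is a hypothetical name; the upper branch is the documented formula, while the lower branch (-i_tap) is an assumed symmetric counterpart that this note does not quote.

```c
#include <assert.h>

/* Hypothetical helper: reflect an out-of-range filter tap back into
 * [0, height) without repeating the edge sample (height maps to
 * height - 2). Valid for taps at most height - 1 past either border,
 * which holds for short vertical motion filters. */
static int mirror_tap(int i_tap, int height)
{
    if (i_tap < 0)
        return -i_tap; /* assumed lower-edge counterpart */
    if (i_tap >= height)
        return height - (i_tap - height + 2); /* documented edge_8 fix */
    return i_tap;
}
```

For height = 10, taps 10 and 11 map to rows 8 and 7, and taps -1 and -2 map to rows 1 and 2, so both borders reflect about the edge row itself.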
0050 — CUDA preallocation memory leak fix + vmaf_cuda_state_free (ADR-0157)¶
- ADR: ADR-0157
- Upstream source: Netflix upstream issue #1300 (OPEN since 2024; no maintainer fix as of 2026-04-24). User reports GPU memory rises monotonically across init/preallocate/fetch/close cycles.
- Touches:
- core/include/libvmaf/libvmaf_cuda.h — new public vmaf_cuda_state_free() API declaration.
- core/src/cuda/common.c — new vmaf_cuda_state_free() implementation; vmaf_cuda_release() now calls cuda_free_functions(); vmaf_cuda_state_init() gets an outer failure unwind; init_with_primary_context() releases the retained primary context on fail_after_pop.
- core/src/cuda/ring_buffer.c — vmaf_ring_buffer_close() now unlocks + destroys the mutex before freeing.
- core/test/test_cuda_preallocation_leak.c — new GPU-gated reducer (10-cycle loop with full cleanup).
- core/test/test_cuda_pic_preallocation.c, core/test/test_cuda_buffer_alloc_oom.c — add missing vmaf_cuda_state_free() + vmaf_model_destroy() calls after vmaf_close() in every test that allocates these.
- core/test/meson.build — register the new reducer under the enable_cuda guard.
- Invariants (load-bearing):
- Public contract: every caller of vmaf_cuda_state_init() MUST call vmaf_cuda_state_free() AFTER vmaf_close() on any VmafContext that imported the state. Informal free(cu_state) is a silent double-free hazard AFTER close (vmaf_close's vmaf_cuda_release already memsets + frees CudaFunctions internals; vmaf_cuda_state_free only frees the heap allocation itself).
- vmaf_cuda_release() frees CudaFunctions via a saved pointer AFTER the memset. Order matters — memset first so cu_state->f is zeroed in the caller's struct, then free via the saved local. Do not re-order.
- vmaf_ring_buffer_close() unlocks BEFORE destroying the mutex (POSIX requires the mutex be unlocked for destroy).
- The cold-start unwind in init_with_primary_context releases cuDevicePrimaryCtxRetain's retained context if cuStreamCreateWithPriority fails.
- The ADR-0122 / ADR-0123 is_cudastate_empty() null-guards at the top of every public vmaf_cuda_* entry must continue to compose with the new vmaf_cuda_state_free() (which accepts NULL directly and doesn't call through to the CUDA API).
- The new free call order in callers is: vmaf_close(vmaf) → vmaf_cuda_state_free(cu_state) → vmaf_model_destroy(model). Reversing the first two produces a use-after-free.
- On upstream sync:
- Upstream has no vmaf_cuda_state_free() as of 2026-04-24. Keep the fork's version on any conflict. If upstream eventually lands the same API with a different spelling, prefer upstream's spelling and add a compat alias — but do not break the fork's ABI.
- vmaf_cuda_release()'s cuda_free_functions() call is fork-local. On rebase, keep it.
- The ring-buffer pthread_mutex_unlock + pthread_mutex_destroy pair is fork-local. On rebase, keep it.
- If upstream refactors VmafCudaState ownership semantics (unlikely — their pattern has historically been "leaked state in a long-lived process is acceptable"), re-audit this ADR and the new public API.
- Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 40/40 pass including test_cuda_preallocation_leak.
# ASan leak-check:
cd core && meson setup build-asan-cuda \
-Db_sanitize=address -Denable_cuda=true -Denable_sycl=false \
--buildtype=debug
ninja -C build-asan-cuda
ASAN_OPTIONS='detect_leaks=1:leak_check_at_exit=1' \
build-asan-cuda/test/test_cuda_preallocation_leak
# Expect: 0 bytes leaked from core/src/* frames.
# (~180 bytes in libcuda.so.1 is expected — driver's process-
# lifetime cuInit cache, does not grow per cycle.)
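The ordering rules for vmaf_cuda_release() and vmaf_cuda_state_free() can be sketched with stand-in types. State, Funcs, state_release and state_free are hypothetical names, not the fork's definitions; the sketch only shows the save-pointer / memset / free-saved sequence and a NULL-tolerant free of the outer allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for CudaFunctions / VmafCudaState. */
typedef struct { int loaded; } Funcs;
typedef struct { Funcs *f; int device; } State;

/* Release internals: save the pointer, zero the caller's struct first so
 * no stale s->f survives, then free through the saved local. */
static void state_release(State *s)
{
    Funcs *saved = s->f;
    memset(s, 0, sizeof(*s));
    free(saved);
}

/* Free only the heap allocation of the state itself. NULL is accepted and
 * no backend API is called; must run after state_release(). */
static void state_free(State *s)
{
    free(s);
}
```

In caller terms this is the documented order: close (which performs the release step), then free the state, then destroy the model.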
0049 — CUDA graceful error propagation (ADR-0156)¶
- ADR: ADR-0156
- Upstream source: Netflix upstream issue #1420 (OPEN as of 2026-04-24). Reports that two concurrent VMAF-CUDA processes crash the second one at vmaf_cuda_buffer_alloc due to CHECK_CUDA(cuMemAlloc) → assert(0) on OOM.
- Touches:
- core/src/cuda/cuda_helper.cuh — redefined CHECK_CUDA family. New macros CHECK_CUDA_GOTO + CHECK_CUDA_RETURN + helper vmaf_cuda_result_to_errno. Old assert(0) semantics removed entirely.
- core/src/cuda/common.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c — all CHECK_CUDA(...) sites converted; cleanup labels added where contexts / buffers were pushed / allocated.
- core/src/feature/cuda/integer_motion_cuda.c, integer_vif_cuda.c, integer_adm_cuda.c — same conversion; 12 static helpers promoted void → int.
- core/test/test_cuda_buffer_alloc_oom.c — new GPU-gated reducer.
- core/test/meson.build — register the new test under the enable_cuda guard.
- Invariants (load-bearing):
- CHECK_CUDA_GOTO / CHECK_CUDA_RETURN must never call assert(0) or abort() on a CUDA error. Any regression back to the upstream abort-on-error semantics re-introduces Netflix#1420 and the NDEBUG footgun (with NDEBUG defined, assert(0) compiles to nothing and the CUDA error is silently ignored).
- Every CHECK_CUDA_GOTO target label must pop any previously-pushed CUDA context and free any partially-constructed buffers before returning the errno. The graceful path must not leak resources.
- vmaf_cuda_result_to_errno uses numeric CUresult values directly (0 / 1 / 2 / 3 / 4 / 101 / 201 / 400) so host TUs that don't include <cuda.h> can transitively consume the mapping via the inline function. If upstream renumbers CUresult enum values (historically stable — they've been fixed since CUDA 1.0), re-audit the switch.
- ADR-0122 / ADR-0123 is_cudastate_empty(...) guards at the top of every public vmaf_cuda_* entry point must stay — they run before the CUDA API is touched and compose cleanly with the new error propagation.
- Twelve static helper signatures in the feature extractors are int-returning (previously void): any upstream port that restores the void return silently regresses the error path.
- On upstream sync:
- Upstream Netflix still uses assert(0) in CHECK_CUDA as of 2026-04-24. Keep the fork's macro definitions in cuda_helper.cuh on any upstream conflict — this file is fork-local behaviour.
- If upstream eventually lands Netflix#1420 with a similar refactor, prefer the fork's version unless upstream's has identical semantics (no assert(0) / no abort() / translates CUresult to -errno). Re-verify test_cuda_buffer_alloc_oom after rebase.
- If upstream adds new CHECK_CUDA(...) sites in a port, rewrite them to CHECK_CUDA_GOTO / CHECK_CUDA_RETURN as part of the port commit.
- If upstream changes any of the 12 static helper signatures back to void, re-promote them to int during the merge.
- Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 39/39 pass including test_cuda_buffer_alloc_oom.
# Reducer check — verify the OOM-to-errno path is live:
meson test -C core/build-cuda test_cuda_buffer_alloc_oom -v
# Expect subtests: request 1 TiB → -ENOMEM; request 0 bytes → 0.
clang-tidy -p core/build-cuda --quiet \
core/src/cuda/common.c \
core/src/cuda/picture_cuda.c \
core/src/feature/cuda/integer_motion_cuda.c \
core/src/feature/cuda/integer_vif_cuda.c \
core/src/feature/cuda/integer_adm_cuda.c \
core/src/libvmaf.c
# Expect exit 0 on every file.
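A self-contained sketch of the CHECK_CUDA_GOTO pattern. The macro shape, the fake allocator and the errno values for codes other than 0 (CUDA_SUCCESS) and 2 (CUDA_ERROR_OUT_OF_MEMORY) are assumptions for illustration; the fork's real definitions live in cuda_helper.cuh. A failing call records a negative errno and jumps to a cleanup label instead of asserting.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Maps numeric CUresult codes to negative errno. Only 0 and 2 follow the
 * documented behaviour; the other mappings are assumptions. */
static inline int cuda_result_to_errno(int res)
{
    switch (res) {
    case 0:  return 0;       /* CUDA_SUCCESS */
    case 1:  return -EINVAL; /* CUDA_ERROR_INVALID_VALUE (assumed) */
    case 2:  return -ENOMEM; /* CUDA_ERROR_OUT_OF_MEMORY */
    default: return -EIO;    /* assumed catch-all */
    }
}

/* Record the error and jump to a cleanup label; never assert()/abort(). */
#define CHECK_CUDA_GOTO(call, err, label)       \
    do {                                        \
        const int res_ = (call);                \
        if (res_ != 0) {                        \
            (err) = cuda_result_to_errno(res_); \
            goto label;                         \
        }                                       \
    } while (0)

/* Hypothetical fake driver call: fails with code 2 above 1 GiB. */
static int fake_mem_alloc(void **p, size_t n)
{
    if (n > ((size_t)1 << 30))
        return 2;
    *p = malloc(n ? n : 1);
    return *p ? 0 : 2;
}

static int cleanup_runs; /* counts trips through the cleanup label */

static int alloc_pair(void **a, void **b, size_t n)
{
    int err = 0;
    *a = *b = NULL;
    CHECK_CUDA_GOTO(fake_mem_alloc(a, 16), err, fail);
    CHECK_CUDA_GOTO(fake_mem_alloc(b, n), err, fail);
    return 0;
fail:
    cleanup_runs++;
    free(*a); /* release the partially-constructed state */
    *a = NULL;
    return err;
}
```

This mirrors the reducer's expectation that an impossible allocation request surfaces as -ENOMEM rather than aborting the process.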
0049 — compute_motion / picture_copy signature changes (b949cebf upstream port)¶
- Upstream commit: Netflix/vmaf b949cebf (feature/motion: port several feature extractor options)
- Prerequisite commit: Netflix/vmaf d3647c73 (picture_copy: add channel parameter)
- PR: upstream/port-b949cebf-motion
Rebase-sensitive invariants:
- compute_motion signature change — compute_motion() in core/src/feature/motion.c / motion.h now takes an extra int motion_decimate parameter (the motion_add_scale1 flag). Any new caller added in the fork that calls compute_motion() must pass this parameter. The SIMD integer motion callers (motion_avx2.c, motion_avx512.c) do NOT call compute_motion() — they use the SAD/convolution dispatch table directly and are unaffected.
- vmaf_image_sad_c signature change — similarly gains int motion_add_scale1. Any caller in the fork must be updated. Currently only called from compute_motion() internally.
- picture_copy signature change — gains int channel as the last parameter (0=Y, 1=U, 2=V). Every caller in the tree has been updated to pass 0 (luma). When adding new callers that need UV planes, pass 1 or 2. The fork's CUDA/SYCL/Vulkan callers have been updated in this PR.
- Default behavior preserved — all new options default to no-op values: motion_add_scale1=false, motion_add_uv=false, motion_blend_factor=1.0, motion_fps_weight=1.0, motion_filter_size=5 (= DEFAULT_MOTION_FILTER_SIZE). Integer and float motion2 scores are bit-identical to the pre-port baseline.
- vif_scale_frame_s dependency avoided — upstream b949cebf motion.c imports vif_scale_frame_s from vif_tools.h. The fork does not have this function yet (the vif options chain is deferred, Research-0024 Strategy E). The bilinear downscaler for motion_add_scale1 is implemented as local static functions in motion.c (motion_scale_bilinear, motion_bilinear_interp, motion_mirror_f). When upstream's vif options chain is eventually ported, reconcile by replacing these local functions with vif_scale_frame_s.
Reproducer:
# verify bit-exactness (default options, scores must be identical):
./core/build/tools/vmaf \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--model path=model/vmaf_v0.6.1.json \
--feature motion --no_prediction --json --output /tmp/motion.json
# integer_motion2 scores must match pre-port baseline at 6 decimal places.
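A simplified sketch of the new channel argument. Pic8 and picture_copy_sketch are hypothetical (8-bit planes only, destination stride in bytes); the real picture_copy also takes a bit depth and operates on VmafPicture. It shows only how the channel index selects the plane together with that plane's own stride and dimensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified picture: 8-bit planes only. */
typedef struct {
    uint8_t *data[3];    /* Y, U, V */
    ptrdiff_t stride[3]; /* bytes per row, per plane */
    unsigned w[3], h[3]; /* per-plane dimensions */
} Pic8;

/* Copy one plane (0 = Y, 1 = U, 2 = V) into a float buffer, adding
 * `offset` to every sample. dst_stride is in bytes. */
static void picture_copy_sketch(float *dst, ptrdiff_t dst_stride,
                                const Pic8 *src, int offset, int channel)
{
    assert(channel >= 0 && channel <= 2);
    const uint8_t *row = src->data[channel];
    for (unsigned i = 0; i < src->h[channel]; i++) {
        for (unsigned j = 0; j < src->w[channel]; j++)
            dst[j] = (float)(row[j] + offset);
        dst += dst_stride / (ptrdiff_t)sizeof(float);
        row += src->stride[channel];
    }
}
```

Existing luma-only callers keep their behaviour by passing 0; a chroma consumer passes 1 or 2 and gets the subsampled plane's geometry automatically.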
0048 — i4_adm_cm int32 rounding overflow deliberately preserved (ADR-0155)¶
- ADR: ADR-0155
- Upstream source: Netflix upstream issue #955 (OPEN since 2020; no maintainer response as of 2026-04-24). Reports that add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1)) in core/src/feature/integer_adm.c scales 1–3 overflows int32_t (1u << 31 = 0x80000000 wraps to -2147483648). The rounding term is sign-negated; ADM scales 1–3 are biased low by ≈1 LSB per summed term.
- Touches (documentation-only):
- docs/adr/0155-adm-i4-rounding-deferred-netflix-955.md — new ADR (this entry's anchor).
- core/src/feature/integer_adm.c — in-file warning comment above the overflow site (add_bef_shift_flt[] initialiser loop around line 1277). No code change.
- core/src/feature/AGENTS.md — invariant note under "Rebase-sensitive invariants".
- Invariants (load-bearing — do NOT silently "fix"):
- integer_adm.c keeps int32_t add_bef_shift_flt[3] with the overflowing 1u << 31 assignment. The Netflix golden assertions (python/test/quality_runner_test.py, vmafexec_test.py, feature_extractor_test.py) encode the buggy ADM output. Project hard rule #1 (ADR-0024) prohibits changing those assertions.
- Any "fix" that changes ADM numerical output must land together with a coordinated Netflix-authored golden-number update (the ADR-0142 Netflix-authority carve-out). Until Netflix#955 closes upstream, there is no authority to track.
- On upstream sync:
- If Netflix finally lands a fix for #955 (widening the rounding term to uint32_t or int64_t), sync the C-side fix AND the updated assertAlmostEqual values in the same merge. Re-run make test-netflix-golden and /cross-backend-diff on the golden pairs to verify the new numbers are consistent across CPU / CUDA / SYCL.
- Remove the in-file warning comment above the add_bef_shift_flt initialiser loop, flip ADR-0155 to Superseded by ADR-NNNN, and drop this rebase-notes entry.
- If upstream instead closes #955 as wont-fix, keep this entry verbatim and update the ADR status to note upstream's closure.
- Re-test on rebase (gates the invariant by confirming the golden numbers are unchanged):
ninja -C build
make test-netflix-golden
# Expect: VMAF mean 76.66890… on src01_hrc00/01_576x324 golden
# pair — bit-identical to pre-rebase.
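The overflow can be reproduced in isolation. The helpers below are hypothetical and use a shift value of 32, which is what 1u << 31 implies. Converting 0x80000000u to int32_t is implementation-defined in C; GCC and Clang both reduce it modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the documented initialiser pattern for one scale: the unsigned
 * rounding term is stored into an int32_t and wraps for shift == 32. */
static int32_t rounding_term_i32(int shift)
{
    return (int32_t)(1u << (shift - 1));
}

/* A widened variant of the kind #955 discusses. Shown for comparison
 * only: adopting it changes ADM output and therefore the goldens. */
static int64_t rounding_term_i64(int shift)
{
    return (int64_t)1 << (shift - 1);
}
```

For shift = 32 the int32 term is INT32_MIN instead of +2147483648, so adding it before the shift subtracts rather than rounds; smaller shifts (e.g. 16 → 32768) are unaffected.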
0047 — vmaf_score_pooled -EAGAIN for pending features (ADR-0154)¶
- ADR: ADR-0154
- Upstream source: Netflix upstream issue #755 (OPEN as of 2026-04-24). Upstream maintainer closed the door on the streaming use case in 2020 ("you cannot call vmaf_score_pooled() in a loop"); fork reopens it via error-code semantics without changing the retroactive-write design.
- Touches:
- core/src/feature/feature_collector.c — vmaf_feature_collector_get_score returns -EAGAIN (was -EINVAL) when the requested index is valid but not yet written.
- core/src/feature/feature_collector.h — inline vmaf_feature_vector_get_score now returns -EINVAL for null/out-of-range and -EAGAIN for not-written (was -1 for both). Added #include <errno.h>. Renamed the reserved __VMAF_FEATURE_COLLECTOR_H__ guard to VMAF_FEATURE_COLLECTOR_INCLUDED.
- core/test/test_score_pooled_eagain.c — new 4-subtest reducer.
- core/test/meson.build — register the new test.
- Invariants (load-bearing, enforced by the reducer):
- vmaf_feature_collector_get_score(fc, name, &score, i) returns -EAGAIN iff the feature name is registered and i is in range but score[i].written == false.
- The return stays -EINVAL for (a) null pointers, (b) i >= feature_vector->capacity, (c) unknown feature name.
- The inline fast-path vmaf_feature_vector_get_score uses the same split.
- On upstream sync: upstream has not changed the error semantics since 2020. If they do (unlikely), keep the fork's -EAGAIN — it is strictly more informative and downstream code depending on the split would regress.
- Re-test on rebase:
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: 4/4 subtests pass.
# Reducer check:
git stash push core/src/feature/feature_collector.c core/src/feature/feature_collector.h
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: Fail: 1 (tests fail without -EAGAIN split).
git stash pop
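The -EAGAIN / -EINVAL split, sketched on a hypothetical, simplified feature vector (Score, FeatureVec and feature_vec_get_score are illustrative names, not the fork's types). The unknown-feature-name case belongs to the collector's name lookup and is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified feature vector. */
typedef struct { double value; bool written; } Score;
typedef struct { Score *score; unsigned capacity; } FeatureVec;

/* -EINVAL: null pointer or index out of range (hard error).
 * -EAGAIN: index valid but the frame has not been written yet. */
static int feature_vec_get_score(const FeatureVec *fv, double *out,
                                 unsigned index)
{
    if (!fv || !out || index >= fv->capacity)
        return -EINVAL;
    if (!fv->score[index].written)
        return -EAGAIN;
    *out = fv->score[index].value;
    return 0;
}
```

A streaming caller can treat -EAGAIN as "not yet; retry after more frames are read" and any -EINVAL as a programming error.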
0046 — float_ms_ssim min-dim guard (ADR-0153)¶
- ADR: ADR-0153
- Upstream source: Netflix upstream issue #1414 (OPEN as of 2026-04-24). No upstream fix has landed; fork adds the guard independently.
- Touches:
- core/src/feature/float_ms_ssim.c — add #include "log.h" + #include "iqa/ssim_tools.h" + a min_dim = GAUSSIAN_LEN << (SCALES - 1) check at the start of init; extract SIMD dispatch into a new ms_ssim_init_simd_dispatch helper to keep init within the ADR-0141 60-line budget.
- core/test/test_float_ms_ssim_min_dim.c — new 3-subtest reducer.
- core/test/meson.build — register the new test executable.
- Invariant (load-bearing, enforced by the reducer):
- float_ms_ssim.init returns -EINVAL when w < 176 || h < 176, where 176 is computed dynamically from the filter constants. The magic number is not hardcoded — changing SCALES or GAUSSIAN_LEN upstream will auto-update the minimum.
- On upstream sync: if Netflix upstream lands a similar init-time guard, keep the fork's version — the helper name ms_ssim_init_simd_dispatch is fork-local (introduced to satisfy ADR-0141) and upstream's patch won't match. Both guards should be compatible; re-verify the reducer after rebase.
- Re-test on rebase:
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: 3/3 subtests pass.
# Reducer check (confirms the guard is load-bearing):
git stash push core/src/feature/float_ms_ssim.c
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: Fail: 1 (tests fail without the guard).
git stash pop
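A sketch of the init-time guard. The constant values are assumptions (an 11-tap Gaussian and 5 scales, which reproduce the documented 176); the real code takes GAUSSIAN_LEN and SCALES from its headers rather than redefining them. The shift encodes that each of the SCALES - 1 downsampling steps halves the image while the coarsest scale must still hold one full filter window.

```c
#include <assert.h>
#include <errno.h>

/* Assumed constant values for this sketch only. */
#define GAUSSIAN_LEN 11
#define SCALES 5

/* Returns -EINVAL when either dimension is too small for the coarsest
 * MS-SSIM scale to fit one Gaussian window. */
static int ms_ssim_check_dims(unsigned w, unsigned h)
{
    const unsigned min_dim = GAUSSIAN_LEN << (SCALES - 1);
    if (w < min_dim || h < min_dim)
        return -EINVAL;
    return 0;
}
```

Both dimensions are checked independently, so a wide but short frame (e.g. 1920x100) is rejected just like a small square one.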
0045 — vmaf_read_pictures monotonic-index guard (ADR-0152)¶
- ADR: ADR-0152
- Upstream source: Netflix upstream issue #910 (OPEN as of 2026-04-24). No upstream fix has landed; the fork adds the guard independently, per the 2021-10-14 maintainer comment that recommended exactly this shape.
- Touches:
- core/src/libvmaf.c — add unsigned last_index + bool have_last_index fields to VmafContext; prepend a monotonic-index check inside read_pictures_validate_and_prep (returns -EINVAL on duplicates / regressions); update the two new fields at the tail of the same helper on success.
- core/test/test_read_pictures_monotonic.c — new 3-subtest reducer covering the Netflix#910 sequence and the two classes of rejection (duplicate, out-of-order).
- core/test/meson.build — register the new test executable.
- Invariant (load-bearing, enforced by the reducer):
- vmaf_read_pictures(vmaf, ref, dist, index) returns -EINVAL when have_last_index && index <= last_index. Flush (vmaf_read_pictures(vmaf, NULL, NULL, 0)) routes to flush_context before the guard runs — flushing remains always available, independent of the last accepted index.
- On upstream sync:
- If Netflix upstream eventually lands a similar guard at the API boundary, keep the fork's version — the helper function name (read_pictures_validate_and_prep) is fork-local (ADR-0146), so upstream's patch will target a different insertion point. Both guards should be compatible; re-verify the reducer after rebase.
- If upstream instead lands an internal reordering mechanism (buffer-and-sort frames before dispatch), revisit this decision — the fork's API-level contract is stricter and may need to relax to match. Open a new ADR if so.
- Re-test on rebase:
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: 3/3 subtests pass.
# Reducer check (confirms the guard is load-bearing):
git stash push core/src/libvmaf.c
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: Fail: 1 (the test rejects the un-guarded behaviour).
git stash pop
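The monotonic-index contract, sketched with a hypothetical Ctx struct carrying the two fields named above. read_pictures_guard is an illustrative name, not the fork's helper. Gaps are accepted (the index only has to increase), and state is updated only after a frame is accepted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of VmafContext holding the two guard fields. */
typedef struct {
    unsigned last_index;
    bool have_last_index;
} Ctx;

/* Returns 0 if the frame may proceed, -EINVAL on a duplicate or
 * out-of-order index. A flush (both pictures NULL) bypasses the guard. */
static int read_pictures_guard(Ctx *ctx, const void *ref, const void *dist,
                               unsigned index)
{
    if (!ref && !dist)
        return 0; /* flush path: always available */
    if (ctx->have_last_index && index <= ctx->last_index)
        return -EINVAL;
    /* ...remaining validation and dispatch would run here... */
    ctx->last_index = index; /* record only after success */
    ctx->have_last_index = true;
    return 0;
}
```

Because a rejected call leaves last_index untouched, a caller that sends 0, 1, 1, 3 sees only the second 1 rejected.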
0044 — i686 (32-bit x86) build-only CI job (ADR-0151)¶
- ADR: ADR-0151
- Upstream source: Netflix upstream issue #1481 (OPEN as of 2026-04-24). Reports an i686 compile failure on _mm256_extract_epi64. Workaround documented in the issue: -Denable_asm=false.
- Touches:
- build-aux/i686-linux-gnu.ini — new cross-file; gcc + -m32 + cpu_family = 'x86' / cpu = 'i686'. No exe_wrapper.
- .github/workflows/libvmaf-build-matrix.yml — new matrix row with an i686: true flag + a new install-deps step for gcc-multilib + g++-multilib; the existing "Run tests" + "Run tox tests (ubuntu)" steps are widened with && !matrix.i686 guards.
- Invariants:
- The i686 matrix row pins -Denable_asm=false — this is the upstream-documented workaround for _mm256_extract_epi64's missing declaration on 32-bit x86 targets. Do NOT remove the flag without first gating every _mm256_extract_epi64 call site in core/src/feature/x86/adm_avx2.c + motion_avx2.c + adm_avx512.c on __x86_64__. Removing the flag naively will re-break the build.
- No exe_wrapper in the cross-file: meson marks tests as SKIP 77 even though the host can run i686 binaries natively. Build-only gate by design.
- On upstream sync:
- If upstream Netflix fixes #1481 at source (by gating the intrinsic calls on __x86_64__ or by emulating via two _mm256_extract_epi32 halves), sync the fix and re-enable ASM on the i686 row (drop -Denable_asm=false from meson_extra). Re-verify bit-exactness via /cross-backend-diff on the x86_64 golden pair.
- If upstream marks i686 unsupported in meson (e.g. via a hard error), the fork's i686 row should be removed or downgraded to continue-on-error: true.
- Re-test on rebase (Ubuntu host with gcc-multilib):
meson setup core core/build-i686 \
--cross-file=build-aux/i686-linux-gnu.ini \
-Denable_asm=false \
-Denable_cuda=false -Denable_sycl=false
ninja -C core/build-i686
file core/build-i686/tools/vmaf
# Expect: ELF 32-bit LSB pie executable, Intel i386
CI runs this same sequence via the new matrix row.
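A sketch of the __x86_64__ gating that must precede dropping -Denable_asm=false. extract_lane2_epi64 is a hypothetical helper; __attribute__((target("avx2"))) and __builtin_cpu_supports are GCC/Clang extensions used only so the sketch builds and runs without a global -mavx2. On 32-bit x86 the #else branch rebuilds the 64-bit lane from two _mm256_extract_epi32 halves, one of the fix shapes listed above.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* _mm256_extract_epi64 is only declared for 64-bit targets, so 32-bit
 * builds reassemble the lane from its two 32-bit halves. */
__attribute__((target("avx2")))
static int64_t extract_lane2_epi64(const int64_t in[4])
{
    const __m256i v = _mm256_loadu_si256((const __m256i *)in);
#if defined(__x86_64__)
    return _mm256_extract_epi64(v, 2);
#else
    /* 64-bit lane 2 = 32-bit lanes 4 (low) and 5 (high), little-endian. */
    const uint32_t lo = (uint32_t)_mm256_extract_epi32(v, 4);
    const uint32_t hi = (uint32_t)_mm256_extract_epi32(v, 5);
    return (int64_t)(((uint64_t)hi << 32) | lo);
#endif
}
```

Built with -m32 the same source takes the #else branch; the result must match the x86_64 path bit for bit, which is what the /cross-backend-diff re-check above is meant to confirm.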
0058 — Tiny-AI Netflix corpus training scaffold (ADR-0252)¶
- ADR: ADR-0252.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training harness or MCP server.
- Touches:
- ai/ — training harness; the NflxLocalDataset loader reads from --data-root (never from a hardcoded path).
- docs/ai/training-data.md — corpus path convention and loader API docs; purely additive.
- mcp-server/vmaf-mcp/tests/test_smoke_e2e.py — new e2e smoke test; references only committed golden fixtures.
- Invariants (load-bearing):
- Data path is local-only. .workingdir2/netflix/ is gitignored; no YUV from this corpus is ever committed. The --data-root CLI flag must remain the sole mechanism for locating the corpus.
- Smoke test uses only committed fixtures. test_smoke_e2e.py references python/test/resource/yuv/src01_hrc00_576x324.yuv (a committed golden file), never the local corpus path. On upstream sync the golden YUV path must stay stable.
- No Netflix golden assertion is modified. The places=4 tolerance in test_smoke_e2e.py asserts against the vmaf_v0.6.1 CPU reference; it is not a golden assertion and may be adjusted by /regen-snapshots with justification.
- On upstream sync: zero interaction with Netflix upstream. The ai/ subtree and mcp-server/ are wholly fork-local; upstream merges are conflict-free here. If Netflix ever ships a training harness, reconcile separately.
- Re-test on rebase:
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (vmaf binary)
# Skips automatically if binary or golden YUV is absent.
0085 — Research-0030 Phase-3b multi-seed validation (Gate 1 passed)¶
- No ADR. Empirical research digest closing Gate 1 of the 3-gate v2 validation chain. Architecture decision unchanged.
- Upstream source: fork-local. Netflix has no multi-seed validation surface for tiny-AI training.
- Touches (additive only):
- docs/research/0030-phase3b-multiseed-validation.md — per-seed PLCC tables + stability analysis + Gate 2/3 plan.
- ai/scripts/phase3_subset_sweep.py — adds a --seeds flag (comma-separated list) + per-seed result aggregation.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- The +0.0175 Δ is the multi-seed mean PLCC, not the seed-0 PLCC. Don't cite the +0.0106 from Research-0029 once Research-0030 lands; the multi-seed number is more trustworthy.
- Subset B is more stable than canonical-6 across seeds. Don't ship a v2 model citing single-seed numbers — always report multi-seed mean ± seed-mean-std for any tiny-AI metric in a future digest.
- The --seeds flag aggregates by flattening (seed × fold) pairs. The reported mean_plcc is the mean of all n_seeds × n_folds measurements; seed_mean_plcc_std is the std across per-seed means, which is the right number for "is the result seed-stable".
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the runs/ files reproduce from the canonical command.
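Written out, with S = n_seeds, F = n_folds and r_{s,f} the held-out PLCC of fold f under seed s (the population normalisation 1/S is an assumption; the script may use 1/(S - 1)):

$$
\text{mean\_plcc} = \frac{1}{SF}\sum_{s=1}^{S}\sum_{f=1}^{F} r_{s,f},\qquad
\bar r_s = \frac{1}{F}\sum_{f=1}^{F} r_{s,f},\qquad
\text{seed\_mean\_plcc\_std} = \sqrt{\frac{1}{S}\sum_{s=1}^{S}\left(\bar r_s - \bar r\right)^2},\quad
\bar r = \frac{1}{S}\sum_{s=1}^{S}\bar r_s
$$

Because every seed contributes the same F folds, mean_plcc equals the mean of the per-seed means. A std taken over all S·F values would instead fold the fold-to-fold (per-source) spread into the seed-stability figure.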
0084 — Research-0029 Phase-3b StandardScaler retry (positive result)¶
- No ADR. Empirical research digest; revives the Research-0026 hypothesis after the Research-0028 negative result. The architectural decision (ship vmaf_tiny_v2) is gated on three validation steps documented in the digest §"Required before shipping".
- Upstream source: fork-local. Netflix has no tiny-AI preprocessing-sensitivity analysis surface.
- Touches (additive only):
- docs/research/0029-phase3b-standardscaler-results.md — per-fold tables + apples-to-apples comparison + 3-gate pre-shipping checklist.
- ai/scripts/phase3_subset_sweep.py — adds a --standardize flag + _standardize_inplace helper.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- StandardScaler statistics MUST be fit per-fold on the train split only. Fitting on the full data would leak held-out information into LOSO; the _standardize_inplace helper enforces this by taking only the train slice as input.
- A shipped vmaf_tiny_v2.onnx MUST bundle its scaler (mean, std) in the sidecar JSON per ADR-0049 — otherwise inference applies different normalisation than training and the win evaporates. Currently UN-implemented; tracked as a §"Caveats" #5 follow-up.
- Subset B's feature list is the load-bearing finding: adm2, adm_scale3, vif_scale2, motion2, ssimulacra2, psnr_hvs, float_ssim. Phase-3c experiments may shift the optimal arch / lr / epochs but should keep this set.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the runs/ files are reproducible from the --standardize invocation in §"Reproducer".
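Per LOSO fold k, with train indices T_k and held-out indices H_k, the leakage-free standardisation of feature j is shown below (using the population std that scikit-learn's StandardScaler uses; whether _standardize_inplace uses the same normaliser is an assumption):

$$
\mu_{k,j} = \frac{1}{|T_k|}\sum_{i\in T_k} x_{i,j},\qquad
\sigma_{k,j} = \sqrt{\frac{1}{|T_k|}\sum_{i\in T_k}\left(x_{i,j}-\mu_{k,j}\right)^2},\qquad
\tilde x_{i,j} = \frac{x_{i,j}-\mu_{k,j}}{\sigma_{k,j}}\quad \text{for } i \in T_k \cup H_k
$$

The held-out rows H_k never contribute to μ or σ. The analogous (μ, σ) pair fit on the shipped model's training data is what the ADR-0049 sidecar JSON would need to carry.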
0082 — Research-0028 Phase-3 subset sweep (negative-result digest)¶
- No ADR. Empirical research digest. The architectural decision (no v2 model ships from this Phase) is governed by Research-0027's pre-registered stopping rule.
- Upstream source: fork-local. Netflix has no tiny-AI subset-sweep surface.
- Touches (additive only):
- docs/research/0028-phase3-subset-sweep.md — per-fold tables adline + standardisation caveat + Phase-3b/c/d follow-ups.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- canonical-6 stays the default until Phase-3b lands a ≥ 0.005 PLCC win (per the Research-0027 stopping rule).
- The PLCC drop is most likely a feature-scale issue, not evidence the new features lack signal. Don't cite this digest to retire ssimulacra2 / adm_scale3 from the candidate pool; re-test with StandardScaler first.
- Phase-3 results are seed=0 only. Any v2-shipping decision needs a 3-seed mean ± std and a KoNViD cross-check.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; runs/ files are reproducible from the canonical command in §"Reproducer".
0081 — Research-0027 Phase-2 feature importance results¶
- No ADR. Empirical research digest closing Research-0026 Phase 2; the architectural decision (Subset A / B / C) is deferred to Phase-3 results in a future digest.
- Upstream source: fork-local. Netflix has no cross-metric feature-importance analysis surface.
- Touches (additive only):
- docs/research/0027-phase2-feature-importance.md — per-method top-10 + consensus + redundancy + Phase-3 subset recommendations.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- The consensus top-10 is the load-bearing finding: adm2, adm_scale3, ssimulacra2, vif_scale2. Phase-3 candidate subsets MUST include all four.
- The 11-pair redundancy table is corpus-specific — measurements on Netflix Public 9-source. A KoNViD-1k cross-check is a Phase-3 prerequisite if Subsets B/C advance.
- runs/full_features_netflix.parquet and runs/full_features_correlation.json stay gitignored. The reproducer in §"Reproducer" regenerates both.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the
runs/files are reproducible from the canonical commands.
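To make the "MUST include all four" rule mechanical, a candidate-subset definition can be checked against the consensus features before any Phase-3 run. A minimal sketch; REQUIRED_FEATURES, missing_required and validate_subset are hypothetical helpers, and feature names are assumed to be plain strings as they appear in this note.

```python
# The four consensus features every Phase-3 candidate subset must contain.
REQUIRED_FEATURES = frozenset({"adm2", "adm_scale3", "ssimulacra2", "vif_scale2"})


def missing_required(subset):
    """Return the required features absent from a candidate subset, sorted."""
    return sorted(REQUIRED_FEATURES - set(subset))


def validate_subset(name, subset):
    """Raise ValueError if a candidate subset drops any consensus feature."""
    missing = missing_required(subset)
    if missing:
        raise ValueError(f"subset {name!r} is missing required features: {missing}")


validate_subset("A", ["adm2", "adm_scale3", "ssimulacra2", "vif_scale2", "motion2"])
```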
0080 — Phase-2 analysis scripts (Research-0026 Phase 2 prep)
- No ADR. Pure analysis scaffolding; the architectural decision (which features to ship in v2) is gated on Phase 2's numerical output via Research-0027.
- Upstream source: fork-local. Netflix has no tiny-AI training or cross-metric correlation tooling.
- Touches (additive only):
- ai/scripts/extract_full_features.py — parquet extractor over the Netflix corpus with FULL_FEATURES. Per-clip JSON cache at $XDG_CACHE_HOME/vmaf-tiny-ai-full/<source>/<dis_stem>.json.
- ai/scripts/feature_correlation.py — Pearson + MI + LASSO consensus top-K analyser; outputs JSON.
- ai/tests/test_feature_correlation.py — 5 pytest cases against synthetic parquet (no libvmaf dependency).
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- The per-clip JSON cache and the FULL_FEATURES tuple must stay in lock-step. If the tuple grows (or shrinks), pre-existing cache files become stale and silently misalign their stored per_frame columns with the new tuple. The extractor MUST be re-run with a cleared cache whenever FULL_FEATURES changes. Regression hint: test_default_features_unchanged in test_feature_sets.py already guards the canonical 6; extend coverage to FULL_FEATURES if a rebase touches it.
- motion3 resolves to extractor motion_v2 (the upstream-canonical extractor name in the integer_motion_v2 module) in _METRIC_TO_EXTRACTOR, not to an extractor named motion3. The CLI --feature motion3 does NOT exist. The JSON output key is integer_motion3, which _lookup finds via the integer_ fallback.
- adm and vif aggregates are NOT in FULL_FEATURES. The integer extractor emits integer_adm2 and integer_vif_scale0..3 but no bare adm/vif. Listing them produced all-NaN columns in v1; fixed in the PR #185 amend.
- On upstream sync: zero interaction. Pure fork-side analysis tooling.
- Re-test on rebase:
pytest ai/tests/test_feature_correlation.py ai/tests/test_feature_sets.py -v
# Expect: 14 passed in <1 s.
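The lock-step invariant currently relies on clearing the cache by hand. One way to turn a silent misalignment into a cache miss is to stamp each per-clip entry with a fingerprint of the ordered feature tuple. A minimal sketch under that assumption: the features_fingerprint / load_cached_clip / store_cached_clip helpers and the features_fingerprint field are hypothetical and are not the format extract_full_features.py writes today.

```python
import hashlib
import json


def features_fingerprint(features):
    """Short, stable hash of an ordered feature tuple."""
    payload = json.dumps(list(features)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def load_cached_clip(path, features):
    """Return the cached entry, or None if absent or written for another tuple."""
    try:
        with open(path, encoding="utf-8") as fh:
            entry = json.load(fh)
    except FileNotFoundError:
        return None
    if entry.get("features_fingerprint") != features_fingerprint(features):
        return None  # stale: the feature tuple changed since this entry was written
    return entry


def store_cached_clip(path, features, per_frame):
    """Write per-frame data together with the fingerprint of the tuple used."""
    entry = {"features_fingerprint": features_fingerprint(features), "per_frame": per_frame}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entry, fh)
```

Because the fingerprint hashes the tuple in order, reordering FULL_FEATURES also invalidates the cache, which matches the note's concern about misaligned per_frame columns.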
0079 — Tiny-AI feature-set registry (Research-0026 Phase 1)
- No ADR. Pure additive extension of an existing module; the architectural decision (which features, which model) lives in Research-0026's go/no-go gate after Phase 2.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training pipeline.
- Touches (additive only):
- ai/data/feature_extractor.py — adds FULL_FEATURES (21 entries), the FEATURE_SETS registry and a resolve_feature_set() helper. _METRIC_TO_EXTRACTOR grew 11 → 25 entries.
- ai/tests/test_feature_sets.py — new 9-test smoke suite.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant — these are load-bearing):
- DEFAULT_FEATURES stays the canonical 6-tuple matching vmaf_v0.6.1's SVR input layout. Test test_default_features_unchanged is the regression guard; any quiet broadening would invalidate every shipped tiny-AI ONNX (the input dimension is baked into the model). If a future change must broaden the default, ship a paired model swap under the ADR-0049 sidecar policy.
- FULL_FEATURES excludes lpips and float_moment per Research-0026 §"Open questions" Q1; test test_full_features_excludes_lpips_and_moment enforces this. Adding either would re-classify the experiment from "tiny model on classical features" to "ensemble of DNNs".
- Every entry in FULL_FEATURES MUST have an entry in _METRIC_TO_EXTRACTOR. Test test_every_full_feature_has_extractor_mapping is the guard; without the mapping the libvmaf CLI silently emits NaN columns for the missing metric.
- On upstream sync: zero interaction. Fork-only training surface.
- Re-test on rebase:
0078 — Research-0026 cross-metric feature fusion plan
- No ADR. Pure research-plan digest; the architectural decision (which features to add) is deferred to Research-0027 follow-up after Phase 2 numbers land.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training and no broader-feature-set hypothesis under investigation.
- Touches (additive only):
- docs/research/0026-cross-metric-feature-fusion.md — 4-phase experimental plan + cost estimate + go/no-go criteria.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- The 6-feature canonical baseline (adm2, vif_scale0..3, motion2) stays the default. Any v2 model is opt-in via a new feature_set field in the sidecar JSON; existing vmaf_tiny_v1.onnx users get the same numbers.
- lpips is OUT of the candidate pool (Phase 1/2). It is DNN-based and would blur the line between "tiny model on classical features" and "ensemble of DNNs". Revisit only if classical features can't close the gap.
- On upstream sync: zero interaction. Pure fork-side research planning.
- Re-test on rebase: documentation-only; no test surface.
0077 — Research-0025 FoxBird outlier resolved via KoNViD combined training
- No ADR. Empirical research digest closing the open question in Research-0023 §5; no architecture or policy decision. Pure documentation of an empirical result.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training, no KoNViD-1k integration, and no LOSO eval surface.
- Touches (additive only):
- docs/research/0025-foxbird-resolved-via-konvid.md — per-clip table + comparison to Netflix-only baselines + interpretation + caveats + next-experiment list.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- The training-fit per-clip numbers in §"Per-clip result" are NOT held-out generalisation metrics — FoxBird is in the training set. The proper validation is the LOSO sweep on the combined corpus (§"Next experiments" #1). Don't cite the 0.9936 FoxBird PLCC as a generalisation number; cite it as "training-fit on combined corpus, 5.4× RMSE improvement vs Netflix-only".
- The combined-trainer command line is canonical. The reproduction recipe in §"Setup" includes --seed 0, --konvid-val-fraction 0.1, --val-source Tennis and --val-mode netflix-source-and-konvid-holdout. Changing any knob invalidates the per-clip numbers.
- runs/tiny_combined_canonical/ stays gitignored. The final ONNX is reproducible from the parquet + Netflix corpus + the canonical CLI; the durable record is the digest's table.
- On upstream sync: zero interaction. The research digest is fork-only.
- Re-test on rebase:
python ai/train/train_combined.py \
--netflix-root .workingdir2/netflix \
--konvid-parquet ai/data/konvid_vmaf_pairs.parquet \
--model-arch mlp_small --epochs 30 --batch-size 256 --lr 1e-3 \
--val-mode netflix-source-and-konvid-holdout \
--val-source Tennis --konvid-val-fraction 0.1 --seed 0 \
--out-dir runs/tiny_combined_canonical
# Expect: FoxBird PLCC ≈ 0.9936 ± 1e-3 (numerical-noise floor),
# mean PLCC ≥ 0.9983 across 9 Netflix clips.
0076 — Research-0024 vif/adm upstream-divergence digest (Strategy E doc)
- No ADR. Pure documentation digest; the divergence decisions it ratifies are already governed by ADR-0138 / 0139 / 0142 / 0143 (vif SIMD bit-exactness contract) and ADR-0024 (Netflix golden-data immutability). The digest itself fits the per-PR research-digest deliverable bar from ADR-0108.
- Upstream source: forward-looking; pre-emptively documents the fork's non-port of Netflix 4ad6e0ea/41d42c9e/bc744aa3/8c645ce3 (vif chain) and 4dcc2f7c (float_adm chain). Strategy A on the b949cebf motion chain stays approved.
- Touches (additive only):
- docs/research/0024-vif-upstream-divergence.md — 5-strategy decision matrix + numerical-risk analysis for each chain.
- core/src/feature/AGENTS.md — two new "rebase-sensitive invariants" entries pinning the vif and adm divergences.
- CHANGELOG.md Unreleased § Changed.
- Invariants (rebase-relevant — these are the whole point):
- Do not port 4ad6e0ea (vif runtime helpers) or 8c645ce3 (vif prescale options) verbatim. They replace the precomputed vif_filter1d_table_s table, whose frozen const float Gaussians make AVX2 == AVX-512 == NEON == scalar bit-for-bit. A future opt-in second-path port (Strategy C, runtime helpers behind --vif-prescale != 1) is allowed but must not touch the default code path.
- Do not port the 4dcc2f7c float_adm options chain. The 12-parameter compute_adm signature change cascades through SIMD (avx2 / avx512 / neon) and 3 GPU backends (vulkan / cuda / sycl). The new aim feature has no fork-side golden values; defer until there is concrete user demand.
- Mirroring bugfix 41d42c9e is a separate decision. It must come paired with a places=4 → places=3 golden loosening per the ADR-0142 Netflix-authority precedent. It is not part of Strategy E; it is eligible for a focused single-purpose PR if any shipped model drifts more than places=3 because of the missing fix.
- The b949cebf motion chain port stays APPROVED under Strategy A (verbatim, float_motion side only). float_motion has no precomputed-table investment to protect, and the fork's integer_motion already has 6 of these 9 options, so mirroring them onto float_motion is cheap.
- On upstream sync: zero conflict; pure additions to research/ and AGENTS.md.
- Re-test on rebase: documentation-only PR; rendered markdown is the only verification surface.
# Re-run the diff scan that produced the digest (catches new
# upstream commits since 9dac0a59):
git fetch upstream && git log --pretty=format:'%h %s' \
upstream/master ^origin/master --since="2026-01-01" \
-- core/src/feature/{float_,integer_,}{vif,motion,adm,cambi}*.{c,h} \
core/src/feature/{vif,motion,adm,cambi}_options.h \
| head -30
# If new vif / adm option ports appear, update Research-0024 §"Same
# divergence test for motion + float_adm" before deciding to port.
0075 — Upstream 798409e3 + 314db130 ports (CUDA null-deref + remove all.c)
- No ADR. Pure upstream cherry-picks per the ADR-0108 carve-out ("pure upstream syncs and port-upstream-commit PRs are exempt").
- Upstream source:
- 798409e3 (Lawrence Curtis, 2026-04-20): "Fix null deref crash on prev_ref update in pure CUDA pipelines"
- 314db130 (Kyle Swanson, 2026-04-28): "libvmaf/feature: remove empty translation unit all.c"
- Touches (additive / removal only):
- core/src/libvmaf.c — adds an if (ref && ref->ref) guard before vmaf_picture_ref(&vmaf->prev_ref, ref) at the two threaded paths (threaded_enqueue_one at line 1057 and threaded_read_pictures_batch at line 1105). The main path at line 1597 already has the guard.
- core/src/feature/all.c — file deleted.
- core/src/meson.build — drops the feature_src_dir + 'all.c' line.
- core/src/feature/offset.c — updates the // NOLINTNEXTLINE comment to drop all.c from the list of per-feature consumers.
- CHANGELOG.md Unreleased § Fixed (798409e3) + § Changed (314db130).
- Invariants (rebase-relevant):
- The fork has THREE prev_ref update sites; all need the if (ref && ref->ref) guard. The main vmaf_read_pictures path already had it (via the read_pictures_update_prev_ref helper); the threaded paths (#ifdef VMAF_BATCH_THREADING) inherited the unguarded shape from upstream's old code. Future upstream rebases must preserve all three guards even if Netflix refactors the threaded paths.
- The all.c deletion is symbol-safe. All compute_* functions it forward-declared are reached via per-extractor TUs that #include the relevant <feature>.h. There is no external linker dependency on all.c's symbols.
- On upstream sync: zero conflict expected; the fork now matches upstream tip on these two surfaces.
- Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false \
-Denable_vulkan=disabled
ninja -C build-cpu
meson test -C build-cpu # 37 tests, all pass.
0074 — Combined Netflix + KoNViD-1k trainer driver
- No ADR. Pure engineering follow-up; the architecture rationale is fully covered by ADR-0203 (training-prep architecture) and Research-0023 §5 (FoxBird-class outlier needs broader corpus).
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI trainer.
- Stacks on the KoNViD-1k loader bridge (PR #178 / rebase-note 0073). Rebase order: land 0073 first.
- Touches (additive only):
- ai/train/train_combined.py — concatenating trainer that reuses _build_model / _train_loop / export_onnx from ai/train/train.py.
- ai/tests/test_train_combined_smoke.py — 5 pytest cases (key splitter + --epochs 0 paths; no libvmaf or real corpus required).
- docs/ai/training.md — "Combining KoNViD with the Netflix corpus" subsection rewritten from "follow-up" to runnable.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- Reuse the canonical training-loop helpers. Don't fork _build_model / _train_loop / export_onnx into this file. Both trainers must share the model factory so a future change (e.g. adding mlp_large) lands in one place.
- KoNViD train/val splits hold out whole clip keys, not random frames. A frame-level split would let frames from the same clip leak across train/val and inflate PLCC by 5-10 pp (a well-known VQA pitfall; same reasoning as ADR-0203's Netflix 1-source-out split).
- Missing data falls back instead of erroring. Missing --konvid-parquet → Netflix-only path. Missing --netflix-root → KoNViD-only path. Both missing → initial-weights ONNX export + rc=0, so the smoke command always produces a deterministic artefact.
- On upstream sync: zero interaction; pure fork-local trainer.
- Re-test on rebase:
pytest ai/tests/test_train_combined_smoke.py -v
# Expect: 5 passed (under ~3 s, no libvmaf required).
python ai/train/train_combined.py --epochs 0 \
--netflix-root /tmp/missing --konvid-parquet /tmp/missing.parquet \
--out-dir /tmp/combined_smoke
# Expect: <out-dir>/mlp_small_combined_final.onnx written, rc=0.
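The clip-key holdout invariant can be illustrated with a group-aware split: keys are shuffled and partitioned first, and every frame follows its key. A minimal sketch, assuming rows are (key, frame_index, ...) tuples; split_by_clip_key is a hypothetical helper, not the splitter in train_combined.py, and val_fraction plays the role of --konvid-val-fraction.

```python
import random


def split_by_clip_key(rows, val_fraction, seed=0):
    """Split frame rows into (train, val) so each clip key lands on one side only.

    rows: iterable of tuples whose first element is the clip key.
    """
    rows = list(rows)
    keys = sorted({row[0] for row in rows})  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(keys)
    n_val = max(1, round(len(keys) * val_fraction)) if keys else 0
    val_keys = set(keys[:n_val])
    train = [r for r in rows if r[0] not in val_keys]
    val = [r for r in rows if r[0] in val_keys]
    return train, val
```

Splitting on keys rather than frames is what prevents the same clip's frames from appearing on both sides.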
0073 — KoNViD-1k → VMAF-pair acquisition + loader bridge
- No ADR. Acquisition + loader pieces are pure additions; the methodology fits inside ADR-0203 / Research-0019.
- Upstream source: fork-local. KoNViD-1k integration is a fork-only training-data play.
- Touches (additive only):
- ai/scripts/konvid_to_vmaf_pairs.py — acquisition pipeline.
- ai/train/konvid_pair_dataset.py — KoNViDPairDataset class mirroring NetflixFrameDataset's interface.
- ai/tests/test_konvid_pair_dataset.py — 5 pytest cases.
- docs/ai/training.md — new "C1 (KoNViD-1k corpus)" section.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- KoNViDPairDataset mirrors NetflixFrameDataset's shape: feature_dim == 6, and numpy_arrays() → (X, y) returns (n_frames, 6) + (n_frames,). If NetflixFrameDataset's feature order changes, mirror it here.
- The acquisition parquet schema is fixed. Required columns: key, frame_index, vif_scale0..3, adm2, motion2, vmaf. Add columns freely; do NOT rename or drop those.
- ai/data/konvid_vmaf_pairs.parquet and $VMAF_TINY_AI_CACHE/konvid-1k/ stay gitignored. They regenerate from the raw KoNViD .mp4 sources.
- On upstream sync: zero interaction.
- Re-test on rebase:
pytest ai/tests/test_konvid_pair_dataset.py -v
# Expect: 5 passed
python ai/scripts/konvid_to_vmaf_pairs.py --max-clips 5
# Expect: ~7 s wall, ai/data/konvid_vmaf_pairs.parquet with
# 5 unique keys × ~200 frames each.
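A rebase that touches the acquisition script can be checked against the fixed schema with a column-name test that tolerates extra columns but rejects renames or drops. A minimal sketch; REQUIRED_COLUMNS expands vif_scale0..3 into four names (an assumption about the literal column names), and check_schema is a hypothetical helper.

```python
# Columns the acquisition parquet must always carry; extra columns are allowed.
REQUIRED_COLUMNS = (
    "key", "frame_index",
    "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3",
    "adm2", "motion2", "vmaf",
)


def check_schema(column_names):
    """Raise ValueError if any required column was renamed or dropped."""
    present = set(column_names)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        raise ValueError(f"konvid parquet missing required columns: {missing}")
```

With pyarrow installed, the column names can be read without loading the data via pyarrow.parquet.read_schema(path).names.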
0072 — Tiny-AI 3-arch LOSO eval harness + Research-0023
- No ADR. Methodology fits inside Research-0023; ADR-0203 already covers the training-prep architecture and the three-arch sweep concept.
- Research digest: docs/research/0023-loso-3arch-results.md.
- Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
- Touches (additive only):
- ai/scripts/eval_loso_3arch.py — new harness; reuses the _load_session + _load_clip + CLIPS helpers from eval_loso_mlp_small.py (PR #165).
- docs/research/0023-loso-3arch-results.md — methodology + per-fold tables for mlp_small / mlp_medium / linear.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- Reuse the PR #165 helpers. Don't fork the _load_session external-data workaround into a copy; both scripts must keep using the same import. If a follow-up re-exports the shipped baselines with corrected external_data.location values, both scripts deprecate the workaround simultaneously.
- runs/ and model/tiny/training_runs/ stay gitignored. The harness writes runs/loso_eval/loso_3arch_eval.{json,md}; the durable record is the table in Research-0023 §2 + the per-fold tables in §3. Regenerate via the loop in §6 of the digest.
- On upstream sync: zero interaction. Pure fork-local evaluation harness.
- Re-test on rebase:
python ai/scripts/eval_loso_3arch.py
diff <(jq -r '.archs.mlp_small.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9808)
diff <(jq -r '.archs.mlp_medium.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9727)
diff <(jq -r '.archs.linear.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.3679)
# Expect: identical lines on a populated cache + identical fold ONNX.
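The jq comparisons above can also be scripted in Python when jq is unavailable. A minimal sketch, assuming the .archs.<arch>.aggregate.mean_plcc layout used by the jq filters; check_mean_plcc is a hypothetical helper, and comparing after rounding to 4 places is looser than the exact-text diff.

```python
import json


def check_mean_plcc(path, expected, places=4):
    """Compare each arch's aggregate mean PLCC with an expected value.

    expected: mapping of arch name -> expected mean PLCC.
    Returns a list of (arch, got, want) tuples that differ after rounding.
    """
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    mismatches = []
    for arch, want in expected.items():
        got = report["archs"][arch]["aggregate"]["mean_plcc"]
        if round(got, places) != round(want, places):
            mismatches.append((arch, got, want))
    return mismatches


# Example call against the harness output:
# check_mean_plcc("runs/loso_eval/loso_3arch_eval.json",
#                 {"mlp_small": 0.9808, "mlp_medium": 0.9727, "linear": 0.3679})
```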
0071 — T7-16 ADM Vulkan/SYCL drift verified-resolved (doc close)
- No ADR. Verification-only close, sister of T7-15.
- Upstream source: fork-local. ADM cross-backend gate is a fork-only test surface; Netflix/vmaf has no Vulkan or SYCL backend.
- Touches (additive only):
- docs/state.md — new "Recently closed" row for T7-16.
- .workingdir2/BACKLOG.md — T7-16 row marked closed (local-only planning dossier; gitignored).
- CHANGELOG.md Unreleased § Fixed.
- Invariants (rebase-relevant):
- places=4 cross-backend ADM contract. Empirical adm_scale2 max_abs_diff is now 1e-6 (print floor; ULP=0) on Vulkan device 0 (NVIDIA), device 1 (Mesa anv on Arc), and SYCL device 0 (Arc); the residual adm_scale1 ≈ 3.1e-5 and adm2 ≈ 5e-6 on 1/48 frames pass places=4 (5e-5 tolerance) but fail places=5. Hold the gate at places=4.
- No ADM kernel source change. The fix is environmental (NVCC + driver + SYCL runtime).
- On upstream sync: zero interaction.
- Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--feature adm --backend vulkan --device 0 --places 4 \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324
# Expect: 0/48 mismatches across all 5 ADM metrics.
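The places=4 / places=5 split for the quoted residuals follows from the half-unit tolerance. A minimal sketch, assuming the diff tool treats places=N as |diff| < 0.5 * 10**-N, which matches the 5e-5 tolerance quoted for places=4; passes_places is a hypothetical helper, not a function in cross_backend_vif_diff.py.

```python
def passes_places(abs_diff, places):
    """True if a difference is inside the half-unit tolerance implied by `places`.

    Assumed convention: places=N allows |diff| < 0.5 * 10**-N
    (places=4 -> 5e-5, places=5 -> 5e-6).
    """
    return abs(abs_diff) < 0.5 * 10 ** -places


# The adm_scale1 residual quoted above (~3.1e-5):
for places in (4, 5):
    print(places, passes_places(3.1e-5, places))  # prints "4 True", then "5 False"
```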
0070 — T7-15 motion CUDA/SYCL drift verified-resolved (doc close)
- No ADR. Verification-only close; no code change in PR #172.
- Upstream source: fork-local. Cross-backend gate is a fork-only test surface; not in Netflix/vmaf.
- Touches (additive only):
- docs/state.md — "Recently closed" row for T7-15.
- .workingdir2/BACKLOG.md — T7-15 row marked closed (local-only planning dossier; gitignored).
- CHANGELOG.md Unreleased § Fixed.
- Invariants (rebase-relevant):
- The places=4 cross-backend gate stays at places=4. Empirical max_abs_diff is currently 0.0 (CUDA) or 1e-6 (SYCL/Vulkan, the JSON %f rounding floor); tightening to places=5 could be tempting, but the 1e-6 print floor would then make the SYCL + Vulkan rows fail. Hold at places=4 until --precision=max is wired into the diff tool.
- No motion-kernel source change. PR #172 didn't modify core/src/feature/cuda/integer_motion/*.cu or core/src/feature/sycl/integer_motion_sycl.cpp. The fix is environmental (NVCC + driver), so the next CI run on a fresh image needs to be re-verified against the gate.
- On upstream sync: zero interaction.
- Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --feature motion --backend cuda \
--places 4
# Expect: 0/48 mismatches, max_abs_diff = 0.0
0069 — libvmaf_vulkan.h installed under prefix (build bug)
- No ADR. Build-system bug fix; matches existing CUDA / SYCL install conditions.
- Upstream source: fork-local. The Vulkan backend is fork-only; Netflix/vmaf has no libvmaf_vulkan.h.
- Touches:
- core/include/core/meson.build — adds an is_vulkan_enabled gate that handles the feature option's enabled/auto states; appends libvmaf_vulkan.h to platform_specific_headers when active.
- CHANGELOG.md Unreleased § Fixed.
- Invariants (rebase-relevant):
- The install rule mirrors the CUDA / SYCL pattern but uses the feature-option API. The is_cuda_enabled = get_option('enable_cuda') == true boolean idiom doesn't apply to enable_vulkan because that is a feature option, not a boolean. Use .enabled() or .auto(). Don't "simplify" to == true; that would silently drop the install in the auto state.
- Pairs with ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch, which probes for the header via check_pkg_config libvmaf_vulkan "libvmaf >= 3.0.0" libvmaf/libvmaf_vulkan.h vmaf_vulkan_state_init_external. Removing the install rule re-introduces Lawrence's 2026-04-28 symptom: FFmpeg silently drops the libvmaf_vulkan filter despite --enable-libvmaf-vulkan.
- On upstream sync: zero interaction; the Vulkan backend is fork-only.
- Re-test on rebase:
cd libvmaf
CC=icx CXX=icpx meson setup build -Denable_vulkan=enabled \
-Denable_cuda=true -Denable_sycl=true -Db_lto=false
ninja -C build
meson install -C build --destdir /tmp/libvmaf-install
ls /tmp/libvmaf-install/usr/local/include/libvmaf/libvmaf_vulkan.h
# Expect: file exists.
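The difference between the two install gates comes down to a truth table over the option's three states. A minimal Python model of that table, not Meson code: Feature and both gate functions are hypothetical, and the buggy variant encodes this note's claim that a boolean-style == true check drops the auto state.

```python
from enum import Enum


class Feature(Enum):
    """Hypothetical model of a Meson feature option's three states."""
    ENABLED = "enabled"
    AUTO = "auto"
    DISABLED = "disabled"


def install_vulkan_header(opt):
    # Models the `.enabled() or .auto()` gate: install unless explicitly disabled.
    return opt in (Feature.ENABLED, Feature.AUTO)


def install_vulkan_header_buggy(opt):
    # Models a boolean-style `== true` check, which only ever matches one state.
    return opt == Feature.ENABLED


for state in Feature:
    print(state.value, install_vulkan_header(state), install_vulkan_header_buggy(state))
```

Only the auto row differs between the two gates, which is the state a default build is most likely to use.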
0066 — --backend cuda inverted-gpumask fix (CLI bug)
- No ADR. Bug fix; behaviour now matches the public-header VmafConfiguration::gpumask contract.
- Upstream source: fork-local. The --backend CLI selector was added by the fork (Netflix/vmaf has no exclusive-backend selector).
- Touches (additive + 1-line behavioural fix):
- core/tools/cli_parse.c::parse_cli_args — the --backend cuda branch sets gpumask = 0 (was gpumask = 1).
- core/test/test_cli_parse.c — 5 new regression tests (test_backend_{cpu,cuda_engages_cuda,cuda_preserves_explicit_gpumask,sycl,vulkan}) plus a run_aom_ctc_tests / run_backend_tests helper split to keep run_tests under the function-size budget.
- CHANGELOG.md Unreleased § Fixed.
- Invariants (rebase-relevant):
- VmafConfiguration::gpumask semantics: a non-zero gpumask disables CUDA. compute_fex_flags in src/libvmaf.c routes to CUDA only when gpumask == 0, so any code path that sets a non-zero gpumask to "request CUDA" silently disables it. The CLI's --backend cuda branch must set gpumask = 0 and rely on use_gpumask = true to trigger vmaf_cuda_state_init. Do not "fix" this back to gpumask = 1; that is the bug being fixed.
- An explicit --gpumask=N --backend cuda preserves N. A user who passes --gpumask=2 already has use_gpumask = true, so the --backend cuda branch's defaulting block (gated on !settings->use_gpumask) is skipped. The test_backend_cuda_preserves_explicit_gpumask regression locks this in.
- On upstream sync: zero interaction; --backend is fork-only.
- Re-test on rebase:
./build/test/test_cli_parse | grep -E 'backend_'
# Expect: 5 backend tests pass.
build/tools/vmaf -r REF -d DIS -w 576 -h 324 -p 420 -b 8 \
--model "path=model/vmaf_v0.6.1.json" --threads 1 \
--backend cuda --output cuda.json --json -q
python3 -c "import json; d=json.load(open('cuda.json')); \
assert len(d['frames'][0]['metrics']) == 12, 'CUDA not engaged'"
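The inverted gpumask contract and the defaulting rule can be modelled in a few lines. A minimal Python model, assuming only the two fields named in this note (gpumask and use_gpumask); Settings, apply_backend_cuda and cuda_dispatch_enabled are hypothetical stand-ins for the C code in cli_parse.c and libvmaf.c.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical stand-in for the two CLI settings fields named in this note.
    gpumask: int = 0
    use_gpumask: bool = False


def apply_backend_cuda(s):
    """Model of the fixed --backend cuda branch: default only when no --gpumask was given."""
    if not s.use_gpumask:
        s.gpumask = 0         # 0 means "do not disable CUDA"
        s.use_gpumask = True  # this is what triggers CUDA state init
    return s


def cuda_dispatch_enabled(s):
    """Model of the routing rule: CUDA runs only when the mask is in use and equals 0."""
    return s.use_gpumask and s.gpumask == 0


print(cuda_dispatch_enabled(apply_backend_cuda(Settings())))  # True
```

The second case in the note, an explicit --gpumask=2, leaves the mask untouched and therefore keeps CUDA disabled, exactly as the public-header contract says.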
0067 — Tiny-AI PTQ accuracy across Execution Providers (T5-3e)
- No ADR. Investigation/measurement PR; ADR-0129 already governs the PTQ workstream. Findings update docs/research/0006-tinyai-ptq-accuracy-targets.md §"GPU-EP quantisation"; that section was previously a deferred open question and is now the empirical landing spot.
- Research digest: same file (Research-0006).
- Upstream source: fork-local. Netflix/vmaf does not ship a PTQ harness or any tiny-AI ONNX path.
- Touches (additive only):
- ai/scripts/measure_quant_drop_per_ep.py — new sibling of measure_quant_drop.py. CPU + CUDA via ORT; Arc / OpenVINO-CPU via the native openvino Python runtime (no onnxruntime-openvino, because no cp314 wheel exists). Reuses the _load_session rename workaround from PR #165, plus a value_info-strip fix so dynamic PTQ doesn't choke on the shipped MLP ONNX.
- docs/ai/quant-eps.md — new user doc; linked from docs/ai/index.md.
- docs/research/0006-tinyai-ptq-accuracy-targets.md — refreshed header, replaced the "GPU-EP open question" with the measurement table, fixed pre-existing MD040/MD060 lints surfaced on the touched file.
- docs/ai/index.md — added the quant-eps row, rewrapped to 80 cols.
- CHANGELOG.md Unreleased § Changed.
- Invariants (rebase-relevant):
- measure_quant_drop.py (the CI gate) is unchanged; the new script is purely additive. Any rebase that conflates the two scripts must keep the CI gate CPU-only: Arc int8 is broken, so a per-EP gate would red-light every PR.
- The value_info strip is required for vmaf_tiny_v1* dynamic PTQ. The shipped MLP ONNX files duplicate weight tensors in value_info, which makes quantize_dynamic raise "Inferred shape and existing shape differ". The fix is in _save_inlined. Don't remove it during a refactor unless the underlying ONNX is regenerated.
- CUDA-12 ABI shim. ORT-GPU 1.25 wheels link libcublasLt.so.12 even on CUDA-13 hosts. The reproduction recipe pins the nvidia-*-cu12 wheels and prepends them to LD_LIBRARY_PATH. If a future ORT wheel drops the cu12 ABI, the shim can be cut; the script tolerates either, since it doesn't import any CUDA symbol itself.
- On upstream sync: zero interaction; entirely fork-local.
- Re-test on rebase:
SP=$VIRTUAL_ENV/lib/python3.14/site-packages/nvidia
export LD_LIBRARY_PATH="$SP/cublas/lib:$SP/cudnn/lib:$SP/cuda_nvrtc/lib:$SP/cuda_runtime/lib:$SP/cufft/lib:$SP/curand/lib:$SP/cusolver/lib:$SP/cusparse/lib:$SP/cuda_cupti/lib:$SP/nvtx/lib:$SP/nvjitlink/lib"
python ai/scripts/measure_quant_drop_per_ep.py \
--eps cpu cuda openvino \
--extra-fp32 vmaf_tiny_v1.onnx vmaf_tiny_v1_medium.onnx \
--out runs/quant-eps-$(date +%Y-%m-%d)
# Expected: CPU + CUDA PASS (drop ≤ 1.2e-4); OpenVINO Arc ERR
# (compile failure for Conv-int8) or NaN (MatMul-int8) until a
# newer intel_gpu plugin lands.
0065 — testdata/bench_all.sh correct backend-engagement flags
- No ADR. Bug fix; no behavioural surface change beyond "the bench actually engages the backends it claims to now."
- Upstream source: fork-local. testdata/bench_all.sh is a fork-only bench harness; not in Netflix/vmaf.
- Touches (additive only):
- testdata/bench_all.sh — switched the per-row flag pattern from the disable-only singletons (--no_sycl for "CUDA", etc.) to the correct engagement form (--gpumask=0 --no_sycl --no_vulkan for CUDA, --sycl_device=0 --no_cuda --no_vulkan for SYCL, --vulkan_device=0 --no_cuda --no_sycl for Vulkan, and --no_cuda --no_sycl --no_vulkan for CPU). Added a 4th column (Vulkan) to the comparator. Honours $VMAF_BIN for the binary path and $VMAF_ONEAPI_SETVARS for the oneAPI install location.
- CHANGELOG.md Unreleased § Fixed.
- Invariants (rebase-relevant):
- Disable-only singletons don't engage a backend. --no_sycl alone leaves CUDA available but unrequested; --no_cuda alone leaves SYCL available but unrequested. The CLI inits CUDA only when c.use_gpumask is set; SYCL only when c.sycl_device >= 0 or c.use_gpumask; Vulkan only when c.vulkan_device >= 0. Any change to those gates that drops one of the per-row flags will re-introduce the silent CPU fallback. Verify after a rebase by inspecting JSON frames[0].metrics key counts (CPU 14-15, CUDA 11-12, Vulkan ~34); see libvmaf/AGENTS.md §"Backend-engagement foot-guns".
- gpumask semantics are inverted from intuition: gpumask=0 enables CUDA dispatch; gpumask=1 disables it. The per-row CUDA flag is --gpumask=0, not --gpumask=1. Don't "fix" it to --gpumask=1 for symmetry with sycl_device/vulkan_device; that's the bug being fixed (parallel to PR #170).
- On upstream sync: zero interaction; testdata/bench_all.sh is fork-only.
- Re-test on rebase:
bash testdata/bench_all.sh # smoke
# Verify each row's JSON keys match the expected per-backend count:
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cpu.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cuda.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_vulkan.json
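The per-row engagement flags and the key-count heuristic can be kept together in one table so a rebase cannot silently drop a flag. A minimal sketch; BACKEND_FLAGS copies the flag sets listed in this note, EXPECTED_KEY_COUNTS encodes only the CPU and CUDA ranges quoted above (Vulkan's "~34" is left unchecked), and looks_engaged is a hypothetical helper.

```python
# Per-row engagement flags, copied from this note.
BACKEND_FLAGS = {
    "cpu": ["--no_cuda", "--no_sycl", "--no_vulkan"],
    "cuda": ["--gpumask=0", "--no_sycl", "--no_vulkan"],  # 0 enables CUDA
    "sycl": ["--sycl_device=0", "--no_cuda", "--no_vulkan"],
    "vulkan": ["--vulkan_device=0", "--no_cuda", "--no_sycl"],
}

# frames[0].metrics key-count ranges quoted in this note (inclusive bounds).
EXPECTED_KEY_COUNTS = {"cpu": range(14, 16), "cuda": range(11, 13)}


def looks_engaged(backend, key_count):
    """Heuristic: does a row's JSON key count match its backend? Unknown rows pass."""
    expected = EXPECTED_KEY_COUNTS.get(backend)
    return expected is None or key_count in expected
```

A CUDA row reporting 14-15 keys is the signature of the silent CPU fallback the note warns about.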
0063 — Tiny-AI LOSO eval harness for mlp_small
- No ADR. The methodology fits inside Research Digest 0022; ADR-0203 already covers the training-prep architecture.
- Research digest: docs/research/0022-loso-mlp-small-results.md.
- Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
- Touches (additive only):
- ai/scripts/eval_loso_mlp_small.py — new evaluation harness.
- docs/ai/loso-eval.md — usage doc.
- docs/research/0022-loso-mlp-small-results.md — methodology + results.
- CHANGELOG.md Unreleased § Added.
- Invariants (rebase-relevant):
- _load_session workaround for renamed-baseline ONNX. The shipped baselines model/tiny/vmaf_tiny_v1*.onnx reference their pre-rename external_data.location values. The workaround in _load_session rewrites those entries before handing the proto to ORT; removing it breaks the baseline phase. The proper fix (re-export with matching names) is tracked as a follow-up; until then this code path is load-bearing.
- runs/ and model/tiny/training_runs/ stay gitignored. The harness writes to runs/loso_eval/ by default; do NOT promote any of those outputs into the tree. The 9 fold ONNX files and the per-clip JSON cache regenerate from the corpus + trainer + libvmaf CLI.
- On upstream sync: zero interaction. Pure fork-local evaluation harness.
- Re-test on rebase:
python ai/scripts/eval_loso_mlp_small.py
diff <(jq -r '.loso_aggregate.mean_plcc' runs/loso_eval/loso_mlp_small_eval.json) <(echo 0.9808)
# Expect: identical line on a populated cache + identical fold ONNX.
0064 — Section-A audit: 9 backlog rows + ADR cross-links
- No ADR. Process / docs PR; rows trace back to the individually-cited ADRs / research digests in their own References columns.
- Decision dossier: .workingdir2/decisions/section-a-decisions-2026-04-28.md.
- Source audit: docs/backlog-audit-2026-04-28.md.
- Upstream source: fork-local. Pure backlog hygiene PR; no Netflix code touched.
- Touches (additive only):
- .workingdir2/BACKLOG.md — 9 rows: new rows T3-17, T3-18, T5-3e, T5-4, T7-35, T7-36, T7-37, T7-38, plus the T6-1a row extended with the bisect-cache fixture sub-bullet.
- docs/research/0006-tinyai-ptq-accuracy-targets.md — drops the "defer until first user" framing on the GPU-EP quantisation open question per user direction; cross-links T5-3e.
- docs/research/0020-cambi-gpu-strategies.md — the v2 follow-up section now cites T7-36 as the gate for opening the v2 row.
- docs/adr/0205-cambi-gpu-feasibility.md — the Decision section's "follow-up integration PR" now cites T7-36.
- CHANGELOG.md Unreleased § Changed.
- Invariants (rebase-relevant): none. Pure backlog text. Rebase-conflict risk is limited to the same BACKLOG.md table rows that any future row addition would touch; trivial to re-resolve.
- On upstream sync: zero interaction.
- Re-test on rebase: none — docs-only.
0062 — ssimulacra2 CUDA + SYCL twins (ADR-0206)
- ADR: ADR-0206.
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2 GPU implementation; this PR adds the CUDA + SYCL twins of the fork's ADR-0201 Vulkan kernel.
- Touches (additive + small wiring edits):
- docs/adr/0206-ssimulacra2-cuda-sycl.md and the index row in docs/adr/README.md.
- core/src/feature/cuda/ssimulacra2_cuda.{c,h} — new CUDA dispatch.
- core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu and ssimulacra2_mul.cu — new CUDA fatbins.
- core/src/feature/sycl/ssimulacra2_sycl.cpp — new SYCL extractor.
- core/src/feature/feature_extractor.c — two new extern declarations + two new entries in feature_extractor_list[].
- core/src/meson.build — adds ssimulacra2_blur + ssimulacra2_mul to cuda_cu_sources, introduces (or extends, if PR #157 / ADR-0202 landed first) the cuda_cu_extra_flags map with an ssimulacra2_blur entry, threads per_kernel_flags into the fatbin custom-target, and lists the two new C / CPP TUs.
- core/src/cuda/AGENTS.md and core/src/sycl/AGENTS.md — rebase invariant notes for the per-kernel --fmad=false flag and the -fp-model=precise SYCL build flag.
- docs/backends/cuda/overview.md, docs/backends/sycl/overview.md, docs/metrics/features.md — coverage matrix updates.
- CHANGELOG.md Unreleased § Added.
- Invariants (load-bearing on rebase):
- Per-kernel --fmad=false for ssimulacra2_blur. The IIR's o = n2 * sum - d1 * prev1 - prev2 must NOT fuse into FMAs; without the flag, the recursive Gaussian's per-step rounding compounds across the 6-scale pyramid past places=4.
- -fp-model=precise on the SYCL feature build line. Removing it drifts ssimulacra2_sycl past places=2 through the IIR.
- The hybrid host/GPU split mirrors Vulkan. The host runs YUV→RGB, XYB, downsample, and the SSIM/EdgeDiff combine in double; the GPU runs only mul + IIR blur. Any future PR that ports XYB or YUV→RGB onto the GPU MUST land alongside an updated ADR-0206 and re-validate places=4 on every Netflix CPU pair.
- The CUDA fex uses .extract (synchronous), not .submit/.collect. Per-frame raw YUV is D2H-copied from picture_cuda's device-side VmafPicture.data[] into pinned host scratch via cuMemcpy2DAsync. Skipping the copy segfaults; direct host reads on a CUdeviceptr are the failure mode the prior agent's WIP hit.
- On upstream sync: zero interaction with Netflix. The GPU coverage matrix for ssimulacra2 is wholly fork-local.
- Re-test on rebase:
```bash
meson setup build_cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary ./build_cuda/tools/vmaf \
    --feature ssimulacra2 --backend cuda --places 4 \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --pixel-format 420 --bitdepth 8
# Expect: 0/48 mismatches, max_abs_diff ~1e-6.
```
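The sketch below shows why `--fmad=false` matters. It evaluates one IIR step in Python doubles in two ways. The first rounds every multiply and subtract separately, which is what the flag enforces. The second fuses `n2 * sum` into the following subtraction; the fused multiply-add is emulated by computing the result exactly with `fractions.Fraction` and rounding once.

Assumptions:
- The helper names are hypothetical.
- The contraction shown is only one that nvcc might choose.
- The real kernel runs in fp32, not fp64.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: a*b + c computed exactly, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def iir_step_unfused(n2: float, s: float, d1: float, prev1: float, prev2: float) -> float:
    # Every operation rounds on its own, as with --fmad=false.
    return n2 * s - d1 * prev1 - prev2


def iir_step_fused(n2: float, s: float, d1: float, prev1: float, prev2: float) -> float:
    # One possible contraction (assumption): fma(n2, s, -(d1 * prev1)) - prev2.
    return fma(n2, s, -(d1 * prev1)) - prev2


# Classic case where one rounding differs from two: 0.1 * 10 - 1.
print(iir_step_unfused(0.1, 10.0, 1.0, 1.0, 0.0))  # 0.0
print(iir_step_fused(0.1, 10.0, 1.0, 1.0, 0.0))    # 5.551115123125783e-17
```

A single-step difference of 2^-54 looks harmless. In a recursive filter, though, the output feeds back as `prev1`/`prev2`, and the blur runs at six scales, so such differences accumulate.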
0061 — cambi GPU feasibility spike (ADR-0205)
- ADR: ADR-0205.
- Research digest: `docs/research/0020-cambi-gpu-strategies.md`.
- Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
- Touches (additive only):
  - `docs/adr/0205-cambi-gpu-feasibility.md`, `docs/research/0020-cambi-gpu-strategies.md`, and the `docs/adr/README.md` index row.
  - `core/src/feature/vulkan/cambi_vulkan.c`: new dormant scaffold (not yet in `vulkan_sources`, not yet registered).
  - `core/src/feature/vulkan/shaders/cambi_{derivative,decimate,filter_mode}.comp`: new reference GLSL shaders, not yet in the build's `shaders` list.
  - `core/src/feature/AGENTS.md` invariants and a `CHANGELOG.md` bullet.
- Invariants (rebase-relevant):
  - The port is hybrid host/GPU by decision. If Netflix upstream tightens the c-value formula or the histogram-update protocol, the host residual call site in the eventual `cambi_vulkan.c::cambi_vulkan_extract` must be updated alongside `cambi.c::calculate_c_values`, because the same code is reused. Do NOT translate the c-values phase to the GPU in any upstream-port PR. That optimisation belongs to the deferred v2 strategy-III PR.
  - The scaffolds are dormant in the spike PR. The `cambi_vulkan.c` extractor returns `-ENOSYS` from `cambi_vulkan_init_stub` until the integration follow-up wires it in. Do NOT register `vmaf_fex_cambi_vulkan_scaffold` in `feature_extractor.c`'s list.
  - The shaders are not in the build's shader list. Adding them to the `vulkan_shaders` list in `core/src/vulkan/meson.build` before the integration PR produces orphaned `*_spv.h` headers. Leave them alone in this spike PR.
- On upstream sync: zero interaction. `cambi.c` itself is upstream-mirrored, so Netflix changes flow through `port-upstream-commit`. Only the integration PR's host residual call site needs paired attention.
- Re-test on rebase:
```bash
meson setup build -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
```
0059 — Tiny-AI Netflix corpus training prep (ADR-0203)
- ADR: ADR-0203.
- Upstream source: fork-local. Netflix/vmaf has no equivalent training surface.
- Touches:
  - `ai/data/`: Netflix loader, libvmaf-CLI feature extractor, distillation scoring.
  - `ai/train/`: PyTorch dataset, eval harness, Lightning-style training entry point.
  - `ai/scripts/run_training.sh`: convenience wrapper.
  - `ai/tests/`: five new pytest modules (`test_netflix_loader.py`, `test_dataset.py`, `test_eval.py`, `test_train_smoke.py`, plus `conftest.py`).
  - `docs/ai/training.md`: new "C1 (Netflix corpus)" section; existing sections untouched.
  - `ai/AGENTS.md`: invariants section added.
- Invariants (load-bearing):
  - The filename ladder regex is fork-specific: `<source>_<quality>_<height>_<bitrate>.yuv` (distorted) plus `<source>_<fps>fps.yuv` (reference). Upstream may publish a different naming convention later. Do NOT merge the two: keep this loader scoped to the Netflix corpus and add a sibling loader for any upstream alternative.
  - The per-clip cache schema is consumed by both the dataset and any downstream tooling. The schema is `{features: {feature_names, per_frame, n_frames}, scores: {per_frame, pooled}}`. Any change must invalidate `$VMAF_TINY_AI_CACHE` (delete or version-tag the directory).
  - The smoke command stays runnable without a built `vmaf` binary. The `_make_zero_payload` helper in `ai.train.dataset` injects a fake payload for `--epochs 0`, so CI gates don't drag a libvmaf build into the Python test surface.
  - The YUV size probe never silently guesses. `probe_yuv_dims` either matches the 1920x1080 default, returns ffprobe's answer, or raises. Tests pass `assume_dims=(16, 16)` explicitly for synthetic fixtures.
- On upstream sync: no interaction with upstream. The `ai/` subtree is wholly fork-local.
- Re-test on rebase:
```bash
python -m pytest ai/tests/test_netflix_loader.py \
    ai/tests/test_dataset.py ai/tests/test_eval.py \
    ai/tests/test_train_smoke.py -v
python ai/train/train.py --epochs 0 --data-root /tmp/mock_corpus \
    --assume-dims 16x16 --val-source BetaSrc --out-dir /tmp/out
```
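Below is a minimal sketch of the filename ladder described above. It assumes `<quality>` is a single token without underscores, while `<source>` may contain them. Both regexes and `parse_clip_name` are hypothetical illustrations, not the loader's actual code. Unknown names return `None`, so the caller can fail loudly instead of guessing.

```python
import re
from typing import Optional

# Hypothetical patterns (assumption: <quality> contains no '_').
DIS_RE = re.compile(
    r"^(?P<source>.+)_(?P<quality>[^_]+)_(?P<height>\d+)_(?P<bitrate>\d+)\.yuv$")
REF_RE = re.compile(r"^(?P<source>.+)_(?P<fps>\d+)fps\.yuv$")


def parse_clip_name(name: str) -> Optional[dict]:
    """Classify a corpus filename as reference or distorted; None if neither."""
    m = REF_RE.match(name)
    if m:
        return {"kind": "ref", "source": m["source"], "fps": int(m["fps"])}
    m = DIS_RE.match(name)
    if m:
        return {
            "kind": "dis",
            "source": m["source"],
            "quality": m["quality"],
            "height": int(m["height"]),
            "bitrate": int(m["bitrate"]),
        }
    return None  # let the caller raise rather than guess


print(parse_clip_name("Big_Buck_crf30_1080_4500.yuv"))
```

The greedy `source` group backtracks only as far as needed. As a result, underscores inside the source name survive, and the trailing fields stay anchored to the end of the filename.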
0073 — Tiny-AI QAT trainer + first per-model QAT pass (T5-4)
- ADR: ADR-0207 (design), ADR-0208 (per-model impl).
- Touches: `ai/train/qat.py` (new), `ai/scripts/qat_train.py` (rewrite from the `NotImplementedError` scaffold), `ai/configs/learned_filter_v1_qat.yaml` (new), `ai/tests/test_qat_smoke.py` (new), `docs/ai/quantization.md` (QAT tier added). All paths are wholly fork-local; no upstream Netflix/vmaf interaction.
- Invariants:
  - The two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) is load-bearing. On PyTorch 2.11, neither ONNX exporter accepts `convert_fx` output: the legacy exporter fails on `quantized::conv2d`, and the new TorchDynamo exporter fails on `Conv2dPackedParamsBase.__obj_flatten__`. The bridge (a state-dict diff into a fresh fp32 module, then ORT static-quantize) is the only path that yields a QDQ ONNX. Do NOT collapse it to a single-step `convert_fx → torch.onnx.export` until both PyTorch issues are fixed, and re-check both exporters on each PyTorch upgrade.
  - The state-dict transfer matches by submodule name + shape. `_copy_qat_weights_into_fp32` walks the `fp32_state` keys, finds the same key in the FX-prepared module, and copies the tensor. Today's Tiny-AI models have stable submodule names (`entry`, `body.*`, `exit`). A model architecture that uses a top-level `nn.Sequential` would break this, because `prepare_qat_fx` renames Sequential children to numeric indices. The `RuntimeError("0 tensors copied")` guard catches that silent failure mode.
  - FX preparation runs on CPU. PyTorch 2.11's FX symbolic tracer is flaky on CUDA buffers, so the trainer migrates the model to CPU before `prepare_qat_fx` and back to the accelerator for the fine-tune phase. The smoke test deliberately exercises the CPU path so this stays covered.
  - The `torch.ao.quantization` deprecation will hard-fail in PyTorch 2.10. The migration target is `torchao.quantization.pt2e` (`prepare_pt2e` / `convert_pt2e`). The two-step pipeline is mostly pt2e-compatible; only the FX-prep call changes.
- On upstream sync: no interaction with upstream. The `ai/` subtree is fully fork-local.
- Re-test on rebase:
```bash
python -m pytest ai/tests/test_qat_smoke.py -v
python ai/scripts/qat_train.py \
    --config ai/configs/learned_filter_v1_qat.yaml \
    --output /tmp/qat_smoke.int8.onnx --smoke
```
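The name-plus-shape matching rule and its zero-copy guard can be sketched without PyTorch. The sketch uses plain dicts that map parameter names to `(shape, values)` pairs. `copy_matching_weights` is a hypothetical stand-in for `_copy_qat_weights_into_fp32`, not its actual body.

```python
from typing import Dict, List, Tuple

# parameter name -> (shape, flat values): a stand-in for a PyTorch state dict
StateDict = Dict[str, Tuple[Tuple[int, ...], List[float]]]


def copy_matching_weights(fp32_state: StateDict, qat_state: StateDict) -> int:
    """Copy every tensor whose name AND shape match; refuse a no-op transfer."""
    copied = 0
    for name, (shape, _) in list(fp32_state.items()):
        source = qat_state.get(name)
        if source is not None and source[0] == shape:
            fp32_state[name] = (shape, list(source[1]))
            copied += 1
    if copied == 0:
        # Same failure message as the guard described above.
        raise RuntimeError("0 tensors copied")
    return copied


fp32 = {"entry.weight": ((2,), [0.0, 0.0]), "exit.weight": ((1,), [0.0])}
qat = {"entry.weight": ((2,), [0.5, -0.5]), "exit.weight": ((1,), [1.0])}
print(copy_matching_weights(fp32, qat))  # 2
```

Consider a model whose submodules were renamed to numeric keys such as `0.weight`. Its keys would no longer line up with `entry.weight`, so nothing would match, and the guard would raise instead of letting the un-fine-tuned fp32 weights flow into export.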
0074 — GPU-parity matrix CI gate (T6-8 / ADR-0214)
- Touched surfaces (fork-local): `scripts/ci/cross_backend_parity_gate.py` (new), `.github/workflows/tests-and-quality-gates.yml` (new `vulkan-parity-matrix-gate` job), `docs/development/cross-backend-gate.md` (new), `docs/backends/index.md` (cross-backend section), `libvmaf/AGENTS.md` (rebase-sensitive invariant note).
- Why this matters on rebase: the CI lane and the matrix-gate script are entirely fork-local. Upstream Netflix/vmaf has no comparable gate, so rebase conflicts are limited to the CI workflow file when upstream rearranges its own jobs. The gate's Python script lives outside `core/src/`, so the upstream-sync path doesn't see it.
- Invariants the gate enforces:
  - The per-feature absolute tolerance is declared in one place (`FEATURE_TOLERANCE` in `scripts/ci/cross_backend_parity_gate.py`). Tightening a tolerance requires a measurement-driven follow-up ADR; loosening one requires a justification ADR (CLAUDE.md §12 r1).
  - The legacy single-feature gate `scripts/ci/cross_backend_vif_diff.py` stays for one release cycle. Sister PRs in this session add to it; the T6-8b cleanup PR deletes it once the matrix gate has soaked.
  - The CUDA / SYCL / hardware-Vulkan lanes are advisory until a self-hosted runner is registered. The script already supports them via `--backends`. Flipping the CI lane to required is a follow-up wiring change, not a code change.
- On upstream sync: no interaction with upstream `tests-and-quality-gates.yml` (the gate job is fork-added). Rebase conflicts are limited to insertion order in the workflow file.
- Re-test on rebase:
```bash
cd libvmaf && meson setup build \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build
cd ..
python3 scripts/ci/cross_backend_parity_gate.py \
    --vmaf-binary core/build/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --backends cpu vulkan \
    --json-out /tmp/parity.json --md-out /tmp/parity.md
```
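Think of the single-source tolerance table as a dict keyed by feature, consulted for every per-frame comparison. This sketch is hypothetical throughout:
- The feature names and tolerance values are placeholders, not the contents of `FEATURE_TOLERANCE`.
- `count_mismatches` is an illustrative helper, not a function from the gate script.

```python
from typing import Dict, Sequence

# Placeholder values; the real table lives in cross_backend_parity_gate.py.
FEATURE_TOLERANCE: Dict[str, float] = {
    "vif": 1e-4,
    "motion": 1e-4,
}


def count_mismatches(feature: str, cpu: Sequence[float], gpu: Sequence[float]) -> int:
    """Count frames whose absolute CPU/GPU difference exceeds the feature's tolerance."""
    if len(cpu) != len(gpu):
        raise ValueError("backends reported different frame counts")
    tol = FEATURE_TOLERANCE[feature]  # KeyError: no silent default tolerance
    return sum(1 for a, b in zip(cpu, gpu) if abs(a - b) > tol)


print(count_mismatches("vif", [0.50, 0.60], [0.50, 0.61]))  # 1
```

Looking up the tolerance with `[]` rather than `.get()` means a feature missing from the table fails loudly. This keeps the table as the only place a tolerance can come from.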
0220 — SYCL feature kernels are unconditionally fp64-free (T7-17)
- Touches: `core/src/sycl/common.cpp` (init log line), `core/src/sycl/AGENTS.md` (new invariant row), and all SYCL feature kernels under `core/src/feature/sycl/` (no diff today, but the contract pins their shape going forward).
- Invariant: every SYCL feature-kernel lambda captures and operates on `float` / integer types only. That rules out:
  - any `double` operand inside a `parallel_for` body;
  - `sycl::reduction<double>`;
  - `sycl::plus<double>`.

  A single fp64 instruction in the TU's SPIR-V module causes the Level Zero runtime to reject the entire module on Intel Arc A-series and other fp64-less devices, even when the offending kernel is never submitted. Host-side `double` (in `extract`/`flush` post-processing, score aggregation, log10 normalisation) remains fine. Concrete patterns in tree:
  - ADM gain limiting via int64 Q31 (`gain_limit_to_q31` + `launch_decouple_csf<false>` in `integer_adm_sycl.cpp`);
  - VIF gain limiting via fp32 `sycl::fmin`;
  - CIEDE / SSIM accumulators via `sycl::reduction<int64_t>` / `sycl::plus<int64_t>`.
- On upstream sync: Netflix/vmaf has no SYCL backend upstream, so conflicts cannot enter via `git merge`. The risk is a fork-local cherry-pick (e.g. a SYCL twin of a new CUDA kernel) bringing a `double` into a kernel lambda. Audit the lambda capture list and any `sycl::reduce*` calls against this invariant before merging.
- Re-test on rebase:
```bash
# Build the SYCL backend (CC/CXX go in the environment, before the command).
CC=icx CXX=icpx meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
# On an fp64-less device (e.g. Intel Arc A380), confirm that the
# init log line is INFO-level and reads "device lacks native
# fp64 — kernels already use fp32 + int64 paths, no emulation
# overhead". The SYCL kernels must launch successfully (no
# SPIR-V module rejection from the Level Zero runtime).
build-sycl/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --backend sycl \
    --feature integer_vif --feature integer_adm \
    --output /tmp/sycl-fp64less.json --json
```
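Before merging a SYCL cherry-pick, a crude textual pre-screen can flag candidates for the manual audit described above. This is a hypothetical heuristic, not a tool in the tree. It matches the `double` keyword, which also covers `sycl::reduction<double>` and `sycl::plus<double>`.

Its limits:
- It cannot tell host code from kernel lambdas, so host-side `double` shows up too.
- It misses unsuffixed floating literals such as `0.5`, which are `double` in C++.

Treat its hits as a review list. The fp64-less device run remains the real check.

```python
import re
from typing import List, Tuple

DOUBLE_RE = re.compile(r"\bdouble\b")


def fp64_hits(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs whose code (not // comment) names `double`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop trailing line comments
        if DOUBLE_RE.search(code):
            hits.append((lineno, line.strip()))
    return hits


sample = (
    "float a = 1.0f;\n"
    "q.parallel_for(r, [=](auto i) { double x = 0; });\n"
    "// double in a comment\n"
)
for lineno, text in fp64_hits(sample):
    print(lineno, text)
```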
0091 — T6-9 model registry schema + --tiny-model-verify (ADR-0211)
- No rebase impact: 100% fork-local surface. The following are entirely fork-local; none of these paths exist in upstream Netflix/vmaf:
  - the registry (`model/tiny/registry.json`);
  - its JSON Schema (`model/tiny/registry.schema.json`);
  - the `--tiny-model-verify` CLI flag;
  - the `vmaf_dnn_verify_signature()` C entry point.

  They are listed here for completeness, so a future `/sync-upstream` run sees that the surface area was acknowledged.
- Touches (additive only):
  - `model/tiny/registry.json`, `model/tiny/registry.schema.json`;
  - `ai/scripts/validate_model_registry.py`;
  - `core/src/dnn/model_loader.{c,h}` (added `vmaf_dnn_verify_signature()`);
  - `core/include/libvmaf/dnn.h` (public declaration);
  - `core/tools/cli_parse.{c,h}` (`ARG_TINY_MODEL_VERIFY` + `tiny_model_verify` field);
  - `core/tools/vmaf.c` (call site);
  - `core/test/dnn/test_tiny_model_verify.c`, `python/test/model_registry_schema_test.py`;
  - `docs/ai/model-registry.md`, `docs/ai/inference.md`, `docs/ai/security.md`;
  - `docs/adr/0209-...md`, `docs/adr/README.md` (index row);
  - `CHANGELOG.md`, `core/src/dnn/AGENTS.md`.
- Invariants (rebase-relevant):
  - The schema is the contract. New registry fields land in `registry.schema.json` first, then in `registry.json`, then in any consumers (the C-side parser, the Python validator, the MCP). Landing them in the reverse order causes a mismatch.
  - `schema_version` is bounded. The schema accepts only `{0, 1}`; bump the enum and the loader's check together when adding `2`.
  - The banned-function rule applies. The `cosign` invocation uses `posix_spawnp(3p)` with an explicit argv array. Do not replace it with `system(3)` / `popen(3)`: both shell-parse the command and would reintroduce injection risk.
  - Bundle-file absence is fail-closed. When `sigstore_bundle` points at a file that does not exist yet (pre-release state), `vmaf_dnn_verify_signature()` returns `-ENOENT`. The CLI surfaces this as a load failure; do not "soften" it to a warning without an explicit ADR.
- Re-test on rebase:
```bash
python3 ai/scripts/validate_model_registry.py
python3 -m pytest python/test/model_registry_schema_test.py -v
meson test -C build-cpu --suite=dnn
```
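The two fail-closed rules above can be mirrored in a few lines of Python. `check_entry` is a hypothetical helper, not the C loader or `validate_model_registry.py`. It makes three assumptions:
- A registry entry is a dict carrying `schema_version` and an optional `sigstore_bundle` path.
- It returns negative errno values the way `vmaf_dnn_verify_signature()` is described to.
- The `-EINVAL` choice for an unknown version is illustrative only.

```python
import errno
import os
from typing import Any, Dict

ALLOWED_SCHEMA_VERSIONS = {0, 1}  # bump together with the schema enum


def check_entry(entry: Dict[str, Any]) -> int:
    """Return 0 if the entry may load, else a negative errno (fail-closed)."""
    if entry.get("schema_version") not in ALLOWED_SCHEMA_VERSIONS:
        return -errno.EINVAL  # hypothetical error code for an unknown version
    bundle = entry.get("sigstore_bundle")
    if bundle is not None and not os.path.isfile(bundle):
        return -errno.ENOENT  # a missing bundle is a load failure, never a warning
    return 0


print(check_entry({"schema_version": 1}))  # 0
```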
0074 — HIP (AMD ROCm) backend scaffold (T7-10)
- ADR: ADR-0212.
- Upstream source: fork-local. The HIP backend is fork-only; Netflix/vmaf has no `libvmaf_hip.h` and no `enable_hip` meson option.
- Touches:
  - `core/include/libvmaf/libvmaf_hip.h` (new).
  - `core/include/core/meson.build`: adds the `is_hip_enabled` install gate, mirroring the `is_cuda_enabled` / `is_sycl_enabled` boolean idioms.
  - `core/meson_options.txt`: new `enable_hip` boolean option (default false).
  - `core/src/meson.build`: new `is_hip_enabled` flag and conditional `subdir('hip')`. `hip_sources` and `hip_deps` are threaded through `libvmaf_feature_static_lib` (alongside the existing CUDA / SYCL / Vulkan aggregations) and the top-level `library('vmaf', ...)` `dependencies` list.
  - `core/src/hip/` (new directory: `common.{c,h}`, `picture_hip.{c,h}`, `dispatch_strategy.{c,h}`, `meson.build`).
  - `core/src/feature/hip/` (new directory: `adm_hip.c`, `vif_hip.c`, `motion_hip.c`).
  - `core/test/test_hip_smoke.c` (new).
  - `core/test/meson.build`: registers the smoke test under `if get_option('enable_hip') == true`.
  - `.github/workflows/libvmaf-build-matrix.yml`: adds the `Build — Ubuntu HIP (T7-10 scaffold)` row.
  - `docs/backends/hip/overview.md` (new), `docs/backends/index.md` (planned → scaffold row), `docs/research/0033-hip-applicability.md` (new), `docs/adr/0212-hip-backend-scaffold.md` (new), `docs/adr/README.md` (new index row).
  - `libvmaf/AGENTS.md`: new "HIP backend scaffold contract" rebase-sensitive invariant entry.
  - `CHANGELOG.md`: Unreleased § Added.
- Invariants (rebase-relevant):
  - `enable_hip` is a `boolean` option, not a `feature`. It mirrors `enable_cuda` / `enable_sycl`. Do not "harmonise" it with `enable_vulkan`'s `feature` / `disabled` form without an ADR amendment per ADR-0212 § "Decision".
  - Public C-API entry points return `-ENOSYS` for the scaffold, and the smoke test `core/test/test_hip_smoke.c` pins this. Beware a rebase that "succeeds" by accidentally enabling a code path, e.g. a refactor that early-returns 0 from `vmaf_hip_state_init`. It breaks the smoke test and the runtime PR's contract baseline.
  - `hip_sources` is added to `libvmaf_feature_static_lib`, NOT directly to the top-level `library('vmaf', ...)`. The static lib is extracted into libvmaf via `objects: [..., libvmaf_feature_static_lib.extract_all_objects(recursive: true), ...]` at the bottom of `core/src/meson.build`. Adding `hip_sources` to the top-level `library()` as well would double-link.
  - `hip_deps` IS added to the top-level `library()` `dependencies:` list. The runtime PR will populate `hip_deps` with the real `dependency('hip-lang')` linkage. Threading it through the top-level `library()` ensures consumers see the transitive dependency.
  - Header purity: `libvmaf_hip.h` does not include `<hip/hip_runtime.h>`. HIP runtime types cross the public ABI as `uintptr_t`, matching the CUDA / Vulkan precedent (ADR-0212). Don't add `<hip/...>` includes to the public header during a rebase or the runtime-PR bring-up.
  - No FFmpeg patch. The fork's `ffmpeg-patches/` series does not currently consume the HIP API surface. CLAUDE §12 r14 only requires patch updates when an existing patch consumes the surface. The runtime PR (T7-10b) will add the `hip_device` filter option and the corresponding patch.
- On upstream sync: zero interaction; the HIP backend is fork-only.
- Re-test on rebase:
```bash
cd libvmaf
meson setup build-hip -Denable_cuda=false -Denable_sycl=false \
    -Denable_hip=true
ninja -C build-hip
meson test -C build-hip test_hip_smoke
# Expect: 9/9 pass.
# Default no-HIP build still works:
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=fast
```
0074 — SSIMULACRA 2 SVE2 SIMD parity (T7-38)
- ADR: ADR-0213.
- Touches (all wholly fork-local; no upstream Netflix/vmaf code is modified):
  - `core/src/feature/arm64/ssimulacra2_sve2.{c,h}` (new);
  - `core/src/feature/ssimulacra2.c` (dispatch-table override in `init_simd_dispatch`);
  - `core/src/arm/cpu.{c,h}` (`HWCAP2_SVE2` probe + new `VMAF_ARM_CPU_FLAG_SVE2` enum value);
  - `core/src/meson.build` (`cc.compiles` probe + optional `arm64_ssimulacra2_sve2` static library);
  - `core/test/test_ssimulacra2_simd.c` (SVE2 picker overrides on the arm64 path + dispatch diagnostic);
  - `build-aux/aarch64-linux-gnu-sve2.ini` (new cross-file pinning `qemu-aarch64-static -cpu max`).
- Invariants:
  - Fixed 4-lane SVE2 predicate. Every kernel uses `svwhilelt_b32(0, 4)`, so the SIMD arithmetic order is identical to the NEON sibling regardless of the runtime vector length. This keeps the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract intact. Do NOT widen the predicate to `svptrue_b32()` without a separate ADR + snapshot regen: variable-length lane reductions perturb the per-step rounding order.
  - NEON stays the fallback. SVE2 is purely additive: the dispatch table assigns NEON first and only overrides on `VMAF_ARM_CPU_FLAG_SVE2`. A toolchain that fails the `cc.compiles(... -march=armv9-a+sve2)` probe leaves `HAVE_SVE2` unset, and the legacy NEON-only build is unchanged.
  - `-ffp-contract=off` mirrors the NEON sibling. Without it, GCC fuses the per-lane scalar tail's `a*b+c` patterns into `fmla`, drifting against the SIMD path by ~1 ulp. The `arm64_ssimulacra2_sve2` static library carries the flag, like its NEON counterpart.
- On upstream sync: no interaction with upstream. The `arm64/` feature TUs and the `arm/cpu.{c,h}` flag enum are fork-local. An upstream sync that rewrites `init_simd_dispatch` in `core/src/feature/ssimulacra2.c` would also need the SVE2 cases preserved.
- Re-test on rebase:
```bash
meson setup build-arm64-sve2 libvmaf \
    --cross-file=build-aux/aarch64-linux-gnu-sve2.ini -Denable_asm=true
ninja -C build-arm64-sve2 test/test_ssimulacra2_simd
meson test -C build-arm64-sve2 test_ssimulacra2_simd
# stderr should report `ssimulacra2 simd dispatch: NEON=1 SVE2=1`
# and 11/11 tests should pass.
```
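The sketch below shows why a fixed lane count matters. It models a lane-striped reduction: element `i` is accumulated into lane `i % lanes`, then the lanes are summed left to right. The same eight inputs give different answers with four lanes (the pinned `svwhilelt_b32(0, 4)` width) and with eight lanes (what an unpinned `svptrue_b32()` would give on a 256-bit SVE implementation with 32-bit elements).

Caveats:
- The function is illustrative, not taken from the fork's kernels.
- It uses Python doubles, whereas the kernels work in fp32.
- The left-to-right horizontal order is an assumption.

```python
from typing import List, Sequence


def striped_sum(values: Sequence[float], lanes: int) -> float:
    """Sum `values` the way a `lanes`-wide vector loop would, then reduce."""
    acc: List[float] = [0.0] * lanes
    for i, v in enumerate(values):
        acc[i % lanes] += v  # per-lane partial sums
    total = 0.0
    for partial in acc:  # horizontal reduction, left to right (assumed order)
        total += partial
    return total


data = [1e16, 1.0, 1.0, 1.0, -1e16, 0.0, 0.0, 0.0]
print(striped_sum(data, 4))  # 3.0: 1e16 and -1e16 cancel inside lane 0
print(striped_sum(data, 8))  # 0.0: each 1.0 is absorbed by 1e16 before the cancel
```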
0075 — enable_lcs MS-SSIM extras on CUDA + Vulkan (T7-35 / ADR-0243)
- Touched surfaces (fork-local):
  - `core/src/feature/cuda/integer_ms_ssim_cuda.c`: added `enable_lcs` to `MsSsimStateCuda` and `options[]`, plus 15 host-side `vmaf_feature_collector_append` calls gated on the bool.
  - `core/src/feature/vulkan/ms_ssim_vulkan.c`: rewrote the `enable_lcs` help text, added an `emit_lcs_metrics` helper, and gated 15 `vmaf_feature_collector_append` calls.
  - `scripts/ci/cross_backend_vif_diff.py` and `scripts/ci/cross_backend_parity_gate.py`: new `float_ms_ssim_lcs` pseudo-feature, `FEATURE_ALIASES` map, and `places=4` tolerance row.
- Why this matters on rebase: the GPU MS-SSIM extractors are fork-local (Netflix upstream has no Vulkan or CUDA MS-SSIM kernel today). The `enable_lcs` semantics and the metric names (`float_ms_ssim_{l,c,s}_scale{0..4}`) must match the upstream CPU reference at `core/src/feature/float_ms_ssim.c:189-221`. If upstream ever renames or reorders those metrics, mirror the change on the GPU side in the same merge, because the names are a public-API contract.
- Invariants the contract enforces:
  - Default-path output (`enable_lcs=false`) stays bit-identical to the pre-T7-35 binary. Only the host-side appends are gated; there are no kernel, shader or device-buffer changes.
  - Metric ordering is metric-wise (all `l_scale*` first, then `c_*`, then `s_*`), matching the CPU emission order.
  - Cross-backend tolerance is `places=4` per ADR-0190, enforced by the new `float_ms_ssim_lcs` cell in the parity matrix gate (ADR-0214).
- On upstream sync: zero interaction; the GPU twins do not exist upstream. The CPU `float_ms_ssim.c` is shared with upstream, but `enable_lcs` has been upstream-stable since v3.0.0.
- Re-test on rebase:
```bash
cd libvmaf && meson setup build-vulkan \
    -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=enabled -Denable_float=true \
    --buildtype=release && ninja -C build-vulkan
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build-vulkan/tools/vmaf \
    --reference testdata/ref_576x324_48f.yuv \
    --distorted testdata/dis_576x324_48f.yuv \
    --width 576 --height 324 \
    --feature float_ms_ssim_lcs --backend vulkan --places 4
```
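The metric-wise ordering can be spelled out by generating the 15 names (3 components × 5 scales) from the name pattern above. `lcs_metric_names` is an illustrative helper, not code from either extractor.

```python
from typing import List


def lcs_metric_names() -> List[str]:
    """All l_* names first, then c_*, then s_*; scales 0..4 within each."""
    return [
        f"float_ms_ssim_{component}_scale{scale}"
        for component in ("l", "c", "s")  # outer loop gives metric-wise order
        for scale in range(5)
    ]


names = lcs_metric_names()
print(len(names))  # 15
print(names[:2])   # ['float_ms_ssim_l_scale0', 'float_ms_ssim_l_scale1']
```

Swapping the two `for` clauses would yield scale-wise order (`l_scale0`, `c_scale0`, `s_scale0`, ...), which is exactly the reordering the invariant rules out.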
0075 — 32-bit ADM/cpu fallbacks port (T-NEW-3)
- Touched surfaces (upstream-mirror): `core/src/feature/x86/adm_avx2.c`, `core/src/feature/x86/adm_avx512.c`, `core/src/x86/cpu.c`. These are cherry-picks of upstream `8a289703` (Christopher Degawa, "adm: add fallback for extract_epi64 for 32-bit") and `1b6c3886` ("x86/cpu: remove limit of avx+ on 32-bit").
- Why this matters on rebase: the port is trivially conflict-free with any future upstream `extract_epi64` work, because we land upstream's exact `extract_epi64` macro/inline-function pair. The conflict surface is elsewhere: the fork's clang-format 100-column layout in `adm_avx2.c` / `adm_avx512.c`, and the `_Alignas(64)` LTO-correctness slot in `adm_avx512.c` (`docs/development/known-upstream-bugs.md`). Both are preserved verbatim.
- Invariants the port preserves:
  - `_Alignas(64) int64_t angle_flag[16]` in `adm_decouple_s123_avx512` stays. Without it, LTO can promote the unaligned load to `vmovdqa64` and fault under `--buildtype=release -Db_lto=true`.
  - The `extract_epi64` symbol must remain resolved both on `__x86_64__` (a macro to `_mm256_extract_epi64`) and on 32-bit (the fallback inline). If a future upstream change inlines the helper differently, keep the conditional definition.
- On upstream sync: Netflix may ship further 32-bit fallbacks (motion / psnr, not in this port). If so, expect a parallel `extract_epi64`-style helper at the top of each affected SIMD file, and mirror those helpers verbatim into the same files.
- Re-test on rebase:
```bash
meson setup build-i686 libvmaf \
    --cross-file=build-aux/i686-linux-gnu.ini \
    -Denable_asm=false
ninja -C build-i686
meson setup build-cpu libvmaf -Denable_avx512=true
ninja -C build-cpu
meson test -C build-cpu
```
0076 — codec-aware FR regressor surface (T7-CODEC-AWARE / ADR-0235)
- Touches: `ai/src/vmaf_train/codec.py` (new), `ai/src/vmaf_train/models/fr_regressor.py` (extended), `ai/scripts/bvi_dvc_to_full_features.py`, `ai/scripts/extract_full_features.py`. No upstream-shared paths.
- Invariants:
  - `CODEC_VOCAB` in `ai/src/vmaf_train/codec.py` is closed and order-stable. The index of each codec is the one-hot column index baked into trained ONNX. Adding a codec appends to the tuple and bumps `CODEC_VOCAB_VERSION`. Reordering silently invalidates every shipped `fr_regressor_v2_*.onnx`.
  - `FRRegressor(num_codecs=0)` must remain the v1 single-input contract. Flipping the default would break every existing `model/tiny/fr_regressor_v1.onnx` consumer.
- Re-test: `pytest ai/tests/test_codec_aware_fr.py -v` (8 sub-tests covering the vocabulary contract, the alias table, and back-compat). This is a pure fork-local addition with no upstream rebase impact for the next `/sync-upstream`.
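This minimal sketch shows why order stability matters: a codec's one-hot column is simply its tuple index. The vocabulary contents and version number below are placeholders, not the fork's actual `CODEC_VOCAB`. `codec_one_hot` is a hypothetical helper that ignores the alias table.

```python
from typing import List, Tuple

# Placeholder contents; the real CODEC_VOCAB lives in ai/src/vmaf_train/codec.py.
CODEC_VOCAB: Tuple[str, ...] = ("x264", "x265", "svtav1", "vp9")
CODEC_VOCAB_VERSION = 1  # placeholder; bumped on every append


def codec_one_hot(codec: str) -> List[float]:
    """One-hot row; the hot column is the codec's index in CODEC_VOCAB."""
    row = [0.0] * len(CODEC_VOCAB)
    row[CODEC_VOCAB.index(codec)] = 1.0  # ValueError for codecs outside the vocab
    return row


print(codec_one_hot("x265"))  # [0.0, 1.0, 0.0, 0.0]
```

Suppose a new codec were inserted at the front of the tuple instead of appended. `x265` would move to column 2, and every shipped model would read the wrong column without raising any error.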
0075 — feature/speed extractors (T-NEW-1, upstream port d3647c73)
- Touches:
  - `core/src/feature/speed.c` (new);
  - `core/src/feature/picture_copy.{c,h}` (signature change: added an `int channel` parameter);
  - `core/src/feature/float_*.c` call sites, updated to pass `channel=0`;
  - the `core/src/feature/feature_extractor.c` registry block;
  - `core/src/feature/alias.c`, `core/src/meson.build`;
  - `core/src/feature/vif_tools.{c,h}` (helper-function port from upstream `4ad6e0ea`).
- Upstream source: verbatim cherry-pick of Netflix/vmaf `d3647c73` ("feature/speed: port speed_chroma and speed_temporal extractors") with its dependency `4ad6e0ea` ("feature/vif: port helper functions"). Both already exist on Netflix master and enter the fork as part of the T7-4 audit catch-up.
- Invariant: `picture_copy()` now takes a `channel` argument. Every fork-local extractor that calls it (CUDA `integer_ms_ssim`, Vulkan `ssim` / `ms_ssim`) passes `channel=0`. If upstream later evolves the signature again (e.g. adds bit-depth or stride validation), update those fork-local call sites in lockstep. The speed extractors only register when `VMAF_FLOAT_FEATURES=1` (build with `-Denable_float=true`).
- On upstream sync: future Netflix commits to `core/src/feature/speed.c` apply cleanly, because the file is now a verbatim mirror. Conflict potential is limited to two places:
  - the registry block in `feature_extractor.c`, where it interleaves with the fork's Vulkan / SYCL / CUDA blocks;
  - any further `picture_copy` signature evolution.
- Re-test on rebase:
```bash
meson setup build-cpu libvmaf -Denable_cuda=false \
    -Denable_sycl=false -Denable_float=true
ninja -C build-cpu
meson test -C build-cpu test_speed
meson test -C build-cpu     # full meson suite
make test-netflix-golden    # 3 CPU canonical pairs
```
0221 — CHANGELOG + ADR-index fragment-file pattern (T7-39 / ADR-0221)
- What changed: the fork stopped editing `CHANGELOG.md` and `docs/adr/README.md` directly. Both files are now rendered from fragment trees:
  - `changelog.d/<section>/<topic>.md` (Keep-a-Changelog sections), plus the migration archive `changelog.d/_pre_fragment_legacy.md`.
  - `docs/adr/_index_fragments/<NNNN-slug>.md`, plus `docs/adr/_index_fragments/_order.txt` (frozen commit-merge order manifest) and `docs/adr/_index_fragments/_header.md` (table prelude).
- Two scripts render the consolidated outputs:
  - `scripts/release/concat-changelog-fragments.sh --check|--write`
  - `scripts/docs/concat-adr-index.sh --check|--write`
- On upstream sync: zero interaction. `CHANGELOG.md` is a fork-local Markdown surface (Netflix upstream doesn't ship a Keep-a-Changelog file in this format), and `docs/adr/` is entirely fork-local. A `/sync-upstream` run will not touch the fragment trees.
- Re-test on rebase:
```bash
bash scripts/release/concat-changelog-fragments.sh --check
bash scripts/docs/concat-adr-index.sh --check
# both must exit 0; otherwise run --write and re-stage.
```
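The `--check`/`--write` contract can be modelled in Python. This is a hypothetical sketch, not a port of `concat-changelog-fragments.sh`. It assumes:
- each lower-case section directory's fragments are concatenated in filename order under a `### <Section>` heading;
- sections follow the Keep-a-Changelog order;
- `--check` simply compares the rendered text with the committed file.

The real script's ordering rules, headings, and handling of `_pre_fragment_legacy.md` may differ.

```python
from pathlib import Path

# Keep-a-Changelog section order (assumed to be the render order).
SECTIONS = ("Added", "Changed", "Deprecated", "Removed", "Fixed", "Security")


def render_unreleased(fragment_root: Path) -> str:
    """Concatenate <root>/<section>/<topic>.md fragments into one Markdown block."""
    parts = ["## [Unreleased]\n"]
    for section in SECTIONS:
        section_dir = fragment_root / section.lower()  # assumed directory naming
        fragments = sorted(section_dir.glob("*.md")) if section_dir.is_dir() else []
        if not fragments:
            continue
        parts.append(f"\n### {section}\n\n")
        for fragment in fragments:
            parts.append(fragment.read_text(encoding="utf-8").rstrip("\n") + "\n")
    return "".join(parts)


def check(fragment_root: Path, rendered_file: Path) -> bool:
    """--check semantics: True (exit 0) only if the committed file is up to date."""
    current = rendered_file.read_text(encoding="utf-8")
    return current == render_unreleased(fragment_root)
```

Because `--check` is a pure comparison, a stale `CHANGELOG.md` fails CI without the script ever writing to the tree. Only `--write` regenerates the file.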
0077 — DISTS extractor proposal (T7-DISTS / ADR-0236)
- What landed: ADR-0236 (Proposed) + the Research-0043 design digest + the ADR README index row + a CHANGELOG entry.
- Rebase impact: pure fork-local proposal-stage docs; no code, no Netflix-mirror file touched, no ffmpeg-patches change, no public C-API surface change.
- Reproducer (when implementation lands as T7-DISTS):
```sh
vmaf --feature dists_sq=model_path=model/tiny/dists_sq.onnx \
    --reference ref.yuv --distorted dist.yuv \
    --width 1920 --height 1080 --pixel_format 420 --bitdepth 8
```
0076 — GPU-gen ULP calibration head (proposal-stage, T7-GPU-ULP-CAL / ADR-0234)
- What landed: ADR-0234 (Proposed), Research-0041, a data-collection scaffold at `ai/scripts/collect_gpu_calibration_data.py`, and a forward pointer in `docs/usage/cli.md` for the future `--gpu-calibrated` flag.
- Rebase impact: pure fork-local (proposal docs + Python script). No upstream Netflix/vmaf code touched, no public C-API changes, no ffmpeg-patches changes.
- Reproducer:
```sh
python3 ai/scripts/collect_gpu_calibration_data.py --smoke
```
0095 — Per-backend GPU kernel scaffolding templates (CUDA + Vulkan, ADR-0246)
- ADR: ADR-0246.
- Touches:
  - `core/src/cuda/kernel_template.h` (new, header-only).
  - `core/src/vulkan/kernel_template.h` (new, header-only).
  - `core/src/cuda/AGENTS.md` (new invariant row + directory listing).
  - `core/src/vulkan/AGENTS.md` (new file).
  - `docs/backends/kernel-scaffolding.md` (new).
  - `docs/adr/0246-gpu-kernel-template.md` (new).
  - `CHANGELOG.md`, `docs/adr/README.md`.
- Upstream collision risk: none. All paths are wholly fork-local. Upstream Netflix/vmaf has no Vulkan backend at all today, and its CUDA backend uses different per-kernel scaffolding shapes, so nothing here can collide on a pure upstream sync.
- Invariants:
  - The templates are unused at PR-merge time. `kernel_template.h` in both `core/src/cuda/` and `core/src/vulkan/` lands with zero call sites. Each future kernel migration is its own gated PR (`places=4` cross-backend diff per ADR-0214). Do not bulk-port existing kernels onto the templates in a single sync; that would short-circuit the per-kernel gate.
  - Per-backend, not cross-backend. Resist the urge to merge the two templates into a unified `gpu/kernel_template.h`. CUDA's async stream + event model and Vulkan's command buffer + fence + descriptor pool share no concrete shape, so a unified API would be lowest-common-denominator.
  - Helper functions, not macros. The header bodies are `static inline` functions, so they can be stepped through in cuda-gdb / Nsight / RenderDoc. The `CHECK_CUDA_GOTO` / `CHECK_CUDA_RETURN` macros in `cuda_helper.cuh` stay where they pay off (textual `goto label`), and the templates use them internally.
- On upstream sync: no interaction with upstream paths. However, an upstream sync that touches `core/src/cuda/common.h` or `picture_cuda.h` may shift the helper signatures the template consumes (`vmaf_cuda_buffer_alloc`, `vmaf_cuda_picture_get_stream`, …); update the template if so.
- Re-test on rebase:
```bash
# CUDA build (configure inside libvmaf/; see the CLAUDE.md §2 note).
meson setup core/build-cuda libvmaf \
    -Denable_cuda=true -Denable_nvcc=true \
    -Denable_vulkan=disabled -Denable_sycl=false
ninja -C core/build-cuda
meson test -C core/build-cuda

# Vulkan build.
meson setup core/build-vulkan libvmaf \
    -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-vulkan
meson test -C core/build-vulkan
```
0222 — vmaf-perShot per-shot CRF predictor sidecar (T6-3b)
- Touches:
  - `core/tools/meson.build` (new executable + test wiring);
  - `core/tools/vmaf_per_shot.c` (new file; fork-local, no upstream sibling);
  - `core/tools/test/meson.build` (test row);
  - `core/tools/test/test_vmaf_per_shot.sh` (new smoke test);
  - `core/tools/AGENTS.md` (sidecar invariants);
  - `docs/usage/cli.md` (cross-link);
  - `docs/usage/vmaf-perShot.md` (new user doc);
  - `docs/ai/roadmap.md` (T6-3b row update).
- Invariant: the sidecar must stay standalone; it does not link the libvmaf metric path.
  - Any upstream patch that tries to fold per-shot CRF prediction into `vmaf_score_*` would collapse the encoder-hint vs. quality-score separation recorded in roadmap §2.4 and ADR-0222 §Decision.
  - The CSV / JSON column set (`shot_id`, `start_frame`, `end_frame`, `frames`, `mean_complexity`, `mean_motion`, `predicted_crf`) is the public schema, and downstream encoders consume it directly.
- Conflict expectation on `/sync-upstream`: low. Upstream Netflix has no per-shot CRF predictor in tree, so there is no natural collision point. `tools/meson.build` is the only mutually edited file, and the new `executable('vmaf-perShot', …)` block is appended after `vmaf_bench_deps`, well clear of upstream's likely additions.
- Reproducer:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
    -Denable_vulkan=disabled
ninja -C build
meson test -C build test_vmaf_per_shot --print-errorlogs
./build/tools/vmaf-perShot \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
    --output /tmp/plan.csv
cat /tmp/plan.csv
```
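Because the column set is a public schema, a downstream consumer can pin it exactly. The sketch below writes and re-reads a plan with Python's standard `csv` module. The column list comes from the invariant above. The row values and the `write_plan` / `read_plan` helpers are made up for illustration and are not part of `vmaf-perShot`.

```python
import csv
import io
from typing import Dict, List

# Public schema, in the documented order.
PLAN_COLUMNS = ["shot_id", "start_frame", "end_frame", "frames",
                "mean_complexity", "mean_motion", "predicted_crf"]


def write_plan(rows: List[Dict[str, object]]) -> str:
    """Serialize rows as CSV; DictWriter raises ValueError on unknown keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=PLAN_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def read_plan(text: str) -> List[Dict[str, str]]:
    """Parse a plan, rejecting any header that differs from the public schema."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != PLAN_COLUMNS:
        raise ValueError(f"unexpected plan columns: {reader.fieldnames}")
    return list(reader)


plan = write_plan([{"shot_id": 0, "start_frame": 0, "end_frame": 47, "frames": 48,
                    "mean_complexity": 0.5, "mean_motion": 1.25, "predicted_crf": 30}])
print(plan.splitlines()[0])
```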
0075 — vmaf-roi sidecar binary (T6-2b / ADR-0247)
- Touches:
  - `core/tools/meson.build`: adds the `vmaf_roi` executable target (after the existing `vmaf` target, before `vmaf_bench`). Append-only; no upstream-shared lines moved or removed.
  - `core/test/meson.build`: adds the `test_vmaf_roi` executable + `test()` registration. Append-only.
  - `core/tools/vmaf_roi.c`: wholly new, fork-local.
  - `core/tools/vmaf_roi_core.h`: wholly new, fork-local.
  - `core/test/test_vmaf_roi.c`: wholly new, fork-local.
- Invariant: the `vmaf-roi` sidecar emits two byte-exact formats that downstream encoder drivers (x265 `--qpfile`, SVT-AV1 `--roi-map-file`) will hard-depend on:
  - x265 ASCII grid: two `#`-prefixed header lines (`# vmaf-roi qpfile (x265, --qpfile-style)` and `# frame=N ctu=S cols=C rows=R strength=F.FFF`), then space-separated signed integers, one row per CTU row, with a `\n` terminator.
  - SVT-AV1 raw binary: exactly `cols * rows` bytes of `int8_t`, row-major, no header.
  - QP-offset clamp: ±12 (`VMAF_ROI_CORE_QP_OFFSET_MAX`).
  - Reduction: per-CTU mean (not max). Switching to max or a percentile changes every downstream encoder result and requires its own ADR.
- Pure helpers in `vmaf_roi_core.h`: the per-CTU mean reducer and the saliency-to-QP mapper are `static inline` in a header, so `test_vmaf_roi` compiles them without dragging in the libvmaf link surface. Moving them into a `.c` TU breaks the test wiring.
- On upstream sync: no interaction with upstream. From upstream's perspective, `tools/` is a fork-local surface (upstream ships `vmaf.c` only). An upstream sync that rewrites `core/tools/meson.build` should preserve the `vmaf_roi` executable block.
- Re-test on rebase:
```bash
meson setup build-cpu libvmaf \
    -Denable_cuda=false -Denable_sycl=false -Denable_tools=true
ninja -C build-cpu tools/vmaf_roi test/test_vmaf_roi
meson test -C build-cpu test_vmaf_roi
./build-cpu/tools/vmaf_roi \
    --reference testdata/ref_576x324_48f.yuv \
    --width 576 --height 324 --frame 0 --output - \
    --encoder x265 --ctu-size 64 --strength 6.0 | head -3
# The first two lines are the # comment header; row 1 of the grid
# should be "4 2 1 -1 -1 -1 1 2 4" (placeholder radial map).
```
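Here is a minimal sketch of the pure-helper contract. It covers the per-CTU mean reduction, a saliency-to-QP mapping, the ±12 clamp, and x265 grid rendering with the two documented header lines.

Assumptions:
- The linear mapping `round(strength * (1 - 2 * mean))` and the partial-CTU edge handling are illustrative choices.
- The actual mapper in `vmaf_roi_core.h` is not described in these notes, so its real offsets will differ.
- `QP_OFFSET_MAX` only mirrors the documented clamp value.

```python
from typing import List, Sequence

QP_OFFSET_MAX = 12  # mirrors VMAF_ROI_CORE_QP_OFFSET_MAX


def ctu_means(saliency: Sequence[Sequence[float]], ctu: int) -> List[List[float]]:
    """Mean saliency per CTU; edge CTUs average only the pixels they cover (assumed)."""
    height, width = len(saliency), len(saliency[0])
    rows, cols = -(-height // ctu), -(-width // ctu)  # ceiling division
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [saliency[y][x]
                     for y in range(r * ctu, min((r + 1) * ctu, height))
                     for x in range(c * ctu, min((c + 1) * ctu, width))]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def qp_offset(mean: float, strength: float) -> int:
    """Hypothetical linear map: salient CTUs (mean > 0.5) get negative offsets."""
    raw = round(strength * (1.0 - 2.0 * mean))
    return max(-QP_OFFSET_MAX, min(QP_OFFSET_MAX, raw))


def render_x265(grid: List[List[float]], frame: int, ctu: int, strength: float) -> str:
    """Two '#' header lines, then one space-separated row per CTU row, '\\n'-terminated."""
    lines = [
        "# vmaf-roi qpfile (x265, --qpfile-style)",
        f"# frame={frame} ctu={ctu} cols={len(grid[0])} rows={len(grid)} "
        f"strength={strength:.3f}",
    ]
    lines += [" ".join(str(qp_offset(m, strength)) for m in row) for row in grid]
    return "\n".join(lines) + "\n"


saliency = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]  # 2x3 toy map
print(render_x265(ctu_means(saliency, 2), frame=0, ctu=2, strength=6.0), end="")
```

The helpers are pure functions of their inputs, with no I/O and no library state. That is the property that lets `test_vmaf_roi` exercise the real header-only versions in isolation.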
0219 — motion3 GPU coverage on Vulkan + CUDA + SYCL (T3-15(c) / ADR-0219)
- What changed: the `motion` GPU twins (`core/src/feature/vulkan/motion_vulkan.c`, `core/src/feature/cuda/integer_motion_cuda.c`, `core/src/feature/sycl/integer_motion_sycl.cpp`) now emit `VMAF_integer_feature_motion3_score` in 3-frame window mode (the default). Cross-backend gates are extended (`scripts/ci/cross_backend_*.py` `FEATURE_METRICS["motion"]`).
- Invariants:
  - motion3 = host-side scalar post-process of motion2. No device-side state changes; motion3 is computed on the host in `extract()`/`collect()`/`flush()` after the existing SAD reduction. The post-processing function (`motion3_postprocess_*`) mirrors CPU `integer_motion.c` lines 510-560 byte-for-byte: `clip(motion_blend(motion2 * fps_weight, blend_factor, blend_offset), max_val)` with an optional moving average against the unaveraged prior blended value.
  - `motion_five_frame_window=true` returns `-ENOTSUP` at `init()` on all three GPU backends. The 5-deep blur ring + second SAD-pair dispatch remain deferred. Do NOT silently fall back to the 3-frame path when the user enables the flag; fail loud per CERT C / CLAUDE.md §12 r4.
  - CPU motion3 algorithm is the source of truth. Any port of an upstream Netflix change to `integer_motion.c` that touches `motion_blend(...)`, the `motion_max_val` clip, or the moving-average rule MUST be mirrored in `motion3_postprocess_*` across all three GPU files in the same PR. The cross-backend gate at `places=4` will catch drift, but only after a full GPU run.
- On upstream sync: pure fork-local additions to GPU TUs. Upstream Netflix has no GPU motion extractor. The `motion_blend_tools.h` header is upstream-mirrored; if a sync rewrites the `motion_blend()` formula, regenerate the GPU snapshot and re-run the cross-backend gate.
- Re-test on rebase:

```bash
# CPU sanity (motion3 emission unchanged)
./core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --feature motion --output /tmp/motion.json --json
python -c "import json; d=json.load(open('/tmp/motion.json')); \
print('motion3 frames:', sum(1 for f in d['frames'] \
if 'integer_motion3' in f.get('metrics', {})))"
# Expect 49 (one motion3 per frame).

# Cross-backend gate (Vulkan/lavapipe lane works on every host):
python scripts/ci/cross_backend_vif_diff.py \
  --feature motion --backend vulkan \
  --ref python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --dis python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --bitdepth 8 \
  --vmaf-bin core/build/tools/vmaf
# Expect: integer_motion / integer_motion2 / integer_motion3 all OK at places=4.
```
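Here is a sketch of the host-side motion3 post-process, assuming the call shape quoted above. The blend function is passed in as a pointer because its real formula lives in the upstream-mirrored `motion_blend_tools.h`. `ex_linear_blend` is a stand-in for the demo only. The code comments label two further assumptions: the clip caps from above only, and the moving average is a two-tap mean. The detail the invariant cares about is in the last lines: the state keeps the *unaveraged* blended value for the next frame.

```c
#include <assert.h>
#include <stddef.h>

/* Call shape of motion_blend(score, blend_factor, blend_offset) as quoted in
 * the notes; the real formula lives in motion_blend_tools.h. */
typedef double (*ex_blend_fn)(double score, double blend_factor, double blend_offset);

typedef struct {
    double fps_weight;
    double blend_factor;
    double blend_offset;
    double max_val;
    int moving_average;   /* optional moving-average mode */
    int have_prev;        /* 0 until the first frame has been processed */
    double prev_blended;  /* previous frame's UNaveraged blended value */
} ExMotion3State;

/* Assumption: the clip caps from above only. */
static double ex_clip(double v, double max_val)
{
    return v > max_val ? max_val : v;
}

static double ex_motion3_postprocess(ExMotion3State *s, double motion2, ex_blend_fn blend)
{
    assert(s != NULL && blend != NULL);
    const double blended =
        ex_clip(blend(motion2 * s->fps_weight, s->blend_factor, s->blend_offset), s->max_val);
    double out = blended;
    if (s->moving_average && s->have_prev)
        out = 0.5 * (blended + s->prev_blended);  /* assumption: two-tap mean */
    s->prev_blended = blended;  /* keep the unaveraged value, not `out` */
    s->have_prev = 1;
    return out;
}

/* Demo-only blend; NOT the upstream motion_blend() formula. */
static double ex_linear_blend(double score, double f, double o)
{
    return f * score + o;
}
```

With the moving average on and motion2 inputs 4, 20, 2 (cap 10), the outputs are 4, 7, 6. If the state stored the averaged value (7) instead, the third output would be 4.5. That is the kind of drift the `places=4` gate is there to catch.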
0216 — vmaf_tiny_v2 (Phase-3-validated tiny VMAF MLP)
- Touches: `model/tiny/registry.json`, `model/tiny/vmaf_tiny_v2.{onnx,json}`, `ai/scripts/{train,export,validate}_vmaf_tiny_v2.py`, `ai/AGENTS.md`, `core/test/dnn/{test_vmaf_tiny_v2.py,meson.build}`, `docs/ai/{models/vmaf_tiny_v2.md,inference.md,roadmap.md}`, `docs/adr/{0244-vmaf-tiny-v2.md,README.md}`, `CHANGELOG.md`. All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
- Invariants:
  - Bundled scaler stats are part of the trust root. The shipped ONNX bakes `(input - mean) / std` as constant `Sub` + `Div` nodes that run before the MLP. Re-exporting must go through `ai/scripts/export_vmaf_tiny_v2.py`, which pulls `mean`/`std` from the trainer checkpoint and writes them as graph initialisers. Adding an out-of-band scaler step at runtime (e.g., a sidecar JSON consumed by the loader) is forbidden without a follow-up ADR: it splits the trust root and invalidates the registry sha256 contract.
  - Feature column order is fixed. The graph reads `(adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2)` in exactly this order; reordering breaks the bundled `mean`/`std` constants. Any change to the feature set requires a fresh Phase-3 chain (Research-0027 → 0028 → 0029 → 0030).
  - opset 17. Matches the sister tiny-AI models (`learned_filter_v1`, `nr_metric_v1`, `fastdvdnet_pre`) and the ORT op-allowlist baseline. Upgrading requires re-validating the `Sub`/`Div`/`Gemm`/`Relu`/`Squeeze` ops against `op_allowlist.c`.
- On upstream sync: zero interaction. Netflix/vmaf has no equivalent surface; an upstream sync that touches `core/src/dnn/` (op-allowlist or model-loader changes) needs to preserve `Sub`/`Div`/`Gemm`/`Relu`/`Squeeze` in the allowlist for opset 17.
- Re-test on rebase:

```bash
bash core/test/dnn/test_registry.sh
python3 core/test/dnn/test_vmaf_tiny_v2.py
python3 ai/scripts/validate_vmaf_tiny_v2.py \
  --onnx model/tiny/vmaf_tiny_v2.onnx \
  --parquet runs/full_features_netflix.parquet \
  --rows 100 --min-plcc 0.97
meson test -C build-cpu --suite=dnn
```
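A hypothetical host-side sketch makes the trust-root split concrete. The host packs *raw* features in the fixed column order, and the graph's baked `Sub`/`Div` nodes do the standardisation. `ex_reference_standardize` only reproduces that in-graph arithmetic for parity checks. A loader that also applied it before inference would standardise twice. All `ex_*` names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Documented column order for vmaf_tiny_v2; the enum itself is illustrative. */
enum {
    EX_COL_ADM2, EX_COL_VIF_SCALE0, EX_COL_VIF_SCALE1,
    EX_COL_VIF_SCALE2, EX_COL_VIF_SCALE3, EX_COL_MOTION2,
    EX_N_COLS
};

/* Hypothetical per-frame feature record as a host might hold it. */
typedef struct {
    double adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2;
} ExTinyV2Features;

/* What the host feeds the graph: RAW values in the fixed order.
 * No scaling here; the baked Sub/Div nodes own that step. */
static void ex_pack_raw(const ExTinyV2Features *f, float out[EX_N_COLS])
{
    out[EX_COL_ADM2] = (float)f->adm2;
    out[EX_COL_VIF_SCALE0] = (float)f->vif_scale0;
    out[EX_COL_VIF_SCALE1] = (float)f->vif_scale1;
    out[EX_COL_VIF_SCALE2] = (float)f->vif_scale2;
    out[EX_COL_VIF_SCALE3] = (float)f->vif_scale3;
    out[EX_COL_MOTION2] = (float)f->motion2;
}

/* Reference for the in-graph (input - mean) / std, for parity tests only. */
static void ex_reference_standardize(const float in[EX_N_COLS], const float mean[EX_N_COLS],
                                     const float stdev[EX_N_COLS], float out[EX_N_COLS])
{
    for (size_t i = 0; i < EX_N_COLS; i++) {
        assert(stdev[i] != 0.0f);
        out[i] = (in[i] - mean[i]) / stdev[i];
    }
}
```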
0094 — Tiny-AI extractor template (ADR-0250)
- Touches: `core/src/dnn/tiny_extractor_template.h` (new), `core/src/feature/feature_lpips.c`, `core/src/feature/fastdvdnet_pre.c`, `core/src/dnn/AGENTS.md`, `docs/ai/extractor-template.md` (new), `docs/adr/0250-tiny-ai-extractor-template.md` (new).
- Invariants:
  - Helper signatures are wire-format-stable. `vmaf_tiny_ai_resolve_model_path(name, option, env_var)` and `vmaf_tiny_ai_open_session(name, path, &out)` produce the user-facing log lines `<name>: no model path …` and `<name>: vmaf_dnn_session_open(<path>) failed: <rc>`, which downstream tooling greps. Don't rename or reorder the parameters without bumping every extractor + the recipe doc.
  - YUV→RGB is bit-exact. The shared `vmaf_tiny_ai_yuv8_to_rgb8_planes` is a literal move of the pre-existing `feature_lpips.c` body (BT.709 limited-range, nearest-neighbour chroma upsample). LPIPS / saliency / future colour-sensitive tiny-AI scores depend on byte-exact equality with the prior ad-hoc copies. Any change to the conversion constants or the rounding rule needs a separate ADR + a coordinated snapshot regen; `model/tiny/` weights aren't re-trained against new colour math casually.
  - Option-table macro is plain text substitution. The `VMAF_TINY_AI_MODEL_PATH_OPTION(state_t, help)` macro emits a single struct literal: no control flow, no recursion, no variadic shenanigans (Power-of-10 rule 1 / rule 9). Don't extend it into a multi-option emitter without a fresh ADR.
- On upstream sync: zero interaction with upstream. `feature_lpips.c` and `fastdvdnet_pre.c` are fork-only files, and the new `dnn/tiny_extractor_template.h` lives entirely under the fork-introduced `core/src/dnn/`. An upstream sync that rewrites unrelated `feature_*.c` files won't conflict.
- Re-test on rebase:

```bash
cd libvmaf
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=dnn
meson test -C build-cpu test_lpips test_fastdvdnet_pre
# All 10 dnn-suite + both extractor tests must pass.
```
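For readers unfamiliar with the conversion this invariant protects, here is a minimal sketch of BT.709 limited-range YUV 4:2:0 → planar RGB with nearest-neighbour chroma, using the standard BT.709 coefficients. It is *not* `vmaf_tiny_ai_yuv8_to_rgb8_planes`. The function name is hypothetical, and the rounding (round half up, clamped to [0, 255]) is an assumption. The rounding rule is exactly the kind of detail the bit-exactness invariant pins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to [0, 255], then round half up (assumed rounding rule). */
static uint8_t ex_clamp_u8(double v)
{
    if (v <= 0.0)
        return 0;
    if (v >= 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* BT.709 limited range: Y in [16, 235], Cb/Cr centred on 128.
 * Nearest-neighbour 4:2:0 upsample: pixel (x, y) uses chroma (x/2, y/2). */
static void ex_yuv420p8_to_rgb8_planes(const uint8_t *y, size_t y_stride,
                                       const uint8_t *u, const uint8_t *v, size_t c_stride,
                                       int w, int h, uint8_t *r, uint8_t *g, uint8_t *b)
{
    for (int row = 0; row < h; row++) {
        for (int col = 0; col < w; col++) {
            const double yy =
                1.164383 * ((double)y[(size_t)row * y_stride + (size_t)col] - 16.0);
            const size_t ci = (size_t)(row / 2) * c_stride + (size_t)(col / 2);
            const double cb = (double)u[ci] - 128.0;
            const double cr = (double)v[ci] - 128.0;
            const size_t o = (size_t)row * (size_t)w + (size_t)col;
            r[o] = ex_clamp_u8(yy + 1.792741 * cr);
            g[o] = ex_clamp_u8(yy - 0.213249 * cb - 0.532909 * cr);
            b[o] = ex_clamp_u8(yy + 2.112402 * cb);
        }
    }
}
```

Swapping round-half-up for truncation would shift some channel values by one code. That would not change what the conversion means, but it would break byte-exact parity with the snapshots.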
0095 — Vulkan ring-depth tunable (ADR-0251 follow-up #3)
- PR: `feat/t7-29-followup3-ring-tunable`.
- What rebases need to know: `VmafVulkanConfiguration` grew an additive `unsigned max_outstanding_frames` field. Existing zero-initialised configs continue to receive the canonical default (`0 → VMAF_VULKAN_RING_DEFAULT == 4`). The clamp helper `vmaf_vulkan_clamp_ring_size` moved from `import.c` (file-local `static`) to `vulkan_internal.h` (`static inline`) so `state_init` and `lazy_alloc_ring` share one definition. A sync or rebase that re-introduces the `static` copy in `import.c` would collide with the header helper: drop the duplicate, keep the inline.
- New public symbol: `vmaf_vulkan_state_max_outstanding_frames(const VmafVulkanState *)`, a read-side accessor for the clamped value. Pure additive surface; no upstream collision.
- On upstream sync: zero interaction. The ring is wholly fork-introduced (ADR-0251); upstream Netflix has no Vulkan backend.
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_hip=false -Denable_vulkan=disabled \
  -Denable_float=true
ninja -C build && meson test -C build
# 51/52 OK; 1 pre-existing T7-32 fail on test_motion_v2_simd
# (ADR-0038 follow-up)

# Smoke the new options + ENOTSUP guard:
build/tools/vmaf --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --feature 'motion_v2=motion_blend_factor=0.5' \
  --xml -o /tmp/r.xml --no_prediction
grep motion3_v2 /tmp/r.xml | head -3
# → 49 frames with VMAF_integer_feature_motion3_v2_score_mbf_0.5

# ENOTSUP guard:
build/tools/vmaf --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --feature 'motion_v2=motion_five_frame_window=1' \
  --xml -o /tmp/r2.xml --no_prediction 2>&1
# → "problem loading feature extractor: motion_v2"
# → stderr: "motion_v2: motion_five_frame_window=true is not supported …"
```
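The `0 → default` rule for `max_outstanding_frames` can be sketched as a single `static inline` helper. That shape is what lets `state_init` and `lazy_alloc_ring` share one definition from a header. The default of 4 comes from the notes. The lower and upper bounds (`EX_RING_MIN`, `EX_RING_MAX`) and the function name are hypothetical placeholders, not the values in `vulkan_internal.h`.

```c
#include <assert.h>

#define VMAF_VULKAN_RING_DEFAULT 4u  /* documented default */
#define EX_RING_MIN 2u               /* hypothetical lower bound */
#define EX_RING_MAX 16u              /* hypothetical upper bound */

_Static_assert(EX_RING_MIN <= VMAF_VULKAN_RING_DEFAULT &&
               VMAF_VULKAN_RING_DEFAULT <= EX_RING_MAX,
               "default ring depth must lie inside the clamp range");

/* Header-resident: every caller sees the same definition. */
static inline unsigned ex_clamp_ring_size(unsigned requested)
{
    if (requested == 0u)
        return VMAF_VULKAN_RING_DEFAULT;  /* zero-initialised config */
    if (requested < EX_RING_MIN)
        return EX_RING_MIN;
    if (requested > EX_RING_MAX)
        return EX_RING_MAX;
    return requested;
}
```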
ADR-index backfill 2026-05-08 (this PR)
- Touches: `docs/adr/_index_fragments/0235-codec-aware-fr-regressor.md` (new), `docs/adr/_index_fragments/0236-dists-extractor.md` (new), `docs/adr/_index_fragments/0238-vulkan-picture-preallocation.md` (new), `docs/adr/_index_fragments/0239-gpu-picture-pool-dedup.md` (new), `docs/adr/_index_fragments/0251-vulkan-async-pending-fence.md` (new), `docs/adr/_index_fragments/0279-fr-regressor-v2-probabilistic.md` (new), `docs/adr/_index_fragments/_order.txt` (six slugs appended), `docs/adr/README.md` (eight rows appended; one duplicate ADR-0279 row deduplicated).
- Invariant: no engine code touched; no upstream-shared paths. Pure fork-local index maintenance.
- On upstream sync: no action required. `docs/adr/` is a fork-local tree.
- Coordination with #468 (27-ADR status sweep): both PRs touch ADR metadata. They do not conflict at the file level (#468 edits ADR bodies; this PR adds index fragments and appends README rows for the eight previously unindexed ADRs). At merge time the README append-tail may overlap if #468 lands later index rows for its swept ADRs. Whichever lands first, the second rebases by re-running `scripts/docs/concat-adr-index.sh --check` and inserting any newly stale rows in commit-merge order.
- Known finding (out of scope): `scripts/docs/concat-adr-index.sh --check` currently reports a much larger fragment-vs-README drift than this PR introduces. Many ADRs have rows in `README.md` without corresponding `_index_fragments/` files, and several `_order.txt` slugs have no fragment yet. Running `--write` blindly would drop ~37 README rows for ADRs unrelated to this PR. The ADR-0221 fragment-driven contract therefore could not be enforced via a clean `--write` here; the eight new rows were appended directly to keep the change scoped. A separate sweep PR is needed to flush the residual drift.
- Re-test on rebase:

```bash
for n in 0235 0236 0238 0239 0251 0276 0279 0315; do
  grep -cE "^\| \[ADR-$n\]" docs/adr/README.md  # must be ≥ 1
done
bash scripts/docs/concat-adr-index.sh >/dev/null  # must succeed
```
```bash
  -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_async_pending_fence
# All 8 cases must pass: 4 v2-contract + 4 ring-tunable.
```
0096 — tools/vmaf-tune/ automation umbrella spec (ADR-0237 / Research-0044)
- PR: `feat/vmaf-tune-spec`.
- What rebases need to know: this PR ships only an umbrella ADR + research digest under `docs/`. No tracked source code, no `tools/vmaf-tune/` directory yet, no Meson changes. An upstream sync touching `ffmpeg-patches` or `libvmaf/` cannot collide with this PR.
- On upstream sync: zero interaction. Spec-only PR.
- Re-test on rebase:

```bash
# No build/test impact; verify both docs exist and the ADR is indexed:
ls docs/adr/0237-quality-aware-encode-automation.md \
   docs/research/0044-quality-aware-encode-automation.md
grep -c '\[ADR-0237\]' docs/adr/README.md
```
0097 — test_speed gated on enable_float (fix default-build failure)
- PR: `fix/test-speed-chroma-registration`.
- What rebases need to know: `core/test/meson.build` now wraps the `test_speed` executable + `test()` registration in `if get_option('enable_float')`. The `speed_chroma`/`speed_temporal` extractors live in `speed.c`, which is only compiled when `enable_float=true` (the entries in `feature_extractor.c` are wrapped in `#if VMAF_FLOAT_FEATURES`). As a result, the test's `vmaf_get_feature_extractor_by_name("speed_chroma")` returned NULL on a default build (`enable_float=false`).
- On upstream sync: zero interaction. `test_speed.c` was added fork-side via the Netflix port commit `d3647c73`. The gating pattern matches `test_vulkan_*` (`if get_option('enable_vulkan').enabled()`).
- Re-test on rebase:

```bash
# Default (enable_float=false): test_speed must NOT be in the suite
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false --reconfigure
ninja -C build
meson test -C build  # expect: NO test_speed in the run
# CI shape (enable_float=true): test_speed must run + pass
meson setup build libvmaf -Denable_float=true --reconfigure
ninja -C build
meson test -C build test_speed  # expect: 5/5 pass
```
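The failure mode is easy to reproduce in miniature. When registry entries sit behind `#if VMAF_FLOAT_FEATURES`, a by-name lookup on a build without that macro returns NULL instead of failing to compile, which is why the test itself has to be gated. The sketch below is hypothetical (all `ex_*` names are illustrative) and is meant to be compiled without `-DVMAF_FLOAT_FEATURES=1`. An undefined macro evaluates to 0 in `#if`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; } ExFeatureExtractor;

static const ExFeatureExtractor ex_psnr = { "psnr" };
#if VMAF_FLOAT_FEATURES
static const ExFeatureExtractor ex_speed_chroma = { "speed_chroma" };
#endif

/* NULL-terminated registry; float-only entries vanish on default builds. */
static const ExFeatureExtractor *const ex_registry[] = {
    &ex_psnr,
#if VMAF_FLOAT_FEATURES
    &ex_speed_chroma,
#endif
    NULL,
};

static const ExFeatureExtractor *ex_get_by_name(const char *name)
{
    for (size_t i = 0; ex_registry[i] != NULL; i++)
        if (strcmp(ex_registry[i]->name, name) == 0)
            return ex_registry[i];
    return NULL;  /* unknown OR compiled out: the caller cannot tell which */
}
```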
0098 — Vulkan picture preallocation surface (ADR-0238)
- PR: `feat/vulkan-picture-preallocation`.
- What rebases need to know: ABI grows additively. New public surface in `core/include/libvmaf/libvmaf_vulkan.h`: `enum VmafVulkanPicturePreallocationMethod`, `VmafVulkanPictureConfiguration`, `vmaf_vulkan_preallocate_pictures`, `vmaf_vulkan_picture_fetch`. New enumerator `VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE` in `core/src/picture.h::VmafPictureBufferType`. New TU `core/src/vulkan/picture_vulkan_pool.c` (~180 LOC), registered in `core/src/vulkan/meson.build`. The fork-internal accessor `vmaf_vulkan_state_context()` (declared in `vulkan_internal.h`) exposes the imported state's VkInstance/VkDevice to the pool; it is used only by `libvmaf.c::vmaf_vulkan_preallocate_pictures`.
- `VmafContext` field added: `vmaf->vulkan.pool` next to `vmaf->vulkan.state`. The `vmaf_close()` teardown closes the pool before clearing the state pointer (matches SYCL).
- On upstream sync: zero interaction. The Vulkan backend is fork-only; upstream Netflix has no Vulkan integration.
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_pic_preallocation
# All 6 cases must pass under ASan/UBSan:
#   test_method_none_is_a_no_op
#   test_method_host_allocates_round_robins
#   test_method_device_allocates_round_robins
#   test_fetch_without_preallocate_falls_back
#   test_unknown_method_rejected
#   test_null_args_rejected
```
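Here is a hypothetical sketch of the contracts the test list names: a no-op `NONE` method, round-robin slots for host/device methods, rejection of unknown methods and NULL arguments, and a fallback when no pool exists. The enum and struct names are stand-ins, not the declarations in `libvmaf_vulkan.h`. "Fallback" is modelled as a result flag telling the caller to allocate a fresh picture.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum ExPreallocMethod { EX_METHOD_NONE = 0, EX_METHOD_HOST = 1, EX_METHOD_DEVICE = 2 };

typedef struct {
    enum ExPreallocMethod method;
    unsigned count;  /* slots created by ex_preallocate() */
    unsigned next;   /* round-robin cursor */
} ExPicturePool;

typedef struct {
    int from_pool;  /* 1 = pooled slot, 0 = caller allocates a fresh picture */
    int slot;       /* pooled slot index, or -1 on fallback */
} ExFetchResult;

static int ex_preallocate(ExPicturePool *p, int method, unsigned count)
{
    if (p == NULL)
        return -EINVAL;
    switch (method) {
    case EX_METHOD_NONE:  /* explicit no-op */
        p->method = EX_METHOD_NONE;
        p->count = 0;
        p->next = 0;
        return 0;
    case EX_METHOD_HOST:
    case EX_METHOD_DEVICE:
        if (count == 0)
            return -EINVAL;
        p->method = (enum ExPreallocMethod)method;
        p->count = count;
        p->next = 0;
        return 0;
    default:  /* unknown methods are rejected, never guessed */
        return -EINVAL;
    }
}

static int ex_fetch(ExPicturePool *p, ExFetchResult *out)
{
    if (p == NULL || out == NULL)
        return -EINVAL;
    if (p->method == EX_METHOD_NONE || p->count == 0) {
        out->from_pool = 0;  /* fallback path */
        out->slot = -1;
        return 0;
    }
    out->from_pool = 1;
    out->slot = (int)p->next;
    p->next = (p->next + 1) % p->count;
    return 0;
}
```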
0099 — feature_mobilesal.c + transnet_v2.c migrated to tiny_extractor_template.h
- PR: `refactor/migrate-ai-to-template`.
- What rebases need to know: `feature_mobilesal.c` and `transnet_v2.c` previously open-coded the model-path resolution (`getenv` + log block), the YUV→RGB kernel (mobilesal only), the `vmaf_dnn_session_open` + log boilerplate, and the `VmafOption[].model_path` row. They now use the helpers from `dnn/tiny_extractor_template.h` (PR #251), the same template that `feature_lpips.c` and `fastdvdnet_pre.c` already consume. Net −98 LOC of identical boilerplate.
- Behavior preserved: bit-exact YUV→RGB conversion (mobilesal used the literal copy of the `feature_lpips.c` body that the template hoisted), identical error-log strings, identical option-table flag/type/offset shape. The migrated `mobilesal_options` macro expands to the same struct literal the hand-rolled version produced.
- On upstream sync: zero interaction. Both files are fork-introduced; upstream Netflix has neither extractor.
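The claim that the macro "expands to the same struct literal" can be checked mechanically. The sketch uses a trimmed-down, hypothetical option struct (the real `VmafOption` has more fields and different flag names). It compares a macro-generated row with a hand-rolled one, field by field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down option row; not the real VmafOption. */
typedef enum { EX_OPT_TYPE_STRING = 1 } ExOptionType;
#define EX_OPT_FLAG_FEATURE_PARAM 1u

typedef struct {
    const char *name;
    const char *help;
    size_t offset;
    ExOptionType type;
    unsigned flags;
} ExOption;

/* Plain text substitution: expands to ONE struct literal, no control flow. */
#define EX_MODEL_PATH_OPTION(state_t, help_str)                 \
    { .name = "model_path", .help = (help_str),                 \
      .offset = offsetof(state_t, model_path),                  \
      .type = EX_OPT_TYPE_STRING, .flags = EX_OPT_FLAG_FEATURE_PARAM }

typedef struct { int width; int height; const char *model_path; } ExSalState;

static const ExOption ex_macro_options[] = {
    EX_MODEL_PATH_OPTION(ExSalState, "path to the ONNX model"),
    { 0 },
};

static const ExOption ex_hand_rolled_options[] = {
    { .name = "model_path", .help = "path to the ONNX model",
      .offset = offsetof(ExSalState, model_path),
      .type = EX_OPT_TYPE_STRING, .flags = EX_OPT_FLAG_FEATURE_PARAM },
    { 0 },
};

static int ex_option_rows_equal(const ExOption *a, const ExOption *b)
{
    assert(a != NULL && b != NULL && a->name != NULL && b->name != NULL);
    return strcmp(a->name, b->name) == 0 && strcmp(a->help, b->help) == 0 &&
           a->offset == b->offset && a->type == b->type && a->flags == b->flags;
}
```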
0100 — cuda/ring_buffer.{c,h} → gpu_picture_pool.{c,h} (ADR-0239)
- PR: `refactor/gpu-picture-pool-extract`.
- What rebases need to know: `core/src/cuda/ring_buffer.c` and `ring_buffer.h` are removed. The same callback-based round-robin pool lives at `core/src/gpu_picture_pool.{c,h}` under renamed symbols (`VmafRingBuffer` → `VmafGpuPicturePool`, `vmaf_ring_buffer_*` → `vmaf_gpu_picture_pool_*`, `_fetch_next_picture` → `_fetch`). All call sites in `libvmaf.c` are migrated. `core/test/test_ring_buffer.c` is renamed to `test_gpu_picture_pool.c` with the corresponding meson update.
- Netflix-upstream interaction: minimal. Netflix's `cuda/ring_buffer.{c,h}` was last touched in commit `cb1d49c6`. An upstream sync that resurrects the old names should be redirected to the new ones; the file move is purely fork-local. The Netflix#1300 mutex-destroy-order fix (ADR-0157) is preserved: it moved verbatim to the new file and remains attached to `vmaf_gpu_picture_pool_close`.
- SYCL pool migration: `vmaf_sycl_picture_pool_*` keeps its public-internal API but now delegates to the generic pool. The SYCL wrapper struct (`VmafSyclPicturePool`) just owns the `VmafSyclCookie` storage; `std::mutex` drops out.
- Vulkan pool migration: bundled into this PR after #264 merged. `picture_vulkan_pool.c` is rewritten as a thin wrapper around the generic pool. The wrapper struct owns per-pool state for the alloc/free callbacks; the generic pool owns the round-robin slots / mutex / unwind. Same pattern as the SYCL migration above.
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=dnn
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# All 11 dnn-suite + 4 extractor smoke tests must pass.
meson test -C build  # 47/47 pass under ASan/UBSan
# CUDA build (CI-only; pre-existing local nvcc include-path quirk):
meson setup build-cuda libvmaf -Denable_cuda=true
ninja -C build-cuda
meson test -C build-cuda test_gpu_picture_pool
# SYCL build:
meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
meson test -C build-sycl
```
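Here is a minimal sketch of a callback-based round-robin pool in the shape described above. The pool owns the slots, the cursor, the mutex, and the partial-allocation unwind. The wrapper (SYCL, Vulkan, or here a toy host allocator) supplies alloc/free callbacks plus a cookie. All names are hypothetical. The teardown order (free slots under the lock, unlock, then destroy the mutex last) is one reasonable reading of a mutex-destroy-order rule. It is not a copy of the Netflix#1300 fix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

typedef int (*ex_alloc_fn)(void *cookie, void **pic);
typedef void (*ex_free_fn)(void *cookie, void *pic);

typedef struct {
    pthread_mutex_t lock;
    void **slots;
    unsigned count;
    unsigned next;       /* round-robin cursor */
    void *cookie;        /* wrapper-owned state handed to the callbacks */
    ex_alloc_fn alloc;
    ex_free_fn free_pic;
} ExGpuPool;

static int ex_pool_init(ExGpuPool *p, unsigned count, void *cookie,
                        ex_alloc_fn alloc, ex_free_fn free_pic)
{
    if (p == NULL || alloc == NULL || free_pic == NULL || count == 0)
        return -EINVAL;
    p->slots = calloc(count, sizeof(*p->slots));
    if (p->slots == NULL)
        return -ENOMEM;
    if (pthread_mutex_init(&p->lock, NULL) != 0) {
        free(p->slots);
        p->slots = NULL;
        return -ENOMEM;
    }
    p->count = count;
    p->next = 0;
    p->cookie = cookie;
    p->alloc = alloc;
    p->free_pic = free_pic;
    for (unsigned i = 0; i < count; i++) {
        const int err = alloc(cookie, &p->slots[i]);
        if (err != 0) {
            while (i-- > 0)  /* unwind only what was allocated */
                free_pic(cookie, p->slots[i]);
            pthread_mutex_destroy(&p->lock);
            free(p->slots);
            p->slots = NULL;
            return err;
        }
    }
    return 0;
}

static void *ex_pool_fetch(ExGpuPool *p)
{
    pthread_mutex_lock(&p->lock);
    void *pic = p->slots[p->next];
    p->next = (p->next + 1) % p->count;
    pthread_mutex_unlock(&p->lock);
    return pic;
}

/* Caller must guarantee no other thread still uses the pool. */
static void ex_pool_close(ExGpuPool *p)
{
    pthread_mutex_lock(&p->lock);
    for (unsigned i = 0; i < p->count; i++)
        p->free_pic(p->cookie, p->slots[i]);
    free(p->slots);
    p->slots = NULL;
    p->count = 0;
    pthread_mutex_unlock(&p->lock);
    pthread_mutex_destroy(&p->lock);  /* last, after the final unlock */
}

/* Example host callbacks: the cookie is a live-allocation counter. */
static int ex_host_alloc(void *cookie, void **pic)
{
    *pic = malloc(64);
    if (*pic == NULL)
        return -ENOMEM;
    ++*(int *)cookie;
    return 0;
}

static void ex_host_free(void *cookie, void *pic)
{
    free(pic);
    --*(int *)cookie;
}
```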
0104 — psnr_vulkan.c migrated to vulkan/kernel_template.h
- PR: `refactor/migrate-psnr-vulkan-to-template`.
- What rebases need to know: `vulkan/kernel_template.h` (410 LOC, ADR-0246, PR #251) shipped with zero consumers. Its docstring designated `psnr_vulkan.c` as the reference implementation. This PR lands the migration as the first consumer of the Vulkan template, paired with PR #269 (the first CUDA template consumer). The 5 long-lived pipeline objects (descriptor-set layout, pipeline layout, shader module, compute pipeline, descriptor pool) collapse from individual struct fields into one `VmafVulkanKernelPipeline pl` bundle. `create_pipeline()` (~104 LOC) collapses to a single `vmaf_vulkan_kernel_pipeline_create()` call (~30 LOC); the template owns descriptor-set layout creation, the pipeline layout, the shader module, the compute pipeline, and descriptor-pool sizing. `close_fex()`'s `vkDeviceWaitIdle` + 5× `vkDestroy*` sweep collapses to one `vmaf_vulkan_kernel_pipeline_destroy()` call.
- Net LOC delta: −55 LOC on `psnr_vulkan.c` directly. In the CUDA template, helper-call boilerplate roughly cancels the inline savings. The Vulkan template removes enough pipeline-creation code that even the first consumer is a net win.
- Bit-exactness gates: spec-constants, the push-constant struct, shader bytecode, dispatch grid math, and host-side reduction are byte-identical to the prior implementation. The template only owns descriptor-set layout / pipeline layout / shader module / compute pipeline creation / descriptor pool sizing, none of which affects the kernel's mathematical behaviour. The cross-backend parity gate (`places=4`) re-runs unchanged.
- On upstream sync: zero interaction. `psnr_vulkan.c` is fork-introduced (T7-23 / ADR-0182 / ADR-0216).
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe
# Cross-backend parity gate (places=4):
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
```
0105 — moment_vulkan.c + ciede_vulkan.c migrated to vulkan/kernel_template.h
- PR: `refactor/migrate-motion-vulkan-to-template` (note: the branch name reflects the original intent; motion's two-pipeline shape didn't fit the template's single-pipeline contract, so this PR migrates moment + ciede instead).
- What rebases need to know: second + third consumers of `vulkan/kernel_template.h` (after PR #270 = psnr_vulkan, the first consumer). Both files follow the identical migration pattern:
  - Replace 5 individual pipeline-object fields (`dsl`, `pipeline_layout`, `shader`, `pipeline`, `desc_pool`) with one `VmafVulkanKernelPipeline pl` bundle.
  - Replace ~100 LOC of `create_pipeline()` body (descriptor-set layout + pipeline layout + shader module + compute pipeline + descriptor pool boilerplate) with a single `vmaf_vulkan_kernel_pipeline_create()` call.
  - Replace `close_fex()`'s `vkDeviceWaitIdle` + 5× `vkDestroy*` sweep with one `vmaf_vulkan_kernel_pipeline_destroy()` call.
- Per-file LOC deltas:
  - `moment_vulkan.c`: −60 LOC (450 → 390).
  - `ciede_vulkan.c`: −59 LOC (536 → 477).
  - Net: −119 LOC.
- Bit-exactness preserved: spec-constants (width/height/bpc/subgroup_size identical across both), push-constant structs (`MomentPushConsts`, `CiedePushConsts`), shader bytecodes (`moment_spv`, `ciede_spv`), dispatch grid math, and host-side reductions are byte-identical to the prior implementation. Cross-backend parity gates (`places=4` for the moment integer reduce; `places=2` for ciede transcendentals per ADR-0187) re-run unchanged.
- `motion_vulkan.c` deferred: motion uses two pipelines (first frame vs subsequent) sharing one DSL + layout + shader + pool. The template's current shape produces one pipeline per descriptor; splitting motion across two `VmafVulkanKernelPipeline` instances would duplicate the shared objects. Tracked as a follow-up template extension (multi-pipeline support).
- On upstream sync: zero interaction. Both files are fork-introduced (T7-23 / ADR-0182 / ADR-0187).
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build  # 50/50 pass on lavapipe (under ASan/UBSan)
python scripts/ci/cross_backend_parity_gate.py --feature float_moment_ref1st --places 4
python scripts/ci/cross_backend_parity_gate.py --feature ciede2000 --places 2
```
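The `places=N` thresholds are easiest to read as a tolerance. Assume the parity gate follows Python `unittest.assertAlmostEqual` semantics (`round(a - b, places) == 0`); the gate script itself is not shown here, so this is an assumption. Under that rule, two values agree when `|a - b| < 0.5 * 10^-places`, up to rounding at the exact boundary. Here is a sketch with a hypothetical function name:

```c
#include <assert.h>

/* Agreement to `places` decimals, unittest-style (assumed gate semantics):
 * |a - b| < 0.5 * 10^-places, ignoring rounding at the exact boundary. */
static int ex_equal_to_places(double a, double b, int places)
{
    assert(places >= 0 && places <= 15);
    double tol = 0.5;
    for (int i = 0; i < places; i++)
        tol /= 10.0;
    const double diff = a > b ? a - b : b - a;
    return diff < tol;
}
```

Under this reading, `places=2` for ciede tolerates differences just under 0.005, a hundred times looser than the 0.00005 allowed by `places=4`. That headroom is why it is reserved for transcendental-heavy kernels.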
0101 — GPU backend pattern doc (ADR-0240)
- PR: `docs/gpu-backend-template`.
- What rebases need to know: doc-only PR. Adds `docs/development/gpu-backend-template.md` (the recipe new GPU backends follow) and `core/include/libvmaf/AGENTS.md` (public-headers-tree invariant note). No source code, no meson changes, no ABI impact.
- On upstream sync: zero interaction. Both files are fork-introduced.
- Re-test on rebase:

```bash
# Doc-only: verify both files exist and AGENTS.md references the recipe:
test -f docs/development/gpu-backend-template.md
test -f core/include/libvmaf/AGENTS.md
grep -c 'gpu-backend-template' core/include/libvmaf/AGENTS.md
```
0102 — Tiny-AI test registration macro (tiny_ai_test_template.h)
- PR: `refactor/test-registration-macro`.
- What rebases need to know: the new `core/test/tiny_ai_test_template.h` emits the four standard registration tests (`<name>_is_registered`, `<name>_provides_primary_feature`, `<name>_options_table_well_formed`, `<name>_init_rejects_missing_model`) via the `VMAF_TINY_AI_DEFINE_REGISTRATION_TESTS(ext, feat, env, prefix)` macro. The four per-extractor test files (`test_lpips.c`, `test_mobilesal.c`, `test_transnet_v2.c`, `test_fastdvdnet_pre.c`) shrank from ~140 LOC each to ~20-50 LOC. Net −286 LOC. Behavior is preserved bit-exactly (same assertions, same env-var save/restore dance, same `setenv` shim for MSVCRT). TransNet V2 keeps two extractor-specific extra tests (binary-flag round-trip + `provided_features` list-termination) that the macro doesn't cover.
- On upstream sync: zero interaction. The four test files are fork-introduced (per ADR-0042 / ADR-0168 / ADR-0220 / ADR-0223 / ADR-0215).
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# 4/4 binaries pass; 18 individual tests total (4x4 standard + 2
# TransNet V2 extras).
```
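The registration macro works by token pasting: one invocation stamps out a family of named test functions. The cut-down, hypothetical sketch below shows the mechanism with two of the four tests, a toy registry, and a MinUnit-style assert. It is not the contents of `tiny_ai_test_template.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* MinUnit-style assert: on failure, return the message from the test. */
#define ex_mu_assert(msg, cond) do { if (!(cond)) return (msg); } while (0)

/* Toy registry standing in for the real extractor lookup. */
typedef struct { const char *name; const char *primary_feature; } ExExtractor;

static const ExExtractor ex_registry[] = {
    { "ext_a", "feat_a" },
    { "ext_b", "feat_b" },
};

static const ExExtractor *ex_find(const char *name)
{
    for (size_t i = 0; i < sizeof(ex_registry) / sizeof(ex_registry[0]); i++)
        if (strcmp(ex_registry[i].name, name) == 0)
            return &ex_registry[i];
    return NULL;
}

/* One invocation defines <prefix>_is_registered and
 * <prefix>_provides_primary_feature via token pasting. */
#define EX_DEFINE_REGISTRATION_TESTS(prefix, ext, feat)                          \
    static char *prefix##_is_registered(void)                                    \
    {                                                                            \
        ex_mu_assert(#prefix ": extractor not registered", ex_find(ext) != NULL); \
        return NULL;                                                             \
    }                                                                            \
    static char *prefix##_provides_primary_feature(void)                         \
    {                                                                            \
        const ExExtractor *e = ex_find(ext);                                     \
        ex_mu_assert(#prefix ": extractor not registered", e != NULL);           \
        ex_mu_assert(#prefix ": wrong primary feature",                          \
                     strcmp(e->primary_feature, feat) == 0);                     \
        return NULL;                                                             \
    }

EX_DEFINE_REGISTRATION_TESTS(ext_a, "ext_a", "feat_a")
EX_DEFINE_REGISTRATION_TESTS(ext_b, "ext_b", "feat_b")
EX_DEFINE_REGISTRATION_TESTS(ghost, "ghost", "none")  /* deliberately unregistered */
```

Because each generated test returns its own prefixed message, a failure still names the extractor. This is what keeps the per-assertion diagnostics intact after the per-file boilerplate is removed.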
0103 — integer_psnr_cuda.c migrated to cuda/kernel_template.h
- PR: `refactor/migrate-psnr-cuda-to-template`.
- What rebases need to know: `cuda/kernel_template.h` shipped with no consumers in PR #251 (ADR-0246). This PR migrates the first consumer, `integer_psnr_cuda.c`, the file the template's own docstring explicitly designated as the reference. The `CUstream` + `CUevent` + `CUevent` triple and the `(VmafCudaBuffer device, void *host_pinned, size_t bytes)` readback pair are now dispensed by the template helpers (`vmaf_cuda_kernel_lifecycle_init`/`_close`, `vmaf_cuda_kernel_readback_alloc`/`_free`, `vmaf_cuda_kernel_submit_pre_launch`, `vmaf_cuda_kernel_collect_wait`) instead of being open-coded. `PsnrStateCuda` shrinks: one `VmafCudaKernelLifecycle` replaces three fields (`event` + `finished` + `str`), and one `VmafCudaKernelReadback` replaces two (`sse` + `sse_host`).
- Net LOC delta: +8 LOC on `integer_psnr_cuda.c` alone, because the helpers add per-call boilerplate. The dedup win materialises as more CUDA feature kernels (motion / moment / ssim / vif / adm) migrate one at a time in follow-up PRs. Each subsequent migration saves ~15 LOC.
- Bit-exactness gates: kernel launch + reduction logic unchanged. The migration only touches state-management boilerplate around the kernel; the SSE accumulator math, the per-bpc kernel function lookup, the host-side `log10` score formula, and the dispatch grid-dim calculation are byte-identical to the prior implementation. The Netflix golden gate + CPU/CUDA cross-backend parity gate (`places=4`) re-run unchanged.
- On upstream sync: zero interaction. `integer_psnr_cuda.c` is fork-introduced (T7-23 / ADR-0182).
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=true
ninja -C build
meson test -C build  # CUDA test suite must pass
# Cross-backend parity gate:
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
```
0125 — Vulkan submit-side template + fence pool + descriptor pre-alloc bundle (ADR-0256)
- Touches:
  - `core/src/vulkan/kernel_template.h`: fork-local. Adds the `VmafVulkanKernelSubmitPool` struct + `_create`/`_destroy`/`_acquire` helpers + the `vmaf_vulkan_kernel_descriptor_sets_alloc` helper. Upstream has no Vulkan backend, so there is no merge surface. Output landing in `runs/phase_a/` is gitignored; rerun the script to reproduce.
  - `core/src/feature/vulkan/{psnr_hvs,vif,float_vif,float_adm}_vulkan.c`: fork-local kernel TUs, also with no upstream peer.
- Invariant: the four migrated kernels keep all per-frame `VkFence` + `VkCommandBuffer` + `VkDescriptorSet` resources alive across frames in the pool. Pre-bound descriptor sets rely on the kernel's `VmafVulkanBuffer *` handles being init-time stable (allocated in `init()`, freed only in `close_fex`). `vmaf_vulkan_kernel_pipeline_destroy` destroys the descriptor pool, which releases the pre-allocated sets implicitly; callers must NOT call `vkFreeDescriptorSets` on them.
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_vulkan=enabled
ninja -C build
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
  meson test -C build test_vulkan_smoke \
    test_vulkan_async_pending_fence \
    test_vulkan_pic_preallocation
python scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature vif --backend vulkan --places 4
python scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature adm --backend vulkan --places 4
```
0107 — psnr_hvs_cuda async upload + persistent pinned staging (T-GPU-OPT-2/3)
- Touches:
  - `core/src/feature/cuda/integer_psnr_hvs_cuda.c`: only consumer; fork-local from inception (T7-23 / ADR-0188 / ADR-0191). The state adds `upload_str` (dedicated H2D stream), `upload_done` (cross-stream completion event), and per-plane persistent pinned `h_uint_ref[3]`/`h_uint_dist[3]` staging buffers allocated once in `init_fex_cuda`. The per-call helper `upload_plane_cuda` is split into `issue_d2h_plane` (pic-stream D2H), `convert_plane` (CPU normalise), and `issue_h2d_plane` (upload-stream H2D). `submit_fex_cuda` runs the three phases explicitly, records `upload_done` after the last H2D, then calls `cuStreamWaitEvent` on `lc.str` before kernel launches.
  - `core/src/cuda/AGENTS.md`: adds an entry under §Rebase-sensitive invariants documenting the three-phase flow + persistent staging contract.
- Invariant: the pinned `h_uint_*` and `h_ref`/`h_dist` buffers are never freed and re-allocated mid-stream. The H2Ds must run on `upload_str` (not on `lc.str`) so the `cuStreamWaitEvent` cross-stream link is meaningful. The `upload_done` event is recorded after the last H2D for the current frame and waited on once before the first kernel launch of that frame. CUDA graph capture (future T-GPU-OPT-N) depends on the no-per-frame-alloc invariant; collapsing the three-phase split or re-introducing per-frame `vmaf_cuda_buffer_host_alloc` calls breaks that follow-up. The bit-exactness gate is `places=3` for `psnr_hvs_y / cb / cr` and the combined `psnr_hvs` (matches the existing matrix; not `places=4`).
- On upstream sync: zero interaction. `integer_psnr_hvs_cuda.c` is fork-introduced (T7-23 / ADR-0188 / ADR-0191).
- Re-test on rebase:

```bash
meson setup build libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr_hvs --backend cuda --places 3
```
0227 — output.c writer-format unit tests (R3 of coverage-gap-2026-05-02)
- Touches:
  - `core/test/test_output.c` (new): exercises the four writers in `core/src/output.c` (XML / JSON / CSV / SUB) end-to-end via `tmpfile()`-backed sinks and a synthetic `VmafFeatureCollector`. Pure test-only; no production code change.
  - `core/test/meson.build`: registers `test_output` next to `test_feature_collector` (mirrors that test's wiring: `link_with: libvmaf` + libsvm objects + log/predict/metadata helpers).
- Invariant: the test pulls `libvmaf.c` and `output.c` in via `#include "*.c"` (mirroring the precedent in `test_feature_collector.c`) so the per-translation-unit `.gcno` lands in the test build dir and gcovr aggregates `output.c`'s coverage. The mu-test framework macro (`mu_assert`) deliberately early-returns from each `static char *test_*()` body. That is why every test body trips `clang-analyzer-unix.Malloc` "potential leak" notes (cleanup runs only on the success-tail path). This pattern is shared across every `core/test/test_*.c` file and is load-bearing (per the ADR-0141 NOLINT carve-out): replacing it with goto-cleanup would obscure the per-assertion failure message.
- On upstream sync: zero interaction. `output.c` is upstream-mirrored, but this PR doesn't touch it. The test only depends on the four public function signatures (`vmaf_write_output_{xml,json,csv,sub}`); if Netflix renames or reorders those, the test fails to compile and the rebase author updates it then.
- Re-test on rebase:

```bash
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && ./build/test/test_output
```
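The analyzer noise follows directly from the macro shape. The classic MinUnit `mu_assert` (assumed here to match the fork's test header) returns from the enclosing function on failure. Anything freed only on the success tail therefore leaks on a failing path. A hypothetical test body with an allocation counter makes that visible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Classic MinUnit form; assumed equivalent to the fork's test macro. */
#define mu_assert(message, test) do { if (!(test)) return (message); } while (0)

static int ex_live_allocs = 0;  /* outstanding allocations, for the demo */

static void *ex_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        ex_live_allocs++;
    return p;
}

static void ex_free(void *p)
{
    if (p != NULL) {
        free(p);
        ex_live_allocs--;
    }
}

/* mu-test shaped body: a failing mu_assert returns before the cleanup. */
static char *ex_test_body(int force_failure)
{
    char *buf = ex_alloc(64);
    mu_assert("allocation failed", buf != NULL);
    mu_assert("forced failure", !force_failure);
    ex_free(buf);  /* success-tail cleanup only */
    return NULL;
}
```

The leak exists only on the path where the test has already failed and the process is about to report and exit. That is the trade the NOLINT carve-out accepts in exchange for one precise message per assertion.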
0126 — OSSF Scorecard policy (ADR-0263)
- Touches: `.github/workflows/scorecard.yml` (line 45, the `github/codeql-action/upload-sarif@<sha>` pin). The rest of the policy is doc-only (`docs/adr/0263-*.md`, `docs/research/0053-*.md`, `changelog.d/security/`). Upstream Netflix/vmaf does not ship a Scorecard workflow, so the path itself is fork-introduced and won't conflict.
- Invariant: the `upload-sarif` SHA must point to a commit that currently exists in `github/codeql-action`'s git tree. A SHA that was once the `v4` head but no longer exists in the action repository triggers Scorecard's "imposter commit" defence and breaks the workflow with a 400 error against `api.scorecard.dev`. Verify on every Dependabot bump by spot-checking that `gh api /repos/github/codeql-action/commits/<sha>` returns 200.
- On upstream sync: zero interaction.
- Re-test on rebase:

```bash
# Confirm the pin still resolves to a real commit:
pin=$(grep -oE 'codeql-action/upload-sarif@[a-f0-9]{40}' \
  .github/workflows/scorecard.yml | head -1 | cut -d@ -f2)
gh api "/repos/github/codeql-action/commits/$pin" --jq '.sha'
# Then watch the next master push for a green Scorecard run:
gh run list --workflow scorecard --repo VMAFx/vmafx --limit 1
```
0228 — U-2-Net u2netp saliency replacement deferred (ADR-0265)
- Touches: docs-only.
  - `docs/adr/0265-u2netp-saliency-replacement-blocked.md`: new ADR continuing the deferral chain started by ADR-0257.
  - `docs/research/0055-u2netp-saliency-replacement-survey.md`: new research digest (upstream survey + license + distribution + op-allowlist audit + alternatives walk).
  - `docs/ai/models/mobilesal.md`: pointer block updated to reference both ADR-0257 (first blocker) and ADR-0265 (second blocker).
  - `model/tiny/registry.json`: the `mobilesal_placeholder_v0` `notes` field is updated to reference ADR-0265 alongside ADR-0257 (no schema / sha256 / file changes).
  - `model/tiny/mobilesal.json`: sidecar `notes` field updated in lockstep.
  - `scripts/gen_mobilesal_placeholder_onnx.py`: generator notes string updated so re-running is idempotent against the new sidecar / registry text.
  - `CHANGELOG.md`: Changed entry via `changelog.d/changed/T6-2a-followup-u2netp-replacement-deferred.md`.
  - `docs/adr/README.md`: index row via `docs/adr/_index_fragments/0265-u2netp-saliency-replacement-blocked.md`.
- Invariant: zero C-side surface change. The `feature_mobilesal.c` tensor-name contract (input `input` → output `saliency_map`, NCHW float32 `[1, 3, H, W]` → `[1, 1, H, W]`) is unchanged; the on-disk `model/tiny/mobilesal.onnx` (sha256 `f1226310…`) is unchanged; `mobilesal_placeholder_v0`'s `smoke: true` flag is unchanged. Any future drop-in (U-2-Net via `T6-2a-mirror-u2netp-via-release` + `T6-2a-widen-allowlist-resize`, a distilled student, or a BASNet / PoolNet survey result) replaces the `.onnx` and bumps the registry sha256 without touching the C side.
- On upstream sync: zero interaction. `feature_mobilesal.c`, the registry, the ADR, and the research digest are all fork-local (T6-2a; ADR-0218 / ADR-0257 / ADR-0265; not present in Netflix upstream).
- Re-test on rebase:

```bash
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_mobilesal
python3 ai/scripts/validate_model_registry.py
bash scripts/docs/concat-adr-index.sh --check
bash scripts/release/concat-changelog-fragments.sh --check
```
0108 — ssim_accumulate_avx512 per-lane double reduction vectorised¶
- ADR: ADR-0139 (existing; no new ADR — the per-lane reduction order is unchanged).
- Touches:
  - core/src/feature/x86/ssim_avx512.c — the ssim_accumulate_block_avx512 body. The 16 per-lane scalar ssim_accumulate_lane calls are replaced by two 8-wide __m512d passes that compute lv, cv, sv, and lv*cv*sv lane-wise in vector double. Aligned double[16] spill buffers replace the previous _Alignas(64) float[16] ×6 spill, and the scalar accumulation loop now does 4×16 vaddsd instead of 16 invocations of the per-lane helper.
  - CHANGELOG.md — Changed entry.
  - This file — this entry.
- Invariant (load-bearing for ADR-0139 bit-exactness):
  - Per-lane double computation order is byte-identical: ((2.0 * rm) * cm + C1) / l_den, then (2.0 * srsc + C2) / c_den, then (lv * cv) * sv. No FMA contraction: the code uses separate _mm512_mul_pd + _mm512_add_pd. _mm512_fmadd_pd is forbidden because it changes the rounding count and would diverge from scalar's two-step mul+add.
  - Float→double widening uses _mm512_cvtps_pd, which is IEEE-754-exact for finite floats (the 23-bit float mantissa fits losslessly in double's 52-bit mantissa).
  - Lane-by-lane left-to-right reduction order is preserved: local_ssim += t_ssim[k] for k = 0..15. Tree reductions (pairwise add, dual-accumulator unroll) are forbidden because they break running-sum associativity against scalar.
  - AVX2 / NEON twins stay on the per-lane scalar path. They were verified bit-identical against the new AVX-512 path at --precision max on the Netflix src01_hrc00/01_576x324 and checkerboard_1920_1080_10_3_*_0 pairs. The bit-exactness contract (ADR-0139) is per-lane, not per-ISA algorithm, so AVX2 / NEON stay scalar-per-lane until a dedicated PR vectorises them with the same care.
- Rebase impact: zero conflict with Netflix upstream, because the whole SSIM SIMD surface is fork-local (no upstream SSIM SIMD exists). Conflicts only arise if upstream changes ssim_accumulate_default_scalar in iqa/ssim_tools.c. In that case, both the AVX2 / NEON per-lane helper and the AVX-512 vector-double block need a coordinated update that preserves the three invariants above.
- Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
# Bit-exact at --precision max, scalar vs AVX2 vs AVX-512:
for MASK in 0 16 255; do
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--feature float_ms_ssim --feature float_ssim \
--xml -o /tmp/m${MASK}.xml --precision max --cpumask $MASK
done
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m16.xml) # empty
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m255.xml) # empty
- Why this matters on rebase: an upstream commit that touches core/src/feature/ssimulacra2.c could prompt a "let's also port the GPU XYB while we're here" follow-up. The ledger entry is the standing answer: don't. The measurement was redone on NVIDIA in May 2026, and the result still failed places=4 by five decades. See Research-0047.
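The left-to-right rule in the AVX-512 SSIM invariant exists because floating-point addition is not associative. Python floats are IEEE-754 binary64, the same type the scalar SSIM path accumulates in. A few lines are enough to show a running sum and a pairwise tree giving different results on identical inputs. This is an illustration only: running_sum and pairwise_sum are hypothetical helpers, not fork code.

```python
def running_sum(lanes):
    """Left-to-right accumulation, the shape of `local_ssim += t_ssim[k]` for k = 0..N-1."""
    total = 0.0
    for value in lanes:
        total += value
    return total


def pairwise_sum(lanes):
    """Tree reduction: split the lanes in half, reduce each half, then add the two halves."""
    if len(lanes) == 1:
        return lanes[0]
    mid = len(lanes) // 2
    return pairwise_sum(lanes[:mid]) + pairwise_sum(lanes[mid:])
```

With [0.1, 0.2, 0.3], the running sum evaluates (0.1 + 0.2) + 0.3 and the tree evaluates 0.1 + (0.2 + 0.3). The two results differ in the last bit. A dual-accumulator unroll regroups terms the same way, which is why it is banned alongside explicit tree reductions.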
0126 — FastDVDnet real upstream weights drop (ADR-0253)¶
- What changed: replaces model/tiny/fastdvdnet_pre.onnx with the wrapped real upstream FastDVDnet checkpoint (sha256 eb9444cf6f07eefdc7f4f68d09131074dbd1dcee6f88a331ba684dd2fb5937d4, ~9.5 MiB), refreshes the sidecar model/tiny/fastdvdnet_pre.json, flips the registry row's smoke: true → false, and adds license: "MIT" plus the upstream commit pin c8fdf61. New exporter: ai/scripts/export_fastdvdnet_pre.py (the older _placeholder.py exporter is retained for reference). New ADR: docs/adr/0255-fastdvdnet-pre-real-weights.md. The user-facing doc docs/ai/models/fastdvdnet_pre.md is rewritten with provenance, license attribution, and reproduce-the-export instructions.
- Upstream source: fork-local. Netflix/vmaf does not ship a FastDVDnet temporal pre-filter; the C extractor and ONNX surface are entirely fork-introduced (ADR-0215). The wrapped weights are attribution-only (upstream m-tassano/fastdvdnet, MIT).
- On upstream sync: zero interaction. Every file touched (ai/scripts/export_fastdvdnet_pre*.py, model/tiny/fastdvdnet_pre.*, docs/ai/models/fastdvdnet_pre.md, docs/adr/0253-*.md, CHANGELOG fragment, ADR index fragment) lives in fork-introduced trees.
- Re-test on rebase:
# Re-derive the ONNX from the pinned upstream checkpoint.
mkdir -p /tmp/fastdvdnet_upstream && cd /tmp/fastdvdnet_upstream
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/model.pth
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/models.py
cd /path/to/vmaf
python3 ai/scripts/export_fastdvdnet_pre.py \
--upstream-dir /tmp/fastdvdnet_upstream
python3 ai/scripts/validate_model_registry.py
meson test -C build --suite=fast --print-errorlogs test_fastdvdnet_pre
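One sanity check after re-deriving the ONNX is to hash the file and compare it with the sha256 pinned above. Below is a minimal standard-library sketch. sha256_of and matches_pin are hypothetical names, and this is not how validate_model_registry.py is implemented. Expecting a byte-identical re-export also assumes the exporter is deterministic.

```python
import hashlib
from pathlib import Path

# sha256 recorded for model/tiny/fastdvdnet_pre.onnx in this entry.
EXPECTED_SHA256 = "eb9444cf6f07eefdc7f4f68d09131074dbd1dcee6f88a331ba684dd2fb5937d4"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks instead of reading it into memory in one go."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the pinned hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, matches_pin("model/tiny/fastdvdnet_pre.onnx") should return True on a tree that carries the committed weights.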
0127 — ONNX op-allowlist gains Resize (ADR-0258)¶
- Touches:
  - core/src/dnn/op_allowlist.c — fork-local file (no upstream counterpart). One new entry, "Resize", under the /* convolutional */ block.
  - core/test/dnn/test_op_allowlist.c, core/test/dnn/test_onnx_scan.c — fork-local DNN tests.
  - ai/tests/test_op_allowlist.py — fork-local Python parity test.
- Invariant: the C allowlist is the single source of truth; the Python regex parser in ai/src/vmaf_train/op_allowlist.py walks the same op_allowlist.c file. Any future entry needs only the C edit, and Python symmetry follows automatically.
- Upstream source: fork-local. Netflix/vmaf has no ONNX op-allowlist surface; the entire core/src/dnn/ tree is fork-introduced.
- On upstream sync: zero interaction. Every file touched lives in fork-introduced trees.
- Re-test on rebase:
meson test -C build test_op_allowlist test_onnx_scan
PYTHONPATH=ai/src python -m pytest ai/tests/test_op_allowlist.py
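The single-source-of-truth setup works because a regex can recover the op names from the C file without compiling it. The sketch below assumes the entries are double-quoted string literals inside a braced array. The array name and layout in SAMPLE are hypothetical, and the real op_allowlist.c and its Python parser may differ. C comments are stripped first so that a commented-out op is not picked up.

```python
import re

BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")
STRING_LITERAL = re.compile(r'"([A-Za-z][A-Za-z0-9_]*)"')


def parse_allowlist(c_source):
    """Return quoted op names in file order, ignoring anything inside C comments."""
    code = BLOCK_COMMENT.sub("", c_source)
    code = LINE_COMMENT.sub("", code)
    return STRING_LITERAL.findall(code)


# Hypothetical layout, for illustration only.
SAMPLE = '''
static const char *const allowed_ops[] = {
    /* convolutional */
    "Conv",
    "Resize",
    // "Loop",   commented out, must not be picked up
    NULL,
};
'''
```

Here parse_allowlist(SAMPLE) yields ["Conv", "Resize"], so adding "Resize" on the C side is the only edit the Python side needs.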
0231 — vif.comp + ciede.comp precise decorations (ADR-0269 / Step A of Vulkan 1.4 bump)¶
- Touches:
  - core/src/feature/vulkan/shaders/vif.comp (3 local-variable type qualifiers: g, sv_sq, gg_sigma_f → precise float).
  - core/src/feature/vulkan/shaders/ciede.comp (yuv_to_rgb outputs, rgb_to_xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE).
- Invariant: both shaders are fork-local (the Vulkan backend is fork-added; upstream Netflix/vmaf has no Vulkan compute kernels). The precise keyword is GLSL 4.50 standard syntax; glslc 2026.1 lowers it to a per-result OpDecorate NoContraction. The decorations are load-bearing for the cross-backend gate on NVIDIA driver 595.71+. Removing them would re-introduce the 42/48 ciede regression at API 1.3 documented in research-0054.
- On upstream sync: zero interaction. Both shader files are entirely fork-introduced; upstream has no Vulkan compute path.
- Re-test on rebase:
# Re-confirm the cross-backend gate on a Vulkan-capable host.
meson setup core/build -Denable_vulkan=enabled
ninja -C core/build
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--feature vif --backend vulkan --places 4
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--feature ciede --backend vulkan --places 4
# Confirm SPIR-V still emits NoContraction post-rebase.
glslc --target-env=vulkan1.3 -O \
core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
spirv-dis /tmp/vif.spv | grep -c NoContraction # expect ≥ 60
Expected on NVIDIA 595.71+: vif 0/48 OK, ciede 5/48 FAIL (max abs 8.9e-05 — pre-existing fork debt at API 1.3, see ADR-0269). On RADV / lavapipe: bit-exact (precise is a no-op there).
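The OK/FAIL counts above are per-frame comparisons at places=4. This sketch assumes the gate follows Python unittest's assertAlmostEqual convention, where a pair passes when round(a - b, places) == 0. Under that assumption a frame fails once |a - b| exceeds about 5e-05, which makes the 8.9e-05 worst case roughly 1.78× the threshold. places_mismatches is a hypothetical helper, not the code of cross_backend_vif_diff.py.

```python
def places_mismatches(reference, candidate, places=4):
    """Count frames failing an assertAlmostEqual-style check; also return the max abs diff."""
    if len(reference) != len(candidate):
        raise ValueError("per-frame score lists must have the same length")
    failures = 0
    max_abs = 0.0
    for ref, cand in zip(reference, candidate):
        diff = ref - cand
        max_abs = max(max_abs, abs(diff))
        if round(diff, places) != 0:  # same test unittest applies when places is given
            failures += 1
    return failures, max_abs
```

For example, comparing CPU scores [1.0, 2.0, 3.0] with Vulkan scores [1.000089, 2.00003, 3.0] gives one failure (the 8.9e-05 frame) and a max abs diff of about 8.9e-05.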
0229 — fr_regressor_v2 codec-aware scaffold (ADR-0272)¶
- ADR: ADR-0272
- Touches:
  - ai/scripts/train_fr_regressor_v2.py (new) — Phase A JSONL consumer; trains the codec-aware FRRegressor.
  - model/tiny/fr_regressor_v2.onnx (new, smoke) — placeholder ONNX from --smoke mode; re-baked on production training.
  - model/tiny/fr_regressor_v2.json (new) — sidecar.
  - model/tiny/registry.json — new entry with smoke: true.
  - docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md (new).
  - docs/adr/README.md — index row.
  - docs/research/0058-fr-regressor-v2-feasibility.md (new).
  - docs/ai/models/fr_regressor_v2.md (new) — model card.
  - ai/AGENTS.md — invariant note (codec block layout + ENCODER_VOCAB ordering).
  - CHANGELOG.md — Added entry.
- Invariant: the 8-D codec block layout is [encoder_onehot(6), preset_norm, crf_norm], with ENCODER_VOCAB = (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown) in load-bearing order. The CRF normaliser is /63 (union upper bound); the preset normaliser is /9. Bumping the vocabulary requires a re-train; existing checkpoints pin the order they were trained against via encoder_vocab_version in the sidecar. The two-input ONNX (features, codec) follows the LPIPS-Sq precedent (ADR-0040 / ADR-0041).
- Rebase impact: entirely fork-local and purely additive; no upstream-mirror file is touched. The Phase A schema (consumed by this trainer) is itself fork-local (tools/vmaf-tune/). No conflict expected on /sync-upstream.
- Re-test on rebase:
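The codec block layout above is easy to get subtly wrong: one misplaced slot or normaliser silently shifts every feature. Here is a minimal sketch of building the 8-D vector. ENCODER_VOCAB, /63 and /9 come from the invariant. Two parts are assumptions: the preset arrives as an integer index, and out-of-vocabulary encoders go to the unknown slot. codec_block is a hypothetical name, not the trainer's function.

```python
ENCODER_VOCAB = ("libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown")
CRF_NORMALISER = 63.0     # union upper bound of the CRF ranges
PRESET_NORMALISER = 9.0


def codec_block(encoder, preset_index, crf):
    """Return [encoder_onehot(6), preset_norm, crf_norm] as a list of 8 floats."""
    # Assumption: encoders outside the vocabulary land in the trailing "unknown" slot.
    name = encoder if encoder in ENCODER_VOCAB else "unknown"
    onehot = [0.0] * len(ENCODER_VOCAB)
    onehot[ENCODER_VOCAB.index(name)] = 1.0
    # Assumption: preset_index is already an integer on the 0..9 scale.
    return onehot + [preset_index / PRESET_NORMALISER, crf / CRF_NORMALISER]
```

Because the one-hot position comes from the tuple order, reordering ENCODER_VOCAB changes which slot every checkpoint was trained on. That is why the sidecar pins encoder_vocab_version.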
0311 — libFuzzer harness expansion: yuv_input + cli_parse (ADR-0311)¶
- ADR: ADR-0311; parent ADR-0270.
- Touches:
  - core/test/fuzz/fuzz_yuv_input.c (new)
  - core/test/fuzz/fuzz_cli_parse.c (new)
  - core/test/fuzz/meson.build — two new executable(...) blocks for the harnesses, plus a shared fuzz_vidinput_sources list.
  - core/test/fuzz/yuv_input_corpus/* (new — 6 seeds covering 8/10-bit × 4:2:0 / 4:2:2 / 4:4:4 plus a truncated-frame seed).
  - core/test/fuzz/cli_parse_corpus/* (new — 6 seeds covering the --feature, --model, --reference, YUV-flag, and --help shapes).
  - core/test/fuzz/README.md — Targets table extended.
  - .github/workflows/fuzz.yml — matrix gains fuzz_yuv_input + fuzz_cli_parse; the per-harness wall-clock budget is reduced from 300 s to 60 s so the 3-target matrix fits the existing timeout-minutes: 15 cap.
  - docs/development/fuzzing.md — runbook table + smoke commands extended.
  - docs/adr/0311-libfuzzer-harness-expansion.md (new)
  - docs/research/0083-libfuzzer-harness-expansion-target-survey.md (new)
  - libvmaf/AGENTS.md — new invariant block for the one-parser-one-harness rule.
  - CHANGELOG.md — Added entry.
- Invariant:
  - The fuzz scaffold remains opt-in (-Dfuzz=true); every default meson setup invocation must continue to skip it.
  - fuzz_yuv_input re-includes tools/yuv_input.c and the rest of the vidinput trio as build inputs. If upstream Netflix/vmaf splits or renames those source files, the matching meson.build source list must be updated.
  - fuzz_cli_parse re-includes tools/cli_parse.c as a build input and links against libvmaf for vmaf_version() and feature-dictionary symbols. The -Wl,--wrap=exit link arg is load-bearing: without it, usage()'s exit(1) would terminate the fuzzer process on the first bad input.
  - LLVMFuzzerTestOneInput keeps external linkage; the scaffold-wide // NOLINTNEXTLINE(misc-use-internal-linkage) pattern is correct for libFuzzer's name-resolved entry-point ABI.
- Rebase impact: any upstream sync that touches core/tools/{yuv_input,cli_parse}.c must re-run the 60 s smoke per harness on the merged tip. Record any newly found crash-* artefact under the matching <target>_known_crashes/ dir, not in <target>_corpus/. The __wrap_exit shim in fuzz_cli_parse.c works only with GNU ld / lld; do not assume it works on Apple ld without an -undefined,dynamic_lookup fallback.
- Re-test on rebase:
CC=clang CXX=clang++ \
meson setup build-fuzz libvmaf \
--buildtype=debug \
-Db_sanitize=address \
-Db_lundef=false \
-Dfuzz=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz \
test/fuzz/fuzz_y4m_input \
test/fuzz/fuzz_yuv_input \
test/fuzz/fuzz_cli_parse
./build-fuzz/test/fuzz/fuzz_yuv_input \
-seed=0 -runs=1000 \
core/test/fuzz/yuv_input_corpus/
./build-fuzz/test/fuzz/fuzz_cli_parse \
-seed=0 -runs=1000 \
core/test/fuzz/cli_parse_corpus/
0229 — libFuzzer scaffold for the YUV4MPEG2 parser (ADR-0270)¶
- ADR: ADR-0270
- Touches:
  - core/test/fuzz/fuzz_y4m_input.c (new)
  - core/test/fuzz/meson.build (new)
  - core/test/fuzz/README.md (new)
  - core/test/fuzz/y4m_input_corpus/* (new — six seeds)
  - core/test/fuzz/y4m_input_known_crashes/* (new — one 411-chroma OOB reproducer; excluded from CI corpus)
  - core/test/meson.build — subdir('fuzz') line.
  - core/meson_options.txt — new option('fuzz', ...).
  - .github/workflows/fuzz.yml (new — nightly 5-minute job).
  - docs/development/fuzzing.md (new — operator runbook).
  - docs/adr/0270-fuzzing-scaffold.md (new)
  - docs/research/0059-libfuzzer-scaffold-y4m.md (new)
  - docs/state.md — new Open-bug row for the 411-chroma OOB write.
  - CHANGELOG.md — Added entry.
- Invariant: the fuzz scaffold is opt-in; every default meson setup invocation must continue to skip it. The harness links statically against core/tools/{y4m_input,yuv_input,vidinput}.c rather than libvmaf.so, so the public C-API surface stays unchanged.
- Rebase impact: the harness re-includes core/tools/y4m_input.c as a build input. Any upstream Netflix/vmaf change that splits or renames the tool sources (e.g. moves the parser into core/src/) needs the corresponding meson.build source-list update and the harness re-test below. The y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m reproducer is the regression gate for the parser fix; do not delete it on upstream sync. If upstream lands the same fix, port the reproducer back into y4m_input_corpus/ as a permanent seed.
- Re-test on rebase:
CC=clang CXX=clang++ \
meson setup build-fuzz libvmaf \
--buildtype=debug \
-Db_sanitize=address \
-Db_lundef=false \
-Dfuzz=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz test/fuzz/fuzz_y4m_input
./build-fuzz/test/fuzz/fuzz_y4m_input \
-max_total_time=60 \
core/test/fuzz/y4m_input_corpus/
# Verify the known-crash reproducer still triggers (until the fix lands):
./build-fuzz/test/fuzz/fuzz_y4m_input \
core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m
0231 — HIP seventh-consumer kernel float_motion_hip (ADR-0273)¶
- ADR: ADR-0273
- Touches:
  - core/src/feature/hip/float_motion_hip.c (new) — seventh consumer of core/src/hip/kernel_template.h. It mirrors core/src/feature/cuda/float_motion_cuda.c call-graph-for-call-graph:
    - init/submit/collect/close invoke the kernel-template helpers in the same order.
    - A flush() callback handles tail-frame motion2 emission.
    - The motion_force_zero short-circuit posture swaps fex->extract and nulls submit / collect / flush / close.
    - The submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch: the kernel writes per-WG SAD float partials directly, with no atomic and no memset.
  - core/src/feature/hip/float_motion_hip.h (new)
  - core/src/hip/meson.build — new entry in hip_sources.
  - core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — new sub-test test_float_motion_hip_extractor_registered (also asserts the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit) and a row in test_table[].
  - docs/adr/0273-hip-seventh-consumer-float-motion.md (new)
  - docs/adr/README.md — index row.
  - docs/backends/hip/overview.md — seventh / eighth consumer note.
  - core/src/hip/AGENTS.md — invariant note.
  - CHANGELOG.md — Added entry (joint with ADR-0274).
- Invariant: the three-buffer ping-pong and the motion_force_zero short-circuit are load-bearing.
  - The state struct carries three uintptr_t buffer slots (ref_in, blur[2]). The runtime PR (T7-10b) will swap them for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *ref_in + VmafCudaBuffer *blur[2] field shape.
  - The motion_force_zero short-circuit (fex->extract swap, kernel-template helpers nulled) must stay aligned with the CUDA twin on every refactor. Otherwise the runtime PR's helper-body flip diverges between the two backends.
  - The submit_pre_launch bypass mirrors the CUDA twin. If a future PR adds a submit_pre_launch call to float_motion_cuda.c's submit path, the HIP twin must follow in the same PR.
- Rebase impact: entirely fork-local. New files are HIP-specific. The only upstream-touching edit is feature_extractor.c, but the change sits inside an existing #if HAVE_HIP block (ADR-0241); upstream has no HAVE_HIP, so no conflict is expected.
- Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke
0232 — HIP eighth-consumer kernel float_ssim_hip (ADR-0274)¶
- ADR: ADR-0274
- Touches:
  - core/src/feature/hip/float_ssim_hip.c (new) — eighth consumer of core/src/hip/kernel_template.h. It mirrors core/src/feature/cuda/integer_ssim_cuda.c call-graph-for-call-graph (the CUDA file registers vmaf_fex_float_ssim_cuda despite its integer_ filename):
    - It is the first multi-dispatch HIP consumer (chars.n_dispatches_per_frame == 2).
    - The submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch; the kernel writes per-block float partials directly.
    - The state struct carries five uintptr_t intermediate float buffer slots (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp), tracked outside the kernel-template's readback bundle.
    - The validate_dims_hip and init_dims_hip helpers are extracted from init() to fit the readability-function-size budget.
  - core/src/feature/hip/float_ssim_hip.h (new)
  - core/src/hip/meson.build — new entry in hip_sources.
  - core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — new sub-test test_float_ssim_hip_extractor_registered (also asserts chars.n_dispatches_per_frame == 2) and a row in test_table[].
  - docs/adr/0274-hip-eighth-consumer-float-ssim.md (new)
  - docs/adr/README.md — index row.
  - docs/backends/hip/overview.md — seventh / eighth consumer note (joint).
  - core/src/hip/AGENTS.md — invariant note.
  - CHANGELOG.md — Added entry (joint with ADR-0273).
- Invariant: multi-dispatch, the five-slot buffer pyramid, and the v1 scale=1 validation are load-bearing.
  - The runtime PR (T7-10b) will swap the five uintptr_t intermediate float buffer slots for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *h_* field shape. Any drift in the CUDA twin's slot count requires a paired update here.
  - The chars.n_dispatches_per_frame == 2 characteristic is asserted in the smoke test; do not silently lower it.
  - The v1 scale=1 -EINVAL validation surface (in validate_dims_hip) must stay aligned with the CUDA twin's compute_scale / vmaf_log chain.
  - The validate_dims_hip / init_dims_hip extraction is intentional for the function-size budget. Do not re-inline them without verifying that the budget still passes.
- Rebase impact: entirely fork-local; same posture as ADR-0273.
- Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke
0280 — vmaf-tune NVENC codec adapters (ADR-0290)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_nvenc,hevc_nvenc,av1_nvenc,_nvenc_common}.py (new). Wholly fork-local; no upstream Netflix/vmaf overlap.
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry expanded.
  - tools/vmaf-tune/tests/test_codec_adapter_nvenc.py (new).
  - tools/vmaf-tune/tests/test_corpus.py — Phase-A registry assertion updated.
  - tools/vmaf-tune/AGENTS.md — invariant note expanded.
  - docs/usage/vmaf-tune.md — "Hardware encoders (NVENC)" section.
  - docs/adr/0290-vmaf-tune-nvenc-adapters.md (new) + docs/adr/README.md index row.
  - docs/research/0065-vmaf-tune-nvenc-adapters.md (new).
  - CHANGELOG.md — Added entry.
- Invariant: known_codecs() returns the four-codec tuple ("av1_nvenc", "h264_nvenc", "hevc_nvenc", "libx264"). Downstream Phase B/C consumers assume the following mnemonic preset map as the canonical cross-codec preset alignment:
  - ultrafast / superfast / veryfast → p1
  - faster → p2
  - fast → p3
  - medium → p4
  - slow → p5
  - slower → p6
  - slowest / placebo → p7
  The CQ window is the hardware-permitted [0, 51]; the Phase A informative window is [15, 40].
- Rebase impact: zero. tools/vmaf-tune/ is wholly fork-local and has no upstream Netflix/vmaf path overlap.
- Re-test on rebase:
0227 — ffmpeg-patches/ series re-verified against n8.1 (2026-05-03)¶
0228 — vmaf-tune libx265 codec adapter (ADR-0288)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (new)
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry add)
  - tools/vmaf-tune/src/vmaftune/encode.py (parse_versions(stderr, encoder=…) gains a per-codec branch)
  - tools/vmaf-tune/src/vmaftune/cli.py (help-text wording only)
  - tools/vmaf-tune/tests/test_codec_adapter_x265.py (new)
  - tools/vmaf-tune/tests/test_corpus.py (membership-based codec list assertion)
- Invariant: the codec-adapter contract documented in tools/vmaf-tune/AGENTS.md (multi-codec from day one; the search loop never branches on codec identity). The parse_versions signature is still backward-compatible: encoder defaults to libx264, so callers from before this PR keep working.
- Upstream source: fork-local. tools/vmaf-tune/ is fork-only; upstream Netflix/vmaf does not ship encode automation.
- On upstream sync: zero interaction. Confirm the _index_fragments/_order.txt row for 0288-vmaf-tune-codec-adapter-x265 remains present after any cross-merge.
- Re-test on rebase:
0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py (new)
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry row + import)
  - tools/vmaf-tune/tests/test_corpus.py (membership assertion relaxed from == ("libx264",) to "libx264" in known_codecs())
  - tools/vmaf-tune/tests/test_codec_adapter_libaom.py (new)
  - tools/vmaf-tune/AGENTS.md (preset-vocabulary invariant)
- Invariant: the cross-codec preset vocabulary (placebo, slowest, slower, slow, medium, fast, faster, veryfast, superfast, ultrafast) is shared across AV1-family adapters, so one --preset axis covers x264 / x265 / svtav1 / libaom-av1. Each adapter maps the human name onto its codec-specific knob; do not introduce per-adapter preset names.
- Upstream source: fork-local. tools/vmaf-tune/ is the fork-introduced quality-aware encode automation harness (ADR-0237); it has no upstream Netflix/vmaf counterpart.
- On upstream sync: zero interaction with upstream/master. Self-contained in tools/vmaf-tune/ and docs/.
- Re-test on rebase:
0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)¶
- ADR: ADR-0275
- Touches:
  - model/tiny/vmaf_tiny_v3.int8.onnx (new, 4 267 B)
  - model/tiny/vmaf_tiny_v4.int8.onnx (new, 7 769 B)
  - model/tiny/registry.json — new vmaf_tiny_v3 and vmaf_tiny_v4 rows with quant_mode, int8_sha256, quant_accuracy_budget_plcc fields.
  - model/tiny/vmaf_tiny_v3.json, model/tiny/vmaf_tiny_v4.json — same fields mirrored into the per-model sidecars.
  - docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md — new "Quantisation" sections.
  - docs/adr/0275-vmaf-tiny-v3-v4-ptq.md (new) and ADR index row.
  - CHANGELOG.md — Added entry.
- Invariant: python ai/scripts/measure_quant_drop.py --all reports [PASS] for both vmaf_tiny_v3 (drop ≤ 0.001 on Netflix features) and vmaf_tiny_v4 (drop ≤ 0.001), inside the 0.01 per-model budget. The runtime redirect from ADR-0174 picks the .int8.onnx sibling when an operator's registry overlay declares quant_mode: dynamic.
- Rebase impact: entirely fork-local; neither v3 nor v4 nor the dynamic-PTQ harness exists upstream. The new int8 ONNX bytes ship as committed binaries (mirroring learned_filter_v1 and nr_metric_v1). They are well below the few-MB external-data threshold and don't require the sigstore + .onnx.data pattern.
- Re-test on rebase:
```bash
python ai/scripts/validate_model_registry.py
python ai/scripts/measure_quant_drop.py --all
```
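The [PASS] condition compares a PLCC drop against the per-model quant_accuracy_budget_plcc. Below is a minimal sketch of that comparison. It assumes the drop is fp32 PLCC minus int8 PLCC. The function name and the sample PLCC values used to exercise it are hypothetical, and measure_quant_drop.py's real logic and output format may differ.

```python
def quant_verdict(plcc_fp32, plcc_int8, budget_plcc=0.01):
    """Return ("PASS" | "FAIL", drop) for a post-training-quantisation accuracy check."""
    drop = plcc_fp32 - plcc_int8  # negative when the int8 model happens to score higher
    return ("PASS" if drop <= budget_plcc else "FAIL"), drop
```

A drop at the ≤ 0.001 level reported for v3 and v4 passes with a wide margin under the default 0.01 budget.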
0229 — NVIDIA-Vulkan ciede2000 places=4 fork debt root-cause (ADR-0273)¶
- Touched files: docs-only.
  - docs/adr/0273-...precision-gap.md (new) + _index_fragments/ row + _order.txt append.
  - docs/research/0055-ciede-vulkan-nvidia-f32-f64-root-cause.md (new) + docs/research/README.md index row.
  - docs/state.md — Open-bugs row T-VK-CIEDE-F32-F64.
  - docs/backends/vulkan/overview.md — NVIDIA-hardware caveat.
  - changelog.d/changed/ciede-vulkan-nvidia-f32-f64-precision-gap.md (new).
  - core/src/vulkan/AGENTS.md — invariant cross-link.
- Invariant: the ciede.comp shader's f32 precision contract is load-bearing. Promoting it to f64 would silently change scores on every Vulkan device that supports shaderFloat64, creating a divergence that depends on a per-device feature bit (RTX 4090 has it; many consumer GPUs don't). The CPU ciede.c::get_lab_color runs its colour-space chain in double. That is upstream Netflix behaviour and must not be narrowed to f32 to "fix" the GPU gap, because doing so would change the Netflix golden ground truth. The 5/48 NVIDIA places=4 mismatch on the highest-ΔE frames is expected and documented; do not attempt to "fix" it without re-reading ADR-0273 first.
- Rebase impact: zero (docs-only). The CPU and shader sources this ADR analyses are unchanged by this PR. If a future upstream rebase touches ciede.c::get_lab_color (the double chain), the ADR's reasoning still holds. If upstream changes the CPU reference's precision posture, ADR-0273 needs a Status: Superseded entry.
- Re-test on rebase: a manual NVIDIA-hardware run, if available:
```bash
cd libvmaf && meson setup build \
  -Denable_vulkan=enabled -Denable_cuda=false && ninja -C build
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary $PWD/core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature ciede --backend vulkan --device 0 --places 4
# Expected post-PR-346 (when merged): 5/48 mismatches at 1.78× threshold.
# Expected pre-PR-346 (current master): 42/48 mismatches at higher ratio.
# If the count drops below 5/48 on NVIDIA, ADR-0273 should record the
# delta and consider closing T-VK-CIEDE-F32-F64.
```
0229 — tools/vmaf-tune fast Phase A.5 scaffold (ADR-0276)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/fast.py (new)
  - tools/vmaf-tune/src/vmaftune/cli.py (new fast subcommand branch)
  - tools/vmaf-tune/pyproject.toml (new [fast] extra)
  - tools/vmaf-tune/tests/test_fast.py (new)
  - tools/vmaf-tune/AGENTS.md (new invariants)
  - docs/usage/vmaf-tune.md (new "Phase A.5" section)
  - docs/adr/0276-vmaf-tune-fast-path.md (new ADR)
  - docs/research/0060-vmaf-tune-fast-path.md (new digest)
- Invariant: the fast subcommand is opt-in and never automatically replaces the Phase A grid path. The slow grid is the ground-truth corpus generator (ADR-0237 contract); the fast path is for the recommendation use case only. Optuna is a lazy-imported optional dependency gated behind the [fast] extra. Importing it at module scope outside fast.py (or its tests) breaks the zero-dependency core install.
- Rebase impact: entirely fork-local. The tool sits under tools/vmaf-tune/, which is fork-added, and no upstream files are touched. Upstream Netflix/vmaf has no analogous surface.
- Re-test on rebase:
pip install -e 'tools/vmaf-tune[fast]'
pytest tools/vmaf-tune/tests/test_fast.py -v
vmaf-tune fast --smoke --target-vmaf 92
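To keep the lazy-import rule, import Optuna only when the fast subcommand actually runs, so the core install never needs it at module scope. Here is a minimal sketch. require_optional is a hypothetical helper, and fast.py may structure this differently. The install hint reuses the editable-install command from the re-test steps.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency at call time, with an install hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{module_name}' is needed for this subcommand; install the [{extra}] extra, "
            f"e.g. pip install -e 'tools/vmaf-tune[{extra}]'"
        ) from exc
```

The fast entry point would call require_optional("optuna", "fast") inside its body. It would not put import optuna at the top of a module that the core CLI imports.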
0229 — vmaf-tune recommend subcommand (ADR-0237 Phase B-lite)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/recommend.py (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  - tools/vmaf-tune/src/vmaftune/cli.py — adds the recommend subparser; the corpus subcommand is untouched.
  - tools/vmaf-tune/tests/test_recommend.py (new). 13-case smoke suite that mocks all binaries; runs in <100 ms.
  - docs/usage/vmaf-tune.md — adds a ## recommend section.
- Invariant: recommend consumes the existing CORPUS_ROW_KEYS schema unchanged (vmaf_score, bitrate_kbps, crf, preset, encoder, exit_status). No schema bump. If a future PR bumps SCHEMA_VERSION, the corpus writer and the recommend reader must be updated in lockstep; tests assert this via test_corpus_row_keys_match_init_contract.
- Rebase impact: zero. tools/vmaf-tune/ is wholly fork-local; no upstream surface touches it.
- Re-test on rebase:
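Because recommend reads only a fixed subset of CORPUS_ROW_KEYS, a reader can validate rows up front and fail loudly on schema drift. The sketch below is hypothetical. The key subset comes from the invariant above. The selection rule (cheapest successful encode that reaches a target VMAF) and the function names are illustrations, not recommend.py's actual logic.

```python
REQUIRED_KEYS = ("vmaf_score", "bitrate_kbps", "crf", "preset", "encoder", "exit_status")


def check_rows(rows):
    """Raise KeyError naming the first row that lacks a required corpus key."""
    for i, row in enumerate(rows):
        missing = [key for key in REQUIRED_KEYS if key not in row]
        if missing:
            raise KeyError(f"corpus row {i} is missing {missing}; was SCHEMA_VERSION bumped?")


def cheapest_meeting_target(rows, target_vmaf):
    """Hypothetical rule: lowest bitrate among successful encodes with vmaf_score >= target."""
    check_rows(rows)
    ok = [r for r in rows if r["exit_status"] == 0 and r["vmaf_score"] >= target_vmaf]
    return min(ok, key=lambda r: r["bitrate_kbps"]) if ok else None
```

Validating before selecting means a writer/reader mismatch after a schema bump raises an error instead of silently skipping rows.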
0228 — integer_ms_ssim_cuda.c joins drain_batch (T-GPU-OPT-2 / ADR-0271)¶
- Touches: core/src/feature/cuda/integer_ms_ssim_cuda.c. No upstream Netflix/vmaf changes are expected here. The file is fork-added (CUDA twin of the upstream-port ms_ssim_score.cu), and everything this PR redrew is also entirely fork-local:
  - the per-scale l_partials[i] / c_partials[i] / s_partials[i] arrays;
  - the per-scale h_l_partials[i] / h_c_partials[i] / h_s_partials[i] pinned host shadows;
  - the submit() ↔ collect() work redistribution;
  - the cuEventRecord(s->lc.finished, s->lc.str) + vmaf_cuda_drain_batch_register(&s->lc) tail.
- Invariant: the engine-scope drain-batch contract from ADR-0271 / drain_batch.h. The kernel-launch order on s->lc.str must stay stable:
  - decimate (× 4);
  - then, for each scale i ∈ 0..4: horiz ⇒ vert_lcs ⇒ DtoH(l_partials[i]) ⇒ DtoH(c_partials[i]) ⇒ DtoH(s_partials[i]);
  - then cuEventRecord(s->lc.finished, s->lc.str);
  - then vmaf_cuda_drain_batch_register(&s->lc).
  Same-stream ordering is what makes the shared SSIM intermediates (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) safe across scales without explicit sync. Any change that moves the per-scale work onto multiple streams breaks bit-exactness unless per-scale intermediates are also added.
- On upstream sync: zero interaction (the file is fork-added). If a future upstream PR adds an integer_ms_ssim_cuda.c of its own, the merger must reconcile the per-scale partials topology and the drain_batch tail with whatever shape the new upstream file brings.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build # confirms the CPU build still links cleanly
# If the dev host has a working nvcc / host-compiler pair:
meson setup build_cuda -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda src/liblibvmaf_feature.a.p/feature_cuda_integer_ms_ssim_cuda.c.o
# Netflix CPU golden gate (CPU is the bit-exactness ground truth):
make test-netflix-golden
# Cross-backend parity (places=4 gate, ADR-0214):
/cross-backend-diff
0277 — ffmpeg-patches refresh against n8.1 — 2026-05-04 (ADR-0277)¶
- Touches: ffmpeg-patches/ is unchanged (no content drift). Doc-only entries land in:
  - docs/adr/0277-ffmpeg-patches-refresh-2026-05-04.md — new ADR.
  - docs/adr/_index_fragments/0277-ffmpeg-patches-refresh-2026-05-04.md — index row.
  - docs/adr/_index_fragments/_order.txt — manifest append.
  - changelog.d/changed/ffmpeg-patches-refresh-2026-05-04.md — Changed entry.
  - This file — this entry.
- Invariant: ffmpeg-patches/series.txt order is load-bearing. Patches 0002…0006 build on each other and only apply cleanly cumulatively. The verification gate is a series replay, not a per-patch git apply --check (per ADR-0118 + CLAUDE.md §12 r14).
- On upstream sync: zero interaction. Netflix/vmaf has no ffmpeg-patches/ tree; this is a fork-local integration surface.
- Re-test on rebase (also the re-replay procedure for the next refresh):
# Clone pristine n8.1
git -C /tmp clone --depth 1 --branch n8.1 \
https://github.com/FFmpeg/FFmpeg.git ff-replay-$(date +%F)
cd /tmp/ff-replay-$(date +%F)
git switch -c refresh-$(date +%F)
git config user.email refresh@local && git config user.name "Refresh Bot"
# Replay the series cumulatively
for p in /path/to/vmaf/ffmpeg-patches/000*-*.patch; do
git am --3way "$p" || break
done
# Regenerate and compare to in-tree
mkdir -p /tmp/ff-regen-$(date +%F)
git format-patch n8.1.. -o /tmp/ff-regen-$(date +%F)/
# Diff old vs new excluding pure format-patch noise
for i in 1 2 3 4 5 6; do
orig=$(ls /path/to/vmaf/ffmpeg-patches/000${i}-*.patch)
regen=$(ls /tmp/ff-regen-$(date +%F)/000${i}-*.patch)
diff -u \
<(grep -v "^From [0-9a-f]\|^Date:\|^index " "$orig") \
<(grep -v "^From [0-9a-f]\|^Date:\|^index " "$regen") \
| head -40
done
If only stylistic diffs surface (PATCH N/M numbering, MIME headers, hunk-context counts, hunk offset shifts against cumulative state), keep originals — record a no-drift refresh ADR. If real content drift surfaces, regenerate and ship the refresh PR with the regenerated patches plus a content-summary ADR.
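The grep-based noise filter in the shell loop above can also be expressed in Python, which is handy when the comparison needs to run inside a larger script. The following is a minimal sketch. `strip_noise` and `content_drift` are hypothetical helper names and are not part of the repo. The sketch drops the same `From <sha>`, `Date:` and `index` lines before diffing and uses only `difflib` from the standard library.

```python
import difflib
import re

# Lines that differ between two format-patch runs without any content change.
# Mirrors the grep -v pattern in the shell loop: "^From [0-9a-f]", "^Date:", "^index ".
_NOISE = re.compile(r"^(From [0-9a-f]|Date:|index )")


def strip_noise(patch_text: str) -> list[str]:
    """Return the patch lines with format-patch noise removed."""
    return [ln for ln in patch_text.splitlines() if not _NOISE.match(ln)]


def content_drift(orig: str, regen: str) -> list[str]:
    """Unified diff of two patches after noise removal (empty list = no drift)."""
    return list(difflib.unified_diff(strip_noise(orig), strip_noise(regen),
                                     fromfile="orig", tofile="regen", lineterm=""))
```

An empty result corresponds to the "keep originals" branch above. Any remaining lines still need a human read to separate hunk-offset shifts from real content drift.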
End-to-end vf_libvmaf smoke is best run from CI (ffmpeg-integration.yml) against an installed libvmaf prefix — the meson-uninstalled .pc does not satisfy FFmpeg's #include <libvmaf.h> probe (the headers live under libvmaf/libvmaf.h only; the system-installed .pc carries an extra -I${includedir}/libvmaf shortcut that the uninstalled .pc omits).
0229 — T7-5 NOLINT-sweep closeout (ADR-0278)
- Touched files:
  - core/src/feature/integer_adm.c (1 NOLINT cite, line ~988, adm_decouple_s123 — upstream-mirror Netflix 966be8d5).
  - core/src/feature/cuda/ssimulacra2_cuda.c (3 NOLINT cites: ss2c_picture_to_linear_rgb, ss2c_host_combine, ss2c_run_scale_gpu / extract_fex_cuda).
  - core/src/feature/vulkan/ssimulacra2_vulkan.c (3 NOLINT cites: ss2v_setup_gaussian, ss2v_picture_to_linear_rgb, ss2v_run_scale).
  - core/src/feature/vulkan/cambi_vulkan.c (1 NOLINT cite: cambi_vk_extract).
  - core/src/feature/sycl/integer_adm_sycl.cpp (6 cites, SYCL kernel-launch entries).
  - core/src/feature/sycl/integer_motion_sycl.cpp (2 cites).
  - core/src/feature/sycl/integer_vif_sycl.cpp (4 cites).
  - core/tools/vmaf.c (3 cites: copy_picture_data, init_gpu_backends, main).
- Invariant: zero behavioural change. Edits are inside comment blocks — (ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278) is appended to the existing prose justifications. No function bodies are split. The 12 SYCL sites share an identical justification string verbatim; preserving the byte-for-byte duplicate is the load-bearing documentation pattern (grep-able across the SYCL TUs).
- On upstream sync: minimal interaction. The cite-only edits live inside comment blocks above the function signatures; rebases will surface them as touched lines, but the function bodies are unchanged. For integer_adm.c's upstream-mirror block (Netflix 966be8d5), the comment edit at lines 984–991 is cosmetic — keep the fork's version on conflict (it merely names the ADR; the underlying prose is unchanged).
- Re-test on rebase:
```bash
# 1. Programmatic audit must report 0 missing citations
python3 - <<'PY'
import re, os
paths = [os.path.join(r, f)
         for r, _, fs in os.walk('core/src')
         for f in fs if f.endswith(('.c', '.cpp', '.h'))]
paths.append('core/tools/vmaf.c')
miss = total = 0
for p in paths:
    with open(p, encoding='utf-8', errors='replace') as fh:
        ls = fh.readlines()
    for i, line in enumerate(ls):
        if 'NOLINT' in line and 'readability-function-size' in line and 'NOLINTEND' not in line:
            total += 1
            # Walk back over the contiguous comment block (at most 13 lines) above the NOLINT.
            ctx = [line]; j = i - 1
            while j >= 0 and j > i - 14:
                s = ls[j].strip()
                if not s:
                    break
                if s.startswith(('//', '/*', '*')):
                    ctx.insert(0, ls[j]); j -= 1
                else:
                    break
            buf = ''.join(ctx)
            if 'ADR-' not in buf and not re.search(r'[Rr]esearch-?\d', buf):
                miss += 1
print(f"sites={total} missing={miss}")
PY

# 2. Build + Netflix golden gate
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
make test-netflix-golden
```
0231 — vmaf-tune score path decodes mp4 -> raw YUV
- Touches: tools/vmaf-tune/src/vmaftune/score.py (new _decode_to_raw_yuv + _needs_decode helpers; run_score shells out to ffmpeg when req.distorted.suffix not in {.yuv, .y4m}); tools/vmaf-tune/tests/test_corpus.py (3 new regression tests + the smoke-end-to-end mock now also stubs the ffmpeg decode call).
- Invariant: the decode-back is the contract the libvmaf CLI imposes — an mp4/webm/etc. file passed as --distorted is treated as raw YUV, rejected for its wrong byte count, and surfaces only as exit_status=234. Future encoder adapters that emit non-raw containers inherit this decode automatically. Do not "optimise" the temp YUV away without first migrating the corpus pipeline to the ffmpeg + libvmaf filter (which can pipe an mp4 stream in directly).
- On upstream sync: zero interaction. vmaf-tune is fork-only tooling; upstream Netflix/vmaf has no analogue.
- Re-test on rebase:
```bash
cd tools/vmaf-tune && python3 -m pytest tests/
# plus an end-to-end smoke (needs a real raw YUV + ffmpeg + vmaf):
./vmaf-tune corpus --source /path/to/ref.yuv --width 1920 \
  --height 1080 --pix-fmt yuv420p --framerate 25 --duration 6 \
  --encoder libx264 --preset medium --crf 23 \
  --output /tmp/smoke.jsonl --no-source-hash
# expect: vmaf_score is a real number, not NaN.
```
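The following is a minimal sketch of the decode-back decision and the ffmpeg invocation it implies. `needs_decode` and `decode_argv` are hypothetical stand-ins for `_needs_decode` and `_decode_to_raw_yuv`, whose real signatures are not shown in this entry. The ffmpeg flags (`-i`, `-f rawvideo`, `-pix_fmt`) are standard. The suffix check is case-sensitive, matching the `req.distorted.suffix` test described above.

```python
from pathlib import Path

# Containers the libvmaf CLI can read directly; everything else is decoded first.
_RAW_SUFFIXES = {".yuv", ".y4m"}


def needs_decode(distorted: Path) -> bool:
    return distorted.suffix not in _RAW_SUFFIXES


def decode_argv(src: Path, dst: Path, pix_fmt: str = "yuv420p") -> list[str]:
    """ffmpeg command that decodes src (mp4/webm/...) to headerless raw YUV at dst."""
    return [
        "ffmpeg", "-y", "-hide_banner", "-loglevel", "error",
        "-i", str(src),
        "-f", "rawvideo", "-pix_fmt", pix_fmt,
        str(dst),
    ]
```

The pixel format must match the `-p`/`-b` values later passed to the vmaf CLI. Otherwise the byte-count mismatch described above reappears on the decoded file.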
0232 — CUDA build pins nvcc --std c++20
- Touches: core/src/meson.build line 686 (cuda_flags = [...]).
- Invariant: nvcc 12.x clamps host C++ at C++17 by default; 13.x accepts up to C++20. Bumping the host stdlib past nvcc's default (any gcc >= 16, whose libstdc++ ships C++23 features) breaks the host-side parse in <type_traits> / <bits/utility.h>. Forcing --std c++20 on CUDA 13+ keeps the host headers parseable. Do not drop this flag without first checking the host gcc version against nvcc's default.
- On upstream sync: zero interaction. Netflix/vmaf doesn't ship the cuda_flags list shape we use (their CUDA build is the original pre-fork pattern); a sync that touches core/src/meson.build around the is_cuda_enabled branch should keep the --std c++20 injection.
- Re-test on rebase:
meson setup core/build-cuda -Denable_cuda=true \
-Denable_sycl=false -Denable_vulkan=disabled
ninja -C core/build-cuda
# smoke
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
-r .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
-d .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
-w 1920 -h 1080 -p 420 -b 8
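Whether `--std c++20` is safe depends on the nvcc major version. The following is a minimal Python sketch of that decision, assuming the flag should only be injected for CUDA 13+ as the invariant describes. `cuda_std_flags` is a hypothetical helper; the real logic lives in core/src/meson.build.

```python
import re

# `nvcc --version` prints a line such as:
#   Cuda compilation tools, release 12.4, V12.4.131
_RELEASE = re.compile(r"release (\d+)\.(\d+)")


def nvcc_release(version_text: str) -> tuple[int, int]:
    m = _RELEASE.search(version_text)
    if m is None:
        raise ValueError("no 'release X.Y' in nvcc --version output")
    return int(m.group(1)), int(m.group(2))


def cuda_std_flags(version_text: str) -> list[str]:
    """Host C++ standard flags to pass to nvcc (hypothetical helper)."""
    major, _minor = nvcc_release(version_text)
    # nvcc 13+ accepts C++20 host code; 12.x stays on its C++17 default.
    return ["--std", "c++20"] if major >= 13 else []
```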
0233 — CUDA motion flush_fex_cuda idempotency guard
- Touches: core/src/feature/cuda/integer_motion_cuda.c — factored an append_if_unwritten helper and routed the two motion2 / motion3 final-frame writes through it.
- Invariant: under T-GPU-OPT-1 (PR #312 / ADR-0242), the pending-collect inside flush_context_cuda may already have written motion2_score[s->index] / motion3_score[s->index] before flush_fex_cuda runs. Any future motion-cuda flush logic that emits the same (feature, index) pair must keep this idempotency contract; otherwise, flush_context_cuda will mis-surface the duplicate write as "context could not be synchronized".
- On upstream sync: the bug only exists because the fork's flush_context_cuda runs the pending-collect before the per-extractor flush. Netflix/vmaf upstream doesn't have the T-GPU-OPT-1 drain pattern, so the pre-#312 code path didn't duplicate-write. If Netflix lands a similar pattern, the fix shape mirrors what's done here.
- Re-test on rebase:
ninja -C core/build-cuda
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
-r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model path=model/vmaf_v0.6.1.json --threads 1 -q \
--output /tmp/cuda.json --json
# Expect: clean run, no "cannot be overwritten" warning,
# no "problem flushing context" error.
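The idempotency contract can be modelled in a few lines. The following is a minimal sketch, a Python stand-in for the C helper: a dict keyed by (feature, index) plays the role of the feature collector, and `append_if_unwritten` skips a write that the pending-collect already made instead of tripping the duplicate-write error. The collector class and the exception are illustrative assumptions, not the libvmaf API.

```python
class ScoreAlreadyWritten(Exception):
    """Raised by the toy collector on a duplicate (feature, index) write."""


class ToyCollector:
    def __init__(self) -> None:
        self.scores: dict[tuple[str, int], float] = {}

    def append(self, feature: str, index: int, score: float) -> None:
        key = (feature, index)
        if key in self.scores:
            # Stands in for the "cannot be overwritten" failure on a second write.
            raise ScoreAlreadyWritten(key)
        self.scores[key] = score


def append_if_unwritten(fc: ToyCollector, feature: str, index: int, score: float) -> bool:
    """Write the score only if the pending-collect has not already done so."""
    if (feature, index) in fc.scores:
        return False
    fc.append(feature, index, score)
    return True
```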
0234 — hw_encoder_corpus.py Phase A real-corpus runner
- Touches: new scripts/dev/hw_encoder_corpus.py (no existing caller; opt-in tooling), plus docs/development/intel-arc-vaapi-driver-priority.md. Output landing in runs/phase_a/ is gitignored — rerun the script to reproduce.
- Invariant: the script's QSV path requires LIBVA_DRIVER_NAME=iHD in its environment (set by the calling shell, not inside the script) when targeting /dev/dri/renderD129 on a multi-card host that has NVIDIA's libva-driver-nvidia shim installed. Without it, libva picks up NVIDIA's NVDEC-VAAPI translation, and the MFX session handshake fails with -9. See the companion doc for the failure mode + fix.
- On upstream sync: zero interaction. The script lives under scripts/dev/ (fork-only); upstream Netflix/vmaf has no comparable Phase A corpus tooling.
- Re-test on rebase:
python3 scripts/dev/hw_encoder_corpus.py \
--vmaf-bin core/build-cuda/tools/vmaf \
--source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
--width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
--encoder h264_nvenc --cq 25 \
--out /tmp/smoke.jsonl
# Expect: 1 cell × ~150 frames, per-frame canonical-6 + vmaf,
# encoder=h264_nvenc, cq=25.
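The driver choice has to come from the environment of the process that opens the VA-API device. A caller that launches the runner from Python can therefore pin the driver on a copied environment. The following is a minimal sketch, assuming the caller builds the child environment itself. `qsv_child_env` and `run_with_qsv_env` are hypothetical helpers and are not part of hw_encoder_corpus.py.

```python
import os
import subprocess
from typing import Mapping, Optional


def qsv_child_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy of the environment with the Intel iHD VA-API driver forced.

    Without this, libva on a host with NVIDIA's VA-API shim installed can
    select the NVIDIA driver, and the MFX session handshake fails.
    """
    env = dict(os.environ if base is None else base)
    env["LIBVA_DRIVER_NAME"] = "iHD"
    return env


def run_with_qsv_env(argv: list[str]) -> int:
    """Run argv (e.g. the corpus runner) with the pinned driver; return its exit code."""
    return subprocess.run(argv, env=qsv_child_env(), check=False).returncode
```

Copying rather than mutating `os.environ` keeps the override scoped to the QSV child process, so NVENC runs in the same session still see the host default.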
0235 — fr_regressor_v2 ENCODER_VOCAB v2 (hw codec extension)
- Touches: ai/scripts/train_fr_regressor_v2.py — ENCODER_VOCAB gains 6 hw-codec entries (3 NVENC + 3 QSV); ENCODER_VOCAB_VERSION bumps 1 -> 2; PRESET_ORDINAL gains 6 sub-tables for p1..p7 (NVENC) and the libx264-aligned QSV preset family.
- Invariant: vocab order is load-bearing — the index of every entry is baked into trained model graphs as a one-hot column position. New entries MUST be added after the existing codec entries (never inserted into the middle), and the unknown sentinel MUST stay last (UNKNOWN_ENCODER_INDEX = N - 1). Bumping ENCODER_VOCAB_VERSION signals that any v1-graph ONNX needs re-export against v2 before consuming v2 training rows.
- On upstream sync: zero interaction. train_fr_regressor_v2.py is fork-only (Phase B prereq, ADR-0237 / ADR-0272).
- Re-test on rebase:
python3 ai/scripts/train_fr_regressor_v2.py --corpus <jsonl> --epochs 200 --no-export — expect PLCC > 0.95 on a multi-codec corpus.
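Why append-before-sentinel matters is easiest to see with a toy vocabulary. The entries below are illustrative placeholders, not the real ENCODER_VOCAB contents. The point is that appending keeps every existing codec's one-hot column fixed, and only the unknown sentinel moves. That sentinel shift is why the version bump forces a re-export.

```python
UNKNOWN = "unknown"


def extend_vocab(vocab: tuple[str, ...], new: list[str]) -> tuple[str, ...]:
    """Append new entries after the existing ones; keep the sentinel last."""
    if not vocab or vocab[-1] != UNKNOWN:
        raise ValueError("sentinel must be the last entry")
    known = list(vocab[:-1])
    for name in new:
        if name not in known:
            known.append(name)
    return (*known, UNKNOWN)


def one_hot(vocab: tuple[str, ...], encoder: str) -> list[int]:
    """One-hot row; unrecognised encoders map to the sentinel (index N - 1)."""
    idx = vocab.index(encoder) if encoder in vocab else len(vocab) - 1
    return [1 if i == idx else 0 for i in range(len(vocab))]


V1 = ("libx264", "libx265", UNKNOWN)  # illustrative v1 vocabulary
V2 = extend_vocab(V1, ["h264_nvenc", "hevc_nvenc", "av1_nvenc",
                       "h264_qsv", "hevc_qsv", "av1_qsv"])
```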
0276 — vmaf_tiny_v5 corpus-expansion probe (ADR-0287) — defer
- What changed: research-only addition. New scripts under ai/scripts/ (fetch_youtube_ugc_subset.py, extract_ugc_features.py, train_vmaf_tiny_v5.py, eval_loso_vmaf_tiny_v5.py), new ADR docs/adr/0276-*.md, new research digest docs/research/0057-*.md, and one CHANGELOG entry. No new ONNX artefact under model/tiny/, no registry change, no public C-API / CLI / meson_options change. The probe trained an architecturally identical mlp_small on a 5-corpus parquet (4-corpus + 27 000 UGC rows); the 1-σ ship gate did not clear (Δ PLCC = +0.00005), so the exporter that the prior agent had drafted (export_vmaf_tiny_v5.py) was discarded before the commit.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI corpus-expansion surface; nothing on the upstream side touches these files.
- On upstream sync: zero interaction. The v5 surface lives entirely under ai/scripts/ + docs/adr/ + docs/research/, all of which are fork-introduced trees. The shipped v2 model (model/tiny/vmaf_tiny_v2.onnx) and its registry row are untouched.
- Re-test on rebase:
# No code under test on rebase — purely research artefacts.
# If revisiting the corpus expansion, the reproducer is in the
# research digest:
python3 ai/scripts/fetch_youtube_ugc_subset.py \
--out-dir .workingdir2/ugc/download \
--n-stems 30 \
--manifest .workingdir2/ugc/manifest.json
python3 ai/scripts/extract_ugc_features.py \
--manifest .workingdir2/ugc/manifest.json \
--yuv-dir .workingdir2/ugc/yuv \
--vmaf-bin build-cpu/tools/vmaf \
--out-parquet runs/full_features_ugc.parquet \
--max-height 360 --max-frames 300 --threads 8
python3 ai/scripts/eval_loso_vmaf_tiny_v5.py \
--parquet-base runs/full_features_4corpus.parquet \
--parquet-extra runs/full_features_ugc.parquet \
--out-json runs/vmaf_tiny_v5_loso_metrics.json
0227 — vmaf-tune Intel QSV codec adapters (ADR-0281)
- What changed: fork-local additions under tools/vmaf-tune/src/vmaftune/codec_adapters/ — _qsv_common.py, h264_qsv.py, hevc_qsv.py, av1_qsv.py, plus registry rows in codec_adapters/__init__.py and a new test file tools/vmaf-tune/tests/test_codec_adapter_qsv.py. Doc updates: docs/usage/vmaf-tune.md (Hardware encoders section), docs/adr/0281-vmaf-tune-qsv-adapters.md, docs/research/0066-vmaf-tune-qsv-adapters.md, tools/vmaf-tune/AGENTS.md, CHANGELOG.md.
- Upstream source: fork-local. tools/vmaf-tune/ is fork-introduced under ADR-0237; Netflix/vmaf has no corresponding tree.
- On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths.
- Invariant: the registry exposes exactly four codecs (av1_qsv, h264_qsv, hevc_qsv, libx264 — alphabetical), each adapter validates its (preset, quality) pair, and the QSV preset vocabulary is the seven x264-style names (veryslow … veryfast; no ultrafast / superfast). The encode pipeline (encode.py) remains x264-CRF-tied and will be widened in a separate PR — the QSV adapters are inert until then. Future codec families that share the parameter shape (NVENC, AMF) follow the same _<family>_common.py + N thin adapters pattern.
- Re-test on rebase:
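The following illustrates the (preset, quality) validation described in the invariant. It is a minimal sketch, not a re-test command, and it assumes the seven x264-style names listed above. The quality bounds are a placeholder assumption, because the entry does not state the QSV range. `validate_qsv` is a hypothetical name, not the _qsv_common.py API.

```python
# Seven x264-style preset names; ultrafast / superfast are deliberately absent.
QSV_PRESETS = ("veryslow", "slower", "slow", "medium", "fast", "faster", "veryfast")
QUALITY_RANGE = (1, 51)  # assumed bounds, for illustration only


def validate_qsv(preset: str, quality: int) -> None:
    """Raise ValueError unless (preset, quality) is an accepted pair."""
    if preset not in QSV_PRESETS:
        raise ValueError(f"unsupported QSV preset {preset!r}; expected one of {QSV_PRESETS}")
    lo, hi = QUALITY_RANGE
    if not lo <= quality <= hi:
        raise ValueError(f"quality {quality} outside [{lo}, {hi}]")
```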
0229 — vmaf-tune libvvenc + NN-VC codec adapter (ADR-0285)
- Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/vvenc.py (new fork-only file), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry edit, fork-only), tools/vmaf-tune/tests/test_codec_adapter_vvenc.py (new), tools/vmaf-tune/tests/test_corpus.py (relaxes the known_codecs() == ("libx264",) assertion to "libx264" in known_codecs(), since the registry now spans multiple codecs).
- Invariant: the codec-adapter registry is fork-introduced (Phase A of ADR-0237) and lives entirely outside the upstream Netflix tree, so tools/vmaf-tune/ does not touch upstream paths. The only rebase-sensitive surface is the CORPUS_ROW_KEYS schema in src/vmaftune/__init__.py (per the Phase A invariant in tools/vmaf-tune/AGENTS.md); this PR adds the adapter without changing the schema.
- Upstream interaction: none. tools/vmaf-tune/ is not in Netflix/vmaf upstream.
- Re-test on rebase:
- Status update 2026-05-09: the original nnvc_intra toggle was removed (it emitted a fabricated IntraNN key that does not exist in any released VVenC). It was replaced with a curated 9-knob real-VVenC 1.14.0 tuning surface (PerceptQPA, InternalBitDepth, Tier, Tiles, MaxParallelFrames, RPR, SAO, ALF, CCALF). Defaults preserve the bit-exact Phase A grid baseline. adapter_version was bumped to "2" so cache keys invalidate. See ADR-0285 §"Status update 2026-05-09".
no rebase impact: fork-local file, no upstream-tree touch.
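The status update replaces a fabricated key with a closed set of real VVenC knobs. The following is a minimal sketch of that guard. It assumes the knobs are forwarded through FFmpeg's libvvenc `-vvenc-params` option as a `key=value:key=value` string. `vvenc_params` is a hypothetical helper, not the adapter's actual code.

```python
# Curated 9-knob surface from the 2026-05-09 status update.
VVENC_KNOBS = frozenset({
    "PerceptQPA", "InternalBitDepth", "Tier", "Tiles", "MaxParallelFrames",
    "RPR", "SAO", "ALF", "CCALF",
})


def vvenc_params(knobs: dict[str, object]) -> str:
    """Render knobs as key=value:key=value, rejecting anything off the allowlist."""
    unknown = sorted(set(knobs) - VVENC_KNOBS)
    if unknown:
        # The removed IntraNN key, for example, would be rejected here.
        raise KeyError(f"not a curated VVenC knob: {unknown}")
    return ":".join(f"{k}={v}" for k, v in knobs.items())
```

An empty knob dict renders an empty string, which corresponds to leaving every knob at its default (the Phase A baseline).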
0228 — vmaf-tune Phase D scaffold (ADR-0276)
- Touches: tools/vmaf-tune/src/vmaftune/per_shot.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_per_shot.py, docs/usage/vmaf-tune.md, docs/adr/0276-vmaf-tune-phase-d-per-shot.md.
- Invariant: scaffold-only. The module relies on a stable predicate signature (shot, target_vmaf, encoder) -> (crf, predicted_vmaf) that Phase B's bisect (PR #347) drops into later. Shot ranges are half-open [start_frame, end_frame) even though the C-side vmaf-perShot JSON/CSV sidecar uses an inclusive end_frame — normalisation happens at the parse boundary in _parse_per_shot_json / parse_per_shot_csv. The vmaf-perShot schema lives in docs/usage/vmaf-perShot.md and is fork-local (ADR-0222), so upstream cannot drift it; the only rebase risk is fork-internal renames.
- Upstream source: entirely fork-local. tools/vmaf-tune/ is fork-introduced (ADR-0237). Netflix/vmaf upstream has no encode-automation surface.
- On upstream sync: zero interaction expected. No file in this PR overlaps an upstream-mirrored path.
- Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q
python tools/vmaf-tune/vmaf-tune tune-per-shot --help
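The inclusive-to-half-open conversion happens once, at the parse boundary. The following is a minimal sketch, assuming each sidecar row carries `start_frame` and an inclusive `end_frame`. The `Shot` dataclass and `shot_from_row` are illustrative, not the per_shot.py definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    start_frame: int  # inclusive
    end_frame: int    # exclusive: the shot covers [start_frame, end_frame)

    @property
    def num_frames(self) -> int:
        return self.end_frame - self.start_frame


def shot_from_row(row: dict) -> Shot:
    """Convert a vmaf-perShot row (inclusive end_frame) to a half-open Shot."""
    start, end_inclusive = int(row["start_frame"]), int(row["end_frame"])
    if end_inclusive < start:
        raise ValueError(f"end_frame {end_inclusive} precedes start_frame {start}")
    return Shot(start, end_inclusive + 1)
```

A single-frame shot (start_frame == end_frame in the sidecar) becomes Shot(n, n + 1) with num_frames == 1. An off-by-one would show up immediately as a zero-length shot.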
0229 — vmaf-tune SVT-AV1 codec adapter (ADR-0278)
- Touches: tools/vmaf-tune/src/vmaftune/codec_adapters/svtav1.py (new), tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry), tools/vmaf-tune/src/vmaftune/encode.py (parse_versions extended for the SVT-AV1 banner pattern), tools/vmaf-tune/src/vmaftune/corpus.py (optional ffmpeg_preset_token hook).
- Invariant: PRESET_NAME_TO_INT is closed and order-stable; the integer values are baked into corpus rows that downstream fr_regressor_v2 (ADR-0235) trains on. Reordering or rewriting the table silently changes the integer SVT-AV1 receives. The codec key "libsvtav1" matches CODEC_VOCAB[2] in ai/src/vmaf_train/codec.py — keep them aligned on any rename.
- Upstream source: fork-local. tools/vmaf-tune/ is a fork-introduced tree (see entry 0227 — Phase A scaffold). No Netflix/vmaf upstream interaction.
- On upstream sync: zero interaction. Lives entirely under the fork-local tools/vmaf-tune/ tree.
- Re-test on rebase:
0230 — fr_regressor_v2 PROD ship (ADR-0352)
- ADR: ADR-0352
- Touches: model/tiny/fr_regressor_v2.onnx (binary, refreshed), model/tiny/fr_regressor_v2.json (sidecar, sha256 + metrics), model/tiny/registry.json (smoke flag flip, sha256 update), runs/phase_a/full_grid/per_frame_canonical6.jsonl (training corpus — fork-local artefact under runs/), companion docs.
- Re-test recipe: see Research-0068 §Reproducer. Ship gate is LOSO PLCC ≥ 0.95 on the per-source folds; the current run reports 0.9681 ± 0.0207.
- Rebase invariant: the per-frame canonical-6 corpus must be rebuilt from runs/phase_a/{nvenc,qsv}_pf.jsonl (PR #392) before any retrain; do not re-train against the cell-only comprehensive.jsonl (it lacks the per-frame features and produces PLCC ≈ 0.7 — the smoke baseline).
- No upstream interaction: fr_regressor_v2 is fork-local (ADR-0272).
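The ship gate reduces to a per-fold Pearson correlation followed by a mean over the source folds. The following is a minimal standard-library sketch. `plcc` and `loso_gate` are hypothetical names. The entry does not say whether the reported ± is a population or a sample standard deviation, so the `pstdev` choice here is an assumption.

```python
import math
from statistics import fmean, pstdev


def plcc(pred: list[float], target: list[float]) -> float:
    """Pearson linear correlation coefficient."""
    mp, mt = fmean(pred), fmean(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


def loso_gate(folds: dict[str, tuple[list[float], list[float]]], threshold: float = 0.95):
    """Per-source PLCC; pass when the mean over held-out sources clears threshold.

    Returns (passed, mean, sd, per_fold).
    """
    per_fold = {src: plcc(p, t) for src, (p, t) in folds.items()}
    scores = list(per_fold.values())
    mean, sd = fmean(scores), pstdev(scores)
    return mean >= threshold, mean, sd, per_fold
```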
0229 — vmaf-tune Phase E ladder generator (ADR-0295)
- ADR: ADR-0295
- Touches: entirely fork-local under tools/vmaf-tune/. New module tools/vmaf-tune/src/vmaftune/ladder.py, new test file tools/vmaf-tune/tests/test_ladder.py, two new subcommand blocks in tools/vmaf-tune/src/vmaftune/cli.py. No upstream-shared paths touched.
- Invariant: vmaftune.ladder.convex_hull returns a strictly monotonic Pareto frontier (both bitrate and vmaf monotonically increasing); select_knees returns exactly min(n, len(hull)) rungs in ascending bitrate order; emit_manifest("hls") produces one #EXT-X-STREAM-INF per rung with monotonically increasing BANDWIDTH= values. The default _default_sampler intentionally raises NotImplementedError — production callers must inject a Phase B bisect-driven sampler. The Phase B integration PR (gated on PR #347) swaps the default; the test suite continues to inject a synthetic stub.
- Rebase impact: none — fork-local Python tool; upstream Netflix/vmaf does not ship a tools/vmaf-tune/ tree.
- Re-test on rebase:
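The following illustrates the two ladder invariants. It is a minimal sketch, not a re-test step, and rests on stated assumptions. `pareto_frontier` is a plain dominance filter: it keeps every non-dominated point and does not drop points lying under the upper convex hull, as a true convex_hull may. The HLS emitter writes only BANDWIDTH plus a placeholder URI per rung. Neither function is the ladder.py implementation.

```python
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep (bitrate_kbps, vmaf) points that no cheaper point matches or beats.

    The result is strictly increasing in both bitrate and vmaf.
    """
    frontier: list[tuple[float, float]] = []
    # Ascending bitrate; for equal bitrate, the higher vmaf is seen first.
    for rate, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if not frontier or vmaf > frontier[-1][1]:
            frontier.append((rate, vmaf))
    return frontier


def emit_hls(rungs: list[tuple[float, float]]) -> str:
    """Master playlist with one #EXT-X-STREAM-INF per rung (BANDWIDTH in bit/s)."""
    lines = ["#EXTM3U"]
    for i, (rate_kbps, _vmaf) in enumerate(sorted(rungs)):
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={int(rate_kbps * 1000)}")
        lines.append(f"rung_{i}.m3u8")  # placeholder URI
    return "\n".join(lines) + "\n"
```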
0229 — fr_regressor_v2 probabilistic head scaffold (ADR-0279)
- Touches:
  - ai/scripts/train_fr_regressor_v2_ensemble.py (new — fork-local).
  - ai/scripts/eval_probabilistic_proxy.py (new — fork-local).
  - model/tiny/fr_regressor_v2_ensemble_v1*.onnx, fr_regressor_v2_ensemble_v1.json (new artefacts; smoke probes).
  - model/tiny/registry.json — five new kind: "fr" rows (fr_regressor_v2_ensemble_v1_seed{0..4}); existing entries untouched.
  - ai/AGENTS.md — new "fr_regressor_v2_ensemble_v1 — probabilistic head" section pinning the per-member ONNX I/O contract, manifest-as-runtime-entry-point invariant, ensemble-size pin, confidence-rule one-of, codec-vocab parity, and smoke-artefact posture.
  - docs/ai/models/fr_regressor_v2_probabilistic.md (new model card).
  - docs/research/0067-fr-regressor-v2-probabilistic.md (new audit digest).
  - docs/adr/0279-fr-regressor-v2-probabilistic.md (new ADR; Proposed). Index row appended to docs/adr/README.md.
  - CHANGELOG.md — ### Added row under "Unreleased — lusoris fork".
- Invariant: the per-member ONNX I/O contract (two inputs: features [N, 6] standardised + codec_onehot [N, NUM_CODECS]; one output score [N]) and the manifest's confidence rule (one of "ensemble" / "ensemble+conformal") are the C-side adapter's load-bearing contract. Per-member ensembles are stock FRRegressor(num_codecs=NUM_CODECS) calls — flipping to a v1-shaped single-input graph silently invalidates the manifest. CODEC_VOCAB parity with ai/src/vmaf_train/codec.py is required.
- On upstream sync: zero interaction expected. Wholly fork-local; no upstream Netflix/vmaf path overlap. The ai/ package is fork-introduced (see ADR-0021, ADR-0036) — upstream has no probabilistic-regressor surface. If upstream ever ships its own fr_regressor_v2 variant, do NOT merge — register both ids side-by-side.
- Re-test on rebase:
python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke
python ai/scripts/eval_probabilistic_proxy.py --smoke
python ai/scripts/validate_model_registry.py
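Under the "ensemble" confidence rule, the five per-member scores for a frame become a predictive mean and spread. The following is a minimal standard-library sketch. The `mean ± z·sd` interval assumes roughly Gaussian member disagreement and is an illustration, not the manifest's actual rule. "ensemble+conformal" would replace the z-scaling with a calibrated quantile. `ensemble_summary` is a hypothetical name.

```python
from statistics import fmean, stdev


def ensemble_summary(member_scores: list[float], z: float = 1.96) -> dict[str, float]:
    """Aggregate one frame's per-member scores into mean, spread and an interval.

    The z * stdev interval assumes roughly Gaussian member disagreement; it is an
    illustration, not the manifest's confidence rule.
    """
    if len(member_scores) < 2:
        raise ValueError("need at least two ensemble members")
    mean = fmean(member_scores)
    sd = stdev(member_scores)  # sample standard deviation across members
    return {"mean": mean, "sd": sd, "lo": mean - z * sd, "hi": mean + z * sd}
```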
0287 — vmaf-tune saliency-aware ROI tuning (ADR-0293)
- Touches: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py (new recommend subcommand), tools/vmaf-tune/AGENTS.md (saliency invariant), docs/usage/vmaf-tune.md (saliency section).
- Upstream source: fork-local. The vmaf-tune tree was introduced in PR #329 (ADR-0237 Phase A) and has no upstream Netflix counterpart.
- On upstream sync: zero interaction — pure fork-local Python package under tools/vmaf-tune/.
- Invariant: the saliency-to-QP-offset signal blend (offset = (2*sal − 1) * foreground_offset, clamped to ±12) is bit-for-bit equivalent to vmaf-roi's C-side blend (ADR-0247). tests/test_saliency.py pins the contract; if vmaf-roi's C blend changes, saliency.py follows in the same PR. The test seam contract (session_factory=…, encode_runner=…) lets the suite run without onnxruntime or ffmpeg.
- Re-test on rebase:
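For illustration, and not as a re-test step, the blend from the invariant can be written out as a function. The following is a minimal sketch that reproduces the formula and the ±12 clamp. Bit-for-bit parity with vmaf-roi's C blend also depends on float width and on any rounding the C side applies, neither of which this entry specifies.

```python
QP_OFFSET_LIMIT = 12.0


def qp_offset(sal: float, foreground_offset: float) -> float:
    """Map a saliency value in [0, 1] to a QP offset, clamped to +/-12.

    The offset scales linearly from -foreground_offset at sal = 0 through 0 at
    sal = 0.5 to +foreground_offset at sal = 1, before clamping.
    """
    offset = (2.0 * sal - 1.0) * foreground_offset
    return max(-QP_OFFSET_LIMIT, min(QP_OFFSET_LIMIT, offset))
```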
0229 — tools/vmaf-roi-score/ Option C scaffold (ADR-0296)
- ADR: ADR-0296
- Touches:
  - tools/vmaf-roi-score/pyproject.toml (new)
  - tools/vmaf-roi-score/vmaf-roi-score (new console shim)
  - tools/vmaf-roi-score/src/vmafroiscore/__init__.py (new)
  - tools/vmaf-roi-score/src/vmafroiscore/cli.py (new)
  - tools/vmaf-roi-score/src/vmafroiscore/score.py (new)
  - tools/vmaf-roi-score/src/vmafroiscore/mask.py (new)
  - tools/vmaf-roi-score/tests/test_combine.py (new)
  - tools/vmaf-roi-score/README.md (new)
  - tools/vmaf-roi-score/AGENTS.md (new)
  - docs/adr/0296-vmaf-roi-saliency-weighted.md (new)
  - docs/adr/_index_fragments/0296-vmaf-roi-saliency-weighted.md (new)
  - docs/adr/_index_fragments/_order.txt — append-only.
  - docs/research/0069-vmaf-roi-saliency-weighted.md (new)
  - docs/usage/vmaf-roi-score.md (new)
  - changelog.d/added/T6-2c-vmaf-roi-score-scaffold.md (new)
- Invariant: tools/vmaf-roi-score/ is wholly fork-local. No upstream Netflix/vmaf surface owns or interacts with this directory. The combine math is a pure linear blend on Python float; the JSON schema is pinned by ROI_RESULT_KEYS and SCHEMA_VERSION = 1. Schema bumps require an ADR-0288 supersession. Naming guard: do not confuse this tool with core/tools/vmaf_roi.c (ADR-0247) — that's the encoder-steering binary. The scoring tool here is vmaf-roi-score; the names diverge deliberately.
- Rebase impact: zero. Pure-Python tool under tools/; not part of the libvmaf C build, not part of any Netflix-mirrored surface.
- Re-test on rebase:
0228 — vmaf-tune compare codec-comparison mode (research-0061 Bucket #7)
- Touches:
  - tools/vmaf-tune/src/vmaftune/compare.py (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  - tools/vmaf-tune/src/vmaftune/cli.py — adds the compare subparser and _run_compare router.
  - tools/vmaf-tune/tests/test_compare.py (new). Mocked predicate; no ffmpeg / vmaf binaries required.
  - tools/vmaf-tune/AGENTS.md — invariant note for the predicate seam and the COMPARE_ROW_KEYS contract.
  - docs/usage/vmaf-tune.md — new "Codec comparison" section.
- Invariant: compare.compare_codecs orchestrates per-codec ranking via an injected predicate(codec, src, target_vmaf) -> RecommendResult callable. The orchestration must not branch on codec name; new codecs land as one-file additions under codec_adapters/ and are picked up automatically by the registry. COMPARE_ROW_KEYS is the JSON / CSV column contract — same maintenance discipline as CORPUS_ROW_KEYS.
- Rebase impact: entirely fork-local. The Phase A + Phase B recommend backend (ADR-0237) is fork-internal; upstream Netflix/vmaf has no tools/vmaf-tune/ tree.
- Re-test on rebase:
```shell
pytest tools/vmaf-tune/tests/test_compare.py -v
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
  --src /tmp/ref.yuv --target-vmaf 92 --format markdown
```
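The predicate seam keeps the comparison codec-agnostic: the orchestrator only calls the injected callable and sorts the results. The following is a minimal sketch. It assumes a RecommendResult with `crf` and `bitrate_kbps` fields (illustrative, not the real dataclass) and ranks codecs by bitrate at the target VMAF. `compare_codecs` here is a stand-in, not compare.py's function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RecommendResult:  # illustrative fields only
    crf: int
    bitrate_kbps: float


Predicate = Callable[[str, str, float], RecommendResult]


def compare_codecs(codecs: Iterable[str], src: str, target_vmaf: float,
                   predicate: Predicate) -> list[dict]:
    """Rank codecs by bitrate needed to hit target_vmaf; no per-codec branching."""
    rows = []
    for codec in codecs:
        res = predicate(codec, src, target_vmaf)
        rows.append({"codec": codec, "crf": res.crf, "bitrate_kbps": res.bitrate_kbps})
    rows.sort(key=lambda r: r["bitrate_kbps"])
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```

A test can pass a lambda over a fixed table as the predicate, which is the same seam the mocked-predicate suite relies on.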
0229 — vmaf-tune --score-backend GPU score wiring (ADR-0299)
- Touches:
  - tools/vmaf-tune/src/vmaftune/score_backend.py (new). Wholly fork-local — tools/vmaf-tune/ has no upstream Netflix/vmaf overlap.
  - tools/vmaf-tune/src/vmaftune/{score,corpus,cli}.py (additive kwargs, no API removals).
  - tools/vmaf-tune/tests/test_score_backend.py (new).
  - docs/usage/vmaf-tune.md (new GPU section + flag row).
  - docs/adr/0299-vmaf-tune-gpu-score.md (new).
  - docs/research/0071-vmaf-tune-gpu-score-backend.md (new).
- Invariant: the libvmaf CLI exposes --backend NAME with values auto|cpu|cuda|sycl|vulkan exactly. The help-text parser in score_backend.parse_supported_backends pins this format. If upstream renames the flag or reformats the help line on merge, the parser silently degrades to "CPU only" — the test fixtures in test_score_backend.py will catch the format change, but only if re-run.
- Upstream source: fork-local. Netflix upstream's CLI does not ship a --backend selector (CPU-only).
- On upstream sync: zero interaction. vmaf-tune lives entirely in fork-introduced paths and consumes only the fork's --backend flag.
- Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v
# If the libvmaf help text reformats, parse_supported_backends
# will return {"cpu"} on test_parse_full_backend_line_yields_all_four
# and the test fails loudly.
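The degrade-to-CPU behaviour can be illustrated with a small regex parser. The following is a minimal sketch. The sample help lines in the tests are invented for illustration, and the real libvmaf help text may differ. `parse_backends_sketch` is hypothetical, not `score_backend.parse_supported_backends`.

```python
import re

ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")
# Finds the "auto|cpu|..." alternation on the --backend help line.
_BACKEND_ALT = re.compile(r"--backend\b[^\n]*?\b(auto(?:\|[a-z]+)+)")


def parse_backends_sketch(help_text: str) -> set[str]:
    """Backends advertised by the CLI help; falls back to {"cpu"} when unparseable."""
    m = _BACKEND_ALT.search(help_text)
    if m is None:
        return {"cpu"}  # the silent "CPU only" degradation described above
    advertised = set(m.group(1).split("|"))
    return {b for b in ALL_BACKENDS if b in advertised} or {"cpu"}
```

The fallback path is exactly what makes a help-text reformat dangerous: it returns a valid-looking answer, so only a fixture test that expects all four backends notices the change.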
0261 — vmaf-tune HDR-aware encode + score path (2026-05-03)
- What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/hdr.py plus wiring into corpus.py / cli.py / score.py. Adds ffprobe-driven HDR detection, codec-specific HDR ffmpeg flag dispatch, schema-v2 corpus row keys (hdr_transfer, hdr_primaries, hdr_forced), and four --auto-hdr / --force-* CLI modes. See ADR-0300.
- Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
- On upstream sync: zero interaction. Upstream Netflix/vmaf ships no encode automation surface; this tree is entirely fork-local and lives outside libvmaf/ and python/.
- Schema migration note: SCHEMA_VERSION bumped 1 → 2. The three new keys are additive — Phase B / C loaders treat missing keys as SDR for backward compatibility with v1 rows.
- Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/ -q
python -m vmaftune.cli corpus --help # confirm --auto-hdr surfaces
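HDR detection keys off the transfer characteristic that ffprobe reports, and v1 rows are read as SDR. The following is a minimal sketch. It assumes ffprobe's stream-level `color_transfer` field, which reports `smpte2084` for PQ and `arib-std-b67` for HLG. The helper names and the `"sdr"` default value are illustrative, not hdr.py's API or schema values.

```python
import json
import subprocess

HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}  # PQ, HLG


def probe_transfer(path: str) -> str:
    """color_transfer of the first video stream, via ffprobe JSON output."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=color_transfer", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    streams = json.loads(out).get("streams", [])
    return streams[0].get("color_transfer", "unknown") if streams else "unknown"


def is_hdr_transfer(transfer: str) -> bool:
    return transfer in HDR_TRANSFERS


def upgrade_row_v1(row: dict) -> dict:
    """Read a v1 corpus row as v2: missing HDR keys mean SDR."""
    out = dict(row)
    out.setdefault("hdr_transfer", "sdr")   # placeholder SDR marker
    out.setdefault("hdr_primaries", "sdr")  # placeholder SDR marker
    out.setdefault("hdr_forced", False)
    return out
```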
0298 — vmaf-tune content-addressed cache (ADR-0298)
- What changed: fork-local. New module tools/vmaf-tune/src/vmaftune/cache.py; cache integration in tools/vmaf-tune/src/vmaftune/corpus.py (iter_rows now consults the cache before encode/score); new CLI flags --no-cache, --cache-dir, --cache-size-gb in cli.py. The codec-adapter Protocol gains adapter_version: str; the lone Phase-A x264 adapter pins "1".
- Upstream source: none. tools/vmaf-tune/ is fork-introduced (ADR-0237) and has no upstream counterpart.
- On upstream sync: zero interaction with Netflix/vmaf master. The module sits entirely under tools/vmaf-tune/, which upstream does not ship.
- Invariant for future codec adapters: every CodecAdapter must declare adapter_version: str. Bump it whenever the adapter's argv shape, preset list, or quality range changes — otherwise the cache returns stale results post-upgrade. The contract is asserted by test_cache_key_diffs_on_each_field in tests/test_cache.py.
- Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/test_cache.py -v
```
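Content addressing means the key is a hash over every input that can change the result, adapter_version included. The following is a minimal sketch. It assumes the key covers a source digest, codec, preset, quality and adapter version. The field set and the `cache_key` name are illustrative, not cache.py's actual schema.

```python
import hashlib
import json


def cache_key(*, source_sha256: str, codec: str, preset: str, quality: int,
              adapter_version: str) -> str:
    """sha256 over a canonical JSON encoding of the fields that affect the result."""
    payload = {
        "source_sha256": source_sha256,
        "codec": codec,
        "preset": preset,
        "quality": quality,
        # Bumping this invalidates entries written by an older adapter.
        "adapter_version": adapter_version,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

`sort_keys=True` with fixed separators keeps the encoding stable across dict orderings, so the same inputs always map to the same key.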
0283 — vmaf-tune Apple VideoToolbox adapters (2026-05-05)
- What changed: fork-local addition under tools/vmaf-tune/src/vmaftune/codec_adapters/. New files: h264_videotoolbox.py, hevc_videotoolbox.py, _videotoolbox_common.py, plus the registry hook in __init__.py. See ADR-0283.
- Update 2026-05-09: a prores_videotoolbox.py adapter was added to the same registry pattern (broadcast / prosumer ProRes intermediate). The quality knob differs — ProRes is a fixed-rate codec, so the harness's --crf slot carries the integer ProRes tier id (0 = proxy → 5 = xq) rather than a -q:v value. _videotoolbox_common.py is extended with PRORES_PROFILE_* constants + validate_prores_videotoolbox() / prores_profile_name() helpers; profile ids were verified against FFmpeg n8.1.1 libavcodec/videotoolboxenc.c. See the Status update appendix in ADR-0283.
- Upstream source: zero. tools/vmaf-tune/ is fork-introduced (Phase A under ADR-0237).
- On upstream sync: zero interaction.
- Re-test on rebase:
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py -q
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_prores_videotoolbox.py -q
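The ProRes tier id carried in the --crf slot maps onto six profile names. The following is a minimal sketch. It assumes the id order from the update above (0 = proxy through 5 = xq) and FFmpeg's prores_videotoolbox `-profile` names. `prores_tier_to_profile` and `prores_argv` are hypothetical, not the _videotoolbox_common.py helpers.

```python
# Tier id -> prores_videotoolbox profile name (assumed ordering: 0 = proxy, 5 = xq).
PRORES_PROFILES = ("proxy", "lt", "standard", "hq", "4444", "xq")


def prores_tier_to_profile(tier: int) -> str:
    if not 0 <= tier < len(PRORES_PROFILES):
        raise ValueError(f"ProRes tier {tier} outside 0..{len(PRORES_PROFILES) - 1}")
    return PRORES_PROFILES[tier]


def prores_argv(tier: int) -> list[str]:
    """Encoder arguments that take the place of the -q:v value used by h264/hevc."""
    return ["-c:v", "prores_videotoolbox", "-profile:v", prores_tier_to_profile(tier)]
```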
0228 — vmaf-tune coarse-to-fine CRF search (ADR-0306)
- What changed: fork-local tooling. Adds coarse_to_fine_search() to tools/vmaf-tune/src/vmaftune/corpus.py, plumbs new CLI flags onto vmaf-tune corpus (--coarse-to-fine, --coarse-step, --fine-radius, --fine-step, --target-vmaf), and ships a new vmaf-tune recommend subcommand. Widens the tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py quality_range from (15, 40) to (0, 51). JSONL row schema unchanged (SCHEMA_VERSION=1).
- Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation surface.
- On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
- Re-test on rebase:
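For illustration, and not as a re-test command, the following minimal sketch shows a two-pass CRF search over the widened (0, 51) range. It makes three assumptions: `score(crf)` returns the VMAF of an encode at that CRF, VMAF falls as CRF rises, and the goal is the highest CRF that still meets `target_vmaf`. `coarse_to_fine` here is illustrative and is not `corpus.coarse_to_fine_search()`.

```python
from typing import Callable, Iterable, Optional


def coarse_to_fine(score: Callable[[int], float], target_vmaf: float,
                   lo: int = 0, hi: int = 51, coarse_step: int = 10,
                   fine_radius: int = 5, fine_step: int = 1) -> Optional[int]:
    """Highest CRF meeting target_vmaf: coarse grid first, then a local refine."""

    def best(candidates: Iterable[int]) -> Optional[int]:
        ok = [c for c in candidates if score(c) >= target_vmaf]
        return max(ok) if ok else None

    coarse = best(range(lo, hi + 1, coarse_step))
    if coarse is None:
        return None  # even the best-quality coarse point misses the target
    fine_lo = max(lo, coarse - fine_radius)
    fine_hi = min(hi, coarse + fine_radius)
    # Include the coarse winner so the refine never returns a worse answer.
    return best([*range(fine_lo, fine_hi + 1, fine_step), coarse])
```

With the defaults, this evaluates 6 coarse points plus up to 11 fine points instead of all 52 CRFs. That saving is the reason to split the search.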
0314 — vmaf-tune --score-backend=vulkan (ADR-0314)
- Touches:
  - tools/vmaf-tune/src/vmaftune/cli.py (additive argparse flag on the corpus + recommend subparsers; resolves select_backend and catches BackendUnavailableError for a clean exit 2).
  - tools/vmaf-tune/src/vmaftune/score.py (additive backend kwarg on build_vmaf_command and run_score; None = no flag emitted).
  - tools/vmaf-tune/src/vmaftune/corpus.py (new CorpusOptions.score_backend field, default None; forwarded into run_score).
  - tools/vmaf-tune/tests/test_score_backend.py (additive Vulkan-specific tests; pre-existing tests now pass after the backend= kwarg lands).
  - docs/adr/0314-vmaf-tune-score-backend-vulkan.md (new).
  - docs/usage/vmaf-tune.md (new "Vulkan score backend" subsection under the existing GPU-scoring section).
  - tools/vmaf-tune/AGENTS.md (invariant note: argparse choices stay in sync with the libvmaf --backend vocabulary).
  - changelog.d/added/vmaf-tune-score-backend-vulkan.md (new).
- Invariant: score_backend.ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan") is the exact set libvmaf's core/tools/cli_parse.c --backend alternation accepts. Adding a new harness-side value without the libvmaf-side wiring produces silent strict-mode failures on hosts that probe positively for it.
- Upstream source: zero. Netflix upstream's CLI does not ship a --backend selector; both tools/vmaf-tune/ and core/src/vulkan/ are fork-introduced.
- On upstream sync: zero interaction. No upstream-mirror file is touched.
- Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v -k vulkan
pytest tools/vmaf-tune/tests/test_score_backend.py -v
Failures here usually indicate the libvmaf help-text format changed; score_backend.parse_supported_backends test fixtures pin the format and will fail loudly.
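The CLI wiring pairs an availability check with an optional flag. The following is a minimal sketch, assuming the available set comes from a help-text probe. `select_backend_sketch`, `BackendUnavailable`, `vmaf_argv` and `cli_main` are hypothetical names standing in for select_backend, BackendUnavailableError, build_vmaf_command and the cli.py entry point.

```python
from typing import Optional

ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")


class BackendUnavailable(RuntimeError):
    pass


def select_backend_sketch(requested: Optional[str], available: set[str]) -> Optional[str]:
    """None means "let libvmaf decide" (no --backend flag is emitted)."""
    if requested is None:
        return None
    if requested not in ALL_BACKENDS:
        raise ValueError(f"unknown backend {requested!r}")
    if requested not in available:
        raise BackendUnavailable(requested)
    return requested


def vmaf_argv(base: list[str], backend: Optional[str]) -> list[str]:
    """Append --backend only when a backend was selected."""
    return base + (["--backend", backend] if backend is not None else [])


def cli_main(requested: Optional[str], available: set[str]) -> int:
    try:
        backend = select_backend_sketch(requested, available)
    except BackendUnavailable:
        return 2  # clean exit 2, as described for cli.py
    print(" ".join(vmaf_argv(["vmaf"], backend)))
    return 0
```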
0303 — fr_regressor_v2 ensemble prod flip (ADR-0303)
- ADR: ADR-0303
- Touches: entirely fork-local.
  - ai/scripts/train_fr_regressor_v2_ensemble_loso.py (new — 9-fold LOSO trainer over the five ensemble seeds; emits loso_seed{N}.json artefacts).
  - scripts/ci/ensemble_prod_gate.py (new — reads five loso_seed{N}.json files, returns exit 0 iff mean(PLCC_i) ≥ 0.95 AND max - min ≤ 0.005).
  - ai/AGENTS.md — appended "Ensemble registry invariant" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
  - docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md (new), docs/research/0075-fr-regressor-v2-ensemble-prod-flip.md (new), changelog.d/added/fr-regressor-v2-ensemble-prod-flip.md (new).
- Rebase invariant: the production ship gate is two-part — mean_i(PLCC_i) ≥ 0.95 AND max_i(PLCC_i) - min_i(PLCC_i) ≤ 0.005 over five seeds. The spread (max - min) bound is load-bearing: removing it silently allows a one-seed-wins-four-seeds-tie configuration that invalidates the ensemble's predictive-distribution semantics. Both thresholds live in scripts/ci/ensemble_prod_gate.py; do not weaken either without superseding ADR-0303.
- Rebase invariant (registry): the five fr_regressor_v2_ensemble_v1_seed{0..4} registry rows are smoke: true on master at this commit; flipping them to false is the follow-up flip PR's job, gated on a real-corpus LOSO run + the CI gate. Do not flip seed rows during a rebase merge conflict resolution.
- Re-test on rebase:
python3 -c "import ast; ast.parse(open('ai/scripts/train_fr_regressor_v2_ensemble_loso.py').read())"
python3 -c "import ast; ast.parse(open('scripts/ci/ensemble_prod_gate.py').read())"
python ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
python scripts/ci/ensemble_prod_gate.py --help
- Upstream source: zero. fr_regressor_v2 and its ensemble are fork-introduced (parent ADR-0272 / ADR-0279).
- On upstream sync: zero interaction.
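The two-part gate can be written out directly. This is a minimal sketch, not the contents of scripts/ci/ensemble_prod_gate.py. Reading the loso_seed{N}.json files is omitted because their schema is not given here. Returning 1 on failure is also an assumption: the note only says the script exits 0 when the gate passes.

```python
import statistics

# Thresholds from ADR-0303; the authoritative copies live in scripts/ci/ensemble_prod_gate.py.
MEAN_PLCC_MIN = 0.95
PLCC_SPREAD_MAX = 0.005
N_SEEDS = 5


def gate_passes(plccs):
    """Two-part ship gate over the per-seed LOSO PLCC values."""
    if len(plccs) != N_SEEDS:
        raise ValueError(f"expected {N_SEEDS} seeds, got {len(plccs)}")
    mean_ok = statistics.fmean(plccs) >= MEAN_PLCC_MIN
    spread_ok = max(plccs) - min(plccs) <= PLCC_SPREAD_MAX
    return mean_ok and spread_ok


def exit_code(plccs):
    """Shell-style status: 0 when the gate passes, 1 otherwise (assumed)."""
    return 0 if gate_passes(plccs) else 1
```

Take [0.99, 0.95, 0.95, 0.95, 0.95] as an example. The mean (0.958) clears 0.95, but the 0.04 spread fails. That is the one-seed-wins case the spread bound exists to reject.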
0313 — CI required-checks aggregator (2026-05-05)
- What changed: fork-local CI policy. New .github/workflows/required-aggregator.yml: a single workflow that runs on every non-draft PR. It verifies that each of the 23 named required checks reported success, skipped, or neutral, or did not appear at all (which is how a path-filter rejection surfaces). The aggregator becomes the single branch-protection required check, replacing the 23-name list from ADR-0037.
- Touches: .github/workflows/required-aggregator.yml (new), docs/adr/0313-ci-required-checks-aggregator.md (new), changelog.d/added/ci-required-checks-aggregator.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment file).
- Upstream source: zero. Branch-protection policy is fork-only.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Manual operator step at adoption (uses PATCH, not PUT — corrected from the original ADR-0313 body which had the wrong verb):
echo '{"strict": false, "contexts": ["Required Checks Aggregator"]}' | \
gh api -X PATCH "repos/VMAFx/vmafx/branches/master/protection/required_status_checks" --input -
- Re-test on rebase:
# YAML parses cleanly (a structural check, not a lint)
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/required-aggregator.yml'))"
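The aggregator's decision rule, as described above, fits in one function. This is a sketch of the rule only. How the workflow actually queries check runs is not shown in this note, blocking_checks is a hypothetical name, and the check names in the example are invented.

```python
# Conclusions the ADR-0313 aggregator treats as passing.
PASSING = {"success", "skipped", "neutral"}


def blocking_checks(required, observed):
    """Return the required check names that should block the merge.

    required: iterable of required check names (23 in the fork's list).
    observed: mapping of check name -> conclusion for the checks that ran.
    A required name missing from `observed` passes, because that is how a
    path-filter rejection shows up. Any other conclusion, including a
    still-pending None, blocks.
    """
    return sorted(
        name
        for name in required
        if name in observed and observed[name] not in PASSING
    )
```

The absent-means-pass branch is why the deny-list rule under ADR-0341 matters. A check skipped for the wrong reason is indistinguishable from a legitimate path-filter rejection.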
0305 — encoder knob-space Pareto analysis (2026-05-05)
- What changed: fork-local. New analysis scaffold for the 12,636-cell encoder knob sweep that backs the tools/vmaf-tune/codec_adapters/* recipe defaults. New files:
  - ai/scripts/analyze_knob_sweep.py: per-(source, codec, rc_mode) Pareto hull on (bitrate_kbps, vmaf_score) with an encode_time_ms tiebreaker, plus a regression-detection check.
  - ai/tests/test_knob_sweep_analysis.py: synthetic 20-row JSONL fixture.

  Methodology and scaffolded findings are in ADR-0305 and Research-0077. Companion to Research-0063.
- Touches: nothing upstream-shared. Sits entirely under ai/ (fork-local since the tiny-AI training surface, ADR-0021) and docs/{adr,research}/ (fork ledger).
- Upstream source: zero. The 12,636-cell sweep, the Pareto scaffold, and the regression-detection invariant are fork-introduced; Netflix/vmaf master ships no encoder knob-sweep tooling.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Invariant for future codec adapter PRs: per the ai/AGENTS.md knob-sweep corpus invariant (ADR-0305), recipes that regress against the bare encoder at matched bitrate within the same (source, codec, rc_mode) slice MUST NOT ship as adapter defaults. New adapter PRs cite the per-slice hull row from reports/summary.md (or "no hull entry yet — bare default") in their PR description. The comprehensive.jsonl sweep file is generated locally and lives under runs/phase_a/full_grid/ (gitignored, never committed).
- Re-test on rebase:
0302 — ENCODER_VOCAB v3 schema expansion (ADR-0302)
- Touches: ai/scripts/train_fr_regressor_v2.py (adds an ENCODER_VOCAB_V3 parallel constant; does not modify the live ENCODER_VOCAB or ENCODER_VOCAB_VERSION).
- Invariant: ENCODER_VOCAB is append-only and order-stable (per ADR-0235). The v3 scaffold preserves the v2 slot ordering verbatim: slots 0..12 are bit-identical to the v2 vocab, and slots 13/14/15 append libsvtav1, h264_videotoolbox, and hevc_videotoolbox. The live ENCODER_VOCAB_VERSION = 2 remains the source of truth until the follow-up retrain PR clears the LOSO PLCC ship gate.
- Upstream interaction: zero. ai/scripts/train_fr_regressor_v2.py is fork-introduced (ADR-0272) and has no upstream counterpart.
- Re-test on rebase:
python3 -c "
import importlib.util, pathlib
spec = importlib.util.spec_from_file_location(
't', pathlib.Path('ai/scripts/train_fr_regressor_v2.py')
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
assert len(m.ENCODER_VOCAB_V3) == 16
assert m.ENCODER_VOCAB_VERSION == 2
print('OK')
"
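The append-only, order-stable rule can be checked mechanically. Every existing slot must keep its index, and new encoders may only appear after the last old slot. This is a minimal sketch. is_append_only is a hypothetical helper, and the example vocabularies are toy tuples, not the real ENCODER_VOCAB contents.

```python
def is_append_only(old, new):
    """True iff `new` keeps every slot of `old` at the same index and only grows at the tail."""
    old, new = list(old), list(new)
    return len(new) >= len(old) and new[: len(old)] == old
```

With the module m loaded as in the snippet above, is_append_only(m.ENCODER_VOCAB, m.ENCODER_VOCAB_V3) should hold. This assumes both constants are ordered sequences.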
0304 — vmaf-tune fast-path prod wiring (ADR-0304)
- Touches:
  - tools/vmaf-tune/src/vmaftune/fast.py: replaces the NotImplementedError paths of the ADR-0276 scaffold with concrete Optuna TPE + v2 proxy + GPU verify wiring.
  - tools/vmaf-tune/src/vmaftune/proxy.py (new module): the centralised seam for fr_regressor_v2 ONNX inference.
  - tools/vmaf-tune/tests/test_fast.py: expanded.
  - Doc-side: ADR-0304, Research-0076, and a tools/vmaf-tune/AGENTS.md invariant note.
- Upstream source: zero. tools/vmaf-tune/ and model/tiny/fr_regressor_v2.onnx are both fork-introduced (ADR-0237 / ADR-0352).
- Invariant: the production proxy is always fr_regressor_v2, with no smoke models in the production path. A single GPU verify pass at the end of the recommend flow is mandatory; the proxy alone never wins.
  - The vmaftune.proxy.run_proxy helper is the single seam every fast-path consumer goes through, so future probabilistic-head or ensemble migrations land in that one module.
  - The ENCODER_VOCAB v2 one-hot ordering is frozen by ADR-0352 and pinned in proxy.ENCODER_VOCAB_V2; keep it in sync with ai/scripts/train_fr_regressor_v2.py. Drift raises ProxyError at inference time, before bad predictions ship.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Re-test on rebase:
0307 — vmaf-tune ladder default sampler wiring (ADR-0307)
- What changed: fork-local tooling. tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler no longer raises NotImplementedError. It composes corpus.iter_rows (Phase A encode + score) with recommend.pick_target_vmaf (smallest CRF clearing the target VMAF) over DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38), at the adapter's mid-range preset. The module-level docstring and the AGENTS.md invariant are updated. New tests in tools/vmaf-tune/tests/test_ladder.py stub iter_rows via monkeypatch.setattr, so no live ffmpeg or vmaf binaries are needed.
- Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation or ladder surface.
- On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
- Rebase invariant: the 5-point sweep (18, 23, 28, 33, 38) is the load-bearing default. Downstream Phase E callers size their wall-time budget against five encodes per (resolution, target_vmaf) cell. Do not widen or narrow the sweep without an ADR-0307 follow-up. The SamplerFn seam stays open: callers needing finer grids pass an explicit sampler=.
- Re-test on rebase:
0309 — fr_regressor_v2 ensemble real-corpus retrain harness (ADR-0309)
- ADR: ADR-0309
- Touches: entirely fork-local.
  - ai/scripts/run_ensemble_v2_real_corpus_loso.sh (new): Bash wrapper that loops the five seeds over the existing train_fr_regressor_v2_ensemble_loso.py against .workingdir2/netflix/.
  - ai/scripts/validate_ensemble_seeds.py (new): calls the ADR-0303 gate and writes PROMOTE.json or HOLD.json with a corpus sha256 snapshot.
  - ai/tests/test_validate_ensemble_seeds.py (new): 7 tests with synthetic JSON fixtures for both verdict paths.
  - ai/AGENTS.md: appended a "Registry-flip is a separate PR (ADR-0309)" paragraph under the existing fr_regressor_v2_ensemble_v1 section.
  - docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md, docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (all new).
- Rebase invariant: the harness is decoupled from the registry mutation. Neither the wrapper nor the validator touches model/tiny/registry.json; the registry flip is a separate follow-up PR gated on a passing PROMOTE.json. Auto-flipping on PROMOTE was rejected in ADR-0309's alternatives matrix, specifically because rebase-time mutation of shipped registry rows is the foot-gun this invariant exists to prevent.
- Re-test on rebase:
python -m pytest ai/tests/test_validate_ensemble_seeds.py -v
python ai/scripts/validate_ensemble_seeds.py --help
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
- Upstream source: zero.
- On upstream sync: zero interaction.
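The validator's contract can be sketched in a few lines: one verdict file per call, a corpus fingerprint, and no registry writes. The JSON field names (verdict, corpus_sha256) and the single-file corpus are assumptions for illustration. The real PROMOTE.json / HOLD.json layout is defined by ai/scripts/validate_ensemble_seeds.py.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large corpora need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_verdict(out_dir, gate_passed, corpus_path):
    """Write one verdict file (PROMOTE.json or HOLD.json); never touch the model registry."""
    verdict = "PROMOTE" if gate_passed else "HOLD"
    payload = {"verdict": verdict, "corpus_sha256": sha256_of(corpus_path)}
    out = Path(out_dir) / f"{verdict}.json"
    out.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")
    return out
```

Recording the corpus hash next to the verdict lets the later registry-flip PR confirm that it is promoting weights trained on the same corpus the gate evaluated.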
0310 — BVI-DVC corpus ingestion for fr_regressor_v2 (ADR-0310)
- Touches: ai/scripts/bvi_dvc_to_corpus_jsonl.py (new fork-only adapter), ai/scripts/merge_corpora.py (new fork-only shard merger), ai/tests/test_merge_corpora.py (new), docs/ai/bvi-dvc-corpus-ingestion.md (new), docs/adr/0310-bvi-dvc-corpus-ingestion.md (new), docs/research/0082-bvi-dvc-corpus-feasibility.md (new), ai/AGENTS.md (BVI-DVC invariant note).
- Invariant: the BVI-DVC archive and any extracted artefacts (parquet, cached libvmaf JSON, JSONL corpus shard) are research-only and stay local; only derived fr_regressor_v2_*.onnx weights ship.
  - The merge utility validates every row against the canonical vmaftune.CORPUS_ROW_KEYS tuple, so that schema is the merge contract.
  - The re-shape step is a pure transform on the cached libvmaf JSON; no ffmpeg or vmaf binary is invoked.
  - The (src_sha256, encoder, preset, crf) natural key is load-bearing for de-duplication across mirrors and re-encodes.
- Upstream interaction: none. ai/ is fork-introduced; BVI-DVC is not part of Netflix/vmaf upstream.
- Re-test on rebase:
ADR-0312 — ffmpeg-patches/ vmaf-tune integration (2026-05-05)
- Files: ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch, ffmpeg-patches/0008-add-libvmaf_tune-filter.patch, ffmpeg-patches/0009-pass-autotune-cli-glue.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
- Rebase invariant: patches 0007–0009 plug into the cumulative state left after patches 0001–0006 apply against pristine n8.1. Running git apply --check on each patch in isolation is the wrong gate; use the series-replay command in CLAUDE.md §12 r14 instead.
- vmaf-tune patch invariant: the qpfile parser at libavcodec/qpfile_parser.{c,h} is shared across all three encoder adapters in patch 0007.
  - Future encoders that grow a -qpfile AVOption inherit it; do not fork the parser.
  - When the qpfile output format of tools/vmaf-tune/src/vmaftune/saliency.py changes (new column, different frame-type alphabet, …), patch 0007 must change in the same PR (CLAUDE.md §12 r14).
- vf_libvmaf_tune full-scoring promotion (2026-05-06): patch 0008 originally shipped as a scaffold (linear CRF↔VMAF interpolation, no libvmaf scoring), per ADR-0312's deferred-alternatives column.
  - The filter now mirrors the CPU framesync pipeline of vf_libvmaf.c end-to-end: vmaf_init + vmaf_model_load + vmaf_use_features_from_model in init(); per-frame vmaf_picture_alloc + memcpy + vmaf_read_pictures; and flush + vmaf_score_pooled(MEAN) in uninit().
  - The CRF recommendation remains a piece-wise linear projection from the observed VMAF. Per-clip Optuna TPE search stays in tools/vmaf-tune/src/vmaftune/recommend.py.
  - Rebase-side: the new filter still depends only on libvmaf's CPU C API (vmaf_init, vmaf_model_load, vmaf_use_features_from_model, vmaf_read_pictures, vmaf_score_pooled, vmaf_close, vmaf_picture_alloc/unref). It adds no symbols beyond what vf_libvmaf.c already requires, so future libvmaf rebases that pass the existing libvmaf filter pass this one too.
  - The ADR-0312 sub-decision is retired.
- n7+ API migration (2026-05-06): patch 0008 originally referenced the removed AVFilterLink::frame_rate member directly (n6-era API).
  - In n7+ that field moved off AVFilterLink onto a new FilterLink struct, accessed via ff_filter_link(AVFilterLink *) from libavfilter/filters.h.
  - Patch 0008 now uses ff_filter_link(outlink)->frame_rate = ff_filter_link(mainlink)->frame_rate; in config_output(). This mirrors patches 0005/0006, which were already written against the post-n7 API.
  - The bug slipped through CI because the FFmpeg-Vulkan lane only builds vf_libvmaf.o, not vf_libvmaf_tune.c. The full SYCL lane catches it now that PR #415 added ffmpeg-patches/** to the integration workflow's path filter.
  - Discovery: PR #415 / ADR-0317.
- Upstream source: zero. The vmaf-tune integration is fork-introduced; pure upstream syncs are unaffected.
- On upstream sync: zero interaction with libvmaf master. FFmpeg-side rebases (when n8.1 → n8.x lands in the FFMPEG_SHA of ffmpeg-patches/test/build-and-run.sh) are tracked separately under each refresh ADR (e.g., ADR-0277 for the 2026-05-04 refresh).
- Re-test on rebase:
git -C /path/to/ffmpeg-8 reset --hard n8.1
for p in ffmpeg-patches/000*-*.patch; do
git -C /path/to/ffmpeg-8 am --3way "$p" || break
done
# Build smoke (libvmaf-disabled — patches 0001–0006 skipped if libvmaf_dnn
# is not built). With libvmaf_dnn available:
cd /path/to/ffmpeg-8 && ./configure --enable-libvmaf --enable-libx264 --enable-libsvtav1 --enable-libaom --enable-gpl
make -j$(nproc) ffmpeg
./ffmpeg -hide_banner -h encoder=libx264 2>&1 | grep -i qpfile
- 2026-05-06 update: the SVT-AV1 ROI bridge in patch 0007 was promoted from scaffold to full implementation.
  - The libsvtav1 hunk now sets enc_params.enable_roi_map = true.
  - It builds one SvtAv1RoiMapEvt per qpfile frame upfront in eb_enc_init. Per-MB qp_offsets are averaged into a per-64×64-SB b64_seg_map of up to 8 segment QPs, with uniform binning when the value span exceeds the segment budget.
  - It attaches each event as a ROI_MAP_EVENT priv-data node from eb_send_frame() with node->size = sizeof(SvtAv1RoiMapEvt*), the validation contract enforced by SVT-AV1's resource_coordination_process.c.
  - Lifetime invariant: events and maps live for the entire encode session, because SVT-AV1 reads ROI_MAP_EVENT data via shallow-copied pointers on async pipeline threads (per enc_handle.c::copy_private_data_list). eb_enc_close frees them.
  - Wiring is gated on SVT_AV1_CHECK_VERSION(1, 6, 0); older SVT-AV1 builds keep the log-and-continue fallback.
  - libaom remains scaffold-only in this update; its AOME_SET_ROI_MAP bridge stays a separate follow-up.
  - No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
- 2026-05-06 update: the libaom-av1 ROI bridge in patch 0007 was promoted from scaffold to full implementation.
  - The libaom-av1 hunk now caches the parsed VmafTuneQpFile in AOMContext.
  - It allocates a segment-id map at libaom's mode-info grid: ALIGN_POWER_OF_TWO(dim, 8) >> 2, since av1/common/enums.h::MI_SIZE == 4.
  - On every encoded frame, it picks up to 8 segment QPs from the per-frame qp_offset value range, using uniform linear binning when the span exceeds AOM_MAX_SEGMENTS == 8.
  - It paints the per-mi segment map by expanding each per-16×16-MB qp_offset into a 4×4 block of mi cells, then issues aom_codec_control(&ctx->encoder, AOME_SET_ROI_MAP, &roi_map).
  - Lifetime invariant: libaom deep-copies the segment map and the delta_q[] table on every control call (per the memcpy in av1/encoder/encoder.c::av1_set_roi_map). A single buffer is therefore reused across frames and freed in aom_free(), which also frees the qpfile.
  - Trade-off: the 8-segment cap rounds nearby qp_offsets together when the saliency model emits more than 8 distinct values per frame. Finer granularity requires vmaf-tune corpus instead.
  - This retires the libaom-av1 deferral noted under ADR-0312: both AV1 encoder hooks (libsvtav1 and libaom-av1) are now full implementations.
  - No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
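The binning and mi-grid painting described in both updates can be modelled in a few lines of Python. This sketches the technique; it is not a port of the C hunks. bin_qp_offsets and expand_mb_to_mi are hypothetical names. Using each bin's lowest integer as its representative QP is an assumption, and the real patch may round or average differently.

```python
import math

MAX_SEGMENTS = 8  # AOM_MAX_SEGMENTS; also the SVT-AV1 segment budget quoted above


def bin_qp_offsets(offsets, max_segments=MAX_SEGMENTS):
    """Map per-MB qp_offsets to (segment_qps, segment_ids).

    If the value span fits the budget, every integer in [lo, hi] gets its own
    segment and QPs are preserved exactly. Otherwise [lo, hi] is cut into
    max_segments equal-width bins, and each segment's QP is the lowest
    integer its bin can hold (an illustrative choice).
    """
    lo, hi = min(offsets), max(offsets)
    span = hi - lo + 1
    if span <= max_segments:
        return list(range(lo, hi + 1)), [v - lo for v in offsets]
    width = span / max_segments
    ids = [min(int((v - lo) / width), max_segments - 1) for v in offsets]
    qps = [lo + math.ceil(i * width) for i in range(max_segments)]
    return qps, ids


def expand_mb_to_mi(mb_ids, mb_cols, mi_rows, mi_cols):
    """Paint each 16x16 MB's segment id into the 4x4 block of 4-pixel mi cells it covers.

    mi cells beyond the last MB (from grid alignment) reuse the nearest edge MB.
    """
    mb_rows = len(mb_ids) // mb_cols
    seg_map = []
    for r in range(mi_rows):
        mb_r = min(r // 4, mb_rows - 1)
        row = []
        for c in range(mi_cols):
            mb_c = min(c // 4, mb_cols - 1)
            row.append(mb_ids[mb_r * mb_cols + mb_c])
        seg_map.append(row)
    return seg_map
```

With 16 distinct offsets from -8 to 7, adjacent pairs share a segment. That is the "rounds nearby qp_offsets together" trade-off noted above.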
0315 — Vendor-neutral VVC encode strategy (ADR-0315 / Research-0085)
- ADR: ADR-0315
- Digest: Research-0085
- Touches: docs-only.
  - docs/research/0085-vendor-neutral-vvc-encode-landscape.md (new).
  - docs/adr/0315-vendor-neutral-vvc-encode-strategy.md (new).
  - docs/adr/_index_fragments/0315-vendor-neutral-vvc-encode-strategy.md (new).
  - docs/adr/_index_fragments/_order.txt (one-line append).
  - changelog.d/added/research-0085-vendor-neutral-vvc-encode.md (new).
  - docs/rebase-notes.md (this entry).
- Rebase invariant: none. The research digest and ADR are pure surveys with no code dependencies; nothing in the fork's source tree references them in a way that breaks on upstream rebase.
- Upstream source: zero. VVC encode strategy is a fork-local decision; upstream Netflix/vmaf has no codec adapter or encode-automation surface.
- On upstream sync: zero interaction. Pure docs.
- Re-test on rebase:
- 2026-05-06 follow-up (Research-0085 verification pass):
  - docs/research/0085-vendor-neutral-vvc-encode-landscape.md flipped from Status: SKELETON to Status: Active. Most [UNVERIFIED] claims are now backed by primary-source URLs: NVIDIA SDK 13.0 docs, AMD AMF GitHub, Intel oneVPL GitHub (+ mfxstructures.h + CHANGELOG.md), the Khronos registry, Phoronix Mesa/RADV coverage, the VVenC issue tracker, and the ZLUDA repo.
  - The ## Context and ## Alternatives considered sections of ADR-0315 were refreshed with the verified data points. Status stays Proposed.
  - The [UNVERIFIED] count in the digest dropped 25 → 10. The remaining items are legitimate gaps (NN-VC quality lift, vvenc per-kernel profile, HHI's non-public roadmap).
  - No code touched. No rebase impact beyond the existing docs-only posture.
0316 — cli_parse.c error() long-only-option fix (ADR-0316)
- ADR: ADR-0316 (follow-up to ADR-0311).
- Digest: none. This is a bug-fix whose shape fits in the ADR/commit body.
- Touches:
  - core/tools/cli_parse.c (3 lines: a call-site argument change at the ARG_THREADS / ARG_SUBSAMPLE / ARG_CPUMASK handlers).
  - core/test/fuzz/fuzz_cli_parse.c (removed the known_assert_in_input early-reject filter).
  - core/test/fuzz/cli_parse_corpus/cli_threads_abbrev_assert.argv (promoted from cli_parse_known_crashes/).
  - core/test/test_cli_parse_long_only_args.c (new fork()-based regression test).
  - core/test/meson.build (new test wiring, gated off Windows alongside test_y4m_411_oob).
  - core/tools/AGENTS.md (added a long-only-options invariant note next to the existing cli_parse.c rules).
- Rebase invariant: load-bearing. cli_parse.c is upstream-mirror with fork additions. The three handlers carry the fork-local shape: they pass the ARG_* enum value, not 't'/'s'/'c', to parse_unsigned(). If an upstream sync re-introduces the original short-option char shape, the assert returns. The parked-then-promoted reproducer (cli_parse_corpus/cli_threads_abbrev_assert.argv) will then surface it in the next nightly fuzz run.
- Upstream source: the bug shape exists in Netflix/vmaf master too (long-only options were added upstream with the same short-option-char placeholder). When the fork ports an upstream fix that overlaps these handlers, prefer the parse_unsigned(optarg, ARG_*, argv[0]) form already on the fork.
- On upstream sync: re-apply the three-line change in cli_parse.c if upstream resets the call-site arguments. The unit test is fork-local and stays.
- Re-test on rebase:
meson setup core/build core -Denable_tests=true \
-Denable_cuda=false -Denable_sycl=false
ninja -C core/build test/test_cli_parse_long_only_args
meson test -C core/build test_cli_parse_long_only_args -v
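The failure mode can be modelled in Python to show why the handlers must pass the ARG_* value. This is not the C code. OPTIONS, option_name, and the numeric ids are invented. The model also assumes that the error path finds the option to name by scanning the long-option table for a matching key. That is consistent with the description above but not confirmed by it.

```python
# Python model of the ADR-0316 bug shape; the numeric ids are invented.
ARG_THREADS, ARG_SUBSAMPLE, ARG_CPUMASK = 256, 257, 258  # long-only: no short-option char

OPTIONS = {  # long name -> key, like a getopt_long table
    "threads": ARG_THREADS,
    "subsample": ARG_SUBSAMPLE,
    "cpumask": ARG_CPUMASK,
    "width": ord("w"),  # an option that also has a short alias
}


def option_name(key):
    """Find the long name for `key`; asserts if no option carries that key."""
    for name, val in OPTIONS.items():
        if val == key:
            return name
    raise AssertionError(f"no option registered for key {key!r}")


def parse_unsigned(text, key):
    """Parse a non-negative decimal integer, naming the option in the error message."""
    if not text.isdecimal():
        raise ValueError(f"invalid value {text!r} for --{option_name(key)}")
    return int(text)
```

In this model, a valid value never reaches the lookup. Only malformed input exposes a wrong key such as ord("t").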
ADR-0317 — CI flake fix: doc-only PR path-filter (2026-05-06)
- Touched files:
  - .github/workflows/docker-image.yml: added a paths: filter on both the push: and pull_request: triggers.
  - .github/workflows/ffmpeg-integration.yml: added a paths: filter on both the push: and pull_request: triggers (covers all four matrix lanes: gcc, clang, SYCL, Vulkan).
  - docs/adr/0317-ci-doc-only-pr-flake-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ci-doc-only-pr-flakes.md.
- Rebase invariant: not load-bearing. This is a workflow-only change, and both files are fork-local CI. Upstream Netflix/vmaf does not ship a Docker workflow or an FFmpeg-integration matrix in this shape, so rebase conflicts are unlikely. If a future upstream sync introduces an overlapping docker-image.yml or FFmpeg matrix, prefer the fork's path-filtered form; the rationale (ADR-0313 aggregator posture, doc-only-PR runner-time burn) is fork-specific.
- Upstream source: none (fork-local CI workflows).
- On upstream sync: no action required. If reviewers later add new build inputs (e.g. a top-level docker-compose.yml or a new ffmpeg-patches/*.txt config file), extend the paths: lists in the same PR that adds the input.
- Follow-up not in this ADR: line 256 of ffmpeg-patches/0008-add-libvmaf_tune-filter.patch (outlink->frame_rate = mainlink->frame_rate;) needed to migrate to the ff_filter_link() accessor introduced in FFmpeg n7+, matching the pattern already in patches 0005 / 0006. It was tracked separately and has since landed (see the n7+ API migration note under ADR-0312). The path-filter never hid it, because any libvmaf/ or ffmpeg-patches/ PR still trips the SYCL lane.
- Re-test on rebase:
python3 -c "import yaml; \
yaml.safe_load(open('.github/workflows/docker-image.yml')); \
yaml.safe_load(open('.github/workflows/ffmpeg-integration.yml')); \
print('OK')"
0319 — fr_regressor_v2 ensemble LOSO trainer — real loader + per-fold training (ADR-0319)
- Touches:
  - ai/scripts/train_fr_regressor_v2_ensemble_loso.py (real _load_corpus and _train_one_seed bodies).
  - ai/scripts/run_ensemble_v2_real_corpus_loso.sh (wrapper argv fix).
  - docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (Step 0 corpus-generation section).
  - ai/AGENTS.md (canonical-6 schema invariant note).
  - ai/tests/test_train_fr_regressor_v2_ensemble_loso_*.py (loader and train schema tests).

  This closes the deferrals tracked in rebase-notes §0303 and §0309.
- Upstream source: none; this is fork-local ML training infrastructure. Netflix/vmaf upstream has no fr_regressor_v2 surface, no LOSO trainer, and no canonical-6 corpus tooling.
- Invariant: the trainer's _load_corpus accepts the canonical-6 JSONL schema emitted by scripts/dev/hw_encoder_corpus.py bit-for-bit.
  - Required keys per row are (src, encoder, cq, frame_index, vmaf, adm2, vif_scale0..3, motion2).
  - The codec block layout is a 12-slot ENCODER_VOCAB v2 one-hot, plus a constant preset_norm = 0.5, plus crf_norm = (cq - cq_min) / (cq_max - cq_min).
  - Schema changes require an ENCODER_VOCAB_VERSION bump and a full ensemble retrain per the existing closed-vocabulary rule (ADR-0235 / ADR-0352).
  - The fold-level StandardScaler is fit on the training rows only. Leaking the held-out source's distribution into the scaler would silently inflate per-fold PLCC.
- On upstream sync: no action required. If upstream Netflix/vmaf ever adds a competing LOSO trainer under python/vmaf/, do NOT merge them; keep the fork's training stack under ai/ per the AGENTS.md scope rule.
- Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v2_ensemble_loso_loader.py \
ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py -v
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
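The codec block and the train-only scaler can be sketched in plain Python. Everything here is illustrative: codec_block, fit_scaler, transform, and loso_folds are hypothetical helpers. The three-entry vocabulary in the example stands in for the real 12-slot ENCODER_VOCAB v2. The note does not say whether cq_min/cq_max are per-corpus or per-encoder, so they are plain parameters here. The scaler uses the population standard deviation and maps a zero-variance column to scale 1, which matches scikit-learn's StandardScaler defaults.

```python
import statistics


def codec_block(encoder, cq, cq_min, cq_max, vocab):
    """One-hot encoder slots + constant preset_norm + min-max crf_norm."""
    if encoder not in vocab:
        raise KeyError(f"encoder {encoder!r} not in closed vocabulary")
    one_hot = [1.0 if e == encoder else 0.0 for e in vocab]
    crf_norm = (cq - cq_min) / (cq_max - cq_min)
    return one_hot + [0.5, crf_norm]


def fit_scaler(train_rows):
    """Per-column mean and population std, fitted on training rows only."""
    cols = list(zip(*train_rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # zero variance -> scale 1
    return means, stds


def transform(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def loso_folds(rows_by_source):
    """Yield (held_out, train_scaled, test_scaled) with a fresh scaler per fold."""
    for held_out in rows_by_source:
        train = [r for s, rs in rows_by_source.items() if s != held_out for r in rs]
        means, stds = fit_scaler(train)  # held-out rows never influence the fit
        yield (
            held_out,
            transform(train, means, stds),
            transform(rows_by_source[held_out], means, stds),
        )
```

Fitting inside the loop is the whole point of the invariant. A single scaler fitted before the loop would have seen every held-out source.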
ADR-0323 — fr_regressor_v3 train + register on ENCODER_VOCAB v3 (2026-05-06)
- Scope:
  - ai/scripts/train_fr_regressor_v3.py (new) and ai/tests/test_train_fr_regressor_v3.py (new).
  - model/tiny/fr_regressor_v3.onnx (new real-weight checkpoint from a 9-fold LOSO gate pass at mean PLCC 0.9975).
  - model/tiny/fr_regressor_v3.json (new sidecar with encoder_vocab_version: 3 and the full per-fold trace).
  - model/tiny/registry.json (new fr_regressor_v3 row, smoke: false).
  - ai/AGENTS.md (the v3 retrain invariant section gains a "Status" subsection recording the gate result).
  - docs/ai/models/fr_regressor_v3.md (new model card), docs/adr/0323-fr-regressor-v3-train-and-register.md + index row, changelog.d/added/fr-regressor-v3-train-register.md.
- Rebase impact: zero. Fork-local feature; no upstream Netflix/vmaf surface is touched. The 16-slot ENCODER_VOCAB_V3 imported from train_fr_regressor_v2.py had already landed in PR #401 (ADR-0302).
- On upstream sync: no action required. The v3 model ships alongside v2: fr_regressor_v2.onnx and its sidecar are unchanged, and the v3 row is added to the registry in alphabetical order. If a future upstream sync ever lands a competing fr_regressor_v3 model under python/vmaf/, do NOT cross-link them; the fork's training stack lives under ai/.
- Watch out for: the live ENCODER_VOCAB_VERSION in ai/scripts/train_fr_regressor_v2.py stays at 2 (per ADR-0302's invariant). Do not bump it to 3 in this PR or in any downstream port. Promoting v3 over v2 in place is a separate "promote v3 to authoritative" PR per ADR-0302's production-flip checklist.
- Re-test on rebase:
pytest ai/tests/test_train_fr_regressor_v3.py -v
bash core/test/dnn/test_registry.sh # must report OK: 20+
python -c "import onnx; onnx.checker.check_model(onnx.load('model/tiny/fr_regressor_v3.onnx')); print('OK')"
ADR-0321 — fr_regressor_v2_ensemble_v1 full production flip (2026-05-06)
- Scope:
  - ai/scripts/export_ensemble_v2_seeds.py (new).
  - model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.onnx (real full-corpus-trained weights replacing the 3025-byte synthetic scaffold bytes).
  - model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.json (new per-seed sidecars).
  - model/tiny/registry.json (sha256 + smoke: false on the five seed rows).
  - ai/AGENTS.md (new invariant: the registry flip is now done; future re-flips require a fresh PROMOTE.json and a re-run of the export driver).
- Rebase impact: zero. This is a fork-local production flip; no upstream Netflix/vmaf surface is touched. The 12-slot ENCODER_VOCAB v2 carried in each sidecar is the same one the LOSO trainer (ADR-0319) bakes into the codec-block layout, so there is no rebase-time vocabulary drift to worry about.
- Watch out for: if a future upstream sync ever introduces a competing fr_regressor_v2_ensemble_* model under python/vmaf/, do NOT cross-link them. The fork's ensemble weights are gated on runs/ensemble_v2_real/PROMOTE.json and are not portable to a different training stack.
- Re-test on rebase:
bash core/test/dnn/test_registry.sh # must report OK: 19
python -c "import onnx; \
[onnx.checker.check_model(onnx.load(f'model/tiny/fr_regressor_v2_ensemble_v1_seed{i}.onnx')) \
for i in range(5)]; print('OK')"
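The flip asserts two properties per seed row: the recorded sha256 matches the shipped .onnx bytes, and smoke is false. The sketch below models both, but the registry layout is an assumption. It supposes a top-level "models" list whose rows carry "id", "path", "sha256", and "smoke" keys. Check the real registry.json before relying on it; core/test/dnn/test_registry.sh remains the authoritative check.

```python
import hashlib
from pathlib import Path

SEED_IDS = [f"fr_regressor_v2_ensemble_v1_seed{i}" for i in range(5)]


def seed_row_problems(registry, repo_root="."):
    """List problems with the five ensemble seed rows (hypothetical registry schema)."""
    rows = {row["id"]: row for row in registry["models"] if row["id"] in SEED_IDS}
    problems = [f"missing row {name}" for name in SEED_IDS if name not in rows]
    for name in sorted(rows):
        row = rows[name]
        if row.get("smoke", True):
            problems.append(f"{name}: still smoke")
        blob = Path(repo_root) / row["path"]
        if hashlib.sha256(blob.read_bytes()).hexdigest() != row["sha256"]:
            problems.append(f"{name}: sha256 mismatch")
    return problems
```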
ADR-0324 — Ensemble training kit (2026-05-06)
- Touches: tools/ensemble-training-kit/ (new), docs/adr/0324-ensemble-training-kit.md (new), docs/adr/README.md (index row), changelog.d/added/0324-ensemble-training-kit.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the kit assumes the LOSO wrapper hard-codes seeds (0 1 2 3 4). The orchestrator surfaces a warning if --seeds deviates, but still hands off to the wrapper. If a future PR parameterises the wrapper's seed list, update the wrapper and the kit's pass-through logic in lockstep.
- On upstream sync: no action required. The kit lives entirely under tools/ensemble-training-kit/ (a fork-local path) and only invokes other fork-local scripts (ai/scripts/, scripts/dev/, scripts/ci/).
- Re-test on rebase:
bash -n tools/ensemble-training-kit/*.sh
bash tools/ensemble-training-kit/make-distribution-tarball.sh /tmp/kit-test.tar.gz
tar -tzf /tmp/kit-test.tar.gz | grep -q "tools/ensemble-training-kit/run-full-pipeline.sh"
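The orchestrator's warn-but-continue behaviour is small enough to sketch. The kit itself is shell, so this Python version is only a model, and check_seeds is a hypothetical name. The model assumes the wrapper ignores any seed list it is handed, which follows from the note's statement that the seeds are hard-coded.

```python
import warnings

WRAPPER_SEEDS = (0, 1, 2, 3, 4)  # hard-coded in the LOSO wrapper, per the invariant above


def check_seeds(requested):
    """Warn when --seeds differs from the wrapper's hard-coded list; never abort.

    Returns the seeds the wrapper will actually run (assumed to be WRAPPER_SEEDS).
    """
    requested = tuple(int(s) for s in requested)
    if requested != WRAPPER_SEEDS:
        warnings.warn(
            f"--seeds {requested} differs from the wrapper's hard-coded {WRAPPER_SEEDS}",
            stacklevel=2,
        )
    return WRAPPER_SEEDS
```

If the wrapper's seed list is ever parameterised, this is the function that must change in lockstep: it would return requested instead of the constant.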
ADR-0332 — External-competitor benchmark harness (2026-05-08)
- Touches:
  - tools/external-bench/ (new).
  - docs/adr/0332-external-bench-wrapper-only.md (new) and docs/adr/_index_fragments/0332-external-bench-wrapper-only.md (new).
  - docs/adr/_index_fragments/_order.txt (one-line append) and docs/adr/README.md (regenerated).
  - changelog.d/added/external-bench-harness.md (new).
  - docs/research/0087-external-bench-competitor-survey-2026-05-08.md (new).

  No engine code touched; no upstream-shared paths.
- Invariant: the harness is wrapper-only. Never vendor or link x264-pVMAF (GPL-2.0) into this fork.
  - Future competitors follow the same pattern: tools/external-bench/<competitor>/run.sh invokes a user-installed binary via an env var, and its output is schema-shimmed into the canonical JSON shape.
  - The output schema is the contract between every wrapper and compare.py: frames[].{frame_idx, predicted_vmaf_or_mos, runtime_ms} + summary.{competitor, plcc, srocc, rmse, runtime_total_ms, params, gflops}.
  - The runner parameter of run_wrapper MUST stay resolved at call time (not via default-argument binding) so monkeypatch-based tests work.
- On upstream sync: no action required. The harness lives entirely under tools/external-bench/ (a fork-local path) and never touches Netflix-shared code.
ADR-0331 — Skip CI on draft pull requests (2026-05-08)
- Touches: .github/workflows/{docker-image,security-scans,lint-and-format,ffmpeg-integration,libvmaf-build-matrix,rule-enforcement,tests-and-quality-gates}.yml (per-job if: clause + pull_request.types list). required-aggregator.yml is unchanged, because it already adopted the pattern under ADR-0313. No upstream-shared paths.
- Invariant: every top-level job in the eight fork workflows that trigger on pull_request carries if: github.event_name != 'pull_request' || github.event.pull_request.draft == false.
  - The pull_request: block lists ready_for_review in types:, so promoting a draft fires CI exactly once.
  - The first clause keeps push: triggers, which carry no PR object, intact.
  - If an upstream merge introduces a new top-level job, that job MUST inherit the gate. Otherwise drafts will silently consume one matrix slot per push.
- On upstream sync: Netflix/vmaf upstream does not gate on draft state. If a sync brings in new pull_request workflow content, replay the gate on every newly introduced top-level job. To compose with an existing if:, follow the coverage-gpu pattern and wrap both predicates in ${{ ... && ( ... ) }}.
- Re-test on rebase:
```bash
python3 -c "import yaml; names=['docker-image','security-scans','lint-and-format','required-aggregator','ffmpeg-integration','libvmaf-build-matrix','rule-enforcement','tests-and-quality-gates']; [yaml.safe_load(open(f'.github/workflows/{n}.yml')) for n in names]; print('OK')"
# Spot-check that each workflow file carries the gate. This is a per-file
# count, so it does not prove that every top-level job is gated.
for f in docker-image security-scans lint-and-format ffmpeg-integration \
         libvmaf-build-matrix rule-enforcement tests-and-quality-gates \
         required-aggregator; do
  grep -H -c "pull_request.draft == false" ".github/workflows/${f}.yml"
done
# Each file must report a count >= 1.
```
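A per-job check closes the gap left by the per-file grep. This is a minimal sketch that works on an already-parsed workflow mapping. ungated_jobs is a hypothetical helper, and it uses a plain substring match, so an equivalent but differently spaced expression would be reported as ungated.

```python
# The exact predicate quoted in the ADR-0331 invariant.
DRAFT_GATE = "github.event.pull_request.draft == false"


def ungated_jobs(workflow):
    """Return the names of top-level jobs whose `if:` lacks the draft gate.

    `workflow` is the mapping produced by yaml.safe_load() on a workflow file.
    """
    jobs = workflow.get("jobs") or {}
    return sorted(
        name
        for name, job in jobs.items()
        if DRAFT_GATE not in str((job or {}).get("if", ""))
    )
```

A possible invocation is ungated_jobs(yaml.safe_load(open(".github/workflows/docker-image.yml"))), repeated for each of the eight file names above. An empty list means every top-level job carries the gate.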
SSIM extractor registration fix (2026-05-08)
- Touches:
  - core/src/feature/feature_extractor.c (upstream-mirror: adds one extern + one registry-array entry near the existing SSIM rows).
  - core/src/feature/integer_ssim.c (upstream-mirror: adds #include "config.h" and refreshes the file-scope comment above vmaf_fex_ssim).
  - core/src/meson.build (adds integer_ssim.c to the source list; fork-local diff).
  - core/test/test_feature_extractor.c (adds one regression test alongside the existing tests).
  - docs/metrics/features.md (table row + footnote ²), docs/state.md, changelog.d/fixed/ssim-extractor-registration.md.
- Invariant on the upstream-mirror files:
  - The registry-array entry must remain inside the unconditional CPU block (the same block as &vmaf_fex_float_ssim / &vmaf_fex_float_ms_ssim), because vmaf_fex_ssim is CPU-only with no SIMD or GPU twin.
  - The config.h include in integer_ssim.c is load-bearing on Vulkan-enabled LTO builds. feature_extractor.c and integer_ssim.c must agree on HAVE_VULKAN / HAVE_CUDA / HAVE_SYCL for the VmafFeatureExtractor struct layout to match across TUs.
- On upstream sync: if Netflix ever lands its own integer-SSIM registry row, drop the fork's row in favour of upstream's; the file structure is identical. If upstream removes integer_ssim.c entirely (the file has been dormant on master for years), revert the meson.build addition. Otherwise no action.
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false && ninja -C build
./build/test/test_feature_extractor # 5/5 pass, includes new ssim row
./build/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--feature ssim --output /tmp/ssim_smoke.xml && \
grep -q '<metric name="ssim"' /tmp/ssim_smoke.xml
# Vulkan-enabled LTO build (-Wlto-type-mismatch must stay clean)
meson setup build-vulkan -Denable_vulkan=enabled --reconfigure && \
ninja -C build-vulkan tools/vmaf
CI paths-ignore deny-list on heavy workflows (ADR-0341, 2026-05-09)
- Touches:
  - .github/workflows/libvmaf-build-matrix.yml (fork-local: paths-ignore: block under pull_request:).
  - .github/workflows/tests-and-quality-gates.yml (fork-local: same block).
  - docs/adr/0341-ci-paths-ignore-doc-only-prs.md + index fragment, changelog.d/changed/ci-paths-ignore-doc-only.md.
- Invariant: the deny-list must stay strictly documentation-only: docs/**, **/*.md, changelog.d/**, CHANGELOG.md, .workingdir2/**.
  - Any path that contributes to a build, test, or lint input must NEVER appear in the deny-list. That includes libvmaf/**, meson.build, meson_options.txt, subprojects/**, python/**, ai/**, mcp-server/**, model/**, testdata/**, and .github/workflows/**. Otherwise the corresponding required check is silently skipped on a code-touching PR.
  - The Required Checks Aggregator (ADR-0313) is built to pass the doc-only case, where no run of a required check appears at all. It cannot tell that case apart from a check skipped by an over-broad deny-list, so a too-broad deny-list would lose build coverage without anyone noticing.
- On upstream sync: Netflix/vmaf upstream does not carry these two workflow files (they are fork-local additions). No sync conflict expected.
- Re-test on rebase:
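GitHub's paths-ignore filter skips a workflow only when every changed file matches at least one pattern, where ** spans directories and * stays inside one path segment. A minimal Python sketch of that rule, handy for sanity-checking the deny-list above against sample changesets; the glob translator is a simplified approximation (no ?, [...] or ! negation support) and the sample paths are hypothetical.

```python
import re

# Deny-list copied from the ADR-0341 invariant above.
DENY_LIST = ["docs/**", "**/*.md", "changelog.d/**", "CHANGELOG.md", ".workingdir2/**"]


def glob_to_regex(pattern):
    """Translate a path-filter glob into a regex (simplified approximation).

    '**/' matches zero or more leading directories, '**' matches anything,
    and '*' matches any run of characters except '/'.
    """
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path, patterns=DENY_LIST):
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


def workflow_skipped(changed_files, patterns=DENY_LIST):
    """paths-ignore skips the run only if *every* changed file is ignored."""
    return bool(changed_files) and all(is_ignored(f, patterns) for f in changed_files)


doc_only = ["docs/adr/0341-ci-paths-ignore-doc-only-prs.md",
            "changelog.d/changed/ci-paths-ignore-doc-only.md"]
mixed = ["docs/index.md", "libvmaf/src/dict.c"]
print(workflow_skipped(doc_only), workflow_skipped(mixed))  # True False
```

A changeset that mixes docs with any build input keeps the workflow running, so the deny-list only has to be doc-safe, not exhaustive.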
HDR VMAF model search — Path C documentation only (2026-05-09)¶
- Files added (this fork only; upstream Netflix/vmaf has none of these):
  - model/vmaf_hdr_model_card.md — discoverable warning that the HDR scoring path falls back to the SDR vmaf_v0.6.1.json weights. The filename deliberately uses .md, not .json, so the vmaftune.hdr.select_hdr_vmaf_model glob (vmaf_hdr_*.json) keeps returning None.
  - docs/research/0089-hdr-vmaf-model-search.md — verbatim trail of the source-or-train survey (URLs + access dates).
  - changelog.d/added/hdr-vmaf-model-search.md — release-notes fragment per ADR-0221.
- ADR-0300 grew an inline ### Status update 2026-05-09: HDR model status section.
- Why no model JSON ships: Path A negative findings (no public Netflix HDR VMAF model exists; HDRMAX is a different algorithm not loadable by libvmaf's JSON path). Path B deferred behind gated subjective HDR corpora + multi-day training compute. No fabricated weights are introduced.
- On upstream sync: if Netflix lands vmaf_hdr_*.json in Netflix/vmaf/model/, port via /port-upstream-commit; the resolver picks it up automatically with no vmaftune change. Then delete model/vmaf_hdr_model_card.md (or rewrite it as a normal model card describing the upstream weights). Watch https://github.com/Netflix/vmaf/issues/645 for the upstream release announcement.
- Re-test on rebase: no behavioural change (pure docs). Sanity:
python3 -c "from pathlib import Path; \
import sys; sys.path.insert(0,'tools/vmaf-tune/src'); \
from vmaftune.hdr import select_hdr_vmaf_model; \
print(select_hdr_vmaf_model(Path('model')))"
# Expect: None — confirms the .md card does not match the glob
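Why the .md card cannot shadow a real model comes down to glob matching. A minimal sketch, assuming the resolver simply picks the first sorted match of vmaf_hdr_*.json; find_hdr_model is a hypothetical stand-in, not the real vmaftune.hdr.select_hdr_vmaf_model, and vmaf_hdr_v0.1.json is an invented filename.

```python
import tempfile
from pathlib import Path


def find_hdr_model(model_dir):
    """Hypothetical resolver: first vmaf_hdr_*.json in model_dir, else None."""
    matches = sorted(Path(model_dir).glob("vmaf_hdr_*.json"))
    return matches[0] if matches else None


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "vmaf_v0.6.1.json").write_text("{}")
    (root / "vmaf_hdr_model_card.md").write_text("# HDR model card\n")
    card_only = find_hdr_model(root)                  # the .md card never matches *.json
    (root / "vmaf_hdr_v0.1.json").write_text("{}")    # a hypothetical future upstream model
    with_model = find_hdr_model(root)

print(card_only, with_model.name)  # None vmaf_hdr_v0.1.json
```

Once a real vmaf_hdr_*.json lands, the same glob picks it up with no resolver change, which is why the card can stay in place until then.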
ADR-0349 — fr_regressor_v3 namespace resolution (2026-05-09)¶
- Rebase impact: none. Docs-only change — adds ADR-0349, an append-only status appendix on ADR-0302 per ADR-0028, a ## fr_regressor_* namespace map block in ai/AGENTS.md, and two changelog fragments. No upstream Netflix/vmaf surface touched; no fr_regressor_* registry rows touched (sha256s for _v1, _v2, _v2_ensemble_v1_seed{0..4}, _v3 all unchanged); no C / Python / ONNX bytes modified.
- What to check after a rebase: nothing automated. The only drift risk is a future agent claiming fr_regressor_v3plus_features for an unrelated workstream; ai/AGENTS.md carries the reservation, and reviewers verify the map row exists before approving any new fr_regressor_* registry id.
- Reproducer:
```bash
# ADR + AGENTS.md namespace map present and consistent
# (-F: match "fr_regressor_*" literally instead of as a regex):
test -f docs/adr/0349-fr-regressor-v3-namespace.md
grep -qF "fr_regressor_* namespace map" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" docs/adr/0349-fr-regressor-v3-namespace.md
# Status appendix present on ADR-0302:
grep -q "Status update 2026-05-09: namespace collision resolved" \
  docs/adr/0302-encoder-vocab-v3-schema-expansion.md
# Existing v3 production row bit-identical (sha256 unchanged):
python3 -c "
import json
reg = json.load(open('model/tiny/registry.json'))
v3 = next(m for m in reg['models'] if m['id'] == 'fr_regressor_v3')
assert v3['sha256'] == 'eaa16d23461eda74940b2ed590edfcaf13428aade294e47792a5a15f4d3b999c', v3
assert v3['smoke'] is False
print('OK: fr_regressor_v3 production row unchanged')
"
# Registry test still passes:
bash core/test/dnn/test_registry.sh
```
0327 — Pre-push PR-body deliverables validator hook¶
- Touches:
scripts/ci/validate-pr-body.sh (new), scripts/git-hooks/pre-push (new), scripts/ci/test-validate-pr-body.sh (new), Makefile (hooks-install target adds the pre-push symlink). Re-uses the scripts/ci/deliverables-check.sh parser verbatim; no upstream-shared file is modified.
- Invariant: parser shape parity with the .github/workflows/rule-enforcement.yml deep-dive-checklist gate (ADR-0108). The validator constructs a PATH shim that intercepts git diff --name-only calls only; every other git invocation falls through to the real binary.
- On upstream sync: not applicable; these files are entirely fork-local and Netflix has no equivalent. If scripts/ci/deliverables-check.sh is ever rewritten or moved, the validator's exec path (scripts/ci/deliverables-check.sh) and the test harness's expected exit codes must follow.
- Re-test on rebase:
bash scripts/ci/test-validate-pr-body.sh   # 8/8 cases pass
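The PATH-shim idea reduces to a dispatch rule: intercept only git diff --name-only and pass everything else through untouched. A Python sketch of that rule under stated assumptions; it is not the contents of scripts/ci/validate-pr-body.sh (a shell script), and shim_dispatch and path_without_shim are hypothetical names.

```python
import os


def shim_dispatch(argv, pr_files):
    """Hypothetical model of the pre-push PATH shim's rule.

    Only `git diff --name-only ...` is intercepted and answered with the
    PR's file list; every other invocation passes through to the real git.
    """
    if argv[:1] == ["diff"] and "--name-only" in argv[1:]:
        return "intercept", "".join(f"{p}\n" for p in pr_files)
    return "passthrough", list(argv)


def path_without_shim(path_env, shim_dir):
    """Drop the shim directory from PATH so the real git can be located."""
    keep = [p for p in path_env.split(os.pathsep)
            if os.path.abspath(p) != os.path.abspath(shim_dir)]
    return os.pathsep.join(keep)


mode, out = shim_dispatch(["diff", "--name-only", "origin/master...HEAD"],
                          ["core/src/dict.c", "docs/state.md"])
print(mode, out.splitlines())
```

A real shim would finish the passthrough branch by exec-ing the git found via shutil.which("git", path=path_without_shim(...)), so exit codes and stdio reach the caller unchanged.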
0320 — Semgrep # nosemgrep cites on Netflix-upstream Python harness (Research-0090)¶
- Touches:
python/vmaf/core/asset.py, python/vmaf/core/executor.py, python/vmaf/core/feature_extractor.py, python/vmaf/core/quality_runner.py, python/vmaf/core/result_store.py, python/vmaf/tools/decorator.py, python/test/command_line_test.py, python/test/feature_extractor_test.py, python/test/ssimulacra2_test.py, python/vmaf/config.py.
- Invariant: every fork-added # nosemgrep: <rule-id> line is paired with an inline cite to Research-0090. The cite + rule-id pair is the load-bearing artifact (per memory feedback_no_guessing: every "false positive" claim ships its safety proof). If an upstream sync removes the cited line of code, drop the cite-comment block too. If upstream adds a defusedxml fix at the ElementTree.parse() site (feature_extractor.py:115, quality_runner.py:1496), keep upstream's fix and drop our suppressions. config.py:40 (the SSL-bypass deletion) is a fork-exclusive security fix; if upstream resurrects ssl._create_unverified_context on a sync, do not re-merge it: the bypass clobbers the process-global default and is unjustified per Research-0090, F1.
- Re-test on rebase:
semgrep scan --config=p/cwe-top-25 --config=p/c --config=p/python . \
  --metrics=off --json | jq '.results | length'
# expect 0 — every legit finding either has a # nosemgrep cite or was fixed
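The cite-pairing invariant is easy to lint mechanically. A minimal sketch, assuming the Research-0090 cite sits on the suppressed line itself or within the comment lines just above it (window=3 is an arbitrary choice); the rule id in the sample is only an example.

```python
import re

NOSEMGREP = re.compile(r"#\s*nosemgrep:\s*(?P<rule>[\w.\-/]+)")


def uncited_suppressions(source, cite="Research-0090", window=3):
    """Return (line_no, rule_id) for each `# nosemgrep: <rule>` comment that
    has no `cite` on its own line or on the `window` lines above it."""
    lines = source.splitlines()
    missing = []
    for i, line in enumerate(lines):
        m = NOSEMGREP.search(line)
        if m is None:
            continue
        context = lines[max(0, i - window): i + 1]
        if not any(cite in text for text in context):
            missing.append((i + 1, m.group("rule")))
    return missing


RULE = "python.lang.security.use-defused-xml.use-defused-xml"
cited = (
    "# Input is a trusted, locally generated file; see Research-0090.\n"
    f"tree = ElementTree.parse(path)  # nosemgrep: {RULE}\n"
)
bare = f"tree = ElementTree.parse(path)  # nosemgrep: {RULE}\n"
print(uncited_suppressions(cited), uncited_suppressions(bare))
```

Running a check like this over the touched files would flag any suppression that lost its safety proof during a sync.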
0321 — Security-scans workflow registry-pack list (Research-0090)¶
- Touches:
.github/workflows/security-scans.yml, .github/workflows/lint-and-format.yml.
- Invariant: the registry packs the workflow cites (p/cwe-top-25 + p/c + p/python) are validated against https://semgrep.dev/c/p/<pack>; the previously-cited p/cert-c-strict, p/cert-cpp-strict, and p/cpp packs were retired by Semgrep in 2025 and 404. The lint-and-format.yml pull of ${{ github.* }} into env: (clang-tidy + clang-tidy-sycl steps) defuses run-shell-injection; preserve the pattern on any edit. See Research-0090, F2/F3.
- Re-test on rebase:
for pack in p/cwe-top-25 p/c p/python; do
  # -w '%{http_code}' reports the status of the final response after -L redirects
  code=$(curl -s -o /dev/null -w '%{http_code}' -L "https://semgrep.dev/c/${pack}")
  [ "$code" = "200" ] && echo "${pack}: OK" || echo "${pack}: FAIL ($code)"
done
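The env: indirection works because the runner substitutes ${{ }} expressions into the script before the shell runs it, so attacker-controlled text (a branch name, for instance) would become shell code; passing it through an environment variable keeps it as data. A simplified Python sketch of the check over already-parsed step dicts; it approximates, and is not, Semgrep's run-shell-injection rule, and the step names are hypothetical.

```python
import re

# A ${{ github.* }} expression, e.g. ${{ github.base_ref }}.
GITHUB_EXPR = re.compile(r"\$\{\{\s*github\.[^}]*\}\}")


def injectable_steps(steps):
    """Names of steps whose `run:` script interpolates ${{ github.* }} directly."""
    return [s.get("name", "<unnamed>") for s in steps
            if GITHUB_EXPR.search(s.get("run", ""))]


steps = [
    {"name": "clang-tidy (inline)", "run": 'git diff "${{ github.base_ref }}"'},
    {
        "name": "clang-tidy (env indirection)",
        "env": {"BASE_REF": "${{ github.base_ref }}"},
        "run": 'git diff "$BASE_REF"',
    },
]
print(injectable_steps(steps))  # ['clang-tidy (inline)']
```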
0320 — CodeQL C bulk sweep (78 deferred alerts → 60 fixed, 14 deferred to T7-5)¶
- Touches:
core/src/feature/{cambi.c,ciede.c,integer_adm.c,integer_psnr.c,adm_tools.h,third_party/xiph/psnr_hvs.c}, core/src/feature/x86/{adm_avx2.c,adm_avx512.c,ansnr_avx2.c,ansnr_avx512.c,vif_avx2.c,vif_avx512.c}, core/src/{pdjson.c,svm.cpp}, core/test/{test_cpu.c,test_model.c}, core/tools/{y4m_input.c,yuv_input.c,vmaf_bench.c}. All but vmaf_bench.c are upstream-mirror Netflix files.
- Invariant: widening casts on integer multiplications ((size_t), (uint64_t), (double)) are LHS-prefixed before the multiply, never wrapped around the whole expression; the latter is a no-op against cpp/integer-multiplication-cast-to-long. Deleted commented-out blocks (e.g., the AVX-512 VP-loop dead variant in adm_avx512.c::adm_dwt2_inverse) are gone for good; if upstream brings them back, they reintroduce the alerts. iqa/convolve.c was deliberately left untouched: prefixing (double) on the float×float multiplications inside the scalar reference path breaks bit-exactness against the AVX2 path enforced by test_iqa_convolve, so that CodeQL alert is deferred to a follow-up that updates both paths in lockstep.
- On upstream sync: any upstream change that re-introduces the deleted comment blocks or rewrites the cast forms will surface the alerts again. The cambi_score signature change (CambiBuffers buffers → const CambiBuffers *buffers) is fork-local and likely to conflict with upstream patches that touch that function. The 14 deferred VifBuffer large-parameter alerts are tracked under T7-5 (multi-backend coordinated refactor including NEON).
- Re-test on rebase:
cd libvmaf && meson test -C build   # all 50+ C tests
make test-netflix-golden            # upstream golden gate
# Re-run CodeQL on master afterwards; the 60 fixed alerts must stay closed.
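Why the cast position matters, modelled in Python: a cast around the product widens only after a 32-bit multiply has already overflowed, whereas a cast on the left operand makes the multiply itself happen in 64 bits. This simulates C int and uint64_t arithmetic with explicit masking; signed overflow is undefined behaviour in C, so the wrapped value shown is only the typical outcome, and the 50000×50000 operands are illustrative.

```python
def as_int32(x):
    """Reduce x to C int32 range (the typical two's-complement wrap)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def as_uint64(x):
    """Convert to uint64_t (modulo 2**64, as C does for unsigned targets)."""
    return x & 0xFFFFFFFFFFFFFFFF


w, h = 50_000, 50_000  # both `int` in the C original

wrapped = as_uint64(as_int32(w * h))  # (uint64_t)(w * h): multiply in int, then widen
prefixed = as_uint64(w) * h           # (uint64_t)w * h: h promoted, multiply in 64 bits
print(wrapped, prefixed)  # 18446744071914584320 2500000000
```

The same reasoning covers the (size_t) and (double) prefixes: only a cast on an operand changes the type in which the multiplication is evaluated.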
CodeQL cpp/declaration-hides-variable sweep (2026-05-09)¶
- What changed: Mechanical rename / scope-tighten / dedupe sweep closing 64 open cpp/declaration-hides-variable CodeQL alerts on master. Touched files: core/src/feature/cambi.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/x86/vif_avx2.c, core/src/feature/x86/vif_avx512.c. All five are upstream-mirror; the Netflix copyright header is preserved on each.
- Renames adopted (semantic over _2 suffix):
  - cambi.c: inner int err shadowing function-scope err becomes mkdir_err (heatmaps init) and src_err (full-ref extract path).
  - adm_avx2.c / adm_avx512.c: the j == 0 first-column special-case block is wrapped in { ... } so its j0..j3 and s0..s3 stop being visible to the per-j tail loop. The inner duplicate __m256i add_shift_HP_vex = _mm256_set1_epi32(32768) (and its 512-bit twin) is removed; it is bit-identical to the function-scope value already in scope. The __m256i rfactor1 that shadowed the function-scope float rfactor1[3] becomes rfactor_v0/_v1/_v2 (and the AVX-512 twin likewise).
  - vif_avx2.c / vif_avx512.c: tap-loop locals follow f_tap, r_top/r_bot, d_top/d_bot for the s0 stage, and f_tap0/f_tap1, r_back0/r_fwd0, etc. for the AVX-512 paired-tap stage. Inner per-fj __m256i fq / __m512i fq shadows of the centre-tap broadcast become f_tap. Inner-block duplicates of function-scope ref/dis/stride/ii (identical types and initialisers) are simply removed. The two scalar VifResiduals residuals declarations that shadowed function-scope Residuals512 residuals become tail_residuals. The two const uint16_t fcoeff declarations that shadowed function-scope __m512i fcoeff become fcoeff_scalar.
- Invariant: bit-exactness gate; the rename sweep must not change any score. The Netflix CPU golden 3 (src01_hrc00, checkerboard_1, checkerboard_10) ran clean against this PR. All 76 VMAF-targeted Python tests pass; the 9 unrelated pre-existing failures (NIQE, PyPSNR, FileSystemResultStore) reproduce on a pristine origin/master checkout.
- On upstream sync: Netflix has no equivalent renames on upstream master as of 2026-05-09. When syncing, prefer the fork's renamed identifiers (the CodeQL gate depends on them). If Netflix later renames the same locals differently, reconcile by keeping fork names and updating any imported chunks at port time.
- Re-test on rebase:
meson test -C build --suite=fast
PYTHONPATH=$PWD/python python3 -m pytest \
  python/test/quality_runner_test.py -k test_run_vmaf \
  python/test/vmafexec_test.py \
  python/test/vmafexec_feature_extractor_test.py \
  -m "not slow" -q
ADR-0209 v1 stdio runtime (T5-2b) — Embedded MCP server (2026-05-08)¶
- Touches:
core/src/mcp/{mcp.c,dispatcher.c,transport_stdio.c,mcp_internal.h,meson.build,3rdparty/cJSON/{cJSON.c,cJSON.h,LICENSE}}, core/test/test_mcp_smoke.c, core/test/meson.build. All paths are fork-local. cJSON is vendored verbatim from upstream DaveGamble/cJSON@v1.7.18 under its MIT license.
- Invariant: every TU under core/src/mcp/ (other than the vendored cJSON dir) is fork-local with the Copyright 2026 Lusoris and Claude (Anthropic) header; cJSON keeps its upstream MIT header verbatim. The public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged from T5-2; only function bodies flipped from -ENOSYS to working implementations. SSE / UDS still return -ENOSYS so the v2 PR can wire them without touching the public surface.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface; the entire core/src/mcp/ subtree is fork-local. If upstream ever adds an MCP surface, expect a port-only sync since names will collide.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
  -Denable_mcp=true -Denable_mcp_stdio=true
ninja -C build && meson test -C build test_mcp_smoke -v
ADR-0334 — state.md-touch-check CI gate (2026-05-08)¶
- Touches:
.github/workflows/rule-enforcement.yml (new top-level job state-md-touch-check), scripts/ci/state-md-touch-check.sh (new), scripts/ci/test-state-md-touch-check.sh (new), scripts/ci/AGENTS.md (new rebase-sensitive-surface row), .github/PULL_REQUEST_TEMPLATE.md (already carries the "Bug-status hygiene" section + no state delta: REASON opt-out, coupled to the script's regex). No upstream-shared paths.
- Invariant: the gate's trigger predicate (Conventional-Commit fix: prefix, bare bug token in title, GitHub close-keywords closes/fixes/resolves #N, unchecked Bug-status-hygiene checkbox) and opt-out sentinel (no state delta: REASON) match the wording of the ## Bug-status hygiene section in .github/PULL_REQUEST_TEMPLATE.md. Reword the template only alongside the script. The job carries the pull_request.draft == false || github.event_name != 'pull_request' gate (ADR-0331 pattern); keep that on any future hoist into the required-aggregator set.
- On upstream sync: Netflix/vmaf has no equivalent rule. No conflict expected; the workflow file is fork-introduced.
- Re-test on rebase:
bash scripts/ci/test-state-md-touch-check.sh
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/rule-enforcement.yml')); print('YAML OK')"
pre-commit run shellcheck --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
pre-commit run shfmt --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
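The trigger predicate and opt-out can be sketched as a handful of regexes. This is a hypothetical Python approximation, not the logic of scripts/ci/state-md-touch-check.sh: the exact patterns are assumptions, the unchecked-checkbox condition is left out, and close-keywords are searched in both title and body.

```python
import re

FIX_PREFIX = re.compile(r"^fix(\([^)]*\))?!?:", re.IGNORECASE)  # fix:, fix(scope):, fix!:
BUG_TOKEN = re.compile(r"\bbug\b", re.IGNORECASE)                # bare "bug", not "debug"
CLOSE_KW = re.compile(r"\b(closes|fixes|resolves)\s+#\d+", re.IGNORECASE)
OPT_OUT = re.compile(r"no state delta:\s*\S", re.IGNORECASE)    # sentinel plus a reason


def needs_state_md_touch(title, body):
    """True when the PR looks like a bug fix and has not opted out."""
    triggered = bool(
        FIX_PREFIX.search(title)
        or BUG_TOKEN.search(title)
        or CLOSE_KW.search(title + "\n" + body)
    )
    return triggered and not OPT_OUT.search(body)


print(needs_state_md_touch("fix: guard NULL in dict_overwrite_existing", ""))
```

Keeping each condition in its own named pattern mirrors the invariant above: rewording the PR template means updating exactly one of these in lockstep.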
SYCL PSNR chroma extension (T3-15(b), 2026-05-09)¶
- Touches:
core/src/feature/sycl/integer_psnr_sycl.cpp (per-extractor chroma device buffers, per-plane SSE accumulators, and a provided_features extension to psnr_y/psnr_cb/psnr_cr), core/src/sycl/AGENTS.md (per-kernel rebase-sensitive invariant for the chroma-on-per-extractor-buffer arrangement), docs/metrics/features.md (footnote ¹ refresh: all three GPU PSNR extractors now emit chroma), docs/adr/0192-gpu-long-tail-batch-3.md References-section status update, changelog.d/added/sycl-psnr-chroma.md.
- Invariant on the chroma upload path: chroma planes ride on per-extractor device buffers populated by host-side staging copies in the combined-graph pre_fn callback, NOT the SYCL state's shared frame buffer (vmaf_sycl_shared_frame_init), which is luma-only by design. Luma stays graph-recorded; chroma SSE kernels run direct in post_fn on the same in-order combined queue. The CUDA twin (PR #520 / commit 7f3d58a5) uses the existing CUDA per-plane picture infrastructure and therefore has no equivalent invariant.
- On upstream sync: Netflix/vmaf upstream has no SYCL backend at all, so conflict probability is zero on psnr_sycl. If an upstream port to the fork's SYCL runtime someday extends vmaf_sycl_shared_frame_init to allocate chroma planes, the PSNR extension can be migrated onto it and the per-extractor chroma buffers retired, but only after a cross-backend gate run confirms bit-exactness against CPU at places=4 (ADR-0214).
- Re-test on rebase:
source /opt/intel/oneapi/setvars.sh
CC=icx CXX=icpx meson setup build-sycl libvmaf \
  -Denable_sycl=true -Denable_cuda=false
ninja -C build-sycl
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build-sycl/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr --backend sycl --device 0
# Expect 0/48 mismatches across psnr_y / psnr_cb / psnr_cr at places=4.
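A places=4 gate, in the unittest assertAlmostEqual sense, treats two scores as equal when their difference rounds to zero at four decimal places. A minimal sketch of such a per-frame comparison, assuming each backend's scores are already loaded as a list of per-frame dicts; this is not the implementation of cross_backend_vif_diff.py, and the numbers are made up.

```python
def places_mismatches(cpu_frames, gpu_frames, features, places=4):
    """List (frame, feature, cpu, gpu) where the scores differ at `places`."""
    if len(cpu_frames) != len(gpu_frames):
        raise ValueError("backends produced different frame counts")
    bad = []
    for idx, (cpu, gpu) in enumerate(zip(cpu_frames, gpu_frames)):
        for name in features:
            if round(cpu[name] - gpu[name], places) != 0:
                bad.append((idx, name, cpu[name], gpu[name]))
    return bad


FEATURES = ("psnr_y", "psnr_cb", "psnr_cr")
cpu = [{"psnr_y": 34.12345, "psnr_cb": 41.5, "psnr_cr": 42.25}]
gpu_ok = [{"psnr_y": 34.123451, "psnr_cb": 41.5, "psnr_cr": 42.25}]
gpu_bad = [{"psnr_y": 34.1236, "psnr_cb": 41.5, "psnr_cr": 42.25}]
print(len(places_mismatches(cpu, gpu_ok, FEATURES)),
      len(places_mismatches(cpu, gpu_bad, FEATURES)))  # 0 1
```

The explicit frame-count check matters because zip() would otherwise silently drop trailing frames from the longer backend.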
Cppcheck nullPointer false-positive in dict.c (2026-05-09)¶
Files pinned:
core/src/dict.c:121 (one-line redundant-condition fix in dict_overwrite_existing).
Why this rebase-note exists: Master CI's Cppcheck (Whole Project) gate started failing on commit 14b5ffba (#537) and blocked every open PR because each PR rebases onto a broken master. The cppcheck finding was likely always present but masked by paths-ignore filtering on the prior workflow shape; PR #530 widened cppcheck's trigger surface and exposed it. Deleted the redundant && val guard since val is already checked at the public entry-point vmaf_dictionary_set (dict.c:137). No behavior change; cppcheck flags the original as "either the val check is redundant or there's a possible null deref" because it can't prove the interprocedural guarantee.
Rebase-sensitivity: zero; the change is local to dict.c. Future upstream sync of this file should keep the fix or re-run cppcheck locally to confirm absence of recurrence.
Aggregator timeout bump (2026-05-09)¶
Files pinned:
.github/workflows/required-aggregator.yml (deadline 30→90 min, job timeout 35→100 min)
Why: 41 PRs in flight on the morning of 2026-05-09 hit Aggregator timeouts while real CI eventually passed. Bumping both deadlines unblocks the train without touching the underlying matrix.
Rebase-sensitivity: zero; the workflow file is wholly fork-local.
ARC self-hosted runner pool — pilot Cppcheck routing (2026-05-09)¶
Files pinned:
.github/workflows/lint-and-format.yml (Cppcheck runs-on: ternary).
Why: opt-in graceful migration; ADR-0359 + docs/development/ci-runners.md document the flip-the-variable recipe when the cluster is degraded.
Rebase-sensitivity: zero; the workflow file is fork-local.
ADR-0338 — macOS Vulkan-via-MoltenVK CI lane (2026-05-09)¶
- Touches:
.github/workflows/libvmaf-build-matrix.yml (fork-local: adds the Build — macOS Vulkan via MoltenVK (advisory) lane, adds continue-on-error plumbing on matrix.experimental && matrix.moltenvk, adds the Install MoltenVK + Vulkan loader/headers (macOS) step, adds the Run Vulkan smoke tests (macOS MoltenVK) step, gates the existing test/cache/tox steps on !matrix.moltenvk), docs/backends/vulkan/moltenvk.md (new fork-local doc), docs/adr/0127-vulkan-compute-backend.md (status-update appendix per the ADR's Proposed status; body untouched), docs/adr/0338-macos-vulkan-via-moltenvk-lane.md (new), docs/adr/_index_fragments/0338-macos-vulkan-via-moltenvk-lane.md plus _order.txt append (new), docs/research/0089-moltenvk-feasibility-on-fork-shaders.md (new), changelog.d/added/macos-vulkan-via-moltenvk-lane.md (new).
- Invariant on the upstream-mirror file: none; libvmaf-build-matrix.yml is fork-local. The new lane's continue-on-error clause MUST stay scoped to matrix.experimental == true && matrix.moltenvk == true so existing experimental: true matrix entries (e.g. the macOS DNN lane) keep their default fail-fast behaviour. VK_ICD_FILENAMES MUST point at /opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json; note the etc/vulkan segment, NOT share/vulkan (the homebrew formula's install layout uses etc/; verified against Formula/m/molten-vk.rb).
- On upstream sync: Netflix upstream has no macOS Vulkan lane and no MoltenVK awareness; nothing to reconcile. If a future MoltenVK release drops support for GL_EXT_shader_atomic_int64 translation, moment.comp will fail on the lane; the fix path is in ADR-0338 §Decision (the lane is continue-on-error, so it does not block PRs): update the known-limitations table in docs/backends/vulkan/moltenvk.md and either pin a working MoltenVK version in the brew install line or rewrite the shader.
- Re-test on rebase:
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/libvmaf-build-matrix.yml'))" && \
echo "YAML parse OK"
# Confirm the lane is still in the matrix:
grep -q "Build — macOS Vulkan via MoltenVK (advisory)" \
.github/workflows/libvmaf-build-matrix.yml
# Confirm the lane is NOT promoted to required-aggregator until one
# green run on master (per ADR-0338):
! grep -q "macOS Vulkan via MoltenVK" \
.github/workflows/required-aggregator.yml
# Confirm the ICD path is the etc/ one, not share/:
grep -q "etc/vulkan/icd.d/MoltenVK_icd.json" \
.github/workflows/libvmaf-build-matrix.yml
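The scoping rule is essentially a two-input truth table. A Python sketch, assuming matrix entries are plain dicts in which an absent key is falsy (as a missing matrix property is in a GitHub Actions expression); the entry names are hypothetical.

```python
def continue_on_error(entry):
    """matrix.experimental && matrix.moltenvk, with missing keys falsy."""
    return bool(entry.get("experimental")) and bool(entry.get("moltenvk"))


def continue_on_error_too_broad(entry):
    """The mis-scoped variant the invariant forbids: experimental alone."""
    return bool(entry.get("experimental"))


matrix = [
    {"name": "macOS Vulkan via MoltenVK (advisory)", "experimental": True, "moltenvk": True},
    {"name": "macOS DNN", "experimental": True},
    {"name": "Ubuntu gcc", "experimental": False},
]
print([e["name"] for e in matrix if continue_on_error(e)])
print([e["name"] for e in matrix if continue_on_error_too_broad(e)])
```

The second list shows what the invariant guards against: with the broader expression, the macOS DNN lane would silently stop blocking PRs.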
ADR-0363 — Mend Renovate replaces Dependabot (2026-05-09)¶
- Touches:
renovate.json (new, repo-root), .github/workflows/renovate.yml (new), .github/dependabot.yml (deleted; renamed to .github/dependabot.yml.disabled), docs/development/dependency-bot.md (new operator playbook), changelog.d/changed/renovate-supersedes-dependabot.md (new), docs/adr/0363-renovate-replaces-dependabot.md (new), docs/adr/_index_fragments/0363-renovate-replaces-dependabot.md (new).
- Invariant: .github/dependabot.yml no longer exists on master; the disabled copy is dependabot.yml.disabled. On upstream sync, if Netflix ever ships their own dependabot.yml, do NOT restore it: the fork intentionally uses Renovate. Merge the upstream file into dependabot.yml.disabled for reference only.
- Upstream interaction: none. Netflix/vmaf upstream has no Renovate config. Conflict risk is zero unless upstream adds renovate.json or restores dependabot.yml.
- Re-test on rebase:
# Verify the workflow SHA-pin is still present and non-floating:
grep -E 'renovatebot/github-action@[a-f0-9]{40}' .github/workflows/renovate.yml
# Verify dependabot.yml is still absent:
test ! -f .github/dependabot.yml && echo "ok: dependabot.yml absent"
# Validate renovate.json syntax (requires Node):
node -e "JSON.parse(require('fs').readFileSync('renovate.json','utf8')); console.log('JSON valid')"
ADR-0355 — Symphony-inspired agent-dispatch infrastructure (2026-05-09)¶
Files added (all fork-introduced, none mirror upstream):
- .claude/workflows/_template.md, .claude/workflows/codeql-alert-sweep.md, .claude/workflows/simd-port.md, .claude/workflows/feature-extractor-port.md.
- scripts/lib/__init__.py, scripts/lib/backlog_tracker.py, scripts/lib/AGENTS.md.
- scripts/ci/agent-eligibility-precheck.py (new row in scripts/ci/AGENTS.md "Rebase-sensitive surfaces" table).
- docs/development/agent-dispatch.md.
Why this rebase-note exists: pure additive, all paths are fork-only (.claude/, scripts/lib/, fork-only docs). Upstream Netflix/vmaf has no .claude/, no scripts/lib/, and no docs/development/agent-dispatch.md, so the merge surface is zero on /sync-upstream. The only coupling is internal, between scripts/ci/agent-eligibility-precheck.py and scripts/lib/backlog_tracker.py (sys.path import). Both files move together; documented in scripts/lib/AGENTS.md and a new row in scripts/ci/AGENTS.md.
Rebase-sensitivity: zero w.r.t. upstream. Internal-only: renaming BacklogItem field names or the BacklogTracker/GitHubTracker public method signatures is a breaking change for the precheck and any future state-audit script; guard via the smoke listed in Research-0091 §"Smoke results" before any rename PR.
Format-coupling note: the BACKLOG.md row regex (scripts/lib/backlog_tracker.py:_ID_PATTERN) is brittle against table-shape edits. If a future BACKLOG.md edit adds a column or renames a status word, the parser will silently mis-classify rows. The smoke parses 101 rows on master at 2026-05-09; expect ≥ 100 after any structural edit.
0350 — psnr_hvs AVX-512 ceiling re-bench (ADR-0350, T3-9 (a))¶
- docs/adr/0350-psnr-hvs-avx512-ceiling.md — closure ADR.
- docs/adr/0160-psnr-hvs-neon-bitexact.md — appended ### Status update 2026-05-09 appendix.
- docs/research/0091-psnr-hvs-avx512-bench-2026-05-09.md — empirical companion (cycle share, Amdahl ceiling, reproducer).
Why this rebase-note exists: T3-9 (a) closes as AVX2 ceiling. The result has zero rebase-sensitivity by itself (no engine code changes), but the bit-exactness invariants that lock it to a ceiling do. The 78.42 % scalar tail in calc_psnrhvs_avx2/calc_psnrhvs_neon is locked by ADR-0138 / ADR-0139's "per-lane-scalar float reduction" rule (carried by ADR-0159 / ADR-0160). If a future upstream sync of core/src/feature/third_party/xiph/psnr_hvs.c (the Xiph/Daala DCT) changes the per-block summation tree (e.g. partial folding, re-ordered means, vectorised mask reductions), the AVX2 + NEON TUs in core/src/feature/x86/psnr_hvs_avx2.c and core/src/feature/arm64/psnr_hvs_neon.c MUST be re-audited against the new scalar reference, and the ceiling argument in ADR-0350 must be re-run (because the 78 / 15 cycle-share split would shift).
Rebase-sensitivity: low for the ceiling decision itself (an empirical re-bench on a current host is cheap: 30 seconds via the reproducer in Research-0091 §7); high for the underlying bit-exactness invariants the decision rests on (Netflix golden trips on ≥ 5.5e-5 drift per ADR-0160 §Context). The ADR-0350 §Verification reproducer is the gate; re-run it if the cycle share shifts, the Netflix normal-pair fixture changes, or a new host class (e.g. wide-issue Granite Rapids) goes into CI.
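An illustrative Amdahl bound, assuming (as a simplification, not the ADR's exact model) that the 78.42 % scalar tail cannot be accelerated at all and only the remaining 21.58 % of cycles scales with vector width; Research-0091 holds the measured numbers.

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only (1 - serial_fraction) runs parallel_speedup times faster."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)


SCALAR_TAIL = 0.7842  # bit-exactness-locked share of calc_psnrhvs_avx2 cycles

avx512_2x = amdahl_speedup(SCALAR_TAIL, 2.0)         # vectorised part twice as fast
ceiling = amdahl_speedup(SCALAR_TAIL, float("inf"))  # vectorised part costs nothing
print(f"{avx512_2x:.3f} {ceiling:.3f}")  # 1.121 1.275
```

Even an ideal port could remove at most about 22 % of the function's cycles (1 - 0.7842), which is why a shift in the cycle share is the trigger for re-running the ceiling argument.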
0320 — FFmpeg n8.1 → n8.1.1 base bump (2026-05-09)¶
- Touches:
ffmpeg-patches/series.txt (header comment), ffmpeg-patches/README.md (apply / verify / smoke sections), ffmpeg-patches/test/build-and-run.sh (FFMPEG_SHA default), scripts/ci/ffmpeg-patches-check.sh (header comment; FFMPEG_BRANCH env default unchanged at release/8.1 since the branch tracks point releases), docs/development/automated-rule-enforcement.md (gate description). The 9 .patch files themselves are unchanged: every patch in the series applied cleanly, cumulatively, against pristine n8.1.1 via git am --3way.
- Upstream source: FFmpeg upstream point release n8.1.1 (commit 239f2c7 "Bump micro for 8.1.1"), bug-fix-only on top of n8.1, with no API or AVOption breakage that the patch stack consumes.
- Invariant: the patch stack continues to apply against the current tip of FFmpeg's release/8.1 branch. Per ADR-0118 and ADR-0186 §FFmpeg patch coupling, the verification gate is cumulative git am --3way against a pristine checkout, not per-patch standalone apply. The scripts/ci/ffmpeg-patches-check.sh local gate uses git apply (no commit) but accumulates state in the same way.
- On upstream sync: no action required. If a future FFmpeg point release (n8.1.2 or n8.2) lands new hunks that conflict with one of the patches, regenerate the affected patches via git format-patch on the resolved state, bump the references in the five files listed under "Touches", and add a fresh rebase-notes entry citing the conflict file(s).
- Re-test on rebase:
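# Clone by absolute path (no cd) so the relative ffmpeg-patches/ paths below resolve from the repo root.
rm -rf /tmp/ffmpeg-n811 && \
git clone --depth 1 --branch n8.1.1 \
https://git.ffmpeg.org/ffmpeg.git /tmp/ffmpeg-n811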
git -C /tmp/ffmpeg-n811 config user.email agent@local
git -C /tmp/ffmpeg-n811 config user.name agent
# Match every NNNN-*.patch in order (a 000* glob stops at 0009).
for p in ffmpeg-patches/[0-9][0-9][0-9][0-9]-*.patch; do
git -C /tmp/ffmpeg-n811 am --3way "$p" || break
done
bash scripts/ci/ffmpeg-patches-check.sh
ADR-0281 follow-up — QSV install-matrix discoverability backfill (2026-05-08)¶
- Touches:
docs/getting-started/install/{arch,fedora,ubuntu,macos,windows}.md (new ## Intel QSV section per page), docs/adr/0281-vmaf-tune-qsv-adapters.md (status-update appendix per ADR-0028), changelog.d/changed/qsv-install-matrix-docs.md (new fragment). No code, no engine, no upstream-shared C / Python source touched. Pure documentation backfill closing the SYCL-audit research-0086 Topic C gap (issue #464).
- Invariant: each per-OS QSV section pins the package names against verified upstream URLs with a Verified 2026-05-08 access date. The hardware-generation matrix is sourced from the public Wikipedia "Intel Quick Sync Video — Hardware decoding and encoding" table; if Intel revises which generation supports AV1 encode (e.g. backports the encoder to Lunar Lake / Meteor Lake silicon currently absent from the table), the matrix must move in lockstep across the four pages that carry it verbatim (Arch / Fedora / Ubuntu / Windows). The macOS page deliberately omits the matrix (QSV unsupported on macOS).
- On upstream sync: no action required; Netflix/vmaf upstream does not ship per-OS install pages under docs/getting-started/install/, and that tree is fork-only.
- Re-test on rebase:
# Lint the install pages (markdownlint via pre-commit):
pre-commit run --files docs/getting-started/install/*.md
# Verify each page (except alpine + macos) still carries the matrix:
for f in arch fedora ubuntu windows; do
  grep -q 'Arc Battlemage' "docs/getting-started/install/${f}.md" || echo "MISSING: ${f}"
done
# Confirm the macOS page documents QSV as unsupported:
grep -q 'Intel QSV. is unsupported on macOS' docs/getting-started/install/macos.md
0333 — vmaf-tune Phase F multi-pass encoding (ADR-0333)¶
Touches:
- tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (CodecAdapter Protocol gains supports_two_pass: bool + two_pass_args(...))
- tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (overrides both)
- tools/vmaf-tune/src/vmaftune/encode.py (EncodeRequest gains pass_number/stats_path; build_ffmpeg_command adds the 2-pass argv splice + pass-1 null-muxer redirect; new run_two_pass_encode)
- tools/vmaf-tune/src/vmaftune/corpus.py (CorpusOptions.two_pass, routing in iter_rows)
- tools/vmaf-tune/src/vmaftune/cli.py (--two-pass flag on corpus/recommend subparsers)
Invariant: 2-pass encoding routes through the codec adapter via supports_two_pass + two_pass_args(pass_number, stats_path). The encode driver never branches on codec name. Adapters with supports_two_pass = False are handled without error (single-pass fallback with a stderr warning); the seam is open for sibling codec adapters (libx264, libsvtav1, libvvenc, libaom-av1) to opt in by overriding the two methods on their adapter file alone. This is the fork-local extension to the ADR-0237 Phase A multi-codec contract; upstream Netflix/vmaf has no equivalent and does not own this code path.
Re-test:
(Optional, requires ffmpeg + libx265 in the runner's PATH:)
VMAF_TUNE_INTEGRATION=1 python -m pytest \
tests/test_codec_adapter_x265_two_pass.py::test_real_x265_two_pass_smoke -q
Rebase-sensitivity: zero from upstream — tools/vmaf-tune/ is fork-local. The only concern is the codec_adapters Protocol shape: a future upstream commit that adds a sibling codec adapter SHOULD inherit the supports_two_pass = False default and either explicitly opt in or leave the flag off. Downstream sibling-codec PRs in this fork should follow the ADR-0288 / ADR-0333 pattern: one adapter file, override the two methods, add a test file mirroring test_codec_adapter_x265_two_pass.py.
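The adapter seam can be sketched with a typing.Protocol. Everything here is hypothetical and simplified: the class and function names are illustrative rather than the vmaftune API, and the x265 argv (pass/stats via -x265-params) is one common way to drive libx265 two-pass through ffmpeg, not necessarily what x265.py emits.

```python
import sys
from dataclasses import dataclass
from typing import List, Optional, Protocol


class CodecAdapter(Protocol):
    encoder: str
    supports_two_pass: bool

    def two_pass_args(self, pass_number: int, stats_path: str) -> List[str]: ...


@dataclass
class X265Sketch:
    encoder: str = "libx265"
    supports_two_pass: bool = True

    def two_pass_args(self, pass_number: int, stats_path: str) -> List[str]:
        # x265 takes pass/stats through -x265-params (illustrative argv)
        return ["-x265-params", f"pass={pass_number}:stats={stats_path}"]


@dataclass
class SinglePassSketch:
    encoder: str = "libsvtav1"
    supports_two_pass: bool = False  # has not opted in

    def two_pass_args(self, pass_number: int, stats_path: str) -> List[str]:
        return []


def build_argv(adapter: CodecAdapter, src: str, dst: str,
               pass_number: Optional[int] = None, stats_path: str = "") -> List[str]:
    argv = ["ffmpeg", "-y", "-i", src, "-c:v", adapter.encoder]
    if pass_number is not None:
        argv += adapter.two_pass_args(pass_number, stats_path)
        if pass_number == 1:
            return argv + ["-f", "null", "-"]  # pass 1 only writes the stats file
    return argv + [dst]


def two_pass_commands(adapter: CodecAdapter, src: str, dst: str,
                      stats_path: str) -> List[List[str]]:
    """No codec-name branching: the adapter's flag alone decides the route."""
    if not adapter.supports_two_pass:
        print(f"warning: {adapter.encoder} lacks 2-pass support; encoding single-pass",
              file=sys.stderr)
        return [build_argv(adapter, src, dst)]
    return [build_argv(adapter, src, dst, p, stats_path) for p in (1, 2)]


cmds = two_pass_commands(X265Sketch(), "in.y4m", "out.mkv", "/tmp/x265.stats")
```

The design point is that two_pass_commands only reads supports_two_pass; adding a sibling codec means adding one adapter class, never a new branch in the driver.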
ADR-0360 — CAMBI CUDA port (T3-15a, 2026-05-09)¶
Files pinned:
- core/src/feature/cuda/integer_cambi_cuda.c (new)
- core/src/feature/cuda/integer_cambi_cuda.h (new)
- core/src/feature/cuda/integer_cambi/cambi_score.cu (new)
- core/src/feature/feature_extractor.c (added vmaf_fex_cambi_cuda to list)
- core/src/meson.build (added cambi_score to cuda_cu_sources, added integer_cambi_cuda.c to CUDA feature sources)
Why: the CUDA twin of vmaf_fex_cambi (Strategy II hybrid — three GPU kernels for the embarrassingly parallel stages; calculate_c_values + topK on CPU). Registers vmaf_fex_cambi_cuda under an #if HAVE_CUDA guard.
Rebase-sensitivity: low. The three new files are wholly fork-local and will not conflict. The two upstream-shared files have small, self-contained hunks:
- feature_extractor.c: the extern vmaf_fex_cambi_cuda declaration and the &vmaf_fex_cambi_cuda array entry are inside a #if HAVE_CUDA block. Upstream's additions to this file (new feature extractors, new dispatch flags) will not conflict unless Netflix adds their own CUDA twin for CAMBI (unlikely — they don't ship a CUDA backend).
- meson.build: the cambi_score entry in the cuda_cu_sources dict and the integer_cambi_cuda.c line in the CUDA sources list. Any upstream changes to meson.build that restructure the cuda_cu_sources dict would require a manual merge; the dict entries are sorted alphabetically by key, so cambi_score lands between adm_score and motion_score.
If upstream adds cambi_cuda themselves: drop the fork copy and check for API divergence. Strategy II hybrid is the natural choice; the upstream implementation may differ if they choose Strategy III (fully-on-GPU calculate_c_values).
cambi_internal.h dependency: integer_cambi_cuda.c includes core/src/feature/cambi_internal.h (fork-added trampoline exposing cambi.c's static helpers). If upstream significantly refactors cambi.c (renames vmaf_cambi_preprocessing, vmaf_cambi_calculate_c_values, etc.), cambi_internal.h must be updated alongside. This is the same dependency the Vulkan twin (cambi_vulkan.c) has — see ADR-0210's rebase note for the full list of exposed functions.
Vulkan submit-pool PR-B: six secondary kernels (2026-05-09, ADR-0353)¶
Files changed:
- core/src/feature/vulkan/ssim_vulkan.c
- core/src/feature/vulkan/ciede_vulkan.c
- core/src/feature/vulkan/ms_ssim_vulkan.c
- core/src/feature/vulkan/motion_v2_vulkan.c
- core/src/feature/vulkan/float_psnr_vulkan.c
- core/src/feature/vulkan/float_motion_vulkan.c
- core/src/feature/vulkan/AGENTS.md
- docs/adr/0353-vulkan-submit-pool-pr-b-six-kernels.md
Why this rebase-note exists: six Vulkan host-glue TUs were migrated from per-frame command-buffer and descriptor-set allocation to the VmafVulkanKernelSubmitPool abstraction (ADR-0256). Any Netflix upstream sync that touches these same files (unlikely — they are fork-local) must preserve the VmafVulkanKernelSubmitPool fields in the state struct and the pool-destroy-before-pipeline-destroy ordering in close_fex().
Rebase-sensitivity: low. All six files are entirely fork-local; Netflix upstream does not have a Vulkan backend. The submit-pool API is defined in core/src/vulkan/kernel.h (also fork-local). No public header or C-API surface was changed; the FFmpeg patch series is unaffected.
Key invariant to preserve on rebase: vmaf_vulkan_kernel_submit_pool_destroy MUST be called before vmaf_vulkan_kernel_pipeline_destroy in every migrated kernel's close_fex(). See core/src/feature/vulkan/AGENTS.md §"Submit-pool ordering invariant".
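A rebase reviewer can spot-check this ordering mechanically. The sketch below is a hypothetical helper, not part of the fork's tooling: it takes the source text of one close_fex() body and checks that the first vmaf_vulkan_kernel_submit_pool_destroy( call appears before any vmaf_vulkan_kernel_pipeline_destroy( call. It is a plain text scan, so it assumes the calls are not wrapped in macros and are not sitting inside comments.

```python
import re

POOL_DESTROY = "vmaf_vulkan_kernel_submit_pool_destroy("
PIPELINE_DESTROY = "vmaf_vulkan_kernel_pipeline_destroy("


def close_fex_order_ok(body: str) -> bool:
    """Return True if the submit pool is destroyed before any pipeline.

    `body` is the source text of one close_fex() function. Hypothetical
    lint: a pipeline-destroy call with no pool-destroy call at all is
    treated as a violation.
    """
    pool_positions = [m.start() for m in re.finditer(re.escape(POOL_DESTROY), body)]
    pipe_positions = [m.start() for m in re.finditer(re.escape(PIPELINE_DESTROY), body)]
    if not pipe_positions:
        return True  # nothing to order against
    if not pool_positions:
        return False  # pipeline torn down but the pool is never destroyed
    # "MUST precede every pipeline_destroy" == first pool call before first pipeline call.
    return min(pool_positions) < min(pipe_positions)
```

Running it over each migrated TU's close_fex() after a conflict resolution catches an accidental swap of the two calls.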
0354 — Vulkan submit-pool PR-C: submit_pool_destroy-before-pipeline ordering¶
- Touches:
core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vulkan/float_ansnr_vulkan.c, core/src/feature/vulkan/moment_vulkan.c.
- Invariant: In every migrated extractor, vmaf_vulkan_kernel_submit_pool_destroy() MUST precede every vmaf_vulkan_kernel_pipeline_destroy() call in close_fex(). Reversing the order frees the pool's command buffers after the pipeline's command pool has been destroyed, which is undefined behaviour per Vulkan spec §6.2.
- Re-test:
meson test -C build --suite=vulkan passes, and scripts/ci/cross_backend_vif_diff.py shows places=4 for all four extractors on all three target devices (RTX 4090, Arc A380, RADV iGPU).
0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0291)¶
- Touches:
core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/psnr_vulkan.c (all fork-local Vulkan kernels; no upstream C paths touched), changelog.d/changed/vulkan-submit-pool-pr-a-adm-motion-psnr.md, docs/adr/0291-vulkan-submit-pool-pr-a-adm-motion-psnr.md.
- Invariant: Each migrated TU adds VmafVulkanKernelSubmitPool sub_pool and pre-allocated VkDescriptorSet field(s) to its state struct. The pool must be destroyed (vmaf_vulkan_kernel_submit_pool_destroy) before vmaf_vulkan_kernel_pipeline_destroy in close_fex(); reversing the order would destroy the descriptor pool while the submit pool still holds live command buffer + fence references. Descriptor sets allocated via vmaf_vulkan_kernel_descriptor_sets_alloc are freed implicitly by the descriptor pool tear-down; do NOT call vkFreeDescriptorSets on them in close_fex(). For motion_vulkan, the pre-allocated set is rebound once per frame via vkUpdateDescriptorSets because the blur ping-pong changes which blur[] slot is "current"; for adm_vulkan and psnr_vulkan the sets are stable after init() and require no per-frame update.
- Upstream interaction: none. All three files are fork-local Vulkan kernel TUs not present in Netflix/vmaf upstream.
- On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths. The Vulkan backend is entirely fork-introduced.
- Re-test on rebase:
meson test -C build --suite=fast
# Cross-backend parity gate (places=4):
python python/test/cross_backend_diff.py \
--features adm motion psnr \
--backend vulkan cpu \
--places 4 \
--yuv testdata/yuv/src01_hrc00_576x324.yuv \
testdata/yuv/src01_hrc01_576x324.yuv
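Why motion_vulkan needs a per-frame descriptor update while adm_vulkan and psnr_vulkan do not can be seen with a toy ping-pong. This sketch assumes two blur[] slots that alternate by frame parity; the real kernel's slot count and indexing are not spelled out in this entry. blur_slots and needs_rebind are illustrative names only.

```python
def blur_slots(frame_index: int, n_slots: int = 2) -> tuple[int, int]:
    """Return (current, previous) blur[] slot indices for a frame.

    Hypothetical ping-pong: the slot written this frame rotates, so the
    descriptor binding that must point at "current" changes over time.
    """
    current = frame_index % n_slots
    previous = (frame_index - 1) % n_slots
    return current, previous


def needs_rebind(frame_index: int) -> bool:
    """A set written once at init() only matches the frame-0 slot layout."""
    return blur_slots(frame_index) != blur_slots(0)
```

Under this assumption a set written once at init() is wrong on every odd frame, so rebinding unconditionally once per frame is the simplest policy that is always correct.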
ADR-0350 — FFmpeg libvmaf filter CUDA backend selector (0010 patch)¶
Patch: ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch.
libavfilter/vf_libvmaf.c — adds the cuda AVOption, a state field, and init / cleanup / picture-pool wiring under CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER.
configure — adds --enable-libvmaf-cuda (EXTERNAL_LIBRARY_LIST entry + help text), promotes libvmaf_cuda from blanket autodetect to a gated enabled libvmaf_cuda && require_pkg_config + check, and preserves the enabled libvmaf && check_pkg_config libvmaf_cuda in-filter probe so the new selector still works without the explicit flag when libvmaf ships CUDA.
Why this rebase-note exists: Patch 0010 extends the SYCL (0003) / Vulkan (0004) per-context backend selectors to CUDA on the regular libvmaf filter. The patch coexists with the upstream dedicated libvmaf_cuda filter (CONFIG_LIBVMAF_CUDA_FILTER) by gating its struct field and code paths on !CONFIG_LIBVMAF_CUDA_FILTER; the dedicated filter keeps owning its own cu_state field. CLAUDE.md §12 r14 makes the patch update mandatory because the change touches a filter consumer of the vmaf_cuda_state_init / _import_state / _state_free / _preallocate_pictures / _fetch_preallocated_picture C-API surface in libvmaf_cuda.h.
Rebase-sensitivity: low. The patch's vf_libvmaf.c hunks are context-anchored on the SYCL/Vulkan selector blocks; if upstream FFmpeg renames CONFIG_LIBVMAF_CUDA_FILTER or moves the libvmaf_cuda.h include, the include guard at the top of the file needs the corresponding update. The configure hunks are context-anchored on the existing --enable-libvmaf-sycl / --enable-libvmaf-vulkan lines; those have proven stable across n8.0 → n8.1 → n8.1.1, so drift risk is low. If VmafCudaConfiguration ever grows a device_index field upstream, swap the cuda boolean for an int cuda_device mirroring SYCL's shape (separate ADR + patch refresh).
Verification gate: cumulative git am --3way replay of ffmpeg-patches/000{1..9}-*.patch + 0010-* against pristine FFmpeg n8.1.1 PASS (2026-05-09). Build of libavfilter/vf_libvmaf.o PASS under both CONFIG_LIBVMAF_CUDA=0 (selector errors at filter-init time per the #else branch) and CONFIG_LIBVMAF_CUDA=1 && !CONFIG_LIBVMAF_CUDA_FILTER (selector active, picture-pool wiring compiles).
0320 — Vulkan instance / VMA apiVersion bump to 1.4 (Step B)¶
- Touches:
core/src/vulkan/common.c, core/src/vulkan/vma_impl.cpp, core/src/vulkan/AGENTS.md.
- Invariant: the four apiVersion sites (lines 54, 264, 374 of common.c; line 22 of vma_impl.cpp) request Vulkan 1.4, not 1.3. Together with the Step-A precise decorations in vif.comp / ciede.comp (PR #346) and the Phase-3 cross-subgroup release-acquire fix (PR #511), this gates the cross-backend places=4 contract on Arc + RADV. NVIDIA closure depends on Phase 3c (PR #512; block-on-merge until that lands). Netflix upstream does not carry a VMA dependency or a Vulkan backend; no upstream merge conflict is expected on these files.
- Re-test on rebase:
meson setup build -Denable_vulkan=enabled -Denable_cuda=false \
-Denable_sycl=false --buildtype=release
ninja -C build
for D in 0 1 2; do
python3 scripts/ci/cross_backend_parity_gate.py \
--vmaf-binary build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--backends cpu vulkan --vulkan-device "$D" \
--features vif ciede adm motion psnr
done
# All 0/N mismatches at places=4 once Phase 3c (PR #512) has landed.
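The loop above walks raw device indices. Because vkEnumeratePhysicalDevices order depends on host policy (ICD registration order, the Mesa device-select layer, VK_LOADER_* variables), a lane that must hit one specific vendor is safer resolving the index from a deviceName substring. A minimal sketch; how the name list is collected (for example by parsing vulkaninfo --summary) is an assumption and is left out, and select_vulkan_device is a hypothetical helper.

```python
def select_vulkan_device(devices: list[str], needle: str) -> int:
    """Return the enumeration index of the single device whose name contains `needle`.

    `devices` holds deviceName strings in whatever order this host's loader
    enumerated them. Zero or multiple matches raise ValueError, so a lane
    never silently runs against the wrong vendor.
    """
    matches = [i for i, name in enumerate(devices) if needle.lower() in name.lower()]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one device matching {needle!r}, got {matches}")
    return matches[0]
```

Failing loudly on an ambiguous needle is deliberate: a substring such as "Graphics" can match several adapters.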
ADR-0332 v2 runtime (T5-2c) — Embedded MCP server UDS + real compute_vmaf (2026-05-09)¶
- Touches:
core/src/mcp/{mcp.c,dispatcher.c,mcp_internal.h,meson.build,compute_vmaf.c,transport_uds.c}, core/test/test_mcp_smoke.c. All paths are fork-local. No new third-party vendor drop in v2 — mongoose vendoring stays deferred to v3 with the SSE transport.
- Invariant: same as ADR-0209 v1 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only function bodies flipped: vmaf_mcp_start_uds from -ENOSYS to a working AF_UNIX listener; compute_vmaf from a {"status":"deferred_to_v2"} placeholder to a real vmaf_score_pooled binding). Per ADR-0128 § operational guardrails the UDS socket file is created mode 0700; that chmod happens in vmaf_mcp_start_uds after bind and is a load-bearing security invariant, so do NOT relax it on rebase. compute_vmaf runs on a per-call ephemeral VmafContext so the host's main scoring run is unperturbed; do NOT rewire it to reuse server->ctx, because vmaf_score_pooled commits the model destructively to the context.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. If upstream adds one, expect a port-only sync, since names will collide.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
-Denable_mcp=true -Denable_mcp_stdio=true \
-Denable_mcp_uds=true
ninja -C build && meson test -C build test_mcp_smoke -v
# Real-score smoke (single 576x324 pair):
build/test/test_mcp_smoke 2>&1 | tail -3 # expects "16 tests run, 16 passed"
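The 0700 guardrail is easy to regression-test from outside the C code. The Python sketch below only mirrors the order of operations (bind, chmod to 0700, then listen) and shows how a test could read the mode back; the socket path and helper names are hypothetical. Doing the chmod before listen() is a choice made in this sketch: a connect() to a bound but not yet listening Unix stream socket is refused, so nobody else can connect while the file still carries umask-derived permissions.

```python
import os
import socket
import stat


def bind_private_uds(path: str) -> socket.socket:
    """Bind an AF_UNIX listener whose socket file only the owner can use."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        # Owner-only access (0700), applied right after bind and before
        # the listener starts accepting connections.
        os.chmod(path, 0o700)
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def socket_mode(path: str) -> int:
    """Return the permission bits of the socket file."""
    return stat.S_IMODE(os.stat(path).st_mode)
```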
ADR-0332 v3 runtime (T5-2d) — Embedded MCP server SSE transport (2026-05-09)¶
- Touches:
core/src/mcp/{mcp.c,mcp_internal.h,meson.build,transport_sse.c}, core/meson_options.txt, core/test/test_mcp_smoke.c, docs/mcp/embedded.md, docs/adr/0332-mcp-runtime-v2.md (status-update appendix). All paths are fork-local. No third-party vendor drop in v3 — the originally planned mongoose vendor drop was reversed because cesanta/mongoose 7.18 is GPL-2.0-only OR commercial, incompatible with the fork's BSD-3-Clause-Plus-Patent license (verified against upstream LICENSE 2026-05-09). The SSE transport is plain POSIX sockets in fork-owned C (~500 LOC).
- Invariant: same as ADR-0209 / ADR-0332 v2 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only vmaf_mcp_start_sse's body flipped from -ENOSYS to a working AF_INET listener). The SSE listener binds INADDR_LOOPBACK only; do NOT switch to INADDR_ANY without a separate ADR + auth design (v3 intentionally ships without CORS/Bearer/per-session auth on the assumption of a same-host trust boundary). The SSE stop path calls shutdown(SHUT_RDWR) before close(): a plain close() of an AF_INET listening fd from another thread does NOT unblock accept() on Linux, so do NOT remove the shutdown call. enable_mcp_sse is now a feature option (default auto), no longer a boolean defaulting to false.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. Do NOT re-introduce mongoose (or any GPL-licensed HTTP library) on a future rebase without first amending CLAUDE §1 and adding a separate license-compatibility ADR.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
-Denable_mcp=true -Denable_mcp_stdio=true \
-Denable_mcp_uds=true \
-Denable_mcp_sse=enabled
ninja -C build && meson test -C build test_mcp_smoke -v
build/test/test_mcp_smoke 2>&1 | tail -3 # expects "17 tests run, 17 passed"
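Both SSE invariants (loopback-only bind, shutdown() before close()) can be illustrated with Python's socket module, which wraps the same POSIX calls. This is a sketch, not the fork's transport_sse.c: it binds 127.0.0.1 on an ephemeral port, runs accept() on a worker thread, and stops it the way the entry describes. The wake-up behaviour it relies on is Linux-specific; other platforms may reject shutdown() on a listening socket.

```python
import socket
import threading


def start_loopback_listener() -> socket.socket:
    """Bind to 127.0.0.1 only (never 0.0.0.0) on an ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(8)
    return sock


def accept_loop(sock: socket.socket, events: list) -> None:
    """Serve until accept() fails, then record why the loop ended."""
    while True:
        try:
            conn, _ = sock.accept()
        except OSError as exc:
            events.append(exc)
            return
        conn.close()


def serve_in_background(sock: socket.socket, events: list) -> threading.Thread:
    """Run accept_loop on a daemon thread, as a server stop path would see it."""
    thread = threading.Thread(target=accept_loop, args=(sock, events), daemon=True)
    thread.start()
    return thread


def stop_listener(sock: socket.socket) -> None:
    """Wake a thread blocked in accept(), then release the fd.

    On Linux, close() alone from another thread does not unblock accept();
    shutdown(SHUT_RDWR) on the listening socket makes accept() fail.
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # some platforms reject shutdown() on a listening socket
    sock.close()
```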
Status update 2026-05-09 — placeholder-ref hardening¶
- Additional touches: same set as the 2026-05-08 ADR-0334 entry, no new files. The hardening adds a git diff -U0 ... -- docs/state.md call inside scripts/ci/state-md-touch-check.sh (case 4a), plus 10 additional fixture cases in scripts/ci/test-state-md-touch-check.sh.
- New invariant: inserted lines in docs/state.md (lines starting with +, excluding the +++ b/... header) must not contain "this PR", "this commit", a bare "TBD", "<PR>", or "#NNN". The canonical accepted forms are PR #N and commit `<sha>`. The placeholder vocabulary is coupled to PR #541's audit findings; reword it in lockstep with the ADR-0334 status-update appendix if the fork's row template changes.
- Re-test on rebase: the same bash scripts/ci/test-state-md-touch-check.sh run as the 2026-05-08 entry; the harness now reports 18/18 passed (was 8/8 passed).
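A Python approximation of the case-4a check can help when reasoning about new fixture cases. The regex below is an assumption built from the vocabulary listed above (case-insensitive, word-bounded); the real shell script's patterns may differ in detail, and placeholder_hits is a hypothetical name.

```python
import re

# Approximation of the forbidden placeholder vocabulary for docs/state.md rows.
PLACEHOLDER = re.compile(r"\bthis PR\b|\bthis commit\b|\bTBD\b|<PR>|#NNN", re.IGNORECASE)


def placeholder_hits(diff_text: str) -> list[str]:
    """Return inserted lines of a unified diff that contain a placeholder ref.

    Only '+' lines count; the '+++ b/...' file header, removed lines, and
    context lines are skipped.
    """
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if PLACEHOLDER.search(line[1:]):
            hits.append(line)
    return hits
```

Accepted forms such as "PR #541" or "commit `abc1234`" contain none of the patterns, so they pass.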
0347 — Sanitizer matrix test-set scope (ADR-0347)¶
- Touches:
.github/workflows/tests-and-quality-gates.yml job sanitizers (build + test step), core/test/meson.build (no edits — the absence of any suite: 'unit' tag is the upstream state we now work with rather than against).
- Invariant: the sanitizer job runs the full C unit-test set per sanitizer, with a per-sanitizer deselect list driven by a case block on ${{ matrix.sanitizer }}. The deselect lists are load-bearing: each entry corresponds to a real bug tracked in docs/state.md. Under UBSan the build adds -Dc_args=-fno-sanitize=function -Dcpp_args=-fno-sanitize=function to suppress the K&R-prototype harness UB; the meson case branch must keep this build flag in sync with the test deselect entries. An upstream rebase that adds new test files via core/test/meson.build inherits full sanitizer coverage automatically (the workflow enumerates tests via meson test --list).
- On upstream sync: if upstream Netflix lands a suite: 'unit' tagging convention, the workflow is robust to it (we already enumerate from meson test --list, not from --suite=unit). If upstream rewrites the harness to declare static char *test_X(void) with a (void) parameter list, the -fno-sanitize=function flag becomes redundant; leave it in place (zero cost) until a deliberate cleanup PR reverts the suppression. If upstream lands a fix for any of the surfaced defects (SVMModelParser validation, feature_collector metadata leak, integer_adm::div_lookup race, framesync mutex mismatch), drop the corresponding deselect row from the workflow's case block in the same PR that pulls the upstream fix.
- Re-test on rebase:
cd libvmaf
for SAN in address undefined thread; do
  EXTRA=()
  [ "$SAN" = undefined ] && EXTRA=(
    "-Dc_args=-fno-sanitize=function"
    "-Dcpp_args=-fno-sanitize=function"
  )
  rm -rf "build-$SAN"
  CC=clang CXX=clang++ LDFLAGS=-fuse-ld=lld \
    meson setup "build-$SAN" -Db_sanitize="$SAN" \
      -Denable_cuda=false -Denable_sycl=false --buildtype=debug \
      -Db_lto=false -Db_lundef=false "${EXTRA[@]}"
  meson compile -C "build-$SAN"
  case "$SAN" in
    address)   EXCLUDE='test_model$|test_predict$|test_float_ms_ssim_min_dim$' ;;
    undefined) EXCLUDE='test_model$' ;;
    thread)    EXCLUDE='test_model$|test_pic_preallocation$|test_framesync$' ;;
  esac
  TESTS=$(meson test -C "build-$SAN" --list \
    | grep '^libvmaf:' \
    | grep -vE "$EXCLUDE" \
    | sed 's/^libvmaf://')
  meson test -C "build-$SAN" --print-errorlogs $TESTS
done
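The same deselect logic expressed in Python is useful, for instance, to confirm that a newly added upstream test name is not swallowed by one of the regexes. It is a sketch that assumes the libvmaf:<name> listing format the shell pipeline greps for, and it mirrors grep -E's unanchored search semantics; select_tests is a hypothetical name.

```python
import re

# Per-sanitizer deselect regexes, copied from the case block in the re-test script.
EXCLUDE = {
    "address": r"test_model$|test_predict$|test_float_ms_ssim_min_dim$",
    "undefined": r"test_model$",
    "thread": r"test_model$|test_pic_preallocation$|test_framesync$",
}


def select_tests(listing: str, sanitizer: str) -> list[str]:
    """Mirror the grep/sed pipeline: keep libvmaf: tests not in the deselect list."""
    exclude = re.compile(EXCLUDE[sanitizer])
    selected = []
    for line in listing.splitlines():
        if not line.startswith("libvmaf:"):
            continue  # grep '^libvmaf:'
        if exclude.search(line):
            continue  # grep -vE "$EXCLUDE"
        selected.append(line[len("libvmaf:"):])  # sed 's/^libvmaf://'
    return selected
```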
CodeQL bulk mechanical sweep — Python tree (2026-05-09)¶
- Why this matters on rebase: no rebase impact. The diff lives entirely in python/vmaf/ and one fork-local helper (core/src/vulkan/spv_embed.py). None of the touched Python modules have been changed by Netflix upstream in over four years; the closest churn is unrelated additions to python/vmaf/script/run_*.py driver flags. A future /sync-upstream will land on a clean tree.
- What changed: dead imports removed; exit() → sys.exit() in seven CLI driver scripts; open(...) → with open(...) in python/vmaf/tools/decorator.py and core/src/vulkan/spv_embed.py; typed except KeyError: pass bodies got an explanatory one-line comment to satisfy py/empty-except; pass removed where it was a no-op tail statement; one commented-out debug block deleted from tools/misc.py.
- Re-test on rebase: python3 -c "import ast; [ast.parse(open(f).read()) for f in (...)]" over the touched files; ruff check over the same set must produce no NEW errors versus the master baseline.
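The one-liner above can be expanded into a small checker that uses the same idioms the sweep introduced: a with open(...) block and an explicit exit status suitable for sys.exit(). unparsable and main are hypothetical names; ruff still runs separately.

```python
import ast
import sys


def unparsable(paths: list[str]) -> list[str]:
    """Return the paths whose contents fail to parse as Python."""
    failures = []
    for path in paths:
        # The context manager closes the file even if reading raises.
        with open(path, encoding="utf-8") as handle:
            source = handle.read()
        try:
            ast.parse(source, filename=path)
        except SyntaxError:
            failures.append(path)
    return failures


def main(argv: list[str]) -> int:
    """CLI entry point; a wrapper script would call sys.exit(main(sys.argv[1:]))."""
    bad = unparsable(argv)
    for path in bad:
        print(f"syntax error: {path}", file=sys.stderr)
    return 1 if bad else 0
```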
0345 — cambi × {CUDA, SYCL, HIP} GPU port planning (ADR-0345, docs-only)¶
- Touches:
docs/research/0091-cambi-gpu-port-planning-2026-05-09.md (new), docs/adr/0345-cambi-gpu-port-strategy.md (new), docs/adr/_index_fragments/0345-cambi-gpu-port-strategy.md (new fragment), docs/adr/_index_fragments/_order.txt (append slot), changelog.d/changed/cambi-gpu-planning-digest.md (new). No code. Companion to the per-port PRs that follow per the digest's §6 ordered plan (CUDA → SYCL → HIP).
- Upstream source: none — fork-local planning artefact. Netflix/vmaf upstream has no CUDA / SYCL / HIP cambi twin and no plans to add one on those backends.
- Invariant: the planning round locks Strategy II host-staged hybrid for the three pending backends, inheriting verbatim from ADR-0205 §Decision and ADR-0210 §Decision. The cross-backend gate contract for cambi is places=4 from day one on all backends, by construction (integer-only GPU pre-passes; byte-identical readback; unmodified host residual). If any per-port PR sees empirical drift from CPU, fix the kernel; never relax the gate (memory feedback_no_test_weakening). The shared cambi_internal.h host residual surface (shipped with PR #196 for the Vulkan port) is the load-bearing reuse point: all four GPU twins (Vulkan, CUDA, SYCL, HIP) link against it and inherit any future CPU-side c-value formula change automatically.
- On upstream sync: no action required. If a future upstream sync introduces a Netflix/vmaf cambi GPU twin (extremely unlikely — Netflix has no public CUDA / SYCL / HIP cambi work), evaluate whether to drop the fork's twin in favour of upstream's per the standard prefer-upstream rule; otherwise no action.
- Re-test on rebase: docs-only — no compile / runtime gate. The Strategy III v2 follow-up (parked per ADR-0205 §Out of scope) gets its own ADR + rebase-notes entry when profile data lands.
0320 — Vulkan VIF API-1.4 NVIDIA residual Phase 3b (deferral)¶
- Touches:
core/src/feature/vulkan/shaders/vif.comp (comment-only update at the Phase-4 reduction site documenting the Phase-3b candidate-fix experiments and the driver-side hypothesis; no code logic change vs. PR #511); docs/adr/0269-vif-ciede-precise-step-a.md (appended Phase-3b status-update appendix; the ADR body remains frozen per ADR-0028); docs/research/0090-...md (new); docs/state.md (row T-VK-VIF-1.4-RESIDUAL-ARC retired in favour of T-VK-VIF-1.4-RESIDUAL-NVIDIA-DEFERRED after the hardware-mapping correction); core/src/vulkan/AGENTS.md (Phase 3b update + rebase invariant for cross-backend gate device-name selection); changelog.d/fixed/vif-arc-mesa-anv-int64-reduction.md (new fragment).
- Invariant: the workgroup-scope memoryBarrierShared(); barrier(); pair introduced by PR #511 is load-bearing for the Arc + RADV lanes at API 1.4 and stays. Phase 3b confirmed it cannot be downgraded back to a bare barrier() even if the NVIDIA residual ever closes: Arc's clean state is contingent on the workgroup-scope pair.
- Cross-backend gate device-selection invariant (NEW): scripts that target a specific Vulkan vendor must select by deviceName substring, not by --vulkan_device <index>. vmaf_vulkan_context_new's device sort is stable inside the same devtype_score bucket, and the vkEnumeratePhysicalDevices enumeration order is host-policy-dependent (driver registration order in /etc/vulkan/icd.d/, the Mesa device-select layer, VK_LOADER_* env vars). PR #511's commit message inverted the device map on this fork's CI workstation; the empirical numbers it cited as "NVIDIA" actually came from Arc and vice versa. New cross-backend lanes targeting a specific vendor should not inherit the off-by-one.
- On upstream sync: vif.comp is fork-local; upstream Netflix/vmaf has no Vulkan path, so cherry-picks from upstream cannot reach this file.
- Re-test on rebase (assumes a multi-GPU CI workstation with NVIDIA + Arc + RADV; lavapipe-only CI lanes are a no-op for the API-1.4 residual since lavapipe never reproduced the bug):
# Local API-1.4 bump (off-master reproducer; do NOT commit).
sed -i 's/VK_API_VERSION_1_3/VK_API_VERSION_1_4/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1003000/VMA_VULKAN_VERSION 1004000/' \
  core/src/vulkan/vma_impl.cpp
cd libvmaf && meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=false -Denable_sycl=false && ninja -C build
cd ..
# NVIDIA lane — expected 45/48 FAIL scale 2 until either the
# manual int64 subgroup-reduction patch lands or NVIDIA fixes
# the driver. Arc + RADV expected 0/48.
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature vif --backend vulkan --device <NVIDIA-index>
# Revert local bump after testing.
sed -i 's/VK_API_VERSION_1_4/VK_API_VERSION_1_3/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1004000/VMA_VULKAN_VERSION 1003000/' \
  core/src/vulkan/vma_impl.cpp
0230 — ssimulacra2_cuda GPU module unload + per-scale malloc removal (ADR-0356)¶
- Touches:
core/src/feature/cuda/ssimulacra2_cuda.c (fork-only — fork-added CUDA extractor), core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu (fork-only kernel), core/src/cuda/AGENTS.md (fork-local package guidance).
- Invariant: every cuModuleLoadData in the fork's CUDA extractors must be paired with a guarded cuModuleUnload in the matching close_fex_cuda, between cuStreamSynchronize and cuStreamDestroy. The leak is invisible to compute-sanitizer --tool memcheck (the tool's leak checker is scoped to cuMem*Alloc only). The XYB H2D / D2H byte counts shrink to the valid sub-region per scale; the device-side plane_full_pixels stride contract (kernels assume each plane starts at full-resolution offsets) stays unchanged. The pinned scratch reservations h_ref_lin_ds / h_dis_lin_ds are owned by ss2c_alloc_buffers and freed by close_fex_cuda via the existing SS2C_FREE_HOST macro.
- Upstream interaction: none. ssimulacra2_cuda is fork-added per ADR-0206 and has no upstream Netflix/vmaf twin.
- Re-test on rebase:
meson test -C core/build test_ssimulacra2_simd
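For the ssimulacra2_cuda transfer shrink, the arithmetic is that copy lengths shrink with scale while plane base offsets keep full-resolution spacing. This sketch assumes each scale halves width and height with ceiling rounding, three float32 XYB planes, and rows packed at the scale's own width starting at each plane's base; the extractor's actual layout may differ in detail, so treat the numbers as illustrative and the helper names as hypothetical.

```python
FLOAT_BYTES = 4  # float32 samples
PLANES = 3       # X, Y, B


def scale_dims(width: int, height: int, scale: int) -> tuple[int, int]:
    """Dimensions after `scale` 2x downsamples, rounding up (assumed)."""
    for _ in range(scale):
        width, height = (width + 1) // 2, (height + 1) // 2
    return width, height


def plane_base_offsets(width: int, height: int) -> list[int]:
    """Byte offset of each plane; fixed at full-resolution spacing for every scale."""
    plane_full_pixels = width * height
    return [p * plane_full_pixels * FLOAT_BYTES for p in range(PLANES)]


def per_scale_copy_bytes(width: int, height: int, scale: int) -> int:
    """Bytes to transfer per plane: only the valid sub-region at this scale."""
    w, h = scale_dims(width, height, scale)
    return w * h * FLOAT_BYTES
```

For a 576x324 input, scale 1 copies 186624 bytes per plane instead of 746496, while the plane offsets stay at multiples of 746496 bytes, which is the stride contract the kernels keep relying on.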
Upstream-port-later batch — Research-0090 18-commit triage close-out (2026-05-09)¶
- Touches:
docs/state.md (one row in "Deferred (waiting on external trigger)"), this file, changelog.d/changed/upstream-port-later-batch-2026-05-09.md. No code touched. Companion to PR #446 (Research-0090) and the in-flight PRs #497 (MyTestCase super-PR) and #443 / #444 (cambi-docs duplicate pair).
- Per-commit classification (input set: 18 PORT_LATER SHAs from Research-0090):
| # | Upstream SHA | Subject (truncated) | Verdict | Reopen / forward path |
|---|---|---|---|---|
| 1 | 38e905d1 | adopt MyTestCase + reformat BD-rate test data | PORT_DEFERRED | Subsumed by PR #497 commit e1dbdc09; close out when #497 merges |
| 2 | 005988ea | adopt MyTestCase + port new tests + align fifo_mode | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 3 | 4679db83 | fix VMAFEXEC_score tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit 0004d2cf — must preserve fork's golden places= values byte-for-byte (CLAUDE §8 / ADR-0024) |
| 4 | 3e075107 | adopt MyTestCase + update score values in vmafexec tests | PORT_DEFERRED | Subsumed by PR #497 commit 0004d2cf; close out when #497 merges |
| 5 | e3827e4d | adopt MyTestCase + port new tests in asset/bootstrap/local_explainer | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 6 | 25ff9f18 | remove empty VmafossexecCommandLineTest stub | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 13-line deletion. PR #497 currently RE-EMITS the stub; once #497 lands, cherry-pick this commit standalone (zero-conflict against post-#497 tip). |
| 7 | 3a041a97 | adopt MyTestCase + update score values | PORT_DEFERRED | Subsumed by PR #497 commit d52d9221; close out when #497 merges |
| 8 | ead2d12b | fix vif_scale3 + adm3_egl_1 tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit b5a3f61b — Netflix-golden tolerance guard same as row 3 |
| 9 | 6c097fc4 | reduce ADM/VIF tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit f3881d5c — Netflix-golden tolerance guard same as row 3 |
| 10 | 7df50f3a | align testutil with full set of fixture functions | PORT_DEFERRED | Subsumed by PR #497 commit f1ae0495; close out when #497 merges |
| 11 | 322ca041 | replace temporal slicing with pre-sliced YUV fixtures | PORT_DEFERRED | Subsumed by PR #497 commit 7d9d9a10; close out when #497 merges. Sequencing matters: this commit must land before rows 12, 14, 15, 17 (the YUV-fixture consumers); #497 already orders them correctly. |
| 12 | 74bdce1b | align vmafexec_feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 07e7cb48; close out when #497 merges |
| 13 | a3776335 | align feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 15a6874d; close out when #497 merges |
| 14 | 0341f730 | remove duplicate test_run_vmaf_integer_fextractor | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 76-line deletion. Same disposition as row 6 — #497 currently re-emits the duplicate; cherry-pick standalone after #497. |
| 15 | 9fa593eb | port feature_extractor tests for aim/adm3/motion3 + new options | PORT_DEFERRED | Subsumed by PR #497 commit ab21b694; close out when #497 merges |
| 16 | d93495f5 | reduce tolerance for VMAF scores in quality_runner tests | PORT_DEFERRED w/ Netflix-golden guard | PR #497 — Netflix-golden tolerance guard same as row 3 |
| 17 | 7d1ad54b | port feature extractor tests for aim/adm3/motion3 | PORT_DEFERRED | Subsumed by PR #497 commit 44b9e626; close out when #497 merges |
| 18 | 721569bc | resource/doc: cambi_high_res_speedup + motion2 score | PORT_DEFERRED → DEDUP | Already in flight on TWO branches (PR #443 + PR #444). Maintainer picks one and abandons the other per Research-0090 §Recommended action #4. No third port-PR opened. |
- Invariant: after PR #497 merges, the Research-0090 PORT_LATER bucket reduces to exactly two follow-up cherry-picks against post-#497 master:
git cherry-pick 25ff9f18 (delete empty VmafossexecCommandLineTest).
git cherry-pick 0341f730 (delete duplicate test_run_vmaf_integer_fextractor).
Both are pure deletions in python/test/command_line_test.py and python/test/feature_extractor_test.py respectively; no score change, no Netflix-golden interaction. They were excluded from PR #497 because the v2 super-PR's diff state currently RE-EMITS those identifiers (likely because #497 cherry-picked from an earlier upstream tip than 25ff9f18 / 0341f730).
- Netflix-golden guard (binding): per CLAUDE §8 / ADR-0024, the three Netflix CPU golden pairs in python/test/quality_runner_test.py, vmafexec_test.py, vmafexec_feature_extractor_test.py, feature_extractor_test.py, result_test.py (1 normal src01_hrc00 ↔ hrc01 + 2 checkerboard) carry hard-coded assertAlmostEqual rows that are NEVER modified by a fork PR. Upstream commits 4679db83, ead2d12b, 6c097fc4, d93495f5 explicitly LOWER places= on a subset of those rows (their stated motivation is macOS FP precision drift, not a true score change). The reviewer of PR #497 must verify that the 3 golden pairs retain fork tolerances byte-for-byte; only non-golden rows may adopt the relaxations.
- On upstream sync: future /sync-upstream runs that re-detect these 18 SHAs should match this entry via the SHA list and short-circuit Pass-2 classification (skip re-triage).
- Re-test on rebase: none required at the time of this commit (no code touched); after the two follow-up cherry-picks (25ff9f18 + 0341f730) eventually land, run:
meson test -C build --suite=fast
make test-netflix-golden   # 3/3 CPU goldens still pass
ADR-0108: every fork-local PR that touches upstream-shared paths or establishes a rebase-sensitive invariant adds an entry here. PRs with no rebase impact state "no rebase impact" in the PR description and skip the entry.
The intended reader is whoever runs the next /sync-upstream (see ADR-0002 and .claude/skills/sync-upstream/). Read top-to-bottom before resolving conflicts.
Format¶
Each entry is a ### NNNN — short title heading with three fields:
- Touches: paths likely to conflict on upstream merge.
- Invariant: what the fork relies on that an upstream change could silently drop.
- Re-test: the command(s) to run after the merge to confirm the invariant survived. Reproducer-style — no surrounding prose required.
IDs are assigned in commit order and never reused. A single entry may cover several PRs in one workstream; cross-link from the ID heading.
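A small validator can enforce this format before review. The sketch assumes each entry is Markdown text whose first line is the ### NNNN heading; it accepts both "Re-test:" and "Re-test on rebase:" because both spellings appear in this file, and it flags reused IDs, which the rule above forbids. check_entry and duplicate_ids are hypothetical names.

```python
import re

# "### NNNN <em dash> short title"; \u2014 is the em dash used in entry headings.
HEADING = re.compile(r"^### (\d{4}) \u2014 (.+)$")
REQUIRED_FIELDS = ("Touches:", "Invariant:", "Re-test")


def check_entry(text: str) -> list[str]:
    """Return format problems for one entry (empty list when it conforms)."""
    lines = text.strip().splitlines()
    problems = []
    if not lines or not HEADING.match(lines[0]):
        problems.append("first line is not an '### NNNN' entry heading")
    body = "\n".join(lines[1:])
    for field in REQUIRED_FIELDS:
        if field not in body:
            problems.append(f"missing field: {field}")
    return problems


def duplicate_ids(headings: list[str]) -> list[str]:
    """IDs are never reused; report each ID that appears more than once."""
    seen, dupes = set(), []
    for heading in headings:
        match = HEADING.match(heading)
        if match:
            entry_id = match.group(1)
            if entry_id in seen and entry_id not in dupes:
                dupes.append(entry_id)
            seen.add(entry_id)
    return dupes
```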
Entries (backfilled 2026-04-18 per ADR-0108 adoption)¶
0310 — Vulkan VIF int64 reduction race condition Phase 3 fix¶
- Touches:
core/src/feature/vulkan/shaders/vif.comp (replaces all three bare barrier() calls with explicit memoryBarrierShared(); barrier(); pairs covering the Phase-1 cooperative tile load, the Phase-2 vertical-conv shared write, and the Phase-4 cross-subgroup int64 reduction); plus documentation under docs/research/0089-...md (Phase 3 status appendix), docs/adr/0269-...md (Phase 3 status appendix), docs/state.md (T-VK-VIF-1.4-RESIDUAL closed; new T-VK-VIF-1.4-RESIDUAL-ARC opened), core/src/vulkan/AGENTS.md (Phase 3 update on the existing invariant row), changelog.d/fixed/vif-int64-reduction-race-condition.md. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the shader is zero. The entry exists because the fix is rebase-sensitive: any future cherry-pick that touches vif.comp and downgrades a memoryBarrierShared(); barrier(); pair back to a bare barrier() will silently re-introduce the NVIDIA Vulkan 1.4 race.
- Invariant: vif.comp shared-memory ordering between cooperative-write phases must be release-acquire, not just a bare workgroup-execution barrier. NVIDIA's Vulkan 1.4 default memory model requires the explicit shared-memory release; a bare barrier() works at API 1.3 by accident on this driver. SCALE is irrelevant: the fix applies to all four pipeline specialisations because the barrier sites are in the SCALE-shared code. Do NOT remove the explicit memoryBarrierShared() calls even if a perf review claims they are redundant under the GLSL spec wording: empirical real-hardware evidence in the research-0089 2026-05-09 appendix shows otherwise on NVIDIA driver 595.71.05.
- Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on an NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan --device 1 --places 4. Expect 0/48 across all four scales. Run the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3" against --vulkan_device 1; expect 5 identical (integer_vif_num_scale2, integer_vif_den_scale2) = (+2.494358e+04, +2.522523e+04) pairs at frame 5. Note that --vulkan_device 0 on this multi-GPU host is the Intel Arc A380 lane and will still fail at API 1.4 (separate T-VK-VIF-1.4-RESIDUAL-ARC row, Open).
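The 5-run determinism check reduces to exact equality of the collected pairs. How the (integer_vif_num_scale2, integer_vif_den_scale2) values are scraped from the debug output is assumed and omitted; only the comparison is shown, and the function names are hypothetical.

```python
def is_deterministic(runs: list[tuple[float, float]]) -> bool:
    """True when every run reported the identical (num, den) pair.

    Exact equality is intended: the check is about run-to-run identical
    output, not closeness within a tolerance.
    """
    return bool(runs) and len(set(runs)) == 1


def distinct_pairs(runs: list[tuple[float, float]]) -> int:
    """How many different (num, den) pairs the runs produced."""
    return len(set(runs))
```

A fixed lane yields one distinct pair across all five runs; the racy lane described in the Phase 2 entry yields five.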
0309 — Vulkan VIF API-1.4 Phase 2 dump (T-VK-VIF-1.4-RESIDUAL)¶
- Touches:
docs/research/0089-vulkan-vif-fp-residual-bisect-2026-05-08.md (2026-05-09 status appendix with empirical numbers from the live RTX 4090), docs/state.md (T-VK-VIF-1.4-RESIDUAL row updated with the localisation), core/src/vulkan/AGENTS.md (new invariant row pinning the SCALE = 2 cross-subgroup-reduction memory-model finding), CHANGELOG.md (lusoris fork "Changed" entry). No code touched; the Phase 3 shader memory-model fix lands in a separate PR. Upstream Netflix/vmaf has no Vulkan backend, so conflict probability for the AGENTS.md row is zero. The entry exists because the empirical localisation flips the open state-row hypothesis from FP precision to memory model and retires the places=3 override path that earlier rebase scaffolding might have suggested.
- Invariant: the vif.comp SCALE = 2 specialisation's Phase-4 cross-subgroup int64 reduction is non-deterministic on NVIDIA driver 595.71.05 + Vulkan 1.4.341 (lines 547–592: subgroupAdd, barrier(), and the thread-0 read of s_lmem). The API 1.3 lane is fully deterministic on the same hardware. The four apiVersion pinning sites in core/src/vulkan/common.c + core/src/vulkan/vma_impl.cpp stay at 1.3 until Phase 3 lands the explicit memory-scope barrier and a 5-run determinism gate confirms run-to-run identical (num, den) plus places=4 0/48 on NVIDIA. The places=3 override path is eliminated from the unblock options.
- Re-test: apply the local API-1.4 bump (core/src/vulkan/common.c 3 sites + vma_impl.cpp VMA_VULKAN_VERSION 1004000) on an NVIDIA RTX 4090 + driver 595.71+ machine, build with meson setup ... -Denable_vulkan=enabled, then run the gate and the 5-run determinism check from research-0089 §"Reproduction recipe for Phase 3". Expect 45/48 places=4 failures on integer_vif_scale2 (max abs 1.527e-02) AND 5 distinct (integer_vif_num_scale2, integer_vif_den_scale2) pairs across 5 runs of --feature 'vif_vulkan=debug=true'. Both observations reproduced bit-for-bit on this session's hardware lane (UUID e478b41b-5c4f-1ddb-f990-e44916aff4c8).
0308 — encoder knob-sweep recipe-regression policy (ADR-0308, docs-only)¶
- Touches: docs/research/0080-encoder-knob-sweep-findings.md, docs/adr/0308-encoder-knob-sweep-recipe-regression-policy.md, docs/adr/README.md (index row), ai/AGENTS.md (knob-sweep invariant section), changelog.d/changed/encoder-knob-sweep-findings.md. No code touched; companion to PR #400 (ADR-0305 + Research-0077 + ai/scripts/analyze_knob_sweep.py). Upstream Netflix/vmaf has no encoder-knob-sweep surface, so conflict probability is zero — this entry exists only because the policy threshold (7-of-9 structural cut) is rebase-sensitive on the corpus shape.
- Invariant: the 7-of-9 source-count threshold from ADR-0308 §Decision point 1 is calibrated against the current 9-source Netflix Public Dataset corpus. If the corpus grows past 9 sources (e.g. UGC expansion per ADR-0287, or HDR additions), re-derive the absolute threshold as a fraction (≥7/9 ≈ 78 %). The structural cluster is sharp on the current corpus (the top-15 cells all hit 9-of-9, and no observed cells fall in the 4–6 range), so a fractional cut at ~75 % is robust. Do NOT relax bitrate_tol_pct (default 5.0) or vmaf_tol (default 0.1) in ai/scripts/analyze_knob_sweep.py without an ADR — those tolerances are calibrated against the per-frame VMAF noise floor and bitrate quantisation in libavformat muxers.
- Re-test: pytest ai/tests/test_knob_sweep_analysis.py -v (script logic; ships in PR #400). The policy gate is offline: regenerate runs/phase_a/full_grid/comprehensive.jsonl via tools/vmaf-tune/src/vmaftune/hw_encoder_corpus.py (a 3-hour run on a single host with NVENC + QSV), then re-run python ai/scripts/analyze_knob_sweep.py --jsonl <adapted.jsonl> --out-dir runs/phase_a/full_grid/reports/ and diff the resulting summary.md against the headline table in docs/research/0080-encoder-knob-sweep-findings.md. The structural cluster (top-15 cells, all 9-of-9) is the invariant to defend.
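For a larger corpus, the absolute cut is the ceiling of the fraction times the source count. The sketch below is minimal. It keeps the ADR-0308 ratio of 7/9 as the default fraction; you can pass Fraction(3, 4) to try the ~75 % cut instead. Both helper names are hypothetical and are not part of analyze_knob_sweep.py.

```python
"""Re-derive the ADR-0308 structural cut for a corpus of any size.

Hypothetical helpers (not part of analyze_knob_sweep.py). Fraction keeps the
arithmetic exact, so 7/9 of 9 sources is exactly 7 rather than a float that
might land a hair above it.
"""
import math
from fractions import Fraction

# Calibration point from ADR-0308: 7 of 9 sources (about 78 %).
ADR_0308_FRACTION = Fraction(7, 9)


def min_source_hits(n_sources: int, fraction: Fraction = ADR_0308_FRACTION) -> int:
    """Smallest source count that meets `fraction` of an n-source corpus."""
    if n_sources <= 0:
        raise ValueError("corpus must contain at least one source")
    return math.ceil(fraction * n_sources)


def is_structural(hits: int, n_sources: int,
                  fraction: Fraction = ADR_0308_FRACTION) -> bool:
    """True when a knob cell hits on enough sources to count as structural."""
    return hits >= min_source_hits(n_sources, fraction)
```

On the current 9-source corpus, both fractions give a cut of 7, so moving to the fractional form changes nothing today. At 12 sources, the 7/9 cut becomes 10 and the 75 % cut becomes 9.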
0228 — Vulkan 1.4 bump deferred (ADR-0264, docs-only)¶
- Touches: none (docs-only PR). Future Step A of T-VK-1.4-BUMP will touch core/src/feature/vulkan/shaders/vif.comp and core/src/feature/vulkan/shaders/ciede.comp; Step B will touch the three apiVersion sites in core/src/vulkan/common.c (lines 54, 264, 374) and the VMA_VULKAN_VERSION define in core/src/vulkan/vma_impl.cpp (line 22).
- Invariant: master stays on VK_API_VERSION_1_3 and VMA_VULKAN_VERSION = 1003000. Lifting the constant in any future upstream sync (Netflix doesn't ship a Vulkan backend, so the conflict is improbable) without first auditing the precise / OpDecorate ... NoContraction decoration on vif.comp and ciede.comp will reintroduce the NVIDIA-driver regression captured in research-0053. The psnr_hvs_strict_shaders -O0 list in core/src/vulkan/meson.build is the existing precedent for shader-side bit-exactness mitigations and should be where a 1.4-era audit lands its results (potentially expanding to cover vif.comp + ciede.comp if the precise audit decides the optimizer is the right place to gate).
- Re-test: when Step B lands, the gate is python3 scripts/ci/cross_backend_vif_diff.py --feature vif --backend vulkan, plus the same command with --feature ciede, against NVIDIA + RADV + lavapipe; the max abs diff must stay ≤ 5.0e-05 (places=4) on all three.
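Step B has to move the three apiVersion sites and the VMA define together. A consistency check between the two encodings can therefore catch a half-applied bump. The sketch below is minimal. vk_make_api_version follows the bit layout of the VK_MAKE_API_VERSION macro in vulkan_core.h, and vma_vulkan_version uses VMA's decimal packing, where 1003000 means Vulkan 1.3. Extracting the constants from common.c and vma_impl.cpp is left out, and the helper names are hypothetical.

```python
"""Cross-check the two Vulkan version encodings touched by Step B.

Hypothetical helpers; extracting the constants from common.c and
vma_impl.cpp is left to the caller.
"""


def vk_make_api_version(variant: int, major: int, minor: int, patch: int) -> int:
    # Same bit layout as the VK_MAKE_API_VERSION macro in vulkan_core.h.
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def vma_vulkan_version(major: int, minor: int, patch: int = 0) -> int:
    # VMA's decimal packing: 1003000 means Vulkan 1.3.0.
    return major * 1_000_000 + minor * 1_000 + patch


def pins_agree(api_version: int, vma_version: int) -> bool:
    """True when apiVersion and VMA_VULKAN_VERSION name the same major.minor."""
    major = (api_version >> 22) & 0x7F
    minor = (api_version >> 12) & 0x3FF
    return vma_version // 1_000 == major * 1_000 + minor
```

A half-applied bump fails pins_agree. For example, common.c moved to 1.4 while vma_impl.cpp still says 1003000.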
0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)¶
0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix¶
0228 — vmaf-tune resolution-aware model selection (ADR-0289)¶
0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)¶
0228 — tools/vmaf-tune/ codec-agnostic encode dispatcher (ADR-0294)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/encode.py — refactored to look up the codec adapter and delegate argv composition. Wholly fork-local.
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, codec_adapters/x264.py — the adapter contract gains ffmpeg_codec_args(preset, quality) and extra_params(). Both are duck-typed; missing methods fall back to the legacy x264-CRF shape.
  - tools/vmaf-tune/tests/test_encode_multi_codec.py — new 19-test suite pinning the dispatcher contract per codec.
  - docs/usage/vmaf-tune.md — new "Codec adapter contract" section.
- Invariant: the harness (encode.py, corpus.py) must not branch on codec identity. The only codec-aware code is the per-adapter codec_adapters/*.py file. Any future change that adds an if adapter.encoder == "..." check to the harness defeats the whole purpose of ADR-0294. The corpus row schema stays at SCHEMA_VERSION=1 — crf is preserved as the row column even when the underlying codec's quality knob is -cq / -qp / etc.; EncodeRequest.quality is a request-side property only. Adapters that don't yet expose ffmpeg_codec_args are intentionally permitted to fall back to the legacy x264-CRF shape; removing that fallback would break in-flight adapter PRs landing one at a time.
- Re-test on rebase:
```bash
pytest tools/vmaf-tune/tests/ -q  # 32 passed (13 existing + 19 multi-codec)
python -c "
from pathlib import Path
from vmaftune.encode import EncodeRequest, build_ffmpeg_command
req = EncodeRequest(
    source=Path('ref.yuv'), width=1920, height=1080, pix_fmt='yuv420p',
    framerate=24.0, encoder='libx264', preset='medium', crf=23,
    output=Path('out.mp4'),
)
cmd = build_ffmpeg_command(req)
assert cmd[cmd.index('-c:v') + 1] == 'libx264'
assert cmd[cmd.index('-preset') + 1] == 'medium'
assert cmd[cmd.index('-crf') + 1] == '23'
print('x264 dispatcher path OK')
"
```
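The duck-typed fallback described in the Touches list is what keeps the harness free of codec-identity branches. The sketch below shows that dispatch pattern in minimal form. It assumes only the two method names from the adapter contract. The adapter classes, the -cq argv, the extra flags, and codec_argv() itself are all hypothetical and are not the vmaftune code.

```python
"""Duck-typed codec-adapter dispatch with a legacy x264-CRF fallback.

Only the method names ffmpeg_codec_args(preset, quality) and extra_params()
come from the adapter contract; the classes and argv shapes are hypothetical.
"""
from typing import List


class LegacyAdapter:
    """An adapter that predates the contract: no ffmpeg_codec_args()."""
    encoder = "libx264"


class HypotheticalCqAdapter:
    """An adapter whose quality knob is -cq instead of -crf."""
    encoder = "example_hw_encoder"  # placeholder name, not a real encoder

    def ffmpeg_codec_args(self, preset: str, quality: int) -> List[str]:
        return ["-c:v", self.encoder, "-preset", preset, "-cq", str(quality)]

    def extra_params(self) -> List[str]:
        return ["-rc", "vbr"]  # illustrative extra flags only


def codec_argv(adapter, preset: str, quality: int) -> List[str]:
    """Compose codec argv by capability lookup; never test adapter.encoder."""
    compose = getattr(adapter, "ffmpeg_codec_args", None)
    if compose is None:
        # Fallback for adapters that have not migrated yet.
        argv = ["-c:v", adapter.encoder, "-preset", preset, "-crf", str(quality)]
    else:
        argv = list(compose(preset, quality))
    extra = getattr(adapter, "extra_params", None)
    if extra is not None:
        argv.extend(extra())
    return argv
```

The harness asks each adapter what it can do, not which codec it is. This lets unmigrated adapters keep working while migrated ones land one PR at a time.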
0260 — vmaf-tune --sample-clip-seconds (ADR-0301)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/{cli,corpus,encode,score,__init__}.py — fork-local. No upstream Netflix/vmaf path overlap.
  - tools/vmaf-tune/tests/test_corpus.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/0301-vmaf-tune-sample-clip.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md.
- Invariant: corpus JSONL SCHEMA_VERSION bumped to 2 — additive clip_mode key only. Sample-clip windows are mirrored on both sides via FFmpeg input-side -ss / -t (encode) and libvmaf's --frame_skip_ref / --frame_cnt (score). The _resolve_sample_clip() helper is the single source of truth for the centre-anchored slice math; do not duplicate the computation elsewhere. It falls back silently to "full" when N >= duration_s.
- Re-test:
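The sketch below illustrates the centre-anchored window from the invariant above and how it is mirrored onto the encode and score sides. It is hypothetical and is not _resolve_sample_clip(). It makes three assumptions: the window is centred on the source midpoint, frame offsets round to the nearest frame, and the non-fallback clip_mode label is "sample".

```python
"""Centre-anchored sample-clip window, mirrored onto encode and score args.

Illustration only; the real logic lives in _resolve_sample_clip() and must
not be duplicated. Assumptions: the window is centred on the source midpoint,
frame offsets round to the nearest frame (Python round, ties to even), and
the non-fallback clip_mode label is "sample".
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClipWindow:
    clip_mode: str             # "full" or "sample" (second label assumed)
    start_s: float             # encode side: -ss
    length_s: float            # encode side: -t
    frame_skip: int            # score side: --frame_skip_ref
    frame_cnt: Optional[int]   # score side: --frame_cnt (None = every frame)


def resolve_clip_window(n_seconds: float, duration_s: float, fps: float) -> ClipWindow:
    if n_seconds >= duration_s:
        # Silent fallback: score and encode the whole source.
        return ClipWindow("full", 0.0, duration_s, 0, None)
    start_s = (duration_s - n_seconds) / 2.0  # equal margins on both sides
    return ClipWindow(
        clip_mode="sample",
        start_s=start_s,
        length_s=n_seconds,
        frame_skip=round(start_s * fps),
        frame_cnt=round(n_seconds * fps),
    )
```

Both argument pairs come from one object. That is the purpose of the single-helper rule: the encode and score sides cannot drift onto different windows.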
0282 — vmaf-tune AMD AMF codec adapters (ADR-0282)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_amf,hevc_amf,av1_amf,_amf_common}.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry extended with three AMF entries.
  - tools/vmaf-tune/tests/test_codec_adapter_amf.py (new).
  - tools/vmaf-tune/tests/test_corpus.py — Phase A test renamed from test_known_codecs_phase_a_is_x264_only to test_known_codecs_includes_x264_and_amf.
  - tools/vmaf-tune/AGENTS.md — adds the AMF preset-compression invariant.
  - docs/usage/vmaf-tune.md — adds a Hardware encoders section.
- Invariant: the 7-into-3 preset compression table in _amf_common.py (_PRESET_TO_AMF) is the cross-codec axis Phase B / C consumers depend on. Every AMF adapter accepts the canonical 7 preset names (placebo…ultrafast) and maps them onto the three AMF rungs (quality / balanced / speed). Do not extend the preset vocabulary without amending ADR-0282 — registry uniformity (no codec-identity branching in the harness search loop) rests on every codec accepting the same names.
- Re-test:
0228 — vmaf-tune resolution-aware model selection (ADR-0289)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/resolution.py (new). Wholly fork-local — no upstream Netflix/vmaf path overlap.
  - tools/vmaf-tune/src/vmaftune/corpus.py — adds CorpusOptions.resolution_aware: bool = True and pipes the effective model through score_res.request.model into the JSONL row.
  - tools/vmaf-tune/src/vmaftune/cli.py — adds --resolution-aware / --no-resolution-aware (BooleanOptionalAction, default on).
  - tools/vmaf-tune/tests/test_resolution.py (new).
  - docs/usage/vmaf-tune.md — new "Resolution-aware mode" section.
  - docs/adr/0289-vmaf-tune-resolution-aware.md (new) + docs/research/0064-vmaf-tune-resolution-aware.md (new).
  - tools/vmaf-tune/AGENTS.md — two new invariant notes.
- Invariant: the height-only decision rule (height >= 2160 → vmaf_4k_v0.6.1, else vmaf_v0.6.1) is the documented contract. The JSONL vmaf_model field is now per-row (not per-job) — mixed ladder corpora legitimately contain multiple distinct values across rows. Downstream consumers (Phase B / C / D) must group/filter by vmaf_model rather than assuming a constant. Width is accepted in the API for symmetry but ignored in the body; do not branch on it without a follow-up ADR.
- Re-test:
pytest tools/vmaf-tune/tests/ -q
python tools/vmaf-tune/vmaf-tune corpus --help | grep resolution-aware
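The sketch below covers the height-only rule and the per-row grouping it forces on downstream consumers. select_model() and group_by_model() are hypothetical stand-ins, not the resolution.py API. Only the threshold and the two model names come from the invariant above.

```python
"""Height-only VMAF model selection plus per-row grouping.

select_model() and group_by_model() are hypothetical stand-ins, not the
resolution.py API; the threshold and model names come from the invariant.
"""
from collections import defaultdict
from typing import Dict, Iterable, List


def select_model(width: int, height: int) -> str:
    del width  # accepted for API symmetry; the rule ignores it
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def group_by_model(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Bucket JSONL rows by their own vmaf_model instead of assuming one."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row["vmaf_model"]].append(row)
    return dict(groups)
```

A 4096×1716 scope frame is 4K-wide but sits below the height threshold, so it gets vmaf_v0.6.1. That is a direct consequence of the height-only contract.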
0228 — y4m_convert_411_422jpeg 1-byte heap-buffer-overflow fix¶
- Touches:
  - core/tools/y4m_input.c — upstream-mirrored Daala-derived Y4M parser. The fix sits inside the 4:1:1 → 4:2:2-jpeg chroma upsample routine y4m_convert_411_422jpeg, lines ~500–530, across the function's three sub-loops. Upstream Netflix/vmaf carries the same shape; if upstream lands its own fix during a sync, prefer the upstream version and drop ours.
  - core/test/test_y4m_411_oob.c (new, fork-local) — drives the minimal W=2 H=4 4:1:1 stream through video_input_open + video_input_fetch_frame. Wholly fork-added; no upstream collision.
  - core/test/meson.build — adds the test_y4m_411_oob executable + test() registration.
- Invariant: the first two sub-loops of y4m_convert_411_422jpeg must guard _dst[(x << 1) | 1] writes with (x << 1 | 1) < dst_c_w, matching the third sub-loop's existing guard. Without the guard, a 4:1:1 stream of width 2 (dst_c_w == 1) writes one byte past the destination chroma row.
- Re-test:
(cd libvmaf && meson setup ../build-asan --buildtype=debug -Db_sanitize=address -Db_lundef=false -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled)
ninja -C build-asan test/test_y4m_411_oob
ASAN_OPTIONS=detect_leaks=0 ./build-asan/test/test_y4m_411_oob
The test must report 1 tests run, 1 passed. Pre-fix, the binary aborts with AddressSanitizer: heap-buffer-overflow … WRITE of size 1 at y4m_input.c:507.
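The off-by-one is easiest to see in the index arithmetic alone. The sketch below is a Python model, not the C routine, and it does not model the filter taps. It assumes the 4:1:1 source chroma width is ceil(pic_w / 4) and the 4:2:2 destination chroma width is ceil(pic_w / 2). The function names are hypothetical.

```python
"""Index model of the y4m_convert_411_422jpeg odd-sample guard.

Models write indices only, not the upsampling filter. Assumes the 4:1:1
source chroma width is ceil(pic_w / 4) and the 4:2:2 destination chroma
width is ceil(pic_w / 2). Function names are hypothetical.
"""
from typing import List


def chroma_row_writes(pic_w: int, guarded: bool) -> List[int]:
    c_w = (pic_w + 3) >> 2       # source chroma samples per row
    dst_c_w = (pic_w + 1) >> 1   # destination chroma samples per row
    writes: List[int] = []
    for x in range(c_w):
        writes.append(x << 1)                   # even output sample
        odd = (x << 1) | 1
        if not guarded or odd < dst_c_w:        # the guard from the fix
            writes.append(odd)                  # odd output sample
    return writes


def out_of_bounds(pic_w: int, guarded: bool) -> List[int]:
    dst_c_w = (pic_w + 1) >> 1
    return [i for i in chroma_row_writes(pic_w, guarded) if i >= dst_c_w]
```

With pic_w = 2, the unguarded loop writes index 1 into a one-byte row. That is the single out-of-bounds byte ASan reports, and the guarded loop drops it.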
0270 — saliency_student_v1 fork-trained on DUTS-TR (ADR-0286)¶
- Touches:
  - model/tiny/registry.json — adds the saliency_student_v1 row. Fork-local registry; no upstream overlap.
  - model/tiny/saliency_student_v1.onnx (+ .json sidecar) — new weights and metadata. Fork-local.
  - ai/scripts/train_saliency_student.py — new training script. Wholly fork-local under ai/, which has no upstream counterpart.
  - docs/ai/models/saliency_student_v1.md, docs/research/0062-saliency-student-from-scratch-on-duts.md, docs/adr/0286-saliency-student-fork-trained-on-duts.md — new docs under fork-local trees.
- Invariant: the C-side feature_mobilesal.c extractor's tensor-name contract — input (NCHW [1, 3, H, W]) and saliency_map (NCHW [1, 1, H, W]) — must continue to match the ONNX graph for both saliency_student_v1.onnx and the legacy mobilesal.onnx placeholder. Future weight swaps can change the graph internals freely but must keep these names + shapes; the smoke test asserts the registration. The op-allowlist constraint (the graph uses only ops in core/src/dnn/op_allowlist.c) carries over from ADR-0218 — Resize is not used; ConvTranspose is the upsample op for v1 to keep the graph load-clean against vanilla origin/master.
- Re-test:
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python -c "
from ai.src.vmaf_train.op_allowlist import check_model
from pathlib import Path
r = check_model(Path('model/tiny/saliency_student_v1.onnx'))
assert r.ok, r.pretty()
print('allowlist OK')
"
meson test -C build --suite=fast mobilesal
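The name-and-shape half of the contract can be checked without loading the C extractor. The sketch below is minimal and takes input and output shapes as plain name-to-dims mappings: ints, or strings for symbolic H/W. With the onnx package, these mappings could be built from model.graph.input and model.graph.output, where each dim carries either a dim_value or a dim_param. The function names are hypothetical, and the check does not replace the smoke test.

```python
"""Name-and-shape check for the feature_mobilesal.c tensor contract.

Shapes are sequences of ints or symbolic dim names (str). Reading them from
an .onnx file is left to the caller. Function names are hypothetical.
"""
from typing import Dict, List, Sequence, Union

Dim = Union[int, str]


def _nchw_ok(shape: Sequence[Dim], channels: int) -> bool:
    # NCHW with batch 1 and a fixed channel count; H and W may be symbolic.
    return len(shape) == 4 and shape[0] == 1 and shape[1] == channels


def contract_violations(inputs: Dict[str, Sequence[Dim]],
                        outputs: Dict[str, Sequence[Dim]]) -> List[str]:
    problems: List[str] = []
    for table, name, channels in ((inputs, "input", 3), (outputs, "saliency_map", 1)):
        if name not in table:
            problems.append(f"missing tensor {name!r}")
        elif not _nchw_ok(table[name], channels):
            problems.append(f"{name!r} shape {list(table[name])} is not [1, {channels}, H, W]")
    return problems
```

An empty result means a weights swap kept both tensor names and their NCHW shapes intact.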
0229 — HIP fifth-consumer kernel float_ansnr_hip (ADR-0266)¶
- Touches:
  - core/src/feature/hip/float_ansnr_hip.{c,h} (new) — fifth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_ansnr_cuda.c call-graph-for-call-graph; init / submit / collect / close invoke the kernel-template helpers in the same order; the submit body intentionally bypasses vmaf_hip_kernel_submit_pre_launch (no atomic; the kernel writes per-block (sig, noise) interleaved float partials directly).
  - core/src/hip/meson.build — adds the new TU to hip_sources.
  - core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_float_ansnr_hip; declaration and the registry row under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — adds the test_float_ansnr_hip_extractor_registered sub-test pinning the lookup contract.
- Invariant — the submit_pre_launch bypass is load-bearing. The CUDA twin makes the same choice for the same reason. If a future PR adds a submit_pre_launch call to float_ansnr_cuda.c's submit path, the HIP twin must follow in the same PR. Likewise, the readback shape (wg_count * 2u * sizeof(float)) and the bpc table (peak / psnr_max for 8/10/12/16-bit) mirror the CUDA twin verbatim — keep them aligned on rebase.
- Re-test on rebase:
cd libvmaf
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build # 48/48 green (47 CPU + HIP smoke)
0230 — HIP sixth-consumer kernel motion_v2_hip (ADR-0267)¶
- Touches:
  - core/src/feature/hip/integer_motion_v2_hip.{c,h} (new) — sixth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_motion_v2_cuda.c call-graph-for-call-graph; carries the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag and a flush() callback. The state struct has a uintptr_t pix[2] ping-pong slot pair tracked outside the kernel template (the template models a single device+host pair only).
  - core/src/hip/meson.build — adds the new TU to hip_sources.
  - core/src/feature/feature_extractor.c — adds the extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_hip; declaration and the registry row under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — adds the test_motion_v2_hip_extractor_registered sub-test pinning the lookup contract (the extractor name is motion_v2_hip, matching the CUDA twin's motion_v2_cuda naming).
- Invariant — temporal-extractor + ping-pong shape. The VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit, the flush() callback registration, and the uintptr_t pix[2] slot pair are load-bearing for the runtime PR (T7-10b), which will swap uintptr_t pix[2] for a real device-buffer handle pair matching the CUDA twin's VmafCudaBuffer *pix[2]. On rebase: if the CUDA twin's flush-pass shape changes (currently min(score[i], score[i+1])), update the HIP twin's flush_fex_hip body in the same PR.
- Re-test on rebase: same as 0229 — meson test -C build with enable_hip=true exercises the smoke contract.
0227 — ms_ssim_vulkan submit-side migrated to kernel_template (T-GPU-DEDUP-26)¶
- Touches:
core/src/feature/vulkan/ms_ssim_vulkan.c — extract()'s raw VkCommandBuffer / VkFence / vkAllocateCommandBuffers / vkBeginCommandBuffer / vkCreateFence / vkQueueSubmit / vkWaitForFences / vkDestroyFence / vkFreeCommandBuffers blocks become VmafVulkanKernelSubmit triples (vmaf_vulkan_kernel_submit_begin / _submit_end_and_wait / _submit_free). One triple covers the decimate-pyramid command buffer; one triple per scale covers the per-scale SSIM submit. The pipeline-side bundles (pl_decimate 2-binding 4-variant + pl_ssim 10-binding 9-variant) and their _add_variant() chains are unchanged from the prior migration.
- Invariant: any future submit-side template change (timeline semaphores, deferred fence release, queue-family parameterisation) must keep the helpers' synchronous-wait + per-frame fence + per-frame command-buffer contract intact, since ms_ssim_vulkan.c does host readback of the l_partials / c_partials / s_partials buffers immediately after _submit_end_and_wait returns. The submit-side contract is the same one already documented in core/src/vulkan/AGENTS.md's "Rebase-sensitive invariants" section for kernel_template.h.
- Re-test:
```bash
(cd libvmaf && meson test -C build)
python scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature float_ms_ssim --backend vulkan --places 4
```
0231 — SHA-pin GitHub Actions (OSSF Pinned-Dependencies)¶
- Touches: every workflow file under
.github/workflows/. All 13 fork workflows (docker-image.yml, docs.yml, ffmpeg-integration.yml, libvmaf-build-matrix.yml, lint-and-format.yml, nightly-bisect.yml, nightly.yml, release-please.yml, rule-enforcement.yml, scorecard.yml, security-scans.yml, supply-chain.yml, tests-and-quality-gates.yml) had their uses: directives rewritten from <owner>/<repo>@vN[.M.K] to <owner>/<repo>@<40-char-sha> # vN.M.K. 97 references converted; the SLSA reusable-workflow ref in supply-chain.yml is the single documented holdout (see Invariant below).
- Invariant — SHA-pin policy for uses:. Every action reference in .github/workflows/*.yml MUST be a 40-char commit SHA with the semver tag preserved as a trailing # vN.M.K comment. The OSSF Scorecard Pinned-Dependencies check parses both forms, and a floating tag (@vN) is treated as unpinned and counts against the aggregate score. Single permitted exception: the SLSA generator reusable workflow (slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml) must keep its vX.Y.Z tag form because GitHub Actions consumers cannot SHA-pin reusable-workflow refs in every code path; the exception is documented inline in supply-chain.yml and survives on each rebase. Why this matters on upstream sync: Netflix upstream does not ship the fork's CI tree, so a /sync-upstream run that drags new workflow content (e.g. via repository templates or bot-authored bumps) into .github/workflows/ can re-introduce floating-tag references unnoticed. The post-rebase check below is the standing gate — anything that lights up needs to be re-pinned before merging the sync.
- Re-test on rebase:
# Anything that prints is a regression — every uses: must be either
# already SHA-pinned (40 hex) or, for the documented SLSA exception,
# the slsa-github-generator reusable-workflow ref.
grep -hnE '^\s*(- )?uses:\s+[^@]+@[^ #]+\s*$' .github/workflows/*.yml \
| grep -vE '@[a-f0-9]{40}' \
| grep -v 'slsa-framework/slsa-github-generator/.github/workflows/'
# SHA-resolution sanity for any new pin (per-action):
gh api repos/<owner>/<repo>/git/ref/tags/<vN.M.K> --jq '.object.sha'
# If the result is a "tag" object (annotated tag), deref:
gh api repos/<owner>/<repo>/git/tags/<sha-from-prev> --jq '.object.sha'
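The same gate can also be expressed in Python, for example from a test runner. The sketch below is minimal and its function names are hypothetical. It is slightly stricter than the grep pipeline above. That pipeline only inspects uses: lines with no trailing comment. This sketch also flags a floating tag followed by a comment, such as @v4 # v4.

```python
"""Python variant of the uses: SHA-pin gate (hypothetical helper names)."""
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Captures the action path and the ref after '@'; local ./ refs have no '@'.
USES_RE = re.compile(r"^\s*(?:-\s+)?uses:\s+([^@\s]+)@([^\s#]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")
SLSA_EXCEPTION = "slsa-framework/slsa-github-generator/.github/workflows/"


def unpinned_refs(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every uses: ref that is not a full SHA."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        m = USES_RE.match(line)
        if m is None:
            continue  # not a uses: line, or a local action without '@'
        target, ref = m.groups()
        if SHA_RE.match(ref) or target.startswith(SLSA_EXCEPTION):
            continue  # pinned, or the documented SLSA exception
        hits.append((lineno, line.strip()))
    return hits


def scan_workflows(root: Path = Path(".github/workflows")) -> List[Tuple[str, int, str]]:
    out = []
    for path in sorted(root.glob("*.yml")):
        for lineno, text in unpinned_refs(path.read_text(encoding="utf-8").splitlines()):
            out.append((str(path), lineno, text))
    return out


if __name__ == "__main__":
    # Anything printed is a regression, as with the grep pipeline.
    for path, lineno, text in scan_workflows():
        print(f"{path}:{lineno}: {text}")
```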
0226 — CUDA drain-batch engine-loop opt (T-GPU-OPT-1)¶
- Touches:
  - core/src/cuda/drain_batch.{h,c} (new) — TLS drain-batch table + shared drain stream + _open() / register / _flush() / _close() API.
  - core/src/libvmaf.c — the engine-side per-frame loop now wraps submit/collect with _open() + _flush(), so all CUDA extractor finished events are waited on a single shared drain stream.
  - All 12 CUDA feature kernels (core/src/feature/cuda/*.c) register their finished event + drained flag with the drain batch on submit; collect skips its private cuStreamSynchronize when drained is true.
- Invariant — drained-flag contract. Every CUDA extractor's collect path must check the per-frame drained flag and skip its own cuStreamSynchronize when set; otherwise the drain batching is a no-op. The flag is reset to false per frame inside vmaf_cuda_drain_batch_register().
- Re-test on rebase:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast cuda
Expected: all CUDA tests green; bench shows ≥5% wall-clock gain on a 7-extractor VMAF model (model.json with all feature extractors enabled).
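The drained-flag contract reduces to simple bookkeeping, modelled below in Python rather than CUDA. The class and method names are hypothetical stand-ins for drain_batch.{h,c} and an extractor's collect(). The model assumes the engine calls _flush() after all submits and before any collect.

```python
"""Toy model of the CUDA drain-batch drained-flag contract (no CUDA here).

Extractor and DrainBatch are hypothetical stand-ins for a CUDA extractor and
drain_batch.{h,c}. Assumes the engine flushes after all submits and before
any collect.
"""
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Extractor:
    name: str
    drained: bool = False
    private_syncs: int = 0

    def collect(self, honour_flag: bool = True) -> None:
        if honour_flag and self.drained:
            return  # the shared drain already covered this extractor
        self.private_syncs += 1  # stands in for a private cuStreamSynchronize


@dataclass
class DrainBatch:
    pending: List[Extractor] = field(default_factory=list)
    shared_waits: int = 0

    def register(self, fex: Extractor) -> None:
        fex.drained = False  # per-frame reset, as in ..._register()
        self.pending.append(fex)

    def flush(self) -> None:
        self.shared_waits += 1  # one wait on the shared drain stream
        for fex in self.pending:
            fex.drained = True
        self.pending.clear()


def run_frame(extractors: List[Extractor], honour_flag: bool = True) -> Tuple[int, int]:
    batch = DrainBatch()
    for fex in extractors:
        batch.register(fex)  # at submit time
    batch.flush()
    for fex in extractors:
        fex.collect(honour_flag)
    return batch.shared_waits, sum(f.private_syncs for f in extractors)
```

When the check is honoured, a frame costs one shared wait. When a collect path ignores the flag, it adds back one private sync per extractor, which is why skipping the check turns the batching into a no-op.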
0225 — Netflix bench snapshot regen (upstream a44e5e61 motion fix)¶
- Touches:
  - testdata/netflix_benchmark_results.json — fork-added snapshot. CPU rows now reflect the post-fix motion feature; cuda / sycl rows from the previous regen are preserved unchanged because those backends were not exercised on this rerun (host-environment tooling — wrong renderD path, libvmaf_cuda not enabled in the local FFmpeg build). Future full regens should include cuda / sycl.
  - testdata/bench_all.sh — the default VMAF= no longer points at /usr/local/bin/vmaf (which on most dev hosts is stuck at the pre-upstream-a44e5e61 v3.0.0); it now defaults to the in-tree fork build at core/build/tools/vmaf.
  - testdata/benchmark_netflix.py — FFMPEG, YUVDIR and the hardcoded LD_LIBRARY_PATH=/usr/local/lib are now overridable via VMAF_FFMPEG, VMAF_YUVDIR and any caller-set LD_LIBRARY_PATH.
- Invariant: the snapshot's CPU pooled VMAF for src01_576x324 is 76.667828 (post-fix), not 76.668904 (the value from the buggy upstream mirror handling). If /sync-upstream ever re-pulls a Netflix change that touches motion.c mirror-handling, this number is the reference.
- Re-test:
cd libvmaf
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
LD_LIBRARY_PATH=$(pwd)/build/src python3 \
../testdata/benchmark_netflix.py
Expected CPU pooled rows: 76.667828, 35.068672, 7.985899.
0224 — CUDA graph capture feasibility (research-0047, DEFER)¶
- Touches: none — investigation-only; no code lands. The research digest
docs/research/0047-cuda-graph-capture-feasibility.md documents why a CUDA graph capture path on the per-frame submit chain is deferred rather than shipped (realised wall-clock gain capped at ~1-3% vs. the predicted 10-20%, with a 4-slot picture-pool rotation that defeats single-graph capture and forces per-frame cuGraphExecKernelNodeSetParams rebinding for (ref, dis) device pointers).
- Invariant: the kernel_template.h docstring keeps naming VmafCudaKernelLifecycle.finished as a graph-capture hook point. Don't prune that comment on rebase — leaving the door open in the template is free, and the digest's "what needs to be true for a future GO" section depends on the hook still being there.
- Re-test on rebase:
# Confirm the docstring still references graph capture as the hook
# point — wording change is fine, removal is not.
grep -qiE 'graph[- ]capture' core/src/cuda/kernel_template.h || echo "MISSING: graph-capture hook comment"
0223 — ADR slug-drift repair in CHANGELOG / rebase-notes (PR #304 follow-up)¶
- Touches:
CHANGELOG.md, docs/rebase-notes.md. No code; no upstream-shared path; no public-API surface.
- Invariant: every [ADR-NNNN](docs/adr/NNNN-slug.md) link in the fork's tracked docs resolves to an actual on-disk file under docs/adr/. Repaired 4 broken slugs that did not exist on disk (0138-iqa-convolve-avx2-bitexact-double → 0138-iqa-convolve-avx2-bitexact-double, 0140-simd-dx-framework → 0140-simd-dx-framework, 0190-ms-ssim-vulkan → 0190-ms-ssim-vulkan, 0178-vulkan-adm-kernel → 0178-vulkan-adm-kernel). All retained their cited NNNN per ADR-0028 (NNNN is immutable once Accepted).
- Re-test on rebase: from the repo root, the following must print no lines:
for ref in $(grep -ohE 'docs/adr/[0-9]{4}-[a-z0-9-]+\.md' \
CHANGELOG.md docs/rebase-notes.md AGENTS.md docs/state.md \
| sort -u); do
test -f "$ref" || echo "MISSING: $ref"
done
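The same existence check can be run in Python, from a test runner or a pre-commit hook. The sketch below is minimal and reuses the regex and file list from the shell loop above. Unlike the shell loop, it skips input files that are absent instead of letting grep warn about them. It returns the missing paths sorted, and the function name is hypothetical.

```python
"""Python equivalent of the ADR-link existence loop above.

Hypothetical helper; same regex and file list as the shell loop. Input files
that do not exist are skipped rather than reported.
"""
import re
from pathlib import Path
from typing import Iterable, List

ADR_REF_RE = re.compile(r"docs/adr/[0-9]{4}-[a-z0-9-]+\.md")
DEFAULT_FILES = ("CHANGELOG.md", "docs/rebase-notes.md", "AGENTS.md", "docs/state.md")


def missing_adr_refs(repo_root: Path, files: Iterable[str] = DEFAULT_FILES) -> List[str]:
    refs = set()
    for name in files:
        path = repo_root / name
        if path.is_file():
            refs.update(ADR_REF_RE.findall(path.read_text(encoding="utf-8")))
    # Keep only references whose target file is absent under repo_root.
    return sorted(ref for ref in refs if not (repo_root / ref).is_file())


if __name__ == "__main__":
    for ref in missing_adr_refs(Path(".")):
        print(f"MISSING: {ref}")
```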
0125 — cambi_vulkan migrated to kernel_template (T-GPU-DEDUP-25, 5-bundle)¶
- Touches:
core/src/feature/vulkan/cambi_vulkan.c — the state's quintet (dsl_2bind + 5× pl_layout_* + shader_modules[CAMBI_PL_COUNT] + shared desc_pool) collapses to five VmafVulkanKernelPipeline bundles (pl_trivial, pl_derivative, pl_filter_mode, pl_decimate, pl_mask_dp), each owning its own descriptor pool. The first slot of pipelines[] per stage aliases the bundle's base pipeline; CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, and CAMBI_PL_MASK_THRESHOLD are sibling variants built via vmaf_vulkan_kernel_pipeline_add_variant(). cambi_vk_alloc_set takes a bundle pointer (->desc_pool / ->dsl) — every dispatch site picks the bundle that matches its push-constant struct.
- The cambi_vk_make_dsl / cambi_vk_make_pl / cambi_vk_create_shader / cambi_vk_build_pipeline helpers are dropped — the template subsumes them.
- Invariant — variants destroyed before bundle, base alias must be skipped. Five distinct push-constant struct sizes (CambiVkPushTrivial / CambiVkPushDerivative / CambiVkPushFilterMode / CambiVkPushDecimate / CambiVkPushMaskDp) force five bundles even though every stage's DSL is 2-binding SSBO, because _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots (CAMBI_PL_FILTER_MODE_V, CAMBI_PL_MASK_SAT_COL, CAMBI_PL_MASK_THRESHOLD) before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle.
- Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): cambi mean = 0.0, identical to pre-migration (the pair has no banding artifacts).
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no Vulkan backend, so there is nothing to merge against.
0124 — ssimulacra2_vulkan migrated to kernel_template (T-GPU-DEDUP-24, 4-bundle)¶
- Touches:
core/src/feature/vulkan/ssimulacra2_vulkan.c — the state's 16 long-lived pipeline-object fields (4× *_dsl + *_pl + *_shader + the shared desc_pool) collapse to four VmafVulkanKernelPipeline bundles (pl_xyb, pl_mul, pl_blur, pl_ssim), each owning its own descriptor pool. The first slot of each per-bundle pipeline array (xyb_pipelines[0], mul_pipelines[0], blur_pipelines_h[0], ssim_pipelines[0]) aliases the bundle's base VkPipeline; the remaining per-scale / per-pass slots are siblings via vmaf_vulkan_kernel_pipeline_add_variant(). ss2v_build_pipeline_int3 reroutes through _add_variant() instead of calling vkCreateComputePipelines directly; ss2v_alloc_set takes a bundle pointer (->desc_pool / ->dsl) instead of a separate DSL argument; descriptor-set free sites at the tail of ss2v_run_scale route to each bundle's pool.
- The ss2v_make_dsl / ss2v_make_pl / ss2v_create_shader helpers are dropped — the template subsumes them.
- Invariant — variants destroyed before bundle, slot 0 alias must be skipped. Four distinct DSL shapes (XYB = 6 SSBOs, MUL = 3, BLUR = 2, SSIM = 8) prevent collapsing to one bundle: _add_variant() only siblings pipelines under the same layout. close_fex must vkDestroyPipeline() the variant slots in xyb_pipelines[1..N-1], mul_pipelines[1..N-1], ssim_pipelines[1..N-1], blur_pipelines_h[1..N-1], and every slot of blur_pipelines_v[] before calling vmaf_vulkan_kernel_pipeline_destroy() on each bundle, and must skip slot 0 of xyb_pipelines, mul_pipelines, ssim_pipelines + blur_pipelines_h to avoid double-freeing the aliased base.
- Numerical contract: bit-exact preserved. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template. Validated on the Netflix-pair smoke (576×324×8-bit): ssimulacra2 mean = 24.613842, identical to pre-migration.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper. Upstream Netflix/vmaf has no ssimulacra2 extractor and no Vulkan backend, so there is nothing to merge against.
0118 — psnr_hvs_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-18)¶
- Touches:
core/src/feature/vulkan/psnr_hvs_vulkan.c — the state's dsl + pipeline_layout + shader + desc_pool + pipeline[3] collapses to VmafVulkanKernelPipeline pl + VkPipeline pipeline_chroma_u + VkPipeline pipeline_chroma_v. Plane 0 is the template's base pipeline; planes 1 and 2 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
- New psnr_hvs_plane_pipeline() accessor maps a plane index to the right VkPipeline handle.
- Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the chroma U/V variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7.
- Numerical contract: unchanged. Same shaders + spec-constants + push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper.
0119 — vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-19)¶
- Touches:
core/src/feature/vulkan/vif_vulkan.c — the state's dsl + pipeline_layout + shader + desc_pool + pipelines[4] collapses to VmafVulkanKernelPipeline pl + VkPipeline scale_variants[3]. Scale 0 is the template's base pipeline; scales 1, 2, 3 are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
- New vif_scale_pipeline() accessor maps a scale index to the right VkPipeline handle (replaces s->pipelines[scale]).
- Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 3 scale variants before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan in T-GPU-DEDUP-7 and psnr_hvs_vulkan in T-GPU-DEDUP-18.
- Numerical contract: unchanged. Same shaders, same spec-constants, same push-constants as before; only the Vulkan pipeline-bundle scaffolding moved to the template.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper.
0120 — float_vif_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-20)¶
- Touches:
core/src/feature/vulkan/float_vif_vulkan.c — the state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[2][4] 2-D lookup table is preserved so the existing [mode][scale] dispatch path stays clean, but pipelines[0][0] aliases s->pl.pipeline (the template's base). The other 7 entries are sibling pipelines created via vmaf_vulkan_kernel_pipeline_add_variant().
- Invariant — variants destroyed before bundle. close_fex must vkDestroyPipeline() the 7 sibling variants (every (mode, scale) except (0, 0)) before calling vmaf_vulkan_kernel_pipeline_destroy(&s->pl) — same rule as ssim_vulkan / psnr_hvs_vulkan / vif_vulkan.
- Invariant —
pipelines[0][0] aliasing. The base pipeline handle is owned by s->pl.pipeline; we copy it into pipelines[0][0] after _create() so the dispatch path can use a uniform 2-D lookup. The destroy loop must skip (mode=0, scale=0) to avoid double-freeing the template's pipeline.
- Numerical contract: unchanged. Same shaders, spec-constants (mode + scale), and push-constants. The Netflix-pair smoke matches integer_vif identically to 4 decimal places.
- Rebase impact: low. Builds on top of PR #272's _add_variant() helper.
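The two float_vif teardown invariants are: destroy the variants first, and skip the aliased [0][0]. Both can be checked on a toy model of the teardown order. The sketch below is Python bookkeeping, not Vulkan: handles are strings, and the 2×4 table shape follows the pipelines[2][4] declaration. The function names are hypothetical.

```python
"""Toy model of float_vif_vulkan teardown order (bookkeeping, not Vulkan).

Handles are strings; the 2x4 [mode][scale] shape follows the pipelines[2][4]
table. Function names are hypothetical.
"""
from typing import List

MODES, SCALES = 2, 4


def build_table() -> List[List[str]]:
    table = [[f"variant[{m}][{s}]" for s in range(SCALES)] for m in range(MODES)]
    table[0][0] = "base"  # alias of s->pl.pipeline, copied after _create()
    return table


def close_fex(table: List[List[str]]) -> List[str]:
    """Return the order in which handles are released."""
    released: List[str] = []
    for m in range(MODES):
        for s in range(SCALES):
            if (m, s) == (0, 0):
                continue  # aliased base: the bundle's _destroy owns it
            released.append(table[m][s])  # vkDestroyPipeline(variant)
    released.append("bundle")  # vmaf_vulkan_kernel_pipeline_destroy(&s->pl)
    return released
```

If the (0, 0) skip is dropped, "base" appears in the release list before "bundle" releases it a second time. That is the double-free the invariant forbids.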
0122 — float_adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-22)¶
- Touches:
core/src/feature/vulkan/float_adm_vulkan.c — twin to adm_vulkan (T-GPU-DEDUP-21); 16-pipeline 2-D [stage][scale] array. The state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl. pipelines[0][0] aliases s->pl.pipeline; the other 15 entries are siblings via vmaf_vulkan_kernel_pipeline_add_variant().
- Invariants:
  - Variants destroyed before bundle.
  - pipelines[0][0] aliasing — the destroy loop must skip (stage=0, scale=0).
- Numerical contract: unchanged. Same float (_s suffix) primitives from adm_tools.c; same 5-element spec-constant tuple; same float partial accumulation reduced in double on the host.
0121 — adm_vulkan migrated to kernel_template + _add_variant (T-GPU-DEDUP-21)¶
- Touches:
core/src/feature/vulkan/adm_vulkan.c — the state collapses dsl + pipeline_layout + shader + desc_pool to VmafVulkanKernelPipeline pl; the VkPipeline pipelines[4][4] 2-D lookup is preserved so the per-stage dispatch path stays clean. pipelines[0][0] aliases s->pl.pipeline (the template's base); the other 15 entries are sibling pipelines via vmaf_vulkan_kernel_pipeline_add_variant().
- Invariants:
  - Variants destroyed before bundle (same rule as ssim_vulkan / psnr_hvs / vif / float_vif).
  - pipelines[0][0] aliasing — the destroy loop must skip (stage=0, scale=0) to avoid double-freeing the template's pipeline.
- Numerical contract: unchanged. Same shaders + 5-element spec-constant tuple (width, height, bpc, scale, stage) + push-constants.
- Rebase impact: low. Builds on top of PR #272.
0123 — ms_ssim_vulkan 2-bundle migration (T-GPU-DEDUP-23)¶
- Touches:
  - `core/src/feature/vulkan/ms_ssim_vulkan.c` — state collapses `decimate_dsl + decimate_pl + decimate_shader + ssim_dsl + ssim_pl + ssim_shader + desc_pool` (7 fields) to two bundles, `VmafVulkanKernelPipeline pl_decimate` + `pl_ssim`. Each bundle owns its own descriptor pool. The kernel has two distinct pipeline shapes (decimate = 2 SSBO bindings, ssim = 10 bindings), so two bundles is the minimum: `_add_variant()` only creates sibling pipelines under the same layout. `decimate_pipelines[0]` aliases `pl_decimate.pipeline` (the template's base = scale 0). The remaining `MS_SSIM_SCALES - 2` decimate variants (scales 1..3) are siblings created via `_add_variant()`. `ssim_pipeline_horiz[0]` aliases `pl_ssim.pipeline` (base = scale 0, pass 0). The other 9 entries (4× `ssim_pipeline_horiz` for scales 1..4, plus 5× `ssim_pipeline_vert` for scales 0..4) are variants.
- Invariant — variants destroyed before bundle. Same rule as ADR-0106 entry 0106: `close_fex` must destroy `decimate_pipelines[1..3]` and `ssim_pipeline_horiz[1..4]` + `ssim_pipeline_vert[0..4]` before calling `vmaf_vulkan_kernel_pipeline_destroy()` on `pl_decimate` / `pl_ssim`.
- Invariant — `[0]` aliasing destroy-skip. `decimate_pipelines[0]` and `ssim_pipeline_horiz[0]` must not be passed to `vkDestroyPipeline` in `close_fex`. `_destroy()` already releases them via `pl_decimate.pipeline` / `pl_ssim.pipeline`, and a double free is UB. The destroy loops in `close_fex` start at `i = 1` for decimate and skip `i == 0` for `ssim_pipeline_horiz`.
- Invariant — per-bundle descriptor pool. The shared `s->desc_pool` is gone. `alloc_descriptor_set` now takes a `const VmafVulkanKernelPipeline *bundle` and uses `bundle->desc_pool` + `bundle->dsl`. Per-frame `vkFreeDescriptorSets` calls must target the matching pool (`pl_decimate.desc_pool` for decimate sets, `pl_ssim.desc_pool` for ssim sets); mixing them is undefined behavior.
- Numerical contract: unchanged. Same shaders, spec constants, push constants, and dispatch order as before. The `float_ms_ssim` Netflix-pair smoke (576×324×48f) reports mean 0.963241, and ssim pyramid intermediate values are bit-identical to the pre-migration run.
- Rebase impact: low. Upstream Netflix has no Vulkan backend. Conflicts arise only against the parallel `T-GPU-DEDUP-{18..22}` PRs (#284–#288) on `CHANGELOG.md` / `docs/rebase-notes.md`; auto-resolve keeps both halves.
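
The teardown order and the `[0]` destroy-skip can be checked on a plain-C model. This is a minimal sketch, not the fork's `close_fex`. `Pipeline`, `Bundle`, `destroy_pipeline()` and `bundle_destroy()` are hypothetical stand-ins for `VkPipeline`, `VmafVulkanKernelPipeline`, `vkDestroyPipeline` and `vmaf_vulkan_kernel_pipeline_destroy()`. The destroy calls only record what they release, so the test can confirm that every handle is freed exactly once and that no variant outlives its bundle.

```c
#include <assert.h>
#include <stdbool.h>

enum { MS_SSIM_SCALES = 5 };

typedef int Pipeline;                    /* hypothetical handle; 0 = null */

typedef struct {
    Pipeline pipeline;                   /* base pipeline owned by the bundle */
    bool destroyed;
} Bundle;

typedef struct {
    Bundle pl_decimate, pl_ssim;
    Pipeline decimate_pipelines[MS_SSIM_SCALES - 1];  /* [0] aliases pl_decimate.pipeline */
    Pipeline ssim_pipeline_horiz[MS_SSIM_SCALES];     /* [0] aliases pl_ssim.pipeline */
    Pipeline ssim_pipeline_vert[MS_SSIM_SCALES];      /* all variants */
} State;

static int destroy_count[64];            /* number of releases per handle */
static bool variant_outlived_bundle;     /* set if a variant is freed after its bundle */

static void destroy_pipeline(const Bundle *owner, Pipeline p)
{
    if (p == 0)
        return;
    if (owner->destroyed)
        variant_outlived_bundle = true;
    destroy_count[p]++;
}

static void bundle_destroy(Bundle *b)
{
    destroy_count[b->pipeline]++;        /* the bundle releases its base pipeline */
    b->destroyed = true;
}

/* Hand out distinct handles; the [0] entries alias the bundle bases. */
static int init_state(State *s)
{
    int h = 1;
    s->pl_decimate.pipeline = h++;
    s->decimate_pipelines[0] = s->pl_decimate.pipeline;
    for (int i = 1; i < MS_SSIM_SCALES - 1; i++)
        s->decimate_pipelines[i] = h++;
    s->pl_ssim.pipeline = h++;
    s->ssim_pipeline_horiz[0] = s->pl_ssim.pipeline;
    for (int i = 1; i < MS_SSIM_SCALES; i++)
        s->ssim_pipeline_horiz[i] = h++;
    for (int i = 0; i < MS_SSIM_SCALES; i++)
        s->ssim_pipeline_vert[i] = h++;
    return h;                            /* one past the last handle */
}

static void close_fex(State *s)
{
    for (int i = 1; i < MS_SSIM_SCALES - 1; i++)   /* skip aliased [0] */
        destroy_pipeline(&s->pl_decimate, s->decimate_pipelines[i]);
    for (int i = 0; i < MS_SSIM_SCALES; i++) {
        if (i != 0)                                /* skip aliased [0] */
            destroy_pipeline(&s->pl_ssim, s->ssim_pipeline_horiz[i]);
        destroy_pipeline(&s->pl_ssim, s->ssim_pipeline_vert[i]);
    }
    bundle_destroy(&s->pl_decimate);               /* bundles last */
    bundle_destroy(&s->pl_ssim);
}
```

Starting the decimate loop at `i = 1` and guarding the horizontal loop with `i != 0` is what keeps the aliased base handles from being released twice.
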
### 0106 — Vulkan kernel template multi-pipeline + ssim/motion migration (T-GPU-DEDUP-7)
- Touches:
  - `core/src/vulkan/kernel_template.h` — new `vmaf_vulkan_kernel_pipeline_add_variant()` helper. It takes the base pipeline bundle (DSL / pipeline layout / shader / pool owned by `vmaf_vulkan_kernel_pipeline_create`) plus a partial `VkComputePipelineCreateInfo`, and produces a sibling `VkPipeline` that re-uses the same layout and shader. The base `_create` and `_destroy` entry points are unchanged; existing consumers (psnr, moment, ciede) keep working.
  - `core/src/feature/vulkan/motion_vulkan.c` — state collapses `VkPipeline pipelines[2]` to a single `VmafVulkanKernelPipeline pl`. The two pipelines were kept "for SYCL parity" but were functionally identical, because COMPUTE_SAD goes through push constants, not spec constants. `create_pipelines` / `close_fex` shrink to template-driven create + destroy.
  - `core/src/feature/vulkan/ssim_vulkan.c` — state becomes `VmafVulkanKernelPipeline pl` + `VkPipeline pipeline_vert`. Pass 0 (horizontal) is the template's base pipeline; pass 1 (vertical) is created via `_add_variant()`. `close_fex` destroys the variant first, then calls `vmaf_vulkan_kernel_pipeline_destroy()` on the bundle.
- Invariant — no spec-constant drift between base and variant. `_add_variant()` overwrites `sType` / `stage.sType` / `stage.stage` / `stage.module` / `layout` of the caller's `VkComputePipelineCreateInfo`, so the variant is guaranteed to share the base's shader and layout. Callers control the variant's spec constants via `pSpecializationInfo`. Reordering these overwrites lets a consumer accidentally bind a different shader module under the same layout, which is UB at descriptor-set time.
- Invariant — variant destroyed before bundle. In ssim, `close_fex` must call `vkDestroyPipeline(s->pipeline_vert)` before `vmaf_vulkan_kernel_pipeline_destroy(&s->pl)`. The bundle's `_destroy` releases the descriptor pool, and the descriptor sets that `vkAllocateDescriptorSets` issued against the variant pipeline's layout are only dropped cleanly when the variant pipeline is already gone.
- Numerical contract: unchanged. Both kernels run identical shaders, spec constants and push constants as before; only the Vulkan boilerplate that creates / destroys the pipeline scaffolding moved to a shared owner. The cross-backend parity gate at `places=4` holds: the Netflix-pair `float_ssim` smoke (576×324×48f) reports mean 0.863, identical to pre-migration.
- Rebase impact: low. The base pipeline-bundle helpers predate this change (PR #270 / #271); the new `_add_variant` is additive. Upstream Netflix has no Vulkan backend to conflict with.
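
The overwrite rule is easiest to see on a stripped-down model. The structs, the placeholder enum values and `add_variant_prepare()` below are hypothetical stand-ins, not the fork's `kernel_template.h` or the real Vulkan types. They show which fields the helper forces from the base bundle, and that `pSpecializationInfo` is the only one the caller keeps.

```c
#include <assert.h>

/* Placeholder tags; not the real VkStructureType / VkShaderStageFlagBits values. */
enum { STYPE_COMPUTE_PIPELINE = 1, STYPE_SHADER_STAGE = 2, STAGE_COMPUTE = 4 };

typedef struct { unsigned constant_id, value; } SpecInfo;

typedef struct {
    int sType;
    int stage;
    int module;                          /* shader module handle */
    const SpecInfo *pSpecializationInfo; /* caller-owned: selects the variant */
} StageInfo;

typedef struct {
    int sType;
    StageInfo stage;
    int layout;                          /* pipeline layout handle */
} PipelineCreateInfo;

typedef struct { int module, layout; } BaseBundle;

/* The bundle wins for everything that identifies shader + layout. */
static void add_variant_prepare(const BaseBundle *base, PipelineCreateInfo *ci)
{
    ci->sType = STYPE_COMPUTE_PIPELINE;
    ci->stage.sType = STYPE_SHADER_STAGE;
    ci->stage.stage = STAGE_COMPUTE;
    ci->stage.module = base->module;     /* same shader as the base */
    ci->layout = base->layout;           /* same layout as the base */
    /* pSpecializationInfo is deliberately left untouched. */
}
```

A caller that passes a stray module or layout still ends up with the base's handles. Its spec-constant pointer survives unchanged.
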
### 0111 — integer_ciede_cuda migrated to kernel_template (T-GPU-DEDUP-11)
- Touches:
  - `core/src/feature/cuda/integer_ciede_cuda.c` — the state's `CUstream + CUevent + CUevent + VmafCudaBuffer + host-pinned float*` quintet collapses to `VmafCudaKernelLifecycle lc` + `VmafCudaKernelReadback rb`. init / collect / close call the template's `lifecycle_init` / `readback_alloc` / `collect_wait` / `lifecycle_close` / `readback_free` helpers. submit keeps the pre-launch wait inline. This is intentional: ciede has no atomic, so the template's pre-launch memset is unnecessary.
- Numerical contract: unchanged. Pure CUDA-boilerplate consolidation. The host-side reduction in collect still uses the same `double` accumulator over per-block float partials, so `places=4` (ADR-0187) holds.
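
The reason for a `double` accumulator over float partials fits in a few lines. This sketch assumes the partials reach the host as a plain `float` array. The function names are hypothetical, not the fork's collect callback. With a float accumulator, adding `1.0f` to 2^24 is lost to rounding; the double accumulator keeps it.

```c
#include <assert.h>
#include <stddef.h>

/* Per-block partials arrive from the device as float; the host sums them in double. */
static double reduce_partials(const float *partials, size_t n_blocks)
{
    double acc = 0.0;
    for (size_t i = 0; i < n_blocks; i++)
        acc += partials[i];
    return acc;
}

/* For contrast: the same loop with a float accumulator. */
static float reduce_partials_float(const float *partials, size_t n_blocks)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n_blocks; i++)
        acc += partials[i];
    return acc;
}
```
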
### 0112 — integer_moment_cuda migrated to kernel_template (T-GPU-DEDUP-12)
- Touches:
  - `core/src/feature/cuda/integer_moment_cuda.c` — the state's stream/event/device-buffer/host-pinned quintet collapses to `VmafCudaKernelLifecycle lc` + `VmafCudaKernelReadback rb`. submit calls `vmaf_cuda_kernel_submit_pre_launch`, because the atomic counters require the device-side memset. init / collect / close call the matching template helpers.
- Numerical contract: unchanged. Same per-frame atomic accumulators (4× `uint64`), same `sums_host[i] / n_pixels` host division.
- Rebase impact: low. Upstream Netflix has no equivalent template; this consolidation is fork-local.
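
A host-side model of the four accumulators and the final division follows. The accumulator layout (sum and sum of squares for ref, then for dis) is a hypothetical assumption, and `accumulate()` is a serial CPU stand-in for the device atomics. Unsigned integer addition is associative, so the order in which GPU threads hit the atomics cannot change the sums. That is why atomics are safe for this kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: [0] sum ref, [1] sum ref^2, [2] sum dis, [3] sum dis^2. */
static void accumulate(const uint8_t *ref, const uint8_t *dis, size_t n, uint64_t sums[4])
{
    for (size_t i = 0; i < n; i++) {
        sums[0] += ref[i];
        sums[1] += (uint64_t)ref[i] * ref[i];
        sums[2] += dis[i];
        sums[3] += (uint64_t)dis[i] * dis[i];
    }
}

/* Host division, one moment per accumulator. */
static void moments_from_sums(const uint64_t sums_host[4], uint64_t n_pixels, double out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (double)sums_host[i] / (double)n_pixels;
}
```
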
### 0113 — integer_motion_v2_cuda migrated to kernel_template (T-GPU-DEDUP-13)
- Touches:
  - `core/src/feature/cuda/integer_motion_v2_cuda.c` — the stream/event pair + SAD device/host quintet collapses to `lc` + `rb`. The raw-pixel ping-pong `pix[2]` stays outside the bundle. submit keeps the memset on `pic_stream` inline rather than calling `submit_pre_launch`. The helper would move the memset to `lc.str`, which races with the kernel reading the accumulator. init / collect / close call the matching template helpers.
- Numerical contract: unchanged. Same D2D copy, same conditional kernel launch on frame ≥ 1, same host-side `min(score[i], score[i+1])` flush.
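
The host-side flush can be sketched as follows. `flush_motion2()` is a hypothetical name, and treating the last frame as keeping its own raw score is an assumption of this sketch. Frame 0 has a raw score of 0 here because the kernel only launches from frame 1 onward.

```c
#include <assert.h>
#include <stddef.h>

/* motion2[i] = min(score[i], score[i + 1]); the last frame has no successor
 * and keeps its own score (assumption). */
static void flush_motion2(const double *score, size_t n, double *motion2)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            motion2[i] = score[i] < score[i + 1] ? score[i] : score[i + 1];
        else
            motion2[i] = score[i];
    }
}
```
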
### 0114 — integer_ssim_cuda migrated to kernel_template (T-GPU-DEDUP-14)
- Touches:
  - `core/src/feature/cuda/integer_ssim_cuda.c` — the stream/event/partials device+host quintet collapses to `lc` + `rb`. Five intermediate float buffers (`h_ref_mu`, `h_cmp_mu`, `h_ref_sq`, `h_cmp_sq`, `h_refcmp`) stay outside the bundle. submit keeps the `cuStreamWaitEvent` + horiz + vert + DtoH chain inline. SSIM writes one float per block (no atomic), so the template's `submit_pre_launch` memset is unnecessary. init / collect / close use the matching template helpers.
- Numerical contract: unchanged. Same horiz-then-vert two-pass pipeline, same per-block float partial reduction in double on the host. `places=4` (matching the ciede_cuda precision pattern) holds.
- Rebase impact: low. Upstream Netflix has no equivalent; this is fork-added.
### 0115 — ms_ssim_cuda + psnr_hvs_cuda lifecycle migration (T-GPU-DEDUP-15)
- Touches:
  - `core/src/feature/cuda/integer_ms_ssim_cuda.c` — stream + 2-event lifecycle replaced with `VmafCudaKernelLifecycle lc`. The multi-level pyramid, SSIM intermediate and 3-partials buffers stay outside the template's single-pair readback bundle.
  - `core/src/feature/cuda/integer_psnr_hvs_cuda.c` — same shape; the 3-plane ref/dist/partials triples remain inline.
- Numerical contract: unchanged. The migration only affects init / close boilerplate. submit / collect dispatch and host reduction paths are untouched apart from three field renames: `s->str` → `s->lc.str`, `s->event` → `s->lc.submit`, and `s->finished` → `s->lc.finished`.
### 0116 — float_psnr/ansnr/motion cuda → kernel_template (T-GPU-DEDUP-16)
- Touches:
  - `core/src/feature/cuda/float_psnr_cuda.c` — stream/event/partials quintet → `lc` + `rb`; the input upload buffers `ref_in` / `dis_in` stay outside the bundle.
  - `core/src/feature/cuda/float_ansnr_cuda.c` — same shape; `rb` wraps the interleaved (sig, noise) partials.
  - `core/src/feature/cuda/float_motion_cuda.c` — same shape; `rb` wraps the SAD partials, while the `blur[2]` ping-pong stays outside.
- Numerical contract: unchanged. Same dispatch geometry, same reduction order. The cross-backend parity gate at the kernels' contracted precision (`places=3` per ADR-0192) holds.
### 0117 — float_adm + float_vif cuda lifecycle migration (T-GPU-DEDUP-17)
- Touches:
  - `core/src/feature/cuda/float_adm_cuda.c` — stream + 2-event lifecycle replaced with `VmafCudaKernelLifecycle lc`; the multi-stage DWT + CSF pipeline state stays outside the template's single-pair readback bundle.
  - `core/src/feature/cuda/float_vif_cuda.c` — same shape; the 4-level pyramid and per-scale (num, den) pairs remain inline.
- Numerical contract: unchanged. The migration only affects init / close stream-event boilerplate; submit / collect dispatch and host reduction paths are untouched apart from the field renames.
- Rebase impact: low. Upstream Netflix has no equivalent template; this is fork-added.
### 0107 — float_psnr_vulkan migrated to kernel_template (T-GPU-DEDUP-8)
- Touches:
  - `core/src/feature/vulkan/float_psnr_vulkan.c` — the state's `dsl + pipeline_layout + shader + pipeline + desc_pool` quintet is collapsed into a single `VmafVulkanKernelPipeline pl`. `create_pipelines` and `close_fex` shrink to template-driven create + destroy. No shader, spec-constant, or push-constant changes.
- Numerical contract: unchanged. The migration is a pure Vulkan-boilerplate consolidation. The cross-backend parity gate at `places=4` holds: the Netflix-pair smoke reports `float_psnr` mean 30.755 dB, identical to pre-migration.
### 0109 — float_ansnr_vulkan + motion_v2_vulkan migrated to kernel_template (T-GPU-DEDUP-9)
- Touches:
  - `core/src/feature/vulkan/float_ansnr_vulkan.c` — single-pipeline state collapses to `VmafVulkanKernelPipeline pl`; `create_pipelines` and `close_fex` shrink to template-driven create + destroy.
  - `core/src/feature/vulkan/motion_v2_vulkan.c` — same shape.
- Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. The cross-backend parity gate at the kernel's contracted precision holds: the Netflix-pair smoke reports `float_ansnr` mean 23.51 dB and `motion2_v2_score` mean 3.895, identical to pre-migration.
### 0110 — float_motion_vulkan migrated to kernel_template (T-GPU-DEDUP-10)
- Touches:
  - `core/src/feature/vulkan/float_motion_vulkan.c` — single-pipeline state collapses to `VmafVulkanKernelPipeline pl`; `create_pipelines` and `close_fex` shrink to template-driven create + destroy.
- Numerical contract: unchanged. Pure Vulkan-boilerplate consolidation. The Netflix-pair smoke reports `motion` mean 4.049 / `motion2` mean 3.894, identical to pre-migration.
- Rebase impact: low. Upstream Netflix has no Vulkan backend.
### 0108 — Bristol VI-Lab feasibility digest + BVI-CC ingest ADR (Draft)
- Touches:
  - `docs/research/0046-bristol-vi-lab-feasibility.md` (new) — nine-dataset survey + use-case fit + effort estimate.
  - `docs/adr/0241-bristol-bvi-cc-ingest.md` (new, Status: Draft) — proposal to ingest BVI-CC as the second tiny-AI corpus.
  - `docs/adr/README.md` — index row for ADR-0241.
  - `CHANGELOG.md` — Added entry.
- Numerical contract: not applicable (docs-only).
- Rebase impact: none. Pure research deliverables; upstream Netflix has no equivalent surface.
### 0094 — Vulkan VkImage import v2 async pending-fence (T7-29 part 4 / ADR-0251)
- ADR: ADR-0251; predecessor ADR-0186.
- Touches:
  - `core/src/vulkan/import.c` — full rewrite of the submission path. The single-fence `submit_and_wait` becomes per-slot `submit_to_slot` + `drain_slot_fence`. The new `slot_alloc` / `slot_release` helpers materialise / tear down a ring slot (staging pair + cmd buffer + fence). `vmaf_vulkan_import_image` indexes into the ring by `frame_index % ring_size`, and `vmaf_vulkan_wait_compute` drains every outstanding fence. `vmaf_vulkan_state_build_pictures` waits on the slot's fence before exposing the host pointer. Public-API signatures are unchanged.
  - `core/src/vulkan/vulkan_internal.h` — new `struct VmafVulkanImportSlot`. `VmafVulkanImportSlots` becomes a fixed-capacity `VmafVulkanImportSlot ring[VMAF_VULKAN_RING_MAX]` plus geometry + `ring_size`. Two new defines: `VMAF_VULKAN_RING_DEFAULT` (4) and `VMAF_VULKAN_RING_MAX` (8). `VmafVulkanState` gains `requested_ring_size`.
  - `core/src/vulkan/common.c` — `vmaf_vulkan_state_init` and `_state_init_external` set `requested_ring_size = VMAF_VULKAN_RING_DEFAULT`.
  - `core/test/test_vulkan_async_pending_fence.c` (new, contract smoke for the v1 → v2 swap).
  - `core/test/meson.build` — registers the new test under the existing `enable_vulkan` guard.
  - `core/src/vulkan/AGENTS.md` (new) — pins the three rebase-sensitive ring invariants.
  - `docs/adr/0251-vulkan-async-pending-fence.md` (new), `docs/research/0042-vulkan-async-pending-fence.md` (new), `docs/api/gpu.md`, `docs/backends/vulkan/overview.md`, `CHANGELOG.md`, `docs/rebase-notes.md`.
  - `ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch` — unchanged. The v2 ring is fully internal to `VmafVulkanState` and the public ABI stays byte-identical, so the filter consumes the new path transparently.
- Invariant 1 — fixed ring depth at first import. `lazy_alloc_ring` is the only place that materialises the ring. Once allocated, the depth never changes for the lifetime of the `VmafVulkanState`. Any caller that needs a different depth has to free + re-init. The geometry-pinning contract from v1 (ADR-0186) is preserved verbatim.
- Invariant 2 — `vkResetFences` only after `VK_SUCCESS` from `vkWaitForFences`. The sole reset path lives in `drain_slot_fence`, and `fence_in_flight` flips back to 0 only after the wait succeeds. A `-EIO` from the wait propagates up without resetting, so a retry correctly re-waits rather than silently moving on.
- Invariant 3 — `state_free` drains before destroying. `vmaf_vulkan_import_slots_free` walks the ring and calls `drain_slot_fence` on every in-flight slot. It then issues one belt-and-braces `vkQueueWaitIdle`, since any feature kernel that submitted on the same queue may still be running. Reordering this triggers validation-layer "destroying in-use object" errors.
- Numerical contract: unchanged. Async submission only changes when the host can read the staging buffer, not which bytes the GPU writes. The cross-backend parity gate (`scripts/ci/cross_backend_parity_gate.py`, `places=4`) holds.
- Memory delta: the staging arena scales `1 → ring_size` per direction. At default depth and 1080p 8-bit Y, the per-state host-visible footprint grows from ~4 MiB to ~16 MiB. Documented in ADR-0251 §Consequences.
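
Ring indexing, Invariant 2 and the memory arithmetic can be modelled without Vulkan. In this minimal sketch, `Slot`, `drain_slot()` and the `wait_ok` / `wait_fail` stubs are hypothetical stand-ins for the fork's slot struct, `drain_slot_fence` and `vkWaitForFences`. `staging_bytes()` assumes the staging pair is one ref plus one dist luma buffer per slot. Under that assumption it reproduces the ~4 MiB → ~16 MiB figure: 2 × 1920 × 1080 B ≈ 3.96 MiB, times 4 slots ≈ 15.8 MiB.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum { RING_DEFAULT = 4, RING_MAX = 8 };  /* values of VMAF_VULKAN_RING_DEFAULT / _MAX */

typedef struct {
    bool fence_in_flight;
    int resets;                 /* simulated vkResetFences calls */
} Slot;

typedef struct {
    Slot ring[RING_MAX];
    size_t ring_size;           /* fixed once the ring is allocated (Invariant 1) */
} Slots;

typedef int (*WaitFn)(void);    /* simulated vkWaitForFences: 0 = success */

static int wait_ok(void) { return 0; }
static int wait_fail(void) { return 1; }

/* Reset only after a successful wait; on failure the slot stays in flight,
 * so a retry waits again instead of moving on (Invariant 2). */
static int drain_slot(Slot *s, WaitFn wait)
{
    if (!s->fence_in_flight)
        return 0;
    if (wait() != 0)
        return -EIO;
    s->resets++;
    s->fence_in_flight = false;
    return 0;
}

static Slot *slot_for_frame(Slots *r, size_t frame_index)
{
    return &r->ring[frame_index % r->ring_size];
}

/* Host-visible staging bytes, assuming one ref + one dist buffer per slot. */
static size_t staging_bytes(size_t w, size_t h, size_t bytes_per_px, size_t ring_size)
{
    return 2 * w * h * bytes_per_px * ring_size;
}
```
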
### 0090 — cambi_vulkan extractor (T7-36 / ADR-0210)
- ADR: ADR-0210; predecessor ADR-0205.
- Touches:
  - `core/src/feature/vulkan/cambi_vulkan.c` — replaces the spike scaffold's `init_stub` / `extract_stub` / `close_stub` triple with the full Vulkan-aware lifecycle.
  - `core/src/feature/vulkan/shaders/cambi_preprocess.comp` (new) and `cambi_mask_dp.comp` (new; unified row-SAT / col-SAT / threshold-compare selected by the `PASS=0/1/2` spec constant).
  - `core/src/feature/cambi.c` — appends a small block of public trampolines (`vmaf_cambi_*`) at the bottom of the file that thinly wrap the file-static helpers. No upstream static function is renamed or moved. The entire upstream body of `cambi.c` above the trampolines stays byte-identical, which keeps Netflix sync straightforward.
  - `core/src/feature/cambi_internal.h` (new) — internal-only header exposing `vmaf_cambi_calculate_c_values`, `vmaf_cambi_get_spatial_mask`, etc., to the GPU twin.
  - `core/src/vulkan/meson.build` — registers the 5 cambi shaders in `vulkan_shader_sources[]` and `cambi_vulkan.c` in `vulkan_sources`.
  - `core/src/feature/feature_extractor.c` — adds the extern decl + registry entry for `vmaf_fex_cambi_vulkan` under `#if HAVE_VULKAN`.
  - `scripts/ci/cross_backend_vif_diff.py` — `cambi` row in `FEATURE_METRICS`, so the cross-backend gate runs at `places=4` against the CPU baseline.
  - `docs/adr/0210-cambi-vulkan-integration.md`, `docs/research/0032-cambi-vulkan-integration.md`, `docs/backends/vulkan.md`, `CHANGELOG.md`.
- Invariant 1 — bit-exactness by construction. Every GPU phase is integer arithmetic (`uint16` derivative, `int32` SAT, `>` compare, stride-2 gather, 3-element `mode3` lookup). The readback into the host `VmafPicture` pair is byte-identical to what the CPU would have written. The host residual then runs the unmodified CPU `calculate_c_values` + spatial pooling on those buffers. A rebase that introduces float arithmetic into one of these GPU phases will silently break `places=4` and must be caught at the cross-backend gate. An example would be a future Netflix change that adds a bilinear interpolation step to the derivative kernel.
- Invariant 2 — `cambi_internal.h` signatures must stay in lock-step with `cambi.c`'s file-static helpers. The Vulkan twin calls `vmaf_cambi_calculate_c_values`, which trampolines to the file-static `calculate_c_values`. Any signature change to the latter (extra parameters, type changes) must update the trampoline + header in the same PR, or the GPU build breaks.
- On upstream sync: upstream sometimes renames `cambi.c`'s file-static helpers (e.g., a Netflix tidy-up could rename `decimate` → `cambi_decimate`). When rebasing, search the tail of `cambi.c` for the trampoline block. Its calls into the file-static helpers must match the upstream symbol names: `get_spatial_mask`, `decimate`, `filter_mode`, `calculate_c_values`, `spatial_pooling`, `weight_scores_per_scale`, `get_pixels_in_window`, `increment_range`, `decrement_range`, `get_derivative_data_for_row` and `cambi_preprocessing`. Update the trampoline bodies if upstream renames. The signatures should not need to change, because the trampolines already take the function-pointer-typedef form (`VmafRangeUpdater` etc.).
- Re-test on rebase: `python3 scripts/ci/cross_backend_vif_diff.py --backend vulkan --feature cambi --ref testdata/ref_576x324_48f.yuv --dist testdata/dis_576x324_48f.yuv --width 576 --height 324 --pixel-format 420 --bitdepth 8 --frames 48`. It should emit `places=4 PASS` with `max_abs_diff = 0.0`. If it diverges, bisect the GPU phases: read back the individual buffers (`image_buf` / `mask_buf` / `deriv_buf`) and compare each against the CPU's in-place `pic` plane after the equivalent stage.
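
The trampoline pattern looks roughly like this. Every name here is hypothetical: `example_update`, `example_increment`, `ExampleRangeUpdater` and their `vmaf_cambi_example_*` wrappers only illustrate the shape. They are not the fork's helpers or `VmafRangeUpdater`. The upstream helpers stay `static` and untouched. The appended wrappers have public linkage and the same signature, so an upstream rename only changes the single call inside each wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* --- cambi_internal.h (sketch; all names hypothetical) --- */
typedef void (*ExampleRangeUpdater)(uint16_t *hist, int lo, int hi);
void vmaf_cambi_example_update(uint16_t *hist, int lo, int hi, ExampleRangeUpdater upd);
void vmaf_cambi_example_increment(uint16_t *hist, int lo, int hi);

/* --- cambi.c, upstream body: helpers stay file-static --- */
static void example_increment(uint16_t *hist, int lo, int hi)
{
    for (int v = lo; v <= hi; v++)
        hist[v]++;
}

static void example_update(uint16_t *hist, int lo, int hi, ExampleRangeUpdater upd)
{
    upd(hist, lo, hi);
}

/* --- cambi.c, fork-appended trampoline block at the end of the file --- */
/* Public linkage, identical signature, no logic of its own. */
void vmaf_cambi_example_update(uint16_t *hist, int lo, int hi, ExampleRangeUpdater upd)
{
    example_update(hist, lo, hi, upd);
}

void vmaf_cambi_example_increment(uint16_t *hist, int lo, int hi)
{
    example_increment(hist, lo, hi);
}
```
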
The pre-ADR-0108 fork-local PRs are summarised by workstream rather than per-PR. Future PRs add entries individually.
### 0085 — Upstream c70debb1 partial port (adm_csf + barten_csf tests)
- No ADR. Pure upstream cherry-pick per the ADR-0108 carve-out ("pure upstream syncs and `port-upstream-commit` PRs are exempt").
- Upstream source: `c70debb1` (Kyle Swanson, 2026-04-28): "libvmaf/test: port new adm/vif/speed tests". The audit row that flagged the gap is T-NEW-2 in the 2026-04-29 quarterly upstream-backlog re-audit (PR #205).
- Touches (additive only):
  - `core/src/feature/adm_csf_tools.h` — new header (verbatim from upstream); declares the inline `adm_native_csf` helper (DLM-paper CSF) used by the new `test_adm_csf` unit.
  - `core/test/test_adm_csf.c` — new unit (verbatim from upstream); 2 `mu_assert` cases on `adm_native_csf(3, 3.0, 1080, {0, 45})`.
  - `core/test/test_barten_csf.c` — new unit (verbatim from upstream); 23 `mu_assert` cases over `barten_rod_cone_sens`, `barten_mtf`, `barten_csf`, `linear_interpolate` and `barten_watson_blend_csf` (all symbols already on the fork).
  - `core/test/meson.build` — registers the two new executables and adds `test('test_adm_csf', ...)` and `test('test_barten_csf', ...)`.
  - `CHANGELOG.md` Unreleased § Changed.
- Deliberate scope cuts (the upstream commit's other halves are not portable verbatim):
  - `test_vif_tools.c` — depends on upstream symbols the fork lacks: `NUM_KERNELSCALES`, the 21-entry `valid_kernelscales` table, `vif_validate_kernelscale`, `vif_get_filter_size`, `vif_get_filter` and `speed_get_antialias_filter`. It also needs a `[NUM_KERNELSCALES][5][65]` filter table that the fork's `vif_filter1d_table_s[11][4][65]` does not match. Per Research-0024 Strategy E, the fork deliberately diverges from the upstream `vif` runtime-helper chain to preserve the ADR-0138 / 0139 / 0142 / 0143 SIMD bit-exactness contract. Porting this test requires porting the runtime helpers first.
  - `test_speed_chroma.c` — `#include`s `feature/speed.c` directly, but the fork has no SpEED extractor (`feature/speed.c` does not exist). Pairs with audit row T-NEW-1 (port the SpEED extractor wholesale, or absorb it into the tiny-AI speed metric).
- Invariants (rebase-relevant):
  - The new `adm_csf_tools.h` header is wholly additive. It does not conflict with the fork's existing non-inline `adm_csf_s` helper in `adm_tools.h` (different signature, different translation units).
  - The two new tests do not depend on Netflix golden YUVs; they evaluate the closed-form CSF math directly. No golden-data interaction.
- On upstream sync: a future port of the upstream `vif` runtime-helper chain (Research-0024 Strategy A reversal) or of the SpEED extractor (T-NEW-1) unlocks the deferred halves of this commit. Until then, fork-side `test_vif_tools.c` / `test_speed_chroma.c` stay absent.
- Re-test on rebase:
  ```bash
  meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false
  ninja -C build-cpu test_adm_csf test_barten_csf
  meson test -C build-cpu test_adm_csf test_barten_csf
  ```
### 0084 — Embedded MCP server scaffold (T5-2, ADR-0209)
- ADR: ADR-0209 (audit-first scaffold) on top of the ADR-0128 governance + Research-0005 design.
- Upstream source: fork-local. Netflix/vmaf has no embedded MCP server and no plans to add one, since the workflow is agent-tooling-specific and well outside upstream's library scope.
- Touches:
  - `core/include/libvmaf/libvmaf_mcp.h` — new public header.
  - `core/include/core/meson.build` — new `if get_option('enable_mcp')` install branch.
  - `core/src/mcp/` — new directory: `mcp.c` (stub TU) + `meson.build` (exposes `mcp_sources` + `mcp_defines`).
  - `core/src/meson.build` — new `is_mcp_enabled` guard + `subdir('mcp')` block; `mcp_sources` is threaded into the `library('vmaf', ...)` source list alongside `dnn_sources`.
  - `core/test/meson.build` — new `if get_option('enable_mcp')` block wiring `test_mcp_smoke`.
  - `core/test/test_mcp_smoke.c` — new 12-sub-test smoke.
  - `core/meson_options.txt` — new `enable_mcp` umbrella + three sub-flags (all default `false`).
- Invariant: every public entry point in `libvmaf_mcp.h` (`vmaf_mcp_init` / `_start_sse` / `_start_uds` / `_start_stdio` / `_stop` / `_close`) returns `-ENOSYS` (or `-EINVAL` on bad arguments) until the T5-2b runtime PR lands. The smoke pins this contract: a runtime PR that flips a return code without flipping the smoke expectation regresses the gate.
- On upstream sync: zero interaction with upstream files. The change is a wholly additive directory plus boolean build flags. The `subdir('mcp')` insertion in `core/src/meson.build` lives next to the existing `subdir('dnn')` / Vulkan blocks. An upstream conflict in that area would be confined to those few lines and is mechanical to resolve.
- Re-test on rebase:
  ```bash
  meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_mcp=false
  ninja -C build-cpu && meson test -C build-cpu   # baseline still green
  meson setup --reconfigure build-cpu libvmaf -Denable_mcp=true \
    -Denable_mcp_sse=true -Denable_mcp_uds=true -Denable_mcp_stdio=true
  ninja -C build-cpu
  meson test -C build-cpu test_mcp_smoke          # 12/12 sub-tests pass
  ```
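
The stub contract the smoke pins can be sketched as follows. `ExampleMcpServer` and `example_mcp_start_uds()` are hypothetical. The real signatures in `libvmaf_mcp.h` may differ. The point is the ordering: argument validation already returns `-EINVAL`, and everything else returns `-ENOSYS` until the runtime lands.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct { int unused; } ExampleMcpServer;   /* hypothetical handle */

static int example_mcp_start_uds(ExampleMcpServer *srv, const char *path)
{
    if (srv == NULL || path == NULL || path[0] == '\0')
        return -EINVAL;        /* bad arguments are rejected even in the scaffold */
    return -ENOSYS;            /* no runtime until T5-2b */
}
```
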
### 0065 — T7-37 Netflix bench rerun + `docs/benchmarks.md` TBD fill
- No ADR. Empirical fill of pre-existing `TBD` cells; no new decision. The bench-script fixes that this rerun depends on shipped earlier:
  - PR #169: `libvmaf/AGENTS.md` backend-engagement foot-guns.
  - PR #170: `--backend cuda` actually engages CUDA.
  - PR #171: `testdata/bench_all.sh` uses correct flags.

  The Vulkan header install for SDK consumers is PR #175.
- Touches (additive only):
  - `docs/benchmarks.md` — every `TBD` cell replaced with measured numbers. The hardware-profile table is updated to the `ryzen-4090-arc` host the rerun was performed on. The "How to reproduce" section now documents fixture acquisition for the gitignored BBB 4K 200-frame pair.
  - `CHANGELOG.md` Unreleased § Changed entry.
- Invariants (rebase-relevant): none. The numbers are tied to fork commit `41301496` and the `ryzen-4090-arc` profile. An upstream rebase that changes feature pipelines would invalidate the table but not break parsing.
- On upstream sync: zero interaction. Pure docs.
- Re-test on rebase: run `bash testdata/bench_all.sh` after a fresh fork build. It confirms that the bench script still drives all four backends and that the per-row metrics-key counts (CPU=15, CUDA=12, SYCL/Vulkan=34) still distinguish them. If the counts collapse to one value, the new upstream broke a backend dispatcher silently.
### 0050 — float_adm_cuda + float_adm_sycl extractors (ADR-0202)
- ADR: ADR-0202
- Touches:
  - `core/src/feature/cuda/float_adm/float_adm_score.cu` (new)
  - `core/src/feature/cuda/float_adm_cuda.{c,h}` (new)
  - `core/src/feature/sycl/float_adm_sycl.cpp` (new)
  - `core/src/meson.build` — three changes:
    1. a new `float_adm_score` entry in `cuda_cu_sources`;
    2. a new `cuda_cu_extra_flags` dict that threads `--fmad=false` + `-Xcompiler=-ffp-contract=off` into the `float_adm_score` fatbin only;
    3. a new SYCL source in `sycl_feature_sources`.
  - `core/src/feature/feature_extractor.c` (extern decls + list entries for `vmaf_fex_float_adm_cuda` / `vmaf_fex_float_adm_sycl` under `#if HAVE_CUDA` / `#if HAVE_SYCL`).
- Invariant 1 — `--fmad=false` for the float_adm fatbin only. The angle-flag dot product (`ot_dp = oh*th + ov*tv`) and the cube reductions (`xa*xa*xa`, `csf_o*csf_o*csf_o`) require IEEE-754 add/mul ordering to match the GLSL `precise` qualifier in `float_adm.comp`. NVCC's default `-fmad=true` fuses these and drifts past `places=4` at scale 3 / adm2. The integer ADM kernels share `cuda_flags` but use `int64` accumulators, where FMA is irrelevant, so keep the FMA-on default for them.
- Invariant 2 — parent-LL dimension trap. Stage 0 at `scale > 0` reads the parent's LL band. The mirror/clamp bounds are therefore `scale_w/h[scale]` (= parent's LL output dims = current scale's input dims), NOT `scale_w/h[scale - 1]` (= parent's full image dims). Both `float_adm_cuda.c` and `float_adm_sycl.cpp` cite this inline. Do not "simplify" by using the off-by-one neighbour.
- Re-test:
  ```bash
  CXX=icpx CC=icx meson setup build-cs -Denable_cuda=true \
    -Denable_sycl=true -Denable_vulkan=enabled \
    -Denable_float=true \
    -Dsycl_compiler=/opt/intel/oneapi/compiler/latest/bin/icpx
  ninja -C build-cs
  python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build-cs/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --feature float_adm \
    --backend cuda --places 4
  # Same with --backend sycl on a host with a SYCL device.
  # Both must report 0/N mismatches at places=4.
  ```
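
Invariant 2 is easy to get wrong, so here is a small model of the per-scale bounds. Two assumptions apply. First, each scale's input is the parent's LL band with dimensions halved and rounded up. Second, `build_scale_dims()` and `clamp_col()` are hypothetical names. The sketch clamps rather than mirrors so the focus stays on which bound is used.

```c
#include <assert.h>

enum { ADM_SCALES = 4 };

/* Input dims per scale; scale s > 0 consumes the LL band of scale s - 1
 * (round-up halving is an assumption of this sketch). */
static void build_scale_dims(int w, int h, int scale_w[ADM_SCALES], int scale_h[ADM_SCALES])
{
    scale_w[0] = w;
    scale_h[0] = h;
    for (int s = 1; s < ADM_SCALES; s++) {
        scale_w[s] = (scale_w[s - 1] + 1) / 2;
        scale_h[s] = (scale_h[s - 1] + 1) / 2;
    }
}

/* Bound a column read by stage 0 at `scale` with scale_w[scale] (the parent's
 * LL output), never scale_w[scale - 1] (the parent's full image). */
static int clamp_col(int x, int scale, const int scale_w[ADM_SCALES])
{
    int bound = scale_w[scale];
    if (x < 0)
        return 0;
    if (x >= bound)
        return bound - 1;
    return x;
}
```

With the 576×324 test pair, scale 1 reads a 288-wide LL band. An index of 300 must clamp to 287; using the parent's 576-wide bound would let it through.
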
### 0049 — float_adm_vulkan extractor (ADR-0199)
- ADR: ADR-0199
- Touches:
  - `core/src/feature/vulkan/float_adm_vulkan.c` (new)
  - `core/src/feature/vulkan/shaders/float_adm.comp` (new)
  - `core/src/vulkan/meson.build` (adds the .comp shader and the new .c source)
  - `core/src/feature/feature_extractor.c` (extern decl + list entry under `#if HAVE_VULKAN`)
  - `scripts/ci/cross_backend_vif_diff.py` (`float_adm` entry in `FEATURE_METRICS`)
  - `.github/workflows/tests-and-quality-gates.yml` (lavapipe `float_adm` step at `places=4`)
- Invariant: the float_adm GPU port uses the `2 * sup - idx - 1` mirror form on both axes. This matches both the scalar `adm_dwt2_s` and the AVX2 `float_adm_dwt2_avx2`, which consume the same `dwt2_src_indices_filt_s` index buffer. It is intentionally different from float_vif's GPU mirror (ADR-0197), which uses `-2` because float_vif's AVX2 path takes a different code branch. Do not "fix" the asymmetry by analogy with float_vif.
- Re-test:
  ```bash
  meson setup build-vk -Denable_vulkan=enabled -Denable_cuda=false \
    -Denable_sycl=false
  ninja -C build-vk
  meson test -C build-vk
  VK_LOADER_DRIVERS_SELECT='*lvp*' python3 \
    scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary build-vk/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 --feature float_adm --places 4
  ```
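
The two mirror forms differ by one sample at the upper edge. The `-1` form repeats the edge sample (half-sample symmetric), while the `-2` form reflects around it without repeating it (whole-sample symmetric). The hypothetical helpers below model only the upper edge, which is the only side the note above specifies.

```c
#include <assert.h>

/* Upper-edge mirror only; callers guarantee 0 <= idx <= 2 * sup - 2. */
static int mirror_adm(int idx, int sup)   /* half-sample symmetric: edge repeated */
{
    return idx < sup ? idx : 2 * sup - idx - 1;
}

static int mirror_vif(int idx, int sup)   /* whole-sample symmetric: edge not repeated */
{
    return idx < sup ? idx : 2 * sup - idx - 2;
}
```

With `sup = 8`, index 8 maps to 7 under the ADM form but to 6 under the VIF form. Swapping one form for the other therefore shifts every out-of-range tap.
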
### 0083 — SSIMULACRA 2 Vulkan kernel (ADR-0201)
- ADR: ADR-0201
- Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor, so this is a fully fork-local feature.
- Touches:
  - `core/src/feature/vulkan/ssimulacra2_vulkan.c` (new file).
  - `core/src/feature/vulkan/shaders/ssimulacra2_xyb.comp`, `ssimulacra2_blur.comp`, `ssimulacra2_mul.comp`, `ssimulacra2_ssim.comp` (4 new shader files).
  - `core/src/vulkan/meson.build` — added 4 shaders to `vulkan_shader_sources` and 1 source to `vulkan_sources`. Also added all 4 ssimulacra2 shaders to `psnr_hvs_strict_shaders` (the `-O0` strict-mode list, which keeps its legacy name).
  - `core/src/feature/feature_extractor.c` — registered `vmaf_fex_ssimulacra2_vulkan` in the Vulkan branch of the extractor list (between `psnr_hvs_vulkan` and the CUDA block).
  - `scripts/ci/cross_backend_vif_diff.py` — added `ssimulacra2` to `FEATURE_METRICS`.
- Rebase impact: low. The change is fully additive. The only upstream-shared file modified is `feature_extractor.c`'s registry array, which grows with every new extractor and is not a rebase pain point.
- Verification command:
  ```bash
  meson setup core/build-vk-ss2 \
    -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false \
    libvmaf
  ninja -C core/build-vk-ss2 tools/vmaf
  python3 scripts/ci/cross_backend_vif_diff.py \
    --vmaf-binary core/build-vk-ss2/tools/vmaf \
    --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
    --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --width 576 --height 324 \
    --feature ssimulacra2 --backend vulkan --places 1
  # expected: max_abs_diff ≈ 1.59e-2, 0/48 mismatches at places=1
  ```
- Follow-ups:
  - CUDA + SYCL twins (batch 3 parts 7b + 7c per ADR-0192).
  - Performance: re-bin multiple rows / columns per WG in the IIR blur (currently `local_size = 1`, one row/col per WG for correctness).
  - Optional: rename `psnr_hvs_strict_shaders` to `strict_shaders` in `core/src/vulkan/meson.build` (cosmetic; out of scope for this PR).
### 0001 — SIMD bit-identical reductions for float ADM
- Workstream PRs: #18, commits `24c88a32`, `f082cfd3`.
- Touches: `core/src/feature/integer_adm.c`, `core/src/feature/float_adm.c`, `core/src/feature/x86/adm_avx2.c`, `core/src/feature/x86/adm_avx512.c`, `core/src/feature/arm64/adm_neon.c`, and the test expectations in upstream's `python/test/feature_extractor_test.py`.
- Invariant: `sum_cube` and `csf_den_scale` accumulate cubed values in double precision (via `_mm256_cvtps_pd` / `_mm512_cvtps_pd`) in scalar, AVX2, AVX-512, and NEON. Upstream accumulates in float, which produces ~8e-5 drift between scalar and SIMD. Test expectations were tightened to match the double-precision path. An upstream-side accumulator change would re-introduce the drift and break the tightened assertions.
- Re-test: `meson test -C build --suite=fast && python -m pytest python/test/feature_extractor_test.py -k adm`.
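
Here is a scalar illustration of why the accumulator width matters, using hypothetical function names. The sketch widens each `float` to `double` before cubing, as a scalar analogue of `_mm256_cvtps_pd`. Whether the fork widens before or after the cube is an assumption here. Summing the cubes of 1..1000 is exact in double (250,500,250,000 < 2^53). A float accumulator cannot even represent that value, because float spacing at that magnitude is 16384.

```c
#include <assert.h>
#include <stddef.h>

/* Upstream-style: cube and accumulate in float. */
static float sum_cube_float(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i] * x[i];
    return acc;
}

/* Fork-style sketch: widen each value to double, then cube and accumulate in double. */
static double sum_cube_double(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = (double)x[i];
        acc += v * v * v;
    }
    return acc;
}
```
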
### 0002 — CUDA ADM decouple-inline buffer elimination
- Workstream PRs: commit `787e3382`.
- Touches: `core/src/feature/cuda/integer_adm_cuda.cu`, `core/src/feature/cuda/adm_decouple_inline.cuh` (new), `core/src/feature/cuda/meson.build`. Upstream's `adm_decouple.cu` is no longer compiled in the fork.
- Invariant: the CSF and CM CUDA kernels read the `ref` / `dis` DWT2 buffers directly and compute `decouple_r` / `decouple_a` inline via `__device__` helpers in `adm_decouple_inline.cuh`. The 6 intermediate buffers (`decouple_r`, `decouple_a`, `csf_a` × {scale-0 int16, scales 1-3 int32}) and the standalone `adm_decouple.cu` source are intentionally removed. This saves ~107 MB of GPU memory at 4K. An upstream change to `adm_decouple.cu` will look orphaned, and a literal merge would re-introduce the buffer allocations.
- Re-test: `meson setup build -Denable_cuda=true && ninja -C build && meson test -C build --suite=cuda`.
### 0003 — SYCL backend (USM pool / D3D11 import / vmaf_sycl_* API)
- Workstream PRs: #33, #35, #5 (initial scaffolding), and the picture-pool deadlock fix that landed via #32.
- Touches: `core/include/libvmaf/libvmaf_sycl.h`, `core/src/sycl/`, `core/src/feature/sycl/`, `core/src/libvmaf.c` (SYCL public-API entry points), `meson_options.txt` (`enable_sycl`).
- Invariant: `vmaf_sycl_preallocate_pictures` constructs a real `VmafSyclPicturePool` honoring `VmafSyclPicturePreallocationMethod` (`NONE` / `DEVICE` / `HOST`), and `vmaf_sycl_picture_fetch` dispatches to the pool when configured. The whole SYCL tree is fork-local and has no upstream counterpart, so upstream changes to `core/src/libvmaf.c` near the SYCL entry-point block are likely to conflict. Picture-pool error paths in `vmaf_read_pictures` (`libvmaf.c`) must use `goto cleanup;` rather than `return err;`. Otherwise they leak ref/dist pictures into the live-picture set, which is the always-on-pool deadlock fixed in #32 (see ADR-0104). See ADR-0101, ADR-0103, ADR-0104.
- Re-test: `meson setup build -Denable_sycl=true && ninja -C build && meson test -C build --suite=sycl` (requires oneAPI / icpx).
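
The `goto cleanup;` rule follows the usual C unwind pattern. This is a hypothetical, pool-agnostic model, not `vmaf_read_pictures` itself. `Pool`, `fetch()`, `unref()` and the `fail_validation` switch are illustrative names. `live` counts pictures checked out of the pool. An early `return err;` after both fetches would leave `live` at 2 forever, which models the leak.

```c
#include <assert.h>
#include <errno.h>

typedef struct { int live; } Pool;   /* pictures currently checked out */

static int fetch(Pool *p) { p->live++; return 0; }
static void unref(Pool *p) { p->live--; }

static int read_pictures(Pool *pool, int fail_validation)
{
    int err = fetch(pool);            /* ref picture */
    if (err)
        return err;                   /* nothing acquired yet: plain return is fine */
    err = fetch(pool);                /* dist picture */
    if (err)
        goto cleanup_ref;

    if (fail_validation) {
        err = -EINVAL;
        goto cleanup;                 /* NOT `return err;`: that would leak both pictures */
    }
    return 0;                         /* success: ownership passes downstream */

cleanup:
    unref(pool);                      /* dist */
cleanup_ref:
    unref(pool);                      /* ref */
    return err;
}
```
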
### 0004 — DNN runtime + tiny-AI surfaces
- Workstream PRs: #5, #8, #21, #22, #23, #31, #34, plus the pre-numbered DNN feat commits (`9b985946`, `1e5336d3`, `d122b721`).
- Touches: `core/include/libvmaf/dnn.h`, `core/src/dnn/`, `core/src/feature/feature_lpips.c`, `model/tiny/`, `meson_options.txt` (`enable_onnxruntime`).
- Invariants:
  - Ordered EP selection (CUDA → DML → CPU) with graceful fallback (ADR-0102).
  - `fp16_io` does a host-side fp32↔fp16 cast on the scoring path.
  - `VMAF_TINY_MODEL_DIR` enforces a path jail on model load (PR #31).
  - The runtime op-allowlist (PR #21) walks the ONNX graph, rejects unknown ops, and bounds Loop/If `trip_count` at 1024 (ADR-0036/0107).

  The DNN tree is fork-local and upstream has no DNN code yet, so conflicts here are unlikely. The `meson_options.txt` and `core/src/meson.build` blocks near the DNN flag may still collide.
- Re-test: `meson setup build -Denable_onnxruntime=true && ninja -C build && meson test -C build --suite=dnn`.
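
The EP fallback order and the trip-count bound can be sketched together. Everything here is hypothetical: the enum, the `EpAvailable` probe and both functions. In the fork, availability comes from ONNX Runtime session setup. Treating 1024 as an inclusive limit and rejecting negative counts are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EP_CUDA, EP_DML, EP_CPU } ExecProvider;

typedef bool (*EpAvailable)(ExecProvider ep);   /* hypothetical availability probe */

/* First available provider wins; CPU is the terminal fallback. */
static ExecProvider select_ep(EpAvailable available)
{
    static const ExecProvider order[] = { EP_CUDA, EP_DML, EP_CPU };
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++)
        if (order[i] == EP_CPU || available(order[i]))
            return order[i];
    return EP_CPU;
}

enum { MAX_TRIP_COUNT = 1024 };

static bool trip_count_ok(long long trip_count)
{
    return trip_count >= 0 && trip_count <= MAX_TRIP_COUNT;
}

/* Example probes for the test below. */
static bool only_cpu(ExecProvider ep) { return ep == EP_CPU; }
static bool no_cuda(ExecProvider ep) { return ep != EP_CUDA; }
```
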
### 0005 — `--precision` CLI flag (IEEE-754 round-trip lossless)
- Workstream PRs: commit `c989fbd9`.
- Touches: `core/tools/vmaf.c`, `core/tools/cli_parse.c`, `core/include/libvmaf/libvmaf.h` (added `vmaf_write_output_with_format`), `core/src/output.c`.
- Invariant: the default `--precision` is `%.17g` (round-trip lossless), and `legacy` opts back into upstream's `%.6f`. The public C API gained `vmaf_write_output_with_format`, and the old `vmaf_write_output` routes through it with the `%.17g` default. This is ABI-breaking only if upstream adds a same-named function with a different signature. See ADR-0006.
- Re-test: `vmaf -r ref.yuv -d dis.yuv ... --precision=full` and diff against `--precision=legacy`.
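
The round-trip claim can be checked with standard C. The helper `round_trip()` is hypothetical, not part of the fork's output code. It formats a `double` with each format string and parses it back with `strtod`. Seventeen significant digits are always enough to recover an IEEE-754 double exactly. `%.6f` is not enough, as `0.1 + 0.2` shows.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a score with `fmt`, then parse it back. */
static double round_trip(double v, const char *fmt)
{
    char buf[64];
    snprintf(buf, sizeof buf, fmt, v);
    return strtod(buf, NULL);
}
```
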
### 0006 — Netflix golden tests preserved verbatim as a required gate
- Workstream PRs: across the fork's life; codified in ADR-0024.
- Touches: `python/test/quality_runner_test.py`, `python/test/vmafexec_test.py`, `python/test/vmafexec_feature_extractor_test.py`, `python/test/feature_extractor_test.py`, `python/test/result_test.py`, `python/test/resource/yuv/`.
- Invariant: the `assertAlmostEqual(...)` golden values in the five upstream Python test files are never modified by this fork. Fork-added tests live in separate files (e.g. `python/test/test_precision_flag.py`). The CI gate "Netflix CPU golden tests (D24)" is required and blocks merge. Upstream changes to these files are accepted unless they relax the assertions.
- Re-test: `make test-netflix-golden`.
0007 — Build system (CUDA 13.2, oneAPI 2025.3, MkDocs migration)
- Workstream PRs: #7, #17, commit 8a995cb0.
- Touches: meson.build, meson_options.txt, top-level Makefile, docs/ (Sphinx → MkDocs Material migration: docs/conf.py removed, mkdocs.yml added), docs/requirements.txt, Dockerfile.*, distro install scripts under scripts/.
- Invariant: image pins are deliberately non-conservative (ADR-0027): CUDA 13.2, oneAPI 2025.3, clang-format 22, black 26. The images also ship experimental toolchain flags (--expt-relaxed-constexpr, etc.) on purpose. An upstream sync that pulls in a Dockerfile change targeted at older CUDA or older oneAPI must not relax the pins.
- Re-test:
meson setup build -Denable_cuda=true -Denable_sycl=true && ninja -C build && mkdocs build --strict
0008 — Workspace / docs / MATLAB / resource-tree relocations
- Workstream PRs: codified across ADR-0026, ADR-0029, ADR-0030, ADR-0031, ADR-0032, ADR-0033, ADR-0034, ADR-0038.
- Touches: any path-walk in upstream's CI / scripts / docs that assumes the upstream layout (root-level workspace/, resource/, matlab/, root unittest script, root patches/).
- Invariant: the fork's layout is python/vmaf/workspace/, python/vmaf/resource/, python/vmaf/matlab/, scripts/unittest, ffmpeg-patches/ only, and .github/codeql-config.yml. If upstream moves to a different sub-tree (e.g. a hypothetical tools/workspace/), that move must either be applied via a matching fork-side relocation or rejected with a rebase note.
- Re-test: python -m pytest python/test/ -k golden (verifies that the resource-tree path works); make test-netflix-golden.
0009 — License headers (Lusoris/Claude on wholly-new files, 2016–2026 on Netflix files)
- Workstream PRs: commits c159761d, a185f8ef, 0e98c949, codified in ADR-0025 / ADR-0105.
- Touches: every wholly-new fork file (notably the SYCL tree and core/src/dnn/) and every Netflix-touched file (year range 2016 → 2016–2026).
- Invariant: wholly-new fork files carry Copyright 2026 Lusoris and Claude (Anthropic) under the same BSD-3-Clause-Plus-Patent license; mixed files use a dual-copyright notice. An upstream commit that resets a Netflix file's year range (e.g. back to 2016–2020) must be partially rejected: keep the fork's 2016–2026.
- Re-test: check that wholly-new fork files retain the Lusoris/Claude header. grep -L "Copyright 2026 Lusoris" core/src/sycl/*.cpp lists the files that lack the header, so the expected output is empty.
0010 — .claude/ agent scaffolding + ADR tree + AGENTS.md / CLAUDE.md
- Workstream PRs: #14, #24, #37, plus continuous additions.
- Touches: .claude/, AGENTS.md, CLAUDE.md, docs/adr/, .github/PULL_REQUEST_TEMPLATE.md.
- Invariant: this whole tree is fork-local and has no upstream counterpart. Upstream additions to .github/ (issue templates, workflows) must merge cleanly with the fork's existing files rather than replace them. ADR IDs ≤ 0099 are backfills; new decisions start at 0100 (ADR-0028 / ADR-0106).
- Re-test: visually review .github/ and docs/adr/README.md after the merge.
Pre-ADR-0108 entries above are the result of a one-shot backfill sweep on 2026-04-18; subsequent fork-local PRs add their own entries inline.
0011 — Nightly bisect-model-quality + fixture cache
- Workstream PRs: closes #4; sticky tracker issue #40.
- Touches: .github/workflows/nightly-bisect.yml, ai/scripts/build_bisect_cache.py, ai/testdata/bisect/{features.parquet, models/*.onnx, README.md}, scripts/ci/post-bisect-comment.py, docs/ai/bisect-model-quality.md, docs/adr/0109-nightly-bisect-model-quality.md, docs/research/0001-bisect-model-quality-cache.md, mkdocs.yml (nav).
- Invariant: the committed parquet + ONNX bytes under ai/testdata/bisect/ must regenerate byte-identically from ai/scripts/build_bisect_cache.py with seeds FEATURE_SEED=20260418 and MODEL_SEED=20260419. The CI --check step asserts this before every bisect run. Any upstream pull that bumps pandas / pyarrow / onnx far enough to change the serialised bytes will therefore fail the workflow until the cache is regenerated and committed.
- Re-test:
python ai/scripts/build_bisect_cache.py --check
vmaf-train bisect-model-quality \
ai/testdata/bisect/models/model_*.onnx \
--features ai/testdata/bisect/features.parquet \
--min-plcc 0.85 --input-name input
# Expected: "no regression in this range"; first_bad_index None.
No upstream code is touched, so there is no Netflix-side conflict vector. Only fork-local files change, and the risk is toolchain drift rather than merge conflict.
0012 — Upstream ADM port (Netflix 966be8d5)
- Workstream PRs: this PR; ports a single upstream commit.
- Touches: core/src/feature/integer_adm.{c,h}, core/src/feature/x86/adm_avx2.{c,h}, core/src/feature/x86/adm_avx512.{c,h}, core/src/feature/alias.c, core/src/feature/barten_csf_tools.h (new upstream file).
- Invariant: the eight ADM files now mirror upstream's content byte-for-byte (modulo our clang-format-22 pass and the Netflix copyright-year bump on the new header), so future /sync-upstream runs can take new upstream ADM commits cleanly. Do not revert to a pre-966be8d5 ADM kernel without also reverting the call-site signatures in integer_compute_adm, because upstream extended i4_adm_cm from 8 to 13 args.
- Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model version=vmaf_v0.6.1 -o /tmp/vmaf-port.xml
grep '<metric name="vmaf"' /tmp/vmaf-port.xml
# Expected: mean ≈ 76.66890 (golden 76.66890519623612, places=4 OK).
0013 — Upstream motion port (Netflix PR #1486 head 2aab9ef1)
- Workstream PRs: this PR; ports upstream PR #1486 (4 commits on top of the 966be8d5 ADM base, head 2aab9ef1). Sister to entry 0012.
- Touches: core/src/feature/integer_motion.{c,h}, core/src/feature/motion_blend_tools.h (new upstream file), core/src/feature/x86/motion_avx2.c, core/src/feature/x86/motion_avx512.c, core/src/feature/alias.c (additive: integer_motion3 row), python/test/{quality_runner,vmafexec,feature_extractor,vmafexec_feature_extractor}_test.py (golden tolerance updates: places=4 → places=2 on motion-affected asserts; expected values unchanged).
- Invariant: motion files mirror upstream byte-for-byte (modulo our clang-format-22 pass). The alias.c row for integer_motion3 was inserted surgically to avoid clobbering the AVX-512 ADM registration added by entry 0012. The new motion3 metric appears in default VMAF model output but cannot be loaded standalone via --feature integer_motion3 (sub-feature only). The Netflix golden VMAF mean shifts from 76.668904824 to 76.667830213, well within the places=2 tolerance to which the upstream PR loosened the assertions. Do not restore places=4 on motion-touching assertions without also reverting the motion code.
- Re-test:
ninja -C core/build && meson test -C core/build
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--model version=vmaf_v0.6.1 -o /tmp/vmaf-motion-port.xml
grep -E '<metric name="vmaf"|integer_motion3' /tmp/vmaf-motion-port.xml
# Expected: vmaf mean ≈ 76.66783; integer_motion3 mean ≈ 3.98976.
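Here is why places=2 absorbs the shift while places=4 would not. unittest's assertAlmostEqual passes when round(abs(first - second), places) == 0. The minimal shell sketch below applies that rule to the two golden means quoted above. almost_equal is a hypothetical helper, and awk's %.Nf rounding stands in for Python's round().

```shell
# almost_equal A B PLACES: apply unittest's assertAlmostEqual rule,
# round(abs(A - B), PLACES) == 0, and print "pass" or "fail".
# Hypothetical helper; awk's "%.Nf" rounding stands in for Python's round().
almost_equal() {
    awk -v a="$1" -v b="$2" -v p="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d                  # abs(A - B)
        r = sprintf("%." p "f", d) + 0     # round to PLACES decimals
        print ((r == 0) ? "pass" : "fail")
    }'
}

old=76.668904824    # golden VMAF mean before the motion port
new=76.667830213    # golden VMAF mean after the motion port
echo "places=4: $(almost_equal "$old" "$new" 4)"   # -> places=4: fail (diff rounds to 0.0011)
echo "places=2: $(almost_equal "$old" "$new" 2)"   # -> places=2: pass (diff rounds to 0.00)
```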
0014 — Coverage gate overhaul + upstream python/test/ reformat
- Workstream PRs: this PR (coverage-gate overhaul + in-tree reformat of upstream-mirror Python tests).
- Touches:
  - .github/workflows/ci.yml (CPU + GPU coverage jobs: -Dc_args=-fprofile-update=atomic / -Dcpp_args=-fprofile-update=atomic, meson test --num-processes 1, -Denable_dnn=enabled, an ORT install step on the CPU coverage job, lcov/geninfo replaced by gcovr with --json-summary / --xml / --txt output, artifact rename coverage-lcov-{cpu,gpu} → coverage-{cpu,gpu}).
  - scripts/ci/coverage-check.sh (rewritten to parse gcovr JSON via python3 -c; same CLI signature).
  - core/src/dnn/dnn_api.c + new core/src/dnn/dnn_attach_api.c (vmaf_use_tiny_model carved out into its own TU). The unit-test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c. The split keeps them from ending up with an undefined reference to vmaf_ctx_dnn_attach once enable_dnn=enabled activates the real bodies.
  - core/src/dnn/meson.build + core/src/meson.build (new dnn_libvmaf_only_sources list wired into libvmaf.so only).
  - python/test/{feature_extractor,quality_runner,vmafexec,vmafexec_feature_extractor}_test.py (mechanical Black + isort reformat: no assertion values changed, imports regrouped, line wrapping normalised).
- Invariant: coverage CI must keep all five pieces in lockstep:
  - (a) -fprofile-update=atomic closes the intra-process counter race on SIMD inner loops (vif_avx2.c:673, motion_avx2, etc.). Without it, the race produces negative counts that make geninfo/gcovr abort.
  - (b) --num-processes 1 closes the inter-process race. In that race, multiple parallel test binaries merge their counters into the same .gcda files for the shared libvmaf.so at process exit; per-thread atomicity does not cover this.
  - (c) gcovr deduplicates .gcno files that belong to the same source compiled into multiple targets. Without dedup, lcov sums hits across compilation units and yields impossible >100% values. dnn_api.c reported at 1176% was the smoking gun on the first attempt, which had only (a) + (b).
  - (d) the ORT install + enable_dnn=enabled in the coverage job is what makes core/src/dnn/*.c measurable in the first place. Without ORT, the DNN tree compiles into stub branches and the 85% per-critical-file gate is meaningless.
  - (e) vmaf_use_tiny_model lives in dnn_attach_api.c and is added to libvmaf.so only via dnn_libvmaf_only_sources. Moving it back into dnn_api.c reintroduces the vmaf_ctx_dnn_attach undefined-reference link error in test_feature_extractor / test_lpips whenever enable_dnn=enabled, because those test binaries pull in dnn_sources for feature_lpips.c but never link libvmaf.c.
  Lint scope: upstream-mirror Python tests are linted to the same standard as fork-added code. We accept that /sync-upstream and /port-upstream-commit will re-trigger Black/isort failures whenever upstream rewrites these files. The fix is another in-tree reformat pass, never an exclusion. The fork's pyproject.toml and .pre-commit-config.yaml keep python/test/resource/ (binary fixtures only) excluded; python/test/*.py is in scope. See ADR-0110 (race fixes, superseded) and ADR-0111 (gcovr + ORT layer).
- Re-test:
# Reproduce coverage path locally (requires gcc + python3-pip):
pip install --user 'gcovr>=8.0'
cd libvmaf
meson setup build-cov-test --buildtype=debug -Db_coverage=true \
-Denable_avx512=true -Denable_float=true -Denable_dnn=disabled \
-Dc_args=-fprofile-update=atomic -Dcpp_args=-fprofile-update=atomic
ninja -C build-cov-test
meson test -C build-cov-test --print-errorlogs --num-processes 1
~/.local/bin/gcovr --root .. \
--filter 'src/.*' \
--exclude '.*/test/.*' --exclude '.*/tests/.*' \
--exclude '.*/subprojects/.*' \
--gcov-ignore-parse-errors=negative_hits.warn \
--gcov-ignore-parse-errors=suspicious_hits.warn \
--print-summary --txt build-cov-test/coverage.txt \
--json-summary build-cov-test/coverage.json \
build-cov-test
grep -E 'dnn_api|model_loader' build-cov-test/coverage.txt
# Expected: gcovr completes without "Unexpected negative count" AND no
# per-file percentages exceed 100% (drop --num-processes 1 to reproduce
# the multi-process .gcda merge race; switch back to lcov to reproduce
# the dnn_api.c — 1176% over-count from compilation-unit summation).
# Lint smoke test for upstream-mirror tree:
pre-commit run --files python/test/quality_runner_test.py
# Expected: Black/isort/Ruff all PASS — files are reformatted in-tree
# to fork style and stay clean until the next upstream sync.
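Piece (c) can also be guarded after the fact by scanning the JSON summary for impossible percentages. Below is a minimal sketch. It assumes gcovr's --json-summary layout: a top-level files array whose entries carry filename and line_percent. check_no_overcount is a hypothetical helper and is not the fork's scripts/ci/coverage-check.sh.

```shell
# check_no_overcount JSON: exit 1 (and list offenders) if any file in a
# gcovr --json-summary report claims more than 100% line coverage, the
# symptom of hits being summed across compilation units.
# Hypothetical helper; assumes a top-level "files" array with "filename"
# and "line_percent" keys. Not the fork's scripts/ci/coverage-check.sh.
check_no_overcount() {
    python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1]) as fh:
    summary = json.load(fh)

bad = [f for f in summary.get("files", []) if f.get("line_percent", 0) > 100]
for f in bad:
    print(f"over-count: {f['filename']}: {f['line_percent']}%")
sys.exit(1 if bad else 0)
PY
}

# Usage, after the gcovr run above has written the JSON summary:
#   check_no_overcount build-cov-test/coverage.json
```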
0015 — Tox doctest collection skips vmaf/resource/
- Workstream PRs: this PR (fix(ci): skip pytest doctest collection of vmaf/resource/ data files). Surfaced once ADR-0115 consolidated CI triggers to master and tox actually started running on PRs.
- Touches: python/tox.ini (a single-line --ignore=vmaf/resource added to the pytest invocation, plus an explanatory comment block). Pure fork-local; no upstream Python file changes.
- Invariant: pytest --doctest-modules must not attempt to import files under python/vmaf/resource/. Those are parameter / dataset / example-config .py files. Several have dots in their stems (e.g. vmaf_v7.2_bootstrap.py), which makes them unimportable as Python modules. None carry doctests, so the ignore loses no coverage and is a correctness fix rather than a workaround. Do not drop the --ignore=vmaf/resource flag without first verifying that every file under that directory has been renamed to a dot-free stem and is importable.
- Re-test:
cd python && tox -e py311 -- --collect-only --doctest-modules \
--ignore=vmaf/resource 2>&1 | grep -c "ERROR collecting vmaf/resource"
# Expected: 0 (was 5 before the fix).
No upstream code is touched, so there is no Netflix-side conflict vector. The remaining risk is that upstream renames or removes files under python/vmaf/resource/ until the directory disappears. In that case the --ignore becomes a harmless no-op.
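Before dropping the --ignore, the file-name half of that condition can be checked mechanically. The sketch below is minimal. unimportable_stems is a hypothetical helper that lists .py files whose stem is not a plain ASCII Python identifier (letters, digits, underscores, no leading digit). That rule covers dotted stems such as vmaf_v7.2_bootstrap.py. It checks names only, so whether each module actually imports cleanly still needs a real import.

```shell
# unimportable_stems [DIR]: list .py files under DIR whose stem is not a
# plain ASCII Python identifier (letters, digits, underscore; no leading
# digit). Dotted stems such as vmaf_v7.2_bootstrap.py are caught here.
# Hypothetical helper; it checks names only, not whether imports succeed.
unimportable_stems() {
    find "${1:-python/vmaf/resource}" -type f -name '*.py' | while IFS= read -r f; do
        stem=$(basename "$f" .py)
        case "$stem" in
            *[!A-Za-z0-9_]* | [0-9]*) printf '%s\n' "$f" ;;
        esac
    done | sort
}

# Usage: the --ignore=vmaf/resource flag is still needed while this prints anything.
#   unimportable_stems python/vmaf/resource
```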
0016 — SYCL -fsycl link-arg gated on icpx CXX
- Workstream PRs: this PR (fix(libvmaf): gate -fsycl link arg on icpx CXX, allow gcc/clang host linker). Surfaced once ADR-0115's CI consolidation added an Ubuntu SYCL job to PR-time CI that uses CXX=g++ (host linker) with a sidecar icpx for SYCL .cpp compilation.
- Touches: core/src/meson.build (the vmaf_link_args block immediately after the is_sycl_enabled flag handling, currently ~lines 696-712). Pure fork-local; no upstream Meson file changes expected.
- Invariant: -fsycl is appended to vmaf_link_args only when meson.get_compiler('cpp').get_id() == 'intel-llvm' (icpx). Rationale: the documented project mode (see the comment near the is_sycl_enabled block at the top of src/meson.build) compiles SYCL .cpp files via custom_target with icpx, while the project's CXX driver may be gcc / clang / msvc. In that mode the SPIR-V device code is already embedded in the icpx-compiled .o files at compile time. The runtime libraries declared as link dependencies (libsycl + libsvml + libirc + libze_loader) then resolve every symbol. Passing -fsycl to a non-icpx linker is a hard error (g++: error: unrecognized command-line option '-fsycl'). Do not remove the cpp.get_id() == 'intel-llvm' guard without first verifying that every CI matrix leg uses icpx as the project CXX.
- Re-test:
meson setup build -Denable_sycl=true \
-Dcpp_link_args=-Wl,--no-undefined
ninja -C build src/libvmaf.so.3
# Expected: link succeeds; no `-fsycl` errors with gcc/clang host CXX.
Pure fork-local guard; no Netflix-side conflict vector.
0017 — CLI precision default %.6f (Netflix-compat) + frame-skip unref
- Workstream PRs: this PR (fix(cli): revert precision default to %.6f and unref skipped frames). Reverts the default flipped by commit c989fbd9 (ADR-0006) per ADR-0119. A companion fix in core/tools/vmaf.c resolves picture-pool exhaustion in the --frame_skip_ref / --frame_skip_dist loops. That exhaustion surfaced once the always-on picture pool (ADR-0104) made unref'ing skipped pictures mandatory.
- Touches:
  - core/tools/cli_parse.c (VMAF_DEFAULT_PRECISION_FMT + VMAF_LOSSLESS_PRECISION_FMT macros, resolve_precision_fmt() body, --help text)
  - core/tools/cli_parse.h (field comments only; struct shape unchanged)
  - core/src/output.c (DEFAULT_SCORE_FORMAT macro)
  - core/tools/vmaf.c (skip-loop bodies at the c.frame_skip_ref / c.frame_skip_dist for-loops)
  - python/vmaf/core/result.py (per-frame and aggregate :.6f formatters)
  - python/test/command_line_test.py is unmodified. Netflix golden assertions stay frozen per CLAUDE.md §8; the binary's output format adapts to them, not the other way around.
- Invariant: the vmaf CLI's default score-output format is %.6f, which matches upstream Netflix byte-for-byte. --precision=max|full selects %.17g (IEEE-754 round-trip lossless). --precision=legacy is a synonym for the default. The library default for vmaf_write_output_with_format(..., score_format=NULL) is also %.6f. Skipped frames in the --frame_skip_ref / --frame_skip_dist pre-loops are vmaf_picture_unref'd immediately after fetch, so the preallocated picture pool is not exhausted before the main scoring loop runs. Do not flip the macros back to %.17g or remove the unrefs without a superseding ADR; both are load-bearing for the golden gate.
- Re-test:
ninja -C core/build
python -m pytest \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping \
python/test/command_line_test.py::VmafexecCommandLineTest::test_run_vmafexec_with_frame_skipping_unequal \
-v
# Expected: all three PASS in <1 s combined.
Pure fork-local; no Netflix-side conflict vector. If upstream ever changes the default format string, treat their value as the new baseline and reconfirm the golden assertions before adopting.
0018 — FFmpeg patches ship as ordered series.txt
- Workstream PRs: this PR (fix(ci): drop dead sycl trigger + consolidate windows.yml into libvmaf.yml (ADR-0115)). Surfaced once ADR-0115's consolidation routed the docker / FFmpeg-SYCL jobs through the master-targeting CI gate for the first time on this branch. Three problems showed up: the standalone 0003-…sycl… apply broke because it referenced struct fields added by 0001-…tiny-model…, the Dockerfile only COPY'd 0003, and ffmpeg.yml referenced a stale ../patches/ path.
- Touches: Dockerfile (lines ~86-95, the FFmpeg patch-apply block), .github/workflows/ffmpeg.yml (the Build FFmpeg with SYCL patch series step), ffmpeg-patches/000{1,2,3}-*.patch (regenerated via a real git format-patch -3 so they carry valid index <sha>..<sha> <mode> lines and committable SHAs). Pure fork-local; no upstream FFmpeg or Netflix file changes.
- Invariant: both the Dockerfile and ffmpeg.yml walk ffmpeg-patches/series.txt line by line and apply each patch via git apply with a patch -p1 fallback. Do not ship a new patch without appending it to series.txt, and do not reorder existing entries. Patch 0003 references LIBVMAFContext fields added by patch 0001, so any out-of-order apply breaks the build at hunk 2 of vf_libvmaf.c.
- Further fixes bundled in the same PR:
  - --enable-libvmaf-sycl is not a valid FFmpeg configure option. Patch 0003 uses check_pkg_config libvmaf_sycl … auto-detection (matching how libvmaf_cuda is wired) and never registers the switch. Both the Dockerfile and ffmpeg.yml used to pass the flag, and configure rejected it with Unknown option "--enable-libvmaf-sycl". SYCL support is now controlled solely by -Denable_sycl=true at libvmaf build time. FFmpeg picks it up automatically when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
  - The Dockerfile now carries two nvcc-flag ARGs. NVCC_FLAGS (libvmaf) keeps four -gencode lines plus the experimental --extended-lambda / --expt-relaxed-constexpr / --expt-extended-lambda flags needed for Thrust/CUB host+device code. FFMPEG_NVCC_FLAGS (FFmpeg) carries a single -gencode arch=compute_75,code=sm_75 -O2. The reason: FFmpeg's check_nvcc runs nvcc -ptx, which fails with nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures on multi-arch input, and --extended-lambda requires host+device compilation. compute_75 PTX is forward-compatible with all newer GPUs via driver JIT.
  - --enable-libnpp is no longer passed to FFmpeg's configure. FFmpeg n8.1's libnpp probe carries an explicit die "ERROR: libnpp support is deprecated, version 13.0 and up are not supported" (configure:7335-7336) that fires on the base image's CUDA 13.2 libnpp. We don't use scale_npp / transpose_npp / sharpen_npp in any VMAF workflow; the actual GPU path is cuvid + nvdec + nvenc + libvmaf-cuda. Revisit once we move to an FFmpeg release that supports CUDA 13 libnpp upstream.
  - Patch 0002 (add-vmaf_pre-filter) gained a missing #include "libavutil/imgutils.h" for av_image_copy_plane(). FFmpeg's libavfilter Makefile builds with -Werror=implicit-function-declaration, so this fired during the actual compile (not configure). It was caught by a local docker build rather than by waiting for GitHub Actions, a much faster iteration loop.
- Re-test:
cd /tmp && rm -rf ffmpeg-test && \
git clone -q --depth 1 -b n8.1 \
https://git.ffmpeg.org/ffmpeg.git ffmpeg-test && \
cd ffmpeg-test && \
while IFS= read -r line; do \
case "$line" in ''|\#*) continue ;; esac; \
git apply "/path/to/vmaf/ffmpeg-patches/$line" \
|| patch -p1 < "/path/to/vmaf/ffmpeg-patches/$line"; \
done < /path/to/vmaf/ffmpeg-patches/series.txt
# Expected: all three patches apply with no rejects; the resulting
# tree compiles with --enable-libvmaf. SYCL is auto-detected via
# check_pkg_config (patch 0003), so no explicit configure flag is
# required when libvmaf-sycl.pc is on PKG_CONFIG_PATH.
Pure fork-local series; no Netflix-side conflict vector. See ADR-0118.
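The series.txt rules can be linted before pushing. The sketch below is minimal and assumes patches sit directly in ffmpeg-patches/ as NNNN-*.patch files whose numeric prefixes reflect the intended apply order. check_series is a hypothetical helper, not part of the fork's CI. Like the apply loop above, it skips blank lines and # comment lines.

```shell
# check_series [DIR]: lint DIR/series.txt against the *.patch files in DIR.
# Hypothetical helper; assumes NNNN-*.patch names whose numeric prefixes give
# the intended apply order. Blank and #-comment lines are skipped, as in the
# apply loop above. Prints each problem and returns 1 if any are found.
check_series() {
    dir=${1:-ffmpeg-patches}
    problems=""
    listed=$(grep -v -e '^[[:space:]]*$' -e '^#' "$dir/series.txt" || true)

    # 1. Every patch on disk must be listed exactly once.
    for p in "$dir"/*.patch; do
        [ -e "$p" ] || continue                  # glob matched nothing
        name=${p##*/}
        n=$(printf '%s\n' "$listed" | grep -cxF -- "$name" || true)
        [ "$n" -eq 1 ] || problems="$problems
$name listed $n times"
    done

    # 2. Every listed entry must exist on disk.
    for name in $listed; do
        [ -f "$dir/$name" ] || problems="$problems
missing file: $name"
    done

    # 3. Entries must stay in ascending (numeric-prefix) order.
    [ "$listed" = "$(printf '%s\n' "$listed" | sort)" ] || problems="$problems
series.txt is not in ascending order"

    [ -z "$problems" ] && return 0
    printf '%s\n' "$problems" | sed '/^$/d'
    return 1
}

# Usage: check_series ffmpeg-patches
```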
0019 — Coverage Gate annotations: upload-artifact v7 + gcovr filter
- Workstream PRs: this PR.
- Touches: .github/workflows/ci.yml (CPU + GPU coverage steps: gcovr stderr piped through grep -vE 'Ignoring (suspicious|negative) hits' ... || true), .github/workflows/{ci,lint,nightly,nightly-bisect,supply-chain,libvmaf}.yml (actions/upload-artifact@v5|@v6 → @v7, actions/download-artifact@v5 → @v7 in supply-chain.yml). Note: ADR-0115 / PR #50 consolidated windows.yml into libvmaf.yml, so the Windows-side bump now lives in libvmaf.yml's build (MINGW64, …) job.
- Invariant: the Coverage Gate Annotations panel must finish empty on a clean run. Two coordinated pieces achieve this:
  - (a) moving the upload / download artifact actions to @v7 silences GitHub's Node-20 deprecation banner ahead of the 2026-06-02 forced-Node-24 cutoff;
  - (b) the gcovr stderr filter swallows the Ignoring (suspicious|negative) hits warnings that gcovr 8 emits for the legitimately large hit counts in tight ANSNR / VIF / motion inner loops. For example, ansnr_tools.c:207 reaches ~4.93 G hits across an HD multi-frame coverage suite; the counts are real, not a gcov bug.
  The filter is deliberately narrow: it matches only that exact warning text, so any other gcovr warning still surfaces. Upstream (Netflix/vmaf) does not maintain these CI files. Rebase impact is limited to the unlikely case that an upstream sync touches the shared .github/workflows/ tree, which it currently does not. See ADR-0117.
- Re-test:
# Verify gcovr filter locally (after a coverage build per entry 0014):
~/.local/bin/gcovr --root .. \
--filter 'src/.*' \
--exclude '.*/test/.*' --exclude '.*/tests/.*' \
--exclude '.*/subprojects/.*' \
--gcov-ignore-parse-errors=negative_hits.warn \
--gcov-ignore-parse-errors=suspicious_hits.warn \
--print-summary --txt build-cov-test/coverage.txt \
build-cov-test \
2> >(grep -vE 'Ignoring (suspicious|negative) hits' >&2 || true)
# Expected: stderr contains the gcovr summary block but NO
# "Ignoring (suspicious|negative) hits" lines. coverage.txt unchanged.
# Verify all upload/download-artifact instances are on @v7:
grep -rE 'actions/(upload|download)-artifact@v[0-6]' .github/workflows/
# Expected: empty output.
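The filter and its trailing || true can be exercised in isolation. The sketch below is minimal. filter_gcovr_noise is a hypothetical wrapper around the same grep -vE pattern, and the sample input lines are illustrative, not verbatim gcovr output. The || true matters because grep exits 1 when it selects no lines, which is exactly what happens when every stderr line is a filtered warning. In a pipeline run with -e -o pipefail (what GitHub Actions uses when a step sets shell: bash), that exit status would otherwise fail the step.

```shell
# filter_gcovr_noise: pass stdin through, dropping only the
# "Ignoring (suspicious|negative) hits" warnings. grep -v exits 1 when it
# selects no lines (every line was a filtered warning), so "|| true" keeps
# that from failing a step that runs with -e -o pipefail.
# Hypothetical wrapper around the same pattern the workflow uses.
filter_gcovr_noise() {
    grep -vE 'Ignoring (suspicious|negative) hits' || true
}

# Illustrative input lines, not verbatim gcovr output:
printf '%s\n' \
    'WARNING: Ignoring suspicious hits in ansnr_tools.c:207' \
    'gcovr summary line (kept)' \
    | filter_gcovr_noise
# -> gcovr summary line (kept)
```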
0020 — CI workflow file + display-name renames (Title Case sweep)
- Workstream PRs: this PR; renames all six core .github/workflows/*.yml files to purpose-descriptive kebab-case and normalises every workflow name: and job name: to Title Case. See ADR-0116.
- Touches: .github/workflows/{ci,lint,security,libvmaf,ffmpeg,docker}.yml (renamed via git mv to tests-and-quality-gates.yml, lint-and-format.yml, security-scans.yml, libvmaf-build-matrix.yml, ffmpeg-integration.yml, docker-image.yml), README.md (5 badge URLs + labels), docs/principles.md (line 5 workflow-tuple update), .claude/skills/add-gpu-backend/SKILL.md + scaffold.sh (filename refs), docs/adr/0116-*.md (new), docs/adr/README.md (index row), CHANGELOG.md.
- Invariant: workflow files are purpose-named. Their name: fields are Title Case sentences with em-dash axis tags, and job-level name: strings are Title Case sentences (Build — / Pre-Commit / Coverage Gate / etc.). Required-status-check contexts in master branch protection are bound to job-level names, so when renaming any job, re-pin via gh api --method PUT repos/VMAFx/vmafx/branches/master/protection. The 19 required gates' semantics are unchanged from ADR-0037; only their display strings move.
- Re-test:
# Validate every workflow file parses and lists the expected job names.
cd .github/workflows
for f in tests-and-quality-gates.yml lint-and-format.yml security-scans.yml \
libvmaf-build-matrix.yml ffmpeg-integration.yml docker-image.yml; do
yq '.name, .jobs.[].name' "$f" || echo "PARSE FAIL: $f"
done
# Expected: each workflow prints its Title Case workflow name + job names;
# no PARSE FAIL lines.
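Required contexts are bound to job display names. A rename that is not re-pinned therefore leaves a required check that never reports, and merges stay blocked waiting for it. The minimal sketch below spots such contexts. stale_contexts is a hypothetical helper that compares two newline-separated lists. The usage comment assumes an authenticated gh and mikefarah yq v4, and it uses the GitHub REST endpoint for a branch's required status checks. yq does not expand matrix jobs whose name: embeds ${{ matrix.* }}, so those contexts will appear as false positives.

```shell
# stale_contexts REQUIRED JOBNAMES: print every required status-check context
# (one per line in file REQUIRED) that matches no job display name (one per
# line in file JOBNAMES). Whole-line, exact-string comparison.
# Hypothetical helper for illustration.
stale_contexts() {
    while IFS= read -r ctx; do
        [ -n "$ctx" ] || continue
        grep -qxF -- "$ctx" "$2" || printf '%s\n' "$ctx"
    done < "$1"
}

# Usage (assumes an authenticated gh and mikefarah yq v4):
#   gh api repos/VMAFx/vmafx/branches/master/protection/required_status_checks \
#       --jq '.contexts[]' > /tmp/required.txt
#   yq '.jobs.[].name' .github/workflows/*.yml > /tmp/jobnames.txt
#   stale_contexts /tmp/required.txt /tmp/jobnames.txt
```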
0021 — DNN-enabled CI matrix legs (gcc + clang + macOS)
- Workstream PRs: this PR; adds three new entries to the libvmaf-build matrix in .github/workflows/libvmaf-build-matrix.yml covering -Denable_dnn=enabled across Ubuntu/gcc, Ubuntu/clang, and macOS/clang. See ADR-0120.
- Touches: .github/workflows/libvmaf-build-matrix.yml (3 new matrix entries + ORT install steps + a dedicated dnn-suite test step), docs/adr/0120-ai-enabled-ci-matrix-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry).
- Invariant: the DNN matrix legs install ONNX Runtime from the same pinned source as the dedicated Tiny AI job (tests-and-quality-gates.yml). On Linux that is the Microsoft tarball at the version pinned by ORT_VERSION; on macOS it is Homebrew. When the Tiny AI job's pin changes, the ORT_VERSION env in the matrix legs' Install ONNX Runtime (linux, DNN leg) step must change to match. Otherwise compiler/portability coverage drifts away from the gating leg's actual ABI.
- Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.libvmaf-build.strategy.matrix.include[] | select(.dnn==true) | .name' \
.github/workflows/libvmaf-build-matrix.yml
# Expected output (3 lines):
# Build — Ubuntu gcc (CPU) + DNN
# Build — Ubuntu clang (CPU) + DNN
# Build — macOS clang (CPU) + DNN
# Local DNN build sanity (matches what each leg will run):
meson setup libvmaf core/build --buildtype release \
--prefix $PWD/install -Denable_float=true -Denable_dnn=enabled
ninja -vC core/build install
meson test -C core/build --suite=dnn --print-errorlogs
- Branch protection: the two Linux DNN legs are pinned as required status checks on master immediately after this PR's merge (19 → 21 contexts). The macOS leg stays informational (experimental: true) because the Homebrew ORT version floats. Re-pin command:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
--input /tmp/protection-update.json
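A grep over the workflow files can check the ORT_VERSION lockstep rule. The sketch below is minimal and assumes each Linux pin is written as a YAML ORT_VERSION: <value> line, quoted or not. ort_pins and check_ort_lockstep are hypothetical helpers. The pattern needs adjusting if a workflow sets the variable another way.

```shell
# ort_pins FILE...: print the distinct values of YAML "ORT_VERSION: <value>"
# lines across FILEs, with quotes and trailing whitespace stripped.
# check_ort_lockstep FILE...: succeed only if exactly one distinct value exists.
# Hypothetical helpers; adjust the pattern if a workflow sets the pin differently.
ort_pins() {
    grep -hE '^[[:space:]]*ORT_VERSION:' "$@" \
        | sed -E 's/^[[:space:]]*ORT_VERSION:[[:space:]]*//; s/["'\'']//g; s/[[:space:]]+$//' \
        | sort -u
}

check_ort_lockstep() {
    pins=$(ort_pins "$@")
    n=$(printf '%s\n' "$pins" | grep -c . || true)
    if [ "$n" -eq 1 ]; then
        echo "ORT_VERSION in lockstep: $pins"
    else
        echo "ORT_VERSION drift: $n distinct value(s)"
        printf '%s\n' "$pins"
        return 1
    fi
}

# Usage:
#   check_ort_lockstep .github/workflows/tests-and-quality-gates.yml \
#       .github/workflows/libvmaf-build-matrix.yml
```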
0022 — Windows GPU build-only matrix legs (MSVC + CUDA, MSVC + oneAPI SYCL)
- Workstream PRs: this PR; adds a new top-level windows-gpu-build job to .github/workflows/libvmaf-build-matrix.yml with two matrix entries (CUDA, SYCL). See ADR-0121.
- Touches:
  - .github/workflows/libvmaf-build-matrix.yml (new windows-gpu-build job), docs/adr/0121-windows-gpu-build-only-legs.md (new), docs/adr/README.md (index row), CHANGELOG.md (Added entry).
  - core/src/compat/win32/pthread.h (new: Win32 pthread shim for MSVC; mirrors the compat/gcc/stdatomic.h pattern).
  - core/src/feature/integer_adm.h (UPSTREAM: converted the dwt_7_9_YCbCr_threshold[3] designated initializer to positional form so MSVC/nvcc-on-Windows accepts the C++ parse; semantically identical, no behavioural change).
  - core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM: added an #if defined(__cplusplus) && defined(_MSC_VER) branch around #include <stdatomic.h> so MSVC C++ TUs pull in atomic_int via using std::atomic_int;; POSIX paths unchanged).
  - core/src/sycl/d3d11_import.cpp (fix the non-existent <libvmaf/log.h> → "log.h").
  - core/src/sycl/dmabuf_import.cpp (move <unistd.h> inside the #if HAVE_SYCL_DMABUF guard for non-VA-API hosts).
  - core/src/sycl/common.cpp (replace POSIX clock_gettime(CLOCK_MONOTONIC) with the portable std::chrono::steady_clock).
  - core/src/feature/x86/motion_avx2.c (UPSTREAM: replace GCC vector-extension __m256i[N] indexing at line 529 with _mm256_extract_epi64; bit-exact).
  - core/src/feature/x86/adm_avx2.c (UPSTREAM: replace 6 (__m256i)(_mm256_cmp_ps(...)) casts with _mm256_castps_si256(...) and 12 __m128i[N] reductions with _mm_extract_epi64; bit-exact).
  - core/src/feature/x86/adm_avx512.c (UPSTREAM: replace 6 __m128i[N] reductions with _mm_extract_epi64; bit-exact).
  - core/src/log.c (UPSTREAM: gate <unistd.h> behind !_WIN32, include <io.h>, and redirect isatty/fileno to _isatty/_fileno for MSVC).
  - core/src/feature/integer_vif.c (UPSTREAM: switch the aligned_malloc cursor from void * to uint8_t * with explicit typed-pointer casts so MSVC accepts the byte-wise pointer arithmetic).
  - core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM: drop the unused <unistd.h> include).
  - core/src/dnn/model_loader.c (fork-added: Windows fallback definitions for the POSIX S_ISDIR/S_ISREG path-classification macros).
  - .github/workflows/lint-and-format.yml (fork-added: set lfs: true on the pre-commit job's checkout so LFS-stored ONNX blobs resolve and don't appear as phantom pre-commit-induced diffs).
  - core/src/feature/x86/motion_avx512.c (UPSTREAM: replace 1 __m128i[N] reduction with _mm_extract_epi64; bit-exact).
  - core/src/feature/x86/{vif_statistic_avx2,ansnr_avx2,ansnr_avx512,float_adm_avx2,float_adm_avx512,float_psnr_avx2,float_psnr_avx512,ssim_avx2,ssim_avx512}.c (UPSTREAM: convert 17 sites of trailing __attribute__((aligned(N))) to leading C11 _Alignas(N); same alignment, MSVC-portable).
  - core/src/feature/mkdirp.c and core/src/feature/mkdirp.h (UPSTREAM third-party MIT-licensed micro-library: gate <unistd.h> to non-Windows, add <direct.h> + _mkdir for Windows, add a mode_t typedef for MSVC).
  - core/meson.build (new pthread_dependency, gated on cc.check_header('pthread.h') failing).
  - core/src/meson.build and core/test/meson.build (thread pthread_dependency into every target compiling pthread-using TUs).
- Invariant: Windows GPU legs are pinned to the same toolchain versions as the corresponding Linux GPU legs (CUDA 13.0.0, oneAPI BaseKit 2025.3.0.372). A Linux-vs-Windows divergence then implies an MSVC ABI issue rather than a tooling-version delta. When either Linux GPU leg bumps its toolchain, the Windows leg must move in lockstep. The Intel installer URL on Windows hard-codes the per-release directory id and the version string, so the SYCL bump is a two-line edit in the Install Intel oneAPI (windows) step (the WINDOWS_BASEKIT_URL env var).
  Both legs additionally inject /experimental:c11atomics into CFLAGS/CXXFLAGS. libvmaf uses C11 atomics that MSVC's <stdatomic.h> rejects without that opt-in flag. Once MSVC supports C11 atomics without the opt-in, the flag can be dropped.
  Two Windows-only dependency steps round out the parity:
  - The CUDA leg's Jimver/cuda-toolkit sub-package list includes both crt and nvvm.
    - crt provides the CUDA Runtime Library compile-time headers and ships crt/host_config.h. cuda_cccl is not a valid Windows sub-package name; the installer rejects it.
    - nvvm ships nvvm/bin/cicc.exe + nvvm/libdevice/libdevice.*.bc. Without it, nvcc's .cu → PTX stage fails with The system cannot find the path specified. On Linux, apt pulls NVVM in transitively with cuda-nvcc-XY; Windows requires it explicitly.
  - The SYCL leg builds the Level Zero loader from source (oneapi-src/level-zero v1.18.5 → cmake --build … --target install). The Windows oneAPI BaseKit ships the SYCL runtime but not ze_loader.lib, and libvmaf's meson cc.find_library('ze_loader') needs both the header and the import library. When the Linux apt level-zero-dev version moves, bump the L0 git tag to match.
  core/src/meson.build guards the explicit svml/irc cc.find_library calls behind host_machine.system() != 'windows'. Those calls exist for the gcc/g++ + icpx Linux flow, where the host linker is non-Intel. On Windows the host compiler is icx-cl itself and auto-injects the Intel runtime.
  Round-10 surfaced an additional Windows-only gap: ~14 libvmaf TUs #include <pthread.h> unconditionally, but MSVC and clang-cl ship no pthread (MinGW does, via winpthreads). The fork now ships a header-only Win32 shim at core/src/compat/win32/pthread.h. It maps the in-use pthread subset (mutex / cond / thread create+join+detach) onto SRWLOCK + CONDITION_VARIABLE + _beginthreadex.
  The shim is wired in via pthread_dependency in core/meson.build, declared only when cc.check_header('pthread.h') fails, so MinGW and POSIX paths stay untouched. When upstream Netflix/vmaf adds new pthread surface (e.g. pthread_rwlock_*), extend compat/win32/pthread.h to cover it.
  Two sets of targets bypass meson's dependencies: plumbing and hand-roll their own -I lists: the nvcc fatbin custom_targets (CUDA) and the icpx custom_targets (SYCL common.cpp / picture_sycl.cpp / dmabuf_import.cpp, plus the SYCL feature kernels). The shim path must therefore be threaded explicitly into both cuda_extra_includes and sycl_inc_flags on Windows.
  icpx-cl on Windows also rejects -fPIC (unsupported option for target 'x86_64-pc-windows-msvc'). sycl_common_args and sycl_feature_args therefore route their -fPIC token through sycl_pic_arg = host_machine.system() != 'windows' ? ['-fPIC'] : []. PIC is the default for Windows DLLs, so dropping the flag is the correct fix rather than a workaround.
  Round-14 surfaced a third Windows-only blocker. core/src/feature/integer_adm.h (an upstream Netflix file, last touched by upstream port d06dd6cf) initialises dwt_7_9_YCbCr_threshold[3] with C99 designated initializers ({.a = ..., .k = ..., .f0 = ..., .g = {...}}). The header is included from both integer_adm.c (a C TU) and cuda/integer_adm/*.cu (a C++ TU via nvcc). MSVC's C++ frontend (and nvcc's cudafe++ on Windows) rejects C99 designated initializers without /std:c++20. The initializer was converted to positional form in the same struct-member order (a / k / f0 / g[4]). The conversion is semantically identical and valid in every C/C++ standard, so on the upstream-merge side it costs at most a trivial conflict if upstream Netflix later edits the same lines. Restore the designated form post-merge if upstream has it.
  Round-17 surfaced four more Windows/MSVC-only SYCL blockers, two of which touch upstream-shared headers:
(a) core/src/ref.h and core/src/feature/feature_extractor.h (UPSTREAM) unconditionally #include <stdatomic.h> and use the atomic_int typedef in struct definitions. MSVC's <stdatomic.h> (added in 19.34) only declares the C11 symbols in the global namespace under C. In C++ compilation (icpx-cl drives the SYCL TUs as C++), MSVC surfaces them only inside namespace std::. gcc/clang expose both via a GNU extension, so the upstream code works on every other platform. The fork now wraps both headers' #include <stdatomic.h> in #if defined(__cplusplus) && defined(_MSC_VER) → #include <atomic> + using std::atomic_int;. Every other configuration falls through to the original <stdatomic.h> line. ABI is unchanged: atomic_int resolves to the same underlying type. If upstream Netflix adds further C11 atomic typedefs in these headers (e.g., atomic_uint, atomic_size_t), extend the using std:: lines to cover them.
(b) core/src/sycl/d3d11_import.cpp (fork-added) used <libvmaf/log.h>, which doesn't exist; log.h lives at core/src/log.h and is internal. Switched to "log.h"; the icpx invocation already supplies the src-relative -I.
(c) core/src/sycl/dmabuf_import.cpp (fork-added) included <unistd.h> at file scope, but POSIX close() is only used inside the #if HAVE_SYCL_DMABUF VA-API block. Moved the <unistd.h> include inside that guard so non-DMA-BUF builds (Windows MSVC, macOS) compile cleanly.
(d) core/src/sycl/common.cpp (fork-added) called clock_gettime(CLOCK_MONOTONIC), which doesn't exist on Windows. Replaced it with std::chrono::steady_clock, which the C++ standard guarantees to be monotonic and which is portable on every supported host.
All four fixes preserve POSIX/Linux behaviour bit-identically and only change the Windows MSVC build path.
Round-18 surfaced a fifth Windows blocker on the CUDA leg's CPU SIMD compile path. core/src/feature/x86/motion_avx2.c:529 (UPSTREAM, ported in commit 9371a0aa from Netflix PR #1486) computed final_accum[0] + final_accum[1] + final_accum[2] + final_accum[3] to extract and sum the four int64 lanes of an __m256i.
gcc/clang allow this via the GNU vector-extension treatment of __m256i, which carries __attribute__((vector_size(32))). MSVC rejects it with C2088: built-in operator '[' cannot be applied to an operand of type '__m256i'. The expression was replaced with _mm256_extract_epi64(final_accum, N) for N ∈ {0..3}, summed. This is a bit-exact lane sum on every compiler. Restore the index form post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC.
Round-19 surfaced the same MSVC pattern at 19 more call sites across the AVX2/AVX-512 ADM and motion files, plus six GCC-style vector casts:
- core/src/feature/x86/adm_avx2.c (UPSTREAM): 6 lines (915-920) used (__m256i)(_mm256_cmp_ps(...)) C-style casts that gcc/clang accept via the GNU vector extension. They were replaced with the dedicated _mm256_castps_si256(...) bit-cast intrinsic. 12 lane-extract sites (r2_h[0] + r2_h[1], etc., at lines 2420 / 2425 / 2430 / 2893 / 2897 / 2901 / 4079 / 4084 / 4089 / 4627 / 4631 / 4635) were replaced with summed _mm_extract_epi64(r2_X, 0) + _mm_extract_epi64(r2_X, 1) pairs.
- core/src/feature/x86/adm_avx512.c (UPSTREAM): 6 sister lane-extract sites (lines 4470 / 4477 / 4484 / 4625 / 4631 / 4637) got the same fix. The AVX-512 paths first reduce an __m512i down to __m128i (via _mm512_extracti64x4_epi64 → _mm256_extracti64x2_epi64) before the index, so only the final __m128i[N] step needed changing.
- core/src/feature/x86/motion_avx512.c (UPSTREAM, ported in 9371a0aa from PR #1486): one final r2[0] + r2[1] reduction (line 448) got the same fix.

All 19 lane-extract fixes and the 6 cast fixes are bit-exact rewrites; they only change the source-level syntax to an MSVC-portable form. Restore the original forms post-merge if upstream Netflix later edits the same lines and your toolchain matrix doesn't include MSVC.
Additionally, core/src/sycl/d3d11_import.cpp (fork-added) switched from C-style COBJMACROS helpers (ID3D11Device_CreateTexture2D, …_Release, etc.) to C++ method-call syntax (device->CreateTexture2D, tex->Release). d3d11.h gates COBJMACROS behind !defined(__cplusplus), so the C-style helpers aren't visible in this .cpp TU. The two forms are ABI-equivalent, since both dispatch through the COM vtable, and the choice is purely lexical. POSIX builds aren't affected because the whole TU is #ifdef _WIN32.
Round-20 surfaced two more Windows-only blockers.
(a) 20 sites across the x86 SIMD layer (see the per-file counts below) used GCC's float tmp[N] __attribute__((aligned(M))); form to align scratch buffers for _mm{256,512}_store_ps. MSVC rejects the trailing-attribute syntax with C2146: syntax error: missing ';' before identifier '__attribute__'. The declarations were replaced with the C11-standard _Alignas(M) float tmp[N];, which places the alignment specifier before the type and works in gcc, clang and MSVC with /std:c11. Files touched (all UPSTREAM): vif_statistic_avx2.c (×2), ansnr_avx2.c (×2), ansnr_avx512.c (×2), float_adm_avx2.c (×2), float_adm_avx512.c (×2), float_psnr_avx2.c (×1), float_psnr_avx512.c (×1), ssim_avx2.c (×4), ssim_avx512.c (×4). The pre-existing vif_avx2.c / vif_avx512.c already define a portable ALIGNED(x) macro at file scope and place the attribute before the type. They compile cleanly under MSVC and were not touched.
(b) core/src/feature/mkdirp.c (UPSTREAM, a third-party MIT-licensed copy of Stephen Mathieson's micro-library) included <unistd.h> unconditionally. It never used any POSIX unistd symbols; it only calls mkdir, via <sys/stat.h> / <direct.h>. The fix gates <unistd.h> to non-Windows, adds <direct.h> for Windows, and switches mkdir(pathname) → _mkdir(pathname), the non-deprecated MSVC name. core/src/feature/mkdirp.h gained a mode_t typedef under MSVC, because neither <sys/types.h> nor <sys/stat.h> declares it on Windows. mode is ignored on the Windows path anyway.
Round-21 surfaced further blockers, including six __m128i[N] sites that the round-19 sweep missed, plus a pre-commit workflow checkout gap.
(a) core/src/feature/x86/adm_avx512.c (UPSTREAM) had six further r2_X[0] + r2_X[1] reductions, at lines 2128 / 2135 / 2142 / 2589 / 2595 / 2601. Each reduces a __m512i accumulator down to __m128i before the lane index. They were replaced with the same summed-pair _mm_extract_epi64(r2_X, N) pattern used in round 19, which is bit-exact and MSVC-portable.
(b) core/src/log.c (UPSTREAM) included <unistd.h> unconditionally to pick up POSIX isatty/fileno. On MSVC both live in <io.h> as _isatty/_fileno. The fix gates the include and macro-redirects the names, so the one call site at line 34 compiles on both sides without touching the POSIX path.
(c) .github/workflows/lint-and-format.yml (fork-added) checks out without lfs: true, so the model/tiny/*.onnx files land as LFS pointer stubs. pre-commit's "changes made by hooks" reporter then diffs the stubs against HEAD's real blobs and fails the job, even though no hook touched them. Added lfs: true to the pre-commit job's checkout.
(d) core/src/meson.build: the cuda_common_vmaf_lib static library had no dependencies: list. The Win32 pthread shim (wired in via pthread_dependency in core/meson.build) therefore wasn't on its include path. cuda/common.h unconditionally does #include <pthread.h>, and MSVC failed with C1083. Added dependencies : [pthread_dependency]. This is a no-op on POSIX (empty list) and routes the shim path in on Windows.
(e) core/src/feature/integer_vif.c (UPSTREAM) walked one big aligned_malloc result as void *data. It did data += pad_size / data += h * stride_16 etc. to carve the buffer into typed sub-pointers. gcc/clang accept pointer arithmetic on void * as a GNU extension (treating sizeof(void) == 1); MSVC rejects it with C2036: 'void *': unknown size. The cursor type is now uint8_t *, with explicit casts at the assignment sites that take a typed pointer (uint16_t *mu1, uint32_t *mu1_32, etc.). Byte offsets are identical and the layout is unchanged, so the result is bit-exact. If upstream Netflix edits the same loop, reabsorb the walk and re-apply the cursor-type + cast pattern.
(f) core/src/feature/cuda/integer_adm_cuda.c (UPSTREAM) included <unistd.h> at line 33 but used no POSIX symbols from it, and MSVC failed with C1083. The unused include was dropped outright. This is the simplest fix and changes nothing at runtime on any platform.
(g) core/src/dnn/model_loader.c (fork-added) uses S_ISDIR/S_ISREG to classify resolved paths. MSVC ships the underlying S_IFMT/S_IFDIR/S_IFREG bit masks in <sys/stat.h> but not the POSIX classification macros. Added a Windows-only fallback guarded by #ifdef _WIN32: #ifndef S_ISDIR #define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) #endif, and the same for S_ISREG. The fallback is semantically identical to the POSIX macros on Linux/macOS.
Round-21e surfaced the final source-portability blockers once the DLL build passed preprocessing.
(h) core/src/predict.c, core/src/libvmaf.c and core/src/read_json_model.c (all UPSTREAM) used C99 variable-length arrays:
- double scores[cnt] at predict.c:385;
- char name[name_sz] at predict.c:453 and libvmaf.c:1741;
- cfg_name[cfg_name_sz] and generated_key[generated_key_sz] in the .json model-collection parser.

gcc/clang accept VLAs as a C11 optional feature. MSVC rejects them outright, even with /std:c11, with C2057: expected constant expression, plus C2466 and C2133 on the const size_t-sized arrays. In C a const-qualified variable is never a constant expression, even when its initialiser is a literal like 4 + 1. gcc/clang silently compile such arrays as VLAs, while MSVC has no VLA fallback.

Every runtime-sized buffer except generated_key was replaced with a small malloc + explicit free on every exit path. predict.c and read_json_model.c gained a goto out; cleanup arm because their loops error-exit mid-function. The generated_key buffer in read_json_model.c uses the narrower fix, char generated_key[5];. Its size (four decimal digits of the bootstrap sub-model index plus NUL) is a true compile-time constant. The other buffers are a handful of bytes each:
- name_sz is the model-collection name length plus the fixed _ci_p95_lo suffix;
- scores holds ~20 doubles;
- cfg_name is the name plus a _0000 suffix.

The heap round-trip is therefore not performance-relevant, and existing callers handle the new -ENOMEM failure mode uniformly. The read_json_model.c refactor also plugs a pre-existing leak of the name buffer on the early return -EINVAL when a JSON object key isn't a string: the goto out; path frees name + cfg_name on every exit.
core/test/test_feature_extractor.c:56 (UPSTREAM) declared const unsigned n_threads = 8; and used it as the extent of VmafFeatureExtractorContext *fex_ctx[n_threads];. It was converted to enum { n_threads = 8 }; so MSVC sees a constant expression; every other compiler treats enum constants identically. Re-absorb if upstream Netflix later edits the same lines and your toolchain matrix omits MSVC.
(i) The Windows MSVC build-only legs now build the full tree: CLI tools, unit tests and libvmaf.dll. Previously they took a shortcut and disabled -Denable_tools / -Denable_tests. Per user direction ("fix the code ffs"), the tree polyfills the remaining POSIX surfaces on MSVC instead. The first polyfill is core/tools/compat/win32/getopt.h + core/tools/compat/win32/getopt.c, a from-scratch POSIX/GNU-compatible getopt_long shim. It supports:
- short and long options;
- no_argument / required_argument / optional_argument;
- argv permutation for non-option operands;
- -- as an explicit stop;
- =-embedded values.

The shim is fork-added (BSD-3-Clause-Plus-Patent, Copyright 2026 Lusoris and Claude). It is declared via a single getopt_dependency in core/meson.build, gated on cc.check_header('getopt.h') failing. The dependency auto-propagates the shim .c into any consuming target via meson's sources: keyword. Both the vmaf CLI (core/tools/meson.build) and the test_cli_parse unit test (core/test/meson.build) therefore pick it up uniformly. MinGW ships <getopt.h> via mingw-w64-crt, so check_header succeeds there and the shim stays out of the TU list.
(j) Eleven test executables were missing pthread_dependency in their dependencies: lists in core/test/meson.build: test_log, test_dict, test_opt, test_cpu, test_ref, test_feature, test_ciede, test_luminance_tools, test_cli_parse, test_sycl and test_sycl_pic_preallocation. On POSIX pthread_dependency is an empty list, so the omission was invisible. On MSVC those TUs transitively include feature_collector.h → <pthread.h> and fail with C1083. The dependency is now threaded through all eleven targets. test_cli_parse additionally lists getopt_dependency to pick up the shim.
(k) Three additional VLA sites surfaced once the test harness built on MSVC:
- test_cambi.c:254 had unsigned w = 5, h = 5; uint16_t buffer[3 * w];. It became enum { w = 5, h = 5 }; so the array extent is a constant expression.
- test_pic_preallocation.c:382 and test_pic_preallocation.c:506 had const int num_threads = N; pthread_t threads[num_threads];, and MSVC rejects const int as a non-constant expression. Both became enum { num_threads = N, fetches_per_thread = M };.

(l) test_ring_buffer.c:23 and test_pic_preallocation.c:26 included <unistd.h> for usleep/sleep. That include is now gated behind !_WIN32, with a Win32 fallback via <windows.h> + #define usleep(us) Sleep(((us) + 999) / 1000) / #define sleep(s) Sleep((s) * 1000). The conversion rounds sub-millisecond usleep inputs up. That is safe for these test paths, which use 100 µs jitter and 1 s waits.
(m) core/tools/vmaf.c included <unistd.h> for isatty/fileno. It now uses the same gating pattern as log.c in round-21 (b): include <io.h> on MSVC and redirect isatty/fileno to _isatty/_fileno via #define.
(n) __builtin_clz/__builtin_clzll are GCC intrinsics; MSVC ships __lzcnt/__lzcnt64 via <intrin.h> instead. The shim already lived in core/src/feature/integer_vif.h, but integer_adm.c:939, x86/adm_avx2.c:1425 and x86/adm_avx512.c:1217 don't include that header. The shim was extracted into a dedicated core/src/feature/compat_builtin.h (fork-added) and included from all four TUs. The guard is defined(_MSC_VER) && !defined(__clang__), so clang-cl / icx-cl, which provide the GCC intrinsics natively, skip the shim.
(o) The SYCL leg's D3D11 import TU core/src/sycl/d3d11_import.cpp is C++ (icpx-cl drives it as C++ on Windows). It included the internal C header log.h without an extern "C" wrap. log.h is an upstream Netflix header with no __cplusplus guard, so vmaf_log got C++ name-mangled in the .cpp TU. At link time it then failed to resolve against the C-linkage symbol produced by log.c, giving LNK2019 in every test target that pulls in the SYCL static lib. The #include "log.h" is now wrapped in extern "C" { ... } inside the fork-added .cpp, rather than in the upstream header. This keeps log.h identical to upstream on every /sync-upstream.
(p) The Windows MSVC legs build with --default-library=static. libvmaf's public API has no __declspec(dllexport) attributes (upstream Netflix is POSIX-shaped). A vanilla MSVC shared build therefore produces src/vmaf-3.dll with no exported symbols, and the toolchain never emits the companion vmaf.lib import library. Downstream tool targets then fail with LNK1181: cannot open input file 'src\vmaf.lib'. The MinGW matrix leg has used --default-library static since day one for the same reason (line 387). The MSVC legs now mirror that choice via matrix.include[].meson_extra. Downstream consumers that want a DLL can either add __declspec(dllexport) decorations to the public API or use a .def file. That is a separate decision and out of scope for the build-only gate.
- Re-test:
# Local sanity: the matrix file parses and the new job names exist.
yq '.jobs.windows-gpu-build.strategy.matrix.include[].name' \
.github/workflows/libvmaf-build-matrix.yml
# Expected output (2 lines):
# Build — Windows MSVC + CUDA (build only)
# Build — Windows MSVC + oneAPI SYCL (build only)
- Branch protection: the two Windows GPU legs are pinned as required status checks on master immediately after this PR's merge. On top of ADR-0120's two Linux DNN legs, this moves the required-check count from 21 to 23. Re-pin via:
gh api --method PUT repos/VMAFx/vmafx/branches/master/protection \
--input /tmp/protection-update.json
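The MSVC polyfill patterns described above (header gating, isatty/fileno redirects, the S_ISDIR fallback, usleep rounding, the clz shim, enum extents instead of VLAs, and _Alignas) can be condensed into one translation unit. The following is a minimal sketch under stated assumptions, not a copy of the fork's compat_builtin.h, log.c or test headers. The helper names compat_clz32, stderr_is_tty, US_TO_MS_CEIL, n_slots, scratch_is_aligned and is_directory are hypothetical. Only the guards (_WIN32 for headers and stat macros, defined(_MSC_VER) && !defined(__clang__) for the clz shim) mirror what the notes describe.

```c
/* Condensed, hypothetical sketch of the MSVC polyfill patterns.
 * Helper names are illustrative, not the fork's actual headers. */
#ifndef _DEFAULT_SOURCE
#  define _DEFAULT_SOURCE 1      /* glibc: expose isatty/fileno */
#endif
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

#ifdef _WIN32
#  include <io.h>                /* _isatty, _fileno */
#  include <windows.h>           /* Sleep */
#  define isatty _isatty
#  define fileno _fileno
#  ifndef S_ISDIR                /* MSVC has S_IFMT/S_IFDIR but no S_ISDIR */
#    define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)
#  endif
#  ifndef S_ISREG
#    define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
#  endif
#else
#  include <unistd.h>
#endif

/* usleep(us) -> Sleep(ms): 1..999 us rounds up to 1 ms instead of 0. */
#define US_TO_MS_CEIL(us) (((us) + 999) / 1000)
#ifdef _WIN32
#  define usleep(us) Sleep(US_TO_MS_CEIL(us))
#endif

/* Leading-zero count; x must be non-zero (__builtin_clz(0) is undefined). */
#if defined(_MSC_VER) && !defined(__clang__)
#  include <intrin.h>
static inline int compat_clz32(uint32_t x) { return (int)__lzcnt(x); }
#else
static inline int compat_clz32(uint32_t x) { return __builtin_clz(x); }
#endif

/* One call site that compiles on POSIX and MSVC thanks to the redirects. */
static int stderr_is_tty(void) { return isatty(fileno(stderr)) != 0; }

/* Constant extent instead of `const int n = 8; float a[n];`, which MSVC
 * rejects because a const variable is not a C constant expression. */
enum { n_slots = 8 };

/* C11 alignment specifier placed before the type: portable to MSVC /std:c11. */
static int scratch_is_aligned(void)
{
    _Alignas(32) float tmp[n_slots] = {0};
    return ((uintptr_t)(void *)tmp % 32u) == 0;
}

static int is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

On Linux and macOS every _WIN32 / _MSC_VER branch compiles away, so the original POSIX code path is what actually builds there.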
0023 — CUDA gencode coverage (sm_86/sm_89/compute_80 PTX) + init hardening
- Workstream PRs: the ADR-0122 PR (gencode + init hardening) and the ADR-0123 follow-up for the 32b115df post-cubin-load regression.
- Touches:
  - core/src/meson.build — the gencode array in the if get_option('enable_nvcc') branch.
  - core/src/cuda/common.c — vmaf_cuda_state_init() error paths (multi-line actionable log, cuda_free_functions() + free(c) + *cu_state = NULL cleanup).
  - docs/backends/cuda/overview.md — the "## Runtime requirements" section and the "### GPU architecture coverage" table.
- Invariant: the gencode array unconditionally emits cubins for sm_75/sm_80/sm_86/sm_89 plus a compute_80 PTX, independent of the host nvcc version. Upstream Netflix's gencode only ships cubins at Txx major boundaries (sm_75/sm_80/sm_90/sm_100/sm_120). A literal merge that replaces our array with upstream's would re-open the Ampere-sm_86 / Ada-sm_89 coverage hole. The sm_90/sm_100/sm_120 entries are still version-gated and should be preserved verbatim if upstream adds new gates. The init-path error messages are fork-local strings; upstream's terse "Error: failed to load CUDA functions" must NOT win a merge.
- Re-test:
meson setup build -Denable_cuda=true -Denable_nvcc=true
# -v makes ninja print full command lines (by default it shows only step descriptions).
ninja -C build -v 2>&1 | grep -E 'compute_(80|86|89)'
# Expect at least -gencode=arch=compute_86,code=sm_86 and
# -gencode=arch=compute_89,code=sm_89 and
# -gencode=arch=compute_80,code=compute_80
# Actionable init message (run without CUDA driver on the loader path):
LD_LIBRARY_PATH= ./build/tools/vmaf --help 2>&1 | grep -qi 'libcuda.so.1' || \
echo "init log regressed"
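The failure contract that the init hardening enforces can be sketched as follows: log something actionable, release whatever was loaded, free the state, and leave the caller's out-pointer NULL. Everything in the block is a hypothetical stand-in. CudaStateSketch, load_functions() and free_functions() take the place of the fork's state struct, driver-loader calls and cuda_free_functions(). A flag simulates the missing-driver case, and the log text and return codes are illustrative only.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the fork's CUDA state and loader helpers. */
typedef struct { int functions_loaded; } CudaStateSketch;

static int load_functions(CudaStateSketch *c, int simulate_missing_driver)
{
    if (simulate_missing_driver)
        return -1;                /* e.g. libcuda.so.1 not found */
    c->functions_loaded = 1;
    return 0;
}

static void free_functions(CudaStateSketch *c) { c->functions_loaded = 0; }

/* On failure: print an actionable message, undo partial work, free the
 * state, and leave *out == NULL so callers never hold a half-built state. */
static int cuda_state_init_sketch(CudaStateSketch **out, int simulate_missing_driver)
{
    CudaStateSketch *c = calloc(1, sizeof(*c));
    if (!c) {
        *out = NULL;
        return -ENOMEM;
    }
    if (load_functions(c, simulate_missing_driver) != 0) {
        fprintf(stderr,
                "cuda: could not load driver functions (libcuda.so.1)\n"
                "cuda: install the NVIDIA driver or add it to the loader path\n");
        free_functions(c);
        free(c);
        *out = NULL;
        return -EINVAL;
    }
    *out = c;
    return 0;
}
```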
0024 — vmaf_read_pictures null-guard for CUDA device-only path
- Workstream PRs: the ADR-0123 follow-up landed atop the ADR-0122 gencode/init-hardening work.
- Touches:
  - core/src/libvmaf.c — the non-threaded tail of vmaf_read_pictures at the prev_ref update site (line ~1428 in the fork; the upstream equivalent is the tail added by f740276a).
- Invariant: the prev_ref update is guarded by if (ref && ref->ref). Pure-CUDA extractor sets (where ref = &ref_host but ref_host was never populated by translate_picture_device) therefore do not dereference a NULL refcount. Upstream currently has the same unguarded tail. The bug is masked upstream only because the experimental VMAF_PICTURE_POOL gate from 32b115df is still in place. A literal upstream merge that removes our null-guard while that gate still holds would pass tests. It would then re-open the libvmaf_cuda ffmpeg crash the moment the gate flips default-on, which the fork already did in 65460e3a (ADR-0104). Keep the guard until the upstream null-guard port lands.
- Re-test:
# Unit tests cover the non-regression on the library side:
meson test -C build
# End-to-end regression: ffmpeg libvmaf_cuda must exit 0 on a
# CUDA-device-only extractor set (full recipe in ADR-0123).
./ffmpeg -init_hw_device cuda=cu:0 -filter_hw_device cu \
-i /tmp/ref.mp4 -i /tmp/dis.mp4 \
-lavfi "[0:v]format=yuv420p,hwupload_cuda[r];\
[1:v]format=yuv420p,hwupload_cuda[d];\
[r][d]libvmaf_cuda=log_path=/tmp/out.json:log_fmt=json" \
-f null -
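The guard's logic can be sketched with hypothetical stand-in types: a picture whose ref field is its refcount pointer, and a prev_ref update that only runs when the host-side picture was actually populated. RefSketch, PictureSketch, ReaderSketch and update_prev_ref are illustrative assumptions, not libvmaf's VmafPicture API or its real reference-taking helper.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a refcount and a picture that may lack one. */
typedef struct { int count; } RefSketch;
typedef struct { RefSketch *ref; } PictureSketch;

typedef struct {
    PictureSketch prev_ref;
    int has_prev_ref;
} ReaderSketch;

/* Mirrors `if (ref && ref->ref)`: on a CUDA-device-only extractor set,
 * `ref` points at a host picture that was never populated, so ref->ref
 * is NULL and must not be dereferenced. */
static void update_prev_ref(ReaderSketch *r, const PictureSketch *ref)
{
    if (ref && ref->ref) {
        ref->ref->count++;       /* hold a reference for prev_ref */
        r->prev_ref = *ref;
        r->has_prev_ref = 1;
    }
}
```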
0025 — VIF init() fail-path frees advanced byte-cursor
- Workstream PRs: PR #47, rewritten as leak-fix-only after master absorbed the void → uint8_t half via commit b0a4ac3a (entry 0022 §e). Ports the leak-fix half of upstream Netflix PR #1476.
- Touches:
  - core/src/feature/integer_vif.c (UPSTREAM — 2-line fix in the init() fail: handler).
- Invariant: init() walks uint8_t *data forward through aligned_malloc's single allocation, advancing past each sub-pointer assignment. If vmaf_feature_name_dict_from_provided_features returns NULL, the fail path must free the base pointer s->public.buf.data, never the advanced cursor data. Upstream master still has aligned_free(data) there, which is the same bug. This entry is the reminder not to let an upstream sync re-introduce the advanced-cursor form. If upstream lands PR #1476 or an equivalent, the sync can drop this entry.
- Re-test:
meson test -C build --suite=fast
# Static check: ripgrep the pattern that must NOT return.
rg -n "aligned_free\(data\)" core/src/feature/integer_vif.c && \
echo 'REGRESSED' || echo 'ok'
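A minimal sketch of the allocation pattern, assuming plain malloc/free in place of libvmaf's aligned_malloc/aligned_free. One block is carved into typed sub-buffers through a uint8_t * byte cursor, which is the entry 0022 §e form, because arithmetic on void * is a GNU extension. The fail path frees the saved base pointer. BufSketch and buf_init are hypothetical names, and simulate_failure stands in for vmaf_feature_name_dict_from_provided_features returning NULL.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: two typed planes carved out of one allocation. */
typedef struct {
    void     *base;    /* what must be freed (s->public.buf.data upstream) */
    uint16_t *mu1;
    uint32_t *mu1_32;
} BufSketch;

static int buf_init(BufSketch *s, size_t n, int simulate_failure)
{
    const size_t bytes = n * sizeof(uint16_t) + n * sizeof(uint32_t);
    s->base = malloc(bytes);
    if (!s->base)
        return -1;

    uint8_t *data = s->base;               /* byte cursor: portable arithmetic */
    s->mu1 = (uint16_t *)data;
    data += n * sizeof(uint16_t);
    s->mu1_32 = (uint32_t *)data;          /* 4-byte aligned when n is even */
    data += n * sizeof(uint32_t);

    if (simulate_failure)
        goto fail;
    return 0;

fail:
    /* Free the base; free(data) would pass a pointer malloc never returned. */
    free(s->base);
    s->base = NULL;
    return -1;
}
```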
0026 — Automated rule-enforcement workflow + copyright pre-commit hook
- Workstream PRs: this PR (ADR-0124 adoption). Closes the "rule-without-a-check" gap on ADR-0100 / 0105 / 0106 / 0108.
- Touches (all FORK-ADDED — no upstream overlap):
  - .github/workflows/rule-enforcement.yml (new)
  - scripts/ci/check-copyright.sh (new)
  - .pre-commit-config.yaml (appended local hook)
- Invariant: the deep-dive-checklist job is blocking on every PR that is not an upstream port; ports are exempt via a port: title prefix or a port/ branch. The other three gates (doc-substance-check, adr-backfill-check, copyright pre-commit) are advisory or pre-commit, never CI-blocking. This split is the whole point of ADR-0124, and an upstream sync must not move them into the required-status-check set without a follow-up ADR. The opt-out parser matches /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ per ADR-0108 §Opt-out-lines. If upstream ever changes PR-template phrasing (unlikely, since this is fork-local), the regex and the template must move together.
- Re-test:
# Lint the workflow + hook locally.
pre-commit run --files \
.github/workflows/rule-enforcement.yml \
scripts/ci/check-copyright.sh \
.pre-commit-config.yaml
# Dry-run the copyright hook against a staged source file.
scripts/ci/check-copyright.sh core/src/libvmaf.c && echo ok
# Synthetic PR body that violates ADR-0108 should fail the parser;
# see docs/research/0002-automated-rule-enforcement.md §Verification
# plan for the three test cases.
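For a local check of candidate opt-out lines, the documented pattern can be translated to POSIX extended regex syntax. <regex.h> supports neither \s nor (?:...), so \s becomes [[:space:]] and the non-capturing group becomes a plain group. This is a sketch under that translation, not the workflow's own parser. Matching stays case-sensitive because the documented pattern carries no i flag. is_opt_out_line is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX ERE translation of /^-?\s*no .* (?:needed|impact|rebase-sensitive)/ */
static const char *OPT_OUT_ERE =
    "^-?[[:space:]]*no .* (needed|impact|rebase-sensitive)";

/* Returns 1 on match, 0 on no match, -1 if the pattern fails to compile. */
static int is_opt_out_line(const char *line)
{
    regex_t re;
    if (regcomp(&re, OPT_OUT_ERE, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Note that the literal space after "no" plus the space before the keyword means at least one word must sit between them ("- no docs needed" matches).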
0027 — SSIMULACRA 2 scalar extractor (libjxl FastGaussian IIR blur)
- Workstream PRs: this PR (feat/ssimulacra2-scalar); proposal ADR in PR #67.
- Touches:
  - core/src/feature/ssimulacra2.c (fork-local, new)
  - core/src/meson.build
  - core/src/feature/feature_extractor.c
- Invariant: the extractor embeds several tables that must track libjxl upstream:
  - the opsin absorbance matrix;
  - the MakePositiveXYB offsets;
  - the 108 pooling weights;
  - the polynomial-transform coefficients;
  - the FastGaussian coefficient-derivation formulas (radius = 3.2795·σ + 0.2546, Cramer's 3×3 solve for β, n2/d1 assignment per Charalampidis 2016 (33)).

  If libjxl ever changes any of these, update ssimulacra2.c in the same PR that syncs upstream. Self-consistency must stay at exactly 100.000000 for identical ref/dist inputs; this is the cheapest regression check.
- Re-test:
meson test -C build --suite=fast
./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc00_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 --feature ssimulacra2 -o /tmp/self.xml \
&& grep -q 'ssimulacra2="100.000000"' /tmp/self.xml \
&& echo "ok: self-consistency 100.0"
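The radius formula from the invariant can be checked numerically. This sketch assumes the result is rounded to the nearest integer, as the recursive-Gaussian setup in libjxl does. It uses (int)(x + 0.5), which matches round() for these positive inputs and avoids a libm dependency. fast_gaussian_radius is a hypothetical name, and the σ values in the test are sample inputs, not the extractor's configuration.

```c
/* Radius N of the recursive (IIR) Gaussian for sigma > 0:
 * N = round(3.2795 * sigma + 0.2546). The rounding is this sketch's
 * assumption; (int)(x + 0.5) equals round(x) for these positive inputs. */
static int fast_gaussian_radius(double sigma)
{
    return (int)(3.2795 * sigma + 0.2546 + 0.5);
}
```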
0028 — MS-SSIM separable decimate + AVX2/AVX-512/NEON SIMD
- Workstream PRs:
  - feat/ms-ssim-decimate-simd-v2 (AVX2/AVX-512). It supersedes the rebase-incompatible feat/ms-ssim-decimate-simd. Commits: 7de8cd7f scalar separable, 5f93c864 AVX2, 73436438 AVX-512.
  - feat/ms-ssim-decimate-neon-v2 (NEON follow-up, stacked).
- Touches:
  - core/src/feature/ms_ssim_decimate.{c,h} (NEW)
  - core/src/feature/x86/ms_ssim_decimate_avx2.{c,h} (NEW)
  - core/src/feature/x86/ms_ssim_decimate_avx512.{c,h} (NEW)
  - core/src/feature/arm64/ms_ssim_decimate_neon.{c,h} (NEW)
  - core/src/feature/ms_ssim.c (call-site change)
  - core/src/meson.build (register new SIMD TUs)
  - core/test/test_ms_ssim_decimate.c (NEW)
  - core/test/meson.build (arm64 gating)
- Invariant: the 9-tap 9/7 biorthogonal wavelet LPF coefficients (ms_ssim_lpf_h / ms_ssim_lpf_v) are duplicated verbatim in five TUs for bit-identity:
  - the scalar ms_ssim_decimate.c;
  - the AVX2 variant;
  - the AVX-512 variant;
  - the NEON variant;
  - upstream's g_lpf_h / g_lpf_v in ms_ssim.c.

  Any upstream change to the coefficient values or to the KBND_SYMMETRIC mirror branch in iqa/convolve.c must be mirrored to all five. If the copies drift, the SIMD and scalar paths diverge. The bit-equality memcmp in test_ms_ssim_decimate catches that only when the test runs, so diff the five files first.
- Re-test (on each supported host arch):
# x86_64 host — native build.
meson test -C build
./build/test/test_ms_ssim_decimate
# aarch64 host OR aarch64 cross under qemu — see /tmp/aarch64-cross.txt.
meson setup build-arm64 libvmaf --cross-file /tmp/aarch64-cross.txt \
-Denable_cuda=false -Denable_sycl=false
ninja -C build-arm64
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
build-arm64/test/test_ms_ssim_decimate
# Netflix MS-SSIM golden — places=4 must still pass through SIMD.
.venv/bin/python -m pytest \
python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
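One pass of a separable decimate can be sketched in scalar C: filter each row with a symmetric 9-tap low-pass kernel and keep every second sample. Out-of-range taps are reflected back into the row, and the vertical pass repeats the same step on the result.

This is an illustration under stated assumptions:
- the tap values are passed in rather than hard-coded, because the real coefficients live in ms_ssim_lpf_h / ms_ssim_lpf_v and should not be retyped by hand;
- keeping the even-indexed positions is an assumption of the sketch;
- the reflection follows the period-based KBND_SYMMETRIC form from entry 0029;
- reflect_idx and decimate_row are hypothetical names.

```c
#define TAPS 9
#define HALF (TAPS / 2)

/* Half-sample symmetric index (... 1 0 | 0 1 ... n-1 | n-1 n-2 ...),
 * folded with period 2n so any offset lands in [0, n). */
static int reflect_idx(int x, int n)
{
    const int period = 2 * n;
    x %= period;
    if (x < 0)
        x += period;
    return x < n ? x : period - 1 - x;
}

/* Low-pass one row with a 9-tap kernel and keep the even positions.
 * dst must hold (w + 1) / 2 samples. Each product is rounded to float,
 * and the sum is accumulated in double. */
static void decimate_row(const float *src, int w, const float taps[TAPS], float *dst)
{
    for (int x = 0, ox = 0; x < w; x += 2, ++ox) {
        double sum = 0.0;
        for (int k = 0; k < TAPS; ++k) {
            const float p = src[reflect_idx(x + k - HALF, w)] * taps[k];
            sum += (double)p;
        }
        dst[ox] = (float)sum;
    }
}
```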
0029 — KBND_SYMMETRIC period-based reflection in iqa/convolve.c
- Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
- Touches:
  - core/src/feature/iqa/convolve.c (upstream file, rewritten KBND_SYMMETRIC).
- Invariant: KBND_SYMMETRIC(img, w, h, x, y, _) must use the period-based form (period = 2*w, period = 2*h), so that offsets with |x| > w or |y| > h still land in bounds. Upstream's single-reflect form went out of bounds whenever w < kernel_half or h < kernel_half. The latent bug did not reproduce in Netflix golden tests because MS-SSIM pyramids never decimate below ~60×34. Any upstream change that reverts to the single-reflect form must be rejected or re-ported.
- Re-test:
./build/test/test_ms_ssim_decimate # test_1x1 border case
.venv/bin/python -m pytest \
python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_ms_ssim_fextractor
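The difference between the two boundary rules is easiest to see by counting kernel offsets that land outside [0, n). single_reflect is a reconstruction of the pre-fix single-mirror mapping, and period_reflect folds with period 2n before mirroring. Both function names and the counting harness are hypothetical. With a 9-tap kernel (half = 4) on a 1-pixel-wide image, the single-reflect form sends six offsets out of bounds. The period-based form keeps every offset in range and agrees with the old form wherever the old form was valid.

```c
/* Pre-fix single-reflect mapping: one mirror per side. */
static int single_reflect(int x, int n)
{
    if (x < 0)
        return -1 - x;
    if (x >= n)
        return (n - (x - n)) - 1;
    return x;
}

/* Post-fix period-based mapping: fold into [0, 2n), then mirror. */
static int period_reflect(int x, int n)
{
    const int period = 2 * n;
    x %= period;
    if (x < 0)
        x += period;
    return x < n ? x : period - 1 - x;
}

/* Count offsets in [-half, n - 1 + half] that `map` sends outside [0, n).
 * A correct boundary rule returns 0 for every n >= 1. */
static int count_out_of_bounds(int (*map)(int, int), int n, int half)
{
    int bad = 0;
    for (int x = -half; x <= n - 1 + half; ++x) {
        const int i = map(x, n);
        if (i < 0 || i >= n)
            ++bad;
    }
    return bad;
}
```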
0030 — adm_decouple_s123_avx512 stack-array 64-byte alignment
- Workstream PRs: feat/ms-ssim-decimate-simd-v2 follow-up (CI triage on PR #69, 2026-04-20).
- Touches:
  - core/src/feature/x86/adm_avx512.c (upstream file, one-line _Alignas(64) on int64_t angle_flag[16] at line 1317).
  - core/test/test_pic_preallocation.c (upstream file, three vmaf_model_destroy(model) calls pairing the vmaf_model_load in test_picture_pool_basic / _small / _yuv444).
- Invariant: the stack slot for angle_flag must be 64-byte aligned. Two _mm512_loadu_si512(&angle_flag[0/8]) loads in the same scope may be promoted to aligned vmovdqa64 by LTO. Dropping the _Alignas(64) annotation re-introduces the SEGV under --buildtype=release -Db_lto=true -Db_sanitize=address. Debug / no-LTO builds keep vmovdqu64 and cannot flag the regression. See docs/development/known-upstream-bugs.md.
- Re-test:
meson setup build-asan-lto libvmaf \
-Denable_cuda=false -Denable_sycl=false \
-Db_sanitize=address --buildtype=release -Db_lto=true
ninja -C build-asan-lto test/test_pic_preallocation
ASAN_OPTIONS=detect_leaks=1 \
./build-asan-lto/test/test_pic_preallocation
0031 — Batch-A upstream-port small-fix sweep (ports of unmerged PRs)
- Workstream PRs: feat/batch-a-upstream-small-fix-sweep — commits 546a40ee (T0-1), 8fed8ad1 (T4-4), 83a1db46 (T4-5), 34425dee (T4-6). ADRs 0131, 0132, 0134, 0135.
- Touches:
  - core/src/cuda/picture_cuda.c (one-line cuMemFree port of Netflix#1382)
  - core/src/feature/feature_collector.c + core/test/test_feature_collector.c (mount/unmount bugfix port of Netflix#1406 + shared-helper test refactor)
  - core/src/meson.build (declare_dependency + override_dependency port of Netflix#1451)
  - core/include/libvmaf/model.h, core/src/model.c, core/test/test_model.c, docs/api/index.md (built-in model iterator port of Netflix#1424)
- Invariant: each of the four upstream PRs was OPEN (unmerged) on the port date. When Netflix merges any of them, the fork's version is correction-bearing, not line-identical: T4-4 carries a test refactor, and T4-6 carries three defect fixes + a Doxygen doc expansion. Resolution on upstream merge is always "keep fork version". The fork's version already satisfies the PR's intent and additionally fixes the defects.
- Netflix#1406 conflict will land in test_feature_collector.c: the fork uses a load_three_test_models() helper vs upstream's inline per-model VmafModel *m0, *m1, *m2; duplication.
- Netflix#1424 conflict will land in core/src/model.c and core/test/test_model.c: the fork uses an else if guard, an idx + 1 < CNT bound and const-qualified test types.
- Netflix#1382 and Netflix#1451 are line-identical in substance; merge should be clean aside from trailing-comma style drift.
- Re-test:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_feature_collector test/test_model
build/test/test_feature_collector
build/test/test_model
# Expected: 6/6 pass in test_feature_collector (mount/unmount
# 3-model sequences); 39/39 pass in test_model (includes
# test_version_next full-iteration invariant).
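The idx + 1 < CNT bound mentioned for Netflix#1424 is the usual guard for a "next element" iterator: step from the previous entry and stop before running off the table. The sketch is hypothetical throughout. ModelDescSketch, the placeholder table entries and model_desc_next are invented for illustration and are not the API declared in core/include/libvmaf/model.h.

```c
#include <stddef.h>

/* Hypothetical descriptor and table; the entries are placeholders. */
typedef struct { const char *name; } ModelDescSketch;

static const ModelDescSketch builtin_models[] = {
    { "model_a" }, { "model_b" }, { "model_c" },
};
#define CNT (sizeof(builtin_models) / sizeof(builtin_models[0]))

/* Pass NULL to start, then the previous result to advance; `prev` must be
 * a pointer returned by an earlier call. NULL marks the end. */
static const ModelDescSketch *model_desc_next(const ModelDescSketch *prev)
{
    if (!prev)
        return &builtin_models[0];
    const size_t idx = (size_t)(prev - builtin_models);
    if (idx + 1 < CNT)           /* never step past the last entry */
        return &builtin_models[idx + 1];
    return NULL;
}
```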
0032 — Thread-local locale handling for numeric I/O (port of Netflix/vmaf#1430)
- Workstream PRs: port/netflix-1430-thread-locale (T4-3 from the "Batch-A follow-up" sweep, 2026-04-20).
- Touches:
  - core/src/thread_locale.h / core/src/thread_locale.c (new, upstream-authored)
  - core/src/meson.build (two cdata.set('HAVE_USELOCALE' / 'HAVE_XLOCALE_H') probes + src_dir + 'thread_locale.c' in libvmaf_sources)
  - core/src/output.c (four writers gain a push_c() + pop() bracket, preserving the fork's ferror(outfile) ? -EIO : 0 return contract from ADR-0119)
  - core/src/svm.cpp (drop the <locale.h> include; replace the setlocale/strdup/setlocale bracket with vmaf_thread_locale_push_c/pop; add buffer.imbue(std::locale::classic()) to both SVM parser ctors in the fork's K&R + 4-space style)
  - core/src/read_json_model.c (bracket model_parse with push/pop)
  - core/test/meson.build (new test_locale_handling target + test registration)
  - core/test/test_locale_handling.c (new, upstream-authored, with three fork corrections for the score_format parameter)
- Invariant: the fork's output writers return ferror(outfile) ? -EIO : 0, and this must survive any upstream refactor of the writer bodies. The push_c() call MUST be paired with a pop() on every return path. The writer bodies have a single tail return, so the pattern is locally push → body → pop → return ferror-check. Dropping pop() leaks a locale_t on POSIX and leaves the thread locked to "C" on Windows.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_locale_handling
# Repro the user-visible failure without the fix:
LC_ALL=de_DE.UTF-8 build/tools/vmaf --reference ref.yuv \
--distorted dis.yuv --width 1920 --height 1080 \
--pixel_format 420 --bitdepth 8 --output result.json \
--json
# Assert output contains period decimals, not comma.
python -c "import json; d=json.load(open('result.json')); \
assert all('.' in repr(v) for v in \
[f['metrics']['vmaf'] for f in d['frames']])"
- On upstream sync: when Netflix merges PR #1430, the (cherry picked from commit 054a97ed…) trailer in git log port/netflix-1430-thread-locale lets the next /sync-upstream skip this commit. If the upstream diff drifts, redo the three fork corrections listed in ADR-0137 §Decision.
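A minimal POSIX sketch of the push_c() → body → pop() bracket, built on newlocale/uselocale/freelocale. LocaleGuardSketch, locale_push_c, locale_pop and write_score are hypothetical names. The fork's actual vmaf_thread_locale_push_c/pop signatures are not reproduced here, and the Windows path is omitted. The writer keeps the single-tail-return shape and the ferror(...) ? -EIO : 0 contract from the invariant.

```c
#ifndef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L   /* newlocale, uselocale, fmemopen */
#endif
#include <errno.h>
#include <locale.h>
#include <stdio.h>

typedef struct {
    locale_t c_locale;   /* the "C" locale installed for this thread */
    locale_t previous;   /* whatever the thread was using before */
} LocaleGuardSketch;

static int locale_push_c(LocaleGuardSketch *g)
{
    g->c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (g->c_locale == (locale_t)0)
        return -ENOMEM;
    g->previous = uselocale(g->c_locale);   /* affects this thread only */
    return 0;
}

static void locale_pop(LocaleGuardSketch *g)
{
    uselocale(g->previous);    /* may be LC_GLOBAL_LOCALE, which is valid */
    freelocale(g->c_locale);   /* skipping this is the locale_t leak */
}

/* Writer shape from the invariant: push -> body -> pop -> ferror-check. */
static int write_score(FILE *out, double score)
{
    LocaleGuardSketch g;
    if (locale_push_c(&g) != 0)
        return -ENOMEM;
    fprintf(out, "%.6f\n", score);   /* '.' decimal separator under "C" */
    locale_pop(&g);
    return ferror(out) ? -EIO : 0;
}
```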
0033 — SSIM / MS-SSIM SIMD bit-exact to scalar via per-lane scalar double
- Workstream PRs: feat/ms-ssim-decimate-neon (this PR — companion to the ADR-0138 convolve fast path).
- Touches:
  - core/src/feature/x86/ssim_avx2.c and core/src/feature/x86/ssim_avx512.c — ssim_accumulate_* rewritten; ssim_precompute_* and ssim_variance_* unchanged (they were already bit-exact).
  - The new bit-exact convolve_avx2.c / convolve_avx512.c, plus the upstream h-pass OOB fix at iqa/convolve.c:159.
- Invariants (see ADR-0139 §Decision):
- Convolve taps — single-rounded float*float → widen → double add, NO FMA. This mirrors the scalar sum += img[i]*k[j] in iqa/convolve.c.
- SSIM accumulate — the 2.0 * in scalar's 2.0 * ref_mu[i] * cmp_mu[i] + C1 and 2.0 * srsc + C2 is a C double literal, so those numerators are evaluated in double. Both SIMD accumulators compute the 2.0 * numerator, the division and the final l*c*s product per lane in scalar double. This matches the scalar type promotions byte-for-byte.
- H-pass outer-loop bound — y < dst_h + vc - kh_even (not y < dst_h + vc). The - kh_even is load-bearing. On even-tap kernels (e.g. box-8) the v-pass never reads the last cache row, but the h-pass previously wrote it out of bounds when the image height equals the kernel height.
Fork-local SSIM SIMD is NOT upstream. If upstream ever adds its own SSIM AVX2/AVX-512, keep the fork's version on conflict: it is the only variant verified bit-exact to scalar at --precision max.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_iqa_convolve test_ms_ssim_decimate
# Bit-exactness check across dispatch backends:
FIX=python/test/resource/yuv/checkerboard_1920_1080_10_3_0_0.yuv
DIS=python/test/resource/yuv/checkerboard_1920_1080_10_3_1_0.yuv
for m in 255 16 0; do
build/tools/vmaf --cpumask $m --reference $FIX --distorted $DIS \
--width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
--feature float_ssim --feature float_ms_ssim \
--output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_16.xml) # expect empty
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_0.xml) # expect empty
- On upstream sync: the AVX2/AVX-512 SSIM surface is entirely fork-local (upstream has VIF/ADM/motion/CAMBI SIMD but no SSIM). If upstream ever introduces SSIM SIMD, its kernel bodies will very likely compute l*c*s in vector float for throughput; do not adopt that. The fork's per-lane scalar-double reduction is required for the bit-exactness claim. The same applies to convolve_avx2/512: they are fork-only, and dispatch sits in ssim_tools.c via _iqa_convolve_set_dispatch.
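The widen-then-add rule can be checked without SIMD hardware. The following is a minimal portable sketch, not the fork's kernel. conv1d_lanes4 is a hypothetical function that emulates one 4-wide vector step with plain loops. Lane l owns output pixel x + l and walks the taps in the same order as the scalar loop. Each tap is a float multiply, then a widen to double, then a double add, with no FMA. The mapping to specific AVX2 / AVX-512 intrinsics is deliberately left out.

```c
#include <assert.h>
#include <string.h>

/* Scalar reference for one output pixel: taps accumulated left to right.
 * img[x + j] * k[j] is a float multiply (rounded once to float); the result
 * is then widened to double for the add, as in `sum += img[i] * k[j]`. */
static double conv1d_scalar_px(const float *img, const float *k, int kw, int x)
{
    double sum = 0.0;
    for (int j = 0; j < kw; j++)
        sum += img[x + j] * k[j];
    return sum;
}

/* Hypothetical 4-lane emulation: lane l computes output pixel x + l.
 * Per tap: packed float multiply, widen each lane to double, double add.
 * No FMA and no cross-lane reduction, so each lane matches the scalar path. */
static void conv1d_lanes4(const float *img, const float *k, int kw, int x,
                          double out[4])
{
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    for (int j = 0; j < kw; j++) {
        float prod[4];
        for (int l = 0; l < 4; l++)
            prod[l] = img[x + l + j] * k[j]; /* single-rounded float product */
        for (int l = 0; l < 4; l++)
            acc[l] += (double)prod[l];       /* widen, then double add */
    }
    memcpy(out, acc, sizeof acc);
}
```

Each lane accumulates its own output pixel in tap order, so no cross-lane reduction is needed, and that is what keeps the result byte-identical to scalar. Summing partial products across lanes would instead reorder the additions and break bit-exactness.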
0034: SIMD DX framework + NEON SSIM/convolve bit-exact port
- Workstream PRs: feat/simd-dx-framework (this PR, PR #A) ships the framework together with two demos. PR #B will consume the framework for further kernels (ssimulacra2, motion_v2, vif_statistic, ...).
- Touches:
  - core/src/feature/simd_dx.h (new header).
  - core/src/feature/arm64/convolve_neon.c + convolve_neon.h (new NEON port).
  - core/src/feature/arm64/ssim_neon.c (ssim_accumulate_neon rewritten for ADR-0139 bit-exactness; precompute + variance unchanged).
  - core/src/feature/float_ssim.c + core/src/feature/float_ms_ssim.c (wire iqa_convolve_neon into the aarch64 dispatch setters).
  - core/src/meson.build (arm64_sources += convolve_neon.c).
  - core/test/meson.build (test_iqa_convolve arch filter extended to arm64/aarch64).
  - core/test/test_iqa_convolve.c (NEON variant check + aarch64 CPU flag detection).
  - core/test/dnn/meson.build (test_cli.sh gated on not meson.is_cross_build(): the bash script invokes $VMAF_BIN directly, so meson's exe_wrapper isn't applied).
  - New build-aux/aarch64-linux-gnu.ini meson cross-file.
  - .claude/skills/add-simd-path/SKILL.md (upgraded kernel-spec flags).
- Invariants (see ADR-0140 §Decision):
  - simd_dx.h is fork-local. Keep the fork's version on upstream conflict. Macro names are ISA-suffixed (_AVX2_4L, _AVX512_8L, _NEON_4L); do not collapse them into a cross-ISA abstraction. The fork's SIMD policy (user-memory feedback_simd_dx_scope.md) rules out Highway / simde / xsimd.
  - The ADR-0138 widen-then-add rule (single-rounded float * float → widen → double add, NO FMA) applies to NEON exactly as to AVX2 / AVX-512. The NEON form uses paired float64x2_t accumulators (lo / hi) because NEON has no float64x4_t.
  - The ADR-0139 per-lane scalar-double reduction rule applies to ssim_accumulate_neon exactly as to the AVX2 / AVX-512 variants. The NEON implementation uses SIMD_ALIGNED_F32_BUF_NEON (_Alignas(16) float name[4]) plus a 4-iteration scalar loop.
- Re-test (requires aarch64-linux-gnu-gcc + qemu-user-static + an aarch64 sysroot at /usr/aarch64-linux-gnu):
cd libvmaf
meson setup ../build-aarch64 \
--cross-file ../build-aux/aarch64-linux-gnu.ini \
-Denable_cuda=false -Denable_sycl=false -Denable_dnn=disabled
cd ..
ninja -C build-aarch64
meson test -C build-aarch64 # expect 31/31 OK
# Bit-exactness check scalar vs NEON under QEMU:
REF=python/test/resource/yuv/src01_hrc00_576x324.yuv
DIS=python/test/resource/yuv/src01_hrc01_576x324.yuv
for m in 255 0; do
LD_LIBRARY_PATH=$PWD/build-aarch64/src qemu-aarch64-static \
-L /usr/aarch64-linux-gnu build-aarch64/tools/vmaf \
--cpumask $m --reference $REF --distorted $DIS \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--feature float_ssim --feature float_ms_ssim \
--output /tmp/ssim_$m.xml --precision max
done
diff <(grep -v '<fyi fps' /tmp/ssim_255.xml) \
<(grep -v '<fyi fps' /tmp/ssim_0.xml) # expect empty
- On upstream sync: upstream has no NEON SSIM and no NEON convolve for IQA. If upstream ever adds one, keep the fork's version on conflict: the fork's NEON path is the only variant verified bit-exact to scalar at --precision max. The build-aux/aarch64-linux-gnu.ini cross-file has no upstream equivalent. The /add-simd-path skill is fork-only; upstream doesn't ship .claude/skills/.
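The NEON per-lane reduction pattern can be sketched in portable C11. SIMD_ALIGNED_F32_BUF_NEON here only reproduces the expansion quoted above (_Alignas(16) float name[4]); the real definition in simd_dx.h may differ. ssim_l_numerators_per_lane is a hypothetical helper. Its memcpy calls stand in for spilling float32x4_t registers to the aligned buffers. Its 4-iteration loop evaluates 2.0 * ref_mu * cmp_mu + C1 in double, the same way the scalar expression does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed expansion of the simd_dx.h macro, per the note above. */
#define SIMD_ALIGNED_F32_BUF_NEON(name) _Alignas(16) float name[4]

/* Hypothetical per-lane step. In a NEON kernel the two memcpy calls would be
 * vst1q_f32 spills of float32x4_t registers into the aligned buffers. */
static void ssim_l_numerators_per_lane(const float *ref_mu_vec,
                                       const float *cmp_mu_vec, double c1,
                                       double out[4])
{
    SIMD_ALIGNED_F32_BUF_NEON(ref_lanes);
    SIMD_ALIGNED_F32_BUF_NEON(cmp_lanes);
    memcpy(ref_lanes, ref_mu_vec, sizeof ref_lanes);
    memcpy(cmp_lanes, cmp_mu_vec, sizeof cmp_lanes);

    /* 2.0 is a double literal, so both products and the + c1 are evaluated
     * in double, exactly like the scalar expression. */
    for (int i = 0; i < 4; i++)
        out[i] = 2.0 * ref_lanes[i] * cmp_lanes[i] + c1;
}
```

With 8-bit input, the usual SSIM constant C1 = (0.01 * 255)^2 = 6.5025 can be passed as c1.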
0036: Port Netflix generalised AVX convolve + ADR-0141 cleanup
- Workstream PRs: port/upstream-f3a628b4-generalized-avx-convolve (this PR).
- Upstream commit: f3a628b4 "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21).
- Touches:
  - convolution.h: upstream-tracking; adds #define MAX_FWIDTH_AVX_CONV 17.
  - convolution_avx.c: upstream-tracking (2,500 LoC deletion) plus a fork-delta cleanup per ADR-0141. The four scanline helpers convolution_f32_avx_s_1d_* changed from external linkage to static (no other TU uses them after the specialised-path removal). Their stride parameters were widened from int to ptrdiff_t, with (ptrdiff_t) casts at the public-function multiplication sites, and #include <stddef.h> was added for the type.
  - core/src/feature/vif_tools.c: upstream-tracking; the three AVX dispatch sites drop the fwidth == 17 || ... == 3 whitelist in favour of fwidth <= MAX_FWIDTH_AVX_CONV.
  - python/test/quality_runner_test.py, python/test/vmafexec_test.py: upstream-authored loosening of two full-VMAF-score assertions from places=2 (±0.005) to places=1 (±0.05). Adopted per the ADR-0142 Netflix-authority precedent (project rule #1 addresses fork drift, not upstream-authored test updates the fork must track).
- Invariants (see ADR-0143 §Decision):
  - Static linkage on scanline helpers: upstream leaves the four convolution_f32_avx_s_1d_*_scanline helpers with external linkage out of habit; the fork narrows them to static. On upstream sync, treat any upstream reference to them from another TU as a flag to re-audit, and keep the fork's static unless the reference is real.
  - ptrdiff_t strides inside helpers: the public convolution_f32_avx_*_s wrappers keep int strides (matching the upstream interface and the convolution.h declarations). The helpers take ptrdiff_t to silence bugprone-implicit-widening-of-multiplication-result. If upstream changes the public interface to ptrdiff_t, drop the fork's wrapper-level casts.
  - MAX_FWIDTH_AVX_CONV = 17: the ceiling is upstream's. If upstream bumps it, the fork must rebuild and re-run the VIF golden test pair.
- Re-test:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build # expect 32/32 OK
clang-tidy -p build core/src/feature/common/convolution_avx.c
# Zero warnings expected on the touched file.
The Netflix CPU golden CI leg exercises the two loosened assertions; confirmed locally under meson test.
- On upstream sync: upstream is the source of truth for convolution_avx.c, convolution.h, the vif_tools.c dispatch, and the two Python golden tolerances. On a rebase, prefer upstream for those files, except:
  - Keep the fork's static on the four scanline helpers.
  - Keep the fork's ptrdiff_t helper signatures and multiplication-site casts (unless upstream adopts them too, in which case converge).
  - Keep the fork's #include <stddef.h>.
  If upstream re-introduces a specialised fast path for common widths, evaluate it on a per-fwidth perf profile; the fork's /profile-hotpath skill covers this.
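This is a minimal sketch of the two conventions above. The first is an internal-linkage helper that takes a ptrdiff_t stride, so the row-offset multiplication happens in ptrdiff_t. It sits behind a wrapper that keeps the public int-stride interface and casts once. The second is the width-ceiling dispatch predicate. MAX_FWIDTH_AVX_CONV is the value quoted above. row_ptr, sample_at, and use_avx_conv are hypothetical names, not functions from convolution_avx.c or vif_tools.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FWIDTH_AVX_CONV 17 /* ceiling quoted from convolution.h */

/* Hypothetical dispatch predicate: any width up to the ceiling takes the
 * generalised AVX path, replacing the old fwidth == 17 || ... == 3 list. */
static bool use_avx_conv(int fwidth)
{
    return fwidth <= MAX_FWIDTH_AVX_CONV;
}

/* Hypothetical internal helper: static linkage and a ptrdiff_t stride, so
 * row * stride is computed in ptrdiff_t instead of int-then-widened. */
static const float *row_ptr(const float *base, ptrdiff_t stride, int row)
{
    return base + row * stride;
}

/* Hypothetical public wrapper: keeps the int-stride interface and casts once
 * at the call into the helper. */
float sample_at(const float *base, int stride, int row, int col)
{
    return row_ptr(base, (ptrdiff_t)stride, row)[col];
}
```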
0038: motion_v2 NEON SIMD (fork-local)
- Workstream PR: port/motion-bundle-neon-and-updates (this PR).
- Upstream: none; aarch64 NEON for motion_v2 is fork-local. Upstream scalar, AVX2, and AVX-512 variants exist; this PR adds the missing fourth (NEON) path. Scalar is the bit-exactness ground truth.
- Touches (fork-local):
  - motion_v2_neon.c: new TU, ~300 LoC. 4-wide int32 SIMD over the 5-tap Gaussian pipeline. Five static inline helpers keep every function under the ADR-0141 60-line budget.
  - motion_v2_neon.h: new header declaring the two public entry points.
  - integer_motion_v2.c: dispatch update. Adds an #if ARCH_AARCH64 block in init that selects the NEON variant when VMAF_ARM_CPU_FLAG_NEON is present, mirroring the existing x86 dispatch blocks.
  - core/src/meson.build: add arm64/motion_v2_neon.c to the arm64_sources list.
- Invariants (see ADR-0145 §Decision):
  - Arithmetic right-shift throughout. The fork's AVX2 path originally used _mm256_srlv_epi64 (logical), which can diverge from scalar on negative-diff pixels (fixed later; see follow-up T7-32 below). The NEON port uses vshrq_n_s64(v, 16) for the known Phase-2 shift. For the variable Phase-1 shift it uses vshlq_s64 with a vector of -(int64_t)bpc, since a negative count shifts right. Both are arithmetic, matching scalar C >> on signed integers. On rebase: keep the arithmetic forms; do NOT adopt vshrq_n_u64 or a logical emulation even if it runs faster.
  - 4-lane stride + mirror tails. SIMD stride = 4; scalar tails cover the remainder. The Phase-2 helper x_conv_row_sad_neon hands 4 lanes to x_conv_block4_neon and drops to scalar at both the left and right edges (j < 2 and j + 6 > w). On rebase: preserve the 4-lane stride and the two-sided scalar tail.
  - Signature parity with AVX2. Both pipeline entry points match the (const uint8_t *prev, ptrdiff_t, const uint8_t *cur, ptrdiff_t, int32_t *y_row, unsigned w, unsigned h, unsigned bpc) signature of the AVX2 and AVX-512 variants. On rebase: if upstream changes the signature, mirror the change here AND in the x86 variants in lockstep.
- Re-test:
meson setup build-aarch64 libvmaf \
--cross-file build-aux/aarch64-linux-gnu.ini \
-Denable_cuda=false -Denable_sycl=false
ninja -C build-aarch64
meson test -C build-aarch64 --no-rebuild # expect 31/31 OK
clang-tidy -p build-aarch64 \
core/src/feature/arm64/motion_v2_neon.c
# Zero warnings expected on the touched file.
# NEON-vs-scalar bit-exact diff under QEMU:
YUV=python/test/resource/yuv
for mask in 0 255; do
LD_LIBRARY_PATH=build-aarch64/src \
qemu-aarch64-static -L /usr/aarch64-linux-gnu \
build-aarch64/tools/vmaf \
-r $YUV/src01_hrc00_576x324.yuv \
-d $YUV/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 -n --feature motion_v2 \
--cpumask $mask -o /tmp/mv2_$mask.xml --precision max
done
diff <(grep -v 'fps=' /tmp/mv2_0.xml) \
<(grep -v 'fps=' /tmp/mv2_255.xml) # expect empty
- On upstream sync: upstream has no NEON motion_v2 and has not signalled plans to add one. If upstream ever does, diff its NEON against the fork's. On logical-vs-arithmetic shift, keep the fork's arithmetic form (it matches scalar). On the function decomposition (the five helpers), adopt upstream's if it is smaller; the fork's layout is ADR-0141-driven, not a semantic contract.
- Follow-up T7-32 (fixed 2026-05-09): the _mm256_srlv_epi64 (logical right shift) in motion_score_pipeline_16_avx2 was replaced with srav_epi64_imm. That is an AVX2-safe arithmetic-right-shift emulation: a logical shift OR'd with a sign-fill mask built via srai_epi32 + slli_epi64. Two bugs were closed in the same PR:
  - AVX2 logical-vs-arithmetic shift: _mm256_srlv_epi64 replaced by srav_epi64_imm in core/src/feature/x86/motion_v2_avx2.c. The emulation is bit-exact with scalar C >> bpc on signed int64_t.
  - Test scalar reference mirror: mirror_idx in core/test/test_motion_v2_simd.c used 2*size - idx - 1 instead of 2*size - idx - 2, diverging from integer_motion_v2.c::mirror(). Fixed to -2. All four adversarial fixtures (neg-diff bpc10/12, mixed-diff bpc10/12) now pass; meson test -C build reports 50/50 OK.
  On rebase: keep srav_epi64_imm; do not revert to _mm256_srlv_epi64. The rebase-time invariant is now: the AVX2 path uses an arithmetic shift (matching NEON and scalar).
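Both T7-32 fixes can be modelled on plain integers. sra64_emulated is a hypothetical scalar model of the srav_epi64_imm idea (logical shift OR sign-fill mask), not the AVX2 code itself. mirror_idx is a hypothetical reconstruction. Only the 2*size - idx - 2 term comes from the note above; the -idx branch for negative indices is an assumption about integer_motion_v2.c::mirror().

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of an arithmetic right shift built from a logical one:
 * logical shift, then OR in a mask that sets the top s bits when x < 0.
 * Valid for 0 <= s < 64. */
static int64_t sra64_emulated(int64_t x, unsigned s)
{
    uint64_t logical = (uint64_t)x >> s;                   /* srlv-like */
    uint64_t sign_fill = (x < 0) ? ~(UINT64_MAX >> s) : 0; /* top s bits */
    return (int64_t)(logical | sign_fill);
}

/* Hypothetical reflect-101 mirror (edge sample not repeated). The
 * 2 * size - idx - 2 term is the T7-32 fix; the negative branch is assumed. */
static int mirror_idx(int idx, int size)
{
    if (idx < 0)
        return -idx;
    if (idx >= size)
        return 2 * size - idx - 2;
    return idx;
}
```

With the -2 form, index size maps to size - 2, so the reflection does not repeat the last sample. The old -1 form mapped size to size - 1 and duplicated the edge sample.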
0039: readability-function-size NOLINT sweep (ADR-0146)
- ADR: ADR-0146
- Touches: core/src/dict.c, core/src/picture.c, core/src/picture_pool.c, core/src/predict.c, core/src/libvmaf.c, core/src/output.c, core/src/read_json_model.c, core/src/feature/feature_extractor.c, core/src/feature/feature_collector.c, core/src/feature/iqa/convolve.c, core/src/feature/iqa/ssim_tools.c, core/src/feature/x86/vif_statistic_avx2.c.
- Invariant: every readability-function-size NOLINT suppression has been replaced by a set of small static (or static inline, for the SIMD / IQA files) helpers. The helper names are stable interfaces the surrounding code depends on (e.g. iqa_convolve_1d_separable, iqa_convolve_2d, ssim_compute_stats, ssim_workspace_alloc/_free, vif_stat_simd8_compute/_reduce, struct vif_simd8_lane, read_pictures_extractor_loop, read_pictures_post_extractor, read_pictures_validate_and_prep, read_pictures_update_prev_ref). Upstream Netflix has no equivalent helpers; rebases touching any of these files will conflict against the fork's split shape.
- On upstream sync:
  - If upstream lands a different decomposition of _iqa_convolve or _iqa_ssim, prefer upstream's shape only if it keeps the ADR-0138 / ADR-0139 bit-exactness invariants (single-rounded float mul → widen to double → double add; per-lane scalar-double reduction through an aligned temp buffer). Otherwise keep the fork's split and re-document the divergence here.
  - The fork renamed _calc_scale → iqa_calc_scale to clear the bugprone-reserved-identifier check. If upstream modifies _calc_scale, keep the fork's name and port the behavioural change.
  - model_collection_parse_loop writes directly to cfg_name rather than through c->name. If upstream ever rewrites model_collection_parse, preserve the direct write (it is what lets the parameter stay non-const without a NOLINT).
- Re-test on rebase (x86, any libsvm-less host):
ninja -C build && meson test -C build
for mask in 0 255; do
VMAF_CPU_MASK=$mask ./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
-m version=vmaf_v0.6.1 -o /tmp/vmaf_$mask.xml
done
diff <(grep -v fyi /tmp/vmaf_0.xml) <(grep -v fyi /tmp/vmaf_255.xml)
# expect exit 0 (Netflix-golden-pair VMAF bit-identical scalar vs SIMD)
Also run clang-tidy -p build on every file in Touches; expect zero warnings.
- Follow-up T7-6: decide whether to rename the _iqa_* API surface (convolve / ssim / decimate / img_filter / filter_pixel / get_pixel) across all callers to clear the remaining bugprone-reserved-identifier suppressions in ssim.c, ms_ssim.c, and float_ms_ssim.c. Out of scope here.
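The renames in this entry and in T7-6 follow the C11 reserved-identifier rules (§7.1.3). Names that start with an underscore followed by an uppercase letter or a second underscore are reserved everywhere. Any other leading-underscore name is reserved at file scope. is_reserved_identifier is a hypothetical checker covering only those two leading-underscore rules. It ignores the standard's other reservations, such as library function names and future library directions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical checker for the two leading-underscore rules of C11 7.1.3:
 *  - _X... or __... : reserved for any use, in any scope;
 *  - _x...          : reserved at file scope (ordinary and tag name spaces).
 * Other reservations (library names, future library directions) are ignored. */
static bool is_reserved_identifier(const char *name, bool file_scope)
{
    if (name[0] != '_')
        return false;
    if (name[1] == '_' || isupper((unsigned char)name[1]))
        return true;
    return file_scope;
}
```

Under these rules, _calc_scale (a file-scope function) and header guards such as _CONVOLVE_H_ or __VMAF_MS_SSIM_DECIMATE_H__ are all reserved, while iqa_calc_scale is not.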
0040: Thread-pool job recycling + inline data buffer (ADR-0147)
- ADR: ADR-0147
- Touches: core/src/thread_pool.c
- Invariants:
  - VmafThreadPoolJob carries a fixed-size char inline_data[64] buffer. Payloads ≤ 64 bytes go through memcpy(job->inline_data, data, data_sz) + job->data = job->inline_data; payloads > 64 bytes take the legacy malloc path. The cleanup path MUST distinguish the two via job->data != job->inline_data. A naive free(job->data) would pass free() a pointer into the job struct itself, which is undefined behaviour and can corrupt the heap. Enforced in vmaf_thread_pool_job_clear_data.
  - The free_jobs list is protected by the existing queue.lock. Enqueue pops from it before mallocing, and the runner recycles onto it after running a job. vmaf_thread_pool_destroy walks the list after vmaf_thread_pool_wait returns (all workers have exited, so no lock is needed). Any reorder that frees the queue lock before the free_jobs walk is a leak on shutdown.
  - The fork's void (*func)(void *data, void **thread_data) signature and the per-worker VmafThreadPoolWorker are fork-local; upstream Netflix #1464 has func(void *data). Keep the fork's signature on any rebase: callers (src/libvmaf.c:threaded_enqueue_one etc.) depend on the two-arg form.
- On upstream sync: Netflix PR #1464 is CLOSED (not merged) and bundles twelve unrelated optimizations. Only the thread-pool portion is ported here. If upstream ever reopens and merges #1464 (or a successor), cherry-pick only the pool mechanics. Reject the payload-signature changes. Reject the ADM / VIF / predict.c pieces: they conflict with ADR-0138 / 0139 / 0142 bit-exactness and with the T7-5 predict.c refactor. Reject the feature-collector capacity bump too; the fork already caps it at 8 for a reason (see src/feature/feature_collector.c).
- Re-test on rebase (x86, any libsvm-less host):
ninja -C build && meson test -C build
for threads in 1 4; do
for mask in 0 255; do
VMAF_CPU_MASK=$mask ./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
-m version=vmaf_v0.6.1 --threads $threads -o /tmp/vmaf_${threads}_${mask}.xml
done
done
# Expect bit-identical scores (attribute order may differ across
# --threads 1 vs --threads 4 because feature-collector emits in
# insertion order; the numeric values match).
diff <(grep -v fyi /tmp/vmaf_4_0.xml) <(grep -v fyi /tmp/vmaf_4_255.xml)
# expect exit 0 (scalar vs SIMD threaded)
Also run clang-tidy -p build core/src/thread_pool.c — expect zero warnings. Re-run the 500 000-job micro-benchmark from ADR-0147 §Decision if performance is under investigation.
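This is a minimal sketch of the inline-buffer and free-list invariants, assuming a trimmed-down job type. Job, job_set_data, job_clear_data, job_push_free, and job_pop_free are hypothetical stand-ins for VmafThreadPoolJob and vmaf_thread_pool_job_clear_data. Locking is left out; the real free_jobs list is guarded by queue.lock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define JOB_INLINE_DATA_SZ 64 /* size of the inline buffer described above */

/* Hypothetical trimmed-down job; the real VmafThreadPoolJob has more fields. */
typedef struct Job {
    void *data;                           /* points at inline_data or heap */
    char inline_data[JOB_INLINE_DATA_SZ];
    struct Job *next;                     /* free-list link */
} Job;

/* Small payloads are copied inline; larger ones take the malloc path. */
static int job_set_data(Job *job, const void *data, size_t data_sz)
{
    if (data_sz <= sizeof job->inline_data) {
        memcpy(job->inline_data, data, data_sz);
        job->data = job->inline_data;
        return 0;
    }
    job->data = malloc(data_sz);
    if (!job->data)
        return -1;
    memcpy(job->data, data, data_sz);
    return 0;
}

/* Free only heap payloads: inline_data is part of the job itself. */
static void job_clear_data(Job *job)
{
    if (job->data != job->inline_data)
        free(job->data);
    job->data = NULL;
}

/* Free-list helpers; the caller is assumed to hold the queue lock. */
static void job_push_free(Job **free_jobs, Job *job)
{
    job_clear_data(job);
    job->next = *free_jobs;
    *free_jobs = job;
}

static Job *job_pop_free(Job **free_jobs)
{
    Job *job = *free_jobs;
    if (job)
        *free_jobs = job->next;
    return job;
}
```

Recycling a job through job_push_free clears its payload first, so a popped job always starts with data == NULL regardless of which path its previous payload took.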
0041: IQA reserved-identifier rename + cleanup (ADR-0148)
- ADR: ADR-0148
- Touches: 21 files across core/src/feature/ (iqa/{convolve,decimate,ssim_tools}.{c,h}, iqa/ssim_simd.h, ssim.c, integer_ssim.c, ms_ssim.c, ms_ssim_decimate.h, float_ssim.c, float_ms_ssim.c, x86/convolve_avx2.{c,h}, x86/convolve_avx512.{c,h}, arm64/convolve_neon.{c,h}, AGENTS.md) plus core/test/test_iqa_convolve.c.
- Invariants:
  - Every _iqa_* / _kernel / _ssim_int / _map_reduce / _map / _reduce / _context / _ms_ssim_* / _ssim_* / _alloc_buffers / _free_buffers symbol is renamed to its non-reserved spelling. So are the four underscore-prefixed header guards (_CONVOLVE_H_, _DECIMATE_H_, _SSIM_TOOLS_H_, __VMAF_MS_SSIM_DECIMATE_H__). The fork's IQA surface no longer uses C's reserved-identifier name space.
  - The clang-analyzer-security.ArrayBound NOLINT bracket in ssim_accumulate_row and ssim_reduce_row_range (integer_ssim.c) is load-bearing. The inner kernel-loop k_min/k_max clamping is provably correct (k_min = max(0, hkernel_offs - x), k_max = min(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1))), but the analyzer can't follow it across helper boundaries. Do not collapse the bracket.
  - The clang-analyzer-unix.Malloc NOLINT bracket in test_iqa_convolve.c (check_simd_variant, check_case) is intentional: the test exits the process on the failure path, so small allocations leak by design at test end. Do not refactor to free-on-exit.
  - The cross-TU NOLINT pattern on compute_ssim (ssim.c) and compute_ms_ssim (ms_ssim.c) exists because clang-tidy's misc-use-internal-linkage runs per TU and can't see the header bridge to float_ssim.c / float_ms_ssim.c. Keep the inline justification comment.
- On upstream sync:
  - The IQA library vendored by Netflix (tjdistler/iqa) has been effectively abandoned (last meaningful commit pre-2020). Future rebases will conflict on every renamed symbol; drop the underscore prefix on each conflict and mirror the fork's iqa_* naming.
  - If upstream Netflix/vmaf ever reincorporates the IQA naming wholesale, prefer the fork's spellings; this PR is a one-shot mechanical rename with no semantic content.
- Re-test on rebase:
ninja -C build && meson test -C build
for mask in 0 255; do
VMAF_CPU_MASK=$mask ./build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
-m version=vmaf_v0.6.1 \
--feature float_ssim --feature float_ms_ssim \
-o /tmp/iqa_$mask.xml
done
diff <(grep -v fyi /tmp/iqa_0.xml) <(grep -v fyi /tmp/iqa_255.xml)
# expect exit 0 (bit-identical scalar vs SIMD on float_ssim/ms_ssim)
Also run clang-tidy -p build on every touched file (excluding arm64/); expect zero warnings.
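The k_min/k_max clamp quoted above can be checked exhaustively for small sizes. This sketch rests on two assumptions the entry does not state: tap k of output column x reads input column x + k - hkernel_offs, and the kernel is odd and centred (hkernel_sz == 2 * hkernel_offs + 1). Under those assumptions, [k_min, k_max) is exactly the set of taps whose input column lies inside [0, w). tap_range is a hypothetical name.

```c
#include <assert.h>

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Valid tap range [k_min, k_max) for output column x of a row of width w,
 * using the clamp formulas quoted above. Assumes tap k reads input column
 * x + k - hkernel_offs and hkernel_sz == 2 * hkernel_offs + 1. */
static void tap_range(int x, int w, int hkernel_sz, int hkernel_offs,
                      int *k_min, int *k_max)
{
    *k_min = imax(0, hkernel_offs - x);
    *k_max = imin(hkernel_sz, hkernel_sz - (x + hkernel_offs - w + 1));
}
```

The range depends on x, w, and the kernel geometry together. That cross-helper relationship is what the analyzer cannot prove on its own, which is why the NOLINT bracket stays.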
0042: Port Netflix #1376, FIFO-hang fix via Semaphore (ADR-0149)
- ADR: ADR-0149
- Upstream commit: Netflix PR #1376, head 1c06ca4f1bb5da38b54db075a27c35ba8ea9d7b7 (OPEN upstream as of 2026-04-24).
- Touches:
  - python/vmaf/core/executor.py: base Executor class + ExternalVmafExecutor-style subclass. Delete the _wait_for_workfiles / _wait_for_procfiles polling loops. Rewrite _open_{work,proc}files_in_fifo_mode around multiprocessing.Semaphore(0). Add an open_sem=None kwarg to every _open_{ref,dis}_{work,proc}file and to the _open_workfile staticmethod. Drop the unused from time import sleep.
  - python/vmaf/core/raw_extractor.py: AssetExtractor + DisYUVRawVideoExtractor. Add open_sem=None to the _open_{ref,dis}_workfile overrides (they release the semaphore on entry, since these overrides are no-ops). Delete the _wait_for_workfiles overrides. Drop the unused from time import sleep.
- Fork carve-outs (load-bearing on rebase):
  - python/vmaf/__init__.py: __version__ stays "3.0.0"; do NOT port upstream's bump to "4.0.0". The fork tracks its own versioning (v3.x.y-lusoris.N) per ADR-0025.
  - from time import sleep is dropped from both files. Upstream leaves the import in place (unused after their patch); the fork removes it because the ADR-0141 touched-file rule requires ruff F401 to be clean.
  - Upstream typo preserved: the subclass warning message contains "to be created to be created". Comments note the typo inline; do not silently fix it on rebase. It is upstream-authored, and project policy is a verbatim port.
- On upstream sync: upstream PR #1376 is still OPEN. When it merges, re-diff against the merged form; the touched hunks should be conflict-free because the fork now carries the same shape. Re-check whether upstream fixed the "to be created to be created" typo; if so, adopt the fix (it becomes a simple string update).
- Re-test:
python3 -m py_compile python/vmaf/core/executor.py \
python/vmaf/core/raw_extractor.py
ruff check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
black --check python/vmaf/core/executor.py python/vmaf/core/raw_extractor.py
# all silent
# No FIFO-mode unit test in the tree; end-to-end harness
# exercise (needs libsvm + ffmpeg + fixtures) goes via
# make test-netflix-golden
# which doesn't exercise fifo_mode path but does verify the
# refactor didn't break executor.py imports.
0043: Port Netflix #1472, CUDA on Windows MSYS2/MinGW (ADR-0150)
- ADR: ADR-0150
- Upstream commits: Netflix PR #1472, 15745cdf (portability) + b7b65e64 (meson plumbing). Both OPEN upstream as of 2026-04-24.
- Touches:
  - core/src/cuda/common.h: drop the <pthread.h> include; rename the reserved header guard __VMAF_SRC_CUDA_COMMON_H__ → VMAF_SRC_CUDA_COMMON_INCLUDED.
  - core/src/cuda/cuda_helper.cuh: #ifdef DEVICE_CODE guard choosing between <cuda.h> and <ffnvcodec/dynlink_loader.h>.
  - core/src/picture.h: #ifdef DEVICE_CODE guard choosing between <cuda.h> + a forward-declared VmafCudaState and <ffnvcodec/*> + the full libvmaf_cuda.h; rename the reserved header guard.
  - core/src/feature/integer_adm.h: updated comment above the dwt_7_9_YCbCr_threshold table noting the fork's positional-initializer shape vs upstream's #ifndef __CUDACC__ shape (see §Fork carve-outs).
  - core/src/feature/cuda/integer_adm/{adm_cm,adm_csf,adm_csf_den,adm_decouple,adm_dwt2}.cu: #ifndef DEVICE_CODE guard around #include "feature_collector.h".
  - core/src/meson.build: Windows nvcc plumbing (+70 LoC under host_machine.system() == 'windows'). This covers vswhere-based cl.exe discovery, MSVC + Windows SDK include-path injection, and CUDA version detection via nvcc --version. It also threads nvcc_ccbin_flags + nvcc_host_includes through every custom_target that invokes nvcc.
- Fork carve-outs (load-bearing on rebase):
  - integer_adm.h uses positional initializers, NOT upstream's #ifndef __CUDACC__ wrap. Both shapes resolve the MSVC/nvcc C++ designated-initializer issue; the positional form is C++-portable and keeps the table available to future .cu consumers. Keep the fork's form on rebase.
  - cuda_static_lib keeps dependencies : [pthread_dependency]. Upstream drops it; the fork needs it because ring_buffer.c (built as part of cuda_static_lib) #includes <pthread.h> directly. On rebase: keep the fork's version.
  - meson.build gencode coverage block: the fork's ADR-0122 explicit cubin list (sm_75/80/86/89 + compute_80 PTX) sits after the new upstream nvcc-detect block. On rebase, re-assemble the same merged order: nvcc-detect first, then gencode coverage (both host-independent).
  - Header guards: the _INCLUDED spellings are fork-local (ADR-0148 precedent). Upstream keeps the reserved __VMAF_SRC_*_H__ spellings. On rebase, keep _INCLUDED.
- On upstream sync: PR #1472 is still OPEN. When it merges, re-diff the three conflict-resolved hunks against upstream's final form. Keep the fork's version of the four carve-outs above unless upstream meaningfully reshapes those regions.
- Re-test on rebase (Linux host with CUDA toolkit):
meson setup libvmaf core/build-cuda \
-Denable_cuda=true -Denable_nvcc=true -Denable_sycl=false
ninja -C core/build-cuda && meson test -C core/build-cuda
# Expect 6 .fatbin files generated + CLI linked + 35/35 tests pass.
Windows validation is operator-driven: CI does not yet have a Windows + MSYS2 + MinGW + MSVC BuildTools + CUDA runner (tracked as T7-3 in .workingdir2/OPEN.md).
- Prerequisites note (Windows only): nv-codec-headers must be built from git master commit 876af32 or later. The release tag n13.0.19.0 is missing cuMemFreeHost, cuStreamCreateWithPriority, cuLaunchHostFunc, and other CudaFunctions members that libvmaf uses. This is a pre-existing issue, out of scope for this port.
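The following is a minimal illustration of the integer_adm.h positional-initializer carve-out. It uses a hypothetical 2×3 table, not the real dwt_7_9_YCbCr_threshold layout. The same text compiles as both C and C++. By contrast, C99 array designators such as [1] = {...} are not valid C++, and C++20 designated initializers only accept struct members in declaration order. The guard uses the non-reserved _INCLUDED spelling. demo_threshold and demo_threshold_at are hypothetical names.

```c
#include <assert.h>

#ifndef DEMO_ADM_TABLE_INCLUDED /* non-reserved guard spelling (_INCLUDED) */
#define DEMO_ADM_TABLE_INCLUDED

/* Hypothetical 2x3 threshold table. Positional initializers are valid in both
 * C and C++, so the same header can be compiled by a C compiler and by nvcc's
 * C++ host compiler. A C99 array designator such as [1] = { ... } would not
 * compile as C++. */
static const float demo_threshold[2][3] = {
    {1.0f, 2.0f, 3.0f},
    {4.0f, 5.0f, 6.0f},
};

#endif /* DEMO_ADM_TABLE_INCLUDED */

/* Bounds-checked accessor for the demo table. */
static float demo_threshold_at(int row, int col)
{
    assert(row >= 0 && row < 2 && col >= 0 && col < 3);
    return demo_threshold[row][col];
}
```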
0058: libvmaf.pc Cflags leak fix (ADR-0200)
- ADR: ADR-0200; bug-fix follow-up to entry 0057.
- Upstream source: fork-local. Netflix has no Vulkan backend.
- Touches:
  - core/subprojects/packagefiles/volk/meson.build: drops -include volk_priv_remap.h from volk_dep.compile_args; keeps -DVK_NO_PROTOTYPES.
  - core/src/vulkan/meson.build: pulls volk_priv_remap_h_path from the volk subproject and appends ['-include', <path>] to vmaf_cflags_common (private c_args: on libvmaf's library() call).
- Invariants (load-bearing):
  - -include MUST stay off volk_dep.compile_args; otherwise it leaks into the static libvmaf.pc Cflags. Test on rebase: run meson setup ... -Ddefault_library=static -Denable_vulkan=enabled, then grep Cflags meson-private/libvmaf.pc. The output must NOT contain volk_priv_remap or any build-dir absolute path.
  - -include MUST be applied to libvmaf's compile: every libvmaf TU that calls volk's vk* API needs the rename macros active. The vmaf_cflags_common injection covers this for all libvmaf sub-libraries (libvmaf_feature, libvmaf_cpu, etc.).
  - The path comes from subproject('volk').get_variable(...), not from a hardcoded string, so it survives volk wrap version bumps.
- On upstream sync: zero upstream interaction.
- Re-test on rebase / volk wrap bump:
meson setup build-vk-static-test libvmaf -Denable_vulkan=enabled \
-Denable_cuda=false -Denable_sycl=false -Ddefault_library=static
ninja -C build-vk-static-test src/libvmaf.a
grep Cflags build-vk-static-test/meson-private/libvmaf.pc
# Expected: no `volk_priv_remap` substring, no build-dir absolute path
0057: Volk vk* priv-remap for static-archive builds (ADR-0198)
- ADR: ADR-0198; follow-up to ADR-0185.
- Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
- Touches:
  - core/subprojects/packagefiles/volk/meson.build: overlay applied on top of the upstream volk wrap. It adds a custom_target that runs gen_priv_remap.py to produce volk_priv_remap.h from the upstream volk.h, and wires -include of the generated header into volk.c's c_args and volk_dep's compile_args. Entry 0058 (ADR-0200) later removed it from volk_dep's compile_args and applies it through libvmaf's private c_args instead.
  - core/subprojects/packagefiles/volk/gen_priv_remap.py: fork-added generator script (a regex against extern PFN_vkXxx vkXxx; declarations).
- Invariants (load-bearing):
  - The force-include must reach every libvmaf TU that pulls in volk_dep (verified via the meson dependency graph). Dropping the -include without a replacement re-introduces the static-link multi-def cascade.
  - The generator regex matches every vk* PFN declaration in volk.h; confirmed for volk-1.4.341 (784 declarations, 784 remaps). When bumping the volk wrap version, the generator re-runs automatically because it is a meson custom_target. Confirm that the rename count it prints to stdout matches the number of ^extern PFN_vk lines in the new volk.h.
  - The renamed symbols use the vmaf_priv_ prefix, chosen to match no upstream Netflix or Vulkan SDK identifier. Don't rename to _vk* (collides with C's reserved-identifier namespace) or vkv_* etc.
- On upstream sync: zero upstream interaction. The volk wrap is a libvmaf-managed subproject; Netflix doesn't ship a Vulkan backend.
- Re-test on rebase / after any volk wrap bump:
meson setup build-vk-static libvmaf -Denable_vulkan=enabled \
-Denable_cuda=false -Denable_sycl=false \
-Ddefault_library=static
ninja -C build-vk-static src/libvmaf.a
test "$(nm build-vk-static/src/libvmaf.a 2>/dev/null \
| grep -cE '^[0-9a-f]* (T|D|B|R) vk[A-Z]')" = "0" \
&& echo OK
(Followed by the BtbN-style link reproducer in the ADR References section.)
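The priv-remap mechanism is plain preprocessor renaming. The following sketch uses a hypothetical entry point vkExampleFunc and its PFN type. These stand in for one line of the generated volk_priv_remap.h and the matching volk.h declaration. The rename macro is seen before the declaration, so the translation unit defines the global vmaf_priv_vkExampleFunc. That is why the nm check above expects zero global vk[A-Z] symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line from a generated volk_priv_remap.h. In the real build the
 * header is force-included with -include, so it precedes every declaration. */
#define vkExampleFunc vmaf_priv_vkExampleFunc

/* Hypothetical declaration in volk.h style: `extern PFN_vkXxx vkXxx;`.
 * PFN_vkExampleFunc is a different token, so the typedef is not renamed. */
typedef int (*PFN_vkExampleFunc)(int);
extern PFN_vkExampleFunc vkExampleFunc;

/* Definition as volk.c would provide it; after preprocessing this defines
 * the global vmaf_priv_vkExampleFunc, so no vkExampleFunc symbol is emitted. */
PFN_vkExampleFunc vkExampleFunc = NULL;

/* Stand-in for a loaded driver entry point. */
static int example_impl(int x)
{
    return x + 1;
}
```

Code that calls vkExampleFunc(...) compiles unchanged, because every use of the name goes through the same macro.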
0056: SSIMULACRA 2 snapshot gate + fp-contract-off split (ADR-0164)
- ADR: ADR-0164
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
- Touches:
  - python/test/ssimulacra2_test.py: new fork-added Python test. It uses subprocess.call against ExternalProgram.vmafexec with --feature ssimulacra2, parses the --json output, and asserts pooled + per-frame scores.
- Invariants (load-bearing):
  - Pinned values are CPU-only, generated on master HEAD after the PR #100 merge. Re-generate them if the scalar or any SIMD path changes semantically. Per the ADR-0161/0162/0163 bit-exactness contract that shouldn't happen: any bit-exact refactor leaves the pinned values unchanged.
  - Tolerance is 4 decimal places (places=4). In assertAlmostEqual that means round(|Δ|, 4) == 0, i.e. |Δ| below about 5e-5. The CPU paths are bit-exact, so actual drift should be 0; the tolerance is defensive.
  - -ffp-contract=off everywhere in the ssimulacra2 pipeline: libvmaf_ssimulacra2_static_lib (scalar extractor), x86_ssimulacra2_avx2_lib, x86_ssimulacra2_avx512_lib, and arm64_ssimulacra2_lib (from ADR-0161). All four are split out of their umbrella libs so other extractors keep upstream's default FMA policy. Without this, the CI GCC/clang hosts drifted ~2e-4 from my AVX-512 authoring host. GCC 10+ defaults to -ffp-contract=fast on x86 with -mfma and on aarch64, fusing a*b+c in the scalar glue around the SIMD calls. Do NOT remove any of these carve-outs on rebase.
  - Fixtures are already checked in: src01_hrc00/01_576x324 is also the primary Netflix golden fixture; the 160×90 derived one stresses the sub-176 pyramid-termination path.
  - Do NOT modify the Netflix golden assertions in quality_runner_test.py et al.; those are upstream-pinned. This test is a SEPARATE file that adds fork-specific scores.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future, cross-reference against their pinning if they add one.
- Re-test on rebase / after any ssimulacra2 change:
- Follow-ups:
  - Cross-reference the gate against libjxl tools/ssimulacra2 once the ssimulacra2_rs cargo install is fixed.
  - Expand fixture coverage if new YUV test assets land.
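The ~2e-4 drift behind the -ffp-contract=off carve-outs comes from FMA contraction changing individual results. One hand-picked multiply-add is enough to see it. The sketch assumes IEEE-754 binary32/binary64 with FLT_EVAL_METHOD == 0 (typical for x86-64 and aarch64). With a = 1 + 2^-13 and b = 1 - 2^-13, the exact product 1 - 2^-26 rounds to 1.0f. The unfused a*b + (-1) therefore gives 0, while a fused multiply-add keeps the exact product and returns -2^-26. mul_add_fused_model emulates the fused result through double. That emulation is exact for these operands only; it is not a general fmaf replacement.

```c
#include <assert.h>

/* Unfused multiply-add: the product is rounded to float before the add.
 * Storing it through a volatile stops the compiler from contracting
 * a * b + c into an FMA even under -ffp-contract=fast. */
static float mul_add_unfused(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

/* Models what a fused multiply-add returns for the operands used here: a
 * float*float product is exact in double, and for these inputs the double
 * sum is exact too, so the only rounding is the final conversion to float.
 * Not a general replacement for fmaf(). */
static float mul_add_fused_model(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```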
0055: SSIMULACRA 2 picture_to_linear_rgb SIMD (ADR-0163)
- ADR: ADR-0163
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2.
- Touches:
  - ssimulacra2_avx2.{c,h}: new ssimulacra2_picture_to_linear_rgb_avx2 + helpers (read_plane_scalar_s2, srgb_to_linear_lane_avx2, compute_matrix_coefs).
  - ssimulacra2_avx512.{c,h}: 16-wide AVX-512 port.
  - ssimulacra2_neon.{c,h}: 4-wide aarch64 port.
  - ssimulacra2.c: new ptlr_fn field in Ssimu2State. The dispatch wrapper convert_picture_to_linear_rgb unpacks VmafPicture into simd_plane_t[3], and init assigns the AVX2/AVX-512/NEON pointers.
  - ssimulacra2_simd_common.h: new shared header declaring simd_plane_t. Decouples the SIMD TUs from the VmafPicture type.
  - test_ssimulacra2_simd.c: new test_ptlr_420_8, test_ptlr_420_10, test_ptlr_444_8, test_ptlr_444_10, test_ptlr_422_8 subtests + scalar references ref_read_plane, ref_srgb_to_linear, ref_picture_to_linear_rgb.
- Invariants (load-bearing):
  - Scalar-order matmul: G = Yn + cb_g * Un + cr_g * Vn is chained left-to-right in all three SIMD TUs. The regression test catches reordering drift (~1 ulp).
  - Per-lane scalar powf: a vector polynomial approximation would break scalar bit-exactness. Do not replace the lane spill/reload pattern with a vector libm.
  - simd_plane_t layout: the {data, stride, w, h} ordering is assumed by all three SIMD TUs. The dispatch wrapper builds it from VmafPicture fields; the layout must match.
  - Bounds clamping in read_plane_scalar_* mirrors the scalar reference verbatim (if (sx < 0) sx = 0; if (sx >= pw) sx = pw-1; etc.). Do not simplify it; that would remove per-lane safety at plane edges.
  - Arbitrary chroma ratios fall through to the int64_t multiplication branch. Don't remove it; SSIMULACRA 2 is supposed to accept non-standard ratios gracefully.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides a SIMD YUV→RGB path, diff it against the fork's and preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out is opened.
- Re-test on rebase:
ninja -C build && build/test/test_ssimulacra2_simd # 11/11
ninja -C build-aarch64 && \
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
build-aarch64/test/test_ssimulacra2_simd # 11/11
- Follow-ups:
  - T3-3 SSIMULACRA 2 snapshot-JSON regression test: still pending (gated on tools/ssimulacra2 availability).
  - SSIMULACRA 2 now has zero scalar hot paths. T3-1 closes in full with phases 1+2+3 (ADR-0161, 0162, 0163).
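This sketch covers three of the invariants above: field order, edge clamping, and the left-to-right matmul. It also adds an int64_t-widened chroma mapping. demo_plane_t, read_plane_clamped, green_from_yuv, and luma_to_chroma_x are hypothetical, and the real simd_plane_t may use different member types. The exact generic-ratio formula in ssimulacra2.c is not given in this entry, so luma_to_chroma_x is only an assumed form of the int64_t multiplication branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plane view with the documented {data, stride, w, h} order. */
typedef struct {
    const uint8_t *data; /* 8-bit samples */
    ptrdiff_t stride;    /* bytes per row */
    int w, h;
} demo_plane_t;

/* Clamp the sample position to the plane, then read it, using the same
 * clamp shape as the scalar reference (if (sx < 0) sx = 0; ...). */
static float read_plane_clamped(const demo_plane_t *p, int sx, int sy)
{
    if (sx < 0) sx = 0;
    if (sx >= p->w) sx = p->w - 1;
    if (sy < 0) sy = 0;
    if (sy >= p->h) sy = p->h - 1;
    return (float)p->data[(ptrdiff_t)sy * p->stride + sx];
}

/* Scalar-order matmul row: evaluated as (yn + cb_g * un) + cr_g * vn. */
static float green_from_yuv(float yn, float un, float vn, float cb_g, float cr_g)
{
    return yn + cb_g * un + cr_g * vn;
}

/* Assumed form of a generic-ratio chroma mapping: widening to int64_t keeps
 * x * pw from overflowing int on large frames. */
static int luma_to_chroma_x(int x, int w, int pw)
{
    return (int)(((int64_t)x * pw) / w);
}
```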
0054: SSIMULACRA 2 FastGaussian IIR blur SIMD (ADR-0162)
- ADR: ADR-0162
- Upstream source: fork-local. No SSIMULACRA 2 extractor in upstream Netflix/vmaf.
- Touches:
  - ssimulacra2_avx2.{c,h}: new ssimulacra2_blur_plane_avx2 + 2 helpers (hblur_8rows_avx2, vblur_simd_8cols_avx2).
  - ssimulacra2_avx512.{c,h}: 16-wide port.
  - ssimulacra2_neon.{c,h}: 4-wide aarch64 port; uses vsetq_lane_f32 in place of gather.
  - ssimulacra2.c: adds a blur_fn function pointer to Ssimu2State, dispatch in init_simd_dispatch(), and the call site in blur_3plane.
  - test_ssimulacra2_simd.c: new test_blur + scalar reference (ref_blur_plane, ref_fast_gaussian_1d).
- Invariants (load-bearing):
  - Row-batching lane layout: in the horizontal pass, lane i MUST hold row (y_base + i). Gather index vector entries are (y_base + i) * w (stride w). Changing this breaks bit-exactness vs scalar.
  - Scalar left-to-right summation order: n2_k * sum - d1_k * prev1_k - prev2_k is chained sequentially, and o0 + o1 + o2 at output time is (o0 + o1) + o2. Changing to (o0 + o2) + o1 or o0 + (o1 + o2) can drift by ~1 ulp, and the regression test catches it.
  - col_state is 6 * w contiguous floats with the layout [prev1_0 | prev1_1 | prev1_2 | prev2_0 | prev2_1 | prev2_2]. SIMD loads assume this layout; changing the field order requires updating all three SIMD TUs in lockstep with blur_plane.
  - NEON lane-set pattern: aarch64 has no gather intrinsic, so each input vector takes 4 explicit vsetq_lane_f32 calls. Do not replace them with an ld1 {v.s}[lane]-style pseudo-gather without re-verifying bit-exactness.
  - The scalar tail in the vertical pass matches the scalar reference body verbatim. Any deviation breaks memcmp equality on widths that aren't multiples of the SIMD width.
- On upstream sync: no upstream interaction. If Netflix adopts SSIMULACRA 2 in the future and provides its own IIR blur SIMD, diff it against the fork's and preserve the bit-exactness contract unless an ADR-0142 Netflix-authority carve-out is opened.
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd # 6/6
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
build-aarch64/test/test_ssimulacra2_simd # 6/6
- Follow-ups:
- `picture_to_linear_rgb` SIMD: the last scalar hot path in the extractor (2 calls per frame). Low ROI but mechanical.
- T3-3 SSIMULACRA 2 snapshot-JSON regression test: still pending.
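The column recurrence and `col_state` layout from the IIR blur invariants can be sketched in scalar C as follows. `col_state_index` and `iir_step_column` are hypothetical helpers; the coefficient arrays `n2` and `d1` are supplied by the caller, and how `sum` is formed from the input taps is deliberately left out because these notes don't describe it.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of prev1_k (which == 0) or prev2_k (which == 1) for column x in a
 * col_state buffer laid out as [prev1_0|prev1_1|prev1_2|prev2_0|prev2_1|prev2_2],
 * each segment w floats long (6 * w floats in total). */
static size_t col_state_index(int which, int k, int x, int w)
{
    assert((which == 0 || which == 1) && k >= 0 && k < 3);
    return ((size_t)which * 3 + (size_t)k) * (size_t)w + (size_t)x;
}

/* One vertical-pass step for column x. The recurrence is chained left to
 * right, ((n2_k * sum) - (d1_k * prev1_k)) - prev2_k, and the three filter
 * outputs are combined as (o0 + o1) + o2. */
static float iir_step_column(float *col_state, int x, int w, float sum,
                             const float n2[3], const float d1[3])
{
    float o[3];
    for (int k = 0; k < 3; k++) {
        float *prev1 = &col_state[col_state_index(0, k, x, w)];
        float *prev2 = &col_state[col_state_index(1, k, x, w)];
        float out = n2[k] * sum - d1[k] * *prev1 - *prev2;
        *prev2 = *prev1; /* shift the two-tap history */
        *prev1 = out;
        o[k] = out;
    }
    return (o[0] + o[1]) + o[2];
}
```

A SIMD column batch computes the same expression for several adjacent `x` at once, which is why the `prev1`/`prev2` segments must stay contiguous per filter index.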
0053 — SSIMULACRA 2 SIMD bit-exact ports (ADR-0161)¶
- ADR: ADR-0161
- Upstream source: fork-local. Upstream Netflix/vmaf has no SSIMULACRA 2 extractor at all (fork-added in ADR-0130).
- Touches:
- `ssimulacra2_avx2.c` / `.h`: 5 AVX2 kernels + a per-lane `cbrtf` helper.
- `ssimulacra2_avx512.c` / `.h`: 5 AVX-512 kernels; a mechanical 16-wide widening of the AVX2 path.
- `ssimulacra2_neon.c` / `.h`: 5 NEON kernels; 4-wide aarch64 mirror.
- `ssimulacra2.c`: adds function-pointer dispatch fields to `Ssimu2State` + an `init_simd_dispatch()` helper; calls go through the pointers.
- `meson.build`: registers the three SIMD TUs in `x86_avx2_sources` / `x86_avx512_sources` / `arm64_sources`.
- `test_ssimulacra2_simd.c` and `test/meson.build`: new bit-exact test harness.
- Invariants (load-bearing):
- Byte-for-byte bit-exactness to scalar on all 5 vectorised kernels under `FLT_EVAL_METHOD == 0`. A regression caught pre-merge: the naïve pairing `(a+b)+(c+d)` drifts by 1 ULP from the scalar `((a+b)+c)+d`. Keep sequential scalar-order chains in all three SIMD TUs on rebase.
- `cbrtf` is per-lane scalar libm, not a polynomial. Replacing it with a vector cbrt would drift the ssimulacra2 score and break the regression test. Keep the spill/reload pattern.
- `ssim_map` / `edge_diff_map` reductions use the ADR-0139 per-lane `double` scalar tail. Do NOT SIMD-reduce float lanes and then lift to double; that changes the summation order.
- `downsample_2x2` deinterleaves with ISA-appropriate ops: AVX2 `vshufps` + `vpermpd`, AVX-512 `vpermt2ps`, NEON `vuzp1q_f32` + `vuzp2q_f32`. After the deinterleave, the sum order is `((r0e+r0o)+r1e)+r1o`, matching scalar.
- `#pragma STDC FP_CONTRACT OFF` at every TU header. aarch64 GCC ignores it (non-fatal `-Wunknown-pragmas`); it is kept for portability (clang, MSVC).
- IIR blur + `picture_to_linear_rgb` stay scalar in this PR. Follow-up PRs target these; when they land, re-verify bit-exactness by expanding `test_ssimulacra2_simd`.
- Runtime dispatch order: AVX-512 > AVX2 on x86; NEON on aarch64; scalar fallback. Preserve on rebase.
- On upstream sync:
- Upstream has no SSIMULACRA 2 extractor; nothing to merge.
- If Netflix adopts SSIMULACRA 2 in the future, diff their implementation against the fork's scalar + SIMD TUs; keep the fork's bit-exactness contract absent a specific Netflix-authority carve-out ADR.
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_ssimulacra2_simd # 5/5
clang-tidy -p build core/src/feature/x86/ssimulacra2_avx2.c \
core/src/feature/x86/ssimulacra2_avx512.c
# aarch64:
ninja -C build-aarch64
qemu-aarch64-static -L /usr/aarch64-linux-gnu/ \
build-aarch64/test/test_ssimulacra2_simd # 5/5
clang-tidy -p build-aarch64 \
core/src/feature/arm64/ssimulacra2_neon.c
- Follow-ups:
- IIR blur vectorisation (`blur_plane` vertical-pass column batching): the biggest frame-level wall-clock win.
- `picture_to_linear_rgb` per-lane `powf`: lower ROI but mechanical.
- T3-3 SSIMULACRA 2 snapshot-JSON regression test: deferred by ADR-0130; still pending.
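The 1-ULP drift called out above is easy to reproduce in isolation. This sketch (hypothetical helper names, assuming `FLT_EVAL_METHOD == 0` as on x86_64 SSE) compares the scalar chain with the naive pairing and shows the 2x2 downsample sum in scalar order. With `a = 1e8f, b = 0, c = d = 4`, each `+ 4` in the scalar chain is a round-half-to-even tie that stays at `1e8`, whereas `c + d = 8` is exactly one float step at that magnitude, so the pairing yields `100000008`.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar order: ((a + b) + c) + d, one addition at a time. */
static float sum4_scalar_order(float a, float b, float c, float d)
{
    float s = a + b;
    s = s + c;
    s = s + d;
    return s;
}

/* The naive pairing a horizontal SIMD add tends to produce. */
static float sum4_pairwise(float a, float b, float c, float d)
{
    return (a + b) + (c + d);
}

/* One 2x2 box sum from two rows, in the scalar order ((r0e + r0o) + r1e) + r1o,
 * where e/o are the even/odd samples a SIMD deinterleave would separate. */
static float box2x2_sum(const float *row0, const float *row1, size_t x)
{
    assert(row0 != NULL && row1 != NULL);
    return sum4_scalar_order(row0[2 * x], row0[2 * x + 1],
                             row1[2 * x], row1[2 * x + 1]);
}
```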
0052 — psnr_hvs SIMD bit-exact ports (ADR-0159 AVX2, ADR-0160 NEON)¶
- ADRs: ADR-0159 (AVX2), ADR-0160 (NEON sister port).
- Upstream source: fork-local. Upstream Netflix/vmaf has no psnr_hvs SIMD path.
- Touches:
- `core/src/feature/x86/psnr_hvs_avx2.c`: AVX2 TU.
- `core/src/feature/x86/psnr_hvs_avx2.h`: AVX2 header.
- `core/src/feature/arm64/psnr_hvs_neon.c`: NEON TU (sister port, ADR-0160).
- `core/src/feature/arm64/psnr_hvs_neon.h`: NEON header.
- `core/src/feature/third_party/xiph/psnr_hvs.c`: adds `PsnrHvsState` + runtime dispatch in `init()` (AVX2 under `ARCH_X86`, NEON under `ARCH_AARCH64`) + a scoped `NOLINTBEGIN`/`NOLINTEND` around the upstream Xiph scalar block (kept verbatim as the bit-exact reference).
- `core/src/meson.build`: adds `x86/psnr_hvs_avx2.c` to `x86_avx2_sources` and `arm64/psnr_hvs_neon.c` to `arm64_sources`.
- `core/test/test_psnr_hvs_avx2.c`, `core/test/test_psnr_hvs_neon.c`: bit-exact unit tests (x86 and aarch64 respectively).
- `core/test/meson.build`: registers both tests under `enable_asm`, arch-gated.
- Invariants (load-bearing):
- Bit-exactness to scalar: every `od_coeff` (int32) and every final `psnr_hvs_{y,cb,cr,psnr_hvs}` value the AVX2 path emits must be byte-identical to the scalar reference on the Netflix golden pairs. If a rebase introduces any pattern that breaks this (e.g. a floating-point horizontal reduce in the mask accumulator), the unit test `test_psnr_hvs_avx2` will fail. Don't relax the assertions; fix the SIMD path.
- DCT butterfly layout: butterfly → transpose → butterfly → transpose. The transpose lives inside `od_bin_fdct8x8_avx2`. Do not move it.
- Float accumulators stay scalar: the means / variances / mask / error accumulation in `calc_psnrhvs_avx2` use the same per-block scalar loop as scalar psnr_hvs, so they are bit-exact by construction. Do not vectorize them with horizontal reductions without replicating ADR-0139's per-lane scalar-float reduction pattern. The cross-block error accumulator `ret` is threaded through `accumulate_error()` by pointer, not returned and then summed: each of the 64 per-coefficient contributions per block must hit the outer `ret` directly, matching scalar's inline `ret += ...` at `third_party/xiph/psnr_hvs.c` line 355. IEEE-754 float addition is non-associative; summing into a local float and then adding the per-block total to `ret` changes the summation tree and drifts the Netflix golden by ~5.5e-5.
- `#pragma STDC FP_CONTRACT OFF` at the TU header disables FMA formation. This is required: `fmaf(a, b, c)` can differ from `(a*b)+c` by 1 ulp, breaking bit-exactness. Do not remove the pragma, and do not add `-ffp-contract=fast` to the build flags for this TU.
- NOLINT suppressions are load-bearing. Each cites ADR-0141 inline (bit-exactness scalar-diff auditability for the 30-butterfly function, scalar float→double promotion for `sqrt`, extractor-registry extern linkage for `vmaf_fex_psnr_hvs`, upstream-Xiph scoped block for rebase parity).
- On upstream sync:
- Upstream has no psnr_hvs SIMD as of 2026-04-24. Keep the fork's version on conflict.
- If upstream ever touches `psnr_hvs.c` for non-SIMD reasons (e.g. a masking-table update), rebase the AVX2 TU to match line-for-line and re-run `test_psnr_hvs_avx2` to confirm bit-exactness survives.
- The NEON TU `arm64/psnr_hvs_neon.c` (ADR-0160) is a sister port and mirrors these invariants. On rebase, the two SIMD TUs must stay in lock-step with the scalar reference.
- Re-test on rebase:
ninja -C build
meson test -C build test_psnr_hvs_avx2
# Expect: 5/5 subtests pass (DCT bit-exact on 3 random seeds +
# delta + constant input).
# CLI-level bit-exactness on Netflix golden (requires the YUV
# fixtures in python/test/resource/yuv/):
# VMAF_CPU_MASK=0 (scalar)
# VMAF_CPU_MASK=255 (AVX2 enabled)
# Diff per-frame psnr_hvs_{y,cb,cr,psnr_hvs} XML fields; expect
# byte-identical across all 3 golden pairs.
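The `accumulate_error()` rule can be shown with two hypothetical helpers: adding each contribution straight into the outer total versus summing a block locally first. Assuming IEEE-754 single precision with `FLT_EVAL_METHOD == 0`, the same inputs give different totals, which is the kind of summation-tree change that moves the Netflix golden.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar-style accumulation: every contribution is added straight into the
 * caller's running total, the shape the notes describe for accumulate_error(). */
static void accumulate_into(float *ret, const float *contrib, size_t n)
{
    assert(ret != NULL);
    for (size_t i = 0; i < n; i++)
        *ret += contrib[i];
}

/* The tempting refactor the notes warn against: sum the block locally,
 * then add the block total once. Same inputs, different summation tree. */
static float accumulate_local_then_add(float ret, const float *contrib, size_t n)
{
    float block = 0.0f;
    for (size_t i = 0; i < n; i++)
        block += contrib[i];
    return ret + block;
}
```

With `ret = 1e8f` and two contributions of `4.0f`, the direct form stays at `1e8` (each `+ 4` rounds back down), while the local-sum form lands on `100000008`.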
0051 — Netflix#1486 motion updates verified present (ADR-0158)¶
- ADR: ADR-0158
- Upstream source: Netflix upstream PR #1486 ("Port motion updates"), MERGED 2026-04-20 as commits `a44e5e6` (code) + `62f47d5` (Netflix golden updates).
- Touches: documentation-only; the code changes this ADR documents are already in the fork's master via earlier incremental motion3 / blend / five-frame-window commits.
- Invariants (load-bearing for future `/sync-upstream`):
- The `edge_8` mirror fix (`i_tap = height - (i_tap - height + 2)`) is present at `integer_motion.c:240`, `x86/motion_avx2.c:147`, `x86/motion_avx512.c:147`. If upstream's mirror line ever diverges again, this is the hunk to watch.
- The `motion_max_val` feature option is at `integer_motion.c:57,118-120`, with default 10000.0 and the `FEATURE_PARAM` flag. Upstream's default equals the fork's default; don't let them drift.
- The `VMAF_integer_feature_motion3_score` output plumbing is in `integer_motion.c` + `alias.c`.
- Fork-local motion extensions (five-frame-window, moving-average, blend, fps_weight) are ADDITIONS on top of Netflix#1486; they are not upstream. Upstream changes to motion extractor internals may conflict with them, so diff against `core/src/feature/integer_motion.c` on every rebase and check that the fork's `MIN(s->score * s->motion_fps_weight, s->motion_max_val)` invocations are preserved (lines ~409, ~503).
- On upstream sync: nothing to port from Netflix#1486; it's absorbed. If a future upstream PR touches the same code paths, prefer upstream's version for the scalar/edge handling and the fork's version for the five-frame-window / blend extensions.
- Re-test on rebase:
ninja -C build
meson test -C build
# Expect: 35/35 pass.
# Verify the upstream markers are still in place after rebase:
grep -n "height - (i_tap - height + 2)\|motion_max_val\|VMAF_integer_feature_motion3_score" \
core/src/feature/integer_motion.c \
core/src/feature/alias.c \
core/src/feature/x86/motion_avx2.c \
core/src/feature/x86/motion_avx512.c
# Expect: matches at all 4 files. If any missing, the rebase
# silently dropped the Netflix#1486 content — investigate.
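A sketch of the two motion expressions the grep above guards, written as standalone functions for illustration (`mirror_tap_bottom` and `motion_score_clamped` are hypothetical names; only the expressions come from the notes, and using `double` for scores is an assumption). The mirror reflects about the last row without repeating it, so tap `height` maps to `height - 2`.

```c
#include <assert.h>

/* Bottom-edge mirror using the Netflix#1486 expression quoted above.
 * Top-edge handling is out of scope for this sketch. */
static int mirror_tap_bottom(int i_tap, int height)
{
    assert(i_tap >= 0 && height >= 2);
    if (i_tap >= height)
        i_tap = height - (i_tap - height + 2);
    return i_tap;
}

/* The fork's MIN(score * motion_fps_weight, motion_max_val) clamp. */
static double motion_score_clamped(double score, double fps_weight, double max_val)
{
    double s = score * fps_weight;
    return s < max_val ? s : max_val;
}
```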
0050 — CUDA preallocation memory leak fix + vmaf_cuda_state_free (ADR-0157)¶
- ADR: ADR-0157
- Upstream source: Netflix upstream issue #1300 (OPEN since 2024; no maintainer fix as of 2026-04-24). Users report that GPU memory rises monotonically across init/preallocate/fetch/close cycles.
- Touches:
- `core/include/libvmaf/libvmaf_cuda.h`: new public `vmaf_cuda_state_free()` API declaration.
- `core/src/cuda/common.c`: new `vmaf_cuda_state_free()` implementation; `vmaf_cuda_release()` now calls `cuda_free_functions()`; `vmaf_cuda_state_init()` gets an outer failure unwind; `init_with_primary_context()` releases the retained primary context on `fail_after_pop`.
- `core/src/cuda/ring_buffer.c`: `vmaf_ring_buffer_close()` now unlocks + destroys the mutex before freeing.
- `core/test/test_cuda_preallocation_leak.c`: new GPU-gated reducer (10-cycle loop with full cleanup).
- `core/test/test_cuda_pic_preallocation.c`, `core/test/test_cuda_buffer_alloc_oom.c`: add the missing `vmaf_cuda_state_free()` + `vmaf_model_destroy()` calls after `vmaf_close()` in every test that allocates these.
- `core/test/meson.build`: registers the new reducer under the `enable_cuda` guard.
- Invariants (load-bearing):
- Public contract: every caller of `vmaf_cuda_state_init()` MUST call `vmaf_cuda_state_free()` AFTER `vmaf_close()` on any VmafContext that imported the state. An informal `free(cu_state)` is a silent double-free hazard AFTER close (`vmaf_close`'s `vmaf_cuda_release` already memsets and frees the `CudaFunctions` internals; `vmaf_cuda_state_free` only frees the heap allocation itself).
- `vmaf_cuda_release()` frees `CudaFunctions` via a saved pointer AFTER the `memset`. Order matters: `memset` first so `cu_state->f` is zeroed in the caller's struct, then free via the saved local. Do not re-order.
- `vmaf_ring_buffer_close()` unlocks BEFORE destroying the mutex (POSIX requires the mutex to be unlocked for destroy).
- The cold-start unwind in `init_with_primary_context` releases the context retained by `cuDevicePrimaryCtxRetain` if `cuStreamCreateWithPriority` fails.
- The ADR-0122 / ADR-0123 `is_cudastate_empty()` null-guards at the top of every public `vmaf_cuda_*` entry must continue to compose with the new `vmaf_cuda_state_free()` (which accepts NULL directly and doesn't call through to the CUDA API).
- The new free call order in callers is `vmaf_close(vmaf)` → `vmaf_cuda_state_free(cu_state)` → `vmaf_model_destroy(model)`. Reversing the first two produces a use-after-free.
- On upstream sync:
- Upstream has no `vmaf_cuda_state_free()` as of 2026-04-24. Keep the fork's version on any conflict. If upstream eventually lands the same API with a different spelling, prefer upstream's spelling and add a compat alias, but do not break the fork's ABI.
- `vmaf_cuda_release()`'s `cuda_free_functions()` call is fork-local. On rebase, keep it.
- The ring-buffer `pthread_mutex_unlock` + `pthread_mutex_destroy` pair is fork-local. On rebase, keep it.
- If upstream refactors `VmafCudaState` ownership semantics (unlikely; historically their pattern has been "leaked state in a long-lived process is acceptable"), re-audit this ADR and the new public API.
- Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 40/40 pass including test_cuda_preallocation_leak.
# ASan leak-check:
cd libvmaf && meson setup build-asan-cuda \
-Db_sanitize=address -Denable_cuda=true -Denable_sycl=false \
--buildtype=debug
ninja -C build-asan-cuda
ASAN_OPTIONS='detect_leaks=1:leak_check_at_exit=1' \
build-asan-cuda/test/test_cuda_preallocation_leak
# Expect: 0 bytes leaked from core/src/* frames.
# (~180 bytes in libcuda.so.1 is expected — driver's process-
# lifetime cuInit cache, does not grow per cycle.)
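The teardown rules above reduce to an ordering discipline. The sketch below uses hypothetical stand-in types (not the real `VmafCudaState` or `CudaFunctions`) to show the save / `memset` / free sequence and a NULL-tolerant state free. In a real caller the sequence stays `vmaf_close(vmaf)` → `vmaf_cuda_state_free(cu_state)` → `vmaf_model_destroy(model)`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real CudaFunctions / VmafCudaState hold far more. */
typedef struct { int loaded; } FnTableSketch;
typedef struct { FnTableSketch *f; int ctx; } StateSketch;

/* Ordering described for vmaf_cuda_release(): save the pointer, zero the
 * caller's struct (s->f reads NULL afterwards on all-bits-zero-NULL
 * platforms), then free through the saved local. */
static void release_sketch(StateSketch *s)
{
    assert(s != NULL);
    FnTableSketch *saved = s->f;
    memset(s, 0, sizeof(*s));
    free(saved);
}

/* Ordering described for vmaf_cuda_state_free(): accept NULL and free only
 * the state allocation itself; its internals were released earlier. */
static void state_free_sketch(StateSketch *s)
{
    free(s); /* free(NULL) is a no-op */
}
```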
0049 — CUDA graceful error propagation (ADR-0156)¶
- ADR: ADR-0156
- Upstream source: Netflix upstream issue #1420 (OPEN as of 2026-04-24). It reports that when two VMAF-CUDA processes run concurrently, the second one crashes in `vmaf_cuda_buffer_alloc` because `CHECK_CUDA(cuMemAlloc)` → `assert(0)` on OOM.
- Touches:
- `core/src/cuda/cuda_helper.cuh`: redefined `CHECK_CUDA` family. New macros `CHECK_CUDA_GOTO` + `CHECK_CUDA_RETURN` + helper `vmaf_cuda_result_to_errno`. The old `assert(0)` semantics are removed entirely.
- `core/src/cuda/common.c`, `core/src/cuda/picture_cuda.c`, `core/src/libvmaf.c`: all `CHECK_CUDA(...)` sites converted; cleanup labels added where contexts / buffers were pushed / allocated.
- `core/src/feature/cuda/integer_motion_cuda.c`, `integer_vif_cuda.c`, `integer_adm_cuda.c`: same conversion; 12 `static` helpers promoted `void` → `int`.
- `core/test/test_cuda_buffer_alloc_oom.c`: new GPU-gated reducer.
- `core/test/meson.build`: registers the new test under the `enable_cuda` guard.
- Invariants (load-bearing):
- `CHECK_CUDA_GOTO` / `CHECK_CUDA_RETURN` must never call `assert(0)` or `abort()` on a CUDA error. Any regression back to the upstream abort-on-error semantics re-introduces Netflix#1420 and the NDEBUG footgun (with `NDEBUG` defined, `assert(0)` compiles to nothing, so the error would go unhandled).
- Every `CHECK_CUDA_GOTO` target label must pop any previously-pushed CUDA context and free any partially-constructed buffers before returning the errno. The graceful path must not leak resources.
- `vmaf_cuda_result_to_errno` uses numeric `CUresult` values directly (0 / 1 / 2 / 3 / 4 / 101 / 201 / 400), so host TUs that don't include `<cuda.h>` can transitively consume the mapping via the inline function. If upstream renumbers `CUresult` enum values (historically stable; they've been fixed since CUDA 1.0), re-audit the switch.
- ADR-0122 / ADR-0123 `is_cudastate_empty(...)` guards at the top of every public `vmaf_cuda_*` entry point must stay: they run before the CUDA API is touched and compose cleanly with the new error propagation.
- Twelve `static` helper signatures in the feature extractors are `int`-returning (previously `void`): any upstream port that restores the `void` return silently regresses the error path.
- On upstream sync:
- Upstream Netflix still uses `assert(0)` in `CHECK_CUDA` as of 2026-04-24. Keep the fork's macro definitions in `cuda_helper.cuh` on any upstream conflict; this file carries fork-local behaviour.
- If upstream eventually lands a fix for Netflix#1420 with a similar refactor, prefer the fork's version unless upstream's has identical semantics (no `assert(0)` / no `abort()` / translates `CUresult` to `-errno`). Re-verify `test_cuda_buffer_alloc_oom` after rebase.
- If upstream adds new `CHECK_CUDA(...)` sites in a port, rewrite them to `CHECK_CUDA_GOTO` / `CHECK_CUDA_RETURN` as part of the port commit.
- If upstream changes any of the 12 `static` helper signatures back to `void`, re-promote them to `int` during the merge.
- Re-test on rebase:
ninja -C core/build-cuda
meson test -C core/build-cuda
# Expect: 39/39 pass including test_cuda_buffer_alloc_oom.
# Reducer check — verify the OOM-to-errno path is live:
meson test -C core/build-cuda test_cuda_buffer_alloc_oom -v
# Expect subtests: request 1 TiB → -ENOMEM; request 0 bytes → 0.
clang-tidy -p core/build-cuda --quiet \
core/src/cuda/common.c \
core/src/cuda/picture_cuda.c \
core/src/feature/cuda/integer_motion_cuda.c \
core/src/feature/cuda/integer_vif_cuda.c \
core/src/feature/cuda/integer_adm_cuda.c \
core/src/libvmaf.c
# Expect exit 0 on every file.
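A host-only sketch of the error-propagation shape: map a numeric `CUresult` to `-errno` and jump to a cleanup label that frees partially-constructed state. The numeric values are the CUDA driver's `CUresult` codes; every name here is hypothetical, and apart from out-of-memory → `-ENOMEM` (which the reducer checks) the errno choices are assumptions, not the fork's actual table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Numeric CUresult values so the mapping compiles without <cuda.h>. */
static int result_to_errno_sketch(int cu_result)
{
    switch (cu_result) {
    case 0:   return 0;       /* CUDA_SUCCESS */
    case 1:   return -EINVAL; /* CUDA_ERROR_INVALID_VALUE */
    case 2:   return -ENOMEM; /* CUDA_ERROR_OUT_OF_MEMORY */
    case 3:                   /* CUDA_ERROR_NOT_INITIALIZED */
    case 4:                   /* CUDA_ERROR_DEINITIALIZED */
    case 101: return -ENODEV; /* CUDA_ERROR_INVALID_DEVICE */
    case 201:                 /* CUDA_ERROR_INVALID_CONTEXT */
    case 400: return -EINVAL; /* CUDA_ERROR_INVALID_HANDLE */
    default:  return -EIO;    /* assumed catch-all */
    }
}

/* Hypothetical CHECK_CUDA_GOTO-style macro: record -errno, jump to cleanup. */
#define CHECK_GOTO_SKETCH(call, err, label)   \
    do {                                      \
        (err) = result_to_errno_sketch(call); \
        if ((err) != 0)                       \
            goto label;                       \
    } while (0)

/* Fake driver call: reports CUDA_ERROR_OUT_OF_MEMORY (2) above 1 GiB. */
static int fake_device_alloc(size_t bytes)
{
    return bytes > ((size_t)1 << 30) ? 2 : 0;
}

static int buffer_alloc_sketch(size_t bytes, void **out)
{
    int err = 0;
    assert(out != NULL);
    void *host_side = malloc(64); /* partially-constructed state */
    if (!host_side)
        return -ENOMEM;
    CHECK_GOTO_SKETCH(fake_device_alloc(bytes), err, fail);
    *out = host_side;
    return 0;
fail:
    free(host_side); /* unwind: the error path must not leak */
    *out = NULL;
    return err;
}
```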
0049 — compute_motion / picture_copy signature changes (b949cebf upstream port)¶
- Upstream commit: Netflix/vmaf b949cebf (feature/motion: port several feature extractor options)
- Prerequisite commit: Netflix/vmaf d3647c73 (picture_copy: add channel parameter)
- PR: upstream/port-b949cebf-motion
Rebase-sensitive invariants:
- `compute_motion` signature change: `compute_motion()` in `core/src/feature/motion.c` / `motion.h` now takes an extra `int motion_decimate` parameter (the `motion_add_scale1` flag). Any new caller added in the fork that calls `compute_motion()` must pass this parameter. The SIMD integer motion callers (`motion_avx2.c`, `motion_avx512.c`) do NOT call `compute_motion()`; they use the SAD/convolution dispatch table directly and are unaffected.
- `vmaf_image_sad_c` signature change: it similarly gains `int motion_add_scale1`. Any caller in the fork must be updated. It is currently only called from `compute_motion()` internally.
- `picture_copy` signature change: it gains `int channel` as the last parameter (0=Y, 1=U, 2=V). Every caller in the tree has been updated to pass `0` (luma). When adding new callers that need UV planes, pass `1` or `2`. The fork's CUDA/SYCL/Vulkan callers have been updated in this PR.
- Default behavior preserved: all new options default to no-op values: `motion_add_scale1=false`, `motion_add_uv=false`, `motion_blend_factor=1.0`, `motion_fps_weight=1.0`, `motion_filter_size=5` (i.e. `DEFAULT_MOTION_FILTER_SIZE`). Integer and float motion2 scores are bit-identical to the pre-port baseline.
- `vif_scale_frame_s` dependency avoided: the upstream b949cebf `motion.c` imports `vif_scale_frame_s` from `vif_tools.h`. The fork does not have this function yet (the vif options chain is deferred, Research-0024 Strategy E). The bilinear downscaler for `motion_add_scale1` is implemented as local static functions in `motion.c` (`motion_scale_bilinear`, `motion_bilinear_interp`, `motion_mirror_f`). When upstream's vif options chain is eventually ported, reconcile by replacing these local functions with `vif_scale_frame_s`.
Reproducer:
# verify bit-exactness (default options, scores must be identical):
./core/build/tools/vmaf \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--model path=model/vmaf_v0.6.1.json \
--feature motion --no_prediction --json --output /tmp/motion.json
# integer_motion2 scores must match pre-port baseline at 6 decimal places.
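To illustrate the new `channel` argument, here is a hypothetical per-plane copy into a float buffer. `PicSketch` and `copy_channel_sketch` are not the fork's `VmafPicture` or `picture_copy` (whose other parameters these notes don't list); only the 0=Y / 1=U / 2=V convention is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit planar picture; the real VmafPicture differs. */
typedef struct {
    const uint8_t *data[3];
    ptrdiff_t stride[3]; /* bytes per row */
    unsigned w[3], h[3];
} PicSketch;

/* Channel convention from the notes: 0 = Y, 1 = U, 2 = V. */
enum { CH_Y = 0, CH_U = 1, CH_V = 2 };

/* Copy one plane into a float buffer (dst_stride counted in floats). */
static int copy_channel_sketch(float *dst, ptrdiff_t dst_stride,
                               const PicSketch *pic, int channel)
{
    assert(dst != NULL && pic != NULL);
    if (channel < CH_Y || channel > CH_V)
        return -1;
    const uint8_t *src = pic->data[channel];
    for (unsigned y = 0; y < pic->h[channel]; y++)
        for (unsigned x = 0; x < pic->w[channel]; x++)
            dst[(ptrdiff_t)y * dst_stride + x] =
                (float)src[(ptrdiff_t)y * pic->stride[channel] + x];
    return 0;
}
```

Existing luma-only callers correspond to `CH_Y`; a caller that needs chroma for `motion_add_uv` would pass `CH_U` or `CH_V`.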
0048 — i4_adm_cm int32 rounding overflow deliberately preserved (ADR-0155)¶
- ADR: ADR-0155
- Upstream source: Netflix upstream issue #955 (OPEN since 2020; no maintainer response as of 2026-04-24). It reports that `add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1))` in `core/src/feature/integer_adm.c` overflows `int32_t` for scales 1–3 (`1u << 31 = 0x80000000` wraps to `-2147483648`). The rounding term is therefore sign-negated, and ADM scales 1–3 are biased low by ≈1 LSB per summed term.
- Touches (documentation-only):
- `docs/adr/0155-adm-i4-rounding-deferred-netflix-955.md`: new ADR (this entry's anchor).
- `core/src/feature/integer_adm.c`: in-file warning comment above the overflow site (the `add_bef_shift_flt[]` initialiser loop around line 1277). No code change.
- `core/src/feature/AGENTS.md`: invariant note under "Rebase-sensitive invariants".
- Invariants (load-bearing; do NOT silently "fix"):
- `integer_adm.c` keeps `int32_t add_bef_shift_flt[3]` with the overflowing `1u << 31` assignment. The Netflix golden assertions (`python/test/quality_runner_test.py`, `vmafexec_test.py`, `feature_extractor_test.py`) encode the buggy ADM output. Project hard rule #1 (ADR-0024) prohibits changing those assertions.
- Any "fix" that changes ADM numerical output must land together with a coordinated Netflix-authored golden-number update (the ADR-0142 Netflix-authority carve-out). Until Netflix#955 closes upstream, there is no Netflix authority to track.
- On upstream sync:
- If Netflix finally lands a fix for #955 (widening the rounding term to `uint32_t` or `int64_t`), sync the C-side fix AND the updated `assertAlmostEqual` values in the same merge. Re-run `make test-netflix-golden` and `/cross-backend-diff` on the golden pairs to verify the new numbers are consistent across CPU / CUDA / SYCL.
- Remove the in-file warning comment above the `add_bef_shift_flt` initialiser loop, flip ADR-0155 to `Superseded by ADR-NNNN`, and drop this rebase-notes entry.
- If upstream instead closes #955 as wont-fix, keep this entry verbatim and update the ADR status to note upstream's closure.
- Re-test on rebase (gates the invariant by confirming the golden numbers are unchanged):
ninja -C build
make test-netflix-golden
# Expect: VMAF mean 76.66890… on src01_hrc00/01_576x324 golden
# pair — bit-identical to pre-rebase.
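The overflow documented in Netflix#955 is reproducible in a few lines. This sketch contrasts the as-written `int32_t` rounding term with a widened `int64_t` variant of the kind a future upstream fix might use; the helper names are hypothetical, and the widened form is shown only for comparison, since the fork deliberately keeps the overflow.

```c
#include <assert.h>
#include <stdint.h>

/* As written upstream: the unsigned rounding term lands in an int32_t.
 * For shift == 32, 1u << 31 == 0x80000000 does not fit; the conversion is
 * implementation-defined in C and wraps to INT32_MIN on GCC, Clang and MSVC. */
static int32_t rounding_term_i32(int shift)
{
    assert(shift >= 1 && shift <= 32);
    return (int32_t)(1u << (shift - 1));
}

/* A widened variant of the kind the notes anticipate for #955.
 * NOT applied in the fork (ADR-0155): it would change ADM output. */
static int64_t rounding_term_i64(int shift)
{
    assert(shift >= 1 && shift <= 32);
    return (int64_t)1 << (shift - 1);
}
```

For shift 32 the intended `+2^31` rounding offset becomes `-2^31`, which is why the affected scales come out biased low.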
0047 — vmaf_score_pooled -EAGAIN for pending features (ADR-0154)¶
- ADR: ADR-0154
- Upstream source: Netflix upstream issue #755 (OPEN as of 2026-04-24). The upstream maintainer closed the door on the streaming use case in 2020 ("you cannot call vmaf_score_pooled() in a loop"); the fork reopens it via error-code semantics without changing the retroactive-write design.
- Touches:
- `core/src/feature/feature_collector.c`: `vmaf_feature_collector_get_score` returns `-EAGAIN` (was `-EINVAL`) when the requested index is valid but not yet written.
- `core/src/feature/feature_collector.h`: the inline `vmaf_feature_vector_get_score` now returns `-EINVAL` for null/out-of-range and `-EAGAIN` for not-written (was `-1` for both). Added `#include <errno.h>`. Renamed the reserved `__VMAF_FEATURE_COLLECTOR_H__` guard to `VMAF_FEATURE_COLLECTOR_INCLUDED`.
- `core/test/test_score_pooled_eagain.c`: new 4-subtest reducer.
- `core/test/meson.build`: registers the new test.
- Invariants (load-bearing, enforced by the reducer):
- `vmaf_feature_collector_get_score(fc, name, &score, i)` returns `-EAGAIN` iff the feature `name` is registered and `i` is in range but `score[i].written == false`.
- The return stays `-EINVAL` for (a) null pointers, (b) `i >= feature_vector->capacity`, (c) an unknown feature name.
- The inline fast path `vmaf_feature_vector_get_score` uses the same split.
- On upstream sync: upstream has not changed the error semantics since 2020. If they do (unlikely), keep the fork's `-EAGAIN`: it is strictly more informative, and downstream code depending on the split would regress.
- Re-test on rebase:
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: 4/4 subtests pass.
# Reducer check:
git stash push core/src/feature/feature_collector.c core/src/feature/feature_collector.h
ninja -C build && meson test -C build test_score_pooled_eagain
# Expect: Fail: 1 (tests fail without -EAGAIN split).
git stash pop
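A minimal model of the `-EINVAL` / `-EAGAIN` split, using hypothetical stand-in types rather than the fork's feature-collector structs. A streaming caller can treat `-EAGAIN` as "not yet, poll again after the next frame" and `-EINVAL` as a programming error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a feature vector and its per-index slots. */
typedef struct { double value; bool written; } ScoreSlotSketch;
typedef struct { ScoreSlotSketch *score; unsigned capacity; } FeatureVectorSketch;

/* -EINVAL: caller error (null pointer, index out of range).
 * -EAGAIN: valid index whose score has not been written yet; retry later. */
static int get_score_sketch(const FeatureVectorSketch *fv, unsigned i, double *out)
{
    if (!fv || !out || i >= fv->capacity)
        return -EINVAL;
    if (!fv->score[i].written)
        return -EAGAIN;
    *out = fv->score[i].value;
    return 0;
}
```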
0046 — float_ms_ssim min-dim guard (ADR-0153)¶
- ADR: ADR-0153
- Upstream source: Netflix upstream issue #1414 (OPEN as of 2026-04-24). No upstream fix has landed; fork adds the guard independently.
- Touches:
- `core/src/feature/float_ms_ssim.c`: adds `#include "log.h"` + `#include "iqa/ssim_tools.h"` + a `min_dim = GAUSSIAN_LEN << (SCALES - 1)` check at the start of `init`; extracts the SIMD dispatch into a new `ms_ssim_init_simd_dispatch` helper to keep `init` within the ADR-0141 60-line budget.
- `core/test/test_float_ms_ssim_min_dim.c`: new 3-subtest reducer.
- `core/test/meson.build`: registers the new test executable.
- Invariant (load-bearing, enforced by the reducer):
- `float_ms_ssim.init` returns `-EINVAL` when `w < 176 || h < 176`, where 176 is computed from the filter constants rather than hardcoded. Changing `SCALES` or `GAUSSIAN_LEN` upstream will update the minimum automatically.
- On upstream sync: if Netflix upstream lands a similar init-time guard, keep the fork's version. The helper name `ms_ssim_init_simd_dispatch` is fork-local (introduced to satisfy ADR-0141), so upstream's patch won't match. Both guards should be compatible; re-verify the reducer after rebase.
- Re-test on rebase:
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: 3/3 subtests pass.
# Reducer check (confirms the guard is load-bearing):
git stash push core/src/feature/float_ms_ssim.c
ninja -C build && meson test -C build test_float_ms_ssim_min_dim
# Expect: Fail: 1 (tests fail without the guard).
git stash pop
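The 176-pixel minimum follows from the guard's formula. In this sketch `GAUSSIAN_LEN_SKETCH = 11` and `SCALES_SKETCH = 5` are assumed values, chosen because `11 << 4 == 176` matches the number above; the real constants live in the iqa headers. One way to read the formula: after `SCALES - 1` halvings, the coarsest scale still needs at least one full Gaussian window.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values for the sketch; with these the minimum is the 176 quoted above. */
#define GAUSSIAN_LEN_SKETCH 11
#define SCALES_SKETCH 5

static int ms_ssim_check_dims_sketch(unsigned w, unsigned h)
{
    /* Derived, not hardcoded: GAUSSIAN_LEN << (SCALES - 1). */
    const unsigned min_dim = GAUSSIAN_LEN_SKETCH << (SCALES_SKETCH - 1);
    if (w < min_dim || h < min_dim)
        return -EINVAL;
    return 0;
}
```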
0045 — vmaf_read_pictures monotonic-index guard (ADR-0152)¶
- ADR: ADR-0152
- Upstream source: Netflix upstream issue #910 (OPEN as of 2026-04-24). No upstream fix has landed; the fork adds the guard independently, per the 2021-10-14 maintainer comment that recommended exactly this shape.
- Touches:
- `core/src/libvmaf.c`: adds `unsigned last_index` + `bool have_last_index` fields to `VmafContext`; prepends a monotonic-index check inside `read_pictures_validate_and_prep` (returns `-EINVAL` on duplicates / regressions); updates the two new fields at the tail of the same helper on success.
- `core/test/test_read_pictures_monotonic.c`: new 3-subtest reducer covering the Netflix#910 sequence and the two classes of rejection (duplicate, out-of-order).
- `core/test/meson.build`: registers the new test executable.
- Invariant (load-bearing, enforced by the reducer):
- `vmaf_read_pictures(vmaf, ref, dist, index)` returns `-EINVAL` when `have_last_index && index <= last_index`. Flush (`vmaf_read_pictures(vmaf, NULL, NULL, 0)`) routes to `flush_context` before the guard runs, so flushing remains available regardless of the last accepted index.
- On upstream sync:
- If Netflix upstream eventually lands a similar guard at the API boundary, keep the fork's version: the helper function name (`read_pictures_validate_and_prep`) is fork-local (ADR-0146), so upstream's patch will target a different insertion point. Both guards should be compatible; re-verify the reducer after rebase.
- If upstream instead lands an internal reordering mechanism (buffer-and-sort frames before dispatch), revisit this decision: the fork's API-level contract is stricter and may need to relax to match. Open a new ADR if so.
- Re-test on rebase:
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: 3/3 subtests pass.
# Reducer check (confirms the guard is load-bearing):
git stash push core/src/libvmaf.c
ninja -C build && meson test -C build test_read_pictures_monotonic
# Expect: Fail: 1 (the test rejects the un-guarded behaviour).
git stash pop
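A sketch of the guard's control flow with a hypothetical two-field context (not the real `VmafContext`). `have_last_index` exists so that the very first frame is accepted even at index 0; the flush call is handled before the guard, and `last_index` is updated only after a frame is accepted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of VmafContext: only the two fields the guard needs. */
typedef struct {
    unsigned last_index;
    bool have_last_index; /* false until the first frame is accepted */
} CtxSketch;

static int read_pictures_sketch(CtxSketch *c, const void *ref, const void *dist,
                                unsigned index)
{
    assert(c != NULL);
    if (!ref && !dist)
        return 0; /* flush: handled before the guard, always available */
    if (!ref || !dist)
        return -EINVAL;
    if (c->have_last_index && index <= c->last_index)
        return -EINVAL; /* duplicate or out-of-order index */
    /* ... per-frame validation and dispatch would run here ... */
    c->last_index = index; /* recorded only on success */
    c->have_last_index = true;
    return 0;
}
```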
0044 — i686 (32-bit x86) build-only CI job (ADR-0151)¶
- ADR: ADR-0151
- Upstream source: Netflix upstream issue #1481 (OPEN as of 2026-04-24). It reports an i686 compile failure on `_mm256_extract_epi64`. Workaround documented in the issue: `-Denable_asm=false`.
- Touches:
- `build-aux/i686-linux-gnu.ini`: new cross-file; gcc + `-m32` + `cpu_family = 'x86'` / `cpu = 'i686'`. No `exe_wrapper`.
- `.github/workflows/libvmaf-build-matrix.yml`: new matrix row with an `i686: true` flag + a new install-deps step for `gcc-multilib` + `g++-multilib`; the existing "Run tests" and "Run tox tests (ubuntu)" steps gained `&& !matrix.i686` guards.
- Invariants:
- The i686 matrix row pins `-Denable_asm=false`, the upstream-documented workaround for `_mm256_extract_epi64`'s missing declaration on 32-bit x86 targets. Do NOT remove the flag without first gating every `_mm256_extract_epi64` call site in `core/src/feature/x86/adm_avx2.c` + `motion_avx2.c` + `adm_avx512.c` on `__x86_64__`. Removing the flag naively will re-break the build.
- No `exe_wrapper` in the cross-file: meson marks the tests as `SKIP 77` even though the host can run i686 binaries natively. This is a build-only gate by design.
- On upstream sync:
- If upstream Netflix fixes #1481 at source (by gating the intrinsic calls on `__x86_64__` or by emulating via two `_mm256_extract_epi32` halves), sync the fix and re-enable ASM on the i686 row (drop `-Denable_asm=false` from `meson_extra`). Re-verify bit-exactness via `/cross-backend-diff` on the x86_64 golden pair.
- If upstream marks i686 unsupported in meson (e.g. via a hard error), remove the fork's i686 row or downgrade it to `continue-on-error: true`.
- Re-test on rebase (Ubuntu host with `gcc-multilib`):
meson setup libvmaf core/build-i686 \
--cross-file=build-aux/i686-linux-gnu.ini \
-Denable_asm=false \
-Denable_cuda=false -Denable_sycl=false
ninja -C core/build-i686
file core/build-i686/tools/vmaf
# Expect: ELF 32-bit LSB pie executable, Intel i386
CI runs this same sequence via the new matrix row.
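One of the upstream fix options mentioned above is to emulate `_mm256_extract_epi64` with two `_mm256_extract_epi32` halves on 32-bit targets. A hypothetical sketch follows; the intrinsic path only compiles when AVX2 is enabled (`__AVX2__`), while `combine_epi32_halves` is plain C and shows the lane arithmetic. It illustrates the option only; the fork has not applied it, and the i686 row still builds with `-Denable_asm=false`.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Rebuild a 64-bit lane from its two 32-bit halves: 32-bit lane 2k is the low
 * half and lane 2k + 1 the high half of 64-bit lane k (x86 is little-endian). */
static int64_t combine_epi32_halves(int32_t lo, int32_t hi)
{
    uint64_t u = ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
    return (int64_t)u; /* implementation-defined above INT64_MAX; wraps on GCC/Clang */
}

#if defined(__AVX2__)
/* Hypothetical helper: 64-bit lane 1 of v; the native intrinsic only on x86_64. */
static inline int64_t extract_epi64_lane1(__m256i v)
{
#if defined(__x86_64__)
    return _mm256_extract_epi64(v, 1);
#else
    return combine_epi32_halves(_mm256_extract_epi32(v, 2),
                                _mm256_extract_epi32(v, 3));
#endif
}
#endif
```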
0058 — Tiny-AI Netflix corpus training scaffold (ADR-0252)¶
- ADR: ADR-0252.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training harness or MCP server.
- Touches:
- `ai/`: training harness; the `NflxLocalDataset` loader reads from `--data-root` (never from a hardcoded path).
- `docs/ai/training-data.md`: corpus path convention and loader API docs; purely additive.
- `mcp-server/vmaf-mcp/tests/test_smoke_e2e.py`: new e2e smoke test; references only committed golden fixtures.
- Invariants (load-bearing):
- Data path is local-only. `.workingdir2/netflix/` is gitignored; no YUV from this corpus is ever committed. The `--data-root` CLI flag must remain the sole mechanism for locating the corpus.
- Smoke test uses only committed fixtures. `test_smoke_e2e.py` references `python/test/resource/yuv/src01_hrc00_576x324.yuv` (a committed golden file), never the local corpus path. On upstream sync the golden YUV path must stay stable.
- No Netflix golden assertion is modified. The `places=4` tolerance in `test_smoke_e2e.py` asserts against the `vmaf_v0.6.1` CPU reference; it is not a golden assertion and may be adjusted by `/regen-snapshots` with justification.
- On upstream sync: zero interaction with Netflix upstream. The `ai/` subtree and `mcp-server/` are wholly fork-local, so upstream merges are conflict-free here. If Netflix ever ships a training harness, reconcile separately.
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (vmaf binary)
# Skips automatically if binary or golden YUV is absent.
0085 — Research-0030 Phase-3b multi-seed validation (Gate 1 passed)¶
- No ADR. Empirical research digest closing Gate 1 of the 3-gate v2 validation chain. Architecture decision unchanged.
- Upstream source: fork-local. Netflix has no multi-seed validation surface for tiny-AI training.
- Touches (additive only):
- `docs/research/0030-phase3b-multiseed-validation.md`: per-seed PLCC tables + stability analysis + Gate 2/3 plan.
- `ai/scripts/phase3_subset_sweep.py`: adds a `--seeds` flag (comma-separated list) + per-seed result aggregation.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- The +0.0175 Δ is multi-seed mean PLCC, not seed-0 PLCC. Don't cite the +0.0106 from Research-0029 once Research-0030 lands; the multi-seed number is more trustworthy.
- Subset B is more stable than canonical-6 across seeds. Don't ship a v2 model citing single-seed numbers — always report multi-seed mean ± seed-mean-std for any tiny-AI metric in a future digest.
- The `--seeds` flag aggregates by flattening (seed × fold) pairs. The reported `mean_plcc` is the mean of all `n_seeds × n_folds` measurements; `seed_mean_plcc_std` is the std across per-seed means, which is the right number for answering "is the result seed-stable?".
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the runs/ files reproduce from the canonical command.
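The aggregation rule above fits in a few lines of stdlib Python. This sketch is an illustration, not the code in `ai/scripts/phase3_subset_sweep.py`. Three details are assumptions: the function name `aggregate_multiseed`, the input shape (a dict mapping each seed to its per-fold PLCC list), and the use of the population standard deviation.

```python
from statistics import fmean, pstdev


def aggregate_multiseed(plcc_by_seed):
    """Aggregate per-fold PLCC values collected over several seeds.

    plcc_by_seed: hypothetical shape, {seed: [plcc_fold0, plcc_fold1, ...]}.
    """
    # mean_plcc: pool every (seed, fold) measurement and average them all.
    flat = [p for folds in plcc_by_seed.values() for p in folds]
    # seed_mean_plcc_std: spread of the per-seed means, i.e. how much the
    # headline number moves when only the seed changes.
    seed_means = [fmean(folds) for folds in plcc_by_seed.values()]
    return {
        "n_measurements": len(flat),
        "mean_plcc": fmean(flat),
        "seed_mean_plcc_std": pstdev(seed_means),
    }
```

When every seed has the same number of folds, `mean_plcc` equals the mean of the per-seed means. The spread over all `n_seeds × n_folds` values is a different quantity: it also contains fold-to-fold variance, i.e. how hard each held-out source is. With equal fold counts and population std, that pooled spread is never smaller than the spread of the per-seed means. Only the per-seed-mean spread isolates the effect of changing the seed.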
0084 — Research-0029 Phase-3b StandardScaler retry (positive result)
- No ADR. Empirical research digest; revives the Research-0026 hypothesis after the Research-0028 negative result. The architectural decision (ship `vmaf_tiny_v2`) is gated on three validation steps documented in the digest §"Required before shipping".
- Upstream source: fork-local. Netflix has no tiny-AI preprocessing-sensitivity analysis surface.
- Touches (additive only):
- `docs/research/0029-phase3b-standardscaler-results.md`: per-fold tables + apples-to-apples comparison + 3-gate pre-shipping checklist.
- `ai/scripts/phase3_subset_sweep.py`: adds the `--standardize` flag + `_standardize_inplace` helper.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- StandardScaler statistics MUST be fit per-fold on the train split only. Fitting on the full data would leak held-out information into LOSO; the `_standardize_inplace` helper enforces this by taking only the train slice as input.
- A shipped `vmaf_tiny_v2.onnx` MUST bundle its scaler `(mean, std)` in the sidecar JSON per ADR-0049; otherwise inference applies different normalisation than training and the win evaporates. Currently not implemented; tracked as a §"Caveats" #5 follow-up.
- Subset B's feature list is the load-bearing finding: `adm2`, `adm_scale3`, `vif_scale2`, `motion2`, `ssimulacra2`, `psnr_hvs`, `float_ssim`. Phase-3c experiments may shift the optimal arch / lr / epochs but should keep this set.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the `runs/` files are reproducible from the `--standardize` invocation in §"Reproducer".
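Here is a minimal stdlib sketch of the per-fold rule. It assumes rows are lists of floats grouped by source. The function names are hypothetical; the real helper is `_standardize_inplace`, and its signature is not reproduced here. The sketch follows scikit-learn's `StandardScaler` defaults: population standard deviation, and a scale of 1.0 for constant columns.

```python
from statistics import fmean, pstdev


def fit_scaler(train_rows):
    """Per-column (mean, std) computed from the TRAIN rows only.

    A zero std (constant column) is replaced by 1.0 so transform()
    never divides by zero.
    """
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def loso_standardized_folds(rows_by_source):
    """Yield (held_out, train_scaled, test_scaled, (means, stds)) per fold.

    The scaler is refit inside every fold on the training sources alone,
    so no statistic of the held-out source leaks into its own normalisation.
    """
    for held_out, test_rows in rows_by_source.items():
        train_rows = [
            row
            for source, rows in rows_by_source.items()
            if source != held_out
            for row in rows
        ]
        means, stds = fit_scaler(train_rows)
        yield (
            held_out,
            transform(train_rows, means, stds),
            transform(test_rows, means, stds),
            (means, stds),
        )
```

Each fold returns a `(means, stds)` pair. That pair is exactly what would need to be bundled in the sidecar JSON of a shipped model. Normalising at inference with any other statistics reproduces the mismatch described above.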
0082 — Research-0028 Phase-3 subset sweep (negative-result digest)
- No ADR. Empirical research digest. The architectural decision (no v2 model ships from this phase) is governed by Research-0027's pre-registered stopping rule.
- Upstream source: fork-local. Netflix has no tiny-AI subset-sweep surface.
- Touches (additive only):
- `docs/research/0028-phase3-subset-sweep.md`: per-fold tables + headline + standardisation caveat + Phase-3b/c/d follow-ups.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- canonical-6 stays the default until Phase-3b lands a ≥ 0.005 PLCC win (per Research-0027 stopping rule).
- The PLCC drop is most likely a feature-scale issue, not evidence that the new features lack signal. Don't cite this digest to retire `ssimulacra2` / `adm_scale3` from the candidate pool; re-test with `StandardScaler` first.
- Phase-3 results are seed=0 only. Any v2-shipping decision needs a 3-seed mean ± std and a KoNViD cross-check.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; runs/ files are reproducible from the canonical command in §"Reproducer".
0081 — Research-0027 Phase-2 feature importance results
- No ADR. Empirical research digest closing Research-0026 Phase 2; the architectural decision (Subset A / B / C) is deferred to Phase-3 results in a future digest.
- Upstream source: fork-local. Netflix has no cross-metric feature-importance analysis surface.
- Touches (additive only):
- `docs/research/0027-phase2-feature-importance.md`: per-method top-10 + consensus + redundancy + Phase-3 subset recommendations.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- Consensus top-10 is the load-bearing finding: `adm2`, `adm_scale3`, `ssimulacra2`, `vif_scale2`. Phase-3 candidate subsets MUST include all four.
- The 11-pair redundancy table is corpus-specific: the measurements are on the Netflix Public 9-source corpus. A KoNViD-1k cross-check is a Phase-3 prerequisite if Subsets B/C advance.
- `runs/full_features_netflix.parquet` and `runs/full_features_correlation.json` stay gitignored. The reproducer in §"Reproducer" regenerates both.
- On upstream sync: zero interaction. Fork-only research.
- Re-test on rebase: documentation-only PR; the `runs/` files are reproducible from the canonical commands.
0080 — Phase-2 analysis scripts (Research-0026 Phase 2 prep)
- No ADR. Pure analysis scaffolding; the architectural decision (which features to ship in v2) is gated on Phase 2's numerical output via Research-0027.
- Upstream source: fork-local. Netflix has no tiny-AI training nor cross-metric correlation tooling.
- Touches (additive only):
- `ai/scripts/extract_full_features.py`: parquet extractor over the Netflix corpus with `FULL_FEATURES`. Per-clip JSON cache at `$XDG_CACHE_HOME/vmaf-tiny-ai-full/<source>/<dis_stem>.json`.
- `ai/scripts/feature_correlation.py`: Pearson + MI + LASSO + consensus top-K analyser; outputs JSON.
- `ai/tests/test_feature_correlation.py`: 5 pytest cases against a synthetic parquet (no libvmaf dependency).
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- The per-clip JSON cache and the `FULL_FEATURES` tuple must stay in lock-step. If the tuple grows (or shrinks), pre-existing cache files become stale and silently misalign their stored `per_frame` columns with the new tuple. The extractor MUST be re-run with a cleared cache when `FULL_FEATURES` changes. Regression hint: `test_default_features_unchanged` in `test_feature_sets.py` already guards the canonical 6; extend coverage to `FULL_FEATURES` if rebases touch it.
- `motion3` resolves to extractor `motion_v2` (the upstream-canonical extractor name in the `integer_motion_v2` module) in `_METRIC_TO_EXTRACTOR`, not to an extractor named `motion3`. The CLI `--feature motion3` does NOT exist. The JSON output key is `integer_motion3`, which `_lookup` finds via the `integer_` fallback.
- `adm` and `vif` aggregates are NOT in `FULL_FEATURES`. The integer extractor emits `integer_adm2` and `integer_vif_scale0..3` but no bare `adm` / `vif`. Listing them produced all-NaN columns in v1; fixed in the PR #185 amend.
- On upstream sync: zero interaction. Pure fork-side analysis tooling.
- Re-test on rebase:
pytest ai/tests/test_feature_correlation.py ai/tests/test_feature_sets.py -v
# Expect: 14 passed in <1 s.
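The lock-step requirement could be made self-enforcing by stamping each cache file with a fingerprint of the ordered feature tuple. The sketch below is hypothetical: the current cache format has no `features_fingerprint` field, and the note's actual remedy is to clear the cache by hand. A missing or mismatched fingerprint is treated as a cache miss. The tuple is hashed in order, so a reordering also invalidates the entry. That matters because the stored columns are positional.

```python
import hashlib
import json
from pathlib import Path


def features_fingerprint(features):
    """Short, stable hash of the ORDERED feature tuple."""
    return hashlib.sha256("\n".join(features).encode("utf-8")).hexdigest()[:16]


def store_clip(path, features, per_frame):
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "features_fingerprint": features_fingerprint(features),
        "per_frame": per_frame,
    }
    path.write_text(json.dumps(payload))


def load_clip(path, features):
    """Return cached per_frame rows, or None if missing or built for another tuple."""
    path = Path(path)
    if not path.exists():
        return None
    payload = json.loads(path.read_text())
    if payload.get("features_fingerprint") != features_fingerprint(features):
        return None  # stale: its columns would misalign with the current tuple
    return payload["per_frame"]
```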
0079 — Tiny-AI feature-set registry (Research-0026 Phase 1)
- No ADR. Pure additive extension of an existing module; the architectural decision (which features, which model) lives in Research-0026's go/no-go gate after Phase 2.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training pipeline.
- Touches (additive only):
- `ai/data/feature_extractor.py`: adds `FULL_FEATURES` (21 entries), the `FEATURE_SETS` registry, and the `resolve_feature_set()` helper. `_METRIC_TO_EXTRACTOR` grew from 11 to 25 entries.
- `ai/tests/test_feature_sets.py`: new 9-test smoke suite.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant; these are load-bearing):
- `DEFAULT_FEATURES` stays the canonical 6-tuple matching `vmaf_v0.6.1`'s SVR input layout. Test `test_default_features_unchanged` is the regression guard; any quiet broadening would invalidate every shipped tiny-AI ONNX (the input dimension is baked into the model). If a future change must broaden the default, ship a paired model swap under the ADR-0049 sidecar policy.
- `FULL_FEATURES` excludes `lpips` and `float_moment` per Research-0026 §"Open questions" Q1. Test `test_full_features_excludes_lpips_and_moment` enforces this. Adding either would re-classify the experiment from "tiny model on classical features" to "ensemble of DNNs".
- Every entry in `FULL_FEATURES` MUST have an entry in `_METRIC_TO_EXTRACTOR`. Test `test_every_full_feature_has_extractor_mapping` is the guard; without the mapping the libvmaf CLI silently emits NaN columns for the missing metric.
- On upstream sync: zero interaction. Fork-only training surface.
- Re-test on rebase:
0078 — Research-0026 cross-metric feature fusion plan
- No ADR. Pure research-plan digest; the architectural decision (which features to add) is deferred to Research-0027 follow-up after Phase 2 numbers land.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training and no broader-feature-set hypothesis under investigation.
- Touches (additive only):
- `docs/research/0026-cross-metric-feature-fusion.md`: 4-phase experimental plan + cost estimate + go/no-go criteria.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- The 6-feature canonical baseline (`adm2`, `vif_scale0..3`, `motion2`) stays the default. Any v2 model is opt-in via a new `feature_set` field in the sidecar JSON; existing `vmaf_tiny_v1.onnx` users get the same numbers.
- `lpips` is OUT of the candidate pool (Phase 1/2). It is DNN-based and would blur the line between "tiny model on classical features" and "ensemble of DNNs". Revisit only if classical features can't close the gap.
- On upstream sync: zero interaction. Pure fork-side research planning.
- Re-test on rebase: documentation-only; no test surface.
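One way to make the `feature_set` field opt-in without touching v1 models is to fall back to the canonical 6 when the field is absent. This sketch is hypothetical: the registry key `canonical-6` and the function name are assumptions, and the real sidecar schema (ADR-0049) may carry other fields. An unknown set name raises an error instead of falling back. Silently feeding a model an input of the wrong width is exactly the failure this invariant exists to prevent.

```python
import json

# Hypothetical registry; only the canonical-6 contents come from this note.
FEATURE_SETS = {
    "canonical-6": ("adm2", "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3", "motion2"),
}
DEFAULT_FEATURE_SET = "canonical-6"


def features_for_sidecar(sidecar_text):
    """Resolve a model's input features from its sidecar JSON text.

    A sidecar without a feature_set field (every existing v1 model) resolves
    to the canonical 6, so v1 users see no change.
    """
    sidecar = json.loads(sidecar_text)
    name = sidecar.get("feature_set", DEFAULT_FEATURE_SET)
    try:
        return FEATURE_SETS[name]
    except KeyError:
        raise ValueError(f"unknown feature_set {name!r} in sidecar") from None
```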
0077 — Research-0025 FoxBird outlier resolved via KoNViD combined training
- No ADR. Empirical research digest closing the open question in Research-0023 §5; no architecture or policy decision. Pure documentation of an empirical result.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training, no KoNViD-1k integration, and no LOSO eval surface.
- Touches (additive only):
- `docs/research/0025-foxbird-resolved-via-konvid.md`: per-clip table + comparison to Netflix-only baselines + interpretation + caveats + next-experiment list.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- The training-fit per-clip numbers in §"Per-clip result" are NOT held-out generalisation metrics — FoxBird is in the training set. The proper validation is the LOSO sweep on the combined corpus (§"Next experiments" #1). Don't cite the 0.9936 FoxBird PLCC as a generalisation number; cite it as "training-fit on combined corpus, 5.4× RMSE improvement vs Netflix-only".
- The combined trainer command line is canonical. The reproduction recipe in §"Setup" includes `--seed 0`, `--konvid-val-fraction 0.1`, `--val-source Tennis`, and `--val-mode netflix-source-and-konvid-holdout`. Changing any knob invalidates the per-clip numbers.
- `runs/tiny_combined_canonical/` stays gitignored. The final ONNX is reproducible from the parquet + Netflix corpus + the canonical CLI; the durable record is the digest's table.
- On upstream sync: zero interaction. The research digest is fork-only.
- Re-test on rebase:
python ai/train/train_combined.py \
--netflix-root .workingdir2/netflix \
--konvid-parquet ai/data/konvid_vmaf_pairs.parquet \
--model-arch mlp_small --epochs 30 --batch-size 256 --lr 1e-3 \
--val-mode netflix-source-and-konvid-holdout \
--val-source Tennis --konvid-val-fraction 0.1 --seed 0 \
--out-dir runs/tiny_combined_canonical
# Expect: FoxBird PLCC ≈ 0.9936 ± 1e-3 (numerical-noise floor),
# mean PLCC ≥ 0.9983 across 9 Netflix clips.
0076 — Research-0024 vif/adm upstream-divergence digest (Strategy E doc)
- No ADR. Pure documentation digest; the divergence decisions it ratifies are already governed by ADR-0138 / 0139 / 0142 / 0143 (vif SIMD bit-exactness contract) and ADR-0024 (Netflix golden-data immutability). The digest itself fits the per-PR research-digest deliverable bar from ADR-0108.
- Upstream source: forward-looking; pre-emptively documents the fork's non-port of Netflix `4ad6e0ea` / `41d42c9e` / `bc744aa3` / `8c645ce3` (vif chain) and `4dcc2f7c` (float_adm chain). Strategy A on the `b949cebf` motion chain stays approved.
- Touches (additive only):
- `docs/research/0024-vif-upstream-divergence.md`: 5-strategy decision matrix + numerical-risk analysis for each chain.
- `core/src/feature/AGENTS.md`: two new "rebase-sensitive invariants" entries pinning the vif and adm divergences.
- `CHANGELOG.md` Unreleased § Changed.
- Invariants (rebase-relevant; these are the whole point):
- Do not port `4ad6e0ea` (vif runtime helpers) or `8c645ce3` (vif prescale options) verbatim. They replace the precomputed `vif_filter1d_table_s` table, whose frozen `const float` Gaussians make AVX2 == AVX-512 == NEON == scalar bit-for-bit. A future opt-in second-path port (Strategy C, runtime helpers behind `--vif-prescale != 1`) is allowed but must not touch the default code path.
- Do not port the `4dcc2f7c` float_adm options chain. The 12-parameter `compute_adm` signature change cascades through SIMD (avx2 / avx512 / neon) and 3 GPU backends (vulkan / cuda / sycl). The new `aim` feature has no fork-side golden values; defer until there is concrete user demand.
- Mirror bugfix `41d42c9e` is a separate decision. It must come paired with a `places=4` → `places=3` golden loosening per the ADR-0142 Netflix-authority precedent. It is not part of Strategy E; it is eligible for a focused single-purpose PR if any shipped model drifts beyond `places=3` because of the missing fix.
- The `b949cebf` motion chain port stays APPROVED under Strategy A (verbatim, float_motion side only). Float_motion has no precomputed-table investment to protect, and the existing fork integer_motion already has 6/9 of these options, so it is cheap to mirror onto float_motion.
- On upstream sync: zero conflict; pure additions to `research/` and `AGENTS.md`.
- Re-test on rebase: documentation-only PR; rendered markdown is the only verification surface.
# Re-run the diff scan that produced the digest (catches new
# upstream commits since 9dac0a59):
git fetch upstream && git log --pretty=format:'%h %s' \
upstream/master ^origin/master --since="2026-01-01" \
-- core/src/feature/{float_,integer_,}{vif,motion,adm,cambi}*.{c,h} \
core/src/feature/{vif,motion,adm,cambi}_options.h \
| head -30
# If new vif / adm option ports appear, update Research-0024 §"Same
# divergence test for motion + float_adm" before deciding to port.
0075 — Upstream 798409e3 + 314db130 ports (CUDA null-deref + remove all.c)
- No ADR. Pure upstream cherry-picks per the ADR-0108 carve-out ("pure upstream syncs and `port-upstream-commit` PRs are exempt").
- Upstream source:
- `798409e3` (Lawrence Curtis, 2026-04-20): "Fix null deref crash on prev_ref update in pure CUDA pipelines"
- `314db130` (Kyle Swanson, 2026-04-28): "libvmaf/feature: remove empty translation unit all.c"
- Touches (additive / removal only):
- `core/src/libvmaf.c`: adds an `if (ref && ref->ref)` guard before `vmaf_picture_ref(&vmaf->prev_ref, ref)` at the two threaded paths (`threaded_enqueue_one` line 1057 and `threaded_read_pictures_batch` line 1105). The main path at line 1597 already has the guard.
- `core/src/feature/all.c`: file deleted.
- `core/src/meson.build`: drops the `feature_src_dir + 'all.c'` line.
- `core/src/feature/offset.c`: updates the `// NOLINTNEXTLINE` comment to drop `all.c` from the list of per-feature consumers.
- `CHANGELOG.md` Unreleased § Fixed (`798409e3`) + § Changed (`314db130`).
- Invariants (rebase-relevant):
- The fork has THREE `prev_ref` update sites; all need the `if (ref && ref->ref)` guard. The main `vmaf_read_pictures` path already had it (via the `read_pictures_update_prev_ref` helper); the threaded paths (`#ifdef VMAF_BATCH_THREADING`) inherited the unguarded shape from upstream's old code. Future upstream rebases must preserve all three guards even if Netflix refactors the threaded paths.
- The `all.c` deletion is symbol-safe. All `compute_*` functions it forward-declared are reached via per-extractor TUs that `#include` the relevant `<feature>.h`. There is no external linker dependency on `all.c`'s symbols.
- On upstream sync: zero conflict expected; the fork now matches the upstream tip on these two surfaces.
- Re-test on rebase:
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false \
-Denable_vulkan=disabled
ninja -C build-cpu
meson test -C build-cpu # 37 tests, all pass.
0074 — Combined Netflix + KoNViD-1k trainer driver
- No ADR. Pure engineering follow-up; the architecture rationale is fully covered by ADR-0203 (training-prep architecture) and Research-0023 §5 (FoxBird-class outlier needs broader corpus).
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI trainer.
- Stacks on the KoNViD-1k loader bridge (PR #178 / rebase-note 0073). Rebase order: land 0073 first.
- Touches (additive only):
- `ai/train/train_combined.py`: concatenating trainer that reuses `_build_model` / `_train_loop` / `export_onnx` from `ai/train/train.py`.
- `ai/tests/test_train_combined_smoke.py`: 5 pytest cases (key splitter + `--epochs 0` paths; no libvmaf or real corpus required).
- `docs/ai/training.md`: the "Combining KoNViD with the Netflix corpus" subsection rewritten from "follow-up" to runnable.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- Reuse the canonical training-loop helpers. Don't fork `_build_model` / `_train_loop` / `export_onnx` into this file. Both trainers must share the model factory so a future change (e.g. adding `mlp_large`) lands in one place.
- KoNViD train/val splits hold out whole clip keys, not random frames. A frame-level split would let frames from the same clip leak across train/val and inflate PLCC by 5-10 pp (a well-known VQA pitfall; same reasoning as ADR-0203's Netflix 1-source-out split).
- Missing data falls back instead of erroring. Missing `--konvid-parquet` → Netflix-only path. Missing `--netflix-root` → KoNViD-only path. Both missing → initial-weights ONNX export + `rc=0`, so the smoke command always produces a deterministic artefact.
- On upstream sync: zero interaction; pure fork-local trainer.
- Re-test on rebase:
pytest ai/tests/test_train_combined_smoke.py -v
# Expect: 5 passed (under ~3 s, no libvmaf required).
python ai/train/train_combined.py --epochs 0 \
--netflix-root /tmp/missing --konvid-parquet /tmp/missing.parquet \
--out-dir /tmp/combined_smoke
# Expect: <out-dir>/mlp_small_combined_final.onnx written, rc=0.
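The clip-key hold-out can be sketched in three steps. Keys are de-duplicated, shuffled with a seeded RNG, and a fraction of whole clips becomes the validation set, so no clip contributes frames to both sides. This is an illustration, not the key splitter in `ai/train/train_combined.py`. The function name, the rounding rule, and the at-least-one-clip minimum are assumptions.

```python
import random


def split_by_clip_key(keys, val_fraction=0.1, seed=0):
    """Hold out whole clips: every frame of a clip lands on the same side.

    keys: per-frame clip key, e.g. ["clipA", "clipA", "clipB", ...].
    Returns (train_indices, val_indices).
    """
    unique_keys = sorted(set(keys))  # sorted first so the shuffle is seed-deterministic
    rng = random.Random(seed)
    rng.shuffle(unique_keys)
    n_val = round(len(unique_keys) * val_fraction)
    if unique_keys and val_fraction > 0:
        n_val = max(1, n_val)  # assumed: always validate on at least one clip
    val_keys = set(unique_keys[:n_val])
    train_idx = [i for i, k in enumerate(keys) if k not in val_keys]
    val_idx = [i for i, k in enumerate(keys) if k in val_keys]
    return train_idx, val_idx
```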
0073 — KoNViD-1k → VMAF-pair acquisition + loader bridge
- No ADR. Acquisition + loader pieces are pure additions; the methodology fits inside ADR-0203 / Research-0019.
- Upstream source: fork-local. KoNViD-1k integration is a fork-only training-data play.
- Touches (additive only):
- `ai/scripts/konvid_to_vmaf_pairs.py`: acquisition pipeline.
- `ai/train/konvid_pair_dataset.py`: `KoNViDPairDataset` class mirroring `NetflixFrameDataset`'s interface.
- `ai/tests/test_konvid_pair_dataset.py`: 5 pytest cases.
- `docs/ai/training.md`: new "C1 (KoNViD-1k corpus)" section.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- `KoNViDPairDataset` mirrors the `NetflixFrameDataset` shape: `feature_dim == 6`, and `numpy_arrays() → (X, y)` returns `(n_frames, 6)` + `(n_frames,)`. If `NetflixFrameDataset`'s feature order changes, mirror it here.
- The acquisition parquet schema is fixed. Required columns: `key`, `frame_index`, `vif_scale0..3`, `adm2`, `motion2`, `vmaf`. Add columns freely; do NOT rename or drop those.
- `ai/data/konvid_vmaf_pairs.parquet` and `$VMAF_TINY_AI_CACHE/konvid-1k/` stay gitignored. They regenerate from the raw KoNViD `.mp4` sources.
- On upstream sync: zero interaction.
- Re-test on rebase:
pytest ai/tests/test_konvid_pair_dataset.py -v
# Expect: 5 passed
python ai/scripts/konvid_to_vmaf_pairs.py --max-clips 5
# Expect: ~7 s wall, ai/data/konvid_vmaf_pairs.parquet with
# 5 unique keys × ~200 frames each.
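The fixed-schema invariant is easy to check mechanically. Below is a hypothetical helper that accepts any iterable of column names and allows extra columns, matching "add freely; do NOT rename / drop":

```python
REQUIRED_COLUMNS = (
    "key", "frame_index",
    "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3",
    "adm2", "motion2", "vmaf",
)


def check_pair_schema(column_names):
    """Raise ValueError if a required column is missing; extra columns are fine."""
    present = set(column_names)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        raise ValueError(f"KoNViD pairs parquet is missing required columns: {missing}")
```

With pyarrow installed, you can read the real file's column names without loading any data, e.g. `check_pair_schema(pyarrow.parquet.read_schema("ai/data/konvid_vmaf_pairs.parquet").names)`.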
0072 — Tiny-AI 3-arch LOSO eval harness + Research-0023
- No ADR. Methodology fits inside Research-0023; ADR-0203 already covers the training-prep architecture and the three-arch sweep concept.
- Research digest: `docs/research/0023-loso-3arch-results.md`.
- Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
- Touches (additive only):
- `ai/scripts/eval_loso_3arch.py`: new harness; reuses the `_load_session` + `_load_clip` + `CLIPS` helpers from `eval_loso_mlp_small.py` (PR #165).
- `docs/research/0023-loso-3arch-results.md`: methodology + per-fold tables for `mlp_small` / `mlp_medium` / `linear`.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- Reuse the PR #165 helpers. Don't fork the `_load_session` external-data workaround into a copy; both scripts must keep using the same import. If a follow-up re-exports the shipped baselines with a corrected `external_data.location`, both scripts deprecate the workaround simultaneously.
- `runs/` and `model/tiny/training_runs/` stay gitignored. The harness writes `runs/loso_eval/loso_3arch_eval.{json,md}`; the durable record is the table in Research-0023 §2 + the per-fold tables in §3. Regenerate via the loop in §6 of the digest.
- On upstream sync: zero interaction. Pure fork-local evaluation harness.
- Re-test on rebase:
python ai/scripts/eval_loso_3arch.py
diff <(jq -r '.archs.mlp_small.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9808)
diff <(jq -r '.archs.mlp_medium.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.9727)
diff <(jq -r '.archs.linear.aggregate.mean_plcc' runs/loso_eval/loso_3arch_eval.json) <(echo 0.3679)
# Expect: identical lines on a populated cache + identical fold ONNX.
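The `diff <(jq ...) <(echo ...)` checks compare text. They only pass if the harness writes each aggregate already rounded to four decimals; if that ever changes, a numeric comparison is more robust. Below is a hypothetical helper using the same JSON path and reference values as the commands above. It assumes four-decimal agreement is the intended criterion.

```python
# Reference values copied from the re-test commands above.
EXPECTED_MEAN_PLCC = {"mlp_small": 0.9808, "mlp_medium": 0.9727, "linear": 0.3679}


def loso_drift(report, places=4):
    """Return {arch: observed} for archs whose mean PLCC differs at `places` decimals."""
    drift = {}
    for arch, expected in EXPECTED_MEAN_PLCC.items():
        observed = report["archs"][arch]["aggregate"]["mean_plcc"]
        if round(observed, places) != round(expected, places):
            drift[arch] = observed
    return drift
```

Pass it the parsed report, e.g. `loso_drift(json.load(open("runs/loso_eval/loso_3arch_eval.json")))`. An empty dict means all three archs match.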
0071 — T7-16 ADM Vulkan/SYCL drift verified-resolved (doc close)
- No ADR. Verification-only close, sister of T7-15.
- Upstream source: fork-local. ADM cross-backend gate is a fork-only test surface; Netflix/vmaf has no Vulkan or SYCL backend.
- Touches (additive only):
- `docs/state.md`: new "Recently closed" row for T7-16.
- `.workingdir2/BACKLOG.md`: T7-16 row marked closed (local-only planning dossier; gitignored).
- `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
- `places=4` cross-backend ADM contract. Empirical `adm_scale2` max_abs_diff is now 1e-6 (print floor; ULP=0) on Vulkan device 0 (NVIDIA), Vulkan device 1 (Mesa anv on Arc), and SYCL device 0 (Arc). Residual `adm_scale1 ≈ 3.1e-5` and `adm2 ≈ 5e-6` on 1/48 frames pass `places=4` (5e-5 tolerance) but fail `places=5`. Hold the gate at `places=4`.
- No ADM kernel source change. The fix is environmental (NVCC + driver + SYCL runtime).
- On upstream sync: zero interaction.
- Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--feature adm --backend vulkan --device 0 --places 4 \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324
# Expect: 0/48 mismatches across all 5 ADM metrics.
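The `places=N` wording above can be read the way Python's `unittest` `assertAlmostEqual(first, second, places)` reads it. A pair passes when `round(abs(first - second), places) == 0`, which is roughly a 0.5 × 10^-N tolerance (5e-5 at `places=4`). The sketch below assumes `cross_backend_vif_diff.py` uses that rule. The assumption is consistent with the 5e-5 figure quoted above, though the script itself is not shown here.

```python
def mismatched_frames(ref_scores, test_scores, places=4):
    """Indices of frames whose |ref - test| does not round to 0 at `places` decimals.

    Same acceptance rule as unittest's assertAlmostEqual(first, second, places):
    round(abs(first - second), places) == 0, i.e. a tolerance of roughly
    0.5 * 10**-places (5e-5 at places=4, 5e-6 at places=5).
    """
    if len(ref_scores) != len(test_scores):
        raise ValueError("frame counts differ")
    return [
        i
        for i, (r, t) in enumerate(zip(ref_scores, test_scores))
        if round(abs(r - t), places) != 0
    ]
```

Applied to the `adm_scale1` residual quoted above, a 3.1e-5 difference passes at `places=4` and fails at `places=5`.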
0070 — T7-15 motion CUDA/SYCL drift verified-resolved (doc close)
- No ADR. Verification-only close; no code change in PR #172.
- Upstream source: fork-local. Cross-backend gate is a fork-only test surface; not in Netflix/vmaf.
- Touches (additive only):
- `docs/state.md`: "Recently closed" row for T7-15.
- `.workingdir2/BACKLOG.md`: T7-15 row marked closed (local-only planning dossier; gitignored).
- `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
- The `places=4` cross-backend gate stays at `places=4`. Empirical max_abs_diff is currently 0.0 (CUDA) or 1e-6 (SYCL / Vulkan, the JSON `%f` rounding floor). Tightening to `places=5` could be tempting, but the 1e-6 print floor would then make the SYCL + Vulkan rows fail. Hold at `places=4` until `--precision=max` is wired into the diff tool.
- No motion-kernel source change. PR #172 didn't modify `core/src/feature/cuda/integer_motion/*.cu` or `core/src/feature/sycl/integer_motion_sycl.cpp`. The fix is environmental (NVCC + driver), so the next CI run on a fresh image needs to be re-verified against the gate.
- On upstream sync: zero interaction.
- Re-test on rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --feature motion --backend cuda \
--places 4
# Expect: 0/48 mismatches, max_abs_diff = 0.0
0069 — libvmaf_vulkan.h installed under prefix (build bug)
- No ADR. Build-system bug fix; matches existing CUDA / SYCL install conditions.
- Upstream source: fork-local. The Vulkan backend is fork-only; Netflix/vmaf has no `libvmaf_vulkan.h`.
- Touches:
- `core/include/core/meson.build`: adds an `is_vulkan_enabled` gate that handles the `feature` option's `enabled` / `auto` states; appends `libvmaf_vulkan.h` to `platform_specific_headers` when active.
- `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
- The install rule mirrors the CUDA / SYCL pattern but uses the feature-option API. The `is_cuda_enabled = get_option('enable_cuda') == true` boolean idiom doesn't apply to `enable_vulkan` because that is a feature option, not a boolean. Use `.enabled() or .auto()`. Don't "simplify" to `== true`; that would silently drop the install in the `auto` state.
- Pairs with `ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch`, which probes for the header via `check_pkg_config libvmaf_vulkan "libvmaf >= 3.0.0" libvmaf/libvmaf_vulkan.h vmaf_vulkan_state_init_external`. Removing the install rule re-introduces Lawrence's 2026-04-28 symptom: FFmpeg silently drops the `libvmaf_vulkan` filter despite `--enable-libvmaf-vulkan`.
- On upstream sync: zero interaction; the Vulkan backend is fork-only.
- Re-test on rebase:
cd libvmaf
CC=icx CXX=icpx meson setup build -Denable_vulkan=enabled \
-Denable_cuda=true -Denable_sycl=true -Db_lto=false
ninja -C build
meson install -C build --destdir /tmp/libvmaf-install
ls /tmp/libvmaf-install/usr/local/include/libvmaf/libvmaf_vulkan.h
# Expect: file exists.
0066 — --backend cuda inverted-gpumask fix (CLI bug)
- No ADR. Bug fix; behaviour now matches the public-header `VmafConfiguration::gpumask` contract.
- Upstream source: fork-local. The `--backend` CLI selector was added by the fork (Netflix/vmaf has no exclusive-backend selector).
- Touches (additive + 1-line behavioural fix):
- `core/tools/cli_parse.c::parse_cli_args`: the `--backend cuda` branch sets `gpumask = 0` (was `gpumask = 1`).
- `core/test/test_cli_parse.c`: 5 new regression tests (`test_backend_{cpu,cuda_engages_cuda,cuda_preserves_explicit_gpumask,sycl,vulkan}`) plus a `run_aom_ctc_tests` / `run_backend_tests` helper split to keep `run_tests` under the function-size budget.
- `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
- `VmafConfiguration::gpumask` semantics: `if gpumask: disable CUDA`. `compute_fex_flags` in `src/libvmaf.c` routes CUDA only when `gpumask == 0`. Any code path that sets a non-zero `gpumask` to "request CUDA" silently disables it. The CLI's `--backend cuda` branch must set `gpumask = 0` and rely on `use_gpumask = true` to trigger `vmaf_cuda_state_init`. Do not "fix" this back to `gpumask = 1`; that is the bug being fixed.
- Explicit `--gpumask=N --backend cuda` preserves N. A user who passes `--gpumask=2` already has `use_gpumask = true`, so the `--backend cuda` branch's defaulting block (gated on `!settings->use_gpumask`) is skipped. The `test_backend_cuda_preserves_explicit_gpumask` regression locks this in.
- On upstream sync: zero interaction; `--backend` is fork-only.
- Re-test on rebase:
./build/test/test_cli_parse | grep -E 'backend_'
# Expect: 5 backend tests pass.
build/tools/vmaf -r REF -d DIS -w 576 -h 324 -p 420 -b 8 \
--model "path=model/vmaf_v0.6.1.json" --threads 1 \
--backend cuda --output cuda.json --json -q
python3 -c "import json; d=json.load(open('cuda.json')); \
assert len(d['frames'][0]['metrics']) == 12, 'CUDA not engaged'"
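A simplified Python model of the two gates described above may make the inversion easier to see. It is not the C code. The field names `gpumask` / `use_gpumask` come from this note; the class and function names are hypothetical, and the real `compute_fex_flags` routing takes more inputs into account.

```python
from dataclasses import dataclass


@dataclass
class CliSettings:
    """Hypothetical mirror of the two CLI fields involved."""
    gpumask: int = 0
    use_gpumask: bool = False


def apply_backend_cuda(s: CliSettings) -> CliSettings:
    # --backend cuda only fills in defaults when the user did not pass
    # --gpumask themselves (the block is gated on !use_gpumask).
    if not s.use_gpumask:
        s.gpumask = 0          # 0 means "do NOT disable CUDA"
        s.use_gpumask = True   # makes the CLI call vmaf_cuda_state_init
    return s


def cuda_dispatch_enabled(s: CliSettings) -> bool:
    # CUDA state exists only when use_gpumask is set, and feature routing
    # sends work to CUDA only when gpumask == 0 (a non-zero mask disables it).
    return s.use_gpumask and s.gpumask == 0
```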
0067 — Tiny-AI PTQ accuracy across Execution Providers (T5-3e)
- No ADR. Investigation/measurement PR; ADR-0129 already governs the PTQ workstream. Findings update `docs/research/0006-tinyai-ptq-accuracy-targets.md` §"GPU-EP quantisation"; that section was previously a deferred open question and is now the empirical landing spot.
- Research digest: same file (Research-0006).
- Upstream source: fork-local. Netflix/vmaf does not ship a PTQ harness or any tiny-AI ONNX path.
- Touches (additive only):
- `ai/scripts/measure_quant_drop_per_ep.py`: new sibling of `measure_quant_drop.py`. CPU + CUDA via ORT; Arc / OpenVINO-CPU via the native `openvino` Python runtime (no `onnxruntime-openvino`, because no cp314 wheel exists). Reuses the `_load_session` rename workaround from PR #165, plus a `value_info`-strip fix so dynamic PTQ doesn't choke on the shipped MLP ONNX.
- `docs/ai/quant-eps.md`: new user doc; linked from `docs/ai/index.md`.
- `docs/research/0006-tinyai-ptq-accuracy-targets.md`: refreshed header, replaced the "GPU-EP open question" with the measurement table, fixed pre-existing MD040/MD060 lints surfaced on the touched file.
- `docs/ai/index.md`: added the quant-eps row, rewrapped to 80 columns.
- `CHANGELOG.md` Unreleased § Changed.
- Invariants (rebase-relevant):
- `measure_quant_drop.py` (the CI gate) is unchanged. The new script is purely additive. Any rebase that conflates the two scripts must keep the CI gate CPU-only: Arc int8 is broken, so a per-EP gate would red-light every PR.
- The `value_info` strip is required for `vmaf_tiny_v1*` dynamic PTQ. The shipped MLP ONNX files duplicate weight tensors in `value_info`, which makes `quantize_dynamic` raise `Inferred shape and existing shape differ`. The fix is in `_save_inlined`. Don't remove it during a refactor unless the underlying ONNX is regenerated.
- CUDA-12 ABI shim. ORT-GPU 1.25 wheels link `libcublasLt.so.12` even on CUDA-13 hosts. The reproduction recipe pins the `nvidia-*-cu12` wheels and prepends them to `LD_LIBRARY_PATH`. If a future ORT wheel drops the cu12 ABI, the shim can be cut; the script tolerates either because it doesn't import any CUDA symbol itself.
- On upstream sync: zero interaction; entirely fork-local.
- Re-test on rebase:
SP=$VIRTUAL_ENV/lib/python3.14/site-packages/nvidia
export LD_LIBRARY_PATH="$SP/cublas/lib:$SP/cudnn/lib:$SP/cuda_nvrtc/lib:$SP/cuda_runtime/lib:$SP/cufft/lib:$SP/curand/lib:$SP/cusolver/lib:$SP/cusparse/lib:$SP/cuda_cupti/lib:$SP/nvtx/lib:$SP/nvjitlink/lib"
python ai/scripts/measure_quant_drop_per_ep.py \
--eps cpu cuda openvino \
--extra-fp32 vmaf_tiny_v1.onnx vmaf_tiny_v1_medium.onnx \
--out runs/quant-eps-$(date +%Y-%m-%d)
# Expected: CPU + CUDA PASS (drop ≤ 1.2e-4); OpenVINO Arc ERR
# (compile failure for Conv-int8) or NaN (MatMul-int8) until a
# newer intel_gpu plugin lands.
0065 — testdata/bench_all.sh correct backend-engagement flags
- No ADR. Bug fix; no behavioural surface change beyond "the bench actually engages the backends it claims to now."
- Upstream source: fork-local. `testdata/bench_all.sh` is a fork-only bench harness; not in Netflix/vmaf.
- Touches (additive only):
- `testdata/bench_all.sh`: switched the per-row flag pattern from the disable-only singletons (`--no_sycl` for "CUDA", etc.) to the correct engagement form (`--gpumask=0 --no_sycl --no_vulkan` for CUDA, `--sycl_device=0 --no_cuda --no_vulkan` for SYCL, `--vulkan_device=0 --no_cuda --no_sycl` for Vulkan, and `--no_cuda --no_sycl --no_vulkan` for CPU). Added a 4th column (Vulkan) to the comparator. Honours `$VMAF_BIN` for the binary path and `$VMAF_ONEAPI_SETVARS` for the oneAPI install location.
- `CHANGELOG.md` Unreleased § Fixed.
- Invariants (rebase-relevant):
- Disable-only singletons don't engage a backend. `--no_sycl` alone leaves CUDA available but unrequested; `--no_cuda` alone leaves SYCL available but unrequested. The CLI inits CUDA only when `c.use_gpumask` is set; SYCL only when `c.sycl_device >= 0` or `c.use_gpumask`; Vulkan only when `c.vulkan_device >= 0`. Any change that drops one of the per-row flags those gates depend on will re-introduce the silent CPU fallback. Verify after a rebase by inspecting the JSON `frames[0].metrics` key counts (CPU 14-15, CUDA 11-12, Vulkan ~34); see `libvmaf/AGENTS.md` §"Backend-engagement foot-guns".
- `gpumask` semantics are inverted from intuition: `gpumask=0` enables CUDA dispatch; `gpumask=1` disables it. The per-row CUDA flag is `--gpumask=0`, not `--gpumask=1`. Don't "fix" it to `--gpumask=1` for symmetry with `sycl_device` / `vulkan_device`; that is the bug being fixed (parallel to PR #170).
- On upstream sync: zero interaction; `testdata/bench_all.sh` is fork-only.
- Re-test on rebase:
bash testdata/bench_all.sh # smoke
# Verify each row's JSON keys match the expected per-backend count:
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cpu.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_cuda.json
jq '.frames[0].metrics | keys | length' testdata/bbb/results/t1_vulkan.json
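The `jq` checks above can also be scripted, so a rebase fails loudly when a row silently falls back to CPU. The helper below is hypothetical. The CPU and CUDA windows are the counts quoted in the invariants. The note gives Vulkan only as "~34", so the 32-36 window is an assumption.

```python
import json
from pathlib import Path

# Expected key counts in frames[0].metrics per backend row.
EXPECTED_KEY_COUNTS = {
    "cpu": range(14, 16),     # 14-15
    "cuda": range(11, 13),    # 11-12
    "vulkan": range(32, 37),  # "~34"; the +/-2 window is an assumption
}


def metric_key_count(result):
    return len(result["frames"][0]["metrics"])


def check_engagement(results_by_backend):
    """Return {backend: count} for rows whose key count falls outside the window."""
    return {
        backend: metric_key_count(result)
        for backend, result in results_by_backend.items()
        if metric_key_count(result) not in EXPECTED_KEY_COUNTS[backend]
    }


def load_results(results_dir="testdata/bbb/results"):
    """Load the t1_<backend>.json files written by bench_all.sh."""
    return {
        backend: json.loads((Path(results_dir) / f"t1_{backend}.json").read_text())
        for backend in EXPECTED_KEY_COUNTS
    }
```

After running the bench, `check_engagement(load_results())` returns an empty dict when every row engaged its backend.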
0063 — Tiny-AI LOSO eval harness for mlp_small
- No ADR. The methodology fits inside Research Digest 0022; ADR-0203 already covers the training-prep architecture.
- Research digest: `docs/research/0022-loso-mlp-small-results.md`.
- Upstream source: fork-local. Netflix/vmaf has no LOSO eval surface.
- Touches (additive only):
- `ai/scripts/eval_loso_mlp_small.py`: new evaluation harness.
- `docs/ai/loso-eval.md`: usage doc.
- `docs/research/0022-loso-mlp-small-results.md`: methodology + results.
- `CHANGELOG.md` Unreleased § Added.
- Invariants (rebase-relevant):
- `_load_session` workaround for renamed-baseline ONNX. The shipped baselines `model/tiny/vmaf_tiny_v1*.onnx` reference their pre-rename `external_data.location` values. The workaround in `_load_session` rewrites the entries before handing the proto to ORT. Removing the workaround breaks the baseline phase. The proper fix (re-export with matching names) is tracked as a follow-up; until then this code path is load-bearing.
- `runs/` and `model/tiny/training_runs/` stay gitignored. The harness writes to `runs/loso_eval/` by default; do NOT promote any of those outputs into the tree. The 9 fold ONNX files and the per-clip JSON cache regenerate from the corpus + trainer + libvmaf CLI.
- On upstream sync: zero interaction. Pure fork-local evaluation harness.
- Re-test on rebase:
python ai/scripts/eval_loso_mlp_small.py
diff <(jq -r '.loso_aggregate.mean_plcc' runs/loso_eval/loso_mlp_small_eval.json) <(echo 0.9808)
# Expect: identical line on a populated cache + identical fold ONNX.
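The `diff` against a literal `0.9808` only passes if `jq` prints exactly those four decimals. The sketch below rounds both sides to four places before comparing. It assumes `mean_plcc` is a plain JSON number and that `awk` is available. The helper name `plcc_matches` is hypothetical.

```bash
# Hypothetical helper: compare two PLCC values after rounding both to four
# decimal places, so 0.98081234 from the JSON still matches the 0.9808 baseline.
plcc_matches() {
  awk -v got="$1" -v want="$2" 'BEGIN {
    # Empty or non-numeric input becomes 0 and fails against a real baseline.
    exit !(sprintf("%.4f", got) == sprintf("%.4f", want))
  }'
}
```

Usage: `plcc_matches "$(jq -r '.loso_aggregate.mean_plcc' runs/loso_eval/loso_mlp_small_eval.json)" 0.9808 && echo OK`.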
0064 — Section-A audit: 9 backlog rows + ADR cross-links
- No ADR. Process / docs PR; rows trace back to the individually-cited ADRs / research digests in their own References columns.
- Decision dossier: `.workingdir2/decisions/section-a-decisions-2026-04-28.md`.
- Source audit: `docs/backlog-audit-2026-04-28.md`.
- Upstream source: fork-local. Pure backlog-hygiene PR; no Netflix code touched.
- Touches (additive only):
  - `.workingdir2/BACKLOG.md` — 8 new rows (T3-17, T3-18, T5-3e, T5-4, T7-35, T7-36, T7-37, T7-38) plus the existing T6-1a row, extended with the bisect-cache fixture sub-bullet (9 rows in total).
  - `docs/research/0006-tinyai-ptq-accuracy-targets.md` — drops the "defer until first user" framing on the GPU-EP quantisation open question per user direction; cross-links T5-3e.
  - `docs/research/0020-cambi-gpu-strategies.md` — the v2 follow-up section now cites T7-36 as the gate for opening the v2 row.
  - `docs/adr/0205-cambi-gpu-feasibility.md` — the Decision section's "follow-up integration PR" now cites T7-36.
  - `CHANGELOG.md` Unreleased § Changed.
- Invariants (rebase-relevant): none; pure backlog text. Rebase-conflict risk is limited to the same `BACKLOG.md` table rows that any future row addition would touch, so it is trivial to re-resolve.
- On upstream sync: zero interaction.
- Re-test on rebase: none (docs-only).
0062 — ssimulacra2 CUDA + SYCL twins (ADR-0206)
- ADR: ADR-0206.
- Upstream source: fork-local. Netflix/vmaf has no SSIMULACRA 2 GPU implementation; this PR adds the CUDA + SYCL twins of the fork's ADR-0201 Vulkan kernel.
- Touches (additive + small wiring edits):
  - `docs/adr/0206-ssimulacra2-cuda-sycl.md` and the index row in `docs/adr/README.md`.
  - `core/src/feature/cuda/ssimulacra2_cuda.{c,h}` — new CUDA dispatch.
  - `core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu` and `ssimulacra2_mul.cu` — new CUDA fatbins.
  - `core/src/feature/sycl/ssimulacra2_sycl.cpp` — new SYCL extractor.
  - `core/src/feature/feature_extractor.c` — two new extern declarations + two new entries in `feature_extractor_list[]`.
  - `core/src/meson.build` — adds `ssimulacra2_blur` + `ssimulacra2_mul` to `cuda_cu_sources`. It also introduces the `cuda_cu_extra_flags` map with an `ssimulacra2_blur` entry (or extends it, if PR #157 / ADR-0202 landed first), threads `per_kernel_flags` into the fatbin custom target, and lists the two new C / C++ TUs.
  - `core/src/cuda/AGENTS.md` and `core/src/sycl/AGENTS.md` — rebase-invariant notes for the per-kernel `--fmad=false` flag and the `-fp-model=precise` SYCL build flag.
  - `docs/backends/cuda/overview.md`, `docs/backends/sycl/overview.md`, `docs/metrics/features.md` — coverage-matrix updates.
  - `CHANGELOG.md` Unreleased § Added.
- Invariants (load-bearing on rebase):
  - Per-kernel `--fmad=false` for `ssimulacra2_blur`. The IIR's `o = n2 * sum - d1 * prev1 - prev2` must NOT fuse into FMAs. Without the flag, the recursive Gaussian's per-step rounding compounds across the 6-scale pyramid past `places=4`.
  - `-fp-model=precise` on the SYCL feature build line. Removing it drifts `ssimulacra2_sycl` past `places=2` through the IIR.
  - Hybrid host/GPU split mirrors Vulkan. The host runs YUV→RGB, XYB, downsampling, and the SSIM/EdgeDiff combine in double; the GPU runs only mul + IIR blur. Any future PR that ports XYB or YUV→RGB onto the GPU MUST land alongside an updated ADR-0206 and re-validate `places=4` on every Netflix CPU pair.
  - CUDA fex uses `.extract` (synchronous), not `.submit`/`.collect`. Per-frame raw YUV is D2H-copied from `picture_cuda`'s device-side `VmafPicture.data[]` into pinned host scratch via `cuMemcpy2DAsync`. Skipping the copy segfaults: direct host reads on a `CUdeviceptr` are the failure mode the prior agent's WIP hit.
- On upstream sync: zero interaction with Netflix. The GPU coverage matrix for `ssimulacra2` is wholly fork-local.
- Re-test on rebase:
meson setup build_cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary ./build_cuda/tools/vmaf \
--feature ssimulacra2 --backend cuda --places 4 \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8
# Expect: 0/48 mismatches, max_abs_diff ~1e-6.
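Sometimes a single score needs eyeballing outside the gate script. In that case, approximate a `places=4` comparison in shell, in the spirit of Python's `unittest.assertAlmostEqual`: two scores agree when their absolute difference is below 0.5e-4. This is a sketch, not the comparison that `cross_backend_vif_diff.py` actually performs. The name `within_places` is hypothetical.

```bash
# Hypothetical helper: succeed when |a - b| < 0.5 * 10^-places (default 4),
# which approximates assertAlmostEqual(a, b, places=places).
within_places() {
  awk -v a="$1" -v b="$2" -v p="${3:-4}" 'BEGIN {
    d = a - b
    if (d < 0) d = -d
    exit !(d < 0.5 * 10 ^ (-p))
  }'
}
```

Usage: `within_places "$cpu_score" "$cuda_score"` returns 0 when the pair agrees to four places. Pass a third argument for a looser check, for example `2` for the SYCL `places=2` bound.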
0061 — cambi GPU feasibility spike (ADR-0205)
- ADR: ADR-0205.
- Research digest: `docs/research/0020-cambi-gpu-strategies.md`.
- Upstream source: fork-local. Netflix/vmaf has no Vulkan backend.
- Touches (additive only):
  - `docs/adr/0205-cambi-gpu-feasibility.md`, `docs/research/0020-cambi-gpu-strategies.md`, and the `docs/adr/README.md` index row.
  - `core/src/feature/vulkan/cambi_vulkan.c` — new dormant scaffold (not yet in `vulkan_sources`, not yet registered).
  - `core/src/feature/vulkan/shaders/cambi_{derivative,decimate,filter_mode}.comp` — new reference GLSL shaders, not yet in the build's `shaders` list.
  - `core/src/feature/AGENTS.md` invariants + `CHANGELOG.md` bullet.
- Invariants (rebase-relevant):
  - Hybrid host/GPU port by decision. Suppose Netflix upstream tightens the c-value formula or the histogram-update protocol. Then the host residual call site in the eventual `cambi_vulkan.c::cambi_vulkan_extract` must be updated alongside `cambi.c::calculate_c_values`, because the same code is reused. Do NOT translate the c-values phase to the GPU during any upstream-port PR; that optimisation belongs to the deferred v2 strategy-III PR.
  - Scaffolds stay dormant in the spike PR. The `cambi_vulkan.c` extractor returns `-ENOSYS` from `cambi_vulkan_init_stub` until the integration follow-up wires it in. Do NOT register `vmaf_fex_cambi_vulkan_scaffold` in `feature_extractor.c`'s list.
  - Shaders stay out of the build's shader list. Adding them to `core/src/vulkan/meson.build`'s `vulkan_shaders` list before the integration PR produces orphaned `*_spv.h` headers. Leave them alone in this spike PR.
- On upstream sync: zero interaction. `cambi.c` itself is upstream-mirrored, so Netflix changes flow through `port-upstream-commit`; only the integration PR's host residual call site needs paired attention.
- Re-test on rebase:
```bash
meson setup build -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
```
0059 — Tiny-AI Netflix corpus training prep (ADR-0203)
- ADR: ADR-0203.
- Upstream source: fork-local. Netflix/vmaf has no equivalent training surface.
- Touches:
  - `ai/data/` — Netflix loader, libvmaf-CLI feature extractor, distillation scoring.
  - `ai/train/` — PyTorch dataset, eval harness, Lightning-style training entry point.
  - `ai/scripts/run_training.sh` — convenience wrapper.
  - `ai/tests/` — five new pytest files: four test modules (`test_netflix_loader.py`, `test_dataset.py`, `test_eval.py`, `test_train_smoke.py`) plus `conftest.py`.
  - `docs/ai/training.md` — new "C1 (Netflix corpus)" section; existing sections untouched.
  - `ai/AGENTS.md` — invariants section added.
- Invariants (load-bearing):
  - The filename ladder regex is fork-specific: `<source>_<quality>_<height>_<bitrate>.yuv` (dis) + `<source>_<fps>fps.yuv` (ref). Upstream may publish a different naming convention later; do NOT merge the two. Keep this loader scoped to the Netflix corpus and add a sibling loader for any upstream alternative.
  - The per-clip cache schema is consumed by both the dataset and any downstream tooling. The schema is `{features:{feature_names, per_frame, n_frames}, scores:{per_frame, pooled}}`. Any change must invalidate `$VMAF_TINY_AI_CACHE` (delete or version-tag the directory).
  - The smoke command stays runnable without a built `vmaf` binary. The `_make_zero_payload` helper in `ai.train.dataset` injects a fake payload for `--epochs 0`, so CI gates don't drag a libvmaf build into the Python test surface.
  - The YUV size probe never silently guesses. `probe_yuv_dims` either matches the 1920x1080 default, returns ffprobe's answer, or raises. Tests pass `assume_dims=(16, 16)` explicitly for synthetic fixtures.
- On upstream sync: no interaction with upstream. The `ai/` subtree is wholly fork-local.
- Re-test on rebase:
python -m pytest ai/tests/test_netflix_loader.py \
ai/tests/test_dataset.py ai/tests/test_eval.py \
ai/tests/test_train_smoke.py -v
python ai/train/train.py --epochs 0 --data-root /tmp/mock_corpus \
--assume-dims 16x16 --val-source BetaSrc --out-dir /tmp/out
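A small classifier can illustrate the ladder convention. This sketch makes two hypothetical assumptions: `<quality>`, `<height>`, `<bitrate>` and `<fps>` are plain integers, and `<source>` may itself contain underscores. The fork's real regex lives in the Python loader under `ai/data/` and may be stricter. `classify_yuv` is a hypothetical name.

```bash
# Hypothetical helper: classify a file name as a reference (<source>_<fps>fps.yuv)
# or a distorted ladder rung (<source>_<quality>_<height>_<bitrate>.yuv).
# The greedy source group keeps underscores inside <source> (e.g. "Src_01").
classify_yuv() {
  local name out
  name=${1##*/}   # strip any directory prefix
  out=$(printf '%s\n' "$name" |
    sed -n 's/^\(..*\)_\([0-9][0-9]*\)fps\.yuv$/ref source=\1 fps=\2/p')
  if [ -z "$out" ]; then
    out=$(printf '%s\n' "$name" |
      sed -n 's/^\(..*\)_\([0-9][0-9]*\)_\([0-9][0-9]*\)_\([0-9][0-9]*\)\.yuv$/dis source=\1 quality=\2 height=\3 bitrate=\4/p')
  fi
  if [ -z "$out" ]; then
    echo unknown
    return 1
  fi
  echo "$out"
}
```

Checking the reference pattern first matters: a name such as `Clip_25fps.yuv` has only one numeric group, so it never reaches the three-group distorted pattern anyway.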
0073 — Tiny-AI QAT trainer + first per-model QAT pass (T5-4)
- ADR: ADR-0207 (design), ADR-0208 (per-model impl).
- Touches: `ai/train/qat.py` (new), `ai/scripts/qat_train.py` (rewrite of the `NotImplementedError` scaffold), `ai/configs/learned_filter_v1_qat.yaml` (new), `ai/tests/test_qat_smoke.py` (new), `docs/ai/quantization.md` (QAT tier added). All paths are wholly fork-local; no upstream Netflix/vmaf interaction.
- Invariants:
  - The two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) is load-bearing. On PyTorch 2.11, both exporters refuse to consume `convert_fx` output: the legacy ONNX exporter (`quantized::conv2d`) and the new TorchDynamo exporter (`Conv2dPackedParamsBase.__obj_flatten__`). The bridge (state-dict diff to a fresh fp32 module + ORT static-quantize) is the only path that yields a QDQ ONNX. Do NOT collapse it to a single-step `convert_fx → torch.onnx.export` until both PyTorch issues are fixed; re-check both exporters on each PyTorch upgrade.
  - State-dict transfer matches by submodule name + shape. `_copy_qat_weights_into_fp32` walks the `fp32_state` keys, finds the same key in the FX-prepared module, and copies the tensor. Tiny-AI models today have stable submodule names (`entry`, `body.*`, `exit`). An architecture that uses a top-level `nn.Sequential` would break this, because `prepare_qat_fx` renames Sequential children to numeric indices. The `RuntimeError("0 tensors copied")` guard catches that silent failure mode.
  - FX preparation runs on CPU. PyTorch 2.11's FX symbolic tracer is flaky on CUDA buffers. The trainer therefore migrates the model to CPU before `prepare_qat_fx` and back to the accelerator for the fine-tune phase. The smoke test deliberately exercises the CPU path so this stays covered.
  - The `torch.ao.quantization` deprecation will hard-fail in PyTorch 2.10. The migration target is `torchao.quantization.pt2e` (`prepare_pt2e` / `convert_pt2e`). The two-step pipeline is mostly pt2e-compatible; only the FX-prep call changes.
- On upstream sync: no interaction with upstream. The `ai/` subtree is fully fork-local.
- Re-test on rebase:
python -m pytest ai/tests/test_qat_smoke.py -v
python ai/scripts/qat_train.py \
--config ai/configs/learned_filter_v1_qat.yaml \
--output /tmp/qat_smoke.int8.onnx --smoke
0074 — GPU-parity matrix CI gate (T6-8 / ADR-0214)
- Touched surfaces (fork-local): `scripts/ci/cross_backend_parity_gate.py` (new), `.github/workflows/tests-and-quality-gates.yml` (new `vulkan-parity-matrix-gate` job), `docs/development/cross-backend-gate.md` (new), `docs/backends/index.md` (cross-backend section), `libvmaf/AGENTS.md` (rebase-sensitive invariant note).
- Why this matters on rebase: the CI lane and the matrix-gate script are entirely fork-local. Upstream Netflix/vmaf has no comparable gate, so rebase conflicts are restricted to the CI workflow file when upstream rearranges its own jobs. The gate's Python script lives outside `core/src/`, so the upstream-sync path doesn't see it.
- Invariants the gate enforces:
  - Per-feature absolute tolerance is declared in one place (`FEATURE_TOLERANCE` in `scripts/ci/cross_backend_parity_gate.py`). Tightening a tolerance requires a measurement-driven follow-up ADR; loosening one requires a justification ADR (CLAUDE.md §12 r1).
  - The legacy single-feature gate `scripts/ci/cross_backend_vif_diff.py` stays for one release cycle. Sister PRs in this session add to it; the T6-8b cleanup PR deletes it once the matrix gate has soaked.
  - CUDA / SYCL / hardware-Vulkan are advisory until a self-hosted runner is registered. The script supports them via `--backends`; flipping the CI lane to required is a follow-up wiring change, not a code change.
- On upstream sync: no interaction with upstream `tests-and-quality-gates.yml` (the gate job is fork-added); rebase conflicts are limited to insertion order in the workflow file.
- Re-test on rebase:
cd libvmaf && meson setup build \
-Denable_cuda=false -Denable_sycl=false \
-Denable_vulkan=enabled -Denable_float=true \
--buildtype=release && ninja -C build
cd ..
python3 scripts/ci/cross_backend_parity_gate.py \
--vmaf-binary core/build/tools/vmaf \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --backends cpu vulkan \
--json-out /tmp/parity.json --md-out /tmp/parity.md
0220 — SYCL feature kernels are unconditionally fp64-free (T7-17)
- Touches: `core/src/sycl/common.cpp` (init log line), `core/src/sycl/AGENTS.md` (new invariant row), and all SYCL feature kernels under `core/src/feature/sycl/` (no diff today, but the contract pins their shape going forward).
- Invariant: every SYCL feature-kernel lambda captures and operates on `float` / integer types only. That means no `double` operand inside a `parallel_for` body, no `sycl::reduction<double>`, and no `sycl::plus<double>`. A single fp64 instruction in the TU's SPIR-V module makes the Level Zero runtime reject the entire module on Intel Arc A-series and other fp64-less devices, even when the offending kernel is never submitted. Host-side `double` remains fine (in `extract`/`flush` post-processing, score aggregation, log10 normalisation). Concrete patterns in tree:
  - ADM gain limiting via int64 Q31 (`gain_limit_to_q31` + `launch_decouple_csf<false>` in `integer_adm_sycl.cpp`).
  - VIF gain limiting via fp32 `sycl::fmin`.
  - CIEDE / SSIM accumulators via `sycl::reduction<int64_t>` / `sycl::plus<int64_t>`.
- On upstream sync: Netflix/vmaf has no SYCL backend upstream, so conflicts cannot enter via `git merge`. The risk is a fork-local cherry-pick (e.g. a SYCL twin of a new CUDA kernel) bringing a `double` into a kernel lambda. Audit the lambda capture list and any `sycl::reduce*` calls against this invariant before merging.
- Re-test on rebase:
# Build SYCL backend
CC=icx CXX=icpx meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
# On an fp64-less device (e.g. Intel Arc A380), confirm the
# init log line is INFO-level and reads "device lacks native
# fp64 — kernels already use fp32 + int64 paths, no emulation
# overhead". The SYCL kernels must launch successfully (no
# SPIR-V module rejection from the Level Zero runtime).
build-sycl/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 --backend sycl \
--feature integer_vif --feature integer_adm \
--output /tmp/sycl-fp64less.json --json
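Before merging a SYCL cherry-pick, you can start the lambda audit with a coarse text scan for double-typed reductions. This is a sketch with clear limits. `grep` cannot distinguish device lambdas from host code, and it will not see a bare `double` local inside a `parallel_for` body, so a clean result does not replace reading the lambdas. `audit_sycl_fp64` is a hypothetical name.

```bash
# Hypothetical helper: flag double-typed SYCL reductions/accumulators.
# Exit status: 0 = no hits, 1 = hits printed to stderr, 2 = grep error.
audit_sycl_fp64() {
  local hits rc
  hits=$(grep -nE '(reduction|plus|minimum|maximum)<double>' -- "$@")
  rc=$?
  if [ "$rc" -eq 0 ]; then
    printf 'possible fp64 in SYCL kernel code:\n%s\n' "$hits" >&2
    return 1
  fi
  [ "$rc" -eq 1 ] || return 2   # grep status 1 means "no match", i.e. clean
}
```

Usage: `audit_sycl_fp64 core/src/feature/sycl/*.cpp`. Every hit still needs a manual look, because host-side `double` in `extract`/`flush` is allowed.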
0091 — T6-9 model registry schema + --tiny-model-verify (ADR-0211)
- No rebase impact: 100% fork-local surface. The registry (`model/tiny/registry.json`), its JSON Schema (`model/tiny/registry.schema.json`), the `--tiny-model-verify` CLI flag, and the `vmaf_dnn_verify_signature()` C entry point are entirely fork-local; none of these paths exist in upstream Netflix/vmaf. They are listed here for completeness, so a future `/sync-upstream` run sees that the surface area was acknowledged.
- Touches (additive only): `model/tiny/registry.json`, `model/tiny/registry.schema.json`, `ai/scripts/validate_model_registry.py`, `core/src/dnn/model_loader.{c,h}` (added `vmaf_dnn_verify_signature()`), `core/include/libvmaf/dnn.h` (public declaration), `core/tools/cli_parse.{c,h}` (`ARG_TINY_MODEL_VERIFY` + `tiny_model_verify` field), `core/tools/vmaf.c` (call site), `core/test/dnn/test_tiny_model_verify.c`, `python/test/model_registry_schema_test.py`, `docs/ai/model-registry.md`, `docs/ai/inference.md`, `docs/ai/security.md`, `docs/adr/0209-...md`, `docs/adr/README.md` (index row), `CHANGELOG.md`, `core/src/dnn/AGENTS.md`.
- Invariants (rebase-relevant):
  - The schema is the contract. New registry fields land in `registry.schema.json` first, then in `registry.json`, then in any consumers (the C-side parser, the Python validator, the MCP). The reverse order causes a mismatch.
  - `schema_version` is bounded. The schema accepts only `{0, 1}`; bump the enum and the loader's check together when adding `2`.
  - The banned-function rule applies. The `cosign` invocation uses `posix_spawnp(3p)` with an explicit argv array. Do not replace it with `system(3)` / `popen(3)`; both shell-parse the command and would re-introduce injection risk.
  - Bundle-file absence is fail-closed. When `sigstore_bundle` points at a not-yet-existing file (pre-release state), `vmaf_dnn_verify_signature()` returns `-ENOENT`. The CLI surfaces this as a load failure; do not "soften" it to a warning without an explicit ADR.
- Re-test on rebase:
python3 ai/scripts/validate_model_registry.py
python3 -m pytest python/test/model_registry_schema_test.py -v
meson test -C build-cpu --suite=dnn
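The banned-function rule lends itself to a quick rebase-time guard over the loader source. This sketch assumes the `cosign` spawn stays in `core/src/dnn/model_loader.c`. The pattern is purely textual, so calls inside comments or strings also trip it. `no_shell_spawn` is a hypothetical name.

```bash
# Hypothetical helper: fail if any given C source calls system() or popen().
# The leading class rejects identifier characters, so posix_spawnp() and
# names such as my_system() are not flagged.
no_shell_spawn() {
  local rc
  grep -nE '(^|[^A-Za-z0-9_])(system|popen)[[:space:]]*\(' -- "$@" >&2
  rc=$?
  case $rc in
    0) echo "system()/popen() found; use posix_spawnp() with an explicit argv" >&2
       return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

Usage: `no_shell_spawn core/src/dnn/model_loader.c`.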
0074 — HIP (AMD ROCm) backend scaffold (T7-10)
- ADR: ADR-0212.
- Upstream source: fork-local. The HIP backend is fork-only; Netflix/vmaf has no `libvmaf_hip.h` and no `enable_hip` meson option.
- Touches:
  - `core/include/libvmaf/libvmaf_hip.h` (new).
  - `core/include/core/meson.build` — adds the `is_hip_enabled` install gate, mirroring the `is_cuda_enabled` / `is_sycl_enabled` boolean idioms.
  - `core/meson_options.txt` — new `enable_hip` boolean option (default false).
  - `core/src/meson.build` — new `is_hip_enabled` flag and conditional `subdir('hip')`. It also threads `hip_sources` + `hip_deps` through `libvmaf_feature_static_lib` (alongside the existing CUDA / SYCL / Vulkan aggregations) and the top-level `library('vmaf', ...)` `dependencies` list.
  - `core/src/hip/` (new directory: `common.{c,h}`, `picture_hip.{c,h}`, `dispatch_strategy.{c,h}`, `meson.build`).
  - `core/src/feature/hip/` (new directory: `adm_hip.c`, `vif_hip.c`, `motion_hip.c`).
  - `core/test/test_hip_smoke.c` (new).
  - `core/test/meson.build` — registers the smoke test under `if get_option('enable_hip') == true`.
  - `.github/workflows/libvmaf-build-matrix.yml` — adds the `Build — Ubuntu HIP (T7-10 scaffold)` row.
  - `docs/backends/hip/overview.md` (new), `docs/backends/index.md` (planned → scaffold row), `docs/research/0033-hip-applicability.md` (new), `docs/adr/0212-hip-backend-scaffold.md` (new), `docs/adr/README.md` (new index row).
  - `libvmaf/AGENTS.md` — new "HIP backend scaffold contract" rebase-sensitive invariant entry.
  - `CHANGELOG.md` — Unreleased § Added.
- Invariants (rebase-relevant):
  - `enable_hip` is a `boolean` option, not a `feature`. It mirrors `enable_cuda` / `enable_sycl`. Do not "harmonise" it with `enable_vulkan`'s `feature`/`disabled` form without an ADR amendment per ADR-0212 § "Decision".
  - Public C-API entry points return `-ENOSYS` for the scaffold. The smoke test `core/test/test_hip_smoke.c` pins this. A rebase might "succeed" by accidentally enabling a code path (e.g. a refactor that early-returns 0 from `vmaf_hip_state_init`). That breaks the smoke test and the runtime PR's contract baseline.
  - `hip_sources` is added to `libvmaf_feature_static_lib`, NOT directly to the top-level `library('vmaf', ...)`. The static lib is extracted into libvmaf via `objects: [..., libvmaf_feature_static_lib.extract_all_objects(recursive: true), ...]` at the bottom of `core/src/meson.build`. Adding `hip_sources` to the top-level `library()` as well would double-link.
  - `hip_deps` IS added to the top-level `library()` `dependencies:` list. The runtime PR will populate `hip_deps` with the real `dependency('hip-lang')` linkage; threading it through the top-level `library()` ensures consumers see the transitive dependency.
  - Header purity: `libvmaf_hip.h` does not include `<hip/hip_runtime.h>`. HIP runtime types cross the public ABI as `uintptr_t`, matching the CUDA / Vulkan precedent (ADR-0212). Don't add `<hip/...>` includes to the public header during a rebase or the runtime-PR bring-up.
  - No FFmpeg patch: the fork's `ffmpeg-patches/` series does not currently consume the HIP API surface. CLAUDE §12 r14 only requires patch updates when an existing patch consumes the surface. The runtime PR (T7-10b) will add the `hip_device` filter option and the corresponding patch.
- On upstream sync: zero interaction; the HIP backend is fork-only.
- Re-test on rebase:
cd libvmaf
meson setup build-hip -Denable_cuda=false -Denable_sycl=false \
-Denable_hip=true
ninja -C build-hip
meson test -C build-hip test_hip_smoke
# Expect: 9/9 pass.
# Default no-HIP build still works:
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=fast
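You can spot-check the header-purity invariant after a rebase with a one-function scan of the public header. This is a sketch. `header_is_hip_free` is a hypothetical name, and it only looks for `#include` lines that reference a `hip/` path.

```bash
# Hypothetical helper: succeed when the given header has no #include of a
# <hip/...> or "hip/..." path (e.g. <hip/hip_runtime.h>).
header_is_hip_free() {
  local rc
  grep -nE '^[[:space:]]*#[[:space:]]*include[[:space:]]*[<"]hip/' -- "$1" >&2
  rc=$?
  case $rc in
    0) return 1 ;;   # offending include printed to stderr
    1) return 0 ;;   # clean
    *) return 2 ;;   # unreadable file
  esac
}
```

Usage: `header_is_hip_free core/include/libvmaf/libvmaf_hip.h`.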
0074 — SSIMULACRA 2 SVE2 SIMD parity (T7-38)¶
- ADR: ADR-0213.
- Touches: `core/src/feature/arm64/ssimulacra2_sve2.{c,h}` (new), `core/src/feature/ssimulacra2.c` (dispatch-table override in `init_simd_dispatch`), `core/src/arm/cpu.{c,h}` (`HWCAP2_SVE2` probe + new `VMAF_ARM_CPU_FLAG_SVE2` enum value), `core/src/meson.build` (`cc.compiles` probe + optional `arm64_ssimulacra2_sve2` static library), `core/test/test_ssimulacra2_simd.c` (SVE2 picker overrides on the arm64 path + dispatch diagnostic), `build-aux/aarch64-linux-gnu-sve2.ini` (new cross file pinning `qemu-aarch64-static -cpu max`). All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
- Invariants:
  - Fixed 4-lane SVE2 predicate. Every kernel uses `svwhilelt_b32(0, 4)`, so the SIMD arithmetic order is identical to the NEON sibling regardless of the runtime vector length. This keeps the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract intact. Do NOT widen the predicate to `svptrue_b32()` without a separate ADR + snapshot regen; variable-length lane reductions perturb the per-step rounding order.
  - NEON stays the fallback. SVE2 is purely additive: the dispatch table assigns NEON first and only overrides on `VMAF_ARM_CPU_FLAG_SVE2`. A toolchain that fails the `cc.compiles(... -march=armv9-a+sve2)` probe leaves `HAVE_SVE2` unset, and the legacy NEON-only build is unchanged.
  - `-ffp-contract=off` mirrors the NEON sibling. Without it, GCC fuses the per-lane scalar tail's `a*b+c` patterns into `fmla`, drifting against the SIMD path by ~1 ulp. The `arm64_ssimulacra2_sve2` static library carries the flag, like its NEON counterpart.
- On upstream sync: no interaction with upstream. The `arm64/` feature TUs and the `arm/cpu.{c,h}` flag enum are fork-local. An upstream sync that rewrites `init_simd_dispatch` in `core/src/feature/ssimulacra2.c` would also need the SVE2 cases preserved.
- Re-test on rebase:
meson setup build-arm64-sve2 libvmaf \
--cross-file=build-aux/aarch64-linux-gnu-sve2.ini -Denable_asm=true
ninja -C build-arm64-sve2 test/test_ssimulacra2_simd
meson test -C build-arm64-sve2 test_ssimulacra2_simd
# stderr should report `ssimulacra2 simd dispatch: NEON=1 SVE2=1`
# and 11/11 tests should pass.
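A textual guard can catch an accidental predicate widening during conflict resolution. This sketch assumes the kernels spell the predicate exactly as `svwhilelt_b32(0, 4)`; whitespace variants would need a looser pattern. `sve2_predicate_fixed` is a hypothetical name.

```bash
# Hypothetical helper: succeed when the SVE2 source keeps the fixed 4-lane
# predicate and never uses the vector-length-dependent svptrue_b32().
sve2_predicate_fixed() {
  local src=$1
  if grep -n 'svptrue_b32' -- "$src" >&2; then
    echo "$src: svptrue_b32 found; widening needs an ADR + snapshot regen" >&2
    return 1
  fi
  grep -qF 'svwhilelt_b32(0, 4)' -- "$src"
}
```

Usage: `sve2_predicate_fixed core/src/feature/arm64/ssimulacra2_sve2.c`.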
0075 — enable_lcs MS-SSIM extras on CUDA + Vulkan (T7-35 / ADR-0243)¶
- Touched surfaces (fork-local):
  - `core/src/feature/cuda/integer_ms_ssim_cuda.c`: added `enable_lcs` to `MsSsimStateCuda` and `options[]`, plus 15 host-side `vmaf_feature_collector_append` calls gated on the bool.
  - `core/src/feature/vulkan/ms_ssim_vulkan.c`: rewrote the `enable_lcs` help text, added an `emit_lcs_metrics` helper, and gated 15 `vmaf_feature_collector_append` calls.
  - `scripts/ci/cross_backend_vif_diff.py` and `scripts/ci/cross_backend_parity_gate.py`: new `float_ms_ssim_lcs` pseudo-feature, `FEATURE_ALIASES` map, and `places=4` tolerance row.
- Why this matters on rebase: the GPU MS-SSIM extractors are fork-local (Netflix upstream has no Vulkan or CUDA MS-SSIM kernel today). The `enable_lcs` semantics and the metric names (`float_ms_ssim_{l,c,s}_scale{0..4}`) must match the upstream CPU reference at `core/src/feature/float_ms_ssim.c:189-221`. If upstream ever renames or reorders those metrics, mirror the change on the GPU side in the same merge; it is a public-API contract.
- Invariants the contract enforces:
  - Default-path output (`enable_lcs=false`) stays bit-identical to the pre-T7-35 binary. Only the host-side appends are gated; there are no kernel / shader / device-buffer changes.
  - Metric ordering is metric-wise (all `l_scale*` first, then `c_*`, then `s_*`), matching the CPU emission order.
  - `places=4` cross-backend tolerance per ADR-0190, enforced by the new `float_ms_ssim_lcs` cell in the parity matrix gate (ADR-0214).
- On upstream sync: zero interaction; the GPU twins do not exist upstream. The CPU `float_ms_ssim.c` is shared with upstream, but `enable_lcs` has been upstream-stable since v3.0.0.
- Re-test on rebase:
cd libvmaf && meson setup build-vulkan \
-Denable_cuda=false -Denable_sycl=false \
-Denable_vulkan=enabled -Denable_float=true \
--buildtype=release && ninja -C build-vulkan
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build-vulkan/tools/vmaf \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 \
--feature float_ms_ssim_lcs --backend vulkan --places 4
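The metric-wise ordering invariant can be written down as a generator for the 15 expected names (3 components × 5 scales). This is a sketch. `lcs_metric_names` is a hypothetical helper. The usage line below assumes the JSON writer preserves emission order in `frames[0].metrics`.

```bash
# Hypothetical helper: print the 15 enable_lcs metric names in the CPU
# emission order: all l_* scales, then all c_*, then all s_*.
lcs_metric_names() {
  local comp scale
  for comp in l c s; do
    for scale in 0 1 2 3 4; do
      echo "float_ms_ssim_${comp}_scale${scale}"
    done
  done
}
```

Usage, where `out.json` stands in for whatever `--output` path was used: `diff <(lcs_metric_names) <(jq -r '.frames[0].metrics | keys_unsorted[]' out.json | grep '^float_ms_ssim_[lcs]_scale')`.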
0075 — 32-bit ADM/cpu fallbacks port (T-NEW-3)¶
- Touched surfaces (upstream-mirror): `core/src/feature/x86/adm_avx2.c`, `core/src/feature/x86/adm_avx512.c`, `core/src/x86/cpu.c`. Cherry-picks of upstream `8a289703` (Christopher Degawa, "adm: add fallback for extract_epi64 for 32-bit") and `1b6c3886` ("x86/cpu: remove limit of avx+ on 32-bit").
- Why this matters on rebase: the port is trivially conflict-free with any future upstream `extract_epi64` work, because we land upstream's exact `extract_epi64` macro/inline-function pair. The conflict surface is limited to two fork-specific pieces, and both are preserved verbatim:
  - the fork's clang-format 100-column layout in `adm_avx2.c` / `adm_avx512.c`;
  - the `_Alignas(64)` LTO-correctness slot in `adm_avx512.c` (`docs/development/known-upstream-bugs.md`).
- Invariants the port preserves:
  - `_Alignas(64) int64_t angle_flag[16]` in `adm_decouple_s123_avx512` stays. Without it, LTO can promote the unaligned load to `vmovdqa64` and fault under `--buildtype=release -Db_lto=true`.
  - The `extract_epi64` symbol must remain resolved on both `__x86_64__` (macro to `_mm256_extract_epi64`) and 32-bit (fallback inline). If a future upstream change inlines the helper differently, keep the conditional definition.
- On upstream sync: Netflix may ship further 32-bit fallbacks (motion / psnr, not in this port). If so, expect a parallel `extract_epi64`-style helper at the top of each affected SIMD file; the fork should mirror those verbatim into the same files.
- Re-test on rebase:
meson setup build-i686 libvmaf \
--cross-file=build-aux/i686-linux-gnu.ini \
-Denable_asm=false
ninja -C build-i686
meson setup build-cpu libvmaf -Denable_avx512=true
ninja -C build-cpu
meson test -C build-cpu
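The `_Alignas(64)` slot is the fork-specific piece most likely to be lost in a conflict resolution. A one-line presence check is therefore a cheap addition to the re-test. This is a sketch. `adm_avx512_alignment_kept` is a hypothetical name, and it matches the declaration text exactly as quoted above.

```bash
# Hypothetical helper: succeed when adm_avx512.c still declares
# angle_flag[16] with 64-byte alignment.
adm_avx512_alignment_kept() {
  local src=$1
  if grep -qF '_Alignas(64) int64_t angle_flag[16]' -- "$src"; then
    return 0
  fi
  echo "$src: _Alignas(64) on angle_flag[16] is missing (LTO + vmovdqa64 fault risk)" >&2
  return 1
}
```

Usage: `adm_avx512_alignment_kept core/src/feature/x86/adm_avx512.c`.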
0076 — codec-aware FR regressor surface (T7-CODEC-AWARE / ADR-0235)¶
- Touches: `ai/src/vmaf_train/codec.py` (new), `ai/src/vmaf_train/models/fr_regressor.py` (extended), `ai/scripts/bvi_dvc_to_full_features.py`, `ai/scripts/extract_full_features.py`. No upstream-shared paths.
- Invariants:
  - `CODEC_VOCAB` in `ai/src/vmaf_train/codec.py` is closed and order-stable. The index of each codec is the one-hot column index baked into trained ONNX. Adding a codec appends to the tuple and bumps `CODEC_VOCAB_VERSION`; reordering silently invalidates every shipped `fr_regressor_v2_*.onnx`.
  - `FRRegressor(num_codecs=0)` must remain the v1 single-input contract. Flipping the default would break every existing `model/tiny/fr_regressor_v1.onnx` consumer.
- Re-test: `pytest ai/tests/test_codec_aware_fr.py -v` (8 sub-tests covering the vocabulary contract, the alias table, and back-compat). Pure fork-local addition; no upstream rebase impact for the next `/sync-upstream`.
0075 — feature/speed extractors (T-NEW-1, upstream port d3647c73)¶
- Touches: `core/src/feature/speed.c` (new), `core/src/feature/picture_copy.{c,h}` (signature change: added an `int channel` parameter), `core/src/feature/float_*.c` call sites updated to pass `channel=0`, the `core/src/feature/feature_extractor.c` registry block, `core/src/feature/alias.c`, `core/src/meson.build`, `core/src/feature/vif_tools.{c,h}` (helper-function port from upstream `4ad6e0ea`).
- Upstream source: verbatim cherry-pick of Netflix/vmaf `d3647c73` ("feature/speed: port speed_chroma and speed_temporal extractors") with its dependency `4ad6e0ea` ("feature/vif: port helper functions"). Both already exist on Netflix master and enter the fork as part of the T7-4 audit catch-up.
- Invariants:
  - `picture_copy()` now takes a `channel` argument. Every fork-local extractor that calls it (CUDA `integer_ms_ssim`, Vulkan `ssim`/`ms_ssim`) passes `channel=0`. If upstream later evolves the signature again (e.g. adds bit-depth or stride validation), update those fork-local call sites in lockstep.
  - Speed extractors only register when `VMAF_FLOAT_FEATURES=1` (build with `-Denable_float=true`).
- On upstream sync: future Netflix commits in `core/src/feature/speed.c` apply cleanly, because the file is now a verbatim mirror. Conflict potential is limited to two places: the registry block in `feature_extractor.c` (interleaved with the fork's Vulkan / SYCL / CUDA blocks), and any further `picture_copy` signature evolution.
- Re-test on rebase:
```bash
meson setup build-cpu libvmaf -Denable_cuda=false \
  -Denable_sycl=false -Denable_float=true
ninja -C build-cpu
meson test -C build-cpu test_speed
meson test -C build-cpu          # full meson suite
make test-netflix-golden         # 3 CPU canonical pairs
```
0221 — CHANGELOG + ADR-index fragment-file pattern (T7-39 / ADR-0221)
- What changed: the fork stopped editing `CHANGELOG.md` and `docs/adr/README.md` directly. Both files are now rendered from fragment trees:
  - `changelog.d/<section>/<topic>.md` (Keep-a-Changelog sections), plus the migration archive `changelog.d/_pre_fragment_legacy.md`.
  - `docs/adr/_index_fragments/<NNNN-slug>.md`, plus `docs/adr/_index_fragments/_order.txt` (frozen commit-merge-order manifest) and `docs/adr/_index_fragments/_header.md` (table prelude).
  - Two scripts render the consolidated outputs: `scripts/release/concat-changelog-fragments.sh --check|--write` and `scripts/docs/concat-adr-index.sh --check|--write`.
- On upstream sync: zero interaction. `CHANGELOG.md` is a fork-local Markdown surface (Netflix upstream doesn't ship a Keep-a-Changelog file in this format), and `docs/adr/` is entirely fork-local. A `/sync-upstream` run will not touch the fragment trees.
- Re-test on rebase:
bash scripts/release/concat-changelog-fragments.sh --check
bash scripts/docs/concat-adr-index.sh --check
# both must exit 0; otherwise run --write and re-stage.
0077 — DISTS extractor proposal (T7-DISTS / ADR-0236)¶
- What landed: ADR-0236 (Proposed), the Research-0043 design digest, the ADR README index row, and a CHANGELOG entry.
- Rebase impact: pure fork-local proposal-stage docs; no code, no Netflix-mirror file touched, no ffmpeg-patches change, no public C-API surface change.
- Reproducer (when the implementation lands as T7-DISTS):
```sh
vmaf --feature dists_sq=model_path=model/tiny/dists_sq.onnx \
  --reference ref.yuv --distorted dist.yuv \
  --width 1920 --height 1080 --pixel_format 420 --bitdepth 8
```
0076 — GPU-gen ULP calibration head (proposal-stage, T7-GPU-ULP-CAL / ADR-0234)¶
- What landed: ADR-0234 (Proposed), Research-0041, a data-collection scaffold at `ai/scripts/collect_gpu_calibration_data.py`, and a forward pointer in `docs/usage/cli.md` for the future `--gpu-calibrated` flag.
- Rebase impact: pure fork-local (proposal docs + Python script); no upstream Netflix/vmaf code touched, no public C-API changes, no ffmpeg-patches changes.
- Reproducer:
```sh
python3 ai/scripts/collect_gpu_calibration_data.py --smoke
```
0095 — Per-backend GPU kernel scaffolding templates (CUDA + Vulkan, ADR-0246)¶
- ADR: ADR-0246.
- Touches:
  - `core/src/cuda/kernel_template.h` (new, header-only).
  - `core/src/vulkan/kernel_template.h` (new, header-only).
  - `core/src/cuda/AGENTS.md` (new invariant row + directory listing).
  - `core/src/vulkan/AGENTS.md` (new file).
  - `docs/backends/kernel-scaffolding.md` (new).
  - `docs/adr/0246-gpu-kernel-template.md` (new).
  - `CHANGELOG.md`, `docs/adr/README.md`.
  - All paths are wholly fork-local. Upstream Netflix/vmaf has no Vulkan backend at all today, and its CUDA backend uses different per-kernel scaffolding shapes, so nothing here can collide on a pure upstream sync.
- Invariants:
  - Templates are unused at PR-merge time. `kernel_template.h` in both `core/src/cuda/` and `core/src/vulkan/` lands with zero call sites. Each future kernel migration is its own gated PR (`places=4` cross-backend diff per ADR-0214). Do not bulk-port existing kernels onto the templates in a single sync; that would short-circuit the per-kernel gate.
  - Per-backend, not cross-backend. Resist the urge to merge the two templates into a unified `gpu/kernel_template.h`. CUDA's async stream + event model and Vulkan's command buffer + fence + descriptor pool share no concrete shape, so a unified API would be lowest-common-denominator.
  - Helper functions, not macros. The header bodies are `static inline` functions so they can be stepped through in cuda-gdb / Nsight / RenderDoc. The `CHECK_CUDA_GOTO` / `CHECK_CUDA_RETURN` macros in `cuda_helper.cuh` stay where they pay off (a textual `goto label`), and the templates use them internally.
- On upstream sync: no interaction with upstream paths. An upstream sync that touches `core/src/cuda/common.h` or `picture_cuda.h` may shift the helper signatures the template consumes (`vmaf_cuda_buffer_alloc`, `vmaf_cuda_picture_get_stream`, …); update the template if so.
- Re-test on rebase:
```bash
# CUDA build (configure inside libvmaf/ — see CLAUDE.md §2 note).
meson setup core/build-cuda libvmaf \
  -Denable_cuda=true -Denable_nvcc=true \
  -Denable_vulkan=disabled -Denable_sycl=false
ninja -C core/build-cuda
meson test -C core/build-cuda

# Vulkan build.
meson setup core/build-vulkan libvmaf \
  -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false
ninja -C core/build-vulkan
meson test -C core/build-vulkan
```
0222 — vmaf-perShot per-shot CRF predictor sidecar (T6-3b)
- Touches: `core/tools/meson.build` (new executable + test wiring), `core/tools/vmaf_per_shot.c` (new file; fork-local, no upstream sibling), `core/tools/test/meson.build` (test row), `core/tools/test/test_vmaf_per_shot.sh` (new smoke test), `core/tools/AGENTS.md` (sidecar invariants), `docs/usage/cli.md` (cross-link), `docs/usage/vmaf-perShot.md` (new user doc), `docs/ai/roadmap.md` (T6-3b row update).
- Invariant: the sidecar must stay standalone; it does not link the libvmaf metric path. Any upstream patch that tries to fold per-shot CRF prediction into `vmaf_score_*` would collapse the encoder-hint vs. quality-score separation recorded in roadmap §2.4 and ADR-0222 §Decision. The CSV / JSON column set (`shot_id`, `start_frame`, `end_frame`, `frames`, `mean_complexity`, `mean_motion`, `predicted_crf`) is the public schema; downstream encoders consume it directly.
- Conflict expectation on `/sync-upstream`: low. Upstream Netflix has no per-shot CRF predictor in tree, so there is no natural collision point. `tools/meson.build` is the only mutually-edited file, and the new `executable('vmaf-perShot', …)` block is appended after `vmaf_bench_deps`, well clear of upstream's likely additions.
- Reproducer:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=disabled
ninja -C build
meson test -C build test_vmaf_per_shot --print-errorlogs
./build/tools/vmaf-perShot \
  --reference testdata/ref_576x324_48f.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --output /tmp/plan.csv
cat /tmp/plan.csv
```
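Because the column set is the public schema, a rebase can guard it cheaply. A minimal sketch, assuming the CSV starts with a header row naming the columns in the documented order (the note lists the columns but does not spell out the header line); `PERSHOT_COLUMNS` and `check_pershot_csv_header` are hypothetical names, not fork tooling:

```bash
# Hypothetical guard for the vmaf-perShot CSV public schema.
# Assumes the first line is a header naming the columns in order.
PERSHOT_COLUMNS='shot_id,start_frame,end_frame,frames,mean_complexity,mean_motion,predicted_crf'

check_pershot_csv_header() {
  local header=''
  # Accept a header line even if the file lacks a trailing newline.
  IFS= read -r header < "$1" || [ -n "$header" ] || return 1
  header=${header%$'\r'}   # tolerate CRLF line endings
  if [ "$header" != "$PERSHOT_COLUMNS" ]; then
    echo "unexpected vmaf-perShot header: $header" >&2
    return 1
  fi
}
```

Run it against the reproducer output, e.g. `check_pershot_csv_header /tmp/plan.csv`; a non-zero exit means a column was renamed, dropped, or reordered.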
0075 — vmaf-roi sidecar binary (T6-2b / ADR-0247)
- Touches:
  - `core/tools/meson.build`: adds the `vmaf_roi` executable target (after the existing `vmaf` target, before `vmaf_bench`). Append-only; no upstream-shared lines moved or removed.
  - `core/test/meson.build`: adds the `test_vmaf_roi` executable + `test()` registration. Append-only.
  - `core/tools/vmaf_roi.c`: wholly new, fork-local.
  - `core/tools/vmaf_roi_core.h`: wholly new, fork-local.
  - `core/test/test_vmaf_roi.c`: wholly new, fork-local.
- Invariant: the `vmaf-roi` sidecar emits two byte-exact formats that downstream encoder drivers (x265 `--qpfile`, SVT-AV1 `--roi-map-file`) will hard-depend on:
  - x265 ASCII grid: two `#`-prefixed header lines (`# vmaf-roi qpfile (x265, --qpfile-style)` and `# frame=N ctu=S cols=C rows=R strength=F.FFF`), then space-separated signed integers, one line per CTU row, each terminated by `\n`.
  - SVT-AV1 raw binary: exactly `cols * rows` bytes of `int8_t`, row-major, no header.
  - QP-offset clamp: `+-12` (`VMAF_ROI_CORE_QP_OFFSET_MAX`).
  - Reduction: per-CTU mean (not max). Switching to max or a percentile changes every downstream encoder result and requires its own ADR.
- Pure helpers in `vmaf_roi_core.h`: the per-CTU mean reducer and the saliency-to-QP mapper are `static inline` in a header so `test_vmaf_roi` compiles them without dragging in the libvmaf link surface. Moving them into a `.c` TU breaks the test wiring.
- On upstream sync: no interaction with upstream; `tools/` is a fork-local surface from upstream's perspective (upstream ships `vmaf.c` only). An upstream sync that rewrites `core/tools/meson.build` should preserve the `vmaf_roi` executable block.
- Re-test on rebase:
```bash
meson setup build-cpu libvmaf \
  -Denable_cuda=false -Denable_sycl=false -Denable_tools=true
ninja -C build-cpu tools/vmaf_roi test/test_vmaf_roi
meson test -C build-cpu test_vmaf_roi
./build-cpu/tools/vmaf_roi \
  --reference testdata/ref_576x324_48f.yuv \
  --width 576 --height 324 --frame 0 --output - \
  --encoder x265 --ctu-size 64 --strength 6.0 | head -3
# The first two lines are the # comment header; row 1 of the grid
# should be "4 2 1 -1 -1 -1 1 2 4" (placeholder radial map).
```
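The two byte-exact formats lend themselves to a mechanical spot-check of the `--output -` stream and of an SVT-AV1 map file. A minimal sketch, assuming offsets are printed as plain decimal integers and that the second `#` header line always carries `cols=C` and `rows=R` exactly as documented; `check_x265_roi_grid` and `check_svt_roi_map` are hypothetical helpers, not part of the fork:

```bash
# Hypothetical spot-checks for the vmaf-roi output formats.

# x265 ASCII grid on stdin: '#' header lines, then one line per CTU row.
# Fails if a row has the wrong width, the row count differs from rows=R,
# or any offset is not an integer inside the +-12 clamp.
check_x265_roi_grid() {
  awk '
    /^#/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^cols=/) cols = substr($i, 6) + 0
        if ($i ~ /^rows=/) rows = substr($i, 6) + 0
      }
      next
    }
    {
      nrows++
      if (NF != cols) { printf "row %d: %d values, want %d\n", nrows, NF, cols; bad = 1; exit }
      for (i = 1; i <= NF; i++)
        if ($i !~ /^[-+]?[0-9]+$/ || $i + 0 < -12 || $i + 0 > 12) {
          printf "row %d: offset %s outside +-12\n", nrows, $i; bad = 1; exit
        }
    }
    END {
      if (bad) exit 1
      if (nrows != rows) { printf "%d rows, want %d\n", nrows, rows; exit 1 }
    }'
}

# SVT-AV1 raw map: exactly cols * rows signed bytes, each inside +-12.
check_svt_roi_map() {
  local file=$1 cols=$2 rows=$3 size
  size=$(wc -c < "$file") || return 1
  (( size == cols * rows )) || return 1
  # od -t d1 prints each byte as a signed decimal (int8_t view).
  od -An -v -t d1 "$file" |
    awk '{ for (i = 1; i <= NF; i++) if ($i < -12 || $i > 12) exit 1 }'
}
```

For example, pipe the re-test command above (without `| head -3`) into `check_x265_roi_grid`.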
0219 — motion3 GPU coverage on Vulkan + CUDA + SYCL (T3-15(c) / ADR-0219)
- What changed: the `motion` GPU twins (`core/src/feature/vulkan/motion_vulkan.c`, `core/src/feature/cuda/integer_motion_cuda.c`, `core/src/feature/sycl/integer_motion_sycl.cpp`) now emit `VMAF_integer_feature_motion3_score` in 3-frame window mode (the default). The cross-backend gates were extended (`scripts/ci/cross_backend_*.py`, `FEATURE_METRICS["motion"]`).
- Invariants:
  - motion3 is a host-side scalar post-process of motion2. No device-side state changes; motion3 is computed on the host in `extract()` / `collect()` / `flush()` after the existing SAD reduction. The post-processing function (`motion3_postprocess_*`) mirrors CPU `integer_motion.c` lines 510-560 byte-for-byte: `clip(motion_blend(motion2 * fps_weight, blend_factor, blend_offset), max_val)`, with an optional moving average against the unaveraged prior blended value.
  - `motion_five_frame_window=true` returns `-ENOTSUP` at `init()` on all three GPU backends. The 5-deep blur ring and the second SAD-pair dispatch remain deferred. Do NOT silently fall back to the 3-frame path when the user enables the flag; fail loudly per CERT C / CLAUDE.md §12 r4.
  - The CPU motion3 algorithm is the source of truth. Any port of an upstream Netflix change to `integer_motion.c` that touches `motion_blend(...)`, the `motion_max_val` clip, or the moving-average rule MUST be mirrored in `motion3_postprocess_*` across all three GPU files in the same PR. The cross-backend gate at `places=4` will catch drift, but only after a full GPU run.
- On upstream sync: pure fork-local additions to GPU TUs. Upstream Netflix has no GPU motion extractor. The `motion_blend_tools.h` header is upstream-mirrored; if a sync rewrites the `motion_blend()` formula, regenerate the GPU snapshot and re-run the cross-backend gate.
- Re-test on rebase:
```bash
# CPU sanity (motion3 emission unchanged)
./core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --feature motion --output /tmp/motion.json --json
python -c "import json; d=json.load(open('/tmp/motion.json')); \
print('motion3 frames:', sum(1 for f in d['frames'] \
if 'integer_motion3' in f.get('metrics', {})))"
# Expect 49 (one motion3 per frame).

# Cross-backend gate (Vulkan/lavapipe lane works on every host):
python scripts/ci/cross_backend_vif_diff.py \
  --feature motion --backend vulkan \
  --ref python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --dis python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --bitdepth 8 \
  --vmaf-bin core/build/tools/vmaf
# Expect: integer_motion / integer_motion2 / integer_motion3 all OK at places=4.
```
0216 — vmaf_tiny_v2 (Phase-3-validated tiny VMAF MLP)
- Touches: `model/tiny/registry.json`, `model/tiny/vmaf_tiny_v2.{onnx,json}`, `ai/scripts/{train,export,validate}_vmaf_tiny_v2.py`, `ai/AGENTS.md`, `core/test/dnn/{test_vmaf_tiny_v2.py,meson.build}`, `docs/ai/{models/vmaf_tiny_v2.md,inference.md,roadmap.md}`, `docs/adr/{0244-vmaf-tiny-v2.md,README.md}`, `CHANGELOG.md`. All paths are wholly fork-local; no upstream Netflix/vmaf code is modified.
- Invariants:
  - Bundled scaler stats are part of the trust root. The shipped ONNX bakes `(input - mean) / std` in as constant `Sub` + `Div` nodes that run before the MLP. Re-exporting must go through `ai/scripts/export_vmaf_tiny_v2.py`, which pulls `mean` / `std` from the trainer checkpoint and writes them as graph initialisers. Adding an out-of-band scaler step at runtime (e.g., a sidecar JSON consumed by the loader) is forbidden without a follow-up ADR: it splits the trust root and invalidates the registry sha256 contract.
  - Feature column order is fixed. The graph reads `(adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2)` in exactly this order; reordering breaks the bundled `mean` / `std` constants. Any change to the feature set requires a fresh Phase-3 chain (Research-0027 → 0028 → 0029 → 0030).
  - opset 17. Matches the sister tiny-AI models (`learned_filter_v1`, `nr_metric_v1`, `fastdvdnet_pre`) and the ORT op-allowlist baseline. Upgrading requires re-validating the `Sub` / `Div` / `Gemm` / `Relu` / `Squeeze` ops against `op_allowlist.c`.
- On upstream sync: zero interaction. Netflix/vmaf has no equivalent surface; an upstream sync that touches `core/src/dnn/` (op-allowlist or model-loader changes) needs to preserve `Sub` / `Div` / `Gemm` / `Relu` / `Squeeze` in the allowlist for opset 17.
- Re-test on rebase:
```bash
bash core/test/dnn/test_registry.sh
python3 core/test/dnn/test_vmaf_tiny_v2.py
python3 ai/scripts/validate_vmaf_tiny_v2.py \
  --onnx model/tiny/vmaf_tiny_v2.onnx \
  --parquet runs/full_features_netflix.parquet \
  --rows 100 --min-plcc 0.97
meson test -C build-cpu --suite=dnn
```
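Because the registry sha256 is the contract, a quick digest check after a re-export or rebase catches an accidentally regenerated graph. A minimal sketch, assuming GNU coreutils `sha256sum` is available (on macOS, `shasum -a 256` prints the same format). The registry's JSON layout isn't described here, so the expected digest is passed in by hand, and an abbreviated prefix is accepted; `verify_sha256` is a hypothetical helper:

```bash
# Hypothetical helper: compare a file's sha256 with an expected digest
# (full 64-hex value or an abbreviated prefix copied from the registry).
verify_sha256() {
  local file=$1 expected=$2 actual
  [ -n "$expected" ] || return 1
  actual=$(sha256sum -- "$file" | cut -d' ' -f1)
  case $actual in
    "$expected"*) return 0 ;;
    *) echo "sha256 mismatch for $file: got ${actual:-<none>}" >&2; return 1 ;;
  esac
}
```

Usage: `verify_sha256 model/tiny/vmaf_tiny_v2.onnx <digest from registry.json>`.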
0094 — Tiny-AI extractor template (ADR-0250)
- Touches: `core/src/dnn/tiny_extractor_template.h` (new), `core/src/feature/feature_lpips.c`, `core/src/feature/fastdvdnet_pre.c`, `core/src/dnn/AGENTS.md`, `docs/ai/extractor-template.md` (new), `docs/adr/0250-tiny-ai-extractor-template.md` (new).
- Invariants:
  - Helper signatures are wire-format-stable. `vmaf_tiny_ai_resolve_model_path(name, option, env_var)` and `vmaf_tiny_ai_open_session(name, path, &out)` produce the user-facing log lines `<name>: no model path …` and `<name>: vmaf_dnn_session_open(<path>) failed: <rc>`, which downstream tooling greps for. Don't rename or reorder the parameters without updating every extractor and the recipe doc.
  - YUV→RGB is bit-exact. The shared `vmaf_tiny_ai_yuv8_to_rgb8_planes` is a literal move of the pre-existing `feature_lpips.c` body (BT.709 limited-range, nearest-neighbour chroma upsample). LPIPS, saliency, and future colour-sensitive tiny-AI scores depend on byte-exact equality with the prior ad-hoc copies. Any change to the conversion constants or the rounding rule needs a separate ADR plus a coordinated snapshot regen; `model/tiny/` weights aren't casually re-trained against new colour math.
  - The option-table macro is plain text substitution. The `VMAF_TINY_AI_MODEL_PATH_OPTION(state_t, help)` macro emits a single struct literal: no control flow, no recursion, no variadic shenanigans (Power-of-10 rule 1 / rule 9). Don't extend it into a multi-option emitter without a fresh ADR.
- On upstream sync: zero interaction with upstream. `feature_lpips.c` and `fastdvdnet_pre.c` are fork-only files, and the new `dnn/tiny_extractor_template.h` lives entirely under the fork-introduced `core/src/dnn/`. An upstream sync that rewrites unrelated `feature_*.c` files won't conflict.
- Re-test on rebase:
```bash
cd libvmaf
meson setup build-cpu -Denable_cuda=false -Denable_sycl=false
ninja -C build-cpu
meson test -C build-cpu --suite=dnn
meson test -C build-cpu test_lpips test_fastdvdnet_pre
# All 10 dnn-suite tests + both extractor tests must pass.
```
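Since downstream tooling greps the two helper log lines, a consumer-side filter might look like the following. This is a sketch only: the text after `no model path` is elided in this note, and any prefix the libvmaf logger prepends is not documented here, so the pattern anchors on neither end. `grep_tiny_ai_failures` is a hypothetical name:

```bash
# Hypothetical consumer-side filter for the two wire-format-stable
# tiny-AI failure lines. Deliberately unanchored: a logger prefix and
# the elided tail after "no model path" are both tolerated.
grep_tiny_ai_failures() {
  grep -E '[A-Za-z0-9_]+: (no model path|vmaf_dnn_session_open\(.*\) failed: -?[0-9]+)' "$@"
}
```

Any rename of the helpers' log text shows up as this filter silently matching nothing, which is why the signatures are treated as wire-format-stable.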
0095 — Vulkan ring-depth tunable (ADR-0251 follow-up #3)
- PR: feat/t7-29-followup3-ring-tunable.
- What rebases need to know: `VmafVulkanConfiguration` grew an additive `unsigned max_outstanding_frames` field. Existing zero-initialised configs continue to receive the canonical default (`0 → VMAF_VULKAN_RING_DEFAULT == 4`). The clamp helper `vmaf_vulkan_clamp_ring_size` moved from `import.c` (file-local `static`) to `vulkan_internal.h` (`static inline`) so `state_init` and `lazy_alloc_ring` share one definition. An upstream sync that re-introduces the `static` in `import.c` would shadow the header helper: drop the duplicate, keep the inline.
- New public symbol: `vmaf_vulkan_state_max_outstanding_frames(const VmafVulkanState *)`, a read-side accessor for the clamped value. Pure additive surface; no upstream collision.
- On upstream sync: zero interaction. The ring is wholly fork-introduced (ADR-0251); upstream Netflix has no Vulkan backend.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_async_pending_fence
# All 8 cases must pass: 4 v2-contract + 4 ring-tunable.
```
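The "drop the duplicate, keep the inline" rule can be checked mechanically after a sync. A minimal sketch, assuming both files live under `core/src/vulkan/` (the note names the files but not their directory) and that a duplicate would put `static` on the same line as the function name, so a definition split across lines would slip past it. The function name is hypothetical:

```bash
# Hypothetical rebase guard: fail if a file-local static copy of
# vmaf_vulkan_clamp_ring_size reappears in import.c, and confirm the
# shared inline still exists in vulkan_internal.h.
check_clamp_helper_not_duplicated() {
  local dir=${1:-core/src/vulkan}
  # "[^=;]*" skips call sites such as "static unsigned n = vmaf_...(".
  if grep -nE '^[[:space:]]*static[^=;]*vmaf_vulkan_clamp_ring_size[[:space:]]*\(' "$dir/import.c"; then
    echo "duplicate static vmaf_vulkan_clamp_ring_size in $dir/import.c;" \
         "keep the vulkan_internal.h inline" >&2
    return 1
  fi
  grep -q 'vmaf_vulkan_clamp_ring_size' "$dir/vulkan_internal.h"
}
```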
0096 — tools/vmaf-tune/ automation umbrella spec (ADR-0237 / Research-0044)
- PR: feat/vmaf-tune-spec.
- What rebases need to know: this PR ships only an umbrella ADR and a research digest under `docs/`. No tracked source code, no `tools/vmaf-tune/` directory yet, no Meson changes. An upstream sync touching ffmpeg-patches or `libvmaf/` cannot collide with this PR.
- On upstream sync: zero interaction. Spec-only PR.
- Re-test on rebase:
```bash
# No build/test impact; verify the docs render and links are alive:
ls docs/adr/0237-quality-aware-encode-automation.md \
  docs/research/0044-quality-aware-encode-automation.md
grep -c '\[ADR-0237\]' docs/adr/README.md
```
0097 — test_speed gated on enable_float (fix default-build failure)
- PR: fix/test-speed-chroma-registration.
- What rebases need to know: `core/test/meson.build` now wraps the `test_speed` executable + `test()` registration in `if get_option('enable_float')`. The `speed_chroma` / `speed_temporal` extractors live in `speed.c`, which is only compiled when `enable_float=true` (their entries in `feature_extractor.c` are wrapped in `#if VMAF_FLOAT_FEATURES`), so the test's `vmaf_get_feature_extractor_by_name("speed_chroma")` returned NULL on a default build (`enable_float=false`).
- On upstream sync: zero interaction. `test_speed.c` was added fork-side via the Netflix port commit `d3647c73`. The gating pattern matches `test_vulkan_*` (`if get_option('enable_vulkan').enabled()`).
- Re-test on rebase:
```bash
# Default (enable_float=false): test_speed must NOT be in the suite.
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false --reconfigure
ninja -C build
meson test -C build   # expect: NO test_speed in the run

# CI shape (enable_float=true): test_speed must run and pass.
meson setup build libvmaf -Denable_float=true --reconfigure
ninja -C build
meson test -C build test_speed   # expect: 5/5 pass
```
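The two expectations above (absent by default, present with `enable_float=true`) can be asserted from `meson test --list` instead of eyeballing the run. A sketch, assuming that listing prints one test per line ending in the bare test name; Meson may prefix project or suite names, so the match anchors only on the tail. `meson_has_test` is a hypothetical helper:

```bash
# Hypothetical helper: succeed if <name> is registered in <builddir>.
# Matches the test name at the end of a line, after an optional
# "project:suite / " style prefix.
meson_has_test() {
  local builddir=$1 name=$2
  meson test -C "$builddir" --list 2>/dev/null |
    grep -Eq "(^|[ :/])${name}\$"
}
```

For example, `! meson_has_test build test_speed` after the default configure, and `meson_has_test build test_speed` after reconfiguring with `-Denable_float=true`.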
0098 — Vulkan picture preallocation surface (ADR-0238)
- PR: feat/vulkan-picture-preallocation.
- What rebases need to know: the ABI grows additively.
  - New public surface in `core/include/libvmaf/libvmaf_vulkan.h`: `enum VmafVulkanPicturePreallocationMethod`, `VmafVulkanPictureConfiguration`, `vmaf_vulkan_preallocate_pictures`, `vmaf_vulkan_picture_fetch`.
  - New enumerator `VMAF_PICTURE_BUFFER_TYPE_VULKAN_DEVICE` in `core/src/picture.h::VmafPictureBufferType`.
  - New TU `core/src/vulkan/picture_vulkan_pool.c` (~180 LOC), registered in `core/src/vulkan/meson.build`.
  - The fork-internal accessor `vmaf_vulkan_state_context()` (declared in `vulkan_internal.h`) exposes the imported state's `VkInstance` / `VkDevice` to the pool; it is used only by `libvmaf.c::vmaf_vulkan_preallocate_pictures`.
  - `VmafContext` field added: `vmaf->vulkan.pool`, next to `vmaf->vulkan.state`. The `vmaf_close()` teardown closes the pool before clearing the state pointer (matches SYCL).
- On upstream sync: zero interaction. The Vulkan backend is fork-only; upstream Netflix has no Vulkan integration.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build test_vulkan_pic_preallocation
# All 6 cases must pass under ASan/UBSan:
#   test_method_none_is_a_no_op
#   test_method_host_allocates_round_robins
#   test_method_device_allocates_round_robins
#   test_fetch_without_preallocate_falls_back
#   test_unknown_method_rejected
#   test_null_args_rejected
```
0099 — feature_mobilesal.c + transnet_v2.c migrated to tiny_extractor_template.h
- PR: refactor/migrate-ai-to-template.
- What rebases need to know: `feature_mobilesal.c` and `transnet_v2.c` previously open-coded the model-path resolution (`getenv` + log block), the YUV→RGB kernel (mobilesal only), the `vmaf_dnn_session_open` + log boilerplate, and the `VmafOption[].model_path` row. They now use the helpers from `dnn/tiny_extractor_template.h` (PR #251), the same template `feature_lpips.c` and `fastdvdnet_pre.c` already consume. Net −98 LOC of identical boilerplate.
- Behavior preserved: bit-exact YUV→RGB conversion (mobilesal used a literal copy of the `feature_lpips.c` body that the template hoisted), identical error-log strings, and an identical option-table flag/type/offset shape. The macro entry in the migrated `mobilesal_options` table expands to the same struct literal the hand-rolled version produced.
- On upstream sync: zero interaction. Both files are fork-introduced; upstream Netflix has neither extractor.
0100 — cuda/ring_buffer.{c,h} → gpu_picture_pool.{c,h} (ADR-0239)
- PR: refactor/gpu-picture-pool-extract.
- What rebases need to know: `core/src/cuda/ring_buffer.c` and `ring_buffer.h` are removed. The same callback-based round-robin pool now lives at `core/src/gpu_picture_pool.{c,h}` under renamed symbols (`VmafRingBuffer` → `VmafGpuPicturePool`, `vmaf_ring_buffer_*` → `vmaf_gpu_picture_pool_*`, `_fetch_next_picture` → `_fetch`). All call sites in `libvmaf.c` are migrated. `core/test/test_ring_buffer.c` is renamed to `test_gpu_picture_pool.c`, with the corresponding meson update.
- Netflix-upstream interaction: minimal. Netflix's `cuda/ring_buffer.{c,h}` was last touched in commit `cb1d49c6`. An upstream sync that resurrects the old names should be redirected to the new ones; the file move is purely fork-local. The Netflix#1300 mutex-destroy-order fix (ADR-0157) is preserved: it moved verbatim to the new file and remains attached to `vmaf_gpu_picture_pool_close`.
- SYCL pool migration: `vmaf_sycl_picture_pool_*` keeps its public-internal API but now delegates to the generic pool. The SYCL wrapper struct (`VmafSyclPicturePool`) just owns the `VmafSyclCookie` storage, and the SYCL-side `std::mutex` drops out.
- Vulkan pool migration: bundled into this PR after #264 merged. `picture_vulkan_pool.c` is rewritten as a thin wrapper around the generic pool: the wrapper struct owns per-pool state for the alloc/free callbacks, and the generic pool owns the round-robin slots, mutex, and unwind. Same pattern as the SYCL migration above.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=dnn
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# All 11 dnn-suite + 4 extractor smoke tests must pass.
meson test -C build   # 47/47 pass under ASan/UBSan

# CUDA build (CI-only; pre-existing local nvcc include-path quirk):
meson setup build-cuda libvmaf -Denable_cuda=true
ninja -C build-cuda
meson test -C build-cuda test_gpu_picture_pool

# SYCL build:
meson setup build-sycl libvmaf -Denable_sycl=true
ninja -C build-sycl
meson test -C build-sycl
```
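When an upstream sync reintroduces the old names, the three documented renames can be applied mechanically before review. A minimal sketch: it covers only those renames and applies the `_fetch_next_picture` rule first so the generic prefix rule does not swallow it. Include paths are not rewritten, and the `VmafRingBuffer` prefix rule also renames any derived identifiers that share the prefix, so review the resulting diff. `redirect_ring_buffer_names` is a hypothetical name:

```bash
# Hypothetical helper: rewrite pre-ADR-0239 ring-buffer names to the
# gpu_picture_pool equivalents. Reads C source on stdin, writes stdout.
redirect_ring_buffer_names() {
  sed -e 's/vmaf_ring_buffer_fetch_next_picture/vmaf_gpu_picture_pool_fetch/g' \
      -e 's/vmaf_ring_buffer_/vmaf_gpu_picture_pool_/g' \
      -e 's/VmafRingBuffer/VmafGpuPicturePool/g'
}
```

Usage: `redirect_ring_buffer_names < libvmaf.c > libvmaf.c.new`, then diff the two files.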
0104 — psnr_vulkan.c migrated to vulkan/kernel_template.h
- PR: refactor/migrate-psnr-vulkan-to-template.
- What rebases need to know: `vulkan/kernel_template.h` (410 LOC, ADR-0246, PR #251) shipped with zero consumers; its docstring designated `psnr_vulkan.c` as the reference implementation. This PR lands that migration as the first consumer of the Vulkan template, paired with PR #269 (the first CUDA template consumer). The 5 long-lived pipeline objects (descriptor-set layout, pipeline layout, shader module, compute pipeline, descriptor pool) collapse from individual struct fields into one `VmafVulkanKernelPipeline pl` bundle. `create_pipeline()` (~104 LOC) collapses to a single `vmaf_vulkan_kernel_pipeline_create()` call (~30 LOC); the template owns descriptor-set layout creation, the pipeline layout, the shader module, the compute pipeline, and descriptor-pool sizing. The `vkDeviceWaitIdle` + 5× `vkDestroy*` sweep in `close_fex()` collapses to one `vmaf_vulkan_kernel_pipeline_destroy()` call.
- Net LOC delta: −55 LOC on `psnr_vulkan.c` directly. Unlike the CUDA template (where helper-call boilerplate roughly cancels the inline savings), the Vulkan template removes enough pipeline-creation code that even the first consumer comes out ahead.
- Bit-exactness gates: spec-constants, the push-constant struct, shader bytecode, dispatch grid math, and the host-side reduction are byte-identical to the prior implementation. The template only owns descriptor-set layout / pipeline layout / shader module / compute pipeline creation and descriptor-pool sizing, none of which affects the kernel's mathematical behaviour. The cross-backend parity gate (places=4) re-runs unchanged.
- On upstream sync: zero interaction. `psnr_vulkan.c` is fork-introduced (T7-23 / ADR-0182 / ADR-0216).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build   # 50/50 pass on lavapipe

# Cross-backend parity gate (places=4):
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
```
0105 — moment_vulkan.c + ciede_vulkan.c migrated to vulkan/kernel_template.h
- PR: refactor/migrate-motion-vulkan-to-template (note: the branch name reflects the original intent; motion's two-pipeline shape didn't fit the template's single-pipeline contract, so this PR migrates moment + ciede instead).
- What rebases need to know: these are the second and third consumers of `vulkan/kernel_template.h` (after PR #270, psnr_vulkan, the first consumer). Both files follow the identical migration pattern:
  - Replace 5 individual pipeline-object fields (`dsl`, `pipeline_layout`, `shader`, `pipeline`, `desc_pool`) with one `VmafVulkanKernelPipeline pl` bundle.
  - Replace ~100 LOC of `create_pipeline()` body (descriptor-set layout + pipeline layout + shader module + compute pipeline + descriptor pool boilerplate) with a single `vmaf_vulkan_kernel_pipeline_create()` call.
  - Replace the `vkDeviceWaitIdle` + 5× `vkDestroy*` sweep in `close_fex()` with one `vmaf_vulkan_kernel_pipeline_destroy()` call.
- Per-file LOC deltas:
  - `moment_vulkan.c`: −60 LOC (450 → 390).
  - `ciede_vulkan.c`: −59 LOC (536 → 477).
  - Net: −119 LOC.
- Bit-exactness preserved: spec-constants (width/height/bpc/subgroup_size, identical across both), push-constant structs (`MomentPushConsts`, `CiedePushConsts`), shader bytecodes (`moment_spv`, `ciede_spv`), dispatch grid math, and host-side reductions are byte-identical to the prior implementation. Cross-backend parity gates (places=4 for the moment integer reduce; places=2 for ciede transcendentals per ADR-0187) re-run unchanged.
- `motion_vulkan.c` deferred: motion uses two pipelines (first frame vs subsequent frames) sharing one DSL + layout + shader + pool. Each `VmafVulkanKernelPipeline` bundle currently holds exactly one pipeline alongside its own DSL, layout, shader, and pool, so splitting motion across two instances would duplicate the shared objects. Tracked as a follow-up template extension (multi-pipeline support).
- On upstream sync: zero interaction. Both files are fork-introduced (T7-23 / ADR-0182 / ADR-0187).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false \
  -Denable_vulkan=enabled
ninja -C build
meson test -C build   # 50/50 pass on lavapipe (under ASan/UBSan)
python scripts/ci/cross_backend_parity_gate.py --feature float_moment_ref1st --places 4
python scripts/ci/cross_backend_parity_gate.py --feature ciede2000 --places 2
```
0101 — GPU backend pattern doc (ADR-0240)
- PR: docs/gpu-backend-template.
- What rebases need to know: doc-only PR. Adds `docs/development/gpu-backend-template.md` (the recipe new GPU backends follow) and `core/include/libvmaf/AGENTS.md` (public-headers-tree invariant note). No source code, no meson changes, no ABI impact.
- On upstream sync: zero interaction. Both files are fork-introduced.
- Re-test on rebase:
```bash
# Doc-only; verify the links resolve:
test -f docs/development/gpu-backend-template.md
test -f core/include/libvmaf/AGENTS.md
grep -c 'gpu-backend-template' core/include/libvmaf/AGENTS.md
```
0102 — Tiny-AI test registration macro (tiny_ai_test_template.h)
- PR: refactor/test-registration-macro.
- What rebases need to know: the new `core/test/tiny_ai_test_template.h` emits the four standard registration tests (`<name>_is_registered`, `<name>_provides_primary_feature`, `<name>_options_table_well_formed`, `<name>_init_rejects_missing_model`) via the `VMAF_TINY_AI_DEFINE_REGISTRATION_TESTS(ext, feat, env, prefix)` macro. The four per-extractor test files (`test_lpips.c`, `test_mobilesal.c`, `test_transnet_v2.c`, `test_fastdvdnet_pre.c`) shrank from ~140 LOC each to ~20-50 LOC, net −286 LOC. Behavior is preserved bit-exactly (same assertions, same env-var save/restore dance, same `setenv` shim for MSVCRT). TransNet V2 keeps two extractor-specific extra tests (binary-flag round-trip + `provided_features` list termination) that the macro doesn't cover.
- On upstream sync: zero interaction. The four test files are fork-introduced (per ADR-0042 / ADR-0168 / ADR-0220 / ADR-0223 / ADR-0215).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_lpips test_mobilesal test_transnet_v2 test_fastdvdnet_pre
# 4/4 binaries pass; 18 individual tests total
# (4x4 standard + 2 TransNet V2 extras).
```
0103 — integer_psnr_cuda.c migrated to cuda/kernel_template.h
- PR: refactor/migrate-psnr-cuda-to-template.
- What rebases need to know: `cuda/kernel_template.h` shipped with no consumers in PR #251 (ADR-0246). This PR migrates the first consumer, `integer_psnr_cuda.c`, which the template's own docstring explicitly designated as the reference. The `CUstream` + `CUevent` + `CUevent` triple and the `(VmafCudaBuffer device, void *host_pinned, size_t bytes)` readback pair are now dispensed by the template helpers (`vmaf_cuda_kernel_lifecycle_init` / `_close`, `vmaf_cuda_kernel_readback_alloc` / `_free`, `vmaf_cuda_kernel_submit_pre_launch`, `vmaf_cuda_kernel_collect_wait`) instead of being open-coded. `PsnrStateCuda` shrinks: one `VmafCudaKernelLifecycle` replaces three fields (`event`, `finished`, `str`), and one `VmafCudaKernelReadback` replaces two (`sse`, `sse_host`).
- Net LOC delta: +8 LOC on `integer_psnr_cuda.c` alone, because the helpers add per-call boilerplate. The dedup win materialises as more CUDA feature kernels (motion / moment / ssim / vif / adm) migrate one at a time in follow-up PRs; each subsequent migration saves ~15 LOC.
- Bit-exactness gates: kernel launch + reduction logic unchanged. The migration only touches state-management boilerplate around the kernel; the SSE accumulator math, the per-bpc kernel function lookup, the host-side `log10` score formula, and the dispatch grid-dim calculation are byte-identical to the prior implementation. The Netflix golden gate + CPU/CUDA cross-backend parity gate (places=4) re-run unchanged.
- On upstream sync: zero interaction. `integer_psnr_cuda.c` is fork-introduced (T7-23 / ADR-0182).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=true
ninja -C build
meson test -C build   # CUDA test suite must pass

# Cross-backend parity gate:
python scripts/ci/cross_backend_parity_gate.py --feature psnr_y --places 4
```
0125 — Vulkan submit-side template + fence pool + descriptor pre-alloc bundle (ADR-0256)
- Touches:
  - `core/src/vulkan/kernel_template.h`: fork-local. Output landing in `runs/phase_a/` is gitignored; rerun the script to reproduce. Adds the `VmafVulkanKernelSubmitPool` struct + `_create` / `_destroy` / `_acquire` helpers and the `vmaf_vulkan_kernel_descriptor_sets_alloc` helper. Upstream has no Vulkan backend, so there is no merge surface.
  - `core/src/feature/vulkan/{psnr_hvs,vif,float_vif,float_adm}_vulkan.c`: fork-local kernel TUs, also with no upstream peer.
- Invariant: the four migrated kernels keep all per-frame `VkFence` + `VkCommandBuffer` + `VkDescriptorSet` resources alive across frames in the pool. Pre-bound descriptor sets rely on the kernel's `VmafVulkanBuffer *` handles being stable from init time (allocated in `init()`, freed only in `close_fex`). `vmaf_vulkan_kernel_pipeline_destroy` destroys the descriptor pool, which implicitly releases the pre-allocated sets; callers must NOT call `vkFreeDescriptorSets` on them.
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_vulkan=enabled
ninja -C build
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
  meson test -C build test_vulkan_smoke \
    test_vulkan_async_pending_fence \
    test_vulkan_pic_preallocation
python scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature vif --backend vulkan --places 4
python scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature adm --backend vulkan --places 4
```
0107 — psnr_hvs_cuda async upload + persistent pinned staging (T-GPU-OPT-2/3)
- Touches:
  - `core/src/feature/cuda/integer_psnr_hvs_cuda.c`: the only consumer; fork-local from inception (T7-23 / ADR-0188 / ADR-0191). The state adds `upload_str` (a dedicated H2D stream), `upload_done` (a cross-stream completion event), and per-plane persistent pinned `h_uint_ref[3]` / `h_uint_dist[3]` staging buffers allocated once in `init_fex_cuda`. The per-call helper `upload_plane_cuda` is split into `issue_d2h_plane` (pic-stream D2H), `convert_plane` (CPU normalise), and `issue_h2d_plane` (upload-stream H2D). `submit_fex_cuda` runs the three phases explicitly, records `upload_done` after the last H2D, then calls `cuStreamWaitEvent` on `lc.str` before kernel launches.
  - `core/src/cuda/AGENTS.md`: adds an entry under §Rebase-sensitive invariants documenting the three-phase flow + persistent staging contract.
- Invariant: the pinned `h_uint_*` and `h_ref` / `h_dist` buffers are never freed and re-allocated mid-stream. The H2Ds must run on `upload_str` (not on `lc.str`) so the `cuStreamWaitEvent` cross-stream link is meaningful, and the `upload_done` event is recorded after the last H2D for the current frame and waited on once before that frame's first kernel launch. CUDA graph capture (future T-GPU-OPT-N) depends on the no-per-frame-alloc invariant; collapsing the three-phase split or re-introducing per-frame `vmaf_cuda_buffer_host_alloc` calls breaks that follow-up. The bit-exactness gate is `places=3` for `psnr_hvs_y` / `cb` / `cr` and the combined `psnr_hvs` (matching the existing matrix; not `places=4`).
- On upstream sync: zero interaction. `integer_psnr_hvs_cuda.c` is fork-introduced (T7-23 / ADR-0188 / ADR-0191).
- Re-test on rebase:
```bash
meson setup build libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
  --distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel-format 420 --bitdepth 8 \
  --feature psnr_hvs --backend cuda --places 3
```
0227 — output.c writer-format unit tests (R3 of coverage-gap-2026-05-02)
- Touches:
  - `core/test/test_output.c` (new): exercises the four writers in `core/src/output.c` (XML / JSON / CSV / SUB) end-to-end via `tmpfile()`-backed sinks and a synthetic `VmafFeatureCollector`. Test-only; no production code change.
  - `core/test/meson.build`: registers `test_output` next to `test_feature_collector` (mirroring that test's wiring: `link_with: libvmaf` + libsvm objects + log/predict/metadata helpers).
- Invariant: the test pulls `libvmaf.c` and `output.c` in via `#include "*.c"` (mirroring the precedent in `test_feature_collector.c`) so the per-translation-unit `.gcno` lands in the test build dir and gcovr aggregates `output.c`'s coverage. The mu-test framework macro (`mu_assert`) deliberately early-returns from each `static char *test_*()` body, which is why every test body trips `clang-analyzer-unix.Malloc` "potential leak" notes (cleanup runs only on the success-tail path). This pattern is shared across every `core/test/test_*.c` file and is load-bearing (per the ADR-0141 NOLINT carve-out): replacing it with goto-cleanup would obscure the per-assertion failure message.
- On upstream sync: zero interaction. `output.c` is upstream-mirrored, but this PR doesn't touch it. The test only depends on the four public function signatures (`vmaf_write_output_{xml,json,csv,sub}`); if Netflix renames or reorders those, the test fails to compile and the rebase author updates it then.
- Re-test on rebase:
```bash
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && ./build/test/test_output
```
0126 — OSSF Scorecard policy (ADR-0263)
- Touches: `.github/workflows/scorecard.yml` (line 45, the `github/codeql-action/upload-sarif@<sha>` pin). The rest of the policy is doc-only (`docs/adr/0263-*.md`, `docs/research/0053-*.md`, `changelog.d/security/`). Upstream Netflix/vmaf does not ship a Scorecard workflow, so the path itself is fork-introduced and won't conflict.
- Invariant: the `upload-sarif` SHA must point to a commit that currently exists in `github/codeql-action`'s git tree. A SHA that was once the `v4` head but no longer exists in the action repository triggers Scorecard's "imposter commit" defence and breaks the workflow with a 400 error against `api.scorecard.dev`. On every Dependabot bump, spot-check that `gh api /repos/github/codeql-action/commits/<sha>` returns 200.
- On upstream sync: zero interaction.
- Re-test on rebase:
```bash
# Confirm the pin still resolves to a real commit:
pin=$(grep -oE 'codeql-action/upload-sarif@[a-f0-9]{40}' \
  .github/workflows/scorecard.yml | head -1 | cut -d@ -f2)
gh api "/repos/github/codeql-action/commits/$pin" --jq '.sha'
# Then watch the next master push for a green Scorecard run:
gh run list --workflow scorecard --repo VMAFx/vmafx --limit 1
```
0228 — U-2-Net u2netp saliency replacement deferred (ADR-0265)
- Touches: docs-only.
  - docs/adr/0265-u2netp-saliency-replacement-blocked.md — new ADR continuing the deferral chain started by ADR-0257.
  - docs/research/0055-u2netp-saliency-replacement-survey.md — new research digest (upstream survey + license + distribution-op-allowlist audit + alternatives walk).
  - docs/ai/models/mobilesal.md — pointer block updated to reference both ADR-0257 (first blocker) and ADR-0265 (second blocker).
  - model/tiny/registry.json — mobilesal_placeholder_v0 notes field updated to reference ADR-0265 alongside ADR-0257 (no schema / sha256 / file changes).
  - model/tiny/mobilesal.json — sidecar notes field updated in lockstep.
  - scripts/gen_mobilesal_placeholder_onnx.py — generator notes string updated so re-running is idempotent against the new sidecar / registry text.
  - CHANGELOG.md — Changed entry via changelog.d/changed/T6-2a-followup-u2netp-replacement-deferred.md.
  - docs/adr/README.md — index row via docs/adr/_index_fragments/0265-u2netp-saliency-replacement-blocked.md.
- Invariant: zero C-side surface change. The feature_mobilesal.c tensor-name contract (input "input" → output "saliency_map", NCHW float32 [1, 3, H, W] → [1, 1, H, W]) is unchanged. The on-disk model/tiny/mobilesal.onnx (sha256 f1226310…) is unchanged, and so is mobilesal_placeholder_v0's smoke: true flag. Any future drop-in (U-2-Net via T6-2a-mirror-u2netp-via-release + T6-2a-widen-allowlist-resize, a distilled student, or a BASNet / PoolNet survey result) replaces the .onnx and bumps the registry sha256 without touching the C side.
- On upstream sync: zero interaction. feature_mobilesal.c, the registry, the ADR, and the research digest are all fork-local (T6-2a; ADR-0218 / ADR-0257 / ADR-0265; not present in Netflix upstream).
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_mobilesal
python3 ai/scripts/validate_model_registry.py
bash scripts/docs/concat-adr-index.sh --check
bash scripts/release/concat-changelog-fragments.sh --check
0108 — ssim_accumulate_avx512 per-lane double reduction vectorised
- ADR: ADR-0139 (existing; no new ADR — the per-lane reduction order is unchanged).
- Touches:
  - core/src/feature/x86/ssim_avx512.c — the ssim_accumulate_block_avx512 body. The 16 per-lane scalar ssim_accumulate_lane calls are replaced by two 8-wide __m512d passes that compute lv, cv, sv, and lv*cv*sv lane-wise in vector double. Aligned double[16] spill buffers replace the previous _Alignas(64) float[16] ×6 spill, and the scalar accumulation loop now does 4×16 vaddsd instead of 16 invocations of the per-lane helper.
  - CHANGELOG.md — Changed entry.
  - This file — this entry.
- Invariant (load-bearing for ADR-0139 bit-exactness):
  - Per-lane double computation order is byte-identical: ((2.0 * rm) * cm + C1) / l_den, then (2.0 * srsc + C2) / c_den, then (lv * cv) * sv. No FMA contraction: use separate _mm512_mul_pd + _mm512_add_pd. _mm512_fmadd_pd is forbidden because it changes the rounding count and would diverge from scalar's two-step mul + add.
  - Float→double widening uses _mm512_cvtps_pd, which is IEEE-754-exact for finite floats (a double's 52-bit mantissa holds a float's 23-bit mantissa losslessly).
  - Lane-by-lane left-to-right reduction order is preserved: local_ssim += t_ssim[k] for k = 0..15. Tree reductions (pairwise add, dual-accumulator unroll) are forbidden because they break running-sum associativity against scalar.
  - AVX2 / NEON twins stay on the per-lane scalar path. They were verified bit-identical against the new AVX-512 path at --precision max on the Netflix src01_hrc00/01_576x324 and the checkerboard_1920_1080_10_3_*_0 pairs. The bit-exactness contract (ADR-0139) is per-lane, not per-ISA algorithm, so AVX2 / NEON stay scalar-per-lane until a dedicated PR vectorises them with the same care.
- Rebase impact: zero conflict with Netflix upstream, because the whole SSIM SIMD surface is fork-local (no upstream SSIM SIMD exists). Conflicts arise only if upstream changes ssim_accumulate_default_scalar in iqa/ssim_tools.c. In that case both the AVX2 / NEON per-lane helper and the AVX-512 vector-double block need a coordinated update that preserves the invariants above.
- Re-test on rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build
# Bit-exact at --precision max, scalar vs AVX2 vs AVX-512:
for MASK in 0 16 255; do
core/build/tools/vmaf -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--feature float_ms_ssim --feature float_ssim \
--xml -o /tmp/m${MASK}.xml --precision max --cpumask $MASK
done
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m16.xml) # empty
diff <(grep -v 'fyi fps' /tmp/m0.xml) <(grep -v 'fyi fps' /tmp/m255.xml) # empty
- Why this matters on rebase: an upstream commit that touches core/src/feature/ssimulacra2.c could prompt a "let's also port the GPU XYB while we're here" follow-up. The ledger entry is the standing answer: don't. The measurement was redone on NVIDIA in May 2026, and the result still missed places=4 by five orders of magnitude. See Research-0047.
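The ban on tree reductions in the 0108 invariant follows from floating-point addition not being associative. The minimal shell sketch below demonstrates this. It assumes awk arithmetic is IEEE-754 double, which holds for common awk builds on x86-64 and ARM64. The helper names and the four lane values are hypothetical: the values were chosen only to expose rounding and are not taken from the SSIM kernel.

```bash
# Illustration only: both helpers sum doubles, differing only in order.

# Left-to-right running sum, the shape of the scalar `local_ssim += t_ssim[k]`.
ltr_sum() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { print s }'
}

# Pairwise (tree) sum of exactly four lanes: (l0 + l1) + (l2 + l3).
tree_sum4() {
  printf '%s\n' "$@" | awk '{ v[NR] = $1 } END { print (v[1] + v[2]) + (v[3] + v[4]) }'
}

ltr_sum   1e16 1 -1e16 1   # prints 1
tree_sum4 1e16 1 -1e16 1   # prints 0
```

The same four inputs give 1 when summed left to right and 0 when summed pairwise, because 1e16 + 1 rounds back to 1e16 in double precision. This is why the AVX-512 path keeps the k = 0..15 running-sum order.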
0126 — FastDVDnet real upstream weights drop (ADR-0253)
- What changed: replaces model/tiny/fastdvdnet_pre.onnx with the wrapped real upstream FastDVDnet checkpoint (sha256 eb9444cf6f07eefdc7f4f68d09131074dbd1dcee6f88a331ba684dd2fb5937d4, ~9.5 MiB), refreshes the sidecar model/tiny/fastdvdnet_pre.json, flips the registry row's smoke: true → false, and adds license: "MIT" plus the upstream commit pin c8fdf61. New exporter ai/scripts/export_fastdvdnet_pre.py (the older _placeholder.py exporter is retained for reference). New ADR docs/adr/0255-fastdvdnet-pre-real-weights.md; the user-facing doc docs/ai/models/fastdvdnet_pre.md is rewritten with provenance, license attribution, and reproduce-the-export instructions.
- Upstream source: fork-local. Netflix/vmaf does not ship a FastDVDnet temporal pre-filter; the C extractor and ONNX surface are entirely fork-introduced (ADR-0215). The wrapped weights are attribution-only (upstream m-tassano/fastdvdnet, MIT).
- On upstream sync: zero interaction. Every file touched (ai/scripts/export_fastdvdnet_pre*.py, model/tiny/fastdvdnet_pre.*, docs/ai/models/fastdvdnet_pre.md, docs/adr/0253-*.md, CHANGELOG fragment, ADR index fragment) lives in fork-introduced trees.
- Re-test on rebase:
# Re-derive the ONNX from the pinned upstream checkpoint.
mkdir -p /tmp/fastdvdnet_upstream && cd /tmp/fastdvdnet_upstream
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/model.pth
curl -L -O https://raw.githubusercontent.com/m-tassano/fastdvdnet/c8fdf61/models.py
cd /path/to/vmaf
python3 ai/scripts/export_fastdvdnet_pre.py \
--upstream-dir /tmp/fastdvdnet_upstream
python3 ai/scripts/validate_model_registry.py
meson test -C build --suite=fast --print-errorlogs test_fastdvdnet_pre
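The registry pins the ONNX by sha256, so a quick local hash check after re-exporting is useful. The minimal sketch below assumes GNU coreutils sha256sum (on macOS, shasum -a 256 prints the same "hash  filename" format). verify_sha256 is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (lowercase hex).
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum -- "$file") || return 2   # unreadable / missing file
  actual=${actual%% *}                         # keep only the hash field
  if [ "$actual" != "$expected" ]; then
    echo "sha256 mismatch for $file: got $actual, want $expected" >&2
    return 1
  fi
}
```

For example, pass model/tiny/fastdvdnet_pre.onnx and the sha256 recorded above. ONNX export may not be byte-reproducible across toolchain versions, so treat a mismatch as a prompt to compare against the registry entry rather than as proof of a bad export.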
0127 — ONNX op-allowlist gains Resize (ADR-0258)
- Touches:
  - core/src/dnn/op_allowlist.c — fork-local file (no upstream counterpart). One new entry, "Resize", under the /* convolutional */ block.
  - core/test/dnn/test_op_allowlist.c, core/test/dnn/test_onnx_scan.c — fork-local DNN tests.
  - ai/tests/test_op_allowlist.py — fork-local Python parity test.
- Invariant: the C allowlist is the single source of truth; the Python regex parser in ai/src/vmaf_train/op_allowlist.py walks the same op_allowlist.c file. Any future entry needs only the C edit, and Python symmetry follows automatically.
- Upstream source: fork-local. Netflix/vmaf has no ONNX op-allowlist surface; the entire core/src/dnn/ tree is fork-introduced.
- On upstream sync: zero interaction. Every file touched lives in fork-introduced trees.
- Re-test on rebase:
meson test -C build test_op_allowlist test_onnx_scan
PYTHONPATH=ai/src python -m pytest ai/tests/test_op_allowlist.py
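Because the C file is the single source of truth, a rebase spot-check can read op names straight out of it. The minimal sketch below assumes each allowlist entry is a double-quoted identifier string literal. It is not the regex used by ai/src/vmaf_train/op_allowlist.py, and both helper names are hypothetical.

```bash
# Hypothetical: print every double-quoted identifier literal (e.g. "Resize")
# in a C allowlist source, one per line. Quoted paths such as
# "op_allowlist.h" are skipped because '.' is not an identifier character.
list_allowlisted_ops() {
  grep -oE '"[A-Za-z][A-Za-z0-9_]*"' -- "$1" | tr -d '"'
}

# Succeed if op $1 appears as an exact entry in allowlist file $2.
op_is_allowlisted() {
  list_allowlisted_ops "$2" | grep -qx -- "$1"
}
```

For example, op_is_allowlisted Resize core/src/dnn/op_allowlist.c should succeed once the ADR-0258 entry is present.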
0231 — vif.comp + ciede.comp precise decorations (ADR-0269 / Step A of Vulkan 1.4 bump)
- Touches:
  - core/src/feature/vulkan/shaders/vif.comp (3 local-variable type qualifiers: g, sv_sq, gg_sigma_f → precise float)
  - core/src/feature/vulkan/shaders/ciede.comp (yuv_to_rgb outputs, rgb_to_xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE)
- Invariant: both shaders are fork-local (the Vulkan backend is fork-added; upstream Netflix/vmaf has no Vulkan compute kernels). The precise keyword is standard GLSL 4.50 syntax; glslc 2026.1 lowers it to a per-result OpDecorate NoContraction. The decorations are load-bearing for the cross-backend gate on NVIDIA driver 595.71+: removing them would re-introduce the 42/48 ciede regression at API 1.3 documented in research-0054.
- On upstream sync: zero interaction. Both shader files are entirely fork-introduced; upstream has no Vulkan compute path.
- Re-test on rebase:
# Re-confirm the cross-backend gate on a Vulkan-capable host.
meson setup core/build -Denable_vulkan=enabled
ninja -C core/build
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--feature vif --backend vulkan --places 4
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary core/build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--feature ciede --backend vulkan --places 4
# Confirm SPIR-V still emits NoContraction post-rebase.
glslc --target-env=vulkan1.3 -O \
core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
spirv-dis /tmp/vif.spv | grep -c NoContraction # expect ≥ 60
Expected on NVIDIA 595.71+: vif 0/48 OK, ciede 5/48 FAIL (max abs 8.9e-05 — pre-existing fork debt at API 1.3, see ADR-0269). On RADV / lavapipe: bit-exact (precise is a no-op there).
0229 — fr_regressor_v2 codec-aware scaffold (ADR-0272)
- ADR: ADR-0272
- Touches:
  - ai/scripts/train_fr_regressor_v2.py (new) — Phase A JSONL consumer; trains the codec-aware FRRegressor.
  - model/tiny/fr_regressor_v2.onnx (new, smoke) — placeholder ONNX from --smoke mode; re-baked on production training.
  - model/tiny/fr_regressor_v2.json (new) — sidecar.
  - model/tiny/registry.json — new entry with smoke: true.
  - docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md (new).
  - docs/adr/README.md — index row.
  - docs/research/0058-fr-regressor-v2-feasibility.md (new).
  - docs/ai/models/fr_regressor_v2.md (new) — model card.
  - ai/AGENTS.md — invariant note (codec block layout + ENCODER_VOCAB ordering).
  - CHANGELOG.md — Added entry.
- Invariant: the 8-D codec block layout is [encoder_onehot(6), preset_norm, crf_norm] with ENCODER_VOCAB = (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown) in load-bearing order. The CRF normaliser is /63 (union upper bound); the preset normaliser is /9. Bumping the vocabulary requires a re-train; existing checkpoints pin the order they were trained against via encoder_vocab_version in the sidecar. The two-input ONNX (features, codec) follows the LPIPS-Sq precedent (ADR-0040 / ADR-0041).
- Rebase impact: entirely fork-local and purely additive; no upstream-mirror file is touched. The Phase A schema consumed by this trainer is itself fork-local (tools/vmaf-tune/). No conflict expected on /sync-upstream.
- Re-test on rebase:
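The shell sketch below makes the slot order of the 8-D codec block concrete. It is an illustration under stated assumptions, not the trainer's code. It assumes the preset arrives as an integer ordinal, that encoders outside ENCODER_VOCAB fall into the unknown slot, and it prints the normalised values to four decimals. codec_block is a hypothetical helper.

```bash
# Vocabulary order copied from the invariant above; the order is load-bearing.
ENCODER_VOCAB=(libx264 libx265 libsvtav1 libvvenc libvpx-vp9 unknown)

# Hypothetical: print [encoder_onehot(6), preset_norm, crf_norm].
# Assumes PRESET is an integer ordinal; unmatched encoders use "unknown".
codec_block() {
  local encoder=$1 preset=$2 crf=$3
  local slot=$(( ${#ENCODER_VOCAB[@]} - 1 ))   # default: the "unknown" slot
  local i onehot=()
  for i in "${!ENCODER_VOCAB[@]}"; do
    if [ "${ENCODER_VOCAB[$i]}" = "$encoder" ]; then slot=$i; fi
  done
  for i in "${!ENCODER_VOCAB[@]}"; do
    if [ "$i" -eq "$slot" ]; then onehot+=(1); else onehot+=(0); fi
  done
  # Preset normaliser /9, CRF normaliser /63 (union upper bound).
  awk -v oh="${onehot[*]}" -v p="$preset" -v c="$crf" \
    'BEGIN { printf "%s %.4f %.4f\n", oh, p / 9, c / 63 }'
}
```

Running codec_block libx265 4 30 prints 0 1 0 0 0 0 0.4444 0.4762. The one-hot lands in slot 1 only because libx265 is second in ENCODER_VOCAB, so reordering the vocabulary silently changes what every trained checkpoint sees.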
0311 — libFuzzer harness expansion: yuv_input + cli_parse (ADR-0311)
- ADR: ADR-0311; parent ADR-0270.
- Touches:
  - core/test/fuzz/fuzz_yuv_input.c (new)
  - core/test/fuzz/fuzz_cli_parse.c (new)
  - core/test/fuzz/meson.build — two new executable(...) blocks for the harnesses, plus a shared fuzz_vidinput_sources list.
  - core/test/fuzz/yuv_input_corpus/* (new; 6 seeds covering 8/10-bit × 4:2:0 / 4:2:2 / 4:4:4 plus a truncated-frame seed).
  - core/test/fuzz/cli_parse_corpus/* (new; 6 seeds covering the --feature, --model, --reference, YUV-flag, and --help shapes).
  - core/test/fuzz/README.md — Targets table extended.
  - .github/workflows/fuzz.yml — matrix gains fuzz_yuv_input + fuzz_cli_parse; the per-harness wall-clock budget drops from 300 s to 60 s so the 3-target matrix fits the existing timeout-minutes: 15 cap.
  - docs/development/fuzzing.md — runbook table + smoke commands extended.
  - docs/adr/0311-libfuzzer-harness-expansion.md (new)
  - docs/research/0083-libfuzzer-harness-expansion-target-survey.md (new)
  - libvmaf/AGENTS.md — new invariant block for the one-parser-one-harness rule.
  - CHANGELOG.md — Added entry.
- Invariant:
  - The fuzz scaffold remains opt-in (-Dfuzz=true); every default meson setup invocation must continue to skip it.
  - fuzz_yuv_input re-includes tools/yuv_input.c and the rest of the vidinput trio as build inputs. If upstream Netflix/vmaf splits or renames those source files, the matching meson.build source list needs updating.
  - fuzz_cli_parse re-includes tools/cli_parse.c as a build input and links against libvmaf for vmaf_version() and feature-dictionary symbols. The -Wl,--wrap=exit link arg is load-bearing: without it, usage()'s exit(1) would terminate the fuzzer process on the first bad input.
  - LLVMFuzzerTestOneInput keeps external linkage; the scaffold-wide // NOLINTNEXTLINE(misc-use-internal-linkage) pattern is correct for libFuzzer's name-resolved entry-point ABI.
- Rebase impact: any upstream sync that touches core/tools/{yuv_input,cli_parse}.c must re-run the 60 s smoke per harness on the merged tip. Record any newly found crash-* artefact under the matching <target>_known_crashes/ dir, not in <target>_corpus/. The __wrap_exit shim in fuzz_cli_parse.c is GNU-ld / lld-only; do not assume it works on Apple ld without an -undefined,dynamic_lookup fallback.
- Re-test on rebase:
CC=clang CXX=clang++ \
meson setup build-fuzz libvmaf \
--buildtype=debug \
-Db_sanitize=address \
-Db_lundef=false \
-Dfuzz=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz \
test/fuzz/fuzz_y4m_input \
test/fuzz/fuzz_yuv_input \
test/fuzz/fuzz_cli_parse
./build-fuzz/test/fuzz/fuzz_yuv_input \
-seed=0 -runs=1000 \
core/test/fuzz/yuv_input_corpus/
./build-fuzz/test/fuzz/fuzz_cli_parse \
-seed=0 -runs=1000 \
core/test/fuzz/cli_parse_corpus/
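The rule that new crash-* artefacts go under <target>_known_crashes/ and never under <target>_corpus/ can be scripted. The minimal sketch below assumes libFuzzer wrote its crash-* files into the current directory, which is its default when -artifact_prefix is not set. file_new_crashes is a hypothetical helper.

```bash
# Hypothetical: move ./crash-* into <fuzz_dir>/<target>_known_crashes/ and
# print how many files were moved. Never touches <target>_corpus/.
file_new_crashes() {
  local target=$1 fuzz_dir=${2:-core/test/fuzz} dest f moved=0
  dest="$fuzz_dir/${target}_known_crashes"
  mkdir -p -- "$dest"
  for f in crash-*; do
    [ -e "$f" ] || continue   # glob matched nothing
    mv -- "$f" "$dest/"
    moved=$((moved + 1))
  done
  echo "$moved"
}
```

For example, run file_new_crashes yuv_input from the directory where ./build-fuzz/test/fuzz/fuzz_yuv_input was launched. Then commit the moved reproducer alongside the fix that addresses it.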
0229 — libFuzzer scaffold for the YUV4MPEG2 parser (ADR-0270)
- ADR: ADR-0270
- Touches:
  - core/test/fuzz/fuzz_y4m_input.c (new)
  - core/test/fuzz/meson.build (new)
  - core/test/fuzz/README.md (new)
  - core/test/fuzz/y4m_input_corpus/* (new; six seeds)
  - core/test/fuzz/y4m_input_known_crashes/* (new; one 411-chroma OOB reproducer, excluded from the CI corpus)
  - core/test/meson.build — subdir('fuzz') line.
  - core/meson_options.txt — new option('fuzz', ...).
  - .github/workflows/fuzz.yml (new; nightly 5-minute job).
  - docs/development/fuzzing.md (new; operator runbook).
  - docs/adr/0270-fuzzing-scaffold.md (new)
  - docs/research/0059-libfuzzer-scaffold-y4m.md (new)
  - docs/state.md — new Open-bug row for the 411-chroma OOB write.
  - CHANGELOG.md — Added entry.
- Invariant: the fuzz scaffold is opt-in; every default meson setup invocation must continue to skip it. The harness links statically against core/tools/{y4m_input,yuv_input,vidinput}.c rather than libvmaf.so, so the public C-API surface stays unchanged.
- Rebase impact: the harness re-includes core/tools/y4m_input.c as a build input. Any upstream Netflix/vmaf change that splits or renames the tool sources (e.g. moves the parser into core/src/) needs the corresponding meson.build source-list update and the harness re-test below. The y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m reproducer is the regression gate for the parser fix; do not delete it on upstream sync. If upstream lands the same fix, port the reproducer back into y4m_input_corpus/ as a permanent seed.
- Re-test on rebase:
CC=clang CXX=clang++ \
meson setup build-fuzz libvmaf \
--buildtype=debug \
-Db_sanitize=address \
-Db_lundef=false \
-Dfuzz=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build-fuzz test/fuzz/fuzz_y4m_input
./build-fuzz/test/fuzz/fuzz_y4m_input \
-max_total_time=60 \
core/test/fuzz/y4m_input_corpus/
# Verify the known-crash reproducer still triggers (until the fix lands):
./build-fuzz/test/fuzz/fuzz_y4m_input \
core/test/fuzz/y4m_input_known_crashes/y4m_411_w2_h4_oob_dst.y4m
0231 — HIP seventh-consumer kernel float_motion_hip (ADR-0273)
- ADR: ADR-0273
- Touches:
  - core/src/feature/hip/float_motion_hip.c (new) — seventh consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/float_motion_cuda.c call-graph-for-call-graph: init/submit/collect/close invoke the kernel-template helpers in the same order, a flush() callback handles tail-frame motion2 emission, and the motion_force_zero short-circuit posture is kept (fex->extract swap with submit / collect / flush / close nulled). The submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (the kernel writes per-WG SAD float partials directly; no atomic, no memset).
  - core/src/feature/hip/float_motion_hip.h (new)
  - core/src/hip/meson.build — new entry in hip_sources.
  - core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — new sub-test test_float_motion_hip_extractor_registered (also asserts the VMAF_FEATURE_EXTRACTOR_TEMPORAL flag bit) and a row in test_table[].
  - docs/adr/0273-hip-seventh-consumer-float-motion.md (new)
  - docs/adr/README.md — index row.
  - docs/backends/hip/overview.md — seventh / eighth consumer note.
  - core/src/hip/AGENTS.md — invariant note.
  - CHANGELOG.md — Added entry (joint with ADR-0274).
- Invariant: the three-buffer ping-pong and the motion_force_zero short-circuit are load-bearing. The state struct carries three uintptr_t buffer slots (ref_in, blur[2]) that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *ref_in + VmafCudaBuffer *blur[2] field shape. The motion_force_zero short-circuit (fex->extract swap, kernel-template helpers nulled) must stay aligned with the CUDA twin on every refactor; otherwise the runtime PR's helper-body flip diverges between the two backends. The submit_pre_launch bypass mirrors the CUDA twin: if a future PR adds a submit_pre_launch call to float_motion_cuda.c's submit path, the HIP twin must follow in the same PR.
- Rebase impact: entirely fork-local. New files are HIP-specific. The only upstream-touching edit is feature_extractor.c, but the change sits inside an existing #if HAVE_HIP block (ADR-0241). Upstream has no HAVE_HIP, so no conflict is expected.
- Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke
0232 — HIP eighth-consumer kernel float_ssim_hip (ADR-0274)
- ADR: ADR-0274
- Touches:
  - core/src/feature/hip/float_ssim_hip.c (new) — eighth consumer of core/src/hip/kernel_template.h. Mirrors core/src/feature/cuda/integer_ssim_cuda.c call-graph-for-call-graph (the CUDA file registers vmaf_fex_float_ssim_cuda despite its integer_ filename). First multi-dispatch HIP consumer (chars.n_dispatches_per_frame == 2). The submit path intentionally bypasses vmaf_hip_kernel_submit_pre_launch (the kernel writes per-block float partials directly). The state struct carries five uintptr_t intermediate float buffer slots (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) tracked outside the kernel-template's readback bundle. The validate_dims_hip and init_dims_hip helpers are extracted from init() to fit the readability-function-size budget.
  - core/src/feature/hip/float_ssim_hip.h (new)
  - core/src/hip/meson.build — new entry in hip_sources.
  - core/src/feature/feature_extractor.c — extern declaration plus feature_extractor_list[] entry under #if HAVE_HIP.
  - core/test/test_hip_smoke.c — new sub-test test_float_ssim_hip_extractor_registered (also asserts chars.n_dispatches_per_frame == 2) and a row in test_table[].
  - docs/adr/0274-hip-eighth-consumer-float-ssim.md (new)
  - docs/adr/README.md — index row.
  - docs/backends/hip/overview.md — seventh / eighth consumer note (joint).
  - core/src/hip/AGENTS.md — invariant note.
  - CHANGELOG.md — Added entry (joint with ADR-0273).
- Invariant: multi-dispatch, the five-slot buffer pyramid, and the v1 scale=1 validation are load-bearing. The state struct carries five uintptr_t intermediate float buffer slots that the runtime PR (T7-10b) will swap for real device-buffer handles matching the CUDA twin's VmafCudaBuffer *h_* field shape; any drift in the CUDA twin's slot count requires a paired update here. The chars.n_dispatches_per_frame == 2 characteristic is asserted in the smoke test; do not silently lower it. The v1 scale=1 -EINVAL validation surface (in validate_dims_hip) must stay aligned with the CUDA twin's compute_scale / vmaf_log chain. The validate_dims_hip / init_dims_hip extraction is intentional for the function-size budget; do not re-inline without verifying the budget still passes.
- Rebase impact: entirely fork-local; same posture as ADR-0273.
- Re-test on rebase:
meson setup build libvmaf -Denable_hip=true \
-Denable_cuda=false -Denable_sycl=false -Denable_vulkan=disabled
ninja -C build
meson test -C build test_hip_smoke
0280 — vmaf-tune NVENC codec adapters (ADR-0290)
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/{h264_nvenc,hevc_nvenc,av1_nvenc,_nvenc_common}.py (new). Wholly fork-local; no upstream Netflix/vmaf overlap.
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py — registry expanded.
  - tools/vmaf-tune/tests/test_codec_adapter_nvenc.py (new).
  - tools/vmaf-tune/tests/test_corpus.py — Phase-A registry assertion updated.
  - tools/vmaf-tune/AGENTS.md — invariant note expanded.
  - docs/usage/vmaf-tune.md — "Hardware encoders (NVENC)" section.
  - docs/adr/0290-vmaf-tune-nvenc-adapters.md (new) + docs/adr/README.md index row.
  - docs/research/0065-vmaf-tune-nvenc-adapters.md (new).
  - CHANGELOG.md — Added entry.
- Invariant: known_codecs() returns the four-codec tuple ("av1_nvenc", "h264_nvenc", "hevc_nvenc", "libx264"). The mnemonic preset map (ultrafast/superfast/veryfast → p1, faster → p2, fast → p3, medium → p4, slow → p5, slower → p6, slowest/placebo → p7) is the canonical cross-codec preset alignment that downstream Phase B/C consumers assume. The CQ window is the hardware-permitted [0, 51]; the Phase A informative window is [15, 40].
- Rebase impact: zero. tools/vmaf-tune/ is wholly fork-local and has no upstream Netflix/vmaf path overlap.
- Re-test on rebase:
0227 — ffmpeg-patches/ series re-verified against n8.1 (2026-05-03)
0228 — vmaf-tune libx265 codec adapter (ADR-0288)
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (new)
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry add)
  - tools/vmaf-tune/src/vmaftune/encode.py (parse_versions(stderr, encoder=…) gains a per-codec branch)
  - tools/vmaf-tune/src/vmaftune/cli.py (help-text wording only)
  - tools/vmaf-tune/tests/test_codec_adapter_x265.py (new)
  - tools/vmaf-tune/tests/test_corpus.py (membership-based codec list assertion)
- Invariant: the codec-adapter contract documented in tools/vmaf-tune/AGENTS.md (multi-codec from day one; the search loop never branches on codec identity). The parse_versions signature is still backward-compatible: encoder defaults to libx264, so callers from before this PR keep working.
- Upstream source: fork-local. tools/vmaf-tune/ is fork-only; upstream Netflix/vmaf does not ship encode automation.
- On upstream sync: zero interaction. Confirm the _index_fragments/_order.txt row for 0288-vmaf-tune-codec-adapter-x265 remains present after any cross-merge.
- Re-test on rebase:
0278 — vmaf-tune libaom-av1 codec adapter (2026-05-03)
- Touches:
  - tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py (new)
  - tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py (registry row + import)
  - tools/vmaf-tune/tests/test_corpus.py (membership assertion relaxed from == ("libx264",) to "libx264" in known_codecs())
  - tools/vmaf-tune/tests/test_codec_adapter_libaom.py (new)
  - tools/vmaf-tune/AGENTS.md (preset-vocabulary invariant)
- Invariant: the cross-codec preset vocabulary (placebo, slowest, slower, slow, medium, fast, faster, veryfast, superfast, ultrafast) is shared across AV1-family adapters so one --preset axis covers x264 / x265 / svtav1 / libaom-av1. Each adapter maps the human name onto its codec-specific knob; do not introduce per-adapter preset names.
- Upstream source: fork-local. tools/vmaf-tune/ is the fork-introduced quality-aware encode automation harness (ADR-0237); it has no upstream Netflix/vmaf counterpart.
- On upstream sync: zero interaction with upstream/master. Self-contained in tools/vmaf-tune/ and docs/.
- Re-test on rebase:
0229 — vmaf_tiny_v3 + vmaf_tiny_v4 dynamic-PTQ int8 sidecars (ADR-0275)
- ADR: ADR-0275
- Touches:
  - model/tiny/vmaf_tiny_v3.int8.onnx (new, 4 267 B)
  - model/tiny/vmaf_tiny_v4.int8.onnx (new, 7 769 B)
  - model/tiny/registry.json — new vmaf_tiny_v3 and vmaf_tiny_v4 rows with quant_mode, int8_sha256, quant_accuracy_budget_plcc fields.
  - model/tiny/vmaf_tiny_v3.json, model/tiny/vmaf_tiny_v4.json — same fields mirrored into the per-model sidecars.
  - docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md — new "Quantisation" sections.
  - docs/adr/0275-vmaf-tiny-v3-v4-ptq.md (new) and ADR index row.
  - CHANGELOG.md — Added entry.
- Invariant: python ai/scripts/measure_quant_drop.py --all reports [PASS] for both vmaf_tiny_v3 (drop ≤ 0.001 on Netflix features) and vmaf_tiny_v4 (drop ≤ 0.001), inside the 0.01 per-model budget. The runtime redirect from ADR-0174 picks the .int8.onnx sibling when an operator's registry overlay declares quant_mode: dynamic.
- Rebase impact: entirely fork-local; neither v3 nor v4 nor the dynamic-PTQ harness exists upstream. The new int8 ONNX bytes ship as committed binaries (mirroring learned_filter_v1 and nr_metric_v1). They are well below the few-MB external-data threshold and don't require the sigstore + .onnx.data pattern.
- Re-test on rebase:
```bash
python ai/scripts/validate_model_registry.py
python ai/scripts/measure_quant_drop.py --all
```
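The [PASS] criterion above amounts to comparing a PLCC drop against a per-model budget. The minimal sketch below assumes the drop is fp32 PLCC minus int8 PLCC and that the budget defaults to 0.01. plcc_drop_verdict is hypothetical and does not reproduce measure_quant_drop.py's actual output format.

```bash
# Hypothetical: print [PASS]/[FAIL] for drop = fp32_plcc - int8_plcc <= budget.
plcc_drop_verdict() {
  awk -v f="$1" -v q="$2" -v b="${3:-0.01}" 'BEGIN {
    d = f - q
    printf "%s drop=%.4f budget=%.4f\n", (d <= b ? "[PASS]" : "[FAIL]"), d, b
  }'
}
```

For example, plcc_drop_verdict 0.9500 0.9492 reports a 0.0008 drop, which passes the 0.01 budget.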
0229 — NVIDIA-Vulkan ciede2000 places=4 fork debt root-cause (ADR-0273)
- Touched files: docs-only.
  - docs/adr/0273-...precision-gap.md (new) + _index_fragments/ row + _order.txt append.
  - docs/research/0055-ciede-vulkan-nvidia-f32-f64-root-cause.md (new) + docs/research/README.md index row.
  - docs/state.md — Open-bugs row T-VK-CIEDE-F32-F64.
  - docs/backends/vulkan/overview.md — NVIDIA-hardware caveat.
  - changelog.d/changed/ciede-vulkan-nvidia-f32-f64-precision-gap.md (new).
  - core/src/vulkan/AGENTS.md — invariant cross-link.
- Invariant: the ciede.comp shader's f32 precision contract is load-bearing. Promoting it to f64 would silently change scores on every Vulkan device that supports shaderFloat64 and create a per-device-feature-bit divergence (RTX 4090 has it; many consumer GPUs don't). The CPU ciede.c::get_lab_color runs its colour-space chain in double; that is upstream Netflix behaviour and must not be narrowed to f32 to "fix" the GPU gap, because doing so would change the Netflix golden ground truth. The 5/48 NVIDIA places=4 mismatch on the highest-ΔE frames is expected and documented; do not attempt to "fix" it without re-reading ADR-0273 first.
- Rebase impact: zero (docs-only). The CPU and shader sources this ADR analyses are unchanged by this PR. If a future upstream rebase touches ciede.c::get_lab_color (the double chain), the ADR's reasoning still holds. If upstream changes the CPU reference's precision posture, ADR-0273 needs a Status: Superseded entry.
- Re-test on rebase: a manual NVIDIA-hardware run, if available:
```bash
cd libvmaf && meson setup build \
  -Denable_vulkan=enabled -Denable_cuda=false && ninja -C build
cd ..
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary $PWD/core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature ciede --backend vulkan --device 0 --places 4
# Expected post-PR-346 (when merged): 5/48 mismatches at 1.78× threshold.
# Expected pre-PR-346 (current master): 42/48 mismatches at higher ratio.
# If the count drops below 5/48 on NVIDIA, ADR-0273 should record the
# delta and consider closing T-VK-CIEDE-F32-F64.
```
0229 — tools/vmaf-tune fast Phase A.5 scaffold (ADR-0276)
- Touches:
  - tools/vmaf-tune/src/vmaftune/fast.py (new)
  - tools/vmaf-tune/src/vmaftune/cli.py (new fast subcommand branch)
  - tools/vmaf-tune/pyproject.toml (new [fast] extra)
  - tools/vmaf-tune/tests/test_fast.py (new)
  - tools/vmaf-tune/AGENTS.md (new invariants)
  - docs/usage/vmaf-tune.md (new "Phase A.5" section)
  - docs/adr/0276-vmaf-tune-fast-path.md (new ADR)
  - docs/research/0060-vmaf-tune-fast-path.md (new digest)
- Invariant: the fast subcommand is opt-in and never automatically replaces the Phase A grid path. The slow grid is the ground-truth corpus generator (ADR-0237 contract); the fast path serves the recommendation use case only. Optuna is a lazy-imported optional dependency gated behind the [fast] extra. Importing it at module scope outside fast.py (or its tests) breaks the zero-dependency core install.
- Rebase impact: entirely fork-local. The tool sits under tools/vmaf-tune/, which is fork-added, and no upstream files are touched. Upstream Netflix/vmaf has no analogous surface.
- Re-test on rebase:
pip install -e 'tools/vmaf-tune[fast]'
pytest tools/vmaf-tune/tests/test_fast.py -v
vmaf-tune fast --smoke --target-vmaf 92
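The lazy-import rule above can be spot-checked with grep. The minimal sketch below assumes an eager import appears as an unindented `import optuna` or `from optuna import …` line, while lazy imports live inside function bodies and are therefore indented. find_eager_optuna_imports is a hypothetical helper, not an existing lint.

```bash
# Hypothetical: list module-scope optuna imports under ROOT, exempting
# fast.py and test_fast.py. Indented (function-local) imports are ignored.
find_eager_optuna_imports() {
  local root=${1:-tools/vmaf-tune}
  grep -rnE --include='*.py' -e '^(import|from) optuna' "$root" \
    | grep -vE '/(fast|test_fast)\.py:' || true
}
```

An empty result means no eager import was found. Each printed line names a file and line number whose import should move into a function body.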
0229 — vmaf-tune recommend subcommand (ADR-0237 Phase B-lite)
- Touches:
  - tools/vmaf-tune/src/vmaftune/recommend.py (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  - tools/vmaf-tune/src/vmaftune/cli.py — adds the recommend subparser; the corpus subcommand is untouched.
  - tools/vmaf-tune/tests/test_recommend.py (new). 13-case smoke suite that mocks all binaries; runs in <100 ms.
  - docs/usage/vmaf-tune.md — adds a ## recommend section.
- Invariant: recommend consumes the existing CORPUS_ROW_KEYS schema unchanged (vmaf_score, bitrate_kbps, crf, preset, encoder, exit_status). No schema bump. If a future PR bumps SCHEMA_VERSION, both the corpus writer and the recommend reader must be updated in lockstep; tests assert this via test_corpus_row_keys_match_init_contract.
- Rebase impact: zero. tools/vmaf-tune/ is wholly fork-local; no upstream surface touches it.
- Re-test on rebase:
0228 — integer_ms_ssim_cuda.c joins drain_batch (T-GPU-OPT-2 / ADR-0271)
- Touches:
  - core/src/feature/cuda/integer_ms_ssim_cuda.c. No upstream Netflix/vmaf changes are expected here: the file is fork-added (CUDA twin of the upstream-port ms_ssim_score.cu), and the surface this PR redrew is also entirely fork-local. That surface is the per-scale l_partials[i] / c_partials[i] / s_partials[i] arrays, the per-scale h_l_partials[i] / h_c_partials[i] / h_s_partials[i] pinned host shadows, the submit() / collect() work redistribution, and the cuEventRecord(s->lc.finished, s->lc.str) + vmaf_cuda_drain_batch_register(&s->lc) tail.
- Invariant: the engine-scope drain-batch contract from ADR-0271 / drain_batch.h. The kernel-launch order on s->lc.str must stay stable:
  1. decimate (× 4);
  2. for each scale i ∈ 0..4: horiz ⇒ vert_lcs ⇒ DtoH(l_partials[i]) ⇒ DtoH(c_partials[i]) ⇒ DtoH(s_partials[i]);
  3. cuEventRecord(s->lc.finished, s->lc.str);
  4. vmaf_cuda_drain_batch_register(&s->lc).
  Same-stream ordering is what makes the shared SSIM intermediates (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) safe across scales without explicit sync. Any change that parallelises the per-scale work onto multiple streams breaks bit-exactness unless per-scale intermediates are also added.
- On upstream sync: zero interaction (the file is fork-added). If a future upstream PR adds an integer_ms_ssim_cuda.c of its own, the merger must reconcile the per-scale partials topology + the drain_batch tail with whatever the new upstream shape brings.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build # confirms the CPU build still links cleanly
# If the dev host has a working nvcc / host-compiler pair:
meson setup build_cuda -Denable_cuda=true -Denable_sycl=false
ninja -C build_cuda src/liblibvmaf_feature.a.p/feature_cuda_integer_ms_ssim_cuda.c.o
# Netflix CPU golden gate (CPU is the bit-exactness ground truth):
make test-netflix-golden
# Cross-backend parity (places=4 gate, ADR-0214):
/cross-backend-diff
0277 — ffmpeg-patches refresh against n8.1 — 2026-05-04 (ADR-0277)
- Touches: `ffmpeg-patches/` is unchanged (no content drift). Doc-only entries land in:
  - `docs/adr/0277-ffmpeg-patches-refresh-2026-05-04.md`: new ADR.
  - `docs/adr/_index_fragments/0277-ffmpeg-patches-refresh-2026-05-04.md`: index row.
  - `docs/adr/_index_fragments/_order.txt`: manifest append.
  - `changelog.d/changed/ffmpeg-patches-refresh-2026-05-04.md`: Changed entry.
  - This file: this entry.
- Invariant: `ffmpeg-patches/series.txt` order is load-bearing. Patches `0002`…`0006` build on each other and only apply cleanly cumulatively. The verification gate is therefore a series replay, not a per-patch `git apply --check` (per ADR-0118 + CLAUDE.md §12 r14).
- On upstream sync: zero interaction. Netflix/vmaf has no `ffmpeg-patches/` tree; this is a fork-local integration surface.
- Re-test on rebase (this is also the replay procedure for the next refresh):
```bash
# Compute the date stamp once so every path below agrees, even across midnight
stamp=$(date +%F)
# Clone pristine n8.1
git -C /tmp clone --depth 1 --branch n8.1 \
    https://github.com/FFmpeg/FFmpeg.git ff-replay-"$stamp"
cd /tmp/ff-replay-"$stamp"
git switch -c refresh-"$stamp"
git config user.email refresh@local && git config user.name "Refresh Bot"
# Replay the series cumulatively
for p in /path/to/vmaf/ffmpeg-patches/000*-*.patch; do
    git am --3way "$p" || break
done
# Regenerate and compare to in-tree
mkdir -p /tmp/ff-regen-"$stamp"
git format-patch n8.1.. -o /tmp/ff-regen-"$stamp"/
# Diff old vs new, excluding pure format-patch noise
for i in 1 2 3 4 5 6; do
    orig=$(ls /path/to/vmaf/ffmpeg-patches/000${i}-*.patch)
    regen=$(ls /tmp/ff-regen-"$stamp"/000${i}-*.patch)
    diff -u \
        <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$orig") \
        <(grep -v "^From [0-9a-f]\|^Date:\|^index " "$regen") \
        | head -40
done
```
If only stylistic diffs surface (`PATCH N/M` numbering, MIME headers, hunk-context counts, hunk offset shifts against cumulative state), keep the originals and record a no-drift refresh ADR. If real content drift surfaces, regenerate the patches and ship the refresh PR with them plus a content-summary ADR.

End-to-end `vf_libvmaf` smoke is best run from CI (`ffmpeg-integration.yml`) against an installed libvmaf prefix. The meson-uninstalled `.pc` does not satisfy FFmpeg's `#include <libvmaf.h>` probe. The headers live only under `libvmaf/libvmaf.h`, and the system-installed `.pc` carries an extra `-I${includedir}/libvmaf` shortcut that the uninstalled `.pc` omits.
0229 — T7-5 NOLINT-sweep closeout (ADR-0278)
- Touched files:
  - `core/src/feature/integer_adm.c` (1 NOLINT cite at line ~988, `adm_decouple_s123`; upstream-mirror Netflix `966be8d5`).
  - `core/src/feature/cuda/ssimulacra2_cuda.c` (3 NOLINT cites: `ss2c_picture_to_linear_rgb`, `ss2c_host_combine`, `ss2c_run_scale_gpu` / `extract_fex_cuda`).
  - `core/src/feature/vulkan/ssimulacra2_vulkan.c` (3 NOLINT cites: `ss2v_setup_gaussian`, `ss2v_picture_to_linear_rgb`, `ss2v_run_scale`).
  - `core/src/feature/vulkan/cambi_vulkan.c` (1 NOLINT cite: `cambi_vk_extract`).
  - `core/src/feature/sycl/integer_adm_sycl.cpp` (6 cites, SYCL kernel-launch entries).
  - `core/src/feature/sycl/integer_motion_sycl.cpp` (2 cites).
  - `core/src/feature/sycl/integer_vif_sycl.cpp` (4 cites).
  - `core/tools/vmaf.c` (3 cites: `copy_picture_data`, `init_gpu_backends`, `main`).
- Invariant: zero behavioural change. All edits are inside comment blocks. Each appends `(ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278)` to an existing prose justification, and no function bodies are split. The 12 SYCL sites share an identical justification string verbatim. Preserving that byte-for-byte duplicate is the load-bearing documentation pattern, because it keeps the string grep-able across the SYCL TUs.
- On upstream sync: minimal interaction. The cite-only edits live inside comment blocks above the function signatures. Rebases will surface them as touched lines, but the function bodies are unchanged. For the upstream-mirror block in `integer_adm.c` (Netflix `966be8d5`), the comment edit at lines 984–991 is cosmetic. Keep the fork's version on conflict: it only names the ADR, and the underlying prose is unchanged.
- Re-test on rebase:
```bash
# 1. Programmatic audit must report 0 missing citations
python3 - <<'PY'
import os
import re

# Walk the fork's C/C++ sources (every touched file above lives under core/).
paths = [os.path.join(root, f)
         for root, _, files in os.walk('core/src')
         for f in files if f.endswith(('.c', '.cpp', '.h'))]
# The CLI driver lives outside core/src; accept either extension (listed as vmaf.c above).
for extra in ('core/tools/vmaf.c', 'core/tools/vmaf.cpp'):
    if os.path.exists(extra):
        paths.append(extra)

miss = total = 0
for p in paths:
    with open(p, encoding='utf-8', errors='replace') as fh:
        ls = fh.readlines()
    for i, line in enumerate(ls):
        if ('NOLINT' in line and 'readability-function-size' in line
                and 'NOLINTEND' not in line):
            total += 1
            # Gather the contiguous comment block (at most 13 lines) above the site.
            ctx = [line]
            j = i - 1
            while j >= 0 and j > i - 14:
                s = ls[j].strip()
                if not s or not s.startswith(('//', '/*', '*')):
                    break
                ctx.insert(0, ls[j])
                j -= 1
            buf = ''.join(ctx)
            if 'ADR-' not in buf and not re.search(r'[Rr]esearch-?\d', buf):
                miss += 1
print(f"sites={total} missing={miss}")
PY

# 2. Build + Netflix golden gate
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
make test-netflix-golden
```
0231 — vmaf-tune score path decodes mp4 -> raw YUV
- Touches:
  - `tools/vmaf-tune/src/vmaftune/score.py`: new `_decode_to_raw_yuv` + `_needs_decode` helpers. `run_score` shells out to ffmpeg when `req.distorted.suffix not in {.yuv, .y4m}`.
  - `tools/vmaf-tune/tests/test_corpus.py`: 3 new regression tests. The smoke end-to-end mock now also stubs the ffmpeg decode call.
- Invariant: the decode-back exists because the libvmaf CLI imposes it. An mp4/webm/etc. file passed as `--distorted` is silently read as raw YUV. The byte count is wrong, so the run is rejected and surfaces as `exit_status=234`. Future encoder adapters that emit non-raw containers inherit this decode automatically. Do not "optimise" the temp YUV away without first migrating the corpus pipeline to the `ffmpeg` + `libvmaf` filter, which can pipe an mp4 stream in directly.
- On upstream sync: zero interaction. `vmaf-tune` is fork-only tooling; upstream Netflix/vmaf has no analogue.
- Re-test on rebase:

```bash
cd tools/vmaf-tune && python3 -m pytest tests/
# plus an end-to-end smoke (needs a real raw YUV + ffmpeg + vmaf):
./vmaf-tune corpus --source /path/to/ref.yuv --width 1920 \
    --height 1080 --pix-fmt yuv420p --framerate 25 --duration 6 \
    --encoder libx264 --preset medium --crf 23 \
    --output /tmp/smoke.jsonl --no-source-hash
# expect: vmaf_score is a real number, not NaN.
```
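The suffix check and the decode step can be sketched as below. This assumes a plain `ffmpeg -i … -f rawvideo -pix_fmt …` invocation. `needs_decode` and `decode_argv` are hypothetical stand-ins for the fork's private `_needs_decode` / `_decode_to_raw_yuv`, whose exact ffmpeg flags are not recorded here.

```python
from pathlib import Path

# Formats the libvmaf CLI reads directly; anything else is decoded first.
RAW_SUFFIXES = {".yuv", ".y4m"}


def needs_decode(distorted: Path) -> bool:
    """True when the distorted file must be decoded back to raw YUV."""
    return distorted.suffix not in RAW_SUFFIXES


def decode_argv(src: Path, dst: Path, pix_fmt: str = "yuv420p") -> list[str]:
    """Build an ffmpeg command that decodes `src` into headerless planar YUV at `dst`."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(src),
        "-f", "rawvideo", "-pix_fmt", pix_fmt,
        str(dst),
    ]
```

A caller would run `subprocess.run(decode_argv(enc, tmp_yuv), check=True)` and hand `tmp_yuv` to libvmaf. The `pix_fmt` has to match the reference's layout; otherwise the byte-count check fails in the same way.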
0232 — CUDA build pins nvcc --std c++20
- Touches: `core/src/meson.build` line 686 (`cuda_flags = [...]`).
- Invariant: nvcc 12.x clamps host C++ at C++17 by default; 13.x accepts up to C++20. Moving the host stdlib past nvcc's default breaks the host-side parse in `<type_traits>` / `<bits/utility.h>`. This happens with any gcc >= 16, whose libstdc++ ships C++23 features. Forcing `--std c++20` on CUDA 13+ keeps the host headers parseable. Do not drop this flag without first checking the host gcc version against nvcc's default.
- On upstream sync: zero interaction. Netflix/vmaf doesn't ship the `cuda_flags` list shape we use; their CUDA build follows the original pre-fork pattern. A sync that touches `core/src/meson.build` around the `is_cuda_enabled` branch should keep the `--std c++20` injection.
- Re-test on rebase:

```bash
meson setup core/build-cuda -Denable_cuda=true \
    -Denable_sycl=false -Denable_vulkan=disabled
ninja -C core/build-cuda
# smoke
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
    -r .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -d .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    -w 1920 -h 1080 -p 420 -b 8
```
0233 — CUDA motion flush_fex_cuda idempotency guard
- Touches: `core/src/feature/cuda/integer_motion_cuda.c`. This PR factored out an `append_if_unwritten` helper and routed the two motion2 / motion3 final-frame writes through it.
- Invariant: under T-GPU-OPT-1 (PR #312 / ADR-0242), the pending-collect inside `flush_context_cuda` may already have written `motion2_score[s->index]` / `motion3_score[s->index]` before `flush_fex_cuda` runs. Any future motion-cuda flush logic that emits the same (feature, index) pair must keep this idempotency contract. Otherwise `flush_context_cuda` will mis-surface the duplicate write as "context could not be synchronized".
- On upstream sync: the bug only exists because the fork's `flush_context_cuda` runs the pending-collect before the per-extractor flush. Netflix/vmaf upstream doesn't have the T-GPU-OPT-1 drain pattern, so the pre-#312 code path never wrote the same entry twice. If Netflix lands a similar pattern, the fix should mirror the one here.
- Re-test on rebase:

```bash
ninja -C core/build-cuda
./core/build-cuda/tools/vmaf --gpumask=0 --no_sycl --no_vulkan \
    -r python/test/resource/yuv/src01_hrc00_576x324.yuv \
    -d python/test/resource/yuv/src01_hrc01_576x324.yuv \
    -w 576 -h 324 -p 420 -b 8 \
    --model path=model/vmaf_v0.6.1.json --threads 1 -q \
    --output /tmp/cuda.json --json
# Expect: clean run, no "cannot be overwritten" warning,
# no "problem flushing context" error.
```
0234 — hw_encoder_corpus.py Phase A real-corpus runner
- Touches: new `scripts/dev/hw_encoder_corpus.py` (no existing caller; opt-in tooling) and `docs/development/intel-arc-vaapi-driver-priority.md`. Output landing in `runs/phase_a/` is gitignored; rerun the script to reproduce. (… stratified sample, 58 KiB)
- Invariant: the script's QSV path forces `env['LIBVA_DRIVER_NAME']='iHD'` (set by the calling shell, not inside the script) when targeting `/dev/dri/renderD129` on a multi-card host that has NVIDIA's libva-driver-nvidia shim installed. Without that setting, libva picks up NVIDIA's NVDEC-VAAPI translation layer and the MFX session handshake fails with -9. See the companion doc for the failure mode and the fix.
- On upstream sync: zero interaction. The script lives under `scripts/dev/` (fork-only); upstream Netflix/vmaf has no comparable Phase A corpus tooling.
- Re-test on rebase:

```bash
python3 scripts/dev/hw_encoder_corpus.py \
    --vmaf-bin core/build-cuda/tools/vmaf \
    --source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
    --width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
    --encoder h264_nvenc --cq 25 \
    --out /tmp/smoke.jsonl
# Expect: 1 cell × ~150 frames, per-frame canonical-6 + vmaf,
# encoder=h264_nvenc, cq=25.
```
0235 — fr_regressor_v2 ENCODER_VOCAB v2 (hw codec extension)
- Touches: `ai/scripts/train_fr_regressor_v2.py`.
  - `ENCODER_VOCAB` gains 6 hw-codec entries (3 NVENC + 3 QSV).
  - `ENCODER_VOCAB_VERSION` bumps 1 -> 2.
  - `PRESET_ORDINAL` gains 6 sub-tables for `p1`..`p7` (NVENC) and the libx264-aligned QSV preset family.
- Invariant: vocab order is load-bearing, because the index of every entry is baked into trained model graphs as a one-hot column position. New entries MUST be appended (never inserted into the middle), and the `unknown` sentinel MUST stay last (`UNKNOWN_ENCODER_INDEX = N - 1`). Bumping `ENCODER_VOCAB_VERSION` signals that any v1-graph ONNX must be re-exported against v2 before it consumes v2 training rows.
- On upstream sync: zero interaction. `train_fr_regressor_v2.py` is fork-only (Phase B prereq, ADR-0237 / ADR-0272).
- Re-test on rebase: run the command below and expect PLCC > 0.95 on a multi-codec corpus.

```bash
python3 ai/scripts/train_fr_regressor_v2.py --corpus <jsonl> --epochs 200 --no-export
```
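The append-only rule can be checked mechanically. In this minimal sketch the vocab is a plain tuple of encoder names that ends in `"unknown"`. `V1`, `V2`, `is_compatible_extension` and `one_hot` are hypothetical, and the codec names are illustrative, not the fork's actual table.

```python
# Illustrative vocabularies only; the real ENCODER_VOCAB lives in train_fr_regressor_v2.py.
V1 = ("libx264", "libx265", "libsvtav1", "unknown")
V2 = ("libx264", "libx265", "libsvtav1",
      "h264_nvenc", "hevc_nvenc", "av1_nvenc",
      "h264_qsv", "hevc_qsv", "av1_qsv",
      "unknown")


def is_compatible_extension(old: tuple, new: tuple) -> bool:
    """Every old non-sentinel entry keeps its index, and `unknown` stays last."""
    known_old = old[:-1]
    return new[-1] == "unknown" and new[:len(known_old)] == known_old


def one_hot(encoder: str, vocab: tuple) -> list[int]:
    """One-hot row; unrecognised encoders map to the last (`unknown`) column."""
    idx = vocab.index(encoder) if encoder in vocab else len(vocab) - 1
    return [1 if i == idx else 0 for i in range(len(vocab))]
```

The `unknown` column itself moves in this example (index 3 in `V1`, 9 in `V2`). That shift is one reason a v1 graph cannot consume v2 rows without re-export, even when the append-only rule holds.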
0276 — vmaf_tiny_v5 corpus-expansion probe (ADR-0287) — defer
- What changed: research-only addition.
  - New scripts under `ai/scripts/`: `fetch_youtube_ugc_subset.py`, `extract_ugc_features.py`, `train_vmaf_tiny_v5.py`, `eval_loso_vmaf_tiny_v5.py`.
  - New ADR `docs/adr/0276-*.md`, new research digest `docs/research/0057-*.md`, and one CHANGELOG entry.
  - No new ONNX artefact under `model/tiny/`, no registry change, and no public C-API / CLI / meson_options change.

  The probe trained an architecturally identical mlp_small on a 5-corpus parquet (the 4-corpus set + 27 000 UGC rows). The 1-σ ship gate did not clear (Δ PLCC = +0.00005), so the exporter the prior agent had drafted (`export_vmaf_tiny_v5.py`) was discarded before the commit.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI corpus-expansion surface; nothing on the upstream side touches these files.
- On upstream sync: zero interaction. The v5 surface lives entirely under `ai/scripts/`, `docs/adr/` and `docs/research/`, all of which are fork-introduced trees. The shipped v2 model (`model/tiny/vmaf_tiny_v2.onnx`) and its registry row are untouched.
- Re-test on rebase:

```bash
# No code under test on rebase: purely research artefacts.
# If revisiting the corpus expansion, the reproducer is in the
# research digest:
python3 ai/scripts/fetch_youtube_ugc_subset.py \
    --out-dir .workingdir2/ugc/download \
    --n-stems 30 \
    --manifest .workingdir2/ugc/manifest.json
python3 ai/scripts/extract_ugc_features.py \
    --manifest .workingdir2/ugc/manifest.json \
    --yuv-dir .workingdir2/ugc/yuv \
    --vmaf-bin build-cpu/tools/vmaf \
    --out-parquet runs/full_features_ugc.parquet \
    --max-height 360 --max-frames 300 --threads 8
python3 ai/scripts/eval_loso_vmaf_tiny_v5.py \
    --parquet-base runs/full_features_4corpus.parquet \
    --parquet-extra runs/full_features_ugc.parquet \
    --out-json runs/vmaf_tiny_v5_loso_metrics.json
```
0227 — vmaf-tune Intel QSV codec adapters (ADR-0281)
- What changed: fork-local additions.
  - Under `tools/vmaf-tune/src/vmaftune/codec_adapters/`: `_qsv_common.py`, `h264_qsv.py`, `hevc_qsv.py`, `av1_qsv.py`.
  - Registry rows in `codec_adapters/__init__.py`.
  - New test file `tools/vmaf-tune/tests/test_codec_adapter_qsv.py`.
  - Doc updates: `docs/usage/vmaf-tune.md` (Hardware encoders section), `docs/adr/0281-vmaf-tune-qsv-adapters.md`, `docs/research/0066-vmaf-tune-qsv-adapters.md`, `tools/vmaf-tune/AGENTS.md`, `CHANGELOG.md`.
- Upstream source: fork-local. `tools/vmaf-tune/` is fork-introduced under ADR-0237; Netflix/vmaf has no corresponding tree.
- On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths.
- Invariant: three rules hold.
  - The registry exposes exactly four codecs: `av1_qsv`, `h264_qsv`, `hevc_qsv`, `libx264` (alphabetical).
  - Each adapter validates its `(preset, quality)` pair.
  - The QSV preset vocabulary is the seven x264-style names from `veryslow` to `veryfast`; `ultrafast` and `superfast` are excluded.

  The encode pipeline (`encode.py`) is still tied to x264 CRF and will be widened in a separate PR; until then the QSV adapters are inert. Future codec families that share the same parameter shape (NVENC, AMF) should follow the same pattern: one `_<family>_common.py` plus N thin adapters.
- Re-test on rebase:
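The shared-helper-plus-thin-adapters pattern can be illustrated as follows. This is a sketch, not the re-test procedure. Only the seven-name preset list comes from the invariant above. The 1–51 quality window, `validate_qsv`, `QsvAdapter` and `REGISTRY` are hypothetical.

```python
from dataclasses import dataclass

# Seven x264-style names, slowest to fastest; ultrafast/superfast are excluded.
QSV_PRESETS = ("veryslow", "slower", "slow", "medium", "fast", "faster", "veryfast")
QSV_QUALITY_RANGE = (1, 51)  # assumed window, for illustration only


def validate_qsv(preset: str, quality: int) -> None:
    """Shared check every QSV adapter delegates to (the `_qsv_common.py` role)."""
    if preset not in QSV_PRESETS:
        raise ValueError(f"unsupported QSV preset: {preset!r}")
    lo, hi = QSV_QUALITY_RANGE
    if not lo <= quality <= hi:
        raise ValueError(f"quality {quality} outside [{lo}, {hi}]")


@dataclass(frozen=True)
class QsvAdapter:
    """Thin per-codec adapter: only the ffmpeg encoder name differs."""
    encoder: str

    def validate(self, preset: str, quality: int) -> None:
        validate_qsv(preset, quality)


REGISTRY = {name: QsvAdapter(name) for name in ("av1_qsv", "h264_qsv", "hevc_qsv")}
```

Keeping all validation in one shared function means a preset-list change touches a single file. The per-codec modules stay declarative.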
0229 — vmaf-tune libvvenc + NN-VC codec adapter (ADR-0285)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/vvenc.py` (new fork-only file).
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` (registry edit, fork-only).
  - `tools/vmaf-tune/tests/test_codec_adapter_vvenc.py` (new).
  - `tools/vmaf-tune/tests/test_corpus.py`: relaxes the `known_codecs() == ("libx264",)` assertion to `"libx264" in known_codecs()`, since the registry now spans multiple codecs.
- Invariant: the codec-adapter registry is fork-introduced (Phase A of ADR-0237) and lives entirely outside the upstream Netflix tree, so `tools/vmaf-tune/` does not touch upstream paths. The only rebase-sensitive surface is the `CORPUS_ROW_KEYS` schema in `src/vmaftune/__init__.py` (per the Phase A invariant in `tools/vmaf-tune/AGENTS.md`). This PR adds the adapter without changing the schema.
- Upstream interaction: none. `tools/vmaf-tune/` is not in Netflix/vmaf upstream.
- Re-test on rebase:
- Status update 2026-05-09: the original `nnvc_intra` toggle was removed, because it emitted a fabricated `IntraNN` key that does not exist in any released VVenC. It was replaced with a curated 9-knob tuning surface taken from real VVenC 1.14.0: `PerceptQPA`, `InternalBitDepth`, `Tier`, `Tiles`, `MaxParallelFrames`, `RPR`, `SAO`, `ALF`, `CCALF`. The defaults preserve the bit-exact Phase A grid baseline. `adapter_version` was bumped to `"2"` so cache keys invalidate. See ADR-0285 §"Status update 2026-05-09". No rebase impact (fork-local file, no upstream-tree touch).
0228 — vmaf-tune Phase D scaffold (ADR-0276)
- Touches: `tools/vmaf-tune/src/vmaftune/per_shot.py`, `tools/vmaf-tune/src/vmaftune/cli.py`, `tools/vmaf-tune/tests/test_per_shot.py`, `docs/usage/vmaf-tune.md`, `docs/adr/0276-vmaf-tune-phase-d-per-shot.md`.
- Invariant: scaffold-only.
  - The module relies on a stable predicate signature, `(shot, target_vmaf, encoder) -> (crf, predicted_vmaf)`, which Phase B's bisect (PR #347) drops into later.
  - `Shot` ranges are half-open `[start_frame, end_frame)`, even though the C-side `vmaf-perShot` JSON/CSV sidecar uses an inclusive `end_frame`. Normalisation happens at the parse boundary in `_parse_per_shot_json` / `parse_per_shot_csv`.
  - The `vmaf-perShot` schema lives in `docs/usage/vmaf-perShot.md` and is fork-local (ADR-0222), so upstream cannot drift it. The only rebase risk is fork-internal renames.
- Upstream source: entirely fork-local. `tools/vmaf-tune/` is fork-introduced (ADR-0237). Netflix/vmaf upstream has no encode-automation surface.
- On upstream sync: zero interaction expected. No file in this PR overlaps an upstream-mirrored path.
- Re-test on rebase:

```bash
python -m pytest tools/vmaf-tune/tests/test_per_shot.py -q
python tools/vmaf-tune/vmaf-tune tune-per-shot --help
```
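Converting from an inclusive to a half-open range at the parse boundary amounts to adding one to the sidecar's `end_frame`. This minimal sketch assumes each sidecar record carries integer `start_frame` / `end_frame` fields. `Shot` here and `shot_from_sidecar` are simplified stand-ins, not the module's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Half-open frame range [start_frame, end_frame)."""
    start_frame: int
    end_frame: int

    @property
    def num_frames(self) -> int:
        return self.end_frame - self.start_frame


def shot_from_sidecar(record: dict) -> Shot:
    """Convert a vmaf-perShot record (inclusive end_frame) into a half-open Shot."""
    start = int(record["start_frame"])
    end_inclusive = int(record["end_frame"])
    if end_inclusive < start:
        raise ValueError(f"inverted shot range: {record}")
    return Shot(start, end_inclusive + 1)
```

With half-open ranges, consecutive shots tile the clip exactly. A sidecar pair `0..23` / `24..47` becomes `[0, 24)` / `[24, 48)`, so one shot's `end_frame` equals the next shot's `start_frame`, and frame counts are a plain subtraction.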
0229 — vmaf-tune SVT-AV1 codec adapter (ADR-0278)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/svtav1.py` (new).
  - `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` (registry).
  - `tools/vmaf-tune/src/vmaftune/encode.py` (`parse_versions` extended for the SVT-AV1 banner pattern).
  - `tools/vmaf-tune/src/vmaftune/corpus.py` (optional `ffmpeg_preset_token` hook).
- Invariant: `PRESET_NAME_TO_INT` is closed and order-stable. Its integer values are baked into corpus rows that the downstream `fr_regressor_v2` (ADR-0235) trains on, so reordering or rewriting the table silently changes the integer SVT-AV1 receives. The codec key `"libsvtav1"` matches `CODEC_VOCAB[2]` in `ai/src/vmaf_train/codec.py`; keep the two aligned on any rename.
- Upstream source: fork-local. `tools/vmaf-tune/` is a fork-introduced tree (see entry 0227, Phase A scaffold). No Netflix/vmaf upstream interaction.
- On upstream sync: zero interaction. The adapter lives entirely under the fork-local `tools/vmaf-tune/` tree.
- Re-test on rebase:
0230 — fr_regressor_v2 PROD ship (ADR-0352)
- ADR: ADR-0352
- Touches:
  - `model/tiny/fr_regressor_v2.onnx` (binary, refreshed).
  - `model/tiny/fr_regressor_v2.json` (sidecar, sha256 + metrics).
  - `model/tiny/registry.json` (smoke flag flip, sha256 update).
  - `runs/phase_a/full_grid/per_frame_canonical6.jsonl` (training corpus; fork-local artefact under `runs/`).
  - Companion docs.
- Re-test recipe: see Research-0068 §Reproducer. The ship gate is LOSO PLCC ≥ 0.95 on the per-source folds; the current run reports 0.9681 ± 0.0207.
- Rebase invariant: the per-frame canonical-6 corpus must be rebuilt from `runs/phase_a/{nvenc,qsv}_pf.jsonl` (PR #392) before any retrain. Do not retrain against the cell-only `comprehensive.jsonl`: it lacks the per-frame features and produces PLCC ≈ 0.7 (the smoke baseline).
- No upstream interaction: `fr_regressor_v2` is fork-local (ADR-0272).
0229 — vmaf-tune Phase E ladder generator (ADR-0295)
- ADR: ADR-0295
- Touches: entirely fork-local under `tools/vmaf-tune/`.
  - New module `tools/vmaf-tune/src/vmaftune/ladder.py`.
  - New test file `tools/vmaf-tune/tests/test_ladder.py`.
  - Two new subcommand blocks in `tools/vmaf-tune/src/vmaftune/cli.py`.

  No upstream-shared paths are touched.
- Invariant:
  - `vmaftune.ladder.convex_hull` returns a strictly monotonic Pareto frontier: bitrate and vmaf both increase monotonically.
  - `select_knees` returns exactly `min(n, len(hull))` rungs, in ascending bitrate order.
  - `emit_manifest("hls")` produces one `#EXT-X-STREAM-INF` per rung, with monotonically increasing `BANDWIDTH=` values.
  - The default `_default_sampler` intentionally raises `NotImplementedError`. Production callers must inject a Phase B bisect-driven sampler. The Phase B integration PR (gated on PR #347) swaps the default; the test suite continues to inject a synthetic stub.
- Rebase impact: none. This is a fork-local Python tool; upstream Netflix/vmaf does not ship a `tools/vmaf-tune/` tree.
- Re-test on rebase:
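The frontier and manifest invariants can be illustrated with the sketch below, which assumes samples are `(bitrate_kbps, vmaf)` pairs. It is not the re-test procedure. It computes the plain Pareto frontier, which can keep more points than the upper convex hull the real `convex_hull` may compute. `pareto_frontier` and `hls_lines` are hypothetical helpers, not the module's API. `#EXT-X-STREAM-INF` and its `BANDWIDTH` attribute (bits per second) are standard HLS.

```python
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep (bitrate, vmaf) points that no cheaper point matches or beats on vmaf.

    The result is sorted by bitrate, with both coordinates strictly increasing.
    """
    frontier: list[tuple[float, float]] = []
    # Ascending bitrate; on ties, highest vmaf first so the best point is seen first.
    for rate, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if frontier and rate == frontier[-1][0]:
            continue  # same bitrate, lower-or-equal vmaf: dominated
        if not frontier or score > frontier[-1][1]:
            frontier.append((rate, score))
    return frontier


def hls_lines(rungs: list[tuple[float, float]]) -> list[str]:
    """Master-playlist lines: one #EXT-X-STREAM-INF (BANDWIDTH in bit/s) per rung."""
    lines = ["#EXTM3U"]
    for i, (rate_kbps, _vmaf) in enumerate(rungs):
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={int(rate_kbps * 1000)}")
        lines.append(f"rung_{i}.m3u8")
    return lines
```

Because the frontier's bitrates are strictly increasing, any ascending-order subset chosen as knees automatically satisfies the "monotonically increasing `BANDWIDTH=`" rule.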
0229 — fr_regressor_v2 probabilistic head scaffold (ADR-0279)
- Touches:
  - `ai/scripts/train_fr_regressor_v2_ensemble.py` (new; fork-local).
  - `ai/scripts/eval_probabilistic_proxy.py` (new; fork-local).
  - `model/tiny/fr_regressor_v2_ensemble_v1*.onnx`, `fr_regressor_v2_ensemble_v1.json` (new artefacts; smoke probes).
  - `model/tiny/registry.json`: five new `kind: "fr"` rows (`fr_regressor_v2_ensemble_v1_seed{0..4}`); existing entries untouched.
  - `ai/AGENTS.md`: new "fr_regressor_v2_ensemble_v1 — probabilistic head" section. It pins the per-member ONNX I/O contract, the manifest-as-runtime-entry-point invariant, the ensemble-size pin, the one-of confidence rule, codec-vocab parity, and the smoke-artefact posture.
  - `docs/ai/models/fr_regressor_v2_probabilistic.md` (new model card).
  - `docs/research/0067-fr-regressor-v2-probabilistic.md` (new audit digest).
  - `docs/adr/0279-fr-regressor-v2-probabilistic.md` (new ADR; Proposed). Index row appended to `docs/adr/README.md`.
  - `CHANGELOG.md`: `### Added` row under "Unreleased — lusoris fork".
- Invariant: two contracts are load-bearing for the C-side adapter.
  - The per-member ONNX I/O contract: two inputs, `features [N, 6]` (standardised) and `codec_onehot [N, NUM_CODECS]`, and one output, `score [N]`.
  - The manifest's `confidence` rule: one of `"ensemble"` / `"ensemble+conformal"`.

  Per-member ensembles are stock `FRRegressor(num_codecs=NUM_CODECS)` calls. Flipping to a v1-shaped single-input graph silently invalidates the manifest. `CODEC_VOCAB` parity with `ai/src/vmaf_train/codec.py` is required.
- On upstream sync: zero interaction expected. Wholly fork-local; no upstream Netflix/vmaf path overlap. The `ai/` package is fork-introduced (see ADR-0021, ADR-0036), and upstream has no probabilistic-regressor surface. If upstream ever ships its own `fr_regressor_v2` variant, do NOT merge; register both ids side-by-side.
- Re-test on rebase:

```bash
python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke
python ai/scripts/eval_probabilistic_proxy.py --smoke
python ai/scripts/validate_model_registry.py
```
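The sketch below shows one way the two `confidence` rules could turn five member scores into an interval. It assumes `"ensemble"` reports the member mean ± 2 member standard deviations. It also assumes `"ensemble+conformal"` uses a split-conformal half-width computed from absolute calibration residuals. `combine` and `conformal_quantile` are hypothetical; the authoritative semantics live in ADR-0279 and `ai/AGENTS.md`.

```python
import math
import statistics
from typing import Optional, Sequence


def conformal_quantile(abs_residuals: Sequence[float], alpha: float = 0.1) -> float:
    """Split-conformal half-width: the ceil((n+1)(1-alpha))-th smallest residual."""
    n = len(abs_residuals)
    k = max(1, math.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(abs_residuals)[k - 1]


def combine(member_scores: Sequence[float], rule: str,
            calib_abs_residuals: Optional[Sequence[float]] = None,
            alpha: float = 0.1) -> tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) for one frame."""
    if rule not in ("ensemble", "ensemble+conformal"):
        raise ValueError(f"unknown confidence rule: {rule!r}")
    mean = statistics.mean(member_scores)
    if rule == "ensemble":
        spread = statistics.stdev(member_scores)
        return mean, mean - 2 * spread, mean + 2 * spread
    q = conformal_quantile(calib_abs_residuals or [], alpha)
    return mean, mean - q, mean + q
```

The conformal variant needs a held-out calibration set, but its coverage guarantee does not depend on the members being well calibrated. The plain ensemble band has no such guarantee.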
0287 — vmaf-tune saliency-aware ROI tuning (ADR-0293)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/saliency.py`.
  - `tools/vmaf-tune/src/vmaftune/cli.py` (new `recommend` subcommand).
  - `tools/vmaf-tune/AGENTS.md` (saliency invariant).
  - `docs/usage/vmaf-tune.md` (saliency section).
- Upstream source: fork-local. The `vmaf-tune` tree was introduced in PR #329 (ADR-0237 Phase A) and has no upstream Netflix counterpart.
- On upstream sync: zero interaction. This is a pure fork-local Python package under `tools/vmaf-tune/`.
- Invariant: the saliency-to-QP-offset signal blend (`offset = (2*sal − 1) * foreground_offset`, clamped to ±12) is bit-for-bit equivalent to `vmaf-roi`'s C-side blend (ADR-0247). `tests/test_saliency.py` pins the contract; if `vmaf-roi`'s C blend changes, `saliency.py` must follow in the same PR. The test-seam contract (`session_factory=…`, `encode_runner=…`) lets the suite run without `onnxruntime` or `ffmpeg`.
- Re-test on rebase:
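The blend maps saliency `sal ∈ [0, 1]` linearly onto `[-foreground_offset, +foreground_offset]`, so `sal = 0.5` is neutral. The sketch below writes out that formula with the ±12 clamp. It is an illustration, not the re-test procedure. `qp_offset` is a hypothetical name. Whether the real code rounds to an integer QP, and how, is not recorded here, so this version returns a float.

```python
def qp_offset(sal: float, foreground_offset: float, limit: float = 12.0) -> float:
    """offset = (2*sal - 1) * foreground_offset, clamped to [-limit, +limit]."""
    raw = (2.0 * sal - 1.0) * foreground_offset
    return max(-limit, min(limit, raw))
```

The clamp only engages when `|foreground_offset| > 12` and `sal` sits near 0 or 1. For example, `qp_offset(1.0, 20.0)` saturates at `12.0`.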
0229 — tools/vmaf-roi-score/ Option C scaffold (ADR-0296)
- ADR: ADR-0296
- Touches:
  - `tools/vmaf-roi-score/pyproject.toml` (new)
  - `tools/vmaf-roi-score/vmaf-roi-score` (new console shim)
  - `tools/vmaf-roi-score/src/vmafroiscore/__init__.py` (new)
  - `tools/vmaf-roi-score/src/vmafroiscore/cli.py` (new)
  - `tools/vmaf-roi-score/src/vmafroiscore/score.py` (new)
  - `tools/vmaf-roi-score/src/vmafroiscore/mask.py` (new)
  - `tools/vmaf-roi-score/tests/test_combine.py` (new)
  - `tools/vmaf-roi-score/README.md` (new)
  - `tools/vmaf-roi-score/AGENTS.md` (new)
  - `docs/adr/0296-vmaf-roi-saliency-weighted.md` (new)
  - `docs/adr/_index_fragments/0296-vmaf-roi-saliency-weighted.md` (new)
  - `docs/adr/_index_fragments/_order.txt` (append-only)
  - `docs/research/0069-vmaf-roi-saliency-weighted.md` (new)
  - `docs/usage/vmaf-roi-score.md` (new)
  - `changelog.d/added/T6-2c-vmaf-roi-score-scaffold.md` (new)
- Invariant: `tools/vmaf-roi-score/` is wholly fork-local; no upstream Netflix/vmaf surface owns or interacts with this directory.
  - The combine math is a pure linear blend on Python `float`.
  - The JSON schema is pinned by `ROI_RESULT_KEYS` and `SCHEMA_VERSION = 1`. Schema bumps require an ADR-0288 supersession.
  - Naming guard: do not confuse this tool with `core/tools/vmaf_roi.c` (ADR-0247), which is the encoder-steering binary. The scoring tool here is `vmaf-roi-score`; the names diverge deliberately.
- Rebase impact: zero. This is a pure-Python tool under `tools/`; it is not part of the libvmaf C build or of any Netflix-mirrored surface.
- Re-test on rebase:
0228 — vmaf-tune compare codec-comparison mode (research-0061 Bucket #7)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/compare.py` (new). Wholly fork-local; no upstream Netflix/vmaf path overlap.
  - `tools/vmaf-tune/src/vmaftune/cli.py`: adds the `compare` subparser and the `_run_compare` router.
  - `tools/vmaf-tune/tests/test_compare.py` (new). Uses a mocked predicate, so no `ffmpeg` / `vmaf` binaries are required.
  - `tools/vmaf-tune/AGENTS.md`: invariant note for the predicate seam and the `COMPARE_ROW_KEYS` contract.
  - `docs/usage/vmaf-tune.md`: new "Codec comparison" section.
- Invariant: `compare.compare_codecs` orchestrates per-codec ranking through an injected `predicate(codec, src, target_vmaf) -> RecommendResult` callable. The orchestration must not branch on codec name. New codecs land as one-file additions under `codec_adapters/` and are picked up automatically by the registry. `COMPARE_ROW_KEYS` is the JSON / CSV column contract and needs the same maintenance discipline as `CORPUS_ROW_KEYS`.
- Rebase impact: entirely fork-local. The Phase A + Phase B recommend backend (ADR-0237) is fork-internal; upstream Netflix/vmaf has no `tools/vmaf-tune/` tree.
- Re-test on rebase:

```shell
pytest tools/vmaf-tune/tests/test_compare.py -v
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 --format markdown
```
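The no-branching rule means the orchestrator only loops over registered codec names and sorts whatever the predicate returns. This minimal sketch assumes `RecommendResult` exposes `codec`, `crf`, `bitrate_kbps` and `vmaf`. The dataclass, `compare_codecs_sketch` and the ranking key (lowest bitrate that still meets the target) are hypothetical stand-ins for the real module.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RecommendResult:
    codec: str
    crf: int
    bitrate_kbps: float
    vmaf: float


Predicate = Callable[[str, str, float], RecommendResult]


def compare_codecs_sketch(codecs: Iterable[str], src: str, target_vmaf: float,
                          predicate: Predicate) -> list[RecommendResult]:
    """Rank codecs by the bitrate they need to reach target_vmaf.

    No per-codec branches: every codec goes through the same injected predicate.
    """
    results = [predicate(codec, src, target_vmaf) for codec in codecs]
    hits = [r for r in results if r.vmaf >= target_vmaf]
    return sorted(hits, key=lambda r: r.bitrate_kbps)
```

Injecting the predicate is what makes the mocked test suite possible. A fake predicate that returns canned results exercises the full ranking path without any encoder installed.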
0229 — vmaf-tune --score-backend GPU score wiring (ADR-0299)
- Touches:
  - `tools/vmaf-tune/src/vmaftune/score_backend.py` (new). Wholly fork-local; `tools/vmaf-tune/` has no upstream Netflix/vmaf overlap.
  - `tools/vmaf-tune/src/vmaftune/{score,corpus,cli}.py` (additive kwargs, no API removals).
  - `tools/vmaf-tune/tests/test_score_backend.py` (new).
  - `docs/usage/vmaf-tune.md` (new GPU section + flag row).
  - `docs/adr/0299-vmaf-tune-gpu-score.md` (new).
  - `docs/research/0071-vmaf-tune-gpu-score-backend.md` (new).
- Invariant: the libvmaf CLI exposes `--backend NAME` with exactly the values `auto|cpu|cuda|sycl|vulkan`, and the help-text parser in `score_backend.parse_supported_backends` pins this format. If upstream renames the flag or reformats the help line on merge, the parser silently degrades to "CPU only". The fixtures in `test_score_backend.py` catch the format change, but only if the tests are re-run.
- Upstream source: fork-local. Netflix upstream's CLI does not ship a `--backend` selector (CPU-only).
- On upstream sync: zero interaction. `vmaf-tune` lives entirely in fork-introduced paths and consumes only the fork's `--backend` flag.
- Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/test_score_backend.py -v
# If the libvmaf help text reformats, parse_supported_backends
# will return {"cpu"} on test_parse_full_backend_line_yields_all_four
# and the test fails loudly.
```
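A small regex parser shows how the degrade-to-CPU failure mode arises. The sketch assumes the help line contains `--backend` followed, on the same line, by a `|`-separated alternation of lowercase names. `parse_backends_sketch` is hypothetical; the real pattern lives in `score_backend.parse_supported_backends`.

```python
import re

# `--backend`, then (on the same line) the first lowercase a|b|c alternation.
_ALTERNATION = re.compile(r"--backend\b[^\n]*?\b((?:[a-z]+\|)+[a-z]+)")


def parse_backends_sketch(help_text: str) -> set[str]:
    """Concrete backends advertised by `--backend`, or {"cpu"} if the line is unparsable."""
    match = _ALTERNATION.search(help_text)
    if match is None:
        return {"cpu"}  # silent degrade: the failure mode the invariant warns about
    names = set(match.group(1).split("|"))
    names.discard("auto")  # "auto" is a selector, not a backend
    return names | {"cpu"}
```

A renamed flag or a reworded help line produces no match, and the result quietly collapses to `{"cpu"}`. This is why a fixture that asserts the full four-backend set is the only thing that surfaces the regression.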
0261 — vmaf-tune HDR-aware encode + score path (2026-05-03)
- What changed: fork-local addition of `tools/vmaf-tune/src/vmaftune/hdr.py`, plus wiring into `corpus.py` / `cli.py` / `score.py`. It adds:
  - ffprobe-driven HDR detection;
  - codec-specific HDR ffmpeg flag dispatch;
  - schema-v2 corpus row keys (`hdr_transfer`, `hdr_primaries`, `hdr_forced`);
  - four `--auto-hdr` / `--force-*` CLI modes.

  See ADR-0300.
- Upstream source: zero. `tools/vmaf-tune/` is fork-introduced (Phase A under ADR-0237).
- On upstream sync: zero interaction. Upstream Netflix/vmaf ships no encode-automation surface; this tree is entirely fork-local and lives outside `libvmaf/` and `python/`.
- Schema migration note: `SCHEMA_VERSION` bumped 1 → 2. The three new keys are additive. Phase B / C loaders treat missing keys as SDR, for backward compatibility with v1 rows.
- Re-test on rebase:

```bash
python -m pytest tools/vmaf-tune/tests/ -q
python -m vmaftune.cli corpus --help  # confirm --auto-hdr surfaces
```
HP-2 — vmaf-tune HDR iter_rows integration (2026-05-08)
- What changed: fork-local. `tools/vmaf-tune/src/vmaftune/corpus.py` now imports `vmaftune.hdr` and wires `detect_hdr` / `hdr_codec_args` / `select_hdr_vmaf_model` into the per-source encode + score loop. The 0300 PR landed `hdr.py` and the four CLI flags but never imported the module, so PQ sources were silently encoded as SDR. The schema bumps v2 → v3 because the originally promised `hdr_transfer` / `hdr_primaries` / `hdr_forced` row columns finally land. See ADR-0300 § Status update 2026-05-08.
- Upstream source: zero. Fork-only.
- On upstream sync: zero interaction. `tools/vmaf-tune/` is fork-introduced.
- Schema migration note: `SCHEMA_VERSION` 2 → 3 (additive). The three HDR keys default to `""` / `""` / `False` for SDR rows. Phase B / C loaders that ignore unknown keys keep working against v3 rows.
- Re-test on rebase:

```bash
python -m pytest tools/vmaf-tune/tests/test_hdr.py -q
python -m pytest tools/vmaf-tune/tests/test_corpus.py::test_corpus_row_keys_match_init_contract -q
```
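Here is a minimal sketch of the detection step and the SDR defaults. It assumes detection keys off ffprobe's per-stream `color_transfer` / `color_primaries` fields, where `smpte2084` is PQ, `arib-std-b67` is HLG, and `bt2020` denotes wide-gamut primaries. `hdr_row_fields`, `probe_first_video_stream` and the `forced` handling are hypothetical. The real `detect_hdr` may also consult other metadata, such as mastering-display side data.

```python
import json
import subprocess

HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}  # PQ and HLG


def hdr_row_fields(stream: dict, forced: bool = False) -> dict:
    """Map one ffprobe video-stream dict onto the three v3 HDR row keys.

    SDR streams get the documented defaults: "" / "" / False.
    """
    transfer = stream.get("color_transfer", "")
    if transfer in HDR_TRANSFERS or forced:
        return {"hdr_transfer": transfer,
                "hdr_primaries": stream.get("color_primaries", ""),
                "hdr_forced": forced}
    return {"hdr_transfer": "", "hdr_primaries": "", "hdr_forced": False}


def probe_first_video_stream(path: str) -> dict:
    """Run ffprobe and return the colour metadata of the first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=color_transfer,color_primaries",
         "-of", "json", path],
        check=True, capture_output=True, text=True).stdout
    return json.loads(out)["streams"][0]
```

Emitting explicit empty-string defaults for SDR, rather than omitting the keys, keeps every v3 row the same shape. A loader can then rely on the key set without per-row branching.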
0298 — vmaf-tune content-addressed cache (ADR-0298)
- What changed: fork-local.
  - New module `tools/vmaf-tune/src/vmaftune/cache.py`.
  - Cache integration in `tools/vmaf-tune/src/vmaftune/corpus.py`: `iter_rows` now consults the cache before encode/score.
  - New CLI flags in `cli.py`: `--no-cache`, `--cache-dir`, `--cache-size-gb`.
  - The codec-adapter `Protocol` gains `adapter_version: str`; the lone Phase-A x264 adapter pins `"1"`.
- Upstream source: none. `tools/vmaf-tune/` is fork-introduced (ADR-0237) and has no upstream counterpart.
- On upstream sync: zero interaction with Netflix/vmaf master. The module sits entirely under `tools/vmaf-tune/`, which upstream does not ship.
- Invariant for future codec adapters: every `CodecAdapter` must declare `adapter_version: str`. Bump it whenever the adapter's argv shape, preset list, or quality range changes; otherwise the cache returns stale results after an upgrade. `test_cache_key_diffs_on_each_field` in `tests/test_cache.py` asserts the contract.
- Re-test on rebase:

```bash
pytest tools/vmaf-tune/tests/test_cache.py -v
```
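`adapter_version` is one of the fields hashed into the cache key, which is why bumping it invalidates stale entries: every entry written under the old version becomes unreachable. This minimal sketch builds a content-addressed key as SHA-256 over canonical JSON of the source digest plus the encode settings. `cache_key` and its field list are hypothetical, not `cache.py`'s actual layout.

```python
import hashlib
import json


def cache_key(*, source_sha256: str, codec: str, adapter_version: str,
              preset: str, quality: int) -> str:
    """Content address for one (source, encode settings) cell."""
    payload = {
        "source_sha256": source_sha256,
        "codec": codec,
        "adapter_version": adapter_version,
        "preset": preset,
        "quality": quality,
    }
    # sort_keys plus fixed separators make the serialisation canonical,
    # so equal inputs always hash to the same key.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

The accompanying check mirrors the spirit of `test_cache_key_diffs_on_each_field`: changing any single field must change the key.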
0283 — vmaf-tune Apple VideoToolbox adapters (2026-05-05)
- What changed: fork-local addition under `tools/vmaf-tune/src/vmaftune/codec_adapters/`. New files: `h264_videotoolbox.py`, `hevc_videotoolbox.py`, `_videotoolbox_common.py`, plus the registry hook in `__init__.py`. See ADR-0283.
- Update 2026-05-09: a `prores_videotoolbox.py` adapter was added following the same registry pattern (broadcast / prosumer ProRes intermediate).
  - Its quality knob differs. ProRes is a fixed-rate codec, so the harness's `--crf` slot carries the integer ProRes tier id (`0` = `proxy` → `5` = `xq`) rather than a `-q:v` value.
  - `_videotoolbox_common.py` was extended with `PRORES_PROFILE_*` constants and the `validate_prores_videotoolbox()` / `prores_profile_name()` helpers.
  - The profile ids were verified against FFmpeg n8.1.1 `libavcodec/videotoolboxenc.c`.

  See the Status update appendix in ADR-0283.
- Upstream source: zero. `tools/vmaf-tune/` is fork-introduced (Phase A under ADR-0237).
- On upstream sync: zero interaction.
- Re-test on rebase:

```bash
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py -q
python -m pytest tools/vmaf-tune/tests/test_codec_adapter_prores_videotoolbox.py -q
```
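The tier-id mapping can be sketched as a six-entry table. The names `proxy`, `lt`, `standard`, `hq`, `4444` and `xq` for ids 0–5 follow FFmpeg's ProRes profile ordering. The constant and function names below are hypothetical stand-ins for the helpers listed above, and how the real adapter turns a name into argv is not recorded here.

```python
# ProRes tier ids carried in the harness's --crf slot (0 = lowest data rate).
PRORES_PROFILES = ("proxy", "lt", "standard", "hq", "4444", "xq")


def prores_profile_name_sketch(tier: int) -> str:
    """Map an integer tier id to its ProRes profile name, rejecting bad ids."""
    if isinstance(tier, bool) or not isinstance(tier, int):
        raise TypeError(f"ProRes tier must be an int, got {type(tier).__name__}")
    if not 0 <= tier < len(PRORES_PROFILES):
        raise ValueError(f"ProRes tier {tier} outside 0..{len(PRORES_PROFILES) - 1}")
    return PRORES_PROFILES[tier]
```

Unlike a CRF, a higher tier id here means a higher data rate. Any search logic that assumes "larger `--crf` means a smaller file" therefore has to special-case this adapter.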
0228 — vmaf-tune coarse-to-fine CRF search (ADR-0306)
- What changed: fork-local tooling.
  - Adds `coarse_to_fine_search()` to `tools/vmaf-tune/src/vmaftune/corpus.py`.
  - Plumbs new CLI flags onto `vmaf-tune corpus`: `--coarse-to-fine`, `--coarse-step`, `--fine-radius`, `--fine-step`, `--target-vmaf`.
  - Ships a new `vmaf-tune recommend` subcommand.
  - Widens `quality_range` in `tools/vmaf-tune/src/vmaftune/codec_adapters/x264.py` from `(15, 40)` to `(0, 51)`.

  The JSONL row schema is unchanged (`SCHEMA_VERSION=1`).
- Upstream source: fork-local. The whole `tools/vmaf-tune/` tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation surface.
- On upstream sync: zero interaction. `tools/vmaf-tune/` is not mirrored from upstream.
- Re-test on rebase:
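The two-pass idea works as follows. First, sample the CRF range on a coarse grid and pick the highest CRF that still meets `--target-vmaf`. Then re-sample a small window around that CRF at the fine step. This sketch of the search is not the re-test procedure. It assumes VMAF falls roughly monotonically as CRF rises, and that `score(crf)` wraps one encode + score. `coarse_to_fine_sketch` and its defaults are hypothetical, not the signature of `coarse_to_fine_search()`.

```python
from typing import Callable, Optional


def coarse_to_fine_sketch(score: Callable[[int], float], target_vmaf: float,
                          lo: int = 0, hi: int = 51, coarse_step: int = 8,
                          fine_radius: int = 4, fine_step: int = 1) -> Optional[int]:
    """Return the highest CRF in [lo, hi] whose score meets target_vmaf, or None."""
    if lo > hi:
        raise ValueError("empty CRF range")
    cache: dict[int, float] = {}

    def measure(crf: int) -> float:
        if crf not in cache:  # each encode + score is expensive; never repeat one
            cache[crf] = score(crf)
        return cache[crf]

    coarse = list(range(lo, hi + 1, coarse_step))
    if coarse[-1] != hi:
        coarse.append(hi)  # always probe the top of the range
    passing = [c for c in coarse if measure(c) >= target_vmaf]
    if not passing:
        return None
    best = max(passing)
    window = range(max(lo, best - fine_radius), min(hi, best + fine_radius) + 1, fine_step)
    return max([c for c in window if measure(c) >= target_vmaf] + [best])
```

With the defaults and a linear toy score, the search touches 16 CRFs instead of the 52 a full `0..51` sweep would encode. The saving is the reason the flag exists now that the x264 range is widened to `(0, 51)`.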
0314 — vmaf-tune --score-backend=vulkan (ADR-0314)
- Touches: tools/vmaf-tune/src/vmaftune/cli.py (additive argparse flag on the corpus + recommend subparsers; resolves select_backend and catches BackendUnavailableError for a clean exit 2). tools/vmaf-tune/src/vmaftune/score.py (additive backend kwarg on build_vmaf_command and run_score; None = no flag emitted). tools/vmaf-tune/src/vmaftune/corpus.py (new CorpusOptions.score_backend field, default None; forwarded into run_score). tools/vmaf-tune/tests/test_score_backend.py (additive Vulkan-specific tests; pre-existing tests now pass after the backend= kwarg lands). docs/adr/0314-vmaf-tune-score-backend-vulkan.md (new). docs/usage/vmaf-tune.md (new "Vulkan score backend" subsection under the existing GPU-scoring section). tools/vmaf-tune/AGENTS.md (invariant note: argparse choices stay in sync with the libvmaf --backend vocabulary). changelog.d/added/vmaf-tune-score-backend-vulkan.md (new).
- Invariant: score_backend.ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan") is the exact set that the --backend alternation in libvmaf's core/tools/cli_parse.c accepts. Adding a new harness-side value without the libvmaf-side wiring produces silent strict-mode failures on hosts that probe positively for it.
- Upstream source: zero. Netflix upstream's CLI does not ship a --backend selector; both tools/vmaf-tune/ and core/src/vulkan/ are fork-introduced.
- On upstream sync: zero interaction. No upstream-mirror file is touched.
- Re-test on rebase:
pytest tools/vmaf-tune/tests/test_score_backend.py -v -k vulkan
pytest tools/vmaf-tune/tests/test_score_backend.py -v
Failures here usually indicate the libvmaf help-text format changed; score_backend.parse_supported_backends test fixtures pin the format and will fail loudly.
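The ALL_BACKENDS invariant amounts to a set-equality check between the harness tuple and the libvmaf CLI's --backend alternation. The sketch below assumes a help line shaped like "--backend <cpu|cuda|sycl|vulkan>". That format is an assumption; the real score_backend.parse_supported_backends fixtures pin whatever libvmaf actually prints.

```python
import re
from typing import List, Tuple

# Mirrors the invariant quoted above; the real tuple lives in score_backend.py.
ALL_BACKENDS = ("cpu", "cuda", "sycl", "vulkan")

# Assumed help-text shape: "--backend <cpu|cuda|sycl|vulkan>".
_BACKEND_RE = re.compile(r"--backend\s+<?([a-z0-9_]+(?:\|[a-z0-9_]+)*)>?")


def parse_backend_alternation(help_text: str) -> Tuple[str, ...]:
    """Extract the pipe-separated backend names from a --backend help line."""
    match = _BACKEND_RE.search(help_text)
    if match is None:
        raise ValueError("no --backend alternation found in help text")
    return tuple(match.group(1).split("|"))


def backend_drift(help_text: str) -> Tuple[List[str], List[str]]:
    """Return (harness-only values, libvmaf-only values); both empty means in sync."""
    cli = set(parse_backend_alternation(help_text))
    harness = set(ALL_BACKENDS)
    return sorted(harness - cli), sorted(cli - harness)
```

A non-empty first list is the dangerous case described above: the harness offers a backend that libvmaf cannot honour.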
0303 — fr_regressor_v2 ensemble prod flip (ADR-0303)
- ADR: ADR-0303
- Touches: entirely fork-local. ai/scripts/train_fr_regressor_v2_ensemble_loso.py (new — 9-fold LOSO trainer over the five ensemble seeds; emits loso_seed{N}.json artefacts). scripts/ci/ensemble_prod_gate.py (new — reads five loso_seed{N}.json files, returns exit 0 iff mean(PLCC_i) ≥ 0.95 AND max - min ≤ 0.005). ai/AGENTS.md — appended an "Ensemble registry invariant" paragraph under the existing fr_regressor_v2_ensemble_v1 section. docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md (new), docs/research/0075-fr-regressor-v2-ensemble-prod-flip.md (new), changelog.d/added/fr-regressor-v2-ensemble-prod-flip.md (new).
- Rebase invariant: the production ship gate is two-part — mean_i(PLCC_i) ≥ 0.95 AND max_i(PLCC_i) - min_i(PLCC_i) ≤ 0.005 over five seeds. The spread bound is load-bearing: removing it silently admits a configuration in which one seed wins while the other four merely tie, which invalidates the ensemble's predictive-distribution semantics. Both thresholds live in scripts/ci/ensemble_prod_gate.py; do not weaken either without superseding ADR-0303.
- Rebase invariant (registry): the five fr_regressor_v2_ensemble_v1_seed{0..4} registry rows are smoke: true on master at this commit; flipping them to false is the follow-up flip PR's job, gated on a real-corpus LOSO run + the CI gate. Do not flip seed rows while resolving a rebase merge conflict.
- Re-test on rebase:
python3 -c "import ast; ast.parse(open('ai/scripts/train_fr_regressor_v2_ensemble_loso.py').read())"
python3 -c "import ast; ast.parse(open('scripts/ci/ensemble_prod_gate.py').read())"
python ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
python scripts/ci/ensemble_prod_gate.py --help
- Upstream source: zero. fr_regressor_v2 and its ensemble are fork-introduced (parent ADR-0272 / ADR-0279).
- On upstream sync: zero interaction.
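The two-part gate can be restated as a small pure function; this is a sketch, not the contents of scripts/ci/ensemble_prod_gate.py. It takes the five per-seed PLCC values directly. Reading them out of loso_seed{N}.json is omitted because the JSON field names are not recorded here.

```python
from statistics import fmean
from typing import Sequence

MEAN_FLOOR = 0.95     # mean_i(PLCC_i) must reach this
SPREAD_CEIL = 0.005   # max_i(PLCC_i) - min_i(PLCC_i) must not exceed this
N_SEEDS = 5


def ensemble_gate(plccs: Sequence[float]) -> int:
    """Return a process exit code: 0 = ship, 1 = hold."""
    if len(plccs) != N_SEEDS:
        raise ValueError(f"expected {N_SEEDS} per-seed PLCCs, got {len(plccs)}")
    mean_ok = fmean(plccs) >= MEAN_FLOOR
    spread_ok = max(plccs) - min(plccs) <= SPREAD_CEIL
    return 0 if (mean_ok and spread_ok) else 1
```

The second case in the check below shows why the spread bound matters. A single strong seed lifts the mean above 0.95 while the spread (0.04) still blocks the ship.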
0313 — CI required-checks aggregator (2026-05-05)
- What changed: fork-local CI policy. New .github/workflows/required-aggregator.yml — a single workflow that runs on every non-draft PR and verifies that each of the 23 named required checks reported success, skipped, or neutral (or did not appear at all, which is the path-filter-rejection semantics). The aggregator becomes the single branch-protection required check, replacing the 23-name list from ADR-0037.
- Touches: .github/workflows/required-aggregator.yml (new), docs/adr/0313-ci-required-checks-aggregator.md (new), changelog.d/added/ci-required-checks-aggregator.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment file).
- Upstream source: zero. Branch-protection policy is fork-only.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Manual operator step at adoption (uses PATCH, not PUT — corrected from the original ADR-0313 body which had the wrong verb):
echo '{"strict": false, "contexts": ["Required Checks Aggregator"]}' | \
gh api -X PATCH "repos/VMAFx/vmafx/branches/master/protection/required_status_checks" --input -
- Re-test on rebase:
# YAML lint passes
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/required-aggregator.yml'))"
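The aggregator's pass rule is sketched below as a pure function over check conclusions. The check names are placeholders. The real workflow also has to query the GitHub Checks API and wait for checks still in progress, and this sketch omits both.

```python
from typing import Dict, Iterable, List

PASSING = {"success", "skipped", "neutral"}


def blocking_checks(required: Iterable[str], conclusions: Dict[str, str]) -> List[str]:
    """Return the required check names that block the merge.

    A name absent from `conclusions` never ran (its path filter rejected the PR)
    and counts as passing; a present name must conclude in PASSING.
    """
    return [
        name
        for name in required
        if name in conclusions and conclusions[name] not in PASSING
    ]
```

Treating "absent" as passing is what lets path-filtered workflows coexist with a single required check.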
0305 — encoder knob-space Pareto analysis (2026-05-05)
- What changed: fork-local. New analysis scaffold for the 12,636-cell encoder knob sweep that backs the tools/vmaf-tune/codec_adapters/* recipe defaults. New files: ai/scripts/analyze_knob_sweep.py (per-(source, codec, rc_mode) Pareto hull on (bitrate_kbps, vmaf_score), encode_time_ms tiebreaker, regression-detection check), ai/tests/test_knob_sweep_analysis.py (synthetic 20-row JSONL fixture). Methodology + scaffolded findings: see ADR-0305 + Research-0077. Companion to Research-0063.
- Touches: none upstream-shared. Sits entirely under ai/ (fork-local since the tiny-AI training surface, ADR-0021) and docs/{adr,research}/ (fork ledger).
- Upstream source: zero. The 12,636-cell sweep, the Pareto scaffold, and the regression-detection invariant are fork-introduced; Netflix/vmaf master ships no encoder knob-sweep tooling.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Invariant for future codec adapter PRs: per the ai/AGENTS.md knob-sweep corpus invariant (ADR-0305), recipes that regress vs the bare encoder at matched bitrate within the same (source, codec, rc_mode) slice MUST NOT ship as adapter defaults. New adapter PRs cite the per-slice hull row from reports/summary.md (or "no hull entry yet — bare default") in their PR description. The comprehensive.jsonl sweep file is generated locally and lives under runs/phase_a/full_grid/ (gitignored — never committed).
- Re-test on rebase:
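The sketch below computes the per-slice frontier for illustration. It computes the plain non-dominated (Pareto) set, which may differ from whatever "hull" analyze_knob_sweep.py builds, for example a convex hull. The row keys mirror the column names quoted above, but the dict-per-row shape is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
SliceKey = Tuple[Any, Any, Any]


def pareto_by_slice(rows: Iterable[Row]) -> Dict[SliceKey, List[Row]]:
    """Non-dominated (bitrate_kbps, vmaf_score) rows per (source, codec, rc_mode).

    Lower bitrate and higher VMAF are both better; among rows with identical
    bitrate and VMAF, the one with the smaller encode_time_ms wins.
    """
    slices: Dict[SliceKey, List[Row]] = defaultdict(list)
    for row in rows:
        slices[(row["source"], row["codec"], row["rc_mode"])].append(row)

    frontier: Dict[SliceKey, List[Row]] = {}
    for key, members in slices.items():
        # Cheapest first; at equal bitrate, best VMAF first; then fastest encode.
        members.sort(key=lambda r: (r["bitrate_kbps"], -r["vmaf_score"], r["encode_time_ms"]))
        kept: List[Row] = []
        best_vmaf = float("-inf")
        for row in members:
            # Keep a row only if it beats every cheaper-or-equal row on VMAF.
            if row["vmaf_score"] > best_vmaf:
                kept.append(row)
                best_vmaf = row["vmaf_score"]
        frontier[key] = kept
    return frontier
```

Under this reading, a recipe "regresses" when some row in its slice (for instance, the bare encoder) dominates it, so the recipe never appears on the frontier.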
0302 — ENCODER_VOCAB v3 schema expansion (ADR-0302)
- Touches: ai/scripts/train_fr_regressor_v2.py (adds an ENCODER_VOCAB_V3 parallel constant; does not modify the live ENCODER_VOCAB or ENCODER_VOCAB_VERSION).
- Invariant: ENCODER_VOCAB is append-only and order-stable (per ADR-0235). The v3 scaffold preserves the v2 slot ordering verbatim — slots 0..12 are bit-identical to the v2 vocab; slots 13/14/15 append libsvtav1, h264_videotoolbox, hevc_videotoolbox. The live ENCODER_VOCAB_VERSION = 2 remains the source of truth until the follow-up retrain PR clears the LOSO PLCC ship gate.
- Upstream interaction: zero. ai/scripts/train_fr_regressor_v2.py is fork-introduced (ADR-0272) and has no upstream counterpart.
- Re-test on rebase:
python3 -c "
import importlib.util, pathlib
spec = importlib.util.spec_from_file_location(
't', pathlib.Path('ai/scripts/train_fr_regressor_v2.py')
)
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
assert len(m.ENCODER_VOCAB_V3) == 16
assert m.ENCODER_VOCAB_VERSION == 2
print('OK')
"
0304 — vmaf-tune fast-path prod wiring (ADR-0304)
- Touches: tools/vmaf-tune/src/vmaftune/fast.py (replaces the ADR-0276 scaffold's NotImplementedError paths with concrete Optuna TPE + v2 proxy + GPU verify wiring); new module tools/vmaf-tune/src/vmaftune/proxy.py (centralised seam for fr_regressor_v2 ONNX inference); expanded tools/vmaf-tune/tests/test_fast.py. Doc-side: ADR-0304, Research-0076, tools/vmaf-tune/AGENTS.md invariant note.
- Upstream source: zero. tools/vmaf-tune/ and model/tiny/fr_regressor_v2.onnx are both fork-introduced (ADR-0237 / ADR-0352).
- Invariant: the production proxy is always fr_regressor_v2 (no smoke models in the production path), and a single GPU verify pass at recommend-end is mandatory — the proxy alone never wins. The vmaftune.proxy.run_proxy helper is the single seam every fast-path consumer goes through; future probabilistic-head / ensemble migrations land in that one module. ENCODER_VOCAB v2 one-hot ordering is frozen by ADR-0352 and pinned in proxy.ENCODER_VOCAB_V2 — keep it in sync with ai/scripts/train_fr_regressor_v2.py; drift raises ProxyError at inference time before bad predictions ship.
- On upstream sync: zero interaction with Netflix/vmaf master.
- Re-test on rebase:
0307 — vmaf-tune ladder default sampler wiring (ADR-0307)
- What changed: fork-local tooling. tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler no longer raises NotImplementedError; it composes corpus.iter_rows (Phase A encode + score) with recommend.pick_target_vmaf (smallest CRF clearing target VMAF) over DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38) at the adapter's mid-range preset. Module-level docstring + AGENTS.md invariant updated. New tests in tools/vmaf-tune/tests/test_ladder.py stub iter_rows via monkeypatch.setattr, so no live ffmpeg / vmaf binaries are needed.
- Upstream source: fork-local. The whole tools/vmaf-tune/ tree is fork-introduced (ADR-0237); upstream Netflix/vmaf has no encode-automation / ladder surface.
- On upstream sync: zero interaction. tools/vmaf-tune/ is not mirrored from upstream.
- Rebase invariant: the 5-point sweep (18, 23, 28, 33, 38) is the load-bearing default; downstream Phase E callers size their wall-time budget against five encodes per (resolution, target_vmaf) cell. Do not widen / narrow it without an ADR-0307 follow-up. The SamplerFn seam stays open — callers needing finer grids pass an explicit sampler=.
- Re-test on rebase:
0309 — fr_regressor_v2 ensemble real-corpus retrain harness (ADR-0309)
- ADR: ADR-0309
- Touches: entirely fork-local. ai/scripts/run_ensemble_v2_real_corpus_loso.sh (new — Bash wrapper that loops the five seeds over the existing train_fr_regressor_v2_ensemble_loso.py against .workingdir2/netflix/). ai/scripts/validate_ensemble_seeds.py (new — calls the ADR-0303 gate and writes PROMOTE.json / HOLD.json with a corpus sha256 snapshot). ai/tests/test_validate_ensemble_seeds.py (new — 7 tests, synthetic JSON fixtures for both verdict paths). ai/AGENTS.md — appended a "Registry-flip is a separate PR (ADR-0309)" paragraph under the existing fr_regressor_v2_ensemble_v1 section. docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md, docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (all new).
- Rebase invariant: the harness is decoupled from the registry mutation. Neither the wrapper nor the validator touches model/tiny/registry.json; the registry flip is a separate follow-up PR gated on a passing PROMOTE.json. Auto-flipping on PROMOTE was rejected in ADR-0309's alternatives matrix specifically because rebase-time mutation of shipped registry rows is the foot-gun this invariant exists to prevent.
- Re-test on rebase:
python -m pytest ai/tests/test_validate_ensemble_seeds.py -v
python ai/scripts/validate_ensemble_seeds.py --help
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
- Upstream source: zero.
- On upstream sync: zero interaction.
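The validator contract writes a verdict file, never the registry, and stamps it with a corpus hash. A sketch of that contract follows. The payload keys ("verdict", "corpus_sha256") are hypothetical; only the PROMOTE.json / HOLD.json file names come from the notes above.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so a multi-GB corpus never loads into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_verdict(out_dir: Path, gate_exit_code: int, corpus: Path) -> Path:
    """Write PROMOTE.json on a passing gate, HOLD.json otherwise; never touch the registry."""
    name = "PROMOTE.json" if gate_exit_code == 0 else "HOLD.json"
    payload = {
        "verdict": name.split(".")[0],         # hypothetical key
        "corpus_sha256": sha256_file(corpus),  # hypothetical key
    }
    target = out_dir / name
    target.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")
    return target
```

Recording the corpus hash lets the later registry-flip PR confirm that the weights it ships were gated on the same corpus the verdict describes.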
0310 — BVI-DVC corpus ingestion for fr_regressor_v2 (ADR-0310)
- Touches: ai/scripts/bvi_dvc_to_corpus_jsonl.py (new fork-only adapter), ai/scripts/merge_corpora.py (new fork-only shard merger), ai/tests/test_merge_corpora.py (new), docs/ai/bvi-dvc-corpus-ingestion.md (new), docs/adr/0310-bvi-dvc-corpus-ingestion.md (new), docs/research/0082-bvi-dvc-corpus-feasibility.md (new), ai/AGENTS.md (BVI-DVC invariant note).
- Invariant: the BVI-DVC archive and any extracted artefacts (parquet, cached libvmaf JSON, JSONL corpus shard) are research-only and stay local — only derived fr_regressor_v2_*.onnx weights ship. The merge utility validates every row against the canonical vmaftune.CORPUS_ROW_KEYS tuple; the schema is the merge contract. The re-shape here is a pure transform on the cached libvmaf JSON; no ffmpeg / vmaf binary is invoked. The (src_sha256, encoder, preset, crf) natural key is load-bearing for de-duplication across mirrors and re-encodes.
- Upstream interaction: none. ai/ is fork-introduced; BVI-DVC is not part of Netflix/vmaf upstream.
- Re-test on rebase:
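The merge contract has two parts: schema-check every row, then de-duplicate on the natural key. A sketch of both steps follows. CORPUS_ROW_KEYS here is a hypothetical stand-in (the real vmaftune tuple is longer), and "first occurrence wins" is an assumed tie policy, not one recorded in these notes.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for vmaftune.CORPUS_ROW_KEYS; the real tuple is longer.
CORPUS_ROW_KEYS = ("src_sha256", "encoder", "preset", "crf", "vmaf")
NATURAL_KEY = ("src_sha256", "encoder", "preset", "crf")


def merge_shards(shards: Iterable[Iterable[Dict]]) -> List[Dict]:
    """Concatenate shards, rejecting off-schema rows and dropping natural-key duplicates."""
    seen = set()
    merged: List[Dict] = []
    for shard in shards:
        for row in shard:
            missing = [k for k in CORPUS_ROW_KEYS if k not in row]
            if missing:
                raise ValueError(f"row missing schema keys: {missing}")
            key = tuple(row[k] for k in NATURAL_KEY)
            if key in seen:
                continue  # same source + encode settings already merged (assumed: first wins)
            seen.add(key)
            merged.append(row)
    return merged
```

Keying on src_sha256 rather than a file name is what makes the de-duplication survive mirrors that rename the same clip.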
ADR-0312 — ffmpeg-patches/ vmaf-tune integration (2026-05-05)
- Files: ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch, ffmpeg-patches/0008-add-libvmaf_tune-filter.patch, ffmpeg-patches/0009-pass-autotune-cli-glue.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
- Rebase invariant: patches 0007–0009 plug into the cumulative state after patches 0001–0006 apply against pristine n8.1. Per-patch git apply --check in isolation is the wrong gate; use the series-replay command in CLAUDE.md §12 r14 instead.
- vmaf-tune patch invariant: the qpfile parser at libavcodec/qpfile_parser.{c,h} is shared across all three encoder adapters in patch 0007. Future encoders that grow a -qpfile AVOption inherit it; do not fork the parser. When the qpfile output format of tools/vmaf-tune/src/vmaftune/saliency.py changes (new column, different frame-type alphabet, …), patch 0007 must change in the same PR (CLAUDE.md §12 r14).
- vf_libvmaf_tune full-scoring promotion (2026-05-06): patch 0008 originally shipped as a scaffold (linear CRF↔VMAF interpolation, no libvmaf scoring) per ADR-0312's deferred-alternatives column. The filter now mirrors vf_libvmaf.c's CPU framesync pipeline end-to-end (vmaf_init + vmaf_model_load + vmaf_use_features_from_model in init(); per-frame vmaf_picture_alloc + memcpy + vmaf_read_pictures; flush + vmaf_score_pooled(MEAN) in uninit()). The CRF recommendation remains a piece-wise linear projection from the observed VMAF; per-clip Optuna TPE search stays in tools/vmaf-tune/src/vmaftune/recommend.py. Rebase-side: the new filter still depends only on libvmaf's CPU C-API (vmaf_init, vmaf_model_load, vmaf_use_features_from_model, vmaf_read_pictures, vmaf_score_pooled, vmaf_close, vmaf_picture_alloc/unref), with zero new symbols beyond what vf_libvmaf.c already requires, so future libvmaf rebases that pass the existing libvmaf filter pass this one too. ADR-0312 sub-decision retired.
- n7+ API migration (2026-05-06): patch 0008 originally referenced the removed AVFilterLink::frame_rate member directly (n6-era API); in n7+ that field moved off AVFilterLink onto a new FilterLink struct accessed via ff_filter_link(AVFilterLink *) from libavfilter/filters.h. Patch 0008 now uses ff_filter_link(outlink)->frame_rate = ff_filter_link(mainlink)->frame_rate; in config_output(), mirroring patches 0005/0006, which were already written against the post-n7 API. The bug slipped through CI because the FFmpeg-Vulkan lane only builds vf_libvmaf.o, not vf_libvmaf_tune.o; the full SYCL lane catches it now that PR #415 added ffmpeg-patches/** to the integration workflow's path filter. Discovery: PR #415 / ADR-0317.
- Upstream source: zero. The vmaf-tune integration is fork-introduced; pure upstream syncs are unaffected.
- On upstream sync: zero interaction with libvmaf master. FFmpeg-side rebases, when n8.1 → n8.x lands in the FFMPEG_SHA of ffmpeg-patches/test/build-and-run.sh, are tracked separately under each refresh ADR (e.g., ADR-0277 for the 2026-05-04 refresh).
- Re-test on rebase:
git -C /path/to/ffmpeg-8 reset --hard n8.1
for p in ffmpeg-patches/000*-*.patch; do
git -C /path/to/ffmpeg-8 am --3way "$p" || break
done
# Build smoke (libvmaf-disabled — patches 0001–0006 skipped if libvmaf_dnn
# is not built). With libvmaf_dnn available:
cd /path/to/ffmpeg-8 && ./configure --enable-libvmaf --enable-libx264 --enable-libsvtav1 --enable-libaom --enable-gpl
make -j$(nproc) ffmpeg
./ffmpeg -hide_banner -h encoder=libx264 2>&1 | grep -i qpfile
- 2026-05-06 update — patch 0007 SVT-AV1 ROI bridge promoted from scaffold to full impl: the libsvtav1 hunk now sets enc_params.enable_roi_map = true, builds one SvtAv1RoiMapEvt per qpfile frame upfront in eb_enc_init (per-MB qp_offsets averaged into a per-64×64-SB b64_seg_map of up to 8 segment QPs; uniform binning when the value span exceeds the segment budget), and attaches each event as a ROI_MAP_EVENT priv-data node from eb_send_frame() with node->size = sizeof(SvtAv1RoiMapEvt*) (the validation contract enforced by SVT-AV1's resource_coordination_process.c). Lifetime invariant: events + maps live for the entire encode session because SVT-AV1 reads ROI_MAP_EVENT data via shallow-copied pointers on async pipeline threads (per enc_handle.c::copy_private_data_list); eb_enc_close frees them. Wiring is gated on SVT_AV1_CHECK_VERSION(1, 6, 0); older SVT-AV1 builds keep the log-and-continue fallback. libaom remains scaffold-only — its AOME_SET_ROI_MAP bridge stays a separate follow-up. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
- 2026-05-06 update — patch 0007 libaom-av1 ROI bridge promoted from scaffold to full impl: the libaom-av1 hunk now caches the parsed VmafTuneQpFile in AOMContext and allocates a segment-id map at libaom's mode-info grid (ALIGN_POWER_OF_TWO(dim, 8) >> 2, since av1/common/enums.h::MI_SIZE == 4). On every encoded frame it picks up to 8 segment QPs from the per-frame qp_offset value range (uniform linear binning when the span exceeds AOM_MAX_SEGMENTS == 8), paints the per-mi segment map by expanding each per-16×16-MB qp_offset into a 4×4 block of mi cells, and issues aom_codec_control(&ctx->encoder, AOME_SET_ROI_MAP, &roi_map). Lifetime invariant: libaom deep-copies the segment map and delta_q[] table on every control call (per the memcpy in av1/encoder/encoder.c::av1_set_roi_map), so a single buffer is reused across frames and freed in aom_free(). The qpfile is also freed there. Trade-off: the 8-segment cap rounds nearby qp_offsets together when the saliency model emits more than 8 distinct values per frame; finer granularity requires vmaf-tune corpus instead. This retires the libaom-av1 deferral noted under ADR-0312 — both AV1 encoder hooks (libsvtav1 and libaom-av1) are now full-impl. No new ADR per CLAUDE.md §12 r8 (executes the existing ADR-0312 decision).
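The segment-budget logic and the MB→mi expansion described in the libaom update can be sketched in a few lines. This illustration is not the C in patch 0007. The per-segment delta being the rounded mean of a bin's members is an assumption, and the sketch assumes the mi grid is an exact 4× multiple of the MB grid, with no libaom alignment padding.

```python
from typing import List, Sequence, Tuple

MAX_SEGMENTS = 8  # AOM_MAX_SEGMENTS; the SVT-AV1 note above uses the same 8-QP budget


def bin_qp_offsets(offsets: Sequence[int], max_segments: int = MAX_SEGMENTS) -> Tuple[List[int], List[int]]:
    """Map per-MB qp_offsets to segment ids plus one QP delta per segment.

    Up to `max_segments` distinct values each get their own segment; beyond that,
    values are binned uniformly over [min, max], and each segment's delta is the
    rounded mean of the offsets that fell into it (empty bins get delta 0).
    """
    distinct = sorted(set(offsets))
    if len(distinct) <= max_segments:
        index = {v: i for i, v in enumerate(distinct)}
        return [index[v] for v in offsets], distinct

    lo, hi = distinct[0], distinct[-1]
    span = hi - lo  # > 0 because there are more than max_segments distinct values
    ids = [min((v - lo) * max_segments // span, max_segments - 1) for v in offsets]
    sums = [0] * max_segments
    counts = [0] * max_segments
    for v, s in zip(offsets, ids):
        sums[s] += v
        counts[s] += 1
    deltas = [round(sums[s] / counts[s]) if counts[s] else 0 for s in range(max_segments)]
    return ids, deltas


def expand_mb_to_mi(seg_ids: Sequence[int], mb_cols: int, mb_rows: int, mi_per_mb: int = 4) -> List[int]:
    """Repeat each 16x16-MB segment id over a 4x4 block of 4x4-pixel mi cells (row-major)."""
    mi_cols = mb_cols * mi_per_mb
    mi_rows = mb_rows * mi_per_mb
    return [
        seg_ids[(r // mi_per_mb) * mb_cols + (c // mi_per_mb)]
        for r in range(mi_rows)
        for c in range(mi_cols)
    ]
```

The first branch is lossless. The second branch is where the "8-segment cap rounds nearby qp_offsets together" trade-off shows up.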
0315 — Vendor-neutral VVC encode strategy (ADR-0315 / Research-0085)
- ADR: ADR-0315
- Digest: Research-0085
- Touches: docs-only. docs/research/0085-vendor-neutral-vvc-encode-landscape.md (new). docs/adr/0315-vendor-neutral-vvc-encode-strategy.md (new). docs/adr/_index_fragments/0315-vendor-neutral-vvc-encode-strategy.md (new). docs/adr/_index_fragments/_order.txt (one-line append). changelog.d/added/research-0085-vendor-neutral-vvc-encode.md (new). docs/rebase-notes.md (this entry).
- Rebase invariant: none. The research digest and ADR are pure surveys with no code dependencies; nothing in the fork's source tree references them in a way that breaks on upstream rebase.
- Upstream source: zero. VVC encode strategy is a fork-local decision; upstream Netflix/vmaf has no codec adapter or encode-automation surface.
- On upstream sync: zero interaction. Pure docs.
- Re-test on rebase:
- 2026-05-06 follow-up (Research-0085 verification pass):
  - docs/research/0085-vendor-neutral-vvc-encode-landscape.md flipped from Status: SKELETON to Status: Active. Most [UNVERIFIED] claims are now backed by primary-source URLs (NVIDIA SDK 13.0 docs, AMD AMF GitHub, Intel oneVPL GitHub + mfxstructures.h + CHANGELOG.md, Khronos registry, Phoronix Mesa/RADV coverage, VVenC issue tracker, ZLUDA repo).
  - ADR-0315's ## Context and ## Alternatives considered sections were refreshed with the verified data points. Status stays Proposed.
  - The [UNVERIFIED] count in the digest dropped from 25 to 10; the remaining items are legitimate gaps (NN-VC quality lift, vvenc per-kernel profile, HHI's non-public roadmap).
  - No code touched. No rebase impact beyond the existing docs-only posture.
0316 — cli_parse.c error() long-only-option fix (ADR-0316)
- ADR: ADR-0316 (follow-up to ADR-0311).
- Digest: none — bug-fix; the fix shape fits in the ADR/commit body.
- Touches: core/tools/cli_parse.c (3 lines — call-site arg change at the ARG_THREADS / ARG_SUBSAMPLE / ARG_CPUMASK handlers). core/test/fuzz/fuzz_cli_parse.c (removed the known_assert_in_input early-reject filter). core/test/fuzz/cli_parse_corpus/cli_threads_abbrev_assert.argv (promoted from cli_parse_known_crashes/). core/test/test_cli_parse_long_only_args.c (new fork()-based regression test). core/test/meson.build (new test wiring, gated off Windows alongside test_y4m_411_oob). core/tools/AGENTS.md (added a long-only-options invariant note next to the existing cli_parse.c rules).
- Rebase invariant: load-bearing. cli_parse.c is upstream-mirror with fork additions; the three handlers carry the fork-local shape of passing the ARG_* enum value (not 't' / 's' / 'c') to parse_unsigned(). If an upstream sync re-introduces the original short-option char shape, the assert returns, and the parked-then-promoted reproducer (cli_parse_corpus/cli_threads_abbrev_assert.argv) will surface it in the next nightly fuzz run.
- Upstream source: the bug shape exists in Netflix/vmaf master too (long-only options were added upstream with the same short-option-char placeholder). When the fork ports an upstream fix that overlaps these handlers, prefer the parse_unsigned(optarg, ARG_*, argv[0]) form already on the fork.
- On upstream sync: re-apply the three-line change in cli_parse.c if upstream resets the call-site args. The unit test is fork-local and stays.
meson setup core/build libvmaf -Denable_tests=true \
-Denable_cuda=false -Denable_sycl=false
ninja -C core/build test/test_cli_parse_long_only_args
meson test -C core/build test_cli_parse_long_only_args -v
ADR-0317 — CI flake fix: doc-only PR path-filter (2026-05-06)
- Touched files: .github/workflows/docker-image.yml — added a paths: filter on both the push: and pull_request: triggers. .github/workflows/ffmpeg-integration.yml — added a paths: filter on both the push: and pull_request: triggers (covers all four matrix lanes: gcc, clang, SYCL, Vulkan). docs/adr/0317-ci-doc-only-pr-flake-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ci-doc-only-pr-flakes.md.
- Rebase invariant: not load-bearing. Workflow-only change. Both files are fork-local CI; upstream Netflix/vmaf does not ship a Docker workflow or an FFmpeg-integration matrix in this shape, so rebase conflicts are unlikely. If a future upstream sync introduces an overlapping docker-image.yml or FFmpeg matrix, prefer the fork's path-filtered form — the rationale (ADR-0313 aggregator posture, doc-only-PR runner-time burn) is fork-specific.
- Upstream source: none — fork-local CI workflows.
- On upstream sync: no action required. If reviewers later add new build inputs (e.g. a top-level docker-compose.yml, a new ffmpeg-patches/*.txt config file), extend the paths: lists in the same PR that adds the input.
- Follow-up not in this ADR: patch ffmpeg-patches/0008-add-libvmaf_tune-filter.patch line 256 (outlink->frame_rate = mainlink->frame_rate;) needs to migrate to the ff_filter_link() accessor introduced in FFmpeg n7+, matching the pattern already in patches 0005 / 0006. Tracked separately; the path-filter does not hide it (any libvmaf/ or ffmpeg-patches/ PR will still trip the SYCL lane). Since resolved: see the n7+ API migration note (2026-05-06) under ADR-0312.
python3 -c "import yaml; \
yaml.safe_load(open('.github/workflows/docker-image.yml')); \
yaml.safe_load(open('.github/workflows/ffmpeg-integration.yml')); \
print('OK')"
0319 — fr_regressor_v2 ensemble LOSO trainer — real loader + per-fold training (ADR-0319)
- Touches: ai/scripts/train_fr_regressor_v2_ensemble_loso.py (real _load_corpus + _train_one_seed bodies), ai/scripts/run_ensemble_v2_real_corpus_loso.sh (wrapper argv fix), docs/ai/ensemble-v2-real-corpus-retrain-runbook.md (Step 0 corpus-generation section), ai/AGENTS.md (canonical-6 schema invariant note), ai/tests/test_train_fr_regressor_v2_ensemble_loso_*.py (loader + train schema tests). Closes the deferrals tracked in rebase-notes §0303 + §0309.
- Upstream source: none — fork-local ML training infrastructure. Netflix/vmaf upstream has no fr_regressor_v2 surface, no LOSO trainer, and no canonical-6 corpus tooling.
- Invariant: the trainer's _load_corpus accepts the canonical-6 JSONL schema emitted by scripts/dev/hw_encoder_corpus.py bit-for-bit — the required keys per row are (src, encoder, cq, frame_index, vmaf, adm2, vif_scale0..3, motion2). The codec block layout is a 12-slot ENCODER_VOCAB v2 one-hot + constant preset_norm = 0.5 + crf_norm = (cq - cq_min) / (cq_max - cq_min). Schema changes require an ENCODER_VOCAB_VERSION bump and a full ensemble retrain per the existing closed-vocabulary rule (ADR-0235 / ADR-0352). The fold-level StandardScaler is fit on the training rows only; leaking the held-out source's distribution into the scaler would silently inflate per-fold PLCC.
- On upstream sync: no action required. If upstream Netflix/vmaf ever adds a competing LOSO trainer under python/vmaf/, do NOT merge them — keep the fork's training stack under ai/ per the AGENTS.md scope rule.
pytest ai/tests/test_train_fr_regressor_v2_ensemble_loso_loader.py \
ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py -v
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
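The codec-block layout and the fold-local scaler rule can be sketched in plain Python. The vocabulary in the check below is a toy three-entry list, not the real 12-slot ENCODER_VOCAB v2. Where cq_min / cq_max come from is an assumption left to the caller. The scaler uses the population standard deviation with unit scale for constant columns, matching scikit-learn's StandardScaler defaults.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def codec_block(encoder: str, cq: float, vocab: Sequence[str], cq_min: float, cq_max: float) -> List[float]:
    """One-hot encoder slot + constant preset_norm + min-max-normalised cq."""
    if encoder not in vocab:
        raise KeyError(f"encoder {encoder!r} not in closed vocabulary")
    one_hot = [1.0 if name == encoder else 0.0 for name in vocab]
    preset_norm = 0.5
    crf_norm = (cq - cq_min) / (cq_max - cq_min)
    return one_hot + [preset_norm, crf_norm]


def fit_scaler(train_rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean/std from the training fold only; held-out rows are never seen."""
    cols = list(zip(*train_rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant column -> unit scale
    return means, stds


def apply_scaler(row: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    return [(x - m) / s for x, m, s in zip(row, means, stds)]
```

In a LOSO loop, fit_scaler runs once per fold on that fold's training rows, and apply_scaler is then used on both the training and held-out rows. Fitting on all rows would reproduce exactly the leak the invariant forbids.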
ADR-0323 — fr_regressor_v3 train + register on ENCODER_VOCAB v3 (2026-05-06)
- Scope: ai/scripts/train_fr_regressor_v3.py (new), ai/tests/test_train_fr_regressor_v3.py (new), model/tiny/fr_regressor_v3.onnx (new, real-weight checkpoint from a 9-fold LOSO gate-pass at mean PLCC 0.9975), model/tiny/fr_regressor_v3.json (new sidecar with encoder_vocab_version: 3 and the full per-fold trace), model/tiny/registry.json (new fr_regressor_v3 row, smoke: false), ai/AGENTS.md (the v3 retrain invariant section gains a "Status" subsection recording the gate result), docs/ai/models/fr_regressor_v3.md (new model card), docs/adr/0323-fr-regressor-v3-train-and-register.md + index row, changelog.d/added/fr-regressor-v3-train-register.md.
- Rebase impact: zero. Fork-local feature; no upstream Netflix/vmaf surface is touched. The 16-slot ENCODER_VOCAB_V3 imported from train_fr_regressor_v2.py was already landed by PR #401 (ADR-0302).
- On upstream sync: no action required. The v3 model ships alongside v2 — fr_regressor_v2.onnx and its sidecar are unchanged; the v3 row is added to the registry, which stays sorted alphabetically. If a future upstream sync ever lands a competing fr_regressor_v3 model under python/vmaf/, do NOT cross-link them — the fork's training stack lives under ai/.
- Watch out for: the live ENCODER_VOCAB_VERSION in ai/scripts/train_fr_regressor_v2.py stays at 2 (per ADR-0302's invariant). Do not bump it to 3 in this PR or in any downstream port; promoting v3 over v2 in place is a separate "promote v3 to authoritative" PR per ADR-0302's production-flip checklist.
pytest ai/tests/test_train_fr_regressor_v3.py -v
bash core/test/dnn/test_registry.sh # must report OK: 20+
python -c "import onnx; onnx.checker.check_model(onnx.load('model/tiny/fr_regressor_v3.onnx')); print('OK')"
ADR-0321 — fr_regressor_v2_ensemble_v1 full production flip (2026-05-06)
- Scope: ai/scripts/export_ensemble_v2_seeds.py (new), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.onnx (real full-corpus-trained weights replacing the 3025-byte synthetic scaffold bytes), model/tiny/fr_regressor_v2_ensemble_v1_seed{0..4}.json (new per-seed sidecars), model/tiny/registry.json (sha256 + smoke: false on the five seed rows), ai/AGENTS.md (new invariant: the registry flip is now done; future re-flips require a fresh PROMOTE.json + a re-run of the export driver).
- Rebase impact: zero. This is a fork-local production flip; no upstream Netflix/vmaf surface is touched. The 12-slot ENCODER_VOCAB v2 carried in each sidecar is the same one the LOSO trainer (ADR-0319) bakes into the codec-block layout, so there is no rebase-time vocabulary drift to worry about.
- Watch out for: if a future upstream sync ever introduces a competing fr_regressor_v2_ensemble_* model under python/vmaf/, do NOT cross-link them — the fork's ensemble weights are gated on runs/ensemble_v2_real/PROMOTE.json and are not portable to a different training stack.
bash core/test/dnn/test_registry.sh # must report OK: 19
python -c "import onnx; \
[onnx.checker.check_model(onnx.load(f'model/tiny/fr_regressor_v2_ensemble_v1_seed{i}.onnx')) \
for i in range(5)]; print('OK')"
ADR-0324 — Ensemble training kit (2026-05-06)
- Touches: tools/ensemble-training-kit/ (new), docs/adr/0324-ensemble-training-kit.md (new), docs/adr/README.md (index row), changelog.d/added/0324-ensemble-training-kit.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the kit assumes the LOSO wrapper hard-codes seeds (0 1 2 3 4). The orchestrator surfaces a warning if --seeds deviates but still hands off to the wrapper. If a future PR parameterises the wrapper's seed list, update both the wrapper and the kit's pass-through logic in lockstep.
- On upstream sync: no action required. The kit lives entirely under tools/ensemble-training-kit/ (a fork-local path) and only invokes other fork-local scripts (ai/scripts/, scripts/dev/, scripts/ci/).
- Re-test on rebase:
for f in tools/ensemble-training-kit/*.sh; do bash -n "$f" || echo "syntax error: $f"; done  # bash -n only checks its first file argument
bash tools/ensemble-training-kit/make-distribution-tarball.sh /tmp/kit-test.tar.gz
tar -tzf /tmp/kit-test.tar.gz | grep -q "tools/ensemble-training-kit/run-full-pipeline.sh"
ADR-0332 — External-competitor benchmark harness (2026-05-08)
- Touches: tools/external-bench/ (new), docs/adr/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/0332-external-bench-wrapper-only.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated), changelog.d/added/external-bench-harness.md (new), docs/research/0087-external-bench-competitor-survey-2026-05-08.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the harness is wrapper-only — never vendor or link x264-pVMAF (GPL-2.0) into this fork. Future competitors follow the same pattern (tools/external-bench/<competitor>/run.sh invokes a user-installed binary via an env var; its output is schema-shimmed into the canonical JSON shape). The output schema (frames[].{frame_idx, predicted_vmaf_or_mos, runtime_ms} + summary.{competitor, plcc, srocc, rmse, runtime_total_ms, params, gflops}) is the contract between every wrapper and compare.py. The runner parameter of run_wrapper MUST stay resolved at call time (not via default-arg binding) so monkeypatch-based tests work.
- On upstream sync: no action required. The harness lives entirely under tools/external-bench/ (a fork-local path) and never touches Netflix-shared code.
- Re-test on rebase:
python3 -c "import yaml; names=['docker-image','security-scans','lint-and-format','required-aggregator','ffmpeg-integration','libvmaf-build-matrix','rule-enforcement','tests-and-quality-gates']; [yaml.safe_load(open(f'.github/workflows/{n}.yml')) for n in names]; print('OK')"
# Spot-check the gate is present on every top-level job:
for f in docker-image security-scans lint-and-format ffmpeg-integration \
libvmaf-build-matrix rule-enforcement tests-and-quality-gates \
required-aggregator; do
grep -c "pull_request.draft == false" ".github/workflows/${f}.yml"
done # Each must report >= 1.
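The "resolve the runner at call time" rule is easiest to see in a minimal sketch. run_wrapper is named in the invariant above, but this signature and the keyword arguments passed to the runner are assumptions. The point is only where subprocess.run gets looked up.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_wrapper(cmd: Sequence[str], runner: Optional[Callable] = None) -> subprocess.CompletedProcess:
    # Resolve the runner at call time. A default of `runner=subprocess.run` is
    # evaluated once, when the def executes, so a test that later monkeypatches
    # subprocess.run would still hit the real process launcher.
    if runner is None:
        runner = subprocess.run
    return runner(list(cmd), capture_output=True, text=True, check=False)
```

With the None sentinel, monkeypatch.setattr(subprocess, "run", fake) in a test is enough to intercept every wrapper invocation.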
SSIM extractor registration fix (2026-05-08)
- Touches: core/src/feature/feature_extractor.c (upstream-mirror — adds one extern + one registry-array entry near the existing SSIM rows), core/src/feature/integer_ssim.c (upstream-mirror — adds #include "config.h" and refreshes the file-scope comment above vmaf_fex_ssim), core/src/meson.build (adds integer_ssim.c to the source list — fork-local diff), core/test/test_feature_extractor.c (adds one regression test alongside the existing tests), docs/metrics/features.md (table row + footnote ²), docs/state.md, changelog.d/fixed/ssim-extractor-registration.md.
- Invariant on the upstream-mirror files: the registry-array entry must remain inside the unconditional CPU block (the same block as &vmaf_fex_float_ssim / &vmaf_fex_float_ms_ssim) — vmaf_fex_ssim is CPU-only with no SIMD or GPU twin. The config.h include in integer_ssim.c is load-bearing on Vulkan-enabled LTO builds because feature_extractor.c and integer_ssim.c must agree on HAVE_VULKAN / HAVE_CUDA / HAVE_SYCL for the VmafFeatureExtractor struct layout to match across TUs.
- On upstream sync: if Netflix ever lands its own integer-SSIM registry row, drop the fork's row in favour of upstream's; the file structure is identical. If upstream removes integer_ssim.c entirely (the file has been dormant on master for years), revert the meson.build addition. Otherwise no action.
- Re-test on rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false && ninja -C build
./build/test/test_feature_extractor # 5/5 pass, includes new ssim row
./build/tools/vmaf --reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--feature ssim --output /tmp/ssim_smoke.xml && \
grep -q '<metric name="ssim"' /tmp/ssim_smoke.xml
# Vulkan-enabled LTO build (-Wlto-type-mismatch must stay clean)
meson setup build-vulkan -Denable_vulkan=enabled --reconfigure && \
ninja -C build-vulkan tools/vmaf
python3 -m pytest tools/external-bench/tests/ -q # must report 7 passed
for f in tools/external-bench/*/run.sh; do bash -n "$f" || echo "syntax error: $f"; done  # bash -n only checks its first file argument
0327 — Conformal-VQA prediction surface for vmaf-tune (ADR-0279)
- Touches: tools/vmaf-tune/src/vmaftune/conformal.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (Predictor.predict_vmaf_with_uncertainty), tools/vmaf-tune/src/vmaftune/cli.py (the predict subcommand gains --with-uncertainty / --calibration-sidecar / --alpha), tools/vmaf-tune/tests/test_conformal.py (new), docs/ai/conformal-vqa.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the conformal wrapper sits outside the ONNX graph and adds no new runtime dependency — conformal.py imports only the standard library (math, statistics, dataclasses, json, warnings). Future calibration-sidecar shapes use the method discriminator string for versioning; do not rename "split-conformal" / "cv-plus" without bumping the loader. The Predictor.predict_vmaf_with_uncertainty signature is the Python-API contract consumed by vmaf-tune predict --with-uncertainty; renaming or reordering its keyword args breaks the CLI in lockstep.
- On upstream sync: no action required. vmaf-tune is a fork-local tool; upstream Netflix/vmaf has no per-shot prediction surface.
- Re-test on rebase:
python3 -m pytest tools/vmaf-tune/tests/test_conformal.py -q
python3 -m pytest tools/vmaf-tune/tests/test_predictor.py -q
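For reviewers unfamiliar with the technique, the split-conformal calculation can be sketched with the standard library alone, which matches the no-new-dependency invariant above. This is a minimal sketch under stated assumptions, not the `conformal.py` API: `Interval`, `conformal_quantile`, and `predict_with_uncertainty` are hypothetical names, and the absolute-residual nonconformity score is an assumption (the real wrapper may score differently or use its `"cv-plus"` path).

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Split-conformal radius from calibration residuals (y - y_hat).

    Uses the finite-sample rank k = ceil((n + 1) * (1 - alpha)). When k > n the
    calibration set is too small to certify 1 - alpha coverage, so the radius
    is infinite rather than silently under-covering.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return scores[k - 1]


def predict_with_uncertainty(point: float, radius: float) -> Interval:
    # Symmetric band around the point prediction; the underlying model is untouched.
    return Interval(point - radius, point + radius)


# Four calibration residuals at alpha = 0.5: rank ceil(5 * 0.5) = 3, radius 3.0.
radius = conformal_quantile([1.0, -2.0, 3.0, -4.0], alpha=0.5)
print(predict_with_uncertainty(80.0, radius))  # Interval(lower=77.0, upper=83.0)
```

The `(n + 1)` correction is what gives marginal coverage of at least `1 - alpha` under exchangeability; with too few calibration points the honest answer is an unbounded interval.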
CI paths-ignore deny-list on heavy workflows (ADR-0341, 2026-05-09)
- Touches:
`.github/workflows/libvmaf-build-matrix.yml` (fork-local: `paths-ignore:` block under `pull_request:`), `.github/workflows/tests-and-quality-gates.yml` (fork-local: same block), `docs/adr/0341-ci-paths-ignore-doc-only-prs.md` + index fragment, `changelog.d/changed/ci-paths-ignore-doc-only.md`.
- Invariant: the deny-list must stay strictly documentation-only (`docs/**`, `**/*.md`, `changelog.d/**`, `CHANGELOG.md`, `.workingdir2/**`). Any path that contributes to a build, test, or lint input (`libvmaf/**`, `meson.build`, `meson_options.txt`, `subprojects/**`, `python/**`, `ai/**`, `mcp-server/**`, `model/**`, `testdata/**`, `.github/workflows/**`) must NEVER appear in the deny-list; otherwise the corresponding required check is silently skipped on any code-touching PR whose changes all fall inside the ignored paths (`paths-ignore` skips a run only when every changed file matches). The Required Checks Aggregator (ADR-0313) catches only the doc-only case (no required check ever ran for any required name); a too-broad deny-list would lose build coverage without anyone noticing.
- On upstream sync: Netflix/vmaf upstream does not carry these two workflow files (they are fork-local additions). No sync conflict expected.
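A quick way to sanity-check the deny-list invariant is to run representative build-input paths through it and confirm none match. The sketch below assumes Python's `fnmatch` as a stand-in for GitHub's glob matcher (its `*` also crosses `/`, so it only approximates GitHub's rules), and every path in `CANARIES` is a hypothetical example rather than a file the workflows are known to reference.

```python
from fnmatch import fnmatchcase
from typing import List

# The documentation-only deny-list from the invariant above.
DOC_ONLY_DENY = ["docs/**", "**/*.md", "changelog.d/**", "CHANGELOG.md", ".workingdir2/**"]

# Hypothetical canary files, one per build/test/lint input the invariant protects.
CANARIES = [
    "libvmaf/src/libvmaf.c",
    "meson.build",
    "meson_options.txt",
    "subprojects/example.wrap",
    "python/vmaf/__init__.py",
    "ai/train.py",
    "mcp-server/server.py",
    "model/vmaf_v0.6.1.json",
    "testdata/ref_576x324_48f.yuv",
    ".github/workflows/lint-and-format.yml",
]


def ignored(path: str, patterns: List[str]) -> bool:
    # fnmatch's "*" also crosses "/", so this only approximates GitHub's glob rules.
    return any(fnmatchcase(path, p) for p in patterns)


def run_skipped(changed: List[str], deny: List[str]) -> bool:
    # paths-ignore skips a run only when *every* changed file is ignored.
    return bool(changed) and all(ignored(p, deny) for p in changed)


def leaked_canaries(deny: List[str]) -> List[str]:
    # Any canary matched by the deny-list means a code-only PR could skip the check.
    return [c for c in CANARIES if ignored(c, deny)]


print(leaked_canaries(DOC_ONLY_DENY))                  # []
print(leaked_canaries(DOC_ONLY_DENY + ["python/**"]))  # ['python/vmaf/__init__.py']
```

The second call shows the failure mode the invariant guards against: adding `python/**` would let a Python-only PR skip both heavy workflows.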
0332 — mkdocs --strict validation policy (ADR-0332)
- Touches:
`mkdocs.yml` (`validation` block + `exclude_docs:`), `docs/mcp/embedded.md` (one anchor fix), `docs/research/0055-...md` (one anchor fix), `docs/{index,state,rebase-notes}.md` (small bare-relative-dir-link sweep). All fork-local; no upstream-shared paths touched.
- Upstream source: none. Netflix/vmaf upstream uses Sphinx / GitHub-rendered Markdown, not mkdocs; the `mkdocs.yml` config is wholly fork-local.
- Invariant: `mkdocs.yml` `validation:` must keep `links.{not_found,unrecognized_links}: info` until either (a) ADR-0028 / ADR-0106 are superseded by a less-strict immutability rule that allows refreshing renamed-ADR cross-refs in frozen ADR bodies, or (b) the ~820 cross-tree-pointer links from docs into source-tree files (`../../core/src/...`, `../../scripts/ci/...`, `../../.github/workflows/...`) are migrated to absolute GitHub URLs or moved into `docs_dir`-resident generated content. Promoting either category to `warn` while those conditions hold turns the docs lane permanently red, because `mkdocs build --strict` aborts on any warning.
- On upstream sync: no action; the lane is fork-local.
- Re-test on rebase:
HDR VMAF model search — Path C documentation only (2026-05-09)
- Files added (this fork only; upstream Netflix/vmaf has none of these):
  - `model/vmaf_hdr_model_card.md`: discoverable warning that the HDR scoring path falls back to the SDR `vmaf_v0.6.1.json` weights. The filename deliberately uses `.md`, not `.json`, so the `vmaftune.hdr.select_hdr_vmaf_model` glob (`vmaf_hdr_*.json`) keeps returning `None`.
  - `docs/research/0089-hdr-vmaf-model-search.md`: verbatim trail of the source-or-train survey (URLs + access dates).
  - `changelog.d/added/hdr-vmaf-model-search.md`: release-notes fragment per ADR-0221.
- ADR-0300 grew an inline `### Status update 2026-05-09: HDR model status` section.
- Why no model JSON ships: Path A negative findings (no public Netflix HDR VMAF model exists; HDRMAX is a different algorithm not loadable by libvmaf's JSON path). Path B deferred behind gated subjective HDR corpora + multi-day training compute. No fabricated weights are introduced.
- On upstream sync: if Netflix lands `vmaf_hdr_*.json` in `Netflix/vmaf/model/`, port via `/port-upstream-commit`; the resolver picks it up automatically with no `vmaftune` change. Then delete `model/vmaf_hdr_model_card.md` (or rewrite it as a normal model card describing the upstream weights). Watch https://github.com/Netflix/vmaf/issues/645 for the upstream release announcement.
- Re-test on rebase: no behavioural change (pure docs). Sanity:
python3 -c "from pathlib import Path; \
import sys; sys.path.insert(0,'tools/vmaf-tune/src'); \
from vmaftune.hdr import select_hdr_vmaf_model; \
print(select_hdr_vmaf_model(Path('model')))"
# Expect: None — confirms the .md card does not match the glob
mkdocs build --strict # must EXIT=0 with no WARNING lines
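Why the `.md` filename keeps the resolver returning `None` can be seen with a plain `pathlib` glob in a scratch directory. This is a hypothetical stand-in: `select_hdr_model_sketch` assumes the resolver takes the first sorted `vmaf_hdr_*.json` hit, and `vmaf_hdr_example.json` is an invented name used only to show an upstream drop-in being picked up.

```python
import tempfile
from pathlib import Path
from typing import Optional


def select_hdr_model_sketch(model_dir: Path) -> Optional[Path]:
    # Hypothetical stand-in for the resolver: first sorted `vmaf_hdr_*.json` hit, else None.
    hits = sorted(model_dir.glob("vmaf_hdr_*.json"))
    return hits[0] if hits else None


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "vmaf_v0.6.1.json").write_text("{}")
    (model_dir / "vmaf_hdr_model_card.md").write_text("# HDR model card\n")
    before = select_hdr_model_sketch(model_dir)  # the .md card never matches
    (model_dir / "vmaf_hdr_example.json").write_text("{}")  # hypothetical upstream drop-in
    after = select_hdr_model_sketch(model_dir)
    after_name = after.name if after else None

print(before, after_name)  # None vmaf_hdr_example.json
```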
ADR-0349 — fr_regressor_v3 namespace resolution (2026-05-09)
- Rebase impact: none. Docs-only change: adds ADR-0349, an append-only status appendix on ADR-0302 per ADR-0028, a `## fr_regressor_* namespace map` block in `ai/AGENTS.md`, and two changelog fragments. No upstream Netflix/vmaf surface touched; no `fr_regressor_*` registry rows touched (sha256s for `_v1`, `_v2`, `_v2_ensemble_v1_seed{0..4}`, `_v3` all unchanged); no C / Python / ONNX bytes modified.
- What to check after a rebase: nothing automated. The only drift risk is a future agent claiming `fr_regressor_v3plus_features` for an unrelated workstream; `ai/AGENTS.md` carries the reservation, and reviewers verify the map row exists before approving any new `fr_regressor_*` registry id.
- Reproducer:
```bash
# ADR + AGENTS.md namespace map present and consistent:
test -f docs/adr/0349-fr-regressor-v3-namespace.md
# -F: match the literal "*" instead of treating "_*" as a regex repetition.
grep -qF "fr_regressor_* namespace map" ai/AGENTS.md
# Check each file separately; a multi-file grep -q succeeds if either one matches.
grep -q "fr_regressor_v3plus_features" ai/AGENTS.md
grep -q "fr_regressor_v3plus_features" docs/adr/0349-fr-regressor-v3-namespace.md
# Status appendix present on ADR-0302:
grep -q "Status update 2026-05-09: namespace collision resolved" \
  docs/adr/0302-encoder-vocab-v3-schema-expansion.md
# Existing v3 production row bit-identical (sha256 unchanged):
python3 -c "
import json
reg = json.load(open('model/tiny/registry.json'))
v3 = next(m for m in reg['models'] if m['id'] == 'fr_regressor_v3')
assert v3['sha256'] == 'eaa16d23461eda74940b2ed590edfcaf13428aade294e47792a5a15f4d3b999c', v3
assert v3['smoke'] is False
print('OK: fr_regressor_v3 production row unchanged')
"
# Registry test still passes:
bash core/test/dnn/test_registry.sh
```
0327 — Pre-push PR-body deliverables validator hook
- Touches:
`scripts/ci/validate-pr-body.sh` (new), `scripts/git-hooks/pre-push` (new), `scripts/ci/test-validate-pr-body.sh` (new), `Makefile` (`hooks-install` target adds the pre-push symlink). Re-uses the `scripts/ci/deliverables-check.sh` parser verbatim; no upstream-shared file is modified.
- Invariant: parser shape parity with the `.github/workflows/rule-enforcement.yml` deep-dive-checklist gate (ADR-0108). The validator constructs a `PATH` shim that intercepts `git diff --name-only` calls only; every other `git` invocation falls through to the real binary.
- On upstream sync: not applicable; these files are entirely fork-local and Netflix has no equivalent. If `scripts/ci/deliverables-check.sh` is ever rewritten or moved, the validator's exec path (`scripts/ci/deliverables-check.sh`) and the test harness's expected exit codes must follow.
bash scripts/ci/test-validate-pr-body.sh # 8/8 cases pass
0320 — Semgrep # nosemgrep cites on Netflix-upstream Python harness (Research-0090)
- Touches:
`python/vmaf/core/asset.py`, `python/vmaf/core/executor.py`, `python/vmaf/core/feature_extractor.py`, `python/vmaf/core/quality_runner.py`, `python/vmaf/core/result_store.py`, `python/vmaf/tools/decorator.py`, `python/test/command_line_test.py`, `python/test/feature_extractor_test.py`, `python/test/ssimulacra2_test.py`, `python/vmaf/config.py`.
- Invariant: every fork-added `# nosemgrep: <rule-id>` line is paired with an inline cite to `Research-0090`. The cite + rule-id pair is the load-bearing artifact (per memory `feedback_no_guessing`: every "false positive" claim ships its safety proof). If an upstream sync removes the cited line of code, drop the cite-comment block too. If upstream adds a `defusedxml` fix at the `ElementTree.parse()` site (`feature_extractor.py:115`, `quality_runner.py:1496`), keep upstream's fix and drop our suppressions. `config.py:40` (the SSL-bypass deletion) is a fork-exclusive security fix; if upstream resurrects `ssl._create_unverified_context` on a sync, do not re-merge it: the bypass clobbers the process-global default and is unjustified per Research-0090, F1.
semgrep scan --config=p/cwe-top-25 --config=p/c --config=p/python . \
--metrics=off --json | jq '.results | length'
# expect 0: every legitimate finding either has a # nosemgrep cite or was fixed
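The cite-pairing invariant is mechanical enough to lint. The sketch below flags `# nosemgrep:` markers that have no `Research-0090` cite on the same line or shortly above it; the two-line lookback window and the `example.*` rule ids are assumptions made for illustration, not the fork's documented rule or real Semgrep rule names.

```python
import re
from typing import List, Tuple

NOSEMGREP = re.compile(r"#\s*nosemgrep:\s*(?P<rule>[\w.\-/]+)")
CITE = "Research-0090"


def uncited_suppressions(source: str, window: int = 2) -> List[Tuple[int, str]]:
    """Return (line_number, rule_id) for nosemgrep markers lacking a nearby cite.

    "Nearby" means the marker line or up to `window` lines above it; that window
    is an assumption for this sketch, not a documented rule.
    """
    lines = source.splitlines()
    missing = []
    for idx, line in enumerate(lines):
        match = NOSEMGREP.search(line)
        if match is None:
            continue
        context = lines[max(0, idx - window): idx + 1]
        if not any(CITE in ctx for ctx in context):
            missing.append((idx + 1, match.group("rule")))
    return missing


sample = "\n".join([
    "# Research-0090, F3: parses a repo-local fixture only",
    "tree = ET.parse(path)  # nosemgrep: example.use-defused-xml",
    "",
    "",
    "",
    "value = eval(expr)  # nosemgrep: example.eval-detected",
])
print(uncited_suppressions(sample))  # [(6, 'example.eval-detected')]
```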
0321 — Security-scans workflow registry-pack list (Research-0090)
- Touches:
`.github/workflows/security-scans.yml`, `.github/workflows/lint-and-format.yml`.
- Invariant: the registry packs the workflow cites (`p/cwe-top-25` + `p/c` + `p/python`) are validated against `https://semgrep.dev/c/p/<pack>`; the previously cited `p/cert-c-strict`, `p/cert-cpp-strict`, and `p/cpp` packs were retired by Semgrep in 2025 and now return 404. The `lint-and-format.yml` pattern of pulling `${{ github.* }}` values into `env:` (clang-tidy + clang-tidy-sycl steps) defuses `run-shell-injection`; preserve the pattern on any edit. See Research-0090, F2/F3.
# -w reports the final status after -L redirects (head -1 of -I output shows the first hop).
for pack in p/cwe-top-25 p/c p/python; do
code=$(curl -s -o /dev/null -L -w '%{http_code}' "https://semgrep.dev/c/${pack}")
[ "$code" = "200" ] && echo "${pack}: OK" || echo "${pack}: FAIL ($code)"
done
0320 — CodeQL C bulk sweep (78 deferred alerts → 60 fixed, 14 deferred to T7-5)
- Touches:
`core/src/feature/{cambi.c,ciede.c,integer_adm.c,integer_psnr.c,adm_tools.h,third_party/xiph/psnr_hvs.c}`, `core/src/feature/x86/{adm_avx2.c,adm_avx512.c,ansnr_avx2.c,ansnr_avx512.c,vif_avx2.c,vif_avx512.c}`, `core/src/{pdjson.c,svm.cpp}`, `core/test/{test_cpu.c,test_model.c}`, `core/tools/{y4m_input.c,yuv_input.c,vmaf_bench.c}`. All but `vmaf_bench.c` are upstream-mirror Netflix files.
- Invariant: widening casts on integer multiplications (`(size_t)`, `(uint64_t)`, `(double)`) are prefixed on the left operand before the multiply, never wrapped around the whole expression. Wrapping the whole expression is a no-op against `cpp/integer-multiplication-cast-to-long`: the multiply still runs in the narrow type, so the product can overflow before the cast widens it. Deleted commented-out blocks (e.g., the AVX-512 VP-loop dead variant in `adm_avx512.c::adm_dwt2_inverse`) are gone for good; if upstream brings them back, they reintroduce the alerts. `iqa/convolve.c` was deliberately left untouched: prefixing `(double)` on the float×float multiplications inside the scalar reference path breaks bit-exactness against the AVX2 path enforced by `test_iqa_convolve`, so that CodeQL alert is deferred to a follow-up that updates both paths in lockstep.
- On upstream sync: any upstream change that re-introduces the deleted comment blocks or rewrites the cast forms will surface the alerts again. The `cambi_score` signature change (`CambiBuffers buffers` → `const CambiBuffers *buffers`) is fork-local and likely to conflict with upstream patches that touch that function. The 14 deferred `VifBuffer` large-parameter alerts are tracked under T7-5 (multi-backend coordinated refactor including NEON).
- Re-test on rebase:
cd libvmaf && meson test -C build # all 50+ C tests
make test-netflix-golden # upstream golden gate
# Re-run CodeQL on master afterwards; the 60 fixed alerts must stay closed.
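The cast-placement rule is easiest to see numerically. Python integers never overflow, so this sketch emulates C's unsigned arithmetic with bit masks, assuming `uint32_t` operands widened to a 64-bit type; unsigned types are used because signed overflow is undefined behaviour in C and has no meaningful emulation. The function names are illustrative only.

```python
U32_MASK = (1 << 32) - 1
U64_MASK = (1 << 64) - 1


def mul_u32(a: int, b: int) -> int:
    """`uint32_t * uint32_t` in C: the product wraps modulo 2**32."""
    return (a * b) & U32_MASK


def cast_whole_expression(a: int, b: int) -> int:
    """`(uint64_t)(a * b)`: the cast only sees the already-wrapped 32-bit product."""
    return mul_u32(a, b) & U64_MASK


def cast_left_operand(a: int, b: int) -> int:
    """`(uint64_t)a * b`: b is converted to 64 bits, so the multiply is 64-bit."""
    return (a * b) & U64_MASK


w = h = 70_000  # deliberately oversized dimensions so w * h exceeds 2**32
print(cast_whole_expression(w, h))  # 605032704
print(cast_left_operand(w, h))      # 4900000000
```

Both forms agree whenever the product fits in 32 bits, which is why the wrapped form passes small tests yet still trips the CodeQL query.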
CodeQL cpp/declaration-hides-variable sweep (2026-05-09)
- What changed: mechanical rename / scope-tighten / dedupe sweep closing 64 open `cpp/declaration-hides-variable` CodeQL alerts on `master`. Touched files: `core/src/feature/cambi.c`, `core/src/feature/x86/adm_avx2.c`, `core/src/feature/x86/adm_avx512.c`, `core/src/feature/x86/vif_avx2.c`, `core/src/feature/x86/vif_avx512.c`. All five are upstream-mirror; the Netflix copyright header is preserved on each.
- Renames adopted (semantic names preferred over a `_2` suffix):
  - `cambi.c`: the inner `int err` shadowing the function-scope `err` becomes `mkdir_err` (heatmaps init) and `src_err` (full-ref extract path).
  - `adm_avx2.c` / `adm_avx512.c`: the `j == 0` first-column special-case block is wrapped in `{ ... }` so its `j0..j3` and `s0..s3` stop being visible to the per-`j` tail loop. The inner duplicate `__m256i add_shift_HP_vex = _mm256_set1_epi32(32768)` (and its 512-bit twin) is removed, since it is bit-identical to the function-scope value already in scope. The `__m256i rfactor1` that shadowed the function-scope `float rfactor1[3]` becomes `rfactor_v0` / `_v1` / `_v2` (and the AVX-512 twin likewise).
  - `vif_avx2.c` / `vif_avx512.c`: tap-loop locals follow `f_tap`, `r_top` / `r_bot`, `d_top` / `d_bot` for the s0 stage, and `f_tap0` / `f_tap1`, `r_back0` / `r_fwd0`, etc. for the AVX-512 paired-tap stage. Inner per-`fj` `__m256i fq` / `__m512i fq` shadows of the centre-tap broadcast become `f_tap`. Inner-block duplicates of function-scope `ref` / `dis` / `stride` / `ii` (identical types and initialisers) are simply removed. The two scalar `VifResiduals residuals` declarations that shadowed function-scope `Residuals512 residuals` become `tail_residuals`. The two `const uint16_t fcoeff` declarations that shadowed function-scope `__m512i fcoeff` become `fcoeff_scalar`.
- Invariant (bit-exactness gate): the rename sweep must not change any score. The three Netflix CPU golden fixtures (`src01_hrc00`, `checkerboard_1`, `checkerboard_10`) ran clean against this PR. All 76 VMAF-targeted Python tests pass; the 9 unrelated pre-existing failures (NIQE, PyPSNR, FileSystemResultStore) reproduce on a pristine `origin/master` checkout.
- On upstream sync: Netflix has no equivalent renames on upstream `master` as of 2026-05-09. When syncing, prefer the fork's renamed identifiers (the CodeQL gate depends on them). If Netflix later renames the same locals differently, reconcile by keeping the fork names and updating any imported chunks at port time.
- Re-test on rebase:
meson test -C build --suite=fast
PYTHONPATH=$PWD/python python3 -m pytest \
python/test/quality_runner_test.py -k test_run_vmaf \
python/test/vmafexec_test.py \
python/test/vmafexec_feature_extractor_test.py \
-m "not slow" -q
ADR-0209 v1 stdio runtime (T5-2b) — Embedded MCP server (2026-05-08)
- Touches:
`core/src/mcp/{mcp.c,dispatcher.c,transport_stdio.c,mcp_internal.h,meson.build,3rdparty/cJSON/{cJSON.c,cJSON.h,LICENSE}}`, `core/test/test_mcp_smoke.c`, `core/test/meson.build`. All paths are fork-local. cJSON is vendored verbatim from upstream `DaveGamble/cJSON@v1.7.18` under its MIT license.
- Invariant: every TU under `core/src/mcp/` (other than the vendored cJSON dir) is fork-local with the `Copyright 2026 Lusoris and Claude (Anthropic)` header; cJSON keeps its upstream MIT header verbatim. The public ABI in `core/include/libvmaf/libvmaf_mcp.h` is unchanged from T5-2; only function bodies flipped from `-ENOSYS` to working implementations. SSE / UDS still return `-ENOSYS` so the v2 PR can wire them without touching the public surface.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface; the entire `core/src/mcp/` subtree is fork-local. If upstream ever adds an MCP surface, expect a port-only sync since names will collide.
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
-Denable_mcp=true -Denable_mcp_stdio=true
ninja -C build && meson test -C build test_mcp_smoke -v
ADR-0334 — state.md-touch-check CI gate (2026-05-08)
- Touches:
`.github/workflows/rule-enforcement.yml` (new top-level job `state-md-touch-check`), `scripts/ci/state-md-touch-check.sh` (new), `scripts/ci/test-state-md-touch-check.sh` (new), `scripts/ci/AGENTS.md` (new rebase-sensitive-surface row), `.github/PULL_REQUEST_TEMPLATE.md` (already carries the "Bug-status hygiene" section + the `no state delta: REASON` opt-out, coupled to the script's regex). No upstream-shared paths.
- Invariant: the gate's trigger predicate (Conventional-Commit `fix:` prefix, bare `bug` token in the title, GitHub close-keywords `closes` / `fixes` / `resolves` + `#N`, unchecked Bug-status-hygiene checkbox) and opt-out sentinel (`no state delta: REASON`) match the wording of the `## Bug-status hygiene` section in `.github/PULL_REQUEST_TEMPLATE.md`. Reword the template only alongside the script. The job carries the `pull_request.draft == false || github.event_name != 'pull_request'` gate (ADR-0331 pattern); keep that on any future hoist into the required-aggregator set.
- On upstream sync: Netflix/vmaf has no equivalent rule. No conflict expected; the workflow file is fork-introduced.
- Re-test on rebase:
bash scripts/ci/test-state-md-touch-check.sh
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/rule-enforcement.yml')); print('YAML OK')"
pre-commit run shellcheck --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
pre-commit run shfmt --files scripts/ci/state-md-touch-check.sh scripts/ci/test-state-md-touch-check.sh
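The trigger predicate and opt-out described in the invariant can be modelled as a handful of regexes. This is a sketch under assumptions, not the script's logic: every pattern here is hypothetical (the real ones live in `scripts/ci/state-md-touch-check.sh`), the checkbox label is invented, and requiring a non-empty reason after `no state delta:` is an assumed detail.

```python
import re
from typing import Set

FIX_PREFIX = re.compile(r"^fix(\([^)]*\))?!?:", re.IGNORECASE)
BUG_TOKEN = re.compile(r"\bbug\b", re.IGNORECASE)
CLOSE_KEYWORD = re.compile(r"\b(closes|fixes|resolves)\s+#\d+", re.IGNORECASE)
# Hypothetical checkbox label; the real one lives in PULL_REQUEST_TEMPLATE.md.
UNCHECKED_BOX = re.compile(r"^\s*- \[ \] .*bug-status", re.IGNORECASE | re.MULTILINE)
OPT_OUT = re.compile(r"no state delta:\s*\S+", re.IGNORECASE)


def triggered(title: str, body: str) -> bool:
    return bool(
        FIX_PREFIX.search(title)
        or BUG_TOKEN.search(title)
        or CLOSE_KEYWORD.search(body)
        or UNCHECKED_BOX.search(body)
    )


def gate_passes(title: str, body: str, touched: Set[str]) -> bool:
    # A triggered PR must either touch docs/state.md or carry an opt-out with a reason.
    if not triggered(title, body):
        return True
    return "docs/state.md" in touched or OPT_OUT.search(body) is not None


print(gate_passes("fix: clamp motion score", "", {"core/src/x.c"}))  # False
```

Note that `\bbug\b` deliberately ignores words such as "debugger"; whatever the real regex does there is exactly the wording coupling the invariant warns about.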
SYCL PSNR chroma extension (T3-15(b), 2026-05-09)
- Touches:
`core/src/feature/sycl/integer_psnr_sycl.cpp` (per-extractor chroma device buffers, per-plane SSE accumulators, and a `provided_features` extension to `psnr_y` / `psnr_cb` / `psnr_cr`), `core/src/sycl/AGENTS.md` (per-kernel rebase-sensitive invariant for the chroma-on-per-extractor-buffer arrangement), `docs/metrics/features.md` (footnote ¹ refresh: all three GPU PSNR extractors now emit chroma), `docs/adr/0192-gpu-long-tail-batch-3.md` References-section status update, `changelog.d/added/sycl-psnr-chroma.md`.
- Invariant on the chroma upload path: chroma planes ride on per-extractor device buffers populated by host-side staging copies in the combined-graph `pre_fn` callback, NOT on the SYCL state's shared frame buffer (`vmaf_sycl_shared_frame_init`), which is luma-only by design. Luma stays graph-recorded; chroma SSE kernels run directly in `post_fn` on the same in-order combined queue. The CUDA twin (PR #520 / commit 7f3d58a5) uses the existing CUDA per-plane picture infrastructure and therefore has no equivalent invariant.
- On upstream sync: Netflix/vmaf upstream has no SYCL backend at all, so conflict probability is zero on `psnr_sycl`. If an upstream port to the fork's SYCL runtime someday extends `vmaf_sycl_shared_frame_init` to allocate chroma planes, the PSNR extension can be migrated onto it and the per-extractor chroma buffers retired, but only after a cross-backend gate run confirms bit-exactness against CPU at `places=4` (ADR-0214).
source /opt/intel/oneapi/setvars.sh
CC=icx CXX=icpx meson setup build-sycl libvmaf \
-Denable_sycl=true -Denable_cuda=false
ninja -C build-sycl
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary build-sycl/tools/vmaf \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--feature psnr --backend sycl --device 0
# Expect 0/48 mismatches across psnr_y / psnr_cb / psnr_cr at places=4.
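For context on what the per-plane SSE accumulators feed, the textbook conversion from a plane's sum of squared errors to PSNR is sketched below. It assumes an 8-bit peak of 255 and 4:2:0 chroma at half resolution in each direction; it is not the SYCL extractor's code, the SSE totals are hypothetical, and any finite clamp a production extractor applies for identical planes is not modelled.

```python
import math
from typing import Dict


def psnr_from_sse(sse: int, num_samples: int, bitdepth: int = 8) -> float:
    """Textbook PSNR in dB from a per-plane sum of squared errors.

    Identical planes (sse == 0) give inf here; a production extractor may clamp
    to a finite ceiling instead.
    """
    if sse == 0:
        return math.inf
    peak = (1 << bitdepth) - 1
    mse = sse / num_samples
    return 10.0 * math.log10(peak * peak / mse)


def plane_samples_420(width: int, height: int) -> Dict[str, int]:
    # 4:2:0 chroma is subsampled 2x horizontally and vertically (odd sizes round up).
    cw, ch = (width + 1) // 2, (height + 1) // 2
    return {"psnr_y": width * height, "psnr_cb": cw * ch, "psnr_cr": cw * ch}


samples = plane_samples_420(576, 324)  # {'psnr_y': 186624, 'psnr_cb': 46656, 'psnr_cr': 46656}
sse = {"psnr_y": 186624, "psnr_cb": 46656, "psnr_cr": 4 * 46656}  # hypothetical totals
scores = {name: psnr_from_sse(sse[name], samples[name]) for name in samples}
```

Because chroma planes hold a quarter of the luma samples, each chroma score must be normalised by its own sample count, which is one reason per-plane accumulators are kept separate.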
```text
Cppcheck nullPointer false-positive in dict.c (2026-05-09)
Files pinned:
`core/src/dict.c:121` (one-line redundant-condition fix in `dict_overwrite_existing`).
Why this rebase-note exists: master CI's `Cppcheck (Whole Project)` gate started failing on commit `14b5ffba` (#537) and blocked every open PR, because each PR rebases onto the broken master. The cppcheck finding was likely always present but masked by `paths-ignore` filtering on the prior workflow shape; PR #530 widened cppcheck's trigger surface and exposed it. The fix deletes the redundant `&& val` guard, since `val` is already checked at the public entry point `vmaf_dictionary_set` (`dict.c:137`). No behavior change; cppcheck flags the original as "either the val check is redundant or there's a possible null deref" because it cannot prove the interprocedural guarantee.
Rebase-sensitivity: zero; the change is local to `dict.c`. A future upstream sync of this file should keep the fix, or re-run cppcheck locally to confirm the finding does not recur.
Aggregator timeout bump (2026-05-09)
Files pinned:
`.github/workflows/required-aggregator.yml` (deadline 30→90 min, job timeout 35→100 min).
Why: 41 PRs in flight on the morning of 2026-05-09 hit Aggregator timeouts while the real CI eventually passed. Bumping both deadlines unblocks the train without touching the underlying matrix.
Rebase-sensitivity: zero; the workflow file is wholly fork-local.
ARC self-hosted runner pool — pilot Cppcheck routing (2026-05-09)
`.github/workflows/lint-and-format.yml` (Cppcheck `runs-on:` ternary).
Why: opt-in graceful migration; ADR-0359 + `docs/development/ci-runners.md` document the flip-the-variable recipe for when the cluster is degraded.
Rebase-sensitivity: zero; the workflow file is fork-local.
ADR-0338 — macOS Vulkan-via-MoltenVK CI lane (2026-05-09)
- Touches:
`.github/workflows/libvmaf-build-matrix.yml` (fork-local: adds the `Build — macOS Vulkan via MoltenVK (advisory)` lane, adds `continue-on-error` plumbing on `matrix.experimental && matrix.moltenvk`, adds the `Install MoltenVK + Vulkan loader/headers (macOS)` step, adds the `Run Vulkan smoke tests (macOS MoltenVK)` step, and gates the existing test/cache/tox steps on `!matrix.moltenvk`), `docs/backends/vulkan/moltenvk.md` (new fork-local doc), `docs/adr/0127-vulkan-compute-backend.md` (status-update appendix per the ADR's Proposed status; body untouched), `docs/adr/0338-macos-vulkan-via-moltenvk-lane.md` (new), `docs/adr/_index_fragments/0338-macos-vulkan-via-moltenvk-lane.md` plus `_order.txt` append (new), `docs/research/0089-moltenvk-feasibility-on-fork-shaders.md` (new), `changelog.d/added/macos-vulkan-via-moltenvk-lane.md` (new).
- Invariant on the upstream-mirror file: none; `libvmaf-build-matrix.yml` is fork-local. The new lane's `continue-on-error` clause MUST stay scoped to `matrix.experimental == true && matrix.moltenvk == true` so existing `experimental: true` matrix entries (e.g. the macOS DNN lane) keep their default fail-fast behaviour. `VK_ICD_FILENAMES` MUST point at `/opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json`; note the `etc/vulkan` segment, NOT `share/vulkan` (the homebrew formula's install layout uses `etc/`; verified against `Formula/m/molten-vk.rb`).
- On upstream sync: Netflix upstream has no macOS Vulkan lane and no MoltenVK awareness; nothing to reconcile. If a future MoltenVK release drops support for `GL_EXT_shader_atomic_int64` translation, `moment.comp` will fail on the lane. The fix path is in ADR-0338 §Decision (the lane is `continue-on-error`, so it does not block PRs): update the known-limitations table in `docs/backends/vulkan/moltenvk.md` and either pin a working MoltenVK version in the brew install line or rewrite the shader.
- Re-test on rebase:
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/libvmaf-build-matrix.yml'))" && \
echo "YAML parse OK"
# Confirm the lane is still in the matrix:
grep -q "Build — macOS Vulkan via MoltenVK (advisory)" \
.github/workflows/libvmaf-build-matrix.yml
# Confirm the lane is NOT promoted to required-aggregator until one
# green run on master (per ADR-0338):
! grep -q "macOS Vulkan via MoltenVK" \
.github/workflows/required-aggregator.yml
# Confirm the ICD path is the etc/ one, not share/:
grep -q "etc/vulkan/icd.d/MoltenVK_icd.json" \
.github/workflows/libvmaf-build-matrix.yml
ADR-0363 — Mend Renovate replaces Dependabot (2026-05-09)
- Touches:
`renovate.json` (new, repo-root), `.github/workflows/renovate.yml` (new), `.github/dependabot.yml` (deleted: renamed to `.github/dependabot.yml.disabled`), `docs/development/dependency-bot.md` (new operator playbook), `changelog.d/changed/renovate-supersedes-dependabot.md` (new), `docs/adr/0363-renovate-replaces-dependabot.md` (new), `docs/adr/_index_fragments/0363-renovate-replaces-dependabot.md` (new).
- Invariant: `.github/dependabot.yml` no longer exists on `master`; the disabled copy is `dependabot.yml.disabled`. On upstream sync, if Netflix ever ships their own `dependabot.yml`, do NOT restore it; the fork intentionally uses Renovate. Merge the upstream file into `dependabot.yml.disabled` for reference only.
- Upstream interaction: none. Netflix/vmaf upstream has no Renovate config. Conflict risk is zero unless upstream adds `renovate.json` or restores `dependabot.yml`.
- Re-test on rebase:
# Verify the workflow SHA-pin is still present and non-floating:
grep -E 'renovatebot/github-action@[a-f0-9]{40}' .github/workflows/renovate.yml
# Verify dependabot.yml is still absent:
test ! -f .github/dependabot.yml && echo "ok: dependabot.yml absent"
# Validate renovate.json syntax (requires Node):
node -e "JSON.parse(require('fs').readFileSync('renovate.json','utf8')); console.log('JSON valid')"
ADR-0355 — Symphony-inspired agent-dispatch infrastructure (2026-05-09)
Files added (all fork-introduced, none mirror upstream):
- `.claude/workflows/_template.md`, `.claude/workflows/codeql-alert-sweep.md`, `.claude/workflows/simd-port.md`, `.claude/workflows/feature-extractor-port.md`.
- `scripts/lib/__init__.py`, `scripts/lib/backlog_tracker.py`, `scripts/lib/AGENTS.md`.
- `scripts/ci/agent-eligibility-precheck.py` (new row in the `scripts/ci/AGENTS.md` "Rebase-sensitive surfaces" table).
- `docs/development/agent-dispatch.md`.
Why this rebase-note exists: the change is purely additive and every path is fork-only (`.claude/`, `scripts/lib/`, fork-only docs). Upstream Netflix/vmaf has no `.claude/`, no `scripts/lib/`, and no `docs/development/agent-dispatch.md`, so the merge surface on `/sync-upstream` is zero. The only coupling is internal, between `scripts/ci/agent-eligibility-precheck.py` and `scripts/lib/backlog_tracker.py` (a `sys.path` import). Both files move together; this is documented in `scripts/lib/AGENTS.md` and in a new row in `scripts/ci/AGENTS.md`.
Rebase-sensitivity: zero w.r.t. upstream. Internally, renaming `BacklogItem` field names or the `BacklogTracker` / `GitHubTracker` public method signatures is a breaking change for the precheck and any future state-audit script; guard via the smoke listed in Research-0091 §"Smoke results" before any rename PR.
Format-coupling note: the BACKLOG.md row regex (`scripts/lib/backlog_tracker.py:_ID_PATTERN`) is brittle against table-shape edits. If a future BACKLOG.md edit adds a column or renames a status word, the parser will silently mis-classify rows. The smoke parses 101 rows on master at 2026-05-09; expect ≥ 100 after any structural edit.
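The format-coupling risk can be reproduced with a toy parser that reads status from a fixed column index. Everything here is hypothetical (the `ID_PATTERN`, the status vocabulary, and the table rows); the real `_ID_PATTERN` and column handling in `backlog_tracker.py` may differ. The point is the failure shape: inserting one column leaves the row count intact while every status degrades.

```python
import re
from typing import Dict

# Hypothetical ID regex and status vocabulary; the real `_ID_PATTERN` may differ.
ID_PATTERN = re.compile(r"^\|\s*(T\d+-\d+[a-z]?)\s*\|")
KNOWN_STATUSES = {"open", "in-progress", "done"}


def parse_rows(markdown: str, status_col: int = 2) -> Dict[str, str]:
    """Map backlog ID -> status, reading status from a fixed column index."""
    rows = {}
    for line in markdown.splitlines():
        match = ID_PATTERN.match(line)
        if match is None:
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        status = cells[status_col] if len(cells) > status_col else ""
        # Unknown words are not rejected, just bucketed: the silent failure mode.
        rows[match.group(1)] = status if status in KNOWN_STATUSES else "unknown"
    return rows


three_cols = """| ID | Title | Status |
|----|-------|--------|
| T3-9 | psnr_hvs AVX-512 ceiling | done |
| T7-5 | VifBuffer refactor | open |"""

# Same rows after someone inserts an Owner column before Status.
four_cols = """| ID | Title | Owner | Status |
|----|-------|-------|--------|
| T3-9 | psnr_hvs AVX-512 ceiling | agent | done |
| T7-5 | VifBuffer refactor | agent | open |"""

print(parse_rows(three_cols))  # {'T3-9': 'done', 'T7-5': 'open'}
print(parse_rows(four_cols))   # {'T3-9': 'unknown', 'T7-5': 'unknown'}
```

A row-count smoke alone would pass both tables, so a structural-edit check is more robust if it also asserts that few rows land in the `unknown` bucket.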
0350 — psnr_hvs AVX-512 ceiling re-bench (ADR-0350, T3-9 (a))
- `docs/adr/0350-psnr-hvs-avx512-ceiling.md`: closure ADR.
- `docs/adr/0160-psnr-hvs-neon-bitexact.md`: appended `### Status update 2026-05-09` appendix.
- `docs/research/0091-psnr-hvs-avx512-bench-2026-05-09.md`: empirical companion (cycle share, Amdahl ceiling, reproducer).
Why this rebase-note exists: T3-9 (a) closes as an AVX2 ceiling. The result has zero rebase-sensitivity by itself (no engine code changes), but the bit-exactness invariants that lock it to a ceiling are rebase-sensitive. The 78.42 % scalar tail in `calc_psnrhvs_avx2` / `calc_psnrhvs_neon` is locked by the ADR-0138 / ADR-0139 "per-lane-scalar float reduction" rule (carried by ADR-0159 / ADR-0160). If a future upstream sync of `core/src/feature/third_party/xiph/psnr_hvs.c` (the Xiph/Daala DCT) changes the per-block summation tree (e.g. partial folding, re-ordered means, vectorised mask reductions), the AVX2 + NEON TUs in `core/src/feature/x86/psnr_hvs_avx2.c` and `core/src/feature/arm64/psnr_hvs_neon.c` MUST be re-audited against the new scalar reference, and the ceiling argument in ADR-0350 must be re-run, because the 78 / 15 cycle-share split would shift.
Rebase-sensitivity: low for the ceiling decision itself (an empirical re-bench on a current host is cheap: 30 seconds via the reproducer in Research-0091 §7); high for the underlying bit-exactness invariants the decision rests on (the Netflix golden gate trips on ≥ 5.5e-5 drift per ADR-0160 §Context). The ADR-0350 §Verification reproducer is the gate: re-run it if the cycle share shifts, the Netflix normal-pair fixture changes, or a new host class (e.g. wide-issue Granite Rapids) goes into CI.
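The ceiling argument is Amdahl's law applied to the 78.42 % scalar tail. The sketch below is a deliberately generous upper bound: it treats every non-tail cycle as accelerable, which overstates the headroom given the 78 / 15 split, and the 2x factor for a wider vector share is a hypothetical illustration, not a measured AVX-512 gain.

```python
import math


def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Whole-function speedup when only `accelerated_fraction` of cycles gets `factor`x faster."""
    if not 0.0 <= accelerated_fraction <= 1.0 or factor <= 0.0:
        raise ValueError("fraction must be in [0, 1] and factor must be > 0")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


SCALAR_TAIL = 0.7842  # cycle share locked to the per-lane scalar reduction
vectorisable = 1.0 - SCALAR_TAIL

# Upper bound: even if every non-tail cycle became free, the tail still runs.
ceiling = amdahl_speedup(vectorisable, math.inf)
# Hypothetical AVX-512 gain: the non-tail share runs exactly 2x faster.
doubled = amdahl_speedup(vectorisable, 2.0)
print(f"{ceiling:.3f}x ceiling, {doubled:.3f}x with a 2x faster vector share")
```

Even the infinite-speedup bound stays near 1.28x, which is why the locked scalar tail, rather than vector width, decides the outcome.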
0320 — FFmpeg n8.1 → n8.1.1 base bump (2026-05-09)
- Touches:
`ffmpeg-patches/series.txt` (header comment), `ffmpeg-patches/README.md` (apply / verify / smoke sections), `ffmpeg-patches/test/build-and-run.sh` (`FFMPEG_SHA` default), `scripts/ci/ffmpeg-patches-check.sh` (header comment; `FFMPEG_BRANCH` env default unchanged at `release/8.1` since the branch tracks point releases), `docs/development/automated-rule-enforcement.md` (gate description). The 9 `.patch` files themselves are unchanged; every patch in the series applied cleanly, cumulatively, against pristine `n8.1.1` via `git am --3way`.
- Upstream source: FFmpeg upstream point release n8.1.1 (commit `239f2c7` "Bump micro for 8.1.1"), bug-fix-only on top of n8.1, with no API or AVOption breakage that the patch stack consumes.
- Invariant: the patch stack continues to apply against the current tip of FFmpeg's `release/8.1` branch. Per ADR-0118 and ADR-0186 §FFmpeg patch coupling, the verification gate is cumulative `git am --3way` against a pristine checkout, not per-patch standalone apply. The `scripts/ci/ffmpeg-patches-check.sh` local gate uses `git apply` (no commit) but accumulates state in the same way.
- On upstream sync: no action required. If a future FFmpeg point release (n8.1.2 or n8.2) lands new hunks that conflict with one of the patches, regenerate the affected patches via `git format-patch` on the resolved state, bump the references in the five files listed under "Touches", and add a fresh rebase-notes entry citing the conflict file(s).
- Re-test on rebase:
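rm -rf /tmp/ffmpeg-n811 && \
git clone --depth 1 --branch n8.1.1 \
https://git.ffmpeg.org/ffmpeg.git /tmp/ffmpeg-n811
git -C /tmp/ffmpeg-n811 config user.email agent@local
git -C /tmp/ffmpeg-n811 config user.name agent
# Run from the fork's repo root; git -C resolves relative paths against the clone,
# so pass absolute patch paths, and stop at the first patch that fails.
for p in ffmpeg-patches/[0-9][0-9][0-9][0-9]-*.patch; do
git -C /tmp/ffmpeg-n811 am --3way "$PWD/$p" || { echo "FAILED: $p"; break; }
done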
bash scripts/ci/ffmpeg-patches-check.sh
ADR-0281 follow-up — QSV install-matrix discoverability backfill (2026-05-08)
- Touches:
`docs/getting-started/install/{arch,fedora,ubuntu,macos,windows}.md` (new `## Intel QSV` section per page), `docs/adr/0281-vmaf-tune-qsv-adapters.md` (status-update appendix per ADR-0028), `changelog.d/changed/qsv-install-matrix-docs.md` (new fragment). No code, no engine, no upstream-shared C / Python source touched. Pure documentation backfill closing the SYCL-audit research-0086 Topic C gap (issue #464).
- Invariant: each per-OS QSV section pins the package names against verified upstream URLs with a `Verified 2026-05-08` access date. The hardware-generation matrix is sourced from the public "Hardware decoding and encoding" table in the Wikipedia "Intel Quick Sync Video" article; if Intel revises which generation supports AV1 encode (e.g. backports the encoder to Lunar Lake / Meteor Lake silicon currently absent from the table), the matrix must move in lockstep across the four pages that carry it verbatim (Arch / Fedora / Ubuntu / Windows). The macOS page deliberately omits the matrix (QSV is unsupported on macOS).
- On upstream sync: no action required; Netflix/vmaf upstream does not ship per-OS install pages under `docs/getting-started/install/`, so that tree is fork-only.
# Lint the install pages (markdownlint via pre-commit):
pre-commit run --files docs/getting-started/install/*.md
# Verify each page (except alpine + macos) still carries the matrix:
for f in arch fedora ubuntu windows; do grep -q 'Arc Battlemage' "docs/getting-started/install/${f}.md" || echo "MISSING: ${f}"; done
# Confirm the macOS page documents QSV as unsupported:
grep -q 'Intel QSV. is unsupported on macOS' docs/getting-started/install/macos.md
0333 — vmaf-tune Phase F multi-pass encoding (ADR-0333)
Touches:
- `tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py` (`CodecAdapter` Protocol gains `supports_two_pass: bool` + `two_pass_args(...)`)
- `tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py` (overrides both)
- `tools/vmaf-tune/src/vmaftune/encode.py` (`EncodeRequest` gains `pass_number` / `stats_path`; `build_ffmpeg_command` adds the 2-pass argv splice + pass-1 null-muxer redirect; new `run_two_pass_encode`)
- `tools/vmaf-tune/src/vmaftune/corpus.py` (`CorpusOptions.two_pass`, routing in `iter_rows`)
- `tools/vmaf-tune/src/vmaftune/cli.py` (`--two-pass` flag on the `corpus` / `recommend` subparsers)
Invariant: 2-pass encoding routes through the codec adapter via `supports_two_pass` + `two_pass_args(pass_number, stats_path)`. The encode driver never branches on codec name. Adapters with `supports_two_pass = False` are handled gracefully (single-pass fallback with a stderr warning); the seam is open for sibling codec adapters (libx264, libsvtav1, libvvenc, libaom-av1) to opt in by overriding the two methods in their own adapter file alone. This is the fork-local extension to the ADR-0237 Phase A multi-codec contract; upstream Netflix/vmaf has no equivalent and does not own this code path.
Re-test:
(Optional, requires ffmpeg + libx265 in the runner's PATH:)
VMAF_TUNE_INTEGRATION=1 python -m pytest \
tests/test_codec_adapter_x265_two_pass.py::test_real_x265_two_pass_smoke -q
Rebase-sensitivity: zero from upstream; `tools/vmaf-tune/` is fork-local. The only concern is the `codec_adapters` Protocol shape: any future commit that adds a sibling codec adapter SHOULD inherit the `supports_two_pass = False` default and either explicitly opt in or leave the flag off. Sibling-codec PRs in this fork should follow the ADR-0288 / ADR-0333 pattern: one adapter file, override the two methods, add a test file mirroring `test_codec_adapter_x265_two_pass.py`.
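The adapter seam can be sketched as a `typing.Protocol` plus a driver that only consults `supports_two_pass` and `two_pass_args`. This is an illustrative sketch, not the `vmaftune` code: `X265Sketch`, `NoTwoPassSketch`, and `build_argv` are hypothetical names, and the argv shapes are assumptions based on common ffmpeg usage (`-x265-params pass=N:stats=PATH` for libx265, `-f null -` to discard pass-1 output).

```python
import sys
from typing import List, Optional, Protocol


class CodecAdapter(Protocol):
    encoder: str
    supports_two_pass: bool

    def two_pass_args(self, pass_number: int, stats_path: str) -> List[str]: ...


class X265Sketch:
    encoder = "libx265"
    supports_two_pass = True

    def two_pass_args(self, pass_number: int, stats_path: str) -> List[str]:
        # libx265 receives its pass/stats settings through -x265-params.
        return ["-x265-params", f"pass={pass_number}:stats={stats_path}"]


class NoTwoPassSketch:
    encoder = "libsvtav1"
    supports_two_pass = False

    def two_pass_args(self, pass_number: int, stats_path: str) -> List[str]:
        return []


def build_argv(adapter: CodecAdapter, src: str, dst: str,
               pass_number: Optional[int] = None,
               stats_path: str = "two_pass.stats") -> List[str]:
    argv = ["ffmpeg", "-y", "-i", src, "-c:v", adapter.encoder]
    if pass_number is not None and not adapter.supports_two_pass:
        # Graceful fallback: warn and encode single-pass; no codec-name branching.
        print(f"warning: {adapter.encoder}: no 2-pass support, encoding single-pass",
              file=sys.stderr)
        pass_number = None
    if pass_number is None:
        return argv + [dst]
    argv += adapter.two_pass_args(pass_number, stats_path)
    if pass_number == 1:
        # Pass 1 only produces the stats file, so send the video to the null muxer.
        return argv + ["-f", "null", "-"]
    return argv + [dst]


print(build_argv(X265Sketch(), "in.y4m", "out.mkv", pass_number=1, stats_path="s.log"))
```

Opting a sibling codec in then means editing only its adapter class, which is the property the invariant protects.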
ADR-0360 — CAMBI CUDA port (T3-15a, 2026-05-09)¶
Files pinned:
- core/src/feature/cuda/integer_cambi_cuda.c (new)
- core/src/feature/cuda/integer_cambi_cuda.h (new)
- core/src/feature/cuda/integer_cambi/cambi_score.cu (new)
- core/src/feature/feature_extractor.c (added vmaf_fex_cambi_cuda to the list)
- core/src/meson.build (added cambi_score to cuda_cu_sources, added integer_cambi_cuda.c to the CUDA feature sources)
Why: this is the CUDA twin of vmaf_fex_cambi (Strategy II hybrid: three GPU kernels for the embarrassingly parallel stages, with calculate_c_values + topK on the CPU). It registers vmaf_fex_cambi_cuda under an #if HAVE_CUDA guard.
Rebase-sensitivity: low. The three new files are wholly fork-local and will not conflict. The two upstream-shared files have small, self-contained hunks:
- feature_extractor.c: the extern vmaf_fex_cambi_cuda declaration and the &vmaf_fex_cambi_cuda array entry are inside an #if HAVE_CUDA block. Upstream's additions to this file (new feature extractors, new dispatch flags) will not conflict unless Netflix adds their own CUDA twin for CAMBI (unlikely: upstream's CUDA backend carries no CAMBI extractor).
- meson.build: the cambi_score entry in the cuda_cu_sources dict and the integer_cambi_cuda.c line in the CUDA sources list. Any upstream change to meson.build that restructures the cuda_cu_sources dict would require a manual merge; the dict entries are sorted alphabetically by key, so cambi_score lands between adm_score and motion_score.
If upstream adds cambi_cuda themselves: drop the fork copy and check for API divergence. Strategy II hybrid is the natural choice; the upstream implementation may differ if they choose Strategy III (fully-on-GPU calculate_c_values).
cambi_internal.h dependency: integer_cambi_cuda.c includes core/src/feature/cambi_internal.h (fork-added trampoline exposing cambi.c's static helpers). If upstream significantly refactors cambi.c (renames vmaf_cambi_preprocessing, vmaf_cambi_calculate_c_values, etc.), cambi_internal.h must be updated alongside. This is the same dependency the Vulkan twin (cambi_vulkan.c) has — see ADR-0210's rebase note for the full list of exposed functions.
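The meson.build note relies on cuda_cu_sources staying alphabetically sorted, and that rule can be checked mechanically. This is a small sketch. It assumes the keys are available as plain strings. The neighbour keys are illustrative and do not form the full dict, and both helper names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def first_unsorted_pair(keys: List[str]) -> Optional[Tuple[str, str]]:
    """Return the first adjacent (a, b) with a > b, or None if keys are sorted."""
    for a, b in zip(keys, keys[1:]):
        if a > b:
            return (a, b)
    return None


def insert_sorted(keys: List[str], new_key: str) -> List[str]:
    """Return a copy of keys with new_key at its alphabetical position."""
    out = list(keys)
    bisect.insort(out, new_key)
    return out


# Illustrative neighbours only; the real cuda_cu_sources dict has more keys.
print(insert_sorted(["adm_score", "motion_score"], "cambi_score"))
# ['adm_score', 'cambi_score', 'motion_score']
```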
Vulkan submit-pool PR-B: six secondary kernels (2026-05-09, ADR-0353)¶
Files changed:
- core/src/feature/vulkan/ssim_vulkan.c
- core/src/feature/vulkan/ciede_vulkan.c
- core/src/feature/vulkan/ms_ssim_vulkan.c
- core/src/feature/vulkan/motion_v2_vulkan.c
- core/src/feature/vulkan/float_psnr_vulkan.c
- core/src/feature/vulkan/float_motion_vulkan.c
- core/src/feature/vulkan/AGENTS.md
- docs/adr/0353-vulkan-submit-pool-pr-b-six-kernels.md
Why this rebase-note exists: six Vulkan host-glue TUs were migrated from per-frame command-buffer and descriptor-set allocation to the VmafVulkanKernelSubmitPool abstraction (ADR-0256). Any Netflix upstream sync that touches these same files (unlikely — they are fork-local) must preserve the VmafVulkanKernelSubmitPool fields in the state struct and the pool-destroy-before-pipeline-destroy ordering in close_fex().
Rebase-sensitivity: low. All six files are entirely fork-local; Netflix upstream does not have a Vulkan backend. The submit-pool API is defined in core/src/vulkan/kernel.h (also fork-local). No public header or C-API surface was changed; the FFmpeg patch series is unaffected.
Key invariant to preserve on rebase: vmaf_vulkan_kernel_submit_pool_destroy MUST be called before vmaf_vulkan_kernel_pipeline_destroy in every migrated kernel's close_fex(). See core/src/feature/vulkan/AGENTS.md §"Submit-pool ordering invariant".
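The ordering invariant can be checked against a recorded teardown sequence. The sketch below is a pure-Python model. The two destroy names come from this entry, but close_fex() here only appends strings to a log and calls no Vulkan API. A real test would need some hypothetical hook that records the C calls.

```python
from typing import List

POOL_DESTROY = "vmaf_vulkan_kernel_submit_pool_destroy"
PIPELINE_DESTROY = "vmaf_vulkan_kernel_pipeline_destroy"


def close_fex(log: List[str]) -> None:
    """Model of a migrated teardown: records calls instead of making them."""
    log.append(POOL_DESTROY)      # pool still references command buffers/fences
    log.append(PIPELINE_DESTROY)  # only safe once the pool is gone


def recorded_teardown() -> List[str]:
    log: List[str] = []
    close_fex(log)
    return log


def ordering_ok(log: List[str]) -> bool:
    """True if no pipeline destroy happens before a submit-pool destroy."""
    pool_gone = False
    for call in log:
        if call == POOL_DESTROY:
            pool_gone = True
        elif call == PIPELINE_DESTROY and not pool_gone:
            return False
    return True
```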
0354 — Vulkan submit-pool PR-C: submit_pool_destroy-before-pipeline ordering¶
- Touches:
core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vulkan/float_ansnr_vulkan.c, core/src/feature/vulkan/moment_vulkan.c.
- Invariant: in every migrated extractor, vmaf_vulkan_kernel_submit_pool_destroy() MUST precede every vmaf_vulkan_kernel_pipeline_destroy() call in close_fex(). Reversing the order frees the pool's command buffers after the pipeline's command pool is destroyed — undefined behaviour per Vulkan spec §6.2.
- Re-test:
meson test -C build --suite=vulkan passes. scripts/ci/cross_backend_vif_diff.py shows places=4 for all four extractors on all three target devices (RTX 4090, Arc A380, RADV iGPU).
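For reference, here is a sketch of what places=4 means. It assumes the cross-backend scripts apply the same rule as Python's unittest.TestCase.assertAlmostEqual: the absolute difference, rounded to `places` decimals, must be zero. Under that rule the effective tolerance is about 5e-5, not 1e-4. first_mismatch() is a hypothetical helper for per-frame score lists.

```python
from typing import Optional, Sequence


def almost_equal(a: float, b: float, places: int = 4) -> bool:
    """assertAlmostEqual's rule: round(|a - b|, places) must equal zero."""
    return round(abs(a - b), places) == 0


def first_mismatch(cpu: Sequence[float], gpu: Sequence[float],
                   places: int = 4) -> Optional[int]:
    """Index of the first frame whose scores disagree at `places`, else None."""
    for i, (a, b) in enumerate(zip(cpu, gpu)):
        if not almost_equal(a, b, places):
            return i
    return None
```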
0231 — Vulkan submit-pool migration PR A: adm + motion + psnr (ADR-0291 / ADR-0352)¶
- Touches:
core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/psnr_vulkan.c (all fork-local Vulkan kernels; no upstream C paths touched), changelog.d/changed/vulkan-submit-pool-pr-a-adm-motion-psnr.md, docs/adr/0291-vulkan-submit-pool-pr-a-adm-motion-psnr.md.
- Invariant: each migrated TU adds a VmafVulkanKernelSubmitPool sub_pool and pre-allocated VkDescriptorSet field(s) to its state struct. The pool must be destroyed (vmaf_vulkan_kernel_submit_pool_destroy) before vmaf_vulkan_kernel_pipeline_destroy in close_fex(); reversing the order would destroy the descriptor pool while the submit pool still holds live command buffer + fence references. Descriptor sets allocated via vmaf_vulkan_kernel_descriptor_sets_alloc are freed implicitly by the descriptor pool tear-down — do NOT call vkFreeDescriptorSets on them in close_fex(). For motion_vulkan, the pre-allocated set is rebound once per frame via vkUpdateDescriptorSets because the blur ping-pong changes which blur[] slot is "current"; for adm_vulkan and psnr_vulkan the sets are stable after init() and require no per-frame update.
- Upstream interaction: none. All three files are fork-local Vulkan kernel TUs not present in Netflix/vmaf upstream.
- On upstream sync: zero interaction. Upstream cannot conflict with this PR's paths. The Vulkan backend is entirely fork-introduced.
- Re-test on rebase:
meson test -C build --suite=fast
# Cross-backend parity gate (places=4):
python python/test/cross_backend_diff.py \
--features adm motion psnr \
--backend vulkan cpu \
--places 4 \
--yuv testdata/yuv/src01_hrc00_576x324.yuv \
testdata/yuv/src01_hrc01_576x324.yuv
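A few lines of Python can model why motion_vulkan needs a per-frame vkUpdateDescriptorSets while adm_vulkan and psnr_vulkan do not. The model assumes a two-slot ping-pong indexed by frame parity. The binding names are hypothetical, and the real kernel may track the current slot differently.

```python
from typing import Dict, List


def motion_bindings(frame_index: int) -> Dict[str, int]:
    """Hypothetical map from descriptor binding to blur[] slot for one frame.

    Assumes a two-slot ping-pong: the current slot alternates with frame
    parity and the other slot holds the previous frame's blur.
    """
    cur = frame_index % 2
    return {"blur_cur": cur, "blur_prev": 1 - cur}


def descriptor_updates(bindings_per_frame: List[Dict[str, int]]) -> int:
    """Count frames whose bindings differ from the previous frame's.

    Each such frame stands in for one vkUpdateDescriptorSets call.
    """
    updates = 0
    previous = None
    for bindings in bindings_per_frame:
        if bindings != previous:
            updates += 1
        previous = bindings
    return updates


frames = 5
motion = [motion_bindings(i) for i in range(frames)]
static = [{"src": 0, "dst": 1}] * frames  # adm/psnr-style: written once at init()
print(descriptor_updates(motion), descriptor_updates(static))  # 5 1
```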
ADR-0350 — FFmpeg libvmaf filter CUDA backend selector (0010 patch)¶
Patch: ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch.
- libavfilter/vf_libvmaf.c — adds the cuda AVOption + state field + init / cleanup / picture-pool wiring under CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER.
- configure — adds --enable-libvmaf-cuda (EXTERNAL_LIBRARY_LIST entry + help text), promotes libvmaf_cuda from blanket autodetect to a gated enabled libvmaf_cuda && require_pkg_config + check, and preserves the enabled libvmaf && check_pkg_config libvmaf_cuda in-filter probe so the new selector still works without the explicit flag when libvmaf ships CUDA.
Why this rebase-note exists: Patch 0010 extends the SYCL (0003) / Vulkan (0004) per-context backend selectors to CUDA on the regular libvmaf filter. The patch coexists with the upstream dedicated libvmaf_cuda filter (CONFIG_LIBVMAF_CUDA_FILTER) by gating its struct field and code paths on !CONFIG_LIBVMAF_CUDA_FILTER — the dedicated filter keeps owning its own cu_state field. CLAUDE.md §12 r14 makes the patch update mandatory because the change touches a filter consumer of the vmaf_cuda_state_init / _import_state / _state_free / _preallocate_pictures / _fetch_preallocated_picture C-API surface in libvmaf_cuda.h.
Rebase-sensitivity: low. The patch's vf_libvmaf.c hunks are context-anchored on the SYCL/Vulkan selector blocks; if upstream FFmpeg renames CONFIG_LIBVMAF_CUDA_FILTER or moves the libvmaf_cuda.h include, the include guard at the top of the file needs the corresponding update. The configure hunks are context-anchored on the existing --enable-libvmaf-sycl / --enable-libvmaf-vulkan lines; those have proven stable across n8.0 → n8.1 → n8.1.1, so drift risk is low. When VmafCudaConfiguration ever grows a device_index field upstream, swap the cuda boolean for an int cuda_device mirroring SYCL's shape (separate ADR + patch refresh).
Verification gate: cumulative git am --3way replay of ffmpeg-patches/000{1..9}-*.patch + 0010-* against pristine FFmpeg n8.1.1 PASS (2026-05-09). Build of libavfilter/vf_libvmaf.o PASS under both CONFIG_LIBVMAF_CUDA=0 (selector errors at filter-init time per the #else branch) and CONFIG_LIBVMAF_CUDA=1 && !CONFIG_LIBVMAF_CUDA_FILTER (selector active, picture-pool wiring compiles).
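The verification gate replays the series in numeric order. The sketch below builds that argv. It assumes four-digit zero-padded patch prefixes as in ffmpeg-patches/, so plain string sorting is already numeric. It also shows why the shell glob 000{1..9}-*.patch needs 0010-* listed separately: the brace range stops at 0009. replay_order() and git_am_argv() are hypothetical helpers.

```python
import fnmatch
from typing import Iterable, List

PATCH_GLOB = "[0-9][0-9][0-9][0-9]-*.patch"


def replay_order(names: Iterable[str]) -> List[str]:
    """Numbered patches in apply order; zero padding makes sorting numeric."""
    return sorted(n for n in names if fnmatch.fnmatch(n, PATCH_GLOB))


def git_am_argv(patch_dir: str, names: Iterable[str]) -> List[str]:
    """argv for one cumulative `git am --3way` replay of the whole series."""
    return ["git", "am", "--3way"] + [f"{patch_dir}/{n}" for n in replay_order(names)]
```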
0320 — Vulkan instance / VMA apiVersion bump to 1.4 (Step B)¶
- Touches:
core/src/vulkan/common.c, core/src/vulkan/vma_impl.cpp, core/src/vulkan/AGENTS.md.
- Invariant: the four apiVersion sites (lines 54, 264, 374 of common.c; line 22 of vma_impl.cpp) request Vulkan 1.4, not 1.3. Together with the Step-A precise decorations in vif.comp / ciede.comp (PR #346) and the Phase-3 cross-subgroup release-acquire fix (PR #511), this gates the cross-backend places=4 contract on Arc + RADV. NVIDIA closure depends on Phase 3c (PR #512; block-on-merge until that lands). Netflix upstream does not carry a VMA dependency or a Vulkan backend; no upstream merge conflict is expected on these files.
- Re-test on rebase:
meson setup build -Denable_vulkan=enabled -Denable_cuda=false \
-Denable_sycl=false --buildtype=release
ninja -C build
for D in 0 1 2; do
python3 scripts/ci/cross_backend_parity_gate.py \
--vmaf-binary build/tools/vmaf \
--reference python/test/resource/yuv/src01_hrc00_576x324.yuv \
--distorted python/test/resource/yuv/src01_hrc01_576x324.yuv \
--width 576 --height 324 --pixel-format 420 --bitdepth 8 \
--backends cpu vulkan --vulkan-device "$D" \
--features vif ciede adm motion psnr
done
# All 0/N mismatches at places=4 once Phase 3c (PR #512) has landed.
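The 1.3 to 1.4 bump involves two different encodings of the same version. This sketch computes both. It uses the bit packing of Vulkan's VK_MAKE_API_VERSION macro (variant<<29 | major<<22 | minor<<12 | patch) and the decimal form VMA uses for VMA_VULKAN_VERSION, which these notes swap between 1003000 and 1004000. The helper names are illustrative.

```python
def vk_make_api_version(variant: int, major: int, minor: int, patch: int) -> int:
    """Bit packing of Vulkan's VK_MAKE_API_VERSION macro."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def vk_api_version_minor(version: int) -> int:
    """Minor field, extracted the way VK_API_VERSION_MINOR does."""
    return (version >> 12) & 0x3FF


def vma_vulkan_version(major: int, minor: int, patch: int = 0) -> int:
    """Decimal encoding VMA uses for VMA_VULKAN_VERSION (1003000 == 1.3)."""
    return major * 1_000_000 + minor * 1_000 + patch


VK_API_VERSION_1_3 = vk_make_api_version(0, 1, 3, 0)
VK_API_VERSION_1_4 = vk_make_api_version(0, 1, 4, 0)
print(VK_API_VERSION_1_3, VK_API_VERSION_1_4)              # 4206592 4210688
print(vma_vulkan_version(1, 3), vma_vulkan_version(1, 4))  # 1003000 1004000
```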
ADR-0332 v2 runtime (T5-2c) — Embedded MCP server UDS + real compute_vmaf (2026-05-09)¶
- Touches:
core/src/mcp/{mcp.c,dispatcher.c,mcp_internal.h,meson.build,compute_vmaf.c,transport_uds.c}, core/test/test_mcp_smoke.c. All paths are fork-local. No new third-party vendor drop in v2 — mongoose vendoring stays deferred to v3 with the SSE transport.
- Invariant: same as ADR-0209 v1 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only function bodies flipped: vmaf_mcp_start_uds from -ENOSYS to a working AF_UNIX listener; compute_vmaf from a {"status":"deferred_to_v2"} placeholder to a real vmaf_score_pooled binding). Per ADR-0128 § operational guardrails the UDS socket file is created mode 0700; that chmod happens in vmaf_mcp_start_uds after bind and is a load-bearing security invariant — do NOT relax it on rebase. compute_vmaf runs on a per-call ephemeral VmafContext so the host's main scoring run is unperturbed; do NOT rewire it to reuse server->ctx because vmaf_score_pooled commits the model destructively to the context.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. If upstream adds one, expect a port-only sync since names will collide.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
-Denable_mcp=true -Denable_mcp_stdio=true \
-Denable_mcp_uds=true
ninja -C build && meson test -C build test_mcp_smoke -v
# Real-score smoke (single 576x324 pair):
build/test/test_mcp_smoke 2>&1 | tail -3 # expects "16 tests run, 16 passed"
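Python's socket module can illustrate the 0700 invariant. This is a sketch, not the C implementation. open_uds_listener() and socket_file_mode() are hypothetical. Placing chmod before listen() is an assumption, since the entry only requires it after bind. Note that between bind() and chmod() the file briefly carries umask-derived permissions.

```python
import os
import socket
import stat
import tempfile


def open_uds_listener(path: str) -> socket.socket:
    """Bind an AF_UNIX listener, then restrict the socket file to its owner."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)        # bind() creates the socket file (mode from umask)
        os.chmod(path, 0o700)  # tighten to owner-only, as the invariant requires
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def socket_file_mode() -> int:
    """Create a throwaway listener and report its permission bits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mcp.sock")
        sock = open_uds_listener(path)
        try:
            return stat.S_IMODE(os.stat(path).st_mode)
        finally:
            sock.close()
```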
ADR-0332 v3 runtime (T5-2d) — Embedded MCP server SSE transport (2026-05-09)¶
- Touches:
core/src/mcp/{mcp.c,mcp_internal.h,meson.build,transport_sse.c}, core/meson_options.txt, core/test/test_mcp_smoke.c, docs/mcp/embedded.md, docs/adr/0332-mcp-runtime-v2.md (status-update appendix). All paths are fork-local. No third-party vendor drop in v3 — the originally planned mongoose vendoring was reversed because cesanta/mongoose 7.18 is GPL-2.0-only OR commercial, incompatible with the fork's BSD-3-Clause-Plus-Patent license (verified at upstream LICENSE 2026-05-09). The SSE transport is plain POSIX sockets in fork-owned C (~500 LOC).
- Invariant: same as ADR-0209 / ADR-0332 v2 — the entire core/src/mcp/ subtree is fork-local; the public ABI in core/include/libvmaf/libvmaf_mcp.h is unchanged (only vmaf_mcp_start_sse's body flipped from -ENOSYS to a working AF_INET listener). The SSE listener binds INADDR_LOOPBACK only; do NOT switch to INADDR_ANY without a separate ADR + auth design (v3 intentionally ships without CORS/Bearer/per-session auth, on the assumption of a same-host trust boundary). The SSE stop path uses shutdown(SHUT_RDWR) before close(): a plain close() of an AF_INET listening fd from another thread does NOT unblock accept() on Linux, so do NOT remove the shutdown call. enable_mcp_sse is now a feature option (default auto), no longer a boolean defaulting to false.
- On upstream sync: no action required. Netflix/vmaf upstream has no embedded MCP surface. Do NOT re-introduce mongoose (or any GPL-licensed HTTP library) on a future rebase without first amending CLAUDE §1 and adding a separate license-compatibility ADR.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false \
-Denable_mcp=true -Denable_mcp_stdio=true \
-Denable_mcp_uds=true \
-Denable_mcp_sse=enabled
ninja -C build && meson test -C build test_mcp_smoke -v
build/test/test_mcp_smoke 2>&1 | tail -3 # expects "17 tests run, 17 passed"
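The shutdown-before-close rule can be sketched with Python sockets on Linux. Behaviour on other kernels may differ. All function names are hypothetical. The listener binds 127.0.0.1 only. stop() calls shutdown(SHUT_RDWR) first, which makes the accept() blocked in the worker thread fail with an OSError (EINVAL on Linux), and only then joins the thread and closes the fd.

```python
import socket
import threading
from typing import List, Tuple


def start_loopback_listener() -> socket.socket:
    """Bind to 127.0.0.1 only (port 0 = any free port), never 0.0.0.0."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(8)
    return s


def accept_loop(listener: socket.socket, errors: List[OSError]) -> None:
    """Serve until accept() fails; on Linux that happens after shutdown()."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError as exc:
            errors.append(exc)
            return
        conn.close()  # a real server would hand the connection off here


def stop(listener: socket.socket, worker: threading.Thread) -> None:
    # shutdown() wakes the thread blocked in accept(); close() alone would not.
    listener.shutdown(socket.SHUT_RDWR)
    worker.join(timeout=5.0)
    listener.close()


def demo() -> Tuple[str, bool, int]:
    """Start, stop, and report (bound host, worker still alive?, errors seen)."""
    listener = start_loopback_listener()
    host = listener.getsockname()[0]
    errors: List[OSError] = []
    worker = threading.Thread(target=accept_loop, args=(listener, errors))
    worker.start()
    stop(listener, worker)
    return host, worker.is_alive(), len(errors)
```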
Status update 2026-05-09 — placeholder-ref hardening¶
- Additional touches: same set as the 2026-05-08 ADR-0334 entry, no new files. The hardening adds a git diff -U0 ... -- docs/state.md call inside scripts/ci/state-md-touch-check.sh (case 4a) plus 10 additional fixture cases in scripts/ci/test-state-md-touch-check.sh.
- New invariant: inserted lines in docs/state.md (lines starting with +, excluding the +++ b/... header) must not contain "this PR", "this commit", a bare "TBD", "<PR>" or "#NNN". Canonical accept forms are PR #N and commit `<sha>`. The placeholder vocabulary is coupled to PR #541's audit findings — reword in lockstep with the ADR-0334 status-update appendix if the fork's row template changes.
- Re-test on rebase: same bash scripts/ci/test-state-md-touch-check.sh run as the 2026-05-08 entry; the harness now reports 18/18 passed (was 8/8 passed).
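This sketch applies the placeholder rule to git diff -U0 output. The vocabulary is the one the entry lists. The exact regular expression is an assumption, including treating "bare TBD" as a word-bounded TBD. It is not necessarily what state-md-touch-check.sh matches.

```python
import re
from typing import List

# Vocabulary from the entry; this exact regex is an assumption.
PLACEHOLDER_RE = re.compile(r"this PR|this commit|\bTBD\b|<PR>|#NNN")


def placeholder_hits(diff_text: str) -> List[str]:
    """Inserted lines of a `git diff -U0` that still carry a placeholder."""
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++ "):
            continue  # skip context/removed lines and the +++ b/... header
        if PLACEHOLDER_RE.search(line):
            hits.append(line[1:])
    return hits
```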
0347 — Sanitizer matrix test-set scope (ADR-0347)¶
- Touches:
.github/workflows/tests-and-quality-gates.yml job sanitizers (build + test step), core/test/meson.build (no edits — the absence of any suite: 'unit' tag is the upstream state we now work with rather than against).
- Invariant: the sanitizer job runs the full C unit-test set per sanitizer with a per-sanitizer deselect list driven by a case block on ${{ matrix.sanitizer }}. The deselect lists are load-bearing — each entry corresponds to a real bug tracked in docs/state.md. Under UBSan the build adds -Dc_args=-fno-sanitize=function -Dcpp_args=-fno-sanitize=function to suppress the K&R-prototype harness UB; the meson case branch must keep this build flag in sync with the test deselect entries. An upstream rebase that adds new test files via core/test/meson.build inherits full sanitizer coverage automatically (the workflow enumerates tests via meson test --list).
- On upstream sync: if upstream Netflix lands a suite: 'unit' tagging convention, the workflow is robust to it (we already enumerate from meson test --list, not from --suite=unit). If upstream rewrites the harness to declare static char *test_X(void) with a (void) parameter, the -fno-sanitize=function flag becomes redundant — leave it in place (zero cost) until a deliberate cleanup PR reverts the suppression. If upstream lands a fix for any of the surfaced defects (SVMModelParser validation, feature_collector metadata leak, integer_adm::div_lookup race, framesync mutex mismatch), drop the corresponding deselect row from the workflow's case block in the same PR that pulls the upstream fix.
cd libvmaf
for SAN in address undefined thread; do
  EXTRA=()
  [ "$SAN" = undefined ] && EXTRA=( "-Dc_args=-fno-sanitize=function" "-Dcpp_args=-fno-sanitize=function" )
  rm -rf "build-$SAN"
  CC=clang CXX=clang++ LDFLAGS=-fuse-ld=lld \
    meson setup "build-$SAN" -Db_sanitize="$SAN" \
      -Denable_cuda=false -Denable_sycl=false --buildtype=debug \
      -Db_lto=false -Db_lundef=false "${EXTRA[@]}"
  meson compile -C "build-$SAN"
  case "$SAN" in
    address)   EXCLUDE='test_model$|test_predict$|test_float_ms_ssim_min_dim$' ;;
    undefined) EXCLUDE='test_model$' ;;
    thread)    EXCLUDE='test_model$|test_pic_preallocation$|test_framesync$' ;;
  esac
  TESTS=$(meson test -C "build-$SAN" --list \
    | grep '^libvmaf:' \
    | grep -vE "$EXCLUDE" \
    | sed 's/^libvmaf://')
  meson test -C "build-$SAN" --print-errorlogs $TESTS
done
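Here is the deselect step of the sanitizer loop rewritten as a Python function for illustration. The EXCLUDE patterns are copied from the workflow. The libvmaf: prefix in meson test --list output is assumed exactly as the shell pipeline assumes it. The trailing $ anchors matter: test_model$ drops test_model but keeps a hypothetical test_model_ops.

```python
import re
from typing import Dict, List

# Per-sanitizer deselect patterns, copied from the workflow's case block.
EXCLUDE: Dict[str, str] = {
    "address": r"test_model$|test_predict$|test_float_ms_ssim_min_dim$",
    "undefined": r"test_model$",
    "thread": r"test_model$|test_pic_preallocation$|test_framesync$",
}


def select_tests(list_output: str, sanitizer: str) -> List[str]:
    """Equivalent of grep '^libvmaf:' | grep -vE "$EXCLUDE" | sed 's/^libvmaf://'."""
    pattern = re.compile(EXCLUDE[sanitizer])
    selected = []
    for line in list_output.splitlines():
        if not line.startswith("libvmaf:"):
            continue
        if pattern.search(line):
            continue
        selected.append(line[len("libvmaf:"):])
    return selected
```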
CodeQL bulk mechanical sweep — Python tree (2026-05-09)¶
- Why this matters on rebase: no rebase impact. The diff lives entirely in python/vmaf/ and one fork-local helper (core/src/vulkan/spv_embed.py). None of the touched Python modules have been changed by Netflix upstream in over four years; the closest churn is unrelated additions to python/vmaf/script/run_*.py driver flags. A future /sync-upstream will land on a clean tree.
- What changed: dead imports removed; exit() → sys.exit() in seven CLI driver scripts; open(...) → with open(...) in python/vmaf/tools/decorator.py and core/src/vulkan/spv_embed.py; typed except KeyError handlers whose body was a bare pass got an explanatory one-line comment to satisfy py/empty-except; pass removed where it was a no-op tail statement; one commented-out debug block deleted from tools/misc.py.
- Re-test on rebase: python3 -c "import ast; [ast.parse(open(f).read()) for f in (...)]" over the touched files; ruff check over the same set must produce no NEW errors versus the master baseline.
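Below is a slightly fuller version of the ast re-test, written in the same with open(...) style the sweep introduced. syntax_error(), check_files() and main() are hypothetical names. To use it, call sys.exit(main(sys.argv[1:])) in a script and pass the touched files.

```python
import ast
import sys
from typing import Iterable, List, Optional


def syntax_error(source: str, filename: str = "<string>") -> Optional[str]:
    """Return 'file:line: message' if source does not parse, else None."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


def check_files(paths: Iterable[str]) -> List[str]:
    """Parse every file, closing each handle via `with open(...)`."""
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            err = syntax_error(fh.read(), path)
        if err:
            problems.append(err)
    return problems


def main(argv: List[str]) -> int:
    """Print each parse failure to stderr; exit status 1 if any were found."""
    problems = check_files(argv)
    for line in problems:
        print(line, file=sys.stderr)
    return 1 if problems else 0
```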
0345 — cambi × {CUDA, SYCL, HIP} GPU port planning (ADR-0345, docs-only)¶
- Touches:
docs/research/0091-cambi-gpu-port-planning-2026-05-09.md (new), docs/adr/0345-cambi-gpu-port-strategy.md (new), docs/adr/_index_fragments/0345-cambi-gpu-port-strategy.md (new fragment), docs/adr/_index_fragments/_order.txt (append slot), changelog.d/changed/cambi-gpu-planning-digest.md (new). No code. Companion to the per-port PRs that follow per the digest's §6 ordered plan (CUDA → SYCL → HIP).
- Upstream source: none — fork-local planning artefact. Netflix/vmaf upstream has no CUDA / SYCL / HIP cambi twin and no plans to add one on those backends.
- Invariant: the planning round locks Strategy II host-staged hybrid for the three pending backends, inheriting verbatim from ADR-0205 §Decision and ADR-0210 §Decision. The cross-backend gate contract for cambi is places=4 from day one on all backends — by construction (integer-only GPU pre-passes; byte-identical readback; unmodified host residual). If any per-port PR sees empirical drift from CPU, fix the kernel — never relax the gate (memory feedback_no_test_weakening). The shared cambi_internal.h host residual surface (shipped with PR #196 for the Vulkan port) is the load-bearing reuse point — all four GPU twins (Vulkan, CUDA, SYCL, HIP) link against it and inherit any future CPU-side c-value formula change automatically.
- On upstream sync: no action required. If a future upstream sync introduces a Netflix/vmaf cambi GPU twin (extremely unlikely — Netflix has no public CUDA / SYCL / HIP cambi work), evaluate whether to drop the fork's twin in favour of upstream's per the standard prefer-upstream rule; otherwise no action.
- Re-test on rebase: docs-only — no compile / runtime gate. The Strategy III v2 follow-up (parked per ADR-0205 §Out of scope) gets its own ADR + rebase-notes entry when profile data lands.
0320 — Vulkan VIF API-1.4 NVIDIA residual Phase 3b (deferral)¶
- Touches:
core/src/feature/vulkan/shaders/vif.comp (comment-only update at the Phase-4 reduction site — documents the Phase-3b candidate-fix experiments and the driver-side hypothesis; no code logic change vs. PR #511); docs/adr/0269-vif-ciede-precise-step-a.md (appended Phase-3b status update appendix; ADR body remains frozen per ADR-0028); docs/research/0090-...md (new); docs/state.md (row T-VK-VIF-1.4-RESIDUAL-ARC retired in favour of T-VK-VIF-1.4-RESIDUAL-NVIDIA-DEFERRED after the hardware-mapping correction); core/src/vulkan/AGENTS.md (Phase 3b update + rebase invariant for cross-backend gate device-name selection); changelog.d/fixed/vif-arc-mesa-anv-int64-reduction.md (new fragment).
- Invariant: the workgroup-scope memoryBarrierShared(); barrier(); pair PR #511 introduced is load-bearing for the Arc + RADV lanes at API 1.4 and stays. Phase 3b confirmed it cannot be downgraded back to a bare barrier() even if the NVIDIA residual ever closes — Arc's clean state is contingent on the workgroup-scope pair.
- Cross-backend gate device-selection invariant (NEW): scripts that target a specific Vulkan vendor must select by deviceName substring, not by --vulkan_device <index>. vmaf_vulkan_context_new's device sort is stable inside the same devtype_score bucket, so within a bucket the index simply follows the vkEnumeratePhysicalDevices enumeration order, which is host-policy-dependent (driver registration order in /etc/vulkan/icd.d/, Mesa device-select layer, VK_LOADER_* env vars). PR #511's commit message inverted the device map on this fork's CI workstation; the empirical numbers it cited as "NVIDIA" actually came from Arc and vice versa. New cross-backend lanes targeting a specific vendor should not inherit the off-by-one.
- On upstream sync: vif.comp is fork-local; upstream Netflix/vmaf has no Vulkan path. Cherry-picks from upstream cannot reach this file.
- Re-test on rebase (assumes a multi-GPU CI workstation with NVIDIA + Arc + RADV; lavapipe-only CI lanes are a no-op for the API-1.4 residual since lavapipe never reproduced the bug):
# Local API-1.4 bump (off-master reproducer; do NOT commit).
sed -i 's/VK_API_VERSION_1_3/VK_API_VERSION_1_4/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1003000/VMA_VULKAN_VERSION 1004000/' \
  core/src/vulkan/vma_impl.cpp
cd libvmaf && meson setup build -Denable_vulkan=enabled \
  -Denable_cuda=false -Denable_sycl=false && ninja -C build
cd ..
# NVIDIA lane — expected 45/48 FAIL scale 2 until either the
# manual int64 subgroup-reduction patch lands or NVIDIA fixes
# the driver. Arc + RADV expected 0/48.
python3 scripts/ci/cross_backend_vif_diff.py \
  --vmaf-binary core/build/tools/vmaf \
  --reference testdata/ref_576x324_48f.yuv \
  --distorted testdata/dis_576x324_48f.yuv \
  --width 576 --height 324 \
  --feature vif --backend vulkan --device
# Revert local bump after testing.
sed -i 's/VK_API_VERSION_1_4/VK_API_VERSION_1_3/g' \
  core/src/vulkan/common.c
sed -i 's/VMA_VULKAN_VERSION 1004000/VMA_VULKAN_VERSION 1003000/' \
  core/src/vulkan/vma_impl.cpp
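Selecting a device by deviceName substring can be sketched as below. The sketch assumes the lane already has the device names in the same order the consumer indexes them. Because vmaf_vulkan_context_new sorts by devtype_score, raw vkEnumeratePhysicalDevices order may not match. How that list is obtained is left open, and the example names in the usage are illustrative. Refusing ambiguous matches is a design choice of this sketch.

```python
from typing import List


def matching_indices(device_names: List[str], needle: str) -> List[int]:
    """Indices of devices whose name contains needle (case-insensitive)."""
    n = needle.lower()
    return [i for i, name in enumerate(device_names) if n in name.lower()]


def device_index_by_name(device_names: List[str], needle: str) -> int:
    """Exactly one match or an error; never fall back to a guessed index."""
    matches = matching_indices(device_names, needle)
    if len(matches) != 1:
        raise LookupError(f"{needle!r} matched {len(matches)} devices: {device_names}")
    return matches[0]


names = ["Intel(R) Arc(tm) A380 Graphics (DG2)",
         "NVIDIA GeForce RTX 4090",
         "AMD Radeon Graphics (RADV RAPHAEL_MENDOCINO)"]
rotated = names[1:] + names[:1]  # same hardware, different host enumeration order
print(device_index_by_name(names, "A380"), device_index_by_name(rotated, "A380"))  # 0 2
```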
Upstream-port-later batch — Research-0090 18-commit triage close-out (2026-05-09)¶
- Touches:
docs/state.md (one row in "Deferred (waiting on external trigger)"), this file, changelog.d/changed/upstream-port-later-batch-2026-05-09.md. No code touched. Companion to PR #446 (Research-0090) and the in-flight PRs #497 (MyTestCase super-PR), #443 / #444 (cambi-docs duplicate pair).
- Per-commit classification (input set: 18 PORT_LATER SHAs from Research-0090):
| # | Upstream SHA | Subject (truncated) | Verdict | Reopen / forward path |
|---|---|---|---|---|
| 1 | 38e905d1 | adopt MyTestCase + reformat BD-rate test data | PORT_DEFERRED | Subsumed by PR #497 commit e1dbdc09; close out when #497 merges |
| 2 | 005988ea | adopt MyTestCase + port new tests + align fifo_mode | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 3 | 4679db83 | fix VMAFEXEC_score tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit 0004d2cf — must preserve fork's golden places= values byte-for-byte (CLAUDE §8 / ADR-0024) |
| 4 | 3e075107 | adopt MyTestCase + update score values in vmafexec tests | PORT_DEFERRED | Subsumed by PR #497 commit 0004d2cf; close out when #497 merges |
| 5 | e3827e4d | adopt MyTestCase + port new tests in asset/bootstrap/local_explainer | PORT_DEFERRED | Subsumed by PR #497 commit 6c05afe2; close out when #497 merges |
| 6 | 25ff9f18 | remove empty VmafossexecCommandLineTest stub | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 13-line deletion. PR #497 currently RE-EMITS the stub; once #497 lands, cherry-pick this commit standalone (zero-conflict against post-#497 tip). |
| 7 | 3a041a97 | adopt MyTestCase + update score values | PORT_DEFERRED | Subsumed by PR #497 commit d52d9221; close out when #497 merges |
| 8 | ead2d12b | fix vif_scale3 + adm3_egl_1 tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit b5a3f61b — Netflix-golden tolerance guard same as row 3 |
| 9 | 6c097fc4 | reduce ADM/VIF tolerances for macOS FP precision | PORT_DEFERRED w/ Netflix-golden guard | PR #497 commit f3881d5c — Netflix-golden tolerance guard same as row 3 |
| 10 | 7df50f3a | align testutil with full set of fixture functions | PORT_DEFERRED | Subsumed by PR #497 commit f1ae0495; close out when #497 merges |
| 11 | 322ca041 | replace temporal slicing with pre-sliced YUV fixtures | PORT_DEFERRED | Subsumed by PR #497 commit 7d9d9a10; close out when #497 merges. Sequencing matters: this commit must land before rows 12, 14, 15, 17 (the YUV-fixture consumers); #497 already orders them correctly. |
| 12 | 74bdce1b | align vmafexec_feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 07e7cb48; close out when #497 merges |
| 13 | a3776335 | align feature_extractor_test (aim/adm3/motion3) | PORT_DEFERRED | Subsumed by PR #497 commit 15a6874d; close out when #497 merges |
| 14 | 0341f730 | remove duplicate test_run_vmaf_integer_fextractor | PORT_DEFERRED → CHERRY-PICK after #497 | Pure 76-line deletion. Same disposition as row 6 — #497 currently re-emits the duplicate; cherry-pick standalone after #497. |
| 15 | 9fa593eb | port feature_extractor tests for aim/adm3/motion3 + new options | PORT_DEFERRED | Subsumed by PR #497 commit ab21b694; close out when #497 merges |
| 16 | d93495f5 | reduce tolerance for VMAF scores in quality_runner tests | PORT_DEFERRED w/ Netflix-golden guard | PR #497 — Netflix-golden tolerance guard same as row 3 |
| 17 | 7d1ad54b | port feature extractor tests for aim/adm3/motion3 | PORT_DEFERRED | Subsumed by PR #497 commit 44b9e626; close out when #497 merges |
| 18 | 721569bc | resource/doc: cambi_high_res_speedup + motion2 score | PORT_DEFERRED → DEDUP | Already in flight on TWO branches (PR #443 + PR #444). Maintainer picks one and abandons the other per Research-0090 §Recommended action #4. No third port-PR opened. |
- Invariant: after PR #497 merges, the Research-0090 PORT_LATER bucket reduces to exactly two follow-up cherry-picks against post-#497 master:
git cherry-pick 25ff9f18 (delete empty VmafossexecCommandLineTest).
git cherry-pick 0341f730 (delete duplicate test_run_vmaf_integer_fextractor).
Both are pure deletions on python/test/command_line_test.py and python/test/feature_extractor_test.py respectively; no score change, no Netflix-golden interaction. They were excluded from PR #497 because the v2 super-PR's diff state currently RE-EMITS those identifiers (likely because #497 cherry-picked from an earlier upstream tip than 25ff9f18 / 0341f730).
- Netflix-golden guard (binding): per CLAUDE §8 / ADR-0024, the three Netflix CPU golden pairs in python/test/quality_runner_test.py, vmafexec_test.py, vmafexec_feature_extractor_test.py, feature_extractor_test.py, result_test.py (1 normal src01_hrc00 ↔ hrc01 + 2 checkerboard) carry hard-coded assertAlmostEqual rows that are NEVER modified by a fork PR. Upstream commits 4679db83, ead2d12b, 6c097fc4, d93495f5 explicitly LOWER places= on a subset of those rows (their stated motivation is macOS FP precision drift, not a true score change). The reviewer of PR #497 must verify that the 3 golden pairs retain fork tolerances byte-for-byte; only non-golden rows may adopt the relaxations.
- On upstream sync: future /sync-upstream runs that re-detect these 18 SHAs should match this entry via the SHA list and short-circuit Pass-2 classification (skip re-triage).
- Re-test on rebase: none required at the time of this commit (no code touched); after the two follow-up cherry-picks (25ff9f18 + 0341f730) eventually land, run:
meson test -C build --suite=fast
make test-netflix-golden   # 3/3 CPU goldens still pass
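The short-circuit described under "On upstream sync" could look like the hypothetical helper below. It holds the 18 short SHAs from this entry's table and reports which detected commits, given as short or full SHAs, still need Pass-2 triage. The helper names and the 8-character prefix match are assumptions.

```python
from typing import Iterable, List

# The 18 PORT_LATER short SHAs triaged in this entry's table.
PORT_LATER_TRIAGED = frozenset({
    "38e905d1", "005988ea", "4679db83", "3e075107", "e3827e4d", "25ff9f18",
    "3a041a97", "ead2d12b", "6c097fc4", "7df50f3a", "322ca041", "74bdce1b",
    "a3776335", "0341f730", "9fa593eb", "d93495f5", "7d1ad54b", "721569bc",
})


def untriaged(detected: Iterable[str]) -> List[str]:
    """Detected SHAs (8+ hex chars, any case) not covered by this entry."""
    return [sha for sha in detected if sha[:8].lower() not in PORT_LATER_TRIAGED]


def can_skip_pass2(detected: Iterable[str]) -> bool:
    """True when every re-detected commit was already triaged here."""
    return not untriaged(detected)
```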
ADR-0357 — Vulkan readback buffer VMA flag separation (PR pending)¶
What changed: picture_vulkan.{c,h} now exposes two sibling allocation functions: vmaf_vulkan_buffer_alloc (UPLOAD, unchanged) and vmaf_vulkan_buffer_alloc_readback (READBACK, HOST_ACCESS_RANDOM). A new vmaf_vulkan_buffer_invalidate wraps vmaInvalidateAllocation. All 17 feature kernel files under core/src/feature/vulkan/ are updated to use the readback variant for accumulator and partial-sum buffers.
- core/src/vulkan/picture_vulkan.c — two new functions + shared helper.
- core/src/vulkan/picture_vulkan.h — two new declarations.
- All 17 core/src/feature/vulkan/*.c files — alloc and invalidate call sites.
Rebase-sensitivity: low — entirely fork-local Vulkan backend code with no upstream Netflix counterpart. If an upstream sync adds new files to core/src/vulkan/ or core/src/feature/vulkan/, new readback buffers in those files must be classified (UPLOAD vs READBACK) and use the correct allocator per the table in ADR-0350. Conflict risk on the 17 feature files is zero (upstream doesn't touch them).
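The UPLOAD vs READBACK split reduces to one question per buffer: who writes it, and does the host read it back. This sketch of that classification uses the two allocator names from this entry. HOST_ACCESS_RANDOM for readback is stated in the entry, but the VMA flag for the upload path (HOST_ACCESS_SEQUENTIAL_WRITE) is an assumption. Host reads of GPU-written memory also go through the invalidate step, because mapped memory that is not HOST_COHERENT must be invalidated before the CPU reads it.

```python
from enum import Enum


class Direction(Enum):
    UPLOAD = "host writes, GPU reads"
    READBACK = "GPU writes, host reads"  # accumulators, partial sums


# READBACK flag is stated in the entry; the UPLOAD flag is an assumption.
VMA_HOST_ACCESS = {
    Direction.UPLOAD: "VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT",
    Direction.READBACK: "VMA_ALLOCATION_CREATE_HOST_ACCESS_RANDOM_BIT",
}


def allocator_for(direction: Direction) -> str:
    """Fork allocator to call for a host-visible buffer of this direction."""
    if direction is Direction.READBACK:
        return "vmaf_vulkan_buffer_alloc_readback"
    return "vmaf_vulkan_buffer_alloc"


def needs_invalidate_before_host_read(direction: Direction) -> bool:
    """GPU-written data goes through vmaf_vulkan_buffer_invalidate before
    the CPU reads it (required on non-HOST_COHERENT memory)."""
    return direction is Direction.READBACK
```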
ADR-0356 — ffmpeg-patches surface-sync CI gate (2026-05-09)¶
Files added or changed:
- scripts/ci/ffmpeg-patches-surface-check.sh (new gate script).
- .github/workflows/rule-enforcement.yml (new ffmpeg-patches-surface-check job).
- docs/adr/0356-ffmpeg-patches-surface-gate.md (decision record).
- docs/development/automated-rule-enforcement.md (user-facing doc update).
Why this rebase-note exists: the gate is fork-local CI; it does not touch any upstream-shared file, so an upstream merge cannot drop its enforcement. However, whoever runs the next /sync-upstream should be aware that ffmpeg-patches/ integrity is now machine-checked on every PR — if a future libvmaf header rename slips through during conflict resolution and breaks the patch stack, the gate will fire on the post-sync PR and surface the omission immediately rather than at the next sync.
Rebase-sensitivity: zero on the upstream-merge path. Indirect benefit: the gate hardens ffmpeg-patches/ against silent drift, so the patch-stack invariants tracked elsewhere in this file (entries referencing ffmpeg-patches/0001…0009) are now machine-defended.
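The entry does not spell out what ffmpeg-patches-surface-check.sh compares. The following is only a hypothetical illustration of one surface check in the same spirit. It flags any libvmaf header that a patch #includes on an added line but that no longer exists in the include directory. The function names and the regex are assumptions.

```python
import re
from pathlib import Path
from typing import Callable, List

# Added (+) lines that include a libvmaf header, with or without a libvmaf/ prefix.
INCLUDE_RE = re.compile(r'^\+\s*#\s*include\s*[<"](?:libvmaf/)?(libvmaf\w*\.h)[>"]')


def missing_in_patch(patch_name: str, patch_text: str,
                     header_exists: Callable[[str], bool]) -> List[str]:
    """Report headers a patch adds an #include for that no longer exist."""
    problems = []
    for line in patch_text.splitlines():
        m = INCLUDE_RE.match(line)
        if m and not header_exists(m.group(1)):
            problems.append(f"{patch_name}: {m.group(1)}")
    return problems


def check_patch_dir(patch_dir: Path, include_dir: Path) -> List[str]:
    """Run missing_in_patch over every *.patch against the real include dir."""
    problems: List[str] = []
    for patch in sorted(patch_dir.glob("*.patch")):
        text = patch.read_text(encoding="utf-8", errors="replace")
        problems += missing_in_patch(patch.name, text,
                                     lambda h: (include_dir / h).is_file())
    return problems
```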
0320 — HIP CI lane apt-installs ROCm runtime (ADR-0212 status update)¶
- Touches:
.github/workflows/libvmaf-build-matrix.yml (HIP lane if: matrix.hip install step + base-deps gate), .github/workflows/required-aggregator.yml (HIP lane added to required-check allow-list). Upstream Netflix/vmaf has no HIP backend and no equivalent CI matrix; conflict probability against upstream/master is zero. This entry exists to flag the rebase-sensitive ROCm-version pin for future maintainers.
- Invariant: the ROCm version pin (ROCM_VERSION: "7.2.3") in the Install ROCm / HIP runtime step must match the version the maintainer's local box runs against. The apt URL is https://repo.radeon.com/rocm/apt/<ver>; the version is part of the path, so AMD effectively snapshots each ROCm release as its own apt repo. Bumping the pin is a one-line change but requires re-validating that rocm-hip-runtime-dev still pulls the same symbol set; in particular, amdhip64 major-version changes have historically broken dlopen consumers. noble is the codename for ubuntu-24.04, which is what ubuntu-latest resolves to on GitHub-hosted runners as of 2024-04. If ubuntu-latest rolls forward to a newer LTS, the codename component of the apt source line (https://repo.radeon.com/rocm/apt/<ver> <codename> main) needs to be re-checked against https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/install-methods/package-manager/package-manager-ubuntu.html for the current AMD-supported codename list.
- Re-test on rebase:
# Locally, mirror what CI does (assumes ROCm /opt/rocm install on dev box):
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
./build/test/test_hip_smoke # passes with device_count == 0
# Apt-side: verify the URL still resolves (versioned path)
curl -sfI https://repo.radeon.com/rocm/apt/7.2.3/dists/noble/Release \
&& echo OK || echo "ROCm apt URL drifted — bump ROCM_VERSION"
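The ROCm version and the Ubuntu codename are both path components, so the apt source line and the Release URL probed by the re-test can be derived from the pin. This is a sketch with hypothetical helper names. A real sources.list entry usually also carries [arch=... signed-by=...] options, which are omitted here.

```python
ROCM_APT_BASE = "https://repo.radeon.com/rocm/apt"

# GitHub runner image label -> Ubuntu codename used in the apt source line.
UBUNTU_CODENAMES = {"ubuntu-22.04": "jammy", "ubuntu-24.04": "noble"}


def rocm_apt_source(version: str, codename: str) -> str:
    """sources.list line; the ROCm version is a path component of the repo."""
    return f"deb {ROCM_APT_BASE}/{version} {codename} main"


def rocm_release_url(version: str, codename: str) -> str:
    """Release file the re-test probes with curl -sfI."""
    return f"{ROCM_APT_BASE}/{version}/dists/{codename}/Release"
```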
RN-2026-05-08-cambi-cluster — port 9 of 10 upstream cambi commits¶
- Tracked by: ADR-0328, PR feat/upstream-port-cambi-cluster-2026-05-08.
- Cluster: Netflix upstream commits d655cefe, 9fad7317, 767a6780, 8c60dc9e, bd278ea6, 1091b0c1, 77474251, 933cccb4, 984f281f ported verbatim. 41bacc83 ("move shared code to cambi.h") explicitly skipped.
- Touches: core/src/feature/cambi.c, core/src/feature/x86/cambi_avx2.c, core/src/feature/x86/cambi_avx2.h, core/test/test_cambi.c. cambi_reciprocal_lut.h stays (fork commit ef6d33e6 already added it before upstream).
- Invariant: the fork uses a CAMBI_CALC_C_VALUES_BODY macro in cambi.c to share the calculate_c_values loop nest across calculate_c_values (scalar), calculate_c_values_avx2, and calculate_c_values_neon. Upstream keeps the three variants as separate function definitions in cambi.c (scalar) and cambi_avx2.c (AVX-2) with the helpers exposed via cambi.h. The fork's macro keeps the three drivers in lockstep without externalising the helpers.
- Twin-update gaps:
  - AVX-512: no calculate_c_values_row_avx512 exists; the AVX-512 dispatch path falls through to calculate_c_values_avx2. Tracked as a perf follow-up — bit-exactness preserved, only throughput affected.
  - NEON: calculate_c_values_neon uses scalar calculate_c_values_row (no NEON calculate_c_values_row_neon exists yet). Tracked as a perf follow-up.
  - CUDA / SYCL: cambi has no GPU twin in those backends (the only existing twin is Vulkan, ADR-0205 Strategy II). The Vulkan twin's host-residual shim vmaf_cambi_calculate_c_values was updated in port 933cccb4 to drop the inc/dec range-updater parameters (now (void)-cast since calculate_c_values self-dispatches its updaters); ABI-compatible with cambi_internal.h callers.
- On upstream sync: when re-syncing cambi, expect conflicts on the calculate_c_values_avx2 body — upstream keeps it as a function in cambi_avx2.c, the fork keeps it inside cambi.c via the macro. The translation is mechanical: take any inner-loop change from upstream's body and apply it once inside CAMBI_CALC_C_VALUES_BODY. The fork's calculate_c_values_neon has no upstream counterpart and stays fork-local.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build && build/test/test_cambi
# Optional GPU-parity gate when available:
# ./scripts/cross-backend-diff.sh --feature cambi
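The twin-update gaps amount to a fallthrough dispatch: an ISA without a dedicated driver reuses the next-best one. Below is a hedged Python model of that selection. It assumes a fixed best-first preference order per architecture. The tables and select_driver are illustrative only; they are not the C dispatcher in cambi.c.

```python
"""Model of the cambi calculate_c_values dispatch fallthrough (illustrative only)."""
from __future__ import annotations

# Drivers the fork ships today: there is no AVX-512 driver, and the NEON
# driver internally reuses the scalar row helper.
DRIVERS = {
    "avx2": "calculate_c_values_avx2",
    "neon": "calculate_c_values_neon",
    "scalar": "calculate_c_values",
}

# Best-first preference per architecture (an assumption for this sketch).
PREFERENCE = {
    "x86": ("avx512", "avx2", "scalar"),
    "arm": ("neon", "scalar"),
}


def select_driver(arch: str, cpu_flags: set[str]) -> str:
    """Return the first driver that the CPU supports and the fork actually ships."""
    for isa in PREFERENCE[arch]:
        cpu_ok = isa == "scalar" or isa in cpu_flags
        if cpu_ok and isa in DRIVERS:
            return DRIVERS[isa]
    raise LookupError(f"no cambi driver for {arch!r}")


if __name__ == "__main__":
    # An AVX-512 host lands on the AVX2 driver: same output, lower throughput.
    print(select_driver("x86", {"avx512", "avx2"}))
```

Adding an AVX-512 row driver later would mean a new DRIVERS entry only; the preference order already lists it first.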
ADR-0336 — KonViD MOS head v1 (2026-05-08)¶
- Touches: ai/scripts/train_konvid_mos_head.py (new), ai/tests/test_train_konvid_mos_head.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (adds Predictor.predict_mos and the optional konvid_mos_head_v1.onnx loader; _DEFAULT_COEFFS and _predict_analytical are unchanged), tools/vmaf-tune/tests/test_predict_mos.py (new), model/konvid_mos_head_v1.onnx (new), model/konvid_mos_head_v1_card.md (new), model/konvid_mos_head_v1.json (new manifest sidecar), docs/adr/0336-konvid-mos-head-v1.md (new), docs/research/0090-konvid-mos-head-design.md (new), docs/state.md (T-MOS-HEAD-PRODFLIP row), changelog.d/added/0336-konvid-mos-head-v1.md (new). All paths are fork-local. Upstream Netflix/vmaf has no MOS-head surface, and the predictor lives entirely under tools/vmaf-tune/.
- Invariant: the MOS-head ONNX I/O contract has two named input tensors and one output tensor:
  - features, shape (N, 11);
  - encoder_onehot, shape (N, 1);
  - mos (output), shape (N,), with the range [1.0, 5.0] baked into the graph via 1 + 4 * sigmoid(raw).

  The 11 feature columns are (adm2, vif_scale0..3, motion2, saliency_mean, saliency_var, shot_count_norm, shot_mean_len_norm, shot_cut_density), in that exact order. They line up with train_konvid_mos_head.FEATURE_COLUMNS and with the zero-fill layout in the predictor's _predict_mos_via_head. ENCODER_VOCAB v4 ships a single "ugc-mixed" slot, and multi-slot expansion is append-only. Predictor.predict_mos falls back to mos = (predicted_vmaf - 30) / 14, clamped to [1, 5], whenever the ONNX file is missing or onnxruntime is unavailable. That fallback is the documented behaviour, not a bug.
- On upstream sync: no action required. The trainer, predictor, MOS head and tests live entirely under fork-local paths (ai/, tools/vmaf-tune/, model/), so upstream syncs cannot touch them. tools/vmaf-tune/src/vmaftune/predictor.py is fork-local but co-evolves with vmaf-tune; if a future ADR reshapes ShotFeatures, update the MOS-head feature-column map in lockstep.
- Re-test on rebase:
```bash
python3 -m pytest ai/tests/test_train_konvid_mos_head.py tools/vmaf-tune/tests/test_predict_mos.py -v
python3 ai/scripts/train_konvid_mos_head.py --smoke --no-export  # gate must report PASS
```
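The ADR-0336 contract reduces to three small pieces of arithmetic: the in-graph range squash, the documented fallback and the fixed feature order. The minimal Python sketch below covers those pieces and assumes nothing beyond the invariant text. The function names are hypothetical, and this is not the fork's predictor code.

```python
"""Arithmetic behind the KonViD MOS-head contract (sketch, not the fork's code)."""
from __future__ import annotations

import math

# Column order from the ADR-0336 invariant; the (N, 11) input follows it exactly.
FEATURE_COLUMNS = (
    "adm2",
    "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3",
    "motion2",
    "saliency_mean", "saliency_var",
    "shot_count_norm", "shot_mean_len_norm", "shot_cut_density",
)


def _sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def head_output(raw: float) -> float:
    """Range squash baked into the graph: 1 + 4 * sigmoid(raw)."""
    return 1.0 + 4.0 * _sigmoid(raw)


def fallback_mos(predicted_vmaf: float) -> float:
    """Documented fallback when the ONNX file or onnxruntime is unavailable."""
    return min(5.0, max(1.0, (predicted_vmaf - 30.0) / 14.0))


def feature_row(values: dict[str, float]) -> list[float]:
    """One input row in contract order, zero-filling any missing column."""
    return [float(values.get(name, 0.0)) for name in FEATURE_COLUMNS]
```

With this mapping, a predicted VMAF of 72 falls back to MOS 3.0, and anything at or above VMAF 100 saturates at 5.0.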
ADR-0335 — AdaptiveCpp as a second SYCL toolchain (2026-05-08)¶
- Touches: core/src/feature/sycl/sycl_compat.h (new), core/src/feature/sycl/*.cpp (10 attribute call sites in 9 files switched from [[intel::reqd_sub_group_size(N)]] to VMAF_SYCL_REQD_SG_SIZE(N)), core/src/meson.build (toolchain branch in the SYCL block and the feature-kernel block), core/meson_options.txt (description bump on sycl_compiler and a new sycl_acpp_targets option), docs/development/sycl-toolchains.md (new), docs/adr/0335-adaptivecpp-second-sycl-toolchain.md (new), docs/adr/_index_fragments/0335-adaptivecpp-second-sycl-toolchain.md (new), docs/adr/_index_fragments/_order.txt (append), docs/adr/README.md (regenerated by concat-adr-index.sh --write), docs/adr/0217-sycl-toolchain-cleanup.md (status-update appendix per ADR-0028), core/src/sycl/AGENTS.md (invariant row), changelog.d/added/0335-adaptivecpp-second-sycl-toolchain.md (new). No upstream-shared paths are touched: the core/src/feature/sycl/*.cpp TUs are fork-local SYCL twins with no counterpart on upstream/master.
- Invariant: Intel icpx stays the primary toolchain, and AdaptiveCpp is opt-in via -Dsycl_compiler=acpp. Any new Intel-specific SYCL kernel attribute (for example a future [[intel::*]] decoration or use of sycl::ext::oneapi::experimental::*) must land behind a new macro in core/src/feature/sycl/sycl_compat.h rather than appear inline. AdaptiveCpp output is bit-identical neither to icpx nor to the scalar CPU path, which is consistent with the existing CPU-only golden gate. The canonical AdaptiveCpp identification macros are SYCL_IMPLEMENTATION_ACPP and the legacy SYCL_IMPLEMENTATION_HIPSYCL; <sycl/sycl.hpp> defines both automatically.
- On upstream sync: if a Netflix upstream cherry-pick lands a bare [[intel::reqd_sub_group_size(N)]] (or any other Intel-specific SYCL attribute) on a kernel lambda, wrap the attribute in the appropriate VMAF_SYCL_* compat macro before merging. Upstream has no SYCL backend today, so the conflict surface is small.
- Re-test on rebase:
# Plumbing parses cleanly with the icpx default still selected:
meson setup /tmp/build-sycl-icpx libvmaf -Denable_sycl=false
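# And the macro usage is consistent (10 call sites across 9 kernel TUs).
# Restrict to *.cpp so the compat header that defines the macro is not counted:
grep -rl --include='*.cpp' 'VMAF_SYCL_REQD_SG_SIZE' core/src/feature/sycl | wc -l
# → 9 files
grep -rho --include='*.cpp' 'VMAF_SYCL_REQD_SG_SIZE(' core/src/feature/sycl | wc -l
# → 10 call sites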
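When a portable check is preferred, for example inside a pytest, the grep check can be mirrored in Python. This sketch assumes every consumer is a .cpp TU under core/src/feature/sycl. count_call_sites is a hypothetical helper, not an existing script.

```python
"""Count VMAF_SYCL_REQD_SG_SIZE call sites per kernel TU (illustrative check)."""
from __future__ import annotations

import re
from pathlib import Path

CALL = re.compile(r"\bVMAF_SYCL_REQD_SG_SIZE\s*\(")


def count_call_sites(sycl_dir: Path) -> tuple[int, int]:
    """Return (files, call_sites) over *.cpp only. The compat header that defines
    the macro is a .h file, so it is not counted as a consumer."""
    files = sites = 0
    for tu in sorted(sycl_dir.rglob("*.cpp")):
        hits = len(CALL.findall(tu.read_text(encoding="utf-8", errors="replace")))
        if hits:
            files += 1
            sites += hits
    return files, sites


if __name__ == "__main__":
    root = Path("core/src/feature/sycl")
    if root.is_dir():
        files, sites = count_call_sites(root)
        # Compare with the 9 files / 10 call sites recorded for ADR-0335.
        print(f"{files} files, {sites} call sites")
```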
ADR-0212 §Status update — HIP runtime (T7-10b, 2026-05-08)¶
- Touches: core/src/hip/common.c, core/src/hip/kernel_template.c, core/src/hip/meson.build, core/test/test_hip_smoke.c, core/test/meson.build. In core/test/meson.build, hip_deps was added everywhere vulkan_deps already appears, so test executables that statically pull in the feature lib resolve hipMemsetAsync / hipFree.
- Invariant: the kernel_template.c helpers and the common.c public API both store HIP runtime handles (hipStream_t, hipEvent_t) as uintptr_t in the structs that cross the public ABI. The header-purity contract documented in core/src/hip/kernel_template.h is load-bearing. Moving the cast site, or replacing uintptr_t with void *, breaks every consumer TU and the public libvmaf_hip.h no-<hip/...> guarantee. The fallback find_library('amdhip64', dirs: hip_search_paths) exists because ROCm 7.x publishes no hip-lang.pc and its CMake config breaks under meson's CMake probe. On ROCm 7.x, that fallback is the supported path.
- Re-test on rebase:
PATH=/opt/rocm/bin:$PATH meson setup build --reconfigure \
-Denable_hip=true -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_hip_smoke
The smoke test self-skips the device-resident assertions when vmaf_hip_device_count() == 0, so it stays portable across CI runners that don't expose an AMD GPU.
saliency_student_v2 — Resize-decoder ablation (ADR-0364, 2026-05-09)¶
- Touches: ai/scripts/train_saliency_student_v2.py (new), model/tiny/saliency_student_v2.{onnx,json} (new), model/tiny/saliency_student_v2_card.md (new), model/tiny/registry.json (new row), docs/ai/models/saliency_student_v2.md (new), docs/adr/0364-saliency-student-v2-resize-decoder.md (new), docs/research/0089-saliency-student-v2-resize-decoder.md (new), changelog.d/added/saliency-student-v2.md (new). All paths are fork-only, and no upstream-mirrored files are touched.
- Invariant: v1 (saliency_student_v1.onnx, registry id saliency_student_v1, smoke: false) stays as the production weights for the C-side mobilesal extractor. v2 is a parallel artefact under model/tiny/; promoting it to production is a separate PR. The trainer's _ResizeConv module produces an ONNX graph with Resize (mode=linear, coordinate_transformation_mode=half_pixel), and every op stays on core/src/dnn/op_allowlist.c post-ADR-0258.
- On upstream sync: no rebase impact. Netflix has no parallel saliency-student model, no consumer of Resize in the upstream ONNX surface and no model/tiny/ registry in the upstream tree. If Netflix ever lands a saliency model, the fork's saliency_student_v{1,2} rows stay independent.
- Re-test on rebase:
.venv/bin/python ai/scripts/validate_model_registry.py
.venv/bin/python - <<'EOF'
import onnx
g = onnx.load('model/tiny/saliency_student_v2.onnx')
ops = sorted({n.op_type for n in g.graph.node})
assert 'Resize' in ops and 'ConvTranspose' not in ops, ops
print('v2 ONNX op-set:', ops)
EOF
Predictor v2 — real-corpus LOSO trainer + ADR-0303 gate (2026-05-08)¶
- Touches: ai/scripts/train_predictor_v2_realcorpus.py (new), ai/scripts/run_predictor_v2_training.sh (new), ai/tests/test_train_predictor_v2_realcorpus.py (new), docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md (Status-update appendix only; the body is frozen per ADR-0028), changelog.d/added/predictor-v2-realcorpus-trainer.md (new). No upstream-shared paths; the trainer lives entirely under the fork-local ai/scripts/.
- Invariant: the gate constants SHIP_GATE_MEAN_PLCC = 0.95, SHIP_GATE_PLCC_SPREAD_MAX = 0.005, SHIP_GATE_PER_FOLD_MIN = 0.95 and LOSO_FOLD_COUNT = 5 mirror ADR-0303 §Decision and the constants in scripts/ci/ensemble_prod_gate.py. They MUST stay in lockstep. If a future ADR changes the gate, update both files (the predictor trainer and the ensemble CI gate) and re-run test_gate_constants_match_adr_0303. The 14-codec list in _resolve_codecs() is sourced from vmaftune.predictor._DEFAULT_COEFFS when PR #450 is on the path. The hard-coded fallback exists for the bootstrap case where this script lands before #450 merges. Drift between the two is asserted at runtime, so adding a 15th codec means updating the mirror too.
- On upstream sync: no action required. The trainer and tests live entirely under fork-local paths (ai/scripts/, ai/tests/), and upstream Netflix/vmaf has no equivalent surface. PR #450 (the predictor train pipeline) is itself fork-local. An upstream sync that reorganises ai/scripts/ would invalidate the relative imports, so re-run the test suite if that happens.
- Re-test on rebase:
```bash
python -m pytest ai/tests/test_train_predictor_v2_realcorpus.py -q
bash -n ai/scripts/run_predictor_v2_training.sh
python ai/scripts/train_predictor_v2_realcorpus.py --synthetic-smoke --report-out /tmp/p2.json
```
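The four ADR-0303 constants combine into a single pass/fail decision over the LOSO fold scores. The minimal sketch below models that gate. It assumes that "spread" means max minus min across folds and that reaching a threshold exactly counts as passing. passes_ship_gate is hypothetical; the authoritative logic lives in the trainer and in scripts/ci/ensemble_prod_gate.py.

```python
"""ADR-0303-style ship gate over LOSO fold PLCCs (sketch, not the fork's trainer)."""
from __future__ import annotations

from statistics import fmean

SHIP_GATE_MEAN_PLCC = 0.95
SHIP_GATE_PLCC_SPREAD_MAX = 0.005
SHIP_GATE_PER_FOLD_MIN = 0.95
LOSO_FOLD_COUNT = 5


def passes_ship_gate(fold_plcc: list[float]) -> tuple[bool, list[str]]:
    """Return (ok, reasons); spread is assumed to be max - min over the folds."""
    if len(fold_plcc) != LOSO_FOLD_COUNT:
        return False, [f"expected {LOSO_FOLD_COUNT} folds, got {len(fold_plcc)}"]
    reasons = []
    if fmean(fold_plcc) < SHIP_GATE_MEAN_PLCC:
        reasons.append("mean PLCC below gate")
    if max(fold_plcc) - min(fold_plcc) > SHIP_GATE_PLCC_SPREAD_MAX:
        reasons.append("fold spread above gate")
    if min(fold_plcc) < SHIP_GATE_PER_FOLD_MIN:
        reasons.append("a fold is below the per-fold minimum")
    return not reasons, reasons
```

Returning the list of reasons rather than a bare boolean makes a CI failure self-explanatory.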
ADR-0332 — OpenVINO NPU EP wired into tiny-AI dispatch (2026-05-08)¶
- Touches: core/include/libvmaf/dnn.h, core/src/dnn/ort_backend.{c,h}, core/tools/vmaf.c, core/tools/cli_parse.{c,h}, core/test/dnn/test_ep_fp16.c, core/test/dnn/test_cli.sh, docs/ai/inference.md, docs/usage/cli.md, docs/development/oneapi-install.md, docs/adr/0332-openvino-npu-ep-wiring.md (new), docs/adr/_index_fragments/0332-openvino-npu-ep-wiring.md (new), changelog.d/added/openvino-npu-ep.md (new). The libvmaf dnn/ and tools surfaces are fork-local additions. Upstream Netflix/vmaf has no tiny-AI / ONNX Runtime dispatch layer, so the conflict probability on dnn/ is zero.
- Invariant: VmafDnnDevice enum values 9..11 (OPENVINO_NPU / OPENVINO_CPU / OPENVINO_GPU) are appended after the CoreML values 5..8. The ABI requires these values to stay stable across releases: append-only, never renumber. The --tiny-device validator in cli_parse.c::ARG_TINY_DEVICE enumerates the keyword set. A new keyword must be added to the validator, the help string and resolve_tiny_device() in vmaf.c, all together. The vmaf_dnn_session_attached_ep() stable-string list (docs/ai/inference.md and the dnn.h doxygen) gains "OpenVINO:NPU"; consumers that assert on the returned string MUST update.
- On upstream sync: no action required for upstream Netflix/vmaf. If a future Netflix sync introduces an unrelated tiny-AI surface (unlikely), reconcile the EP-name list at merge time.
- Re-test on rebase:
cd libvmaf && \
CC=icx CXX=icpx meson setup build -Denable_sycl=true -Denable_cuda=false && \
ninja -C build && \
./build/test/dnn/test_ep_fp16 && \
./build/tools/vmaf --tiny-device=openvino-npu --tiny-device=openvino-cpu \
--tiny-device=openvino-gpu # validator must accept all three keywords
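The append-only rule for VmafDnnDevice can be checked mechanically by comparing a released name-to-value map against the current one. Below is a hedged Python sketch. append_only_violations is hypothetical, and the example maps use the OPENVINO_* names from the notes as shorthand for the real C enumerators.

```python
"""Append-only check for an ABI enum such as VmafDnnDevice (illustrative sketch)."""
from __future__ import annotations


def append_only_violations(released: dict[str, int], current: dict[str, int]) -> list[str]:
    """Released names must keep their values; new names must take fresh values
    above the highest released one."""
    problems = []
    for name, value in released.items():
        if current.get(name) != value:
            problems.append(f"{name}: released as {value}, now {current.get(name)}")
    top = max(released.values(), default=-1)
    used = set(released.values())
    for name, value in current.items():
        if name in released:
            continue
        if value <= top or value in used:
            problems.append(f"{name}: new value {value} is not appended after {top}")
        used.add(value)
    return problems


if __name__ == "__main__":
    released = {"OPENVINO_NPU": 9, "OPENVINO_CPU": 10, "OPENVINO_GPU": 11}
    print(append_only_violations(released, {**released, "NEXT_EP": 12}))
```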
ADR-0365 — CoreML execution provider wiring (2026-05-09)¶
- Touches: core/include/libvmaf/dnn.h, core/src/dnn/ort_backend.{c,h}, core/tools/cli_parse.{c,h}, core/tools/vmaf.c, core/test/dnn/test_ep_fp16.c, core/test/dnn/test_cli.sh, docs/ai/inference.md, docs/usage/cli.md. Coordinates with ADR-0332 (OpenVINO NPU EP, PR #496). Both touch the same files, and the conflicts are mechanical: adjacent enum values, adjacent switch cases and adjacent CLI keyword strings. The OpenVINO NPU/CPU/GPU values are 9..11 (after CoreML 5..8).
- Invariant: the VmafDnnDevice enum is append-only. CoreML values are 5..8, and the OpenVINO pinned variants are 9..11. The generic SessionOptionsAppendExecutionProvider("CoreMLExecutionProvider", …) form is deliberate, so the Linux build needs no coreml_provider_factory.h include. The MLComputeUnits key string values (CPUAndNeuralEngine / CPUAndGPU / CPUOnly) are part of the CoreML EP public contract, so upstream renames would break the wiring. The AUTO chain inserts CoreML at the last position (after CUDA / OpenVINO / ROCm); reordering it changes the AUTO outcome on Apple silicon.
- Re-test on rebase:
cd libvmaf && meson setup build -Denable_dnn=auto \
-Denable_cuda=false -Denable_sycl=false \
-Dbuilt_in_models=false && \
ninja -C build && \
./build/test/dnn/test_ep_fp16 && \
VMAF_BIN=$PWD/build/tools/vmaf bash test/dnn/test_cli.sh && \
./build/tools/vmaf --tiny-device coreml-ane 2>&1 | \
grep -q 'Reference' && \
./build/tools/vmaf --tiny-device bogus 2>&1 | \
grep -q 'coreml'
python3 -m pytest tools/external-bench/tests/ -q # must report 7 passed
bash -n tools/external-bench/*/run.sh
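The ADR-0365 invariant fixes CoreML as the last entry of the AUTO chain, after CUDA, OpenVINO and ROCm. The sketch below shows first-available-wins selection over ONNX Runtime execution-provider names. It assumes that order and a CPU fallback when nothing in the chain is present. pick_auto_provider is hypothetical; the real chain is built in ort_backend.c.

```python
"""First-available-wins AUTO selection over ONNX Runtime EP names (sketch)."""
from __future__ import annotations

AUTO_CHAIN = (
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "ROCMExecutionProvider",
    "CoreMLExecutionProvider",  # last, per the ADR-0365 invariant
)


def pick_auto_provider(available: list[str]) -> str:
    """Return the first chain entry the runtime reports, else the CPU provider."""
    for provider in AUTO_CHAIN:
        if provider in available:
            return provider
    return "CPUExecutionProvider"


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # optional; the sketch works without it
        print(pick_auto_provider(ort.get_available_providers()))
    except ImportError:
        print(pick_auto_provider(["CPUExecutionProvider"]))
```

On an Apple-silicon host that also reported OpenVINO, this order would pick OpenVINO. That is exactly the outcome the invariant warns a reorder would change.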
0361 — Metal (Apple Silicon) backend scaffold (ADR-0361)¶
- Touches:
  - core/include/libvmaf/libvmaf_metal.h (new, fork-local): public C API for the Metal backend (vmaf_metal_state_init / _import_state / _state_free / vmaf_metal_list_devices / vmaf_metal_available). Mirrors the HIP / Vulkan / SYCL / CUDA public-header convention. Opaque runtime types cross the ABI as uintptr_t per ADR-0361 / ADR-0212 / ADR-0184.
  - core/src/metal/{common,picture_metal,dispatch_strategy,kernel_template}.{c,h} + AGENTS.md + meson.build (new, fork-local): the backend tree. Every entry point returns -ENOSYS. The kernel_template field shape mirrors the HIP twin, except for the unified-memory buffer collapse: one MTLBuffer with MTLResourceStorageModeShared instead of the (device, pinned-host) readback pair.
  - core/src/feature/metal/integer_motion_v2_metal.c (new, fork-local): the first kernel-template consumer. Mirrors feature/hip/integer_motion_v2_hip.c call-graph-for-call-graph, except for the single-buffer prev-ref slot (vs the HIP twin's pix[2] ping-pong).
  - core/test/test_metal_smoke.c (new, fork-local): a 14-sub-test smoke test pinning the -ENOSYS contract. Mirrors test_hip_smoke.c.
  - core/meson_options.txt: new enable_metal feature option (default auto). On auto, the parent meson resolves it to host_machine.system() == 'darwin', so non-macOS hosts compile cleanly without the frameworks. enabled forces linkage and fails on non-macOS. The feature type matches enable_dnn's auto-resolve shape: Metal on macOS is always available, like DNN on a host with ONNX Runtime. The boolean, default-off GPU-vendor triad (enable_cuda / enable_sycl / enable_hip) does not fit, because Metal has no comparable "wrong-host silent flip" risk.
  - core/src/meson.build: is_metal_enabled resolution, subdir('metal'), and metal_sources / metal_deps threaded through the libvmaf_feature_static_lib and libvmaf library() calls alongside the CUDA / SYCL / Vulkan / HIP / DNN aggregations.
  - core/test/meson.build: test_metal_smoke executable wired under the same auto-on-macOS / explicit-enabled gate.
  - core/src/feature/feature_extractor.c: adds extern VmafFeatureExtractor vmaf_fex_integer_motion_v2_metal; plus a registry entry under #if HAVE_METAL.
  - .github/workflows/libvmaf-build-matrix.yml: new lane "Build — macOS Metal (T8-1 scaffold)" on macos-latest with -Denable_metal=enabled. The macos-latest runner ships the Metal SDK as part of the system framework set, so no extra install step is needed.
  - docs/backends/metal/index.md (new, fork-local), docs/backends/index.md (row added), ADR-0361 and its index fragment, changelog.d/added/metal-backend-scaffold.md, and docs/state.md row T8-1b.
- Upstream-port footprint: zero. Netflix/vmaf does not ship a Metal backend; this is a wholly fork-local addition and no upstream file is touched. Same posture as the HIP scaffold (T7-10) and the Vulkan scaffold (T5-1).
- Rebase invariants (these mirror the HIP scaffold's invariant set):
  - metal/kernel_template.h mirrors hip/kernel_template.h, except for the unified-memory buffer collapse (a single MTLBuffer slot vs the HIP (device, pinned-host) pair). On rebase, if the HIP twin's lifecycle struct gains a third event slot, the Metal twin must follow in the same PR.
  - feature/metal/integer_motion_v2_metal.c mirrors feature/hip/integer_motion_v2_hip.c call-graph-for-call-graph, except for the single-prev_ref-slot collapse (vs the HIP twin's pix[2] ping-pong). On rebase, drift in the HIP twin's submit body (for example an added submit_pre_launch call) requires a paired update here.
  - vmaf_fex_integer_motion_v2_metal registers without the VMAF_FEATURE_EXTRACTOR_METAL flag bit set. That flag bit is reserved for the runtime PR (T8-1b), which adds the VMAF_PICTURE_BUFFER_TYPE_METAL_DEVICE tag and then sets the flag. This matches the HIP twin's deferral of VMAF_FEATURE_EXTRACTOR_HIP. On rebase, leave the flags at VMAF_FEATURE_EXTRACTOR_TEMPORAL only until T8-1b.
- Re-test (macOS only; Linux dev sessions cannot run this lane locally):
And on every host (Linux / Windows included): the default-build gate must stay green — the auto-probe resolves to disabled on non-macOS hosts so meson setup build && ninja -C build runs unchanged.
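The enable_metal auto-resolution can be modelled as a small truth table. The Python sketch below assumes meson's three feature states (auto, enabled, disabled). resolve_enable_metal is illustrative only; the real decision lives in core/src/meson.build.

```python
"""Model of the enable_metal feature-option resolution (illustrative, not meson code)."""


def resolve_enable_metal(option: str, host_system: str) -> bool:
    """Return whether the Metal backend is built for a feature value and host."""
    if option == "disabled":
        return False
    if option == "auto":
        # auto turns on only where the frameworks exist, so other hosts build cleanly.
        return host_system == "darwin"
    if option == "enabled":
        if host_system != "darwin":
            raise RuntimeError("enable_metal=enabled needs a darwin (macOS) host")
        return True
    raise ValueError(f"unknown feature value: {option!r}")
```

The hard failure on enabled plus a non-darwin host is what keeps an explicit request from silently producing a Metal-less build.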
0327 — Conformal-VQA prediction surface for vmaf-tune (ADR-0279)¶
- Touches: tools/vmaf-tune/src/vmaftune/conformal.py (new), tools/vmaf-tune/src/vmaftune/predictor.py (Predictor.predict_vmaf_with_uncertainty), tools/vmaf-tune/src/vmaftune/cli.py (the predict subcommand gains --with-uncertainty / --calibration-sidecar / --alpha), tools/vmaf-tune/tests/test_conformal.py (new), docs/ai/conformal-vqa.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the conformal wrapper sits outside the ONNX graph and adds no new runtime dependency. conformal.py imports only the standard library (math, statistics, dataclasses, json, warnings). Future calibration-sidecar shapes use the method discriminator string for versioning, so do not rename "split-conformal" / "cv-plus" without bumping the loader. The Predictor.predict_vmaf_with_uncertainty signature is the Python-API contract consumed by vmaf-tune predict --with-uncertainty. Renaming or reordering its keyword args breaks the CLI unless both are updated in lockstep.
- On upstream sync: no action required. vmaf-tune is a fork-local tool, and upstream Netflix/vmaf has no per-shot prediction surface.
- Re-test on rebase:
python3 -m pytest tools/vmaf-tune/tests/test_conformal.py -q
python3 -m pytest tools/vmaf-tune/tests/test_predictor.py -q
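Split conformal prediction is one of the two methods named by the sidecar discriminator. It needs only the standard library, which is why conformal.py can stay dependency-free. Below is a generic sketch of the technique, using the textbook finite-sample quantile; it is not necessarily the fork's exact code. Take absolute residuals on a held-out calibration set, pick the ceil((n + 1)(1 - alpha))-th smallest, and widen the point prediction by that amount. The cv-plus variant is not shown.

```python
"""Split-conformal interval using only the standard library (generic technique)."""
from __future__ import annotations

import math


def conformal_quantile(residuals: list[float], alpha: float) -> float:
    """The ceil((n + 1) * (1 - alpha))-th smallest absolute calibration residual."""
    n = len(residuals)
    k = max(1, math.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(abs(r) for r in residuals)[k - 1]


def predict_interval(point: float, residuals: list[float], alpha: float = 0.1) -> tuple[float, float]:
    """Symmetric interval around a point prediction, such as a predicted VMAF."""
    q = conformal_quantile(residuals, alpha)
    return point - q, point + q
```

An infinite half-width is the honest answer when the calibration set is too small for the requested alpha. Callers can treat it as "no usable interval".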
ADR-0371 — Shared CorpusIngestBase (2026-05-10)¶
No rebase impact: this is a pure Python refactor under ai/, with no C/header/patch changes and no upstream-shared paths touched. All six MOS-corpus adapter scripts now use from corpus.base import CorpusIngestBase (with PYTHONPATH=ai/src). If a future upstream sync adds a corpus/ directory under ai/, the import path may collide, but the risk is negligible because Netflix/vmaf does not carry an ai/ subtree.
ADR-0325 — vmaf-tune auto Phase F.1 + F.2 short-circuits (2026-05-08)¶
- Touches: tools/vmaf-tune/src/vmaftune/auto.py (new), tools/vmaf-tune/src/vmaftune/cli.py (added the auto subparser and dispatcher), tools/vmaf-tune/tests/test_auto_short_circuits.py (new), tools/vmaf-tune/AGENTS.md (invariant row), docs/usage/vmaf-tune.md (## auto section), docs/adr/0325-vmaf-tune-phase-f-auto.md (status update: a ### Status update block appended under ## References, with the already-accepted body untouched per ADR-0028). No upstream-shared paths.
- Invariant: SHORT_CIRCUIT_PREDICATES in auto.py is an ordered tuple, not a set. Its seven entries appear in the canonical order LADDER_SINGLE_RUNG, CODEC_PINNED, PREDICTOR_GOSPEL, SKIP_SALIENCY, SDR_SKIP, SAMPLE_CLIP_PROPAGATE, SKIP_PER_SHOT. The JSON schema records short-circuits in this order under plan.metadata.short_circuits, and downstream consumers (the CI corpus collector, post-hoc speedup analysis) parse the canonical-order list. An eighth short-circuit (F.3+) must be appended; never reorder. The Phase D thresholds (PHASE_D_DURATION_GATE_S = 300.0, PHASE_D_SHOT_VARIANCE_GATE = 0.15) are placeholders pending an empirical fit in F.3.
- On upstream sync: no action required. The module is fork-local (tools/vmaf-tune/ is fork-only), and the vmaf-tune umbrella ADR-0237 explicitly carves Phases B–F out of upstream scope.
- Re-test on rebase:
(cd tools/vmaf-tune && python -m pytest tests/test_auto_short_circuits.py -v)
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli auto \
--src /dev/null --target-vmaf 93 --max-budget-bitrate 5000 \
--allow-codecs libx264 --sample-clip-seconds 10 --smoke
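Because SHORT_CIRCUIT_PREDICATES is ordered, a run's fired short-circuits can be serialised deterministically by projecting them onto that tuple. The append-only rule then becomes a prefix check. The minimal sketch below shows both. ordered_short_circuits and is_append_only are hypothetical helpers, not functions in auto.py.

```python
"""Record fired short-circuits in canonical order (sketch of the F.1 + F.2 invariant)."""
from __future__ import annotations

# Canonical order from the notes; F.3+ entries may only be appended.
SHORT_CIRCUIT_PREDICATES = (
    "LADDER_SINGLE_RUNG",
    "CODEC_PINNED",
    "PREDICTOR_GOSPEL",
    "SKIP_SALIENCY",
    "SDR_SKIP",
    "SAMPLE_CLIP_PROPAGATE",
    "SKIP_PER_SHOT",
)


def ordered_short_circuits(fired: set[str]) -> list[str]:
    """Project a set of fired predicates onto the canonical order."""
    unknown = fired.difference(SHORT_CIRCUIT_PREDICATES)
    if unknown:
        raise ValueError(f"unknown short-circuits: {sorted(unknown)}")
    return [name for name in SHORT_CIRCUIT_PREDICATES if name in fired]


def is_append_only(old: tuple[str, ...], new: tuple[str, ...]) -> bool:
    """True when `new` keeps `old` as an unchanged prefix."""
    return new[: len(old)] == old
```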
ADR-0325 — vmaf-tune auto Phase F.3 confidence-aware fallbacks (2026-05-08)¶
- Touches: tools/vmaf-tune/src/vmaftune/auto.py (the F.3 helpers _confidence_aware_escalation, ConfidenceThresholds, ConfidenceDecision and load_confidence_thresholds, plus per-cell wiring in run_auto), tools/vmaf-tune/tests/test_auto_confidence_aware.py (new, 28 tests), tools/vmaf-tune/AGENTS.md (invariant note), docs/usage/vmaf-tune.md (new ### Confidence-aware fallbacks (F.3) subsection under ## auto), docs/adr/0325-vmaf-tune-phase-f-auto.md (status update appended per ADR-0028; the already-Accepted body is untouched), changelog.d/added/phase-f3-confidence-aware-fallbacks.md (new). No upstream-shared paths.
- Invariant: DEFAULT_TIGHT_INTERVAL_MAX_WIDTH = 2.0 and DEFAULT_WIDE_INTERVAL_MIN_WIDTH = 5.0 are an emergency floor (Research-0067), not a target. The production values come from a JSON calibration sidecar produced by the conformal-VQA pipeline (ADR-0279), with the canonical keys tight_interval_max_width and wide_interval_min_width. When no sidecar is found, load_confidence_thresholds falls back to the defaults with a one-line WARNING; do not silence the warning.

  _confidence_aware_escalation is a pure function of its three inputs. It is exported in __all__ so downstream tools (the MCP server's auto proxy, the CI corpus collector) can embed it directly.

  The JSON schema records per-cell decisions in plan.metadata.confidence_aware_escalations[]: one entry per (rung, codec) cell, with keys rung, codec, verdict, interval_width and decision. Each cell in plan.cells[] also carries confidence_decision and interval_width, so consumers don't need to cross-reference the metadata array index. Adding a fourth ConfidenceDecision value is a schema bump; coordinate with downstream JSON consumers.
- On upstream sync: no action required. tools/vmaf-tune/ is fork-only, and both the conformal-VQA prediction surface (ADR-0279) and the F.1 + F.2 scaffold (ADR-0325) are fork-local.
- Re-test on rebase:
cd tools/vmaf-tune && python -m pytest \
tests/test_auto_confidence_aware.py \
tests/test_auto_short_circuits.py \
tests/test_conformal.py -v
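The F.3 fallback behaviour (sidecar first, otherwise defaults plus a WARNING) and the width bands can be sketched as follows. The band labels, the inclusive boundaries and every name here are assumptions for illustration. They are not the ConfidenceDecision values or the signature of load_confidence_thresholds.

```python
"""Confidence thresholds: sidecar-or-default loading and width bands (sketch)."""
from __future__ import annotations

import json
import logging
from dataclasses import dataclass
from pathlib import Path

log = logging.getLogger("vmaftune_sketch")

DEFAULT_TIGHT_INTERVAL_MAX_WIDTH = 2.0  # emergency floor, not a target
DEFAULT_WIDE_INTERVAL_MIN_WIDTH = 5.0


@dataclass(frozen=True)
class Thresholds:
    tight_interval_max_width: float = DEFAULT_TIGHT_INTERVAL_MAX_WIDTH
    wide_interval_min_width: float = DEFAULT_WIDE_INTERVAL_MIN_WIDTH


def load_thresholds(sidecar: Path | None) -> Thresholds:
    """Read the canonical keys from a calibration sidecar, else warn once and default."""
    if sidecar is None or not sidecar.is_file():
        log.warning("no confidence calibration sidecar found; using emergency defaults")
        return Thresholds()
    data = json.loads(sidecar.read_text(encoding="utf-8"))
    return Thresholds(
        tight_interval_max_width=float(data["tight_interval_max_width"]),
        wide_interval_min_width=float(data["wide_interval_min_width"]),
    )


def classify_width(width: float, t: Thresholds) -> str:
    """Three bands; the labels and inclusive boundaries are hypothetical."""
    if width <= t.tight_interval_max_width:
        return "tight"
    if width >= t.wide_interval_min_width:
        return "wide"
    return "middle"
```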
ADR-0325 — vmaf-tune auto Phase F.4 per-content-type recipe overrides (2026-05-09)¶
- Touches:
  - tools/vmaf-tune/src/vmaftune/auto.py: added _apply_recipe_override, _CONTENT_RECIPE_TABLE, get_recipe_for_class, the four _<class>_recipe factories and the RECIPE_CLASS_* constants; integrated the override into run_auto and added recipe_applied / effective_predictor_target_vmaf to the JSON metadata.
  - tools/vmaf-tune/tests/test_auto_recipe_overrides.py (new, 37 assertions).
  - tools/vmaf-tune/tests/test_auto_short_circuits.py: one test updated for the F.4 force-single-rung semantics on animation sources.
  - tools/vmaf-tune/AGENTS.md (invariant row).
  - docs/usage/vmaf-tune.md (### Per-content-type recipes (F.4) subsection).
  - docs/adr/0325-vmaf-tune-phase-f-auto.md (status update appended; the already-accepted body is untouched per ADR-0028).
  - changelog.d/added/phase-f4-content-recipes.md.

  No upstream-shared paths.
- Invariant: _CONTENT_RECIPE_TABLE stores factory callables, not literal dicts. Every get_recipe_for_class / _apply_recipe_override call returns a fresh override dict, so caller mutations cannot leak between runs. The driver honours four override keys: tight_interval_max_width, force_single_rung, saliency_intensity and target_vmaf_offset. As defence-in-depth, the _RECIPE_KEYS allowlist filters out anything else. target_vmaf_offset shifts only effective_predictor_target_vmaf; the input target_vmaf (the production-flip gate) is preserved verbatim. Every threshold value at F.4 is provisional pending F.5 calibration, so do not promote a placeholder to "calibrated" in a drive-by edit.
- On upstream sync: no action required. tools/vmaf-tune/ is fork-local, and ADR-0237 explicitly carves Phases B–F out of upstream scope.
- Re-test on rebase:
PYTHONPATH=tools/vmaf-tune/src python -m pytest \
tools/vmaf-tune/tests/test_auto_recipe_overrides.py \
tools/vmaf-tune/tests/test_auto_short_circuits.py \
tools/vmaf-tune/tests/test_auto_confidence_aware.py -v
PYTHONPATH=tools/vmaf-tune/src python -c \
"from pathlib import Path; from vmaftune.auto import run_auto, SourceMeta; \
m = SourceMeta(height=1080, width=1920, content_class='animation', duration_s=120, shot_variance=0.05); \
p = run_auto(src=Path('/dev/null'), target_vmaf=93.0, max_budget_kbps=5000.0, \
allow_codecs=('libx264',), smoke=True, meta_override=m); \
assert p.metadata['recipe_applied'] == 'animation'; \
assert p.metadata['target_vmaf'] == 93.0; \
assert p.metadata['effective_predictor_target_vmaf'] == 95.0; \
print('F.4 smoke OK')"
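The F.4 factory pattern is what guarantees a fresh dict per call. The minimal sketch below assumes a single animation class. Its +2.0 offset is chosen to match the 93 to 95 shift asserted by the F.4 smoke test. Every name and value is hypothetical, and the stray key shows the allowlist at work.

```python
"""Factory-based recipe table with a key allowlist (sketch of the F.4 pattern)."""
from __future__ import annotations

from typing import Callable

# The four override keys the notes say the driver honours.
ALLOWED_KEYS = frozenset(
    {"tight_interval_max_width", "force_single_rung", "saliency_intensity", "target_vmaf_offset"}
)


def animation_recipe() -> dict:
    """Hypothetical factory; the values are placeholders, not the fork's numbers."""
    return {"force_single_rung": True, "target_vmaf_offset": 2.0, "stray_key": 1}


# Factories, not literal dicts: each lookup builds a brand-new dict.
RECIPE_FACTORIES: dict[str, Callable[[], dict]] = {"animation": animation_recipe}


def recipe_for(content_class: str) -> dict:
    """Fresh, allowlist-filtered override dict; unknown classes get no overrides."""
    factory = RECIPE_FACTORIES.get(content_class)
    raw = factory() if factory is not None else {}
    return {key: value for key, value in raw.items() if key in ALLOWED_KEYS}


def effective_predictor_target(target_vmaf: float, recipe: dict) -> float:
    """Shift only the predictor-facing target; the caller's target_vmaf is unchanged."""
    return target_vmaf + float(recipe.get("target_vmaf_offset", 0.0))
```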
ADR-0325 — vmaf-tune auto Phase F.5 calibrated recipe overrides (2026-05-09)¶
- Touches:
  - ai/scripts/calibrate_phase_f_recipes.py (new).
  - ai/data/phase_f_recipes_calibrated.json (new), tracked via the .gitignore allow rule !ai/data/phase_f_recipes_calibrated.json.
  - tools/vmaf-tune/src/vmaftune/auto.py: added _F4_PLACEHOLDER_RECIPES, _CALIBRATED_RECIPES_FILENAME, _find_calibrated_recipes_path, _load_calibrated_recipes and _CALIBRATED_RECIPES; the four _<class>_recipe factories now read from _CALIBRATED_RECIPES.
  - tools/vmaf-tune/tests/test_calibrated_recipes.py (new, 14 assertions).
  - docs/usage/vmaf-tune.md: the calibrated table replaces the F.4 placeholder table in the ### Per-content-type recipes (F.4) subsection.
  - docs/adr/0325-vmaf-tune-phase-f-auto.md: ### Status update 2026-05-09: F.5 calibrated appended; the already-accepted body is untouched per ADR-0028.
  - changelog.d/changed/phase-f5-calibrated-recipes.md.
  - .gitignore (one allow rule for the JSON file).

  No upstream-shared paths.
- Invariant: the _CONTENT_RECIPE_TABLE factories now consume _CALIBRATED_RECIPES, which is snapshotted at module import. The runtime load is a single read; reloading at runtime requires importlib.reload(vmaftune.auto). Every get_recipe_for_class / _apply_recipe_override call still returns a fresh dict, preserving the F.4 read-only invariant via dict(_CALIBRATED_RECIPES[<cls>]). The _load_calibrated_recipes loader strips every _provenance sub-dict and filters every key against _RECIPE_KEYS, so a malicious or malformed JSON cannot inject unknown keys into a recipe. Per memory feedback_no_test_weakening, the calibration cannot widen the production-flip gate beyond the ConfidenceThresholds wide-interval ceiling. The regression test test_calibrated_ugc_width_below_wide_gate_ceiling locks this in.
- On upstream sync: no action required. tools/vmaf-tune/, ai/scripts/ and ai/data/ are all fork-local, and ADR-0237 explicitly carves Phases B–F out of upstream scope.
- Re-test on rebase:
PYTHONPATH=tools/vmaf-tune/src python -m pytest \
tools/vmaf-tune/tests/test_calibrated_recipes.py \
tools/vmaf-tune/tests/test_auto_recipe_overrides.py -v
python ai/scripts/calibrate_phase_f_recipes.py \
--corpus .workingdir2/konvid-150k/konvid_150k.jsonl \
--out /tmp/recipes_smoke.json \
--max-rows 10000
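The F.5 invariant says the loader strips every _provenance sub-dict and filters keys against the allowlist. Below is a minimal sketch of that sanitising pass. It assumes a JSON layout of {class: {key: value, ...}} with optional top-level and per-class _provenance blocks; that layout and load_calibrated are hypothetical. Per the notes, callers would still copy each entry with dict(...) before handing it out.

```python
"""Sanitising loader for a calibrated-recipes JSON file (sketch, not the fork's loader)."""
from __future__ import annotations

import json
from pathlib import Path

ALLOWED_KEYS = frozenset(
    {"tight_interval_max_width", "force_single_rung", "saliency_intensity", "target_vmaf_offset"}
)


def load_calibrated(path: Path) -> dict[str, dict]:
    """Drop _provenance blocks and any key outside the allowlist."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    recipes: dict[str, dict] = {}
    for cls, body in raw.items():
        if cls.startswith("_") or not isinstance(body, dict):
            continue  # top-level metadata such as _provenance, not a recipe class
        # A per-class _provenance key is not in the allowlist, so this drops it too.
        recipes[cls] = {k: v for k, v in body.items() if k in ALLOWED_KEYS}
    return recipes
```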
ADR-0335 — Hardware-capability priors (2026-05-08)¶
- Touches: ai/data/hardware_caps.csv (new), ai/scripts/hardware_caps_loader.py (new), ai/tests/test_hardware_caps.py (new), ai/AGENTS.md (one new bullet under "Rebase-sensitive invariants"), docs/ai/hardware-capability-priors.md (new), docs/research/0088-hardware-capability-priors-2026-05-08.md (new), docs/adr/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/0335-hardware-capability-priors.md (new), docs/adr/_index_fragments/_order.txt (one-line append), CHANGELOG.md (Added bullet under [Unreleased] — lusoris fork). No upstream-shared paths.
- Invariant: the table is prior-only. The schema check in hardware_caps_loader.py rejects:
  - benchmark-shaped header columns (fps_*, throughput, mbps, latency, watts, tdp, score_*, vmaf_*);
  - community-wiki source URLs (wikipedia.org, wikichip.org);
  - empty fields;
  - rows with encoding_blocks=0.

  Adding throughput or quality columns is forbidden; that pathology was the contributor-pack digest's category-1 NO-GO finding. Schema extensions need a new ADR, not a silent column bump. The return-dict shape of cap_vector_for() is load-bearing: trainers and corpus writers consume hwcap_* columns by name, so reordering or renaming them silently breaks downstream parquet schemas.
- On upstream sync: no action required. The whole surface lives under ai/ and docs/, and Netflix upstream has no equivalent.
- Re-test on rebase:
```bash
python -m pytest ai/tests/test_hardware_caps.py -v  # must report 23 passed
python ai/scripts/hardware_caps_loader.py  # JSON dump, 6+ rows
```
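The prior-only schema rules translate into a short CSV validator. The sketch below uses the standard csv module and assumes the source URL lives in a column named source. schema_errors and that column name are hypothetical; the authoritative checks are in hardware_caps_loader.py.

```python
"""Prior-only schema check for a hardware-capability CSV (illustrative sketch)."""
from __future__ import annotations

import csv
import io

FORBIDDEN_PREFIXES = ("fps_", "score_", "vmaf_")
FORBIDDEN_COLUMNS = {"throughput", "mbps", "latency", "watts", "tdp"}
BANNED_SOURCE_HOSTS = ("wikipedia.org", "wikichip.org")


def schema_errors(text: str) -> list[str]:
    """Return human-readable violations; an empty list means the table is accepted."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    for col in reader.fieldnames or []:
        if col in FORBIDDEN_COLUMNS or col.startswith(FORBIDDEN_PREFIXES):
            errors.append(f"benchmark-shaped column: {col}")
    for lineno, row in enumerate(reader, start=2):
        # Short rows yield None and long rows a list, so both count as malformed.
        if any(not isinstance(v, str) or not v.strip() for v in row.values()):
            errors.append(f"line {lineno}: empty or malformed field")
        if any(host in (row.get("source") or "") for host in BANNED_SOURCE_HOSTS):
            errors.append(f"line {lineno}: community-wiki source")
        if (row.get("encoding_blocks") or "").strip() == "0":
            errors.append(f"line {lineno}: encoding_blocks=0")
    return errors
```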
ADR-0367 — LSVQ corpus ingestion (2026-05-08)¶
- Touches: ai/scripts/lsvq_to_corpus_jsonl.py (new), ai/tests/test_lsvq.py (new), docs/adr/0367-lsvq-corpus-ingestion.md (new), docs/adr/README.md (regenerated index), docs/ai/lsvq-ingestion.md (new), docs/research/0090-lsvq-corpus-feasibility.md (new), changelog.d/added/0367-lsvq-ingestion.md (new). No engine code touched; no upstream-shared paths.
- Invariant: the JSONL row schema emitted by this adapter is byte-identical to that of the KonViD-150k Phase 2 adapter (ai/scripts/konvid_150k_to_corpus_jsonl.py), modulo the corpus and corpus_version literals. If a future PR widens the row contract (new column, type change), the LSVQ adapter must follow in lockstep, because the trainer-side data loader consumes both shards through one schema.
- On upstream sync: no action required. The adapter lives entirely under fork-local paths (ai/scripts/, ai/tests/) and only consumes a fork-local CSV manifest.
- Re-test on rebase:
ADR-0325 — Local sidecar training scaffold (2026-05-08)¶
- Touches: tools/vmaf-tune/src/vmaftune/sidecar.py (new), tools/vmaf-tune/tests/test_sidecar.py (new), docs/adr/0325-local-sidecar-training.md (new), docs/adr/_index_fragments/0325-local-sidecar-training.md (new), docs/adr/_index_fragments/_order.txt (append), docs/adr/README.md (index row), docs/research/0086-local-sidecar-feasibility.md (new), docs/ai/local-sidecar-training.md (new), changelog.d/added/local-sidecar-training-scaffold.md (new), tools/vmaf-tune/AGENTS.md (sidecar invariant note). No engine code touched; no upstream-shared paths.
- Invariant: the sidecar's on-disk state schema is the load-bearing pin. That schema is SIDECAR_SCHEMA_VERSION = 1, FEATURE_DIM = 14 and the column order in _feature_vector. Adding or reordering columns must bump SIDECAR_SCHEMA_VERSION; otherwise, state saved by older harness versions silently maps values onto the wrong feature columns. The SidecarConfig.predictor_version tag is the load-bearing pin against upgrades of the shipped predictor: bumping it is the contract that invalidates stale corrections without operator intervention.
- On upstream sync: no action required. The sidecar lives entirely under tools/vmaf-tune/ (fork-local) and only consumes the existing Predictor / ShotFeatures surface. Upstream Netflix/vmaf does not ship a vmaf-tune analogue, so the conflict probability is zero.
- Re-test on rebase:
```bash cd tools/vmaf-tune && python -m pytest tests/test_sidecar.py -v
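A loader that honours both pins could look like the following. The JSON layout (schema_version, feature_dim, predictor_version, corrections) and the function name are hypothetical; only the two constants come from the note. Returning None models "discard and start fresh", the alternative to reinterpreting old columns.

```python
import json

SIDECAR_SCHEMA_VERSION = 1   # from the note
FEATURE_DIM = 14             # from the note


def load_sidecar_state(raw, predictor_version):
    """Return saved corrections, or None when they must be discarded.
    The state layout used here is a hypothetical illustration."""
    state = json.loads(raw)
    if state.get("schema_version") != SIDECAR_SCHEMA_VERSION:
        return None   # column layout changed: never reinterpret old columns
    if state.get("feature_dim") != FEATURE_DIM:
        return None   # vector width mismatch
    if state.get("predictor_version") != predictor_version:
        return None   # shipped predictor upgraded: stale corrections invalidated
    return state["corrections"]
```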
ADR-0368 — YouTube UGC corpus ingestion (2026-05-08)¶
- Touches: ai/scripts/youtube_ugc_to_corpus_jsonl.py (new), ai/tests/test_youtube_ugc.py (new), docs/adr/0368-youtube-ugc-corpus-ingestion.md (new), docs/adr/_index_fragments/0368-youtube-ugc-corpus-ingestion.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated index), docs/ai/youtube-ugc-ingestion.md (new), docs/research/0091-youtube-ugc-corpus-feasibility.md (new), changelog.d/added/0368-youtube-ugc-ingestion.md (new), ai/AGENTS.md (one-paragraph invariant). No engine code touched; no upstream-shared paths.
- Invariant: the JSONL row schema emitted by this adapter is byte-identical to the LSVQ adapter (ai/scripts/lsvq_to_corpus_jsonl.py, ADR-0367) and the KonViD-150k Phase 2 adapter modulo the corpus and corpus_version literals. If a future PR widens the row contract (new column, type change), all adapters must follow in lockstep.
ADR-0369 — Waterloo IVC 4K-VQA corpus ingestion (2026-05-08)¶
- Touches: ai/scripts/waterloo_ivc_to_corpus_jsonl.py (new), ai/tests/test_waterloo_ivc.py (new), docs/adr/0369-waterloo-ivc-4k-corpus-ingestion.md (new), docs/adr/_index_fragments/0369-waterloo-ivc-4k-corpus-ingestion.md (new), docs/adr/_index_fragments/_order.txt (one-line append), docs/adr/README.md (regenerated index), docs/ai/waterloo-ivc-4k-ingestion.md (new), docs/research/0091-waterloo-ivc-4k-corpus-feasibility.md (new), changelog.d/added/0369-waterloo-ivc-4k-ingestion.md (new), ai/AGENTS.md (one-paragraph invariant). No engine code touched; no upstream-shared paths.
- Invariant: the JSONL row schema is byte-identical to the LSVQ (ADR-0367) and YouTube-UGC (ADR-0368) adapters modulo the corpus and corpus_version literals. All adapters must change in lockstep on schema widening.
- On upstream sync: no action required.
- Re-test on rebase:
```bash
pytest ai/tests/test_youtube_ugc.py -v
pytest ai/tests/test_waterloo_ivc.py -v
```
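One way to catch a lockstep violation early is to compare the key order and value types of one emitted row per adapter. This is a sketch under stated assumptions. All row fields other than corpus and corpus_version are invented. Comparing types is only a structural stand-in for the byte-identical requirement. corpus and corpus_version pass the check because only their values differ between adapters.

```python
import json


def schema_signature(jsonl_line):
    """Ordered (key, value-type) pairs; the values themselves are ignored."""
    row = json.loads(jsonl_line)   # dicts keep the key order from the file
    return [(key, type(value).__name__) for key, value in row.items()]


def check_lockstep(sample_rows):
    """sample_rows maps adapter name -> one JSONL line it emitted."""
    signatures = {name: schema_signature(line) for name, line in sample_rows.items()}
    ref_name, ref_sig = next(iter(signatures.items()))
    for name, sig in signatures.items():
        if sig != ref_sig:
            raise ValueError(f"{name} row schema drifted from {ref_name}: {sig} vs {ref_sig}")
```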
ADR-0325 — predictor stub-models policy (2026-05-08)¶
- Touches: tools/vmaf-tune/src/vmaftune/predictor_train.py (new), model/predictor_<codec>.onnx × 14 (new), model/predictor_<codec>_card.md × 14 (new), tools/vmaf-tune/tests/test_predictor_train.py (new), docs/ai/predictor.md (new), docs/adr/0325-predictor-stub-models-policy.md (new), docs/adr/README.md + _index_fragments/0325-*.md + _order.txt (index rows), changelog.d/added/predictor-train-pipeline.md (new). No engine code; no upstream-shared paths.
- Invariant: the trainer's CODECS tuple is sourced from predictor._DEFAULT_COEFFS so the two stay in lockstep. Any new codec adapter that lands in predictor._DEFAULT_COEFFS must (a) ship a matching synthetic-stub model and card (model/predictor_<codec>.onnx and model/predictor_<codec>_card.md) in the same PR, and (b) re-run the trainer to refresh the artefact set. The shipped-model smoke test (test_predictor_loads_each_shipped_model) parameterises over CODECS and fails if either condition is missed.
- On upstream sync: no action required. The predictor and trainer live entirely under tools/vmaf-tune/ (a fork-local path). The model artefacts live under model/ but use a predictor_<codec>.onnx naming scheme that does not collide with any upstream model/vmaf_*.{json,pkl} or model/tiny/*.onnx path.
- Re-test on rebase:
```bash
python3 -m pytest tools/vmaf-tune/tests/test_predictor_train.py -q
python3 -c "
import sys
sys.path.insert(0, 'tools/vmaf-tune/src')
from vmaftune.predictor_train import main
sys.exit(main(['--output-dir', '/tmp/predictor-rebase', '--epochs', '20']))
"
```
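Condition (a) amounts to a per-codec file check. This sketch derives CODECS from a hypothetical stand-in for predictor._DEFAULT_COEFFS; its codec keys and coefficient tuples are invented. It then lists every missing .onnx or card file. This is roughly the gap that would make the parameterised smoke test fail.

```python
from pathlib import Path

# Hypothetical stand-in for predictor._DEFAULT_COEFFS; real keys/values differ.
_DEFAULT_COEFFS = {"libx264": (1.0, -0.5), "libsvtav1": (0.9, -0.4)}
CODECS = tuple(_DEFAULT_COEFFS)   # derived from the coefficients, never hand-maintained


def missing_artefacts(model_dir, codecs=CODECS):
    """Names of expected predictor artefacts that are absent from model_dir."""
    missing = []
    for codec in codecs:
        for name in (f"predictor_{codec}.onnx", f"predictor_{codec}_card.md"):
            if not (Path(model_dir) / name).is_file():
                missing.append(name)
    return missing
```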
ADR-0326 — vmaf-tune Phase B target-VMAF bisect (2026-05-08)¶
- Touches: tools/vmaf-tune/src/vmaftune/bisect.py (new), tools/vmaf-tune/src/vmaftune/compare.py (default-predicate error string), tools/vmaf-tune/tests/test_bisect.py (new), tools/vmaf-tune/tests/test_compare.py (renamed default-predicate assertion), tools/vmaf-tune/AGENTS.md (Phase B invariant), docs/adr/0326-vmaf-tune-phase-b-bisect.md (new), docs/adr/_index_fragments/0326-vmaf-tune-phase-b-bisect.md (new), docs/adr/_index_fragments/_order.txt (append), docs/research/0090-vmaf-tune-phase-b-bisect-feasibility.md (new), docs/usage/vmaf-tune-bisect.md (new), changelog.d/added/vmaf-tune-phase-b-bisect.md (new). No upstream Netflix/vmaf surface is touched.
- Invariant: the bisect assumes VMAF decreases monotonically as CRF increases. Two non-adjacent samples that violate this contract abort the call with a clear error instead of falling back to a different search strategy. Do NOT add a fallback path on rebase; the AGENTS.md Phase B note is load-bearing.
- Companion seam: compare._default_predicate no longer raises NotImplementedError("Phase B pending"). Instead, it returns a well-formed RecommendResult(ok=False, error=...) pointing callers at make_bisect_predicate. Any downstream test that asserted "Phase B pending" verbatim needs updating.
- On upstream sync: no action required. The module lives entirely under tools/vmaf-tune/ (a fork-local path).
- Re-test on rebase:
```bash
python3 -m pytest tools/vmaf-tune/tests/test_bisect.py -v
python3 -m pytest tools/vmaf-tune/tests/test_compare.py -v
```
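The search and its abort-on-violation contract can be sketched as follows. This is not bisect.py. measure stands in for an encode-and-score call, and integer CRF steps are assumed. Reading "non-adjacent" as "CRF values more than one step apart" is also an assumption.

```python
class NonMonotoneError(RuntimeError):
    """Raised when sampled VMAF rises with CRF between non-adjacent samples."""


def _check_monotone(samples):
    # Assumption: "non-adjacent" means CRF values more than one step apart.
    crfs = sorted(samples)
    for i, lo in enumerate(crfs):
        for hi in crfs[i + 1:]:
            if hi - lo > 1 and samples[hi] > samples[lo]:
                raise NonMonotoneError(
                    f"VMAF rose from {samples[lo]} at CRF {lo} to {samples[hi]} at CRF {hi}")


def bisect_crf(measure, target, lo, hi):
    """Largest integer CRF in [lo, hi] with measure(crf) >= target, or None."""
    samples = {}

    def sample(crf):
        if crf not in samples:
            samples[crf] = measure(crf)
            _check_monotone(samples)   # abort rather than switch search strategy
        return samples[crf]

    if sample(lo) < target:
        return None                    # even the best quality misses the target
    if sample(hi) >= target:
        return hi
    while hi - lo > 1:                 # invariant: VMAF(lo) >= target > VMAF(hi)
        mid = (lo + hi) // 2
        if sample(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo
```

Raising instead of degrading to a linear scan keeps a broken monotonicity assumption visible to the caller, which matches the "no fallback path" rule.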
feat/sycl-integer-cambi-port — CAMBI SYCL twin (T3-15 / ADR-0371, 2026-05-10)¶
- Touches: core/src/feature/sycl/integer_cambi_sycl.cpp (new file), core/src/feature/feature_extractor.c (extern declaration + list entry under #if HAVE_SYCL), core/src/meson.build (source addition to the SYCL feature list), core/test/test_integer_cambi_sycl.c (new smoke test), core/test/meson.build (test target + gpu_all_deps refactor), docs/backends/sycl/overview.md (Known gaps update), docs/adr/0371-cambi-sycl-port.md (new ADR).
- Invariant: vmaf_fex_cambi_sycl must remain registered before any Vulkan or CUDA CAMBI extractor in feature_extractor_list[] so that SYCL is preferred when the runtime selects a GPU backend. The ordering (#if HAVE_SYCL … &vmaf_fex_cambi_sycl before #if HAVE_VULKAN / #if HAVE_CUDA) is load-bearing. In addition, the host residual calls vmaf_cambi_calculate_c_values and vmaf_cambi_spatial_pooling through the cambi_internal.h trampoline. If upstream Netflix ever renames or removes those symbols, the SYCL twin will stop compiling, and only a SYCL-enabled build will notice.
- Upstream conflict probability: low. Netflix upstream does not carry a core/src/feature/sycl/ directory. The only upstream-shared paths touched are feature_extractor.c (extern + list entry) and cambi_internal.h (consumed, not modified). A conflict on feature_extractor.c would come from an upstream addition of a new extractor; resolve it by re-inserting the vmaf_fex_cambi_sycl entry under #if HAVE_SYCL.
- Re-test on rebase:
```bash
meson setup build -Denable_sycl=true -Denable_cuda=false && ninja -C build
meson test -C build --suite=fast
```
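A toy first-match lookup shows why the list order matters. This Python model assumes that GPU extractor selection walks feature_extractor_list[] from front to back and takes the first usable entry. The Extractor type and select() are hypothetical, not libvmaf API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extractor:
    feature: str   # e.g. "cambi"
    backend: str   # "sycl", "vulkan" or "cuda"


def select(registry, feature, available_backends):
    """Return the first registered extractor whose backend is available."""
    for fex in registry:
        if fex.feature == feature and fex.backend in available_backends:
            return fex
    return None


# Order mirrors the invariant: SYCL ahead of the Vulkan and CUDA twins.
REGISTRY = (
    Extractor("cambi", "sycl"),
    Extractor("cambi", "vulkan"),
    Extractor("cambi", "cuda"),
)
```

Under this model, moving the SYCL entry below CUDA would silently change which backend runs on a machine where both are available.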
fix/float-adm-extractor-loading — enable_float default flip (2026-05-09)¶
No rebase-sensitive invariants. The change is a single default-value flip in core/meson_options.txt (enable_float: false → enable_float: true) and a prose update to docs/development/build-flags.md. No C source was modified; no build-system paths changed; no new symbols were added.
- On upstream sync: if Netflix upstream ever changes the enable_float default itself, prefer their change and drop this entry.
- Re-test on rebase: run the reproducer (./build/tools/vmaf --feature float_adm --no_prediction ...) and confirm it no longer prints "problem loading feature extractor".
ADR-0326 — MyTestCase upstream migration (partial port, Batch E, 2026-05-08)¶
- Touches: python/test/testutil.py, python/test/bd_rate_calculator_test.py, python/test/asset_test.py, python/test/bootstrap_train_test_model_test.py, python/test/local_explainer_test.py, python/test/cy_test.py, python/test/executor_test.py, python/test/raw_extractor_test.py, python/test/cross_validation_test.py, python/test/niqe_train_test_model_test.py, python/vmaf/script/run_testing.py, python/vmaf/tools/misc.py, python/vmaf/tools/testutils.py. Five Netflix golden-pinned files (quality_runner_test.py, vmafexec_test.py, vmafexec_feature_extractor_test.py, feature_extractor_test.py, result_test.py) are deliberately untouched.
- Invariant: every assertAlmostEqual(key, value) pair in the five golden-pinned files remains byte-identical to the fork's pre-port state per ADR-0024. This was verified with /tmp/mytestcase-port/verify_golden.py against the multiset baseline /tmp/mytestcase-port/baseline-pairs.json: all 310 + 183 + 37 + 113 + 17 = 660 pairs PASS post-port. CLAUDE.md §1 / §8 forbid altering them.
- Deferred upstream commits (still to be ported in a future session, in chronological order): 7d1ad54b (port aim/adm3/motion3 fextractor tests), 9fa593eb (more aim/adm3/motion3 + new options), 0341f730 (remove duplicate test_run_vmaf_integer_fextractor), a3776335 + 74bdce1b (align fork tests with upstream layout for aim/adm3/motion3), 322ca041 (replace temporal slicing with pre-sliced YUV fixtures), 6c097fc4 + ead2d12b + 4679db83 (macOS FP tolerance widenings; many are no-ops for the fork because the affected lines do not exist in its current state), 005988ea (routine_test MyTestCase + fifo_mode). The "update score values" upstream commits 3a041a97 + 3e075107 are PERMANENTLY skipped per the user's 2026-05-08 instruction list, because porting them would violate ADR-0024. The 3cbf352d + eb3374d0 and a333ba4c + 403dafed revert-pairs are no-ops upstream and require no port. The MyTestCase mixin itself is already in python/vmaf/tools/misc.py from a prior fork-local sync.
- Watch out for: when retrying the deferred port, note that the fork's feature_extractor_test.py test method order (psnr -> ansnr -> ssim -> ssim_flat -> ms_ssim) differs from upstream's post-cluster layout (psnr -> ssim -> ms_ssim -> ansnr); cherry-pick conflicts cluster around this reordering. Transplant the aim/adm3/motion3 additive blocks as new test methods rather than merging them into existing ones. The verifier script will catch any dropped (key, value) pair.
- Re-test on rebase:
```bash
python3 /tmp/mytestcase-port/verify_golden.py   # OVERALL: PASS required
pytest python/test/bd_rate_calculator_test.py -v
pytest python/test/asset_test.py -v
pytest python/test/quality_runner_test.py python/test/vmafexec_test.py \
    python/test/vmafexec_feature_extractor_test.py \
    python/test/feature_extractor_test.py python/test/result_test.py \
    --collect-only -q   # 173 tests collected, no errors
```
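The multiset comparison that verify_golden.py performs can be approximated with collections.Counter. Counter keeps duplicate pairs distinct, which matters because the same (key, value) assertion can legitimately appear in several tests. This sketch assumes the baseline is a flat list of (key, value) pairs. The real layout of baseline-pairs.json and the internals of the script may differ.

```python
from collections import Counter


def diff_golden_pairs(baseline, current):
    """Return (missing, extra) multisets of (key, value) pairs."""
    base, cur = Counter(baseline), Counter(current)
    missing = base - cur   # Counter subtraction drops zero and negative counts
    extra = cur - base
    return missing, extra
```

A dropped pair appears only in missing. An altered value appears in both missing (the old pair) and extra (the new one).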
ADR-0318 — fr_regressor_v2 ensemble retrain harness fix (2026-05-06)¶
- Touched files:
  - ai/scripts/run_ensemble_v2_real_corpus_loso.sh: the wrapper passes --corpus "$CORPUS_JSONL" + --out-dir "$out_dir" and drops --corpus-root / --output. A JSONL-existence check replaces the YUV-directory hard-fail; the YUV check is now informational.
  - docs/ai/ensemble-v2-real-corpus-retrain-runbook.md: adds step "0. Generate the Phase A canonical-6 corpus" and expands the prereqs table with the JSONL row + a Phase A wall-time estimate.
  - docs/adr/0318-ensemble-retrain-harness-fix.md, docs/adr/README.md (index row), changelog.d/fixed/ensemble-retrain-harness-interface.md.
- Rebase invariant: not load-bearing (wrapper-script + doc-only change). The CLI of the trainer ai/scripts/train_fr_regressor_v2_ensemble_loso.py is the authoritative interface (frozen here as part of the decision); any future change to it must update this wrapper in the same PR.
- Upstream source: none; this is a fork-local AI training harness.
- On upstream sync: no action required. The change lives entirely under ai/scripts/ + docs/ai/ + docs/adr/; upstream Netflix/vmaf does not ship these directories.
- Re-test on rebase:
```bash
bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
python3 ai/scripts/train_fr_regressor_v2_ensemble_loso.py --help
mkdir -p runs/phase_a/full_grid && \
  touch runs/phase_a/full_grid/per_frame_canonical6.jsonl && \
  bash ai/scripts/run_ensemble_v2_real_corpus_loso.sh 2>&1 | \
  grep -q "unrecognized arguments" && echo "REGRESSED" || echo "OK"
rm -rf runs/phase_a runs/ensemble_v2_real
```
0320 — fr_regressor_v2 ensemble seeds — production flip (ADR-0320)¶
- Touches: model/tiny/registry.json (five fr_regressor_v2_ensemble_v1_seed{0..4} rows flipped from smoke: true to smoke: false), model/tiny/fr_regressor_v2_ensemble_v1_seed_flip_PROMOTE.json (new; the committed verdict from the ADR-0319 harness run), ai/AGENTS.md (registry-flip invariant updated to record the flip and the going-forward "fresh PROMOTE.json required" rule), docs/state.md (Recently closed row), docs/adr/0320-fr-regressor-v2-ensemble-seed-flip.md (new ADR). Closes the deferral tracked in rebase-notes §0303 / §0309 / §0319.
- Upstream source: none. This is a fork-local registry mutation honouring ADR-0303's flip contract. Netflix/vmaf upstream has no fr_regressor_v2 ensemble surface.
- Invariant: any future change to the fr_regressor_v2_ensemble_v1_seed{0..4} registry rows (sha256 bump after retraining, smoke-flag mutation, ONNX path change) requires a fresh runs/ensemble_v2_real/PROMOTE.json verdict. The verdict must show mean per-seed LOSO PLCC ≥ 0.95 AND a per-seed spread (max - min) ≤ 0.005, the same two-part gate that ADR-0303 defined and ADR-0320 honoured. Never mutate these rows during a /sync-upstream rebase or as a side-effect of any other PR. The harness emits the verdict file but does not mutate the registry. The committed verdict at model/tiny/fr_regressor_v2_ensemble_v1_seed_flip_PROMOTE.json is the audit-trail anchor for the 2026-05-06 flip.
- On upstream sync: no action required. The five rows live in model/tiny/registry.json, which is fork-local, and upstream has no competing entries. If upstream ever ships its own fr_regressor_v2_ensemble_v1_* registry rows, stop and consult the rebase reviewer. A naming collision implies an architectural divergence that needs a Supersedes-ADR, not a mechanical merge.
- Re-test on rebase:
```bash
python3 -c "import json; \
d = json.load(open('model/tiny/registry.json')); \
seeds = [m for m in d['models'] \
if m['id'].startswith('fr_regressor_v2_ensemble_v1_seed')]; \
assert len(seeds) == 5, seeds; \
assert all(m['smoke'] is False for m in seeds), seeds; \
print('OK: 5 ensemble seeds at smoke=false')"
python3 -c "import json; \
v = json.load(open('model/tiny/fr_regressor_v2_ensemble_v1_seed_flip_PROMOTE.json')); \
assert v['verdict'] == 'PROMOTE'; \
assert v['gate']['passed'] is True; \
print('OK: verdict still PROMOTE')"
```
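The two-part PROMOTE gate fits in a few lines. This is a sketch, not the harness code. The verdict and gate.passed keys match what the re-test commands read. The "HOLD" label and the mean_plcc/spread fields are hypothetical.

```python
from statistics import mean

PLCC_FLOOR = 0.95   # mean per-seed LOSO PLCC must be >= this
MAX_SPREAD = 0.005  # max - min across seeds must be <= this


def promote_verdict(per_seed_plcc):
    avg = mean(per_seed_plcc)
    spread = max(per_seed_plcc) - min(per_seed_plcc)
    passed = avg >= PLCC_FLOOR and spread <= MAX_SPREAD   # both parts required
    return {
        "verdict": "PROMOTE" if passed else "HOLD",   # "HOLD" is a hypothetical label
        "gate": {"passed": passed, "mean_plcc": avg, "spread": spread},
    }
```

The spread condition is what stops one strong seed from carrying four weak ones past the mean threshold.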
ADR-0372 — HIP batch-1: integer_psnr_hip + float_ansnr_hip real kernels (2026-05-10)¶
- Touches: core/src/feature/hip/integer_psnr_hip.c (full rewrite), core/src/feature/hip/float_ansnr_hip.c (full rewrite), core/src/feature/hip/integer_psnr_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/float_ansnr_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/integer_psnr/psnr_score.hip (new device kernel), core/src/feature/hip/float_ansnr/float_ansnr_score.hip (new device kernel), core/src/hip/kernel_template.{h,c} (vmaf_hip_kernel_submit_post_record, also in PR #612; on merge conflict keep one copy and drop the duplicate), core/src/meson.build (hip_hsaco_sources HSACO build pipeline, also in PR #612), docs/backends/hip/overview.md (status table update), docs/adr/0372-hip-batch1-integer-psnr-float-ansnr.md (new).
- Invariant (HAVE_HIPCC dual-path): all device-state fields in PsnrStateHip / AnsnrStateHip and all hipModule_t / hipFunction_t member declarations live under #ifdef HAVE_HIPCC. Without the flag, the scaffold -ENOSYS contract is preserved and the host TU compiles without ROCm SDK headers. On rebase or refactor, never move device-state fields outside the #ifdef HAVE_HIPCC guard, because that breaks the CPU-only build.
- Invariant (float_ansnr no-memset bypass): float_ansnr_hip's submit() does not call vmaf_hip_kernel_submit_pre_launch. The device kernel writes per-block (sig, noise) float partials directly into an output buffer (partials[2*block_idx+0] and [2*block_idx+1]). With no atomic accumulation, no memset is needed. The partials buffer is sized wg_count * 2u * sizeof(float) at init. On rebase: if a future PR adds a submit_pre_launch call to float_ansnr_cuda.c, the HIP twin must follow in the same PR.
- Invariant (integer_psnr uint64 split shuffle): the PSNR kernel splits the uint64 warp-reduction shuffle into two uint32 __shfl_down calls (HIP warp size = 64; no native uint64 shuffle). On rebase: if a future ROCm release adds native uint64 shuffle primitives, the kernel can be simplified. Before landing that change, verify that the cross-backend numeric gate (meson test -C build --suite=hip-parity) still passes.
- Merge-conflict risk with PR #612: PR #612 (float_psnr_hip) also adds vmaf_hip_kernel_submit_post_record in kernel_template.{h,c} and the hip_hsaco_sources meson pipeline. When the two PRs merge, keep one copy of each and discard the duplicate. The bodies are identical, so either direction is safe.
- On upstream sync: no action required for PSNR or ANSNR logic (fork-local kernels). If upstream adds its own HIP backend with conflicting meson.build variables, resolve manually against the hip_hsaco_sources pattern documented in this PR.
- Re-test on rebase:
```bash
# CPU-only build (no ROCm required): must compile clean
meson setup build_hip_cpu -Denable_hip=true -Denable_hipcc=false \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_cpu
# With ROCm + hipcc: HSACO pipeline must produce .hsaco + _hsaco.c
meson setup build_hip_full -Denable_hip=true -Denable_hipcc=true \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_full
# Cross-backend numeric gate (requires AMD GPU)
meson test -C build_hip_full --suite=hip-parity
```
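The split-shuffle trick is lossless: each 64-bit value travels as independent low and high 32-bit halves and is reassembled before the add. This Python simulation models __shfl_down over a 64-lane wavefront and checks that lane 0 ends up with the exact 64-bit total. The helper names are hypothetical, and the code illustrates the technique rather than the kernel's actual source.

```python
WARP_SIZE = 64                  # HIP wavefront width on GCN/RDNA
U32 = 0xFFFFFFFF
U64 = 0xFFFFFFFFFFFFFFFF


def shfl_down_u32(lanes, offset):
    """Simulated 32-bit __shfl_down: lane i reads lane i + offset.
    Lanes whose source is out of range keep their own value."""
    n = len(lanes)
    return [lanes[i + offset] if i + offset < n else lanes[i] for i in range(n)]


def shfl_down_u64(lanes, offset):
    """64-bit shuffle built from two 32-bit shuffles (low and high halves)."""
    lo = shfl_down_u32([v & U32 for v in lanes], offset)
    hi = shfl_down_u32([v >> 32 for v in lanes], offset)
    return [(h << 32) | l for h, l in zip(hi, lo)]


def warp_reduce_sum_u64(lanes):
    """Tree reduction; lane 0 ends up holding the wavefront total."""
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down_u64(vals, offset)
        vals = [(a + b) & U64 for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]
```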
ADR-0373 — HIP batch-2: float_motion_hip real kernel (2026-05-10)¶
- Touches: core/src/feature/hip/float_motion_hip.c (full rewrite to the #ifdef HAVE_HIPCC dual-path; uintptr_t opaque slots replaced with real void * device pointers), core/src/feature/hip/float_motion/float_motion_score.hip (device kernel; already present, no change in this PR), core/src/meson.build (float_motion_score added to hip_kernel_sources), docs/backends/hip/overview.md (status update to 4/11), docs/adr/0373-hip-batch2-float-motion.md (new).
- Invariant (HAVE_HIPCC dual-path): hipModule_t module, hipFunction_t funcbpc8 / funcbpc16, and void *ref_in, void *blur[2] live under #ifdef HAVE_HIPCC. Without the flag, the scaffold -ENOSYS contract is preserved. Never move these fields outside the guard.
- Invariant (temporal blur ping-pong): cur_blur alternates between 0 and 1 across submit() and collect(). The kernel reads blur[1 - s->cur_blur] as "prev" and writes blur[s->cur_blur] as "cur"; collect() flips cur_blur after consuming the partials. On rebase: if the CUDA twin changes the ping-pong direction, the HIP twin must follow in the same PR.
- Invariant (first-frame compute_sad=0): submit() passes compute_sad=0 when index == 0. The kernel still writes blur[cur_blur] but sets all partials to 0.0, and collect() emits motion_score=0 and motion2_score=0 for index == 0 without SAD accumulation.
- Invariant (flush tail): flush() emits VMAF_feature_motion2_score = prev_motion_score at s->index (the last frame) and returns 1; if s->index == 0 it returns 1 immediately. This mirrors the flush_fex_cuda shape exactly.
- On upstream sync: no action required (fork-local kernel). If Netflix adds a HIP backend with conflicting float_motion logic, resolve against this invariant set.
- Re-test on rebase:
```bash
# CPU-only build (no ROCm required): must compile clean
meson setup build_hip_cpu -Denable_hip=true -Denable_hipcc=false \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_cpu
# With ROCm + hipcc: HSACO pipeline must produce float_motion_score.hsaco
meson setup build_hip_full -Denable_hip=true -Denable_hipcc=true \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_full
# Cross-backend numeric gate (requires AMD GPU)
meson test -C build_hip_full --suite=hip-parity
```
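The host-side bookkeeping behind the ping-pong, first-frame and flush invariants can be modelled in a few lines. MotionHostState is a hypothetical Python stand-in. Only the field names (cur_blur, index, prev_motion_score), the index arithmetic, and the flush behaviour come from the note. The real collect() also emits motion2 scores, which this sketch omits.

```python
class MotionHostState:
    """Toy model of the float_motion_hip host-side state machine."""

    def __init__(self):
        self.cur_blur = 0
        self.index = 0
        self.prev_motion_score = 0.0

    def submit(self, index):
        self.index = index
        return {
            "write": self.cur_blur,           # kernel writes blur[cur_blur] ("cur")
            "read_prev": 1 - self.cur_blur,   # kernel reads blur[1 - cur_blur] ("prev")
            "compute_sad": 0 if index == 0 else 1,
        }

    def collect(self, motion_score):
        score = 0.0 if self.index == 0 else motion_score   # first frame forced to 0
        self.prev_motion_score = score
        self.cur_blur = 1 - self.cur_blur   # flip only after consuming the partials
        return score

    def flush(self, emit):
        if self.index == 0:
            return 1                        # nothing to emit for a single frame
        emit("VMAF_feature_motion2_score", self.prev_motion_score, self.index)
        return 1
```

Each frame's "prev" read lands on the buffer the previous frame wrote, which is exactly the property a reversed flip would break.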
ADR-0375 — HIP batch-3: float_moment_hip + float_ssim_hip real kernels (2026-05-10)¶
- Touches: core/src/feature/hip/float_moment_hip.c (full rewrite to #ifdef HAVE_HIPCC dual-path), core/src/feature/hip/float_moment_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/float_moment/moment_score.hip (new device kernel), core/src/feature/hip/float_ssim_hip.c (full rewrite to #ifdef HAVE_HIPCC dual-path), core/src/feature/hip/float_ssim_hip.h (HSACO symbol decl under HAVE_HIPCC), core/src/feature/hip/float_ssim/ssim_score.hip (new device kernel), core/src/meson.build (moment_score and ssim_score added to hip_kernel_sources), docs/backends/hip/overview.md (status update to 6/11), docs/adr/0375-hip-batch3-float-moment-float-ssim.md (new).
- Invariant (HAVE_HIPCC dual-path): the following fields live under #ifdef HAVE_HIPCC in their respective structs: hipModule_t module, hipFunction_t funcbpc8/16, void *ref_in/dis_in (moment), and all five void *d_* intermediate buffers + void *ref_in/cmp_in (SSIM). The free helpers (moment_hip_module_free, ssim_hip_bufs_free) are defined outside the guard with internal #ifdef HAVE_HIPCC bodies, mirroring float_psnr_hip_module_free. On rebase: never move device-state fields outside the guard, because that breaks the CPU-only build.
- Invariant (moment 7-arg kernel): calculate_moment_hip_kernel_8bpc and _16bpc both take 7 arguments (ref, dis, ref_stride, dis_stride, sums, width, height); the 16bpc kernel does NOT take a bpc argument. The host launch must pass 7 args to both functions. If the CUDA twin adds a bpc argument to moment_score_16bpc, the HIP twin must follow.
- Invariant (SSIM two-pass stream ordering): calculate_ssim_hip_horiz_* and calculate_ssim_hip_vert_combine both run on the same s->lc.str stream. Implicit stream ordering provides the happens-before between Pass 1 writes and Pass 2 reads, so no explicit event is needed between the two launches. On rebase: if the CUDA twin adds an explicit inter-pass sync event, check whether GCN/RDNA stream-ordering guarantees are equivalent before mirroring it.
- Invariant (SSIM WARPS_PER_BLOCK=2): SSIM_WARPS_PER_BLOCK = SSIM_BLOCK_SIZE / SSIM_WARP_SIZE = 128 / 64 = 2 (vs CUDA's 4 = 128 / 32). The shared-memory array s_warp_sums[SSIM_WARPS_PER_BLOCK] must be sized 2. On rebase: if the block size or warp size changes, update ssim_score.hip accordingly.
- Invariant (SSIM scale=1 only): init_fex_hip rejects scale != 1 with -EINVAL, mirroring the CUDA and Vulkan SSIM twins. Lifting this constraint is a future batch item and requires a new ADR.
- On upstream sync: no action required (fork-local kernels).
- Re-test on rebase:
```bash
# CPU-only build (no ROCm required): must compile clean
meson setup build_hip_cpu -Denable_hip=true -Denable_hipcc=false \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_cpu
# With ROCm + hipcc: HSACO pipeline must produce moment_score.hsaco + ssim_score.hsaco
meson setup build_hip_full -Denable_hip=true -Denable_hipcc=true \
    -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build_hip_full
# Cross-backend numeric gate (requires AMD GPU)
meson test -C build_hip_full --suite=hip-parity
```
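The warp-count arithmetic behind SSIM_WARPS_PER_BLOCK, and behind FPSNR_WARPS_PER_BLOCK in the float_psnr_hip entry, is block size divided by warp width. The hypothetical helper below computes the per-backend counts side by side. For SSIM, the HIP count is also the length s_warp_sums needs.

```python
def warps_per_block(block_size, warp_size):
    """Warps per block; for SSIM also the length s_warp_sums needs."""
    if block_size % warp_size:
        raise ValueError("block size must be a whole number of warps")
    return block_size // warp_size


# (kernel, block size) pairs taken from the HIP notes in this file.
for kernel, block in (("ssim", 128), ("float_psnr", 256)):
    hip = warps_per_block(block, 64)    # GCN/RDNA wavefront
    cuda = warps_per_block(block, 32)   # NVIDIA warp
    print(f"{kernel}: block {block} -> HIP {hip} warps, CUDA {cuda} warps")
```

This prints 2 vs 4 warps for SSIM and 4 vs 8 for float_psnr, matching the constants quoted in the notes.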
feat/hip-float-psnr-first-real — T7-10b: float_psnr_hip first real kernel (ADR-0254)¶
Touches:
- core/src/feature/hip/float_psnr_hip.c: complete rewrite from scaffold stub to functional kernel consumer (HIP Module API pattern: hipModuleLoadData + hipModuleLaunchKernel).
- core/src/feature/hip/float_psnr_hip.h: HSACO symbol extern declarations (float_psnr_score_hsaco[], float_psnr_score_hsaco_len).
- core/src/feature/hip/float_psnr/float_psnr_score.hip: new HIP device kernel file (warp-64 reduction, GCN/RDNA-specific __shfl_down).
- core/src/hip/kernel_template.{h,c}: new vmaf_hip_kernel_submit_post_record helper for the post-launch event.
- core/src/hip/meson.build: float_psnr_hip.c added to hip_sources.
- core/src/meson.build: hip_hsaco_sources list + enable_hipcc guard + hipcc/xxd custom-target pipeline.
- core/meson_options.txt: new enable_hipcc boolean option.
- core/src/feature/feature_extractor.c: vmaf_fex_float_psnr_hip registration under #if HAVE_HIP.
- core/test/test_hip_smoke.c: test_float_psnr_hip_extractor_registered.
Invariants:
- vmaf_hip_kernel_submit_post_record call ordering: it must be called after the hipMemcpyAsync DtoH readback and before the collect-side vmaf_hip_kernel_collect_wait. Any reorder breaks the event-fencing contract, because the finished event records the end of the readback copy, not the end of the kernel launch. If a future PR refactors kernel_template.c, preserve this ordering constraint.
- #ifdef HAVE_HIPCC wraps all device-dependent state: hipModule_t, hipFunction_t, the staging-buffer allocations, and the kernel launch chain all sit inside #ifdef HAVE_HIPCC guards, so the file compiles cleanly without a ROCm SDK. The #ifndef HAVE_HIPCC paths return -ENOSYS. Any future real-kernel port must preserve this dual-path pattern.
- Warp size 64 (GCN/RDNA): FPSNR_WARPS_PER_BLOCK = 4 for block size 256 (vs CUDA's 8 warps at warp size 32). The .hip kernel uses __shfl_down(v, off) without a mask (HIP convention; CUDA uses __shfl_down_sync(0xffffffff, v, off)). Any future port of the CUDA twin's warp reduction that changes these constants must update both backends.
- enable_hipcc=false default: the meson option defaults to false so that CI and downstream builds without a ROCm toolchain still compile cleanly. The enable_hipcc=true path requires hipcc + xxd in PATH and a ROCm 6+ SDK. Gate any build-system change on both code paths.
Re-test on rebase:
```bash
# CPU-only (no ROCm needed):
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build
meson test -C build --suite=fast   # test_float_psnr_hip_extractor_registered passes
# With hipcc (ROCm 6+), in a separate build directory
# (re-running meson setup on an already configured directory fails):
meson setup build-hipcc -Denable_hip=true -Denable_hipcc=true -Denable_cuda=false libvmaf
ninja -C build-hipcc
# Confirm kernel launches on device by running vmaf with --feature float_psnr_hip
```
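The ordering rule is easiest to see on a toy in-order stream. ToyStream and run_frame are hypothetical Python models, not HIP API. In this model, recording the finished event before the readback lets the collect-side wait return while the host copy is still pending.

```python
class ToyStream:
    """In-order queue: an event completes once every op enqueued before it has run."""

    def __init__(self):
        self.ops = []
        self.executed = 0

    def enqueue(self, op):
        self.ops.append(op)

    def record_event(self):
        return len(self.ops)             # event position in the queue

    def wait_event(self, event):
        while self.executed < event:     # run only up to the event, like a fence
            self.ops[self.executed]()
            self.executed += 1


def run_frame(record_after_readback):
    device, host = {"sum": None}, {"sum": None}
    stream = ToyStream()
    stream.enqueue(lambda: device.update(sum=42.0))           # kernel launch
    if not record_after_readback:
        event = stream.record_event()                          # wrong: fences the kernel only
    stream.enqueue(lambda: host.update(sum=device["sum"]))    # async DtoH readback
    if record_after_readback:
        event = stream.record_event()                          # right: fences the readback
    stream.wait_event(event)                                   # collect-side wait
    return host["sum"]
```

With the event recorded after the readback, the host sees 42.0. With the event recorded before it, the wait returns and the host still holds None, the stale-read failure the invariant guards against.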
HIP batch-4 -- ciede_hip and integer_motion_v2_hip real kernels (ADR-0377)¶
Rebase-sensitive invariants:
- Arithmetic shift on int32/int64 in motion_v2_score.hip: the inner filter right-shifts (>> shift_y, >> shift_x) operate on signed types (int32_t, int64_t) and MUST remain arithmetic (sign-propagating) shifts. Converting them to logical shifts (for example by casting to an unsigned type before >>, or emulating the shift with bitwise ops) diverges from the CPU reference for negative values. This was the root cause of the AVX2 srlv_epi64 regression in PR #587. The CUDA twin documents the same constraint.
- Mirror padding diverges from motion_hip: motion_v2_score.hip uses a reflective mirror (2 * size - idx - 1), while motion_hip's kernel uses a skip-boundary mirror (2 * size - idx - 2). Each matches its own CPU reference. Do not unify them on rebase.
- Six YUV staging buffers for ciede_hip: ciede_hip_bufs_alloc allocates ref_y/u/v and dis_y/u/v separately. The chroma buffers are sized at chroma_w * chroma_h, not luma_w * luma_h. The HIP picture-buffer API may change on rebase (e.g., the VMAF_FEATURE_EXTRACTOR_HIP flag lands and pictures arrive on-device). If it does, these staging copies and their hipMalloc / hipFree calls must be removed or made conditional.
- #ifdef HAVE_HIPCC dual-path preserved: same invariant as float_psnr_hip (see the entry above). All device-dependent state and kernel launches are inside #ifdef HAVE_HIPCC guards.
Re-test on rebase:
```bash
# CPU-only (no ROCm needed):
meson setup build -Denable_hip=true -Denable_cuda=false -Denable_sycl=false libvmaf
ninja -C build
meson test -C build   # 54/54 pass including test_hip_smoke
```
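Both numeric invariants can be checked in plain Python, whose >> on int is an arithmetic shift. The helpers are hypothetical. logical_shr64 models a shift applied after reinterpreting the value as unsigned 64-bit, which is what srlv_epi64 does. The mirror functions cover only the upper boundary (idx >= size), the case for which the note gives formulas.

```python
MASK64 = (1 << 64) - 1


def arithmetic_shr64(x, s):
    return x >> s                  # Python >> on int propagates the sign bit


def logical_shr64(x, s):
    return (x & MASK64) >> s       # reinterpret as unsigned 64-bit, then shift


def mirror_reflective(idx, size):
    # motion_v2_score.hip upper boundary: 2 * size - idx - 1 (edge sample repeated)
    return 2 * size - idx - 1 if idx >= size else idx


def mirror_skip_boundary(idx, size):
    # motion_hip upper boundary: 2 * size - idx - 2 (edge sample not repeated)
    return 2 * size - idx - 2 if idx >= size else idx
```

For non-negative inputs the two shifts agree, so a regression like the srlv_epi64 one only shows up on negative filter intermediates.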
speed_qa -- real SpEED-QA implementation (ADR-0253)¶
core/src/feature/speed_qa.c went from a 71-line placeholder scaffold to a ~380-line real implementation. The extractor now sets
VMAF_FEATURE_EXTRACTOR_TEMPORAL and carries priv_size = sizeof(SpeedQaState); the registration entry in feature_extractor_list[] is unchanged (always unconditional, outside the #if VMAF_FLOAT_FEATURES block).
No upstream rebase conflict expected. The scaffold was fork-local; Netflix
upstream has no speed_qa.c. The upstream speed.c is unmodified.
Rebase invariant: vmaf_fex_speed_qa must stay outside the VMAF_FLOAT_FEATURES guard in feature_extractor.c -- speed_qa.c is compiled unconditionally (no float dependency). If a future Netflix commit lands a speed_qa.c, audit for algorithm conflicts before merging.
Re-test on rebase:
```bash
meson setup build_test libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build_test test/test_speed_qa
meson test -C build_test test_speed_qa --verbose
# Expected: 5 tests run, 5 passed
```
0378 — picture-upload stream CU_STREAM_NON_BLOCKING (PR #702, ADR-0378)¶
Touches: core/src/cuda/picture_cuda.c (one line in vmaf_cuda_picture_alloc).
Invariant: priv->cuda.str must be created with CU_STREAM_NON_BLOCKING (via cuStreamCreateWithPriority). The CUDA implicit null-stream serialisation rule makes CU_STREAM_DEFAULT a per-frame context barrier; at sub-4K this reduces CUDA motion throughput to ~0.55x CPU. If a future upstream commit touches vmaf_cuda_picture_alloc and reverts the stream flag, the performance regression returns silently.
Re-test:
```bash
meson setup build-cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build-cuda
./build-cuda/tools/vmaf_bench --resolution 576x324 --gpu-only --frames 20
# motion (CUDA) @ 576x324 must be >= 30 fps (>= 1x CPU baseline)
```
ADR-0376 — Vulkan buffer-invalidate void → int fix (GCC 16, 2026-05-10)¶
- Touches: core/src/feature/vulkan/float_ansnr_vulkan.c (function reduce_partials, call site in extract), core/src/feature/vulkan/cambi_vulkan.c (functions cambi_vk_readback_image and cambi_vk_readback_mask, call sites in cambi_vk_extract).
- Invariant: reduce_partials, cambi_vk_readback_image, and cambi_vk_readback_mask are now static int with error-propagating call sites. An upstream sync may bring a competing refactor of these functions (e.g., a signature change or a different coherency-flush strategy). If it does, the static int contract and the call-site error checks must be preserved.
- On upstream sync: no risk from Netflix/vmaf upstream. These files are 100% fork-local, since Vulkan feature extractors do not exist upstream. The rebase risk is internal: if a fork-local PR changes the Vulkan buffer-API surface (e.g., a new vmaf_vulkan_buffer_invalidate variant), verify that the call-site error-propagation pattern still applies.
- Re-test on rebase:
```bash
meson setup build-vk-retest -Denable_cuda=false -Denable_sycl=false -Denable_vulkan=true
ninja -C build-vk-retest
# Must compile cleanly under GCC 16 with no -Wreturn-mismatch error
```
PR-fix-cuda-picture-widening — CUDA picture_cuda.c integer-precision fixes (round-5 clang-tidy)¶
- Touches: core/src/cuda/picture_cuda.c (upstream-shared CUDA picture allocation and transfer path).
- Invariant: three WidthInBytes / cuMemAllocPitch width arguments now use (size_t) casts. This prevents silent 32-bit multiplication overflow before the result is widened to size_t. aligned_y / aligned_c are unsigned with an explicit 1u mask literal. The vmaf_ref_load() result is stored as long. If an upstream sync modifies these expressions, ensure the (size_t) casts and unsigned types are preserved.
- Re-test on rebase:
```bash
clang-tidy \
  -checks='-*,bugprone-narrowing-conversions,bugprone-implicit-widening-of-multiplication-result' \
  -p core/build-cuda \
  core/src/cuda/picture_cuda.c
# Must produce zero warnings for the named checks.
```
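The casts remove a specific hazard: the product is computed in 32 bits and only widened afterwards. This Python model uses unsigned 32-bit wraparound for illustration. For signed int, overflow is undefined behaviour in C rather than a defined wrap. The operand values are chosen only to force the wrap; they do not come from picture_cuda.c.

```python
U32_MOD = 1 << 32
U64_MOD = 1 << 64


def mul_in_u32(a, b):
    """a * b evaluated in 32-bit unsigned arithmetic, then widened: wraps first."""
    return (a * b) % U32_MOD


def mul_in_size_t(a, b):
    """(size_t)a * b on an LP64 target: the multiply itself happens in 64 bits."""
    return (a * b) % U64_MOD
```

Below 2**32 the two agree, which is why the bug stays silent at ordinary resolutions.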
PR-fix-cuda-dispatch-getenv — CUDA dispatch_strategy.c getenv() thread-safety fix¶
- Touches:
core/src/cuda/dispatch_strategy.c— fork-local TU (no upstream equivalent). - Invariant:
g_env_once/cache_env_dispatch/g_env_dispmust remain as the single canonical read path forVMAF_CUDA_DISPATCH. If a future PR needs to re-read the variable (e.g., for unit-test reset), it must resetg_env_onceviapthread_once_t g_env_once = PTHREAD_ONCE_INIT;in a test fixture, not callgetenv()directly fromvmaf_cuda_select_strategy. - Re-test on rebase:
clang-tidy \
-checks='-*,concurrency-*' \
-p core/build-cuda \
core/src/cuda/dispatch_strategy.c
# Must produce zero concurrency-mt-unsafe warnings.
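The read-once pattern looks roughly like this. It is a minimal sketch: the names follow the invariant above, but the cached value is assumed here to be a copied string (the real g_env_disp may be an enum or a pointer), and the dispatch values "graph" and "direct" are hypothetical. getenv() runs exactly once, under pthread_once, so later changes to the environment are not observed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static pthread_once_t g_env_once = PTHREAD_ONCE_INIT;
static char g_env_disp[64]; /* empty string means "variable unset" (assumption) */

/* Runs once per process; the only place getenv() is called. */
static void cache_env_dispatch(void)
{
    const char *v = getenv("VMAF_CUDA_DISPATCH");
    if (v)
        snprintf(g_env_disp, sizeof(g_env_disp), "%s", v);
}

/* Thread-safe accessor used by the strategy selector. */
static const char *env_dispatch(void)
{
    pthread_once(&g_env_once, cache_env_dispatch);
    return g_env_disp[0] ? g_env_disp : NULL;
}
```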
0103 — -fvisibility=hidden + VMAF_EXPORT public-API annotation (ADR-0379, Research-0092)¶
- Touches: core/src/meson.build (vmaf_cflags_common), core/include/libvmaf/*.h (all public headers), core/include/libvmaf/macros.h (new file), core/include/core/meson.build (header install list), core/src/dnn/model_loader.h (vmaf_dnn_verify_signature declaration).
- Invariant: -fvisibility=hidden is in vmaf_cflags_common. Any new public vmaf_* function added by an upstream sync, whether in libvmaf.c, picture.c, dict.c, or any other source, must also carry VMAF_EXPORT on its declaration in the matching public header; otherwise it will be hidden in libvmaf.so and downstream callers will get a link error. Gate: nm -D --defined-only build/src/libvmaf.so.* | grep ' [TW] ' | grep -v ' vmaf_' | wc -l must print 0. This gate catches leaked non-vmaf_ symbols; it does not detect a missing VMAF_EXPORT, so also confirm that any new vmaf_ function appears in the nm -D output.
- On upstream sync: upstream Netflix/vmaf does NOT use -fvisibility=hidden. Any new public entry point in an upstream commit (typically added to core/src/libvmaf.c + core/include/libvmaf/libvmaf.h) will compile to a hidden symbol on the fork without VMAF_EXPORT. The merge author must:
  - Add VMAF_EXPORT to the new declaration in the public header.
  - Run the nm -D gate (above); it must print 0.
  - Run meson test -C build; all tests must pass.
- Re-test on rebase:
meson setup build-vis libvmaf -Denable_cuda=false -Denable_sycl=false --wipe
ninja -C build-vis
nm -D --defined-only build-vis/src/libvmaf.so.* | grep ' [TW] ' | grep -v ' vmaf_' | wc -l
# Must print 0
meson test -C build-vis
# All tests must pass
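For orientation, an export annotation for GCC/Clang ELF builds typically has the shape below. This is a sketch only: SKETCH_EXPORT and both functions are hypothetical, Windows dllexport/dllimport handling is deliberately left out, and the fork's real VMAF_EXPORT definition lives in core/include/libvmaf/macros.h.

```c
#include <assert.h>

/* Hypothetical export macro for a library built with -fvisibility=hidden. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_EXPORT __attribute__((visibility("default")))
#else
#define SKETCH_EXPORT
#endif

/* Public-header declaration: annotated, so it stays in the dynamic symbol table. */
SKETCH_EXPORT int vmaf_sketch_api_version(void);

/* Internal helper: no annotation, so -fvisibility=hidden hides it. */
int vmaf_sketch_internal_helper(void);

int vmaf_sketch_internal_helper(void)
{
    return 3;
}

SKETCH_EXPORT int vmaf_sketch_api_version(void)
{
    return vmaf_sketch_internal_helper();
}
```

Built into a shared object with -fvisibility=hidden, only vmaf_sketch_api_version would be listed by nm -D --defined-only; the unannotated helper would not.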
PR-fix-cuda-switch-defaults — CUDA feature extractor defensive fixes (round-5 clang-tidy)¶
- Touches: core/src/feature/cuda/integer_adm_cuda.c, core/src/feature/cuda/integer_vif_cuda.c.
- Invariant: default: break; clauses were added to three switch(scale) statements. If an upstream sync adds new scale cases to ADM (scales 1–3) or VIF (scales 0–3), the default clause remains valid, but the comment describing which cases it covers must be updated accordingly. The RES_BUFFER_SIZE macro now has parentheses; any fork-local addition to that macro must preserve them.
- Re-test on rebase:
clang-tidy \
-checks='-*,bugprone-macro-parentheses,bugprone-switch-missing-default-case' \
-p core/build-cuda \
core/src/feature/cuda/integer_adm_cuda.c \
core/src/feature/cuda/integer_vif_cuda.c
# Must produce zero warnings for those checks.
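Both clang-tidy findings are easy to reproduce in miniature. The macros below are hypothetical stand-ins (the real RES_BUFFER_SIZE expression differs): without parentheses, an argument such as 2 + 2 binds to the neighbouring operator. The switch shows the default: break; shape the second check asks for.

```c
#include <assert.h>

/* Hypothetical buffer-size macros; not the real RES_BUFFER_SIZE. */
#define BUF_SIZE_UNSAFE(w, h) w * h * 4
#define BUF_SIZE_SAFE(w, h) ((w) * (h) * 4)

static int scale_shift(int scale)
{
    switch (scale) {
    case 1: return 1;
    case 2: return 2;
    case 3: return 3;
    default: break; /* scales outside 1..3 are not handled here */
    }
    return -1;
}
```

With w = 2 + 2 the unsafe macro expands to 2 + 2 * 8 * 4 = 66 instead of 128.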
PR-fix-picture-align-unsigned-narrowing — integer-sanitizer narrowing/overflow fixes in picture.c, libvmaf.c, tensor_io.c (round-5 -fsanitize=integer sweep)¶
- Touches: core/src/picture.c (upstream-shared picture geometry), core/src/libvmaf.c (upstream-shared vmaf_init), and core/src/dnn/tensor_io.c (fork-added f16 ↔ f32 converter).
- Invariant: three narrowing/overflow defects were corrected:
  1. aligned_y / aligned_c are now unsigned with a DATA_ALIGN - 1u mask. If an upstream sync touches picture_compute_geometry, ensure the unsigned type and the 1u literal are preserved.
  2. The vmaf_set_cpu_flags_mask call site in vmaf_init uses (unsigned)(~cfg.cpumask). If the cpumask type changes upstream, revisit the cast.
  3. The f16_to_f32_one subnormal path uses a signed int32_t exp_adj counter. If the f16 converter is reworked, verify that no unsigned wrap is reintroduced.
- Re-test on rebase:
CC=clang CXX=clang++ meson setup /tmp/build-isan-retest libvmaf \
-Denable_cuda=false -Denable_sycl=false \
--buildtype=debugoptimized -Db_sanitize=integer -Db_lundef=false -Db_lto=false
ninja -C /tmp/build-isan-retest
UBSAN_OPTIONS="halt_on_error=0:abort_on_error=0" \
/tmp/build-isan-retest/test/test_picture 2>&1 | grep "runtime error"
UBSAN_OPTIONS="halt_on_error=0:abort_on_error=0" \
/tmp/build-isan-retest/test/test_read_pictures_monotonic 2>&1 | grep "runtime error"
UBSAN_OPTIONS="halt_on_error=0:abort_on_error=0" \
/tmp/build-isan-retest/test/dnn/test_tensor_io 2>&1 | grep "runtime error"
# All three must produce zero "runtime error" lines.
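Two of the three fixes can be illustrated with a standalone sketch. The helper names and the 64-byte alignment are assumptions, not the fork's code: align_up keeps the whole mask expression unsigned, and the binary16 to binary32 converter normalises subnormals with a signed exp_adj counter that counts down, so -fsanitize=integer has no unsigned wrap to report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_ALIGN_SKETCH 64u /* hypothetical; the real DATA_ALIGN may differ */

/* Round up to the alignment using unsigned arithmetic throughout. */
static unsigned align_up(unsigned v)
{
    return (v + DATA_ALIGN_SKETCH - 1u) & ~(DATA_ALIGN_SKETCH - 1u);
}

/* IEEE 754 binary16 -> binary32. */
static float f16_to_f32_sketch(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                   /* Inf / NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {                /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {               /* signed zero */
        bits = sign;
    } else {                              /* subnormal: normalise */
        int32_t exp_adj = 0;              /* signed, so counting down never wraps */
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            exp_adj--;
        }
        mant &= 0x3FFu;                   /* drop the now-implicit leading 1 */
        bits = sign | ((uint32_t)(113 + exp_adj) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The smallest subnormal, 0x0001, needs ten shifts (exp_adj = -10) and lands on exponent field 103, i.e. 2^-24.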
PR-fix-cuda-pinned-alloc-null-deref — CWE-476 null-deref in vmaf_cuda_picture_alloc_pinned (round-6 cross-PR audit)¶
- Touches: core/src/cuda/picture_cuda.c, a CUDA host TU; no upstream equivalent.
- Invariant: the sequential check pattern (err = vmaf_picture_priv_init(pic); if (err) goto free_data;) must be preserved on any rebase or future modification. The |= idiom evaluates the right-hand side unconditionally, even after a prior failure. PR #700 fixed the identical pattern in picture.c (CWE-476); this fix closes the same class in the CUDA path. If upstream ever adds a similar pinned-picture allocation function, apply the same sequential-check discipline. Secondary: DATA_ALIGN_PINNED - 1u (with the u suffix) must be preserved on both sides of the alignment mask expression, matching the picture.c pattern fixed by PR #708.
- Re-test on rebase:
gcc -fanalyzer -Wno-analyzer-too-complex \
-Ilibvmaf/src -Ilibvmaf/include \
core/src/cuda/picture_cuda.c 2>&1 | grep "CWE-476"
# Must produce zero CWE-476 warnings for vmaf_cuda_picture_alloc_pinned.
meson test -C build --suite=fast
# Must be green.
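The difference between the two error-handling shapes is clearest in a self-contained sketch. PicSketch, alloc_data, priv_init and pic_alloc are hypothetical stand-ins for the real allocation path: priv_init dereferences the data pointer, so it must never run after the allocation step has failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct {
    unsigned char *data;
    int priv_ready;
} PicSketch;

static int alloc_data(PicSketch *pic, size_t n)
{
    if (n == 0)
        return -EINVAL;
    pic->data = malloc(n);
    return pic->data ? 0 : -ENOMEM;
}

/* Dereferences pic->data, so it must only run after a successful alloc. */
static int priv_init(PicSketch *pic)
{
    pic->data[0] = 0;
    pic->priv_ready = 1;
    return 0;
}

static int pic_alloc(PicSketch *pic, size_t n)
{
    /* Unsafe form: err = alloc_data(pic, n); err |= priv_init(pic);
     * priv_init would still run, and dereference NULL, after a failure. */
    int err = alloc_data(pic, n);
    if (err)
        return err;
    err = priv_init(pic);
    if (err)
        goto free_data;
    return 0;

free_data:
    free(pic->data);
    pic->data = NULL;
    return err;
}
```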
0380 — FFmpeg HIP backend selector patch (ADR-0380, ffmpeg-patches 0011)¶
- Touches: ffmpeg-patches/0011-libvmaf-wire-hip-backend-selector.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md.
- Invariant: the patch is authored against FFmpeg n8.1.1 as the cumulative base (with the 0001..0010 stack applied). Context lines in the patch reference VmafCudaState *cu_state / cuda_pool_initialised (added by patch 0010) and #include <vulkan/vulkan.h> / #endif (added by patch 0006). If the series is ever rebased to a newer FFmpeg tag (n8.2, n9.x, etc.), the surrounding context in vf_libvmaf.c may shift; run the full series replay against the new tag and regenerate conflicting patches. The HIP cleanup path (vmaf_hip_state_free(&s->hip_state)) takes a double pointer, unlike the CUDA path (vmaf_cuda_state_free(s->cu_state)), which takes a single pointer. This asymmetry is intentional and matches libvmaf_hip.h; preserve it.
- Error-code invariant (fix/hip-averror-propagation-0011, 2026-05-10): both HIP error sites in init() must use return AVERROR(-err), not return AVERROR(EINVAL). AVERROR(-err) maps the libvmaf-supplied negative errno (e.g. -ENODEV = -19, -ENOSYS = -38) to the correct FFmpeg error string ("No such device", "Function not implemented"). The AVERROR(EINVAL) form was the original patch text; it was corrected during the full 0001–0011 e2e test run. If the patch context is regenerated or the hunk is split, verify that the fix is preserved.
- Re-test on rebase:
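git -C /tmp/ffmpeg-8 reset --hard n8.1.1
# 000*-*.patch would skip 0010 and 0011; match every four-digit patch instead.
for p in ffmpeg-patches/[0-9][0-9][0-9][0-9]-*.patch; do
  # Absolute path: git -C resolves relative arguments against /tmp/ffmpeg-8.
  git -C /tmp/ffmpeg-8 am --3way "$PWD/$p" || break
done
# Every patch in the series (0001..0011 when this entry was written) must apply without conflict.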
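The error-code mapping is plain sign arithmetic. FFmpeg's libavutil/error.h defines AVERROR(e) as (-(e)) on platforms where errno values are positive, so a libvmaf return of -ENODEV becomes AVERROR(ENODEV), which equals -ENODEV. The sketch below mirrors that definition under its own macro name and uses a hypothetical fake_hip_state_init in place of the real libvmaf call.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors FFmpeg's AVERROR on platforms with positive errno values. */
#define SKETCH_AVERROR(e) (-(e))

/* Hypothetical libvmaf-style call: returns 0 or a negative errno. */
static int fake_hip_state_init(int have_device)
{
    return have_device ? 0 : -ENODEV;
}

static int filter_init(int have_device)
{
    int err = fake_hip_state_init(have_device);
    if (err < 0)
        return SKETCH_AVERROR(-err); /* keeps ENODEV instead of flattening to EINVAL */
    return 0;
}
```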
Research-0094 — integer_motion_v2 flush() dict-leak fix¶
- Touches: core/src/feature/integer_motion_v2.c.
- Invariant: the dict_locally_owned flag in flush() (introduced in this fix) relies on the invariant that s->feature_name_dict is NULL at flush() entry only in the registered-context (threaded dispatch) path, and non-NULL only when extract() has already run on this context (serial / pool-instance path). If a future upstream change causes extract() to clear s->feature_name_dict mid-run (e.g., per-scene re-init), the flag will incorrectly take the locally-owned path and free the dict prematurely. The companion unit test (test in test_feature_extractor) guards this via the existing motion_v2 code path.
- Re-test on rebase:
meson test -C build --suite=fast
# 53/53 must pass, including test_feature_extractor and test_motion_v2_simd.
ASAN_OPTIONS='detect_leaks=1' ./build-leak/tools/vmaf \
-r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 --feature motion_v2 \
--output /dev/null --threads 4 2>&1 | grep -E 'leak|SUMMARY'
# Must produce no output (clean).
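The ownership rule behind dict_locally_owned can be sketched as follows. The types, the calloc-based dict and the free counter are hypothetical simplifications: flush() frees a dictionary only if it created that dictionary itself, and never frees the one owned by the context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { int n_entries; } DictSketch; /* hypothetical */
typedef struct { DictSketch *feature_name_dict; } StateSketch;

static int g_dicts_freed = 0;

static void dict_free(DictSketch *d)
{
    free(d);
    g_dicts_freed++;
}

static int flush_sketch(StateSketch *s)
{
    bool dict_locally_owned = false;
    DictSketch *d = s->feature_name_dict;

    if (!d) {
        /* registered-context path: extract() never ran on this context */
        d = calloc(1, sizeof(*d));
        if (!d)
            return -ENOMEM;
        dict_locally_owned = true;
    }

    d->n_entries++; /* stand-in for emitting the flushed features */

    if (dict_locally_owned)
        dict_free(d); /* free only what flush() itself created */
    return 0;
}
```

Skipping the local free leaks one dict per registered context, which is what the ASan run above looks for; freeing unconditionally would release a dict the context still owns.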
0382 — Y4M negative-dimension rejection (ADR-0382, T-FUZZ-Y4M-NEG-WIDTH-SEGV)¶
- Touches: core/tools/y4m_input.c (internal static y4m_input_open_impl), core/test/fuzz/y4m_input_known_crashes/ (new corpus seed).
- Invariant: the guard if (_y4m->pic_w <= 0 || _y4m->pic_h <= 0) must stay between the y4m_parse_tags() call and the chroma-type dispatch block. If upstream restructures y4m_input_open_impl or moves the tag parser, the guard must migrate with it so that no allocation occurs before the check. The y4m_neg_width_null_deref.y4m seed must be replayed on every rebase to confirm that the parser returns a clean -1 rather than a SEGV.
- No rebase impact on public API or ffmpeg-patches: the fix is internal to y4m_input_open_impl (a static function); no public header is changed and no ffmpeg-patches patch is affected.
- Re-test on rebase:
CC=clang meson setup build-fuzz libvmaf \
-Dfuzz=true -Db_sanitize=address \
-Denable_cuda=false -Denable_sycl=false \
-Denable_vulkan=disabled --buildtype=debug
ninja -C build-fuzz core/test/fuzz/fuzz_y4m_input
./build-fuzz/core/test/fuzz/fuzz_y4m_input \
core/test/fuzz/y4m_input_known_crashes/y4m_neg_width_null_deref.y4m
# Pre-fix: SEGV on address 0x000000000000 inside fread.
# Post-fix: exits 0; stderr prints
# "Invalid YUV4MPEG2 dimensions: W=-8 H=4 (must be > 0)."
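The placement rule (parse, then reject, then size anything) is sketched below with a deliberately tiny, hypothetical tag scanner that only reads the W and H tags of a YUV4MPEG2 header line. It is not y4m_parse_tags(); the error message is copied from the expected stderr above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tag scanner: reads W<n> and H<n> from a YUV4MPEG2 header line. */
static int parse_dims_sketch(const char *hdr, int *w, int *h)
{
    *w = 0;
    *h = 0;
    for (const char *p = hdr; *p != '\0';) {
        while (*p == ' ')
            p++;
        if (*p == 'W')
            *w = (int)strtol(p + 1, NULL, 10);
        else if (*p == 'H')
            *h = (int)strtol(p + 1, NULL, 10);
        while (*p != '\0' && *p != ' ')
            p++;
    }
    /* The guard: reject before anything is sized from these values. */
    if (*w <= 0 || *h <= 0) {
        fprintf(stderr, "Invalid YUV4MPEG2 dimensions: W=%d H=%d (must be > 0).\n",
                *w, *h);
        return -1;
    }
    return 0;
}
```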
fix/picture-odd-dim-chroma-ceiling — picture_compute_geometry ceiling division for odd luma dims¶
- Touches: core/src/picture.c, core/src/cuda/picture_cuda.c, core/src/feature/cuda/integer_psnr_cuda.c, core/src/feature/cuda/integer_psnr_hvs_cuda.c, core/src/feature/integer_psnr.c, core/test/test_picture.c.
- Invariant: all geometry computations for chroma plane dimensions use ceiling division (dim + ss) >> ss, where ss is 0 or 1. If upstream adds a new allocator or copies the geometry pattern, it must use the same ceiling form. The regression test test_picture_odd_dim_chroma_ceiling pins this: 577 × 323 YUV420 must produce pic.w[1] == 289, pic.h[1] == 162.
- Re-test on rebase:
meson test -C build --suite=fast
# test_picture must pass; it includes test_picture_odd_dim_chroma_ceiling.
# Additionally, the ASan smoke:
python3 -c "
W, H = 577, 323
luma = bytes([128] * W * H)
cw, ch = (W+1)>>1, (H+1)>>1
chroma = bytes([128] * cw * ch)
open('/tmp/odd.yuv','wb').write((luma+chroma+chroma)*3)
"
ASAN_OPTIONS=halt_on_error=1 ./build/tools/vmaf \
--reference /tmp/odd.yuv --distorted /tmp/odd.yuv \
--width 577 --height 323 --pixel_format 420 --bitdepth 8 \
--feature ciede --threads 4
# Must exit 0 with no ASan reports.
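The ceiling form and the floor form it replaced differ only on odd luma dimensions, as this small sketch with hypothetical helper names shows. Note that (dim + ss) >> ss is a ceiling division only because ss is restricted to 0 or 1; a general shift would need (dim + (1 << ss) - 1) >> ss.

```c
#include <assert.h>

/* ss: chroma subsampling shift for one axis (1 for 4:2:0, 0 for 4:4:4). */
static unsigned chroma_dim(unsigned luma_dim, unsigned ss)
{
    return (luma_dim + ss) >> ss; /* ceiling division for ss in {0, 1} */
}

/* The old floor form: loses the last chroma column/row on odd sizes. */
static unsigned chroma_dim_floor(unsigned luma_dim, unsigned ss)
{
    return luma_dim >> ss;
}
```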
fix/motion-mirror-padding-min-dim — 5-tap filter minimum-dimension guard in all motion extractors¶
- Touches: core/src/feature/integer_motion.c, core/src/feature/integer_motion_v2.c, core/src/feature/float_motion.c, core/src/feature/cuda/integer_motion_cuda.c, core/src/feature/cuda/float_motion_cuda.c, core/src/feature/cuda/integer_motion_v2_cuda.c, core/src/feature/sycl/integer_motion_sycl.cpp, core/src/feature/sycl/float_motion_sycl.cpp, core/src/feature/sycl/integer_motion_v2_sycl.cpp, core/src/feature/vulkan/motion_vulkan.c, core/src/feature/vulkan/motion_v2_vulkan.c, core/src/feature/vulkan/float_motion_vulkan.c, core/src/feature/hip/integer_motion_v2_hip.c, core/src/feature/hip/float_motion_hip.c, core/test/test_motion_min_dim.c, core/test/meson.build.
- Invariant: every motion init() rejects w < 3 || h < 3 with -EINVAL before any buffer allocation. The reflect-101 mirror formula height - (i_tap - height + 2) requires height ≥ filter_width/2 + 1 = 3. If upstream Netflix/vmaf modifies the convolution core to support smaller frames (e.g. by switching to a clamp-to-edge formula), the guard should be re-evaluated. If upstream adds a new motion extractor that also uses the 5-tap kernel, add the same guard to its init().
- Re-test on rebase:
meson test -C build --suite=fast
# test_motion_min_dim must pass (13/13 cases).
# Reproducer:
python3 -c "
plane = bytes([128]*1*1)
chroma = bytes([128]*1*1)
frame = plane + chroma + chroma
with open('/tmp/1x1.yuv','wb') as f: f.write(frame*3)
"
./build/tools/vmaf --reference /tmp/1x1.yuv --distorted /tmp/1x1.yuv \
--width 1 --height 1 --pixel_format 420 --bitdepth 8 \
--feature motion --threads 1 2>&1 | grep -E 'EINVAL|minimum|below'
# Must print the "frame 1x1 is below the 5-tap filter minimum" message.
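The minimum of 3 follows directly from the mirror formula. The sketch below reuses the formula quoted above and checks every tap offset (-2..+2) of a 5-tap kernel; mirror_idx, five_tap_in_bounds and motion_init_guard are hypothetical helpers, not fork functions. At height 2 the last row's +2 tap mirrors to index -1; at height 1 the -2 tap mirrors to index 2.

```c
#include <assert.h>
#include <errno.h>

/* Reflect-101 mirroring for a tap index that may fall outside [0, height). */
static int mirror_idx(int i_tap, int height)
{
    if (i_tap < 0)
        return -i_tap;
    if (i_tap >= height)
        return height - (i_tap - height + 2);
    return i_tap;
}

/* 1 if every tap of a 5-tap kernel (offsets -2..+2) lands inside the plane. */
static int five_tap_in_bounds(int height)
{
    for (int i = 0; i < height; i++)
        for (int k = -2; k <= 2; k++) {
            int m = mirror_idx(i + k, height);
            if (m < 0 || m >= height)
                return 0;
        }
    return 1;
}

/* The init() guard: reject before any buffer is sized from w/h. */
static int motion_init_guard(unsigned w, unsigned h)
{
    return (w < 3 || h < 3) ? -EINVAL : 0;
}
```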
0381 — Vulkan VIF scale 2/3 numerical saturation fix (ADR-0381, PR #718)¶
- Touches: core/src/feature/vulkan/shaders/float_vif.comp, core/src/feature/vulkan/vif_vulkan.c, core/src/vulkan/meson.build.
- Invariant 1: float_vif.comp must remain in psnr_hvs_strict_shaders. The psnr_hvs_strict_shaders list in meson.build controls which shaders compile with glslc -O0 (strict) vs -O (optimised). float_vif.comp belongs to this list because the SPIR-V optimizer's FMA contraction and reassociation of sigma1_sq = xx - mu1*mu1 trigger catastrophic cancellation at scales 2 and 3 (small local variance), saturating the per-scale score to 1.0 and inflating VMAF by ~+1.07. If the list is re-ordered or float_vif.comp is accidentally removed, restore it.
- Invariant 2: precise qualifiers on float_vif.comp accumulators. The vertical-pass accumulators (a_mu1, a_mu2, a_xx, a_yy, a_xy), the horizontal-pass accumulators (mu1, mu2, xx, yy, xy), and the sigma expressions (sigma1_sq, sigma2_sq, sigma12) carry precise qualifiers. These map to OpDecorate NoContraction in SPIR-V and defend against driver-side FMA contraction (Vulkan 1.4 NVIDIA / newer MoltenVK). Do not remove the precise qualifiers without re-running places=4 on every CI hardware lane.
- Invariant 3: integer VIF rd buffer ceiling division. vif_vulkan.c::alloc_buffers() allocates the per-scale rd buffers with ceiling division: ((w + 1u) / 2u) * ((h + 1u) / 2u). For odd input dimensions (e.g. h = 81 at scale 2 of a 576×324 input), the shader writes rd_y indices up to h/2 = 40 (inclusive), requiring 41 rows. Floor division (h/2 = 40 rows) under-allocates by one row (72 uint32 slots), corrupting the adjacent per-WG int64 accumulator buffer. The ceiling form must be preserved on any refactor or upstream merge touching alloc_buffers.
- Re-test on rebase:
# Build with Vulkan enabled
meson setup build-vk-vif libvmaf -Denable_vulkan=true -Denable_cuda=false -Denable_sycl=false --buildtype=release
ninja -C build-vk-vif
# Run per-scale VIF parity check on the Netflix 576x324 golden pair
build-vk-vif/tools/vmaf \
-r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 --feature float_vif_vulkan --backend vulkan \
--output /tmp/vk_vif_out.json
# All per-scale VIF scores must be < 1.0; VMAF must be within ±0.5 of CPU.
# Per-scale delta must be < 1e-3 from CPU reference.
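The one-row shortfall can be checked with plain arithmetic. This sketch assumes, as in the example above, that the scale-2 plane of a 576×324 input is 144×81 (two halvings); the helper names are hypothetical. The ceiling form yields 72 × 41 slots and the floor form 72 × 40, which is 72 slots short.

```c
#include <assert.h>
#include <stddef.h>

/* Ceiling form, as used by alloc_buffers(). */
static size_t rd_slots_ceil(unsigned w, unsigned h)
{
    return (size_t)((w + 1u) / 2u) * ((h + 1u) / 2u);
}

/* Floor form: one row short whenever h is odd. */
static size_t rd_slots_floor(unsigned w, unsigned h)
{
    return (size_t)(w / 2u) * (h / 2u);
}
```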
fix/recal-adm-f1f2-post-pr731 — Recalibrate fork-local adm_f1f2 assertion after PR #731 AIM port¶
No rebase impact: this change touches only python/test/feature_extractor_test.py (a single assertAlmostEqual value for the fork-local adm_f1s/f2s feature), with a documentation comment explaining the recalibration. No C sources, no public headers, no Meson options, no FFmpeg patch stack entries were modified. If upstream Netflix/vmaf adds its own adm_f1s/f2s noise-weight test in a future sync, verify that the expected value (0.8872294166666667) still matches the post-PR-#731 CPU scalar path output for the src01_hrc00_576x324.yuv ↔ src01_hrc01_576x324.yuv pair with the f1s/f2s parameters listed in test_run_vmaf_fextractor_adm_f1f2.
- Re-test: PYTHONPATH=$PWD/python python3 -m pytest python/test/feature_extractor_test.py::FeatureExtractorTest::test_run_vmaf_fextractor_adm_f1f2 -v (must report 1 passed).
feat/adm-gpu-param-sync — ADM noise_weight/csf_scale/csf_diag_scale GPU extension¶
- Touches: core/src/feature/cuda/float_adm_cuda.c, core/src/feature/cuda/integer_adm_cuda.c, core/src/feature/sycl/float_adm_sycl.cpp, core/src/feature/sycl/integer_adm_sycl.cpp, core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/float_adm_vulkan.c.
- Invariant 1: three-param parity with CPU. Every GPU ADM backend (float_adm_cuda, integer_adm_cuda, float_adm_sycl, integer_adm_sycl, adm_vulkan, float_adm_vulkan) exposes adm_csf_scale, adm_csf_diag_scale, and noise_weight with the same defaults (1.0, 1.0, 0.03125) as the CPU scalar path added by PR #731. If upstream Netflix ever adds or renames these parameters in integer_adm.c / float_adm.c, the corresponding GPU files must be updated in the same PR.
- Invariant 2: core/src/feature/cuda/integer_adm_cuda.c must NOT include feature/adm_options.h directly. DEFAULT_ADM_NOISE_WEIGHT, DEFAULT_ADM_CSF_SCALE, DEFAULT_ADM_CSF_DIAG_SCALE, and the full 4-member enum ADM_CSF_MODE arrive transitively via cuda/integer_adm_cuda.h → feature/integer_adm.h. A direct include reintroduces the 2-member enum ADM_CSF_MODE from adm_options.h and produces a redeclaration error.
- Invariant 3: the Vulkan integer fast path is gated on CSF-scale defaults. adm_vulkan.c contains a hard-coded i_rfactor fast path for the 3.0 * 1080 default viewing geometry. It is gated by: bool csf_default = (fabs(s->adm_csf_scale - 1.0) < 1e-9) && (fabs(s->adm_csf_diag_scale - 1.0) < 1e-9). If the fast path is ever updated, the CSF-default guard must be updated to match; removing or loosening the guard will produce wrong rfactors when non-default CSF scales are in use.
- Re-test on rebase:
# CPU-only build + golden test
meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false \
-Denable_vulkan=disabled
ninja -C build-cpu
make test-netflix-golden
# Verify default params produce unchanged scores
build-cpu/tools/vmaf \
-r python/test/resource/yuv/src01_hrc00_576x324.yuv \
-d python/test/resource/yuv/src01_hrc01_576x324.yuv \
-w 576 -h 324 -p 420 -b 8 \
--feature adm=noise_weight=0.03125:adm_csf_scale=1.0:adm_csf_diag_scale=1.0 \
--output /tmp/adm_param_default.json
# adm2 must match the no-param baseline (places=4).
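The gate in Invariant 3 can be exercised on its own. The struct and function names below are hypothetical; the comparison is the expression quoted above. noise_weight is carried only for completeness, since the invariant gates the fast path on the two CSF scales alone.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

typedef struct {
    double adm_csf_scale;      /* default 1.0 */
    double adm_csf_diag_scale; /* default 1.0 */
    double noise_weight;       /* default 0.03125; not part of the gate */
} AdmParamsSketch;

static bool csf_default(const AdmParamsSketch *s)
{
    return (fabs(s->adm_csf_scale - 1.0) < 1e-9) &&
           (fabs(s->adm_csf_diag_scale - 1.0) < 1e-9);
}

/* 1 = hard-coded default-geometry rfactors, 0 = general computation. */
static int use_rfactor_fast_path(const AdmParamsSketch *s)
{
    return csf_default(s) ? 1 : 0;
}
```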
0383 — K150K parallel CPU driver + feature_extractor_list dedup fix (ADR-0383)¶
- Touches:
  - ai/scripts/extract_k150k_features.py: driver redesign.
  - ai/AGENTS.md: K150K-A invariant note updated.
  - core/src/feature/feature_extractor.c: duplicate CUDA extractor registration removed (lines 239–240 deduplicated: six CUDA extractors that were registered twice).
  - docs/ai/datasets/k150k.md: user-facing docs updated.
  - docs/adr/0383-k150k-parallel-cpu-driver.md: new ADR.
  - docs/research/0096-k150k-gpu-driver-investigation-2026-05-10.md: new digest.
- Invariant 1: feature_extractor_list[] must have no duplicate entries. The dedup in feature_extractor_vector_append() is by extractor name, not by provided-feature name. Duplicate entries in feature_extractor_list[] result in both extractors being registered and both writing the same feature-collector slots. If upstream Netflix/vmaf modifies core/src/feature/feature_extractor.c to add new backend entries, verify that no extractor is registered more than once.
- Invariant 2: CUDA binary double-write via default model auto-load. When --model is not specified and --no-prediction is absent, the CLI auto-loads vmaf_v0.6.1, which registers CUDA twins via vmaf_use_features_from_model(). A subsequent --feature adm call also registers the CPU "adm" extractor; both run and double-write. This is a latent bug in the CLI model-auto-load / explicit-feature interaction path. The K150K pipeline works around it by using the CPU binary (no CUDA context). See Research-0096 for the full root-cause analysis.
- Re-test on rebase:
# Verify no duplicate entries exist in feature_extractor_list[]
grep -c "vmaf_fex_integer_adm_cuda" core/src/feature/feature_extractor.c
# Must print 2 (one extern declaration + one list entry)
# Verify CPU driver produces correct output
python ai/scripts/extract_k150k_features.py --limit 5 --threads-cuda 2 \
--out /tmp/smoke5.parquet && python3 -c \
"import pandas; df=pandas.read_parquet('/tmp/smoke5.parquet'); \
print(df.shape, df.columns.tolist()[:5])"
# Must print (5, 48) and the first five column names.
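The grep check above counts textual occurrences; a structural check over a NULL-terminated registration list is another option. This sketch uses a hypothetical FexSketch type and made-up extractor names, not the real VmafFeatureExtractor table. It counts every entry whose name already appeared earlier in the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; } FexSketch; /* hypothetical */

/* Counts entries whose name already appeared earlier in a NULL-terminated list. */
static size_t count_duplicate_entries(const FexSketch *const *list)
{
    size_t dups = 0;
    for (size_t i = 0; list[i] != NULL; i++)
        for (size_t j = 0; j < i; j++)
            if (strcmp(list[i]->name, list[j]->name) == 0) {
                dups++;
                break;
            }
    return dups;
}
```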
fix/ci-master-shfmt-cppcheck-semgrep — CI gate fixes (ADR-0384)¶
- Touches: .pre-commit-config.yaml, .github/workflows/lint-and-format.yml, core/src/feature/adm.c, core/src/feature/ansnr.c, core/src/feature/offset.c, core/src/feature/vif.c, scripts/ci/check-agent-worktree-drift.sh.
- Invariant: no rebase-sensitive invariants. The .pre-commit-config.yaml and workflow changes are fork infrastructure. The (void *) cast fix in the four feature files is a portable C idiom; upstream may or may not carry its own version of these files. The semgrep-comment reword in check-agent-worktree-drift.sh is fork-only.
- Re-test on rebase:
# Verify semgrep is clean
semgrep scan --config=.semgrep.yml --error
# Verify cppcheck finds no invalidPointerCast in the four files
cppcheck --enable=portability core/src/feature/adm.c \
core/src/feature/ansnr.c core/src/feature/offset.c \
core/src/feature/vif.c 2>&1 | grep invalidPointerCast
# Must produce no output.
no rebase impact: the CI infrastructure files are fork-local; the C source changes are minimal (cast through void*) and will trivially survive any upstream rebase that doesn't rewrite these specific functions.
fix/thread-pool-pthread-create-unchecked — thread pool pthread_create error handling + n_workers_created race fix¶
- Touches: core/src/thread_pool.c.
- Invariant: VmafThreadPool now has an n_workers_created field (written once at creation, never decremented) alongside the existing n_threads counter (decremented by each exiting worker). Any upstream change to thread_pool.c that adds or renames struct fields or changes the pthread_create call site must be reconciled against the fork's error-handling block (lines ~170–192) and the n_workers_created field initialisation.
- Re-test on rebase:
meson setup /tmp/build-tp-rebase libvmaf \
-Denable_cuda=false -Denable_sycl=false --buildtype=debugoptimized
meson test -C /tmp/build-tp-rebase
# Must report 54/54 (or more) OK.
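Why the pool needs two counters can be shown with a reduced model. PoolSketch, worker_main, pool_create and pool_destroy are hypothetical, and the workers exit immediately instead of draining a queue. pthread_create failures are checked, and joining is driven by n_workers_created, because n_threads has already been decremented by workers that exited and would under-count the threads that still need pthread_join.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

typedef struct {
    pthread_mutex_t lock;
    pthread_t workers[MAX_WORKERS];
    unsigned n_threads;         /* live workers; each exiting worker decrements it */
    unsigned n_workers_created; /* set once after creation; never decremented */
} PoolSketch;

static void *worker_main(void *arg)
{
    PoolSketch *p = arg;
    /* ...a real worker would drain a job queue here... */
    pthread_mutex_lock(&p->lock);
    p->n_threads--;
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Join by n_workers_created, not n_threads. */
static void pool_destroy(PoolSketch *p)
{
    for (unsigned i = 0; i < p->n_workers_created; i++)
        pthread_join(p->workers[i], NULL);
    pthread_mutex_destroy(&p->lock);
}

/* Returns 0 if all n workers started. On failure, workers that did start
 * are joined before returning, so the caller has nothing to clean up. */
static int pool_create(PoolSketch *p, unsigned n)
{
    unsigned i;

    p->n_threads = 0;
    p->n_workers_created = 0;
    if (n > MAX_WORKERS || pthread_mutex_init(&p->lock, NULL) != 0)
        return -1;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&p->lock);
        p->n_threads++; /* count first so the worker's decrement cannot underflow */
        pthread_mutex_unlock(&p->lock);
        if (pthread_create(&p->workers[i], NULL, worker_main, p) != 0) {
            pthread_mutex_lock(&p->lock);
            p->n_threads--; /* this worker never started */
            pthread_mutex_unlock(&p->lock);
            break;
        }
    }
    p->n_workers_created = i;
    if (i < n) {
        pool_destroy(p);
        return -1;
    }
    return 0;
}
```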
ai/tiny-netflix-training-scaffold — tiny-AI Netflix corpus training scaffold draft PR (ADR-0417)¶
- Touches: docs/adr/0417-tiny-ai-netflix-training-scaffold-pr.md, docs/research/0099-tiny-ai-netflix-training-update.md, docs/adr/_index_fragments/0417-tiny-ai-netflix-training-scaffold-pr.md, changelog.d/added/0417-tiny-ai-netflix-training-scaffold-pr.md, docs/ai/training-data.md (See-also links only).
- Invariant: the corpus path .workingdir2/netflix/ is gitignored and must never be committed. The --data-root CLI flag and the VMAF_DATA_ROOT environment variable are the only sanctioned ways to point the training scripts at the corpus. Any rebase or upstream sync that modifies ai/ or mcp-server/vmaf-mcp/ must preserve this invariant; verify with git check-ignore -v .workingdir2/netflix/ref/ (must return the root .gitignore entry).
- Re-test on rebase:
cd mcp-server/vmaf-mcp && python -m pytest tests/test_smoke_e2e.py -v
# Requires: meson compile -C build (for the vmaf binary).
# test_list_tools_returns_expected_names PASSED
# test_list_tools_each_has_input_schema PASSED
# test_call_tool_list_models_returns_list PASSED
# test_call_tool_list_backends_includes_cpu PASSED
# test_call_tool_unknown_name_returns_error_json PASSED
# test_call_tool_vmaf_score_golden_pair PASSED (requires build/tools/vmaf)
no rebase impact on libvmaf C sources: this branch is doc-only (ADR-0417, Research Digest 0099, changelog fragment, ADR index fragment). The MCP smoke test and training-data.md are already in master and untouched by this branch.
0420 — Metal (Apple Silicon) backend runtime (T8-1b / ADR-0420)¶
- Touches:
  - core/src/metal/common.mm (new, replaces common.c): MTLDevice + MTLCommandQueue lifecycle; MTLCreateSystemDefaultDevice for auto-pick; MTLCopyAllDevices for explicit indexing; Apple-Family-7 gate.
  - core/src/metal/picture_metal.mm (new, replaces picture_metal.c): MTLBuffer allocator with MTLResourceStorageModeShared (zero-copy).
  - core/src/metal/kernel_template.mm (new, replaces kernel_template.c): private MTLCommandQueue + two MTLSharedEvent handles; blit-fill accumulator zero; cross-queue encodeWaitForEvent; waitUntilCompleted drain.
  - core/src/metal/common.h: two new internal accessors, vmaf_metal_context_device_handle() + vmaf_metal_context_queue_handle().
  - core/src/metal/meson.build: dependency('Foundation'/'Metal', required: true); -fobjc-arc project arg for objcpp; source list flipped from .c to .mm.
  - core/test/test_metal_smoke.c: smoke expectations flipped from the -ENOSYS pin to runtime (0 on Apple7+, -ENODEV elsewhere).
  - docs/adr/0420-metal-backend-runtime-t8-1b.md + docs/adr/_index_fragments/0420-metal-backend-runtime-t8-1b.md + changelog.d/changed/metal-backend-runtime.md.
- Upstream-port footprint: zero; Netflix/vmaf has no Metal backend.
- Rebase invariants:
  - Header purity: no <Metal/Metal.h> in any header or pure-C consumer. Metal handles cross the boundary as void * / uintptr_t. Do not promote a Metal type into a header on rebase.
  - ARC bridge-cast discipline: __bridge_retained to stash (+1 retain), __bridge_transfer to release (−1), __bridge to borrow (no refcount change). A missing _retained leaves a dangling pointer once ARC releases the object (and the later _transfer over-releases it); a missing _transfer leaks the +1 retain.
  - Struct privacy: struct VmafMetalContext is defined only in common.mm. Consumers use the accessor pair, never struct-layout introspection.
  - HIP twin parity for kernel_template: any PR that grows the HIP kernel_template.c lifecycle must propagate the same change to kernel_template.mm in the same PR.
- Re-test on rebase (macOS, Apple-Family-7+):
meson setup build libvmaf -Denable_metal=enabled \
-Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build test_metal_smoke # must PASS
On Linux: meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false && ninja -C build (Metal subdir not entered; no Metal test registered).
ADR-0422 — CLI HIP and Metal backend selectors (2026-05-11)¶
Files touched: core/tools/cli_parse.h, core/tools/cli_parse.c, core/tools/vmaf.c, core/test/test_cli_parse.c, core/include/libvmaf/libvmaf_metal.h.
Rebase impact: none — this PR only adds new CLI flags and branches to the standalone vmaf tool. No libvmaf public C API symbols changed; no meson_options.txt entries added; the ffmpeg-patches stack is unaffected. The libvmaf_metal.h change is docstring-only (no API surface delta).
Invariants to preserve on rebase:
- CLISettings in cli_parse.h has no_hip, hip_device, no_metal, metal_device fields. If upstream adds its own HIP/Metal CLI flags in the same struct, resolve the merge by keeping the fork's field names (they match our header convention) and dropping any upstream stub.
- --backend cpu disables all five GPU backends (no_cuda, no_sycl, no_vulkan, no_hip, no_metal). If upstream extends the backend enum, ensure the cpu branch stays exhaustive.
- The init_gpu_backends() signature in vmaf.c passes hip_state / hip_active and metal_state / metal_active by reference under #ifdef HAVE_HIP / #ifdef HAVE_METAL guards. Preserve both the guards and the by-reference convention on rebase.
Smoke-test after rebase:
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false --buildtype=debug
ninja -C build
meson test -C build test_cli_parse # all 5 new tests must pass
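The "cpu branch stays exhaustive" invariant amounts to setting every no_* flag in one place. This sketch uses a hypothetical BackendFlagsSketch struct holding only the five flags named above, not the real CLISettings, and handles only the cpu value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool no_cuda, no_sycl, no_vulkan, no_hip, no_metal;
} BackendFlagsSketch; /* hypothetical subset of CLISettings */

/* Returns 0 on success, -1 for a backend name this sketch does not handle. */
static int apply_backend(BackendFlagsSketch *s, const char *name)
{
    if (strcmp(name, "cpu") == 0) {
        /* Must disable every GPU backend; extend when a backend is added. */
        s->no_cuda = s->no_sycl = s->no_vulkan = s->no_hip = s->no_metal = true;
        return 0;
    }
    return -1;
}
```

Keeping all five assignments in a single statement makes a forgotten backend easy to spot in review when the struct grows.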
2026-05-14 — vmaf-tune recommend --from-corpus Row Filtering¶
Files touched: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_recommend.py, docs/usage/vmaf-tune.md, docs/state.md.
Rebase impact: low. This only aligns the CLI --from-corpus path with the existing vmaftune.recommend.recommend() filtering contract. No corpus schema, encode path, score path, or model artefact changes.
Invariant to preserve on rebase: both CLI and library corpus recommendation must filter through RecommendRequest / recommend() so failed rows, NaN rows, and non-matching encoder / preset rows cannot win from the CLI.
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_recommend.py -q
2026-05-14 — vmaf-tune Usage-Doc Scaffold Label Cleanup¶
Files touched: docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-coarse-to-fine.md, docs/usage/vmaf-tune-bitrate-ladder.md, docs/usage/vmaf-tune-ladder-default-sampler.md, docs/usage/vmaf-tune-saliency-aware.md.
Rebase impact: documentation-only. The implementation already lives in tools/vmaf-tune/src/vmaftune/corpus.py, ladder.py, saliency.py, and fast.py; this PR removes stale user-facing stub labels that survived after those surfaces were wired.
Invariant to preserve on rebase: usage docs are implementation-status contracts, not backlog labels. If a command is wired and tested, do not call it a stub/scaffold in docs/usage/; describe the shipped path and name any remaining production limit precisely.
Smoke-test after rebase:
rg -n 'scaffold-only|Status: scaffold only|\(stub\)|\*\*Stub\*\*|recommend --saliency-aware|advisory in scaffold' \
docs/usage/vmaf-tune.md \
docs/usage/vmaf-tune-coarse-to-fine.md \
docs/usage/vmaf-tune-bitrate-ladder.md \
docs/usage/vmaf-tune-ladder-default-sampler.md \
docs/usage/vmaf-tune-saliency-aware.md
2026-05-14 — vmaf-tune fast --time-budget-s Timeout Wiring¶
Files touched: tools/vmaf-tune/src/vmaftune/fast.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_fast.py, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-fast-path.md.
Rebase impact: low. This is a fast-path user-surface fix only; no libvmaf public C API, model schema, or FFmpeg patch stack is touched.
Invariant to preserve on rebase: time_budget_s is a soft Optuna timeout. Do not revert it to metadata-only. The JSON n_trials field reports completed trials, which may be fewer than the requested --n-trials when the timeout fires.
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_fast.py \
tools/vmaf-tune/tests/test_cli_fast.py -q
2026-05-14 — vmaf-tune Public Doc Stub-Label Sweep¶
Files touched: docs/usage/vmaf-tune-resolution-aware.md, docs/ai/ensemble-training-kit.md, docs/ai/models/vmaf_tiny_v5.md, docs/ai/per-pr-doc-bar.md, docs/ai/predictor.md, docs/development/ffmpeg-patches-refresh.md, docs/development/ossf-scorecard.md, tools/vmaf-tune/README.md, tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py, tools/vmaf-tune/src/vmaftune/per_shot.py.
Rebase impact: low. This PR updates stale public wording and docstrings after already-shipped implementations. It does not change the vmaf-tune row schema, CLI arguments, model defaults, or libvmaf public API.
Invariant to preserve on rebase: user-facing docs describe shipped implementation status, not old backlog labels. Keep intentional scaffold warnings only where the backing implementation or required external artefact is still genuinely missing.
Smoke-test after rebase:
rg -n '^# .*\(stub\)|^# .*stub|> \*\*Stub\*\*|0276-vmaf-tune-phase-d|full prose follows|later PR' \
docs/usage docs/ai docs/development tools/vmaf-tune/README.md -g '*.md'
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_resolution.py \
tools/vmaf-tune/tests/test_per_shot.py \
tools/vmaf-tune/tests/test_encode_dispatcher_per_adapter.py -q
.venv/bin/python -m ruff check \
tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py \
tools/vmaf-tune/src/vmaftune/per_shot.py
2026-05-14 — vmaf-tune Predictor Directory-Corpus Training¶
Files touched: tools/vmaf-tune/src/vmaftune/predictor_train.py, tools/vmaf-tune/tests/test_predictor_train.py, docs/usage/vmaf-tune.md, tools/vmaf-tune/README.md.
Rebase impact: low. This only broadens the trainer's corpus input resolver from a single JSONL file to a file-or-directory source. The corpus row schema, predictor input vector, shipped model defaults, and libvmaf public surface are unchanged.
Invariant to preserve on rebase: directory corpus traversal is recursive and sorted. Keep that determinism so repeated training over .workingdir2/corpus_run/ sees the same row order across filesystems.
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_predictor_train.py \
-q
2026-05-14 — vmaf-tune benchmark Phase-G Corpus Report¶
Files touched: tools/vmaf-tune/src/vmaftune/benchmark.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_benchmark.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/adr/0424-vmaf-tune-corpus-benchmark.md, docs/research/0106-vmaf-tune-corpus-benchmark.md.
Rebase impact: low. The new command is a read-only consumer of the existing Phase-A JSONL row schema. It does not change CORPUS_ROW_KEYS, libvmaf public API, FFmpeg patches, or encode/scoring behaviour.
Invariant to preserve on rebase: vmaf-tune benchmark must stay offline. It reads corpus rows and reports matched-quality encoder summaries; live encode comparisons remain owned by vmaf-tune compare.
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_benchmark.py -q
2026-05-14 — vmaf-tune auto winner selection¶
Files touched: tools/vmaf-tune/src/vmaftune/auto.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, docs/usage/vmaf-tune.md, tools/vmaf-tune/AGENTS.md.
Rebase impact: low. The Phase F JSON schema now includes metadata.winner and a per-cell selected boolean, but corpus rows and libvmaf public APIs are unchanged.
Invariant to preserve on rebase: keep metadata.winner aligned with exactly one cells[].selected == true row. The selector remains quality/budget ordered per ADR-0428.
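A small checker for this invariant might look like the sketch below. It assumes each cell carries an identifier under a hypothetical `id` key, and that `metadata.winner` is either that identifier or an embedded cell object. Both assumptions need adapting to the real Phase F schema.

```python
def check_winner_invariant(report: dict, key: str = "id") -> None:
    """Raise ValueError unless metadata.winner matches exactly one selected cell.

    `key` is a hypothetical cell-identifier field, not a confirmed schema name.
    """
    selected = [c for c in report.get("cells", []) if c.get("selected") is True]
    if len(selected) != 1:
        raise ValueError(f"expected exactly one selected cell, found {len(selected)}")
    winner = report.get("metadata", {}).get("winner")
    if winner is None:
        raise ValueError("metadata.winner is missing")
    # Accept either an embedded cell object or a bare identifier.
    winner_id = winner.get(key) if isinstance(winner, dict) else winner
    if winner_id != selected[0].get(key):
        raise ValueError("metadata.winner does not match the selected cell")
```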
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_auto_short_circuits.py -q
2026-05-14 — testdata bench_perf portability¶
Files touched: testdata/bench_perf.py, testdata/test_bench_perf.py, docs/benchmarks.md.
Rebase impact: low. The performance JSON snapshots are unchanged; only the FFmpeg lavfi benchmark harness gains configuration and hardware-free smoke surfaces.
Invariant to preserve on rebase: bench_perf.py must not reintroduce mandatory machine-local paths. The MP4 decode test remains opt-in through --bbb-mp4-ref / VMAF_BBB_MP4_REF, while --require-all is the strict mode.
Smoke-test after rebase:
PYTHONPATH=. .venv/bin/python -m pytest testdata/test_bench_perf.py -q
.venv/bin/python testdata/bench_perf.py --list-tests
.venv/bin/python testdata/bench_perf.py --backend cpu --dry-run
2026-05-14 — CHUG HDR Corpus Ingestion + Feature Materialisation¶
Files touched: ai/scripts/chug_to_corpus_jsonl.py, ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, scripts/dev/training_discovery_report.py, docs/ai/chug-ingestion.md, docs/ai/mos-corpora.md, docs/research/0101-training-discovery-synthesis-2026-05-14.md, docs/adr/0426-chug-hdr-corpus-ingestion.md, docs/adr/0427-chug-hdr-feature-materialisation.md, and ai/AGENTS.md.
Rebase impact: low to medium. The new CHUG adapter is fork-local and local-only, but it intentionally widens the MOS-corpus family with an HDR dataset and optional chug_* JSONL metadata fields.
Invariant to preserve on rebase: CHUG media and labels stay out of git. The adapter stores CHUG's raw mos_j as mos_raw_0_100 and maps the trainer-facing mos to [1, 5]; do not silently change that scale. The feature materialiser pairs distorted rows to the matching chug_content_name reference and scales distorted clips to reference geometry before extraction. Keep the license posture non-commercial/share-alike until the README/license mismatch is clarified upstream.
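The note fixes the two endpoints of the MOS scale but not the exact mapping. The sketch below assumes an affine map (0 to 1, 100 to 5) purely for illustration. Confirm it against `chug_to_corpus_jsonl.py` before relying on it. Both function names are hypothetical.

```python
def chug_mos_to_trainer_scale(mos_raw_0_100: float) -> float:
    """Map CHUG's raw 0-100 mos_j onto the trainer-facing [1, 5] range.

    ASSUMPTION: affine mapping 0 -> 1 and 100 -> 5; the adapter may differ.
    """
    if not 0.0 <= mos_raw_0_100 <= 100.0:
        raise ValueError(f"mos_raw_0_100 out of range: {mos_raw_0_100}")
    return 1.0 + 4.0 * mos_raw_0_100 / 100.0


def chug_mos_fields(mos_j: float) -> dict:
    """Keep the raw label alongside the rescaled one, as the invariant requires."""
    return {"mos_raw_0_100": float(mos_j), "mos": chug_mos_to_trainer_scale(mos_j)}
```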
Smoke-test after rebase:
PYTHONPATH=ai/src .venv/bin/python -m pytest ai/tests/test_chug.py -q
python3 scripts/dev/training_discovery_report.py --output /tmp/training_discovery_report.md
2026-05-14 — vmaf-tune ladder Uncertainty CLI Wiring¶
Files touched: tools/vmaf-tune/src/vmaftune/ladder.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_ladder.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, and docs/usage/vmaf-tune-ladder.md.
Rebase impact: low. The normal point-estimate ladder path is unchanged. When --with-uncertainty is set, corpus rows that contain a vmaf_interval object now flow through apply_uncertainty_recipe() before select_knees(). Rows without intervals use the active wide_interval_min_width as a conservative centred fallback interval so point-only corpora still participate in midpoint insertion.
Invariant to preserve on rebase: the uncertainty transform stays post-hull and pre-knee-selection. Do not run it before convex_hull(), or synthetic midpoint rungs can distort the Pareto filtering stage.
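The ordering constraint and the point-only fallback can be sketched as follows. The pipeline stages are passed in as callables so that no real `ladder.py` signatures are assumed. The `vmaf` row key and the `(low, high)` interval tuple are also illustrative assumptions.

```python
from typing import Callable


def with_interval(row: dict, wide_interval_min_width: float) -> dict:
    """Return a copy of `row` that always carries a vmaf_interval.

    Rows with an interval pass through unchanged; point-only rows get a
    centred interval of the active minimum width (the conservative fallback).
    """
    if row.get("vmaf_interval") is not None:
        return dict(row)
    half = wide_interval_min_width / 2.0
    return {**row, "vmaf_interval": (row["vmaf"] - half, row["vmaf"] + half)}


def ladder_with_uncertainty(rows: list, wide_interval_min_width: float,
                            convex_hull: Callable, apply_uncertainty: Callable,
                            select_knees: Callable):
    # 1. Pareto filtering sees only measured rows, never synthetic midpoints.
    hull = convex_hull(rows)
    # 2. Uncertainty runs post-hull, so inserted midpoint rungs cannot
    #    distort the Pareto stage.
    hull = apply_uncertainty([with_interval(r, wide_interval_min_width) for r in hull])
    # 3. Knee selection runs last.
    return select_knees(hull)
```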
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_ladder.py \
tools/vmaf-tune/tests/test_ladder_uncertainty.py -q
2026-05-14 — vmaf-tune libaom-av1 saliency ROI Dispatch¶
Files touched: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/codec_adapters/libaom.py, tools/vmaf-tune/src/vmaftune/cli.py, vmaf-tune saliency tests, and the matching usage docs/state/changelog notes.
Rebase impact: low. The change only adds libaom-av1 to the existing saliency ROI dispatch table and uses the FFmpeg patch stack's top-level -qpfile <path> option. It does not alter scoring, predictor inputs, model files, or libvmaf public ABI.
Invariant to preserve on rebase: libaom-av1 saliency uses the shared x264-style 16x16 qpfile writer, but passes it as separate argv tokens ("-qpfile", path). Keep ephemeral cleanup aware of both key=path params and -qpfile path pairs.
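Cleanup has to recognise both argv shapes. A minimal sketch of that scan follows. `ephemeral_paths` is a hypothetical helper, not the `saliency.py` function, and the argv lists in the test are illustrative only.

```python
def ephemeral_paths(argv: list[str], ephemeral: set[str]) -> list[str]:
    """Return the ephemeral files referenced by an encoder argv, in argv order.

    Two shapes are recognised:
      * a separate ("-qpfile", path) token pair, as used for libaom-av1;
      * key=path entries inside a colon-separated params token,
        e.g. "qpfile=/tmp/roi.txt:aq-mode=2".
    Only paths in `ephemeral` (files the caller created) are returned.
    Colon splitting means Windows drive-letter paths are not handled here.
    """
    found: list[str] = []
    for i, tok in enumerate(argv):
        if tok == "-qpfile" and i + 1 < len(argv):
            candidates = [argv[i + 1]]
        else:
            candidates = [part.split("=", 1)[1] for part in tok.split(":") if "=" in part]
        found.extend(c for c in candidates if c in ephemeral)
    return found
```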
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_saliency.py \
tools/vmaf-tune/tests/test_saliency_roi_adapters.py \
tools/vmaf-tune/tests/test_saliency_roi_codec.py \
-q
2026-05-14 — Metal Dispatch Support Table¶
Files touched: core/src/metal/dispatch_strategy.c, core/src/metal/dispatch_strategy.h, core/test/test_metal_smoke.c, core/src/metal/AGENTS.md, docs/backends/metal/index.md.
Rebase impact: low. The dispatch predicate now reflects the Metal kernels already compiled into the backend; it does not change kernel math, picture layout, metallib embedding, or public libvmaf_metal.h symbols.
Invariant to preserve on rebase: every newly-landed Metal extractor must append both its extractor name and its provided feature keys to g_metal_features. Unknown features, NULL contexts, and NULL names must keep returning 0.
Smoke-test after rebase:
meson setup build-metal -Denable_metal=enabled
ninja -C build-metal test_metal_smoke
meson test -C build-metal test_metal_smoke
2026-05-14 — Tiny-AI Bisect Cache Real-Feature Bridge¶
Files touched: ai/scripts/build_bisect_cache.py, ai/tests/test_build_bisect_cache.py, ai/testdata/bisect/README.md, docs/ai/bisect-model-quality.md, ai/AGENTS.md.
Rebase impact: low. The committed nightly cache remains generated from the existing deterministic synthetic seeds unless callers pass --source-features. The real-feature path only broadens the generator to materialise an operator-provided parquet into the same features.parquet + linear-ONNX timeline layout.
Invariant to preserve on rebase: the output feature order stays adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2, and the output target column stays named mos even when the source uses dmos, target, or score.
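The column contract amounts to a fixed projection plus a target rename, sketched below. The alias precedence (`mos` first, then `dmos`, `target`, `score`) is an assumption of this sketch. The value is copied unchanged and only the column name is normalised.

```python
FEATURE_ORDER = ("adm2", "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3", "motion2")
# ASSUMPTION: precedence when a source row carries more than one target column.
TARGET_ALIASES = ("mos", "dmos", "target", "score")


def normalise_row(row: dict) -> dict:
    """Project a source row onto the fixed feature order plus a column named 'mos'."""
    for alias in TARGET_ALIASES:
        if alias in row:
            target = row[alias]
            break
    else:
        raise KeyError(f"no target column; expected one of {TARGET_ALIASES}")
    # dicts preserve insertion order, so the output keys follow FEATURE_ORDER.
    out = {name: float(row[name]) for name in FEATURE_ORDER}
    out["mos"] = float(target)  # renamed only; the value is not transformed
    return out
```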
Smoke-test after rebase:
PYTHONPATH=ai/src .venv/bin/python -m pytest \
ai/tests/test_build_bisect_cache.py \
ai/tests/test_bisect_model_quality.py -q
PYTHONPATH=ai/src .venv/bin/python ai/scripts/build_bisect_cache.py --check
2026-05-14 — Vulkan VIF Manual Int64 Subgroup Reduction¶
Files touched: core/src/feature/vulkan/shaders/vif.comp, core/src/vulkan/AGENTS.md, docs/adr/0269-vif-ciede-precise-step-a.md, docs/research/0108-vulkan-vif-int64-subgroup-reduction-2026-05-14.md, docs/state.md.
Rebase impact: medium. The shader semantic change is intentionally small, but it is load-bearing for Vulkan API-1.4 parity on NVIDIA. Do not simplify the Phase-4 VIF accumulator path back to subgroupAdd(int64_t) when resolving upstream shader conflicts.
Invariant to preserve on rebase: vif.comp must keep GL_KHR_shader_subgroup_shuffle and the manual reduce_i64_subgroup(...) helper for all seven int64 accumulator fields. The helper exists because NVIDIA RTX 4090 + driver 595.71.05 produced non-deterministic integer_vif_scale2 output through subgroupAdd(int64_t) at Vulkan API 1.4.
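A host-side model of a shuffle-based int64 subgroup reduction can help when reviewing shader conflicts. The sketch below assumes an XOR-butterfly pattern. It does not model how the shader moves a 64-bit value between lanes, and the fork's `reduce_i64_subgroup(...)` helper may be structured differently. After log2(n) steps every lane holds the same wrapped total.

```python
INT64_MIN = -(1 << 63)


def wrap_i64(value: int) -> int:
    """Wrap an arbitrary Python int to two's-complement int64, like GLSL int64_t."""
    return ((value + (1 << 63)) % (1 << 64)) - (1 << 63)


def reduce_i64_subgroup_sim(lanes: list[int]) -> list[int]:
    """Simulate an XOR-butterfly sum across a power-of-two subgroup."""
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("subgroup size must be a non-zero power of two")
    vals = [wrap_i64(v) for v in lanes]
    offset = 1
    while offset < n:
        # Each lane reads its partner's value from the previous step (what a
        # shuffle-xor by `offset` would return) and adds it to its own.
        vals = [wrap_i64(vals[i] + vals[i ^ offset]) for i in range(n)]
        offset <<= 1
    return vals
```

Integer addition modulo 2^64 is associative and commutative, so every lane arrives at a bit-identical result regardless of pairing order.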
Smoke-test after rebase:
glslc --target-env=vulkan1.3 -O \
core/src/feature/vulkan/shaders/vif.comp -o /tmp/vif.spv
ninja -C build-vulkan-int64 tools/vmaf
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary "$PWD/build-vulkan-int64/tools/vmaf" \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --feature vif --backend vulkan \
--device 0 --places 4
2026-05-14 — Saliency RGB ingest + SSIMULACRA2 public docs¶
Files touched: tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/tests/test_saliency.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/metrics/ssimulacra2.md, docs/metrics/features.md, docs/adr/0430-saliency-rgb-ingest-and-ssimulacra2-docs.md, docs/research/0112-public-doc-gap-batch-2026-05-14.md.
Rebase impact: low. The changed saliency preprocessing is fork-local and keeps the same ONNX model input shape ([1, 3, H, W]).
Invariant to preserve on rebase: compute_saliency_map() must keep Y/U/V yuv420p ingest, BT.709 limited-range YUV-to-RGB conversion, and ImageNet normalisation before invoking saliency_student_v1. The old luma-replicated RGB path is no longer the user-facing contract.
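The per-pixel preprocessing can be sketched with the standard 8-bit BT.709 limited-range coefficients and the common torchvision ImageNet mean/std. The sketch also assumes nearest-neighbour chroma lookup for yuv420p, which may differ from the upsampling `compute_saliency_map()` actually uses. All names are hypothetical.

```python
# 8-bit BT.709 limited-range YCbCr -> RGB coefficients.
Y_GAIN = 1.164383   # 255 / 219
CR_TO_R = 1.792741
CB_TO_G = 0.213249
CR_TO_G = 0.532909
CB_TO_B = 2.112402

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def chroma_index_420(x: int, y: int, chroma_stride: int) -> int:
    """Index of the U/V sample covering luma pixel (x, y) (nearest neighbour)."""
    return (y // 2) * chroma_stride + (x // 2)


def yuv709_limited_to_rgb01(y: int, u: int, v: int) -> tuple:
    """Convert one limited-range sample triple to RGB in [0, 1]."""
    yy = (y - 16) * Y_GAIN
    cb, cr = u - 128, v - 128
    r = yy + CR_TO_R * cr
    g = yy - CB_TO_G * cb - CR_TO_G * cr
    b = yy + CB_TO_B * cb
    return tuple(min(max(c / 255.0, 0.0), 1.0) for c in (r, g, b))


def imagenet_normalise(rgb: tuple) -> tuple:
    """Apply per-channel (x - mean) / std, as ImageNet-trained models expect."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))
```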
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_saliency.py -q
scripts/docs/concat-adr-index.sh --check
2026-05-14 — test_score_pooled_eagain Sanitizer Deselect Retired¶
Files touched: .github/workflows/tests-and-quality-gates.yml, core/src/feature/x86/adm_avx2.c, docs/state.md.
Rebase impact: low. The sanitizer workflow now dispatches test_score_pooled_eagain again in ASan, UBSan, and TSan lanes. The AVX2 ADM helper keeps scalar-path parity for direct-LUT-range values: temp < 32768 returns temp and shift 0, while larger values still use the rounded 15-bit reduction. The remaining T-SANITIZER-DEFECTS-REVEALED-758 exclusions stay in place.
Invariant to preserve on rebase: sanitizer deselect regexes should contain only tests with an active state row. Do not re-add test_score_pooled_eagain unless a fresh sanitizer report is captured and tracked. Do not call __builtin_clz() for ADM direct-LUT values below 32768.
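The direct-LUT rule is easier to review against a scalar model. The Python sketch below is one plausible reading of "temp < 32768 returns temp and shift 0; larger values use the rounded 15-bit reduction". The rounding-carry renormalisation is an assumption, not taken from `adm_avx2.c`. The early return matters because GCC documents `__builtin_clz(0)` as undefined.

```python
def adm_reduce15(temp: int) -> tuple[int, int]:
    """Hypothetical scalar model: return (value, shift) for a uint32 temp."""
    if not 0 <= temp < 1 << 32:
        raise ValueError("temp must be a uint32 value")
    if temp < 32768:
        # Direct-LUT range: no clz is evaluated, so temp == 0 is safe.
        return temp, 0
    # For non-zero uint32, bit_length() == 32 - clz(temp).
    shift = temp.bit_length() - 15
    reduced = (temp + (1 << (shift - 1))) >> shift  # round to nearest
    if reduced >= 1 << 15:  # ASSUMPTION: renormalise if rounding carried into bit 15
        reduced >>= 1
        shift += 1
    return reduced, shift
```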
Smoke-test after rebase:
ASAN_OPTIONS=detect_leaks=1:halt_on_error=1 \
./core/build-asan-score/test/test_score_pooled_eagain
UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 \
./core/build-ubsan-score/test/test_score_pooled_eagain
TSAN_OPTIONS=halt_on_error=1 \
./core/build-tsan-score/test/test_score_pooled_eagain
2026-05-14 — test_feature_collector Sanitizer Deselect Retired¶
Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md.
Rebase impact: low. The sanitizer workflow now dispatches test_feature_collector again in ASan, UBSan, and TSan lanes. The remaining T-SANITIZER-DEFECTS-REVEALED-758 exclusions stay in place.
Invariant to preserve on rebase: sanitizer deselect regexes should contain only tests with an active state row. Do not re-add test_feature_collector unless a fresh sanitizer report is captured and tracked.
Smoke-test after rebase:
ASAN_OPTIONS=detect_leaks=1:halt_on_error=1 \
./core/build-asan-score/test/test_feature_collector
UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 \
./core/build-ubsan-score/test/test_feature_collector
TSAN_OPTIONS=halt_on_error=1 \
./core/build-tsan-score/test/test_feature_collector
2026-05-14 — test_pic_preallocation Sanitizer Deselect Retired¶
Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md.
Rebase impact: low. The sanitizer workflow now dispatches test_pic_preallocation again in ASan, UBSan, and TSan lanes. The remaining T-SANITIZER-DEFECTS-REVEALED-758 exclusions stay in place.
Invariant to preserve on rebase: sanitizer deselect regexes should contain only tests with an active state row. Do not re-add test_pic_preallocation unless a fresh sanitizer report is captured and tracked.
Smoke-test after rebase:
ASAN_OPTIONS=detect_leaks=1:halt_on_error=1 \
./core/build-asan-score/test/test_pic_preallocation
UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 \
./core/build-ubsan-score/test/test_pic_preallocation
TSAN_OPTIONS=halt_on_error=1 \
./core/build-tsan-score/test/test_pic_preallocation
2026-05-14 — vmaf-tune libx265 encoder-stats parser¶
Files touched: tools/vmaf-tune/src/vmaftune/encoder_stats.py, tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py, tools/vmaf-tune/tests/test_encoder_stats_parser_x264.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md.
Rebase impact: low. The corpus row schema stays at the existing v3 ten-column enc_internal_* contract; this only teaches the parser x265's pass-1 aliases (q-aq, icu, pcu, scu) and fractional CTU counts.
Invariant to preserve on rebase: x264 imb / pmb / smb and x265 icu / pcu / scu must continue to feed the same intra / predicted / skip ratio columns. Do not split the public corpus schema per codec.
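The "same columns, different aliases" rule can be sketched as a single alias table. The `enc_internal_*_ratio` names here are hypothetical placeholders for the real v3 column names in `encoder_stats.py`. Normalising by the sum of the three counts is also an assumption of this sketch. Counts are floats because x265 reports fractional CTU counts.

```python
# One canonical kind per ratio column; each codec contributes its own alias.
_ALIASES = {
    "intra": ("imb", "icu"),      # x264 imb, x265 icu
    "predicted": ("pmb", "pcu"),  # x264 pmb, x265 pcu
    "skip": ("smb", "scu"),       # x264 smb, x265 scu
}


def block_type_ratios(counts: dict[str, float]) -> dict[str, float]:
    """Fold codec-specific block counts into shared ratio columns (hypothetical names)."""
    totals = {kind: sum(float(counts.get(a, 0.0)) for a in aliases)
              for kind, aliases in _ALIASES.items()}
    total = sum(totals.values())
    if total <= 0:
        return {f"enc_internal_{k}_ratio": 0.0 for k in _ALIASES}
    return {f"enc_internal_{k}_ratio": v / total for k, v in totals.items()}
```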
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_encoder_stats_parser_x264.py -q
2026-05-14 — vmaf-tune predictor directory-corpus orchestration¶
Files touched: tools/vmaf-tune/src/vmaftune/predictor_train.py, tools/vmaf-tune/tests/test_predictor_train.py, docs/usage/vmaf-tune.md.
Rebase impact: low. The loader already supported recursive JSONL directories; this change removes stale is_file() gates in the trainer orchestration so CLI/API callers get the documented real-corpus path. Model format, feature order, corpus row schema, and shipped model bytes are unchanged.
Invariant to preserve on rebase: --corpus <directory> and train_all_codecs(corpus_path=<directory>) must call the same load_corpus() path as single-file inputs. Do not reintroduce file-only guards above the loader.
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_predictor_train.py -q
2026-05-15 — vmaf-tune sidecar CLI wiring¶
Files touched: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_cli_sidecar.py, tools/vmaf-tune/tests/test_sidecar.py, tools/vmaf-tune/AGENTS.md, docs/ai/local-sidecar-training.md, docs/usage/vmaf-tune.md, docs/research/0122-vmaf-tune-sidecar-cli-2026-05-15.md.
Rebase impact: low. This adds one top-level vmaf-tune sidecar subcommand group and does not change corpus row schemas, predictor ONNX schemas, codec adapters, libvmaf public APIs, or FFmpeg patches.
Invariant to preserve on rebase: the CLI must remain a thin wrapper over vmaftune.sidecar.SidecarPredictor. It must keep the same cache layout (<cache>/<predictor-version>/<codec>/state.json), random host UUID posture, and ShotFeatures feature names as the Python API. Do not add upload, hostname-derived IDs, or predictor mutation to this surface.
Smoke-test after rebase:
cd tools/vmaf-tune && ../../.venv/bin/python -m pytest \
tests/test_cli_sidecar.py tests/test_sidecar.py -q
2026-05-14 — vmaf-tune Phase-B Bisect Sample Clips¶
Files touched: tools/vmaf-tune/src/vmaftune/bisect.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_bisect.py, tools/vmaf-tune/tests/test_compare.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-bisect.md, docs/research/0109-vmaf-tune-bisect-sample-clip-2026-05-14.md.
Rebase impact: low. The public addition is one vmaf-tune compare flag and one Python bisect argument. It does not change the compare report schema, codec adapter registry, libvmaf public API, or FFmpeg patch stack.
Invariant to preserve on rebase: sample_clip_seconds in bisect_target_vmaf, make_bisect_predicate, and vmaf-tune compare must compute one centre-anchored sample window and thread it into both EncodeRequest (sample_clip_start_s / sample_clip_seconds) and ScoreRequest (frame_skip_ref / frame_cnt). Bitrate must be normalised against the sample duration when sample-clip mode is active. An unknown duration, a non-positive framerate, or a sample window that is not strictly shorter than the source falls back to full-source mode.
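The window arithmetic can be sketched in a few lines. `SampleWindow`, `sample_window`, and `bitrate_kbps` are hypothetical names. Using `round()` to convert seconds to frame counts is an assumption, and the real code may floor instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleWindow:
    start_s: float        # feeds EncodeRequest.sample_clip_start_s
    seconds: float        # feeds EncodeRequest.sample_clip_seconds
    frame_skip_ref: int   # feeds ScoreRequest.frame_skip_ref
    frame_cnt: int        # feeds ScoreRequest.frame_cnt


def sample_window(duration_s: Optional[float], fps: float,
                  sample_clip_seconds: float) -> Optional[SampleWindow]:
    """Compute one centre-anchored window, or None for full-source mode."""
    if sample_clip_seconds <= 0:
        raise ValueError("sample_clip_seconds must be positive")
    # Fallbacks: unknown duration, bad framerate, or a sample that is not
    # strictly shorter than the source all mean "use the whole source".
    if duration_s is None or fps <= 0 or sample_clip_seconds >= duration_s:
        return None
    start = (duration_s - sample_clip_seconds) / 2.0
    return SampleWindow(start_s=start, seconds=sample_clip_seconds,
                        frame_skip_ref=round(start * fps),
                        frame_cnt=round(sample_clip_seconds * fps))


def bitrate_kbps(size_bytes: int, window: Optional[SampleWindow], duration_s: float) -> float:
    """Normalise bitrate against the sample duration when sampling is active."""
    seconds = window.seconds if window is not None else duration_s
    return size_bytes * 8 / 1000 / seconds
```

Computing the window once and passing the same object to both requests is what keeps the encoded segment and the scored frames aligned.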
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_bisect.py \
tools/vmaf-tune/tests/test_compare.py -q
2026-05-14 — vmaf-tune ladder spacing alias fix¶
Files touched: tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/src/vmaftune/ladder.py, tools/vmaf-tune/tests/test_ladder.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md.
Rebase impact: low. This only keeps the Phase-E CLI choices and library spacing modes aligned. The ladder hull math, default sampler, manifest schema, and encode/scoring behaviour are unchanged.
Invariant to preserve on rebase: argparse choices for vmaf-tune ladder --spacing must stay in lockstep with ladder.select_knees(). vmaf is the documented perceptual spacing mode; uniform remains a backwards-compatible alias for that same mode.
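One way to keep the CLI and the library from drifting is to derive argparse `choices` from the same tuple the library validates against. In the sketch below, the table layout is hypothetical, and any other real spacing modes are omitted.

```python
import argparse

# Single source of truth shared by the CLI and the library (hypothetical
# layout; other real spacing modes are omitted from this sketch).
CANONICAL_SPACING = ("vmaf",)
SPACING_ALIASES = {"uniform": "vmaf"}  # backwards-compatible alias
SPACING_CHOICES = CANONICAL_SPACING + tuple(SPACING_ALIASES)


def resolve_spacing(mode: str) -> str:
    """Map an accepted CLI value onto the canonical library mode."""
    mode = SPACING_ALIASES.get(mode, mode)
    if mode not in CANONICAL_SPACING:
        raise ValueError(f"unknown spacing mode: {mode!r}")
    return mode


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vmaf-tune ladder")
    # choices derive from the shared tuple, so CLI and library cannot diverge.
    parser.add_argument("--spacing", choices=SPACING_CHOICES, default="vmaf")
    return parser
```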
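Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_ladder.py -q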
2026-05-15 — Tiny-AI real-weight limitation docs¶
Files touched: docs/ai/roadmap.md, docs/ai/models/fastdvdnet_pre.md, docs/metrics/features.md, core/src/dnn/AGENTS.md, core/src/feature/AGENTS.md.
Rebase impact: low. This is a documentation / invariant-note cleanup that aligns user-facing docs with the already-shipped smoke: false FastDVDnet and TransNet V2 registry entries. Model bytes, registry schema, extractor I/O names, and runtime behaviour are unchanged.
Invariant to preserve on rebase: do not reintroduce placeholder-only wording for fastdvdnet_pre or transnet_v2. The remaining follow-ups are the FFmpeg temporal-filter consumer, luma-native FastDVDnet retrain, per-shot CRF aggregation, and true RGB / bilinear TransNet thumbnails.
Smoke-test after rebase:
rg -n "real upstream weights are tracked|ADR-0246|0253-fastdvdnet" \
docs/ai docs/metrics core/src/dnn/AGENTS.md core/src/feature/AGENTS.md
2026-05-15 — vmaf-roi High-Bit-Depth Input¶
Files touched: core/tools/vmaf_roi.c, core/tools/test/meson.build, core/tools/test/test_vmaf_roi_high_bitdepth.sh, core/tools/AGENTS.md, docs/usage/vmaf-roi.md, docs/research/0123-vmaf-roi-high-bitdepth-2026-05-15.md.
Rebase impact: low. This extends an existing CLI flag and does not change libvmaf public APIs, encoder sidecar schemas, or FFmpeg patches.
Invariant to preserve on rebase: vmaf-roi --bitdepth 10|12|16 must seek using full planar YUV frame bytes, including chroma planes and 16-bit sample containers, then downscale luma to the existing luma8 saliency-model contract. Unsupported depths such as 9-bit remain rejected.
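The luma8 contract for high-bit-depth input reduces to reading 16-bit containers and shifting down to 8 bits. The sketch assumes little-endian containers and a truncating shift. `vmaf_roi.c` may round instead, and `luma8_from_le16` is a hypothetical name.

```python
import struct


def luma8_from_le16(buf: bytes, bitdepth: int) -> list[int]:
    """Convert a little-endian 16-bit-container luma plane to 8-bit samples.

    ASSUMPTION: truncating right shift by (bitdepth - 8); no rounding.
    """
    if bitdepth not in (10, 12, 16):
        raise ValueError(f"unsupported bitdepth: {bitdepth}")
    count = len(buf) // 2
    samples = struct.unpack(f"<{count}H", buf[: count * 2])
    shift = bitdepth - 8
    return [s >> shift for s in samples]
```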
Smoke-test after rebase:
2026-05-15 — vmaf-perShot 4:2:2 / 4:4:4 Input¶
Files touched: core/tools/vmaf_per_shot.c, core/tools/test/test_vmaf_per_shot.sh, core/tools/AGENTS.md, docs/usage/vmaf-perShot.md, docs/research/0124-vmaf-pershot-422-444-2026-05-15.md.
Rebase impact: low. This extends one existing CLI option and does not change the CSV / JSON plan schema, libvmaf public APIs, or FFmpeg patch stack.
Invariant to preserve on rebase: vmaf-perShot remains luma-only for detection and CRF prediction, but --pixel_format 420|422|444 must count the selected planar chroma layout when skipping to the next frame. --bitdepth remains limited to 8|10|12|16.
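The per-frame skip size for planar input follows from the chroma layout and the sample container width. The same arithmetic applies to the vmaf-roi seek above. In the sketch below, `frame_bytes` is a hypothetical name, and rounding odd dimensions up is an assumption for subsampled chroma.

```python
def frame_bytes(width: int, height: int, pixel_format: int, bitdepth: int) -> int:
    """Bytes per planar YUV frame: luma plus two chroma planes."""
    if bitdepth not in (8, 10, 12, 16):
        raise ValueError(f"unsupported bitdepth: {bitdepth}")
    bytes_per_sample = 1 if bitdepth == 8 else 2  # >8-bit uses 16-bit containers
    if pixel_format == 420:
        cw, ch = (width + 1) // 2, (height + 1) // 2
    elif pixel_format == 422:
        cw, ch = (width + 1) // 2, height
    elif pixel_format == 444:
        cw, ch = width, height
    else:
        raise ValueError(f"unsupported pixel_format: {pixel_format}")
    return (width * height + 2 * cw * ch) * bytes_per_sample
```

Skipping only `width * height` bytes would silently read chroma as luma from the second frame onward, which is the failure mode the invariant guards against.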
Smoke-test after rebase:
2026-05-15 — CUDA psnr_hvs DCT Parallelisation¶
Files touched: core/src/feature/cuda/integer_psnr_hvs/psnr_hvs_score.cu, core/src/feature/cuda/AGENTS.md, docs/backends/cuda/overview.md, docs/research/0130-cuda-psnr-hvs-dct-parallel-2026-05-15.md.
Rebase impact: low. The host lifecycle, feature names, CLI surface, and public APIs are unchanged. This is a CUDA-kernel scheduling optimisation for an existing extractor.
Invariant to preserve on rebase: only the integer 8x8 DCT passes run across the first eight CUDA threads. Float means, variances, masking, and masked-error accumulation stay thread-0 serial in CPU scan order; do not convert them to warp/block reductions without a separate numeric-contract ADR and cross-backend tolerance update.
Smoke-test after rebase:
python3 scripts/ci/cross_backend_vif_diff.py \
--vmaf-binary "$PWD/core/build-cuda/tools/vmaf" \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --feature psnr_hvs --backend cuda --places 3
2026-05-15 — test_cli_parse Sanitizer Deselect Retired¶
Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md, changelog.d/fixed/sanitizer-cli-parse.md.
Rebase impact: low. This only narrows the ADR-0347 sanitizer deselect regexes after re-verifying test_cli_parse on current master; the CLI parser behaviour and public options are unchanged.
Invariant to preserve on rebase: keep test_cli_parse out of the ASan / UBSan / TSan EXCLUDE regexes unless a new sanitizer report is captured and tracked in docs/state.md.
Smoke-test after rebase:
ASAN_OPTIONS=detect_leaks=1:halt_on_error=1:abort_on_error=1:print_summary=1 ./core/build-asan-cli/test/test_cli_parse
UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ./core/build-ubsan-cli/test/test_cli_parse
TSAN_OPTIONS=halt_on_error=1 ./core/build-tsan-cli/test/test_cli_parse
2026-05-15 — test_predict Sanitizer Deselect Retired¶
Files touched: .github/workflows/tests-and-quality-gates.yml, docs/state.md, changelog.d/fixed/sanitizer-predict.md.
Rebase impact: low. This only narrows the ADR-0347 sanitizer deselect regexes after re-verifying test_predict on current master; prediction logic, model loading, and output scores are unchanged.
Invariant to preserve on rebase: keep test_predict out of the ASan / UBSan / TSan EXCLUDE regexes unless a new sanitizer report is captured and tracked in docs/state.md.
Smoke-test after rebase:
ASAN_OPTIONS=detect_leaks=1:halt_on_error=1:abort_on_error=1:print_summary=1 ./core/build-asan-predict/test/test_predict
UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ./core/build-ubsan-predict/test/test_predict
TSAN_OPTIONS=halt_on_error=1 ./core/build-tsan-predict/test/test_predict
2026-05-15 — vmaf-tune HDR Dispatch Coverage¶
Files touched: tools/vmaf-tune/src/vmaftune/hdr.py, tools/vmaf-tune/tests/test_hdr.py, tools/vmaf-tune/tests/test_auto_short_circuits.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-hdr-and-sampling.md, docs/research/0126-vmaf-tune-hdr-dispatch-coverage-2026-05-15.md.
Rebase impact: low. This extends the existing ADR-0300 dispatch table only; it does not change corpus schema, codec-adapter quality knobs, or HDR model lookup.
Invariant to preserve on rebase: hdr_codec_args() remains the single HDR argv contract. Hardware HEVC rows should emit p010le + main10 plus global color tags; hardware AV1 rows should emit p010le plus global color tags; codec-private mastering-display / MaxCLL flags stay limited to verified families.
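The argv shape described above can be sketched as a two-row table. The encoder family sets and the PQ/BT.2020 tag values are assumptions chosen for illustration, not the real `hdr.py` rows. The FFmpeg options themselves (`-pix_fmt`, `-profile:v`, `-color_primaries`, `-color_trc`, `-colorspace`) are standard.

```python
# Hypothetical family tables; the real rows live in vmaftune/hdr.py.
HW_HEVC_ENCODERS = frozenset({"hevc_nvenc", "hevc_qsv", "hevc_amf", "hevc_vaapi"})
HW_AV1_ENCODERS = frozenset({"av1_nvenc", "av1_qsv", "av1_amf", "av1_vaapi"})

# Global FFmpeg color tags; PQ transfer with BT.2020 primaries assumed.
GLOBAL_COLOR_TAGS = (
    "-color_primaries", "bt2020",
    "-color_trc", "smpte2084",
    "-colorspace", "bt2020nc",
)


def hdr_codec_args_sketch(encoder: str) -> list[str]:
    """Return HDR argv for a hardware encoder in the shape the invariant describes."""
    if encoder in HW_HEVC_ENCODERS:
        # 10-bit surface + Main10 profile + global color tags.
        return ["-pix_fmt", "p010le", "-profile:v", "main10", *GLOBAL_COLOR_TAGS]
    if encoder in HW_AV1_ENCODERS:
        # 10-bit surface + global color tags; no profile flag.
        return ["-pix_fmt", "p010le", *GLOBAL_COLOR_TAGS]
    # Codec-private mastering-display / MaxCLL flags belong only to verified
    # families, which this sketch deliberately does not model.
    raise ValueError(f"{encoder!r} is not in this sketch's hardware tables")
```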
Smoke-test after rebase:
PYTHONPATH=tools/vmaf-tune/src .venv/bin/python -m pytest \
tools/vmaf-tune/tests/test_hdr.py \
tools/vmaf-tune/tests/test_auto_short_circuits.py -q
2026-05-15 — Docs Pages Strict-Anchor Repair¶
Files touched: docs/ai/quantization.md, docs/api/gpu.md, docs/metrics/ssimulacra2.md, docs/usage/vmaf-tune.md, changelog.d/fixed/docs-pages-anchor-strict.md.
Rebase impact: low. This only corrects MkDocs-rendered internal anchors and adds the missing docs/api/gpu.md HIP / Metal section targets consumed by docs/api/index.md.
Invariant to preserve on rebase: mkdocs build --strict must stay green on master before a docs-affecting PR is merged; do not weaken validation.links.anchors: warn to hide anchor drift.
Smoke-test after rebase:
mkdocs build --strict
2026-05-15 — CHUG FULL_FEATURES Parquet Metadata Enrichment¶
Files touched: ai/scripts/enrich_k150k_parquet_metadata.py, ai/tests/test_enrich_k150k_parquet_metadata.py, ai/AGENTS.md, docs/ai/chug-ingestion.md, and docs/ai/datasets/k150k.md.
Rebase impact: low. This adds a recovery utility for local FULL_FEATURES parquet jobs that predate --metadata-jsonl; it does not change the extraction schema or feature column order.
Invariant to preserve on rebase: the enrichment utility matches rows by clip_name, fills missing metadata cells by default, writes parquet atomically, and leaves feature/MOS columns unchanged unless the operator passes --overwrite-metadata.
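The fill-versus-overwrite rule and the atomic write can be sketched without a parquet dependency. The real tool writes parquet; here rows are plain dicts and the writer takes raw bytes. Function names are hypothetical. Only the listed metadata columns are ever touched, which keeps feature and MOS columns intact.

```python
import contextlib
import os
import tempfile


def enrich_rows(rows: list[dict], metadata_by_clip: dict, metadata_columns: tuple,
                overwrite: bool = False) -> list[dict]:
    """Fill (or, with overwrite=True, replace) metadata cells matched by clip_name."""
    out = []
    for row in rows:
        new = dict(row)  # never mutate the caller's rows
        meta = metadata_by_clip.get(row.get("clip_name"), {})
        for col in metadata_columns:
            if col in meta and (overwrite or new.get(col) is None):
                new[col] = meta[col]
        out.append(new)
    return out


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then os.replace() over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```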
Smoke-test after rebase:
PYTHONPATH=ai/src .venv/bin/python -m pytest \
ai/tests/test_enrich_k150k_parquet_metadata.py \
ai/tests/test_extract_k150k_features.py -q
fix/saliency-per-mb-eval-2026-05-15 — CLI short-opt + bench atoi fix (Batch 5)¶
Branch: fix/saliency-per-mb-eval-2026-05-15
Files touched: core/tools/cli_parse.c, core/tools/vmaf_bench.c, core/test/test_cli_parse.c, docs/usage/cli.md.
Rebase impact: low. cli_parse.c and vmaf_bench.c are upstream-shared files; if Netflix/vmaf ever adds a new short option or touches the same switch block, the case 'c': fall-through arm may need to be re-applied. The invariant comment (INVARIANT (ADR-0438)) marks the intent clearly. vmaf_bench.c changes are in the SYCL-gated #if defined(HAVE_SYCL) block; upstream is unlikely to add atoi back.
Invariant to preserve on rebase: every entry in short_opts[] in core/tools/cli_parse.c must have a matching case arm in the switch (o) block inside cli_parse(). If Netflix adds a new short option upstream without a case, the same silent-drop bug recurs.
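A quick post-sync check for this invariant could compare the getopt-style option letters against the `case '<x>':` arms found in the C source. The sketch below takes both as strings. Extracting `short_opts[]` from `cli_parse.c` is left to the caller, and both function names are hypothetical.

```python
import re

_CASE_ARM = re.compile(r"case\s+'(.)'\s*:")


def case_letters(c_source: str) -> set[str]:
    """Collect the single-character case labels in a C switch body."""
    return set(_CASE_ARM.findall(c_source))


def short_opts_without_case(short_opts: str, handled: set[str]) -> list[str]:
    """Return option letters declared in a getopt string but not handled.

    ':' marks an argument and '+' / '-' are getopt mode prefixes, so neither
    is treated as an option letter.
    """
    letters = [ch for ch in short_opts if ch not in ":+-"]
    return [ch for ch in letters if ch not in handled]
```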
Smoke-test after rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build test/test_cli_parse
./build/test/test_cli_parse # expect: 18 tests run, 18 passed
fix/motion-fps-weight-all-gpu-backends — motion_fps_weight parity across all GPU twins¶
Branch: fix/saliency-per-mb-eval-2026-05-15 (squash PR #863)
Files touched: core/src/feature/cuda/integer_motion_v2_cuda.c, core/src/feature/sycl/integer_motion_v2_sycl.cpp, core/src/feature/vulkan/motion_v2_vulkan.c, core/src/feature/hip/integer_motion_v2_hip.c, core/src/feature/metal/integer_motion_v2_metal.mm, core/src/feature/cuda/float_motion_cuda.c, core/src/feature/sycl/float_motion_sycl.cpp, core/src/feature/vulkan/float_motion_vulkan.c, core/src/feature/hip/float_motion_hip.c, core/src/feature/metal/float_motion_metal.mm.
Rebase impact: low. All touched files are fork-local or fork-added GPU twins; upstream Netflix/vmaf does not maintain any GPU motion extractor files. No upstream-shared path is modified.
Invariant to preserve on rebase: motion_fps_weight must remain present in every motion-family GPU twin's VmafOption options[] table and applied identically (see canonical note in core/src/feature/cuda/AGENTS.md). If a future PR introduces a new motion GPU backend or a new motion-related option, the same option table and application math must be replicated across all twins in the same PR.
Smoke-test after rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast
core/test/meson.build — suite-tagging invariant (fix/meson-suite-fast)¶
Files touched: core/test/meson.build, core/test/AGENTS.md.
Rebase impact: moderate. Upstream Netflix/vmaf periodically adds new test() calls to core/test/meson.build without suite: arguments (that is the upstream convention). Every upstream sync or port-upstream-commit cherry-pick that touches this file must be followed by a check that lists every test() call lacking a suite: argument.
Any hit from that check is a missing tag; add the appropriate suite: before merging. Failure to do so silently breaks meson test -C build --suite=fast (the pre-push gate): Meson's --suite filter matches only tests that declare the named suite, so untagged tests are invisible to the filter and the command exits 0 with zero tests run.
Invariant to preserve on rebase: every test(...) call in core/test/meson.build carries a suite: keyword argument. The fast suite is the pre-push gate; simd and gpu are secondary selectors for CI matrix jobs. See core/test/AGENTS.md for the full tag matrix.
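One possible shape for that post-sync check is the Python sketch below. It is not the fork's tooling. It scans Meson source for `test(` calls and reports those whose argument list lacks a `suite:` keyword. It tracks only single-quoted strings, does not handle escapes or `#` comments, and would be fooled by `suite:` appearing inside a string, so treat hits as candidates to inspect.

```python
import re

# Match a bare test( call, not unit_test( or obj.test(.
_TEST_CALL = re.compile(r"(?<![\w.])test\s*\(")


def untagged_tests(meson_src: str) -> list[int]:
    """Return 1-based line numbers of test( calls lacking a suite: kwarg."""
    hits = []
    for m in _TEST_CALL.finditer(meson_src):
        depth, i, in_str = 1, m.end(), False
        # Walk to the matching close paren, ignoring parens inside strings.
        while i < len(meson_src) and depth:
            ch = meson_src[i]
            if ch == "'":
                in_str = not in_str
            elif not in_str:
                depth += {"(": 1, ")": -1}.get(ch, 0)
            i += 1
        args = meson_src[m.end(): i - 1]
        if not re.search(r"\bsuite\s*:", args):
            hits.append(meson_src.count("\n", 0, m.start()) + 1)
    return hits
```

Usage: `untagged_tests(Path("core/test/meson.build").read_text())`; an empty list means every call is tagged.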
Smoke-test after rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast --list # must print >20 tests, not 0
perf/cambi-calculate-c-values-avx512-neon-2026-05-16 (ADR-0452)¶
What changed: Added calculate_c_values_row_avx512 and calculate_c_values_row_neon as siblings of the existing calculate_c_values_row_avx2. Updated cambi.c dispatch to assign calculate_c_values_avx512 on AVX-512 hosts and corrected the NEON wrapper to call calculate_c_values_row_neon instead of the scalar fallback.
Rebase impact: low. All modified files are fork-local additions to cambi SIMD infrastructure; upstream Netflix/vmaf does not maintain AVX-512 or NEON CAMBI kernels. No public API surface is changed.
Invariant to preserve on rebase: The twin-update rule (x86/AGENTS.md, arm64/AGENTS.md) now requires that every cambi inner-loop function ported to AVX2 ships with AVX-512 + NEON siblings in the same PR. Do not merge a cambi AVX2 kernel without the matching AVX-512 + NEON files and a dispatch update in cambi.c.
refactor/gpu-dispatch-parse-dedup — shared GPU dispatch env tokenizer (ADR-0483)¶
Branch: refactor/gpu-dispatch-parse-dedup
Files touched: core/src/gpu_dispatch_parse.h (new), core/src/cuda/dispatch_strategy.c, core/src/sycl/dispatch_strategy.cpp, core/src/vulkan/dispatch_strategy.c.
Rebase impact: low. The three dispatch_strategy TUs are fork-local; upstream Netflix/vmaf does not have dispatch_strategy.c files. No public headers, no meson sources, and no link-time symbols change — the new gpu_dispatch_parse.h is a header-only static inline and is not added to any meson source list.
Invariant to preserve on rebase: k_<backend>_strategy_names[] index 0 must equal the backend enum's default value (e.g. VMAF_CUDA_DISPATCH_DIRECT = 0). When adding new strategy enum values, append to both the enum and the table; never reorder either.
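The index-0 rule can be expressed as a table check plus a tokenizer that falls back to index 0. Only `VMAF_CUDA_DISPATCH_DIRECT = 0` comes from the note. The `GRAPH` member, the lowercase naming convention, and falling back to the default for unknown tokens are all assumptions of this Python model of the C table.

```python
from enum import IntEnum
from typing import Optional, Sequence


class CudaDispatch(IntEnum):
    DIRECT = 0  # mirrors VMAF_CUDA_DISPATCH_DIRECT, the default
    GRAPH = 1   # hypothetical second strategy, for illustration only


# Index == enum value; append to both, never reorder either.
K_CUDA_STRATEGY_NAMES = ("direct", "graph")


def check_table(names: Sequence[str], enum_cls: type[IntEnum]) -> None:
    """Assert that the names table and the enum stay index-aligned."""
    values = sorted(int(m) for m in enum_cls)
    if values != list(range(len(names))):
        raise AssertionError("enum values must be exactly 0..len(names)-1")
    for member in enum_cls:
        if names[int(member)] != member.name.lower():
            raise AssertionError(f"{member.name} does not match names[{int(member)}]")


def parse_strategy(token: Optional[str], names: Sequence[str]) -> int:
    """Map an env token to an index; unset or unknown tokens yield index 0."""
    if token:
        t = token.strip().lower()
        if t in names:
            return names.index(t)
    return 0  # index 0 is the default by invariant
```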
perf/chug-drop-ssimulacra2-cuda-self-vs-self-2026-05-16 — K150K/CHUG self-vs-self extraction schema v2¶
Branch: perf/chug-drop-ssimulacra2-cuda-self-vs-self-2026-05-16
Files touched: ai/scripts/extract_k150k_features.py, ai/AGENTS.md, ai/tests/test_extract_k150k_no_ssimulacra2.py.
Rebase impact: low. All touched files are fork-local tiny-AI infrastructure; upstream Netflix/vmaf does not maintain ai/ or K150K extraction pipelines. No upstream-shared C/C++/headers are modified.
Invariant to preserve on rebase: the K150K extraction script (extract_k150k_features.py) is a fork-only feature. If upstream adds its own extract_features.py or similar, keep them separate under different package names; do not merge them. The parquet schema v2 (21-feature, no ssimulacra2) is now authoritative for new K150K/CHUG extraction runs. Existing v1 parquets (22-feature, with ssimulacra2) are grandfathered in; loaders must handle both by detecting feature count at runtime or reading a schema-version sidecar (future work).
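Until a schema-version sidecar exists, the runtime detection could look like the sketch below. The `ssimulacra2` column name is an assumption. Requiring both the count and the presence or absence of that column makes the check stricter than a count alone.

```python
from typing import Sequence

SSIMULACRA2_COLUMN = "ssimulacra2"  # ASSUMPTION: actual column name may differ


def detect_schema_version(feature_columns: Sequence[str]) -> int:
    """Return 1 for 22-feature parquets (with ssimulacra2), 2 for 21-feature ones."""
    cols = list(feature_columns)
    has_s2 = SSIMULACRA2_COLUMN in cols
    if len(cols) == 22 and has_s2:
        return 1
    if len(cols) == 21 and not has_s2:
        return 2
    raise ValueError(
        f"unrecognised K150K/CHUG feature layout: {len(cols)} columns, ssimulacra2={has_s2}"
    )
```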
Smoke-test after rebase:
PYTHONPATH=ai/src .venv/bin/python -m pytest \
ai/tests/test_extract_k150k_no_ssimulacra2.py -q
feat/psnr-hvs-vulkan-enable-chroma-2026-05-16 — enable_chroma option for psnr_hvs_vulkan (ADR-0461)¶
- Touches: core/src/feature/vulkan/psnr_hvs_vulkan.c
- Invariant: enable_chroma defaults to true; do not flip. When false, n_planes=1, chroma pipelines are not created, and the combined psnr_hvs score is suppressed. close_fex() relies on VK_NULL_HANDLE guards for the chroma pipeline variants.
- No rebase impact on upstream files (fork-local Vulkan extractor).
Smoke-test after rebase:
# Default (enable_chroma=true): expect psnr_hvs_y + psnr_hvs_cb + psnr_hvs_cr + psnr_hvs
./build/tools/vmaf --reference src01_hrc00_576x324.yuv \
--distorted src01_hrc01_576x324.yuv \
--width 576 --height 324 --feature psnr_hvs_vulkan
# Luma-only: expect only psnr_hvs_y
./build/tools/vmaf --reference src01_hrc00_576x324.yuv \
--distorted src01_hrc01_576x324.yuv \
--width 576 --height 324 \
--feature-opts 'psnr_hvs_vulkan=enable_chroma=false'
feat/hip-float-adm-real-2026-05-16 — HIP float_adm ninth consumer (ADR-0468)¶
Branch: feat/hip-float-adm-real-2026-05-16
Files touched: core/src/feature/hip/float_adm_hip.c (new), core/src/feature/hip/float_adm_hip.h (new), core/src/feature/hip/float_adm/float_adm_score.hip (new), core/src/hip/meson.build (add TU to hip_sources), core/src/meson.build (add float_adm_score to hip_kernel_sources), core/src/feature/feature_extractor.c (extern decl + #if HAVE_HIP list row), docs/adr/0468-hip-float-adm-real-kernel.md (new), docs/adr/README.md (index row), changelog.d/added/hip-float-adm-real-kernel.md (new).
Rebase impact: low. All new files are fork-local HIP infrastructure; upstream Netflix/vmaf does not maintain a HIP backend. The only upstream-shared file touched is feature_extractor.c, where the change is limited to adding an extern declaration and a single list entry inside #if HAVE_HIP — a block upstream does not have.
Invariant to preserve on rebase: float_adm_hip.c must track float_adm_cuda.c semantically. Any change to the four pipeline stages (DWT coefficients, decouple angle flag parenthesisation, CM threshold 8-neighbour sum, border factor) must be mirrored in both the CUDA and HIP TUs. The warp-size difference (CUDA=32 vs HIP=64) means the shared-memory partial arrays differ in size (FADM_WARPS_PER_BLOCK = 8 vs 4); this is correct and must not be unified.
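The arithmetic behind the differing FADM_WARPS_PER_BLOCK values can be sketched as follows. It assumes a 256-thread block in both TUs, which is inferred from 8 x 32 = 4 x 64 and should be checked against the sources. Each warp (wavefront on HIP) owns one shared-memory partial slot, so the partial arrays legitimately differ in length.

```python
# Assumption: both TUs launch 256-thread blocks (8 * 32 == 4 * 64).
THREADS_PER_BLOCK = 256
CUDA_WARP_SIZE = 32
HIP_WAVEFRONT_SIZE = 64

def warps_per_block(warp_size: int, threads: int = THREADS_PER_BLOCK) -> int:
    """Number of warps (or wavefronts) per block; one smem partial slot each."""
    if threads % warp_size:
        raise ValueError("block size must be a multiple of the warp size")
    return threads // warp_size

# Per-warp partial-sum arrays: sized per backend, never unified.
cuda_partials = [0.0] * warps_per_block(CUDA_WARP_SIZE)
hip_partials = [0.0] * warps_per_block(HIP_WAVEFRONT_SIZE)
```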
perf/adm-p-norm-fast-path-vif-arm64-malloc-2026-05-16 (ADR-0463)¶
What changed: Added adm_cm_s_p3, adm_csf_den_scale_s_p3, and adm_sum_cube_s_p3 fast-path variants in adm_tools.c; dispatch added in adm.c:compute_adm. Removed per-call aligned_malloc from the scalar fallback paths of vif_filter1d_s, vif_filter1d_sq_s, and vif_filter1d_xy_s in vif_tools.c — the caller-supplied tmpbuf is used instead.
Rebase impact: low. All modified files (adm_tools.c, adm_tools.h, adm.c, vif_tools.c) are shared with upstream Netflix/vmaf. The ADM changes add new symbols (no existing signatures altered). The VIF changes only remove local malloc/free; the function signatures and caller-supplied tmpbuf contract are unchanged.
Invariant to preserve on rebase: When upstream Netflix/vmaf modifies adm_cm_s, adm_csf_den_scale_s, or adm_sum_cube_s, the corresponding _p3 variants in the fork must receive the same logic change (minus the powf path). When upstream modifies vif_filter1d_* scalar fallbacks, ensure they do not reintroduce aligned_malloc in the fallback body. See core/src/feature/AGENTS.md performance-invariant section.
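The idea behind the _p3 variants is to specialise the generic pow path for p = 3 with plain multiplies. The sketch below shows that idea only. The function names are hypothetical, and the real adm_sum_cube_s / adm_sum_cube_s_p3 apply their own scaling and single-precision arithmetic, so they may differ from powf in the last bits.

```python
import math

def sum_cube_pow(values):
    """Generic path: |x| ** p through pow with p = 3.0 (stands in for powf)."""
    return sum(math.pow(abs(v), 3.0) for v in values)

def sum_cube_p3(values):
    """p == 3 fast path: two multiplies per element instead of a pow call."""
    total = 0.0
    for v in values:
        a = abs(v)
        total += a * a * a
    return total
```

When an upstream change touches the generic function, porting it into the fast path keeps these two in agreement. That is the lockstep requirement stated above.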
fix/dispatch-strategy-registry-audit-2026-05-15 — dispatch registry deduplication + HIP/Metal fixes¶
Touches: core/src/feature/feature_extractor.c (SYCL/Vulkan sections of feature_extractor_list[]), core/src/hip/dispatch_strategy.c, core/src/metal/dispatch_strategy.c.
Rebase impact: low for the SYCL/Vulkan deduplication (purely cosmetic; first-match semantics mean behaviour is unchanged). Medium for the HIP and Metal dispatch-support tables: if an upstream sync adds new feature_extractor_list[] entries for HIP or Metal extractors, they must also be added to g_hip_features[] / g_metal_features[] in the same commit.
Invariant to preserve on rebase: every vmaf_fex_*_hip extractor registered in feature_extractor_list[] must appear in g_hip_features[] in core/src/hip/dispatch_strategy.c. Every vmaf_fex_*_metal extractor must appear (by extractor .name and all provided_features[] keys) in g_metal_features[] in core/src/metal/dispatch_strategy.c. The build does not enforce this — run scripts/ci/check-dispatch-registry.sh after any kernel addition.
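The invariant reduces to a set difference. The sketch below is illustrative only; scripts/ci/check-dispatch-registry.sh remains the real gate. It assumes that identifiers in the registry source can be matched by name, and that the dispatch table's entries have already been collected into a list. How g_hip_features[] actually spells its entries is not stated here.

```python
import re
from typing import Iterable, List

# Matches extractor symbols such as vmaf_fex_float_adm_hip in C source text.
HIP_FEX_RE = re.compile(r"\bvmaf_fex_\w+_hip\b")

def hip_symbols(c_source: str) -> List[str]:
    """Collect vmaf_fex_*_hip identifiers from a chunk of C source."""
    return HIP_FEX_RE.findall(c_source)

def missing_from_dispatch(registered: Iterable[str], dispatch_table: Iterable[str]) -> List[str]:
    """Registered HIP extractors that the dispatch table does not list."""
    return sorted(set(registered) - set(dispatch_table))
```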
Smoke-test after rebase:
meson setup build -Denable_hip=true -Denable_hipcc=false
ninja -C build
# Must compile without errors; vmaf_fex_float_adm_hip must be registered.
# With a ROCm 6+ toolchain:
meson setup build -Denable_hip=true -Denable_hipcc=true
ninja -C build
meson test -C build --suite=fast
perf/vif-cuda-smem-staging-2026-05-16 (ADR-0454)¶
Files touched: core/src/feature/cuda/integer_vif/filter1d.cu, core/src/feature/cuda/AGENTS.md.
Rebase impact: low. filter1d.cu is a fork-local CUDA kernel file (Netflix/vmaf does not maintain CUDA implementations). AGENTS.md is fork-local. No upstream-shared path, public header, or build file is modified.
Invariant to preserve on rebase: the __shared__ smem staging in all four filter template functions must be preserved verbatim (see canonical note added to AGENTS.md). If a future upstream commit adds a filter1d.cu-equivalent (unlikely — Netflix does not ship GPU VIF), reconcile by keeping the smem staging on our side. Do not remove the __syncthreads() between the cooperative load and the compute phase — that barrier is the only thing ordering the smem writes from all threads before any thread reads.
perf/ssimulacra2-cuda-blur-fusion-transpose — 3-channel kernel fusion + V-pass transpose (ADR-0456)¶
- Touches: core/src/feature/cuda/ssimulacra2/ssimulacra2_blur.cu, core/src/feature/cuda/ssimulacra2_cuda.c, core/src/feature/cuda/AGENTS.md.
- Invariant: ssimulacra2_blur.cu must export exactly 5 kernel symbols: ssimulacra2_transpose, ssimulacra2_blur_h, ssimulacra2_blur_h3, ssimulacra2_blur_v, ssimulacra2_blur_v3_transposed. The host dispatch in ssimulacra2_cuda.c looks up all 5 via cuModuleGetFunction at init time; removing or renaming any symbol causes a hard init failure. The fused kernels use gridDim.z = 3 with plane_stride = width * height (full-resolution constant stride, NOT scale-adjusted stride); any change to the stride contract must propagate to both the kernel and the three host dispatch helpers (ss2c_launch_blur_h3, ss2c_launch_transpose, ss2c_launch_blur_v3_transposed). The --fmad=false flag in the cuda_cu_extra_flags map in core/src/meson.build is load-bearing for the places=4 cross-backend parity gate; do not remove it.
- Re-test:
meson setup core/build_cuda core -Denable_cuda=true -Denable_sycl=false
ninja -C core/build_cuda tools/vmaf
# Correctness gate:
./core/build_cuda/tools/vmaf \
--reference testdata/ref_576x324_48f.yuv \
--distorted testdata/dis_576x324_48f.yuv \
--width 576 --height 324 --pixel_format 420 --bitdepth 8 \
--feature ssimulacra2_cuda --output /tmp/cuda_out.xml --backend cuda
# Cross-backend parity:
python3 scripts/ci/cross_backend_parity_gate.py \
--features ssimulacra2 --backends cpu cuda --places 4
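The stride contract can be made concrete with a small offset calculation. The sketch assumes that planes sit back to back at the constant full-resolution plane_stride (from the invariant above). It also assumes rows inside a plane use the current scale's width; that part is not stated in this note, so check the kernel. The helper name is hypothetical.

```python
def plane_offset(z: int, y: int, x: int, width: int, height: int, scaled_width: int) -> int:
    """Element offset of (x, y) in channel z of the fused 3-plane buffer."""
    # Constant full-resolution stride between planes, NOT scaled_width * scaled_height.
    plane_stride = width * height
    # Assumption: row pitch inside a plane is the current scale's width.
    return z * plane_stride + y * scaled_width + x
```

At 576x324 the second plane always starts at element 186624, regardless of scale. A scale-adjusted stride at half resolution would instead place it at 288 * 162 = 46656. The kernel and the three launch helpers would then disagree about where planes 1 and 2 live.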
perf/adm-cm-cuda-warp-reduce-fusion — ADM CM i4 warp-reduce fusion (2026-05-16)¶
What changed: integer_adm/adm_cm.cu — i4_adm_cm_line_kernel (writes INT32 per-thread values to accum_per_thread global scratch) + adm_cm_reduce_line_kernel_4 (reads scratch, cubic-accumulates, warp-reduces) replaced by a single i4_adm_cm_line_kernel_fused kernel that does all three steps internally and writes via atomicAdd_int64. integer_adm_cuda.c updated: one cuLaunchKernel per scale (was two), func_adm_cm_reduce_line_kernel_4 removed from AdmStateCuda.
Rebase impact: low. All touched files are fork-local CUDA kernels and their host glue; Netflix upstream does not maintain GPU ADM CM kernels.
Invariant to preserve on rebase: the fused kernel's shift constants (shift_sq=30, add_shift_sq=1<<29, shift_cub=ceil(log2(w)), shift_inner_accum=ceil(log2(h))) must match those used by adm_cm_reduce_line_kernel in the same file for scale != 0. If the reduce kernel's constants are ever changed, the fused kernel's constants must be updated in lockstep.
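One way to keep the constants in lockstep is to derive them in a single place, as sketched below. The values come from the invariant above. The assumption that add_shift_sq serves as the round-half-up offset for a right shift by shift_sq is inferred from 1 << 29 being half of 1 << 30 and should be checked against adm_cm.cu.

```python
import math

SHIFT_SQ = 30
ADD_SHIFT_SQ = 1 << 29  # half of 1 << SHIFT_SQ: round-half-up offset (assumed use)

def fused_cm_shifts(w: int, h: int) -> dict:
    """Shift constants both kernels must agree on for scale != 0."""
    return {
        "shift_sq": SHIFT_SQ,
        "add_shift_sq": ADD_SHIFT_SQ,
        "shift_cub": math.ceil(math.log2(w)),
        "shift_inner_accum": math.ceil(math.log2(h)),
    }

def round_shift_sq(v: int) -> int:
    """Rounded right shift: (v + add_shift_sq) >> shift_sq."""
    return (v + ADD_SHIFT_SQ) >> SHIFT_SQ
```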
Smoke-test after rebase:
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build
meson test -C build --suite=fast
python3 scripts/ci/cross_backend_parity_gate.py \
--features vif --backends cpu cuda --places 4
python3 scripts/ci/cross_backend_parity_gate.py --features adm --backends cpu cuda --places 4
---
Ghost moment_vulkan.c removed (fix/drop-ghost-moment-vulkan-c)¶
PR #1067 re-introduced the pre-rename moment_vulkan.c alongside the new float_moment_vulkan.c that PR #1046 had established. The ghost file was deleted and core/src/vulkan/meson.build updated to reference float_moment_vulkan.c exclusively.
Rebase impact: any branch that modified moment_vulkan.c must be re-targeted to float_moment_vulkan.c instead.
Smoke-test after rebase:
```bash
ninja -C build && echo "no duplicate symbol error"
```
perf/cache-rfe-hw-flags — cache rfe_hw_flags bitmask (F2-B)¶
File changed: core/src/libvmaf.c — VmafContext struct + vmaf_init + vmaf_use_feature + vmaf_read_pictures.
No rebase impact: the change is entirely internal to libvmaf.c; no public header touched, no FFmpeg-patch surface changed.
Invariant: rfe_hw_flags_dirty must be set to true in vmaf_init (after the memset zeroes it to false). If a future refactor moves the memset or adds a second init path, the dirty flag must be set at every init site.
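The caching pattern behind this invariant is sketched below. It is a hypothetical model: only the field names rfe_hw_flags and rfe_hw_flags_dirty come from this note. It assumes the cached value is the OR of the registered extractors' hardware flags. Every init path forces the dirty flag on, every registration re-dirties it, and the read path recomputes only when dirty.

```python
class RfeFlagCache:
    """Hypothetical model of the rfe_hw_flags cache kept in VmafContext."""

    def __init__(self) -> None:
        # vmaf_init: the memset zeroes everything, then dirty must be forced on.
        self.rfe_hw_flags = 0
        self.rfe_hw_flags_dirty = True
        self.extractor_flags = []
        self.recomputes = 0

    def use_feature(self, hw_flag: int) -> None:
        # vmaf_use_feature: a new extractor invalidates the cached mask.
        self.extractor_flags.append(hw_flag)
        self.rfe_hw_flags_dirty = True

    def read_pictures(self) -> int:
        # vmaf_read_pictures: recompute the mask only when it is stale.
        if self.rfe_hw_flags_dirty:
            mask = 0
            for flag in self.extractor_flags:
                mask |= flag
            self.rfe_hw_flags = mask
            self.rfe_hw_flags_dirty = False
            self.recomputes += 1
        return self.rfe_hw_flags
```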
Smoke-test after rebase:
meson setup build -Denable_cuda=true -Denable_sycl=false
ninja -C build src/liblibvmaf.a.p/libvmaf_src_libvmaf.c.o
# Expected: compiles without error or warning
PR #1067 clobbered four GPU feature options (fix/enable-chroma-pr1067-regression)¶
PR #1067 (bootstrap name-builder refactor) merged a stale base that pre-dated four option additions and overwrote them:
- integer_psnr_metal.mm: lost enable_chroma field + option entry + n_planes guard (PR #986)
- float_psnr_metal.mm: lost enable_chroma + per-plane dispatch loop + n_planes (PR #978)
- psnr_vulkan.c: ceiling division reverted to floor division for chroma geometry (PR #878)
- vif_vulkan.c: lost vif_skip_scale0 field + option entry + score-suppression guards (PR #1057)
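The psnr_vulkan.c regression only shows up on odd dimensions. The sketch below assumes 4:2:0 subsampling and uses hypothetical function names. Floor division drops the last chroma column and row, while ceiling division keeps them.

```python
from typing import Tuple

def chroma_dims_420_ceil(width: int, height: int) -> Tuple[int, int]:
    """Correct 4:2:0 chroma geometry: round up so odd sizes keep the edge sample."""
    return (width + 1) >> 1, (height + 1) >> 1

def chroma_dims_420_floor(width: int, height: int) -> Tuple[int, int]:
    """The clobbered variant: silently loses one column/row on odd sizes."""
    return width >> 1, height >> 1
```

Both agree on even sizes such as 576x324, so an odd-sized clip is the useful regression input.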
Rebase impact: any branch that adds options to these four files and was branched before PR #1067 merged must be rebased onto master (post-fix) to avoid re-clobbering these options.
Smoke-test after rebase:
meson setup build -Denable_cuda=false -Denable_sycl=false --wipe
ninja -C build
meson test -C build --suite=fast
perf/chug-sidecar-bit-depth-key-f6b (2026-05-17)¶
File: core/src/feature/hip/integer_psnr_hip.c
No rebase impact: the change is additive (new option + plane-loop). If upstream later changes the HIP PSNR submit/collect call-graph, re-check that the per-plane loop in submit_fex_hip and collect_fex_hip matches whatever new structure upstream introduces. The kernel (psnr_score.hip) is unchanged.
fix/float-vif-skip-scale0-hip-metal¶
Files: core/src/feature/sycl/float_vif_sycl.cpp, core/src/feature/hip/float_vif_hip.c, core/src/feature/metal/float_vif_metal.mm
No rebase impact: the changes are additive (new field + option + host-side guard in the collect path). GPU kernels are unchanged. If upstream later changes the float_vif collect path or adds vif_skip_scale0 natively, re-check that scale-0 suppression in all three backends matches the CPU implementation in float_vif.c.
fix/adm-metal-missing-options¶
File: core/src/feature/metal/integer_adm_metal.mm
No rebase impact: the change moves three implicit defaults from init_fex_metal into the options table. The struct fields and kernel dispatch are unchanged. If upstream adds its own Metal ADM options or renames the default macros, re-check that DEFAULT_ADM_CSF_SCALE, DEFAULT_ADM_CSF_DIAG_SCALE, and DEFAULT_ADM_NOISE_WEIGHT still resolve correctly.
fix/docs-pr-strict-check-batch18¶
Files: .github/workflows/lint-and-format.yml, .github/workflows/required-aggregator.yml
No rebase impact: the change adds a new CI job (docs-lint) and a new entry in the required-aggregator check list. Both are purely additive and contain no fork-local logic that upstream could change. If upstream adds its own docs-lint CI, dedup by dropping our job or merging the two.
chore/svm-h-remove-orphaned-xxx-marker¶
File: core/src/svm.h
No rebase impact: removes an empty /* XXX */ comment from the vendored libsvm header and folds two trailing comment lines on free_sv into one two-line block. If upstream libsvm updates svm.h, re-apply by re-removing the marker (it originates in libsvm upstream and may reappear).
model/tiny/vmaf_tiny_v1_medium.onnx¶
Files: model/tiny/vmaf_tiny_v1_medium.onnx (binary, inline repack), model/tiny/vmaf_tiny_v1_medium.onnx.data (deleted), changelog.d/fixed/model-tiny-v1-medium-external-data.md.
No rebase impact: model-only binary change, no C-API or registry schema change. If upstream Netflix ever ships a file with this name, treat as a conflict and keep the fork's version (it is a fork-local model, not an upstream artifact).
docs/motion-dedicated-page¶
Files: docs/metrics/motion.md (new), docs/metrics/features.md, docs/adr/0491-motion-dedicated-doc-page.md, docs/adr/README.md, changelog.d/added/motion-dedicated-doc-page.md.
No rebase impact: doc-only addition. If upstream Netflix adds a motion extractor or renames existing ones, update docs/metrics/motion.md to match — no code change required.
fix/vulkan-vif-shader-fp64-for-bit-exact¶
Files: core/src/feature/vulkan/shaders/vif.comp, core/src/vulkan/common.c, docs/adr/0492-vulkan-vif-shader-fp64-g-computation.md, docs/backends/vulkan/overview.md, changelog.d/fixed/vulkan-vif-fp64-g-computation.md.
Rebase sensitivity (medium): vif.comp carries the fp64 extension declaration at line 68 and the revised g/sv_sq block at ~line 540. If upstream Netflix modifies the VIF computation path in integer_vif.c, re-verify that the double-precision GLSL block still mirrors the CPU reference exactly (especially the eps constant and the int32 truncation order for sv_sq). The common.c shaderFloat64 probe must stay in sync with any new device-feature guards added to the same function.
fix/test-output-portable-tempfile¶
Files: core/test/test_output.c, changelog.d/fixed/test-output-portable-tempfile.md.
No rebase impact: core/test/test_output.c is fork-local (added by PR #963, fork-only coverage gap follow-up; Netflix upstream has no equivalent file). The Windows-portable make_temp_path() helper sits inside that file and is not exported. Upstream syncs do not touch it.
fix/mcp-probe-findings-2026-05-17¶
Files: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, mcp-server/vmaf-mcp/tests/test_probe_findings_2026_05_17.py, mcp-server/vmaf-mcp/tests/test_backend_dispatch.py, testdata/bench_all.sh, docs/adr/0495-mcp-probe-bug-fixes.md, docs/mcp/tools.md, docs/state.md, changelog.d/fixed/mcp-probe-bug-cluster-2026-05-17.md.
Rebase sensitivity (low): all changes live in the fork-only mcp-server/ tree and the fork-added testdata/bench_all.sh (Netflix upstream has neither). The new _BACKEND_DISABLE / _BACKEND_PROBE_CACHE helpers and the --no_<backend> flag plumbing in _run_vmaf_score assume the libvmaf CLI continues to advertise --no_<backend> switches in --help and to accept them on the command line. If upstream ever removes them (the new --backend $NAME exclusive selector landed in the fork on 2026-04-28; see the bench_all.sh header comments), swap _probe_backends to parse the selector grammar instead. No upstream-mirrored file is touched.
fix/bbb-e2e-v2-bug-cluster-2026-05-18¶
Files: core/tools/vmaf.c (init_gpu_backends explicit-backend gating + amend_json_with_backend_used helper), tools/vmaf-tune/src/vmaftune/{score,bisect,corpus,ladder,report,encode,cli}.py, tools/vmaf-tune/tests/test_bbb_e2e_v2_bug_cluster.py, dev/Containerfile (matplotlib), dev/scripts/dev-mcp-entrypoint.sh (mkdir -p /tmp), docs/adr/0498-vmaf-tune-bbb-e2e-v2-bug-cluster.md, docs/adr/README.md, docs/state.md, docs/usage/vmaf-tune.md, docs/backends/index.md, docs/development/dev-mcp.md, changelog.d/fixed/vmaf-tune-bbb-e2e-v2-bug-cluster.md.
Rebase sensitivity (medium for core/tools/vmaf.c, low for the rest): the C-side change is bolted on at the end of each backend's state_init failure stanza inside init_gpu_backends; an upstream refactor that restructures that helper (Netflix has no equivalent function — the fork extracted it as ADR-0141 §2 with NOLINTNEXTLINE) would need the explicit-backend if (...) return -1; gates re-applied per backend. The amend_json_with_backend_used helper is fork-local (operates on the file libvmaf wrote — no API change) and survives upstream syncs verbatim. The vmaf-tune fixes live entirely in tools/vmaf-tune/ which is fork-added; no upstream-mirror file is touched. The ScoreRequest.duration_s and CorpusJob.{src_width, src_height} field additions are optional with safe defaults so older test fixtures still compile. ffmpeg-patches are unaffected — no public C API surface changed.
fix/vmaf-tune-ladder-reference-decode-v3¶
Files: tools/vmaf-tune/src/vmaftune/{corpus,score}.py, tools/vmaf-tune/tests/test_bbb_e2e_v3_bug_cluster.py, docs/adr/0499-vmaf-tune-ladder-reference-decode-v3.md, docs/adr/README.md, docs/usage/vmaf-tune.md, changelog.d/fixed/vmaf-tune-ladder-reference-decode-v3.md.
Rebase sensitivity (none): all changes live in tools/vmaf-tune/ which is fork-added — no upstream Netflix file is touched. The new _maybe_decode_reference helper and _decode_source_to_yuv shared building block are private module functions; no public API was added or renamed. Dropping .y4m from _VMAF_RAW_SUFFIXES / VMAF_RAW_SUFFIXES is a behaviour change inside the wrapper that matches what the libvmaf CLI has always done (raw_input_open rejects Y4M files when use_yuv=true — see core/tools/cli_parse.c and the regression test test_vmaf_raw_suffixes_matches_libvmaf_cli_source which cross-checks the table against the CLI source). ffmpeg-patches unaffected. No effect on bisect.py (already decodes the reference per ADR-0498) — the regression test test_bisect_decodes_reference_too pins the existing invariant.
perf/vif-lut-shrink-and-filter-cache (ADR-0500)¶
Touches upstream-mirrored files: core/src/feature/integer_vif.h, integer_vif.c, vif.c, vif.h, float_vif.c, x86/vif_avx512.c.
Rebase note: when pulling upstream changes to any of these files, verify that:
- VifPublicState.log2_table (now 32768 entries) is not reverted to 65537 entries.
- The compute_vif signature addition (precomputed_filters, precomputed_filter_widths) does not conflict with upstream signature changes.
- The three _mm512_i32gather_epi64 gather sites in vif_avx512.c retain the _mm256_and_si256 index mask with VIF_LOG2_TABLE_SIZE - 1.
- log_generate fills indices [0..32767] with log2f(32768+i)*2048 (not the original log2f(i)*2048 for i in [32767..65535]).
If upstream changes any of the above, a new reconciliation pass is needed.
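The shrunken table and its index mask fit together as sketched below. The sketch assumes lookups are for inputs already normalised into [32768, 65535]; for those inputs, masking with VIF_LOG2_TABLE_SIZE - 1 is the same as subtracting 32768. That equivalence is the scalar counterpart of the _mm256_and_si256 mask at the gather sites. Python doubles stand in for log2f here, so values differ from the C table in low bits.

```python
import math

VIF_LOG2_TABLE_SIZE = 32768  # was 65537 before ADR-0500

# log_generate after the shrink: slot i holds log2(32768 + i) * 2048.
log2_table = [math.log2(VIF_LOG2_TABLE_SIZE + i) * 2048 for i in range(VIF_LOG2_TABLE_SIZE)]

def lookup(x: int) -> float:
    """log2(x) * 2048 for x in [32768, 65535] via the masked index."""
    if not VIF_LOG2_TABLE_SIZE <= x < 2 * VIF_LOG2_TABLE_SIZE:
        raise ValueError("input must be normalised into [32768, 65535]")
    return log2_table[x & (VIF_LOG2_TABLE_SIZE - 1)]  # same as x - 32768 here
```

If an upstream sync restores the 65537-entry fill while keeping the mask, every lookup would silently read the wrong slot. That is why the table size and the mask are listed as a pair.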
fix/bbb-e2e-v4-bug-cluster-2026-05-18 (ADR-0501)¶
Files: tools/vmaf-tune/src/vmaftune/{corpus,ladder,cli}.py, tools/vmaf-tune/tests/{test_bbb_e2e_v2_bug_cluster,test_bbb_e2e_v4_bug_cluster}.py, docs/adr/0501-vmaf-tune-bbb-e2e-v4-bug-cluster.md, docs/adr/README.md, docs/usage/vmaf-tune.md, changelog.d/fixed/0501-vmaf-tune-bbb-e2e-v4-bug-cluster.md.
Rebase sensitivity (none): all changes live in tools/vmaf-tune/ (fork-added) and fork-added docs/changelog files — no upstream Netflix file is touched. The corpus / ladder / cli modules are not present in Netflix master. The optional target_width / target_height kwargs added to _decode_source_to_yuv and _maybe_decode_reference default to None so older test fixtures and the ADR-0499 single-resolution code path round-trip unchanged. The samples= kwarg added to emit_manifest / _emit_json is keyword-only with a None default; HLS / DASH emitters silently ignore it. _run_report's stdout JSON grows two new fields (degraded, codec_rows_unavailable) — a strict schema consumer that asserts on absence would need updating, but the existing fields stay populated. ffmpeg-patches are unaffected; no public C API surface changed.
perf/adm-decouple-gather-locality-2026-05-18 (ADR-0502)¶
Files touched: core/src/feature/x86/adm_avx512.c (upstream-mirror), core/src/feature/x86/AGENTS.md, docs/adr/0502-adm-decouple-gather-prefetch.md, docs/adr/README.md, docs/research/0435-adm-decouple-gather-locality.md, changelog.d/performance/adm-decouple-gather-prefetch.md.
Rebase sensitivity (low): the only upstream-shared file touched is adm_avx512.c. The change is a self-contained block (16 lines, guarded by if (j + 32 < right_mod16)) inserted before the three vpgatherdd lines. Conflicts arise only if Netflix upstream modifies adm_decouple_avx512 — resolution: apply the prefetch block to the updated gather cluster in the upstream version. The adm_div_lookup LUT signature is unchanged; the adm_div_lookup[val + 32768] access pattern is identical to scalar. No public C-API surface, no header, no meson build files touched.
ADR-0503: vif_subsample_rd_8_avx512 loop fission (2026-05-18)¶
File: core/src/feature/x86/vif_avx512.c
If Netflix upstream modifies vif_subsample_rd_8_avx512, the two noinline helpers (vif_subsample_rd_8_vert_j, vif_subsample_rd_8_horiz_j) and their parameter structs (VifVertCoeffs8, VifHorizCoeffs8) must be kept in sync with any upstream changes to the accumulation order or filter constant initialisation. The struct fields map 1:1 to the local variables in the original monolithic body (f0-f4, mask2/3/x for vertical; fcoeff-fcoeff4, addnum, mask1 for horizontal). No public C-API surface, no header, no meson build files touched.
fix/bbb-e2e-v5-bug-cluster-2026-05-18 (ADR-0505)¶
Files: tools/vmaf-tune/src/vmaftune/{corpus,ladder,cli}.py, tools/vmaf-tune/tests/{test_bbb_e2e_v4_bug_cluster,test_bbb_e2e_v5_bug_cluster}.py, docs/adr/0505-vmaf-tune-bbb-e2e-v5-bug-cluster.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0505-vmaf-tune-bbb-e2e-v5-bug-cluster.md.
Rebase sensitivity (none): all changes live in fork-added tools/vmaf-tune/ plus fork-added docs/changelog files — no upstream Netflix file is touched. The new source_is_container wiring in corpus.iter_rows consumes an existing EncodeRequest field (added in earlier fork work); container-detection logic is a suffix-set membership test against the fork-local _VMAF_RAW_SUFFIXES. The new cloud_sink kwarg on make_default_sampler and _default_sampler defaults to None so every existing caller round-trips unchanged.
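The container-detection rule above is a plain suffix-set membership test, sketched here. The set contents are hypothetical; the real _VMAF_RAW_SUFFIXES lives in vmaftune and, per ADR-0499, no longer lists .y4m. Anything whose suffix is not in the raw set is treated as a container that must be decoded first.

```python
from pathlib import Path

# Hypothetical stand-in for vmaftune's _VMAF_RAW_SUFFIXES (".y4m" deliberately absent).
RAW_SUFFIXES = frozenset({".yuv"})

def source_is_container(path: str) -> bool:
    """True when the source needs an ffmpeg decode before libvmaf sees it."""
    return Path(path).suffix.lower() not in RAW_SUFFIXES
```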
fix/ladder-duration-clip-ffmpeg-t-flag (ADR-0508)¶
Files: tools/vmaf-tune/src/vmaftune/encode.py, tools/vmaf-tune/tests/test_bbb_e2e_v8_bug_cluster.py, docs/adr/0508-vmaf-tune-ladder-pass1-stats-duration-clip.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0508-vmaf-tune-ladder-pass1-stats-duration-clip.md.
Rebase sensitivity (none): all changes live in fork-added tools/vmaf-tune/ plus fork-added docs/changelog files — no upstream Netflix file is touched. The fix adds a six-line fallback to build_pass1_stats_command that reads the existing EncodeRequest.duration_s field (introduced by ADR-0506 V6-1) and emits an input-side -t duration_s when the caller did not opt into sample-clip mode. Sample-clip precedence is preserved so existing tests that pin the sample-clip argv shape continue to pass unchanged. No public surface of the libvmaf C API changes; no ffmpeg-patches file consumes tools/vmaf-tune/ Python helpers.
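The precedence rule can be sketched as an argv builder. The function name and the shape of the sample-clip arguments are hypothetical; only EncodeRequest.duration_s and the input-side -t come from this note. Sample-clip mode wins. Otherwise a positive duration becomes -t before -i, which makes ffmpeg stop reading the input after that many seconds.

```python
from typing import List, Optional

def pass1_input_args(src: str, duration_s: float = 0.0,
                     sample_clip_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical sketch of the ADR-0508 input-side argv precedence."""
    if sample_clip_args:
        # Sample-clip mode keeps its existing argv shape untouched.
        return [*sample_clip_args, "-i", src]
    if duration_s > 0:
        # Input-side -t: must precede -i to limit how much source is read.
        return ["-t", str(duration_s), "-i", src]
    return ["-i", src]
```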
fix/bbb-e2e-v6-bug-cluster-2026-05-18 (ADR-0506)¶
Files: tools/vmaf-tune/src/vmaftune/{corpus,encode,cli}.py, tools/vmaf-tune/tests/test_bbb_e2e_v6_bug_cluster.py, docs/adr/0506-vmaf-tune-bbb-e2e-v6-bug-cluster.md, docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0506-vmaf-tune-bbb-e2e-v6-bug-cluster.md.
Rebase sensitivity (none): all changes live in fork-added tools/vmaf-tune/ plus fork-added docs/changelog files — no upstream Netflix file is touched. EncodeRequest gains a new duration_s: float = 0.0 field with a back-compatible default so every existing caller round-trips unchanged. _decode_source_to_yuv gains four new kwargs (source_is_raw, source_width, source_height, source_framerate) all defaulting to None/False; the container-source path (which is what every v3/v4/v5 test exercises) takes the legacy branch and emits an identical argv. _maybe_decode_reference and iter_rows wire the new kwargs through. No public surface of the libvmaf C API changes; no ffmpeg-patches file consumes the modified tools/vmaf-tune/ Python helpers.
fix/mcp-backend-probe-allowlist-ladder-score-backend (ADR-0511)¶
Files: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, mcp-server/vmaf-mcp/tests/test_backend_probe_and_allowlist_0509.py, tools/vmaf-tune/src/vmaftune/{cli,ladder}.py, tools/vmaf-tune/tests/test_ladder_score_backend_0509.py, docs/adr/0511-mcp-backend-probe-allowlist-and-ladder-backend.md, docs/adr/README.md, docs/state.md, docs/mcp/backends.md, docs/usage/vmaf-tune.md, docs/rebase-notes.md, mcp-server/AGENTS.md, tools/vmaf-tune/AGENTS.md, changelog.d/fixed/mcp-and-ladder-backend.md.
Rebase sensitivity (none): every touched file is fork-local — the MCP server (mcp-server/) and the vmaf-tune CLI (tools/vmaf-tune/) are wholly fork-added trees with no upstream counterpart. The libvmaf C surface is untouched (the probe shells out to the existing vmaf --help flag table, no new CLI surface added there), and no ffmpeg-patches/ patch consumes tools/vmaf-tune/ or mcp-server/ Python helpers. make_default_sampler and _default_sampler gain a new score_backend: str | None = None kwarg with a back-compatible default, so every existing caller round-trips unchanged. _run_tune_per_shot is deliberately not touched — the auto → None → libvmaf-picks predicate contract is preserved as documented inline.
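The "auto → None → libvmaf-picks" contract amounts to the normalisation below. The helper name is hypothetical. Both an absent value and "auto" collapse to None, so no backend flag is forwarded and libvmaf chooses; any other string is passed through as an explicit selection.

```python
from typing import Optional

def resolve_score_backend(value: Optional[str]) -> Optional[str]:
    """Hypothetical normaliser for the score_backend kwarg."""
    if value is None or value == "auto":
        return None  # let libvmaf pick the backend
    return value
```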
fix/windows-mingw64-build-repair (ADR-0515)¶
Rebase sensitivity (none): test-only change confined to a fork-added file (core/test/test_public_api_score.c, added 2026-05-16 from the C-API coverage audit). The Win32 #ifdef branch mirrors the pre-existing pattern in core/test/dnn/test_model_loader.c::test_sidecar_parses; no public C-API or ffmpeg-patches/ surface touched. No upstream-Netflix counterpart for this test file, so upstream rebases cannot collide with the helper.
fix/tiny-model-loader-external-data-and-feature-rank (ADR-0518)¶
Files: core/src/dnn/{model_loader.h,model_loader.c,dnn_ctx.h,ort_backend.h,ort_backend.c,AGENTS.md}, core/src/libvmaf.c, core/test/dnn/{meson.build,test_cli.sh,test_model_loader.c}, docs/adr/0518-tiny-model-loader-external-data-and-feature-rank.md, docs/adr/README.md, docs/ai/inference.md, docs/state.md, docs/research/0518-tiny-model-loader-feature-rank.md, docs/rebase-notes.md, changelog.d/fixed/tiny-model-loader.md.
Rebase sensitivity (low — fork-local dnn surface, additive): All touched files are fork-local except core/src/libvmaf.c, where the changes are confined to the tiny-AI bridge (the VmafContext::dnn struct in the file-private definitions block, vmaf_ctx_dnn_free, vmaf_ctx_dnn_attach, and vmaf_ctx_dnn_run_frame — all fork-added per ADR-0040 / ADR-0042). The struct grows by four fields (in_rank, n_features, extra_in_width, extra_in_buf); no existing offset shifts since the new fields are appended inside the dnn substruct, which is private to libvmaf.c. VmafModelSidecar (declared in core/src/dnn/model_loader.h) grows by n_features / feature_names[VMAF_DNN_MAX_FEATURE_NAMES] / feature_mean[] / feature_std[] / has_feature_scaler — this is a header consumed only by the dnn TU set and the DNN tests; consumers outside core/src/dnn/ or core/test/dnn/ should not depend on the struct layout (it's an internal sidecar contract, not a public API). vmaf_ort_input_shape_at() is a new public symbol on ort_backend.h; the existing vmaf_ort_input_shape() remains as the slot == 0 shortcut. No ffmpeg-patches file consumes any of the changed symbols.
fix/hip-import-state-implementation (ADR-0519)¶
Files: core/src/hip/common.c (delete vmaf_hip_import_state stub body — 9 lines removed), core/src/libvmaf.c (add HAVE_HIP block: header include, hip field on VmafContext, real implementation of vmaf_hip_import_state, cleanup in vmaf_close), core/test/test_hip_smoke.c (replace test_import_state_returns_enosys with test_import_state_validates_arguments + test_import_state_succeeds_with_real_state), docs/adr/0519-hip-import-state-implementation.md, docs/adr/README.md, docs/backends/hip/overview.md, docs/state.md, docs/research/0519-hip-import-state.md, docs/rebase-notes.md, changelog.d/fixed/hip-import-state.md.
Rebase sensitivity (low — fork-local HIP surface, additive): The VmafContext struct in core/src/libvmaf.c grows by one field — a hip substruct holding a single VmafHipState * pointer — gated by #ifdef HAVE_HIP. The field is appended after the existing metal substruct (the last existing GPU substruct), so no offsets shift in CPU-only / CUDA-only / etc. builds. The new vmaf_hip_import_state definition matches the existing public declaration in core/include/libvmaf/libvmaf_hip.h field-for-field; the header is unchanged, so consumers (the fork's core/tools/vmaf.c and any future ffmpeg-patches/ HIP consumer) recompile against the same ABI. core/src/hip/common.c loses the 9-line vmaf_hip_import_state stub; no other in-file functions are touched. On upstream rebase the patch is trivially applicable because Netflix/vmaf master ships no HIP backend; the entire core/src/hip/ tree is fork-local (ADR-0212). No ffmpeg-patches file consumes vmaf_hip_import_state directly today — vf_libvmaf reaches HIP only through the CPU-path fallback for now.
2026-05-18 — --tiny-codec / --tiny-preset / --tiny-crf populate codec block (ADR-0522, PR #TBD)¶
Fork-local. Adds three CLI flags + one public C-API (vmaf_dnn_set_codec_context) that override the ADR-0518 "unknown" codec pre-seed for codec-aware tiny models (fr_regressor_v2). Touched: core/include/libvmaf/dnn.h (public header — new export), core/src/dnn/dnn_attach_api.c (public symbol), core/src/dnn/dnn_ctx.h (bridge), core/src/libvmaf.c (bridge implementation — vmaf_ctx_dnn_set_codec_context), core/src/dnn/model_loader.{c,h} (sidecar encoder_vocab[] parsing + vmaf_dnn_codec_block_fill helper + VmafModelSidecar grows by n_encoder_vocab / encoder_vocab[VMAF_DNN_MAX_ENCODER_VOCAB] / codec_aware), core/tools/cli_parse.{c,h} (three new flags + three new CLISettings fields: tiny_codec, tiny_preset, tiny_crf), core/tools/vmaf.c (call site after vmaf_use_tiny_model), core/test/dnn/test_model_loader.c (8 new tests).
Rebase sensitivity (low — fork-local additive): All touched files are fork-local. VmafModelSidecar and the VmafContext::dnn substruct grow by additive fields only — no existing offset shifts. vmaf_dnn_set_codec_context() is a new VMAF_EXPORT symbol on the public libvmaf/dnn.h surface; the ffmpeg-patches stack does NOT currently consume tiny-model inference (the vf_libvmaf filter wires through the classic metric collector, not the tiny-AI surface), so no patch update is required for this PR (per CLAUDE.md §12 r14: "Does NOT apply to … kernel implementations behind an existing public surface"). The C-side codec_block_preset_ordinal table is a duplicate of train_fr_regressor_v2.py::PRESET_ORDINAL; the core/src/dnn/AGENTS.md invariant note flags both files as a co-edit pair. The sidecar encoder_vocab array is the single source of truth for the vocabulary; vocab bumps (e.g. ADR-0302 v3) only require a new sidecar JSON, no C recompile.
fix/cli-threads-parse-safety-v2 (ADR-0528)¶
Files touched: core/test/test_cli_parse_long_only_args.c, core/tools/cli_parse.c.
Rebase sensitivity (low — fork-local additive): Both files are fork-local. The test (test_cli_parse_long_only_args.c) has no upstream twin; it was added in PR #408 / ADR-0316 to lock down the long-only short-option synthesis bug. cli_parse.c is upstream-adjacent (Netflix maintains its own error() with the same assert(long_opts[n].name) shape and the same sprintf(optname, …) calls). On a future upstream sync, expect a merge conflict on error() if Netflix changes the assert / sprintf lines independently: keep the fork's if (!found) usage(…); return; + snprintf shape and drop the <assert.h> include. No public API surface changes; the ffmpeg-patches stack is untouched.
2026-05-18 — HIP integer_motion flag promotion + HIP_DEVICE buffer enum (ADR-0530, PR #TBD)¶
Extends ADR-0519. Promotes VMAF_FEATURE_EXTRACTOR_HIP on vmaf_fex_integer_motion_hip so the model-driven dispatch picks the HIP kernel instead of the CPU twin when a HIP state is imported. Adds VMAF_PICTURE_BUFFER_TYPE_HIP_DEVICE to the picture-buffer enum (reserved for the future HIP picture pool; HIP TUs still accept HOST and do their own HtoD copy). Wires compute_fex_flags() for HIP, adds a CPU-twin fallback in vmaf_get_feature_extractor_by_feature_name(), drains HIP-flagged extractors' gpu_pending final-frame collect in flush_context_serial(), and routes the HIP integer_motion collect/flush writes through feature_name_dict so the encoded option-aware key matches the predict-side lookup.
Touched: core/src/picture.h (new enum entry), core/src/feature/feature_extractor.c (dispatch buffer-type check + _by_feature_name fallback + new extern + registry row), core/src/libvmaf.c (compute_fex_flags HIP slot + flush_context_serial HIP drain), core/src/feature/hip/integer_motion_hip.c (flag bit set + dict-aware writes), core/src/feature/hip/integer_vif_hip.c (flag bit cleared with citation — un-promotes pending kernel-level fix), core/src/hip/meson.build (compile integer_motion_hip.c), core/src/meson.build (motion_score.hip HSACO), core/src/hip/AGENTS.md (invariant rewrite), core/test/test_hip_smoke.c (registration + flag-dispatch tests), docs/backends/hip/overview.md, docs/rebase-notes.md, changelog.d/added/0530-hip-integer-motion-flag-promotion.md, docs/state.md.
Rebase sensitivity (medium — touches upstream-mirror feature_extractor.c dispatch site):
The dispatch-time HIP buffer-type check is a NEW symmetric block right after the existing CUDA buffer-type check. Any upstream port that touches the CUDA block needs a paired update to the HIP block to keep them symmetric. The CPU-twin fallback pass in _by_feature_name is a documented contract going forward (ADR-0530) — future GPU backend work cannot assume "flag set ⇒ full coverage"; treat the fallback as the established behaviour, not as a bug to fix.
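The two-pass lookup behind the CPU-twin fallback can be modelled roughly as follows. This is a language-neutral Python sketch, not the C code. The registry rows, the HIP bit value and by_feature_name are hypothetical stand-ins for feature_extractor_list[], VMAF_FEATURE_EXTRACTOR_HIP and vmaf_get_feature_extractor_by_feature_name().

```python
from dataclasses import dataclass

HIP = 1 << 0  # hypothetical stand-in for VMAF_FEATURE_EXTRACTOR_HIP


@dataclass(frozen=True)
class Extractor:
    name: str
    provided_features: tuple
    flags: int = 0  # 0 models a plain CPU extractor


# Hypothetical registry: motion has a promoted HIP twin, vif does not.
REGISTRY = (
    Extractor("motion", ("integer_motion",)),
    Extractor("motion_hip", ("integer_motion",), HIP),
    Extractor("vif", ("integer_vif",)),
)


def by_feature_name(feature, backend_flags):
    # Pass 1: prefer an extractor flagged for an active GPU backend.
    if backend_flags:
        for fex in REGISTRY:
            if feature in fex.provided_features and fex.flags & backend_flags:
                return fex
    # Pass 2: CPU-twin fallback for features with no flagged GPU kernel.
    for fex in REGISTRY:
        if feature in fex.provided_features and fex.flags == 0:
            return fex
    return None
```

The second pass is why "flag set ⇒ full coverage" cannot be assumed: with a HIP state imported, integer_vif still resolves to the CPU extractor.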
The compute_fex_flags() HIP slot mirrors the existing Vulkan / SYCL slots field-for-field; the flush_context_serial() HIP drain mirrors the SYCL flush_context_sycl drain. Any upstream refactor that relocates either function needs to move all three GPU slots / drains together.
vmaf_fex_integer_vif_hip had VMAF_FEATURE_EXTRACTOR_HIP set speculatively in its batch-1 commit; this PR clears it with an inline citation. Do NOT re-enable on a future rebase without a kernel-level GPU-memory-access-fault fix and an ADR-0530-style per-extractor reproducer.
No public-header change → no ffmpeg-patches/ update required (per CLAUDE.md §12 r14: the new picture-buffer-type enum lives in the libvmaf-private src/picture.h, not the public include/libvmaf/picture.h; the ffmpeg vf_libvmaf filter hands HOST buffers to libvmaf and is unaffected).
feat/hip-register-all-extractors (ADR-0533)¶
Files touched: core/src/hip/meson.build, core/src/feature/feature_extractor.c, core/test/test_hip_smoke.c, core/src/hip/AGENTS.md, docs/backends/hip/overview.md, docs/state.md, docs/adr/0533-hip-all-extractors-registration-sweep.md, docs/adr/README.md, changelog.d/fixed/hip-register-all-extractors.md.
Rebase sensitivity (low — fork-local only): All edits sit in fork-additive HIP plumbing. Upstream Netflix/vmaf ships no HIP backend, so the #if HAVE_HIP blocks in feature_extractor.c are entirely fork-local; the extern + registry entries land inside the same #if HAVE_HIP regions ADR-0523 already extended. core/src/hip/meson.build is a fork-added file (the subdir('hip') invocation is gated on enable_hip). No public C-API surface changes — vmaf_get_feature_extractor_by_name already existed; the sweep only adds rows to the table it reads. The ffmpeg-patches stack is untouched (no new LIBVMAFContext field, no new CLI flag, no new meson_options.txt entry). On a future upstream sync, expect zero conflicts — Netflix never touches HIP files. If a future PR adds a new HIP feature TU, the invariant pinned in core/src/hip/AGENTS.md (every vmaf_fex_*_hip symbol must appear in hip_sources + the extern/registry blocks) must be honoured or the registration drops out silently.
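After a rebase, the registration invariant can be spot-checked with a scan like the one below. It is a sketch under two assumptions: the extern and registry rows in feature_extractor.c keep the usual "extern VmafFeatureExtractor X;" and "&X," shapes, and the caller supplies the set of vmaf_fex_*_hip symbols defined by the TUs listed in hip_sources. The helper name is hypothetical.

```python
import re


def missing_hip_registrations(defined_symbols, feature_extractor_c):
    """Return {symbol: [missing parts]} for HIP extractors not fully wired."""
    problems = {}
    for sym in sorted(defined_symbols):
        name = re.escape(sym)
        missing = []
        # The extern declaration must exist inside the #if HAVE_HIP block.
        if not re.search(rf"\bextern\s+VmafFeatureExtractor\s+{name}\s*;",
                         feature_extractor_c):
            missing.append("extern")
        # The registry row must reference the symbol by address.
        if not re.search(rf"&{name}\s*,", feature_extractor_c):
            missing.append("registry")
        if missing:
            problems[sym] = missing
    return problems
```

An empty result means every symbol defined by a hip_sources TU is both declared and listed. Any entry is a registration that would otherwise drop out silently.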
ADR-0538 — per-shot predicate bitrate sidecar (PR #1290 follow-up)¶
No rebase impact: the change is entirely internal to tools/vmaf-tune/src/vmaftune/cli.py (_build_per_shot_bisect_predicate return type change + call-site unpack + dataclasses.replace patch loop) and the corresponding test file. No public API surface, no C code, no meson_options.txt entry, no ffmpeg-patches entry, no new public Python symbol. Netflix upstream never touches vmaf-tune, so expect zero conflicts on a future upstream sync.
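The predicate-plus-sidecar shape can be illustrated with the minimal sketch below. It is not the cli.py code. ShotResult, build_predicate, attach_bitrates and the encode callback are hypothetical. Only dataclasses.replace is the real standard-library call. The predicate records the bitrate of every CRF it probes, and a patch loop copies those values into frozen result records afterwards.

```python
import dataclasses
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class ShotResult:  # hypothetical per-shot result record
    shot: int
    crf: int
    bitrate_kbps: Optional[float] = None


def build_predicate(encode: Callable[[int], Tuple[float, float]], target: float):
    """Return (predicate, sidecar); sidecar maps crf -> bitrate seen while probing."""
    sidecar: Dict[int, float] = {}

    def predicate(crf: int) -> bool:
        score, kbps = encode(crf)
        sidecar[crf] = kbps  # remember the bitrate instead of discarding it
        return score >= target

    return predicate, sidecar


def attach_bitrates(results, sidecars):
    # Frozen dataclasses are patched by copying, never mutated in place.
    return [dataclasses.replace(r, bitrate_kbps=sidecars[r.shot].get(r.crf))
            for r in results]
```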
ADR-0539 — HIP integer_moment HSACO blob registration¶
No rebase impact: change is one new row in hip_kernel_sources inside core/src/meson.build, gated by the fork-only enable_hip flag. Netflix upstream has no HIP backend and never touches hip_kernel_sources, the four feature/hip/integer_moment/* paths, or the hip_hsaco_stubs.c TU. The ffmpeg-patches stack is untouched (no new LIBVMAFContext field, no new CLI flag, no new meson_options.txt entry). On a future upstream sync, expect zero conflicts.
If a future PR adds yet another HIP host TU that consumes a <name>_hsaco symbol distinct from any existing meson key, the invariant pinned in core/src/feature/hip/AGENTS.md (HSACO symbol naming) must be honoured to avoid the same class of link error this ADR closed.
feat/dev-container-ffmpeg-av1-hwaccel (ADR-0543)¶
Files touched: dev/Containerfile (stage 3.5 apt list + SVT-AV1 / libaom / vvenc / AMF source builds + FFmpeg configure flags + build-time encoder probe), dev/AGENTS.md (four new "FFmpeg encoder exposure invariants"), docs/development/dev-mcp.md (encoder matrix + runtime failure modes + full-sweep reproducer), core/src/meson.build (one-line follow-up to ADR-0523: add 'motion_score' to the hip_kernel_sources dict; surfaced as a stage-3 link blocker during ADR-0543's container rebuild verification), ffmpeg-patches/0007-libvmaf-tune-qpfile-unified.patch (one-line addition: #include <stdbool.h> to libavcodec/libsvtav1.c so enable_roi_map = true compiles; surfaced when the in-image FFmpeg was first built with --enable-libsvtav1, because no prior dev image enabled libsvtav1 and the patched code had never been exercised), docs/adr/0529-…md (+ index fragment), changelog.d/added/0529-…md, docs/state.md (two state rows), this file.
Rebase sensitivity (none — container-only fork-local additive plus one fork-local libvmaf wiring fix): Every touched file lives under dev/, docs/, changelog.d/, or core/src/meson.build. The libvmaf hunk adds one dict entry referencing a fork-local .hip source (ADR-0523 lineage); no upstream conflict possible because Netflix has no HIP backend. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry. The CLAUDE.md §12 r14 patch-stack rule does not apply — the FFmpeg configure-line change happens in the in-image build only and is orthogonal to the host-side patch series under ffmpeg-patches/. Pin bumps (VVenC v1.12.0, AMF v1.4.36, FFmpeg n8.1.1 via FFMPEG_TAG build-arg) are visible in the ARG lines of dev/Containerfile; bumping them is a local container change.
ADR-0543 — ADR-0498 enforcement hardening (exit code 100 + JSON error + per-feature gate)¶
Summary: Hardens the explicit-backend gate that ADR-0498 introduced in core/tools/vmaf.c. Adds three orthogonal contracts: dedicated exit code 100 (VMAF_EXIT_BACKEND_INIT_FAILED) for --backend NAME init failures, structured JSON error descriptor at the --output path when format is JSON, and a per-feature gate that hard-fails GPU-pinned feature names (*_cuda / *_sycl / *_vulkan / *_hip / *_metal) when the matching backend isn't active.
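A Python model of the per-feature gate is sketched below. The helper names follow the C helpers listed in the next paragraph, and exit code 100 comes from this entry. The signatures, the suffix list as a tuple and the (code, offending_name) return shape are assumptions made for illustration.

```python
GPU_SUFFIXES = ("_cuda", "_sycl", "_vulkan", "_hip", "_metal")
VMAF_EXIT_BACKEND_INIT_FAILED = 100  # dedicated exit code from ADR-0543


def feature_backend_suffix(feature_name):
    """Return the backend a feature name is pinned to, or None if unpinned."""
    for suffix in GPU_SUFFIXES:
        if feature_name.endswith(suffix):
            return suffix[1:]  # "_cuda" -> "cuda"
    return None


def gate_features(feature_names, active_backends):
    """Hard-fail on the first GPU-pinned feature whose backend is not active."""
    for name in feature_names:
        backend = feature_backend_suffix(name)
        if backend is not None and backend not in active_backends:
            return VMAF_EXIT_BACKEND_INIT_FAILED, name
    return 0, None
```

Unpinned names pass through untouched, so only an explicit *_cuda / *_sycl / *_vulkan / *_hip / *_metal request can trigger the hard failure.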
Files touched: core/tools/vmaf.c (new constants, new helpers write_backend_error_json / feature_backend_suffix / backend_active, new bool *cuda_active_out parameter on init_gpu_backends, per-feature gate in the feature-loading loop, simplified backend_used echo), docs/adr/0543-adr-0498-enforcement-hardening.md (+ index row in docs/adr/README.md), changelog.d/fixed/0543-adr-0498-enforcement-hardening.md, tools/vmaf-tune/tests/test_adr_0543_backend_enforcement.py (13 integration + source-level tests), docs/state.md (Recently closed row), this file.
Rebase sensitivity (none — fork-local additive against an already-fork-local helper): The only C source touched is core/tools/vmaf.c, and only inside the init_gpu_backends helper + its caller — both of which are fork-local additions that do not exist in Netflix/vmaf upstream (Netflix has no SYCL / HIP / Vulkan / Metal backends and no --backend selector). The new bool *cuda_active_out parameter on init_gpu_backends is guarded by #ifdef HAVE_CUDA and only affects the in-tree caller. No upstream conflict possible.
ffmpeg-patches/ impact (none): No public libvmaf C-API entry points added, renamed, or removed. No meson_options.txt flag added. No LIBVMAFContext field added. No vf_libvmaf.c filter variant added. The new exit code is a CLI-level contract observed by wrappers (vmaf-tune, MCP) — FFmpeg's libvmaf filter consumes libvmaf via the C API and is not impacted. CLAUDE.md §12 r14 does not apply.
fix/feature-extractor-list-dedup (ADR-0544)¶
Removes 61 duplicate &vmaf_fex_* entries from core/src/feature/feature_extractor.c's static feature_extractor_list[] (55 Vulkan + 6 SYCL) and adds vmaf_feature_extractor_list_audit(), called from vmaf_init(), that returns -EINVAL if any extractor name or pointer is seen twice.
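The audit logic amounts to one pass with two seen-sets, one keyed by name and one by pointer identity. This Python sketch models the pointer with id() and the extractors with plain dicts. Both are assumptions; the C function walks feature_extractor_list[] directly.

```python
import errno


def feature_extractor_list_audit(extractors):
    """Return 0 if every entry is unique by name and identity, else -EINVAL."""
    seen_names = set()
    seen_ids = set()  # id() stands in for the C pointer value
    for fex in extractors:
        if fex["name"] in seen_names or id(fex) in seen_ids:
            return -errno.EINVAL
        seen_names.add(fex["name"])
        seen_ids.add(id(fex))
    return 0
```

The result depends only on the list contents, which is why the call can sit anywhere in vmaf_init() relative to other init steps.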
Rebase sensitivity (low — fork-local hunks only): The duplicated rows lived in fork-local #if HAVE_VULKAN / #if HAVE_SYCL blocks — both backends are absent upstream. The deduped arrangement keeps the same row ordering Netflix would expect for the CPU + CUDA paths (untouched) so the inevitable next sync-upstream sees no diff there. The new public header line in core/src/feature/feature_extractor.h (the vmaf_feature_extractor_list_audit() declaration) is appended after the existing fork-local symbols and before the VmafFeatureExtractorContextFlags block, isolating it from upstream hunks. The vmaf_init() call site lives in a fork-local block (right after vmaf_set_log_level) that already differs from upstream because of the HIP/Vulkan plumbing — a conflict is only possible if Netflix adds a new init-time call there, in which case the resolution is trivial (preserve both calls; the audit is order-independent w.r.t. other init steps).
Touched files: core/src/feature/feature_extractor.{c,h}, core/src/libvmaf.c, core/test/test_feature_extractor.c, docs/adr/0541-*.md, docs/adr/_index_fragments/0541-*.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md (regenerated), docs/state.md, docs/rebase-notes.md, changelog.d/fixed/0541-*.md. No ffmpeg-patches/, meson_options.txt, or meson.build change (test is exercised by an existing test_feature_extractor target).
chore/wire-or-delete-dead-extractor-files (ADR-0545)¶
Deletes 18 dead Vulkan / Metal feature-extractor source files plus 14 paired orphan shaders (.comp / .metal) from core/src/feature/{vulkan,metal}/ that were never wired into their backend's meson.build. Wires one previously-unwired Metal TU (float_ms_ssim_metal.mm + float_ms_ssim.metal, ADR-0490) into core/src/metal/meson.build. Removes one dead extern in core/src/feature/feature_extractor.c (vmaf_fex_integer_adm_metal) and refreshes the core/src/feature/{vulkan,metal}/AGENTS.md rebase-sensitive invariants section to forbid re-introducing the deleted scaffolds.
Rebase sensitivity (none — pure fork-local housekeeping): All deleted files were fork-local scaffolds added in commit 302bd1673 (2026-05-18, "docs(rules): default to vmaf-dev-mcp container"). Netflix upstream has no Vulkan or Metal backend, so no upstream conflict is possible on the deletes. The lone wired file (float_ms_ssim_metal.mm) is fork-original, references no upstream identifier, and lives under fork-only core/src/metal/. The retained adm_vulkan.c legacy shim is out of scope per ADR-0468. No CPU-path C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry, no Python-binding change — CLAUDE.md §12 r14 (FFmpeg patch-stack sync) does not apply.
feat/vmaf-tune-full-file-and-no-bisect (ADR-0548)¶
Files touched: tools/vmaf-tune/src/vmaftune/cli.py (auto-probe block in _run_tune_per_shot; new _run_compare_crf_sweep function; --no-bisect / --crf-sweep argparse flags in the compare subparser; --width / --height / --framerate made optional in the tune-per-shot subparser), tools/vmaf-tune/tests/test_tune_per_shot_container_src.py (new — Fix A smoke tests), tools/vmaf-tune/tests/test_compare_no_bisect.py (new — Fix B smoke tests), tools/vmaf-tune/AGENTS.md (two new invariant notes), docs/adr/0548-vmaf-tune-full-file-and-no-bisect.md (+ index row in docs/adr/README.md), changelog.d/added/0548-…md, docs/usage/vmaf-tune.md (Fix A and Fix B documentation), this file.
Rebase sensitivity (none — vmaf-tune Python only): All touched files live under tools/vmaf-tune/, docs/, or changelog.d/. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry is touched. The cli.py changes are purely additive: new function _run_compare_crf_sweep, new optional args in existing subparsers, and an early-return dispatch at the top of _run_compare. No existing API surface is renamed or removed. The probe block at the top of _run_tune_per_shot only executes when args.width is None or args.height is None or args.framerate is None — callers that pass explicit geometry are unaffected. No upstream Netflix/vmaf path is touched.
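The probe condition, and one plausible way to fill only the missing fields, are sketched below. needs_geometry_probe and fill_geometry are hypothetical helpers. Whether the real probe block preserves an explicitly passed width when height is missing is an assumption here, not something this entry states.

```python
from argparse import Namespace


def needs_geometry_probe(args):
    """True when any of width/height/framerate was left unset on the CLI."""
    return args.width is None or args.height is None or args.framerate is None


def fill_geometry(args, probe):
    """Fill only missing fields from probe(); explicit values are kept (assumed policy)."""
    if not needs_geometry_probe(args):
        return args  # explicit geometry: the probe never runs
    probed = probe()
    for key in ("width", "height", "framerate"):
        if getattr(args, key) is None:
            setattr(args, key, probed[key])
    return args
```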
ADR-0539 — HIP hip_cu_extra_flags dispatch + ssimulacra2_blur -ffp-contract=off¶
No rebase impact: the change is entirely additive in core/src/meson.build inside the if get_option('enable_hipcc') block — a new hip_cu_extra_flags dict and one extra per_kernel_flags list interpolated into the existing hipcc custom_target command. The fall-through (.get(name, [])) keeps the command line byte-identical for every kernel not listed. Netflix upstream ships no HIP backend, so the entire enable_hipcc block is fork-local; upstream syncs will not conflict. The dict mechanism extends naturally — when porting a future CUDA kernel that lists flags in cuda_cu_extra_flags, mirror the entry in hip_cu_extra_flags per core/src/feature/hip/AGENTS.md. No public API surface, no meson_options.txt entry, no ffmpeg-patches entry. On a future upstream sync, expect zero conflicts.
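The fall-through works like a dict lookup with an empty-list default. This Python sketch mirrors the meson .get(name, []) idiom. The base flags, where the per-kernel flags sit in argv and the hipcc_command helper are assumptions. Only the ssimulacra2_blur entry comes from this ADR.

```python
HIP_CU_EXTRA_FLAGS = {
    "ssimulacra2_blur": ["-ffp-contract=off"],  # from this ADR
}


def hipcc_command(name, src, out, base_flags):
    """Return the hipcc argv for one kernel; unlisted kernels get no extras."""
    per_kernel_flags = HIP_CU_EXTRA_FLAGS.get(name, [])
    return ["hipcc", *base_flags, *per_kernel_flags, src, "-o", out]
```

Because the default is an empty list, every kernel not listed gets exactly the argv it had before the dict existed.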
ADR-0539 — integer ADM HIP kernels (real impl, removes ADR-0536 weak stubs)¶
No rebase impact: every touched file is fork-local — the four .hip kernel sources under core/src/feature/hip/integer_adm/ are fork-additive (Netflix ships no HIP backend), the hip_kernel_sources meson dict additions live inside the if is_hip_enabled and is_hipcc_enabled block (also fork-local), and the hip_hsaco_stubs.c weak-fallback file is wholly fork-added under ADR-0536. No public C-API surface changes — kernel symbol names match the GET_FN calls in integer_adm_hip.c exactly, host TU is untouched. No meson_options.txt flag added or renamed (re-uses enable_hip + enable_hipcc). No ffmpeg-patches entry needs an update (no new LIBVMAFContext field, no new CLI flag). On a future upstream sync, expect zero conflicts. If a future PR re-introduces a CUDA-only helper into one of the four kernels (re-breaking the standalone build), do NOT re-add a weak HSACO stub — fix the kernel (invariant pinned in core/src/feature/hip/AGENTS.md).
ADR-0546 — Codec-adapter two_pass_args real implementations¶
Rebase impact: none. The change is entirely fork-local — all modified files live under tools/vmaf-tune/src/vmaftune/codec_adapters/ (fork-added Phase A/F vmaf-tune package) and docs/. No upstream Netflix/vmaf file is touched, no ffmpeg-patches file is touched, no public core/include/ header is touched, no meson_options.txt key is added.
Touched files: tools/vmaf-tune/src/vmaftune/codec_adapters/{svtav1,libaom,vvenc,_nvenc_common,_qsv_common,_amf_common,_videotoolbox_common,h264_videotoolbox,hevc_videotoolbox,av1_videotoolbox,prores_videotoolbox}.py, tools/vmaf-tune/tests/test_codec_adapter_two_pass_real.py, docs/adr/0546-codec-adapter-two-pass-real.md, docs/adr/README.md (one index row), docs/research/0546-codec-adapter-two-pass-real.md, docs/usage/vmaf-tune.md (codec support matrix refresh), docs/state.md (Recently-closed row), changelog.d/added/0546-codec-adapter-two-pass-real.md, docs/rebase-notes.md (this entry).
chore/ai-tooling-env-overrides-split (ADR-0547)¶
Rebase sensitivity (low — fork-local only):
- ai/scripts/*.py: every file is fork-local. The edits add an os.environ.get(...) wrap around the existing default-path literal. No upstream conflict possible.
- .gitignore: appends *.bak / *.orig. Trivial to resolve if upstream ever touches the same lines (unlikely; these are universal editor-backup patterns).
- docs/ai/scripts-env-vars.md (new file), mkdocs.yml nav entry: fork-local docs tree. No upstream conflict possible.
- tools/vmaf-tune/src/vmaftune/cli.py.bak: untracked file deletion; no git history impact.
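The override pattern reduces to the sketch below. EXAMPLE_AI_DATA_ROOT and the default path are hypothetical placeholders; the real variable names are listed per script in docs/ai/scripts-env-vars.md. The lookup happens at call time, so tests can set the variable after import.

```python
import os
from pathlib import Path

# Hypothetical name and default, for illustration only.
DEFAULT_DATA_ROOT = "data/default"


def data_root():
    """Environment override wins; otherwise fall back to the original literal."""
    return Path(os.environ.get("EXAMPLE_AI_DATA_ROOT", DEFAULT_DATA_ROOT))
```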
Touched files: .gitignore, fifteen ai/scripts/*.py files, docs/ai/scripts-env-vars.md (new), mkdocs.yml, docs/adr/0547-ai-script-env-vars.md (new), docs/adr/README.md, docs/state.md, docs/rebase-notes.md, changelog.d/changed/0547-ai-script-env-vars.md (new). No ffmpeg-patches/ change (no C-API, CLI flag, or meson_options.txt consumed by a patch — CLAUDE.md §12 r14 exempt).
chore/audit-cleanup-bundle-2 (ADR-0549)¶
No rebase impact. All changes are confined to fork-local files:
- core/src/feature/cuda/integer_{ssim,ms_ssim,psnr,moment}_cuda.c (comment addition only; no functional change, upstream parity intact).
- core/src/feature/sycl/integer_{ssim,ms_ssim,psnr,moment}_sycl.cpp (comment addition only).
- dev/Containerfile (fork-local; the whole file is fork-added).
- .gitignore (adds a .claude/worktrees/ line; no upstream conflict).
- docs/state.md (fork-only doc tree).
- python/test/vmafexec_test.py (comment deletion only; the assertion value and places argument are unchanged, so no golden-gate impact).
- docs/adr/0549-audit-cleanup-bundle-2.md, docs/adr/README.md, changelog.d/changed/0549-audit-cleanup-bundle-2.md, docs/rebase-notes.md (this entry).
Touched files: core/src/feature/cuda/integer_{ssim,ms_ssim,psnr,moment}_cuda.c, core/src/feature/sycl/integer_{ssim,ms_ssim,psnr,moment}_sycl.cpp, dev/Containerfile, .gitignore, docs/state.md, python/test/vmafexec_test.py, docs/adr/0549-audit-cleanup-bundle-2.md, docs/adr/README.md (one index row), changelog.d/changed/0549-audit-cleanup-bundle-2.md, docs/rebase-notes.md (this entry).
ADR-0546 — audit bundle (Vulkan-01 / saliency-tune-01 / ai-01)¶
No rebase impact for Vulkan-01: adding &vmaf_fex_integer_motion_vulkan_impl to feature_extractor_list[] in feature_extractor.c is a purely additive change under the existing #if HAVE_VULKAN guard. Netflix has no Vulkan backend, so upstream syncs produce zero conflicts.
No rebase impact for saliency-tune-01: all touched files (tools/vmaf-tune/src/vmaftune/saliency.py, tools/vmaf-tune/src/vmaftune/cli.py) are fork-local. No public C-API, no meson_options.txt entry, no ffmpeg-patches entry.
No rebase impact for ai-01: tools/vmaf-tune/src/vmaftune/predictor_train.py is fork-local. The --emit-stub-card-only flag is additive.
ADR-0550 -- Cross-backend parity matrix 2026-05-18¶
Touches: docs/adr/0550-cross-backend-parity-matrix-2026-05-18.md, docs/research/0550-cross-backend-parity-matrix-2026-05-18.md, docs/adr/README.md (index row), docs/state.md (audit-closed row), changelog.d/added/0550-cross-backend-parity-matrix.md, this file.
Rebase sensitivity (none -- docs-only, no C source): All touched files are under docs/ and changelog.d/. No libvmaf C source, no public header, no meson_options.txt, no ffmpeg-patches/ entry. No numerical-correctness risk: this is a read-only audit that produces only documentation artefacts.
ADR-0561 — HIP gfx_targets fallback widening¶
Branch: fix/hip-gfx-targets-fallback-widening
Rebase impact: low. Touches only core/src/meson.build, core/meson_options.txt, and docs/backends/hip/overview.md. No kernel code, no public API change.
Rebase-sensitive invariant: The fallback string 'gfx90a,gfx1030,gfx1036,gfx1100' at the end of the four-step probe chain in core/src/meson.build must not regress to 'gfx90a' alone. If a meson.build rebase conflict arises in that region, prefer the wider fallback. The comment block above the fallback explains the rationale.
Touched files: core/src/meson.build (fallback string + comment), core/meson_options.txt (description update), docs/backends/hip/overview.md (-Dhip_gfx_targets section), docs/adr/0561-hip-gfx-targets-fallback-widening.md, docs/adr/README.md (one index row), docs/research/0561-hip-gfx-targets-fallback-widening.md, docs/state.md (Recently-closed row T-HIP-GFX-TARGETS-FALLBACK-2026-05-18), changelog.d/fixed/0561-hip-gfx-targets-fallback-widening.md, docs/rebase-notes.md (this entry).
ADR-0562 — VCQ-223 local-explainer hang fix¶
Rebase impact: none. The change is entirely fork-local — all modified files live under python/ (Python wrapper test harness) and docs/. No upstream Netflix/vmaf C file is touched, no ffmpeg-patches file is touched, no public core/include/ header is touched, no meson_options.txt key is added.
Touched files: python/vmaf/core/quality_runner_extra.py, python/test/local_explainer_test.py, docs/adr/0562-local-explainer-hang-fix.md, docs/adr/README.md (one index row), docs/state.md (Recently-closed row), changelog.d/fixed/vcq-223-local-explainer-hang.md, docs/rebase-notes.md (this entry).
ADR-0559 — Feature coverage audit: speed_chroma + speed_temporal in extraction scripts¶
Rebase impact: minimal. Changes are entirely fork-local: the modified files are ai/data/feature_extractor.py, ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/extract_full_features.py (docstring only), and files under docs/. No upstream Netflix/vmaf file is touched, no ffmpeg-patches file is touched, no public core/include/ header is touched, no meson_options.txt key is added.
If a future upstream sync adds speed_chroma or speed_temporal to the upstream FULL_FEATURES equivalent, this fork's tuple will have them already; check for duplicates on merge.
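One way to resolve such a merge and check the result for duplicates is sketched below. The helpers and the example feature names are illustrative assumptions, not code from ai/data/feature_extractor.py.

```python
from collections import Counter


def merge_feature_tuples(fork, upstream):
    """Union two feature tuples: fork order first, upstream-only names appended."""
    merged = list(fork)
    for name in upstream:
        if name not in merged:
            merged.append(name)
    return tuple(merged)


def find_duplicates(features):
    """Names that appear more than once (should be empty after a merge)."""
    return [name for name, count in Counter(features).items() if count > 1]
```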
Touched files: ai/data/feature_extractor.py (FULL_FEATURES + _METRIC_TO_EXTRACTOR), ai/scripts/bvi_dvc_to_full_features.py (local FULL_FEATURES + EXTRACTORS), ai/scripts/extract_full_features.py (docstring only), docs/ai/models/konvid_mos_head_v1.md (coverage-gap note), docs/adr/0559-feature-coverage-audit.md, docs/adr/README.md (one index row), docs/research/feature-coverage-audit-2026-05-18.md, changelog.d/added/0559-feature-coverage-audit-speed-features.md, docs/rebase-notes.md (this entry).
ADR-0566 — HIP VIF per-feature places=4 gate (supersedes ADR-0537 follow-up)¶
Branch: fix/hip-vif-svm-amplification-places4-gate
Rebase impact: documentation-only. Touches only docs/adr/, docs/state.md, docs/rebase-notes.md, and changelog.d/. No kernel code, no meson.build change, no public API change.
Rebase-sensitive invariant: If ADR-0537 is amended in a future PR, ensure the "places=3 is acceptable" follow-up clause is not reintroduced. The supersession is recorded in ADR-0566 and in the Recently-closed row T-HIP-VIF-PLACES3-GATE-INCORRECT-2026-05-18.
Touched files: docs/adr/0566-hip-vif-per-feature-places4-gate.md, docs/adr/README.md (one index row), docs/state.md (Recently-closed row T-HIP-VIF-PLACES3-GATE-INCORRECT-2026-05-18), changelog.d/fixed/0566-hip-vif-per-feature-places4-gate.md, docs/rebase-notes.md (this entry).
ADR-0552 — HIP VIF deterministic wavefront reduction¶
Branch: fix/hip-vif-deterministic-reduce
Rebase impact: low. Only touches core/src/feature/hip/integer_vif/vif_statistics.hip and documentation files. No public API change. No meson.build change.
Rebase-sensitive invariant: The wavefront_reduce_i64 helper uses __shfl_xor with strides 32, 16, 8, 4, 2, 1 (for AMD 64-lane wavefronts). Do NOT merge with a CUDA-style __shfl_down_sync port that uses strides 16, 8, 4, 2, 1 (32-lane) — the stride list is wrong for AMD and will under-reduce, leaving 32-thread partial sums in the accumulator.
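The under-reduction can be reproduced with a host-side simulation of the butterfly. This is a Python model, not HIP code. Each step adds the value held by lane i XOR stride, which is what a __shfl_xor-based reduction does. The 32-lane case uses the same XOR pattern, because the defect is the missing stride-32 step, not the shuffle variant.

```python
def xor_butterfly_reduce(lanes, strides):
    """Simulate a shuffle-XOR reduction: every lane adds its partner's value per step."""
    vals = list(lanes)
    for s in strides:
        vals = [vals[i] + vals[i ^ s] for i in range(len(vals))]
    return vals


lanes = list(range(64))  # one value per lane of a 64-wide AMD wavefront
amd = xor_butterfly_reduce(lanes, (32, 16, 8, 4, 2, 1))
cuda_style = xor_butterfly_reduce(lanes, (16, 8, 4, 2, 1))
```

With all six strides, every lane ends up holding the full sum (2016). Without stride 32, lane 0 holds only the sum of lanes 0..31 (496) and lane 32 holds the other half (1520). Only lane 0 feeds the atomicAdd, so half the wavefront's contribution is lost.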
Conflict scenario: If a rebase brings in a change to vif_statistics.hip from the CUDA parity sweep or a vif_hori_16_body template refactor, verify that:
- The outer if (x < w && y < h) guard is preserved (not replaced by an early return).
- wavefront_reduce_accums(thr) is called before the atomicAdd block.
- The atomicAdd block is inside if ((threadIdx.x % AMD_WAVEFRONT_SIZE) == 0).
Touched files: core/src/feature/hip/integer_vif/vif_statistics.hip, docs/adr/0552-hip-integer-vif-deterministic-reduce.md, docs/adr/README.md (one index row), docs/research/0552-hip-vif-deterministic-reduce.md, docs/state.md (Recently-closed row T-HIP-VIF-PARITY-PLACES4-2026-05-18), changelog.d/fixed/0552-hip-vif-deterministic-reduce.md, docs/rebase-notes.md (this entry).
fix/python-mcp-ai-audit-p0-p1-2026-05-18 (ADR-0556)¶
Python / MCP / AI silent-fallback audit. No upstream-shared paths modified; all fixes are in fork-local Python harness files (tools/vmaf-tune/, mcp-server/, ai/scripts/) or documentation. No rebase-sensitive invariants introduced — the score.py JSONDecodeError guard is a pure additive safety wrapper around an existing json.load call, and the bvi_dvc_to_full_features.py empty-entries guards are early-exits before any loop body runs. Files touched: tools/vmaf-tune/src/vmaftune/score.py, tools/vmaf-tune/src/vmaftune/auto.py, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/validate_model_registry.py, docs/adr/0556-python-mcp-ai-audit-2026-05-18.md, docs/adr/README.md (one index row), docs/research/python-mcp-ai-audit-2026-05-18.md, changelog.d/fixed/0556-python-mcp-ai-audit.md, docs/state.md (5 T-rows added to Open section),
ADR-0568 — upstream port USE_DIRECT_READ zero-copy input path (Netflix/vmaf@30a6e2a8d)¶
Branch: chore/upstream-port-direct-read-and-speed-wrappers
Rebase impact: low. All touched files are upstream-shared paths that the upstream commit also modifies. On the next sync-upstream, these changes should merge cleanly because the fork's version is a strict superset of the upstream diff (same logic, plus fork style conventions: (void)fprintf, explicit (int) casts on fread returns, memcmp(…) != 0).
Rebase-sensitive invariant: The new fetch_into_vmaf_picture vtable field is the last member of video_input_vtbl. Any future vtable extension must append after it or update both the vtable struct and all initialisers (YUV_INPUT_VTBL, Y4M_INPUT_VTBL) simultaneously. The upstream port order is: open_raw, open, get_info, fetch_frame, close, fetch_into_vmaf_picture.
Touched files: core/tools/vidinput.h, core/tools/vidinput.c, core/tools/yuv_input.c, core/tools/y4m_input.c, core/tools/vmaf.c, docs/adr/0567-upstream-port-direct-read.md, docs/adr/README.md (one index row), changelog.d/perf/0567-upstream-port-direct-read.md,
ADR-0568 — sycl_icpx_aot_targets default¶
Rebase impact: low. Adds a new sycl_icpx_aot_targets string option to core/meson_options.txt and wires the corresponding AOT flags in core/src/meson.build. Any upstream Netflix/vmaf change that also touches those two files will produce a trivial two-hunk conflict. Resolution: preserve both the upstream hunk and the fork-added sycl_icpx_aot_targets option + toolchain-flag block. No public API header is touched; no ffmpeg-patches file is touched.
Touched files: core/meson_options.txt (new option), core/src/meson.build (AOT flag wiring, icpx branch), docs/backends/sycl/overview.md (new AOT section), dev/Containerfile (doc comment near meson invocation), docs/adr/0568-sycl-icpx-aot-targets-default.md (new ADR), docs/adr/README.md (one index row), changelog.d/added/0568-sycl-icpx-aot-targets-default.md, docs/state.md (no-bug note),
chore/sdk-version-bumps-may-2026 (ADR-0569)¶
No rebase-sensitive invariants. All changes are version-string edits in dev/Containerfile ARG lines, .pre-commit-config.yaml rev fields, .github/workflows/supply-chain.yml action SHA pins, and python/requirements.txt ceiling. No API changes, no C/Python logic changes.
Conflict scenarios:
- dev/Containerfile: The Ubuntu 26.04 base-image PR (in-flight) touches different ARG blocks. If a rebase conflict occurs, keep both sets of version edits; they are in disjoint sections of the file.
- .pre-commit-config.yaml: If a concurrent PR bumps the same tools, prefer the higher version.
- python/requirements.txt: If the Ubuntu 26.04 PR widens the numpy ceiling in the same PR, both edits are independent; apply both.
Touched files: dev/Containerfile, .pre-commit-config.yaml, .github/workflows/supply-chain.yml, python/requirements.txt, ai/pyproject.toml (comment only), docs/adr/0569-sdk-version-bumps-2026-05-18.md, docs/adr/README.md (one index row), changelog.d/changed/0569-sdk-version-bumps-2026-05-18.md, dev/AGENTS.md (invariant notes), docs/development/dev-mcp.md (version table),
ADR-0574 — CUDA twins for HDR-model aim and adm3 sub-features (Phase 1)¶
Branch: feat/hdr-features-cuda-twins
Rebase impact: CUDA kernel and host files only. Touches core/src/feature/cuda/float_adm/float_adm_score.cu and core/src/feature/cuda/float_adm_cuda.c. No meson.build change, no public C-API change, no Python/model change.
Rebase-sensitive invariant: FADM_ACCUM_SLOTS = 9 must remain identical in both files. The .cu unit defines the per-WG slot layout ([0..2]=csf_den, [3..5]=cm_num, [6..8]=aim_cm); the .c host uses it for buffer allocation, D2H copy size, and accumulator reads. If a rebase replaces the .cu with a pre-ADR-0574 version (FADM_ACCUM_SLOTS = 6), update float_adm_cuda.c to match in the same commit — a mismatch silently corrupts host memory. The --fmad=false nvcc flag covers all six kernels; do not remove it.
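The corruption risk can be illustrated with index arithmetic. The sketch assumes a per-work-group-contiguous layout (wg * FADM_ACCUM_SLOTS + slot). The base offsets come from the slot layout above. The helper names and the meaning of the sub-index k are assumptions.

```python
FADM_ACCUM_SLOTS = 9          # must match in the .cu and the .c host file
CSF_DEN, CM_NUM, AIM_CM = 0, 3, 6  # base offsets of [0..2], [3..5], [6..8]


def slot_index(wg, base, k, slots=FADM_ACCUM_SLOTS):
    """Flat index of accumulator k (0..2) in a group for work-group wg."""
    if not 0 <= k < 3:
        raise ValueError("each accumulator group has three slots")
    return wg * slots + base + k


def host_buffer_len(num_wg, slots=FADM_ACCUM_SLOTS):
    """Number of accumulators the host allocates and copies back."""
    return num_wg * slots
```

With four work-groups, the kernel's last write lands at index 35. That fits a 9-slot host buffer of 36 entries, but a stale 6-slot host buffer holds only 24, so the write runs past the allocation.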
Touched files: core/src/feature/cuda/float_adm/float_adm_score.cu, core/src/feature/cuda/float_adm_cuda.c, docs/adr/0574-hdr-features-cuda-twins-phase-1.md, docs/adr/README.md (one index row), core/src/feature/cuda/AGENTS.md (slot-sync invariant note), docs/research/netflix-upstream-feature-additions-since-sync-2026-05-18.md, docs/metrics/features.md (aim/adm3 sub-feature docs + footnote ⁶), docs/state.md (Recently-closed row T-CUDA-AIM-ADM3-2026-05-18), changelog.d/added/0574-hdr-features-cuda-twins-aim-adm3.md, docs/rebase-notes.md (this entry).
ADR-0606 — macOS SIGSEGV deep-fix in output.c writers (PR #1403 follow-up)¶
Rebase-sensitive invariant: i >= fc->feature_vector[j]->capacity (not >) in all seven frame-iteration bounds checks in core/src/output.c. If upstream Netflix ever backports a fix to the same comparison sites and uses > (the old, buggy form), the rebase must preserve the >= — the > form is a heap buffer overread UB that surfaces on macOS with MALLOC_PERTURB_=198.
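The off-by-one in this Python model is easy to see. Both functions are illustrative, not the output.c code. In C, the buggy form lets i == capacity read one element past the allocation instead of skipping it.

```python
def frame_in_bounds(i, capacity):
    """Fixed form: skip when i >= capacity, so valid indices are 0..capacity-1."""
    return not (i >= capacity)


def frame_in_bounds_buggy(i, capacity):
    """Old form: skip only when i > capacity, which lets i == capacity through."""
    return not (i > capacity)
```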
The fps computation guard (if (vmaf->pic_cnt == 0 || timer_elapsed == 0)) in core/src/libvmaf.c is similarly rebase-sensitive: if upstream modifies the fps block, preserve the guard before dividing so import-only callers (those that use vmaf_import_feature_score without vmaf_read_pictures) do not produce 0.0/0.0 which may SIGFPE on Apple platforms.
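The guard has the shape below, modelled in Python. Returning 0.0 for the import-only case is an assumption about the fallback value. In Python the unguarded division raises ZeroDivisionError, which makes the guard easy to exercise in a test.

```python
def safe_fps(pic_cnt, timer_elapsed):
    """Frames per second, or 0.0 when no pictures were read or no time elapsed."""
    if pic_cnt == 0 or timer_elapsed == 0:
        return 0.0  # assumed fallback for import-only callers
    return pic_cnt / timer_elapsed
```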
Touched files: core/src/output.c (7 bounds-check sites, json pool-score + frames comma fixes), core/src/libvmaf.c (fps defensive computation), docs/adr/0606-macos-vmaf-write-output-segv-deep-fix.md, docs/adr/README.md (one index row), docs/state.md (Recently-closed row), changelog.d/fixed/0606-macos-vmaf-write-output-segv-deep-fix.md, docs/rebase-notes.md (this entry).
ADR-0607 — vmaf-tune compare: decode reference YUV once (shared-ref fix)¶
No rebase impact: all touched files are fork-local Python harness files. No upstream C sources, no public headers, no FFmpeg patch series involved.
Touched files: tools/vmaf-tune/src/vmaftune/compare.py (pre_decoded_ref param on compare_codecs and compare_codecs_sweep), tools/vmaf-tune/src/vmaftune/cli.py (decode-once block + try/finally in _run_compare; imports _decode_to_raw_yuv from .score), tools/vmaf-tune/tests/test_bbb_e2e_v15_shared_ref.py (7 acceptance tests), docs/adr/0607-vmaftune-shared-ref-yuv-decode-once.md, docs/adr/README.md (one index row), changelog.d/fixed/0607-vmaftune-shared-ref-yuv-decode-once.md,
ADR-0612 — Tiny-AI Netflix corpus training scaffold (2026-05-19 iteration)¶
No rebase-sensitive invariants introduced by this PR — all changes are documentation, research digest, and CHANGELOG fragment. No C/CUDA/SIMD paths modified; no loader or test code changed.
The one invariant worth noting for future rebases: the .workingdir2/netflix/ corpus path is local-only and gitignored. If a future rebase touches .gitignore, confirm that the *.yuv and .workingdir2/ entries remain in place. Training scripts must continue to accept --data-root as an explicit CLI flag rather than hard-coding the path.
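A training entry point can keep the corpus location explicit with a parser like the one below. It is a minimal sketch. The parser description and required=True are assumptions; a script might instead supply a documented default.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="hypothetical training entry point")
    # The corpus location is always supplied by the caller, never hard-coded.
    parser.add_argument("--data-root", type=Path, required=True,
                        help="local corpus directory, e.g. .workingdir2/netflix/")
    return parser.parse_args(argv)
```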
Touched files: docs/adr/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/adr/_index_fragments/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/adr/_index_fragments/_order.txt, docs/research/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/ai/training-data.md (cross-reference links), changelog.d/added/0612-tiny-ai-netflix-training-scaffold-2026-05-19.md, docs/rebase-notes.md (this entry).
ADR-0626 — SSH debug session on macOS CI failure (tmate)¶
No rebase-sensitive invariants — the change is limited to .github/workflows/libvmaf-build-matrix.yml (one new step) and docs/. No C, CUDA, SYCL, HIP, or Python paths touched.
If an upstream sync touches libvmaf-build-matrix.yml, confirm that the "SSH debug session on test failure" step is preserved after the merge and that its if: condition still references runner.os == 'macOS', failure(), and github.event_name == 'workflow_dispatch'. The action SHA pin (c0afd6f790e3a5564914980036ebf83216678101) will be bumped automatically by Renovate when a new mxschmitt/action-tmate release is tagged.
Touched files: .github/workflows/libvmaf-build-matrix.yml (one new step after "Run tests"), docs/development/ci-tmate-debug.md (new operator guide), docs/adr/0626-macos-ci-tmate-debug-on-failure.md, docs/adr/README.md (one index row), changelog.d/added/0626-macos-ci-tmate-debug-on-failure.md, docs/rebase-notes.md (this entry).
ADR-0628 — Remote-aware ADR number allocator
No rebase impact: all touched files are fork-local tooling and CI configuration. No upstream C sources, public headers, or FFmpeg patch series involved.
scripts/adr/next-free.sh is a fork-added script with no upstream analogue; it will never conflict on a Netflix upstream sync. The .github/workflows/rule-enforcement.yml change adds a new step to an existing job; this file does not exist upstream, so no conflict is expected. The CLAUDE.md update extends §12 r8 prose only.
Touched files: scripts/adr/next-free.sh (remote-aware allocator + .git/adr-claims/ side-pointer), scripts/adr/tests/test-next-free-remote-aware.sh (new acceptance tests), .github/workflows/rule-enforcement.yml (phase-2 open-PR collision check), CLAUDE.md (§12 r8 extended allocator description), docs/adr/0628-adr-allocator-remote-aware.md, docs/adr/README.md (one index row), changelog.d/fixed/0628-adr-allocator-remote-aware.md,
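scripts/adr/next-free.sh is a shell script. The Python below only sketches the union-then-increment idea behind a remote-aware allocator. The three inputs are assumptions about how the script combines its sources: local ADR paths, ADR paths seen on remote branches or open PRs, and numbers recorded under .git/adr-claims/. The max-plus-one rule is also an assumption. The function names are hypothetical.

```python
import re
from typing import Iterable, Set

# Matches 4-digit ADR prefixes such as docs/adr/0628-adr-allocator-remote-aware.md.
_ADR_RE = re.compile(r"(?:^|/)(\d{4})-[^/]*\.md$")


def adr_numbers(paths: Iterable[str]) -> Set[int]:
    """Collect ADR numbers from file paths (local tree or remote branches)."""
    found = set()
    for path in paths:
        match = _ADR_RE.search(path)
        if match:
            found.add(int(match.group(1)))
    return found


def next_free(local: Iterable[str], remote: Iterable[str], claims: Iterable[int]) -> str:
    """Return the next ADR number as a zero-padded string.

    Assumption: "next free" means one past the highest number seen in any source.
    """
    used = adr_numbers(local) | adr_numbers(remote) | set(claims)
    return f"{max(used, default=0) + 1:04d}"
```

Taking the union before incrementing is what keeps two parallel branches from picking the same number: a number claimed only on a remote branch still counts as used.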
ADR-0608 — MCP P0 fixes: isError, probe_backend, vmaf_version, vmaf_score_encoded
No rebase impact: all touched files are fork-local MCP server Python files and docs. No upstream C sources, no public headers, no FFmpeg patch series involved.
Touched files: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py (isError fix in _call_tool; new _probe_backend, _vmaf_version, _run_vmaf_score_encoded, _ffprobe_geometry, _decode_to_yuv functions; three new Tool registrations), mcp-server/vmaf-mcp/tests/test_mcp_p0_adr0608.py (11 new regression tests), mcp-server/vmaf-mcp/tests/test_smoke_e2e.py (2 test updates for new behavior), mcp-server/vmaf-mcp/README.md (tools table updated to 10 tools, 6 backends), docs/mcp/tools.md (new tool sections, corrected list_backends description and response body, updated error conventions table), docs/adr/0608-mcp-p0-iserror-and-probe-version-encoded.md, docs/adr/README.md (one index row), changelog.d/fixed/adr0608-mcp-p0-iserror-probe-version-encoded.md, changelog.d/added/adr0608-mcp-probe-backend-vmaf-version-encoded.md, docs/rebase-notes.md (this entry).
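The isError fix separates two kinds of failure. A handler that fails still produces a tool result, flagged with isError. An unknown tool name raises, which is what test_smoke_e2e.py expects. The sketch below is a hypothetical dispatcher, not the server's _call_tool. Its result dict only loosely follows the MCP tool-result shape (a content list plus an isError flag).

```python
import json
from typing import Any, Callable, Dict


def call_tool(handlers: Dict[str, Callable[..., Any]], name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one tool call and report handler failures via isError."""
    if name not in handlers:
        # Unknown names are a protocol error, so they raise instead of
        # returning an isError result.
        raise ValueError(f"unknown tool: {name}")
    try:
        payload = handlers[name](**args)
    except Exception as exc:  # handler failure becomes a tool-level error
        return {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    return {"content": [{"type": "text", "text": json.dumps(payload)}], "isError": False}
```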
Master CI repair — DNN coverage, MCP smoke, and formatter drift (2026-05-19)
No rebase-sensitive invariants introduced — this PR is CI/test/doc hygiene only. No public C API, backend implementation, model artifact, FFmpeg patch, or Netflix golden-data assertion changed.
If a future rebase touches the DNN tiny-model smoke tests or any of the other surfaces listed below, preserve these contracts:
- core/test/dnn/test_cli.sh uses --tiny-resize bilinear for the nr_metric_v1.onnx no-reference smoke because the shipped NR model is 224x224 and strict resize mode intentionally rejects the 576x324 fixture.
- The same CLI smoke caps tiny-model inference at --frame_cnt 1; it is a load/run smoke, not a full-clip numerical benchmark.
- mcp-server/vmaf-mcp/tests/test_smoke_e2e.py expects unknown tool names to raise, matching the ADR-0613 isError contract.
- The Netflix golden and lavapipe parity gates keep per-command timeout wrappers plus step-level timeout-minutes; if a runner/backend hangs, CI must fail diagnostically instead of waiting for the full job-level timeout.
- The Netflix golden CI lane intentionally invokes only QualityRunnerTest::test_run_vmaf_runner and QualityRunnerTest::test_run_vmaf_runner_checkerboard: together they cover the D24 normal pair plus the checkerboard 10-px and 1-px distorted pairs. Do not expand this lane into the broad Python quality/feature suites; those are separate test coverage, not the golden-data gate.
- Keep the D24 normal and checkerboard invocations as separate workflow steps; this preserves a clear failure surface when the 1080p checkerboard pair is slow or stuck. The normal-pair step still needs a multi-minute budget on cold GitHub-hosted runners because the Python runner invokes the feature binary with ADM/VIF/motion debug output. The normal-pair budget is 21 minutes inside a 22-minute step; lowering it back to 7 or 11 minutes timed out on cold GitHub-hosted runners on 2026-05-19 before assertions completed.
- The lavapipe Vulkan VIF cross-backend lane has the same cold-runner constraint. Keep the required VIF step at a 15-minute command wrapper inside a 16-minute step; the old 8-minute wrapper timed out on GitHub-hosted Ubuntu before the diff script could report a result.
- python/vmaf/routine.py::run_test_on_dataset() only passes bootstrap stats kwargs when the runner exposes the full set of bootstrap score-key getters. Normal VMAF and PSNR runners do not have get_bagging_score_key() / CI95 / all-model prediction fields; macOS tox exercises those normal runners through run_testing.py, so do not reintroduce unconditional bootstrap-key access.
- Python doctests under python/vmaf/tools/ must not rely on platform/version scalar reprs or assertion traceback formatting. NumPy 2 can display scalar values as np.float64(...), and Python 3.14 can append assert-expression details; keep examples explicit with float(...), string formatting, or first-line exception-message printing.
- core/src/thread_locale.c uses duplocale(LC_GLOBAL_LOCALE) as the base for newlocale(LC_NUMERIC_MASK, "C", base). Do not restore newlocale(LC_ALL_MASK, "C", NULL): macOS allocator poisoning can expose poisoned internal locale pointers as SIGSEGV in the output-writer tests.
- core/test/test_output.c must not include libvmaf.c or output.c directly while also linking libvmaf. Use core/src/libvmaf_priv.h::vmaf_feature_collector_get() for internal collector access instead; duplicate implementation TUs have crashed Apple ld64 + LTO macOS jobs under allocator poisoning.
- core/src/dnn/model_loader.c::vmaf_dnn_sidecar_load() rejects oversized sidecars with stat() before opening them. Preserve this cheap metadata-only guard so the test_vmaf_use_tiny_model oversized-sidecar case does not enter platform stdio on the expected -EFBIG path. The regression test copies model/tiny/smoke_v0.onnx rather than synthesising an invalid ONNX blob; that keeps a missed sidecar gate as an assertion failure instead of an ORT invalid-model crash on macOS.
- core/src/output.c flushes each writer's stream before calling vmaf_thread_locale_pop(). Keep the flush inside the locale lifetime: path-based vmaf_write_output() uses fdopen() and may otherwise defer the stream flush to fclose() after the temporary C numeric locale has been restored/freed, which is the macOS-only writer SIGSEGV shape.
- core/test/meson.build defines libvmaf_public_link so public ABI tests link the shared library when default_library=both. Do not route test_public_api_score or test_vmaf_use_tiny_model back through libvmaf.get_static_lib() on macOS: Apple ld64 + LTO folds the public call into the test executable and reproduces the writer/DNN SIGSEGV shape. The internal test_output target keeps its static link for vmaf_feature_collector_get() but disables LTO for that target on Darwin only; Linux clang static-archive links still need -flto because src/libvmaf.a contains LLVM bitcode there.
- python/vmaf/core/asset.py::ORDERED_FILTER_LIST includes fps and format between pad and gblur. Keep that order stable: it controls both FFmpeg preprocessing command composition and the slugified Asset string identity. The corresponding properties are fps_cmd, ref_fps_cmd, and dis_fps_cmd; format is intentionally accessed via get_filter_cmd("format", target) like the generic filter-only keys.
- core/src/feature/feature_extractor.h includes the generated config.h before defining struct VmafFeatureExtractor. Keep that include in the header, not just in selected consumer TUs: backend-enabled LTO builds need every extractor definition and the registry to see identical HAVE_CUDA/HAVE_SYCL/HAVE_VULKAN macro state, otherwise GCC emits -Wlto-type-mismatch and may misoptimise extractor globals.
- core/src/feature/common/macros.h::FORCE_INLINE already expands to an inline specifier on GCC/Clang. Do not re-add a second literal inline to CSF / CAMBI / motion helper declarations; Clang reports the duplicate specifier throughout the build matrix.
- core/tools/vmaf.c::fetch_picture() owns a preallocated picture slot as soon as vmaf_fetch_preallocated_picture() succeeds. Preserve the EOF/read-error cleanup that unrefs that slot before returning 1 or -1, and preserve the run_frame_loop() cleanup for the opposite side when only one input read succeeded. Without both, the CLI can score and write output successfully, then hang forever in vmaf_close() while the picture pool waits for unread slots to return.
- scripts/ci/cross_backend_{parity_gate,vif_diff}.py must keep the backend-specific extractor alias ("adm", "vulkan") -> "integer_adm_vulkan". ADR-0586 renamed Vulkan integer ADM to the canonical extractor name while CPU/CUDA/SYCL retained the historical adm, adm_cuda, and adm_sycl names. Dropping the alias makes the lavapipe parity gate invoke the retired adm_vulkan compatibility name and fail before comparing scores.
- core/src/feature/common/convolution_avx512.c vertical scanlines must use _mm512_loadu_ps/_mm512_storeu_ps. MAX_ALIGN is 32 bytes, not 64 bytes; the stride can be a 64-byte multiple while the row base is still only 32-byte aligned. Reintroducing aligned AVX-512 memory ops can crash float_vif on AVX-512-capable CPU runners.
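The routine.py bootstrap rule is an all-or-nothing capability check. Before building bootstrap kwargs, the runner must expose every getter. The sketch below is hypothetical. Only get_bagging_score_key comes from the note above; the other three getter names are placeholders for the rest of the set, not confirmed vmaf APIs.

```python
from typing import Any, Dict

# get_bagging_score_key comes from the note above; the other getter names are
# hypothetical placeholders for the rest of the bootstrap score-key set.
BOOTSTRAP_GETTERS = (
    "get_bagging_score_key",
    "get_stddev_score_key",
    "get_ci95_low_score_key",
    "get_ci95_high_score_key",
)


def bootstrap_stats_kwargs(runner_class: type) -> Dict[str, Any]:
    """Return bootstrap kwargs only if the runner exposes every getter."""
    getters = [getattr(runner_class, name, None) for name in BOOTSTRAP_GETTERS]
    if not all(callable(g) for g in getters):
        return {}  # normal VMAF / PSNR runners: pass nothing bootstrap-related
    return {name[len("get_"):]: g() for name, g in zip(BOOTSTRAP_GETTERS, getters)}
```

A runner that implements only some of the getters is treated like a plain runner. An unconditional attribute access would instead fail on the macOS tox path.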
Touched files: .github/workflows/tests-and-quality-gates.yml, docs/development/zed-migration-plan-2026-05-19.md, docs/metrics/features.md, docs/usage/python.md, core/src/dnn/AGENTS.md, core/src/dnn/model_loader.c, core/src/dnn/ort_backend.c, core/src/AGENTS.md, core/src/feature/AGENTS.md, core/src/feature/adm_csf_tools.h, core/src/feature/arm64/moment_sve2.c, core/src/feature/arm64/psnr_hvs_neon.c, core/src/feature/arm64/ssimulacra2_host_neon.c, core/src/feature/arm64/ssimulacra2_neon.c, core/src/feature/arm64/ssimulacra2_sve2.c, core/src/feature/barten_csf_tools.h, core/src/feature/cambi.c, core/src/feature/feature_dists.c, core/src/feature/feature_extractor.h, core/src/feature/feature_lpips.c, core/src/feature/feature_mobilesal.c, core/src/feature/fastdvdnet_pre.c, core/src/feature/integer_motion.c, core/src/feature/motion_blend_tools.h, core/src/feature/ssimulacra2.c, core/src/feature/transnet_v2.c, core/src/feature/vulkan/adm_vulkan.c, core/src/feature/vulkan/cambi_vulkan.c, core/src/feature/vulkan/float_vif_vulkan.c, core/src/feature/vulkan/integer_adm_vulkan.c, core/src/feature/vulkan/ssimulacra2_vulkan.c, core/src/feature/vif_tools.c, core/src/feature/x86/psnr_hvs_avx2.c, core/src/feature/x86/ssimulacra2_avx2.c, core/src/feature/x86/ssimulacra2_avx512.c, core/src/feature/x86/ssimulacra2_host_avx2.c, core/src/feature/x86/vif_avx512.c, core/src/libvmaf.c, core/src/libvmaf_priv.h, core/src/framesync.c, core/src/model.c, core/src/output.c, core/src/picture.c, core/src/thread_locale.c, core/src/vulkan/vma_impl.cpp, core/tools/AGENTS.md, core/tools/vmaf.c, core/test/AGENTS.md, core/test/dnn/test_cli.sh, core/test/dnn/test_dnn_session_api.c, core/test/dnn/test_model_loader.c, core/test/dnn/test_ort_internals.c, core/test/dnn/test_tensor_io.c, core/test/dnn/test_vmaf_use_tiny_model.c, core/test/dnn/meson.build, core/test/meson.build, core/test/test_feature_extractor.c, core/test/test_framesync.c, core/test/test_model.c, core/test/test_output.c, 
core/test/test_predict.c, core/test/test_psnr_hvs_simd.c, core/test/test.c, mcp-server/vmaf-mcp/tests/test_smoke_e2e.py, python/test/asset_test.py, python/vmaf/core/asset.py, core/tools/vmaf_bench.c, core/src/feature/common/AGENTS.md, core/src/feature/common/convolution.h, core/src/feature/common/convolution_avx512.c, core/test/test_vif_simd.c, scripts/ci/AGENTS.md, scripts/ci/cross_backend_parity_gate.py, scripts/ci/cross_backend_vif_diff.py, scripts/ci/test_cross_backend_feature_names.py, docs/state.md, changelog.d/fixed/master-ci-dnn-mcp-coverage-2026-05-19.md, docs/rebase-notes.md (this entry).
ADR-0640 — Tiny-AI Netflix corpus training scaffold (2026-05-20 iteration)
No rebase impact on upstream C sources or FFmpeg patches. All touched files are fork-local docs, Python test infrastructure, and the changelog fragment tree.
Key invariants (track when upstream Netflix/vmaf adds its own training surface):
- .workingdir2/netflix/ is gitignored and the 37 GB corpus is never committed. The --data-root flag (or VMAF_DATA_ROOT env var) is the mandatory CLI interface; any training script that hard-codes the corpus path violates this invariant.
- mcp-server/vmaf-mcp/tests/test_smoke_e2e.py runs against committed fixtures only (python/test/resource/yuv/src01_hrc00_576x324.yuv). Do not change the smoke test to reference .workingdir2/netflix/.
- Architecture selection and the actual training run are deferred to a follow-up PR; do not trigger training from the scaffold branch.
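A training entry point that honours the --data-root / VMAF_DATA_ROOT invariant might resolve its corpus root as sketched below. The function name is hypothetical. The precedence of the flag over the environment variable is an assumption, not something the note specifies.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_data_root(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Resolve the corpus root from --data-root or VMAF_DATA_ROOT, never a baked-in path."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Assumption: an explicit flag takes precedence over the environment variable.
    parser.add_argument("--data-root", default=env.get("VMAF_DATA_ROOT"))
    args = parser.parse_args(argv)
    if not args.data_root:
        parser.error("pass --data-root or set VMAF_DATA_ROOT")  # exits with status 2
    return Path(args.data_root)
```

There is deliberately no fallback to .workingdir2/netflix/. A missing root is a usage error rather than a silent default.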
Touched files: docs/adr/0640-tiny-ai-netflix-training-scaffold-2026-05-20.md, docs/adr/_index_fragments/0640-tiny-ai-netflix-training-scaffold-2026-05-20.md, docs/adr/_index_fragments/_order.txt, docs/research/0615-tiny-ai-netflix-training-2026-05-20.md, docs/ai/training-data.md (See also section extended), changelog.d/added/0640-tiny-ai-netflix-training-scaffold-2026-05-20.md, docs/rebase-notes.md (this entry).
ADR-0643 — vmaf-tune encoder-profile report contract
No rebase impact on upstream libvmaf C sources. This change touches fork-local vmaf-tune Python code, docs, tests, and the FFmpeg patch stack. The FFmpeg integration is advisory CLI glue only.
Key invariants:
- ReportData.to_dict() embeds encoder_profile.schema == "vmaftune.encoder_profile.v1". Future report-shape changes should be additive or should bump the profile schema.
- vmaf-tune encode-profile must read raw JSON, HTML, and Markdown reports. HTML raw JSON is escaped in <pre> and intentionally unescaped before parsing.
- The profile reader selects one recommendation by --codec, --target-vmaf, and/or --recommendation-index; it must not implicitly encode every codec or ladder rung.
- FFmpeg patch 0015-vmaf-tune-profile-cli-glue.patch stays advisory. Do not duplicate vmaf-tune's JSON/profile selection logic in FFmpeg.
- FFmpeg 8.x.x base: upstream tags were fetched on 2026-05-20 and the latest released 8.x.x tag was n8.1.1 (n8.2-dev is a dev tag). The full ffmpeg-patches/000*-*.patch series replayed cleanly against a temporary pristine n8.1.1 worktree.
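The read path described above has three steps: unescape the JSON embedded in an HTML <pre>, check the profile schema, then narrow to exactly one recommendation. The sketch below is hypothetical, not encoder_profile.py. The codec and target_vmaf keys are assumed field names, Markdown reports are not handled, and applying the index to the filtered matches is an assumption.

```python
import html
import json
import re
from typing import Any, Dict, List, Optional

PROFILE_SCHEMA = "vmaftune.encoder_profile.v1"
_PRE_RE = re.compile(r"<pre[^>]*>(.*?)</pre>", re.DOTALL)


def load_report(text: str) -> Dict[str, Any]:
    """Parse a raw JSON report, or the escaped JSON inside an HTML <pre> block."""
    if text.lstrip().startswith("{"):
        data = json.loads(text)
    else:
        match = _PRE_RE.search(text)
        if match is None:
            raise ValueError("no embedded report JSON found")
        data = json.loads(html.unescape(match.group(1)))  # undo <pre> escaping
    schema = data.get("encoder_profile", {}).get("schema")
    if schema != PROFILE_SCHEMA:
        raise ValueError(f"unsupported encoder_profile schema: {schema!r}")
    return data


def select_recommendation(
    recs: List[Dict[str, Any]],
    codec: Optional[str] = None,
    target_vmaf: Optional[float] = None,
    index: Optional[int] = None,
) -> Dict[str, Any]:
    """Pick exactly one recommendation; never fan out to every codec or rung."""
    matches = [
        r for r in recs
        if (codec is None or r.get("codec") == codec)
        and (target_vmaf is None or r.get("target_vmaf") == target_vmaf)
    ]
    if index is not None:
        return matches[index]
    if len(matches) != 1:
        raise ValueError(f"{len(matches)} recommendations match; narrow the selection")
    return matches[0]
```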
Touched files: tools/vmaf-tune/src/vmaftune/report.py, tools/vmaf-tune/src/vmaftune/encoder_profile.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_report.py, tools/vmaf-tune/tests/test_encoder_profile.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-ffmpeg.md, ffmpeg-patches/0015-vmaf-tune-profile-cli-glue.patch, ffmpeg-patches/series.txt, ffmpeg-patches/README.md, docs/adr/0643-vmaf-tune-encoder-profile-contract.md, docs/adr/_index_fragments/0643-vmaf-tune-encoder-profile-contract.md, docs/adr/_index_fragments/_order.txt, docs/research/0643-vmaf-tune-encoder-profile-contract.md, changelog.d/added/vmaf-tune-encoder-profile.md, docs/rebase-notes.md (this entry).
ADR-0644 — vmaf-tune codec runtime variants
No upstream Netflix C-source rebase impact. The change is confined to the fork-local tools/vmaf-tune Python CLI/report schema, usage docs, and ADR metadata.
Key invariants:
- ADAPTER@VARIANT is a compare display token. The base ADAPTER still routes through the codec-adapter registry and the FFmpeg -c:v encoder name.
- --encoder-ffmpeg-bin TOKEN=PATH is an exact-token binding. Unknown binding keys are rejected rather than silently falling back to the global --ffmpeg-bin.
- Compare JSON/CSV rows now include adapter, runtime_variant, and ffmpeg_bin. Keep these fields together if future schema work touches COMPARE_ROW_KEYS.
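The token and binding rules can be sketched as three small helpers, all hypothetical: split_token, parse_bindings, and ffmpeg_for are illustrative names, not functions from encoder_runtime.py. The key behaviour is exact-token matching. A binding for libx265@10bit does not apply to plain libx265, and a key naming no compared token is an error.

```python
from typing import Dict, Iterable, Optional, Tuple


def split_token(token: str) -> Tuple[str, Optional[str]]:
    """Split ADAPTER@VARIANT; the bare ADAPTER is what the registry sees."""
    adapter, sep, variant = token.partition("@")
    if not adapter or (sep and not variant):
        raise ValueError(f"malformed codec token: {token!r}")
    return adapter, (variant if sep else None)


def parse_bindings(pairs: Iterable[str], compared_tokens: Iterable[str]) -> Dict[str, str]:
    """Parse repeated --encoder-ffmpeg-bin TOKEN=PATH values by exact token."""
    known = set(compared_tokens)
    bindings: Dict[str, str] = {}
    for pair in pairs:
        token, sep, path = pair.partition("=")
        if not sep or not path:
            raise ValueError(f"expected TOKEN=PATH, got {pair!r}")
        if token not in known:
            # Reject instead of silently falling back to the global --ffmpeg-bin.
            raise ValueError(f"unknown binding key: {token!r}")
        bindings[token] = path
    return bindings


def ffmpeg_for(token: str, bindings: Dict[str, str], global_bin: str) -> str:
    # Exact match only: a binding for "x@v" never applies to plain "x".
    return bindings.get(token, global_bin)
```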
Touched files: tools/vmaf-tune/src/vmaftune/encoder_runtime.py, tools/vmaf-tune/src/vmaftune/compare.py, tools/vmaf-tune/src/vmaftune/cli.py, tools/vmaf-tune/tests/test_encoder_runtime.py, tools/vmaf-tune/tests/test_compare.py, tools/vmaf-tune/tests/test_compare_no_bisect.py, tools/vmaf-tune/tests/test_compare_rate_quality_sweep.py, tools/vmaf-tune/AGENTS.md, docs/usage/vmaf-tune.md, docs/usage/vmaf-tune-codec-adapters.md, docs/adr/0644-vmaf-tune-codec-runtime-variants.md, docs/adr/_index_fragments/0644-vmaf-tune-codec-runtime-variants.md, docs/adr/_index_fragments/_order.txt, docs/research/0644-vmaf-tune-codec-runtime-variants.md, changelog.d/added/0644-vmaf-tune-codec-runtime-variants.md, docs/rebase-notes.md (this entry).
ADR-0645 — Integer ADM p-norm SIMD callback ABI
When rebasing any upstream change that touches integer ADM contrast-measure callbacks, keep adm_p_norm threaded through the scalar and x86 SIMD twins.
Touched ABI group: core/src/feature/integer_adm.c, core/src/feature/x86/adm_avx2.c, core/src/feature/x86/adm_avx512.c, core/src/feature/x86/adm_avx2.h, core/src/feature/x86/adm_avx512.h.
Invariant: adm_cm and i4_adm_cm must accept the p-norm parameter and the final powf exponent must be 1.0f / (float)adm_p_norm in every twin. The default 3.0 path is the Netflix-compatible path; do not split SIMD dispatch back to a hard-coded exponent when resolving conflicts.
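The invariant is easiest to see in plain Minkowski pooling, where the final exponent must be the reciprocal of the same p used to raise the terms. The Python below is a simplified illustration only. The real integer ADM contrast measure also applies border handling, area normalisation, and fixed-point scaling, all of which this omits. In the C twins the exponent is written 1.0f / (float)adm_p_norm.

```python
import math
from typing import Sequence


def pnorm_pool(values: Sequence[float], adm_p_norm: float = 3.0) -> float:
    """Minkowski pooling: (sum |v|^p) ** (1 / p)."""
    accumulated = math.fsum(abs(v) ** adm_p_norm for v in values)
    # Every twin must derive the exponent from adm_p_norm, never hard-code 1/3.
    return accumulated ** (1.0 / adm_p_norm)
```

With adm_p_norm = 2 and inputs 3 and 4 this yields 5. A twin that kept a hard-coded 1/3 exponent would return about 2.92 instead, which is exactly the drift the invariant guards against.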
ADR-0648 — CHUG HDR MOS trainer entry point
CHUG HDR subjective-MOS experiments use ai/scripts/train_chug_hdr_mos_head.py and local chug_hdr_mos_head_v1 manifests. Keep CHUG operator docs on that entry point; do not reintroduce instructions that pass CHUG shards through train_konvid_mos_head.py's KonViD-named flags. The wrapper may reuse the shared MOS-head implementation, but it must pass explicit non-existent KonViD paths so CHUG runs cannot accidentally mix local KonViD rows with HDR MOS shards.
ADR-0649 — CHUG HDR wide MOS feature schema
train_chug_hdr_mos_head.py defaults to --feature-schema chug-hdr-wide-v1. That schema is CHUG-local and currently has 34 columns: canonical-6 means, p10/p90/std temporal aggregates, and HDR ladder / geometry metadata. Do not edit the KonViD FEATURE_COLUMNS order to implement CHUG experiments; keep the shipped konvid_mos_head_v1 ONNX on the konvid-v1 11-column schema. Downstream CHUG experiment scripts must read feature_schema and feature_order from the manifest instead of assuming 11 inputs.
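A downstream script that follows this rule builds its input vector from the manifest's feature_order rather than from a fixed list. The helper below is a hypothetical sketch. It assumes the manifest is already loaded as a dict with the two keys named in the note.

```python
from typing import Any, Dict, List, Mapping


def feature_vector(manifest: Mapping[str, Any], row: Mapping[str, float]) -> List[float]:
    """Order a feature row by the manifest, whatever the schema's width."""
    order = manifest["feature_order"]
    missing = [name for name in order if name not in row]
    if missing:
        raise KeyError(f"row lacks {missing} for schema {manifest['feature_schema']}")
    return [float(row[name]) for name in order]
```

Because the width comes from the manifest, the same code serves the 11-column konvid-v1 schema and the 34-column chug-hdr-wide-v1 schema.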
ADR-0331 — rule-enforcement ready-for-review trigger repair
.github/workflows/rule-enforcement.yml must include ready_for_review in its pull_request.types list, alongside edited. Without ready_for_review, draft-to-ready promotion leaves the ADR-0108, ADR-0100, FFmpeg-surface, ADR-number, backfill, and docs/state.md gates stuck on their draft-time skipped check runs while the heavier workflows rerun correctly. Keep edited as well so PR-body fixes can rerun only the rule-enforcement workflow without burning the full matrix again.
Test build graph — generated vcs_version.h dependency
core/test/test_feature_collector.c directly includes core/src/libvmaf.c, and libvmaf.c includes the generated vcs_version.h header. Keep rev_target listed in the test_feature_collector executable sources in core/test/meson.build; otherwise fresh parallel Ninja builds can compile the test before include/vcs_version.h exists and fail nondeterministically.
Vulkan lavapipe CI — motion probes stay out of the VIF job
The Vulkan VIF Cross-Backend (lavapipe, places=4) job should not run the known-broken motion / motion_v2 lavapipe probes with continue-on-error: true. GitHub still emits ##[error] annotations for those advisory failures, which makes a passing PR look broken. Keep the documented T-VULKAN-MOTION-LAVAPIPE-INIT debt in docs/state.md and keep the required GPU-Parity Matrix Gate skip list until the Vulkan motion lavapipe bug is actually fixed; do not reintroduce advisory failing steps inside the named VIF gate.
ADR-0647 — fr_regressor_v1 Netflix refresh
No upstream Netflix C-source rebase impact. This is a fork-local model artifact refresh: model/tiny/fr_regressor_v1.onnx, its sidecar, registry row, model card, ADR/research docs, and state/changelog metadata.
Key invariants:
- The ADR-0249 model recipe and PLCC ship gate stay unchanged. Do not use this refresh as precedent for changing architecture, feature order, or gate threshold.
- Refreshes must train from a dated current full-feature table, not from the stale runs/full_features_netflix.parquet.
- fr_regressor_v1.onnx is inline after export. If the exporter rewrites the stale sibling .onnx.data file while the ONNX has no external initializers, restore the orphan sidecar rather than expanding the model diff.
Touched files: model/tiny/fr_regressor_v1.onnx, model/tiny/fr_regressor_v1.json, model/tiny/registry.json, docs/ai/models/fr_regressor_v1.md, docs/adr/0647-ai-fr-regressor-v1-refresh-20260520.md, docs/adr/_index_fragments/0647-ai-fr-regressor-v1-refresh-20260520.md, docs/adr/_index_fragments/_order.txt, docs/research/0647-ai-fr-regressor-v1-refresh-20260520.md, ai/AGENTS.md, docs/state.md, changelog.d/changed/0647-ai-fr-regressor-v1-refresh-20260520.md, docs/rebase-notes.md (this entry).
ADR-0651 — CHUG HDR row metadata
No upstream Netflix C-source rebase impact. This is a fork-local AI corpus-materialisation schema extension in ai/scripts/chug_extract_features.py.
Key invariants:
- CHUG feature rows now preserve feature_ref_* and feature_dis_* ffprobe HDR/display metadata for the matched reference and distorted clips.
- Unknown ffprobe fields remain explicit as unknown or null; do not infer panel/display capability in the materialiser.
- The existing --audit-output corpus preflight remains the aggregate health check; the row fields are the model-facing copy.
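The row-metadata rule amounts to copying ffprobe values under a side prefix, without filling gaps. The sketch below is hypothetical: the exact column list in chug_extract_features.py may differ. It uses three stream keys that ffprobe does report (color_transfer, color_primaries, color_space). An "unknown" string from ffprobe passes through unchanged, and an absent field becomes None.

```python
from typing import Any, Dict, Mapping, Optional

# Real ffprobe stream keys; the fork's materialiser may record more fields.
HDR_FIELDS = ("color_transfer", "color_primaries", "color_space")


def hdr_row_fields(
    ref_stream: Mapping[str, Any], dis_stream: Mapping[str, Any]
) -> Dict[str, Optional[Any]]:
    row: Dict[str, Optional[Any]] = {}
    for prefix, stream in (("feature_ref_", ref_stream), ("feature_dis_", dis_stream)):
        for field in HDR_FIELDS:
            value = stream.get(field)
            # Keep absent values explicit as None and never infer a value.
            row[prefix + field] = value if value not in (None, "") else None
    return row
```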
Touched files: ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, docs/ai/chug-ingestion.md, docs/adr/0651-chug-hdr-row-metadata.md, docs/adr/_index_fragments/0651-chug-hdr-row-metadata.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0651-chug-hdr-row-metadata.md, ai/AGENTS.md, changelog.d/added/0651-chug-hdr-row-metadata.md, docs/rebase-notes.md (this entry).
ADR-0652 — CHUG visual-signal primitives
No upstream Netflix C-source rebase impact. This is a fork-local AI feature-row schema extension in ai/scripts/chug_extract_features.py.
Key invariants:
- CHUG feature rows now include feature_ref_*, feature_dis_*, and feature_delta_* luma-domain visual-signal primitives for luma_std, sharpness_laplacian_var, highfreq_abs_mean, and noise_lap_mad.
- These are deterministic diagnostic blur/noise/grain proxies computed from sampled decoded YUV10 luma frames. Do not treat them as a trained no-reference VQA model.
- The visual-signal cache lives beside the existing CHUG feature cache and must be regenerated if the primitive definitions change.
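To make the four primitive names concrete, here is one plausible, deterministic definition of each on a single luma plane. These formulas are assumptions chosen for illustration, not the fork's definitions. Also assumed: a 4-neighbour Laplacian over interior samples, mean absolute Laplacian as the high-frequency proxy, median absolute deviation of the Laplacian as the noise proxy, and delta as dis minus ref.

```python
import statistics
from typing import Dict, List, Sequence

Frame = Sequence[Sequence[float]]  # one decoded luma plane, rows of samples


def laplacian(frame: Frame) -> List[float]:
    """4-neighbour Laplacian over interior samples (borders skipped)."""
    out = []
    for y in range(1, len(frame) - 1):
        for x in range(1, len(frame[0]) - 1):
            out.append(
                frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
                - 4 * frame[y][x]
            )
    return out


def primitives(frame: Frame) -> Dict[str, float]:
    samples = [float(v) for row in frame for v in row]
    lap = laplacian(frame)
    lap_median = statistics.median(lap)
    return {
        "luma_std": statistics.pstdev(samples),
        "sharpness_laplacian_var": statistics.pvariance(lap),
        "highfreq_abs_mean": statistics.fmean(abs(v) for v in lap),
        "noise_lap_mad": statistics.median(abs(v - lap_median) for v in lap),
    }


def signal_row(ref: Frame, dis: Frame) -> Dict[str, float]:
    """Emit feature_ref_*, feature_dis_*, and feature_delta_* (dis minus ref)."""
    r, d = primitives(ref), primitives(dis)
    row = {f"feature_ref_{k}": v for k, v in r.items()}
    row.update({f"feature_dis_{k}": v for k, v in d.items()})
    row.update({f"feature_delta_{k}": d[k] - r[k] for k in r})
    return row
```

A flat 10-bit plane scores zero on all four primitives, while a ±8 checkerboard around 512 gives a Laplacian variance of 4096. These are diagnostic statistics, not quality predictions, in line with the note above.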
Touched files: ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, docs/ai/chug-ingestion.md, docs/adr/0652-chug-visual-signal-primitives.md, docs/adr/_index_fragments/0652-chug-visual-signal-primitives.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0652-chug-visual-signal-primitives.md, ai/AGENTS.md, changelog.d/added/0652-chug-visual-signal-primitives.md, docs/rebase-notes.md (this entry).
ADR-0653 — CHUG display-profile training
No upstream Netflix C-source rebase impact. This is a fork-local CHUG HDR MOS training-schema extension under ai/ plus docs and DDD material.
Key invariants:
- chug-hdr-wide-v1 remains the no-profile CHUG default. --display-profile-json selects chug-hdr-display-v1 only when the caller did not explicitly pass --feature-schema.
- Row-local display fields override the target profile so future multi-display HDR corpora keep their panel axis.
- The display profile is recorded in the emitted manifest with normalized feature values and source sha256.
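The schema-selection and override rules can be sketched as follows. Both helper names are hypothetical, as are the profile keys used in the example (peak_nits, gamut). Treating a None row value as "not provided" is also an assumption.

```python
import argparse
from typing import Any, Dict, Mapping, Sequence


def resolve_feature_schema(argv: Sequence[str]) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--feature-schema", default=None)
    parser.add_argument("--display-profile-json", default=None)
    args = parser.parse_args(list(argv))
    if args.feature_schema is not None:
        return args.feature_schema  # an explicit choice always wins
    if args.display_profile_json:
        return "chug-hdr-display-v1"
    return "chug-hdr-wide-v1"  # no-profile default


def merge_display_fields(row: Mapping[str, Any], profile: Mapping[str, Any]) -> Dict[str, Any]:
    """Row-local display values override the target profile field by field."""
    merged = dict(profile)
    for key in profile:
        if row.get(key) is not None:
            merged[key] = row[key]
    return merged
```

Using default=None rather than a schema name is what lets the code tell "caller passed nothing" apart from "caller explicitly asked for chug-hdr-wide-v1".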
Touched files: ai/scripts/train_konvid_mos_head.py, ai/scripts/train_chug_hdr_mos_head.py, ai/tests/test_train_konvid_mos_head.py, docs/ai/chug-ingestion.md, docs/ai/mos-corpora.md, docs/ai/models/konvid_mos_head_v1.md, docs/adr/0653-chug-display-profile-training.md, docs/adr/_index_fragments/0653-chug-display-profile-training.md, docs/adr/_index_fragments/_order.txt, docs/research/0653-chug-display-profile-training.md, ai/AGENTS.md, changelog.d/added/0653-chug-display-profile-training.md, docs/rebase-notes.md (this entry).
ADR-0657 — Second-opinion feature materializer
No upstream Netflix C-source rebase impact. This is a fork-local AI feature-table enrichment utility under ai/scripts/.
Key invariants:
- ai/scripts/materialize_second_opinion_features.py stays table-side: it joins already-generated scorer JSON/JSONL and must not invoke or vendor third-party VQA projects.
- Output columns remain namespaced as second_opinion_<scorer>_* so downstream audits and trainers can detect NR/MOS evidence without colliding with native corpus columns.
- Duplicate (scorer, key) rows are rejected; they usually indicate stale reruns or mismatched row keys and must not be averaged silently.
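The namespacing and duplicate rules fit in one pivot loop. This is a hypothetical sketch, not the materializer's code: the record shape (scorer, key, and a scores mapping) is an assumption, and scorer_a / scorer_b are placeholder scorer names.

```python
from typing import Any, Dict, Iterable, Mapping, Set, Tuple


def materialize(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Pivot scorer records into namespaced columns keyed by row key.

    Assumed record shape: {"scorer": str, "key": str, "scores": {name: value}}.
    """
    seen: Set[Tuple[str, str]] = set()
    table: Dict[str, Dict[str, Any]] = {}
    for rec in records:
        scorer, key = rec["scorer"], rec["key"]
        if (scorer, key) in seen:
            # Stale rerun or key mismatch: reject rather than average.
            raise ValueError(f"duplicate (scorer, key) row: {(scorer, key)!r}")
        seen.add((scorer, key))
        columns = table.setdefault(key, {})
        for name, value in rec["scores"].items():
            columns[f"second_opinion_{scorer}_{name}"] = value
    return table
```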
Touched files: ai/scripts/materialize_second_opinion_features.py, ai/scripts/signal_mix_audit.py, ai/tests/test_second_opinion_features.py, docs/ai/second-opinion-features.md, docs/ai/signal-mix-audit.md, docs/ai/index.md, docs/adr/0657-second-opinion-feature-materializer.md, docs/adr/_index_fragments/0657-second-opinion-feature-materializer.md, docs/adr/_index_fragments/_order.txt, docs/research/0657-second-opinion-feature-materializer.md, ai/AGENTS.md, mkdocs.yml, changelog.d/added/0657-second-opinion-feature-materializer.md, docs/rebase-notes.md (this entry).
ADR-0658 — Project modernization audit
No upstream Netflix C-source rebase impact. This is a fork-local developer-tooling audit under scripts/dev/.
Key invariants:
- scripts/dev/project_modernization_audit.py is read-only. It may emit JSON and Markdown, but it must not rewrite .workingdir2/OPEN.md, .workingdir2/BACKLOG.md, docs/state.md, changelog fragments, or PR bodies.
- The scanner is advisory queue shaping, not a required CI gate. Its marker matches are intentionally text-based and need human triage.
- Archived scratch remains skipped by default; include it only with --include-archives during deliberate archaeology.
Touched files: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, docs/development/project-modernization-audit.md, docs/adr/0658-project-modernization-audit.md, docs/adr/_index_fragments/0658-project-modernization-audit.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0658-project-modernization-audit.md, scripts/AGENTS.md, mkdocs.yml, changelog.d/added/0658-project-modernization-audit.md, docs/rebase-notes.md (this entry).
ADR-0659 — Modernization audit false-positive filter
No upstream Netflix C-source rebase impact. This is a fork-local developer-tooling precision fix under scripts/dev/.
Key invariants:
- Live Python raise NotImplementedError(...) rows remain high-severity audit findings.
- Historical closeout prose such as "replaced the NotImplementedError scaffold", Python except NotImplementedError handlers, and custom NotImplementedError exception subclasses are not modernization gaps.
- Documented -ENOSYS optional-build contracts are not modernization gaps; bare return -ENOSYS; rows outside such context still are.
- Add future suppressions as narrow line-context tests; avoid file-level suppressions that could hide new real debt.
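A line-context classifier in the spirit of these rules might look like the sketch below. It is hypothetical: the regexes and the "optional-build" context marker are illustrative and not taken from project_modernization_audit.py. Only a live raise statement counts as a NotImplementedError finding. A bare -ENOSYS return counts unless the surrounding lines document an optional-build contract.

```python
import re

_LIVE_RAISE = re.compile(r"^\s*raise\s+NotImplementedError\b")
_BARE_ENOSYS = re.compile(r"\breturn\s+-ENOSYS\s*;")
# Hypothetical marker for a documented optional-build contract.
_OPTIONAL_BUILD = re.compile(r"optional[- ]build", re.IGNORECASE)


def is_modernization_gap(line: str, context: str = "") -> bool:
    """Classify one source line, using nearby lines as context."""
    if _LIVE_RAISE.search(line):
        return True  # live raise: high-severity finding
    if _BARE_ENOSYS.search(line):
        return not _OPTIONAL_BUILD.search(context)
    # except-handlers, NotImplementedError subclasses, and prose fall through.
    return False
```

Each suppression is a pattern over one line plus its local context, which matches the advice to keep suppressions narrow rather than file-level.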
Touched files: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, docs/development/project-modernization-audit.md, docs/adr/0659-modernization-audit-false-positive-filter.md, docs/adr/_index_fragments/0659-modernization-audit-false-positive-filter.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0659-modernization-audit-false-positive-filter.md, scripts/AGENTS.md, changelog.d/fixed/0659-modernization-audit-false-positive-filter.md, docs/rebase-notes.md (this entry).
ADR-0661 — AI run manifest provenance
No upstream Netflix C-source rebase impact. This is fork-local AI tooling under ai/ plus human-facing docs.
Key invariants:
- New AI training/export sidecars should use aiutils.run_manifest.build_run_provenance() instead of hand-rolled path hashing or argument JSON.
- CHUG MOS wrapper runs record train_chug_hdr_mos_head.py as the user-facing entrypoint, even though they delegate into the shared KonViD training loop.
- Add shared_trainer when wrapper identity and implementation script differ.
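The two wrapper rules can be expressed as a check a test might run over an emitted provenance block. It does not build the block, since that is build_run_provenance()'s job. This is a hypothetical sketch: it assumes entrypoint and shared_trainer appear as top-level keys of the block, and the function name is illustrative.

```python
from typing import Any, Mapping


def check_wrapper_provenance(block: Mapping[str, Any], wrapper: str, implementation: str) -> None:
    """Raise ValueError if a run_provenance block breaks the wrapper rules."""
    if block.get("entrypoint") != wrapper:
        raise ValueError("entrypoint must name the user-facing wrapper script")
    if wrapper != implementation and block.get("shared_trainer") != implementation:
        raise ValueError("shared_trainer must name the delegated implementation")
```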
Touched files: ai/src/aiutils/run_manifest.py, ai/src/aiutils/__init__.py, ai/scripts/train_konvid_mos_head.py, ai/scripts/train_chug_hdr_mos_head.py, ai/tests/test_run_manifest.py, ai/tests/test_train_konvid_mos_head.py, ai/AGENTS.md, ai/src/aiutils/AGENTS.md, docs/ai/training.md, docs/ai/models/konvid_mos_head_v1.md, docs/ai/chug-ingestion.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/adr/_index_fragments/0661-ai-run-manifest-provenance.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md, docs/research/0661-ai-run-manifest-provenance.md, changelog.d/added/0661-ai-run-manifest-provenance.md, docs/rebase-notes.md (this entry).
ADR-0662 — Vulkan motion lavapipe parity
Rebase-sensitive feature-extractor impact. This changes fork-local GPU motion twins and CI parity routing; keep these invariants when resolving any upstream sync that touches motion, feature registration, or parity scripts.
Key invariants:
- integer_motion_vulkan stays before legacy motion_vulkan in the Vulkan registry block so model feature-name dispatch chooses the lavapipe-stable canonical twin.
- Both parity scripts keep BACKEND_EXTRACTOR_ALIASES[("motion", "vulkan")] = "integer_motion_vulkan".
- CUDA, SYCL, and Vulkan motion_v2 kernels use the CPU integer_motion_v2.c::mirror high-edge literal 2 * size - idx - 2; do not restore the stale -1 formula from old ADR-0193 prose.
- integer_motion_vulkan defaults debug=true, matching CPU, CUDA, and the legacy Vulkan motion extractor, so the raw integer_motion metric is emitted for parity.
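The difference between the two high-edge formulas is whether the edge sample is repeated. The Python below mirrors only the high-edge literal quoted above. Low-edge handling is left out because the note does not specify it, and mirror_high is a hypothetical name.

```python
def mirror_high(idx: int, size: int) -> int:
    """Reflect an index past the high edge without repeating the edge sample."""
    if idx >= size:
        return 2 * size - idx - 2  # CPU literal; the stale variant used - 1
    return idx
```

For a 5-sample row [10, 20, 30, 40, 50], indices 5 and 6 map to 3 and 2, giving 40 and 30. The stale 2 * size - idx - 1 form maps index 5 to 4 and repeats 50, which is enough to break cross-backend parity.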
Touched files: .github/workflows/tests-and-quality-gates.yml, core/src/feature/feature_extractor.c, core/src/feature/vulkan/integer_motion_vulkan.c, core/src/feature/vulkan/shaders/motion_v2.comp, core/src/feature/cuda/integer_motion_v2/motion_v2_score.cu, core/src/feature/sycl/integer_motion_v2_sycl.cpp, scripts/ci/cross_backend_vif_diff.py, scripts/ci/cross_backend_parity_gate.py, docs/metrics/motion.md, docs/metrics/features.md, docs/backends/vulkan/overview.md, docs/api/gpu.md, docs/development/cross-backend-gate.md, docs/adr/0193-motion-v2-vulkan.md, docs/adr/0662-vulkan-motion-lavapipe-parity.md, docs/adr/_index_fragments/0193-motion-v2-vulkan.md, docs/adr/_index_fragments/0662-vulkan-motion-lavapipe-parity.md, docs/adr/_index_fragments/_order.txt, docs/research/0662-vulkan-motion-lavapipe-parity.md, core/src/feature/AGENTS.md, core/src/feature/vulkan/AGENTS.md, core/src/feature/cuda/AGENTS.md, scripts/ci/AGENTS.md, changelog.d/fixed/0662-vulkan-motion-lavapipe-parity.md, docs/rebase-notes.md (this entry).
ADR-0663 — MOS label materializer
No upstream Netflix C-source rebase impact. This is fork-local AI training/data-prep plumbing under ai/scripts/.
Key invariants:
- ai/scripts/materialize_mos_labels.py stays table-side: it joins subjective MOS labels onto already-extracted feature tables and must not extract features, download corpora, or train models.
- Real MOS-head training must not silently synthesize data when explicit real-corpus paths produce zero labelled rows. --smoke is the documented synthetic path.
- Conflicting duplicate label keys are rejected; low unique-key coverage fails by default so stale key joins do not become training inputs.
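The join rules can be sketched as a small function, hypothetical throughout. The 0.9 default coverage threshold is an assumed value, and identical duplicate labels are tolerated because the note only rejects conflicting ones. Zero matched rows fail the same coverage check, so nothing is synthesized on that path.

```python
from typing import Dict, Iterable, Tuple


def join_labels(
    feature_keys: Iterable[str],
    labels: Iterable[Tuple[str, float]],
    min_coverage: float = 0.9,
) -> Dict[str, float]:
    by_key: Dict[str, float] = {}
    for key, mos in labels:
        if key in by_key and by_key[key] != mos:
            raise ValueError(f"conflicting MOS labels for {key!r}")
        by_key[key] = mos
    keys = set(feature_keys)
    joined = {k: by_key[k] for k in keys if k in by_key}
    coverage = len(joined) / len(keys) if keys else 0.0
    if coverage < min_coverage:
        # Zero or low coverage fails loudly; nothing is synthesized here.
        raise ValueError(f"label coverage {coverage:.0%} is below {min_coverage:.0%}")
    return joined
```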
Touched files: ai/scripts/materialize_mos_labels.py, ai/scripts/train_konvid_mos_head.py, ai/tests/test_materialize_mos_labels.py, ai/tests/test_train_konvid_mos_head.py, docs/ai/mos-label-materializer.md, docs/ai/mos-corpora.md, docs/ai/models/konvid_mos_head_v1.md, docs/ai/index.md, docs/adr/0663-mos-label-materializer.md, docs/adr/_index_fragments/0663-mos-label-materializer.md, docs/adr/_index_fragments/_order.txt, docs/research/0663-mos-label-materializer.md, ai/AGENTS.md, mkdocs.yml, changelog.d/added/0663-mos-label-materializer.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — AI model sidecar run provenance
AI-sidecar provenance impact. This widens ADR-0661 from MOS-head trainers to the FR regressor training family and the vmaf_tiny exporter family.
Key invariants:
- train_fr_regressor.py, train_fr_regressor_v2.py, and train_fr_regressor_v3.py sidecars carry run_provenance built by aiutils.run_manifest.build_run_provenance().
- v1/v2 metrics JSON carries the same block, including gate-failed runs where no ONNX export is written.
- export_vmaf_tiny_v2.py, export_vmaf_tiny_v3.py, and export_vmaf_tiny_v4.py sidecars carry the same block with the checkpoint input and ONNX/sidecar output targets.
- Do not replace this with per-script argument/path JSON when rebasing AI trainer changes; extend the shared helper instead.
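For orientation, the kind of block these sidecars carry can be sketched as follows. The real builder is aiutils.run_manifest.build_run_provenance(), whose signature is not reproduced here. sketch_run_provenance, its field names, and the SHA-256 input digests are assumptions for illustration only.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path
from typing import Mapping, Sequence


def _sha256(path: Path) -> str | None:
    """Digest an input file in 1 MiB chunks; a missing input is recorded as None."""
    if not path.is_file():
        return None
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sketch_run_provenance(
    entrypoint: str,
    argv: Sequence[str],
    parsed_args: Mapping[str, object],
    inputs: Mapping[str, str | Path],
    outputs: Mapping[str, str | Path],
) -> dict:
    """Hypothetical stand-in for aiutils.run_manifest.build_run_provenance()."""
    return {
        "entrypoint": entrypoint,
        "argv": list(argv),
        # Stringify values so Paths and other objects serialise deterministically.
        "args": {key: str(value) for key, value in sorted(parsed_args.items())},
        "inputs": {name: {"path": str(p), "sha256": _sha256(Path(p))} for name, p in sorted(inputs.items())},
        "outputs": {name: str(p) for name, p in sorted(outputs.items())},
        "python": sys.version.split()[0],
    }
```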
Touched files: ai/scripts/export_vmaf_tiny_v2.py, ai/scripts/export_vmaf_tiny_v3.py, ai/scripts/export_vmaf_tiny_v4.py, ai/scripts/train_fr_regressor.py, ai/scripts/train_fr_regressor_v2.py, ai/scripts/train_fr_regressor_v3.py, ai/tests/test_fr_regressor_run_provenance.py, ai/tests/test_vmaf_tiny_export_run_provenance.py, docs/ai/training.md, docs/ai/models/fr_regressor_v1.md, docs/ai/models/fr_regressor_v2.md, docs/ai/models/fr_regressor_v3.md, docs/ai/models/vmaf_tiny_v2.md, docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0664-ai-fr-regressor-run-provenance.md, changelog.d/added/0664-ai-fr-regressor-run-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — AI eval report run provenance
AI-eval/validate provenance impact. This widens ADR-0661 from model-producing sidecars to the tiny-VMAF evaluation and validation report families.
Key invariants:
- eval_loso_vmaf_tiny_v3.py, eval_loso_vmaf_tiny_v4.py, eval_loso_vmaf_tiny_v5.py, and eval_multiseed_v3_v4.py report JSON files carry run_provenance built by aiutils.run_manifest.build_run_provenance().
- Evaluation reports record the feature parquet input(s), parsed eval hyperparameters, original argv, and report_target output path.
- validate_ensemble_seeds.py verdict JSON files carry the same schema and record loso_dir, corpus_root, seed list, gate thresholds, and the PROMOTE.json/HOLD.json output path.
- Do not restore per-script Path.write_text(json.dumps(...)) report writers when rebasing eval-script changes; use write_manifest_json() so the JSON shape and newline handling stay shared.
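Routing every report through one writer is what pins key order and the trailing newline. Here is a minimal sketch of that contract, assuming two-space indentation and UTF-8 output. sketch_write_manifest_json is a hypothetical stand-in, not the body of the real write_manifest_json().

```python
import json
from pathlib import Path


def sketch_write_manifest_json(path: Path, payload: dict) -> None:
    """Hypothetical stand-in for write_manifest_json(): sorted keys, exactly one trailing newline."""
    text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write bytes so the platform cannot translate "\n" into "\r\n".
    path.write_bytes(text.encode("utf-8"))
```

Because sort_keys applies at every nesting level, two runs that assemble the same payload in a different insertion order produce byte-identical files. This keeps report diffs reviewable.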
Touched files: ai/scripts/eval_loso_vmaf_tiny_v3.py, ai/scripts/eval_loso_vmaf_tiny_v4.py, ai/scripts/eval_loso_vmaf_tiny_v5.py, ai/scripts/eval_multiseed_v3_v4.py, ai/scripts/validate_ensemble_seeds.py, ai/tests/test_eval_report_run_provenance.py, ai/tests/test_validate_ensemble_seeds.py, docs/ai/training.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md, docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md, docs/ai/models/vmaf_tiny_v5.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0665-ai-eval-report-run-provenance.md, ai/src/aiutils/AGENTS.md, changelog.d/added/0665-ai-eval-report-run-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — Legacy AI eval report run provenance
Legacy eval/report provenance impact. This widens ADR-0661 adoption from the refreshed v3/v4/v5 eval family to older durable AI evaluation reports.
Key invariants:
- eval_loso_mlp_small.py and eval_loso_3arch.py JSON reports carry run_provenance built by aiutils.run_manifest.build_run_provenance().
- eval_probabilistic_proxy.py --metrics-out writes the same block with the ensemble manifest input, optional held-out parquet, and metrics output path.
- eval_saliency_per_mb.py writes the same block for CLI output, recording the predicted and ground-truth mask directories plus block settings.
- Do not restore direct json.dump() writers for these durable reports when rebasing old eval-script changes; use write_manifest_json() for stable sorting and newline handling.
Touched files: ai/scripts/eval_loso_mlp_small.py, ai/scripts/eval_loso_3arch.py, ai/scripts/eval_probabilistic_proxy.py, ai/scripts/eval_saliency_per_mb.py, ai/tests/test_legacy_eval_report_run_provenance.py, ai/tests/test_eval_saliency_per_mb.py, docs/ai/training.md, docs/ai/loso-eval.md, docs/ai/saliency-per-mb-eval.md, docs/ai/models/fr_regressor_v2_probabilistic.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0666-ai-legacy-eval-report-provenance.md, changelog.d/added/0666-ai-legacy-eval-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — Predictor v2 real-corpus report provenance
Predictor-v2 report provenance impact. This widens ADR-0661 adoption to the per-codec real-corpus gate report used before predictor-v2 model-card updates.
Key invariants:
- ai/scripts/train_predictor_v2_realcorpus.py writes runs/predictor_v2_realcorpus/report.json with a run_provenance block built by aiutils.run_manifest.build_run_provenance().
- The report records the trainer entrypoint, original argv, parsed arguments, explicit corpus files, corpus roots, resolved JSONL files, and report target.
- Keep ADR-0303 gate constants untouched; provenance makes failed or insufficient reports reproducible, but it does not change pass/fail logic.
- Do not restore direct Path.write_text(json.dumps(...)) report output here; use write_manifest_json() so JSON sorting and trailing-newline behavior stay shared with the other AI provenance reports.
Touched files: ai/scripts/train_predictor_v2_realcorpus.py, ai/tests/test_train_predictor_v2_realcorpus.py, docs/ai/predictor-v2-realcorpus-training.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0667-predictor-v2-report-provenance.md, changelog.d/added/0667-predictor-v2-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — vmaf_tiny train stats provenance
vmaf_tiny training provenance impact. This widens ADR-0661 adoption to the pre-export stats JSON files emitted by the vmaf_tiny trainer family.
Key invariants:
- train_vmaf_tiny_v2.py, train_vmaf_tiny_v3.py, train_vmaf_tiny_v4.py, and train_vmaf_tiny_v5.py write their --out-stats JSON with a run_provenance block built by aiutils.run_manifest.build_run_provenance().
- v2/v3/v4 stats record the parquet input, checkpoint target, stats target, argv, and parsed hyperparameters.
- v5 stats record both parquet_base and parquet_extra, plus checkpoint and stats output targets.
- Do not restore direct Path.write_text(json.dumps(...)) stats output here; use write_manifest_json() so JSON sorting and trailing-newline behavior stay shared with the other AI provenance reports.
Touched files: ai/scripts/train_vmaf_tiny_v2.py, ai/scripts/train_vmaf_tiny_v3.py, ai/scripts/train_vmaf_tiny_v4.py, ai/scripts/train_vmaf_tiny_v5.py, ai/tests/test_vmaf_tiny_train_run_provenance.py, docs/ai/training.md, docs/ai/models/vmaf_tiny_v2.md, docs/ai/models/vmaf_tiny_v3.md, docs/ai/models/vmaf_tiny_v4.md, docs/ai/models/vmaf_tiny_v5.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0668-vmaf-tiny-train-stats-provenance.md, changelog.d/added/0668-vmaf-tiny-train-stats-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — AI materializer audit provenance
Materializer audit provenance impact. This widens ADR-0661 adoption to feature-table materializers and signal-mix audit reports that feed retraining or model-mix decisions.
Key invariants:
- materialize_mos_labels.py --audit-json and materialize_second_opinion_features.py --audit-json include run_provenance in their audit JSON outputs.
- materialize_saliency_features.py --audit-json writes row counters, the effective config, and run_provenance; use it for saliency-enriched tables that feed retraining.
- signal_mix_audit.py --out-json includes run_provenance for audited table paths, thresholds, argv, JSON output, and Markdown output.
- Do not reintroduce bespoke path hashing or direct audit JSON writers on these surfaces; use aiutils.run_manifest.build_run_provenance() and write_manifest_json().
Touched files: ai/scripts/materialize_mos_labels.py, ai/scripts/materialize_second_opinion_features.py, ai/scripts/materialize_saliency_features.py, ai/scripts/signal_mix_audit.py, ai/tests/test_materialize_mos_labels.py, ai/tests/test_second_opinion_features.py, ai/tests/test_materialize_saliency_features.py, ai/tests/test_signal_mix_audit.py, docs/ai/training.md, docs/ai/mos-label-materializer.md, docs/ai/second-opinion-features.md, docs/ai/saliency-feature-materializer.md, docs/ai/signal-mix-audit.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0669-ai-materializer-audit-provenance.md, changelog.d/added/0669-ai-materializer-audit-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — Ensemble seed export provenance
Ensemble export provenance impact. This widens ADR-0661 adoption to the production seed exporter for fr_regressor_v2_ensemble_v1_seed* sidecars.
Key invariants:
- ai/scripts/export_ensemble_v2_seeds.py builds one run_provenance block per invocation with the corpus, PROMOTE verdict, parsed export args, argv, per-seed ONNX/sidecar targets, and optional registry target.
- Each fresh fr_regressor_v2_ensemble_v1_seed{N}.json sidecar receives that block.
- Sidecar and optional registry writes use write_manifest_json() so the JSON formatting contract matches the other ADR-0661 adopters.
Touched files: ai/scripts/export_ensemble_v2_seeds.py, ai/tests/test_export_ensemble_v2_seeds_provenance.py, docs/ai/training.md, docs/ai/models/fr_regressor_v2_probabilistic.md, docs/ai/ensemble-training-kit.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0670-ensemble-seed-export-provenance.md, changelog.d/added/0670-ensemble-seed-export-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — Ensemble LOSO report provenance
Ensemble LOSO provenance impact. This widens ADR-0661 adoption to the per-seed loso_seed{N}.json reports emitted by the production ensemble LOSO trainer.
Key invariants:
- ai/scripts/train_fr_regressor_v2_ensemble_loso.py writes each loso_seed{N}.json through write_manifest_json().
- Each report includes run_provenance with the trainer entrypoint, original argv, parsed training args, corpus JSONL input, and per-seed report target.
- The existing gate keys (mean_plcc, min_plcc, max_plcc, folds, and seed metadata) remain unchanged for scripts/ci/ensemble_prod_gate.py and ai/scripts/validate_ensemble_seeds.py.
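Because run_provenance is purely additive, the gate keys keep their meaning, as a sketch of one per-seed report shows. The held_out/plcc fold layout and the summarise_loso_seed name are hypothetical; only the top-level gate key names come from the notes above.

```python
from __future__ import annotations

from statistics import mean


def summarise_loso_seed(seed: int, fold_plcc: dict[str, float], run_provenance: dict) -> dict:
    """Build one per-seed LOSO report; fold_plcc maps a held-out source to its PLCC."""
    if not fold_plcc:
        raise ValueError("a LOSO report needs at least one fold")
    values = list(fold_plcc.values())
    return {
        "seed": seed,
        "folds": [{"held_out": src, "plcc": p} for src, p in sorted(fold_plcc.items())],
        "mean_plcc": mean(values),
        "min_plcc": min(values),
        "max_plcc": max(values),
        # Additive block: gate readers keep reading the keys above and ignore this one.
        "run_provenance": run_provenance,
    }
```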
Touched files: ai/scripts/train_fr_regressor_v2_ensemble_loso.py, ai/tests/test_train_fr_regressor_v2_ensemble_loso_train.py, docs/ai/training.md, docs/ai/ensemble-v2-real-corpus-retrain-runbook.md, docs/ai/ensemble-training-kit.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0671-ensemble-loso-report-provenance.md, changelog.d/added/0671-ensemble-loso-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — vmaf-train CLI report provenance
vmaf-train report provenance impact. This widens ADR-0661 adoption to the user-facing vmaf-train --json report surfaces.
Key invariants:
- ai/src/vmaf_train/cli.py uses _write_cli_report_json() for durable report commands that accept --json.
- Covered subcommands: validate-norm, profile, audit-learned-filter, quantize-int8, cross-backend, and bisect-model-quality.
- The provenance block records the CLI entrypoint, argv, parsed options, model/feature/calibration/frame inputs, JSON report output, and generated model output where a command writes one.
Touched files: ai/src/vmaf_train/cli.py, ai/tests/test_tune_cli.py, docs/usage/vmaf-train.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0672-vmaf-train-cli-report-provenance.md, changelog.d/added/0672-vmaf-train-cli-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — Feature-correlation report provenance
Feature-correlation provenance impact. This widens ADR-0661 adoption to the feature-ranking report emitted by ai/scripts/feature_correlation.py.
Key invariants:
- feature_correlation.py --out writes the JSON report through write_manifest_json().
- The report includes run_provenance with the analyzer entrypoint, original argv, parsed target / redundancy / top-K arguments, source parquet input, and JSON report target.
- The analytic payload keys (pearson, redundant_pairs, importances, per_method_topk, and consensus_topk) remain unchanged for downstream research/audit readers.
Touched files: ai/scripts/feature_correlation.py, ai/tests/test_feature_correlation.py, docs/research/0027-phase2-feature-importance.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0673-feature-correlation-report-provenance.md, changelog.d/added/0673-feature-correlation-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — Phase-3 subset-sweep report provenance
Phase-3 sweep provenance impact. This widens ADR-0661 adoption to the model-selection JSON emitted by ai/scripts/phase3_subset_sweep.py.
Key invariants:
- phase3_subset_sweep.py --out writes the JSON report through write_manifest_json().
- The report keeps existing subset result keys and adds top-level run_provenance with the analyzer entrypoint, original argv, parsed subset / seed / standardization arguments, source parquet input, and JSON report target.
- The subset result payload (features, per_seed, summary) remains unchanged for each requested subset.
Touched files: ai/scripts/phase3_subset_sweep.py, ai/tests/test_phase3_subset_sweep.py, docs/research/0028-phase3-subset-sweep.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0674-phase3-subset-report-provenance.md, changelog.d/added/0674-phase3-subset-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — Quantisation report provenance
Quantisation provenance impact. This widens ADR-0661 adoption to the int8 producer/gate scripts used for model-card promotion evidence.
Key invariants:
- ai/scripts/ptq_dynamic.py --report-out and ai/scripts/ptq_static.py --report-out write JSON reports with fp32/int8 sizes, selected quantisation settings, and run_provenance.
- ai/scripts/qat_train.py --report-out writes QAT output/report metadata for the fp32 bridge and final int8 ONNX artifact.
- ai/scripts/measure_quant_drop.py --out-json preserves per-model gate rows and run_provenance without changing stdout or exit codes.
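The rule about unchanged stdout and exit codes is easiest to see when the JSON payload is built as a side channel. Everything in this sketch is hypothetical (the drop metric, the 0.01 budget, the row fields); it is not measure_quant_drop.py's real gate logic.

```python
from __future__ import annotations


def gate_rows(drops: dict[str, float], budget: float = 0.01) -> list[dict]:
    """One row per model: a hypothetical score drop checked against a budget."""
    return [{"model": m, "drop": d, "passed": d <= budget} for m, d in sorted(drops.items())]


def run_gate(drops: dict[str, float], want_json: bool) -> tuple[int, dict | None]:
    rows = gate_rows(drops)
    for row in rows:
        # stdout carries the same human-readable summary with or without --out-json.
        print(f"{row['model']}: drop={row['drop']:.4f} {'PASS' if row['passed'] else 'FAIL'}")
    # Side-channel payload; a real implementation would presumably add run_provenance here.
    report = {"rows": rows} if want_json else None
    # The exit status depends only on the gate rows, never on whether JSON was requested.
    return (0 if all(row["passed"] for row in rows) else 1), report
```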
Touched files: ai/scripts/ptq_dynamic.py, ai/scripts/ptq_static.py, ai/scripts/qat_train.py, ai/scripts/measure_quant_drop.py, ai/tests/test_ptq_scripts.py, ai/tests/test_qat_smoke.py, docs/ai/quantization.md, docs/ai/training.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0682-quantization-report-provenance.md, changelog.d/added/0682-quantization-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0661 follow-up — CHUG extraction report provenance
CHUG extraction provenance impact. This widens ADR-0661 adoption to the local CHUG split manifest and HDR metadata audit JSON emitted before HDR MOS training.
Key invariants:
- ai/scripts/chug_extract_features.py --split-manifest keeps the existing content-level split payload and adds top-level run_provenance.
- ai/scripts/chug_extract_features.py --audit-output keeps the existing HDR audit counters/malformed-row payload and adds top-level run_provenance.
- Feature JSONL rows are unchanged; this PR only stamps the durable split/audit JSON evidence with extractor command and input/output context.
Touched files: ai/scripts/chug_extract_features.py, ai/tests/test_chug.py, docs/ai/chug-ingestion.md, docs/adr/0661-ai-run-manifest-provenance.md, docs/research/0684-chug-extraction-report-provenance.md, changelog.d/added/0684-chug-extraction-report-provenance.md, docs/rebase-notes.md (this entry).
ADR-0668 — AI derived table provenance
Derived-table provenance impact. This extends the ADR-0661 manifest pattern from trainer/report JSONs down to the local FULL_FEATURES parquet builders that feed refreshed AI models.
Key invariants:
- ai/scripts/extract_k150k_features.py writes <out>.manifest.json by default with feature order, CPU/CUDA extractor split, restart counters, backend worker counts, parquet row count, and shared run_provenance.
- ai/scripts/combine_full_feature_parquets.py writes <out>.manifest.json by default with input labels, per-input row counts, missing-feature fill lists, corpus distribution, output column order, and shared run_provenance.
- ai/scripts/enrich_k150k_parquet_metadata.py writes <out>.manifest.json by default with metadata match/update counters, available metadata keys, overwrite policy, and shared run_provenance.
- Existing parquet row schemas are unchanged; the manifest is a sibling local evidence artifact.
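Two details above are easy to get wrong on rebase. The manifest name appends to the full output filename, and the combine step has to record which features it filled. The sketch below uses plain lists of dicts in place of parquet tables; the NaN fill value, the corpus column, and the function names are assumptions.

```python
from __future__ import annotations

import math
from pathlib import Path


def manifest_path_for(out: Path) -> Path:
    """<out>.manifest.json: append to the full filename, keeping the .parquet suffix."""
    return out.with_name(out.name + ".manifest.json")


def combine_tables(inputs: dict[str, list[dict]], feature_order: list[str]) -> tuple[list[dict], dict]:
    """Concatenate labelled tables, NaN-fill absent features, and describe what happened."""
    combined: list[dict] = []
    per_input_rows: dict[str, int] = {}
    missing_filled: dict[str, list[str]] = {}
    for label, table in inputs.items():
        per_input_rows[label] = len(table)
        present = set().union(*(row.keys() for row in table))
        # Features absent from every row of this input are reported, not silently invented.
        missing_filled[label] = [f for f in feature_order if f not in present]
        for row in table:
            combined.append({"corpus": label, **{f: row.get(f, math.nan) for f in feature_order}})
    manifest = {
        "per_input_rows": per_input_rows,
        "missing_feature_fill": missing_filled,
        "output_columns": ["corpus", *feature_order],
    }
    return combined, manifest
```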
Touched files: ai/scripts/extract_k150k_features.py, ai/scripts/combine_full_feature_parquets.py, ai/scripts/enrich_k150k_parquet_metadata.py, ai/tests/test_extract_k150k_features.py, ai/tests/test_combine_full_feature_parquets.py, ai/tests/test_enrich_k150k_parquet_metadata.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/chug-ingestion.md, docs/adr/0668-ai-derived-table-provenance.md, docs/research/0688-ai-derived-table-provenance.md, changelog.d/added/0668-ai-derived-table-provenance.md, docs/rebase-notes.md (this entry).
ADR-0669 — AI corpus JSONL provenance
Corpus-JSONL provenance impact. This extends the ADR-0661 manifest pattern to the corpus JSONL boundary before trainers consume merged or aggregated row streams.
Key invariants:
- ai/scripts/aggregate_corpora.py writes <output>.manifest.json by default with MOS scale conversions, optional corpus-source overrides, aggregate counters, and shared run_provenance.
- ai/scripts/merge_corpora.py writes <output>.manifest.json by default with required vmaf-tune corpus keys, natural dedup key, merge counters, and shared run_provenance.
- JSONL row schemas are unchanged; run-level evidence belongs in the sidecar.
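merge_corpora.py records a natural dedup key and merge counters in its manifest. The following sketch de-duplicates JSONL lines by key, with the first occurrence winning. The DEDUP_KEY fields are hypothetical placeholders, not the real vmaf-tune corpus key.

```python
from __future__ import annotations

import json
from typing import Iterable

DEDUP_KEY = ("src", "encoder", "crf")  # hypothetical natural key fields


def merge_jsonl(streams: Iterable[Iterable[str]]) -> tuple[list[dict], dict[str, int]]:
    """Merge JSONL line streams; the first row for each natural key wins."""
    seen: set[tuple] = set()
    merged: list[dict] = []
    counters = {"rows_in": 0, "rows_out": 0, "duplicates_dropped": 0}
    for stream in streams:
        for line in stream:
            if not line.strip():
                continue  # tolerate blank lines between records
            row = json.loads(line)
            counters["rows_in"] += 1
            key = tuple(row[field] for field in DEDUP_KEY)  # KeyError if a required key is missing
            if key in seen:
                counters["duplicates_dropped"] += 1
                continue
            seen.add(key)
            merged.append(row)  # row schema passes through unchanged
    counters["rows_out"] = len(merged)
    return merged, counters
```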
Touched files: ai/scripts/aggregate_corpora.py, ai/scripts/merge_corpora.py, ai/tests/test_aggregate_corpora.py, ai/tests/test_merge_corpora.py, ai/AGENTS.md, docs/ai/mos-corpora.md, docs/ai/multi-corpus-aggregation.md, docs/ai/training.md, docs/adr/0669-ai-corpus-jsonl-provenance.md, docs/research/0689-ai-corpus-jsonl-provenance.md, changelog.d/added/0669-ai-corpus-jsonl-provenance.md, docs/rebase-notes.md (this entry).
ADR-0670 — AI legacy corpus extraction manifests
Legacy trainer-input provenance impact. This extends the ADR-0661 manifest pattern to older corpus/extraction scripts that directly create local trainer-input parquets or vmaf-tune JSONL.
Key invariants:
- ai/scripts/extract_full_features.py writes <out>.manifest.json by default with Netflix corpus/cache inputs, VMAF binary evidence, feature list, pair count, row count, and shared run_provenance.
- ai/scripts/konvid_to_vmaf_pairs.py writes <out>.manifest.json by default with KoNViD root, VMAF/model inputs, cache policy, CRF, feature list, clip/frame counters, failed clip IDs, and shared run_provenance.
- ai/scripts/bvi_dvc_to_corpus_jsonl.py writes <output>.manifest.json by default with cache inputs, row schema version, adapter labels, row/cache counters, and shared run_provenance.
- BVI-DVC JSONL rows must include the current vmaf-tune v3 additive keys; unavailable HDR, shot, canonical-feature aggregate, and encoder-internal values are explicit defaults, not missing columns.
Touched files: ai/scripts/extract_full_features.py, ai/scripts/konvid_to_vmaf_pairs.py, ai/scripts/bvi_dvc_to_corpus_jsonl.py, ai/tests/test_legacy_corpus_extraction_manifests.py, ai/AGENTS.md, docs/ai/training.md, docs/ai/mos-corpora.md, docs/adr/0670-ai-legacy-corpus-extraction-manifests.md, docs/research/0690-ai-legacy-corpus-extraction-manifests.md, changelog.d/added/0670-ai-legacy-corpus-extraction-manifests.md, docs/rebase-notes.md (this entry).
ADR-0673 — Saliency materializer batch manifest
Saliency table refresh impact. This adds a batch orchestration layer over ai/scripts/materialize_saliency_features.py. Rebase work that changes SaliencyMaterializeConfig, saliency status values, or table read/write semantics must update both the single-table script and the batch manifest runner/tests together.
Key invariants:
- ai/scripts/batch_materialize_saliency_features.py must import and reuse the single-table materializer functions; it must not duplicate FFmpeg decode, ffprobe fallback, saliency inference, or row status semantics.
- Batch manifests carry shared defaults plus per-table overrides. Relative paths resolve from the manifest directory unless --base-dir is supplied.
- Batch reports use schema saliency-materializer-batch-v1 and include ADR-0661 run_provenance.
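The manifest resolution rule can be sketched as follows. The defaults/tables layout and the convention that path-valued fields end in _path are illustrative assumptions, not the batch runners' real manifest schema.

```python
from __future__ import annotations

from pathlib import Path


def resolve_tables(manifest: dict, manifest_path: Path, base_dir: Path | None = None) -> list[dict]:
    """Merge shared defaults into each table entry and resolve relative paths."""
    # Relative paths resolve from the manifest's directory unless --base-dir is given.
    root = base_dir if base_dir is not None else manifest_path.parent
    defaults = manifest.get("defaults", {})
    resolved = []
    for entry in manifest["tables"]:
        merged = {**defaults, **entry}  # per-table overrides win over shared defaults
        for key, value in list(merged.items()):
            if key.endswith("_path"):
                p = Path(value)
                merged[key] = p if p.is_absolute() else root / p
        resolved.append(merged)
    return resolved
```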
Touched files: ai/scripts/batch_materialize_saliency_features.py, ai/tests/test_batch_materialize_saliency_features.py, ai/AGENTS.md, docs/ai/saliency-feature-materializer.md, docs/adr/0673-saliency-materializer-batch-manifest.md, docs/research/0693-saliency-materializer-batch-manifest.md, changelog.d/added/0673-saliency-materializer-batch-manifest.md, docs/rebase-notes.md (this entry).
ADR-0674 — Second-opinion materializer batch manifest
Second-opinion table refresh impact. This adds a batch orchestration layer over ai/scripts/materialize_second_opinion_features.py. Rebase work that changes score-sidecar parsing, join-key policy, missing-score semantics, or run-provenance fields must update both the single-table joiner and the batch manifest runner/tests together.
Key invariants:
- ai/scripts/batch_materialize_second_opinion_features.py must import and reuse materialize_second_opinion_features.materialize(); external scorer execution remains outside this repo.
- Batch manifests carry shared defaults plus per-table overrides. Relative paths resolve from the manifest directory unless --base-dir is supplied.
- Batch reports use schema second-opinion-materializer-batch-v1 and include ADR-0661 run_provenance.
Touched files: ai/scripts/batch_materialize_second_opinion_features.py, ai/tests/test_batch_materialize_second_opinion_features.py, ai/AGENTS.md, docs/ai/second-opinion-features.md, docs/adr/0674-second-opinion-materializer-batch-manifest.md, docs/research/0694-second-opinion-materializer-batch-manifest.md, changelog.d/added/0674-second-opinion-materializer-batch-manifest.md, docs/rebase-notes.md (this entry).
ADR-0675 — MOS label materializer batch manifest
MOS-labelled table refresh impact. This adds a batch orchestration layer over ai/scripts/materialize_mos_labels.py. Rebase work that changes MOS column inference, key-normalisation, match-rate enforcement, overwrite policy, or run-provenance fields must update both the single-table materializer and the batch manifest runner/tests together.
Key invariants:
- ai/scripts/batch_materialize_mos_labels.py must import and reuse materialize_mos_labels.materialize(); it must not parse MOS rows, extract features, or train models.
- Batch manifests carry shared defaults plus per-table overrides. Relative paths resolve from the manifest directory unless --base-dir is supplied.
- Batch reports use schema mos-label-materializer-batch-v1 and include ADR-0661 run_provenance.
Touched files: ai/scripts/batch_materialize_mos_labels.py, ai/tests/test_batch_materialize_mos_labels.py, ai/AGENTS.md, docs/ai/mos-label-materializer.md, docs/adr/0675-mos-label-materializer-batch-manifest.md, docs/research/0695-mos-label-materializer-batch-manifest.md, changelog.d/added/0675-mos-label-materializer-batch-manifest.md, docs/rebase-notes.md (this entry).
ADR-0679 — CI draft auto-merge gate
Merge-train safety impact. The single required branch-protection context, Required Checks Aggregator, must not be skipped on draft PRs. It now runs on drafts and fails intentionally so GitHub cannot treat a draft-era skipped check as sufficient for auto-merge after the PR is marked ready.
Key invariants:
- Keep required-aggregator.yml free of a job-level draft skip. Expensive sibling workflows may still skip draft PRs, but the required aggregate status must fail drafts and rerun on ready_for_review.
- The aggregator ignores sibling check runs older than the current workflow registration window and chooses the newest run per check name. This prevents stale draft-era skipped checks on the same SHA from masking ready-run checks.
- The ADR collision guard phase 1 compares against BASE_SHA, not live origin/master, so a fast post-merge workflow cannot self-collide against the PR's own ADR.
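The following sketches the newest-run-per-name selection, assuming check runs arrive as dicts with name, started_at (ISO 8601 with a Z suffix), and conclusion fields, loosely mirroring GitHub check-run objects. The window start, the function names, and which conclusions count as passing are assumptions; the real logic lives in required-aggregator.yml.

```python
from __future__ import annotations

from datetime import datetime

PASSING = {"success", "neutral", "skipped"}  # assumed set of acceptable conclusions


def _ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_runs(check_runs: list[dict], window_start: str) -> dict[str, dict]:
    """Drop runs older than the registration window, then keep the newest run per check name."""
    cutoff = _ts(window_start)
    newest: dict[str, dict] = {}
    for run in check_runs:
        started = _ts(run["started_at"])
        if started < cutoff:
            continue  # stale run from an earlier (e.g. draft-era) registration on the same SHA
        current = newest.get(run["name"])
        if current is None or started > _ts(current["started_at"]):
            newest[run["name"]] = run
    return newest


def aggregate_ok(check_runs: list[dict], window_start: str, is_draft: bool) -> bool:
    if is_draft:
        return False  # drafts fail on purpose; the job reruns on ready_for_review
    return all(run["conclusion"] in PASSING for run in latest_runs(check_runs, window_start).values())
```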
Touched files: .github/workflows/required-aggregator.yml, .github/workflows/rule-enforcement.yml, .github/AGENTS.md, docs/adr/0679-ci-draft-automerge-gate.md, docs/research/0699-ci-draft-automerge-gate.md, changelog.d/fixed/0679-ci-draft-automerge-gate.md, docs/rebase-notes.md (this entry).
ADR-0680 — Shared AI CLI helper pattern
AI script helper impact. Batch manifest runners now share parser and raw argv boilerplate through aiutils.cli_helpers. Rebase work that changes the standard manifest/report/fail-fast flags should update the helper and all batch runner tests together instead of editing each runner independently.
Key invariants:
- collect_cli_argv() is the canonical raw-argument capture for ADR-0661 provenance in scripts that accept an injectable argv.
- add_batch_manifest_arguments() owns --manifest, --base-dir, --report-json, --report-md, --fail-fast, and optional --allow-row-failures for batch manifest runners.
- Table-specific manifest schemas and materializer semantics stay in the individual runner modules.
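A shared flag-registration helper and raw-argv capture could look like the following argparse sketch. The sketch_-prefixed helpers are hypothetical stand-ins for collect_cli_argv() and add_batch_manifest_arguments(); the flag types and defaults are assumptions.

```python
from __future__ import annotations

import argparse
import sys
from pathlib import Path
from typing import Sequence


def sketch_collect_cli_argv(argv: Sequence[str] | None) -> list[str]:
    """Injected argv (tests, wrappers) wins; otherwise use the real process arguments."""
    return list(argv) if argv is not None else list(sys.argv[1:])


def sketch_add_batch_manifest_arguments(
    parser: argparse.ArgumentParser, *, allow_row_failures: bool = False
) -> None:
    """Register the standard batch-runner flags in one place."""
    parser.add_argument("--manifest", type=Path, required=True)
    parser.add_argument("--base-dir", type=Path, default=None)
    parser.add_argument("--report-json", type=Path, default=None)
    parser.add_argument("--report-md", type=Path, default=None)
    parser.add_argument("--fail-fast", action="store_true")
    if allow_row_failures:  # optional: only runners that tolerate per-row failures opt in
        parser.add_argument("--allow-row-failures", action="store_true")


def parse_batch_args(argv: Sequence[str] | None = None) -> argparse.Namespace:
    raw_argv = sketch_collect_cli_argv(argv)
    parser = argparse.ArgumentParser(prog="batch_example")
    sketch_add_batch_manifest_arguments(parser, allow_row_failures=True)
    args = parser.parse_args(raw_argv)
    args.raw_argv = raw_argv  # kept verbatim for the ADR-0661 provenance block
    return args
```

Keeping the flags in one registration function is what lets a rebase touch a single helper plus its tests rather than every runner.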
Touched files: ai/src/aiutils/cli_helpers.py, ai/scripts/batch_materialize_saliency_features.py, ai/scripts/batch_materialize_second_opinion_features.py, ai/scripts/batch_materialize_mos_labels.py, ai/tests/test_cli_helpers.py, ai/AGENTS.md, ai/src/aiutils/AGENTS.md, .claude/skills/ai-run-manifest/SKILL.md, docs/ai/training.md, docs/adr/0680-ai-cli-helper-pattern.md, docs/research/0700-ai-cli-helper-pattern.md, changelog.d/added/0680-ai-cli-helper-pattern.md, docs/rebase-notes.md (this entry).
ADR-0681 — AI script bootstrap helper
AI script import impact. Directly executable ai/scripts/*.py files now use ai/scripts/_script_bootstrap.py::bootstrap_ai_script(__file__) for repo-local imports before they import aiutils, sibling materializers, or vmaf-tune helpers.
Key invariants:
- aiutils must remain free of startup path mutation; the bootstrap lives in ai/scripts because it has to run before ai/src is importable.
- New ad hoc sys.path.insert(...) blocks in AI scripts should be avoided. If a script needs a new repo-local root, extend _script_bootstrap.py and ai/tests/test_script_bootstrap.py.
- The helper only owns import roots; artifact schemas, materializer rules, and report contents stay in the individual scripts.
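The bootstrap idea can be sketched as deriving repo-local roots from the script's own path and prepending them to sys.path idempotently. The function name, the assumption that the script sits at ai/scripts/<name>.py, and the two roots chosen are illustrative; the real _script_bootstrap.py may register different roots.

```python
from __future__ import annotations

import sys
from pathlib import Path


def sketch_bootstrap_ai_script(script_file: str) -> list[Path]:
    """Prepend assumed repo-local import roots for a script at ai/scripts/<name>.py."""
    scripts_dir = Path(script_file).resolve().parent  # .../ai/scripts
    roots = [scripts_dir.parent / "src", scripts_dir]  # assumed roots: ai/src, then ai/scripts
    # Insert in reverse so sys.path ends up in the same order as `roots`.
    for root in reversed(roots):
        entry = str(root)
        if entry not in sys.path:  # idempotent: a second call does not grow sys.path
            sys.path.insert(0, entry)
    return roots
```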
Touched files: ai/scripts/_script_bootstrap.py, ai/scripts/batch_materialize_saliency_features.py, ai/scripts/batch_materialize_second_opinion_features.py, ai/scripts/batch_materialize_mos_labels.py, ai/scripts/enrich_k150k_parquet_metadata.py, ai/scripts/combine_full_feature_parquets.py, ai/scripts/extract_k150k_features.py, ai/tests/test_script_bootstrap.py, ai/AGENTS.md, ai/src/aiutils/AGENTS.md, .claude/skills/ai-run-manifest/SKILL.md, docs/ai/training.md, docs/adr/0681-ai-script-bootstrap-helper.md, docs/research/0701-ai-script-bootstrap-helper.md, changelog.d/changed/0681-ai-script-bootstrap-helper.md, docs/rebase-notes.md (this entry).
fix/mcp-cjson-banned-functions (ADR-0683)
No upstream rebase impact: core/src/mcp/3rdparty/cJSON/ is fork-local; upstream Netflix/vmaf does not vendor cJSON. There is no rebase conflict risk from the Netflix side.
Invariant: if this directory is synced to a newer cJSON upstream release, verify that no banned functions (sprintf, strcpy) have been re-introduced, and re-apply the fixes documented in ADR-0683. The AGENTS.md in this directory carries the exact grep command to check.
Smoke: ninja -C build && meson test -C build --suite=fast (no dedicated cJSON unit test; the MCP smoke covers the JSON paths).
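For a quick local check after a cJSON sync, here is a rough scanner for the two banned calls. It is not the grep command from the directory's AGENTS.md. Its regex does not skip comments or string literals, so treat hits as leads rather than verdicts.

```python
import re
from pathlib import Path

BANNED = ("sprintf", "strcpy")
# \b keeps vsprintf from matching; \s*\( requires a call-like use. snprintf/strncpy never match.
CALL_RE = re.compile(r"\b(" + "|".join(BANNED) + r")\s*\(")


def find_banned_calls(source: str) -> list[tuple[int, str]]:
    """Return (1-based line number, function name) for each banned call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_file(path: Path) -> list[tuple[int, str]]:
    return find_banned_calls(path.read_text(encoding="utf-8", errors="replace"))
```

Usage: scan_file(Path("core/src/mcp/3rdparty/cJSON/cJSON.c")) should return an empty list once the ADR-0683 fixes are in place.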
Touched files: core/src/mcp/3rdparty/cJSON/cJSON.c, core/src/mcp/3rdparty/cJSON/AGENTS.md, docs/adr/0683-cjson-banned-function-remediation.md, docs/adr/README.md, changelog.d/fixed/0683-mcp-cjson-banned-functions.md, docs/rebase-notes.md (this entry).
2026-05-21 follow-up — contract-noise filter widening
No upstream Netflix C-source rebase impact. This stays within ADR-0659's scanner-precision policy.
Key invariant: suppress only context-bound false positives: optional-backend contracts that name HAVE_*, enable_*=false, missing loader/runtime, or CPU fallback; unit-test stub prose; and ADR allocator .md.stub reservation wording. Non-implementation "stub" uses such as Python type-stub packages, driver-stub diagnostics, and ABI-pinning disabled-build stub comments are also filtered. Do not add broad file-level allowlists.
Touched files: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, docs/development/project-modernization-audit.md, docs/research/0685-modernization-audit-contract-noise.md, scripts/AGENTS.md, changelog.d/fixed/0685-modernization-audit-contract-noise.md, docs/rebase-notes.md (this entry).
ADR-0682 — Tiny-AI Netflix corpus training scaffold — 2026-05-22 prep scope
- ADR: ADR-0682.
- Upstream source: fork-local. Netflix/vmaf has no tiny-AI training surface.
- Branch: ai/tiny-netflix-training-scaffold.
Key invariants:
- Data path is local-only. .workingdir2/netflix/ is gitignored; YUV files are never committed. Every training script must accept --data-root (or the VMAF_DATA_ROOT environment variable) as the sole corpus entry point.
- Branch name is the routine's idempotency key. Once ai/tiny-netflix-training-scaffold exists on origin, the daily prep-scaffolding routine exits silently. Do not rename or delete the branch until the follow-up architecture-selection PR has merged.
- Netflix golden pairs are held-out only. The 3 pairs in python/test/resource/yuv/ (see CLAUDE.md §8) are correctness gates; they are never used as training data.
- Architecture selection is deferred. ADR-0682 and ADR-0242 document the alternatives table but do not pick an architecture. The follow-up PR must resolve questions (A), (B), (C) from ADR-0242 before any training run.
Touched files: docs/adr/0682-tiny-ai-netflix-training-scaffold-2026-05-22.md, docs/adr/_index_fragments/0682-tiny-ai-netflix-training-scaffold-2026-05-22.md, docs/research/0706-tiny-ai-netflix-training-prep-2026-05-22.md, changelog.d/added/0682-tiny-ai-netflix-training-scaffold-2026-05-22.md, docs/rebase-notes.md (this entry).
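Here is a sketch of the single corpus entry point. The notes fix the two inputs but not their precedence. Letting --data-root win over VMAF_DATA_ROOT, failing when neither is set, and the resolve_data_root name are assumptions.

```python
from __future__ import annotations

import argparse
import os
from pathlib import Path
from typing import Mapping, Sequence


def resolve_data_root(argv: Sequence[str] | None = None, env: Mapping[str, str] | None = None) -> Path:
    """Return the corpus root from --data-root or VMAF_DATA_ROOT; never guess a default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-root", type=Path, default=None)
    args, _unknown = parser.parse_known_args(argv)  # other script flags are left alone
    if args.data_root is not None:
        return args.data_root  # assumed: an explicit flag beats the environment
    if env.get("VMAF_DATA_ROOT"):
        return Path(env["VMAF_DATA_ROOT"])
    # No silent fallback to .workingdir2/netflix/: the corpus location must be explicit.
    raise SystemExit("set --data-root or VMAF_DATA_ROOT")
```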
feat/bindings-rust-vmafx-sys (ADR-0706) — fork-only Rust crate, no Netflix upstream impact
No upstream rebase impact: bindings/rust/vmafx-sys, the root Cargo.toml, .github/workflows/rust-ci.yml, and docs/development/rust.md are wholly fork-local. Netflix/vmaf upstream has no Rust surface; upstream cherry-picks and port-upstream-commit syncs are unaffected. The libvmaf C public headers consumed by bindgen remain at core/include/libvmaf/ (ADR-0700 path); any future upstream header change that adds or removes a symbol is handled automatically by re-running cargo build (bindgen regenerates on every build).
feat/vmafx-phase4b-distributed-platform-adr-0709 — fork-only architectural decision, no Netflix upstream impact
No upstream rebase impact: ADR-0709 and the Phase 4b architecture diagram (docs/architecture/phase4b-distributed-platform.md) are wholly fork-local documents. Netflix/vmaf upstream has no controller/node/operator architecture, no Go or Rust binaries, and no rclone/eBPF integration. Upstream cherry-picks and port-upstream-commit syncs are unaffected.
The C ABI break decision (Phase 4b.8) will require updating ffmpeg-patches/ when the implementation PR lands; that PR's docs/rebase-notes.md entry will detail the specific patch files affected. This umbrella ADR does not touch any C source files.
Touched files: docs/adr/0709-vmafx-phase4b-distributed-platform.md, docs/architecture/phase4b-distributed-platform.md, changelog.d/added/vmafx-phase4b-umbrella-adr.md, docs/state.md, docs/rebase-notes.md (this entry), docs/adr/README.md.
docs/research-netflix-pipeline-backlog-audit (Research-0732) — research digest only, no Netflix upstream impact
No rebase impact: this PR adds only docs/research/0732-netflix-pipeline-backlog-audit.md and a changelog fragment, plus the docs/state.md and rebase-notes bookkeeping entries. No C sources, headers, build files, or test fixtures are touched. Netflix/vmaf upstream cherry-picks and port-upstream-commit syncs are unaffected.
Touched files: docs/research/0732-netflix-pipeline-backlog-audit.md, changelog.d/added/0732-netflix-pipeline-backlog-audit.md, docs/state.md, docs/rebase-notes.md (this entry).
refactor/cpp23-pilot-metadata-handler — no upstream Netflix conflict¶
No merge conflict, but upstream changes to this file need a manual port. metadata_handler.cpp is a fork-local refactor: Netflix/vmaf upstream still has libvmaf/src/metadata_handler.c at the pre-rename path, and the rename to .cpp is fork-local (upstream stays .c). If an upstream commit touches libvmaf/src/metadata_handler.c, the port must:
- Apply the upstream diff content to core/src/metadata_handler.cpp manually (the C code is still valid C++ after the conversion).
- Verify the extern "C" guards in metadata_handler.h are not disturbed.
- Rebuild and re-run make test-netflix-golden to confirm the scores are unchanged.
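The extern "C" guards in the second step are what keep the renamed .cpp file linkable from C callers. A minimal sketch of the pattern, assuming a header shared by C and C++ translation units; metadata_sketch.h and metadata_sketch_count are hypothetical names, not the fork's metadata_handler API:

```cpp
// ---- metadata_sketch.h (hypothetical header shared by C and C++ TUs) ----
#ifndef METADATA_SKETCH_H
#define METADATA_SKETCH_H

#ifdef __cplusplus
extern "C" {  // give the declarations C linkage (no C++ name mangling)
#endif

// Counts entries with a non-negative key; callable from C and C++ alike.
int metadata_sketch_count(const int *keys, int n);

#ifdef __cplusplus
}  // extern "C"
#endif

#endif  // METADATA_SKETCH_H

// ---- metadata_sketch.cpp (C++ implementation behind the C ABI) ----
#include <algorithm>

extern "C" int metadata_sketch_count(const int *keys, int n) {
    if (keys == nullptr || n <= 0) return 0;
    // C++ internals are fine here; only the exported signature must stay C.
    return static_cast<int>(
        std::count_if(keys, keys + n, [](int k) { return k >= 0; }));
}
```

If the guards are dropped, the C++ compiler emits a mangled symbol and the remaining C translation units fail to link against it.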
The meson.build change (replacing the src_dir + 'metadata_handler.c' entry with the metadata_handler_cpp20_lib static lib) is entirely fork-local and has no upstream equivalent.
Touched files: core/src/metadata_handler.cpp (was metadata_handler.c), core/src/metadata_handler.h (added extern "C" guards), core/src/meson.build (isolated static lib for C++20), core/test/meson.build (updated .c -> .cpp references), docs/adr/0708-vmafx-cpp23-internals-pilot.md, docs/research/0732-vmafx-cpp23-internals-migration-plan.md, changelog.d/changed/0708-cpp23-internals-pilot.md, docs/state.md (new row), docs/rebase-notes.md (this entry).
ADR-0707 — TAD Rust pilot (cbindgen integration) — 2026-05-28¶
- ADR: ADR-0707.
- Upstream source: fork-local. Netflix/vmaf has no Rust feature extractors.
- Branch: feat/tad-rust-pilot
Key rebase invariants:
- core/src/feature/feature_extractor.c gains #if HAVE_RUST_TAD guards around the vmaf_fex_tad extern and list entry. On upstream sync, ensure these guards are preserved; do not merge the upstream version of this file without re-applying the guards.
- core/src/meson.build has a cargo build --release custom_target and a declare_dependency for the Rust archive. These are entirely fork-local additions; upstream's meson.build will not have them. The additions appear after the libvmaf_feature_sources list and before the libvmaf = library() call.
- tad_rust.c is compiled as a DIRECT source of the libvmaf library target (not into libvmaf_feature.a). This is an intentional architectural choice; do not move it into libvmaf_feature_sources on rebase.
- Cargo.toml at the repo root is the workspace manifest. Upstream will never have this file; no merge conflict expected.
- The enable_rust_features meson option in core/meson_options.txt is fork-local; preserve it on upstream merges.
Cargo.toml (repo root, new), core/meson_options.txt, core/src/meson.build, core/src/feature/feature_extractor.c, core/src/feature/tad_rust.c (new), core/src/feature/rust/tad/ (new crate directory), core/test/meson.build, core/test/test_tad_rust.c (new), docs/adr/0707-vmafx-rust-pilot-feature.md, docs/metrics/tad.md, changelog.d/added/tad-rust-pilot.md,
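A minimal sketch of the guard pairing described in the first invariant, assuming the build passes HAVE_RUST_TAD as 0 or 1 (defaulted to 0 here so the snippet compiles standalone). FexSketch, fex_psnr_sketch and the list name are hypothetical stand-ins for the real extractor struct and registry; the real feature_extractor.c is C, this sketch is C++:

```cpp
#include <cstddef>

// Assumed to come from the generated build config as 0 or 1.
#ifndef HAVE_RUST_TAD
#define HAVE_RUST_TAD 0
#endif

// Hypothetical stand-in for the real feature-extractor struct.
struct FexSketch {
    const char *name;
};

static FexSketch fex_psnr_sketch = {"psnr"};

#if HAVE_RUST_TAD
// Defined by the Rust crate's static archive; only declared when it is built.
extern "C" FexSketch vmaf_fex_tad;
#endif

static FexSketch *feature_extractor_list_sketch[] = {
    &fex_psnr_sketch,
#if HAVE_RUST_TAD
    &vmaf_fex_tad,  // the list entry needs the same guard as the extern
#endif
    nullptr,        // sentinel terminating the registry
};

// Walks the registry up to the sentinel.
std::size_t count_registered_extractors() {
    std::size_t n = 0;
    while (feature_extractor_list_sketch[n] != nullptr) n++;
    return n;
}
```

Both guards matter. With enable_rust_features off, an unguarded &vmaf_fex_tad entry fails to compile if the extern is guarded, and fails to link if it is not, because no archive defines the symbol.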
CAMBI Python compat-layer sync v0.5 → v0.8 — 2026-05-28¶
- ADR: no ADR required — 1:1 upstream port with no fork-local divergence.
- Upstream source: Netflix/vmaf CambiFeatureExtractor version history through v0.8 (Research-0732 item #4).
- Branch: chore/cambi-python-v0.8-sync
Rebase notes: The fork is now at parity with upstream Netflix/vmaf for the Python CAMBI wrappers as of 2026-05-28. Future Netflix syncs of compat/python-vmaf/core/cambi_feature_extractor.py, compat/python-vmaf/core/cambi_quality_runner.py, and python/test/cambi_test.py should merge cleanly. No fork-local divergence was introduced; this was a pure upstream port.
compat/python-vmaf/core/cambi_feature_extractor.py, compat/python-vmaf/core/cambi_quality_runner.py, python/test/cambi_test.py, changelog.d/changed/cambi-python-v0.8-sync.md,
vmafx-node Go worker binary (ADR-0713)¶
no rebase impact: fork-only addition — all new files under cmd/vmafx-node/, pkg/gpu/, pkg/ai/, gen/go/controller/, docker/Dockerfile.node*, deploy/helm/vmafx/templates/node.yaml. No C sources, no upstream-mirror files touched. pkg/encoder/discover.go and pkg/encoder/hardware.go are new fork-local files; pkg/encoder/encoder.go (already fork-local from ADR-0705) is not modified.
Files added: cmd/vmafx-node/main.go (new), cmd/vmafx-node/executor.go (new), cmd/vmafx-node/main_test.go (new), gen/go/controller/controller.pb.go (new), gen/go/controller/controller_grpc.pb.go (new), pkg/gpu/detect.go (new), pkg/gpu/detect_test.go (new), pkg/ai/infer.go (new), pkg/ai/infer_test.go (new), pkg/encoder/discover.go (new), pkg/encoder/hardware.go (new), docker/Dockerfile.node (new), docker/Dockerfile.node-cpu (new), docker/Dockerfile.node-cuda12 (new), docker/Dockerfile.node-rocm6 (new), docker/Dockerfile.node-sycl-oneapi2026 (new), deploy/helm/vmafx/templates/node.yaml (new), deploy/helm/vmafx/templates/_helpers.tpl (extended), deploy/helm/vmafx/values.yaml (extended — .Values.node section added), docs/server/node.md (new), docs/adr/0713-vmafx-node-impl.md (new), changelog.d/added/vmafx-node.md (new).
Research-0733 — VMAFX eBPF optimization target — 2026-05-28¶
No rebase impact: docs-only PR. All touched files (docs/research/, changelog.d/, docs/state.md, docs/rebase-notes.md) are fork-local with no upstream Netflix/vmaf equivalent. No C source, no build system, no test assertions changed.
Touched files: docs/research/0733-vmafx-ebpf-optimization-target.md (new), changelog.d/changed/ebpf-research.md (new), docs/state.md (new row), docs/rebase-notes.md (this entry).
cmd/vmafx-operator — Kubernetes Operator kubebuilder skeleton (ADR-0714)¶
No rebase impact on upstream C/Python code: the operator is entirely fork-local (api/vmafx/v1/, cmd/vmafx-operator/, config/crd/, config/rbac/, deploy/helm/vmafx/crds/, deploy/helm/vmafx/templates/operator-*.yaml, go.mod, go.sum). None of these paths overlap with Netflix/vmaf upstream.
If a future upstream sync adds a Go module or touches go.mod, merge the dependency lists in go.mod and regenerate go.sum.
Fork-local files: api/vmafx/v1/ (new), cmd/vmafx-operator/ (new), config/crd/bases/ (new), config/rbac/role.yaml (new), deploy/helm/vmafx/crds/ (new), deploy/helm/vmafx/templates/operator-deployment.yaml (new), deploy/helm/vmafx/templates/operator-rbac.yaml (new), deploy/helm/vmafx/values.yaml (operator.* section added), docs/adr/0714-vmafx-operator-skeleton.md, docs/development/operator.md, changelog.d/added/vmafx-operator-skeleton.md,
core/src/feature/cuda/AGENTS.md — __mul24 prohibition invariant (Research-0734, 2026-05-28)¶
The 2026-05-28 audit confirmed zero __mul24 / __umul24 / __mul24hi usages in the fork's CUDA kernel tree. A prohibition invariant was added to core/src/feature/cuda/AGENTS.md. On upstream sync: if Netflix/vmaf ever adds a CUDA kernel that uses these intrinsics, the invariant requires the porter to either remove the intrinsic (replace it with *) or obtain CODEOWNERS sign-off documenting the minimum-CUDA-13.3 constraint (see the AGENTS.md note for the full acceptance criteria).
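Replacing an intrinsic with * only preserves results when both operands already fit in 24 bits, so a port that takes the first option should confirm the operand ranges. A host-side sketch of __umul24's documented semantics (the low 32 bits of the product of the low 24 bits of each operand); umul24_emulated and umul24_replaceable_by_mul are hypothetical review helpers, not device code, and only the unsigned __umul24 form is covered:

```cpp
#include <cstdint>

// Host-side emulation of CUDA's __umul24: multiply the low 24 bits of each
// operand and keep the low 32 bits of the (up to 48-bit) product.
std::uint32_t umul24_emulated(std::uint32_t a, std::uint32_t b) {
    const std::uint64_t lo_a = a & 0xFFFFFFu;
    const std::uint64_t lo_b = b & 0xFFFFFFu;
    return static_cast<std::uint32_t>(lo_a * lo_b);
}

// True when replacing __umul24(a, b) with a * b cannot change the result:
// if both operands fit in 24 bits, masking is a no-op and both forms keep
// the same low 32 bits of the product.
bool umul24_replaceable_by_mul(std::uint32_t a, std::uint32_t b) {
    return a <= 0xFFFFFFu && b <= 0xFFFFFFu;
}
```

For example, an operand of 1 << 24 has all-zero low 24 bits, so __umul24 returns 0 where a plain multiply returns 16777216.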
No upstream file is currently in conflict; this note exists to alert future sync agents that the invariant file was intentionally added by the fork and should be preserved through rebases.
Research-0734 — CUDA 13.3 fix-list deep audit¶
No rebase impact on upstream C/Python code: this PR is docs-only (research digest, changelog fragment, state.md row, rebase-notes entry). No C source, .cu kernel, or build file is modified.
If a future upstream sync changes dev/Containerfile or Dockerfile CUDA base-image pins, verify that the new pin is >= 13.3 to ensure the NVCC thread-reconvergence fix [6156910] is included.
Fork-local files: docs/research/0734-cuda-13.3-fix-list-deep-audit.md (new), changelog.d/changed/cuda-13.3-fix-list-deep-audit.md (new), docs/state.md (new row), docs/rebase-notes.md (this entry).
scripts/dev/cleanup-agent-state.sh — agent-state cleanup utility¶
No rebase impact on upstream C/Python code: the script is entirely fork-local developer tooling that touches no compiled sources, tests, or public API.
If a future upstream sync adds a scripts/dev/ directory, merge manually (name collision is the only risk; no logic conflict).
Fork-local files added: scripts/dev/cleanup-agent-state.sh (new), docs/development/agent-worktree-discipline.md (cleanup section added), changelog.d/added/dev-cleanup-script.md (new).
Research-0734 — CUDA VIF filter1d ncu hotpath (no rebase impact)¶
no rebase impact: pure research digest; no source files modified.
Research-0744 cross-backend baseline (2026-05-28) — no rebase impact¶
This PR adds only docs/research/0744-cuda-cross-backend-baseline-pre-ncu-perf.md, changelog.d/perf/cuda-cross-backend-baseline.md, and a docs/state.md row. No C, header, Python, or build files are modified. No upstream sync action is required.
docs/research/0734–0738 — CUDA ADM/motion/SSIM/MS-SSIM ncu hotpath profiles (2026-05-28)¶
No rebase impact: research-only documents, no source code changes. The profiling findings (Research-0734 through 0738) are advisory; no kernel modifications were made in this PR. When a follow-up PR implements the integer_ssim_score.cu extern "C" fix (Research-0736 recommendation 1), that PR must also update ssim_cuda.c host glue and verify bit-exact parity against the CPU integer_ssim extractor on the Netflix golden fixture.
C++23 wave adversarial review (2026-05-28)¶
Read-only review of PRs #41, #43, #44, #45, #48, #51, #54, #56, #58. No files were modified by this review. The review digest is in docs/research/cpp23-wave-adversarial-review-20260528.md.
Critical issues that must be fixed before merge:
- PR #43 opt.cpp: strtol/strtod on potentially non-NUL-terminated string_view::data()
- PR #48 dict.cpp: strtof (float) assigned to double, losing precision on option values
- PR #54 model.cpp: strlen(model->name) - 5U unsigned underflow → heap overflow
- PR #58 ref.cpp: make_unique / C-caller free() allocator mismatch
No rebase impact from the review itself; all findings are fixes required in those PRs.
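The four critical findings follow well-known patterns. A sketch of the usual fix for each, assuming C++17 (<charconv>, std::optional); every function name here is hypothetical and none of it is the code from those PRs:

```cpp
#include <charconv>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// PR #43 pattern: std::from_chars parses an explicit [first, last) range, so
// it never reads past the end of a string_view that lacks a NUL terminator
// (strtol/strtod on sv.data() can).
std::optional<long> parse_long_sv(std::string_view sv) {
    long value = 0;
    const char *last = sv.data() + sv.size();
    auto [ptr, ec] = std::from_chars(sv.data(), last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;  // reject junk
    return value;
}

// PR #48 pattern: copy into std::string (guaranteed NUL-terminated) and parse
// with strtod, so a double option keeps double precision (strtof would round
// to float first).
std::optional<double> parse_double_sv(std::string_view sv) {
    const std::string buf(sv);
    char *end = nullptr;
    const double value = std::strtod(buf.c_str(), &end);
    if (end == buf.c_str() || *end != '\0') return std::nullopt;
    return value;
}

// PR #54 pattern: compare before subtracting; with size_t, strlen(s) - 5U
// wraps to a huge value when s is shorter than 5 characters.
std::optional<std::size_t> len_minus(const char *s, std::size_t k) {
    const std::size_t n = std::strlen(s);
    if (n < k) return std::nullopt;
    return n - k;
}

// PR #58 pattern: memory that a C caller will release with free() must come
// from malloc(), not from new / std::make_unique.
char *dup_for_c_caller(std::string_view sv) {
    char *out = static_cast<char *>(std::malloc(sv.size() + 1));
    if (out == nullptr) return nullptr;
    if (!sv.empty()) std::memcpy(out, sv.data(), sv.size());
    out[sv.size()] = '\0';
    return out;
}
```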
core/src/feature/cuda/integer_ssim/ — extern "C" on new kernels (ADR-0747)¶
Any upstream or fork PR that adds a new __global__ kernel to a .cu file under core/src/feature/cuda/ or core/src/cuda/ must wrap the entry point in extern "C" { } if it is also referenced by cuModuleGetFunction in the host .c glue.
The invariant is enforced by scripts/dev/check-cuda-extern-c.sh. Run it locally before pushing. On upstream sync, if Netflix adds new CUDA kernels to their libvmaf/src/feature/cuda/ tree, check whether those kernels use extern "C" in the upstream source and mirror the pattern here.
This invariant was formalised after the audit that found integer_ssim/integer_ssim_score.cu missing extern "C", which had silently broken --feature ssim --backend cuda since its introduction (PR #77 fixed the analogous break in ssim_score.cu; ADR-0747 fixes integer_ssim_score.cu).
core/src/feature/cuda/integer_vif/filter1d.cu — ADR-0743 launch_bounds + __ldg¶
No rebase-sensitive invariants for downstream callers — the changes are confined to the device-side kernel body and the FILTER1D_8_HORI macro. The symbol name filter1d_8_horizontal_kernel_2_17_9 is unchanged; the C host file integer_vif_cuda.c continues to load and dispatch it by name.
If an upstream Netflix/vmaf sync introduces changes to filter1d.cu:
- The __launch_bounds__(128, 10) annotation on FILTER1D_8_HORI must be preserved (or re-applied); upstream does not carry this hint.
- The __ldg() calls on the 7 buf.tmp.* loads must be preserved.
- If upstream changes val_per_thread or HORI_TILE_W, recheck the smem budget constraint (14812 B/block at vpt=4 is smem-limited on sm_89; see ADR-0743 for the derivation).
- The ptxas advisory "minnctapersm out of range, ignored" for sm_75/sm_80/sm_86 is expected and benign; do not treat it as a gate failure.
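When val_per_thread or HORI_TILE_W changes, the recheck comes down to integer division. A sketch under stated assumptions, not ADR-0743's derivation: it assumes sm_89 offers 102,400 B of shared memory and 65,536 registers per SM, reserves 1,024 B of shared memory per block, and allocates registers per thread in steps of 8. SmLimits, kAssumedSm89, blocks_by_smem and reg_cap_for are hypothetical names; verify the limits against NVIDIA's occupancy tables before relying on the numbers.

```cpp
#include <cstdint>

// Per-SM limits the calculation depends on. The values in kAssumedSm89 are
// assumptions for illustration; confirm them for the actual target arch.
struct SmLimits {
    std::uint32_t smem_per_sm;          // bytes of shared memory per SM
    std::uint32_t smem_reserved_block;  // bytes the runtime reserves per block
    std::uint32_t regs_per_sm;          // 32-bit registers per SM
    std::uint32_t reg_granularity;      // per-thread register allocation step
};

constexpr SmLimits kAssumedSm89{102400u, 1024u, 65536u, 8u};

// Resident blocks per SM permitted by shared memory alone.
std::uint32_t blocks_by_smem(const SmLimits &lim, std::uint32_t smem_per_block) {
    return lim.smem_per_sm / (smem_per_block + lim.smem_reserved_block);
}

// Largest per-thread register count that still lets `min_blocks` blocks of
// `threads` threads be resident on one SM, rounded down to the allocation step.
std::uint32_t reg_cap_for(const SmLimits &lim, std::uint32_t threads,
                          std::uint32_t min_blocks) {
    const std::uint32_t raw = lim.regs_per_sm / (threads * min_blocks);
    return raw / lim.reg_granularity * lim.reg_granularity;
}
```

With these assumed limits, blocks_by_smem(kAssumedSm89, 14812) gives 6 resident blocks, fewer than the 10 requested by __launch_bounds__(128, 10), which is consistent with the "smem-limited" wording above. reg_cap_for(kAssumedSm89, 128, 10) gives 48 registers per thread.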
research-0748 / PR #76 1080p re-measurement — no new rebase invariants¶
The 1080p re-measurement (research-0748) validates PR #76 at production resolution. No new rebase-sensitive invariants beyond those already documented in the ADR-0743 __launch_bounds__ + __ldg entry above. The register budget (48 regs/thread) and __ldg annotations must be preserved on any upstream sync that touches filter1d.cu per the existing note.
One-off container SYCL device-access pattern (--device /dev/dri --group-add 988)¶
No rebase impact on upstream C/Python code.
When running vmaf-dev-mcp:cuda13.3 as a one-off docker run with SYCL needed:
- --device /dev/dri is not sufficient. The Level Zero GPU ICD requires the /dev/dri/by-path/pci-XXXX:YY:ZZ.W-render symlinks to enumerate Intel devices. These symlinks are not passed by --device /dev/dri; they require an explicit -v /dev/dri/by-path:/dev/dri/by-path:ro bind-mount.
- --group-add render fails because render is not a group name inside the container. Use --group-add 988 instead; 988 is the host render GID on the machine where this was verified, so check the local value with getent group render.
- Source setvars.sh inside the container before invoking sycl-ls or vmaf --backend sycl.
The docker compose deployment (dev/docker-compose.yml) already carries the by-path bind-mount per ADR-0514; this note covers one-off docker run usage.
Fork-local files: docs/research/0734-cross-backend-baseline-with-sycl-20260528.md (new), changelog.d/changed/cross-backend-baseline-with-sycl.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).
ADR-0752 — Multi-resolution perf benchmark baseline¶
No rebase impact on upstream C/Python code.
Fork-local files only (all new except core/AGENTS.md, which gains an appended invariant):
- scripts/perf/bench-multi-resolution.sh: benchmark harness
- testdata/perf_multi_resolution.json: baseline snapshot (schema_version=1)
- docs/development/perf.md: usage docs
- docs/research/research-0752-perf-bench-multi-resolution-baseline.md
- docs/adr/0752-perf-bench-multi-resolution.md
- changelog.d/added/perf-bench-multi-resolution.md
- core/AGENTS.md (new invariant appended)
The upscaled fixture cache files (testdata/ref_1920x1080_48f.yuv, etc.) are generated on first run and should be .gitignored (they are reproducible from the 576×324 native fixture via ffmpeg -vf scale=W:H:flags=bilinear).
perf/cuda-ssim-vert-combine-ldg-launch-bounds-leak-20260529 (ADR-0754)¶
No rebase impact on upstream C/Python code.
core/src/feature/cuda/integer_ssim/ssim_score.cu and core/src/feature/cuda/integer_ssim_cuda.c are wholly fork-local files with no upstream Netflix equivalents. The VmafCudaBuffer struct and the vmaf_cuda_kernel_readback_free / vmaf_cuda_buffer_host_free helpers are fork-local CUDA infrastructure. No Netflix upstream commit will collide with these changes on sync-upstream.
Fork-local files modified: core/src/feature/cuda/integer_ssim/ssim_score.cu (F2 + F4: __ldg() + __launch_bounds__), core/src/feature/cuda/integer_ssim_cuda.c (F6 per-caller save+free DROPPED, superseded by the helper fix in PR #94), core/src/feature/cuda/AGENTS.md (invariant notes), docs/adr/0754-cuda-ssim-vert-combine-ldg-pinned-leak.md (new), docs/adr/README.md (new row), docs/research/0754-cuda-ssim-vert-combine-ldg-launch-bounds-2026-05-29.md (new), changelog.d/perf/cuda-ssim-vert-combine.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).
Research-0755 — HIP backend audit (2026-05-29)¶
No rebase impact on upstream C/Python code.
All files modified are fork-local: core/src/feature/hip/AGENTS.md (invariant notes), docs/research/0755-hip-backend-audit-20260529.md (new), changelog.d/changed/hip-backend-audit.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).
No source files were modified (audit-only). No Netflix upstream commit will collide with these additions on sync-upstream.
research/cuda-f3-struct-by-value-audit-20260529 (2026-05-29)¶
No rebase impact: this PR adds documentation-only files (research digest, ADR, changelog fragment, state.md row). No CUDA source files are modified. VmafCudaBuffer, VmafPicture, and AdmBufferCuda definitions are unchanged; no upstream Netflix/vmaf commit will collide with this PR's diff.
Fork-local files added/modified: docs/research/research-0756-cuda-f3-struct-by-value-audit.md (new), docs/adr/0756-cuda-f3-struct-by-value-audit.md (new), docs/adr/README.md (new row), changelog.d/perf/cuda-f3-struct-by-value-audit.md (new), docs/state.md (new row),
ADR-0755: C++23 Wave 7 — activate cpu.cpp (PR on 2026-05-29)¶
No rebase impact on upstream C/Python code.
core/src/cpu.c was deleted and core/src/meson.build updated to compile cpu.cpp. The file cpu.cpp is wholly fork-local (no upstream Netflix equivalent). No Netflix upstream commit will collide with this deletion.
Fork-local files modified: core/src/cpu.c (deleted), core/src/meson.build (cpu.c → cpu.cpp in libvmaf_cpu_sources), docs/adr/0755-cpp23-wave7-single-file.md (new), docs/adr/README.md (new row), changelog.d/changed/0755-cpp23-wave7-cpu-cpp.md (new), docs/rebase-notes.md (this entry).
research/cuda-motion-ncu-profile-20260529¶
No rebase impact: research-only commit. No source files modified. Files added: docs/research/0760-cuda-motion-ncu-multi-resolution-20260529.md, changelog.d/perf/cuda-motion-ncu-multi-resolution.md, docs/adr/0760-cuda-motion-ncu-multi-resolution.md (research ADR). No upstream collision risk.
HIP ADM buffer-by-pointer refactor (ADR-0759, 2026-05-29)¶
Files touched: core/src/feature/hip/integer_adm/adm_csf.hip, core/src/feature/hip/integer_adm/adm_cm.hip, core/src/feature/hip/integer_adm_hip.c, core/src/feature/hip/AGENTS.md
Rebase impact: None. All touched files are fork-added; no upstream Netflix/vmaf file is modified. The HIP backend does not exist in upstream. No rebase conflict is possible with upstream syncs.
The changed kernel signatures are internal to the HIP dispatch path and are not part of any public API.
ADR-0762 — CUDA CIEDE2000 __ldg() F3 fix (2026-05-29)¶
No rebase impact on upstream C/Python code.
All files modified are fork-local: core/src/feature/cuda/integer_ciede/ciede_score.cu (F3 fix: __ldg + __launch_bounds__), core/src/feature/cuda/integer_vif_cuda.c (resolve pre-existing merge-conflict stub from 24bb5daf89), docs/adr/0762-cuda-ciede-ldg.md (new), changelog.d/perf/cuda-ciede-ldg.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).
ciede_score.cu is entirely fork-local (Netflix upstream has no CUDA ciede kernel). If Netflix ever adds a CUDA ciede kernel, the sync would need to incorporate this __ldg() pattern. The integer_vif_cuda.c conflict resolution keeps the HEAD side (ADR-0743 comment block); no Netflix upstream content was discarded.
ADR-0764 — psnr_hvs CUDA kernel F3 __ldg() + __launch_bounds__(64) (2026-05-29)¶
No rebase impact on upstream C/Python code: psnr_hvs_score.cu is entirely fork-local. integer_psnr_hvs_cuda.c is unchanged.
If an upstream sync changes the psnr_hvs CPU reference in core/src/feature/third_party/xiph/psnr_hvs.c, verify the CUDA kernel's cooperative tile load and reduction order in psnr_hvs_score.cu are still byte-for-byte equivalent to the CPU's calc_psnrhvs computation pattern.
All files modified are fork-local: core/src/feature/cuda/integer_psnr_hvs/psnr_hvs_score.cu (pointer extraction + __ldg() + __launch_bounds__(64)), docs/adr/0764-psnr-hvs-ldg-launch-bounds.md (new), docs/research/0764-cuda-psnr-hvs-ldg-launch-bounds-2026-05-29.md (new), changelog.d/perf/cuda-psnr-hvs-ldg-launch-bounds.md (new), docs/rebase-notes.md (this entry), docs/state.md (new row).
ADR-0787 — libvmaf API error-path audit (2026-05-29)¶
No rebase impact: this PR adds only documentation files (research digest, ADR, changelog fragment) and no C/Python source changes.
All files modified are fork-local: docs/research/research-0787-libvmaf-api-error-path-audit.md (new), docs/adr/0787-libvmaf-api-error-path-audit.md (new), docs/adr/README.md (new row), changelog.d/fixed/0787-libvmaf-api-error-path-audit.md (new), docs/rebase-notes.md (this entry).
The six implementation fixes recommended by the audit (including vmaf_write_output_with_format errno, vmaf_cuda_state_init error codes, vmaf_close unchecked returns, the CUDA EBUSY guard, and vmaf_init error propagation) will land in a separate fix PR that will carry its own rebase-notes entry. The vmaf_cuda_state_free ABI normalisation is deferred to a major-version PR.
ADR-0815 — vmafx-operator + vmafx-node distroless Dockerfiles (2026-05-29)¶
No rebase impact on upstream C/Python code.
All files added are fork-local: docker/Dockerfile.operator (new), .github/workflows/docker-publish-operator-node.yml (new), docs/adr/0815-operator-node-distroless-dockerfiles.md (new), changelog.d/added/0815-operator-node-distroless-dockerfiles.md (new), docs/backends/operator.md (new), docs/rebase-notes.md (this entry).
No upstream Netflix/vmaf files are touched. A sync-upstream cannot conflict with these additions. The docker/Dockerfile.node file was already in-tree (ADR-0717); this PR only adds the CI workflow that publishes it.
no rebase impact: REASON — changes are confined to config files (.clang-tidy, .pre-commit-config.yaml, pyproject.toml), fork-owned Python sources in ai/ and scripts/ (UP auto-fixes), and docs. No upstream Netflix/vmaf C source is touched; the HeaderFilterRegex fix has no effect on any upstream file.
ADR-0795 — prev_ref thread-safety hardening — 2026-05-29¶
No rebase impact: all changes are in core/src/libvmaf.c (comments, a rename from fex to shared_fex, and a defensive assert). No logic change; no new symbols; no API change. The modified functions (threaded_extract_func, threaded_extract_batch_func) are fork-local dispatch paths not present in upstream Netflix/vmaf.
Fork-local files: core/src/libvmaf.c (comments + assert), docs/adr/0795-prev-ref-thread-safety.md, changelog.d/fixed/prev-ref-batch-thread-safety.md.
ADR-0882 — fuzz target audit (json_model + dnn_sidecar) — 2026-05-30¶
no rebase impact: REASON — all new files (core/test/fuzz/fuzz_json_model.c, core/test/fuzz/fuzz_dnn_sidecar.c, seed corpora under core/test/fuzz/json_model_corpus/ + core/test/fuzz/dnn_sidecar_corpus/, the known-crash reproducer under core/test/fuzz/json_model_known_crashes/, and ADR-0882 + changelog fragment) are fork-local. Upstream Netflix/vmaf has no libFuzzer harnesses at all (the entire core/test/fuzz/ subtree is fork-added per ADR-0270 + ADR-0311). The core/test/fuzz/meson.build edits sit in an if not get_option('fuzz') guarded subdir that upstream does not descend into. The .github/workflows/fuzz.yml matrix addition extends a fork-only workflow file. The only other touched files are docs (docs/state.md, docs/rebase-notes.md, docs/adr/README.md), and those edits only add fork-local rows.
ADR-0887 — vmaf_model_destroy slopes-OOB fix — 2026-05-30¶
Low rebase impact, but not zero. Touches two upstream-mirrored files:
- core/src/read_json_model.c: adds the sync_n_features helper, replaces parse_feature_names' unconditional n_features++ with a per-iteration sync_n_features(model, i) call, adds the same call to parse_slopes / parse_intercepts / parse_feature_opts_dicts, and inserts validate_feature_arrays before parse_model_dict returns.
- core/src/model.c::vmaf_model_destroy: flips the destroy walk bound from max(feature_cap, n_features) to min(feature_cap, n_features).
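A minimal sketch of the two invariants above, assuming a model that tracks an allocated capacity and a parsed count separately. ModelSketch, sync_n_features_sketch and destroy_sketch are hypothetical stand-ins, not the fork's read_json_model.c / model.c code. With a max(...) bound, the walk reads slots that were never allocated whenever n_features exceeds feature_cap; min(...) caps it at the allocation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the model's feature table; the real VmafModel
// layout and field names may differ.
struct ModelSketch {
    std::size_t feature_cap;  // slots allocated in `name`
    std::size_t n_features;   // slots the parser reports as filled
    char **name;              // feature_cap entries, unused ones nullptr
};

// Per-iteration sync: after filling slot i, make n_features cover it, but
// never claim more slots than were allocated.
void sync_n_features_sketch(ModelSketch *m, std::size_t i) {
    if (i < m->feature_cap && i + 1 > m->n_features) m->n_features = i + 1;
}

// Destroy walk bounded by min(feature_cap, n_features): a count larger than
// the allocation can no longer index past the end of `name`.
std::size_t destroy_sketch(ModelSketch *m) {
    const std::size_t bound = std::min(m->feature_cap, m->n_features);
    for (std::size_t i = 0; i < bound; i++) std::free(m->name[i]);
    std::free(m->name);
    m->name = nullptr;
    return bound;  // number of slots visited, returned for illustration
}
```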
On upstream sync, if Netflix has independently changed parse_feature_names or vmaf_model_destroy, take the upstream changes for unrelated lines and re-apply this fork's hunks (the new sync_n_features helper, the validate_feature_arrays call, and the min bound in destroy). Upstream Netflix does not currently have this validation pass, so a conflict means upstream changed an adjacent surface — re-applying the fork's hunks post-upstream is mechanical.
Fork-local files (no rebase impact): docs/adr/0887-*.md, docs/research/0887-*.md, core/test/test_model.c regression tests, changelog.d/fixed/vmaf-model-destroy-slopes-oob.md, docs/state.md row.
Feature-extractor coverage round 2 (ADR-0938, 2026-05-31)¶
no rebase impact: REASON — all seven new files (core/test/test_integer_psnr_coverage.c, core/test/test_integer_motion_coverage.c, core/test/test_integer_motion_v2_coverage.c, core/test/test_integer_vif_log2.c, core/test/test_iqa_convolve_coverage.c, core/test/test_barten_csf_coverage.c, core/test/test_ms_ssim_decimate_coverage.c) are fork-local additions under core/test/, wired in by seven additive blocks in core/test/meson.build; none of them touch an upstream-mirrored test file. The only contact surface with upstream is the consumed public C-API and the public feature/integer_vif.h / feature/barten_csf_tools.h / feature/iqa/convolve.h headers, which are upstream-mirrored but read-only from these tests. On upstream sync, conflicts are restricted to the meson.build insertion points; reapply the seven test_*_coverage = executable(...) blocks and the matching test('test_*_coverage', ...) rows post-rebase.
Feature-extractor coverage round 3 (ADR-0948, 2026-05-31)¶
no rebase impact: REASON — additions are confined to fork-local test binaries under core/test/ (test_integer_motion_edge16_coverage, test_adm_csf_tools_coverage, test_feature_collector_coverage) and three append-only entries in core/test/meson.build. No upstream-mirrored source touched; no public API delta. On upstream sync the new tests apply cleanly regardless of what Netflix does to the underlying production files because the tests link against the existing libvmaf static target and import public + internal headers that already existed before round 3.
SYCL kernel coverage round 2 (ADR-0884, 2026-05-30)¶
no rebase impact: REASON — all changes are confined to fork-added test files (core/test/test_sycl_adm_parity.c, core/test/test_sycl_ciede_parity.c, core/test/test_sycl_ssim_parity.c, core/test/test_sycl_ms_ssim_parity.c, core/test/test_sycl_motion_v2_parity.c), the meson wiring for those files in core/test/meson.build, and docs / changelog / core/src/feature/sycl/AGENTS.md companion notes. No upstream Netflix/vmaf C source is touched. The SYCL backend itself is fork-original (Netflix/vmaf has no SYCL path), so there is no upstream rebase surface for these tests at all.
CUDA kernel parity tests — round 2 (ADR-0886, 2026-05-30)¶
no rebase impact: REASON — adds five new fork-local test files under core/test/ (test_cuda_adm_parity.c, test_cuda_motion_v2_parity.c, test_cuda_cambi_parity.c, test_cuda_psnr_hvs_parity.c, test_cuda_ssim_parity.c) and wires them through core/test/meson.build inside the existing if cuda_dependency.found() guard. The tests exercise the public C API (vmaf_init / vmaf_use_feature / vmaf_cuda_state_init / vmaf_feature_score_at_index); upstream Netflix/vmaf does not own any of the touched files. Conflict surface on sync is limited to the core/test/meson.build stanza ordering, which is mechanical.
Fork-local files: core/test/test_cuda_adm_parity.c, core/test/test_cuda_motion_v2_parity.c, core/test/test_cuda_cambi_parity.c, core/test/test_cuda_psnr_hvs_parity.c, core/test/test_cuda_ssim_parity.c, core/test/meson.build (new stanzas only), docs/adr/0886-cuda-kernel-coverage-round2.md, docs/research/cuda-kernel-coverage-round2-2026-05-30.md, changelog.d/added/0886-cuda-kernel-coverage-round2.md.
macOS CI ansnr-residual cleanup (ADR-0749 follow-up, 2026-05-30)¶
no rebase impact: REASON — changes are confined to upstream-mirrored test files (python/test/feature_extractor_test.py, python/test/quality_runner_test.py, python/test/routine_test.py) where assertions referencing the legacy ansnr / anpsnr keys are dropped or the tests are skipped per ADR-0749 (ansnr feature sunset). The @unittest.skip reasons cite ADR-0749, so on upstream sync the conflict resolution is mechanical: if Netflix upstream still has the legacy assertions, they were calibrated against float_ansnr output that this fork no longer produces, so keep the skips. If Netflix upstream removes the legacy assertions themselves (matching this fork's direction), drop the local skips.
CI scripts: rebrand-proof assertion-density + tempfile trap (ADR-0968, 2026-05-31)¶
no rebase impact: fork-local — scripts/ci/assertion-density.sh and scripts/release/concat-changelog-fragments.sh are entirely fork-introduced; Netflix upstream has no equivalent files in either path. The only rebase risk is a new upstream scripts/ entry shadowing the directory, which would surface as an explicit conflict rather than a silent behaviour change.
compat/python-vmaf leaf-utility coverage (2026-05-31)¶
no rebase impact: REASON — the new test file lives entirely under python/test/compat_python_vmaf_coverage_test.py (fork-local test directory that Netflix upstream never touches) and imports leaf utilities by their existing public names. No production module under compat/python-vmaf/ is modified; only compat/python-vmaf/AGENTS.md gains one paragraph documenting which leaves carry coverage tests and warning about the latent sha1 bug in tools/decorator.py's persist helpers. Upstream syncs do not own compat/python-vmaf/AGENTS.md (fork-only file).
Master CI regressions — Metal MS-SSIM fixture + ssimulacra2 icpx XYB (ADR-0973, 2026-05-31)¶
no rebase impact: REASON — all touched files are fork-additions with no upstream conflict surface:
- core/test/test_metal_float_ms_ssim_parity.c: fork-added in T8-2a; Netflix upstream has no Metal backend.
- core/test/test_ssimulacra2_simd.c: fork-added SIMD bit-exactness test; Netflix upstream has no SSIMULACRA 2 SIMD paths.
- docs/adr/0973-*.md, docs/research/0973-*.md, changelog.d/fixed/0973-*.md, core/test/AGENTS.md: fork-only governance / docs.
The fix adds a file-scope #pragma clang fp contract(off) block to test_ssimulacra2_simd.c. If a future contributor refactors the file's scalar reference functions out into a helper header, the pragma block must move with them or the icx FMA contraction returns and test_xyb fails under the all-backends matrix leg.
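A minimal sketch of why the pragma has to travel with the scalar reference functions, based on our reading of clang's documented behaviour: at file scope it applies from its position to the end of the translation unit, so a helper moved into another .c file, or into a header included before the pragma, compiles under icx's default floating-point model with contraction back on. scalar_ref_mul_add is a hypothetical helper. With contraction off, a = 1 + 2^-27, b = 1 - 2^-27, c = -1 returns exactly 0.0 (the product rounds to 1.0), whereas a fused multiply-add returns -2^-54.

```cpp
// File-scope pragma: clang (and clang-based icx) stop fusing a * b + c into
// an FMA for the rest of this translation unit. GCC skips this block and is
// controlled by -ffp-contract instead.
#if defined(__clang__)
#pragma clang fp contract(off)
#endif

// Scalar reference helper: a rounded multiply followed by a rounded add.
double scalar_ref_mul_add(double a, double b, double c) {
    return a * b + c;
}
```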
test_gpu_picture_pool.c Round 27 D.3 + D.4 cleanup (ADR-0970, 2026-05-31)¶
no rebase impact: REASON — core/test/test_gpu_picture_pool.c is a fork-local test file (it was introduced in this fork's PR #266 / ADR-0239; Netflix upstream has no equivalent file). The two changes (remove unused .state malloc, delete dead /* ... */ block) affect only lines that Netflix upstream never touches. core/test/AGENTS.md is also fork-only.
ADR-0922 — coverage ratchet + per-PR delta gate — 2026-05-31¶
No rebase impact. All touched files are fork-local CI / docs infrastructure:
- scripts/ci/coverage-check.sh (raised OVERALL_MIN 37 → 60, CRITICAL_MIN 85 → 90, tightened each PER_FILE_MIN entry by +5pp).
- scripts/ci/coverage-delta-check.sh (new: per-PR delta gate).
- .github/workflows/tests-and-quality-gates.yml (Coverage Gate job: new floor numbers + two new steps that compute base-branch coverage and run the delta gate on pull-request events).
- docs/adr/0922-coverage-ratchet-aggressive.md, docs/adr/_index_fragments/0922-coverage-ratchet-aggressive.md, docs/adr/_index_fragments/_order.txt, docs/adr/README.md (regenerated by scripts/docs/concat-adr-index.sh).
- changelog.d/changed/0922-coverage-ratchet-aggressive.md.
Upstream Netflix/vmaf has no coverage gate, so on sync there is nothing to reconcile. The per-PR delta gate's fetch-depth: 0 checkout requirement is worth flagging if the workflow ever gets restructured: a shallow checkout breaks git merge-base HEAD "$BASE_REF".
Metal kernel coverage round 4 — closeout (2026-05-31, ADR-0959)¶
no rebase impact: REASON — every new file path is fork-local Metal-only (Netflix upstream has no Metal backend at all per rebase-notes.md §"feat/libvmaf-metal-filter-iosurface" lineage). The single existing-file edit, core/test/meson.build, appends one executable() + test() block inside the existing enable_metal guard introduced by ADR-0361 (no boundary change, no upstream-mirrored line touched). Upstream sync resolution is trivially "keep theirs" everywhere except inside the if metal_test_opt.enabled() … block, which is fork-only by construction.
Fork-local additions (no rebase impact): core/test/test_metal_kernel_coverage_audit.c, docs/adr/0959-metal-kernel-coverage-round4-closeout.md, docs/research/0959-metal-kernel-coverage-round4-closeout.md, changelog.d/added/metal-kernel-coverage-round4.md, the new audit row in docs/adr/README.md, the T-METAL-KERNEL-PARITY-ROUND4-2026-05-31 row in docs/state.md.
CUDA kernel parity coverage — round 4 (ADR-0956, 2026-05-31)¶
no rebase impact: REASON — all five new files (core/test/test_cuda_float_adm_parity.c, core/test/test_cuda_float_motion_parity.c, core/test/test_cuda_float_ssim_parity.c, core/test/test_cuda_speed_chroma_smoke.c, core/test/test_cuda_speed_temporal_smoke.c) live entirely under fork-local test directories that Netflix upstream never touches. The only modified shared file is core/test/meson.build, where the round 4 block is appended after the existing round 3 / ADR-0541 / motion3 parity blocks inside the existing if get_option('enable_cuda') guard. On upstream sync, if Netflix has independently added test binaries in the same enable_cuda block the conflict is a trivial append-vs-append three-way merge (no shared lines change). Fork-local documentation files (ADR-0956, the round 4 research digest, the changelog fragment, this rebase-notes row) are never authored upstream.
speed_internal.c + SpEED GPU twin wiring (ADR-0964, 2026-05-31)¶
Will bite a rebase. This PR adds core/src/feature/speed_internal.c (a fork-local TU that duplicates ~600 LOC of pure math — eigendecomposition, QR factorisation, matrix helpers — from speed.c). When /sync-upstream ports any change to speed.c's static helpers (compute_eigenvalues, matrix_qr_decomposition, solve_triangular_system, convert_to_tridiagonal, compute_eigenvalues_tridiagonal, compute_covariance_matrix, filter_and_downscale, ...), the same change must be mirrored into speed_internal.c. Symptom of drift: test_sycl_speed_chroma_parity / test_sycl_speed_temporal_parity flag a places=4 violation between CPU and SYCL on Intel Arc.
Also fork-local:
- core/src/feature/hip/speed_{chroma,temporal}_hip.c (already in tree, newly wired into core/src/hip/meson.build).
- core/src/feature/sycl/speed_{chroma,temporal}_sycl.cpp (already in tree, newly wired into core/src/meson.build sycl_feature_sources).
- core/src/feature/feature_extractor.c externs + registry rows for the four new GPU extractor symbols, gated on #if HAVE_HIP / #if HAVE_SYCL.
- core/test/test_sycl_speed_chroma_parity.c + core/test/test_sycl_speed_temporal_parity.c.
If Netflix upstream ever ships its own SpEED GPU implementation that takes a different code-sharing approach (e.g. exposing the speed.c helpers as non-static, prefixed symbols), the fork should consider migrating to upstream's pattern. Until then, speed_internal.c is the canonical location for the shared helpers, and the GPU TUs depend on its function names.
CUDA twins (speed_chroma_cuda, speed_temporal_cuda) are NOT wired in this PR — the TUs reference symbols (CHECK_CUDA, CudaFunctions->cuMemAllocHost) that do not exist; they need a repair pass. Tracked as T-CUDA-SPEED-TU-REPAIR-2026-05-31 in docs/state.md.
CUDA SpEED TU repair + wiring (ADR-0965, 2026-05-31)¶
No new rebase risk beyond ADR-0964. This repair PR fixes the latent bugs in speed_chroma_cuda.c and speed_temporal_cuda.c and wires them into meson. The changes are:
- CHECK_CUDA(cu_f, CALL) replaced with CHECK_CUDA_GOTO(cu_f, CALL, fail) throughout both TUs.
- cuMemAllocHost(ptr, sz) replaced with cuMemHostAlloc(ptr, sz, 0x01u) in the ALLOC_HOST macros (both TUs); 0x01u is the CUDA driver API's CU_MEMHOSTALLOC_PORTABLE flag.
Both changes are mechanical; no algorithmic content was altered. The rebase-note from ADR-0964 above covers the speed_internal.c drift risk (mirror fixes between speed.c and speed_internal.c).
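The GOTO form follows the usual C cleanup idiom. Every failing call jumps to one label that releases whatever was already allocated, instead of returning early and leaking it. Below is a minimal plain-C sketch of that shape. It assumes only that the macro takes the call plus a cleanup label. CHECK_STATUS_GOTO, fake_alloc and init_two_buffers are hypothetical stand-ins, and no CUDA driver call is made:

```c
#include <stdlib.h>

/* Hypothetical status-check macro: on a nonzero status, record it in the
 * caller's `err` variable and jump to the caller-supplied cleanup label. */
#define CHECK_STATUS_GOTO(call, label) \
    do {                               \
        err = (call);                  \
        if (err != 0)                  \
            goto label;                \
    } while (0)

/* Hypothetical allocator standing in for a driver call such as a pinned
 * host allocation; reports failure (-1) when should_fail is set. */
static int fake_alloc(void **out, size_t size, int should_fail)
{
    *out = should_fail ? NULL : malloc(size);
    return *out ? 0 : -1;
}

/* Allocates two buffers. Any failure lands on the single `fail` label,
 * which frees both (free(NULL) is a no-op), so nothing leaks. */
static int init_two_buffers(void **a, void **b, int fail_second)
{
    int err = 0;
    *a = NULL;
    *b = NULL;

    CHECK_STATUS_GOTO(fake_alloc(a, 64, 0), fail);
    CHECK_STATUS_GOTO(fake_alloc(b, 64, fail_second), fail);
    return 0;

fail:
    free(*a);
    free(*b);
    *a = NULL;
    *b = NULL;
    return err;
}
```

A call such as init_two_buffers(&a, &b, 1) fails on the second allocation, jumps to fail, and frees the first buffer. It then returns -1 with both pointers reset to NULL.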
New additions:
- core/src/feature/feature_extractor.c externs + registry rows for vmaf_fex_speed_chroma_cuda / vmaf_fex_speed_temporal_cuda under #if HAVE_CUDA.
- core/test/test_cuda_speed_chroma_parity.c + core/test/test_cuda_speed_temporal_parity.c.
Go errors.Join cleanup paths + slog key standardisation (ADR-0935, 2026-05-31)¶
no rebase impact: REASON — every file touched lives in fork-original Phase 4b Go subtree (pkg/bisect/, pkg/encoder/, pkg/storage/, cmd/vmafx-controller/queue/, cmd/vmafx-node/). Netflix upstream ships no Go code under these paths, so a future upstream/master sync cannot conflict here. The cmd/vmafx-tune/AGENTS.md invariant addition is also fork-original. If a follow-up port-PR introduces upstream Go code, the errors.Join discipline documented in cmd/vmafx-tune/AGENTS.md §7 applies on entry.
Generic registry for vmafx-controller (ADR-0925, 2026-05-31)¶
no rebase impact: REASON — touched files are 100 % fork-only Go sources (pkg/registry/registry.go, pkg/registry/registry_test.go, cmd/vmafx-controller/nodes/registry.go, pkg/observability/observability.go). Netflix upstream is a pure C / Python tree; the cmd/ and pkg/ Go trees do not exist there.
VmafPicture v2 design scaffold (ADR-0928, 2026-05-31)¶
Files touched: core/include/libvmaf/picture_v2.h (new), docs/adr/0928-vmaf-picture-v2-explicit-backend-state.md (new), docs/architecture/vmaf-picture-v2-migration.md (new), docs/adr/README.md + docs/adr/_index_fragments/ (index row), changelog.d/added/vmaf-picture-v2-design.md (fragment).
Rebase impact: None for this PR. The new header is declared but not yet wired into meson.build, and v1 (core/include/libvmaf/picture.h) is preserved bit-for-bit — every existing consumer (FFmpeg patches 0002–0006, MCP server, Rust binding scaffold, Python wheels) still sees the v1 surface unchanged. Upstream Netflix/vmaf has no v2 counterpart on the deprecation horizon, so no sync conflict is expected.
Lifecycle (per ADR-0928):
- Cycle N (this PR): header declared, design + scaffold only.
- Cycle N+1: header wired into meson, converters implemented in core/src/picture.c, v1 marked __attribute__((deprecated)).
- Cycle N+2: in-tree backends + ffmpeg-patches/0002-0006 switched to v2 (coordinated per CLAUDE.md §12 r14).
- Cycle N+3 (≈ 12 months, target VMAFX v4.0.0): v1 removed, SONAME bump libvmaf.so.3 → .4.
If upstream Netflix independently adds a VmafPicture v2 of their own before cycle N+3, reconcile by adopting upstream's naming (VmafPicture2 is intentionally generic) and remap our converters; otherwise the cycle-N+3 v1-removal commit is the natural ABI break window.
pathlib sweep + ruff PTH guard (ADR-0936, 2026-05-31)¶
no rebase impact: changes are confined to fork-owned Python — the two console-shim files under tools/vmaf-*/ (fork-added, no upstream twin), fork-owned ai/scripts/, ai/src/corpus/, mcp-server/, scripts/ci/, and tools/vmaf-tune/src/ modules. The pyproject.toml ruff config delta adds PTH to select and lists it in the existing per-file ignores for the upstream-mirror trees (python/**, compat/python-vmaf/**, testdata/**). Upstream Netflix Python is covered by those ignores; an upstream sync will not see the PTH rule applied to their files.
iter.Seq[T] companion APIs for Go packages (ADR-0932, 2026-05-31)¶
no rebase impact: REASON — every touched file is fork-original Go code under pkg/bisect/, pkg/ladder/, pkg/ai/, and cmd/vmafx-controller/nodes/. None of these paths exist upstream (Netflix/vmaf has no Go module), so an upstream sync cannot conflict with the new IterSamples / IterCloud / IterHull / AllSeq / ListModelsSeq surfaces. The deprecated Registry.All / Registry.ListModels shims are likewise fork-local. If a future Netflix upstream adds Go bindings, the conflict is resolution-only at the package-tree level (different directory layout, no symbol overlap).
Skills library expansion — /add-mcp-tool, /add-k8s-resource, /audit-modernization, bisect-common (ADR-0939, 2026-05-31)¶
no rebase impact: all new files land under .claude/skills/, which is fork-local infrastructure (the upstream Netflix/vmaf repo does not ship a .claude/ directory). The accompanying ADR, index fragment, changelog fragment, and research digest are likewise fork-local. The two existing bisect skills (bisect-regression, bisect-model-quality) gain scaffold.sh driver scripts that source .claude/skills/lib/bisect-common.sh — still all fork-local. No upstream files touched.
If upstream Netflix ever adopts .claude/ skills (unlikely — different agent tooling), revisit whether the three new scaffolds should be promoted or stay fork-only. The bisect-common library has no upstream analogue either, so the merge surface is zero.
ai/ dataclass → pydantic v2 migration (ADR-0934, 2026-05-31)¶
no rebase impact: REASON — touched files are entirely fork-local. Upstream Netflix/vmaf does not ship ai/src/vmaf_train/ at all (the package is fork-added — Tiny-AI surface, ADR-0042). TrainConfig (train.py), ModelMetadata (registry.py), and ManifestEntry (data/datasets.py) become pydantic.BaseModels; pydantic>=2.13.4 added to ai/pyproject.toml (already in tree via mcp-server/vmaf-mcp). Sidecar JSON layout byte-identical (ModelMetadata.to_json() uses model_dump(mode="json") + json.dumps(indent=2, sort_keys=True)). On upstream sync the diff cannot conflict — Netflix has no equivalent file to merge into.
Vendored libsvm + IQA test-coverage uplift (2026-05-31, ADR-0952)¶
core/test/test_svm_api.c and core/test/test_iqa_helpers.c are pure fork-local test additions. They link against the vendored libsvm_static_lib (for svm) and libvmaf_feature_static_lib + libvmaf_cpu_static_lib (for iqa) via their extract_all_objects recipes — the same pattern used by test_iqa_convolve.c, test_feature_extractor.c, and PR #381's test_svm_parser.c. No vendored source is touched.
Rebase impact: when Netflix upstream re-pins libsvm (3.24 → 3.36 or later) or when the IQA helpers gain new public functions:
- test_svm_api.c assertions on inspector outputs and on the C-SVC / EPSILON-SVR predict round-trip are functional invariants of the libsvm public API. A major-version bump that changes them is a semantic break and should land its own ADR.
- test_iqa_helpers.c _round() and _cmp_float() assertions document the current asymmetric rounding rule ("trunc toward zero, add sign when |frac| >= 0.5"). If upstream tdistler.com (or Netflix's 2016 update) ever rewrites those helpers to IEEE-754 round-half-to-even, the tests will fail by design: the failure surfaces the unintended numerical change in the rebase diff rather than in the integration SSIM result.
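The quoted rule is round-half-away-from-zero. It differs from IEEE-754 round-half-to-even only on exact .5 ties, and those ties are what a rebase reviewer should look for. Below is a scalar C sketch of both rules. round_half_away and round_half_even are hypothetical illustrations, not the IQA helpers' actual code:

```c
/* Rule quoted in the note: truncate toward zero, then step away from zero
 * when the discarded fraction has magnitude >= 0.5 (round half away from
 * zero). Assumes |x| fits in an int. */
static int round_half_away(float x)
{
    int t = (int)x;            /* C float-to-int conversion truncates */
    float frac = x - (float)t; /* exact: needs no more bits than x has */
    if (frac >= 0.5f)
        return t + 1;
    if (frac <= -0.5f)
        return t - 1;
    return t;
}

/* IEEE-754 default rounding, for comparison: exact ties go to the even
 * neighbour; every other value goes to the nearest integer. */
static int round_half_even(float x)
{
    int t = (int)x;
    float frac = x - (float)t;
    float mag = frac < 0.0f ? -frac : frac;
    int step = frac < 0.0f ? -1 : 1;
    if (mag > 0.5f || (mag == 0.5f && t % 2 != 0))
        return t + step;
    return t;
}
```

On 2.5f the first function returns 3 and the second returns 2. On -2.5f they return -3 and -2. This is the kind of tie-case divergence the note says the tests are meant to surface.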
The meson wiring in core/test/meson.build inserts two new executables above test_feature_extractor and registers them in the fast suite. Both fragments are isolated; the only adjacency to upstream code is the alphabetical position in the test list.
PR companion to ADR-0889 (PR #381, libsvm parser audit) — the two PRs can land in either order without conflict.
external-bench test coverage backfill (ADR-0332 follow-up, 2026-05-31)¶
no rebase impact: REASON — changes are confined to fork-only files (tools/external-bench/tests/test_compare.py, changelog.d/added/*, docs/research/*). The tools/external-bench/ tree is fork-only per ADR-0332 (no upstream counterpart); coverage backfill (14 new tests for BVI-DVC discovery edge cases, Netflix discovery edge cases, validator rejection paths, run_wrapper missing-output guard, and main() --limit + per-item skip flow) cannot conflict on upstream sync.
HIP motion3 parity test ENOSYS-skip (ADR-0949, 2026-05-31)¶
no rebase impact: REASON — the only file touched in the libvmaf source tree is core/test/test_hip_motion3_parity.c, which is wholly fork-added (no Netflix upstream counterpart; Netflix/vmaf does not ship a HIP backend). The change that skips the test on -ENOSYS is self-contained inside the test's HIP-path helper. CPU baseline, tolerance, fixture geometry, and end-of-stream handling are unchanged. Upstream sync cannot conflict because no upstream file touches this test path or the motion_hip extractor's scaffold-vs-runtime split.
GitHub Actions custom-action + reusable-workflow audit (ADR-0951, 2026-05-31)¶
no rebase impact: REASON — audit-only PR. No code under core/, python/, ai/, mcp-server/, or tools/ is touched. The only edited files are: docs/adr/0951-github-actions-custom-audit.md (new ADR), docs/adr/README.md + docs/adr/_index_fragments/_order.txt (index rows), docs/research/0951-github-actions-custom-audit.md (digest), changelog.d/changed/github-actions-custom-audit.md (fragment), and this rebase-notes row. The fork-wide SHA-pin invariant in .github/AGENTS.md lines 100–144 is unchanged. On upstream sync the audit conclusions remain valid until Netflix introduces its own .github/actions/ tree or workflow_call: workflow; re-run the three reproducer commands in the research digest to confirm.
fix(mcp-server): NamedTemporaryFile (ADR-0975) — no rebase impact¶
Replaces a local variable assignment in _run_vmaf_score. No C surface, no public API, no upstream-mirrored file touched. On rebase against upstream Netflix/vmaf, this change applies cleanly to the MCP server layer which is entirely fork-local.
ADR-0945 — HIP kernel parity coverage round 3 — 2026-05-31¶
no rebase impact: REASON — the 4 new test files (core/test/test_hip_cambi_parity.c, core/test/test_hip_float_adm_parity.c, core/test/test_hip_float_motion_parity.c, core/test/test_hip_float_psnr_parity.c) live entirely under the fork-only HIP backend tree. Upstream Netflix/vmaf has no HIP backend, no parity tests, and no test_hip_* files; the if get_option('enable_hip') == true block in core/test/meson.build is fork-local (added by the HIP scaffold landing in ADR-0212). Wiring lives strictly inside that block. The only non-test files touched are docs/adr/README.md (index row), docs/adr/_index_fragments/_order.txt, the changelog.d/added/hip-kernel-coverage-round3.md fragment, and the companion docs/research/hip-kernel-coverage-round3-2026-05-31.md audit — all fork-only.
ADR-0918 — LLVM IR diff harness — 2026-05-31¶
no rebase impact: harness is fork-local tooling (scripts/perf/check-ir-diff.sh, scripts/perf/ir-diff-config.yaml, testdata/ir-snapshots/, make ir-diff / make ir-diff-update targets). It snapshots LLVM IR for fork-added SIMD sources only; Netflix upstream never touches these paths. The only upstream coupling is the SIMD source files themselves (core/src/feature/x86/*.c) — if a future upstream sync changes the scalar reference for psnr_hvs / ms_ssim_decimate / ssimulacra2 and the AVX2 twin must change in lockstep, the snapshot regen step (make ir-diff-update) is a normal part of the port — same discipline as the score JSON snapshots under /regen-snapshots. The new core/src/feature/x86/AGENTS.md invariant note flags this for the next sync agent.
vmafx-operator functional test coverage uplift (2026-05-31)¶
no rebase impact: REASON — all four new test files live under cmd/vmafx-operator/internal/controller/ which is fork-added per ADR-0714 (vmafx-operator kubebuilder skeleton). Upstream Netflix/vmaf ships no Go sources and no Kubernetes operator surface; there is nothing to merge against.
Fork-local files: cmd/vmafx-operator/internal/controller/vmafxnode_controller_test.go (new), cmd/vmafx-operator/internal/controller/vmafxjob_controller_branch_test.go (new), cmd/vmafx-operator/internal/controller/vmafxmodeltraining_controller_branch_test.go (new), cmd/vmafx-operator/internal/controller/setup_with_manager_test.go (new), changelog.d/added/operator-functional-coverage.md (new).
ADR-0913 — CHANGELOG.md renderer splice contract + 44 k-line drift sweep — 2026-05-31¶
no rebase impact (upstream): REASON — fork-local infrastructure only. The renderer (scripts/release/concat-changelog-fragments.sh), the rendered file (CHANGELOG.md), and the fragment tree (changelog.d/) are all fork-added; Netflix/vmaf upstream uses a hand-edited CHANGELOG.md with no fragment system.
In-flight fork-branch impact (medium): every in-flight fork branch that added a fragment under the old ## Section / ### Section shape will conflict on its fragment file at rebase time. Resolution is mechanical: keep the bullet content and drop the redundant first-line section header (the renderer emits ### Section itself). Branches that added perf entries under changelog.d/perf/ or changelog.d/performance/ must move them to changelog.d/changed/perf-<topic>.md (the same convention PR #384 / ADR-0892 introduces). On rebase, the renderer's new stderr WARNING surfaces the wrong directory immediately; bash scripts/release/concat-changelog-fragments.sh --check then verifies the fix.
__init__.py export-completeness audit (ADR-0911, 2026-05-31)¶
no rebase impact: REASON — all eight modified __init__.py files are fork-added (ai/__init__.py, ai/data/__init__.py, ai/train/__init__.py, ai/src/vmaf_train/__init__.py, ai/src/vmaf_train/data/__init__.py, dev-llm/src/vmaf_dev_llm/__init__.py, mcp-server/vmaf-mcp/src/vmaf_mcp/__init__.py, scripts/lib/__init__.py). Upstream-mirror packages (compat/python-vmaf/**, python/test/__init__.py) were deliberately left byte-identical per the upstream-mirror rebase-hygiene rule. No upstream Netflix/vmaf file is touched.
ADR-0907 — Wall-clock perf regression gate (2026-05-30)¶
No rebase impact on upstream C/Python code.
Fork-local files only (all new except the edit to the existing tests-and-quality-gates.yml workflow):
- scripts/perf/check-regression.py: gate script (stdlib-only).
- scripts/perf/test_check_regression.py: smoke tests.
- docs/adr/0907-perf-regression-gate-wall-clock.md (new).
- docs/adr/_index_fragments/0907-perf-regression-gate-wall-clock.md (new).
- changelog.d/added/perf-regression-gate.md (new).
- .github/workflows/tests-and-quality-gates.yml (new perf-regression job). The disabled cross-backend job's broken bench_all.sh --backend=cpu --snapshot-only --tolerance-ulp=2 invocation is replaced with a no-op placeholder echo, since bench_all.sh does not parse those flags.
No upstream Netflix collision risk — the gate consumes only the fork-added testdata/perf_multi_resolution.json baseline (ADR-0752, fork-local).
Slow-test audit (ADR-0908, 2026-05-30)¶
no rebase impact: REASON — all touched files are fork-local. A new ADR (docs/adr/0908-slow-test-audit-2026-05-30.md), a new research digest (docs/research/slow-test-audit-2026-05-30.md), fork-added pytest configuration in three pyproject.toml files registering the slow marker (tools/vmaf-tune/pyproject.toml, ai/pyproject.toml, mcp-server/vmaf-mcp/pyproject.toml), and fork-added test files (tools/vmaf-tune/tests/test_bbb_e2e_v5_bug_cluster.py, tools/vmaf-tune/tests/test_bbb_e2e_v14_bug_cluster.py). None are mirrored from upstream Netflix/vmaf.
ADR status-field drift sweep (2026-05-30)¶
no rebase impact: changes are confined to fork-local ADR markdown files under docs/adr/. Status-field flips on ADR-0573 (→ Superseded by ADR-0738) and Status normalisation on ADR-0105 / ADR-0106 / ADR-0107 (Supersedes-in-Status → Accepted + explicit Supersedes line). Netflix upstream has no docs/adr/ tree; nothing to reconcile on sync. Audit methodology and the full decision matrix live in docs/research/adr-status-drift-audit-2026-05-30.md.
ADR-0903 — Codecov upload wiring (2026-05-30)¶
no rebase impact: REASON — all changes are confined to fork-only files: .github/workflows/tests-and-quality-gates.yml is a fork-added CI workflow (upstream Netflix/vmaf has no equivalent gcovr-based Coverage Gate), docs/adr/0903-wire-codecov-upload.md is fork-only documentation, and changelog.d/added/wire-codecov-upload.md is a fork-only changelog fragment per ADR-0221. The added codecov/codecov-action steps depend only on the Cobertura XML the existing gcovr step already produces; upstream sync cannot break this wiring because the gcovr job itself is fork-only.
ADR-0904 — cargo-machete build-dep ignores (2026-05-30)¶
no rebase impact: REASON — Netflix/vmaf upstream has no Rust workspace. Both touched Cargo.toml files (bindings/rust/vmafx-sys/Cargo.toml, core/src/feature/rust/tad/Cargo.toml) live entirely in fork-local trees (ADR-0702, ADR-0707). The [package.metadata.cargo-machete] blocks add no-op metadata (cargo ignores keys it doesn't know) and cannot conflict with anything upstream might add later.
Signing and attestation audit (ADR-0902, 2026-05-30)¶
no rebase impact: REASON — changes are confined to fork-local CI infrastructure (.github/workflows/docker-publish-production.yml, docs/development/release.md, docs/adr/0902-*.md, docs/research/signing-and-attestation-audit-2026-05-30.md, changelog.d/security/signing-and-attestation-audit.md). The supply-chain workflow (supply-chain.yml) is itself fork-additive (Netflix upstream does not ship a Sigstore + SLSA + SBOM release channel); upstream syncs never touch any of these files.
Doxygen public-API clean (ADR-0953, 2026-05-31)¶
Low rebase impact: every edit lands as additional doxygen comments or per-member /**< desc */ annotations inside the public headers under core/include/libvmaf/. Two of the headers touched are Netflix-upstream-mirrored — picture.h and model.h — but the edits are pure documentation; no struct layout, function signature, or symbol name moves. An upstream sync that re-touches either file should accept its hunks unchanged and let the fork's doc comments remain in place. The four fork-added headers (libvmaf_mcp.h, dnn.h, libvmaf_metal.h, libvmaf_hip.h) are fork-local and have no upstream counterpart. The new core/doc/Doxyfile.public-api, .github/workflows/doxygen-public-api.yml, ADR-0953, research digest, changelog fragment, and AGENTS.md invariant note are fork-local — zero rebase exposure.
governance-audit (2026-05-30, ADR-0901)¶
No rebase impact — all changes are fork-local governance files that upstream Netflix/vmaf does not ship:
- GOVERNANCE.md (new), MAINTAINERS.md (new): top-level, fork-only.
- .github/CODEOWNERS: append-only additions below the existing rows (the rename of the existing /libvmaf/... rows to /core/... is owned by in-flight PR #321, not this PR).
- CONTRIBUTING.md: fork-specific block extended with branch naming, ADR-0108 deliverables, an ADR-allocator pointer, and a governance pointer. The inherited Netflix upstream contribution-guide block at the bottom is unchanged.
- docs/adr/0901-governance-audit.md, docs/adr/_index_fragments/0901-governance-audit.md, docs/adr/_index_fragments/_order.txt (one-line append), docs/research/governance-audit-2026-05-30.md, changelog.d/added/governance-audit.md: all fork-only paths.
On upstream sync, no conflict is expected. If CODEOWNERS shows a textual conflict because PR #321 landed in-between, the resolution is trivial: keep PR #321's renamed /core/... rows AND keep this PR's new append-only rows. Both edits are non-overlapping at the line level.
ADR-0893 — Pre-commit config audit — 2026-05-30¶
no rebase impact: REASON — .pre-commit-config.yaml is a fork-local config file. Upstream Netflix/vmaf does not ship pre-commit configuration; all revisions and hooks listed are fork-owned. Touches one fork-owned Python file via isort 6.0.1 auto-fix (tools/vmaf-tune/tests/test_codec_adapter_av1_videotoolbox.py), which is itself outside the upstream tree.
libsvm vendored audit — extend SAN-MODEL-MALLOC-OOB to row-ordering (ADR-0889, 2026-05-30)¶
Touches the vendored libsvm parser core/src/svm.cpp, which is wrapped in a file-level NOLINTBEGIN / NOLINTEND cordon. On Netflix-vmaf upstream sync the file is part of the fork-mirrored set: Netflix upstream has not refreshed its vendored libsvm copy since 2020-11 either, so a Netflix-only sync has near-zero conflict risk on this file.
On an upstream libsvm (Chih-Chung Chang / Chih-Jen Lin) sync — deliberately deferred per ADR-0889 — the fork carries three patch families that must be re-applied:
- Thread-locale isolation (ADR-0137): buffer.imbue(std::locale::classic()) in both the SVMModelParserFileSource and SVMModelParserBufferSource constructors.
- JSON in-memory entry point: svm_parse_model_from_buffer plus the SVMModelParserBufferSource template instantiation. Consumed by read_json_model.c; removing it breaks JSON-embedded SVM model loading.
- SAN-MODEL-MALLOC-OOB hardening: the VMAF_SVM_MAX_AXIS_COUNT (1 << 24) bound, nr_class / total_sv axis-size asserts in parse_header() and parse_support_vectors(), and the sv_buffer.empty() post-parse guard. It also covers the row-ordering preconditions (exceptAssert(model->nr_class > 0, ...)) on rho, label, probA, probB and nr_sv, added by ADR-0889.
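The bound and row-ordering checks reduce to two predicates. First, reject an axis size outside the allowed range before it sizes an allocation. Second, reject any per-class row that arrives before a valid nr_class. Below is a C sketch under those assumptions. The struct, MAX_AXIS_COUNT and the function names are hypothetical. The sketch also assumes that zero is invalid and that the bound is inclusive. The real parser is C++ in core/src/svm.cpp and fails through exceptAssert rather than returning 0:

```c
/* Hypothetical stand-in for VMAF_SVM_MAX_AXIS_COUNT; treated as inclusive. */
#define MAX_AXIS_COUNT (1L << 24)

/* Hypothetical, cut-down model header: only the fields the checks read. */
struct svm_header {
    int nr_class;  /* 0 until the nr_class line has been parsed */
    long total_sv; /* 0 until the total_sv line has been parsed */
};

/* An axis size must be positive and within the bound before it is used
 * to size an allocation. */
static int axis_count_ok(long n)
{
    return n > 0 && n <= MAX_AXIS_COUNT;
}

/* Row-ordering precondition: rho, label, probA, probB and nr_sv are sized
 * from nr_class, so they are rejected until a valid nr_class has been seen. */
static int per_class_row_ok(const struct svm_header *h)
{
    return axis_count_ok(h->nr_class);
}

/* Support-vector rows are sized from total_sv, which gets the same bound. */
static int support_vectors_ok(const struct svm_header *h)
{
    return axis_count_ok(h->total_sv);
}
```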
Regression coverage at core/test/test_svm_parser.c (suite fast). On sync, re-run that test plus test_predict and test_model before merging. See core/src/AGENTS.md §10 for the full invariant list.
CI concurrency + cost audit (ADR-0890, 2026-05-30)¶
no rebase impact: REASON — CI-only changes to .github/workflows/ files that are wholly fork-local. Netflix upstream's CI is one .github/workflows/ file with a different name and structure; the five files modified here (ffmpeg-integration.yml, sanitizers.yml, security-scans.yml, lint-and-format.yml, plus the ADR / changelog / state.md surface) have no upstream counterpart. No source / header / patch surface touched; the ffmpeg-patches/ series is unaffected.
ADR-0883 — HIP kernel parity coverage round 2 — 2026-05-30¶
no rebase impact: REASON — the 5 new test files (core/test/test_hip_ciede_parity.c, test_hip_psnr_hvs_parity.c, test_hip_motion_parity.c, test_hip_ssim_parity.c, test_hip_ms_ssim_parity.c) live entirely under the fork-only HIP backend tree. Upstream Netflix/vmaf has no HIP backend, no parity tests, and no test_hip_* files. The if get_option('enable_hip') == true block in core/test/meson.build is fork-local (enable_hip was added by the HIP scaffold landing in ADR-0212), and the wiring lives strictly inside that block. The only non-test files touched are docs/adr/README.md (index row) and docs/adr/_index_fragments/_order.txt, both fork-only.
ADR-0876 — printf-format portability sweep (CERT FIO47-C) — 2026-05-30¶
Low rebase impact, scoped to fork-added log / debug call sites. The four touched source files (core/src/libvmaf.c, core/src/sycl/common.cpp, core/src/sycl/dmabuf_import.cpp, core/test/test_motion_v2_simd.c) either are fork-added (the SYCL TUs + the AVX2 test) or contain a fork-added block inside an upstream-mirror file (the tiny-model loader in libvmaf.c, which is post-ADR-0700 fork-edited per git blame). The format-string changes are mechanical: (unsigned long)x + %lu → x + %" PRIu64 " for uint64_t; (long long)x + %lld → x + %" PRId64 " for int64_t; (unsigned long long)x + %llx → x + %" PRIx64 " for uint64_t hex prints. Three call sites in upstream-mirror code (core/src/feature/x86/adm_avx512.c print_128_64 debug macro) and POSIX-off_t / Windows-DWORD sites were intentionally not changed — see docs/research/0876-printf-format-portability-audit.md §2 Class C for the rationale. Future upstream syncs that touch the same lines will conflict trivially; resolve in favour of the PRI-macro form for fixed-width types.
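The PRI-macro form comes from <inttypes.h>. Each macro expands to the length modifier that the platform's uint64_t / int64_t need. A call site written this way therefore stays correct on LP64, LLP64 (Windows) and 32-bit targets without any cast. Below is a minimal sketch; format_sizes and its field names are hypothetical:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Formats fixed-width integers without casts: PRIu64, PRId64 and PRIx64
 * expand to whatever length modifier this platform's uint64_t / int64_t
 * need (for example "lu" on LP64 Linux). */
static int format_sizes(char *buf, size_t len, uint64_t bytes, int64_t delta,
                        uint64_t handle)
{
    return snprintf(buf, len,
                    "bytes=%" PRIu64 " delta=%" PRId64 " handle=0x%" PRIx64,
                    bytes, delta, handle);
}
```

Called with UINT64_MAX, -5 and 0xdeadbeef, it writes bytes=18446744073709551615 delta=-5 handle=0xdeadbeef and returns 53, with the same result on LP64 and LLP64 targets.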
ADR-0877 — error-code consistency audit (MS-SSIM decimate) — 2026-05-30¶
no rebase impact: the four touched TUs (core/src/feature/ms_ssim_decimate.{c,h}, core/src/feature/x86/ms_ssim_decimate_{avx2,avx512}.c, core/src/feature/arm64/ms_ssim_decimate_neon.c) are fork-added 2026-04-20; they have no upstream Netflix/vmaf counterpart. The change converts the malloc-failure branch from bare return -1 to return -ENOMEM and tightens the header docstring to match — no logic change on the hot path. Bit-exactness across scalar / AVX2 / AVX-512 / NEON is preserved (only the cold malloc-failure branch is touched).
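Returning -ENOMEM from the cold allocation branch lets a caller that interprets negative errno values tell out-of-memory apart from other failures. On Linux a bare -1 is also indistinguishable from -EPERM, because EPERM is 1 there. Below is a C sketch of the convention. alloc_scratch and its injectable allocator hook are hypothetical; the hook exists only so the failure branch can be exercised without exhausting memory. The -EINVAL guard for empty or overflowing requests is also an addition of the sketch:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the cold failure branch can be exercised
 * in a test; production code would call its allocator directly. */
typedef void *(*alloc_fn)(size_t size);

/* Returns 0 on success or a negative errno value on failure. */
static int alloc_scratch(float **out, size_t n, alloc_fn alloc)
{
    *out = NULL;
    if (n == 0 || n > SIZE_MAX / sizeof(float))
        return -EINVAL; /* empty request, or n * sizeof(float) overflows */
    *out = alloc(n * sizeof(float));
    if (*out == NULL)
        return -ENOMEM; /* cold branch: negative errno, not a bare -1 */
    return 0;
}

/* Test double that always reports out-of-memory. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```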
ADR-0875 — GitHub Actions hardening audit — 2026-05-30¶
no rebase impact: REASON — all changes are confined to fork-local CI workflows under .github/workflows/ (go-ci.yml, rust-ci.yml, sanitizers.yml, supply-chain.yml). Upstream Netflix/vmaf has a completely different CI pipeline; none of these files exist upstream. Adds top-level permissions: contents: read to the two Go/Rust workflows and persist-credentials: false to five actions/checkout steps. No source code touched.
Fork-local files: .github/workflows/go-ci.yml, .github/workflows/rust-ci.yml, .github/workflows/sanitizers.yml, .github/workflows/supply-chain.yml, docs/adr/0875-github-actions-audit-2026-05-30.md, docs/research/github-actions-audit-2026-05-30.md, changelog.d/security/github-actions-audit-2026-05-30.md.
ADR-0873 — ARM64 NEON bit-exactness audit — 2026-05-30¶
Rebase impact: low, limited to build system and one test file.
core/src/meson.build lines 581–643: the arm64_v8 static lib is split into arm64_v8 (integer-only TUs, unchanged compile flags) and arm64_v8_fp (float-arithmetic TUs, new -ffp-contract=off flag). If upstream Netflix/vmaf adds new NEON TUs to this region, they must be classified as integer or float and placed in the correct lib.
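The float lib needs -ffp-contract=off because a compiler allowed to contract a * b + c into a fused multiply-add rounds once instead of twice. That can change the last bit of a float result, and -ffp-contract=off forbids the contraction. GCC enables contraction by default outside ISO C modes, and AArch64 always has FMA instructions available. Below is a portable C sketch of the two behaviours, with no NEON intrinsics. The function names are hypothetical, and the fused path is emulated through double (exact for the operands used here) rather than a hardware FMA:

```c
/* Unfused multiply-add: the product is rounded to float before the add.
 * Storing it through a volatile float stops the compiler from contracting
 * a * b + c into a single FMA instruction. */
static float mul_add_unfused(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Emulated fused multiply-add: a float * float product is exact in double,
 * and for the operands used below the double add is exact as well, so the
 * only rounding is the final conversion back to float (as with an FMA). */
static float mul_add_fused(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. The unfused path rounds that product to 1 + 2^-11 (an exact tie, resolved to even) and returns 0.0f. The fused path keeps the 2^-24 term and returns it.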
core/src/feature/arm64/float_adm_neon.c: float_adm_sum_cube_neon and float_adm_csf_den_scale_neon now accumulate into float64x2_t instead of float32x4_t. This is a numeric change — if upstream modifies these functions, the double-accumulation pattern must be preserved.
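Widening the accumulator matters once a running sum grows large relative to each term. A float partial sum has 24 significand bits, so small terms added to it can be rounded away entirely. Below is a scalar sketch of the two accumulation strategies. It is neither the NEON code nor the actual ADM cube reduction, and the function names are hypothetical:

```c
#include <stddef.h>

/* Sums x^3 with a float accumulator: every partial sum is rounded to
 * float, so terms far smaller than the running total can be lost. */
static float sum_cube_f32(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i] * x[i];
    return acc;
}

/* Same reduction with a double accumulator; rounding to float happens
 * once, at the end. */
static float sum_cube_f64(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = x[i];
        acc += v * v * v;
    }
    return (float)acc;
}
```

Take one input of 256.0f (cube 2^24) followed by 100 inputs of 1.0f. The float accumulator stays at 16777216, because each +1 lands exactly halfway between representable floats and rounds back to the even neighbour. The double accumulator returns 16777316.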
core/src/feature/adm.c: comment-only change (ADR-0873 follow-up note).
core/test/test_motion_v2_simd.c: fill_adversarial_neg and fill_adversarial_mixed moved outside #if ARCH_X86; NEON test arm added for motion_score_pipeline_16_neon. On upstream sync, ensure the x86 test body still compiles.
Fork-local files: core/src/meson.build (lib split), core/src/feature/arm64/float_adm_neon.c (reduction stability), core/src/feature/arm64/AGENTS.md (invariant note), core/src/feature/adm.c (comment), core/test/test_motion_v2_simd.c (NEON test arm), docs/adr/0873-arm64-neon-bit-exactness-audit.md, changelog.d/fixed/arm64-neon-bit-exactness-audit.md.
Logging consistency audit — 2026-05-30¶
No rebase impact on upstream. All routed sites are fork-local: core/src/libvmaf.c (vmaf_write_output — fork-added entry point added by the --precision/output overhaul), core/src/sycl/dispatch_strategy.cpp (fork-only file, ADR-0181), core/src/sycl/common.cpp (fork-only file). Vendored core/src/svm.cpp and upstream-mirror feature extractors (vif.c, adm.c, ms_ssim.c, motion.c, ssim.c) are explicitly deferred precisely because they carry upstream-sync invariants — leaving them untouched preserves the rebase story.
Fork-local files: core/src/libvmaf.c, core/src/sycl/dispatch_strategy.cpp, core/src/sycl/common.cpp, docs/research/logging-consistency-audit-2026-05-30.md, changelog.d/changed/logging-consistency-audit.md.
ADR-0870 — Helm values.schema.json + dev-MCP path drift — 2026-05-30¶
no rebase impact: all touched files are fork-additions (deploy/helm/, dev/Containerfile, dev/docker-compose.yml, .dockerignore, docs/adr/0870-*.md, docs/adr/README.md, docs/adr/_index_fragments/_order.txt, docs/development/k8s-deployment.md, changelog.d/added/0870-*.md, changelog.d/fixed/0870-*.md, docs/state.md). None of these have upstream Netflix/vmaf counterparts. The Containerfile path fixes (libvmaf/ → core/) are the downstream of ADR-0700's repo rename; future rebases against a hypothetical upstream that re-introduced a libvmaf/ directory at the repo root would need their own audit, but no such state exists or is planned.
ADR-0868 — GPU backend kernel coverage gap-fill — 2026-05-30¶
No rebase impact: all changes are net-new fork-local test files under core/test/test_{cuda,hip,sycl,metal}_*_parity*.c plus their meson wiring. No upstream Netflix/vmaf source is touched. The tests target fork-added GPU extractor names (psnr_cuda, ciede_cuda, psnr_hip, vif_hip, psnr_sycl, vif_sycl, the 8 *_metal extractors) which do not exist upstream. Mirrors the existing test_{cuda,hip,sycl}_motion3_parity.c pattern (already fork-local).
Fork-local files: core/test/test_cuda_psnr_parity.c, core/test/test_cuda_ciede_parity.c, core/test/test_hip_psnr_parity.c, core/test/test_hip_vif_parity.c, core/test/test_sycl_psnr_parity.c, core/test/test_sycl_vif_parity.c, core/test/test_metal_kernel_registration.c, core/test/meson.build (additive blocks only, no upstream-touching hunks), docs/adr/0868-gpu-backend-kernel-coverage.md, docs/research/gpu-backend-kernel-coverage-audit-2026-05-30.md, changelog.d/added/0868-gpu-backend-kernel-coverage.md.
test/feature-extractor-coverage-push — 2026-05-30¶
no rebase impact: REASON — test-only changes confined to fork-local files under core/test/. New test file core/test/test_mkdirp.c is wholly fork-added (no upstream equivalent); the four touched files (core/test/test_luminance_tools.c, core/test/test_feature.c, core/test/test_feature_extractor.c, core/test/meson.build) gain only new static char *test_… functions and registrations — no existing logic edited. No production source under core/src/ is touched. The test_mkdirp binary compiles core/src/feature/mkdirp.c directly into its TU; this is the same pattern other test binaries already use (test_ref compiles ../src/ref.c, test_thread_pool compiles ../src/thread_pool.c, etc.), so the linkage introduces no new precedent.
Fork-local files: core/test/test_mkdirp.c (new), core/test/test_luminance_tools.c, core/test/test_feature.c, core/test/test_feature_extractor.c, core/test/meson.build, changelog.d/added/feature-extractor-coverage-push.md.
fix/simd-bug-audit-20260531 — 2026-05-31¶
no rebase impact: fork-local SIMD entry points only. The two patched files (core/src/feature/x86/float_adm_avx2.c, core/src/feature/arm64/float_adm_neon.c) are fork-added SIMD ports of upstream adm_dwt2_s; they are not yet wired through compute_adm (ADR-0873 follow-up). Upstream Netflix/vmaf has neither file. The change harmonises the NULL-allocation guard with the already-shipped AVX-512 sibling (float_adm_avx512.c), which has been the de-facto reference since master tip; no upstream merge can collide. The third file (core/src/feature/arm64/ssimulacra2_host_neon.c) is wholly fork-added (SSIMULACRA 2 is a fork extractor) and the edit is comment-only.
Fork-local files: core/src/feature/x86/float_adm_avx2.c, core/src/feature/arm64/float_adm_neon.c, core/src/feature/arm64/ssimulacra2_host_neon.c, changelog.d/fixed/simd-float-adm-dwt2-unchecked-aligned-malloc.md.
ai/ tempfile + path-safety bandit sweep — 2026-05-30¶
no rebase impact: REASON — every touched file lives under ai/scripts/ or ai/tests/, all of which are wholly fork-local (Netflix upstream ships no tiny-AI training, dataset acquisition, or ONNX export pipeline). No upstream Netflix/vmaf file is touched. Fork-local files: ai/scripts/bvi_dvc_to_full_features.py, ai/scripts/konvid_to_full_features.py, ai/scripts/export_tiny_models.py, ai/scripts/export_u2netp_mirror.py, ai/scripts/export_vmaf_tiny_v{2,3,4}.py, ai/scripts/fetch_konvid_1k.py, ai/scripts/fetch_youtube_ugc_subset.py, ai/tests/test_corpus_base.py, ai/tests/test_feature_extractor_defaults.py, ai/tests/test_merge_corpora.py, ai/tests/test_train_predictor_v2_realcorpus.py, changelog.d/security/ai-tempfile-and-path-safety.md.
Go controller / server / MCP test coverage expansion (2026-05-30)¶
No rebase impact: all touched files are fork-added Go tests under cmd/ — none have an upstream Netflix/vmaf counterpart (upstream ships no Go sources). The PR adds:
- cmd/vmafx-controller/main_extra_test.go (new)
- cmd/vmafx-controller/nodes/registry_edge_test.go (new)
- cmd/vmafx-server/main_extra_test.go (new)
- cmd/vmafx-mcp/impl_test.go (new)
- changelog.d/added/go-controller-mcp-coverage.md (new)
- docs/state.md (one _Updated: annotation line; no row change)
- docs/rebase-notes.md (this entry)
No upstream-mirror file is touched.
ADR-0848 — Per-surface doc compliance audit (2026-05-29)¶
Rebase impact: none. This PR adds only docs/research/, docs/adr/, changelog.d/, and docs/state.md changes. No code, no meson, no public headers.
Future rebases: If PRs that fix the three gaps (Issue A / B / C from Research-0848) are in flight, ensure:
- Issue A (Vulkan removal docs): no conflict expected; docs/backends/vulkan/, docs/metrics/features.md, docs/development/build-flags.md are rarely touched.
- Issue B (deprecations.md): docs/development/deprecations.md is append-only.
Changelog fragment consolidation (2026-05-29)¶
no rebase impact: changelog-only — scripts/release/concat-changelog-fragments.sh awk fix + changelog.d/ fragment moves do not touch any upstream Netflix/vmaf source file. No C, Python, or test changes.
float_adm AVX2/AVX-512 F2+F3 precision fix (ADR-0844, 2026-05-29)¶
Rebase invariant: if an upstream Netflix/vmaf commit changes float_adm_csf_den_scale_s, float_adm_sum_cube_s, or any other reduction function in core/src/feature/float_adm.c, the corresponding AVX2 and AVX-512 variants in core/src/feature/x86/float_adm_avx2.c and float_adm_avx512.c must be updated to preserve the double-precision widening contract (ADR-0844 / ADR-0139). The hadd_pd4 and hsum_ps_to_double helpers are static inline and duplicated across TUs intentionally — do not merge them into a shared header. The -ffp-contract=off per-TU static library carve-out in core/src/meson.build (the x86_float_adm_avx2_lib and x86_float_adm_avx512_lib targets) must be preserved on any rebase that touches the meson.build AVX2/AVX-512 build block; they mirror the ssimulacra2 carve-out already in tree.
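When checking a re-synced AVX2/AVX-512 port against this contract, a scalar sketch makes the failure mode concrete. This is not the fork's code: `sum_cube_widened` and `sum_cube_float` are hypothetical names, and the only assumption is that the reduction widens each term to double before accumulating. The float variant shows what the contract prevents: once the running sum reaches 2^24, adding 1.0f is rounded away.

```c
#include <assert.h>
#include <stddef.h>

/* Widened reduction (hypothetical helper): each sample is converted to
 * double before cubing, and the running sum is kept in double. */
double sum_cube_widened(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = (double)x[i];
        acc += v * v * v;
    }
    return acc;
}

/* Counter-example: the same reduction with a float accumulator. The
 * volatile qualifier forces every partial sum to be stored as float,
 * i.e. rounded to float precision. */
float sum_cube_float(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * x[i] * x[i];
    }
    return acc;
}
```

With `{256.0f, 1.0f, 1.0f, 1.0f, 1.0f}`, the widened sum is 16777220, while the float accumulator stays at 16777216 (256^3 = 2^24) and silently drops all four unit terms.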
AVX-512 motion parity tests (ADR-0854, 2026-05-29)¶
no rebase impact: REASON — changes are confined to new test files (core/test/test_motion_avx512_parity.c, changelog.d/added/motion-avx512-parity-tests.md, docs/adr/0854-motion-avx512-parity-tests.md) and additive changes to core/test/simd_bitexact_test.h (new helper function) and core/test/meson.build (new test registration). No upstream Netflix/vmaf production source is modified; no existing test is changed; no golden assertions are touched.
ADR-0852 — HIP speed extractor wiring (2026-05-29)¶
no rebase impact: the three changed files (core/src/meson.build, core/src/hip/meson.build, core/src/feature/feature_extractor.c) are fork-owned; no upstream Netflix/vmaf C source is touched. The only upstream-adjacent file is feature_extractor.c, whose #if HAVE_HIP block is a fork-added section; conflicts are only possible with other HIP-wiring PRs.
Dependency audit 2026-05-30 — golang.org/x/net + x/sys bump¶
No rebase impact: the only changed files are go.mod / go.sum, plus a changelog fragment and a research digest. The Go workspace is a fork-only addition (Netflix/vmaf upstream does not ship Go modules); there is no upstream baseline to rebase against. Versions: golang.org/x/net v0.53.0 -> v0.55.0, golang.org/x/sys v0.43.0 -> v0.45.0, golang.org/x/term v0.42.0 -> v0.43.0, golang.org/x/text v0.36.0 -> v0.37.0 (minimum-version selection).
Fork-local files: go.mod, go.sum, changelog.d/security/dependency-audit-2026-05-30.md, docs/research/dependency-audit-2026-05-30.md.
CodeQL Go coverage + config conflict resolution (ADR-0811, 2026-05-29)¶
no rebase impact: CI-config-only change; no public API surface affected. All changes are confined to .github/codeql-config.yml (Go paths addition + gen/go exclusion), .github/workflows/security-scans.yml (new codeql-go job), docs/adr/0811-security-codeql-go-pvr.md, and the changelog fragment. No upstream Netflix/vmaf files are touched; no C/Python/Go production code is modified. On upstream sync, the CodeQL workflow additions apply cleanly regardless of upstream changes.
Fork-local files: .github/codeql-config.yml, .github/workflows/security-scans.yml, docs/adr/0811-security-codeql-go-pvr.md, changelog.d/security/0811-codeql-go-config-fix.md.
release-please draft mode¶
no rebase impact: release-tooling-only change (release-please-config.json "draft": true). No C sources, headers, or test logic modified.
Coverage-overrides audit — tighten tiny_extractor_template.h (ADR-0881, 2026-05-30)¶
no rebase impact: REASON — changes are confined to fork-only files: scripts/ci/coverage-check.sh (fork-only CI gate), the new docs/adr/0881-*.md ADR, the new docs/research/0881-*.md digest, the ADR index fragment under docs/adr/_index_fragments/, and the changelog.d/changed/ fragment. The threshold ratchet only tightens an existing override (10 → 75) — does not introduce a new path Netflix upstream might also override. Future audits per the codified rule (see ADR-0881 §Decision) are also fork-only since coverage-check.sh itself is fork-only (Netflix upstream has no equivalent gate).
vmafx-operator envtest etcd setup (2026-05-30)¶
no rebase impact: REASON — all changes are in fork-added paths only. Files touched: Makefile (new setup-envtest + setup-envtest-env targets in the Go workspace section, ADR-0702 scope), .github/workflows/go-ci.yml (new pre-test step installing sigs.k8s.io/controller-runtime/tools/setup-envtest@latest + exporting KUBEBUILDER_ASSETS), cmd/vmafx-operator/internal/controller/suite_test.go (top-of-TestControllers t.Skip() guard + nil-testEnv bailout in AfterSuite), cmd/vmafx-operator/AGENTS.md (new invariant #6 documenting the skip-safe envtest pattern), and the changelog.d/fixed/ fragment. cmd/vmafx-operator/ is fork-added per ADR-0714 — upstream Netflix/vmaf ships no Go sources, so no upstream merge can reach these files.
log.c → log.cpp C++23 pilot (ADR-0708 Wave 1, 2026-05-30)¶
Upstream Netflix libvmaf/src/log.c is fork-renamed to core/src/log.cpp. Future port-upstream-commit runs that touch libvmaf/src/log.c must apply changes to core/src/log.cpp instead — the fork-rename mapping is recorded here.
Public C ABI is preserved: core/src/log.h retains the same two function prototypes (vmaf_log, vmaf_set_log_level) and now carries extern "C" guards so it is includable from both C and C++ TUs. The exported symbols are unchanged: nm libvmaf.so shows vmaf_log and vmaf_set_log_level as unmangled C-linkage symbols, exactly as in the prior log.c build.
Meson wiring: log.cpp compiles in an isolated log_cpp23_lib static_library with override_options: ['cpp_std=' + libvmaf_cpu_cpp_std], mirroring the metadata_handler_cpp20_lib pattern (ADR-0708 metadata_handler pilot). Test executables that previously direct-compiled ../src/log.c (test_lpips, test_dists, test_feature_extractor, test_speed, ...) now pick up log symbols via the shared log_cpp23_test_objects aggregate in core/test/meson.build.
Fork-local files: core/src/log.cpp (was: core/src/log.c, removed), core/src/log.h (added extern "C" guards), core/src/meson.build (replaced log.c source entry with log_cpp23_lib), core/test/meson.build (added log_cpp23_test_objects, removed inline '../src/log.c' source entries from ~20 test execs, wired test_log into the fast suite), docs/adr/0708-vmafx-cpp23-internals-pilot.md (consequences cross-link), changelog.d/changed/log-c-to-cpp23.md.
extern "C" guards added: log.h, model.h, read_json_model.h, opt.h. Any upstream commit that adds new declarations to these headers must have them placed inside the guard block so that C++ TUs still see C linkage. Flag it in the port if upstream adds a declaration outside the guard block.
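A minimal sketch of the guard layout, using hypothetical `example_*` names in place of the real prototypes in log.h / model.h / read_json_model.h / opt.h. For porting purposes, the point is where a new upstream declaration has to land: between the opening and closing `#ifdef __cplusplus` blocks.

```c
/* ---- header part (a stand-in for one of the guarded headers) ---- */
#ifndef EXAMPLE_LOG_H
#define EXAMPLE_LOG_H

#ifdef __cplusplus
extern "C" {
#endif

enum ExampleLogLevel { EXAMPLE_LOG_ERROR = 1, EXAMPLE_LOG_INFO = 3 };

void example_set_log_level(enum ExampleLogLevel level);
int example_log_enabled(enum ExampleLogLevel level);
/* A declaration added by a later upstream commit belongs here, inside the
 * guard, so a C++ TU including this header still sees C linkage. */

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* EXAMPLE_LOG_H */

/* ---- implementation part (normally in the .c / .cpp file) ---- */
#include <assert.h>

static enum ExampleLogLevel current_level = EXAMPLE_LOG_ERROR;

void example_set_log_level(enum ExampleLogLevel level)
{
    assert(level >= EXAMPLE_LOG_ERROR && level <= EXAMPLE_LOG_INFO);
    current_level = level;
}

/* Messages at or below the configured verbosity are enabled. */
int example_log_enabled(enum ExampleLogLevel level)
{
    return level <= current_level;
}
```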
port/upstream-netflix-may-jun-2026 — 2026-06-01¶
Five Netflix upstream commits ported. Each reduces the diff against upstream and therefore reduces future rebase friction.
- e4b93c6ed (fetch_picture direct-read): core/tools/vmaf.c no longer has a #ifdef USE_DIRECT_READ branch. Future upstream commits that touch vmaf.c will now merge cleanly without the compile-guard conflict.
- a4a1492d3 (integer_motion rename): core/src/feature/integer_motion.c and core/src/feature/x86/motion_avx2.{c,h} / motion_avx512.{c,h} are now at upstream parity. integer_motion_v2.c and motion_v2_avx2/512 are fork-local (GPU build paths); any future upstream touch of those names should check whether the GPU backends have been updated to the renamed API.
- c2155d6cd (2160p CSF): core/src/feature/barten_csf_tools.h is now at upstream parity. core/test/test_barten_csf.c has new upstream tests.
- 9a078011c (ADM SIMD fix): core/src/feature/integer_adm.c and core/src/feature/x86/adm_avx2.c + adm_avx512.c are at upstream parity.
- 30f472b14 (Speed_chroma AVX): core/src/feature/speed.c, core/src/feature/x86/speed_avx2.{c,h} and core/src/feature/x86/speed_avx512.{c,h} are new upstream-mirror files. Future upstream touches to speed.c may need to propagate compute_cov_kernel into the GPU speed_chroma extractors.
Fork-local files touched: core/tools/vmaf.c (commit #1 — call-site updates for signature change), core/src/feature/feature_extractor.c (commit #2 — remove CPU v2), core/src/meson.build (commits #2, #5 — add speed_avx2/512, remove motion_v2 CPU build).
ADR-0700 Dockerfile path residuals — 2026-05-30¶
no rebase impact: REASON — touches only fork-added Dockerfiles (docker/Dockerfile.production, docker/Dockerfile.production-gpu, docker/dev/{alpine-3.20,arch,fedora-40}.Dockerfile) and the fork-added dev/Containerfile. None of these files have an upstream Netflix/vmaf counterpart. The change is a literal libvmaf/ → core/ substitution at source-tree positions (meson setup … core, COPY core/, cd core); install-path / package / filter-name occurrences (/usr/local/include/libvmaf/, libvmaf.so, libvmaf-dev, --enable-libvmaf*) are deliberately preserved because they describe the shipped library / package / ffmpeg-filter surface, not the source layout.
ADR-0709 residual ANSNR references in docs + ai/data — 2026-05-30¶
no rebase impact: REASON — all changes are fork-local. Touched files are ai/data/feature_extractor.py (fork-added Python helper, no upstream counterpart), docs/metrics/ansnr.md, docs/backends/index.md, docs/backends/cuda/overview.md, docs/backends/hip/overview.md. The HIP and CUDA overviews and the metric page are fork-only docs; the backends index page is also fork-only. No upstream Netflix/vmaf source is touched. The cleanup closes residual references left over after PR #38 (ADR-0709) removed the float_ansnr extractor from every backend.
fix/post-rename-post-vulkan-sweep — 2026-06-01¶
no rebase impact: post-rename cleanup only. The Containerfile change adds a pkg-install line that cannot conflict with upstream (upstream has no Containerfile). The score_backend.py change is fork-local code with no upstream counterpart. The test and doc updates are purely fork-local.
ADR-0777 — Thread-Safety Audit: CUDA / SYCL / HIP Backends (2026-05-29)¶
no rebase impact: docs/research + docs/adr only; no source files were changed.
Python dep freshness sweep (ADR-0879, 2026-05-30)¶
no rebase impact: all touched files are fork-local — ai/pyproject.toml, mcp-server/vmaf-mcp/pyproject.toml, dev-llm/pyproject.toml, tools/vmaf-tune/pyproject.toml, tools/vmaf-roi-score/pyproject.toml, python/test/requirements.txt. Netflix upstream does not ship the ai/, mcp-server/, dev-llm/, or tools/vmaf-* trees; the only file shared with upstream (python/test/requirements.txt) gained a >=7.1.0 floor on pytest-cov which is purely additive over upstream's bare pytest-cov line. On rebase, keep the bumped floor; if upstream introduces its own ceiling on pytest-cov, intersect rather than overwrite.
pyright-strict-audit (2026-05-30, ADR-0888)¶
no rebase impact: REASON — all touched files are fork-added Python sources under ai/src/ and tools/vmaf-tune/src/ (and the CodecAdapter Protocol in tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py, also fork-added). No upstream Netflix/vmaf file is touched. The annotation tightening (TYPE_CHECKING torch import, assert-based Optional narrowing, cast through stub gaps, dropped dead Optional comparisons) does not change runtime behaviour — all 12 fixes are pure type-checker compliance. The audit's companion file pyrightconfig.audit.json is intentionally gitignored so this PR doesn't introduce a CI gate before the long-tail cleanup is done.
cuda-ms-ssim-double-precision-lcs (2026-06-03, ADR-0990)¶
no rebase impact: REASON — all touched files are fork-added CUDA sources (core/src/feature/cuda/integer_ms_ssim/ms_ssim_score.cu, core/src/feature/cuda/integer_ms_ssim_cuda.c) and docs. Netflix upstream does not ship a CUDA ms_ssim kernel. The invariant to preserve on any future port of ssim_accumulate_default_scalar changes from upstream: the ms_ssim_vert_lcs kernel must keep double for my_l/my_c/my_s, the shared-memory warp partial arrays, and the c1/c2/c3 parameters — these are load-bearing for the places=4 parity gate (ADR-0990 / ADR-0139). If upstream changes the scalar 2.0 * to 2.0f * (regressing to float), do NOT mirror that change into the CUDA kernel without also updating the parity test tolerance.
sycl-float-ssim-ssimulacra2-parity-research (2026-06-03, ADR-0985)¶
no rebase impact: REASON — changes are: (1) a new test file core/test/test_sycl_float_ssim_parity.c (fork-added, no upstream equivalent), (2) a meson.build test entry, (3) a research document, (4) an ADR, (5) a clarifying comment in integer_ssim_sycl.cpp (fork-added GPU kernel), and (6) state.md / changelog.d fragment updates. No CPU scalar, no public API, no Netflix upstream file is touched.
perf/arm64-float-moment-sve2 (2026-06-03, ADR-0584)¶
Rebase note: core/src/feature/float_moment.c gains an #if HAVE_SVE2 block that selects compute_1st_moment_sve2 / compute_2nd_moment_sve2 over the NEON fallback when VMAF_ARM_CPU_FLAG_SVE2 is set. The scalar default and the NEON path are unchanged; SVE2 is purely additive. core/src/meson.build gains a new arm64_moment_sve2_lib static library inside the existing if is_sve2_supported block. core/test/test_moment_simd.c gains four SVE2 test functions guarded by #if HAVE_SVE2. docs/backends/arm/overview.md updates the per-feature coverage table. No upstream Netflix/vmaf file is touched; no public C API or CLI flag changes.
shared-strict-json-helpers (2026-06-03, ADR-0988)¶
no rebase impact: REASON — all touched files are fork-added Python modules (tools/vmaf-tune/src/vmaftune/compare.py, report.py, benchmark.py; mcp-server/vmaf-mcp/src/vmaf_mcp/server.py) with no upstream Netflix/vmaf equivalent. The changes are import additions and private-function removals; no public API, no C sources, no Netflix golden-data files are touched.
sycl-motion-add-uv (2026-06-03, ADR-0989)¶
no rebase impact: REASON — all changed files are fork-added GPU backends (integer_motion_sycl.cpp, integer_motion_cuda.c, motion_vulkan.c, integer_motion_hip.c, integer_motion_metal.mm). The upstream Netflix integer_motion.c is not modified. If upstream adds motion_add_uv to integer_motion.c in a future sync, check whether the SYCL per-plane normalization formula remains consistent.
avx512-float-moment (2026-06-03, ADR-0987)¶
no rebase impact: REASON — all touched files are fork-added x86 SIMD sources (core/src/feature/x86/moment_avx512.c, moment_avx512.h) and the dispatch addition inside float_moment.c is guarded by HAVE_AVX512 / VMAF_X86_CPU_FLAG_AVX512 ifdefs that are invisible on any non-AVX-512 build path. The four new parity test cases in test_moment_simd.c are also guarded by HAVE_AVX512 and do not touch any upstream Netflix file. No public C API, no meson_options.txt entry, no CLI flag, no ffmpeg patch is changed.
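The general shape of a dispatch that is guarded both at compile time and at run time can be sketched as follows. Everything here is an assumption made for illustration: `EXAMPLE_CPU_FLAG_AVX512` stands in for `VMAF_X86_CPU_FLAG_AVX512` (whose value is not assumed), `first_moment_avx512` is a hypothetical kernel name, and the first moment is taken to be a plain mean. On a build without `HAVE_AVX512`, the AVX-512 branch is compiled out entirely, so the run-time flag is ignored and the scalar path is always selected. That is why a change of this kind cannot affect non-AVX-512 builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real CPU-flag bit. */
#define EXAMPLE_CPU_FLAG_AVX512 (1u << 0)

typedef double (*moment_fn)(const float *x, size_t n);

/* Scalar default: always compiled and always safe to select. */
static double first_moment_scalar(const float *x, size_t n)
{
    assert(x != NULL && n > 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc / (double)n;
}

#if defined(HAVE_AVX512) && HAVE_AVX512
/* Hypothetical SIMD kernel, declared in a header such as moment_avx512.h. */
double first_moment_avx512(const float *x, size_t n);
#endif

static moment_fn select_first_moment(unsigned cpu_flags)
{
    moment_fn fn = first_moment_scalar;
#if defined(HAVE_AVX512) && HAVE_AVX512
    if (cpu_flags & EXAMPLE_CPU_FLAG_AVX512)
        fn = first_moment_avx512;
#else
    (void)cpu_flags; /* kernel not compiled in: the run-time flag is ignored */
#endif
    return fn;
}
```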
CUDA motion 8-frame SAD batching (ADR-0845, 2026-05-29)¶
core/src/feature/cuda/integer_motion_cuda.c — structural change to MotionStateCuda (sad ring, score_ring, last_batch_boundary fields) and rewrite of submit/collect/flush.
Rebase impact: MEDIUM. If an upstream Netflix commit touches integer_motion_cuda.c, expect a conflict in submit_fex_cuda and collect_fex_cuda. Resolution rules:
1. Keep the batching structure (sad[] ring, batch-boundary sync in collect).
2. Apply upstream logic changes (e.g., score normalization formula, new options) to the batch-emit paths rather than the old per-frame paths.
3. Verify the ADR-0358 invariant: cuMemsetD8Async is always on pic_stream, NOT s->str.
4. The emit_batch_scores() frame_index save/restore must survive; dropping it breaks motion3 moving-average correctness.
core/src/feature/cuda/AGENTS.md — new section "Motion SAD batch fencing": keep verbatim on rebase.
ADR-0930 — Helm NetworkPolicy + PSS baseline — 2026-05-31¶
no rebase impact: REASON — every touched file lives entirely under deploy/helm/vmafx/ (a fork-local directory; Netflix upstream ships no Helm chart), plus fork-local documentation under docs/development/, docs/adr/, docs/research/, docs/state.md, docs/rebase-notes.md (this file), and changelog.d/added/. None of the production C, Go, Python, FFmpeg-patch, or Meson surfaces are touched. Future upstream syncs cannot conflict with this change.
Fork-local files:
- deploy/helm/vmafx/values.yaml (UID + seccomp + networkPolicy block)
- deploy/helm/vmafx/templates/networkpolicy.yaml (new)
- deploy/helm/vmafx/templates/operator-deployment.yaml (inherit from .Values)
- deploy/helm/vmafx/templates/tests/test-connection.yaml (inherit from .Values)
- deploy/helm/vmafx/templates/NOTES.txt (PSS / NP hints)
- docs/development/k8s-deployment.md (Pod security + NetworkPolicy sections)
- docs/adr/0930-helm-networkpolicy-pss.md and matching _index_fragments/ entry
- docs/research/0930-helm-networkpolicy-pss.md
- changelog.d/added/helm-networkpolicy-pss.md
- docs/state.md (closed row)
fix(hip): integer_ms_ssim_hip picture_copy normalization — 2026-06-03¶
no rebase impact: REASON — changes touch only core/src/feature/hip/integer_ms_ssim_hip.c (fork-only HIP backend, no upstream equivalent in Netflix/vmaf), docs/state.md (fork-local bug tracker), changelog.d/fixed/ (fragment), and docs/rebase-notes.md (this entry). core/test/test_hip_ms_ssim_parity.c and the corresponding meson.build entry were already present on master (merged from gpu-runtime-bug-audit). No CPU scalar path, no public header, no Netflix upstream file is touched.
feat(simd): integer-ssim-avx2 (ADR-0784)¶
Branch: feat/integer-ssim-avx2
Touches: core/src/feature/integer_ssim.c, core/src/feature/x86/integer_ssim_avx2.{c,h}, core/src/meson.build (x86_avx2_sources), core/test/test_integer_ssim_simd.c, docs/adr/0784-integer-ssim-avx2.md, docs/backends/x86/integer-ssim-avx2.md.
Adds AVX2 dispatch for the horizontal moment accumulation pass in integer_ssim.c. The dispatch uses function pointers in IntegerSsimState; the integer_ssim_moments_t struct in integer_ssim_avx2.h must stay layout-identical to ssim_moments in integer_ssim.c. Any upstream refactor of ssim_moments field order requires a matching update in the AVX2 header.
chore(ci): ci-workflow-name-shortening (ADR-0995)¶
Branch: chore/ci-shorten-workflow-names
no rebase impact: pure CI display-name rename; no C/C++/Python source touched.
fix(dnn): add missing vmaf_ort_internal_input/output_elem_type accessors¶
Branch: fix/dnn-ort-internals-missing-elem-type-accessors
Touches: core/src/dnn/ort_backend_internal.h, core/src/dnn/ort_backend.c, changelog.d/fixed/dnn-ort-internals-elem-type-accessors.md.
No rebase impact: the added symbols (VmafOrtElemType, vmaf_ort_internal_input_elem_type, vmaf_ort_internal_output_elem_type) are fork-local internal-test helpers with no upstream Netflix/vmaf analogue. The VmafOrtSession.input_elem_types / .output_elem_types fields and the VMAF_HAVE_DNN guard structure they read from are also fork-local. Conflict probability on these files with upstream is zero.
chore(scripts): modernization-audit scanner — reduce false-positive noise¶
Branch: chore/modernization-audit-false-positive-filter
Touches: scripts/dev/project_modernization_audit.py, scripts/dev/test_project_modernization_audit.py, changelog.d/fixed/modernization-audit-calibration-and-closed-row-noise.md.
No rebase impact: changes are confined to the developer-tools scanner and its test file. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The new module-level constants (CALIBRATION_PLACEHOLDER_PATHS, CLOSED_SECTION_HEADINGS_RE, CLOSED_ROW_RE) and the updated scan_state_files / _marker_suppressed functions have no upstream analogue; there is no merge conflict possible.
fix(rust): vmafx-sys Default trait + Rust CI re-trigger¶
Branch: fix/rust-ci-vmafx-sys-build-dep
no rebase impact: adds impl Default for VmafContext in bindings/rust/vmafx-sys/src/safe.rs. Fork-local Rust crate with no upstream analogue; no C source, public header, or Python file is touched.
fix(perf): scaffold perf gate baseline + advisory threshold (ADR-1005)¶
Branch: fix/perf-gate-advisory-threshold-adr1005
no rebase impact: adds --advisory and --skip-if-no-baseline flags to scripts/perf/check-regression.py, updates the CI workflow step comment and flags, and adds docs/development/perf-gate.md. No C source, public header, Netflix golden assertion, or upstream-mirrored Python file is touched.
test(c): CPU feature extractor coverage push — round 3¶
Branch: test/cpu-extractor-coverage-push
no rebase impact: adds four new test-only .c files under core/test/ and wires them into core/test/meson.build. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The new tests exercise existing extractor paths; no new symbols are introduced.
chore(cppcheck): audit + cite all cppcheck-suppress comments¶
Files touched: core/src/feature/vif.c, changelog.d/chore/cppcheck-suppress-cite-audit.md, docs/rebase-notes.md.
No rebase impact: comment-only edit to vif.c; adds [MISRA-C:2012-11.3/EXP36-C] citations to 10 bare cppcheck-suppress invalidPointerCast annotations. No logic changed; no public header, Netflix golden assertion, or upstream-mirrored symbol is affected.
test(compat-python-vmaf): coverage push round 2 — Asset + ResultStore + crossval¶
Branch: test/compat-python-vmaf-coverage-push (or equivalent worktree branch)
no rebase impact: all four new test files (test_asset.py, test_result_store.py, test_cross_validation.py, test_tools_misc.py) live exclusively under compat/python-vmaf/tests/. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The tests exercise existing public APIs only and add no new production code paths.
chore(adr-0726): final Vulkan residual scrub — config flags + Docker + comments¶
Branch: chore/adr-0726-vulkan-residual-scrub
no rebase impact: removes dead Vulkan build-matrix rows and updates stale Vulkan references in docs, CLAUDE.md, and AGENTS.md files to past tense. No C source files changed. No public headers changed. No upstream-mirrored Python files changed. No Netflix golden assertions touched. The only structural change is removing two dead CI matrix rows that would fail anyway (meson rejects the unknown enable_vulkan option).
ADR-1011 — CUDA symbol visibility (2026-06-04)¶
No rebase impact. Adding static to TU-internal functions has no ABI or behaviour effect — all call sites are function-pointer assignments within the same translation unit. No public headers changed.
ADR-1010 — MCP server JSON parse guards (2026-06-04)¶
No rebase impact. Error-handling only — wraps two json.loads calls. No protocol, API, or tool-schema changes. Output format on the success path is unchanged.
ADR-1008 — C lifecycle + test correctness fixes (2026-06-04)¶
No rebase impact. pic_cnt increment timing change only affects error-retry callers (extremely uncommon path). Div-by-zero guard only fires when n_subsample covers all frames in range (degenerate caller). Test fixes in test_feature_collector.c and test_framesync.c are test-only with no production code change.
ADR-1007 — C string/numeric UB fixes (2026-06-04)¶
No rebase impact. All changes are guarded code paths that only fire for unusual caller-supplied values (NULL string defaults, model names shorter than 5 chars, tiny ADM frame dimensions). No public API, golden assertions, or ABI touched.
ADR-1012 — Go queue state-machine guards (2026-06-04)¶
No rebase impact. Both changes affect only the internal SQLite write path of the controller queue. No public proto/gRPC API change. Callers that receive the new 'job was cancelled before assignment' error from PullWork should retry — the controller's own retry loop already does this.
ADR-1009 — Go shutdown goroutine fixes (2026-06-04)¶
No rebase impact. WaitForShutdown drain-window change only affects shutdown timing (returns up to 30s earlier on clean shutdown). GracefulStop hard-stop fallback only fires on stuck streaming RPCs. No public API, ABI, or golden assertion touched.
fix(observability): Prometheus registry isolation + timer leak (ADR-1014)¶
Branch: fix/r5-prometheus-registry
no rebase impact: all changes are in pkg/observability/observability.go and its test file. No C source, public header, upstream-mirrored Python, or Netflix golden-assertion file is touched. The struct gains two private fields (reg prometheus.Registerer, sourcesOnce sync.Once) which are zero-valued before NewMetrics is called — no caller needs updating. The WaitForShutdown time.After → time.NewTimer + defer Stop() change is behaviour-equivalent; the only observable difference is the timer being released promptly on early return rather than at GC time.
fix(operator): Go operator resource-allocation — http.Client + gRPC dial (ADR-1017)¶
Branch: fix/r5-go-timer-ctx
no rebase impact: all changes are in cmd/vmafx-operator/internal/controller/. No public Go API, CRD schema, RBAC manifest, Helm template, or C source is touched. The SetupWithManager change is additive (adds an if r.HTTPClient == nil guard). No Netflix golden assertions touched.
fix(mcp,controller): exec.CommandContext + gRPC panic recovery (ADR-1018)¶
Branch: fix/r5-mcp-exec-ctx
no rebase impact: changes are in cmd/vmafx-mcp/impl.go and cmd/vmafx-controller/grpc_server.go. The runVmafScore Go signature change is internal to cmd/vmafx-mcp (all three callers are in the same file). No public MCP tool schema, JSON-RPC protocol, or gRPC proto file changes. No C source, public header, or Netflix golden assertions touched.
fix/y4m-dst-buf-read-sz-overflow (2026-06-04)¶
Files touched: core/tools/y4m_input.c, docs/adr/1022-y4m-dst-buf-read-sz-overflow.md, changelog.d/fixed/1022-y4m-dst-buf-read-sz-overflow.md
no rebase impact: the fix adds (size_t) casts to five arithmetic expressions in y4m_input_open_impl(). No public header is changed. No API surface changes. The upstream y4m_input.c source differs from this fork's copy (earlier ADR-0977 fixes are already in tree); cherry-picks of upstream Y4M changes will need to re-apply the same cast pattern to any newly introduced chroma branches.
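When re-applying the pattern to a newly introduced chroma branch, the rule is to widen an operand before multiplying. A hypothetical frame-size helper (not the code in y4m_input_open_impl()) shows the idea: `w * h` with two `int` operands overflows `int` (undefined behaviour) once the product exceeds `INT_MAX`, whereas `(size_t)w * (size_t)h` is evaluated in `size_t`. Note that `(size_t)(w * h)` does not help, because the overflow has already happened. On a 32-bit `size_t` the product can still wrap, so a separate bounds check is needed there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes for one 8-bit 4:2:0 frame. Every factor is
 * widened to size_t before multiplying, so no intermediate is computed
 * in int. */
static size_t frame_bytes_420(int w, int h)
{
    assert(w > 0 && h > 0);
    size_t luma = (size_t)w * (size_t)h;
    size_t cw = ((size_t)w + 1) / 2; /* chroma width, rounded up */
    size_t ch = ((size_t)h + 1) / 2; /* chroma height, rounded up */
    return luma + 2 * cw * ch;
}
```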
fix(auth,nodes): constant-time session-token compare + JWT nbf validation (ADR-1021, 2026-06-04)¶
Files touched: cmd/vmafx-controller/nodes/registry.go, cmd/vmafx-controller/auth/middleware.go, mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py, cmd/vmafx-controller/nodes/registry_test.go, cmd/vmafx-controller/auth/middleware_test.go, docs/adr/1021-session-token-const-time-compare.md, changelog.d/fixed/r5-crypto-const-time-session-token.md
no rebase impact: security-only bug-fix with no public API changes. All modified symbols are internal (non-exported comparison logic, JWT payload struct field). No C source, public C API header, upstream-mirrored Python, Netflix golden assertion, or ffmpeg-patches file is touched.
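The fix itself is in Go and Python. For reference only, the technique is sketched below in C, with a hypothetical `ct_equal` helper. Go's standard library provides `crypto/subtle.ConstantTimeCompare` and Python provides `hmac.compare_digest` for the same purpose. These notes do not state which primitive the fix uses. The property that matters is that the comparison does not exit at the first mismatching byte, so its timing does not reveal how long the matching prefix of a guessed token is.

```c
#include <assert.h>
#include <stddef.h>

/* Constant-time equality over n bytes (the length is treated as public).
 * Every byte is visited and differences are OR-ed together instead of
 * returning early on the first mismatch. */
static int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```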
fix/mcp-asyncio-adr1023 (2026-06-04)¶
Files touched: mcp-server/vmaf-mcp/src/vmaf_mcp/server.py, docs/adr/1023-mcp-asyncio-correctness.md, docs/adr/README.md, changelog.d/fixed/mcp-asyncio-correctness.md
no rebase impact: changes are isolated to the MCP server Python module and docs. No C source, public C header, Netflix golden-assertion file, or upstream-mirrored Python harness is touched. The changes are pure async-safety fixes inside coroutines that already existed; no public API or tool schema changes.
fix/r5-memory-ordering (2026-06-04, ADR-1020)¶
Files touched: core/src/ref.c, core/src/ref.cpp, core/src/ref.h, core/src/feature/feature_collector.h, core/src/feature/feature_collector.cpp, core/src/picture_pool.c
If an upstream Netflix commit touches any of these files, review the following invariants before accepting:
- ref.c / ref.cpp: the decrement must remain memory_order_acq_rel; any upstream change that reverts to a bare atomic_fetch_sub must be re-annotated.
- feature_collector.h: the destroyed field must survive struct layout changes; every new public entry point that locks feature_collector->lock must add the destroyed-guard pattern immediately after the lock call.
- picture_pool.c fetch path: the pool->pictures[idx] copy must happen before the unlock; if upstream refactors the fetch function, preserve that ordering.
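For the ref.c / ref.cpp invariant, a sketch of the usual reference-count pattern with C11 atomics may help when reviewing an upstream diff. The `ExampleRef` type and function names are hypothetical, and ref.c's actual struct is not reproduced here. A bare `atomic_fetch_sub` is `memory_order_seq_cst`, which is stronger than the explicit `acq_rel` used below.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted handle. */
typedef struct {
    atomic_int cnt;
} ExampleRef;

static void example_ref_init(ExampleRef *r) { atomic_init(&r->cnt, 1); }

/* Taking another reference needs no ordering: the caller already holds one. */
static void example_ref_inc(ExampleRef *r)
{
    atomic_fetch_add_explicit(&r->cnt, 1, memory_order_relaxed);
}

/* Dropping a reference uses acq_rel. The release half publishes this
 * thread's writes to the object; the acquire half lets the thread that
 * takes the count to zero observe every other thread's writes before it
 * frees the object. Returns true when the last reference was dropped. */
static bool example_ref_dec(ExampleRef *r)
{
    int prev = atomic_fetch_sub_explicit(&r->cnt, 1, memory_order_acq_rel);
    assert(prev > 0);
    return prev == 1;
}
```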
fix/integer-ssim-moments-type-non-x86 (2026-06-04, ADR-1040)¶
Files touched: core/src/feature/integer_ssim.h (new), core/src/feature/integer_ssim.c, core/src/feature/x86/integer_ssim_avx2.h
If an upstream Netflix commit adds or renames fields in the SSIM accumulation buffer, the shared header integer_ssim.h must be updated to match. The layout invariant (six consecutive int64_t fields, identical to the private ssim_moments struct) is documented in ADR-0784 and ADR-1040; any upstream layout change that breaks the direct cast in accum_row_scalar_8 / accum_row_scalar_16 requires a coordinated update to both the typedef and the cast sites.
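Because the direct cast is only safe while the two layouts agree, compile-time checks are a cheap way to turn layout drift into a build error during a coordinated update. A sketch with hypothetical struct and field names: the notes state only that both types are six consecutive int64_t fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the private struct in the .c file. */
struct ssim_moments_sketch {
    int64_t m0, m1, m2, m3, m4, m5;
};

/* Hypothetical mirror of the shared typedef in the header. */
typedef struct {
    int64_t m0, m1, m2, m3, m4, m5;
} integer_ssim_moments_sketch_t;

/* Fail the build if either side gains, loses or reorders fields. */
static_assert(sizeof(struct ssim_moments_sketch) == 6 * sizeof(int64_t),
              "private struct must be six consecutive int64_t fields");
static_assert(sizeof(integer_ssim_moments_sketch_t) ==
                  sizeof(struct ssim_moments_sketch),
              "shared typedef size must match the private struct");
static_assert(offsetof(integer_ssim_moments_sketch_t, m5) ==
                  offsetof(struct ssim_moments_sketch, m5),
              "last field offset must match");
```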
fix/ci-go-rust-red-adr1041¶
no rebase impact: CI configuration change (go-ci.yml) and test/meson.build build guard. Neither touches public API or upstream-mirrored C code.
feat/0804-vmaf-context-get-backend¶
no rebase impact: purely additive public C API addition (new enum VmafBackend and new function vmaf_context_get_backend()). No upstream-mirrored code is modified; no existing entry points are changed. ADR-0804.
fix/dev-cuda-gpu-passthrough¶
no rebase impact: dev/docker-compose.yml only; changes the default runtime: setting and expands the capabilities granted for NVIDIA GPU passthrough. No C sources, public API, or upstream-mirrored files are touched. ADR-1053.
fix/cppcheck-vif-suppression-syntax¶
no rebase impact: comment-only change in core/src/feature/vif.c. Corrects cppcheck suppression delimiter from [...] to ; ... in 10 inline comments. No logic, no public API, no upstream-mirrored code changed.
chore/state-md-stale-open-rows-sweep-20260606¶
no rebase impact: docs/state.md only — removes 2 stale Open rows already in Recently Closed, updates 1 Open row's owner reference, and fixes a duplicate Recently Closed row. No C sources, public API, or upstream-mirrored files are touched.
test/go-vmafx-node-coverage-r6¶
no rebase impact: test-only additions to cmd/vmafx-node/executor_extra_test.go and cmd/vmafx-node/bpf/bypass_unit_test.go. No C sources, public API, upstream-mirrored files, or production Go code is changed.
fix/vendored-cjson-pdjson-depth-overflow (ADR-1061, 2026-06-06)¶
Touches two vendored sources: core/src/pdjson.c and core/src/mcp/3rdparty/cJSON/cJSON.c. Neither file exists in the Netflix upstream tree (pdjson and cJSON are fork-local additions). Rebase against Netflix/vmaf master has zero conflict risk from this change.
The PDJSON_STACK_MAX constant added to pdjson.c is a #define at the top of the file; any future vendor sync that replaces the file will need to re-apply the same define or find a better integration point.
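When re-applying the define after a vendor sync, the guard pattern is roughly as follows: refuse to descend once the nesting depth hits the cap, rather than growing the stack without bound. This sketch ignores JSON strings and escapes for brevity. The default value 1024 and the function name are assumptions, not taken from pdjson.c.

```c
#include <assert.h>

/* Hypothetical default; the real PDJSON_STACK_MAX in pdjson.c may differ. */
#ifndef PDJSON_STACK_MAX
#define PDJSON_STACK_MAX 1024
#endif

/* Returns the maximum bracket nesting depth, or -1 if it would exceed
 * PDJSON_STACK_MAX. String literals and escapes are not handled. */
static int sketch_max_depth(const char *s) {
    int depth = 0, max = 0;
    for (; *s; s++) {
        if (*s == '[' || *s == '{') {
            if (depth == PDJSON_STACK_MAX)
                return -1; /* reject instead of overflowing the parse stack */
            depth++;
            if (depth > max)
                max = depth;
        } else if (*s == ']' || *s == '}') {
            if (depth > 0)
                depth--;
        }
    }
    return max;
}
```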
test/svm-multiclass-realloc-and-compose-dri-lint (PR #TBD, 2026-06-06, ADR-1066)¶
no rebase impact: adds new test file core/test/test_svm_multiclass.c, new lint script scripts/ci/check-compose-dri-writable.sh, and a step in .github/workflows/dev-container-build.yml. No existing C source, public API, or upstream-mirrored file is modified.
fix/go-staticcheck-r10-timer-body (ADR-1065)¶
no rebase impact: all changes are fork-local Go files (pkg/storage/, cmd/vmafx-controller/, cmd/vmafx-server/) with no upstream-mirrored C or Python code affected. The time.NewTicker refactor is a semantic no-op for rebases; the MaxBytesReader + ReadTimeout additions are internal to the HTTP handler and do not touch any public API surface.
fix/sanitizer-exclusions-huge-alloc-tests¶
no rebase impact: only .github/workflows/sanitizers.yml and a changelog fragment are modified. No C source, public API, or upstream-mirrored file is touched.
fix/pic-prealloc-asan-leak (2026-06-06)¶
no rebase impact: single-line change in core/src/libvmaf.c setting fex_ctx->is_initialized = true before the batch-flush loop in flush_context_threaded. The change is additive — it enables the existing vmaf_feature_extractor_context_close teardown path to run correctly on shared (never-initialized) contexts. No upstream-mirrored file is modified, no public API is affected, no test fixtures change.
fix/win32-pthread-once-redefinition (2026-06-06)¶
no rebase impact: removes a duplicate block from core/src/compat/win32/pthread.h (typedef, macro, BOOL CALLBACK, and inline function). The surviving first definition is unchanged. No upstream-mirrored file is modified, no public API is affected, no test fixtures change.
fix/ubsan-enum-invalid-value-log-opt (2026-06-06)¶
no rebase impact: both changes are one-liner casts (static_cast<int>) in core/src/log.cpp and core/src/opt.cpp. Neither file is upstream-mirrored (both are C++23 rewrites of upstream C originals — ADR-0708 and ADR-0772), no public API is affected, and no test fixtures change.
test/operator-controller-coverage (2026-06-06)¶
no rebase impact: all changes are confined to cmd/vmafx-operator/internal/controller/ test files and the fix to vmafxnode_controller.go (removes the status.lastHeartbeat write — additive correctness only). No public API is affected, no upstream-mirrored file is touched, no test fixtures change. Depends on PR #759 (ADR-1069 CRD schema fix) for the envtest assertions to pass end-to-end with a real API server.
fix/disable-recurring-flaky-tests (2026-06-07)¶
no rebase impact: the change is two should_fail: true additions to core/test/meson.build and a new ADR file. No source files, no public API, no upstream-mirrored files, and no test fixtures are modified. The test binaries remain compiled; only their expected-failure polarity flips in the Meson test registry.
fix/dnn-onnx-domain-bypass (2026-06-07, ADR-1089)¶
no rebase impact: all changes are confined to fork-local files (core/src/dnn/onnx_scan.c, core/src/dnn/onnx_scan.h, core/test/dnn/test_onnx_scan.c). Netflix/vmaf has no ONNX DNN surface; no upstream-mirrored file is touched. The scanner's internal enum gains NODE_DOMAIN_FIELD = 7; the scan_node loop gains a new branch; a 50-line read_domain() helper is added. No public API or CLI surface changes.
Re-test: meson test -C build test_onnx_scan (26/26 pass).
fix/r13-gpu-dispatch-env-fast-path-data-race (2026-06-06)¶
no rebase impact: changes confined to core/src/gpu_dispatch_env.cpp. Adds a per-slot std::atomic<bool> ready publication flag. No public header or ABI change; the atomic flag is internal to the translation unit.
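A per-slot publication flag follows the release/acquire pattern: write the payload, then set the flag with a release store; readers check the flag with an acquire load before touching the payload. The real file is C++ and uses std::atomic<bool>. This hypothetical C11 sketch assumes writers to a slot are already serialized, for example by a slow-path lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache slot; not the real gpu_dispatch_env.cpp layout. */
typedef struct SketchSlot {
    char value[64];
    atomic_bool ready;
} SketchSlot;

/* Writer (assumed serialized): fill the payload, then publish. */
static void sketch_publish(SketchSlot *slot, const char *v) {
    strncpy(slot->value, v, sizeof slot->value - 1);
    slot->value[sizeof slot->value - 1] = '\0';
    atomic_store_explicit(&slot->ready, true, memory_order_release);
}

/* Fast-path reader: seeing ready == true via acquire guarantees the payload
 * written before the release store is visible too. */
static const char *sketch_lookup(SketchSlot *slot) {
    if (atomic_load_explicit(&slot->ready, memory_order_acquire))
        return slot->value;
    return NULL; /* not yet published: caller falls back to the locked path */
}
```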
worktree-wf_b08e0c22-717-2 / fix/framesync-producer-death-deadlock (2026-06-07)¶
no rebase impact: changes confined to core/src/framesync.c and core/src/framesync.h. Adds vmaf_framesync_abort() and an aborted flag. The new function is internal to libvmaf; no public header change.
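An abort flag breaks a producer-death deadlock roughly like this: set the flag under the lock and broadcast, so every waiter wakes up and sees it. This is a simplified sketch with hypothetical types, not the real framesync.c structures or the vmaf_framesync_abort() signature.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot sync object. */
typedef struct SketchSync {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool have_frame;
    bool aborted;
} SketchSync;

static void sketch_sync_init(SketchSync *s) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->have_frame = false;
    s->aborted = false;
}

/* Called when the producer dies: wake all waiters so none blocks forever. */
static void sketch_sync_abort(SketchSync *s) {
    pthread_mutex_lock(&s->lock);
    s->aborted = true;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Consumer wait: 0 when a frame is available, -1 once aborted. */
static int sketch_sync_wait(SketchSync *s) {
    pthread_mutex_lock(&s->lock);
    while (!s->have_frame && !s->aborted)
        pthread_cond_wait(&s->cond, &s->lock);
    int rc = s->aborted ? -1 : 0;
    pthread_mutex_unlock(&s->lock);
    return rc;
}
```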
fix/cuda-stream-event-leak-paths (2026-06-07)¶
no rebase impact: changes confined to core/src/cuda/picture_cuda.c and four CUDA feature extractors. Graduated cleanup labels only; no new public API or ABI change.
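"Graduated cleanup labels" refers to the goto-unwind idiom: each label releases only what was acquired before the failing step, and labels fall through in reverse acquisition order. This generic sketch uses malloc as a stand-in for CUDA stream, event, and buffer creation. The names and the fail_at test hook are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct SketchRes {
    void *stream, *event, *buf;
} SketchRes;

/* fail_at selects which step fails (0 = none) to exercise each unwind path. */
static int sketch_init(SketchRes *r, int fail_at) {
    r->stream = (fail_at == 1) ? NULL : malloc(16);
    if (!r->stream) goto fail_stream;
    r->event = (fail_at == 2) ? NULL : malloc(16);
    if (!r->event) goto fail_event;
    r->buf = (fail_at == 3) ? NULL : malloc(16);
    if (!r->buf) goto fail_buf;
    return 0;

fail_buf:          /* buf failed: release event, then fall through */
    free(r->event);
fail_event:        /* event failed: release stream, then fall through */
    free(r->stream);
fail_stream:       /* nothing left to release */
    r->stream = r->event = r->buf = NULL;
    return -1;
}
```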
fix/metal-buffer-ownership-leaks (2026-06-07)¶
no rebase impact: changes confined to core/src/feature/metal/float_ms_ssim_metal.mm and core/src/metal/picture_import.mm. MTLBuffer retain-count fixes only; no public header or ABI change.
fix/mcp-http-edge-cases (2026-06-07)¶
no rebase impact: changes confined to mcp-server/vmaf-mcp/src/vmaf_mcp/http_transport.py and a new test file. No C API or public header change.
fix/vmaftune-corner-cases-r14 (2026-06-07)¶
no rebase impact: changes confined to tools/vmaf-tune/src/vmaftune/cli.py and encode.py. No C API or public header change.
fix/ci-yaml-concurrency-timeout (2026-06-07)¶
no rebase impact: changes confined to 18 .github/workflows/*.yml files. No source, header, or build file is modified.
fix/ci-workflow-permissions-least-privilege (2026-06-07)¶
no rebase impact: changes confined to two .github/workflows/ files. No source, header, or build file is modified.
cov/vmafx-controller-queue-nodes-auth (2026-06-07)¶
no rebase impact: changes confined to cmd/vmafx-controller/queue/queue.go and test files. No public API or ABI change.
fix/operator-crd-status-schema-gaps (2026-06-07)¶
no rebase impact: changes confined to cmd/vmafx-operator/ and CRD YAML files. No C API or libvmaf surface changed.
fix/ms-ssim-hip-adr0990-precision-parity (2026-06-07)¶
no rebase impact: changes confined to core/src/feature/hip/integer_ms_ssim_hip.c and ms_ssim_score.hip. No public header or ABI change.
fix/ms-ssim-option-parity-hip-sycl (2026-06-07)¶
no rebase impact: changes confined to CUDA/HIP/SYCL ms-ssim extractors. No public header or ABI change.
fix/cargo-deny-bsd2-patent-allowlist (2026-06-07)¶
no rebase impact: changes confined to deny.toml. No source, header, or build file is modified.
fix/rust-pilot-clippy (2026-06-07)¶
no rebase impact: changes confined to bindings/rust/ and core/src/feature/rust/tad/. No C API or public header change.
fix/coverage-pkg-storage (2026-06-07)¶
no rebase impact: new test files only (pkg/storage/coverage_test.go, cmd/vmafx-node/bpf/coverage_test.go) + changelog fragments. No production source or public header change.
fix/mcp-streaming-backpressure-disconnect (2026-06-07)¶
no rebase impact: changes confined to cmd/vmafx-mcp/impl*.go and mcp-server/vmaf-mcp/src/vmaf_mcp/server.py. No C API or public header change.
test/r12-thread-safety-batch-tsan (2026-06-07)¶
no rebase impact: adds new test file core/test/test_thread_safety_batch.c and updates core/test/meson.build. No production source or public header change.
fix/r12-picture-ref-unref-error-path-coverage (2026-06-07)¶
no rebase impact: adds test coverage to core/test/test_picture.c only. No production source or public header change.
fix/r14-yuv-input-edge-cases (2026-06-07)¶
no rebase impact: changes confined to core/tools/y4m_input.c. No public header or ABI change.
fix/r14-cli-flag-parsing (2026-06-07)¶
no rebase impact: changes confined to core/tools/cli_parse.c and cli_parse.cpp. No public header or ABI change.
fix/test-malloc-leak-r12 (2026-06-07)¶
no rebase impact: test-only changes to free malloc'd buffers on early exit in test_framesync.c and test_pic_preallocation.c. No production source touched.
fix/test-framework-mu-assert-stderr-output (2026-06-07)¶
no rebase impact: changes confined to core/test/test.c and core/test/test.h. Fixes mu_report so it writes to stderr instead of stdout, and adds a missing include guard.
worktree-wf_392e91a3-897-12 / fix/ci-action-sha-consistency (2026-06-07)¶
no rebase impact: corrects inconsistent action SHAs in e2e-k8s, go-ci, and rust-ci workflows. No source, header, or build file is modified.
fix/ort-error-message-logging (2026-06-07)¶
no rebase impact: changes confined to core/src/dnn/ort_backend.c. No public header or ABI change.
fix/bench-clock-unchecked-returns (2026-06-07)¶
no rebase impact: changes confined to core/tools/vmaf.c and core/tools/vmaf_bench.c. No public header or ABI change.
fix/msvc-windows-portability-hygiene (2026-06-07)¶
no rebase impact: dead code removal in core/src/dnn/model_loader.c and core/src/feature/x86/vif_avx2.c / vif_avx512.c. No public header, ABI, or numeric change.
fix/roi-frame-bytes-odd-dims (2026-06-07)¶
no rebase impact: changes confined to core/tools/vmaf_roi.c and core/test/test_vmaf_roi.c. No public header or ABI change.
fix/vmaf-per-shot-correctness (2026-06-07)¶
no rebase impact: changes confined to core/tools/vmaf_per_shot.c and core/tools/test/test_vmaf_per_shot.sh. No public header or ABI change.
fix/compat-python-vmaf-mode-shim (2026-06-07)¶
no rebase impact: changes confined to compat/python-vmaf/__init__.py, compat/python-vmaf/config.py, and compat/python-vmaf/core/matlab_feature_extractor.py. No C source, public header, or ABI change.
fix/ffmpeg-vmaf-pre-device-full-enum (2026-06-07)¶
no rebase impact: changes confined to ffmpeg-patches/0002-*.patch and ffmpeg-patches/0014-*.patch. No C source or public header in-tree is modified.
fix/observability-otel-trace-context (2026-06-07)¶
no rebase impact: changes confined to cmd/vmafx-server/grpc_server.go, pkg/observability/otel_instruments.go, and pkg/score/grpc_client.go. No C source, public header, or ABI change.
worktree-wf_392e91a3-897-1 / fix/ai-atomic-writes (2026-06-07)¶
no rebase impact: changes confined to ai/src/aiutils/file_utils.py, ai/src/aiutils/run_manifest.py, and five AI scripts. No C source, public header, or build file is modified.
fix/helm-rolling-update-correctness (2026-06-07)¶
no rebase impact: changes confined to deploy/helm/vmafx/ Helm chart templates and values. No C source, public header, or Go source is modified.
fix/r12-dead-code-and-unused-var-after-pr-train (2026-06-07)¶
no rebase impact: removes dead code and unused variables from core/src/feature/integer_motion.c and core/test/test_framesync.c. No public header or ABI change.
fix/motion-coverage-picture-ref-include (2026-06-07)¶
no rebase impact: changes confined to core/test/test_integer_motion_coverage.c. Test-only change. No production source or public header modified.
fix/state-sweep-fix — CI build-matrix + ASan + motion_v2 coverage (2026-06-07)¶
- libvmaf-build-matrix.yml: four meson setup / ninja invocations updated from libvmaf to core (ADR-0700 rename follow-up). Conflicts are possible if another branch also edits those four lines; resolve by keeping core as the source dir.
- tests-and-quality-gates.yml: ASAN_OPTIONS: allocator_may_return_null=1 added to the sanitizer step env; no conflict risk.
- core/src/picture.c and core/src/picture.h: vmaf_picture_pool_flush() added; no conflict risk (new symbol).
- core/test/test_integer_motion_v2_coverage.c: manual prev_ref assignments and memset calls removed; tests now call extract in a plain loop. Conflicts are possible if another branch edits the same test functions; resolve by keeping the wrapper-managed approach (no manual prev_ref assignment).
fix/ffmpeg-vulkan-ci-job-removal (2026-06-07)¶
no rebase impact: only .github/workflows/ffmpeg-integration.yml modified (dead job removed) and docs/state.md + changelog.d/fixed/ffmpeg-vulkan-ci-job-removal.md added. No production source, public header, or meson build files touched.
docs/phase4b9-container-only-publishing (2026-06-08)¶
no rebase impact: docs-only change. CLAUDE.md §15 updated with a new publishing bullet; docs/development/publishing.md and docs/adr/1102-*.md added; docs/adr/README.md gets one new index row; changelog.d/added/1102-*.md added. No production source, public header, meson build files, or test files touched.
fix/codeql-large-parameter-const-pointer (2026-06-12)¶
Rebase-sensitive: function signatures changed across VIF, SpEED, and CAMBI.
- VifBuffer: PADDING_SQ_DATA, PADDING_SQ_DATA_2 (integer_vif.h), pad_top_and_bottom, decimate_and_pad, subsample_rd_8/16 (integer_vif.c), vif_subsample_rd_8/16_avx2 + vif_filter1d_8/16_avx2 (x86/vif_avx2.h/.c), vif_subsample_rd_8/16_avx512 (x86/vif_avx512.h/.c), vif_subsample_rd_8/16_neon (arm64/vif_neon.h/.c): all now take const VifBuffer * instead of VifBuffer by value. The function-pointer typedef in VifState was updated accordingly. Call sites pass &buf or &s->public.buf as appropriate.
- SpeedDimensions (speed.c): 11 static functions now take const SpeedDimensions *. Call sites in speed_extract_score pass &s->dimensions.
- SpeedInternalDimensions (speed_internal.h/.c): speed_internal_filter_and_downscale and speed_internal_compute_cov_matrix now take const SpeedInternalDimensions *. All GPU backend call sites (cuda/hip/sycl speed twins) pass &s->dim.
- CambiBuffers (cambi.c): cambi_score now takes const CambiBuffers *. Call site passes &s->buffers.
Conflicts are possible if upstream or another branch edits these function signatures or adds new call sites; resolve by carrying the const * form forward.
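When resolving a conflict, the signature change has this shape: the large descriptor is passed as a const pointer, and any function-pointer typedef changes in lock-step with it. SketchVifBuffer and its fields are hypothetical. The real VifBuffer has a different field set, but it is similarly large enough that copying it per call is wasteful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of several pointers and strides. */
typedef struct SketchVifBuffer {
    uint16_t *data;
    uint32_t *mu1, *mu2, *ref_sq, *dis_sq, *ref_dis;
    ptrdiff_t stride, stride_16, stride_32;
} SketchVifBuffer;

/* Before: uint32_t sketch_sum_row(SketchVifBuffer buf, unsigned w);  (copy)
 * After: a const pointer, so nothing is copied and the callee cannot modify
 * the descriptor's fields through it. */
static uint32_t sketch_sum_row(const SketchVifBuffer *buf, unsigned w) {
    uint32_t acc = 0;
    for (unsigned i = 0; i < w; i++)
        acc += buf->data[i];
    return acc;
}

/* Function-pointer typedefs (like the one in VifState) must match. */
typedef uint32_t (*SketchRowFn)(const SketchVifBuffer *buf, unsigned w);
```

Call sites then pass the address, for example fn(&s->public.buf, w) in the real code.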
docs/index.md — Vulkan image-import list item removed (docs/remove-stale-vulkan-image-import-ref)¶
no rebase impact: docs-only removal of a stale list item; no code or nav structure changed.
fix/functional-matrix-broken-17 (2026-06-12)¶
no rebase impact: all changes are bug fixes in independent files (bench_all.sh, bisect.py, server.py, cli.py, op_allowlist.c, float_adm_cuda.c, dnn_api.c, Containerfile) with no shared function-signature changes, no renamed symbols, and no upstream-mirrored path modifications.
rc/scaffold-stub-completion — picture_v2 implementation + ai/scripts exit-code fix¶
Files touched: core/src/picture_v2.c (new), core/test/test_picture_v2.c (new), core/include/libvmaf/picture_v2.h, core/src/meson.build, core/include/libvmaf/meson.build, core/test/meson.build, docs/architecture/vmaf-picture-v2-migration.md, ai/scripts/{gen_calibration,quantize_int8,build_calibration_set,eval_loso_fr_regressor_v2,external_benchmark_pvmaf,fetch_lsvq,gen_dists_sq_placeholder_onnx,gen_mobilesal_placeholder_onnx,gen_ssimulacra2_eotf_lut,hdrsdr_vqa_to_corpus_jsonl,my_corpus_to_corpus_jsonl,train_fr_regressor_v4,train_video_saliency_student}.py.
Rebase impact: low. picture_v2.c is a new file; no upstream file was modified. If upstream ever adds its own picture_v2.c (unlikely given it is a fork-local concept), resolve by keeping the fork's implementation. The ai/scripts exit-code fix is purely script-internal; no C or build system conflict possible with upstream.
fix/rc-gate-three-infra-test-bugs (2026-06-13)¶
no rebase impact: all changes are test-only. cmd/vmafx-mcp/server_test.go drops t.Parallel() from one test function (no C/API change). core/test/meson.build adds a TSAN_OPTIONS env entry alongside an existing ASAN_OPTIONS entry for one test. core/test/dnn/test_cli.sh adds a DNN-availability probe near the top of the script. None of these files exist in upstream Netflix/vmaf; no upstream merge conflict is possible.
chore/remove-vulkan-moltenvk-dead-leftovers (2026-06-13)¶
no rebase impact: deletions only (subproject wraps, Docker stages, CI job bodies, moltenvk.md). No shared function signatures changed, no symbols renamed, no upstream-mirrored C paths modified. ABI-reserved enum gaps preserved.
fix/hip-vif-mirror2-boundary (2026-06-13)¶
no rebase impact for upstream syncs: touches only fork-local HIP files (core/src/feature/hip/integer_vif/vif_statistics.hip) and the fork-local HIP VIF parity test (core/test/test_hip_vif_parity.c). Neither file exists in upstream Netflix/vmaf. The docs/adr/ and docs/backends/hip/overview.md changes are also fork-local. Any future upstream sync that adds upstream files under core/src/feature/hip/ would require manual review of boundary semantics, but no mechanical conflict is possible.
feat/pelorus-vendor-interop-abi (2026-06-14)¶
no rebase impact for upstream Netflix/vmaf syncs: every file is fork-local and has no upstream counterpart. The touched files are the new vendored mirror under core/src/interop/pelorus_*.c + core/include/libvmaf/pelorus/*.h (sourced from VMAFx/pelorus@835e097, NOT Netflix upstream), the conformance fixture core/test/test_pelorus_interop.c, scripts/sync-pelorus-interop.sh, and the docs/ADR/changelog/state deliverables. The core/src/meson.build and core/test/meson.build edits append to fork-local lists (the libvmaf source list and the test registrations) and do not touch upstream-mirrored build logic.
Rebase-sensitive invariant (cross-repo, NOT upstream): the vendored files are a byte-identical mirror pinned to VMAFx/pelorus@835e097. Never hand-edit them — clang-format/clang-tidy are deliberately excluded for these paths (dir-local .clang-tidy, .cppcheck-suppressions.txt, the make format path filters, the assertion-density skip, and the auto-format-on-edit.sh PostToolUse hook skip). A pelorus ABI bump is re-synced via scripts/sync-pelorus-interop.sh --update (which also bumps the pin), never by editing the mirror in place. If a future change rewrites these files, re-run the sync guard + the conformance fixture before merging.
feat/pelorus-autotune-control-plane (2026-06-14)¶
no rebase impact for upstream syncs: touches only fork-local files under tools/vmaf-tune/ — a new src/vmaftune/filter_adapters/ package (__init__.py, pelorus_deband.py), a new src/vmaftune/prefilter.py module, three new test files under tests/, plus additive edits to src/vmaftune/cli.py (new imports, a prefilter subparser, a _run_prefilter handler, and one dispatch line). None of these exist in upstream Netflix/vmaf — tools/vmaf-tune/ is entirely fork-added. No shared function signatures changed and no upstream-mirrored paths were touched. The cli.py edits are append-only at well-separated sites (import block, subparser-registration block, handler block, dispatch block), so even a fork-internal rebase against a newer cli.py resolves cleanly. External coupling note: the 10 deband knobs are a verbatim copy of the Pelorus ADR-0110 control-plane contract; a contract change on the Pelorus side requires a matching edit to filter_adapters/pelorus_deband.py in a coordinated two-repo PR (the conformance test fails on drift).
feat/golusoris-server (2026-06-14)¶
no rebase impact for upstream Netflix/vmaf syncs: every file touched is fork-local Go and has no upstream counterpart. The change rewrites cmd/vmafx-server/*.go (the Go gRPC + HTTP scoring service) onto the golusoris fx framework (ADR-1119), updates Dockerfile.go-server, docs/server/grpc.md, and docs/usage/env-vars.md for the env-var rename, and adds an app_test.go fxtest lifecycle test. None of these exist in upstream; libvmaf's C sources, public headers, and the Netflix golden gate are untouched.
Rebase-sensitive invariants (fork-internal, NOT upstream):
- R1 cgo-lifetime stop order. The composition root forces the *libvmaf.Scorer to be constructed BEFORE the golusoris *grpc.Server (an fx.Invoke(func(_ *libvmaf.Scorer) {}) registered ahead of the gRPC service-registration invoke, plus scorer-first arg order in that invoke). fx runs OnStop hooks in reverse construction order, so this guarantees the gRPC server's GracefulStop drains in-flight Score calls before the scorer's Close() releases C resources. TestStopOrderScorerAfterGRPC pins this; do not reorder those invokes or flip the arg order without re-deriving the ordering (see the empirical probe rationale in the PR).
- go.mod pin. github.com/golusoris/golusoris stays at v0.3.1. The fx migration only adds transitive // indirect deps (go-grpc-middleware/v2) and promotes go-chi/chi/v5 from indirect to direct via go mod tidy; it does not bump the golusoris pin or touch internal/app/bootstrap.
- Env-var contract. VMAFX_HTTP_ADDR / VMAFX_GRPC_LISTEN map to the golusoris http.addr / grpc.listen keys under the VMAFX_ prefix. If golusoris renames those keys, the server's documented env contract must follow.
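The stop-order invariants here and in the node and controller notes rest on one property: hooks registered as components are constructed are stopped last-in, first-out. Below is a toy model of that property. It is written in C and is not the fx API; the lifecycle type and component enum are hypothetical. It shows why constructing the scorer first puts the gRPC stop ahead of the scorer's Close().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a lifecycle container that stops hooks in reverse order. */
#define SKETCH_MAX_HOOKS 8

typedef enum { SKETCH_SCORER, SKETCH_QUEUE, SKETCH_REGISTRY, SKETCH_GRPC } SketchComponent;

typedef struct SketchLifecycle {
    SketchComponent stop_hooks[SKETCH_MAX_HOOKS];
    size_t n;
} SketchLifecycle;

/* Called as each component is constructed. */
static void sketch_append(SketchLifecycle *lc, SketchComponent c) {
    assert(lc->n < SKETCH_MAX_HOOKS);
    lc->stop_hooks[lc->n++] = c;
}

/* Writes the stop order (LIFO) into out[] and returns the hook count. */
static size_t sketch_stop(const SketchLifecycle *lc, SketchComponent *out) {
    for (size_t i = 0; i < lc->n; i++)
        out[i] = lc->stop_hooks[lc->n - 1 - i];
    return lc->n;
}
```

If the guard invoke is removed and the gRPC server happens to be constructed first, the model stops the scorer first. That is the use-after-close the invariant prevents.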
feat/golusoris-node (2026-06-15)¶
no rebase impact for upstream Netflix/vmaf syncs: every file touched is fork-local Go and has no upstream counterpart. The change rewrites cmd/vmafx-node/main.go (the Go gRPC worker root) onto the golusoris fx framework (ADR-1119, Phase-1 PR-3), adds cmd/vmafx-node/providers.go (the fx domain providers) and cmd/vmafx-node/scoring_handler.go (the VmafxScoring impl moved out of the now-removed cmd/vmafx-node/server package), refactors cmd/vmafx-node/online_feedback.go (Start/Close lifecycle), updates docs/usage/env-vars.md for the env-var rename, and adds app_test.go + app_scorestream_test.go (fxtest lifecycle + end-to-end ScoreStream). None of these exist in upstream; libvmaf's C sources, public headers, and the Netflix golden gate are untouched. The eBPF loader under cmd/vmafx-node/bpf/ is unrelated to golusoris and was not touched.
Rebase-sensitive invariants (fork-internal, NOT upstream):
- R-node lifecycle stop order. The composition root forces the *libvmaf.Scorer to be constructed first, then the *FeedbackClient + *Executor (a lazy-provider guard fx.Invoke(func(_ *FeedbackClient, _ *Executor) {})), then the golusoris *grpc.Server (a standalone fx.Invoke(func(_ *grpc.Server) {}) lazy-provider guard). fx runs OnStop hooks in reverse construction order, so this guarantees: gRPC GracefulStop drains in-flight Score / ScoreStream calls → FeedbackClient drainer stops → scorer Close(). TestStopOrderNode (app_test.go) pins this against the REAL hook firing order; do not reorder those invokes or flip arg order.
- Lazy-provider listener guard. grpc.Module's listener binds in an OnStart hook that only runs if *grpc.Server is consumed. The standalone fx.Invoke(func(_ *grpc.Server) {}) is load-bearing: remove it and the node serves nothing. TestAppStartsAndBinds dials the bound addr to prove it.
- FeedbackClient drainer lifetime. NewFeedbackClient(log) no longer takes a context or spawns a goroutine; Start() launches the drainer (bound to an internal, Close-owned context) and Close() stops + awaits it. Both are idempotent. Wired to fx OnStart/OnStop in provideFeedbackClient.
- go.mod pin. github.com/golusoris/golusoris stays at v0.4.0. The fx migration only adds the transitive // indirect dep go-grpc-middleware/v2 v2.3.3 via go mod tidy; it does not bump the golusoris pin or touch internal/app/bootstrap.
- Env-var contract. VMAFX_GRPC_LISTEN maps to the golusoris grpc.listen key under the VMAFX_ prefix (replaces VMAFX_NODE_ADDR). If golusoris renames that key, the node's documented env contract must follow.
feat/golusoris-operator (2026-06-15)¶
no rebase impact for upstream Netflix/vmaf syncs: every file is fork-local and has no upstream counterpart. cmd/vmafx-operator/main.go is rewritten from a hand-rolled controller-runtime entry point onto the golusoris fx framework (ADR-1119 Phase 1), and cmd/vmafx-operator/main_test.go adds fx-graph validation. The reconcilers under cmd/vmafx-operator/internal/controller/ and the webhooks under cmd/vmafx-operator/internal/webhook/ are unchanged; only their wiring (Setup-against-manager) moved into fx.Invoke hooks. None of these files exist upstream.
Rebase-sensitive invariant (cross-repo, NOT Netflix upstream): this migration requires github.com/golusoris/golusoris >= v0.4.0, because the github.com/golusoris/golusoris/k8s/operator module (introduced by golusoris commit 3df9f1a / PR #224) first appears in tag v0.4.0 and is ABSENT in v0.3.1. The foundation commit (afd66c7ef) pins v0.3.1, which predates k8s/operator — so cmd/vmafx-operator/main.go does not compile until the go.mod golusoris pin is bumped to v0.4.0+. The pin bump is intentionally NOT part of this branch (it is a shared go.mod change owned by the migration orchestrator).
golusoris#227 note: the in-tree main.go does NOT add an app-level ctrl.SetLogger shim — golusoris v0.4.0's operator.Module already calls ctrl.SetLogger(loggerFromSlog(logger)) inside newManager, so a second SetLogger from the app would be redundant. If a future golusoris release reverts that (regressing #227), re-add the shim as an fx.Invoke(func(l *slog.Logger){ ctrl.SetLogger(logr.FromSlogHandler(l.Handler())) }). Likewise webhooks are wired via operator.Options.WebhookPort (also added post-v0.3.1); if that field disappears upstream, the app must stand up its own webhook.NewServer and add it to the manager.
feat/golusoris-mcp (2026-06-15)¶
no rebase impact for upstream Netflix/vmaf syncs: this PR rewrites only cmd/vmafx-mcp/main.go (the Go MCP server composition root) onto the golusoris fx framework (ADR-1119, Phase-1 PR-5), plus the fork-local docs/changelog/AGENTS deliverables. cmd/vmafx-mcp/ is entirely fork-added and has no upstream counterpart. The MCP tool surface (tools.go, impl.go, impl_direct.go, server.go) is byte-unchanged, and no test file changed.
- Composition root. The hand-rolled flag.Parse + signal.NotifyContext + bespoke stdio/HTTP transport loops + custom observability.InitOTel are replaced by fx.New(bootstrap.Base, fx.Replace(config.Options{...}), bootstrap.FxLogger(), fx.Provide(buildMCPServer), fx.Invoke(runMCPTransport)).Run(). This mirrors cmd/vmafx-server/main.go and cmd/vmafx-node/main.go. Because the MCP server is NOT a golusoris server module (golusoris ships no MCP module), the transport is owned by the runMCPTransport lifecycle hook rather than by a framework module; if golusoris later adds an MCP module, fold the hook into it.
- bootstrap dependency. This PR consumes internal/app/bootstrap.Base and bootstrap.FxLogger() but does NOT modify them; it shares the bootstrap stanza with the sibling fx migrations (#932/#934/#935/#936). A rebase that reshapes bootstrap.Base (e.g. when golusoris#226 ships a version module, or golusoris#234's LOG_LEVEL prefix-read lands and the env bridge can be deleted) must re-check main() here too.
- Env bridge (interim). main() bridges VMAFX_LOG_LEVEL → LOG_LEVEL and VMAFX_LOG_FORMAT → LOG_FORMAT before fx.New, identical to the sibling binaries (golusoris#234). Delete all four bridges across the cmd/ tree in one sweep once the carrying golusoris tag lands.
- Env-var / flag contract change. The --transport / --port flags are removed and replaced by VMAFX_MCP_TRANSPORT (mcp.transport, default stdio) and VMAFX_MCP_HTTP_ADDR (mcp.http.addr, default :3000). VMAF_BIN and VMAFX_MCP_DIRECT are read directly by the tool handlers (not via koanf) and are unchanged.
- Rebase-sensitive invariant: stdio-stdout purity. Nothing in the fx graph may write to stdout in stdio mode (the JSON-RPC framing owns it). golusoris log → stderr, otel.Module is OTLP-gRPC (no stdout), bootstrap.FxLogger() → slog → stderr. A future rebase that adds an fx.Print-style logger, a stdout OTel exporter, or any fmt.Println to the composition root MUST gate it off in stdio mode. See cmd/vmafx-mcp/AGENTS.md invariant #11.
feat/golusoris-controller (2026-06-15)¶
no rebase impact for upstream Netflix/vmaf syncs: every file touched is fork-local Go and has no upstream counterpart. The change rewrites cmd/vmafx-controller/*.go (the Go gRPC + HTTP controller: SQLite job queue + node registry + FIFO scheduler + JWT auth) onto the golusoris fx framework (ADR-1119, Phase-1 PR-2), refactors cmd/vmafx-controller/nodes/registry.go, adds an app_test.go fxtest lifecycle suite, and updates docs/usage/env-vars.md for the env-var rename. libvmaf's C sources, public headers, and the Netflix golden gate are untouched; no ffmpeg-patch impact.
Rebase-sensitive invariants (fork-internal, NOT upstream):
- golusoris pin → v0.4.1. The PR depends on golusoris#225 (grpc.ProvideServerOption, used to chain the JWT auth interceptors), which is NOT in the v0.4.0 tag the shared go.mod currently pins. The committed go.mod keeps the v0.4.0 pin (no replace directive); the binary + app_test.go will not compile until the orchestrator bumps the pin to the tag carrying #225 (expected v0.4.1). go mod tidy against v0.4.0 is otherwise clean; the migration only adds the transitive // indirect go-grpc-middleware/v2 (golusoris HEAD's grpc.Module recovery/logging interceptors).
- R1 stop order. The composition root forces the *libvmaf.Scorer, the SQLite queue.Queue, and the *nodes.Registry to be constructed BEFORE the golusoris *grpc.Server (an fx.Invoke(func(_ *libvmaf.Scorer, _ queue.Queue, _ *nodes.Registry) {}) registered ahead of the gRPC service-registration invoke). fx runs OnStop hooks in reverse construction order, so this guarantees the gRPC GracefulStop drains in-flight RPCs before the queue Close, the node-registry reaper stop, and the scorer Close. TestStopOrder pins this; do not reorder those invokes without re-deriving the ordering.
- nodes.Registry lifecycle. NewRegistry(log) no longer takes a context or spawns the reaper at construction; the reaper is launched by Start(ctx) (fx OnStart) and stopped + awaited by Close() (fx OnStop). Every call site (production + tests) must drive Start/Close via the lifecycle rather than passing a caller context.
- gen/go/controller proto types are hand-written. Unlike the protoc-generated gen/go (scoring) types, gen/go/controller/*.pb.go are hand-maintained and do NOT implement the protobuf-v2 reflection interface, so VmafxController messages cannot be marshaled by the standard gRPC wire codec. In-process handler tests are unaffected; over-the-wire fxtests therefore use the VmafxScoring service. See the orchestrator note below; regenerating gen/go/controller with buf/protoc is the proper fix and is a candidate follow-up.
- Env-var contract. VMAFX_HTTP_ADDR / VMAFX_GRPC_LISTEN map to the golusoris http.addr / grpc.listen keys; auth.tenant_claim / auth.roles_claim are golusoris CompoundKeys (preserve the underscore). If golusoris renames those keys, the controller's documented env contract must follow.
fix/mcp-probe-parity (2026-06-15)¶
no rebase impact: edits the fork-only MCP servers (cmd/vmafx-mcp/{impl.go,tools.go,impl_test.go,AGENTS.md}, mcp-server/vmaf-mcp/src/vmaf_mcp/server.py) + docs/mcp/tools.md + docs/state.md + changelog. No libvmaf C-API / CLI / meson_options.txt / public-header change → no ffmpeg-patch impact. Rebase-sensitive invariant — probe_backend parity (cmd/vmafx-mcp/AGENTS.md invariant #12): the Go handleProbeBackend and the Python _probe_backend MUST share the same 64×64 (≥36px/dim, CUDA-ADM minimum) synthetic probe frame AND the same runtime_healthy predicate (null/non-finite vmaf.mean → runtime_healthy=false, error "vmaf returned exit 0 but score was null"). A rebase that touches either probe handler must keep the two in lock-step.
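For review purposes, the shared parity contract can be written down in one place. The real handlers are Go and Python. This C rendering is hypothetical, including the function name, the has_mean flag (which stands in for a JSON null vmaf.mean), and the constant names. The dimensions and the error string are taken from the note above.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Probe-frame size both handlers must use (CUDA ADM needs >= 36 px/dim). */
#define SKETCH_PROBE_DIM 64
#define SKETCH_CUDA_ADM_MIN_DIM 36
_Static_assert(SKETCH_PROBE_DIM >= SKETCH_CUDA_ADM_MIN_DIM,
               "probe frame must satisfy the CUDA ADM minimum");

/* has_mean == false models a null vmaf.mean in the JSON output. */
static bool sketch_runtime_healthy(bool has_mean, double mean, const char **err) {
    if (!has_mean || !isfinite(mean)) {
        *err = "vmaf returned exit 0 but score was null";
        return false;
    }
    *err = NULL;
    return true;
}
```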
fix/bughunt-feature-cpu — CIEDE 4:2:2 chroma-upsample flag swap + cambi init leak (2026-06-27)¶
Rebase impact: DIVERGES from upstream on ciede.c — read carefully on the next sync. core/src/feature/ciede.c scale_chroma_planes / scale_chroma_planes_hbd is a near-verbatim upstream Netflix file, and upstream carries the identical transposition bug: the horizontal sample index keys off ss_ver and the vertical row advance keys off ss_hor. This fix swaps them to the correct chroma-subsample math (horizontal → ss_hor, vertical → ss_ver). Rebase-sensitive invariant for the next syncer: do NOT let an upstream sync silently revert this — if Netflix re-pulls the transposed lines, keep the fork's corrected flags. The bug only manifests on YUV422P (heap OOB + wrong ciede2000 scores); YUV420P is a no-op (both flags set) and YUV444P never calls the function, so the Netflix golden CIEDE2000 pair (420P) is bit-identical either way. Guarded by test_ciede_scale_chroma_422_8b / _16b in core/test/test_ciede.c, which fail against the buggy/upstream form. The core/src/feature/cambi.c change is a fork-internal error-path unwind (route init() failures through close_cambi()); cambi is upstream-mirrored but the change is confined to the -ENOMEM / -EINVAL error paths with no success-path or scoring delta — a sync that re-pulls cambi init() should re-apply the goto fail unwind. No public header, CLI, meson-option, or ffmpeg-patch surface changes.
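This hedged sketch shows which flag belongs to which axis. It uses nearest-neighbour upsampling for illustration and a hypothetical signature, not the actual body of scale_chroma_planes. For YUV422P (ss_hor = 1, ss_ver = 0), swapping the flags makes x >> ss_ver read up to w - 1 in a chroma row that is only w / 2 wide. That is the heap OOB described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upsample one chroma plane to luma size (w x h). Column index uses ss_hor,
 * row index uses ss_ver. Hypothetical signature, nearest-neighbour only. */
static void sketch_upsample_chroma(const uint8_t *src, ptrdiff_t src_stride,
                                   uint8_t *dst, ptrdiff_t dst_stride,
                                   unsigned w, unsigned h,
                                   unsigned ss_hor, unsigned ss_ver) {
    for (unsigned y = 0; y < h; y++) {
        /* vertical advance keys off ss_ver */
        const uint8_t *src_row = src + (ptrdiff_t)(y >> ss_ver) * src_stride;
        for (unsigned x = 0; x < w; x++)
            /* horizontal sample index keys off ss_hor */
            dst[(ptrdiff_t)y * dst_stride + x] = src_row[x >> ss_hor];
    }
}
```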
fix/bughunt-simd (2026-06-27)¶
no rebase impact: edits fork-added SIMD float_moment paths only (core/src/feature/arm64/moment_sve2.c, core/src/feature/x86/moment_avx2.c, core/src/feature/x86/moment_avx512.c, core/src/feature/arm64/moment_neon.c) + the fork test core/test/test_moment_simd.c. No libvmaf C-API / CLI / meson_options.txt / public-header change → no ffmpeg-patch impact. No Netflix golden assertion touched. Rebase-sensitive invariant (moment SVE2 lane mapping, core/src/feature/arm64/AGENTS.md): the SVE FCVT .s->.d (svcvt_f64_f32) widens the EVEN-indexed f32 lanes (source element 2i), NOT the lower contiguous lanes; the odd lanes must be widened with the SVE2 FCVTLT (svcvtlt_f64_f32, source element 2i+1). Any future edit to moment_sve2.c must keep the even+odd dual-convert (stepping a full svcntw() register) or it will silently double-count even lanes and drop odd lanes on >128-bit SVE.
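The lane semantics can be checked without SVE hardware. The scalar model below is an illustration only: model_fcvt and model_fcvtlt mimic what svcvt_f64_f32 and svcvtlt_f64_f32 do to one register, VL_W is a hypothetical stand-in for svcntw() fixed at 8 (a 256-bit vector), and tail handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed vector length: 8 f32 lanes = 4 f64 lanes (256-bit). */
#define VL_W 8
#define VL_D (VL_W / 2)

/* Model of FCVT .s->.d: f64 lane i comes from f32 element 2i (even). */
static void model_fcvt(const float *v, double *out)
{
    for (size_t i = 0; i < VL_D; i++)
        out[i] = (double)v[2 * i];
}

/* Model of SVE2 FCVTLT: f64 lane i comes from f32 element 2i+1 (odd). */
static void model_fcvtlt(const float *v, double *out)
{
    for (size_t i = 0; i < VL_D; i++)
        out[i] = (double)v[2 * i + 1];
}

/* Correct pattern: widen even AND odd lanes, then step a full VL_W.
 * This sketch requires n to be a multiple of VL_W (no tail loop). */
static double sum_widened(const float *x, size_t n)
{
    assert(n % VL_W == 0);
    double acc = 0.0;
    double even[VL_D], odd[VL_D];
    for (size_t off = 0; off < n; off += VL_W) {
        model_fcvt(x + off, even);
        model_fcvtlt(x + off, odd);
        for (size_t i = 0; i < VL_D; i++)
            acc += even[i] + odd[i];
    }
    return acc;
}
```

On the input 0, 1, ..., 15 the even+odd pattern sums to 120, while converting only with the even-lane FCVT would give 0 + 2 + ... + 14 = 56: every odd lane is silently lost.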
fix/bughunt-dnn (2026-06-27)¶
no rebase impact: edits the fork-only DNN tiny-AI / ONNX Runtime path (core/src/dnn/tensor_io.c, core/src/dnn/ort_backend.c) and its fork-only tests (core/test/dnn/test_tensor_io.c, core/test/dnn/test_ort_internals.c) + docs/state.md + changelog. No libvmaf public-header / CLI / meson_options.txt change, so no ffmpeg-patch impact. The whole core/src/dnn/ tree is fork-added (not present in upstream Netflix/vmaf), so there is no upstream-parity conflict surface to track on a future sync.
fix/bughunt-ai (2026-06-27)¶
no rebase impact: edits training-harness Python only (ai/scripts/{aggregate_corpora,extract_k150k_features,materialize_saliency_features}.py, ai/train/konvid_pair_dataset.py + ai/tests/). No libvmaf C-API / CLI / meson_options.txt / public-header change → no ffmpeg-patch impact. No rebase-sensitive invariants.
fix/bughunt-cli (2026-06-27)¶
no ffmpeg-patch impact: edits the CLI only (core/tools/vmaf.cpp, cli_parse.cpp). Also deletes the dead core/tools/vmaf.c (unreferenced; superseded by vmaf.cpp) and re-points 8 stale config/doc references. Invariant: cli_parse.c is NOT dead; it is the TU compiled into test_cli_parse / test_cli_parse_long_only_args / fuzz_cli_parse, so do not delete it on rebase. CLI contract: --help prints to stdout and exits 0; a run that decodes no frames exits 101 (VMAF_EXIT_NO_FRAMES_DECODED).
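The exit-code contract can be stated as a small decision function. In this C sketch the value 101 and the name VMAF_EXIT_NO_FRAMES_DECODED come from the note above. cli_exit_status and CLI_EXIT_OK are hypothetical, and a real run has other failure paths not modelled here.

```c
#include <assert.h>

/* VMAF_EXIT_NO_FRAMES_DECODED = 101 is taken from the note above and is
 * redeclared here only to keep the sketch self-contained. */
enum {
    CLI_EXIT_OK = 0,                    /* hypothetical name */
    VMAF_EXIT_NO_FRAMES_DECODED = 101,
};

/* Hypothetical helper: final exit status. --help is a successful request,
 * so its usage text belongs on stdout and the process exits 0. A scoring
 * run that decoded zero frames must not report success. */
static int cli_exit_status(int help_requested, unsigned long frames_decoded)
{
    if (help_requested)
        return CLI_EXIT_OK;
    if (frames_decoded == 0)
        return VMAF_EXIT_NO_FRAMES_DECODED;
    return CLI_EXIT_OK;
}
```

A distinct non-zero code lets scripts tell "nothing was decoded" apart from a generic failure without parsing stderr.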
chore/version-3.2.0 (2026-06-27)¶
no rebase impact: bumps the core/meson.build version (x-release-please-version marker) and the "." entry in .release-please-manifest.json to 3.2.0 / 3.2.0-lusoris.0, tracking the upstream libvmaf 3.2.0 SONAME. On an upstream sync, keep the fork's libvmaf version aligned with Netflix's, in the form <upstream-X.Y.Z>-lusoris.N.
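A release check could enforce the <upstream-X.Y.Z>-lusoris.N form mechanically. The C function below is a hypothetical sketch of such a check, not an existing fork tool. It only verifies that the fork string starts with the given upstream version and ends in "-lusoris." plus a decimal revision.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: does `fork` have the form "<upstream>-lusoris.<N>"? */
static bool fork_version_tracks(const char *fork, const char *upstream)
{
    static const char suffix[] = "-lusoris.";
    size_t n = strlen(upstream);

    if (strncmp(fork, upstream, n) != 0)      /* must start with X.Y.Z */
        return false;
    fork += n;
    if (strncmp(fork, suffix, sizeof suffix - 1) != 0)
        return false;
    fork += sizeof suffix - 1;
    if (!isdigit((unsigned char)*fork))       /* N needs at least one digit */
        return false;
    while (isdigit((unsigned char)*fork))
        fork++;
    return *fork == '\0';                     /* nothing may follow N */
}
```

Matching the suffix right after the upstream prefix rejects look-alikes such as "3.2.01-lusoris.0" when the upstream version is 3.2.0.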
fix/round3-build-gpu-batch (2026-06-27)¶
no ffmpeg-patch impact. R3-6: HIP integer_vif had an uninitialised err variable; it is now initialised (err = 0). R3-9: the NVTX libdl lookup now uses cc.find_library('dl'). R3-10: ssim AVX2 carve-out with _x86_simd_strict_fp_extra (icx -fp-model=precise; no-op on gcc/clang). Invariant: every x86 SIMD carve-out lib that needs bit-exactness under icx must carry _x86_simd_strict_fp_extra; keep the ssim carve-out aligned with its psnr_hvs/ms_ssim/ssimulacra2 siblings.