Architectural Decision Records (ADR)

This is the canonical, tracked decision log for the fork. Every non-trivial architectural / policy / scope decision lands here as its own markdown file before the corresponding commit merges.

Format

We use Michael Nygard's ADR format (MADR-style), one markdown file per decision — not a mega-table. See joelparkerhenderson/architecture-decision-record for background.

Each ADR file is named NNNN-kebab-case-title.md with a zero-padded 4-digit ID and follows the structure in 0000-template.md:

# ADR-NNNN: <short, declarative title>

- **Status**: Proposed | Accepted | Deprecated | Superseded by [ADR-NNNN](NNNN-title.md)
- **Date**: YYYY-MM-DD
- **Deciders**: <names / handles>
- **Tags**: <comma-separated area tags>

## Context              — the problem, the forces at play
## Decision             — one paragraph in active voice
## Alternatives considered  — at minimum the runner-up, in a pros/cons table
## Consequences         — Positive / Negative / Neutral-follow-ups
## References           — upstream docs, prior ADRs, related PRs, popup-answer source
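The header block of the template is regular enough to read mechanically. The following is a minimal sketch, assuming the header bullets keep the exact `- **Key**: value` shape shown above; `parse_adr_header` and the `ADR-9999` sample text are hypothetical and only illustrate the layout.

```python
import re

# Matches header bullets such as "- **Status**: Accepted".
HEADER_FIELD = re.compile(r"^- \*\*(?P<key>[A-Za-z]+)\*\*:\s*(?P<value>.*)$")
# Matches the title line "# ADR-NNNN: <title>".
TITLE = re.compile(r"^# ADR-(?P<id>\d{4}): (?P<title>.+)$")


def parse_adr_header(text):
    """Return id, title and header fields found before the first '## ' section."""
    header = {}
    for line in text.splitlines():
        if line.startswith("## "):
            break  # the header ends where the first section starts
        title = TITLE.match(line)
        if title:
            header["id"] = title["id"]
            header["title"] = title["title"]
            continue
        field = HEADER_FIELD.match(line)
        if field:
            header[field["key"].lower()] = field["value"].strip()
    # Tags are comma-separated in the template; split them into a list.
    raw_tags = header.get("tags", "")
    header["tags"] = [t.strip() for t in raw_tags.split(",") if t.strip()]
    return header


# Hypothetical ADR text, not a real record from this repository.
EXAMPLE = """\
# ADR-9999: Example decision title

- **Status**: Accepted
- **Date**: 2026-01-01
- **Deciders**: example-handle
- **Tags**: docs, ci

## Context
"""

print(parse_adr_header(EXAMPLE))
```

Stopping at the first `## ` heading keeps bullets inside the body (for example in a pros/cons list) from being mistaken for header fields.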

Conventions

Filename: NNNN-kebab-case-title.md. IDs are assigned in commit order and never reused.
Immutable once Accepted: the body is frozen. To change a decision, write a new ADR with Status: Supersedes ADR-NNNN and flip the old one to Superseded by ADR-MMMM.
One decision per ADR — if you find yourself writing "and also…", split it.
Tagging: use the flat tag palette below so grep -l 'Tags\**:.*cuda' docs/adr/*.md works (the \** tolerates the bold **Tags**: markup used by the template). New tags are fine when justified.
Link from per-package AGENTS.md: the relevant per-package AGENTS.md points to the ADRs that govern that subtree, so the rationale is one click away from the code.
Backfill policy: ADRs ≤ 0099 are backfills — decisions made before the ADR practice was formalised on 2026-04-17, captured retroactively from commit history and planning dossiers. Their Status reflects the current code, not the original decision date. New decisions start at 0100.
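The filename and ID rules above can be checked with a few lines of code. This is a sketch under stated assumptions: the helper names (`adr_id`, `next_adr_id`) are hypothetical, "kebab-case" is taken to mean lowercase letters and digits separated by single hyphens, and "never reused" is approximated by always going one past the highest ID seen, which holds only if ADR files are never deleted.

```python
import re

# NNNN-kebab-case-title.md: 4-digit zero-padded ID, then lowercase kebab-case words.
ADR_FILENAME = re.compile(r"^(?P<id>\d{4})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")
FIRST_NEW_ID = 100  # IDs up to 0099 are reserved for backfills


def adr_id(filename):
    """Return the numeric ID if the name follows the convention, else None."""
    match = ADR_FILENAME.match(filename)
    return int(match["id"]) if match else None


def next_adr_id(filenames):
    """Next ID for a new (non-backfill) decision, formatted as NNNN."""
    used = [i for i in map(adr_id, filenames) if i is not None]
    # One past the highest ID in use, but never below 0100.
    return f"{max([FIRST_NEW_ID - 1, *used]) + 1:04d}"


# Apart from 0000-template.md, these filenames are made up for illustration.
names = ["0000-template.md", "0042-example-backfill.md", "0158-example-decision.md"]
print(next_adr_id(names))
print(adr_id("0108-Bad_Name.md"))
```

A backfill would still pick its ID by hand from the range below 0100; the helper only covers new decisions.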

Tag palette

ai, agents, build, ci, claude, cli, cuda, dnn, docs, framework, git, github, license, lint, matlab, mcp, planning, python, readme, release, security, simd, supply-chain, sycl, testing, workspace.
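Because the palette is flat, spotting tags that fall outside it is a set lookup. The sketch below assumes a Tags value is a plain comma-separated string; `off_palette` is a hypothetical helper, and the sample input is the Tags cell of the ADR-0101 row in the index.

```python
# The palette exactly as listed above.
PALETTE = frozenset(
    """ai agents build ci claude cli cuda dnn docs framework git github
    license lint matlab mcp planning python readme release security simd
    supply-chain sycl testing workspace""".split()
)


def off_palette(tags_field):
    """Return the tags in a comma-separated Tags value that are not in the palette."""
    tags = [t.strip() for t in tags_field.split(",") if t.strip()]
    return [t for t in tags if t not in PALETTE]


# Tags cell of the ADR-0101 row in the index.
print(off_palette("sycl, gpu, picture-api, memory"))  # → ['gpu', 'picture-api', 'memory']
```

An off-palette tag is not an error under the conventions ("New tags are fine when justified"); a report like this only shows where the palette may be due for an update.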

Why it exists

A Claude session makes a decision (directory move, CI gate change, dependency swap), commits it, and ends; afterwards the rationale is recoverable only from the commit message, which typically summarises the what but omits the alternatives considered. ADRs preserve "we chose X over Y because Z" in a single auditable place. See ADR-0028 (superseded by ADR-0106, which restates the same rule).

What counts as non-trivial?

A decision is non-trivial when another engineer could reasonably have chosen differently. Examples:

Directory moves (e.g., ADR-0026: workspace/ → python/vmaf/workspace/)
Base-image / dependency policy (e.g., ADR-0027: non-conservative CUDA pins)
CI-gate semantics (e.g., ADR-0024: Netflix golden tests as required status)
Test-selection / regeneration rules
Coding-standards changes (e.g., ADR-0012)
New user-visible flags or surfaces (e.g., ADR-0023)

Not ADR-worthy: bug fixes, implementation details, one-off refactors that don't change any interface or policy.

Relation to `.workingdir2/`

Planning dossiers live under .workingdir2/ (gitignored). Mirrored copies of ADRs may exist there for local session continuity, but the tracked docs/adr/ tree is authoritative.

Index

ID	Title	Status	Tags
ADR-0001	Treat uncommitted benchmark result JSON as noise	Accepted	workspace, git, testing
ADR-0002	Merge path gpu-opt → sycl → master, master is fork default	Accepted	git, release, workspace
ADR-0003	Introduce `.workingdir2` as new planning directory	Accepted	workspace, planning, claude
ADR-0004	Auto-push sycl and master to origin after merges	Accepted	git, ci, release
ADR-0005	Adopt full framework adaptation scope (a–g)	Accepted	framework, ci, docs, build, mcp
ADR-0006	Set CLI precision default to `%.17g` with `--precision` flag	Superseded by ADR-0119	cli, testing, python
ADR-0007	Rewrite `.claude/settings.json` from scratch	Accepted	claude, agents
ADR-0008	Rewrite README with fork branding, preserve Netflix attribution	Accepted	docs, readme, license
ADR-0009	MCP server exposes four core tools	Accepted	mcp, python, framework
ADR-0010	Sign release artifacts keyless via Sigstore	Accepted	security, release, supply-chain
ADR-0011	Version scheme `v3.x.y-lusoris.N`	Accepted	release, framework
ADR-0012	Coding standards stack: JPL + CERT + MISRA	Accepted	lint, docs, license
ADR-0013	Support full local dev distro matrix	Accepted	build, docs, framework
ADR-0014	VSCode uses clangd; disable MS C/C++ IntelliSense	Accepted	build, framework, lint
ADR-0015	CI matrix: Linux / macOS / Windows with sanitizers	Accepted	ci, testing, security
ADR-0016	`sycl → master` merge conflict resolution policy	Accepted	git, workspace
ADR-0017	Claude skills scope includes domain scaffolding	Accepted	claude, agents, framework
ADR-0018	Claude hooks scope: safety + auto-format + git	Accepted	claude, agents, ci, git
ADR-0019	`.workingdir2` is the full planning dossier	Accepted	workspace, planning, docs
ADR-0020	Tiny-AI scope covers all four capabilities	Accepted	ai, dnn, framework, cli
ADR-0021	Training stack: PyTorch + Lightning with ONNX export	Accepted	ai, python, framework
ADR-0022	Inference runtime: ONNX Runtime via execution providers	Accepted	ai, dnn, cuda, sycl, build
ADR-0023	Tiny-AI user surfaces: CLI, C API, ffmpeg, training	Accepted	ai, dnn, cli, framework
ADR-0024	Preserve Netflix source-of-truth tests verbatim	Accepted	testing, ci, license
ADR-0025	Copyright handling preserves Netflix, adds Lusoris/Claude	Superseded by ADR-0105	license, docs
ADR-0026	Relocate Python harness workspace under `python/vmaf/`	Accepted	workspace, python, docs
ADR-0027	Non-conservative image pins + experimental toolchain flags	Accepted	ci, cuda, sycl, build, supply-chain
ADR-0028	Every non-trivial decision gets an ADR before the commit	Superseded by ADR-0106	docs, planning, agents
ADR-0029	Relocate resource tree under `python/vmaf/`	Accepted	workspace, python, docs
ADR-0030	Relocate MATLAB sources under `python/vmaf/`	Accepted	workspace, matlab, python
ADR-0031	Fork-added docs live under `docs/`	Accepted	docs, workspace
ADR-0032	Relocate root `unittest` script to `scripts/`	Accepted	testing, workspace
ADR-0033	Relocate CodeQL config to `.github/`	Accepted	security, ci, github
ADR-0034	Delete `patches/` leftover; keep only `ffmpeg-patches/`	Accepted	workspace, build
ADR-0035	Migrate `.claude/settings.json` hooks to current schema	Accepted	claude, agents
ADR-0036	Tiny-AI Wave 1 scope expanded beyond D20–D23	Superseded by ADR-0107	ai, dnn, cli, framework, mcp
ADR-0037	Protect `master` branch on GitHub with required checks	Accepted	github, ci, security, release
ADR-0038	Purge upstream MATLAB MEX compiled binaries from tree	Accepted	security, matlab, supply-chain
ADR-0039	Pull forward runtime op-allowlist walk + model registry	Accepted	ai, dnn, security, supply-chain
ADR-0040	Extend DNN session API to multi-input/output with named bindings	Accepted	ai, dnn, cli
ADR-0041	Ship LPIPS-SqueezeNet FR extractor with inverse-ImageNet in graph	Accepted	ai, dnn, cli
ADR-0042	Tiny-AI PRs must ship human-readable docs in the same PR	Accepted	ai, dnn, docs
ADR-0100	Every user-discoverable change ships docs in the same PR (project-wide)	Accepted	docs, agents, framework
ADR-0101	Implement USM-backed picture pre-allocation pool for SYCL	Accepted	sycl, gpu, picture-api, memory
ADR-0102	DNN EP selection is ordered + graceful; `fp16_io` does a host-side fp32↔fp16 cast	Accepted	ai, dnn, cli
ADR-0103	Implement `vmaf_sycl_import_d3d11_surface` as staging-texture H2D path	Accepted	sycl, windows, api
ADR-0104	Compile `picture_pool` unconditionally and size it for the live-picture set	Accepted	api, build, cli
ADR-0105	Copyright handling preserves Netflix and adds Lusoris/Claude (paraphrased re-statement)	Supersedes ADR-0025	license, docs
ADR-0106	Every non-trivial decision gets its own ADR file before the commit (paraphrased re-statement)	Supersedes ADR-0028	docs, planning, agents
ADR-0107	Tiny-AI Wave 1 scope expanded beyond ADR-0020 through ADR-0023 (paraphrased re-statement)	Supersedes ADR-0036	ai, dnn, cli, framework, mcp
ADR-0108	Every fork-local PR ships the six deep-dive deliverables (research digest, decision matrix, AGENTS.md invariant, reproducer, changelog entry, rebase note)	Accepted	docs, agents, framework, planning
ADR-0109	Nightly `bisect-model-quality` runs against a synthetic placeholder cache (real DMOS-aligned cache swaps in via follow-up)	Accepted	ai, ci, tiny-ai, framework
ADR-0110	Coverage gate uses `-fprofile-update=atomic` to survive parallel meson tests on instrumented SIMD code	Superseded by ADR-0111	ci, build, simd, testing
ADR-0111	Coverage gate switches `lcov` → `gcovr` and installs ORT in the coverage job (fixes 1176% over-count + DNN-stub coverage gap; layers on ADR-0110 race fixes)	Accepted	ci, build, dnn, testing
ADR-0112	Expose `ort_backend.c` static helpers (fp16 conversion, resolve_name) via private internal header so `test_ort_internals` can unit-test edge branches the public API can't reach on a CPU-only ORT build	Accepted	dnn, testing, coverage
ADR-0113	`vmaf_ort_open` falls back to CPU when `CreateSession` fails after a non-CPU EP attached; coverage CI installs `onnxruntime-gpu` + `libcudart12` to exercise EP-attach success arms	Accepted	dnn, ci, coverage, ort
ADR-0114	`coverage-check.sh` gains a per-file critical-coverage override map; `dnn/ort_backend.c` + `dnn/dnn_api.c` floor at 78% (structural EP-availability ceiling per ADR-0112)	Accepted	ci, coverage, dnn, ort, gate
ADR-0115	All CI workflows trigger on `[master]` only (drop dead `sycl` branch); delete `windows.yml` and merge into `libvmaf.yml` preserving the `build (MINGW64, …)` required-status-check name	Accepted	ci, github, build, framework
ADR-0116	CI workflow naming convention — purpose-named files + Title Case display names	Accepted	ci, github, docs
ADR-0117	Bump `actions/upload-artifact@v5`→`@v7` (Node 24) repo-wide; filter gcovr `Ignoring suspicious hits` stderr noise so the Coverage Gate Annotations panel finishes empty	Accepted	ci, coverage, gcovr, github-actions
ADR-0118	`ffmpeg-patches/` is a quilt-style series applied via `series.txt` ordering by both Dockerfile and `ffmpeg.yml`; patches regenerated via real `git format-patch -3` carrying valid index lines + signed-off-by trail	Accepted	ci, build, ffmpeg, docker, sycl, ai
ADR-0119	Revert CLI precision default from `%.17g` to `%.6f` so the Netflix CPU golden gate (CLAUDE.md §8) passes without per-call-site flags; `--precision=max` keeps the round-trip-lossless opt-in	Supersedes ADR-0006	cli, testing, python, golden-gate
ADR-0120	Add three DNN-enabled matrix legs (Ubuntu gcc, Ubuntu clang, macOS clang) to `libvmaf-build-matrix.yml` so the ORT C-API surface and `dnn` meson suite are exercised across compilers/OSes; macOS leg `experimental: true` (Homebrew ORT floats)	Accepted	ci, ai, dnn, ort, build, github-actions
ADR-0121	Add Windows GPU build-only matrix legs (`Build — Windows MSVC + CUDA (build only)` + `Build — Windows MSVC + oneAPI SYCL (build only)`) so MSVC build-portability of the CUDA / SYCL backends is gated on PR, not from downstream user reports	Accepted	ci, build, cuda, sycl, github-actions
ADR-0122	Unconditional CUDA cubin coverage for `sm_86` / `sm_89` + `compute_80` PTX fallback in `core/src/meson.build`; actionable multi-line `libcuda.so.1` dlopen-failure + `cuInit` messages in `vmaf_cuda_state_init()` (with pre-existing leak fix on error paths)	Accepted	cuda, build, docs
ADR-0123	Fix ffmpeg `libvmaf_cuda` null-deref at `vmaf_read_pictures` tail: `prev_ref` update on CUDA-device-only extractor set dereferenced zero-initialised `ref_host`. Null-guard the `vmaf_picture_ref(&vmaf->prev_ref, ref)` call. Upstream `f740276a` + `32b115df` + fork `65460e3a` combined to reach default builds.	Accepted	cuda, regression, upstream-sync
ADR-0124	Automate the four rule-bearing process ADRs (0100 doc-substance, 0105 copyright, 0106 ADR-per-decision, 0108 deep-dive deliverables). New `rule-enforcement.yml` workflow with one blocking job (`deep-dive-checklist`) + two advisory PR-comment jobs (`doc-substance-check`, `adr-backfill-check`); pre-commit hook for the copyright template.	Accepted	ci, agents, framework, docs, license
ADR-0125	MS-SSIM decimate fast paths: AVX2 + AVX-512 specialised 9×9 separable LPF factor-2 kernels under `core/src/feature/x86/`; vendored `iqa/decimate.c` stays untouched; bit-exactness enforced via a scalar-separable reference; NEON deferred to follow-up	Proposed	simd, testing, agents
ADR-0126	SSIMULACRA 2 perceptual metric as a fork-local feature extractor — port libjxl C++ reference to `core/src/feature/ssimulacra2.c`, scalar-first, float-tolerance bit-closeness	Proposed	metrics, feature-extractor, docs, agents
ADR-0127	Vulkan compute backend — vendor-neutral GPU path alongside CUDA/SYCL/HIP; volk loader, GLSL→SPIR-V via glslc, VMA allocator, VIF as pathfinder	Proposed	gpu, vulkan, backend, build, agents
ADR-0128	Embedded MCP server inside libvmaf — SSE + UDS + stdio transports, build-flag-gated, new `libvmaf_mcp.h` header, Power-of-10 compliant via dedicated MCP thread + SPSC queue	Accepted	mcp, agents, api, build, docs
ADR-0129	Tiny-AI post-training int8 quantisation — static + dynamic + QAT per model via `quant_mode` field in `model/registry.json`; three scripts under `ai/scripts/`; CI accuracy-budget gate	Proposed	ai, onnx, quantization, model, docs
ADR-0130	Ship scalar C port of the SSIMULACRA 2 metric (libjxl tools/ssimulacra2.cc). BT.709 limited-range YUV→sRGB→linear→XYB pipeline, 6-scale pyramid with separable Gaussian blur (σ=1.5, reflect padding) replacing libjxl's FastGaussian IIR, 108-weight polynomial pool. Snapshot JSON deferred to follow-up PR. Implementation closeout for ADR-0126 (Proposed, PR #67).	Accepted	metrics, feature-extractor, ssimulacra2
ADR-0131	Port Netflix#1382 — swap `cuMemFreeAsync(ptr, priv->cuda.str)` for synchronous `cuMemFree(ptr)` at `core/src/cuda/picture_cuda.c:247` to fix the multi-session assert-0 crash reported in Netflix#1381. Preceding `cuStreamSynchronize` already removed the async-overlap benefit, so no perf regression.	Accepted	cuda, upstream-port, correctness
ADR-0132	Port Netflix#1406 — fix `vmaf_feature_collector_mount_model` list traversal bug (dereferenced-pointer mutation corrupted the list on ≥3 mounts) and `unmount_model` error-code semantics (`-EINVAL` → `-ENOENT` on not-found). Refactored the upstream test extension into a shared `load_three_test_models` helper to keep per-test bodies under JPL-Power-of-10 rule-4 size thresholds.	Accepted	upstream-port, correctness, testing
ADR-0133	Unify push-event and pull-request-event clang-tidy scoping: both now scan only the event's delta (`<before>..HEAD` on push, `origin/<base>...HEAD` on PR). Master-post-merge no longer re-scans every vendored file (`core/src/svm.cpp`, `core/src/cuda/*.c` without CUDA headers) and surfaces long-latent warnings unrelated to the push. `actions/checkout@v6` bumped to `fetch-depth: 0`.	Accepted	ci, lint, clang-tidy
ADR-0134	Port Netflix#1451 — append `declare_dependency(link_with: libvmaf, include_directories: [libvmaf_inc])` + `meson.override_dependency('libvmaf', ...)` at the end of `core/src/meson.build` so consumers can use the fork as a meson subproject with the standard `dependency('libvmaf')` idiom. Fork deviates from upstream by using trailing-comma style to match fork build-file conventions.	Accepted	build, upstream-port
ADR-0135	Port Netflix#1424 — expose `vmaf_model_version_next(prev, &version)` public iterator for built-in VMAF model versions. Corrects three upstream defects during port: NULL-pointer arithmetic UB (missing `else`), off-by-one returning the `{0}` sentinel, and const-qualifier mismatches in the test. Adds `BUILT_IN_MODEL_CNT == 0` early-return for zero-models build configurations. Doxygen-style header doc replaces upstream's one-liner.	Accepted	api, upstream-port, correctness
ADR-0136	Strip markdown emphasis/code characters (`*`, `` ` ``, `_`) from the PR body before the `Deep-Dive Deliverables Checklist` grep. The template ships label bullets like ``- [ ] **`AGENTS.md` invariant note**``; the backticks inserted characters between tokens and broke the literal-item regex, rejecting conforming PRs. One-line `tr -d` pass applied to both parse and diff-verification steps.	Accepted	ci, rule-enforcement, adr-0108
ADR-0137	Port upstream Netflix/vmaf PR #1430 (thread-local locale handling via `thread_locale.h/.c`) into the fork. Replaces process-global `setlocale(LC_ALL, "C")` bracket in `svm_save_model` + output writers with POSIX.1-2008 `uselocale` (Linux/macOS/BSD), Windows `_configthreadlocale`, graceful fallback elsewhere. Fork corrections: `NULL` score_format in test, merge ADR-0119 `ferror` return contract with upstream pop, drop dead `<locale.h>` include in svm.cpp.	Accepted	port, libvmaf, i18n, thread-safety, upstream-port
ADR-0138	Add AVX2 + AVX-512 bit-exact fast path for `_iqa_convolve` using the three-stage pattern single-rounded float mul → widen to double → double add (no FMA) to mirror scalar `sum += img[i]*k[j]` under FLT_EVAL_METHOD == 0. Specialised for MS-SSIM / SSIM invariants (11-tap Gaussian or 8-tap square, normalised, separable). Dispatch in `ssim_tools.c` via new `_iqa_convolve_set_dispatch`; vendored `iqa/convolve.c` untouched. Profile-driven after the 51% self-time hot spot revealed post-decimate-SIMD. Bit-identical to scalar; Netflix golden gate unchanged.	Proposed	simd, performance
ADR-0139	Rewrite fork-local `ssim_accumulate_avx2/avx512` to match scalar `ssim_accumulate_default_scalar` byte-for-byte by doing the `2.0 * mu1mu2 + C1` numerator and the final `lcs` product per-lane in scalar double rather than vector float. Prior SIMD computed l, c in float and multiplied as float → 8th-decimal (~0.13 float-ULP) drift on MS-SSIM. `precompute` / `variance` are pure elementwise float ops and already bit-exact. Verified on Netflix + checkerboard pairs: scalar = AVX2 = AVX-512 bit-identical at `--precision max`.	Proposed	simd, performance, bit-exact
ADR-0140	Ship a two-part SIMD DX framework — `core/src/feature/simd_dx.h` with ISA-specific macros for the ADR-0138 widen-then-add pattern and the ADR-0139 per-lane-scalar-double reduction pattern, plus an upgraded `/add-simd-path` skill that scaffolds TU + header + unit test + meson rows from a kernel spec. Demonstrated on convolve NEON (ADR-0138 port to aarch64) and ssim NEON bit-exactness audit (research-0012 follow-up). Cross-compile + QEMU audit gate verifies NEON = scalar at `--precision max`. Enables PR #B (ssimulacra2 AVX2+NEON, motion_v2 NEON, vif_statistic AVX-512+NEON, etc.) at higher throughput.	Proposed	simd, dx, build, agents
ADR-0141	Every PR leaves every file it touches lint-clean to the fork's strictest profile — fork-local AND upstream-mirror. Refactor first; `// NOLINT` reserved for cases where refactoring would break a load-bearing invariant (ADR-0138 / ADR-0139 bit-exactness, upstream-parity identifier the rebase story depends on). Every NOLINT cites inline the ADR / research digest / rebase invariant that forces it. Historical debt (18 pre-existing `readability-function-size` NOLINTs + upstream `_iqa_*` suppressions) stays scoped to backlog item T7-5; this ADR does not backdate the rule.	Accepted	ci, process, code-quality, agents
ADR-0142	Port Netflix upstream `18e8f1c5` (feature/vif: add `vif_sigma_nsq`) with fork-specific extension of the AVX2 `vif_statistic_s_avx2` SIMD variant to accept the new parameter. Default (`2.0`) bit-identical to pre-port master; AVX2 and scalar agree on the new 14-argument contract. Fork keeps a local `(float)vif_sigma_nsq` cast to preserve ADR-0138/0139 float-arithmetic discipline (upstream's scalar double-promotes into a slightly different computation). ADR-0141 applied to touched files (ptrdiff_t stride widening fix; function-size NOLINT on the AVX2 variant references T7-5 sweep).	Accepted	upstream-port, feature-param, vif, simd
ADR-0143	Port Netflix upstream `f3a628b4` (feature/common: generalize avx convolution for arbitrary filter widths) — replaces 2.4k LoC of specialised fwidth ∈ {3,5,9,17} kernels in `common/convolution_avx.c` with a single generalised 1-D scanline pair + `MAX_FWIDTH_AVX_CONV` ceiling in `convolution.h`; drops the hard-coded whitelist in the VIF dispatch (`vif_tools.c`). Adopts upstream's paired python-golden loosening from `places=2` to `places=1` on `VMAF_score` / `VMAFEXEC_score` per the ADR-0142 Netflix-authority precedent. ADR-0141 applied to touched files: four 1-D scanline helpers now `static`, strides widened to `ptrdiff_t`, `#include <stddef.h>` added. Zero clang-tidy warnings on the touched file.	Accepted	upstream-port, simd, avx2, vif, convolve
ADR-0145	Port `motion_v2` to NEON in a new fork-local TU `arm64/motion_v2_neon.c`. Bit-exact to the scalar reference under QEMU (cpumask=0 vs cpumask=255 byte-identical on Netflix golden pair). Uses arithmetic right-shift throughout (`vshrq_n_s64` / `vshlq_s64(v, -bpc)`) to match scalar C `>>` semantics — deliberately diverges from the fork's AVX2 variant, which uses `_mm256_srlv_epi64` (logical); AVX2 re-audit queued as follow-up. Five small `static inline` helpers keep every function under ADR-0141's 60-line budget; zero clang-tidy warnings, no NOLINT. Closes backlog item T3-4 (gap-fill Step 2).	Accepted	simd, neon, motion, bit-exact, performance
ADR-0146	Sweep every `readability-function-size` NOLINT from `core/src/` (20 sites). Fork-local files (dict, picture, predict, libvmaf, output, feature_collector, feature_extractor, picture_pool, read_json_model) refactored via small `static` helpers; IQA / SIMD files (`_iqa_convolve`, `_iqa_ssim`, `vif_statistic_s_avx2`) refactored via `static inline` helpers that preserve ADR-0138 / ADR-0139 bit-exactness (per-lane scalar-float reduction threaded through `struct vif_simd8_lane`; convolve ordering unchanged). Drive-by lint fixes: `calloc(w*h, ...)` widening, multi-declaration forms, `model_collection_parse_loop` alias-write, `_calc_scale` → `iqa_calc_scale` rename. Zero new NOLINTs introduced; Netflix-golden-pair VMAF score bit-identical between VMAF_CPU_MASK=0 and =255. Closes backlog item T7-5.	Accepted	lint, cleanup, refactor, touched-file-rule
ADR-0147	Port the thread-pool job-object free list + 64-byte inline data buffer from Netflix upstream PR #1464 (closed) into `core/src/thread_pool.c`. Recycles `VmafThreadPoolJob` slots instead of malloc/free per enqueue; payloads ≤ 64 bytes bypass a second malloc via inline_data. Adapted to the fork's `void (*func)(void *data, void **thread_data)` signature and per-worker `VmafThreadPoolWorker` data path. ~1.8–2.6× enqueue throughput on a 500 000-job 4-thread micro-benchmark. Netflix-golden-pair VMAF bit-identical between `--threads 4` and serial, and between `VMAF_CPU_MASK=0` and `=255` under `--threads 4`. Closes the thread-pool half of backlog item T3-6 (PSNR SIMD half was already in via fork commit 81fcd42e).	Accepted	performance, threading, upstream-port
ADR-0148	Rename the IQA-derived reserved-identifier surface (`_iqa_`, `struct _kernel`, `_ssim_int`, `_map_reduce`, `_map`, `_reduce`, `_context`, `_ms_ssim_map`, `_ssim_map`, `_ms_ssim_reduce`, `_ssim_reduce`, `_alloc_buffers`, `_free_buffers`, header guards `_CONVOLVE_H_` / `_DECIMATE_H_` / `_SSIM_TOOLS_H_` / `__VMAF_MS_SSIM_DECIMATE_H__`) to non-reserved `iqa_` / `ms_ssim_` / `_INCLUDED` spellings across `core/src/feature/{iqa,}` (21 files). Sweeps the ADR-0141 touched-file lint cascade that surfaced (~40 pre-existing warnings: misc-use-internal-linkage → `static` or cross-TU NOLINT; widening multiplications → size_t casts; multi-decl splits; function-size refactors of `calc_ssim` / `compute_ssim` / `compute_ms_ssim` / `run_gauss_tests`; unused-parameter `(void)` casts; scoped NOLINTBEGIN/END for analyzer false positives on kernel-offset bounds and test-helper malloc leaks). Bit-identical VMAF score on Netflix golden pair (scalar vs SIMD, with `--feature float_ssim --feature float_ms_ssim`). Closes backlog item T7-6.	Accepted	lint, cleanup, refactor, iqa, touched-file-rule
ADR-0149	Port Netflix upstream PR #1376 ("Fix fifo hangs") into the Python harness under `python/vmaf/core/executor.py` + `python/vmaf/core/raw_extractor.py`. Replaces the 1-second `os.path.exists()` polling loop in `_open_{work,proc}files_in_fifo_mode` with `multiprocessing.Semaphore(0)` signalled by the child processes after `os.mkfifo(...)`; parent acquires with 5-second soft-timeout warn then blocks indefinitely. Applied to both the base `Executor` class hierarchy and the `ExternalVmafExecutor`-style subclass (single-process variant). Fork carve-outs: skip upstream's `__version__ = "3.0.0" → "4.0.0"` bump (fork tracks its own versioning per ADR-0025); drop now-unused `from time import sleep` from both files (ADR-0141 ruff F401). Closes backlog item T4-7.	Accepted	upstream-port, python, concurrency, fifo
ADR-0150	Port Netflix upstream PR #1472 ("cuda: enable CUDA feature extraction on Windows (MSYS2/MinGW)") two-commit series: source-portability guards in CUDA headers + `.cu` files (drop `<pthread.h>` from `cuda/common.h`, DEVICE_CODE guards on `<ffnvcodec/>` vs `<cuda.h>` in `cuda_helper.cuh` + `picture.h`, `#ifndef DEVICE_CODE` around `feature_collector.h` in 5 ADM `.cu` files), and meson build plumbing (`vswhere`-based `cl.exe` discovery without `PATH` pollution, Windows SDK + MSVC include path injection via `-I` flags to nvcc, CUDA version detection via `nvcc --version` instead of `meson.get_compiler('cuda')`). Fork-specific conflict resolutions: keep positional (not `#ifndef __CUDACC__`) initializers in `integer_adm.h`; keep `pthread_dependency` on `cuda_static_lib` (ring_buffer.c still uses pthread directly); merge fork's gencode coverage block (ADR-0122) with upstream's new nvcc-detect block. Drive-by: rename reserved `__VMAF_SRC__H__` header guards to `VMAF_SRC_*_INCLUDED`. Linux CPU build 32/32 + Linux CUDA build 35/35 pass. Closes backlog item T4-2.	Accepted	upstream-port, cuda, windows, mingw, build
ADR-0151	Add a 32-bit x86 (i686) build-only row to `.github/workflows/libvmaf-build-matrix.yml` to reproduce Netflix upstream issue #1481. New cross-file `build-aux/i686-linux-gnu.ini` (gcc + `-m32`, `cpu_family = 'x86'`, `cpu = 'i686'`); new install-deps step for `gcc-multilib g++-multilib`; matrix row runs `meson_extra: --cross-file=build-aux/i686-linux-gnu.ini -Denable_asm=false` (pins upstream's documented workaround); test + tox steps skipped for the i686 leg because meson marks cross-built tests `SKIP 77`. Scope note: fixing the actual `_mm256_extract_epi64` compile failure (24 call sites in `adm_avx2.c`) is explicitly out of scope — this ADR adds the CI gate only. Closes backlog item T4-8.	Accepted	ci, build, x86, netflix-upstream
ADR-0152	`vmaf_read_pictures` now rejects non-monotonic indices with `-EINVAL`. Addresses Netflix upstream issue #910: `integer_motion` / motion2 / motion3 extractors keep sliding-window state keyed by `index % N`, so submitting frames out of order or with duplicate indices silently corrupts their ring-buffers (symptom: missing `integer_motion2_score` on last frame when submission order doesn't match frame order). Enforced via new `last_index` + `have_last_index` fields on `VmafContext`, checked inside the existing `read_pictures_validate_and_prep` helper from ADR-0146. Net visible behaviour change: duplicates and out-of-order indices now return `-EINVAL` instead of producing silent-wrong-answer. 3-subtest reducer in `test_read_pictures_monotonic.c` verified to fail pre-fix and pass post-fix. Zero impact on in-tree callers (vmaf CLI + test suite already iterate strictly increasing). Closes backlog item T1-2.	Accepted	api, correctness, motion, netflix-upstream
ADR-0153	`float_ms_ssim` init rejects input below 176×176 with `-EINVAL` + clear error message. Addresses Netflix upstream issue #1414: the 5-level 11-tap MS-SSIM pyramid walks off the kernel footprint at a mid-level scale for inputs below 176×176 (QCIF and smaller), producing a confusing mid-run `error: scale below 1x1!` cascade. Fix checks `w < GAUSSIAN_LEN << (SCALES - 1)` at init time in `float_ms_ssim.c:init`, derived from the existing filter constants so it stays in sync. Extracted `ms_ssim_init_simd_dispatch` helper to keep the function under the ADR-0141 size budget. 3-subtest reducer in `test_float_ms_ssim_min_dim.c` covers 5 boundary rejections (including exact-under cases like 175×176 and 176×175) and 2 accept cases (176×176 exact + 576×324). Verified fail-without-fix + pass-with-fix. Visible behaviour: init now fails immediately instead of mid-stream; zero impact on inputs ≥176×176. Closes backlog item T1-4.	Accepted	correctness, ms-ssim, netflix-upstream
ADR-0154	`vmaf_score_pooled` (via `vmaf_feature_collector_get_score` + the inline `vmaf_feature_vector_get_score` fast-path) now returns `-EAGAIN` when a requested feature index is valid but not yet written (transient — e.g. motion2 waiting for the next frame or flush), distinguishing it from `-EINVAL` which remains for programmer errors (bad pointer, out-of-range, unknown feature). Addresses Netflix upstream issue #755: downstream integrations that want per-frame streaming VMAF output can now distinguish 'retry after next read or flush' from 'abort'. Inline-helper return was previously `-1`; now `-EINVAL` (structural) or `-EAGAIN` (pending). Drive-by: rename reserved `__VMAF_FEATURE_COLLECTOR_H__` header guard to `VMAF_FEATURE_COLLECTOR_INCLUDED` (ADR-0141 touched-file rule). 4-subtest reducer in `test_score_pooled_eagain.c` verified to fail pre-fix, pass post-fix. Closes backlog item T1-1.	Accepted	api, correctness, motion, netflix-upstream
ADR-0155	Deliberately preserve the `i4_adm_cm` int32 rounding overflow reported as Netflix upstream issue #955. `add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1))` in `core/src/feature/integer_adm.c` scales 1–3 assigns `0x80000000` into `int32_t`, wrapping to `-2147483648`; downstream `(prod + add_bef_shift) >> 32` subtracts 2^31 instead of adding it, biasing ADM scales 1–3 low by ≈1 LSB per summed term. The buggy arithmetic is encoded in the Netflix golden `assertAlmostEqual` values (project hard rule #1 / ADR-0024); fixing unilaterally would diverge from every published VMAF number calibrated on these outputs. Documentation-only landing: ADR + in-file warning comments at both overflow sites + rebase-notes 0048 + `core/src/feature/AGENTS.md` invariant. If Netflix closes #955 with a fix, sync under the ADR-0142 Netflix-authority carve-out (C-side fix + golden-number update in one merge). Closes backlog item T1-8 as "verified present, deliberately preserved".	Accepted	correctness, adm, netflix-upstream, deferred, golden-gate
ADR-0156	Replace `CHECK_CUDA`'s `assert(0)` semantics with graceful error propagation across the entire CUDA backend. Addresses Netflix upstream issue #1420: two concurrent VMAF-CUDA processes crashed the second one at `vmaf_cuda_buffer_alloc` on `cuMemAlloc` OOM. Introduces `CHECK_CUDA_GOTO(funcs, CALL, label)` (cleanup-aware) + `CHECK_CUDA_RETURN(funcs, CALL)` (immediate-return) macros + `vmaf_cuda_result_to_errno` helper that maps `CUresult` → `-ENOMEM` / `-EIO` / `-ENODEV` / `-EINVAL`. All 178 `CHECK_CUDA(...)` call sites across 7 TUs (`common.c`, `picture_cuda.c`, `libvmaf.c`, `integer_motion_cuda.c`, `integer_vif_cuda.c`, `integer_adm_cuda.c`, `cuda_helper.cuh`) converted. Twelve `static` helper functions promoted from `void → int` to carry errors. Fixes the NDEBUG footgun (`assert(0)` was a no-op in release builds → silent segfault). ADR-0122 / ADR-0123 null-guards preserved verbatim. 39/39 CUDA tests pass including new `test_cuda_buffer_alloc_oom` reducer verified to hit the `cuMemAlloc(1 TiB)` OOM path. Clang-tidy clean across all 6 touched files. Closes backlog item T1-6.	Accepted	cuda, correctness, api, netflix-upstream, reliability
ADR-0157	Fix CUDA preallocation memory leak + add new public `vmaf_cuda_state_free()` API. Addresses Netflix upstream issue #1300: users running CUDA VMAF in init/preallocate/fetch/close loops saw GPU memory rise monotonically. Four framework-side leaks identified via ASan + fixed: (1) `VmafCudaState` heap allocation had no public free — new `vmaf_cuda_state_free()` symbol in `libvmaf_cuda.h` + implementation in `common.c` (NULL-safe `free()` wrapper; must be called AFTER `vmaf_close()`); (2) `vmaf_cuda_release()` now calls `cuda_free_functions()` to release the dlopen'd driver table, via a saved pointer AFTER the existing `memset`; (3) `vmaf_ring_buffer_close()` now unlocks + destroys the `pthread_mutex` before freeing (was: destroying a locked mutex, POSIX UB); (4) cold-start unwind in `init_with_primary_context()` releases the retained primary context if `cuStreamCreateWithPriority` fails. Mirrors SYCL backend's `vmaf_sycl_state_free()` ownership pattern. New 10-cycle GPU-gated reducer `test_cuda_preallocation_leak.c` verified to leak 0 framework bytes under ASan (183 bytes remain in `libcuda.so.1` driver cache — persists for process lifetime, not per-cycle). Test-side cleanup fixed in `test_cuda_pic_preallocation.c` + `test_cuda_buffer_alloc_oom.c`. Preserves ADR-0122 / ADR-0123 null-guards + ADR-0156 CHECK_CUDA_GOTO cleanup paths verbatim. Visible behaviour change: every CUDA caller must now call `vmaf_cuda_state_free(cu_state)` AFTER `vmaf_close(vmaf)` — informal `free(cu_state)` becomes a double-free. Closes backlog item T1-7.	Accepted	cuda, correctness, api, netflix-upstream, memory
ADR-0158	Verify Netflix upstream PR #1486 ("Port motion updates") is already fully present in the fork's master. Both commits — `a44e5e6` (motion edge-mirror fix + `motion_max_val` option + `motion3` output + SIMD dispatch refactor) and `62f47d5` (73 coordinated Netflix golden `assertAlmostEqual` updates) — are confirmed in the fork via grep + programmatic AST-style scan. Fork PR #45 tracked the original port attempt but never merged; substance landed incrementally via later motion3 / blend / five-frame-window / moving-average commits. Doc-only close — ADR + rebase-notes entry 0051 for future `/sync-upstream` coverage; no code change. Closes backlog item T4-1.	Accepted	upstream-port, motion, netflix-upstream, verification
ADR-0159	AVX2 port of `psnr_hvs` (Xiph/Daala 8×8 integer DCT + contrast-sensitivity weighting + masking). Vectorizes the DCT 8 rows in parallel via `__m256i` (one register per row, 8× int32) using a `butterfly → transpose → butterfly → transpose` layout. Every `od_coeff` emitted and every final `psnr_hvs_{y,cb,cr,psnr_hvs}` feature value is byte-identical between scalar and AVX2 on all 3 Netflix golden CPU pairs (verified per-frame XML diff via `VMAF_CPU_MASK=0` vs default). Float accumulators (means / variances / mask / error) kept scalar by construction per ADR-0139 precedent. `#pragma STDC FP_CONTRACT OFF` at the TU header to prevent FMA formation from breaking 1-ulp guarantee. 3.58× DCT speedup on isolated microbenchmark (11.0 → 39.3 Mblocks/s at `-O3 -mavx2 -mfma`). New unit test `test_psnr_hvs_avx2.c` pins the DCT-level bit-exactness contract on 5 reproducible inputs. NEON follow-up PR to come (backlog T3-5-neon). Scoped NOLINTBEGIN/END around upstream Xiph scalar block keeps it verbatim as the bit-exact reference.	Accepted	simd, avx2, psnr-hvs, bit-exact, performance
ADR-0160	NEON (aarch64) sister port of `psnr_hvs`, following the ADR-0159 AVX2 port. Same byte-identical bit-exactness contract vs scalar. Lane-width adjusted: NEON's 4-wide `int32x4_t` means each 8-column row splits into `r_k_lo` (cols 0-3) and `r_k_hi` (cols 4-7); the 30-butterfly runs twice per DCT pass (once per half) and the 8×8 transpose decomposes into four 4×4 `vtrn1q_s32`/`vtrn2q_s32`/`vtrn1q_s64`/`vtrn2q_s64` stages plus a top-right ↔ bottom-left block swap. Float accumulators kept scalar per ADR-0139/0159. Inherits the ADR-0159 pointer-based `accumulate_error` signature to preserve scalar summation order (float addition is non-associative under IEEE-754: accumulating into a local float and returning it would drift the Netflix golden by ~5.5e-5). New unit test `test_psnr_hvs_neon.c` pins DCT-level bit-exactness on 5 reproducible inputs; passes under `qemu-aarch64-static`. End-to-end 576×324 8-bit Netflix golden pair diff (scalar via `VMAF_CPU_MASK=0` vs NEON default): byte-identical except for the `fps` timing header. 1080p 10-bit pairs covered by native-aarch64 CI + Netflix CPU Golden Tests required check (QEMU segfaults on heavy 10-bit workloads; this is a known emulator limit, not a port defect). Closes backlog item T3-5-neon.	Accepted	simd, neon, aarch64, psnr-hvs, bit-exact, performance
ADR-0161	SSIMULACRA 2 SIMD ports: AVX2 + AVX-512 + NEON in a single PR (T3-1 + T3-2). Vectorises 5 of the 7 hot kernels: `multiply_3plane`, `linear_rgb_to_xyb` (per-lane scalar `cbrtf` for bit-exactness), `downsample_2x2` (scalar-order `((r0e + r0o) + r1e) + r1o` sequential adds via `vshufps+vpermpd` / `vpermt2ps` / `vuzp1q_f32`+`vuzp2q_f32` deinterleaves), `ssim_map` + `edge_diff_map` (per-lane `double` reduction per ADR-0139). Byte-for-byte identical to scalar on all 5 kernels × 3 ISAs, verified via new `test_ssimulacra2_simd.c` with reproducible xorshift32 inputs (5/5 on AVX-512 host, 5/5 under `qemu-aarch64-static`). Runtime dispatch via function pointers in `Ssimu2State`, init-time selection in a new `init_simd_dispatch()` helper. IIR blur + `picture_to_linear_rgb` (`powf`) explicitly deferred to follow-up PRs: the IIR serial recurrence needs per-column batching, which is a focused PR on its own, and the `powf` kernel is a smaller ROI. `#pragma STDC FP_CONTRACT OFF` at every TU header (ignored on aarch64 GCC, kept for portability). Closes backlog T3-1 + T3-2 partially (pointwise kernels); leaves IIR blur / YUV-pipeline vectorisation as follow-up.	Accepted	simd, avx2, avx512, neon, ssimulacra2, bit-exact, performance
ADR-0162	SSIMULACRA 2 FastGaussian IIR blur SIMD (phase 2 of T3-1). Vectorises `blur_plane` on all three ISAs — single largest wall-clock cost (30 calls / frame). Horizontal pass: N-row batching via `_mm256_i32gather_ps` / `_mm512_i32gather_ps` / NEON lane-sets (AVX2=8, AVX-512=16, NEON=4 rows). Vertical pass: column-SIMD load/store over `prev1_`/`prev2_` state arrays + scalar tail. Bit-exact to scalar per-lane under `FLT_EVAL_METHOD == 0`; verified via new `test_blur` subtest in `test_ssimulacra2_simd.c` (6/6 on AVX-512 host, 6/6 under `qemu-aarch64-static`). Dispatch via new `blur_fn` function pointer in `Ssimu2State` assigned in `init_simd_dispatch()` (NULL = scalar fallback). Only `picture_to_linear_rgb` (2 calls / frame, `powf` EOTF) remains scalar — deferred to follow-up.	Accepted	simd, avx2, avx512, neon, ssimulacra2, bit-exact, iir-blur, performance
ADR-0163	SSIMULACRA 2 `picture_to_linear_rgb` SIMD (phase 3 of T3-1 — closes it at 7/7 kernels). YUV → linear RGB with BT.709/BT.601 matmul + sRGB EOTF, now on all 3 ISAs. Strategy: per-lane scalar pixel reads (handles all chroma ratios + 8/16-bit uniformly), SIMD matmul + normalise + clamp, per-lane scalar `powf` for the sRGB EOTF branch (mirrors phase-1 `cbrtf` pattern). Bit-exact to scalar under `FLT_EVAL_METHOD == 0`. New shared header `ssimulacra2_simd_common.h` with `simd_plane_t { data, stride, w, h }` decouples SIMD TUs from `VmafPicture`. Dispatch wrapper in `ssimulacra2.c` unpacks VmafPicture into the struct. Verified via 5 new `test_ptlr_*` subtests (420/420-10bit/444/444-10bit/422): 11/11 total pass on AVX-512 host + 11/11 under `qemu-aarch64-static`. Closes backlog T3-1 in full; SSIMULACRA 2 now has zero scalar hot paths.	Accepted	simd, avx2, avx512, neon, ssimulacra2, bit-exact, yuv-rgb, srgb-eotf
ADR-0164	SSIMULACRA 2 snapshot-JSON regression gate (closes backlog T3-3). New `python/test/ssimulacra2_test.py` invokes the `vmaf` CLI with `--feature ssimulacra2` on two checked-in YUV fixtures (`src01_hrc00/01_576x324` + `ref/dis_test_...q_160x90`) and `assertAlmostEqual`s the per-frame + pooled scores against values pinned on master HEAD. 12 asserts × 2 fixtures. Self-consistency gate only — not cross-checked against libjxl `tools/ssimulacra2` (requires libjxl+codec build chain in CI, scope creep) or Pacidus Python port (known scipy.gaussian_filter vs libjxl FastGaussian IIR drift). CPU path is bit-exact across AVX2/AVX-512/NEON/scalar per ADR-0161/0162/0163, so values are reproducible on any host.	Accepted	test, ssimulacra2, regression-gate, fork-local
ADR-0165	Tracked `docs/state.md` for in-tree bug-status hygiene + new CLAUDE.md §12 rule 13 mandating a same-PR update on every bug close / open / rule-out. Resolves the `STATE.md`-shaped half of Issue #20 that ADR-0028 (ADR-row-before-commit) intentionally left out: ADRs cover decisions, this file covers bug status. Four sections (Open / Recently closed / Confirmed not-affected / Deferred) with cross-links to ADRs + PRs + Netflix issues. PR template carries a checkbox; opt-out syntax `no state delta: REASON` for PRs without bug-status impact. Closes Issue #20 + backlog item T7-1.	Accepted	process, state-hygiene, claude-rule, fork-local
ADR-0166	MCP server (`vmaf-mcp` Python package) release artifact channel — both PyPI (Trusted Publishing / OIDC, no token) and GitHub release attachment with Sigstore keyless signing + PEP 740 attestations + SLSA L3 provenance + SBOM. Wired as new `mcp-build` / `mcp-sign` / `mcp-publish-pypi` jobs in the existing `supply-chain.yml` (one workflow surface for libvmaf + MCP release matrix coherence). Operational note: a one-time PyPI Trusted Publisher binding (project `vmaf-mcp`, owner `lusoris`, repo `vmaf`, workflow `supply-chain.yml`, environment `pypi-publish`) must be configured by the user before the first release after this PR lands. Closes backlog item T7-2.	Accepted	release, mcp, supply-chain, sigstore, pypi
ADR-0167	Path-mapped doc-drift enforcement — closes the gap surfaced by the 2026-04-25 docs audit (16 PRs landed in 2 days; 2 HIGH + 4 MEDIUM doc gaps slipped past the existing checks). Two layers: (1) new project hook `.claude/hooks/docs-drift-warn.sh` — informational stderr `NOTICE` when an Edit/Write touches a user-discoverable surface but no matching `docs/<topic>/` file is touched; copies the `auto-snapshot-warn.sh` pattern, no block. (2) `rule-enforcement.yml` `doc-substance-check` promoted from advisory (`continue-on-error: true`) to blocking + rewritten with a path-mapped surface→docs check. ADR additions under `docs/adr/` no longer satisfy the check (ADRs are decisions, not user-facing docs — CLAUDE.md §12 rule 10). Per-PR opt-out `no docs needed: REASON` for legitimate internal-refactor / bug-fix / test PRs. Path map covers libvmaf headers, feature extractors, SIMD/GPU twins, CLI tools, build flags, MCP server, tiny-AI CLI, ffmpeg patches.	Accepted	process, enforcement, claude-hook, ci, adr-0100
ADR-0168	Tiny-AI Wave 1 baselines C2 (`nr_metric_v1`) + C3 (`learned_filter_v1`) trained on KoNViD-1k (T6-1 partial; C1 deferred — Netflix Public Dataset is access-gated, not programmatically downloadable). C2: MobileNet-tiny 19.1K-param NR scoring model, 224×224 grayscale, 60 epochs early-stopped at 23, val/MSE 0.382. C3: 4-block residual CNN 18.9K-param, self-supervised on synthetic gaussian σ=1.2 + JPEG-Q35 degradation pairs, 100 epochs, val/L1 0.019. Four new scripts under `ai/scripts/` (`fetch_konvid_1k.py` / `extract_konvid_frames.py` / `train_konvid.py` / `export_tiny_models.py`); two new datamodule classes (`FrameMOSDataset`, `PairedFrameDataset`) under `vmaf_train.data.frame_dataset`. Schema + C-side `VmafModelKind` enum extended to allow `kind: "filter"` (registry trust-root for filter models consumed by ffmpeg `vmaf_pre`, NOT loaded by libvmaf scoring path). KoNViD-1k MOS values not redistributed — populated manifest stays gitignored, user re-runs `vmaf-train manifest-scan` on fresh clone. ORT op-allowlist + roundtrip atol 1e-4 both pass.	Accepted	tiny-ai, training, onnx, konvid-1k, c2, c3, fork-local
ADR-0169	ONNX op-allowlist admits `Loop` + `If` (T6-5); `Scan` stays rejected. Wire-format scanner in `onnx_scan.c` extended with mutually-recursive `scan_attribute` / `scan_node` helpers that descend into `NodeProto.attribute` (field 5) → `AttributeProto.g` / `.graphs` (fields 6 / 11) so forbidden ops cannot hide inside a `Loop.body` / `If.then_branch` / `If.else_branch` subgraph. Recursion is depth-bounded (`VMAF_DNN_MAX_SUBGRAPH_DEPTH = 8`). Python `vmaf_train.op_allowlist` mirrors the recursion via `_collect_op_types` so the export-time check and the runtime load-time check stay in lockstep. Bounded-iteration guard explicitly deferred to a follow-up ADR (data-flow analysis for `Loop.M → Constant ≤ MAX_LOOP_ITERATIONS` doubles scope; track as T6-5b). 4 existing tests flipped from "Loop/If rejected" to "Loop/If accepted" + 4 new subgraph-recursion tests added (2 C, 2 Python).	Accepted	tiny-ai, onnx, security, op-allowlist
ADR-0170	`vmaf_pre` ffmpeg filter extended to 10/12-bit LE pixel formats (`yuv{420,422,444}p1{0,2}le` + `gray{10,12}le`) and to optional chroma filtering via new `chroma=0` / `chroma=1` option (default 0 preserves luma-only back-compat). New libvmaf public API `vmaf_dnn_session_run_plane16(sess, in, in_stride_bytes, w, h, bpc, out, out_stride_bytes)` alongside existing `_luma8`; `bpc` in range 9..16 selects the normalisation divisor `(1 << bpc) - 1`. Two matching tensor helpers `vmaf_tensor_from/to_plane16` in `tensor_io.{h,c}`. Filter dispatches `_luma8` at bpc=8 and `_plane16` at bpc≥9 via a new `run_plane()` helper; `chroma=1` re-runs the same session on U/V at chroma-subsampled dimensions. 3 new tensor-io round-trip tests (10-bit identity, bpc bounds, 12-bit clamp). Closes BACKLOG T6-4.	Accepted	tiny-ai, ffmpeg, dnn, api, fork-local
ADR-0171	Bounded `Loop.M` trip-count guard (T6-5b; closes the follow-up deferred in ADR-0169). Two layers, mirroring the ADR-0167 doc-drift pattern. (1) Python export-time `_collect_loop_violations` walks the graph, traces every `Loop`'s first input back to a `Constant` int64 scalar in the local scope (recurses into subgraphs), rejects when the producer is a graph input, the wrong op_type, or the value is outside `[0, MAX_LOOP_TRIP_COUNT]` (default 1024, per-call overridable). `AllowlistReport` gains `loop_violations` field and a strengthened `pretty()`. (2) C wire-format scanner gains a counter threaded through `scan_graph` / `scan_node` / `scan_attribute` that increments on every `Loop` op_type at any depth; rejects with `-EPERM` + `first_bad="Loop"` when count exceeds `VMAF_DNN_MAX_LOOP_NODES = 16`. C cap is intentionally coarser than the Python data-flow check — reproducing the producer-map lookup in the wire scanner would violate ADR D39's "no libprotobuf-c" constraint. 5 new Python tests + 1 new C test.	Accepted	tiny-ai, onnx, security, op-allowlist
ADR-0172	New MCP tool `describe_worst_frames` (T6-6) — scores a `(ref, dis)` pair, picks the N worst-VMAF frames, extracts each as PNG via `ffmpeg select='eq(n,<idx>)'`, runs a vision-language model and returns frame metadata + descriptions. Lazy VLM loader cascades `HuggingFaceTB/SmolVLM-Instruct` → `vikhyatk/moondream2` → metadata-only fallback (when `transformers` is missing or no candidate model loads). New `vlm` optional dependency group in MCP `pyproject.toml` (transformers + torch + Pillow + accelerate). First concrete consumer of ADR-0171's bounded-Loop guard — autoregressive VLM token generation requires `Loop` nodes. PNGs stored under `/tmp/vmaf-mcp-worst-<pid>/` so the caller can fetch them post-call. 5 new tests + 1 extended; all 17 MCP tests pass.	Accepted	tiny-ai, mcp, vlm, fork-local
ADR-0173	Audit-first implementation of ADR-0129's PTQ int8 policy (T5-3). Three new optional fields in `registry.schema.json` (`quant_mode` enum `fp32` / `dynamic` / `static` / `qat`, `quant_calibration_set`, `quant_accuracy_budget_plcc` default 0.01). Three scripts under `ai/scripts/` (`ptq_dynamic.py` wraps ORT `quantize_dynamic`; `ptq_static.py` wraps `quantize_static` with a `.npz`-backed `CalibrationDataReader`; `qat_train.py` is a CLI scaffold that raises `NotImplementedError` until the per-model QAT PR lands the trainer hook). New `VmafModelQuantMode` enum in `model_loader.h` + sidecar parser branch (default FP32 fail-safe on unknown values). 4 Python smoke tests + 3 C sidecar tests. Audit-first: no shipped model flips its `quant_mode` from `fp32` in this PR; the runtime `.int8.onnx` redirect + the `ai-quant-accuracy` CI gate land with the first per-model quantisation PR (T5-3b). New `docs/ai/quantization.md` user reference.	Accepted	tiny-ai, onnx, quantization, registry, ci, fork-local
ADR-0174	First per-model PTQ: `learned_filter_v1` flips to `quant_mode: "dynamic"` (T5-3b; closes T5-3 fully). 80 KB → 33 KB (2.4× shrink). Drop measurement: PLCC 0.999883 vs fp32 on a 16-sample synthetic input set, drop 0.000117 vs budget 0.01 (≈85× margin). Runtime `.int8.onnx` redirect wired in `vmaf_dnn_session_open`: when the sidecar declares `quant_mode != FP32`, the loader strips trailing `.onnx`, appends `.int8.onnx`, re-validates, and passes that path to ORT. Fp32 file stays on disk as the regression baseline. New `int8_sha256` registry/sidecar field (required when quant_mode != fp32) extends the trust-root invariant. New `ai/scripts/measure_quant_drop.py` walks the registry and gates each non-fp32 model against its `quant_accuracy_budget_plcc`. New CI step in the `Tiny AI` job runs `--all`. C2 `nr_metric_v1` stays fp32 in this PR because its dynamic-batch export trips ORT's internal shape inference (tracked as T5-3c follow-up).	Accepted	tiny-ai, onnx, quantization, registry, ci, fork-local
ADR-0175	Vulkan compute backend — scaffold-only audit-first PR (T5-1; closes T5-1 audit half, runtime + VIF-pathfinder + lavapipe smoke land in follow-up PRs per ADR-0127). New public header `libvmaf_vulkan.h` declaring `VmafVulkanState` / `VmafVulkanConfiguration` / `vmaf_vulkan_state_init` / `_import_state` / `_state_free` / `vmaf_vulkan_list_devices` / `vmaf_vulkan_available`. New `core/src/vulkan/` (common + picture_vulkan) + `core/src/feature/vulkan/` (3 kernel stubs: adm/vif/motion). All entry points return `-ENOSYS`. New `enable_vulkan` feature option (default disabled) with conditional `subdir('vulkan')` in `core/src/meson.build`. New 4-sub-test smoke at `core/test/test_vulkan_smoke.c` pinning the stub contract. New CI matrix row `Build — Ubuntu Vulkan Scaffold (stub kernels)` compiling with `-Denable_vulkan=enabled`. New ffmpeg patch `0004-libvmaf-wire-vulkan-backend-selector.patch` mirroring the SYCL selector in 0003 (adds `vulkan_device` libvmaf filter option). New `docs/backends/vulkan/overview.md`. Zero runtime dependencies for the scaffold — no `dependency('vulkan')`, no volk, no glslc, no VMA; those land with the runtime PR.	Accepted	gpu, vulkan, scaffold, audit-first, fork-local
ADR-0176	Vulkan VIF cross-backend gate. Two CI lanes: `Vulkan VIF Cross-Backend (lavapipe, places=4)` runs on every PR using Mesa lavapipe on `ubuntu-24.04` (no GPU runner needed); `Vulkan VIF Cross-Backend (Arc A380, advisory)` runs nightly on the self-hosted Arc runner (parked behind `if: false` until label `vmaf-arc` registered). Both invoke `scripts/ci/cross_backend_vif_diff.py` comparing per-frame `integer_vif_scale0..3` at `places=4` against the CPU scalar reference on the Netflix normal pair. Empirical baseline at `acf9f5b8` is ULP=0 (GLSL kernel uses native int64 accumulators); the `places=4` slack is forward compatibility for future driver / kernel changes. Includes `VMAF_FEATURE_EXTRACTOR_VULKAN = 1 << 5` flag, public state-level API (`vmaf_vulkan_state_init` / `_state_free` / `_available` / `_list_devices`), CLI flags `--no_vulkan` and `--vulkan_device <N>`. Closes T5-1b-v.	Accepted	ci, vulkan, gpu, numerical-correctness, fork-local
ADR-0177	Vulkan motion kernel (T5-1c, motion half). Replaces the 37-line stub at `core/src/feature/vulkan/motion_vulkan.c` with a real `VmafFeatureExtractor` + GLSL compute shader (`shaders/motion.comp`). Implements separable 5-tap Gaussian blur (`{3571, 16004, 26386, 16004, 3571}`, sum=65536) + per-WG int64 SAD reduction; emits `integer_motion` and `integer_motion2` with the standard 1-frame lag. `motion3` (5-frame window mode) deliberately deferred — no shipped model uses it. Cross-backend gate generalized: `scripts/ci/cross_backend_vif_diff.py` gains `--feature {vif,motion}` selector; both lavapipe and Arc-nightly lanes run a second motion diff step. Empirical baseline: ULP=0 vs CPU on the Netflix normal pair (48 frames). Closes the motion half of T5-1c; ADM follow-up next PR.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0178	Vulkan ADM kernel (T5-1c, ADM half — closes T5-1c). Replaces the 37-line stub at `core/src/feature/vulkan/adm_vulkan.c` with a real `VmafFeatureExtractor` (~700 LOC) backed by a new GLSL compute shader (`shaders/adm.comp`, ~660 LOC) implementing 4-scale CDF 9/7 DWT + decouple+CSF + per-band reductions. 16 pipelines per ADM extractor (one per `(scale, stage)`). Provides the standard `integer_adm2`, `integer_adm_scale0..3` outputs. Cross-backend gate gains a third "ADM cross-backend diff" step in both lavapipe and Arc-nightly lanes. Empirical baseline: ULP=0 vs CPU on Netflix normal (48 frames at 576x324) + 1920x1080 checkerboard (3 frames); residual on scales 1-3 at full IEEE-754 precision is ~7e-7 from host-side double-summation order, well under places=4. Closes T5-1c — Vulkan kernel matrix now matches SYCL/CUDA for the default `vmaf_v0.6.1` model.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0179	`float_moment` SIMD parity (AVX2 + NEON), closing the only fully-scalar row in the SIMD-coverage matrix (T7-19). New `compute_{1st,2nd}_moment_{avx2,neon}` under `core/src/feature/{x86,arm64}/moment_*.{c,h}` follow the `ansnr_avx2.c` pattern: square in float, accumulate into `double` via scattered-tmp (AVX2) or lane-pair widening via `vcvt_f64_f32` (NEON). Dispatched from `float_moment.c::init` via function pointers selected from `vmaf_get_cpu_flags()`. Tolerance-bounded (1e-7 relative — ~500× tighter than the production snapshot gate's `places=4`), not bit-exact, matching the contract documented in the kernel TU headers. New `test_moment_simd` runs four cases per arch (two random seeds, an aligned width, and a tiny edge case to exercise the per-row tail). End-to-end CLI output unchanged at JSON `%g` precision.	Accepted	simd, avx2, neon, feature-extractor, fork-local
ADR-0180	CPU coverage matrix audit (post-T7-19 sweep): closes 5 stale gaps in one pass. T7-22 (`ms_ssim` per-scale SIMD) was already done via ADR-0138 / 0139 / 0140 — verified via 3.2× wall-clock speedup vs `--cpumask 0xfffffffe`. CAMBI scalar fallback already exists at `cambi.c:446-460` — earlier "no pure-C scalar path" note was wrong. motion_v2 NEON has been shipped since 2026-04 at `arm64/motion_v2_neon.c` — earlier "x86 SIMD but no NEON" note was wrong. integer `ansnr` is a phantom row — no `integer_ansnr` extractor is registered. T7-21 (`psnr_hvs` AVX-512) closes as AVX2 ceiling with empirical evidence — 1.17× wall-clock speedup of AVX2 vs scalar means the 8-wide DCT is bandwidth-amortised; AVX-512 widening would force a 2-block host batch with no measurable payoff. Same verdict applies to deferred float_moment AVX-512. Matrix + BACKLOG updated to reflect ground truth.	Accepted	simd, audit, fork-local, doc-correction
ADR-0181	T7-26 — Global feature-characteristics registry + per-backend dispatch-strategy modules (CUDA / SYCL / Vulkan). Replaces the existing per-context SYCL `GRAPH_AREA_THRESHOLD` heuristic with a per-feature registry: each `VmafFeatureExtractor` carries a `VmafFeatureCharacteristics` descriptor (`n_dispatches_per_frame`, `is_reduction_only`, `min_useful_frame_area`, `dispatch_hint`). Per-backend `dispatch_strategy.{c,h}` modules consume the descriptor + frame dims + env overrides and return the backend's primitive (SYCL `DIRECT` vs `GRAPH_REPLAY`, CUDA `DIRECT` vs `GRAPH_CAPTURE` stub, Vulkan `PRIMARY_CMDBUF` vs `SECONDARY_CMDBUF_REUSE` stub). New env override surface: `VMAF_<BACKEND>_DISPATCH=feature:strategy,...` (per-feature; supersedes legacy `VMAF_SYCL_USE_GRAPH` / `VMAF_SYCL_NO_GRAPH` aliases). Descriptors seeded for vif (4 dispatches, 720p area), motion (2 dispatches, 1080p area), adm (16 dispatches, 720p area). Empirical: ADM at 576×324 within 0.5% of pre-T7-26 behaviour (registry preserves byte-for-byte AUTO + 720p semantics). Foundation for adding 14 GPU long-tail kernels without 42 duplicate dispatch sites. Same PR also fixes a pre-existing GCC LTO type-mismatch surfaced by the new `chars` field — `null.c`, `feature_lpips.c`, `ssimulacra2.c` were missing `#include "config.h"` and saw a smaller `VmafFeatureExtractor` struct than `feature_extractor.c`.	Accepted	gpu, cuda, sycl, vulkan, architecture, fork-local
ADR-0182	GPU long-tail batch 1 — psnr + ciede + moment on CUDA / SYCL / Vulkan (T7-23 + follow-ups). 14 of ~16 registered metrics are missing GPU coverage today; this ADR scopes the bundle that closes the 3 simplest GPU-friendly metrics across all 3 backends. Per-backend ordering: psnr Vulkan → psnr CUDA → psnr SYCL → ciede {3 backends} → moment {3 backends}. Each backend group lands as a separate commit on the feature branch so partial revert is cheap. Batch 1a (this PR) ships psnr Vulkan only — luma-only `psnr_y`, 89-LOC GLSL shader + 391-LOC host C, single dispatch/frame, subgroup-int64 reduction. Verified bit-exact (max_abs_diff = 0.0) vs CPU scalar on Intel Arc A380 / Mesa anv across 48 frames. Cross-backend gate gains a "PSNR cross-backend diff" step on the lavapipe lane. CUDA / SYCL twins + ciede / moment land in subsequent batches 1b–1d on top of the same ADR.	Accepted	gpu, cuda, sycl, vulkan, feature-extractor, fork-local
ADR-0183	T7-28 — `libvmaf_sycl` FFmpeg filter, zero-copy QSV/VAAPI import. Closes the user-doc gap exposed by PR #126: hwdec via `-hwaccel qsv` was forced through a `hwdownload,format=yuv420p` round-trip because the regular `libvmaf` filter only takes software frames. New `ffmpeg-patches/0005-libvmaf-add-libvmaf-sycl-filter.patch` adds a dedicated `libvmaf_sycl` filter that consumes oneVPL `mfxFrameSurface1` frames, extracts the VA surface ID, and routes through `vmaf_sycl_import_va_surface` for zero-copy DMA-BUF import on the Level Zero / SYCL compute queue. Pairs with existing 0003-* (sycl_device option on regular libvmaf filter) so users have both paths: `libvmaf=sycl_device=N` for software frames + SYCL compute, `libvmaf_sycl=…` for QSV hwdec + zero-copy SYCL. T7-29 (Vulkan VkImage import) remains open — needs new C-API surface in libvmaf_vulkan.h.	Accepted	sycl, ffmpeg, fork-local, zero-copy
ADR-0184	T7-29 part 1 — Vulkan VkImage zero-copy import C-API scaffold. Symmetric to T7-28 (SYCL VAAPI/dmabuf, ADR-0183) but for Vulkan: when FFmpeg decodes via `-hwaccel vulkan -hwaccel_output_format vulkan`, the regular `libvmaf` filter forces a `hwdownload,format=yuv420p` round-trip because the C-API surface for VkImage import doesn't exist. This ADR adds three new entry points in `libvmaf_vulkan.h` — `vmaf_vulkan_import_image` (takes external VkImage + VkSemaphore), `vmaf_vulkan_wait_compute`, `vmaf_vulkan_read_imported_pictures` — mirroring the SYCL backend's import surface. Header purity: handles cross the ABI as `uintptr_t` (matches the libvmaf_cuda.h precedent). Scaffold only: every function returns `-ENOSYS`, mirroring how the original Vulkan backend shipped via ADR-0175. T7-29 part 2 (real implementation: `vkCmdCopyImageToBuffer` + timeline-semaphore wait) and part 3 (`ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch`) follow in subsequent PRs.	Accepted	vulkan, ffmpeg, fork-local, zero-copy, scaffold
ADR-0185	T7-31: hide volk / vk* symbols from libvmaf.so's public ABI. When libvmaf is built with `-Denable_vulkan=enabled`, the bundled volk Vulkan loader leaked ~30 `volk`-prefixed symbols plus the full `vk` API into the .so's exported symbols. Static FFmpeg builds (BtbN-style cross-toolchain releases, lawrence's glibc-2.28 environment, etc.) that link both libvmaf and libvulkan.a get GNU-ld multiple-definition errors at the final link step. Fix: pass `-Wl,--exclude-libs,ALL` on the libvmaf.so link command in `core/src/meson.build`, gated off on Darwin / Windows where the flag isn't supported (those linkers don't auto-export static-archive symbols anyway). Verified via `nm -D libvmaf.so` (zero `vk` / `volk` symbols post-fix); smoke + end-to-end psnr_vulkan on Arc A380 unchanged.	Accepted	vulkan, build, fork-local, abi
ADR-0186	T7-29 parts 2 + 3 — Vulkan VkImage zero-copy import implementation + `libvmaf_vulkan` FFmpeg filter. Drops the `-ENOSYS` stubs from ADR-0184: per-state ref/dis staging `VkBuffer` pair (HOST_VISIBLE, DATA_ALIGN-strided), `vkCmdCopyImageToBuffer` + timeline-semaphore wait per frame, `vmaf_vulkan_state_build_pictures` builder so `read_imported_pictures` routes through standard `vmaf_read_pictures` with no-op release callbacks. Adds `vmaf_vulkan_state_init_external` so the FFmpeg filter runs libvmaf compute on the decoder's VkDevice (source VkImage handles are device-bound). New `ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch` packages the `libvmaf_vulkan` filter consuming `AV_PIX_FMT_VULKAN` frames. Synchronous v1 design (fence-wait inside `import_image`); async pending-fence v2 + true zero-copy GPU compute deferred. Also introduces fork rule §12 r14: PRs that change libvmaf surfaces probed by `ffmpeg-patches/` must update the relevant patch in the same PR. Smoke 10/10; float_moment cross-backend gate clean (0/48 × 4 metrics on Arc A380).	Accepted	vulkan, ffmpeg, fork-local, zero-copy
ADR-0187	T7-23 / batch 1c part 1 — `ciede_vulkan` extractor. First non-bit-exact GPU twin in the fork: per-pixel ciede2000 ΔE uses ~40 transcendentals (pow / sqrt / sin / atan2) so bit-exactness against the libm-based CPU is not on the table. Single-dispatch GLSL shader emits per-WG `float` partials; host accumulates in `double`, divides by W·H, applies the CPU's logarithmic transform `45 - 20·log10(mean_ΔE)` for the `ciede2000` metric. 6 storage-buffer bindings (ref + dis Y/U/V at full luma resolution; chroma upscaled host-side via the `ciede.c::scale_chroma_planes` pattern). Empirical: Intel Arc A380 + Mesa anv → `max_abs = 1.0e-5` across 48 frames at 576×324, well under `places=4` threshold (≤5e-5) — gate runs at places=4 for parity with the bit-exact kernels. New CI step on the lavapipe lane. CUDA + SYCL twins follow as batch 1c parts 2 + 3.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0188	T7-23 / batch 2 scope — `psnr_hvs` / `ssim` / `ms_ssim` across CUDA + SYCL + Vulkan. Picks up where batch 1 closed (PR #137). Per-metric ordering: `ssim` first (smallest, scaffolds the separable Gaussian filter ms_ssim reuses), then `ms_ssim` (5-level pyramid built on the ssim kernel), then `psnr_hvs` (largest — DCT-based). Per-backend ordering inside each metric: Vulkan → CUDA → SYCL (same as batch 1). Precision targets: `places=4` for ssim/ms_ssim with measure-then-set-the-contract approach (relax to `places=3` if needed), `places=2` for psnr_hvs. ssim/ms_ssim luma-only; psnr_hvs needs all three planes (CUDA gets free chroma upload via PR #137's bitmask fix). 9 PRs total to close batch 2 (3 metrics × 3 backends), each ~500-1000 LOC + per-metric ADRs.	Accepted	gpu, cuda, sycl, vulkan, feature-extractor, fork-local
ADR-0189	T7-23 / batch 2 part 1a — `float_ssim_vulkan` extractor. Vulkan twin of the active CPU `float_ssim`. Two-dispatch design: horizontal 11-tap Gaussian over ref / cmp / ref² / cmp² / ref·cmp into 5 intermediate float buffers, then vertical 11-tap + per-pixel SSIM combine + per-WG float partials. Host accumulates partials in `double`, divides by `(W-10)·(H-10)` (matches CPU's iqa_ssim valid-region averaging). 11-tap kernel weights baked into GLSL byte-for-byte from `g_gaussian_window_h` in `iqa/ssim_tools.h`. picture_copy host-side normalises uint sample → float [0, 255] before upload (matches float_ssim.c). v1 limitation: `scale=1` only — auto-detect rejects `scale > 1` with `-EINVAL`; production 1080p needs `--feature float_ssim_vulkan:scale=1` pinned (or smaller input). Cross-backend gate fixture (576×324) auto-resolves to scale=1. Empirical: Intel Arc A380 + Mesa anv → `max_abs = 1.0e-6` across 48 frames at 576×324, well under `places=4` threshold. CUDA + SYCL twins follow as batch 2 parts 1b + 1c.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0190	T7-23 / batch 2 part 2a — `float_ms_ssim_vulkan` extractor. Wang multi-scale SSIM on Vulkan: 5-level pyramid built via 9-tap 9/7 biorthogonal LPF + 2× downsample (`ms_ssim_decimate.comp`), then per-scale SSIM compute that emits three per-WG partials (`l, c, s`) instead of a single combined SSIM (`ms_ssim.comp` — variant of `ssim.comp`). Host accumulates partials in `double` per scale, applies the Wang weights `α/β/γ` (matches `ms_ssim.c::g_alphas/g_betas/g_gammas` byte-for-byte) for the product combine. Surfaced one bug during bring-up: the σx²/σy² clamp-to-zero in `ssim_variance_scalar` (line 165) is required to avoid `sqrt(negative)` → NaN at scale 0 when float ULP errors push variances slightly negative on flat regions. Min-dim guard mirrors ADR-0153 (176×176). v1 doesn't implement `enable_lcs` (15 extra metrics). Empirical: Intel Arc A380 + Mesa anv → `max_abs = 2.0e-6` across 48 frames at 576×324, well under `places=4`. CUDA + SYCL twins follow as batch 2 parts 2b + 2c.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0191	T7-23 / batch 2 part 3a — `float_psnr_hvs_vulkan` extractor. First DCT-based GPU kernel in the fork. One GLSL shader (`psnr_hvs.comp`), one dispatch per plane (3 per frame). Per-WG = one 8×8 block: cooperatively load samples + per-quadrant means/vars, run the Xiph integer `od_bin_fdct8x8` scalar (lifting + RSHIFT — `int32` arithmetic matches CPU bit-for-bit), per-thread mask + masked-error contribution, `subgroupAdd` to a per-block float partial. Host accumulates partials in `double`, applies `score / pixels / samplemax²` then `10·log10(1/score)` per plane. Combined `psnr_hvs = 0.8·Y + 0.1·(Cb + Cr)`. Step=7 overlap matches CPU loop. 6 pipelines (3 planes × 2 bpc paths) via specialisation constants. CSF/mask tables baked per plane. `places=2` precision target per ADR-0188 (DCT integer-exact, but per-block float reductions in `s_mask` / `s_gvar` and the per-plane log10 limit the floor). CUDA + SYCL twins follow as batch 2 parts 3b + 3c.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0192	T7-23 / batch 3 scope — close every remaining GPU-coverage gap: `integer_motion_v2` + `float_ansnr` + `ssimulacra2` + `cambi` (Group A: no GPU twin yet) plus `float_psnr` / `float_motion` / `float_vif` / `float_adm` (Group B: float twins of int kernels already on GPU). Per-metric ordering by ascending complexity: motion_v2 (300 LOC, reuses integer_motion convolve) → float_ansnr (124 LOC) → float twins (4 metrics, smallest first) → ssimulacra2 (XYB + IIR + per-stage SSIM) → cambi (sequential range-tracking, hardest port — feasibility spike required). Per-backend Vulkan → CUDA → SYCL (matches batches 1 + 2). Precision contracts: `places=4` for motion_v2; `places=3` for float twins + float_ansnr; `places=2` for ssimulacra2 + cambi (measure-first). 21+ PRs to close (7 metrics × 3 backends, ssimulacra2 + cambi may sub-split). After this batch, every registered feature extractor has at least one GPU twin (lpips remains ORT-delegated per ADR-0022). Float twins kept native (not aliased to int kernels — different input domains).	Accepted	gpu, cuda, sycl, vulkan, feature-extractor, fork-local
ADR-0193	T7-23 / batch 3 part 1a — `integer_motion_v2_vulkan` extractor. Single-dispatch GLSL kernel that exploits convolution linearity (the SAD of blurred prev/cur frames equals the sum of absolute values of the blurred per-pixel diff), so each frame computes its score in one V→H separable convolve over `(prev_ref - cur_ref)`, accumulating absolute values into per-WG `int64` partials with no blurred-state buffer (vs `motion_vulkan`'s ping-pong). Raw-pixel ping-pong (`ref_buf[2]`) halves upload bandwidth vs framework `prev_ref`. Corrected by ADR-0662: current CPU `integer_motion_v2.c::mirror` uses reflect-101 (`2*size - idx - 2`), and CUDA / SYCL / Vulkan twins must match that literal; the original `-1` prose here was stale.	Accepted	vulkan, gpu, feature-extractor, fork-local, bit-exact
ADR-0194	T7-23 / batch 3 part 2 — `float_ansnr_{vulkan,cuda,sycl}` extractors. Single-dispatch GPU kernel applies the CPU's 3x3 ref filter and 5x5 dis filter inline from a 20×20 shared/SLM tile, then accumulates per-pixel `sig = ref_filtr²` and `noise = (ref_filtr - filtd)²` into per-WG float partials. Host reduces in `double` and applies the CPU formulas `float_ansnr = 10log10(sig/noise)` and `float_anpsnr = MIN(10log10(peak²·w·h/max(noise, 1e-10)), psnr_max)`. Edge-replicating mirror (`2*size - idx - 1`) matches CPU `ansnr_filter2d_s` — same divergence-from-motion footgun as ADR-0193. Empirical floor on cross-backend gate fixture: `max_abs_diff = 6e-6` (8-bit, 48 frames) and `2e-6` (10-bit, 3 frames) on all three backends — identical numbers across Vulkan / CUDA / SYCL. Closes ANSNR's "CPU-only float, no GPU twin" matrix gap.	Accepted	vulkan, cuda, sycl, gpu, feature-extractor, fork-local, places-4
ADR-0195	T7-23 / batch 3 part 3 — `float_psnr_{vulkan,cuda,sycl}` extractors. Smallest GPU twin in the long-tail: ~120 LOC GLSL / ~110 LOC PTX / ~150 LOC SYCL. Single-dispatch, no halo, no shared tile — every pixel is independent. Per-thread `(ref - dis)²` (float), sub-group reduce + SLM cross-subgroup reduce → per-WG float partials, host accumulates in `double` and applies `MIN(10·log10(peak² / max(noise/(w·h), 1e-10)), psnr_max)`. Empirically bit-exact vs CPU on all three backends, both 8-bit (48 frames) and 10-bit (3 frames) — `max_abs_diff = 0.0e+00` everywhere. Float-domain kernel structurally too simple to drift; host-side `double` reduction absorbs any per-WG ULP noise. Drive-by docs fix: features.md row claimed `float_psnr_y / _cb / _cr` plane outputs (wrong — the CPU extractor only emits `float_psnr`); corrected in this PR. First of four Group B float twins from ADR-0192.	Accepted	vulkan, cuda, sycl, gpu, feature-extractor, fork-local, bit-exact
ADR-0196	T7-23 / batch 3 part 4 — `float_motion_{vulkan,cuda,sycl}` extractors. Float twin of `integer_motion`'s GPU kernels: same V→H 5-tap separable Gaussian blur (FILTER_5_s float weights summing to ~1.0), same 2-buffer ping-pong of blurred refs, same per-WG float SAD partials + host `double` reduction. `motion = sad / (w·h)`, `motion2 = min(prev, cur)` emitted at index-1 (delayed-by-one). Mirror padding: skip-boundary `2*(sup-1) - idx` matches CPU `convolution_internal.h::convolution_edge_s` (NOT motion_v2's edge-replicating). Empirical max_abs_diff = 3e-6 (8-bit, 48 frames) / 1e-6 (10-bit, 3 frames) on all three backends with identical numbers — strong correctness signal. Lavapipe + Arc A380 + RTX 4090. Second of four Group B float twins from ADR-0192.	Accepted	vulkan, cuda, sycl, gpu, feature-extractor, fork-local, places-4
ADR-0197	T7-23 / batch 3 part 5 — `float_vif_{vulkan,cuda,sycl}` extractors. Third Group B float twin. 4-scale Gaussian pyramid (filters 17/9/5/3 at default `vif_kernelscale=1.0`) + per-pixel `vif_stat_one_pixel`. 7 dispatches/frame: 4 compute + 3 decimate. CPU's `VIF_OPT_HANDLE_BORDERS` branch — per-scale dims `prev/2`, decimation samples `(2gx, 2gy)` with mirror padding. Mirror-asymmetry fix: CPU has two H-mirror formulas that differ by 1 — `vif_mirror_tap_h` (`-1`, scalar fallback only) vs `convolution_edge_s` (`-2`, AVX2 production border path). The GPU follows the AVX2 form (production); using scalar's form drifts `5.46e-4` at scale 1, using AVX2's brings it to `1.4e-5`. places=4 across all 4 scales, identical max_abs_diff across all three backends (`1e-6 / 1.4e-5 / 1.8e-5 / 3.7e-5` at 8-bit; tighter at 10-bit). v1 restricts kernelscale=1.0 only.	Accepted	vulkan, cuda, sycl, gpu, feature-extractor, fork-local, places-4
ADR-0198	Follow-up to ADR-0185 — `-Wl,--exclude-libs,ALL` only takes effect at the `gcc -shared` step that produces `libvmaf.so`; static-archive builds (`default_library=static -Denable_vulkan=enabled`) still bundle volk's full `vk` API as STB_GLOBAL symbols inside `libvmaf.a`, which collides with the Khronos `libvulkan.a` in BtbN-style fully-static FFmpeg link environments (lawrence's repro 2026-04-27, ~700 multi-def errors). Fix: rename volk's `vk` symbols to `vmaf_priv_vk` at the C preprocessor level via a force-included header generated from `volk.h`. The packagefile parses every `extern PFN_vkXxx vkXxx;` declaration, emits `#define vkXxx vmaf_priv_vkXxx` (784 entries for volk-1.4.341), and `-include`s the result on `volk.c` and every libvmaf TU pulling in `volk_dep`. Identical fix for shared and static; no per-build-mode meson branches. Verified: shared `nm -D libvmaf.so` reports 0 leaked `vk` (unchanged from ADR-0185); static `nm libvmaf.a` reports 0 GLOBAL `vk` (was ~700) and 719 GLOBAL `vmaf_priv_vk`; BtbN-style `gcc -static main.c libvmaf.a libvulkan-stub.a` link succeeds; `test_vulkan_smoke` 10/10 pass on the renamed build (volk runtime dispatch still functional).	Accepted	vulkan, build, fork-local, abi
ADR-0199	T7-23 / batch 3 part 6 — `float_adm_vulkan` extractor, fourth and final Group B float twin. Float twin of integer_adm_vulkan (ADR-0178): same 4-stage / 4-scale wave-of-stages design (16 pipelines), float buffers throughout, host-side `double` accumulation across WGs. Stage 0 = DWT vertical (ref+dis fused, dim_z=2), stage 1 = DWT horizontal (4 bands), stage 2 = decouple + CSF (writes csf_a + csf_f), stage 3 = CSF denominator + Contrast Measure fused (1D dispatch over 3 bands × num_active_rows; per-WG 6-slot float partials). Mirror-asymmetry status: float_adm has NO trap analogous to ADR-0197 — both the scalar `adm_dwt2_s` and the AVX2 `float_adm_dwt2_avx2` consume the same `dwt2_src_indices_filt_s` index buffer (`2 * sup - idx - 1` for both axes); the GPU follows that. Picture-copy semantics applied in-shader (`(u8 + offset)` for 8-bit, `(u16 / scaler) + offset` for HBD; offset = -128). `places=4` cross-backend contract on the lavapipe lane. CUDA + SYCL twins land as a focused follow-up.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0200	Bug-fix follow-up to ADR-0198. The `-include volk_priv_remap.h` flag was attached to `volk_dep.compile_args`; on `default_library=static` builds meson copies dependency `compile_args` into the generated `libvmaf.pc` `Cflags:` so consumers can re-link against transitive deps. lawrence's BtbN-style fully-static FFmpeg build (cross-toolchain glibc-2.28, 2026-04-27 22:19) hit `<command-line>: fatal error: /<libvmaf-build-dir>/subprojects/volk-vulkan-sdk-1.4.341.0/volk_priv_remap.h: No such file or directory` on the `check_func_headers aom/aom_codec.h` probe — the build-dir path no longer existed after libvmaf was installed to `/opt/ffbuild/`. Fix: move the `-include` off `volk_dep.compile_args` and onto libvmaf's private `c_args` via `vmaf_cflags_common += ['-include', volk_priv_remap_h_path]` in `core/src/vulkan/meson.build`, where the path is pulled from `subproject('volk').get_variable('volk_priv_remap_h_path')`. `c_args:` are private to the target and don't leak into pkg-config; symbol-rename behaviour is byte-for-byte identical. Verified post-fix `Cflags: -I${includedir} -I${includedir}/libvmaf -DVK_NO_PROTOTYPES -pthread` — no leaked path. `nm libvmaf.a` still reports 0 GLOBAL `vk` and 719 `vmaf_priv_vk`.	Accepted	vulkan, build, fork-local, abi
ADR-0201	T7-23 / batch 3 part 7 — `ssimulacra2_vulkan` extractor. 4-shader Vulkan kernel: `ssimulacra2_xyb.comp` (linear-RGB → XYB with deterministic in-shader cube root, port of `vmaf_ss2_cbrtf`), `ssimulacra2_mul.comp` (3-plane elementwise product for ref²/dis²/ref·dis pre-blur), `ssimulacra2_blur.comp` (separable Charalampidis 2016 3-pole IIR with sigma=1.5 — one workgroup per row for H pass, one per column for V pass; per-channel offsets via push consts to avoid the descriptor-update-between-record trap), and `ssimulacra2_ssim.comp` (per-pixel SSIMMap + EdgeDiffMap stats with 18-slot per-WG shared-memory halving reduction). Host: YUV→linear-RGB scalar libjxl port + 2×2 box downsample between scales (with full-resolution plane stride preserved across the pyramid so GPU channel offsets stay constant). All 4 shaders compile with `-O0` to disable SPIR-V FMA contraction. Empirical (Netflix normal pair, 48 frames): per-scale SSIM + edge-diff stats agree CPU-vs-Vulkan to 4–5 decimal places; pooled `ssimulacra2` score `max_abs_diff = 1.59e-2` (mean 5.30e-3). Cross-backend gate runs at `places=1` — the parent ADR-0192 flagged its nominal `places=2` as "may surprise upward; measure first", and the multi-stage XYB+IIR+SSIM-combine+log float pipeline lands at the `places=1` floor. Min-dim guard rejects below 8×8. CUDA + SYCL twins follow as batch 3 parts 7b + 7c.	Accepted	vulkan, gpu, feature-extractor, fork-local
ADR-0202	T7-23 / batch 3 parts 6b + 6c — `float_adm_cuda` + `float_adm_sycl` extractors, the CUDA and SYCL twins of ADR-0199. Direct ports of the Vulkan kernel: same 4-stage / 4-scale wave-of-stages design, same `-1` mirror form, same fused stage 3 with cross-band CM threshold, same per-scale `[csf_h/v/d, cm_h/v/d]` 6-slot WG partials reduced on the host in `double`. CUDA: single `.cu` file with four `__global__` entry points (matches `float_vif_cuda`), submit/collect async-stream pattern. SYCL: single `.cpp` with `launch_` templates over `SCALE`. Two precision-critical fixes from bring-up: (1) `--fmad=false` on the `float_adm_score` fatbin via a new per-kernel `cuda_cu_extra_flags` dict in `meson.build` — NVCC's default FMA contraction in the angle-flag dot product cascades through the cube reductions and pushes scale-3/adm2 past `places=4` (3.6e-4 max_abs vs CPU before fix). Scoped to this one kernel; integer ADM keeps its existing FMA-on path. (2) Parent-LL dimension trap — stage 0 at `scale > 0` must clamp/mirror against the parent's LL output dims (`scale_w/h[scale]`), NOT the parent's full-resolution dims (`scale_w/h[scale - 1]`); first cut clamped against the wrong bounds and let parent reads wander into uninitialised memory. Both fixes documented inline at the call sites. Verified `max_abs_diff ≤ 6e-6` across all 5 outputs (adm2, adm_scale0..3) on the Netflix normal pair; checkerboard 1px is bit-exact.	Accepted	cuda, sycl, gpu, feature-extractor, fork-local, places-4
ADR-0203	Implementation follow-up to ADR-0242. Adds runnable Netflix-corpus loader + feature extractor + `vmaf_v0.6.1` distillation + PyTorch dataset + PLCC/SROCC/KROCC/RMSE eval harness + Lightning-style training entry point under `ai/data/` and `ai/train/`. Three architectures registered: `linear` (7 params), `mlp_small` (257 params, default), `mlp_medium` (2 561 params). Default validation split holds out the `Tennis_24fps` source (1-source-out, content-disjoint). Per-clip JSON cache at `$VMAF_TINY_AI_CACHE` with atomic write-rename. `--epochs 0 --assume-dims 16x16` smoke command works without the real corpus or a built `vmaf` binary so CI can verify the harness end-to-end. Does NOT run training — the actual training is a manual user invocation deferred to the next PR. New `docs/ai/training.md` Netflix-corpus section + 25 unit tests under `ai/tests/`.	Accepted	ai, training, fork-local, onnx, docs
ADR-0205	cambi GPU feasibility spike (mandated by ADR-0192 §Consequences). Verdict: feasible as a hybrid host/GPU pipeline, mirroring ADR-0201's precedent. GPU shaders cover preprocessing + per-row derivative kernel + 7×7 spatial-mask summed-area table + 2× decimate + 3-tap separable mode filter. The precision-sensitive `calculate_c_values` sliding-histogram pass + top-K spatial pooling stay on the host; the GPU phases are integer + bit-exact w.r.t. CPU so the `places=2` precision contract holds trivially. Three classical re-formulations evaluated: (I) single-WG direct port — rejected, ~1/64 GPU utilisation; (II) parallel-scan reformulation — rejected for v1, materialises 17 GiB intermediate at 4K; (III) direct per-pixel histogram — deferred to v2 as profile-driven perf polish, ~9× CPU bandwidth. Companion research digest: `docs/research/0020-cambi-gpu-strategies.md` (literature: Blelloch 1990, Sengupta 2007, Merrill & Grimshaw 2016). v1 LOC estimate: ~1230 (host glue ~700 + 6 shaders ~400 + wiring ~130). This PR ships the architecture sketch + reference shader scaffolds + dormant `cambi_vulkan.c` host skeleton (not yet build-wired, matching ssimulacra2 precedent); a follow-up PR wires + integrates + validates.	Accepted	gpu, vulkan, cambi, feasibility-spike, fork-local
ADR-0206	T7-23 / batch 3 part 7b + 7c — `ssimulacra2_cuda` and `ssimulacra2_sycl` extractors, mechanical ports of the ADR-0201 Vulkan hybrid host/GPU pipeline. Host: YUV→linear-RGB + 2×2 box downsample + linear-RGB→XYB + per-pixel SSIM/EdgeDiff combine in double precision (verbatim ports of `ssimulacra2.c` scalar paths — identical to Vulkan twin). GPU: 3-plane elementwise multiply (`ssimulacra2_mul3`) + separable Charalampidis 2016 3-pole IIR blur (`ssimulacra2_blur_h` / `ssimulacra2_blur_v`, one work-item per row/column). CUDA fatbin pinned with `-Xcompiler=-ffp-contract=off --fmad=false` via per-kernel `cuda_cu_extra_flags` map — without it the IIR's `n2sum - d1prev1 - prev2` fuses to FMAs and per-step rounding compounds across the 6-scale pyramid past `places=4`. SYCL relies on the existing `-fp-model=precise` for the same effect. CUDA fex is `.extract`-pattern (synchronous host loop); D2H-copies raw YUV from `picture_cuda`'s device-side `VmafPicture.data[]` into pinned host scratch before host YUV→RGB. Empirical (CUDA, RTX 4070): Netflix normal pair `max_abs_diff = 1.0e-6` (places=4 with 5-decade margin); both checkerboard pairs bit-exact (0.0). SYCL precision verified through CI lavapipe-equivalent gate. Closes batch 3 part 7 across all three GPU backends.	Accepted	cuda, sycl, gpu, feature-extractor, fork-local
ADR-0207	T5-4 — Quantization-Aware Training (QAT) design for the fork's tiny-AI surface. The 2026-04-28 backlog audit (§A.2.1) flagged QAT as untracked; per the Section-A audit decisions the user direction is implement, do not close. ADR locks the pass design before code lands: PyTorch `torch.ao.quantization` modern API (`prepare_qat_fx` / `convert_fx`); pretrain fp32 → insert FakeQuant (default symmetric per-tensor activations + per-channel weight observers, matching the PTQ static recipe in Research-0006 §2 so QAT-vs-static delta is attributable to training, not qconfig drift) → fine-tune at 10× reduced LR → `convert_fx` + `torch.onnx.export(..., opset_version=17)` to a QDQ `.int8.onnx`; pass through the existing `measure_quant_drop.py` audit harness with `quant_mode="qat"` (extending the `dynamic`/`static` enum) and a 0.002 default `quant_accuracy_budget_plcc`. Trainer hook lands in new `ai/train/qat.py`; `ai/scripts/qat_train.py` becomes a real implementation (was `NotImplementedError` scaffold per ADR-0173's audit-first sequence). Pairs with T5-3e so QAT models round-trip through the same EP set as PTQ models (CPU + CUDA + Arc OpenVINO/Level Zero). Alternatives considered: legacy eager-mode `torch.quantization.prepare_qat` (deprecated; rejected); ORT-internal `Olive` (inverts the PyTorch-first training flow; rejected); skip QAT and tighten the static-PTQ budget (rejected — direct contradiction of user direction §A.2.1).	Proposed	ai, quantization, dnn, tiny-ai, fork-local
ADR-0208	T5-4 — first per-model QAT implementation, plus the trainer hook + CLI driver landing the ADR-0207 design. New `ai/train/qat.py` (Lightning-compatible `run_qat` + `QatConfig`); `ai/scripts/qat_train.py` rewritten from `NotImplementedError` scaffold to real CLI driver; new `ai/configs/learned_filter_v1_qat.yaml`. Two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) bridges PyTorch 2.11's two ONNX exporter bugs — neither legacy TorchScript (`quantized::conv2d` ops not standard ONNX) nor TorchDynamo (`Conv2dPackedParamsBase.__obj_flatten__` AttributeError) can export `convert_fx` output to QDQ ONNX today. Bridge: copy QAT-conditioned weights into fresh fp32 module, export legacy ONNX, run `quantize_static` with calibration drawn from QAT distribution. Empirical on synthetic-corpus `learned_filter_v1` (256 train pairs, 20 fp32 + 10 QAT epochs, 32-sample held-out): within-pipeline QAT-fp32 vs QAT-int8 PLCC drop 0.000081 (budget 0.002, PASS by 25×); cross-pipeline fp32-baseline vs QAT-int8 PLCC drop 0.001228 (PASS); static-PTQ comparison drop 0.000066. On this tiny model static-PTQ is already well within budget, so `learned_filter_v1` stays on `quant_mode: "dynamic"`; QAT pipeline validated and ready for the next model where static misses budget. CI smoke test under `ai/tests/test_qat_smoke.py`.	Proposed	ai, quantization, dnn, tiny-ai, qat, fork-local
ADR-0209	T5-2 — embedded MCP server scaffold (audit-first), implementing the ADR-0128 governance + Research-0005 design. New public header `libvmaf_mcp.h` declaring `VmafMcpServer`, `VmafMcpConfig` + per-transport configs, `vmaf_mcp_init` / `_start_sse` / `_start_uds` / `_start_stdio` / `_stop` / `_close` / `_available` / `_transport_available`. New stub TU at `core/src/mcp/mcp.c` — every entry point validates its arguments (NULL → `-EINVAL`) then returns `-ENOSYS` until the T5-2b runtime PR wires cJSON + mongoose + the dedicated MCP pthread + the SPSC ring buffer. New umbrella `enable_mcp` (boolean, default false) + per-transport sub-flags `enable_mcp_sse` / `enable_mcp_uds` / `enable_mcp_stdio` in `meson_options.txt`; conditional `subdir('mcp')` in `core/src/meson.build`. New 12-sub-test smoke at `core/test/test_mcp_smoke.c` pinning the `-ENOSYS` + NULL-guard contract. New `docs/mcp/embedded.md`. Zero runtime dependencies for the scaffold — no cJSON, no mongoose, no transport bodies; all land with T5-2b. Same audit-first shape as ADR-0175 (Vulkan T5-1) + ADR-0184 (T7-29 part 1).	Accepted	mcp, agents, api, scaffold, audit-first, fork-local
ADR-0210	T7-36 — cambi Vulkan integration (Strategy II hybrid GPU/host). Closes ADR-0192 long-tail terminus.	Accepted	vulkan, gpu, cambi, feature-extractor, fork-local, places-4
ADR-0211	T6-9 — formal tiny-model registry schema (JSON Schema Draft 2020-12) with new license + Sigstore-bundle metadata, plus the new `--tiny-model-verify` CLI flag wiring `cosign verify-blob` via `posix_spawnp(3p)`. Schema bumped to `schema_version: 1` (loader accepts both 0 and 1). `vmaf_dnn_verify_signature()` exposed in the public `libvmaf/dnn.h` header; fails closed on missing registry entry, missing bundle file, missing `cosign`, or non-zero exit. Validator at `ai/scripts/validate_model_registry.py` (jsonschema with structural fallback). Five entries registered today (`learned_filter_v1`, `lpips_sq_v1`, `nr_metric_v1`, two CI smoke probes); license metadata tracks fork-trained models as BSD-3-Clause-Plus-Patent and upstream-derived models with their original licenses (LPIPS-Sq is BSD-2-Clause). Bundle files themselves are populated at release time by the existing supply-chain workflow; pre-release the verifier treats absence as a fail-closed signal.	Accepted	ai, dnn, security, supply-chain, fork-local
ADR-0212	HIP (AMD ROCm) compute backend — scaffold-only audit-first PR (T7-10; closes T7-10 audit half, runtime + VIF-pathfinder land in follow-up PRs). New public header `libvmaf_hip.h` declaring `VmafHipState` / `VmafHipConfiguration` / `vmaf_hip_state_init` / `_import_state` / `_state_free` / `vmaf_hip_list_devices` / `vmaf_hip_available`. New `core/src/hip/` (common + picture_hip + dispatch_strategy) + `core/src/feature/hip/` (3 kernel stubs: adm/vif/motion). All entry points return `-ENOSYS`. New `enable_hip` boolean option (default false) with conditional `subdir('hip')` in `core/src/meson.build`. New 9-sub-test smoke at `core/test/test_hip_smoke.c` exercising every public C-API entry point. New CI matrix row `Build — Ubuntu HIP (T7-10 scaffold)` compiling with `-Denable_hip=true`. New `docs/backends/hip/overview.md` + `docs/research/0033-hip-applicability.md` + `docs/backends/index.md` flipped from "planned" to "scaffold". Zero hard runtime dependencies for the scaffold — `dependency('hip-lang', required: false)` is silently absent on stock Ubuntu runners; ROCm SDK arrives with the runtime PR. Mirrors the Vulkan T5-1 scaffold (ADR-0175).	Accepted	gpu, hip, rocm, amd, scaffold, audit-first, fork-local
ADR-0213	T7-38 — SSIMULACRA 2 SVE2 SIMD parity for IIR blur + PTLR + the four pointwise kernels. Mirrors the NEON sibling lane-for-lane under a fixed 4-lane `svwhilelt_b32(0, 4)` predicate so the SVE2 path is byte-identical to NEON regardless of the runtime vector length, satisfying the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract. Runtime-gated via `getauxval(AT_HWCAP2) & HWCAP2_SVE2`; NEON remains the fallback. Build probe (`cc.compiles(... -march=armv9-a+sve2)`) leaves `HAVE_SVE2` unset on toolchains without SVE2 intrinsics so the legacy NEON-only build path is unchanged. Validated under `qemu-aarch64-static -cpu max`: dispatch surfaces `NEON=1 SVE2=1` and all 11 `test_ssimulacra2_simd` bit-exactness subtests pass byte-for-byte against the scalar reference. Drops the "deferred pending CI hardware" footnote in Research-0016 / Research-0017 — qemu validates correctness today, native aarch64 perf runner remains a follow-up. Closes backlog item T7-38.	Accepted	simd, arm64, sve2, ssimulacra2, qemu, fork-local
ADR-0214	T6-8 — GPU-parity CI gate. New `scripts/ci/cross_backend_parity_gate.py` iterates every `(feature, backend-pair)` cell, diffs per-frame metrics with a feature-specific absolute tolerance declared in one place (`FEATURE_TOLERANCE` — `5e-5` default = places=4 from ADR-0125/0138/0140; `5e-3` for ciede / ssimulacra2; `5e-4` for psnr_hvs; `1e-2` FP16 contract via `--fp16-features`), and emits one JSON record + one Markdown row per cell. New CI lane `vulkan-parity-matrix-gate` in tests-and-quality-gates.yml runs on every PR over CPU↔Vulkan/lavapipe (no GPU runner needed); CUDA/SYCL/hardware-Vulkan are advisory until a self-hosted runner is wired in. Generalises and is the long-term replacement for the per-feature `scripts/ci/cross_backend_vif_diff.py` lane (kept for one release cycle). New user doc `docs/development/cross-backend-gate.md`; cross-linked from `docs/backends/index.md`; `libvmaf/AGENTS.md` rebase-sensitive invariant note. The gate never modifies feature implementations — verify-only.	Accepted	ci, gpu, vulkan, cuda, sycl, agents, fork-local
ADR-0215	T6-7 — FastDVDnet temporal pre-filter (5-frame sliding window) lands as a registered feature extractor `fastdvdnet_pre` in `core/src/feature/fastdvdnet_pre.c`, backed by an ONNX model with the contract `frames: [1, 5, H, W] -> denoised: [1, 1, H, W]` (channel axis stacks `[t-2, t-1, t, t+1, t+2]`). Internal 5-slot ring buffer with replicate-edge clamp at clip start/end; per-frame scalar `fastdvdnet_pre_l1_residual` appended through `vmaf_feature_collector_append`. This PR ships a smoke-only placeholder ONNX (`model/tiny/fastdvdnet_pre.onnx`, ~6 KB, randomly-initialised 3-layer CNN with the correct shape contract — registry row `smoke: true`); real upstream-derived FastDVDnet weights + the FFmpeg `vmaf_pre_temporal` filter that consumes the denoised frame buffer are tracked as T6-7b. Same `vmaf_dnn_session_*` integration shape as `feature_lpips.c`; declines cleanly on missing model_path with `-EINVAL`. New unit test `core/test/test_fastdvdnet_pre.c` mirrors `test_lpips.c` registration + options-table contract. New user-facing doc `docs/ai/models/fastdvdnet_pre.md`; roadmap §3.3 row marked shipped.	Accepted	ai, dnn, feature-extractor, wave-1, fork-local
ADR-0216	T3-15(b) — `psnr_vulkan` chroma extension. Extends the luma-only ADR-0182 extractor to emit `psnr_cb` / `psnr_cr` alongside `psnr_y`. Three-element arrays in `PsnrVulkanState` carry per-plane input + SE-partials buffers; a single command buffer issues three back-to-back dispatches of the existing plane-agnostic `psnr.comp` shader against per-plane `(width, height, num_workgroups_x)` push constants. Subsampling derived from `pix_fmt`: 4:2:0 → w/2 × h/2, 4:2:2 → w/2 × h, 4:4:4 → w × h. YUV400 clamps `n_planes = 1` so chroma dispatches and emits are skipped. `provided_features` becomes `{"psnr_y", "psnr_cb", "psnr_cr"}`. `psnr_max[p]` follows CPU integer_psnr.c default branch ((6 * bpc) + 12). Cross-backend gate (`scripts/ci/cross_backend_vif_diff.py --feature psnr`) extended to assert all three plane scores at `places=4`; lavapipe measurement on the 576×324 testdata fixture reports `max_abs_diff = 0.0` across 48 frames for all three metrics (deterministic int64 SSE on both sides). Builds on the existing Vulkan framework — no fresh shader, no fresh pipeline-creation logic; chroma SSIM / chroma MS-SSIM follow-ups stay separate rows.	Accepted	vulkan, gpu, feature-extractor, psnr, fork-local
ADR-0217	SYCL toolchain cleanup — multi-version recipe + icpx-aware clang-tidy wrapper	Accepted	sycl, ci, clang-tidy, tooling, fork-local
ADR-0218	MobileSal saliency feature extractor (T6-2a)	Accepted	ai, dnn, feature-extractor, saliency, fork-local
ADR-0219	motion3 GPU coverage on Vulkan + CUDA + SYCL (3-frame window)	Accepted	gpu, vulkan, cuda, sycl, motion, feature-extractor, fork-local, t3-15c, places-4
ADR-0220	T7-17 — SYCL feature kernels are unconditionally fp64-free. Audit confirmed all SYCL feature kernels (`integer_adm_sycl.cpp`, `integer_vif_sycl.cpp`, `integer_ciede_sycl.cpp`, `integer_ssim_sycl.cpp`, the float-input extractors) are already fp64-free in their device code — ADM gain limiting via int64 Q31 split-multiply (`gain_limit_to_q31` + `launch_decouple_csf<false>`), VIF gain limiting via fp32 `sycl::fmin` over float operands, accumulators via `sycl::plus<int64_t>`. The previous WARNING-level init log line ("device lacks fp64 support — using int64 emulation for gain limiting") was misleading: it suggested an emulation-overhead fallback that does not exist. Reworded to INFO-level "device lacks native fp64 — kernels already use fp32 + int64 paths, no emulation overhead". `VmafSyclState.has_fp64` retained for future fp64-gated optimisations. New `docs/backends/sycl/overview.md` § "fp64-less device contract (T7-17)" documents the no-`double`-in-lambda-captures rule + the SPIR-V module-taint rationale (a single fp64 instruction in any lambda blocks the whole TU on Arc A-series). The originally reported 5–10× Arc A380 vs Vulkan perf gap has a different root cause (kernel geometry / sub-group size / memory pattern) — out of T7-17's scope.	Accepted	sycl, perf, gpu, arc, intel, t7-17, fork-local
ADR-0221	T7-39 — CHANGELOG + ADR-index fragment-file pattern. New `changelog.d/<section>/<topic>.md` and `docs/adr/_index_fragments/<slug>.md` per-PR fragment trees + two in-tree concat scripts (`scripts/release/concat-changelog-fragments.sh`, `scripts/docs/concat-adr-index.sh`) replace edits to the consolidated `CHANGELOG.md` Unreleased block + `docs/adr/README.md` index table. Eliminates the per-PR merge-conflict surface that cost ≈16 min per PR over the 2026-04-28 sprint. Migration is content-preserving: existing 3119-line Unreleased body archived verbatim as `changelog.d/_pre_fragment_legacy.md`; 159 ADR rows split per-slug with a frozen `_order.txt` preserving the existing commit-merge order. PR template + Doc-Substance Gate (ADR-0167) updated to recognise fragment files as CHANGELOG entries. release-please workflow integration tracked as T7-39b follow-up.	Accepted	process, release, docs, ci, fork-local
ADR-0222	`vmaf-perShot` per-shot CRF predictor sidecar	Accepted	ai, tools, encoder-hint, fork-local, t6-3b
ADR-0223		Accepted	ai, dnn, feature-extractor, wave-1, shot-detection, fork-local
ADR-0234	T-GPU-ULP — per-GPU-generation ULP calibration scoped as two tiers. Tier 1 (this PR): YAML calibration table at `scripts/ci/gpu_ulp_calibration.yaml` mapping a runtime GPU id (Research-0041 schema: `vulkan:0xVVVV:0xDDDD` / `cuda:M.m` / `sycl:0xVVVV:DRIVER`) to a per-feature absolute tolerance for the cross-backend parity gate. Both `cross_backend_vif_diff.py` and `cross_backend_parity_gate.py` accept new `--gpu-id` / `--calibration-table` flags; when omitted, the per-feature `FEATURE_TOLERANCE` defaults remain authoritative (legacy callers see no behaviour change). Lookup picks the most-specific glob match (longest non-wildcard prefix wins). Initial coverage: 1 calibrated row (Mesa lavapipe — tolerances mirror the gate's pre-existing defaults) plus 11 placeholder rows (NVIDIA Ampere / Turing / Ada / Hopper, AMD RDNA2 / RDNA3, Intel Arc Alchemist / Battlemage, generic Intel SYCL); placeholders are functional no-ops until a real-hardware corpus replaces their `features:` block. Tier 2 (deferred to `feat(ai): T7-GPU-ULP-CAL — calibration-head v0`): an ONNX calibration head behind a new `--gpu-calibrated` CLI flag, gated on the data-collection script producing a real corpus on at least one real GPU. The hosted-CI lavapipe lane passes `--gpu-id "vulkan:0x10005:0x0"` so the gate's tolerance decisions are now per-arch annotated in the parity report. New unit test `scripts/ci/test_calibration.py` (19 cases) covers the loader, glob semantics, specificity ranking, and shipped-table round-trip.	Accepted	ai, gpu, vulkan, cuda, sycl, cross-backend, fork-local, t7-gpu-ulp-cal
ADR-0235	Codec-aware FR regressor (`fr_regressor_v2`)	Proposed	ai, dnn, tiny-ai, fr-regressor, fork-local
ADR-0236	DISTS extractor as LPIPS companion	Accepted	ai, fr, dnn, tiny-ai, fork-local, perceptual
ADR-0237	Quality-aware encode automation surface (`vmaf-tune`)	Accepted (Phase A only; Phases B–F remain Proposed)	tooling, ai, ffmpeg, codec, automation, fork-local
ADR-0238	Vulkan VmafPicture preallocation surface (API parity with CUDA / SYCL)	Proposed	vulkan, api, preallocation, fork-local, parity
ADR-0239	Backend-agnostic GPU picture pool (`gpu_picture_pool.{h,c}`)	Proposed	refactor, gpu, cuda, sycl, vulkan, dedup, fork-local
ADR-0240	GPU backend public-header pattern doc (PR3 of GPU dedup, doc-only)	Accepted	docs, gpu, agents, fork-local
ADR-0241	T7-10 first-consumer PR — `integer_psnr_hip` host scaffolding via mirrored kernel-template. Ships `core/src/hip/kernel_template.{h,c}` (field-for-field mirror of `cuda/kernel_template.h` from ADR-0221: `VmafHipKernelLifecycle` private-stream + 2-event struct, `VmafHipKernelReadback` device-accumulator + pinned-host pair, six lifecycle helpers) + `core/src/feature/hip/integer_psnr_hip.{c,h}` (first consumer; mirrors `integer_psnr_cuda.c`'s init/submit/collect/close call graph verbatim). Helpers and the consumer's submit/collect return `-ENOSYS` while the runtime PR (T7-10b) is pending; bodies flip to live HIP without touching consumer call-sites. New `vmaf_fex_psnr_hip` registered in `feature_extractor_list` under `#if HAVE_HIP` so callers get "extractor found, runtime not ready" instead of "no such extractor". New `VMAF_FEATURE_EXTRACTOR_HIP = 1 << 6` flag bit reserved (unused until T7-10b adds the picture buffer-type plumbing). Smoke test grows 5 sub-tests pinning template-helper `-ENOSYS` contracts + extractor-name lookup; 14/14 pass under `-Denable_hip=true`. CPU baseline (47/47) + HIP scaffold (48/48) green. No ROCm SDK required (matches ADR-0212's audit-first split). Out-of-line helpers (not `static inline` like CUDA) so the runtime PR has one editing target. Mirrors the Vulkan T5-1 → T5-1b cadence; runtime + remaining kernels remain in T7-10b.	Accepted	gpu, hip, rocm, amd, kernel-template, fork-local
ADR-0242	Scaffold-only PR preparing tiny-AI training on the local Netflix VMAF corpus (9 ref / 70 distorted YUVs at `.workingdir2/netflix/`, gitignored). Documents the corpus path convention, `--data-root` loader API, and evaluation harness. Records architecture-choice space (MLP depth/width sweep, distillation-vs-from-scratch policy, model size targets) without selecting a configuration — that decision is deferred to a follow-up PR. Adds MCP end-to-end smoke test (`test_smoke_e2e.py`) exercising `vmaf_score` against the Netflix golden fixture. No training runs; no golden assertions modified.	Accepted	ai, training, fork-local, onnx, docs
ADR-0243	`enable_lcs` MS-SSIM extras on CUDA + Vulkan	Accepted	cuda, vulkan, gpu, metrics, ms-ssim, fork-local
ADR-0244	vmaf_tiny_v2 — canonical-6 + StandardScaler tiny VMAF MLP	Accepted	ai, dnn, tiny-ai, model, registry, fork-local
ADR-0245	SIMD bit-exact test harness shared header	Accepted	simd, test, dx, fork-local
ADR-0246	Per-backend GPU kernel scaffolding templates (CUDA + Vulkan)	Accepted	gpu, cuda, vulkan, refactor, fork-local
ADR-0247	vmaf-roi sidecar binary for per-CTU QP offsets	Accepted	tools, ai, roi, encoder
ADR-0248	`nr_metric_v1` joins dynamic-PTQ family (T5-3d)	Accepted	tiny-ai, onnx, quantization, registry, fork-local
ADR-0249	Tiny-AI Wave 1 baseline C1 — `fr_regressor_v1` on Netflix Public	Accepted	tiny-ai, training, onnx, netflix-public, c1, fork-local
ADR-0250	Tiny-AI extractor template — shared scaffolding header	Accepted	ai, dnn, refactor, dx, fork-local
ADR-0251	Vulkan VkImage import — v2 async pending-fence model (T7-29 part 4)	Proposed	vulkan, ffmpeg, fork-local, zero-copy, performance, implementation
ADR-0252	ssimulacra2 Vulkan host-path AVX2 + NEON SIMD (T-GPU-OPT-VK-2)	Accepted	simd, vulkan, ssimulacra2, performance
ADR-0253	Defer SpEED-QA full-reference reduction. Closes the user's 2026-04-21 deep-research queued track. The fork keeps `speed_chroma` / `speed_temporal` (PR #213, port of upstream `d3647c73`) as research-stage extractors gated behind `-Denable_float=true`; does not add a `speed_qa` reduction; does not register a SpEED-driven model. Rationale: SpEED-QA overlaps `vif` substantially (both GSM-prior divisive-normalisation entropy estimators), the "speed" headline inverts on the fork's AVX-512 / CUDA / SYCL VIF stack, and the assumed `model/speed_4_v0.6.0.json` upstream binary does not exist — there is no Netflix artefact to mirror. Reversible on three named triggers (Netflix lands a model JSON consuming SpEED features; explicit user request; FUNQUE+ / pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input). Companion research digest: `docs/research/0051-speed-qa-feasibility.md`.	Proposed	metrics, research, feature-extractor, roadmap
ADR-0254	HIP second-consumer kernel — `float_psnr_hip` via mirrored kernel-template	Accepted	gpu, hip, rocm, amd, kernel-template, fork-local
ADR-0255	T6-7b — FastDVDnet temporal pre-filter real upstream weights drop. Replaces the ADR-0215 smoke-only placeholder ONNX with the verbatim trained checkpoint from upstream `m-tassano/fastdvdnet` (commit `c8fdf61`, MIT) wrapped by a `LumaAdapter` PyTorch module that preserves the C-side luma `[1, 5, H, W]` → `[1, 1, H, W]` contract: each luma plane is `Concat`-tiled into RGB (`Y → [Y, Y, Y]`) to match upstream's 15-channel input, a constant `sigma = 25/255` noise map (upstream's reference inference level) is broadcast via `ones_like(centre) * sigma`, and the upstream RGB output is collapsed back to luma using BT.601 weights (`Y = 0.299 R + 0.587 G + 0.114 B`). Every `nn.PixelShuffle` instance in upstream's UpBlock is swapped pre-export for an allowlist-safe `Reshape`/`Transpose`/`Reshape` decomposition (zero learned params → numerically identical, verified `< 1e-6` max-abs diff between upstream PyTorch and exported ONNX); `DepthToSpace` deliberately stays off the op allowlist. Shipped graph uses only allowlisted ops. Registry row flips `smoke: false` with `license: MIT`, upstream commit pin, and refreshed `sha256`; sidecar JSON + doc `docs/ai/models/fastdvdnet_pre.md` carry full provenance. New `ai/scripts/export_fastdvdnet_pre.py` (replaces the `_placeholder.py` exporter — kept for reference). 9.5 MiB ONNX (well under the 50 MiB DNN size cap). Luma-native retrain tracked as T6-7c follow-up; INT8 PTQ tracked as T6-7d follow-up.	Accepted	ai, dnn, feature-extractor, wave-1, weights-drop, fork-local
ADR-0256	Vulkan submit-side template + fence pool + descriptor pre-alloc
ADR-0257	Defers the T6-2a-followup real-weights swap for `mobilesal_placeholder_v0` indefinitely. Upstream `yuhuan-wu/MobileSal` is CC BY-NC-SA 4.0 (incompatible with the fork's BSD-3-Clause-Plus-Patent), distributes weights only via Google Drive viewer URLs (no GitHub release / pinnable raw URL), and is RGB-D (the C contract is RGB-only). ADR-0218's "MIT-licensed" claim was inaccurate; corrected here and in `docs/ai/models/mobilesal.md`. Smoke-only placeholder remains shipped; recommended replacement is U-2-Net `u2netp` (Apache-2.0), filed as backlog row T6-2a-replace-with-u2netp. Companion to Research-0053.	Accepted	ai, dnn, mobilesal, saliency, license, fork-local, docs
ADR-0258	T7-32 — ONNX op-allowlist gains `Resize` to unblock U-2-Net (PR #341) and the wider saliency / segmentation surface (mobilesal, BASNet, PiDiNet, FPN-style detectors). One-line addition under `op_allowlist.c`'s `/* convolutional */` block plus comment block citing the supported-mode contract (`nearest`, `linear` recommended; `cubic` not exercised by any in-tree consumer and numerically less stable on quantised inputs). Wire scanner stays op-type-only per ADR D39 / ADR-0169 — per-attribute `mode` filtering would expand the bounded-auditable scope. Python `vmaf_train.op_allowlist` regex parser picks up the new entry automatically (export-time + load-time symmetry preserved). New tests: `test_resize_op_allowed` (C allowlist), `test_resize_top_level_allowed` (C wire-format scan), `test_resize_now_allowed` (Python parser). All 47 libvmaf tests + 15 Python tests green.	Accepted	tiny-ai, onnx, security, op-allowlist
ADR-0259	HIP third-consumer kernel — `ciede_hip` via mirrored kernel-template	Accepted	gpu, hip, rocm, amd, kernel-template, fork-local
ADR-0260	HIP fourth-consumer kernel — `float_moment_hip` via mirrored kernel-template	Accepted	gpu, hip, rocm, amd, kernel-template, fork-local
ADR-0261	TransNet V2 shot-boundary detector — real upstream weights via NTCHW adapter (T6-3a-followup)	Accepted	ai, dnn, feature-extractor, wave-1, weights-drop, shot-detection, fork-local
ADR-0262	Relaxes the parquet leg of `ai/scripts/build_bisect_cache.py --check` from `filecmp.cmp` byte equality to typed-Arrow-Table content equality (`pyarrow.Table.equals` + schema + row-count). Motivated by issue #40, whose sticky comment stayed frozen on a 2026-04-21 success for ~14 days while the nightly failed every night on `parquet-cpp-arrow version 23 → 24` `created_by`-string drift introduced by the runner image's pyarrow upgrade. ONNX byte-equality is preserved (`producer_name` / `producer_version` / `ir_version` already pinned). `nightly-bisect.yml` decouples sticky-comment updates from the existence of `result.json`: a new `post-bisect-comment.py --wiring-broke` mode posts a "WIRING BROKE" verdict to issue #40 with the cache-check stderr inline when `--check` itself fails, then exits non-zero so the run stays red. Relaxes ADR-0109 §Decision (parquet only).	Accepted	ai, ci, tiny-ai, framework
ADR-0263	OSSF Scorecard policy + remediation cadence. Diagnoses the months-long red `scorecard.yml` workflow (root cause: `github/codeql-action/upload-sarif` SHA `b25d0ebf...` is an "imposter commit" — no longer exists in the action's repository, so Scorecard's webapp returns 400 on publish). Repins to current `v4` head `e46ed2cbd0...` to unblock. Documents per-check scoring against the 6.2 / 10 baseline (target ≥ 7.0): accepted blockers are `Code-Review` (solo-maintainer artefact), `Branch-Protection` (`GITHUB_TOKEN` can't read classic rules without a fine-grained PAT, which the secret-policy forbids), `Maintained` (auto-resolves at 90-day age), `CII-Best-Practices` (out-of-scope external badge application). Active remediation queue: `Vulnerabilities` (bump `python/requirements.txt` lower bounds), `Pinned-Dependencies` (upstream Dockerfile-parser bug + SHA-pin sweep), `Fuzzing` (OSS-Fuzz onboarding), `Signed-Releases` (resolves on first release-please cut), `Packaging`. Companion research digest: `docs/research/0053-ossf-scorecard-investigation.md`. 90-day re-evaluation cadence.	Accepted	ci, security, supply-chain, docs, fork-local
ADR-0264	Vulkan 1.4 API-version bump blocked on shader FP-contraction audit	Accepted	vulkan, fork-local, bit-exactness, backlog, docs
ADR-0265	Defers the T6-2a-replace-with-u2netp model-family swap recommended by ADR-0257. Upstream `xuebinqin/U-2-Net` is Apache-2.0 (license OK) but `u2netp.pth` is distributed only via Google Drive (no GitHub release, no pinnable raw URL — same blocker as MobileSal in ADR-0257), and U-2-Net's `F.upsample(..., mode='bilinear')` lowers to ONNX `Resize` which is not on the fork's `op_allowlist.c`. Smoke-only placeholder remains shipped; recommended next step is `T6-2a-widen-allowlist-resize` (separate ADR-scope decision) before another saliency-replacement attempt. Companion to Research-0054.	Accepted	ai, dnn, mobilesal, u2netp, saliency, license, op-allowlist, fork-local, docs
ADR-0266	HIP fifth kernel-template consumer — `float_ansnr_hip`	Accepted	hip, gpu, feature-extractor, kernel-template
ADR-0267	HIP sixth kernel-template consumer — `motion_v2_hip`	Accepted	hip, gpu, feature-extractor, kernel-template, temporal
ADR-0269	Step A of the Vulkan 1.4 API-version bump path documented in ADR-0264. Tags the load-bearing FP ops in `vif.comp` (`g`, `sv_sq`, `gg_sigma_f`) and `ciede.comp` (yuv→rgb outputs, rgb→xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE) with GLSL `precise` so glslc emits per-result `OpDecorate ... NoContraction` (62 lines in vif, 126 in ciede; verified bit-identical 1.3↔1.4 SPIR-V). Partial fix only: on NVIDIA RTX 4090 + driver 595.71.05, ciede improves 19× (42/48 → 5/48 mismatches; 1.67e-04 → 8.9e-05 max abs at `places=4`); vif's 45/48 regression at API 1.4 is not fixed — the SPIR-V decorations are correctly emitted on the suspect ops but the driver still drifts, suggesting the regression is not in those five tagged float ops. Step B remains blocked. Documented in research-0054. 5/48 ciede tail (1.78× the threshold) deferred to a CPU-side double-vs-float bisect follow-up.	Accepted	vulkan, fork-local, bit-exactness, shaders
ADR-0270	libFuzzer scaffold for parser surfaces (OSSF Scorecard remediation)	Proposed	security, build, ci, docs
ADR-0271	T-GPU-OPT-2 — wires `integer_ms_ssim_cuda` through the engine-scope CUDA fence-batching helper (`drain_batch.h`). Allocates per-scale partials buffers (5× `l_partials[]` / `c_partials[]` / `s_partials[]` device + matching pinned host shadows) so all 5 SSIM scales' `horiz` + `vert_lcs` launches and DtoH copies enqueue back-to-back on `s->lc.str` inside `submit()`; records `s->lc.finished` once after the last DtoH and calls `vmaf_cuda_drain_batch_register(&s->lc)` so the engine's single `cuStreamSynchronize(drain_str)` covers this extractor too. `collect()` collapses to a host-side reduction only — `vmaf_cuda_kernel_collect_wait` short-circuits when the engine has already drained the lifecycle. The shared SSIM intermediate buffers (`h_ref_mu`, `h_cmp_mu`, `h_ref_sq`, `h_cmp_sq`, `h_refcmp`) stay shared because same-stream ordering serialises `horiz` ⇒ `vert_lcs` ⇒ DtoH naturally; the previous per-scale `cuStreamSynchronize` was forced only by the host-side reduction stepping in between launches. Net change: 6 per-frame host-blocking syscalls collapse into the existing batched flush; expected ms_ssim wall-clock improvement of 3-5% on the Netflix CUDA benchmark. Bit-exact (same kernels, same stream, same submission order; only the host wait point moves). Vulkan / SYCL twins unchanged — `drain_batch` is CUDA-only by design (ADR-0246). Footprint grows by ~12 small buffers per state (≈ 8.1 KB device + 8.1 KB pinned host at 1080p, negligible against the existing pyramid allocations).	Accepted	cuda, gpu, perf, fork-local
ADR-0272	`fr_regressor_v2` codec-aware scaffold (Phase B prereq)	Proposed	ai, dnn, tiny-ai, fr-regressor, codec-aware
ADR-0273	HIP seventh kernel-template consumer — `float_motion_hip`	Accepted	hip, gpu, feature-extractor, kernel-template, temporal, fork-local
ADR-0274	HIP eighth kernel-template consumer — `float_ssim_hip`	Accepted	hip, gpu, feature-extractor, kernel-template, multi-dispatch, fork-local
ADR-0275	`vmaf_tiny_v3` and `vmaf_tiny_v4` join dynamic-PTQ family (T5-3d follow-up)	Accepted	tiny-ai, onnx, quantization, registry, fork-local
ADR-0276	T-VMAF-TUNE Phase A.5 — opt-in `vmaf-tune fast` proxy-based recommend. Combines three acceleration levers (VMAF proxy via `fr_regressor_v2` per ADR-0272, Bayesian search via Optuna TPE, GPU-accelerated VMAF verify per ADR-0157 / ADR-0186) into a new `fast` subcommand that targets the recommendation use case at ~20–50× the speed of the Phase A grid (~100–500× with NVENC, follow-up). Slow Phase A grid path stays canonical as ground-truth corpus generator (ADR-0237 contract); fast-path is opt-in via `pip install vmaf-tune[fast]`. This PR ships the scaffold only — Optuna search loop, smoke-mode synthetic predictor, CLI subcommand, production-shape entry point. Real encode + ONNX inference + GPU verify wiring is a follow-up PR gated on Phase A corpus + fr_regressor_v2 weights training (PR #347). Smoke test: `vmaf-tune fast --smoke --target-vmaf 92`. Companion research digest: `docs/research/0060-vmaf-tune-fast-path.md`.	Proposed	tooling, ai, ffmpeg, codec, automation, fork-local
ADR-0277	ffmpeg-patches series replay against pristine `n8.1` — 2026-05-04. Verifies the six-patch stack (`0001-libvmaf-add-tiny-model-option.patch` through `0006-libvmaf-add-libvmaf-vulkan-filter.patch`) still applies cleanly cumulatively per ADR-0118 + ADR-0186. No content drift: all six patches replay clean via `git am --3way`; `git format-patch n8.1..` regeneration shows only cosmetic noise (PATCH numbering, MIME headers, hunk-context counts). Keeps originals to avoid churn. PRs #332-#341 (HIP kernel-template consumers, OSSF Scorecard work, Vulkan 1.4 deferral, U-2-Net deferral) introduced zero drift on the ffmpeg-integration surface. `vf_libvmaf` end-to-end smoke deferred to CI (`ffmpeg-integration.yml`) — the meson-uninstalled `.pc` doesn't satisfy FFmpeg's `#include <libvmaf.h>` probe locally; CI runs against an installed prefix.	Accepted	ffmpeg, fork-local, maintenance, patches
ADR-0278	T7-5 NOLINT-sweep closeout. Cite-only pass that appends `(ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278)` to the 22 surviving `readability-function-size` NOLINT sites in `core/src/` + `core/tools/` whose comments described the invariant in prose without naming an ADR explicitly. Touches `integer_adm.c` (1 site, upstream-mirror Netflix `966be8d5`), `cuda/ssimulacra2_cuda.c` (3 sites), `vulkan/ssimulacra2_vulkan.c` (3 sites), `vulkan/cambi_vulkan.c` (1 site), `sycl/integer_adm_sycl.cpp` (6), `sycl/integer_motion_sycl.cpp` (2), `sycl/integer_vif_sycl.cpp` (4), `tools/vmaf.c` (3 driver functions). After this PR, programmatic audit reports 75 sites total, 0 missing ADR/Research citations. Backlog item T7-5 closed. No function bodies split, no behavioural change. Companion research digest `docs/research/0063-t7-5-nolint-sweep.md`.	Accepted	lint, cleanup, touched-file-rule, t7-5, fork-local, docs
ADR-0279	vmaf-tune codec adapter — libaom-av1	Accepted	tools, vmaf-tune, av1, codec-adapter
ADR-0281	`vmaf-tune` Intel QSV codec adapters (`h264_qsv`, `hevc_qsv`, `av1_qsv`)	Accepted	tooling, ffmpeg, codec, qsv, intel, fork-local
ADR-0282	`vmaf-tune` AMD AMF codec adapters (h264 / hevc / av1)	Accepted	tooling, ffmpeg, codec, amd, amf, vmaf-tune, fork-local
ADR-0283	Apple VideoToolbox codec adapters for `tools/vmaf-tune/`. Adds `H264VideoToolboxAdapter` + `HEVCVideoToolboxAdapter` (and a shared `_videotoolbox_common.py` for the `-q:v` 0..100 quality knob + nine-name preset → `-realtime` boolean mapping) along the same one-file-per-codec contract NVENC / AMF / QSV already use. AV1 hardware encoding intentionally omitted (unavailable on Apple Silicon as of 2026). Tests mock `subprocess.run` so Linux CI stays green; macOS end-to-end is left to contributors with VideoToolbox available locally. The originally-coupled 16-slot codec-vocab schema expansion is deferred to a follow-up PR awaiting a fresh `fr_regressor_v2` retrain (ship-gate per ADR-0235 + ADR-0291). Companion research digest `docs/research/0074-vmaf-tune-videotoolbox-adapters.md`.	Accepted	tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local
ADR-0285	`vmaf-tune` libvvenc adapter — VVC / H.266 with optional NN-VC tools	Accepted	tooling, codec, vvc, h266, ai, nnvc, fork-local
ADR-0286	Fork-trained saliency student `saliency_student_v1` on DUTS-TR	Accepted	ai, dnn, mobilesal, saliency, training, license, fork-local, docs
ADR-0287	vmaf_tiny_v5 — corpus expansion (4-corpus + YouTube UGC vp9 subset)	Accepted (decision	ai, dnn, tiny-ai, training-data, research, fork-local
ADR-0288	`vmaf-tune` libx265 codec adapter — first sibling codec after the ADR-0237 Phase A x264 scaffold. New `tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py` (frozen `X265Adapter` dataclass: 10 presets including `placebo`, 0..51 CRF window pinned to the same Phase A informative range as x264, `profile_for(pix_fmt)` table mapping `yuv420p10le` → `main10` for downstream HDR work). Registered under `libx265` in `codec_adapters/__init__.py`. `encode.parse_versions` gains an encoder-aware regex (`x265 [info]: HEVC encoder version …` → `libx265-<version>`). CLI `--encoder` now accepts `libx264 \| libx265` via `choices=list(known_codecs())`. 14 new subprocess-mocked tests under `tests/test_codec_adapter_x265.py` (adapter contract, preset/CRF validation, profile mapping, ffmpeg argv shape, version parsing, run_encode round-trip, corpus end-to-end, missing-binary error handling). Real-binary integration test gated on `VMAF_TUNE_INTEGRATION=1`. No schema bump — existing `encoder` row column already carries codec identity. Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15 (Pareto).	Accepted	tooling, ffmpeg, codec, automation, fork-local, vmaf-tune
ADR-0289	`vmaf-tune` resolution-aware model selection + CRF offsets	Accepted	tooling, vmaf-tune, model-selection, fork-local
ADR-0290	NVENC codec adapters for `vmaf-tune` (h264 / hevc / av1)	Accepted	tooling, codec, nvenc, gpu, fork-local
ADR-0291	T-FR-V2-PROD — flip `fr_regressor_v2` from smoke to production. Trained on the Phase A real-corpus 216-cell aggregate (216 (src, encoder, preset, cq) cells averaged from 33,840 per-frame canonical-6 rows produced by `scripts/dev/hw_encoder_corpus.py`, ADR-0237 / PR #392) using the v2 ENCODER_VOCAB (12 encoders, PR #394). MLP `6 → 32 → 32 → 32 → 1` with 14-D codec block concatenation; 200ep Adam lr=5e-4 batch=32. LOSO PLCC = 0.9681 ± 0.0207 clears the ADR-0235 0.95 ship gate (one outlier — OldTownCross at 0.9183 — held in scope). Registry row flips `smoke: false`, sha256 `67934b0b…`. Production default stays `vmaf_tiny_v2`; `fr_regressor_v2` is the teacher-score predictor for vmaf-tune Phase B+. Companion research digest Research-0067.	Accepted	ai, dnn, tiny-ai, fr-regressor, codec-aware, vmaf-tune, fork-local
ADR-0293	`vmaf-tune` saliency-aware ROI tuning (Bucket #2)	Accepted	tooling, ai, saliency, ffmpeg, codec, fork-local
ADR-0294	vmaf-tune codec adapter for SVT-AV1	Accepted	tools, vmaf-tune, codec, av1, fork-local
ADR-0295	vmaf-tune Phase E — per-title bitrate-ladder generator	Proposed	tooling, ffmpeg, codec, automation, abr, fork-local
ADR-0296	Region-of-interest VMAF scoring surface — Option C scaffold. Ships `tools/vmaf-roi-score/` (Python tool) that drives the `vmaf` CLI twice (full-frame + saliency-masked) and blends the two pooled scalars via a user-controlled weight `w ∈ [0, 1]`. Distinct from the existing `core/tools/vmaf_roi.c` encoder-steering binary (ADR-0247) — different surface, related model. Combine math (`blend_scores`) + CLI surface + JSON schema (`SCHEMA_VERSION = 1`) + subprocess seam ship in this PR. The `--saliency-model` ONNX inference path is wired and validated but mask materialisation deliberately exits 64 — gated on PR #359 (`saliency_student_v1`) merging and a follow-up T6-2c PR. Option A (per-pixel feature pooling weighted by saliency in libvmaf C code) is explicitly deferred — separate ADR + research-grade numerical validation; this scaffold avoids the Netflix golden gate and cross-backend ULP-diff burden entirely. No MOS-correlation claim is made — validation against a labelled saliency-MOS corpus is research follow-up (Research-0069). Companion to Research-0069 (Option-space digest).	Accepted	tooling, ai, saliency, vmaf, fork-local
ADR-0297	`vmaf-tune` — codec-agnostic encode dispatcher	Accepted	tooling, ffmpeg, codec, automation, fork-local
ADR-0298	vmaf-tune content-addressed encode/score cache	Accepted	tools, vmaf-tune, cache, fork-local
ADR-0299	GPU scoring backend for `vmaf-tune` (`--score-backend`)	Accepted	tooling, cuda, vulkan, sycl, ai, automation, fork-local
ADR-0300	Bucket #9 of the PR #354 vmaf-tune capability audit — HDR-aware encoding + scoring in `tools/vmaf-tune/`. New `vmaftune.hdr` module exposes `detect_hdr` (ffprobe-driven PQ / HLG classification with strict BT.2020 primaries gate so malformed signaling falls back to SDR), `hdr_codec_args` (per-encoder dispatch table covering libx264, libx265, libsvtav1, hevc_nvenc, libvvenc), and `select_hdr_vmaf_model` (returns `model/vmaf_hdr_*.json` if shipped). Corpus driver gains `--auto-hdr` / `--force-sdr` / `--force-hdr-pq` / `--force-hdr-hlg` mutually-exclusive modes and three new schema-v2 row keys (`hdr_transfer`, `hdr_primaries`, `hdr_forced`); `SCHEMA_VERSION` bumped 1 → 2. Score model arg now accepts pre-formatted `path=` / `version=` strings so HDR-model paths flow through unchanged. Encode-side correctness ships now; HDR-VMAF model port is filed as a backlog follow-up — until it lands, HDR sources are scored against the SDR model with a one-shot warning.	Accepted (encode-side); HDR-VMAF scoring deferred	tooling, vmaf-tune, hdr, codec, ffmpeg, fork-local
ADR-0301	`vmaf-tune --sample-clip-seconds N` — opt-in sample-clip mode that encodes/scores only the centre N-second window of each source per grid cell instead of the full reference, scaling per-cell wall time roughly linearly with slice length (e.g. ~6x speedup at `N=10` against a 60-second source). FFmpeg input-side `-ss <start> -t <N>` cuts the rawvideo demuxer at the slice boundary; the libvmaf CLI's `--frame_skip_ref` / `--frame_cnt` mirror the same window on the score side so VMAF compares matching frames without slicing the reference YUV on disk. Centre-anchored placement (naive scaffold; TransNet V2-based smart placement is a follow-up). Each emitted row carries `clip_mode = "sample_<N>s"` or `"full"` so Phase B (target-VMAF bisect) and Phase C (per-title CRF predictor) can filter, weight, or epilogue-rescore. Corpus schema bumps additively to `SCHEMA_VERSION = 2`. Expected accuracy delta ~1–2 VMAF points on diverse content, ~0.3–0.5 on uniform content. Companion to ADR-0237 Phase A.	Accepted	tooling, ffmpeg, vmaf-tune, fork-local
ADR-0302	ENCODER_VOCAB v3 — 16-slot schema expansion + retrain plan	Accepted
ADR-0303	`fr_regressor_v2` ensemble — production flip trainer + CI gate	Accepted	ai, fr-regressor, ensemble, probabilistic, loso, ci-gate, fork-local
ADR-0304	`vmaf-tune fast` — production wiring (Optuna TPE + v2 proxy + GPU verify)	Accepted	tooling, ai, ffmpeg, codec, automation, fork-local
ADR-0305	Encoder knob-space Pareto-frontier analysis stratified per (source, codec, rc_mode)	Accepted	ai, vmaf-tune, research, encoder, pareto, fork-local
ADR-0306	`vmaf-tune` coarse-to-fine CRF search	Accepted	tooling, automation, vmaf-tune, ffmpeg
ADR-0307	`vmaf-tune` ladder default sampler — wire Phase B/E gap	Accepted	tooling, automation, vmaf-tune, ladder, fork-local
ADR-0308	Encoder knob-sweep recipe-regression revision policy	Accepted	ai, vmaf-tune, codec-adapters, knob-sweep, fork-local
ADR-0309	`fr_regressor_v2` ensemble — real-corpus retrain harness + flip workflow	Accepted	ai, fr-regressor, ensemble, loso, runbook, fork-local
ADR-0310	BVI-DVC corpus ingestion for `fr_regressor_v2` — adopt the Bristol VI Lab BVI-DVC dataset (Ma, Zhang, Bull 2021) as a second training shard alongside the Netflix Public drop. New `ai/scripts/bvi_dvc_to_corpus_jsonl.py` adapter re-shapes the existing parquet pipeline's cached libvmaf JSON into vmaf-tune Phase A `CORPUS_ROW_KEYS` rows; new `ai/scripts/merge_corpora.py` concatenates Netflix + BVI-DVC shards with `(src_sha256, encoder, preset, crf)` deduplication and schema validation. Triples training-corpus size and expands LOSO partitioning from 9 folds to 9 + N. License posture is local-only — corpus stays under `.workingdir2/`, only derived `fr_regressor_v2_*.onnx` weights ship. Production-weights flip stays gated on ADR-0303's ensemble criterion; this ADR ships ingestion infrastructure only. Tests under `ai/tests/test_merge_corpora.py` cover concat-with-dedup and schema-violation rejection on synthetic fixtures.	Accepted	ai, training, corpus, license, fork-local
ADR-0311	libFuzzer harness expansion — `fuzz_yuv_input` + `fuzz_cli_parse`	Accepted	security, build, ci, docs, fork-local
ADR-0312	FFmpeg-patch series for vmaf-tune integration (qpfile + libvmaf_tune + pass-autotune)	Accepted	tooling, ffmpeg, vmaf-tune, patch-series, scaffold
ADR-0313	New `Required Checks Aggregator` workflow + branch-protection policy update. The 23-named-required-check posture (per ADR-0037) deadlocks doc/Python-only PRs because the C-build matrix is path-filter-skipped on their diffs, and branch protection treats both a path-filter skip and a check that never ran as not satisfying the required check. The aggregator is one workflow that always runs (no path filter) and verifies that each named check on the head SHA reported `success`/`skipped`/`neutral` (or did not appear at all, which indicates path-filter rejection). The aggregator becomes the single required check; the 23 individual workflows continue to run unchanged. Manual operator step at adoption: `gh api -X PUT repos/VMAFx/vmafx/branches/master/protection/required_status_checks -F 'contexts=["Required Checks Aggregator"]'`. Unblocks PRs #400, #403, #404, #405, #406, #407 currently stuck on the structural deadlock.	Proposed	ci, branch-protection, policy, fork-local
ADR-0314	vmaf-tune `--score-backend=vulkan` (vendor-neutral GPU scoring)	Accepted	tooling, vmaf-tune, vulkan, gpu, fork-local
ADR-0315	Vendor-neutral VVC GPU encode strategy — three-tier rollout. Verified premise (NVIDIA NVENC SDK 13.0 docs, 2026-05-05): no GPU vendor ships hardware VVC encode silicon today (NVENC supports only H.264 / HEVC / AV1; indications are the same for AMD AMF / Intel QSV). Tier 1 (ship today): document NN-VC as the de-facto vendor-neutral H.266 GPU contribution (runs on any ONNXRuntime EP via the existing `vvenc.py` adapter) and wire the existing Vulkan backend to `vmaf-tune` for GPU-accelerated scoring of CPU-encoded VVC bitstreams (sibling ADR-0314, scoped separately). Tier 2 (backlog): HIP port of VVenC's motion-estimation, transform, and loop-filter kernels, gated on three demand-pull triggers: user-reported throughput pain on a real corpus, Tier-1 NN-VC docs adopted in production, and an RDNA 3/4 or PVC CI machine being available. Tier 3 (quarterly revisit): a `VK_KHR_video_encode_h266`-based libvmaf-side encode adapter, gated on Khronos ratification + at least one shipping driver. No `h266_nvenc` adapter follow-up is planned (silicon does not ship VVC encode); ZLUDA + hypothetical CUDA-VVC rejected as not actionable. Source survey: Research-0085 (skeleton; most factual claims marked `[UNVERIFIED]` pending direct vendor-doc check).	Proposed	codecs, vvc, h266, gpu, hip, sycl, vulkan-video, vmaf-tune, nn-vc, fork-local
ADR-0316	cli_parse — handle long-only options in `error()`	Accepted	cli, security, fork-local, fuzzing
ADR-0317	Path-filter Docker + FFmpeg-integration on doc/Python-only PRs	Accepted	ci, build
ADR-0318	`fr_regressor_v2` ensemble retrain harness — wrapper-trainer interface fix + Phase A pre-step doc	Accepted	ai, fr-regressor, ensemble, loso, runbook, fork-local
ADR-0319	`fr_regressor_v2` ensemble LOSO trainer — real loader + per-fold training	Accepted	ai, fr-regressor, ensemble, loso, fork-local
ADR-0320	`fr_regressor_v2` ensemble seeds — production flip (smoke → false)	Accepted	ai, fr-regressor, ensemble, registry, prod-flip, fork-local
ADR-0321	`fr_regressor_v2_ensemble_v1` — full production flip (real ONNX + sidecars)	Accepted	ai, tiny-ai, models, registry, prod-flip
ADR-0323	`fr_regressor_v3` — train + register on ENCODER_VOCAB v3 (16-slot)	Accepted	ai, fr-regressor, codec-aware, encoder-vocab, loso, fork-local
ADR-0324	Ensemble training kit — portable Phase-A + LOSO retrain bundle	Accepted	ai, fr-regressor, ensemble, tooling, fork-local
ADR-0325	KonViD-150k corpus ingestion	Accepted	ai, training, corpus, license, fork-local
ADR-0326	`vmaftune.bisect` — Phase B target-VMAF bisect: integer binary search over CRF assuming monotone-decreasing VMAF in CRF, returns the largest CRF whose measured VMAF still meets the floor. Six-to-eight encode round-trips per call (`O(log range)`); midpoint rounds toward higher CRF so the "best so far" is always a measured CRF, never extrapolated. Aborts with a clear error on monotonicity violation rather than falling back to a different search strategy (the `tools/vmaf-tune/AGENTS.md` invariant). Subprocess seam mirrors `encode.run_encode` / `score.run_score` so unit tests run with synthetic curves. `make_bisect_predicate(target_vmaf, *, width=..., height=..., framerate=..., duration_s=...)` adapter satisfies `compare.PredicateFn`; `compare._default_predicate` updated to point callers at the entry-point. Replaces the `NotImplementedError("Phase B pending")` placeholder referenced by ADR-0276, ADR-0287, ADR-0295, ADR-0306. Companion to ADR-0237 Phase A.	Accepted	tooling, vmaf-tune, fork-local
ADR-0328	Cambi cluster port — skip the shared-header rename	Accepted	simd, port, cambi
ADR-0331	Skip CI on draft pull requests across all 8 fork workflows. Adds `types: [opened, synchronize, reopened, ready_for_review]` to each `pull_request` trigger and gates every top-level job with an `if:` clause that skips when `github.event.pull_request.draft == true` while keeping `push:` triggers intact. The `ready_for_review` event re-fires CI when a draft is promoted; draft PRs are unmergeable by GitHub's definition, so the resulting "Required check missing" state on drafts is benign. Cuts CI spend by roughly half on the fork's typical 10+-draft work-in-progress queue.	Proposed	ci, build, fork-local
ADR-0332	Agent worktree-drift hard guard. New pre-commit hook `scripts/ci/check-agent-worktree-drift.sh` (wired through `.pre-commit-config.yaml` and installed by `make hooks-install`) refuses commits whose `git rev-parse --show-toplevel` is the main checkout while one or more sibling agent worktrees exist under `<root>/.claude/worktrees/agent-*`. Catches the drift pattern observed five times in the 2026-05-09 session (PR #498, #520, #526, MCP runtime v2 first attempt, multi-corpus run) where a background agent committed into the main checkout instead of its assigned worktree, clobbering the user's working state. Bypass: `git commit --no-verify` for legitimate main-checkout commits while an agent runs. Pairs with the process-side global rule `feedback_agents_isolated_worktree_only`; documented at `docs/development/agent-worktree-discipline.md`.	Accepted	agents, ci, build, fork-local
ADR-0333	`vmaf-tune` Phase F multi-pass encoding — first proof-of-concept on `libx265`. Adapter contract gains `supports_two_pass: bool` + `two_pass_args(pass_number, stats_path) -> tuple[str, ...]`; `X265Adapter` overrides both, returning `('-x265-params', f'pass={N}:stats={path}')`. `EncodeRequest` gains optional `pass_number: int = 0` + `stats_path: Path \| None = None`; `build_ffmpeg_command` splices `two_pass_args` and redirects pass-1 output to `-f null -` so the throwaway pass doesn't write a useless mp4. New `encode.run_two_pass_encode(req, ...)` runs pass 1 → pass 2 with a per-encode unique stats path, returns one combined `EncodeResult` (encode_time = sum of both passes; size = pass-2 size). New CLI flag `--two-pass` opts in on `corpus` / `recommend`; default stays single-pass. Adapters where `supports_two_pass = False` fall back to single-pass with a stderr warning rather than failing (matches the saliency x264-only fallback precedent). Cache key (ADR-0298) gains `two_pass: bool` so 1-pass and 2-pass cells are distinct entries. New tests under `tests/test_codec_adapter_x265_two_pass.py` (subprocess-mocked argv shape, run_two_pass_encode round-trip, corpus end-to-end with `pass_count`-aware cache, fallback behaviour for unsupported adapters). Identifies which sibling codecs benefit from this seam: `libx264`, `libsvtav1`, `libvvenc`, `libaom-av1` (yes, follow-up PRs); NVENC family (separate ADR — `-multipass` is single-invocation lookahead, not the stats-file two-call sequence); AMF / QSV / VideoToolbox (no — hardware encoders use internal lookahead, no stats-file 2-pass).	Accepted	tooling, ffmpeg, codec, automation, fork-local, vmaf-tune, phase-f
ADR-0334	state.md-touch-check CI gate (ADR-0165 enforcement) — promotes CLAUDE.md §12 rule 13 (every PR that closes / opens / rules-out a bug updates `docs/state.md`) from reviewer-enforced to CI-enforced. Lands `scripts/ci/state-md-touch-check.sh` + companion fixture script `scripts/ci/test-state-md-touch-check.sh` (8 cases: 5 primary + 3 regression for `debug` vs `bug`, `Closes #N`, `BUG-` upper-case), wired as a fourth blocking job `state-md-touch-check` in `.github/workflows/rule-enforcement.yml` alongside the existing `deep-dive-checklist`/`doc-substance-check`/`adr-backfill-check` jobs. Trigger predicate fires on Conventional-Commit `fix:` prefix, the bare token `bug` (word-boundary so `debug` doesn't fire), GitHub-issue close keywords (`closes`/`fixes`/`resolves` `#N`), or an unchecked Bug-status-hygiene template row. Pass conditions: diff touches `docs/state.md`, OR PR body carries `no state delta: REASON` (HTML comments stripped first so the template's instructional placeholder doesn't accidentally satisfy). Same draft-PR gating + script-with-thin-wrapper shape as `deliverables-check.sh` (ADR-0124). Surfaced as a backlog row by the state.md audit-backfill PR #455.	Accepted	ci, process, state-hygiene, claude-rule, fork-local
ADR-0335	Hardware-capability priors for the FR-regressor corpus — ships `ai/data/hardware_caps.csv` with per-architecture capability fingerprints (codecs supported, max resolution per codec, encoding-block count, tensor-core / NPU presence, driver-min-version, primary vendor source URL, ISO verified-date) for the three named GPU generations Battlemage / RDNA4 / Blackwell plus their immediate predecessors Alchemist / RDNA3 / Ada Lovelace (six rows on 2026-05-08). Loader at `ai/scripts/hardware_caps_loader.py` exposes `cap_vector_for(encoder, encoder_arch_hint)` returning fixed-shape `hwcap_` feature columns the corpus-ingest pipeline merges into encode rows. Prior-only by design: schema rejects benchmark-shaped columns, community-wiki source URLs, empty fields, and zero-encoding-block rows; companion research digest `0088-hardware-capability-priors-2026-05-08.md` establishes the category-1 NO-GO finding (vendor-published throughput / quality numbers leak biased priors) and the category-3 NO-GO finding (community wikis lack audit trail). NVIDIA Hopper deliberately excluded — H100 / H200 ship zero NVENC engines and fall outside an encode-capability fingerprint. Operator doc at `docs/ai/hardware-capability-priors.md`.	Accepted	ai, corpus, data, docs
ADR-0336	KonViD MOS head v1 (ADR-0325 Phase 3) — train and register `konvid_mos_head_v1`, the fork's first head trained directly against subjective MOS ratings (not the libvmaf VMAF teacher score). Maps canonical-6 + saliency mean/var + 3 TransNet shot-metadata columns + ENCODER_VOCAB v4 single-slot one-hot to a scalar MOS in `[1.0, 5.0]`. Small MLP (~5K params, opset 17, ONNX-allowlist conformant). Trainer is `ai/scripts/train_konvid_mos_head.py`; vmaf-tune callers reach the surface via `Predictor.predict_mos()` with a documented linear approximation `mos = (predicted_vmaf - 30) / 14` as the fallback when the ONNX is missing. Production-flip gate (PLCC ≥ 0.85, SROCC ≥ 0.82, RMSE ≤ 0.45, spread ≤ 0.005) is not lowered on real-corpus failures (memory `feedback_no_test_weakening`); the synthetic-corpus surrogate gate ships at PLCC ≥ 0.75. Real-corpus retrain blocked on PR #447 (KonViD-150k ingestion).	Accepted	ai, training, mos, konvid, fork-local
ADR-0337	motion_v2 inherits motion v1's public option surface (duplicate registration) — resolves the architectural question deferred by PR #453 / PR #460. Lands the upstream four-commit `motion_v2` cluster (`856d3835` mirror fix, `c17dd898` `motion_max_val`, `a2b59b77` `motion_five_frame_window`-as-option, `4e469601` remaining options + `motion3_v2_score`) by registering the same option set on `motion_v2` that motion v1 already exposes via ADR-0158. Picks A1: duplicate option surfaces over A2 (shared parser — couples v1/v2 internals), A3 (deprecate v1 — breaks goldens + CLI), A4 (defer — falls behind upstream). `motion_five_frame_window=true` returns `-ENOTSUP` at `init()` mirroring ADR-0219 §Decision; the 3-frame default is fully supported. Picture-pool plumbing (`prev_prev_ref` field + `n_threads * 2 + 2` sizing) deferred to a follow-up PR. Netflix golden gate untouched (motion v1 unchanged).	Accepted	upstream-port, motion, feature-extractor, cli, public-api, fork-local
ADR-0338	Add an advisory CI lane `Build — macOS Vulkan via MoltenVK (advisory)` on `macos-latest` to validate the existing Vulkan compute backend on Apple Silicon via MoltenVK. Installs `molten-vk` + `vulkan-loader` + `vulkan-headers` + `shaderc` via Homebrew, pins `VK_ICD_FILENAMES` to `/opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json`, builds with `-Denable_vulkan=enabled`, runs the three Vulkan smoke tests. Lane is `continue-on-error: true` until one green run on `master`. Complementary to the planned native Metal backend, not a replacement. Operator doc + known-limitations matrix at `docs/backends/vulkan/moltenvk.md`.	Accepted	ci, vulkan, macos, moltenvk, gpu
ADR-0339	Placeholder `Av1VideoToolboxAdapter` for `tools/vmaf-tune/` codec-adapter registry. Apple M3 / M4 silicon has hardware AV1 encode but FFmpeg upstream has not exposed it (verified against master `8518599cd1`, 2026-05-09). The adapter registers with `supports_runtime=False` and raises `Av1VideoToolboxUnavailableError` until a runtime probe of `ffmpeg -h encoder=av1_videotoolbox` confirms the encoder exists, at which point it self-activates without a code change. Paired with `scripts/upstream-watcher/check_ffmpeg_av1_videotoolbox.sh` and a weekly cron workflow that opens a tracking GitHub issue when the encoder lands. First instance of the upstream-watcher pattern documented in `docs/development/upstream-watchers.md`.	Accepted	tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local, upstream-blocked
ADR-0340	Multi-corpus aggregation for the FR-regressor / predictor v2 trainer	Accepted	ai, training, corpus, fork-local
ADR-0341	Add `paths-ignore` filter to `libvmaf-build-matrix.yml` and `tests-and-quality-gates.yml` so doc-only / research-only PRs skip the full 18-cell build matrix and 10-job test matrix. Conservative deny-list: `docs/`, `/.md`, `changelog.d/`, `CHANGELOG.md`, `.workingdir2/*`. Safe under ADR-0313 because the Required Checks Aggregator explicitly treats workflow-not-reported as path-filter-skipped/acceptable (see `required-aggregator.yml` — `if (!run) { core.info('OK (not reported, treated as path-filter-skip)'); continue; }`). Mirrors the path-filter pattern from ADR-0317 (deny-list polarity instead of allow-list so unknown new paths fail closed → run CI). Saves ~14 runner-min per avg PR per Research-0089 §3.2; concrete example PR #525 (single file `docs/research/0089-...md`) would have qualified.	Proposed	ci, build, policy, fork-local
ADR-0345	cambi × {CUDA, SYCL, HIP} GPU port strategy. Locks Strategy II host-staged hybrid (the ADR-0205 / ADR-0210 Vulkan precedent) for all three pending backends — GPU services preprocess + derivative + spatial-mask SAT + decimate + filter-mode; host residual runs unmodified `calculate_c_values` + spatial pooling on byte-identical buffers; cross-backend gate at `places=4` from day one. LOC envelopes per Research-0091: CUDA ≈1100 (LOW risk), SYCL ≈1300 (MEDIUM, dual toolchain per ADR-0335), HIP ≈1100 (MEDIUM-LOW, hipify-perl seed from CUDA). Implementation order CUDA → SYCL → HIP. Strategy III (fully-on-GPU c-values) remains parked per ADR-0205 §Out of scope.	Accepted	cuda, sycl, hip, gpu, cambi, fork-local, places-4
ADR-0346	FR-features-from-NR-corpus adapter pattern	Accepted	ai, training, corpus, methodology, fork-local
ADR-0347	Sanitizer matrix (ASan + UBSan + TSan, ADR-0015) — concrete test-set scope. The pre-existing `meson test --suite=unit` invocation matched zero tests because no `test()` call in `core/test/meson.build` carries a `suite: 'unit'` tag, leaving every leg printing `No suitable tests defined.` and exiting 0 with zero correctness coverage. This ADR replaces `--suite=unit` with the full C unit-test set under each sanitizer, with a documented per-sanitizer deselect list for tests that fail because of a real underlying bug (not a sanitizer mis-configuration). ASan deselects `test_model`, `test_predict`, `test_float_ms_ssim_min_dim`; UBSan deselects `test_model` (and adds `-fno-sanitize=function` to skip the K&R-prototype harness UB across ~50 minunit-style tests); TSan deselects `test_model`, `test_pic_preallocation`, `test_framesync`. Each deselected entry cites a follow-up bug in `docs/state.md` so the gap stays visible. MSan stays out of the matrix (the matrix has always been ASan + UBSan + TSan, despite occasional "MSan" naming) — adding it would require an instrumented libc++ leg out of scope here. Surfaces seven real defects that have been hiding behind the no-op gate (svm.cpp parser allocation/null-deref on malformed JSON, dict/extractor leaks, integer_adm `div_lookup` global-init race, framesync mutex-domain mismatch). Companion research digest `docs/research/0089-sanitizer-matrix-test-scope.md`.	Accepted	ci, testing, sanitizer, asan, ubsan, tsan, fork-local
ADR-0348	Globally suppress the CodeQL `cpp/poorly-documented-function` rule via `.github/codeql-config.yml` `query-filters: - exclude: id: cpp/poorly-documented-function`. The rule warns on every C/C++ function lacking a `/** */` Doxygen header block, which directly contradicts the fork's documented coding standard (`CLAUDE.md` §6, `docs/principles.md`: "default to writing no comments. Only add one when the WHY is non-obvious"). 15 currently-open alerts in `core/src/` are over-zealous against this house style — not real correctness flags. Alternatives considered: per-instance `lgtm[...]` annotations (15 noise-comments), mass-add `/** */` blocks (contradicts standard), per-instance review (15× "suppress" anyway). The remaining `security-extended` + `security-and-quality` packs stay enabled — targeted exclusion, not wholesale weakening. Verification is post-merge: alerts auto-close on the next CodeQL scan against `master`.	Accepted	ci, security, codeql, policy, fork-local
ADR-0349	`fr_regressor_v3` namespace resolution — keep the existing production checkpoint (sha256 `eaa16d23…`, 19 call sites in 12 files, PR #428 merged 2026-05-06) untouched and reserve `fr_regressor_v3plus_features` as the namespace for the future canonical-6 + `encoder_internal` + shot-boundary + `hwcap` feature-set bump. Reservation lives in this ADR + a new `## fr_regressor_* namespace map` block in `ai/AGENTS.md`; the registry row lands with the future PR that ships the `.onnx` (a stub row would fail `core/test/dnn/test_registry.sh`). Rejects (a) renaming v3 to `fr_regressor_v3_vocab16` (touches 19 call sites + breaks the production-flip immutability ADR-0291 establishes) and (b) calling the future work `fr_regressor_v4_features` (inflates `_v4` to a name-conflict workaround, polluting the major-version axis we genuinely need for future regressor redesigns). Status appendix added to ADR-0302 per ADR-0028.	Accepted	ai, docs, naming, fork-local
ADR-0350	`psnr_hvs` AVX-512 — re-bench confirms AVX2 ceiling (T3-9 (a) close-out). Per-symbol cycle share on 48-frame Netflix normal pair (Zen 5, 2026-05-09): `calc_psnrhvs_avx2` 78.42 % (scalar tail locked by ADR-0138/0139 bit-exactness), `od_bin_fdct8x8_avx2` 14.82 % (the only piece AVX-512 can widen). Amdahl ceiling: even an infinitely fast 16-lane DCT caps wall-clock improvement at 14.82 / (1 − 0.1482) = 17.4 % (1.17× over AVX2); a realistic 2-block batch recovers ~50 % of DCT cost, so projected gain is 1.07–1.08× — well below the T3-9 1.3× ship gate. Re-validates ADR-0180's 2026-04-26 verdict empirically on a current host with full AVX-512 (avx512f / dq / cd / bw / vl / ifma / vbmi); gives ADR-0160 a status-update appendix pointing here as the empirical close-out. T3-9 (b) and (c) (`ssimulacra2_*` AVX-512 + NEON drift audit; `iqa_convolve` AVX-512) bench independently in their own follow-up PRs.	Accepted	simd, avx512, psnr-hvs, ceiling, audit, fork-local
ADR-0351	T3-15(b) — `psnr_cuda` chroma extension. Extends the luma-only ADR-0182 extractor to emit `psnr_cb` / `psnr_cr` alongside `psnr_y` on the CUDA backend, mirroring ADR-0216's Vulkan port. Three-element readback array in `PsnrStateCuda` (`rb[3]`) carries per-plane device-SSE accumulators + pinned-host slots; the kernel (`calculate_psnr_kernel_{8,16}bpc` in `psnr_score.cu`) gains a `plane` parameter so it indexes `data[plane] / stride[plane]` instead of the hard-coded `[0]`. Single private stream + submit/finished event pair issues all per-plane launches back-to-back on the picture stream (no inter-plane barrier — accumulators are independent), then DtoHs all three slots on `lc.str` before a single `cuStreamSynchronize` in `collect()`. Subsampling derived from `pix_fmt`: 4:2:0 → w/2 × h/2, 4:2:2 → w/2 × h, 4:4:4 → w × h. YUV400P clamps `n_planes = 1` so chroma dispatches and emits are skipped. `provided_features` becomes `{"psnr_y", "psnr_cb", "psnr_cr"}`. `psnr_max[p]` follows CPU integer_psnr.c default branch (`(6 * bpc) + 12`). `picture_cuda` upload path needed no change — chroma planes were already uploaded for non-`YUV400P` inputs since the `ciede_cuda` landing (`libvmaf.c::translate_picture_host`'s `upload_mask`). Cross-backend gate (`scripts/ci/cross_backend_vif_diff.py --feature psnr --backend cuda`) extended to assert all three plane scores at `places=4`; RTX 4090 measurement on the 576×324 and 640×480 testdata fixtures reports `max_abs_diff = 0.0` across 48 frames for all three metrics (deterministic int64 SSE on both sides). Closes the GPU long-tail backlog row "psnr chroma parity with CPU" across both shipping GPU backends; chroma SSIM / chroma MS-SSIM CUDA follow-ups stay separate rows.	Accepted	cuda, gpu, feature-extractor, psnr, fork-local
ADR-0352	Vulkan submit-pool migration PR A: adm_vulkan, motion_vulkan, psnr_vulkan (ADR-0256 follow-up, T-GPU-OPT-VK-1 + T-GPU-OPT-VK-4). Eliminates per-frame vkCreateFence / vkAllocateCommandBuffers / vkAllocateDescriptorSets for the three highest-ROI remaining extractors. adm_vulkan (16 dispatches/frame, 4 descriptor sets) and psnr_vulkan (3 dispatches/frame, 3 descriptor sets) achieve zero per-frame Vulkan API overhead beyond the actual dispatch because all buffer handles are init-time-stable. motion_vulkan (1 dispatch/frame) retains one vkUpdateDescriptorSets per frame for the blur ping-pong. Per-frame elimination: 12-16 round-trips. Expected throughput gain at sub-HD resolutions: 10-60 percent per the ADR-0256 profile. Numerical output is bit-identical; places=4 cross-backend gate passes on all three extractors.	Accepted	vulkan, perf, kernel-template, fork-local
ADR-0353	Vulkan submit-pool migration PR-B — six secondary kernels	Proposed	vulkan, gpu, performance, fork-local
ADR-0354	Vulkan submit-pool migration PR-C — cambi, ssimulacra2, float_ansnr, moment	Accepted	vulkan, perf, kernel-template, fork-local
ADR-0355	Symphony-inspired agent-dispatch infrastructure — three thin, in-repo artefacts ported from openai/symphony §3.1/§4.1: (1) `.claude/workflows/` typed-YAML-front-matter briefs (`_template.md` + `codeql-alert-sweep.md` / `simd-port.md` / `feature-extractor-port.md`); (2) `scripts/lib/backlog_tracker.py` read-only `BacklogItem` / `BacklogTracker` / `GitHubTracker` parsing `.workingdir2/BACKLOG.md` rows + `gh` PR queries; (3) `scripts/ci/agent-eligibility-precheck.py` pre-dispatch gate (BACKLOG row open + no merged PR on scope + no in-flight agent). Adopts Symphony's shapes without the Elixir/Codex/Linear runtime — stdlib-only, one PR, opt-in until a Claude Code `Agent.preDispatch` hook surfaces. Closes the two confirmed NO-OP dispatches this session (`vmaf_tiny_v3` registry / T7-5 NOLINT sweep) at the cheapest gate. Rejects (a) full Symphony adoption (multi-week, new runtime + Linear dependency) and (c) status-quo manual triage (already losing more context than the build-out costs).	Accepted	agents, ci, tooling, fork-local
ADR-0356	Two-level GPU reduction for Vulkan VIF / ADM / motion accumulators	Accepted	vulkan, perf, gpu
ADR-0357	Vulkan readback buffer VMA allocation flag separation	Accepted	vulkan, performance
ADR-0358	CUDA `motion` correctness — fix four real bugs surfaced by a cuda-reviewer pass: (1) cross-stream race on the SAD accumulator (memset on `s->str`, kernel atomic-add on `pic_stream`, no event linkage), now memsets on `pic_stream` mirroring the v2 pattern; (2) pinned-memory leak of `s->sad_host` (`compute-sanitizer --tool memcheck` reports `LEAK SUMMARY: 8 bytes leaked in 1 allocations` on `master`, 0 on the fix); (3) `motion2_score` skipped the `MIN(score * motion_fps_weight, motion_max_val)` post-process the CPU reference does at `integer_motion.c:563`; (4) moving-average guard off-by-one because `s->frame_index` is pre-incremented before `motion3_postprocess_cuda` runs. Plus two perf advisories in `motion_score.cu` and `motion_v2_score.cu`: (5) pad shared-tile inner stride 20 → 21 to break the 2-way bank conflict (GCD(20, 32) = 4 vs GCD(21, 32) = 1, +64 B/block); (6) add `__launch_bounds__(BLOCK_X * BLOCK_Y, 8)`. Default settings remain bit-exact at `places=4` (0/144 mismatches on Netflix `src01_hrc00_576x324.yuv` ↔ `src01_hrc01`); the bugs in (3) + (4) only trip under non-default `motion_fps_weight ≠ 1.0` or `motion_moving_average=true`. ADR-0357 reserved for motion3 GPU coverage; common.c:388/416 inverted-stream-select advisory deferred per agent brief (no live callers).	Accepted	cuda, motion, correctness, precision, fork-local
ADR-0359	ARC self-hosted runner pool pilot — route `Cppcheck (Whole Project)` through a ternary `runs-on` expression keyed on the new repo variable `ARC_RUNNERS_ENABLED`. Default `false` keeps every job on GitHub-hosted; flip to `true` to opt the pilot job into the in-cluster `arc-runners` scale set. Observable failure mode: if ARC is degraded, the job sits queued and the operator flips the variable back to fall back. After ≥ 1 day green pilot, ramp up to sanitizers + Vulkan + GPU build legs in follow-up PRs.	Accepted	ci, infra, fork-local
ADR-0360	CAMBI CUDA port (Strategy II hybrid, T3-15a)	Accepted	cuda, gpu, cambi, feature-extractor, fork-local, places-4, t3-15
ADR-0361	Metal (Apple Silicon) compute backend — scaffold-only audit-first PR (T8-1; closes T8-1 audit half, runtime + motion_v2 kernel land in follow-up PRs T8-1b / T8-1c). New public header `libvmaf_metal.h` declaring `VmafMetalState` / `VmafMetalConfiguration` / `vmaf_metal_state_init` / `_import_state` / `_state_free` / `vmaf_metal_list_devices` / `vmaf_metal_available`. New `core/src/metal/` (common + picture_metal + dispatch_strategy + kernel_template) + first-consumer scaffold `core/src/feature/metal/integer_motion_v2_metal.c` registering `vmaf_fex_integer_motion_v2_metal` (TEMPORAL flag, `motion_v2_metal` name). All entry points return `-ENOSYS`. New `enable_metal` feature option (default `auto`: probes for `Metal.framework` / `MetalKit.framework` on macOS, disabled elsewhere) with conditional `subdir('metal')` in `core/src/meson.build`. New 14-sub-test smoke at `core/test/test_metal_smoke.c` exercising every public C-API entry point, the kernel-template helpers, and the first-consumer registration. New CI matrix row `Build — macOS Metal (T8-1 scaffold)` compiling on `macos-latest` with `-Denable_metal=enabled`. New `docs/backends/metal/index.md` + `docs/backends/index.md` flipped from "planned" to "scaffold". Apple Silicon (GPU Family Apple 7+) only — Intel Macs rejected per discontinued-platform reasoning. Runtime layer will use Apple's MetalCpp C++ wrapper (per https://developer.apple.com/metal/cpp/, accessed 2026-05-09); MoltenVK passthrough rejected (translation overhead, double-dependency); Intel oneAPI rejected (no macOS distribution); OpenCL rejected (deprecated by Apple since macOS 10.14). Mirrors the HIP T7-10 scaffold (ADR-0212) and Vulkan T5-1 scaffold (ADR-0175).	Accepted	gpu, metal, apple-silicon, scaffold, audit-first, fork-local
ADR-0362	K150K-A corpus integration: FR-from-NR extraction of FULL_FEATURES. Integrates KoNViD-150k-A (152,265 clips, crowd-sourced MOS) into the tiny-AI training pipeline using the FR-from-NR adapter (ADR-0346): the same decoded YUV is fed as both reference and distorted to `build-cpu/tools/vmaf --backend cuda` (RTX 4090). All 22 FULL_FEATURES (Research-0026) are extracted per clip and aggregated to nanmean + nanstd; `ciede2000` and `psnr_hvs` are all-NaN (identity-pair artifact, expected). Output: `runs/full_features_k150k.parquet` (gitignored, one row per clip, 48 cols). Restartable via `.done` checkpoint + atomic parquet flush every 1000 clips. Single-process ETA ~296 h at ~7 s/clip. Implementation: `ai/scripts/extract_k150k_features.py`. User docs: `docs/ai/datasets/k150k.md`. Companion: Research-0067.	Accepted	ai, training-data, corpus, k150k, full-features, fork-local
ADR-0363	Mend Renovate replaces Dependabot as the dependency-update bot. Adds `renovate.json` (config:recommended base, weekly Monday schedule, grouped GitHub Actions minor+patch bumps, pre-commit hooks manager, Python patch grouping) and `.github/workflows/renovate.yml` (self-hosted via `renovatebot/github-action` SHA-pinned to v46.1.13 `79dc0ba74dc3de28db0a7aeb1d0b95d5bf5fde2a`). Disables `.github/dependabot.yml` (renamed to `.disabled` with a revert comment). Key addition over Dependabot: a `customManagers` regex rule tracking `FFMPEG_SHA:=n[0-9.]+` in `ffmpeg-patches/test/build-and-run.sh` and `FFMPEG_PATCHES_BRANCH` in `scripts/ci/ffmpeg-patches-check.sh` against `github-tags/FFmpeg/FFmpeg` — a surface Dependabot cannot reach. Operator doc: `docs/development/dependency-bot.md`.	Accepted	ci, security, dependencies, github-actions, pre-commit, fork-local
ADR-0364	`saliency_student_v2` — Resize-decoder ablation on the v1 recipe	Accepted (gate passed: v2 IoU 0.7105 ≥ v1 0.6558)	ai, dnn, mobilesal, saliency, training, fork-local, docs
ADR-0365	Wire the Apple CoreML execution provider into the tiny-AI ORT dispatch layer. Adds four `--tiny-device` selectors (`coreml`, `coreml-ane`, `coreml-gpu`, `coreml-cpu`) and the matching `VmafDnnDevice` enum values 5..8 (append-only). The `coreml-ane` selector pins `MLComputeUnits=CPUAndNeuralEngine` for highest perf-per-watt on M-series silicon; the unscoped `coreml` lets the EP auto-route. Wiring uses the generic key/value `SessionOptionsAppendExecutionProvider` form so the Linux build degrades cleanly when the EP is absent. End-to-end ANE silicon validation deferred until Apple-silicon hardware access. Apple-side parallel to ADR-0332's OpenVINO NPU wiring.	Proposed	ai, dnn, coreml, apple-silicon, fork-local
ADR-0366	vmaf-tune corpus schema v3 — canonical-6 per-feature aggregates	Accepted	ai, tools, vmaf-tune, corpus, schema
ADR-0367	LSVQ corpus ingestion for `nr_metric_v1` — adopt the LIVE Large-Scale Social Video Quality dataset (Ying et al. ICCV 2021, ~39 K UGC videos, ~5.5 M ratings, CC-BY-4.0) as a third training shard alongside KonViD-150k (ADR-0325 Phase 2) and BVI-DVC (ADR-0310). New `ai/scripts/lsvq_to_corpus_jsonl.py` adapter mirrors the KonViD-150k Phase 2 shape verbatim — resumable per-URL `curl` downloads with atomic tempfile-rename progress writes, ffprobe-driven geometry probe, MOS / SD / rating-count round-trip from the canonical Hugging Face split CSV (`teowu/LSVQ-videos`), and the same JSONL row contract modulo `corpus = "lsvq"`. Refuses sub-1000-row CSVs; defaults to a 500-row laptop-class subset with `--full` for whole-corpus ingestion (~500 GB working set). License posture is local-only — corpus + per-clip MOS stay under `.workingdir2/`, only derived `nr_metric_v1_*.onnx` weights ship with CC-BY-4.0 attribution. ENCODER_VOCAB v4 `"ugc-mixed"` collapse stays trainer-side and is unchanged by this PR. Tests under `ai/tests/test_lsvq.py` cover resumable resume, attrition tolerance, refuse-tiny cutoff, atomic progress-file writes, ffprobe geometry parse, broken-clip skip, MOS-column round-trip (canonical + alias headers), bare-stem `name` → `.mp4` suffix, append+dedup on re-run, and `--max-rows` / `--full` cap behaviour.	Accepted	ai, training, corpus, license, fork-local
ADR-0368	External-competitor benchmark harness — wrapper-only architecture for side-by-side comparison between the fork's `fr_regressor_v2_ensemble_v1` + `nr_metric_v1` predictors and two external OSS competitors (Synamedia/Quortex `x264-pVMAF`, GPL-2.0; DOVER-Mobile, Apache-2.0 + CC-BY-NC-SA 4.0). Lands `tools/external-bench/` with four `run.sh` wrappers (one per competitor; each invokes a user-installed binary via env var and re-shapes its output into a normalised JSON schema), `compare.py` orchestrator (BVI-DVC test fold + Netflix Public Drop corpus discovery, PLCC/SROCC/RMSE/runtime aggregation, fixed-width comparison-table renderer), 7 stubbed pytest cases that monkeypatch `subprocess.run` so tests never depend on external binaries being installed, and operator-facing `README.md` documenting the licence boundary. Critical licence constraint: `x264-pVMAF` is GPL-2.0 vs the fork's BSD-3-Clause-Plus-Patent — the wrapper-only posture keeps zero GPL'd code in the fork; vendoring would relicense the entire fork. Companion to ADR-0310 (corpus) and ADR-0321 (fork-side predictor lineage).	Accepted	ai, testing, license, tooling, fork-local
ADR-0369	Waterloo IVC 4K-VQA corpus ingestion for `nr_metric_v1` — adopt the University of Waterloo IVC 4K Video Quality Database (Li et al. ICIAR 2019; 20 pristine 4K sources × 5 codecs × 3 resolutions × 4 distortion levels = 1 200 clips with controlled-subjective-study MOS, permissive academic licence) as a fourth training shard alongside BVI-DVC (ADR-0310), KonViD-150k (ADR-0325 Phase 2), and LSVQ (ADR-0367). New `ai/scripts/waterloo_ivc_to_corpus_jsonl.py` adapter mirrors the LSVQ / KonViD-150k Phase 2 shape — resumable per-URL `curl` downloads with atomic tempfile-rename progress writes, ffprobe-driven geometry probe, and the same JSONL row contract modulo `corpus = "waterloo-ivc-4k"`. Closes the 2160p resolution-bin gap in the BVI-DVC + KonViD-150k + LSVQ union flagged in research digest #465. Auto-detects between the upstream canonical headerless 5-tuple shape (`encoder, video_number, resolution, distortion_level, mos`) and the standard LSVQ-shape named-column CSV. Refuses sub-100-row CSVs; defaults to a 100-row laptop-class subset with `--full` for whole-corpus ingestion (~multi-TB working set). MOS is recorded verbatim on the Waterloo-native 0–100 raw scale (NOT 1–5 like KonViD / LSVQ); cross-corpus rescaling is a trainer-side follow-up. License posture is local-only — corpus + per-clip MOS stay under `.workingdir2/`, only derived `nr_metric_v1_*.onnx` weights ship with IVC attribution. ENCODER_VOCAB v4 `"professional-graded"` slot routing stays trainer-side and is unchanged by this PR. Tests under `ai/tests/test_waterloo_ivc.py` (20 cases) cover canonical-headerless auto-detect, standard-CSV parse, alias headers, native 0–100 MOS round-trip, resumable resume, attrition tolerance, refuse-tiny cutoff, atomic progress-file writes, ffprobe geometry parse (HEVC / AV1 at 4K), broken-clip skip, append+dedup on re-run, encoder_upstream verbatim, and `--max-rows` / `--full` cap behaviour.	Accepted	ai, training, corpus, license, fork-local
ADR-0370	LIVE-VQC MOS-corpus ingestion for `nr_metric_v1`	Accepted	ai, training, corpus, license, fork-local
ADR-0371	Shared `CorpusIngestBase` for MOS-corpus ingestion adapters	Accepted	ai, corpus, refactor, fork-local
ADR-0372	HIP Batch-1 — `integer_psnr_hip` and `float_ansnr_hip` Real Kernels	Accepted	hip, gpu, build
ADR-0373	HIP Batch-2 — `float_motion_hip` Real Kernel	Accepted	hip, gpu, build
ADR-0374	Build-time-optional public APIs return `-ENOSYS` when disabled	Accepted	dnn, cuda, sycl, hip, vulkan, metal, mcp, build, api, fork-local
ADR-0375	HIP batch-3 — `float_moment_hip` and `float_ssim_hip` real kernels	Accepted	hip, gpu, build, feature-extractor, fork-local
ADR-0376	Fix silent error-swallow in Vulkan buffer-invalidate readback functions	Accepted	vulkan, gpu, build, correctness, fork-local
ADR-0377	HIP batch-4 — `ciede_hip` and `integer_motion_v2_hip` real kernels	Accepted	hip, gpu, build, feature-extractor, fork-local
ADR-0378	Per-picture CUDA streams must use CU_STREAM_NON_BLOCKING	Accepted	cuda, performance, gpu, feature-extractor, fork-local
ADR-0379	libvmaf Symbol Visibility — Hide Internal Symbols with `-fvisibility=hidden`	Accepted	build, api, security, abi, fork-local
ADR-0380	FFmpeg libvmaf filter — HIP backend selector patch (0011)	Accepted	ffmpeg-patches, hip, integration
ADR-0381	Fix Vulkan VIF Scale 2/3 Numerical Saturation (PR #718)	Accepted	vulkan, precision, build
ADR-0382	Y4M header parser — reject non-positive width or height before allocation	Accepted	security, fuzz, parser, fork-local
ADR-0383	K150K corpus scoring driver — parallel CPU worker redesign	Accepted	ai, corpus, performance, training, fork-local
ADR-0384	Switch shfmt pre-commit hook from binary download to Go-source build	Accepted	ci, build, fork-local
ADR-0385	Feature-extractor deduplication by provided-feature names	Accepted	correctness, cuda, gpu, feature-extractor, fork-local
ADR-0386	ADR Number Collision Prevention — Hook + CI Gate + Helper Script	Accepted	ci, docs, git, agents
ADR-0387	Migrate Renovate from self-hosted workflow to GitHub App	Accepted	infra, dependency-bot, fork-local
ADR-0388	Ingest BVI-CC as the second tiny-AI training corpus	Draft	ai, fr-regressor, corpus, license, bristol
ADR-0389	vmaf_tiny_v3 — wider/deeper mlp_medium tiny VMAF MLP	Accepted	ai, dnn, tiny-ai, model, registry, fork-local
ADR-0390	vmaf_tiny_v4 — mlp_large arch (opt-in only; arch ladder stops here)	Accepted	ai, tiny-ai, model, inference
ADR-0391	Closes the PR #346 deferred follow-up — root-causes the residual 5/48 NVIDIA-Vulkan ciede2000 places=4 mismatch (max abs `8.9e-05`, 1.78× threshold) as a structural f32-vs-f64 precision gap. CPU `ciede.c::get_lab_color` runs the BT.709 → linear-RGB → XYZ → Lab chain in `double`; the Vulkan shader runs it in `float`. Controlled experiment (rebuild CPU with f32-throughout helpers) proves f32-CPU and NVIDIA-Vulkan agree to ~6e-7 on the 5 failing frames (the highest-ΔE frames of the fixture); the gap is the irreducible f32-vs-f64 delta amplified by per-pixel ΔE summation. Rejects three mitigations: f64 shader promotion via `shaderFloat64` (RTX 4090 runs f64 at 1/64 fp32 throughput; SPIR-V f64 transcendentals unmandated by spec), f32-narrowing the CPU reference (changes Netflix golden ground truth), and matched polynomial `pow`/`sqrt`/`sin` approximations (cost-benefit fails for a 1.78× tail). Accepts as documented fork debt under `docs/state.md` Open bug T-VK-CIEDE-F32-F64; lavapipe parity gate (places=4, 0/48) remains authoritative for CI. Companion to research-0055.	Accepted	vulkan, ciede, precision, gpu, nvidia, fork-local
ADR-0392	`vmaf-tune` Phase D — per-shot CRF tuning scaffold	Accepted (scaffold only; native per-codec emission and	tooling, ai, ffmpeg, codec, automation, fork-local
ADR-0393	`fr_regressor_v2` probabilistic head — deep-ensemble + conformal scaffold	Accepted	ai, fr-regressor, probabilistic, ensemble, conformal, fork-local
ADR-0394	Local sidecar training — on-host bias-correction model that adapts the shipped per-shot VMAF predictor to the operator's own source / encoder mix without mutating the predictor itself. Adds `tools/vmaf-tune/src/vmaftune/sidecar.py` with `SidecarConfig`, `SidecarModel` (online ridge regression with closed-form Sherman-Morrison rank-1 inverse update, pure-Python, zero new ML dep), and `SidecarPredictor` that composes a `Predictor` with a `SidecarModel`. Persistence under `${XDG_CACHE_HOME:-~/.cache}/vmaf-tune/sidecar/<predictor-version>/<codec>/state.json`, keyed by an anonymous random 128-bit host UUID generated by `secrets.token_hex(16)` — never derived from MAC, hostname, machine-id, or any machine-identifying info (load-bearing precondition for a future opt-in community-pool upload). Cold-start is identically zero correction (composed predictor is bit-equivalent to the bare `Predictor`); predictor-version bump invalidates the sidecar to a fresh cold-start. Five contract tests under `tools/vmaf-tune/tests/test_sidecar.py` pin cold-start pass-through, residual-reduction after captures, save/load round-trip, UUID stability across reconstructions, and predictor-version invalidation. Operator doc `docs/ai/local-sidecar-training.md`; algorithm + privacy + drift-hook rationale in Research-0086. Local-only by default; opt-in upload to a community pool (ChatGPT-vision item 4) is explicitly out of scope and tracked under §Future work. Closes the Research-0087 item-3 "partially scaffolded" gap.	Proposed	ai, vmaf-tune, sidecar, online-learning, privacy, fork-local
ADR-0395	Predictor stub-models policy — ship one synthetic-stub `model/predictor_<codec>.onnx` for each of the 14 vmaf-tune codec adapters trained from a deterministic 100-row synthetic corpus seeded by codec name, plus the trainer (`tools/vmaf-tune/src/vmaftune/predictor_train.py`) that consumes a real Phase A JSONL corpus when available and falls back to the synthetic generator per-codec when not. Tiny MLP (14 inputs × 64 hidden × 1 output, ~5K params), opset 18, op-allowlist validated. Per-codec model card under `model/predictor_<codec>_card.md` flags `corpus.kind: synthetic-stub-N=100` with a do-not-use-in-production warning; cards switch to `corpus.kind: real-N=<rows>` when the operator runs the trainer against a real corpus. Closes the predictor follow-up from PR #430. Production weights flip stays gated on real corpus generation.	Accepted	ai, vmaf-tune, predictor, models, fork-local
ADR-0396	Video-temporal saliency extension to `saliency_student_v1` — three-phase rollout. Phase 1 (immediate): configurable temporal aggregator (`mean` / `ema` / `motion-weighted`) inside `tools/vmaf-tune/src/vmaftune/saliency.py`, no new model — captures the SalEMA-validated "EMA over a frozen 2D backbone closes most of the gap to a sophisticated temporal model" finding for free. Phase 2 (follow-up): `video_saliency_student_v1` (~200–300 K params, BSD-3-Clause-Plus-Patent, ONNX opset 17), distilled from UNISAL (Apache-2.0, MobileNetV2 + Bypass-RNN) on DHF1K (CC BY 4.0). TinyU-Net + learned per-channel EMA gate on the bottleneck; same I/O contract as `saliency_student_v1` plus an optional bottleneck-state input. Trained via `ai/scripts/train_video_saliency_student.py`. Phase 3: ONNX export + `vmaf-tune recommend --saliency-mode {image, video}` flag, with `image` as the default until a BD-rate sweep justifies the flip. Rejected: TASED-Net (21.2 M params, MIT — wrong size class for fork's tiny-AI footprint), ViNet-v2 (CC BY-NC-SA 4.0 — license blocker mirrors ADR-0257), AViMoS (mouse-tracked, held in reserve for v2), Mamba-based ZenithChaser (op not on `core/src/dnn/op_allowlist.c`). Source survey: Research-0086.	Proposed	ai, dnn, saliency, video-saliency, vmaf-tune, roi, fork-local, design
ADR-0397	`vmaf-tune` Phase F — `auto` adaptive recipe-aware tuning entry point. Design-only ADR (F.0): ships the deterministic decision tree that composes the existing phases (`corpus`, `recommend`, `fast`, `predict`, `tune-per-shot`, `recommend-saliency`, `ladder`, `compare`) plus the orthogonal modes (HDR auto-detect, sample-clip, resolution-aware) into a single `vmaf-tune auto --src ref.mkv --target-vmaf 92 --max-budget-bitrate 5000` CLI verb. Internal architecture is a hand-coded tree (no learned policy at runtime; explainability + reproducibility floor) with a 30-line pseudocode spec. Phased rollout: F.0 design (this ADR), F.1 sequential scaffold + `--smoke`, F.2 short-circuits (single-rung ladder, codec known, GOSPEL predictor, short / low-variance source skips Phase D, photographic content skips saliency, SDR skips HDR pipeline, sample-clip propagation), F.3 confidence-aware fallbacks (per-cell escalation to coarse-to-fine on FALL_BACK), F.4 per-content-type recipe overrides (animation / live-action / screen-content). No code yet. Companion: Research-0067.	Proposed	tooling, automation, vmaf-tune, ffmpeg, codec, fork-local
ADR-0398	MyTestCase upstream migration — partial port (golden-pinned files deferred)	Accepted	testing, upstream-sync, python
ADR-0399	`vmaf-tune` codec-adapter contract becomes a runtime contract (HP-1)	Accepted	tooling, codec, automation, fork-local, bug-fix
ADR-0400	encoder-internal-stats capture (corpus expansion v1)	Accepted	vmaf-tune, corpus, predictor, x264
ADR-0401	libvmaf WebAssembly target — research-only feasibility ADR proposing a phased rollout: EXPERIMENT (smallest scalar Tier-1 prototype, behind `enable_wasm=false`, no release / no npm publish) before any commitment to Tier 2 (simde + WASM-SIMD + npm publish) or Tier 3 (`onnxruntime-web` + tiny-AI heads). The WASM build joins GPU / SIMD as "numerically close, never bit-exact" and runs its own snapshot suite under `testdata/scores_wasm_*.json` rather than participating in the Netflix CPU golden gate. Decision matrix in Research-0089.	Proposed	build, wasm, browser, ai, fork-local
ADR-0402	MCP runtime v2 — UDS transport + real `compute_vmaf` binding	Accepted	mcp, agents, api, transport, fork-local
ADR-0403	mkdocs `--strict` validation policy. Tightens the existing `.github/workflows/docs.yml` `--strict` lane (which had been a smoke test because every link-validation category in `mkdocs.yml` was set to `info`): promotes `links.anchors` and `nav.{not_found,omitted_files}` to `warn` so the lane fails on broken in-doc anchors, mkdocs nav typos, and leaked excluded-tree pages; documents `links.{not_found,unrecognized_links}: info` carve-outs for the two unfixable populations (cross-tree pointers `../../core/src/...` outside `docs_dir`, and ADR-body cross-refs to renamed neighbours frozen by ADR-0028 / ADR-0106 immutability). Excludes `docs/adr/_index_fragments/**` from the rendered site (concatenation source per ADR-0221). Sweeps actionable subset: fixes two genuinely-broken anchors (`docs/mcp/embedded.md` → ADR-0209's "What lands next" heading, `docs/research/0055-...md` → Research-0053's "Distribution" heading) and the bare-relative-dir links in `docs/{index,state,rebase-notes}.md`. Net: strict-build flips from EXIT=1 with 1,276 emitted WARNINGs (after promoting categories to `warn`) to EXIT=0 with the actionable classes still gated.	Proposed	docs, ci, mkdocs, fork-local
ADR-0404	Keep `nightly.yml` (TSan) and `fuzz.yml` (libFuzzer) workflows running unmodified despite 23+ days of consecutive `failure` runs. Both gates surface real bugs — a data race in `div_lookup_generator` (ADM init) and a NULL-deref SEGV in `y4m_input_fetch_frame` on negative-width Y4M headers. Per memory `feedback_no_test_weakening`, the gates stay on; the bugs get fixed in dedicated follow-up PRs (not in this triage PR). State.md rows pin the failing tests + reopen triggers so reviewers can immediately distinguish the two known-open bugs from any new finding.	Accepted	ci, testing, fuzzing, security
ADR-0405	Wire OpenVINO NPU execution provider into the tiny-AI dispatch layer. Adds `VMAF_DNN_DEVICE_OPENVINO_NPU` / `_CPU` / `_GPU` enum values plus matching `--tiny-device=openvino-npu` / `openvino-cpu` / `openvino-gpu` CLI keywords; pins the OpenVINO EP to a single `device_type` with no fallback inside the explicit-selector branches (the existing `--tiny-device=openvino` keeps the GPU→CPU fallback chain). NPU is intentionally NOT added to the AUTO try-chain — opt-in only because of NPU power-state latency floor on small graphs. Smoke-test exercises the selector path against `vmaf_dnn_session_attached_ep()`; on hardware without NPU silicon the graceful CPU-EP fallback in `vmaf_ort_open()` handles the absence. End-to-end NPU validation deferred to a contributor with Meteor / Lunar / Arrow Lake hardware (re-evaluation triggers in the ADR body).	Accepted	ai, dnn, openvino, intel-ai-pc, fork-local
ADR-0406	Defer the SYCL ADM DWT `group_load` rewrite recommended by research-0086 §A.4. Two blockers surfaced on implementation: (1) divisibility — the vert tile (`TILE_H × WG_X = 576` int32 elements, `WG_SIZE = 256`) violates the SYCL ext `total = WG_SIZE × ElementsPerWorkItem` contract (`576 / 256 = 2.25` is not integer; the expression `2(WG_Y+1)/WG_Y` is integer only for `WG_Y ∈ {1, 2}`), (2) source contiguity — `group_load` takes a contiguous `InputIteratorT` and the multi-row tile load is contiguous only within a single 32-int row. The hori pass has no SLM tile, so was a non-target. Battlemage register-pressure validation is unavailable on the Arc A380 dev host. The kernel stays bit-exact-untouched; the audit checklist row in `docs/development/oneapi-install.md` is annotated with a cross-link to this ADR. Reopens when (a) a tile-geometry redesign yields integer divisibility AND (b) Xe2 hardware is available. ADR-0202 carries a Status-update appendix per the immutability rule.	Accepted	sycl, adm, perf, deferred, fork-local
ADR-0407	AdaptiveCpp / hipSYCL added as a second supported SYCL toolchain alongside Intel oneAPI `icpx`. Contributors who do not want to install Intel's ~2.6 GB closed-source basekit can pass `-Dsycl_compiler=acpp` (with `-Dsycl_acpp_targets=<targets>`) to build the fork's `-Denable_sycl=true` path against the open-source LLVM-based AdaptiveCpp instead. Intel `icpx` remains the primary toolchain — fork-shipped binaries, Intel discrete-GPU codegen, and OpenVINO / NPU enablement stay icpx-coupled. New `core/src/feature/sycl/sycl_compat.h` neutralises the 10 `[[intel::reqd_sub_group_size(N)]]` call sites under acpp via a `VMAF_SYCL_REQD_SG_SIZE(N)` macro; new `docs/development/sycl-toolchains.md` documents the per-toolchain capability matrix and numerical-conformance gap. Closes Research-0086 Topic B (GO-AS-SECOND-TOOLCHAIN).	Accepted	sycl, build, toolchain, fork-local, ci, contributor-experience
ADR-0408	FFmpeg `libvmaf` filter — CUDA backend selector, mirroring the existing SYCL (ADR-0118) / Vulkan (ADR-0186) selectors. Adds `ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch`: a `cuda` boolean AVOption on the `libvmaf` filter that — when set — inits a `VmafCudaState` against the CUDA primary context, imports it into the `VmafContext`, and dispenses `VmafPicture`s from a `HOST_PINNED` preallocation pool so software AVFrame input flows into pinned-host memory the CUDA feature kernels DMA from without a staging copy. Configure changes promote `libvmaf_cuda` from blanket-autodetect to the user-facing `--enable-libvmaf-cuda` flag (matching SYCL/Vulkan in `EXTERNAL_LIBRARY_LIST`); the in-filter selector keeps working when libvmaf ships CUDA support. Coexists with the upstream dedicated `libvmaf_cuda` filter via the `CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER` guard. Cumulative replay 0001..0010 against pristine `n8.1.1` PASS; configure `--help` advertises `--enable-libvmaf-cuda` symmetrically with SYCL/Vulkan.	Accepted	ffmpeg-patches, cuda, integration
ADR-0409	Automated CI gate for CLAUDE.md §12 r14 — third blocking job in `.github/workflows/rule-enforcement.yml`, backed by `scripts/ci/ffmpeg-patches-surface-check.sh`. Parses every patch under `ffmpeg-patches/` once, extracts a "consumed set" of `vmaf_` / `Vmaf` / `libvmaf_` / `--enable-libvmaf-` tokens, and intersects against the PR's diff over `core/include/libvmaf/.h` and `core/meson_options.txt`; fails when the intersection is non-empty and no `ffmpeg-patches/.patch` is in the diff. Per-PR opt-out is `no ffmpeg-patches update needed: REASON` per the ADR-0108 family convention. Picks bash + grep over libclang AST / ctags — sub-second runtime, zero new deps; false positives close in seconds via the opt-out line, false negatives (a real surface change slipping through) are the bug we're paying to avoid. Mirrors ADR-0124 gate structure; absorbs into the ADR-0313 aggregator without branch-protection edits.	Accepted	ci, ffmpeg-integration, process, rule-enforcement
ADR-0410	`ssimulacra2_cuda` 2026-05-09 cuda-reviewer follow-up — fixes the GPU module leak (`init_fex_cuda` calls `cuModuleLoadData` twice, `close_fex_cuda` never calls `cuModuleUnload`; ~200-500 KB of GPU-resident module backing store leaked per `vmaf_close()` cycle, invisible to `compute-sanitizer --tool memcheck` because the leak-checker tracks `cuMemAlloc` only), removes the per-scale 24 MB `malloc(3 * width * height * sizeof(float))` from `extract_fex_cuda` (replaced with two pre-allocated pinned scratch buffers `h_ref_lin_ds` / `h_dis_lin_ds` reused across scales), shrinks the H2D / D2H transfers from full-plane to per-plane `scale_w * scale_h * sizeof(float)` (15× PCIe-traffic reduction at 1080p scale 2: 518 KB valid vs 8 MB full-plane per copy, repeated across 2 H2D + 5 D2H per scale), and annotates `ssimulacra2_blur_h` / `ssimulacra2_blur_v` with `__launch_bounds__(64, 32)` so `nvcc` trims registers to keep ≥32 resident blocks per SM. Bit-exact at places=4 (0/48 mismatches, max abs diff 0.000000e+00) on the Netflix 576×324 fixture. Wall-clock at 1080p RTX 4090 within noise (~0.7%) — the host XYB pre-pass dominates; the fix is correctness-shaped, not throughput-shaped, at this resolution. Architectural ceilings (H-pass non-coalesced, V-pass L1 pressure) require a shared-memory tile-transpose rewrite and remain known follow-ups. The `cuModuleUnload` rule is propagated to `core/src/cuda/AGENTS.md § Lifecycle invariants` so future agent passes pin it on every CUDA extractor (every existing extractor leaks the same way — separate sweep PR).	Accepted	cuda, gpu, perf, memory-leak, ssimulacra2, fork-local
ADR-0412	Fork-local release-artefact mirror scaffold for `u2netp.pth` under Apache-2.0 §4 NOTICE compliance. Partial unblock of T6-2a path (b) (`T6-2a-mirror-u2netp-via-release` — ADR-0265 fallback path). Adds `LICENSES/Apache-2.0-u2netp.txt` (full licence text + attribution block citing upstream copyright, paper, repository, and HEAD `ac7e1c81` commit pin; upstream ships no NOTICE file, so §4 (d) is moot), a model-card stub at `docs/ai/models/u2netp_mirror_card.md` (5-point bar per ADR-0042 with sha256 + Sigstore bundle URL placeholders for the binary upload), an operator workflow doc at `docs/ai/u2netp-mirror.md` (`gh release download` + sha256 cross-check + `cosign verify-blob` against the VMAFx/vmafx OIDC identity), an idempotent staging step in `.github/workflows/supply-chain.yml` (fast-exits when `model/u2netp_mirror.{onnx,pth}` are absent), and a `.gitignore` entry pinning the binary to release attachments only. ADR-0671 adds the missing exporter. Recommended path for new consumers remains `saliency_student_v2`; the mirror is the named fallback for users who specifically want upstream u2netp lineage (citation, comparative evaluation, downstream pipelines pinned to upstream behaviour). Binary upload remains a sibling release-asset step. Companion: Research-0086.	Accepted	ai, dnn, u2netp, saliency, license, apache-2.0, supply-chain, fork-local, docs
ADR-0413	YouTube UGC corpus ingestion for `nr_metric_v1` — adopt the Google YouTube UGC dataset (Wang, Inguva, Adsumilli MMSP 2019 + CVPR 2021 transcoded follow-up; ~1500 community UGC originals; CC-BY) as a fourth MOS-corpus training shard alongside LSVQ (ADR-0333), KonViD-150k (ADR-0325 Phase 2), and BVI-DVC (ADR-0310). New `ai/scripts/youtube_ugc_to_corpus_jsonl.py` adapter mirrors the LSVQ shape verbatim — resumable per-URL `curl` downloads with atomic progress writes, ffprobe-driven geometry probe, MOS / SD / n-ratings round-trip from the canonical bucket-rooted manifest CSV (`https://storage.googleapis.com/ugc-dataset/original_videos.csv`), and the same JSONL row contract modulo `corpus = "youtube-ugc"` and `corpus_version = "ugc-2019-orig"` (or `ugc-2020-transcoded-mean` for the transcoded-mean variant). The dataset is hosted in the public-readable GCS bucket `gs://ugc-dataset/` with the `allUsers:objectViewer` IAM role — no sign-up, no request form. Refuses (exit 2) when handed a CSV with fewer than 200 rows; defaults to a 300-row laptop-class subset with `--full` opting into whole-corpus ingestion (~2 TB working set). License posture stays local-only — clips + per-clip MOS under `.workingdir2/youtube-ugc/`, only derived `nr_metric_v1_*.onnx` weights ship with CC-BY attribution. Tests under `ai/tests/test_youtube_ugc.py` (18 cases) cover resumable-resume, attrition tolerance, refuse-tiny cutoff, atomic progress writes, ffprobe geometry parse, broken-clip skip, MOS-column round-trip (canonical + alias headers including `DMOS`), bare-stem `vid` -> `.mp4` suffix, append+dedup on re-run, `--max-rows` / `--full` cap, and synthesised-bucket-URL path for manifests omitting `url`.	Accepted	ai, training, corpus, license, fork-local
ADR-0414	Saliency-aware ROI for x265 / SVT-AV1 / libvvenc adapters	Accepted	vmaf-tune, saliency, codec-adapter, roi, fork-local
ADR-0415	CAMBI SYCL port — closes last CUDA-to-SYCL parity gap	Accepted	sycl, gpu, cambi, feature-extractor, fork-local, t3-15
ADR-0416	VIF on-the-fly filter sync from Netflix upstream	Proposed
ADR-0417	Draft PR registration for the `ai/tiny-netflix-training-scaffold` branch. Formally bundles the tiny-AI Netflix corpus training scaffold (ADR-0242) into a reviewable PR unit; adds Research Digest 0099 (2024–2026 distillation and ONNX Runtime literature update). No code changes — scaffold content (loader API, MCP smoke test, training-data docs) is already in `master` via ADR-0242. Decision deferred to follow-up PR pending user architecture confirmation.	Accepted	ai, training, mcp, fork-local, onnx, docs
ADR-0418	Full upstream ADM + VIF-prescale sync (companion to PR #758 / ADR-0416)	Proposed
ADR-0419	Gate SVE2 build probe to non-Darwin hosts. Apple Silicon (M1–M4) is ARMv8.x without SVE2 and the runtime detection in `core/src/arm/cpu.c` is already `__linux__`-only; recent Apple Clang accepts `-march=armv9-a+sve2` so `cc.compiles()` returns true but the SSIMULACRA 2 SVE2 TU then fails to build under Apple's incomplete intrinsics surface. Force `HAVE_SVE2=false` on Darwin to mirror the runtime gate.	Accepted	build, simd, macos, arm64
ADR-0420	Metal backend runtime (T8-1b): replaces the T8-1 scaffold's `-ENOSYS` stubs with Objective-C++ TUs (`common.mm`, `picture_metal.mm`, `kernel_template.mm`) driving `Metal.framework` directly. ARC + `__bridge_retained`/`__bridge_transfer` casts keep `<Metal/Metal.h>` out of all headers; consumer TUs stay pure-C through `uintptr_t` / `void *` handle ABIs and the new `vmaf_metal_context_{device,queue}_handle` accessors. Build wiring: `Foundation` + `Metal` framework deps flipped to `required: true`, `-fobjc-arc` added via `add_project_arguments(language: 'objcpp')`. Smoke test flipped from `-ENOSYS` pin to runtime expectations (0 on Apple-Family-7+, `-ENODEV` on Intel Mac / non-Apple hosts; gracefully short-circuits on `-ENODEV` so non-Apple-Silicon Mac CI lanes stay green). Unblocks T8-1c (first real kernel — `integer_motion_v2.metal`). Companion to issue #763.	Accepted	gpu, metal, apple-silicon, runtime, fork-local
ADR-0421	Metal first kernel — `integer_motion_v2` (T8-1c)	Accepted	gpu, metal, apple-silicon, kernel, bit-exact, fork-local
ADR-0422	CLI HIP and Metal backend selectors: adds `--no_hip`, `--hip_device <N>`, `--no_metal`, `--metal_device <N>` to the `vmaf` CLI and extends `--backend` to accept `hip` and `metal`. Activation follows the Vulkan opt-in model (device flag must be non-negative; `--backend hip|metal` defaults device to 0 and disables all other backends). `init_gpu_backends()` in `vmaf.c` gains guarded `vmaf_hip_state_init` / `vmaf_metal_state_init` blocks. Five new `test_cli_parse` tests added. `docs/usage/cli.md` updated. No ffmpeg-patches update required (patches consume libvmaf C API, not the standalone CLI tool flags).	Accepted	cli, hip, metal, gpu, fork-local
ADR-0423	Metal IOSurface zero-copy import (T8-IOS). Public `libvmaf_metal.h` gains `VmafMetalExternalHandles`, `vmaf_metal_state_init_external`, `vmaf_metal_picture_import`, `vmaf_metal_wait_compute`, `vmaf_metal_read_imported_pictures`. `core/src/metal/picture_import.mm` implements the import via `IOSurfaceLock` + per-row memcpy into a shared-storage `VmafPicture` (Apple Silicon unified-memory cost is equivalent to a Shared MTLBuffer copy). Companion `ffmpeg-patches/0013-libvmaf-add-libvmaf-metal-filter.patch` registers the `libvmaf_metal` filter consuming `AV_PIX_FMT_VIDEOTOOLBOX` frames via `CVPixelBufferGetIOSurface`; FFmpeg passes `device=0` so libvmaf falls back to `MTLCreateSystemDefaultDevice` until upstream ships `AVMetalDeviceContext`.	Accepted	metal, ffmpeg-patches, gpu, t8-ios
ADR-0424	`vmaf-tune benchmark` consumes existing Phase-A JSONL corpora and reports one matched-quality row per encoder. The command filters successful finite rows, chooses the lowest-bitrate point clearing `--target-vmaf`, keeps closest misses visible as `unmet`, and emits markdown / JSON / CSV without launching FFmpeg or libvmaf.	Accepted	vmaf-tune, cli, benchmark, corpus
ADR-0425	vmaf-roi-score saliency materialiser	Accepted	tooling, ai, saliency, vmaf, fork-local
ADR-0426	Add CHUG as a local-only UGC-HDR MOS-corpus ingestion path. The adapter downloads/probes CHUG videos under `.workingdir2/chug/`, preserves raw CHUG MOS and HDR ladder metadata, maps trainer-facing MOS onto `[1, 5]`, and keeps all CHUG media / labels out of git under the dataset's non-commercial/share-alike license posture.	Accepted	ai, hdr, corpus, mos
ADR-0427	Materialise CHUG HDR feature rows with reference-aligned full-reference pairs. Distorted ladder rows are paired to their `chug_content_name` reference, decoded to 10-bit 4:2:0 YUV, scaled to reference geometry, and emitted as local-only clip-level feature rows for MOS-head training.	Accepted	ai, hdr, corpus, mos
ADR-0428	vmaf-tune auto selects one winner	Accepted	vmaf-tune, cli, planning
ADR-0429	testdata bench_perf is configurable	Accepted	benchmarks, testdata, tooling
ADR-0430	Saliency RGB ingest and SSIMULACRA2 public docs	Accepted	vmaf-tune, saliency, docs, metrics, fork-local
ADR-0431	Split FR-from-NR CUDA extraction into an explicit CUDA pass plus CPU residual pass. CHUG/K150K local FULL_FEATURES materialisation keeps the same parquet schema while avoiding the mixed all-feature `--backend cuda` path that can fail on 10-bit clips with duplicate feature-key writes and CUDA context synchronization errors.	Accepted	ai, cuda, training-data, corpus, fork-local
ADR-0432	Extend `vmaf-roi-score` saliency-mask materialisation to little-endian planar 8/10/12/16-bit YUV so HDR/CHUG-style inputs preserve native sample depth while saliency inference still consumes 8-bit RGB.	Accepted	roi, tiny-ai, hdr, tooling, fork-local
ADR-0433	CHUG content-safe train/validation/test splits plus an ffprobe-backed HDR metadata audit in the local feature materialiser.	Accepted	ai, hdr, chug, training, fork-local
ADR-0434	CHUG Parquet Metadata Enrichment	Accepted	ai, hdr, chug, training, fork-local
ADR-0435	PR-body pre-push validation hook	Accepted	ci, agents, hooks, docs
ADR-0436	MCP server backend-selector parity (vulkan/hip/metal)	Accepted	mcp, agents, api, dispatch, fork-local
ADR-0437	Metal public-header install and `vmaf_metal_import_state` declaration	Accepted	metal, build, c-api, install, apple-silicon, fork-local
ADR-0438	CLI parser short-option handler coverage invariant — every short option in `short_opts[]` must have a `case` arm; adds missing `case 'c':` for `--cpumask`	Accepted	cli, lint, testing, correctness
ADR-0444	Promote `saliency_student_v2` to production default (IoU 0.7105 vs v1 0.6558, +8.3 %)	Accepted	ai, dnn, saliency, tiny-ai, fork-local
ADR-0445	Persistent VkPipelineCache for Vulkan compute backend	Accepted	vulkan, gpu, performance, pipeline-cache, fork-local
ADR-0446	K150K/CHUG extractor passes HDR and HFR per-feature options	Accepted	ai, hdr, hfr, training, corpus, fork-local
ADR-0447	Motion features under-report on HFR / 50p content; apply bounded FPS weighting across motion extractor variants and document the CHUG/HFR bias.	Accepted	ai, motion, hfr, feature-extractor, cuda, sycl, vulkan, fork-local
ADR-0448	Active upstream monitoring (no silent "wait" deferrals)	Accepted	ci, governance, upstream-sync, deferral, fork-local
ADR-0451	Local dev-MCP container for live probing — Docker, all 4 GPU backends (CUDA/SYCL/Vulkan/HIP), continuous 15-min smoke probe	Accepted	infra, docker, mcp, gpu, hip, cuda, sycl, vulkan, dev, fork-local
ADR-0452	Port `calculate_c_values_row` to AVX-512 and NEON (CAMBI banding detector)	Accepted	simd, cambi, perf, fork-local
ADR-0453	PSNR `enable_chroma` option parity across all GPU backends	Accepted	cuda, sycl, vulkan, psnr, option-parity, bug
ADR-0454	VIF CUDA shared-memory staging for horizontal and vertical filter passes	Proposed	cuda, gpu, vif, performance, smem, fork-local
ADR-0455	KonViD-150k k150ka/k150kb split promotion into the MOS-head trainer	Accepted	ai, training, corpus, konvid, fork-local
ADR-0456	SSIMULACRA2 CUDA blur: 3-channel kernel fusion (`gridDim.z`) + V-pass shared-memory transpose for coalesced access	Accepted	cuda, perf, ssimulacra2
ADR-0457	model/tiny/*.onnx blobs ≥1MB live in GitHub Releases, not git	Accepted	ai, model-storage, repo-size, fork-local
ADR-0458	SYCL CAMBI queue-sync collapse + SSIM horizontal SLM staging	Accepted	sycl, perf, cambi, ssim, gpu, fork-local
ADR-0459	vmaf-tune panel/display-aware recommendation workstream (HDRSDR-VQA)	Proposed	vmaf-tune, ai, hdr, training, panel, display, fork-local
ADR-0460	Dispatch-strategy registry audit 2026-05-15	Accepted
ADR-0461	CLI validates positive dimensions and chroma-alignment on input videos	Accepted	cli, validation, correctness
ADR-0463	ADM p-norm fast-path split and VIF scalar-fallback malloc hoist	Accepted	perf, adm, vif, simd, cpu, fork-local
ADR-0464	CAMBI CUDA spatial-mask shared-memory tile	Accepted	cuda, gpu, cambi, performance, kernel, fork-local
ADR-0466	mkdocs strict-mode pre-push hook	Accepted	docs, ci, git, hooks, mkdocs
ADR-0467	SSIMULACRA2 AVX-512 + NEON IIR blur / `picture_to_linear_rgb` ULP audit — clean close	Accepted	simd, ssimulacra2, audit
ADR-0468	HIP float_adm real kernel (ninth HIP consumer)	Accepted	hip, build, feature-extractor
ADR-0469	`float_psnr` HIP twin — wire `enable_chroma` option	Accepted	hip, psnr, option-parity
ADR-0470	Disk-Persistent VkPipelineCache for Vulkan Feature Extractors	Accepted	vulkan, perf, build
ADR-0471	Add `enable_chroma` to `integer_psnr_hip` (chroma parity with CUDA/SYCL/Vulkan twins)	Accepted	hip, psnr, option-parity, chroma, fork-local
ADR-0480	Bootstrap Score Name-Builder Deduplication	Accepted	refactor, predict, libvmaf
ADR-0481	ADM p-norm Parameter Hardcoded at 3.0 — Deferral Decision	Accepted	adm, predict, ai, testing
ADR-0482	Expand vmaf_pre FFmpeg filter device strings to match full VmafDnnDevice enum	Accepted	ffmpeg, ai, build
ADR-0483	Extract shared `vmaf_gpu_dispatch_parse_env` tokenizer into `gpu_dispatch_parse.h`; removes verbatim triplicate across CUDA/SYCL/Vulkan dispatch_strategy TUs	Accepted	cuda, sycl, vulkan, refactor, dedup
ADR-0484	Extend kernel-scaffolding.md with HIP and Metal lifecycle contract	Accepted	docs, hip, metal, gpu, fork-local
ADR-0485	Extract `VMAF_LIFECYCLE_ZERO` macro to eliminate struct-init duplication across HIP and Metal kernel templates	Accepted	cuda, framework, lint, build
ADR-0486	Codify the three-function GPU backend context-API contract in docs	Accepted	docs, gpu, hip, metal, vulkan, cuda, api, fork-local
ADR-0487	Wire adm_min_val option into integer_adm GPU backends	Accepted	cuda, sycl, vulkan, adm, parity
ADR-0488	Shared once-snapshot helper for GPU dispatch env variables	Accepted	gpu, cuda, vulkan, sycl, dispatch, threading, refactor, fork-local
ADR-0489	CAMBI SYCL — Replace GPU-to-GPU `q.wait()` Calls with Event Chains (SY-1)	Accepted	sycl, gpu, cambi, performance, fork-local
ADR-0490	float_ms_ssim Metal port	Accepted	metal, ms-ssim, float, apple-silicon, fork-local
ADR-0491	Add dedicated `docs/metrics/motion.md` reference page	Accepted	docs, metrics, motion, fork-local
ADR-0492	Promote Vulkan VIF g/sv_sq Computation to double Precision	Superseded	vulkan, vif, gpu-parity, precision
ADR-0493	Test YUV fixtures must be md5-verified, not just present-by-name	Accepted	testing, ci, fixtures, golden-data
ADR-0494	Restore the non-golden Python test suite to green	Accepted	testing, ci, python, regression-recovery
ADR-0495	MCP server probe-driven bug-fix cluster (2026-05-17)	Accepted	mcp, ai, testing, regression-recovery
ADR-0496	Default to the `vmaf-dev-mcp` container for all vmaf / vmaf-tune / ai / MCP work	Accepted	tooling, container, dev-experience, project-rule, fork-local
ADR-0497	vmaf-tune BBB end-to-end bug cluster (compare / ladder / report)	Accepted	vmaf-tune, cli, bugfix, docs
ADR-0498	vmaf-tune BBB end-to-end v2 bug cluster + explicit-backend semantics	Accepted	vmaf-tune, cli, libvmaf, bugfix, docs, container
ADR-0499	vmaf-tune ladder must decode container/Y4M references before scoring	Accepted	vmaf-tune, ladder, corpus, ffmpeg, docs
ADR-0500	VIF log2 LUT Shrink and Gaussian Filter Cache	Accepted	simd, perf, integer-vif, float-vif
ADR-0501	vmaf-tune ladder cross-resolution scoring + report degraded flag	Accepted	vmaf-tune, ladder, corpus, report, vmaf-cli, docs
ADR-0502	ADM decouple gather prefetch (Approach B)	Accepted	simd, perf, adm, avx512, fork-local
ADR-0503	`vif_subsample_rd_8_avx512` Loop Fission to Reduce ZMM Register Spill	Accepted	simd, performance, avx512, vif
ADR-0504	AVX-512F port of float separable convolution scanlines	Accepted	simd, performance, build
ADR-0505	vmaf-tune ladder container-source encode + full per-CRF sample cloud	Accepted	vmaf-tune, ladder, corpus, encode, vmaf-cli, docs
ADR-0506	vmaf-tune ladder duration clipping, raw-YUV cross-res decode, CLI exit code	Accepted	vmaf-tune, ladder, corpus, encode, cli, docs
ADR-0508	vmaf-tune ladder pass-1 stats argv honours --duration	Accepted	vmaf-tune, ladder, encode, bugfix
ADR-0509	vmaf-tune compare auto-probes container-source framerate / duration	Accepted	vmaf-tune, compare, bisect, encode, vmaf-cli
ADR-0510	CHUG re-extract VMAF-alignment fix — FR-corpus guard on the FR-from-NR extractor	Accepted	ai, corpus, chug, k150k, extractor, training-data, regression-guard
ADR-0511	MCP backend probe, default allowlist, and `vmaf-tune ladder --score-backend` (2026-05-18)	Accepted	mcp, vmaf-tune, ai, dx, bugfix
ADR-0512	Vulkan VIF Two-Variant Compute Shader (fp32 Auto-Fallback)	Accepted	vulkan, vif, gpu-parity, precision, compatibility
ADR-0513	`vmaf-tune tune-per-shot` exposes `--scene-threshold` + `--max-shot-duration`; report renders 1-shot timeline	Accepted	vmaf-tune, per-shot, report, ux, bugfix, fork-local
ADR-0514	dev-MCP container exposes every host GPU backend (CUDA + SYCL + Vulkan + HIP): adds `${ONEAPI_ROOT}/tcm/latest/lib` to `LD_LIBRARY_PATH` so the level-zero UR adapter resolves `libhwloc.so.15` (closes SYCL "No platforms found" on Arc); clears `VK_ICD_FILENAMES` (was pinned to a non-existent `lvp_icd.x86_64.json` that hid the host-mounted `nvidia_icd.json` + mesa intel/radeon ICDs); bind-mounts `/dev/dri/by-path` so the Intel compute-runtime can resolve Arc via its udev pci symlink; adds a build-time backend probe that scans for "built without X support" regressions (the precise failure mode where the running image silently shipped without `-Denable_hip=true`). Closes finding 8 of `SESSION_FINDINGS_v9_GPU_PROBE.md`; pairs with ADR-0492 fp64 relax for full Vulkan parity on Arc.	Accepted	container, dev-experience, gpu, sycl, vulkan, hip, cuda, fork-local
ADR-0515	Portable temp-path setup for `test_public_api_score` on MinGW64 (2026-05-18): the `test_vmaf_write_output` case hardcoded `/tmp/vmaf_test_output_XXXXXX` and called `mkstemp(3)` on it. MSYS2/MinGW64 inside the GitHub Actions `windows-latest` runner does not expose a usable `/tmp` from the `MINGW64` shell, so `mkstemp` failed with ENOENT and the `Build — Windows MinGW64 (CPU)` job was perpetually red on master. Fix: extract a `make_temp_output_path()` helper that uses `GetTempPathA()` + a `<pid>`-suffixed filename on `#ifdef _WIN32` (mirroring the precedent in `core/test/dnn/test_model_loader.c::test_sidecar_parses`) and keeps `mkstemp` on POSIX. `unlink` → `remove` for Win32 portability. Conservative, test-only scope.	Accepted	ci, build, windows, mingw, test, fork-local, bugfix
ADR-0516	`vmaf-tune compare` multi-target rate-quality sweep (schema v2) (2026-05-18)	Accepted	vmaf-tune, ux, dx, schema-evolution
ADR-0517	Repair MCP `run_benchmark` tool — drop per-call args, inject VMAF_BIN, guard set -u in bench_all.sh	Accepted	mcp, bugfix, fork-local, benchmark
ADR-0518	Tiny-model loader accepts external-data and feature-vector ONNX	Accepted	ai, dnn, loader, bug-fix
ADR-0519	Implement vmaf_hip_import_state to unblock --backend hip	Accepted	hip, backend, libvmaf, gpu
ADR-0520	Wire `vmaf --no-reference` through to the scoring path (2026-05-18): the flag was parsed into `CLISettings::no_reference` but never read; the unconditional `if (!settings->path_ref)` gate always tripped with `Reference .y4m or .yuv (-r/--reference) is required`. Fix: gate the reference-required check on `!no_reference` and require `--tiny-model` when set; in `vmaf.c` skip the ref open and feed the distorted file twice (two `video_input` handles backed by the same source) so `vmaf_read_pictures` sees a non-null picture pair while the rank-4 DNN dispatch sees the distorted frame. Refuses rank-2 feature-vector tiny models (they consume FR feature columns); auto-suppresses the built-in VMAF SVM default that would otherwise require a real reference. Closes `.workingdir/bbb_reports/E2E_TEST_MATRIX_v9.md` Finding 8.	Accepted	cli, ai, dnn, docs, fork-local, bugfix
ADR-0521	MSVC portability gating — `vif_avx512.c` noinline/noclone + `yuv_input.c` S_ISREG/fstat	Accepted	ci, build, windows, msvc, simd, tools, portability, fork-local, bugfix
ADR-0522	`--tiny-codec` / `--tiny-preset` / `--tiny-crf` populate codec one-hot block	Accepted	cli, ai, dnn, tiny-model
ADR-0523	Register `vmaf_fex_integer_motion_hip` in the extractor list	Accepted	hip, gpu, feature-extractor, bugfix, fork-local
ADR-0524	Tiny-model loader accepts ONNX models with a symbolic batch dim (2026-05-18): `vmaf_ctx_dnn_attach` rejected `model/tiny/nr_metric_v1.onnx` (input shape `['batch', 1, 224, 224]`) because the gate required `in_shape[0] == 1`; ORT surfaces symbolic dims as `-1`. Surfaced by the `--no-reference` agent (PR #1280 / ADR-0520) — every shipped NR tiny model uses the PyTorch/ONNX dynamic-batch idiom. Fix: fold `in_shape[0] ∈ {1, -1}` to batch=1 in both `dnn_attach_nchw` and `dnn_attach_feature_vector` (and the optional rank-2 second input); fixed batch > 1 stays rejected (no batched-inference scheduler exists); symbolic H/W stays rejected with a sharper diagnostic. Per-frame inference always emits batch=1 already, so no runtime change needed.	Accepted	ai, dnn, loader, bug-fix
ADR-0525	Extract `run_cmd` subprocess helper into `aiutils`	Accepted	ai, refactor, fork-local
ADR-0526	Add enable_lcs and enable_chroma to float_ms_ssim SYCL twin	Accepted	sycl, parity, options
ADR-0527	Accept pre-extracted BVI-DVC YUVs via `--bvi-dir`	Accepted	ai, corpus, cli, docs
ADR-0528	`test_cli_parse_long_only_args::test_threads_invalid_optarg_does_not_assert` regression close (2026-05-18): the test fork-harness allocated a 4 KiB stderr buffer and stopped reading once full, so once `usage()`'s help text grew past 4 KiB the child either SIGPIPE'd (signal 13) or SIGABRT'd (signal 6 via aborting stdio) — the test's `WIFEXITED` check rejected the run. Fix: refactor the parent into a `read_head_drain_tail()` helper that captures the first 511 bytes (enough for the "Invalid argument …" needle) then drains the remainder so the writer never blocks; extract child-side `dup2 + cli_parse + _exit` into `child_parse_via_pipe()`. Defence-in-depth in `error()`: replace `assert(long_opts[n].name)` with an explicit `usage()` fallback (banned macro per `principles.md §1.2 rule 30`; `-DNDEBUG` would silently no-op anyway) and `sprintf → snprintf` on the 256-byte `optname` scratch buffer. Product-code path was already correct (ADR-0316 / ADR-0438 long-only enum fix); this is the test harness catching up.	Accepted	cli, test, regression, fork-local, bugfix
ADR-0529	Replace `/dev/dri/by-path` bind with whole `/dev/dri` bind in dev container	Accepted	build, ci, cuda, sycl, agents
ADR-0530	Promote `VMAF_FEATURE_EXTRACTOR_HIP` on `integer_motion_hip`; add `VMAF_PICTURE_BUFFER_TYPE_HIP_DEVICE` enum entry; wire `compute_fex_flags()` HIP slot; add CPU-twin fallback pass in `_by_feature_name`; HIP `flush_context_serial` drain (gpu_pending final-frame collect); route HIP collect/flush writes through `feature_name_dict`. End-to-end: `--backend hip --feature integer_motion` now actually dispatches `calculate_motion_score_kernel_8bpc` (48 HSACO launches per 48-frame clip; HIP vmaf=76.7125 vs CPU vmaf=76.6678, within places=4). Extends ADR-0519 (HIP `vmaf_hip_import_state` impl) + ADR-0523 (PR #1283, extractor registration + meson source-list entry).	Accepted	hip, gpu, dispatch, feature-extractor, fork-local
ADR-0531	Per-shot plan emits bitrate_kbps + chart shows last shot	Accepted	vmaf-tune, per-shot, report, chart
ADR-0532	tune-per-shot tolerates read-only CWD when writing segments	Accepted	vmaf-tune, cli, robustness, container
ADR-0533	Full HIP feature-extractor registration sweep	Accepted	hip, gpu, feature-extractor, bugfix, fork-local
ADR-0534	vmaf-tune compare emits + renders rate-quality curve from per-iteration bisect samples	Accepted	vmaf-tune, compare, report, chart, ux
ADR-0535	Atomic ADR Number Allocator with Cross-Branch Claim	Accepted	ci, docs, git, agents, tooling
ADR-0536	Per-shot predicate threads bitrate_kbps through bisect sidecar (PR #1290 follow-up)	Accepted	vmaf-tune, per-shot, bugfix
ADR-0537	Fix the `vmaf_fex_integer_vif_hip` GPU memory access fault that ADR-0530 un-flagged as a follow-up: upload the static 4×18 `vif_filter1d_table` to a device buffer at init (the pre-fix kernel was being handed a host pointer); correct filter half-widths from `{9,5,3,0}` (parsed from the kernel-name suffix) to `{8,4,2,1}` (= `vif_filter1d_width[scale]/2`) so the loop stops reading past the 18-entry table; add the rd-filter downsample-write path so scales 1–3 read the half-resolution planes the previous horizontal pass wrote (the pre-fix kernel left them uninitialised); allocate per-frame `hipMemcpy2DAsync` staging buffers so the kernel reads device memory not the host `VmafPicture->data[0]`. Re-enable `VMAF_FEATURE_EXTRACTOR_HIP` on `vmaf_fex_integer_vif_hip`. End-to-end: `--backend hip --feature vif_hip` produces VMAF VIF scores on the Netflix golden pair (0.5047 / 0.8764 / 0.9365 / 0.9634 vs CPU 0.5057 / 0.8791 / 0.9379 / 0.9643 — within places=3; places=4 follow-up tracked). Also bundles three adjacent fixes: missing HSACO entries for the ADR-0533 extractor sweep, weak-stub HSACO blobs for the four ADM kernels that don't compile standalone, and `-I build_dir/feature/hip/hip` additions to the hipcc command. Closes the ADR-0530 follow-up.	Accepted	hip, gpu, kernel, vif, bug-fix, fork-local
ADR-0538	vmaf-tune compare ships premium-archival --target-vmafs default + bisect reaches VMAF 95+	Accepted	vmaf-tune, compare, bisect, defaults, premium-archival
ADR-0539	integer ADM HIP kernels — real implementation replacing weak HSACO stubs	Accepted	hip, adm, kernels, rocm, fork-local
ADR-0540	dev-MCP container FFmpeg ships AV1 (SVT/aom) + VVenC + hardware encoders (NVENC, oneVPL/QSV, AMF): adds `libsvtav1-dev` / `libaom-dev` / `libvpl-dev` (apt), builds Fraunhofer VVenC + AMD AMF headers from source, and extends the FFmpeg configure line with `--enable-libsvtav1 --enable-libaom --enable-libvvenc --enable-nvenc --enable-cuda-nvcc --enable-libnpp --enable-libvpl --enable-amf`. Adds a build-time encoder probe that locks down the compile-in promise (catches silent `--enable-*` drift the same way ADR-0514 catches backend-flag drift). Unblocks `vmaf-tune compare` sweeps across the full 15-encoder matrix on hosts with NVIDIA + Intel + AMD GPUs.	Accepted	container, dev-experience, ffmpeg, codecs, av1, vvc, nvenc, qsv, amf, fork-local
ADR-0541	Pin dev-MCP container Intel NEO + ROCm runtimes to versions matching the host kernel	Accepted	build, dev, sycl, hip, container, gpu
ADR-0542	dev-MCP container closes the last GPU plumbing gaps: entrypoint dynamically rewrites `VK_DRIVER_FILES` to exclude lavapipe whenever any real Vulkan ICD is present (NVIDIA via Container Toolkit / Intel + AMD via mesa); pins `HSA_OVERRIDE_GFX_VERSION=10.3.0` + `HSA_ENABLE_SDMA=0` + `ROCR_VISIBLE_DEVICES=0` in `docker-compose.yml` so AMD `gfx1036` iGPU passes `hsa_init()` (ROCm 6.x allowlist gate); adds `intel-media-va-driver-non-free` + `mesa-va-drivers` to stage 1 for VA-API codec driver coverage; documents the `NVIDIA_DRIVER_CAPABILITIES=…,graphics,…` requirement for NVIDIA Vulkan ICD bind-mount. All five `--backend {cpu, cuda, sycl, vulkan, hip}` lanes run on real hardware (no silent CPU/lavapipe fallback).	Accepted	container, dev-experience, vulkan, sycl, hip, rocm, fork-local
ADR-0543	ADR-0498 enforcement hardening — distinct exit code + structured JSON error + per-feature gate	Accepted	cli, libvmaf, bugfix, backend, exit-code, extends-adr-0498
ADR-0544	Deduplicate `feature_extractor_list[]` (2026-05-18): the `HAVE_VULKAN` block held 67 entries instead of 18 — `vmaf_fex_psnr_hvs_vulkan` and `vmaf_fex_float_ms_ssim_vulkan` each registered 11 times, seven other Vulkan twins 6 times each; the `HAVE_SYCL` block held 17 entries instead of 11 (six twins registered twice). The first-match `vmaf_get_feature_extractor_by_name()` hid the bug from CLI users, but the ctx-pool's iterator path allocated one pool entry per duplicate, driving `init`/`extract`/`flush` 2–11x per picture on the affected GPU twins (plausible root cause of the v9 CHUG "VMAF=99 across all bitladders" anomaly previously attributed to operational misuse in PR #1270). Fix removes 61 duplicate entries and adds `vmaf_feature_extractor_list_audit()` — called once from `vmaf_init()` and exercised by a new `meson test` case so the bug class is detected at startup on every future build.	Accepted	bug, dispatch, vulkan, sycl, registry, fork-local
ADR-0545	Wire or delete dead Vulkan/Metal feature-extractor source files	Accepted	vulkan, metal, build, housekeeping, dead-code
ADR-0546	Audit bundle — Vulkan motion dispatch wiring, saliency hard-fail, model-card placeholder	Accepted	vulkan, vmaf-tune, ai, build, docs
ADR-0547	VMAF__DIR env-var overrides for ai/scripts corpus paths + drop cli.py.bak	Accepted	ai, scripts, container, hygiene, fork-local
ADR-0548	vmaf-tune tune-per-shot accepts container sources directly; compare gains --no-bisect mode	Accepted	vmaf-tune, cli, ergonomics, compare, per-shot
ADR-0549	Audit cleanup bundle 2	Accepted	docs, build, container, housekeeping, fork-local
ADR-0550	Auto-resize input plane to NR tiny-model dims + `--tiny-resize` flag	Accepted	ai, cli, dnn, api
ADR-0551	VCQ-223 LocalExplainer CI timeout — root cause and fix path	Proposed	python, test, local-explainer, performance, bugfix, fork-local
ADR-0552	Deterministic wavefront reduction for `integer_vif_hip` horizontal kernels	Accepted	hip, gpu, kernel, vif, parity, correctness, fork-local
ADR-0556	Python / MCP / AI silent-fallback audit fixes (2026-05-18)	Accepted	python, mcp, ai, vmaf-tune, correctness, audit
ADR-0559	Feature Coverage Audit — Add speed_chroma + speed_temporal to Extraction Scripts (HDR-model prep)	Accepted	ai, feature-extraction, speed, hdr, corpus, fork-local
ADR-0561	Unknown	Accepted
ADR-0562	VCQ-223 LocalExplainer hang fix — cap neighbor_samples in test runner	Accepted	python, test, local-explainer, performance, bugfix, fork-local
ADR-0563	HIP extractor audit — verification of 9 remaining scaffold claims	Accepted	hip, gpu, audit, parity, fork-local
ADR-0564	Real integer_ssim GPU kernels (CUDA, HIP, SYCL) — replace silent float_ssim substitution	Accepted	cuda, hip, sycl, ssim, kernel, correctness, gpu, fork-local
ADR-0565	Continuous feature-mix evaluation pipeline (`predictor-bench`): a YAML-declared eval grid of (target_model × corpus × codec × display × tuning_preset) cells evaluated by greedy forward feature selection + LOSO CV + 95 % bootstrap CIs on the marginal PLCC/SROCC delta vs the Netflix SVM baseline, stored in DuckDB and rendered as a Markdown report. Implemented as `vmaf-tune predictor-bench run/report/show/diff`. Three-phase plan: Phase 1 local MVP, Phase 2 SHAP + manual CI trigger, Phase 3 nightly cron + PR regression gate. Motivated by the upcoming Netflix HDR VMAF model, CHUG-HDR corpus, and continuous codec-adapter evolution.	Proposed	ai, vmaf-tune, predictor, eval, corpus, fork-local, ci
ADR-0566	Unknown	Accepted
ADR-0567	Real on-device GPU kernels for `speed_chroma` and `speed_temporal` (4 backends)	Accepted	gpu, hip, cuda, sycl, vulkan, speed, fork-local
ADR-0568	Default `sycl_icpx_aot_targets` to the full Intel GPU architecture list to avoid SYCL JIT cold-start costs on Arc/iGPU builds.	Accepted	sycl, build, meson, gpu, intel, aot, fork-local
ADR-0569	Bundle low-risk SDK/tool version bumps from the 2026-05-18 audit, including ORT, AMF, VVenC, formatters, ruff, cosign, and libsvm bounds.	Accepted	build, container, ci, deps, pre-commit, fork-local
ADR-0573	Dev-mcp container — ubuntu:26.04 + CUDA 13.2 + hipcc + ocloc	Accepted	build, ci, cuda, container, dev
ADR-0574	CUDA Twins for HDR-Model Features — Phase 1 (aim, adm3)	Accepted	cuda, feature, hdr, adm
ADR-0575	Fix yuv_input.c stat compat — include-order and _MSC_VER guard	Accepted	ci, build, windows, msvc, mingw, tools, portability, bugfix, fork-local
ADR-0576	ffmpeg-patches n8.1.1 full-feature-exposure sync	Accepted	ffmpeg, build, ci, cuda, sycl, hip, vulkan, metal
ADR-0577	vmaf-tune bisect decode concurrency cap and aggressive workdir cleanup	Accepted	vmaf-tune, compare, bisect, disk-space, concurrency, fork-local
ADR-0578	Hoist VIF scratch buffer from per-frame allocation to VifState	Accepted	perf, vif, cpu, build
ADR-0579	`vmaf-tune auto --execute` — Phase F real encode/score execution mode	Accepted	vmaf-tune, phase-f, encode, score, cli, fork-local
ADR-0580	float_ansnr enable_chroma option	Accepted	feature-extractor, metrics
ADR-0581	Add `enable_chroma` option to `integer_vif`	Accepted	feature, vif, chroma
ADR-0582	MS-SSIM `enable_db` and `clip_db` option parity on CUDA and SYCL backends; also adds `enable_lcs` to SYCL extractor.	Accepted	cuda, sycl, ms_ssim, option-parity, bug
ADR-0583	Add `enable_chroma` option to the `float_ms_ssim` extractor	Accepted	ms-ssim, float-ms-ssim, option-parity, metrics, correctness, fork-local
ADR-0584	Unknown	Accepted
ADR-0585	Add `enable_chroma` option to `psnr_hvs_vulkan`	Accepted	psnr-hvs, vulkan, option-parity, metrics, fork-local
ADR-0586	Introduce integer_adm_vulkan.c as canonical Vulkan integer ADM extractor	Accepted	vulkan, build, feature-extractor
ADR-0587	Real Metal Compute Kernels for CAMBI	Accepted	metal, cambi, gpu, build
ADR-0588	vmaf-tune executor — per-shot and saliency execution modes	Accepted	vmaf-tune, executor, per-shot, saliency, phase-f, fork-local
ADR-0589	Metal `float_ssim` option parity — `enable_lcs`, `enable_db`, `clip_db`, `scale`	Accepted	metal, ssim, option-parity, apple-silicon, kernel, fork-local
ADR-0590	Wire `enable_db` / `clip_db` into the CUDA and SYCL MS-SSIM twins	Accepted	cuda, sycl, ms-ssim, option-parity, bug, fork-local
ADR-0591	Restore `rfe_hw_flags` per-frame bitmask cache after PR #1067 clobber	Accepted	cuda, perf, bug, libvmaf
ADR-0592	Remove float_vif_score weak HSACO stub now that real HIP kernel ships	Accepted	hip, build, cleanup
ADR-0593	HIP integer_moment kernel — register real HSACO blob alongside psnr / psnr_hvs	Accepted	hip, gpu, parity, build
ADR-0594	Per-kernel `hip_cu_extra_flags` dispatch — disable FMA contraction for `ssimulacra2_blur` HIP HSACO	Accepted	hip, build, ssimulacra2, numerics
ADR-0595	Real two-pass argv for all 14 codec adapters	Accepted	vmaf-tune, codec, encode, ffmpeg
ADR-0596	Delete orphan and duplicate HIP/CUDA translation units	Accepted	hip, cuda, build, cleanup
ADR-0597	`integer_vif` is luma-only across every backend; CUDA `enable_chroma` is a documented no-op	Accepted	cuda, vif, parity, docs, audit-disposition
ADR-0598	vmaf-tune workdir relocation — disk-space preflight + VMAFTUNE_WORKDIR env var	Accepted	vmaf-tune, bugfix, cli, container, workspace
ADR-0599	Cross-Backend Parity Audit — Full Extractor Matrix (2026-05-18)	Accepted	cuda, sycl, vulkan, hip, ci, parity, audit
ADR-0600	Port upstream USE_DIRECT_READ zero-copy input path (Netflix/vmaf@30a6e2a8d)	Accepted	upstream-port, performance, tools, cli, build
ADR-0601	Fix three BBB v14 hardware-encoder bugs: (V14-A) raise probe dummy-encode resolution from 64×64 to 320×240 so NVENC / QSV do not reject it with EINVAL; (V14-B) inject QSV VA-API device-init chain (`-init_hw_device vaapi=va:… -init_hw_device qsv=qsv_dev@va -filter_hw_device va` + `-vf format=nv12,hwupload=extra_hw_frames=64`) into both probe and encode paths, exposed via `--vaapi-device` flag + `VMAFTUNE_VAAPI_DEVICE` env var; (V14-C) document gfx1036 (AMD Raphael/Phoenix APU) decoder-only limitation in `_amf_common.py` — no VCE encode block, `AMF_NOT_SUPPORTED` is a hardware ceiling not a software bug.	Accepted	vmaf-tune, compare, qsv, amf, nvenc, hardware, probe, bugfix, fork-local
ADR-0602	macOS SIGSEGV in vmaf_write_output — pic_cnt underflow + missing vmaf NULL guard	Accepted	bugfix, macos, output, portability, correctness, fork-local
ADR-0603	Ubuntu 26.04 (Resolute Raccoon) fallout fixes — CUDA 13.2, Python 3.14, apt renames	Accepted	build, cuda, ci, python, supply-chain
ADR-0604	Add Renovate customManager for ROCm apt-repo tracking	Accepted	build, container, supply-chain, hip, renovate
ADR-0605	Renovate customManagers for all dev/Containerfile pinned dependencies	Accepted	build, container, supply-chain, renovate, cuda, sycl, hip, intel, onnx
ADR-0606	macOS SIGSEGV deep-fix in output.c writers (PR #1403 follow-up)	Accepted	bugfix, macos, output, portability, correctness, fork-local
ADR-0607	vmaf-tune compare: decode reference YUV once for the entire run	Accepted	vmaf-tune, performance, disk-space, compare
ADR-0608	Commit `.zed/` project configuration (settings, tasks, debug) for Zed editor parity with `.vscode/`; adds clangd LSP, pyright+ruff, shfmt, vmaf-mcp `context_servers` entry, CodeLLDB debug configs, and task shortcuts mirroring all Makefile targets. `docs/development/ide-setup.md` updated with Zed section. `.zed/local/` added to `.gitignore`.	Accepted	dev, ide, docs, build, workspace
ADR-0612	Tiny-AI training scaffold on the Netflix VMAF corpus (2026-05-19 iteration): formalises architecture alternatives table (MLP depth/width, distillation-vs-scratch, model size, evaluation scope), `--data-root` CLI contract, and companion research digest 0607. Scaffold-only; training deferred to follow-up PR.	Proposed	ai, docs, workspace, mcp
ADR-0613	Dynamic Optimizer — Joint Shot-Boundary + CRF Co-Optimisation	Proposed	ai, planning, vmaf-tune
ADR-0614	Per-Shot ABR Rendition Selection	Proposed	ai, planning, vmaf-tune
ADR-0615	Fast NR Pre-Scoring for CRF Bisect Acceleration	Proposed	ai, planning, vmaf-tune
ADR-0616	VMAF NEG Integration into vmaf-tune	Proposed	ai, planning, vmaf-tune, docs
ADR-0617	Cross-Shot Complexity Weighting and Title-Level Quality Constraints	Proposed	ai, planning, vmaf-tune
ADR-0618	Content-Aware Classifier for Encoder Routing	Proposed	ai, planning, vmaf-tune, dnn
ADR-0620	Scaffold audit P0 — three silent-correctness fixes	Accepted	python, correctness, bugfix, fork-local
ADR-0621	Scaffold audit P3 cleanup (2026-05-19): six housekeeping items — (1) repo-root detection in `permutation_importance.py`; (2) `.workingdir2` → `.corpus` default paths in 13 `ai/scripts/*.py`; (3) precise `@unittest.skip` rationales for 4 stale test skips; (4) inline `places=1` justification comments in deterministic quality-runner tests; (5) CI-only smoke fixture documentation in `docs/ai/model-registry.md` + `lpips_sq_v1` mismatch note; (6) semgrepignore entry for vendored cJSON upstream markers. State drift: add `T-VULKAN-MOTION-LAVAPIPE-INIT` Open row in `docs/state.md`; close `T-PYTHON-PERMUTATION-IMPORTANCE-HARDCODED-PATH`.	Accepted	hygiene, ai, python, test, docs, ci, fork-local
ADR-0622	VMAF NEG integration for vmaf-tune commands: model resolution, CLI threading, and `--neg` support across recommendation/tuning surfaces.	Accepted	vmaf-tune, docs
ADR-0623	Scaffold audit P2: nine half-finished implementation fixes — expose `adm_p_norm` on integer ADM; gate `float_vif_hip` auto-dispatch behind `enable_float_vif_hip_autodispatch` Meson option; file T-SYCL-CLANG-TIDY-DISABLED / T-DOCKER-SMOKE / T-VULKAN-MOTION-LAVAPIPE-INIT / T-GPU-COVERAGE-STABLE-WEEKS / T-INTEGER-ADM-P-NORM-SIMD-GAP in state.md; fix stale `.workingdir2/` → `.corpus/` path in konvid_mos_head_v1.md; add forward-declaration banner to u2netp_mirror_card.md; rename lpips_sq.md → lpips_sq_v1.md to match registry id.	Accepted	ci, build, docs, hip, adm, state
ADR-0624	Fast NR pre-scoring implementation for `tune-per-shot` and `compare`, including NR proxy backend, early-elimination telemetry, and bisect integration.	Accepted	vmaf-tune, bisect, ai, performance
ADR-0626	SSH-into-runner debug session on macOS CI failure via tmate	Accepted	ci, macos, debug, fork-local
ADR-0628	Remote-aware ADR number allocator — cross-worktree collision prevention	Accepted	adr, tooling, ci, governance, agents, fork-local
ADR-0634	MCP P0 capability audit fixes: spec-correct `isError`, backend probe, version reporting, and encoded-video scoring tools.	Accepted	mcp, bugfix, spec-correctness, fork-local
ADR-0635	Unknown	Accepted
ADR-0637	Fix 5 master CI failures: (1) repair corrupted `test_list_tools_returns_expected_names` function in MCP smoke test (invalid syntax from PR #1417/#1418 squash-merge); (2) lower coverage floor from 40% to 37% to match measured 37.7% after 2,200 LOC added by PRs #1417–#1425; (3) bump `netflix-golden` timeout 25→45 min; (4) bump `vulkan-vif-cross-backend` timeout 25→60 min; (5) bump `vulkan-parity-matrix-gate` timeout 30→60 min.	Accepted	ci, mcp, coverage, vulkan, timeout, fork-local
ADR-0638	MCP P1 surface — vmaf-tune integration, list_extractors, describe_model, progress notifications	Accepted	mcp, vmaf-tune, api, docs
ADR-0639	Scaffold-audit P1 — backend precheck, HIP picture, mobilesal bpc, DNN multi-output	Accepted	python, hip, ai, dnn, docs, vmaf-tune
ADR-0640	Tiny-AI training on the original Netflix VMAF training corpus (2026-05-20 scaffold iteration)	Proposed	ai, docs, workspace, mcp
ADR-0641	Harden dev-container encoder probes and vmaf-tune compare reports: auto-select Intel VA-API render nodes for QSV, build pinned `intel/vpl-gpu-rt` into the dev image, surface actionable AMF/QSV runtime errors, align compose health with the stdio MCP entrypoint, emit `compare --format html|both` profile-card reports directly, reduce default CPU compare encoders to `libx265,libsvtav1`, and lower raw shared-reference mid-run disk headroom to 1.1×.	Accepted	dev-container, vmaf-tune, ffmpeg, qsv, amf, reports
ADR-0642	AI refresh defaults use the current fork CPU `vmaf` binary and real `FULL_FEATURES` regeneration paths for KoNViD-1k, UGC, BVI-DVC, and aggregate parquet rebuilds.	Accepted	ai, training-data, full-features, konvid, ugc, bvi-dvc, fork-local
ADR-0643	vmaf-tune reports embed a versioned encoder profile that humans can inspect and `vmaf-tune encode-profile` can consume to run one selected FFmpeg encode; FFmpeg patch 0015 adds the advisory `-vmaf-profile` hand-off.	Accepted	vmaf-tune, ffmpeg, reports, cli, encoder-profile
ADR-0644	Add `vmaf-tune compare` codec runtime variants: `ADAPTER@VARIANT` display tokens still route through the base adapter, `--encoder-ffmpeg-bin TOKEN=PATH` binds token-local FFmpeg binaries, and compare JSON/CSV rows now expose `adapter`, `runtime_variant`, and `ffmpeg_bin` provenance metadata.	Accepted	vmaf-tune, ffmpeg, cli, docs
ADR-0645	Thread integer ADM p-norm through SIMD callbacks	Accepted	simd, feature-extractor, testing
ADR-0646	Route attached DNN multi-output tensors	Accepted	2026-05-20
ADR-0647	Refresh `fr_regressor_v1` from the 2026-05-20 Netflix feature table	Accepted	ai, tiny-ai, model-refresh, netflix-public
ADR-0648	CHUG HDR MOS Trainer Entry Point	Proposed
ADR-0649	CHUG HDR Wide MOS Feature Schema	Proposed
ADR-0650	Add a signal-mix audit CLI	Accepted	2026-05-20
ADR-0651	Preserve normalized ffprobe HDR/display metadata on every CHUG feature row	Accepted	ai, chug, hdr, metadata, training
ADR-0652	Add cheap decoded-luma blur/noise/grain primitives to CHUG feature rows	Accepted	ai, chug, hdr, features, training
ADR-0653	CHUG Display Profile Training	Proposed
ADR-0654	Predictor Preserves Saliency Signals	Accepted
ADR-0655	Saliency feature materializer for existing AI corpus tables: enrich JSONL/parquet rows with `saliency_mean`, `saliency_var`, and row-level status before predictor / MOS-head retrains consume the signal.	Accepted	ai, saliency, training-data, docs
ADR-0656	External-bench fork wrappers now emit registry competitor keys (`fork-fr-regressor`, `fork-nr-metric`) in `summary.competitor` so `compare.py` can validate and aggregate the fork's own FR/NR competitors instead of skipping them as schema mismatches. Shell-wrapper smoke tests use a fake `vmaf-tune` binary to pin the contract without installing external competitors.	Accepted	ai, testing, tooling, fork-local
ADR-0657	Second-Opinion Feature Materializer	Accepted	ai, signal-mix, mos, external-bench
ADR-0658	Project modernization audit	Accepted	2026-05-20
ADR-0659	Modernization audit false-positive filter	Accepted	2026-05-20
ADR-0660	Tiny-AI extractors check DNN availability before model paths	Accepted	2026-05-20
ADR-0661	AI training sidecars record shared run provenance. MOS-head manifests include the user-facing entrypoint, normalized CLI arguments, named input/output paths, file hashes where available, and shared-trainer identity for CHUG wrapper runs.	Accepted	ai, tooling, manifests, training
ADR-0662	Vulkan motion lavapipe parity. Routes automatic Vulkan `motion` dispatch and CI parity through the canonical `integer_motion_vulkan` twin while preserving `motion_vulkan` as an explicit compatibility name; restores `integer_motion_vulkan`'s CPU/CUDA-compatible `debug=true` default; corrects CUDA / SYCL / Vulkan `motion_v2` mirror padding to the CPU reflect-101 literal (`2*size - idx - 2`); and re-enables `motion` + `motion_v2` in the lavapipe parity matrix.	Accepted	vulkan, cuda, sycl, ci, feature-extractor, numerical-correctness
ADR-0663	Adds an explicit MOS label materializer for feature tables and changes real MOS-head training to fail instead of silently synthesizing when no labelled rows load.	Accepted	ai, mos, training, corpus
ADR-0664	Install CUDA 13.2.0 directly in the Windows MSVC + CUDA CI leg after the wrapper action failed before setup	Accepted	ci, build, cuda, windows, github-actions
ADR-0665	Fast-NR calibration sidecar writes require a minimum sample count and positive NR-vs-FR PLCC gate; weak fits report but do not update tune inputs unless explicitly overridden.	Accepted	ai, vmaf-tune, calibration, fast-nr
ADR-0666	`vmaf-tune report` renders run-specific Quick takeaways before detailed charts so profile cards state the best row, coverage gaps, ladder span, and per-shot CRF spread for non-expert readers.	Accepted	vmaf-tune, reports, ux, encoder-profile
ADR-0667	`vmaf-tune --score-backend auto` now uses native-first GPU priority (`cuda -> sycl -> hip -> vulkan -> cpu`) and accepts explicit `hip` backed by ROCm availability probes.	Accepted	vmaf-tune, gpu, cuda, sycl, hip, vulkan, fork-local
ADR-0668	AI FULL_FEATURES table builders emit replayable manifest sidecars with shared `run_provenance` for extraction, parquet combination, and metadata enrichment outputs.	Proposed	ai, training, provenance, parquet
ADR-0669	AI corpus JSONL merge and aggregation scripts emit replayable manifest sidecars with shared `run_provenance`.	Proposed	ai, training, provenance, corpus
ADR-0670	Legacy AI corpus/extraction scripts emit replayable manifest sidecars with shared `run_provenance`.	Proposed	ai, training, provenance, corpus
ADR-0671	Adds the missing `ai/scripts/export_u2netp_mirror.py` operator bridge for the ADR-0412 U2NetP mirror scaffold. The exporter imports an audited local upstream U-2-Net checkout, loads `u2netp.pth`, exports ONNX opset 17 as `input` -> `saliency_map`, requires Apache-2.0 license evidence, and writes a `u2netp-mirror-export-manifest-v1` sidecar with shared `run_provenance`. The generated ONNX remains a signed release asset, not a committed file, and does not flip the `mobilesal` default.	Accepted	ai, dnn, u2netp, saliency, onnx, provenance, fork-local
ADR-0672	Exposes `ai/scripts/materialize_saliency_features.py` temporal reducer controls (`mean`, `ema`, `max`, `motion-weighted`), records saliency model / reducer / EMA metadata on newly materialized rows, and preserves unknown provenance on skipped existing saliency rows.	Accepted	ai, saliency, materializer, provenance, fork-local
ADR-0673	Adds a manifest-driven batch runner for saliency feature materialization so refreshed AI tables can share defaults, per-table overrides, audits, and one provenance-backed batch report.	Accepted	ai, saliency, materializer, provenance, fork-local
ADR-0674	Adds a manifest-driven batch runner for second-opinion feature materialization so refreshed AI tables can join external NR/MOS scorer sidecars with per-table audits and one provenance-backed batch report.	Accepted	ai, second-opinion, materializer, provenance, fork-local
ADR-0675	Adds a manifest-driven batch runner for MOS-label materialization so refreshed AI feature tables can join subjective labels with per-table audits and one provenance-backed batch report.	Accepted	ai, mos, materializer, provenance, fork-local
ADR-0676	Requires MOS corpus JSONL adapters to emit replayable manifest sidecars with run counters, effective ingest config, path evidence, and ADR-0661 provenance.	Accepted	ai, mos, corpus, provenance, fork-local
ADR-0677	KoNViD-1k and YouTube-UGC fetch helpers now write deterministic ADR-0661 run-manifest sidecars before corpus conversion. `fetch_konvid_1k.py` defaults to `<root>/fetch_manifest.json`; `fetch_youtube_ugc_subset.py` keeps its existing stem content manifest and writes `<manifest>.run-manifest.json` by default.	Accepted	ai, datasets, provenance, training, fork-local
ADR-0678	AI scripts now have `aiutils.run_manifest.write_run_manifest()` plus a Claude `/ai-run-manifest` workflow for standalone artifact sidecars, while stable reports may continue embedding `build_run_provenance()`.	Accepted	ai, provenance, docs, agents
ADR-0679	CI Draft Auto-Merge Gate	Accepted	ci, github-actions, merge-train, adr, fork-local
ADR-0680	AI batch scripts now share parser/raw-argv boilerplate through `aiutils.cli_helpers` while keeping table-specific manifest schemas local to each runner.	Accepted	ai, cli, provenance, agents
ADR-0681	AI Script Bootstrap Helper	Accepted	ai, cli, provenance, agents
ADR-0682	Tiny-AI Netflix corpus training scaffold — 2026-05-22 prep scope	Accepted	ai, training, fork-local, onnx, mcp, docs
ADR-0683	Replace banned functions (`sprintf`/`strcpy`) in vendored MCP cJSON with `snprintf`/`memcpy`; add vendor-policy `AGENTS.md`	Accepted	mcp, vendored, security, c, fork-local
ADR-0684	Pre-rebase worktree-drift guard — companion git hook to ADR-0332, refuses `git rebase` from main checkout while agent worktrees are active	Accepted	agents, ci, git-hooks, fork-local
ADR-0685	Tiny-AI Netflix corpus training scaffold — 2026-05-27 prep scope	Accepted	ai, training, fork-local, onnx, mcp, docs
ADR-0686	VMAFX rebrand and aggressive modernization — umbrella ADR covering rename, Phase 1–4 plan, and multi-language strategy	Proposed	meta, vmafx, rebrand, modernization
ADR-0687	CHUG HDR MOS head — held-out test partition validator	Accepted	ai, training, hdr, validation, fork-local
ADR-0688	HIP wave32 carry-preserving int64 reduction for VIF and motion kernels	Accepted	hip, vif, motion, numerics, rocm, fork-local
ADR-0689	VMAFX CI matrix deduplication — remove redundant job axes	Accepted	ci, build, vmafx, fork-local
ADR-0690	VMAFX binary and AI tool aliases (`vmafx`, `vmafx-tune`, `vmafx-mcp`)	Accepted	cli, build, vmafx, aliases, fork-local
ADR-0691	VMAFX Phase 1C — drop legacy build paths (`build-cpu`, `build-cuda`, `build-all` → unified `build/`)	Accepted	build, meson, vmafx, phase1, fork-local
ADR-0692	Bump C standard to C23 (VMAFX rebrand Phase 1D); fix `test_propagate_metadata` prototype mismatch; add `-Wimplicit-fallthrough`.	Accepted	build, c, standards, meson, fork-local, vmafx-rebrand
ADR-0694	Tighten clang-tidy enforcement and confirm ASan/UBSan/TSan/MSan as required CI gates	Accepted	ci, lint, sanitizers, clang-tidy, fork-local
ADR-0696	`--netflix-compat` flag for restoring legacy VMAF defaults	Accepted	cli, compat, vmafx, fork-local
ADR-0698	VMAFX production Dockerfile — multi-arch, image signing, SBOM	Proposed	docker, build, security, vmafx
ADR-0699	VMAFX Helm chart and Kubernetes manifests with 3-vendor GPU device-plugin support	Proposed	k8s, helm, gpu, vmafx
ADR-0700	VMAFX repo layout: rename `libvmaf/` → `core/` and `python/vmaf/` → `compat/python-vmaf/`; ABI unchanged	Accepted	build, workspace, meta, vmafx
ADR-0701	vmafx-server HTTP transport + observability foundation (`/healthz`, `/readyz`, `/metrics`, `/v1/score`)	Proposed	server, http, observability, cloud-native, vmafx
ADR-0702	VMAFX Phase 4 multi-language modernization: Go 1.23 workspace, Rust workspace, C++23 policy	Proposed	go, rust, cpp23, language-policy, modernization, vmafx
ADR-0703	vmafx-server in Go — gRPC + HTTP/JSON, libvmaf cgo wrapper, Prometheus metrics, distroless build	Proposed	server, go, grpc, http, observability, vmafx
ADR-0704	vmafx-mcp Go port: single static binary, MCP Go SDK v1.6.1, all 15 Python tools ported with byte-for-byte schema parity	Accepted	mcp, go, build, agents
ADR-0705	vmafx-tune Go port Stage 1: `compare` subcommand as `vmafx-tune-go`; `pkg/encoder`, `pkg/bisect`, `pkg/report`	Accepted	go, vmafx-tune, language-modernization, cli, phase4, fork-local
ADR-0706	Rust `vmafx-sys` FFI crate: bindgen-generated raw bindings + thin safe wrapper for libvmaf	Accepted	rust, bindings, ffi, build
ADR-0707	TAD (Temporal Absolute Difference) — first Rust feature extractor; proves cbindgen → Meson → libvmaf.so integration	Accepted	rust, build, metrics, feature-extractor, phase4, fork-local
ADR-0708	C++20 internals pilot: convert `metadata_handler.c` to `.cpp` with RAII `unique_ptr` linked-list teardown; establish per-file C++20 migration recipe for Wave 1–3	Accepted	build, c++, cpp23, refactor, internals, fork-local, vmafx-rebrand
ADR-0709	VMAFX Phase 4b — distributed video-quality, encoding, and ML platform: controller/node/operator, ffmpeg, rclone, eBPF	Proposed	go, k8s, operator, controller, ffmpeg, rclone, ebpf, onnx, phase4b, fork-local
ADR-0710	VMAFX CI Slim-Down v2 — one build per OS, ASan/UBSan/MSan/TSan, GitHub-hosted-only runners	Accepted	ci, build, sanitizers, fork-local
ADR-0711	vmafx-controller Phase 4b.1 — Go service: gRPC + HTTP, in-memory job queue, persistent node registry, FIFO scheduler	Accepted	go, controller, phase4b, grpc, http, fork-local
ADR-0712	IDE config audit and refresh for multi-language post-rebrand VMAFX: clangd, gopls, rust-analyzer, Python LSP	Accepted	ide, clangd, gopls, rust-analyzer, build, fork-local
ADR-0713	vmafx-node Go worker binary — ffmpeg decode/encode/score pipeline, gRPC heartbeat to controller	Proposed	go, node, ffmpeg, phase4b, fork-local
ADR-0714	vmafx-operator kubebuilder skeleton + CRDs: `VmafxJob`, `VmafxNode`, `VmafxModelTraining`; Stage 1 reconcilers; Helm integration	Accepted	go, k8s, operator, crd, phase4b, fork-local
ADR-0717	vmafx-node ffmpeg version policy: pin to latest stable tag (n8.2); multi-stage Dockerfile with cpu/cuda/rocm/sycl variants	Accepted	node, ffmpeg, docker, phase4b, fork-local
ADR-0761	C++23 Wave 8 — opt.cpp activation + read_json_model.cpp	Accepted	`build`, `c++`, `cpp23`, `refactor`, `internals`, `fork-local`
ADR-0763	CUDA `adm_decouple` kernels: `__ldg()` F3 fix	Accepted
ADR-0767	Phase 4b.8 — libvmaf C ABI Break for VMAFx v4.0.0	Proposed	`api`, `abi`, `phase4b`, `breaking-change`, `v4`, `ffmpeg-patches`, `fork-local`
ADR-0768	C++23 Wave 9 — picture_pool + gpu_picture_pool	Accepted	`build`, `c++`, `cpp23`, `refactor`, `internals`, `fork-local`, `vmafx-rebrand`
ADR-0770	vmafx-tune Go port — Stage 4 (`report` subcommand)	Accepted	`go`, `vmafx-tune`, `language-modernization`, `cli`, `phase4`, `fork-local`
ADR-0771	SIMD twin coverage inventory and gap prioritisation	Accepted	`simd`, `docs`, `planning`
ADR-0772	Rename `feature_extractor.c` to `feature_extractor.cpp`	Accepted	`cpp23`, `build`, `core`, `fork-local`
ADR-0773	CUDA ADM decouple-inline — `__ldg()` F3 fix on active path	Accepted	`cuda`, `performance`, `adm`, `fork-local`
ADR-0774	MCP server audit — path rename, subsample drop, schema drift, dead code	Accepted	mcp, server, audit, fork-local
ADR-0777	Thread-Safety Audit — CUDA / SYCL / HIP Backends	Accepted
ADR-0778	Picture pool / framesync lifecycle audit and targeted fixes	Accepted	`correctness`, `picture-pool`, `framesync`, `refcount`
ADR-0779	eBPF FUSE read-path bypass for vmafx-node rclone mounts	Proposed	`ebpf`, `node`, `rclone`, `performance`, `phase4b`, `fork-local`
ADR-0780	NOLINT Cluster Refactor — Slab Allocator, SYCL Stride, ADM Band-Size	Proposed	`ci`, `simd`, `cuda`, `sycl`, `hip`, `lint`
ADR-0781	Sidecar online training — SGD + EMA + replay buffer	Proposed	ai, sidecar, online-learning, k8s, vmafx-node, phase4b, fork-local
ADR-0782	OpenTelemetry tracing and metrics schema for the VMAFX platform	Accepted	`observability`, `go`, `platform`, `adr-0782`
ADR-0783	Kubernetes end-to-end integration test harness — kind + kuttl	Proposed	`ci`, `testing`, `k8s`, `github`
ADR-0784	AVX2 SIMD path for integer SSIM horizontal moment accumulation	Accepted	`simd`, `x86`, `avx2`, `ssim`, `performance`, `fork-local`
ADR-0786	vmafx-operator Stage 2 — reconciler loops, webhook validation, per-controller RBAC	Accepted	`go`, `k8s`, `operator`, `crd`, `controller-runtime`, `phase4b`, `fork-local`
ADR-0787	libvmaf Public API Error-Path Consistency Audit	Accepted
ADR-0788	Doxygen doc-comment and @thread-safety tags on public C-API	Accepted
ADR-0790	Containerfile layer optimization — merge apt layer, strip build artifacts, no-cache-dir pip	Accepted	`build`, `docker`, `containerfile`, `fork-local`
ADR-0793	Nightly Workflow Audit — TSan, Artifact Retention, Python Version	Accepted	`ci`, `nightly`, `sanitizers`, `artifacts`, `fork-local`
ADR-0794	Multi-Tenant Auth Gateway for vmafx-controller	Accepted	`security`, `controller`, `auth`, `multi-tenant`, `oidc`, `grpc`
ADR-0797	vmafx-server OpenAPI REST contract	Accepted	server, api, rest, openapi, swagger, go, vmafx-server
ADR-0802	CI Runner Image Standardization — Pin ubuntu-latest to ubuntu-24.04	Accepted	`ci`, `build`
ADR-0804	Add `vmaf_context_get_backend` — additive ABI introspection	Accepted	api, abi, backend, gpu, fork-local
ADR-0806	VmafFeatureDictionary caller-ownership contract	Accepted	`api`, `memory`, `testing`
ADR-0809	C++23 Wave 8 — CLI conversion (`cli_parse.c` → `.cpp`, `vmaf.c` → `.cpp`)	Accepted	`cpp23`, `build`, `cli`, `raii`, `fork-local`
ADR-0811	Security hardening — CodeQL Go coverage + codeql-config	Accepted	`ci`, `security`, `codeql`, `go`, `dependabot`, `ossf`
ADR-0812	Renovate — Go/Cargo grouping, schedule, and concurrent-PR cap	Accepted	`ci`, `build`, `deps`
ADR-0815	Distroless Dockerfiles for vmafx-operator and vmafx-node	Accepted	`docker`, `ci`, `release`, `operator`, `node`, `k8s`, `phase4b`, `fork-local`
ADR-0819	PR-time CI gate for dev/Containerfile	Accepted	`ci`, `build`, `workspace`
ADR-0844	float_adm AVX2/AVX-512 F2+F3 — double-precision and FP-contraction	Accepted	simd, bit-exactness, avx2, avx512, float_adm, build
ADR-0845	CUDA motion — multi-frame SAD batching to reduce per-launch overhead	Proposed	`cuda`, `performance`, `motion`, `fork-local`
ADR-0848	Per-Surface Documentation Compliance Audit — Session 2026-05-29	Accepted	`docs`, `compliance`, `process`, `per-surface-bar`, `fork-local`
ADR-0852	Wire speed_chroma_hip and speed_temporal_hip into HIP Build and Dispatch	Accepted	`hip`, `build`, `speed`, `feature`, `gpu`, `fork-local`
ADR-0854	Direct AVX-512 parity tests for motion kernels	Accepted
ADR-0858	C++23 conversion of `gpu_dispatch_env.c`	Accepted	build, c++, cpp23, refactor, internals, fork-local, gpu-dispatch
ADR-0873	ARM64 NEON bit-exactness audit — `-ffp-contract=off` carve-out scope	Accepted	`simd`, `arm64`, `neon`, `bit-exactness`, `build`, `ci`
ADR-0891	SIMD bit-exactness round-2 — unify SSIMULACRA 2 colour-matrix on FMA, extend `-fp-model=precise` to `libvmaf_feature_static_lib`	Accepted	`simd`, `build`, `bit-exact`, `icx`
ADR-0899	Bash strict-mode + trap-cleanup sweep across in-tree shell scripts	Accepted	`ci`, `agents`, `shell`, `hygiene`
ADR-0904	Pin `cargo-machete` ignore entries for `bindgen` / `cbindgen` build dependencies	Accepted	rust, build, ci, workspace
ADR-0908	Slow-test audit (2026-05-30) — no >30 s tests found; install `slow` marker as a future gate	Accepted	ci, testing, devx
ADR-0915	Enable clang-tidy `modernize-*` family with curated opt-outs; discharge top 15 findings (8 nullptr + 6 deprecated-headers + 1 use-auto) on fork-added C++ files	Accepted	lint, ci, cpp, quality-gate, fork-local
ADR-0860	Re-include Vulkan FFmpeg patches (0004 + 0006) as no-op shims for chain coherence — unblocks SYCL CI leg	Accepted	2026-05-30
ADR-0882	Fuzz target audit: add `fuzz_json_model` + `fuzz_dnn_sidecar` libFuzzer harnesses closing deferred targets #3 + #4 from Research-0083. First run surfaced a heap-buffer-overflow in `vmaf_model_destroy` via `parse_slopes` outrunning `feature_names`.	Accepted	2026-05-30
ADR-0861	Drop "and Claude (Anthropic)" from fork copyright lines (single notice `Copyright 2026 Lusoris` going forward); partially supersedes ADR-0025 / ADR-0105 format guidance.	Accepted	2026-05-30
ADR-0892	Conventional-Commits coverage + Changelog-fragment section hygiene — extend release-please root section list, move `perf/` + `performance/` fragments to `changed/perf-*`	Accepted	2026-05-30
ADR-0865	Sunset ANSNR (pre-VMAF metric): drop `ansnr` / `float_ansnr` feature extractors from all backends — back-dated to PR #38 merge (2026-05-28); closes ADR-0108 compliance gap caused by PR #38's wrong `Parent ADR-0709` cite	Accepted	2026-05-28
ADR-0871	SSIM SIMD dispatch installation must be pthread_once-guarded	Accepted	simd, threading, correctness, ssim, tsan
ADR-0869	Sanitizer-Pass Cleanup — CAMBI Option-Type Mismatch and AVX{2,512} ADM Signed-Shift UB	Accepted	`c`, `simd`, `sanitizer`, `correctness`, `cambi`, `adm`
ADR-0887	Reject JSON models whose per-feature arrays disagree on length	Accepted	security, parser, model, fuzz, hardening
ADR-0930	Helm chart NetworkPolicy default-deny + Pod Security Standards "restricted" baseline (opt-in NP, UID 65532, seccomp RuntimeDefault)	Accepted	2026-05-31
ADR-0927	OpenTelemetry traces + metrics Phase 1: adopt OTel Go SDK across all VMAFX Go services; pilot in `vmafx-controller` with `pkg/observability.InitOTel` helper, OTLP-to-collector export, head-based 1 % trace sampler, 60 s metric reader. Existing slog + Prometheus paths preserved. Per-service opt-in rollout for `vmafx-node`, `vmafx-server`, `vmafx-mcp`, `vmafx-tune` in later PRs.	Accepted	2026-05-31
ADR-0925	Introduce `pkg/registry.Store[K, V]` generic in-memory keyed store + `registry.Counter` constraint; refactor `cmd/vmafx-controller/nodes/Registry` to compose it and fold one of `pkg/observability.SetControllerSources`'s narrow interfaces into the generic constraint. Queue stays SQLite-backed.	Accepted	2026-05-31
ADR-0928	VmafPicture v2 — replace `void *priv` overlay with explicit `VmafBackendHandle backend` discriminator + typed `uintptr_t backend_handle`; dual-API window (12 months); SONAME 3→4 scheduled for VMAFX v4.0.0; design + scaffold header only in this PR	Proposed (2026-05-31)	api, abi, gpu, cuda, sycl, hip, metal, ffmpeg, rust, fork-local, vmafx-rebrand
ADR-0926	Parquet schema v2 — canonical column order, zstd-3, schema metadata	Accepted	ai, data, storage, parquet, k150k, chug
ADR-0924	Native bash pre-commit hook as opt-in alternative to the pre-commit framework (~10x faster on small commits; CI unchanged)	Accepted	2026-05-31
ADR-0923	Adopt BuildKit cache mounts and ccache across the container build matrix	Accepted	ci, build, container, performance
ADR-0931	MCP server — replace subprocess delegation with direct cgo (Phase 1: `vmaf_score` + `describe_model`, behind `VMAFX_MCP_DIRECT=1`)	Proposed	mcp, go, cgo, libvmaf, performance, vmafx, modernization
ADR-0929	Promote the safe wrapper layer out of `vmafx-sys` into a standalone `vmafx` crate; ship Phase 1 (`Context`, `Model`, `Picture`, `Score`, `Error`)	Accepted	rust, bindings, ffi, phase4, workspace, fork-local
ADR-0953	Doxygen public-API build is warning-clean	Accepted	`docs`, `ci`, `api`, `public-surface`
ADR-0959	Metal kernel parity coverage round 4 — closeout	Accepted	testing, metal, gpu, parity, regression-guard
ADR-0960	GPU runtime error-path leak fixes — round 25 (A.1 + A.2 + A.3)	Accepted	`cuda`, `memory`, `threading`, `correctness`, `fork-local`
ADR-0961	Controller queue — roll back PullWork on post-update Get failure (round-25 audit B.1)	Accepted	go, controller, correctness, queue, phase4b, fork-local
ADR-0963	ai/src: guard NaN propagation in eval + tune (round-25 audit C.1 + C.2)	Accepted	`ai`, `correctness`, `bisect`
ADR-0966	Fix dev/Containerfile post-ADR-0700 libvmaf → core paths (Round 26 audit C.1)	Accepted	`dev`, `docker`, `containerfile`, `rename`, `adr-0700`, `fork-local`
ADR-0968	CI scripts — rebrand-proof assertion-density grep + tempfile EXIT trap in changelog concat (Round 26 audit D.1 + D.2)	Accepted	`ci`, `build`, `docs`
ADR-0969	Helm chart: add seccompProfile default + fix node-deployment image helper (Round 26 audit B.1 + B.3)	Accepted	2026-05-31
ADR-0967	MCP HTTP transport security — add auth + body limit + safer bind default (Round 26 audit A.1)	Accepted	security, mcp, http, auth, hardening, fork-local
ADR-0922	Aggressive coverage ratchet: overall 37 → 60, critical 85 → 90, +5pp on every `PER_FILE_MIN` override; new per-PR coverage-delta gate (max 0.5pp drop on overall or any touched file); 30-day grace for in-flight PRs; exception process gated on new ADR superseding this one	Accepted	2026-05-31
ADR-0958	HIP kernel parity-test coverage round 4	Accepted	`hip`, `tests`, `gpu`, `coverage`, `fork-local`
ADR-0962	Controller fixes — implement StreamJobs snapshot and add reaper stop signal (round-25 audit B.3 + B.4)	Accepted	controller, grpc, go, correctness, goroutine, phase4b, fork-local
ADR-0956	CUDA kernel parity coverage — round 4 (last 5 uncovered kernels)	Accepted	testing, cuda, parity, fork-local, gpu-coverage
ADR-0964	Implement `speed_internal.c` and wire `speed_{chroma,temporal}_{hip,sycl}`	Accepted	`cuda`, `hip`, `sycl`, `feature-extractor`, `cross-backend-parity`, `speed`
ADR-0965	CUDA SpEED TU repair — align with current CudaFunctions table (closes T-CUDA-SPEED-TU-REPAIR-2026-05-31)	Accepted	`cuda`, `feature-extractor`, `cross-backend-parity`, `speed`, `fork-local`
ADR-0937	mkdocs ADR nav — per-hundred bucket layout + auto by-tag indexes (`scripts/docs/generate-adr-nav.sh`, `scripts/docs/generate-adr-by-tag.sh`)	Accepted	2026-05-31
ADR-0933	Add bidirectional `ScoreStream` RPC for per-frame VMAF scoring. Phase 1: schema + server stub. Phase 2 (2026-06-13): wired to libvmaf via the in-memory `pkg/libvmaf.StreamScorer`; both vmafx-server and vmafx-node serve it; pooled VMAF bit-identical to `ScoreDirect`.	Accepted	2026-05-31
ADR-0938	Feature-extractor coverage round 2 — seven CPU-side test executables	Accepted	tests, coverage, ci
ADR-0935	Wrap multi-step Go cleanup paths with `errors.Join`; standardise `slog` error keys	Accepted	go, observability, refactor, phase4b
ADR-0936	Replace final `os.path.*` usages with `pathlib.Path`; enable ruff `PTH` ruleset (flake8-use-pathlib) on fork-owned Python; per-file ignore for upstream Netflix trees. Fixes 12 PTH violations across 9 files.	Accepted	2026-05-31
ADR-0932	`iter.Seq[T]` companion APIs for single-pass Go collections	Accepted	`go`, `api`, `performance`, `ergonomics`
ADR-0939	Add `/add-mcp-tool`, `/add-k8s-resource`, `/audit-modernization` skills + consolidate bisect-* on shared `lib/bisect-common.sh`	Accepted	2026-05-31
ADR-0934	Migrate user-input dataclass configs (`TrainConfig`, `ModelMetadata`, `ManifestEntry`) to pydantic v2 BaseModel — declared validators, line-numbered errors, JSON-Schema export. Internal report / data-carrier dataclasses stay untouched per explicit triage rule. Sidecar JSON layout byte-identical. 667 ai tests pass.	Accepted	ai, validation, configs, modernization
ADR-0955	Fix two latent bugs in `compat/python-vmaf/`: scanf width-handler inversion + ProcessRunner locale `setdefault`	Accepted	2026-05-31
ADR-0954	Host-only unit test for shared GPU dispatch runtime	Accepted	test, gpu, cuda, hip, runtime
ADR-0952	Push test coverage on vendored libsvm + IQA paths the fork uses (2 new fast-suite executables, 29 assertions, +60 pp coverage)	Accepted	2026-05-31
ADR-0950	Fix symmetric "adm" vs "adm_hip" feature-name bug in test_hip_adm_parity; add `-ENOSYS` skip for `enable_hipcc=false` posture	Accepted	2026-05-31
ADR-0949	HIP motion3 parity test skips cleanly on `-ENOSYS` from `vmaf_read_pictures` when libvmaf was built with `enable_hipcc=false`; mirrors the existing no-HIP-device skip path. Fixes PR #443 audit-flagged `meson test --suite gpu` failure.	Accepted	2026-05-31
ADR-0951	GitHub Actions custom-action + reusable-workflow audit (2026-05-31): no `.github/actions/` exists, no `workflow_call:` workflows exist, all 24 workflows already SHA-pinned per ADR-0263; two abstraction candidates (composite `setup-build-deps`, reusable `meson-cpu-build.yml`) deferred.	Accepted	2026-05-31
ADR-0947	CUDA kernel parity coverage — round 3 (float-path twins + ssimulacra2)	Accepted	testing, cuda, parity, fork-local, gpu-coverage
ADR-0948	Feature-extractor coverage round 3 — targeted unit tests for low-coverage files	Accepted	`test`, `coverage`, `feature`
ADR-0945	HIP kernel parity-test coverage round 3	Accepted	`hip`, `tests`, `gpu`, `coverage`, `fork-local`
ADR-0914	Unified Python test orchestrator (nox at repo root) covering ai/ + mcp/ + tools/ + dev-llm/ + python/	Accepted	2026-05-31
ADR-0917	Adopt `cargo-deny` with a workspace `deny.toml` enforcing license allowlist, banned crates (`openssl-sys`, `native-tls`), RustSec advisories, and crates.io-only sources for the Rust workspace; wired into `rust-ci.yml` as a parallel job	Accepted	2026-05-31
ADR-0918	LLVM IR diff harness for bit-exact SIMD paths — opt-in `make ir-diff` gate that snapshots per-function IR under `testdata/ir-snapshots/` and fails on compiler-induced drift (FMA / FP-contract reassociation)	Accepted	simd, build, ci, perf, diagnostics, fork-local
ADR-0912	Pixel-format edge coverage at the libvmaf unit-test layer (4:2:2, 4:4:4, 10/12-bit PSNR/SSIM/CIEDE end-to-end smoke)	Accepted	test, coverage, fork-local, pixel-format, hbd
ADR-0913	Changelog fragment renderer — splice contract is `^## \[`, not `^##`: fixes 23 k+ line drift from PR #332 / #383 / #401 root cause (fragment-internal `##` headers tripping the sentinel); regenerated CHANGELOG.md from 59 757 → 15 030 lines	Accepted	2026-05-31
ADR-0719	vmafx-node rclone Integration — Remote-Asset Streaming Without Disk Materialisation	Accepted	architecture, go, node, rclone, storage, ffmpeg, phase4b, fork-local
ADR-0720	C++23 Wave-1 Pilot — `mem.c` conversion	Accepted	build, c, cpp23, refactor, internals, fork-local, vmafx-rebrand
ADR-0721	C++23 Wave 1 pilot: convert `opt.c` → `opt.cpp` using `std::optional<T>` for parse helpers; `extern "C"` guards; `[[nodiscard]]` on public entry point; isolated `opt_cpp23_lib` static-lib pattern (ADR-0708 playbook).	Accepted	2026-05-28
ADR-0723	C++23 Pilot — `fex_ctx_vector.c` Conversion (Wave 2)	Accepted	build, c, cpp23, refactor, internals, fork-local, vmafx-rebrand
ADR-0725	C++23 Pilot — `log.c` conversion (real C++23, supersedes ADR-0722)	Accepted	build, c, cpp23, refactor, internals, fork-local, vmafx-rebrand
ADR-0726	Drop Vulkan backend	Accepted	vulkan, gpu, backend, build, breaking, fork-local
ADR-0727	C++23 Wave 2: project-wide `cpp_std=c++23` + `dict.c` → `dict.cpp` with `std::expected`, `std::string_view`, `[[nodiscard]]`; toolchain floor gcc >= 13 / clang >= 16	Accepted	2026-05-28
ADR-0728	Sunset Legacy Native Build Modes — Phase 4b.9 Follow-On	Accepted	ci, build, vmafx, breaking
ADR-0729	C++23 Wave 3 — feature_name, picture_copy, model	Accepted	build, cpp23, refactor
ADR-0730	vmafx-tune Go port — Stage 2 (ladder subcommand)	Accepted	go, vmafx-tune, language-modernization, cli, phase4, fork-local
ADR-0731	C++23 Wave 3 Part B — psnr_tools, luminance_tools, mkdirp	Accepted	build, cpp23, modernization
ADR-0733	C++23 Wave 4 — output writers (XML, JSON, CSV, subtitle)	Accepted	build, c++, cpp23, refactor, internals, fork-local, vmafx-rebrand
ADR-0735	C++23 Wave 5 — cpu, ref, thread_locale	Accepted	build, c++, cpp23, refactor, internals, fork-local
ADR-0738	Bump local CUDA toolkit pin to 13.3 + R610 minimum driver (partial — CI deferred)	Accepted	cuda, build, container, ci, deps
ADR-0743	CUDA VIF filter1d ncu-driven performance optimizations	Accepted	cuda, performance, vif
ADR-0744	CUDA adm_cm `__launch_bounds__(128, 8)` register reduction (ms_ssim_decimate smem tiling reverted)	Accepted	cuda, performance, adm, ms_ssim, occupancy
ADR-0746	integer_adm_cuda — emit integer_adm3 + integer_aim (parity with CPU)	Accepted	cuda, integer_adm, aim, adm3, parity
ADR-0747	CUDA `extern "C"` invariant for host-looked-up kernels	Accepted
ADR-0749	Sunset VmafLegacyQualityRunner (float-path runner)	Accepted	python, quality-runner, breaking-change, cleanup
ADR-0750	Hardware Measurement Verdict for PR perf/cuda-ms-ssim-decimate-adm-cm-ncu-driven	Accepted	cuda, performance, ms_ssim, adm_cm, measurement
ADR-0752	Multi-Resolution Performance Benchmark Baseline	Accepted
ADR-0753	Runtime Resolution-Aware CUDA Kernel Variant Dispatch	Accepted	cuda, perf, build
ADR-0754	CUDA SSIM `vert_combine`: `__ldg()` + `__launch_bounds__` + pinned-host leak fix	Accepted
ADR-0755	C++23 Wave 7 — drop orphan `cpu.c`, activate `cpu.cpp`	Accepted	cpp23, build, core, fork-local
ADR-0756	CUDA F3 struct-by-value kernel audit (scope + dispatch order)	Accepted	cuda, perf, research
ADR-0757	CUDA MS-SSIM `ms_ssim_vert_lcs` + `ms_ssim_horiz`: `__ldg()` + `__launch_bounds__` (F3 fix #2)	Accepted
ADR-0759	HIP ADM — AdmBufferHip passed by pointer (F3 fix)	Accepted	hip, performance, cuda, kernel, adm, fork-local
ADR-0760	CUDA motion kernel multi-resolution ncu profiling methodology	Accepted	cuda, perf, research
ADR-0762	CUDA CIEDE2000 8bpc/16bpc — `__ldg()` read-only cache routing (F3 fix)	Accepted	cuda, performance, ciede, fork-local
ADR-0775	DNN ORT Backend Audit Findings	Accepted	dnn, onnx, ort, thread-safety, correctness, fork-local, research
ADR-0792	Env-var overrides for hardcoded YUV and testdata paths	Accepted	workspace, ci, testdata
ADR-0795	Clarify and harden VmafFeatureExtractor.prev_ref thread-safety invariant	Accepted	threading, feature-extractor, batch-threading, correctness
ADR-0810	ADR-0108 Six-Deliverables Compliance Audit (2026-05-29) + D3 Gap Fixes	Accepted	docs, agents, process
ADR-0839	C++23 wave — shadow-identifier and implicit-cast cleanup	Accepted	cpp23, lint, core, sycl, fork-local
ADR-0840	Fix cu_state leak on import failure and gpu_dispatch_env TOCTOU	Accepted	cuda, security, framework, ci
ADR-0841	Environment variable reference page and canonical naming	Accepted	docs, sycl, cuda, ai, workspace
ADR-0853	Remove dead debug-print macros from motion_avx2.c	Accepted	simd, lint, avx2, cleanup, fork-local
ADR-0911	`__init__.py` export-completeness audit — `__all__` + SPDX headers across 8 fork-added Python packages	Accepted	2026-05-31
ADR-0910	Project-wide `codespell` config + skip-list policy: ignore Netflix-author / vendored / frozen-ADR files; ignore domain acronyms (ANE, HSA, SME, CANN, COO, …); 3 typo fixes (CONTRIBUTING.md, docs/metrics/cambi.md ×2)	Accepted	2026-05-31
ADR-0907	Wall-clock perf regression gate over the ADR-0752 multi-resolution baseline. Adds `scripts/perf/check-regression.py` (stdlib-only) and a CPU-only CI job in `tests-and-quality-gates.yml` that runs `bench-multi-resolution.sh` and asserts that no `(resolution, backend, metric)` cell regresses by > 5% wall-clock vs the committed `testdata/perf_multi_resolution.json` baseline. Replaces the silently-broken `bench_all.sh --backend=cpu --snapshot-only --tolerance-ulp=2` invocation that was a no-op (the script does not parse those flags). `continue-on-error: true` for one release cycle so cross-runner variance data can inform whether the 5% tolerance is right before the step is promoted to a required check. GPU lanes deferred to a follow-up ADR once self-hosted runners stabilise.	Proposed	ci, performance, regression-gate, fork-local, testing
ADR-0905	`.gitignore` + `.github/workflows/` staleness audit — drop dead rules, rewire post-ADR-0700 paths, no workflow removals	Accepted	2026-05-30
ADR-0903	Wire `codecov/codecov-action@v6.0.1` (SHA-pinned) into both `Coverage Gate` jobs in `tests-and-quality-gates.yml`. Uses fork-aware OIDC (no `CODECOV_TOKEN` secret); `fail_ci_if_error: false` so Codecov outages do not double-gate the gcovr threshold check; flag-tagged `cpu` vs `gpu` to mirror the two-job split. Closes the gap PR #383 documented ("Codecov badge intentionally NOT added: no Codecov upload step exists in any workflow").	Accepted	2026-05-30
ADR-0902	Signing and attestation audit (2026-05-30): existing Sigstore + SLSA + SBOM coverage is strong; close three closeable gaps by adding `actions/attest-build-provenance@v4.1.0` to all 5 container build jobs, having the post-push smoke-test verify the cosign signature before pulling, and expanding `docs/development/release.md` with copy-pasteable consumer verification recipes. Tag signing, DCO sign-off, Helm chart signing, and standalone Go binary releases are explicitly out of scope.	Accepted	2026-05-30
ADR-0901	Governance file audit: add `GOVERNANCE.md` + `MAINTAINERS.md`, extend CODEOWNERS for fork-local subtrees, document ADR-0108 in `CONTRIBUTING.md`	Accepted	governance, docs, meta
ADR-0893	Pre-commit config audit — 2026-05-30 (forbid-new-submodules + isort/ruff bumps)	Accepted	ci, lint, hygiene, pre-commit
ADR-0889	Vendored libsvm 3.24 audit — close header-row-ordering oob, document upstream-version policy	Accepted	2026-05-30
ADR-0890	CI concurrency + cost audit follow-up to PR #301: concurrency block + ccache on `ffmpeg-integration.yml`, ccache on `sanitizers.yml`, `paths-ignore` on `security-scans.yml`, early file-delta skip on `lint-and-format.yml::clang-tidy`	Accepted	2026-05-30
ADR-0884	SYCL kernel coverage round 2 — five new CPU-vs-SYCL parity gates (adm, ciede, integer_ssim, float_ms_ssim, motion_v2) at ADR-0214 places=4; follows round 1 / ADR-0868	Accepted	2026-05-30
ADR-0886	CUDA kernel parity test coverage — round 2 gap-fill	Accepted	testing, cuda, gpu, parity, coverage
ADR-0883	HIP kernel parity-test coverage round 2	Accepted	`hip`, `tests`, `gpu`, `coverage`
ADR-0880	Remove three unreferenced fork-added testdata artifacts: `check_borders.py` (one-off ADM border debug), `compare_a380.py` (superseded by `compare_combined.py`), `scores_sycl_b580_576_mq.json` (orphan slim-schema snapshot)	Accepted	2026-05-30
ADR-0878	Trivy container scan baseline: production Dockerfiles add `USER nonroot:nonroot` (UID 65532) to every final stage; clears 2 HIGH DS-0002 findings on `docker/Dockerfile.production` + `docker/Dockerfile.production-gpu`; `dev/Containerfile` keeps `USER root` (intentional dev sandbox); image-CVE scan blocked on `MANIFEST_UNKNOWN` for unpublished `ghcr.io/vmafx/vmafx:*` (follow-up)	Accepted	2026-05-30
ADR-0876	Adopt `<inttypes.h>` PRI macros (`PRId64` / `PRIu64` / `PRIx64`) for fixed-width integer printf formatting in fork-added C / C++ — eliminates Windows-LLP64 truncation bug class (CERT FIO47-C, MISRA 21.6)	Accepted	2026-05-30
ADR-0877	Fork-added MS-SSIM decimate dispatcher — convert `return -1` (malloc-failure) to `-ENOMEM`	Accepted	2026-05-30
ADR-0875	GitHub Actions hardening audit (2026-05-30): SHA-pinning confirmed across 22 workflows; backfill top-level `permissions: contents: read` on `go-ci.yml` + `rust-ci.yml`; add `persist-credentials: false` to 5 checkouts in `sanitizers.yml` + `supply-chain.yml`.	Accepted	2026-05-30
ADR-0874	Name magic numbers in fork-added C surfaces (CERT INT07-C closeout pass 1) — adds ~20 named `#define` constants across `core/src/mcp/`, `core/src/picture.c`, `core/src/cuda/picture_cuda.c`, `core/src/libvmaf.c`. Rename-only; no numeric values changed.	Accepted	2026-05-30
ADR-0872	POSIX I/O EINTR-retry + return-value audit on fork-added C: two MCP drain loops now retry on `EINTR` (silent stream desync under signal pressure); seven discarded `close(2)` returns marked `(void)`-cast for Power-of-10 rule 7 hygiene.	Accepted	2026-05-30
ADR-0870	Add `deploy/helm/vmafx/values.schema.json` (enforces `workload` / `gpu.vendor` / `storage.mode` enums, blocks sibling-key typos via `additionalProperties: false`); fix `dev/Containerfile` ADR-0700 path drift (`COPY libvmaf/` → `COPY core/` + new `COPY compat/`, two `cd libvmaf` → `cd core`, `.dockerignore` `core/build*/` siblings).	Accepted	2026-05-30
ADR-0868	GPU backend kernel parity-test coverage gap-fill	Accepted	tests, cuda, hip, sycl, metal, coverage
ADR-0970	test_gpu_picture_pool.c: remove unused malloc + dead code (Round 27 audit D.3 + D.4)	Accepted	`testing`, `cuda`, `memory`, `cleanup`, `fork-local`
ADR-0971	Test suite: NULL-check malloc in 3 test files (Round 27 audit D.1)	Accepted	`testing`, `correctness`, `asan`, `fork-local`
ADR-0972	Public headers — replace ISO-reserved `__VMAF___` include guards with `LIBVMAF__H` (Round 27 audit A.1, SEI CERT DCL37-C)	Accepted	`api`, `headers`, `cert-c`, `lint`, `compatibility`
ADR-0973	Master CI fixes — Metal MS-SSIM fixture dim + ssimulacra2 icpx XYB bit-exactness	Accepted	ci, simd, metal, ssimulacra2, icpx, bit-exactness, tests
ADR-0975	Use NamedTemporaryFile in _run_vmaf_score to eliminate task-name collision risk	Accepted	`mcp`, `security`, `concurrency`
ADR-0976	dnn sidecar: delete dead `has_norm` / `norm_mean` / `norm_std` / `expected_min` / `expected_max` / `has_range` fields and three consumer branches (per ADR-0114 deferred cleanup); plug partial-allocation leak in `extract_string_array` on every error path	Accepted	2026-05-31
ADR-0977	core/tools input-reader safety — Y4M malloc-NULL check, YUV/Y4M size_t cast, bench GPU-state leaks	Accepted	security, bug, tools, audit
ADR-0978	vmafx-server + pkg/score bug-audit — shutdown leak, gRPC Send-EOF surfacing, HTTP body cap, panic recovery	Accepted	security, bug, audit, go, grpc, http, server
ADR-0980	Markdown-lint full-ruleset discharge — content fixes + per-file scoped disables	Accepted	docs, lint, ci, policy, fork-local
ADR-0983	gosec sweep — fix all findings + add CI gate	Accepted	security, ci, go, fork-local
ADR-0862	K150K extractor — .done vs parquet consistency check on restart	Accepted	`ai`, `pipeline`, `durability`, `k150k`
ADR-0879	Python dependency freshness sweep — bump nine stale floors across ai/, mcp-server/, dev-llm/, tools/, leave ceilings + hash-pinning to follow-up	Accepted	2026-05-30
ADR-0881	Coverage-overrides audit: tighten `tiny_extractor_template.h` 10 % → 75 % (67 pp slack recovered); codify tighten/keep/remove rule + quarterly audit cadence	Accepted	2026-05-30
ADR-0888	Pyright strict audit of fork-local Python packages	Accepted	ai, mcp, python, type-safety, ci
ADR-0946	SYCL kernel coverage round 3 — 5 CPU vs. SYCL parity gates for `float_psnr_sycl`, `float_adm_sycl`, `float_vif_sycl`, `float_motion_sycl`, `psnr_hvs_sycl` at ADR-0214 places=4 (1e-4)	Accepted	2026-05-31
ADR-0957	SYCL kernel coverage round 4 — 4 CPU vs. SYCL parity gates for `float_moment_sycl`, `speed_chroma_sycl`, `speed_temporal_sycl` (places=4 / 1e-4) and `ssimulacra2_sycl` (5e-3 per ADR-0214 FEATURE_TOLERANCE); closes the SYCL kernel-coverage backlog at 18/18 = 100 %	Accepted	2026-05-31
ADR-0984	Port Netflix Upstream May–Jun 2026 (5 commits)	Accepted
ADR-0985	SYCL parity divergence investigation — float_ssim + ssimulacra2 on Arc A380	Proposed	sycl, parity, ci, gpu, precision, arc
ADR-0986	docs.yml — add PR trigger to surface doc-substance gaps before merge	Accepted	ci, docs, fork-local
ADR-0987	AVX-512 path for float_moment feature extractor	Accepted	`simd`, `avx512`, `performance`, `float_moment`, `fork-local`
ADR-0988	Route strict-JSON helpers through `vmaftune.jsonio` across vmaf-tune	Accepted	`refactor`, `json`, `vmaf-tune`, `mcp`
ADR-0989	Wire motion_add_uv through integer_motion_sycl (UV blur+SAD on device, per-plane normalization on host); add motion_add_uv ENOTSUP stub to CUDA/Vulkan/HIP/Metal; upgrade motion_five_frame_window rejections to WARNING	Accepted	2026-06-03
ADR-0990	Restore double-precision L/C/S accumulation in CUDA ms_ssim_vert_lcs — promotes per-pixel L/C/S and warp/block reductions from float to double (ADR-0139 pattern), fixes test_cuda_float_ms_ssim_parity places=4 gate	Accepted	2026-06-03
ADR-0991	Fixes the missing `pythonpath = ["scripts"]` in `ai/pyproject.toml` so batch materializer tests pass from the repo root, and commits a smoke-run scaffold under `ai/testdata/smoke-second-opinion-batch/`.	Accepted	ai, second-opinion, materializer, testing, smoke, fork-local
ADR-0992	Ship corpus-specific MOS-label batch manifests for KonViD and CHUG (ai/configs/); add smoke tests that validate manifest schema and run end-to-end with synthetic data; fix pre-existing sys.path bug in test_batch_materialize_mos_labels.py	Accepted	2026-06-03
ADR-0994	Fix Coverage Gate build break — `integer_motion.c` compile error	Accepted	`ci`, `coverage`, `build`, `motion`
ADR-0995	Shorten CI workflow and job display names	Accepted	`ci`, `github-actions`, `docs`
ADR-0996	eBPF FUSE bypass for rclone zero-copy path in vmafx-node	Proposed	`ci`, `go`, `ebpf`, `rclone`, `performance`, `security`, `supply-chain`
ADR-0999	Guard `<stdatomic.h>` includes in C++ translation units (GCC 14 + Clang-18 fix)	Accepted	`build`, `ci`, `cpp`, `atomics`, `tsan`, `fork-local`
ADR-1000	Tech-stack badges added to README (version pins, GPU/SIMD, container); Go version bumped go.mod + go-ci.yml 1.23→1.26.4; Rust edition 2024 deferred (bindgen 0.69 extern-block blocker)	Accepted	2026-06-04
ADR-1001	SYCL parity round 5 — CAMBI CPU vs. SYCL parity gate	Accepted	`sycl`, `test`, `gpu`, `parity`, `kernel-coverage`, `cambi`, `fork-local`
ADR-1002	Bump Rust workspace to edition 2024 and bindgen to 0.72	Accepted	`rust`, `build`, `workspace`
ADR-1003	Bump project-wide C++ standard from c++11 to c++23 in core/meson.build — lands the ADR-0727 decision that was never applied to the default_options entry; fixes pre-existing test_feature_collector_coverage linker failure	Accepted	2026-06-04
ADR-1004	HIP kernel parity-test coverage round 5 — 2 CPU vs. HIP parity gates for `speed_chroma_hip` and `speed_temporal_hip` at ADR-0214 places=4 (1e-4), closing the round-4 deferral after ADR-0964 / PR #465 resolved the `speed_internal.c` link defect; lifts HIP extractor parity coverage to 17/17 (100%) for all non-deferred kernels	Accepted	2026-06-04
ADR-1005	Perf Gate Advisory Mode and Baseline Refresh Documentation	Accepted	`ci`, `perf`
ADR-1007	Fix C string/numeric UB cluster — NULL strcmp, size_t underflow, signed-shift overflow, snprintf truncation	Accepted	`core`, `security`, `c`, `ub`
ADR-1008	Fix C lifecycle bugs — pic_cnt double-increment, div-by-zero in pooled score, silent test failures	Accepted	`core`, `correctness`, `test`, `c`
ADR-1009	Fix Go shutdown / goroutine correctness — WaitForShutdown unconditional block, unbounded GracefulStop	Accepted	`go`, `server`, `controller`, `shutdown`, `correctness`
ADR-1010	MCP server JSON parse guards — vmaf output and ffprobe output	Accepted	`mcp`, `python`, `error-handling`, `correctness`
ADR-1011	Add `static` to TU-internal CUDA helper functions — VIF, ADM, motion	Accepted	`core`, `cuda`, `correctness`, `build`
ADR-1012	Go queue state-machine guards — PullWork AND-status, ReportResult idempotency	Accepted	`go`, `controller`, `queue`, `correctness`, `concurrency`
ADR-1014	Prometheus registry isolation for SetControllerSources	Accepted	`security`, `observability`, `go`
ADR-1017	Go operator controller resource-allocation fixes	Accepted	`security`, `k8s`, `go`, `operator`
ADR-1018	MCP exec.CommandContext + controller gRPC panic recovery	Accepted	`security`, `mcp`, `go`, `grpc`
ADR-1020	acq_rel memory ordering on ref-count decrement + mutex-destroy-after-unlock + picture-pool unlock ordering	Accepted	`correctness`, `threading`, `core`
ADR-1021	Constant-time session-token comparison + JWT nbf-claim validation	Accepted	`security`, `auth`
ADR-1022	Cast `dst_buf_read_sz` operands to `size_t` in y4m_input to prevent signed-integer overflow	Accepted	`core`, `security`, `correctness`, `tools`, `c`
ADR-1023	MCP server asyncio correctness — async wrappers for blocking I/O	Accepted	`mcp`, `asyncio`, `python`, `correctness`
ADR-1024	R6 per-metric scoring guards — PSNR/ADM correctness fixes	Accepted	`correctness`, `psnr`, `adm`, `simd`
ADR-1025	R6 CUDA/HIP kernel correctness fixes	Accepted	`correctness`, `cuda`, `hip`, `simd`
ADR-1026	R6 SYCL kernel correctness — rd-stride OOB and unchecked graph_wait	Accepted	`correctness`, `sycl`
ADR-1030	HIP adm_decouple dangling body + VIF wavefront 32-bit carry + Metal motion vertical halo	Accepted	`hip`, `metal`, `correctness`, `gpu`
ADR-1032	vmaf_init double-init guard and vmaf_close pointer-contract documentation	Accepted	`api`, `correctness`, `memory-safety`
ADR-1033	CPU-side scoring NaN/UB guards across PSNR/SSIM/MS-SSIM/ADM/CAMBI/MOTION	Accepted	`correctness`, `cpu`, `psnr`, `ssim`, `adm`, `cambi`, `motion`
ADR-1034	Fix SYCL integer_vif rd_stride OOB on odd widths and integer_motion UV queue sync gap	Accepted	`sycl`, `correctness`, `gpu`
ADR-1035	CI workflow concurrency guards and job timeouts	Accepted	`ci`, `security`, `supply-chain`
ADR-1036	Correct SPDX license identifiers and add missing libsvm copyright	Accepted	`license`, `security`, `supply-chain`
ADR-1038	MCP cross-surface precision-default and probe-precision drift	Accepted	`mcp`, `correctness`, `cross-surface`
ADR-1039	Fix CERT MEM04-C realloc OOM safety in vendored libsvm	Accepted	`memory-safety`, `correctness`, `vendored`
ADR-1040	Promote `integer_ssim_moments_t` to shared header (macOS / Windows arm64 build fix)	Accepted	`build`, `simd`, `arm64`, `macos`, `windows`, `integer-ssim`, `fork-local`
ADR-1041	Fix CI RED — Go metal option type + Rust AVX-512 test guard	Accepted	`ci`, `build`, `go`, `rust`, `avx512`, `metal`
ADR-1042	Containerfile hardening — non-root USER + build-time DEBIAN_FRONTEND	Accepted	`containerfile`, `security`, `docker`, `ci`, `hardening`
ADR-1047	Helm chart schema and values.yaml correctness fixes (R9 batch)	Accepted	`helm`, `k8s`, `bug`
ADR-1048	vmaf-tune `ladder --duration` sentinel dest mismatch fix	Accepted	`vmaf-tune`, `bug`, `cli`
ADR-1049	Exponential backoff for vmafx-node online-feedback drainLoop	Accepted	`go`, `grpc`, `node`, `bug`
ADR-1051	Port upstream batch-threading + picture-pool defaults (dff4082b + 46d3a154)	Accepted	`upstream-port`, `scoring`, `threading`, `correctness`
ADR-1052	Re-register CPU `motion_v2` extractor and fix post-flush test ordering	Accepted	`core`, `motion`, `test`, `build`
ADR-1053	Default docker-compose runtime to nvidia and expand GPU capabilities	Accepted	`dev`, `cuda`, `docker`, `build`
ADR-1056	Use /std:c++latest on MSVC instead of cpp_std=c++23	Accepted	`build`, `ci`, `windows`, `msvc`
ADR-1057	Revert float-ADM SIMD dispatch wiring (PR #685) — NEON FMA divergence unfixable in scope	Superseded by fix/neon-fma-safe-float-adm-dwt2 (float_adm_dwt2_neon.c)	`simd`, `neon`, `float-adm`, `revert`, `correctness`
ADR-1058	Helm chart security hardening — PDB, RBAC split, metrics NetworkPolicy, schema tightening	Accepted	`helm`, `k8s`, `rbac`, `security`, `networkpolicy`
ADR-1060	Round 10 C++23 wave error-path cleanup	Accepted	`cpp23`, `correctness`, `memory`, `error-handling`, `fork-local`
ADR-1061	Fix depth-limit, integer-overflow, and banned-function bugs in vendored pdjson and cJSON	Accepted	security, vendored, mcp, c, libvmaf, fork-local
ADR-1063	Rust clippy strictness — scoped bindings suppression, unsafe_op lint, no panicking Default	Accepted	`rust`, `lint`, `safety`, `workspace`
ADR-1064	Wire score_fmt option on all vmaf FFmpeg filters	Accepted	`ffmpeg`, `build`, `api`
ADR-1065	Go staticcheck r10: replace time.After in poll loops with time.NewTicker (timer-leak fix); add MaxBytesReader + 413 mapping to controller /v1/score; add ReadTimeout to both HTTP servers	Accepted	2026-06-06
ADR-1066	Regression tests for the sequential-realloc double-free in libsvm	Accepted	`test`, `security`, `ci`
ADR-1068	Fix fast-path data race in gpu_dispatch_env.cpp via atomic publication flag	Accepted	`core`, `correctness`, `thread-safety`, `cpp23`
ADR-1069	Operator CRD status-schema gaps and VmafxNode LastHeartbeat ownership	Accepted	`operator`, `crd`, `k8s`, `bug`
ADR-1071	Promote HIP ms_ssim_vert_lcs to double precision (ADR-0990 parity)	Accepted	`hip`, `precision`, `ms_ssim`, `cross-backend-parity`
ADR-1072	Fix PREV_REF refcount leak in threaded batch and serial dispatch paths	Accepted	`core`, `threading`, `memory`, `picture-pool`, `bug`
ADR-1073	Fix vmaf_score_at_index EAGAIN-guard misapplication for model output slots	Accepted	`mcp`, `scoring`, `correctness`, `core`, `fork-local`
ADR-1074	Helm chart values completeness — missing knobs and schema gaps	Accepted	`helm`, `k8s`, `bug`
ADR-1075	MCP HTTP transport `POST /v1/score` body-validation edge cases	Accepted	`mcp`, `security`, `correctness`, `http`, `fork-local`
ADR-1077	vmaf-tune corner cases: parse_versions + compare preset	Accepted
ADR-1078	ms_ssim option parity across HIP and SYCL backends	Accepted	`hip`, `sycl`, `cuda`, `ms_ssim`, `parity`
ADR-1079	TSan-eligible thread-safety test for threaded_extract_batch_func	Accepted	`ci`, `test`, `threading`
ADR-1080	UBSan enum-invalid-value fixes in vmaf_log and vmaf_option_set	Accepted	`ci`, `sanitizer`, `build`
ADR-1081	vmaf_bench correctness — unchecked alloc returns and wall-clock timer	Accepted	`tools`, `bench`, `correctness`, `clock`
ADR-1083	y4m_input_fetch_frame signed-integer overflow + fread(NULL) UB fixes	Accepted	`core`, `security`, `correctness`, `tools`, `c`, `fork-local`, `bugfix`
ADR-1084	Use filepath.SplitList for VMAF_MCP_ALLOW path-list parsing	Accepted	`build`, `windows`, `mcp`
ADR-1085	MCP streaming backpressure — kill child processes on client disconnect	Accepted	`mcp`, `security`, `go`, `python`
ADR-1086	CI Workflow Least-Privilege Permissions Audit	Accepted	`ci`, `security`
ADR-1087	Extend test coverage for pkg/storage and cmd/vmafx-node/bpf	Accepted	`test`, `storage`, `ebpf`, `coverage`
ADR-1088	CLI flag-parsing hardening — parse_unsigned overflow/negative guards and --help in cli_parse.cpp	Accepted	`cli`, `security`, `correctness`
ADR-1089	Block non-standard ONNX operator domains in the DNN wire scanner	Accepted	`security`, `ai`, `dnn`, `fork-local`
ADR-1090	Fix CUDA stream and event leaks on init error paths	Accepted	`cuda`, `security`, `testing`
ADR-1092	framesync producer-death deadlock — abort flag + shutdown broadcast	Accepted	`core`, `threading`, `correctness`, `sanitizer`, `fork-local`
ADR-1093	Disable two recurring-failure tests via should_fail while root cause is under investigation	Accepted	ci, testing, flaky, picture-pool, sycl, fork-local
ADR-1094	Helm chart rolling-update correctness — node strategy, PDB default, probe fix, grace period	Accepted	`helm`, `kubernetes`, `deploy`, `fork-local`
ADR-1095	Fix OTel trace context propagation across gRPC boundaries	Accepted
ADR-1096	Doxygen @brief/@param coverage for core internal headers	Accepted	`docs`, `maintainability`
ADR-1097	Atomic file writes for AI-script cache and output files	Accepted	`ai`, `correctness`, `reliability`
ADR-1099	Propagate `-fsycl` via `sycl_dependency` to fix test-binary SIGSEGV	Accepted
ADR-1100	Skip GPU-flagged extractors when `flags == 0` in `vmaf_get_feature_extractor_by_feature_name`	Accepted	`feature-extractor`, `correctness`, `sycl`, `cuda`, `hip`, `bug-fix`, `fork-local`
ADR-1101	Change vmaf container user GID/UID from 1000 to 2000	Accepted	`build`, `ci`, `workspace`
ADR-1102	Container-only canonical artifact publishing (Phase 4b.9)	Accepted	container, build, release, publish, phase4b, docs-policy, fork-local
ADR-1103	Fix integer_vif_hip boundary condition: all filter-loop reads used clamp_i (replicate-edge), disagreeing with CPU PADDING_SQ_DATA and the CUDA twin's two-bounce mirror-reflect. The mismatch produced max |HIP−CPU| ≈ 0.0018 per VIF scale (places~2.75) on the Netflix src01 pair, violating the ADR-0214 places=4 gate. Replaces clamp_i with mirror2_i in all six filter-loop boundary reads; confirmed places~6 (max 1e-6) on gfx1030 wave32 hardware across all 48 src01 frames. Tightens test_hip_vif_parity.c tolerance from 1e-3 to 1e-4.
ADR-1104	Remove AVX-512 dispatch from float VIF convolution to restore Netflix golden scores	Accepted	`simd`, `correctness`, `float-vif`, `bug-fix`
ADR-1105	`fr_regressor_v2_ensemble` production flip deferred to the one-shot post-RC retrain	Accepted	`ai`, `models`, `rc`, `docs`
ADR-1106	HIP `integer_motion_v2` mirror corrected from `2size-idx-1` to reflect-101 `2size-idx-2`, matching the CPU/CUDA/SYCL twins. Supersedes ADR-0377's incorrect claim that the `-1` form matched CPU/CUDA (all three use `-2`; identical call sites, so it was a one-pixel high-boundary divergence — same class as ADR-1103's HIP VIF fix). Also adds the missing `shape[d] <= 0` guard in `build_input_tensor` (DNN infer path). HIP-only; no CPU golden impact.	Accepted	2026-06-13
ADR-1107	Fix multi-PREV_REF starvation in `threaded_read_pictures_batch`: the batch struct-copied a single `prev_ref` snapshot into the first temporal extractor and zeroed the shared snapshot, so a 2nd PREV_REF extractor (e.g. `motion_v2`, which co-schedules `motion`) saw NULL prev_ref and returned -EINVAL every frame — failing 100% of `--threads N` extractions involving motion_v2 (incl. the K150K retrain corpus). Fix: each PREV_REF extractor takes its own `vmaf_picture_ref()`; snapshot released once at `unref:`. Regression test `test_batch_two_prev_ref_extractors`. Scores unchanged.	Accepted	2026-06-13
ADR-1108	CUDA motion_v2 twin emits motion3_v2_score	Accepted	cuda, feature, parity
ADR-1109	vmafx-node `Serve()` registers the `VmafxScoring` gRPC service (Score + ScoreStream + Health) — turning the Phase-4b.4 listen-only stub into a directly-dispatchable scoring endpoint that reuses the shared `pkg/libvmaf` engine, with graceful shutdown. The controller-pull worker loop (ADR-0713) is a separate client role.	Accepted	2026-06-13
ADR-1110	Add `delta_e_itp` (ΔE-ITP, ITU-R BT.2124-0), a CPU-only HDR/WCG full-reference colour-difference feature extractor (feature key `delta_e_itp`), filling the fork's HDR colour-fidelity gap (`ciede` is SDR/BT.709). RC scope: PQ (ST-2084) transfer only — HLG / BT.1886 deferred (single-sourced constants); `transfer=hlg`/`bt1886` rejected with `-EINVAL`. Options `transfer`(=pq), `matrix`(bt2020/bt709), `range`(limited/full); mean per-pixel pooling; YUV400 rejected; double-precision math, no out-of-gamut clamping. Mirrors `ciede.c`; PQ EOTF in `delta_e_itp_math.h`. Unit test asserts the BT.2124-0 Annex 4 full-precision ITP triple at places=4.	Accepted	2026-06-14
ADR-1112	NIQE no-reference CPU feature extractor (`niqe`) replicating the fork Python harness byte-for-byte against the in-tree pristine model `niqe_v0.1.pkl`. Load-bearing fork divergences from upstream NIQE: the AGGD `N` carries a trailing `aggdratio` factor and the MSCN maps + PIL-bicubic half-res are round-tripped through float32. NR posture: scores the distorted picture only (ref / _90 discarded), mirroring CAMBI. Matches the harness at places=4+ on natural content.	Accepted	2026-06-14
ADR-1111	Add PU21 HDR perceptual metric — a CPU `pu21` extractor providing `pu21_psnr` + `pu21_ssim`. Encodes the luma plane through the PU21 transfer function (canonical `banding_glare` variant, all four selectable via `variant`) onto a perceptually-uniform scale, then scores PSNR (peak=256, no SDR cap) and a self-contained L=256 Gaussian SSIM. RC ships PQ (ST.2084) input only (`transfer` option defaults `pq`, rejects others with -EINVAL; HLG/SDR deferred). Critically, PU-SSIM uses its OWN L=256 SSIM (`pu21_ssim.c`), NOT a modification of the golden `float_ssim`/`iqa_ssim` (L=255, feeds Netflix assertions). All per-pixel math is double precision. Encoder + PU-PSNR(100,99)=51.873338803 dB verified at places=4 vs the fp64 dossier oracle (`test_pu21`).	Accepted	2026-06-14
ADR-1113	Vendor the Pelorus interop ABI (data-plane side-data blob: pack/parse, deband param contract, version accessors) into vmafx as a pinned, read-only, append-only mirror of `VMAFx/pelorus@835e097` — three `.c` under `core/src/interop/`, three headers under `core/include/libvmaf/pelorus/`, the shared 7-vector conformance fixture as `core/test/test_pelorus_interop.c`. Single source of truth stays in pelorus (ADR-0103); a pinned drift guard (`scripts/sync-pelorus-interop.sh`, reads the pinned git tree object) fails CI on divergence; vendored files are lint/format-excluded to keep them byte-identical. CPU-only, dependency-free, no Vulkan. Alternatives (submodule / meson-subproject / hand-reimplement) rejected to avoid build coupling + diverging parsers.	Accepted	2026-06-14
ADR-1114	Y-FUNQUE+ wavelet-domain atom features only (`y_funque_plus`, CPU-only, temporal) — emits `y_funque_plus_ms_ssim`, `y_funque_plus_dlm`, `y_funque_plus_mad`. Per-frame: 2x OpenCV `INTER_CUBIC` (Keys cubic `a=-0.75`) pre-downscale, 2-level Haar DWT (pywt `'periodization'`), Nadenau Y-channel CSF weighting of detail subbands only, then the three atoms (MS-SSIM cov pooling, DLM detail-loss, MAD-Ref temporal). The DLM num/den abs-asymmetry is replicated exactly (num pools `rest^3` without abs, den pools ref with abs). The fused ScaledSVR MOS score is deferred — upstream `funque_plus` ships no frozen regressor (trains per-dataset at runtime), so a fused number would be fork-originated and needs a licensed subjective dataset + model card. License finding: `funque_plus` is MIT (Copyright (c) 2023 Abhinau Kumar), BSD-2-Clause-Patent-compatible; the C is a clean-room reimplementation from the papers (arXiv:2304.03412, 2202.11241) cross-checked against the MIT reference, no source copied verbatim. Double-precision, `-ffp-contract=off`. Unit test asserts per-atom oracles at places=4 (independently re-derived against a pywt+OpenCV reference) plus odd-dim + crop-path regressions.	Accepted	2026-06-14
ADR-1115	BRISQUE no-reference opinion-aware CPU feature extractor (`brisque`), bundling the canonical LIVE-lab `allmodel` (libsvm EPSILON_SVR) under a documented research-use attribution exception (NOTICE + model card citing Mittal/Moorthy/Bovik TIP 2012). Replicates the gregfreeman MATLAB pipeline that trained the model — GGD for the MSCN field (not krshrimali's AGGD), Gaussian sigma=7/6 (not truncated 1.166), MATLAB antialiased-bicubic half-res — NOT the krshrimali C++ port. Range-scales with the inline `computescore.cpp` arrays (NOT the conflicting `allrange` file); no output clamp; plain `svm_predict`. First feature-extractor consumer of the vendored libsvm. NR posture: scores the distorted picture only. SDR-luma trained; PQ/HLG HDR out of scope (warns + scores as SDR).	Accepted	2026-06-14
ADR-1116	vmaf-tune autotune prefilter control plane — new `filter_adapters/` family (sibling to `codec_adapters/`) with a `FilterAdapter` Protocol + `pelorus_deband.py` hard-coding the 10 frozen knobs from the Pelorus ADR-0110 contract and emitting `-vf pelorus_deband_vulkan=...`; new `vmaf-tune prefilter` subcommand drives a JOINT Optuna TPE search over the deband knob space + CRF (reusing the `fast.py` TPESampler study) with VMAF as the oracle, returning recommended strengths + CRF + per-probe VMAF. vmafx stays Vulkan-free (emits the string, scores the output); the live `deband→encode→score` loop is gated behind `pelorus_filter_available()`. Unit-tested (adapter emission/validation, search-space, mocked subcommand loop); live encode untested here (no pelorus-enabled ffmpeg build).	Accepted	2026-06-14
ADR-1117	MCP `vmaf_score` / `vmaf_score_encoded` gain optional tiny-AI/DNN (`tiny_model`, `tiny_device`/`--dnn-ep`, `tiny_threads`, `tiny_fp16`, `tiny_model_verify`, `tiny_codec`, `tiny_preset`, `tiny_crf`, `tiny_resize`, `no_reference` NR mode), feature-selection (`feature` repeatable, `aom_ctc`/`nflx_ctc` presets), and score-completeness (`threads`, `frame_cnt`, `frame_skip_ref`/`frame_skip_dist`, `no_prediction`) parameters — closing the fork's largest MCP capability gap (Tiny-AI was 0% reachable). All params optional + backward-compatible; forwarded to the `vmaf` CLI only when set. Go (`cmd/vmafx-mcp`) and Python (`mcp-server/vmaf-mcp`) schemas verified byte-identical. NR mode makes `ref` optional and is gated on `tiny_model`.	Accepted	mcp, ai, docs, agents, fork-local
ADR-1118	Pelorus side-data perceptually re-weights VMAF spatial pooling (opt-in, golden-isolated). The vendored interop parser (ADR-1113) reads each frame's per-cell banding-risk / variance maps; a normalized `[0,1]` salience becomes a per-frame pooling weight `w = 1 + strength·salience`, turning pooled MEAN/HARMONIC_MEAN into their weighted forms (MIN/MAX unaffected). Golden-gate isolation (load-bearing): inert unless BOTH the `perceptual_weight` opt-in is set AND a valid Pelorus blob is present for the frame; otherwise `w ≡ 1.0` and the pooling runs the literal upstream expression, so the no-side-data path (and the Netflix golden pairs, which carry no side-data) is byte-identical — proven by `test_perceptual_weight.c`. New C-API `vmaf_set_perceptual_{weight_enabled,weight_strength,sidedata}` (`core/include/libvmaf/perceptual_weight.h`); weight module `core/src/feature/perceptual_weight.c`; `vf_libvmaf` reader + `perceptual_weight` AVOption (ffmpeg-patch 0017). R1–R6 compat: `min(known_size,dir.size)` reads, unknown bits ignored, `grid==0` degrades to frame-level scalar, ABI-major mismatch rejected (unweighted + log). CPU-only, no GPU. Alternatives (auxiliary-feature / both) rejected — maintainer chose spatial-pooling weighting.	Accepted	2026-06-14
ADR-1119	Adopt the `github.com/golusoris/golusoris` fx framework (pinned `v0.4.0`) across all six vmafx Go binaries + `pkg/`. Every binary becomes an `fx.New(...).Run()` over golusoris modules (Core/otel/HTTP/grpc/k8s-operator/clikit), sharing a vmafx-local `internal/app/bootstrap` stanza (`Base` + `FxLogger`). RC-blocking, phased services-first (`vmafx-server` → `controller` → `node` → `operator`, then `mcp`/`tune` + `pkg` sweep). Keeps the `VMAFX_` config env-prefix via `fx.Replace(config.Options{...})` (not golusoris's default `APP_`) and keeps the controller's embedded SQLite queue (not `golusoris.Jobs`/Postgres). Dependency closures already align byte-for-byte, so the `go.mod` merge is low-risk (`go build ./...` clean post-`go get`). Four framework gaps filed upstream were all integrated by the maintainer (#225 gRPC interceptor injection, #226 version module, #227 operator SetLogger + webhook, #234 log reads `log.level` from config); the pin was bumped v0.3.1→v0.4.0. Only #226 + the `k8s/operator` module are in the v0.4.0 tag; #225/#227/#234 are merged to golusoris main but untagged, so the service binaries carry small interim shims (the #234 log-env bridge; the #227 operator SetLogger + webhook gate) and the controller (needs #225) follows the next golusoris tag.	Accepted	2026-06-14
ADR-1120	Re-pin the vendored Pelorus interop ABI (ADR-1113) from `pelorus@835e097` (ABI 1.0) to `pelorus@818d844` (ABI 1.3) and consume the new `PEL_SEC_COMPLEXITY` per-frame section in perceptual weighting (ADR-1118). Minor-3 appends three sections (`QPREPORT` 1.1, `MOTION_CONF` 1.2, `COMPLEXITY` 1.3) and three files; the mirror gains `denoise.h`, `denoise_params.c`, `qp_report_csv.c` (the last is required for the minor-3 conformance fixture to link `pel_x265_csv_parse`). The `PelorusSideData` header layout is unchanged (append-only R1) and the parser keys on `abi_major`, so it is a deliberate re-pin, not a break. Fixes the sync-script `--update` bug that re-vendored the six manifest files but not the conformance-fixture body (the immediately-following drift check then failed). Complexity modulation: `perceptual_weight.c` attenuates the banding salience by `(1 − 0.5·complexity)` floored at `0.25` — banding is more visible on flat/simple frames and masked on busy/textured frames. Golden-isolated (load-bearing): engages only when the opt-in is set AND a complexity section is present; absent section → factor `1.0` → byte-identical pooling, so the Netflix golden 576×324 pair scores `76.667831` unchanged (verified). Proven by `test_complexity_modulates_weight` / `_grid_zero` (toggle-proven). New vendored files inherit the existing prefix-glob lint/format exclusions. CPU-only, no GPU. Alternatives (trim-fixture, threshold/step modulation, replace-banding) rejected.	Accepted	2026-06-27
ADR-0452	Hoist VIF 10-plane scratch buffer from per-frame allocation to VifState init/close lifecycle; eliminates ~79 MB/frame allocator traffic at 1080p.	Accepted	perf, vif, cpu, build, fork-local
ADR-0460	Dispatch-strategy registry audit: deduplicate SYCL/Vulkan feature registry rows and align HIP/Metal dispatch-support tables with registered extractor names.	Accepted	dispatch, hip, metal, sycl, vulkan, correctness
ADR-0539	Per-kernel `hip_cu_extra_flags` keeps SSIMULACRA2 recursive blur FP contraction disabled on HIP so parity stays inside the ADR-0214 gate.	Accepted	hip, build, ssimulacra2, numerics, fork-local
ADR-0567	Port Netflix upstream `30a6e2a8d` direct-read CLI path, avoiding the intermediate `video_input_ycbcr` buffer when `USE_DIRECT_READ` is enabled.	Accepted	upstream-port, performance, tools, cli, build
ADR-0764	psnr_hvs CUDA kernel: F3 ldg() + __restrict pointer extraction + launch_bounds(64) (PR #96 candidate #5, mirrors ADR-0754)	Accepted	cuda, perf, psnr_hvs, fork-local
ADR-0866	Wire markdownlint-cli2 into `make lint` + pre-commit + CI (touched-file scope)	Accepted	2026-05-30
ADR-0982	GPU runtime bug audit — round 26: 6 init/teardown leak fixes across CUDA + SYCL + shared GPU TUs (drain stream / picture stream + events + device pointers / CUDA func table / picture-pool partial-init / SYCL extractor-vs-queue ordering / VA readback exception unwind)	Accepted	2026-05-31
ADR-0993	KoNViD / UGC / BVI-DVC saliency batch manifests: in-tree `ai/batch-manifests/saliency/` with runnable KoNViD-150K manifest and scaffolded UGC/BVI stubs documenting path-column blocking gaps	Accepted	2026-06-03