Architectural Decision Records (ADR)
This is the canonical, tracked decision log for the fork. Every non-trivial architectural / policy / scope decision lands here as its own markdown file before the corresponding commit merges.
Format
We use Michael Nygard's ADR format (MADR-style), one markdown file per decision — not a mega-table. See joelparkerhenderson/architecture-decision-record for background.
Each ADR file is named NNNN-kebab-case-title.md with a zero-padded 4-digit ID and follows the structure in 0000-template.md:
# ADR-NNNN: <short, declarative title>
- **Status**: Proposed | Accepted | Deprecated | Superseded by [ADR-NNNN](NNNN-title.md)
- **Date**: YYYY-MM-DD
- **Deciders**: <names / handles>
- **Tags**: <comma-separated area tags>
## Context — the problem, the forces at play
## Decision — one paragraph in active voice
## Alternatives considered — at minimum the runner-up, in a pros/cons table
## Consequences — Positive / Negative / Neutral-follow-ups
## References — upstream docs, prior ADRs, related PRs, popup-answer source
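The header block of the template is regular enough to read mechanically. The following is a minimal sketch, not part of the fork's tooling: the function name `parse_adr_header` and the sample ADR (ID 9999, its date, and its decider handle) are hypothetical, and it assumes every header line keeps the template's bold-key bullet form.

```python
import re

# Matches header bullets such as "- **Status**: Accepted".
# Assumption: headers follow the template's bold-key bullet form exactly.
HEADER_RE = re.compile(r"^- \*\*(?P<key>[A-Za-z]+)\*\*:\s*(?P<value>.*)$")


def parse_adr_header(text: str) -> dict:
    """Return the ID, title, and header fields found before the first '## ' section."""
    meta = {}
    for line in text.splitlines():
        if line.startswith("## "):
            break  # the header block ends where the body sections start
        if line.startswith("# ADR-"):
            adr_id, _, title = line[2:].partition(": ")
            meta["id"], meta["title"] = adr_id, title.strip()
            continue
        match = HEADER_RE.match(line)
        if match:
            meta[match.group("key").lower()] = match.group("value").strip()
    if "tags" in meta:
        # Tags are comma-separated in the template; split them into a list.
        meta["tags"] = [t.strip() for t in meta["tags"].split(",") if t.strip()]
    return meta


# Hypothetical ADR text, used only to exercise the parser.
SAMPLE = """\
# ADR-9999: Example decision
- **Status**: Accepted
- **Date**: 2026-01-01
- **Deciders**: example-handle
- **Tags**: docs, testing
## Context
...
"""

meta = parse_adr_header(SAMPLE)
print(meta["id"], meta["status"], meta["tags"])
# prints: ADR-9999 Accepted ['docs', 'testing']
```

Stopping at the first `## ` heading keeps bullets inside the Context or Consequences sections from being mistaken for header fields.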
Conventions
- Filename: `NNNN-kebab-case-title.md`. IDs are assigned in commit order and never reused.
- Immutable once Accepted: the body is frozen. To change a decision, write a new ADR with `Status: Supersedes ADR-NNNN` and flip the old one to `Superseded by ADR-MMMM`.
- One decision per ADR: if you find yourself writing "and also…", split it.
- Tagging: use the flat tag palette below so `grep -l 'Tags:.*cuda' docs/adr/*.md` works. New tags are fine when justified.
- Link from per-package AGENTS.md: the relevant per-package `AGENTS.md` points to the ADRs that govern that subtree, so the rationale is one click away from the code.
- Backfill policy: ADRs ≤ 0099 are backfills: decisions made before the ADR practice was formalised on 2026-04-17, captured retroactively from commit history and planning dossiers. Their `Status` reflects the current code, not the original decision date. New decisions start at 0100.
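The filename and ID rules above can be checked mechanically. This is a minimal sketch rather than the fork's actual tooling: the helper names (`adr_id`, `next_adr_id`) and the sample filenames are hypothetical, and it assumes slugs contain only lowercase ASCII letters and digits.

```python
import re
from typing import Iterable, Optional

# NNNN-kebab-case-title.md: zero-padded 4-digit ID, then a kebab-case slug.
# Assumption: slugs are limited to lowercase ASCII letters and digits.
ADR_NAME_RE = re.compile(r"^(\d{4})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")

FIRST_NEW_ID = 100  # IDs up to 0099 are reserved for backfills


def adr_id(filename: str) -> Optional[int]:
    """Return the numeric ID if the filename follows the convention, else None."""
    match = ADR_NAME_RE.match(filename)
    return int(match.group(1)) if match else None


def next_adr_id(filenames: Iterable[str]) -> int:
    """Return the next free ID for a new (non-backfill) decision."""
    ids = [i for i in map(adr_id, filenames) if i is not None]
    # Never go below 0100, and never fill gaps: IDs are not reused.
    return max([FIRST_NEW_ID - 1, *ids]) + 1


# Hypothetical directory listing.
names = ["0000-template.md", "0042-some-backfill.md", "0158-latest-decision.md", "README.md"]
print(f"{next_adr_id(names):04d}")
# prints: 0159
```

Taking the maximum plus one, instead of the first unused number, respects the "never reused" rule when the sequence has gaps. It cannot detect an ID whose file was deleted after being the highest in the directory, so a check against the commit history would still be needed in that case.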
Tag palette
ai, agents, build, ci, claude, cli, cuda, dnn, docs, framework, git, github, license, lint, matlab, mcp, planning, python, readme, release, security, simd, supply-chain, sycl, testing, workspace.
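For scripted use, the tag lookup can also be written in Python. The sketch below is illustrative only: `tags_of` and `adrs_with_tag` are hypothetical helpers, and it assumes each ADR carries a single tag line written either as `Tags: a, b` or in the template's bold bullet form `- **Tags**: a, b`. Unlike a substring pattern, it compares whole tags, so a search for `git` does not also return files tagged only `github`.

```python
import re
from pathlib import Path

# Assumption: each ADR has one tag line, written either as "Tags: a, b"
# or in the template's bold bullet form "- **Tags**: a, b".
TAGS_LINE_RE = re.compile(r"^[-* ]*Tags\**:\s*(.*)$")


def tags_of(text: str) -> set:
    """Return the set of tags declared on the first tag line, or an empty set."""
    for line in text.splitlines():
        match = TAGS_LINE_RE.match(line)
        if match:
            return {t.strip() for t in match.group(1).split(",") if t.strip()}
    return set()


def adrs_with_tag(adr_dir: str, tag: str) -> list:
    """List ADR filenames under adr_dir whose tag line contains exactly `tag`."""
    return sorted(
        path.name
        for path in Path(adr_dir).glob("*.md")
        if tag in tags_of(path.read_text(encoding="utf-8"))
    )


if __name__ == "__main__":
    # Lists nothing if docs/adr does not exist relative to the working directory.
    for name in adrs_with_tag("docs/adr", "cuda"):
        print(name)
```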
Why it exists
Without ADRs, a Claude session makes a decision (directory move, CI gate change, dependency swap) and commits it; once the session ends, the rationale is recoverable only from the commit message, which typically summarises the what but omits the alternatives considered. ADRs preserve "we chose X over Y because Z" in a single auditable place. See ADR-0028 (superseded by ADR-0106).
What counts as non-trivial?
A decision is non-trivial when another engineer could reasonably have chosen differently. Examples:
- Directory moves (e.g., ADR-0026: `workspace/` → `python/vmaf/workspace/`)
- Base-image / dependency policy (e.g., ADR-0027: non-conservative CUDA pins)
- CI-gate semantics (e.g., ADR-0024: Netflix golden tests as required status)
- Test-selection / regeneration rules
- Coding-standards changes (e.g., ADR-0012)
- New user-visible flags or surfaces (e.g., ADR-0023)
Not ADR-worthy: bug fixes, implementation details, one-off refactors that don't change any interface or policy.
Relation to .workingdir2/
Planning dossiers live under .workingdir2/ (gitignored). Mirrored copies of ADRs may exist there for local session continuity, but the tracked docs/adr/ tree is authoritative.
Index
| ID | Title | Status | Tags |
|---|---|---|---|
| ADR-0001 | Treat uncommitted benchmark result JSON as noise | Accepted | workspace, git, testing |
| ADR-0002 | Merge path gpu-opt → sycl → master, master is fork default | Accepted | git, release, workspace |
| ADR-0003 | Introduce .workingdir2 as new planning directory | Accepted | workspace, planning, claude |
| ADR-0004 | Auto-push sycl and master to origin after merges | Accepted | git, ci, release |
| ADR-0005 | Adopt full framework adaptation scope (a–g) | Accepted | framework, ci, docs, build, mcp |
| ADR-0006 | Set CLI precision default to %.17g with --precision flag | Superseded by ADR-0119 | cli, testing, python |
| ADR-0007 | Rewrite .claude/settings.json from scratch | Accepted | claude, agents |
| ADR-0008 | Rewrite README with fork branding, preserve Netflix attribution | Accepted | docs, readme, license |
| ADR-0009 | MCP server exposes four core tools | Accepted | mcp, python, framework |
| ADR-0010 | Sign release artifacts keyless via Sigstore | Accepted | security, release, supply-chain |
| ADR-0011 | Version scheme v3.x.y-lusoris.N | Accepted | release, framework |
| ADR-0012 | Coding standards stack: JPL + CERT + MISRA | Accepted | lint, docs, license |
| ADR-0013 | Support full local dev distro matrix | Accepted | build, docs, framework |
| ADR-0014 | VSCode uses clangd; disable MS C/C++ IntelliSense | Accepted | build, framework, lint |
| ADR-0015 | CI matrix: Linux / macOS / Windows with sanitizers | Accepted | ci, testing, security |
| ADR-0016 | sycl → master merge conflict resolution policy | Accepted | git, workspace |
| ADR-0017 | Claude skills scope includes domain scaffolding | Accepted | claude, agents, framework |
| ADR-0018 | Claude hooks scope: safety + auto-format + git | Accepted | claude, agents, ci, git |
| ADR-0019 | .workingdir2 is the full planning dossier | Accepted | workspace, planning, docs |
| ADR-0020 | Tiny-AI scope covers all four capabilities | Accepted | ai, dnn, framework, cli |
| ADR-0021 | Training stack: PyTorch + Lightning with ONNX export | Accepted | ai, python, framework |
| ADR-0022 | Inference runtime: ONNX Runtime via execution providers | Accepted | ai, dnn, cuda, sycl, build |
| ADR-0023 | Tiny-AI user surfaces: CLI, C API, ffmpeg, training | Accepted | ai, dnn, cli, framework |
| ADR-0024 | Preserve Netflix source-of-truth tests verbatim | Accepted | testing, ci, license |
| ADR-0025 | Copyright handling preserves Netflix, adds Lusoris/Claude | Superseded by ADR-0105 | license, docs |
| ADR-0026 | Relocate Python harness workspace under python/vmaf/ | Accepted | workspace, python, docs |
| ADR-0027 | Non-conservative image pins + experimental toolchain flags | Accepted | ci, cuda, sycl, build, supply-chain |
| ADR-0028 | Every non-trivial decision gets an ADR before the commit | Superseded by ADR-0106 | docs, planning, agents |
| ADR-0029 | Relocate resource tree under python/vmaf/ | Accepted | workspace, python, docs |
| ADR-0030 | Relocate MATLAB sources under python/vmaf/ | Accepted | workspace, matlab, python |
| ADR-0031 | Fork-added docs live under docs/ | Accepted | docs, workspace |
| ADR-0032 | Relocate root unittest script to scripts/ | Accepted | testing, workspace |
| ADR-0033 | Relocate CodeQL config to .github/ | Accepted | security, ci, github |
| ADR-0034 | Delete patches/ leftover; keep only ffmpeg-patches/ | Accepted | workspace, build |
| ADR-0035 | Migrate .claude/settings.json hooks to current schema | Accepted | claude, agents |
| ADR-0036 | Tiny-AI Wave 1 scope expanded beyond D20–D23 | Superseded by ADR-0107 | ai, dnn, cli, framework, mcp |
| ADR-0037 | Protect master branch on GitHub with required checks | Accepted | github, ci, security, release |
| ADR-0038 | Purge upstream MATLAB MEX compiled binaries from tree | Accepted | security, matlab, supply-chain |
| ADR-0039 | Pull forward runtime op-allowlist walk + model registry | Accepted | ai, dnn, security, supply-chain |
| ADR-0040 | Extend DNN session API to multi-input/output with named bindings | Accepted | ai, dnn, cli |
| ADR-0041 | Ship LPIPS-SqueezeNet FR extractor with inverse-ImageNet in graph | Accepted | ai, dnn, cli |
| ADR-0042 | Tiny-AI PRs must ship human-readable docs in the same PR | Accepted | ai, dnn, docs |
| ADR-0100 | Every user-discoverable change ships docs in the same PR (project-wide) | Accepted | docs, agents, framework |
| ADR-0101 | Implement USM-backed picture pre-allocation pool for SYCL | Accepted | sycl, gpu, picture-api, memory |
| ADR-0102 | DNN EP selection is ordered + graceful; fp16_io does a host-side fp32↔fp16 cast | Accepted | ai, dnn, cli |
| ADR-0103 | Implement vmaf_sycl_import_d3d11_surface as staging-texture H2D path | Accepted | sycl, windows, api |
| ADR-0104 | Compile picture_pool unconditionally and size it for the live-picture set | Accepted | api, build, cli |
| ADR-0105 | Copyright handling preserves Netflix and adds Lusoris/Claude (paraphrased re-statement) | Supersedes ADR-0025 | license, docs |
| ADR-0106 | Every non-trivial decision gets its own ADR file before the commit (paraphrased re-statement) | Supersedes ADR-0028 | docs, planning, agents |
| ADR-0107 | Tiny-AI Wave 1 scope expanded beyond ADR-0020 through ADR-0023 (paraphrased re-statement) | Supersedes ADR-0036 | ai, dnn, cli, framework, mcp |
| ADR-0108 | Every fork-local PR ships the six deep-dive deliverables (research digest, decision matrix, AGENTS.md invariant, reproducer, changelog entry, rebase note) | Accepted | docs, agents, framework, planning |
| ADR-0109 | Nightly bisect-model-quality runs against a synthetic placeholder cache (real DMOS-aligned cache swaps in via follow-up) | Accepted | ai, ci, tiny-ai, framework |
| ADR-0110 | Coverage gate uses -fprofile-update=atomic to survive parallel meson tests on instrumented SIMD code | Superseded by ADR-0111 | ci, build, simd, testing |
| ADR-0111 | Coverage gate switches lcov → gcovr and installs ORT in the coverage job (fixes 1176% over-count + DNN-stub coverage gap; layers on ADR-0110 race fixes) | Accepted | ci, build, dnn, testing |
| ADR-0112 | Expose ort_backend.c static helpers (fp16 conversion, resolve_name) via private internal header so test_ort_internals can unit-test edge branches the public API can't reach on a CPU-only ORT build | Accepted | dnn, testing, coverage |
| ADR-0113 | vmaf_ort_open falls back to CPU when CreateSession fails after a non-CPU EP attached; coverage CI installs onnxruntime-gpu + libcudart12 to exercise EP-attach success arms | Accepted | dnn, ci, coverage, ort |
| ADR-0114 | coverage-check.sh gains a per-file critical-coverage override map; dnn/ort_backend.c + dnn/dnn_api.c floor at 78% (structural EP-availability ceiling per ADR-0112) | Accepted | ci, coverage, dnn, ort, gate |
| ADR-0115 | All CI workflows trigger on [master] only (drop dead sycl branch); delete windows.yml and merge into libvmaf.yml preserving the build (MINGW64, …) required-status-check name | Accepted | ci, github, build, framework |
| ADR-0116 | CI workflow naming convention — purpose-named files + Title Case display names | Accepted | ci, github, docs |
| ADR-0117 | Bump actions/upload-artifact@v5→@v7 (Node 24) repo-wide; filter gcovr Ignoring suspicious hits stderr noise so the Coverage Gate Annotations panel finishes empty | Accepted | ci, coverage, gcovr, github-actions |
| ADR-0118 | ffmpeg-patches/ is a quilt-style series applied via series.txt ordering by both Dockerfile and ffmpeg.yml; patches regenerated via real git format-patch -3 carrying valid index lines + signed-off-by trail | Accepted | ci, build, ffmpeg, docker, sycl, ai |
| ADR-0119 | Revert CLI precision default from %.17g to %.6f so the Netflix CPU golden gate (CLAUDE.md §8) passes without per-call-site flags; --precision=max keeps the round-trip-lossless opt-in | Supersedes ADR-0006 | cli, testing, python, golden-gate |
| ADR-0120 | Add three DNN-enabled matrix legs (Ubuntu gcc, Ubuntu clang, macOS clang) to libvmaf-build-matrix.yml so the ORT C-API surface and dnn meson suite are exercised across compilers/OSes; macOS leg experimental: true (Homebrew ORT floats) | Accepted | ci, ai, dnn, ort, build, github-actions |
| ADR-0121 | Add Windows GPU build-only matrix legs (Build — Windows MSVC + CUDA (build only) + Build — Windows MSVC + oneAPI SYCL (build only)) so MSVC build-portability of the CUDA / SYCL backends is gated on PR, not from downstream user reports | Accepted | ci, build, cuda, sycl, github-actions |
| ADR-0122 | Unconditional CUDA cubin coverage for sm_86 / sm_89 + compute_80 PTX fallback in core/src/meson.build; actionable multi-line libcuda.so.1 dlopen-failure + cuInit messages in vmaf_cuda_state_init() (with pre-existing leak fix on error paths) | Accepted | cuda, build, docs |
| ADR-0123 | Fix ffmpeg libvmaf_cuda null-deref at vmaf_read_pictures tail: prev_ref update on CUDA-device-only extractor set dereferenced zero-initialised ref_host. Null-guard the vmaf_picture_ref(&vmaf->prev_ref, ref) call. Upstream f740276a + 32b115df + fork 65460e3a combined to reach default builds. | Accepted | cuda, regression, upstream-sync |
| ADR-0124 | Automate the four rule-bearing process ADRs (0100 doc-substance, 0105 copyright, 0106 ADR-per-decision, 0108 deep-dive deliverables). New rule-enforcement.yml workflow with one blocking job (deep-dive-checklist) + two advisory PR-comment jobs (doc-substance-check, adr-backfill-check); pre-commit hook for the copyright template. | Accepted | ci, agents, framework, docs, license |
| ADR-0125 | MS-SSIM decimate fast paths: AVX2 + AVX-512 specialised 9×9 separable LPF factor-2 kernels under core/src/feature/x86/; vendored iqa/decimate.c stays untouched; bit-exactness enforced via a scalar-separable reference; NEON deferred to follow-up | Proposed | simd, testing, agents |
| ADR-0126 | SSIMULACRA 2 perceptual metric as a fork-local feature extractor — port libjxl C++ reference to core/src/feature/ssimulacra2.c, scalar-first, float-tolerance bit-closeness | Proposed | metrics, feature-extractor, docs, agents |
| ADR-0127 | Vulkan compute backend — vendor-neutral GPU path alongside CUDA/SYCL/HIP; volk loader, GLSL→SPIR-V via glslc, VMA allocator, VIF as pathfinder | Proposed | gpu, vulkan, backend, build, agents |
| ADR-0128 | Embedded MCP server inside libvmaf — SSE + UDS + stdio transports, build-flag-gated, new libvmaf_mcp.h header, Power-of-10 compliant via dedicated MCP thread + SPSC queue | Accepted | mcp, agents, api, build, docs |
| ADR-0129 | Tiny-AI post-training int8 quantisation — static + dynamic + QAT per model via quant_mode field in model/registry.json; three scripts under ai/scripts/; CI accuracy-budget gate | Proposed | ai, onnx, quantization, model, docs |
| ADR-0130 | Ship scalar C port of the SSIMULACRA 2 metric (libjxl tools/ssimulacra2.cc). BT.709 limited-range YUV→sRGB→linear→XYB pipeline, 6-scale pyramid with separable Gaussian blur (σ=1.5, reflect padding) replacing libjxl's FastGaussian IIR, 108-weight polynomial pool. Snapshot JSON deferred to follow-up PR. Implementation closeout for ADR-0126 (Proposed, PR #67). | Accepted | metrics, feature-extractor, ssimulacra2 |
| ADR-0131 | Port Netflix#1382 — swap cuMemFreeAsync(ptr, priv->cuda.str) for synchronous cuMemFree(ptr) at core/src/cuda/picture_cuda.c:247 to fix the multi-session assert-0 crash reported in Netflix#1381. Preceding cuStreamSynchronize already removed the async-overlap benefit, so no perf regression. | Accepted | cuda, upstream-port, correctness |
| ADR-0132 | Port Netflix#1406 — fix vmaf_feature_collector_mount_model list traversal bug (dereferenced-pointer mutation corrupted the list on ≥3 mounts) and unmount_model error-code semantics (-EINVAL → -ENOENT on not-found). Refactored the upstream test extension into a shared load_three_test_models helper to keep per-test bodies under JPL-Power-of-10 rule-4 size thresholds. | Accepted | upstream-port, correctness, testing |
| ADR-0133 | Unify push-event and pull-request-event clang-tidy scoping: both now scan only the event's delta (<before>..HEAD on push, origin/<base>...HEAD on PR). Master-post-merge no longer re-scans every vendored file (core/src/svm.cpp, core/src/cuda/*.c without CUDA headers) and surfaces long-latent warnings unrelated to the push. actions/checkout@v6 bumped to fetch-depth: 0. | Accepted | ci, lint, clang-tidy |
| ADR-0134 | Port Netflix#1451 — append declare_dependency(link_with: libvmaf, include_directories: [libvmaf_inc]) + meson.override_dependency('libvmaf', ...) at the end of core/src/meson.build so consumers can use the fork as a meson subproject with the standard dependency('libvmaf') idiom. Fork deviates from upstream by using trailing-comma style to match fork build-file conventions. | Accepted | build, upstream-port |
| ADR-0135 | Port Netflix#1424 — expose vmaf_model_version_next(prev, &version) public iterator for built-in VMAF model versions. Corrects three upstream defects during port: NULL-pointer arithmetic UB (missing else), off-by-one returning the {0} sentinel, and const-qualifier mismatches in the test. Adds BUILT_IN_MODEL_CNT == 0 early-return for zero-models build configurations. Doxygen-style header doc replaces upstream's one-liner. | Accepted | api, upstream-port, correctness |
| ADR-0136 | Strip markdown emphasis/code characters (`, *, _) from the PR body before the Deep-Dive Deliverables Checklist grep. The template ships label bullets like - [ ] **`AGENTS.md` invariant note** — backticks inserted characters between tokens and broke the literal-item regex, rejecting conforming PRs. One-line tr -d pass applied to both parse and diff-verification steps. | Accepted | ci, rule-enforcement, adr-0108 |
| ADR-0137 | Port upstream Netflix/vmaf PR #1430 (thread-local locale handling via thread_locale.h/.c) into the fork. Replaces process-global setlocale(LC_ALL, "C") bracket in svm_save_model + output writers with POSIX.1-2008 uselocale (Linux/macOS/BSD), Windows _configthreadlocale, graceful fallback elsewhere. Fork corrections: NULL score_format in test, merge ADR-0119 ferror return contract with upstream pop, drop dead <locale.h> include in svm.cpp. | Accepted | port, libvmaf, i18n, thread-safety, upstream-port |
| ADR-0138 | Add AVX2 + AVX-512 bit-exact fast path for _iqa_convolve using the three-stage pattern single-rounded float mul → widen to double → double add (no FMA) to mirror scalar sum += img[i]*k[j] under FLT_EVAL_METHOD == 0. Specialised for MS-SSIM / SSIM invariants (11-tap Gaussian or 8-tap square, normalised, separable). Dispatch in ssim_tools.c via new _iqa_convolve_set_dispatch; vendored iqa/convolve.c untouched. Profile-driven after the 51% self-time hot spot revealed post-decimate-SIMD. Bit-identical to scalar; Netflix golden gate unchanged. | Proposed | simd, performance |
| ADR-0139 | Rewrite fork-local ssim_accumulate_avx2/avx512 to match scalar ssim_accumulate_default_scalar byte-for-byte by doing the 2.0 * mu1*mu2 + C1 numerator and the final l*c*s product per-lane in scalar double rather than vector float. Prior SIMD computed l, c in float and multiplied as float → 8th-decimal (~0.13 float-ULP) drift on MS-SSIM. precompute / variance are pure elementwise float ops and already bit-exact. Verified on Netflix + checkerboard pairs: scalar = AVX2 = AVX-512 bit-identical at --precision max. | Proposed | simd, performance, bit-exact |
| ADR-0140 | Ship a two-part SIMD DX framework — core/src/feature/simd_dx.h with ISA-specific macros for the ADR-0138 widen-then-add pattern and the ADR-0139 per-lane-scalar-double reduction pattern, plus an upgraded /add-simd-path skill that scaffolds TU + header + unit test + meson rows from a kernel spec. Demonstrated on convolve NEON (ADR-0138 port to aarch64) and ssim NEON bit-exactness audit (research-0012 follow-up). Cross-compile + QEMU audit gate verifies NEON = scalar at --precision max. Enables PR #B (ssimulacra2 AVX2+NEON, motion_v2 NEON, vif_statistic AVX-512+NEON, etc.) at higher throughput. | Proposed | simd, dx, build, agents |
| ADR-0141 | Every PR leaves every file it touches lint-clean to the fork's strictest profile — fork-local AND upstream-mirror. Refactor first; // NOLINT reserved for cases where refactoring would break a load-bearing invariant (ADR-0138 / ADR-0139 bit-exactness, upstream-parity identifier the rebase story depends on). Every NOLINT cites inline the ADR / research digest / rebase invariant that forces it. Historical debt (18 pre-existing readability-function-size NOLINTs + upstream _iqa_* suppressions) stays scoped to backlog item T7-5; this ADR does not backdate the rule. | Accepted | ci, process, code-quality, agents |
| ADR-0142 | Port Netflix upstream 18e8f1c5 (feature/vif: add vif_sigma_nsq) with fork-specific extension of the AVX2 vif_statistic_s_avx2 SIMD variant to accept the new parameter. Default (2.0) bit-identical to pre-port master; AVX2 and scalar agree on the new 14-argument contract. Fork keeps a local (float)vif_sigma_nsq cast to preserve ADR-0138/0139 float-arithmetic discipline (upstream's scalar double-promotes into a slightly different computation). ADR-0141 applied to touched files (ptrdiff_t stride widening fix; function-size NOLINT on the AVX2 variant references T7-5 sweep). | Accepted | upstream-port, feature-param, vif, simd |
| ADR-0143 | Port Netflix upstream f3a628b4 (feature/common: generalize avx convolution for arbitrary filter widths) — replaces 2.4k LoC of specialised fwidth ∈ {3,5,9,17} kernels in common/convolution_avx.c with a single generalised 1-D scanline pair + MAX_FWIDTH_AVX_CONV ceiling in convolution.h; drops the hard-coded whitelist in the VIF dispatch (vif_tools.c). Adopts upstream's paired python-golden loosening from places=2 to places=1 on VMAF_score / VMAFEXEC_score per the ADR-0142 Netflix-authority precedent. ADR-0141 applied to touched files: four 1-D scanline helpers now static, strides widened to ptrdiff_t, #include <stddef.h> added. Zero clang-tidy warnings on the touched file. | Accepted | upstream-port, simd, avx2, vif, convolve |
| ADR-0145 | Port motion_v2 to NEON in a new fork-local TU arm64/motion_v2_neon.c. Bit-exact to the scalar reference under QEMU (cpumask=0 vs cpumask=255 byte-identical on Netflix golden pair). Uses arithmetic right-shift throughout (vshrq_n_s64 / vshlq_s64(v, -bpc)) to match scalar C >> semantics — deliberately diverges from the fork's AVX2 variant, which uses _mm256_srlv_epi64 (logical); AVX2 re-audit queued as follow-up. Five small static inline helpers keep every function under ADR-0141's 60-line budget; zero clang-tidy warnings, no NOLINT. Closes backlog item T3-4 (gap-fill Step 2). | Accepted | simd, neon, motion, bit-exact, performance |
| ADR-0146 | Sweep every readability-function-size NOLINT from core/src/ (20 sites). Fork-local files (dict, picture, predict, libvmaf, output, feature_collector, feature_extractor, picture_pool, read_json_model) refactored via small static helpers; IQA / SIMD files (_iqa_convolve, _iqa_ssim, vif_statistic_s_avx2) refactored via static inline helpers that preserve ADR-0138 / ADR-0139 bit-exactness (per-lane scalar-float reduction threaded through struct vif_simd8_lane; convolve ordering unchanged). Drive-by lint fixes: calloc(w*h, ...) widening, multi-declaration forms, model_collection_parse_loop alias-write, _calc_scale → iqa_calc_scale rename. Zero new NOLINTs introduced; Netflix-golden-pair VMAF score bit-identical between VMAF_CPU_MASK=0 and =255. Closes backlog item T7-5. | Accepted | lint, cleanup, refactor, touched-file-rule |
| ADR-0147 | Port the thread-pool job-object free list + 64-byte inline data buffer from Netflix upstream PR #1464 (closed) into core/src/thread_pool.c. Recycles VmafThreadPoolJob slots instead of malloc/free per enqueue; payloads ≤ 64 bytes bypass a second malloc via inline_data. Adapted to the fork's void (*func)(void *data, void **thread_data) signature and per-worker VmafThreadPoolWorker data path. ~1.8–2.6× enqueue throughput on a 500 000-job 4-thread micro-benchmark. Netflix-golden-pair VMAF bit-identical between --threads 4 and serial, and between VMAF_CPU_MASK=0 and =255 under --threads 4. Closes the thread-pool half of backlog item T3-6 (PSNR SIMD half was already in via fork commit 81fcd42e). | Accepted | performance, threading, upstream-port |
| ADR-0148 | Rename the IQA-derived reserved-identifier surface (_iqa_*, struct _kernel, _ssim_int, _map_reduce, _map, _reduce, _context, _ms_ssim_map, _ssim_map, _ms_ssim_reduce, _ssim_reduce, _alloc_buffers, _free_buffers, header guards _CONVOLVE_H_ / _DECIMATE_H_ / _SSIM_TOOLS_H_ / __VMAF_MS_SSIM_DECIMATE_H__) to non-reserved iqa_* / ms_ssim_* / _INCLUDED spellings across core/src/feature/{iqa,*} (21 files). Sweeps the ADR-0141 touched-file lint cascade that surfaced (~40 pre-existing warnings: misc-use-internal-linkage → static or cross-TU NOLINT; widening multiplications → size_t casts; multi-decl splits; function-size refactors of calc_ssim / compute_ssim / compute_ms_ssim / run_gauss_tests; unused-parameter (void) casts; scoped NOLINTBEGIN/END for analyzer false positives on kernel-offset bounds and test-helper malloc leaks). Bit-identical VMAF score on Netflix golden pair (scalar vs SIMD, with --feature float_ssim --feature float_ms_ssim). Closes backlog item T7-6. | Accepted | lint, cleanup, refactor, iqa, touched-file-rule |
| ADR-0149 | Port Netflix upstream PR #1376 ("Fix fifo hangs") into the Python harness under python/vmaf/core/executor.py + python/vmaf/core/raw_extractor.py. Replaces the 1-second os.path.exists() polling loop in _open_{work,proc}files_in_fifo_mode with multiprocessing.Semaphore(0) signalled by the child processes after os.mkfifo(...); parent acquires with 5-second soft-timeout warn then blocks indefinitely. Applied to both the base Executor class hierarchy and the ExternalVmafExecutor-style subclass (single-process variant). Fork carve-outs: skip upstream's __version__ = "3.0.0" → "4.0.0" bump (fork tracks its own versioning per ADR-0011); drop now-unused from time import sleep from both files (ADR-0141 ruff F401). Closes backlog item T4-7. | Accepted | upstream-port, python, concurrency, fifo |
| ADR-0150 | Port Netflix upstream PR #1472 ("cuda: enable CUDA feature extraction on Windows (MSYS2/MinGW)") two-commit series: source-portability guards in CUDA headers + .cu files (drop <pthread.h> from cuda/common.h, DEVICE_CODE guards on <ffnvcodec/*> vs <cuda.h> in cuda_helper.cuh + picture.h, #ifndef DEVICE_CODE around feature_collector.h in 5 ADM .cu files), and meson build plumbing (vswhere-based cl.exe discovery without PATH pollution, Windows SDK + MSVC include path injection via -I flags to nvcc, CUDA version detection via nvcc --version instead of meson.get_compiler('cuda')). Fork-specific conflict resolutions: keep positional (not #ifndef __CUDACC__) initializers in integer_adm.h; keep pthread_dependency on cuda_static_lib (ring_buffer.c still uses pthread directly); merge fork's gencode coverage block (ADR-0122) with upstream's new nvcc-detect block. Drive-by: rename reserved __VMAF_SRC_*_H__ header guards to VMAF_SRC_*_INCLUDED. Linux CPU build 32/32 + Linux CUDA build 35/35 pass. Closes backlog item T4-2. | Accepted | upstream-port, cuda, windows, mingw, build |
| ADR-0151 | Add a 32-bit x86 (i686) build-only row to .github/workflows/libvmaf-build-matrix.yml to reproduce Netflix upstream issue #1481. New cross-file build-aux/i686-linux-gnu.ini (gcc + -m32, cpu_family = 'x86', cpu = 'i686'); new install-deps step for gcc-multilib g++-multilib; matrix row runs meson_extra: --cross-file=build-aux/i686-linux-gnu.ini -Denable_asm=false (pins upstream's documented workaround); test + tox steps skipped for the i686 leg because meson marks cross-built tests SKIP 77. Scope note: fixing the actual _mm256_extract_epi64 compile failure (24 call sites in adm_avx2.c) is explicitly out of scope — this ADR adds the CI gate only. Closes backlog item T4-8. | Accepted | ci, build, x86, netflix-upstream |
| ADR-0152 | vmaf_read_pictures now rejects non-monotonic indices with -EINVAL. Addresses Netflix upstream issue #910: integer_motion / motion2 / motion3 extractors keep sliding-window state keyed by index % N, so submitting frames out of order or with duplicate indices silently corrupts their ring-buffers (symptom: missing integer_motion2_score on last frame when submission order doesn't match frame order). Enforced via new last_index + have_last_index fields on VmafContext, checked inside the existing read_pictures_validate_and_prep helper from ADR-0146. Net visible behaviour change: duplicates and out-of-order indices now return -EINVAL instead of producing silent-wrong-answer. 3-subtest reducer in test_read_pictures_monotonic.c verified to fail pre-fix and pass post-fix. Zero impact on in-tree callers (vmaf CLI + test suite already iterate strictly increasing). Closes backlog item T1-2. | Accepted | api, correctness, motion, netflix-upstream |
| ADR-0153 | float_ms_ssim init rejects input below 176×176 with -EINVAL + clear error message. Addresses Netflix upstream issue #1414: the 5-level 11-tap MS-SSIM pyramid walks off the kernel footprint at a mid-level scale for inputs below 176×176 (QCIF and smaller), producing a confusing mid-run error: scale below 1x1! cascade. Fix checks w < GAUSSIAN_LEN << (SCALES - 1) at init time in float_ms_ssim.c:init, derived from the existing filter constants so it stays in sync. Extracted ms_ssim_init_simd_dispatch helper to keep the function under the ADR-0141 size budget. 3-subtest reducer in test_float_ms_ssim_min_dim.c covers 5 boundary rejections (including exact-under cases like 175×176 and 176×175) and 2 accept cases (176×176 exact + 576×324). Verified fail-without-fix + pass-with-fix. Visible behaviour: init now fails immediately instead of mid-stream; zero impact on inputs ≥176×176. Closes backlog item T1-4. | Accepted | correctness, ms-ssim, netflix-upstream |
| ADR-0154 | vmaf_score_pooled (via vmaf_feature_collector_get_score + the inline vmaf_feature_vector_get_score fast-path) now returns -EAGAIN when a requested feature index is valid but not yet written (transient — e.g. motion2 waiting for the next frame or flush), distinguishing it from -EINVAL which remains for programmer errors (bad pointer, out-of-range, unknown feature). Addresses Netflix upstream issue #755: downstream integrations that want per-frame streaming VMAF output can now distinguish 'retry after next read or flush' from 'abort'. Inline-helper return was previously -1; now -EINVAL (structural) or -EAGAIN (pending). Drive-by: rename reserved __VMAF_FEATURE_COLLECTOR_H__ header guard to VMAF_FEATURE_COLLECTOR_INCLUDED (ADR-0141 touched-file rule). 4-subtest reducer in test_score_pooled_eagain.c verified to fail pre-fix, pass post-fix. Closes backlog item T1-1. | Accepted | api, correctness, motion, netflix-upstream |
| ADR-0155 | Deliberately preserve the i4_adm_cm int32 rounding overflow reported as Netflix upstream issue #955. add_bef_shift_flt[idx] = (1u << (shift_flt[idx] - 1)) in core/src/feature/integer_adm.c scales 1–3 assigns 0x80000000 into int32_t, wrapping to -2147483648; downstream (prod + add_bef_shift) >> 32 subtracts 2^31 instead of adding it, biasing ADM scales 1–3 low by ≈1 LSB per summed term. The buggy arithmetic is encoded in the Netflix golden assertAlmostEqual values (project hard rule #1 / ADR-0024); fixing unilaterally would diverge from every published VMAF number calibrated on these outputs. Documentation-only landing: ADR + in-file warning comments at both overflow sites + rebase-notes 0048 + core/src/feature/AGENTS.md invariant. If Netflix closes #955 with a fix, sync under the ADR-0142 Netflix-authority carve-out (C-side fix + golden-number update in one merge). Closes backlog item T1-8 as "verified present, deliberately preserved". | Accepted | correctness, adm, netflix-upstream, deferred, golden-gate |
| ADR-0156 | Replace CHECK_CUDA's assert(0) semantics with graceful error propagation across the entire CUDA backend. Addresses Netflix upstream issue #1420: two concurrent VMAF-CUDA processes crashed the second one at vmaf_cuda_buffer_alloc on cuMemAlloc OOM. Introduces CHECK_CUDA_GOTO(funcs, CALL, label) (cleanup-aware) + CHECK_CUDA_RETURN(funcs, CALL) (immediate-return) macros + vmaf_cuda_result_to_errno helper that maps CUresult → -ENOMEM / -EIO / -ENODEV / -EINVAL. All 178 CHECK_CUDA(...) call sites across 7 TUs (common.c, picture_cuda.c, libvmaf.c, integer_motion_cuda.c, integer_vif_cuda.c, integer_adm_cuda.c, cuda_helper.cuh) converted. Twelve static helper functions promoted from void → int to carry errors. Fixes the NDEBUG footgun (assert(0) was a no-op in release builds → silent segfault). ADR-0122 / ADR-0123 null-guards preserved verbatim. 39/39 CUDA tests pass including new test_cuda_buffer_alloc_oom reducer verified to hit the cuMemAlloc(1 TiB) OOM path. Clang-tidy clean across all 6 touched files. Closes backlog item T1-6. | Accepted | cuda, correctness, api, netflix-upstream, reliability |
| ADR-0157 | Fix CUDA preallocation memory leak + add new public vmaf_cuda_state_free() API. Addresses Netflix upstream issue #1300: users running CUDA VMAF in init/preallocate/fetch/close loops saw GPU memory rise monotonically. Four framework-side leaks identified via ASan + fixed: (1) VmafCudaState heap allocation had no public free — new vmaf_cuda_state_free() symbol in libvmaf_cuda.h + implementation in common.c (NULL-safe free() wrapper; must be called AFTER vmaf_close()); (2) vmaf_cuda_release() now calls cuda_free_functions() to release the dlopen'd driver table, via a saved pointer AFTER the existing memset; (3) vmaf_ring_buffer_close() now unlocks + destroys the pthread_mutex before freeing (was: destroying a locked mutex, POSIX UB); (4) cold-start unwind in init_with_primary_context() releases the retained primary context if cuStreamCreateWithPriority fails. Mirrors SYCL backend's vmaf_sycl_state_free() ownership pattern. New 10-cycle GPU-gated reducer test_cuda_preallocation_leak.c verified to leak 0 framework bytes under ASan (183 bytes remain in libcuda.so.1 driver cache — persists for process lifetime, not per-cycle). Test-side cleanup fixed in test_cuda_pic_preallocation.c + test_cuda_buffer_alloc_oom.c. Preserves ADR-0122 / ADR-0123 null-guards + ADR-0156 CHECK_CUDA_GOTO cleanup paths verbatim. Visible behaviour change: every CUDA caller must now call vmaf_cuda_state_free(cu_state) AFTER vmaf_close(vmaf) — informal free(cu_state) becomes a double-free. Closes backlog item T1-7. | Accepted | cuda, correctness, api, netflix-upstream, memory |
| ADR-0158 | Verify Netflix upstream PR #1486 ("Port motion updates") is already fully present in the fork's master. Both commits — a44e5e6 (motion edge-mirror fix + motion_max_val option + motion3 output + SIMD dispatch refactor) and 62f47d5 (73 coordinated Netflix golden assertAlmostEqual updates) — are confirmed in the fork via grep + programmatic AST-style scan. Fork PR #45 tracked the original port attempt but never merged; substance landed incrementally via later motion3 / blend / five-frame-window / moving-average commits. Doc-only close — ADR + rebase-notes entry 0051 for future /sync-upstream coverage; no code change. Closes backlog item T4-1. | Accepted | upstream-port, motion, netflix-upstream, verification |
| ADR-0159 | AVX2 port of psnr_hvs (Xiph/Daala 8×8 integer DCT + contrast-sensitivity weighting + masking). Vectorizes the DCT 8 rows in parallel via __m256i (one register per row, 8× int32) using a butterfly → transpose → butterfly → transpose layout. Every od_coeff emitted and every final psnr_hvs_{y,cb,cr,psnr_hvs} feature value is byte-identical between scalar and AVX2 on all 3 Netflix golden CPU pairs (verified per-frame XML diff via VMAF_CPU_MASK=0 vs default). Float accumulators (means / variances / mask / error) kept scalar by construction per ADR-0139 precedent. #pragma STDC FP_CONTRACT OFF at the TU header to prevent FMA formation from breaking 1-ulp guarantee. 3.58× DCT speedup on isolated microbenchmark (11.0 → 39.3 Mblocks/s at -O3 -mavx2 -mfma). New unit test test_psnr_hvs_avx2.c pins the DCT-level bit-exactness contract on 5 reproducible inputs. NEON follow-up PR to come (backlog T3-5-neon). Scoped NOLINTBEGIN/END around upstream Xiph scalar block keeps it verbatim as the bit-exact reference. | Accepted | simd, avx2, psnr-hvs, bit-exact, performance |
| ADR-0160 | NEON (aarch64) sister port of psnr_hvs, following the ADR-0159 AVX2 port. Same byte-identical bit-exactness contract vs scalar. Lane-width adjusted: NEON's 4-wide int32x4_t means each 8-column row splits into r_k_lo (cols 0-3) and r_k_hi (cols 4-7); the 30-butterfly runs twice per DCT pass (once per half) and the 8×8 transpose decomposes into four 4×4 vtrn1q_s32/vtrn2q_s32/vtrn1q_s64/vtrn2q_s64 stages plus a top-right ↔ bottom-left block swap. Float accumulators kept scalar per ADR-0139/0159. Inherits the ADR-0159 pointer-based accumulate_error signature to preserve scalar summation order (IEEE-754 addition is non-associative: a local float accumulator + return would drift the Netflix golden by ~5.5e-5). New unit test test_psnr_hvs_neon.c pins DCT-level bit-exactness on 5 reproducible inputs; passes under qemu-aarch64-static. End-to-end 576×324 8-bit Netflix golden pair diff (scalar via VMAF_CPU_MASK=0 vs NEON default): byte-identical except for the fps timing header. 1080p 10-bit pairs covered by native-aarch64 CI + Netflix CPU Golden Tests required check (QEMU segfaults on heavy 10-bit workloads; this is a known emulator limit, not a port defect). Closes backlog item T3-5-neon. | Accepted | simd, neon, aarch64, psnr-hvs, bit-exact, performance |
| ADR-0161 | SSIMULACRA 2 SIMD ports — AVX2 + AVX-512 + NEON in a single PR (T3-1 + T3-2). Vectorises 5 of the 8 hot kernels: multiply_3plane, linear_rgb_to_xyb (per-lane scalar cbrtf for bit-exactness), downsample_2x2 (scalar-order ((r0e + r0o) + r1e) + r1o sequential adds via vshufps+vpermpd / vpermt2ps / vuzp1q_f32+vuzp2q_f32 deinterleaves), ssim_map + edge_diff_map (per-lane double reduction per ADR-0139). Byte-for-byte identical to scalar on all 5 kernels × 3 ISAs — verified via new test_ssimulacra2_simd.c with reproducible xorshift32 inputs (5/5 on AVX-512 host, 5/5 under qemu-aarch64-static). Runtime dispatch via function pointers in Ssimu2State, init-time selection in a new init_simd_dispatch() helper. IIR blur + picture_to_linear_rgb (powf) explicitly deferred to follow-up PRs — the IIR serial recurrence needs per-column batching which is a focused PR on its own, and the powf kernel is a smaller ROI. #pragma STDC FP_CONTRACT OFF at every TU header (ignored on aarch64 GCC, kept for portability). Closes backlog T3-1 + T3-2 partially (pointwise kernels); leaves IIR blur / YUV-pipeline vectorisation as follow-up. | Accepted | simd, avx2, avx512, neon, ssimulacra2, bit-exact, performance |
| ADR-0162 | SSIMULACRA 2 FastGaussian IIR blur SIMD (phase 2 of T3-1). Vectorises blur_plane on all three ISAs — single largest wall-clock cost (30 calls / frame). Horizontal pass: N-row batching via _mm256_i32gather_ps / _mm512_i32gather_ps / NEON lane-sets (AVX2=8, AVX-512=16, NEON=4 rows). Vertical pass: column-SIMD load/store over prev1_*/prev2_* state arrays + scalar tail. Bit-exact to scalar per-lane under FLT_EVAL_METHOD == 0; verified via new test_blur subtest in test_ssimulacra2_simd.c (6/6 on AVX-512 host, 6/6 under qemu-aarch64-static). Dispatch via new blur_fn function pointer in Ssimu2State assigned in init_simd_dispatch() (NULL = scalar fallback). Only picture_to_linear_rgb (2 calls / frame, powf EOTF) remains scalar — deferred to follow-up. | Accepted | simd, avx2, avx512, neon, ssimulacra2, bit-exact, iir-blur, performance |
| ADR-0163 | SSIMULACRA 2 picture_to_linear_rgb SIMD (phase 3 of T3-1 — closes it at 7/7 kernels). YUV → linear RGB with BT.709/BT.601 matmul + sRGB EOTF, now on all 3 ISAs. Strategy: per-lane scalar pixel reads (handles all chroma ratios + 8/16-bit uniformly), SIMD matmul + normalise + clamp, per-lane scalar powf for the sRGB EOTF branch (mirrors phase-1 cbrtf pattern). Bit-exact to scalar under FLT_EVAL_METHOD == 0. New shared header ssimulacra2_simd_common.h with simd_plane_t { data, stride, w, h } decouples SIMD TUs from VmafPicture. Dispatch wrapper in ssimulacra2.c unpacks VmafPicture into the struct. Verified via 5 new test_ptlr_* subtests (420/420-10bit/444/444-10bit/422): 11/11 total pass on AVX-512 host + 11/11 under qemu-aarch64-static. Closes backlog T3-1 in full; SSIMULACRA 2 now has zero scalar hot paths. | Accepted | simd, avx2, avx512, neon, ssimulacra2, bit-exact, yuv-rgb, srgb-eotf |
| ADR-0164 | SSIMULACRA 2 snapshot-JSON regression gate (closes backlog T3-3). New python/test/ssimulacra2_test.py invokes the vmaf CLI with --feature ssimulacra2 on two checked-in YUV fixtures (src01_hrc00/01_576x324 + ref/dis_test_...q_160x90) and assertAlmostEquals the per-frame + pooled scores against values pinned on master HEAD. 12 asserts × 2 fixtures. Self-consistency gate only — not cross-checked against libjxl tools/ssimulacra2 (requires libjxl+codec build chain in CI, scope creep) or Pacidus Python port (known scipy.gaussian_filter vs libjxl FastGaussian IIR drift). CPU path is bit-exact across AVX2/AVX-512/NEON/scalar per ADR-0161/0162/0163, so values are reproducible on any host. | Accepted | test, ssimulacra2, regression-gate, fork-local |
| ADR-0165 | Tracked docs/state.md for in-tree bug-status hygiene + new CLAUDE.md §12 rule 13 mandating a same-PR update on every bug close / open / rule-out. Resolves the STATE.md-shaped half of Issue #20 that ADR-0028 (ADR-row-before-commit) intentionally left out: ADRs cover decisions, this file covers bug status. Four sections (Open / Recently closed / Confirmed not-affected / Deferred) with cross-links to ADRs + PRs + Netflix issues. PR template carries a checkbox; opt-out syntax no state delta: REASON for PRs without bug-status impact. Closes Issue #20 + backlog item T7-1. | Accepted | process, state-hygiene, claude-rule, fork-local |
| ADR-0166 | MCP server (vmaf-mcp Python package) release artifact channel — both PyPI (Trusted Publishing / OIDC, no token) and GitHub release attachment with Sigstore keyless signing + PEP 740 attestations + SLSA L3 provenance + SBOM. Wired as new mcp-build / mcp-sign / mcp-publish-pypi jobs in the existing supply-chain.yml (one workflow surface for libvmaf + MCP release matrix coherence). Operational note: a one-time PyPI Trusted Publisher binding (project vmaf-mcp, owner lusoris, repo vmaf, workflow supply-chain.yml, environment pypi-publish) must be configured by the user before the first release after this PR lands. Closes backlog item T7-2. | Accepted | release, mcp, supply-chain, sigstore, pypi |
| ADR-0167 | Path-mapped doc-drift enforcement — closes the gap surfaced by the 2026-04-25 docs audit (16 PRs landed in 2 days; 2 HIGH + 4 MEDIUM doc gaps slipped past the existing checks). Two layers: (1) new project hook .claude/hooks/docs-drift-warn.sh — informational stderr NOTICE when an Edit/Write touches a user-discoverable surface but no matching docs/<topic>/ file is touched; copies the auto-snapshot-warn.sh pattern, no block. (2) rule-enforcement.yml doc-substance-check promoted from advisory (continue-on-error: true) to blocking + rewritten with a path-mapped surface→docs check. ADR additions under docs/adr/ no longer satisfy the check (ADRs are decisions, not user-facing docs — CLAUDE.md §12 rule 10). Per-PR opt-out no docs needed: REASON for legitimate internal-refactor / bug-fix / test PRs. Path map covers libvmaf headers, feature extractors, SIMD/GPU twins, CLI tools, build flags, MCP server, tiny-AI CLI, ffmpeg patches. | Accepted | process, enforcement, claude-hook, ci, adr-0100 |
| ADR-0168 | Tiny-AI Wave 1 baselines C2 (nr_metric_v1) + C3 (learned_filter_v1) trained on KoNViD-1k (T6-1 partial; C1 deferred — Netflix Public Dataset is access-gated, not programmatically downloadable). C2: MobileNet-tiny 19.1K-param NR scoring model, 224×224 grayscale, 60 epochs early-stopped at 23, val/MSE 0.382. C3: 4-block residual CNN 18.9K-param, self-supervised on synthetic gaussian σ=1.2 + JPEG-Q35 degradation pairs, 100 epochs, val/L1 0.019. Four new scripts under ai/scripts/ (fetch_konvid_1k.py / extract_konvid_frames.py / train_konvid.py / export_tiny_models.py); two new datamodule classes (FrameMOSDataset, PairedFrameDataset) under vmaf_train.data.frame_dataset. Schema + C-side VmafModelKind enum extended to allow kind: "filter" (registry trust-root for filter models consumed by ffmpeg vmaf_pre, NOT loaded by libvmaf scoring path). KoNViD-1k MOS values not redistributed — populated manifest stays gitignored, user re-runs vmaf-train manifest-scan on fresh clone. ORT op-allowlist + roundtrip atol 1e-4 both pass. | Accepted | tiny-ai, training, onnx, konvid-1k, c2, c3, fork-local |
| ADR-0169 | ONNX op-allowlist admits Loop + If (T6-5); Scan stays rejected. Wire-format scanner in onnx_scan.c extended with mutually-recursive scan_attribute / scan_node helpers that descend into NodeProto.attribute (field 5) → AttributeProto.g / .graphs (fields 6 / 11) so forbidden ops cannot hide inside a Loop.body / If.then_branch / If.else_branch subgraph. Recursion is depth-bounded (VMAF_DNN_MAX_SUBGRAPH_DEPTH = 8). Python vmaf_train.op_allowlist mirrors the recursion via _collect_op_types so the export-time check and the runtime load-time check stay in lockstep. Bounded-iteration guard explicitly deferred to a follow-up ADR (data-flow analysis for Loop.M → Constant ≤ MAX_LOOP_ITERATIONS doubles scope; track as T6-5b). 4 existing tests flipped from "Loop/If rejected" to "Loop/If accepted" + 4 new subgraph-recursion tests added (2 C, 2 Python). | Accepted | tiny-ai, onnx, security, op-allowlist |
| ADR-0170 | vmaf_pre ffmpeg filter extended to 10/12-bit LE pixel formats (yuv{420,422,444}p1{0,2}le + gray{10,12}le) and to optional chroma filtering via new chroma=0 / chroma=1 option (default 0 preserves luma-only back-compat). New libvmaf public API vmaf_dnn_session_run_plane16(sess, in, in_stride_bytes, w, h, bpc, out, out_stride_bytes) alongside existing _luma8; bpc in range 9..16 selects the normalisation divisor (1 << bpc) - 1. Two matching tensor helpers vmaf_tensor_from/to_plane16 in tensor_io.{h,c}. Filter dispatches _luma8 at bpc=8 and _plane16 at bpc≥9 via a new run_plane() helper; chroma=1 re-runs the same session on U/V at chroma-subsampled dimensions. 3 new tensor-io round-trip tests (10-bit identity, bpc bounds, 12-bit clamp). Closes BACKLOG T6-4. | Accepted | tiny-ai, ffmpeg, dnn, api, fork-local |
| ADR-0171 | Bounded Loop.M trip-count guard (T6-5b; closes the follow-up deferred in ADR-0169). Two layers, mirroring the ADR-0167 doc-drift pattern. (1) Python export-time _collect_loop_violations walks the graph, traces every Loop's first input back to a Constant int64 scalar in the local scope (recurses into subgraphs), rejects when the producer is a graph input, the wrong op_type, or the value is outside [0, MAX_LOOP_TRIP_COUNT] (default 1024, per-call overridable). AllowlistReport gains loop_violations field and a strengthened pretty(). (2) C wire-format scanner gains a counter threaded through scan_graph / scan_node / scan_attribute that increments on every Loop op_type at any depth; rejects with -EPERM + first_bad="Loop" when count exceeds VMAF_DNN_MAX_LOOP_NODES = 16. C cap is intentionally coarser than the Python data-flow check — reproducing the producer-map lookup in the wire scanner would violate ADR D39's "no libprotobuf-c" constraint. 5 new Python tests + 1 new C test. | Accepted | tiny-ai, onnx, security, op-allowlist |
| ADR-0172 | New MCP tool describe_worst_frames (T6-6) — scores a (ref, dis) pair, picks the N worst-VMAF frames, extracts each as PNG via ffmpeg select='eq(n,<idx>)', runs a vision-language model and returns frame metadata + descriptions. Lazy VLM loader cascades HuggingFaceTB/SmolVLM-Instruct → vikhyatk/moondream2 → metadata-only fallback (when transformers is missing or no candidate model loads). New vlm optional dependency group in MCP pyproject.toml (transformers + torch + Pillow + accelerate). First concrete consumer of ADR-0171's bounded-Loop guard — autoregressive VLM token generation requires Loop nodes. PNGs stored under /tmp/vmaf-mcp-worst-<pid>/ so the caller can fetch them post-call. 5 new tests + 1 extended; all 17 MCP tests pass. | Accepted | tiny-ai, mcp, vlm, fork-local |
| ADR-0173 | Audit-first implementation of ADR-0129's PTQ int8 policy (T5-3). Three new optional fields in registry.schema.json (quant_mode enum fp32 / dynamic / static / qat, quant_calibration_set, quant_accuracy_budget_plcc default 0.01). Three scripts under ai/scripts/ (ptq_dynamic.py wraps ORT quantize_dynamic; ptq_static.py wraps quantize_static with a .npz-backed CalibrationDataReader; qat_train.py is a CLI scaffold that raises NotImplementedError until the per-model QAT PR lands the trainer hook). New VmafModelQuantMode enum in model_loader.h + sidecar parser branch (default FP32 fail-safe on unknown values). 4 Python smoke tests + 3 C sidecar tests. Audit-first: no shipped model flips its quant_mode from fp32 in this PR; the runtime .int8.onnx redirect + the ai-quant-accuracy CI gate land with the first per-model quantisation PR (T5-3b). New docs/ai/quantization.md user reference. | Accepted | tiny-ai, onnx, quantization, registry, ci, fork-local |
| ADR-0174 | First per-model PTQ — learned_filter_v1 flips to quant_mode: "dynamic" (T5-3b; closes T5-3 fully). 80 KB → 33 KB (2.4× shrink). Drop measurement: PLCC 0.999883 vs fp32 on a 16-sample synthetic input set, drop 0.000117 vs budget 0.01 (100× margin). Runtime .int8.onnx redirect wired in vmaf_dnn_session_open — when the sidecar declares quant_mode != FP32, the loader strips trailing .onnx, appends .int8.onnx, re-validates, and passes that path to ORT. Fp32 file stays on disk as the regression baseline. New int8_sha256 registry/sidecar field (required when quant_mode != fp32) extends the trust-root invariant. New ai/scripts/measure_quant_drop.py walks the registry and gates each non-fp32 model against its quant_accuracy_budget_plcc. New CI step in the Tiny AI job runs --all. C2 nr_metric_v1 stays fp32 in this PR — its dynamic-batch export trips ORT's internal shape inference (tracked as T5-3c follow-up). | Accepted | tiny-ai, onnx, quantization, registry, ci, fork-local |
| ADR-0175 | Vulkan compute backend — scaffold-only audit-first PR (T5-1; closes T5-1 audit half, runtime + VIF-pathfinder + lavapipe smoke land in follow-up PRs per ADR-0127). New public header libvmaf_vulkan.h declaring VmafVulkanState / VmafVulkanConfiguration / vmaf_vulkan_state_init / _import_state / _state_free / vmaf_vulkan_list_devices / vmaf_vulkan_available. New core/src/vulkan/ (common + picture_vulkan) + core/src/feature/vulkan/ (3 kernel stubs: adm/vif/motion). All entry points return -ENOSYS. New enable_vulkan feature option (default disabled) with conditional subdir('vulkan') in core/src/meson.build. New 4-sub-test smoke at core/test/test_vulkan_smoke.c pinning the stub contract. New CI matrix row Build — Ubuntu Vulkan Scaffold (stub kernels) compiling with -Denable_vulkan=enabled. New ffmpeg patch 0004-libvmaf-wire-vulkan-backend-selector.patch mirroring the SYCL selector in 0003 (adds vulkan_device libvmaf filter option). New docs/backends/vulkan/overview.md. Zero runtime dependencies for the scaffold — no dependency('vulkan'), no volk, no glslc, no VMA; those land with the runtime PR. | Accepted | gpu, vulkan, scaffold, audit-first, fork-local |
| ADR-0176 | Vulkan VIF cross-backend gate. Two CI lanes: Vulkan VIF Cross-Backend (lavapipe, places=4) runs on every PR using Mesa lavapipe on ubuntu-24.04 (no GPU runner needed); Vulkan VIF Cross-Backend (Arc A380, advisory) runs nightly on the self-hosted Arc runner (parked behind if: false until label vmaf-arc registered). Both invoke scripts/ci/cross_backend_vif_diff.py comparing per-frame integer_vif_scale0..3 at places=4 against the CPU scalar reference on the Netflix normal pair. Empirical baseline at acf9f5b8 is ULP=0 (GLSL kernel uses native int64 accumulators); the places=4 slack is forward compatibility for future driver / kernel changes. Includes VMAF_FEATURE_EXTRACTOR_VULKAN = 1 << 5 flag, public state-level API (vmaf_vulkan_state_init / _state_free / _available / _list_devices), CLI flags --no_vulkan and --vulkan_device <N>. Closes T5-1b-v. | Accepted | ci, vulkan, gpu, numerical-correctness, fork-local |
| ADR-0177 | Vulkan motion kernel (T5-1c, motion half). Replaces the 37-line stub at core/src/feature/vulkan/motion_vulkan.c with a real VmafFeatureExtractor + GLSL compute shader (shaders/motion.comp). Implements separable 5-tap Gaussian blur ({3571, 16004, 26386, 16004, 3571}, sum=65536) + per-WG int64 SAD reduction; emits integer_motion and integer_motion2 with the standard 1-frame lag. motion3 (5-frame window mode) deliberately deferred — no shipped model uses it. Cross-backend gate generalized: scripts/ci/cross_backend_vif_diff.py gains --feature {vif,motion} selector; both lavapipe and Arc-nightly lanes run a second motion diff step. Empirical baseline: ULP=0 vs CPU on the Netflix normal pair (48 frames). Closes the motion half of T5-1c; ADM follow-up next PR. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0178 | Vulkan ADM kernel (T5-1c, ADM half — closes T5-1c). Replaces the 37-line stub at core/src/feature/vulkan/adm_vulkan.c with a real VmafFeatureExtractor (~700 LOC) backed by a new GLSL compute shader (shaders/adm.comp, ~660 LOC) implementing 4-scale CDF 9/7 DWT + decouple+CSF + per-band reductions. 16 pipelines per ADM extractor (one per (scale, stage)). Provides the standard integer_adm2, integer_adm_scale0..3 outputs. Cross-backend gate gains a third "ADM cross-backend diff" step in both lavapipe and Arc-nightly lanes. Empirical baseline: ULP=0 vs CPU on Netflix normal (48 frames at 576x324) + 1920x1080 checkerboard (3 frames); residual on scales 1-3 at full IEEE-754 precision is ~7e-7 from host-side double-summation order, well under places=4. Closes T5-1c — Vulkan kernel matrix now matches SYCL/CUDA for the default vmaf_v0.6.1 model. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0179 | float_moment SIMD parity (AVX2 + NEON), closing the only fully-scalar row in the SIMD-coverage matrix (T7-19). New compute_{1st,2nd}_moment_{avx2,neon} under core/src/feature/{x86,arm64}/moment_*.{c,h} follow the ansnr_avx2.c pattern: square in float, accumulate into double via scattered-tmp (AVX2) or lane-pair widening via vcvt_f64_f32 (NEON). Dispatched from float_moment.c::init via function pointers selected from vmaf_get_cpu_flags(). Tolerance-bounded (1e-7 relative — ~500× tighter than the production snapshot gate's places=4), not bit-exact, matching the contract documented in the kernel TU headers. New test_moment_simd runs four cases per arch (two random seeds, an aligned width, and a tiny edge case to exercise the per-row tail). End-to-end CLI output unchanged at JSON %g precision. | Accepted | simd, avx2, neon, feature-extractor, fork-local |
| ADR-0180 | CPU coverage matrix audit (post-T7-19 sweep): closes 5 stale gaps in one pass. T7-22 (ms_ssim per-scale SIMD) was already done via ADR-0138 / 0139 / 0140 — verified via 3.2× wall-clock speedup vs --cpumask 0xfffffffe. CAMBI scalar fallback already exists at cambi.c:446-460 — earlier "no pure-C scalar path" note was wrong. motion_v2 NEON has been shipped since 2026-04 at arm64/motion_v2_neon.c — earlier "x86 SIMD but no NEON" note was wrong. integer ansnr is a phantom row — no integer_ansnr extractor is registered. T7-21 (psnr_hvs AVX-512) closes as AVX2 ceiling with empirical evidence — 1.17× wall-clock speedup of AVX2 vs scalar means the 8-wide DCT is bandwidth-amortised; AVX-512 widening would force a 2-block host batch with no measurable payoff. Same verdict applies to deferred float_moment AVX-512. Matrix + BACKLOG updated to reflect ground truth. | Accepted | simd, audit, fork-local, doc-correction |
| ADR-0181 | T7-26 — Global feature-characteristics registry + per-backend dispatch-strategy modules (CUDA / SYCL / Vulkan). Replaces the existing per-context SYCL GRAPH_AREA_THRESHOLD heuristic with a per-feature registry: each VmafFeatureExtractor carries a VmafFeatureCharacteristics descriptor (n_dispatches_per_frame, is_reduction_only, min_useful_frame_area, dispatch_hint). Per-backend dispatch_strategy.{c,h} modules consume the descriptor + frame dims + env overrides and return the backend's primitive (SYCL DIRECT vs GRAPH_REPLAY, CUDA DIRECT vs GRAPH_CAPTURE stub, Vulkan PRIMARY_CMDBUF vs SECONDARY_CMDBUF_REUSE stub). New env override surface: VMAF_<BACKEND>_DISPATCH=feature:strategy,... (per-feature; supersedes legacy VMAF_SYCL_USE_GRAPH / VMAF_SYCL_NO_GRAPH aliases). Descriptors seeded for vif (4 dispatches, 720p area), motion (2 dispatches, 1080p area), adm (16 dispatches, 720p area). Empirical: ADM at 576×324 within 0.5% of pre-T7-26 behaviour (registry preserves byte-for-byte AUTO + 720p semantics). Foundation for adding 14 GPU long-tail kernels without 42 duplicate dispatch sites. Same PR also fixes a pre-existing GCC LTO type-mismatch surfaced by the new chars field — null.c, feature_lpips.c, ssimulacra2.c were missing #include "config.h" and saw a smaller VmafFeatureExtractor struct than feature_extractor.c. | Accepted | gpu, cuda, sycl, vulkan, architecture, fork-local |
| ADR-0182 | GPU long-tail batch 1 — psnr + ciede + moment on CUDA / SYCL / Vulkan (T7-23 + follow-ups). 14 of ~16 registered metrics are missing GPU coverage today; this ADR scopes the bundle that closes the 3 simplest GPU-friendly metrics across all 3 backends. Per-backend ordering: psnr Vulkan → psnr CUDA → psnr SYCL → ciede {3 backends} → moment {3 backends}. Each backend group lands as a separate commit on the feature branch so partial revert is cheap. Batch 1a (this PR) ships psnr Vulkan only — luma-only psnr_y, 89-LOC GLSL shader + 391-LOC host C, single dispatch/frame, subgroup-int64 reduction. Verified bit-exact (max_abs_diff = 0.0) vs CPU scalar on Intel Arc A380 / Mesa anv across 48 frames. Cross-backend gate gains a "PSNR cross-backend diff" step on the lavapipe lane. CUDA / SYCL twins + ciede / moment land in subsequent batches 1b–1d on top of the same ADR. | Accepted | gpu, cuda, sycl, vulkan, feature-extractor, fork-local |
| ADR-0183 | T7-28 — libvmaf_sycl FFmpeg filter, zero-copy QSV/VAAPI import. Closes the user-doc gap exposed by PR #126: hwdec via -hwaccel qsv was forced through a hwdownload,format=yuv420p round-trip because the regular libvmaf filter only takes software frames. New ffmpeg-patches/0005-libvmaf-add-libvmaf-sycl-filter.patch adds a dedicated libvmaf_sycl filter that consumes oneVPL mfxFrameSurface1 frames, extracts the VA surface ID, and routes through vmaf_sycl_import_va_surface for zero-copy DMA-BUF import on the Level Zero / SYCL compute queue. Pairs with existing 0003-* (sycl_device option on regular libvmaf filter) so users have both paths: libvmaf=sycl_device=N for software frames + SYCL compute, libvmaf_sycl=… for QSV hwdec + zero-copy SYCL. T7-29 (Vulkan VkImage import) remains open — needs new C-API surface in libvmaf_vulkan.h. | Accepted | sycl, ffmpeg, fork-local, zero-copy |
| ADR-0184 | T7-29 part 1 — Vulkan VkImage zero-copy import C-API scaffold. Symmetric to T7-28 (SYCL VAAPI/dmabuf, ADR-0183) but for Vulkan: when FFmpeg decodes via -hwaccel vulkan -hwaccel_output_format vulkan, the regular libvmaf filter forces a hwdownload,format=yuv420p round-trip because the C-API surface for VkImage import doesn't exist. This ADR adds three new entry points in libvmaf_vulkan.h — vmaf_vulkan_import_image (takes external VkImage + VkSemaphore), vmaf_vulkan_wait_compute, vmaf_vulkan_read_imported_pictures — mirroring the SYCL backend's import surface. Header purity: handles cross the ABI as uintptr_t (matches the libvmaf_cuda.h precedent). Scaffold only: every function returns -ENOSYS, mirroring how the original Vulkan backend shipped via ADR-0175. T7-29 part 2 (real implementation: vkCmdCopyImageToBuffer + timeline-semaphore wait) and part 3 (ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch) follow in subsequent PRs. | Accepted | vulkan, ffmpeg, fork-local, zero-copy, scaffold |
| ADR-0185 | T7-31 — hide volk / vk* symbols from libvmaf.so's public ABI. When libvmaf is built with -Denable_vulkan=enabled, the bundled volk Vulkan-loader leaked ~30 volk* + the full vk* API into the .so's exported symbols. Static FFmpeg builds (BtbN-style cross-toolchain releases, lawrence's glibc-2.28 environment, etc.) that link both libvmaf and libvulkan.a get GNU-ld multiple-definition errors at the final link step. Fix: pass -Wl,--exclude-libs,ALL on the libvmaf.so link command in core/src/meson.build; gates off Darwin / Windows where the flag isn't supported (those linkers don't auto-export static-archive symbols anyway). Verified via nm -D libvmaf.so (zero vk* / volk* symbols post-fix); smoke + end-to-end psnr_vulkan on Arc A380 unchanged. | Accepted | vulkan, build, fork-local, abi |
| ADR-0186 | T7-29 parts 2 + 3 — Vulkan VkImage zero-copy import implementation + libvmaf_vulkan FFmpeg filter. Drops the -ENOSYS stubs from ADR-0184: per-state ref/dis staging VkBuffer pair (HOST_VISIBLE, DATA_ALIGN-strided), vkCmdCopyImageToBuffer + timeline-semaphore wait per frame, vmaf_vulkan_state_build_pictures builder so read_imported_pictures routes through standard vmaf_read_pictures with no-op release callbacks. Adds vmaf_vulkan_state_init_external so the FFmpeg filter runs libvmaf compute on the decoder's VkDevice (source VkImage handles are device-bound). New ffmpeg-patches/0006-libvmaf-add-libvmaf-vulkan-filter.patch packages the libvmaf_vulkan filter consuming AV_PIX_FMT_VULKAN frames. Synchronous v1 design (fence-wait inside import_image); async pending-fence v2 + true zero-copy GPU compute deferred. Also introduces fork rule §12 r14: PRs that change libvmaf surfaces probed by ffmpeg-patches/ must update the relevant patch in the same PR. Smoke 10/10; float_moment cross-backend gate clean (0/48 × 4 metrics on Arc A380). | Accepted | vulkan, ffmpeg, fork-local, zero-copy |
| ADR-0187 | T7-23 / batch 1c part 1 — ciede_vulkan extractor. First non-bit-exact GPU twin in the fork: per-pixel ciede2000 ΔE uses ~40 transcendentals (pow / sqrt / sin / atan2) so bit-exactness against the libm-based CPU is not on the table. Single-dispatch GLSL shader emits per-WG float partials; host accumulates in double, divides by W·H, applies the CPU's logarithmic transform 45 - 20·log10(mean_ΔE) for the ciede2000 metric. 6 storage-buffer bindings (ref + dis Y/U/V at full luma resolution; chroma upscaled host-side via the ciede.c::scale_chroma_planes pattern). Empirical: Intel Arc A380 + Mesa anv → max_abs = 1.0e-5 across 48 frames at 576×324, well under places=4 threshold (≤5e-5) — gate runs at places=4 for parity with the bit-exact kernels. New CI step on the lavapipe lane. CUDA + SYCL twins follow as batch 1c parts 2 + 3. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0188 | T7-23 / batch 2 scope — psnr_hvs / ssim / ms_ssim across CUDA + SYCL + Vulkan. Picks up where batch 1 closed (PR #137). Per-metric ordering: ssim first (smallest, scaffolds the separable Gaussian filter ms_ssim reuses), then ms_ssim (5-level pyramid built on the ssim kernel), then psnr_hvs (largest — DCT-based). Per-backend ordering inside each metric: Vulkan → CUDA → SYCL (same as batch 1). Precision targets: places=4 for ssim/ms_ssim with measure-then-set-the-contract approach (relax to places=3 if needed), places=2 for psnr_hvs. ssim/ms_ssim luma-only; psnr_hvs needs all three planes (CUDA gets free chroma upload via PR #137's bitmask fix). 9 PRs total to close batch 2 (3 metrics × 3 backends), each ~500-1000 LOC + per-metric ADRs. | Accepted | gpu, cuda, sycl, vulkan, feature-extractor, fork-local |
| ADR-0189 | T7-23 / batch 2 part 1a — float_ssim_vulkan extractor. Vulkan twin of the active CPU float_ssim. Two-dispatch design: horizontal 11-tap Gaussian over ref / cmp / ref² / cmp² / ref·cmp into 5 intermediate float buffers, then vertical 11-tap + per-pixel SSIM combine + per-WG float partials. Host accumulates partials in double, divides by (W-10)·(H-10) (matches CPU's iqa_ssim valid-region averaging). 11-tap kernel weights baked into GLSL byte-for-byte from g_gaussian_window_h in iqa/ssim_tools.h. picture_copy host-side normalises uint sample → float [0, 255] before upload (matches float_ssim.c). v1 limitation: scale=1 only — auto-detect rejects scale > 1 with -EINVAL; production 1080p needs --feature float_ssim_vulkan:scale=1 pinned (or smaller input). Cross-backend gate fixture (576×324) auto-resolves to scale=1. Empirical: Intel Arc A380 + Mesa anv → max_abs = 1.0e-6 across 48 frames at 576×324, well under places=4 threshold. CUDA + SYCL twins follow as batch 2 parts 1b + 1c. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0190 | T7-23 / batch 2 part 2a — float_ms_ssim_vulkan extractor. Wang multi-scale SSIM on Vulkan: 5-level pyramid built via 9-tap 9/7 biorthogonal LPF + 2× downsample (ms_ssim_decimate.comp), then per-scale SSIM compute that emits three per-WG partials (l, c, s) instead of a single combined SSIM (ms_ssim.comp — variant of ssim.comp). Host accumulates partials in double per scale, applies the Wang weights α/β/γ (matches ms_ssim.c::g_alphas/g_betas/g_gammas byte-for-byte) for the product combine. Surfaced one bug during bring-up: the σx²/σy² clamp-to-zero in ssim_variance_scalar (line 165) is required to avoid sqrt(negative) → NaN at scale 0 when float ULP errors push variances slightly negative on flat regions. Min-dim guard mirrors ADR-0153 (176×176). v1 doesn't implement enable_lcs (15 extra metrics). Empirical: Intel Arc A380 + Mesa anv → max_abs = 2.0e-6 across 48 frames at 576×324, well under places=4. CUDA + SYCL twins follow as batch 2 parts 2b + 2c. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0191 | T7-23 / batch 2 part 3a — float_psnr_hvs_vulkan extractor. First DCT-based GPU kernel in the fork. One GLSL shader (psnr_hvs.comp), one dispatch per plane (3 per frame). Per-WG = one 8×8 block: cooperatively load samples + per-quadrant means/vars, run the Xiph integer od_bin_fdct8x8 scalar (lifting + RSHIFT — int32 arithmetic matches CPU bit-for-bit), per-thread mask + masked-error contribution, subgroupAdd to a per-block float partial. Host accumulates partials in double, applies score / pixels / samplemax² then 10·log10(1/score) per plane. Combined psnr_hvs = 0.8·Y + 0.1·(Cb + Cr). Step=7 overlap matches CPU loop. 6 pipelines (3 planes × 2 bpc paths) via specialisation constants. CSF/mask tables baked per plane. places=2 precision target per ADR-0188 (DCT integer-exact, but per-block float reductions in s_mask / s_gvar and the per-plane log10 limit the floor). CUDA + SYCL twins follow as batch 2 parts 3b + 3c. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0192 | T7-23 / batch 3 scope — close every remaining GPU-coverage gap: integer_motion_v2 + float_ansnr + ssimulacra2 + cambi (Group A: no GPU twin yet) plus float_psnr / float_motion / float_vif / float_adm (Group B: float twins of int kernels already on GPU). Per-metric ordering by ascending complexity: motion_v2 (300 LOC, reuses integer_motion convolve) → float_ansnr (124 LOC) → float twins (4 metrics, smallest first) → ssimulacra2 (XYB + IIR + per-stage SSIM) → cambi (sequential range-tracking, hardest port — feasibility spike required). Per-backend Vulkan → CUDA → SYCL (matches batches 1 + 2). Precision contracts: places=4 for motion_v2; places=3 for float twins + float_ansnr; places=2 for ssimulacra2 + cambi (measure-first). 21+ PRs to close (7 metrics × 3 backends, ssimulacra2 + cambi may sub-split). After this batch, every registered feature extractor has at least one GPU twin (lpips remains ORT-delegated per ADR-0022). Float twins kept native (not aliased to int kernels — different input domains). | Accepted | gpu, cuda, sycl, vulkan, feature-extractor, fork-local |
| ADR-0193 | T7-23 / batch 3 part 1a — integer_motion_v2_vulkan extractor. Single-dispatch GLSL kernel that exploits convolution linearity (the SAD of blurred prev/cur frames equals the sum of absolute values of the blurred per-pixel diff), so each frame computes its score in one V→H separable convolve over (prev_ref - cur_ref), accumulating absolute values into per-WG int64 partials with no blurred-state buffer (vs motion_vulkan's ping-pong). Raw-pixel ping-pong (ref_buf[2]) halves upload bandwidth vs framework prev_ref. Corrected by ADR-0662: current CPU integer_motion_v2.c::mirror uses reflect-101 (2*size - idx - 2), and CUDA / SYCL / Vulkan twins must match that literal; the original -1 prose here was stale. | Accepted | vulkan, gpu, feature-extractor, fork-local, bit-exact |
| ADR-0194 | T7-23 / batch 3 part 2 — float_ansnr_{vulkan,cuda,sycl} extractors. Single-dispatch GPU kernel applies the CPU's 3x3 ref filter and 5x5 dis filter inline from a 20×20 shared/SLM tile, then accumulates per-pixel sig = ref_filtr² and noise = (ref_filtr - filtd)² into per-WG float partials. Host reduces in double and applies the CPU formulas float_ansnr = 10*log10(sig/noise) and float_anpsnr = MIN(10*log10(peak²·w·h/max(noise, 1e-10)), psnr_max). Edge-replicating mirror (2*size - idx - 1) matches CPU ansnr_filter2d_s — same divergence-from-motion footgun as ADR-0193. Empirical floor on cross-backend gate fixture: max_abs_diff = 6e-6 (8-bit, 48 frames) and 2e-6 (10-bit, 3 frames) on all three backends — identical numbers across Vulkan / CUDA / SYCL. Closes ANSNR's "CPU-only float, no GPU twin" matrix gap. | Accepted | vulkan, cuda, sycl, gpu, feature-extractor, fork-local, places-4 |
| ADR-0195 | T7-23 / batch 3 part 3 — float_psnr_{vulkan,cuda,sycl} extractors. Smallest GPU twin in the long-tail: ~120 LOC GLSL / ~110 LOC PTX / ~150 LOC SYCL. Single-dispatch, no halo, no shared tile — every pixel is independent. Per-thread (ref - dis)² (float), sub-group reduce + SLM cross-subgroup reduce → per-WG float partials, host accumulates in double and applies MIN(10·log10(peak² / max(noise/(w·h), 1e-10)), psnr_max). Empirically bit-exact vs CPU on all three backends, both 8-bit (48 frames) and 10-bit (3 frames) — max_abs_diff = 0.0e+00 everywhere. Float-domain kernel structurally too simple to drift; host-side double reduction absorbs any per-WG ULP noise. Drive-by docs fix: features.md row claimed float_psnr_y / _cb / _cr plane outputs (wrong — the CPU extractor only emits float_psnr); corrected in this PR. First of four Group B float twins from ADR-0192. | Accepted | vulkan, cuda, sycl, gpu, feature-extractor, fork-local, bit-exact |
| ADR-0196 | T7-23 / batch 3 part 4 — float_motion_{vulkan,cuda,sycl} extractors. Float twin of integer_motion's GPU kernels: same V→H 5-tap separable Gaussian blur (FILTER_5_s float weights summing to ~1.0), same 2-buffer ping-pong of blurred refs, same per-WG float SAD partials + host double reduction. motion = sad / (w·h), motion2 = min(prev, cur) emitted at index-1 (delayed-by-one). Mirror padding: skip-boundary 2*(sup-1) - idx matches CPU convolution_internal.h::convolution_edge_s (NOT motion_v2's edge-replicating). Empirical max_abs_diff = 3e-6 (8-bit, 48 frames) / 1e-6 (10-bit, 3 frames) on all three backends with identical numbers — strong correctness signal. Lavapipe + Arc A380 + RTX 4090. Second of four Group B float twins from ADR-0192. | Accepted | vulkan, cuda, sycl, gpu, feature-extractor, fork-local, places-4 |
| ADR-0197 | T7-23 / batch 3 part 5 — float_vif_{vulkan,cuda,sycl} extractors. Third Group B float twin. 4-scale Gaussian pyramid (filters 17/9/5/3 at default vif_kernelscale=1.0) + per-pixel vif_stat_one_pixel. 7 dispatches/frame: 4 compute + 3 decimate. CPU's VIF_OPT_HANDLE_BORDERS branch — per-scale dims prev/2, decimation samples (2*gx, 2*gy) with mirror padding. Mirror-asymmetry fix: CPU has two H-mirror formulas that differ by 1 — vif_mirror_tap_h (-1, scalar fallback only) vs convolution_edge_s (-2, AVX2 production border path). The GPU follows the AVX2 form (production); using scalar's form drifts 5.46e-4 at scale 1, using AVX2's brings it to 1.4e-5. places=4 across all 4 scales, identical max_abs_diff across all three backends (1e-6 / 1.4e-5 / 1.8e-5 / 3.7e-5 at 8-bit; tighter at 10-bit). v1 restricts kernelscale=1.0 only. | Accepted | vulkan, cuda, sycl, gpu, feature-extractor, fork-local, places-4 |
| ADR-0198 | Follow-up to ADR-0185 — -Wl,--exclude-libs,ALL only takes effect at the gcc -shared step that produces libvmaf.so; static-archive builds (default_library=static -Denable_vulkan=enabled) still bundle volk's full vk* API as STB_GLOBAL symbols inside libvmaf.a, which collides with the Khronos libvulkan.a in BtbN-style fully-static FFmpeg link environments (lawrence's repro 2026-04-27, ~700 multi-def errors). Fix: rename volk's vk* symbols to vmaf_priv_vk* at the C preprocessor level via a force-included header generated from volk.h. The packagefile parses every extern PFN_vkXxx vkXxx; declaration, emits #define vkXxx vmaf_priv_vkXxx (784 entries for volk-1.4.341), and -includes the result on volk.c and every libvmaf TU pulling in volk_dep. Identical fix for shared and static; no per-build-mode meson branches. Verified: shared nm -D libvmaf.so reports 0 leaked vk* (unchanged from ADR-0185); static nm libvmaf.a reports 0 GLOBAL vk* (was ~700) and 719 GLOBAL vmaf_priv_vk*; BtbN-style gcc -static main.c libvmaf.a libvulkan-stub.a link succeeds; test_vulkan_smoke 10/10 pass on the renamed build (volk runtime dispatch still functional). | Accepted | vulkan, build, fork-local, abi |
| ADR-0199 | T7-23 / batch 3 part 6 — float_adm_vulkan extractor, fourth and final Group B float twin. Float twin of integer_adm_vulkan (ADR-0178): same 4-stage / 4-scale wave-of-stages design (16 pipelines), float buffers throughout, host-side double accumulation across WGs. Stage 0 = DWT vertical (ref+dis fused, dim_z=2), stage 1 = DWT horizontal (4 bands), stage 2 = decouple + CSF (writes csf_a + csf_f), stage 3 = CSF denominator + Contrast Measure fused (1D dispatch over 3 bands × num_active_rows; per-WG 6-slot float partials). Mirror-asymmetry status: float_adm has NO trap analogous to ADR-0197 — both the scalar adm_dwt2_s and the AVX2 float_adm_dwt2_avx2 consume the same dwt2_src_indices_filt_s index buffer (2 * sup - idx - 1 for both axes); the GPU follows that. Picture-copy semantics applied in-shader ((u8 + offset) for 8-bit, (u16 / scaler) + offset for HBD; offset = -128). places=4 cross-backend contract on the lavapipe lane. CUDA + SYCL twins land as a focused follow-up. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0200 | Bug-fix follow-up to ADR-0198. The -include volk_priv_remap.h flag was attached to volk_dep.compile_args; on default_library=static builds meson copies dependency compile_args into the generated libvmaf.pc Cflags: so consumers can re-link against transitive deps. lawrence's BtbN-style fully-static FFmpeg build (cross-toolchain glibc-2.28, 2026-04-27 22:19) hit <command-line>: fatal error: /<libvmaf-build-dir>/subprojects/volk-vulkan-sdk-1.4.341.0/volk_priv_remap.h: No such file or directory on the check_func_headers aom/aom_codec.h probe — the build-dir path no longer existed after libvmaf was installed to /opt/ffbuild/. Fix: move the -include off volk_dep.compile_args and onto libvmaf's private c_args via vmaf_cflags_common += ['-include', volk_priv_remap_h_path] in core/src/vulkan/meson.build, where the path is pulled from subproject('volk').get_variable('volk_priv_remap_h_path'). c_args: are private to the target and don't leak into pkg-config; symbol-rename behaviour is byte-for-byte identical. Verified post-fix Cflags: -I${includedir} -I${includedir}/libvmaf -DVK_NO_PROTOTYPES -pthread — no leaked path. nm libvmaf.a still reports 0 GLOBAL vk* and 719 vmaf_priv_vk*. | Accepted | vulkan, build, fork-local, abi |
| ADR-0201 | T7-23 / batch 3 part 7 — ssimulacra2_vulkan extractor. 4-shader Vulkan kernel: ssimulacra2_xyb.comp (linear-RGB → XYB with deterministic in-shader cube root, port of vmaf_ss2_cbrtf), ssimulacra2_mul.comp (3-plane elementwise product for ref²/dis²/ref·dis pre-blur), ssimulacra2_blur.comp (separable Charalampidis 2016 3-pole IIR with sigma=1.5 — one workgroup per row for H pass, one per column for V pass; per-channel offsets via push consts to avoid the descriptor-update-between-record trap), and ssimulacra2_ssim.comp (per-pixel SSIMMap + EdgeDiffMap stats with 18-slot per-WG shared-memory halving reduction). Host: YUV→linear-RGB scalar libjxl port + 2×2 box downsample between scales (with full-resolution plane stride preserved across the pyramid so GPU channel offsets stay constant). All 4 shaders compile with -O0 to disable SPIR-V FMA contraction. Empirical (Netflix normal pair, 48 frames): per-scale SSIM + edge-diff stats agree CPU-vs-Vulkan to 4–5 decimal places; pooled ssimulacra2 score max_abs_diff = 1.59e-2 (mean 5.30e-3). Cross-backend gate runs at places=1 — the parent ADR-0192's nominal places=2 was anticipated to "may surprise upward; measure first" and the multi-stage XYB+IIR+SSIM-combine+log float pipeline lands at the places=1 floor. Min-dim guard rejects below 8×8. CUDA + SYCL twins follow as batch 3 parts 7b + 7c. | Accepted | vulkan, gpu, feature-extractor, fork-local |
| ADR-0202 | T7-23 / batch 3 parts 6b + 6c — float_adm_cuda + float_adm_sycl extractors, the CUDA and SYCL twins of ADR-0199. Direct ports of the Vulkan kernel: same 4-stage / 4-scale wave-of-stages design, same -1 mirror form, same fused stage 3 with cross-band CM threshold, same per-scale [csf_h/v/d, cm_h/v/d] 6-slot WG partials reduced on the host in double. CUDA: single .cu file with four __global__ entry points (matches float_vif_cuda), submit/collect async-stream pattern. SYCL: single .cpp with launch_* templates over SCALE. Two precision-critical fixes from bring-up: (1) --fmad=false on the float_adm_score fatbin via a new per-kernel cuda_cu_extra_flags dict in meson.build — NVCC's default FMA contraction in the angle-flag dot product cascades through the cube reductions and pushes scale-3/adm2 past places=4 (3.6e-4 max_abs vs CPU before fix). Scoped to this one kernel; integer ADM keeps its existing FMA-on path. (2) Parent-LL dimension trap — stage 0 at scale > 0 must clamp/mirror against the parent's LL output dims (scale_w/h[scale]), NOT the parent's full-resolution dims (scale_w/h[scale - 1]); first cut clamped against the wrong bounds and let parent reads wander into uninitialised memory. Both fixes documented inline at the call sites. Verified max_abs_diff ≤ 6e-6 across all 5 outputs (adm2, adm_scale0..3) on the Netflix normal pair; checkerboard 1px is bit-exact. | Accepted | cuda, sycl, gpu, feature-extractor, fork-local, places-4 |
| ADR-0203 | Implementation follow-up to ADR-0242. Adds runnable Netflix-corpus loader + feature extractor + vmaf_v0.6.1 distillation + PyTorch dataset + PLCC/SROCC/KROCC/RMSE eval harness + Lightning-style training entry point under ai/data/ and ai/train/. Three architectures registered: linear (7 params), mlp_small (257 params, default), mlp_medium (2 561 params). Default validation split holds out the Tennis_24fps source (1-source-out, content-disjoint). Per-clip JSON cache at $VMAF_TINY_AI_CACHE with atomic write-rename. --epochs 0 --assume-dims 16x16 smoke command works without the real corpus or a built vmaf binary so CI can verify the harness end-to-end. Does NOT run training — the actual training is a manual user invocation deferred to the next PR. New docs/ai/training.md Netflix-corpus section + 25 unit tests under ai/tests/. | Accepted | ai, training, fork-local, onnx, docs |
| ADR-0205 | cambi GPU feasibility spike (mandated by ADR-0192 §Consequences). Verdict: feasible as a hybrid host/GPU pipeline, mirroring ADR-0201's precedent. GPU shaders cover preprocessing + per-row derivative kernel + 7×7 spatial-mask summed-area table + 2× decimate + 3-tap separable mode filter. The precision-sensitive calculate_c_values sliding-histogram pass + top-K spatial pooling stay on the host; the GPU phases are integer + bit-exact w.r.t. CPU so the places=2 precision contract holds trivially. Three classical re-formulations evaluated: (I) single-WG direct port — rejected, ~1/64 GPU utilisation; (II) parallel-scan reformulation — rejected for v1, materialises 17 GiB intermediate at 4K; (III) direct per-pixel histogram — deferred to v2 as profile-driven perf polish, ~9× CPU bandwidth. Companion research digest: docs/research/0020-cambi-gpu-strategies.md (literature: Blelloch 1990, Sengupta 2007, Merrill & Grimshaw 2016). v1 LOC estimate: ~1230 (host glue ~700 + 6 shaders ~400 + wiring ~130). This PR ships the architecture sketch + reference shader scaffolds + dormant cambi_vulkan.c host skeleton (not yet build-wired, matching ssimulacra2 precedent); a follow-up PR wires + integrates + validates. | Accepted | gpu, vulkan, cambi, feasibility-spike, fork-local |
| ADR-0206 | T7-23 / batch 3 part 7b + 7c — ssimulacra2_cuda and ssimulacra2_sycl extractors, mechanical ports of the ADR-0201 Vulkan hybrid host/GPU pipeline. Host: YUV→linear-RGB + 2×2 box downsample + linear-RGB→XYB + per-pixel SSIM/EdgeDiff combine in double precision (verbatim ports of ssimulacra2.c scalar paths — identical to Vulkan twin). GPU: 3-plane elementwise multiply (ssimulacra2_mul3) + separable Charalampidis 2016 3-pole IIR blur (ssimulacra2_blur_h / ssimulacra2_blur_v, one work-item per row/column). CUDA fatbin pinned with -Xcompiler=-ffp-contract=off --fmad=false via per-kernel cuda_cu_extra_flags map — without it the IIR's n2*sum - d1*prev1 - prev2 fuses to FMAs and per-step rounding compounds across the 6-scale pyramid past places=4. SYCL relies on the existing -fp-model=precise for the same effect. CUDA fex is .extract-pattern (synchronous host loop); D2H-copies raw YUV from picture_cuda's device-side VmafPicture.data[] into pinned host scratch before host YUV→RGB. Empirical (CUDA, RTX 4070): Netflix normal pair max_abs_diff = 1.0e-6 (places=4 with 5-decade margin); both checkerboard pairs bit-exact (0.0). SYCL precision verified through CI lavapipe-equivalent gate. Closes batch 3 part 7 across all three GPU backends. | Accepted | cuda, sycl, gpu, feature-extractor, fork-local |
| ADR-0207 | T5-4 — Quantization-Aware Training (QAT) design for the fork's tiny-AI surface. The 2026-04-28 backlog audit (§A.2.1) flagged QAT as untracked; per the Section-A audit decisions the user direction is implement, do not close. ADR locks the pass design before code lands: PyTorch torch.ao.quantization modern API (prepare_qat_fx / convert_fx); pretrain fp32 → insert FakeQuant (default symmetric per-tensor activations + per-channel weight observers, matching the PTQ static recipe in Research-0006 §2 so QAT-vs-static delta is attributable to training, not qconfig drift) → fine-tune at 10× reduced LR → convert_fx + torch.onnx.export(..., opset_version=17) to a QDQ .int8.onnx; pass through the existing measure_quant_drop.py audit harness with quant_mode="qat" (extending the dynamic/static enum) and a 0.002 default quant_accuracy_budget_plcc. Trainer hook lands in new ai/train/qat.py; ai/scripts/qat_train.py becomes a real implementation (was NotImplementedError scaffold per ADR-0173's audit-first sequence). Pairs with T5-3e so QAT models round-trip through the same EP set as PTQ models (CPU + CUDA + Arc OpenVINO/Level Zero). Alternatives considered: legacy eager-mode torch.quantization.prepare_qat (deprecated; rejected); ORT-internal Olive (inverts the PyTorch-first training flow; rejected); skip QAT and tighten the static-PTQ budget (rejected — direct contradiction of user direction §A.2.1). | Proposed | ai, quantization, dnn, tiny-ai, fork-local |
| ADR-0208 | T5-4 — first per-model QAT implementation, plus the trainer hook + CLI driver landing the ADR-0207 design. New ai/train/qat.py (Lightning-compatible run_qat + QatConfig); ai/scripts/qat_train.py rewritten from NotImplementedError scaffold to real CLI driver; new ai/configs/learned_filter_v1_qat.yaml. Two-step pipeline (PyTorch QAT → fp32 ONNX → ORT static-quantize) bridges PyTorch 2.11's two ONNX exporter bugs — neither legacy TorchScript (quantized::conv2d ops not standard ONNX) nor TorchDynamo (Conv2dPackedParamsBase.__obj_flatten__ AttributeError) can export convert_fx output to QDQ ONNX today. Bridge: copy QAT-conditioned weights into fresh fp32 module, export legacy ONNX, run quantize_static with calibration drawn from QAT distribution. Empirical on synthetic-corpus learned_filter_v1 (256 train pairs, 20 fp32 + 10 QAT epochs, 32-sample held-out): within-pipeline QAT-fp32 vs QAT-int8 PLCC drop 0.000081 (budget 0.002, PASS by 25×); cross-pipeline fp32-baseline vs QAT-int8 PLCC drop 0.001228 (PASS); static-PTQ comparison drop 0.000066. On this tiny model static-PTQ already clears the budget (drop 0.000066 against 0.002), so learned_filter_v1 stays on quant_mode: "dynamic"; QAT pipeline validated and ready for the next model where static misses budget. CI smoke test under ai/tests/test_qat_smoke.py. | Proposed | ai, quantization, dnn, tiny-ai, qat, fork-local |
| ADR-0209 | T5-2 — embedded MCP server scaffold (audit-first), implementing the ADR-0128 governance + Research-0005 design. New public header libvmaf_mcp.h declaring VmafMcpServer, VmafMcpConfig + per-transport configs, vmaf_mcp_init / _start_sse / _start_uds / _start_stdio / _stop / _close / _available / _transport_available. New stub TU at core/src/mcp/mcp.c — every entry point validates its arguments (NULL → -EINVAL) then returns -ENOSYS until the T5-2b runtime PR wires cJSON + mongoose + the dedicated MCP pthread + the SPSC ring buffer. New umbrella enable_mcp (boolean, default false) + per-transport sub-flags enable_mcp_sse / enable_mcp_uds / enable_mcp_stdio in meson_options.txt; conditional subdir('mcp') in core/src/meson.build. New 12-sub-test smoke at core/test/test_mcp_smoke.c pinning the -ENOSYS + NULL-guard contract. New docs/mcp/embedded.md. Zero runtime dependencies for the scaffold — no cJSON, no mongoose, no transport bodies; all land with T5-2b. Same audit-first shape as ADR-0175 (Vulkan T5-1) + ADR-0184 (T7-29 part 1). | Accepted | mcp, agents, api, scaffold, audit-first, fork-local |
| ADR-0210 | T7-36 — cambi Vulkan integration (Strategy II hybrid GPU/host). Closes ADR-0192 long-tail terminus. | Accepted | vulkan, gpu, cambi, feature-extractor, fork-local, places-4 |
| ADR-0211 | T6-9 — formal tiny-model registry schema (JSON Schema Draft 2020-12) with new license + Sigstore-bundle metadata, plus the new --tiny-model-verify CLI flag wiring cosign verify-blob via posix_spawnp(3p). Schema bumped to schema_version: 1 (loader accepts both 0 and 1). vmaf_dnn_verify_signature() exposed in the public libvmaf/dnn.h header; fails closed on missing registry entry, missing bundle file, missing cosign, or non-zero exit. Validator at ai/scripts/validate_model_registry.py (jsonschema with structural fallback). Five entries registered today (learned_filter_v1, lpips_sq_v1, nr_metric_v1, two CI smoke probes); license metadata tracks fork-trained models as BSD-3-Clause-Plus-Patent and upstream-derived models with their original licenses (LPIPS-Sq is BSD-2-Clause). Bundle files themselves are populated at release time by the existing supply-chain workflow; pre-release the verifier treats absence as a fail-closed signal. | Accepted | ai, dnn, security, supply-chain, fork-local |
| ADR-0212 | HIP (AMD ROCm) compute backend — scaffold-only audit-first PR (T7-10; closes T7-10 audit half, runtime + VIF-pathfinder land in follow-up PRs). New public header libvmaf_hip.h declaring VmafHipState / VmafHipConfiguration / vmaf_hip_state_init / _import_state / _state_free / vmaf_hip_list_devices / vmaf_hip_available. New core/src/hip/ (common + picture_hip + dispatch_strategy) + core/src/feature/hip/ (3 kernel stubs: adm/vif/motion). All entry points return -ENOSYS. New enable_hip boolean option (default false) with conditional subdir('hip') in core/src/meson.build. New 9-sub-test smoke at core/test/test_hip_smoke.c exercising every public C-API entry point. New CI matrix row Build — Ubuntu HIP (T7-10 scaffold) compiling with -Denable_hip=true. New docs/backends/hip/overview.md + docs/research/0033-hip-applicability.md + docs/backends/index.md flipped from "planned" to "scaffold". Zero hard runtime dependencies for the scaffold — dependency('hip-lang', required: false) is silently absent on stock Ubuntu runners; ROCm SDK arrives with the runtime PR. Mirrors the Vulkan T5-1 scaffold (ADR-0175). | Accepted | gpu, hip, rocm, amd, scaffold, audit-first, fork-local |
| ADR-0213 | T7-38 — SSIMULACRA 2 SVE2 SIMD parity for IIR blur + PTLR + the four pointwise kernels. Mirrors the NEON sibling lane-for-lane under a fixed 4-lane svwhilelt_b32(0, 4) predicate so the SVE2 path is byte-identical to NEON regardless of the runtime vector length, satisfying the ADR-0138 / ADR-0139 / ADR-0140 byte-exact contract. Runtime-gated via getauxval(AT_HWCAP2) & HWCAP2_SVE2; NEON remains the fallback. Build probe (cc.compiles(... -march=armv9-a+sve2)) leaves HAVE_SVE2 unset on toolchains without SVE2 intrinsics so the legacy NEON-only build path is unchanged. Validated under qemu-aarch64-static -cpu max: dispatch surfaces NEON=1 SVE2=1 and all 11 test_ssimulacra2_simd bit-exactness subtests pass byte-for-byte against the scalar reference. Drops the "deferred pending CI hardware" footnote in Research-0016 / Research-0017 — qemu validates correctness today, native aarch64 perf runner remains a follow-up. Closes backlog item T7-38. | Accepted | simd, arm64, sve2, ssimulacra2, qemu, fork-local |
| ADR-0214 | T6-8 — GPU-parity CI gate. New scripts/ci/cross_backend_parity_gate.py iterates every (feature, backend-pair) cell, diffs per-frame metrics with a feature-specific absolute tolerance declared in one place (FEATURE_TOLERANCE — 5e-5 default = places=4 from ADR-0125/0138/0140; 5e-3 for ciede / ssimulacra2; 5e-4 for psnr_hvs; 1e-2 FP16 contract via --fp16-features), and emits one JSON record + one Markdown row per cell. New CI lane vulkan-parity-matrix-gate in tests-and-quality-gates.yml runs on every PR over CPU↔Vulkan/lavapipe (no GPU runner needed); CUDA/SYCL/hardware-Vulkan are advisory until a self-hosted runner is wired in. Generalises and is the long-term replacement for the per-feature scripts/ci/cross_backend_vif_diff.py lane (kept for one release cycle). New user doc docs/development/cross-backend-gate.md; cross-linked from docs/backends/index.md; libvmaf/AGENTS.md rebase-sensitive invariant note. The gate never modifies feature implementations — verify-only. | Accepted | ci, gpu, vulkan, cuda, sycl, agents, fork-local |
| ADR-0215 | T6-7 — FastDVDnet temporal pre-filter (5-frame sliding window) lands as a registered feature extractor fastdvdnet_pre in core/src/feature/fastdvdnet_pre.c, backed by an ONNX model with the contract frames: [1, 5, H, W] -> denoised: [1, 1, H, W] (channel axis stacks [t-2, t-1, t, t+1, t+2]). Internal 5-slot ring buffer with replicate-edge clamp at clip start/end; per-frame scalar fastdvdnet_pre_l1_residual appended through vmaf_feature_collector_append. This PR ships a smoke-only placeholder ONNX (model/tiny/fastdvdnet_pre.onnx, ~6 KB, randomly-initialised 3-layer CNN with the correct shape contract — registry row smoke: true); real upstream-derived FastDVDnet weights + the FFmpeg vmaf_pre_temporal filter that consumes the denoised frame buffer are tracked as T6-7b. Same vmaf_dnn_session_* integration shape as feature_lpips.c; declines cleanly on missing model_path with -EINVAL. New unit test core/test/test_fastdvdnet_pre.c mirrors test_lpips.c registration + options-table contract. New user-facing doc docs/ai/models/fastdvdnet_pre.md; roadmap §3.3 row marked shipped. | Accepted | ai, dnn, feature-extractor, wave-1, fork-local |
| ADR-0216 | T3-15(b) — psnr_vulkan chroma extension. Extends the luma-only ADR-0182 extractor to emit psnr_cb / psnr_cr alongside psnr_y. Three-element arrays in PsnrVulkanState carry per-plane input + SE-partials buffers; a single command buffer issues three back-to-back dispatches of the existing plane-agnostic psnr.comp shader against per-plane (width, height, num_workgroups_x) push constants. Subsampling derived from pix_fmt: 4:2:0 → w/2 × h/2, 4:2:2 → w/2 × h, 4:4:4 → w × h. YUV400 clamps n_planes = 1 so chroma dispatches and emits are skipped. provided_features becomes {"psnr_y", "psnr_cb", "psnr_cr"}. psnr_max[p] follows CPU integer_psnr.c default branch ((6 * bpc) + 12). Cross-backend gate (scripts/ci/cross_backend_vif_diff.py --feature psnr) extended to assert all three plane scores at places=4; lavapipe measurement on the 576×324 testdata fixture reports max_abs_diff = 0.0 across 48 frames for all three metrics (deterministic int64 SSE on both sides). Builds on the existing Vulkan framework — no fresh shader, no fresh pipeline-creation logic; chroma SSIM / chroma MS-SSIM follow-ups stay separate rows. | Accepted | vulkan, gpu, feature-extractor, psnr, fork-local |
| ADR-0217 | SYCL toolchain cleanup — multi-version recipe + icpx-aware clang-tidy wrapper | Accepted | sycl, ci, clang-tidy, tooling, fork-local |
| ADR-0218 | MobileSal saliency feature extractor (T6-2a) | Accepted | ai, dnn, feature-extractor, saliency, fork-local |
| ADR-0219 | motion3 GPU coverage on Vulkan + CUDA + SYCL (3-frame window) | Accepted | gpu, vulkan, cuda, sycl, motion, feature-extractor, fork-local, t3-15c, places-4 |
| ADR-0220 | T7-17 — SYCL feature kernels are unconditionally fp64-free. Audit confirmed all SYCL feature kernels (integer_adm_sycl.cpp, integer_vif_sycl.cpp, integer_ciede_sycl.cpp, integer_ssim_sycl.cpp, the float-input extractors) are already fp64-free in their device code — ADM gain limiting via int64 Q31 split-multiply (gain_limit_to_q31 + launch_decouple_csf<false>), VIF gain limiting via fp32 sycl::fmin over float operands, accumulators via sycl::plus<int64_t>. The previous WARNING-level init log line ("device lacks fp64 support — using int64 emulation for gain limiting") was misleading: it suggested an emulation-overhead fallback that does not exist. Reworded to INFO-level "device lacks native fp64 — kernels already use fp32 + int64 paths, no emulation overhead". VmafSyclState.has_fp64 retained for future fp64-gated optimisations. New docs/backends/sycl/overview.md § "fp64-less device contract (T7-17)" documents the no-double-in-lambda-captures rule + the SPIR-V module-taint rationale (a single fp64 instruction in any lambda blocks the whole TU on Arc A-series). The originally reported 5–10× Arc A380 vs Vulkan perf gap has a different root cause (kernel geometry / sub-group size / memory pattern) — out of T7-17's scope. | Accepted | sycl, perf, gpu, arc, intel, t7-17, fork-local |
| ADR-0221 | T7-39 — CHANGELOG + ADR-index fragment-file pattern. New changelog.d/<section>/<topic>.md and docs/adr/_index_fragments/<slug>.md per-PR fragment trees + two in-tree concat scripts (scripts/release/concat-changelog-fragments.sh, scripts/docs/concat-adr-index.sh) replace edits to the consolidated CHANGELOG.md Unreleased block + docs/adr/README.md index table. Eliminates the per-PR merge-conflict surface that cost ≈16 min per PR over the 2026-04-28 sprint. Migration is content-preserving: existing 3119-line Unreleased body archived verbatim as changelog.d/_pre_fragment_legacy.md; 159 ADR rows split per-slug with a frozen _order.txt preserving the existing commit-merge order. PR template + Doc-Substance Gate (ADR-0167) updated to recognise fragment files as CHANGELOG entries. release-please workflow integration tracked as T7-39b follow-up. | Accepted | process, release, docs, ci, fork-local |
| ADR-0222 | vmaf-perShot per-shot CRF predictor sidecar | Accepted | ai, tools, encoder-hint, fork-local, t6-3b |
| ADR-0223 | | Accepted | ai, dnn, feature-extractor, wave-1, shot-detection, fork-local |
| ADR-0234 | T-GPU-ULP — per-GPU-generation ULP calibration scoped as two tiers. Tier 1 (this PR): YAML calibration table at scripts/ci/gpu_ulp_calibration.yaml mapping a runtime GPU id (Research-0041 schema: vulkan:0xVVVV:0xDDDD / cuda:M.m / sycl:0xVVVV:DRIVER) to a per-feature absolute tolerance for the cross-backend parity gate. Both cross_backend_vif_diff.py and cross_backend_parity_gate.py accept new --gpu-id / --calibration-table flags; when omitted, the per-feature FEATURE_TOLERANCE defaults remain authoritative (legacy callers see no behaviour change). Lookup picks the most-specific glob match (longest non-wildcard prefix wins). Initial coverage: 1 calibrated row (Mesa lavapipe — tolerances mirror the gate's pre-existing defaults) plus 11 placeholder rows (NVIDIA Ampere / Turing / Ada / Hopper, AMD RDNA2 / RDNA3, Intel Arc Alchemist / Battlemage, generic Intel SYCL); placeholders are functional no-ops until a real-hardware corpus replaces their features: block. Tier 2 (deferred to feat(ai): T7-GPU-ULP-CAL — calibration-head v0): an ONNX calibration head behind a new --gpu-calibrated CLI flag, gated on the data-collection script producing a real corpus on at least one real GPU. The hosted-CI lavapipe lane passes --gpu-id "vulkan:0x10005:0x0" so the gate's tolerance decisions are now per-arch annotated in the parity report. New unit test scripts/ci/test_calibration.py (19 cases) covers the loader, glob semantics, specificity ranking, and shipped-table round-trip. | Accepted | ai, gpu, vulkan, cuda, sycl, cross-backend, fork-local, t7-gpu-ulp-cal |
| ADR-0235 | Codec-aware FR regressor (fr_regressor_v2) | Proposed | ai, dnn, tiny-ai, fr-regressor, fork-local |
| ADR-0236 | DISTS extractor as LPIPS companion | Accepted | ai, fr, dnn, tiny-ai, fork-local, perceptual |
| ADR-0237 | Quality-aware encode automation surface (vmaf-tune) | Accepted (Phase A only; Phases B–F remain Proposed) | tooling, ai, ffmpeg, codec, automation, fork-local |
| ADR-0238 | Vulkan VmafPicture preallocation surface (API parity with CUDA / SYCL) | Proposed | vulkan, api, preallocation, fork-local, parity |
| ADR-0239 | Backend-agnostic GPU picture pool (gpu_picture_pool.{h,c}) | Proposed | refactor, gpu, cuda, sycl, vulkan, dedup, fork-local |
| ADR-0240 | GPU backend public-header pattern doc (PR3 of GPU dedup, doc-only) | Accepted | docs, gpu, agents, fork-local |
| ADR-0241 | T7-10 first-consumer PR — integer_psnr_hip host scaffolding via mirrored kernel-template. Ships core/src/hip/kernel_template.{h,c} (field-for-field mirror of cuda/kernel_template.h from ADR-0246: VmafHipKernelLifecycle private-stream + 2-event struct, VmafHipKernelReadback device-accumulator + pinned-host pair, six lifecycle helpers) + core/src/feature/hip/integer_psnr_hip.{c,h} (first consumer; mirrors integer_psnr_cuda.c's init/submit/collect/close call graph verbatim). Helpers and the consumer's submit/collect return -ENOSYS while the runtime PR (T7-10b) is pending; bodies flip to live HIP without touching consumer call-sites. New vmaf_fex_psnr_hip registered in feature_extractor_list under #if HAVE_HIP so callers get "extractor found, runtime not ready" instead of "no such extractor". New VMAF_FEATURE_EXTRACTOR_HIP = 1 << 6 flag bit reserved (unused until T7-10b adds the picture buffer-type plumbing). Smoke test grows 5 sub-tests pinning template-helper -ENOSYS contracts + extractor-name lookup; 14/14 pass under -Denable_hip=true. CPU baseline (47/47) + HIP scaffold (48/48) green. No ROCm SDK required (matches ADR-0212's audit-first split). Out-of-line helpers (not static inline like CUDA) so the runtime PR has one editing target. Mirrors the Vulkan T5-1 → T5-1b cadence; runtime + remaining kernels remain in T7-10b. | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |
| ADR-0242 | Scaffold-only PR preparing tiny-AI training on the local Netflix VMAF corpus (9 ref / 70 distorted YUVs at .workingdir2/netflix/, gitignored). Documents the corpus path convention, --data-root loader API, and evaluation harness. Records architecture-choice space (MLP depth/width sweep, distillation-vs-from-scratch policy, model size targets) without selecting a configuration — that decision is deferred to a follow-up PR. Adds MCP end-to-end smoke test (test_smoke_e2e.py) exercising vmaf_score against the Netflix golden fixture. No training runs; no golden assertions modified. | Accepted | ai, training, fork-local, onnx, docs |
| ADR-0243 | enable_lcs MS-SSIM extras on CUDA + Vulkan | Accepted | cuda, vulkan, gpu, metrics, ms-ssim, fork-local |
| ADR-0244 | vmaf_tiny_v2 — canonical-6 + StandardScaler tiny VMAF MLP | Accepted | ai, dnn, tiny-ai, model, registry, fork-local |
| ADR-0245 | SIMD bit-exact test harness shared header | Accepted | simd, test, dx, fork-local |
| ADR-0246 | Per-backend GPU kernel scaffolding templates (CUDA + Vulkan) | Accepted | gpu, cuda, vulkan, refactor, fork-local |
| ADR-0247 | vmaf-roi sidecar binary for per-CTU QP offsets | Accepted | tools, ai, roi, encoder |
| ADR-0248 | nr_metric_v1 joins dynamic-PTQ family (T5-3d) | Accepted | tiny-ai, onnx, quantization, registry, fork-local |
| ADR-0249 | Tiny-AI Wave 1 baseline C1 — fr_regressor_v1 on Netflix Public | Accepted | tiny-ai, training, onnx, netflix-public, c1, fork-local |
| ADR-0250 | Tiny-AI extractor template — shared scaffolding header | Accepted | ai, dnn, refactor, dx, fork-local |
| ADR-0251 | Vulkan VkImage import — v2 async pending-fence model (T7-29 part 4) | Proposed | vulkan, ffmpeg, fork-local, zero-copy, performance, implementation |
| ADR-0252 | ssimulacra2 Vulkan host-path AVX2 + NEON SIMD (T-GPU-OPT-VK-2) | Accepted | simd, vulkan, ssimulacra2, performance |
| ADR-0253 | Defer SpEED-QA full-reference reduction. Closes the user's 2026-04-21 deep-research queued track. The fork keeps speed_chroma / speed_temporal (PR #213, port of upstream d3647c73) as research-stage extractors gated behind -Denable_float=true; does not add a speed_qa reduction; does not register a SpEED-driven model. Rationale: SpEED-QA overlaps vif substantially (both GSM-prior divisive-normalisation entropy estimators), the "speed" headline inverts on the fork's AVX-512 / CUDA / SYCL VIF stack, and the assumed model/speed_4_v0.6.0.json upstream binary does not exist — there is no Netflix artefact to mirror. Reversible on three named triggers (Netflix lands a model JSON consuming SpEED features; explicit user request; FUNQUE+ / pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input). Companion research digest: docs/research/0051-speed-qa-feasibility.md. | Proposed | metrics, research, feature-extractor, roadmap |
| ADR-0254 | HIP second-consumer kernel — float_psnr_hip via mirrored kernel-template | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |
| ADR-0255 | T6-7b — FastDVDnet temporal pre-filter real upstream weights drop. Replaces the ADR-0215 smoke-only placeholder ONNX with the verbatim trained checkpoint from upstream m-tassano/fastdvdnet (commit c8fdf61, MIT) wrapped by a LumaAdapter PyTorch module that preserves the C-side luma [1, 5, H, W] → [1, 1, H, W] contract: each luma plane is Concat-tiled into RGB (Y → [Y, Y, Y]) to match upstream's 15-channel input, a constant sigma = 25/255 noise map (upstream's reference inference level) is broadcast via ones_like(centre) * sigma, and the upstream RGB output is collapsed back to luma using BT.601 weights (Y = 0.299 R + 0.587 G + 0.114 B). Every nn.PixelShuffle instance in upstream's UpBlock is swapped pre-export for an allowlist-safe Reshape/Transpose/Reshape decomposition (zero learned params → numerically identical, verified < 1e-6 max-abs diff between upstream PyTorch and exported ONNX); DepthToSpace deliberately stays off the op allowlist. Shipped graph uses only allowlisted ops. Registry row flips smoke: false with license: MIT, upstream commit pin, and refreshed sha256; sidecar JSON + doc docs/ai/models/fastdvdnet_pre.md carry full provenance. New ai/scripts/export_fastdvdnet_pre.py (replaces the _placeholder.py exporter — kept for reference). 9.5 MiB ONNX (well under the 50 MiB DNN size cap). Luma-native retrain tracked as T6-7c follow-up; INT8 PTQ tracked as T6-7d follow-up. | Accepted | ai, dnn, feature-extractor, wave-1, weights-drop, fork-local |
| ADR-0256 | Vulkan submit-side template + fence pool + descriptor pre-alloc | | |
| ADR-0257 | Defers the T6-2a-followup real-weights swap for mobilesal_placeholder_v0 indefinitely. Upstream yuhuan-wu/MobileSal is CC BY-NC-SA 4.0 (incompatible with the fork's BSD-3-Clause-Plus-Patent), distributes weights only via Google Drive viewer URLs (no GitHub release / pinnable raw URL), and is RGB-D (the C contract is RGB-only). ADR-0218's "MIT-licensed" claim was inaccurate; corrected here and in docs/ai/models/mobilesal.md. Smoke-only placeholder remains shipped; recommended replacement is U-2-Net u2netp (Apache-2.0), filed as backlog row T6-2a-replace-with-u2netp. Companion to Research-0053. | Accepted | ai, dnn, mobilesal, saliency, license, fork-local, docs |
| ADR-0258 | T7-32 — ONNX op-allowlist gains Resize to unblock U-2-Net (PR #341) and the wider saliency / segmentation surface (mobilesal, BASNet, PiDiNet, FPN-style detectors). One-line addition under op_allowlist.c's /* convolutional */ block plus comment block citing the supported-mode contract (nearest, linear recommended; cubic not exercised by any in-tree consumer and numerically less stable on quantised inputs). Wire scanner stays op-type-only per ADR D39 / ADR-0169 — per-attribute mode filtering would expand the bounded-auditable scope. Python vmaf_train.op_allowlist regex parser picks up the new entry automatically (export-time + load-time symmetry preserved). New tests: test_resize_op_allowed (C allowlist), test_resize_top_level_allowed (C wire-format scan), test_resize_now_allowed (Python parser). All 47 libvmaf tests + 15 Python tests green. | Accepted | tiny-ai, onnx, security, op-allowlist |
| ADR-0259 | HIP third-consumer kernel — ciede_hip via mirrored kernel-template | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |
| ADR-0260 | HIP fourth-consumer kernel — float_moment_hip via mirrored kernel-template | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |
| ADR-0261 | TransNet V2 shot-boundary detector — real upstream weights via NTCHW adapter (T6-3a-followup) | Accepted | ai, dnn, feature-extractor, wave-1, weights-drop, shot-detection, fork-local |
| ADR-0262 | Relax ai/scripts/build_bisect_cache.py --check parquet leg from filecmp.cmp byte equality to typed-Arrow-Table content equality (pyarrow.Table.equals + schema + row-count). Caused by issue #40 freezing on a 2026-04-21 success comment for ~14 days while the nightly red-lined every night on parquet-cpp-arrow version 23 → 24 created_by-string drift in the runner image's pyarrow upgrade. ONNX byte-equality preserved (producer_name / producer_version / ir_version already pinned). nightly-bisect.yml decouples sticky-comment updates from result.json existing: a new post-bisect-comment.py --wiring-broke mode posts a "WIRING BROKE" verdict to issue #40 with the cache-check stderr inline when --check itself fails, then exits non-zero so the run stays red. Relaxes ADR-0109 §Decision (parquet only). | Accepted | ai, ci, tiny-ai, framework |
| ADR-0263 | OSSF Scorecard policy + remediation cadence. Diagnoses the months-long red scorecard.yml workflow (root cause: github/codeql-action/upload-sarif SHA b25d0ebf... is an "imposter commit" — no longer exists in the action's repository, so Scorecard's webapp returns 400 on publish). Repins to current v4 head e46ed2cbd0... to unblock. Documents per-check scoring against the 6.2 / 10 baseline (target ≥ 7.0): accepted blockers are Code-Review (solo-maintainer artefact), Branch-Protection (GITHUB_TOKEN can't read classic rules without a fine-grained PAT, which the secret-policy forbids), Maintained (auto-resolves at 90-day age), CII-Best-Practices (out-of-scope external badge application). Active remediation queue: Vulnerabilities (bump python/requirements.txt lower bounds), Pinned-Dependencies (upstream Dockerfile-parser bug + SHA-pin sweep), Fuzzing (OSS-Fuzz onboarding), Signed-Releases (resolves on first release-please cut), Packaging. Companion research digest: docs/research/0053-ossf-scorecard-investigation.md. 90-day re-evaluation cadence. | Accepted | ci, security, supply-chain, docs, fork-local |
| ADR-0264 | Vulkan 1.4 API-version bump blocked on shader FP-contraction audit | Accepted | vulkan, fork-local, bit-exactness, backlog, docs |
| ADR-0265 | Defers the T6-2a-replace-with-u2netp model-family swap recommended by ADR-0257. Upstream xuebinqin/U-2-Net is Apache-2.0 (license OK) but u2netp.pth is distributed only via Google Drive (no GitHub release, no pinnable raw URL — same blocker as MobileSal in ADR-0257), and U-2-Net's F.upsample(..., mode='bilinear') lowers to ONNX Resize which is not on the fork's op_allowlist.c. Smoke-only placeholder remains shipped; recommended next step is T6-2a-widen-allowlist-resize (separate ADR-scope decision) before another saliency-replacement attempt. Companion to Research-0054. | Accepted | ai, dnn, mobilesal, u2netp, saliency, license, op-allowlist, fork-local, docs |
| ADR-0266 | HIP fifth kernel-template consumer — float_ansnr_hip | Accepted | hip, gpu, feature-extractor, kernel-template |
| ADR-0267 | HIP sixth kernel-template consumer — motion_v2_hip | Accepted | hip, gpu, feature-extractor, kernel-template, temporal |
| ADR-0269 | Step A of the Vulkan 1.4 API-version bump path documented in ADR-0264. Tags the load-bearing FP ops in vif.comp (g, sv_sq, gg_sigma_f) and ciede.comp (yuv→rgb outputs, rgb→xyz matmul accumulators, ciede2000 chroma magnitudes + half-axes + s_l/c/h + lightness/chroma/hue + final ΔE) with GLSL precise so glslc emits per-result OpDecorate ... NoContraction (62 lines in vif, 126 in ciede; verified bit-identical 1.3↔1.4 SPIR-V). Partial fix only: on NVIDIA RTX 4090 + driver 595.71.05, ciede improves 19× (42/48 → 5/48 mismatches; 1.67e-04 → 8.9e-05 max abs at places=4); vif's 45/48 regression at API 1.4 is not fixed — the SPIR-V decorations are correctly emitted on the suspect ops but the driver still drifts, suggesting the regression is not in those five tagged float ops. Step B remains blocked. Documented in research-0054. 5/48 ciede tail (1.78× the threshold) deferred to a CPU-side double-vs-float bisect follow-up. | Accepted | vulkan, fork-local, bit-exactness, shaders |
| ADR-0270 | libFuzzer scaffold for parser surfaces (OSSF Scorecard remediation) | Proposed | security, build, ci, docs |
| ADR-0271 | T-GPU-OPT-2 — wires integer_ms_ssim_cuda through the engine-scope CUDA fence-batching helper (drain_batch.h). Allocates per-scale partials buffers (5× l_partials[] / c_partials[] / s_partials[] device + matching pinned host shadows) so all 5 SSIM scales' horiz + vert_lcs launches and DtoH copies enqueue back-to-back on s->lc.str inside submit(); records s->lc.finished once after the last DtoH and calls vmaf_cuda_drain_batch_register(&s->lc) so the engine's single cuStreamSynchronize(drain_str) covers this extractor too. collect() collapses to a host-side reduction only — vmaf_cuda_kernel_collect_wait short-circuits when the engine has already drained the lifecycle. The shared SSIM intermediate buffers (h_ref_mu, h_cmp_mu, h_ref_sq, h_cmp_sq, h_refcmp) stay shared because same-stream ordering serialises horiz ⇒ vert_lcs ⇒ DtoH naturally; the previous per-scale cuStreamSynchronize was forced only by the host-side reduction stepping in between launches. Net change: 6 per-frame host-blocking syscalls collapse into the existing batched flush; expected ms_ssim wall-clock +3-5% on the Netflix CUDA benchmark. Bit-exact (same kernels, same stream, same submission order; only the host wait point moves). Vulkan / SYCL twins unchanged — drain_batch is CUDA-only by design (ADR-0246). Footprint grows by ~12 small buffers per state (≈ 8.1 KB device + 8.1 KB pinned host at 1080p, negligible against the existing pyramid allocations). | Accepted | cuda, gpu, perf, fork-local |
| ADR-0272 | fr_regressor_v2 codec-aware scaffold (Phase B prereq) | Proposed | ai, dnn, tiny-ai, fr-regressor, codec-aware |
| ADR-0273 | HIP seventh kernel-template consumer — float_motion_hip | Accepted | hip, gpu, feature-extractor, kernel-template, temporal, fork-local |
| ADR-0274 | HIP eighth kernel-template consumer — float_ssim_hip | Accepted | hip, gpu, feature-extractor, kernel-template, multi-dispatch, fork-local |
| ADR-0275 | vmaf_tiny_v3 and vmaf_tiny_v4 join dynamic-PTQ family (T5-3d follow-up) | Accepted | tiny-ai, onnx, quantization, registry, fork-local |
| ADR-0276 | T-VMAF-TUNE Phase A.5 — opt-in vmaf-tune fast proxy-based recommend. Combines three acceleration levers (VMAF proxy via fr_regressor_v2 per ADR-0272, Bayesian search via Optuna TPE, GPU-accelerated VMAF verify per ADR-0157 / ADR-0186) into a new fast subcommand that targets the recommendation use case at ~20–50× the speed of the Phase A grid (~100–500× with NVENC, follow-up). Slow Phase A grid path stays canonical as ground-truth corpus generator (ADR-0237 contract); fast-path is opt-in via pip install vmaf-tune[fast]. This PR ships the scaffold only — Optuna search loop, smoke-mode synthetic predictor, CLI subcommand, production-shape entry point. Real encode + ONNX inference + GPU verify wiring is a follow-up PR gated on Phase A corpus + fr_regressor_v2 weights training (PR #347). Smoke test: vmaf-tune fast --smoke --target-vmaf 92. Companion research digest: docs/research/0060-vmaf-tune-fast-path.md. | Proposed | tooling, ai, ffmpeg, codec, automation, fork-local |
| ADR-0277 | ffmpeg-patches series replay against pristine n8.1 — 2026-05-04. Verifies the six-patch stack (0001-libvmaf-add-tiny-model-option.patch through 0006-libvmaf-add-libvmaf-vulkan-filter.patch) still applies cleanly cumulatively per ADR-0118 + ADR-0186. No content drift: all six patches replay clean via git am --3way; git format-patch n8.1.. regeneration shows only cosmetic noise (PATCH numbering, MIME headers, hunk-context counts). Keeps originals to avoid churn. PRs #332-#341 (HIP kernel-template consumers, OSSF Scorecard work, Vulkan 1.4 deferral, U-2-Net deferral) introduced zero drift on the ffmpeg-integration surface. vf_libvmaf end-to-end smoke deferred to CI (ffmpeg-integration.yml) — the meson-uninstalled .pc doesn't satisfy FFmpeg's #include <libvmaf.h> probe locally; CI runs against an installed prefix. | Accepted | ffmpeg, fork-local, maintenance, patches |
| ADR-0278 | T7-5 NOLINT-sweep closeout. Cite-only pass that appends (ADR-0141 §2 ... load-bearing invariant; T7-5 sweep closeout — ADR-0278) to the 22 surviving readability-function-size NOLINT sites in core/src/ + core/tools/ whose comments described the invariant in prose without naming an ADR explicitly. Touches integer_adm.c (1 site, upstream-mirror Netflix 966be8d5), cuda/ssimulacra2_cuda.c (3 sites), vulkan/ssimulacra2_vulkan.c (3 sites), vulkan/cambi_vulkan.c (1 site), sycl/integer_adm_sycl.cpp (6), sycl/integer_motion_sycl.cpp (2), sycl/integer_vif_sycl.cpp (4), tools/vmaf.c (3 driver functions). After this PR, programmatic audit reports 75 sites total, 0 missing ADR/Research citations. Backlog item T7-5 closed. No function bodies split, no behavioural change. Companion research digest docs/research/0063-t7-5-nolint-sweep.md. | Accepted | lint, cleanup, touched-file-rule, t7-5, fork-local, docs |
| ADR-0279 | vmaf-tune codec adapter — libaom-av1 | Accepted | tools, vmaf-tune, av1, codec-adapter |
| ADR-0281 | vmaf-tune Intel QSV codec adapters (h264_qsv, hevc_qsv, av1_qsv) | Accepted | tooling, ffmpeg, codec, qsv, intel, fork-local |
| ADR-0282 | vmaf-tune AMD AMF codec adapters (h264 / hevc / av1) | Accepted | tooling, ffmpeg, codec, amd, amf, vmaf-tune, fork-local |
| ADR-0283 | Apple VideoToolbox codec adapters for tools/vmaf-tune/. Adds H264VideoToolboxAdapter + HEVCVideoToolboxAdapter (and a shared _videotoolbox_common.py for the -q:v 0..100 quality knob + nine-name preset → -realtime boolean mapping) along the same one-file-per-codec contract NVENC / AMF / QSV already use. AV1 hardware encoding intentionally omitted (unavailable on Apple Silicon as of 2026). Tests mock subprocess.run so Linux CI stays green; macOS end-to-end is left to contributors with VideoToolbox available locally. The originally-coupled 16-slot codec-vocab schema expansion is deferred to a follow-up PR awaiting a fresh fr_regressor_v2 retrain (ship-gate per ADR-0235 + ADR-0291). Companion research digest docs/research/0074-vmaf-tune-videotoolbox-adapters.md. | Accepted | tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local |
| ADR-0285 | vmaf-tune libvvenc adapter — VVC / H.266 with optional NN-VC tools | Accepted | tooling, codec, vvc, h266, ai, nnvc, fork-local |
| ADR-0286 | Fork-trained saliency student saliency_student_v1 on DUTS-TR | Accepted | ai, dnn, mobilesal, saliency, training, license, fork-local, docs |
| ADR-0287 | vmaf_tiny_v5 — corpus expansion (4-corpus + YouTube UGC vp9 subset) | Accepted (decision | ai, dnn, tiny-ai, training-data, research, fork-local |
| ADR-0288 | vmaf-tune libx265 codec adapter — first sibling codec after the ADR-0237 Phase A x264 scaffold. New tools/vmaf-tune/src/vmaftune/codec_adapters/x265.py (frozen X265Adapter dataclass: 10 presets including placebo, 0..51 CRF window pinned to the same Phase A informative range as x264, profile_for(pix_fmt) table mapping yuv420p10le → main10 for downstream HDR work). Registered under libx265 in codec_adapters/__init__.py. encode.parse_versions gains an encoder-aware regex (x265 [info]: HEVC encoder version … → libx265-<version>). CLI --encoder now accepts libx264 \| libx265 via choices=list(known_codecs()). 14 new subprocess-mocked tests under tests/test_codec_adapter_x265.py (adapter contract, preset/CRF validation, profile mapping, ffmpeg argv shape, version parsing, run_encode round-trip, corpus end-to-end, missing-binary error handling). Real-binary integration test gated on VMAF_TUNE_INTEGRATION=1. No schema bump — existing encoder row column already carries codec identity. Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15 (Pareto). | Accepted | tooling, ffmpeg, codec, automation, fork-local, vmaf-tune |
| ADR-0289 | vmaf-tune resolution-aware model selection + CRF offsets | Accepted | tooling, vmaf-tune, model-selection, fork-local |
| ADR-0290 | NVENC codec adapters for vmaf-tune (h264 / hevc / av1) | Accepted | tooling, codec, nvenc, gpu, fork-local |
| ADR-0291 | T-FR-V2-PROD — flip fr_regressor_v2 from smoke to production. Trained on the Phase A real-corpus 216-cell aggregate (216 (src, encoder, preset, cq) cells averaged from 33,840 per-frame canonical-6 rows produced by scripts/dev/hw_encoder_corpus.py, ADR-0237 / PR #392) using the v2 ENCODER_VOCAB (12 encoders, PR #394). MLP 6 → 32 → 32 → 32 → 1 with 14-D codec block concatenation; 200ep Adam lr=5e-4 batch=32. LOSO PLCC = 0.9681 ± 0.0207 clears the ADR-0235 0.95 ship gate (one outlier — OldTownCross at 0.9183 — held in scope). Registry row flips smoke: false, sha256 67934b0b…. Production default stays vmaf_tiny_v2; fr_regressor_v2 is the teacher-score predictor for vmaf-tune Phase B+. Companion research digest Research-0067. | Accepted | ai, dnn, tiny-ai, fr-regressor, codec-aware, vmaf-tune, fork-local |
| ADR-0293 | vmaf-tune saliency-aware ROI tuning (Bucket #2) | Accepted | tooling, ai, saliency, ffmpeg, codec, fork-local |
| ADR-0294 | vmaf-tune codec adapter for SVT-AV1 | Accepted | tools, vmaf-tune, codec, av1, fork-local |
| ADR-0295 | vmaf-tune Phase E — per-title bitrate-ladder generator | Proposed | tooling, ffmpeg, codec, automation, abr, fork-local |
| ADR-0296 | Region-of-interest VMAF scoring surface — Option C scaffold. Ships tools/vmaf-roi-score/ (Python tool) that drives the vmaf CLI twice (full-frame + saliency-masked) and blends the two pooled scalars via a user-controlled weight w ∈ [0, 1]. Distinct from the existing core/tools/vmaf_roi.c encoder-steering binary (ADR-0247) — different surface, related model. Combine math (blend_scores) + CLI surface + JSON schema (SCHEMA_VERSION = 1) + subprocess seam ship in this PR. The --saliency-model ONNX inference path is wired and validated but mask materialisation deliberately exits 64 — gated on PR #359 (saliency_student_v1) merging and a follow-up T6-2c PR. Option A (per-pixel feature pooling weighted by saliency in libvmaf C code) is explicitly deferred — separate ADR + research-grade numerical validation; this scaffold avoids the Netflix golden gate and cross-backend ULP-diff burden entirely. No MOS-correlation claim is made — validation against a labelled saliency-MOS corpus is research follow-up (Research-0069). Companion to Research-0069 (Option-space digest). | Accepted | tooling, ai, saliency, vmaf, fork-local |
| ADR-0297 | vmaf-tune — codec-agnostic encode dispatcher | Accepted | tooling, ffmpeg, codec, automation, fork-local |
| ADR-0298 | vmaf-tune content-addressed encode/score cache | Accepted | tools, vmaf-tune, cache, fork-local |
| ADR-0299 | GPU scoring backend for vmaf-tune (--score-backend) | Accepted | tooling, cuda, vulkan, sycl, ai, automation, fork-local |
| ADR-0300 | Bucket #9 of the PR #354 vmaf-tune capability audit — HDR-aware encoding + scoring in tools/vmaf-tune/. New vmaftune.hdr module exposes detect_hdr (ffprobe-driven PQ / HLG classification with strict BT.2020 primaries gate so malformed signaling falls back to SDR), hdr_codec_args (per-encoder dispatch table covering libx264, libx265, libsvtav1, hevc_nvenc, libvvenc), and select_hdr_vmaf_model (returns model/vmaf_hdr_*.json if shipped). Corpus driver gains --auto-hdr / --force-sdr / --force-hdr-pq / --force-hdr-hlg mutually-exclusive modes and three new schema-v2 row keys (hdr_transfer, hdr_primaries, hdr_forced); SCHEMA_VERSION bumped 1 → 2. Score model arg now accepts pre-formatted path= / version= strings so HDR-model paths flow through unchanged. Encode-side correctness ships now; HDR-VMAF model port is filed as a backlog follow-up — until it lands, HDR sources are scored against the SDR model with a one-shot warning. | Accepted (encode-side); HDR-VMAF scoring deferred | tooling, vmaf-tune, hdr, codec, ffmpeg, fork-local |
| ADR-0301 | vmaf-tune --sample-clip-seconds N — opt-in sample-clip mode that encodes/scores only the centre N-second window of each source per grid cell instead of the full reference, scaling per-cell wall time roughly linearly with slice length (e.g. ~6x speedup at N=10 against a 60-second source). FFmpeg input-side -ss <start> -t <N> cuts the rawvideo demuxer at the slice boundary; the libvmaf CLI's --frame_skip_ref / --frame_cnt mirror the same window on the score side so VMAF compares matching frames without slicing the reference YUV on disk. Centre-anchored placement (naive scaffold; TransNet V2-based smart placement is a follow-up). Each emitted row carries clip_mode = "sample_<N>s" or "full" so Phase B (target-VMAF bisect) and Phase C (per-title CRF predictor) can filter, weight, or epilogue-rescore. Corpus schema bumps additively to SCHEMA_VERSION = 2. Expected accuracy delta ~1–2 VMAF points on diverse content, ~0.3–0.5 on uniform content. Companion to ADR-0237 Phase A. | Accepted | tooling, ffmpeg, vmaf-tune, fork-local |
| ADR-0302 | ENCODER_VOCAB v3 — 16-slot schema expansion + retrain plan | Accepted | |
| ADR-0303 | fr_regressor_v2 ensemble — production flip trainer + CI gate | Accepted | ai, fr-regressor, ensemble, probabilistic, loso, ci-gate, fork-local |
| ADR-0304 | vmaf-tune fast — production wiring (Optuna TPE + v2 proxy + GPU verify) | Accepted | tooling, ai, ffmpeg, codec, automation, fork-local |
| ADR-0305 | Encoder knob-space Pareto-frontier analysis stratified per (source, codec, rc_mode) | Accepted | ai, vmaf-tune, research, encoder, pareto, fork-local |
| ADR-0306 | vmaf-tune coarse-to-fine CRF search | Accepted | tooling, automation, vmaf-tune, ffmpeg |
| ADR-0307 | vmaf-tune ladder default sampler — wire Phase B/E gap | Accepted | tooling, automation, vmaf-tune, ladder, fork-local |
| ADR-0308 | Encoder knob-sweep recipe-regression revision policy | Accepted | ai, vmaf-tune, codec-adapters, knob-sweep, fork-local |
| ADR-0309 | fr_regressor_v2 ensemble — real-corpus retrain harness + flip workflow | Accepted | ai, fr-regressor, ensemble, loso, runbook, fork-local |
| ADR-0310 | BVI-DVC corpus ingestion for fr_regressor_v2 — adopt the Bristol VI Lab BVI-DVC dataset (Ma, Zhang, Bull 2021) as a second training shard alongside the Netflix Public drop. New ai/scripts/bvi_dvc_to_corpus_jsonl.py adapter re-shapes the existing parquet pipeline's cached libvmaf JSON into vmaf-tune Phase A CORPUS_ROW_KEYS rows; new ai/scripts/merge_corpora.py concatenates Netflix + BVI-DVC shards with (src_sha256, encoder, preset, crf) deduplication and schema validation. Triples training-corpus size and expands LOSO partitioning from 9 folds to 9 + N. License posture is local-only — corpus stays under .workingdir2/, only derived fr_regressor_v2_*.onnx weights ship. Production-weights flip stays gated on ADR-0303's ensemble criterion; this ADR ships ingestion infrastructure only. Tests under ai/tests/test_merge_corpora.py cover concat-with-dedup and schema-violation rejection on synthetic fixtures. | Accepted | ai, training, corpus, license, fork-local |
| ADR-0311 | libFuzzer harness expansion — fuzz_yuv_input + fuzz_cli_parse | Accepted | security, build, ci, docs, fork-local |
| ADR-0312 | FFmpeg-patch series for vmaf-tune integration (qpfile + libvmaf_tune + pass-autotune) | Accepted | tooling, ffmpeg, vmaf-tune, patch-series, scaffold |
| ADR-0313 | New Required Checks Aggregator workflow + branch-protection policy update. The 23-named-required-check posture (per ADR-0037) deadlocks doc/Python-only PRs because the C-build matrix path-filter-skips on their diffs, but branch protection counts a path-filter-skip + a never-ran-at-all as not satisfying the required-check. Aggregator is one workflow that always runs (no path filter) and verifies each named check on the head SHA reported success/skipped/neutral (or didn't appear at all — path-filter rejection). Aggregator becomes the single required check; the 23 individual workflows continue to run unchanged. Manual operator step at adoption: gh api PUT repos/VMAFx/vmafx/branches/master/protection/required_status_checks -F 'contexts=["Required Checks Aggregator"]'. Unblocks PRs #400, #403, #404, #405, #406, #407 currently stuck on the structural deadlock. | Proposed | ci, branch-protection, policy, fork-local |
| ADR-0314 | vmaf-tune --score-backend=vulkan (vendor-neutral GPU scoring) | Accepted | tooling, vmaf-tune, vulkan, gpu, fork-local |
| ADR-0315 | Vendor-neutral VVC GPU encode strategy — three-tier rollout. Verified premise (NVIDIA NVENC SDK 13.0 docs, 2026-05-05): no GPU vendor ships hardware VVC encode silicon today (NVENC supports only H.264 / HEVC / AV1; AMD AMF and Intel QSV are indicatively the same). Tier 1 (ship today): document NN-VC as the de-facto vendor-neutral H.266 GPU contribution (runs on any ONNXRuntime EP via the existing vvenc.py adapter) and wire the existing Vulkan backend to vmaf-tune for GPU-accelerated scoring of CPU-encoded VVC bitstreams (sibling ADR-0314, scoped separately). Tier 2 (backlog): HIP port of VVenC's motion-estimation, transform, and loop-filter kernels, gated on three demand-pull triggers — user-reported throughput pain on a real corpus, Tier-1 NN-VC docs adopted in production, RDNA 3/4 or PVC CI machine available. Tier 3 (quarterly revisit): a VK_KHR_video_encode_h266-based libvmaf-side encode adapter, gated on Khronos ratification + at least one shipping driver. No h266_nvenc adapter follow-up is planned (silicon does not ship VVC encode); ZLUDA + hypothetical CUDA-VVC rejected as not actionable. Source survey: Research-0085 (skeleton — most factual claims marked [UNVERIFIED] pending direct vendor-doc check). | Proposed | codecs, vvc, h266, gpu, hip, sycl, vulkan-video, vmaf-tune, nn-vc, fork-local |
| ADR-0316 | cli_parse — handle long-only options in error() | Accepted | cli, security, fork-local, fuzzing |
| ADR-0317 | Path-filter Docker + FFmpeg-integration on doc/Python-only PRs | Accepted | ci, build |
| ADR-0318 | fr_regressor_v2 ensemble retrain harness — wrapper-trainer interface fix + Phase A pre-step doc | Accepted | ai, fr-regressor, ensemble, loso, runbook, fork-local |
| ADR-0319 | fr_regressor_v2 ensemble LOSO trainer — real loader + per-fold training | Accepted | ai, fr-regressor, ensemble, loso, fork-local |
| ADR-0320 | fr_regressor_v2 ensemble seeds — production flip (smoke → false) | Accepted | ai, fr-regressor, ensemble, registry, prod-flip, fork-local |
| ADR-0321 | fr_regressor_v2_ensemble_v1 — full production flip (real ONNX + sidecars) | Accepted | ai, tinyai, models, registry, prod-flip |
| ADR-0323 | fr_regressor_v3 — train + register on ENCODER_VOCAB v3 (16-slot) | Accepted | ai, fr-regressor, codec-aware, encoder-vocab, loso, fork-local |
| ADR-0324 | Ensemble training kit — portable Phase-A + LOSO retrain bundle | Accepted | ai, fr-regressor, ensemble, tooling, fork-local |
| ADR-0325 | KonViD-150k corpus ingestion | Accepted | ai, training, corpus, license, fork-local |
| ADR-0326 | vmaftune.bisect — Phase B target-VMAF bisect: integer binary search over CRF assuming monotone-decreasing VMAF in CRF, returns the largest CRF whose measured VMAF still meets the floor. Six-to-eight encode round-trips per call (O(log range)); midpoint rounds toward higher CRF so the "best so far" is always a measured CRF, never extrapolated. Aborts with a clear error on monotonicity violation rather than falling back to a different search strategy (the tools/vmaf-tune/AGENTS.md invariant). Subprocess seam mirrors encode.run_encode / score.run_score so unit tests run with synthetic curves. make_bisect_predicate(target_vmaf, *, width=..., height=..., framerate=..., duration_s=...) adapter satisfies compare.PredicateFn; compare._default_predicate updated to point callers at the entry-point. Replaces the NotImplementedError("Phase B pending") placeholder referenced by ADR-0276, ADR-0287, ADR-0295, ADR-0306. Companion to ADR-0237 Phase A. | Accepted | tooling, vmaf-tune, fork-local |
| ADR-0328 | Cambi cluster port — skip the shared-header rename | Accepted | simd, port, cambi |
| ADR-0331 | Skip CI on draft pull requests across all 8 fork workflows. Adds types: [opened, synchronize, reopened, ready_for_review] to each pull_request trigger and gates every top-level job with an if: clause that skips when github.event.pull_request.draft == true while keeping push: triggers intact. The ready_for_review event re-fires CI when a draft is promoted; draft PRs are unmergeable by GitHub's definition, so the resulting "Required check missing" state on drafts is benign. Cuts CI spend by roughly half on the fork's typical 10+-draft work-in-progress queue. | Proposed | ci, build, fork-local |
| ADR-0332 | Agent worktree-drift hard guard. New pre-commit hook scripts/ci/check-agent-worktree-drift.sh (wired through .pre-commit-config.yaml and installed by make hooks-install) refuses commits whose git rev-parse --show-toplevel is the main checkout while one or more sibling agent worktrees exist under <root>/.claude/worktrees/agent-*. Catches the drift pattern observed five times in the 2026-05-09 session (PR #498, #520, #526, MCP runtime v2 first attempt, multi-corpus run) where a background agent committed into the main checkout instead of its assigned worktree, clobbering the user's working state. Bypass: git commit --no-verify for legitimate main-checkout commits while an agent runs. Pairs with the process-side global rule feedback_agents_isolated_worktree_only; documented at docs/development/agent-worktree-discipline.md. | Accepted | agents, ci, build, fork-local |
| ADR-0333 | vmaf-tune Phase F multi-pass encoding — first proof-of-concept on libx265. Adapter contract gains supports_two_pass: bool + two_pass_args(pass_number, stats_path) -> tuple[str, ...]; X265Adapter overrides both, returning ('-x265-params', f'pass={N}:stats={path}'). EncodeRequest gains optional pass_number: int = 0 + stats_path: Path | None = None; build_ffmpeg_command splices two_pass_args and redirects pass-1 output to -f null - so the throwaway pass doesn't write a useless mp4. New encode.run_two_pass_encode(req, ...) runs pass 1 → pass 2 with a per-encode unique stats path, returns one combined EncodeResult (encode_time = sum of both passes; size = pass-2 size). New CLI flag --two-pass opts in on corpus / recommend; default stays single-pass. Adapters where supports_two_pass = False fall back to single-pass with a stderr warning rather than failing (matches the saliency x264-only fallback precedent). Cache key (ADR-0298) gains two_pass: bool so 1-pass and 2-pass cells are distinct entries. New tests under tests/test_codec_adapter_x265_two_pass.py (subprocess-mocked argv shape, run_two_pass_encode round-trip, corpus end-to-end with pass_count-aware cache, fallback behaviour for unsupported adapters). Identifies which sibling codecs benefit from this seam: libx264, libsvtav1, libvvenc, libaom-av1 (yes, follow-up PRs); NVENC family (separate ADR — -multipass is single-invocation lookahead, not the stats-file two-call sequence); AMF / QSV / VideoToolbox (no — hardware encoders use internal lookahead, no stats-file 2-pass). | Accepted | tooling, ffmpeg, codec, automation, fork-local, vmaf-tune, phase-f |
| ADR-0334 | state.md-touch-check CI gate (ADR-0165 enforcement) — promotes CLAUDE.md §12 rule 13 (every PR that closes / opens / rules-out a bug updates docs/state.md) from reviewer-enforced to CI-enforced. Lands scripts/ci/state-md-touch-check.sh + companion fixture script scripts/ci/test-state-md-touch-check.sh (8 cases: 5 primary + 3 regression for debug vs bug, Closes #N, BUG- upper-case), wired as a fourth blocking job state-md-touch-check in .github/workflows/rule-enforcement.yml alongside the existing deep-dive-checklist/doc-substance-check/adr-backfill-check jobs. Trigger predicate fires on Conventional-Commit fix: prefix, the bare token bug (word-boundary so debug doesn't fire), GitHub-issue close keywords (closes/fixes/resolves #N), or an unchecked Bug-status-hygiene template row. Pass conditions: diff touches docs/state.md, OR PR body carries no state delta: REASON (HTML comments stripped first so the template's instructional placeholder doesn't accidentally satisfy). Same draft-PR gating + script-with-thin-wrapper shape as deliverables-check.sh (ADR-0124). Surfaced as a backlog row by the state.md audit-backfill PR #455. | Accepted | ci, process, state-hygiene, claude-rule, fork-local |
| ADR-0335 | Hardware-capability priors for the FR-regressor corpus — ships ai/data/hardware_caps.csv with per-architecture capability fingerprints (codecs supported, max resolution per codec, encoding-block count, tensor-core / NPU presence, driver-min-version, primary vendor source URL, ISO verified-date) for the three named GPU generations Battlemage / RDNA4 / Blackwell plus their immediate predecessors Alchemist / RDNA3 / Ada Lovelace (six rows on 2026-05-08). Loader at ai/scripts/hardware_caps_loader.py exposes cap_vector_for(encoder, encoder_arch_hint) returning fixed-shape hwcap_* feature columns the corpus-ingest pipeline merges into encode rows. Prior-only by design: schema rejects benchmark-shaped columns, community-wiki source URLs, empty fields, and zero-encoding-block rows; companion research digest 0088-hardware-capability-priors-2026-05-08.md establishes the category-1 NO-GO finding (vendor-published throughput / quality numbers leak biased priors) and the category-3 NO-GO finding (community wikis lack audit trail). NVIDIA Hopper deliberately excluded — H100 / H200 ship zero NVENC engines and fall outside an encode-capability fingerprint. Operator doc at docs/ai/hardware-capability-priors.md. | Accepted | ai, corpus, data, docs |
| ADR-0336 | KonViD MOS head v1 (ADR-0325 Phase 3) — train and register konvid_mos_head_v1, the fork's first head trained directly against subjective MOS ratings (not the libvmaf VMAF teacher score). Maps canonical-6 + saliency mean/var + 3 TransNet shot-metadata columns + ENCODER_VOCAB v4 single-slot one-hot to a scalar MOS in [1.0, 5.0]. Small MLP (~5K params, opset 17, ONNX-allowlist conformant). Trainer is ai/scripts/train_konvid_mos_head.py; vmaf-tune callers reach the surface via Predictor.predict_mos() with a documented linear approximation mos = (predicted_vmaf - 30) / 14 as the fallback when the ONNX is missing. Production-flip gate (PLCC ≥ 0.85, SROCC ≥ 0.82, RMSE ≤ 0.45, spread ≤ 0.005) is not lowered on real-corpus failures (memory feedback_no_test_weakening); the synthetic-corpus surrogate gate ships at PLCC ≥ 0.75. Real-corpus retrain blocked on PR #447 (KonViD-150k ingestion). | Accepted | ai, training, mos, konvid, fork-local |
| ADR-0337 | motion_v2 inherits motion v1's public option surface (duplicate registration) — resolves the architectural question deferred by PR #453 / PR #460. Lands the upstream four-commit motion_v2 cluster (856d3835 mirror fix, c17dd898 motion_max_val, a2b59b77 motion_five_frame_window-as-option, 4e469601 remaining options + motion3_v2_score) by registering the same option set on motion_v2 that motion v1 already exposes via ADR-0158. Picks A1: duplicate option surfaces over A2 (shared parser — couples v1/v2 internals), A3 (deprecate v1 — breaks goldens + CLI), A4 (defer — falls behind upstream). motion_five_frame_window=true returns -ENOTSUP at init() mirroring ADR-0219 §Decision; the 3-frame default is fully supported. Picture-pool plumbing (prev_prev_ref field + n_threads * 2 + 2 sizing) deferred to a follow-up PR. Netflix golden gate untouched (motion v1 unchanged). | Accepted | upstream-port, motion, feature-extractor, cli, public-api, fork-local |
| ADR-0338 | Add an advisory CI lane Build — macOS Vulkan via MoltenVK (advisory) on macos-latest to validate the existing Vulkan compute backend on Apple Silicon via MoltenVK. Installs molten-vk + vulkan-loader + vulkan-headers + shaderc via Homebrew, pins VK_ICD_FILENAMES to /opt/homebrew/etc/vulkan/icd.d/MoltenVK_icd.json, builds with -Denable_vulkan=enabled, runs the three Vulkan smoke tests. Lane is continue-on-error: true until one green run on master. Complementary to the planned native Metal backend, not a replacement. Operator doc + known-limitations matrix at docs/backends/vulkan/moltenvk.md. | Accepted | ci, vulkan, macos, moltenvk, gpu |
| ADR-0339 | Placeholder Av1VideoToolboxAdapter for tools/vmaf-tune/ codec-adapter registry. Apple M3 / M4 silicon has hardware AV1 encode but FFmpeg upstream has not exposed it (verified against master 8518599cd1, 2026-05-09). The adapter registers with supports_runtime=False and raises Av1VideoToolboxUnavailableError until a runtime probe of ffmpeg -h encoder=av1_videotoolbox confirms the encoder exists, at which point it self-activates without a code change. Paired with scripts/upstream-watcher/check_ffmpeg_av1_videotoolbox.sh and a weekly cron workflow that opens a tracking GitHub issue when the encoder lands. First instance of the upstream-watcher pattern documented in docs/development/upstream-watchers.md. | Accepted | tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local, upstream-blocked |
| ADR-0340 | Multi-corpus aggregation for the FR-regressor / predictor v2 trainer | Accepted | ai, training, corpus, fork-local |
| ADR-0341 | Add paths-ignore filter to libvmaf-build-matrix.yml and tests-and-quality-gates.yml so doc-only / research-only PRs skip the full 18-cell build matrix and 10-job test matrix. Conservative deny-list: docs/**, **/*.md, changelog.d/**, CHANGELOG.md, .workingdir2/**. Safe under ADR-0313 because the Required Checks Aggregator explicitly treats workflow-not-reported as path-filter-skipped/acceptable (see required-aggregator.yml — if (!run) { core.info('OK (not reported, treated as path-filter-skip)'); continue; }). Mirrors the path-filter pattern from ADR-0317 (deny-list polarity instead of allow-list so unknown new paths fail closed → run CI). Saves ~14 runner-minutes per average PR (Research-0089 §3.2); for example, PR #525 (single file docs/research/0089-...md) would have qualified. | Proposed | ci, build, policy, fork-local |
| ADR-0345 | cambi × {CUDA, SYCL, HIP} GPU port strategy. Locks Strategy II host-staged hybrid (the ADR-0205 / ADR-0210 Vulkan precedent) for all three pending backends — GPU services preprocess + derivative + spatial-mask SAT + decimate + filter-mode; host residual runs unmodified calculate_c_values + spatial pooling on byte-identical buffers; cross-backend gate at places=4 from day one. LOC envelopes per Research-0091: CUDA ≈1100 (LOW risk), SYCL ≈1300 (MEDIUM, dual toolchain per ADR-0335), HIP ≈1100 (MEDIUM-LOW, hipify-perl seed from CUDA). Implementation order CUDA → SYCL → HIP. Strategy III (fully-on-GPU c-values) remains parked per ADR-0205 §Out of scope. | Accepted | cuda, sycl, hip, gpu, cambi, fork-local, places-4 |
| ADR-0346 | FR-features-from-NR-corpus adapter pattern | Accepted | ai, training, corpus, methodology, fork-local |
| ADR-0347 | Sanitizer matrix (ASan + UBSan + TSan, ADR-0015) — concrete test-set scope. The pre-existing meson test --suite=unit invocation matched zero tests because no test() call in core/test/meson.build carries a suite: 'unit' tag, leaving every leg printing No suitable tests defined. and exiting 0 with zero correctness coverage. This ADR replaces --suite=unit with the full C unit-test set under each sanitizer, with a documented per-sanitizer deselect list for tests that fail because of a real underlying bug (not a sanitizer mis-configuration). ASan deselects test_model, test_predict, test_float_ms_ssim_min_dim; UBSan deselects test_model (and adds -fno-sanitize=function to skip the K&R-prototype harness UB across ~50 minunit-style tests); TSan deselects test_model, test_pic_preallocation, test_framesync. Each deselected entry cites a follow-up bug in docs/state.md so the gap stays visible. MSan stays out of the matrix (the matrix has always been ASan + UBSan + TSan, despite occasional "MSan" naming) — adding it would require an instrumented libc++ leg out of scope here. Surfaces seven real defects that have been hiding behind the no-op gate (svm.cpp parser allocation/null-deref on malformed JSON, dict/extractor leaks, integer_adm div_lookup global-init race, framesync mutex-domain mismatch). Companion research digest docs/research/0089-sanitizer-matrix-test-scope.md. | Accepted | ci, testing, sanitizer, asan, ubsan, tsan, fork-local |
| ADR-0348 | Globally suppress the CodeQL cpp/poorly-documented-function rule via .github/codeql-config.yml query-filters: - exclude: id: cpp/poorly-documented-function. The rule warns on every C/C++ function lacking a /** */ Doxygen header block, which directly contradicts the fork's documented coding standard (CLAUDE.md §6, docs/principles.md: "default to writing no comments. Only add one when the WHY is non-obvious"). 15 currently-open alerts in core/src/ are over-zealous against this house style — not real correctness flags. Alternatives considered: per-instance lgtm[...] annotations (15 noise-comments), mass-add /** */ blocks (contradicts standard), per-instance review (15× "suppress" anyway). The remaining security-extended + security-and-quality packs stay enabled — targeted exclusion, not wholesale weakening. Verification is post-merge: alerts auto-close on the next CodeQL scan against master. | Accepted | ci, security, codeql, policy, fork-local |
| ADR-0349 | fr_regressor_v3 namespace resolution — keep the existing production checkpoint (sha256 eaa16d23…, 19 call sites in 12 files, PR #428 merged 2026-05-06) untouched and reserve fr_regressor_v3plus_features as the namespace for the future canonical-6 + encoder_internal + shot-boundary + hwcap feature-set bump. Reservation lives in this ADR + a new ## fr_regressor_* namespace map block in ai/AGENTS.md; the registry row lands with the future PR that ships the .onnx (a stub row would fail core/test/dnn/test_registry.sh). Rejects (a) renaming v3 to fr_regressor_v3_vocab16 (touches 19 call sites + breaks the production-flip immutability ADR-0291 establishes) and (b) calling the future work fr_regressor_v4_features (inflates _v4 to a name-conflict workaround, polluting the major-version axis we genuinely need for future regressor redesigns). Status appendix added to ADR-0302 per ADR-0028. | Accepted | ai, docs, naming, fork-local |
| ADR-0350 | psnr_hvs AVX-512 — re-bench confirms AVX2 ceiling (T3-9 (a) close-out). Per-symbol cycle share on 48-frame Netflix normal pair (Zen 5, 2026-05-09): calc_psnrhvs_avx2 78.42 % (scalar tail locked by ADR-0138/0139 bit-exactness), od_bin_fdct8x8_avx2 14.82 % (the only piece AVX-512 can widen). Amdahl ceiling: even an infinitely fast 16-lane DCT caps wall-clock improvement at 14.82 / (1 − 0.1482) = 17.4 % (1.17× over AVX2); a realistic 2-block batch recovers ~50 % of DCT cost, so projected gain is 1.07–1.08× — well below the T3-9 1.3× ship gate. Re-validates ADR-0180's 2026-04-26 verdict empirically on a current host with full AVX-512 (avx512f / dq / cd / bw / vl / ifma / vbmi); gives ADR-0160 a status-update appendix pointing here as the empirical close-out. T3-9 (b) and (c) (ssimulacra2_* AVX-512 + NEON drift audit; iqa_convolve AVX-512) bench independently in their own follow-up PRs. | Accepted | simd, avx512, psnr-hvs, ceiling, audit, fork-local |
| ADR-0351 | T3-15(b) — psnr_cuda chroma extension. Extends the luma-only ADR-0182 extractor to emit psnr_cb / psnr_cr alongside psnr_y on the CUDA backend, mirroring ADR-0216's Vulkan port. Three-element readback array in PsnrStateCuda (rb[3]) carries per-plane device-SSE accumulators + pinned-host slots; the kernel (calculate_psnr_kernel_{8,16}bpc in psnr_score.cu) gains a plane parameter so it indexes data[plane] / stride[plane] instead of the hard-coded [0]. Single private stream + submit/finished event pair issues all per-plane launches back-to-back on the picture stream (no inter-plane barrier — accumulators are independent), then DtoHs all three slots on lc.str before a single cuStreamSynchronize in collect(). Subsampling derived from pix_fmt: 4:2:0 → w/2 × h/2, 4:2:2 → w/2 × h, 4:4:4 → w × h. YUV400P clamps n_planes = 1 so chroma dispatches and emits are skipped. provided_features becomes {"psnr_y", "psnr_cb", "psnr_cr"}. psnr_max[p] follows CPU integer_psnr.c default branch ((6 * bpc) + 12). picture_cuda upload path needed no change — chroma planes were already uploaded for non-YUV400P inputs since the ciede_cuda landing (libvmaf.c::translate_picture_host's upload_mask). Cross-backend gate (scripts/ci/cross_backend_vif_diff.py --feature psnr --backend cuda) extended to assert all three plane scores at places=4; RTX 4090 measurement on the 576×324 and 640×480 testdata fixtures reports max_abs_diff = 0.0 across 48 frames for all three metrics (deterministic int64 SSE on both sides). Closes the GPU long-tail backlog row "psnr chroma parity with CPU" across both shipping GPU backends; chroma SSIM / chroma MS-SSIM CUDA follow-ups stay separate rows. | Accepted | cuda, gpu, feature-extractor, psnr, fork-local |
| ADR-0352 | Vulkan submit-pool migration PR A: adm_vulkan, motion_vulkan, psnr_vulkan (ADR-0256 follow-up, T-GPU-OPT-VK-1 + T-GPU-OPT-VK-4). Eliminates per-frame vkCreateFence / vkAllocateCommandBuffers / vkAllocateDescriptorSets for the three highest-ROI remaining extractors. adm_vulkan (16 dispatches/frame, 4 descriptor sets) and psnr_vulkan (3 dispatches/frame, 3 descriptor sets) achieve zero per-frame Vulkan API overhead beyond the actual dispatch because all buffer handles are init-time-stable. motion_vulkan (1 dispatch/frame) retains one vkUpdateDescriptorSets per frame for the blur ping-pong. Per-frame elimination: 12-16 round-trips. Expected throughput gain at sub-HD resolutions: 10-60 percent per the ADR-0256 profile. Numerical output is bit-identical; places=4 cross-backend gate passes on all three extractors. | Accepted | vulkan, perf, kernel-template, fork-local |
| ADR-0353 | Vulkan submit-pool migration PR-B — six secondary kernels | Proposed | vulkan, gpu, performance, fork-local |
| ADR-0354 | Vulkan submit-pool migration PR-C — cambi, ssimulacra2, float_ansnr, moment | Accepted | vulkan, perf, kernel-template, fork-local |
| ADR-0355 | Symphony-inspired agent-dispatch infrastructure — three thin, in-repo artefacts ported from openai/symphony §3.1/§4.1: (1) .claude/workflows/ typed-YAML-front-matter briefs (_template.md + codeql-alert-sweep.md / simd-port.md / feature-extractor-port.md); (2) scripts/lib/backlog_tracker.py read-only BacklogItem / BacklogTracker / GitHubTracker parsing .workingdir2/BACKLOG.md rows + gh PR queries; (3) scripts/ci/agent-eligibility-precheck.py pre-dispatch gate (BACKLOG row open + no merged PR on scope + no in-flight agent). Adopts Symphony's shapes without the Elixir/Codex/Linear runtime — stdlib-only, one PR, opt-in until a Claude Code Agent.preDispatch hook surfaces. Closes the two confirmed NO-OP dispatches this session (vmaf_tiny_v3 registry / T7-5 NOLINT sweep) at the cheapest gate. Rejects (a) full Symphony adoption (multi-week, new runtime + Linear dependency) and (c) status-quo manual triage (already losing more context than the build-out costs). | Accepted | agents, ci, tooling, fork-local |
| ADR-0356 | Two-level GPU reduction for Vulkan VIF / ADM / motion accumulators | Accepted | vulkan, perf, gpu |
| ADR-0357 | Vulkan readback buffer VMA allocation flag separation | Accepted | vulkan, performance |
| ADR-0358 | CUDA motion correctness — fix four real bugs surfaced by a cuda-reviewer pass: (1) cross-stream race on the SAD accumulator (memset on s->str, kernel atomic-add on pic_stream, no event linkage), now memsets on pic_stream mirroring the v2 pattern; (2) pinned-memory leak of s->sad_host (compute-sanitizer --tool memcheck reports LEAK SUMMARY: 8 bytes leaked in 1 allocations on master, 0 on the fix); (3) motion2_score skipped the MIN(score * motion_fps_weight, motion_max_val) post-process the CPU reference does at integer_motion.c:563; (4) moving-average guard off-by-one because s->frame_index is pre-incremented before motion3_postprocess_cuda runs. Plus two perf advisories in motion_score.cu and motion_v2_score.cu: (5) pad shared-tile inner stride 20 → 21 to break the 2-way bank conflict (GCD(20, 32) = 4 vs GCD(21, 32) = 1, +64 B/block); (6) add __launch_bounds__(BLOCK_X * BLOCK_Y, 8). Default settings remain bit-exact at places=4 (0/144 mismatches on Netflix src01_hrc00_576x324.yuv ↔ src01_hrc01); the bugs in (3) + (4) only trip under non-default motion_fps_weight ≠ 1.0 or motion_moving_average=true. ADR-0357 reserved for motion3 GPU coverage; common.c:388/416 inverted-stream-select advisory deferred per agent brief (no live callers). | Accepted | cuda, motion, correctness, precision, fork-local |
| ADR-0359 | ARC self-hosted runner pool pilot — route Cppcheck (Whole Project) through a ternary runs-on expression keyed on the new repo variable ARC_RUNNERS_ENABLED. Default false keeps every job on GitHub-hosted; flip to true to opt the pilot job into the in-cluster arc-runners scale set. Observable failure mode: if ARC is degraded, the job sits queued and the operator flips the variable back to fall back. After ≥ 1 day green pilot, ramp up to sanitizers + Vulkan + GPU build legs in follow-up PRs. | Accepted | ci, infra, fork-local |
| ADR-0360 | CAMBI CUDA port (Strategy II hybrid, T3-15a) | Accepted | cuda, gpu, cambi, feature-extractor, fork-local, places-4, t3-15 |
| ADR-0361 | Metal (Apple Silicon) compute backend — scaffold-only audit-first PR (T8-1; closes T8-1 audit half, runtime + motion_v2 kernel land in follow-up PRs T8-1b / T8-1c). New public header libvmaf_metal.h declaring VmafMetalState / VmafMetalConfiguration / vmaf_metal_state_init / _import_state / _state_free / vmaf_metal_list_devices / vmaf_metal_available. New core/src/metal/ (common + picture_metal + dispatch_strategy + kernel_template) + first-consumer scaffold core/src/feature/metal/integer_motion_v2_metal.c registering vmaf_fex_integer_motion_v2_metal (TEMPORAL flag, motion_v2_metal name). All entry points return -ENOSYS. New enable_metal feature option (default auto: probes for Metal.framework / MetalKit.framework on macOS, disabled elsewhere) with conditional subdir('metal') in core/src/meson.build. New 14-sub-test smoke at core/test/test_metal_smoke.c exercising every public C-API entry point, the kernel-template helpers, and the first-consumer registration. New CI matrix row Build — macOS Metal (T8-1 scaffold) compiling on macos-latest with -Denable_metal=enabled. New docs/backends/metal/index.md + docs/backends/index.md flipped from "planned" to "scaffold". Apple Silicon (GPU Family Apple 7+) only — Intel Macs rejected per discontinued-platform reasoning. Runtime layer will use Apple's MetalCpp C++ wrapper (per https://developer.apple.com/metal/cpp/, accessed 2026-05-09); MoltenVK passthrough rejected (translation overhead, double-dependency); Intel oneAPI rejected (no macOS distribution); OpenCL rejected (deprecated by Apple since macOS 10.14). Mirrors the HIP T7-10 scaffold (ADR-0212) and Vulkan T5-1 scaffold (ADR-0175). | Accepted | gpu, metal, apple-silicon, scaffold, audit-first, fork-local |
| ADR-0362 | K150K-A corpus integration: FR-from-NR extraction of FULL_FEATURES. Integrates KoNViD-150k-A (152,265 clips, crowd-sourced MOS) into the tiny-AI training pipeline using the FR-from-NR adapter (ADR-0346): the same decoded YUV is fed as both reference and distorted to build-cpu/tools/vmaf --backend cuda (RTX 4090). All 22 FULL_FEATURES (Research-0026) are extracted per clip and aggregated to nanmean + nanstd; ciede2000 and psnr_hvs are all-NaN (identity-pair artifact, expected). Output: runs/full_features_k150k.parquet (gitignored, one row per clip, 48 cols). Restartable via .done checkpoint + atomic parquet flush every 1000 clips. Single-process ETA ~296 h at ~7 s/clip. Implementation: ai/scripts/extract_k150k_features.py. User docs: docs/ai/datasets/k150k.md. Companion: Research-0067. | Accepted | ai, training-data, corpus, k150k, full-features, fork-local |
| ADR-0363 | Mend Renovate replaces Dependabot as the dependency-update bot. Adds renovate.json (config:recommended base, weekly Monday schedule, grouped GitHub Actions minor+patch bumps, pre-commit hooks manager, Python patch grouping) and .github/workflows/renovate.yml (self-hosted via renovatebot/github-action SHA-pinned to v46.1.13 79dc0ba74dc3de28db0a7aeb1d0b95d5bf5fde2a). Disables .github/dependabot.yml (renamed to .disabled with a revert comment). Key addition over Dependabot: a customManagers regex rule tracking FFMPEG_SHA:=n[0-9.]+ in ffmpeg-patches/test/build-and-run.sh and FFMPEG_PATCHES_BRANCH in scripts/ci/ffmpeg-patches-check.sh against github-tags/FFmpeg/FFmpeg — a surface Dependabot cannot reach. Operator doc: docs/development/dependency-bot.md. | Accepted | ci, security, dependencies, github-actions, pre-commit, fork-local |
| ADR-0364 | saliency_student_v2 — Resize-decoder ablation on the v1 recipe | Accepted (gate passed: v2 IoU 0.7105 ≥ v1 0.6558) | ai, dnn, mobilesal, saliency, training, fork-local, docs |
| ADR-0365 | Wire the Apple CoreML execution provider into the tiny-AI ORT dispatch layer. Adds four --tiny-device selectors (coreml, coreml-ane, coreml-gpu, coreml-cpu) and the matching VmafDnnDevice enum values 5..8 (append-only). The coreml-ane selector pins MLComputeUnits=CPUAndNeuralEngine for highest perf-per-watt on M-series silicon; the unscoped coreml lets the EP auto-route. Wiring uses the generic key/value SessionOptionsAppendExecutionProvider form so the Linux build degrades cleanly when the EP is absent. End-to-end ANE silicon validation deferred until Apple-silicon hardware access. Apple-side parallel to ADR-0332's OpenVINO NPU wiring. | Proposed | ai, dnn, coreml, apple-silicon, fork-local |
| ADR-0366 | vmaf-tune corpus schema v3 — canonical-6 per-feature aggregates | Accepted | ai, tools, vmaf-tune, corpus, schema |
| ADR-0367 | LSVQ corpus ingestion for nr_metric_v1 — adopt the LIVE Large-Scale Social Video Quality dataset (Ying et al. ICCV 2021, ~39 K UGC videos, ~5.5 M ratings, CC-BY-4.0) as a third training shard alongside KonViD-150k (ADR-0325 Phase 2) and BVI-DVC (ADR-0310). New ai/scripts/lsvq_to_corpus_jsonl.py adapter mirrors the KonViD-150k Phase 2 shape verbatim — resumable per-URL curl downloads with atomic tempfile-rename progress writes, ffprobe-driven geometry probe, MOS / SD / rating-count round-trip from the canonical Hugging Face split CSV (teowu/LSVQ-videos), and the same JSONL row contract modulo corpus = "lsvq". Refuses sub-1000-row CSVs; defaults to a 500-row laptop-class subset with --full for whole-corpus ingestion (~500 GB working set). License posture is local-only — corpus + per-clip MOS stay under .workingdir2/, only derived nr_metric_v1_*.onnx weights ship with CC-BY-4.0 attribution. ENCODER_VOCAB v4 "ugc-mixed" collapse stays trainer-side and is unchanged by this PR. Tests under ai/tests/test_lsvq.py cover download resume, attrition tolerance, refuse-tiny cutoff, atomic progress-file writes, ffprobe geometry parse, broken-clip skip, MOS-column round-trip (canonical + alias headers), bare-stem name → .mp4 suffix, append+dedup on re-run, and --max-rows / --full cap behaviour. | Accepted | ai, training, corpus, license, fork-local |
| ADR-0368 | External-competitor benchmark harness — wrapper-only architecture for side-by-side comparison between the fork's fr_regressor_v2_ensemble_v1 + nr_metric_v1 predictors and two external OSS competitors (Synamedia/Quortex x264-pVMAF, GPL-2.0; DOVER-Mobile, Apache-2.0 + CC-BY-NC-SA 4.0). Lands tools/external-bench/ with four run.sh wrappers (one per competitor; each invokes a user-installed binary via env var and re-shapes its output into a normalised JSON schema), compare.py orchestrator (BVI-DVC test fold + Netflix Public Drop corpus discovery, PLCC/SROCC/RMSE/runtime aggregation, fixed-width comparison-table renderer), 7 stubbed pytest cases that monkeypatch subprocess.run so tests never depend on external binaries being installed, and operator-facing README.md documenting the licence boundary. Critical licence constraint: x264-pVMAF is GPL-2.0 vs the fork's BSD-3-Clause-Plus-Patent — the wrapper-only posture keeps zero GPL'd code in the fork; vendoring would relicense the entire fork. Companion to ADR-0310 (corpus) and ADR-0321 (fork-side predictor lineage). | Accepted | ai, testing, license, tooling, fork-local |
| ADR-0369 | Waterloo IVC 4K-VQA corpus ingestion for nr_metric_v1 — adopt the University of Waterloo IVC 4K Video Quality Database (Li et al. ICIAR 2019; 20 pristine 4K sources × 5 codecs × 3 resolutions × 4 distortion levels = 1 200 clips with controlled-subjective-study MOS, permissive academic licence) as a fourth training shard alongside BVI-DVC (ADR-0310), KonViD-150k (ADR-0325 Phase 2), and LSVQ (ADR-0367). New ai/scripts/waterloo_ivc_to_corpus_jsonl.py adapter mirrors the LSVQ / KonViD-150k Phase 2 shape — resumable per-URL curl downloads with atomic tempfile-rename progress writes, ffprobe-driven geometry probe, and the same JSONL row contract modulo corpus = "waterloo-ivc-4k". Closes the 2160p resolution-bin gap in the BVI-DVC + KonViD-150k + LSVQ union flagged in research digest #465. Auto-detects between the upstream canonical headerless 5-tuple shape (encoder, video_number, resolution, distortion_level, mos) and the standard LSVQ-shape named-column CSV. Refuses sub-100-row CSVs; defaults to a 100-row laptop-class subset with --full for whole-corpus ingestion (~multi-TB working set). MOS is recorded verbatim on the Waterloo-native 0–100 raw scale (NOT 1–5 like KonViD / LSVQ); cross-corpus rescaling is a trainer-side follow-up. License posture is local-only — corpus + per-clip MOS stay under .workingdir2/, only derived nr_metric_v1_*.onnx weights ship with IVC attribution. ENCODER_VOCAB v4 "professional-graded" slot routing stays trainer-side and is unchanged by this PR. Tests under ai/tests/test_waterloo_ivc.py (20 cases) cover canonical-headerless auto-detect, standard-CSV parse, alias headers, native 0–100 MOS round-trip, download resume, attrition tolerance, refuse-tiny cutoff, atomic progress-file writes, ffprobe geometry parse (HEVC / AV1 at 4K), broken-clip skip, append+dedup on re-run, encoder_upstream verbatim, and --max-rows / --full cap behaviour. | Accepted | ai, training, corpus, license, fork-local |
| ADR-0370 | LIVE-VQC MOS-corpus ingestion for nr_metric_v1 | Accepted | ai, training, corpus, license, fork-local |
| ADR-0371 | Shared CorpusIngestBase for MOS-corpus ingestion adapters | Accepted | ai, corpus, refactor, fork-local |
| ADR-0372 | HIP Batch-1 — integer_psnr_hip and float_ansnr_hip Real Kernels | Accepted | hip, gpu, build |
| ADR-0373 | HIP Batch-2 — float_motion_hip Real Kernel | Accepted | hip, gpu, build |
| ADR-0374 | Build-time-optional public APIs return -ENOSYS when disabled | Accepted | dnn, cuda, sycl, hip, vulkan, metal, mcp, build, api, fork-local |
| ADR-0375 | HIP batch-3 — float_moment_hip and float_ssim_hip real kernels | Accepted | hip, gpu, build, feature-extractor, fork-local |
| ADR-0376 | Fix silent error-swallow in Vulkan buffer-invalidate readback functions | Accepted | vulkan, gpu, build, correctness, fork-local |
| ADR-0377 | HIP batch-4 — ciede_hip and integer_motion_v2_hip real kernels | Accepted | hip, gpu, build, feature-extractor, fork-local |
| ADR-0378 | Per-picture CUDA streams must use CU_STREAM_NON_BLOCKING | Accepted | cuda, performance, gpu, feature-extractor, fork-local |
| ADR-0379 | libvmaf Symbol Visibility — Hide Internal Symbols with -fvisibility=hidden | Accepted | build, api, security, abi, fork-local |
| ADR-0380 | FFmpeg libvmaf filter — HIP backend selector patch (0011) | Accepted | ffmpeg-patches, hip, integration |
| ADR-0381 | Fix Vulkan VIF Scale 2/3 Numerical Saturation (PR #718) | Accepted | vulkan, precision, build |
| ADR-0382 | Y4M header parser — reject non-positive width or height before allocation | Accepted | security, fuzz, parser, fork-local |
| ADR-0383 | K150K corpus scoring driver — parallel CPU worker redesign | Accepted | ai, corpus, performance, training, fork-local |
| ADR-0384 | Switch shfmt pre-commit hook from binary download to Go-source build | Accepted | ci, build, fork-local |
| ADR-0385 | Feature-extractor deduplication by provided-feature names | Accepted | correctness, cuda, gpu, feature-extractor, fork-local |
| ADR-0386 | ADR Number Collision Prevention — Hook + CI Gate + Helper Script | Accepted | ci, docs, git, agents |
| ADR-0387 | Migrate Renovate from self-hosted workflow to GitHub App | Accepted | infra, dependency-bot, fork-local |
| ADR-0388 | Ingest BVI-CC as the second tiny-AI training corpus | Draft | ai, fr-regressor, corpus, license, bristol |
| ADR-0389 | vmaf_tiny_v3 — wider/deeper mlp_medium tiny VMAF MLP | Accepted | ai, dnn, tiny-ai, model, registry, fork-local |
| ADR-0390 | vmaf_tiny_v4 — mlp_large arch (opt-in only; arch ladder stops here) | Accepted | ai, tiny-ai, model, inference |
| ADR-0391 | Closes the PR #346 deferred follow-up — root-causes the residual 5/48 NVIDIA-Vulkan ciede2000 places=4 mismatch (max abs 8.9e-05, 1.78× threshold) as a structural f32-vs-f64 precision gap. CPU ciede.c::get_lab_color runs the BT.709 → linear-RGB → XYZ → Lab chain in double; the Vulkan shader runs it in float. Controlled experiment (rebuild CPU with f32-throughout helpers) proves f32-CPU and NVIDIA-Vulkan agree to ~6e-7 on the 5 failing frames (the highest-ΔE frames of the fixture); the gap is the irreducible f32-vs-f64 delta amplified by per-pixel ΔE summation. Rejects three mitigations: f64 shader promotion via shaderFloat64 (RTX 4090 runs f64 at 1/64 fp32 throughput; SPIR-V f64 transcendentals unmandated by spec), f32-narrowing the CPU reference (changes Netflix golden ground truth), and matched polynomial pow/sqrt/sin approximations (cost-benefit fails for a 1.78× tail). Accepts as documented fork debt under docs/state.md Open bug T-VK-CIEDE-F32-F64; lavapipe parity gate (places=4, 0/48) remains authoritative for CI. Companion to research-0055. | Accepted | vulkan, ciede, precision, gpu, nvidia, fork-local |
| ADR-0392 | vmaf-tune Phase D — per-shot CRF tuning scaffold | Accepted (scaffold only; native per-codec emission and | tooling, ai, ffmpeg, codec, automation, fork-local |
| ADR-0393 | fr_regressor_v2 probabilistic head — deep-ensemble + conformal scaffold | Accepted | ai, fr-regressor, probabilistic, ensemble, conformal, fork-local |
| ADR-0394 | Local sidecar training — on-host bias-correction model that adapts the shipped per-shot VMAF predictor to the operator's own source / encoder mix without mutating the predictor itself. Adds tools/vmaf-tune/src/vmaftune/sidecar.py with SidecarConfig, SidecarModel (online ridge regression with closed-form Sherman-Morrison rank-1 inverse update, pure-Python, zero new ML dep), and SidecarPredictor that composes a Predictor with a SidecarModel. Persistence under ${XDG_CACHE_HOME:-~/.cache}/vmaf-tune/sidecar/<predictor-version>/<codec>/state.json, keyed by an anonymous random 128-bit host UUID generated by secrets.token_hex(16) — never derived from MAC, hostname, machine-id, or any machine-identifying info (load-bearing precondition for a future opt-in community-pool upload). Cold-start is identically zero correction (composed predictor is bit-equivalent to the bare Predictor); predictor-version bump invalidates the sidecar to a fresh cold-start. Five contract tests under tools/vmaf-tune/tests/test_sidecar.py pin cold-start pass-through, residual-reduction after captures, save/load round-trip, UUID stability across reconstructions, and predictor-version invalidation. Operator doc docs/ai/local-sidecar-training.md; algorithm + privacy + drift-hook rationale in Research-0086. Local-only by default; opt-in upload to a community pool (ChatGPT-vision item 4) is explicitly out of scope and tracked under §Future work. Closes the Research-0087 item-3 "partially scaffolded" gap. | Proposed | ai, vmaf-tune, sidecar, online-learning, privacy, fork-local |
| ADR-0395 | Predictor stub-models policy — ship one synthetic-stub model/predictor_<codec>.onnx for each of the 14 vmaf-tune codec adapters trained from a deterministic 100-row synthetic corpus seeded by codec name, plus the trainer (tools/vmaf-tune/src/vmaftune/predictor_train.py) that consumes a real Phase A JSONL corpus when available and falls back to the synthetic generator per-codec when not. Tiny MLP (14 inputs × 64 hidden × 1 output, ~5K params), opset 18, op-allowlist validated. Per-codec model card under model/predictor_<codec>_card.md flags corpus.kind: synthetic-stub-N=100 with a do-not-use-in-production warning; cards switch to corpus.kind: real-N=<rows> when the operator runs the trainer against a real corpus. Closes the predictor follow-up from PR #430. Production weights flip stays gated on real corpus generation. | Accepted | ai, vmaf-tune, predictor, models, fork-local |
| ADR-0396 | Video-temporal saliency extension to saliency_student_v1 — three-phase rollout. Phase 1 (immediate): configurable temporal aggregator (mean / ema / motion-weighted) inside tools/vmaf-tune/src/vmaftune/saliency.py, no new model — captures the SalEMA-validated "EMA over a frozen 2D backbone closes most of the gap to a sophisticated temporal model" finding for free. Phase 2 (follow-up): video_saliency_student_v1 (~200–300 K params, BSD-3-Clause-Plus-Patent, ONNX opset 17), distilled from UNISAL (Apache-2.0, MobileNetV2 + Bypass-RNN) on DHF1K (CC BY 4.0). TinyU-Net + learned per-channel EMA gate on the bottleneck; same I/O contract as saliency_student_v1 plus an optional bottleneck-state input. Trained via ai/scripts/train_video_saliency_student.py. Phase 3: ONNX export + vmaf-tune recommend --saliency-mode {image, video} flag, with image as the default until a BD-rate sweep justifies the flip. Rejected: TASED-Net (21.2 M params, MIT — wrong size class for fork's tiny-AI footprint), ViNet-v2 (CC BY-NC-SA 4.0 — license blocker mirrors ADR-0257), AViMoS (mouse-tracked, held in reserve for v2), Mamba-based ZenithChaser (op not on core/src/dnn/op_allowlist.c). Source survey: Research-0086. | Proposed | ai, dnn, saliency, video-saliency, vmaf-tune, roi, fork-local, design |
| ADR-0397 | vmaf-tune Phase F — auto adaptive recipe-aware tuning entry point. Design-only ADR (F.0): ships the deterministic decision tree that composes the existing phases (corpus, recommend, fast, predict, tune-per-shot, recommend-saliency, ladder, compare) plus the orthogonal modes (HDR auto-detect, sample-clip, resolution-aware) into a single vmaf-tune auto --src ref.mkv --target-vmaf 92 --max-budget-bitrate 5000 CLI verb. Internal architecture is a hand-coded tree (no learned policy at runtime; explainability + reproducibility floor) with a 30-line pseudocode spec. Phased rollout: F.0 design (this ADR), F.1 sequential scaffold + --smoke, F.2 short-circuits (single-rung ladder, codec known, GOSPEL predictor, short / low-variance source skips Phase D, photographic content skips saliency, SDR skips HDR pipeline, sample-clip propagation), F.3 confidence-aware fallbacks (per-cell escalation to coarse-to-fine on FALL_BACK), F.4 per-content-type recipe overrides (animation / live-action / screen-content). No code yet. Companion: Research-0067. | Proposed | tooling, automation, vmaf-tune, ffmpeg, codec, fork-local |
| ADR-0398 | MyTestCase upstream migration — partial port (golden-pinned files deferred) | Accepted | testing, upstream-sync, python |
| ADR-0399 | vmaf-tune codec-adapter contract becomes a runtime contract (HP-1) | Accepted | tooling, codec, automation, fork-local, bug-fix |
| ADR-0400 | encoder-internal-stats capture (corpus expansion v1) | Accepted | vmaf-tune, corpus, predictor, x264 |
| ADR-0401 | libvmaf WebAssembly target — research-only feasibility ADR proposing a phased rollout: EXPERIMENT (smallest scalar Tier-1 prototype, behind enable_wasm=false, no release / no npm publish) before any commitment to Tier 2 (simde + WASM-SIMD + npm publish) or Tier 3 (onnxruntime-web + tiny-AI heads). The WASM build joins GPU / SIMD as "numerically close, never bit-exact" and runs its own snapshot suite under testdata/scores_wasm_*.json rather than participating in the Netflix CPU golden gate. Decision matrix in Research-0089. | Proposed | build, wasm, browser, ai, fork-local |
| ADR-0402 | MCP runtime v2 — UDS transport + real compute_vmaf binding | Accepted | mcp, agents, api, transport, fork-local |
| ADR-0403 | mkdocs --strict validation policy. Tightens the existing .github/workflows/docs.yml --strict lane (which had been a smoke test because every link-validation category in mkdocs.yml was set to info): promotes links.anchors and nav.{not_found,omitted_files} to warn so the lane fails on broken in-doc anchors, mkdocs nav typos, and leaked excluded-tree pages; documents links.{not_found,unrecognized_links}: info carve-outs for the two unfixable populations (cross-tree pointers ../../core/src/... outside docs_dir, and ADR-body cross-refs to renamed neighbours frozen by ADR-0028 / ADR-0106 immutability). Excludes docs/adr/_index_fragments/** from the rendered site (concatenation source per ADR-0221). Sweeps the actionable subset: fixes two genuinely-broken anchors (docs/mcp/embedded.md → ADR-0209's "What lands next" heading, docs/research/0055-...md → Research-0053's "Distribution" heading) and the bare-relative-dir links in docs/{index,state,rebase-notes}.md. Net: strict-build flips from EXIT=1 with 1,276 emitted WARNINGs (after promoting categories to warn) to EXIT=0 with the actionable classes still gated. | Proposed | docs, ci, mkdocs, fork-local |
| ADR-0404 | Keep nightly.yml (TSan) and fuzz.yml (libFuzzer) workflows running unmodified despite 23+ consecutive days of failing runs. Both gates surface real bugs — a data race in div_lookup_generator (ADM init) and a NULL-deref SEGV in y4m_input_fetch_frame on negative-width Y4M headers. Per memory feedback_no_test_weakening, the gates stay on; the bugs get fixed in dedicated follow-up PRs (not in this triage PR). State.md rows pin the failing tests + reopen triggers so reviewers can immediately distinguish the two known-open bugs from any new finding. | Accepted | ci, testing, fuzzing, security |
| ADR-0405 | Wire OpenVINO NPU execution provider into the tiny-AI dispatch layer. Adds VMAF_DNN_DEVICE_OPENVINO_NPU / _CPU / _GPU enum values plus matching --tiny-device=openvino-npu / openvino-cpu / openvino-gpu CLI keywords; pins the OpenVINO EP to a single device_type with no fallback inside the explicit-selector branches (the existing --tiny-device=openvino keeps the GPU→CPU fallback chain). NPU is intentionally NOT added to the AUTO try-chain — opt-in only because of NPU power-state latency floor on small graphs. Smoke-test exercises the selector path against vmaf_dnn_session_attached_ep(); on hardware without NPU silicon the graceful CPU-EP fallback in vmaf_ort_open() handles the absence. End-to-end NPU validation deferred to a contributor with Meteor / Lunar / Arrow Lake hardware (re-evaluation triggers in the ADR body). | Accepted | ai, dnn, openvino, intel-ai-pc, fork-local |
| ADR-0406 | Defer the SYCL ADM DWT group_load rewrite recommended by research-0086 §A.4. Two blockers surfaced on implementation: (1) divisibility — the vertical-pass tile (TILE_H × WG_X = 576 int32 elements, WG_SIZE = 256) violates the SYCL ext total = WG_SIZE × ElementsPerWorkItem contract (576 / 256 = 2.25 is not an integer; the expression 2(WG_Y+1)/WG_Y is an integer only for WG_Y ∈ {1, 2}), (2) source contiguity — group_load takes a contiguous InputIteratorT and the multi-row tile load is contiguous only within a single 32-int row. The horizontal pass has no SLM tile, so it was a non-target. Battlemage register-pressure validation is unavailable on the Arc A380 dev host. The kernel stays bit-exact-untouched; the audit checklist row in docs/development/oneapi-install.md is annotated with a cross-link to this ADR. Reopens when (a) a tile-geometry redesign yields integer divisibility AND (b) Xe2 hardware is available. ADR-0202 carries a Status-update appendix per the immutability rule. | Accepted | sycl, adm, perf, deferred, fork-local |
| ADR-0407 | AdaptiveCpp / hipSYCL added as a second supported SYCL toolchain alongside Intel oneAPI icpx. Contributors who do not want to install Intel's ~2.6 GB closed-source basekit can pass -Dsycl_compiler=acpp (with -Dsycl_acpp_targets=<targets>) to build the fork's -Denable_sycl=true path against the open-source LLVM-based AdaptiveCpp instead. Intel icpx remains the primary toolchain — fork-shipped binaries, Intel discrete-GPU codegen, and OpenVINO / NPU enablement stay icpx-coupled. New core/src/feature/sycl/sycl_compat.h neutralises the 10 [[intel::reqd_sub_group_size(N)]] call sites under acpp via a VMAF_SYCL_REQD_SG_SIZE(N) macro; new docs/development/sycl-toolchains.md documents the per-toolchain capability matrix and numerical-conformance gap. Closes Research-0086 Topic B (GO-AS-SECOND-TOOLCHAIN). | Accepted | sycl, build, toolchain, fork-local, ci, contributor-experience |
| ADR-0408 | FFmpeg libvmaf filter — CUDA backend selector, mirroring the existing SYCL (ADR-0118) / Vulkan (ADR-0186) selectors. Adds ffmpeg-patches/0010-libvmaf-wire-cuda-backend-selector.patch: a cuda boolean AVOption on the libvmaf filter that — when set — inits a VmafCudaState against the CUDA primary context, imports it into the VmafContext, and dispenses VmafPictures from a HOST_PINNED preallocation pool so software AVFrame input flows into pinned-host memory the CUDA feature kernels DMA from without a staging copy. Configure changes promote libvmaf_cuda from blanket-autodetect to the user-facing --enable-libvmaf-cuda flag (matching SYCL/Vulkan in EXTERNAL_LIBRARY_LIST); the in-filter selector keeps working when libvmaf ships CUDA support. Coexists with the upstream dedicated libvmaf_cuda filter via the CONFIG_LIBVMAF_CUDA && !CONFIG_LIBVMAF_CUDA_FILTER guard. Cumulative replay 0001..0010 against pristine n8.1.1 PASS; configure --help advertises --enable-libvmaf-cuda symmetrically with SYCL/Vulkan. | Accepted | ffmpeg-patches, cuda, integration |
| ADR-0409 | Automated CI gate for CLAUDE.md §12 r14 — third blocking job in .github/workflows/rule-enforcement.yml, backed by scripts/ci/ffmpeg-patches-surface-check.sh. Parses every patch under ffmpeg-patches/ once, extracts a "consumed set" of vmaf_* / Vmaf* / libvmaf_* / --enable-libvmaf-* tokens, and intersects against the PR's diff over core/include/libvmaf/*.h and core/meson_options.txt; fails when the intersection is non-empty and no ffmpeg-patches/*.patch is in the diff. Per-PR opt-out is no ffmpeg-patches update needed: REASON per the ADR-0108 family convention. Picks bash + grep over libclang AST / ctags — sub-second runtime, zero new deps; false positives close in seconds via the opt-out line, false negatives (a real surface change slipping through) are the bug we're paying to avoid. Mirrors ADR-0124 gate structure; absorbs into the ADR-0313 aggregator without branch-protection edits. | Accepted | ci, ffmpeg-integration, process, rule-enforcement |
| ADR-0410 | ssimulacra2_cuda 2026-05-09 cuda-reviewer follow-up — fixes the GPU module leak (init_fex_cuda calls cuModuleLoadData twice, close_fex_cuda never calls cuModuleUnload; ~200-500 KB of GPU-resident module backing store leaked per vmaf_close() cycle, invisible to compute-sanitizer --tool memcheck because the leak-checker tracks cuMem*Alloc only), removes the per-scale 24 MB malloc(3 * width * height * sizeof(float)) from extract_fex_cuda (replaced with two pre-allocated pinned scratch buffers h_ref_lin_ds / h_dis_lin_ds reused across scales), shrinks the H2D / D2H transfers from full-plane to per-plane scale_w * scale_h * sizeof(float) (15× PCIe-traffic reduction at 1080p scale 2: 518 KB valid vs 8 MB full-plane per copy, repeated across 2 H2D + 5 D2H per scale), and annotates ssimulacra2_blur_h / ssimulacra2_blur_v with __launch_bounds__(64, 32) so nvcc trims registers to keep ≥32 resident blocks per SM. Bit-exact at places=4 (0/48 mismatches, max abs diff 0.000000e+00) on the Netflix 576×324 fixture. Wall-clock at 1080p RTX 4090 within noise (~0.7%) — the host XYB pre-pass dominates; the fix is correctness-shaped, not throughput-shaped, at this resolution. Architectural ceilings (H-pass non-coalesced, V-pass L1 pressure) require a shared-memory tile-transpose rewrite and remain known follow-ups. The cuModuleUnload rule is propagated to core/src/cuda/AGENTS.md § Lifecycle invariants so future agent passes pin it on every CUDA extractor (every existing extractor leaks the same way — separate sweep PR). | Accepted | cuda, gpu, perf, memory-leak, ssimulacra2, fork-local |
| ADR-0412 | Fork-local release-artefact mirror scaffold for u2netp.pth under Apache-2.0 §4 NOTICE compliance. Partial unblock of T6-2a path (b) (T6-2a-mirror-u2netp-via-release — ADR-0265 fallback path). Adds LICENSES/Apache-2.0-u2netp.txt (full licence text + attribution block citing upstream copyright, paper, repository, and HEAD ac7e1c81 commit pin; upstream ships no NOTICE file, so §4 (d) is moot), a model-card stub at docs/ai/models/u2netp_mirror_card.md (5-point bar per ADR-0042 with sha256 + Sigstore bundle URL placeholders for the binary upload), an operator workflow doc at docs/ai/u2netp-mirror.md (gh release download + sha256 cross-check + cosign verify-blob against the VMAFx/vmafx OIDC identity), an idempotent staging step in .github/workflows/supply-chain.yml (fast-exits when model/u2netp_mirror.{onnx,pth} are absent), and a .gitignore entry pinning the binary to release attachments only. ADR-0671 adds the missing exporter. Recommended path for new consumers remains saliency_student_v2; the mirror is the named fallback for users who specifically want upstream u2netp lineage (citation, comparative evaluation, downstream pipelines pinned to upstream behaviour). Binary upload remains a sibling release-asset step. Companion: Research-0086. | Accepted | ai, dnn, u2netp, saliency, license, apache-2.0, supply-chain, fork-local, docs |
| ADR-0413 | YouTube UGC corpus ingestion for nr_metric_v1 — adopt the Google YouTube UGC dataset (Wang, Inguva, Adsumilli MMSP 2019 + CVPR 2021 transcoded follow-up; ~1500 community UGC originals; CC-BY) as a fourth MOS-corpus training shard alongside LSVQ (ADR-0333), KonViD-150k (ADR-0325 Phase 2), and BVI-DVC (ADR-0310). New ai/scripts/youtube_ugc_to_corpus_jsonl.py adapter mirrors the LSVQ shape verbatim — resumable per-URL curl downloads with atomic progress writes, ffprobe-driven geometry probe, MOS / SD / n-ratings round-trip from the canonical bucket-rooted manifest CSV (https://storage.googleapis.com/ugc-dataset/original_videos.csv), and the same JSONL row contract modulo corpus = "youtube-ugc" and corpus_version = "ugc-2019-orig" (or ugc-2020-transcoded-mean for the transcoded-mean variant). The dataset is hosted in the public-readable GCS bucket gs://ugc-dataset/ with the allUsers:objectViewer IAM role — no sign-up, no request form. Refuses (exit 2) when handed a < 200-row CSV; defaults to a 300-row laptop-class subset with --full opting into whole-corpus ingestion (~2 TB working set). License posture stays local-only — clips + per-clip MOS under .workingdir2/youtube-ugc/, only derived nr_metric_v1_*.onnx weights ship with CC-BY attribution. Tests under ai/tests/test_youtube_ugc.py (18 cases) cover resumable-resume, attrition tolerance, refuse-tiny cutoff, atomic progress writes, ffprobe geometry parse, broken-clip skip, MOS-column round-trip (canonical + alias headers including DMOS), bare-stem vid -> .mp4 suffix, append+dedup on re-run, --max-rows / --full cap, and synthesised-bucket-URL path for manifests omitting url. | Accepted | ai, training, corpus, license, fork-local |
| ADR-0414 | Saliency-aware ROI for x265 / SVT-AV1 / libvvenc adapters | Accepted | vmaf-tune, saliency, codec-adapter, roi, fork-local |
| ADR-0415 | CAMBI SYCL port — closes last CUDA-to-SYCL parity gap | Accepted | sycl, gpu, cambi, feature-extractor, fork-local, t3-15 |
| ADR-0416 | VIF on-the-fly filter sync from Netflix upstream | Proposed | |
| ADR-0417 | Draft PR registration for the ai/tiny-netflix-training-scaffold branch. Formally bundles the tiny-AI Netflix corpus training scaffold (ADR-0242) into a reviewable PR unit; adds Research Digest 0099 (2024–2026 distillation and ONNX Runtime literature update). No code changes — scaffold content (loader API, MCP smoke test, training-data docs) is already in master via ADR-0242. Decision deferred to follow-up PR pending user architecture confirmation. | Accepted | ai, training, mcp, fork-local, onnx, docs |
| ADR-0418 | Full upstream ADM + VIF-prescale sync (companion to PR #758 / ADR-0416) | Proposed | |
| ADR-0419 | Gate SVE2 build probe to non-Darwin hosts. Apple Silicon (M1–M4) is ARMv8.x without SVE2 and the runtime detection in core/src/arm/cpu.c is already __linux__-only; recent Apple Clang accepts -march=armv9-a+sve2 so cc.compiles() returns true but the SSIMULACRA 2 SVE2 TU then fails to build under Apple's incomplete intrinsics surface. Force HAVE_SVE2=false on Darwin to mirror the runtime gate. | Accepted | build, simd, macos, arm64 |
| ADR-0420 | Metal backend runtime (T8-1b): replaces the T8-1 scaffold's -ENOSYS stubs with Objective-C++ TUs (common.mm, picture_metal.mm, kernel_template.mm) driving Metal.framework directly. ARC + __bridge_retained/__bridge_transfer casts keep <Metal/Metal.h> out of all headers; consumer TUs stay pure-C through uintptr_t / void * handle ABIs and the new vmaf_metal_context_{device,queue}_handle accessors. Build wiring: Foundation + Metal framework deps flipped to required: true, -fobjc-arc added via add_project_arguments(language: 'objcpp'). Smoke test flipped from -ENOSYS pin to runtime expectations (0 on Apple-Family-7+, -ENODEV on Intel Mac / non-Apple hosts; gracefully short-circuits on -ENODEV so non-Apple-Silicon Mac CI lanes stay green). Unblocks T8-1c (first real kernel — integer_motion_v2.metal). Companion to issue #763. | Accepted | gpu, metal, apple-silicon, runtime, fork-local |
| ADR-0421 | Metal first kernel — integer_motion_v2 (T8-1c) | Accepted | gpu, metal, apple-silicon, kernel, bit-exact, fork-local |
| ADR-0422 | CLI HIP and Metal backend selectors: adds --no_hip, --hip_device <N>, --no_metal, --metal_device <N> to the vmaf CLI and extends --backend to accept hip and metal. Activation follows the Vulkan opt-in model (device flag must be non-negative; --backend hip|metal defaults device to 0 and disables all other backends). init_gpu_backends() in vmaf.c gains guarded vmaf_hip_state_init / vmaf_metal_state_init blocks. Five new test_cli_parse tests added. docs/usage/cli.md updated. No ffmpeg-patches update required (patches consume libvmaf C API, not the standalone CLI tool flags). | Accepted | cli, hip, metal, gpu, fork-local |
| ADR-0423 | Metal IOSurface zero-copy import (T8-IOS). Public libvmaf_metal.h gains VmafMetalExternalHandles, vmaf_metal_state_init_external, vmaf_metal_picture_import, vmaf_metal_wait_compute, vmaf_metal_read_imported_pictures. core/src/metal/picture_import.mm implements the import via IOSurfaceLock + per-row memcpy into a shared-storage VmafPicture (Apple Silicon unified-memory cost is equivalent to a Shared MTLBuffer copy). Companion ffmpeg-patches/0013-libvmaf-add-libvmaf-metal-filter.patch registers the libvmaf_metal filter consuming AV_PIX_FMT_VIDEOTOOLBOX frames via CVPixelBufferGetIOSurface; FFmpeg passes device=0 so libvmaf falls back to MTLCreateSystemDefaultDevice until upstream ships AVMetalDeviceContext. | Accepted | metal, ffmpeg-patches, gpu, t8-ios |
| ADR-0424 | vmaf-tune benchmark consumes existing Phase-A JSONL corpora and reports one matched-quality row per encoder. The command filters successful finite rows, chooses the lowest-bitrate point clearing --target-vmaf, keeps closest misses visible as unmet, and emits markdown / JSON / CSV without launching FFmpeg or libvmaf. | Accepted | vmaf-tune, cli, benchmark, corpus |
| ADR-0425 | vmaf-roi-score saliency materialiser | Accepted | tooling, ai, saliency, vmaf, fork-local |
| ADR-0426 | Add CHUG as a local-only UGC-HDR MOS-corpus ingestion path. The adapter downloads/probes CHUG videos under .workingdir2/chug/, preserves raw CHUG MOS and HDR ladder metadata, maps trainer-facing MOS onto [1, 5], and keeps all CHUG media / labels out of git under the dataset's non-commercial/share-alike license posture. | Accepted | ai, hdr, corpus, mos |
| ADR-0427 | Materialise CHUG HDR feature rows with reference-aligned full-reference pairs. Distorted ladder rows are paired to their chug_content_name reference, decoded to 10-bit 4:2:0 YUV, scaled to reference geometry, and emitted as local-only clip-level feature rows for MOS-head training. | Accepted | ai, hdr, corpus, mos |
| ADR-0428 | vmaf-tune auto selects one winner | Accepted | vmaf-tune, cli, planning |
| ADR-0429 | testdata bench_perf is configurable | Accepted | benchmarks, testdata, tooling |
| ADR-0430 | Saliency RGB ingest and SSIMULACRA2 public docs | Accepted | vmaf-tune, saliency, docs, metrics, fork-local |
| ADR-0431 | Split FR-from-NR CUDA extraction into an explicit CUDA pass plus CPU residual pass. CHUG/K150K local FULL_FEATURES materialisation keeps the same parquet schema while avoiding the mixed all-feature --backend cuda path that can fail on 10-bit clips with duplicate feature-key writes and CUDA context synchronization errors. | Accepted | ai, cuda, training-data, corpus, fork-local |
| ADR-0432 | Extend vmaf-roi-score saliency-mask materialisation to little-endian planar 8/10/12/16-bit YUV so HDR/CHUG-style inputs preserve native sample depth while saliency inference still consumes 8-bit RGB. | Accepted | roi, tiny-ai, hdr, tooling, fork-local |
| ADR-0433 | CHUG content-safe train/validation/test splits plus an ffprobe-backed HDR metadata audit in the local feature materialiser. | Accepted | ai, hdr, chug, training, fork-local |
| ADR-0434 | CHUG Parquet Metadata Enrichment | Accepted | ai, hdr, chug, training, fork-local |
| ADR-0435 | PR-body pre-push validation hook | Accepted | ci, agents, hooks, docs |
| ADR-0436 | MCP server backend-selector parity (vulkan/hip/metal) | Accepted | mcp, agents, api, dispatch, fork-local |
| ADR-0437 | Metal public-header install and vmaf_metal_import_state declaration | Accepted | metal, build, c-api, install, apple-silicon, fork-local |
| ADR-0438 | CLI parser short-option handler coverage invariant — every short option in short_opts[] must have a case arm; adds missing case 'c': for --cpumask | Accepted | cli, lint, testing, correctness |
| ADR-0444 | Promote saliency_student_v2 to production default (IoU 0.7105 vs v1 0.6558, +8.3 %) | Accepted | ai, dnn, saliency, tiny-ai, fork-local |
| ADR-0445 | Persistent VkPipelineCache for Vulkan compute backend | Accepted | vulkan, gpu, performance, pipeline-cache, fork-local |
| ADR-0446 | K150K/CHUG extractor passes HDR and HFR per-feature options | Accepted | ai, hdr, hfr, training, corpus, fork-local |
| ADR-0447 | Motion features under-report on HFR / 50p content; apply bounded FPS weighting across motion extractor variants and document the CHUG/HFR bias. | Accepted | ai, motion, hfr, feature-extractor, cuda, sycl, vulkan, fork-local |
| ADR-0448 | Active upstream monitoring (no silent "wait" deferrals) | Accepted | ci, governance, upstream-sync, deferral, fork-local |
| ADR-0451 | Local dev-MCP container for live probing — Docker, all 4 GPU backends (CUDA/SYCL/Vulkan/HIP), continuous 15-min smoke probe | Accepted | infra, docker, mcp, gpu, hip, cuda, sycl, vulkan, dev, fork-local |
| ADR-0452 | Port calculate_c_values_row to AVX-512 and NEON (CAMBI banding detector) | Accepted | simd, cambi, perf, fork-local |
| ADR-0453 | PSNR enable_chroma option parity across all GPU backends | Accepted | cuda, sycl, vulkan, psnr, option-parity, bug |
| ADR-0454 | VIF CUDA shared-memory staging for horizontal and vertical filter passes | Proposed | cuda, gpu, vif, performance, smem, fork-local |
| ADR-0455 | KonViD-150k k150ka/k150kb split promotion into the MOS-head trainer | Accepted | ai, training, corpus, konvid, fork-local |
| ADR-0456 | SSIMULACRA2 CUDA blur: 3-channel kernel fusion (gridDim.z) + V-pass shared-memory transpose for coalesced access | Accepted | cuda, perf, ssimulacra2 |
| ADR-0457 | model/tiny/*.onnx blobs ≥1MB live in GitHub Releases, not git | Accepted | ai, model-storage, repo-size, fork-local |
| ADR-0458 | SYCL CAMBI queue-sync collapse + SSIM horizontal SLM staging | Accepted | sycl, perf, cambi, ssim, gpu, fork-local |
| ADR-0459 | vmaf-tune panel/display-aware recommendation workstream (HDRSDR-VQA) | Proposed | vmaf-tune, ai, hdr, training, panel, display, fork-local |
| ADR-0460 | Dispatch-strategy registry audit 2026-05-15 | Accepted | |
| ADR-0461 | CLI validates positive dimensions and chroma-alignment on input videos | Accepted | cli, validation, correctness |
| ADR-0463 | ADM p-norm fast-path split and VIF scalar-fallback malloc hoist | Accepted | perf, adm, vif, simd, cpu, fork-local |
| ADR-0464 | CAMBI CUDA spatial-mask shared-memory tile | Accepted | cuda, gpu, cambi, performance, kernel, fork-local |
| ADR-0466 | mkdocs strict-mode pre-push hook | Accepted | docs, ci, git, hooks, mkdocs |
| ADR-0467 | SSIMULACRA2 AVX-512 + NEON IIR blur / picture_to_linear_rgb ULP audit — clean close | Accepted | simd, ssimulacra2, audit |
| ADR-0468 | HIP float_adm real kernel (ninth HIP consumer) | Accepted | hip, build, feature-extractor |
| ADR-0469 | float_psnr HIP twin — wire enable_chroma option | Accepted | hip, psnr, option-parity |
| ADR-0470 | Disk-Persistent VkPipelineCache for Vulkan Feature Extractors | Accepted | vulkan, perf, build |
| ADR-0471 | Add enable_chroma to integer_psnr_hip (chroma parity with CUDA/SYCL/Vulkan twins) | Accepted | hip, psnr, option-parity, chroma, fork-local |
| ADR-0480 | Bootstrap Score Name-Builder Deduplication | Accepted | refactor, predict, libvmaf |
| ADR-0481 | ADM p-norm Parameter Hardcoded at 3.0 — Deferral Decision | Accepted | adm, predict, ai, testing |
| ADR-0482 | Expand vmaf_pre FFmpeg filter device strings to match full VmafDnnDevice enum | Accepted | ffmpeg, ai, build |
| ADR-0483 | Extract shared vmaf_gpu_dispatch_parse_env tokenizer into gpu_dispatch_parse.h; removes verbatim triplicate across CUDA/SYCL/Vulkan dispatch_strategy TUs | Accepted | cuda, sycl, vulkan, refactor, dedup |
| ADR-0484 | Extend kernel-scaffolding.md with HIP and Metal lifecycle contract | Accepted | docs, hip, metal, gpu, fork-local |
| ADR-0485 | Extract VMAF_LIFECYCLE_ZERO macro to eliminate struct-init duplication across HIP and Metal kernel templates | Accepted | cuda, framework, lint, build |
| ADR-0486 | Codify the three-function GPU backend context-API contract in docs | Accepted | docs, gpu, hip, metal, vulkan, cuda, api, fork-local |
| ADR-0487 | Wire adm_min_val option into integer_adm GPU backends | Accepted | cuda, sycl, vulkan, adm, parity |
| ADR-0488 | Shared once-snapshot helper for GPU dispatch env variables | Accepted | gpu, cuda, vulkan, sycl, dispatch, threading, refactor, fork-local |
| ADR-0489 | CAMBI SYCL — Replace GPU-to-GPU q.wait() Calls with Event Chains (SY-1) | Accepted | sycl, gpu, cambi, performance, fork-local |
| ADR-0490 | float_ms_ssim Metal port | Accepted | metal, ms-ssim, float, apple-silicon, fork-local |
| ADR-0491 | Add dedicated docs/metrics/motion.md reference page | Accepted | docs, metrics, motion, fork-local |
| ADR-0492 | Promote Vulkan VIF g/sv_sq Computation to double Precision | Superseded | vulkan, vif, gpu-parity, precision |
| ADR-0493 | Test YUV fixtures must be md5-verified, not just present-by-name | Accepted | testing, ci, fixtures, golden-data |
| ADR-0494 | Restore the non-golden Python test suite to green | Accepted | testing, ci, python, regression-recovery |
| ADR-0495 | MCP server probe-driven bug-fix cluster (2026-05-17) | Accepted | mcp, ai, testing, regression-recovery |
| ADR-0496 | Default to the vmaf-dev-mcp container for all vmaf / vmaf-tune / ai / MCP work | Accepted | tooling, container, dev-experience, project-rule, fork-local |
| ADR-0497 | vmaf-tune BBB end-to-end bug cluster (compare / ladder / report) | Accepted | vmaf-tune, cli, bugfix, docs |
| ADR-0498 | vmaf-tune BBB end-to-end v2 bug cluster + explicit-backend semantics | Accepted | vmaf-tune, cli, libvmaf, bugfix, docs, container |
| ADR-0499 | vmaf-tune ladder must decode container/Y4M references before scoring | Accepted | vmaf-tune, ladder, corpus, ffmpeg, docs |
| ADR-0500 | VIF log2 LUT Shrink and Gaussian Filter Cache | Accepted | simd, perf, integer-vif, float-vif |
| ADR-0501 | vmaf-tune ladder cross-resolution scoring + report degraded flag | Accepted | vmaf-tune, ladder, corpus, report, vmaf-cli, docs |
| ADR-0502 | ADM decouple gather prefetch (Approach B) | Accepted | simd, perf, adm, avx512, fork-local |
| ADR-0503 | vif_subsample_rd_8_avx512 Loop Fission to Reduce ZMM Register Spill | Accepted | simd, performance, avx512, vif |
| ADR-0504 | AVX-512F port of float separable convolution scanlines | Accepted | simd, performance, build |
| ADR-0505 | vmaf-tune ladder container-source encode + full per-CRF sample cloud | Accepted | vmaf-tune, ladder, corpus, encode, vmaf-cli, docs |
| ADR-0506 | vmaf-tune ladder duration clipping, raw-YUV cross-res decode, CLI exit code | Accepted | vmaf-tune, ladder, corpus, encode, cli, docs |
| ADR-0508 | vmaf-tune ladder pass-1 stats argv honours --duration | Accepted | vmaf-tune, ladder, encode, bugfix |
| ADR-0509 | vmaf-tune compare auto-probes container-source framerate / duration | Accepted | vmaf-tune, compare, bisect, encode, vmaf-cli |
| ADR-0510 | CHUG re-extract VMAF-alignment fix — FR-corpus guard on the FR-from-NR extractor | Accepted | ai, corpus, chug, k150k, extractor, training-data, regression-guard |
| ADR-0511 | MCP backend probe, default allowlist, and vmaf-tune ladder --score-backend (2026-05-18) | Accepted | mcp, vmaf-tune, ai, dx, bugfix |
| ADR-0512 | Vulkan VIF Two-Variant Compute Shader (fp32 Auto-Fallback) | Accepted | vulkan, vif, gpu-parity, precision, compatibility |
| ADR-0513 | vmaf-tune tune-per-shot exposes --scene-threshold + --max-shot-duration; report renders 1-shot timeline | Accepted | vmaf-tune, per-shot, report, ux, bugfix, fork-local |
| ADR-0514 | dev-MCP container exposes every host GPU backend (CUDA + SYCL + Vulkan + HIP): adds ${ONEAPI_ROOT}/tcm/latest/lib to LD_LIBRARY_PATH so the level-zero UR adapter resolves libhwloc.so.15 (closes SYCL "No platforms found" on Arc); clears VK_ICD_FILENAMES (was pinned to a non-existent lvp_icd.x86_64.json that hid the host-mounted nvidia_icd.json + mesa intel/radeon ICDs); bind-mounts /dev/dri/by-path so the Intel compute-runtime can resolve Arc via its udev pci symlink; adds a build-time backend probe that scans for "built without X support" regressions (the precise failure mode where the running image silently shipped without -Denable_hip=true). Closes finding 8 of SESSION_FINDINGS_v9_GPU_PROBE.md; pairs with ADR-0492 fp64 relax for full Vulkan parity on Arc. | Accepted | container, dev-experience, gpu, sycl, vulkan, hip, cuda, fork-local |
| ADR-0515 | Portable temp-path setup for test_public_api_score on MinGW64 (2026-05-18): the test_vmaf_write_output case hardcoded /tmp/vmaf_test_output_XXXXXX and called mkstemp(3) on it. MSYS2/MinGW64 inside the GitHub Actions windows-latest runner does not expose a usable /tmp from the MINGW64 shell, so mkstemp failed with ENOENT and the Build — Windows MinGW64 (CPU) job was perpetually red on master. Fix: extract a make_temp_output_path() helper that uses GetTempPathA() + a <pid>-suffixed filename on #ifdef _WIN32 (mirroring the precedent in core/test/dnn/test_model_loader.c::test_sidecar_parses) and keeps mkstemp on POSIX. unlink → remove for Win32 portability. Conservative, test-only scope. | Accepted | ci, build, windows, mingw, test, fork-local, bugfix |
| ADR-0516 | vmaf-tune compare multi-target rate-quality sweep (schema v2) (2026-05-18) | Accepted | vmaf-tune, ux, dx, schema-evolution |
| ADR-0517 | Repair MCP run_benchmark tool — drop per-call args, inject VMAF_BIN, guard set -u in bench_all.sh | Accepted | mcp, bugfix, fork-local, benchmark |
| ADR-0518 | Tiny-model loader accepts external-data and feature-vector ONNX | Accepted | ai, dnn, loader, bug-fix |
| ADR-0519 | Implement vmaf_hip_import_state to unblock --backend hip | Accepted | hip, backend, libvmaf, gpu |
| ADR-0520 | Wire vmaf --no-reference through to the scoring path (2026-05-18): the flag was parsed into CLISettings::no_reference but never read; the unconditional if (!settings->path_ref) gate always tripped with Reference .y4m or .yuv (-r/--reference) is required. Fix: gate the reference-required check on !no_reference and require --tiny-model when set; in vmaf.c skip the ref open and feed the distorted file twice (two video_input handles backed by the same source) so vmaf_read_pictures sees a non-null picture pair while the rank-4 DNN dispatch sees the distorted frame. Refuses rank-2 feature-vector tiny models (they consume FR feature columns); auto-suppresses the built-in VMAF SVM default that would otherwise require a real reference. Closes .workingdir/bbb_reports/E2E_TEST_MATRIX_v9.md Finding 8. | Accepted | cli, ai, dnn, docs, fork-local, bugfix |
| ADR-0521 | MSVC portability gating — vif_avx512.c noinline/noclone + yuv_input.c S_ISREG/fstat | Accepted | ci, build, windows, msvc, simd, tools, portability, fork-local, bugfix |
| ADR-0522 | --tiny-codec / --tiny-preset / --tiny-crf populate codec one-hot block | Accepted | cli, ai, dnn, tiny-model |
| ADR-0523 | Register vmaf_fex_integer_motion_hip in the extractor list | Accepted | hip, gpu, feature-extractor, bugfix, fork-local |
| ADR-0524 | Tiny-model loader accepts ONNX models with a symbolic batch dim (2026-05-18): vmaf_ctx_dnn_attach rejected model/tiny/nr_metric_v1.onnx (input shape ['batch', 1, 224, 224]) because the gate required in_shape[0] == 1; ORT surfaces symbolic dims as -1. Surfaced by the --no-reference agent (PR #1280 / ADR-0520) — every shipped NR tiny model uses the PyTorch/ONNX dynamic-batch idiom. Fix: fold in_shape[0] ∈ {1, -1} to batch=1 in both dnn_attach_nchw and dnn_attach_feature_vector (and the optional rank-2 second input); fixed batch > 1 stays rejected (no batched-inference scheduler exists); symbolic H/W stays rejected with a sharper diagnostic. Per-frame inference always emits batch=1 already, so no runtime change needed. | Accepted | ai, dnn, loader, bug-fix |
| ADR-0525 | Extract run_cmd subprocess helper into aiutils | Accepted | ai, refactor, fork-local |
| ADR-0526 | Add enable_lcs and enable_chroma to float_ms_ssim SYCL twin | Accepted | sycl, parity, options |
| ADR-0527 | Accept pre-extracted BVI-DVC YUVs via --bvi-dir | Accepted | ai, corpus, cli, docs |
| ADR-0528 | test_cli_parse_long_only_args::test_threads_invalid_optarg_does_not_assert regression close (2026-05-18): the test fork-harness allocated a 4 KiB stderr buffer and stopped reading once full, so once usage()'s help text grew past 4 KiB the child either SIGPIPE'd (signal 13) or SIGABRT'd (signal 6 via aborting stdio) — the test's WIFEXITED check rejected the run. Fix: refactor the parent into a read_head_drain_tail() helper that captures the first 511 bytes (enough for the "Invalid argument …" needle) then drains the remainder so the writer never blocks; extract child-side dup2 + cli_parse + _exit into child_parse_via_pipe(). Defence-in-depth in error(): replace assert(long_opts[n].name) with an explicit usage() fallback (banned macro per principles.md §1.2 rule 30; -DNDEBUG would silently no-op anyway) and sprintf → snprintf on the 256-byte optname scratch buffer. Product-code path was already correct (ADR-0316 / ADR-0438 long-only enum fix); this is the test harness catching up. | Accepted | cli, test, regression, fork-local, bugfix |
| ADR-0529 | Replace /dev/dri/by-path bind with whole /dev/dri bind in dev container | Accepted | build, ci, cuda, sycl, agents |
| ADR-0530 | Promote VMAF_FEATURE_EXTRACTOR_HIP on integer_motion_hip; add VMAF_PICTURE_BUFFER_TYPE_HIP_DEVICE enum entry; wire compute_fex_flags() HIP slot; add CPU-twin fallback pass in _by_feature_name; HIP flush_context_serial drain (gpu_pending final-frame collect); route HIP collect/flush writes through feature_name_dict. End-to-end: --backend hip --feature integer_motion now actually dispatches calculate_motion_score_kernel_8bpc (48 HSACO launches per 48-frame clip; HIP vmaf=76.7125 vs CPU vmaf=76.6678, within places=4). Extends ADR-0519 (HIP vmaf_hip_import_state impl) + ADR-0523 (PR #1283, extractor registration + meson source-list entry). | Accepted | hip, gpu, dispatch, feature-extractor, fork-local |
| ADR-0531 | Per-shot plan emits bitrate_kbps + chart shows last shot | Accepted | vmaf-tune, per-shot, report, chart |
| ADR-0532 | tune-per-shot tolerates read-only CWD when writing segments | Accepted | vmaf-tune, cli, robustness, container |
| ADR-0533 | Full HIP feature-extractor registration sweep | Accepted | hip, gpu, feature-extractor, bugfix, fork-local |
| ADR-0534 | vmaf-tune compare emits + renders rate-quality curve from per-iteration bisect samples | Accepted | vmaf-tune, compare, report, chart, ux |
| ADR-0535 | Atomic ADR Number Allocator with Cross-Branch Claim | Accepted | ci, docs, git, agents, tooling |
| ADR-0536 | Per-shot predicate threads bitrate_kbps through bisect sidecar (PR #1290 follow-up) | Accepted | vmaf-tune, per-shot, bugfix |
| ADR-0537 | Fix the vmaf_fex_integer_vif_hip GPU memory access fault that ADR-0530 un-flagged as a follow-up: upload the static 4×18 vif_filter1d_table to a device buffer at init (the pre-fix kernel was being handed a host pointer); correct filter half-widths from {9,5,3,0} (parsed from the kernel-name suffix) to {8,4,2,1} (= vif_filter1d_width[scale]/2) so the loop stops reading past the 18-entry table; add the rd-filter downsample-write path so scales 1–3 read the half-resolution planes the previous horizontal pass wrote (the pre-fix kernel left them uninitialised); allocate per-frame hipMemcpy2DAsync staging buffers so the kernel reads device memory, not the host VmafPicture->data[0]. Re-enable VMAF_FEATURE_EXTRACTOR_HIP on vmaf_fex_integer_vif_hip. End-to-end: --backend hip --feature vif_hip produces VMAF VIF scores on the Netflix golden pair (0.5047 / 0.8764 / 0.9365 / 0.9634 vs CPU 0.5057 / 0.8791 / 0.9379 / 0.9643, within places=3; places=4 follow-up tracked). Also bundles three adjacent fixes: missing HSACO entries for the ADR-0533 extractor sweep, weak-stub HSACO blobs for the four ADM kernels that don't compile standalone, and -I build_dir/feature/hip/hip additions to the hipcc command. Closes the ADR-0530 follow-up. | Accepted | hip, gpu, kernel, vif, bug-fix, fork-local |
| ADR-0538 | vmaf-tune compare ships premium-archival --target-vmafs default + bisect reaches VMAF 95+ | Accepted | vmaf-tune, compare, bisect, defaults, premium-archival |
| ADR-0539 | integer ADM HIP kernels — real implementation replacing weak HSACO stubs | Accepted | hip, adm, kernels, rocm, fork-local |
| ADR-0540 | dev-MCP container FFmpeg ships AV1 (SVT/aom) + VVenC + hardware encoders (NVENC, oneVPL/QSV, AMF): adds libsvtav1-dev / libaom-dev / libvpl-dev (apt), builds Fraunhofer VVenC + AMD AMF headers from source, and extends the FFmpeg configure line with --enable-libsvtav1 --enable-libaom --enable-libvvenc --enable-nvenc --enable-cuda-nvcc --enable-libnpp --enable-libvpl --enable-amf. Adds a build-time encoder probe that locks down the compile-in promise (catches silent --enable-* drift the same way ADR-0514 catches backend-flag drift). Unblocks vmaf-tune compare sweeps across the full 15-encoder matrix on hosts with NVIDIA + Intel + AMD GPUs. | Accepted | container, dev-experience, ffmpeg, codecs, av1, vvc, nvenc, qsv, amf, fork-local |
| ADR-0541 | Pin dev-MCP container Intel NEO + ROCm runtimes to versions matching the host kernel | Accepted | build, dev, sycl, hip, container, gpu |
| ADR-0542 | dev-MCP container closes the last GPU plumbing gaps: entrypoint dynamically rewrites VK_DRIVER_FILES to exclude lavapipe whenever any real Vulkan ICD is present (NVIDIA via Container Toolkit / Intel + AMD via mesa); pins HSA_OVERRIDE_GFX_VERSION=10.3.0 + HSA_ENABLE_SDMA=0 + ROCR_VISIBLE_DEVICES=0 in docker-compose.yml so AMD gfx1036 iGPU passes hsa_init() (ROCm 6.x allowlist gate); adds intel-media-va-driver-non-free + mesa-va-drivers to stage 1 for VA-API codec driver coverage; documents the NVIDIA_DRIVER_CAPABILITIES=…,graphics,… requirement for NVIDIA Vulkan ICD bind-mount. All five --backend {cpu, cuda, sycl, vulkan, hip} lanes run on real hardware (no silent CPU/lavapipe fallback). | Accepted | container, dev-experience, vulkan, sycl, hip, rocm, fork-local |
| ADR-0543 | ADR-0498 enforcement hardening — distinct exit code + structured JSON error + per-feature gate | Accepted | cli, libvmaf, bugfix, backend, exit-code, extends-adr-0498 |
| ADR-0544 | Deduplicate feature_extractor_list[] (2026-05-18): the HAVE_VULKAN block held 67 entries instead of 18 — vmaf_fex_psnr_hvs_vulkan and vmaf_fex_float_ms_ssim_vulkan each registered 11 times, seven other Vulkan twins 6 times each; the HAVE_SYCL block held 17 entries instead of 11 (six twins registered twice). The first-match vmaf_get_feature_extractor_by_name() hid the bug from CLI users, but the ctx-pool's iterator path allocated one pool entry per duplicate, driving init/extract/flush 2–11x per picture on the affected GPU twins (plausible root cause of the v9 CHUG "VMAF=99 across all bitladders" anomaly previously attributed to operational misuse in PR #1270). Fix removes 61 duplicate entries and adds vmaf_feature_extractor_list_audit() — called once from vmaf_init() and exercised by a new meson test case so the bug class is detected at startup on every future build. | Accepted | bug, dispatch, vulkan, sycl, registry, fork-local |
| ADR-0545 | Wire or delete dead Vulkan/Metal feature-extractor source files | Accepted | vulkan, metal, build, housekeeping, dead-code |
| ADR-0546 | Audit bundle — Vulkan motion dispatch wiring, saliency hard-fail, model-card placeholder | Accepted | vulkan, vmaf-tune, ai, build, docs |
| ADR-0547 | VMAF_ | Accepted | ai, scripts, container, hygiene, fork-local |
| ADR-0548 | vmaf-tune tune-per-shot accepts container sources directly; compare gains --no-bisect mode | Accepted | vmaf-tune, cli, ergonomics, compare, per-shot |
| ADR-0549 | Audit cleanup bundle 2 | Accepted | docs, build, container, housekeeping, fork-local |
| ADR-0550 | Auto-resize input plane to NR tiny-model dims + --tiny-resize flag | Accepted | ai, cli, dnn, api |
| ADR-0551 | VCQ-223 LocalExplainer CI timeout — root cause and fix path | Proposed | python, test, local-explainer, performance, bugfix, fork-local |
| ADR-0552 | Deterministic wavefront reduction for integer_vif_hip horizontal kernels | Accepted | hip, gpu, kernel, vif, parity, correctness, fork-local |
| ADR-0556 | Python / MCP / AI silent-fallback audit fixes (2026-05-18) | Accepted | python, mcp, ai, vmaf-tune, correctness, audit |
| ADR-0559 | Feature Coverage Audit — Add speed_chroma + speed_temporal to Extraction Scripts (HDR-model prep) | Accepted | ai, feature-extraction, speed, hdr, corpus, fork-local |
| ADR-0561 | Unknown | Accepted | |
| ADR-0562 | VCQ-223 LocalExplainer hang fix — cap neighbor_samples in test runner | Accepted | python, test, local-explainer, performance, bugfix, fork-local |
| ADR-0563 | HIP extractor audit — verification of 9 remaining scaffold claims | Accepted | hip, gpu, audit, parity, fork-local |
| ADR-0564 | Real integer_ssim GPU kernels (CUDA, HIP, SYCL) — replace silent float_ssim substitution | Accepted | cuda, hip, sycl, ssim, kernel, correctness, gpu, fork-local |
| ADR-0565 | Continuous feature-mix evaluation pipeline (predictor-bench): a YAML-declared eval grid of (target_model × corpus × codec × display × tuning_preset) cells evaluated by greedy forward feature selection + LOSO CV + 95 % bootstrap CIs on the marginal PLCC/SROCC delta vs the Netflix SVM baseline, stored in DuckDB and rendered as a Markdown report. Implemented as vmaf-tune predictor-bench run/report/show/diff. Three-phase plan: Phase 1 local MVP, Phase 2 SHAP + manual CI trigger, Phase 3 nightly cron + PR regression gate. Motivated by the upcoming Netflix HDR VMAF model, CHUG-HDR corpus, and continuous codec-adapter evolution. | Proposed | ai, vmaf-tune, predictor, eval, corpus, fork-local, ci |
| ADR-0566 | Unknown | Accepted | |
| ADR-0567 | Real on-device GPU kernels for speed_chroma and speed_temporal (4 backends) | Accepted | gpu, hip, cuda, sycl, vulkan, speed, fork-local |
| ADR-0568 | Default sycl_icpx_aot_targets to the full Intel GPU architecture list to avoid SYCL JIT cold-start costs on Arc/iGPU builds. | Accepted | sycl, build, meson, gpu, intel, aot, fork-local |
| ADR-0569 | Bundle low-risk SDK/tool version bumps from the 2026-05-18 audit, including ORT, AMF, VVenC, formatters, ruff, cosign, and libsvm bounds. | Accepted | build, container, ci, deps, pre-commit, fork-local |
| ADR-0573 | Dev-mcp container — ubuntu:26.04 + CUDA 13.2 + hipcc + ocloc | Accepted | build, ci, cuda, container, dev |
| ADR-0574 | CUDA Twins for HDR-Model Features — Phase 1 (aim, adm3) | Accepted | cuda, feature, hdr, adm |
| ADR-0575 | Fix yuv_input.c stat compat — include-order and _MSC_VER guard | Accepted | ci, build, windows, msvc, mingw, tools, portability, bugfix, fork-local |
| ADR-0576 | ffmpeg-patches n8.1.1 full-feature-exposure sync | Accepted | ffmpeg, build, ci, cuda, sycl, hip, vulkan, metal |
| ADR-0577 | vmaf-tune bisect decode concurrency cap and aggressive workdir cleanup | Accepted | vmaf-tune, compare, bisect, disk-space, concurrency, fork-local |
| ADR-0578 | Hoist VIF scratch buffer from per-frame allocation to VifState | Accepted | perf, vif, cpu, build |
| ADR-0579 | vmaf-tune auto --execute — Phase F real encode/score execution mode | Accepted | vmaf-tune, phase-f, encode, score, cli, fork-local |
| ADR-0580 | float_ansnr enable_chroma option | Accepted | feature-extractor, metrics |
| ADR-0581 | Add enable_chroma option to integer_vif | Accepted | feature, vif, chroma |
| ADR-0582 | MS-SSIM enable_db and clip_db option parity on CUDA and SYCL backends; also adds enable_lcs to SYCL extractor. | Accepted | cuda, sycl, ms_ssim, option-parity, bug |
| ADR-0583 | Add enable_chroma option to the float_ms_ssim extractor | Accepted | ms-ssim, float-ms-ssim, option-parity, metrics, correctness, fork-local |
| ADR-0584 | Unknown | Accepted | |
| ADR-0585 | Add enable_chroma option to psnr_hvs_vulkan | Accepted | psnr-hvs, vulkan, option-parity, metrics, fork-local |
| ADR-0586 | Introduce integer_adm_vulkan.c as canonical Vulkan integer ADM extractor | Accepted | vulkan, build, feature-extractor |
| ADR-0587 | Real Metal Compute Kernels for CAMBI | Accepted | metal, cambi, gpu, build |
| ADR-0588 | vmaf-tune executor — per-shot and saliency execution modes | Accepted | vmaf-tune, executor, per-shot, saliency, phase-f, fork-local |
| ADR-0589 | Metal float_ssim option parity — enable_lcs, enable_db, clip_db, scale | Accepted | metal, ssim, option-parity, apple-silicon, kernel, fork-local |
| ADR-0590 | Wire enable_db / clip_db into the CUDA and SYCL MS-SSIM twins | Accepted | cuda, sycl, ms-ssim, option-parity, bug, fork-local |
| ADR-0591 | Restore rfe_hw_flags per-frame bitmask cache after PR #1067 clobber | Accepted | cuda, perf, bug, libvmaf |
| ADR-0592 | Remove float_vif_score weak HSACO stub now that real HIP kernel ships | Accepted | hip, build, cleanup |
| ADR-0593 | HIP integer_moment kernel — register real HSACO blob alongside psnr / psnr_hvs | Accepted | hip, gpu, parity, build |
| ADR-0594 | Per-kernel hip_cu_extra_flags dispatch — disable FMA contraction for ssimulacra2_blur HIP HSACO | Accepted | hip, build, ssimulacra2, numerics |
| ADR-0595 | Real two-pass argv for all 14 codec adapters | Accepted | vmaf-tune, codec, encode, ffmpeg |
| ADR-0596 | Delete orphan and duplicate HIP/CUDA translation units | Accepted | hip, cuda, build, cleanup |
| ADR-0597 | integer_vif is luma-only across every backend; CUDA enable_chroma is a documented no-op | Accepted | cuda, vif, parity, docs, audit-disposition |
| ADR-0598 | vmaf-tune workdir relocation — disk-space preflight + VMAFTUNE_WORKDIR env var | Accepted | vmaf-tune, bugfix, cli, container, workspace |
| ADR-0599 | Cross-Backend Parity Audit — Full Extractor Matrix (2026-05-18) | Accepted | cuda, sycl, vulkan, hip, ci, parity, audit |
| ADR-0600 | Port upstream USE_DIRECT_READ zero-copy input path (Netflix/vmaf@30a6e2a8d) | Accepted | upstream-port, performance, tools, cli, build |
| ADR-0601 | Fix three BBB v14 hardware-encoder bugs: (V14-A) raise probe dummy-encode resolution from 64×64 to 320×240 so NVENC / QSV do not reject it with EINVAL; (V14-B) inject QSV VA-API device-init chain (-init_hw_device vaapi=va:… -init_hw_device qsv=qsv_dev@va -filter_hw_device va + -vf format=nv12,hwupload=extra_hw_frames=64) into both probe and encode paths, exposed via --vaapi-device flag + VMAFTUNE_VAAPI_DEVICE env var; (V14-C) document gfx1036 (AMD Raphael/Phoenix APU) decoder-only limitation in _amf_common.py — no VCE encode block, AMF_NOT_SUPPORTED is a hardware ceiling not a software bug. | Accepted | vmaf-tune, compare, qsv, amf, nvenc, hardware, probe, bugfix, fork-local |
| ADR-0602 | macOS SIGSEGV in vmaf_write_output — pic_cnt underflow + missing vmaf NULL guard | Accepted | bugfix, macos, output, portability, correctness, fork-local |
| ADR-0603 | Ubuntu 26.04 (Resolute Raccoon) fallout fixes — CUDA 13.2, Python 3.14, apt renames | Accepted | build, cuda, ci, python, supply-chain |
| ADR-0604 | Add Renovate customManager for ROCm apt-repo tracking | Accepted | build, container, supply-chain, hip, renovate |
| ADR-0605 | Renovate customManagers for all dev/Containerfile pinned dependencies | Accepted | build, container, supply-chain, renovate, cuda, sycl, hip, intel, onnx |
| ADR-0606 | macOS SIGSEGV deep-fix in output.c writers (PR #1403 follow-up) | Accepted | bugfix, macos, output, portability, correctness, fork-local |
| ADR-0607 | vmaf-tune compare: decode reference YUV once for the entire run | Accepted | vmaf-tune, performance, disk-space, compare |
| ADR-0608 | Commit .zed/ project configuration (settings, tasks, debug) for Zed editor parity with .vscode/; adds clangd LSP, pyright+ruff, shfmt, vmaf-mcp context_servers entry, CodeLLDB debug configs, and task shortcuts mirroring all Makefile targets. docs/development/ide-setup.md updated with Zed section. .zed/local/ added to .gitignore. | Accepted | dev, ide, docs, build, workspace |
| ADR-0612 | Tiny-AI training scaffold on the Netflix VMAF corpus (2026-05-19 iteration): formalises architecture alternatives table (MLP depth/width, distillation-vs-scratch, model size, evaluation scope), --data-root CLI contract, and companion research digest 0607. Scaffold-only; training deferred to follow-up PR. | Proposed | ai, docs, workspace, mcp |
| ADR-0613 | Dynamic Optimizer — Joint Shot-Boundary + CRF Co-Optimisation | Proposed | ai, planning, vmaf-tune |
| ADR-0614 | Per-Shot ABR Rendition Selection | Proposed | ai, planning, vmaf-tune |
| ADR-0615 | Fast NR Pre-Scoring for CRF Bisect Acceleration | Proposed | ai, planning, vmaf-tune |
| ADR-0616 | VMAF NEG Integration into vmaf-tune | Proposed | ai, planning, vmaf-tune, docs |
| ADR-0617 | Cross-Shot Complexity Weighting and Title-Level Quality Constraints | Proposed | ai, planning, vmaf-tune |
| ADR-0618 | Content-Aware Classifier for Encoder Routing | Proposed | ai, planning, vmaf-tune, dnn |
| ADR-0620 | Scaffold audit P0 — three silent-correctness fixes | Accepted | python, correctness, bugfix, fork-local |
| ADR-0621 | Scaffold audit P3 cleanup (2026-05-19): six housekeeping items — (1) repo-root detection in permutation_importance.py; (2) .workingdir2 → .corpus default paths in 13 ai/scripts/*.py; (3) precise @unittest.skip rationales for 4 stale test skips; (4) inline places=1 justification comments in deterministic quality-runner tests; (5) CI-only smoke fixture documentation in docs/ai/model-registry.md + lpips_sq_v1 mismatch note; (6) semgrepignore entry for vendored cJSON upstream markers. State drift: add T-VULKAN-MOTION-LAVAPIPE-INIT Open row in docs/state.md; close T-PYTHON-PERMUTATION-IMPORTANCE-HARDCODED-PATH. | Accepted | hygiene, ai, python, test, docs, ci, fork-local |
| ADR-0622 | VMAF NEG integration for vmaf-tune commands: model resolution, CLI threading, and --neg support across recommendation/tuning surfaces. | Accepted | vmaf-tune, docs |
| ADR-0623 | Scaffold audit P2: nine half-finished implementation fixes — expose adm_p_norm on integer ADM; gate float_vif_hip auto-dispatch behind enable_float_vif_hip_autodispatch Meson option; file T-SYCL-CLANG-TIDY-DISABLED / T-DOCKER-SMOKE / T-VULKAN-MOTION-LAVAPIPE-INIT / T-GPU-COVERAGE-STABLE-WEEKS / T-INTEGER-ADM-P-NORM-SIMD-GAP in state.md; fix stale .workingdir2/ → .corpus/ path in konvid_mos_head_v1.md; add forward-declaration banner to u2netp_mirror_card.md; rename lpips_sq.md → lpips_sq_v1.md to match registry id. | Accepted | ci, build, docs, hip, adm, state |
| ADR-0624 | Fast NR pre-scoring implementation for tune-per-shot and compare, including NR proxy backend, early-elimination telemetry, and bisect integration. | Accepted | vmaf-tune, bisect, ai, performance |
| ADR-0626 | SSH-into-runner debug session on macOS CI failure via tmate | Accepted | ci, macos, debug, fork-local |
| ADR-0628 | Remote-aware ADR number allocator — cross-worktree collision prevention | Accepted | adr, tooling, ci, governance, agents, fork-local |
| ADR-0634 | MCP P0 capability audit fixes: spec-correct isError, backend probe, version reporting, and encoded-video scoring tools. | Accepted | mcp, bugfix, spec-correctness, fork-local |
| ADR-0635 | Unknown | Accepted | |
| ADR-0637 | Fix 5 master CI failures: (1) repair corrupted test_list_tools_returns_expected_names function in MCP smoke test (invalid syntax from PR #1417/#1418 squash-merge); (2) lower coverage floor from 40% to 37% to match measured 37.7% after 2,200 LOC added by PRs #1417–#1425; (3) bump netflix-golden timeout 25→45 min; (4) bump vulkan-vif-cross-backend timeout 25→60 min; (5) bump vulkan-parity-matrix-gate timeout 30→60 min. | Accepted | ci, mcp, coverage, vulkan, timeout, fork-local |
| ADR-0638 | MCP P1 surface — vmaf-tune integration, list_extractors, describe_model, progress notifications | Accepted | mcp, vmaf-tune, api, docs |
| ADR-0639 | Scaffold-audit P1 — backend precheck, HIP picture, mobilesal bpc, DNN multi-output | Accepted | python, hip, ai, dnn, docs, vmaf-tune |
| ADR-0640 | Tiny-AI training on the original Netflix VMAF training corpus (2026-05-20 scaffold iteration) | Proposed | ai, docs, workspace, mcp |
| ADR-0641 | Harden dev-container encoder probes and vmaf-tune compare reports: auto-select Intel VA-API render nodes for QSV, build pinned intel/vpl-gpu-rt into the dev image, surface actionable AMF/QSV runtime errors, align compose health with the stdio MCP entrypoint, emit compare --format html|both profile-card reports directly, reduce default CPU compare encoders to libx265,libsvtav1, and lower raw shared-reference mid-run disk headroom to 1.1×. | Accepted | dev-container, vmaf-tune, ffmpeg, qsv, amf, reports |
| ADR-0642 | AI refresh defaults use the current fork CPU vmaf binary and real FULL_FEATURES regeneration paths for KoNViD-1k, UGC, BVI-DVC, and aggregate parquet rebuilds. | Accepted | ai, training-data, full-features, konvid, ugc, bvi-dvc, fork-local |
| ADR-0643 | vmaf-tune reports embed a versioned encoder profile that humans can inspect and vmaf-tune encode-profile can consume to run one selected FFmpeg encode; FFmpeg patch 0015 adds the advisory -vmaf-profile hand-off. | Accepted | vmaf-tune, ffmpeg, reports, cli, encoder-profile |
| ADR-0644 | Add vmaf-tune compare codec runtime variants: ADAPTER@VARIANT display tokens still route through the base adapter, --encoder-ffmpeg-bin TOKEN=PATH binds token-local FFmpeg binaries, and compare JSON/CSV rows now expose adapter, runtime_variant, and ffmpeg_bin provenance metadata. | Accepted | vmaf-tune, ffmpeg, cli, docs |
| ADR-0645 | Thread integer ADM p-norm through SIMD callbacks | Accepted | simd, feature-extractor, testing |
| ADR-0646 | Route attached DNN multi-output tensors | Accepted | 2026-05-20 |
| ADR-0647 | Refresh fr_regressor_v1 from the 2026-05-20 Netflix feature table | Accepted | ai, tiny-ai, model-refresh, netflix-public |
| ADR-0648 | CHUG HDR MOS Trainer Entry Point | Proposed | |
| ADR-0649 | CHUG HDR Wide MOS Feature Schema | Proposed | |
| ADR-0650 | Add a signal-mix audit CLI | Accepted | 2026-05-20 |
| ADR-0651 | Preserve normalized ffprobe HDR/display metadata on every CHUG feature row | Accepted | ai, chug, hdr, metadata, training |
| ADR-0652 | Add cheap decoded-luma blur/noise/grain primitives to CHUG feature rows | Accepted | ai, chug, hdr, features, training |
| ADR-0653 | CHUG Display Profile Training | Proposed | |
| ADR-0654 | Predictor Preserves Saliency Signals | Accepted | |
| ADR-0655 | Saliency feature materializer for existing AI corpus tables: enrich JSONL/parquet rows with saliency_mean, saliency_var, and row-level status before predictor / MOS-head retrains consume the signal. | Accepted | ai, saliency, training-data, docs |
| ADR-0656 | External-bench fork wrappers now emit registry competitor keys (fork-fr-regressor, fork-nr-metric) in summary.competitor so compare.py can validate and aggregate the fork's own FR/NR competitors instead of skipping them as schema mismatches. Shell-wrapper smoke tests use a fake vmaf-tune binary to pin the contract without installing external competitors. | Accepted | ai, testing, tooling, fork-local |
| ADR-0657 | Second-Opinion Feature Materializer | Accepted | ai, signal-mix, mos, external-bench |
| ADR-0658 | Project modernization audit | Accepted | 2026-05-20 |
| ADR-0659 | Modernization audit false-positive filter | Accepted | 2026-05-20 |
| ADR-0660 | Tiny-AI extractors check DNN availability before model paths | Accepted | 2026-05-20 |
| ADR-0661 | AI training sidecars record shared run provenance. MOS-head manifests include the user-facing entrypoint, normalized CLI arguments, named input/output paths, file hashes where available, and shared-trainer identity for CHUG wrapper runs. | Accepted | ai, tooling, manifests, training |
| ADR-0662 | Vulkan motion lavapipe parity. Routes automatic Vulkan motion dispatch and CI parity through the canonical integer_motion_vulkan twin while preserving motion_vulkan as an explicit compatibility name; restores integer_motion_vulkan's CPU/CUDA-compatible debug=true default; corrects CUDA / SYCL / Vulkan motion_v2 mirror padding to the CPU reflect-101 literal (2*size - idx - 2); and re-enables motion + motion_v2 in the lavapipe parity matrix. | Accepted | vulkan, cuda, sycl, ci, feature-extractor, numerical-correctness |
| ADR-0663 | Adds an explicit MOS label materializer for feature tables and changes real MOS-head training to fail instead of silently synthesizing when no labelled rows load. | Accepted | ai, mos, training, corpus |
| ADR-0664 | Install CUDA 13.2.0 directly in the Windows MSVC + CUDA CI leg after the wrapper action failed before setup | Accepted | ci, build, cuda, windows, github-actions |
| ADR-0665 | Fast-NR calibration sidecar writes require a minimum sample count and positive NR-vs-FR PLCC gate; weak fits report but do not update tune inputs unless explicitly overridden. | Accepted | ai, vmaf-tune, calibration, fast-nr |
| ADR-0666 | vmaf-tune report renders run-specific Quick takeaways before detailed charts so profile cards state the best row, coverage gaps, ladder span, and per-shot CRF spread for non-expert readers. | Accepted | vmaf-tune, reports, ux, encoder-profile |
| ADR-0667 | vmaf-tune --score-backend auto now uses native-first GPU priority (cuda -> sycl -> hip -> vulkan -> cpu) and accepts explicit hip backed by ROCm availability probes. | Accepted | vmaf-tune, gpu, cuda, sycl, hip, vulkan, fork-local |
| ADR-0668 | AI FULL_FEATURES table builders emit replayable manifest sidecars with shared run_provenance for extraction, parquet combination, and metadata enrichment outputs. | Proposed | ai, training, provenance, parquet |
| ADR-0669 | AI corpus JSONL merge and aggregation scripts emit replayable manifest sidecars with shared run_provenance. | Proposed | ai, training, provenance, corpus |
| ADR-0670 | Legacy AI corpus/extraction scripts emit replayable manifest sidecars with shared run_provenance. | Proposed | ai, training, provenance, corpus |
| ADR-0671 | Adds the missing ai/scripts/export_u2netp_mirror.py operator bridge for the ADR-0412 U2NetP mirror scaffold. The exporter imports an audited local upstream U-2-Net checkout, loads u2netp.pth, exports ONNX opset 17 as input -> saliency_map, requires Apache-2.0 license evidence, and writes a u2netp-mirror-export-manifest-v1 sidecar with shared run_provenance. The generated ONNX remains a signed release asset, not a committed file, and does not flip the mobilesal default. | Accepted | ai, dnn, u2netp, saliency, onnx, provenance, fork-local |
| ADR-0672 | Exposes ai/scripts/materialize_saliency_features.py temporal reducer controls (mean, ema, max, motion-weighted), records saliency model / reducer / EMA metadata on newly materialized rows, and preserves unknown provenance on skipped existing saliency rows. | Accepted | ai, saliency, materializer, provenance, fork-local |
| ADR-0673 | Adds a manifest-driven batch runner for saliency feature materialization so refreshed AI tables can share defaults, per-table overrides, audits, and one provenance-backed batch report. | Accepted | ai, saliency, materializer, provenance, fork-local |
| ADR-0674 | Adds a manifest-driven batch runner for second-opinion feature materialization so refreshed AI tables can join external NR/MOS scorer sidecars with per-table audits and one provenance-backed batch report. | Accepted | ai, second-opinion, materializer, provenance, fork-local |
| ADR-0675 | Adds a manifest-driven batch runner for MOS-label materialization so refreshed AI feature tables can join subjective labels with per-table audits and one provenance-backed batch report. | Accepted | ai, mos, materializer, provenance, fork-local |
| ADR-0676 | Requires MOS corpus JSONL adapters to emit replayable manifest sidecars with run counters, effective ingest config, path evidence, and ADR-0661 provenance. | Accepted | ai, mos, corpus, provenance, fork-local |
| ADR-0677 | KoNViD-1k and YouTube-UGC fetch helpers now write deterministic ADR-0661 run-manifest sidecars before corpus conversion. fetch_konvid_1k.py defaults to <root>/fetch_manifest.json; fetch_youtube_ugc_subset.py keeps its existing stem content manifest and writes <manifest>.run-manifest.json by default. | Accepted | ai, datasets, provenance, training, fork-local |
| ADR-0678 | AI scripts now have aiutils.run_manifest.write_run_manifest() plus a Claude /ai-run-manifest workflow for standalone artifact sidecars, while stable reports may continue embedding build_run_provenance(). | Accepted | ai, provenance, docs, agents |
| ADR-0679 | CI Draft Auto-Merge Gate | Accepted | ci, github-actions, merge-train, adr, fork-local |
| ADR-0680 | AI batch scripts now share parser/raw-argv boilerplate through aiutils.cli_helpers while keeping table-specific manifest schemas local to each runner. | Accepted | ai, cli, provenance, agents |
| ADR-0681 | AI Script Bootstrap Helper | Accepted | ai, cli, provenance, agents |
| ADR-0682 | Tiny-AI Netflix corpus training scaffold — 2026-05-22 prep scope | Accepted | ai, training, fork-local, onnx, mcp, docs |
| ADR-0683 | Replace banned functions (sprintf/strcpy) in vendored MCP cJSON with snprintf/memcpy; add vendor-policy AGENTS.md | Accepted | mcp, vendored, security, c, fork-local |
| ADR-0684 | Pre-rebase worktree-drift guard — companion git hook to ADR-0332, refuses git rebase from main checkout while agent worktrees are active | Accepted | agents, ci, git-hooks, fork-local |
| ADR-0685 | Tiny-AI Netflix corpus training scaffold — 2026-05-27 prep scope | Accepted | ai, training, fork-local, onnx, mcp, docs |
| ADR-0686 | VMAFX rebrand and aggressive modernization — umbrella ADR covering rename, Phase 1–4 plan, and multi-language strategy | Proposed | meta, vmafx, rebrand, modernization |
| ADR-0687 | CHUG HDR MOS head — held-out test partition validator | Accepted | ai, training, hdr, validation, fork-local |
| ADR-0688 | HIP wave32 carry-preserving int64 reduction for VIF and motion kernels | Accepted | hip, vif, motion, numerics, rocm, fork-local |
| ADR-0689 | VMAFX CI matrix deduplication — remove redundant job axes | Accepted | ci, build, vmafx, fork-local |
| ADR-0690 | VMAFX binary and AI tool aliases (vmafx, vmafx-tune, vmafx-mcp) | Accepted | cli, build, vmafx, aliases, fork-local |
| ADR-0691 | VMAFX Phase 1C — drop legacy build paths (build-cpu, build-cuda, build-all → unified build/) | Accepted | build, meson, vmafx, phase1, fork-local |
| ADR-0692 | Bump C standard to C23 (VMAFX rebrand Phase 1D); fix test_propagate_metadata prototype mismatch; add -Wimplicit-fallthrough. | Accepted | build, c, standards, meson, fork-local, vmafx-rebrand |
| ADR-0694 | Tighten clang-tidy enforcement and confirm ASan/UBSan/TSan/MSan as required CI gates | Accepted | ci, lint, sanitizers, clang-tidy, fork-local |
| ADR-0696 | --netflix-compat flag for restoring legacy VMAF defaults | Accepted | cli, compat, vmafx, fork-local |
| ADR-0698 | VMAFX production Dockerfile — multi-arch, image signing, SBOM | Proposed | docker, build, security, vmafx |
| ADR-0699 | VMAFX Helm chart and Kubernetes manifests with 3-vendor GPU device-plugin support | Proposed | k8s, helm, gpu, vmafx |
| ADR-0700 | VMAFX repo layout: rename libvmaf/ → core/ and python/vmaf/ → compat/python-vmaf/; ABI unchanged | Accepted | build, workspace, meta, vmafx |
| ADR-0701 | vmafx-server HTTP transport + observability foundation (/healthz, /readyz, /metrics, /v1/score) | Proposed | server, http, observability, cloud-native, vmafx |
| ADR-0702 | VMAFX Phase 4 multi-language modernization: Go 1.23 workspace, Rust workspace, C++23 policy | Proposed | go, rust, cpp23, language-policy, modernization, vmafx |
| ADR-0703 | vmafx-server in Go — gRPC + HTTP/JSON, libvmaf cgo wrapper, Prometheus metrics, distroless build | Proposed | server, go, grpc, http, observability, vmafx |
| ADR-0704 | vmafx-mcp Go port: single static binary, MCP Go SDK v1.6.1, all 15 Python tools ported with byte-for-byte schema parity | Accepted | mcp, go, build, agents |
| ADR-0705 | vmafx-tune Go port Stage 1: compare subcommand as vmafx-tune-go; pkg/encoder, pkg/bisect, pkg/report | Accepted | go, vmafx-tune, language-modernization, cli, phase4, fork-local |
| ADR-0706 | Rust vmafx-sys FFI crate: bindgen-generated raw bindings + thin safe wrapper for libvmaf | Accepted | rust, bindings, ffi, build |
| ADR-0707 | TAD (Temporal Absolute Difference) — first Rust feature extractor; proves cbindgen → Meson → libvmaf.so integration | Accepted | rust, build, metrics, feature-extractor, phase4, fork-local |
| ADR-0708 | C++20 internals pilot: convert metadata_handler.c to .cpp with RAII unique_ptr linked-list teardown; establish per-file C++20 migration recipe for Wave 1–3 | Accepted | build, c++, cpp23, refactor, internals, fork-local, vmafx-rebrand |
| ADR-0709 | VMAFX Phase 4b — distributed video-quality, encoding, and ML platform: controller/node/operator, ffmpeg, rclone, eBPF | Proposed | go, k8s, operator, controller, ffmpeg, rclone, ebpf, onnx, phase4b, fork-local |
| ADR-0710 | VMAFX CI Slim-Down v2 — one build per OS, ASan/UBSan/MSan/TSan, GitHub-hosted-only runners | Accepted | ci, build, sanitizers, fork-local |
| ADR-0711 | vmafx-controller Phase 4b.1 — Go service: gRPC + HTTP, in-memory job queue, persistent node registry, FIFO scheduler | Accepted | go, controller, phase4b, grpc, http, fork-local |
| ADR-0712 | IDE config audit and refresh for multi-language post-rebrand VMAFX: clangd, gopls, rust-analyzer, Python LSP | Accepted | ide, clangd, gopls, rust-analyzer, build, fork-local |
| ADR-0713 | vmafx-node Go worker binary — ffmpeg decode/encode/score pipeline, gRPC heartbeat to controller | Proposed | go, node, ffmpeg, phase4b, fork-local |
| ADR-0714 | vmafx-operator kubebuilder skeleton + CRDs: VmafxJob, VmafxNode, VmafxModelTraining; Stage 1 reconcilers; Helm integration | Accepted | go, k8s, operator, crd, phase4b, fork-local |
| ADR-0717 | vmafx-node ffmpeg version policy: pin to latest stable tag (n8.2); multi-stage Dockerfile with cpu/cuda/rocm/sycl variants | Accepted | node, ffmpeg, docker, phase4b, fork-local |
| ADR-0761 | C++23 Wave 8 — opt.cpp activation + read_json_model.cpp | Accepted | build, c++, cpp23, refactor, internals, fork-local |
| ADR-0763 | CUDA adm_decouple kernels: __ldg() F3 fix | Accepted | |
| ADR-0767 | Phase 4b.8 — libvmaf C ABI Break for VMAFx v4.0.0 | Proposed | api, abi, phase4b, breaking-change, v4, ffmpeg-patches, fork-local |
| ADR-0768 | C++23 Wave 9 — picture_pool + gpu_picture_pool | Accepted | build, c++, cpp23, refactor, internals, fork-local, vmafx-rebrand |
| ADR-0770 | vmafx-tune Go port — Stage 4 (report subcommand) | Accepted | go, vmafx-tune, language-modernization, cli, phase4, fork-local |
| ADR-0771 | SIMD twin coverage inventory and gap prioritisation | Accepted | simd, docs, planning |
| ADR-0772 | Rename feature_extractor.c to feature_extractor.cpp | Accepted | cpp23, build, core, fork-local |
| ADR-0773 | CUDA ADM decouple-inline — __ldg() F3 fix on active path | Accepted | cuda, performance, adm, fork-local |
| ADR-0774 | MCP server audit — path rename, subsample drop, schema drift, dead code | Accepted | mcp, server, audit, fork-local |
| ADR-0777 | Thread-Safety Audit — CUDA / SYCL / HIP Backends | Accepted | |
| ADR-0778 | Picture pool / framesync lifecycle audit and targeted fixes | Accepted | correctness, picture-pool, framesync, refcount |
| ADR-0779 | eBPF FUSE read-path bypass for vmafx-node rclone mounts | Proposed | ebpf, node, rclone, performance, phase4b, fork-local |
| ADR-0780 | NOLINT Cluster Refactor — Slab Allocator, SYCL Stride, ADM Band-Size | Proposed | ci, simd, cuda, sycl, hip, lint |
| ADR-0781 | Sidecar online training — SGD + EMA + replay buffer | Proposed | ai, sidecar, online-learning, k8s, vmafx-node, phase4b, fork-local |
| ADR-0782 | OpenTelemetry tracing and metrics schema for the VMAFX platform | Accepted | observability, go, platform, adr-0782 |
| ADR-0783 | Kubernetes end-to-end integration test harness — kind + kuttl | Proposed | ci, testing, k8s, github |
| ADR-0784 | AVX2 SIMD path for integer SSIM horizontal moment accumulation | Accepted | simd, x86, avx2, ssim, performance, fork-local |
| ADR-0786 | vmafx-operator Stage 2 — reconciler loops, webhook validation, per-controller RBAC | Accepted | go, k8s, operator, crd, controller-runtime, phase4b, fork-local |
| ADR-0787 | libvmaf Public API Error-Path Consistency Audit | Accepted | |
| ADR-0788 | Doxygen doc-comment and @thread-safety tags on public C-API | Accepted | |
| ADR-0790 | Containerfile layer optimization — merge apt layer, strip build artifacts, no-cache-dir pip | Accepted | build, docker, containerfile, fork-local |
| ADR-0793 | Nightly Workflow Audit — TSan, Artifact Retention, Python Version | Accepted | ci, nightly, sanitizers, artifacts, fork-local |
| ADR-0794 | Multi-Tenant Auth Gateway for vmafx-controller | Accepted | security, controller, auth, multi-tenant, oidc, grpc |
| ADR-0797 | vmafx-server OpenAPI REST contract | Accepted | server, api, rest, openapi, swagger, go, vmafx-server |
| ADR-0802 | CI Runner Image Standardization — Pin ubuntu-latest to ubuntu-24.04 | Accepted | ci, build |
| ADR-0804 | Add vmaf_context_get_backend — additive ABI introspection | Accepted | api, abi, backend, gpu, fork-local |
| ADR-0806 | VmafFeatureDictionary caller-ownership contract | Accepted | api, memory, testing |
| ADR-0809 | C++23 Wave 8 — CLI conversion (cli_parse.c → .cpp, vmaf.c → .cpp) | Accepted | cpp23, build, cli, raii, fork-local |
| ADR-0811 | Security hardening — CodeQL Go coverage + codeql-config | Accepted | ci, security, codeql, go, dependabot, ossf |
| ADR-0812 | Renovate — Go/Cargo grouping, schedule, and concurrent-PR cap | Accepted | ci, build, deps |
| ADR-0815 | Distroless Dockerfiles for vmafx-operator and vmafx-node | Accepted | docker, ci, release, operator, node, k8s, phase4b, fork-local |
| ADR-0819 | PR-time CI gate for dev/Containerfile | Accepted | ci, build, workspace |
| ADR-0844 | float_adm AVX2/AVX-512 F2+F3 — double-precision and FP-contraction | Accepted | simd, bit-exactness, avx2, avx512, float_adm, build |
| ADR-0845 | CUDA motion — multi-frame SAD batching to reduce per-launch overhead | Proposed | cuda, performance, motion, fork-local |
| ADR-0848 | Per-Surface Documentation Compliance Audit — Session 2026-05-29 | Accepted | docs, compliance, process, per-surface-bar, fork-local |
| ADR-0852 | Wire speed_chroma_hip and speed_temporal_hip into HIP Build and Dispatch | Accepted | hip, build, speed, feature, gpu, fork-local |
| ADR-0854 | Direct AVX-512 parity tests for motion kernels | Accepted | |
| ADR-0858 | C++23 conversion of gpu_dispatch_env.c | Accepted | build, c++, cpp23, refactor, internals, fork-local, gpu-dispatch |
| ADR-0873 | ARM64 NEON bit-exactness audit — -ffp-contract=off carve-out scope | Accepted | simd, arm64, neon, bit-exactness, build, ci |
| ADR-0891 | SIMD bit-exactness round-2 — unify SSIMULACRA 2 colour-matrix on FMA, extend -fp-model=precise to libvmaf_feature_static_lib | Accepted | simd, build, bit-exact, icx |
| ADR-0899 | Bash strict-mode + trap-cleanup sweep across in-tree shell scripts | Accepted | ci, agents, shell, hygiene |
| ADR-0904 | Pin cargo-machete ignore entries for bindgen / cbindgen build dependencies | Accepted | rust, build, ci, workspace |
| ADR-0908 | Slow-test audit (2026-05-30) — no >30 s tests found; install slow marker as a future gate | Accepted | ci, testing, devx |
| ADR-0915 | Enable clang-tidy modernize-* family with curated opt-outs; discharge top 15 findings (8 nullptr + 6 deprecated-headers + 1 use-auto) on fork-added C++ files | Accepted | lint, ci, cpp, quality-gate, fork-local |
| ADR-0860 | Re-include Vulkan FFmpeg patches (0004 + 0006) as no-op shims for chain coherence — unblocks SYCL CI leg | Accepted | 2026-05-30 |
| ADR-0882 | Fuzz target audit: add fuzz_json_model + fuzz_dnn_sidecar libFuzzer harnesses closing deferred targets #3 + #4 from Research-0083. First run surfaced a heap-buffer-overflow in vmaf_model_destroy via parse_slopes outrunning feature_names. | Accepted | 2026-05-30 |
| ADR-0861 | Drop "and Claude (Anthropic)" from fork copyright lines (single notice Copyright 2026 Lusoris going forward); partially supersedes ADR-0025 / ADR-0105 format guidance. | Accepted | 2026-05-30 |
| ADR-0892 | Conventional-Commits coverage + Changelog-fragment section hygiene — extend release-please root section list, move perf/ + performance/ fragments to changed/perf-* | Accepted | 2026-05-30 |
| ADR-0865 | Sunset ANSNR (pre-VMAF metric): drop ansnr / float_ansnr feature extractors from all backends — back-dated to PR #38 merge (2026-05-28); closes ADR-0108 compliance gap caused by PR #38's wrong Parent ADR-0709 cite | Accepted | 2026-05-28 |
| ADR-0871 | SSIM SIMD dispatch installation must be pthread_once-guarded | Accepted | simd, threading, correctness, ssim, tsan |
| ADR-0869 | Sanitizer-Pass Cleanup — CAMBI Option-Type Mismatch and AVX{2,512} ADM Signed-Shift UB | Accepted | c, simd, sanitizer, correctness, cambi, adm |
| ADR-0887 | Reject JSON models whose per-feature arrays disagree on length | Accepted | security, parser, model, fuzz, hardening |
| ADR-0930 | Helm chart NetworkPolicy default-deny + Pod Security Standards "restricted" baseline (opt-in NP, UID 65532, seccomp RuntimeDefault) | Accepted | 2026-05-31 |
| ADR-0927 | OpenTelemetry traces + metrics Phase 1: adopt OTel Go SDK across all VMAFX Go services; pilot in vmafx-controller with pkg/observability.InitOTel helper, OTLP-to-collector export, head-based 1 % trace sampler, 60 s metric reader. Existing slog + Prometheus paths preserved. Per-service opt-in rollout for vmafx-node, vmafx-server, vmafx-mcp, vmafx-tune in later PRs. | Accepted | 2026-05-31 |
| ADR-0925 | Introduce pkg/registry.Store[K, V] generic in-memory keyed store + registry.Counter constraint; refactor cmd/vmafx-controller/nodes/Registry to compose it and fold one of pkg/observability.SetControllerSources's narrow interfaces into the generic constraint. Queue stays SQLite-backed. | Accepted | 2026-05-31 |
| ADR-0928 | VmafPicture v2 — replace void *priv overlay with explicit VmafBackendHandle backend discriminator + typed uintptr_t backend_handle; dual-API window (12 months); SONAME 3→4 scheduled for VMAFX v4.0.0; design + scaffold header only in this PR | Proposed (2026-05-31) | api, abi, gpu, cuda, sycl, hip, metal, ffmpeg, rust, fork-local, vmafx-rebrand |
| ADR-0926 | Parquet schema v2 — canonical column order, zstd-3, schema metadata | Accepted | ai, data, storage, parquet, k150k, chug |
| ADR-0924 | Native bash pre-commit hook as opt-in alternative to the pre-commit framework (~10x faster on small commits; CI unchanged) | Accepted | 2026-05-31 |
| ADR-0923 | Adopt BuildKit cache mounts and ccache across the container build matrix | Accepted | ci, build, container, performance |
| ADR-0931 | MCP server — replace subprocess delegation with direct cgo (Phase 1: vmaf_score + describe_model, behind VMAFX_MCP_DIRECT=1) | Proposed | mcp, go, cgo, libvmaf, performance, vmafx, modernization |
| ADR-0929 | Promote the safe wrapper layer out of vmafx-sys into a standalone vmafx crate; ship Phase 1 (Context, Model, Picture, Score, Error) | Accepted | rust, bindings, ffi, phase4, workspace, fork-local |
| ADR-0953 | Doxygen public-API build is warning-clean | Accepted | docs, ci, api, public-surface |
| ADR-0959 | Metal kernel parity coverage round 4 — closeout | Accepted | testing, metal, gpu, parity, regression-guard |
| ADR-0960 | GPU runtime error-path leak fixes — round 25 (A.1 + A.2 + A.3) | Accepted | cuda, memory, threading, correctness, fork-local |
| ADR-0961 | Controller queue — roll back PullWork on post-update Get failure (round-25 audit B.1) | Accepted | go, controller, correctness, queue, phase4b, fork-local |
| ADR-0963 | ai/src: guard NaN propagation in eval + tune (round-25 audit C.1 + C.2) | Accepted | ai, correctness, bisect |
| ADR-0966 | Fix dev/Containerfile post-ADR-0700 libvmaf → core paths (Round 26 audit C.1) | Accepted | dev, docker, containerfile, rename, adr-0700, fork-local |
| ADR-0968 | CI scripts — rebrand-proof assertion-density grep + tempfile EXIT trap in changelog concat (Round 26 audit D.1 + D.2) | Accepted | ci, build, docs |
| ADR-0969 | Helm chart: add seccompProfile default + fix node-deployment image helper (Round 26 audit B.1 + B.3) | Accepted | 2026-05-31 |
| ADR-0967 | MCP HTTP transport security — add auth + body limit + safer bind default (Round 26 audit A.1) | Accepted | security, mcp, http, auth, hardening, fork-local |
| ADR-0922 | Aggressive coverage ratchet: overall 37 → 60, critical 85 → 90, +5pp on every PER_FILE_MIN override; new per-PR coverage-delta gate (max 0.5pp drop on overall or any touched file); 30-day grace for in-flight PRs; exception process gated on new ADR superseding this one | Accepted | 2026-05-31 |
| ADR-0958 | HIP kernel parity-test coverage round 4 | Accepted | hip, tests, gpu, coverage, fork-local |
| ADR-0962 | Controller fixes — implement StreamJobs snapshot and add reaper stop signal (round-25 audit B.3 + B.4) | Accepted | controller, grpc, go, correctness, goroutine, phase4b, fork-local |
| ADR-0956 | CUDA kernel parity coverage — round 4 (last 5 uncovered kernels) | Accepted | testing, cuda, parity, fork-local, gpu-coverage |
| ADR-0964 | Implement speed_internal.c and wire speed_{chroma,temporal}_{hip,sycl} | Accepted | cuda, hip, sycl, feature-extractor, cross-backend-parity, speed |
| ADR-0965 | CUDA SpEED TU repair — align with current CudaFunctions table (closes T-CUDA-SPEED-TU-REPAIR-2026-05-31) | Accepted | cuda, feature-extractor, cross-backend-parity, speed, fork-local |
| ADR-0937 | mkdocs ADR nav — per-hundred bucket layout + auto by-tag indexes (scripts/docs/generate-adr-nav.sh, scripts/docs/generate-adr-by-tag.sh) | Accepted | 2026-05-31 |
| ADR-0933 | Add bidirectional ScoreStream RPC for per-frame VMAF scoring. Phase 1: schema + server stub. Phase 2 (2026-06-13): wired to libvmaf via the in-memory pkg/libvmaf.StreamScorer; both vmafx-server and vmafx-node serve it; pooled VMAF bit-identical to ScoreDirect. | Accepted | 2026-05-31 |
| ADR-0938 | Feature-extractor coverage round 2 — seven CPU-side test executables | Accepted | tests, coverage, ci |
| ADR-0935 | Wrap multi-step Go cleanup paths with errors.Join; standardise slog error keys | Accepted | go, observability, refactor, phase4b |
| ADR-0936 | Replace final os.path.* usages with pathlib.Path; enable ruff PTH ruleset (flake8-use-pathlib) on fork-owned Python; per-file ignore for upstream Netflix trees. Fixes 12 PTH violations across 9 files. | Accepted | 2026-05-31 |
| ADR-0932 | iter.Seq[T] companion APIs for single-pass Go collections | Accepted | go, api, performance, ergonomics |
| ADR-0939 | Add /add-mcp-tool, /add-k8s-resource, /audit-modernization skills + consolidate bisect-* on shared lib/bisect-common.sh | Accepted | 2026-05-31 |
| ADR-0934 | Migrate user-input dataclass configs (TrainConfig, ModelMetadata, ManifestEntry) to pydantic v2 BaseModel — declared validators, line-numbered errors, JSON-Schema export. Internal report / data-carrier dataclasses stay untouched per explicit triage rule. Sidecar JSON layout byte-identical. 667 ai tests pass. | Accepted | ai, validation, configs, modernization |
| ADR-0955 | Fix two latent bugs in compat/python-vmaf/: scanf width-handler inversion + ProcessRunner locale setdefault | Accepted | 2026-05-31 |
| ADR-0954 | Host-only unit test for shared GPU dispatch runtime | Accepted | test, gpu, cuda, hip, runtime |
| ADR-0952 | Push test coverage on vendored libsvm + IQA paths the fork uses (2 new fast-suite executables, 29 assertions, +60 pp coverage) | Accepted | 2026-05-31 |
| ADR-0950 | Fix symmetric "adm" vs "adm_hip" feature-name bug in test_hip_adm_parity; add -ENOSYS skip for enable_hipcc=false posture | Accepted | 2026-05-31 |
| ADR-0949 | HIP motion3 parity test skips cleanly on -ENOSYS from vmaf_read_pictures when libvmaf was built with enable_hipcc=false; mirrors the existing no-HIP-device skip path. Fixes PR #443 audit-flagged meson test --suite gpu failure. | Accepted | 2026-05-31 |
| ADR-0951 | GitHub Actions custom-action + reusable-workflow audit (2026-05-31): no .github/actions/ exists, no workflow_call: workflows exist, all 24 workflows already SHA-pinned per ADR-0263; two abstraction candidates (composite setup-build-deps, reusable meson-cpu-build.yml) deferred. | Accepted | 2026-05-31 |
| ADR-0947 | CUDA kernel parity coverage — round 3 (float-path twins + ssimulacra2) | Accepted | testing, cuda, parity, fork-local, gpu-coverage |
| ADR-0948 | Feature-extractor coverage round 3 — targeted unit tests for low-coverage files | Accepted | test, coverage, feature |
| ADR-0945 | HIP kernel parity-test coverage round 3 | Accepted | hip, tests, gpu, coverage, fork-local |
| ADR-0914 | Unified Python test orchestrator (nox at repo root) covering ai/ + mcp/ + tools/ + dev-llm/ + python/ | Accepted | 2026-05-31 |
| ADR-0917 | Adopt cargo-deny with a workspace deny.toml enforcing license allowlist, banned crates (openssl-sys, native-tls), RustSec advisories, and crates.io-only sources for the Rust workspace; wired into rust-ci.yml as a parallel job | Accepted | 2026-05-31 |
| ADR-0918 | LLVM IR diff harness for bit-exact SIMD paths — opt-in make ir-diff gate that snapshots per-function IR under testdata/ir-snapshots/ and fails on compiler-induced drift (FMA / FP-contract reassociation) | Accepted | simd, build, ci, perf, diagnostics, fork-local |
| ADR-0912 | Pixel-format edge coverage at the libvmaf unit-test layer (4:2:2, 4:4:4, 10/12-bit PSNR/SSIM/CIEDE end-to-end smoke) | Accepted | test, coverage, fork-local, pixel-format, hbd |
| ADR-0913 | Changelog fragment renderer — splice contract is ^## \[, not ^##: fixes 23 k+ line drift from PR #332 / #383 / #401 root cause (fragment-internal ## headers tripping the sentinel); regenerated CHANGELOG.md from 59 757 → 15 030 lines | Accepted | 2026-05-31 |
| ADR-0719 | vmafx-node rclone Integration — Remote-Asset Streaming Without Disk Materialisation | Accepted | architecture, go, node, rclone, storage, ffmpeg, phase4b, fork-local |
| ADR-0720 | C++23 Wave-1 Pilot — mem.c conversion | Accepted | build, c, cpp23, refactor, internals, fork-local, vmafx-rebrand |
| ADR-0721 | C++23 Wave 1 pilot: convert opt.c → opt.cpp using std::optional<T> for parse helpers; extern "C" guards; [[nodiscard]] on public entry point; isolated opt_cpp23_lib static-lib pattern (ADR-0708 playbook). | Accepted | 2026-05-28 |
| ADR-0723 | C++23 Pilot — fex_ctx_vector.c Conversion (Wave 2) | Accepted | build, c, cpp23, refactor, internals, fork-local, vmafx-rebrand |
| ADR-0725 | C++23 Pilot — log.c conversion (real C++23, supersedes ADR-0722) | Accepted | build, c, cpp23, refactor, internals, fork-local, vmafx-rebrand |
| ADR-0726 | Drop Vulkan backend | Accepted | vulkan, gpu, backend, build, breaking, fork-local |
| ADR-0727 | C++23 Wave 2: project-wide cpp_std=c++23 + dict.c → dict.cpp with std::expected, std::string_view, [[nodiscard]]; toolchain floor gcc >= 13 / clang >= 16 | Accepted | 2026-05-28 |
| ADR-0728 | Sunset Legacy Native Build Modes — Phase 4b.9 Follow-On | Accepted | ci, build, vmafx, breaking |
| ADR-0729 | C++23 Wave 3 — feature_name, picture_copy, model | Accepted | build, cpp23, refactor |
| ADR-0730 | vmafx-tune Go port — Stage 2 (ladder subcommand) | Accepted | go, vmafx-tune, language-modernization, cli, phase4, fork-local |
| ADR-0731 | C++23 Wave 3 Part B — psnr_tools, luminance_tools, mkdirp | Accepted | build, cpp23, modernization |
| ADR-0733 | C++23 Wave 4 — output writers (XML, JSON, CSV, subtitle) | Accepted | build, c++, cpp23, refactor, internals, fork-local, vmafx-rebrand |
| ADR-0735 | C++23 Wave 5 — cpu, ref, thread_locale | Accepted | build, c++, cpp23, refactor, internals, fork-local |
| ADR-0738 | Bump local CUDA toolkit pin to 13.3 + R610 minimum driver (partial — CI deferred) | Accepted | cuda, build, container, ci, deps |
| ADR-0743 | CUDA VIF filter1d ncu-driven performance optimizations | Accepted | cuda, performance, vif |
| ADR-0744 | CUDA adm_cm __launch_bounds__(128, 8) register reduction (ms_ssim_decimate smem tiling reverted) | Accepted | cuda, performance, adm, ms_ssim, occupancy |
| ADR-0746 | integer_adm_cuda — emit integer_adm3 + integer_aim (parity with CPU) | Accepted | cuda, integer_adm, aim, adm3, parity |
| ADR-0747 | CUDA extern "C" invariant for host-looked-up kernels | Accepted | |
| ADR-0749 | Sunset VmafLegacyQualityRunner (float-path runner) | Accepted | python, quality-runner, breaking-change, cleanup |
| ADR-0750 | Hardware Measurement Verdict for PR perf/cuda-ms-ssim-decimate-adm-cm-ncu-driven | Accepted | cuda, performance, ms_ssim, adm_cm, measurement |
| ADR-0752 | Multi-Resolution Performance Benchmark Baseline | Accepted | |
| ADR-0753 | Runtime Resolution-Aware CUDA Kernel Variant Dispatch | Accepted | cuda, perf, build |
| ADR-0754 | CUDA SSIM vert_combine: __ldg() + __launch_bounds__ + pinned-host leak fix | Accepted | |
| ADR-0755 | C++23 Wave 7 — drop orphan cpu.c, activate cpu.cpp | Accepted | cpp23, build, core, fork-local |
| ADR-0756 | CUDA F3 struct-by-value kernel audit (scope + dispatch order) | Accepted | cuda, perf, research |
| ADR-0757 | CUDA MS-SSIM ms_ssim_vert_lcs + ms_ssim_horiz: __ldg() + __launch_bounds__ (F3 fix #2) | Accepted | |
| ADR-0759 | HIP ADM — AdmBufferHip passed by pointer (F3 fix) | Accepted | hip, performance, cuda, kernel, adm, fork-local |
| ADR-0760 | CUDA motion kernel multi-resolution ncu profiling methodology | Accepted | cuda, perf, research |
| ADR-0762 | CUDA CIEDE2000 8bpc/16bpc — __ldg() read-only cache routing (F3 fix) | Accepted | cuda, performance, ciede, fork-local |
| ADR-0775 | DNN ORT Backend Audit Findings | Accepted | dnn, onnx, ort, thread-safety, correctness, fork-local, research |
| ADR-0792 | Env-var overrides for hardcoded YUV and testdata paths | Accepted | workspace, ci, testdata |
| ADR-0795 | Clarify and harden VmafFeatureExtractor.prev_ref thread-safety invariant | Accepted | threading, feature-extractor, batch-threading, correctness |
| ADR-0810 | ADR-0108 Six-Deliverables Compliance Audit (2026-05-29) + D3 Gap Fixes | Accepted | docs, agents, process |
| ADR-0839 | C++23 wave — shadow-identifier and implicit-cast cleanup | Accepted | cpp23, lint, core, sycl, fork-local |
| ADR-0840 | Fix cu_state leak on import failure and gpu_dispatch_env TOCTOU | Accepted | cuda, security, framework, ci |
| ADR-0841 | Environment variable reference page and canonical naming | Accepted | docs, sycl, cuda, ai, workspace |
| ADR-0853 | Remove dead debug-print macros from motion_avx2.c | Accepted | simd, lint, avx2, cleanup, fork-local |
| ADR-0911 | __init__.py export-completeness audit — __all__ + SPDX headers across 8 fork-added Python packages | Accepted | 2026-05-31 |
| ADR-0910 | Project-wide codespell config + skip-list policy: ignore Netflix-author / vendored / frozen-ADR files; ignore domain acronyms (ANE, HSA, SME, CANN, COO, …); 3 typo fixes (CONTRIBUTING.md, docs/metrics/cambi.md ×2) | Accepted | 2026-05-31 |
| ADR-0907 | Wall-clock perf regression gate over the ADR-0752 multi-resolution baseline. Adds scripts/perf/check-regression.py (stdlib-only) and a CPU-only CI job in tests-and-quality-gates.yml that runs bench-multi-resolution.sh and asserts that no (resolution, backend, metric) cell regresses by > 5% wall-clock vs the committed testdata/perf_multi_resolution.json baseline. Replaces the silently-broken bench_all.sh --backend=cpu --snapshot-only --tolerance-ulp=2 invocation that was a no-op (the script does not parse those flags). continue-on-error: true for one release cycle so cross-runner variance data can inform whether the 5% tolerance is right before the step is promoted to a required check. GPU lanes deferred to a follow-up ADR once self-hosted runners stabilise. | Proposed | ci, performance, regression-gate, fork-local, testing |
| ADR-0905 | .gitignore + .github/workflows/ staleness audit — drop dead rules, rewire post-ADR-0700 paths, no workflow removals | Accepted | 2026-05-30 |
| ADR-0903 | Wire codecov/codecov-action@v6.0.1 (SHA-pinned) into both Coverage Gate jobs in tests-and-quality-gates.yml. Uses fork-aware OIDC (no CODECOV_TOKEN secret); fail_ci_if_error: false so Codecov outages do not double-gate the gcovr threshold check; flag-tagged cpu vs gpu to mirror the two-job split. Closes the gap PR #383 documented ("Codecov badge intentionally NOT added: no Codecov upload step exists in any workflow"). | Accepted | 2026-05-30 |
| ADR-0902 | Signing and attestation audit (2026-05-30): existing Sigstore + SLSA + SBOM coverage is strong; close three gaps by adding actions/attest-build-provenance@v4.1.0 to all 5 container build jobs, having the post-push smoke-test verify the cosign signature before pulling, and expanding docs/development/release.md with copy-pasteable consumer verification recipes. Tag signing, DCO sign-off, Helm chart signing, and standalone Go binary releases are explicitly scoped out. | Accepted | 2026-05-30 |
| ADR-0953 | Doxygen public-API build is warning-clean | Accepted | docs, ci, api, public-surface |
| ADR-0901 | Governance file audit: add GOVERNANCE.md + MAINTAINERS.md, extend CODEOWNERS for fork-local subtrees, document ADR-0108 in CONTRIBUTING.md | Accepted | governance, docs, meta |
| ADR-0893 | Pre-commit config audit — 2026-05-30 (forbid-new-submodules + isort/ruff bumps) | Accepted | ci, lint, hygiene, pre-commit |
| ADR-0889 | Vendored libsvm 3.24 audit — close header-row-ordering oob, document upstream-version policy | Accepted | 2026-05-30 |
| ADR-0890 | CI concurrency + cost audit follow-up to PR #301: concurrency block + ccache on ffmpeg-integration.yml, ccache on sanitizers.yml, paths-ignore on security-scans.yml, early file-delta skip on lint-and-format.yml::clang-tidy | Accepted | 2026-05-30 |
| ADR-0884 | SYCL kernel coverage round 2 — five new CPU-vs-SYCL parity gates (adm, ciede, integer_ssim, float_ms_ssim, motion_v2) at ADR-0214 places=4; follows round 1 / ADR-0868 | Accepted | 2026-05-30 |
| ADR-0886 | CUDA kernel parity test coverage — round 2 gap-fill | Accepted | testing, cuda, gpu, parity, coverage |
| ADR-0883 | HIP kernel parity-test coverage round 2 | Accepted | hip, tests, gpu, coverage |
| ADR-0880 | Remove three unreferenced fork-added testdata artifacts: check_borders.py (one-off ADM border debug), compare_a380.py (superseded by compare_combined.py), scores_sycl_b580_576_mq.json (orphan slim-schema snapshot) | Accepted | 2026-05-30 |
| ADR-0878 | Trivy container scan baseline: production Dockerfiles add USER nonroot:nonroot (UID 65532) to every final stage; clears 2 HIGH DS-0002 findings on docker/Dockerfile.production + docker/Dockerfile.production-gpu; dev/Containerfile keeps USER root (intentional dev sandbox); image-CVE scan blocked on MANIFEST_UNKNOWN for unpublished ghcr.io/vmafx/vmafx:* (follow-up) | Accepted | 2026-05-30 |
| ADR-0876 | Adopt <inttypes.h> PRI macros (PRId64 / PRIu64 / PRIx64) for fixed-width integer printf formatting in fork-added C / C++ — eliminates Windows-LLP64 truncation bug class (CERT FIO47-C, MISRA 21.6) | Accepted | 2026-05-30 |
| ADR-0877 | Fork-added MS-SSIM decimate dispatcher — convert return -1 (malloc-failure) to -ENOMEM | Accepted | 2026-05-30 |
| ADR-0875 | GitHub Actions hardening audit (2026-05-30): SHA-pinning confirmed across 22 workflows; backfill top-level permissions: contents: read on go-ci.yml + rust-ci.yml; add persist-credentials: false to 5 checkouts in sanitizers.yml + supply-chain.yml. | Accepted | 2026-05-30 |
| ADR-0874 | Name magic numbers in fork-added C surfaces (CERT INT07-C closeout pass 1) — adds ~20 named #define constants across core/src/mcp/, core/src/picture.c, core/src/cuda/picture_cuda.c, core/src/libvmaf.c. Rename-only; no numeric values changed. | Accepted | 2026-05-30 |
| ADR-0872 | POSIX I/O EINTR-retry + return-value audit on fork-added C: two MCP drain loops now retry on EINTR (silent stream desync under signal pressure); seven discarded close(2) returns marked (void)-cast for Power-of-10 rule 7 hygiene. | Accepted | 2026-05-30 |
| ADR-0870 | Add deploy/helm/vmafx/values.schema.json (enforces workload / gpu.vendor / storage.mode enums, blocks sibling-key typos via additionalProperties: false); fix dev/Containerfile ADR-0700 path drift (COPY libvmaf/ → COPY core/ + new COPY compat/, two cd libvmaf → cd core, .dockerignore core/build*/ siblings). | Accepted | 2026-05-30 |
| ADR-0868 | GPU backend kernel parity-test coverage gap-fill | Accepted | tests, cuda, hip, sycl, metal, coverage |
| ADR-0970 | test_gpu_picture_pool.c: remove unused malloc + dead code (Round 27 audit D.3 + D.4) | Accepted | testing, cuda, memory, cleanup, fork-local |
| ADR-0971 | Test suite: NULL-check malloc in 3 test files (Round 27 audit D.1) | Accepted | testing, correctness, asan, fork-local |
| ADR-0972 | Public headers — replace ISO-reserved __VMAF_*__ include guards with LIBVMAF_*_H (Round 27 audit A.1, SEI CERT DCL37-C) | Accepted | api, headers, cert-c, lint, compatibility |
| ADR-0973 | Master CI fixes — Metal MS-SSIM fixture dim + ssimulacra2 icpx XYB bit-exactness | Accepted | ci, simd, metal, ssimulacra2, icpx, bit-exactness, tests |
| ADR-0975 | Use NamedTemporaryFile in _run_vmaf_score to eliminate task-name collision risk | Accepted | mcp, security, concurrency |
| ADR-0976 | dnn sidecar: delete dead has_norm / norm_mean / norm_std / expected_min / expected_max / has_range fields and three consumer branches (per ADR-0114 deferred cleanup); plug partial-allocation leak in extract_string_array on every error path | Accepted | 2026-05-31 |
| ADR-0977 | core/tools input-reader safety — Y4M malloc-NULL check, YUV/Y4M size_t cast, bench GPU-state leaks | Accepted | security, bug, tools, audit |
| ADR-0978 | vmafx-server + pkg/score bug-audit — shutdown leak, gRPC Send-EOF surfacing, HTTP body cap, panic recovery | Accepted | security, bug, audit, go, grpc, http, server |
| ADR-0980 | Markdown-lint full-ruleset discharge — content fixes + per-file scoped disables | Accepted | docs, lint, ci, policy, fork-local |
| ADR-0983 | gosec sweep — fix all findings + add CI gate | Accepted | security, ci, go, fork-local |
| ADR-0862 | K150K extractor — .done vs parquet consistency check on restart | Accepted | ai, pipeline, durability, k150k |
| ADR-0879 | Python dependency freshness sweep — bump nine stale floors across ai/, mcp-server/, dev-llm/, tools/, leave ceilings + hash-pinning to follow-up | Accepted | 2026-05-30 |
| ADR-0881 | Coverage-overrides audit: tighten tiny_extractor_template.h 10 % → 75 % (67 pp slack recovered); codify tighten/keep/remove rule + quarterly audit cadence | Accepted | 2026-05-30 |
| ADR-0888 | Pyright strict audit of fork-local Python packages | Accepted | ai, mcp, python, type-safety, ci |
| ADR-0946 | SYCL kernel coverage round 3 — 5 CPU vs. SYCL parity gates for float_psnr_sycl, float_adm_sycl, float_vif_sycl, float_motion_sycl, psnr_hvs_sycl at ADR-0214 places=4 (1e-4) | Accepted | 2026-05-31 |
| ADR-0957 | SYCL kernel coverage round 4 — 4 CPU vs. SYCL parity gates for float_moment_sycl, speed_chroma_sycl, speed_temporal_sycl (places=4 / 1e-4) and ssimulacra2_sycl (5e-3 per ADR-0214 FEATURE_TOLERANCE); closes the SYCL kernel-coverage backlog at 18/18 = 100 % | Accepted | 2026-05-31 |
| ADR-0984 | Port Netflix Upstream May–Jun 2026 (5 commits) | Accepted | |
| ADR-0985 | SYCL parity divergence investigation — float_ssim + ssimulacra2 on Arc A380 | Proposed | sycl, parity, ci, gpu, precision, arc |
| ADR-0986 | docs.yml — add PR trigger to surface doc-substance gaps before merge | Accepted | ci, docs, fork-local |
| ADR-0987 | AVX-512 path for float_moment feature extractor | Accepted | simd, avx512, performance, float_moment, fork-local |
| ADR-0988 | Route strict-JSON helpers through vmaftune.jsonio across vmaf-tune | Accepted | refactor, json, vmaf-tune, mcp |
| ADR-0989 | Wire motion_add_uv through integer_motion_sycl (UV blur+SAD on device, per-plane normalization on host); add motion_add_uv ENOTSUP stub to CUDA/Vulkan/HIP/Metal; upgrade motion_five_frame_window rejections to WARNING | Accepted | 2026-06-03 |
| ADR-0990 | Restore double-precision L/C/S accumulation in CUDA ms_ssim_vert_lcs — promotes per-pixel L/C/S and warp/block reductions from float to double (ADR-0139 pattern), fixes test_cuda_float_ms_ssim_parity places=4 gate | Accepted | 2026-06-03 |
| ADR-0991 | Fixes the missing pythonpath = ["scripts"] in ai/pyproject.toml so batch materializer tests pass from the repo root, and commits a smoke-run scaffold under ai/testdata/smoke-second-opinion-batch/. | Accepted | ai, second-opinion, materializer, testing, smoke, fork-local |
| ADR-0992 | Ship corpus-specific MOS-label batch manifests for KonViD and CHUG (ai/configs/); add smoke tests that validate manifest schema and run end-to-end with synthetic data; fix pre-existing sys.path bug in test_batch_materialize_mos_labels.py | Accepted | 2026-06-03 |
| ADR-0994 | Fix Coverage Gate build break — integer_motion.c compile error | Accepted | ci, coverage, build, motion |
| ADR-0995 | Shorten CI workflow and job display names | Accepted | ci, github-actions, docs |
| ADR-0996 | eBPF FUSE bypass for rclone zero-copy path in vmafx-node | Proposed | ci, go, ebpf, rclone, performance, security, supply-chain |
| ADR-0999 | Guard <stdatomic.h> includes in C++ translation units (GCC 14 + Clang-18 fix) | Accepted | build, ci, cpp, atomics, tsan, fork-local |
| ADR-1000 | Tech-stack badges added to README (version pins, GPU/SIMD, container); Go version bumped go.mod + go-ci.yml 1.23→1.26.4; Rust edition 2024 deferred (bindgen 0.69 extern-block blocker) | Accepted | 2026-06-04 |
| ADR-1001 | SYCL parity round 5 — CAMBI CPU vs. SYCL parity gate | Accepted | sycl, test, gpu, parity, kernel-coverage, cambi, fork-local |
| ADR-1002 | Bump Rust workspace to edition 2024 and bindgen to 0.72 | Accepted | rust, build, workspace |
| ADR-1003 | Bump project-wide C++ standard from c++11 to c++23 in core/meson.build — lands the ADR-0727 decision that was never applied to the default_options entry; fixes pre-existing test_feature_collector_coverage linker failure | Accepted | 2026-06-04 |
| ADR-1004 | HIP kernel parity-test coverage round 5 — 2 CPU vs. HIP parity gates for speed_chroma_hip and speed_temporal_hip at ADR-0214 places=4 (1e-4), closing the round-4 deferral after ADR-0964 / PR #465 resolved the speed_internal.c link defect; lifts HIP extractor parity coverage to 17/17 (100%) for all non-deferred kernels | Accepted | 2026-06-04 |
| ADR-1005 | Perf Gate Advisory Mode and Baseline Refresh Documentation | Accepted | ci, perf |
| ADR-1007 | Fix C string/numeric UB cluster — NULL strcmp, size_t underflow, signed-shift overflow, snprintf truncation | Accepted | core, security, c, ub |
| ADR-1008 | Fix C lifecycle bugs — pic_cnt double-increment, div-by-zero in pooled score, silent test failures | Accepted | core, correctness, test, c |
| ADR-1009 | Fix Go shutdown / goroutine correctness — WaitForShutdown unconditional block, unbounded GracefulStop | Accepted | go, server, controller, shutdown, correctness |
| ADR-1010 | MCP server JSON parse guards — vmaf output and ffprobe output | Accepted | mcp, python, error-handling, correctness |
| ADR-1011 | Add static to TU-internal CUDA helper functions — VIF, ADM, motion | Accepted | core, cuda, correctness, build |
| ADR-1012 | Go queue state-machine guards — PullWork AND-status, ReportResult idempotency | Accepted | go, controller, queue, correctness, concurrency |
| ADR-1014 | Prometheus registry isolation for SetControllerSources | Accepted | security, observability, go |
| ADR-1017 | Go operator controller resource-allocation fixes | Accepted | security, k8s, go, operator |
| ADR-1018 | MCP exec.CommandContext + controller gRPC panic recovery | Accepted | security, mcp, go, grpc |
| ADR-1020 | acq_rel memory ordering on ref-count decrement + mutex-destroy-after-unlock + picture-pool unlock ordering | Accepted | correctness, threading, core |
| ADR-1021 | Constant-time session-token comparison + JWT nbf-claim validation | Accepted | security, auth |
| ADR-1022 | Cast dst_buf_read_sz operands to size_t in y4m_input to prevent signed-integer overflow | Accepted | core, security, correctness, tools, c |
| ADR-1023 | MCP server asyncio correctness — async wrappers for blocking I/O | Accepted | mcp, asyncio, python, correctness |
| ADR-1024 | R6 per-metric scoring guards — PSNR/ADM correctness fixes | Accepted | correctness, psnr, adm, simd |
| ADR-1025 | R6 CUDA/HIP kernel correctness fixes | Accepted | correctness, cuda, hip, simd |
| ADR-1026 | R6 SYCL kernel correctness — rd-stride OOB and unchecked graph_wait | Accepted | correctness, sycl |
| ADR-1030 | HIP adm_decouple dangling body + VIF wavefront 32-bit carry + Metal motion vertical halo | Accepted | hip, metal, correctness, gpu |
| ADR-1032 | vmaf_init double-init guard and vmaf_close pointer-contract documentation | Accepted | api, correctness, memory-safety |
| ADR-1033 | CPU-side scoring NaN/UB guards across PSNR/SSIM/MS-SSIM/ADM/CAMBI/MOTION | Accepted | correctness, cpu, psnr, ssim, adm, cambi, motion |
| ADR-1034 | Fix SYCL integer_vif rd_stride OOB on odd widths and integer_motion UV queue sync gap | Accepted | sycl, correctness, gpu |
| ADR-1035 | CI workflow concurrency guards and job timeouts | Accepted | ci, security, supply-chain |
| ADR-1036 | Correct SPDX license identifiers and add missing libsvm copyright | Accepted | license, security, supply-chain |
| ADR-1038 | MCP cross-surface precision-default and probe-precision drift | Accepted | mcp, correctness, cross-surface |
| ADR-1039 | Fix CERT MEM04-C realloc OOM safety in vendored libsvm | Accepted | memory-safety, correctness, vendored |
| ADR-1040 | Promote integer_ssim_moments_t to shared header (macOS / Windows arm64 build fix) | Accepted | build, simd, arm64, macos, windows, integer-ssim, fork-local |
| ADR-1041 | Fix CI RED — Go metal option type + Rust AVX-512 test guard | Accepted | ci, build, go, rust, avx512, metal |
| ADR-1042 | Containerfile hardening — non-root USER + build-time DEBIAN_FRONTEND | Accepted | containerfile, security, docker, ci, hardening |
| ADR-1047 | Helm chart schema and values.yaml correctness fixes (R9 batch) | Accepted | helm, k8s, bug |
| ADR-1048 | vmaf-tune ladder --duration sentinel dest mismatch fix | Accepted | vmaf-tune, bug, cli |
| ADR-1049 | Exponential backoff for vmafx-node online-feedback drainLoop | Accepted | go, grpc, node, bug |
| ADR-1051 | Port upstream batch-threading + picture-pool defaults (dff4082b + 46d3a154) | Accepted | upstream-port, scoring, threading, correctness |
| ADR-1052 | Re-register CPU motion_v2 extractor and fix post-flush test ordering | Accepted | core, motion, test, build |
| ADR-1053 | Default docker-compose runtime to nvidia and expand GPU capabilities | Accepted | dev, cuda, docker, build |
| ADR-1056 | Use /std:c++latest on MSVC instead of cpp_std=c++23 | Accepted | build, ci, windows, msvc |
| ADR-1057 | Revert float-ADM SIMD dispatch wiring (PR #685) — NEON FMA divergence unfixable in scope | Superseded by fix/neon-fma-safe-float-adm-dwt2 (float_adm_dwt2_neon.c) | simd, neon, float-adm, revert, correctness |
| ADR-1058 | Helm chart security hardening — PDB, RBAC split, metrics NetworkPolicy, schema tightening | Accepted | helm, k8s, rbac, security, networkpolicy |
| ADR-1060 | Round 10 C++23 wave error-path cleanup | Accepted | cpp23, correctness, memory, error-handling, fork-local |
| ADR-1061 | Fix depth-limit, integer-overflow, and banned-function bugs in vendored pdjson and cJSON | Accepted | security, vendored, mcp, c, libvmaf, fork-local |
| ADR-1063 | Rust clippy strictness — scoped bindings suppression, unsafe_op lint, no panicking Default | Accepted | rust, lint, safety, workspace |
| ADR-1064 | Wire score_fmt option on all vmaf FFmpeg filters | Accepted | ffmpeg, build, api |
| ADR-1065 | Go staticcheck r10: replace time.After in poll loops with time.NewTicker (timer-leak fix); add MaxBytesReader + 413 mapping to controller /v1/score; add ReadTimeout to both HTTP servers | Accepted | 2026-06-06 |
| ADR-1066 | Regression tests for the sequential-realloc double-free in libsvm | Accepted | test, security, ci |
| ADR-1068 | Fix fast-path data race in gpu_dispatch_env.cpp via atomic publication flag | Accepted | core, correctness, thread-safety, cpp23 |
| ADR-1069 | Operator CRD status-schema gaps and VmafxNode LastHeartbeat ownership | Accepted | operator, crd, k8s, bug |
| ADR-1071 | Promote HIP ms_ssim_vert_lcs to double precision (ADR-0990 parity) | Accepted | hip, precision, ms_ssim, cross-backend-parity |
| ADR-1072 | Fix PREV_REF refcount leak in threaded batch and serial dispatch paths | Accepted | core, threading, memory, picture-pool, bug |
| ADR-1073 | Fix vmaf_score_at_index EAGAIN-guard misapplication for model output slots | Accepted | mcp, scoring, correctness, core, fork-local |
| ADR-1074 | Helm chart values completeness — missing knobs and schema gaps | Accepted | helm, k8s, bug |
| ADR-1075 | MCP HTTP transport POST /v1/score body-validation edge cases | Accepted | mcp, security, correctness, http, fork-local |
| ADR-1077 | vmaf-tune corner cases: parse_versions + compare preset | Accepted | |
| ADR-1078 | ms_ssim option parity across HIP and SYCL backends | Accepted | hip, sycl, cuda, ms_ssim, parity |
| ADR-1079 | TSan-eligible thread-safety test for threaded_extract_batch_func | Accepted | ci, test, threading |
| ADR-1080 | UBSan enum-invalid-value fixes in vmaf_log and vmaf_option_set | Accepted | ci, sanitizer, build |
| ADR-1081 | vmaf_bench correctness — unchecked alloc returns and wall-clock timer | Accepted | tools, bench, correctness, clock |
| ADR-1083 | y4m_input_fetch_frame signed-integer overflow + fread(NULL) UB fixes | Accepted | core, security, correctness, tools, c, fork-local, bugfix |
| ADR-1084 | Use filepath.SplitList for VMAF_MCP_ALLOW path-list parsing | Accepted | build, windows, mcp |
| ADR-1085 | MCP streaming backpressure — kill child processes on client disconnect | Accepted | mcp, security, go, python |
| ADR-1086 | CI Workflow Least-Privilege Permissions Audit | Accepted | ci, security |
| ADR-1087 | Extend test coverage for pkg/storage and cmd/vmafx-node/bpf | Accepted | test, storage, ebpf, coverage |
| ADR-1088 | CLI flag-parsing hardening — parse_unsigned overflow/negative guards and --help in cli_parse.cpp | Accepted | cli, security, correctness |
| ADR-1089 | Block non-standard ONNX operator domains in the DNN wire scanner | Accepted | security, ai, dnn, fork-local |
| ADR-1090 | Fix CUDA stream and event leaks on init error paths | Accepted | cuda, security, testing |
| ADR-1092 | framesync producer-death deadlock — abort flag + shutdown broadcast | Accepted | core, threading, correctness, sanitizer, fork-local |
| ADR-1093 | Disable two recurring-failure tests via should_fail while root cause is under investigation | Accepted | ci, testing, flaky, picture-pool, sycl, fork-local |
| ADR-1094 | Helm chart rolling-update correctness — node strategy, PDB default, probe fix, grace period | Accepted | helm, kubernetes, deploy, fork-local |
| ADR-1095 | Fix OTel trace context propagation across gRPC boundaries | Accepted | |
| ADR-1096 | Doxygen @brief/@param coverage for core internal headers | Accepted | docs, maintainability |
| ADR-1097 | Atomic file writes for AI-script cache and output files | Accepted | ai, correctness, reliability |
| ADR-1099 | Propagate -fsycl via sycl_dependency to fix test-binary SIGSEGV | Accepted | |
| ADR-1100 | Skip GPU-flagged extractors when flags == 0 in vmaf_get_feature_extractor_by_feature_name | Accepted | feature-extractor, correctness, sycl, cuda, hip, bug-fix, fork-local |
| ADR-1101 | Change vmaf container user GID/UID from 1000 to 2000 | Accepted | build, ci, workspace |
| ADR-1102 | Container-only canonical artifact publishing (Phase 4b.9) | Accepted | container, build, release, publish, phase4b, docs-policy, fork-local |
| ADR-1103 | Fix integer_vif_hip boundary condition: all filter-loop reads used clamp_i (replicate-edge), disagreeing with CPU PADDING_SQ_DATA and the CUDA twin's two-bounce mirror-reflect. The mismatch produced max \|HIP−CPU\| ≈ 0.0018 per VIF scale (places~2.75) on the Netflix src01 pair, violating the ADR-0214 places=4 gate. Replaces clamp_i with mirror2_i in all six filter-loop boundary reads; confirmed places~6 (max 1e-6) on gfx1030 wave32 hardware across all 48 src01 frames. Tightens test_hip_vif_parity.c tolerance from 1e-3 to 1e-4. |
| ADR-1104 | Remove AVX-512 dispatch from float VIF convolution to restore Netflix golden scores | Accepted | simd, correctness, float-vif, bug-fix |
| ADR-1105 | fr_regressor_v2_ensemble production flip deferred to the one-shot post-RC retrain | Accepted | ai, models, rc, docs |
| ADR-1106 | HIP integer_motion_v2 mirror corrected from 2*size-idx-1 to reflect-101 2*size-idx-2, matching the CPU/CUDA/SYCL twins. Supersedes ADR-0377's incorrect claim that the -1 form matched CPU/CUDA (all three use -2; identical call sites, so it was a one-pixel high-boundary divergence — same class as ADR-1103's HIP VIF fix). Also adds the missing shape[d] <= 0 guard in build_input_tensor (DNN infer path). HIP-only; no CPU golden impact. | Accepted | 2026-06-13 |
| ADR-1107 | Fix multi-PREV_REF starvation in threaded_read_pictures_batch: the batch struct-copied a single prev_ref snapshot into the first temporal extractor and zeroed the shared snapshot, so a 2nd PREV_REF extractor (e.g. motion_v2, which co-schedules motion) saw NULL prev_ref and returned -EINVAL every frame — failing 100% of --threads N extractions involving motion_v2 (incl. the K150K retrain corpus). Fix: each PREV_REF extractor takes its own vmaf_picture_ref(); snapshot released once at unref:. Regression test test_batch_two_prev_ref_extractors. Scores unchanged. | Accepted | 2026-06-13 |
| ADR-1108 | CUDA motion_v2 twin emits motion3_v2_score | Accepted | cuda, feature, parity |
| ADR-1109 | vmafx-node Serve() registers the VmafxScoring gRPC service (Score + ScoreStream + Health) — turning the Phase-4b.4 listen-only stub into a directly-dispatchable scoring endpoint that reuses the shared pkg/libvmaf engine, with graceful shutdown. The controller-pull worker loop (ADR-0713) is a separate client role. | Accepted | 2026-06-13 |
| ADR-1110 | Add delta_e_itp (ΔE-ITP, ITU-R BT.2124-0), a CPU-only HDR/WCG full-reference colour-difference feature extractor (feature key delta_e_itp), filling the fork's HDR colour-fidelity gap (ciede is SDR/BT.709). RC scope: PQ (ST-2084) transfer only — HLG / BT.1886 deferred (single-sourced constants); transfer=hlg/bt1886 rejected with -EINVAL. Options transfer(=pq), matrix(bt2020/bt709), range(limited/full); mean per-pixel pooling; YUV400 rejected; double-precision math, no out-of-gamut clamping. Mirrors ciede.c; PQ EOTF in delta_e_itp_math.h. Unit test asserts the BT.2124-0 Annex 4 full-precision ITP triple at places=4. | Accepted | 2026-06-14 |
| ADR-1112 | NIQE no-reference CPU feature extractor (niqe) replicating the fork Python harness byte-for-byte against the in-tree pristine model niqe_v0.1.pkl. Load-bearing fork divergences from upstream NIQE: the AGGD N carries a trailing *aggdratio factor and the MSCN maps + PIL-bicubic half-res are round-tripped through float32. NR posture: scores the distorted picture only (ref / *_90 discarded), mirroring CAMBI. Matches the harness at places=4+ on natural content. | Accepted | 2026-06-14 |
| ADR-1111 | Add PU21 HDR perceptual metric — a CPU pu21 extractor providing pu21_psnr + pu21_ssim. Encodes the luma plane through the PU21 transfer function (canonical banding_glare variant, all four selectable via variant) onto a perceptually-uniform scale, then scores PSNR (peak=256, no SDR cap) and a self-contained L=256 Gaussian SSIM. RC ships PQ (ST.2084) input only (transfer option defaults pq, rejects others with -EINVAL; HLG/SDR deferred). Critically, PU-SSIM uses its OWN L=256 SSIM (pu21_ssim.c), NOT a modification of the golden float_ssim/iqa_ssim (L=255, feeds Netflix assertions). All per-pixel math is double precision. Encoder + PU-PSNR(100,99)=51.873338803 dB verified at places=4 vs the fp64 dossier oracle (test_pu21). | Accepted | 2026-06-14 |
| ADR-1113 | Vendor the Pelorus interop ABI (data-plane side-data blob: pack/parse, deband param contract, version accessors) into vmafx as a pinned, read-only, append-only mirror of VMAFx/pelorus@835e097 — three .c under core/src/interop/, three headers under core/include/libvmaf/pelorus/, the shared 7-vector conformance fixture as core/test/test_pelorus_interop.c. Single source of truth stays in pelorus (ADR-0103); a pinned drift guard (scripts/sync-pelorus-interop.sh, reads the pinned git tree object) fails CI on divergence; vendored files are lint/format-excluded to keep them byte-identical. CPU-only, dependency-free, no Vulkan. Alternatives (submodule / meson-subproject / hand-reimplement) rejected to avoid build coupling + diverging parsers. | Accepted | 2026-06-14 |
| ADR-1114 | Y-FUNQUE+ wavelet-domain atom features only (y_funque_plus, CPU-only, temporal) — emits y_funque_plus_ms_ssim, y_funque_plus_dlm, y_funque_plus_mad. Per-frame: 2x OpenCV INTER_CUBIC (Keys cubic a=-0.75) pre-downscale, 2-level Haar DWT (pywt 'periodization'), Nadenau Y-channel CSF weighting of detail subbands only, then the three atoms (MS-SSIM cov pooling, DLM detail-loss, MAD-Ref temporal). The DLM num/den abs-asymmetry is replicated exactly (num pools rest^3 without abs, den pools ref with abs). The fused ScaledSVR MOS score is deferred — upstream funque_plus ships no frozen regressor (trains per-dataset at runtime), so a fused number would be fork-originated and needs a licensed subjective dataset + model card. License finding: funque_plus is MIT (Copyright (c) 2023 Abhinau Kumar), BSD-2-Clause-Patent-compatible; the C is a clean-room reimplementation from the papers (arXiv:2304.03412, 2202.11241) cross-checked against the MIT reference, no source copied verbatim. Double-precision, -ffp-contract=off. Unit test asserts per-atom oracles at places=4 (independently re-derived against a pywt+OpenCV reference) plus odd-dim + crop-path regressions. | Accepted | 2026-06-14 |
| ADR-1115 | BRISQUE no-reference opinion-aware CPU feature extractor (brisque), bundling the canonical LIVE-lab allmodel (libsvm EPSILON_SVR) under a documented research-use attribution exception (NOTICE + model card citing Mittal/Moorthy/Bovik TIP 2012). Replicates the gregfreeman MATLAB pipeline that trained the model — GGD for the MSCN field (not krshrimali's AGGD), Gaussian sigma=7/6 (not truncated 1.166), MATLAB antialiased-bicubic half-res — NOT the krshrimali C++ port. Range-scales with the inline computescore.cpp arrays (NOT the conflicting allrange file); no output clamp; plain svm_predict. First feature-extractor consumer of the vendored libsvm. NR posture: scores the distorted picture only. SDR-luma trained; PQ/HLG HDR out of scope (warns + scores as SDR). | Accepted | 2026-06-14 |
| ADR-1116 | vmaf-tune autotune prefilter control plane — new filter_adapters/ family (sibling to codec_adapters/) with a FilterAdapter Protocol + pelorus_deband.py hard-coding the 10 frozen knobs from the Pelorus ADR-0110 contract and emitting -vf pelorus_deband_vulkan=...; new vmaf-tune prefilter subcommand drives a JOINT Optuna TPE search over the deband knob space + CRF (reusing the fast.py TPESampler study) with VMAF as the oracle, returning recommended strengths + CRF + per-probe VMAF. vmafx stays Vulkan-free (emits the string, scores the output); the live deband→encode→score loop is gated behind pelorus_filter_available(). Unit-tested (adapter emission/validation, search-space, mocked subcommand loop); live encode untested here (no pelorus-enabled ffmpeg build). | Accepted | 2026-06-14 |
| ADR-1117 | MCP vmaf_score / vmaf_score_encoded gain optional tiny-AI/DNN (tiny_model, tiny_device/--dnn-ep, tiny_threads, tiny_fp16, tiny_model_verify, tiny_codec, tiny_preset, tiny_crf, tiny_resize, no_reference NR mode), feature-selection (feature repeatable, aom_ctc/nflx_ctc presets), and score-completeness (threads, frame_cnt, frame_skip_ref/frame_skip_dist, no_prediction) parameters, closing the fork's largest MCP capability gap (Tiny-AI was 0% reachable). All params optional + backward-compatible; forwarded to the vmaf CLI only when set. Go (cmd/vmafx-mcp) and Python (mcp-server/vmaf-mcp) schemas verified byte-identical. NR mode makes ref optional and is gated on tiny_model. | Accepted | mcp, ai, docs, agents, fork-local |
| ADR-1118 | Pelorus side-data perceptually re-weights VMAF spatial pooling (opt-in, golden-isolated). The vendored interop parser (ADR-1113) reads each frame's per-cell banding-risk / variance maps; a normalized [0,1] salience becomes a per-frame pooling weight w = 1 + strength·salience, turning pooled MEAN/HARMONIC_MEAN into their weighted forms (MIN/MAX unaffected). Golden-gate isolation (load-bearing): inert unless BOTH the perceptual_weight opt-in is set AND a valid Pelorus blob is present for the frame; otherwise w ≡ 1.0 and the pooling runs the literal upstream expression, so the no-side-data path (and the Netflix golden pairs, which carry no side-data) is byte-identical — proven by test_perceptual_weight.c. New C-API vmaf_set_perceptual_{weight_enabled,weight_strength,sidedata} (core/include/libvmaf/perceptual_weight.h); weight module core/src/feature/perceptual_weight.c; vf_libvmaf reader + perceptual_weight AVOption (ffmpeg-patch 0017). R1–R6 compat: min(known_size,dir.size) reads, unknown bits ignored, grid==0 degrades to frame-level scalar, ABI-major mismatch rejected (unweighted + log). CPU-only, no GPU. Alternatives (auxiliary-feature / both) rejected — maintainer chose spatial-pooling weighting. | Accepted | 2026-06-14 |
| ADR-1119 | Adopt the github.com/golusoris/golusoris fx framework (pinned v0.4.0) across all six vmafx Go binaries + pkg/. Every binary becomes an fx.New(...).Run() over golusoris modules (Core/otel/HTTP/grpc/k8s-operator/clikit), sharing a vmafx-local internal/app/bootstrap stanza (Base + FxLogger). RC-blocking, phased services-first (vmafx-server → controller → node → operator, then mcp/tune + pkg sweep). Keeps the VMAFX_ config env-prefix via fx.Replace(config.Options{...}) (not golusoris's default APP_) and keeps the controller's embedded SQLite queue (not golusoris.Jobs/Postgres). Dependency closures already align byte-for-byte, so the go.mod merge is low-risk (go build ./... clean post-go get). Four framework gaps filed upstream were all integrated by the maintainer (#225 gRPC interceptor injection, #226 version module, #227 operator SetLogger + webhook, #234 log reads log.level from config); the pin was bumped v0.3.1→v0.4.0. Only #226 + the k8s/operator module are in the v0.4.0 tag; #225/#227/#234 are merged to golusoris main but untagged, so the service binaries carry small interim shims (the #234 log-env bridge; the #227 operator SetLogger + webhook gate) and the controller (needs #225) follows the next golusoris tag. | Accepted | 2026-06-14 |
| ADR-1120 | Re-pin the vendored Pelorus interop ABI (ADR-1113) from pelorus@835e097 (ABI 1.0) to pelorus@818d844 (ABI 1.3) and consume the new PEL_SEC_COMPLEXITY per-frame section in perceptual weighting (ADR-1118). Minor-3 appends three sections (QPREPORT 1.1, MOTION_CONF 1.2, COMPLEXITY 1.3) and three files; the mirror gains denoise.h, denoise_params.c, qp_report_csv.c (the last is required for the minor-3 conformance fixture to link pel_x265_csv_parse). The PelorusSideData header layout is unchanged (append-only R1) and the parser keys on abi_major, so it is a deliberate re-pin, not a break. Fixes the sync-script --update bug that re-vendored the six manifest files but not the conformance-fixture body (the immediately-following drift check then failed). Complexity modulation: perceptual_weight.c attenuates the banding salience by (1 − 0.5·complexity) floored at 0.25 — banding is more visible on flat/simple frames and masked on busy/textured frames. Golden-isolated (load-bearing): engages only when the opt-in is set AND a complexity section is present; absent section → factor 1.0 → byte-identical pooling, so the Netflix golden 576×324 pair scores 76.667831 unchanged (verified). Proven by test_complexity_modulates_weight / _grid_zero (toggle-proven). New vendored files inherit the existing prefix-glob lint/format exclusions. CPU-only, no GPU. Alternatives (trim-fixture, threshold/step modulation, replace-banding) rejected. | Accepted | 2026-06-27 |
| ADR-0452 | Hoist VIF 10-plane scratch buffer from per-frame allocation to VifState init/close lifecycle; eliminates ~79 MB/frame allocator traffic at 1080p. | Accepted | perf, vif, cpu, build, fork-local |
| ADR-0460 | Dispatch-strategy registry audit: deduplicate SYCL/Vulkan feature registry rows and align HIP/Metal dispatch-support tables with registered extractor names. | Accepted | dispatch, hip, metal, sycl, vulkan, correctness |
| ADR-0539 | Per-kernel hip_cu_extra_flags keeps SSIMULACRA2 recursive blur FP contraction disabled on HIP so parity stays inside the ADR-0214 gate. | Accepted | hip, build, ssimulacra2, numerics, fork-local |
| ADR-0567 | Port Netflix upstream 30a6e2a8d direct-read CLI path, avoiding the intermediate video_input_ycbcr buffer when USE_DIRECT_READ is enabled. | Accepted | upstream-port, performance, tools, cli, build |
| ADR-0764 | psnr_hvs CUDA kernel: F3 __ldg() + __restrict__ pointer extraction + __launch_bounds__(64) (PR #96 candidate #5, mirrors ADR-0754) | Accepted | cuda, perf, psnr_hvs, fork-local |
| ADR-0866 | Wire markdownlint-cli2 into make lint + pre-commit + CI (touched-file scope) | Accepted | 2026-05-30 |
| ADR-0982 | GPU runtime bug audit — round 26: 6 init/teardown leak fixes across CUDA + SYCL + shared GPU TUs (drain stream / picture stream + events + device pointers / CUDA func table / picture-pool partial-init / SYCL extractor-vs-queue ordering / VA readback exception unwind) | Accepted | 2026-05-31 |
| ADR-0993 | KoNViD / UGC / BVI-DVC saliency batch manifests: in-tree ai/batch-manifests/saliency/ with runnable KoNViD-150K manifest and scaffolded UGC/BVI stubs documenting path-column blocking gaps | Accepted | 2026-06-03 |
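
The one-pixel boundary divergence described in the ADR-1106 row can be illustrated with a minimal Python sketch. It is not the fork's HIP or CPU code: the function names `mirror_reflect_101` and `mirror_symmetric` are hypothetical, the low-boundary branches are assumed from the usual conventions for each mode (the row only states the high-boundary formulas), and both helpers handle a single bounce only.

```python
def mirror_reflect_101(idx: int, size: int) -> int:
    """Reflect without repeating the edge sample (the 2*size-idx-2 form)."""
    if idx < 0:
        return -idx  # assumed low-side convention for reflect-101
    if idx >= size:
        return 2 * size - idx - 2
    return idx


def mirror_symmetric(idx: int, size: int) -> int:
    """Reflect while repeating the edge sample (the 2*size-idx-1 form)."""
    if idx < 0:
        return -idx - 1  # assumed low-side convention for symmetric mirroring
    if idx >= size:
        return 2 * size - idx - 1
    return idx


if __name__ == "__main__":
    size = 5
    # First out-of-range tap past the high boundary of a 5-sample row.
    print(mirror_reflect_101(size, size))  # 3
    print(mirror_symmetric(size, size))    # 4
```

For the same out-of-range tap the two forms read neighbouring samples, which is why a kernel using one form cannot match a reference using the other at the boundary, even though interior pixels agree.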