ADR-0304: `vmaf-tune fast` — production wiring (Optuna TPE + v2 proxy + GPU verify)¶

Status: Accepted
Date: 2026-05-05
Deciders: Lusoris
Tags: tooling, ai, ffmpeg, codec, automation, fork-local

Context¶

ADR-0276 shipped the vmaf-tune fast subcommand as a scaffold under tools/vmaf-tune/src/vmaftune/fast.py: smoke mode runs Optuna over a synthetic CRF→VMAF curve, but smoke=False raises NotImplementedError because no production proxy or verify pass was wired. The scaffold deliberately deferred the production loop to a follow-up gated on two prerequisites:

A real Phase A corpus. PR #392 produced hw_encoder_corpus.py (33,840 per-frame canonical-6 rows across 9 Netflix sources × NVENC + QSV); see ADR-0291.
A production fr_regressor_v2. ADR-0291 flipped the v2 row in model/tiny/registry.json from smoke: true to smoke: false, shipping the trained ONNX (sha256 67934b0b…) at LOSO PLCC ≥ 0.95. The model predicts VMAF from six canonical libvmaf features (adm2, vif_scale0..3, motion2) plus a 14-D codec block (12-way encoder one-hot + preset_norm + crf_norm).

Both prerequisites are now satisfied. This ADR records the decision to wire the production loop on top of the scaffold without relaxing the scaffold's invariants — fast stays opt-in, the slow grid stays canonical, and the smoke-mode entry point keeps working on hosts without Optuna or onnxruntime installed.

Research-0076 is the companion digest. It walks the n_trials-vs-convergence trade-off for TPE on a CRF axis (typically 30–50 trials before diminishing returns), the proxy-vs-truth correlation budget (mean absolute VMAF gap ≤ 0.5 expected at v2's PLCC of 0.9794), and the single-pass GPU-verify cost (one ffmpeg encode + one libvmaf score at the chosen CRF, end-to-end seconds on CUDA / Vulkan / SYCL).

Decision¶

We will wire the production fast-path loop as follows:

Search strategy: Optuna TPE. The scaffold already imports Optuna; we keep optuna.samplers.TPESampler(seed=0) as the default sampler. Bayesian search beats grid + random on an integer-CRF axis at this trial budget; CMA-ES is overkill for a single integer dimension.
Proxy scorer: fr_regressor_v2. The proxy is the production ONNX shipped in ADR-0291 (no smoke models). A new vmaftune.proxy.run_proxy(features, codec_block) -> float helper loads the registry-pinned ONNX session lazily and runs inference; the helper is the single seam every consumer goes through, so future ensemble / probabilistic-head migrations (ADR-0279 follow-up) land in one place.
Single GPU verify pass at recommend-end. After TPE converges, the harness runs one real encode + libvmaf score at the recommended CRF using the GPU score backend selected via vmaftune.score_backend.select_backend(prefer) (ADR-0299). The proxy score and the verify score are both reported; the verify score is authoritative, the proxy score is logged as a diagnostic. Proxy alone never wins — the contract is one verify pass, always, regardless of how confident the proxy looks.
Smoke mode keeps working. fast_recommend(..., smoke=True) still routes through the synthetic _smoke_predictor and skips the verify pass. CI on hosts without onnxruntime / Optuna / a GPU continues to exercise the search-loop wiring end-to-end.

The recommended-CRF result gains two new fields beyond the scaffold's shape: verify_vmaf (the real libvmaf score from the GPU pass) and proxy_verify_gap (abs(predicted_vmaf - verify_vmaf)). When the gap exceeds a configurable tolerance the CLI exits non-zero so the operator knows the proxy was OOD on this source — this is the explicit fallback signal to the slow Phase A grid (ADR-0276 "Negative consequences" mitigation).

Alternatives considered¶

Option	Pros	Cons	Why not chosen
Optuna TPE (chosen)	Bayesian convergence in 30–50 trials on an integer CRF axis; scaffold already imports it; deterministic with seed; proven on similar tuning loops	Optional Optuna dep (already in `[fast]` extra)	Picked: best convergence-per-trial on this dimension; matches ADR-0276 scaffold
CMA-ES	Robust on continuous high-dim spaces	Overkill for a single integer dimension; CRF discretisation defeats its strength; no convergence advantage at 30–50 trials	Rejected: wrong tool for the problem geometry
Random + early-stop	Zero new search complexity; trivially parallelisable	Needs ~3× more trials than TPE for the same convergence; no Bayesian "sharpening" near the target VMAF	Rejected: wastes the proxy budget on uninformative samples

Consequences¶

Positive:
The recommendation use case becomes seconds-to-minutes (TPE typically 30–50 trials × proxy inference µs + 1 GPU verify pass) instead of hours (Phase A grid).
The proxy/verify gap surfaces in the result, giving the operator an explicit OOD signal without silent failure.
Production fr_regressor_v2 finally has a user-facing application, validating ADR-0291's ship gate end-to-end.
Smoke mode is preserved as the CI-friendly path.
Negative:
vmaf-tune[fast] now needs onnxruntime in addition to Optuna for the production path. CI hosts without onnxruntime fall back to smoke mode; tests skip the production-path cases via pytest.importorskip.
The proxy is bounded by v2's calibration. Out-of-distribution sources (HDR, highly synthetic content, novel codecs not in ENCODER_VOCAB v2) can produce a large proxy/verify gap. Mitigation: the gap is logged and gates the exit code.
Neutral / follow-ups:
NVENC / QSV / AMF auto-detection (lever C in ADR-0276) remains a follow-up. Today the verify pass uses the same encoder the proxy was trained on.
Probabilistic-head proxy (ADR-0279, ensemble) is a follow-up. The run_proxy seam abstracts the inference call so the swap is one-file.
A small recommendation-quality benchmark — for ≥3 sources, run both Phase A grid and fast, report the verify VMAF gap at the recommended CRF — must land before Status flips to Accepted.

References¶

req: user prompt — paraphrased: "wire the vmaf-tune fast-path production code paths now that fr_regressor_v2 is production — Optuna TPE search using the v2 proxy, then a single GPU-verify pass" (paraphrased per the user-quote handling rule).
ADR-0276 — fast-path scaffold (this ADR's parent).
ADR-0237 — vmaf-tune umbrella spec.
ADR-0291 — fr_regressor_v2 production ship (this ADR's proxy prerequisite).
ADR-0299 — GPU score backend selection (this ADR's verify-pass enabler).
Research-0076 — companion digest.

ADR-0304: vmaf-tune fast — production wiring (Optuna TPE + v2 proxy + GPU verify)¶