ADR-0392: vmaf-tune Phase D — per-shot CRF tuning
- Status: Accepted (CLI bisect wiring landed 2026-05-14; native per-codec emission still deferred)
- Date: 2026-05-03
- Deciders: Lusoris
- Tags: tooling, ai, ffmpeg, codec, automation, fork-local
Context
Phase A of vmaf-tune (corpus tooling, ADR-0237, PR #329) shipped the grid-sweep harness. Research-0061, the vmaf-tune capability audit, ranked per-shot CRF tuning (Bucket #1) as the table-stakes Netflix-equivalent feature: the canonical 2018 paper reports 10–30 % bitrate savings at constant VMAF when CRF is varied per shot instead of held flat across the title.
The fork already ships every component the orchestration needs:
- TransNet V2 real-weights ONNX model at model/tiny/transnet_v2.onnx (ADR-0223), consumed by the C-side vmaf-perShot CLI (ADR-0222).
- Phase A corpus + codec adapter contract at tools/vmaf-tune/src/vmaftune/.
- Phase B target-VMAF bisect: its predicate shape (shot, target_vmaf, encoder) -> (crf, measured_or_predicted_vmaf) is the hook the per-shot loop calls per shot.
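The predicate seam described in the last bullet can be pictured as a plain callable. The sketch below is illustrative only: the `Predicate` alias, `constant_predicate`, and `run_predicate_per_shot` are hypothetical names, not the fork's API, and the fixed CRF of 23 is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Shot:
    start_frame: int  # inclusive
    end_frame: int    # exclusive (half-open range)


# Assumed shape of the seam: (shot, target_vmaf, encoder) -> (crf, vmaf).
Predicate = Callable[[Shot, float, str], Tuple[int, float]]


def constant_predicate(shot: Shot, target_vmaf: float, encoder: str) -> Tuple[int, float]:
    """Hypothetical stand-in: always picks CRF 23 and echoes the target as the VMAF."""
    return 23, target_vmaf


def run_predicate_per_shot(
    shots: List[Shot], target_vmaf: float, encoder: str, predicate: Predicate
) -> List[Tuple[Shot, int, float]]:
    """Call the predicate once per shot and collect (shot, crf, vmaf) triples."""
    return [(shot, *predicate(shot, target_vmaf, encoder)) for shot in shots]


plan = run_predicate_per_shot(
    [Shot(0, 48), Shot(48, 120)], 93.0, "libx264", constant_predicate
)
```

Because the loop only depends on the callable's shape, a real bisect and a mocked predicate are interchangeable from the orchestrator's point of view.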
What is missing is the orchestration layer that ties shot detection to per-shot CRF selection to FFmpeg encode + concat. This ADR originally shipped that layer as a scaffold with a stable public API, two explicitly-pluggable integration seams, and mocked smoke coverage.
The scaffold was originally gated by three follow-up dependencies that we deliberately did not ship in the same PR. Phase B bisect has since landed (see the status update below), leaving two:
- Per-codec native per-shot emission (x264 --qpfile, x265 --zones, SVT-AV1 segment tables) must land per-codec. The default path uses per-segment encode plus concat-demuxer, which is portable but loses GOP-alignment efficiency.
- A held-out per-shot validation corpus must exist so we can claim numerical wins; without it, this PR ships zero quality claims.
Status update 2026-05-14: Phase B bisect has landed and the vmaf-tune tune-per-shot CLI now binds the predicate seam to bisect_target_vmaf by default. The CLI extracts each detected half-open shot to a temporary raw-YUV reference, runs the real encode+score bisect for that shot, and records the measured VMAF in the JSON plan. --predicate-module MODULE:CALLABLE remains the explicit custom/test hook; the library-only default predicate still returns the adapter default CRF for dry-run callers that invoke tune_per_shot() without geometry.
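To make the per-shot bisect concrete, here is a minimal sketch of a target-VMAF search over integer CRF values. It is not the implementation of bisect_target_vmaf: `bisect_crf`, `measure`, and `toy_measure` are hypothetical names, the 0–51 range is the x264 8-bit CRF range used as a default assumption, and the search assumes VMAF never increases as CRF increases.

```python
from typing import Callable, Tuple


def bisect_crf(
    measure: Callable[[int], float],
    target_vmaf: float,
    crf_min: int = 0,
    crf_max: int = 51,
) -> Tuple[int, float]:
    """Return the largest CRF in [crf_min, crf_max] whose VMAF meets the target.

    Assumes measure(crf) is non-increasing in crf. If even crf_min misses the
    target, returns crf_min with its measured score.
    """
    best_crf, best_vmaf = crf_min, measure(crf_min)
    if best_vmaf < target_vmaf:
        return best_crf, best_vmaf  # target unreachable at best quality
    lo, hi = crf_min, crf_max
    while lo <= hi:
        mid = (lo + hi) // 2
        vmaf = measure(mid)
        if vmaf >= target_vmaf:
            best_crf, best_vmaf = mid, vmaf  # still meets target, try higher CRF
            lo = mid + 1
        else:
            hi = mid - 1  # too lossy, back off
    return best_crf, best_vmaf


def toy_measure(crf: int) -> float:
    """Toy stand-in for an encode+score run: VMAF drops 0.5 per CRF step."""
    return 100.0 - 0.5 * crf


result = bisect_crf(toy_measure, 90.0)
```

In a real run each `measure` call is a full encode plus VMAF score of the extracted shot, so the logarithmic number of probes is what keeps the per-shot loop affordable.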
Decision
We will ship tools/vmaf-tune/src/vmaftune/per_shot.py as the Phase D orchestration layer with the following public surface:
- Shot(start_frame, end_frame): half-open frame range. The vmaf-perShot CSV/JSON sidecar uses inclusive end_frame; we normalise into the half-open form at the parse boundary.
- ShotRecommendation(shot, crf, predicted_vmaf).
- EncodingPlan(recommendations, encoder, framerate, segment_commands, concat_command, concat_listing): segments + the FFmpeg argv to realise them.
- detect_shots(video_path, *, per_shot_bin="vmaf-perShot", runner=...): calls the C-side binary; falls back to a single-shot range if the binary is missing or fails. runner is the test seam.
- tune_per_shot(shots, *, target_vmaf, encoder, predicate=None): drives the predicate per shot. CLI callers get the Phase-B bisect predicate by default; library callers that omit a predicate get the codec adapter's quality_default for deterministic dry runs.
- merge_shots(recs, *, source, output, framerate, encoder, segment_dir=..., ffmpeg_bin=...): emits one ffmpeg argv per shot (using -ss + -frames:v) plus a final concat-demuxer command.
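The half-open normalisation and the per-segment argv shape can be sketched as follows. This is an illustration under assumptions, not the contents of per_shot.py: `shot_from_inclusive`, `segment_argv`, `concat_listing`, and `concat_argv` are hypothetical helpers, placing -ss before -i (input seeking) is a guess at the layout, and passing -crf directly assumes an encoder such as libx264 that accepts it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shot:
    start_frame: int  # inclusive
    end_frame: int    # exclusive

    @property
    def length(self) -> int:
        return self.end_frame - self.start_frame


def shot_from_inclusive(start_frame: int, end_frame_inclusive: int) -> Shot:
    """Normalise an inclusive end frame into the half-open form."""
    return Shot(start_frame, end_frame_inclusive + 1)


def segment_argv(shot: Shot, crf: int, *, source: str, output: str,
                 framerate: float, encoder: str,
                 ffmpeg_bin: str = "ffmpeg") -> List[str]:
    """Build one ffmpeg argv that encodes a single shot at the given CRF."""
    start_seconds = shot.start_frame / framerate
    return [
        ffmpeg_bin, "-y",
        "-ss", f"{start_seconds:.6f}",     # seek to the shot start
        "-i", source,
        "-frames:v", str(shot.length),     # stop after the shot's frames
        "-c:v", encoder,
        "-crf", str(crf),
        output,
    ]


def concat_listing(segment_paths: List[str]) -> str:
    """Render the text file consumed by ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in segment_paths)


def concat_argv(listing_path: str, output: str,
                ffmpeg_bin: str = "ffmpeg") -> List[str]:
    """Build the final stream-copy concat command."""
    return [ffmpeg_bin, "-y", "-f", "concat", "-safe", "0",
            "-i", listing_path, "-c", "copy", output]


shot = shot_from_inclusive(48, 119)
argv = segment_argv(shot, 24, source="in.mp4", output="seg_0001.mp4",
                    framerate=24.0, encoder="libx264")
```

Normalising at the parse boundary means every later length computation is simply end minus start, with no off-by-one adjustments scattered through the argv builders.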
The CLI subcommand is vmaf-tune tune-per-shot. JSON plan to stdout by default, --plan-out and --script-out for files. Default CLI behaviour is real per-shot bisect unless --predicate-module is supplied.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Pluggable scaffold (chosen) | Stable public API; tests run with mocks; Phase B + per-codec emission land as drop-ins | Ships zero quality wins until follow-ups land; risk of stalling at "scaffold" status | Picked: matches Phase A's deliberately-scaffolded pattern; the alternative (gate Phase D on every dependency) blocks the audit's Bucket #1 indefinitely |
| Wait for Phase B + per-codec emission first | Single PR ships the full feature with measurable savings | Bundles three independent workstreams into one giant PR; gates each on the slowest; impossible to review incrementally | Rejected: the integration layer is independently valuable as a forcing function; landing it first surfaces interface mismatches in Phase B / codec adapters |
| Native --qpfile / --zones per codec from day one | Keeps the encoder's GOP and rate-control coherent across shot boundaries | Requires every codec adapter to grow per-shot emission before the orchestrator exists; couples Phase D's PR to the codec adapter cadence | Rejected: per-segment + concat is portable across every codec; native emission is a per-adapter optimisation that lands in the codec PRs |
| Inline shot detection (call ONNX Runtime directly from Python) | One fewer subprocess; tighter integration | Duplicates vmaf-perShot's detection logic; bypasses the C-side preprocessing the perShot binary already does (luma planes, 27×48 thumbnails) | Rejected: vmaf-perShot is the canonical detector; reusing it preserves a single point of truth for shot boundaries across the fork |
| Phase B bisect inlined into Phase D | Single PR ships the bisect + the per-shot loop together | Couples two independently-tested workstreams; doubles the test surface; Phase B already has its own PR (#347) | Rejected: keep the bisect surface in its own ADR / PR; the predicate seam is the contract between them |
| CLI binds existing Phase B bisect (2026-05-14 choice) | Retires the CLI scaffold without changing the library predicate API; reuses tested bisect monotonicity and adapter validation | Extracts each shot to temporary raw YUV before bisect, so long clips pay extra disk I/O | Picked: smallest real implementation now that Phase B exists; native per-codec zones remain a separate optimization |
| Make tune_per_shot() require a predicate | Prevents accidental adapter-default dry runs | Breaks existing programmatic smoke users and tests that intentionally exercise the dry-run API | Rejected: CLI is the user-discoverable production path; the library dry-run fallback is explicitly documented as non-production |
Consequences
- Positive:
  - Closes the audit's Bucket #1 scope at the orchestration layer.
  - Stable public API (detect_shots / tune_per_shot / merge_shots) that Phase B and per-codec emission can plug into without breaking callers.
  - Mocked smoke coverage: tests pass without ffmpeg, vmaf, or vmaf-perShot on PATH.
  - First end-to-end downstream consumer of TransNet V2's real weights; it exercises the vmaf-perShot binary surface.
- Negative:
  - Per-segment + concat encoding loses keyframe alignment with the encoder's natural GOP, an efficiency penalty vs native --qpfile / --zones. Acceptable for the portable path; the codec PRs replace it per-codec.
  - The bisect path extracts each shot to raw YUV first; operators should expect temporary disk usage proportional to the largest in-flight shot.
  - Two-language surface (Python orchestration calling a C-side binary) means a vmaf-perShot regression silently degrades to the single-shot fallback. Logged via a follow-up: emit a stderr warning when fallback fires in production runs.
- Neutral / follow-ups:
  - Per-codec PRs (x265, SVT-AV1, libaom, libvvenc) extend the codec adapter contract with an emit_per_shot_overrides hook (already declared in ADR-0237) that merge_shots will dispatch to instead of the per-segment + concat fallback.
  - Held-out per-shot validation corpus is a separate research item; likely BVI-DVC + Netflix Public + KoNViD subsets re-encoded through Phase A's grid sweep.
  - MCP integration (Phase F) gains a per_shot_plan tool once this scaffold lands.
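The stderr-warning follow-up noted under the negative consequences could look roughly like the sketch below. It is a hypothetical shape only: `detect_shots_with_warning` is not the fork's detect_shots, the zero-argument `runner` callable stands in for the real subprocess seam, and the `total_frames` parameter is an assumption about how the single-shot range would be sized.

```python
import io
import sys
from typing import Callable, List, Optional, TextIO, Tuple

ShotRange = Tuple[int, int]  # half-open (start_frame, end_frame)


def detect_shots_with_warning(
    total_frames: int,
    runner: Callable[[], List[Tuple[int, int]]],
    *,
    stream: Optional[TextIO] = None,
) -> List[ShotRange]:
    """Run the detector; on failure, warn loudly and return one whole-clip shot.

    runner is a hypothetical seam returning (start, inclusive_end) pairs and
    raising OSError or RuntimeError when the binary is missing or fails.
    """
    out = sys.stderr if stream is None else stream
    try:
        inclusive = runner()
        if not inclusive:
            raise RuntimeError("detector returned no shots")
    except (OSError, RuntimeError) as exc:
        print(
            f"warning: shot detection failed ({exc}); "
            "falling back to a single shot",
            file=out,
        )
        return [(0, total_frames)]
    # Convert inclusive end frames into the half-open form.
    return [(start, end + 1) for start, end in inclusive]


def missing_binary() -> List[Tuple[int, int]]:
    raise FileNotFoundError("vmaf-perShot")


captured = io.StringIO()
fallback = detect_shots_with_warning(240, missing_binary, stream=captured)
detected = detect_shots_with_warning(
    240, lambda: [(0, 47), (48, 239)], stream=captured
)
```

Routing the message through an injectable stream keeps the mocked smoke tests able to assert that the warning fired without touching the real stderr.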
References
- Source: req 2026-05-03: "Scaffold Phase D of vmaf-tune per PR #354's audit (Bucket #1, M effort, 'Netflix per-shot table-stakes')." Don't fully implement; ship a working scaffold design ADR.
- ADR-0237: vmaf-tune umbrella; this ADR is its first per-phase split.
- ADR-0222: C-side vmaf-perShot CLI surface consumed by detect_shots.
- ADR-0223: TransNet V2 real weights driving the shot detector.
- Research-0044: option-space digest covering encode-search strategies.
- Research-0061 (PR #354): vmaf-tune capability audit; Bucket #1 ranks per-shot tuning as the M-effort, High-impact next step.
- Netflix tech blog 2018: Per-Title Encode Optimization and the follow-on per-shot dynamic optimiser; the public 10–30 % bitrate savings figure motivating Bucket #1.