ADR-0588: vmaf-tune executor — per-shot and saliency execution modes
- Status: Accepted
- Date: 2026-05-16
- Deciders: lusoris
- Tags: vmaf-tune, executor, per-shot, saliency, phase-f, fork-local
Context
ADR-0454 introduced vmaf-tune auto --execute, which drives real FFmpeg encodes and libvmaf scores for a planning-phase AutoPlan cell. That baseline covers the "whole-clip, no ROI bias" case. Two capabilities that the planning phase already supports needed equivalent execute-mode coverage:
- Per-shot scoring: the Phase D planner (per_shot.py) can split a source at shot boundaries (via vmaf-perShot / TransNet V2) and recommend per-shot CRFs. Without an execute-mode counterpart, a caller can see the plan but cannot measure real per-shot VMAF without wiring up the plumbing themselves.
- Saliency-weighted encoding: saliency.py / saliency_aware_encode applies per-codec ROI bias (x264 qpfile, x265 zones, SVT-AV1 qpmap, VVenC ROI CSV) to steer bits toward salient regions. The execute mode did not expose a saliency path, so callers had no single-call entry-point to encode and score with ROI bias active.
The two modes are logically independent: per-shot is about temporal segmentation; saliency is about spatial bit allocation within a single encode.
Decision
We extend executor.py with two new public functions:
- run_plan_per_shot: detects shot boundaries via detect_shots (falls back to a single-shot range when vmaf-perShot is absent), encodes and scores each shot segment independently using frame_skip_ref / frame_cnt to align the reference window, then reports a frame-length-weighted VMAF aggregate alongside per-shot rows in tune_results_per_shot.jsonl.
- run_plan_saliency: wraps saliency_aware_encode for each selected cell, records whether saliency actually ran (vs graceful fallback), and scores the output in the standard encode → score pipeline, writing rows to tune_results_saliency.jsonl.
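The reference-window alignment and the frame-length-weighted aggregate described for run_plan_per_shot can be sketched as below. This is a minimal illustration, not the code in executor.py: ShotScore, reference_window and weighted_vmaf are hypothetical names, and the half-open [start_frame, end_frame) range convention is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotScore:
    """Hypothetical per-shot result; not the executor's real row type."""

    start_frame: int  # first frame of the shot (inclusive)
    end_frame: int    # one past the last frame (assumed half-open range)
    vmaf: float       # VMAF measured for this shot only

    @property
    def frame_count(self) -> int:
        return self.end_frame - self.start_frame


def reference_window(shot: ShotScore) -> dict[str, int]:
    """Map a shot range to the skip/count pair that aligns the reference."""
    return {"frame_skip_ref": shot.start_frame, "frame_cnt": shot.frame_count}


def weighted_vmaf(shots: list[ShotScore]) -> float:
    """Mean of per-shot VMAF scores, weighted by each shot's frame count."""
    total_frames = sum(s.frame_count for s in shots)
    if total_frames <= 0:
        raise ValueError("no frames to aggregate")
    return sum(s.vmaf * s.frame_count for s in shots) / total_frames


# Illustrative values only: a 100-frame shot followed by a 300-frame shot.
shots = [ShotScore(0, 100, 90.0), ShotScore(100, 400, 80.0)]
print(reference_window(shots[1]))
print(weighted_vmaf(shots))
```

Weighting by frame count keeps a very short shot from moving the aggregate as much as a long one, which a plain mean of per-shot scores would allow.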
Both functions keep the same test-seam pattern (encode_runner, score_runner, shot_runner, session_factory) as the base run_plan, so they are fully testable without any real binary.
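As a rough sketch of the test-seam idea, the snippet below injects fake runners into a stand-in function so that no FFmpeg or libvmaf binary is needed. The function encode_and_score and the two callable signatures are hypothetical and chosen only for illustration; the real parameters of run_plan_per_shot and run_plan_saliency may differ.

```python
from pathlib import Path
from typing import Callable

# Hypothetical seam signatures, for illustration only.
EncodeRunner = Callable[[Path, int], Path]   # (source, crf) -> encoded file
ScoreRunner = Callable[[Path, Path], float]  # (reference, distorted) -> VMAF


def encode_and_score(
    source: Path,
    crf: int,
    encode_runner: EncodeRunner,
    score_runner: ScoreRunner,
) -> dict:
    """Run one encode, then score it, using whichever runners are injected."""
    encoded = encode_runner(source, crf)
    return {"crf": crf, "output": str(encoded), "vmaf": score_runner(source, encoded)}


def fake_encode(source: Path, crf: int) -> Path:
    # Pretend to encode: only derive the output name, touch nothing on disk.
    return source.with_name(f"{source.stem}_crf{crf}.mkv")


def fake_score(reference: Path, distorted: Path) -> float:
    # Pretend to score: return a fixed value so assertions are deterministic.
    return 93.5


row = encode_and_score(Path("clip.y4m"), 23, fake_encode, fake_score)
print(row)
```

Because the runners are plain callables, a test can also record the arguments it receives and assert on the exact encode and score requests the executor would have issued.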
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Flags on run_plan (--per-shot, --saliency) | Single entry-point | Complex branching; callers cannot mix modes orthogonally | Separate entry-points are simpler and independently testable |
| New executor_pershot.py / executor_saliency.py modules | Clean separation | Extra import complexity; shared helpers would need a third module | Kept in executor.py; the shared dataclasses and _log helper stay local |
| Scene-change via FFmpeg select=gt(scene,0.4) | No extra binary | Much slower than TransNet V2; no frame-accurate boundary output | detect_shots already wraps vmaf-perShot with a tested fallback path |
Consequences
- Positive: callers can measure real per-shot VMAF from a single call; the saliency-aware encode path is now end-to-end testable without custom wiring.
- Negative: per-shot execution runs N encodes + N score calls (N = shot count), which is proportionally slower than a single whole-clip run. This is expected and documented in vmaf-tune.md.
- Neutral / follow-ups: run_plan_per_shot inherits the detect_shots fallback behaviour, so callers should check shot_count in the result row to distinguish real shot data from the single-shot sentinel.
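A caller-side check for the fallback might look like the sketch below. It assumes each JSONL row carries a shot_count field and that the fallback reports shot_count == 1. The cell and vmaf fields and the sample rows are made up for illustration and are not the actual schema of tune_results_per_shot.jsonl.

```python
import io
import json


def load_rows(stream) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


def is_single_shot_fallback(row: dict) -> bool:
    # Assumption: the fallback path reports exactly one shot. A clip that
    # genuinely contains a single shot would look the same under this check.
    return row.get("shot_count") == 1


# Made-up rows standing in for a results file.
sample = io.StringIO(
    '{"cell": "a", "shot_count": 7, "vmaf": 91.2}\n'
    '{"cell": "b", "shot_count": 1, "vmaf": 88.0}\n'
)
rows = load_rows(sample)
flagged = [r["cell"] for r in rows if is_single_shot_fallback(r)]
print(flagged)
```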
References
- ADR-0454: Phase F base execute mode.
- ADR-0222: vmaf-perShot C-side binary (TransNet V2 wrapper).
- ADR-0293: saliency-aware ROI encoding (saliency_aware_encode).
- Per user direction: Phase F follow-up item, 2026-05-16.