ADR-0506: vmaf-tune ladder duration clipping, raw-YUV cross-res decode, CLI exit code
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, claude
- Tags: vmaf-tune, ladder, corpus, encode, cli, docs
Context
The BBB end-to-end v6 probe (after PR #1262 / ADR-0505 closed the v5 cluster) surfaced three follow-ups, all confined to the vmaf-tune ladder orchestration:
- V6-1: ladder --duration N was metadata-only. The flag is used by the sampler to compute kbps (size_bytes * 8 / N) but was never wired into the ffmpeg encode pipe. A 10-second smoke run against a 9-minute container source therefore re-encoded the full 9 min at every CRF in the sweep; the v6 probe consumed ~10 min wall time on a single 3-cell sweep before timing out. The reference leg is already clipped by _maybe_decode_reference (via the -t argument added in v2 Bug #v2-A); the encode leg never received the same treatment because the encode driver only honoured sample_clip_seconds (the ADR-0297 sample-clip mode), which the ladder CLI does not set.
- V6-2: A cross-resolution ladder against a raw-YUV source failed on every rung whose target differed from the source dims. The v4 reference-side scale path (_decode_source_to_yuv with target_width/target_height) builds an ffmpeg argv of the form ffmpeg -i src.yuv -f rawvideo -pix_fmt yuv420p -vf scale=W:H dst.yuv, with no input-side -f rawvideo -s SRCWxSRCH -r FR block, so ffmpeg's demuxer cannot parse the raw planar bytes and refuses the input. The sampler then yields zero scorable encodes and the CLI raises RuntimeError: default sampler produced no scorable encodes. The v4 test that pins the scale path used a .mp4 source, which exercises ffmpeg's container auto-detect and silently masks the missing raw-input flags.
- V6-3: vmaf-tune ladder returned exit code 0 even when the sampler raised a RuntimeError("default sampler produced no scorable encodes"). The Python traceback was printed to stderr but the wrapper that calls main() returned 0 to the shell, defeating CI gates and shell-script error handling. Other vmaf-tune subcommands (compare, tune-per-shot, report) return 2 on operational failure; _run_ladder had no try/except around build_and_emit.
The V6-1 wall-time bug is the dominant operational pain — every container-source ladder smoke run currently consumes minutes per cell instead of seconds. V6-2 defeats the entire purpose of multi-rung ladders on raw-YUV sources (the single-resolution path introduced in ADR-0498 works in isolation but cannot be composed into a ladder). V6-3 is the low-severity CI / scripting bug.
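The V6-2 failure follows from raw planar YUV carrying no header: the same bytes are consistent with more than one geometry, so the demuxer must be told the source dims. A minimal sketch of that ambiguity, assuming 8-bit yuv420p with even dimensions (the helper names are illustrative and not part of vmaf-tune):

```python
def yuv420p_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit yuv420p frame (assumes even width and height).

    One full-size Y plane plus two quarter-size chroma planes: W * H * 3 / 2.
    """
    return width * height * 3 // 2


def frame_count(file_bytes: int, width: int, height: int) -> float:
    """Number of frames a headerless file would hold under an assumed geometry."""
    return file_bytes / yuv420p_frame_bytes(width, height)


# Four 1920x1080 frames and nine 1280x720 frames occupy the same number of
# bytes, so the byte stream alone cannot reveal which geometry produced it.
payload = 4 * yuv420p_frame_bytes(1920, 1080)
print(frame_count(payload, 1920, 1080), frame_count(payload, 1280, 720))
```

Both interpretations divide the payload into a whole number of frames, which is why the geometry has to come from the caller rather than from the file.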
Decision
- V6-1 encode-side duration clamp: extend EncodeRequest with a new duration_s: float = 0.0 field and have build_ffmpeg_command append -t duration_s as an input-side flag whenever the caller did NOT opt into sample-clip mode (sample_clip_seconds == 0.0) AND duration_s > 0. Sample-clip mode (ADR-0297) keeps precedence because it carries an explicit start offset and is centred inside the window; the new flag is a "bound the whole encode" clamp, not a per-cell sample window. iter_rows plumbs CorpusJob.duration_s into the new EncodeRequest.duration_s so the existing --duration flag exercises the clamp without any CLI changes. The reference decode already honours job.duration_s; mirroring it on the encode side restores the contract the flag's help text already promised.
- V6-2 raw-YUV demuxer flags in cross-res reference decode: extend _decode_source_to_yuv with three new kwargs: source_is_raw, source_width, source_height (plus source_framerate for completeness, defaulted to 24 fps when unset). When source_is_raw=True the argv is rebuilt to insert -f rawvideo -pix_fmt … -s SRCWxSRCH -r FR BEFORE -i src.yuv so the demuxer can parse raw planar bytes. The new kwargs are None / False by default so container-source callers (the v4 pinned path) are unaffected. _maybe_decode_reference is extended with a matching trio and computes source_is_raw from the source suffix, then forwards the geometry. iter_rows passes job.src_width / src_height (or, when those are None, the rung dims as the legacy single-res case) plus job.framerate into the new kwargs.
- V6-3 CLI exit-code on RuntimeError: wrap the build_and_emit call in _run_ladder in a try / except (RuntimeError, ValueError, OSError) block that prints the exception message to stderr and returns 2. The exception list is intentionally narrow: KeyboardInterrupt and unexpected Exceptions still propagate so debug sessions surface the traceback. The value 2 matches the convention used by the sibling subcommands.
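The V6-1 precedence rule can be sketched as follows. This is a hedged illustration, not the actual build_ffmpeg_command: EncodeRequestSketch and build_argv_sketch are hypothetical stand-ins that model only the two fields under discussion, and the real driver's sample-clip handling is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodeRequestSketch:
    """Hypothetical stand-in for EncodeRequest; only the relevant fields."""

    source: str
    sample_clip_seconds: float = 0.0  # ADR-0297 sample-clip mode; 0.0 means off
    duration_s: float = 0.0  # whole-encode clamp; 0.0 means unbounded


def input_side_clamp(req: EncodeRequestSketch) -> list[str]:
    """Return the input-side -t flag only when the clamp applies."""
    # Sample-clip mode keeps precedence: the clamp is emitted only when the
    # caller did not opt into it AND a positive duration was requested.
    if req.sample_clip_seconds == 0.0 and req.duration_s > 0:
        return ["-t", f"{req.duration_s:g}"]
    return []


def build_argv_sketch(req: EncodeRequestSketch) -> list[str]:
    """Assemble the head of an ffmpeg argv (encoder options omitted)."""
    # Placing -t before -i limits how much of the input ffmpeg reads.
    return ["ffmpeg", *input_side_clamp(req), "-i", req.source]


print(build_argv_sketch(EncodeRequestSketch("src.mp4", duration_s=10.0)))
```

With both fields set, the sketch emits no -t at all, which mirrors the "sample-clip wins" rule rather than trying to combine the two windows.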
Alternatives considered
- V6-1: route --duration through sample_clip_seconds. Rejected: sample-clip mode is centred inside the source window and carries a start offset. Setting sample_clip_seconds = N with duration_s = N would trip the requested >= duration guard in _resolve_sample_clip and fall back to full-source mode anyway, so the path doesn't even compose. The new duration_s field on EncodeRequest is a one-line extension that keeps the two concepts orthogonal: "bound the encode to N seconds from the start" vs "extract a centred N-second window".
- V6-1: clamp the encode in the ladder CLI by setting CorpusOptions.sample_clip_seconds. Rejected for the same reason: the _resolve_sample_clip precondition rejects requested == duration. Forcing it to accept would change the semantics of sample_clip_seconds for every other corpus caller.
- V6-2: probe the source via ffprobe before deciding the demuxer flags. Rejected: the suffix-based detection used everywhere else in the corpus pipeline (_VMAF_RAW_SUFFIXES) is already authoritative, and the caller has already declared the source's geometry via CorpusJob.src_width / src_height / pix_fmt / framerate, so ffprobe would only re-derive what we already know. The new kwargs are mandatory only when source_is_raw=True, so the API can't silently accept a missing geometry; it raises ValueError instead.
- V6-3: let the exception escape and rely on Python's default exit code of 1. Rejected: the wrapper that invokes _run_ladder is the CLI dispatcher, which historically returns the value _run_* produces. Letting the exception escape changes the contract for every subcommand and makes unit-testing the failure path harder (the test would need to assert pytest.raises(RuntimeError) instead of rc != 0). Catching at the subcommand boundary keeps the dispatcher surface uniform.
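The chosen V6-3 shape, catching at the subcommand boundary and returning an integer, might look like the sketch below. It assumes a callable standing in for build_and_emit; run_ladder_sketch, failing_sampler, and the stderr message prefix are illustrative names rather than the real cli.py code.

```python
import sys
from typing import Callable


def run_ladder_sketch(build_and_emit: Callable[[], None]) -> int:
    """Hypothetical subcommand body: return 0 on success, 2 on operational failure."""
    try:
        build_and_emit()
    except (RuntimeError, ValueError, OSError) as exc:
        # Narrow on purpose: KeyboardInterrupt and unexpected exception types
        # are not caught here, so they still surface with a full traceback.
        print(f"vmaf-tune ladder: {exc}", file=sys.stderr)
        return 2
    return 0


def failing_sampler() -> None:
    """Stand-in that reproduces the V6-2 symptom."""
    raise RuntimeError("default sampler produced no scorable encodes")


rc_fail = run_ladder_sketch(failing_sampler)
rc_ok = run_ladder_sketch(lambda: None)
```

Because the function returns a code instead of raising, a unit test can assert on the return value directly, and a dispatcher that forwards that value to sys.exit gives the shell the same status.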
Consequences
- Positive:
  - ladder --duration 10 against a 9-minute container source now consumes ~10 seconds of encode wall time per cell instead of 9 minutes. Smoke runs become tractable on long sources without pre-cutting via ffmpeg -ss / -t.
  - Cross-resolution ladders against raw-YUV sources score successfully on every rung; the per-rung reference decode now produces a parseable raw YUV file at the rung target.
  - vmaf-tune ladder joins the other subcommands in returning 2 on operational failure; CI gates and shell scripts can rely on the exit byte.
- Negative:
  - EncodeRequest grows a sixth field that interacts with sample_clip_seconds; new callers must know which to set. The precedence rule is documented inline (sample-clip wins) and unit-tested.
  - _decode_source_to_yuv now raises ValueError when source_is_raw=True but the geometry is missing. Existing callers (V3 / V4 paths) pass source_is_raw=False by default so no regression; new callers get a loud failure instead of a malformed argv.
- Neutral / follow-ups:
  - test_bbb_e2e_v6_bug_cluster.py pins one regression per finding plus a subprocess-driven CLI check for V6-3.
  - docs/usage/vmaf-tune.md clarifies that --duration N now bounds the encode pipe as well as the reference window.
  - The v5 end-to-end docker probe (test_ladder_against_bbb_container_yields_plausible_vmaf) will run much faster post-fix; the 60-second pytest-timeout that fired on it pre-fix can stay in place.
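The "loud failure instead of a malformed argv" behaviour noted under Negative can be sketched as below. This is an assumption-laden illustration of the V6-2 argv shape, not the real _decode_source_to_yuv: the function names are hypothetical, and the source_pix_fmt parameter with its yuv420p default is an assumption, since the ADR elides the input pixel format.

```python
from typing import Optional


def raw_input_flags(
    source_is_raw: bool = False,
    source_width: Optional[int] = None,
    source_height: Optional[int] = None,
    source_framerate: Optional[float] = None,
    source_pix_fmt: str = "yuv420p",  # assumption: the ADR does not name the value
) -> list[str]:
    """Demuxer flags that must precede -i for a headerless raw-YUV source."""
    if not source_is_raw:
        return []  # container sources rely on ffmpeg's auto-detection
    if source_width is None or source_height is None:
        raise ValueError("raw source requires source_width and source_height")
    framerate = 24.0 if source_framerate is None else source_framerate
    return [
        "-f", "rawvideo",
        "-pix_fmt", source_pix_fmt,
        "-s", f"{source_width}x{source_height}",
        "-r", f"{framerate:g}",
    ]


def decode_argv_sketch(
    src: str, dst: str, target_width: int, target_height: int, **raw_kwargs
) -> list[str]:
    """Reference-decode argv: raw input flags (if any), then scale to the rung."""
    return [
        "ffmpeg", *raw_input_flags(**raw_kwargs), "-i", src,
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-vf", f"scale={target_width}:{target_height}", dst,
    ]


argv = decode_argv_sketch(
    "src.yuv", "dst.yuv", 1280, 720,
    source_is_raw=True, source_width=1920, source_height=1080,
)
```

Calling raw_input_flags(source_is_raw=True) without dimensions raises ValueError, while a container source called with the defaults yields the same argv shape as before.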
References
- BBB e2e v6 bug log: /tmp/bbb_e2e_bugs_v6.md (gitignored)
- Predecessor (v5): ADR-0505
- Predecessor (v4): ADR-0501
- Predecessor (v3): ADR-0499
- Predecessor (v2): ADR-0498
- Encode driver: tools/vmaf-tune/src/vmaftune/encode.py:build_ffmpeg_command
- Corpus iter_rows: tools/vmaf-tune/src/vmaftune/corpus.py:iter_rows
- Reference decode helper: tools/vmaf-tune/src/vmaftune/corpus.py:_decode_source_to_yuv
- CLI: tools/vmaf-tune/src/vmaftune/cli.py:_run_ladder
- Source: req (direct user direction in the agent dispatch briefing on 2026-05-18, paraphrased: "fix the three v6 BBB e2e bugs in a single PR: thread --duration into the encode driver so smoke runs actually clip; fix cross-res raw-YUV reference decode by passing source geometry; wrap the CLI RuntimeError so the process exits non-zero.")