ADR-0498: vmaf-tune BBB end-to-end v2 bug cluster + explicit-backend semantics
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude (Opus 4.7)
- Tags: vmaf-tune, cli, libvmaf, bugfix, docs, container
Context
The follow-up BBB end-to-end smoke run (after PR #1253 / ADR-0497 closed the first seven-bug cluster) surfaced five new defects one layer down: a disk runaway in the score-step decoder, a cross-resolution ladder crash against a raw-YUV source, a missing matplotlib dependency in the dev-mcp container image, a residual bare-NaN literal in the report's JSON appendix, and a silent CPU fallback when the libvmaf binary is asked for an explicit GPU backend that fails to initialise. Three operational follow-ups (encoder availability diagnostics, encoder-version detection, and a `/tmp` guard in the dev-mcp entrypoint) were grouped in the same PR because they touch the same files and share a regression-test fixture.
The decision was whether to ship a focused per-bug PR train or a single consolidated cluster PR (precedent: ADR-0497 chose the cluster form for the first BBB cluster).
Decision
We ship a single PR (`fix/bbb-e2e-v2-bug-cluster-2026-05-18`) that addresses all five v2 bugs plus the three operational follow-ups, backed by nine regression tests in `tools/vmaf-tune/tests/test_bbb_e2e_v2_bug_cluster.py`. The fixes are:
- Bug #v2-A (Major: disk runaway). `score._decode_to_raw_yuv` gained an optional `duration_s` parameter that emits an ffmpeg `-t` clamp on the output side. The duration is threaded down from a new `ScoreRequest.duration_s` field via the existing `maybe_decode_distorted` shim. `bisect._encode_and_score` and `corpus.iter_rows` populate it from the caller's `duration_s`. A 10 s probe against a 634 s 1080p source now produces ~896 MB of raw YUV instead of ~58 GB.
- Bug #v2-B (Major: cross-resolution ladder crash). The default ladder sampler accepts new `src_width` / `src_height` parameters; when set and distinct from the rung target, the `CorpusJob` carries both source and rung dimensions and the `iter_rows` `EncodeRequest` builder switches to source-dim `-s W:H` plus a `-vf scale=W:H` filter for the downscale. The CLI defaults the source dims to the largest entry in `--resolutions` (so single-resolution ladders preserve the legacy behaviour) and exposes explicit `--src-width` / `--src-height` overrides for cases where the source resolution isn't one of the ladder rungs.
- Bug #v2-C (Major: container surface gap, ADR-0496 compliance). `dev/Containerfile` now installs `matplotlib` via `pip install` into `/opt/vmaf-venv`, so `vmaf-tune report` works inside the container without an out-of-band `pip install`. As defence in depth, `report.py`'s chart helpers fall back to an HTML comment placeholder when matplotlib is unimportable, so table-only reports still render outside the container.
- Bug #v2-D (Minor: RFC 8259 conformance). `report.py`'s `<details>` JSON appendix (both Markdown and HTML paths) now passes `allow_nan=False` and coerces `float('nan')` / `float('inf')` to `None`. The fix mirrors the `_nan_to_none` shim already in `compare.py` (added by ADR-0497) so strict JSON parsers (Go, Rust, `jq`) accept the appendix.
- Bug #v2-E (Major: silent backend fallback). The vmaf binary's `init_gpu_backends` derives an `explicit_backend` flag from `--backend NAME` (anything other than `auto` / `cpu`) and turns each per-backend `state_init` failure into a non-zero exit when that backend was the explicit request. The `--backend auto` path keeps the legacy soft-fallback chain. Independently, the JSON output now carries a top-level `"backend_used": "NAME"` key (cpu / cuda / sycl / vulkan / hip / metal) so CI gates and MCP probes can confirm what actually ran; this mirrors the MCP-layer echo added by PR #1251.
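The coercion described for Bug #v2-D can be sketched as follows. This is a minimal illustration, not the code in `report.py` or `compare.py`: the helper names `nan_to_none` and `dumps_strict` and the sample row are assumptions made for the example. Only `json.dumps(..., allow_nan=False)` and the NaN/inf-to-`None` mapping come from the text above.

```python
import json
import math
from typing import Any


def nan_to_none(value: Any) -> Any:
    """Recursively replace non-finite floats (NaN, +inf, -inf) with None.

    Hypothetical helper for illustration; the real shim may differ.
    """
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(item) for item in value]
    return value


def dumps_strict(payload: Any) -> str:
    """Serialise to strict JSON.

    allow_nan=False makes json.dumps raise ValueError if a non-finite
    float survives the coercion, instead of emitting a bare NaN literal.
    """
    return json.dumps(nan_to_none(payload), allow_nan=False, sort_keys=True)


# Made-up row with the two problem values named in the ADR.
row = {"crf": 23, "vmaf": float("nan"), "frames": [95.1, float("inf")]}
appendix = dumps_strict(row)
print(appendix)  # {"crf": 23, "frames": [95.1, null], "vmaf": null}
```

Passing `allow_nan=False` after the coercion acts as a guard: a value the coercion missed fails loudly at serialisation time rather than producing output that strict parsers reject.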
Operational follow-ups, same PR:
- Encoder availability vs encode failure: `bisect._encode_and_score` now distinguishes "Encoder not found" / "Unknown encoder" stderr markers and reports `encoder unavailable (libsvtav1): …` instead of the cryptic `encode failed at CRF NN (exit=1): Encoder not found` the pre-ADR-0498 path emitted.
- Encoder version detection: the `encode.parse_versions` regex was widened to accept the `x264 - core 164` / `x264-core 164` variants, and a process-cached `_probe_encoder_version_from_ffmpeg` helper falls back to `ffmpeg -version`'s `--enable-libx264` / `--enable-libsvtav1` configure-line markers when the per-encoder banner is suppressed by `-hide_banner`. Rows that previously carried `"unknown"` now carry `libx264-enabled` / `libsvtav1-enabled` so consumers can at least confirm the encoder is compiled in.
- dev-mcp-stdio /tmp: `dev/scripts/dev-mcp-entrypoint.sh` runs `mkdir -p /tmp && chmod 1777 /tmp` as its first action so the sibling MCP log and the bug-cluster repro scripts never fail on "No such file or directory: /tmp/vmaf-mcp.log" when the runtime ships a minimal `/` filesystem.
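The first two follow-ups can be sketched roughly as below. The function names, the regex, and the text placed after the colon in the "unavailable" message are assumptions made for illustration; the ADR only fixes the stderr markers, the two message prefixes, and the two x264 banner variants.

```python
import re
from typing import Optional

# stderr markers named in the ADR.
UNAVAILABLE_MARKERS = ("Encoder not found", "Unknown encoder")

# Illustrative pattern accepting both "x264 - core 164" and "x264-core 164".
X264_CORE = re.compile(r"x264\s*-\s*core\s+(\d+)")


def describe_encode_failure(encoder: str, crf: int, exit_code: int, stderr: str) -> str:
    """Separate 'encoder is not compiled in' from a genuine encode failure."""
    for marker in UNAVAILABLE_MARKERS:
        if marker in stderr:
            # What follows the colon is an assumption for this sketch.
            return f"encoder unavailable ({encoder}): {marker}"
    lines = stderr.strip().splitlines()
    detail = lines[-1] if lines else ""
    return f"encode failed at CRF {crf} (exit={exit_code}): {detail}"


def x264_core(banner: str) -> Optional[int]:
    """Extract the x264 core number from either banner variant, if present."""
    match = X264_CORE.search(banner)
    return int(match.group(1)) if match else None


print(describe_encode_failure("libsvtav1", 30, 1, "Unknown encoder 'libsvtav1'"))
print(x264_core("x264 - core 164 r3095"), x264_core("x264-core 164"))
```

Matching on stable substrings keeps the diagnostic independent of the rest of ffmpeg's stderr, which can vary between builds.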
The "consolidated cluster" form was chosen over per-bug PRs for the same reasons ADR-0497 cited: bugs were discovered together by one repro, they all gate the same documented headline workflow, the regression-test fixture is shared, and per-bug PRs would multiply CI cost.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Five separate PRs (one per bug) | Smaller diffs, atomic reverts | 5× CI cost, fragmented test fixture, harder to verify the e2e smoke went green | Bugs share a v2 cluster identity; splitting hides the regression context |
| Bug #v2-E: extend the libvmaf C API with a `vmaf_get_active_backend()` getter | Cleaner integration, no JSON post-edit | Touches a public header (triggers ffmpeg-patches rebase per rule 14), and the MCP layer already echoes `backend_used` (PR #1251), so the same data is available there | Out-of-process textual amend keeps the API surface stable; downstream consumers already parse JSON, not C |
| Bug #v2-B: reject multi-resolution YUV ladders with a clear error and require container sources | Smaller diff | Punishes users who legitimately have raw YUV at a higher resolution than their target ladder; closes off a documented workflow | Cross-resolution sampling against a raw source is a normal authoring step; the fix is straightforward (scale filter) and unblocks the workflow |
| Bug #v2-C: skip charts when matplotlib is missing, don't add the dep | Smallest container delta | Charts are part of the documented report output; ADR-0496 says every user surface works in the container | Adopted the dep AND added the fallback — the container is the supported path; the fallback is for host-side runs |
Consequences
Positive
- The documented `vmaf-tune` headline workflow now survives a realistic re-probe (10 s window against a 634 s 1080p source) on a modest dev-mcp host without disk-space drama.
- Cross-resolution ladders against raw YUV sources work as documented.
- `vmaf-tune report` works inside `vmaf-dev-mcp` without out-of-band pip-installs (ADR-0496 compliance).
- The JSON output of `compare`, `ladder`, and `report` is now uniformly RFC 8259 conformant.
- CI gates depending on `--backend NAME` now fail loudly when the backend can't init, instead of silently regressing to CPU scoring. The new `backend_used` JSON key lets downstream tooling confirm dispatch independently of the human-readable stderr.
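A CI gate built on the `backend_used` key might look like the sketch below. The function name `assert_backend` and the sample document are assumptions; the only detail taken from this ADR is that the JSON output carries a top-level `"backend_used"` string.

```python
import json


def assert_backend(report_json: str, expected: str) -> str:
    """Raise if the scoring run did not use the expected backend.

    Hypothetical gate for illustration; only the top-level
    "backend_used" key comes from the ADR.
    """
    report = json.loads(report_json)
    used = report.get("backend_used")
    if used is None:
        raise RuntimeError("report has no top-level 'backend_used' key")
    if used != expected:
        raise RuntimeError(f"expected backend {expected!r}, but {used!r} ran")
    return used


# Made-up minimal output; a real report carries many more keys.
sample = '{"backend_used": "cuda"}'
print(assert_backend(sample, "cuda"))  # cuda
```

Reading the key from the JSON rather than from stderr means the gate also covers `--backend auto` runs, where a soft fallback to CPU still exits with status zero.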
Negative / costs
- `ScoreRequest` and `CorpusJob` grew new optional fields; the defaults preserve the legacy behaviour, but pinned-shape consumers (tests that compare against `dataclasses.asdict`) may need a one-line update.
- The dev-mcp container is one matplotlib install heavier (~50 MB of dependencies). Worth it per ADR-0496.
- The `init_gpu_backends` C function gained ~25 lines of explicit-backend gating, pushing it deeper into NOLINT function-size territory; the existing NOLINTNEXTLINE + ADR-0141 citation already cover it.
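The pinned-shape cost in the first item can be illustrated with a stand-in dataclass. `ScoreRequestSketch` and its `reference` / `distorted` fields are hypothetical; only the new optional `duration_s` field is taken from this ADR.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreRequestSketch:
    """Hypothetical stand-in for ScoreRequest; not the real field list."""

    reference: str
    distorted: str
    # New optional field: the default keeps existing call sites working.
    duration_s: Optional[float] = None


# A legacy call site that never mentions duration_s still constructs fine...
legacy_call = ScoreRequestSketch(reference="ref.yuv", distorted="dist.mp4")

# ...but the serialised shape now has one more key, which is what breaks
# tests that compare asdict() output against a pinned dictionary.
shape = asdict(legacy_call)
print(shape)
```

A test pinned to `{"reference": ..., "distorted": ...}` would need the extra `"duration_s": None` entry, which is the one-line update mentioned above.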
Neutral
- `_probe_encoder_version_from_ffmpeg` runs at most once per process per (binary, encoder) tuple; the cache lives in module scope, so test isolation requires an explicit `_PROBE_CACHE.clear()` (mirrors existing patterns).
References
- Bug log: `/tmp/bbb_e2e_bugs_v2.md` (BBB e2e v2 probe report, 2026-05-18).
- ADR-0497: prior BBB end-to-end cluster (closed seven bugs; this ADR closes the next layer of five).
- ADR-0496: prefer `vmaf-dev-mcp` container for vmaf / vmaf-tune / ai / MCP-probing work; drives the matplotlib container fix.
- ADR-0495 / PR #1251: MCP-layer `backend_used` echo, mirrored by Bug #v2-E's JSON-output amend.
- ADR-0299 / ADR-0175 / ADR-0186: backend dispatch contract; the explicit-backend semantics are an explicit documentation refinement of that contract.
- Paraphrased, per user direction: "fix all 5 v2 bugs found by the second BBB end-to-end probe. Single PR". Five bug fixes plus three operational follow-ups are consolidated in one PR per the "bigger-content PRs over per-LOC PRs" rule.