Skip to content

ADR-0424: vmaf-tune benchmark consumes Phase-A corpora

  • Status: Accepted
  • Date: 2026-05-14
  • Deciders: Lusoris maintainers
  • Tags: vmaf-tune, cli, benchmark, corpus

Context

The .workingdir2 backlog still listed Phase G, the cross-codec corpus benchmark, as not started. The fork already has enough encode-producing surfaces (corpus, recommend, compare, ladder, fast) that another default encode path would overlap existing commands and increase CI cost. Operators still need a stable way to answer the post-sweep question: which encoder clears a target VMAF at the lowest bitrate in an existing Phase-A JSONL corpus?

Decision

We will add vmaf-tune benchmark as a read-only corpus report. It consumes --from-corpus JSONL, filters successful finite rows, groups by encoder, and reports each encoder's lowest-bitrate row clearing --target-vmaf. Encoders that do not clear the target remain visible as unmet using their closest VMAF miss. Output formats are markdown, json, and csv.

Alternatives considered

Option Pros Cons Why not chosen
Extend compare Reuses the existing cross-codec name compare runs Phase-B bisect work; making it also mean "read an existing corpus" blurs runtime expectations Rejected to keep live encode comparison and offline corpus analysis separate
Add benchmark mode to recommend Reuses existing corpus loader recommend returns one row for one predicate, while Phase G needs one summary per encoder plus baseline deltas Rejected because the output contract is different
New benchmark subcommand Clear runtime contract; no new encodes; easy to test from synthetic JSONL Adds another CLI surface Chosen; the user-visible surface is small and maps directly to the backlog item

Consequences

  • Positive: Phase-G users can compare codecs from a saved corpus without rerunning FFmpeg/libvmaf.
  • Positive: JSON/CSV output can feed notebooks and dashboards; markdown is suitable for PR comments.
  • Negative: The report inherits corpus coverage bias. If one encoder was swept with a narrower CRF/preset grid, its benchmark row is only as good as that corpus.
  • Neutral / follow-ups: BD-rate integration can layer on top once the fork standardises interpolation policy across sparse, multi-source corpora.

References

  • .workingdir2/BACKLOG.md VT-OPEN: Phase G cross-codec corpus benchmark not started.
  • ADR-0237: vmaf-tune umbrella.
  • req: "okay in the meantime we can create a new branch and find more backlogs etc... bugs or whatever, anything we do is a huge win"