ADR-0295: vmaf-tune Phase E — per-title bitrate-ladder generator
- Status: Accepted
- Date: 2026-05-03
- Deciders: Lusoris
- Tags: tooling, ffmpeg, codec, automation, abr, fork-local
Context
Phase A of vmaf-tune (ADR-0237 / PR #329, merged) ships the encoder-grid corpus generator. Phase B (target-VMAF bisect, PR #347) gives us "find the encoder parameters that hit a requested VMAF for a given resolution". Phase D (per-shot dynamic CRF) is in flight.
What the fork still does not own, and what the PR #354 capability audit ranked the single biggest game-changer, is the next layer up: combining bisect-at-each-resolution into a per-title ABR ladder. The per-title encoding paper (Netflix 2015) is unambiguous that the optimal ladder for one title is the upper convex hull of (bitrate, vmaf) points sampled across multiple resolutions, not a fixed authoring spec. The audit's wording: ship this and the fork "reshapes from 'best open-source VMAF measurement' into 'only open-source per-title ladder generator with measured-PLCC proxy'".
This Phase E PR scaffolds the surface — the API, the convex-hull math, the rendition picker, the manifest emitters (HLS / DASH / JSON) — with a fully-mocked sampler so the smoke path works without the Phase B bisect being merged. Real (resolution × target) sampling wires up in a follow-up PR once Phase B lands.
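As a rough illustration of the mocked-sampler seam described above, the sketch below samples the default (resolution × target VMAF) grid through a pluggable callback. The callback signature, the helper names `synthetic_sampler` and `sample_grid`, and the bitrate formula are assumptions made for this example; they are not taken from `ladder.py`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback shape: (height, target_vmaf) -> (bitrate_kbps, achieved_vmaf).
# The real SamplerFn signature may differ.
SamplerFn = Callable[[int, float], Tuple[float, float]]

# Default rendition heights and VMAF targets named in this ADR.
DEFAULT_HEIGHTS = (1080, 720, 480, 360, 240)
DEFAULT_TARGETS = (95.0, 90.0, 85.0, 75.0, 65.0)


def synthetic_sampler(height: int, target_vmaf: float) -> Tuple[float, float]:
    """Made-up stand-in for the bisect: no encode is run.

    Bitrate grows with the number of pixel rows and doubles for every
    10 VMAF points above 65. The target is reported as hit exactly.
    """
    bitrate_kbps = height * 2.0 ** ((target_vmaf - 65.0) / 10.0)
    return bitrate_kbps, target_vmaf


def sample_grid(
    sampler: SamplerFn,
    heights: Sequence[int] = DEFAULT_HEIGHTS,
    targets: Sequence[float] = DEFAULT_TARGETS,
) -> Dict[int, List[Tuple[float, float]]]:
    """Call the sampler once per (height, target) cell, grouped by height."""
    return {h: [sampler(h, t) for t in targets] for h in heights}


grid = sample_grid(synthetic_sampler)
```

Because the sampler is just a callable, a test can pass a stub like this one while production code passes a bisect-backed function with the same shape.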
Decision
We will ship `tools/vmaf-tune/src/vmaftune/ladder.py` and a `vmaf-tune ladder` CLI subcommand that together:
- Sample the (resolution × target_vmaf) plane via a pluggable `SamplerFn` callback (default: dispatch to Phase B's bisect; tests inject a synthetic stub).
- Compute the Pareto frontier in two passes: drop dominated points, then take the upper-convex envelope (the diminishing-returns hull).
- Pick `n` rungs from the hull using either log-bitrate spacing (Apple HLS authoring-spec convention, default) or VMAF spacing (perceptual).
- Emit a manifest in HLS master-playlist, DASH MPD, or JSON descriptor form.
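The two-pass frontier computation in the list above can be sketched as follows. This is a minimal illustration, assuming points are `(bitrate_kbps, vmaf)` tuples; the function names `pareto_front` and `upper_hull` are hypothetical and need not match the helpers in `ladder.py`.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, vmaf); the unit is an assumption


def pareto_front(points: List[Point]) -> List[Point]:
    """Pass 1: drop points that a cheaper-or-equal point matches or beats on VMAF."""
    # Sort by bitrate ascending; on a bitrate tie the higher VMAF comes first,
    # so the weaker duplicate is rejected by the strict comparison below.
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))
    front: List[Point] = []
    best_vmaf = float("-inf")
    for bitrate, vmaf in ordered:
        if vmaf > best_vmaf:
            front.append((bitrate, vmaf))
            best_vmaf = vmaf
    return front


def upper_hull(front: List[Point]) -> List[Point]:
    """Pass 2: upper-convex envelope of a bitrate-sorted Pareto front."""
    hull: List[Point] = []
    for p in front:
        # Pop the last hull point while it lies on or below the chord from
        # its predecessor to p (cross >= 0 means the turn is not clockwise).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


samples = [(500, 60), (1000, 75), (1200, 70), (1500, 78),
           (2000, 88), (3000, 93), (6000, 95)]
hull = upper_hull(pareto_front(samples))
```

In this made-up sample set, `(1200, 70)` is dominated by `(1000, 75)` and is dropped in the first pass; `(1500, 78)` survives the first pass but falls below the chord from `(1000, 75)` to `(2000, 88)`, so the second pass removes it. The slopes between consecutive hull points then decrease monotonically, which is the diminishing-returns property.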
The default canonical rendition set is the 5-rung 1080p/720p/480p/360p/240p ladder against VMAF targets {95, 90, 85, 75, 65}; both are CLI-overridable.
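One simple way to pick `n` rungs with log-bitrate spacing is sketched below. It assumes the hull is a bitrate-sorted list of `(bitrate_kbps, vmaf)` tuples with strictly increasing, positive bitrates; `pick_rungs_log_bitrate` is a hypothetical name, and the greedy nearest-point strategy is one possible reading of "log-bitrate spacing", not necessarily the strategy the scaffold uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, vmaf)


def pick_rungs_log_bitrate(hull: List[Point], n: int) -> List[Point]:
    """Pick up to n hull points nearest to n evenly log-spaced bitrate targets."""
    if n <= 0 or not hull:
        return []
    if n >= len(hull):
        return list(hull)  # not enough points to thin out
    if n == 1:
        return [hull[-1]]  # a single rung: keep the top of the hull
    lo = math.log(hull[0][0])
    hi = math.log(hull[-1][0])
    remaining = list(hull)
    picked: List[Point] = []
    for i in range(n):
        # Targets are evenly spaced in log(bitrate) between the hull endpoints.
        target = lo + (hi - lo) * i / (n - 1)
        best = min(remaining, key=lambda p: abs(math.log(p[0]) - target))
        picked.append(best)
        remaining.remove(best)  # each hull point is used at most once
    return sorted(picked)


hull = [(500, 60), (1000, 75), (2000, 88), (3000, 93), (6000, 95)]
rungs = pick_rungs_log_bitrate(hull, 3)
```

With three rungs requested, the middle target sits at the geometric mean of 500 and 6000 kbps (about 1732 kbps), so the 2000 kbps hull point is chosen over the 1000 kbps one.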
Scope intentionally excludes: real encodes (Phase A's job), target-VMAF bisect (Phase B's job), per-shot variation (Phase D's job), and live MCP exposure (Phase F).
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Pareto-then-upper-convex-hull (chosen) | Mirrors Netflix per-title paper exactly; produces strictly monotonic, diminishing-returns ladder; small inline implementation (~30 LOC); ABR clients see no inversions when stepping rungs | Two-pass; sensitive to floating-point ties on bitrate (handled by tie-break sort + dedup) | Gold standard for per-title ladders; everything else is a degraded approximation |
| Apple HLS authoring-spec fixed rungs | Trivial; broad client compatibility | Same ladder for every title regardless of content complexity — defeats the point of per-title encoding; the audit explicitly calls fixed ladders out as the worst option | Rejected — defeats the entire premise |
| Geometric (×2) bitrate ladder | Simple; matches HLS spec recommendations; no encoding required | Ignores the source's R-D curve; cartoons need fewer bits than sports at the same rung; same as fixed authoring spec, just parameterised | Rejected — same fundamental flaw as fixed rungs |
| JND-spaced ladder (Visicom 2019, JND-VMAF) | Perceptually motivated; matches viewer's quality-step threshold | Requires a JND model on top of VMAF (we only have VMAF); deferred until tiny-AI exposes a JND head | Deferred to a future ADR; layer on top once JND head ships |
| Bayesian-optimisation sampler (instead of grid×bisect) | Fewer encodes per title; principled exploration | Phase B's bisect already exists; BO would be a parallel research workstream; orthogonal to the ladder math | Out of scope — Phase E is the ladder math; sampler is pluggable |
Consequences
- Positive:
  - Closes the loop on the Phase A→B→C→D→E pipeline. With Phase B merged, a single CLI invocation produces the full ladder for a title.
  - Phase F (MCP) gets `generate_ladder` for free: it wraps the `build_and_emit` convenience.
  - The audit's "game-changer" status moves from claimed to demonstrable: no other open-source tool ships per-title ladders against VMAF measurement out of the box.
  - HLS and DASH manifest output means the CLI is directly callable by an encode pipeline; downstream tooling re-points the placeholder URIs at real per-rendition playlists.
- Negative:
  - The default `sampler=None` raises `NotImplementedError` until Phase B's bisect lands. The CLI is currently smoke-only: useful via Python tests, not yet useful end-to-end. Status stays Proposed until that integration PR lands and we have an end-to-end smoke against a Netflix Public clip.
  - Synthetic test corpus is not validated against a real per-title encode. Smoke tests prove the math; PLCC against a real Netflix per-title baseline is a separate validation milestone.
  - Manifest emit ships placeholder variant URIs; the consumer must re-point them. We do not currently package the manifest with actual segmented MP4s; that is a downstream concern.
- Neutral / follow-ups:
  - Phase B integration PR (gated on PR #347 merge): replace `_default_sampler` with a real bisect-driven sampler.
  - Real-corpus validation (gated on Netflix Public encodes via Phase A): compute PLCC of the picked ladder rungs against Netflix's published per-title rungs and document the delta in `docs/research/0061-vmaf-tune-capability-audit.md`.
  - Status flips to Accepted only when the end-to-end PR lands AND the validation digest reports the delta.
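To make the placeholder-URI point above concrete, here is a minimal sketch of an HLS master playlist emitter. The `Rung` type, the `emit_hls_master` name, and the `placeholder_<height>p.m3u8` URI pattern are assumptions for illustration; the actual manifest emitter in `ladder.py` may structure its output differently.

```python
from typing import List, NamedTuple


class Rung(NamedTuple):
    """Hypothetical rendition record for this example."""
    width: int
    height: int
    bitrate_kbps: float


def emit_hls_master(rungs: List[Rung]) -> str:
    """Render a master playlist with one variant stream per rung."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for r in sorted(rungs, key=lambda r: r.bitrate_kbps):
        # BANDWIDTH is expressed in bits per second.
        bandwidth = int(round(r.bitrate_kbps * 1000))
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        # Placeholder URI: the consumer re-points this at a real media playlist.
        lines.append(f"placeholder_{r.height}p.m3u8")
    return "\n".join(lines) + "\n"


playlist = emit_hls_master([Rung(1920, 1080, 6000), Rung(640, 360, 500)])
```

Each `#EXT-X-STREAM-INF` tag is immediately followed by its variant URI, so a downstream packager only needs to rewrite the URI lines.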
References
- Audit source: `docs/research/0061-vmaf-tune-capability-audit.md` Bucket #6 (per-title ladder generator, flagged as the game-changer).
- ADR-0237: vmaf-tune umbrella spec (this ADR is its Phase E child).
- Netflix per-title encoding paper, 2015: the canonical reference for the convex-hull approach.
- Apple HLS Authoring Specification for Apple Devices §2.3: bandwidth-doubling ladder convention used as the default `spacing="log_bitrate"` mode.
- av1an `--target-quality` mode: prior art for per-rendition bisect; conceptually a Phase B sibling, not a Phase E sibling.
- Bitmovin Per-Title: closed-source equivalent on the cloud-encoder side.
- PR #347 (Phase B target-VMAF bisect, in flight): the integration point for the production sampler.
- PR #354 capability audit: flagged Bucket #6 as the highest-leverage gap in the fork's automation surface.
Status update 2026-05-08: Accepted
Audited as part of the 2026-05-08 ADR Proposed sweep (Research-0086).
Acceptance criteria verified in tree at HEAD 0a8b539e:
- `tools/vmaf-tune/src/vmaftune/ladder.py`: present (scaffold with `build_ladder`, `convex_hull`, `select_knees`, `emit_manifest`).
- `vmaf-tune ladder` CLI subcommand registered.
- ADR-0307 (Accepted in the 2026-05-06 sweep) wired the default `_default_sampler` so the placeholder no longer raises `NotImplementedError`; the `SamplerFn` seam stays open for callers needing finer control.
- Verification command: `ls tools/vmaf-tune/src/vmaftune/ladder.py`.