ADR-0295: vmaf-tune Phase E — per-title bitrate-ladder generator

Status: Accepted
Date: 2026-05-03
Deciders: Lusoris
Tags: tooling, ffmpeg, codec, automation, abr, fork-local

Context

Phase A of vmaf-tune (ADR-0237 / PR #329, merged) ships the encoder-grid corpus generator. Phase B (target-VMAF bisect, PR #347) gives us "find the encoder parameters that hit a requested VMAF for a given resolution". Phase D (per-shot dynamic CRF) is in flight.

What the fork still does not own, and what the PR #354 capability audit ranked as the single biggest game-changer, is the next layer up: combining bisect-at-each-resolution into a per-title ABR ladder. The per-title encoding paper (Netflix 2015) is unambiguous that the optimal ladder for one title is the upper convex hull of (bitrate, vmaf) points sampled across multiple resolutions, not a fixed authoring spec. In the audit's wording, shipping this means the fork "reshapes from 'best open-source VMAF measurement' into 'only open-source per-title ladder generator with measured-PLCC proxy'".

This Phase E PR scaffolds the surface — the API, the convex-hull math, the rendition picker, the manifest emitters (HLS / DASH / JSON) — with a fully-mocked sampler so the smoke path works without the Phase B bisect being merged. Real (resolution × target) sampling wires up in a follow-up PR once Phase B lands.
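As an illustration of what a fully-mocked sampler might look like, the sketch below defines a callback type and a deterministic synthetic stub. The SamplePoint fields, the (height, target_vmaf) signature, and the toy rate model are all assumptions made for this example; the real SamplerFn in ladder.py may be shaped differently.

```python
import math
from typing import Callable, List, NamedTuple, Sequence


class SamplePoint(NamedTuple):
    """One (resolution, target) sample. Field names are hypothetical."""
    height: int          # rendition height in pixels, e.g. 1080
    bitrate_kbps: float  # bitrate needed to reach the target
    vmaf: float          # VMAF achieved at that bitrate


# Assumed callback shape: (height, target_vmaf) -> SamplePoint.
SamplerFn = Callable[[int, float], SamplePoint]


def synthetic_sampler(height: int, target_vmaf: float) -> SamplePoint:
    """Deterministic stand-in for a bisect; no encode is run."""
    # Toy rate model (not measured): cost grows with pixel count
    # and exponentially with the requested VMAF.
    scale = (height / 1080) ** 1.5
    bitrate = 6000.0 * scale * math.exp((target_vmaf - 95.0) / 12.0)
    return SamplePoint(height, bitrate, float(target_vmaf))


def sample_grid(heights: Sequence[int], targets: Sequence[float],
                sampler: SamplerFn) -> List[SamplePoint]:
    """Evaluate the sampler at every (height, target) pair."""
    return [sampler(h, t) for h in heights for t in targets]
```

Because the stub is a pure function, smoke tests that inject it are reproducible and need neither ffmpeg nor a source clip.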

Decision

We will ship tools/vmaf-tune/src/vmaftune/ladder.py and a vmaf-tune ladder CLI subcommand that:

Samples the (resolution × target_vmaf) plane via a pluggable SamplerFn callback (default: dispatch to Phase B's bisect; tests inject a synthetic stub).
Computes the Pareto frontier in two passes: drop dominated points, then take the upper-convex envelope (the diminishing-returns hull).
Picks n rungs from the hull using either log-bitrate spacing (Apple HLS authoring-spec convention, default) or VMAF spacing (perceptual).
Emits a manifest in HLS master-playlist, DASH MPD, or JSON descriptor form.
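The two-pass frontier step can be sketched as follows. This is a minimal illustration, not the code in ladder.py: the function names pareto_front and upper_hull and the (bitrate_kbps, vmaf) tuple layout are assumptions, and the hull pass uses a standard monotone-chain scan.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, vmaf); layout is an assumption


def pareto_front(points: List[Point]) -> List[Point]:
    """Pass 1: drop points that cost at least as much for no more VMAF."""
    # Dedup, then sort by bitrate ascending; on bitrate ties, higher VMAF first.
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))
    front: List[Point] = []
    best_vmaf = float("-inf")
    for bitrate, vmaf in ordered:
        if vmaf > best_vmaf:  # strictly better than every cheaper point
            front.append((bitrate, vmaf))
            best_vmaf = vmaf
    return front


def upper_hull(front: List[Point]) -> List[Point]:
    """Pass 2: upper convex envelope of a bitrate-sorted front."""
    hull: List[Point] = []
    for p in front:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                # hull[-1] lies on or below the chord hull[-2] -> p,
                # so it is not part of the upper envelope.
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

After both passes the slope between consecutive points is strictly decreasing, which is the diminishing-returns property the text refers to.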

The default canonical rendition set is the 5-rung 1080p/720p/480p/360p/240p ladder against VMAF targets {95, 90, 85, 75, 65}; both are CLI-overridable.
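Rung selection under the default log-bitrate spacing might look like the sketch below. The function name pick_rungs_log_bitrate is hypothetical, and snapping each log-spaced target to its nearest hull point is one plausible strategy among several; it assumes the hull is already sorted by ascending bitrate.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, vmaf); layout is an assumption


def pick_rungs_log_bitrate(hull: List[Point], n: int) -> List[Point]:
    """Pick up to n hull points closest to log-evenly spaced bitrates."""
    if n <= 0 or not hull:
        return []
    if len(hull) <= n:
        return list(hull)   # nothing to thin out
    if n == 1:
        return [hull[-1]]   # single rung: keep the top-quality point
    lo = math.log(hull[0][0])
    hi = math.log(hull[-1][0])
    picked: List[Point] = []
    for i in range(n):
        target = lo + (hi - lo) * i / (n - 1)
        nearest = min(hull, key=lambda p: abs(math.log(p[0]) - target))
        if nearest not in picked:  # two targets may snap to the same point
            picked.append(nearest)
    return picked
```

With evenly spaced log targets, adjacent rungs differ by a roughly constant bitrate ratio, which is what a doubling-style ladder looks like when the hull is dense enough.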

Scope intentionally excludes: real encodes (Phase A's job), target-VMAF bisect (Phase B's job), per-shot variation (Phase D's job), and live MCP exposure (Phase F).

Alternatives considered

Option	Pros	Cons	Why not chosen
Pareto-then-upper-convex-hull (chosen)	Mirrors Netflix per-title paper exactly; produces strictly monotonic, diminishing-returns ladder; small inline implementation (~30 LOC); ABR clients see no inversions when stepping rungs	Two-pass; sensitive to floating-point ties on bitrate (handled by tie-break sort + dedup)	Gold standard for per-title ladders; everything else is a degraded approximation
Apple HLS authoring-spec fixed rungs	Trivial; broad client compatibility	Same ladder for every title regardless of content complexity — defeats the point of per-title encoding; the audit explicitly calls fixed ladders out as the worst option	Rejected — defeats the entire premise
Geometric (×2) bitrate ladder	Simple; matches HLS spec recommendations; no encoding required	Ignores the source's R-D curve; cartoons need fewer bits than sports at the same rung; same as fixed authoring spec, just parameterised	Rejected — same fundamental flaw as fixed rungs
JND-spaced ladder (Visicom 2019, JND-VMAF)	Perceptually motivated; matches viewer's quality-step threshold	Requires a JND model on top of VMAF (we only have VMAF); deferred until tiny-AI exposes a JND head	Deferred to a future ADR; layer on top once JND head ships
Bayesian-optimisation sampler (instead of grid×bisect)	Fewer encodes per title; principled exploration	Phase B's bisect already exists; BO would be a parallel research workstream; orthogonal to the ladder math	Out of scope — Phase E is the ladder math; sampler is pluggable

Consequences

Positive:
Closes the loop on the Phase A→B→C→D→E pipeline. With Phase B merged, a single CLI invocation produces the full ladder for a title.
Phase F (MCP) gets generate_ladder for free: it only has to wrap the build_and_emit convenience entry point.
The audit's "game-changer" status moves from claimed to demonstrable: no other open-source tool ships per-title ladders against VMAF measurement out of the box.
HLS and DASH manifest output means the CLI is directly callable by an encode pipeline; downstream tooling re-points the placeholder URIs at real per-rendition playlists.
Negative:
The default sampler=None raises NotImplementedError until Phase B's bisect lands. The CLI is currently smoke-only — useful via Python tests, not yet useful end-to-end. Status stays Proposed until that integration PR lands and we have an end-to-end smoke against a Netflix Public clip.
Synthetic test corpus is not validated against a real per-title encode. Smoke tests prove the math; PLCC against a real Netflix per-title baseline is a separate validation milestone.
Manifest emit ships placeholder variant URIs; the consumer must re-point them. We do not currently package the manifest with actual segmented MP4s — that's a downstream concern.
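To make the placeholder-URI point concrete, here is a minimal sketch of an HLS master-playlist emitter. The Rung type, the emit_hls_master name, and the rendition_{height}p.m3u8 template are invented for illustration and are not the emitter in ladder.py; the tags used (#EXTM3U, #EXT-X-VERSION, #EXT-X-STREAM-INF with BANDWIDTH and RESOLUTION) are standard HLS.

```python
from typing import List, NamedTuple


class Rung(NamedTuple):
    """One ladder rung. Field names are hypothetical."""
    width: int
    height: int
    bitrate_kbps: float


def emit_hls_master(rungs: List[Rung],
                    uri_template: str = "rendition_{height}p.m3u8") -> str:
    """Render an HLS master playlist with placeholder variant URIs."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    # List variants from highest to lowest bitrate.
    for r in sorted(rungs, key=lambda r: r.bitrate_kbps, reverse=True):
        # BANDWIDTH is in bits per second; reusing the rung bitrate
        # here is a simplification of the spec's peak-rate definition.
        bandwidth = int(round(r.bitrate_kbps * 1000))
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        # Placeholder URI; the consumer re-points it at a real playlist.
        lines.append(uri_template.format(height=r.height))
    return "\n".join(lines) + "\n"
```

A downstream pipeline would either pass its own uri_template or rewrite the URI lines once the per-rendition playlists exist.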
Neutral / follow-ups:
Phase B integration PR (gated on PR #347 merge): replace _default_sampler with a real bisect-driven sampler.
Real-corpus validation (gated on Netflix Public encodes via Phase A): compute PLCC of the picked ladder rungs against Netflix's published per-title rungs and document the delta in docs/research/0061-vmaf-tune-capability-audit.md.
Status flips to Accepted only when the end-to-end PR lands AND the validation digest reports the delta.

References

Audit source: docs/research/0061-vmaf-tune-capability-audit.md Bucket #6 (per-title ladder generator — flagged as the game-changer).
ADR-0237 — vmaf-tune umbrella spec (this ADR is its Phase E child).
Netflix per-title encoding paper, 2015 — the canonical reference for the convex-hull approach.
Apple HLS Authoring Specification for Apple Devices §2.3 — bandwidth-doubling ladder convention used as the default spacing="log_bitrate" mode.
av1an --target-quality mode — prior art for per-rendition bisect; conceptually a Phase B sibling, not a Phase E sibling.
Bitmovin Per-Title — closed-source equivalent on the cloud-encoder side.
PR #347 (Phase B target-VMAF bisect, in flight) — the integration point for the production sampler.
PR #354 capability audit — flagged Bucket #6 as the highest-leverage gap in the fork's automation surface.

Status update 2026-05-08: Accepted

Audited as part of the 2026-05-08 ADR Proposed sweep (Research-0086).

Acceptance criteria verified in tree at HEAD 0a8b539e:

tools/vmaf-tune/src/vmaftune/ladder.py — present (scaffold with build_ladder, convex_hull, select_knees, emit_manifest).
vmaf-tune ladder CLI subcommand registered.
ADR-0307 (Accepted in the 2026-05-06 sweep) wired the default _default_sampler so the placeholder no longer raises NotImplementedError; the SamplerFn seam stays open for callers needing finer control.
Verification command: ls tools/vmaf-tune/src/vmaftune/ladder.py.