ADR-0326: vmaf-tune Phase B — target-VMAF bisect
- Status: Accepted
- Status update 2026-05-15: implemented; `tools/vmaf-tune/src/vmaftune/bisect.py` present on master; `BisectResult` + `bisect_target_vmaf()` exported; Phase B released.
- Date: 2026-05-08
- Deciders: lusoris
- Tags: tooling, vmaf-tune, fork-local
Context
The vmaf-tune harness has shipped Phase A (corpus grid sweep), Phase A.5 (fast Optuna proxy), Phase C (predict, ADR-0276), Phase D (tune-per-shot), Phase E (ladder, ADR-0295), plus the compare and recommend-saliency subcommands. Each of these surfaces depends on a target-VMAF predicate with the loose contract "given a source + codec + quality floor, return the most-compressed encode that still clears the floor". Until this PR the production predicate raised NotImplementedError("Phase B pending"); tests injected a fake predicate, but production callers had no working backend.
The "Phase B pending" placeholder has been referenced from ADR-0276, ADR-0287, ADR-0295, ADR-0306, and others as the "production wiring" they each defer to. Promoting the placeholder is overdue.
The analytical-curve binary search in predictor.pick_crf already establishes the algorithmic shape on synthetic data; Phase B mirrors it onto real encodes via the existing encode.run_encode / score.run_score subprocess seams. The codec-adapter registry already exposes the search-space boundary (adapter.quality_range, per ADR-0296), so the bisect picks that as the default search domain. A separate coarse-to-fine search (ADR-0306) lives in corpus.coarse_to_fine_search for the broader (preset, CRF) grid; the bisect is the single-axis primitive coarse-to-fine wraps when only the CRF axis is in play.
Decision
We will ship vmaftune.bisect as a pure module with three exported symbols:
- `BisectResult`: `(codec, best_crf, measured_vmaf, bitrate_kbps, encode_time_ms, n_iterations, encoder_version, ok, error)`. Mirrors the shape of `compare.RecommendResult` so a one-line adapter (`make_bisect_predicate`) satisfies the existing `compare.PredicateFn` signature.
- `bisect_target_vmaf(src, codec, target_vmaf, *, width, height, ..., crf_range=None, max_iterations=8, encode_runner=None, score_runner=None, ...) -> BisectResult`: the core algorithm.
- `make_bisect_predicate(target_vmaf, *, width, height, ...) -> compare.PredicateFn`: the closure-binding adapter.
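
As a rough illustration of the result shape listed above, the sketch below models `BisectResult` as a frozen dataclass. Only the field names come from this ADR; the field types, the `frozen=True` choice, and every sample value (codec name, CRF, bitrate, timings) are assumptions made for the example, not the module's actual definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BisectResult:
    """Illustrative stand-in; field names from the ADR, types assumed."""

    codec: str
    best_crf: Optional[int]          # None when no CRF cleared the target
    measured_vmaf: Optional[float]
    bitrate_kbps: Optional[float]
    encode_time_ms: Optional[float]
    n_iterations: int
    encoder_version: str
    ok: bool
    error: Optional[str] = None


# Invented sample values: one successful search and one unreachable target.
hit = BisectResult("libx264", 27, 93.4, 2150.0, 8400.0, 6, "unknown", True)
miss = BisectResult(
    "libx264", None, None, None, None, 6, "unknown", False,
    "target 99.0 unreachable in CRF window [0, 51]",
)
```

Carrying `ok` and `error` on the result, rather than raising, lets a caller that ranks several codecs keep going when one of them cannot reach the target.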
The algorithm is a textbook integer binary search over CRF assuming monotone-decreasing VMAF in CRF:
```python
lo, hi = crf_range or adapter.quality_range
best = None
n_iterations = 0
while lo <= hi and n_iterations < max_iterations:
    mid = midpoint_lower_quality(lo, hi)  # round toward higher CRF
    measured = score(encode(src, codec, mid))
    n_iterations += 1
    if measured >= target_vmaf:
        best = (mid, measured)
        lo = mid + 1                      # try harder compression
    else:
        hi = mid - 1                      # need higher quality
```
The midpoint rounds toward the lower-quality (higher-CRF) end of the window, so the "best so far" we accept is always a CRF we actually measured, never one extrapolated from an adjacent sample. This is a one-line correctness guard, not a performance choice.
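
The rounding rule can be made concrete with a small sketch. The name `midpoint_lower_quality` comes from the pseudocode above, but its body here is an assumption (an integer midpoint that rounds up); `midpoint_floor` is a hypothetical helper included only for contrast.

```python
def midpoint_lower_quality(lo: int, hi: int) -> int:
    """Integer midpoint that rounds up, i.e. toward the higher-CRF end."""
    return (lo + hi + 1) // 2


def midpoint_floor(lo: int, hi: int) -> int:
    """Conventional floor midpoint, shown only for comparison."""
    return (lo + hi) // 2


# The two rules differ only when the window holds an even number of CRFs.
windows = [(0, 51), (20, 21), (21, 21)]
upper = [midpoint_lower_quality(lo, hi) for lo, hi in windows]
lower = [midpoint_floor(lo, hi) for lo, hi in windows]
```

For the windows above, `upper` is `[26, 21, 21]` and `lower` is `[25, 20, 21]`. In both cases the midpoint stays inside `[lo, hi]`, so every CRF the search reports is one that was actually encoded and scored.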
The bisect aborts with a clear error when:
- The target is unreachable in the searched window (target above the curve's maximum): `ok=False, error="target ... unreachable in CRF window ..."`, and the closest-miss CRF is reported in the error string.
- Two non-adjacent samples violate monotone-decreasing VMAF in CRF by more than 0.5 VMAF (looser than measurement noise on a single shot): `ok=False, error="monotonicity violation ..."`. The bisect never falls back to a non-bisect strategy in this case; the `tools/vmaf-tune/AGENTS.md` invariant is "monotonicity is a hard contract; bail with a clear error if it doesn't hold".
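
The two abort conditions can be sketched as plain functions over the samples gathered so far. This is a minimal illustration, not the module's code: the helper names, the exact error wording, and the reading of "non-adjacent" as "CRF values more than one step apart" are all assumptions. Only the 0.5 VMAF tolerance and the general message shape come from the text above.

```python
from typing import Dict, Optional, Tuple

MONOTONICITY_TOLERANCE = 0.5  # VMAF points; threshold taken from the text above


def monotonicity_error(
    samples: Dict[int, float], tolerance: float = MONOTONICITY_TOLERANCE
) -> Optional[str]:
    """Return an error string if a higher CRF out-scored a lower one by > tolerance."""
    ordered = sorted(samples.items())
    for i, (low_crf, low_crf_vmaf) in enumerate(ordered):
        for high_crf, high_crf_vmaf in ordered[i + 1:]:
            # Assumption: "non-adjacent" means the CRFs differ by more than 1.
            non_adjacent = high_crf - low_crf > 1
            if non_adjacent and high_crf_vmaf - low_crf_vmaf > tolerance:
                return (
                    f"monotonicity violation: CRF {high_crf} scored "
                    f"{high_crf_vmaf:.2f} but CRF {low_crf} scored {low_crf_vmaf:.2f}"
                )
    return None


def unreachable_error(
    samples: Dict[int, float], target_vmaf: float, window: Tuple[int, int]
) -> str:
    """Build the 'unreachable' message, naming the best-scoring (closest-miss) CRF."""
    closest_crf = max(samples, key=lambda crf: samples[crf])
    return (
        f"target {target_vmaf} unreachable in CRF window {window}; "
        f"closest miss at CRF {closest_crf} ({samples[closest_crf]:.2f})"
    )
```

A 0.3-point inversion between CRF 13 and CRF 26 passes silently under this check, while a 1.0-point inversion returns an error string that the caller would surface with `ok=False`.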
The module is subprocess-free in tests: encode_runner / score_runner mirror the pattern from encode.run_encode / score.run_score, and the test suite exercises the full bisect (including the predicate adapter and compare_codecs integration) with synthetic curves.
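
The injection pattern described above might look like the following sketch. The runner signatures, `bisect_with_runners`, the fake runners, the file names, and the linear synthetic curve are all hypothetical; the real `encode.run_encode` / `score.run_score` seams are not reproduced here.

```python
from typing import Callable, Optional, Tuple

# Hypothetical seam signatures, chosen for the example only.
EncodeRunner = Callable[[str, str, int], str]  # (src, codec, crf) -> encoded path
ScoreRunner = Callable[[str, str], float]      # (src, encoded path) -> VMAF


def bisect_with_runners(
    src: str,
    codec: str,
    target_vmaf: float,
    crf_range: Tuple[int, int],
    encode_runner: EncodeRunner,
    score_runner: ScoreRunner,
    max_iterations: int = 8,
) -> Tuple[Optional[Tuple[int, float]], int]:
    """Return ((best_crf, vmaf) or None, number of encode+score calls made)."""
    lo, hi = crf_range
    best = None
    calls = 0
    while lo <= hi and calls < max_iterations:
        mid = (lo + hi + 1) // 2  # round toward higher CRF
        measured = score_runner(src, encode_runner(src, codec, mid))
        calls += 1
        if measured >= target_vmaf:
            best, lo = (mid, measured), mid + 1  # try harder compression
        else:
            hi = mid - 1                         # need higher quality
    return best, calls


def fake_encode(src: str, codec: str, crf: int) -> str:
    """No subprocess: just encode the CRF into a made-up output name."""
    return f"{src}.{codec}.crf{crf}.mkv"


def fake_score(src: str, encoded: str) -> float:
    """Recover the CRF from the name and apply an invented linear curve."""
    crf = int(encoded.rsplit("crf", 1)[1].split(".")[0])
    return 100.0 - 0.5 * crf


best, calls = bisect_with_runners(
    "clip.yuv", "libx264", 90.0, (0, 51), fake_encode, fake_score
)
```

On this synthetic curve the search settles on CRF 20 (VMAF 90.0) after six encode-and-score calls, without spawning any process.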
compare._default_predicate is updated to point callers at bisect.make_bisect_predicate rather than naming "Phase B pending". The predicate signature (codec, src, target_vmaf) -> RecommendResult does not carry source geometry, so operators bind geometry once via make_bisect_predicate(width=..., height=..., framerate=..., duration_s=...) and pass the closure into compare_codecs(predicate=...) or via --predicate-module MODULE:CALLABLE. The default predicate stays a pointer rather than the production wiring because making compare's top-level signature carry geometry would break the existing ranking contract for every other caller.
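
The closure-binding step can be sketched as follows. `bind_geometry`, `fake_bisect`, and the trimmed `RecommendResult` are hypothetical stand-ins written for this example; only the predicate argument order `(codec, src, target_vmaf)`, the bisect argument order `(src, codec, target_vmaf)`, and the geometry keyword names come from this ADR.

```python
from typing import Callable, Dict, NamedTuple


class RecommendResult(NamedTuple):
    """Stand-in for compare.RecommendResult; fields here are assumed."""

    codec: str
    best_crf: int
    measured_vmaf: float
    ok: bool


PredicateFn = Callable[[str, str, float], RecommendResult]


def bind_geometry(
    bisect_fn, *, width: int, height: int, framerate: float, duration_s: float
) -> PredicateFn:
    """Hypothetical helper: capture geometry once, return a geometry-free predicate."""

    def predicate(codec: str, src: str, target_vmaf: float) -> RecommendResult:
        # The predicate takes (codec, src, ...); the bisect takes (src, codec, ...).
        return bisect_fn(
            src, codec, target_vmaf,
            width=width, height=height, framerate=framerate, duration_s=duration_s,
        )

    return predicate


seen: Dict[str, object] = {}


def fake_bisect(src, codec, target_vmaf, **geometry):
    """Records what it was called with instead of encoding anything."""
    seen.update(geometry, src=src)
    return RecommendResult(codec, 23, target_vmaf, True)


predicate = bind_geometry(
    fake_bisect, width=1920, height=1080, framerate=24.0, duration_s=10.0
)
result = predicate("libx264", "clip.yuv", 92.0)
```

Because the geometry lives in the closure, the three-argument predicate signature stays unchanged for every existing caller.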
No new CLI subcommand. The bisect is a programmatic primitive that the existing compare / recommend-saliency / predict / tune-per-shot / ladder subcommands consume via the predicate seam.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Binary search (chosen) | O(log range) encodes; mirrors the proven predictor.pick_crf shape; trivial to reason about | Assumes monotone-decreasing VMAF in CRF (real-world content satisfies this for every modern codec; we hard-bail when it doesn't) | Clear winner — the assumption is sound and the algorithm is the smallest correct primitive |
| Golden-section search | Optimal for unimodal continuous functions; one fewer evaluation per halving | CRF is integer-valued; golden-section's (φ, 1/φ) partition does not respect integer steps; convergence becomes irregular below 4-CRF windows | Cost outweighs the saving once we cap at 8 iterations |
| Full coarse-to-fine grid (ADR-0306) | Explores the (preset, CRF) plane; robust against pathological curves | Encodes the entire grid; ~15–25 encodes per call vs the bisect's 6–8; over-budget for any per-shot or per-resolution outer loop | Already shipped (corpus.coarse_to_fine_search) for the use case it fits — Phase B is the single-axis inner loop, not a replacement |
| Brute-force linear scan | Trivially correct; catches non-monotone curves | O(range) encodes — (0..51) is 52 encodes per call; inflates Phase E ladder generation by ~50× | Dismissed; same monotonicity assumption applies, and brute force ignores it instead of asserting it |
| Bayesian optimisation (e.g. Optuna TPESampler) | Handles non-monotone curves; the fast subcommand already wires Optuna | Optuna is an opt-in dep (ADR-0276); pulling it into the production predicate breaks the zero-dep corpus path | The fast subcommand owns this niche; Phase B stays pure stdlib |
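
The encode counts quoted for binary search and linear scan in the table can be checked with a few lines of arithmetic. This is a back-of-envelope sketch with hypothetical helper names; it counts worst-case iterations for a search that discards the midpoint on every step, as the pseudocode in the Decision section does.

```python
def worst_case_bisect_encodes(lo: int, hi: int) -> int:
    """Worst-case iterations for an integer bisect over [lo, hi].

    For a window of n >= 1 values this is floor(log2(n)) + 1,
    which equals n.bit_length() for positive integers.
    """
    window = hi - lo + 1
    return window.bit_length()


def linear_scan_encodes(lo: int, hi: int) -> int:
    """A brute-force scan encodes every CRF in the window once."""
    return hi - lo + 1


bisect_52 = worst_case_bisect_encodes(0, 51)  # 52-value window
linear_52 = linear_scan_encodes(0, 51)
bisect_64 = worst_case_bisect_encodes(0, 63)  # 64-value window
```

A 52-value window needs at most 6 encodes against 52 for the scan, and a 64-value window needs at most 7, so the default `max_iterations=8` cap does not bind for windows of these sizes.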
Consequences
- Positive: Every existing subcommand that had a stubbed predicate has a real production wiring with one closure-binding step. The monotonicity invariant is enforced rather than assumed silently. The 6–8-encode budget per call lets Phase E generate a five-tier per-resolution ladder in well under a wall-clock minute on modest hardware once GPU-side encode is wired in.
- Negative: The predicate signature in `compare.PredicateFn` cannot carry source geometry, so the default predicate still surfaces an error when called with no closure binding. This is a documented one-line handshake, not a regression: pre-Phase-B the default predicate raised `NotImplementedError`.
- Neutral / follow-ups:
  - Sample-clip mode (ADR-0301) is out of scope for the first cut; the bisect always encodes the full source. Wiring `sample_clip_seconds` through is a small follow-up that mirrors `corpus._resolve_sample_clip`.
  - Cache integration (ADR-0298) is not yet wired; the bisect re-encodes on every call. The cache key fields are already adapter-aware so the wiring is a one-call insertion in `_encode_and_score`.
  - The bisect module is a candidate for the future `vmaf-tune bisect` standalone CLI subcommand if operator demand surfaces; today's wiring keeps it a programmatic primitive only.
References
- ADR-0237 — vmaf-tune umbrella spec.
- ADR-0276 — Phase A.5 fast path; cites Phase B as the production target-VMAF backend.
- ADR-0287 — saliency-aware encoding; consumes the same predicate seam.
- ADR-0295 — Phase E ladder; default sampler composes Phase B with `recommend.pick_target_vmaf`.
- ADR-0296 — adapter `quality_range` is the search-space boundary.
- ADR-0306 — coarse-to-fine grid search; complementary, not a replacement.
- Research-0090 — Phase B bisect feasibility.
- Source: `req` (direct user instruction in this session: "Implement vmaf-tune Phase B (target-VMAF bisect) in the VMAFx/vmafx fork").