Research-0077 — Encoder knob-space Pareto frontiers (per source × codec × rc_mode)¶

Status: Adopted by ADR-0305
Date: 2026-05-05
Companion to: Research-0063

TL;DR¶

Companion analysis to Research-0063. Where 0063 identified the rate-control axis as load-bearing (NVENC's tuned recipe regresses ~4 VMAF at cq=30 against bare defaults), 0077 enumerates the full recipe space across 6 codec families × 3 rate-control modes × 9 sources and reports the Pareto frontier per slice. Headline findings land in §Headline findings via a follow-up commit when the sweep completes (ETA ~3h from this PR).

Question¶

Given the Research-0063 finding that recipe quality stratifies by rate-control mode, what is the dominance hull of encoder knob combinations on (bitrate, vmaf) within each (source, codec, rc_mode) slice, and which knob combinations qualify as ship-candidates for tools/vmaf-tune/codec_adapters/* defaults under the regression-detection invariant ("must not regress vs the bare encoder at matched bitrate within the same slice")?

Method¶

Sweep¶

12,636-cell grid generated by scripts/dev/hw_encoder_corpus.py (extended to enumerate per-codec knob axes — see ADR-0305):

axis	values	count
sources	9 Netflix (BigBuckBunny, Crowd-Run, Park-Joy, Tractor, Sintel-Trailer, In-To-Tree, Riverbed, Touchdown-Pass, Ducks-Take-Off)	9
codec families	libx264, libx265, libsvtav1, libaom-av1, libvpx-vp9, libvvenc	6
rate-control modes	`cq`, `vbr`, `cbr`	3
knob combinations per codec	preset × tune × spatial-aq × temporal-aq × bf × lookahead × multipass × adaptive-i/b (subset valid per codec)	~78 avg

Per-cell schema mirrors vmaf-tune Phase A SCHEMA_VERSION = 2: source, codec, rc_mode, preset, quality, knob_dict, bitrate_kbps, vmaf_score, encode_time_ms, clip_mode. Sweep output lives at runs/phase_a/full_grid/comprehensive.jsonl (gitignored).

Analysis¶

ai/scripts/analyze_knob_sweep.py consumes the JSONL and:

Groups rows by (source, codec, rc_mode).
Computes the 2-D Pareto frontier on (bitrate_kbps, vmaf_score) — minimise bitrate, maximise vmaf, ascending bitrate sort, sweep the running-max vmaf.
Applies encode_time_ms as a tiebreaker for hull-boundary rows (lower wall time wins when bitrate and vmaf tie within floating-point tolerance).
For each slice: emits a per-slice CSV (reports/pareto_<source>_<codec>_<rc>.csv) listing the hull rows + the dominated rows directly above the hull at each bitrate.
Aggregates the per-slice hulls into a markdown summary at reports/summary.md listing, per (codec, rc_mode), the knob combinations that appear on at least one source's hull and the ones that regress against the bare-encoder baseline.

Expected frontier count¶

9 sources × 6 codecs × 3 rc_modes = 162 candidate slices. After dedup (some rc_modes are not implemented for some codecs — e.g. libvvenc ships no CBR path; some hardware-only modes are excluded from this software-encoder sweep) the realised count is ~25 distinct hulls — close to the ADR-0305 estimate.

Scaffolded headline findings¶

Headline findings: TBD pending sweep completion (~3h ETA from this PR).

This section will be populated by a follow-up commit on the same branch (research/encoder-knob-space-pareto-frontiers) once comprehensive.jsonl is complete. The placeholder below shows the schema each row will follow when populated:

codec	rc_mode	hull-recipe knobs (frequency-of-appearance ranked)	regresses-vs-bare on N/9 sources	recommended default
libx264	cq	TBD	TBD	TBD
libx264	vbr	TBD	TBD	TBD
libx264	cbr	TBD	TBD	TBD
libx265	cq	TBD	TBD	TBD
...	...	...	...	...

Once populated, the regression check is the gate: any recipe with a non-zero "regresses-vs-bare" count is blocked from being shipped as the adapter default for that (codec, rc_mode) pair.

Reproducer¶

# 1. Generate the sweep (long-running, ~3h on a single host with
#    NVENC + CUDA libvmaf available; gitignored output).
mkdir -p runs/phase_a/full_grid/
python3 scripts/dev/hw_encoder_corpus.py \
  --sweep-spec scripts/dev/sweep_specs/comprehensive.yaml \
  --vmaf-bin core/build-cuda/tools/vmaf \
  --score-backend cuda \
  --out runs/phase_a/full_grid/comprehensive.jsonl

# 2. Analyse — produces per-slice CSVs + reports/summary.md.
python3 ai/scripts/analyze_knob_sweep.py \
  --jsonl runs/phase_a/full_grid/comprehensive.jsonl \
  --out-dir reports/

# 3. Smoke test on the synthetic fixture (no sweep required).
pytest ai/tests/test_knob_sweep_analysis.py -v

References¶

Research-0063 — precursor finding (CQ vs VBR stratifies recipe quality).
ADR-0305 — governing decision (per-(source, codec, rc_mode) stratification).
ADR-0237 — vmaf-tune Phase A scope.
ADR-0291 — fr_regressor_v2 v2 prod ship; downstream consumer of these frontiers.
ADR-0297 — codec-agnostic encode dispatcher; the surface adapter defaults flow into.
Source: req — user request to land the analysis methodology + scripts ahead of sweep completion so the recipe-default workstream is unblocked.