Research-0077 — Encoder knob-space Pareto frontiers (per source × codec × rc_mode)¶
- Status: Adopted by ADR-0305
- Date: 2026-05-05
- Companion to: Research-0063
TL;DR¶
Companion analysis to Research-0063. Where 0063 identified the rate-control axis as load-bearing (NVENC's tuned recipe regresses ~4 VMAF at cq=30 against bare defaults), 0077 enumerates the full recipe space across 6 codec families × 3 rate-control modes × 9 sources and reports the Pareto frontier per slice. Headline findings land in §Headline findings via a follow-up commit when the sweep completes (ETA ~3h from this PR).
Question¶
Given the Research-0063 finding that recipe quality stratifies by rate-control mode, what is the dominance hull of encoder knob combinations on (bitrate, vmaf) within each (source, codec, rc_mode) slice, and which knob combinations qualify as ship-candidates for tools/vmaf-tune/codec_adapters/* defaults under the regression-detection invariant ("must not regress vs the bare encoder at matched bitrate within the same slice")?
Method¶
Sweep¶
12,636-cell grid generated by scripts/dev/hw_encoder_corpus.py (extended to enumerate per-codec knob axes — see ADR-0305):
| axis | values | count |
|---|---|---|
| sources | 9 Netflix (BigBuckBunny, Crowd-Run, Park-Joy, Tractor, Sintel-Trailer, In-To-Tree, Riverbed, Touchdown-Pass, Ducks-Take-Off) | 9 |
| codec families | libx264, libx265, libsvtav1, libaom-av1, libvpx-vp9, libvvenc | 6 |
| rate-control modes | cq, vbr, cbr | 3 |
| knob combinations per codec | preset × tune × spatial-aq × temporal-aq × bf × lookahead × multipass × adaptive-i/b (subset valid per codec) | ~78 avg |
Per-cell schema mirrors vmaf-tune Phase A SCHEMA_VERSION = 2: source, codec, rc_mode, preset, quality, knob_dict, bitrate_kbps, vmaf_score, encode_time_ms, clip_mode. Sweep output lives at runs/phase_a/full_grid/comprehensive.jsonl (gitignored).
Analysis¶
ai/scripts/analyze_knob_sweep.py consumes the JSONL and:
- Groups rows by
(source, codec, rc_mode). - Computes the 2-D Pareto frontier on (bitrate_kbps, vmaf_score) — minimise bitrate, maximise vmaf, ascending bitrate sort, sweep the running-max vmaf.
- Applies
encode_time_msas a tiebreaker for hull-boundary rows (lower wall time wins when bitrate and vmaf tie within floating-point tolerance). - For each slice: emits a per-slice CSV (
reports/pareto_<source>_<codec>_<rc>.csv) listing the hull rows + the dominated rows directly above the hull at each bitrate. - Aggregates the per-slice hulls into a markdown summary at
reports/summary.mdlisting, per (codec, rc_mode), the knob combinations that appear on at least one source's hull and the ones that regress against the bare-encoder baseline.
Expected frontier count¶
9 sources × 6 codecs × 3 rc_modes = 162 candidate slices. After dedup (some rc_modes are not implemented for some codecs — e.g. libvvenc ships no CBR path; some hardware-only modes are excluded from this software-encoder sweep) the realised count is ~25 distinct hulls — close to the ADR-0305 estimate.
Scaffolded headline findings¶
Headline findings: TBD pending sweep completion (~3h ETA from this PR).
This section will be populated by a follow-up commit on the same branch (research/encoder-knob-space-pareto-frontiers) once comprehensive.jsonl is complete. The placeholder below shows the schema each row will follow when populated:
| codec | rc_mode | hull-recipe knobs (frequency-of-appearance ranked) | regresses-vs-bare on N/9 sources | recommended default |
|---|---|---|---|---|
| libx264 | cq | TBD | TBD | TBD |
| libx264 | vbr | TBD | TBD | TBD |
| libx264 | cbr | TBD | TBD | TBD |
| libx265 | cq | TBD | TBD | TBD |
| ... | ... | ... | ... | ... |
Once populated, the regression check is the gate: any recipe with a non-zero "regresses-vs-bare" count is blocked from being shipped as the adapter default for that (codec, rc_mode) pair.
Reproducer¶
# 1. Generate the sweep (long-running, ~3h on a single host with
# NVENC + CUDA libvmaf available; gitignored output).
mkdir -p runs/phase_a/full_grid/
python3 scripts/dev/hw_encoder_corpus.py \
--sweep-spec scripts/dev/sweep_specs/comprehensive.yaml \
--vmaf-bin core/build-cuda/tools/vmaf \
--score-backend cuda \
--out runs/phase_a/full_grid/comprehensive.jsonl
# 2. Analyse — produces per-slice CSVs + reports/summary.md.
python3 ai/scripts/analyze_knob_sweep.py \
--jsonl runs/phase_a/full_grid/comprehensive.jsonl \
--out-dir reports/
# 3. Smoke test on the synthetic fixture (no sweep required).
pytest ai/tests/test_knob_sweep_analysis.py -v
References¶
- Research-0063 — precursor finding (CQ vs VBR stratifies recipe quality).
- ADR-0305 — governing decision (per-(source, codec, rc_mode) stratification).
- ADR-0237 —
vmaf-tunePhase A scope. - ADR-0291 —
fr_regressor_v2v2 prod ship; downstream consumer of these frontiers. - ADR-0297 — codec-agnostic encode dispatcher; the surface adapter defaults flow into.
- Source:
req— user request to land the analysis methodology + scripts ahead of sweep completion so the recipe-default workstream is unblocked.