Skip to content

Research-0077 — Encoder knob-space Pareto frontiers (per source × codec × rc_mode)

TL;DR

Companion analysis to Research-0063. Where 0063 identified the rate-control axis as load-bearing (NVENC's tuned recipe regresses ~4 VMAF at cq=30 against bare defaults), 0077 enumerates the full recipe space across 6 codec families × 3 rate-control modes × 9 sources and reports the Pareto frontier per slice. Headline findings land in §Headline findings via a follow-up commit when the sweep completes (ETA ~3h from this PR).

Question

Given the Research-0063 finding that recipe quality stratifies by rate-control mode, what is the dominance hull of encoder knob combinations on (bitrate, vmaf) within each (source, codec, rc_mode) slice, and which knob combinations qualify as ship-candidates for tools/vmaf-tune/codec_adapters/* defaults under the regression-detection invariant ("must not regress vs the bare encoder at matched bitrate within the same slice")?

Method

Sweep

12,636-cell grid generated by scripts/dev/hw_encoder_corpus.py (extended to enumerate per-codec knob axes — see ADR-0305):

axis values count
sources 9 Netflix (BigBuckBunny, Crowd-Run, Park-Joy, Tractor, Sintel-Trailer, In-To-Tree, Riverbed, Touchdown-Pass, Ducks-Take-Off) 9
codec families libx264, libx265, libsvtav1, libaom-av1, libvpx-vp9, libvvenc 6
rate-control modes cq, vbr, cbr 3
knob combinations per codec preset × tune × spatial-aq × temporal-aq × bf × lookahead × multipass × adaptive-i/b (subset valid per codec) ~78 avg

Per-cell schema mirrors vmaf-tune Phase A SCHEMA_VERSION = 2: source, codec, rc_mode, preset, quality, knob_dict, bitrate_kbps, vmaf_score, encode_time_ms, clip_mode. Sweep output lives at runs/phase_a/full_grid/comprehensive.jsonl (gitignored).

Analysis

ai/scripts/analyze_knob_sweep.py consumes the JSONL and:

  1. Groups rows by (source, codec, rc_mode).
  2. Computes the 2-D Pareto frontier on (bitrate_kbps, vmaf_score) — minimise bitrate, maximise vmaf, ascending bitrate sort, sweep the running-max vmaf.
  3. Applies encode_time_ms as a tiebreaker for hull-boundary rows (lower wall time wins when bitrate and vmaf tie within floating-point tolerance).
  4. For each slice: emits a per-slice CSV (reports/pareto_<source>_<codec>_<rc>.csv) listing the hull rows + the dominated rows directly above the hull at each bitrate.
  5. Aggregates the per-slice hulls into a markdown summary at reports/summary.md listing, per (codec, rc_mode), the knob combinations that appear on at least one source's hull and the ones that regress against the bare-encoder baseline.

Expected frontier count

9 sources × 6 codecs × 3 rc_modes = 162 candidate slices. After dedup (some rc_modes are not implemented for some codecs — e.g. libvvenc ships no CBR path; some hardware-only modes are excluded from this software-encoder sweep) the realised count is ~25 distinct hulls — close to the ADR-0305 estimate.

Scaffolded headline findings

Headline findings: TBD pending sweep completion (~3h ETA from this PR).

This section will be populated by a follow-up commit on the same branch (research/encoder-knob-space-pareto-frontiers) once comprehensive.jsonl is complete. The placeholder below shows the schema each row will follow when populated:

codec rc_mode hull-recipe knobs (frequency-of-appearance ranked) regresses-vs-bare on N/9 sources recommended default
libx264 cq TBD TBD TBD
libx264 vbr TBD TBD TBD
libx264 cbr TBD TBD TBD
libx265 cq TBD TBD TBD
... ... ... ... ...

Once populated, the regression check is the gate: any recipe with a non-zero "regresses-vs-bare" count is blocked from being shipped as the adapter default for that (codec, rc_mode) pair.

Reproducer

# 1. Generate the sweep (long-running, ~3h on a single host with
#    NVENC + CUDA libvmaf available; gitignored output).
mkdir -p runs/phase_a/full_grid/
python3 scripts/dev/hw_encoder_corpus.py \
  --sweep-spec scripts/dev/sweep_specs/comprehensive.yaml \
  --vmaf-bin core/build-cuda/tools/vmaf \
  --score-backend cuda \
  --out runs/phase_a/full_grid/comprehensive.jsonl

# 2. Analyse — produces per-slice CSVs + reports/summary.md.
python3 ai/scripts/analyze_knob_sweep.py \
  --jsonl runs/phase_a/full_grid/comprehensive.jsonl \
  --out-dir reports/

# 3. Smoke test on the synthetic fixture (no sweep required).
pytest ai/tests/test_knob_sweep_analysis.py -v

References

  • Research-0063 — precursor finding (CQ vs VBR stratifies recipe quality).
  • ADR-0305 — governing decision (per-(source, codec, rc_mode) stratification).
  • ADR-0237vmaf-tune Phase A scope.
  • ADR-0291fr_regressor_v2 v2 prod ship; downstream consumer of these frontiers.
  • ADR-0297 — codec-agnostic encode dispatcher; the surface adapter defaults flow into.
  • Source: req — user request to land the analysis methodology + scripts ahead of sweep completion so the recipe-default workstream is unblocked.