Research-0063 — Encoder knob-space stratifies by rate-control mode (CQ vs VBR)¶
TL;DR¶
The "VOD-HQ recipe" (-tune hq -multipass fullres -spatial_aq 1 -temporal_aq 1 -rc-lookahead 32 -bf 3 for NVENC; -look_ahead 1 -look_ahead_depth 40 -bf 4 -adaptive_i 1 -adaptive_b 1 for QSV) is calibrated for VBR/CBR rate control, not constant-CQ. At fixed CQ it regresses NVENC quality by 2.7–3.3 VMAF points and only marginally lifts QSV (+0.2 to +0.9). vmaf-tune's recommend output must condition on the rate-control mode the user is targeting before suggesting knobs.
Method¶
Two grids run on the same 9 Netflix sources, same scoring backend (libvmaf CUDA, vmaf_v0.6.1 model):
| grid | preset | quality | extras |
|---|---|---|---|
| bare | NVENC p4, QSV medium | cq 18, 25, 31, 37 | none |
| tuned | NVENC p4..p7, QSV slow..veryslow | cq 18, 22, 26, 30, 34, 38 | NVENC: -tune hq -multipass fullres -spatial_aq 1 -temporal_aq 1 -rc-lookahead 32 -bf 3; QSV: -look_ahead 1 -look_ahead_depth 40 -bf 4 -adaptive_i 1 -adaptive_b 1 |
Both pipelines use the same scripts/dev/hw_encoder_corpus.py driver (NVENC + NVDEC + libvmaf-CUDA score, see PR #392). 33,840 bare per-frame rows + 120,960 tuned per-frame rows. Comparison restricted to (encoder, cq) pairs present in both grids (12 cells).
Result¶
Mean VMAF lift (tuned − bare) across matched (encoder, cq) cells:
| encoder | mean lift |
|---|---|
h264_nvenc | −2.71 |
hevc_nvenc | −3.31 |
av1_nvenc | −0.34 |
h264_qsv | +0.21 |
av1_qsv | +0.81 |
hevc_qsv | +0.94 |
Per-cell deltas at cq=30 (mid-quality operating point):
av1_nvenc bare 96.96 / tuned 96.23 (Δ -0.73)
h264_nvenc bare 92.75 / tuned 88.06 (Δ -4.69)
hevc_nvenc bare 94.50 / tuned 90.19 (Δ -4.30)
av1_qsv bare 90.94 / tuned 92.02 (Δ +1.09)
h264_qsv bare 91.06 / tuned 90.69 (Δ -0.37)
hevc_qsv bare 90.77 / tuned 91.46 (Δ +0.69)
NVENC's tuned recipe loses ~4 VMAF points at cq=30 vs bare-defaults on h264 / hevc.
Why it stratifies by rate-control mode¶
The NVENC HQ recipe's load-bearing knobs target rate-controlled encoding:
-multipass fullresis a 2-pass mechanism that amortises the first pass's bit-budget estimate across the second pass. With constant CQ there is no bit budget — multipass either over-spends (writes a heavier first pass than the cq target needs) or under-spends. Either way the second pass diverges from what a single-pass cq encode would do.-tune hqflips the encoder's internal cost-function weights toward "high-quality VBR" — this includes more aggressive scene-cut detection and B-pyramid placement that pays off when the rate-control loop has bits to redistribute. CQ has no rate-control loop, so the cost-function shift adds overhead without the redistribution benefit.-bf 3+-rc-lookahead 32assume the encoder can amortise the 3-frame B-pyramid latency across a 32-frame look-ahead window. On 6-second 1080p clips (150 frames at 25fps) that's reasonable; on shorter clips or scene-change-heavy content the look-ahead window walks past content boundaries and pulls bad reference decisions.
QSV's recipe is less affected because Intel's MFX session implements look-ahead as a quality-improving probe regardless of rate-control mode — the lift is small (+0.2 to +0.9) but consistent.
Implication for vmaf-tune¶
vmaf-tune's recommend command (Phase B / ADR-0237) currently takes a quality target (--target-vmaf or --target-bitrate) but does not know which rate-control mode the user is targeting. Before recommending knobs the recommender must distinguish:
- CQ-target encoding ("constant quality, let bitrate float") — skip multipass, skip
-tune hqfor NVENC, prefer the simpler bare-default recipe + sweep preset. - VBR-target encoding ("hit this average bitrate at best quality") — apply the full HQ recipe, run multipass, enable look-ahead at full depth.
- CBR-target encoding ("rate-controlled live, hit this exact bitrate") — different again; tune for low-latency, no look-ahead.
The corpus generator produced today is CQ-mode only. A VBR-mode sweep against the same source set would generate a comparable corpus where the HQ recipe should genuinely lift (predicted +2 to +5 VMAF at fixed bitrate based on Intel's published QSV calibrations).
Decision¶
- Document this in the recommend-command spec — recommend output must carry a
rate_control_modefield (cq/vbr/cbr). - Phase A corpus runner gains a
--rate-controlknob; default tocq(current behaviour) but emit arate_control_modecolumn in every JSONL row. - fr_regressor_v2 input vector gains a
rate_control_modeone-hot (3-dim: cq / vbr / cbr / unknown). Append-only vocab extension following the same shape as the encoder vocab v1→v2 in PR #394. - Future tuned-recipe ADR captures recipe-per-mode separately rather than the current single-recipe assumption.
Reproducer¶
# Bare-default cq sweep (already in runs/phase_a/full_grid/nvenc_full.jsonl)
python3 scripts/dev/hw_encoder_corpus.py \
--vmaf-bin core/build-cuda/tools/vmaf \
--source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
--width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
--encoder h264_nvenc --cq 30 --out /tmp/bare.jsonl
# Tuned VOD-HQ cq sweep (same source, same cq, hq recipe)
python3 scripts/dev/hw_encoder_corpus.py \
--vmaf-bin core/build-cuda/tools/vmaf \
--source .workingdir2/netflix/ref/BigBuckBunny_25fps.yuv \
--width 1920 --height 1080 --pix-fmt yuv420p --framerate 25 \
--encoder h264_nvenc --preset p4 --cq 30 \
--extra-encode=-tune --extra-encode=hq \
--extra-encode=-multipass --extra-encode=fullres \
--extra-encode=-spatial_aq --extra-encode=1 \
--extra-encode=-temporal_aq --extra-encode=1 \
--extra-encode=-rc-lookahead --extra-encode=32 \
--extra-encode=-bf --extra-encode=3 \
--out /tmp/tuned.jsonl
# Diff pooled VMAF — expect tuned to be ~4 points lower at cq=30
References¶
- ADR-0237 — vmaf-tune Phase A scope
- PR #392 —
hw_encoder_corpus.pyrunner - PR #394 —
fr_regressor_v2ENCODER_VOCAB v2 - NVIDIA Video Codec SDK Programming Guide — multipass / tune semantics