ADR-0400: encoder-internal-stats capture (corpus expansion v1)¶
- Status: Accepted
- Date: 2026-05-08
- Deciders: lusoris, Claude
- Tags: vmaf-tune, corpus, predictor, x264
Context¶
The vmaf-tune corpus generator currently records only the post-hoc encode outcome — output size, wall time, VMAF score, version provenance. The encoder's own rate-distortion ledger (predicted bitrate, QP, motion-vector cost, texture cost, partition decisions) is discarded. Per the contributor-pack research digest (docs/research/0086-contributor-pack-web-data-expansion-2026-05-08.md, ranked highest-leverage signal expansion), this signal is "free at encode time" — already produced by every modern encoder during a single --pass 1 / -pass 1 -passlogfile <prefix> invocation. Capturing it closes the loop on what the encoder's own rate-distortion engine saw, not just on what the input pixels look like.
Decision¶
We extend the corpus row schema (bumped to v3) with ten enc_internal_* scalar aggregates capturing per-frame x264 pass-1 stats: mean/std QP, mean/std bits, mean/std MV cost, mean intra-texture cost, mean predicted-texture cost, intra-MB ratio, skip-MB ratio. The codec-adapter contract gains a supports_encoder_stats: bool flag; libx264 and libx265 opt in (libx264 parsed in v1, libx265 wired through but parser in a follow-up), every hardware encoder and software AV1 encoder opts out. Rows for opt-out codecs emit zeros so the schema is uniform across the corpus.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Capture stats inline during the production encode | One pass instead of two | x264's stats writer requires -pass 1 mode; mode-switching mid-encode isn't supported by FFmpeg | Two-pass (stats-only pass + production CRF pass) is the canonical workflow |
Parse the post-encode bitstream with ffprobe -show_frames | No second encode | Misses RC-internal signal (QP-pre-AQ, partition cost, predicted bits); ffprobe is decoder-side, not encoder-side | Can't recover the "what the encoder considered" signal |
| Defer until a multi-codec parser is ready (libx265, libvpx) | Single PR delivers all codecs | Predictor-integration follow-up is gated on x264 alone landing; multi-codec adds weeks | Ship x264 now; libx265/libvpx parsers land additively |
Consequences¶
- Positive: predictor gains the encoder's own RD-loop signal, which the contributor-pack digest sized as the highest-leverage available feature. Schema is uniform across codecs (zeros for opt-out). Adapter contract is explicit — adding a new encoder forces an opt-in/opt-out choice.
- Negative: per-encode wall-clock cost roughly doubles for
supports_encoder_stats=Trueadapters — we run a stats-only pass-1 invocation before the production CRF encode. The corpus sweep gains the new columns at 2× cost. This is the documented trade-off; no way to recover the signal without the extra pass. - Neutral / follow-ups: predictor integration is out of scope for this PR; the predictor consumes the new columns in a follow-up after the schema lands. Tests verify the columns are populated and schema-uniform; correlation with VMAF is measured downstream. libx265 / libvpx-specific parsers land in additional follow-up PRs on top of the v3 schema (no further version bump needed).
References¶
- Research digest:
docs/research/0086-contributor-pack-web-data-expansion-2026-05-08.md(ranked encoder-internal stats as highest-leverage signal expansion). - x264 source:
encoder/ratecontrol.c,x264_ratecontrol_summarywriter. - FFmpeg pass-1:
doc/encoders.texi,libavcodec/libx264.c(passlog passthrough). - Coordinates with ADR-0302 (encoder-vocab-v3-schema-expansion); whichever PR lands first claims SCHEMA_VERSION=3, the other lands its columns additively.
- Source:
req(direct user request: "capture per-frame encoder-internal statistics that x264 already emits via--pass 1/--stats").