Per-shot VMAF predictor
The per-shot VMAF predictor turns "encode every shot, score every shot" into "predict every shot, encode every shot, score a sampled subset". The predict-then-verify loop saves wall time on long titles without giving up the per-shot quality contract:
- Probe-encode each shot once at the codec's probe_quality (e.g. libx264 --preset ultrafast --crf 28).
- Read cheap signals from the probe: bitrate, per-frame-type sizes, optional saliency / signalstats.
- Feed those signals to the per-codec ONNX predictor; it returns a predicted VMAF for any candidate CRF.
- Binary-search the codec's CRF range for the largest CRF whose predicted VMAF still meets the operator's target.
- Encode at that CRF.
- Validate by re-scoring a stratified sample of shots; if the residuals stay within tolerance the predictions hold (GOSPEL), else recalibrate or fall back.
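The CRF search step can be sketched as follows. This is a minimal illustration, not the harness's actual pick_crf code: pick_largest_crf and toy_predictor are hypothetical names, the linear toy curve stands in for the ONNX predictor, and the search assumes predicted VMAF never increases as CRF grows.

```python
from typing import Callable, Optional


def pick_largest_crf(
    predict_vmaf: Callable[[int], float],
    target_vmaf: float,
    crf_min: int,
    crf_max: int,
) -> Optional[int]:
    """Return the largest CRF in [crf_min, crf_max] whose predicted VMAF
    is >= target_vmaf, or None if even crf_min misses the target.

    Assumes predict_vmaf is monotonically non-increasing in CRF.
    """
    if predict_vmaf(crf_min) < target_vmaf:
        return None
    lo, hi = crf_min, crf_max
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always shrinks
        if predict_vmaf(mid) >= target_vmaf:
            lo = mid  # mid still meets the target; try a larger CRF
        else:
            hi = mid - 1  # mid misses the target; search below it
    return lo


def toy_predictor(crf: int) -> float:
    """Made-up linear curve standing in for the per-codec predictor."""
    return 100.0 - 0.5 * crf


print(pick_largest_crf(toy_predictor, target_vmaf=92.0, crf_min=0, crf_max=51))
```

With the toy curve, the largest CRF that still predicts at least 92.0 is 16. Returning None when the lowest CRF misses the target leaves the fallback decision to the caller.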
This document covers the user-facing contract per the five-point tiny-AI bar in ADR-0042.
1. Purpose
Concretely, the predictor lets tools/vmaf-tune skip the costly real-VMAF measurement on every encoded shot; only the sampled subset in the validation step is re-scored. With 14 codec adapters (libx264, libx265, libsvtav1, libaom-av1, libvvenc plus the NVENC, AMF, and QSV families across H.264, HEVC, AV1), the harness loads model/predictor_<codec>.onnx at startup and routes every pick_crf(...) through it.
The runtime predictor surface is:
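from pathlib import Path
from vmaftune.predictor import Predictor, ShotFeatures
# features is assumed to be a ShotFeatures built from the shot's probe encode.
p = Predictor(model_path=Path("model/predictor_libx264.onnx"))
crf = p.pick_crf(features, target_vmaf=92.0, codec="libx264")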
Without model_path, the predictor falls back to a per-codec analytical curve. Tests and dev hosts without ONNX Runtime hit that path automatically; production deployments load the ONNX file.
2. Training data
The trainer is tools/vmaf-tune/src/vmaftune/predictor_train.py. It consumes the same vmaf-tune Phase A JSONL corpus (ADR-0237) that the recommend / per-shot tools already produce — one row per (source, preset, crf) cell with bitrate_kbps and the measured vmaf_score.
The hardware-encoder models for h264_nvenc, hevc_nvenc, av1_nvenc, h264_qsv, hevc_qsv, and av1_qsv are trained on the real Phase-A hardware sweep at runs/phase_a/full_grid/comprehensive.jsonl (local training corpus; not committed). Their model cards carry corpus.kind: real-N=<rows> and honest held-out metrics.
The software and AMF models remain synthetic-stub models per ADR-0325: each such codec gets a deterministic 100-row synthetic corpus seeded by the codec name. The synthetic target is the predictor's own analytical-fallback curve, so the resulting ONNX model is a smooth re-encoding of the analytical formula. Stub models are not authoritative for production CRF picks. Every per-codec model card flags this prominently.
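A deterministic, name-seeded corpus of this kind could be generated roughly as below. This is a sketch under stated assumptions, not the trainer's code: codec_seed, toy_curve, and synthetic_corpus are hypothetical names, the quadratic curve is a made-up stand-in for the analytical fallback, and the CRF range of 10 to 51 is illustrative only.

```python
import hashlib
import random
from typing import Dict, List


def codec_seed(codec: str) -> int:
    """Derive a stable integer seed from the codec name.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would not be reproducible across runs.
    """
    digest = hashlib.sha256(codec.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def toy_curve(crf: float) -> float:
    """Made-up stand-in for the analytical fallback: VMAF falls with CRF."""
    return max(0.0, min(100.0, 100.0 - 0.03 * crf * crf))


def synthetic_corpus(codec: str, rows: int = 100) -> List[Dict[str, float]]:
    """Build a reproducible synthetic corpus for one codec."""
    rng = random.Random(codec_seed(codec))  # private RNG, no global state
    corpus = []
    for _ in range(rows):
        crf = rng.uniform(10.0, 51.0)  # illustrative CRF range
        corpus.append({"crf": crf, "vmaf_score": toy_curve(crf)})
    return corpus


corpus = synthetic_corpus("libx264")
print(len(corpus), corpus == synthetic_corpus("libx264"))
```

Because the target is a closed-form curve, a network trained on such rows can only learn to reproduce that curve, which is why stub models carry no information beyond the analytical fallback.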
To train real models on a real corpus:
# 1. Generate the corpus (Phase A, may take hours).
python -m vmaftune.cli corpus --encoder libx264 \
    --source ref.yuv --output corpus.jsonl ...
# 2. Re-train the predictor for one or more codecs against it.
python -m vmaftune.predictor_train \
    --corpus corpus.jsonl \
    --output-dir model \
    --codec libx264 --codec libx265 \
    --epochs 200
The trainer writes one ONNX + one model card per requested codec. Codecs not present in the corpus fall back to the synthetic-stub path in the same run; mixed runs are explicit in each card via the corpus.kind line.
Corpus row → predictor input projection
| Predictor input | Source |
|---|---|
| crf | row crf (cq / q aliases accepted for hardware sweeps) |
| probe_bitrate_kbps | row bitrate_kbps (actual_kbps alias accepted) |
| probe_*_avg_bytes | row values when present; otherwise derived from bitrate_kbps + framerate |
| saliency_* / signalstats | row values when present; otherwise zero |
| shot_length_frames | framerate × duration_s |
| fps, width, height | row metadata |
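The projection in the table can be sketched as a small function. Several details here are assumptions rather than the trainer's actual behaviour: the helper names first_present and project_row, the per-frame-type keys probe_i_avg_bytes / probe_p_avg_bytes / probe_b_avg_bytes (a hypothetical expansion of the probe_*_avg_bytes wildcard), and the bytes-per-frame formula used as the derived default. Signalstats inputs are omitted because their names are not listed here.

```python
from typing import Any, Dict, Mapping

# Hypothetical expansion of the probe_*_avg_bytes wildcard; the real
# per-frame-type key names are not listed in this document.
FRAME_TYPES = ("i", "p", "b")


def first_present(row: Mapping[str, Any], *keys: str) -> Any:
    """Return the value of the first key found in row."""
    for key in keys:
        if key in row:
            return row[key]
    raise KeyError(f"none of {keys} present in corpus row")


def project_row(row: Mapping[str, Any]) -> Dict[str, float]:
    """Map one corpus row to predictor inputs, applying aliases and defaults."""
    fps = float(row["framerate"])
    bitrate_kbps = float(first_present(row, "bitrate_kbps", "actual_kbps"))
    # Assumed derivation: mean bytes per frame implied by the bitrate.
    derived_avg_bytes = bitrate_kbps * 1000.0 / 8.0 / fps

    features = {
        "crf": float(first_present(row, "crf", "cq", "q")),
        "probe_bitrate_kbps": bitrate_kbps,
        "shot_length_frames": fps * float(row["duration_s"]),
        "fps": fps,
        "width": float(row["width"]),
        "height": float(row["height"]),
        # Saliency inputs default to zero when the row lacks them.
        "saliency_mean": float(row.get("saliency_mean", 0.0)),
        "saliency_var": float(row.get("saliency_var", 0.0)),
    }
    for frame_type in FRAME_TYPES:
        key = f"probe_{frame_type}_avg_bytes"
        features[key] = float(row.get(key, derived_avg_bytes))
    return features


row = {"cq": 23, "actual_kbps": 2400.0, "framerate": 24.0,
       "duration_s": 2.5, "width": 1920, "height": 1080}
print(project_row(row))
```

The example row uses the hardware-sweep aliases (cq, actual_kbps) and carries no probe or saliency fields, so every optional input takes its default.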
The runtime extractor in predictor_features.py populates the saliency / signalstats inputs from a real probe run. vmaf-tune predict --use-saliency decodes the current shot to temporary yuv420p and runs the configured saliency_student ONNX model over sampled frames before feeding saliency_mean and saliency_var into ShotFeatures; this is a predictor input, not the ROI encode path from recommend-saliency.
3. Op allowlist compliance
The trainer validates every exported ONNX against the libvmaf C-side allowlist (core/src/dnn/op_allowlist.c) via ai/src/vmaf_train/op_allowlist.py. Failure aborts the export.
The shipped MLP graph uses only allowlisted ops:
| Op | Used in |
|---|---|
| Sub, Div | per-feature input normalisation |
| Gemm | three fully-connected layers |
| Relu | hidden-layer activation |
| Sigmoid | output range gating |
| Mul | output × 100 to land in [0, 100] |
| Constant | normalisation buffers + bias terms |
Op-allowlist status appears in every per-codec model card under section 3.
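To make the op table concrete, the forward pass it implies can be written in plain Python. This is a sketch only: layer sizes and weights are invented for the example, and placing Relu after the first two Gemm layers (not the last) is an assumption based on the "hidden-layer activation" description.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weight rows, bias)


def gemm(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """y = W.x + b, with weight given as one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def forward(features: Sequence[float], mean: Sequence[float],
            std: Sequence[float], layers: Sequence[Layer]) -> float:
    """Normalise, run three fully-connected layers, gate to [0, 100]."""
    x = [(f - m) / s for f, m, s in zip(features, mean, std)]  # Sub, Div
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(gemm(x, w1, b1))   # Gemm + Relu
    h = relu(gemm(h, w2, b2))   # Gemm + Relu
    (logit,) = gemm(h, w3, b3)  # Gemm, single output unit
    return 100.0 * sigmoid(logit)  # Sigmoid, then Mul by 100


# Tiny invented network: 2 inputs -> 2 hidden -> 1 hidden -> 1 output.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-2.0]),
    ([[1.0]], [0.0]),
]
print(forward([2.0, 4.0], mean=[1.0, 2.0], std=[1.0, 2.0], layers=layers))
```

The Sigmoid followed by the ×100 Mul is what bounds the output to [0, 100] regardless of the weights.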
4. Validation metrics
Each model card carries PLCC, SROCC, and RMSE on the held-out 20 % split (seeded shuffle). Stub-model numbers are artificially high because the regression target is the analytical fallback curve itself, so the network only has to re-fit a smooth formula. Real-corpus runs produce honest numbers; the production gate is "PLCC ≥ 0.95 on the held-out split per codec", matching the existing fr_regressor_v2 gate (ADR-0291).
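For reference, the three metrics and the PLCC gate can be computed as in the following sketch. The function names are hypothetical and the sample scores are invented; the trainer may well use a numerical library instead of this pure-Python version.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """PLCC: Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """SROCC: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


def passes_gate(pred: Sequence[float], truth: Sequence[float],
                min_plcc: float = 0.95) -> bool:
    """Production gate from the text: PLCC >= 0.95 on the held-out split."""
    return pearson(pred, truth) >= min_plcc


predicted = [90.0, 80.0, 70.0, 60.0]  # invented held-out predictions
measured = [91.0, 79.0, 71.0, 59.0]   # invented measured VMAF
print(pearson(predicted, measured), spearman(predicted, measured),
      rmse(predicted, measured), passes_gate(predicted, measured))
```

PLCC measures linear agreement, SROCC measures rank agreement only, and RMSE reports the error in VMAF points, so the three together catch failures that any single metric would miss.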
The trainer also pins the runtime contract via tools/vmaf-tune/tests/test_predictor_train.py:
- Every shipped model/predictor_<codec>.onnx loads under ONNX Runtime CPU.
- Output is finite and clamped to [0, 100].
- Output is monotonically non-increasing in CRF.
- Predictor(model_path=...) routes through the ONNX session, not the analytical fallback.
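The numerical parts of that contract (finite, in range, non-increasing in CRF) can be checked with a helper along these lines. contract_violations is a hypothetical name and this is not the code in test_predictor_train.py; it only illustrates the properties being pinned.

```python
import math
from typing import List, Sequence


def contract_violations(predictions: Sequence[float]) -> List[str]:
    """Check predicted VMAF values sampled at ascending CRFs.

    predictions[i] is the predicted VMAF at the i-th CRF, lowest CRF first.
    Returns a list of human-readable violations; empty means the sweep
    satisfies the contract.
    """
    problems = []
    for i, value in enumerate(predictions):
        if not math.isfinite(value):
            problems.append(f"index {i}: non-finite value {value}")
            continue
        if not 0.0 <= value <= 100.0:
            problems.append(f"index {i}: {value} outside [0, 100]")
    for i in range(1, len(predictions)):
        prev, nxt = predictions[i - 1], predictions[i]
        if nxt > prev:  # equal neighbours are allowed (non-strict)
            problems.append(f"index {i}: {nxt} rises above {prev}")
    return problems


print(contract_violations([98.0, 95.5, 95.5, 80.0]))  # plateau is fine
print(contract_violations([90.0, 91.0, 120.0]))
```

Returning a list of messages instead of raising on the first failure makes it easier to report every violation for a model in one test run.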
5. Signing
Production-grade tiny-AI weights ship with a Sigstore-keyless OIDC signature attached at the release-please tag step (per the existing model/tiny/*.onnx pattern; see docs/development/release.md). Stub models ship unsigned because their numerical content is not authoritative; their cards carry a Sigstore signature: PLACEHOLDER line. Real-corpus model files also remain unsigned while they sit in-tree on a branch; release automation attaches their signature bundles only when a tag is published.
File layout
model/
  predictor_libx264.onnx + predictor_libx264_card.md
  predictor_libx265.onnx + predictor_libx265_card.md
  predictor_libsvtav1.onnx + predictor_libsvtav1_card.md
  predictor_libaom-av1.onnx + predictor_libaom-av1_card.md
  predictor_libvvenc.onnx + predictor_libvvenc_card.md
  predictor_h264_nvenc.onnx + predictor_h264_nvenc_card.md
  predictor_hevc_nvenc.onnx + predictor_hevc_nvenc_card.md
  predictor_av1_nvenc.onnx + predictor_av1_nvenc_card.md
  predictor_h264_amf.onnx + predictor_h264_amf_card.md
  predictor_hevc_amf.onnx + predictor_hevc_amf_card.md
  predictor_av1_amf.onnx + predictor_av1_amf_card.md
  predictor_h264_qsv.onnx + predictor_h264_qsv_card.md
  predictor_hevc_qsv.onnx + predictor_hevc_qsv_card.md
  predictor_av1_qsv.onnx + predictor_av1_qsv_card.md
tools/vmaf-tune/src/vmaftune/
  predictor.py            # runtime: Predictor, pick_crf, pick_keyint
  predictor_features.py   # probe-encode + signalstats extractor
  predictor_train.py      # trainer (this PR)
  predictor_validate.py   # GOSPEL / RECALIBRATE / FALL_BACK loop
tools/vmaf-tune/tests/
  test_predictor.py         # runtime + analytical fallback pins
  test_predictor_train.py   # trainer + shipped-model pins (this PR)
Reproduction
# Re-train every shipped stub (~30s on CPU).
python -m vmaftune.predictor_train --output-dir model --epochs 120
# Verify every shipped model.
pytest tools/vmaf-tune/tests/test_predictor_train.py -v