vmaf_tiny_v2 — feature-fusion VMAF estimator
vmaf_tiny_v2 is a tiny multi-layer perceptron that predicts a VMAF score from six classic libvmaf features (adm2, vif_scale0..3, motion2 — the canonical-6 set used by vmaf_v0.6.1). It replaces vmaf_tiny_v1 as the default tiny VMAF fusion model: same input contract, same output range, +0.005–0.018 PLCC across the validation chain (Netflix LOSO + KoNViD 5-fold).
The model is only the regressor. Feature extraction is unchanged: adm, vif, and motion are computed by the existing libvmaf CPU/GPU paths. v2 just gives you a smaller, more accurate fusion head than the upstream SVM.
What the output means
A single scalar per frame, on the same 0–100 VMAF scale as the classic SVM regressor.
| Value | Interpretation |
|---|---|
| 100 | Perceptually identical to the reference |
| 80–95 | High-quality encode |
| 60–80 | Visible compression artifacts |
| < 60 | Heavy degradation |
Shipped checkpoint
| Field | Value |
|---|---|
| Model name | vmaf_tiny_v2 |
| Location | model/tiny/vmaf_tiny_v2.onnx |
| Architecture | mlp_small — Linear(6, 16) → ReLU → Linear(16, 8) → ReLU → Linear(8, 1), ~257 params |
| Input | features — float32 [N, 6], dynamic batch |
| Feature order | adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2 |
| Output | vmaf — float32 [N] |
| ONNX opset | 17 |
| Quantisation | fp32 (size already <2 KB; 8-bit has no shipping payoff) |
| License | BSD-3-Clause-Plus-Patent |
| Registry entry | vmaf_tiny_v2 in model/tiny/registry.json |
| Sidecar | model/tiny/vmaf_tiny_v2.json |
| Exporter | ai/scripts/export_vmaf_tiny_v2.py |
| Trainer | ai/scripts/train_vmaf_tiny_v2.py |
The graph bakes the training-set StandardScaler (mean, std) in as Constant nodes that run before the MLP. The runtime feeds raw feature values, and the trust-root sha256 covers the calibration values too, so there is no out-of-band scaler file to ship.
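Because the scaler constants live inside the graph, a single hash of the .onnx file covers weights and calibration together. A minimal sketch of such a check follows. Where the expected digest is stored is not stated here, so `expected_hex` is a hypothetical input rather than a documented registry field.

```python
import hashlib
from pathlib import Path


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string (hypothetical input)."""
    return file_sha256(path) == expected_hex.strip().lower()
```

Calling `matches_digest("model/tiny/vmaf_tiny_v2.onnx", expected_hex)` before creating a session would reject a file whose weights or scaler constants were altered.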
Fresh exports add a run_provenance block to the sidecar. It records the exporter entrypoint, CLI arguments, checkpoint input, and ONNX / sidecar output paths so refreshed model artifacts can be traced without reading local shell history.
Effective topology:
features [N, 6]
|
Sub <- mean ([6] constant)
|
Div <- std ([6] constant)
|
Linear(6,16) → ReLU → Linear(16,8) → ReLU → Linear(8,1)
|
Squeeze(-1) -> vmaf [N]
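The topology above can be mirrored in a few lines of NumPy, which also confirms the parameter count in the checkpoint table. This is a sketch only: the weights and scaler statistics below are random or identity placeholders, not the shipped values, and the helper names are made up for illustration.

```python
import numpy as np

LAYER_SIZES = [(6, 16), (16, 8), (8, 1)]  # (in, out) per Linear layer


def count_params(layer_sizes):
    """Weights plus biases for each Linear layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_sizes)


def init_placeholder_weights(seed=0):
    """Random stand-ins; the real weights live inside the ONNX file."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_in, n_out)).astype(np.float32),
         np.zeros(n_out, dtype=np.float32))
        for n_in, n_out in LAYER_SIZES
    ]


def forward(features, mean, std, weights):
    """Sub -> Div -> Linear/ReLU x2 -> Linear -> Squeeze(-1)."""
    x = (features - mean) / std
    for i, (w, b) in enumerate(weights):
        x = x @ w + b
        if i < len(weights) - 1:  # no ReLU after the last layer
            x = np.maximum(x, 0.0)
    return np.squeeze(x, axis=-1)


features = np.ones((4, 6), dtype=np.float32)
mean = np.zeros(6, dtype=np.float32)  # placeholder scaler stats
std = np.ones(6, dtype=np.float32)
vmaf = forward(features, mean, std, init_placeholder_weights())
print(count_params(LAYER_SIZES), vmaf.shape)  # 257 (4,)
```

The count breaks down as 112 + 136 + 9 = 257 for the three Linear layers. The 12 scaler constants are not counted as parameters here.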
Training data
- Netflix Public Dataset (9 sources × encodings — local extract).
- KoNViD-1k (5-fold extract; CC BY 4.0; not redistributed).
- BVI-DVC subsets A + B + C + D (full coverage).
All are combined into runs/full_features_4corpus.parquet (330 499 frame-rows × 22 FULL_FEATURES, plus the vmaf teacher score from vmaf_v0.6.1). The production export fits both the StandardScaler and the MLP on this 4-corpus union. LOSO and 5-fold are the validation methodology, not the deployment recipe.
The shipped weights were retrained on the full 4-corpus union after the 3-corpus sweep validated the canonical-6 + lr=1e-3 + 90ep configuration (Phase-3 chain → ADR-0244). Adding the BVI-DVC A + B subsets brings the row count from 305 795 to 330 499 (+24 704 rows, +8.1 %) and keeps train PLCC at 0.9999 / RMSE 0.153.
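The row-count delta quoted above can be checked with a few lines of arithmetic, using only the two totals given in the text:

```python
rows_before = 305_795  # row count before adding the BVI-DVC A + B subsets
rows_after = 330_499   # 4-corpus union

added = rows_after - rows_before
growth_pct = 100.0 * added / rows_before
print(added, round(growth_pct, 1))  # 24704 8.1
```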
Validation
Per the Phase-3 chain (Research-0027 → 0028 → 0029 → 0030):
| Methodology | PLCC | SROCC | Notes |
|---|---|---|---|
| Netflix LOSO (9 folds × 5 seeds) | 0.9978 ± 0.0021 | 0.9959 ± 0.0027 | +0.005–0.018 over Subset-B baseline |
| KoNViD 5-fold | 0.9998 | 0.9989 | corpus-portability gate (Phase-3b) |
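For readers unfamiliar with the Netflix row, here is a minimal sketch of the fold construction, assuming LOSO means leave-one-source-out (nine sources giving nine folds). The source labels and the `loso_folds` helper are hypothetical and are not taken from the project's scripts.

```python
from collections import defaultdict


def loso_folds(sources):
    """Yield (held_out, train_idx, test_idx) for each distinct source label."""
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)
    for held_out in sorted(by_source):
        test_idx = by_source[held_out]
        train_idx = sorted(
            i for src, idxs in by_source.items() if src != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example: three hypothetical sources, two frame-rows each.
rows = ["src_a", "src_a", "src_b", "src_b", "src_c", "src_c"]
for held_out, train_idx, test_idx in loso_folds(rows):
    print(held_out, train_idx, test_idx)
```

Splitting by source rather than by row keeps every frame of a held-out clip out of training, which is what makes the fold a test of generalisation to unseen content.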
The ship gate (minimum PLCC of 0.97) runs in ai/scripts/validate_vmaf_tiny_v2.py against the first 100 rows of runs/full_features_netflix.parquet; the script refuses to exit 0 when PLCC falls below the gate. Pass --out-json when preserving promotion evidence; the JSON report includes ADR-0661 run_provenance for the ONNX, parquet, parsed gate arguments, and report path.
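The gate logic amounts to a Pearson correlation followed by a threshold. The sketch below illustrates that idea with NumPy. It is not the validation script's actual code, and the function names and toy scores are invented for the example.

```python
import numpy as np


def plcc(pred, target):
    """Pearson linear correlation coefficient between two 1-D arrays."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.corrcoef(pred, target)[0, 1])


def gate_exit_code(pred, target, min_plcc=0.97):
    """Return 0 when PLCC meets the gate, 1 otherwise."""
    return 0 if plcc(pred, target) >= min_plcc else 1


teacher = np.array([35.0, 52.0, 68.0, 81.0, 94.0])
close = teacher + np.array([0.3, -0.2, 0.1, -0.4, 0.2])  # small errors
print(gate_exit_code(close, teacher))          # 0: passes the gate
print(gate_exit_code(teacher[::-1], teacher))  # 1: anti-correlated, fails
```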
Usage — CLI
# Use vmaf_tiny_v2 instead of the classic SVM regressor.
vmaf -r ref.yuv -d dis.yuv -w 1920 -h 1080 -p 420 -b 8 \
    --tiny-model model/tiny/vmaf_tiny_v2.onnx \
    --tiny-device auto
--tiny-device auto walks cuda → openvino → rocm → cpu. The model is so small (<2 KB) that the dispatch overhead dominates wall-clock on every device; CPU is usually the fastest path.
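The fallback walk can be pictured as a first-match search over a fixed preference list. This is a conceptual sketch only: how the CLI actually probes for devices is not described here, and `pick_device` is a hypothetical helper.

```python
AUTO_ORDER = ("cuda", "openvino", "rocm", "cpu")


def pick_device(available, order=AUTO_ORDER):
    """Return the first device in `order` that appears in `available`."""
    for device in order:
        if device in available:
            return device
    raise RuntimeError("no usable device; expected at least 'cpu'")


print(pick_device({"cpu", "openvino"}))  # openvino
print(pick_device({"cpu"}))              # cpu
```

Given the dispatch-overhead note above, passing an explicit CPU device instead of auto may be the more predictable choice for a model this small.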
Usage — Python (ONNX Runtime)
For research workflows that already have the canonical-6 features in hand (e.g. from runs/full_features_*.parquet):
import numpy as np
import onnxruntime as ort
import pandas as pd

sess = ort.InferenceSession("model/tiny/vmaf_tiny_v2.onnx",
                            providers=["CPUExecutionProvider"])
df = pd.read_parquet("runs/full_features_netflix.parquet").head(100)
# Column order must match the model's feature order.
features = df[
    ["adm2", "vif_scale0", "vif_scale1",
     "vif_scale2", "vif_scale3", "motion2"]
].to_numpy(dtype=np.float32)
(vmaf,) = sess.run(None, {"features": features})
print(vmaf[:5])  # -> per-frame VMAF estimates
Reproducer
# 1. Train on the 4-corpus parquet (~12 min CPU on a typical dev box).
python3 ai/scripts/train_vmaf_tiny_v2.py \
    --parquet runs/full_features_4corpus.parquet \
    --out-ckpt /tmp/vmaf_tiny_v2.pt \
    --out-stats /tmp/vmaf_tiny_v2_stats.json

# 2. Export to ONNX with bundled scaler stats.
python3 ai/scripts/export_vmaf_tiny_v2.py \
    --ckpt /tmp/vmaf_tiny_v2.pt \
    --out-onnx model/tiny/vmaf_tiny_v2.onnx \
    --out-sidecar model/tiny/vmaf_tiny_v2.json

# 3. Validate (PLCC must be >= 0.97 on the Netflix slice).
python3 ai/scripts/validate_vmaf_tiny_v2.py \
    --onnx model/tiny/vmaf_tiny_v2.onnx \
    --parquet runs/full_features_netflix.parquet \
    --rows 100 --min-plcc 0.97 \
    --out-json runs/vmaf_tiny_v2_validate.json
The training stats JSON includes ADR-0661 run_provenance with the trainer entrypoint, argv, parsed hyperparameters, parquet input, checkpoint target, and stats target. Keep that block with any refreshed stats used for export.
Limitations
- The model fuses six already-extracted features; it is not a pixel-input quality model. To score raw YUV, run the regular libvmaf feature extraction stage first.
- Trained on SDR content. HDR coverage is out of scope until the upstream HDR feature extractors land.
- The 4-corpus parquet uses vmaf_v0.6.1 as the teacher score; v2 cannot exceed vmaf_v0.6.1 in absolute correctness, since it approximates the SVM with a much smaller MLP.
- Bit-exactness across CPU/GPU execution providers is not guaranteed (ADR-0042 / ADR-0119: the places=4 tolerance applies to tiny-AI models too).
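When comparing scores across execution providers, a tolerance check is the practical substitute for bit-exact equality. The sketch below reads places=4 in the sense of Python's unittest `assertAlmostEqual`, which is an assumption about what the ADRs mean. The helper names and sample scores are illustrative.

```python
def almost_equal(a, b, places=4):
    """Mirror unittest's assertAlmostEqual: round(|a - b|, places) == 0."""
    return round(abs(a - b), places) == 0


def scores_match(cpu_scores, gpu_scores, places=4):
    """True when every per-frame pair agrees to `places` decimal places."""
    if len(cpu_scores) != len(gpu_scores):
        return False
    return all(almost_equal(a, b, places) for a, b in zip(cpu_scores, gpu_scores))


cpu = [93.12345, 71.50002]
gpu = [93.12346, 71.50001]  # differences of about 1e-5
print(scores_match(cpu, gpu))                  # True
print(scores_match(cpu, [93.1244, 71.50001]))  # False
```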