`fr_regressor_v1`: full-reference VMAF score regressor (C1 baseline)

fr_regressor_v1 is the Wave-1 C1 baseline for full-reference scoring: a tiny MLP that maps libvmaf's classical 6-feature vector (adm2, vif_scale0..3, motion2) to a per-frame VMAF score. It is the neural-network sibling of the production vmaf_v0.6.1 SVR, with the same input and the same target. It is exported as an ONNX graph that uses only operators from the 67-op allowlist, so it can run inside libvmaf's tiny-AI inference path on every supported execution provider (CPU / CUDA / OpenVINO / ROCm).

Status: refreshed 2026-05-20. Originally shipped 2026-04-29 (ADR-0249); retrained on runs/full_features_netflix_refresh_20260520.parquet after the May bugfix wave in feature extraction and default paths. See ADR-0647.

What the output means

The model emits one scalar feature score per frame pair. Its range matches vmaf_v0.6.1: typically [0, 100], with values above 100 clipped at the libvmaf level.

Value	Interpretation
0	Catastrophic distortion
~30	Heavily compressed, blocky
~60	Visible artefacts but watchable
~80	Near-transparent on consumer displays
100	Indistinguishable from the reference
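For quick triage of raw scores, the anchors in the interpretation table can be turned into a nearest-anchor lookup. This is a minimal sketch, not part of libvmaf: the labels are copied from the table, the helper name `describe_score` is hypothetical, and only the upper clip at 100 mirrors documented libvmaf behaviour.

```python
# Anchor points copied from the interpretation table; the helper itself is hypothetical.
ANCHORS = (
    (0.0, "Catastrophic distortion"),
    (30.0, "Heavily compressed, blocky"),
    (60.0, "Visible artefacts but watchable"),
    (80.0, "Near-transparent on consumer displays"),
    (100.0, "Indistinguishable from the reference"),
)


def describe_score(score: float) -> str:
    """Return the label of the anchor closest to a VMAF-scale score."""
    # Mirror libvmaf's upper clip; scores below 0 simply map to the 0 anchor.
    clipped = min(float(score), 100.0)
    _, label = min(ANCHORS, key=lambda anchor: abs(anchor[0] - clipped))
    return label
```

Ties (for example 45.0, halfway between 30 and 60) resolve to the lower anchor, because `min` keeps the first minimal entry and the anchors are listed in ascending order.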

Per-frame PLCC against vmaf_v0.6.1 on the Netflix Public 9-fold LOSO hold-out is recorded in the sidecar model/tiny/fr_regressor_v1.json (field training.loso_mean_plcc). The ship gate is a mean LOSO PLCC ≥ 0.95; below that threshold the trainer refuses to overwrite the registry.
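A consumer can re-check the gate against a shipped sidecar before trusting a checkpoint. A minimal sketch, assuming the dotted name `training.loso_mean_plcc` denotes a nested `training` object holding a `loso_mean_plcc` number; the function names are hypothetical.

```python
import json
from pathlib import Path

SHIP_THRESHOLD = 0.95  # documented mean LOSO PLCC gate


def sidecar_loso_plcc(sidecar: dict) -> float:
    """Read training.loso_mean_plcc, assuming it is a nested object field."""
    return float(sidecar["training"]["loso_mean_plcc"])


def passes_ship_gate(sidecar: dict, threshold: float = SHIP_THRESHOLD) -> bool:
    """True when the recorded mean LOSO PLCC meets or exceeds the gate."""
    return sidecar_loso_plcc(sidecar) >= threshold


def load_sidecar(path: str = "model/tiny/fr_regressor_v1.json") -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage: `passes_ship_gate(load_sidecar())` on a checkout that contains the sidecar.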

Shipped checkpoint

Field	Value
Model name	`fr_regressor_v1`
Location	`model/tiny/fr_regressor_v1.onnx`
Sidecar	`model/tiny/fr_regressor_v1.json`
Architecture	`FRRegressor` (2-layer GELU MLP, hidden=64, dropout=0.1)
Input	`features` — `[N, 6]` float32, standardised (mean / std in sidecar)
Output	`score` — `[N]` float32, VMAF-scale
Feature order	`adm2, vif_scale0, vif_scale1, vif_scale2, vif_scale3, motion2`
ONNX opset	17
Training corpus	Netflix Public Dataset refresh table `runs/full_features_netflix_refresh_20260520.parquet` (11190 rows, 30 cols; source YUVs remain local-only)
Teacher	`vmaf_v0.6.1` per-frame score
Held-out PLCC	`0.9982 ± 0.0014` mean 9-fold LOSO
License	BSD-3-Clause-Plus-Patent (fork-local; the checkpoint is derived from non-redistributable Netflix data, see Provenance below)
Exporter	`ai/scripts/train_fr_regressor.py`

The sidecar JSON pins the training-time per-feature mean / std vector under feature_mean / feature_std. Callers must standardise their input feature vector with the same statistics before invoking the graph — the standardisation is not baked into the ONNX so that downstream consumers can substitute a different feature pool without re-exporting.
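The steps a caller owns (column ordering and standardisation) can be sketched as below. This assumes per-frame features arrive as a dict-like of equal-length arrays keyed by feature name (a hypothetical container) and that `feature_mean` / `feature_std` are 6-element lists as read from the sidecar; `build_model_input` is an illustrative name, not a libvmaf API.

```python
import numpy as np

# Canonical-6 order from the checkpoint table; the graph expects exactly this column order.
FEATURE_ORDER = ("adm2", "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3", "motion2")


def build_model_input(columns, feature_mean, feature_std):
    """Stack per-feature columns into [N, 6] and standardise with sidecar statistics."""
    feats = np.stack(
        [np.asarray(columns[name], dtype=np.float32) for name in FEATURE_ORDER],
        axis=1,
    )
    mean = np.asarray(feature_mean, dtype=np.float32)
    std = np.asarray(feature_std, dtype=np.float32)
    if mean.shape != (len(FEATURE_ORDER),) or std.shape != mean.shape:
        raise ValueError("feature_mean / feature_std must each have 6 entries")
    # Same transform the trainer applied; the ONNX graph does not do it for you.
    return ((feats - mean) / std).astype(np.float32)
```

Looking columns up by name rather than relying on the order of an incoming array guards against the silent failure mode here: a permuted feature vector still runs through the graph, it just produces wrong scores.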

The sidecar also carries run_provenance (ai-run-provenance-v1): the trainer entrypoint, the parsed arguments, the input parquet path and hash, and the output model / sidecar / registry / metrics targets. The metrics JSON written by --metrics-out carries the same block, and it is written even for runs that fail the ship gate and therefore export no ONNX.
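To confirm that a checkpoint was trained on the parquet you have locally, hash the file and compare it with the value recorded in run_provenance. A minimal sketch, assuming the recorded hash is a hex SHA-256 digest (the registry already uses sha256); the key that holds it inside run_provenance is not documented here, so the caller passes the value in.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large parquet tables need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex: str) -> bool:
    """Compare against a recorded digest, tolerating case and surrounding whitespace."""
    return sha256_file(path) == recorded_hex.strip().lower()
```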

2026-05-20 refresh metrics

Held-out source	PLCC	SROCC	RMSE (VMAF points)
BigBuckBunny	0.9962	0.6277	4.466
BirdsInCage	0.9993	0.9997	1.854
CrowdRun	0.9997	0.9998	1.005
ElFuente1	0.9992	0.9963	1.635
ElFuente2	0.9964	0.9969	3.169
FoxBird	0.9967	0.9954	2.352
OldTownCross	0.9992	0.9998	1.931
Seeking	0.9989	0.9962	1.982
Tennis	0.9982	0.9982	1.356

Summary: mean PLCC 0.9982 ± 0.0014, mean SROCC 0.9567 ± 0.1234, mean RMSE 2.194 ± 1.049. BigBuckBunny has weak rank ordering but high linear agreement; the C1 ship gate remains the ADR-0249 PLCC gate.
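The summary line can be reproduced from the per-source rows. The reported ± values are consistent with the sample standard deviation (n − 1 denominator) across the nine folds, which is what Python's `statistics.stdev` computes; the `FOLDS` dict below simply copies the table.

```python
import statistics

# Per-source LOSO metrics (PLCC, SROCC, RMSE) copied from the 2026-05-20 refresh table.
FOLDS = {
    "BigBuckBunny": (0.9962, 0.6277, 4.466),
    "BirdsInCage": (0.9993, 0.9997, 1.854),
    "CrowdRun": (0.9997, 0.9998, 1.005),
    "ElFuente1": (0.9992, 0.9963, 1.635),
    "ElFuente2": (0.9964, 0.9969, 3.169),
    "FoxBird": (0.9967, 0.9954, 2.352),
    "OldTownCross": (0.9992, 0.9998, 1.931),
    "Seeking": (0.9989, 0.9962, 1.982),
    "Tennis": (0.9982, 0.9982, 1.356),
}


def summarise(column: int) -> tuple[float, float]:
    """Mean and sample standard deviation of one metric column across folds."""
    values = [metrics[column] for metrics in FOLDS.values()]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    for idx, name in enumerate(("PLCC", "SROCC", "RMSE")):
        mean, std = summarise(idx)
        print(f"{name}: {mean:.4f} ± {std:.4f}")
```

The large SROCC spread comes almost entirely from the BigBuckBunny fold; the other eight folds sit between 0.9954 and 0.9998.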

Provenance

The training corpus (.workingdir2/netflix/) is the Netflix Public Dataset, distributed by Netflix under a license that forbids redistribution. The shipped ONNX is a derivative: its parameters were fitted to per-frame vmaf_v0.6.1 teacher scores computed locally on that corpus. The fork ships the resulting ONNX (a few KB of parameters) under BSD-3-Clause-Plus-Patent on the basis that the parameter values are a derived statistical summary, not a redistribution of the YUV bitstreams or of the (separately access-gated) DMOS sidecar CSV. If your jurisdiction reads "derivative work" more broadly, treat the checkpoint as Netflix-license-encumbered and rebuild it from your own copy of the dataset.

Usage: CLI

```shell
vmaf \
    --reference ref.yuv \
    --distorted dist.yuv \
    --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
    --tiny-model fr_regressor_v1 \
    --tiny-device auto \
    --output score.json
```

--tiny-device auto resolves to the best available execution provider (CUDA → OpenVINO → CPU). The CPU path is a hard requirement — every shipped tiny model must run there as the variance-anchor (see ADR-0214).
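The same preference order can be applied when running the ONNX directly through ONNX Runtime. This is an analogy, not libvmaf's resolver: the provider strings are ONNX Runtime's standard identifiers, `pick_providers` is a hypothetical helper, and ROCm is left out because the documented auto order does not list it.

```python
# Preference order mirroring `--tiny-device auto`: CUDA, then OpenVINO, then CPU.
PREFERENCE = (
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available):
    """Return available providers in preference order, always ending with CPU."""
    chosen = [name for name in PREFERENCE if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")  # CPU is the mandatory fallback
    return chosen
```

Usage: `ort.InferenceSession("model/tiny/fr_regressor_v1.onnx", providers=pick_providers(ort.get_available_providers()))`. Passing the whole ordered list rather than only the first entry lets ONNX Runtime place any node the preferred provider cannot handle on a later one, with CPU as the final catch-all.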

Usage: Python

```python
import json

import numpy as np
import onnxruntime as ort

# 1. Load the per-feature standardisation from the sidecar, then the ONNX graph.
with open("model/tiny/fr_regressor_v1.json", encoding="utf-8") as fh:
    sidecar = json.load(fh)
mean = np.asarray(sidecar["feature_mean"], dtype=np.float32)
std = np.asarray(sidecar["feature_std"], dtype=np.float32)

sess = ort.InferenceSession("model/tiny/fr_regressor_v1.onnx",
                            providers=["CPUExecutionProvider"])

# 2. Compute the canonical-6 features per frame using libvmaf
#    (e.g. via ai.data.feature_extractor.extract_features).
feats = ...  # placeholder: replace with an (n_frames, 6) array in the documented order
feats = np.asarray(feats, dtype=np.float32)
if feats.ndim != 2 or feats.shape[1] != 6:
    raise ValueError(f"expected (n_frames, 6) features, got {feats.shape}")

# 3. Standardise with the training-time statistics and run; "score" has shape [n_frames].
x = ((feats - mean) / std).astype(np.float32)
scores = sess.run(["score"], {"features": x})[0]
```

Re-training

```shell
# 1. Make sure a fresh Netflix feature table exists. Regenerate via:
python ai/scripts/extract_full_features.py \
    --data-root .workingdir2/netflix \
    --vmaf-bin core/build-cpu/tools/vmaf \
    --out runs/full_features_netflix_refresh_YYYYMMDD.parquet

# 2. Train + export (defaults match the shipped checkpoint).
PYTHONPATH=ai/src python ai/scripts/train_fr_regressor.py \
    --parquet runs/full_features_netflix_refresh_YYYYMMDD.parquet
```

The trainer:

Runs a 9-fold leave-one-source-out (LOSO) sweep over the parquet, reporting per-fold PLCC / SROCC / RMSE.
Refuses to ship if the mean LOSO PLCC is below 0.95. The threshold is configurable via --ship-threshold, but lowering it to get a run through is a policy soft-fail: fix the model, not the gate.
Re-trains a final model on all 9 sources and exports it to model/tiny/fr_regressor_v1.onnx, updating the sidecar + registry sha256.

Idempotent: re-running with the same seed + parquet produces the same ONNX bytes (modulo torch / onnx producer-string drift).
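The LOSO sweep and ship gate can be sketched as follows. This is illustrative only: ordinary least squares stands in for the FRRegressor MLP, PLCC is the only metric computed, the final all-sources retrain and export are omitted, and the function names are hypothetical rather than train_fr_regressor.py internals.

```python
import numpy as np


def fit_linear(x, y):
    """Stand-in for the FRRegressor MLP: least squares with a bias column."""
    design = np.hstack([x, np.ones((x.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, x):
    return np.hstack([x, np.ones((x.shape[0], 1))]) @ coef


def loso_plcc(x, y, sources, fit=fit_linear, predict=predict_linear):
    """Hold out one source per fold, train on the rest, score the held-out frames.

    `sources` must be a NumPy array (one source label per row) so that
    `sources == src` yields a boolean mask.
    """
    per_fold = {}
    for src in np.unique(sources):
        held_out = sources == src
        model = fit(x[~held_out], y[~held_out])
        pred = predict(model, x[held_out])
        per_fold[str(src)] = float(np.corrcoef(pred, y[held_out])[0, 1])
    return per_fold


def ship_gate(per_fold, threshold=0.95):
    """Mean LOSO PLCC and whether it clears the ship threshold."""
    mean_plcc = float(np.mean(list(per_fold.values())))
    return mean_plcc, mean_plcc >= threshold
```

Splitting by source rather than by frame matters here: adjacent frames of one clip are highly correlated, so a random frame split would leak near-duplicates into the test set and inflate PLCC.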

Known limitations

Canonical-6 input only. Larger feature pools (subsets A / B / full-21) live in ai/scripts/phase3_subset_sweep.py; ship-grade models from those subsets are tracked separately.
Netflix Public corpus only. Generalisation to UGC / live-encode / HDR is not validated. C2 (nr_metric_v1) covers UGC; HDR is on the Wave-1 follow-up backlog.
vmaf_v0.6.1 as teacher. The model inherits the teacher's biases (banding-insensitive, over-confident on heavily compressed cartoon content, etc.). Alignment with MOS is only indirect, via the Netflix Public DMOS that vmaf_v0.6.1 was originally calibrated against.
Static input shape. Only the batch axis is dynamic ([N, 6]); the feature dimension is pinned at 6.
Rank correlation caveat on BigBuckBunny. The 2026-05-20 refresh keeps high PLCC (0.9962) on BigBuckBunny but low SROCC (0.6277). Treat this model as a PLCC-aligned teacher-score regressor, not a per-content ranker.
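The BigBuckBunny pattern (high PLCC, low SROCC) is easy to reproduce on synthetic data: a few widely spread frames dominate the linear correlation, while frames packed into a narrow score band can be ranked in the wrong order without moving PLCC much. The arrays below are a toy construction, not Netflix data.

```python
import numpy as np


def plcc(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def ranks(v):
    # Ordinal ranks; adequate here because the toy data contains no ties.
    return np.argsort(np.argsort(v))


def srocc(a, b):
    # Spearman correlation computed as Pearson correlation on ranks.
    return plcc(ranks(a), ranks(b))


# Three heavily distorted frames plus twenty frames in a tight 95-97 band.
cluster = 95.0 + 0.1 * np.arange(20)
teacher = np.concatenate([[10.0, 20.0, 30.0], cluster])
# The "model" matches the low frames but reverses the order inside the band.
predicted = np.concatenate([[10.0, 20.0, 30.0], cluster[::-1]])

print(f"PLCC  = {plcc(teacher, predicted):.4f}")   # close to 1
print(f"SROCC = {srocc(teacher, predicted):.4f}")  # negative
```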
