`nr_metric_v1` — tiny no-reference quality metric¶

nr_metric_v1 is a compact MobileNet-style no-reference (NR) quality estimator that predicts a MOS-proxy scalar from a single 224×224 grayscale luma frame. It is the C2 baseline for the fork's tiny-AI NR capability (ADR-0020), trained on KoNViD-1k crowd-sourced MOS labels.

Status — shipped 2026-04-25. Production baseline for C2 NR scoring (KoNViD-1k, CC BY 4.0). An INT8 sidecar is available via nr_metric_v1.int8.onnx (dynamic-PTQ). See ADR-0168 and ADR-0174.

What the output means¶

A single scalar per frame on a normalised MOS scale. The model was trained to predict crowd-sourced MOS (1–5 scale) from KoNViD-1k; the output is a continuous float in approximately that range.

Value	Interpretation
~4.5–5.0	Pristine / near-reference quality
~3.5–4.5	Good quality; minor perceptible artefacts
~2.5–3.5	Moderate quality; clearly visible artefacts
~1.0–2.5	Poor quality; heavy compression / blur

The output is a frame-level MOS estimate. Clip-level quality is typically obtained by averaging over all frames (or a representative subset). The model is content-blind — it does not have access to the reference stream.

Shipped checkpoint¶

Field	Value
Model id	`nr_metric_v1`
Location	`model/tiny/nr_metric_v1.onnx`
INT8 sidecar	`model/tiny/nr_metric_v1.int8.onnx`
Architecture	MobileNet-tiny — depthwise separable Conv stack; ~19 K params
Input	`input` — float32 NCHW `[1, 1, 224, 224]` grayscale luma in `[0, 1]`
Output	`mos_score` — float32 `[1]` scalar MOS estimate
ONNX opset	17
Training corpus	KoNViD-1k (1 200 clips; CC BY 4.0; not redistributed in-tree)
Val MSE	~0.382 (RMSE ≈ 0.62 on 1–5 MOS, KoNViD-1k validation split)
Quantisation	Dynamic-PTQ INT8 via `ai/scripts/ptq_dynamic.py`; `quant_accuracy_budget_plcc = 0.01`
License	BSD-3-Clause-Plus-Patent
Trainer / exporter	`ai/scripts/train_konvid.py` + `ai/scripts/export_tiny_models.py`

Fresh exports from ai/scripts/export_tiny_models.py add ADR-0661 run_provenance to model/tiny/nr_metric_v1.json. That block records the checkpoint input, parsed exporter arguments, ONNX output, sidecar output, and registry target so a refreshed C2 baseline can be traced without relying on shell history.

Training corpus provenance¶

Field	Value
Dataset	KoNViD-1k
Source	https://datasets.vqa.mmsp-kn.de/databases/KoNViD-1k/
Licence	CC BY 4.0 — clips are not redistributed in-tree
Clips	1 200 user-generated video clips, 8 s each at various resolutions
MOS labels	Crowd-sourced mean opinion score (1–5 scale, Amazon Mechanical Turk)
Split used	~973 train / ~107 val / ~120 test (80/9/10 % random split, seed 42)
Feature	Middle frame extracted per clip at 224×224 grayscale

Acknowledgement. This model was trained on KoNViD-1k. We thank the dataset authors for distributing the clips and MOS labels under CC BY 4.0. The clips themselves are not committed to this repository.

Op-allowlist conformance¶

Every op in the graph is on core/src/dnn/op_allowlist.c: Conv, Relu, DepthwiseConv, GlobalAveragePool, Flatten, Gemm (or equivalent MatMul + Add).

Usage — CLI¶

vmaf \
    --reference ref.yuv \
    --distorted dist.yuv \
    --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
    --tiny-model model/tiny/nr_metric_v1.onnx \
    --output score.json

The output JSON gains a per-frame nr_mos_score column. Pool to clip level by averaging across frames.

Usage — C API¶

#include <libvmaf/libvmaf.h>

VmafDnnSession *session = NULL;
vmaf_dnn_session_open(&session, "model/tiny/nr_metric_v1.onnx",
                      VMAF_DNN_DEVICE_CPU);
/* run session per-frame ... */
vmaf_dnn_session_close(session);

Reproducing the model¶

# 1. fetch KoNViD-1k (~40 GB) — not redistributed in-tree
.venv/bin/python ai/scripts/fetch_konvid_1k.py

# 2. extract middle frames + convert to corpus JSONL
.venv/bin/python ai/scripts/konvid_1k_to_corpus_jsonl.py

# 3. train the C2/C3 checkpoints
.venv/bin/python ai/scripts/train_konvid.py \
    --model both \
    --output-c2 runs/c2_konvid \
    --output-c3 runs/c3_konvid \
    --epochs-c2 50 \
    --epochs-c3 200 \
    --seed 42

# 4. export the ONNX + sidecar + registry rows
.venv/bin/python ai/scripts/export_tiny_models.py \
    --c2-ckpt runs/c2_konvid/last.ckpt \
    --c3-ckpt runs/c3_konvid/last.ckpt

# 5. quantise to INT8
.venv/bin/python ai/scripts/ptq_dynamic.py \
    --model model/tiny/nr_metric_v1.onnx \
    --output model/tiny/nr_metric_v1.int8.onnx

# 6. validate against the registry
.venv/bin/python ai/scripts/validate_model_registry.py

Fast-NR calibration sidecar¶

vmaf-tune --fast-nr reads model/tiny/nr_metric_v1.json for the optional calibration fields used to skip full-reference calls during CRF bisection: calibration_slope, calibration_intercept, and calibration_threshold. The slope/intercept map this model's raw MOS-like output into VMAF units before the threshold comparison. Regenerate those fields with:

.venv/bin/python ai/scripts/calibrate_nr_threshold.py \
    --corpus .corpus/netflix/ \
    --output model/tiny/nr_metric_v1.json \
    --nr-ep cpu

Use --nr-ep cpu when a long CUDA/ROCm extraction job is already running or when calibration should be bit-for-bit reproducible across hosts. The default --nr-ep auto tries CUDA/ROCm ONNX Runtime providers first and falls back to CPU. When the corpus path contains a ref/ directory, calibration sweeps only that reference directory; the local Netflix public source names are recognised as 1920×1080 YUV even though the filenames omit the geometry.

Fresh sidecars include ADR-0661 run_provenance with the requested and actual corpus directories, nr_metric_v1.onnx, CRF grid, parsed CLI arguments, JSON output path, and Markdown calibration report path.

Known limitations¶

Single-frame, no temporal context: quality of slow-motion blur, flicker, or buffering artefacts may be underestimated relative to human perception, which integrates over ≥1 s of video.
Grayscale only: chroma degradation (colour bleeding, banding in blue channel) contributes nothing to the prediction.
KoNViD-1k domain: the corpus is user-generated internet content at moderate bitrates. Performance on professionally shot content, HDR/WCG material, or severe synthetic degradation outside the training distribution may be unreliable.
MOS scale calibration: the 1–5 scale is calibrated to KoNViD-1k's specific test conditions. Direct comparison to VMAF scores or other dataset MOS values requires dataset-specific recalibration.

learned_filter_v1.md — sibling KoNViD-1k baseline (C3 residual filter, same training corpus).
ADR-0168 — decision record for both C2 + C3 KoNViD baselines.
ADR-0174 — INT8 dynamic-PTQ policy.
ADR-0248 — PTQ accuracy budget for this model.
ADR-0042 — tiny-AI doc-substance rule this card satisfies.

nr_metric_v1 — tiny no-reference quality metric¶