
vmaf-tune — Fast NR Pre-Scoring (--fast-nr)

--fast-nr enables no-reference (NR) early elimination in the Phase B CRF bisect (ADR-0615 / ADR-0624). Instead of running a full-reference (FR) VMAF call at every bisect midpoint, vmaf-tune scores the distorted stream alone with the cheap nr_metric_v1 ONNX model (~200 ms on CPU, <50 ms on a GPU EP). It maps that raw MOS-like score into VMAF units using the sidecar calibration, and skips the expensive FR call whenever the calibrated proxy is far from the target. The final accepted CRF always receives a full-reference confirmation call.

Typical wall-time reduction: 2–4× on in-domain content that sits well away from the target VMAF.

Supported subcommands

| Subcommand | Flag supported |
|---|---|
| compare | --fast-nr |
| tune-per-shot | --fast-nr |
| corpus | not applicable (no bisect loop) |
| ladder | not applicable (corpus-sampler mode) |

Requirements

pip install onnxruntime         # CPU only
pip install onnxruntime-gpu     # CUDA / ROCm EP (recommended on GPU hosts); install instead of onnxruntime, not alongside it
pip install numpy               # required by the NR frame extractor

The nr_metric_v1.onnx model is already in-tree at model/tiny/. No additional download is required.

Quick start

# Compare four CPU codecs with NR pre-scoring enabled:
vmaf-tune compare \
    --src source.yuv --width 1920 --height 1080 \
    --framerate 24 --duration 10 \
    --target-vmaf 93 \
    --fast-nr

# Per-shot encode with NR pre-scoring:
vmaf-tune tune-per-shot \
    --src source.yuv --width 1920 --height 1080 \
    --framerate 24 \
    --target-vmaf 90 \
    --encoder libx265 \
    --fast-nr

The terminal will print the resolved δ_fast threshold at startup:

vmaf-tune compare: --fast-nr enabled; δ_fast=8.0 VMAF (NR early-elimination)

And INFO-level logs report savings per bisect run:

fast-nr: bisect done — FR calls 3 total, 4 saved (57%)

The δ_fast threshold

δ_fast is the half-width, in VMAF units, of the "uncertainty zone" around the target. The raw nr_metric_v1 output is first mapped to NR_VMAF = calibration_slope × NR_raw + calibration_intercept. When |NR_VMAF − target| > δ_fast, the proxy is trusted and the FR call is skipped. When NR_VMAF falls inside the zone, the FR call is paid.
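As a minimal sketch of the skip rule above, the decision reduces to one affine map and one comparison. The function name should_pay_fr is hypothetical and not part of vmaf-tune's API.

```python
def should_pay_fr(nr_raw: float, target: float, slope: float,
                  intercept: float, delta_fast: float) -> bool:
    """Return True when the FR VMAF call must be paid (hypothetical helper)."""
    # Map the raw nr_metric_v1 score into VMAF units with the sidecar calibration.
    nr_vmaf = slope * nr_raw + intercept
    # Inside the +/- delta_fast band the proxy is not trusted, so FR is paid;
    # outside it the FR call is skipped.
    return abs(nr_vmaf - target) <= delta_fast
```

With an identity calibration (slope 1.0, intercept 0.0, chosen only for illustration), a proxy of 72.4 against a target of 93 is 20.6 VMAF away, so should_pay_fr(72.4, 93.0, 1.0, 0.0, 8.0) returns False and the FR call is skipped.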

The default value (8.0 VMAF) comes from the ADR-0615 design target and is conservative enough to be safe on uncalibrated hosts. The calibration script fits a more precise value from your own corpus:

python ai/scripts/calibrate_nr_threshold.py \
    --corpus .corpus/netflix/ \
    --output model/tiny/nr_metric_v1.json \
    --nr-ep cpu

After a successful calibration run, nr_metric_v1.json will contain calibration_slope, calibration_intercept, and calibration_threshold fields that NRProxyBackend picks up automatically on the next --fast-nr invocation. The calibration report is written to docs/ai/models/nr_metric_v1-calibration-<date>.md. Fresh calibration JSON also includes ADR-0661 run_provenance so the threshold can be traced back to the corpus directory, nr_metric_v1.onnx input, CRF grid, CLI arguments, and Markdown report path that produced it.
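A consumer of the sidecar might read the calibration fields as shown below. This is a sketch that assumes the fields are top-level keys of the JSON object. parse_nr_calibration and load_nr_calibration are hypothetical helpers, not NRProxyBackend's actual code. Missing fields come back as None, except the threshold, which falls back to the 8.0 VMAF default described above.

```python
import json
from pathlib import Path

DEFAULT_DELTA_FAST = 8.0  # ADR-0615 default quoted in this page


def parse_nr_calibration(text: str) -> dict:
    """Extract calibration fields from sidecar JSON text (hypothetical helper)."""
    data = json.loads(text)
    return {
        "slope": data.get("calibration_slope"),
        "intercept": data.get("calibration_intercept"),
        # No fitted threshold yet: use the documented default.
        "delta_fast": data.get("calibration_threshold", DEFAULT_DELTA_FAST),
        # Set to "weak" only when --allow-weak-calibration forced the write.
        "quality": data.get("calibration_quality_status"),
    }


def load_nr_calibration(path) -> dict:
    """Read e.g. model/tiny/nr_metric_v1.json from disk and parse it."""
    return parse_nr_calibration(Path(path).read_text())
```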

The script now guards the sidecar write: by default it requires at least 10 samples and PLCC ≥ 0.70 between raw NR scores and FR VMAF. Weak fits still write the Markdown report with a WEAK quality status, but they do not update nr_metric_v1.json unless --allow-weak-calibration is passed for diagnostic work. This keeps --fast-nr from consuming a sidecar that cannot actually speed up the Netflix-style tune loop safely.
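The write gate can be sketched as a pure function. sidecar_write_allowed is a hypothetical name, and the "strong" label for a passing fit is a placeholder. Only the "weak" status and the 10-sample / 0.70 PLCC defaults come from the script's documented behaviour.

```python
from typing import Tuple


def sidecar_write_allowed(n_samples: int, plcc: float,
                          min_samples: int = 10, min_plcc: float = 0.70,
                          allow_weak: bool = False) -> Tuple[bool, str]:
    """Return (write_sidecar, quality_status) for a calibration fit (sketch)."""
    # Both the sample count and a positive NR-vs-FR PLCC must clear the bar.
    strong = n_samples >= min_samples and plcc >= min_plcc
    status = "strong" if strong else "weak"  # "strong" is a placeholder label
    # Weak fits only reach the sidecar through the diagnostic override.
    return strong or allow_weak, status
```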

Content-type considerations

The default δ_fast = 8.0 VMAF is a single global threshold aimed at mixed-content Netflix clips. On highly homogeneous content (animation, screen capture), NR tends to correlate better with FR, so a tighter δ_fast (e.g. 5.0) may yield more FR savings with little added risk. On high-motion sports content NR can be less predictive, so a looser δ_fast (e.g. 10–12) is safer. vmaf-tune does not yet expose a --delta-fast flag. For content-specific tuning, set calibration_threshold in the sidecar JSON (for example with calibrate_nr_threshold.py --delta-fast) before running with --fast-nr.

Calibration script

ai/scripts/calibrate_nr_threshold.py encodes each clip in a YUV corpus at every CRF in a grid and runs FR and NR scoring on each encode. It then fits vmaf_fr ≈ calibration_slope × nr_raw + calibration_intercept by least squares and sets δ_fast = 2σ of the fit residuals.
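The fit itself is ordinary linear least squares. Below is a minimal NumPy sketch. fit_nr_calibration is a hypothetical name, and using the population standard deviation (np.std with its default ddof=0) for σ is an assumption, since the script's exact estimator is not stated here. The sketch also returns the Pearson correlation that the PLCC gate checks.

```python
import numpy as np


def fit_nr_calibration(nr_raw, vmaf_fr):
    """Fit vmaf_fr ~ slope * nr_raw + intercept; return (slope, intercept, delta_fast, plcc)."""
    x = np.asarray(nr_raw, dtype=float)
    y = np.asarray(vmaf_fr, dtype=float)
    # Degree-1 least-squares fit; coefficients come highest power first.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    # delta_fast = 2 sigma of the residuals (population std, an assumption).
    delta_fast = 2.0 * float(np.std(residuals))
    # Pearson linear correlation between raw NR scores and FR VMAF.
    plcc = float(np.corrcoef(x, y)[0, 1])
    return float(slope), float(intercept), delta_fast, plcc
```

If the residuals are roughly normal, a 2σ half-width covers about 95% of clips, which is why the zone is conservative without being excessively wide.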

usage: calibrate_nr_threshold.py [-h] [--corpus CORPUS] [--output JSON]
                                 [--onnx ONNX] [--crfs CRFS] [--codec CODEC]
                                 [--preset PRESET] [--width WIDTH]
                                 [--height HEIGHT] [--pix-fmt PIX_FMT]
                                 [--max-clips N] [--vmaf-bin VMAF_BIN]
                                 [--ffmpeg-bin FFMPEG_BIN] [--report-dir DIR]
                                 [--delta-fast VMAF] [--nr-ep {auto,cpu}]
                                 [--min-calibration-samples N]
                                 [--min-plcc R]
                                 [--allow-weak-calibration]
                                 [--dry-run] [-v]

Key options:

| Option | Default | Description |
|---|---|---|
| --corpus PATH | .corpus/netflix/ | Directory of reference YUVs; if PATH/ref/ exists, only that reference subdirectory is swept |
| --output JSON | model/tiny/nr_metric_v1.json | Sidecar JSON to update |
| --crfs LIST | 18,23,28,33,38 | CRF grid to sweep |
| --max-clips N | (all) | Limit clips for quick smoke runs |
| --nr-ep auto\|cpu | auto | auto tries CUDA/ROCm EP before CPU; cpu pins CPU inference for reproducible calibration or when CUDA is already busy |
| --min-calibration-samples N | 10 | Minimum fitted samples required before the JSON sidecar can be written |
| --min-plcc R | 0.70 | Minimum positive NR-vs-FR Pearson correlation required before the JSON sidecar can be written |
| --allow-weak-calibration | off | Diagnostic override that writes a weak sidecar with calibration_quality_status = "weak" |
| --dry-run | off | Compute δ_fast without writing the JSON |
| --delta-fast VMAF | (fit from data) | Force a specific δ_fast value |
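The --nr-ep choice of auto or cpu maps naturally onto ONNX Runtime's provider list. The sketch below is an assumption about how such a mapping could look; pick_providers is hypothetical. The provider strings are ONNX Runtime's standard names, and trying CUDA before ROCm is an arbitrary ordering choice.

```python
from typing import List

CUDA_EP = "CUDAExecutionProvider"
ROCM_EP = "ROCMExecutionProvider"
CPU_EP = "CPUExecutionProvider"


def pick_providers(nr_ep: str, available: List[str]) -> List[str]:
    """Translate an --nr-ep value into an ordered provider list (hypothetical helper)."""
    if nr_ep == "cpu":
        # Pin CPU inference, e.g. for reproducible calibration runs.
        return [CPU_EP]
    if nr_ep != "auto":
        raise ValueError(f"unsupported --nr-ep value: {nr_ep!r}")
    # Prefer any GPU EP the installed onnxruntime build offers, then CPU.
    gpu = [ep for ep in (CUDA_EP, ROCM_EP) if ep in available]
    return gpu + [CPU_EP]
```

In practice the list would be passed to onnxruntime.InferenceSession("model/tiny/nr_metric_v1.onnx", providers=pick_providers("auto", onnxruntime.get_available_providers())).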

Without a real corpus the script falls back to the YUVs under python/test/resource/yuv/ for a minimal smoke run. The in-tree Netflix public corpus convention uses 1080p source filenames such as BigBuckBunny_25fps.yuv; those names are recognised even though they do not spell out 1920x1080.

FR call savings — example

On a 10-second 1080p source at --target-vmaf 93 with the default CRF window [0, 63] and max_iterations=8:

| Phase | CRF | NR score | δ_fast | Action |
|---|---|---|---|---|
| iter 1 | 31 | 72.4 | 8.0 | NR_VMAF far below target → skip FR, lower CRF |
| iter 2 | 15 | 97.8 | 8.0 | NR_VMAF far above target → skip FR, raise CRF |
| iter 3 | 23 | 91.1 | 8.0 | NR_VMAF within zone → pay FR (92.8 measured) |
| iter 4 | 27 | 86.5 | 8.0 | NR_VMAF far below target → skip FR, lower CRF |
| iter 5 (final) | 25 | | | always FR: 93.1 measured ✓ |

Result: 5 bisect iterations, 2 FR calls (iterations 3 and 5), 3 FR calls saved — 60% FR reduction.
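The bookkeeping in this example can be reproduced with a small simulation. This is a sketch, not vmaf-tune's bisect. nr_gated_bisect and BisectStats are hypothetical. The sketch assumes that a higher CRF lowers VMAF and that the search looks for the highest CRF meeting the target. The savings percentage saved / (total + saved) is inferred from the log line above (4 of 7 → 57%) and from this example (3 of 5 → 60%).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class BisectStats:
    """Hypothetical counterpart of BisectResult's savings counters."""
    crf: int
    fr_calls_total: int = 0
    fr_calls_saved: int = 0

    @property
    def saved_pct(self) -> float:
        # Share of would-be FR calls that the NR proxy avoided.
        calls = self.fr_calls_total + self.fr_calls_saved
        return 100.0 * self.fr_calls_saved / calls if calls else 0.0


def nr_gated_bisect(target: float,
                    nr_vmaf: Callable[[int], float],  # calibrated NR proxy per CRF
                    fr_vmaf: Callable[[int], float],  # expensive FR VMAF per CRF
                    delta_fast: float = 8.0, lo: int = 0, hi: int = 63,
                    max_iterations: int = 8) -> Tuple[BisectStats, float]:
    stats = BisectStats(crf=lo)
    best = lo  # lowest CRF (highest quality) if no midpoint meets the target
    for _ in range(max_iterations):
        if lo > hi:
            break
        mid = (lo + hi) // 2
        proxy = nr_vmaf(mid)
        if abs(proxy - target) > delta_fast:
            # Proxy is far from the target: trust its direction, skip FR.
            stats.fr_calls_saved += 1
            score = proxy
        else:
            # Inside the uncertainty zone: pay for the FR call.
            stats.fr_calls_total += 1
            score = fr_vmaf(mid)
        if score >= target:
            best, lo = mid, mid + 1  # target met: try a higher CRF
        else:
            hi = mid - 1  # target missed: lower the CRF
    # The accepted CRF always gets a full-reference confirmation.
    stats.fr_calls_total += 1
    stats.crf = best
    return stats, fr_vmaf(best)
```

For a perfect proxy on a linear curve, nr_gated_bisect(93.0, lambda c: 100.0 - c, lambda c: 100.0 - c) settles on CRF 7 after six FR calls, with one call saved. A midpoint accepted on the proxy alone can still fail the final FR confirmation; this sketch simply returns that final score and leaves the recovery policy to the caller.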

Implementation details

  • Model: model/tiny/nr_metric_v1.onnx — single-luma-frame MobileNet scoring from the middle frame of the distorted YUV.
  • Cache: NRProxyBackend caches per (path, width, height) within a bisect run so repeated calls for the same CRF output are free.
  • GPU EP: CUDA / ROCm execution provider is preferred when onnxruntime-gpu is installed; falls back to CPU EP silently.
  • Error handling: any NR inference failure falls through to FR — correctness is never compromised.
  • Telemetry: BisectResult.fr_calls_total and .fr_calls_saved carry the per-run savings count, exposed in INFO logs.
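The frame-extraction, caching, and fall-through behaviours listed above can be sketched in a few lines. The sketch assumes 8-bit yuv420p input; the calibration script also accepts other --pix-fmt values, which this sketch does not handle. middle_luma_frame and nr_proxy_or_none are hypothetical helpers. functools.lru_cache keys on (path, width, height) for the life of the process rather than per bisect run, so a real implementation would call middle_luma_frame.cache_clear() between runs.

```python
import os
from functools import lru_cache
from typing import Any, Callable, Optional

import numpy as np


@lru_cache(maxsize=None)
def middle_luma_frame(path: str, width: int, height: int) -> np.ndarray:
    """Y plane of the middle frame of an 8-bit yuv420p file, cached per (path, width, height)."""
    luma_bytes = width * height
    frame_bytes = luma_bytes * 3 // 2  # Y plus quarter-size U and V planes
    n_frames = os.path.getsize(path) // frame_bytes
    if n_frames == 0:
        raise ValueError(f"{path} is shorter than one {width}x{height} frame")
    offset = (n_frames // 2) * frame_bytes  # byte offset of the middle frame
    # Read only the middle frame's luma bytes instead of the whole file.
    luma = np.fromfile(path, dtype=np.uint8, count=luma_bytes, offset=offset)
    return luma.reshape(height, width)


def nr_proxy_or_none(score_fn: Callable[..., float], *args: Any) -> Optional[float]:
    """Run an NR scoring callable; None tells the caller to pay the FR call instead."""
    try:
        return score_fn(*args)
    except Exception:
        return None
```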

See also