ADR-0624: Fast NR Pre-Scoring Implementation (ADR-0615 impl)
- Status: Accepted (Implemented)
- Date: 2026-05-19
- Deciders: lusoris
- Tags: ai, vmaf-tune, bisect, onnx, fork-local
Context
ADR-0615 decided to use an NR early-elimination strategy for the Phase B CRF bisect: at each midpoint, score the distorted stream with nr_metric_v1.onnx (~200 ms CPU, <50 ms GPU EP), and skip the full-reference VMAF call when |NR - target| > δ_fast. The implementation phases were:
- P1: NRProxyBackend class in score_backend.py with per-CRF cache and ONNX session lifecycle (CUDA/ROCm EP with CPU fallback).
- P2: ai/scripts/calibrate_nr_threshold.py calibration sweep.
- P3: --fast-nr CLI flag on compare and tune-per-shot; threading through make_bisect_predicate and _build_per_shot_bisect_predicate.
- P4: Docs at docs/usage/vmaf-tune-fast-nr.md.
The score_backend.py module was already the natural seam (ADR-0315); the NRProxyBackend class sits alongside the existing FR backend selector without touching the shared backend-selection logic.
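The early-elimination rule from ADR-0615 can be sketched as a small pure function. This is a minimal illustration, not the code in score_backend.py: the function name nr_skip_direction and the "above"/"below" labels are hypothetical, and only the |NR - target| > δ_fast test and the 8.0 default come from this ADR.

```python
from typing import Optional

# Design default for delta_fast cited in this ADR (VMAF points).
DELTA_FAST_DEFAULT = 8.0


def nr_skip_direction(nr_score: float, target: float,
                      delta_fast: float = DELTA_FAST_DEFAULT) -> Optional[str]:
    """Return a direction hint when the NR score is far from target, else None.

    None means the midpoint lies inside the boundary zone and the caller
    should run the full-reference VMAF measurement as usual.
    """
    gap = nr_score - target
    if abs(gap) <= delta_fast:
        return None
    # Hypothetical labels: "above" = NR score well over target, "below" = well under.
    return "above" if gap > 0 else "below"
```

With the default threshold, nr_skip_direction(70.0, 93.0) reports "below" and the FR call is skipped, while nr_skip_direction(95.0, 93.0) returns None and the FR call proceeds.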
Decision
Implement ADR-0615 Option B (NR early-elimination) across four phases as scoped above. Key design choices made during implementation:
- Inference error isolation: session.run() exceptions are wrapped in NRProxyBackendError so the bisect loop can fall through to FR on any inference failure. A failed inference therefore costs one FR call instead of risking a silently wrong skip.
- Single shared instance across the compare sweep: one NRProxyBackend is constructed in _run_compare and shared across all per-target make_bisect_predicate closures. The per-CRF result cache is therefore shared across target rungs, which is correct because NR scores are target-independent.
- Final CRF always gets FR confirmation: the cur_lo < cur_hi guard in bisect_target_vmaf ensures the last collapsed-window CRF never uses the NR path. This preserves the FR correctness guarantee regardless of δ_fast tuning.
- Sentinel-based coupling: _encode_and_score signals NR skip via a structured sentinel string (__nr_skip__:<direction>;<nr_score>) in BisectResult.error, allowing bisect_target_vmaf to parse direction + NR score without widening the return type.
- δ_fast from sidecar JSON: calibration_threshold is written by ai/scripts/calibrate_nr_threshold.py into model/tiny/nr_metric_v1.json so the value survives model updates. The compile-time default (8.0 VMAF) is used when the sidecar field is absent, matching the ADR-0615 design default.
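The sentinel round trip can be illustrated as follows. This is a sketch under stated assumptions: only the __nr_skip__:<direction>;<nr_score> layout comes from this ADR, while the helper names, the three-decimal score formatting, and the choice to return None on malformed input are hypothetical.

```python
from typing import Optional, Tuple

# Prefix taken from the sentinel layout described in this ADR.
NR_SKIP_PREFIX = "__nr_skip__:"


def make_nr_skip_sentinel(direction: str, nr_score: float) -> str:
    """Encode an NR skip as '__nr_skip__:<direction>;<nr_score>'."""
    # Three decimals is an assumed formatting choice, not a documented one.
    return f"{NR_SKIP_PREFIX}{direction};{nr_score:.3f}"


def parse_nr_skip_sentinel(error: Optional[str]) -> Optional[Tuple[str, float]]:
    """Return (direction, nr_score) for a sentinel, or None for anything else.

    Ordinary error strings (and None) yield None, so a caller can treat
    them as genuine failures rather than NR skips.
    """
    if not error or not error.startswith(NR_SKIP_PREFIX):
        return None
    payload = error[len(NR_SKIP_PREFIX):]
    direction, sep, score_text = payload.partition(";")
    if not sep or not direction:
        return None
    try:
        return direction, float(score_text)
    except ValueError:
        return None
```

Because the sentinel travels in an existing string field, callers that never pass --fast-nr see the same result shape as before; only code that explicitly parses the prefix reacts to it.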
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Widen BisectResult with an nr_skipped bool field | Cleaner API | Changes the frozen dataclass shape; breaks existing callers that pattern-match on fields | Sentinel string avoids the shape change with zero downstream impact |
| Construct NRProxyBackend per-predicate | Each predicate has its own cache | Model loaded N times (N = n_codecs × n_targets); GPU EP initialisation adds ~2 s per load | Single shared instance is correct and fast |
| Expose nr_proxy_backend in BisectResult telemetry fields directly | More observable | Adds a non-serialisable object to the result; breaks JSON export | fr_calls_total / fr_calls_saved integer fields carry the observable signal without coupling |
| subprocess.run the ONNX inference in a separate process | Isolates segfault risk | ~500 ms IPC overhead per call kills the speedup | In-process with error wrapping is the right trade-off at this latency budget |
Consequences
- Positive: 2–4× bisect wall-time reduction on content far from target; direct enabler for the DO (ADR-0613) and per-shot ABR (ADR-0614) inner loops. Calibration script and report make δ_fast a versioned, auditable artefact.
- Negative: Two-model invocation adds ~200 ms overhead on boundary shots (NR within δ_fast zone); onnxruntime + numpy become soft dependencies for --fast-nr (a missing package is a hard error only when the flag is passed).
- Neutral: fr_calls_total / fr_calls_saved telemetry in BisectResult is always populated when NR is active and zero otherwise, so there is no BC break for existing callers.
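Reading δ_fast back from the sidecar, with the 8.0 VMAF fallback described under Decision, might look like the sketch below. The function name load_delta_fast is hypothetical, and falling back to the default on a missing or malformed file is an assumption; this ADR only states that the default applies when the calibration_threshold field is absent.

```python
import json
from pathlib import Path

# Default threshold cited in this ADR (VMAF points).
DELTA_FAST_DEFAULT = 8.0


def load_delta_fast(sidecar_path, default: float = DELTA_FAST_DEFAULT) -> float:
    """Return calibration_threshold from the sidecar JSON, or the default."""
    try:
        data = json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Assumption: an unreadable or invalid sidecar behaves like an absent field.
        return default
    value = data.get("calibration_threshold") if isinstance(data, dict) else None
    # Reject non-numeric values (bool is excluded explicitly since it subclasses int).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return default
    return float(value)
```

A call such as load_delta_fast("model/tiny/nr_metric_v1.json") would then pick up whatever value the calibration sweep last wrote, without any code change.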
References
- ADR-0615: Decision to implement fast NR pre-scoring.
- Research-0611: Design options and calibration plan.
- tools/vmaf-tune/src/vmaftune/score_backend.py: NRProxyBackend class.
- tools/vmaf-tune/src/vmaftune/bisect.py: sentinel path + make_bisect_predicate.
- tools/vmaf-tune/src/vmaftune/cli.py: --fast-nr on compare + tune-per-shot.
- tools/vmaf-tune/tests/test_nr_proxy_backend.py: 40 unit tests.
- ai/scripts/calibrate_nr_threshold.py: calibration sweep script.
- model/tiny/nr_metric_v1.json: calibration_threshold field.
- docs/usage/vmaf-tune-fast-nr.md: user-facing documentation.
- Source: per user direction (fast-nr-prescoring re-dispatch, 2026-05-19).