ADR-0665: Fast-NR calibration quality guard
- Status: Accepted
- Date: 2026-05-21
- Deciders: Lusoris maintainers
- Tags: ai, vmaf-tune, calibration, fast-nr, quality-gate
Context
vmaf-tune --fast-nr uses nr_metric_v1 as a cheap no-reference proxy before paying for full-reference VMAF. The proxy only unlocks safe consumer-hardware speedups when its sidecar maps raw NR scores into VMAF units with a meaningful corpus fit. A 2026-05-21 CPU calibration sweep over the current local corpus produced a very weak fit (PLCC=0.0821, sigma=16.2445, delta_fast=32.49), which would turn the sidecar into a misleading runtime contract if written.
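For readers unfamiliar with the metric, PLCC is the Pearson linear correlation coefficient. The sketch below shows how such a value can be computed, assuming it is taken between per-clip NR proxy scores and the matching full-reference VMAF scores. The function name `plcc` and the zero return for flat signals are illustrative assumptions, not the code in ai/scripts/calibrate_nr_threshold.py.

```python
import math


def plcc(nr_scores, vmaf_scores):
    """Pearson linear correlation between two equal-length score lists.

    Hypothetical helper for illustration; not the project's implementation.
    """
    if len(nr_scores) != len(vmaf_scores) or len(nr_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_x = sum(nr_scores) / len(nr_scores)
    mean_y = sum(vmaf_scores) / len(vmaf_scores)

    # Sums of centered products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(nr_scores, vmaf_scores))
    var_x = sum((x - mean_x) ** 2 for x in nr_scores)
    var_y = sum((y - mean_y) ** 2 for y in vmaf_scores)

    # Assumption: a flat signal carries no linear information, so report 0.0
    # instead of dividing by zero.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the proxy tracks VMAF linearly. A value near 0, such as the 0.0821 measured in the sweep, means the proxy carries almost no linear information about VMAF.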
Decision
ai/scripts/calibrate_nr_threshold.py will quality-gate every attempted JSON sidecar write. The default gate requires at least 10 calibration samples and PLCC >= 0.70. Weak fits still produce a Markdown report, but the script refuses to update nr_metric_v1.json unless the operator explicitly passes --allow-weak-calibration. The sidecar records gate status, gate thresholds, and rejection reasons whenever it is written.
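The gate logic described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual code: `GateResult`, `evaluate_gate`, and the field names are hypothetical, and only the default thresholds (10 samples, PLCC >= 0.70) and the override behavior come from this ADR.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    """Hypothetical record of a gate decision, mirroring what the sidecar stores."""
    passed: bool          # True when the fit met every threshold
    write_sidecar: bool   # True when the JSON sidecar may be written
    reasons: list         # human-readable rejection reasons (empty when passed)
    thresholds: dict      # thresholds the fit was checked against


def evaluate_gate(n_samples, plcc, *, min_samples=10, min_plcc=0.70,
                  allow_weak=False):
    """Decide whether a calibration fit may update the sidecar.

    Defaults follow the ADR; the signature itself is an assumption.
    """
    reasons = []
    if n_samples < min_samples:
        reasons.append(f"n_samples={n_samples} < min_samples={min_samples}")
    # Written as `not (>=)` so that a NaN correlation is also rejected.
    if not (plcc >= min_plcc):
        reasons.append(f"plcc={plcc:.4f} < min_plcc={min_plcc:.2f}")

    passed = not reasons
    return GateResult(
        passed=passed,
        # The override permits the write but never marks the fit as passed.
        write_sidecar=passed or allow_weak,
        reasons=reasons,
        thresholds={"min_samples": min_samples, "min_plcc": min_plcc},
    )
```

With the sweep's PLCC of 0.0821 and any sample count of 10 or more, this sketch rejects the write by default. Passing `allow_weak=True` permits the write while keeping `passed` false and the rejection reasons intact, so a downstream reader of the sidecar could still tell that the fit was weak.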
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep writing every fitted sidecar | Fastest to operate; preserves old CLI behavior | Can promote non-predictive NR fits into vmaf-tune early-elimination and hide the reason inside a JSON sidecar | The measured PLCC failure showed this is unsafe for the Netflix-style consumer pipeline |
| Use only sample-count gating | Blocks tiny smoke fits | Still accepts anti-correlated or flat NR signals when enough rows exist | The real failed sweep had enough rows; correlation was the missing safety check |
| Require manual review outside the script | Keeps code simple | Easy to skip during overnight training/recalibration runs; no machine-readable status reaches the sidecar | The guard belongs at the write boundary that creates the tune input |
Consequences
- Positive: vmaf-tune only consumes fast-NR sidecars that passed a minimum predictive-signal check, making the fast path a real speedup rather than a hidden quality risk.
- Negative: quick one-clip calibration probes now fail the write gate unless callers lower the sample gate or use --allow-weak-calibration for diagnostics.
- Neutral / follow-ups: future NR models should tune the default PLCC gate with held-out real-corpus evidence instead of weakening tests or runtime thresholds.
References
- Research-0685
- ADR-0615
- ADR-0624
- Source: req, "ai unlocks the speedup in tune for the full netflix style pipeline on consumer stuff"