ADR-0248: nr_metric_v1 joins dynamic-PTQ family (T5-3d)
- Status: Accepted
- Date: 2026-04-29
- Deciders: Lusoris, Claude (Anthropic)
- Tags: tiny-ai, onnx, quantization, registry, fork-local
Context
ADR-0173 shipped the audit-first PTQ harness. ADR-0174 flipped the first model, learned_filter_v1, to quant_mode: "dynamic" and explicitly deferred nr_metric_v1 because onnxruntime.quantization.quantize_dynamic raised "Inferred shape and existing shape differ in dimension 0: (128) vs (1)" during its internal shape-inference pass.
PR #174 (T5-3e empirical PTQ accuracy) hit the same class of failure on vmaf_tiny_v1*.onnx and traced the root cause: torch.onnx.export duplicates every initialiser into graph.value_info with static-shape annotations that do not survive the dynamic batch axis substitution. ORT's pre-quantisation shape inference then fails when the duplicated record disagrees with the canonical shape on the initialiser. PR #174 introduced a _save_inlined helper in ai/scripts/measure_quant_drop_per_ep.py that strips every value_info entry whose name collides with an initialiser. Inspecting the shipped model/tiny/nr_metric_v1.onnx confirmed the same duplicate pattern (29 initialisers, all 29 mirrored in graph.value_info).
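The duplicate pattern described above can be audited before it is stripped. The following is a minimal sketch, not the `_save_inlined` helper itself: the function names are hypothetical, and the demo uses stand-in objects with illustrative tensor names so that it runs without `onnx` installed. The same two functions should work on a real `onnx.GraphProto`, since its repeated fields support iteration, slice deletion, and `extend`.

```python
from types import SimpleNamespace


def find_mirrored_initializers(graph):
    """Return names that appear both as an initialiser and in value_info."""
    init_names = {t.name for t in graph.initializer}
    return sorted(vi.name for vi in graph.value_info if vi.name in init_names)


def strip_mirrored_value_info(graph):
    """Drop value_info entries that shadow an initialiser; return the count removed."""
    init_names = {t.name for t in graph.initializer}
    keep = [vi for vi in graph.value_info if vi.name not in init_names]
    removed = len(graph.value_info) - len(keep)
    # Clear and refill in place, which also works for protobuf repeated fields.
    del graph.value_info[:]
    graph.value_info.extend(keep)
    return removed


# Stand-in graph with hypothetical tensor names (not taken from nr_metric_v1).
graph = SimpleNamespace(
    initializer=[
        SimpleNamespace(name="conv1.weight"),
        SimpleNamespace(name="conv1.bias"),
    ],
    value_info=[
        SimpleNamespace(name="conv1.weight"),  # duplicate of an initialiser
        SimpleNamespace(name="conv1.bias"),    # duplicate of an initialiser
        SimpleNamespace(name="relu_out"),      # genuine intermediate tensor
    ],
)

mirrored = find_mirrored_initializers(graph)
removed = strip_mirrored_value_info(graph)
remaining = [vi.name for vi in graph.value_info]
print(mirrored, removed, remaining)
```

Running the audit first gives a count to compare against the initialiser total, which is how a "29 of 29 mirrored" observation could be reproduced on a real model.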
Decision
We will (1) port the PR #174 strip pattern into the canonical export path and the dynamic-PTQ entry point; (2) re-save the existing nr_metric_v1.onnx with the value_info duplicates stripped; and (3) flip nr_metric_v1 to quant_mode: "dynamic" with the same 0.01-PLCC budget as learned_filter_v1. The re-save in step (2) leaves the inference-graph semantics unchanged, because initialisers carry their own canonical shape; ONNX Runtime on CPU produced bit-identical output before and after on a deterministic 16-sample input set. The drop measured by ai/scripts/measure_quant_drop.py is 0.007674 (PLCC 0.992326), inside budget.
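To make the budget arithmetic concrete, here is a minimal sketch of a PLCC-drop gate. It assumes the drop is defined as 1 minus the Pearson correlation between fp32 and int8 outputs, which is consistent with the reported pair (0.992326 + 0.007674 = 1) but is not confirmed to be how measure_quant_drop.py computes it. The function names and the sample scores are illustrative only.

```python
import math


def plcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PLCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def headroom(drop, budget):
    """Fraction of the budget still unused (negative once the budget is exceeded)."""
    return (budget - drop) / budget


def gate(fp32_scores, int8_scores, budget=0.01):
    """Assumed gate: drop = 1 - PLCC(fp32, int8); pass when drop <= budget."""
    value = plcc(fp32_scores, int8_scores)
    drop = 1.0 - value
    return {
        "plcc": value,
        "drop": drop,
        "passed": drop <= budget,
        "headroom": headroom(drop, budget),
    }


# Illustrative scores only; not data from this ADR.
fp32 = [0.10, 0.35, 0.50, 0.72, 0.90]
int8 = [0.12, 0.33, 0.52, 0.70, 0.91]
result = gate(fp32, int8)
print(result)

# Headroom for the figures reported above (drop 0.007674, budget 0.01).
reported_headroom = headroom(0.007674, 0.01)
print(f"{reported_headroom:.1%}")
```

With the reported drop, the unused share of the budget works out to roughly 23%, matching the headroom figure quoted elsewhere in this ADR.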
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Strip duplicates only inside ptq_dynamic.py (no export-side fix) | Smallest blast radius; existing on-disk model bytes stay frozen | Every future re-export of any tiny model would re-introduce the bug; the next model to quantise hits it again | Rejected. Also updating exports.py costs two lines; repeating the diagnosis costs hours per model. |
| Re-train + re-export from a Lightning checkpoint with dynamo=True and a fresh dynamic_axes spec | Yields a graph the upstream tooling produces cleanly; future-proof against torch.onnx legacy quirks | No runs/c2_konvid/last.ckpt is committed; KoNViD-1k corpus is not redistributable; would block T5-3d on a full retraining cycle just to reproduce the same weights | Rejected. The fp32 weights are already audited (sha256-pinned in registry); re-saving with value_info stripped preserves the audit chain. |
| Pin a workaround in onnxruntime | Long-term cleanest if upstream accepts | Requires new release dependency; onnxruntime 1.22 is the floor across the rest of the harness | Rejected. The strip is a five-line ONNX-level transform; an upstream patch would take longer than this entire PR. |
| Promote nr_metric_v1 to static PTQ instead of dynamic | Slightly better accuracy; per-channel calibration | Requires shipping a calibration .npz; dynamic is already inside budget with 23% headroom | Rejected. ADR-0174 precedent: don't add the calibration-asset cost until a budget violation forces it. |
Consequences
- Positive:
  - C2 (nr_metric_v1) is now part of the quantised family; the end-to-end PTQ flow now covers two of the three production tiny models (the third, LPIPS-Sq, is upstream-derived and out of scope for fork-local quantisation decisions).
  - The value_info strip in export_to_onnx makes every future fork-trained tiny model PTQ-clean by construction.
  - The value_info strip inside ptq_dynamic.py makes the entry point robust against fork-local ONNX files that pre-date the export-side fix.
  - 2.0× size shrink (119 KB → 58 KB).
- Negative:
  - The on-disk nr_metric_v1.onnx sha256 changes (60c2bd59… → 75eff676…). All consumers that pinned the pre-T5-3d hash need to roll forward. The same audit trail applies as to a normal model refresh.
  - The PLCC drop (0.0077) is much higher than that of learned_filter_v1 (0.000117). It is still inside the 0.01 budget, but the headroom is only 23%, so a future architectural change to the C2 path could cross the line. Tracked in docs/ai/quantization.md as the motivating follow-up to revisit static PTQ if budget headroom erodes.
- Neutral / follow-ups:
  - The nr_metric_v1 Sigstore bundle is still a placeholder (nr_metric_v1.onnx.sigstore.json is populated at release time by .github/workflows/supply-chain.yml), the same lifecycle as the fp32 file.
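Consumers that pin the model by hash can check whether they still hold the pre-T5-3d file. The sketch below assumes a consumer stores the full 64-character hex digest somewhere it can read; how the registry actually stores or compares pins is not shown in this ADR, and the function names are hypothetical. The digests quoted above are truncated, so the demo runs on a throwaway file instead.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_hex):
    """True when the file's digest equals the pinned full-length hex digest."""
    return sha256_of(path) == pinned_hex.strip().lower()


# Demo on a throwaway file; a real check would point at the model path
# and take the full digest from wherever the consumer pins it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name

demo_digest = sha256_of(demo_path)
pin_ok = matches_pin(demo_path, demo_digest)
stale_pin_ok = matches_pin(demo_path, "0" * 64)  # stands in for an outdated pin
os.remove(demo_path)
print(pin_ok, stale_pin_ok)
```

A stale pin fails the comparison, which is the signal that a consumer needs to roll forward to the new digest.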
Tests
- python ai/scripts/ptq_dynamic.py model/tiny/nr_metric_v1.onnx produces a 59 797-byte int8 file.
- python ai/scripts/measure_quant_drop.py model/tiny/nr_metric_v1.onnx reports [PASS] nr_metric_v1 mode=dynamic PLCC=0.992326 drop=0.007674 budget=0.0100.
- python ai/scripts/validate_model_registry.py reports OK: 6 registry entries valid against registry.schema.json.
- The CI ai-quant-accuracy step (introduced in ADR-0174) now exercises both learned_filter_v1 and nr_metric_v1.
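If a downstream check needs to consume the gate's report line rather than its exit status, the line quoted above is regular enough to parse. This is a sketch under assumptions: the field layout is taken from the single `[PASS]` line shown in this section, the `FAIL` status is a guess at the failing form, and `parse_report` is a hypothetical helper rather than part of the harness.

```python
import re

# Field layout inferred from the one report line quoted in this ADR.
REPORT_RE = re.compile(
    r"^\[(?P<status>PASS|FAIL)\]\s+(?P<model>\S+)\s+mode=(?P<mode>\S+)"
    r"\s+PLCC=(?P<plcc>-?\d+(?:\.\d+)?)"
    r"\s+drop=(?P<drop>-?\d+(?:\.\d+)?)"
    r"\s+budget=(?P<budget>\d+(?:\.\d+)?)\s*$"
)


def parse_report(line):
    """Split a report line into typed fields, or raise ValueError."""
    match = REPORT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised report line: {line!r}")
    fields = match.groupdict()
    for key in ("plcc", "drop", "budget"):
        fields[key] = float(fields[key])
    return fields


report = parse_report(
    "[PASS] nr_metric_v1 mode=dynamic PLCC=0.992326 drop=0.007674 budget=0.0100"
)
# Cross-check that the printed status agrees with the printed numbers.
consistent = (report["status"] == "PASS") == (report["drop"] <= report["budget"])
print(report, consistent)
```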
Reproducer
# (one-time, on the existing on-disk fp32 file)
python - <<'PY'
import onnx
m = onnx.load("model/tiny/nr_metric_v1.onnx")
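# Names of all initialisers; each carries its own canonical shape.
init = {t.name for t in m.graph.initializer}
# Keep only value_info entries that do not shadow an initialiser.
keep = [vi for vi in m.graph.value_info if vi.name not in init]
# Repeated protobuf fields cannot be reassigned, so clear and refill in place.
del m.graph.value_info[:]
m.graph.value_info.extend(keep)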
onnx.save(m, "model/tiny/nr_metric_v1.onnx", save_as_external_data=False)
PY
# Quantise + gate.
python ai/scripts/ptq_dynamic.py model/tiny/nr_metric_v1.onnx
python ai/scripts/measure_quant_drop.py model/tiny/nr_metric_v1.onnx
References
- ADR-0173: PTQ audit harness.
- ADR-0174: first per-model PTQ (learned_filter_v1); deferred nr_metric_v1 to T5-3c, absorbed here as T5-3d.
- ADR-0168: nr_metric_v1 baseline.
- PR #174 (T5-3e empirical PTQ): origin of the _save_inlined workaround.
- BACKLOG row T5-3d: per-model PTQ umbrella.
- req (user direction, 2026-04-29): implement T5-3d first sub-bullet (re-export nr_metric_v1 with explicit dynamic batch axis + run T5-3 PTQ pipeline).