ADR-0657: Second-Opinion Feature Materializer
- Status: Accepted
- Date: 2026-05-20
- Deciders: Lusoris agents
- Tags: ai, signal-mix, mos, external-bench
Context
ADR-0650 made the signal-mix blind spots explicit: refreshed feature tables still need no-reference (NR) and subjective-MOS second opinions so that model retrains can see where full-reference (FR) metrics saturate or disagree with human-rated UGC/HDR clips. Some useful scorers are in-tree (nr_metric_v1); others, such as DOVER, Q-Align, FAST-VQA, MUSIQ, and CLIP-IQA, are external projects with separate licence and installation constraints.
The fork already keeps tools/external-bench/ wrapper-only for licence reasons. The feature-table path needs the same boundary: corpus materialisation should accept scorer outputs, not embed third-party competitor execution inside the training scripts.
Decision
Add ai/scripts/materialize_second_opinion_features.py, a table-side joiner that reads already-generated scorer JSON/JSONL and appends namespaced second_opinion_<scorer>_* columns to parquet/JSONL feature tables.
The materializer:
- supports common feature-table formats: parquet, JSONL/NDJSON, and JSON rows;
- supports scalar score rows and external-bench wrapper-style frame payloads;
- joins by an inferred or explicit row key;
- records score, status, runtime, frame count, and score-file provenance;
- rejects duplicate (scorer, key) rows instead of silently averaging them;
- can mark, drop, or fail rows with missing scorer outputs.
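The join behaviour listed above can be illustrated with a minimal sketch. This is not the code in materialize_second_opinion_features.py: the function name `join_second_opinion`, the `clip_id` key column, the `score` field, and the `on_missing` parameter are all hypothetical, chosen only to show duplicate rejection, namespaced columns, and the mark/drop/fail policy on plain dict rows.

```python
from typing import Any, Dict, Iterable, List


def join_second_opinion(
    rows: Iterable[Dict[str, Any]],
    scores: Iterable[Dict[str, Any]],
    scorer: str,
    key: str = "clip_id",      # hypothetical row-key column
    on_missing: str = "mark",  # hypothetical policy name: "mark", "drop", or "fail"
) -> List[Dict[str, Any]]:
    """Append second_opinion_<scorer>_* columns to feature rows (illustrative only)."""
    if on_missing not in {"mark", "drop", "fail"}:
        raise ValueError(f"unknown on_missing policy: {on_missing!r}")

    prefix = f"second_opinion_{scorer}_"

    # Index score rows by key; a repeated (scorer, key) pair is an error,
    # never an implicit average.
    by_key: Dict[Any, Dict[str, Any]] = {}
    for score_row in scores:
        k = score_row[key]
        if k in by_key:
            raise ValueError(f"duplicate ({scorer}, {k}) score row")
        by_key[k] = score_row

    out: List[Dict[str, Any]] = []
    for row in rows:
        score_row = by_key.get(row[key])
        if score_row is None:
            if on_missing == "fail":
                raise KeyError(f"no {scorer} score for key {row[key]!r}")
            if on_missing == "drop":
                continue
            # "mark": keep the row and record that the scorer output is absent.
            out.append({**row, prefix + "score": None, prefix + "status": "missing"})
        else:
            out.append(
                {**row, prefix + "score": float(score_row["score"]), prefix + "status": "ok"}
            )
    return out
```

Raising on duplicates keeps a double-scored clip visible to the operator, which matches the stated preference for rejection over silent averaging.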
ai/scripts/signal_mix_audit.py also recognises second_opinion_* and common NR/MOS scorer names as no-reference / subjective-MOS evidence.
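As a rough illustration of that recognition rule, the sketch below classifies a column name as NR/MOS evidence. The helper name `is_nr_mos_evidence` and the token list are assumptions: the tokens are simply the scorers named in this ADR, and the actual matching in signal_mix_audit.py may differ.

```python
SECOND_OPINION_PREFIX = "second_opinion_"

# Assumed token list, built from the scorers this ADR mentions;
# the real audit script may use a different set.
KNOWN_SCORER_TOKENS = (
    "nr_metric",
    "dover",
    "q_align",
    "fast_vqa",
    "musiq",
    "clip_iqa",
)


def is_nr_mos_evidence(column: str) -> bool:
    """Return True if a column name looks like NR / subjective-MOS evidence."""
    # Normalise case and hyphens so "Q-Align_mos" and "q_align_mos" match alike.
    name = column.lower().replace("-", "_")
    if name.startswith(SECOND_OPINION_PREFIX):
        return True
    return any(token in name for token in KNOWN_SCORER_TOKENS)
```

For example, `is_nr_mos_evidence("second_opinion_dover_score")` and `is_nr_mos_evidence("Q-Align_mos")` both hold under these assumptions, while a plain `vmaf` column does not.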
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Run external competitors directly from corpus feature extraction | One command could produce all columns | Pulls licence-bound external tools into a training path and makes tests depend on optional binaries | Violates the wrapper-only posture and makes reproducibility worse |
| Extend tools/external-bench/compare.py into a corpus enricher | Reuses wrapper discovery | The benchmark harness is aggregate/report-oriented and currently loses row-table context | Keep benchmark comparison and feature-table enrichment separate |
| Join second-opinion scores inside each trainer | No extra operator step | Duplicates join logic across MOS heads, predictor training, and future refresh scripts | Central table materialisation is easier to audit and test |
Consequences
- Positive: refreshed tables can carry NR/MOS second-opinion features without waiting for every scorer to become a first-class in-tree extractor.
- Positive: the licence boundary remains clean; external scorer execution stays out-of-tree or under wrapper-only harnesses.
- Negative: operators must run scorers separately and preserve a row key in their JSON/JSONL outputs.
- Neutral / follow-ups: run the materializer on CHUG, KoNViD/UGC, and Netflix-derived refreshed tables, then measure retrain impact with signal_mix_audit.py and the relevant MOS/predictor gates.
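The operator-side requirement to preserve a row key could look roughly like the sketch below, which writes one JSONL line per scored clip and refuses rows without a key. The field names `scorer`, `clip_id`, and `score`, and the helper `write_score_rows`, are hypothetical; the materializer's accepted schema is not specified in this ADR.

```python
import json
from typing import Any, Dict, Iterable, TextIO


def write_score_rows(
    fh: TextIO,
    scorer: str,
    results: Iterable[Dict[str, Any]],
    key: str = "clip_id",  # hypothetical row-key field
) -> int:
    """Write scorer results as JSONL, one object per line, keeping the row key."""
    written = 0
    for result in results:
        # Without a key the score cannot be joined back to a feature-table row.
        if result.get(key) in (None, ""):
            raise ValueError(f"score row is missing row key {key!r}: {result!r}")
        record = {"scorer": scorer, key: result[key], "score": result["score"]}
        fh.write(json.dumps(record) + "\n")
        written += 1
    return written
```

Failing at write time, rather than at join time, surfaces a missing key while the scorer run is still fresh and cheap to repeat.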
References
- ADR-0650
- docs/ai/signal-mix-audit.md
- User request: "well that sounds like a lot to do... go on then"