Second-opinion feature materializer
ai/scripts/materialize_second_opinion_features.py joins externally generated NR/MOS scores onto refreshed feature tables. Use it after running fork-local or third-party scorers such as fork-nr-metric, DOVER, Q-Align, FAST-VQA, MUSIQ, or CLIP-IQA, and before retraining MOS heads or predictor models, or running the signal-mix audit.
The script does not run, vendor, or link any external scorer. It only reads score JSON/JSONL files and appends namespaced columns:
| Column | Meaning |
|---|---|
| second_opinion_<scorer>_score | Clip-level MOS/VQA score from the scorer. |
| second_opinion_<scorer>_status | ok, missing, or bad. |
| second_opinion_<scorer>_runtime_ms | Runtime reported by the scorer, when available. |
| second_opinion_<scorer>_frames | Number of frame scores averaged when the score came from frame JSON. |
| second_opinion_<scorer>_source | Score file that supplied the row. |
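As a rough illustration of the naming scheme in the table, the sketch below builds the five column names for one scorer. The helper `second_opinion_columns` is hypothetical and not part of the script; it also assumes the scorer label is used verbatim, whereas the real script may normalize labels first.

```python
# Hypothetical helper: the script's internals are not shown in this document.
SUFFIXES = ("score", "status", "runtime_ms", "frames", "source")


def second_opinion_columns(scorer: str, prefix: str = "second_opinion") -> dict[str, str]:
    """Map each column suffix to its namespaced column name for one scorer."""
    # Assumption: the scorer label is inserted as-is, with no normalization.
    return {suffix: f"{prefix}_{scorer}_{suffix}" for suffix in SUFFIXES}


columns = second_opinion_columns("musiq")
print(columns["score"])   # second_opinion_musiq_score
print(columns["status"])  # second_opinion_musiq_status
```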
Inputs
Feature tables may be parquet, JSONL/NDJSON, or a JSON object with rows. Score files may be JSONL/NDJSON or JSON. Each score row needs:
- a row key such as clip_id, video_id, filename, name, or path;
- a competitor field, unless passed as LABEL=path through --scores;
- either a scalar score field such as score, mos, predicted_mos, or predicted_vmaf_or_mos, or a wrapper-style frames[] list with per-frame predicted_vmaf_or_mos.
External-bench wrapper payloads are accepted directly when they include a clip key. Their per-frame scores are averaged if no summary-level score is present.
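The score resolution described above can be sketched roughly as follows: prefer a summary-level scalar, otherwise average the per-frame values. This is an illustrative reimplementation, not the script's code; the function name `resolve_score` and the field precedence order are assumptions.

```python
from statistics import fmean
from typing import Any, Optional

# Assumption: fields are tried in this order; the real precedence may differ.
SCALAR_FIELDS = ("score", "mos", "predicted_mos", "predicted_vmaf_or_mos")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def resolve_score(row: dict[str, Any]) -> tuple[Optional[float], int]:
    """Return (score, frames_averaged); frames_averaged is 0 for scalar scores."""
    for field in SCALAR_FIELDS:
        if _is_number(row.get(field)):
            return float(row[field]), 0
    # No summary-level score: fall back to the mean of the per-frame scores.
    frame_scores = [
        float(frame["predicted_vmaf_or_mos"])
        for frame in row.get("frames") or []
        if isinstance(frame, dict) and _is_number(frame.get("predicted_vmaf_or_mos"))
    ]
    if frame_scores:
        return fmean(frame_scores), len(frame_scores)
    return None, 0


wrapper_row = {
    "clip_id": "clip_a",
    "frames": [{"predicted_vmaf_or_mos": 60.0}, {"predicted_vmaf_or_mos": 80.0}],
}
print(resolve_score(wrapper_row))  # (70.0, 2)
```

The second tuple element corresponds to what the frames column records: the number of frame scores that were averaged.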
Examples
Join DOVER and the fork NR scorer onto a CHUG feature shard:
.venv/bin/python ai/scripts/materialize_second_opinion_features.py \
--features .corpus/chug/training/fr_canonical_shards/shard_000.features.jsonl \
--scores dover-mobile=.workingdir2/second-opinion/dover-chug.jsonl \
--scores fork-nr-metric=.workingdir2/second-opinion/fork-nr-chug.jsonl \
--out .workingdir2/second-opinion/shard_000.with-second-opinion.jsonl \
--audit-json .workingdir2/second-opinion/shard_000.audit.json
Use --missing-policy fail for promotion-grade tables where every row must have every scorer. Use the default mark during exploratory audits; missing scores remain visible as second_opinion_<scorer>_status = "missing".
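To make the difference between the two policies concrete, here is a minimal sketch of a join over in-memory rows. It is not the script's implementation: `join_scores`, its parameters, and the use of `KeyError` for the fail policy are assumptions, and the bad status is left out for brevity.

```python
from typing import Any


def join_scores(
    features: list[dict[str, Any]],
    scores_by_key: dict[str, float],
    scorer: str,
    key_column: str = "clip_id",
    missing_policy: str = "mark",
) -> list[dict[str, Any]]:
    """Append score and status columns for one scorer to every feature row."""
    prefix = f"second_opinion_{scorer}"
    joined_rows = []
    for row in features:
        key = row[key_column]
        joined = dict(row)  # copy so the input rows stay untouched
        if key in scores_by_key:
            joined[f"{prefix}_score"] = scores_by_key[key]
            joined[f"{prefix}_status"] = "ok"
        elif missing_policy == "fail":
            # Promotion-grade behaviour: a single gap aborts the join.
            raise KeyError(f"no {scorer} score for {key_column}={key!r}")
        else:
            # Exploratory behaviour: keep the row and make the gap visible.
            joined[f"{prefix}_score"] = None
            joined[f"{prefix}_status"] = "missing"
        joined_rows.append(joined)
    return joined_rows


features = [{"clip_id": "a", "vmaf": 91.2}, {"clip_id": "b", "vmaf": 74.8}]
scores = {"a": 0.83}
for row in join_scores(features, scores, "dover"):
    print(row["clip_id"], row["second_opinion_dover_status"])
```

With the mark policy the second row is kept with status missing; with fail the same input raises before any table is produced.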
When --audit-json is set, the audit file includes ADR-0661 run_provenance with the feature table, score sidecar paths, parsed join options, output table target, and audit target. Keep that audit with any derived table used for retraining; external scorer sidecars can be rebuilt only if their exact join inputs are preserved.
Batch Manifest
Use ai/scripts/batch_materialize_second_opinion_features.py when the same scorer family needs to be joined across multiple refreshed tables. Paths in the manifest are relative to the manifest file unless --base-dir is supplied.
{
  "defaults": {
    "missing_policy": "fail",
    "key_normalize": "auto"
  },
  "tables": [
    {
      "id": "chug_hdr",
      "features": ".corpus/chug/training/fr_canonical_shards/shard_000.features.jsonl",
      "scores": [
        "dover-mobile=.workingdir2/second-opinion/dover-chug.jsonl",
        "fork-nr-metric=.workingdir2/second-opinion/fork-nr-chug.jsonl"
      ],
      "out": ".workingdir2/second-opinion/shard_000.with-second-opinion.jsonl",
      "audit_json": ".workingdir2/second-opinion/shard_000.audit.json"
    }
  ]
}
.venv/bin/python ai/scripts/batch_materialize_second_opinion_features.py \
--manifest .workingdir2/second-opinion/batch.json \
--report-json .workingdir2/second-opinion/batch.report.json \
--report-md .workingdir2/second-opinion/batch.report.md
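The path rule for manifests (relative to the manifest file unless --base-dir is supplied) could be sketched as below. Both helpers are hypothetical, and passing absolute paths through unchanged is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional


def resolve_manifest_path(raw: str, manifest_path: str, base_dir: Optional[str] = None) -> Path:
    """Resolve one manifest path against base_dir, or the manifest's directory."""
    path = Path(raw)
    if path.is_absolute():
        return path  # assumption: absolute paths are used as given
    root = Path(base_dir) if base_dir is not None else Path(manifest_path).parent
    return root / path


def resolve_score_spec(spec: str, manifest_path: str, base_dir: Optional[str] = None) -> tuple[str, Path]:
    """Split a LABEL=path entry and resolve its path part."""
    label, sep, raw = spec.partition("=")
    if not sep or not label or not raw:
        raise ValueError(f"expected LABEL=path, got {spec!r}")
    return label, resolve_manifest_path(raw, manifest_path, base_dir)


print(resolve_score_spec("dover-mobile=scores/dover.jsonl", "work/batch.json"))
print(resolve_score_spec("dover-mobile=scores/dover.jsonl", "work/batch.json", base_dir="."))
```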
Each table may override any single-run join option from defaults, including missing_policy, key_column, score_key_field, score_field, key_normalize, prefix, and overwrite. The batch report uses schema second-opinion-materializer-batch-v1 and carries ADR-0661 run_provenance.
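A per-table override of defaults amounts to a shallow merge in which the table entry wins. The sketch below shows that idea only; `effective_options` is a hypothetical name, and the option list is limited to the options named above, which may not be exhaustive.

```python
from typing import Any

# Options named in this section; the real script may accept more.
JOIN_OPTIONS = (
    "missing_policy",
    "key_column",
    "score_key_field",
    "score_field",
    "key_normalize",
    "prefix",
    "overwrite",
)


def effective_options(defaults: dict[str, Any], table: dict[str, Any]) -> dict[str, Any]:
    """Start from manifest defaults, then let the table entry override them."""
    merged = {name: defaults[name] for name in JOIN_OPTIONS if name in defaults}
    merged.update({name: table[name] for name in JOIN_OPTIONS if name in table})
    return merged


defaults = {"missing_policy": "fail", "key_normalize": "auto"}
table = {"id": "chug_hdr", "missing_policy": "mark"}
print(effective_options(defaults, table))
# {'missing_policy': 'mark', 'key_normalize': 'auto'}
```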
Smoke-Run Scaffold
A self-contained smoke run with synthetic fixture tables is committed under ai/testdata/smoke-second-opinion-batch/. Run it from the repo root to verify the full pipeline end-to-end before using real corpus data:
mkdir -p /tmp/vmafx-smoke-second-opinion
PYTHONPATH=ai/scripts python ai/scripts/batch_materialize_second_opinion_features.py \
--manifest ai/testdata/smoke-second-opinion-batch/batch.json \
--base-dir . \
--report-json /tmp/vmafx-smoke-second-opinion/smoke.report.json \
--report-md /tmp/vmafx-smoke-second-opinion/smoke.report.md
Expected output: tables=2 input_rows=5 output_rows=5 failed_tables=0. See ai/testdata/smoke-second-opinion-batch/README.md for the full inspection checklist.
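If the smoke run is wired into an automated check, the summary can be compared field by field rather than by eye. The sketch below assumes the summary is available as a single line of key=value tokens exactly as quoted above; how the script actually emits it is not specified here, and `parse_summary` is a hypothetical helper.

```python
EXPECTED = {"tables": 2, "input_rows": 5, "output_rows": 5, "failed_tables": 0}


def parse_summary(line: str) -> dict[str, int]:
    """Parse space-separated key=value tokens with integer values."""
    pairs = (token.split("=", 1) for token in line.split() if "=" in token)
    return {key: int(value) for key, value in pairs}


summary = parse_summary("tables=2 input_rows=5 output_rows=5 failed_tables=0")
mismatches = {key: summary.get(key) for key in EXPECTED if summary.get(key) != EXPECTED[key]}
print(mismatches or "smoke summary matches")
```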
Reading The Columns
Second-opinion columns are advisory evidence, not replacements for reference metrics or subjective labels. Their main value is intersection:
- disagreeing with VMAF on UGC/HDR clips that humans rate poorly;
- separating no-reference camera defects from encode artifacts;
- giving MOS heads an extra axis when FR features saturate;
- showing signal-mix audits that NR/MOS evidence is present rather than silently missing.
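For the last point, a quick coverage count per scorer shows whether NR/MOS evidence is actually present in a joined table. This is a sketch over in-memory rows, assuming the default second_opinion prefix; `status_coverage` is a hypothetical helper, not part of the signal-mix audit.

```python
import re
from collections import Counter
from typing import Any

# Assumption: columns use the default "second_opinion" prefix.
STATUS_COLUMN = re.compile(r"^second_opinion_(?P<scorer>.+)_status$")


def status_coverage(rows: list[dict[str, Any]]) -> dict[str, Counter]:
    """Count ok/missing/bad statuses per scorer across all rows."""
    coverage: dict[str, Counter] = {}
    for row in rows:
        for column, value in row.items():
            match = STATUS_COLUMN.match(column)
            if match:
                coverage.setdefault(match["scorer"], Counter())[value] += 1
    return coverage


rows = [
    {"clip_id": "a", "second_opinion_dover_status": "ok", "second_opinion_musiq_status": "ok"},
    {"clip_id": "b", "second_opinion_dover_status": "missing", "second_opinion_musiq_status": "ok"},
    {"clip_id": "c", "second_opinion_dover_status": "bad", "second_opinion_musiq_status": "ok"},
]
for scorer, counts in status_coverage(rows).items():
    print(scorer, dict(counts))
```

A scorer whose counts are dominated by missing or bad contributes little evidence, even though its columns exist in the table.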
Do not commit derived score tables that contain licensed third-party outputs or private MOS labels unless the dataset and scorer licences explicitly allow redistribution.