
ADR-0674: Second-Opinion Materializer Batch Manifest

  • Status: Accepted
  • Date: 2026-05-21
  • Deciders: Lusoris, Codex
  • Tags: ai, second-opinion, materializer, provenance, fork-local

Context

ADR-0657 added ai/scripts/materialize_second_opinion_features.py, a table-side joiner for externally generated NR/MOS scorer JSON. That design intentionally keeps competitor execution out of the repo: operators generate DOVER, Q-Align, FAST-VQA, fork-NR, or other sidecars elsewhere, then join the scalar evidence onto refreshed feature tables.

The current signal-mix backlog needs those joins across several refreshed tables. Repeating the single-table command by hand risks mixing up scorer labels, missing-value policies, and output paths between runs, and it produces no single artifact that downstream retraining can cite as the second-opinion refresh set.

Decision

Add ai/scripts/batch_materialize_second_opinion_features.py, a manifest-driven orchestrator over the existing second-opinion materializer. The manifest has shared defaults and a tables[] array. Each table carries id, features, scores, out, optional audit_json, and any single-table join override. Relative paths resolve from the manifest directory unless --base-dir is supplied.
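
The manifest shape and path rule described above can be illustrated with a minimal sketch. It is not the script's actual implementation: the `defaults` key name, the `scorer_label` and `missing_policy` options, the helper `resolve_tables`, and all file paths are hypothetical. Only `tables`, `id`, `features`, `scores`, `out`, `audit_json`, and the `--base-dir` behaviour come from this ADR.

```python
from pathlib import Path

# Path-valued keys named in the ADR; everything else is treated as a join option.
PATH_KEYS = ("features", "scores", "out", "audit_json")

# Hypothetical manifest: "defaults", "scorer_label", and "missing_policy"
# are illustrative names, not confirmed by the ADR.
EXAMPLE_MANIFEST = {
    "defaults": {"missing_policy": "skip"},
    "tables": [
        {
            "id": "konvid",
            "features": "tables/konvid_features.csv",
            "scores": "sidecars/konvid_dover.json",
            "out": "joined/konvid.csv",
            "audit_json": "audits/konvid.json",
            "scorer_label": "dover",
        },
        {
            "id": "chug",
            "features": "tables/chug_features.csv",
            "scores": "sidecars/chug_qalign.json",
            "out": "joined/chug.csv",
            "scorer_label": "qalign",
            "missing_policy": "drop",  # per-table override of the shared default
        },
    ],
}


def resolve_tables(manifest, manifest_path, base_dir=None):
    """Merge shared defaults into each table entry and resolve relative paths.

    Relative paths resolve from the manifest's directory unless base_dir is
    given, mirroring the --base-dir rule stated in the Decision.
    """
    root = Path(base_dir) if base_dir is not None else Path(manifest_path).parent
    defaults = manifest.get("defaults", {})
    resolved = []
    for entry in manifest["tables"]:
        merged = {**defaults, **entry}  # entry values win over defaults
        for key in PATH_KEYS:
            value = merged.get(key)
            if value is not None:
                path = Path(value)
                merged[key] = str(path if path.is_absolute() else root / path)
        resolved.append(merged)
    return resolved
```

Merging defaults before resolving paths keeps each resolved entry self-describing, so a report built from it can record exactly which options applied to which table.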

The batch runner writes each joined table, optional per-table audit JSON, and a second-opinion-materializer-batch-v1 report with ADR-0661 run provenance. It must call materialize_second_opinion_features.materialize() for every table; it does not parse scorer payloads itself and does not invoke external scorer binaries.
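
The delegation boundary described above might look roughly like the following sketch. The real signature of `materialize()` is not given in this ADR, so the sketch assumes it accepts keyword arguments and returns a summary object; the joiner is passed in as a callable for that reason. The report field names other than the `second-opinion-materializer-batch-v1` identifier are hypothetical, and the ADR-0661 provenance payload is treated as an opaque value supplied by the caller.

```python
REPORT_SCHEMA = "second-opinion-materializer-batch-v1"


def run_batch(tables, materialize, provenance):
    """Call the single-table joiner once per table and collect a batch report.

    tables      -- resolved manifest entries (dicts with at least id and out)
    materialize -- the single-table joiner; assumed here to take keyword args
    provenance  -- ADR-0661 run provenance, passed through untouched
    """
    results = []
    for table in tables:
        # "id" names the entry in the report; it is not a join option.
        options = {key: value for key, value in table.items() if key != "id"}
        # All scorer parsing and joining stays inside the shared joiner.
        summary = materialize(**options)
        results.append({"id": table["id"], "out": table["out"], "summary": summary})
    return {"schema": REPORT_SCHEMA, "provenance": provenance, "tables": results}
```

Because the loop only forwards options, the batch runner never reads scorer payloads or starts an external scorer process, which is the boundary ADR-0657 sets.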

Alternatives considered

| Option | Pros | Cons | Why not chosen |
| --- | --- | --- | --- |
| Manifest-driven batch wrapper over the shared joiner | Repeatable multi-table joins; one provenance report; preserves the no-external-scorer boundary | Adds one operator-facing CLI | Chosen: it closes the execution gap without changing row semantics |
| Keep shell loops | No new Python surface | No stable batch artifact, weak provenance, easy label/path drift | Rejected: shell history is not adequate training evidence |
| Invoke competitor scorers from the batch runner | Fully automated end-to-end scorer generation | Vendors/links third-party competitors and violates ADR-0657's table-side boundary | Rejected: scorer execution remains outside this repo |
| Add corpus-specific second-opinion scripts | Simple per-corpus defaults | Duplicates join policy and column semantics | Rejected: corpus differences are already expressible in manifest entries |

Consequences

  • Positive: Second-opinion joins for CHUG, KoNViD, UGC, Netflix, and BVI can be replayed from one manifest and cited by retraining jobs.
  • Negative: Join option changes must keep the single-table script and batch manifest validation in sync.
  • Neutral / follow-ups: Generate scorer sidecars, run the batch manifest on refreshed tables, rerun the signal-mix audit, and measure MOS/predictor retrain impact.
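
One possible way to limit the sync cost noted under Negative is to derive the accepted manifest keys from the joiner's own signature rather than keeping a second list. This is a sketch under assumptions, not what the batch script is known to do: `unknown_override_keys` is a hypothetical helper, and the approach only works if the joiner declares its options as named parameters rather than `**kwargs`.

```python
import inspect


def unknown_override_keys(entry, join_fn, reserved=("id",)):
    """Return manifest keys that join_fn does not declare as parameters.

    reserved holds keys that belong to the manifest itself (such as id)
    and are never forwarded to the joiner.
    """
    accepted = set(inspect.signature(join_fn).parameters)
    return sorted(
        key for key in entry if key not in accepted and key not in reserved
    )
```

A validation step could reject any entry for which this returns a non-empty list, so a renamed or removed join option would fail at manifest load time instead of during a join.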

References

  • ADR-0657 — table-side second-opinion joiner.
  • ADR-0661 — shared AI run provenance.
  • Research-0694 — implementation digest.
  • Source: req — "yeah every possible gain through intersection (even of not yet included metrics)... thats an interesting topic for sure lol"
  • Source: req — "well go on i guess we have enough backlog..."