ADR-0673: Saliency Materializer Batch Manifest

Status: Accepted
Date: 2026-05-21
Deciders: Lusoris, Codex
Tags: ai, saliency, materializer, provenance, fork-local

Context

ADR-0655 introduced ai/scripts/materialize_saliency_features.py as the single table-shaped saliency enrichment path. ADR-0672 then added explicit model and temporal-reducer metadata so refreshed tables can compare saliency_student_v1, U2NetP, EMA, max, and motion-weighted saliency rows.

The remaining operational gap is execution shape. The current refresh backlog needs the same saliency materializer run over CHUG, KoNViD, UGC, Netflix, and BVI-derived tables before predictor and MOS-head retrains can measure the signal. Hand-written shell loops do not record per-table config, overwrite policy, row-failure status, or provenance, and they are easy to lose track of after context compression.

Decision

Add ai/scripts/batch_materialize_saliency_features.py, a manifest-driven orchestrator over the existing single-table materializer. The manifest carries shared defaults plus a tables[] array with per-table id, input, output, optional audit_json, and any SaliencyMaterializeConfig override. Relative paths resolve from the manifest directory unless --base-dir is given. The runner writes each output table, optional per-table audit JSON, and one saliency-materializer-batch-v1 report with ADR-0661 run provenance.
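A minimal sketch of how such a manifest might be resolved, assuming a JSON-like mapping with a defaults object next to tables[]. Only id, input, output, audit_json, and the manifest-directory / --base-dir rule come from the decision above. The "defaults" key name, the "model" and "temporal_reducer" override fields, the example paths, and the resolve_tables() helper are hypothetical illustrations, not the script's actual schema or API.

```python
from pathlib import Path
from typing import Optional

# Per-table keys that hold filesystem paths (named in the decision text).
PATH_KEYS = ("input", "output", "audit_json")

# Hypothetical manifest content; key names other than id/input/output/audit_json
# are assumptions made for illustration.
EXAMPLE_MANIFEST = {
    "defaults": {"model": "saliency_student_v1", "temporal_reducer": "ema"},
    "tables": [
        {
            "id": "konvid",
            "input": "tables/konvid.parquet",
            "output": "out/konvid.parquet",
            "audit_json": "out/konvid.audit.json",
        },
        {
            "id": "chug",
            "input": "tables/chug.parquet",
            "output": "out/chug.parquet",
            "temporal_reducer": "max",  # per-table override of a shared default
        },
    ],
}


def resolve_tables(manifest: dict, manifest_path: Path,
                   base_dir: Optional[Path] = None) -> list:
    """Merge shared defaults into each table entry and resolve relative paths.

    Relative paths resolve from the manifest's directory unless base_dir is
    given; absolute paths are kept unchanged.
    """
    root = Path(base_dir) if base_dir is not None else Path(manifest_path).parent
    defaults = manifest.get("defaults", {})
    resolved = []
    for table in manifest["tables"]:
        entry = {**defaults, **table}  # per-table values win over defaults
        for key in PATH_KEYS:
            value = entry.get(key)
            if value is not None:
                path = Path(value)
                entry[key] = str(path if path.is_absolute() else root / path)
        resolved.append(entry)
    return resolved
```

In this sketch a table without audit_json simply has no audit path, which matches the "optional audit_json" wording; how the real runner treats a missing key is not specified here.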

The batch script must not own decoding or saliency semantics. It imports materialize_rows(), read_table(), and write_table() from the single-table script so row statuses and output columns stay identical.
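The delegation rule can be sketched as a loop that only sequences calls and counts statuses. The real signatures of read_table(), materialize_rows(), and write_table() are not given in this ADR, so the sketch takes them as injected callables with assumed shapes, and the stand-in functions, the "status" column name, and the report layout are hypothetical. Only the saliency-materializer-batch-v1 identifier comes from the decision above; ADR-0661 provenance fields are omitted.

```python
from collections import Counter


def run_batch(tables, read_table, materialize_rows, write_table) -> dict:
    """Run the single-table materializer over each table and build one report.

    The orchestrator never decodes video or computes saliency itself; it only
    forwards rows and config to the injected single-table functions.
    """
    report = {"schema": "saliency-materializer-batch-v1", "tables": []}
    for table in tables:
        rows = read_table(table["input"])
        out_rows = list(materialize_rows(rows, table))  # row semantics stay upstream
        write_table(table["output"], out_rows)
        statuses = Counter(row.get("status", "unknown") for row in out_rows)
        report["tables"].append({
            "id": table["id"],
            "output": table["output"],
            "rows": len(out_rows),
            "status_counts": dict(statuses),
        })
    return report


# Stand-ins for the single-table functions (assumed shapes, illustration only).
def fake_read_table(path):
    return [{"clip": "a.mp4"}, {"clip": "b.bad"}]


def fake_materialize_rows(rows, config):
    # Pretend that clips ending in ".bad" fail to decode.
    return [
        dict(row, status="failed" if row["clip"].endswith(".bad") else "ok")
        for row in rows
    ]


written = {}


def fake_write_table(path, rows):
    written[path] = rows


report = run_batch(
    [{"id": "konvid", "input": "in/konvid.parquet", "output": "out/konvid.parquet"}],
    fake_read_table,
    fake_materialize_rows,
    fake_write_table,
)
```

Passing the three functions in as arguments keeps the sketch testable without video files; the actual script imports them directly from the single-table module, as stated above.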

Alternatives considered

Option	Pros	Cons	Why not chosen
Manifest-driven batch wrapper over the shared materializer	Repeatable multi-table refreshes; one provenance report; no duplicate decode/model code	Adds one operator-facing CLI	Chosen: it closes the immediate saliency execution gap while preserving ADR-0655's shared row semantics
Keep using shell loops	No new Python surface	No stable config schema, no batch report, weak resume/audit story	Rejected: this is exactly how saliency table refreshes get lost or mixed across models
Add corpus-specific materializers	Each corpus can hard-code roots and columns	Duplicates row IO/status/model logic and violates the shared-materializer invariant	Rejected: the current table schema already represents the corpus differences
Fold batch mode into the single-table CLI	One script name	Makes the single-table parser carry two unrelated invocation shapes	Rejected: a thin orchestration script is easier to test and keep out of trainer hot loops

Consequences

Positive: Saliency refreshes across CHUG, KoNViD, UGC, Netflix, and BVI can be launched from one versioned manifest with per-table audits.
Negative: Operators have one more AI script to validate when changing saliency materializer config fields.
Neutral / follow-ups: Use the batch manifest on refreshed tables, rerun the signal-mix audit, and measure predictor / MOS-head retrain impact.

References

ADR-0655 — shared saliency table materializer.
ADR-0672 — model and temporal-reducer metadata for saliency rows.
ADR-0661 — shared AI run provenance.
Research-0693 — implementation digest.
Source: req — "well go on i guess we have enough backlog..."
Source: req — "well and in this audit perhaps find gaps that we have no metric/signal for at all or so"