ADR-0673: Saliency Materializer Batch Manifest

  • Status: Accepted
  • Date: 2026-05-21
  • Deciders: Lusoris, Codex
  • Tags: ai, saliency, materializer, provenance, fork-local

Context

ADR-0655 introduced ai/scripts/materialize_saliency_features.py as the single table-shaped saliency enrichment path. ADR-0672 then added explicit model and temporal-reducer metadata so refreshed tables can compare saliency_student_v1, U2NetP, EMA, max, and motion-weighted saliency rows.

The remaining operational gap is execution shape. The current refresh backlog needs the same saliency materializer run over CHUG, KoNViD, UGC, Netflix, and BVI-derived tables before predictor and MOS-head retrains can measure the signal. Hand-written shell loops lose per-table config, overwrite policy, row-failure status, and provenance, and they are easy to forget after context compression.

Decision

Add ai/scripts/batch_materialize_saliency_features.py, a manifest-driven orchestrator over the existing single-table materializer. The manifest carries shared defaults plus a tables[] array with per-table id, input, output, optional audit_json, and any SaliencyMaterializeConfig override. Relative paths resolve from the manifest directory unless --base-dir is given. The runner writes each output table, optional per-table audit JSON, and one saliency-materializer-batch-v1 report with ADR-0661 run provenance.
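A minimal sketch of the manifest shape and the path-resolution rule described above. The ADR names only tables[], id, input, output, and audit_json; the JSON encoding, the "defaults" key, the "model" and "temporal_reducer" override names, and the resolve_tables() helper are all assumptions made for illustration, not the script's actual schema or API.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Hypothetical manifest. Only "tables", "id", "input", "output", and
# "audit_json" come from the ADR; every other key is an illustrative guess.
MANIFEST_TEXT = """
{
  "defaults": {"model": "saliency_student_v1", "temporal_reducer": "ema"},
  "tables": [
    {"id": "konvid", "input": "tables/konvid.parquet",
     "output": "out/konvid.parquet", "audit_json": "out/konvid.audit.json"},
    {"id": "chug", "input": "/data/chug.parquet",
     "output": "out/chug.parquet", "temporal_reducer": "max"}
  ]
}
"""

PATH_KEYS = ("input", "output", "audit_json")


def resolve_tables(
    manifest: dict[str, Any],
    manifest_path: str,
    base_dir: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Merge shared defaults into each table entry and resolve its paths.

    Relative paths resolve from the manifest's directory unless base_dir
    is given (mirroring the --base-dir behaviour described in the ADR).
    """
    root = Path(base_dir) if base_dir is not None else Path(manifest_path).parent
    defaults = manifest.get("defaults", {})
    resolved = []
    for entry in manifest["tables"]:
        # Per-table keys win over shared defaults.
        merged = {**defaults, **entry}
        for key in PATH_KEYS:
            value = merged.get(key)
            if value is not None:
                path = Path(value)
                merged[key] = path if path.is_absolute() else root / path
        resolved.append(merged)
    return resolved
```

Under these assumptions, resolve_tables(json.loads(MANIFEST_TEXT), "/repo/manifests/saliency.json") anchors the relative konvid paths under /repo/manifests, leaves the absolute chug input untouched, and lets the chug entry override the shared temporal reducer.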

The batch script must not own decoding or saliency semantics. It imports materialize_rows(), read_table(), and write_table() from the single-table script so row statuses and output columns stay identical.
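The delegation rule above can be sketched as a loop that only wires together the three shared functions. The real signatures of read_table(), materialize_rows(), and write_table() are not given in this ADR, so the sketch takes them as injected callables with assumed argument orders. The per-row "status" key, the "config" entry key, and the report field names are likewise hypothetical, and ADR-0661 run provenance is omitted.

```python
from collections import Counter
from typing import Any, Callable

Row = dict[str, Any]


def run_batch(
    tables: list[dict[str, Any]],
    read_table: Callable[[Any], list[Row]],
    materialize_rows: Callable[[list[Row], dict[str, Any]], list[Row]],
    write_table: Callable[[Any, list[Row]], None],
) -> dict[str, Any]:
    """Run the shared materializer over each table and collect one report.

    The loop owns no decoding or saliency logic: every row passes through
    the injected single-table functions, so statuses and output columns
    are whatever those functions produce.
    """
    report: dict[str, Any] = {
        "schema": "saliency-materializer-batch-v1",
        "tables": [],
    }
    for entry in tables:
        rows = read_table(entry["input"])
        # "config" is an assumed key holding the per-table overrides.
        enriched = materialize_rows(rows, entry.get("config", {}))
        write_table(entry["output"], enriched)
        # Assumes each output row carries a "status" value (hypothetical).
        statuses = Counter(row.get("status", "unknown") for row in enriched)
        report["tables"].append(
            {
                "id": entry["id"],
                "rows": len(enriched),
                "status_counts": dict(statuses),
            }
        )
    return report
```

Injecting the callables keeps the orchestration testable with in-memory stubs. In the actual script the same three names would be imported from the single-table module instead.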

Alternatives considered

| Option | Pros | Cons | Why not chosen |
| --- | --- | --- | --- |
| Manifest-driven batch wrapper over the shared materializer | Repeatable multi-table refreshes; one provenance report; no duplicate decode/model code | Adds one operator-facing CLI | Chosen: it closes the immediate saliency execution gap while preserving ADR-0655's shared row semantics |
| Keep using shell loops | No new Python surface | No stable config schema, no batch report, weak resume/audit story | Rejected: this is exactly how saliency table refreshes get lost or mixed across models |
| Add corpus-specific materializers | Each corpus can hard-code roots and columns | Duplicates row IO/status/model logic and violates the shared-materializer invariant | Rejected: the current table schema already represents the corpus differences |
| Fold batch mode into the single-table CLI | One script name | Makes the single-table parser carry two unrelated invocation shapes | Rejected: a thin orchestration script is easier to test and keep out of trainer hot loops |

Consequences

  • Positive: Saliency refreshes across CHUG, KoNViD, UGC, Netflix, and BVI can be launched from one versioned manifest with per-table audits.
  • Negative: Operators have one more AI script to validate when changing saliency materializer config fields.
  • Neutral / follow-ups: Use the batch manifest on refreshed tables, rerun the signal-mix audit, and measure predictor / MOS-head retrain impact.

References

  • ADR-0655 — shared saliency table materializer.
  • ADR-0672 — model and temporal-reducer metadata for saliency rows.
  • ADR-0661 — shared AI run provenance.
  • Research-0693 — implementation digest.
  • Source: req — "well go on i guess we have enough backlog..."
  • Source: req — "well and in this audit perhaps find gaps that we have no metric/signal for at all or so"