ADR-0991: Second-Opinion Batch Materializer — Smoke-Run Scaffold and Test Fix¶
- Status: Accepted
- Date: 2026-06-03
- Deciders: Lusoris
- Tags: ai, second-opinion, materializer, testing, smoke, fork-local
Context¶
ADR-0674 introduced ai/scripts/batch_materialize_second_opinion_features.py and its test suite. The implementation is correct; however, two gaps were identified when the PR (archived as #1497) was reviewed for its first real corpus run:
-
No committed smoke-run scaffold. There is no versioned manifest or fixture set that an operator can use to verify the script works end-to-end against a known input before pointing it at real corpus data. The analogous saliency batch runner (ADR-0673) also lacks such a scaffold, but the second-opinion path is more frequently referenced in signal-mix planning.
-
ai/pyproject.tomlmissingpythonpath = ["scripts"]in pytest configuration. The batch tests import the scripts viaimportlib.spec_from_file_location, which triggers_script_bootstrapresolution at module load time. Withoutai/scripts/onsys.path, all three second-opinion batch tests and all four saliency batch tests fail withModuleNotFoundError: No module named '_script_bootstrap'. This is a pre-existing bug affecting any invocation ofpytest ai/tests/from the repo root without an explicitPYTHONPATH=ai/scriptsprefix.
Decision¶
-
Add
pythonpath = ["scripts"]to[tool.pytest.ini_options]inai/pyproject.toml. This fixes the pre-existing failure for both the second-opinion batch tests and the saliency batch tests without changing any test logic. -
Commit a smoke-run scaffold under
ai/testdata/smoke-second-opinion-batch/: abatch.jsonmanifest, minimal synthetic fixture JSONL tables, and aREADME.mdwith the exact invocation and expected output. The manifest outputs to/tmp/so no generated artifacts are committed.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Add PYTHONPATH=ai/scripts to every CI step | No pyproject.toml change | Requires updating every CI job that runs ai tests; leaks an env var concern into infra rather than the package config | Rejected: the cleaner fix is the package declaring its own pytest paths |
| Fix only second-opinion tests, leave saliency broken | Smaller PR scope | Leaves a known CI failure in the saliency batch test suite | Rejected: same root cause, same one-line fix |
Place smoke manifest in ai/runs/ | Conventional location for run manifests | runs/ is gitignored at the root .gitignore level, so the manifest would not be committed | Rejected: ai/testdata/ is not gitignored and is the correct home for committed fixture data |
| No smoke scaffold, rely on unit tests alone | No new committed files | No single "run this to prove it works" command for operators new to the script | Rejected: operator confidence requires an end-to-end smoke path with known fixtures |
Consequences¶
- Positive:
pytest ai/tests/from the repo root passes for all batch materializer tests without a manualPYTHONPATHprefix. - Positive: Operators can verify the second-opinion batch path end-to-end with one command and two synthetic tables before pointing the script at real corpus data.
- Negative:
ai/testdata/grows by five small files (batch manifest, two fixture feature tables, two fixture score tables, README). - Neutral / follow-ups: Run the batch manifest on real scorer sidecars for CHUG, KoNViD, UGC, Netflix, and BVI tables, then rerun the signal-mix audit. Consider adding the same pytest path fix for any other ai/scripts batch runners added in the future (saliency batch runner currently uses the same fix path).
References¶
- ADR-0657 — table-side second-opinion joiner.
- ADR-0661 — AI run provenance.
- ADR-0673 — saliency batch runner (same pytest fix also applied).
- ADR-0674 — second-opinion batch manifest design.
- Research-0991 — implementation digest.
- Source: req — "Script merged (PR #1497 archived) but no artifacts under .workingdir2/ or runs/. If wired but not run, scaffold a smoke launch."