Research-0991: Second-Opinion Batch Materializer - Smoke-Run Scaffold
Problem
ai/scripts/batch_materialize_second_opinion_features.py was merged in PR #1497 (archived) but no run artifacts exist under .workingdir2/ or runs/. The task is to verify the script is wired correctly and scaffold a committed smoke-run path that operators can use before pointing the batch runner at real corpus data.
Two issues were found during verification:
Issue 1: pytest fails without PYTHONPATH=ai/scripts
Running pytest ai/tests/ from the repo root fails for the second-opinion batch tests and the saliency batch tests with:
Root cause: test_batch_materialize_second_opinion_features.py uses importlib.util.spec_from_file_location to load the script from its absolute path. When Python executes the script body, the statement from _script_bootstrap import bootstrap_ai_script fails, because _script_bootstrap.py lives in ai/scripts/ and that directory is not on sys.path when pytest is invoked from the repo root.
ai/pyproject.toml has no pythonpath entry for pytest, so the issue affects every CI invocation of pytest ai/tests/ unless an explicit PYTHONPATH=ai/scripts is prepended. The same bug affects the saliency batch tests.
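The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the project's test code: it builds a throwaway scripts directory in a temp folder, and the names demo_script.py, _demo_bootstrap.py, load_script, and demo are hypothetical stand-ins for the real script and its bootstrap helper.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_script(path: Path, name: str = "demo_script"):
    """Load a file as a module by path, without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script body, including its imports
    return module


def demo() -> tuple[bool, bool]:
    """Return (failed without the directory on sys.path, succeeded with it)."""
    with tempfile.TemporaryDirectory() as tmp:
        scripts = Path(tmp) / "scripts"
        scripts.mkdir()
        # Hypothetical stand-in for the sibling bootstrap helper.
        (scripts / "_demo_bootstrap.py").write_text(
            "def bootstrap():\n    return 'ok'\n"
        )
        # Hypothetical stand-in for the batch script: imports its sibling by bare name.
        script = scripts / "demo_script.py"
        script.write_text(
            "from _demo_bootstrap import bootstrap\nRESULT = bootstrap()\n"
        )

        # Loading by absolute path does not make sibling modules importable.
        try:
            load_script(script)
            failed_without_path = False
        except ModuleNotFoundError:
            failed_without_path = True

        # With the scripts directory on sys.path, the same load succeeds.
        sys.path.insert(0, str(scripts))
        try:
            module = load_script(script)
            ok_with_path = module.RESULT == "ok"
        finally:
            sys.path.remove(str(scripts))
            sys.modules.pop("_demo_bootstrap", None)
    return failed_without_path, ok_with_path
```

Calling demo() shows both halves of the behaviour: the load fails while the directory is absent from sys.path and succeeds once it is present, which is the same effect that prepending PYTHONPATH=ai/scripts has on the real tests.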
Fix: add pythonpath = ["scripts"] to [tool.pytest.ini_options] in ai/pyproject.toml. This is the standard pytest mechanism for adding non-installed directories to sys.path during test collection; it requires pytest ≥ 7.0, which the existing pytest>=9.0.3 floor satisfies.
Verification: pytest ai/tests/test_batch_materialize_second_opinion_features.py ai/tests/test_batch_materialize_saliency_features.py. All 7 tests pass after the fix; none pass before it.
Issue 2: No committed smoke scaffold
The runs/ directory is gitignored, so run manifests and outputs from exploratory invocations are not available in the repo. The analogous saliency batch runner also lacks a committed smoke path.
Fix: commit a minimal smoke scaffold under ai/testdata/smoke-second-opinion-batch/:
- batch.json: two-table manifest; relative paths anchor to the repo root (--base-dir .); outputs go to /tmp/vmafx-smoke-second-opinion/ so no generated artifacts are committed.
- fixtures/features_a.jsonl, fixtures/features_b.jsonl: 3-row and 2-row synthetic feature tables keyed on video_id.
- fixtures/scores_fork_nr_a.jsonl, fixtures/scores_fork_nr_b.jsonl: matching fork-nr score rows with scalar score and runtime_ms.
- README.md: exact invocation, expected stderr, and inspection checklist.
Smoke run verified end-to-end:
PYTHONPATH=ai/scripts python ai/scripts/batch_materialize_second_opinion_features.py \
--manifest ai/testdata/smoke-second-opinion-batch/batch.json \
--base-dir . \
--report-json /tmp/vmafx-smoke-second-opinion/smoke.report.json \
--report-md /tmp/vmafx-smoke-second-opinion/smoke.report.md
Stderr: tables=2 input_rows=5 output_rows=5 failed_tables=0
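An operator who wants to check the stderr summary programmatically could parse it along these lines. This is a sketch that assumes the summary is a single line of space-separated key=value tokens with integer values, as in the line above; parse_summary and EXPECTED are hypothetical names.

```python
def parse_summary(line: str) -> dict[str, int]:
    """Split 'key=value' tokens from the summary line into an int-valued dict."""
    summary = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that is not of the form key=value
            summary[key] = int(value)  # assumes every value is an integer
    return summary


# Counts reported by the smoke run described above.
EXPECTED = {"tables": 2, "input_rows": 5, "output_rows": 5, "failed_tables": 0}
```

Comparing parse_summary(line) with EXPECTED gives a one-line pass/fail check for the smoke run.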
Report JSON:
- schema: second-opinion-materializer-batch-v1
- status: ok
- summary.tables: 2, summary.output_rows: 5
- run_provenance.schema: ai-run-provenance-v1
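The report fields listed above can be checked with a small helper. The sketch below assumes that the dotted names (summary.tables, run_provenance.schema) denote nested JSON objects; the function names are hypothetical and the expected values are the ones from this smoke run.

```python
import json

# Field paths and values reported for the smoke run.
EXPECTED_FIELDS = {
    ("schema",): "second-opinion-materializer-batch-v1",
    ("status",): "ok",
    ("summary", "tables"): 2,
    ("summary", "output_rows"): 5,
    ("run_provenance", "schema"): "ai-run-provenance-v1",
}


def check_report(report: dict) -> list[str]:
    """Return a list of mismatch descriptions; an empty list means all fields match."""
    problems = []
    for path, expected in EXPECTED_FIELDS.items():
        node = report
        for key in path:
            # Walk one level down; a missing or non-dict level yields None.
            node = node.get(key) if isinstance(node, dict) else None
        if node != expected:
            problems.append(f"{'.'.join(path)}: expected {expected!r}, got {node!r}")
    return problems


def check_report_file(path: str) -> list[str]:
    """Load a report JSON file from disk and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_report(json.load(handle))
```

After a smoke run, check_report_file("/tmp/vmafx-smoke-second-opinion/smoke.report.json") should return an empty list if the report matches the values above.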
Output tables contain second_opinion_fork_nr_score, _status=ok, and _runtime_ms columns, confirming the full join pipeline.
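To illustrate the kind of join these columns imply, here is a minimal sketch of attaching score rows to feature rows by video_id. It is not the script's implementation: the function join_scores, the full status and runtime column names, and the "missing" status for unmatched rows are all assumptions made for illustration.

```python
def join_scores(features: list[dict], scores: list[dict], scorer: str = "fork_nr") -> list[dict]:
    """Left-join score rows onto feature rows by video_id (illustrative only)."""
    prefix = f"second_opinion_{scorer}"
    by_id = {row["video_id"]: row for row in scores}
    joined = []
    for feature_row in features:
        out = dict(feature_row)  # copy so the input rows are not mutated
        match = by_id.get(feature_row["video_id"])
        out[f"{prefix}_score"] = match["score"] if match else None
        # Assumed column names; the source abbreviates these as _status and _runtime_ms.
        out[f"{prefix}_runtime_ms"] = match["runtime_ms"] if match else None
        out[f"{prefix}_status"] = "ok" if match else "missing"  # "missing" is hypothetical
        joined.append(out)
    return joined
```

Every feature row is kept, so the output row count equals the input row count, which is consistent with the input_rows=5 and output_rows=5 figures reported by the smoke run.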
Decision Drivers
- Fix a latent, pre-existing CI failure before it causes confusion on a future PR that adds more batch script tests.
- Give operators a single PYTHONPATH=ai/scripts python ... command with known output to verify the pipeline before connecting real corpus data.
Follow-Up
Generate real scorer sidecars for CHUG, KoNViD, UGC, Netflix, and BVI tables, run the batch manifest, then rerun the signal-mix audit and retrain candidate models.