ADR-0562: VCQ-223 LocalExplainer hang fix (cap neighbor_samples in test runner)
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude (Anthropic)
- Tags: python, test, local-explainer, performance, bugfix, fork-local
Context
ADR-0551 (PR #1325) identified the root cause of the VCQ-223 CI hang: VmafQualityRunnerWithLocalExplainer._run_on_asset constructed LocalExplainer() with the default neighbor_samples=5000. That produces roughly 480 000 libsvm svm_predict_values calls per typical run, for a wall time of 4–8 min, which exceeds the CI timeout. This PR implements the fix proposed in ADR-0551.
The test QualityRunnerTest::test_run_vmaf_runner_local_explainer_with_bootstrap_model has been skipped with @unittest.skip("[VCQ-223]") since commit e3827e4dd.
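For a rough sense of scale, the arithmetic below projects the call count at a smaller sample size. It assumes the number of svm_predict_values calls grows linearly with neighbor_samples; the ADR reports only the ~480 000 figure at 5000 samples, so the projected value is an estimate, not a measurement. The helper name is hypothetical.

```python
# Figures quoted in this ADR.
DEFAULT_NEIGHBOR_SAMPLES = 5000
CALLS_AT_DEFAULT = 480_000  # approximate svm_predict_values calls per run


def estimated_predict_calls(neighbor_samples: int) -> int:
    """Estimate the call count, ASSUMING cost is linear in neighbor_samples."""
    calls_per_sample = CALLS_AT_DEFAULT / DEFAULT_NEIGHBOR_SAMPLES
    return round(calls_per_sample * neighbor_samples)


print(estimated_predict_calls(5000))  # baseline from the ADR
print(estimated_predict_calls(100))   # projected under the linear assumption
```

Wall time need not scale by the same factor, since fixed costs such as feature extraction do not depend on the sample count.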
Decision
The fix targets the fallback path inside VmafQualityRunnerWithLocalExplainer._run_on_asset (option (b) from ADR-0551): the fallback now constructs LocalExplainer(neighbor_samples=100) — matching the passing sibling test test_explain_vmaf_results — rather than the 5000-sample default. Production callers that need higher fidelity can pass optional_dict={"explainer_neighbor_samples": 5000} or supply a full LocalExplainer via optional_dict2={"explainer": ...}.
The @unittest.skip decorator is removed. Score assertions are recalibrated to the neighbor_samples=100 values.
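The lookup order described above can be sketched as follows. This is a minimal illustration, not the runner's actual code: the LocalExplainer class here is a stand-in stub, resolve_explainer is a hypothetical helper, and the precedence of a caller-supplied explainer over the sample-count override is an assumption inferred from the word "fallback".

```python
from typing import Any, Optional

FALLBACK_NEIGHBOR_SAMPLES = 100  # capped fallback described in this ADR


class LocalExplainer:
    """Hypothetical stand-in for the real explainer; it only records its setting."""

    def __init__(self, neighbor_samples: int = 5000) -> None:
        self.neighbor_samples = neighbor_samples


def resolve_explainer(
    optional_dict: Optional[dict[str, Any]] = None,
    optional_dict2: Optional[dict[str, Any]] = None,
) -> LocalExplainer:
    """Hypothetical helper mirroring the lookup order described in the ADR."""
    optional_dict = optional_dict or {}
    optional_dict2 = optional_dict2 or {}

    # 1. A caller-supplied explainer is used as-is (assumed precedence).
    explainer = optional_dict2.get("explainer")
    if explainer is not None:
        return explainer

    # 2. Otherwise build the fallback with the capped, overridable sample count.
    neighbor_samples = optional_dict.get(
        "explainer_neighbor_samples", FALLBACK_NEIGHBOR_SAMPLES
    )
    return LocalExplainer(neighbor_samples=neighbor_samples)


print(resolve_explainer().neighbor_samples)
print(resolve_explainer({"explainer_neighbor_samples": 5000}).neighbor_samples)
```

With no options the sketch yields the 100-sample fallback, while a caller that needs the old fidelity opts in explicitly through explainer_neighbor_samples.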
Implementation notes
- python/vmaf/core/quality_runner_extra.py:38: fallback LocalExplainer() replaced with LocalExplainer(neighbor_samples=neighbor_samples), where neighbor_samples defaults to 100 and is overridable via optional_dict.get("explainer_neighbor_samples", 100).
- python/test/local_explainer_test.py:252: @unittest.skip removed; score assertions updated to neighbor_samples=100 calibration values:
  - results[0]["VMAF_LE_score"] = 75.40980306756497
  - results[1]["VMAF_LE_score"] = 99.95804823471536
- Wall time on dev machine (Python 3.14): approximately 78 seconds.
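The recalibrated assertions might take a shape like the sketch below. The two expected values are the ones quoted in this ADR; everything else is assumed for illustration: fake_results stands in for actually running the runner, the class name is hypothetical, and places=4 is an arbitrary tolerance not taken from the real test.

```python
import unittest

# Recalibrated values quoted in this ADR for neighbor_samples=100.
EXPECTED_SCORES = [75.40980306756497, 99.95804823471536]


def fake_results() -> list[dict[str, float]]:
    """Hypothetical stand-in for the runner output; the real test runs the runner."""
    return [{"VMAF_LE_score": score} for score in EXPECTED_SCORES]


class RecalibratedScoreSketch(unittest.TestCase):
    def test_scores_match_recalibrated_values(self) -> None:
        results = fake_results()
        self.assertEqual(len(results), len(EXPECTED_SCORES))
        for result, expected in zip(results, EXPECTED_SCORES):
            # places=4 is an assumed tolerance, not taken from the ADR.
            self.assertAlmostEqual(result["VMAF_LE_score"], expected, places=4)
```

Because the explainer draws neighbor samples, a comparison with a tolerance is generally more robust than exact float equality.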
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Lower LocalExplainer.DEFAULT_NEIGHBOR_SAMPLES globally to 100 | Single-point change | Breaks production callers relying on 5000-sample explanations | Rejected: the default is public API surface |
| Fix only at test call site via optional_dict2 | Minimal blast radius | Runner fallback stays broken for any future caller | Rejected: the runner-level fix is the correct layer |
| Raise the test timeout | No code change | Root cause remains; burns 4–8 min of CI time | Rejected: fix the root cause, not the symptom |
Consequences
- Positive: QualityRunnerTest::test_run_vmaf_runner_local_explainer_with_bootstrap_model runs and completes within the 120-second timeout. The LocalExplainer + VmafQualityRunnerWithLocalExplainer code path is exercised in CI.
- Negative: The neighbor_samples=100 values differ from what neighbor_samples=5000 would produce; score assertions changed accordingly.
- Neutral / follow-ups: Production callers relying on the implicit 5000-sample default should pass optional_dict={"explainer_neighbor_samples": 5000}.
References
- Diagnosis ADR: ADR-0551 (PR #1325)
- Skip introduced: commit e3827e4dd (2026-05-06)
- Related passing test: LocalExplainerTest::test_explain_vmaf_results (uses neighbor_samples=100)
- docs/state.md tracking item: T-VCQ-223-LOCAL-EXPLAINER-HANG