ADR-0908: Slow-test audit (2026-05-30): no >30 s tests found; install slow marker as a future gate
- Status: Accepted
- Date: 2026-05-30
- Deciders: lusoris, Claude (Opus 4.7)
- Tags: ci, testing, devx
Context
The user asked for an audit of slow tests (the ad-hoc gate was "anything over 30 seconds") across the three pytest packages (ai/tests/, mcp-server/vmaf-mcp/tests/, tools/vmaf-tune/tests/) and the meson test suite. The motivation: slow tests inflate every CI round-trip and every local pre-push gate. A pre-emptive audit catches drift before the suite is too painful to run on every push.
The empirical result is unambiguous — no test in any suite exceeds 30 seconds today. The slowest test in the entire tree clocks 13.40 s (tools/vmaf-tune/tests/test_bbb_e2e_v5_bug_cluster.py::test_ladder_against_bbb_container_yields_plausible_vmaf — a docker-exec live ladder run against the BBB corpus). The second slowest is 13.20 s (test_v14_a_nvenc_probe_succeeds_on_gpu_host, a live NVENC encoder probe; skipped on hosts without an NVIDIA driver). The libvmaf meson suite tops out at 5.00 s (test_framesync). The ai/tests/ suite tops out at 3.21 s (test_train_konvid_mos_head.py::test_smoke_run_is_deterministic).
Because the threshold question came up at all, the absence of a registered slow pytest marker is itself a gap: when a future test does breach 30 s, there is no documented opt-out mechanism for the fast-path CI gates.
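An audit like this could be repeated from a JUnit XML report (pytest writes one with `--junitxml`). The following is a minimal sketch, not the procedure actually used for this ADR: `slow_tests` and the `SAMPLE` report are hypothetical, the module name for the NVENC test is a placeholder, and the timings are the two slowest figures quoted above.

```python
import xml.etree.ElementTree as ET

THRESHOLD_S = 30.0  # the ad-hoc gate named in this ADR


def slow_tests(junit_xml: str, threshold_s: float = THRESHOLD_S) -> list[tuple[str, float]]:
    """Return (test id, seconds) pairs above the threshold, slowest first."""
    root = ET.fromstring(junit_xml)
    found = []
    for case in root.iter("testcase"):
        seconds = float(case.get("time", "0"))
        if seconds > threshold_s:
            test_id = f"{case.get('classname')}::{case.get('name')}"
            found.append((test_id, seconds))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Hypothetical report fragment; "tests.placeholder_module" is not a real path.
SAMPLE = """<testsuites><testsuite name="pytest">
  <testcase classname="tests.test_bbb_e2e_v5_bug_cluster"
            name="test_ladder_against_bbb_container_yields_plausible_vmaf"
            time="13.40"/>
  <testcase classname="tests.placeholder_module"
            name="test_v14_a_nvenc_probe_succeeds_on_gpu_host"
            time="13.20"/>
</testsuite></testsuites>"""

print(slow_tests(SAMPLE))  # → []
```

Lowering `threshold_s` (for example to 10 s) turns the same function into a watch list of tests that are approaching the gate.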
Decision
We will:
- Register a `slow` pytest marker in `tools/vmaf-tune/pyproject.toml`'s `[tool.pytest.ini_options]` so future >30 s tests have a documented opt-out (`pytest -m "not slow"`). Apply the same marker convention in the other two pytest packages (`ai/pyproject.toml`, `mcp-server/vmaf-mcp/pyproject.toml`) for consistency.
- Mark the two ~13 s tests with `@pytest.mark.slow` even though neither breaches 30 s. They are the only tests in the tree that plausibly grow past the threshold (one is a real docker-exec ladder encode; the other is a live NVENC probe gated by hardware). Marking them now means a `-m "not slow"` developer gate excludes them without further triage.
- Apply a low-risk speedup to the docker ladder e2e test:
  - Drop the encode `--duration` from 4 s -> 2 s (~50 % less encode + decode work; the test's assertion floor is `vmaf >= 50.0`, which 2 s of BBB content still clears comfortably).
  - Reduce the CRF sweep from 3 points (`23,28,33`) -> 2 points (`23,33`). Combined with 2 resolutions this still yields 4 samples, which is the assertion floor (`len(samples) >= 4`). The expected runtime drops from ~13 s -> ~7 s.
We will not modify the NVENC probe test — the ~12 s cost is GPU driver context-creation latency, not test logic, so the only way to speed it up is to mock the encoder probe (which would defeat the "live GPU smoke" purpose of the test).
We will not touch the meson tests. The slowest one (test_framesync at 5.00 s) is a synchronization-correctness probe that legitimately needs to drive multiple frames through the pipeline.
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Take no action (no tests >30 s today) | Zero diff | Leaves no marker for the next time a test breaches the threshold | Pre-emptive markers cost nothing and prevent the next audit from finding the same gap |
| Add slow marker + skip in default CI | Faster CI by ~13 s | Hides real coverage; the two marked tests catch real regressions (V5-2 garbage encode, V14-A NVENC init) | Mark-and-run is safer; the marker exists as opt-out, not opt-in |
| Aggressively speed the v5 docker test (1 res * 1 CRF) | Fastest | Loses the dedup assertion (len(samples) == len(set(keys))) and the multi-resolution coverage of V5-3 | 2 res * 2 CRF is the minimum that preserves both invariants |
| Mock the NVENC probe with a fake runner | Cuts 13 s -> <1 s | Test no longer probes a real driver; loses the V14-A regression-catching value | The whole point of test_v14_a_nvenc_probe_succeeds_on_gpu_host is to catch real GPU init failures |
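The arithmetic behind the 2 res * 2 CRF choice can be checked with a short sketch. The resolution labels are placeholders (this ADR states only that there are two), and `preserves_invariants` is a hypothetical helper that mirrors the sample floor, the dedup check, and the multi-resolution requirement rather than reproducing the test's real assertions.

```python
from itertools import product

# Placeholder labels; the ADR only says the test uses two resolutions.
RESOLUTIONS = ["res_a", "res_b"]


def sweep(resolutions, crfs):
    """Enumerate every (resolution, CRF) pair the ladder would encode."""
    return list(product(resolutions, crfs))


def preserves_invariants(samples, min_samples=4):
    """Check the sample floor, key uniqueness, and multi-resolution coverage."""
    enough = len(samples) >= min_samples
    unique = len(samples) == len(set(samples))
    multi_res = len({res for res, _ in samples}) > 1
    return enough and unique and multi_res


before = sweep(RESOLUTIONS, [23, 28, 33])   # original 3-point CRF sweep
after = sweep(RESOLUTIONS, [23, 33])        # reduced 2-point CRF sweep
minimal = sweep(RESOLUTIONS[:1], [23])      # rejected 1 res * 1 CRF option

print(len(before), len(after), len(minimal))  # → 6 4 1
print(preserves_invariants(after), preserves_invariants(minimal))  # → True False
```

Dropping either a resolution or a second CRF point takes the grid below four samples, which is why the reduced sweep stops at 2 * 2.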
Consequences
- Positive:
  - Future tests >30 s have a documented and pre-installed opt-out marker.
  - The docker e2e test runs ~45 % faster (~13 s -> ~7 s) without losing coverage.
  - The audit produces a baseline timing table that the next audit can compare against.
- Negative:
  - One more marker name developers must remember (`slow`).
- Neutral / follow-ups:
  - A research digest accompanies this ADR under `docs/research/slow-test-audit-2026-05-30.md`, capturing the full per-test timing table.
  - If a future test breaches 30 s, the author should mark it `slow` and justify in the same PR; reviewers verify against this ADR.
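A later audit could compare its timings against this baseline along the following lines. This is only a sketch: `BASELINE_S` holds the four figures quoted in the Context section, while `drift` and the `later` timings are invented for illustration.

```python
# Slowest tests recorded by this audit, in seconds.
BASELINE_S = {
    "test_ladder_against_bbb_container_yields_plausible_vmaf": 13.40,
    "test_v14_a_nvenc_probe_succeeds_on_gpu_host": 13.20,
    "test_framesync": 5.00,
    "test_smoke_run_is_deterministic": 3.21,
}


def drift(baseline: dict[str, float], current: dict[str, float]) -> list[tuple[str, float]]:
    """Return (test, seconds gained) for tests in both runs, worst slowdown first."""
    shared = baseline.keys() & current.keys()
    deltas = [(name, round(current[name] - baseline[name], 2)) for name in shared]
    # Sort by slowdown, then by name so that ties are ordered deterministically.
    return sorted(deltas, key=lambda item: (-item[1], item[0]))


# Invented timings for a hypothetical future audit.
later = {
    "test_ladder_against_bbb_container_yields_plausible_vmaf": 7.0,
    "test_framesync": 6.5,
}

for name, delta in drift(BASELINE_S, later):
    print(f"{name}: {delta:+.2f} s")
```

Tests that appear in only one of the two runs are ignored here; a real comparison would probably want to report them separately.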
References
- Source: `req` ("Audit slow tests (>30 sec) across pytest + meson test suites. Document + propose speedups.")
- Related: ADR-0108 (six deep-dive deliverables).
- Companion digest: `docs/research/slow-test-audit-2026-05-30.md`.