ADR-0404: Keep nightly.yml + fuzz.yml red until underlying bugs land¶
- Status: Accepted
- Date: 2026-05-09
- Deciders: lusoris, Claude
- Tags: ci, testing, fuzzing, security
Context¶
Research-0089 (PR #525) §5 flagged that nightly.yml and fuzz.yml have 0 successful runs in the last 50 runs each. Empirical triage against gh run list --workflow … --limit 50 --json conclusion,event on 2026-05-09 confirmed:
-
nightly.yml— 23 consecutiveschedulefailures onmaster(oldest visible run 2026-04-16, latest 2026-05-08). Theclang-tidy-fullandnetflix-benchmarkjobs succeed; only theThreadSanitizerjob fails. TSan reports a real data race indiv_lookup_generatoratcore/src/feature/integer_adm.h:32-38— a staticdiv_lookup[65537]table populated concurrently by multiple worker threads spawned fromvmaf_thread_pool_createincore/src/thread_pool.c:169. Failing tests:test_model,test_framesync,test_pic_preallocation(3 of 50). -
fuzz.yml— 4 consecutiveschedulefailures onmaster(2026-05-05 through 2026-05-08). Thefuzz_yuv_inputandfuzz_cli_parsematrix legs succeed; onlyfuzz_y4m_inputfails. Crash:AddressSanitizer SEGVinfreadatcore/tools/y4m_input.c:877from aYUV4MPEG2 W-8 H4 F30:1 Ip A1:1 C422reproducer (negative width parses but leaves_y4m->dst_buf == NULLwhiledst_buf_read_sz > 0, NULL-deref on thefreaddestination). Distinct from the Y4M-411-OOB crash that PR #357 / commit05ba29a6already fixed iny4m_input.c:507.
Both gates are doing exactly what they were scaffolded to do (ADR-0015 for nightly TSan, ADR-0270 + ADR-0311 for libFuzzer): surfacing real bugs. The triage decision is what to do with the red signal until the underlying fixes land.
Decision¶
We will keep both workflows running, unmodified. No continue-on-error, no matrix-leg deletion, no workflow_dispatch-only gating. The bugs each gate has surfaced are documented in docs/state.md with explicit reopen triggers; once the fix PRs land, the next nightly run will return green automatically.
This is the application of memory feedback_no_test_weakening to a CI gate that has been red for 23+ days: the gate is correctly catching real defects, and the cost of a CI red badge is lower than the cost of silencing a working detector. Both findings (the ADM init race and the negative-width Y4M SEGV) are out-of-scope for this triage PR and will be fixed in dedicated follow-up PRs.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep gates running, document the open bugs (chosen) | Honours feedback_no_test_weakening; the gate keeps watching for regressions on the other harnesses while the known bug is in flight; reopen is automatic | CI red badge persists until fixes land | Aligned with the user's standing rule that a working detector is never silenced to make CI green |
Add continue-on-error: true to the failing jobs / matrix legs | CI badge turns green | Silences a working detector; later regressions in the same code path get masked under a "yellow" marker that humans habituate to ignoring | Direct violation of feedback_no_test_weakening; the rule explicitly covers "skipping/disabling failing tests" |
| Disable both workflows entirely | Stops the noise | Loses ~24 h of TSan / fuzz coverage on the passing paths every day; the ADM race is a known data race the user explicitly wants surfaced | Throws out the working signal with the failing signal |
| Fix both bugs in this PR | Returns CI to green | Two unrelated fixes in a triage PR; the ADM race needs a pthread_once / call_once rework that touches a load-bearing init path (NASA/JPL Power-of-10 review territory); the y4m fix needs a header-validation hardening sweep that should be co-designed with fuzz_y4m_input.c filter rules | Scope creep; a triage PR ships triage, not unrelated bug fixes |
Consequences¶
- Positive: Both gates continue to surface regressions on the passing harnesses (TSan covers
test_pic_preallocation/test_framesync/test_modelplus 47 other tests; fuzz coversfuzz_yuv_input+fuzz_cli_parse+ the corpus-only path offuzz_y4m_input). The CI red badge becomes load-bearing — reviewers see exactly two known-open bugs and any new failure is immediately distinguishable. - Negative: Until the two fixes land, the badge stays red and any new fuzz / TSan finding is queued behind the existing two rather than being immediately distinguishable. Mitigated by the state.md rows pinning the current failing tests + harness so a third row (a new finding) is obviously additive.
- Neutral / follow-ups:
- Fix PR for the ADM init race — recommend wrapping
div_lookup_generator()inpthread_onceor moving it to a one-shot init atvmaf_initboundary before any worker thread spawns. - Fix PR for the negative-width Y4M SEGV — header parser needs to reject
W <= 0/H <= 0before any allocation; the fuzz reproducer (artifactcrash-645a8f241b71d80ff496c10984d9b493d03dbfe1, auto-uploaded by run 25538384046) should be promoted tocore/test/fuzz/y4m_input_known_crashes/until the fix lands. - Both rows added to
docs/state.mdOpen section in the same PR as this ADR; reopen triggers cite the fix-PR commit SHAs (filled in when the fixes land).
References¶
- Research-0089 (PR #525) §5 — flagged the 0/50 success rate.
- ADR-0015 — nightly TSan gate scaffold.
- ADR-0270 — libFuzzer scaffold.
- ADR-0311 — fuzz harness expansion.
- Memory
feedback_no_test_weakening— never weaken a test to make it pass. gh run list --workflow nightly.yml --limit 50(2026-05-09): 23 visiblescheduleruns, allfailure.gh run list --workflow fuzz.yml --limit 50(2026-05-09): 4 visiblescheduleruns, allfailure.- Source:
req— user task body 2026-05-09 ("Investigation-then-fix task: nightly.yml and fuzz.yml workflows have 0 successful runs … document the deferral with explicit reopen trigger + state.md row").