ADR-0793: Nightly Workflow Audit - TSan, Artifact Retention, Python Version
- Status: Accepted
- Date: 2026-05-29
- Deciders: lusoris
- Tags: ci, nightly, sanitizers, artifacts, fork-local
Context
A periodic audit of the five cron-scheduled workflows (nightly.yml, nightly-bisect.yml, sanitizers.yml, fuzz.yml, and the upstream-watcher family) identified three concrete defects and one structural redundancy:
- Duplicate TSan job: `nightly.yml` ran a full ThreadSanitizer build and test suite on a daily cron. After ADR-0710 (CI Slim-Down v2), `sanitizers.yml` already fires a TSan job on every push to `master`, which is both more timely and higher signal (it catches the specific commit that introduced a race, not a 24-hour-old snapshot). The nightly cron TSan consumed ~45 minutes of runner time every night for zero incremental coverage.
- Missing artifact retention: Both nightly artifacts (`clang-tidy-full-report` and `nightly-benchmark-results`) had no explicit `retention-days` setting, defaulting to GitHub's 90-day retention. Clang-tidy logs are diagnostic and lose value after a week; benchmark JSONs are comparable period-over-period but do not need to be kept for three months.
- Wrong Python version in `nightly-bisect.yml`: The step was named "Set up Python 3.12" but pinned `python-version: "3.14.5"`. Python 3.14 is a pre-release alpha series that does not yet have a stable release; the `setup-python` action would attempt to download a non-existent version and fail the job. The step name was the authoritative intent; the version string was a copy-paste error from an unreleased future spec.
- Stale `libvmaf` source-dir reference in `fuzz.yml` (already fixed in the current tree by PR fix/ci-paths-libvmaf-to-core-20260528): confirmed clean in the worktree; no change needed here.
Decision
Apply the following targeted fixes to nightly.yml and nightly-bisect.yml:
- Remove the `tsan` job from `nightly.yml`. All TSan coverage is now provided by the master-push `sanitizers.yml` job (ADR-0710). A header comment explains the delegation.
- Add `retention-days: 14` to the `clang-tidy-full-report` artifact. Fourteen days is sufficient for a developer to investigate a finding before the log is garbage-collected.
- Add `retention-days: 30` to the `nightly-benchmark-results` artifact. Thirty days provides one month of period-over-period comparisons without the 90-day default accumulation.
- Fix `python-version: "3.14.5"` → `"3.12"` in `nightly-bisect.yml`. This matches the step name and uses the latest stable series the `ai/` dependencies have been validated against.
No changes are made to the upstream-watcher workflows, scorecard.yml, security-scans.yml, or fuzz.yml — all of those are structurally sound.
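
The fixes above might look roughly like the following in the two workflow files. This is a sketch only: the step names, artifact paths, action major versions (`@v4`, `@v5`), and the wording of the header comment are assumptions for illustration and are not copied from the repository. Only the artifact names, the `retention-days` values, and the `python-version` value come from this ADR.

```yaml
# --- nightly.yml (hypothetical excerpt) ---
# Header comment (assumed wording): TSan is intentionally absent here.
# It runs on every push to master in sanitizers.yml (see ADR-0710).

- name: Upload clang-tidy report          # assumed step name
  uses: actions/upload-artifact@v4        # assumed action version
  with:
    name: clang-tidy-full-report
    path: build/clang-tidy-report.txt     # assumed path
    retention-days: 14                    # was unset (90-day default)

- name: Upload benchmark results          # assumed step name
  uses: actions/upload-artifact@v4
  with:
    name: nightly-benchmark-results
    path: build/benchmark/*.json          # assumed path
    retention-days: 30                    # was unset (90-day default)

# --- nightly-bisect.yml (hypothetical excerpt) ---
- name: Set up Python 3.12
  uses: actions/setup-python@v5           # assumed action version
  with:
    python-version: "3.12"                # was "3.14.5"
```

Quoting the version string keeps YAML from parsing it as a float, which matters for values such as `3.10`.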
Nightly workflow inventory post-fix
| Workflow | Schedule | Purpose | Still needed? |
|---|---|---|---|
| `nightly.yml` / `clang-tidy-full` | 03:17 UTC daily | Full-tree clang-tidy (too slow for PRs) | Yes |
| `nightly.yml` / `netflix-benchmark` | 03:17 UTC daily | CPU benchmark throughput baseline | Yes |
| `nightly-bisect.yml` | 04:37 UTC daily | Bisect-model-quality smoke + sticky-issue update | Yes |
| `sanitizers.yml` / `fuzz-nightly` | 04:30 UTC daily | libFuzzer × 3 harnesses × 60 s | Yes |
| `sanitizers.yml` / `tsan` | master push | ThreadSanitizer (replaces nightly cron TSan) | Yes |
| `scorecard.yml` | Mon 04:19 UTC weekly | OSSF Scorecard supply-chain health | Yes |
| `security-scans.yml` | Mon 06:00 UTC weekly | CodeQL + Semgrep + Gitleaks | Yes |
| `upstream-watcher.yml` | Mon 08:00 UTC weekly | FFmpeg av1_videotoolbox probe | Yes (until encoder lands) |
| `upstream-netflix-955-watcher.yml` | Sun 06:00 UTC weekly | Netflix#1494 merge probe | Yes (until merged) |
| `upstream-netflix-645-hdr-model-watcher.yml` | Sun 06:15 UTC weekly | HDR model file probe | Yes (until landed) |
| `upstream-ffmpeg-hip-hwdec-watcher.yml` | Sun 06:30 UTC weekly | FFmpeg HIP hwdec probe | Yes (until landed) |
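
For readers mapping the Schedule column back to workflow triggers, the times above would correspond to `on.schedule` cron expressions along these lines. GitHub Actions evaluates these expressions in UTC, with fields in the order minute, hour, day of month, month, day of week. The fragments below are a sketch derived from the table, not excerpts from the actual files; in particular, how `sanitizers.yml` restricts each job to its trigger is not shown and is left as an assumption.

```yaml
# nightly.yml: daily at 03:17 UTC
on:
  schedule:
    - cron: "17 3 * * *"
---
# scorecard.yml: Mondays at 04:19 UTC
on:
  schedule:
    - cron: "19 4 * * 1"
---
# upstream-netflix-955-watcher.yml: Sundays at 06:00 UTC
on:
  schedule:
    - cron: "0 6 * * 0"
---
# sanitizers.yml: TSan on push to master, fuzz daily at 04:30 UTC
on:
  push:
    branches: [master]
  schedule:
    - cron: "30 4 * * *"
```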
Resource cost (post-fix, estimated per night)
| Job | Runner | Est. duration | Runner-minutes/night |
|---|---|---|---|
| clang-tidy-full | ubuntu-24.04 | ~45 min | 45 |
| netflix-benchmark | ubuntu-24.04 | ~20 min | 20 |
| nightly-bisect | ubuntu-24.04 | ~5–10 min | 10 |
| fuzz × 3 | ubuntu-latest × 3 | ~5 min each | 15 |
| Total | | | ~90 |
The removed TSan job was ~45 min/night → net saving ~45 runner-minutes/night (~22 hours/month).
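
As a check on the figures above, the arithmetic can be written out as follows. The nightly total takes the upper bound of 10 minutes for `nightly-bisect`, and the monthly figure assumes a 30-night month; both are simplifying assumptions.

```latex
\begin{align*}
\text{nightly total} &= 45 + 20 + 10 + 15 = 90 \ \text{runner-minutes} \\
\text{monthly saving} &\approx 45 \ \tfrac{\text{min}}{\text{night}} \times 30 \ \text{nights}
  = 1350 \ \text{min} = 22.5 \ \text{h}
\end{align*}
```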
Triage status
- `clang-tidy-full-report` artifacts: reviewed on demand; findings surface in the fork's lint gate when the touched file is changed in a PR. The nightly job exists to catch latent warnings in files not touched recently.
- `nightly-benchmark-results`: informal throughput tracking. No automatic alert on regression; operator reviews monthly or when a performance PR lands.
- `bisect-report`: sticky comment on issue #40 updated every night; any `WIRING BROKE` verdict makes the job red and visible in the Actions tab.
- Fuzz crash artifacts: uploaded for 30 days on non-zero exit; failures make the job red; no automated triage issue yet (tracked as a follow-up in docs/state.md T-SANITIZER-DEFECTS-REVEALED-758).
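
The fuzz-crash behaviour described above (upload only on non-zero exit, keep for 30 days) is commonly expressed with an `if: failure()` condition on the upload step. The fragment below is a minimal sketch under that assumption; the step name, the `matrix.harness` key, the action version, and the path glob are hypothetical, with `crash-*` reflecting libFuzzer's default crash-file prefix.

```yaml
- name: Upload fuzz crash artifacts            # assumed step name
  if: failure()                                # run only when an earlier step failed
  uses: actions/upload-artifact@v4             # assumed action version
  with:
    name: fuzz-crashes-${{ matrix.harness }}   # hypothetical matrix key
    path: crash-*                              # assumed location of libFuzzer crash files
    retention-days: 30
```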
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep nightly TSan, remove master-push TSan | Always-fresh nightly snapshot | 24-hour lag between commit and detection; duplicates coverage | master-push TSan is strictly better |
| Keep both TSan jobs | Belt-and-suspenders | ~45 extra runner-minutes/night (~22 hours/month) for zero incremental signal | Redundant |
| Set fuzz retention to 7 days | Minimal storage | Crash artifacts may need re-download after a weekend; 30 days matches fuzz.yml precedent | 30 days is reasonable |
Consequences
- Positive: ~45 runner-minutes/night saved; artifact retention for nightly jobs reduced from 90 days to 14 or 30 days; `nightly-bisect.yml` Python version now matches a real stable release.
- Negative: None. TSan coverage is fully preserved via `sanitizers.yml`.
- Neutral: Benchmark triage workflow unchanged; no alert automation added (out of scope for this ADR).
References
- ADR-0710 (VMAFX CI Slim-Down v2 — introduced master-push TSan in sanitizers.yml)
- ADR-0109 (nightly-bisect scaffold)
- ADR-0270, ADR-0311 (libFuzzer nightly)
- ADR-0448 (upstream-watcher governance)
- `req` (paraphrased): user requested a nightly workflow audit covering necessity, brittleness, cost, triage habits, and artifact GC.