TSan race audit: libvmaf threadpool + dispatcher paths (2026-05-30)
- Trigger: agent task on branch chore/mcp-tools-audit-20260529, "Run TSan (ThreadSanitizer) on the libvmaf threadpool + dispatcher paths to surface latent race conditions". ADR-0138/0139 elevate bit-exactness to a load-bearing invariant, and a race can silently produce wrong scores without crashing.
- Worktree: fix/tsan-race-audit, based on origin/master (master tip bbcaa8d127, "fix(common): init_blur_array — free partial allocations on failure (#297)").
- Build: meson setup build-tsan core -Denable_cuda=false -Denable_sycl=false -Db_sanitize=thread. Verified TSan instrumentation via nm | grep __tsan (14 symbols) and ldd | grep libtsan (libtsan.so.2).
Reproducer
TSAN_OPTIONS="halt_on_error=0 second_deadlock_stack=1 history_size=7" \
./build-tsan/tools/vmaf \
-r python/test/resource/yuv/checkerboard_1920_1080_10_3_0_0.yuv \
-d python/test/resource/yuv/checkerboard_1920_1080_10_3_1_0.yuv \
-w 1920 -h 1080 -p 420 -b 8 --threads 16 \
--feature psnr --feature float_ssim \
--feature float_ms_ssim --feature ciede \
-o /tmp/vmaf-bigstress.json --json
Smaller invocations (--threads 4 with a single feature, the fast test suite, test_framesync, test_thread_pool) did not fire the race; the stressor needs both float_ssim and float_ms_ssim enabled plus enough workers (>=4) to overlap the two extractors' init() callbacks.
Findings
Pre-fix: 10 TSan warnings, all on the same four globals defined in core/src/feature/iqa/ssim_tools.c:
| Global | Site (write thread A) | Site (write thread B) |
|---|---|---|
| g_ssim_precompute | iqa_ssim_set_dispatch from float_ssim::init | same, from float_ms_ssim::init |
| g_ssim_variance | same | same |
| g_ssim_accumulate | same | same |
| g_iqa_convolve | iqa_convolve_set_dispatch from float_ssim::init | same, from float_ms_ssim::init |
Root cause
vmaf_feature_extractor_context_extract (core/src/feature/feature_extractor.c:580) lazy-initialises each fex_ctx on first extract:
if (!fex_ctx->is_initialized) {
    int err = vmaf_feature_extractor_context_init(fex_ctx, ...);
    if (err) return err;
}
The fex_ctx_pool gives each worker a distinct fex_ctx, so the is_initialized race itself is benign within one extractor. The dispatch globals, however, are process-wide: when worker A runs float_ssim::init and worker B runs float_ms_ssim::init concurrently, both call iqa_ssim_set_dispatch(...) and iqa_convolve_set_dispatch(...), and each pair of calls race-writes the same 8-byte pointer slots.
The races are value-benign on current x86-64 hardware (8-byte aligned, all writers install the same ISA-best pointer) but TSan correctly flags them, and:
- On weakly-ordered hardware (ARM, POWER) a non-aligned racing store could publish a torn pointer.
- ADR-0138 / ADR-0139 treat any dispatch-pointer race as load-bearing.
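The write/write pattern behind these reports can be reduced to a few lines. The sketch below is a deliberately racy stand-in, not the libvmaf code: racy_kernel_fn, g_racy_kernel, set_dispatch_racy, racy_init_a, racy_init_b and run_racy_demo are hypothetical names, with the two init functions standing in for float_ssim::init and float_ms_ssim::init installing the same pointer. Built with -fsanitize=thread, a program that calls run_racy_demo() should be expected to draw a report of the same shape as the findings above. Without TSan it simply returns the right value, which is why this kind of race is easy to miss.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an ISA-specific kernel signature. */
typedef int (*racy_kernel_fn)(int);

static int racy_kernel_best(int x) { return 2 * x; }

/* Plain process-wide global: no atomic, no lock, no once-guard. */
static racy_kernel_fn g_racy_kernel;

/* Unsynchronised setter, analogous in shape to a *_set_dispatch call. */
static void set_dispatch_racy(racy_kernel_fn fn) { g_racy_kernel = fn; }

/* Two different "extractor init" callbacks that install the same pointer. */
static void *racy_init_a(void *arg)
{
    (void)arg;
    set_dispatch_racy(racy_kernel_best);
    return NULL;
}

static void *racy_init_b(void *arg)
{
    (void)arg;
    set_dispatch_racy(racy_kernel_best);
    return NULL;
}

/* Runs both inits concurrently, then calls through the installed pointer.
 * The two stores are unordered with respect to each other: that is the
 * data race. The read below is ordered after both joins. */
int run_racy_demo(void)
{
    pthread_t a, b;
    int rc_a = pthread_create(&a, NULL, racy_init_a, NULL);
    int rc_b = pthread_create(&b, NULL, racy_init_b, NULL);
    assert(rc_a == 0 && rc_b == 0);
    (void)rc_a;
    (void)rc_b;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return g_racy_kernel(21);
}
```

Both writers store the same value, so the result is stable in practice; the C11 memory model still classifies the two plain stores as a data race, which is what the sanitizer reports.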
Other globals scanned, no race surfaced
- g_flags / g_flags_mask (core/src/cpu.cpp): already std::atomic, initialised before workers start.
- prev_ref shared state in feature_extractor.c: already covered by ADR-0795 with documented assertions.
- vmaf_thread_pool_*: pthread_mutex_t and condvars correctly scope the worker queue; no races on the queue or refcounts.
- vmaf_fex_ctx_pool_*: pthread_mutex_t (pool->lock) plus atomic_load / atomic_fetch_add on entry refcounts; no races.
- vmaf_feature_collector: no concurrent writes flagged.
- DNN / ORT init paths: no concurrent writes flagged.
Fix
Single process-wide pthread_once_t owned by ssim_tools.c, shared between float_ssim.c and float_ms_ssim.c via the new iqa_ssim_install_dispatch_once(pthread_once_t *, void (*)(void)) helper declared in core/src/feature/iqa/ssim_simd.h.
- The helper's guard parameter is retained for API symmetry but ignored; the actual guard is the file-static g_ssim_dispatch_once.
- The installer-callback pointer is published via an atomic CAS (memory_order_release) so only the first publisher writes the slot and TSan sees a sequenced store.
- pthread_once runs the trampoline exactly once and emits a full memory barrier; all workers (any TU) read the four dispatch globals through that barrier.
See ADR-0871 for the full decision record, alternatives matrix, and consequences.
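The mechanism described in this section can be sketched as follows. This is a minimal illustration under assumptions, not the code in ssim_tools.c: every demo_* and g_demo_* name is hypothetical, a single function pointer stands in for the four dispatch globals, and the install counter exists only so the sketch can show that the installer ran once. The real helper is iqa_ssim_install_dispatch_once(pthread_once_t *, void (*)(void)), whose body is not reproduced in this digest.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*demo_kernel_fn)(int);
typedef void (*demo_installer_fn)(void);

static int demo_kernel_best(int x) { return x + 1; }

static demo_kernel_fn g_demo_kernel;      /* written only by the installer */
static atomic_int g_demo_install_calls;   /* demo-only: counts installer runs */

/* One process-wide guard plus an atomically published installer slot. */
static pthread_once_t g_demo_dispatch_once = PTHREAD_ONCE_INIT;
static _Atomic(demo_installer_fn) g_demo_installer;

/* Runs at most once per process, courtesy of pthread_once. */
static void demo_trampoline(void)
{
    demo_installer_fn fn =
        atomic_load_explicit(&g_demo_installer, memory_order_acquire);
    if (fn != NULL)
        fn();
}

/* guard is accepted for API symmetry and ignored, as in the text above. */
static void demo_install_dispatch_once(pthread_once_t *guard,
                                       demo_installer_fn installer)
{
    (void)guard;
    demo_installer_fn expected = NULL;
    /* Only the first caller's CAS succeeds; later callers leave the slot alone. */
    atomic_compare_exchange_strong_explicit(&g_demo_installer, &expected,
                                            installer, memory_order_release,
                                            memory_order_relaxed);
    pthread_once(&g_demo_dispatch_once, demo_trampoline);
}

/* Stand-in for an installer that picks the best kernel for the host ISA. */
static void demo_install_for_host_isa(void)
{
    atomic_fetch_add(&g_demo_install_calls, 1);
    g_demo_kernel = demo_kernel_best;
}

/* Stand-in for an extractor init() followed by a first dispatched call. */
static void *demo_worker(void *arg)
{
    (void)arg;
    demo_install_dispatch_once(NULL, demo_install_for_host_isa);
    return (void *)(intptr_t)g_demo_kernel(41);
}

/* Spawns n workers (1..64). Returns how many times the installer ran,
 * or -1 if a thread failed to start or saw a wrong kernel result. */
int run_dispatch_demo(int n)
{
    pthread_t tid[64];
    int started = 0;
    int ok = 1;

    assert(n >= 1 && n <= 64);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, demo_worker, NULL) != 0)
            break;
        started++;
    }
    if (started != n)
        ok = 0;
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        pthread_join(tid[i], &ret);
        if ((intptr_t)ret != 42)
            ok = 0;
    }
    return ok ? atomic_load(&g_demo_install_calls) : -1;
}
```

Calling run_dispatch_demo(16) and then run_dispatch_demo(4) in the same process leaves the install counter at 1 both times, because the once-guard is process-wide rather than per caller. Ignoring the caller-supplied guard is what allows two translation units to share the single guard without coordinating on its address.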
Verification
$ TSAN_OPTIONS=... ./build-tsan/tools/vmaf ... # post-fix
$ grep -cE 'WARNING: ThreadSanitizer' /tmp/tsan-bigstress3.log
0
$ meson test -C build-tsan
Ok: 63
Fail: 0
$ python3 -c "import json; d=json.load(open('/tmp/vmaf-fixed-src01.json')); print(d['pooled_metrics']['vmaf'])"
{'min': 71.174765, 'max': 87.180955, 'mean': 76.66783, 'harmonic_mean': 76.508906}
# bit-for-bit identical to the pre-fix run on the src01_hrc00/src01_hrc01 pair
Files touched
- core/src/feature/iqa/ssim_simd.h: declare iqa_ssim_install_dispatch_once, include <pthread.h>.
- core/src/feature/iqa/ssim_tools.c: file-static pthread_once_t, atomic publish of installer, trampoline.
- core/src/feature/float_ssim.c: extract installer to ssim_install_dispatch_for_host_isa, call once-helper.
- core/src/feature/float_ms_ssim.c: call once-helper instead of direct dispatch install.
- docs/adr/0871-ssim-dispatch-pthread-once.md: decision record.
- docs/adr/README.md: index row.
- docs/adr/_index_fragments/_order.txt: manifest entry.
- changelog.d/fixed/tsan-race-audit.md: release-please fragment.
- docs/research/tsan-race-audit-2026-05-30.md: this digest.
- docs/rebase-notes.md: rebase-impact entry.
- docs/state.md: closed-bug row.
Audit clean status
No further races, deadlocks, or heap-use-after-free in thread context were observed across:
- 49 / 49 fast suite tests
- 63 / 63 full suite (including dnn and slow)
- --threads 4 baseline scoring (src01)
- --threads 16 stress with 4 features on 1080p checkerboard
- --threads 16 stress with report_atomic_races=1
- vmaf_bench (no-config exit)
- test_framesync, test_thread_pool standalone