ADR-1020: acq_rel memory ordering on ref-count decrement + mutex-destroy-after-unlock + picture-pool unlock ordering¶
- Status: Accepted
- Date: 2026-06-04
- Deciders: Lusoris
- Tags:
correctness,threading,core
Context¶
A targeted concurrency audit (r5-memory-ordering) identified three HIGH-severity races in core threading primitives:
-
Ref-count decrement ordering —
vmaf_ref_fetch_decrementinref.candref.cppusedatomic_fetch_sub(defaulting tomemory_order_seq_cst), which is correct but needlessly strong. More critically, the pattern is documented as requiringmemory_order_acq_rel: the release side ensures all prior stores from the decrementing thread are visible to others before the count drops; the acquire side ensures the last decrementer (the one that observesold_cnt == 1) sees all writes made by every other reference holder before it runs the destructor path (e.g.picture.clines 230–233 dereferencingpriv). Without the explicitacq_rel, the C11 standard makes no additional guarantee beyond the implicit seq_cst; making the intent explicit guards against future relaxation (compiler optimisation flags, LTO, architecture-specific codegen) and matches the canonical reference-counting idiom in C11 §7.17 / C++20 [atomics.ref.ops]. -
Mutex-destroy-after-unlock race in feature_collector —
vmaf_feature_collector_destroy(feature_collector.cpp) acquires the lock, performs all teardown, unlocks, and then callspthread_mutex_destroy. A concurrent thread blocked onpthread_mutex_lockat any of the five public entry points (append, get_score, find, set_aggregate, get_aggregate) could acquire the mutex in the window between unlock and destroy, and then operate on a partially freed struct — undefined behaviour under POSIX (destroying a locked mutex is UB; so is locking a destroyed mutex). -
picture_pool fetch: unlock before slot read —
vmaf_picture_pool_fetch(picture_pool.c) pops an index from the free list under the lock, then unlocks before copyingpool->pictures[idx]into the caller's*pic. While no other thread can receive the sameidx(it is off the free list), another thread executingvmaf_picture_pool_closecould free thepool->picturesarray in that window, making the dereference a use-after-free.
Decision¶
-
Replace
atomic_fetch_sub(&ref->cnt, 1)withatomic_fetch_sub_explicit(&ref->cnt, 1, memory_order_acq_rel)in bothcore/src/ref.candcore/src/ref.cpp. Add the requiredusingdeclarations formemory_order_*constants to the C++ branch ofcore/src/ref.hso C++ translation units resolve them without astd::prefix, mirroring the bare C11 macros available in the C branch. -
Add a
bool destroyedfield toVmafFeatureCollector(feature_collector.h). Invmaf_feature_collector_destroy, setdestroyed = trueunder the lock immediately beforepthread_mutex_unlock, then callpthread_mutex_destroyafter the unlock. All five public entry points that acquirelocktestdestroyedimmediately after locking: if true, they release the lock and return-ENODEV(ornullptrfor pointer-returning functions). -
In
vmaf_picture_pool_fetch, copypool->pictures[idx]to a stack-localVmafPicture pic_snapshotwhile still holding the lock, then unlock, then assign*pic = pic_snapshot. This bounds the data race on the shared array.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Keep seq_cst for ref decrement | Already correct per spec; no header change | Unnecessarily strong; intent opaque to code readers; acq_rel is the canonical idiom | acq_rel expresses intent precisely and may generate more efficient code on relaxed-memory architectures |
| Wrap feature_collector destroy in a reference count of its own | Race-free without a flag | Adds a second atomic; callers must drop their reference; invasive API change | No external callers hold VmafFeatureCollector references; the destroyed flag is simpler and equally safe |
| Move picture_pool unlock to after all post-pop work | No copy needed | Extends critical section through malloc + release-callback init; blocks other waiters longer | Copying one struct under the lock is cheaper than holding the lock through allocation |
Consequences¶
- Positive: Eliminates three HIGH-severity data races. TSan builds will no longer flag these sites. Code now matches the documented C11/C++ canonical reference-counting pattern.
- Negative:
VmafFeatureCollectorgains oneboolfield (negligible size impact). Public entry points gain one branch each (cold path, no measurable throughput effect). - Neutral / follow-ups: A TSan build (
meson setup build-tsan -Db_sanitize=thread) can be used to verify the three sites are clean. No API surface changes.
References¶
- C11 §7.17 / C++20 [atomics.ref.ops] — canonical acq_rel reference-counting pattern.
- POSIX pthread_mutex_destroy(3) — "destroying a locked mutex results in undefined behaviour"; locking a destroyed mutex also UB.
- r5-memory-ordering audit finding (HIGH × 3), 2026-06-04.