ADR-1072: Fix PREV_REF refcount leak in threaded batch and serial dispatch paths¶
- Status: Accepted
- Date: 2026-06-06
- Deciders: Lusoris
- Tags:
core,threading,memory,picture-pool,bug
Context¶
The VMAF_FEATURE_EXTRACTOR_PREV_REF protocol (introduced in PR #688) lets temporal extractors receive the previous frame via fex->prev_ref rather than managing their own internal state with VMAF_FEATURE_EXTRACTOR_TEMPORAL. feature_extractor.cpp::vmaf_feature_extractor_context_extract() implements a SWAP on success: it unrefs the old fex->prev_ref and calls vmaf_picture_ref to store the current frame with a bumped refcount.
Both dispatch paths that feed PREV_REF extractors copied vmaf->prev_ref into fex->prev_ref via a bare struct copy (no vmaf_picture_ref), ran the extractor, then cleared fex->prev_ref with a bare memset — without first calling vmaf_picture_unref. After the SWAP, fex->prev_ref holds the current frame with a bumped refcount; the bare memset discards that counted reference without triggering the pool-release callback.
The net effect: every frame processed by a PREV_REF extractor leaks one picture-pool slot. After ~pool_size frames the pool is exhausted and vmaf_picture_pool_fetch() blocks in pthread_cond_wait indefinitely.
The paths affected:
-
threaded_extract_batch_func(master active path for n_threads > 0): struct copy at line 1683 + bare memset at line 1690. Additionally, the SWAP's unref drops the old-frame VmafRef to 0 and frees it; the subsequentvmaf_picture_unref(&f->prev_ref)in theunref:block is then a use-after-free. -
threaded_extract_func(dead code, kept for consistency): same struct copy + bare memset pattern. -
read_pictures_dispatch_one(serial path, n_threads=0): was already fixed in PR #741 via ADR-0778 (usesvmaf_picture_ref, not struct copy).
Three CI test failures share this root cause: test_picture_pool_basic deadlocks because integer_motion is PREV_REF and is registered via vmaf_use_features_from_model("vmaf_v0.6.1") with n_threads=4.
Two additional CI failures were separate test-design bugs:
-
test_float_ms_ssim_cpu_cuda_parityandtest_hip_ms_ssim_parityused a 256x144 fixture;float_ms_ssimrequiresmin(w,h) >= 176for its 5-level 11-tap Gaussian pyramid (seefloat_ms_ssim.c:131) and returned-EINVALon the firstvmaf_read_picturescall. -
test_hip_motion_parityqueriedVMAF_integer_feature_motion_scorewhichinteger_motiononly writes whendebug=true; without that option the extractor emits onlyVMAF_integer_feature_motion_sad_score.
Decision¶
In threaded_extract_batch_func: after vmaf_feature_extractor_context_extract returns, if the extractor carries VMAF_FEATURE_EXTRACTOR_PREV_REF, call vmaf_picture_unref(&td->fex_ctx[i]->fex->prev_ref) before the memset to balance the SWAP's bump (on success) or the struct-copy's share (on error). Then zero f->prev_ref to prevent the unref: block from double-freeing the now-consumed VmafRef.
Apply the same one-unref-before-memset pattern to threaded_extract_func for consistency (dead code path, but must not carry a latent bug if re-activated).
Fix the test-design bugs:
- Raise fixture height from 144 to 192 in both ms_ssim parity tests.
- Pass
debug=1viavmaf_feature_dictionary_settovmaf_use_featureinrun_cpu_motionsomotion_scoreis emitted on the CPU path.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
Use vmaf_picture_ref instead of struct copy at dispatch (match serial path) | Eliminates shared-pointer aliasing; simpler ownership | Adds one ref/unref per frame per PREV_REF extractor; requires zeroing fex->prev_ref after extract separately | The fix is correct with the minimal unref-before-memset change; converting to full ref would be a larger refactor with no observable correctness advantage |
| Disable PREV_REF flag on integer_motion in the batch path | Avoids the code path entirely | Breaks temporal correctness of motion scores when threaded | Not an option; motion requires the previous frame |
| Disable the 4 failing sub-tests | Avoids touching libvmaf.c | Hides a real memory-safety bug; pool exhaustion will reappear in other tests | The root cause is a one-line fix per path |
| Fix only the batch path, leave threaded_extract_func with the bug | Minimal change | Dead code with a latent bug is a maintenance hazard | Cheap to fix both while the code is open |
Consequences¶
- Positive: picture-pool exhaustion from PREV_REF extractors in threaded mode is eliminated. Tests that previously deadlocked or returned EINVAL now pass. The serial path (ADR-0778) and the batch path are now consistent.
- Negative: none.
- Neutral / follow-ups:
threaded_extract_funcandthreaded_read_picturesremain dead code; a follow-up cleanup PR can remove them.
References¶
- ADR-0778: serial-path PREV_REF fix (PR #741)
- ADR-0795: batch-threading thread-safety analysis
- PR #688: integer_motion TEMPORAL → PREV_REF migration
- PR #764: pattern for disabling sub-tests (reference for the alternative not taken)
float_ms_ssim.c:131: minimum-dimension check comment and rationale