ADR-0544: deduplicate feature_extractor_list[] registrations¶
- Status: Accepted
- Date: 2026-05-18
- Deciders: lusoris, Claude (deep-audit follow-up)
- Tags:
bug,dispatch,vulkan,sycl,registry
Context¶
core/src/feature/feature_extractor.c exposes the runtime registry through a single static array, feature_extractor_list[]. Two iterator clients walk that array on the hot path:
vmaf_get_feature_extractor_by_name()— first-match, returns on hit.vmaf_get_feature_extractor_by_feature_name()— also first-match.
In addition, the ctx-pool's get_fex_list_entry() keys on fex->name (string equality), so the same extractor symbol registered twice still collapses to a single pool entry there. But once a caller resolves an extractor through the by-feature iterator and then walks the registry itself — which the model-loader path does when expanding vmaf_use_features_from_model() — every registered copy spawns its own ctx-pool entry, and the pool drives init/extract/flush once per entry per picture.
The deep audit on 2026-05-18 (.workingdir/bbb_reports/DEEP_AUDIT_2026_05_18.md, Finding 22) found that:
- The
HAVE_VULKANblock held 67 entries instead of 18.vmaf_fex_psnr_hvs_vulkanandvmaf_fex_float_ms_ssim_vulkanwere each registered 11 times; seven other Vulkan twins were registered 6 times each. - The
HAVE_SYCLblock held 17 entries instead of 11. Six SYCL twins were registered 2 times each (psnr_sycl,float_moment_sycl,ciede_sycl,float_ssim_sycl,float_ms_ssim_sycl,psnr_hvs_sycl). - The
HAVE_CUDA,HAVE_HIP,HAVE_METAL, and CPU blocks were clean.
The pattern of the duplicates (interleaved partial copies of the same list) strongly suggests an editor mishap during one of the Vulkan follow-up PRs that went undetected because the first-match get_by_name path masked it.
The bug is a plausible root cause for the v9 CHUG "VMAF=99 across all bitladders" anomaly that was previously attributed to operational misuse in PR #1270 — Vulkan-twin extractors firing 6-11x per pic would both inflate runtime and double-write the feature collector slots that gate the final score.
Decision¶
We will:
- Edit
feature_extractor_list[]so each&vmaf_fex_*symbol appears exactly once under its correct#ifdef HAVE_*guard. Total reduction: 61 duplicate entries removed (55 Vulkan + 6 SYCL). - Add
vmaf_feature_extractor_list_audit()— an O(N²) one-shot walker that compares both pointer identity andnamestrings, logs each offender atVMAF_LOG_LEVEL_ERROR, and returns-EINVALon any duplicate. - Call the audit from
vmaf_init()before any other context state is created, so a regression fails fast at init time instead of silently doubling per-pic work. - Add a C unit test (
test_feature_extractor_list_no_duplicatesincore/test/test_feature_extractor.c) that exercises the audit helper on the live registry. This wires the check into themeson testsuite that already gates pushes.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Just delete the duplicate rows (no audit, no test) | Smallest diff | Nothing prevents a future editor mishap from re-introducing the bug; the original misregistration sat in the tree undetected for multiple releases | Insufficient guardrail given the cost of recurrence |
| Replace the array with a hash-set built at startup | Strongest dedup guarantee at runtime | Requires non-trivial allocator wiring + new init/teardown lifecycle; loses the constant-init guarantee of the static array | Disproportionate to the bug; the audit + test combo is sufficient |
Use a generated static_assert instead of a runtime audit | Compile-time failure | C99 _Static_assert can't iterate a string-comparing predicate over a pointer array; would need a code-gen step | Adds build complexity for marginal gain over the runtime audit that already runs on every vmaf_init |
Audit at every get_by_name call instead of once at init | Catches duplicates that creep in via dlopen-like extensions (we don't have any) | O(N²) work on every call when the cost only needs to be paid once | Init-time audit is the right scope |
Consequences¶
- Positive: every iterator-driven dispatch path now allocates one ctx-pool entry per extractor instead of 2–11; per-pic init/extract/flush invocations drop by the same factor for the affected Vulkan / SYCL twins. The audit catches the bug class at init time on every future build.
- Negative:
vmaf_init()now performs an O(N²) walk (~3 600 comparisons at N≈60). Negligible — runs once perVmafContext. - Neutral / follow-ups: the cross-backend numeric-parity sweep (
/cross-backend-diff) is the right tool to confirm the Vulkan scores are now sane; the CHUG re-extract should be re-run against the deduped binary to verify the "VMAF=99" pattern is gone. If Vulkan scores were genuinely meaningless because of the duplicate init/flush sequence, the snapshot JSONs undertestdata/may need a/regen-snapshotspass — assess after the parity sweep.
References¶
- Finding 22,
.workingdir/bbb_reports/DEEP_AUDIT_2026_05_18.md(deep audit run; the raw count + symptom-to-root-cause mapping that triggered this fix). - Companion CLAUDE.md §12 r12 (touched-file lint-clean rule) — the edited block also drops the trailing
// ...C++-style comments in favour of/* ... */blocks for clang-format consistency. - Companion CLAUDE.md §12 r13 (state.md update) — see
docs/state.md"Recently closed" row referencing this ADR.