Research-0091: Partial-integration audit - features the AI can't learn from yet
- Status: Draft
- Date: 2026-05-08
- Author: lusoris (with Claude)
- Tags: audit, metrics, feature-extractors, tiny-ai, integration, vmaf-tune
TL;DR
The fork ships 22 user-facing feature extractors (CPU registry in core/src/feature/feature_extractor.c), most of them with rich SIMD and GPU-backend coverage. End-to-end integration is not the bottleneck on the engine rungs (1-3). The bottleneck sits squarely on the learning rungs (4-6):
- Fully integrated, 8 / 8 rungs: 0 features.
- Engine-complete (rungs 1+2+3) but AI-blind (rungs 4-6): 20 features. Every metric VMAF actually scores is invisible to the fork's predictor / ensemble training because:
  - tools/vmaf-tune/src/vmaftune/__init__.py:CORPUS_ROW_KEYS (rung 4) captures only vmaf_score, with no per-feature columns at all. All trainers (train_fr_regressor_v2.py lines 76-85, train_fr_regressor_v2_ensemble.py lines 82-89, train_fr_regressor_v2_ensemble_loso.py line 75, train_fr_regressor_v3.py line 86) require an out-of-band "canonical-6" feature subset (adm2, vif_scale0..3, motion2) that the corpus tool does not emit; the v2 trainer's _row_to_features (lines 282-301) explicitly comments "Phase A's current schema does not emit per-frame features" and falls back to synthetic features in --synthetic mode.
  - tools/vmaf-tune/src/vmaftune/predictor.py:ShotFeatures (lines 53-86) accepts zero libvmaf metric outputs: only probe-encode bytes, saliency mean/var, signalstats luma stats, and structural metadata. Rung 6 fails for every metric without exception.
- Backend-incomplete on rung 2: 5 notable cases (CAMBI is Vulkan-only on GPUs; SSIM-fixed CPU is dead code; HIP coverage gaps on ADM / VIF / SSIM; SYCL has no CAMBI / SSIMULACRA2 has no NEON SIMD).
- Doc-incomplete on rung 7: 2 features missing dedicated pages (CIEDE2000 has no docs/metrics/ciede.md; ANSNR has no page of its own). docs/metrics/features.md is the per-feature reference and covers every shipped extractor inline; standalone pages exist only for CAMBI today.
- Engine-broken (rung 1): integer-fixed ssim defines vmaf_fex_ssim in core/src/feature/integer_ssim.c:280 but is never registered in feature_extractor_list[] (feature_extractor.c lines 145-225). The symbol is dead: the only CPU SSIM that actually runs is float_ssim. The docs/metrics/features.md table even lists ssim (fixed) as shipping with a Vulkan kernel and "no SIMD"; the row is misleading because --feature ssim cannot resolve at all.
- Engine-broken (rung 1, milder): integer-fixed PSNR-HVS has CUDA / SYCL / Vulkan kernels but no CPU file (integer_psnr_hvs.c doesn't exist; only integer_psnr_hvs_cuda.c, integer_psnr_hvs_sycl.cpp, psnr_hvs_vulkan.c); the CPU psnr_hvs extractor at core/src/feature/psnr.c (no; see inline note in features.md) covers it via a different path.
The single highest-ROI gap for the user's stated framing ("we can only learn from them with AI"): add per-feature columns to CORPUS_ROW_KEYS. Any trainer that wants to consume more than adm2 + vif_scale0..3 + motion2 is currently blocked at the data layer, not at the model layer.
End-to-end integration ladder used
Each row in the matrix below is scored against eight rungs:
1. CPU reference: .c file under core/src/feature/, registered in feature_extractor_list[].
2. All shipped backends covered: CUDA + SYCL + Vulkan + HIP, where each is meaningful.
3. SIMD coverage: AVX2 + AVX-512 + NEON for pixel-level features.
4. Corpus row schema: the feature's per-frame value lands in CORPUS_ROW_KEYS.
5. Trainer feature input: the feature is consumed as a column by ai/scripts/train_fr_regressor_v2*.py / _v3.py / ensemble.
6. Predictor input: the feature surfaces in tools/vmaf-tune/src/vmaftune/predictor.py:ShotFeatures.
7. User-facing surface: documented in docs/metrics/, exposed via --feature <name> on the CLI, available via the ffmpeg libvmaf filter.
8. Tests: Netflix-golden CPU + at least one cross-backend ULP (the /cross-backend-diff artefact).
Legend: ✅ full, ⚠️ partial, ❌ missing, – not applicable.
Coverage matrix
| Feature | Invocation | CPU | CUDA | SYCL | Vulkan | HIP | SIMD | Corpus | Trainer | Predictor | Docs | Tests | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VIF (fixed) | vif | ✅ | ✅ | ✅ | ✅ | ⚠️¹ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | 9 / 12 |
| VIF (float) | float_vif | ✅ | ✅ | ✅ | ✅ | ⚠️¹ | – | ❌ | ⚠️² | ❌ | ✅ | ✅ | 8 / 12 |
| Motion (fixed) | motion | ✅ | ✅ | ⚠️³ | ✅ | ⚠️¹ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | 8 / 12 |
| Motion v2 (fixed) | motion_v2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️² | ❌ | ✅ | ✅ | 9 / 12 |
| Motion (float) | float_motion | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️² | ❌ | ✅ | ✅ | 9 / 12 |
| ADM (fixed) | adm | ✅ | ✅ | ✅ | ✅ | ⚠️¹ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | 9 / 12 |
| ADM (float) | float_adm | ✅ | ✅ | ✅ | ✅ | ❌⁴ | ✅ | ❌ | ⚠️² | ❌ | ✅ | ✅ | 8 / 12 |
| CAMBI | cambi | ✅ | ❌ | ❌ | ✅ | ❌ | ⚠️⁵ | ❌ | ❌ | ❌ | ✅ | ✅ | 5 / 12 |
| CIEDE2000 | ciede | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ⚠️⁶ | ✅ | 8 / 12 |
| PSNR (fixed) | psnr | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | 8 / 12 |
| PSNR (float) | float_psnr | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | 8 / 12 |
| PSNR-HVS | psnr_hvs | ✅ | ✅ | ✅ | ✅ | ❌⁴ | ⚠️⁷ | ❌ | ❌ | ❌ | ✅ | ✅ | 7 / 12 |
| SSIM (fixed) | ssim | ❌⁸ | – | – | ✅ | – | – | ❌ | ❌ | ❌ | ⚠️⁸ | ❌ | 1 / 12 |
| SSIM (float) | float_ssim | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | 8 / 12 |
| MS-SSIM (float) | float_ms_ssim | ✅ | ✅ | ✅ | ✅ | ❌⁴ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | 7 / 12 |
| ANSNR (float) | float_ansnr | ✅ | ✅ | ✅ | ✅ | ✅ | – | ❌ | ❌ | ❌ | ⚠️⁶ | ❌ | 6 / 12 |
| SSIMULACRA2 | ssimulacra2 | ✅ | ✅ | ✅ | ✅ | ❌⁴ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | 7 / 12 |
| Float moment | float_moment | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️⁹ | ❌ | ❌ | ❌ | ✅ | ❌ | 6 / 12 |
| LPIPS (tiny-AI) | lpips | ✅ | –ᴬ | –ᴬ | –ᴬ | –ᴬ | – | ❌ | ❌ | ❌ | ✅ | ❌ | 4 / 12 |
| FastDVDnet pre | fastdvdnet_pre | ✅ | – | – | – | – | – | ❌ | ❌ | ❌ | ✅ | ✅ | 3 / 12 |
| Mobilesal | (internal) | ✅ | – | – | – | – | – | ❌ | ❌ | ❌ | ✅ | ✅ | 3 / 12 |
| TransNet V2 | transnet_v2 | ✅ | – | – | – | – | – | ❌ | ❌ | ❌ | ✅ | ✅ | 3 / 12 |
| Speed chroma | speed_chroma | ✅ | – | – | – | – | – | ❌ | ❌ | ❌ | ✅ | ✅ | 3 / 12 |
| Speed temporal | speed_temporal | ✅ | – | – | – | – | – | ❌ | ❌ | ❌ | ✅ | ✅ | 3 / 12 |
| Saliency-student | (python only) | ❌ᴮ | – | – | – | – | – | ❌ | ❌ | ⚠️ᴮ | ✅ | ⚠️ | 1 / 12 |
Footnotes
¹ HIP exposes ciede, motion, vif, adm via the C-side _hip.c files in core/src/feature/hip/ but several are still labelled "HIP nth-consumer scaffold" (T7-10a/b series, ADR-0273/0274) — kernels run under hipLaunchKernelGGL but several sibling agents (feat/hip-* worktrees) are completing the bit-exact-vs-CUDA gate. Treat ⚠️ as "compiles + registers + emits scores; cross-backend ULP not yet at the T6-8 GPU-parity bar (ADR-0214)". CI-side this is the same status tracked in docs/state.md.
² Trainer consumes motion2, vif_scale0..3, adm2 via CANONICAL_6 — the columns are sourced from a non-canonical sidecar JSON the corpus tool does not produce. So even the "trainer-fed" features are only consumed when an external pipeline hand-attaches per_frame_features to corpus rows. v3 raises the bar: train_fr_regressor_v3.py:143 requires the columns inside the corpus DataFrame, which the current corpus.py cannot satisfy.
³ Motion (fixed) on SYCL: integer_motion_sycl.cpp exists and is registered, but motion2 GPU-vs-CPU bit-exactness is one of the recurring /cross-backend-diff debt items (see ADR-0186 + docs/state.md SYCL section).
⁴ HIP coverage gaps: float_adm, psnr_hvs, float_ms_ssim, ssimulacra2 have no _hip.c file in core/src/feature/hip/. T7-10b consumer plan (ADR-0273) tracks the rollout; not all 8 consumers have landed.
⁵ CAMBI SIMD: cambi_avx2.c and cambi_avx512.c exist (core/src/feature/x86/) and cambi_neon.c (core/src/feature/arm64/) — but docs/metrics/features.md's table reports "—" for CAMBI SIMD. Either the doc is stale or the SIMD paths exist but are not runtime-dispatched. Needs verification.
⁶ CIEDE2000 and ANSNR have no dedicated docs/metrics/<name>.md page. They are documented inline in docs/metrics/features.md under the per-extractor table, which satisfies ADR-0100's per-surface minimum bar (invocation / output / range / formats / limitations) — but the "every shipped feature has a page" framing in CLAUDE.md §12 r10 is debatable; CAMBI is the only fork-added metric with its own file. ⚠️ rather than ❌ because the inline coverage exists.
⁷ PSNR-HVS NEON: psnr_hvs_neon.c exists; AVX-512 path does not exist (features.md correctly says "AVX2, NEON" only). ⚠️ because the AVX-512 gap is intentional per ADR-pending; it is not "incomplete" so much as "deliberately scalar on AVX-512".
⁸ SSIM (fixed) is dead. integer_ssim.c:280 defines VmafFeatureExtractor vmaf_fex_ssim but feature_extractor_list[] in core/src/feature/feature_extractor.c:145-225 does not include &vmaf_fex_ssim. The Vulkan kernel (ssim_vulkan.c) and the doc row in docs/metrics/features.md both imply the feature ships, but --feature ssim cannot resolve at the CLI because the symbol is unreachable from the registry. Either (a) register it, or (b) delete the file + fix docs. A grep for any other reference to vmaf_fex_ssim returns only the definition itself.
⁹ Float moment SIMD: moment_avx2.c and moment_neon.c ship (core/src/feature/x86/, core/src/feature/arm64/); AVX-512 is absent (matches features.md "AVX2, NEON" row).
ᴬ LPIPS dispatches through ORT's execution provider (CPU / CUDA / OpenVINO / ROCm). It does not own a libvmaf-side GPU kernel; that's by design (the GPU-ness is delegated to ONNX Runtime). Marked – not ❌.
ᴮ Saliency-student is not a libvmaf feature extractor at all. It is registered as a tiny-AI model (docs/ai/models/saliency_student_v1.md), exposed only through tools/vmaf-tune/src/vmaftune/saliency.py, and its outputs surface in ShotFeatures.saliency_mean / saliency_var — making it the only rung-6-positive integration on the matrix. Listed for completeness because the user named it; not a libvmaf shipped surface, so the "CPU / CUDA / SYCL …" rungs are not applicable.
Needs verification
- CAMBI SIMD: cambi_avx2.c etc. exist as files; runtime dispatch must be checked against cpu.c to confirm whether they're wired through (footnote ⁵). Marked ⚠️ pending.
- HIP "scaffold-vs-shipped" status per kernel: CI status under gh workflow is the source of truth; this audit treats the presence of a _hip.c + registry entry as ✅ but flags the T7-10b consumer-plan items as ⚠️ where the cross-backend ULP gate has not yet been declared green for that feature.
High-priority promotions (AI-relevant, rungs 4-6)
The user's central concern: features that score well on the engine rungs (1-3) but where the AI cannot learn from them because rung 4 (corpus row schema) is the bottleneck.
Top 5 promotions ranked by AI-stack ROI:
1. Add per-feature columns to CORPUS_ROW_KEYS
- What's missing: tools/vmaf-tune/src/vmaftune/__init__.py:26-53 has 26 keys, none of them per-feature. Per-frame VMAF feature values are produced by libvmaf on every score run and discarded.
- Suggested next PR: Add per_frame_features: dict[str, list[float]] (or a flattened feature_<name>_mean / _p10 / _p90 / _var quartet per feature) to the corpus row. Bump SCHEMA_VERSION from 2 to 3. Update corpus.py to capture from the libvmaf JSON output (the CLI already writes per-frame metrics via the --json flag).
- One-line ROI: Unblocks every existing trainer (v2, v2-ensemble, v2-ensemble-loso, v3) to consume real per-frame canonical-6 instead of synthetic placeholders. Also opens rung 5 for non-canonical features (psnr/ssim/cambi/ssimulacra2/ciede) via downstream PRs.
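The flattened quartet variant could look roughly like the sketch below. It assumes the libvmaf JSON report carries a top-level frames list whose entries each hold a metrics mapping; that shape should be checked against what the fork's CLI actually writes. The function name summarize_frame_metrics, the helper percentile, and the feature_<name>_<stat> column naming are hypothetical and do not exist in vmaf-tune today.

```python
import json
import statistics


def percentile(values, q):
    """Linear-interpolated percentile of a non-empty list, q in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize_frame_metrics(report):
    """Flatten per-frame metrics into feature_<name>_<stat> columns.

    ASSUMPTION: `report` looks like {"frames": [{"metrics": {...}}, ...]}.
    """
    series = {}
    for frame in report.get("frames", []):
        for name, value in frame.get("metrics", {}).items():
            series.setdefault(name, []).append(float(value))

    row = {}
    for name, values in sorted(series.items()):
        row[f"feature_{name}_mean"] = statistics.fmean(values)
        row[f"feature_{name}_p10"] = percentile(values, 10)
        row[f"feature_{name}_p90"] = percentile(values, 90)
        # Population variance: every frame of the clip is observed.
        row[f"feature_{name}_var"] = statistics.pvariance(values)
    return row


if __name__ == "__main__":
    # Made-up values, for illustration only.
    demo = {"frames": [
        {"frameNum": 0, "metrics": {"adm2": 0.90, "motion2": 0.0}},
        {"frameNum": 1, "metrics": {"adm2": 0.94, "motion2": 2.0}},
        {"frameNum": 2, "metrics": {"adm2": 0.98, "motion2": 4.0}},
    ]}
    print(json.dumps(summarize_frame_metrics(demo), indent=2))
```

Fixed-width summary columns keep the corpus row flat and easy to load into a DataFrame; the per_frame_features dict alternative preserves more information at the cost of a nested schema.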
2. Extend ShotFeatures to accept libvmaf metric digest
- What's missing: tools/vmaf-tune/src/vmaftune/predictor.py:53-86 accepts only probe-encode bytes + saliency + signalstats luma. Zero perceptual metrics from libvmaf reach the predictor MLP.
- Suggested next PR: Add three new fields per metric of interest: metric_<name>_mean / _var / _p10 (the predictor wants cheap-to-compute summary stats, not full per-frame). Start with cambi_mean, psnr_y_mean, ssimulacra2_mean, the three metrics most likely to improve VMAF prediction on out-of-distribution content (banding, severe-degradation, perceptual-jpeg-style).
- One-line ROI: Closes rung 6 for the highest-signal metrics; gives the predictor a learnable tail-quality prior beyond the canonical-6.
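One possible shape for the starter fields is sketched below. It is deliberately a standalone dataclass, since the real ShotFeatures layout is not reproduced in this digest; MetricDigestFeatures and to_model_inputs are hypothetical names, and only the three field names come from the suggestion above. The presence mask is an assumption about how a predictor might distinguish "metric not computed" from a genuine zero.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class MetricDigestFeatures:
    """Hypothetical optional add-on fields; None means 'metric not computed'."""
    cambi_mean: Optional[float] = None
    psnr_y_mean: Optional[float] = None
    ssimulacra2_mean: Optional[float] = None


def to_model_inputs(digest, missing_value=0.0):
    """Return (values, present_mask) in field-declaration order.

    The mask lets a model tell a real 0.0 apart from a missing metric.
    """
    values, mask = [], []
    for f in fields(digest):
        v = getattr(digest, f.name)
        values.append(missing_value if v is None else float(v))
        mask.append(0.0 if v is None else 1.0)
    return values, mask


if __name__ == "__main__":
    # Made-up values, for illustration only.
    d = MetricDigestFeatures(cambi_mean=1.5, ssimulacra2_mean=70.0)
    print(to_model_inputs(d))
```

Defaulting every field to None keeps existing call sites valid, so shots without a libvmaf pass can still be fed to the predictor.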
3. Promote canonical-6 to v4: add cambi + ssimulacra2
- What's missing: Trainers v2/v3 hard-code 6 features; CAMBI and SSIMULACRA2 are not consumed despite both being shipped extractors with high perceptual-quality signal that VMAF (the model) does not fuse natively.
- Suggested next PR: Define CANONICAL_8 as a v4 schema in a new ADR, add cambi + ssimulacra2 to the trainer's input columns, retrain fr_regressor_v3 against an extended-corpus run.
- One-line ROI: Closes rung 5 for two high-signal metrics that the fork ships fully but the AI ignores. Banding-sensitive content in particular loses the most signal today.
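A minimal sketch of what a CANONICAL_8 contract might look like follows. The six base names are taken from this audit's description of canonical-6; their order, the column names cambi and ssimulacra2, and the helpers missing_feature_columns and row_to_vector are assumptions for illustration, not the trainers' actual code.

```python
# ASSUMPTION: order and exact column names are illustrative only.
CANONICAL_6 = (
    "adm2",
    "vif_scale0", "vif_scale1", "vif_scale2", "vif_scale3",
    "motion2",
)
CANONICAL_8 = CANONICAL_6 + ("cambi", "ssimulacra2")


def missing_feature_columns(row, required=CANONICAL_8):
    """Names in `required` that are absent or None in a corpus row."""
    return [name for name in required if row.get(name) is None]


def row_to_vector(row, required=CANONICAL_8):
    """Build the trainer input vector, failing loudly on schema gaps."""
    missing = missing_feature_columns(row, required)
    if missing:
        raise KeyError(f"corpus row lacks feature columns: {missing}")
    return [float(row[name]) for name in required]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    row = {name: 0.5 for name in CANONICAL_8}
    print(row_to_vector(row))
```

Raising on missing columns, rather than silently substituting placeholders, would surface the data-layer gap described above at training time instead of hiding it.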
4. Wire saliency into the corpus row
- What's missing: saliency_student_v1 runs only inside vmaftune/saliency.py and feeds the predictor but never lands in the corpus. The trainer cannot use it for ground-truth conditioning.
- Suggested next PR: Add saliency_mean + saliency_var to CORPUS_ROW_KEYS; have corpus.py invoke the saliency-student ONNX on the centre frame at corpus-build time. Schema bump to v4.
- One-line ROI: The fork's only rung-6-shipped feature becomes rung-5-shipped too, closing the loop so the regressor can condition on the same saliency signal the predictor uses.
5. SSIM-fixed: register or delete
- What's missing: vmaf_fex_ssim is defined but unregistered (footnote ⁸). Either path is a small PR; the current state is "engine-broken" for the only feature the doc table claims ships but cannot resolve from the CLI.
- Suggested next PR: Decide via ADR. Either (a) register the symbol and add a test_ssim_fixed.c smoke test, or (b) delete integer_ssim.c + integer_ssim_cuda.c + the Vulkan / SYCL twins, and remove the row from features.md. (b) is the lower-effort path; the float SSIM covers the same use case.
- One-line ROI: Stops the doc-vs-code drift that future contributors will hit. Not directly an AI-stack item; included because it's a rung-1 break that surfaced during this audit.
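To keep this class of drift from recurring, a small audit helper along the following lines could flag extractor symbols that are defined but never registered. It is a sketch that assumes definitions take the form "VmafFeatureExtractor vmaf_fex_<name> = {" and that registry entries appear as "&vmaf_fex_<name>"; unregistered_extractors is a hypothetical helper, and entries behind preprocessor guards are treated as registered.

```python
import re

# ASSUMPTION: definitions look like `VmafFeatureExtractor vmaf_fex_x = {`.
# `extern` declarations end in `;` and are therefore not matched.
DEFINITION = re.compile(r"\bVmafFeatureExtractor\s+(vmaf_fex_\w+)\s*=")
# ASSUMPTION: registry entries look like `&vmaf_fex_x,`.
REGISTRATION = re.compile(r"&\s*(vmaf_fex_\w+)")


def unregistered_extractors(source_texts, registry_text):
    """Map each defined-but-unregistered symbol to its defining file.

    source_texts: mapping of file path -> C source text.
    registry_text: text of the file holding feature_extractor_list[].
    """
    registered = set(REGISTRATION.findall(registry_text))
    dead = {}
    for path, text in source_texts.items():
        for symbol in DEFINITION.findall(text):
            if symbol not in registered:
                dead[symbol] = path
    return dead


if __name__ == "__main__":
    # Tiny synthetic inputs, for illustration only.
    sources = {
        "integer_ssim.c": "VmafFeatureExtractor vmaf_fex_ssim = {\n};",
        "float_ssim.c": "VmafFeatureExtractor vmaf_fex_float_ssim = {\n};",
    }
    registry = (
        "extern VmafFeatureExtractor vmaf_fex_ssim;\n"
        "static VmafFeatureExtractor *feature_extractor_list[] = {\n"
        "    &vmaf_fex_float_ssim,\n"
        "    NULL\n"
        "};\n"
    )
    print(unregistered_extractors(sources, registry))
```

Working on in-memory text keeps the check trivially testable; a real CI hook would read the files under core/src/feature/ and fail when the returned mapping is non-empty.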
Backend-incomplete (rung 2)
For completeness — these are not the AI-blockers but they're the "the fork ships GPU backends X but feature F is missing from one of them" debt:
| Feature | Missing on | Tracking |
|---|---|---|
| CAMBI | CUDA, SYCL, HIP | ADR-0210 (Vulkan only) |
| float_adm | HIP | T7-10b consumer plan |
| psnr_hvs | HIP | T7-10b consumer plan |
| float_ms_ssim | HIP | T7-10b consumer plan |
| ssimulacra2 | HIP | T7-10b consumer plan |
| motion2 (SYCL) | ULP-gate pending | ADR-0186 |
| SSIM-fixed | every backend (dead symbol) | this audit |
The HIP gaps are tracked sibling-agent work (feat/hip-*-consumers worktrees); not a fresh action item. CAMBI on CUDA / SYCL is real debt — it's the only metric on the matrix with three GPU-backend gaps at once.
Doc-incomplete (rung 7)
docs/metrics/features.md provides per-extractor coverage inline that satisfies ADR-0100's minimum bar (invocation, output keys, output range, input formats, options, backends, limitations) for all shipped extractors. Standalone files exist only for CAMBI today.
Per ADR-0100's per-surface bar, inline coverage in features.md is acceptable; the rule does not require a separate .md per metric. This audit therefore marks rung 7 as ✅ where features.md is substantive and ⚠️ where the inline section is thin (CIEDE2000 and ANSNR get only a one-liner table row plus footnotes).
Tiny-AI stricter bar (ADR-0042 5-point): The five tiny-AI features below are required to ship the 5-point doc per ADR-0042 (model card, inputs, outputs, limitations, retraining recipe). All five have a docs/ai/models/<name>.md page; quality varies but none are missing a page.
| Tiny-AI feature | Page | 5-point bar status |
|---|---|---|
| lpips | docs/ai/models/lpips_sq.md | Verify; not part of this audit |
| fastdvdnet_pre | docs/ai/models/fastdvdnet_pre.md | Verify; not part of this audit |
| mobilesal (internal) | docs/ai/models/mobilesal.md | Verify; not part of this audit |
| transnet_v2 | docs/ai/models/transnet_v2.md | Verify; not part of this audit |
| saliency_student_v1 | docs/ai/models/saliency_student_v1.md | Verify; not part of this audit |
5-point compliance is out of scope for this digest — covered by the sibling Phase-A-promotion audit (af3bb1432deaf63ad) and the tiny-AI SOTA web-research strand.
Test-incomplete (rung 8)
- float_moment: no test_moment.c; only test_moment_simd.c covers the SIMD path. Netflix-golden coverage absent.
- float_ansnr: no dedicated test file under core/test/.
- lpips, mobilesal, transnet_v2, fastdvdnet_pre: test_mobilesal.c, test_transnet_v2.c, test_fastdvdnet_pre.c exist; test_lpips.c does not. LPIPS is engine-tested via the ORT smoke (core/test/dnn/) but lacks a Netflix-golden row.
- ssim (fixed): no test, dead symbol.
- Saliency-student: Python-side tests exist (tools/vmaf-tune/tests/); not a libvmaf C test.
Recommended sprint plan
Ranked by AI-stack ROI; each item is a single PR scope.
1. Schema v3: per-feature corpus columns (rung 4).
   - File: tools/vmaf-tune/src/vmaftune/__init__.py, tools/vmaf-tune/src/vmaftune/corpus.py, ai/scripts/train_fr_regressor_v2.py:_row_to_features.
   - Scope: ~150-300 LOC; bump SCHEMA_VERSION to 3; emit per-frame features array from the libvmaf JSON parser; update v2/v3 trainers to consume the schema directly without sidecar.
   - ADR required (schema change).
2. Predictor input expansion: ShotFeatures v2 (rung 6).
   - File: tools/vmaf-tune/src/vmaftune/predictor.py, tools/vmaf-tune/src/vmaftune/predictor_features.py.
   - Scope: ~200-400 LOC; add metric-digest fields, retrain analytical-fallback coefficients, update predictor smoke tests.
   - ADR required (predictor input contract change).
3. CANONICAL_8: add cambi + ssimulacra2 to v4 trainer (rung 5).
   - File: ai/scripts/train_fr_regressor_v4.py (new).
   - Depends on item 1.
   - ADR required (input vocabulary expansion).
4. Saliency in corpus (rung 5).
   - File: tools/vmaf-tune/src/vmaftune/corpus.py, tools/vmaf-tune/src/vmaftune/saliency.py.
   - Depends on item 1; piggy-backs on the schema bump.
5. SSIM-fixed dead-symbol cleanup (rung 1).
   - File: core/src/feature/feature_extractor.c (register) or delete the integer SSIM family. Quick ADR either way.
   - Touched-file cleanup per ADR-0141.
References
- ADR-0042: tiny-AI docs required per PR (5-point bar).
- ADR-0100: project-wide doc substance rule.
- ADR-0141: touched-file cleanup rule.
- ADR-0186: Vulkan image-import impl + cross-backend bit-exactness posture.
- ADR-0210: CAMBI Vulkan integration (the source of CAMBI's Vulkan-only GPU posture).
- ADR-0214: T6-8 GPU-parity gate.
- ADR-0237: quality-aware encode automation roadmap (Phase A → F).
- ADR-0273 / ADR-0274: HIP T7-10b consumer plan.
- ADR-0291 / ADR-0319: canonical-6 trainer feature contract.
- ADR-0297: sample-clip mode (origin of clip_mode in CORPUS_ROW_KEYS v2).
- Research-0044: quality-aware encode automation option-space.
- Research-0078: encoder-vocab v3 schema expansion (sibling).
- Research-0081: fr_regressor_v2 ensemble real-corpus methodology.
- Research-0084: ffmpeg-patch vmaf-tune integration survey.
- Phase-A-promotion audit (in flight, worktree agent-af3bb1432deaf63ad): different angle (scaffold markers vs integration completeness); cross-link.
- Phase-F design (sibling worktree agent-ad1f149047faaf0db): composition of existing surfaces; this audit's findings inform Phase F's "what's actually composable" inventory.
- Tiny-AI SOTA web research (sibling): external SOTA comparison.
- Predictor train pipeline (sibling): companion to item 2 above.
Appendix: source-of-truth file paths
- CPU registration: core/src/feature/feature_extractor.c:145 (the feature_extractor_list[] table).
- Corpus schema: tools/vmaf-tune/src/vmaftune/__init__.py:22-53 (SCHEMA_VERSION and CORPUS_ROW_KEYS).
- v2 trainer canonical-6: ai/scripts/train_fr_regressor_v2.py:78-85.
- v2 trainer "no per-frame" comment: ai/scripts/train_fr_regressor_v2.py:298-301.
- v3 trainer required-columns assertion: ai/scripts/train_fr_regressor_v3.py:143-148.
- Predictor ShotFeatures: tools/vmaf-tune/src/vmaftune/predictor.py:53-86.
- Dead vmaf_fex_ssim definition: core/src/feature/integer_ssim.c:280.