Research digests
Iteration-time research notes for the lusoris vmaf fork. Each digest captures what was investigated and why for a fork-local workstream — source links, alternatives weighed, prior art, dead ends.
These are not ADRs:
- An ADR records a decision and its alternatives at the moment it was made. The body is frozen once Accepted.
- A research digest records the learning behind that decision (and the iterations that followed). It can be amended as new evidence arrives, the same way a lab notebook is.
A typical workstream has one ADR (the decision) and one research digest (the supporting investigation). Some PRs reuse an existing digest by linking; that is fine.
When to write one
ADR-0108 requires a digest on every fork-local PR that makes a non-trivial design choice. PRs without a design choice (e.g., a one-line bug fix in fork-added code) state "no research digest needed: trivial" in the PR description and skip the file. Reuse over duplication: if the workstream already has a digest, link it from the new PR instead of starting a parallel one.
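The rule above can be checked mechanically. The following is a minimal sketch, not the fork's actual enforcement tooling: it assumes the PR description is available as plain text and that digest links contain a `research/` path component. Both are assumptions, and the function name `digest_requirement_met` is hypothetical.

```python
import re

# Opt-out phrase quoted in the rule above.
OPT_OUT = "no research digest needed: trivial"

# Assumption: a digest link contains "research/" followed by an
# NNNN-kebab-case-topic.md filename. Only the filename shape comes from
# this page; the directory component is illustrative.
DIGEST_LINK = re.compile(r"research/\d{4}-[a-z0-9]+(?:-[a-z0-9]+)*\.md")


def digest_requirement_met(pr_description: str) -> bool:
    """Return True if the description links a digest or states the opt-out."""
    text = pr_description.lower()
    return OPT_OUT in text or DIGEST_LINK.search(text) is not None
```

A check like this only confirms that a link or the opt-out phrase is present. Whether the design choice really is trivial remains a reviewer's call.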
Format
Each file is named `NNNN-kebab-case-topic.md` with a 4-digit zero-padded ID assigned in commit order. The structure mirrors `0000-template.md`:
# Research-NNNN: <short, descriptive title>
- **Status**: Active | Superseded by Research-MMMM | Archived
- **Workstream**: <ADR-NNNN, ADR-MMMM, ...>
- **Last updated**: YYYY-MM-DD
## Question — what was the unknown going in
## Sources — papers, upstream docs, Netflix issues, prior PRs
## Findings — what was learned, with citations
## Alternatives explored — what didn't work and why
## Open questions — what is still unknown
## Related — ADRs, PRs, issues
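A digest can be checked against the template mechanically. This is a minimal sketch under two assumptions: the digest is read as plain Markdown text, and each section heading begins with the keyword shown in the template. The function name `missing_parts` is hypothetical.

```python
import re

# Field and section names taken from the template above.
REQUIRED_FIELDS = ("Status", "Workstream", "Last updated")
REQUIRED_SECTIONS = (
    "Question",
    "Sources",
    "Findings",
    "Alternatives explored",
    "Open questions",
    "Related",
)


def missing_parts(digest_text: str) -> list[str]:
    """List the template parts that do not appear in the digest text."""
    lines = digest_text.splitlines()
    missing = []
    # Title line: "# Research-NNNN: <title>".
    if not any(re.match(r"# Research-\d{4}: \S", ln) for ln in lines):
        missing.append("title")
    # Metadata bullets: "- **Field**: value".
    for field in REQUIRED_FIELDS:
        if not any(ln.startswith(f"- **{field}**:") for ln in lines):
            missing.append(field)
    # Section headings: "## Section ...".
    for section in REQUIRED_SECTIONS:
        if not any(ln.startswith(f"## {section}") for ln in lines):
            missing.append(section)
    return missing
```

An empty list means every template part is present. The check looks only at structure, not at whether the sections have useful content.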
Conventions:
- IDs are assigned in commit order and never reused.
- Digests are amendable: update the `Last updated` date when you add findings. To replace one entirely, add `Status: Superseded by Research-MMMM` and write a new file.
- Cite sources inline with `[link text](URL)` so readers can verify.
- Keep one digest per workstream, not per PR. Cross-link from the PR description.
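The naming and ID rules can be illustrated with a short sketch. It assumes the caller supplies the list of filenames already in the digest directory. The helper names `is_valid_name` and `next_id` are hypothetical and not part of the fork.

```python
import re

# NNNN-kebab-case-topic.md: four digits, then lowercase words joined by hyphens.
NAME = re.compile(r"^(\d{4})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def is_valid_name(filename: str) -> bool:
    """Return True if the filename follows the NNNN-kebab-case-topic.md shape."""
    return NAME.match(filename) is not None


def next_id(existing: list[str]) -> str:
    """Return the next unused ID as a zero-padded 4-digit string.

    Uses max + 1 rather than the first free gap, so a skipped or retired
    number is never handed out again.
    """
    ids = [int(m.group(1)) for name in existing if (m := NAME.match(name))]
    return f"{max(ids, default=0) + 1:04d}"
```

For example, `next_id(["0000-template.md", "0001-a.md", "0003-c.md"])` yields `"0004"`, leaving the gap at 0002 untouched. Two PRs opened in parallel can still compute the same number, so a duplicate check at merge time is a useful companion.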
Index
| ID | Title | Status | Workstream |
|---|---|---|---|
| 0001 | Cache shape for bisect-model-quality nightly | Active | ADR-0109 |
| 0002 | Automating process-ADR enforcement (0100 / 0105 / 0106 / 0108) | Active | ADR-0124 |
| 0003 | SSIMULACRA 2 port source selection + upstream-drift strategy | Active | ADR-0126 |
| 0004 | Vulkan compute backend — loader, shader language, allocator, DMABUF import | Active | ADR-0127 |
| 0005 | Embedded MCP in libvmaf — threading, JSON library, SSE server, Power-of-10 fit | Active | ADR-0128 |
| 0006 | Tiny-AI PTQ int8 — accuracy targets, ORT API comparison, calibration sourcing | Active | ADR-0129 |
| 0007 | SSIMULACRA 2 scalar port — YUV handling, blur deviation, snapshot tooling | Active | ADR-0126, ADR-0130 |
| 0008 | MS-SSIM decimate SIMD — FLOP accounting, summation order, bit-exactness | Active | ADR-0125 |
| 0010 | Is Netflix about to ship a SpEED-driven VMAF successor? (informational) | Active | — |
| 0011 | _iqa_convolve AVX2 — bit-exactness via __m256d, kernel invariants, Amdahl | Active | ADR-0138 |
| 0012 | SSIM SIMD bit-exactness to scalar — where the ULP drifted | Active | ADR-0139 |
| 0013 | SIMD DX framework — audit + NEON bit-exactness port | Active | ADR-0140 |
| 0014 | psnr_hvs NEON sister port — half-wide split strategy, aarch64 gotchas, QEMU verification limits | Active | ADR-0160 |
| 0015 | SSIMULACRA 2 AVX2 + AVX-512 + NEON — per-lane cbrtf, left-to-right summation, 2×2 downsample deinterleave | Active | ADR-0161 |
| 0016 | SSIMULACRA 2 IIR blur SIMD — row-batching with gather (horizontal), column-SIMD (vertical), bit-exact to scalar | Active | ADR-0162 |
| 0017 | SSIMULACRA 2 picture_to_linear_rgb SIMD — per-lane scalar reads, SIMD matmul, per-lane scalar powf | Active | ADR-0163 |
| 0018 | SSIMULACRA 2 snapshot-JSON regression gate — why fork self-consistency beats libjxl/Pacidus cross-check at this scope | Active | ADR-0164 |
| 0031 | Intel AI-PC NPU + EP applicability to tiny-AI / dnn/ — verdict: defer NPU; iGPU already covered by OpenVINO EP | Active | — (backlog T7-9) |
| 0046 | vmaf_tiny_v3 (mlp_medium 6→32→16→1, 769 params) vs v2 (mlp_small 257 params): 4-corpus parquet, identical recipe; Netflix LOSO mean PLCC 0.9986 ± 0.0015 vs v2's 0.9978 ± 0.0021 (+0.0008 mean, -29 % std). Decision matrix + per-fold table; ship-alongside-v2 recommendation. | Active | ADR-0241 |
| 0048 | vmaf_tiny_v4 (mlp_large, 3 073 params) — does the architecture ladder saturate? Verdict: yes, +0.0001 mean PLCC vs v3 (below 1 std). Ladder stops at v4. | Active | ADR-0242 |
| 0053 | iqa_convolve block-of-N tap widen — failed-attempt post-mortem; per-tap widen is load-bearing for bit-exactness, block-of-4 reorder mismatches scalar on 27.67 % of pixels (10 M Monte Carlo) | Active | ADR-0138 |
| 0054 | precise decoration audit on vif.comp + ciede.comp — Step A of the Vulkan 1.4 bump path. ciede improves 19× (42/48 → 5/48 mismatches at NVIDIA driver 595.71); vif decorated correctly but the 1.4 regression is not in the tagged float ops. Step B stays blocked. | Active | ADR-0269, ADR-0264 |
| 0055 | Root-causes the residual 5/48 NVIDIA-Vulkan ciede2000 places=4 mismatch (1.78× threshold, max abs 8.9e-05) deferred from PR #346. Triangulates double-CPU vs experimental float-CPU vs NVIDIA-Vulkan: f32-CPU matches NVIDIA-GPU to ~6e-7 on the 5 failing high-ΔE frames. Conclusion: structural f32/f64 colour-space-chain precision gap, not a driver fast-math bug. Mitigations rejected; documented as fork debt. | Active | ADR-0273 |
| 0085 | Vendor-neutral VVC (H.266) GPU encode landscape — survey of NVENC (Ada+ silicon only), AMD AMF / Intel QSV (decode-only in 2026), VK_KHR_video_encode_h266 (unratified), HIP / SYCL ports of VVenC (3–6 eng-month effort), NN-VC tools via ONNXRuntime EPs (vendor-neutral today), ZLUDA (rejected). Cost / risk / value matrix + three-tier rollout recommendation feeding ADR-0315. | Active | ADR-0315 |
| 0086 | Usage-doc coverage audit against ADR-shipped surfaces — 255 ADRs scanned, 46 GOOD / 31 BACKFILL / 178 N/A; identifies 5 highest-leverage gaps (vmaf-tune codec adapters, --score-backend, --cache, Vulkan image import, HDR + sample-clip) for full backfill in this PR; remaining 26 land as ADR-cited stubs. | Active | ADR-0100, ADR-0167 |
| 0090 | Phase-A-promotion audit (2026-05-08) — repo-wide scan for surfaces still flagged "Phase A only / scaffold-only / Phase B pending" whose follow-up wiring hasn't shipped. 5 production-blocking promotions (HDR not actually wired into iter_rows; 15 of 17 codec adapters bypass ffmpeg_codec_args; vmaf-tune fast has no CLI subcommand; embedded MCP and HIP runtimes still -ENOSYS), 12 cosmetic doc-drift items, 9 ADRs ready for Proposed→Accepted. Recommended sprint plan + sibling-agent coordination notes. | Active | ADR-0237, ADR-0261, ADR-0276, ADR-0209 |
| 0091 | End-to-end integration audit of every shipped libvmaf feature extractor against an 8-rung ladder (CPU → backends → SIMD → corpus → trainer → predictor → docs → tests). 22 extractors inventoried; 0 score 8/8. Engine rungs (1-3) mostly green; learning rungs (4-6) red across the board because CORPUS_ROW_KEYS captures only vmaf_score and ShotFeatures accepts no libvmaf metric outputs. Surprise findings: vmaf_fex_ssim (integer SSIM) is defined but never registered — dead symbol since CPU registration list ships without it. Top-5 promotions ranked by AI-stack ROI. | Active | — |
| 0126 | vmaf-tune HDR dispatch coverage — widens the central hdr_codec_args() table for AV1 NVENC, HEVC/AV1 QSV, HEVC/AV1 AMF, HEVC VideoToolbox, and libaom while keeping private SEI flags limited to verified families. | Active | ADR-0300 |
| 0135 | CHUG/K150K extractor I/O cost breakdown and Win 1 + Win 2 optimisations — per-clip cost audit from perf-audit §6; replaces O(N²) parquet flush with at-end-only write via JSONL staging; adds ffprobe skip from CHUG sidecar geometry; decision matrix for in-memory vs streaming vs DuckDB; projected wall-time savings for 5992-clip CHUG run. | Active | — (perf-audit-pipeline-2026-05-16.md §6) |
| 0136 | HDR/UGC dataset license + access audit (2026-05-15) — evaluates 13 candidate corpora from Audit Slice C.7; 6 datasets classified ACTIONABLE-NOW (Beyond8Bits, HDRSDR-VQA, LIVE HDR Database, IPI-MobileHDRVQA, HDR-VDC, CHUG already active); 5 BLOCKED on access or license; 1 BLOCKED on infrastructure. HDRSDR-VQA's 6-display pairwise design surfaces the new panel/display-aware workstream scoped in ADR-0459. | Active | ADR-0459 |
| 0053 | Post-merge CPU profile 2026-05-03 — perf top-10 after PRs #310–#321; surfaces 3 new opt targets (convolve widen, SSIM double reduction, VIF gather elimination) | Active | — |
| 0081 | Real-corpus retrain methodology for the fr_regressor_v2 deep ensemble — corpus-size sufficiency (9 ref + 70 dis @ .workingdir2/netflix/), 9-fold LOSO sizing inherited from the deterministic ADR-0291 baseline, seed-diversity hyperparameters, and the Seeking_25fps weak-fold diagnostic for HOLD-on-spread cases. | Active | ADR-0309 |
| 0089 | CPU double vs Vulkan float stage bisect on the residual NVIDIA-Vulkan integer_vif_scale2 45/48-frame places=4 mismatch at API 1.4 (T-VK-VIF-1.4-RESIDUAL). Static SPIR-V re-verification confirms only 5 FP-arithmetic ops in vif.comp and all 5 are NoContraction-decorated post-PR #346 — SPIR-V mitigation surface is exhausted. SYCL counter-example (same f32 contract, passes the gate) rules out a pure f32-vs-f64 class issue. Localised root cause: NVIDIA shaderFloatControls2-v2 codegen flip at API 1.4 on a non-IEEE-bound default (reciprocal-multiply, fast-rsq) outside the SPIR-V declarable surface. Phase-2 shader fix not warranted; recommends per-stage NVIDIA dynamic dump or places=3 override ADR. | Active | ADR-0264, ADR-0269 |
| 0090 | Per-commit triage of the 41 upstream commits binned SKIP-doc-or-format by the /sync-upstream Pass-2 heuristic on 2026-05-08. Splits into 5 PORT_NOW (motion_v2 mirroring bugfix, motion_v2 option cluster + prev_prev_ref API, two cambi internals), 18 PORT_LATER (python/test MyTestCase migration, blocked on agent-E worktree), 4 DEFER_INDEFINITELY, 1 PORTED_SILENTLY (662fb9ce semaphores → fork commit e5a52e74), 12 MERGE_BOUNDARY. Surfaces the riskiest item: the python/test mass-port is +5 600 LOC and crosses Netflix-golden assertions in feature_extractor_test.py. | Active | — (companion to Research-0089) |
| 0135 | Vulkan dispatch overhead characterization — T7-18: startup dominated by uncached vkCreateComputePipelines; per-frame fence/submit overhead ruled out; pipeline-cache fix recommended | Active | T7-18 |
| 0091 | CAMBI CUDA integration trade-offs (T3-15a): per-thread 49-read vs shared-memory SAT for the spatial-mask kernel; synchronous vs async ring-buffer DtoH for the 5-scale pipeline; host_pinned slot reuse for score storage; two compile-time bugs found and fixed (cuMemcpyDtoH arg order, VMAF_FEATURE_DISPATCH_SEQUENTIAL non-existence). Predecessor: Research-0032 (Vulkan twin). | Active | ADR-0360 |
| 0734 | CUDA VIF filter1d.cu ncu hotpath profile on RTX 4090 (sm_89). Primary bottleneck: launch-width-limited (0.84 waves) + register pressure (56 regs/thread). Top kernel filter1d_8_horizontal_kernel_2_17_9 = 35 % of VIF filter time. Three optimization candidates: increase val_per_thread 2→4, reduce register live range, add __ldg() for smem loads. | Active | — |
| 0135 | CAMBI CUDA spatial-mask SLM tile -- design analysis: img-tile correctness bug, 26x read reduction via direct zd_tile load, bank-conflict accepted at uint8 row access | Active | ADR-0464 |
| 0751 | Cross-backend 4K (3840x2160) baseline (CPU + CUDA) and PR #79 adm_cm_line_kernel_8 A/B at 4K. RTX 4090 medians: vif CUDA 147 fps, adm CUDA 161 fps. filter1d fully saturated at 4K (253 waves, 69.7% occ). adm_cm __launch_bounds__ win is zero at 4K (-0.3%) vs -9.3% at 1080p (register-bound regime only). ms_ssim_decimate scale 0 saturated at 4K (88.1% occ). | Active | — |

(Index seeded by ADR-0108's adoption PR; backfilled digests for the existing major workstreams will be added as their authors revisit the corresponding code.)
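Since the conventions say IDs are never reused, the index can be scanned for repeated IDs. This is a minimal sketch that assumes the index is a Markdown pipe table with one row per line and the ID in the first column. The function name `duplicate_ids` is hypothetical.

```python
import re
from collections import Counter

# A data row starts with "| NNNN |"; the header and separator rows do not match.
ROW = re.compile(r"^\|\s*(\d{4})\s*\|")


def duplicate_ids(index_markdown: str) -> list[str]:
    """Return, in sorted order, every ID that appears in more than one row."""
    ids = [
        m.group(1)
        for line in index_markdown.splitlines()
        if (m := ROW.match(line.strip()))
    ]
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)
```

An empty result means every row carries a distinct ID. Rows that have been run together on one physical line are not split by this sketch and would need to be separated first.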