ADR-0240: GPU backend public-header pattern doc (PR3 of GPU dedup, doc-only)¶
- Status: Accepted
- Date: 2026-05-02
- Deciders: Lusoris
- Tags: docs, gpu, agents, fork-local
Context¶
GPU dedup PR3 (after PR2 = ADR-0239 picture-pool extract) was originally scoped as "GPU public header codegen — extract libvmaf_{cuda,sycl,vulkan,hip}.h from one template". A 2026-05-02 audit measured the four headers and found:
- ~20 of ~200 lines per header truly shared (state init / import_state / state_free lifecycle).
- The remaining 90 % is backend-specific feature surface — CUDA's picture preallocation, SYCL's DMABuf/VA/D3D11 import, Vulkan's VkImage zero-copy import + ring depth, HIP's scaffold-only surface.
Codegenning 10 % of each header would add a build-system Python dependency (Jinja2 or hand-rolled emitter) for too little return — the divergence between backends is the API surface, not the lifecycle boilerplate. The dedup-via-pattern approach the tiny-AI extractor template (ADR-0250) uses is a better fit: a recipe lives in docs/, new backends consult it, no build-system additions.
Decision¶
PR3 of the GPU dedup sequence ships two documentation artefacts instead of a codegen pipeline:
docs/development/gpu-backend-template.md— the recipe new GPU backends follow (state lifecycle, optional sections for_available/_list_devices/ picture preallocation / hwaccel zero-copy import, Doxygen + ABI conventions).core/include/libvmaf/AGENTS.md— a public-headers-tree invariant note that points back at the template + pins the rebase-sensitive ordering of the four existing backend headers.
The picture-preallocation section enforces the SYCL/Vulkan NONE / HOST / DEVICE three-method shape (CUDA's CUDA-specific HOST_PINNED is documented as a historical quirk new backends should not replicate). The "implementation MUST delegate to VmafGpuPicturePool" rule (ADR-0239) is restated in the template so future backends can't reimplement the round-robin lifecycle.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Doc-pattern (chosen) | Mirrors the tiny-AI template precedent (ADR-0250); zero build-system dependency; agent-readable AGENTS guidance | Future backend additions rely on developer discipline (no enforcement) | Picked: 90 % of each header is genuinely backend-specific; codegen is the wrong tool for ~10 % shared shape |
| Jinja2 codegen at build time | Forces shared lifecycle into one template; new backends are a one-line spec addition | Adds a Python build dependency; obscures the headers (you edit the template, not the file); high friction for a 20-LOC dedup | Rejected: ROI doesn't justify the build-system addition |
| Pure-Python codegen with checked-in outputs | No build-time Python; same template + a make regen-headers target | Two sources of truth — easy to drift; requires a CI gate that re-renders + diffs | Rejected: same return as Jinja2 with extra drift surface |
| Thin shared header with macro-emit blocks | Pure C; no codegen; #define X_BACKEND_LIFECYCLE(prefix) style | C macro stamping for type names is fragile; readability collapses; Doxygen on macros is brittle | Rejected: cure worse than the disease |
| Skip PR3 entirely; jump to PR4 (kernel host-glue) | Less PR churn | Leaves the user's "headers second" sequence step unhonoured; no agent-readable pattern doc for the next GPU backend addition | Rejected: doc-pattern is small, real, and lands the sequence step |
Consequences¶
- Positive:
- New GPU backends (Metal, DirectML, future ROCm-replacement) have a written recipe to follow — same dedup pattern the tiny-AI extractors use (ADR-0250 / PR #251 → migrated in PR #265).
- The four-header rebase invariant note pins which headers upstream syncs are expected to touch (only
libvmaf_cuda.h) versus mis-merge signals (any of the other three). - The
NONE / HOST / DEVICEthree-method picture-preallocation convention is locked as the new-backend default;HOST_PINNEDwon't propagate beyond CUDA. - Negative:
- No machine-enforced rule that future backends consult the template. Discipline-only; relies on PR review + the AGENTS.md pointer firing in agents' contexts.
- The dedup return is small in absolute LOC (template doc + AGENTS.md, ~250 lines added to ship the recipe).
- Neutral / follow-ups:
- PR4 of the dedup sequence (T-GPU-DEDUP-3): GPU feature kernel host-glue extract. 10+ files per backend share state/init/close boilerplate (~250/500 LOC per file). The leverage there is dramatic; this is the highest-LOC dedup of the sequence.
- Vulkan picture_vulkan_pool.c migration (T-GPU-DEDUP-2-vulkan): after #264 (Vulkan picture preallocation) lands, rewrite the Vulkan pool as a thin wrapper around the generic pool — parallel to what #266 did for SYCL. Tracked separately because it depends on #264 merging first.
References¶
- Source:
req2026-05-02 — popup answer "all three sequenced — pool first, headers second, kernels third"; clarification reply "we only created the boilerplate but we didn't actually migrate" (which led to the audit-then-doc-pattern reframing of PR3 from codegen to recipe). - 2026-05-02 GPU dedup audit (in-conversation, agent-produced).
- ADR-0239 — PR2 of the dedup sequence (picture-pool extract).
- ADR-0250 — the precedent for "recipe doc + shared helpers, not codegen".
docs/development/gpu-backend-template.md— the recipe.core/include/libvmaf/AGENTS.md— the public-headers-tree invariant note.