ADR-0255: FastDVDnet temporal pre-filter — real upstream weights via luma adapter (T6-7b)¶
- Status: Accepted
- Date: 2026-05-03
- Deciders: Lusoris, Claude (Opus 4.7)
- Tags:
ai,dnn,feature-extractor,wave-1,weights-drop
Context¶
ADR-0215 shipped the FastDVDnet temporal pre-filter contract — a registered feature extractor fastdvdnet_pre, a 5-frame ring buffer in core/src/feature/fastdvdnet_pre.c, a smoke-only placeholder ONNX under model/tiny/fastdvdnet_pre.onnx, and the smoke test plumbing. The placeholder was a randomly-initialised 3-layer CNN matching the declared I/O shape (frames [1, 5, H, W] luma, denoised [1, 1, H, W]); the registry row carried smoke: true and the doc flagged the missing real weights as backlog item T6-7b.
Two structural mismatches blocked a verbatim weights drop:
- RGB vs luma. Upstream FastDVDnet (Tassano, Delon, Veit 2020;
github.com/m-tassano/fastdvdnet, MIT) takes RGB inputs of shape(1, 5*3, H, W) = (1, 15, H, W)plus a per-pixel noise map of shape(1, 1, H, W). The fork's C contract is luma-only with no noise-map input. - PixelShuffle. Upstream's UpBlock uses
nn.PixelShuffle, which PyTorch's ONNX exporter emits asDepthToSpace.DepthToSpaceis not on the fork's strict ONNX op allowlist (core/src/dnn/op_allowlist.c).
Both mismatches need resolving in the same PR or the weights drop breaks either the C extractor or the model loader's allowlist scan.
Decision¶
We will ship the upstream FastDVDnet checkpoint (pinned at commit c8fdf61, sha256 9d9d8413c33e3d9d961d07c530237befa1197610b9d60602ff42fd77975d2a17) wrapped by a small LumaAdapter PyTorch module that preserves the existing C-side I/O contract:
- replicate each luma plane into RGB via
Concat(Y -> [Y, Y, Y]) to match upstream's 15-channel input; - broadcast a constant
sigma = 25/255noise map (the upstream reference inference noise level used bytest_fastdvdnet.py) viaones_like(centre) * sigma; - collapse the upstream RGB output back to luma using BT.601 weights
Y = 0.299 R + 0.587 G + 0.114 B.
We will additionally swap every nn.PixelShuffle instance in the upstream graph for an allowlist-safe Reshape → Transpose → Reshape decomposition before exporting. PixelShuffle has zero learned parameters, so the swap is numerically identical (verified < 1e-6 max-abs diff between the upstream PyTorch graph and the exported ONNX on random luma inputs).
The export script lives at ai/scripts/export_fastdvdnet_pre.py; it verifies the upstream weights checksum, rebuilds the wrapped graph, and re-emits the registry row with smoke: false, license: "MIT", the upstream commit pin, and a refreshed sha256.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Luma adapter + PixelShuffle swap (chosen) | Real working denoiser; zero changes to the C extractor or its tests; reproducible from a pinned upstream checkpoint; one-PR delivery. | Network was not trained on luma-tiled-into-RGB inputs, so denoising quality is below a luma-native retrain. | Picked. The quality cost is bounded (BT.601 round-trip is well-defined) and the path to a luma retrain (T6-7c) is explicit in the doc. |
| Change the C extractor to accept RGB + noise map | Faithful to upstream training distribution. | Breaks the public extractor contract pinned in ADR-0215; requires a co-ordinated update to the future FFmpeg vmaf_pre_temporal filter wiring; doubles the surface area of this PR. | Deferred. If a luma retrain (T6-7c) lands, the contract stays luma-only anyway. |
| Train a luma-native FastDVDnet from scratch | No domain mismatch; smaller checkpoint feasible. | Hours of training + dataset prep; out-of-scope for one PR (same blocker that drove ADR-0215's placeholder). | Tracked as T6-7c. |
| Keep DepthToSpace in the allowlist | Simpler export script (no monkey-patching of upstream's UpBlock). | Widens the trusted op surface for the entire DNN loader; every future model that uses DepthToSpace inherits the change. | Rejected. PixelShuffle is shape-only, so the decomposition costs nothing and keeps the allowlist tight. |
| INT8-quantize the wrapper graph (per-model PTQ, ADR-0174 pattern) | Smaller checkpoint (~2.5 MiB instead of 9.5 MiB). | PTQ requires representative calibration data the fork doesn't have for video denoising; quality budget verification absent. | Deferred. The 9.5 MiB fp32 checkpoint is well under the 50 MiB DNN size cap; PTQ is a follow-up if the size matters. |
| FP16 export (smaller checkpoint, common DNN mode) | Halves the on-disk size. | The fork's DNN loader path defaults to fp32 inference; mixing fp16 weights complicates the smoke-gate path. | Deferred until a fork-wide fp16 inference policy lands. |
Consequences¶
- Positive: T6-7b row closed;
fastdvdnet_preships a real working denoiser whose output now meaningfully tracks the reference inference behaviour from upstream's published checkpoint. The registry row'ssmoke: falseflag and the sidecar JSON's upstream-commit pin make the weights provenance auditable. - Negative: 9.5 MiB additional binary content under
model/tiny/fastdvdnet_pre.onnx(was ~6 KiB placeholder). Acceptable: well under the 50 MiB ONNX size cap and roughly 3xlpips_sq.onnx(3.27 MiB), which is already shipped. - Neutral / follow-ups:
- T6-7c: train a luma-native FastDVDnet variant (or a chroma-aware luma+chroma extension) once the FFmpeg
vmaf_pre_temporalfilter (T6-7b proper, separate row) is in flight. The retrain target depends on the actual consumer surface. - T6-7d (optional): PTQ the wrapper graph to int8 once representative video-denoising calibration data exists; budget against an L1-residual gate on a held-out clip set.
References¶
- Tassano, Delon, Veit. FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation, CVPR 2020. arXiv:1907.01361.
- Upstream repo:
https://github.com/m-tassano/fastdvdnetpinned atc8fdf6182a0340e89dd18f5df25b47337cbede6f. - ADR-0215 — original contract + placeholder rationale.
- ADR-0042 — tiny-AI doc bar.
- ADR-0174 — per-model PTQ pattern (informs T6-7d).
- Source: per user direction (research+code task, "pick the smoke-only ONNX with the most accessible upstream weights and ship the real drop").