# Research-0055 — U-2-Net u2netp saliency replacement survey
| Field | Value |
|---|---|
| Date | 2026-05-03 |
| Status | Blocker recorded — placeholder remains shipped |
| Tags | dnn, tiny-ai, mobilesal, u2netp, saliency, license, op-allowlist, blocker |
Companion to ADR-0265. Captures the upstream survey, the license + distribution audit, the ONNX op-allowlist mismatch, and the alternatives reviewed before the deferral decision.
## Background
ADR-0257 / Research-0053 (PR #328) deferred the MobileSal real-weights swap because upstream yuhuan-wu/MobileSal is CC BY-NC-SA 4.0, distributes weights through Google Drive viewer URLs, and is RGB-D where the fork's C contract is RGB-only. ADR-0257's recommended next step was filed as backlog row T6-2a-replace-with-u2netp: switch the underlying model family from MobileSal to U-2-Net's 4.7 MB u2netp variant under Apache-2.0, which is permissive, pure RGB, and (per ADR-0257 §Alternatives) "drop-in compatible with the saliency I/O contract".
This digest records the result of attempting that replacement. The plan was to mirror the FastDVDnet T6-7b pattern from PR #326 / ADR-0253:
1. Pin the upstream commit by SHA.
2. `curl -L -O <raw URL>/u2netp.pth` from the pinned commit.
3. Wrap with a `LumaAdapter` (Y → [Y, Y, Y] tile, RGB → Y collapse) so the upstream graph matches the C-side `[1, 3, H, W] → [1, 1, H, W]` contract.
4. `torch.onnx.export` at opset 17 with `do_constant_folding=True`.
5. Verify PyTorch ↔ ONNX max-abs-diff `< 1e-5` over 5 random inputs.
6. Bump `model/tiny/registry.json`: `mobilesal_placeholder_v0` → `u2netp_v1`, `smoke: false`, license `Apache-2.0`.
The plan blocks on two independent findings (sections below). The saliency extractor's I/O contract is preserved end-to-end — every piece of the C side, the smoke test, the registry schema, and the sidecar layout would still work — but step 2 (download) and step 4 (allowlist-safe export) each have their own irreducible blocker.
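Step 5 is the only numeric gate in the plan. A minimal, dependency-free sketch of that gate, assuming both backends are wrapped as callables over a flattened input: `reference` and `candidate` stand in for the PyTorch module and the ONNX Runtime session, and `check_parity` / `max_abs_diff` are hypothetical helper names, not fork code.

```python
import random


def max_abs_diff(a, b):
    """Largest element-wise |a - b| between two flat, equal-length float sequences."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def check_parity(reference, candidate, input_len, trials=5, tol=1e-5, seed=0):
    """Feed `trials` random inputs to both callables and return the worst max-abs-diff.

    Raises AssertionError as soon as one trial reaches `tol`, mirroring a CI gate.
    """
    rng = random.Random(seed)  # fixed seed keeps CI runs reproducible
    worst = 0.0
    for trial in range(trials):
        x = [rng.random() for _ in range(input_len)]  # uniform [0, 1) stand-in input
        diff = max_abs_diff(reference(x), candidate(x))
        if diff >= tol:
            raise AssertionError(f"trial {trial}: max-abs-diff {diff:.3e} >= tol {tol:.0e}")
        worst = max(worst, diff)
    return worst
```

In a real export script the two callables would wrap the PyTorch forward pass and an ONNX Runtime `InferenceSession.run` call on the same flattened `[1, 3, H, W]` tensor; the gate itself does not care which framework produced the numbers.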
## Upstream survey (2026-05-03)
The U-2-Net paper (Qin, Zhang, Huang, Dehghan, Zaiane, Jagersand, "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection", Pattern Recognition 2020) has one canonical reference implementation:
- `xuebinqin/U-2-Net` (the corresponding author's repository, HEAD `ac7e1c81` as of 2026-05-03).
The repository was probed via the GitHub REST API on 2026-05-03.
### License — Apache-2.0 (clean)
The repository carries a clean SPDX LICENSE file:
```shell
$ gh api repos/xuebinqin/U-2-Net/license --jq '.license'
{ "spdx_id": "Apache-2.0", "name": "Apache License 2.0" }
```
Apache-2.0 is fully compatible with the fork's BSD-3-Clause-Plus-Patent license under the standard combine-and-redistribute pattern (Apache-2.0 §4 NOTICE attribution required; no copyleft). This is the axis that unblocks U-2-Net relative to MobileSal — there is no licence incompatibility.
### Distribution — Google-Drive walled, no GitHub release
The trained checkpoints are not in the repository tree and not GitHub release artefacts:
- `gh api repos/xuebinqin/U-2-Net/releases` returns `[]`: the repo has zero releases.
- A recursive listing of HEAD `ac7e1c81` (`gh api 'repos/xuebinqin/U-2-Net/git/trees/master?recursive=1'`) returns 217 paths; the only `saved_models/` content is `saved_models/face_detection_cv2/haarcascade_frontalface_default.xml` (an OpenCV cascade, unrelated). There are no `*.pth` paths.
- The `README.md` links every checkpoint via Google Drive viewer URLs. For `u2netp.pth` the link reads:

  > or u2netp.pth (4.7 MB) from GoogleDrive
The viewer URL returns Google Drive's HTML preview page, not the raw .pth — the actual download requires an authenticated browser session and a "Download anyway" click for files over Drive's unauth-quota threshold.
This is exactly the same distribution-channel blocker that ruled out MobileSal in Research-0053 §Distribution. The FastDVDnet pattern of pinning an upstream commit and curling the weights file by SHA does not reproduce non-interactively in CI.
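The pin-by-SHA pattern only helps if CI can tell a real checkpoint from Drive's HTML preview page. A minimal sketch of such a guard, assuming the fetched bytes are already in memory. The magic-byte checks are an assumption about `torch.save` output formats (ZIP container since PyTorch 1.6, raw pickle before that), and `classify_download` / `verify_pinned_download` are hypothetical helpers, not existing fork code.

```python
import hashlib

# Assumption: torch.save output since PyTorch 1.6 is a ZIP archive (local file
# header magic "PK\x03\x04"); older checkpoints are raw pickles whose first
# byte is the pickle PROTO opcode 0x80.
_CHECKPOINT_MAGICS = (b"PK\x03\x04", b"\x80")
_HTML_MARKERS = (b"<!doctype html", b"<html")


def classify_download(data: bytes) -> str:
    """Return 'html', 'checkpoint', or 'unknown' based on the first bytes."""
    head = data[:512].lstrip().lower()  # HTML markers are case-insensitive
    if head.startswith(_HTML_MARKERS):
        return "html"
    if data.startswith(_CHECKPOINT_MAGICS):
        return "checkpoint"
    return "unknown"


def verify_pinned_download(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError unless `data` looks like a checkpoint and matches the pinned SHA-256."""
    kind = classify_download(data)
    if kind != "checkpoint":
        raise ValueError(f"expected a checkpoint, got {kind!r} (Drive preview page?)")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(f"sha256 mismatch: got {actual}, pinned {expected_sha256}")
```

A guard like this turns the Drive failure mode into an explicit error instead of an HTML page reaching `torch.load`; it does not remove the blocker, because the preview page is what an unauthenticated fetch actually receives.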
### Architecture — pure RGB ✓
U-2-Net's forward signature is RGB-only: `U2NETP.forward` in `xuebinqin/U-2-Net` `model/u2net.py` takes a single 3-channel image tensor. This clears the RGB-D mismatch axis that ADR-0257 recorded for MobileSal: the fork's C-side luma-derived RGB contract maps onto U-2-Net's input without an off-distribution depth-map workaround.
### ONNX op allowlist — `Resize` blocker
`xuebinqin/U-2-Net` `model/u2net.py` builds the U^2 architecture with bilinear up-sampling at every decoder stage, via `F.upsample(src, size=tar.shape[2:], mode='bilinear')` (verified by reading `model/u2net.py` at upstream HEAD). PyTorch's ONNX exporter lowers `F.upsample(..., mode='bilinear')` to the `Resize` op (the legacy `Upsample` op has been deprecated since opset 10 in favour of `Resize`, so an opset-17 export emits `Resize`).
The fork's ONNX op allowlist (core/src/dnn/op_allowlist.c) does not include Resize:
```text
/* structural / shape */ Identity, Reshape, Flatten, Squeeze,
                         Unsqueeze, Transpose, Concat, Slice,
                         Gather, Cast, Shape, Expand
/* arithmetic */         Add, Sub, Mul, Div, Neg, Abs, Sqrt,
                         Pow, Exp, Log, Clip, Min, Max, Sum, Mean
/* reductions */         ReduceMean/Sum/Max/Min,
                         GlobalAveragePool, GlobalMaxPool
/* dense */              Gemm, MatMul
/* convolutional */      Conv, ConvTranspose, MaxPool, AveragePool
/* normalization */      BatchNormalization, LayerNormalization,
                         InstanceNormalization
/* activations */        Relu, LeakyRelu, Sigmoid, Tanh, Softmax,
                         Elu, Selu, Softplus, Softsign, Gelu, Erf,
                         HardSigmoid, HardSwish, PRelu, Clip
/* dropout */            Dropout
/* QDQ */                QuantizeLinear, DequantizeLinear
/* misc */               Constant, ConstantOfShape
/* control flow */       Loop, If
```
Resize is not on the list. Loading a U-2-Net ONNX with Resize nodes through the fork's vmaf_dnn_session_open would be rejected by the recursive scan in onnx_scan.c.
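The rejection path can be modelled as a recursive walk over the graph. This is not the fork's `onnx_scan.c`; it is a hypothetical Python sketch of the same idea that uses plain dicts instead of ONNX protobufs and a trimmed allowlist excerpt, to show why a `Resize` nested inside an `If` branch is caught just like a top-level one.

```python
# Hypothetical, trimmed excerpt of the allowlist; the real enumeration lives in
# core/src/dnn/op_allowlist.c and is much longer.
ALLOWLIST = frozenset({
    "Identity", "Conv", "ConvTranspose", "Relu", "Sigmoid", "Concat",
    "Add", "MaxPool", "BatchNormalization", "Constant", "Loop", "If",
})


def disallowed_ops(graph, allowlist=ALLOWLIST, prefix=""):
    """Return (path, op_type) pairs for every node whose op is not allowlisted.

    A graph is {"node": [node, ...]}; a node is {"name": str, "op_type": str,
    "subgraphs": {attr_name: graph}}. `subgraphs` stands in for ONNX GRAPH-typed
    attributes (Loop.body, If.then_branch, If.else_branch) and is walked
    recursively, so a banned op cannot hide inside control flow.
    """
    found = []
    for node in graph.get("node", []):
        path = prefix + node.get("name", node["op_type"])
        if node["op_type"] not in allowlist:
            found.append((path, node["op_type"]))
        for attr_name, subgraph in node.get("subgraphs", {}).items():
            found.extend(disallowed_ops(subgraph, allowlist, f"{path}/{attr_name}/"))
    return found
```

An empty result means "accept"; anything not enumerated is rejected wherever it appears, which is why widening the list is the only way a `Resize`-based decoder gets through.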
The PixelShuffle decomposition trick from PR #326 (replace `nn.PixelShuffle(r)` with a `Reshape` → `Transpose` → `Reshape` block, which is allowlist-safe because PixelShuffle is a pure shape op with no learned parameters and an exact integer-stride decomposition) does not work for bilinear interpolation. Bilinear resampling computes every output pixel as a position-dependent weighted blend of its 2×2 input neighbourhood (a 2-tap filter per axis), not as a permutation of input values. The only ways to express that with the current allowlist are:
- Pre-compute the bilinear weights into a depth-wise `Conv` with fixed kernel shape and stride. This works for fixed spatial dimensions, but the upstream `F.upsample(..., size=tar.shape[2:])` resolves the target size dynamically from the encoder skip connection, and a static-stride `Conv` cannot replicate that.
- Replace bilinear with nearest-neighbour using `Slice` + `Concat`. This is numerically inequivalent (it would need a retrain from scratch).
- Use `ConvTranspose` with a stride-2 4×4 bilinear kernel. This works for 2× upsampling, but U-2-Net's decoder uses non-power-of-two ratios at the inner stages, and the kernel weights are not the upstream-trained ones (off-distribution).
None of these preserve the upstream-trained weights faithfully under the existing allowlist.
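The `ConvTranspose` option, and why it only goes part of the way, can be checked in one dimension. The sketch below compares linear 2× upsampling with half-pixel centres (PyTorch's `align_corners=False` behaviour, which `F.upsample(..., mode='bilinear')` uses when `align_corners` is not passed) against a stride-2 transposed convolution with the fixed kernel `[0.25, 0.75, 0.75, 0.25]`. Both helpers are illustrative Python, not fork or upstream code; the 2-D case applies the same separable 1-D filter along each axis.

```python
def bilinear_upsample_2x(xs):
    """1-D linear 2x upsampling with half-pixel centres (align_corners=False).

    Output index o samples source coordinate (o + 0.5) / 2 - 0.5, clamped to
    the valid range, and blends the two neighbouring inputs.
    """
    n = len(xs)
    out = []
    for o in range(2 * n):
        src = max((o + 0.5) / 2 - 0.5, 0.0)  # clamp at the left edge
        i0 = min(int(src), n - 1)
        i1 = min(i0 + 1, n - 1)              # clamp at the right edge
        frac = src - i0
        out.append((1.0 - frac) * xs[i0] + frac * xs[i1])
    return out


def conv_transpose_1d(xs, kernel, stride, padding):
    """Plain transposed convolution: scatter each input times the kernel, then crop."""
    full = [0.0] * ((len(xs) - 1) * stride + len(kernel))
    for i, x in enumerate(xs):
        for t, w in enumerate(kernel):
            full[i * stride + t] += x * w
    return full[padding:len(full) - padding]


# Fixed 2x kernel: the upsampling ratio is baked into the stride and weights.
BILINEAR_2X_KERNEL = [0.25, 0.75, 0.75, 0.25]

xs = [1.0, 2.0, 4.0, 8.0]
print(bilinear_upsample_2x(xs))
print(conv_transpose_1d(xs, BILINEAR_2X_KERNEL, stride=2, padding=1))
```

Interior samples agree exactly, while the two border samples differ because the transposed convolution zero-pads where bilinear sampling clamps to the edge. The output length is also fixed at `2 * len(xs)` by the stride, so a target size read at run time from a skip connection that is not exactly twice the decoder's size cannot be expressed with one fixed kernel.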
The two reasonable unblocks are:
- Widen the allowlist to include `Resize` under a bounded attribute schema. Filed as `T6-2a-widen-allowlist-resize`.
- Train a from-scratch saliency student designed against the existing allowlist. Filed as `T6-2a-train-saliency-student`.
Both are independent ADR-scope decisions, and neither can be bundled into a "swap weights" PR.
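A bounded attribute schema could be as small as a default-aware check over the six attributes `Resize` has in the opset-13 version that an opset-17 export uses. The sketch below is hypothetical: the accepted values in `ALLOWED` are an assumed policy, not a decision recorded in any ADR, and a real validator would live in C next to `op_allowlist.c`. Attribute names and defaults follow the ONNX `Resize` spec.

```python
# Resize attributes and defaults as defined for Resize-13 (the version used from
# opset 13 through opset 17) in the ONNX operator spec.
RESIZE13_DEFAULTS = {
    "coordinate_transformation_mode": "half_pixel",
    "cubic_coeff_a": -0.75,
    "exclude_outside": 0,
    "extrapolation_value": 0.0,
    "mode": "nearest",
    "nearest_mode": "round_prefer_floor",
}

# Hypothetical "bilinear-only, fixed-attribute" policy. cubic_coeff_a and
# nearest_mode only affect cubic / nearest modes, and extrapolation_value only
# affects tf_crop_and_resize, so once the entries below are pinned those three
# cannot change the result and are left unconstrained.
ALLOWED = {
    "mode": {"linear"},
    "coordinate_transformation_mode": {"half_pixel", "pytorch_half_pixel"},
    "exclude_outside": {0},
}


def validate_resize(attrs):
    """Return policy violations for one Resize node's attributes (empty list == accept)."""
    errors = [f"unknown attribute {name!r}" for name in attrs if name not in RESIZE13_DEFAULTS]
    # Resolve omitted attributes to their spec defaults before checking, so a
    # node that simply leaves out `mode` is treated as nearest-neighbour.
    effective = dict(RESIZE13_DEFAULTS)
    effective.update((k, v) for k, v in attrs.items() if k in RESIZE13_DEFAULTS)
    for name, allowed in ALLOWED.items():
        if effective[name] not in allowed:
            errors.append(f"{name}={effective[name]!r} is outside the policy")
    return errors
```

Resolving defaults first matters: a node with no `mode` attribute is nearest-neighbour per the spec and is rejected, and later-opset additions such as `antialias` (opset 18) show up as unknown attributes instead of being silently accepted.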
## Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Defer u2netp swap; keep placeholder; document blocker | Honest record of what's actually shipped; corrects ADR-0257's recommendation in light of the second blocker; aligns with the task-brief "don't fake it" directive; zero C-side surface change | T6-2a-followup remains open with no near-term unblock | Chosen — see ADR-0265 |
| Vendor `u2netp.pth` via an in-tree authenticated-fetch helper (`gdown`) | Real saliency signal; clean Apache-2.0 license | Adds `gdown` (or an equivalent Drive scraper) to the runtime / CI deps; Google can break the unauthenticated path at any time; CI cannot reproduce the export deterministically; license-clean weights from a fragile distribution channel are still a supply-chain risk; explicitly forbidden by the task brief | Rejected: distribution-channel hardening is a separate scope decision |
| Mirror `u2netp.pth` through the fork's own release artefacts | Stable raw URL the export script can pin; Apache-2.0 §4 NOTICE attribution permits redistribution | Apache-2.0 NOTICE bundling is a release-pipeline change; sets precedent for re-hosting every upstream tiny-AI weight blob in our own releases (storage + audit cost); does not solve the op-allowlist blocker | Filed as `T6-2a-mirror-u2netp-via-release`; blocked behind the op-allowlist decision regardless |
| Widen the ONNX op allowlist to include `Resize` (bilinear-only, fixed-attribute) | Unblocks every model that uses bilinear up-sampling (U-2-Net, U-Net, most decoders); a single security-review pass amortises across future imports | `Resize` has 6+ optional attributes, any of which can change graph semantics; bounding the attribute set requires fork-local op-validator logic; security review is an independent ADR scope | Filed as `T6-2a-widen-allowlist-resize`; not bundled here because the licence + distribution blocker is independent |
| Rewrite `u2net.py` in-tree against allowlist-safe primitives | Fork-owned graph with verifiable lineage to upstream weights | Bilinear upsampling has no exact decomposition into {`Conv`, `Reshape`, `Transpose`, `Slice`} at dynamic strides; the rewrite must either pre-compute a bilinear-kernel `Conv` (graph blow-up; cannot follow dynamic strides) or fall back to nearest-neighbour (requires retraining from scratch) | Rejected: engineering effort comparable to a from-scratch retrain, with worse provenance |
| Train a from-scratch saliency student on a permissive corpus (DUTS / DUT-OMRON) | Fully fork-owned weights with clean provenance; sidesteps distribution-channel question; can be designed inside the existing op allowlist from the start | Engineering effort comparable to the rest of T6-2a put together; no in-tree SOD training harness; quality-vs-baseline calibration is a research project on its own | Filed as T6-2a-train-saliency-student; deferred until at least one of the other unblocks lands |
| Email the `xuebinqin/U-2-Net` author asking for a GitHub release with `u2netp.pth` as an artefact | Cleanest fix for the distribution-channel blocker if granted; doesn't require relicensing (already Apache-2.0) | Out-of-band ask with no commitment; even if granted, the op-allowlist blocker remains | Filed as long-shot follow-up; this PR does not depend on it |
| Use BASNet / PoolNet / another RGB-only saliency model | A different distribution channel might unblock; might use allowlist-safe ops | Each candidate needs its own license + distribution + architecture audit; survey work effectively starts over | Out of scope; revisit only if both U-2-Net unblocks stall |
| Ship u2netp anyway with random `.pth` proxy weights and document it as a placeholder | Pattern matches the original placeholder | Conflates "smoke" and "real weights", which is already what `mobilesal_placeholder_v0` is; adds nothing; explicitly forbidden by the task brief | Rejected: duplicate of the existing placeholder |
## Recommendation
Keep the smoke-only placeholder shipped at model/tiny/mobilesal.onnx (mobilesal_placeholder_v0, sha256 f1226310…, smoke: true) and document the second-tier blocker in ADR-0265. The mobilesal extractor remains usable end-to-end — every C-side test in core/test/test_mobilesal.c passes against the placeholder — but saliency_mean stays a content-independent constant (~0.5) until at least one of the unblock paths lands. The C contract does not change; any future drop-in (U-2-Net via mirror + allowlist widening, distilled student, or BASNet/PoolNet survey result) just replaces the .onnx and bumps the registry sha256.
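The "replace the `.onnx` and bump the registry sha256" step is mechanical. A minimal sketch, assuming a registry layout of `{"models": [{"id", "sha256", "smoke", "license"}, ...]}`; that layout and the `bump_registry_entry` helper are hypothetical, and only the field values (`sha256`, `smoke`, `license`, the model ids) come from this digest.

```python
import hashlib
import json
from pathlib import Path


def bump_registry_entry(registry_path: Path, old_id: str, new_id: str,
                        onnx_path: Path, license_spdx: str) -> dict:
    """Rename one registry entry, re-pin its sha256 to the new .onnx, and clear `smoke`.

    Assumes registry.json is {"models": [{"id": ..., "sha256": ..., "smoke": ...,
    "license": ...}, ...]}; the real schema may differ.
    """
    registry = json.loads(registry_path.read_text())
    matches = [m for m in registry["models"] if m["id"] == old_id]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one entry {old_id!r}, found {len(matches)}")
    entry = matches[0]
    entry.update(
        id=new_id,
        sha256=hashlib.sha256(onnx_path.read_bytes()).hexdigest(),
        smoke=False,  # real weights, no longer a smoke-only placeholder
        license=license_spdx,
    )
    registry_path.write_text(json.dumps(registry, indent=2) + "\n")
    return entry
```

For the U-2-Net path this would be called as `bump_registry_entry(registry, "mobilesal_placeholder_v0", "u2netp_v1", onnx, "Apache-2.0")`, which is the plan's step 6; nothing on the C side changes.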
Of the three filed follow-ups, T6-2a-widen-allowlist-resize is the load-bearing one — both the U-2-Net mirror path and any future modern-decoder import depend on it. Recommend prioritising it before another saliency-replacement attempt.
## References
- ADR-0218 — original MobileSal extractor design with the smoke-only placeholder.
- ADR-0257 / Research-0053 (PR #328) — sibling MobileSal blocker; this digest extends the chain.
- ADR-0253 (PR #326) — sibling real-weights swap that did succeed (FastDVDnet, MIT, GitHub-raw downloadable, RGB-only architecture). The pattern this digest was supposed to mirror.
- Upstream code: https://github.com/xuebinqin/U-2-Net (HEAD `ac7e1c81`, SPDX = Apache-2.0).
- Upstream paper: Qin, Zhang, Huang, Dehghan, Zaiane, Jagersand, "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection", Pattern Recognition 2020.
- ONNX `Resize` op spec: https://onnx.ai/onnx/operators/onnx__Resize.html.
- `core/src/dnn/op_allowlist.c`: the canonical allowlist enumeration this digest audited against.
- Source: paraphrased task-brief directive, "if u2netp weights download fails, ship a docs-only blocker PR similar to #328's mobilesal pattern, documenting what's needed. Don't push fake weights."