ADR-0218: MobileSal saliency feature extractor (T6-2a)
- Status: Accepted
- Date: 2026-04-29
- Deciders: Lusoris, Claude (Anthropic)
- Tags: ai, dnn, feature-extractor, saliency, fork-local
Context
The Wave 1 tiny-AI roadmap (`docs/ai/roadmap.md` §2.3) and ADR-0036 (Tiny-AI Wave 1 scope expansion) commit the fork to a MobileSal saliency surface that feeds two downstream consumers: a scoring-side saliency-weighted variant of VMAF, and an encoder-side per-CTU QP-offset sidecar consumed by x265 `--qpfile` / SVT-AV1 ROI APIs. Backlog item T6-2 originally bundled both. Revisiting it before implementation showed that the scoring-side extractor and the encoder-side sidecar tool have nothing structurally in common (different output schemas, different consumers, different test surfaces), so the backlog row was subdivided into T6-2a (scoring-side extractor) and T6-2b (sidecar tool).
This ADR covers T6-2a only. Open questions resolved here:
- Real upstream weights vs synthetic placeholder. The upstream `yun-liu/MobileSal` checkpoint is ~10 MB and MIT-licensed, but re-exporting it cleanly with a static op set and ImageNet normalisation folded into the graph is a non-trivial export-script effort, and the rest of T6-2a (C wiring, registry plumbing, smoke test, docs) does not depend on it. The pattern set by FastDVDnet (T6-7, PR #203) and the smoke fixture `model/tiny/smoke_v0.onnx` is to ship a tiny synthetic placeholder that matches the I/O contract bit-for-bit while real weights remain a tracked follow-up.
- Scalar mean vs full-map export. A per-pixel float32 saliency map at 1080p is 1920 × 1080 × 4 bytes, about 8.3 MB per frame, while the existing `VmafFeatureCollector` API is scalar-per-frame. Refactoring it for tensor-valued features is well beyond the scope of T6-2a.
- T6-2a vs T6-2b boundary. The encoder-side sidecar tool needs a different output format (encoder-native CTU grid), a different CLI surface (`tools/vmaf-roi`), and its own integration tests against x265/SVT-AV1. Bundling it into T6-2a would balloon the PR and delay landing the scoring-side surface.
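The scalar-vs-map trade-off is easy to quantify. The back-of-the-envelope sketch below compares the per-frame payload of the two export options. It assumes the full map is stored as float32, matching the NCHW float32 `[1, 1, H, W]` output described in the Decision section, and that a scalar feature value is one C `double` (8 bytes); the helper names are hypothetical.

```python
# Back-of-the-envelope payload sizes for the two export options.
# Assumptions: the full map is float32 (4 bytes per pixel), as in the
# [1, 1, H, W] saliency_map contract, and a scalar feature value is one
# C double (8 bytes). Helper names are hypothetical.

FLOAT32_BYTES = 4
DOUBLE_BYTES = 8


def full_map_bytes(width: int, height: int) -> int:
    """Bytes per frame if the whole [1, 1, H, W] map were exported."""
    return width * height * FLOAT32_BYTES


def scalar_mean_bytes() -> int:
    """Bytes per frame for a single saliency_mean value."""
    return DOUBLE_BYTES


if __name__ == "__main__":
    w, h = 1920, 1080
    per_frame = full_map_bytes(w, h)
    print(f"full map @ {w}x{h}: {per_frame} bytes (~{per_frame / 1e6:.1f} MB)")
    print(f"scalar mean: {scalar_mean_bytes()} bytes")
    # Ratio between the two per-frame payloads.
    print(f"ratio: {per_frame // scalar_mean_bytes()}x")
```

At 1080p the map is roughly a million times larger than the scalar per frame, which is the gap a scalar-only collector cannot absorb without a redesign.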
Decision
We add a new no-reference feature extractor, `mobilesal`, in `core/src/feature/feature_mobilesal.c`. It runs a single ONNX saliency session over the distorted frame and emits a scalar `saliency_mean` per frame via `vmaf_feature_collector_append`. The session binds tensors by name (input `input`: NCHW float32 RGB, ImageNet-normalised; output `saliency_map`: NCHW float32 `[1, 1, H, W]`), so any future drop-in (a real upstream MobileSal export or a distilled student) can replace the placeholder without C changes. The shipped checkpoint `model/tiny/mobilesal.onnx` is a smoke-only synthetic placeholder generated by `scripts/gen_mobilesal_placeholder_onnx.py` (a single 3→1-channel 1×1 Conv followed by a Sigmoid). Real upstream MIT-licensed weights are the T6-2a-followup task; the encoder-side per-CTU QP-offset sidecar remains the T6-2b follow-up.
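To make the I/O contract concrete, here is a pure-Python sketch of what a 3→1 1×1 Conv + Sigmoid graph computes for one frame, followed by the spatial mean that becomes `saliency_mean`. A 1×1 convolution with one output channel is just a per-pixel weighted sum of R, G, B plus a bias. The ImageNet mean/std values are the standard torchvision constants; the conv weights and bias are hypothetical and are not the values written by `scripts/gen_mobilesal_placeholder_onnx.py`. The YUV→RGB step in the C extractor is omitted, and normalisation is done here in Python purely for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Standard ImageNet channel statistics (torchvision convention), RGB order.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# Hypothetical 1x1-conv parameters for illustration only; the real
# placeholder generator may use different values.
CONV_WEIGHTS = (0.1, 0.2, 0.3)
CONV_BIAS = 0.0

Pixel = Tuple[float, float, float]  # (R, G, B) in [0, 1]


def normalise(px: Pixel) -> Tuple[float, float, float]:
    """ImageNet normalisation: (x - mean) / std per channel."""
    return tuple((c - m) / s for c, m, s in zip(px, IMAGENET_MEAN, IMAGENET_STD))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def saliency_map(frame: Sequence[Sequence[Pixel]],
                 weights=CONV_WEIGHTS, bias=CONV_BIAS) -> List[List[float]]:
    """1x1 conv (3 -> 1 channel) followed by sigmoid, applied per pixel."""
    out = []
    for row in frame:
        out_row = []
        for px in row:
            n = normalise(px)
            z = sum(w * c for w, c in zip(weights, n)) + bias
            out_row.append(sigmoid(z))
        out.append(out_row)
    return out


def saliency_mean(smap: Sequence[Sequence[float]]) -> float:
    """Spatial mean over H x W: the single scalar emitted per frame."""
    values = [v for row in smap for v in row]
    return sum(values) / len(values)


if __name__ == "__main__":
    frame = [[(0.2, 0.4, 0.6), (0.9, 0.1, 0.5)],
             [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
    print(saliency_mean(saliency_map(frame)))
    # With all-zero weights every pixel maps to sigmoid(bias), so the mean
    # no longer depends on frame content.
    print(saliency_mean(saliency_map(frame, weights=(0.0, 0.0, 0.0))))  # 0.5
```

The zero-weight case shows one way a placeholder's score can be content-independent; this sketch does not claim that is how the generator script builds its graph.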
Alternatives considered
| Option | Pros | Cons | Outcome |
|---|---|---|---|
| Real upstream MobileSal weights now | Immediate quality signal; users can correlate against actual saliency content | Upstream re-export is non-trivial (custom ops, ImageNet-in-graph, opset alignment); blocks unrelated PR scope; pulls in a ~10 MB binary before the surface stabilises | Deferred to T6-2a-followup; the placeholder unblocks plumbing |
| Synthetic placeholder ONNX (`smoke: true`) | Zero-friction landing; locks down C / registry / sidecar / docs; deterministic sha256; 330 bytes; same precedent as `smoke_v0.onnx` | Score is content-independent until real weights land | Chosen: the placeholder is explicitly labelled smoke-only in `registry.json` and `docs/ai/models/mobilesal.md` |
| Skip the placeholder, gate the extractor on a missing `.onnx` | Smaller PR | Leaves the feature uncallable end-to-end; smoke test cannot exercise the pipeline; pattern diverges from the `smoke_v0` / `smoke_fp16_v0` precedent | Rejected: defeats T6-2a's purpose |
| Scalar `saliency_mean` only | Fits the existing `VmafFeatureCollector` API; matches every other feature in the tree; single PR | Discards spatial information that the encoder-side T6-2b needs | Chosen for T6-2a: T6-2b will read the map directly from the same session in its own CLI |
| Full per-pixel saliency map as a feature | Maximum information for downstream consumers | Requires a `VmafFeatureCollector` redesign for tensor features; balloons the PR; not consumed by anything yet (saliency-weighted L1 / L2 are T6-2b/c) | Deferred: T6-2b's `tools/vmaf-roi` will consume the map directly without going through the feature collector |
| Saliency-weighted L1 in the same PR | Matches the Wave 1 spec end-to-end | Requires a second (FR) feature extractor, paired ref/dist tensor scratch buffers, and at least two more golden-test fixtures | Deferred to T6-2b: a saliency-weighted FR pooler is its own surface |
| Bundle T6-2a + T6-2b in one PR | Single ADR, single review pass | Mixes scoring-side feature wiring with an encoder-side CLI, encoder format probing, and x265/SVT-AV1 integration tests; unclear PR scope | Rejected: this is why the backlog row was subdivided |
Consequences
- Positive:
  - The Wave 1 §2.3 scoring-side surface lands in one focused PR.
  - The C wiring is shape-clean for any future MobileSal export that obeys the `input` / `saliency_map` name contract.
  - The smoke gate (`test_mobilesal`) verifies extractor registration, the option table, and the decline path when the model is missing, just like `test_lpips`.
  - The placeholder ONNX is 330 bytes and deterministic, so there is no big binary churn in the tree.
- Negative:
  - The placeholder's `saliency_mean` is content-independent, so the feature is not yet useful for ranking quality. The PR labels this clearly in `registry.json` (`smoke: true`) and in `docs/ai/models/mobilesal.md`.
  - One more model id in the registry to maintain on schema-version bumps.
- Neutral / follow-ups:
  - T6-2a-followup: export real upstream `yun-liu/MobileSal` weights with ImageNet-in-graph normalisation and a dynamic NCHW shape; flip `smoke: true` → `false` in the registry and bump the model id to `mobilesal_v1`.
  - T6-2b: a `tools/vmaf-roi` CLI consuming the same model and writing per-CTU QP-offset sidecars in encoder-native format, plus a saliency-weighted L1/L2 FR feature.
  - Future: once `VmafFeatureCollector` grows tensor-valued features, expose the saliency map directly instead of routing through `tools/vmaf-roi`.
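The structural difference between the T6-2a scalar and what T6-2b needs is the pooling step: one mean over the whole frame versus one mean per coding block. The sketch below assumes a row-major H×W map and square 64×64 blocks (64 is x265's default CTU size; the grid T6-2b actually targets is encoder-specific and not fixed by this ADR). `ctu_grid_means` and `frame_mean` are hypothetical helpers, and the mapping from block means to QP offsets is deliberately left out.

```python
from typing import List, Sequence


def ctu_grid_means(smap: Sequence[Sequence[float]], ctu: int = 64) -> List[List[float]]:
    """Average a row-major H x W saliency map over ctu x ctu blocks.

    Edge blocks that extend past the frame border are averaged over the
    pixels they actually cover, e.g. with ctu=64 a 1080-row frame ends in
    a block row of 1080 - 16*64 = 56 pixel rows.
    """
    height = len(smap)
    width = len(smap[0]) if height else 0
    grid = []
    for y0 in range(0, height, ctu):
        grid_row = []
        for x0 in range(0, width, ctu):
            total, count = 0.0, 0
            for y in range(y0, min(y0 + ctu, height)):
                row = smap[y]
                for x in range(x0, min(x0 + ctu, width)):
                    total += row[x]
                    count += 1
            grid_row.append(total / count)
        grid.append(grid_row)
    return grid


def frame_mean(smap: Sequence[Sequence[float]]) -> float:
    """The T6-2a reduction: one scalar for the whole frame."""
    values = [v for row in smap for v in row]
    return sum(values) / len(values)


if __name__ == "__main__":
    # 4x4 toy map with one salient 2x2 quadrant, pooled with 2x2 blocks.
    toy = [[1.0, 1.0, 0.0, 0.0],
           [1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
    print(ctu_grid_means(toy, ctu=2))  # [[1.0, 0.0], [0.0, 0.0]]
    print(frame_mean(toy))             # 0.25
```

The frame mean reduces the toy map to a single 0.25 and loses where the salient region is; the block grid keeps its location, which is the information a per-CTU sidecar depends on.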
References
- `docs/ai/roadmap.md` §2.3: Wave 1 MobileSal scope.
- ADR-0036: Wave 1 scope.
- ADR-0041: sister LPIPS-SqueezeNet extractor; shared YUV → ImageNet-RGB plumbing.
- ADR-0042: tiny-AI doc-substance rule this PR satisfies.
- ADR-0108: fork-local PR deep-dive deliverables checklist.
- ADR-0168: baseline tiny-AI checkpoint shape (sidecar + registry + sha256).
- Upstream paper: Wu, Liu, Cheng, Lu, Cheng, "MobileSal: Extremely Efficient RGB-D Salient Object Detection", IEEE TPAMI 2021.
- Upstream code: https://github.com/yun-liu/MobileSal (MIT).
- Source: `.workingdir2/BACKLOG.md` row T6-2a.