ADR-0218: MobileSal saliency feature extractor (T6-2a)

Status: Accepted
Date: 2026-04-29
Deciders: Lusoris, Claude (Anthropic)
Tags: ai, dnn, feature-extractor, saliency, fork-local

Context

The Wave 1 tiny-AI roadmap (docs/ai/roadmap.md §2.3) and ADR-0036 (Tiny-AI Wave 1 scope expansion) commit the fork to a MobileSal saliency surface that feeds two downstream consumers: a scoring-side saliency-weighted variant of VMAF, and an encoder-side per-CTU QP-offset sidecar consumed by x265 --qpfile / SVT-AV1 ROI APIs. Backlog item T6-2 originally bundled both. On revisiting it ahead of implementation, we found that the scoring-side extractor and the encoder-side sidecar tool have nothing structurally in common (different output schemas, different consumers, different test surfaces), so the backlog row was split into T6-2a (scoring-side extractor) and T6-2b (sidecar tool).

This ADR covers T6-2a only. Open questions resolved here:

Real upstream weights vs synthetic placeholder. The upstream yun-liu/MobileSal checkpoint is ~10 MB and MIT-licensed, but re-exporting it cleanly with a static op set and ImageNet-in-graph normalisation is a non-trivial export-script effort, and the rest of T6-2a (C wiring, registry plumbing, smoke test, docs) does not depend on it. The pattern set by FastDVDnet (T6-7, PR #203) and the smoke fixtures in model/tiny/smoke_v0.onnx is to ship a tiny synthetic placeholder that matches the I/O contract exactly while real weights remain a tracked follow-up.
Scalar mean vs full-map export. A per-pixel saliency feature would be several MB per frame at 1080p (about 8 MB as float32: 1920 × 1080 × 4 bytes). The existing VmafFeatureCollector API is scalar-per-frame, and refactoring it for tensor-valued features is well beyond the scope of T6-2a.
T6-2a vs T6-2b boundary. The encoder-side sidecar tool needs a different output format (encoder-native CTU grid), a different CLI surface (tools/vmaf-roi), and its own integration tests against x265/SVT-AV1. Bundling it into T6-2a would balloon the PR and delay landing the scoring-side surface.
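
The size argument behind the scalar-vs-map question can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a dense float32 [1,1,H,W] map at 1920×1080 and assumes the per-frame scalar is stored as a C double; neither storage width is stated in this ADR.

```python
# Back-of-envelope storage cost of one frame of saliency output at 1080p.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_FLOAT32 = 4   # assumption: the map is exported as float32
BYTES_PER_DOUBLE = 8    # assumption: the per-frame score is a C double


def map_bytes(width: int, height: int, bytes_per_value: int) -> int:
    """Size in bytes of a dense single-channel [1, 1, H, W] map."""
    return width * height * bytes_per_value


full_map_bytes = map_bytes(WIDTH, HEIGHT, BYTES_PER_FLOAT32)
scalar_bytes = BYTES_PER_DOUBLE

print(f"full map per frame: {full_map_bytes} bytes ({full_map_bytes / 1e6:.1f} MB)")
print(f"scalar per frame:   {scalar_bytes} bytes")
print(f"ratio:              {full_map_bytes // scalar_bytes}x")
```

Under these assumptions the dense map is roughly six orders of magnitude larger per frame than the scalar, which is why the scalar fits a per-frame score collector and the map does not.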

Decision

We add a new no-reference feature extractor, mobilesal, under core/src/feature/feature_mobilesal.c. It runs a single ONNX saliency session over the distorted frame and emits a scalar saliency_mean per frame via vmaf_feature_collector_append. The session binds tensors by name: an input tensor named input (NCHW float32 RGB, ImageNet-normalised) and an output tensor named saliency_map (NCHW float32 [1,1,H,W]). Any future drop-in (real upstream MobileSal export, distilled student) can therefore replace the placeholder without C changes. The shipped checkpoint model/tiny/mobilesal.onnx is a smoke-only synthetic placeholder generated by scripts/gen_mobilesal_placeholder_onnx.py (3→1 1×1 Conv + Sigmoid). Real upstream MIT-licensed weights are the T6-2a-followup task. The encoder-side per-CTU QP-offset sidecar remains the T6-2b follow-up.
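
To make the data flow concrete, here is a minimal pure-Python sketch of what the placeholder graph plus the scalar pooling compute: ImageNet normalisation, a 3→1 pointwise convolution, a sigmoid, and a mean over the map. It is not the extractor itself, which is C code driving an ONNX session. Nested lists stand in for tensors, the normalisation constants are the commonly used ImageNet statistics (the ADR does not list them), and the weights are hypothetical.

```python
import math

# Widely used ImageNet channel statistics (RGB, pixel values scaled to [0, 1]).
# Assumption: the ADR says "ImageNet-normalised" without listing constants.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalise(rgb):
    """Map a [3][H][W] RGB frame in [0, 1] to ImageNet-normalised values."""
    return [
        [[(v - mean) / std for v in row] for row in plane]
        for plane, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    ]


def conv1x1_sigmoid(chw, weights, bias):
    """Apply a 3->1 pointwise convolution, then a sigmoid; returns [H][W]."""
    height, width = len(chw[0]), len(chw[0][0])
    sal_map = []
    for y in range(height):
        row = []
        for x in range(width):
            z = bias + sum(w * chw[c][y][x] for c, w in enumerate(weights))
            row.append(1.0 / (1.0 + math.exp(-z)))
        sal_map.append(row)
    return sal_map


def saliency_mean(sal_map):
    """Pool the [H][W] map down to the single per-frame scalar."""
    values = [v for row in sal_map for v in row]
    return sum(values) / len(values)


# A 2x2 toy frame: R, G, B planes with values in [0, 1].
frame = [
    [[0.0, 0.25], [0.5, 1.0]],
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.9, 0.8], [0.7, 0.6]],
]
x = normalise(frame)

# Hypothetical weights, for illustration only (not the shipped placeholder's).
score = saliency_mean(conv1x1_sigmoid(x, weights=(0.2, 0.5, 0.3), bias=0.0))

# With all-zero weights the map collapses to sigmoid(bias) for any input,
# which is one way a placeholder can end up content-independent.
flat = saliency_mean(conv1x1_sigmoid(x, weights=(0.0, 0.0, 0.0), bias=0.0))

print(f"saliency_mean (toy weights):  {score:.4f}")
print(f"saliency_mean (zero weights): {flat:.4f}")
```

Because the sigmoid bounds every map value to (0, 1), the pooled scalar stays in that range whatever weights a later drop-in model carries, as long as it keeps the same output contract.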

Alternatives considered

Option	Pros	Cons	Why not chosen
Real upstream MobileSal weights now	Immediate quality signal; users can correlate against actual saliency content	Upstream re-export non-trivial (custom ops, ImageNet-in-graph, opset alignment); blocks unrelated PR scope; pulls in 10 MB binary before the surface stabilises	Deferred to T6-2a-followup; placeholder unblocks plumbing
Synthetic placeholder ONNX (smoke=true)	Zero-friction landing; locks down C / registry / sidecar / docs; deterministic sha256; 330 bytes; same precedent as `smoke_v0.onnx`	Score is content-independent until real weights land	Chosen — placeholder is explicitly labelled smoke-only in `registry.json` and `docs/ai/models/mobilesal.md`
Skip the placeholder, gate the extractor on a missing `.onnx`	Smaller PR	Leaves the feature uncallable end-to-end; smoke test cannot exercise the pipeline; pattern diverges from `smoke_v0` / `smoke_fp16_v0` precedent	Rejected — defeats T6-2a's purpose
Scalar `saliency_mean` only	Fits existing `VmafFeatureCollector` API; matches every other feature in the tree; single PR	Discards spatial information that the encoder-side T6-2b needs	Chosen for T6-2a — T6-2b will read the map directly from the same session in its own CLI
Full per-pixel saliency map as a feature	Maximum information for downstream consumers	Requires `VmafFeatureCollector` redesign for tensor features; balloons PR; not consumed by anything yet (saliency-weighted L1 / L2 are T6-2b/c)	Deferred — T6-2b's `tools/vmaf-roi` will consume the map directly without going through the feature collector
Saliency-weighted L1 in the same PR	Matches the Wave 1 spec end-to-end	Requires a second feature extractor (FR), a paired ref/dist tensor scratch, and at least two more golden-test fixtures	Deferred to T6-2b — a saliency-weighted FR pooler is its own surface
Bundle T6-2a + T6-2b in one PR	Single ADR, single review pass	Mixes scoring-side feature wiring with encoder-side CLI, encoder format probing, x265/SVT-AV1 integration tests; unclear PR scope	Rejected — already the reason the backlog row was subdivided

Consequences

Positive:
Wave 1 §2.3 scoring-side surface lands in one focused PR.
The C wiring is shape-clean for any future MobileSal export that obeys the input / saliency_map name contract.
Smoke gate (test_mobilesal) verifies registration + option table + missing-model decline path, just like test_lpips.
The placeholder ONNX is 330 bytes and deterministic — no big binary churn in the tree.
Negative:
The placeholder's saliency_mean is content-independent; the feature is not yet useful for ranking quality. The PR labels this clearly in registry.json (smoke: true) and in docs/ai/models/mobilesal.md.
One more model id in the registry to maintain on schema-version bumps.
Neutral / follow-ups:
T6-2a-followup: export real upstream yun-liu/MobileSal weights with ImageNet-in-graph + dynamic NCHW shape; flip smoke: true → false in the registry and bump the model id to mobilesal_v1.
T6-2b: tools/vmaf-roi CLI consuming the same model and writing per-CTU QP-offset sidecars in encoder-native format; saliency-weighted L1/L2 FR feature.
Future: once VmafFeatureCollector grows tensor-valued features, expose the saliency map directly instead of routing through tools/vmaf-roi.
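
Because the placeholder is deterministic and the checkpoint convention pairs each model with a sha256, a regenerated file can be checked against its recorded digest. The sketch below shows the idea with Python's hashlib. The helper names are hypothetical, the bytes are an in-memory stand-in rather than the real model file, and how the fork's registry actually stores or checks digests is not described in this ADR.

```python
import hashlib


def sha256_hex(blob: bytes) -> str:
    """Return the lowercase hex sha256 digest of a byte string."""
    return hashlib.sha256(blob).hexdigest()


def verify_checkpoint(blob: bytes, expected_sha256: str) -> bool:
    """Compare a checkpoint's digest with the recorded one (case-insensitive)."""
    return sha256_hex(blob) == expected_sha256.strip().lower()


# Stand-in bytes; a real check would read the .onnx file in binary mode.
placeholder_blob = b"stand-in for the placeholder model bytes"
recorded = sha256_hex(placeholder_blob)  # hypothetical recorded digest

print("matches recorded digest:", verify_checkpoint(placeholder_blob, recorded))
print("detects a changed byte: ", not verify_checkpoint(placeholder_blob + b"\x00", recorded))
```

A generator that is truly deterministic should reproduce the same digest on every run, so a mismatch points at either a changed script or a changed exporter dependency.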

References

docs/ai/roadmap.md §2.3 — Wave 1 MobileSal scope.
ADR-0036 — Wave 1 scope.
ADR-0041 — sister LPIPS-SqueezeNet extractor; shared YUV → ImageNet-RGB plumbing.
ADR-0042 — tiny-AI doc-substance rule this PR satisfies.
ADR-0108 — fork-local PR deep-dive deliverables checklist.
ADR-0168 — baseline tiny-AI checkpoint shape (sidecar + registry + sha256).
Upstream paper: Wu, Liu, Xu, Bian, Gu, Cheng, "MobileSal: Extremely Efficient RGB-D Salient Object Detection", IEEE TPAMI 2021.
Upstream code: https://github.com/yun-liu/MobileSal (MIT).
Source: .workingdir2/BACKLOG.md row T6-2a.