
ADR-0218: MobileSal saliency feature extractor (T6-2a)

  • Status: Accepted
  • Date: 2026-04-29
  • Deciders: Lusoris, Claude (Anthropic)
  • Tags: ai, dnn, feature-extractor, saliency, fork-local

Context

The Wave 1 tiny-AI roadmap (docs/ai/roadmap.md §2.3) and ADR-0036 (Tiny-AI Wave 1 scope expansion) commit the fork to a MobileSal saliency surface that feeds two downstream consumers: a scoring-side saliency-weighted variant of VMAF, and an encoder-side per-CTU QP-offset sidecar consumed by x265 --qpfile / SVT-AV1 ROI APIs. Backlog item T6-2 originally bundled both. On revisiting it ahead of implementation, we found that the scoring-side extractor and the encoder-side sidecar tool have nothing structurally in common (different output schemas, different consumers, different test surfaces), so the backlog row was split into T6-2a (scoring-side extractor) and T6-2b (sidecar tool).

This ADR covers T6-2a only. Open questions resolved here:

  1. Real upstream weights vs synthetic placeholder. The upstream yun-liu/MobileSal checkpoint is ~10 MB and MIT-licensed, but re-exporting it cleanly with a static op set and ImageNet-in-graph normalisation is a non-trivial export-script effort the rest of T6-2a (C wiring, registry plumbing, smoke test, docs) does not block on. The pattern set by FastDVDnet (T6-7, PR #203) and the smoke fixtures in model/tiny/smoke_v0.onnx is to ship a tiny synthetic placeholder that matches the I/O contract bit-for-bit while real weights remain a tracked follow-up.
  2. Scalar mean vs full-map export. A per-pixel saliency feature would be several MB per frame at 1080p (about 8 MB if stored as float32: 1920 × 1080 × 4 bytes). The existing VmafFeatureCollector API is scalar-per-frame, and refactoring it for tensor-valued features is well beyond the scope of T6-2a.
  3. T6-2a vs T6-2b boundary. The encoder-side sidecar tool needs a different output format (encoder-native CTU grid), a different CLI surface (tools/vmaf-roi), and its own integration tests against x265/SVT-AV1. Bundling it into T6-2a would balloon the PR and delay landing the scoring-side surface.
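
The size argument in question 2 can be checked with a short back-of-the-envelope calculation. This is only a sketch: it assumes a single-channel float32 map at 1920 × 1080 and a scalar stored as a 64-bit double, and the helper name map_bytes is made up for illustration.

```python
# Back-of-the-envelope payload comparison; all figures are illustrative.
WIDTH, HEIGHT = 1920, 1080   # assumed 1080p frame
BYTES_PER_FLOAT32 = 4        # assumed storage type for the saliency map
BYTES_PER_DOUBLE = 8         # assumed storage type for one scalar feature


def map_bytes(width: int, height: int, bytes_per_sample: int = BYTES_PER_FLOAT32) -> int:
    """Size in bytes of one single-channel [1,1,H,W] map for one frame."""
    return width * height * bytes_per_sample


full_map = map_bytes(WIDTH, HEIGHT)      # per-pixel map, one frame
ratio = full_map // BYTES_PER_DOUBLE     # how many scalars fit in one map

print(f"full map: {full_map} bytes (~{full_map / 1e6:.1f} MB)")
print(f"scalar:   {BYTES_PER_DOUBLE} bytes, ratio {ratio}x")
```

Under these assumptions a full map costs roughly a million times the storage of the scalar per frame, which is the gap that a tensor-valued collector would have to absorb.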

Decision

We add a new no-reference feature extractor, mobilesal, under core/src/feature/feature_mobilesal.c. It runs a single ONNX saliency session over the distorted frame and emits one scalar, saliency_mean, per frame via vmaf_feature_collector_append. The session binds tensors by name: an input tensor named input (NCHW float32 RGB, ImageNet-normalised) and an output tensor named saliency_map (NCHW float32 [1,1,H,W]). Any future drop-in (a real upstream MobileSal export, a distilled student) can therefore replace the placeholder without C changes. The shipped checkpoint model/tiny/mobilesal.onnx is a smoke-only synthetic placeholder generated by scripts/gen_mobilesal_placeholder_onnx.py (a 3→1 1×1 Conv followed by a Sigmoid). Exporting the real upstream MIT-licensed weights is the T6-2a-followup task, and the encoder-side per-CTU QP-offset sidecar remains the T6-2b follow-up.
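
As a rough illustration of the data flow described above (ImageNet normalisation, a 3→1 1×1 convolution, a sigmoid, then a mean), the following pure-Python sketch mirrors the arithmetic only. It is not the C extractor and does not use ONNX. The weights and bias are hypothetical, since the ADR does not state the placeholder's values; the all-zero choice is just one assumed way to obtain a content-independent score.

```python
import math

# Widely used ImageNet per-channel statistics, RGB order.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalise(rgb):
    """Map one RGB pixel in [0, 1] to ImageNet-normalised values."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def saliency_mean(frame, weights, bias):
    """frame: H x W nested list of (r, g, b) tuples in [0, 1].

    A 3->1 1x1 convolution is a per-pixel dot product over the channels.
    The sigmoid output is averaged over all pixels into one scalar.
    """
    total, count = 0.0, 0
    for row in frame:
        for pixel in row:
            x = normalise(pixel)
            logit = sum(w * c for w, c in zip(weights, x)) + bias
            total += sigmoid(logit)
            count += 1
    return total / count


# Hypothetical parameters: all-zero weights make every pixel map to sigmoid(bias).
ZERO_W, BIAS = (0.0, 0.0, 0.0), 0.0
dark = [[(0.0, 0.0, 0.0)] * 4 for _ in range(2)]
bright = [[(1.0, 1.0, 1.0)] * 4 for _ in range(2)]

print(saliency_mean(dark, ZERO_W, BIAS), saliency_mean(bright, ZERO_W, BIAS))
```

With non-zero weights the same function becomes content-dependent, which is the behaviour a real export would be expected to show through the unchanged input / saliency_map contract.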

Alternatives considered

| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Real upstream MobileSal weights now | Immediate quality signal; users can correlate against actual saliency content | Upstream re-export non-trivial (custom ops, ImageNet-in-graph, opset alignment); blocks unrelated PR scope; pulls in 10 MB binary before the surface stabilises | Deferred to T6-2a-followup; placeholder unblocks plumbing |
| Synthetic placeholder ONNX (smoke=true) | Zero-friction landing; locks down C / registry / sidecar / docs; deterministic sha256; 330 bytes; same precedent as smoke_v0.onnx | Score is content-independent until real weights land | Chosen: placeholder is explicitly labelled smoke-only in registry.json and docs/ai/models/mobilesal.md |
| Skip the placeholder, gate the extractor on a missing .onnx | Smaller PR | Leaves the feature uncallable end-to-end; smoke test cannot exercise the pipeline; pattern diverges from smoke_v0 / smoke_fp16_v0 precedent | Rejected: defeats T6-2a's purpose |
| Scalar saliency_mean only | Fits existing VmafFeatureCollector API; matches every other feature in the tree; single PR | Discards spatial information that the encoder-side T6-2b needs | Chosen for T6-2a: T6-2b will read the map directly from the same session in its own CLI |
| Full per-pixel saliency map as a feature | Maximum information for downstream consumers | Requires VmafFeatureCollector redesign for tensor features; balloons PR; not consumed by anything yet (saliency-weighted L1 / L2 are T6-2b/c) | Deferred: T6-2b's tools/vmaf-roi will consume the map directly without going through the feature collector |
| Saliency-weighted L1 in the same PR | Matches the Wave 1 spec end-to-end | Requires a second feature extractor (FR), a paired ref/dist tensor scratch, and at least two more golden-test fixtures | Deferred to T6-2b: a saliency-weighted FR pooler is its own surface |
| Bundle T6-2a + T6-2b in one PR | Single ADR, single review pass | Mixes scoring-side feature wiring with encoder-side CLI, encoder format probing, x265/SVT-AV1 integration tests; unclear PR scope | Rejected: already the reason the backlog row was subdivided |
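
The trade-off between the scalar and the full map can be made concrete with a toy example. The sketch below is illustrative only: the 4 × 4 maps and the block size of 2 stand in for a real frame and a real CTU size, and block_means is a hypothetical helper, not part of tools/vmaf-roi.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def scalar_mean(sal_map):
    """Collapse an H x W map to the single per-frame scalar."""
    return mean(v for row in sal_map for v in row)


def block_means(sal_map, block):
    """Average the map over a grid of block x block tiles (edge tiles may be smaller)."""
    h, w = len(sal_map), len(sal_map[0])
    grid = []
    for y0 in range(0, h, block):
        grid_row = []
        for x0 in range(0, w, block):
            tile = [sal_map[y][x]
                    for y in range(y0, min(y0 + block, h))
                    for x in range(x0, min(x0 + block, w))]
            grid_row.append(mean(tile))
        grid.append(grid_row)
    return grid


# Two toy maps: the salient region sits on the left in one and on the right in the other.
left = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
right = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

print(scalar_mean(left), scalar_mean(right))          # identical scalars
print(block_means(left, 2), block_means(right, 2))    # different grids
```

Both maps collapse to the same scalar while their block grids differ, which is why a per-CTU consumer needs the map itself rather than saliency_mean.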

Consequences

  • Positive:
  • Wave 1 §2.3 scoring-side surface lands in one focused PR.
  • The C wiring is shape-clean for any future MobileSal export that obeys the input / saliency_map name contract.
  • Smoke gate (test_mobilesal) verifies registration + option table + missing-model decline path, just like test_lpips.
  • The placeholder ONNX is 330 bytes and deterministic — no big binary churn in the tree.
  • Negative:
  • The placeholder's saliency_mean is content-independent; the feature is not yet useful for ranking quality. The PR labels this clearly in registry.json (smoke: true) and in docs/ai/models/mobilesal.md.
  • One more model id in the registry to maintain on schema-version bumps.
  • Neutral / follow-ups:
  • T6-2a-followup: export real upstream yun-liu/MobileSal weights with ImageNet-in-graph normalisation and a dynamic NCHW shape; flip smoke: true → false in the registry and bump the model id to mobilesal_v1.
  • T6-2b: tools/vmaf-roi CLI consuming the same model and writing per-CTU QP-offset sidecars in encoder-native format; saliency-weighted L1/L2 FR feature.
  • Future: once VmafFeatureCollector grows tensor-valued features, expose the saliency map directly instead of routing through tools/vmaf-roi.
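
The registry side of the T6-2a-followup could look roughly like the sketch below. The entry layout (id, smoke, sha256 keys), the starting id mobilesal_placeholder, and the helper names are assumptions made for illustration; the actual registry.json schema is not reproduced in this ADR. The byte strings are stand-ins, not real checkpoints.

```python
import hashlib


def sha256_hex(blob: bytes) -> str:
    """Hex digest used to pin a checkpoint's exact bytes."""
    return hashlib.sha256(blob).hexdigest()


def promote(entry: dict, new_id: str, new_blob: bytes) -> dict:
    """Return an updated copy of a (hypothetical) registry entry for a real export."""
    updated = dict(entry)
    updated["id"] = new_id
    updated["smoke"] = False
    updated["sha256"] = sha256_hex(new_blob)
    return updated


def verify(entry: dict, blob: bytes) -> bool:
    """Check that the bytes on disk match the digest recorded in the entry."""
    return entry["sha256"] == sha256_hex(blob)


placeholder_blob = b"placeholder-onnx-bytes"   # stand-in for the 330-byte placeholder
real_blob = b"real-export-bytes"               # stand-in for a future real export

entry = {"id": "mobilesal_placeholder", "smoke": True,
         "sha256": sha256_hex(placeholder_blob)}
promoted = promote(entry, "mobilesal_v1", real_blob)

print(promoted["id"], promoted["smoke"], verify(promoted, real_blob))
```

Returning a copy rather than mutating the entry keeps the placeholder record available for comparison while the new id is reviewed.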

References

  • docs/ai/roadmap.md §2.3 — Wave 1 MobileSal scope.
  • ADR-0036 — Wave 1 scope.
  • ADR-0041 — sister LPIPS-SqueezeNet extractor; shared YUV → ImageNet-RGB plumbing.
  • ADR-0042 — tiny-AI doc-substance rule this PR satisfies.
  • ADR-0108 — fork-local PR deep-dive deliverables checklist.
  • ADR-0168 — baseline tiny-AI checkpoint shape (sidecar + registry + sha256).
  • Upstream paper: Wu, Liu, Xu, Bian, Gu, Cheng, "MobileSal: Extremely Efficient RGB-D Salient Object Detection", IEEE TPAMI 2021.
  • Upstream code: https://github.com/yun-liu/MobileSal (MIT).
  • Source: .workingdir2/BACKLOG.md row T6-2a.