saliency_student_v2: Resize-decoder ablation on the v1 recipe
saliency_student_v2 is a fork-trained tiny saliency student that exercises the bilinear-resize-then-conv decoder pattern admitted by ADR-0258. It ships alongside saliency_student_v1 under model/tiny/. Since 2026-05-15 v2 is the production weight set for the C-side mobilesal extractor; v1 is retained as a regression baseline and for backwards compatibility.
Status: Production (default since 2026-05-15). Promoted to production default for the mobilesal extractor; supersedes saliency_student_v1 (IoU 0.7105 vs 0.6558, +8.3 %). See ADR-0444, ADR-0332, and Research-0089.
What changed vs v1
The encoder, channels, skip connections, loss, optimizer, schedule, augmentation pipeline, seed, dataset, and held-out split are all identical to v1 — this is a single-variable ablation on the decoder upsampler:
| Decoder upsampler | v1 | v2 |
|---|---|---|
| Op | nn.ConvTranspose2d(k=2, s=2, no bias) | F.interpolate(scale=2, bilinear, align_corners=False) + nn.Conv2d(k=3, p=1, no bias) |
| ONNX op | ConvTranspose | Resize (mode=linear, coordinate_transformation_mode=half_pixel) + Conv |
| Allowlist gate | pre-ADR-0258 | post-ADR-0258 |
| Trainable params | 112 841 | 123 721 (+9.6 %) |
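The parameter delta in the last row follows from the kernel sizes. A bias-free ConvTranspose2d and a bias-free Conv2d both store C_in × C_out × k² weights, so swapping a k=2 transposed conv for a k=3 conv adds 5 × C_in × C_out weights per upsampler. A minimal sketch of that arithmetic, assuming each v2 upsampler keeps v1's input/output widths; the card does not list those widths, so the 32 → 16 stage below is hypothetical.

```python
def conv_weight_count(c_in: int, c_out: int, k: int) -> int:
    """Weights in a bias-free Conv2d or ConvTranspose2d with a k x k kernel."""
    return c_in * c_out * k * k


def upsampler_delta(c_in: int, c_out: int) -> int:
    """Extra weights when ConvTranspose2d(k=2) is replaced by Conv2d(k=3)."""
    return conv_weight_count(c_in, c_out, 3) - conv_weight_count(c_in, c_out, 2)


V1_PARAMS = 112_841
V2_PARAMS = 123_721

delta = V2_PARAMS - V1_PARAMS      # extra trainable parameters in v2
relative = delta / V1_PARAMS       # relative growth vs v1
# Each upsampler adds 5 * C_in * C_out, so under the stated assumption the
# decoder's upsamplers satisfy sum(C_in * C_out) == delta / 5.
sum_cin_cout = delta // 5

print(delta, f"{relative:.1%}", sum_cin_cout)  # 10880 9.6% 2176
print(upsampler_delta(32, 16))                 # 2560 (hypothetical stage)
```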
Shipped checkpoint
| Field | Value |
|---|---|
| Model id | saliency_student_v2 |
| Display name | vmaf_tiny_saliency_student_v2 |
| Location | model/tiny/saliency_student_v2.onnx |
| Sidecar | model/tiny/saliency_student_v2.json |
| Architecture | TinyU-Net with Resize+3×3-Conv decoder; otherwise identical to v1 |
| Trainable parameters | 123 721 |
| Input | input — float32 NCHW [1, 3, H, W] ImageNet-normalised RGB |
| Output | saliency_map — float32 NCHW [1, 1, H, W] per-pixel saliency in [0, 1] |
| ONNX opset | 17 |
| Training corpus | DUTS-TR (Wang et al. 2017), same 10 553 pairs as v1 — not redistributed in-tree |
| Held-out validation IoU | 0.7105 on the 5 % DUTS-TR validation fold (seed=42) — gate PASS vs v1's 0.6558 (+0.0547 / +8.3 %); full per-epoch trace in build_artifacts/saliency_student_v2_train.json |
| PyTorch ↔ ONNX parity max-abs-diff | 3.278e-6 (threshold 1e-5; same threshold v1 used) |
| Training wall-clock | 573.0 s (≈ 9.5 min) on RTX 4090, CUDA 13, PyTorch 2.11 |
| License | BSD-3-Clause-Plus-Patent |
| Trainer | ai/scripts/train_saliency_student_v2.py |
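A sketch of the two gates recorded in the table, for illustration only. The 0.5 binarisation threshold, the "strictly better than v1" IoU rule, and every function name here are assumptions rather than details of the trainer or of validate_saliency_student.py; how per-image IoU is aggregated over the validation fold is not shown.

```python
def binary_iou(pred, target, threshold=0.5):
    """IoU of two flattened saliency maps after binarising at `threshold`."""
    if len(pred) != len(target):
        raise ValueError("maps must have the same number of pixels")
    inter = union = 0
    for p, t in zip(pred, target):
        p_on, t_on = p >= threshold, t >= threshold
        if p_on and t_on:
            inter += 1
        if p_on or t_on:
            union += 1
    return inter / union if union else 1.0  # two empty masks count as a match


def iou_gate(candidate, baseline):
    """Hypothetical rule: PASS when the candidate strictly beats the baseline."""
    gain = candidate - baseline
    return gain > 0, gain, gain / baseline


def max_abs_diff(a, b):
    """Largest element-wise |a - b| between two flattened model outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same shape")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def parity_gate(torch_out, ort_out, threshold=1e-5):
    """PASS when PyTorch and ONNX Runtime outputs agree within `threshold`."""
    return max_abs_diff(torch_out, ort_out) <= threshold


passed, gain, rel = iou_gate(0.7105, 0.6558)
print(passed, round(gain, 4), f"{rel:.1%}")  # True 0.0547 8.3%
```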
The training run is deterministic given a fixed --seed (default 42) and pinned PyTorch / CUDA versions. The JSON written by --metrics-out includes an ADR-0661 run_provenance block recording the DUTS-TR root, the ONNX and metrics output paths, the parsed training arguments, and the original argv.
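A sketch of how a provenance record of that kind could be assembled from argparse. The key names and build_run_provenance are hypothetical; the authoritative schema is ADR-0661 and whatever the trainer actually writes.

```python
import argparse
import json


def build_run_provenance(args: argparse.Namespace, argv: list) -> dict:
    """HYPOTHETICAL run_provenance shape; key names are illustrative only."""
    return {
        "duts_root": args.duts_root,
        "onnx_output": args.output,
        "metrics_output": args.metrics_out,
        "args": dict(vars(args)),  # parsed training arguments (a copy)
        "argv": list(argv),        # original command line, program name first
    }


parser = argparse.ArgumentParser(prog="train_saliency_student_v2.py")
parser.add_argument("--duts-root")
parser.add_argument("--output")
parser.add_argument("--metrics-out")
parser.add_argument("--seed", type=int, default=42)

cli = ["--duts-root", "/data/DUTS-TR", "--output", "v2.onnx",
       "--metrics-out", "train.json"]
args = parser.parse_args(cli)
record = {"run_provenance": build_run_provenance(args, [parser.prog, *cli])}
print(json.dumps(record, indent=2, sort_keys=True))
```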
Training corpus provenance
Identical to v1: DUTS-TR (Wang, Lu, Wang, Feng, Wang, Yin, Ruan, "Learning to Detect Salient Objects with Image-Level Supervision", CVPR 2017). Project page: http://saliencydetection.net/duts/. Direct URL: https://saliencydetection.net/duts/download/DUTS-TR.zip. Distribution: free for academic and research purposes (see project page). The SHA-256 of the downloaded zip is recorded in saliency_student_v1.md. The DUTS images are not committed to this repository; only the trained weights are.
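Because the images are fetched out of tree, a streamed SHA-256 check is a cheap guard before training. A stdlib sketch; sha256_of and verify_download are illustrative names, and the expected digest has to be copied from saliency_student_v1.md.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so the whole zip never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return the digest if it matches `expected_hex`, else raise ValueError."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"{path}: SHA-256 {actual} != expected {expected_hex}")
    return actual


# Usage (digest taken from saliency_student_v1.md, not reproduced here):
# verify_download("DUTS-TR.zip", "<digest from saliency_student_v1.md>")
```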
Op-allowlist conformance
Every op in the v2 graph is on the allowlist in core/src/dnn/op_allowlist.c as amended by ADR-0258.
Relative to v1, Resize is the new entry and ConvTranspose is no longer required. Constant nodes materialise the integer-pair output spatial dims used as the resample target; Constant is benign and already on the allowlist.
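Since ADR-0258 scopes the allowlist to op types, conformance reduces to a set difference over the graph's node op types. A sketch under stated assumptions: the allowlist sets are hypothetical excerpts (the real list is core/src/dnn/op_allowlist.c), the op sequences are toy graphs rather than the shipped ones, and only top-level graph nodes are inspected.

```python
def disallowed_op_types(op_types, allowlist):
    """Op types present in a graph but absent from the allowlist, sorted."""
    return sorted(set(op_types) - set(allowlist))


def graph_op_types(onnx_path):
    """Op types of the top-level graph nodes; requires the `onnx` package."""
    import onnx  # lazy import so the set logic above runs without onnx

    return [node.op_type for node in onnx.load(onnx_path).graph.node]


# HYPOTHETICAL allowlist excerpts; the real list is core/src/dnn/op_allowlist.c.
PRE_ADR_0258 = {"Conv", "ConvTranspose", "Constant"}
POST_ADR_0258 = PRE_ADR_0258 | {"Resize"}

# Toy op sequences, not the shipped graphs.
v1_ops = ["Conv", "ConvTranspose", "Conv"]
v2_ops = ["Conv", "Constant", "Resize", "Conv"]

print(disallowed_op_types(v2_ops, PRE_ADR_0258))   # ['Resize']
print(disallowed_op_types(v2_ops, POST_ADR_0258))  # []
```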
ADR-0258's policy scopes the allowlist to op types only and delegates attribute enforcement to ONNX Runtime (ORT). v2's Resize nodes use mode='linear' and coordinate_transformation_mode='half_pixel' with no antialiasing (antialias=0), which is what PyTorch exports for bilinear interpolation with align_corners=False at opset 17. The fork's wire-format scanner does not gate these attributes; ORT supports this combination.
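What half_pixel linear resampling computes can be shown in pure Python. Each output pixel maps back to source coordinate (x_out + 0.5) / scale - 0.5, is clamped at the borders, and blends its two neighbours; 2D bilinear is the same operation applied along rows and then along columns. An illustrative reference only, not the ORT or PyTorch kernel.

```python
import math


def half_pixel_source(dst: int, scale: float) -> float:
    """ONNX half_pixel mapping, which PyTorch's align_corners=False also uses."""
    return (dst + 0.5) / scale - 0.5


def upsample_linear_1d(values, scale=2):
    """Linear upsampling of one row with half_pixel coordinates."""
    n_in = len(values)
    out = []
    for dst in range(int(n_in * scale)):
        src = max(half_pixel_source(dst, scale), 0.0)  # clamp the left border
        i0 = min(math.floor(src), n_in - 1)
        i1 = min(i0 + 1, n_in - 1)                     # clamp the right border
        w = src - i0
        out.append(values[i0] * (1.0 - w) + values[i1] * w)
    return out


def upsample_bilinear_2d(image, scale=2):
    """Bilinear = linear along each row, then along each column."""
    rows = [upsample_linear_1d(row, scale) for row in image]
    cols = [upsample_linear_1d(list(col), scale) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


print(upsample_linear_1d([0.0, 1.0, 2.0, 3.0]))
# [0.0, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.0]
```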
Usage: CLI
```shell
vmaf \
  --reference ref.yuv \
  --distorted dist.yuv \
  --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
  --feature mobilesal \
  --feature_params mobilesal:model_path=model/tiny/saliency_student_v2.onnx \
  --output score.json
```
Equivalently, set the model path via env var:
```shell
VMAF_MOBILESAL_MODEL_PATH=model/tiny/saliency_student_v2.onnx \
  vmaf --reference ref.yuv --distorted dist.yuv \
  --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 \
  --feature mobilesal --output score.json
```
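To pull per-frame mobilesal values out of the report, a sketch assuming upstream vmaf's JSON layout (a frames list whose entries carry frameNum and a metrics dict) and the saliency_mean feature name used in the C API example. Upstream vmaf selects JSON output with its --json flag, so confirm the fork's output format before relying on this.

```python
import json


def per_frame_feature(report: dict, name: str = "saliency_mean") -> list:
    """Per-frame values of one feature, in frame order (assumed JSON layout)."""
    frames = sorted(report["frames"], key=lambda f: f["frameNum"])
    return [frame["metrics"][name] for frame in frames]


def mean(values: list) -> float:
    return sum(values) / len(values) if values else float("nan")


# With a real run:
#     with open("score.json") as fh:
#         report = json.load(fh)
report = json.loads(
    '{"frames": ['
    '{"frameNum": 1, "metrics": {"saliency_mean": 0.75}},'
    '{"frameNum": 0, "metrics": {"saliency_mean": 0.25}}]}'
)
values = per_frame_feature(report)
print(values, mean(values))  # [0.25, 0.75] 0.5
```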
Usage: C API
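```c
#include <libvmaf/libvmaf.h>
/* ctx: a VmafContext created earlier with vmaf_init(). */
VmafFeatureDictionary *opts = NULL;
vmaf_feature_dictionary_set(&opts, "model_path",
                            "model/tiny/saliency_student_v2.onnx");
int err = vmaf_use_feature(ctx, "mobilesal", opts);
if (err) { /* registration failed: check model_path */ }
/* After all n_frames frames are read, pool the per-frame saliency_mean. */
double saliency;
err = vmaf_feature_score_pooled(ctx, "saliency_mean", VMAF_POOL_METHOD_MEAN,
                                &saliency, 0, n_frames - 1);
```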
Reproducing the model
```shell
# Run from the repository root.
# 1. fetch DUTS-TR (~271 MB); not redistributed in-tree
mkdir -p "$HOME/datasets/duts"
(cd "$HOME/datasets/duts" && \
  wget https://saliencydetection.net/duts/download/DUTS-TR.zip && \
  unzip DUTS-TR.zip)

# 2. train + export (≈ 10–15 min on RTX 4090 / 24 GB at 256×256, batch 32)
.venv/bin/python ai/scripts/train_saliency_student_v2.py \
  --duts-root "$HOME/datasets/duts/DUTS-TR" \
  --output model/tiny/saliency_student_v2.onnx \
  --epochs 50 --batch-size 32 --lr 1e-3 --seed 42 \
  --metrics-out build_artifacts/saliency_student_v2_train.json

# 3. validate against the registry
.venv/bin/python ai/scripts/validate_model_registry.py \
  --out-json build_artifacts/tiny_model_registry_validation.json

# 4. validate ONNX allowlist / PyTorch-vs-ORT parity / registry status
.venv/bin/python ai/scripts/validate_saliency_student.py \
  --onnx model/tiny/saliency_student_v2.onnx \
  --out-json build_artifacts/saliency_student_v2_validate.json
```
Both the training-metrics JSON and the validation JSON carry an ADR-0661 run_provenance block, so the evidence behind this card records the DUTS root, ONNX output, check verdicts, argv, and report paths.
Promotion history
saliency_student_v2 was first shipped on 2026-05-09 as a parallel artefact alongside v1 (per ADR-0332), with v1 remaining the production weight set pending ROI A/B validation. On 2026-05-15, ADR-0444 promoted v2 to the production default on the strength of its held-out IoU improvement (0.7105 vs 0.6558, +8.3 % relative). v1 is retained in the registry for regression baselines and backwards compatibility.
Known limitations
Inherited from v1:
- 8-bit YUV only (10-bit gated on the LPIPS loader path).
- Only BT.709 limited-range Y'CbCr → RGB conversion is implemented on the C side.
- At ~124 K parameters, the model is far smaller than upstream u2netp (~4.7 MB of weights, roughly 1.1 M parameters).
- 5 % held-out validation only; external evaluation is a follow-up.
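For reference, the standard BT.709 limited-range conversion followed by the ImageNet normalisation that the model's input expects looks like this per pixel. A sketch only: it assumes 8-bit samples and chroma already upsampled to 4:4:4, and the C extractor's rounding, chroma siting, and vectorisation may differ.

```python
KR, KB = 0.2126, 0.0722            # BT.709 luma weights
KG = 1.0 - KR - KB
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def _clamp01(v: float) -> float:
    return min(max(v, 0.0), 1.0)


def ycbcr709_limited_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """8-bit limited-range BT.709 Y'CbCr to R'G'B' in [0, 1]."""
    yn = (y - 16) / 219.0          # luma 16..235 -> 0..1
    pb = (cb - 128) / 224.0        # chroma 16..240 -> -0.5..0.5
    pr = (cr - 128) / 224.0
    r = yn + 2.0 * (1.0 - KR) * pr
    b = yn + 2.0 * (1.0 - KB) * pb
    g = (yn - KR * r - KB * b) / KG  # from Y' = KR*R' + KG*G' + KB*B'
    return _clamp01(r), _clamp01(g), _clamp01(b)


def imagenet_normalise(rgb: tuple) -> tuple:
    """Per-channel (x - mean) / std in R, G, B order."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


white = ycbcr709_limited_to_rgb(235, 128, 128)
print(white, imagenet_normalise(white))
```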
Specific to v2:
- The Resize + 3×3 Conv pattern adds 10 880 parameters (≈ 10.9 K) vs v1, because each decoder upsampler's 2×2 kernel becomes a 3×3 kernel. Inference latency at 256×256 is within measurement noise of v1 on RTX 4090 / Intel Xeon.
- ADR-0258 admits Resize by op type only, without attribute enforcement. v2's exported attributes (linear, half_pixel, antialias=0) are documented here, but the fork's scanner does not validate them; ORT does. Future tiny-AI consumers using Resize should follow the same documented attribute contract.
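Since the fork's scanner does not check Resize attributes, a model author could check the documented contract directly. A sketch: absent attributes fall back to ONNX's defaults (mode='nearest', coordinate_transformation_mode='half_pixel', and antialias=0, an attribute that only exists from opset 18), and string attributes returned as bytes by onnx.helper.get_attribute_value are decoded. The function names are hypothetical, and loading a model requires the onnx package.

```python
RESIZE_CONTRACT = {"mode": "linear",
                   "coordinate_transformation_mode": "half_pixel",
                   "antialias": 0}
# ONNX defaults used when a node omits an attribute.
ONNX_RESIZE_DEFAULTS = {"mode": "nearest",
                        "coordinate_transformation_mode": "half_pixel",
                        "antialias": 0}


def resize_contract_violations(attrs: dict) -> list:
    """Describe every way one Resize node's attributes break the contract."""
    problems = []
    for key, want in RESIZE_CONTRACT.items():
        got = attrs.get(key, ONNX_RESIZE_DEFAULTS[key])
        if isinstance(got, bytes):  # onnx returns STRING attributes as bytes
            got = got.decode("utf-8")
        if got != want:
            problems.append(f"{key}={got!r}, expected {want!r}")
    return problems


def resize_nodes(onnx_path):
    """Yield (name, attrs) for every top-level Resize node; requires `onnx`."""
    import onnx
    from onnx import helper

    model = onnx.load(onnx_path)
    for node in model.graph.node:
        if node.op_type == "Resize":
            attrs = {a.name: helper.get_attribute_value(a) for a in node.attribute}
            yield node.name, attrs


# Usage against the shipped file (needs the onnx package):
# for name, attrs in resize_nodes("model/tiny/saliency_student_v2.onnx"):
#     assert not resize_contract_violations(attrs), (name, attrs)
print(resize_contract_violations({"mode": b"linear"}))  # []
```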
Related
- saliency_student_v1.md: the previous production weights for the mobilesal extractor (superseded by v2 on 2026-05-15); v2 is its architectural successor.
- mobilesal.md: the C-side extractor wiring; unchanged by this PR.
- ADR-0258: admits Resize to the allowlist.
- ADR-0286: v1 decision record.
- ADR-0332: v2 decision record (this PR).
- Research-0089: companion digest.
- ADR-0042: tiny-AI doc-substance rule this card satisfies.