ADR-0618: Content-Aware Classifier for Encoder Routing

Status: Proposed
Date: 2026-05-19
Deciders: lusoris
Tags: ai, planning, vmaf-tune, dnn

Context

Different content types require different encoding strategies: 2D animation compresses well at low resolution and benefits from tune=animation; sports needs high CRF headroom and tight shot boundaries; HDR requires HDR-aware VMAF models. Currently, vmaf-tune applies uniform parameters regardless of content type. A pre-encoding classifier that tags the source once (from a single 10 s clip analysis) and emits a tag dict would enable automatic routing to per-content-type VMAF priors, encoder parameters, and ladder defaults, without operator intervention.

Decision

We will implement the Hybrid A4 classifier (Research 0614). Signal-derived tags are computed deterministically: run FFmpeg's siti filter to measure spatial information (SI) and temporal information (TI) as proxies for spatial complexity and motion intensity (no GPU, ~5–10 s per clip), infer dynamic range from container metadata, and use CAMBI as a source-quality proxy. For genre and subjective tags (the only tags that cannot be derived from signal statistics), run a single local VLM call via Ollama (Gemma 3B Vision or Llama 3.2 11B Vision) on 3 sampled frames. Results are cached in a sidecar JSON, keyed by a clip fingerprint (xxHash128 of the first 1 MB).
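The signal-tag half of this decision could be sketched roughly as follows. This is a minimal illustration, not the planned `classify.py`: the function names, the `pq`/`hlg`/`sdr` labels, and the summary layout that `parse_siti_summary` expects are assumptions (the layout mirrors what the siti filter is understood to log with `print_summary=1` and should be verified against the FFmpeg build in use). `infer_dynamic_range` assumes it receives one video stream entry from `ffprobe -show_streams -of json`.

```python
import re

def siti_command(path: str, seconds: float = 10.0) -> list:
    """Build an FFmpeg call that runs the siti filter on the first `seconds` of a clip."""
    return [
        "ffmpeg", "-hide_banner", "-nostdin",
        "-t", str(seconds), "-i", path,      # -t before -i limits how much input is read
        "-vf", "siti=print_summary=1",       # summary is written to the log (stderr)
        "-f", "null", "-",                   # decode only, discard output
    ]

# Assumed summary layout: "Spatial Information:\nAverage: <float>" (same for Temporal).
_SECTION = re.compile(r"(Spatial|Temporal) Information:\s*Average:\s*([0-9.]+)")

def parse_siti_summary(log: str) -> dict:
    """Extract the average SI and TI values from FFmpeg's log text."""
    found = {kind: float(avg) for kind, avg in _SECTION.findall(log)}
    if set(found) != {"Spatial", "Temporal"}:
        raise ValueError("siti summary not found in FFmpeg output")
    return {"si_avg": found["Spatial"], "ti_avg": found["Temporal"]}

# ffprobe reports the transfer characteristic as `color_transfer`;
# the right-hand labels are hypothetical tag values.
_HDR_TRANSFERS = {"smpte2084": "pq", "arib-std-b67": "hlg"}

def infer_dynamic_range(stream: dict) -> str:
    """Map an ffprobe video-stream dict to a dynamic-range tag, defaulting to SDR."""
    return _HDR_TRANSFERS.get(stream.get("color_transfer", ""), "sdr")
```

The command list would be passed to `subprocess.run(..., capture_output=True, text=True)` and the captured stderr handed to `parse_siti_summary`.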

The classifier lives in tools/vmaf-tune/src/vmaftune/classify.py. The output is a ContentTags dataclass fed into a routing table that selects encoder params, VMAF model, and ladder priors.
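To make the shape of that hand-off concrete, here is a hedged sketch of a `ContentTags` dataclass and a routing lookup. The ADR does not enumerate the seven tag categories, so the fields, the `Route` type, and `route_for` are all hypothetical. The two rules shown come from this document (2D animation benefits from tune=animation; the animation-3d tag selects NEG); the model names are the upstream libvmaf built-in versions and are used here only as placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ContentTags:
    # Illustrative subset of tags; the real dataclass may differ.
    genre: str = "unknown"
    dynamic_range: str = "sdr"
    si_avg: float = 0.0
    ti_avg: float = 0.0

@dataclass(frozen=True)
class Route:
    encoder_tune: Optional[str]   # e.g. passed through as the encoder's tune option
    vmaf_model: str               # VMAF model version used for scoring

DEFAULT_ROUTE = Route(encoder_tune=None, vmaf_model="vmaf_v0.6.1")

def route_for(tags: ContentTags) -> Route:
    """Pick encoder and VMAF settings from tags; unknown content gets the default."""
    if tags.genre == "animation-2d":
        return Route(encoder_tune="animation", vmaf_model="vmaf_v0.6.1")
    if tags.genre == "animation-3d":
        return Route(encoder_tune=None, vmaf_model="vmaf_v0.6.1neg")
    return DEFAULT_ROUTE
```

Keeping the default route identical to today's uniform parameters means a clip tagged `genre=unknown` behaves exactly as it does without the classifier.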

Alternatives considered

| Option | Pros | Cons | Why not chosen |
| --- | --- | --- | --- |
| A1: VLM-only (Ollama local) | Zero training; all tags covered | GPU required; 2–5 s latency; non-deterministic | Retained as partial contributor in A4 |
| A1b: Claude Vision API | Zero training; natural language | Content leaves premises; API cost; latency | Unacceptable for unreleased content |
| A2: Train small CNN | Zero runtime overhead; deterministic | 1–2 weeks training; taxonomy crosswalk; genre-only | Deferred; requires labeled corpus |
| A3: CAMBI + SI/TI only | Deterministic; fast; no GPU | No genre; no semantic tags | Covers 5 of 7 tag categories; adopted as A4 base |
| A4: Hybrid SI/TI + selective VLM (chosen) | Covers all tags; deterministic for signal tags; VLM only when needed | VLM non-determinism on genre tags; Ollama dependency | n/a |

Consequences

Positive: Zero training burden; all 7 tag categories covered; enables automatic routing for NEG (ADR-0616), ladder priors, and encoder tune params; caching means the classifier runs once per title.
Negative: The Ollama dependency adds a GPU requirement to the pre-encoding step when genre tags are needed; genre tags are sampled from a VLM and are therefore non-deterministic, so re-runs without a fixed seed may produce inconsistent routing.
Neutral / follow-ups: A deterministic fallback (A3-only, no genre) must be available when Ollama is absent (CI environments, CPU-only runners). The CNN classifier (A2) is the long-term upgrade path once a labeled corpus is assembled.
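The seed and fallback concerns above could be handled along these lines. This is a sketch under stated assumptions: it targets Ollama's default local `/api/generate` endpoint, pins `seed` and `temperature` in the request `options`, and degrades to `genre=unknown` on any connection failure. The prompt text, the function names, and the injectable `ask` callable are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_genre_request(frames: list, model: str, seed: int = 0) -> dict:
    """Build a non-streaming request with a fixed seed to reduce run-to-run drift."""
    return {
        "model": model,
        "prompt": "Answer with a single genre label for these video frames.",  # hypothetical prompt
        "images": [base64.b64encode(f).decode("ascii") for f in frames],
        "stream": False,
        "options": {"seed": seed, "temperature": 0},
    }

def _post_to_ollama(payload: dict) -> str:
    """Send the payload to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["response"]

def tag_genre(frames: list, model: str, ask=_post_to_ollama) -> str:
    """Return a lower-cased genre label, or "unknown" when Ollama is unreachable."""
    try:
        label = ask(build_genre_request(frames, model)).strip().lower()
    except OSError:  # covers connection refused, URLError, and timeouts
        return "unknown"
    return label or "unknown"
```

Passing `ask` as a parameter lets CI exercise the fallback path without a GPU or a running Ollama server. A fixed seed narrows variation but may not guarantee identical output across model or runtime versions, which is one more reason to cache the result.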

Dependencies

ADR-0616 (VMAF NEG) — the classifier's animation-3d tag auto-selects NEG; ADR-0616 must land first.
No other items in the 6-list are hard prerequisites; the classifier can ship without DO, ABR rendition, or cross-shot weighting.
FFmpeg siti filter available on the dev machine (confirmed in-container).
CAMBI available via libvmaf CLI (in-tree).
Ollama with Gemma 3B or Llama 3.2 11B Vision available on dev machine.
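A small availability probe for the tool dependencies listed above might look like the following. It is only a sketch: the binary names (`ffmpeg`, `vmaf`, `ollama`) are assumptions about how the tools are installed on the dev machine, and `has_siti_filter` expects the text printed by `ffmpeg -hide_banner -filters`, where the filter name is the second whitespace-separated column.

```python
import shutil

def has_siti_filter(filters_listing: str) -> bool:
    """Return True if the `ffmpeg -filters` listing contains a filter named siti."""
    for line in filters_listing.splitlines():
        fields = line.split()
        # Each filter row reads: <flags> <name> <in->out> <description...>
        if len(fields) >= 2 and fields[1] == "siti":
            return True
    return False

def probe_tools(names=("ffmpeg", "vmaf", "ollama")) -> dict:
    """Report which external binaries are on PATH (names are assumed, not confirmed)."""
    return {name: shutil.which(name) is not None for name in names}
```

A `False` for `ollama` would select the A3-only path, while a missing `ffmpeg` or siti filter would be a hard error because the signal tags depend on it.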

Implementation phases

| Phase | Description | Effort |
| --- | --- | --- |
| P1 | `classify.py` skeleton; `ContentTags` dataclass; FFmpeg SI/TI extraction | 2 days |
| P2 | CAMBI source-quality proxy; container metadata HDR detection | 1 day |
| P3 | Ollama VLM genre tagger; fallback to `genre=unknown` when Ollama absent | 2 days |
| P4 | Routing table: `ContentTags` → encoder params + VMAF model + ladder priors | 1 day |
| P5 | Cache (xxHash128 fingerprint + sidecar JSON); CLI `--classify` subcommand | 1 day |
| P6 | Docs `docs/usage/vmaf-tune-classifier.md`; routing table docs | 1 day |

Total estimate: 8 days (largest item on the roadmap).
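The P5 cache is small enough to sketch in full. The version below is a minimal illustration, assuming "1 MB" means 1 MiB, a sidecar named `<clip>.tags.json`, and a two-key JSON record; none of these are fixed by this ADR. It uses `xxhash.xxh128` from the third-party `xxhash` package when installed and otherwise substitutes a 128-bit BLAKE2b digest purely so the sketch runs anywhere; that substitute is not xxHash128 and the two fingerprints are not interchangeable.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

try:
    import xxhash  # third-party package providing xxHash128

    def _digest(data: bytes) -> str:
        return xxhash.xxh128(data).hexdigest()
except ImportError:
    def _digest(data: bytes) -> str:
        # Stand-in only: a 128-bit BLAKE2b digest, NOT xxHash128.
        return hashlib.blake2b(data, digest_size=16).hexdigest()

FINGERPRINT_BYTES = 1024 * 1024  # assumption: "first 1 MB" read as 1 MiB

def fingerprint(path: Path) -> str:
    """Hash the first FINGERPRINT_BYTES of the clip."""
    with path.open("rb") as fh:
        return _digest(fh.read(FINGERPRINT_BYTES))

def sidecar_path(path: Path) -> Path:
    """Hypothetical naming scheme: clip.mkv -> clip.mkv.tags.json."""
    return path.with_name(path.name + ".tags.json")

def load_cached(path: Path) -> Optional[dict]:
    """Return cached tags if the sidecar exists and its fingerprint still matches."""
    side = sidecar_path(path)
    if not side.exists():
        return None
    record = json.loads(side.read_text(encoding="utf-8"))
    if record.get("fingerprint") != fingerprint(path):
        return None  # clip changed since it was classified
    return record.get("tags")

def store(path: Path, tags: dict) -> None:
    """Write the tags and the current fingerprint next to the clip."""
    record = {"fingerprint": fingerprint(path), "tags": tags}
    sidecar_path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")
```

Hashing only the leading bytes keeps the lookup cheap for large sources, at the cost of treating two files with identical first megabytes as the same clip.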

References

Research digest: docs/research/0614-content-aware-classifier-research.md.
FFmpeg siti filter: ITU-T P.910 SI/TI.
CAMBI: resource/doc/cambi.md (Netflix/vmaf upstream; retrieved via GitHub API 2026-05-19).
MediaPipe Video Classification (mediapipe.dev; 2023).
Anthropic Claude Vision API (claude.ai/docs; 2025).
tools/vmaf-tune/src/vmaftune/saliency.py — existing visual-feature pipeline.
Source: per user direction (roadmap planning session 2026-05-19).