ADR-0618: Content-Aware Classifier for Encoder Routing
- Status: Proposed
- Date: 2026-05-19
- Deciders: lusoris
- Tags: ai, planning, vmaf-tune, dnn
Context
Different content types call for different encoding strategies: 2D animation compresses well at low resolution and benefits from tune=animation; sports needs high CRF headroom and tight shot boundaries; HDR requires HDR-aware VMAF models. Currently vmaf-tune applies uniform parameters regardless of content type. A pre-encoding classifier that tags the source once (analysis of a single 10 s clip) and emits a tag dict enables automatic routing to per-content-type VMAF priors, encoder parameters, and ladder defaults, without operator intervention.
Decision
We will implement a Hybrid A4 classifier (Research 0614): run the FFmpeg siti filter for spatial complexity and motion intensity (deterministic, no GPU, ~5–10 s per clip), infer dynamic range from container metadata, and use CAMBI as a source-quality proxy. For genre and subjective tags, the only tags that cannot be derived from signal statistics, make a single local VLM call via Ollama (Gemma 3B Vision or Llama 3.2 11B Vision) on 3 sampled frames. Results are cached in a sidecar JSON, keyed by a clip fingerprint (xxHash128 of the first 1 MB).
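The signal-statistics step could be sketched roughly as below. This is a minimal illustration, not the planned implementation: the helper names (siti_command, parse_siti_summary, measure_siti) are hypothetical, and the regular expression assumes the summary layout that the siti filter logs when run with print_summary=1, which should be checked against the FFmpeg build in the container.

```python
import re
import subprocess

# Assumed layout of the siti summary in FFmpeg's log (an assumption to verify):
#   Spatial Information:  Average / Max / Min
#   Temporal Information: Average / Max / Min
_SECTION = re.compile(
    r"(Spatial|Temporal) Information:\s*"
    r"Average:\s*([\d.]+)\s*Max:\s*([\d.]+)\s*Min:\s*([\d.]+)"
)


def siti_command(path, seconds=10):
    """Build an FFmpeg call that analyses the first `seconds` of `path`."""
    return [
        "ffmpeg", "-hide_banner", "-nostats",
        "-t", str(seconds), "-i", str(path),
        "-vf", "siti=print_summary=1",
        "-f", "null", "-",
    ]


def parse_siti_summary(log_text):
    """Extract SI/TI average, max and min from the filter's log summary."""
    stats = {}
    for kind, avg, peak, low in _SECTION.findall(log_text):
        prefix = "si" if kind == "Spatial" else "ti"
        stats[prefix + "_avg"] = float(avg)
        stats[prefix + "_max"] = float(peak)
        stats[prefix + "_min"] = float(low)
    if len(stats) != 6:
        raise ValueError("siti summary not found in FFmpeg log")
    return stats


def measure_siti(path, seconds=10):
    """Run FFmpeg and parse the summary, which FFmpeg writes to stderr."""
    proc = subprocess.run(
        siti_command(path, seconds),
        capture_output=True, text=True, check=True,
    )
    return parse_siti_summary(proc.stderr)
```

Keeping the parser separate from the subprocess call means it can be unit-tested against a captured log on runners that have no FFmpeg installed.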
The classifier lives in tools/vmaf-tune/src/vmaftune/classify.py. The output is a ContentTags dataclass fed into a routing table that selects encoder params, VMAF model, and ladder priors.
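A rough sketch of what ContentTags and the routing step might look like follows. The field names, the Route type, and the HDR model placeholder are all hypothetical, since this ADR does not enumerate the seven tag categories. vmaf_v0.6.1 and vmaf_v0.6.1neg are the stock libvmaf model names, used here only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Placeholder only: this ADR does not name the HDR-aware VMAF model.
HDR_MODEL_PLACEHOLDER = "<hdr-aware-vmaf-model>"


@dataclass(frozen=True)
class ContentTags:
    """Hypothetical field set; the real taxonomy comes from Research 0614."""
    genre: str = "unknown"            # e.g. "animation-2d", "animation-3d", "sports"
    dynamic_range: str = "sdr"        # "sdr" or "hdr", from container metadata
    spatial_complexity: float = 0.0   # SI average from the siti filter
    motion_intensity: float = 0.0     # TI average from the siti filter
    source_quality: float = 0.0       # CAMBI score used as a proxy


@dataclass(frozen=True)
class Route:
    """Hypothetical routing result; ladder priors omitted for brevity."""
    vmaf_model: str = "vmaf_v0.6.1"
    encoder_tune: Optional[str] = None


def route_for(tags):
    """Map tags to a route; unknown tags fall through to the defaults."""
    route = Route()
    if tags.genre == "animation-2d":
        route = replace(route, encoder_tune="animation")
    elif tags.genre == "animation-3d":
        # Per the dependency on ADR-0616: animation-3d auto-selects NEG.
        route = replace(route, vmaf_model="vmaf_v0.6.1neg")
    if tags.dynamic_range == "hdr":
        # In this sketch HDR takes precedence over the genre's model choice.
        route = replace(route, vmaf_model=HDR_MODEL_PLACEHOLDER)
    return route
```

Because an unknown genre falls through to the default Route, the same function serves the deterministic fallback path without special-casing.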
Alternatives considered
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| A1 — VLM-only (Ollama local) | Zero training; all tags covered | GPU required; 2–5 s latency; non-deterministic | Retained as partial contributor in A4 |
| A1b — Claude Vision API | Zero training; natural language | Content leaves premises; API cost; latency | Unacceptable for unreleased content |
| A2 — Train small CNN | Zero runtime overhead; deterministic | 1–2 weeks training; taxonomy crosswalk; genre-only | Deferred; requires labeled corpus |
| A3 — CAMBI + SI/TI only | Deterministic; fast; no GPU | No genre; no semantic tags | Covers 5 of 7 tag categories; adopted as A4 base |
| A4 — Hybrid SI/TI + selective VLM (chosen) | Covers all tags; deterministic for signal tags; VLM only when needed | VLM non-determinism on genre tags; Ollama dependency | — |
Consequences
- Positive: Zero training burden; all 7 tag categories covered; enables automatic routing for NEG (ADR-0616), ladder priors, and encoder tune params; caching means the classifier runs once per title.
- Negative: Ollama dependency adds GPU requirement to the pre-encoding step when genre tags are needed; non-deterministic genre tags (LLM-sampled) may produce inconsistent routing on re-runs without a fixed seed.
- Neutral / follow-ups: A deterministic fallback (A3-only, no genre) must be available when Ollama is absent (CI environments, CPU-only runners). CNN classifier (A2) is the long-term upgrade path once a labeled corpus is assembled.
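One way the fixed seed and the Ollama-absent fallback could fit together is sketched below. It assumes Ollama's default local address and its /api/tags and /api/generate endpoints; the genre list, the prompt, the model tag, and every function name are hypothetical choices made for illustration.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed deployment)
# Hypothetical genre vocabulary; the real list is not given in this ADR.
ALLOWED_GENRES = {"animation-2d", "animation-3d", "sports"}


def ollama_available(base_url=OLLAMA_URL, timeout=1.0):
    """Return True if an Ollama server answers on base_url."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout):
            return True
    except OSError:  # covers URLError, refused connections, timeouts
        return False


def build_genre_request(frames_jpeg, model="llama3.2-vision:11b", seed=0):
    """Build a generate payload with a fixed seed and zero temperature."""
    return {
        "model": model,  # assumed model tag
        "prompt": "Answer with exactly one of: " + ", ".join(sorted(ALLOWED_GENRES)),
        "images": [base64.b64encode(f).decode("ascii") for f in frames_jpeg],
        "stream": False,
        "options": {"seed": seed, "temperature": 0},
    }


def ask_ollama(payload, base_url=OLLAMA_URL, timeout=60.0):
    """POST the payload and return the model's reply text."""
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]


def tag_genre(frames_jpeg, ask=None):
    """Return a genre tag, or "unknown" when no VLM is usable (A3-only path)."""
    if ask is None:
        return "unknown"
    try:
        reply = ask(build_genre_request(frames_jpeg))
    except OSError:
        return "unknown"
    genre = reply.strip().lower()
    return genre if genre in ALLOWED_GENRES else "unknown"
```

A caller might pass `ask=ask_ollama if ollama_available() else None`, so CI and CPU-only runners take the deterministic path automatically. Fixing the seed and temperature reduces run-to-run variation but may not remove it across model or runtime versions.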
Dependencies
- ADR-0616 (VMAF NEG): the classifier's animation-3d tag auto-selects NEG; ADR-0616 must land first.
- No other items in the 6-list are hard prerequisites; classifier can ship without DO, ABR rendition, or cross-shot weighting.
- FFmpeg siti filter available on the dev machine (confirmed in-container).
- CAMBI available via libvmaf CLI (in-tree).
- Ollama with Gemma 3B or Llama 3.2 11B Vision available on dev machine.
Implementation phases
| Phase | Description | Effort |
|---|---|---|
| P1 | classify.py skeleton; ContentTags dataclass; FFmpeg SI/TI extraction | 2 days |
| P2 | CAMBI source-quality proxy; container metadata HDR detection | 1 day |
| P3 | Ollama VLM genre tagger; fallback to genre=unknown when Ollama absent | 2 days |
| P4 | Routing table: ContentTags → encoder params + VMAF model + ladder priors | 1 day |
| P5 | Cache (xxHash128 fingerprint + sidecar JSON); CLI --classify subcommand | 1 day |
| P6 | Docs docs/usage/vmaf-tune-classifier.md; routing table docs | 1 day |
Total estimate: 8 days (largest item on the roadmap).
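As an illustration of phase P5, the fingerprint cache could look roughly like this. It is a sketch under stated assumptions: the sidecar naming (.tags.json), the JSON layout, and all function names are hypothetical; "1 MB" is taken as 2^20 bytes; and xxHash128 comes from the third-party xxhash package. When that package is missing, the sketch falls back to BLAKE2b, which is a stand-in and not xxHash128, so the algorithm name is stored in the fingerprint.

```python
import hashlib
import json
from pathlib import Path

try:
    import xxhash  # third-party; provides xxh128
except ImportError:
    xxhash = None


def clip_fingerprint(path, nbytes=1 << 20):
    """Hash the first nbytes of the clip (1 MiB assumed for "1 MB")."""
    with open(path, "rb") as fh:
        head = fh.read(nbytes)
    if xxhash is not None:
        return "xxh128:" + xxhash.xxh128(head).hexdigest()
    # Stand-in when xxhash is not installed; NOT xxHash128.
    return "blake2b:" + hashlib.blake2b(head, digest_size=16).hexdigest()


def sidecar_path(clip):
    """Hypothetical naming: <clip>.tags.json next to the source."""
    return Path(str(clip) + ".tags.json")


def load_cached(clip):
    """Return cached tags, or None if missing, unreadable, or stale."""
    side = sidecar_path(clip)
    try:
        data = json.loads(side.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    if data.get("fingerprint") != clip_fingerprint(clip):
        return None
    return data.get("tags")


def classify_cached(clip, classify):
    """Run classify(clip) only when no valid sidecar exists."""
    cached = load_cached(clip)
    if cached is not None:
        return cached
    tags = classify(clip)
    payload = {"fingerprint": clip_fingerprint(clip), "tags": tags}
    sidecar_path(clip).write_text(json.dumps(payload, indent=2))
    return tags
```

Hashing only the head of the file keeps the lookup cheap on large mezzanine sources, at the cost of not noticing edits beyond the first megabyte.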
References
- Research digest: docs/research/0614-content-aware-classifier-research.md.
- FFmpeg siti filter: ITU-T P.910 SI/TI.
- CAMBI: resource/doc/cambi.md (Netflix/vmaf upstream; retrieved via GitHub API 2026-05-19).
- MediaPipe Video Classification (mediapipe.dev; 2023).
- Anthropic Claude Vision API (claude.ai/docs; 2025).
- tools/vmaf-tune/src/vmaftune/saliency.py: existing visual-feature pipeline.
- Source: per user direction (roadmap planning session 2026-05-19).