Research Digest 0613: Cross-Shot Complexity Weighting
Scope: Title-level quality targets (e.g. "average VMAF ≥ 94, no shot below 91") instead of independent per-shot targets; design of the constraint solver and its integration into the per_shot.py predicate API.
Retrieved: 2026-05-19
Status: Planning only; no implementation.
Problem Statement
The current Phase D implementation (per_shot.py) treats each shot independently: every shot is bisected to the same per-shot VMAF target (e.g. 93.0). This is suboptimal in two ways:
- Bit waste on simple shots: a simple fade-to-black or talking-head shot can achieve VMAF 96 at CRF 28 when the target is 93. The extra quality costs bits without perceptual value.
- Hard shots may undershoot: a complex action sequence may barely reach 93 at CRF 22, consuming many bits. Relaxing this shot's target to 91 (still above the minimum floor) frees bits, and raising the targets of cheaper shots keeps the title-level mean on target.
The desired model: assign per-shot CRFs such that the title-level VMAF distribution satisfies:
- mean(vmaf) ≥ target_mean (e.g. 94)
- min(vmaf) ≥ floor (e.g. 91)
- total_bits ≤ budget (optional hard constraint)
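The three conditions above can be checked mechanically once per-shot scores are known. The following is a minimal sketch, not part of the fork: the function name and its parameters are hypothetical, and the mean is taken as a plain arithmetic mean (an assumption; duration weighting is an open question).

```python
from statistics import fmean


def satisfies_title_constraints(vmaf_scores, target_mean=94.0, floor=91.0,
                                total_bits=None, budget=None):
    """Return True if per-shot VMAF scores meet the title-level constraints.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if not vmaf_scores:
        raise ValueError("need at least one shot")
    mean_ok = fmean(vmaf_scores) >= target_mean   # mean(vmaf) >= target_mean
    floor_ok = min(vmaf_scores) >= floor          # min(vmaf) >= floor
    # The bit budget is optional: it is only checked when both values are given.
    budget_ok = budget is None or total_bits is None or total_bits <= budget
    return mean_ok and floor_ok and budget_ok
```

A title can pass the mean while failing the floor (or the reverse), which is why the two checks are kept separate.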
Literature
No public paper was found that addresses cross-shot weighting of per-shot targets specifically. The problem is related to:
- Rate-distortion theory: optimal bit allocation across frames (Lagrangian rate control). Classic result: allocate bits in proportion to complexity.
- Netflix per-title encoding (Tech Blog, 2015; retrieval failed with an SSL error on 2026-05-19): implies per-title global quality targets; the per-shot extension is fork-local.
- Lagrangian quality optimisation: well-studied in H.264/HEVC rate control literature (e.g. HRD-compliant video coding). The Lagrange multiplier λ connects bitrate and distortion; adjusting λ per shot achieves optimal bit allocation for a fixed total budget.
Current Fork State
| Component | Status |
|---|---|
| per_shot.py PredicateFn | Per-shot target VMAF, no global constraint |
| bisect.py | Target VMAF as scalar per call |
| conformal.py / uncertainty.py | Interval-aware quality |
| Cross-shot quality optimiser | Not implemented |
The PredicateFn signature is (shot, target_vmaf, encoder) → (crf, vmaf). To support cross-shot weighting, the predicate must either:
- Accept a per-shot target derived from a global constraint solver, or
- Be replaced by a multi-pass architecture: the first pass estimates per-shot complexity, the second pass solves the allocation problem, and the third pass runs bisect with the resulting per-shot targets.
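The first alternative needs no change to the predicate itself, only to how its target argument is produced. A minimal sketch, assuming the signature quoted above; the `PredicateFn` alias here is a stand-in for the real one in per_shot.py, and `run_with_per_shot_targets` is a hypothetical name:

```python
from typing import Any, Callable, Sequence

# Stand-in alias (assumption): (shot, target_vmaf, encoder) -> (crf, vmaf).
PredicateFn = Callable[[Any, float, str], tuple[int, float]]


def run_with_per_shot_targets(shots: Sequence[Any],
                              targets: Sequence[float],
                              predicate: PredicateFn,
                              encoder: str) -> list[tuple[int, float]]:
    """Call the unchanged predicate once per shot, each with its own target."""
    if len(shots) != len(targets):
        raise ValueError("need exactly one target per shot")
    return [predicate(shot, target, encoder)
            for shot, target in zip(shots, targets)]
```

The targets would come from whichever solver is chosen; the existing uniform behaviour is the special case where every entry of `targets` is the same value.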
Design Options
Option A: Proportional complexity-based target relaxation
Measure per-shot complexity (spatial entropy + motion energy). Assign per-shot VMAF targets such that target_i = target_mean - k × (complexity_i - mean_complexity). Simple linear relaxation: complex shots get a lower target, simple shots get a higher target. The k parameter controls trade-off aggressiveness.
Pros: Zero solver overhead; very fast; simple to implement. Cons: Not guaranteed to satisfy the mean/floor constraints exactly; k must be tuned per codec/content type.
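A minimal sketch of the linear relaxation, assuming complexity is already available as one scalar per shot (how spatial entropy and motion energy are combined is not specified here). The clamping to a floor and ceiling is an addition for illustration, and `relaxed_targets` is a hypothetical name:

```python
from statistics import fmean


def relaxed_targets(complexities, target_mean=94.0, k=2.0,
                    floor=91.0, ceiling=100.0):
    """target_i = target_mean - k * (complexity_i - mean_complexity), clamped."""
    mean_c = fmean(complexities)
    raw = [target_mean - k * (c - mean_c) for c in complexities]
    # Clamping (an assumption, not in the formula) keeps targets in a valid range.
    return [min(ceiling, max(floor, t)) for t in raw]


# Hypothetical complexity scores for an easy, a medium and a hard shot.
targets = relaxed_targets([0.2, 0.5, 0.8], k=5.0)
```

Before clamping, the targets average to `target_mean` because the complexity deviations sum to zero. Once any target is clamped, that no longer holds, which is one concrete way the mean constraint can be missed.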
Option B: Iterative redistribution (greedy two-pass)
First pass: bisect all shots to a uniform target T. Measure actual VMAF scores. Second pass: identify shots where VMAF >> T (simple shots); reduce their targets; add the recovered bits to a "budget" that is redistributed to hard shots (VMAF < T). Repeat until convergence or max iterations.
Pros: No solver; reuses existing bisect.py; the floor is guaranteed unless the content is uniformly hard. Cons: Requires 2× encode overhead; convergence is not guaranteed; the result is order-dependent.
Option C: Lagrangian optimisation (recommended for V1)
Model the problem as: minimise Σ bits(shot_i, crf_i) subject to mean(vmaf(shot_i, crf_i)) ≥ target_mean and vmaf(shot_i, crf_i) ≥ floor ∀ i. Bisect on the Lagrange multiplier λ: at each λ, each shot independently minimises bits_i + λ × (target_mean - vmaf_i), and λ is raised until the mean constraint is met. The λ bisect converges in ~10 iterations; the inner per-shot bisect already exists as Phase B.
Pros: Principled; provably optimal under the model; reuses Phase B. Cons: Requires CRF-to-bitrate and CRF-to-VMAF estimates that do not need a full encode (use the NR proxy for the inner loop if Item 3 is available); costs O(shots × λ iterations) probes.
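The λ search can be sketched on a precomputed candidate table. This is an illustration under assumptions, not the planned implementation: each shot is a list of hypothetical (crf, bits, vmaf) tuples standing in for probe or NR-proxy results, the function names are invented, and a doubling step is added to bracket λ before bisecting.

```python
from statistics import fmean


def pick_for_lambda(candidates, lam, target_mean, floor):
    """Pick the candidate minimising bits + lam * (target_mean - vmaf)."""
    feasible = [c for c in candidates if c[2] >= floor]  # hard floor constraint
    if not feasible:
        raise ValueError("no candidate reaches the floor for this shot")
    return min(feasible, key=lambda c: c[1] + lam * (target_mean - c[2]))


def solve_title_allocation(shots, target_mean=94.0, floor=91.0, iterations=10):
    """Search lam for a low-bit allocation whose mean VMAF meets the target."""
    def picks_at(lam):
        return [pick_for_lambda(c, lam, target_mean, floor) for c in shots]

    def mean_vmaf(picks):
        return fmean([p[2] for p in picks])

    # Bracket: double lam until the mean constraint is met (or give up).
    lam_lo, lam_hi = 0.0, 1.0
    best = picks_at(lam_hi)
    doublings = 0
    while mean_vmaf(best) < target_mean:
        if doublings == 60:
            raise ValueError("target mean unreachable on this candidate grid")
        lam_lo, lam_hi = lam_hi, lam_hi * 2.0
        best = picks_at(lam_hi)
        doublings += 1

    # Bisect: shrink [lam_lo, lam_hi] towards the smallest lam that still passes.
    for _ in range(iterations):
        lam = (lam_lo + lam_hi) / 2.0
        picks = picks_at(lam)
        if mean_vmaf(picks) >= target_mean:
            best, lam_hi = picks, lam
        else:
            lam_lo = lam
    return best


# Hypothetical (crf, bits, vmaf) tables for one simple and one hard shot.
simple_shot = [(20, 400.0, 98.0), (24, 250.0, 97.0), (28, 150.0, 96.0), (32, 100.0, 93.0)]
hard_shot = [(18, 2000.0, 96.0), (20, 1500.0, 94.5), (22, 1100.0, 93.0),
             (24, 800.0, 91.0), (26, 600.0, 89.0)]
allocation = solve_title_allocation([simple_shot, hard_shot])
```

On this toy data the search settles on VMAF 97 for the simple shot and 91 for the hard shot, for a mean of exactly 94. On a discrete grid a Lagrangian search only reaches points on the lower convex hull of each shot's rate-quality curve, so it may miss allocations that an exhaustive search would find.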
Option D: LP/QP solver (exact)
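Enumerate a discrete CRF grid for each shot; formulate as an integer program where each shot selects exactly one CRF; solve with a mixed-integer solver such as scipy.optimize.milp, scipy.optimize.linprog with its integrality option, or PuLP. The solution is exact on the grid and tractable for ≤100 shots and ≤10 CRF levels per shot.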
Pros: Exact; off-the-shelf solver. Cons: Requires pre-computing all (shot, CRF) → (bits, VMAF) pairs — expensive without NR proxy; scipy dependency.
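The formulation can be illustrated without a solver by enumerating every combination. Note the substitution: brute-force enumeration stands in for the integer program here, which is only viable for a handful of shots because the search space grows exponentially. The (crf, bits, vmaf) tuples are hypothetical, as is the function name.

```python
from itertools import product
from statistics import fmean


def exact_allocation(shots, target_mean=94.0, floor=91.0):
    """Try every one-CRF-per-shot combination; return the cheapest feasible one.

    Returns None when no combination satisfies both constraints.
    """
    best, best_bits = None, float("inf")
    for combo in product(*shots):          # one candidate per shot
        vmafs = [c[2] for c in combo]
        if min(vmafs) < floor or fmean(vmafs) < target_mean:
            continue                       # violates floor or mean constraint
        bits = sum(c[1] for c in combo)
        if bits < best_bits:
            best, best_bits = list(combo), bits
    return best


# Hypothetical (crf, bits, vmaf) tables for one simple and one hard shot.
simple_shot = [(20, 400.0, 98.0), (24, 250.0, 97.0), (28, 150.0, 96.0), (32, 100.0, 93.0)]
hard_shot = [(18, 2000.0, 96.0), (20, 1500.0, 94.5), (22, 1100.0, 93.0),
             (24, 800.0, 91.0), (26, 600.0, 89.0)]
allocation = exact_allocation([simple_shot, hard_shot])
```

In a real integer program the same structure would use one binary variable per (shot, CRF) pair, a constraint that each shot's variables sum to one, and a linear constraint on the summed VMAF.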
Recommended Decision Matrix
| Option | Optimality | Runtime cost | Dependencies |
|---|---|---|---|
| A — linear relaxation | Low | Negligible | None |
| B — iterative redistribution | Medium | 2× encode | None |
| C — Lagrangian (recommended) | High | ~1.5× encode | Item 3 (NR) |
| D — LP/QP solver | Exact | 5–10× encode | Item 3 (NR) |
API Design Sketch
```python
from __future__ import annotations

import dataclasses

# Shot, PredicateFn and ShotRecommendation are assumed to come from per_shot.py.


@dataclasses.dataclass
class TitleQualityConstraints:
    target_mean_vmaf: float = 94.0
    floor_vmaf: float = 91.0
    max_iterations: int = 3  # redistribution passes
    lambda_bisect_iterations: int = 10


def tune_per_shot_with_constraints(
    shots: list[Shot],
    predicate: PredicateFn,
    constraints: TitleQualityConstraints | None,
    encoder: str,
) -> list[ShotRecommendation]: ...
```
The constraints object is optional: if None, the existing per-shot independent behaviour is preserved (backward compatible).
Open Questions
- Should the floor constraint be a hard lower bound (abort encode if violated) or a soft penalty (accept with warning)?
- How should the constraint solver handle shots where even CRF=0 (lossless) cannot reach floor_vmaf? (Content that is too degraded to reconstruct.)
- Is the target_mean a simple arithmetic mean, or a duration-weighted mean (longer shots contribute more to the title quality)?
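The last question matters in practice because the two means can disagree about whether a title passes. A small worked sketch with made-up numbers; `duration_weighted_mean` is a hypothetical helper, not an existing function:

```python
from statistics import fmean


def duration_weighted_mean(vmaf_scores, durations_s):
    """Mean VMAF where each shot is weighted by its duration in seconds."""
    if not vmaf_scores or len(vmaf_scores) != len(durations_s):
        raise ValueError("need one positive duration per shot")
    total = sum(durations_s)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(v * d for v, d in zip(vmaf_scores, durations_s)) / total


# Hypothetical title: a short easy shot and a long hard shot.
scores = [96.0, 91.0]
durations = [2.0, 8.0]
arithmetic = fmean(scores)                            # 93.5
weighted = duration_weighted_mean(scores, durations)  # 92.0
```

With a target of 93, this title passes under the arithmetic mean and fails under the duration-weighted one, so the choice changes solver outcomes rather than just reporting.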
References
- tools/vmaf-tune/src/vmaftune/per_shot.py: Phase D predicate API.
- tools/vmaf-tune/src/vmaftune/bisect.py: Phase B CRF bisect.
- tools/vmaf-tune/src/vmaftune/uncertainty.py: confidence intervals.
- Lagrangian rate control: ITU-T SG16 standard rate-control literature (H.264 / H.265 HRD compliance); no public arXiv paper on per-shot VMAF Lagrangian found in search 2026-05-19.
- ADR-0617: Decision record for cross-shot complexity weighting.