Research-0086 — Local sidecar training feasibility

Status: Active — scaffold-grade. Establishes the algorithm shortlist, privacy contract, cold-start posture, and drift-detection hook for ADR-0325. Not a state-of-the-art ML survey; the goal is to fix the contract before a future PR replaces the linear baseline.
Workstream: ADR-0325
Last updated: 2026-05-08

Question

Per item 3 of the user's ChatGPT-vision text: what is the minimum-viable shape of an on-host adaptive loop that improves the shipped per-shot VMAF predictor using the operator's actual encodes, without breaking predictor determinism and without any data leaving the host?

Online-learning algorithm choice

Three algorithms are plausible for the bias-correction role on top of the shipped MLP-or-analytical predictor (the predictor is not replaced; the sidecar adds a residual-correcting head):

| Algorithm | Per-update cost | State size | Closed-form? | Why considered |
| --- | --- | --- | --- | --- |
| Online ridge regression (chosen for scaffold) | `O(d²)` per capture for the `d × d` Gram update | `O(d²)` floats (`d ≈ 14` features → ~200 floats) | Yes: rank-1 Sherman-Morrison or batched normal-equation solve | Tiny, deterministic, zero-dep, well-understood. Linear bias-correction is the right baseline before claiming the shipped predictor's residual has structure ridge can't absorb. |
| Stochastic gradient descent on a 1-2 layer MLP | `O(d × h)` per capture | `O(d × h)` floats | No (iterative) | More expressive than ridge. Pulls in PyTorch on `vmaf-tune`'s zero-dep harness. Premature: needs LR scheduling + a replay buffer for streaming. |
| Gradient-boosted residual trees (XGBoost / LightGBM) | `O(n × d × tree_depth)` per refit | `O(trees × leaves)` | No (greedy) | Strong empirical performer for tabular residuals. New native dep; pickle format isn't safe to round-trip across hosts (relevant for the future opt-in-upload PR). |

Choice for the scaffold: online ridge. It is the only one of the three with a closed-form rank-1 update (Sherman-Morrison on the inverse Gram matrix). It updates in under a millisecond per capture, has no hyperparameters beyond lambda_l2, and is bit-stable enough that a save/load round-trip can be unit-tested. The SidecarModel interface in sidecar.py is small enough (one update method and one predict_correction method) that the implementation can be swapped later without disturbing the rest of the harness.

The L2 penalty (lambda_l2) defaults to 1.0 — small enough that the sidecar can shift the prediction by several VMAF points after ~50 captures, large enough that a single outlier capture cannot move the correction by more than ~0.5 VMAF.
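As a rough illustration of the rank-1 update described above, the following is a minimal dependency-free sketch, not the code in sidecar.py. The class name OnlineRidge, the constructor arguments, and the choice of the shipped predictor's residual (observed_vmaf minus predicted_vmaf) as the regression target are assumptions made for the example; only the method names update and predict_correction and the lambda_l2 default come from this digest.

```python
from typing import List, Sequence


class OnlineRidge:
    """Hypothetical sketch of online ridge regression with a rank-1 update."""

    def __init__(self, n_features: int, lambda_l2: float = 1.0) -> None:
        self.weights: List[float] = [0.0] * n_features
        # Inverse Gram matrix P starts at (1 / lambda_l2) * I.
        self.gram_inv: List[List[float]] = [
            [(1.0 / lambda_l2) if i == j else 0.0 for j in range(n_features)]
            for i in range(n_features)
        ]

    def predict_correction(self, features: Sequence[float]) -> float:
        """Linear correction w . x to add to the shipped prediction."""
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features: Sequence[float], residual: float) -> None:
        """Absorb one capture; residual is assumed to be observed - predicted."""
        x = list(features)
        # p_x = P x (P is symmetric, so this is also (x^T P)^T).
        p_x = [sum(p * x_j for p, x_j in zip(row, x)) for row in self.gram_inv]
        denom = 1.0 + sum(x_i * v for x_i, v in zip(x, p_x))
        gain = [v / denom for v in p_x]
        # Error of the current correction on this capture, before updating.
        error = residual - self.predict_correction(x)
        self.weights = [w + g * error for w, g in zip(self.weights, gain)]
        # Sherman-Morrison: P <- P - (P x)(P x)^T / (1 + x^T P x).
        for i, g_i in enumerate(gain):
            row = self.gram_inv[i]
            for j, v_j in enumerate(p_x):
                row[j] -= g_i * v_j
```

Each call costs O(d²) multiply-adds and never inverts a matrix, which matches the per-update cost listed in the table. Starting from zero weights and P = (1/lambda_l2) * I, the weights after n captures equal the batch ridge solution on those n captures, up to floating-point rounding.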

Privacy contract

The sidecar persists the following under ${XDG_CACHE_HOME:-~/.cache}/vmaf-tune/sidecar/<predictor-version>/<codec>/state.json:

A random per-install 128-bit host_uuid (hex). Generated by secrets.token_hex(16) on first run; persisted in ~/.cache/vmaf-tune/sidecar/host-uuid and re-read on every subsequent run. Never derived from MAC address, hostname, /etc/machine-id, CPUID, or any other machine-identifying signal.
The current ridge weight vector (d floats) and Gram inverse (d × d floats).
A bounded ring buffer of the last max_history_rows captures (default 500) — pairs of (features, observed_vmaf, predicted_vmaf, timestamp). The ring buffer is what the drift-detection hook reads; ridge does not need it for the closed-form update.
The predictor_version string the sidecar was trained against. Mismatch on load → discard everything except host_uuid and start fresh.

The sidecar does not persist the source filename, source SHA, IP address, network info, encoder argv, output path, or anything else that could re-identify the operator's encode stream.
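The identity and version-mismatch rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped code: the function names sidecar_root, load_or_create_host_uuid, and load_state are hypothetical, the example resolves the host-uuid file under the same XDG-aware root as state.json, and the JSON key predictor_version is assumed to match the field named in the list.

```python
import json
import os
import secrets
from pathlib import Path
from typing import Mapping, Optional


def sidecar_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve <cache>/vmaf-tune/sidecar, honouring XDG_CACHE_HOME when set."""
    env = os.environ if env is None else env
    cache_home = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_home) / "vmaf-tune" / "sidecar"


def load_or_create_host_uuid(root: Path) -> str:
    """Return the persisted per-install id, generating it on first run."""
    path = root / "host-uuid"
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    # 128 random bits as 32 hex characters; no machine-derived input.
    host_uuid = secrets.token_hex(16)
    root.mkdir(parents=True, exist_ok=True)
    path.write_text(host_uuid + "\n", encoding="utf-8")
    return host_uuid


def load_state(root: Path, predictor_version: str, codec: str) -> Optional[dict]:
    """Load state.json; None means the caller should start fresh."""
    path = root / predictor_version / codec / "state.json"
    if not path.exists():
        return None
    state = json.loads(path.read_text(encoding="utf-8"))
    if state.get("predictor_version") != predictor_version:
        # Trained against a different predictor: discard the learned state.
        # The host-uuid file lives outside state.json, so it survives.
        return None
    return state
```

Keeping the host id in its own file is what lets a version mismatch discard the learned state without minting a new identity.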

The opt-in upload surface (out of scope for this PR; tracked in ADR-0325 §Future work) will need a separate consent UX and a signing chain. The local-only persisted state is upload-ready in the sense that anonymisation is already enforced — but the upload itself is not implemented here, and the cache dir is not network-reachable by the harness.

Cold-start behaviour

Before any captures: the sidecar's correction is identically zero. SidecarPredictor.predict_vmaf delegates to the shipped Predictor and adds 0.0. This means a fresh install behaves bit-identically to the bare shipped predictor — no surprise quality drop, no "predictor changed after I installed sidecar" effect.

Algorithmically: the ridge weights are initialised to zero and the inverse Gram to (1/lambda_l2) * I. With zero weights, every prediction returns zero correction.

The first capture produces a non-zero update; subsequent captures refine it. Operators who never hit the capture path (e.g. dry-run encodes that don't run libvmaf) never see a non-zero correction either — this is intentional.
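The cold-start property can be illustrated with a small sketch of the delegation. The method signatures here are assumptions, since this digest only names SidecarPredictor.predict_vmaf and predict_correction; StubPredictor and FreshSidecar are hypothetical stand-ins for the shipped predictor and a newly initialised ridge head.

```python
from typing import Protocol, Sequence


class BasePredictor(Protocol):
    def predict_vmaf(self, features: Sequence[float]) -> float: ...


class CorrectionHead(Protocol):
    def predict_correction(self, features: Sequence[float]) -> float: ...


class StubPredictor:
    """Hypothetical stand-in for the shipped predictor."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict_vmaf(self, features: Sequence[float]) -> float:
        return self.value


class FreshSidecar:
    """Hypothetical ridge head before any capture: all weights are zero."""

    def __init__(self, n_features: int) -> None:
        self.weights = [0.0] * n_features

    def predict_correction(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features))


class SidecarPredictor:
    """Delegates to the base predictor and adds the sidecar's correction."""

    def __init__(self, base: BasePredictor, sidecar: CorrectionHead) -> None:
        self.base = base
        self.sidecar = sidecar

    def predict_vmaf(self, features: Sequence[float]) -> float:
        return self.base.predict_vmaf(features) + self.sidecar.predict_correction(features)
```

With all-zero weights the correction is 0.0 for any finite feature vector, so the wrapped prediction equals the bare one, and a unit test can assert equality directly rather than with a tolerance.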

Drift detection hook

The scaffold tracks the rolling residual norm, computed as the root mean square (RMS) of the last 50 residuals, where each residual is observed_vmaf − (predicted_vmaf + sidecar_correction). Surfacing this as an operator-visible warning is a follow-up PR; the scaffold exposes it as SidecarModel.recent_residual_rms for the future drift-detection caller.
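One way to compute such a rolling signal is sketched below. The class name ResidualWindow and its record method are hypothetical; only the residual definition, the 50-capture window, and the attribute name recent_residual_rms come from this digest. The sketch reports the RMS, which is the L2 norm of the window divided by the square root of the number of residuals in it.

```python
import math
from collections import deque
from typing import Deque


class ResidualWindow:
    """Hypothetical helper: rolling RMS over the most recent residuals."""

    def __init__(self, window: int = 50) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._residuals: Deque[float] = deque(maxlen=window)

    def record(
        self, observed_vmaf: float, predicted_vmaf: float, sidecar_correction: float
    ) -> None:
        """Store observed - (predicted + correction) for one capture."""
        self._residuals.append(observed_vmaf - (predicted_vmaf + sidecar_correction))

    @property
    def recent_residual_rms(self) -> float:
        """RMS of the stored residuals; 0.0 before any capture."""
        if not self._residuals:
            return 0.0
        mean_sq = sum(r * r for r in self._residuals) / len(self._residuals)
        return math.sqrt(mean_sq)
```

Dividing by the window length keeps the value comparable while the window is still filling, which a raw L2 norm would not.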

The signal interpretation:

Norm stays bounded (e.g., RMS ≤ 1.5 VMAF for a 50-capture window): the sidecar is absorbing the structured bias — healthy.
Norm grows despite continued captures: the residual has non-linear structure that the linear sidecar can't fit. Recommend the operator runs the offline retrain (ensemble training kit, ADR-0324) with their own corpus shard, or contributes to the community pool once item 4 lands.
Norm collapses to near-zero: either the operator's content is an exact match for the shipped corpus (unusual but possible), or there is too little capture diversity to reveal residual structure (only one source, one CRF). Not actionable — informational.

The threshold values are deliberately not pinned in this digest; they will be calibrated against real captures from a future operator-data collection PR. The scaffold ships the hook, not the thresholds.

What the scaffold does NOT include

Any path by which training data leaves the host (community-pool upload).
An operator-facing CLI flag to enable or disable the sidecar. The initial integration is library-only; CLI wiring lands once the surface has been validated against real operator usage.
A drift-warning UX. Hook only; threshold + presentation are a follow-up.
Cross-host sidecar migration. Each install starts fresh.
Replacement of the shipped predictor with an online-trained model (rejected in ADR-0325 §Alternatives).

References

ADR-0325 — the decision this digest supports.
Research-0087 — Section 1 item 3 (the gap this scaffold opens).
Sherman, J., Morrison, W. J. (1950). Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. Annals of Mathematical Statistics 21(1):124–127. — closed-form rank-1 inverse-update used by the online-ridge implementation.