Local sidecar training

Status: the local surface has shipped. The public API, persistence layout, closed-form online-ridge fit, and operator-facing vmaf-tune sidecar CLI are available. Drift-warning thresholds and opt-in upload to a community pool remain follow-ups tracked in ADR-0394 §Consequences.

Purpose

The fork ships a per-shot VMAF predictor (tools/vmaf-tune/src/vmaftune/predictor.py) trained against a fixed offline corpus (Phase-A canonical-6 + BVI-DVC, see ADR-0309 and ADR-0310). The shipped predictor is deterministic and reproducible across hosts — it does not adapt to your specific source mix.

The sidecar is a small bias-correction model an operator trains on their own host from the residuals between the predictor's output and the libvmaf score actually observed at encode time. At inference the sidecar's correction is added to the shipped predictor's output; the shipped predictor itself stays read-only:

sidecar_vmaf = Predictor.predict_vmaf(features, crf, codec)
             + SidecarModel.predict_correction(features, crf)
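The additive composition above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the vmaftune implementation. Features are simplified to a plain sequence of floats, and the codec argument is assumed to be bound elsewhere. `toy_base` and `cold_start_correction` are hypothetical stand-ins for `Predictor.predict_vmaf` and an untrained `SidecarModel`.

```python
from typing import Callable, Sequence

# Hypothetical score signature: (feature vector, crf) -> VMAF. The real
# Predictor.predict_vmaf also takes a codec; this sketch assumes it is bound.
ScoreFn = Callable[[Sequence[float], int], float]


def compose(base: ScoreFn, correction: ScoreFn) -> ScoreFn:
    """Return a predictor that adds the correction; `base` is only called."""

    def predict(features: Sequence[float], crf: int) -> float:
        return base(features, crf) + correction(features, crf)

    return predict


def toy_base(features: Sequence[float], crf: int) -> float:
    # Hypothetical stand-in for the shipped predictor: VMAF falls with CRF.
    return 100.0 - 1.5 * crf


def cold_start_correction(features: Sequence[float], crf: int) -> float:
    # Hypothetical untrained sidecar: contributes exactly zero.
    return 0.0


predict = compose(toy_base, cold_start_correction)
print(predict([1.0, 2.0], 28))  # 58.0
```

Because an untrained correction returns exactly 0.0, the composed score equals the base score. This is the cold-start pass-through behaviour that the contract tests pin.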

The sidecar adapts to:

Your encoder's behaviour at the bitrate / preset combinations you actually use, including patterns the offline corpus doesn't emphasise.
Your source characteristics (animation vs live action, grain structure, framing, colour science) where the corpus average is the wrong prior.
Per-codec drift between the shipped predictor and your real encodes.

It does not replace the shipped predictor. A model upgrade invalidates the sidecar cleanly, and the correction restarts from a cold start (zero correction).

Training data

The sidecar is trained from your own encodes. Each capture is a 4-tuple:

Field	Source	Persisted?
`features` (`ShotFeatures`)	Probe encode + signalstats. Already produced by `vmaftune.predictor_features`.	Yes (in the rolling history ring buffer).
`crf` (int)	The CRF the encoder ran at.	Yes.
`predicted_vmaf` (float)	`Predictor.predict_vmaf(features, crf, codec)`.	Yes (in the residual).
`observed_vmaf` (float)	The libvmaf score of the real encode.	Yes (in the residual).
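The correction is added to the shipped prediction, so the natural regression target is the residual `observed_vmaf - predicted_vmaf`. The following is a minimal sketch of one capture record. The `Capture` class is hypothetical, and `features` is reduced to a tuple of floats in place of `ShotFeatures`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Capture:
    """One observed encode (hypothetical record mirroring the table fields)."""

    features: Tuple[float, ...]  # simplified stand-in for ShotFeatures
    crf: int
    predicted_vmaf: float
    observed_vmaf: float

    @property
    def residual(self) -> float:
        # What the shipped predictor missed; adding it back recovers observed_vmaf.
        return self.observed_vmaf - self.predicted_vmaf


cap = Capture(features=(4200.0,), crf=28, predicted_vmaf=90.0, observed_vmaf=94.2)
# cap.residual is about +4.2: the shipped predictor under-estimated this encode.
```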

Persistence layout (per Research-0086):

${XDG_CACHE_HOME:-~/.cache}/vmaf-tune/sidecar/
  host-uuid                                  # 32-char hex, random per install
  <predictor-version>/
    <codec>/
      state.json                             # ridge weights + A_inv + history

Nothing written under the cache dir records source filenames, source SHAs, IPs, encoder argv, or output paths. The host UUID is a 128-bit random token generated by secrets.token_hex(16) on first install. It is never derived from MAC, hostname, /etc/machine-id, CPUID, or any other machine-identifying signal (see ADR-0394 §Consequences).
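The following sketch shows the get-or-create behaviour described above. It assumes the `host-uuid` file sits directly in the sidecar directory, as the layout shows. `get_or_create_host_uuid` is a hypothetical helper name, not part of vmaftune's documented API. The only entropy source is `secrets.token_hex(16)`.

```python
import secrets
from pathlib import Path


def get_or_create_host_uuid(sidecar_dir: Path) -> str:
    """Return this install's anonymous token, creating it on first use (hypothetical helper)."""
    path = sidecar_dir / "host-uuid"
    if path.exists():
        return path.read_text(encoding="ascii").strip()
    sidecar_dir.mkdir(parents=True, exist_ok=True)
    # 16 random bytes -> 32 lowercase hex chars; no machine identifier is read.
    token = secrets.token_hex(16)
    path.write_text(token + "\n", encoding="ascii")
    return token
```

Repeated calls against the same directory return the same token. A different directory yields a fresh one.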

Set a different cache dir through XDG_CACHE_HOME or by passing SidecarConfig(cache_dir=...) directly.
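The path resolution can be sketched as follows. `sidecar_state_path` is hypothetical. It assumes that an explicit `cache_dir` replaces the whole `.../vmaf-tune/sidecar` root; the real `SidecarConfig(cache_dir=...)` may anchor the override differently. The version string `"v1"` in the usage is a placeholder.

```python
import os
from pathlib import Path
from typing import Optional


def sidecar_state_path(
    predictor_version: str, codec: str, cache_dir: Optional[Path] = None
) -> Path:
    """Resolve <sidecar-root>/<predictor-version>/<codec>/state.json (hypothetical helper)."""
    if cache_dir is None:
        xdg = os.environ.get("XDG_CACHE_HOME")
        # Mirrors ${XDG_CACHE_HOME:-~/.cache}: unset or empty falls back to ~/.cache.
        base = Path(xdg) if xdg else Path.home() / ".cache"
        cache_dir = base / "vmaf-tune" / "sidecar"
    return cache_dir / predictor_version / codec / "state.json"


print(sidecar_state_path("v1", "libx264", Path("/tmp/sidecar")))
```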

Op allowlist

The sidecar's online-ridge fit is implemented in pure Python (closed-form Sherman-Morrison rank-1 update on the inverse Gram matrix). There is no ONNX graph, no torch dependency, and no inference engine to allowlist — the harness's zero-dep contract on tools/vmaf-tune/pyproject.toml is preserved.
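The following is a minimal pure-Python sketch of online ridge regression with Sherman-Morrison rank-1 updates. It is not the fork's code. The regularisation strength `lam` and the plain feature vector (for example a constant bias term plus scaled CRF) are assumptions, not the fork's actual feature map. The class keeps `A_inv = (lam * I + sum x x^T)^-1` and `b = sum y * x`, so `w = A_inv b` is the exact ridge solution after every update.

```python
from typing import List, Sequence


class OnlineRidge:
    """Closed-form ridge refit after every sample via Sherman-Morrison (sketch)."""

    def __init__(self, dim: int, lam: float = 1.0) -> None:
        # A starts as lam * I, so its inverse is (1 / lam) * I.
        self.a_inv: List[List[float]] = [
            [1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)
        ]
        self.b: List[float] = [0.0] * dim
        self.w: List[float] = [0.0] * dim
        self.n_updates = 0

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float], y: float) -> None:
        n = len(self.b)
        # u = A_inv x; A_inv is symmetric, so x^T A_inv = u^T.
        u = [sum(self.a_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(x[i] * u[i] for i in range(n))
        # Sherman-Morrison: (A + x x^T)^-1 = A_inv - u u^T / (1 + x^T A_inv x).
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        for i in range(n):
            self.b[i] += y * x[i]
        # Exact ridge weights for all samples seen so far.
        self.w = [sum(self.a_inv[i][j] * self.b[j] for j in range(n)) for i in range(n)]
        self.n_updates += 1
```

Each update costs O(d²) arithmetic for a d-dimensional feature vector and never inverts a matrix. That keeps a per-capture refit cheap without NumPy.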

If a future PR replaces the linear baseline with a tiny MLP exported to ONNX, the op allowlist will follow the canonical fork pattern documented in docs/ai/quantization.md and docs/ai/inference.md. For the current pure-Python sidecar there is nothing to allowlist.

Validation metrics

The sidecar model is validated by contract tests under tools/vmaf-tune/tests/test_sidecar.py:

Test	Pin
`test_cold_start_passes_through`	An empty sidecar adds zero correction → `SidecarPredictor` is bit-equivalent to the bare `Predictor` until the first capture lands.
`test_update_then_predict_reduces_residual`	After 40 captures with a constant +5 VMAF bias, the sidecar's prediction is closer to the observed score than the bare predictor's by at least 50 % of the bias (2.5 VMAF).
`test_save_load_round_trip`	Train → save → load preserves weights, `A_inv`, history, and `n_updates` exactly; the round-tripped predictor returns the same prediction within 1e-9 VMAF.
`test_anonymised_host_uuid_stable_within_install`	UUID persists across `SidecarPredictor` reconstructions sharing a cache dir. Two distinct cache dirs ⇒ two distinct UUIDs (showing it is not derived from machine-identifying info).
`test_predictor_version_change_invalidates_sidecar`	Bumping `SidecarConfig.predictor_version` returns a cold-start model rather than a stale fit.

Run them locally:

cd tools/vmaf-tune
python -m pytest tests/test_sidecar.py -v

The CLI surface is validated by tools/vmaf-tune/tests/test_cli_sidecar.py:

Test	Pin
`test_sidecar_subparser_is_registered`	`vmaf-tune sidecar` remains a top-level subcommand.
`test_sidecar_help_lists_operator_commands`	Help exposes `status`, `predict`, `record`, and `batch-record`.
`test_sidecar_status_json_uses_requested_cache`	`--cache-dir` is honoured and host UUID creation stays local.
`test_sidecar_record_then_predict_applies_correction`	A recorded capture persists and changes a later prediction.
`test_sidecar_batch_record_accepts_features_wrapper`	JSONL batch capture rows support `{ "features": ... }` wrappers.

Run both suites together:

cd tools/vmaf-tune
python -m pytest tests/test_sidecar.py tests/test_cli_sidecar.py -v

The suites are sub-second and CPU-only.

The drift-detection signal (SidecarModel.recent_residual_rms) is reported by vmaf-tune sidecar status. A thresholded warning policy is deferred to a future PR because the fork does not yet have corpus-backed operating ranges to set thresholds from.
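One way to compute such a signal is a root-mean-square over a fixed-size window of recent residuals. The sketch below assumes a window of 64 captures and residuals pushed as `observed - predicted`. This page documents neither the window length nor whether the fork measures residuals before or after correction. `ResidualWindow` is a hypothetical name.

```python
import math
from collections import deque
from typing import Deque, Optional


class ResidualWindow:
    """RMS of the most recent residuals (hypothetical drift signal)."""

    def __init__(self, maxlen: int = 64) -> None:
        # deque(maxlen=...) drops the oldest entry once full, i.e. a ring buffer.
        self._buf: Deque[float] = deque(maxlen=maxlen)

    def push(self, residual: float) -> None:
        self._buf.append(residual)

    def rms(self) -> Optional[float]:
        if not self._buf:
            return None  # no captures yet, nothing to report
        return math.sqrt(sum(r * r for r in self._buf) / len(self._buf))
```

No threshold is applied, which matches the deferred warning policy. The sketch only reports the number.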

Signing for opt-in upload

The opt-in upload of anonymised captures to a community pool — item 4 of the user's ChatGPT-vision text — is out of scope for this local sidecar surface. It is tracked as a follow-up in ADR-0394 §Consequences and will require its own ADR covering:

Explicit consent UX before any byte leaves the host.
A signing chain for capture bundles (Sigstore keyless signing, matching the fork's release-signing posture; see docs/development/release.md).
An upload protocol (transport, batching, retry).
A community-pool aggregation policy.

Until then, the sidecar is strictly local. The harness exposes no network path to the cache directory, and no upload code path exists. The recorded host UUID serves only as an anonymous handle for the local state.

Programmatic usage

from vmaftune.predictor import Predictor, ShotFeatures
from vmaftune.sidecar import SidecarConfig, SidecarPredictor

predictor = Predictor()  # or Predictor(model_path=Path("model/predictor_libx264.onnx"))
sidecar = SidecarPredictor.for_codec(
    predictor,
    codec="libx264",
    config=SidecarConfig(),  # default cache dir under $XDG_CACHE_HOME
)

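# At inference time. `features` is the shot's ShotFeatures, as produced by
# vmaftune.predictor_features:
score_hat = sidecar.predict_vmaf(features, crf=28)

# After the real encode + libvmaf run; real_libvmaf_score is the measured score:
sidecar.record_capture(features, crf=28, observed_vmaf=real_libvmaf_score)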

record_capture saves to disk by default; pass persist=False for batched workflows that flush at session end via sidecar.save().

CLI usage

The vmaf-tune sidecar subcommand provides the same local model for shell workflows:

vmaf-tune sidecar status --codec libx264 --json

vmaf-tune sidecar record \
  --codec libx264 \
  --features-json features.json \
  --crf 28 \
  --observed-vmaf 94.2

vmaf-tune sidecar predict \
  --codec libx264 \
  --features-json features.json \
  --crf 28 \
  --json

features.json may be a flat ShotFeatures object or an object with a features member. The required keys are probe_bitrate_kbps, probe_i_frame_avg_bytes, probe_p_frame_avg_bytes, and probe_b_frame_avg_bytes; omitted optional signals use the same zero defaults as ShotFeatures.
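The following sketch covers the unwrapping and required-key check described above. `load_feature_dict` is hypothetical and is not the CLI's own parser. It only validates and returns the dictionary. The zero defaults for optional signals are left to `ShotFeatures` itself, because the full list of optional fields is not given here.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = (
    "probe_bitrate_kbps",
    "probe_i_frame_avg_bytes",
    "probe_p_frame_avg_bytes",
    "probe_b_frame_avg_bytes",
)


def load_feature_dict(text: str) -> Dict[str, Any]:
    """Accept a flat object or one wrapped as {"features": {...}} (hypothetical helper)."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("features JSON must be an object")
    inner = obj.get("features")
    feats = inner if isinstance(inner, dict) else obj
    missing = [k for k in REQUIRED_KEYS if k not in feats]
    if missing:
        raise ValueError(f"missing required feature keys: {', '.join(missing)}")
    return feats
```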

For existing capture logs, write JSONL rows containing features, crf, and observed_vmaf:

vmaf-tune sidecar batch-record \
  --codec libx264 \
  --captures-jsonl captures.jsonl
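If the capture log lives in Python, the standard `json` module can produce a file in the expected shape. `write_captures_jsonl` and its input row dictionaries are hypothetical. The sketch writes only the three documented keys, so stray fields such as source filenames never reach the file.

```python
import json
from typing import Any, Dict, Iterable, TextIO


def write_captures_jsonl(rows: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write one JSON object per line with features, crf, observed_vmaf (hypothetical)."""
    count = 0
    for row in rows:
        record = {
            "features": row["features"],  # flat ShotFeatures keys or a wrapper
            "crf": int(row["crf"]),
            "observed_vmaf": float(row["observed_vmaf"]),
        }
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


# Usage: with open("captures.jsonl", "w", encoding="utf-8") as fh:
#            write_captures_jsonl(my_rows, fh)
```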

All sidecar subcommands honour --cache-dir, --predictor-version, and --model. The default predictor is the deterministic analytical fallback, so the sidecar commands remain usable on hosts without onnxruntime.

Interaction with model upgrades

When the shipped predictor version bumps, the sidecar's recorded predictor_version no longer matches SidecarConfig.predictor_version, and SidecarModel.load discards the old state and returns a cold-start model. This is intentional — a stale correction trained against the old predictor's residuals would be worse than no correction against the new predictor.
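The load-time check can be sketched as below. The `state.json` keys used here (`predictor_version`, `n_updates`) are assumptions modelled on names that appear on this page. `load_state` is a hypothetical helper, not `SidecarModel.load` itself.

```python
import json
from pathlib import Path
from typing import Any, Dict


def cold_start_state(predictor_version: str) -> Dict[str, Any]:
    # Hypothetical empty state: no captures, so the correction is zero.
    return {"predictor_version": predictor_version, "n_updates": 0}


def load_state(path: Path, predictor_version: str) -> Dict[str, Any]:
    """Return saved state, or a cold start if it belongs to another predictor."""
    if not path.exists():
        return cold_start_state(predictor_version)
    state = json.loads(path.read_text(encoding="utf-8"))
    if state.get("predictor_version") != predictor_version:
        # A fit to the old predictor's residuals is discarded, not migrated.
        return cold_start_state(predictor_version)
    return state
```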

Operators see a one-time reset to bare-predictor scores after a model upgrade; the sidecar re-fits from the next encode's capture onward.