Local sidecar training

Status: shipped local surface — public API, persistence layout, closed-form online-ridge fit, and operator-facing vmaf-tune sidecar CLI are available. Drift-warning thresholds and opt-in upload to a community pool remain follow-ups in ADR-0394 §Consequences.

Purpose

The fork ships a per-shot VMAF predictor (tools/vmaf-tune/src/vmaftune/predictor.py) trained against a fixed offline corpus (Phase-A canonical-6 + BVI-DVC, see ADR-0309 and ADR-0310). The shipped predictor is deterministic and reproducible across hosts — it does not adapt to your specific source mix.

The sidecar is a small bias-correction model an operator trains on their own host from the residuals between the predictor's output and the libvmaf score actually observed at encode time. At inference the sidecar's correction is added to the shipped predictor's output; the shipped predictor itself stays read-only:

sidecar_vmaf = Predictor.predict_vmaf(features, crf, codec)
             + SidecarModel.predict_correction(features, crf)

The sidecar adapts to:

  • Your encoder's behaviour at the bitrate / preset combinations you actually use, including patterns the offline corpus doesn't emphasise.
  • Your source characteristics (animation vs live action, grain structure, framing, colour science) where the corpus average is the wrong prior.
  • Per-codec drift between the shipped predictor and your real encodes.

It does not replace the shipped predictor. A model upgrade invalidates the sidecar cleanly and starts a fresh cold-start correction.

Training data

The sidecar is trained from your own encodes. Each capture is a 4-tuple:

| Field | Source | Persisted? |
| --- | --- | --- |
| features (ShotFeatures) | Probe encode + signalstats. Already produced by vmaftune.predictor_features. | Yes (in the rolling history ring buffer). |
| crf (int) | The CRF the encoder ran at. | Yes. |
| predicted_vmaf (float) | Predictor.predict_vmaf(features, crf, codec). | Yes (in the residual). |
| observed_vmaf (float) | The libvmaf score of the real encode. | Yes (in the residual). |

Persistence layout (per Research-0086):

${XDG_CACHE_HOME:-~/.cache}/vmaf-tune/sidecar/
  host-uuid                                  # 32-char hex, random per install
  <predictor-version>/
    <codec>/
      state.json                             # ridge weights + A_inv + history

The cache dir does not persist source filenames, source SHAs, IPs, encoder argv, or output paths. The host UUID is a 128-bit random token generated by secrets.token_hex(16) on first install — it is never derived from MAC, hostname, /etc/machine-id, CPUID, or any other machine-identifying signal (see ADR-0394 §Consequences).
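
The privacy property above can be illustrated with a minimal sketch. The helper name `load_or_create_host_uuid` and its file handling are assumptions made for illustration and are not taken from the vmaftune source; only `secrets.token_hex(16)` and the `host-uuid` file name come from this page.

```python
import secrets
from pathlib import Path


def load_or_create_host_uuid(sidecar_dir: Path) -> str:
    """Return the install's random token, creating it on first use.

    Hypothetical helper: the real implementation may differ.
    """
    token_path = sidecar_dir / "host-uuid"
    if token_path.exists():
        # Reuse the existing token so it stays stable within one install.
        return token_path.read_text(encoding="ascii").strip()
    sidecar_dir.mkdir(parents=True, exist_ok=True)
    # 16 random bytes from the OS CSPRNG -> 32 hex characters (128 bits).
    token = secrets.token_hex(16)
    token_path.write_text(token + "\n", encoding="ascii")
    return token
```

Because the token comes only from the operating system's random source, two cache directories on the same machine get unrelated tokens, which is the behaviour the contract test for the host UUID checks.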

Set a different cache dir through XDG_CACHE_HOME or by passing SidecarConfig(cache_dir=...) directly.
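
As a rough sketch of how the layout above resolves to a concrete state file, the function below mirrors the `${XDG_CACHE_HOME:-~/.cache}` shell default. The function name `sidecar_state_path` is hypothetical, and it assumes an explicit `cache_dir` points at the sidecar root directory; the real SidecarConfig may interpret that argument differently.

```python
import os
from pathlib import Path
from typing import Optional


def sidecar_state_path(
    predictor_version: str,
    codec: str,
    cache_dir: Optional[Path] = None,
) -> Path:
    """Resolve <root>/<predictor-version>/<codec>/state.json (illustrative only)."""
    if cache_dir is None:
        # Like ${XDG_CACHE_HOME:-~/.cache}: fall back when unset or empty.
        base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
        cache_dir = Path(base) / "vmaf-tune" / "sidecar"
    return cache_dir / predictor_version / codec / "state.json"
```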

Op allowlist

The sidecar's online-ridge fit is implemented in pure Python (closed-form Sherman-Morrison rank-1 update on the inverse Gram matrix). There is no ONNX graph, no torch dependency, and no inference engine to allowlist — the harness's zero-dep contract on tools/vmaf-tune/pyproject.toml is preserved.
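
For readers unfamiliar with the technique, here is a minimal pure-Python sketch of an online ridge fit with a Sherman-Morrison rank-1 update. The class name `OnlineRidge`, the `ridge_lambda` default, and the two-element feature vector are assumptions for illustration; the shipped SidecarModel may use a different feature layout and regularisation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class OnlineRidge:
    """Recursive ridge regression that keeps the inverse Gram matrix current."""

    def __init__(self, n_features: int, ridge_lambda: float = 1.0) -> None:
        # Cold start: A = lambda * I, so A_inv = I / lambda and weights are zero.
        self.a_inv: Matrix = [
            [(1.0 / ridge_lambda) if i == j else 0.0 for j in range(n_features)]
            for i in range(n_features)
        ]
        self.weights: Vector = [0.0] * n_features
        self.n_updates = 0

    def predict(self, x: Vector) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: Vector, target: float) -> None:
        n = len(x)
        # u = A_inv @ x (A_inv is symmetric, so u^T equals x^T A_inv).
        u = [sum(row[j] * x[j] for j in range(n)) for row in self.a_inv]
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        gain = [ui / denom for ui in u]
        # Residual is measured with the weights from before this update.
        error = target - self.predict(x)
        # Sherman-Morrison: (A + x x^T)^-1 = A_inv - (u u^T) / (1 + x^T u).
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= gain[i] * u[j]
        self.weights = [w + g * error for w, g in zip(self.weights, gain)]
        self.n_updates += 1


# Illustrative use: learn a constant +5 VMAF residual from 40 captures.
model = OnlineRidge(n_features=2)
x_example = [1.0, 28 / 51]  # hypothetical features: bias term, scaled CRF
for _ in range(40):
    model.update(x_example, target=5.0)
correction = model.predict(x_example)
```

Each update costs O(d²) in the feature dimension and never inverts a matrix, which is what keeps a dependency-free implementation practical. A freshly constructed model predicts zero, consistent with the cold-start pass-through behaviour described under Validation metrics.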

If a future PR replaces the linear baseline with a tiny MLP exported to ONNX, the op allowlist will follow the canonical fork pattern documented in docs/ai/quantization.md and docs/ai/inference.md. For the current pure-Python sidecar there is nothing to allowlist.

Validation metrics

The sidecar model is validated by contract tests under tools/vmaf-tune/tests/test_sidecar.py:

| Test | Pin |
| --- | --- |
| test_cold_start_passes_through | An empty sidecar adds zero correction → SidecarPredictor is bit-equivalent to the bare Predictor until the first capture lands. |
| test_update_then_predict_reduces_residual | After 40 captures with a constant +5 VMAF bias, the sidecar's prediction is closer to observed than the bare predictor's by at least 50 % of the bias. |
| test_save_load_round_trip | Train → save → load preserves weights, A_inv, history, and n_updates exactly; the round-tripped predictor returns the same prediction within 1e-9 VMAF. |
| test_anonymised_host_uuid_stable_within_install | UUID persists across SidecarPredictor reconstructions sharing a cache dir. Two distinct cache dirs ⇒ two distinct UUIDs (proves it is not derived from machine-identifying info). |
| test_predictor_version_change_invalidates_sidecar | Bumping SidecarConfig.predictor_version returns a cold-start model rather than a stale fit. |

Run them locally:

cd tools/vmaf-tune
python -m pytest tests/test_sidecar.py -v

The CLI surface is validated by tools/vmaf-tune/tests/test_cli_sidecar.py:

| Test | Pin |
| --- | --- |
| test_sidecar_subparser_is_registered | vmaf-tune sidecar remains a top-level subcommand. |
| test_sidecar_help_lists_operator_commands | Help exposes status, predict, record, and batch-record. |
| test_sidecar_status_json_uses_requested_cache | --cache-dir is honoured and host UUID creation stays local. |
| test_sidecar_record_then_predict_applies_correction | A recorded capture persists and changes a later prediction. |
| test_sidecar_batch_record_accepts_features_wrapper | JSONL batch capture rows support { "features": ... } wrappers. |

Run both suites together:

cd tools/vmaf-tune
python -m pytest tests/test_sidecar.py tests/test_cli_sidecar.py -v

The suites are sub-second and CPU-only.

The drift-detection signal (SidecarModel.recent_residual_rms) is reported by vmaf-tune sidecar status. A thresholded warning policy is deferred to a future PR because the fork still needs corpus-backed operating ranges.
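
To make the drift signal concrete, the sketch below computes a root-mean-square over a rolling window of residuals (observed minus predicted VMAF). The window size, the sample values, and the standalone function are assumptions for illustration; how SidecarModel actually derives recent_residual_rms is not specified on this page.

```python
import math
from collections import deque
from typing import Deque, Iterable


def recent_residual_rms(residuals: Iterable[float]) -> float:
    """RMS of the supplied residuals; 0.0 when there is no history yet."""
    values = list(residuals)
    if not values:
        return 0.0
    return math.sqrt(sum(r * r for r in values) / len(values))


# Hypothetical rolling window: only the 4 most recent residuals are kept.
history: Deque[float] = deque(maxlen=4)
sample_pairs = [(94.2, 93.0), (91.0, 92.5), (88.4, 88.0), (95.1, 93.9), (90.0, 90.6)]
for observed, predicted in sample_pairs:
    history.append(observed - predicted)

rms = recent_residual_rms(history)
```

A rising value suggests the predictor plus correction is tracking recent encodes less well, which is the kind of trend a future threshold policy could act on.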

Signing for opt-in upload

The opt-in upload of anonymised captures to a community pool — item 4 of the user's ChatGPT-vision text — is out of scope for this local sidecar surface. It is tracked as a follow-up in ADR-0394 §Consequences and will require its own ADR covering:

  • Explicit consent UX before any byte leaves the host.
  • A signing chain for capture bundles (Sigstore keyless follows the fork's release-signing posture; see docs/development/release.md).
  • An upload protocol (transport, batching, retry).
  • A community-pool aggregation policy.

Until then the sidecar is strictly local: the cache directory is not network-reachable from the harness, no upload code path exists, and the recorded host UUID is solely an anonymous handle for the local state.

Programmatic usage

from vmaftune.predictor import Predictor, ShotFeatures
from vmaftune.sidecar import SidecarConfig, SidecarPredictor

predictor = Predictor()  # or Predictor(model_path=Path("model/predictor_libx264.onnx"))
sidecar = SidecarPredictor.for_codec(
    predictor,
    codec="libx264",
    config=SidecarConfig(),  # default cache dir under $XDG_CACHE_HOME
)

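# At inference time (features is a ShotFeatures built from the probe encode):
score_hat = sidecar.predict_vmaf(features, crf=28)

# After the real encode + libvmaf run (real_libvmaf_score is the observed float score):
sidecar.record_capture(features, crf=28, observed_vmaf=real_libvmaf_score)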

record_capture saves to disk by default; pass persist=False for batched workflows that flush at session end via sidecar.save().
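
The batched pattern could look roughly like the following. `record_session` is a hypothetical helper, not part of vmaftune; it relies only on the `record_capture(..., persist=False)` and `save()` calls described above and accepts any object that provides them.

```python
from typing import Any, Iterable, Tuple


def record_session(sidecar: Any, captures: Iterable[Tuple[Any, int, float]]) -> int:
    """Record (features, crf, observed_vmaf) captures and write state once.

    Hypothetical helper: `sidecar` is expected to behave like SidecarPredictor.
    """
    count = 0
    for features, crf, observed_vmaf in captures:
        # Update the in-memory fit without touching the disk.
        sidecar.record_capture(
            features, crf=crf, observed_vmaf=observed_vmaf, persist=False
        )
        count += 1
    if count:
        # Single flush at session end.
        sidecar.save()
    return count
```

The trade-off is that captures recorded since the last save are lost if the process exits before the final flush.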

CLI usage

The vmaf-tune sidecar subcommand provides the same local model for shell workflows:

vmaf-tune sidecar status --codec libx264 --json

vmaf-tune sidecar record \
  --codec libx264 \
  --features-json features.json \
  --crf 28 \
  --observed-vmaf 94.2

vmaf-tune sidecar predict \
  --codec libx264 \
  --features-json features.json \
  --crf 28 \
  --json

features.json may be a flat ShotFeatures object or an object with a features member. The required keys are probe_bitrate_kbps, probe_i_frame_avg_bytes, probe_p_frame_avg_bytes, and probe_b_frame_avg_bytes; omitted optional signals use the same zero defaults as ShotFeatures.

For existing capture logs, write JSONL rows containing features, crf, and observed_vmaf:

vmaf-tune sidecar batch-record \
  --codec libx264 \
  --captures-jsonl captures.jsonl
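
One way to produce such a file from an existing log is sketched below. The helper `write_captures_jsonl` and the numeric values are illustrative assumptions; the row shape (features, crf, observed_vmaf) and the four required feature keys are the ones listed above.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable

REQUIRED_FEATURE_KEYS = {
    "probe_bitrate_kbps",
    "probe_i_frame_avg_bytes",
    "probe_p_frame_avg_bytes",
    "probe_b_frame_avg_bytes",
}


def write_captures_jsonl(path: Path, rows: Iterable[Dict[str, Any]]) -> int:
    """Write one JSON object per line; returns the number of rows written."""
    written = 0
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            missing = REQUIRED_FEATURE_KEYS - set(row["features"])
            if missing:
                raise ValueError(f"capture is missing feature keys: {sorted(missing)}")
            record = {
                "features": row["features"],
                "crf": int(row["crf"]),
                "observed_vmaf": float(row["observed_vmaf"]),
            }
            handle.write(json.dumps(record) + "\n")
            written += 1
    return written


# Placeholder values, not measurements from a real encode.
example_rows = [
    {
        "features": {
            "probe_bitrate_kbps": 2400.0,
            "probe_i_frame_avg_bytes": 52000.0,
            "probe_p_frame_avg_bytes": 9100.0,
            "probe_b_frame_avg_bytes": 3100.0,
        },
        "crf": 28,
        "observed_vmaf": 94.2,
    },
]
# write_captures_jsonl(Path("captures.jsonl"), example_rows)
```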

All CLI paths honour --cache-dir, --predictor-version, and --model. The default predictor is the deterministic analytical fallback, so the sidecar commands remain usable on hosts without onnxruntime.

Interaction with model upgrades

When the shipped predictor version bumps, the sidecar's recorded predictor_version no longer matches SidecarConfig.predictor_version, and SidecarModel.load discards the old state and returns a cold-start model. This is intentional — a stale correction trained against the old predictor's residuals would be worse than no correction against the new predictor.

Operators see a one-time reset to bare-predictor scores after a model upgrade; the sidecar re-fits from the next encode's capture onward.
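
The invalidation rule can be sketched as a version check at load time. The function `load_state` and the state.json keys used here are assumptions for illustration; the actual SidecarModel.load and its schema may differ.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_state(state_path: Path, expected_version: str) -> Dict[str, Any]:
    """Return persisted state, or a cold-start state on a version mismatch."""
    cold_start: Dict[str, Any] = {
        "predictor_version": expected_version,
        "n_updates": 0,
    }
    if not state_path.exists():
        return cold_start
    state = json.loads(state_path.read_text(encoding="utf-8"))
    if state.get("predictor_version") != expected_version:
        # A fit against the old predictor's residuals is discarded, not migrated.
        return cold_start
    return state
```

Discarding rather than migrating keeps the rule simple: residuals are only meaningful relative to the predictor that produced them.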

See also

  • ADR-0394 — the decision.
  • Research-0086 — algorithm choice, privacy contract, drift-detection hook.
  • ADR-0309 — the offline retrain workflow the sidecar complements.
  • ADR-0324 — portable retrain kit a collaborator can run; the sidecar is the on-host analogue.