vmaf-tune content-addressed cache¶
vmaf-tune corpus runs are iterative — most user sessions re-run the same (source, encoder, preset, crf) tuples after adjusting one unrelated flag. Re-encoding and re-scoring those unchanged tuples burns minutes-to-hours of wall-clock time for no new information. Per ADR-0298 the corpus runner ships a content-addressed cache that turns those repeat cells into free hits.
The cache is on by default for new runs. --no-cache disables it. The base tool is documented in vmaf-tune.md.
Cache key¶
A cache entry is keyed on the SHA-256 of the canonical-JSON-encoded tuple of six fields, all of which must match for a hit:
| Field | Source |
|---|---|
src_sha256 | Content hash of the reference YUV. |
encoder | Adapter slug (libx264, hevc_nvenc, …). |
preset | Encoder preset string fed to the adapter. |
crf | Quality-knob value (int). |
adapter_version | Bumps when the adapter's argv shape changes. |
ffmpeg_version | Host ffmpeg version string. |
Dropping any one of these fields would produce a wrong cached score (e.g. on an adapter upgrade or an ffmpeg rebuild). Each field participates in the key by design; tests in tools/vmaf-tune/tests/test_cache.py enforce that mutating any one field produces a new key.
src_sha256 can be skipped with --no-source-hash for fast iteration on huge YUVs at the cost of provenance — the cache will then key purely on path identity.
Cache layout on disk¶
$XDG_CACHE_HOME/vmaf-tune/ (default; `~/.cache/vmaf-tune/` if XDG unset)
meta/<key>.json JSON sidecar with the parsed result tuple
blobs/<key>.bin opaque encoded artifact (atomic put: tmp + rename)
__index__.json last-access timestamps for LRU eviction
CLI shape¶
vmaf-tune corpus \
--source ref.yuv --width 1920 --height 1080 \
--pix-fmt yuv420p --framerate 24 \
--encoder libx264 --preset medium --crf 22 \
--out corpus.jsonl
# → cache enabled, default dir
| Flag | Effect |
|---|---|
| (default) | Cache enabled at $XDG_CACHE_HOME/vmaf-tune/. |
--no-cache | Disable cache for this run; always re-encode + re-score. |
--cache-dir <path> | Override cache root. |
--cache-size-bytes <N> | Override LRU ceiling (default 10 GiB). |
The CorpusOptions library API exposes cache_enabled / cache_dir / cache_size_bytes for in-process callers.
Cache lifecycle¶
- Hit: the encode + score subprocesses are skipped entirely; the cached result tuple is reused as the JSONL row.
- Miss: encode + score run as normal; the result tuple is inserted with a fresh
last_accesstimestamp. - LRU eviction: when the on-disk cache size exceeds
--cache-size-bytes, the LRU__index__.jsondrops least-recently- accessed entries until under the ceiling. The eviction step is invoked at the end of eachcorpusrun, not opportunistically during it (so a single run cannot evict its own entries).
Per ADR-0298 the cache content is not baked into the JSONL row — the JSONL row remains the canonical record. The cache is an opaque encode/score result store keyed by the six-tuple; corpus rows on disk look identical whether the cache hit or missed.
Manual eviction¶
There is no "cache prune" subcommand yet — manual rm -rf is the intended escape hatch when the corpus methodology changes incompatibly.
See also¶
vmaf-tune.md— the base tool, corpus + recommend flow.vmaf-tune-codec-adapters.md— theadapter_versioncache-key field bumps any time the listed adapter shape changes.- ADR-0298 — design decision.
- Research-0086 — audit that triggered this page.