Skip to content

--precision — score output precision

--precision controls how VMAF scores are formatted in XML / JSON / CSV / SUB output and on stderr. Fork-added per ADR-0119 (which supersedes ADR-0006). The default matches upstream Netflix %.6f exactly so the CPU golden gate (CLAUDE.md §8) passes without explicit flags. Round-trip lossless %.17g is opt-in via --precision=max.

Grammar

--precision N          # integer 1..17 → printf "%.<N>g"
--precision max        # alias for "%.17g" (IEEE-754 round-trip lossless)
--precision full       # alias for "%.17g"
--precision legacy     # "%.6f" — synonym for the default (pre-fork format)

Defaults (no --precision): %.6f.

When to pick each

Mode Use when
no flag / legacy Default, Netflix-compatible. Matches upstream byte-for-byte. Required for the golden gate, FFmpeg vf_libvmaf, and any consumer that parses the existing CLI output schema.
max / full Cross-backend numeric diff (CPU vs SIMD vs CUDA vs SYCL), archival reports where every ULP matters, or any pipeline that re-parses scores into doubles and compares them.
N = 3 to N = 6 Human reading on a terminal where you want shorter scores. Do not pipe this into a tool that compares scores.
N = 7 to N = 16 Intermediate — no common reason; usually you want max.
N = 17 Equivalent to max explicitly.

Why %.6f is the default

Several Netflix golden tests (e.g. python/test/command_line_test.py) do exact-string match against XML output, not assertAlmostEqual. Under %.17g the printed strings change shape (more digits) even though the underlying doubles are identical, so the goldens fail. The only way to keep those assertions passing without modifying them — which CLAUDE.md §8 forbids — is for the CLI's default to print the pre-fork %.6f form. See ADR-0119 for the full rationale.

Why %.17g is the lossless opt-in

IEEE-754 double precision holds ~15.95 significant decimal digits. %.17g is the minimum printf format that guarantees:

parse(print(x)) == x    for every finite double x

Anything shorter (%.15g, %.6f) can round-trip-corrupt scores that differ by one ULP — exactly the resolution at which cross-backend diffs matter. See ../benchmarks.md and ../backends/index.md for the cross-backend ULP budget. Pass --precision=max whenever you intend to compare scores numerically across backends.

Effect on all output channels

The same format applies uniformly to:

  • Pooled stderr line (vmaf_v0.6.1: 76.668905)
  • XML per-frame attribute values
  • JSON per-frame + pooled numbers
  • CSV cells
  • SubRip <subtitle> text

There is no way to pick different precisions for different outputs in one invocation — this is intentional so that output.xml, output.json, and the stderr line always agree.

Example — round-trip check

./build/core/tools/vmaf \
  --reference  src01_hrc00_576x324.yuv \
  --distorted  src01_hrc01_576x324.yuv \
  --width 576 --height 324 --pixel_format 420 --bitdepth 8 \
  --output scores.json --json

Default output (Netflix-compatible):

{"vmaf": {"mean": 76.668905, ...}}

Same run with --precision max:

{"vmaf": {"mean": 76.66890501970558, ...}}

The default form drops ~9 significant digits. Across the 3 Netflix CPU golden pairs, the default and max form agree to 6 decimals by construction — but that is the exact margin at which SIMD / GPU backends deviate, so archival reports and cross-backend diffs must use --precision=max.

Interaction with goldens

The 3 Netflix CPU golden tests (ADR-0024) pass bit-for-bit with the %.6f default — including the exact-string XML assertions in python/test/command_line_test.py that originally drove this revert.

  • cli.md — full CLI reference, --precision summary row.
  • ADR-0119 — current decision (%.6f default).
  • ADR-0006Superseded. Original %.17g-default decision; kept for history.
  • ../benchmarks.md — fork-added benchmark numbers, all reported at --precision=max.