0889: Vendored libsvm 3.24 audit (2026-05-30)

Scope

End-to-end audit of core/src/svm.cpp + core/src/svm.h (vendored libsvm) to determine:

The upstream version the fork is pinned to.
The fork-local patch set carried on top of that version.
Whether any CVE-grade fix from upstream libsvm 3.25 – 3.36 needs backporting.
Whether any latent OOB or parse-time defect remains in the fork's parser after the 2026-05-09 SAN-MODEL-MALLOC-OOB hardening.

Predecessor: changelog.d/fixed/sanitizer-real-bug-fixes-2026-05-09.md (SAN-MODEL-MALLOC-OOB introduction).

Methodology

Identified vendored sources and inspected file headers for the upstream copyright / version constant.
Walked git log --follow over the two files since 2016 to catalogue every modification, distinguishing upstream re-syncs from fork patches.
Re-read the existing SAN-MODEL-MALLOC-OOB mitigation in parse_header() and parse_support_vectors() and traced every Malloc(...) size argument back to its parse-time provenance.
Cross-checked upstream libsvm 3.25 – 3.36 release notes (memory only — no network access in worktree) against the fork's parser to identify whether any CVE-grade or memory-safety fix would need backport.
Wrote a fixture-style regression suite (core/test/test_svm_parser.c) and ran it against both the unmodified svm.cpp (to confirm the defect was reachable) and the mitigated version (to confirm the defect is no longer reachable).

Findings

Pinned upstream version

core/src/svm.h:39 reads #define LIBSVM_VERSION 324 (libsvm 3.24, November 2019). The last upstream sync recorded in the upstream-mirror git history is commit 8ec8ee7fbd (2020-11-10). The pinned version is therefore about six years behind upstream master, which is on 3.36 as of 2025.

Fork-local patch families (3)

Family	Where	Citing ADR	Purpose
Thread-locale isolation	`SVMModelParserFileSource::SVMModelParserFileSource`, `SVMModelParserBufferSource::SVMModelParserBufferSource`	ADR-0137	Forces `C` locale on the parser so `LC_NUMERIC` cannot perturb downstream `operator>>` numeric conversion
JSON entry points	`svm_parse_model_from_buffer`	none (predates the ADR rule)	Lets `read_json_model.c` pass an SVM blob embedded in a JSON model file without round-tripping through the filesystem
SAN-MODEL-MALLOC-OOB hardening	`VMAF_SVM_MAX_AXIS_COUNT` macro + `parse_header()` axis bounds + `parse_support_vectors()` pre-flight + `sv_buffer.empty()` guard	sanitizer-real-bug-fixes-2026-05-09 changelog	Pre-empts the alloc-too-big and null-passed-as-argument ASan findings from a crafted model file
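
As an illustration of the thread-locale isolation family, the sketch below pins a stream to the classic C locale before numeric extraction, so the decimal separator is always `.` whatever global locale the host application has installed. It is a minimal example under stated assumptions, not the fork's constructor code: `parse_double_c_locale` is a hypothetical helper, and the real parser sources may install the locale differently.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the fork): parse one double from text
// with the stream pinned to the classic "C" locale. A default-constructed
// stream would otherwise pick up the global C++ locale.
inline bool parse_double_c_locale(const std::string &text, double &out)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic()); // '.' is the decimal separator
    in >> out;                        // operator>> now uses the C locale facets
    return !in.fail();
}
```

Imbuing the stream, rather than changing the process-wide locale, keeps the isolation local to the parser and leaves other threads untouched.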

Upstream 3.25 – 3.36 CVE survey

Version	Release window	Notable change	CVE?	Fork already covered?
3.25	2020-12	FD-cleanup on early-return in `svm_load_model`; minor formatting	No	Yes — `SVMModelParser<>` uses `std::ifstream` RAII
3.30	2023	Sparse-LinearSVR additions; doc refresh	No	N/A — feature path not used by the fork
3.35	2024	`static`-on-helper-function tightenings	No	No security delta
3.36	2025	CI metadata refresh	No	No security delta

No CVE has been filed against libsvm itself across any of these releases (the only "libsvm-CVE" hits in the NVD are against a 2014 Python wrapper called python-libsvm, not the C library). The fork's hardened parser already covers everything that would map to a memory-safety CVE in the C library.

Residual defect: row-ordering OOB

The existing SAN-MODEL-MALLOC-OOB mitigation bounds nr_class and total_sv once they are parsed, and pre-flights parse_support_vectors against an unset nr_class. It does not pre-flight the per-row Malloc(...) calls in parse_header() against an unset nr_class.

} else if (buffer == "label") {
    model->label = Malloc(int, model->nr_class);          // (*)
    exceptAssert(model_source.get_array(model->label, model->nr_class),
                 "Failed to read label");

If a crafted header places label before nr_class, line (*) evaluates Malloc(int, 0), because model->nr_class is still zero from the preceding memset. The subsequent get_array(..., 0) is a no-op (the loop body never runs), so model->label is left pointing at a zero-size allocation. The same shape applies to rho (sized from nr_class_permutations, derived from nr_class), probA, probB, and nr_sv. Downstream, svm_predict_values and svm_predict_probability index these pointers as if they held fully populated arrays, which is undefined behaviour; on a non-glibc allocator that returns NULL for malloc(0), the very first dereference would SIGSEGV.
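
The row-ordering shape described above can be reproduced with a toy parser. This is a sketch only: `ToyHeader` and `parse_unguarded` are hypothetical names, `std::vector` stands in for the `Malloc`'d array so the example itself stays free of undefined behaviour, and it assumes a later `nr_class` row is accepted after the empty `label` row.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for the header state; NOT the real svm_model layout.
struct ToyHeader {
    int nr_class = 0;        // zero until an "nr_class" row is parsed
    std::vector<int> label;  // stands in for the Malloc'd label array
    bool ok = true;          // false on malformed or unknown input
};

// Unguarded toy parser: it sizes `label` from whatever nr_class holds
// at the moment the "label" row is seen, with no ordering check.
inline ToyHeader parse_unguarded(const std::string &text)
{
    ToyHeader h;
    std::istringstream in(text);
    std::string key;
    while (in >> key) {
        if (key == "nr_class") {
            if (!(in >> h.nr_class) || h.nr_class < 0) {
                h.ok = false;
                break;
            }
        } else if (key == "label") {
            h.label.resize(static_cast<std::size_t>(h.nr_class));
            for (int &v : h.label)
                in >> v;     // never runs when nr_class is still 0
        } else {
            h.ok = false;    // unknown token
            break;
        }
    }
    return h;
}
```

Feeding it `"label\nnr_class 2\n"` yields a header that parses successfully with `nr_class == 2` but a zero-length `label`, which is the mismatch a downstream consumer would then index past.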

Blast radius is small because:

glibc's malloc(0) returns a non-NULL pointer to a minimum-size chunk, so on glibc-based Linux the SIGSEGV path is not reachable in practice.
The text-format SVM parser path only ever consumes trusted input: model files shipped by VMAFX itself or models supplied by the operator.

But UB is UB, and a one-line guard per row closes the defect for free.

Mitigation

Five exceptAssert(model->nr_class > 0, "<row> row must follow nr_class row in model file") calls were added, one before each Malloc(...) for rho, label, probA, probB, and nr_sv. The diff is +5 lines / −0 lines and sits inside the existing NOLINTBEGIN/NOLINTEND cordon, so the touched-file lint-clean rule (ADR-0141) and the NOLINT-citation rule (ADR-0278) are both unaffected.
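
The guard pattern can be sketched on a toy header parser. Everything here is illustrative: `require` is a hypothetical stand-in for `exceptAssert` (whose real failure mechanism is assumed, not confirmed, to be exception-like), `kToyMaxAxisCount` is an invented cap that only mimics the role of `VMAF_SVM_MAX_AXIS_COUNT`, and only the `label` row is modelled.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

constexpr int kToyMaxAxisCount = 1024; // hypothetical cap, not the fork's value

struct GuardedHeader {
    int nr_class = 0;
    std::vector<int> label;
};

// Hypothetical stand-in for exceptAssert: throw when the condition fails.
inline void require(bool cond, const std::string &msg)
{
    if (!cond)
        throw std::runtime_error(msg);
}

inline GuardedHeader parse_guarded(const std::string &text)
{
    GuardedHeader h;
    std::istringstream in(text);
    std::string key;
    while (in >> key) {
        if (key == "nr_class") {
            in >> h.nr_class;
            require(!in.fail() && h.nr_class > 0 &&
                        h.nr_class <= kToyMaxAxisCount,
                    "bad nr_class row");
        } else if (key == "label") {
            // The one-line ordering guard: refuse to size the array
            // before nr_class has been parsed.
            require(h.nr_class > 0,
                    "label row must follow nr_class row in model file");
            h.label.resize(static_cast<std::size_t>(h.nr_class));
            for (int &v : h.label) {
                in >> v;
                require(!in.fail(), "Failed to read label");
            }
        } else {
            require(false, "unknown text in model file");
        }
    }
    return h;
}

// Convenience wrapper for fixture-style checks.
inline bool accepts(const std::string &text)
{
    try {
        parse_guarded(text);
        return true;
    } catch (const std::runtime_error &) {
        return false;
    }
}
```

With the guard in place, a header that lists `label` before `nr_class` is rejected at parse time instead of producing a zero-length array.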

Regression coverage

New file: core/test/test_svm_parser.c (9 tests, suite fast):

Test	Defect surface
`test_reject_oversized_nr_class`	`nr_class > VMAF_SVM_MAX_AXIS_COUNT`
`test_reject_oversized_total_sv`	`total_sv > VMAF_SVM_MAX_AXIS_COUNT`
`test_reject_missing_nr_class_before_rho`	`rho` before `nr_class`
`test_reject_missing_nr_class_before_label`	`label` before `nr_class`
`test_reject_missing_nr_class_before_probA`	`probA` before `nr_class`
`test_reject_missing_nr_class_before_nr_sv`	`nr_sv` before `nr_class`
`test_reject_missing_nr_class_at_sv_parse`	header omits `nr_class`
`test_reject_empty_sv_section`	`total_sv` claims SVs that don't materialise
`test_reject_unknown_svm_type`	unknown `svm_type` token

All nine tests pass against the mitigated parser. The test_predict (4 tests) and test_model (42 tests) suites, both of which exercise the live VMAF SVM loading path against vmaf_v0.6.1, still pass.

Upstream-sync recommendation

Defer the upstream 3.36 sync. Rationale:

Zero CVE delta against the fork's hardened parser.
Upstream parser is structurally older than the fork's SVMModelParser<> C++ refactor; a sync would invalidate that refactor along with all three fork-patch families.
The fork's model->free_sv = 1 invariant at the end of parse_support_vectors is preserved verbatim from upstream and is load-bearing for svm_free_and_destroy_model's ownership transfer — any future sync needs to verify this line specifically.

Re-audit trigger: upstream issues a CVE-classified release, OR the fork decides to adopt the upstream sparse-LinearSVR extensions.

Follow-ups (out of scope)

Wire a libFuzzer harness fuzz_svm_model into the existing fuzz suite (core/test/fuzz/) to lock the rejection paths under continuous fuzz coverage. Tracked separately under the libFuzzer policy.
Wholesale upstream-3.36 sync if/when one of the re-audit triggers fires.
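
For orientation, a libFuzzer harness of the kind proposed in the first follow-up could take roughly the following shape. `LLVMFuzzerTestOneInput` is libFuzzer's standard entry point; `toy_parse_model` is a hypothetical placeholder, since the signature of `svm_parse_model_from_buffer` is not given in this audit and a real harness would call the fork's parser instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical placeholder for the real parser entry point. It only checks
// a prefix; a real harness would hand the blob to the fork's buffer parser
// and free any model it returns.
static bool toy_parse_model(const std::string &blob)
{
    return blob.rfind("svm_type", 0) == 0;
}

// libFuzzer calls this once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data,
                                      std::size_t size)
{
    // Copy the raw bytes into an owned buffer; tolerate an empty input.
    const std::string blob =
        data ? std::string(reinterpret_cast<const char *>(data), size)
             : std::string();
    // Rejection is an expected outcome; only crashes and sanitizer
    // reports count as findings.
    (void)toy_parse_model(blob);
    return 0;
}
```

Such a harness is normally built with `-fsanitize=fuzzer,address`, so that any input that slips past the rejection paths surfaces as a sanitizer report.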

References

ADR-0889 (this audit's decision record).
ADR-0024 — Netflix golden-gate preservation rule.
ADR-0137 — thread-local C-locale for numeric IO.
ADR-0141 — touched-file lint-clean rule.
ADR-0278 — NOLINT citation closeout.
changelog.d/fixed/sanitizer-real-bug-fixes-2026-05-09.md — original SAN-MODEL-MALLOC-OOB fix.
MEMORY note: feedback_vendored_in_scope.