# 0889 — Vendored libsvm 3.24 audit (2026-05-30)
## Scope
End-to-end audit of `core/src/svm.cpp` and `core/src/svm.h` (vendored libsvm) to determine:
- The upstream version the fork is pinned to.
- The fork-local patch set carried on top of that version.
- Whether any CVE-grade fix from upstream libsvm 3.25 – 3.36 needs backporting.
- Whether any latent out-of-bounds (OOB) or other parse-time defect remains in the fork's parser after the 2026-05-09 SAN-MODEL-MALLOC-OOB hardening.
Predecessor: changelog.d/fixed/sanitizer-real-bug-fixes-2026-05-09.md (SAN-MODEL-MALLOC-OOB introduction).
## Methodology
- Identified the vendored sources and inspected their file headers for the upstream copyright notice and version constant.
- Walked `git log --follow` over the two files since 2016 to catalogue every modification, distinguishing upstream re-syncs from fork patches.
- Re-read the existing SAN-MODEL-MALLOC-OOB mitigation in `parse_header()` and `parse_support_vectors()`, and traced every `Malloc(...)` size argument back to its parse-time provenance.
- Cross-checked upstream libsvm 3.25 – 3.36 release notes (from memory only; the worktree has no network access) against the fork's parser to identify whether any CVE-grade or memory-safety fix would need a backport.
- Wrote a fixture-style regression suite (`core/test/test_svm_parser.c`) and ran it against both the unmodified `svm.cpp` (to confirm the defect was reachable) and the mitigated version (to confirm it no longer is).
## Findings
### Pinned upstream version
`core/src/svm.h:39`: `#define LIBSVM_VERSION 324` (libsvm 3.24, November 2019). The last upstream sync recorded in the upstream-mirror git history is commit `8ec8ee7fbd` (2020-11-10). That is more than five and a half years before this audit and several releases behind upstream master, which is on 3.36 as of 2025.
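The constant follows libsvm's major * 100 + minor encoding, so 324 reads as 3.24, and 3.36 would read 336 under the same scheme. Below is a minimal sketch of decoding the constant and pinning it at compile time. `kVendoredLibsvmVersion` and the two helpers are hypothetical names, not part of the fork.

```cpp
// Hypothetical stand-in for LIBSVM_VERSION from core/src/svm.h. In-tree, the
// guard would static_assert on the macro itself after including the header.
constexpr int kVendoredLibsvmVersion = 324;

// libsvm encodes "major.minor" as major * 100 + minor (3.24 -> 324).
constexpr int libsvm_major(int encoded) { return encoded / 100; }
constexpr int libsvm_minor(int encoded) { return encoded % 100; }

// Compile-time pin: a re-sync that bumps the constant fails the build until
// the audit is repeated.
static_assert(libsvm_major(kVendoredLibsvmVersion) == 3 &&
                  libsvm_minor(kVendoredLibsvmVersion) == 24,
              "vendored libsvm is no longer 3.24; re-run the 0889 audit");
```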
### Fork-local patch families (3)
| Family | Where | Citing ADR | Purpose |
|---|---|---|---|
| Thread-locale isolation | SVMModelParserFileSource::SVMModelParserFileSource, SVMModelParserBufferSource::SVMModelParserBufferSource | ADR-0137 | Forces C locale on the parser so LC_NUMERIC cannot perturb downstream operator>> numeric conversion |
| JSON entry points | svm_parse_model_from_buffer | none (predates the ADR rule) | Lets read_json_model.c pass an SVM blob embedded in a JSON model file without round-tripping through the filesystem |
| SAN-MODEL-MALLOC-OOB hardening | VMAF_SVM_MAX_AXIS_COUNT macro + parse_header() axis bounds + parse_support_vectors() pre-flight + sv_buffer.empty() guard | sanitizer-real-bug-fixes-2026-05-09 changelog | Pre-empts the alloc-too-big and null-passed-as-argument sanitizer findings triggered by a crafted model file |
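The thread-locale row guards against a real hazard. `operator>>` on a stream parses doubles through the stream's locale, so a decimal-comma locale would read `0.5` as `0` and stop. Below is a minimal sketch of that isolation at the stream level, assuming only the standard library. `read_doubles_c_locale` is a hypothetical helper, and ADR-0137's actual mechanism (thread-local, per its title) may differ.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated doubles from a model-file fragment. Imbuing the
// classic "C" locale makes the stream's numeric facets use '.' as the decimal
// separator, independent of the process-wide C++ global locale.
std::vector<double> read_doubles_c_locale(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    std::vector<double> values;
    double value = 0.0;
    while (in >> value) {
        values.push_back(value);
    }
    return values;
}
```

Imbuing per stream needs no global state, so it composes with a host process that has set its own locale.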
### Upstream 3.25 – 3.36 CVE survey
| Version | Release window | Notable change | CVE? | Fork already covered? |
|---|---|---|---|---|
| 3.25 | 2020-12 | FD-cleanup on early-return in svm_load_model; minor formatting | No | Yes — SVMModelParser<> uses std::ifstream RAII |
| 3.30 | 2023 | Sparse-LinearSVR additions; doc refresh | No | N/A — feature path not used by the fork |
| 3.35 | 2024 | static-on-helper-function tightenings | No | No security delta |
| 3.36 | 2025 | CI metadata refresh | No | No security delta |
No CVE has been filed against libsvm itself across any of these releases (the only "libsvm-CVE" hits in the NVD database are against a 2014 Python wrapper called python-libsvm, not the C library). None of the surveyed releases carries a memory-safety fix that the fork's hardened parser lacks.
### Residual defect: row-ordering OOB
The existing SAN-MODEL-MALLOC-OOB mitigation bounds nr_class and total_sv once they are parsed, and pre-flights parse_support_vectors against an unset nr_class. It does not pre-flight the per-row Malloc(...) calls in parse_header() against an unset nr_class.
```cpp
// core/src/svm.cpp, parse_header() (excerpt)
} else if (buffer == "label") {
    model->label = Malloc(int, model->nr_class); // (*)
    exceptAssert(model_source.get_array(model->label, model->nr_class),
                 "Failed to read label");
```
Suppose a crafted header places `label` before `nr_class`. Line `(*)` then evaluates `Malloc(int, 0)`, because `model->nr_class` is still zero from the preceding `memset`. The subsequent `get_array(..., 0)` is a no-op (its loop body never runs), so `model->label` is left pointing at a zero-size allocation. The same shape applies to `rho` (sized from `nr_class_permutations`, which is derived from `nr_class`), `probA`, `probB`, and `nr_sv`. When a later `nr_class` row then sets the real class count, the model passes the `parse_support_vectors` pre-flight. Downstream, `svm_predict_values` and `svm_predict_probability` index these pointers as fully sized arrays and read past the zero-size allocations. That is undefined behaviour. With an allocator that returns NULL for `malloc(0)` (which the C standard permits), the very first dereference would SIGSEGV.
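The arithmetic behind "the same shape" is easy to check. Upstream libsvm sizes `label` and `nr_sv` by `nr_class`, and sizes `rho`, `probA` and `probB` by the pair count `nr_class * (nr_class - 1) / 2`. The sketch below assumes the fork kept that sizing; `HeaderRowSizes` and `header_row_sizes` are hypothetical. With the memset-zeroed `nr_class` of 0, every row comes out at zero elements. A later `nr_class 4` row would leave prediction code expecting 4 labels and 6 `rho` entries.

```cpp
// Element counts for the per-class header rows, assuming the fork sizes them
// the same way upstream libsvm's svm_load_model does.
struct HeaderRowSizes {
    int label;  // one entry per class
    int nr_sv;  // one entry per class
    int rho;    // one entry per class pair: nr_class * (nr_class - 1) / 2
    int probA;  // same pair count as rho
    int probB;  // same pair count as rho
};

HeaderRowSizes header_row_sizes(int nr_class) {
    // With nr_class == 0 this is 0 * -1 / 2 == 0, i.e. a zero-size row.
    const int pairs = nr_class * (nr_class - 1) / 2;
    return HeaderRowSizes{nr_class, nr_class, pairs, pairs, pairs};
}
```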
The blast radius is small because:
- glibc's `malloc(0)` returns a unique non-NULL pointer, so on glibc-based Linux the SIGSEGV path is not reachable in practice (the out-of-bounds reads themselves remain).
- The text-format SVM parser path is only fed trusted model files: those shipped by VMAFX itself or supplied by the operator.
But UB is UB, and a one-line guard per row closes the defect for free.
## Mitigation
Five `exceptAssert(model->nr_class > 0, "<row> row must follow nr_class row in model file")` calls were added, one before each `Malloc(...)` for `rho`, `label`, `probA`, `probB`, and `nr_sv`. The diff is +5 / −0 lines and sits inside the existing `NOLINTBEGIN`/`NOLINTEND` cordon. As a result, neither the touched-file lint-clean rule (ADR-0141) nor the NOLINT-citation rule (ADR-0278) is affected.
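Below is a minimal sketch of the guard's effect on a toy two-row header grammar. `except_assert`, `MiniHeader` and `parse_mini_header` are hypothetical stand-ins, not the fork's `exceptAssert` or `parse_header()`. The point is that the row is rejected before any allocation is sized from an unset `nr_class`.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the fork's exceptAssert: throw on a failed check.
inline void except_assert(bool ok, const char* msg) {
    if (!ok) {
        throw std::runtime_error(msg);
    }
}

struct MiniHeader {
    int nr_class = 0;  // zero-initialised, like the memset model struct
    std::vector<int> label;
};

// Toy parser for a two-row header grammar ("nr_class N", "label ...").
// The guard rejects a label row seen before nr_class instead of sizing the
// array from the still-zero class count.
MiniHeader parse_mini_header(const std::string& text) {
    MiniHeader h;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream row(line);
        std::string key;
        if (!(row >> key)) {
            continue;  // blank line
        }
        if (key == "nr_class") {
            except_assert(static_cast<bool>(row >> h.nr_class) && h.nr_class > 0,
                          "Failed to read nr_class");
        } else if (key == "label") {
            except_assert(h.nr_class > 0,
                          "label row must follow nr_class row in model file");
            h.label.resize(static_cast<std::size_t>(h.nr_class));
            for (int& v : h.label) {
                except_assert(static_cast<bool>(row >> v), "Failed to read label");
            }
        } else {
            except_assert(false, "unknown header row");
        }
    }
    return h;
}
```

Rejecting, rather than deferring the allocation until `nr_class` arrives, keeps the parser single-pass. The cost is that row order becomes part of the accepted format, which files written by upstream's `svm_save_model` already satisfy (it emits `nr_class` before `rho`, `label`, `probA`, `probB` and `nr_sv`).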
## Regression coverage
New file: `core/test/test_svm_parser.c` (9 tests, in the `fast` suite):
| Test | Defect surface |
|---|---|
| test_reject_oversized_nr_class | nr_class > VMAF_SVM_MAX_AXIS_COUNT |
| test_reject_oversized_total_sv | total_sv > VMAF_SVM_MAX_AXIS_COUNT |
| test_reject_missing_nr_class_before_rho | rho before nr_class |
| test_reject_missing_nr_class_before_label | label before nr_class |
| test_reject_missing_nr_class_before_probA | probA before nr_class |
| test_reject_missing_nr_class_before_nr_sv | nr_sv before nr_class |
| test_reject_missing_nr_class_at_sv_parse | header omits nr_class |
| test_reject_empty_sv_section | total_sv claims SVs that don't materialise |
| test_reject_unknown_svm_type | unknown svm_type token |
All nine tests pass against the mitigated parser. The existing `test_predict` (4 tests) and `test_model` (42 tests) suites still pass; both exercise the live VMAF SVM loading path against `vmaf_v0.6.1`.
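The two oversized-axis rows reduce to a bounded integer parse. Below is a table-driven sketch in the spirit of those fixtures. `parse_axis_count`, `AxisCase`, the lower bound of 1 and the upper bound of 65536 are all hypothetical; this digest does not record the real value of `VMAF_SVM_MAX_AXIS_COUNT`, and the fork's parser may convert numbers differently from the `strtol` used here.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical bound standing in for VMAF_SVM_MAX_AXIS_COUNT.
constexpr long kMaxAxisCount = 65536;

// Accept an axis count (nr_class or total_sv) only if the whole token is a
// base-10 integer in [1, kMaxAxisCount]; the lower bound of 1 is an assumption.
bool parse_axis_count(const std::string& token, int* out) {
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(token.c_str(), &end, 10);
    if (errno != 0 || end == token.c_str() || *end != '\0') {
        return false;  // overflow, no digits, or trailing junk
    }
    if (value < 1 || value > kMaxAxisCount) {
        return false;  // outside the accepted range
    }
    *out = static_cast<int>(value);
    return true;
}

// Fixture table: each token and whether it must be accepted.
struct AxisCase {
    const char* token;
    bool accepted;
};

const AxisCase kAxisCases[] = {
    {"2", true},
    {"65536", true},                  // exactly at the bound
    {"65537", false},                 // one past the bound
    {"0", false},
    {"-1", false},
    {"99999999999999999999", false},  // strtol overflow (ERANGE)
    {"3x", false},                    // trailing junk
    {"", false},                      // no digits
};
```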
## Upstream-sync recommendation
Defer the upstream 3.36 sync. Rationale:
- Zero CVE delta against the fork's hardened parser.
- The upstream parser is structurally older than the fork's `SVMModelParser<>` C++ refactor; a sync would invalidate that refactor along with all three fork-patch families.
- The fork's `model->free_sv = 1` invariant at the end of `parse_support_vectors` is preserved verbatim from upstream and is load-bearing for `svm_free_and_destroy_model`'s ownership transfer. Any future sync needs to verify this line specifically.
Re-audit trigger: upstream issues a CVE-classified release, OR the fork decides to adopt the upstream sparse-LinearSVR extensions.
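The sketch below shows why the `free_sv` line is load-bearing, using a simplified model of upstream libsvm. Upstream's `svm_load_model` reads every support vector into one contiguous node buffer and sets `free_sv = 1`. Its `svm_train` leaves `free_sv` at 0 because those SVs point into the caller's problem. `Node`, `MiniModel` and `free_model_svs` are hypothetical reductions, not the fork's types.

```cpp
#include <cstdlib>

struct Node {
    int index;
    double value;
};

// Simplified model struct: SV[i] are row pointers into one contiguous node
// buffer, and free_sv records whether the model owns that buffer.
struct MiniModel {
    int l = 0;            // number of support vectors
    Node** SV = nullptr;  // row pointers; SV[0] is the start of the buffer
    int free_sv = 0;      // 1: model owns the buffer behind SV[0]
};

// Mirrors the SV-related part of upstream svm_free_model_content: the node
// buffer is released through SV[0] only when the model owns it.
void free_model_svs(MiniModel* m) {
    if (m->free_sv && m->l > 0 && m->SV != nullptr) {
        std::free(m->SV[0]);  // one free releases every row's nodes
    }
    std::free(m->SV);  // the row-pointer array is always model-owned
    m->SV = nullptr;
    m->l = 0;
}
```

If a sync dropped the assignment, every loaded model would leak its node buffer. Setting the flag on a model whose SVs are borrowed would free memory the caller still owns.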
## Follow-ups (out of scope)
- Wire a libFuzzer harness `fuzz_svm_model` into the existing fuzz suite (`core/test/fuzz/`) to lock the rejection paths under continuous fuzz coverage. Tracked separately under the libFuzzer policy.
- A wholesale upstream-3.36 sync if and when one of the re-audit triggers fires.
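Below is a minimal sketch of the harness shape the first follow-up implies, using libFuzzer's standard `LLVMFuzzerTestOneInput` entry point. `parse_model_stub` is a hypothetical placeholder for the fork's buffer parser. The assumption that rejections arrive as exceptions may not match the fork's C entry point.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the fork's buffer entry point; the real
// svm_parse_model_from_buffer signature is not recorded in this digest.
// Throws on rejection, the way an exceptAssert-based parser might.
static bool parse_model_stub(const std::string& text) {
    if (text.rfind("svm_type ", 0) != 0) {
        throw std::runtime_error("model must start with svm_type");
    }
    return true;
}

// libFuzzer entry point. Copying into std::string gives the parser a
// NUL-terminated buffer, so C-string handling cannot read past the input.
// Rejected inputs are expected outcomes; only crashes and sanitizer reports
// should surface as findings.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    const std::string text(reinterpret_cast<const char*>(data), size);
    try {
        (void)parse_model_stub(text);
    } catch (const std::runtime_error&) {
        // rejection path: exercised, not a bug
    }
    return 0;
}
```

The harness catches only `std::runtime_error` (the stub's rejection type). That lets `std::bad_alloc` escape, so an oversized allocation still surfaces as a crash instead of being swallowed as a rejection.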
## References
- ADR-0889 (this audit's decision record).
- ADR-0024 — Netflix golden-gate preservation rule.
- ADR-0137 — thread-local C-locale for numeric IO.
- ADR-0141 — touched-file lint-clean rule.
- ADR-0278 — NOLINT citation closeout.
- `changelog.d/fixed/sanitizer-real-bug-fixes-2026-05-09.md` — original SAN-MODEL-MALLOC-OOB fix.
- MEMORY note: `feedback_vendored_in_scope`.