ADR-0262: bisect-model-quality cache check uses logical comparison for parquet¶
- Status: Accepted
- Date: 2026-05-03
- Deciders: Lusoris, Claude (Anthropic)
- Tags: ai, ci, tiny-ai, framework
- Supersedes: relaxes the byte-equality clause from ADR-0109 §Decision (parquet artefacts only); ONNX byte-equality is preserved.
Context¶
Issue #40 is the sticky tracker for the nightly bisect-model-quality workflow defined in ADR-0109. The original ADR asserted full byte-equality between the committed ai/testdata/bisect/features.parquet and the regenerator's output, and flagged "rebases that change pandas or pyarrow versions can break the drift check; the fix is regenerate + commit" under §Negative.
That maintenance cost has now landed, harder than expected. From 2026-04-22 through 2026-05-03 every nightly run has failed at the build_bisect_cache.py --check step with DRIFT byte drift: features.parquet. The cause is the runner image's pyarrow upgrading from 23.x to 24.x: the parquet writer embeds created_by: parquet-cpp-arrow version <X>.<Y>.<Z> in the file header, so byte identity flips on every minor pyarrow release the actions/setup-python toolcache adopts. Worse, the workflow's sticky-comment update is gated on the result.json artefact existing (if: hashFiles('bisect-out/result.json') != ''), and the --check failure produces no result.json — so the comment on issue #40 stayed frozen on the 2026-04-21 green verdict for ~14 days while the workflow silently red-lined every night.
The committed parquet's actual content (256 rows × 7 typed columns, fixed-seed values) is unchanged; only the writer-version string drifted. Byte-equality on parquet is the wrong gate for the drift guarantee the ADR-0109 §Decision was after — "catching drift in pandas / pyarrow / onnx serialisation" should mean "catch a content change", not "catch a version-string change".
Decision¶
Relax the parquet leg of the cache --check to a typed-Arrow-Table comparison; keep ONNX byte-equality as-is. Concretely:
ai/scripts/build_bisect_cache.pyadds_compare_parquet, which reads both files viapyarrow.parquet.read_tableand compares schema, row count, andTable.equalscontent. The parquet header'screated_byfield is no longer load-bearing._compare_onnxkeepsfilecmp.cmp(shallow=False). ONNX determinism is held by the existing pinning ofproducer_name,producer_version, andir_versionin_save_linear_fr; weight, opset, or graph-topology drift continues to trip immediately.- The unknown-extension fallback in
check()still uses byte comparison so a future artefact format does not slip through unguarded. nightly-bisect.ymldecouples the cache-check step's fail-fast semantics from the sticky-comment update: when--checkfails,post-bisect-comment.py --wiring-broke --error-log <stderr>posts a "WIRING BROKE" verdict to issue #40 with the cache-check stderr inline, then the workflow exits non-zero so the run is still red. The result.json path stays the success path; the wiring-broke path is the failure path; both lead to a fresh sticky comment.
Alternatives considered¶
| Option | Pros | Cons | Why not chosen |
|---|---|---|---|
| Logical (Arrow Table) parquet compare + onnx byte compare (chosen) | Tolerates harmless writer-version drift; still catches content/schema/row-count drift; one-time fix; no recurring maintenance | Loses byte-level detection of compression-codec or footer-layout changes (none of which has ever produced a real signal) | Best fit: the failure mode we keep hitting is exactly the one this skips |
Pin pyarrow==23.0.1 in ai/pyproject.toml and the workflow | Restores byte equality immediately; no script change | Brittle: every dependabot / renovate bump or transitive constraint forces another regeneration cycle; conflicts with pyarrow>=17.0 floor that real training jobs need | Punts the problem; doesn't solve "version-string-as-load-bearing-byte" |
| Regenerate the committed cache against pyarrow 24, commit, repeat per ADR-0109 §Negative | Lowest blast radius; matches the ADR's documented escape valve | Treadmill: the next pyarrow release breaks it again; consumed 14 days of stale comment already with nobody noticing | Doesn't scale; the maintenance cost the ADR accepted turned out to be unaffordable |
Strip created_by from the committed parquet via post-write rewrite | Keeps byte-equality semantics for the rest of the file | Requires a custom parquet rewriter; non-portable across pyarrow versions; the parquet spec doesn't promise stable footer layout for the same content | Fragile, undertested |
| Move the cache check into the bisect tool itself and skip on drift | Simpler workflow | Hides drift entirely; defeats the purpose of the gate | Worse than status quo |
| Switch the cache to a CSV / JSON encoding for trivial byte stability | Easy byte equality | Loses typed dtypes the bisect consumes; doubles fixture size; misses the point that real DMOS caches will be parquet | Wrong direction for the eventual real-cache swap |
Consequences¶
- Positive:
- Issue #40 unsticks — nightly verdict reflects the previous night's actual run again, including a "WIRING BROKE" path for future
--checkfailures so the comment never silently freezes. - Toolchain bumps that touch only writer metadata (pyarrow minor / patch releases,
--compressionchoice changes, footer-layout rewrites) no longer red-line the nightly. - The drift guarantee that matters — "row values, schema, model weights, opset, or ir_version changed silently" — is preserved via Arrow-Table equality + ONNX byte equality.
- Negative:
- A pyarrow release that actually corrupts row values (vs writes a bit-identical-but-byte-different file) is caught by content equality, not by header drift, so the failure point shifts one layer deeper — but that is the correct layer.
- Real future content drift now requires the maintainer to read the
_compare_parquetdiff message ("schema drift" / "row-count drift" / "row-content drift") rather than a generic "byte drift" line. The diagnostic surface improves; the muscle memory shifts. - Neutral / follow-ups:
- Real DMOS-aligned cache swap from Research-0001 inherits the same
_compare_parquetshape; no further check-script work expected at swap time. - If a future artefact ships in a third format (e.g.
.npz), the fallback byte-compare branch continues to apply until that format grows its own_compare_<ext>helper.
References¶
- Issue #40 — sticky bisect tracker; the comment that froze on 2026-04-21.
- ADR-0109 — parent decision on the nightly bisect workflow + synthetic placeholder cache.
- Research-0001 — cache shape alternatives and the eventual real-cache swap path.
ai/scripts/build_bisect_cache.py— the regenerator and--checkdriver.scripts/ci/post-bisect-comment.py— the sticky-comment writer; gains a--wiring-brokemode..github/workflows/nightly-bisect.yml— the workflow whose silent failure motivated the change.- Source: per user direction in this issue's diagnostic task — "diagnose why the workflow stopped firing and post either a fix or a manual run". Fix preferred over manual rerun because the underlying cause was structural, not transient.
Reproducer / smoke-test¶
# Logical-equality path: still passes against the committed cache.
python ai/scripts/build_bisect_cache.py --check
# Force a known-good byte-different rewrite to confirm the parquet
# leg now tolerates it (compression codec swap forces footer drift):
python -c "
import pyarrow.parquet as pq, tempfile, filecmp
from pathlib import Path
import sys; sys.path.insert(0, 'ai/scripts')
import importlib.util
spec = importlib.util.spec_from_file_location('m', 'ai/scripts/build_bisect_cache.py')
m = importlib.util.module_from_spec(spec); spec.loader.exec_module(m)
src = Path('ai/testdata/bisect/features.parquet')
with tempfile.TemporaryDirectory() as t:
dst = Path(t) / 'features.parquet'
pq.write_table(pq.read_table(str(src)), str(dst), compression='snappy')
assert not filecmp.cmp(str(src), str(dst), shallow=False)
assert m._compare_parquet(src, dst) is None
print('OK: byte-different, logically equal')
"
# Force a content drift to confirm the gate still catches it:
python -c "
import pyarrow as pa, pyarrow.parquet as pq, tempfile
from pathlib import Path
import sys; sys.path.insert(0, 'ai/scripts')
import importlib.util
spec = importlib.util.spec_from_file_location('m', 'ai/scripts/build_bisect_cache.py')
m = importlib.util.module_from_spec(spec); spec.loader.exec_module(m)
src = Path('ai/testdata/bisect/features.parquet')
with tempfile.TemporaryDirectory() as t:
dst = Path(t) / 'features.parquet'
df = pq.read_table(str(src)).to_pandas()
df.iloc[0, 0] = 99.0
pq.write_table(pa.Table.from_pandas(df, preserve_index=True), str(dst))
assert m._compare_parquet(src, dst) == 'row-content drift: features.parquet'
print('OK: content drift detected')
"