Hardware-capability priors¶
Operator-facing reference for the per-architecture GPU encode fingerprint table that the corpus-ingest pipeline merges into training rows. Owned by ADR-0335; methodology written up in docs/research/0088-hardware-capability-priors-2026-05-08.md.
What the table is¶
ai/data/hardware_caps.csv is a small (~7 row) table of capability metadata for the GPU video-encode generations the fork's corpus encounters today. Each row describes what one architecture can do — codecs it can emit, resolution caps per codec, encoder-block count, tensor-core / NPU presence, minimum production driver — sourced from primary vendor documentation.
The table is consumed by ai/scripts/hardware_caps_loader.py at corpus-ingest time. The loader's cap_vector_for(encoder, encoder_arch_hint) function returns a flat dict of hwcap_* columns the ingest script merges into each corpus row before the parquet write. The trainer then has structural priors like "this generation supports AV1" or "this generation has tensor cores" available without ever measuring throughput.
What the table is NOT¶
Not a benchmark database. The table contains zero throughput, latency, watts, TDP, fps, or quality numbers. The contributor-pack research digest's category-1 NO-GO finding established that shipping vendor-published performance numbers leaks biased priors into the FR-regressor — vendors run their own benchmarks under favourable conditions and the numbers are not comparable across generations or vendors. The trainer learns performance from the corpus's own measured rows; capability metadata only describes the search space.
If you find yourself wanting to add nvenc_av1_max_throughput_fps or similar, stop and re-read the digest. The right place for that data is a corpus row produced by the fork's own vmaf-tune pipeline, not a static CSV.
Schema¶
| Column | Type | Notes |
|---|---|---|
arch_name | kebab-case string | e.g. battlemage, rdna4, blackwell |
vendor | intel | amd | nvidia | rejected if anything else |
gen_year | int | first retail product year |
codecs_supported | \|-separated tokens | only HW-encode codecs (decode-only excluded) |
max_resolution_per_codec | \|-separated codec=WxH | one entry per codec in codecs_supported |
encoding_blocks | int >= 1 | independent encode engines on the highest-tier SKU |
tensor_cores | bool | matrix / tensor / XMX path present |
npu_present | bool | on-die NPU separate from GPU tensor units |
driver_min_version | string | oldest production driver branch |
source_url | https URL | vendor primary doc only (Wikipedia rejected) |
verified_date | ISO-8601 | date the row was verified against source_url |
Architectures shipped on 2026-05-08¶
| arch | vendor | gen_year | encode codecs | role |
|---|---|---|---|---|
| alchemist | intel | 2022 | h264 hevc av1 | predecessor of Battlemage |
| battlemage | intel | 2024 | h264 hevc av1 | named target |
| rdna3 | amd | 2022 | h264 hevc av1 | predecessor of RDNA4 |
| rdna4 | amd | 2025 | h264 hevc av1 | named target |
| ada-lovelace | nvidia | 2022 | h264 hevc av1 | predecessor of Blackwell |
| blackwell | nvidia | 2025 | h264 hevc av1 | named target |
Each row's primary verification source URL lives directly in the CSV's source_url column — reviewers and future operators can follow the link to confirm any claim before relying on it.
Why Hopper is absent¶
NVIDIA Hopper (H100 / H200) ships zero NVENC engines on the data-centre SKUs and is therefore out of scope for an encode-capability fingerprint. Including a row with encoding_blocks=0 would be misleading; the loader rejects it schema-side.
Ingest contract¶
from ai.scripts.hardware_caps_loader import (
HardwareCapsTable,
cap_vector_for,
)
caps = HardwareCapsTable.load_default()
for encode_row in corpus_rows:
fingerprint = cap_vector_for(
caps,
encoder=encode_row["encoder"], # e.g. "av1_nvenc"
encoder_arch_hint=encode_row.get("arch"), # e.g. "blackwell"
)
encode_row.update(fingerprint)
The returned dict always carries the same keys (hwcap_known, hwcap_arch_name, …) regardless of whether the architecture was resolved. Unresolved rows get hwcap_known=0 and None for every other field — downstream parquet writers serialise these as nulls so the column shape is stable across the dataset.
The loader only fingerprints hardware encoders (*_nvenc, *_amf, *_qsv). CPU-only encoders (libx264, libx265, libsvtav1, …) return the same blank vector with hwcap_known=0.
Adding a new architecture¶
- Identify the primary vendor doc — NVIDIA Video Codec SDK release notes, AMD GPUOpen / AMF docs, Intel oneVPL or Arc product pages. Wikipedia, community wikis, and third-party reviews are NOT acceptable.
https://...only. - Confirm every required column from primary text. If any column cannot be verified, do not ship the row — drop it and open an issue noting which capability is unverified.
- Append the row to
ai/data/hardware_caps.csv. The schema check inai/tests/test_hardware_caps.pyrejects empty fields, missingcodec=WxHpairs, non-vendor source domains, and non-ISOverified_date. - Run the round-trip test:
- Bump the row's
verified_dateto the date you read the source doc, not the date you opened the PR.
Re-verification cadence¶
The verified_date column is the only operational hint about freshness. There is no scheduled rotation today; the table is small enough that a one-shot re-walk every 12 months (or whenever a new GPU generation lands) is the expected pattern. Rows whose verified_date falls more than a year behind the head of master should be re-walked or dropped.
Cross-references¶
- ADR-0335 — decision record.
docs/research/0088-hardware-capability-priors-2026-05-08.md— research digest with the NO-GO finding on benchmark numbers.ai/data/hardware_caps.csv— the table itself.ai/scripts/hardware_caps_loader.py— loader + ingest helper.ai/tests/test_hardware_caps.py— round-trip / schema / vector tests.