ART vs LSM storage study
The residency / prefix index is written on every KV event and read on every request (longest reusable prefix), and a persistent control plane needs it on disk. Its workload is prefix-heavy (shared system prompts, RAG docs, agent sessions) and write-frequent. Which storage engine fits?
QuillCache makes the index a pluggable, persistent, measured component behind one
IndexBackend trait, and asks the question directly. Two natural designs:
- ART (Holt) — a radix/trie. Prefix-native (the prefix is the path), near-memory point/prefix lookups, and no compaction write amplification (append-only). SGLang’s RadixAttention uses a radix tree in memory for exactly this; Holt makes it persistent with a WAL.
- LSM (RocksDB) — write-optimized via compaction, but compaction causes write amplification and prefix scans are less natural (they span levels).
Results
Same workload, same trait, via cargo run --features "rocksdb holt" -- bench-index:
| backend | ingest | prefix-scan p50 | p99 | write-amp | recovery | on-disk |
|---|---|---|---|---|---|---|
| memory (flat map) | 706k/s | 494 µs | 1685 µs | — | — | 0 |
| rocksdb (LSM) | 56k/s | 16.8 µs | 29.6 µs | 10.6× | 4.1 ms | small |
| holt (ART) | 55k/s | 9.96 µs | 13.7 µs | 1.0× | 2.6 ms | larger |
ART wins prefix-scan latency (~1.7× over LSM at p50, ~50× over the flat map’s O(N) scan), recovery, and write amplification (append-only, 1×). LSM wins the on-disk footprint (compaction reclaims space). Ingest is comparable between the two persistent backends — the cost of durability.
So pick ART when prefix-scan latency and recovery dominate (the common case for a residency index queried per request); pick LSM when disk footprint is the constraint.
Write amplification, measured
Write amplification is read from RocksDB’s own flush/compaction statistics, not assumed. The tradeoff is exact and opposite: ART writes each record once (1×) but keeps everything; LSM rewrites data through compaction (10.6×) but reclaims space. This is the classic append-only-vs-compaction storage tradeoff, here measured for a KV-cache residency index — a gap a recently published RocksDB-for-KV-cache approach left unanalyzed.
A bottleneck the benchmark caught
An eviction-churn phase (remove_block + put under cache pressure) surfaced
that remove_block was O(scope) — given a block hash but not its prefix,
every backend scanned the whole identity scope. A secondary block_hash → primary key reverse index made it an O(matches) seek, 100–300× faster eviction on
the persistent backends. The benchmark working as intended: it found the
bottleneck, then measured the fix.