Skip to content

ART vs LSM storage study

The residency / prefix index is written on every KV event and read on every request (longest reusable prefix), and a persistent control plane needs it on disk. Its workload is prefix-heavy (shared system prompts, RAG docs, agent sessions) and write-frequent. Which storage engine fits?

QuillCache makes the index a pluggable, persistent, measured component behind one IndexBackend trait, and asks the question directly. Two natural designs:

  • ART (Holt) — a radix/trie. Prefix-native (the prefix is the path), near-memory point/prefix lookups, and no compaction write amplification (append-only). SGLang’s RadixAttention uses a radix tree in memory for exactly this; Holt makes it persistent with a WAL.
  • LSM (RocksDB) — write-optimized via compaction, but compaction causes write amplification and prefix scans are less natural (they span levels).

Results

Same workload, same trait, via cargo run --features "rocksdb holt" -- bench-index:

backendingestprefix-scan p50p99write-amprecoveryon-disk
memory (flat map)706k/s494 µs1685 µs0
rocksdb (LSM)56k/s16.8 µs29.6 µs10.6×4.1 mssmall
holt (ART)55k/s9.96 µs13.7 µs1.0×2.6 mslarger

ART wins prefix-scan latency (~1.7× over LSM at p50, ~50× over the flat map’s O(N) scan), recovery, and write amplification (append-only, 1×). LSM wins the on-disk footprint (compaction reclaims space). Ingest is comparable between the two persistent backends — the cost of durability.

So pick ART when prefix-scan latency and recovery dominate (the common case for a residency index queried per request); pick LSM when disk footprint is the constraint.

Write amplification, measured

Write amplification is read from RocksDB’s own flush/compaction statistics, not assumed. The tradeoff is exact and opposite: ART writes each record once (1×) but keeps everything; LSM rewrites data through compaction (10.6×) but reclaims space. This is the classic append-only-vs-compaction storage tradeoff, here measured for a KV-cache residency index — a gap a recently published RocksDB-for-KV-cache approach left unanalyzed.

A bottleneck the benchmark caught

An eviction-churn phase (remove_block + put under cache pressure) surfaced that remove_block was O(scope) — given a block hash but not its prefix, every backend scanned the whole identity scope. A secondary block_hash → primary key reverse index made it an O(matches) seek, 100–300× faster eviction on the persistent backends. The benchmark working as intended: it found the bottleneck, then measured the fix.