
Crates

QuillCache is a Cargo workspace. The CUDA crate is excluded from the default build because it needs an NVIDIA toolchain; everything else builds without any special hardware.

crate                       role
quillcache-gateway          OpenAI-compatible gateway: proxy, streaming, decision headers, SLO goodput
quillcache-control          control plane: plan() / observe_placement / audit_reuse
quillcache-router           routing policies, including DynamoCostRouter (+ greedy / SLO-aware / session / prefix-affinity / round-robin)
quillcache-core             KvBlockKey identity, CostModel, the IndexBackend + DataPlane traits, and the ART-vs-LSM bench
quillcache-store            LocalKvStore (crash-consistent byte pool), StoreDataPlane (tiers), PooledStore, NodeRegistry
quillcache-transfer         transfer engine: LocalTransfer / TcpTransfer / RdmaTransfer (reserved)
quillcache-index-holt       Holt (persistent ART) index backend
quillcache-index-rocksdb    RocksDB (LSM) index backend
quillcache-cuda             CUDA device tier: HBM↔host copies + FP8 quantize-on-offload (feature-gated, excluded from the workspace)
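To make "routing policies" concrete, here is a minimal sketch of two of the simpler policies named for quillcache-router, round-robin and prefix-affinity, behind one trait. The `RoutePolicy` trait, its `route` signature, and the choice to hash the first `prefix_len` token ids are hypothetical illustrations, not the crate's actual API; the cost-based DynamoCostRouter is not sketched here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical routing trait: pick a worker index for a request's token ids.
pub trait RoutePolicy {
    fn route(&mut self, tokens: &[u32], num_workers: usize) -> usize;
}

/// Round-robin: ignore the request contents and cycle through workers.
pub struct RoundRobin {
    pub next: usize,
}

impl RoutePolicy for RoundRobin {
    fn route(&mut self, _tokens: &[u32], num_workers: usize) -> usize {
        let w = self.next % num_workers;
        self.next = self.next.wrapping_add(1);
        w
    }
}

/// Prefix-affinity: requests that share their first `prefix_len` tokens
/// hash to the same worker, so that worker's cached KV blocks can be reused.
pub struct PrefixAffinity {
    pub prefix_len: usize,
}

impl RoutePolicy for PrefixAffinity {
    fn route(&mut self, tokens: &[u32], num_workers: usize) -> usize {
        let n = tokens.len().min(self.prefix_len);
        let mut h = DefaultHasher::new();
        tokens[..n].hash(&mut h);
        (h.finish() % num_workers as u64) as usize
    }
}
```

`DefaultHasher::new()` is deterministic within one build, but its algorithm is not guaranteed across Rust releases, so a router whose placements must stay stable across deployments would want a fixed hash function instead.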

The two seams

Two traits make the system pluggable and testable:

  • IndexBackend (quillcache-core) — the residency index. Implementations: MemoryIndex (reference), Holt (persistent ART), RocksDB (LSM). The same trait lets the storage study compare engines apples-to-apples.
  • DataPlane (quillcache-core) — the KV byte tier manager. StoreDataPlane implements it over per-worker LocalKvStore byte pools, so place() moves real bytes between HBM/DRAM/SSD tiers.
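A minimal sketch of how the two seams fit together, under simplified assumptions: the method names (`insert`, `lookup`, `remove`, `put`, `place`, `get`), the `Tier` enum, `MemIndex`, `ToyDataPlane`, and the use of a `u64` in place of `KvBlockKey` are all hypothetical, not the real quillcache-core signatures. It shows the point of the split: the data plane moves bytes between per-tier pools and keeps whatever `IndexBackend` it is given in sync, so a persistent index could replace the in-memory one without touching the data-plane code.

```rust
use std::collections::HashMap;

/// Storage tiers named in the docs; this enum is a hypothetical stand-in.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Tier { Hbm, Dram, Ssd }

/// Stand-in for KvBlockKey (assumption: a 64-bit block identity).
pub type BlockKey = u64;

/// Residency index: which tier, if any, currently holds a block.
pub trait IndexBackend {
    fn insert(&mut self, key: BlockKey, tier: Tier);
    fn lookup(&self, key: BlockKey) -> Option<Tier>;
    fn remove(&mut self, key: BlockKey) -> Option<Tier>;
}

/// In-memory reference index, in the spirit of MemoryIndex.
#[derive(Default)]
pub struct MemIndex { map: HashMap<BlockKey, Tier> }

impl IndexBackend for MemIndex {
    fn insert(&mut self, key: BlockKey, tier: Tier) { self.map.insert(key, tier); }
    fn lookup(&self, key: BlockKey) -> Option<Tier> { self.map.get(&key).copied() }
    fn remove(&mut self, key: BlockKey) -> Option<Tier> { self.map.remove(&key) }
}

/// Byte tier manager: place() moves a block's bytes to a target tier.
pub trait DataPlane {
    fn put(&mut self, key: BlockKey, tier: Tier, bytes: Vec<u8>);
    fn place(&mut self, key: BlockKey, to: Tier) -> bool;
    fn get(&self, key: BlockKey) -> Option<(Tier, &[u8])>;
}

/// Toy data plane: one HashMap "byte pool" per tier, plus an index
/// kept in sync with where each block's bytes actually live.
pub struct ToyDataPlane<I: IndexBackend> {
    pools: HashMap<Tier, HashMap<BlockKey, Vec<u8>>>,
    index: I,
}

impl<I: IndexBackend> ToyDataPlane<I> {
    pub fn new(index: I) -> Self { Self { pools: HashMap::new(), index } }
}

impl<I: IndexBackend> DataPlane for ToyDataPlane<I> {
    fn put(&mut self, key: BlockKey, tier: Tier, bytes: Vec<u8>) {
        // Drop any existing copy so the block lives in exactly one tier.
        if let Some(old) = self.index.remove(key) {
            if let Some(pool) = self.pools.get_mut(&old) { pool.remove(&key); }
        }
        self.pools.entry(tier).or_default().insert(key, bytes);
        self.index.insert(key, tier);
    }

    fn place(&mut self, key: BlockKey, to: Tier) -> bool {
        let Some(from) = self.index.lookup(key) else { return false };
        if from == to { return true; }
        // Move the bytes out of the source pool into the destination pool.
        let bytes = match self.pools.get_mut(&from).and_then(|p| p.remove(&key)) {
            Some(b) => b,
            None => return false,
        };
        self.pools.entry(to).or_default().insert(key, bytes);
        self.index.insert(key, to);
        true
    }

    fn get(&self, key: BlockKey) -> Option<(Tier, &[u8])> {
        let tier = self.index.lookup(key)?;
        let bytes = self.pools.get(&tier)?.get(&key)?;
        Some((tier, bytes.as_slice()))
    }
}
```

A real data plane would also handle capacity limits and eviction; this sketch only shows the index and the bytes moving together.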

The transfer engine (Transfer trait) and the node registry (NodeRegistry trait) are the seams for the distributed read path. The transfer engine ships TCP today, with RDMA reserved; the registry is in-memory today, with etcd pluggable behind the same trait.
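A minimal sketch of a read path written only against these two traits, assuming the registry maps a block key to the node that holds it. The method names (`register`, `lookup`, `fetch`), the `NodeId`/`BlockKey` aliases, `InMemoryRegistry`, `InProcessTransfer`, and `remote_read` are hypothetical; the in-process transfer is a stand-in for TCP, not the crate's LocalTransfer.

```rust
use std::collections::HashMap;

pub type NodeId = u32;
pub type BlockKey = u64; // stand-in for KvBlockKey (assumption)

/// Hypothetical registry trait: which node owns a given block?
pub trait NodeRegistry {
    fn register(&mut self, key: BlockKey, node: NodeId);
    fn lookup(&self, key: BlockKey) -> Option<NodeId>;
}

/// Hypothetical transfer trait: pull a block's bytes from a node.
pub trait Transfer {
    fn fetch(&self, node: NodeId, key: BlockKey) -> Option<Vec<u8>>;
}

#[derive(Default)]
pub struct InMemoryRegistry { owners: HashMap<BlockKey, NodeId> }

impl NodeRegistry for InMemoryRegistry {
    fn register(&mut self, key: BlockKey, node: NodeId) { self.owners.insert(key, node); }
    fn lookup(&self, key: BlockKey) -> Option<NodeId> { self.owners.get(&key).copied() }
}

/// In-process transfer: every "node" is just a HashMap inside this process.
#[derive(Default)]
pub struct InProcessTransfer { nodes: HashMap<NodeId, HashMap<BlockKey, Vec<u8>>> }

impl InProcessTransfer {
    pub fn store(&mut self, node: NodeId, key: BlockKey, bytes: Vec<u8>) {
        self.nodes.entry(node).or_default().insert(key, bytes);
    }
}

impl Transfer for InProcessTransfer {
    fn fetch(&self, node: NodeId, key: BlockKey) -> Option<Vec<u8>> {
        self.nodes.get(&node)?.get(&key).cloned()
    }
}

/// Read path: ask the registry who owns the block, then pull it via the
/// transfer engine. Generic over both traits, so swapping in a different
/// transfer or registry implementation leaves this function unchanged.
pub fn remote_read<R: NodeRegistry, T: Transfer>(reg: &R, xfer: &T, key: BlockKey) -> Option<Vec<u8>> {
    let owner = reg.lookup(key)?;
    xfer.fetch(owner, key)
}
```

Because `remote_read` is generic rather than tied to concrete types, the same read path can be unit-tested in one process and then pointed at network-backed implementations.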