Crates
QuillCache is a Cargo workspace. The CUDA crate is excluded from the default build (it needs an NVIDIA toolchain); everything else builds hardware-free.
| crate | role |
|---|---|
| `quillcache-gateway` | OpenAI-compatible gateway: proxy, streaming, decision headers, SLO goodput |
| `quillcache-control` | control plane: `plan()` / `observe_placement` / `audit_reuse` |
| `quillcache-router` | routing policies, incl. `DynamoCostRouter` (+ greedy / SLO-aware / session / prefix-affinity / round-robin) |
| `quillcache-core` | `KvBlockKey` identity, `CostModel`, the `IndexBackend` + `DataPlane` traits, and the ART-vs-LSM bench |
| `quillcache-store` | `LocalKvStore` (crash-consistent byte pool), `StoreDataPlane` (tiers), `PooledStore`, `NodeRegistry` |
| `quillcache-transfer` | transfer engine: `LocalTransfer` / `TcpTransfer` / `RdmaTransfer` (reserved) |
| `quillcache-index-holt` | Holt (persistent ART) index backend |
| `quillcache-index-rocksdb` | RocksDB (LSM) index backend |
| `quillcache-cuda` | CUDA device tier: HBM↔host copies + FP8 quantize-on-offload (feature-gated, excluded from the workspace) |
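To make "routing policies" concrete, here is a minimal sketch of the two simplest policies the router lists, round-robin and prefix-affinity, behind one trait. The `RoutingPolicy` trait, its `route` signature, and the choice to hash a fixed-length token prefix are assumptions for illustration, not the `quillcache-router` API; `DynamoCostRouter` and the cost- or SLO-aware policies are not modeled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical policy trait (not the quillcache-router API): pick a worker
/// index in `0..num_workers` for a request's token ids. `num_workers` must be > 0.
trait RoutingPolicy {
    fn route(&mut self, tokens: &[u32], num_workers: usize) -> usize;
}

/// Round-robin: ignores the request and cycles through workers in order.
struct RoundRobin {
    next: usize,
}

impl RoutingPolicy for RoundRobin {
    fn route(&mut self, _tokens: &[u32], num_workers: usize) -> usize {
        let worker = self.next % num_workers;
        self.next = self.next.wrapping_add(1);
        worker
    }
}

/// Prefix affinity: hashes the first `prefix_len` tokens, so requests that
/// share that prefix always map to the same worker.
struct PrefixAffinity {
    prefix_len: usize,
}

impl RoutingPolicy for PrefixAffinity {
    fn route(&mut self, tokens: &[u32], num_workers: usize) -> usize {
        let prefix = &tokens[..tokens.len().min(self.prefix_len)];
        // DefaultHasher::new() instances all hash identically within one build;
        // a production router would pin a specific, versioned hash function.
        let mut hasher = DefaultHasher::new();
        prefix.hash(&mut hasher);
        (hasher.finish() % num_workers as u64) as usize
    }
}

fn main() {
    let mut policies: Vec<Box<dyn RoutingPolicy>> = Vec::new();
    policies.push(Box::new(RoundRobin { next: 0 }));
    policies.push(Box::new(PrefixAffinity { prefix_len: 4 }));
    // Two requests that share their first four tokens.
    let a: [u32; 6] = [1, 2, 3, 4, 10, 11];
    let b: [u32; 5] = [1, 2, 3, 4, 99];
    for policy in policies.iter_mut() {
        println!("{} {}", policy.route(&a, 3), policy.route(&b, 3));
    }
}
```

Round-robin spreads `a` and `b` across workers; prefix affinity sends both to the same one, which is where their shared-prefix KV blocks are most likely already resident.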
The two seams
Two traits make the system pluggable and testable:
- `IndexBackend` (`quillcache-core`): the residency index. Implementations: `MemoryIndex` (reference), Holt (persistent ART), and RocksDB (LSM). Because all three sit behind the same trait, the storage study can compare engines apples-to-apples.
- `DataPlane` (`quillcache-core`): the KV byte tier manager. `StoreDataPlane` implements it over per-worker `LocalKvStore` byte pools, so `place()` moves real bytes between HBM, DRAM, and SSD tiers.
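A minimal sketch of how the two seams compose, assuming simplified signatures: a data plane written generically over any index backend, so swapping the reference index for Holt or RocksDB would change only the type parameter. `BlockKey`, `Tier`, `MemIndex`, `ToyDataPlane`, and every method name here are hypothetical stand-ins, not the real `quillcache-core` traits; the per-tier `HashMap` pools only stand in for real HBM, DRAM, and SSD storage.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real `KvBlockKey`.
type BlockKey = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Tier { Hbm, Dram, Ssd }

/// Residency index seam: records which tier currently holds each block.
trait IndexBackend {
    fn put(&mut self, key: BlockKey, tier: Tier);
    fn get(&self, key: BlockKey) -> Option<Tier>;
    fn remove(&mut self, key: BlockKey) -> Option<Tier>;
}

/// In-memory reference backend, in the spirit of `MemoryIndex`.
#[derive(Default)]
struct MemIndex { map: HashMap<BlockKey, Tier> }

impl IndexBackend for MemIndex {
    fn put(&mut self, key: BlockKey, tier: Tier) { self.map.insert(key, tier); }
    fn get(&self, key: BlockKey) -> Option<Tier> { self.map.get(&key).copied() }
    fn remove(&mut self, key: BlockKey) -> Option<Tier> { self.map.remove(&key) }
}

/// Byte-tier seam: stores block bytes and moves them between tiers.
trait DataPlane {
    fn write(&mut self, key: BlockKey, tier: Tier, bytes: Vec<u8>);
    fn read(&self, key: BlockKey) -> Option<&[u8]>;
    /// Moves a block's bytes to tier `to`; returns false if the block is unknown.
    fn place(&mut self, key: BlockKey, to: Tier) -> bool;
}

/// Toy data plane, generic over the index, with one byte pool per tier.
struct ToyDataPlane<I: IndexBackend> {
    index: I,
    pools: HashMap<Tier, HashMap<BlockKey, Vec<u8>>>,
}

impl<I: IndexBackend> ToyDataPlane<I> {
    fn new(index: I) -> Self { Self { index, pools: HashMap::new() } }
}

impl<I: IndexBackend> DataPlane for ToyDataPlane<I> {
    fn write(&mut self, key: BlockKey, tier: Tier, bytes: Vec<u8>) {
        // Drop any stale copy so each block lives in exactly one tier.
        if let Some(old) = self.index.remove(key) {
            if let Some(pool) = self.pools.get_mut(&old) {
                pool.remove(&key);
            }
        }
        self.pools.entry(tier).or_default().insert(key, bytes);
        self.index.put(key, tier);
    }

    fn read(&self, key: BlockKey) -> Option<&[u8]> {
        let tier = self.index.get(key)?;
        self.pools.get(&tier)?.get(&key).map(|v| v.as_slice())
    }

    fn place(&mut self, key: BlockKey, to: Tier) -> bool {
        let from = match self.index.get(key) {
            Some(t) => t,
            None => return false,
        };
        if from == to {
            return true;
        }
        // Move the bytes out of the source pool, then record the new tier.
        let bytes = match self.pools.get_mut(&from).and_then(|p| p.remove(&key)) {
            Some(b) => b,
            None => return false,
        };
        self.pools.entry(to).or_default().insert(key, bytes);
        self.index.put(key, to);
        true
    }
}

fn main() {
    let mut dp = ToyDataPlane::new(MemIndex::default());
    dp.write(7, Tier::Hbm, vec![1, 2, 3]);
    dp.place(7, Tier::Ssd); // offload HBM -> SSD
    println!("{:?} {:?}", dp.index.get(7), dp.read(7));
}
```

Keeping the index behind a type parameter is what makes the engine comparison fair: the data-plane logic is byte-for-byte identical, and only the residency lookups differ.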
The transfer engine (`Transfer` trait) and the node registry
(`NodeRegistry` trait) are the seams for the distributed read path. The
transport is TCP today, with RDMA reserved; the registry is in-memory now,
with etcd pluggable behind the trait.
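The same pattern applied to the distributed read path, as a sketch under assumed signatures: the read function depends only on a registry trait and a transfer trait, so a TCP or RDMA transport, or an etcd-backed registry, could slot in without changing it. The `NodeRegistry` and `Transfer` traits below are simplified, hypothetical versions of the real ones, and `InMemoryRegistry`, `LoopbackTransfer` (loosely analogous to `LocalTransfer`), and `remote_read` are illustrative names.

```rust
use std::collections::HashMap;

type NodeId = u32;
type BlockKey = u64;

/// Simplified, hypothetical registry seam: which node owns which block.
trait NodeRegistry {
    fn announce(&mut self, node: NodeId, key: BlockKey);
    fn lookup(&self, key: BlockKey) -> Option<NodeId>;
}

/// In-process registry; an etcd-backed type would implement the same trait.
#[derive(Default)]
struct InMemoryRegistry {
    owners: HashMap<BlockKey, NodeId>,
}

impl NodeRegistry for InMemoryRegistry {
    fn announce(&mut self, node: NodeId, key: BlockKey) {
        self.owners.insert(key, node);
    }
    fn lookup(&self, key: BlockKey) -> Option<NodeId> {
        self.owners.get(&key).copied()
    }
}

/// Simplified, hypothetical transfer seam: fetch a block's bytes from a node.
trait Transfer {
    fn fetch(&self, from: NodeId, key: BlockKey) -> Option<Vec<u8>>;
}

/// Loopback transport: "remote" nodes are in-process maps. A TCP or RDMA
/// transport would implement the same trait over the network.
#[derive(Default)]
struct LoopbackTransfer {
    nodes: HashMap<NodeId, HashMap<BlockKey, Vec<u8>>>,
}

impl Transfer for LoopbackTransfer {
    fn fetch(&self, from: NodeId, key: BlockKey) -> Option<Vec<u8>> {
        self.nodes.get(&from)?.get(&key).cloned()
    }
}

/// The read path depends only on the two traits, never on a concrete
/// transport or registry.
fn remote_read(reg: &dyn NodeRegistry, xfer: &dyn Transfer, key: BlockKey) -> Option<Vec<u8>> {
    let owner = reg.lookup(key)?;
    xfer.fetch(owner, key)
}

fn main() {
    let mut reg = InMemoryRegistry::default();
    let mut xfer = LoopbackTransfer::default();
    // Node 2 holds block 42 and announces it to the registry.
    xfer.nodes.entry(2).or_default().insert(42, b"kv".to_vec());
    reg.announce(2, 42);
    println!("{:?}", remote_read(&reg, &xfer, 42));
}
```

Taking `&dyn` trait objects keeps the choice of transport and registry a runtime configuration decision rather than a recompile.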