Quick start
QuillCache is a Rust workspace. The default build needs no GPU, no RDMA, and no C++ toolchain.
Build & test
    git clone https://github.com/feichai0017/quillcache
    cd quillcache
    cargo build
    cargo test   # 45 tests

The ART-vs-LSM storage study
The residency index is benchmarked across three backends on the same trace. RocksDB needs a C++ toolchain; Holt (ART) is pure Rust.
    cargo run --features "rocksdb holt" -- bench-index --backend holt
    cargo run --features "rocksdb holt" -- bench-index --backend rocksdb
    cargo run -- bench-index --backend memory

See the storage study for the numbers and what they mean.
Online mode — the gateway
Run the OpenAI-compatible gateway in front of real engines, backed by a persistent ART (Holt) residency index that survives restarts:
    cargo run --features holt -- gateway --config examples/quillcache-gateway.yaml
    # set `index: holt` in the config for the persistent residency index

The gateway proxies POST /v1/chat/completions and POST /v1/completions,
ingests KV events at POST /v1/kv-events, and reports state at GET /v1/state.
Each response carries x-quillcache-* decision headers (local hits, transfers,
recomputes, refused unsafe reuse, estimated TTFT).
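
To inspect those decision headers from a client, one option is to filter the response headers by the x-quillcache- prefix. The following is a minimal sketch using only the Rust standard library. It assumes you already have the raw HTTP/1.1 response as text. The function name `quillcache_headers` is hypothetical, and no concrete header names are assumed beyond the prefix stated above.

```rust
/// Collect the `x-quillcache-*` headers from a raw HTTP/1.1 response.
/// Hypothetical helper for illustration; not part of QuillCache itself.
fn quillcache_headers(raw_response: &str) -> Vec<(String, String)> {
    raw_response
        .lines() // `lines()` also strips the trailing '\r' of CRLF line endings
        .skip(1) // skip the status line, e.g. "HTTP/1.1 200 OK"
        .take_while(|line| !line.is_empty()) // a blank line ends the header section
        .filter_map(|line| line.split_once(':'))
        // Header names are case-insensitive, so normalize them to lowercase.
        .map(|(name, value)| (name.trim().to_ascii_lowercase(), value.trim().to_string()))
        .filter(|(name, _)| name.starts_with("x-quillcache-"))
        .collect()
}
```

The function stops at the first blank line, so a response body that happens to contain the prefix is never mistaken for a header.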
The CUDA device tier
The CUDA tier (HBM↔host copies + FP16→FP8 quantize-on-offload) is a separate crate, excluded from the default workspace so the build stays hardware-free. Build it on a GPU box:
    cd crates/quillcache-cuda
    cargo build --features cuda
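
For intuition about what FP16→FP8 quantization involves, here is a CPU-side sketch of one possible conversion. It assumes the E5M2 flavour of FP8 (1 sign, 5 exponent, 2 mantissa bits), which shares its exponent layout with FP16, so the conversion reduces to rounding away the low 8 mantissa bits. This document does not say which FP8 format or scaling scheme the CUDA crate uses, so treat the format choice and the function name `f16_bits_to_e5m2` as assumptions rather than a description of the actual kernel.

```rust
/// Convert IEEE 754 binary16 bits to FP8 E5M2 bits with round-to-nearest-even.
/// Illustrative sketch only; the real CUDA tier may use a different format or scaling.
fn f16_bits_to_e5m2(h: u16) -> u8 {
    let exponent = (h >> 10) & 0x1f;
    let mantissa = h & 0x03ff;

    // NaN must stay NaN: plain rounding could clear the mantissa and yield infinity.
    if exponent == 0x1f && mantissa != 0 {
        let sign = ((h >> 8) as u8) & 0x80;
        return sign | 0x7f;
    }

    // The upper byte of an FP16 value is already sign + exponent + top 2 mantissa bits.
    // Add a rounding bias so ties go to the even neighbour, then drop the low 8 bits.
    // A mantissa carry correctly bumps the exponent (and may round up to infinity).
    let keep_lsb = ((h >> 8) & 1) as u32;
    let rounded = h as u32 + 0x7f + keep_lsb;
    (rounded >> 8) as u8
}
```

For example, the FP16 bit pattern 0x3c00 (1.0) maps to the E5M2 byte 0x3c, and the tie 0x3c80 rounds down to the even neighbour 0x3c while 0x3d80 rounds up to 0x3e.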