# Cache & Bloom Filters

NoKV's LSM tier layers a multi-level block cache with bloom filter caching to accelerate lookups. The implementation lives in `lsm/cache.go`.
## 1. Components
| Component | Purpose | Source |
|---|---|---|
| `cache.indexs` | Table index cache (fid → `*pb.TableIndex`) reused across reopen. | `utils/cache` |
| `blockCache` | Ristretto-based block cache (L0/L1 only) with per-table direct slots. | `lsm/cache.go` |
| `bloomCache` | LRU cache of bloom filter bitsets per SST. | `lsm/cache.go` |
| `cacheMetrics` | Atomic hit/miss counters for L0/L1 blocks and blooms. | `lsm/cache.go#L30-L110` |
Badger uses a similar block cache split (Pinner/Cache), while RocksDB exposes its block cache(s) via `BlockBasedTableOptions`. NoKV keeps the implementation Go-native and GC-friendly.
### 1.1 Index Cache & Handles
- SSTable metadata stays with the `table` struct, while decoded protobuf indexes are stored in `cache.indexs`. Lookups first hit the cache before falling back to disk.
- SST handles are reopened on demand for lower levels. L0/L1 tables keep their file descriptors pinned, while deeper levels close them once no iterator is using the table.
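The cache-first lookup with a decode fallback can be sketched as follows. This is a minimal illustration, not NoKV's actual API: `indexCache`, `getIndex`, and the simplified `tableIndex` stand-in for the decoded `*pb.TableIndex` are all hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// tableIndex is a stand-in for the decoded *pb.TableIndex protobuf
// (hypothetical simplification for this sketch).
type tableIndex struct{ maxKey string }

// indexCache sketches the fid -> decoded-index cache described above.
type indexCache struct {
	mu      sync.RWMutex
	indexes map[uint64]*tableIndex
}

func newIndexCache() *indexCache {
	return &indexCache{indexes: make(map[uint64]*tableIndex)}
}

// getIndex returns the cached index for fid, invoking decode (a
// stand-in for reading the index block from the SST) only on a miss.
func (c *indexCache) getIndex(fid uint64, decode func() *tableIndex) *tableIndex {
	c.mu.RLock()
	idx, ok := c.indexes[fid]
	c.mu.RUnlock()
	if ok {
		return idx // hit: reuse the decoded index across reopens
	}
	idx = decode() // miss: decode from disk
	c.mu.Lock()
	c.indexes[fid] = idx
	c.mu.Unlock()
	return idx
}

func main() {
	c := newIndexCache()
	decodes := 0
	load := func() *tableIndex { decodes++; return &tableIndex{maxKey: "zz"} }
	c.getIndex(7, load)
	c.getIndex(7, load) // second lookup hits the cache
	fmt.Println(decodes)
}
```

Because the decoded index survives table reopen, lower-level tables can drop their file descriptors without paying the decode cost again on the next lookup.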
## 2. Block Cache Strategy
- User-space block cache (L0/L1, parsed blocks, Ristretto LFU-ish).
- Deeper levels rely on the OS page cache + mmap readahead.
- `Options.BlockCacheSize` sets capacity in blocks (cost=1 per block). Entries keep parsed blocks (data slice + offsets/baseKey/checksum), so hits avoid re-parsing.
- Per-table direct slots (`table.cacheSlots[idx]`) give a lock-free fast path. Misses fall back to the shared Ristretto cache (approximate LFU with admission).
- Evictions clear the table slot via `OnEvict`; the user-space cache only tracks L0/L1 blocks. Deeper levels depend on the OS page cache.
- Access patterns: `getBlock` also updates hit/miss metrics for L0/L1; deeper levels bypass the cache and do not affect metrics.
```mermaid
flowchart LR
    Read --> CheckCache
    CheckCache -->|hit| Return
    CheckCache -->|miss| LoadFromTable["LoadFromTable (mmap + OS page cache)"]
    LoadFromTable --> InsertCache
    InsertCache --> Return
```
By default only L0 and L1 blocks are cached (`level > 1` short-circuits), reflecting the higher reuse of blocks in the top levels.
## 3. Bloom Cache
- `bloomCache` stores the raw filter bitset (`utils.Filter`) per table ID. Entries are deep-copied (`SafeCopy`) to avoid sharing memory with mmaps.
- The cache policy is LRU.
- Capacity is controlled by `Options.BloomCacheSize`.
- Bloom hits/misses are recorded via `cacheMetrics.recordBloom`, feeding into `StatsSnapshot.Cache.BloomHitRate`.
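An LRU bloom cache with the deep-copy-on-insert behavior can be sketched as follows; `bloomLRU` and its methods are hypothetical names, and the `filter` byte slice stands in for `utils.Filter`.

```go
package main

import (
	"container/list"
	"fmt"
)

// filter is a stand-in for utils.Filter (a bloom bitset).
type filter []byte

// bloomLRU sketches the per-SST bloom cache: LRU eviction, plus a
// deep copy on insert so cached bitsets never alias mmap'd memory.
type bloomLRU struct {
	cap   int
	order *list.List               // front = most recently used
	items map[uint64]*list.Element // fid -> element holding a bloomEntry
}

type bloomEntry struct {
	fid uint64
	f   filter
}

func newBloomLRU(capacity int) *bloomLRU {
	return &bloomLRU{cap: capacity, order: list.New(), items: make(map[uint64]*list.Element)}
}

func (c *bloomLRU) get(fid uint64) (filter, bool) {
	if el, ok := c.items[fid]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*bloomEntry).f, true
	}
	return nil, false
}

func (c *bloomLRU) put(fid uint64, f filter) {
	cp := append(filter(nil), f...) // SafeCopy-style deep copy
	if el, ok := c.items[fid]; ok {
		el.Value.(*bloomEntry).f = cp
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap { // evict the least recently used filter
		back := c.order.Back()
		delete(c.items, back.Value.(*bloomEntry).fid)
		c.order.Remove(back)
	}
	c.items[fid] = c.order.PushFront(&bloomEntry{fid: fid, f: cp})
}

func main() {
	c := newBloomLRU(2)
	c.put(1, filter{0xAA})
	c.put(2, filter{0xBB})
	c.get(1)               // touch fid 1 so fid 2 becomes LRU
	c.put(3, filter{0xCC}) // evicts fid 2
	_, ok := c.get(2)
	fmt.Println(ok)
}
```

The deep copy matters because the source bitset lives inside an mmap'd SST: if the file is closed or remapped, an aliased slice would dangle, while the copied entry stays valid for the cache's lifetime.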
## 4. Metrics & Observability
`cache.metricsSnapshot()` produces:

```go
type CacheMetrics struct {
	L0Hits, L0Misses       uint64
	L1Hits, L1Misses       uint64
	BloomHits, BloomMisses uint64
	IndexHits, IndexMisses uint64
}
```
`Stats.Snapshot` converts these into hit rates. Monitor them alongside the block cache sizes to decide when to scale memory.
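The counter-to-rate conversion is straightforward; the sketch below assumes a `hitRate` helper, which is an illustrative name rather than NoKV's actual function.

```go
package main

import "fmt"

// CacheMetrics mirrors the snapshot struct above.
type CacheMetrics struct {
	L0Hits, L0Misses       uint64
	L1Hits, L1Misses       uint64
	BloomHits, BloomMisses uint64
	IndexHits, IndexMisses uint64
}

// hitRate is a sketch of the hits/(hits+misses) conversion performed
// when building a stats snapshot; zero traffic yields a rate of 0.
func hitRate(hits, misses uint64) float64 {
	total := hits + misses
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func main() {
	m := CacheMetrics{L0Hits: 90, L0Misses: 10, BloomHits: 70, BloomMisses: 30}
	fmt.Printf("L0=%.2f bloom=%.2f\n",
		hitRate(m.L0Hits, m.L0Misses),
		hitRate(m.BloomHits, m.BloomMisses))
}
```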
## 5. HotRing Integration
- Hot detection: HotRing counts on read/write paths and triggers targeted prefetch for hot keys.
- Cache warmup: prefetch loads target blocks into the normal L0/L1 block cache path.
- Compaction coupling: HotRing top-k feeds compaction scoring; levels/ingest shards covering hot ranges get higher scores to trim overlap sooner.
- Tuning: Hot thresholds come from HotRing options (window/decay configurable).
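The detect-then-prefetch coupling above can be sketched with a simple counter model. This is a loose approximation under stated assumptions: `hotCounter`, `threshold`, and the halving `decay` are illustrative stand-ins for HotRing's actual window/decay options, not its real structure.

```go
package main

import "fmt"

// hotCounter models hot-key detection: per-key access counts with
// periodic decay, and a threshold that triggers a targeted prefetch.
type hotCounter struct {
	counts    map[string]uint32
	threshold uint32
}

// touch records an access and reports whether the key just crossed
// the hot threshold (i.e. its blocks should be prefetched into the
// L0/L1 block cache).
func (h *hotCounter) touch(key string) bool {
	h.counts[key]++
	return h.counts[key] == h.threshold
}

// decay halves all counts, approximating a sliding window so keys
// that stop being accessed cool down over time.
func (h *hotCounter) decay() {
	for k, v := range h.counts {
		if v >>= 1; v == 0 {
			delete(h.counts, k)
		} else {
			h.counts[k] = v
		}
	}
}

func main() {
	h := &hotCounter{counts: make(map[string]uint32), threshold: 3}
	hot := 0
	for i := 0; i < 5; i++ {
		if h.touch("user:42") {
			hot++ // a real implementation would issue the block prefetch here
		}
	}
	fmt.Println(hot)
}
```

Reporting "just became hot" exactly once per threshold crossing keeps prefetch work proportional to the number of newly hot keys rather than to raw read volume.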
## 6. Interaction with Value Log
- Keys stored as value pointers (large values) still populate block cache entries for the key/index block. The value payload is read directly from the vlog (`valueLog.read`), so block cache hit rates remain meaningful.
- Discard stats from flushes can demote cached blocks via `cache.dropBlock`, ensuring obsolete SST data leaves the cache quickly.
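The pointer indirection can be sketched as below. The `valuePointer` field names and the in-memory map standing in for the log files are assumptions for illustration; the real `valueLog.read` would pread from an on-disk log segment.

```go
package main

import "fmt"

// valuePointer sketches the indirection stored in the LSM for large
// values: the SST (and hence the block cache) holds only this small
// pointer, while the payload lives in the value log.
type valuePointer struct {
	fid    uint32 // vlog file id
	offset uint32 // byte offset within that file
	length uint32 // payload length
}

// readValue resolves a pointer against an in-memory stand-in for the
// value log (map of file id -> file contents).
func readValue(vlog map[uint32][]byte, vp valuePointer) []byte {
	seg := vlog[vp.fid]
	return seg[vp.offset : vp.offset+vp.length]
}

func main() {
	vlog := map[uint32][]byte{1: []byte("xxhello-worldxx")}
	vp := valuePointer{fid: 1, offset: 2, length: 11}
	fmt.Println(string(readValue(vlog, vp)))
}
```

Because a cached block holds many small pointers instead of a few large payloads, block cache capacity measured in blocks keeps covering the same key range regardless of value size.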
## 7. Comparison
| Feature | RocksDB | BadgerDB | NoKV |
|---|---|---|---|
| Block cache policy | Configurable multiple caches | Single cache | Ristretto for L0/L1 + OS page cache for deeper levels |
| Bloom cache | Enabled per table, no explicit cache | Optional | Dedicated LRU storing filters |
| Metrics | Block cache stats via `GetAggregatedIntProperty` | Limited | `NoKV.Stats.cache.*` hit rates |
## 8. Operational Tips
- If the bloom hit rate falls below ~60%, consider increasing bits-per-key or the bloom cache size.
- Track `nokv stats --json` cache metrics over time; drops often indicate iterator misuse or working-set shifts.
More on SST layout lives in `docs/manifest.md` and `docs/architecture.md`.