Compaction & Cache Strategy
NoKV’s compaction pipeline borrows the leveled‑LSM layout from RocksDB, but layers it with a landing buffer, lightweight cache telemetry, and simple concurrency guards so the implementation stays approachable while still handling bursty workloads.
1. Overview
Compactions are orchestrated by lsm.compaction with lsm.levelManager supplying scheduling input and executing the plan. Each level owns two lists of tables:
tables– the canonical sorted run for the level.landing– a landing buffer that temporarily holds SSTables moved from the level above when there is not yet enough work (or bandwidth) to do a full merge.
The compaction runtime periodically calls into the picker to build a list of Priority entries. The priorities consider three signals:
- L0 table count – loosely capped by
Options.NumLevelZeroTables. - Level size vs target – computed by
levelTargets(), which dynamically adjusts the “base” level depending on total data volume. - Landing buffer backlog – if a level’s
landingshards have data, they receive elevated scores so landed tables are merged promptly.
The highest adjusted score is processed first. L0 compactions can either move tables into the landing buffer of the base level (cheap re‑parenting) or compact directly into a lower level when the overlap warrants it.
Planning now happens via Plan: LSM snapshots table metadata into TableMeta, PlanFor* selects table IDs + key ranges, and LSM resolves the plan back to *table before executing.
2. Landing Buffer
moveToLanding (see engine/lsm/compaction_executor.go) performs a metadata-only migration:
- Records a
manifest.EditDeleteFilefor the source level. - Logs a new
manifest.EditAddFiletargeting the destination level. - Removes the table from
thisLevel.tablesand appends it tonextLevel.landing.
This keeps write amplification low when many small L0 tables arrive at once. Reads still see the newest data because levelHandler.searchLNSST checks the landing buffer before consulting the canonical level tables.
Compaction tests (engine/lsm/compaction_test.go) assert that after calling moveToLanding the table disappears from the source level and shows up in the landing buffer.
3. Concurrency Guards
To prevent overlapping compactions:
State.CompareAndAddtracks the key range of each in-flight compaction per level.- Attempts to register a compaction whose ranges intersect an existing one are rejected.
- When a compaction finishes,
State.Deleteremoves the ranges and table IDs from the guard.
This mechanism is intentionally simple—just a mutex‐protected slice—yet effective in tests (TestCompactStatusGuards) that simulate back‑to‑back registrations on the same key range.
4. Cache Telemetry
NoKV’s cache strategy has two explicit user-space caches plus direct bloom probing from decoded table indexes (engine/lsm/cache.go):
| Component | Purpose | Metrics hook |
|---|---|---|
| Block cache | Ristretto cache for L0/L1 blocks. | cacheMetrics.recordBlock(level, hit) |
| OS page cache path | Deeper levels bypass user-space cache and rely on mmap + kernel page cache. | Same as above |
| Bloom filters | Embedded in pb.TableIndex and probed directly from the decoded index. | no separate cache layer |
Cache hit/miss signals are exported through StatsSnapshot.Cache (and surfaced by nokv stats / expvar), which is especially helpful when tuning landing behaviour. If L0/L1 cache misses spike, the landing buffer likely needs to be drained faster. TestCacheHotColdMetrics verifies cache hit accounting.
5. Interaction with Value Log
Compaction informs value‑log GC via discard statistics:
- During
subcompact, every entry merged out is inspected. If it stores aValuePtr, the amount is added to the discard map. - At the end of subcompaction, the accumulated discard map is pushed through
setDiscardStatsCh. valueLogreceives the stats and can safely rewrite or delete vlog segments with predominantly obsolete data.
This tight coupling keeps the value log from growing indefinitely after heavy overwrite workloads.
6. Testing Checklist
Relevant tests to keep compaction healthy:
engine/lsm/compaction_test.goTestCompactionMoveToLanding– ensures metadata migration works and the landing buffer grows.TestCompactStatusGuards– checks overlap detection.
engine/lsm/cache_test.goTestCacheHotColdMetrics– validates cache hit accounting.
engine/lsm/lsm_test.goTestCompact/TestHitStorage– end‑to‑end verification that data remains queryable across memtable flushes and compactions.
When adding new compaction heuristics or cache behaviour, extend these tests (or introduce new ones) so the behaviour stays observable.
7. Practical Tips
- Tune
Options.LandingCompactBatchSizewhen landing queues build up; increasing it lets a single move cover more tables. - Observe
NoKV.Stats.cache.*andNoKV.Stats.compaction.*via the CLI (nokv stats) to decide whether you need more compaction workers or bigger caches. - For workloads dominated by range scans, consider increasing
Options.BlockCacheBytesif you want to keep more L0/L1 blocks in the user-space cache; cold data relies on the OS page cache. - Keep an eye on
NoKV.Stats.value_log.gc(for examplegc_runsandhead_updates); if compactions are generating discard stats but the value log head doesn’t move, GC thresholds may be too conservative.
With these mechanisms, NoKV stays resilient under bursty writes while keeping the code path small and discoverable—ideal for learning or embedding. Dive into the source files referenced above for deeper implementation details.