HotRing – Hot Key Tracking
`hotring` is NoKV’s built-in hot-key tracker. It samples read/write access frequency per key and exposes the hottest entries to the stats subsystem and CLI. The implementation resides in `hotring/`.
1. Motivation
- Cache hints – `DB.prefetchLoop` (see `db.go`) consumes hot keys to schedule asynchronous reads into the block cache.
- Operational insight – `StatsSnapshot.HotKeys` and `nokv stats --json` surface the hottest keys, aiding debugging of traffic hotspots.
- Throttling – `HotRing.TouchAndClamp` enables simple rate caps: once a key crosses a threshold, callers can back off or log alerts.
Compared with RocksDB (which exposes block access stats via `perf_context`) and Badger (which lacks built-in hot-key reporting), NoKV offers a lightweight but concurrency-friendly tracker out of the box.
2. Data Structure
```
HotRing
  buckets[] -> per-bucket lock-free linked list (Node)
  hashFn    -> hash(key) -> uint32
  hashMask  -> selects bucket (power-of-two size)
```
- Each bucket stores a sorted linked list of `Node` ordered by `(tag, key)`, where `tag` is derived from the upper bits of the hash. Head pointers are `atomic.Pointer[Node]`, so readers walk the list without taking locks; writers use CAS to splice nodes while preserving order.
- `defaultTableBits = 12` → 4096 buckets by default (`NewHotRing`). The mask ensures cheap modulo operations.
- Nodes keep a `count` (`int32`) updated atomically and a `next` pointer stored via `unsafe.Pointer`. Sliding-window state is guarded by a tiny per-node spin lock instead of a process-wide mutex.
```mermaid
flowchart LR
    Key(key) -->|hash| Bucket["buckets[index] (atomic head)"]
    Bucket --> Node1
    Node1 --> Node2
    Node2 --> Node3
    Node3 --> Nil[(nil)]
```
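The layout above can be sketched in Go roughly as follows; the field and function names here are assumptions drawn from the prose (including the hash function choice), not the actual `hotring/` source:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
	"unsafe"
)

// Illustrative sketch of the structure described above.
const defaultTableBits = 12 // 1<<12 = 4096 buckets

type node struct {
	tag   uint32         // upper hash bits, used for (tag, key) ordering
	key   string
	count int32          // incremented atomically by Touch
	next  unsafe.Pointer // *node, CAS-spliced by writers
}

type hotRing struct {
	buckets  []atomic.Pointer[node]
	hashMask uint32 // len(buckets)-1, a power of two
}

func newHotRing() *hotRing {
	n := 1 << defaultTableBits
	return &hotRing{
		buckets:  make([]atomic.Pointer[node], n),
		hashMask: uint32(n - 1),
	}
}

// bucketIndex maps a key to its bucket; masking replaces a modulo.
func (r *hotRing) bucketIndex(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() & r.hashMask
}

func main() {
	r := newHotRing()
	fmt.Println(len(r.buckets), r.bucketIndex("user:42") < 4096) // 4096 true
}
```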
3. Core Operations
| Method | Behaviour | Notes |
|---|---|---|
| `Touch` | Insert or increment the key’s counter. | CAS-splices a new node if missing, then increments (window-aware when enabled). |
| `Frequency` | Read-only counter lookup. | Lock-free; uses sliding-window totals when configured. |
| `TouchAndClamp` | Increment unless `count >= limit`, returning `(count, limited)`. | Throttling follows sliding-window totals so hot bursts clamp quickly. |
| `TopN` | Snapshot of the hottest keys, sorted by count descending. | Walks buckets without locks, then sorts a copy. |
| `KeysAbove` | Return all keys with counters ≥ threshold. | Handy for targeted throttling or debugging hot shards. |
Bucket ordering is preserved by `findOrInsert`, which CASes either the bucket head or the predecessor’s `next` pointer to splice new nodes. Reads never take locks; only per-node sliding-window updates spin briefly to avoid data races.
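A simplified single-bucket version of that CAS splice might look like the following sketch (ordering by key only, whereas the real list orders by `(tag, key)`; this is illustrative, not NoKV’s `findOrInsert`):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type node struct {
	key   string
	count int32
	next  atomic.Pointer[node]
}

type bucket struct{ head atomic.Pointer[node] }

// touch finds key's node or CAS-splices a new one in sorted position,
// then atomically increments its counter.
func (b *bucket) touch(key string) int32 {
	for {
		var prev *node
		cur := b.head.Load()
		for cur != nil && cur.key < key {
			prev, cur = cur, cur.next.Load()
		}
		if cur != nil && cur.key == key {
			return atomic.AddInt32(&cur.count, 1)
		}
		n := &node{key: key}
		n.next.Store(cur)
		if prev == nil {
			if b.head.CompareAndSwap(cur, n) { // splice at bucket head
				return atomic.AddInt32(&n.count, 1)
			}
		} else if prev.next.CompareAndSwap(cur, n) { // splice mid-list
			return atomic.AddInt32(&n.count, 1)
		}
		// Lost a CAS race with a concurrent writer; rescan from the head.
	}
}

func main() {
	var b bucket
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); b.touch("hot") }()
	}
	wg.Wait()
	fmt.Println(b.touch("hot")) // 101
}
```

Because nodes are never deleted, the retry loop only has to cope with lost CAS races, not ABA hazards.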
4. Integration Points
- DB reads – `Txn.Get` and iterators call `db.recordRead`, which in turn invokes `HotRing.Touch` for every successful lookup. Writes touch the ring only when `Options.WriteHotKeyLimit` is set, so throttling can clamp abusive keys.
- Stats – `StatsSnapshot` copies `hot.TopN` into `HotKeys`; `expvar` publishes the same view under `NoKV.Stats.HotKeys` for automation.
- Caching – `lsm/cache` can promote blocks referenced by frequently touched keys, keeping the hot tier warm.
5. Comparisons
| Engine | Approach |
|---|---|
| RocksDB | External – `TRACE` / `perf_context` requires manual sampling. |
| Badger | None built-in. |
| NoKV | In-process ring with expvar/CLI export and throttling helpers. |
The HotRing emphasises simplicity: lock-free bucket lists with atomic counters (plus optional per-node window tracking), avoiding sketches while staying light enough for hundreds of thousands of hot keys.
6. Operational Tips
- `Options.HotRingTopK` controls how many keys show up in stats (default 16). Increase it when investigating workloads with broad hot sets.
- Combine `TouchAndClamp` with request middleware to detect abusive tenants: when `limited` is true, log the key and latency impact.
- Resetting the ring is as simple as instantiating a new `HotRing` – useful for benchmarks that require clean counters between phases.
For end-to-end examples see `docs/stats.md` and the CLI walkthrough in `docs/cli.md`.
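A toy middleware sketch of the throttling tip above, using a map-based stand-in for the ring (the `(count, limited)` contract matches the description, but this is not NoKV’s `TouchAndClamp`):

```go
package main

import "fmt"

// ring is a toy, non-concurrent stand-in for a hot-key tracker.
type ring struct{ counts map[string]int32 }

// TouchAndClamp increments key's counter unless it has reached limit,
// returning the count and whether the caller should back off.
func (r *ring) TouchAndClamp(key string, limit int32) (int32, bool) {
	if r.counts[key] >= limit {
		return r.counts[key], true // clamped; do not increment further
	}
	r.counts[key]++
	return r.counts[key], false
}

// handle sketches request middleware that logs and rejects hot tenants.
func handle(r *ring, tenantKey string) string {
	const limit = 3 // illustrative per-tenant cap
	if count, limited := r.TouchAndClamp(tenantKey, limit); limited {
		// In real middleware: log the key, record latency impact,
		// and surface a back-off signal (e.g. HTTP 429) to the caller.
		return fmt.Sprintf("throttled %s at count=%d", tenantKey, count)
	}
	return "ok"
}

func main() {
	r := &ring{counts: map[string]int32{}}
	for i := 0; i < 5; i++ {
		fmt.Println(handle(r, "tenant-a"))
	}
}
```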
7. Write-Path Throttling
`Options.WriteHotKeyLimit` wires HotRing into the write path. When set to a positive integer, every call to `DB.Set*` or transactional `Txn.Set*` invokes `HotRing.TouchAndClamp` with the limit. Once a key (optionally scoped by column family via `cfHotKey`) reaches the limit, the write is rejected with `utils.ErrHotKeyWriteThrottle`. This keeps pathological tenants or hot shards from overwhelming a single Raft group without adding heavyweight rate limiters to the client stack.
Operational hints:
- `StatsSnapshot.HotWriteLimited` and the CLI line `Write.HotKeyThrottled` expose how many writes have been rejected since the process started.
- Applications should surface `utils.ErrHotKeyWriteThrottle` to callers (e.g. HTTP 429) so clients can back off.
- Prefetching continues to run independently – only writes are rejected; reads still register hotness so the cache layer knows what to prefetch.
- Set the limit conservatively (e.g. a few dozen) and pair it with richer `HotRing` analytics (top-K stats, expvar export) to identify outliers before tuning.
8. Time-Based Decay & Sliding Window
HotRing now exposes two complementary controls so “old” hotspots fade away automatically:
- Periodic decay (`Options.HotRingDecayInterval` + `HotRingDecayShift`) – every `interval` the global counters are right-shifted (`count >>= shift`). This keeps `TopN` and stats output focused on recent traffic even if writes stop abruptly.
- Sliding window (`Options.HotRingWindowSlots` + `HotRingWindowSlotDuration`) – per-key windows split time into `slots`, each lasting `slotDuration`. `Touch` only accumulates inside the current slot; once the window slides past, the stale contribution is dropped. `TouchAndClamp` and `Frequency` use the sliding-window total, so write throttling reflects short-term pressure instead of lifetime counts.
Disable either mechanism by setting the interval/durations to zero. Typical starting points:
| Option | Default value | Effect |
|---|---|---|
| `HotRingDecayInterval` | 1s | Halve legacy counters once per second. |
| `HotRingDecayShift` | 1 | Simple divide-by-two decay. |
| `HotRingWindowSlots` | 8 | Keep ~8 slots of recency data. |
| `HotRingWindowSlotDuration` | 250ms | Roughly a 2s window for throttling. |
With both enabled, the decay loop keeps background stats tidy while the sliding window powers precise, short-term throttling logic.