Stats & Observability Pipeline
NoKV exposes runtime health through:
- `StatsSnapshot` (structured in-process snapshot)
- `expvar` (`/debug/vars`)
- `nokv stats` CLI (plain text or JSON)
The implementation lives in `stats.go`, and collection runs continuously once the DB is open.
1. Architecture
```mermaid
flowchart TD
  subgraph COLLECTORS["Collectors"]
    LSM["lsm.* metrics"]
    WAL["wal metrics"]
    VLOG["value log metrics"]
    HOT["hotring"]
    REGION["region metrics"]
    TRANSPORT["grpc transport metrics"]
    REDIS["redis gateway metrics"]
  end
  LSM --> SNAP["Stats.Snapshot()"]
  WAL --> SNAP
  VLOG --> SNAP
  HOT --> SNAP
  REGION --> SNAP
  TRANSPORT --> SNAP
  REDIS --> SNAP
  SNAP --> EXP["Stats.collect -> expvar"]
  SNAP --> CLI["nokv stats"]
```
Two-layer design:
- `metrics` layer: only collects counters, gauges, and snapshots.
- `stats` layer: aggregates cross-module data and exports it.
2. Snapshot Schema
StatsSnapshot is now domain-grouped (not flat):
- `entries`
- `flush.*`
- `compaction.*`
- `value_log.*` (includes `value_log.gc.*`)
- `wal.*`
- `raft.*`
- `write.*`
- `region.*`
- `hot.*`
- `cache.*`
- `lsm.*`
- `transport.*`
- `redis.*`
Representative fields:
- `flush.pending`, `flush.queue_length`, `flush.last_wait_ms`
- `compaction.backlog`, `compaction.max_score`, `compaction.value_weight`
- `value_log.segments`, `value_log.pending_deletes`, `value_log.gc.gc_runs`
- `wal.active_segment`, `wal.segment_count`, `wal.typed_record_ratio`
- `raft.group_count`, `raft.lagging_groups`, `raft.max_lag_segments`
- `write.queue_depth`, `write.avg_request_wait_ms`, `write.hot_key_limited`
- `region.total`, `region.running`, `region.removing`, `region.tombstone`
- `hot.read_keys`, `hot.write_keys`, `hot.read_ring`, `hot.write_ring`
- `cache.block_l0_hit_rate`, `cache.bloom_hit_rate`, `cache.iterator_reused`
- `lsm.levels`, `lsm.value_bytes_total`
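The field names above are dotted paths into the nested snapshot. A consumer that decodes the JSON generically can resolve such a path with a small helper. This is a sketch assuming the snapshot is plain nested JSON objects; `lookup` and `decodeSnapshot` are hypothetical names, not NoKV functions.

```go
package main

import (
	"encoding/json"
	"strings"
)

// lookup resolves a dotted field path such as "value_log.gc.gc_runs"
// against a domain-grouped snapshot decoded into nested maps.
func lookup(snapshot map[string]any, path string) (any, bool) {
	var cur any = snapshot
	for _, part := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false // path goes deeper than the data
		}
		cur, ok = m[part]
		if !ok {
			return nil, false // missing key at this level
		}
	}
	return cur, true
}

// decodeSnapshot parses nested snapshot JSON into maps.
// Note that encoding/json decodes every JSON number as float64 here.
func decodeSnapshot(data []byte) (map[string]any, error) {
	var m map[string]any
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}
```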
3. expvar Export
`Stats.collect` exports a single structured object: `NoKV.Stats`.
All domains (flush, compaction, value_log, wal, raft, write, region, hot, cache, lsm, transport, redis) are nested under this object.
Legacy scalar compatibility keys are removed. Consumers should read fields from NoKV.Stats directly.
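A minimal sketch of an external consumer that reads `NoKV.Stats` directly, as recommended above. It assumes the process serves the standard `expvar` handler over plain HTTP at `http://<host:port>/debug/vars`; `fetchNoKVStats` and `extractNoKVStats` are hypothetical helper names, not part of NoKV.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchNoKVStats downloads /debug/vars from a running process and returns
// only the NoKV.Stats object. addr is host:port, as passed to `nokv stats --expvar`.
func fetchNoKVStats(addr string) (map[string]json.RawMessage, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://" + addr + "/debug/vars")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return extractNoKVStats(body)
}

// extractNoKVStats picks the single NoKV.Stats key out of an expvar payload
// and keeps each domain (flush, wal, ...) as raw JSON for later decoding.
func extractNoKVStats(body []byte) (map[string]json.RawMessage, error) {
	var vars map[string]json.RawMessage
	if err := json.Unmarshal(body, &vars); err != nil {
		return nil, err
	}
	raw, ok := vars["NoKV.Stats"]
	if !ok {
		return nil, fmt.Errorf("NoKV.Stats not found in /debug/vars")
	}
	var domains map[string]json.RawMessage
	if err := json.Unmarshal(raw, &domains); err != nil {
		return nil, err
	}
	return domains, nil
}
```

Keeping each domain as `json.RawMessage` lets a tool decode only the domains it cares about.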
4. CLI & JSON
- `nokv stats --workdir <dir>`: offline snapshot from a local DB
- `nokv stats --expvar <host:port>`: snapshot from a running process via `/debug/vars`
- `nokv stats --json`: machine-readable nested JSON
Example:
```json
{
  "entries": 1048576,
  "flush": {
    "pending": 2,
    "queue_length": 2
  },
  "value_log": {
    "segments": 6,
    "pending_deletes": 1,
    "gc": {
      "gc_runs": 12
    }
  },
  "hot": {
    "read_keys": [
      {"key": "user:123", "count": 42}
    ]
  }
}
```
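For Go tooling that consumes `--json` output, the example above can be decoded into typed structs. The types and the `parseStatsJSON` function below are hypothetical and cover only the fields shown in the example; `encoding/json` ignores keys without a matching struct field, so other domains are simply skipped.

```go
package main

import "encoding/json"

// hotKey mirrors one entry of hot.read_keys in the example output.
type hotKey struct {
	Key   string `json:"key"`
	Count int64  `json:"count"`
}

// statsOutput is a hypothetical partial view of `nokv stats --json`.
type statsOutput struct {
	Entries int64 `json:"entries"`
	Flush   struct {
		Pending     int64 `json:"pending"`
		QueueLength int64 `json:"queue_length"`
	} `json:"flush"`
	ValueLog struct {
		Segments       int64 `json:"segments"`
		PendingDeletes int64 `json:"pending_deletes"`
		GC             struct {
			GCRuns int64 `json:"gc_runs"`
		} `json:"gc"`
	} `json:"value_log"`
	Hot struct {
		ReadKeys []hotKey `json:"read_keys"`
	} `json:"hot"`
}

// parseStatsJSON decodes CLI JSON output; unknown fields are ignored,
// so the struct can stay smaller than the full snapshot.
func parseStatsJSON(data []byte) (statsOutput, error) {
	var out statsOutput
	err := json.Unmarshal(data, &out)
	return out, err
}
```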
5. Operational Guidance
- `flush.queue_length` and `compaction.backlog` both rising: flush/compaction is under-provisioned.
- `value_log.discard_queue` high for long periods: check `value_log.gc.*` and compaction pressure.
- `write.throttle_active=true` frequently: L0 pressure is likely high; inspect `cache.block_l0_hit_rate` and compaction.
- `write.hot_key_limited` increasing: hot-key write throttling is active.
- `raft.lag_warning=true`: at least one group exceeds the lag threshold.
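The guidance above can be turned into a simple check that compares two consecutive samples. This is a sketch only: the flat `signals` struct and the `diagnose` function are hypothetical, "rising" is approximated as "greater than the previous sample", and `write.throttle_active` is checked on a single sample rather than over a window. The `value_log.discard_queue` rule is left out because what counts as "high for long periods" depends on the deployment.

```go
package main

// signals is a hypothetical flat view of the fields the guidance refers to;
// populate it from the nested NoKV.Stats object however your tooling decodes it.
type signals struct {
	FlushQueueLength    int64 // flush.queue_length
	CompactionBacklog   int64 // compaction.backlog
	WriteThrottleActive bool  // write.throttle_active
	WriteHotKeyLimited  int64 // write.hot_key_limited
	RaftLagWarning      bool  // raft.lag_warning
}

// diagnose compares two consecutive samples and returns the warnings
// suggested by the operational guidance, in a fixed order.
func diagnose(prev, cur signals) []string {
	var out []string
	if cur.FlushQueueLength > prev.FlushQueueLength && cur.CompactionBacklog > prev.CompactionBacklog {
		out = append(out, "flush/compaction under-provisioned")
	}
	if cur.WriteThrottleActive {
		out = append(out, "write throttling active: check L0 pressure and compaction")
	}
	if cur.WriteHotKeyLimited > prev.WriteHotKeyLimited {
		out = append(out, "hot-key write throttling active")
	}
	if cur.RaftLagWarning {
		out = append(out, "raft group lagging beyond threshold")
	}
	return out
}
```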
6. Comparison
| Engine | Built-in observability |
|---|---|
| RocksDB | Rich metrics/perf context, often needs additional tooling/parsing |
| Badger | Optional metrics integrations |
| NoKV | Native expvar + structured snapshot + CLI with offline/online modes |