Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Stats & Observability Pipeline

NoKV exposes runtime health through:

  • StatsSnapshot (structured in-process snapshot)
  • expvar (/debug/vars)
  • nokv stats CLI (plain text or JSON)

The implementation lives in stats.go, and collection runs continuously once DB is open.


1. Architecture

flowchart TD
    subgraph COLLECTORS["Collectors"]
        LSM["lsm.* metrics"]
        WAL["wal metrics"]
        VLOG["value log metrics"]
        HOT["hotring"]
        REGION["region metrics"]
        TRANSPORT["grpc transport metrics"]
        REDIS["redis gateway metrics"]
    end
    LSM --> SNAP["Stats.Snapshot()"]
    WAL --> SNAP
    VLOG --> SNAP
    HOT --> SNAP
    REGION --> SNAP
    TRANSPORT --> SNAP
    REDIS --> SNAP
    SNAP --> EXP["Stats.collect -> expvar"]
    SNAP --> CLI["nokv stats"]

Two-layer design:

  • metrics layer: only collects counters/gauges/snapshots.
  • stats layer: aggregates cross-module data and exports.

2. Snapshot Schema

StatsSnapshot is now domain-grouped (not flat):

  • entries
  • flush.*
  • compaction.*
  • value_log.* (includes value_log.gc.*)
  • wal.*
  • raft.*
  • write.*
  • region.*
  • hot.*
  • cache.*
  • lsm.*
  • transport.*
  • redis.*

Representative fields:

  • flush.pending, flush.queue_length, flush.last_wait_ms
  • compaction.backlog, compaction.max_score, compaction.value_weight
  • value_log.segments, value_log.pending_deletes, value_log.gc.gc_runs
  • wal.active_segment, wal.segment_count, wal.typed_record_ratio
  • raft.group_count, raft.lagging_groups, raft.max_lag_segments
  • write.queue_depth, write.avg_request_wait_ms, write.hot_key_limited
  • region.total, region.running, region.removing, region.tombstone
  • hot.read_keys, hot.write_keys, hot.read_ring, hot.write_ring
  • cache.block_l0_hit_rate, cache.bloom_hit_rate, cache.iterator_reused
  • lsm.levels, lsm.value_bytes_total

3. expvar Export

Stats.collect exports a single structured object:

  • NoKV.Stats

All domains (flush, compaction, value_log, wal, raft, write, region, hot, cache, lsm, transport, redis) are nested under this object.

Legacy scalar compatibility keys are removed. Consumers should read fields from NoKV.Stats directly.


4. CLI & JSON

  • nokv stats --workdir <dir>: offline snapshot from local DB
  • nokv stats --expvar <host:port>: snapshot from running process /debug/vars
  • nokv stats --json: machine-readable nested JSON

Example:

{
  "entries": 1048576,
  "flush": {
    "pending": 2,
    "queue_length": 2
  },
  "value_log": {
    "segments": 6,
    "pending_deletes": 1,
    "gc": {
      "gc_runs": 12
    }
  },
  "hot": {
    "read_keys": [
      {"key": "user:123", "count": 42}
    ]
  }
}

5. Operational Guidance

  • flush.queue_length + compaction.backlog both rising: flush/compaction under-provisioned.
  • value_log.discard_queue high for long periods: check value_log.gc.* and compaction pressure.
  • write.throttle_active=true frequently: L0 pressure likely high; inspect cache.block_l0_hit_rate and compaction.
  • write.hot_key_limited increasing: hot key write throttling is active.
  • raft.lag_warning=true: at least one group exceeds lag threshold.

6. Comparison

EngineBuilt-in observability
RocksDBRich metrics/perf context, often needs additional tooling/parsing
BadgerOptional metrics integrations
NoKVNative expvar + structured snapshot + CLI with offline/online modes