NoKV — Documentation

NoKV

An open-source namespace metadata substrate for distributed filesystems, object storage, and AI dataset metadata.

Native fsmeta primitives · Own LSM · Own Raft · Own MVCC · Own control plane

NoKV is the open-source counterpart of the “stateless schema layer + transactional KV” pattern that powers Meta Tectonic (over ZippyDB), Google Colossus (over Bigtable), and DeepSeek 3FS (over FoundationDB). The headline service is fsmeta, a namespace metadata API for distributed filesystems / object storage / AI dataset metadata.

The interesting part isn’t the feature list. The interesting part is that layer separation is enforced in code: the fsmeta executor consumes a narrow TxnRunner; the default OpenWithRaftstore adapter owns raftstore wiring; meta/root keeps only lifecycle / authority truth; the storage engine never learns that a namespace exists.

This site is the technical docs hub. For the project landing page, headline benchmarks, and the Why NoKV vs X? matrix, see the root README.


🧭 Three Audiences, One Substrate

| | DFS frontend | Object storage namespace | AI dataset metadata |
|---|---|---|---|
| Consumer shape | FUSE / NFS / SMB driver | S3-compatible HTTP gateway | training pipeline / scheduler |
| fsmeta primitives used | ReadDirPlus, WatchSubtree, SnapshotSubtree, RenameSubtree | ReadDirPlus for LIST, WatchSubtree for bucket events, SnapshotSubtree for versions, RenameSubtree for prefix moves | SnapshotSubtree for dataset versions, WatchSubtree for checkpoint notification, ReadDirPlus for batch metadata fetch |
| Comparable industrial pattern | Tectonic / Colossus / 3FS / HopsFS | Tectonic / Colossus over object layer | Mooncake / Quiver / 3FS dataset layer |

All three consume the same rooted truth in meta/root and the same native primitives in fsmeta — schema is not specialized to any single consumer.

Deep dive: fsmeta positioning · namespace authority events umbrella


📑 If You Read Only Three Pages

Start here:

  1. fsmeta.md — namespace metadata service (the headline). Primitives, lifecycle authority, deployment.
  2. architecture.md — three-layer architecture. Where each module lives, what each layer is allowed to know.
  3. control_and_execution_protocols.md — the contract between control plane (coordinator/), execution plane (raftstore/), and rooted truth (meta/root/).

For the authority schema behind those primitives, read notes/2026-04-25-namespace-authority-events-umbrella.md.


🗺️ Read By Interest

🗂️ Namespace metadata service (fsmeta) — the primary product

| Topic | Doc |
|---|---|
| Complete reference (primitives + lifecycle + deployment) | fsmeta.md |
| Positioning v5 (DFS / OSS / AI three-audience) | notes/2026-04-24-fsmeta-positioning.md |
| Namespace authority events umbrella (Mount / SubtreeAuthority / SnapshotEpoch / QuotaFence schema) | notes/2026-04-25-namespace-authority-events-umbrella.md |
| Snapshot subtree MVCC epoch | notes/2026-04-25-snapshot-subtree-mvcc-epoch.md |
| Benchmark results | fsmeta.md · benchmark/fsmeta/results/ |

🔬 Correctness models

| Topic | Location |
|---|---|
| TLA+ / TLC models for control-plane and metadata transition safety | spec/ · spec/README.md |
| Checked artifacts | spec/artifacts/ |

🏛️ Distributed runtime — the layer below fsmeta

| Topic | Doc |
|---|---|
| Rooted truth kernel (meta/root) | rooted_truth.md |
| Coordinator (route / TSO / heartbeats / WatchRootEvents stream) | coordinator.md |
| Coordinator ↔ meta/root deployment separation | notes/2026-04-12-coordinator-meta-separation.md |
| Coordinator-driven store registry and rooted membership | coordinator.md · rooted_truth.md |
| Raftstore overview (store / peer / admin) | raftstore.md |
| Control-plane ↔ execution-plane contract | control_and_execution_protocols.md |
| Standalone → distributed migration | migration.md |
| Recovery model | recovery.md |
| Percolator MVCC 2PC + AssertionNotExist | percolator.md |
| Runtime call chains (sequence diagrams) | runtime.md |

🔧 Storage engine internals — the foundation

The single-node substrate that everything sits on. Independently usable as an embedded Go LSM + Raft library.

| Topic | Doc |
|---|---|
| High-level architecture | architecture.md |
| WAL discipline and replay | wal.md |
| MemTable + ART/SkipList (ART pinned for fsmeta) | memtable.md |
| Flush pipeline | flush.md |
| Leveled compaction + landing buffer | compaction.md · landing_buffer.md |
| Value log (KV separation + GC) | vlog.md |
| Manifest semantics | manifest.md |
| Range filter | range_filter.md |
| Block / row cache | cache.md |
| VFS abstraction + FaultFS | vfs.md · file.md |
| Hot-key observer (Thermos) | thermos.md |
| Entry / error model | entry.md · errors.md |

🛠️ Operations and tooling

| Topic | Doc |
|---|---|
| CLI reference (nokv: stats / manifest / regions / mount / quota / migrate) | cli.md |
| nokv-fsmeta standalone gRPC gateway | fsmeta.md |
| Configuration (one JSON file shared by all binaries) | config.md |
| Cluster demo | demo.md |
| Scripts layout | scripts.md |
| Stats / expvar / metrics (4 namespaces: executor, watch, quota, mount) | stats.md |
| Testing strategy (failpoints, chaos, restart, migration) | testing.md |

📒 Notable design decision records

All notes under notes/ are dated decision records — they explain the why, not just the what.


🏗️ Architecture at a Glance

[Figure: NoKV architecture diagram]

Layer 1  fsmeta            ← namespace primitives (Create / ReadDirPlus / WatchSubtree / RenameSubtree / SnapshotSubtree / Link / Unlink with link-count GC)
   │
Layer 2  meta/root         ← rooted authority truth (Mount / SubtreeAuthority / SnapshotEpoch / QuotaFence)
         coordinator       ← routing, TSO, store discovery, root-event publish + WatchRootEvents stream
         raftstore         ← per-region Raft + apply observer
         percolator        ← 2PC + MVCC + AssertionNotExist + commit-ts retry
   │
Layer 3  engine            ← LSM + ART memtable + WAL + value log (with per-CF/prefix value separation policy: fsm\x00 → AlwaysInline)
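The prefix-based value separation rule in Layer 3 can be pictured with a small sketch. Only the fsm\x00 prefix and the AlwaysInline policy name come from the diagram above. The SeparationPolicy type, the SizeThreshold fallback, and the 1 KiB limit are assumptions made for the example, not the engine's real configuration surface.

```go
package main

import (
	"bytes"
	"fmt"
)

// SeparationPolicy says where a value is stored relative to its key.
// SizeThreshold is a hypothetical default for keys without a matching rule.
type SeparationPolicy int

const (
	SizeThreshold SeparationPolicy = iota // large values go to the value log
	AlwaysInline                          // value stays in the LSM next to its key
)

type prefixRule struct {
	prefix []byte
	policy SeparationPolicy
}

// rules is checked in order; the first matching prefix wins.
var rules = []prefixRule{
	{prefix: []byte("fsm\x00"), policy: AlwaysInline},
}

// inlineLimit is an assumed threshold for the fallback policy.
const inlineLimit = 1024

// policyFor returns the policy for a key based on its prefix.
func policyFor(key []byte) SeparationPolicy {
	for _, r := range rules {
		if bytes.HasPrefix(key, r.prefix) {
			return r.policy
		}
	}
	return SizeThreshold
}

// storeInline reports whether the value would be kept inside the LSM.
func storeInline(key, value []byte) bool {
	if policyFor(key) == AlwaysInline {
		return true
	}
	return len(value) <= inlineLimit
}

func main() {
	big := make([]byte, 4096)
	fmt.Println(storeInline([]byte("fsm\x00dentry/1/a"), big)) // metadata key: inline regardless of size
	fmt.Println(storeInline([]byte("user/blob"), big))         // other key: separated when large
}
```

Keeping metadata values inline means a read of an fsmeta key does not need a second lookup in the value log, which is presumably why the namespace prefix is pinned to this policy.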

Four boundaries enforced in code:

  1. fsmeta-first API. Metadata operations expose filesystem/object-namespace shapes directly, instead of forcing users to assemble them from raw KV calls.
  2. Layer separation enforced. The fsmeta executor consumes a narrow TxnRunner; the default runtime adapter owns raftstore wiring; lower layers do not import fsmeta.
  3. Multi-gateway-safe. Quota fences are rooted truth; usage counters are data-plane keys updated in the same Percolator transaction as metadata mutations. Subtree handoff uses rooted events plus runtime repair.
  4. Root-event-driven lifecycle. After bootstrap, coordinator.WatchRootEvents pushes mount-retire, quota-fence, and pending-handoff changes; the monitor interval serves only as the reconnect backoff.

⚡ Quick Start

Bring up a full cluster + register a mount + use fsmeta

# 1. Build binaries
make build

# 2. Launch full cluster: meta-root + coordinator + 3 stores + fsmeta gateway
./scripts/dev/cluster.sh --config ./raft_config.example.json
# (Or: docker compose up -d  — includes mount-init bootstrap)

# 3. Register a mount (rooted authority)
nokv mount register --coordinator-addr 127.0.0.1:2379 \
  --mount default --root-inode 1 --schema-version 1

# 4. (Optional) Set a quota fence
nokv quota set --coordinator-addr 127.0.0.1:2379 \
  --mount default --limit-bytes 10737418240 --limit-inodes 10000000

# 5. Use fsmeta from any gRPC client (Go typed client at fsmeta/client/)
#    or embedded Go: see fsmetaexec.OpenWithRaftstore in the root README

# 6. Inspect runtime state
curl http://127.0.0.1:9101/debug/vars | jq '.nokv_fsmeta_executor, .nokv_fsmeta_watch, .nokv_fsmeta_quota, .nokv_fsmeta_mount'
nokv stats --workdir ./artifacts/cluster/store-1

Full walkthrough: getting_started.md · CLI reference: cli.md
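The curl and jq call in step 6 can also be done from Go. The sketch below assumes the gateway serves the standard expvar JSON document at the address used above. The four namespace names are taken from that command, while the helper names fetchFsmetaVars and pickNamespaces are made up for this example, and the fields inside each namespace are not assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// fsmetaNamespaces lists the expvar keys shown in the Quick Start.
var fsmetaNamespaces = []string{
	"nokv_fsmeta_executor",
	"nokv_fsmeta_watch",
	"nokv_fsmeta_quota",
	"nokv_fsmeta_mount",
}

// pickNamespaces extracts the fsmeta namespaces from an expvar JSON document.
// Values are kept as raw JSON because their inner fields are not assumed here.
func pickNamespaces(body []byte) (map[string]json.RawMessage, error) {
	var all map[string]json.RawMessage
	if err := json.Unmarshal(body, &all); err != nil {
		return nil, err
	}
	out := make(map[string]json.RawMessage)
	for _, ns := range fsmetaNamespaces {
		if raw, ok := all[ns]; ok {
			out[ns] = raw
		}
	}
	return out, nil
}

// fetchFsmetaVars downloads /debug/vars and filters it.
func fetchFsmetaVars(client *http.Client, url string) (map[string]json.RawMessage, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return pickNamespaces(body)
}

func main() {
	vars, err := fetchFsmetaVars(http.DefaultClient, "http://127.0.0.1:9101/debug/vars")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		return
	}
	for _, ns := range fsmetaNamespaces {
		fmt.Printf("%s = %s\n", ns, vars[ns])
	}
}
```

Run it against a cluster started in step 2; a namespace that the gateway has not published is simply printed with an empty value.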


🔗 Jump Points

| Page | Summary |
|---|---|
| fsmeta service | The headline product: namespace metadata API |
| Formal specs | TLA+ / TLC models for transition safety |
| CLI surface | nokv: stats, manifest, regions, mount, quota, migrate |
| Topology config | One JSON file shared by scripts, Docker, all CLI |
| Coordinator | Route / TSO / heartbeat / root-event subscribe |
| Rooted truth | meta/root typed event log |
| Percolator / MVCC | 2PC primitives in distributed mode |
| Runtime call chains | Function-level sequence diagrams |
| Testing | Failpoints, chaos, restart, migration |
| SUMMARY.md | Full mdbook table of contents |

Open-source namespace metadata substrate for DFS, OSS, and AI dataset metadata.
Built from scratch — no external storage engine, no external Raft library, no external coordinator.