Error Handling Guide
This document defines how NoKV should own, define, and propagate errors.
1. Ownership Rules
- Domain errors stay in domain packages.
- Cross-cutting runtime errors may live in `utils` only when shared by multiple subsystems.
- Command-local/business-flow errors should be unexported (`errXxx`) and stay in command/service packages.
Examples:
- `kv`: entry codec/read-decode errors.
- `vfs`: filesystem contract errors.
- `pd/core`: control-plane validation/conflict errors.
2. Propagation Rules
- Wrap with `%w` when crossing package boundaries.
- Match via `errors.Is`, not string compare.
- Keep stable sentinel values for retryable / control-flow decisions.
- Add context in upper layers; do not lose original cause.
3. Naming Rules
- Exported sentinels use `ErrXxx`.
- Error text should be lowercase and package-scoped when useful (for example `pd/core: ...`, `vfs: ...`).
- Avoid duplicate sentinels with identical semantics in different packages.
4. Current Error Map
Shared runtime sentinels
utils/error.go: common cross-package sentinels such as invalid request, key/value validation errors, throttling, and lifecycle guards.
Domain-specific sentinels
- `kv/entry_codec.go`: `ErrBadChecksum`, `ErrPartialEntry`
- `vfs/vfs.go`: `ErrRenameNoReplaceUnsupported`
- `lsm/compact/errors.go`: compaction planner/runtime domain errors
- `raftstore/peer/errors.go`: peer lifecycle/state errors
- `pb/errorpb.proto`: region/store routing protobuf errors (`RegionError`, `StoreNotMatch`, `RegionNotFound`, `KeyNotInRegion`, …)
- `wal/errors.go`: WAL encode/decode and segment errors
- `pd/core/errors.go`: PD metadata and range validation errors
5. Propagation in Hot Paths
- Embedded write path (`DB.Set*` -> commit worker -> LSM/WAL):
  - validation returns direct sentinel (`ErrEmptyKey`, `ErrNilValue`, `ErrInvalidRequest`);
  - storage boundary errors are wrapped with context and preserved via `%w`.
- Distributed command path (`kv.Service` -> `Store.*Command` -> `kv.Apply`):
  - region/leader/store/range failures are mapped to `errorpb` messages in protobuf responses;
  - execution failures return Go errors to RPC layer and are translated to gRPC status.
- Recovery/replay path (WAL/Vlog/Manifest):
  - partial/corrupt records return domain sentinels and are handled by truncation or restart logic in upper layers.
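
As a rough illustration of the recovery/replay rule, the sketch below shows an upper layer turning a partial-record sentinel into a truncation point instead of a failure. It is not NoKV's replay code: the one-byte length-prefixed record format, `decodeRecord`, and `replay` are assumptions made for the example, and only the name `ErrPartialEntry` comes from the error map (its message text is assumed).

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPartialEntry mirrors the sentinel named in the error map; the message
// text is an assumption.
var ErrPartialEntry = errors.New("kv: partial entry")

// decodeRecord is a hypothetical decoder for a toy format: one length byte
// followed by that many payload bytes. A short buffer means a torn write.
// Callers must pass a non-empty buffer.
func decodeRecord(buf []byte) (payload []byte, n int, err error) {
	size := int(buf[0])
	if len(buf) < 1+size {
		return nil, 0, fmt.Errorf("decode record: %w", ErrPartialEntry)
	}
	return buf[1 : 1+size], 1 + size, nil
}

// replay returns the recovered payloads and the offset up to which the log
// is valid, so the caller can truncate everything after it.
func replay(log []byte) (records [][]byte, validLen int, err error) {
	for validLen < len(log) {
		payload, n, derr := decodeRecord(log[validLen:])
		if errors.Is(derr, ErrPartialEntry) {
			// Torn tail: stop cleanly and let the caller truncate at validLen.
			return records, validLen, nil
		}
		if derr != nil {
			// Any other error is treated as fatal for this replay.
			return records, validLen, fmt.Errorf("replay at offset %d: %w", validLen, derr)
		}
		records = append(records, payload)
		validLen += n
	}
	return records, validLen, nil
}

func main() {
	log := []byte{2, 'a', 'b', 3, 'c'} // the second record is cut short
	records, validLen, err := replay(log)
	fmt.Println(len(records), validLen, err)
}
```

The decoder only reports what it saw; the decision to truncate lives in the caller, which matches the split between domain sentinels and upper-layer handling described above.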