Control-Plane and Execution-Plane Protocols

This note defines NoKV’s protocol line for the control plane, the execution plane, and the next-stage evolution around them.

The current implementation is intentionally minimal, but it is no longer just a design sketch:

control-plane protocol v1 is implemented and exposed through Coordinator RPCs plus meta/root storage semantics
execution-plane protocol v1 is implemented as a store-local contract with a small admin diagnostics API surface

The point of this document is to keep those two lines coordinated instead of letting them drift into separate, implicit rule sets.

The control plane focuses on the contract between:

meta/root
coordinator

The execution plane focuses on the contract between:

coordinator
raftstore
local durable state (raftstore/localmeta, raft log, restart replay)

The purpose of this document is not to replace Raft or redesign the data plane. The purpose is to make NoKV’s existing cross-plane behavior explicit, testable, and evolvable.

The control plane is protocolized around four ideas:

Freshness
CatchUp
Transition
DegradedMode

These four ideas already existed in partial form inside the implementation. The current work turns them into a stable vocabulary, explicit invariants, and a clear rollout line.

The execution plane is protocolized around four matching ideas:

Admission
ExecutionTarget
PublishBoundary
RestartState

0. Current Status

The control plane now has a minimal implemented v1.

Implemented and exposed through pb/coordinator/coordinator.proto, coordinator/server, coordinator/rootview, and tests:

route-read Freshness
RootToken
root_lag
DegradedMode
CatchUpState
TransitionID
PublishRootEventResponse.assessment as a pre-persist lifecycle assessment

This means the protocol is no longer only a design direction. It is already the formal serving contract for key Coordinator APIs.

Not implemented in v1:

richer transition phases such as Published / Stalled
a fuller catch-up action surface exposed through API
automatic recovery policy derived from protocol state
broad client-side policy that consumes every protocol field

So the right description today is:

control-plane protocol v1 is implemented and in use, while richer scheduler/runtime policy is not implemented in v1.

The execution plane is in a different state.

Today, raftstore has a minimal implemented v1 inside raftstore/store.

Already implemented and exercised through store-local types, raftstore/admin, runtime state, and tests:

explicit Admission classes and reasons on read / write / topology entry points
explicit topology ExecutionOutcome
explicit topology PublishState
explicit RestartState derived from raftstore/localmeta + raft replay pointers
terminal publish failures retained as visible retry state instead of silent drop
admin diagnostics exposure through pb/admin/admin.proto ExecutionStatus

Not implemented as first-class execution protocol fields yet:

request validation and routing
context propagation
detailed local leader admission diagnostics
detailed per-attempt scheduler retry/backoff policy
metrics for planned truth -> execute -> terminal truth latency
richer degraded local scheduler states

The current landing is still mostly store-local and spread across:

raftstore/store
raftstore/peer
raftstore/raftlog
raftstore/localmeta

So the right description there is:

execution-plane protocol v1 now exists as a minimal named runtime contract, with store-local state and admin-visible diagnostics, while broader metrics, policy, and richer executor states are not implemented in v1.

1. Intent

NoKV already has the right building blocks:

rooted truth events
checkpoint + committed tail
watch-first tail subscription
rebuildable coordinator/catalog
explicit planned and terminal topology events

Before v1, these pieces mostly existed as implementation mechanics. The control plane now has a formal minimum contract, while several policy extensions remain intentionally outside v1:

when a follower read is fresh enough
when a follower must reload
when retained tail catch-up is no longer enough
what phase a topology change is in
what “degraded” actually means to callers

The design goal is to keep turning these implicit behaviors into a formal protocol.

That protocol should be:

small
explicit
observable
testable
compatible with the current architecture

2. Scope

This document covers both planes, but not at the same implementation depth.

For the control plane, it defines the behavior of:

rooted truth consumption
control-plane view freshness
rooted catch-up progression
topology transition lifecycle
degraded operating modes

For the execution plane, it defines the protocol direction for:

request admission
transition execution
terminal truth publication
restart and local recovery alignment
degraded local behavior around scheduler / queue / publish boundaries

It does not redefine:

Raft replication
Percolator / 2PC transaction semantics
store-local recovery metadata
storage-engine internals unrelated to distributed lifecycle

This document should be read as two linked contracts:

control plane = durable truth + materialized view + serving contract

execution plane = admitted work + local execution + publish/restart contract

3. Protocol Objects

The naming set should remain compact, stable, and precise.

3.1 RootToken

RootToken is the rooted truth position already incorporated by some materialized view.

It is the control-plane equivalent of:

“what truth have I already consumed?”

It should be treated as:

monotonic
comparable
portable across control-plane nodes

RootToken is not just an internal storage cursor. It is the anchor for:

freshness
catch-up state
read eligibility
transition causality
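
The "monotonic, comparable, portable" properties can be illustrated with a minimal sketch. The two-field layout (`Term`, `Index`) and the `Compare` / `Satisfies` helpers are assumptions made for illustration; this note does not specify the actual RootToken encoding.

```go
package main

import "fmt"

// RootToken is a hypothetical shape for a rooted truth position.
// Both fields are illustrative assumptions, not NoKV's real layout.
type RootToken struct {
	Term  uint64 // assumed: era or term of the rooted log
	Index uint64 // assumed: position inside that term
}

// Compare returns -1, 0, or 1 when a is behind, equal to, or ahead of b.
func (a RootToken) Compare(b RootToken) int {
	switch {
	case a.Term != b.Term:
		if a.Term < b.Term {
			return -1
		}
		return 1
	case a.Index != b.Index:
		if a.Index < b.Index {
			return -1
		}
		return 1
	default:
		return 0
	}
}

// Satisfies reports whether a view at token a has already consumed
// at least the truth identified by required.
func (a RootToken) Satisfies(required RootToken) bool {
	return a.Compare(required) >= 0
}

func main() {
	served := RootToken{Term: 3, Index: 41}
	required := RootToken{Term: 3, Index: 40}
	fmt.Println(served.Satisfies(required)) // true
}
```

Because the comparison depends only on the token value, any control-plane node could evaluate it, which is what "portable" asks for.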

3.2 Freshness

Freshness is the serving contract attached to a read.

It answers:

how fresh did the caller ask for?
how fresh was the returned answer?

3.3 CatchUpState

CatchUpState describes how far one Coordinator node has converged on rooted truth.

It answers:

can this node serve route reads?
can it satisfy bounded-freshness reads?
must it reload?
must it install bootstrap?

3.4 Transition

Transition is one rooted topology change that moves through a formal lifecycle.

Examples:

peer addition
peer removal
region split
region merge
region tombstone

Transition is not just a single event. It is a causally tracked change with:

identity
source truth position
phase
progress

3.5 DegradedMode

DegradedMode is the externally visible restriction level of the control plane.

It answers:

what kind of reads may still be served?
are rooted writes currently allowed?
should clients retry elsewhere?
is the node usable only as a stale view?

4. Naming Set

The protocol should use one stable vocabulary across:

API
code
logs
metrics
tests
docs

4.1 Read classes

Strong
- requires leader-grade freshness
Bounded
- allows follower service within explicit lag limits
BestEffort
- allows stale cache service

These names are short and carry clear serving intent.

4.2 Catch-up actions

Reload
- rebuild catalog from rooted storage
Advance
- acknowledge rooted tail progress without a full rebuild
Bootstrap
- install a fresh checkpoint because retained tail is insufficient
Reject
- deny freshness-sensitive reads until convergence improves

4.3 Catch-up states

Fresh
Lagging
BootstrapRequired
Recovering
Unavailable

4.4 Degraded modes

Healthy
CoordinatorDegraded
RootLagging
RootUnavailable
ViewOnly

ViewOnly is deliberately chosen over vaguer names such as ExecutionOnly. This section defines only control-plane behavior, so the right question is:

can this node still expose a stale view?

5. Freshness Contract

The control plane should stop treating all successful reads as equivalent.

Every control-plane read should:

declare the requested freshness class
optionally declare a rooted lower bound
receive an explicit served freshness result

5.1 Why this matters

Today, follower reads are effectively:

“good enough if the follower recently reloaded and is not too far behind”

That is practical, but not a protocol.

Without a formal freshness contract:

clients cannot reason about route read quality
tests cannot assert serving guarantees precisely
degraded modes remain guesswork
control-plane correctness is partly hidden in implementation details

5.2 Request fields

Control-plane read RPCs should be able to express:

freshness
- Strong, Bounded, or BestEffort
required_root_token
- optional lower bound on rooted truth already incorporated
max_root_lag
- optional bound on acceptable rooted lag

Not every caller will need all three fields. But the protocol should have room for them.

5.3 Response fields

Control-plane read RPCs should return:

served_root_token
served_freshness
served_by_leader
degraded_mode

Optional fields (root_lag is already returned in v1, see 8.4; freshness_reason remains future work):

root_lag
freshness_reason
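
The request and response fields from 5.2 and 5.3 could be modeled roughly as follows. This is a sketch under stated assumptions: the struct names, the use of pointers for optional fields, the flattening of the token to a `uint64`, and the client-side `Honors` check are all hypothetical and are not taken from pb/coordinator/coordinator.proto.

```go
package main

import (
	"errors"
	"fmt"
)

// Freshness classes, ordered from strongest to weakest.
type Freshness int

const (
	Strong Freshness = iota
	Bounded
	BestEffort
)

// RouteReadRequest mirrors the request fields in 5.2.
// Pointer fields model "optional"; the token is flattened to uint64 for brevity.
type RouteReadRequest struct {
	Freshness         Freshness
	RequiredRootToken *uint64 // optional lower bound on incorporated truth
	MaxRootLag        *uint64 // optional bound on acceptable rooted lag
}

// RouteReadResponse mirrors the response fields in 5.3.
type RouteReadResponse struct {
	ServedRootToken uint64
	ServedFreshness Freshness
	ServedByLeader  bool
	DegradedMode    string
}

// Honors is a hypothetical client-side check that a reply did not
// weaken the class or the lower bound the caller asked for.
func Honors(req RouteReadRequest, resp RouteReadResponse) error {
	if resp.ServedFreshness > req.Freshness {
		return errors.New("served freshness is weaker than requested")
	}
	if req.RequiredRootToken != nil && resp.ServedRootToken < *req.RequiredRootToken {
		return errors.New("served root token is behind the required lower bound")
	}
	return nil
}

func main() {
	need := uint64(100)
	req := RouteReadRequest{Freshness: Bounded, RequiredRootToken: &need}
	resp := RouteReadResponse{ServedRootToken: 120, ServedFreshness: Bounded, DegradedMode: "Healthy"}
	fmt.Println(Honors(req, resp) == nil) // true
}
```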

5.4 Serving rules

`Strong`

Should be served only when:

the node is rooted leader
and the serving catalog has incorporated at least the requested RootToken

If this is not true, the server should reject rather than silently downgrade.

`Bounded`

May be served by a follower when:

the node is not in BootstrapRequired, Recovering, or Unavailable
and lag is within declared bounds
and the served token satisfies required_root_token if one was requested

If bounds cannot be satisfied, the server should reject rather than silently serve stale data.

`BestEffort`

May be served from the current materialized catalog so long as:

the catalog exists
the node is not fully unavailable

This class exists to make stale service explicit instead of accidental.

5.5 First rollout target

The first RPC to adopt this contract is:

GetRegionByKey

That gave the system a clear, high-value place to prove the model before wider rollout.

6. Rooted Catch-Up Protocol

NoKV already has a good catch-up foundation:

checkpoint
committed tail
watch-first subscription
bootstrap install when retained tail is insufficient

The next step is to give that behavior a formal state machine.

6.1 Catch-up state definitions

`Fresh`

The node’s materialized catalog is sufficiently close to rooted truth to serve:

Bounded
BestEffort

and, if leader, possibly Strong.

`Lagging`

The node is behind, but still within retained-tail recovery range.

This means:

further rooted tail observation may repair the gap
bootstrap install is not yet mandatory
some bounded reads may need to be rejected

`BootstrapRequired`

The node is too far behind for retained tail replay.

This means:

a plain reload from retained tail is not sufficient
a new checkpoint/bootstrap install is required
freshness-sensitive reads should be rejected

`Recovering`

The node is actively rebuilding its materialized control-plane view.

This means:

catalog may be in transition
only explicitly allowed stale reads may be served

`Unavailable`

The node cannot presently produce a valid control-plane view.

This means:

no rooted freshness contract can be satisfied
the server should fail all reads, with a possible exception for explicit diagnostics added in the future

6.2 Catch-up actions

`Reload`

Used when rooted truth advanced in a way that requires rebuilding the materialized catalog.

`Advance`

Used when rooted tail progressed, but the catalog does not need a full rebuild.

`Bootstrap`

Used when the node must install a checkpoint because retained tail can no longer bridge the gap.

`Reject`

Used when the node should refuse freshness-sensitive serving until it converges further.

6.3 Protocol outputs

The rooted subscription path should eventually expose a structured result like:

root_token_before
root_token_after
catch_up_state
catch_up_action
reload_required
bootstrap_required
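
One way to picture such a structured result is a single catch-up step that classifies the gap between the catalog and rooted truth. The sketch below is illustrative only: `Step`, its parameters, and the flattening of tokens to `uint64` are assumptions, and it covers just the Advance, Reload, and Bootstrap actions (Reject and the Lagging, Recovering, and Unavailable states are left out).

```go
package main

import "fmt"

// CatchUpResult carries the fields listed in 6.3.
type CatchUpResult struct {
	RootTokenBefore   uint64
	RootTokenAfter    uint64
	CatchUpState      string
	CatchUpAction     string
	ReloadRequired    bool
	BootstrapRequired bool
}

// Step classifies one catch-up round (hypothetical helper).
//   catalogToken:   token already incorporated by the local catalog
//   rootToken:      latest rooted token observed on the tail
//   oldestRetained: oldest catalog position from which the retained tail
//                   can still be replayed (assumed definition)
//   needsRebuild:   whether the observed events require a full catalog rebuild
func Step(catalogToken, rootToken, oldestRetained uint64, needsRebuild bool) CatchUpResult {
	res := CatchUpResult{RootTokenBefore: catalogToken, RootTokenAfter: catalogToken}
	if catalogToken < oldestRetained {
		// The retained tail no longer reaches back to the catalog's position.
		res.CatchUpState = "BootstrapRequired"
		res.CatchUpAction = "Bootstrap"
		res.BootstrapRequired = true
		return res
	}
	if rootToken > catalogToken {
		res.RootTokenAfter = rootToken // the token never moves backward
	}
	res.CatchUpState = "Fresh"
	if needsRebuild {
		res.CatchUpAction = "Reload"
		res.ReloadRequired = true
	} else {
		res.CatchUpAction = "Advance"
	}
	return res
}

func main() {
	fmt.Printf("%+v\n", Step(10, 15, 5, false))
	fmt.Printf("%+v\n", Step(3, 15, 5, false))
}
```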

6.4 Why protocolizing this matters

Without explicit catch-up semantics:

tests can only assert indirect effects
follower-read serving policy stays implicit
degraded-mode logic gets duplicated
future clients cannot reason about retries properly

This is one of the strongest places for NoKV to become distinctive.

7. Transition Lifecycle Protocol

NoKV already records rooted topology intent and rooted completion. That is the start of a lifecycle, not yet the full protocol.

The next stage is to make transition tracking first-class.

7.1 Transition identity

Every topology transition should have a stable TransitionID.

TransitionID should be:

deterministic
durable
safe to log, surface, and test against

It should not require callers to infer identity from:

region ID
event kind
timing

alone.

7.2 Transition source

Every transition should record:

source rooted epoch or token
target topology intent
the event that created it

This makes causality explicit:

what truth position created this transition?
what later truth position superseded it?

7.3 Phase definitions

`Planned`

The rooted lifecycle assessment says the transition exists as an intended topology change, but the scheduler/control-plane runtime has not yet admitted it for forward progress.

This is the phase used by:

AssessRootEvent
PublishRootEventResponse.assessment

`Admitted`

The rooted transition is currently pending or open, and the scheduler/control-plane runtime has admitted it for execution progress.

This is the phase used by:

ListTransitions

It is intentionally runtime-facing. It does not appear in PublishRootEventResponse.assessment, because that response reports a pre-persist lifecycle assessment rather than post-admission runtime state.

`Completed`

The rooted lifecycle says the requested transition target is already satisfied. For a plan event, this usually means the requested topology is already present.

`Cancelled`

The rooted lifecycle says the requested transition target was cancelled.

`Conflicted`

The rooted lifecycle says a different pending transition already owns progress for the same target.

`Superseded`

The rooted lifecycle says a newer rooted topology already superseded this transition target.

`Aborted`

The rooted lifecycle says an apply or terminal event does not match the current pending rooted target.
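
The phase vocabulary above can be sketched as a small enum. The type and method names are hypothetical. The sketch encodes the one rule stated explicitly in this section, that Admitted is runtime-facing and never appears in a pre-persist assessment; treating every other phase as a valid assessment result is an assumption.

```go
package main

import "fmt"

// Phase is a hypothetical enum over the lifecycle phases in 7.3.
type Phase int

const (
	Planned Phase = iota
	Admitted
	Completed
	Cancelled
	Conflicted
	Superseded
	Aborted
)

var phaseNames = [...]string{
	"Planned", "Admitted", "Completed", "Cancelled",
	"Conflicted", "Superseded", "Aborted",
}

// String returns the protocol name of the phase.
func (p Phase) String() string {
	if p < 0 || int(p) >= len(phaseNames) {
		return "Unknown"
	}
	return phaseNames[p]
}

// ValidInAssessment reports whether p may appear in a pre-persist
// lifecycle assessment. Admitted is excluded because it describes
// post-admission runtime state (the ListTransitions view).
func (p Phase) ValidInAssessment() bool {
	return p >= Planned && p <= Aborted && p != Admitted
}

func main() {
	fmt.Println(Planned, Planned.ValidInAssessment())   // Planned true
	fmt.Println(Admitted, Admitted.ValidInAssessment()) // Admitted false
}
```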

7.4 Why lifecycle matters

A formal lifecycle enables:

clear scheduling decisions
proper retry/backoff
stuck transition recovery
scheduler/control-plane runtime clarity
precise testing around publish boundaries

Without it, the system keeps relying on partial signals scattered across:

rooted events
in-memory views
runtime heuristics

8. Degraded Semantics

NoKV already has some degraded behavior:

followers serve stale route views
route cache may survive Coordinator outages
scheduler paths may degrade

These behaviors should become explicit protocol states.

8.1 Mode definitions

`Healthy`

Normal serving mode.

Rooted truth, catalog freshness, and serving guarantees are all within policy.

`CoordinatorDegraded`

The Coordinator process is alive, but not all control-plane functions can be performed normally.

Examples:

partial RPC surface availability
write restrictions while leadership is unsettled

`RootLagging`

Rooted truth exists, but this node’s materialized catalog is behind allowed freshness bounds.

This is not full unavailability. It is a serving restriction mode.

`RootUnavailable`

The rooted backend cannot currently provide enough truth to support valid control-plane service.

In this mode:

truth-sensitive reads fail
rooted writes fail
diagnostics may still be exposed

`ViewOnly`

The node may still expose a stale materialized catalog, but cannot satisfy freshness-sensitive contracts.

This mode is useful because it makes “stale but useful” explicit.

8.2 Why this should be formal

Without explicit degraded modes, callers only see:

transport failure
not leader
route unavailable

Those errors do not express the actual system state.

A real degraded protocol lets callers answer:

retry elsewhere?
retry later?
accept stale?
fail fast?

8.3 Relationship to freshness

DegradedMode and Freshness are related but not identical.

Freshness is the contract requested and served for one read
DegradedMode is the broader operating condition of the serving node

A node may be:

Healthy and still reject a Strong read because it is not leader
RootLagging and still serve BestEffort
ViewOnly and still serve diagnostics

That distinction should remain sharp.

8.4 Current coordinator contract

The current implementation already enforces a concrete degraded-mode contract at the Coordinator RPC boundary.

Metadata reads (`GetRegionByKey`)

Freshness=BEST_EFFORT
- serves from the local materialized catalog even when meta/root is currently unavailable
- returns degraded_mode=ROOT_UNAVAILABLE when the rooted snapshot cannot be reloaded
- returns degraded_mode=ROOT_LAGGING when the local catalog trails rooted truth
Freshness=BOUNDED
- rejects when meta/root is unavailable
- rejects when root_lag > max_root_lag
- rejects when catch-up is still BOOTSTRAP_REQUIRED
Freshness=STRONG
- rejects on followers
- rejects whenever root_lag > 0
- rejects when meta/root is unavailable
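
The per-class rejection rules above can be condensed into one decision function. This is a minimal sketch, not the Coordinator's actual code: `NodeView`, `Admit`, and `ErrRejected` are hypothetical names, `HEALTHY` is an assumed enum spelling, and the degraded mode reported on successful Strong and Bounded reads is an assumption (the list above only states it for BEST_EFFORT).

```go
package main

import (
	"errors"
	"fmt"
)

type Freshness int

const (
	Strong Freshness = iota
	Bounded
	BestEffort
)

// NodeView is a hypothetical snapshot of what the node knows when a read arrives.
type NodeView struct {
	IsLeader          bool
	RootAvailable     bool   // false when meta/root cannot be reached or reloaded
	RootLag           uint64 // how far the local catalog trails rooted truth
	BootstrapRequired bool   // catch-up state is BOOTSTRAP_REQUIRED
}

var ErrRejected = errors.New("freshness contract cannot be satisfied")

// Admit applies the rejection rules for GetRegionByKey and returns the
// degraded mode to report on success.
func Admit(f Freshness, maxRootLag uint64, v NodeView) (string, error) {
	switch f {
	case Strong:
		if !v.IsLeader || v.RootLag > 0 || !v.RootAvailable {
			return "", ErrRejected
		}
	case Bounded:
		if !v.RootAvailable || v.RootLag > maxRootLag || v.BootstrapRequired {
			return "", ErrRejected
		}
	case BestEffort:
		// Served from the local catalog even when meta/root is unavailable.
	}
	switch {
	case !v.RootAvailable:
		return "ROOT_UNAVAILABLE", nil
	case v.RootLag > 0:
		return "ROOT_LAGGING", nil
	default:
		return "HEALTHY", nil
	}
}

func main() {
	follower := NodeView{RootAvailable: true, RootLag: 5}
	_, err := Admit(Strong, 0, follower)
	fmt.Println(err != nil) // true: Strong is rejected on followers

	mode, _ := Admit(BestEffort, 0, NodeView{RootAvailable: false})
	fmt.Println(mode) // ROOT_UNAVAILABLE
}
```

Rejection is returned as an error rather than as a weaker served class, which keeps the sketch aligned with the no-silent-downgrade rule.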

In all cases, successful replies carry the current answerability witness:

served_root_token
current_root_token
root_lag
catch_up_state
degraded_mode
serving_class
sync_health

Duty-gated writes (`AllocID`, `TSO`, scheduler decisions)

These do not have a degraded fallback.

the local coordinator must first campaign / renew the rooted lease
the rooted lease must still be active for the local holder
the rooted era must not already be sealed
the rooted duty mask must admit the requested action

If any of those fail, the request is rejected instead of falling back to stale local state. This is the current boundary between:

read-path degradation
write-path fail-stop admission
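
The fail-stop gate for duty-gated writes can be sketched as a sequence of checks over a lease view. Everything named here is an assumption for illustration: the `Lease` struct, the `Duty` bitmask and its constants, and the `AdmitDuty` function are hypothetical, and the real rooted lease carries more state than this.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Duty is a hypothetical bitmask of gated actions.
type Duty uint32

const (
	DutyAllocID Duty = 1 << iota
	DutyTSO
	DutySchedule
)

// Lease is a hypothetical local view of the rooted lease.
type Lease struct {
	Holder    string
	ExpiresAt time.Time
	Sealed    bool // the rooted era is already sealed
	DutyMask  Duty // duties the rooted state admits for this holder
}

// AdmitDuty rejects the request unless every gate passes.
// There is no fallback to stale local state.
func AdmitDuty(l Lease, self string, want Duty, now time.Time) error {
	switch {
	case l.Holder != self:
		return errors.New("lease is not held by the local coordinator")
	case !now.Before(l.ExpiresAt):
		return errors.New("lease has expired")
	case l.Sealed:
		return errors.New("rooted era is sealed")
	case l.DutyMask&want != want:
		return errors.New("duty mask does not admit the requested action")
	}
	return nil
}

func main() {
	now := time.Unix(1000, 0)
	l := Lease{Holder: "coord-1", ExpiresAt: now.Add(time.Minute), DutyMask: DutyAllocID | DutyTSO}
	fmt.Println(AdmitDuty(l, "coord-1", DutyTSO, now))      // <nil>
	fmt.Println(AdmitDuty(l, "coord-1", DutySchedule, now)) // duty mask does not admit the requested action
}
```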

Lifecycle mutations (`Seal`, `Confirm`, `Close`, `Reattach`)

Lifecycle mutations are stricter than hot-path duty admission:

they always re-read rooted state from storage before mutating
they reject any stale-holder / expired-lease / sealed-era view
they treat finality as a rooted safety condition, not a best-effort hint

That is why seal / confirm / close / reattach do not use the cached mirror admission path.

8.5 Operational diagnostics

DiagnosticsSnapshot() now exports both:

the current degraded serving state (root, lease, audit, handover_witness)
cumulative Eunomia counters under eunomia_metrics

eunomia_metrics is grouped into:

tenure_era_transitions_total
handover_stage_transitions_total
gate_rejections_total
guarantee_violations_total

The guarantee_violations_total buckets map directly to the four Eunomia guarantees:

primacy
inheritance
silence
finality

9. API Direction

The most valuable first implementation step is at the Coordinator RPC boundary.

9.1 Read-side API direction

Read APIs should conceptually grow:

freshness
required_root_token
max_root_lag

Read responses should conceptually expose:

served_root_token
served_freshness
degraded_mode
served_by_leader

9.2 Write-side API direction

Leader-only writes should remain leader-only.

Write requests should continue to require:

rooted leadership
expected cluster epoch where applicable

Write responses should eventually expose:

accepted_root_token
transition_id where topology change is involved

This makes a write result more precise than:

accepted = true

9.3 Diagnostics API direction

The control plane will likely also benefit from an explicit diagnostics surface.

Conceptually, that should expose:

current rooted token
catalog rooted token
catch-up state
degraded mode
leader identity knowledge
lag estimate

This may become:

a dedicated diagnostics RPC
metrics
CLI output

or all three.

10. Storage and Catalog Direction

To support the protocol above, the Coordinator catalog should become rooted-token aware.

At minimum, the materialized control-plane view should track:

catalog_root_token
catalog_updated_at
catch_up_state
degraded_mode

Optional future metadata:

root_lag
last_reload_reason
leader_observed

10.1 Ownership rule

This design does not change truth ownership.

The ownership line remains:

meta/root owns durable truth
coordinator/catalog owns materialized serving state

The catalog should become more informative, not more authoritative.

10.2 Materialization rule

The catalog must remain:

rebuildable
discardable
follower-local

It should never become a second durable truth source.

That is a core invariant.

11. Invariants

This protocol should preserve the following invariants.

11.1 Truth ownership invariant

Only meta/root owns durable control-plane truth.

11.2 Materialization invariant

coordinator/catalog is always derived state, never authority.

11.3 Monotonic token invariant

The materialized rooted token of one node must never move backward.
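
A minimal sketch of how a catalog could enforce this invariant locally is shown here. The `Catalog` type and its methods are hypothetical, and the token is flattened to a `uint64` for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// Catalog is a hypothetical materialized view that tracks the rooted
// token it has incorporated.
type Catalog struct {
	mu    sync.Mutex
	token uint64
}

// Advance installs next only if it moves the token forward.
// It reports whether the token changed.
func (c *Catalog) Advance(next uint64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if next <= c.token {
		return false // stale or duplicate update: never move backward
	}
	c.token = next
	return true
}

// Token returns the currently incorporated rooted token.
func (c *Catalog) Token() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.token
}

func main() {
	c := &Catalog{}
	fmt.Println(c.Advance(10), c.Token()) // true 10
	fmt.Println(c.Advance(7), c.Token())  // false 10
}
```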

11.4 No silent downgrade invariant

If a caller requests Strong or Bounded freshness and the node cannot satisfy it, the server should reject rather than silently serve BestEffort.

11.5 Explicit stale service invariant

If stale service is allowed, the response should say so explicitly.

11.6 Transition identity invariant

Every control-plane transition must be referenceable as a stable object, not just inferred from event timing.

12. Rollout State

The rollout stays incremental, but the first protocol line is already in use.

Phase 1: Freshness

Status: implemented

Delivered outcomes:

GetRegionByKey can express requested freshness
route responses disclose served freshness and rooted token
follower-read behavior is no longer implicit

Phase 2: Catch-Up

Status: minimal v1 implemented

Delivered outcomes:

CatchUpState
formal bootstrap-required boundary
rooted lag awareness in serving decisions

Still open:

a wider public CatchUpAction surface
more explicit recovery diagnostics

Phase 3: Transition

Status: minimal v1 implemented

Delivered outcomes:

durable TransitionID
explicit phase semantics across:
- ListTransitions
- AssessRootEvent
- PublishRootEvent
publish-time pre-persist lifecycle assessment

Still open:

richer runtime phases
stuck / timeout diagnosis

Phase 4: DegradedMode

Status: minimal v1 implemented

Delivered outcomes:

explicit degraded semantics in route responses
route-serving rejection under rooted lag / rooted unavailability

Still open:

broader surfacing through metrics and diagnostics
tighter client retry policy based on degraded state

13. What Not To Do

The following are intentionally out of scope for this line of work:

inventing a new general-purpose consensus algorithm
replacing Raft in the mainline system
redesigning 2PC before control-plane semantics are explicit
collapsing rooted truth and catalog into one mixed layer
treating stale follower service as an undocumented optimization

NoKV’s control-plane innovation should come from stronger semantics and clearer ownership, not from unnecessary reinvention of already mature primitives.

14. Current Practical Naming Guidance

As this protocol lands in code, the implementation should prefer:

RootToken
Freshness
CatchUpState
CatchUpAction
TransitionID
DegradedMode

For execution-plane work, prefer:

Admission
ExecutionTarget
ExecutionOutcome
PublishState
RestartState

Avoid reintroducing weaker names like:

state kind
stale mode
sync status
reload reason as the primary protocol object

Those may still exist as helper fields, but the public model should stay anchored to the smaller protocol vocabulary above.

15. Execution-Plane Protocol

The execution plane is the contract between:

raftstore
local leader peer runtime
local durable recovery state
the control-plane publish boundary

Its job is different from the control plane.

The control plane answers:

what topology truth exists?
how fresh is the served view?
what transition lifecycle is visible globally?

The execution plane answers:

may this request enter local execution now?
what target is being executed?
how far has local execution progressed?
has terminal truth been published yet?
what state is safe to recover after restart?

15.1 Why this matters

Without an explicit execution-plane protocol, the system keeps important distributed safety semantics hidden in code paths such as:

request validation and cancellation
queue admission and local degradation
planned truth publication before local execution
terminal truth publication after local apply
restart reconciliation between localmeta, raft durable state, and Coordinator

Those are not low-level implementation details. They are correctness boundaries.

15.2 Protocol objects

The execution plane should be formalized around the following objects.

`Admission`

Admission is the local decision about whether one request may enter execution.

It should answer:

is the local peer leader?
is the region epoch valid?
is the peer hosted and runnable?
is the request cancelled or timed out already?
is the queue or scheduler allowed to accept more work?

The important design rule is that admission must be explicit, not an accidental mix of local checks and fallback retries.

`ExecutionTarget`

ExecutionTarget is the concrete unit of work the execution plane is trying to carry out.

Examples:

one read command
one raft write proposal
one peer change target
one split target
one merge target

For topology changes, ExecutionTarget must remain causally tied to the rooted transition object created by the control plane.

`ExecutionOutcome`

ExecutionOutcome is the local state reached by an admitted target.

Minimal useful states are:

Rejected
Queued
Proposed
Committed
Applied
Failed

This is the minimum needed to stop conflating “accepted by API”, “replicated by raft”, and “applied to local state”.

`PublishState`

PublishState tracks the boundary between local apply and control-plane truth publication.

This is a first-class boundary in NoKV’s architecture:

planned truth is published before execution
terminal truth is published after local apply

The protocol must therefore distinguish:

NotRequired
Pending
Published
PublishFailed

This is the exact boundary where split/merge/peer-change correctness otherwise turns into invisible best-effort behavior.

`RestartState`

RestartState describes whether one store can safely resume from local durable state.

It should answer:

is local peer metadata self-consistent?
is the local raft replay pointer usable?
does the store need Coordinator catch-up only, or local rebuild first?
is startup safe, degraded, or fatal?

This object exists to stop restart behavior from being an implicit composition of:

raftstore/localmeta
raft log replay
ad hoc bootstrap logic
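
A single derivation function is one way to make that composition explicit. The sketch below is hypothetical throughout: the `LocalChecks` inputs, the three-level result, and especially the mapping from checks to levels are assumptions, since this note does not fix which failures are fatal and which are only degraded.

```go
package main

import "fmt"

// RestartState is a hypothetical three-level startup verdict.
type RestartState int

const (
	RestartSafe RestartState = iota
	RestartDegraded
	RestartFatal
)

func (s RestartState) String() string {
	switch s {
	case RestartSafe:
		return "safe"
	case RestartDegraded:
		return "degraded"
	default:
		return "fatal"
	}
}

// LocalChecks collects the answers to the questions RestartState poses.
type LocalChecks struct {
	MetaConsistent          bool // local peer metadata is self-consistent
	ReplayPointerUsable     bool // raft replay pointer can be used as-is
	NeedsCoordinatorCatchUp bool // store must catch up with Coordinator
}

// Derive maps local checks to a startup verdict (assumed policy):
// inconsistent metadata is fatal; an unusable replay pointer or a
// pending Coordinator catch-up allows only a degraded start.
func Derive(c LocalChecks) RestartState {
	switch {
	case !c.MetaConsistent:
		return RestartFatal
	case !c.ReplayPointerUsable || c.NeedsCoordinatorCatchUp:
		return RestartDegraded
	default:
		return RestartSafe
	}
}

func main() {
	fmt.Println(Derive(LocalChecks{MetaConsistent: true, ReplayPointerUsable: true}))
	fmt.Println(Derive(LocalChecks{MetaConsistent: true, ReplayPointerUsable: false}))
}
```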

15.3 Request classes and admission

Execution-plane v1 should start by distinguishing three request classes:

Read
- local leader read admission
- read-index / wait-applied preconditions
- cancellation and deadline propagation
Write
- raft proposal admission
- proposal tracking through commit/apply
- retryable local rejection vs fatal local rejection
Topology
- peer change
- split
- merge
- explicit coupling to planned and terminal rooted truth

These classes do not need separate RPC protocols, but they do need stable admission outcomes. At minimum, those outcomes should distinguish:

NotLeader
EpochMismatch
NotHosted
Canceled
TimedOut
QueueSaturated
SchedulerDegraded
Accepted

Without this line, request behavior remains split across store-local branches instead of becoming one coherent executor contract.
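
The admission outcomes above can be sketched as one function that returns a stable reason. The `PeerView` struct, the function names, and the order of the checks are assumptions for illustration; SchedulerDegraded is omitted to keep the sketch short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Reason is a stable admission outcome.
type Reason string

const (
	Accepted       Reason = "Accepted"
	NotLeader      Reason = "NotLeader"
	EpochMismatch  Reason = "EpochMismatch"
	NotHosted      Reason = "NotHosted"
	Canceled       Reason = "Canceled"
	TimedOut       Reason = "TimedOut"
	QueueSaturated Reason = "QueueSaturated"
)

// PeerView is a hypothetical snapshot of local peer state.
type PeerView struct {
	Hosted   bool
	Leader   bool
	Epoch    uint64
	QueueLen int
	QueueCap int
}

// checkPeer evaluates the local-state checks only.
func checkPeer(p PeerView, reqEpoch uint64) Reason {
	switch {
	case !p.Hosted:
		return NotHosted
	case !p.Leader:
		return NotLeader
	case p.Epoch != reqEpoch:
		return EpochMismatch
	case p.QueueLen >= p.QueueCap:
		return QueueSaturated
	}
	return Accepted
}

// Admit checks the request context first, then local peer state.
func Admit(ctx context.Context, p PeerView, reqEpoch uint64) Reason {
	if err := ctx.Err(); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			return TimedOut
		}
		return Canceled
	}
	return checkPeer(p, reqEpoch)
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	p := PeerView{Hosted: true, Leader: true, Epoch: 7, QueueCap: 8}
	fmt.Println(Admit(ctx, p, 7)) // Accepted
	cancel()
	fmt.Println(Admit(ctx, p, 7)) // Canceled
}
```

Returning a named reason rather than a bare error is what lets callers, logs, and tests share one vocabulary for rejections.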

15.4 Publish lifecycle

Execution-plane v1 should also make the publish boundary explicit for topology work.

The minimal lifecycle is:

PlannedPublished
LocallyExecuting
Applied
TerminalPublishPending
TerminalPublished
TerminalPublishFailed

The important rule is that Applied and TerminalPublished are different states. Local execution success does not mean global lifecycle completion until terminal truth is durably published.

This is the boundary that should align:

raftstore/store/transition_builder.go
raftstore/store/transition_executor.go
raftstore/store/transition_outcome.go
raftstore/store/scheduler_runtime.go
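
The lifecycle above can be sketched as a small transition table. The allowed edges are an assumption: the note lists the states but not every legal move, and the retry edge from TerminalPublishFailed back to TerminalPublishPending is inferred from the statement that terminal publish failures are retained as visible retry state. The type and function names are hypothetical.

```go
package main

import "fmt"

// PublishPhase is one step of the topology publish lifecycle.
type PublishPhase string

const (
	PlannedPublished       PublishPhase = "PlannedPublished"
	LocallyExecuting       PublishPhase = "LocallyExecuting"
	Applied                PublishPhase = "Applied"
	TerminalPublishPending PublishPhase = "TerminalPublishPending"
	TerminalPublished      PublishPhase = "TerminalPublished"
	TerminalPublishFailed  PublishPhase = "TerminalPublishFailed"
)

// next lists the assumed legal successors of each phase.
var next = map[PublishPhase][]PublishPhase{
	PlannedPublished:       {LocallyExecuting},
	LocallyExecuting:       {Applied},
	Applied:                {TerminalPublishPending},
	TerminalPublishPending: {TerminalPublished, TerminalPublishFailed},
	TerminalPublishFailed:  {TerminalPublishPending}, // visible retry
}

// CanMove reports whether the lifecycle may step from one phase to another.
func CanMove(from, to PublishPhase) bool {
	for _, n := range next[from] {
		if n == to {
			return true
		}
	}
	return false
}

// GloballyComplete is true only once terminal truth has been published.
// Applied alone is local success, not global completion.
func GloballyComplete(p PublishPhase) bool {
	return p == TerminalPublished
}

func main() {
	fmt.Println(CanMove(Applied, TerminalPublished)) // false: publish must be attempted first
	fmt.Println(GloballyComplete(Applied))           // false
	fmt.Println(GloballyComplete(TerminalPublished)) // true
}
```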

15.5 First landing points

Execution-plane protocol v1 landed first in the places that already carried the boundary implicitly:

raftstore/store/command_ops.go
- request admission and context semantics
raftstore/store/command_pipeline.go
- request lifecycle states visible to callers
raftstore/store/scheduler_runtime.go
- queue overflow / degraded local behavior
raftstore/store/transition_builder.go
- execution target construction from rooted truth
raftstore/store/transition_executor.go
- local execution and apply boundary
raftstore/store/transition_outcome.go
- terminal truth publication result
raftstore/localmeta
- restart state and local recovery truth

These files still do not expose a new public API. But they now share one explicit local protocol vocabulary instead of inventing those semantics independently.

15.6 Execution invariants

The execution-plane protocol should preserve the following invariants.

`Admission` invariant

Every externally visible rejection should map to a stable admission reason, not only a transport error or generic retry exhaustion.

`No skipped publish boundary` invariant

If local apply completed but terminal truth publication did not, the system must surface that state explicitly. It must not be silently treated as fully complete.

`Restart truth boundary` invariant

Restart must derive hosted peer truth from local durable state, not from bootstrap config. Static config may resolve addresses, but must not overwrite runtime truth.

`No hidden drop` invariant

Queue overflow, scheduler degradation, and publish retry loss must be explicit protocol states or metrics-backed outcomes, not silent local behavior.

15.7 Minimal rollout target

Execution-plane protocol v1 started small.

The minimum useful delivered line is now:

request admission
topology execution outcome
publish boundary state
restart state

That is enough to formalize the most dangerous boundaries without trying to protocolize every internal raft detail.

16. Priority and Rollout Order

Further protocol work should avoid widening either protocol until the current v1 contracts have proven small, observable, and well tested.

16.1 What is implemented now

The control plane has a minimal, externally visible contract:

freshness classes
rooted token / lag
degraded serving state
transition identity

The execution plane now has a minimal internal contract:

admission class / reason
topology outcome
publish state
restart state
admin-visible ExecutionStatus

That is enough for v1. It gives tests and operators names for the important boundaries without turning raftstore into a policy engine.

16.2 What should not happen next

The wrong next step would be to keep enriching lifecycle phases and diagnostic fields before the existing v1 state proves stable under recovery and integration tests.

That would create a vocabulary mismatch:

control plane claims richer transition semantics than the executor can act on
execution plane reports more states than the coordinator can use safely

16.3 Recommended order

Keep control-plane v1 and execution-plane v1 narrow.
Add tests around the existing publish/restart/admission states before adding new states.
Only then extend control-plane v1 toward richer scheduler/runtime phases.

In short:

stabilize both v1 contracts first, then deepen scheduler/runtime semantics.
