Control-Plane and Execution-Plane Protocols
This note defines NoKV’s protocol line for:
- the control plane
- the execution plane
and the next-stage evolution around them.
The current implementation status is intentionally minimal but no longer just a design sketch:
- control-plane protocol v1 is implemented and exposed through Coordinator RPCs plus meta/root storage semantics
- execution-plane protocol v1 is implemented as a store-local contract with a small admin diagnostics API surface
The point of this document is to keep those two lines coordinated instead of letting them drift into separate, implicit rule sets.
The control plane focuses on the contract between:
- meta/root
- coordinator
The execution plane focuses on the contract between:
- coordinator
- raftstore
- local durable state (raftstore/localmeta, raft log, restart replay)
The purpose of this document is not to replace Raft or redesign the data plane. The purpose is to make NoKV’s existing cross-plane behavior explicit, testable, and evolvable.
The control plane is protocolized around four ideas:
- Freshness
- CatchUp
- Transition
- DegradedMode
These four ideas already existed in partial form inside the implementation. The current work turns them into a stable vocabulary, explicit invariants, and a clear rollout line.
The execution plane is protocolized around four matching ideas:
- Admission
- ExecutionTarget
- PublishBoundary
- RestartState
0. Current Status
The control plane now has a minimal implemented v1.
Implemented and exposed through pb/coordinator/coordinator.proto,
coordinator/server, coordinator/rootview, and tests:
- route-read Freshness
- RootToken
- root_lag
- DegradedMode
- CatchUpState
- TransitionID
- PublishRootEventResponse.assessment as a pre-persist lifecycle assessment
This means the protocol is no longer only a design direction. It is already the formal serving contract for key Coordinator APIs.
Not implemented in v1:
- richer transition phases such as Published / Stalled
- a fuller catch-up action surface exposed through API
- automatic recovery policy derived from protocol state
- broad client-side policy that consumes every protocol field
So the right description today is:
control-plane protocol v1 is implemented and in use, while richer scheduler/runtime policy is not implemented in v1.
The execution plane is in a different state.
Today, raftstore has a minimal implemented v1 inside raftstore/store.
Already implemented and exercised through store-local types, raftstore/admin,
runtime state, and tests:
- explicit Admission classes and reasons on read / write / topology entry points
- explicit topology ExecutionOutcome
- explicit topology PublishState
- explicit RestartState derived from raftstore/localmeta + raft replay pointers
- terminal publish failures retained as visible retry state instead of silent drop
- admin diagnostics exposure through pb/admin/admin.proto ExecutionStatus
Not implemented as first-class execution protocol fields yet:
- request validation and routing
- context propagation
- detailed local leader admission diagnostics
- detailed per-attempt scheduler retry/backoff policy
- metrics for planned truth -> execute -> terminal truth latency
- richer degraded local scheduler states
The current landing is still mostly store-local and spread across:
- raftstore/store
- raftstore/peer
- raftstore/raftlog
- raftstore/localmeta
So the right description there is:
execution-plane protocol v1 now exists as a minimal named runtime contract, with store-local state and admin-visible diagnostics, while broader metrics, policy, and richer executor states are not implemented in v1.
1. Intent
NoKV already has the right building blocks:
- rooted truth events
- checkpoint + committed tail
- watch-first tail subscription
- rebuildable coordinator/catalog
- explicit planned and terminal topology events
Before v1, these pieces mostly existed as implementation mechanics. The control plane now has a formal minimum contract, while several policy extensions remain intentionally outside v1:
- when a follower read is fresh enough
- when a follower must reload
- when retained tail catch-up is no longer enough
- what phase a topology change is in
- what “degraded” actually means to callers
The design goal is to keep turning these implicit behaviors into a formal protocol.
That protocol should be:
- small
- explicit
- observable
- testable
- compatible with the current architecture
2. Scope
This document covers both planes, but not at the same implementation depth.
For the control plane, it defines the behavior of:
- rooted truth consumption
- control-plane view freshness
- rooted catch-up progression
- topology transition lifecycle
- degraded operating modes
For the execution plane, it defines the protocol direction for:
- request admission
- transition execution
- terminal truth publication
- restart and local recovery alignment
- degraded local behavior around scheduler / queue / publish boundaries
It does not redefine:
- Raft replication
- Percolator / 2PC transaction semantics
- store-local recovery metadata
- storage-engine internals unrelated to distributed lifecycle
This document should be read as two linked contracts:
control plane = durable truth + materialized view + serving contract
execution plane = admitted work + local execution + publish/restart contract
3. Protocol Objects
The naming set should remain compact, stable, and precise.
3.1 RootToken
RootToken is the rooted truth position already incorporated by some materialized view.
It is the control-plane equivalent of:
- “what truth have I already consumed?”
It should be treated as:
- monotonic
- comparable
- portable across control-plane nodes
RootToken is not just an internal storage cursor.
It is the anchor for:
- freshness
- catch-up state
- read eligibility
- transition causality
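The three properties above (monotonic, comparable, portable) can be sketched as a small value type. This is a minimal illustration only: the (Term, Index) layout, the method names, and the string form are assumptions, not NoKV's actual RootToken encoding.

```go
package main

import "fmt"

// RootToken is a hypothetical encoding of a rooted truth position.
// The (Term, Index) layout is an assumption made for illustration.
type RootToken struct {
	Term  uint64
	Index uint64
}

// Compare returns -1, 0, or +1 when t is behind, equal to, or ahead of o.
func (t RootToken) Compare(o RootToken) int {
	switch {
	case t.Term < o.Term:
		return -1
	case t.Term > o.Term:
		return 1
	case t.Index < o.Index:
		return -1
	case t.Index > o.Index:
		return 1
	default:
		return 0
	}
}

// Covers reports whether a view at t has incorporated at least required.
func (t RootToken) Covers(required RootToken) bool {
	return t.Compare(required) >= 0
}

// String renders the token in a form that is safe to log or send to another node.
func (t RootToken) String() string {
	return fmt.Sprintf("%d/%d", t.Term, t.Index)
}

func main() {
	view := RootToken{Term: 3, Index: 40}
	fmt.Println(view, view.Covers(RootToken{Term: 3, Index: 38}), view.Covers(RootToken{Term: 4, Index: 1}))
}
```

Because the token is a plain comparable value, the same Covers check can back freshness, read eligibility, and transition causality without any node-local cursor state.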
3.2 Freshness
Freshness is the serving contract attached to a read.
It answers:
- how fresh did the caller ask for?
- how fresh was the returned answer?
3.3 CatchUpState
CatchUpState describes how far one Coordinator node has converged on rooted truth.
It answers:
- can this node serve route reads?
- can it satisfy bounded-freshness reads?
- must it reload?
- must it install bootstrap?
3.4 Transition
Transition is one rooted topology change that moves through a formal lifecycle.
Examples:
- peer addition
- peer removal
- region split
- region merge
- region tombstone
Transition is not just a single event.
It is a causally tracked change with:
- identity
- source truth position
- phase
- progress
3.5 DegradedMode
DegradedMode is the externally visible restriction level of the control plane.
It answers:
- what kind of reads may still be served?
- are rooted writes currently allowed?
- should clients retry elsewhere?
- is the node usable only as a stale view?
4. Naming Set
The protocol should use one stable vocabulary across:
- API
- code
- logs
- metrics
- tests
- docs
4.1 Read classes
- Strong
  - requires leader-grade freshness
- Bounded
  - allows follower service within explicit lag limits
- BestEffort
  - allows stale cache service
These names are short and carry clear serving intent.
4.2 Catch-up actions
- Reload
  - rebuild catalog from rooted storage
- Advance
  - acknowledge rooted tail progress without a full rebuild
- Bootstrap
  - install a fresh checkpoint because retained tail is insufficient
- Reject
  - deny freshness-sensitive reads until convergence improves
4.3 Catch-up states
- Fresh
- Lagging
- BootstrapRequired
- Recovering
- Unavailable
4.4 Degraded modes
- Healthy
- CoordinatorDegraded
- RootLagging
- RootUnavailable
- ViewOnly
ViewOnly is deliberately chosen over more vague names like ExecutionOnly.
This section only defines control-plane behavior, so the right question is:
can this node still expose a stale view?
5. Freshness Contract
The control plane should stop treating all successful reads as equivalent.
Every control-plane read should:
- declare the requested freshness class
- optionally declare a rooted lower bound
- receive an explicit served freshness result
5.1 Why this matters
Today, follower reads are effectively:
“good enough if the follower recently reloaded and is not too far behind”
That is practical, but not a protocol.
Without a formal freshness contract:
- clients cannot reason about route read quality
- tests cannot assert serving guarantees precisely
- degraded modes remain guesswork
- control-plane correctness is partly hidden in implementation details
5.2 Request fields
Control-plane read RPCs should be able to express:
- freshness
  - Strong, Bounded, or BestEffort
- required_root_token
  - optional lower bound on rooted truth already incorporated
- max_root_lag
  - optional bound on acceptable rooted lag
Not every caller will need all three fields. But the protocol should have room for them.
5.3 Response fields
Control-plane read RPCs should return:
- served_root_token
- served_freshness
- served_by_leader
- degraded_mode
Optional future fields:
- root_lag
- freshness_reason
5.4 Serving rules
Strong
Should be served only when:
- the node is rooted leader
- and the serving catalog has incorporated at least the requested RootToken
If this is not true, the server should reject rather than silently downgrade.
Bounded
May be served by a follower when:
- the node is not in BootstrapRequired, Recovering, or Unavailable
- and lag is within declared bounds
- and the served token satisfies required_root_token if one was requested
If bounds cannot be satisfied, the server should reject rather than silently serve stale data.
BestEffort
May be served from the current materialized catalog so long as:
- the catalog exists
- the node is not fully unavailable
This class exists to make stale service explicit instead of accidental.
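The three serving rules can be read as one admission function that either serves at the requested class or rejects, and never downgrades. The sketch below is illustrative: ReadRequest, NodeView, Admit, and the uint64 token are hypothetical simplifications, and it encodes only the rules stated in this section (the stricter checks listed under the current coordinator contract are not modeled).

```go
package main

import "fmt"

type Freshness string

const (
	Strong     Freshness = "Strong"
	Bounded    Freshness = "Bounded"
	BestEffort Freshness = "BestEffort"
)

type CatchUpState string

const (
	Fresh             CatchUpState = "Fresh"
	Lagging           CatchUpState = "Lagging"
	BootstrapRequired CatchUpState = "BootstrapRequired"
	Recovering        CatchUpState = "Recovering"
	Unavailable       CatchUpState = "Unavailable"
)

// ReadRequest is a hypothetical request shape; tokens are simplified to uint64.
type ReadRequest struct {
	Freshness         Freshness
	RequiredRootToken uint64 // 0 means no lower bound was requested
	MaxRootLag        uint64 // consulted only for Bounded reads
}

// NodeView is a hypothetical snapshot of the serving node.
type NodeView struct {
	IsLeader    bool
	HasCatalog  bool
	State       CatchUpState
	ServedToken uint64
	RootLag     uint64
}

// Admit returns nil when the node may serve the read at the requested class.
// It never downgrades: an unsatisfiable request is rejected with an error.
func Admit(req ReadRequest, n NodeView) error {
	tokenOK := n.ServedToken >= req.RequiredRootToken
	switch req.Freshness {
	case Strong:
		if n.IsLeader && tokenOK {
			return nil
		}
	case Bounded:
		blocked := n.State == BootstrapRequired || n.State == Recovering || n.State == Unavailable
		if !blocked && n.RootLag <= req.MaxRootLag && tokenOK {
			return nil
		}
	case BestEffort:
		if n.HasCatalog && n.State != Unavailable {
			return nil
		}
	}
	return fmt.Errorf("cannot serve %s read: state=%s leader=%t lag=%d",
		req.Freshness, n.State, n.IsLeader, n.RootLag)
}

func main() {
	follower := NodeView{HasCatalog: true, State: Lagging, ServedToken: 90, RootLag: 3}
	fmt.Println(Admit(ReadRequest{Freshness: Strong}, follower))
	fmt.Println(Admit(ReadRequest{Freshness: Bounded, MaxRootLag: 5}, follower))
}
```

Keeping the decision in one function makes the "reject rather than silently downgrade" rule directly assertable in tests.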
5.5 First rollout target
The first RPC that should adopt this contract is:
GetRegionByKey
That gives the system a clear, high-value place to prove the model before wider rollout.
6. Rooted Catch-Up Protocol
NoKV already has a good catch-up foundation:
- checkpoint
- committed tail
- watch-first subscription
- bootstrap install when retained tail is insufficient
The next step is to give that behavior a formal state machine.
6.1 Catch-up state definitions
Fresh
The node’s materialized catalog is sufficiently close to rooted truth to serve:
- Bounded
- BestEffort
and, if leader, possibly Strong.
Lagging
The node is behind, but still within retained-tail recovery range.
This means:
- further rooted tail observation may repair the gap
- bootstrap install is not yet mandatory
- some bounded reads may need to be rejected
BootstrapRequired
The node is too far behind for retained tail replay.
This means:
- a plain reload from retained tail is not sufficient
- a new checkpoint/bootstrap install is required
- freshness-sensitive reads should be rejected
Recovering
The node is actively rebuilding its materialized control-plane view.
This means:
- catalog may be in transition
- only explicitly allowed stale reads may be served
Unavailable
The node cannot presently produce a valid control-plane view.
This means:
- no rooted freshness contract can be satisfied
- the server should fail reads except possibly future explicit diagnostics
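The five state definitions can be expressed as a single classification over a few progress measurements. The following is a sketch under stated assumptions: the Progress fields, the Classify helper, and the order in which the checks run are hypothetical and are not taken from the NoKV implementation.

```go
package main

import "fmt"

type CatchUpState string

const (
	Fresh             CatchUpState = "Fresh"
	Lagging           CatchUpState = "Lagging"
	BootstrapRequired CatchUpState = "BootstrapRequired"
	Recovering        CatchUpState = "Recovering"
	Unavailable       CatchUpState = "Unavailable"
)

// Progress is a hypothetical input record; none of these fields are
// confirmed NoKV names.
type Progress struct {
	HasView      bool   // a valid materialized catalog exists
	Rebuilding   bool   // a reload or bootstrap install is in flight
	Lag          uint64 // rooted entries not yet incorporated
	RetainedTail uint64 // entries still replayable from the committed tail
	FreshLag     uint64 // largest lag still considered Fresh
}

// Classify maps progress to a state plus a short reason for logs.
// The check order (Recovering before Unavailable, and so on) is an assumption.
func Classify(p Progress) (CatchUpState, string) {
	switch {
	case p.Rebuilding:
		return Recovering, "catalog rebuild in flight"
	case !p.HasView:
		return Unavailable, "no valid materialized view"
	case p.Lag > p.RetainedTail:
		return BootstrapRequired, fmt.Sprintf("lag %d exceeds retained tail %d", p.Lag, p.RetainedTail)
	case p.Lag > p.FreshLag:
		return Lagging, fmt.Sprintf("lag %d exceeds fresh bound %d", p.Lag, p.FreshLag)
	default:
		return Fresh, "within fresh bound"
	}
}

func main() {
	state, why := Classify(Progress{HasView: true, Lag: 12, RetainedTail: 100, FreshLag: 4})
	fmt.Println(state, why)
}
```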
6.2 Catch-up actions
Reload
Used when rooted truth advanced in a way that requires rebuilding the materialized catalog.
Advance
Used when rooted tail progressed, but the catalog does not need a full rebuild.
Bootstrap
Used when the node must install a checkpoint because retained tail can no longer bridge the gap.
Reject
Used when the node should refuse freshness-sensitive serving until it converges further.
6.3 Protocol outputs
The rooted subscription path should eventually expose a structured result like:
- root_token_before
- root_token_after
- catch_up_state
- catch_up_action
- reload_required
- bootstrap_required
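One possible shape for that structured result is sketched below. The Go field names mirror the listed outputs, but the types, the Observe helper, its two boolean inputs, and the choice to leave the token unchanged until a reload or bootstrap completes are all assumptions for illustration.

```go
package main

import "fmt"

// CatchUpResult mirrors the listed protocol outputs. Field types are assumptions.
type CatchUpResult struct {
	RootTokenBefore   uint64
	RootTokenAfter    uint64
	CatchUpState      string
	CatchUpAction     string
	ReloadRequired    bool
	BootstrapRequired bool
}

// String gives a compact one-line form for logs and test failure messages.
func (r CatchUpResult) String() string {
	return fmt.Sprintf("%d->%d state=%s action=%s",
		r.RootTokenBefore, r.RootTokenAfter, r.CatchUpState, r.CatchUpAction)
}

// Observe is a hypothetical helper run when the subscription sees the rooted
// tail at observed. tailCoversGap and catalogChanged are assumed inputs.
func Observe(before, observed uint64, tailCoversGap, catalogChanged bool) CatchUpResult {
	r := CatchUpResult{RootTokenBefore: before, RootTokenAfter: before}
	switch {
	case !tailCoversGap:
		// Retained tail cannot bridge the gap: a checkpoint install is needed.
		r.CatchUpState, r.CatchUpAction, r.BootstrapRequired = "BootstrapRequired", "Bootstrap", true
	case catalogChanged:
		// New truth touches the catalog: rebuild before the token may move.
		r.CatchUpState, r.CatchUpAction, r.ReloadRequired = "Recovering", "Reload", true
	default:
		// Tail progressed without catalog impact: acknowledge it in place.
		r.CatchUpState, r.CatchUpAction, r.RootTokenAfter = "Fresh", "Advance", observed
	}
	return r
}

func main() {
	fmt.Println(Observe(40, 42, true, false))
	fmt.Println(Observe(40, 900, false, true))
}
```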
6.4 Why protocolizing this matters
Without explicit catch-up semantics:
- tests can only assert indirect effects
- follower-read serving policy stays implicit
- degraded-mode logic gets duplicated
- future clients cannot reason about retries properly
This is one of the strongest places for NoKV to become distinctive.
7. Transition Lifecycle Protocol
NoKV already records rooted topology intent and rooted completion. That is the start of a lifecycle, not yet the full protocol.
The next stage is to make transition tracking first-class.
7.1 Transition identity
Every topology transition should have a stable TransitionID.
TransitionID should be:
- deterministic
- durable
- safe to log, surface, and test against
It should not require callers to infer identity from:
- region ID
- event kind
- timing
alone.
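One way to obtain an ID that is deterministic and safe to log is to derive it from fields that are fixed at creation time, including the source rooted position. The sketch below is hypothetical: the input set, the TransitionKind values, the hashing scheme, and the "tr-" prefix are assumptions, not the NoKV derivation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

type TransitionKind uint8

const (
	PeerAdd TransitionKind = iota + 1
	PeerRemove
	RegionSplit
	RegionMerge
	RegionTombstone
)

// NewTransitionID derives an ID from fields that are fixed when the
// transition is created. The input set and the "tr-" prefix are assumptions.
func NewTransitionID(regionID uint64, kind TransitionKind, sourceToken uint64) string {
	var buf [17]byte
	binary.BigEndian.PutUint64(buf[0:8], regionID)
	buf[8] = byte(kind)
	binary.BigEndian.PutUint64(buf[9:17], sourceToken)
	sum := sha256.Sum256(buf[:])
	// Eight bytes of the digest keep the ID short enough for logs.
	return fmt.Sprintf("tr-%x", sum[:8])
}

func main() {
	fmt.Println(NewTransitionID(7, RegionSplit, 1042))
}
```

Including the source token in the derivation means two plans for the same region and kind at different truth positions get different IDs, so callers do not have to disambiguate by timing.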
7.2 Transition source
Every transition should record:
- source rooted epoch or token
- target topology intent
- the event that created it
This makes causality explicit:
- what truth position created this transition?
- what later truth position superseded it?
7.3 Phase definitions
Planned
The rooted lifecycle assessment says the transition exists as an intended topology change, but the scheduler/control-plane runtime has not yet admitted it for forward progress.
This is the phase used by:
- AssessRootEvent
- PublishRootEventResponse.assessment
Admitted
The rooted transition is currently pending or open, and the scheduler/control-plane runtime has admitted it for execution progress.
This is the phase used by:
- ListTransitions
It is intentionally runtime-facing. It does not appear in
PublishRootEventResponse.assessment, because that response reports a
pre-persist lifecycle assessment rather than post-admission runtime state.
Completed
The rooted lifecycle says the requested transition target is already satisfied. For a plan event, this usually means the requested topology is already present.
Cancelled
The rooted lifecycle says the requested transition target was cancelled.
Conflicted
The rooted lifecycle says a different pending transition already owns progress for the same target.
Superseded
The rooted lifecycle says a newer rooted topology already superseded this transition target.
Aborted
The rooted lifecycle says an apply or terminal event does not match the current pending rooted target.
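The pre-persist assessment of a plan event can be sketched as a pure function from rooted state to a phase. This is an illustration only: RootedView, Assessment, AssessPlan, and the precedence of the checks are assumptions, and the Admitted, Cancelled, and Aborted phases are left out because they depend on runtime or terminal events not modeled here.

```go
package main

import "fmt"

type Phase string

const (
	Planned    Phase = "Planned"
	Completed  Phase = "Completed"
	Conflicted Phase = "Conflicted"
	Superseded Phase = "Superseded"
)

// RootedView is a hypothetical summary of rooted state for one transition target.
type RootedView struct {
	TargetSatisfied bool   // requested topology is already present
	PendingID       string // transition that owns progress, "" when none
	Epoch           uint64 // current rooted epoch of the target
}

// Assessment pairs a phase with a human-readable detail.
type Assessment struct {
	Phase  Phase
	Detail string
}

// AssessPlan evaluates a plan event before it is persisted.
// The precedence of the checks is an assumption of this sketch.
func AssessPlan(id string, sourceEpoch uint64, v RootedView) Assessment {
	switch {
	case v.TargetSatisfied:
		return Assessment{Completed, "target topology already present"}
	case sourceEpoch < v.Epoch:
		return Assessment{Superseded, fmt.Sprintf("planned at epoch %d, rooted epoch is %d", sourceEpoch, v.Epoch)}
	case v.PendingID != "" && v.PendingID != id:
		return Assessment{Conflicted, fmt.Sprintf("target owned by %s", v.PendingID)}
	default:
		return Assessment{Planned, "intent recorded, not yet admitted"}
	}
}

func main() {
	v := RootedView{PendingID: "tr-a", Epoch: 9}
	fmt.Println(AssessPlan("tr-b", 9, v).Phase)
}
```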
7.4 Why lifecycle matters
A formal lifecycle enables:
- clear scheduling decisions
- proper retry/backoff
- stuck transition recovery
- scheduler/control-plane runtime clarity
- precise testing around publish boundaries
Without it, the system keeps relying on partial signals scattered across:
- rooted events
- in-memory views
- runtime heuristics
8. Degraded Semantics
NoKV already has some degraded behavior:
- followers serve stale route views
- route cache may survive Coordinator outages
- scheduler paths may degrade
These behaviors should become explicit protocol states.
8.1 Mode definitions
Healthy
Normal serving mode.
Rooted truth, catalog freshness, and serving guarantees are all within policy.
CoordinatorDegraded
The Coordinator process is alive, but not all control-plane functions can be performed normally.
Examples:
- partial RPC surface availability
- write restrictions while leadership is unsettled
RootLagging
Rooted truth exists, but this node’s materialized catalog is behind allowed freshness bounds.
This is not full unavailability. It is a serving restriction mode.
RootUnavailable
The rooted backend cannot currently provide enough truth to support valid control-plane service.
In this mode:
- truth-sensitive reads fail
- rooted writes fail
- diagnostics may still be exposed
ViewOnly
The node may still expose a stale materialized catalog, but cannot satisfy freshness-sensitive contracts.
This mode is useful because it makes “stale but useful” explicit.
8.2 Why this should be formal
Without explicit degraded modes, callers only see:
- transport failure
- not leader
- route unavailable
Those errors do not express the actual system state.
A real degraded protocol lets callers answer:
- retry elsewhere?
- retry later?
- accept stale?
- fail fast?
8.3 Relationship to freshness
DegradedMode and Freshness are related but not identical.
- Freshness is the contract requested and served for one read
- DegradedMode is the broader operating condition of the serving node
A node may be:
- Healthy and still reject a Strong read because it is not leader
- RootLagging and still serve BestEffort
- ViewOnly and still serve diagnostics
That distinction should remain sharp.
8.4 Current coordinator contract
The current implementation already enforces a concrete degraded-mode contract at the Coordinator RPC boundary.
Metadata reads (GetRegionByKey)
- Freshness=BEST_EFFORT
  - serves from the local materialized catalog even when meta/root is currently unavailable
  - returns degraded_mode=ROOT_UNAVAILABLE when the rooted snapshot cannot be reloaded
  - returns degraded_mode=ROOT_LAGGING when the local catalog trails rooted truth
- Freshness=BOUNDED
  - rejects when meta/root is unavailable
  - rejects when root_lag > max_root_lag
  - rejects when catch-up is still BOOTSTRAP_REQUIRED
- Freshness=STRONG
  - rejects on followers
  - rejects whenever root_lag > 0
  - rejects when meta/root is unavailable
In all cases, successful replies carry the current answerability witness:
- served_root_token
- current_root_token
- root_lag
- catch_up_state
- degraded_mode
- serving_class
- sync_health
Duty-gated writes (AllocID, TSO, scheduler decisions)
These do not have a degraded fallback.
- the local coordinator must first campaign / renew the rooted lease
- the rooted lease must still be active for the local holder
- the rooted era must not already be sealed
- the rooted duty mask must admit the requested action
If any of those fail, the request is rejected instead of falling back to stale local state. This is the current boundary between:
- read-path degradation
- write-path fail-stop admission
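The fail-stop side of that boundary can be sketched as an ordered gate check with no fallback branch. Everything named here is hypothetical: LeaseView, the Duty bit values, AdmitDuty, and the error text are illustrative assumptions rather than the Coordinator's real types.

```go
package main

import (
	"fmt"
	"time"
)

type Duty uint32

const (
	DutyAllocID Duty = 1 << iota
	DutyTSO
	DutySchedule
)

// LeaseView is a hypothetical cached mirror of the rooted lease record.
type LeaseView struct {
	Holder    string
	ExpiresAt time.Time
	EraSealed bool
	DutyMask  Duty
}

// AdmitDuty applies the four gates in order and fails closed on the first miss.
// There is deliberately no branch that falls back to stale local state.
func AdmitDuty(self string, want Duty, l LeaseView, now time.Time) error {
	switch {
	case l.Holder != self:
		return fmt.Errorf("lease held by %q, not %q", l.Holder, self)
	case !now.Before(l.ExpiresAt):
		return fmt.Errorf("lease expired at %s", l.ExpiresAt.Format(time.RFC3339))
	case l.EraSealed:
		return fmt.Errorf("rooted era is sealed")
	case l.DutyMask&want == 0:
		return fmt.Errorf("duty %#x not admitted by mask %#x", uint32(want), uint32(l.DutyMask))
	}
	return nil
}

func main() {
	now := time.Now()
	lease := LeaseView{Holder: "coord-1", ExpiresAt: now.Add(5 * time.Second), DutyMask: DutyAllocID | DutyTSO}
	fmt.Println(AdmitDuty("coord-1", DutyTSO, lease, now))
	fmt.Println(AdmitDuty("coord-1", DutySchedule, lease, now))
}
```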
Lifecycle mutations (Seal, Confirm, Close, Reattach)
Lifecycle mutations are stricter than hot-path duty admission:
- they always re-read rooted state from storage before mutating
- they reject any stale-holder / expired-lease / sealed-era view
- they treat finality as a rooted safety condition, not a best-effort hint
That is why seal / confirm / close / reattach do not use the cached mirror admission path.
8.5 Operational diagnostics
DiagnosticsSnapshot() now exports both:
- the current degraded serving state (root, lease, audit, handover_witness)
- cumulative Eunomia counters under eunomia_metrics
eunomia_metrics is grouped into:
- tenure_era_transitions_total
- handover_stage_transitions_total
- gate_rejections_total
- guarantee_violations_total
The guarantee_violations_total buckets map directly to the four Eunomia
guarantees:
- primacy
- inheritance
- silence
- finality
9. API Direction
The most valuable first implementation step is at the Coordinator RPC boundary.
9.1 Read-side API direction
Read APIs should conceptually grow:
- freshness
- required_root_token
- max_root_lag
Read responses should conceptually expose:
- served_root_token
- served_freshness
- degraded_mode
- served_by_leader
9.2 Write-side API direction
Leader-only writes should remain leader-only.
Write requests should continue to require:
- rooted leadership
- expected cluster epoch where applicable
Write responses should eventually expose:
- accepted_root_token
- transition_id where topology change is involved
This makes a write result more precise than:
accepted = true
9.3 Diagnostics API direction
The control plane will likely also benefit from an explicit diagnostics surface.
Conceptually, that should expose:
- current rooted token
- catalog rooted token
- catch-up state
- degraded mode
- leader identity knowledge
- lag estimate
This may become:
- a dedicated diagnostics RPC
- metrics
- CLI output
or all three.
10. Storage and Catalog Direction
To support the protocol above, the Coordinator catalog should become rooted-token aware.
At minimum, the materialized control-plane view should track:
- catalog_root_token
- catalog_updated_at
- catch_up_state
- degraded_mode
Optional future metadata:
- root_lag
- last_reload_reason
- leader_observed
10.1 Ownership rule
This design does not change truth ownership.
The ownership line remains:
- meta/root owns durable truth
- coordinator/catalog owns materialized serving state
The catalog should become more informative, not more authoritative.
10.2 Materialization rule
The catalog must remain:
- rebuildable
- discardable
- follower-local
It should never become a second durable truth source.
That is a core invariant.
11. Invariants
This protocol should preserve the following invariants.
11.1 Truth ownership invariant
Only meta/root owns durable control-plane truth.
11.2 Materialization invariant
coordinator/catalog is always derived state, never authority.
11.3 Monotonic token invariant
The materialized rooted token of one node must never move backward.
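A guard at the single point where the catalog token is installed is one way to enforce this. The sketch below assumes a hypothetical Catalog type that models only the token (simplified to uint64). Equal tokens are accepted so that an idempotent reload is not treated as a regression.

```go
package main

import (
	"fmt"
	"sync"
)

// Catalog is a hypothetical stand-in for the materialized view.
// Only the rooted token is modeled.
type Catalog struct {
	mu        sync.Mutex
	rootToken uint64
}

// Advance installs next as the materialized token. A regression is refused
// so a stale reload can never move the view backward.
func (c *Catalog) Advance(next uint64) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if next < c.rootToken {
		return fmt.Errorf("root token regression: have %d, got %d", c.rootToken, next)
	}
	c.rootToken = next
	return nil
}

// RootToken returns the token the catalog has incorporated so far.
func (c *Catalog) RootToken() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.rootToken
}

func main() {
	var c Catalog
	fmt.Println(c.Advance(10))
	fmt.Println(c.Advance(7))
	fmt.Println(c.RootToken())
}
```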
11.4 No silent downgrade invariant
If a caller requests Strong or Bounded freshness and the node cannot satisfy it,
the server should reject rather than silently serve BestEffort.
11.5 Explicit stale service invariant
If stale service is allowed, the response should say so explicitly.
11.6 Transition identity invariant
Every control-plane transition must be referencable as a stable object, not just inferred from event timing.
12. Rollout State
The rollout stays incremental, but the first protocol line is already in use.
Phase 1: Freshness
Status: implemented
Delivered outcomes:
- GetRegionByKey can express requested freshness
- route responses disclose served freshness and rooted token
- follower-read behavior is no longer implicit
Phase 2: Catch-Up
Status: minimal v1 implemented
Delivered outcomes:
- CatchUpState
- formal bootstrap-required boundary
- rooted lag awareness in serving decisions
Still open:
- a wider public CatchUpAction surface
- more explicit recovery diagnostics
Phase 3: Transition
Status: minimal v1 implemented
Delivered outcomes:
- durable TransitionID
- explicit phase semantics across:
  - ListTransitions
  - AssessRootEvent
  - PublishRootEvent
- publish-time pre-persist lifecycle assessment
Still open:
- richer runtime phases
- stuck / timeout diagnosis
Phase 4: DegradedMode
Status: minimal v1 implemented
Delivered outcomes:
- explicit degraded semantics in route responses
- route-serving rejection under rooted lag / rooted unavailability
Still open:
- broader surfacing through metrics and diagnostics
- tighter client retry policy based on degraded state
13. What Not To Do
The following are intentionally out of scope for this line of work:
- inventing a new general-purpose consensus algorithm
- replacing Raft in the mainline system
- redesigning 2PC before control-plane semantics are explicit
- collapsing rooted truth and catalog into one mixed layer
- treating stale follower service as an undocumented optimization
NoKV’s control-plane innovation should come from stronger semantics and clearer ownership, not from unnecessary reinvention of already mature primitives.
14. Current Practical Naming Guidance
If this protocol starts landing in code, the implementation should prefer:
- RootToken
- Freshness
- CatchUpState
- CatchUpAction
- TransitionID
- DegradedMode
For execution-plane work, prefer:
- Admission
- ExecutionTarget
- ExecutionOutcome
- PublishState
- RestartState
Avoid reintroducing weaker names like:
- state kind
- stale mode
- sync status
- reload reason as the primary protocol object
Those may still exist as helper fields, but the public model should stay anchored to the smaller protocol vocabulary above.
15. Execution-Plane Protocol
The execution plane is the contract between:
- raftstore
- local leader peer runtime
- local durable recovery state
- the control-plane publish boundary
Its job is different from the control plane.
The control plane answers:
- what topology truth exists?
- how fresh is the served view?
- what transition lifecycle is visible globally?
The execution plane answers:
- may this request enter local execution now?
- what target is being executed?
- how far has local execution progressed?
- has terminal truth been published yet?
- what state is safe to recover after restart?
15.1 Why this matters
Without an explicit execution-plane protocol, the system keeps important distributed safety semantics hidden in code paths such as:
- request validation and cancellation
- queue admission and local degradation
- planned truth publication before local execution
- terminal truth publication after local apply
- restart reconciliation between localmeta, raft durable state, and Coordinator
Those are not low-level implementation details. They are correctness boundaries.
15.2 Protocol objects
The execution plane should be formalized around the following objects.
Admission
Admission is the local decision about whether one request may enter execution.
It should answer:
- is the local peer leader?
- is the region epoch valid?
- is the peer hosted and runnable?
- is the request cancelled or timed out already?
- is the queue or scheduler allowed to accept more work?
The important design rule is that admission must be explicit, not an accidental mix of local checks and fallback retries.
ExecutionTarget
ExecutionTarget is the concrete unit of work the execution plane is trying to
carry out.
Examples:
- one read command
- one raft write proposal
- one peer change target
- one split target
- one merge target
For topology changes, ExecutionTarget must remain causally tied to the rooted
transition object created by the control plane.
ExecutionOutcome
ExecutionOutcome is the local state reached by an admitted target.
Minimal useful states are:
- Rejected
- Queued
- Proposed
- Committed
- Applied
- Failed
This is the minimum needed to stop conflating “accepted by API”, “replicated by raft”, and “applied to local state”.
PublishState
PublishState tracks the boundary between local apply and control-plane truth
publication.
This is a first-class boundary in NoKV’s architecture:
- planned truth is published before execution
- terminal truth is published after local apply
The protocol must therefore distinguish:
- NotRequired
- Pending
- Published
- PublishFailed
This is the exact boundary where split/merge/peer-change correctness otherwise turns into invisible best-effort behavior.
RestartState
RestartState describes whether one store can safely resume from local durable
state.
It should answer:
- is local peer metadata self-consistent?
- is the local raft replay pointer usable?
- does the store need Coordinator catch-up only, or local rebuild first?
- is startup safe, degraded, or fatal?
This object exists to stop restart behavior from being an implicit composition of:
- raftstore/localmeta
- raft log replay
- ad hoc bootstrap logic
15.3 Request classes and admission
Execution-plane v1 should start by distinguishing three request classes:
- Read
  - local leader read admission
  - read-index / wait-applied preconditions
  - cancellation and deadline propagation
- Write
  - raft proposal admission
  - proposal tracking through commit/apply
  - retryable local rejection vs fatal local rejection
- Topology
  - peer change
  - split
  - merge
  - explicit coupling to planned and terminal rooted truth
These classes do not need separate RPC protocols, but they do need stable admission outcomes. At minimum, those outcomes should distinguish:
- NotLeader
- EpochMismatch
- NotHosted
- Canceled
- TimedOut
- QueueSaturated
- SchedulerDegraded
- Accepted
Without this line, request behavior remains split across store-local branches instead of becoming one coherent executor contract.
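A single admission function that returns one of these stable outcomes could look like the sketch below. It is a hypothetical illustration: PeerState, Admit, the demo helper, and the order of the checks are assumptions and do not describe the actual raftstore/store code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Reason string

const (
	NotLeader         Reason = "NotLeader"
	EpochMismatch     Reason = "EpochMismatch"
	NotHosted         Reason = "NotHosted"
	Canceled          Reason = "Canceled"
	TimedOut          Reason = "TimedOut"
	QueueSaturated    Reason = "QueueSaturated"
	SchedulerDegraded Reason = "SchedulerDegraded"
	Accepted          Reason = "Accepted"
)

// Err converts a rejection into an error that still carries the stable reason.
func (r Reason) Err() error {
	if r == Accepted {
		return nil
	}
	return fmt.Errorf("admission rejected: %s", r)
}

// PeerState is a hypothetical snapshot of the local peer and its queue.
type PeerState struct {
	Hosted, Leader     bool
	Epoch              uint64
	QueueLen, QueueCap int
	SchedulerDegraded  bool
}

// Admit runs the local checks in a fixed order; the order is an assumption.
func Admit(ctx context.Context, reqEpoch uint64, p PeerState) Reason {
	switch err := ctx.Err(); {
	case errors.Is(err, context.Canceled):
		return Canceled
	case errors.Is(err, context.DeadlineExceeded):
		return TimedOut
	case !p.Hosted:
		return NotHosted
	case !p.Leader:
		return NotLeader
	case reqEpoch != p.Epoch:
		return EpochMismatch
	case p.QueueLen >= p.QueueCap:
		return QueueSaturated
	case p.SchedulerDegraded:
		return SchedulerDegraded
	default:
		return Accepted
	}
}

// demo exercises one accepted request and three rejected ones.
func demo() []Reason {
	live := context.Background()
	dead, cancel := context.WithCancel(live)
	cancel()
	leader := PeerState{Hosted: true, Leader: true, Epoch: 5, QueueCap: 8}
	follower := PeerState{Hosted: true, Epoch: 5, QueueCap: 8}
	return []Reason{
		Admit(live, 5, leader),
		Admit(dead, 5, leader),
		Admit(live, 4, leader),
		Admit(live, 5, follower),
	}
}

func main() {
	for _, r := range demo() {
		fmt.Println(r, r.Err())
	}
}
```

Returning a Reason value rather than a bare error keeps the outcome usable as a log field, metric label, and test assertion at the same time.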
15.4 Publish lifecycle
Execution-plane v1 should also make the publish boundary explicit for topology work.
The minimal lifecycle is:
- PlannedPublished
- LocallyExecuting
- Applied
- TerminalPublishPending
- TerminalPublished
- TerminalPublishFailed
The important rule is that Applied and TerminalPublished are different
states. Local execution success does not mean global lifecycle completion until
terminal truth is durably published.
This is the boundary that should align:
- raftstore/store/transition_builder.go
- raftstore/store/transition_executor.go
- raftstore/store/transition_outcome.go
- raftstore/store/scheduler_runtime.go
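The publish lifecycle can be sketched as a small state machine whose transition table makes the Applied versus TerminalPublished distinction explicit. The stage names follow the lifecycle listed above. The Step and Done helpers, and the choice to let TerminalPublishFailed retry back to TerminalPublishPending, are assumptions of this sketch.

```go
package main

import "fmt"

type Stage string

const (
	PlannedPublished       Stage = "PlannedPublished"
	LocallyExecuting       Stage = "LocallyExecuting"
	Applied                Stage = "Applied"
	TerminalPublishPending Stage = "TerminalPublishPending"
	TerminalPublished      Stage = "TerminalPublished"
	TerminalPublishFailed  Stage = "TerminalPublishFailed"
)

// next lists the allowed successor stages. Treating TerminalPublishFailed as
// retryable back to TerminalPublishPending is an assumption of this sketch.
var next = map[Stage][]Stage{
	PlannedPublished:       {LocallyExecuting},
	LocallyExecuting:       {Applied},
	Applied:                {TerminalPublishPending},
	TerminalPublishPending: {TerminalPublished, TerminalPublishFailed},
	TerminalPublishFailed:  {TerminalPublishPending},
}

// Step returns to when the move is legal, and from plus an error otherwise.
func Step(from, to Stage) (Stage, error) {
	for _, s := range next[from] {
		if s == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal publish transition %s -> %s", from, to)
}

// Done reports global lifecycle completion; Applied alone is not enough.
func Done(s Stage) bool {
	return s == TerminalPublished
}

func main() {
	s := Applied
	fmt.Println(Done(s))
	_, err := Step(s, TerminalPublished)
	fmt.Println(err)
}
```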
15.5 First landing points
Execution-plane protocol v1 landed first in the places that already carried the boundary implicitly:
- raftstore/store/command_ops.go
  - request admission and context semantics
- raftstore/store/command_pipeline.go
  - request lifecycle states visible to callers
- raftstore/store/scheduler_runtime.go
  - queue overflow / degraded local behavior
- raftstore/store/transition_builder.go
  - execution target construction from rooted truth
- raftstore/store/transition_executor.go
  - local execution and apply boundary
- raftstore/store/transition_outcome.go
  - terminal truth publication result
- raftstore/localmeta
  - restart state and local recovery truth
These files still do not expose a new public API. But they now share one explicit local protocol vocabulary instead of inventing those semantics independently.
15.6 Execution invariants
The execution-plane protocol should preserve the following invariants.
Admission invariant
Every externally visible rejection should map to a stable admission reason, not only a transport error or generic retry exhaustion.
No skipped publish boundary invariant
If local apply completed but terminal truth publication did not, the system must surface that state explicitly. It must not be silently treated as fully complete.
Restart truth boundary invariant
Restart must derive hosted peer truth from local durable state, not from bootstrap config. Static config may resolve addresses, but must not overwrite runtime truth.
No hidden drop invariant
Queue overflow, scheduler degradation, and publish retry loss must be explicit protocol states or metrics-backed outcomes, not silent local behavior.
15.7 Minimal rollout target
Execution-plane protocol v1 started small.
The minimum useful delivered line is now:
- request admission
- topology execution outcome
- publish boundary state
- restart state
That is enough to formalize the most dangerous boundaries without trying to protocolize every internal raft detail.
16. Priority and Rollout Order
The next protocol work should avoid widening either protocol until the current v1 contracts have proven small, observable, and well tested.
16.1 What is implemented now
The control plane has a minimal, externally visible contract:
- freshness classes
- rooted token / lag
- degraded serving state
- transition identity
The execution plane now has a minimal internal contract:
- admission class / reason
- topology outcome
- publish state
- restart state
- admin-visible ExecutionStatus
That is enough for v1. It gives tests and operators names for the important
boundaries without turning raftstore into a policy engine.
16.2 What should not happen next
The wrong next step would be to keep enriching lifecycle phases and diagnostic fields before the existing v1 state proves stable under recovery and integration tests.
That would create a vocabulary mismatch:
- control plane claims richer transition semantics than the executor can act on
- execution plane reports more states than the coordinator can use safely
16.3 Recommended order
- Keep control-plane v1 and execution-plane v1 narrow.
- Add tests around the existing publish/restart/admission states before adding new states.
- Only then tighten control-plane v1 toward richer scheduler/runtime phases.
In short:
stabilize both v1 contracts first, then deepen scheduler/runtime semantics.