feat(encryption): Stage 6B-2 — KEK plumbing + --encryption-enabled flag + mutator-RPC gate#776
bootjp wants to merge 4 commits into
Conversation
…ed flag + mutator-RPC gate

Stage 6B-2 per the PR #762 plan. Completes Stage 6B by wiring the operator-facing surface: --kekFile / --encryption-enabled flags, KEK loader at startup, applier threading, and the re-enabled mutator-RPC wiring in registerEncryptionAdminServer gated on (--encryption-enabled AND --kekFile non-empty).

## What this commit ships

### New flags (main.go)

- --kekFile (default ""): §5.1 KEK file path (32 raw bytes, owner-only mode). When set, the file-backed kek.Wrapper is constructed at startup and threaded into the §6.3 EncryptionApplier so ApplyBootstrap / ApplyRotation can KEK-unwrap. --kekUri is reserved for KMS providers in Stage 9; only file backing ships in 6B-2.
- --encryption-enabled (default false): §6.5 opt-in to the mutating EncryptionAdmin RPCs. Operator-explicit gate: an unset flag means the cluster has not committed to the §7.1 rollout, so mutators MUST refuse even on a fully-keyed binary.

### Startup wiring (main.go run())

- loadKEKWrapperFromFlag() constructs the file-backed wrapper from --kekFile (or returns nil if empty). Extracted to keep run() under the cyclop budget.
- A single shared *encryption.Keystore is created once per process. Both Stage 6A (applier) and Stage 6D (storage cipher; not in this PR) read from the same instance so post-bootstrap DEKs are visible cluster-wide.
- buildShardGroups gains three new tail parameters: (kekWrapper, keystore, sidecarPath). These thread through to the per-shard FSM construction.

### Per-shard applier construction (main.go buildShardGroups)

applierOptionsFor() helper assembles the variadic ApplierOption slice based on which Stage 6B-2 dependencies the operator wired:

- kekWrapper != nil → WithKEK(kekWrapper)
- keystore != nil → WithKeystore(keystore)
- sidecarPath != "" → WithSidecarPath(sidecarPath)

Without any of them, the applier stays in the Stage 6A posture (ApplyBootstrap / ApplyRotation return ErrKEKNotConfigured at apply time). This is the desired fail-closed behaviour for clusters that have not opted in.

### Mutator gate (main_encryption_admin.go)

registerEncryptionAdminServer regains its (engine, enableMutators) parameters that Stage 5D removed. When enableMutators=true AND engine!=nil, WithEncryptionAdminProposer + WithEncryptionAdminLeaderView are wired and mutators reach Raft. When either condition is false, both options stay off and mutators continue to refuse with FailedPrecondition at the RPC boundary.

encryptionMutatorsEnabled() is the readback: returns true iff *encryptionEnabled && *kekFile != "". Both conditions are independently necessary:

- --encryption-enabled is the explicit operator opt-in.
- --kekFile being non-empty means a KEK source is loaded; without it, a mutator that committed would land in the applier with no KEK and HaltApply on every replica. The RPC-layer gate keeps that halt unreachable.

The encryptionAdminEngine local interface is re-introduced (raftengine.Proposer ∩ raftengine.LeaderView) so the registration helper does not pull in the full engine type.

### Tests (main_encryption_admin_test.go)

Existing TestEncryptionAdmin_MutatingRPCRefusedUntilStage6 replaced with two complementary tests covering all 4 corners of the (enableMutators, engine) gate matrix:

- TestEncryptionAdmin_MutatingRPCRefusedWhenGateOff: 3 sub-cases (flag_off_engine_nil, flag_off_engine_set, flag_on_engine_nil), all of which return FailedPrecondition.
- TestEncryptionAdmin_MutatingRPCEnabledWhenGateOn: the remaining (flag_on_engine_set) corner. Asserts that the FailedPrecondition gate is NO LONGER firing (deeper status codes from the empty-payload validation are fine; what matters is the gate boundary).

A stubEncryptionAdminEngine fake satisfies the encryptionAdminEngine interface for the gate-on test.

### Tests updated for buildShardGroups signature

multiraft_runtime_test.go (2 sites) and main_bootstrap_e2e_test.go (1 site) updated to pass nil, nil, "" for the 3 new tail parameters. These tests do not exercise encryption paths; the no-options posture leaves the FSM in Stage 6A behavior.

## Caller audit (semantic changes: signatures)

- buildShardGroups: 3 new tail parameters. 1 production caller (main.go run() at line 337), 3 test callers updated. All updated together.
- registerEncryptionAdminServer: 2 new tail parameters (enableMutators bool, engine encryptionAdminEngine). 1 production caller (main.go startRaftServers at line ~1390), 2 test callers updated.
- The mutator gate is fail-CLOSED by default: enableMutators defaults to false at every test site (no fake engine supplied), so the operator-surface posture is identical to Stage 5D / 6A in the absence of explicit opt-in.
- main.go run() now constructs a process-wide *Keystore and optionally a *kek.FileWrapper. Both live for the process lifetime and are threaded to every shard's applier.

## Stage 5D safety boundary

The Stage 5D regression (operators who set --encryptionSidecarPath alone must not see mutators wired) is preserved by the double gate. sidecarPath alone is strictly the capability surface; the mutator wiring is gated on the AND of two SEPARATE flags.

## Five-lens self-review

1. Data loss: no data path touched. The applier remains fail-closed under HaltApply for any ErrEncryptionApply return. Pebble Sync semantics unchanged.
2. Concurrency / distributed failures: the shared Keystore is internally locked (sync.RWMutex per encryption.Keystore). The KEK wrapper is documented as safe for concurrent use. Per-shard applier reads from the shared instances.
3. Performance: no hot-path change. KEK loading is a single file read at startup. Per-shard applier construction adds one slice allocation per group (applierOptionsFor returns a 0-3 element slice).
4. Data consistency: the mutator gate is the load-bearing safety boundary. With either gate input false, the RPC layer refuses; with both true, the applier still has the §5.6 / §5.2 input-validation guards from Stage 6B-1 (bootstrap idempotency, foreign-DEK batch rejection, proposer-DEK mismatch, etc.).
5. Test coverage: 4 gate corners + the existing applier tests + the buildShardGroups callers. The gate matrix is exhaustively tested.

## Verification

- go test -race -timeout=120s -run 'TestEncryptionAdmin|TestRegisterEncryptionAdminServer|TestApply|TestNewApplier|TestWriterRegistry' ./internal/encryption/... ./store/... ./kv/... .: PASS
- go build ./...: PASS
- golangci-lint run ./...: 0 issues on touched files (cyclop / mnd under budget via factored helpers)

## Plan

Stage 6B is now complete (6B-1 applier internals + 6B-2 main.go wiring + flag + gate). The next stages per PR #762:

- 6C: §9.1 startup refusal guards (sidecar present without flag, KEK mismatch, sidecar/raft index gap, fsync support, local_epoch rollback / exhaustion, node_id collision).
- 6D: §6.6 enable-storage-envelope admin RPC + §7.1 Phase-1 storage cutover.
- 6E: §6.6 enable-raft-envelope admin RPC + §7.1 Phase-2 raft cutover + ErrRaftUnwrapFailed.
- 6F: §6.5 --encryption-rotate-on-startup ergonomics.
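The per-shard option assembly described in the commit message can be sketched as follows. This is a minimal illustration, not the repository's code: the `Applier`, `Keystore`, `KEKUnwrapper`, `fakeKEK` types and the `configured` method are hypothetical stand-ins, and only the names `applierOptionsFor`, `WithKEK`, `WithKeystore` and `WithSidecarPath` come from the text above.

```go
package main

import "fmt"

// KEKUnwrapper, Keystore and Applier are hypothetical stand-ins for the
// real types; only their role in the option wiring is modelled here.
type KEKUnwrapper interface {
	Unwrap(wrapped []byte) ([]byte, error)
}

type Keystore struct{}

type Applier struct {
	kek         KEKUnwrapper
	keystore    *Keystore
	sidecarPath string
}

// ApplierOption is the usual Go functional-option shape.
type ApplierOption func(*Applier)

func WithKEK(k KEKUnwrapper) ApplierOption    { return func(a *Applier) { a.kek = k } }
func WithKeystore(ks *Keystore) ApplierOption { return func(a *Applier) { a.keystore = ks } }
func WithSidecarPath(p string) ApplierOption  { return func(a *Applier) { a.sidecarPath = p } }

func NewApplier(opts ...ApplierOption) *Applier {
	a := &Applier{}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

// configured reports whether all three dependencies were wired.
func (a *Applier) configured() bool {
	return a.kek != nil && a.keystore != nil && a.sidecarPath != ""
}

// applierOptionsFor emits one option per dependency that is present,
// so the result has between 0 and 3 elements.
func applierOptionsFor(kek KEKUnwrapper, ks *Keystore, sidecarPath string) []ApplierOption {
	var opts []ApplierOption
	if kek != nil {
		opts = append(opts, WithKEK(kek))
	}
	if ks != nil {
		opts = append(opts, WithKeystore(ks))
	}
	if sidecarPath != "" {
		opts = append(opts, WithSidecarPath(sidecarPath))
	}
	return opts
}

// fakeKEK is a hypothetical unwrapper that returns its input unchanged.
type fakeKEK struct{}

func (fakeKEK) Unwrap(w []byte) ([]byte, error) { return w, nil }

func main() {
	bare := NewApplier(applierOptionsFor(nil, nil, "")...)
	full := NewApplier(applierOptionsFor(fakeKEK{}, &Keystore{}, "/tmp/sidecar")...)
	fmt.Println(bare.configured(), full.configured())
}
```

With no dependencies the slice is empty and the applier stays unconfigured, which corresponds to the fail-closed Stage 6A posture the message describes.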
📝 Walkthrough

Adds encryption opt-in wiring for Stage 6B-2: new CLI flags (

Changes

Stage 6B-2 Encryption Opt-In and Admin Gating
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly Related PRs
|
@claude review Stage 6B-2 — completes Stage 6B with the operator-facing surface ( Looking for review on:
|
|
Claude finished @bootjp's task in 3m 36s —— View job Code Review: Stage 6B-2 — KEK plumbing +
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e15568f3f
    // is colocated with the registerEncryptionAdminServer helper
    // that consumes it.
    func encryptionMutatorsEnabled() bool {
    	return *encryptionEnabled && *kekFile != ""
Include sidecar-path in mutator gate
This gate enables mutating EncryptionAdmin RPCs when --encryption-enabled and --kekFile are set, but it ignores whether --encryptionSidecarPath is configured. BootstrapEncryption/RotateDEK can then propose entries that reach apply with only partial applier wiring; internal/encryption/applier.go requires KEK+keystore+sidecar together (bootstrapAndRotationConfigured) and otherwise returns ErrKEKNotConfigured, which is a halt-apply error path. In practice, a node started with encryption enabled + KEK file but empty sidecar path can accept mutator RPCs and trigger cluster apply halt once those entries commit.
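The mismatch this finding describes can be checked mechanically by enumerating flag combinations and looking for cases where the RPC gate admits a mutator while the applier is not ready. The sketch below is illustrative only: `config`, `doubleGate`, `tripleGate`, `applierReady` and `unsafeConfigs` are hypothetical names, the paths are placeholders, and the keystore is assumed to always be present, as the PR description says it is created once per process.

```go
package main

import "fmt"

// config models the three node-local flags discussed in the finding.
type config struct {
	encryptionEnabled bool
	kekFile           string
	sidecarPath       string
}

// doubleGate is the round-1 shape: opt-in flag AND KEK file.
func doubleGate(c config) bool { return c.encryptionEnabled && c.kekFile != "" }

// tripleGate additionally requires the sidecar path.
func tripleGate(c config) bool { return doubleGate(c) && c.sidecarPath != "" }

// applierReady models the applier-side predicate under the assumption
// that the keystore is always wired.
func applierReady(c config) bool { return c.kekFile != "" && c.sidecarPath != "" }

// unsafeConfigs lists every combination where the gate would accept a
// mutator RPC although the applier could not apply it.
func unsafeConfigs(gate func(config) bool) []config {
	var out []config
	for _, enabled := range []bool{false, true} {
		for _, kek := range []string{"", "/etc/kek"} {
			for _, sidecar := range []string{"", "/var/sidecar"} {
				c := config{enabled, kek, sidecar}
				if gate(c) && !applierReady(c) {
					out = append(out, c)
				}
			}
		}
	}
	return out
}

func main() {
	fmt.Println("double gate unsafe configs:", len(unsafeConfigs(doubleGate)))
	fmt.Println("triple gate unsafe configs:", len(unsafeConfigs(tripleGate)))
}
```

Under these assumptions the two-flag gate leaves exactly one unsafe combination (opt-in and KEK file set, sidecar path empty), and adding the sidecar path to the gate removes it.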
Actionable comments posted: 1
🧹 Nitpick comments (1)
main_encryption_admin_test.go (1)
144-149: ⚡ Quick win

Use a bounded context for the helper RPC call.

The helper uses a background context; if the request path stalls, this can hang the test. Use context.WithTimeout for deterministic failure.

Suggested timeout guard

    -	_, err = client.BootstrapEncryption(context.Background(), &pb.BootstrapEncryptionRequest{
    +	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    +	defer cancel()
    +	_, err = client.BootstrapEncryption(ctx, &pb.BootstrapEncryptionRequest{
     		StorageDekId:      1,
     		RaftDekId:         2,
     		WrappedStorageDek: []byte("w"),
     		WrappedRaftDek:    []byte("w"),
     	})

     import (
     	"context"
     	"net"
     	"testing"
    +	"time"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@main_encryption_admin_test.go` around lines 144 - 149, The test calls client.BootstrapEncryption using context.Background(), which can hang; change the call to use a bounded context via context.WithTimeout (e.g., short timeout like a few seconds), defer the cancel, and pass that context into client.BootstrapEncryption so the helper RPC fails deterministically on stalls; update the invocation site where client.BootstrapEncryption is called in main_encryption_admin_test.go accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@main_encryption_admin_test.go`:
- Around line 118-122: The test currently only fails when status.Code(err) ==
codes.FailedPrecondition which lets unrelated transport/setup errors pass;
update the assertion around the BootstrapEncryption result so that when err !=
nil you check the returned status code (via status.Code(err)) and fail if it is
codes.FailedPrecondition, and otherwise require the code to be one of the
expected infra statuses (e.g., codes.Unavailable or codes.DeadlineExceeded);
implement this by replacing the current if-block with logic that computes code
:= status.Code(err) and calls t.Fatalf/t.Errorf if code ==
codes.FailedPrecondition or if code is not in the allowed set
(codes.Unavailable, codes.DeadlineExceeded), referencing BootstrapEncryption,
status.Code, and the codes constants.
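One way to structure such an assertion is to classify the observed status code into "gate still firing", "transport failure" and "gate open". The sketch below treats transport-level codes as failures, which is how the round-2 commit message later in this conversation describes the fix. To stay self-contained it uses a local `code` type rather than the real gRPC `codes` package, and `checkGateOn` is a hypothetical helper.

```go
package main

import "fmt"

// code is a local stand-in for gRPC status codes; the real test would
// use status.Code(err) and the codes constants instead.
type code int

const (
	OK code = iota
	Canceled
	InvalidArgument
	DeadlineExceeded
	FailedPrecondition
	Unavailable
)

// checkGateOn returns a failure message when the observed code does not
// demonstrate that the mutator gate is open, and "" otherwise.
func checkGateOn(c code) string {
	switch c {
	case FailedPrecondition:
		return "gate is still refusing the mutator"
	case Unavailable, DeadlineExceeded, Canceled:
		return "transport failure: gate behaviour was not observed"
	default:
		// OK or a deeper validation code: the request got past the gate.
		return ""
	}
}

func main() {
	for _, c := range []code{OK, InvalidArgument, FailedPrecondition, Unavailable} {
		fmt.Printf("code %d -> %q\n", c, checkGateOn(c))
	}
}
```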
📒 Files selected for processing (5)
- main.go
- main_bootstrap_e2e_test.go
- main_encryption_admin.go
- main_encryption_admin_test.go
- multiraft_runtime_test.go
Code Review
This pull request introduces a double-gate mechanism for encryption mutators, requiring both an explicit --encryption-enabled flag and a valid --kekFile path to enable mutating RPCs. The changes include refactoring the startup sequence to initialize a shared KEK wrapper and keystore, updating the shard group construction to thread these dependencies, and enhancing the gRPC server registration to enforce the new gating logic. Feedback focuses on improving maintainability by using interfaces instead of concrete types for the KEK wrapper in helper functions, which will facilitate future KMS integrations.
    func loadKEKWrapperFromFlag() (*kek.FileWrapper, error) {
    	if *kekFile == "" {
    		return nil, nil
    	}
    	w, err := kek.NewFileWrapper(*kekFile)
    	if err != nil {
    		return nil, errors.Wrapf(err, "failed to load KEK from %s", *kekFile)
    	}
    	return w, nil
Consider returning the kek.Wrapper interface instead of the concrete *kek.FileWrapper to simplify future KMS integrations. This is a provider-side interface; application code should not depend on or type-assert it to ensure maintainability and prevent unintended dependencies.
    -func loadKEKWrapperFromFlag() (*kek.FileWrapper, error) {
    -	if *kekFile == "" {
    -		return nil, nil
    -	}
    -	w, err := kek.NewFileWrapper(*kekFile)
    -	if err != nil {
    -		return nil, errors.Wrapf(err, "failed to load KEK from %s", *kekFile)
    -	}
    -	return w, nil
    +func loadKEKWrapperFromFlag() (kek.Wrapper, error) {
    +	if *kekFile == "" {
    +		return nil, nil
    +	}
    +	w, err := kek.NewFileWrapper(*kekFile)
    +	if err != nil {
    +		return nil, errors.Wrapf(err, "failed to load KEK from %s", *kekFile)
    +	}
    +	return w, nil
    +}
References
- When an interface is designed as an optional extension for internal use by a backend or adapter, clarify its role as a 'provider-side interface' and explicitly state that application code should not depend on or type-assert it.
    // suppresses its option, leaving the applier in the Stage 6A
    // posture for that axis. Extracted from buildShardGroups so the
    // per-shard loop stays under the cyclop complexity budget.
    func applierOptionsFor(kekWrapper *kek.FileWrapper, keystore *encryption.Keystore, sidecarPath string) []encryption.ApplierOption {
Using the encryption.KEKUnwrapper interface for the kekWrapper parameter makes this helper more flexible. As this is a provider-side interface, application code should not depend on or type-assert it, improving maintainability and aligning with the requirements of encryption.NewApplier.
    -func applierOptionsFor(kekWrapper *kek.FileWrapper, keystore *encryption.Keystore, sidecarPath string) []encryption.ApplierOption {
    +func applierOptionsFor(kekWrapper encryption.KEKUnwrapper, keystore *encryption.Keystore, sidecarPath string) []encryption.ApplierOption {
…um x2 + coderabbit minor

4 findings on round-1 (commit 1e15568), all addressed.

## codex P1: Include sidecar-path in mutator gate

The Stage 6B-2 double-gate (--encryption-enabled AND --kekFile) let an operator with both flags set but --encryptionSidecarPath empty trigger a cluster halt:

1. Mutator RPC accepted (Proposer + LeaderView wired)
2. Proposal commits
3. Every replica's applier rejects the entry because bootstrapAndRotationConfigured() requires WithKEK + WithKeystore + WithSidecarPath together; sidecarPath was empty so WithSidecarPath was never installed
4. HaltApply fires on every node → cluster halt

Fix: extend encryptionMutatorsEnabled() to a triple gate:

    return *encryptionEnabled && *kekFile != "" && *encryptionSidecarPath != ""

Without all three, the RPC layer refuses with FailedPrecondition before any proposal commits. The applier-side bootstrapAndRotationConfigured() halt becomes unreachable in practice (still load-bearing as defense-in-depth). Flag help on --encryptionSidecarPath updated to mention the triple gate.

## gemini medium (line 815): loadKEKWrapperFromFlag interface

Return type changed from *kek.FileWrapper (concrete) to kek.Wrapper (interface). Stage 9 KMS providers will satisfy the same interface; the call site stays decoupled from the file-mode provider.

## gemini medium (line 824): applierOptionsFor parameter interface

Parameter type changed from *kek.FileWrapper to encryption.KEKUnwrapper. The applier only needs Unwrap, and any kek.Wrapper satisfies encryption.KEKUnwrapper structurally because both declare Unwrap with the same signature.

buildShardGroups parameter type updated in step with these two: now takes kek.Wrapper (so the variable flowing from loadKEKWrapperFromFlag stays the same interface end-to-end).

## coderabbit minor: Tighten gate-on assertion

TestEncryptionAdmin_MutatingRPCEnabledWhenGateOn previously accepted any non-FailedPrecondition status, which would mask a transport failure (bufconn dial timeout, deadline exceeded, canceled context) as a 'gate is no longer firing' pass. Added explicit checks for codes.Unavailable / DeadlineExceeded / Canceled; those signal infra failure, not gate behavior.

## Caller audit

- encryptionMutatorsEnabled: stricter, so strictly fewer paths produce true. Operator-surface fail-CLOSED at the additional axis. Production caller (startRaftServers) is unchanged at the call site; the readback is purely stricter.
- loadKEKWrapperFromFlag return type: kek.Wrapper is the interface that *kek.FileWrapper implements. The one production caller (run()) assigns to a local var of inferred type, so no breakage.
- applierOptionsFor parameter type: encryption.KEKUnwrapper is satisfied by kek.Wrapper. The one production caller (buildShardGroups) passes the kek.Wrapper variable directly.
- buildShardGroups parameter type: kek.Wrapper supersedes *kek.FileWrapper. The one production caller passes the var from loadKEKWrapperFromFlag. The 3 test callers pass nil, which is a valid value for the interface-typed parameter.

## Verification

- go test -race -timeout=60s -run 'TestEncryptionAdmin|TestRegisterEncryptionAdminServer' . : PASS
- go build ./... : PASS
- golangci-lint run ./... : 0 issues on touched files
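Keeping the wrapper as an interface value end-to-end also sidesteps a general Go pitfall: a nil pointer of a concrete type, once converted to an interface, no longer compares equal to nil. The sketch below illustrates that language behaviour with hypothetical types (`Unwrapper`, `FileWrapper`, `loadConcrete`, `loadInterface`, `wired`); it is not a claim that the round-1 code had this bug, since round 1 compared the concrete pointer directly.

```go
package main

import "fmt"

// Unwrapper and FileWrapper are hypothetical stand-ins for the
// interface and the file-backed implementation discussed above.
type Unwrapper interface {
	Unwrap(wrapped []byte) ([]byte, error)
}

type FileWrapper struct{ key []byte }

func (w *FileWrapper) Unwrap(wrapped []byte) ([]byte, error) { return wrapped, nil }

// loadConcrete returns a concrete pointer, nil when no path is given.
func loadConcrete(path string) *FileWrapper {
	if path == "" {
		return nil
	}
	return &FileWrapper{}
}

// loadInterface returns the interface, with a true nil interface value
// when no path is given.
func loadInterface(path string) Unwrapper {
	if path == "" {
		return nil
	}
	return &FileWrapper{}
}

// wired mirrors an "is a KEK configured?" check on an interface value.
func wired(u Unwrapper) bool { return u != nil }

func main() {
	// A nil interface value is reported as not wired.
	fmt.Println(wired(loadInterface("")))
	// A nil *FileWrapper converted to Unwrapper is a non-nil interface,
	// so the same check reports it as wired.
	fmt.Println(wired(loadConcrete("")))
}
```

Returning the interface from the loader, as the round-2 change does, means the nil check downstream sees a genuinely nil interface when no KEK file is configured.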
|
@claude review Round-2 (commit
Caller audit
Verification
|
|
Claude finished @bootjp's task in 2m 2s —— View job Claude's Round-2 Review Todo:
Round-2 Fix Verification

All four claimed fixes are confirmed correct.

Fix #1: Triple gate (

    func (a *Applier) bootstrapAndRotationConfigured() bool {
    	return a.kek != nil && a.keystore != nil && a.sidecarPath != ""
    }

The gate is now a structural mirror of the applier's own readiness check. The cluster-halt scenario from round-1 Finding 1 is closed. ✓

Fix #2: Return type is now the interface. The docstring correctly explains the KMS-provider extensibility rationale (Stage 9 slots in without touching

Fix #3: The narrow applier-side interface is used.

Fix #4: Gate-on test assertion tightening (

The test now has two failure branches:

buildShardGroups param type: Now takes

Remaining Notes (non-blocking)

1. CodeRabbit's round-1 nitpick on

2.

3. Pre-existing: Keystore not in FSM snapshots (carry-over from round-1 Finding 4)

Still relevant for Stage 6D planning. After a snapshot, if the bootstrap Raft entry falls before the snapshot cutoff it won't replay, leaving the Keystore empty on restart. The Stage 6D storage-cipher plumbing will need either snapshot serialization or sidecar-based Keystore hydration at startup. Worth adding explicitly to the Stage 6D design doc before implementation starts.

Overall Assessment

All round-1 blocking and medium findings are resolved. The triple gate correctly mirrors the applier's own readiness predicate, the interface-return changes are structurally sound (and the cycle-safe rationale is documented), and the test assertions now fail loudly on infra noise. The remaining notes are non-blocking carry-overs. Ready to merge.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f946c1590
    // is colocated with the registerEncryptionAdminServer helper
    // that consumes it.
    func encryptionMutatorsEnabled() bool {
    	return *encryptionEnabled && *kekFile != "" && *encryptionSidecarPath != ""
Block mutators until cluster-wide readiness is confirmed
encryptionMutatorsEnabled() only inspects local flags, so the first node restarted with all three settings will wire mutating RPCs even if peers are still on old config. In that mixed rollout window, a committed BootstrapEncryption/RotateDEK entry can reach followers whose appliers were built without full KEK+keystore+sidecar wiring (buildShardGroups/applierOptionsFor), causing ApplyBootstrap/ApplyRotation to return ErrKEKNotConfigured; the FSM marks that as ErrEncryptionApply, which halts apply on those replicas. This makes a single early admin call capable of stalling the cluster during staged restarts, so mutator enablement should be gated on cluster-wide capability/readiness rather than node-local flags alone.
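To make the distinction between a node-local gate and a cluster-wide one concrete, the sketch below combines the local flag result with a per-member capability report. Everything here is hypothetical: the `member` type, its `encryptionCapable` field and the `clusterReady` helper are illustrative and do not describe how the project implements or plans to implement its capability check.

```go
package main

import "fmt"

// member is a hypothetical view of one voter or learner and whether it
// reports being able to apply encryption entries.
type member struct {
	id                string
	encryptionCapable bool
}

// clusterReady admits mutators only when the local gate is open AND
// every member reports capability. It also returns the lagging members
// so an operator can see what blocks the rollout.
func clusterReady(localGate bool, members []member) (bool, []string) {
	var missing []string
	for _, m := range members {
		if !m.encryptionCapable {
			missing = append(missing, m.id)
		}
	}
	return localGate && len(missing) == 0, missing
}

func main() {
	mixed := []member{{"n1", true}, {"n2", false}, {"n3", true}}
	ok, missing := clusterReady(true, mixed)
	fmt.Println(ok, missing)

	upgraded := []member{{"n1", true}, {"n2", true}, {"n3", true}}
	ok, missing = clusterReady(true, upgraded)
	fmt.Println(ok, missing)
}
```

In the mixed-rollout case the local flags alone would open the gate, while the combined check keeps it closed until the lagging member is upgraded.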
…e scope limitation

codex r2 P1 on round-2 (commit 0f946c1): the triple-gate encryptionMutatorsEnabled() is purely node-local; during a staged rolling upgrade where some peers still run the pre-6B-2 binary or have the flags unset, a mutator proposal committed by a fully-configured leader would reach those replicas' appliers and HaltApply on the under-configured side.

## Resolution

This is a real concern but the cluster-wide readiness check belongs at the §7.1 Phase-1 cutover RPC layer (enable-storage-envelope) in Stage 6D, not in the node-local flag gate. PR #762's milestone breakdown explicitly assigns the Voters ∪ Learners capability fan-out gate to Stage 6D. Adding a partial cluster-wide check in Stage 6B-2 would duplicate the §7.1 capability gate and ship before the operator-facing GetCapability fan-out is wired into the admin CLI.

The right resolution is to document the scope limitation explicitly so:

1. Operators reading the code understand the gate's scope and the rolling-upgrade discipline expected at this layer (same constraint as PR #765's 6A→6B caveat).
2. The Stage 6D PR has a clear handoff point: the cluster-wide gate is its scope, and this docstring points forward to that.

## Doc-only

No code paths changed; the gate's three-flag readback is unchanged. Single 30-line docstring expansion explaining:

- the gate is THIS NODE local
- what cluster-wide readiness means at §7.1
- why Stage 6B-2 deliberately defers it to Stage 6D
- the operator-discipline constraint matching PR #765

## Caller audit

Pure doc edit. No callers to audit.

## Verification

- go build ./... : PASS
- go test -race -timeout=60s -run 'TestEncryptionAdmin|TestRegisterEncryptionAdminServer' . : PASS
- golangci-lint run ./... : 0 issues on touched files
|
@claude review Round-3 (commit codex r2 P1
Resolution

This is a real concern, but the cluster-wide readiness check belongs at the §7.1 Phase-1 cutover RPC layer (

PR #762's plan explicitly assigns the Voters ∪ Learners capability fan-out gate to Stage 6D:

Adding a partial cluster-wide check at the node-local flag-gate layer in Stage 6B-2 would duplicate the §7.1 capability gate and ship before the operator-facing

The two layers are complementary. The rolling-upgrade discipline ("don't enable any mutator RPC until every member reports encryption_capable") is the same operator-discipline constraint documented in PR #765 (Stage 6A) for the 6A→6B caveat.

Fix: 30-line docstring expansion

The

Operators reading the code now understand the gate's scope. The Stage 6D PR has a clear handoff point.

Caller audit

Doc-only. No code paths changed.

Other r2 notes (non-blocking, claude r2 verdict was 'Ready to merge')

Verification
|
|
Claude finished @bootjp's task in 1m 11s —— View job Claude's Round-3 Review Todo:
Round-3 Change Verification

Scope of change confirmed: commit

Scope-Boundary Argument Assessment

The expanded godoc addresses codex r2 P1 by documenting what the gate is, what it deliberately is not, and why. Evaluating each element:

"THIS NODE" emphasis at the function header: correct and prominent. An operator reading the function won't have to dig to find that this is a local check. ✓

HaltApply scenario reformulation: the revised text (

Three-bullet rationale for deferring to Stage 6D: each bullet is defensible:

PR #776 round-2 codex P1 cited as trigger: the audit trail is preserved in the docstring itself. Future readers can find the design decision without traversing git blame. ✓

One minor note (non-blocking): The PR #765 cross-reference is a PR number, not a design doc path. PR links can become stale (closed PRs stay accessible, but the context is less obvious than a

Round-2 Carry-Overs

All three deferred items remain correctly scoped:

Overall Assessment

Round-3 is a clean, doc-only change that directly and accurately addresses the codex r2 P1 concern. The scope-boundary argument is technically sound, the rationale is captured where it is actionable (at the function declaration), and the Stage 6D handoff point is clearly named. The carry-over items from round 2 are correctly deferred with documented rationale. Ready to merge.
…eference claude r3 verdict was 'Ready to merge' with one nice-to-have note: the scope-boundary docstring cited PR #765 by number but not the underlying design doc path. PR links can become stale; a path to the docs/design/ file makes the constraint findable without GitHub access. Added a reference to docs/design/2026_04_29_partial_data_at_rest_encryption.md (specifically the 6A rationale's 'Rolling 6A→6B upgrade caveat' section). The PR #765 reference is preserved as the historical trigger; the design doc path is the canonical source of the rolling-upgrade discipline going forward. Doc-only — no code paths touched.
|
@claude review Round-4 (commit Doc-only, 3 lines added, no code touched. Per your r3 verdict the PR is otherwise ready to merge; this just makes the rolling-upgrade discipline findable without GitHub access. Verification
|
|
Claude finished @bootjp's task in 48s —— View job Round-4 Review Todo:
Round-4 Change Verification

Scope of change confirmed: commit

Round-4 diff (in full):

    -// #765 (Stage 6A) for the 6A→6B rolling upgrade.
    +// #765 (Stage 6A) and in the Stage 6 plan in
    +// docs/design/2026_04_29_partial_data_at_rest_encryption.md
    +// (6A rationale, "Rolling 6A→6B upgrade caveat").

Design doc reference verified: The r3 "nice to have" is addressed accurately. The rolling-upgrade discipline is now findable without GitHub access, and the cross-reference path (

Overall Assessment

All four rounds of review complete. The change is doc-only, the referenced section exists and matches, and no carry-over items require action before merge. Ready to merge.
Summary
Stage 6B-2 per the PR #762 plan. Completes Stage 6B with the operator-facing surface: `--kekFile` / `--encryption-enabled` flags, KEK loader at startup, applier threading, and the re-enabled mutator-RPC wiring in `registerEncryptionAdminServer` gated on (`--encryption-enabled` AND `--kekFile` non-empty).

Builds on Stage 6B-1 (PR #768) which landed the real `ApplyBootstrap` / `ApplyRotation` paths. This PR wires those paths into production by threading KEK + Keystore + sidecar path through the FSM construction and re-enabling the EncryptionAdmin mutator RPCs under a double gate.

New flags

- `--kekFile` (default `""`): … `kek.FileWrapper` at startup; threaded into the §6.3 EncryptionApplier. `--kekUri` deferred to Stage 9 (KMS providers).
- `--encryption-enabled` (default `false`): … `--kekFile`) for `BootstrapEncryption` / `RotateDEK` / `RegisterEncryptionWriter` to wire.

Double-gate
Both flags are independently necessary for the mutator wiring:

- `--encryption-enabled` alone → no KEK loaded → mutators wired but ApplyBootstrap would HaltApply with `ErrKEKNotConfigured` on every replica. The RPC-layer gate refuses BEFORE the proposal commits.
- `--kekFile` alone → no operator opt-in → mutators stay refused at the RPC boundary.
- … `WithEncryptionAdminProposer` + `WithEncryptionAdminLeaderView` installed and mutators reach the §6.3 applier through the shard's engine.

Caller audit (semantic changes: signatures)

- `buildShardGroups` gains 3 new tail params `(kekWrapper *kek.FileWrapper, keystore *encryption.Keystore, sidecarPath string)`. 1 production caller (`main.go run()`), 3 test callers (`multiraft_runtime_test.go` ×2, `main_bootstrap_e2e_test.go` ×1), all updated to pass `nil, nil, ""` for the no-encryption posture.
- `registerEncryptionAdminServer` gains 2 new tail params `(enableMutators bool, engine encryptionAdminEngine)`. 1 production caller (`main.go startRaftServers`), 2 test callers (`main_encryption_admin_test.go`), all updated.

The mutator gate is fail-CLOSED by default at every site that does not explicitly opt in.
Test surface

- `TestEncryptionAdmin_MutatingRPCRefusedWhenGateOff` (3 sub-cases): … `FailedPrecondition`
- `TestEncryptionAdmin_MutatingRPCEnabledWhenGateOn`: … `FailedPrecondition` gate is NO LONGER firing

A `stubEncryptionAdminEngine` satisfies the `encryptionAdminEngine` interface for the gate-on test (no actual Raft proposal; the stub `Propose()` returns success).

Stage 5D safety boundary preserved

The Stage 5D regression (operators who set `--encryptionSidecarPath` alone must not see mutators wired) is preserved by the double gate. `sidecarPath` alone is strictly the capability surface; the mutator wiring is gated on the AND of two SEPARATE flags.

Five-lens self-review

- … `ErrEncryptionApply` return. Pebble Sync semantics unchanged.
- … `buildShardGroups` callers.

Test plan

- `go test -race -timeout=120s -run 'TestEncryptionAdmin|TestRegisterEncryptionAdminServer|TestApply|TestNewApplier|TestWriterRegistry' ./internal/encryption/... ./store/... ./kv/... .`: PASS
- `go build ./...`: PASS
- `golangci-lint run ./...`: 0 issues on touched files (cyclop / mnd under budget via factored `loadKEKWrapperFromFlag` + `applierOptionsFor` + `encryptionMutatorsEnabled` helpers)
- … `BootstrapEncryption` RPC, which is not part of the Jepsen workload set)

Plan
Stage 6B is now complete. The next stages per PR #762:

- … `enable-storage-envelope` admin RPC + §7.1 Phase-1 cutover
- … `enable-raft-envelope` admin RPC + §7.1 Phase-2 cutover
- … `--encryption-rotate-on-startup` ergonomics