Skip to content

fix(replication): bound replication queues with per-source fairness (D1 DoS)#99

Open
grumbach wants to merge 4 commits into
WithAutonomi:mainfrom
grumbach:fix/d1-bounded-replication-queues
Open

fix(replication): bound replication queues with per-source fairness (D1 DoS)#99
grumbach wants to merge 4 commits into
WithAutonomi:mainfrom
grumbach:fix/d1-bounded-replication-queues

Conversation

@grumbach
Copy link
Copy Markdown
Collaborator

@grumbach grumbach commented May 19, 2026

Summary

Security fix for D1: ReplicationQueues::pending_verify (HashMap) and fetch_queue (BinaryHeap) had no capacity bound — the source carried the project's own TODO. handle_neighbor_sync_request documents "No per-request hint count limit" and the only gate is sender_in_rt, so a single routing-table peer can flood NeighborSyncRequest hints (each message capped only by MAX_REPLICATION_MESSAGE_SIZE ≈ 10 MiB → ~320k 32-byte hints) and grow these structures unboundedly:

  • node memory exhaustion (OOM), and
  • a self-amplifying storm of outbound verification/fetch requests onto honest peers (reflective amplification).

Fix (two layers)

  1. Global memory backstopMAX_PENDING_VERIFY / MAX_FETCH_QUEUE (131_072 each). add_pending_verify / enqueue_fetch reject once full; callers already treat non-admission as "not added".

  2. Per-source fairness — the actual D1 defence. Each pending entry is accounted to its hint_sender via a pending_per_sender counter maintained in lockstep with pending_verify (insert / remove_pending / evict_stale). A single source may hold at most MAX_PENDING_VERIFY_PER_PEER (8_192) entries, so a flooding peer exhausts only its own quota and can never starve honest peers, whose quotas are independent. hint_sender is the cryptographically authenticated connection identity (not a forgeable payload field), so the accounting cannot be evaded.

A global cap alone would merely convert the memory DoS into a worse silent honest-replication starvation DoS (one ~4 MB message every <30 min permanently rejecting all honest hints). The per-source quota is what actually closes D1. This was caught by adversarial review and is the reason for layer 2.

Hardening: counter decrement uses saturating_sub + debug_assert!; get_pending_mut documents the no-reattribution invariant. A residual (~16 simultaneously-in-routing-table Sybil PeerIds to reach the global backstop) is documented as an accepted, tracked follow-up — it degrades only to the bounded memory backstop, never to silent starvation of non-Sybil peers.

Proof

tests/poc_d1_bounded_queues.rs (6 tests) demonstrates the attack is defeated:

  • poc_d1_flooding_peer_cannot_starve_honest_peerthe key test: a 508k-entry single-peer flood is clamped to 8 192 slots and honest hints are still admitted. Verified to FAIL against a global-cap-only fix (the weak fix), proving it validates the real defence, not just the memory bound.
  • Global-bound, fetch-queue-bound, per-source counter consistency (remove + stale eviction, no underflow/leak), get_pending_mut no-desync, and legitimate-traffic / dedup non-regression.

The original unbounded behaviour was also reproduced (pre-fix the queues grew 1:1 with attacker input); the flip was verified by neutering each enforcement layer and observing the corresponding test fail.

Validation

  • 6/6 D1 PoC tests pass.
  • All 460 library tests pass (incl. 190 replication tests) — no regression.
  • cfd clean (fmt + clippy, no warnings/errors).
  • Two rounds of independent multi-agent adversarial review; all blockers resolved.

Copilot AI review requested due to automatic review settings May 19, 2026 06:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the replication scheduling queues against a D1 DoS by bounding memory growth and enforcing per-source fairness so a single routing-table peer cannot monopolize verification/fetch work.

Changes:

  • Add global capacity limits for pending_verify and fetch_queue, rejecting/dropping admissions once full.
  • Add per-hint_sender accounting (pending_per_sender) to enforce a per-peer pending cap for fairness.
  • Add an integration PoC/regression test suite covering global bounds, per-source fairness, and counter consistency.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/replication/scheduling.rs Introduces global/per-sender caps, per-sender accounting, and bounded enqueue behavior for replication queues.
tests/poc_d1_bounded_queues.rs Adds PoC regression tests validating bounded queues and per-source fairness under flooding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/replication/scheduling.rs Outdated
Comment on lines 213 to 223
/// Get a mutable reference to a pending verification entry.
///
/// INVARIANT: callers MUST NOT reassign `entry.hint_sender` through the
/// returned reference. The per-source quota counter
/// (`pending_per_sender`) is keyed by the `hint_sender` recorded at
/// admission; re-attributing a live entry to a different peer here would
/// orphan a count (decremented against the wrong peer on removal/eviction)
/// and silently desync the quota. Mutate only verification-progress
/// fields (e.g. `state`, `verified_sources`, `tried_sources`).
pub fn get_pending_mut(&mut self, key: &XorName) -> Option<&mut VerificationEntry> {
self.pending_verify.get_mut(key)
Comment on lines +144 to +151
// Attacker floods far beyond any single-peer budget.
let attacker_flood: u32 = (MAX_PENDING_VERIFY_PER_PEER as u32).saturating_add(500_000);
let mut attacker_admitted = 0usize;
for i in 0..attacker_flood {
if queues.add_pending_verify(unique_xorname(i), entry_from(attacker)) {
attacker_admitted += 1;
}
}
grumbach added 2 commits May 20, 2026 13:44
…irness (D1)

ReplicationQueues::pending_verify (HashMap) and fetch_queue (BinaryHeap)
had no capacity bound (the source carried the project's own TODO).
handle_neighbor_sync_request documents "No per-request hint count
limit" and the only gate is sender_in_rt, so a routing-table peer could
flood NeighborSyncRequest hints (each message capped only by
MAX_REPLICATION_MESSAGE_SIZE, ~320k 32-byte hints) and grow these
structures unboundedly: memory exhaustion plus a self-amplifying
outbound verification/fetch request storm.

Fix, two layers:

- Global memory backstop: MAX_PENDING_VERIFY / MAX_FETCH_QUEUE
  (131_072 each). add_pending_verify / enqueue_fetch reject once full
  (callers already treat non-admission as "not added").

- Per-source fairness (the actual D1 defence): each pending entry is
  accounted to its hint_sender via a pending_per_sender counter kept in
  lockstep with pending_verify (insert/remove/evict_stale). A single
  source may hold at most MAX_PENDING_VERIFY_PER_PEER (8_192) entries,
  so a flooding peer exhausts only its own quota and can never starve
  honest peers, whose quotas are independent. hint_sender is the
  cryptographically authenticated connection identity, not a forgeable
  payload field, so the accounting cannot be evaded.

A global cap alone would have converted the memory DoS into a worse
silent honest-replication starvation DoS; the per-source quota is what
actually closes D1. Counter decrement uses saturating_sub with a
debug_assert; get_pending_mut documents the no-reattribution invariant.
A residual (~16 in-RT Sybil PeerIds to reach the global backstop) is
documented as an accepted, tracked follow-up.

tests/poc_d1_bounded_queues.rs proves the attack is defeated and that
the critical starvation test fails against a global-cap-only fix.
460 lib tests pass; cfd clean.
…etter

Per review-wave findings on PR WithAutonomi#99:

Replace the `get_pending_mut(&XorName) -> Option<&mut VerificationEntry>`
helper with `set_pending_state(&XorName, VerificationState) -> Option<HintPipeline>`.
The old API exposed `&mut hint_sender` to callers and relied on a doc
comment to keep them from re-assigning it; a re-attribution would have
orphaned a per-source quota count and silently desynced the fairness
counter — the same silent-starvation class this PR was written to
prevent. The new setter writes only `state` and returns `pipeline`,
making the desync impossible to commit by accident.

Single production caller (run_verification_cycle in replication/mod.rs)
migrated; D1f test renamed/updated to exercise the new setter; PoC
flood shrunk from +500_000 to +10_000 over the cap (same security
signal, faster in debug builds).
@grumbach grumbach force-pushed the fix/d1-bounded-replication-queues branch from 102d080 to f3fd84f Compare May 20, 2026 04:55
…s capacity rejections

Per codex round-2 review on PR WithAutonomi#99 (two BLOCKERs + one NIT):

1. [BLOCKER] Silent verified-key drop on full fetch_queue. Both
   verification paths removed the key from pending_verify BEFORE calling
   enqueue_fetch. When MAX_FETCH_QUEUE was full, enqueue_fetch silently
   dropped the key, so a verified key disappeared from every queue with
   no retry path. New `ReplicationQueues::promote_pending_to_fetch`
   checks capacity FIRST, then removes-from-pending-and-enqueues
   atomically; on capacity miss the pending entry is left in place so
   the next verification cycle retries. Both production call sites in
   `run_verification_cycle` migrated. enqueue_fetch now also returns
   bool (true = enqueued, false = dedup-or-full) so the few raw callers
   can observe outcomes.

2. [BLOCKER] False bootstrap completion under capacity-rejected hints.
   `add_pending_verify` now returns `AdmissionResult` (Admitted /
   AlreadyPresent / CapacityRejected) instead of bool, so callers can
   tell "no work needed" from "work was silently dropped".
   `admit_and_queue_hints` now reports a `capacity_rejected_count`
   alongside `discovered`; bootstrap paths (initial sync, neighbour
   sync, recurring sync) feed that count into a new
   `BootstrapState.capacity_rejected_outstanding` counter via
   `bootstrap::note_capacity_rejected`. `check_bootstrap_drained`
   refuses to mark drain complete while that counter is non-zero, so a
   bootstrap snapshot whose hints overflowed the queue caps stays
   pending until the source re-delivers them post-drain.

3. [NIT] `evict_stale_removes_old_entries` test used to bypass
   `add_pending_verify` via direct field access, so it no longer
   exercised the per-sender counter maintenance the followup added to
   `evict_stale`. Routed through the public API and added a per-sender
   counter assertion.

A new `AdmissionResult::admitted()` helper preserves backward-compat
for the dozen-plus call sites that just want the boolean. The few
production sites that distinguish capacity vs dedup branch on the enum.
479 lib tests pass; cfd clean.
Copilot AI review requested due to automatic review settings May 20, 2026 05:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread src/replication/mod.rs Outdated
Comment on lines +882 to +888
// drained.
if outcome.capacity_rejected_count > 0 {
bootstrap::note_capacity_rejected(
&bootstrap_state,
outcome.capacity_rejected_count,
)
.await;
Comment thread src/replication/mod.rs
Comment on lines +1337 to 1345
if is_bootstrapping {
if !outcome.discovered.is_empty() {
bootstrap::track_discovered_keys(bootstrap_state, &outcome.discovered).await;
}
if outcome.capacity_rejected_count > 0 {
bootstrap::note_capacity_rejected(bootstrap_state, outcome.capacity_rejected_count)
.await;
}
}
Comment thread src/replication/mod.rs Outdated
}
if outcome.capacity_rejected_count > 0 {
bootstrap::note_capacity_rejected(bootstrap_state, outcome.capacity_rejected_count)
.await;
Comment thread src/replication/mod.rs
Comment on lines 2019 to 2024
let distance = crate::client::xor_distance(&key, p2p_node.peer_id().as_bytes());
q.enqueue_fetch(key, distance, sources);
// Atomic remove+enqueue: if fetch_queue is at capacity, the
// pending entry is preserved and retried next cycle (no
// silent drop of verified replica-repair work).
let _ = q.promote_pending_to_fetch(key, distance, sources);
}
Comment thread src/replication/mod.rs
Comment on lines 2099 to +2107
if fetch_eligible && !sources.is_empty() {
let distance =
crate::client::xor_distance(&key, p2p_node.peer_id().as_bytes());
q.remove_pending(&key);
q.enqueue_fetch(key, distance, sources);
// Not terminal — key moved to fetch queue.
// Atomic remove+enqueue: on fetch_queue capacity miss
// the pending entry is preserved so this verified key
// is retried on the next cycle (no silent drop).
let _ = q.promote_pending_to_fetch(key, distance, sources);
// Not terminal — either moved to fetch queue, or
// retained as pending until queue drains.
…n drain

Per round-3 review on PR WithAutonomi#99 (both Claude + codex flagged the same
BLOCKER):

The round-2 commit introduced `note_capacity_rejected` /
`clear_capacity_rejected` for bootstrap drain accounting but only wired
the first one. `clear_capacity_rejected` was defined as the retirement
path in the doc comment yet had zero callers. The counter was
monotonically increasing, so after the first `pending_verify` capacity
rejection during bootstrap the drain check returned `false` forever —
the node never transitioned out of `is_bootstrapping`.

Fix: track outstanding rejections per `PeerId` (not as a single
counter) so each source's contribution to "not yet drained" is
retired independently. `note_capacity_rejected` adds the source to a
`HashSet<PeerId>`; `clear_capacity_rejected(source)` removes it. The
three production `admit_and_queue_hints` call sites now branch
explicitly: a cycle with `capacity_rejected_count > 0` notes the
source, while a clean cycle from the same source clears it.

This matches the per-source granularity codex recommended (treating
one source's clean cycle as evidence for THAT source's hints, not
others'), so a node bootstrapping from multiple peers can't have one
peer's clean cycle wrongly retire another peer's outstanding work.

Two new tests in `bootstrap::tests` pin the behaviour:
  capacity_rejected_clears_on_clean_cycle  — round-3 regression guard
  capacity_rejected_is_per_source          — per-source granularity

481 lib tests pass; 211 replication tests; 6 D1 PoC; cfd clean.
Copy link
Copy Markdown
Collaborator

@dirvine dirvine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #99 — D1 DoS fix

Architecture & Soundness

The two-layer defence is exactly right:

  1. Global memory backstop (MAX_PENDING_VERIFY / MAX_FETCH_QUEUE) prevents unbounded memory growth.
  2. Per-source fairness (MAX_PENDING_VERIFY_PER_PEER) prevents a single peer from starving honest peers — the key architectural insight that a global cap alone would convert OOM into worse honest-replication starvation.

The test poc_d1_flooding_peer_cannot_starve_honest_peer is critical: it verifies the actual D1 defence (per-source fairness), not just the memory bound. Well designed.

Implementation details

  • AdmissionResult enum (Admitted / AlreadyPresent / CapacityRejected) is a clean improvement over the previous bool — distinguishes dedup from genuine capacity rejection, essential for bootstrap drain correctness.
  • promote_pending_to_fetch (remove+enqueue under the write lock) prevents the silent-drop regression. The RwLock serialization means there is no race.
  • set_pending_state replacing get_pending_mut is a good API narrowing — prevents accidental hint_sender re-attribution that would desync the per-source counter.
  • release_sender_slot with saturating_sub + debug_assert is defensive and test-friendly.
  • Bootstrap drain with capacity_rejected_sources per-peer is thorough — the round-3 regression test validates the clear mechanism explicitly.

Residual risk

The PR documentation notes the approx 16 Sybil residual. This is acceptable because:

  • The global memory backstop still bounds worst-case memory to ~131K entries (~128 MB).
  • 16 simultaneous Sybils in the routing table is a substantial operational bar.
  • The degradation is to bounded memory, not to silent starvation.
  • This is correctly tracked as a follow-up, not left unexamined.

Verdict: LGTM — well-structured, well-tested, merge-ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants