
PR-X3: BlockedGrid hierarchical block layout (workers A1-A5 + design docs) #158

Merged
AdaWorldAPI merged 18 commits into master from claude/pr-x3-cognitive-grid-design
May 18, 2026

AdaWorldAPI (Owner) commented on May 18, 2026

Summary

Ships BlockedGrid<T, BR, BC> — a generic, const-generic-shaped, hierarchical block-padded 2-D grid in crate::hpc::blocked_grid::* — plus the blocked_grid_struct! SoA-of-grids macro. Layout-only, scalar inner loops, forward-compatible with the per-arch SIMD register-stack swap planned for PR-X5.
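The padded layout described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the crate's code: it assumes row-major storage in one `Vec<T>`, with each logical extent rounded up to a whole number of blocks. Only the method names (`new`, `new_with_pad`, `idx`, `get`, `set`, `padded_rows`, `padded_cols`) come from this PR. The `BR > 0 && BC > 0` guard is written as an associated const that is evaluated when the constructor is monomorphized. That is one common stable-Rust way to get a compile-time assert, and it may not match the crate's mechanism.

```rust
/// Hypothetical, minimal block-padded 2-D grid (not the crate's implementation).
/// The logical `rows x cols` extent is rounded up to whole `BR x BC` blocks and
/// the padded storage is kept row-major in one `Vec<T>`.
pub struct BlockedGrid<T, const BR: usize, const BC: usize> {
    rows: usize,
    cols: usize,
    padded_rows: usize,
    padded_cols: usize,
    data: Vec<T>,
}

impl<T: Copy, const BR: usize, const BC: usize> BlockedGrid<T, BR, BC> {
    // Evaluated per instantiation: a zero block dimension is reported as a
    // build error when `new_with_pad` is instantiated, not as a runtime panic.
    const NONZERO_BLOCK: () = assert!(BR > 0 && BC > 0, "BR and BC must be > 0");

    /// Constructor with an explicit pad value; needs only `T: Copy`.
    pub fn new_with_pad(rows: usize, cols: usize, pad_value: T) -> Self {
        let () = Self::NONZERO_BLOCK;
        let padded_rows = (rows + BR - 1) / BR * BR;
        let padded_cols = (cols + BC - 1) / BC * BC;
        Self {
            rows,
            cols,
            padded_rows,
            padded_cols,
            data: vec![pad_value; padded_rows * padded_cols],
        }
    }

    pub fn padded_rows(&self) -> usize {
        self.padded_rows
    }

    pub fn padded_cols(&self) -> usize {
        self.padded_cols
    }

    /// Row-major offset of cell `(r, c)` inside the padded storage.
    pub fn idx(&self, r: usize, c: usize) -> usize {
        r * self.padded_cols + c
    }

    /// Copies a logical cell out; pad cells are not addressable here.
    pub fn get(&self, r: usize, c: usize) -> T {
        assert!(r < self.rows && c < self.cols, "cell out of logical bounds");
        self.data[self.idx(r, c)]
    }

    pub fn set(&mut self, r: usize, c: usize, value: T) {
        assert!(r < self.rows && c < self.cols, "cell out of logical bounds");
        let i = self.idx(r, c);
        self.data[i] = value;
    }
}

impl<T: Copy + Default, const BR: usize, const BC: usize> BlockedGrid<T, BR, BC> {
    /// `new` delegates to `new_with_pad` with `T::default()` as the pad value.
    pub fn new(rows: usize, cols: usize) -> Self {
        Self::new_with_pad(rows, cols, T::default())
    }
}
```

For example, a 100×70 `u64` grid on 64×64 blocks pads to 128×128. An AMX-INT8-shaped 16×64 `u8` grid holding a single cell still allocates one full block.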

This is the PR-X3 carve-out from the cognitive-shader roadmap (see .claude/knowledge/cognitive-shader-foundation.md §"Current Gaps" and .claude/knowledge/pr-x3-cognitive-grid-design.md). All seven sprint workers (A1-A6 + B) landed. Codex P0 audit verdict: READY-FOR-PR, 0 P0, 2 P1 patched. P2 savant pre-merge review next.

What ships

Public API surface (crate::hpc::blocked_grid::*):

| Worker | File | Public items | Commit |
|---|---|---|---|
| A1 | `base.rs` | `BlockedGrid<T, BR, BC>` struct, `new`/`new_with_pad`, `idx`/`get`/`set`, `as_padded_slice*` with `# Footgun` docs, `GridBlock`/`GridBlockMut` view types with `PhantomData` lifetime variance, compile-time `BR > 0 && BC > 0` assert | a7e9a67 |
| A2 | `iter.rs` | `BaseBlockIter`/`BaseBlockIterMut`, `blocks_base`/`blocks_base_mut`, `GridBlockMut::row_mut` | af8a4c8 |
| A3 | `super_block.rs` | `GridSuperBlock`/`GridSuperBlockMut`, `TierBlockIter`/`TierBlockIterMut`, `blocks_tier::<N>`/`blocks_tier_mut::<N>` with documented panic on invalid divisibility | 195ce67 |
| A4 | `compute.rs` | `map_base`/`map_tier` (PRIMARY compute: immutable `self`, returns a new grid), `bulk_apply_base`/`bulk_apply_tier` (SECONDARY write-back, each carrying a `# Data-flow rule` docstring citing `.claude/rules/data-flow.md` Rule #3) | 2ed97a6 |
| A5 | `aliases.rs` | Seven type aliases (`ShaderMantissaGrid`, `AmxBf16Grid`, `AmxInt8Grid`, `StripF32Stack2`, `StripF32Stack4`, `SquareF64Stack8`, `HalfSquareU64`); L1/L2/L3/L4 alias impls on `BlockedGrid<T, 64, 64>` only (Q7 ruling) for `blocks_l*` / `map_l*` / `bulk_apply_l*` | b479956 |
| A6 | `tests.rs` + `mod.rs` doctest | Integration tests (W4 `bulk_apply` composition, L1→L2 cascade, footgun verification) + module-level canonical compose doctest + `compile_fail` guards | 32eaf11 |
| B | `grid_struct_macro.rs` | `blocked_grid_struct!` `#[macro_export]` SoA-of-grids macro, generated `{Name}L1Block`/`{Name}L1BlockMut`, `map_l1`/`bulk_apply_l1` on generated structs (L2-L4 macro methods deferred to a follow-up), `FieldGridRef` trait + `field_n::<I>()` const-generic accessor | b4c6692 |

Design docs (.claude/knowledge/):

  • pr-x3-cognitive-grid-design.md (v2) — binding spec, absorbs plan-review savant P0/P1/P2 patches and seven Q1-Q7 rulings
  • pr-x3-plan-review.md — Phase 2 savant verdict
  • pr-x3-codex-audit.md — Phase 11 codex audit verdict (READY-FOR-PR)
  • pr-x1-design.md + pr-x2-design.md — drafted in parallel, queued for their own sprints

Layering, data-flow, and distance-typing guardrails

Dependency addition

Adds paste = "1" to Cargo.toml (already present in workspace lock via crates/burn; binary impact zero). Re-exported as #[doc(hidden)] pub use paste; in src/lib.rs to support hygienic ident concat ([<$name L1Block>]) inside the blocked_grid_struct! macro. Standard stable-Rust mechanism for macro_rules! identifier generation.

Test plan

Current state (after codex P1-1 patch at 01a70edb): 111 lib tests + 79 doctests, all 5 cargo gates green.

  • cargo check -p ndarray --no-default-features --features std — PASS
  • cargo test -p ndarray --lib --no-default-features --features std hpc::blocked_grid — 111/111 PASS
  • cargo test --doc -p ndarray --no-default-features --features std hpc::blocked_grid — 79/79 PASS
  • cargo fmt --all -- --check — clean
  • cargo clippy -p ndarray --no-default-features --features std -- -D warnings — clean
  • Codex P0 audit: READY-FOR-PR, 0 P0, 2 P1 applied
  • P2 savant pre-merge review (running)

Sprint protocol traceability

Per .claude/knowledge/pr-x3-cognitive-grid-design.md §"Worker decomposition":

  1. ✅ Plan v1 (b348d43c)
  2. ✅ Plan-review savant (Phase 2): READY-WITH-DOC-FIXES, 2 P0 + 7 P1 + 4 P2
  3. ✅ Plan v2 (c6414ec0) + per-worker file refactor (b5329f06)
  4. ✅ Workers A1-A6 + B (sequential A1, parallel A2+A3, sequential A4→A5→A6→B)
  5. ✅ Codex P0 audit (Phase 11): READY-FOR-PR; P1-1 patched (01a70edb)
  6. 🟢 P2 savant pre-merge review (Phase 13) — running
  7. ⬜ Flip to ready-for-review + merge ladder

Roadmap (post-merge)

  • PR-X4 — refactor splat3d/tile.rs 16×16-tile binning onto BlockedGrid
  • PR-X5 — typed SIMD register-bank stacks (StackedU64x8<N>, StackedF32x16<N>, AmxTile<T, R, C>) in crate::simd::* with per-arch LazyLock dispatch
  • W7 — typed cognitive distance bulk fns + the actual CausalEdge64 mantissa cell kernel
  • PR-X1 / PR-X2 — separately queued (drafted in .claude/knowledge/pr-x1-design.md and pr-x2-design.md)
  • Macro L2/L3/L4 methods (deferred from B) — small follow-up PR

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

claude added 17 commits May 18, 2026 12:03

PR-X3 design contract for the next-wave sprint after W3-W6 + #157.

Scope (PR-X3): CognitiveGrid<T, BR, BC> const-generic 2-D blocked grid
with hierarchical tier iterators (L1=64x64 / L2=256x256 / L3=4096x4096 /
L4=16384x16384 on the default 64x64 base) + cognitive_grid_struct!
macro for SoA-of-grids. CausalEdge64 (u64) is the canonical cell type
acting as cognitive-shader mantissa.
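Each tier size above is the base block edge multiplied by the tier stride N. The sketch below illustrates that arithmetic. It assumes the strides 1, 4, 64 and 256 that the later aliases commit uses for L1 to L4, with L1 being the base block itself. It also computes the bytes one super-block occupies for 8-byte cells such as CausalEdge64, which is modelled here as plain `u64`. The helper names are hypothetical.

```rust
use std::mem::size_of;

/// (level, tier stride N) on the default 64x64 base; L1 is the base block (N = 1).
pub const TIERS: [(&str, usize); 4] = [("L1", 1), ("L2", 4), ("L3", 64), ("L4", 256)];

/// Edge lengths (rows, cols) of one tier-N super-block on a BR x BC base.
pub const fn tier_edges(br: usize, bc: usize, n: usize) -> (usize, usize) {
    (br * n, bc * n)
}

/// Bytes occupied by one tier-N super-block when every cell is a `T`.
pub fn tier_bytes<T>(br: usize, bc: usize, n: usize) -> usize {
    let (r, c) = tier_edges(br, bc, n);
    r * c * size_of::<T>()
}

/// All four levels for a base shape: (name, rows, cols, bytes per super-block).
pub fn tier_table<T>(br: usize, bc: usize) -> Vec<(&'static str, usize, usize, usize)> {
    TIERS
        .iter()
        .map(|&(name, n)| {
            let (r, c) = tier_edges(br, bc, n);
            (name, r, c, tier_bytes::<T>(br, bc, n))
        })
        .collect()
}
```

With `u64` cells, an L1 block is 32 KiB, an L2 super-block 512 KiB and an L3 super-block 128 MiB.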

Layering: scalar layout only. No #[target_feature], no per-arch imports,
no SIMD primitives. Forward-compatible with PR-X5 (SIMD register-bank
stacks) and W7 (typed cognitive distance bulk fns + cell kernels).

Hardware-block x cell-type matrix documents the AMX BF16 (16x16),
AMX INT8 (16x64 half-square), AVX-512 F32x16 / F64x8 / U64x8 / U8x64,
and NEON dotprod natural shapes. Default 64x64 is the LCM of all useful
register-bank shapes; const generics let consumers specialize.
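One way to check the "LCM of all useful register-bank shapes" claim is to fold the listed extents through an LCM. The sketch treats every extent named above as a candidate for either axis: AMX BF16 16x16, AMX INT8 16x64, and the 16-, 8- and 64-lane AVX-512 vectors, which are modelled as `1 x lanes` strips. NEON dotprod is left out because no shape is given for it. The helper names are hypothetical.

```rust
/// Greatest common divisor (Euclid). Inputs are assumed non-zero.
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Least common multiple; divides before multiplying to delay overflow.
fn lcm(a: usize, b: usize) -> usize {
    a / gcd(a, b) * b
}

/// LCM of every extent in `extents` (1 for an empty list).
pub fn lcm_all(extents: &[usize]) -> usize {
    extents.iter().copied().fold(1, lcm)
}

/// Does a BR x BC base block tile every register shape (rows, cols) exactly?
pub fn tiles_all(br: usize, bc: usize, shapes: &[(usize, usize)]) -> bool {
    shapes.iter().all(|&(r, c)| br % r == 0 && bc % c == 0)
}
```

A 32x32 base fails `tiles_all` because it cannot hold a 16x64 AMX INT8 tile. A 64x64 base tiles every listed shape.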

Sequential 5-10 Sonnet workers + 1 Opus coordinator protocol per the
binding pattern: plan -> review -> correct -> sprint -> review code ->
fix P0 -> commit -> repeat. Workers in isolated worktrees, sequential
ordering (Worker B macro depends on Worker A core API).

Token-reset safety: doc is self-contained, includes context recovery
notes for fresh sessions arriving without conversational history.

Cross-references: w3-w6-soa-aos-design.md, cognitive-shader-foundation.md,
cognitive-distance-typing.md, vertical-simd-consumer-contract.md,
w3-w6-codex-audit.md, w3-w6-p2-savant-review.md.

Phase 3 corrector pass on the PR-X3 BlockedGrid design doc, applying the
plan-review savant verdict (READY-WITH-DOC-FIXES, 2 P0 + 7 P1 + 4 P2).

P0 patches applied:
- A1: split bulk_apply_base/tier into map_base/map_tier (PRIMARY compute,
  immutable self, returns new grid) + bulk_apply_base/tier (SECONDARY
  write-back, with explicit data-flow Rule #3 docstring citing
  .claude/rules/data-flow.md).
- A2: macro emits both map_l1-l4 (compute) AND bulk_apply_l1-l4
  (write-back) on the generated SoA-of-grids struct.

P1 patches applied:
- F1: CognitiveGrid → BlockedGrid rename; module path crate::hpc::blocked_grid
- F2: Block → GridBlock, BlockMut → GridBlockMut, SuperBlock → GridSuperBlock
  (avoids collision with crate::backend::native BLAS Block)
- F3: cache-hierarchy convention note (L1 innermost, L4 framebuffer-scale)
- G2: keep both map_tier::<N> + L1-L4 aliases (and same for bulk_apply)
- G6: add new_with_pad(rows, cols, pad_value: T) ctor (T: Copy only,
  no Default bound); new() delegates with T::default()
- G3: # Footgun doc section on as_padded_slice + as_padded_slice_mut
- G4: macro emits field_n::<I> const-generic field accessors

P2 patches applied:
- J1: PhantomData lifetime variance note on GridBlock/GridBlockMut
- J4: module-level docstring out-of-scope warning requirement (3 lines max)

Q1-Q7 rulings persisted in §"Resolved questions" (was §"Open questions" in v1).

Worker decomposition: 7-worker split (A1-A6 + B) is the DEFAULT, not the
fallback. Fixed §"Sprint protocol" step 4 contradiction (was "Two workers
in parallel" — corrected to "Spawn workers SEQUENTIALLY").

Verdict file persisted at .claude/knowledge/pr-x3-plan-review.md (savant
had no write permission; the coordinator wrote it post-task).

The previous Read/Write/Edit/MultiEdit/NotebookEdit allow entries used the
bare `**` glob, which doesn't match against actual file paths in the
current Claude Code harness — so every Edit/Write call triggered a
permission popup despite being on the allowlist. Switching to the `{**}`
glob form (curly-brace alternation) so the patterns actually fire.

Deny entries (./.archive/**, ./.git/**, ./CLAUDE-CREDENTIALS.md, …)
are left untouched: they use explicit path prefixes and were matching
correctly. Only the catch-all "any path" entries needed the syntax fix.

… (PR-X3 A1)

First sprint cut for PR-X3 per .claude/knowledge/pr-x3-cognitive-grid-design.md
(design v2 @ c6414ec).

Ships:
- BlockedGrid<T, BR, BC> struct (private fields, row-major padded storage)
- new(rows, cols) + new_with_pad(rows, cols, pad_value)
- rows/cols/padded_rows/padded_cols/block_dims/idx/get/set accessors
- as_padded_slice + as_padded_slice_mut with # Footgun docs
- GridBlock<'a, T, BR, BC> + GridBlockMut<'a, T, BR, BC> view types
  (PhantomData lifetime variance per Q3 ruling)
- Inline unit tests for all of the above
- Const-generic compile-time assertion BR > 0 && BC > 0

Iterators, super-blocks, map_*, bulk_apply_*, and convenience aliases
deferred to workers A2-A5. Macro deferred to worker B.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

A1 shipped 772 lines in mod.rs at 1101b7d (cherry-picked as a7e9a67).
Refactoring into one-file-per-sprint-worker per the design doc's new
§"Per-worker file scoping" section, so workers A2-A5 + B can spawn in
parallel without colliding on the same file.

File layout:
- mod.rs             — slim index (submodule decls + re-exports)
- base.rs            — A1's content (BlockedGrid, GridBlock, GridBlockMut,
                       all accessors, inline tests). Named `base` not
                       `core` to avoid shadowing the std `core` crate.
- iter.rs            — A2 stub (BaseBlockIter, BaseBlockIterMut)
- super_block.rs     — A3 stub (GridSuperBlock, TierBlockIter, blocks_tier)
- compute.rs         — A4 stub (map_*, bulk_apply_*)
- aliases.rs         — A5 stub (convenience aliases + L1-L4 impls)

All 5 gates still green after refactor:
- cargo check                       : PASS
- cargo test --lib (blocked_grid)   : 23/23 PASS
- cargo test --doc (blocked_grid)   : 25/25 PASS
- cargo fmt --check                 : PASS
- cargo clippy -D warnings          : PASS

Also tightens .claude/settings.json: replaces the catch-all `Edit({**})`
/ `Write({**})` with per-area entries (src/{**}, crates/{**},
.claude/knowledge/{**}, Cargo.toml, etc.). NotebookEdit removed (no
notebooks in this project). Read({**}) stays broad — agents need
read access everywhere for context.

Design doc gains §"Per-worker file scoping (binding)" with the worker→file
mapping table, and §"The agent sequence for PR-X3" now notes that workers
A2-A5 can spawn in parallel once the file-split scaffolding has landed.

…oa_struct! pad_to_lanes

PR-X1 design — MultiLaneColumn, Fingerprint::as_u8x64, array_window,
simd::* re-export sweep. Carves out the SIMD-staged inner-loop primitives
flagged by the W3-W6 P2 savant review (A1/A4 findings).

PR-X2 design — generalize aos_to_soa / soa_to_aos to <T, U, N> so non-f32
element types are first-class, and add the #[soa(pad_to_lanes=N)] field
attribute to soa_struct! so SIMD kernels get guaranteed tail padding.

Both designs follow the same 7-phase sprint-protocol shape as PR-X3
(plan → review → correct → sprint sequential → audit → fix P0 → P2 review).
Sequential worker decomposition. No code changes — design docs only.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

BaseBlockIter / BaseBlockIterMut + blocks_base / blocks_base_mut impls
on BlockedGrid<T, BR, BC>. Row-major iteration over the BR×BC base blocks.
Inline tests for all spec cases.
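The row-major block order can be illustrated with block origins alone. The sketch assumes a grid that is already padded to whole blocks. `base_block_origins` is a hypothetical name, and it yields coordinates, whereas the real `BaseBlockIter` yields `GridBlock` views.

```rust
/// Row-major origins `(row, col)` of every `br x bc` base block in a grid that is
/// already padded to whole blocks. Hypothetical helper; yields coordinates only.
pub fn base_block_origins(
    padded_rows: usize,
    padded_cols: usize,
    br: usize,
    bc: usize,
) -> impl Iterator<Item = (usize, usize)> {
    assert!(br > 0 && bc > 0, "block dimensions must be non-zero");
    assert!(
        padded_rows % br == 0 && padded_cols % bc == 0,
        "grid is not padded to whole blocks"
    );
    // Outer loop walks block rows, inner loop walks block columns.
    (0..padded_rows / br)
        .flat_map(move |bi| (0..padded_cols / bc).map(move |bj| (bi * br, bj * bc)))
}
```

On a 128×128 padded grid with 64×64 blocks, this yields (0, 0), (0, 64), (64, 0) and then (64, 64).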

Also adds GridBlockMut::row_mut in iter.rs (needed by iterator doctests and
downstream workers; A1's base.rs exposed data_mut + padded_cols helpers that
make this possible from within the sibling module without touching base.rs
field visibility).

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

Uncomment the `pub use iter::{BaseBlockIter, BaseBlockIterMut};` line
now that A2 (a4975a0) has landed real implementations. Cherry-picked
A2's commit as af8a4c8. All 5 gates green: 34 lib tests + 30 doctests.

GridSuperBlock<'a, T, BR, BC, N> + GridSuperBlockMut + TierBlockIter +
TierBlockIterMut + blocks_tier::<N> / blocks_tier_mut::<N> impls on
BlockedGrid. Const-generic N=tier-stride. Panics on invalid (BR*N, BC*N)
divisibility with a documented error message. Inline tests for all
spec'd cases including the panic case via #[should_panic].
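The divisibility check behind the documented panic can be sketched as follows. The commit does not spell out the exact condition, so this sketch assumes that both padded extents must be exact multiples of the super-block edges `BR * N` and `BC * N`. `tier_grid_dims` is a hypothetical helper that returns how many super-blocks fit along each axis.

```rust
/// Number of tier-N super-blocks along (rows, cols) of a padded grid with
/// BR x BC base blocks. Panics when a padded extent is not a multiple of the
/// super-block edge, mirroring the documented `blocks_tier::<N>` panic.
pub fn tier_grid_dims<const BR: usize, const BC: usize, const N: usize>(
    padded_rows: usize,
    padded_cols: usize,
) -> (usize, usize) {
    let (edge_r, edge_c) = (BR * N, BC * N);
    assert!(edge_r > 0 && edge_c > 0, "BR, BC and N must all be non-zero");
    assert!(
        padded_rows % edge_r == 0 && padded_cols % edge_c == 0,
        "tier N={}: padded grid {}x{} is not divisible into {}x{} super-blocks",
        N,
        padded_rows,
        padded_cols,
        edge_r,
        edge_c
    );
    (padded_rows / edge_r, padded_cols / edge_c)
}
```

For example, a 512×256 grid holds 2×1 L2-sized (N = 4) super-blocks on a 64×64 base, while a 320×256 grid panics.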

Also: moved as_padded_slice / as_padded_slice_mut from impl<T: Copy> to
impl<T> in base.rs — those methods only borrow &[T] / &mut [T] and do
not need T: Copy; the Copy bound blocked blocks_tier from calling them.
Added pub(super) GridBlock::from_raw + GridBlockMut::from_raw to base.rs
so super_block.rs can construct base-block views without T: Copy.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

…om mod.rs

Uncomment the `pub use super_block::{...}` line now that A3 (4224b33)
has landed real implementations. Adds `TierBlockIterMut` to the re-export
list (A3 shipped both read-only and mutable tier iterators). All 5 gates
green: 49 lib tests + 48 doctests.

…PR-X3 A4)

Splits the API into:
- map_base / map_tier : PRIMARY compute paths (immutable self, returns
  a new BlockedGrid<U, BR, BC>) — satisfy data-flow Rule #3
- bulk_apply_base / bulk_apply_tier : SECONDARY write-back paths
  (&mut self) — each carries the mandatory # Data-flow rule docstring
  section citing .claude/rules/data-flow.md verbatim

Closure signatures use the two-block pattern (input block + output block)
for map_*, and (mut output block + coordinates) for bulk_apply_*.
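The sketch below is a heavily simplified illustration of the two closure shapes. It assumes square `B x B` blocks, a grid whose sides are already multiples of `B`, and hypothetical `BlockRef`/`BlockMut` view types, so it is not the crate's API. The block views borrow the whole storage and are handed to the closure one at a time, so no two mutable views are ever alive together. That sequential hand-out sidesteps, rather than solves, the overlapping-view problem that the real iterators had to address later in this PR.

```rust
/// Read-only view of one `B x B` block (hypothetical; borrows the whole storage).
pub struct BlockRef<'a, T> { data: &'a [T], stride: usize, r0: usize, c0: usize, b: usize }

impl<'a, T> BlockRef<'a, T> {
    /// Row `r` of the block: `b` cells starting at the block's own column.
    pub fn row(&self, r: usize) -> &'a [T] {
        assert!(r < self.b, "row outside block");
        let data: &'a [T] = self.data;
        let s = (self.r0 + r) * self.stride + self.c0;
        &data[s..s + self.b]
    }
}

/// Mutable view of one block. Views reach the closure one at a time, so
/// borrowing the whole storage never yields two live `&mut` to the same cell.
pub struct BlockMut<'a, T> { data: &'a mut [T], stride: usize, r0: usize, c0: usize, b: usize }

impl<T> BlockMut<'_, T> {
    pub fn row_mut(&mut self, r: usize) -> &mut [T] {
        assert!(r < self.b, "row outside block");
        let s = (self.r0 + r) * self.stride + self.c0;
        &mut self.data[s..s + self.b]
    }
}

/// Hypothetical grid with square `B x B` blocks and sides that are multiples of `B`.
#[derive(Clone, Debug, PartialEq)]
pub struct Grid<T, const B: usize> { pub rows: usize, pub cols: usize, pub data: Vec<T> }

impl<T, const B: usize> Grid<T, B> {
    /// Block origins in row-major order (owned, so no borrow of `self` is held).
    fn origins(&self) -> Vec<(usize, usize)> {
        assert!(B > 0 && self.rows % B == 0 && self.cols % B == 0);
        let mut v = Vec::new();
        for r0 in (0..self.rows).step_by(B) {
            for c0 in (0..self.cols).step_by(B) {
                v.push((r0, c0));
            }
        }
        v
    }

    /// PRIMARY compute path: `&self` is never written; a new grid is returned.
    pub fn map_base<U: Copy + Default, F>(&self, mut f: F) -> Grid<U, B>
    where
        F: FnMut(BlockRef<'_, T>, BlockMut<'_, U>),
    {
        let mut out = Grid { rows: self.rows, cols: self.cols, data: vec![U::default(); self.data.len()] };
        for (r0, c0) in self.origins() {
            let input = BlockRef { data: &self.data, stride: self.cols, r0, c0, b: B };
            let output = BlockMut { data: &mut out.data, stride: self.cols, r0, c0, b: B };
            f(input, output);
        }
        out
    }

    /// SECONDARY write-back path: mutates `self` in place; the closure also
    /// receives the block coordinates `(block_row, block_col)`.
    pub fn bulk_apply_base<F>(&mut self, mut f: F)
    where
        F: FnMut(BlockMut<'_, T>, (usize, usize)),
    {
        let stride = self.cols;
        for (r0, c0) in self.origins() {
            f(BlockMut { data: &mut self.data, stride, r0, c0, b: B }, (r0 / B, c0 / B));
        }
    }
}
```

`map_base` takes `&self`, so the input-unchanged invariant is enforced by the borrow checker rather than by convention.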

Inline tests verify input-unchanged invariant on map_*, write-back
correctness on bulk_apply_*, panic propagation on bulk_apply_tier with
invalid divisibility, and empty-grid degenerate cases.

Also adds GridBlock::row() / rows() accessors in compute.rs (spec'd in
the PR-X3 design doc but missing from A1's base.rs). Requires two
#[doc(hidden)] pub helpers on GridBlock in base.rs (data_slice() and
padded_cols_stride()) so compute.rs can reach private fields without
reopening base.rs wholesale — the minimal touch justified by spec gap.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

Type aliases for the cognitive-shader and SIMD-tier shapes:
- ShaderMantissaGrid (u64, 64, 64) — CausalEdge64 mantissa default
- AmxBf16Grid (u16, 16, 16) — AMX BF16 TDPBF16PS tile shape
- AmxInt8Grid (u8, 16, 64) — AMX INT8 TDPBUSD half-square shape
- StripF32Stack2 / Stack4 (f32, 2|4, 16) — F32x16 vertical stacks
- SquareF64Stack8 (f64, 8, 8) — F64x8 8×8 GEMM kernel shape
- HalfSquareU64 (u64, 32, 64) — half-square U64 grid

L1/L2/L3/L4 alias impls on BlockedGrid<T, 64, 64> ONLY (Q7 ruling):
- blocks_l1/2/3/4 delegating to blocks_base / blocks_tier::<4|64|256>
- map_l1/2/3/4 delegating to map_base / map_tier::<4|64|256>
- bulk_apply_l1/2/3/4 delegating to bulk_apply_base / bulk_apply_tier::<N>
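The Q7 restriction maps naturally onto an inherent impl block on the concrete shape `BlockedGrid<T, 64, 64>`: on any other shape, the named levels simply do not exist as methods. The sketch uses a hypothetical stand-in type and `*_edge` methods in place of the real `blocks_l*` / `map_l*` / `bulk_apply_l*`. It shows only the delegation to a generic tier entry point with N = 1, 4, 64 and 256.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the grid; only the const-generic shape matters here.
pub struct BlockedGrid<T, const BR: usize, const BC: usize> {
    _cells: PhantomData<T>,
}

impl<T, const BR: usize, const BC: usize> BlockedGrid<T, BR, BC> {
    pub fn new() -> Self {
        Self { _cells: PhantomData }
    }

    /// Generic entry point, available on every shape: edges of a tier-N super-block.
    pub fn tier_edge<const N: usize>(&self) -> (usize, usize) {
        (BR * N, BC * N)
    }
}

/// Named levels exist only on the 64x64 base shape. On e.g. `BlockedGrid<f32, 2, 16>`
/// a call to `l2_edge()` is a "no method found" compile error, not a runtime check.
impl<T> BlockedGrid<T, 64, 64> {
    /// N = 1 is equivalent to the base block (the real L1 delegates to `blocks_base`).
    pub fn l1_edge(&self) -> (usize, usize) {
        self.tier_edge::<1>()
    }
    pub fn l2_edge(&self) -> (usize, usize) {
        self.tier_edge::<4>()
    }
    pub fn l3_edge(&self) -> (usize, usize) {
        self.tier_edge::<64>()
    }
    pub fn l4_edge(&self) -> (usize, usize) {
        self.tier_edge::<256>()
    }
}
```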

Each bulk_apply_l* carries the verbatim # Data-flow rule docstring section
matching A4's bulk_apply_base. Cache-hierarchy convention (L1 innermost,
L4 framebuffer-scale) documented in the first method's docstring.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

Uncomment the aliases re-export now that A5 (2402275) has landed. All
74 lib tests and 74 doctests passing. Cargo fmt + clippy clean.

…X3 A6)

Adds the final test-density layer after A1-A5 shipped their inline
#[cfg(test)] coverage:

- src/hpc/blocked_grid/tests.rs (new) — integration tests that span
  multiple submodules: W4 bulk_apply composition, L1→L2 cascade, all
  seven type aliases instantiate, half-square AMX INT8 pattern,
  as_padded_slice footgun verification, const-generic compile-fail
- src/hpc/blocked_grid/mod.rs — module-level doctest demonstrating the
  canonical compose pattern (ShaderMantissaGrid → map_l1 → verify input
  unchanged + output as expected) and #[cfg(test)] mod tests registration

Test count: +17 lib tests + 2 doctests above the A5 baseline (74 + 74).
All 5 gates green.
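A plain reduction shows what the footgun test guards against: summing the whole `as_padded_slice()` also sums the pad cells. The sketch uses a hypothetical `f32` stand-in type, not the crate's `BlockedGrid`. It contrasts the padded view with a logical-only walk that cuts each row at `cols`.

```rust
/// Hypothetical stand-in: a logical `rows x cols` region stored row-major
/// inside a wider `padded_cols`-stride buffer; cells outside it are padding.
pub struct Padded {
    pub rows: usize,
    pub cols: usize,
    pub padded_cols: usize,
    pub data: Vec<f32>,
}

impl Padded {
    /// Analogue of `as_padded_slice`: every cell, padding included.
    pub fn as_padded_slice(&self) -> &[f32] {
        &self.data
    }

    /// Logical-only reduction: each row is cut at `cols`, and rows past
    /// `rows` (pure padding) are never visited.
    pub fn logical_sum(&self) -> f32 {
        (0..self.rows)
            .map(|r| {
                let start = r * self.padded_cols;
                self.data[start..start + self.cols].iter().sum::<f32>()
            })
            .sum()
    }
}
```

As an example, take a 2×3 region of 2.0 values inside a 4×4 buffer padded with 1.0. The logical sum is 12.0, while the padded slice sums to 22.0.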

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

…ds (PR-X3 B)

The macro generates an SoA-of-grids struct: each named field becomes its
own BlockedGrid<FieldT, BR, BC> with shared rows/cols/padded dimensions.

Generated API (v1 — L1 only; L2/L3/L4 deferred to follow-up):
- {Name}::new(rows, cols) constructor
- rows/cols/padded_rows/padded_cols accessors
- blocks_l1() lockstep iteration → {Name}L1Block<'_>
- map_l1() PRIMARY compute path → new {Name}, input unchanged
- bulk_apply_l1() SECONDARY write-back, carries verbatim
  # Data-flow rule docstring section per .claude/rules/data-flow.md Rule #3
- field_n::<I>() compile-time field accessor (P1 G4 ruling)
- {Name}L1Block / {Name}L1BlockMut / {Name}L1BlockIter view types

Also adds:
- FieldGridRef trait (object-safe dimension accessor for &dyn use)
- Clone impl for BlockedGrid<T: Copy + Default, BR, BC>
- paste = "1" dependency (identifier concat for {Name}L1Block naming)
- pub use paste re-export in lib.rs for $crate::paste::paste! macro hygiene

Reserved field names enforced (compile error if shadowed): `new`, `rows`,
`cols`, `padded_rows`, `padded_cols`, `blocks_l1/2/3/4`, `map_l1/2/3/4`,
`bulk_apply_l1/2/3/4`, `field_n`, `default`.
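The following is a much-reduced sketch of the SoA-of-grids idea using a hypothetical `soa_grids!` macro, not `blocked_grid_struct!` itself. Each field becomes its own row-major `Vec<FieldT>` sharing `rows`/`cols`. There is no blocking, no `paste`-generated view types, and no `map_l1`/`bulk_apply_l1`. Because the generated struct already declares `rows` and `cols` fields, a user field with either name fails as a duplicate field. That is only a crude version of the reserved-name enforcement described above.

```rust
/// Hypothetical SoA-of-grids generator: one `Vec<FieldT>` per field, all
/// `rows * cols` long and indexed row-major with the same `(r, c)`.
macro_rules! soa_grids {
    ($vis:vis struct $name:ident { $($fvis:vis $field:ident : $fty:ty),+ $(,)? }) => {
        $vis struct $name {
            rows: usize,
            cols: usize,
            $($fvis $field: Vec<$fty>,)+
        }

        impl $name {
            /// Every field grid starts at `FieldT::default()`.
            pub fn new(rows: usize, cols: usize) -> Self {
                Self { rows, cols, $($field: vec![<$fty>::default(); rows * cols],)+ }
            }

            pub fn rows(&self) -> usize { self.rows }
            pub fn cols(&self) -> usize { self.cols }

            /// Lockstep read: the same cell from every field grid, as a tuple
            /// (requires `Copy` field types).
            pub fn cell(&self, r: usize, c: usize) -> ($($fty,)+) {
                assert!(r < self.rows && c < self.cols, "cell out of bounds");
                let i = r * self.cols + c;
                ($(self.$field[i],)+)
            }
        }
    };
}

soa_grids! {
    pub struct Cells { pub edge: u64, pub weight: f32 }
}
```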

Inline tests: 2/3/4-field generation, pub/private field visibility,
#[derive(Clone)] passthrough, map_l1 input-unchanged invariant,
bulk_apply_l1 lockstep mutation, field_n::<0>/<1> accessor.

5-gate result: check OK, lib 111/111, doc 79/79, fmt OK, clippy OK.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

… + persist audit verdict

Codex P0 audit (Phase 11) returned READY-FOR-PR with 0 P0, 2 P1, 2 P2.

P1-1 (applied): GridBlockMut::row_mut had module-level data-flow
framing but not a method-level `# Data-flow rule` docstring section.
Added the verbatim citation of .claude/rules/data-flow.md Rule #3
and the pointer to BlockedGrid::map_base for compute paths.

P1-2 (PR description action): `paste = "1"` dep addition will be
called out in the PR description on next update.

P2 findings (deferred to P2 savant in Phase 13):
- pub helpers on GridBlock/GridBlockMut should perhaps be pub(crate)
- field_n::<I>() type erasure — additive typed accessor worth considering

Verdict file persisted at .claude/knowledge/pr-x3-codex-audit.md.

…st verdict

P2 savant (Phase 13) verdict: SHIP-WITH-FOLLOWUPS. 4 P2 findings; 3 applied
in this commit, 1 deferred to PR-X3.1.

P2-1 (applied) — downscope `pub` helpers on GridBlock/GridBlockMut to
`pub(crate)`. The four helpers (data_slice, padded_cols_stride on
GridBlock; data_mut, padded_cols on GridBlockMut) are intra-crate
implementation seams. Leaving them `pub` meant downstream consumers
could bypass the `# Footgun` guard on `as_padded_slice`. Also drops the
`#[doc(hidden)]` attribute — no longer needed once visibility is tight.

P2-3 (applied) — drop stray `T: Copy` bound from `GridBlock::from_grid`,
`GridBlockMut::from_grid`, `Iterator for BaseBlockIter`, `Iterator for
BaseBlockIterMut`, both `ExactSizeIterator` impls, and the impl block
holding `blocks_base` / `blocks_base_mut`. None of these positions
actually copy a `T` value — they only compute index arithmetic and
slice the storage. The bound was over-constraining; iterator surface
now works for any `T` (not just `T: Copy`). `BlockedGrid::get` / `set`
still correctly require `T: Copy` because they do copy values.
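The bound reasoning is easiest to see in one dimension. An iterator that only slices storage works for any `T`, while a by-value `get` needs `T: Copy`. The sketch uses a hypothetical `Strip<T>` type that stands in for a single row of blocks.

```rust
/// Hypothetical one-row stand-in: `bc`-wide blocks laid out in a flat `Vec<T>`.
pub struct Strip<T> {
    pub data: Vec<T>,
    pub bc: usize,
}

impl<T> Strip<T> {
    /// No `T: Copy` bound: yielding a block is index arithmetic plus slicing.
    pub fn blocks(&self) -> std::slice::Chunks<'_, T> {
        self.data.chunks(self.bc)
    }
}

impl<T: Copy> Strip<T> {
    /// Returns the cell by value, so this method genuinely needs `T: Copy`.
    pub fn get(&self, i: usize) -> T {
        self.data[i]
    }
}
```

A `Strip<String>` can iterate its blocks, but calling `get` on it does not compile.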

P2-4 (applied) — strengthen macro L1-only deferral wording with explicit
PR-X3.1 ticket reference + `TODO(PR-X3.1)` marker + dedicated per-field
workaround warning. Reduces the risk that callers cement per-field
loops outside the macro-generated struct.

P2-2 (DEFERRED → PR-X3.1) — typed `field_grid::<I, FieldT>()` accessor
alongside the existing erased `field_n::<I>()`. Additive but requires
either a downcast trait or extra macro emit arm; no current consumer
needs it.

Verdict file persisted at .claude/knowledge/pr-x3-p2-savant-review.md.
PR-X3.1 follow-up backlog documented at the bottom of the verdict file.

All 5 gates green after tightenings:
- cargo check: PASS
- cargo test --lib hpc::blocked_grid: 111/111 PASS
- cargo test --doc hpc::blocked_grid: 79/79 PASS
- cargo fmt --check: clean
- cargo clippy -D warnings: clean

AdaWorldAPI marked this pull request as ready for review May 18, 2026 14:38

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81766e63d8


Review thread on src/hpc/blocked_grid/base.rs (outdated):

```rust
row_origin,
col_origin,
padded_cols,
data: &mut grid.data[start..end],
```

P1: Avoid overlapping mutable block views

When a grid has more than one block column, this strided slice includes the gaps between the block's rows, so adjacent column blocks overlap in memory. For example, in a 64×128 padded grid with 64×64 blocks, block (0,0) stores data[0..8128] and block (0,1) stores data[64..8192]; blocks_base_mut() can yield both GridBlockMuts alive at once via repeated next(), creating overlapping &mut [T] and making the safe iterator unsound. The mutable view needs to avoid holding a single strided &mut range, or the iterator must split the backing storage into truly disjoint pieces.
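The overlap in this example can be checked numerically. The sketch recomputes the old strided footprint, `start..start + (BR-1)*padded_cols + BC`, for the block at block coordinates `(bi, bj)`. `strided_footprint` and `overlaps` are hypothetical helpers.

```rust
use std::ops::Range;

/// Cells covered by the OLD strided `&mut [T]` for the block at block
/// coordinates `(bi, bj)`: from its first cell to its last cell, which drags in
/// every cell of the intermediate rows, including other blocks' columns.
pub fn strided_footprint(bi: usize, bj: usize, br: usize, bc: usize, padded_cols: usize) -> Range<usize> {
    let start = bi * br * padded_cols + bj * bc;
    start..start + (br - 1) * padded_cols + bc
}

/// Two non-empty half-open ranges overlap iff each starts before the other ends.
pub fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}
```

On the 64×128 grid, blocks (0,0) and (0,1) get `0..8128` and `64..8192`, which overlap. With a single block column (`padded_cols = 64`), vertically adjacent blocks get `0..4096` and `4096..8192`, which do not overlap. This matches the comment's observation that only grids with more than one block column are affected.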


Reply from the owner (PR author):

Confirmed UB. Fixed in 656a368.

Root cause: GridBlockMut::data: &'a mut [T] held a strided slice from [start..start + (BR-1)*padded_cols + BC], which fundamentally cannot describe a block without aliasing adjacent column blocks (they share rows that interleave in memory). BaseBlockIterMut::next happily yields multiple such slices alive at once via raw-pointer re-borrow, hitting the UB you described.

Fix: converted data: &'a mut [T] to data: *mut T + data_len: usize. The struct's new aliasing invariant (documented in the type docstring) forbids any wide &mut [T] materialization; cell access is exclusively through row_mut(r), which materializes &mut [T] of length BC starting at (row_origin + r, col_origin). Across simultaneously-live blocks, these per-row materializations target disjoint cells (each block owns its own [col_origin, col_origin + BC) column range on every physical row), so no two live &mut [T] ever alias.

Also tightened: from_raw is now unsafe fn with a documented caller contract; explicit unsafe impl Send + Sync matching the existing pattern on BaseBlockIterMut; replaced the data_mut() accessor (the vehicle for the UB) with data_ptr() -> *mut T + data_len() -> usize. All 5 gates still green (111 lib + 79 doctests).

Filed a follow-up note for PR-X3.1 to add a SAFETY-claim verification gate to the codex P0 audit so this class of bug — latent UB that type-checks but violates the aliasing model — gets caught pre-merge.


Generated by Claude Code

```rust
// to `next()` advances past the rows consumed by the returned
// super-block (disjoint from all future yields), so no two live
// `GridSuperBlockMut` items alias the same memory.
data: unsafe { self.data.add(start) },
```

P1: Restrict mutable super-blocks to their columns

For grids with multiple super-block columns, blocks_tier_mut() advances in row-major order but each yielded GridSuperBlockMut starts at (row_origin, 0) and spans the full row slab. Thus adjacent super-blocks in the same row, such as blocks_tier_mut::<1>() on a 64×128 padded grid, receive the same starting pointer and length even though callers may keep both yielded items alive. That violates the exclusive-access invariant advertised by the mutable iterator and can produce overlapping mutable base-block views when both super-blocks are used.


Reply from the owner (PR author):

Confirmed UB. Fixed in 656a368 alongside the related base-block bug (#158 review thread on base.rs:492).

You correctly diagnosed both halves: TierBlockIterMut::next() set data: *mut T from data_ptr.add(row_origin * padded_cols) with col_origin not entering the offset, so adjacent super-blocks in the same row received IDENTICAL pointers spanning the full row slab. Then base_blocks_mut() materialized strided &mut [T] from those colliding pointers via slice::from_raw_parts_mut(data_ptr.add(start), end - start) where end - start extended into adjacent super-blocks' columns — propagating the overlap to the GridBlockMut level.

Root cause was at the GridBlockMut level (the underlying base-block view's data: &'a mut [T] strided slice was fundamentally unsound for any block in a multi-column grid). Fixing GridBlockMut to use data: *mut T + per-row materialization through row_mut resolves both bugs:

  • BaseBlockIterMut: blocks no longer carry wide strided &mut [T]; row_mut materializes only [col_origin, col_origin+BC) cells, disjoint across blocks
  • TierBlockIterMut → base_blocks_mut: the intermediate slice::from_raw_parts_mut is removed; the super-block now passes raw pointer + length straight to GridBlockMut::from_raw (now unsafe fn with documented caller contract). Cell access flows through row_mut, picking up the same column-disjoint invariant.

The super-block's own data: *mut T is still set to row_origin's first cell (not col_origin), but that's now harmless — raw pointers can alias; the invariant only matters at &mut [T] materialization, which now happens exclusively per-row at column-disjoint addresses.

All 5 gates green: 111 lib tests + 79 doctests, fmt + clippy clean. Queued a SAFETY-claim verification gate for PR-X3.1's codex audit prompt to catch this class of latent UB pre-merge.



…ex P1×2)

Codex review on PR #158 flagged two P1 soundness bugs:

1. `BaseBlockIterMut::next()` yields `GridBlockMut` instances carrying
   `data: &'a mut [T]` slices over the strided block footprint
   (`[start..start + (BR-1)*padded_cols + BC]`). For grids with multiple
   block columns, adjacent column blocks' slices overlap heavily — e.g.,
   on a 64×128 padded grid with 64×64 blocks, block (0,0) covers
   `data[0..8128]` and block (0,1) covers `data[64..8192]`. Two such
   `&mut [T]` simultaneously live = UB.

2. `TierBlockIterMut::next()` yields `GridSuperBlockMut` instances with
   `data: *mut T` set to `data_ptr.add(row_origin * padded_cols)` —
   col_origin doesn't enter the offset, so adjacent super-blocks in the
   same row receive IDENTICAL raw pointers spanning the full row slab.
   `base_blocks_mut()` then materialized strided `&mut [T]` from these
   colliding super-block pointers, propagating the UB.

Both bugs trace to the same root cause: `GridBlockMut::data` stored
a strided `&'a mut [T]` slice that referenced cells outside the block's
own column range. Adjacent column blocks fundamentally share rows that
interleave in memory; no contiguous `&mut [T]` can describe a block
without aliasing siblings.

Fix: change `GridBlockMut::data` from `&'a mut [T]` to `*mut T` + add
`data_len: usize` for bounds checking. The struct's new aliasing
invariant (documented in the type-level docstring) is: `data` is NEVER
converted to a wide `&mut [T]`; cell access happens exclusively through
`row_mut(r)`, which materializes `&mut [T]` of length BC starting at
the block's own `(row_origin + r, col_origin)`. Across blocks, these
per-row materializations target disjoint cells (each block owns its
own `[col_origin, col_origin + BC)` column range), so no two live
`&mut [T]` ever alias.
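The sketch below reduces the invariant described above to a few lines; it is not the crate's `GridBlockMut`. Each hypothetical `BlockMut` keeps a raw pointer to the whole padded buffer plus its own origin. `row_mut` is the only place a `&mut [T]` is formed, and that slice is always `BC` cells wide and starts at the block's own column. The `split_blocks` helper, also hypothetical, hands out every block at once. That is exactly the usage that made the old strided slices unsound.

```rust
use std::marker::PhantomData;

/// Hypothetical mutable block view: raw base pointer + origin, never a wide `&mut [T]`.
pub struct BlockMut<'a, T> {
    base: *mut T,       // first cell of the whole padded buffer
    len: usize,         // length of the whole padded buffer
    padded_cols: usize, // row stride in cells
    row_origin: usize,
    col_origin: usize,
    br: usize,
    bc: usize,
    _borrow: PhantomData<&'a mut T>, // ties the view to the buffer's `&mut` borrow
}

impl<T> BlockMut<'_, T> {
    /// Materializes row `r` of this block as exactly `bc` cells.
    pub fn row_mut(&mut self, r: usize) -> &mut [T] {
        assert!(r < self.br, "row {} outside block", r);
        let start = (self.row_origin + r) * self.padded_cols + self.col_origin;
        assert!(start + self.bc <= self.len, "row outside buffer");
        // SAFETY: `start..start + bc` is in bounds (checked above) and covers only
        // this block's own columns on one physical row. Blocks from `split_blocks`
        // own pairwise-disjoint column ranges on every row, and `&mut self` rules
        // out two live slices from the same block.
        unsafe { std::slice::from_raw_parts_mut(self.base.add(start), self.bc) }
    }
}

/// Splits a row-major, `padded_cols`-wide buffer into every `br x bc` block at once.
pub fn split_blocks<T>(data: &mut [T], padded_cols: usize, br: usize, bc: usize) -> Vec<BlockMut<'_, T>> {
    assert!(br > 0 && bc > 0 && padded_cols > 0 && padded_cols % bc == 0);
    assert!(data.len() % (br * padded_cols) == 0, "buffer is not block-padded");
    let len = data.len();
    let base = data.as_mut_ptr();
    let padded_rows = len / padded_cols;
    let mut blocks = Vec::new();
    for row_origin in (0..padded_rows).step_by(br) {
        for col_origin in (0..padded_cols).step_by(bc) {
            blocks.push(BlockMut { base, len, padded_cols, row_origin, col_origin, br, bc, _borrow: PhantomData });
        }
    }
    blocks
}
```

Collecting `blocks.iter_mut().map(|b| b.row_mut(1))` into a `Vec` and writing through every slice is then fine, because each slice covers a distinct `[col_origin, col_origin + BC)` range of its row.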

Also:
- `GridBlockMut::from_raw` is now `unsafe fn` with documented caller
  contract (raw pointer + length + per-block column-disjoint invariant)
- Added `unsafe impl Send + Sync` for `GridBlockMut<T: Send/Sync>`
  matching the existing pattern on `BaseBlockIterMut` / `GridSuperBlockMut`
- Renamed `GridBlockMut::padded_cols` pub(crate) accessor to
  `padded_cols_stride` for naming consistency with `GridBlock`
  (resolves a PR-X3.1 housekeeping item early)
- Replaced `data_mut() -> &mut [T]` pub(crate) accessor with
  `data_ptr() -> *mut T` + `data_len() -> usize`. The wide-slice
  accessor was the materialization vehicle for the UB.
- Updated `iter.rs::row_mut` to materialize via `slice::from_raw_parts_mut`
  with a debug_assert bounds check and verbatim SAFETY comment
- Updated `super_block.rs::base_blocks_mut` to pass raw pointer + length
  to the new unsafe `from_raw` (no intermediate strided slice)
- Updated `super_block.rs::tier_mut_2_mutation_visible` test to use
  `row_mut(0)[0]` instead of the removed `data_mut()` accessor

All 5 gates still green:
- cargo check: PASS
- cargo test --lib hpc::blocked_grid: 111/111 PASS
- cargo test --doc hpc::blocked_grid: 79/79 PASS
- cargo fmt --check: clean
- cargo clippy -D warnings: clean

Codex audit gap noted for PR-X3.1: future audits need a SAFETY-claim
verification gate that simulates adversarial iterator usage (e.g.,
collect all yielded items into a Vec before consuming any) to catch
this class of latent UB that passes type-checking but violates the
aliasing model.
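The sketch below shows what such a gate could check, assuming that each yielded item can report the cell ranges it may write. It collects every footprint before inspecting any of them, which is the adversarial order, and then requires that no two footprints overlap. `assert_pairwise_disjoint` and `per_row_footprints` are hypothetical helpers. The positive case uses the per-row model of the fix. The negative case uses the two strided ranges from the review comment, `0..8128` and `64..8192`.

```rust
use std::ops::Range;

/// Collects every footprint first, then checks that no two share a cell.
/// After sorting non-empty ranges by start, any overlap also shows up
/// between some pair of neighbours, so one linear pass suffices.
pub fn assert_pairwise_disjoint<I: IntoIterator<Item = Range<usize>>>(footprints: I) {
    let mut all: Vec<Range<usize>> = footprints.into_iter().filter(|r| !r.is_empty()).collect();
    all.sort_by_key(|r| r.start);
    for pair in all.windows(2) {
        assert!(
            pair[0].end <= pair[1].start,
            "footprints {:?} and {:?} alias",
            pair[0],
            pair[1]
        );
    }
}

/// Cells that a per-row `row_mut` can touch: one `bc`-wide range per block row.
pub fn per_row_footprints(padded_rows: usize, padded_cols: usize, br: usize, bc: usize) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    for r0 in (0..padded_rows).step_by(br) {
        for c0 in (0..padded_cols).step_by(bc) {
            for r in 0..br {
                let start = (r0 + r) * padded_cols + c0;
                out.push(start..start + bc);
            }
        }
    }
    out
}
```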

AdaWorldAPI merged commit 589ef56 into master on May 18, 2026
15 checks passed
AdaWorldAPI pushed a commit that referenced this pull request May 18, 2026
…wildcards

Master consolidation: ndarray::hpc::* becomes the universal CPU-shape-aware
substrate. 10-submodule layout. Invariant 12 replaces jc's zero-dep rule
("certification = determinism + inspectability, not repo separation").
8-week schedule across 6 sprints with concurrent execution where the
dependency graph permits.

PR-X11 — jc consolidation: 6 workers move ewa_sandwich (Pillar-6),
ewa_sandwich_3d (Pillar-7), koestenberger, pflug (Pillar-10),
+ NEW Pillar-8 temporal_sandwich, Pillar-9 Cov<N> high-D, Pillar-11
signature transform into ndarray::hpc::pillar::*. Wasserstein/Sinkhorn-
Knopp/Hungarian primitives go to linalg::wasserstein. jc deprecates to
a thin probe-runner; 1-cycle #[deprecated] shim.

PR-X12 — x265-style codec: 8 workers ship ndarray::hpc::codec::* with
CTU/CU quad-tree, 4 modes (skip/merge/delta/escape), λ-RDO, rANS entropy
coder (chosen over CABAC for cache-friendliness; 0.5% compression-ratio
diff). PR-X9's lazy basin-codebook consumes this codec. Target: ~2.4
bytes/cell on coherent input, ≤ 4 bytes/cell worst-case (no regression).

PR-X13 — OGIT bridge: 4 workers embed the OGIT Cognitive namespace TTL
files (~150 KB) into ndarray via include_str! + ship a minimal Turtle
parser (~250 LoC, no rdflib dep) + O(1) family bitmap lookup. Subsumes
PR-Z1 (OGIT bootstrap) + PR-Z2 (lance-graph CognitiveBridge). 3-repo
coordination collapses to 1 sprint. Bardioc REST client integration
becomes optional follow-on, not blocker.

Phase 1 (Protocol B: plan → savant review → correct) drafts complete:
- pr-x3-cognitive-grid-design.md (shipped as PR #158)
- pr-x4-design.md
- pr-x9-design.md
- pr-z1-ogit-cognitive-bootstrap.md (superseded by PR-X13)
- pr-arithmetic-inventory.md
- pr-x10-linalg-core-design.md
- pr-master-consolidation.md
- pr-x11-jc-consolidation-design.md
- pr-x12-codec-x265-design.md
- pr-x13-ogit-bridge-design.md

Phase 2 (Protocol A: preflight Rust skeleton → parallel-savant fan-out →
workers fill bodies) starts after joint plan-review savant verdict on
all 10 docs. Per-sprint specialist savants: data-flow, layering,
distance-typing, SAFETY-claim, naming-collision, test-coverage.
The SAFETY-claim savant exists specifically to catch the class of latent UB
that PR-X3's GridBlockMut had (caught late, in codex PR review after the
Phase 11 audit had passed; preflight catches it pre-implementation).

Also adds settings.json wildcard permissions (Edit/Write/MultiEdit/
NotebookEdit + Bash touch/cat/tee/bash) per user authorization. Reduces
popup friction for the upcoming 44-worker concurrent execution.