PR-X3: BlockedGrid hierarchical block layout (workers A1-A5 + design docs) by AdaWorldAPI · Pull Request #158 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-05-18T14:05:16Z

Summary

Ships BlockedGrid<T, BR, BC> — a generic, const-generic-shaped, hierarchical block-padded 2-D grid in crate::hpc::blocked_grid::* — plus the blocked_grid_struct! SoA-of-grids macro. Layout-only, scalar inner loops, forward-compatible with the per-arch SIMD register-stack swap planned for PR-X5.

This is the PR-X3 carve-out from the cognitive-shader roadmap (see .claude/knowledge/cognitive-shader-foundation.md §"Current Gaps" and .claude/knowledge/pr-x3-cognitive-grid-design.md). All seven sprint workers (A1-A6 + B) landed. Codex P0 audit verdict: READY-FOR-PR, 0 P0, 2 P1 patched. P2 savant pre-merge review next.

What ships

Public API surface (crate::hpc::blocked_grid::*):

Worker	File	Public items	Commit
A1	`base.rs`	`BlockedGrid<T, BR, BC>` struct, `new`/`new_with_pad`, `idx`/`get`/`set`, `as_padded_slice*` with `# Footgun` docs, `GridBlock`/`GridBlockMut` view types with `PhantomData` lifetime variance, compile-time `BR > 0 && BC > 0` assert	`a7e9a67`
A2	`iter.rs`	`BaseBlockIter`/`BaseBlockIterMut`, `blocks_base`/`blocks_base_mut`, `GridBlockMut::row_mut`	`af8a4c8`
A3	`super_block.rs`	`GridSuperBlock`/`GridSuperBlockMut`, `TierBlockIter`/`TierBlockIterMut`, `blocks_tier::<N>`/`blocks_tier_mut::<N>` with documented panic on invalid divisibility	`195ce67`
A4	`compute.rs`	`map_base`/`map_tier` (PRIMARY compute — immutable self, returns new grid), `bulk_apply_base`/`bulk_apply_tier` (SECONDARY write-back, each carrying `# Data-flow rule` docstring citing `.claude/rules/data-flow.md` Rule #3)	`2ed97a6`
A5	`aliases.rs`	Seven type aliases (`ShaderMantissaGrid`, `AmxBf16Grid`, `AmxInt8Grid`, `StripF32Stack2`, `StripF32Stack4`, `SquareF64Stack8`, `HalfSquareU64`); L1/L2/L3/L4 alias impls on `BlockedGrid<T, 64, 64>` only — Q7 ruling — for `blocks_l` / `map_l` / `bulk_apply_l*`	`b479956`
A6	`tests.rs` + `mod.rs` doctest	Integration tests (W4 `bulk_apply` composition, L1→L2 cascade, footgun verification) + module-level canonical compose doctest + compile_fail guards	`32eaf11`
B	`grid_struct_macro.rs`	`blocked_grid_struct!` `#[macro_export]` SoA-of-grids macro, generated `{Name}L1Block`/`{Name}L1BlockMut`, `map_l1`/`bulk_apply_l1` on generated structs (L2-L4 macro methods deferred to follow-up), `FieldGridRef` trait + `field_n::<I>()` const-generic accessor	`b4c6692`

Design docs (.claude/knowledge/):

pr-x3-cognitive-grid-design.md (v2) — binding spec, absorbs plan-review savant P0/P1/P2 patches and seven Q1-Q7 rulings
pr-x3-plan-review.md — Phase 2 savant verdict
pr-x3-codex-audit.md — Phase 11 codex audit verdict (READY-FOR-PR)
pr-x1-design.md + pr-x2-design.md — drafted in parallel, queued for their own sprints

Layering, data-flow, and distance-typing guardrails

Zero #[target_feature], zero per-arch imports, zero raw intrinsics. Hardware dispatch happens inside the consumer's closure body via crate::simd::* (W1a contract).
Data-flow Rule #3 enforced. Every &mut self method carries a # Data-flow rule docstring section pointing to the PRIMARY map_* / map_l* compute path. Was P0 finding A1/A2 from the plan-review savant; v2 split compute from write-back.
No distance metric baked in. BlockedGrid holds T; doesn't know what T means. Semantics live in consumer closures + future crate::hpc::cognitive::* (W7).

Dependency addition

Adds paste = "1" to Cargo.toml (already present in workspace lock via crates/burn; binary impact zero). Re-exported as #[doc(hidden)] pub use paste; in src/lib.rs to support hygienic ident concat ([<$name L1Block>]) inside the blocked_grid_struct! macro. Standard stable-Rust mechanism for macro_rules! identifier generation.

Test plan

Current state (after codex P1-1 patch at 01a70edb): 111 lib tests + 79 doctests, all 5 cargo gates green.

cargo check -p ndarray --no-default-features --features std — PASS
cargo test -p ndarray --lib --no-default-features --features std hpc::blocked_grid — 111/111 PASS
cargo test --doc -p ndarray --no-default-features --features std hpc::blocked_grid — 79/79 PASS
cargo fmt --all -- --check — clean
cargo clippy -p ndarray --no-default-features --features std -- -D warnings — clean
Codex P0 audit: READY-FOR-PR, 0 P0, 2 P1 applied
P2 savant pre-merge review (running)

Sprint protocol traceability

Per .claude/knowledge/pr-x3-cognitive-grid-design.md §"Worker decomposition":

✅ Plan v1 (b348d43c)
✅ Plan-review savant (Phase 2): READY-WITH-DOC-FIXES, 2 P0 + 7 P1 + 4 P2
✅ Plan v2 (c6414ec0) + per-worker file refactor (b5329f06)
✅ Workers A1-A6 + B (sequential A1, parallel A2+A3, sequential A4→A5→A6→B)
✅ Codex P0 audit (Phase 11): READY-FOR-PR; P1-1 patched (01a70edb)
🟢 P2 savant pre-merge review (Phase 13) — running
⬜ Flip to ready-for-review + merge ladder

Roadmap (post-merge)

PR-X4 — refactor splat3d/tile.rs 16×16-tile binning onto BlockedGrid
PR-X5 — typed SIMD register-bank stacks (StackedU64x8<N>, StackedF32x16<N>, AmxTile<T, R, C>) in crate::simd::* with per-arch LazyLock dispatch
W7 — typed cognitive distance bulk fns + the actual CausalEdge64 mantissa cell kernel
PR-X1 / PR-X2 — separately queued (drafted in .claude/knowledge/pr-x1-design.md and pr-x2-design.md)
Macro L2/L3/L4 methods (deferred from B) — small follow-up PR

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

PR-X3 design contract for the next-wave sprint after W3-W6 + #157.

Scope (PR-X3): CognitiveGrid<T, BR, BC> const-generic 2-D blocked grid with hierarchical tier iterators (L1=64x64 / L2=256x256 / L3=4096x4096 / L4=16384x16384 on the default 64x64 base) + cognitive_grid_struct! macro for SoA-of-grids. CausalEdge64 (u64) is the canonical cell type acting as cognitive-shader mantissa.

Layering: scalar layout only. No #[target_feature], no per-arch imports, no SIMD primitives. Forward-compatible with PR-X5 (SIMD register-bank stacks) and W7 (typed cognitive distance bulk fns + cell kernels).

Hardware-block x cell-type matrix documents the AMX BF16 (16x16), AMX INT8 (16x64 half-square), AVX-512 F32x16 / F64x8 / U64x8 / U8x64, and NEON dotprod natural shapes. Default 64x64 is the LCM of all useful register-bank shapes; const generics let consumers specialize.

Sequential 5-10 Sonnet workers + 1 Opus coordinator protocol per the binding pattern: plan -> review -> correct -> sprint -> review code -> fix P0 -> commit -> repeat. Workers in isolated worktrees, sequential ordering (Worker B macro depends on Worker A core API).

Token-reset safety: doc is self-contained, includes context recovery notes for fresh sessions arriving without conversational history.

Cross-references: w3-w6-soa-aos-design.md, cognitive-shader-foundation.md, cognitive-distance-typing.md, vertical-simd-consumer-contract.md, w3-w6-codex-audit.md, w3-w6-p2-savant-review.md.

Phase 3 corrector pass on the PR-X3 BlockedGrid design doc, applying the plan-review savant verdict (READY-WITH-DOC-FIXES, 2 P0 + 7 P1 + 4 P2).

P0 patches applied:
- A1: split bulk_apply_base/tier into map_base/map_tier (PRIMARY compute, immutable self, returns new grid) + bulk_apply_base/tier (SECONDARY write-back, with explicit data-flow Rule #3 docstring citing .claude/rules/data-flow.md).
- A2: macro emits both map_l1-l4 (compute) AND bulk_apply_l1-l4 (write-back) on the generated SoA-of-grids struct.

P1 patches applied:
- F1: CognitiveGrid → BlockedGrid rename; module path crate::hpc::blocked_grid
- F2: Block → GridBlock, BlockMut → GridBlockMut, SuperBlock → GridSuperBlock (avoids collision with crate::backend::native BLAS Block)
- F3: cache-hierarchy convention note (L1 innermost, L4 framebuffer-scale)
- G2: keep both map_tier::<N> + L1-L4 aliases (and same for bulk_apply)
- G6: add new_with_pad(rows, cols, pad_value: T) ctor (T: Copy only, no Default bound); new() delegates with T::default()
- G3: # Footgun doc section on as_padded_slice + as_padded_slice_mut
- G4: macro emits field_n::<I> const-generic field accessors

P2 patches applied:
- J1: PhantomData lifetime variance note on GridBlock/GridBlockMut
- J4: module-level docstring out-of-scope warning requirement (3 lines max)

Q1-Q7 rulings persisted in §"Resolved questions" (was §"Open questions" in v1).

Worker decomposition: 7-worker split (A1-A6 + B) is the DEFAULT, not the fallback.

Fixed §"Sprint protocol" step 4 contradiction (was "Two workers in parallel", corrected to "Spawn workers SEQUENTIALLY").

Verdict file persisted at .claude/knowledge/pr-x3-plan-review.md (savant had no write permission; coordinator wrote it post-task).

The previous Read/Write/Edit/MultiEdit/NotebookEdit allow entries used the bare `**` glob, which doesn't match against actual file paths in the current Claude Code harness — so every Edit/Write call triggered a permission popup despite being on the allowlist. Switching to the `{**}` glob form (curly-brace alternation) so the patterns actually fire. Deny entries (./.archive/**, ./.git/**, ./CLAUDE-CREDENTIALS.md, …) are left untouched — they use absolute prefixes and were matching correctly. Only the catch-all "any path" entries needed the syntax fix.

… (PR-X3 A1) First sprint cut for PR-X3 per .claude/knowledge/pr-x3-cognitive-grid-design.md (design v2 @ c6414ec).

Ships:
- BlockedGrid<T, BR, BC> struct (private fields, row-major padded storage)
- new(rows, cols) + new_with_pad(rows, cols, pad_value)
- rows/cols/padded_rows/padded_cols/block_dims/idx/get/set accessors
- as_padded_slice + as_padded_slice_mut with # Footgun docs
- GridBlock<'a, T, BR, BC> + GridBlockMut<'a, T, BR, BC> view types (PhantomData lifetime variance per Q3 ruling)
- Inline unit tests for all of the above
- Const-generic compile-time assertion BR > 0 && BC > 0

Iterators, super-blocks, map_*, bulk_apply_*, and convenience aliases deferred to workers A2-A5. Macro deferred to worker B.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
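The row-major padded storage described in this commit can be illustrated with a minimal sketch. `PaddedGrid` is a hypothetical stand-in, not the crate's `BlockedGrid`; only the method names follow the commit message, and the bodies (rounding up with `div_ceil`, bounds checks in `idx`) are assumptions.

```rust
/// Hypothetical stand-in for `BlockedGrid<T, BR, BC>`; bodies are assumptions,
/// only the method names follow the commit message.
pub struct PaddedGrid<T, const BR: usize, const BC: usize> {
    rows: usize,
    cols: usize,
    padded_rows: usize,
    padded_cols: usize,
    data: Vec<T>, // row-major, padded_rows * padded_cols cells
}

impl<T: Copy, const BR: usize, const BC: usize> PaddedGrid<T, BR, BC> {
    /// Compile-time shape guard, evaluated when a constructor is instantiated.
    const SHAPE_OK: () = assert!(BR > 0 && BC > 0, "block dims must be non-zero");

    /// Rounds the logical shape up to whole blocks and fills every cell,
    /// padding included, with `pad_value`.
    pub fn new_with_pad(rows: usize, cols: usize, pad_value: T) -> Self {
        let () = Self::SHAPE_OK;
        let padded_rows = rows.div_ceil(BR) * BR;
        let padded_cols = cols.div_ceil(BC) * BC;
        let data = vec![pad_value; padded_rows * padded_cols];
        Self { rows, cols, padded_rows, padded_cols, data }
    }

    pub fn padded_dims(&self) -> (usize, usize) {
        (self.padded_rows, self.padded_cols)
    }

    /// Flat index of logical cell (r, c); the stride is the padded width.
    pub fn idx(&self, r: usize, c: usize) -> usize {
        assert!(r < self.rows && c < self.cols, "cell out of logical bounds");
        r * self.padded_cols + c
    }

    pub fn get(&self, r: usize, c: usize) -> T {
        self.data[self.idx(r, c)]
    }

    pub fn set(&mut self, r: usize, c: usize, v: T) {
        let i = self.idx(r, c);
        self.data[i] = v;
    }

    /// Whole backing buffer, padding cells included (the footgun:
    /// callers see pad values interleaved with logical cells).
    pub fn as_padded_slice(&self) -> &[T] {
        &self.data
    }
}
```

With 64×64 blocks, a 100×130 logical grid would round up to 128×192 under these assumptions, so the padded slice is longer than `rows * cols` and its row stride is 192 rather than 130.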

A1 shipped 772 lines in mod.rs at 1101b7d (cherry-picked as a7e9a67). Refactoring into one-file-per-sprint-worker per the design doc's new §"Per-worker file scoping" section, so workers A2-A5 + B can spawn in parallel without colliding on the same file.

File layout:
- mod.rs: slim index (submodule decls + re-exports)
- base.rs: A1's content (BlockedGrid, GridBlock, GridBlockMut, all accessors, inline tests). Named `base` not `core` to avoid shadowing the std `core` crate.
- iter.rs: A2 stub (BaseBlockIter, BaseBlockIterMut)
- super_block.rs: A3 stub (GridSuperBlock, TierBlockIter, blocks_tier)
- compute.rs: A4 stub (map_*, bulk_apply_*)
- aliases.rs: A5 stub (convenience aliases + L1-L4 impls)

All 5 gates still green after refactor:
- cargo check: PASS
- cargo test --lib (blocked_grid): 23/23 PASS
- cargo test --doc (blocked_grid): 25/25 PASS
- cargo fmt --check: PASS
- cargo clippy -D warnings: PASS

Also tightens .claude/settings.json: replaces the catch-all `Edit({**})` / `Write({**})` with per-area entries (src/{**}, crates/{**}, .claude/knowledge/{**}, Cargo.toml, etc.). NotebookEdit removed (no notebooks in this project). Read({**}) stays broad; agents need read access everywhere for context.

Design doc gains §"Per-worker file scoping (binding)" with the worker→file mapping table, and §"The agent sequence for PR-X3" now notes that workers A2-A5 can spawn in parallel once the file-split scaffolding is landed.

…oa_struct! pad_to_lanes PR-X1 design — MultiLaneColumn, Fingerprint::as_u8x64, array_window, simd::* re-export sweep. Carves out the SIMD-staged inner-loop primitives flagged by the W3-W6 P2 savant review (A1/A4 findings). PR-X2 design — generalize aos_to_soa / soa_to_aos to <T, U, N> so non-f32 element types are first-class, and add the #[soa(pad_to_lanes=N)] field attribute to soa_struct! so SIMD kernels get guaranteed tail padding. Both designs follow the same 7-phase sprint-protocol shape as PR-X3 (plan → review → correct → sprint sequential → audit → fix P0 → P2 review). Sequential worker decomposition. No code changes — design docs only. https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

BaseBlockIter / BaseBlockIterMut + blocks_base / blocks_base_mut impls on BlockedGrid<T, BR, BC>. Row-major iteration over the BR×BC base blocks. Inline tests for all spec cases. Also adds GridBlockMut::row_mut in iter.rs (needed by iterator doctests and downstream workers; A1's base.rs exposed data_mut + padded_cols helpers that make this possible from within the sibling module without touching base.rs field visibility). https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
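Row-major iteration over base blocks comes down to enumerating block origins. The sketch below is an assumption about how such an order could be computed; `block_origins` is a hypothetical helper, not part of the PR's API, and it takes the padded dimensions directly instead of a grid.

```rust
/// Hypothetical helper: yields (row_origin, col_origin) of every BR×BC block
/// of a padded grid, left to right, then top to bottom.
pub fn block_origins<const BR: usize, const BC: usize>(
    padded_rows: usize,
    padded_cols: usize,
) -> impl Iterator<Item = (usize, usize)> {
    let block_cols = padded_cols / BC; // blocks per block-row
    let total = (padded_rows / BR) * block_cols;
    // `total` is zero whenever `block_cols` is zero, so the divisions below
    // are never reached for an empty grid.
    (0..total).map(move |i| ((i / block_cols) * BR, (i % block_cols) * BC))
}
```

For a 128×192 padded grid with 64×64 blocks this enumerates six origins, starting at (0, 0), (0, 64), (0, 128) and continuing with the second block row at (64, 0).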

Uncomment the `pub use iter::{BaseBlockIter, BaseBlockIterMut};` line now that A2 (a4975a0) has landed real implementations. Cherry-picked A2's commit as af8a4c8. All 5 gates green: 34 lib tests + 30 doctests.

GridSuperBlock<'a, T, BR, BC, N> + GridSuperBlockMut + TierBlockIter + TierBlockIterMut + blocks_tier::<N> / blocks_tier_mut::<N> impls on BlockedGrid. Const-generic N=tier-stride. Panics on invalid (BR*N, BC*N) divisibility with a documented error message. Inline tests for all spec'd cases including the panic case via #[should_panic]. Also: moved as_padded_slice / as_padded_slice_mut from impl<T: Copy> to impl<T> in base.rs — those methods only borrow &[T] / &mut [T] and do not need T: Copy; the Copy bound blocked blocks_tier from calling them. Added pub(super) GridBlock::from_raw + GridBlockMut::from_raw to base.rs so super_block.rs can construct base-block views without T: Copy. https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
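The documented panic on invalid divisibility can be pictured as a shape check before any tier iteration starts. This is a sketch under the assumption that the check compares the padded dimensions against the tier block shape (BR*N, BC*N); `tier_block_count` and its message text are hypothetical.

```rust
/// Hypothetical check: number of tier-N super-blocks along each axis.
/// Panics when the padded shape is not a whole multiple of (BR*N, BC*N).
pub fn tier_block_count<const BR: usize, const BC: usize, const N: usize>(
    padded_rows: usize,
    padded_cols: usize,
) -> (usize, usize) {
    let (tier_rows, tier_cols) = (BR * N, BC * N);
    assert!(
        tier_rows > 0
            && tier_cols > 0
            && padded_rows % tier_rows == 0
            && padded_cols % tier_cols == 0,
        "padded shape {}x{} is not divisible by tier block {}x{}",
        padded_rows,
        padded_cols,
        tier_rows,
        tier_cols
    );
    (padded_rows / tier_rows, padded_cols / tier_cols)
}
```

Under these assumptions, a 512×256 padded grid holds 2×1 super-blocks at N=4 on a 64×64 base, while a 128×128 padded grid would panic because 128 is not a multiple of 256.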

…om mod.rs Uncomment the `pub use super_block::{...}` line now that A3 (4224b33) has landed real implementations. Adds `TierBlockIterMut` to the re-export list (A3 shipped both read-only and mutable tier iterators). All 5 gates green: 49 lib tests + 48 doctests.

…PR-X3 A4) Splits the API into:
- map_base / map_tier: PRIMARY compute paths (immutable self, returns a new BlockedGrid<U, BR, BC>); satisfy data-flow Rule #3
- bulk_apply_base / bulk_apply_tier: SECONDARY write-back paths (&mut self); each carries the mandatory # Data-flow rule docstring section citing .claude/rules/data-flow.md verbatim

Closure signatures use the two-block pattern (input block + output block) for map_*, and (mut output block + coordinates) for bulk_apply_*.

Inline tests verify input-unchanged invariant on map_*, write-back correctness on bulk_apply_*, panic propagation on bulk_apply_tier with invalid divisibility, and empty-grid degenerate cases.

Also adds GridBlock::row() / rows() accessors in compute.rs (spec'd in the PR-X3 design doc but missing from A1's base.rs). Requires two #[doc(hidden)] pub helpers on GridBlock in base.rs (data_slice() and padded_cols_stride()) so compute.rs can reach private fields without reopening base.rs wholesale; the minimal touch justified by spec gap.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
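The compute / write-back split can be shown on a much smaller type. In this sketch whole rows stand in for the PR's block views, and `Grid`, `map_rows`, and `bulk_apply_rows` are hypothetical names; only the shape of the two closure signatures (input + output for the compute path, mutable output + coordinate for write-back) follows the commit message.

```rust
/// Hypothetical row-chunked grid; rows stand in for the PR's block views.
pub struct Grid<T> {
    cols: usize,
    data: Vec<T>,
}

impl<T: Copy> Grid<T> {
    pub fn new(rows: usize, cols: usize, fill: T) -> Self {
        assert!(cols > 0, "cols must be non-zero");
        Self { cols, data: vec![fill; rows * cols] }
    }

    pub fn as_slice(&self) -> &[T] {
        &self.data
    }

    /// PRIMARY compute path: `&self` in, a new grid out; the input is untouched.
    pub fn map_rows<U, F>(&self, mut f: F) -> Grid<U>
    where
        U: Copy + Default,
        F: FnMut(&[T], &mut [U]),
    {
        let mut out = vec![U::default(); self.data.len()];
        let rows_in = self.data.chunks_exact(self.cols);
        let rows_out = out.chunks_exact_mut(self.cols);
        for (src, dst) in rows_in.zip(rows_out) {
            f(src, dst);
        }
        Grid { cols: self.cols, data: out }
    }

    /// SECONDARY write-back path: mutates in place; the closure receives the
    /// mutable row and its row index.
    pub fn bulk_apply_rows<F: FnMut(&mut [T], usize)>(&mut self, mut f: F) {
        let cols = self.cols;
        for (r, row) in self.data.chunks_exact_mut(cols).enumerate() {
            f(row, r);
        }
    }
}
```

Because `map_rows` only borrows `&self`, the type system itself guarantees the input-unchanged invariant that the inline tests check; the write-back variant is the only one that can alter existing cells.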

Type aliases for the cognitive-shader and SIMD-tier shapes:
- ShaderMantissaGrid (u64, 64, 64): CausalEdge64 mantissa default
- AmxBf16Grid (u16, 16, 16): AMX BF16 TDPBF16PS tile shape
- AmxInt8Grid (u8, 16, 64): AMX INT8 TDPBUSD half-square shape
- StripF32Stack2 / Stack4 (f32, 2|4, 16): F32x16 vertical stacks
- SquareF64Stack8 (f64, 8, 8): F64x8 8×8 GEMM kernel shape
- HalfSquareU64 (u64, 32, 64): half-square U64 grid

L1/L2/L3/L4 alias impls on BlockedGrid<T, 64, 64> ONLY (Q7 ruling):
- blocks_l1/2/3/4 delegating to blocks_base / blocks_tier::<4|64|256>
- map_l1/2/3/4 delegating to map_base / map_tier::<4|64|256>
- bulk_apply_l1/2/3/4 delegating to bulk_apply_base / bulk_apply_tier::<N>

Each bulk_apply_l* carries the verbatim # Data-flow rule docstring section matching A4's bulk_apply_base. Cache-hierarchy convention (L1 innermost, L4 framebuffer-scale) documented in the first method's docstring.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
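The tier strides named here (base, then 4, 64, 256) line up with the L1-L4 edge lengths quoted in the design contract. A small arithmetic sketch, with `BASE`, `TIER_STRIDES`, and `tier_edge` as hypothetical names introduced only for illustration:

```rust
/// Edge length of the default base block.
pub const BASE: usize = 64;

/// Stride in base blocks for L1..L4 (L1 is the base block itself).
pub const TIER_STRIDES: [usize; 4] = [1, 4, 64, 256];

/// Edge length in cells of tier `level` (1-based); panics for levels outside 1..=4.
pub const fn tier_edge(level: usize) -> usize {
    BASE * TIER_STRIDES[level - 1]
}
```

This reproduces 64, 256, 4096, and 16384 cells per edge for L1 through L4 on the 64×64 base.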

Uncomment the aliases re-export now that A5 (2402275) has landed. All 74 lib tests and 74 doctests passing. Cargo fmt + clippy clean.

…X3 A6) Adds the final test-density layer after A1-A5 shipped their inline #[cfg(test)] coverage:
- src/hpc/blocked_grid/tests.rs (new): integration tests that span multiple submodules: W4 bulk_apply composition, L1→L2 cascade, all seven type aliases instantiate, half-square AMX INT8 pattern, as_padded_slice footgun verification, const-generic compile-fail
- src/hpc/blocked_grid/mod.rs: module-level doctest demonstrating the canonical compose pattern (ShaderMantissaGrid → map_l1 → verify input unchanged + output as expected) and #[cfg(test)] mod tests registration

Test count: +17 lib tests + 2 doctests above the A5 baseline (74 + 74). All 5 gates green.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

…ds (PR-X3 B) The macro generates an SoA-of-grids struct: each named field becomes its own BlockedGrid<FieldT, BR, BC> with shared rows/cols/padded dimensions.

Generated API (v1: L1 only; L2/L3/L4 deferred to follow-up):
- {Name}::new(rows, cols) constructor
- rows/cols/padded_rows/padded_cols accessors
- blocks_l1() lockstep iteration → {Name}L1Block<'_>
- map_l1() PRIMARY compute path → new {Name}, input unchanged
- bulk_apply_l1() SECONDARY write-back, carries verbatim # Data-flow rule docstring section per .claude/rules/data-flow.md Rule #3
- field_n::<I>() compile-time field accessor (P1 G4 ruling)
- {Name}L1Block / {Name}L1BlockMut / {Name}L1BlockIter view types

Also adds:
- FieldGridRef trait (object-safe dimension accessor for &dyn use)
- Clone impl for BlockedGrid<T: Copy + Default, BR, BC>
- paste = "1" dependency (identifier concat for {Name}L1Block naming)
- pub use paste re-export in lib.rs for $crate::paste::paste! macro hygiene

Reserved field names enforced (compile error if shadowed): `new`, `rows`, `cols`, `padded_rows`, `padded_cols`, `blocks_l1/2/3/4`, `map_l1/2/3/4`, `bulk_apply_l1/2/3/4`, `field_n`, `default`.

Inline tests: 2/3/4-field generation, pub/private field visibility, #[derive(Clone)] passthrough, map_l1 input-unchanged invariant, bulk_apply_l1 lockstep mutation, field_n::<0>/<1> accessor.

5-gate result: check OK, lib 111/111, doc 79/79, fmt OK, clippy OK.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
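The SoA-of-grids idea (one storage per field, shared dimensions) can be sketched with plain `macro_rules!`. This is a deliberately reduced illustration: `soa_grids!` and `Particles` are hypothetical, each field is a `Vec` rather than a `BlockedGrid`, and no `{Name}L1Block` view types are generated because that identifier concatenation is what the PR uses `paste` for.

```rust
/// Hypothetical, reduced SoA macro: every field becomes its own flat storage
/// sharing one (rows, cols) shape. A field named `rows` or `cols` would
/// collide with the generated members and fail to compile.
macro_rules! soa_grids {
    ($vis:vis struct $name:ident { $($fvis:vis $field:ident : $ty:ty),+ $(,)? }) => {
        $vis struct $name {
            rows: usize,
            cols: usize,
            $($fvis $field: Vec<$ty>,)+
        }

        impl $name {
            /// Allocates every field with its element type's default value.
            $vis fn new(rows: usize, cols: usize) -> Self {
                Self {
                    rows,
                    cols,
                    $($field: vec![<$ty>::default(); rows * cols],)+
                }
            }

            $vis fn rows(&self) -> usize { self.rows }
            $vis fn cols(&self) -> usize { self.cols }
        }
    };
}

soa_grids! {
    pub struct Particles {
        pub mass: f32,
        pub id: u64,
    }
}
```

Keeping each field in its own contiguous storage is what lets a kernel stream one field at a time; the real macro additionally offers lockstep block iteration across fields, which this sketch omits.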

… + persist audit verdict Codex P0 audit (Phase 11) returned READY-FOR-PR with 0 P0, 2 P1, 2 P2.

P1-1 (applied): GridBlockMut::row_mut had module-level data-flow framing but not a method-level `# Data-flow rule` docstring section. Added the verbatim citation of .claude/rules/data-flow.md Rule #3 and the pointer to BlockedGrid::map_base for compute paths.

P1-2 (PR description action): `paste = "1"` dep addition will be called out in the PR description on next update.

P2 findings (deferred to P2 savant in Phase 13):
- pub helpers on GridBlock/GridBlockMut should perhaps be pub(crate)
- field_n::<I>() type erasure: additive typed accessor worth considering

Verdict file persisted at .claude/knowledge/pr-x3-codex-audit.md.

…st verdict P2 savant (Phase 13) verdict: SHIP-WITH-FOLLOWUPS. 4 P2 findings; 3 applied in this commit, 1 deferred to PR-X3.1.

P2-1 (applied): downscope `pub` helpers on GridBlock/GridBlockMut to `pub(crate)`. The four helpers (data_slice, padded_cols_stride on GridBlock; data_mut, padded_cols on GridBlockMut) are intra-crate implementation seams. Leaving them `pub` meant downstream consumers could bypass the `# Footgun` guard on `as_padded_slice`. Also drops the `#[doc(hidden)]` attribute, which is no longer needed once visibility is tight.

P2-3 (applied): drop stray `T: Copy` bound from `GridBlock::from_grid`, `GridBlockMut::from_grid`, `Iterator for BaseBlockIter`, `Iterator for BaseBlockIterMut`, both `ExactSizeIterator` impls, and the impl block holding `blocks_base` / `blocks_base_mut`. None of these positions actually copy a `T` value; they only compute index arithmetic and slice the storage. The bound was over-constraining; iterator surface now works for any `T` (not just `T: Copy`). `BlockedGrid::get` / `set` still correctly require `T: Copy` because they do copy values.

P2-4 (applied): strengthen macro L1-only deferral wording with explicit PR-X3.1 ticket reference + `TODO(PR-X3.1)` marker + dedicated per-field workaround warning. Reduces the risk that callers cement per-field loops outside the macro-generated struct.

P2-2 (DEFERRED → PR-X3.1): typed `field_grid::<I, FieldT>()` accessor alongside the existing erased `field_n::<I>()`. Additive but requires either a downcast trait or extra macro emit arm; no current consumer needs it.

Verdict file persisted at .claude/knowledge/pr-x3-p2-savant-review.md. PR-X3.1 follow-up backlog documented at the bottom of the verdict file.

All 5 gates green after tightenings:
- cargo check: PASS
- cargo test --lib hpc::blocked_grid: 111/111 PASS
- cargo test --doc hpc::blocked_grid: 79/79 PASS
- cargo fmt --check: clean
- cargo clippy -D warnings: clean
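The reasoning behind P2-3 (slicing needs no `T: Copy`, value-returning accessors do) can be seen on a toy container. `Cells` is a hypothetical type used only for illustration and does not mirror the crate's impl blocks.

```rust
/// Hypothetical flat container with a fixed row width.
pub struct Cells<T> {
    cols: usize,
    data: Vec<T>,
}

// No `Copy` bound: these methods only do index arithmetic and borrow a slice.
impl<T> Cells<T> {
    pub fn from_vec(cols: usize, data: Vec<T>) -> Self {
        assert!(cols > 0 && data.len() % cols == 0, "data must fill whole rows");
        Self { cols, data }
    }

    pub fn row(&self, r: usize) -> &[T] {
        &self.data[r * self.cols..(r + 1) * self.cols]
    }
}

// `Copy` is required only where a `T` is returned by value.
impl<T: Copy> Cells<T> {
    pub fn get(&self, r: usize, c: usize) -> T {
        self.row(r)[c]
    }
}
```

With the bound placed this way, `Cells<String>` can still hand out row slices, while `get` is available only for element types that are actually copyable.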

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81766e63d8

chatgpt-codex-connector · 2026-05-18T14:42:20Z

+            row_origin,
+            col_origin,
+            padded_cols,
+            data: &mut grid.data[start..end],


Avoid overlapping mutable block views

When a grid has more than one block column, this strided slice includes the gaps between the block's rows, so adjacent column blocks overlap in memory. For example, in a 64×128 padded grid with 64×64 blocks, block (0,0) stores data[0..8128] and block (0,1) stores data[64..8192]; blocks_base_mut() can yield both GridBlockMuts alive at once via repeated next(), creating overlapping &mut [T] and making the safe iterator unsound. The mutable view needs to avoid holding a single strided &mut range, or the iterator must split the backing storage into truly disjoint pieces.
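The overlap in this example follows directly from the strided footprint formula. The sketch below recomputes the two ranges quoted in the review; `strided_footprint` and `overlaps` are hypothetical helpers written for this illustration.

```rust
use std::ops::Range;

/// Flat index range covered by one contiguous slice that spans a whole
/// br×bc block inside a row-major buffer with row stride `padded_cols`.
pub fn strided_footprint(
    row_origin: usize,
    col_origin: usize,
    br: usize,
    bc: usize,
    padded_cols: usize,
) -> Range<usize> {
    assert!(br > 0, "block must have at least one row");
    let start = row_origin * padded_cols + col_origin;
    // The slice must reach the last cell of the block's last row.
    start..start + (br - 1) * padded_cols + bc
}

/// True when two half-open ranges share at least one index.
pub fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}
```

For the 64×128 padded grid with 64×64 blocks, block (0,0) yields `0..8128` and block (0,1) yields `64..8192`, so the two footprints share every index from 64 to 8127.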

Confirmed UB. Fixed in 656a368.

Root cause: GridBlockMut::data: &'a mut [T] held a strided slice from [start..start + (BR-1)*padded_cols + BC], which fundamentally cannot describe a block without aliasing adjacent column blocks (they share rows that interleave in memory). BaseBlockIterMut::next happily yields multiple such slices alive at once via raw-pointer re-borrow, hitting the UB you described.

Fix: converted data: &'a mut [T] → data: *mut T + data_len: usize. The struct's new aliasing invariant (documented in the type docstring) forbids any wide &mut [T] materialization; cell access is exclusively through row_mut(r), which materializes &mut [T] of length BC starting at (row_origin + r, col_origin). Across simultaneously-live blocks, these per-row materializations target disjoint cells (each block owns its own [col_origin, col_origin + BC) column range on every physical row), so no two live &mut [T] ever alias.

Also tightened: from_raw is now unsafe fn with a documented caller contract; explicit unsafe impl Send + Sync matching the existing pattern on BaseBlockIterMut; replaced the data_mut() accessor (the vehicle for the UB) with data_ptr() -> *mut T + data_len() -> usize. All 5 gates still green (111 lib + 79 doctests).
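A minimal sketch of the raw-pointer design described in this reply, under stated assumptions: `BlockMut` and `blocks_mut` are hypothetical names, the pointer here is the base of the whole buffer (the PR's exact pointer origin is not shown in this thread), and the `Send`/`Sync` impls are omitted.

```rust
use std::marker::PhantomData;

/// Hypothetical mutable block view: raw base pointer plus geometry.
/// It never holds a wide `&mut [T]` over the strided footprint.
pub struct BlockMut<'a, T, const BR: usize, const BC: usize> {
    data: *mut T,
    data_len: usize,
    padded_cols: usize,
    row_origin: usize,
    col_origin: usize,
    _lt: PhantomData<&'a mut [T]>,
}

impl<'a, T, const BR: usize, const BC: usize> BlockMut<'a, T, BR, BC> {
    /// # Safety
    /// `data` must be valid for reads and writes of `data_len` cells for `'a`,
    /// and no other live view or reference may cover this block's cells.
    pub unsafe fn from_raw(
        data: *mut T,
        data_len: usize,
        padded_cols: usize,
        row_origin: usize,
        col_origin: usize,
    ) -> Self {
        Self { data, data_len, padded_cols, row_origin, col_origin, _lt: PhantomData }
    }

    /// Materializes only the BC cells of row `r` that this block owns.
    pub fn row_mut(&mut self, r: usize) -> &mut [T] {
        assert!(r < BR, "row out of block");
        let start = (self.row_origin + r) * self.padded_cols + self.col_origin;
        assert!(start + BC <= self.data_len, "row segment out of buffer");
        // SAFETY: bounds checked above; blocks with distinct origins own
        // disjoint column ranges, so live row slices never alias.
        unsafe { std::slice::from_raw_parts_mut(self.data.add(start), BC) }
    }
}

/// Hypothetical constructor: all blocks of `buf` at once, row-major order.
pub fn blocks_mut<T, const BR: usize, const BC: usize>(
    buf: &mut [T],
    padded_cols: usize,
) -> Vec<BlockMut<'_, T, BR, BC>> {
    assert!(BR > 0 && BC > 0 && padded_cols > 0);
    assert!(padded_cols % BC == 0 && buf.len() % (BR * padded_cols) == 0);
    let (ptr, len) = (buf.as_mut_ptr(), buf.len());
    let mut out = Vec::new();
    for row_origin in (0..len / padded_cols).step_by(BR) {
        for col_origin in (0..padded_cols).step_by(BC) {
            // SAFETY: `buf` stays mutably borrowed for the views' lifetime and
            // each (row_origin, col_origin) pair is produced exactly once.
            out.push(unsafe { BlockMut::from_raw(ptr, len, padded_cols, row_origin, col_origin) });
        }
    }
    out
}
```

The point of the design is that several views may share the same raw pointer without consequence; exclusivity only has to hold for the `&mut [T]` slices, and those are created per row over column ranges that differ between blocks.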

Filed a follow-up note for PR-X3.1 to add a SAFETY-claim verification gate to the codex P0 audit so this class of bug — latent UB that type-checks but violates the aliasing model — gets caught pre-merge.

Generated by Claude Code

chatgpt-codex-connector · 2026-05-18T14:42:20Z

+            // to `next()` advances past the rows consumed by the returned
+            // super-block (disjoint from all future yields), so no two live
+            // `GridSuperBlockMut` items alias the same memory.
+            data: unsafe { self.data.add(start) },


Restrict mutable super-blocks to their columns

For grids with multiple super-block columns, blocks_tier_mut() advances in row-major order but each yielded GridSuperBlockMut starts at (row_origin, 0) and spans the full row slab. Thus adjacent super-blocks in the same row, such as blocks_tier_mut::<1>() on a 64×128 padded grid, receive the same starting pointer and length even though callers may keep both yielded items alive. That violates the exclusive-access invariant advertised by the mutable iterator and can produce overlapping mutable base-block views when both super-blocks are used.

Confirmed UB. Fixed in 656a368 alongside the related base-block bug (#158 review thread on base.rs:492).

You correctly diagnosed both halves: TierBlockIterMut::next() set data: *mut T from data_ptr.add(row_origin * padded_cols) with col_origin not entering the offset, so adjacent super-blocks in the same row received IDENTICAL pointers spanning the full row slab. Then base_blocks_mut() materialized strided &mut [T] from those colliding pointers via slice::from_raw_parts_mut(data_ptr.add(start), end - start) where end - start extended into adjacent super-blocks' columns — propagating the overlap to the GridBlockMut level.

Root cause was at the GridBlockMut level (the underlying base-block view's data: &'a mut [T] strided slice was fundamentally unsound for any block in a multi-column grid). Fixing GridBlockMut to use data: *mut T + per-row materialization through row_mut resolves both bugs:

BaseBlockIterMut: blocks no longer carry wide strided &mut [T]; row_mut materializes only [col_origin, col_origin+BC) cells, disjoint across blocks

TierBlockIterMut → base_blocks_mut: the intermediate slice::from_raw_parts_mut is removed; the super-block now passes raw pointer + length straight to GridBlockMut::from_raw (now unsafe fn with documented caller contract). Cell access flows through row_mut, picking up the same column-disjoint invariant.

The super-block's own data: *mut T is still set to row_origin's first cell (not col_origin), but that's now harmless — raw pointers can alias; the invariant only matters at &mut [T] materialization, which now happens exclusively per-row at column-disjoint addresses.

All 5 gates green: 111 lib tests + 79 doctests, fmt + clippy clean. Queued a SAFETY-claim verification gate for PR-X3.1's codex audit prompt to catch this class of latent UB pre-merge.

Generated by Claude Code

…ex P1×2) Codex review on PR #158 flagged two P1 soundness bugs:

1. `BaseBlockIterMut::next()` yields `GridBlockMut` instances carrying `data: &'a mut [T]` slices over the strided block footprint (`[start..start + (BR-1)*padded_cols + BC]`). For grids with multiple block columns, adjacent column blocks' slices overlap heavily: e.g., on a 64×128 padded grid with 64×64 blocks, block (0,0) covers `data[0..8128]` and block (0,1) covers `data[64..8192]`. Two such `&mut [T]` simultaneously live = UB.
2. `TierBlockIterMut::next()` yields `GridSuperBlockMut` instances with `data: *mut T` set to `data_ptr.add(row_origin * padded_cols)`; col_origin doesn't enter the offset, so adjacent super-blocks in the same row receive IDENTICAL raw pointers spanning the full row slab. `base_blocks_mut()` then materialized strided `&mut [T]` from these colliding super-block pointers, propagating the UB.

Both bugs trace to the same root cause: `GridBlockMut::data` stored a strided `&'a mut [T]` slice that referenced cells outside the block's own column range. Adjacent column blocks fundamentally share rows that interleave in memory; no contiguous `&mut [T]` can describe a block without aliasing siblings.

Fix: change `GridBlockMut::data` from `&'a mut [T]` to `*mut T` + add `data_len: usize` for bounds checking. The struct's new aliasing invariant (documented in the type-level docstring) is: `data` is NEVER converted to a wide `&mut [T]`; cell access happens exclusively through `row_mut(r)`, which materializes `&mut [T]` of length BC starting at the block's own `(row_origin + r, col_origin)`. Across blocks, these per-row materializations target disjoint cells (each block owns its own `[col_origin, col_origin + BC)` column range), so no two live `&mut [T]` ever alias.

Also:
- `GridBlockMut::from_raw` is now `unsafe fn` with documented caller contract (raw pointer + length + per-block column-disjoint invariant)
- Added `unsafe impl Send + Sync` for `GridBlockMut<T: Send/Sync>` matching the existing pattern on `BaseBlockIterMut` / `GridSuperBlockMut`
- Renamed `GridBlockMut::padded_cols` pub(crate) accessor to `padded_cols_stride` for naming consistency with `GridBlock` (resolves a PR-X3.1 housekeeping item early)
- Replaced `data_mut() -> &mut [T]` pub(crate) accessor with `data_ptr() -> *mut T` + `data_len() -> usize`. The wide-slice accessor was the materialization vehicle for the UB.
- Updated `iter.rs::row_mut` to materialize via `slice::from_raw_parts_mut` with a debug_assert bounds check and verbatim SAFETY comment
- Updated `super_block.rs::base_blocks_mut` to pass raw pointer + length to the new unsafe `from_raw` (no intermediate strided slice)
- Updated `super_block.rs::tier_mut_2_mutation_visible` test to use `row_mut(0)[0]` instead of the removed `data_mut()` accessor

All 5 gates still green:
- cargo check: PASS
- cargo test --lib hpc::blocked_grid: 111/111 PASS
- cargo test --doc hpc::blocked_grid: 79/79 PASS
- cargo fmt --check: clean
- cargo clippy -D warnings: clean

Codex audit gap noted for PR-X3.1: future audits need a SAFETY-claim verification gate that simulates adversarial iterator usage (e.g., collect all yielded items into a Vec before consuming any) to catch this class of latent UB that passes type-checking but violates the aliasing model.
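For comparison, the other option raised in the review (splitting the backing storage into truly disjoint pieces) can be written entirely in safe Rust. This is not what the commit does; `split_blocks` is a hypothetical helper, and it trades the raw-pointer view for one small `Vec` of row segments per block.

```rust
/// Hypothetical safe alternative: each block becomes a list of its own row
/// segments, carved out of the buffer with `chunks_exact_mut`.
pub fn split_blocks<T>(
    buf: &mut [T],
    padded_cols: usize,
    br: usize,
    bc: usize,
) -> Vec<Vec<&mut [T]>> {
    assert!(br > 0 && bc > 0 && padded_cols > 0);
    assert!(padded_cols % bc == 0 && buf.len() % (br * padded_cols) == 0);
    let block_cols = padded_cols / bc;
    let block_rows = buf.len() / (br * padded_cols);
    let mut blocks: Vec<Vec<&mut [T]>> = (0..block_rows * block_cols)
        .map(|_| Vec::with_capacity(br))
        .collect();
    // Every physical row is cut into `block_cols` segments of `bc` cells;
    // segment `bcol` of row `r` belongs to block (r / br, bcol).
    for (r, row) in buf.chunks_exact_mut(padded_cols).enumerate() {
        for (bcol, seg) in row.chunks_exact_mut(bc).enumerate() {
            blocks[(r / br) * block_cols + bcol].push(seg);
        }
    }
    blocks
}
```

Because the function already returns every block in one `Vec`, it doubles as the adversarial usage pattern the audit note describes: all views are alive before any is consumed, and the borrow checker, rather than a SAFETY comment, vouches for their disjointness.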

…wildcards Master consolidation: ndarray::hpc::* becomes the universal CPU-shape-aware substrate. 10-submodule layout. Invariant 12 replaces jc's zero-dep rule ("certification = determinism + inspectability, not repo separation"). 8-week schedule across 6 sprints with concurrent execution where the dependency graph permits.

PR-X11 (jc consolidation): 6 workers move ewa_sandwich (Pillar-6), ewa_sandwich_3d (Pillar-7), koestenberger, pflug (Pillar-10), + NEW Pillar-8 temporal_sandwich, Pillar-9 Cov<N> high-D, Pillar-11 signature transform into ndarray::hpc::pillar::*. Wasserstein/Sinkhorn-Knopp/Hungarian primitives go to linalg::wasserstein. jc deprecates to a thin probe-runner; 1-cycle #[deprecated] shim.

PR-X12 (x265-style codec): 8 workers ship ndarray::hpc::codec::* with CTU/CU quad-tree, 4 modes (skip/merge/delta/escape), λ-RDO, rANS entropy coder (chosen over CABAC for cache-friendliness; 0.5% compression-ratio diff). PR-X9's lazy basin-codebook consumes this codec. Target: ~2.4 bytes/cell on coherent input, ≤ 4 bytes/cell worst-case (no regression).

PR-X13 (OGIT bridge): 4 workers embed the OGIT Cognitive namespace TTL files (~150 KB) into ndarray via include_str! + ship a minimal Turtle parser (~250 LoC, no rdflib dep) + O(1) family bitmap lookup. Subsumes PR-Z1 (OGIT bootstrap) + PR-Z2 (lance-graph CognitiveBridge). 3-repo coordination collapses to 1 sprint. Bardioc REST client integration becomes optional follow-on, not blocker.

Phase 1 (Protocol B: plan → savant review → correct) drafts complete:
- pr-x3-cognitive-grid-design.md (shipped as PR #158)
- pr-x4-design.md
- pr-x9-design.md
- pr-z1-ogit-cognitive-bootstrap.md (superseded by PR-X13)
- pr-arithmetic-inventory.md
- pr-x10-linalg-core-design.md
- pr-master-consolidation.md
- pr-x11-jc-consolidation-design.md
- pr-x12-codec-x265-design.md
- pr-x13-ogit-bridge-design.md

Phase 2 (Protocol A: preflight Rust skeleton → parallel-savant fan-out → workers fill bodies) starts after joint plan-review savant verdict on all 10 docs. Per-sprint specialist savants: data-flow, layering, distance-typing, SAFETY-claim, naming-collision, test-coverage. SAFETY-claim savant exists specifically to catch the class of latent UB that PR-X3's GridBlockMut had (caught post-merge by codex; preflight catches it pre-implementation).

Also adds settings.json wildcard permissions (Edit/Write/MultiEdit/NotebookEdit + Bash touch/cat/tee/bash) per user authorization. Reduces popup friction for the upcoming 44-worker concurrent execution.

claude added 17 commits May 18, 2026 12:03

chore(hpc/blocked_grid): re-export A2 base-block iterators from mod.rs

e4c23f7

chore(hpc/blocked_grid): re-export A5 aliases from mod.rs

fbbbb35

AdaWorldAPI marked this pull request as ready for review May 18, 2026 14:38

chatgpt-codex-connector Bot reviewed May 18, 2026

AdaWorldAPI merged commit 589ef56 into master May 18, 2026
15 checks passed

AdaWorldAPI merged 18 commits into master from claude/pr-x3-cognitive-grid-design
