PR-X3: BlockedGrid hierarchical block layout (workers A1-A5 + design docs) #158
PR-X3 design contract for the next-wave sprint after W3-W6 + #157.
Scope (PR-X3): CognitiveGrid<T, BR, BC> const-generic 2-D blocked grid with hierarchical tier iterators (L1=64x64 / L2=256x256 / L3=4096x4096 / L4=16384x16384 on the default 64x64 base) + cognitive_grid_struct! macro for SoA-of-grids. CausalEdge64 (u64) is the canonical cell type acting as cognitive-shader mantissa.
Layering: scalar layout only. No #[target_feature], no per-arch imports, no SIMD primitives. Forward-compatible with PR-X5 (SIMD register-bank stacks) and W7 (typed cognitive distance bulk fns + cell kernels).
Hardware-block x cell-type matrix documents the AMX BF16 (16x16), AMX INT8 (16x64 half-square), AVX-512 F32x16 / F64x8 / U64x8 / U8x64, and NEON dotprod natural shapes. Default 64x64 is the LCM of all useful register-bank shapes; const generics let consumers specialize.
Sequential 5-10 Sonnet workers + 1 Opus coordinator protocol per the binding pattern: plan -> review -> correct -> sprint -> review code -> fix P0 -> commit -> repeat. Workers in isolated worktrees, sequential ordering (Worker B macro depends on Worker A core API).
Token-reset safety: doc is self-contained, includes context recovery notes for fresh sessions arriving without conversational history.
Cross-references: w3-w6-soa-aos-design.md, cognitive-shader-foundation.md, cognitive-distance-typing.md, vertical-simd-consumer-contract.md, w3-w6-codex-audit.md, w3-w6-p2-savant-review.md.
Phase 3 corrector pass on the PR-X3 BlockedGrid design doc, applying the plan-review savant verdict (READY-WITH-DOC-FIXES, 2 P0 + 7 P1 + 4 P2).
P0 patches applied:
- A1: split bulk_apply_base/tier into map_base/map_tier (PRIMARY compute, immutable self, returns new grid) + bulk_apply_base/tier (SECONDARY write-back, with explicit data-flow Rule #3 docstring citing .claude/rules/data-flow.md).
- A2: macro emits both map_l1-l4 (compute) AND bulk_apply_l1-l4 (write-back) on the generated SoA-of-grids struct.
P1 patches applied:
- F1: CognitiveGrid → BlockedGrid rename; module path crate::hpc::blocked_grid
- F2: Block → GridBlock, BlockMut → GridBlockMut, SuperBlock → GridSuperBlock (avoids collision with crate::backend::native BLAS Block)
- F3: cache-hierarchy convention note (L1 innermost, L4 framebuffer-scale)
- G2: keep both map_tier::<N> + L1-L4 aliases (and same for bulk_apply)
- G6: add new_with_pad(rows, cols, pad_value: T) ctor (T: Copy only, no Default bound); new() delegates with T::default()
- G3: # Footgun doc section on as_padded_slice + as_padded_slice_mut
- G4: macro emits field_n::<I> const-generic field accessors
P2 patches applied:
- J1: PhantomData lifetime variance note on GridBlock/GridBlockMut
- J4: module-level docstring out-of-scope warning requirement (3 lines max)
Q1-Q7 rulings persisted in §"Resolved questions" (was §"Open questions" in v1).
Worker decomposition: 7-worker split (A1-A6 + B) is the DEFAULT, not the fallback.
Fixed §"Sprint protocol" step 4 contradiction (was "Two workers in parallel", corrected to "Spawn workers SEQUENTIALLY").
Verdict file persisted at .claude/knowledge/pr-x3-plan-review.md (savant had no write permission; coordinator wrote it post-task).
The previous Read/Write/Edit/MultiEdit/NotebookEdit allow entries used the
bare `**` glob, which doesn't match against actual file paths in the
current Claude Code harness — so every Edit/Write call triggered a
permission popup despite being on the allowlist. Switching to the `{**}`
glob form (curly-brace alternation) so the patterns actually fire.
Deny entries (./.archive/**, ./.git/**, ./CLAUDE-CREDENTIALS.md, …)
are left untouched — they use absolute prefixes and were matching
correctly. Only the catch-all "any path" entries needed the syntax fix.
… (PR-X3 A1)
First sprint cut for PR-X3 per .claude/knowledge/pr-x3-cognitive-grid-design.md (design v2 @ c6414ec).
Ships:
- BlockedGrid<T, BR, BC> struct (private fields, row-major padded storage)
- new(rows, cols) + new_with_pad(rows, cols, pad_value)
- rows/cols/padded_rows/padded_cols/block_dims/idx/get/set accessors
- as_padded_slice + as_padded_slice_mut with # Footgun docs
- GridBlock<'a, T, BR, BC> + GridBlockMut<'a, T, BR, BC> view types (PhantomData lifetime variance per Q3 ruling)
- Inline unit tests for all of the above
- Const-generic compile-time assertion BR > 0 && BC > 0
Iterators, super-blocks, map_*, bulk_apply_*, and convenience aliases deferred to workers A2-A5. Macro deferred to worker B.
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
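The A1 surface described above can be pictured with a minimal sketch. This is not the shipped `base.rs`: the field names, the `padded_dims` helper, the round-up-to-block-multiple padding, and the way the compile-time assertion is wired (an associated const forced inside the constructor) are all assumptions chosen to match the commit description.

```rust
/// Hypothetical sketch of a block-padded, row-major 2-D grid.
pub struct BlockedGrid<T, const BR: usize, const BC: usize> {
    rows: usize,
    cols: usize,
    padded_rows: usize,
    padded_cols: usize,
    data: Vec<T>,
}

impl<T: Copy, const BR: usize, const BC: usize> BlockedGrid<T, BR, BC> {
    // Assumed mechanism: evaluated at compile time once a constructor is instantiated.
    const NON_EMPTY_BLOCK: () = assert!(BR > 0 && BC > 0, "BR and BC must be non-zero");

    /// Only needs `T: Copy`; padding cells are filled with `pad_value`.
    pub fn new_with_pad(rows: usize, cols: usize, pad_value: T) -> Self {
        let () = Self::NON_EMPTY_BLOCK;
        // Assumption: logical dims are rounded up to the next block multiple.
        let padded_rows = rows.div_ceil(BR) * BR;
        let padded_cols = cols.div_ceil(BC) * BC;
        let data = vec![pad_value; padded_rows * padded_cols];
        Self { rows, cols, padded_rows, padded_cols, data }
    }

    /// Delegates to `new_with_pad` with `T::default()` as the pad value.
    pub fn new(rows: usize, cols: usize) -> Self
    where
        T: Default,
    {
        Self::new_with_pad(rows, cols, T::default())
    }

    /// Row-major index into the padded storage.
    pub fn idx(&self, r: usize, c: usize) -> usize {
        r * self.padded_cols + c
    }

    pub fn get(&self, r: usize, c: usize) -> T {
        assert!(r < self.rows && c < self.cols, "index outside logical grid");
        self.data[self.idx(r, c)]
    }

    pub fn set(&mut self, r: usize, c: usize, value: T) {
        assert!(r < self.rows && c < self.cols, "index outside logical grid");
        let i = self.idx(r, c);
        self.data[i] = value;
    }

    /// Hypothetical helper returning `(padded_rows, padded_cols)`.
    pub fn padded_dims(&self) -> (usize, usize) {
        (self.padded_rows, self.padded_cols)
    }
}
```

Under these assumptions a 100×130 logical grid on 64×64 blocks occupies 128×192 padded cells, so `idx(1, 0)` is 192 rather than 130.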
A1 shipped 772 lines in mod.rs at 1101b7d (cherry-picked as a7e9a67). Refactoring into one-file-per-sprint-worker per the design doc's new §"Per-worker file scoping" section, so workers A2-A5 + B can spawn in parallel without colliding on the same file.
File layout:
- mod.rs: slim index (submodule decls + re-exports)
- base.rs: A1's content (BlockedGrid, GridBlock, GridBlockMut, all accessors, inline tests). Named `base` not `core` to avoid shadowing the built-in `core` crate.
- iter.rs: A2 stub (BaseBlockIter, BaseBlockIterMut)
- super_block.rs: A3 stub (GridSuperBlock, TierBlockIter, blocks_tier)
- compute.rs: A4 stub (map_*, bulk_apply_*)
- aliases.rs: A5 stub (convenience aliases + L1-L4 impls)
All 5 gates still green after refactor:
- cargo check: PASS
- cargo test --lib (blocked_grid): 23/23 PASS
- cargo test --doc (blocked_grid): 25/25 PASS
- cargo fmt --check: PASS
- cargo clippy -D warnings: PASS
Also tightens .claude/settings.json: replaces the catch-all `Edit({**})` / `Write({**})` with per-area entries (src/{**}, crates/{**}, .claude/knowledge/{**}, Cargo.toml, etc.). NotebookEdit removed (no notebooks in this project). Read({**}) stays broad because agents need read access everywhere for context.
Design doc gains §"Per-worker file scoping (binding)" with the worker→file mapping table, and §"The agent sequence for PR-X3" now notes that workers A2-A5 can spawn in parallel once the file-split scaffolding is landed.
…oa_struct! pad_to_lanes
PR-X1 design: MultiLaneColumn, Fingerprint::as_u8x64, array_window, simd::* re-export sweep. Carves out the SIMD-staged inner-loop primitives flagged by the W3-W6 P2 savant review (A1/A4 findings).
PR-X2 design: generalize aos_to_soa / soa_to_aos to <T, U, N> so non-f32 element types are first-class, and add the #[soa(pad_to_lanes=N)] field attribute to soa_struct! so SIMD kernels get guaranteed tail padding.
Both designs follow the same 7-phase sprint-protocol shape as PR-X3 (plan → review → correct → sprint sequential → audit → fix P0 → P2 review). Sequential worker decomposition.
No code changes; design docs only.
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
BaseBlockIter / BaseBlockIterMut + blocks_base / blocks_base_mut impls on BlockedGrid<T, BR, BC>. Row-major iteration over the BR×BC base blocks. Inline tests for all spec cases. Also adds GridBlockMut::row_mut in iter.rs (needed by iterator doctests and downstream workers; A1's base.rs exposed data_mut + padded_cols helpers that make this possible from within the sibling module without touching base.rs field visibility). https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
GridSuperBlock<'a, T, BR, BC, N> + GridSuperBlockMut + TierBlockIter + TierBlockIterMut + blocks_tier::<N> / blocks_tier_mut::<N> impls on BlockedGrid. Const-generic N=tier-stride. Panics on invalid (BR*N, BC*N) divisibility with a documented error message. Inline tests for all spec'd cases including the panic case via #[should_panic]. Also: moved as_padded_slice / as_padded_slice_mut from impl<T: Copy> to impl<T> in base.rs — those methods only borrow &[T] / &mut [T] and do not need T: Copy; the Copy bound blocked blocks_tier from calling them. Added pub(super) GridBlock::from_raw + GridBlockMut::from_raw to base.rs so super_block.rs can construct base-block views without T: Copy. https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
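The divisibility panic mentioned above can be sketched as a standalone check. The function name `tier_block_counts`, its signature, and the message text are hypothetical; the assumption is that a tier-`N` super-block spans `BR*N` by `BC*N` cells and that the padded dimensions must divide evenly by that footprint.

```rust
/// Hypothetical helper: number of tier-`N` super-blocks along each axis.
/// Panics when the padded dimensions are not a multiple of the
/// super-block footprint `(BR * N, BC * N)`.
pub fn tier_block_counts<const BR: usize, const BC: usize, const N: usize>(
    padded_rows: usize,
    padded_cols: usize,
) -> (usize, usize) {
    let (super_rows, super_cols) = (BR * N, BC * N);
    assert!(super_rows > 0 && super_cols > 0, "BR, BC and N must be non-zero");
    assert!(
        padded_rows % super_rows == 0 && padded_cols % super_cols == 0,
        "blocks_tier::<{}>: padded dims {}x{} are not divisible by super-block {}x{}",
        N,
        padded_rows,
        padded_cols,
        super_rows,
        super_cols
    );
    (padded_rows / super_rows, padded_cols / super_cols)
}
```

With 64×64 base blocks and `N = 4`, a 512×768 padded grid yields 2×3 super-blocks, while a 128×256 padded grid fails the check because 128 is not a multiple of 256.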
…om mod.rs
Uncomment the `pub use super_block::{...}` line now that A3 (4224b33)
has landed real implementations. Adds `TierBlockIterMut` to the re-export
list (A3 shipped both read-only and mutable tier iterators). All 5 gates
green: 49 lib tests + 48 doctests.
…PR-X3 A4)
Splits the API into:
- map_base / map_tier: PRIMARY compute paths (immutable self, returns a new BlockedGrid<U, BR, BC>), satisfying data-flow Rule #3
- bulk_apply_base / bulk_apply_tier: SECONDARY write-back paths (&mut self); each carries the mandatory # Data-flow rule docstring section citing .claude/rules/data-flow.md verbatim
Closure signatures use the two-block pattern (input block + output block) for map_*, and (mut output block + coordinates) for bulk_apply_*.
Inline tests verify input-unchanged invariant on map_*, write-back correctness on bulk_apply_*, panic propagation on bulk_apply_tier with invalid divisibility, and empty-grid degenerate cases.
Also adds GridBlock::row() / rows() accessors in compute.rs (spec'd in the PR-X3 design doc but missing from A1's base.rs). Requires two #[doc(hidden)] pub helpers on GridBlock in base.rs (data_slice() and padded_cols_stride()) so compute.rs can reach private fields without reopening base.rs wholesale; this is the minimal touch justified by the spec gap.
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
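The compute versus write-back split can be illustrated with a reduced sketch. It is an assumption-laden simplification: the `Grid` type, the `filled` and `data` helpers, and closures that receive one block row at a time (rather than the block view types the commit describes) are all hypothetical. What it keeps is the shape of the contract: `map_base` borrows `&self` and returns a new grid, `bulk_apply_base` borrows `&mut self` and mutates in place.

```rust
/// Hypothetical reduced grid; dimensions are always whole blocks here.
pub struct Grid<T, const BR: usize, const BC: usize> {
    padded_rows: usize,
    padded_cols: usize,
    data: Vec<T>,
}

impl<T: Copy, const BR: usize, const BC: usize> Grid<T, BR, BC> {
    /// Grid of `block_rows x block_cols` base blocks, every cell set to `fill`.
    pub fn filled(block_rows: usize, block_cols: usize, fill: T) -> Self {
        let (padded_rows, padded_cols) = (block_rows * BR, block_cols * BC);
        Self { padded_rows, padded_cols, data: vec![fill; padded_rows * padded_cols] }
    }

    pub fn data(&self) -> &[T] {
        &self.data
    }

    /// PRIMARY compute path: input is read-only, output is a fresh grid.
    /// The closure sees (row, block column origin, input row, output row).
    pub fn map_base<U: Copy + Default>(
        &self,
        mut f: impl FnMut(usize, usize, &[T], &mut [U]),
    ) -> Grid<U, BR, BC> {
        let mut out = Grid::<U, BR, BC> {
            padded_rows: self.padded_rows,
            padded_cols: self.padded_cols,
            data: vec![U::default(); self.data.len()],
        };
        for block_row in (0..self.padded_rows).step_by(BR) {
            for col in (0..self.padded_cols).step_by(BC) {
                for row in block_row..block_row + BR {
                    let start = row * self.padded_cols + col;
                    f(row, col, &self.data[start..start + BC], &mut out.data[start..start + BC]);
                }
            }
        }
        out
    }

    /// SECONDARY write-back path: mutates `self` block by block.
    pub fn bulk_apply_base(&mut self, mut f: impl FnMut(usize, usize, &mut [T])) {
        for block_row in (0..self.padded_rows).step_by(BR) {
            for col in (0..self.padded_cols).step_by(BC) {
                for row in block_row..block_row + BR {
                    let start = row * self.padded_cols + col;
                    f(row, col, &mut self.data[start..start + BC]);
                }
            }
        }
    }
}
```

Because `map_base` only ever takes `&self`, the input-unchanged invariant the inline tests check is enforced by the borrow checker in this sketch, not by convention.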
Type aliases for the cognitive-shader and SIMD-tier shapes:
- ShaderMantissaGrid (u64, 64, 64): CausalEdge64 mantissa default
- AmxBf16Grid (u16, 16, 16): AMX BF16 TDPBF16PS tile shape
- AmxInt8Grid (u8, 16, 64): AMX INT8 TDPBUSD half-square shape
- StripF32Stack2 / Stack4 (f32, 2|4, 16): F32x16 vertical stacks
- SquareF64Stack8 (f64, 8, 8): F64x8 8×8 GEMM kernel shape
- HalfSquareU64 (u64, 32, 64): half-square U64 grid
L1/L2/L3/L4 alias impls on BlockedGrid<T, 64, 64> ONLY (Q7 ruling):
- blocks_l1/2/3/4 delegating to blocks_base / blocks_tier::<4|64|256>
- map_l1/2/3/4 delegating to map_base / map_tier::<4|64|256>
- bulk_apply_l1/2/3/4 delegating to bulk_apply_base / bulk_apply_tier::<N>
Each bulk_apply_l* carries the verbatim # Data-flow rule docstring section matching A4's bulk_apply_base. Cache-hierarchy convention (L1 innermost, L4 framebuffer-scale) documented in the first method's docstring.
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
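The tier strides `4 | 64 | 256` line up with the L1-L4 edge lengths quoted in the design contract when the base block is 64×64. A small sketch of that arithmetic follows; the constant and function names are hypothetical, and treating L1 as stride 1 is an assumption based on `blocks_l1` delegating to `blocks_base`.

```rust
/// Edge length of the default base block (the 64x64 `ShaderMantissaGrid` shape).
pub const BASE_EDGE: usize = 64;

/// Assumed tier strides for L1..L4: L1 is the base block itself (stride 1),
/// L2/L3/L4 use the strides passed to `blocks_tier::<N>`.
pub const TIER_STRIDES: [usize; 4] = [1, 4, 64, 256];

/// Edge length in cells of one block at cache-hierarchy level `level` (1..=4).
pub const fn tier_edge(level: usize) -> usize {
    BASE_EDGE * TIER_STRIDES[level - 1]
}
```

This reproduces L1=64, L2=256, L3=4096, and L4=16384 cells per edge.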
Uncomment the aliases re-export now that A5 (2402275) has landed. All 74 lib tests and 74 doctests passing. Cargo fmt + clippy clean.
…X3 A6)
Adds the final test-density layer after A1-A5 shipped their inline #[cfg(test)] coverage:
- src/hpc/blocked_grid/tests.rs (new): integration tests that span multiple submodules: W4 bulk_apply composition, L1→L2 cascade, all seven type aliases instantiate, half-square AMX INT8 pattern, as_padded_slice footgun verification, const-generic compile-fail
- src/hpc/blocked_grid/mod.rs: module-level doctest demonstrating the canonical compose pattern (ShaderMantissaGrid → map_l1 → verify input unchanged + output as expected) and #[cfg(test)] mod tests registration
Test count: +17 lib tests + 2 doctests above the A5 baseline (74 + 74). All 5 gates green.
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
…ds (PR-X3 B)
The macro generates an SoA-of-grids struct: each named field becomes its
own BlockedGrid<FieldT, BR, BC> with shared rows/cols/padded dimensions.
Generated API (v1 — L1 only; L2/L3/L4 deferred to follow-up):
- {Name}::new(rows, cols) constructor
- rows/cols/padded_rows/padded_cols accessors
- blocks_l1() lockstep iteration → {Name}L1Block<'_>
- map_l1() PRIMARY compute path → new {Name}, input unchanged
- bulk_apply_l1() SECONDARY write-back, carries verbatim
# Data-flow rule docstring section per .claude/rules/data-flow.md Rule #3
- field_n::<I>() compile-time field accessor (P1 G4 ruling)
- {Name}L1Block / {Name}L1BlockMut / {Name}L1BlockIter view types
Also adds:
- FieldGridRef trait (object-safe dimension accessor for &dyn use)
- Clone impl for BlockedGrid<T: Copy + Default, BR, BC>
- paste = "1" dependency (identifier concat for {Name}L1Block naming)
- pub use paste re-export in lib.rs for $crate::paste::paste! macro hygiene
Reserved field names enforced (compile error if shadowed): `new`, `rows`,
`cols`, `padded_rows`, `padded_cols`, `blocks_l1/2/3/4`, `map_l1/2/3/4`,
`bulk_apply_l1/2/3/4`, `field_n`, `default`.
Inline tests: 2/3/4-field generation, pub/private field visibility,
#[derive(Clone)] passthrough, map_l1 input-unchanged invariant,
bulk_apply_l1 lockstep mutation, field_n::<0>/<1> accessor.
5-gate result: check OK, lib 111/111, doc 79/79, fmt OK, clippy OK.
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
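The SoA-of-grids idea can be sketched with plain `macro_rules!`. This is a much reduced, hypothetical stand-in for `blocked_grid_struct!`: the macro name `soa_grids_sketch!`, the use of a `Vec` per field instead of a `BlockedGrid`, and the absence of `paste`-generated view types are all simplifications made so the example stays dependency-free.

```rust
/// Hypothetical sketch: every declared field becomes its own storage,
/// and all fields share one `rows`/`cols` pair.
macro_rules! soa_grids_sketch {
    ($vis:vis struct $name:ident { $($fvis:vis $field:ident : $ty:ty),+ $(,)? }) => {
        $vis struct $name {
            rows: usize,
            cols: usize,
            $($fvis $field: Vec<$ty>,)+
        }

        impl $name {
            /// Allocates every field grid with the same shape.
            $vis fn new(rows: usize, cols: usize) -> Self {
                Self {
                    rows,
                    cols,
                    $($field: vec![<$ty as Default>::default(); rows * cols],)+
                }
            }

            $vis fn rows(&self) -> usize { self.rows }
            $vis fn cols(&self) -> usize { self.cols }
        }
    };
}

soa_grids_sketch! {
    pub struct ShaderSketch {
        pub edges: u64,
        pub weight: f32,
    }
}
```

In this sketch a user field named `rows` or `cols` would clash with the generated bookkeeping fields and fail to compile, which is one way a reserved-name rule like the one listed above can surface; how the real macro enforces it is not shown in the commit message.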
… + persist audit verdict
Codex P0 audit (Phase 11) returned READY-FOR-PR with 0 P0, 2 P1, 2 P2.
P1-1 (applied): GridBlockMut::row_mut had module-level data-flow framing but not a method-level `# Data-flow rule` docstring section. Added the verbatim citation of .claude/rules/data-flow.md Rule #3 and the pointer to BlockedGrid::map_base for compute paths.
P1-2 (PR description action): `paste = "1"` dep addition will be called out in the PR description on next update.
P2 findings (deferred to P2 savant in Phase 13):
- pub helpers on GridBlock/GridBlockMut should perhaps be pub(crate)
- field_n::<I>() type erasure: additive typed accessor worth considering
Verdict file persisted at .claude/knowledge/pr-x3-codex-audit.md.
…st verdict
P2 savant (Phase 13) verdict: SHIP-WITH-FOLLOWUPS. 4 P2 findings; 3 applied in this commit, 1 deferred to PR-X3.1.
- P2-1 (applied): downscope `pub` helpers on GridBlock/GridBlockMut to `pub(crate)`. The four helpers (data_slice, padded_cols_stride on GridBlock; data_mut, padded_cols on GridBlockMut) are intra-crate implementation seams. Leaving them `pub` meant downstream consumers could bypass the `# Footgun` guard on `as_padded_slice`. Also drops the `#[doc(hidden)]` attribute, which is no longer needed once visibility is tight.
- P2-3 (applied): drop stray `T: Copy` bound from `GridBlock::from_grid`, `GridBlockMut::from_grid`, `Iterator for BaseBlockIter`, `Iterator for BaseBlockIterMut`, both `ExactSizeIterator` impls, and the impl block holding `blocks_base` / `blocks_base_mut`. None of these positions actually copy a `T` value; they only compute index arithmetic and slice the storage. The bound was over-constraining; iterator surface now works for any `T` (not just `T: Copy`). `BlockedGrid::get` / `set` still correctly require `T: Copy` because they do copy values.
- P2-4 (applied): strengthen macro L1-only deferral wording with explicit PR-X3.1 ticket reference + `TODO(PR-X3.1)` marker + dedicated per-field workaround warning. Reduces the risk that callers cement per-field loops outside the macro-generated struct.
- P2-2 (DEFERRED → PR-X3.1): typed `field_grid::<I, FieldT>()` accessor alongside the existing erased `field_n::<I>()`. Additive but requires either a downcast trait or extra macro emit arm; no current consumer needs it.
Verdict file persisted at .claude/knowledge/pr-x3-p2-savant-review.md. PR-X3.1 follow-up backlog documented at the bottom of the verdict file.
All 5 gates green after tightenings:
- cargo check: PASS
- cargo test --lib hpc::blocked_grid: 111/111 PASS
- cargo test --doc hpc::blocked_grid: 79/79 PASS
- cargo fmt --check: clean
- cargo clippy -D warnings: clean
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81766e63d8
    row_origin,
    col_origin,
    padded_cols,
    data: &mut grid.data[start..end],
Avoid overlapping mutable block views
When a grid has more than one block column, this strided slice includes the gaps between the block's rows, so adjacent column blocks overlap in memory. For example, in a 64×128 padded grid with 64×64 blocks, block (0,0) stores data[0..8128] and block (0,1) stores data[64..8192]; blocks_base_mut() can yield both GridBlockMuts alive at once via repeated next(), creating overlapping &mut [T] and making the safe iterator unsound. The mutable view needs to avoid holding a single strided &mut range, or the iterator must split the backing storage into truly disjoint pieces.
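The overlap in the example above follows directly from the footprint formula. The sketch below recomputes it; `strided_footprint` and `overlaps` are hypothetical helper names, and the formula `start .. start + (BR-1)*padded_cols + BC` is the one quoted later in the fix description.

```rust
use std::ops::Range;

/// Index range covered by a single strided slice over block
/// `(block_row, block_col)` in row-major padded storage.
pub fn strided_footprint(
    block_row: usize,
    block_col: usize,
    br: usize,
    bc: usize,
    padded_cols: usize,
) -> Range<usize> {
    let start = block_row * br * padded_cols + block_col * bc;
    // The last row of the block starts (br - 1) strides later and is bc cells long.
    start..start + (br - 1) * padded_cols + bc
}

/// True when two half-open index ranges share at least one index.
pub fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}
```

For the 64×128 padded grid with 64×64 blocks, this gives `0..8128` for block (0,0) and `64..8192` for block (0,1), which share every index from 64 up to 8127.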
Confirmed UB. Fixed in 656a368.
Root cause: GridBlockMut::data: &'a mut [T] held a strided slice from [start..start + (BR-1)*padded_cols + BC], which fundamentally cannot describe a block without aliasing adjacent column blocks (they share rows that interleave in memory). BaseBlockIterMut::next happily yields multiple such slices alive at once via raw-pointer re-borrow, hitting the UB you described.
Fix: converted data: &'a mut [T] → data: *mut T + data_len: usize. The struct's new aliasing invariant (documented in the type docstring) forbids any wide &mut [T] materialization; cell access is exclusively through row_mut(r), which materializes &mut [T] of length BC starting at (row_origin + r, col_origin). Across simultaneously-live blocks, these per-row materializations target disjoint cells (each block owns its own [col_origin, col_origin + BC) column range on every physical row), so no two live &mut [T] ever alias.
Also tightened: from_raw is now unsafe fn with a documented caller contract; explicit unsafe impl Send + Sync matching the existing pattern on BaseBlockIterMut; replaced the data_mut() accessor (the vehicle for the UB) with data_ptr() -> *mut T + data_len() -> usize. All 5 gates still green (111 lib + 79 doctests).
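The per-row materialization described in the fix can be sketched in isolation. `BlockMutSketch` and `split_blocks` are hypothetical names and not the PR's types; the sketch only assumes what the comment states, namely a raw base pointer plus length, and `row_mut` as the sole place where a `&mut [T]` is created.

```rust
use std::marker::PhantomData;

/// Hypothetical mutable block view: raw base pointer to the whole padded
/// buffer, never converted into a block-wide `&mut [T]`.
pub struct BlockMutSketch<'a, T, const BR: usize, const BC: usize> {
    data: *mut T,
    data_len: usize,
    padded_cols: usize,
    row_origin: usize,
    col_origin: usize,
    _borrow: PhantomData<&'a mut [T]>,
}

impl<'a, T, const BR: usize, const BC: usize> BlockMutSketch<'a, T, BR, BC> {
    /// Row `r` of this block: exactly `BC` cells at `(row_origin + r, col_origin)`.
    pub fn row_mut(&mut self, r: usize) -> &mut [T] {
        assert!(r < BR, "row index outside block");
        let start = (self.row_origin + r) * self.padded_cols + self.col_origin;
        assert!(start + BC <= self.data_len, "row exceeds backing storage");
        // SAFETY: the range is in bounds (checked above); blocks built by
        // `split_blocks` have distinct (row_origin, col_origin) pairs, so rows of
        // different blocks never share a cell; `&mut self` prevents one block from
        // handing out two live rows at once.
        unsafe { std::slice::from_raw_parts_mut(self.data.add(start), BC) }
    }
}

/// Splits a row-major padded buffer into block views that may all be alive at once.
pub fn split_blocks<T, const BR: usize, const BC: usize>(
    buf: &mut [T],
    padded_cols: usize,
) -> Vec<BlockMutSketch<'_, T, BR, BC>> {
    assert!(BR > 0 && BC > 0 && padded_cols > 0);
    assert!(padded_cols % BC == 0 && buf.len() % (BR * padded_cols) == 0);
    let (data, data_len) = (buf.as_mut_ptr(), buf.len());
    let mut blocks = Vec::new();
    for row_origin in (0..data_len / padded_cols).step_by(BR) {
        for col_origin in (0..padded_cols).step_by(BC) {
            blocks.push(BlockMutSketch {
                data,
                data_len,
                padded_cols,
                row_origin,
                col_origin,
                _borrow: PhantomData,
            });
        }
    }
    blocks
}
```

Holding two sibling blocks and a row from each at the same time is fine here, because each materialized slice covers only its own block's column range on that row.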
Filed a follow-up note for PR-X3.1 to add a SAFETY-claim verification gate to the codex P0 audit so this class of bug — latent UB that type-checks but violates the aliasing model — gets caught pre-merge.
Generated by Claude Code
    // to `next()` advances past the rows consumed by the returned
    // super-block (disjoint from all future yields), so no two live
    // `GridSuperBlockMut` items alias the same memory.
    data: unsafe { self.data.add(start) },
Restrict mutable super-blocks to their columns
For grids with multiple super-block columns, blocks_tier_mut() advances in row-major order but each yielded GridSuperBlockMut starts at (row_origin, 0) and spans the full row slab. Thus adjacent super-blocks in the same row, such as blocks_tier_mut::<1>() on a 64×128 padded grid, receive the same starting pointer and length even though callers may keep both yielded items alive. That violates the exclusive-access invariant advertised by the mutable iterator and can produce overlapping mutable base-block views when both super-blocks are used.
Confirmed UB. Fixed in 656a368 alongside the related base-block bug (#158 review thread on base.rs:492).
You correctly diagnosed both halves: TierBlockIterMut::next() set data: *mut T from data_ptr.add(row_origin * padded_cols) with col_origin not entering the offset, so adjacent super-blocks in the same row received IDENTICAL pointers spanning the full row slab. Then base_blocks_mut() materialized strided &mut [T] from those colliding pointers via slice::from_raw_parts_mut(data_ptr.add(start), end - start) where end - start extended into adjacent super-blocks' columns — propagating the overlap to the GridBlockMut level.
Root cause was at the GridBlockMut level (the underlying base-block view's data: &'a mut [T] strided slice was fundamentally unsound for any block in a multi-column grid). Fixing GridBlockMut to use data: *mut T + per-row materialization through row_mut resolves both bugs:
- BaseBlockIterMut: blocks no longer carry wide strided &mut [T]; row_mut materializes only [col_origin, col_origin+BC) cells, disjoint across blocks
- TierBlockIterMut → base_blocks_mut: the intermediate slice::from_raw_parts_mut is removed; the super-block now passes raw pointer + length straight to GridBlockMut::from_raw (now unsafe fn with documented caller contract). Cell access flows through row_mut, picking up the same column-disjoint invariant.
The super-block's own data: *mut T is still set to row_origin's first cell (not col_origin), but that's now harmless — raw pointers can alias; the invariant only matters at &mut [T] materialization, which now happens exclusively per-row at column-disjoint addresses.
All 5 gates green: 111 lib tests + 79 doctests, fmt + clippy clean. Queued a SAFETY-claim verification gate for PR-X3.1's codex audit prompt to catch this class of latent UB pre-merge.
Generated by Claude Code
…ex P1×2)
Codex review on PR #158 flagged two P1 soundness bugs:
1. `BaseBlockIterMut::next()` yields `GridBlockMut` instances carrying `data: &'a mut [T]` slices over the strided block footprint (`[start..start + (BR-1)*padded_cols + BC]`). For grids with multiple block columns, adjacent column blocks' slices overlap heavily: e.g., on a 64×128 padded grid with 64×64 blocks, block (0,0) covers `data[0..8128]` and block (0,1) covers `data[64..8192]`. Two such `&mut [T]` simultaneously live = UB.
2. `TierBlockIterMut::next()` yields `GridSuperBlockMut` instances with `data: *mut T` set to `data_ptr.add(row_origin * padded_cols)`; col_origin doesn't enter the offset, so adjacent super-blocks in the same row receive IDENTICAL raw pointers spanning the full row slab. `base_blocks_mut()` then materialized strided `&mut [T]` from these colliding super-block pointers, propagating the UB.
Both bugs trace to the same root cause: `GridBlockMut::data` stored a strided `&'a mut [T]` slice that referenced cells outside the block's own column range. Adjacent column blocks fundamentally share rows that interleave in memory; no contiguous `&mut [T]` can describe a block without aliasing siblings.
Fix: change `GridBlockMut::data` from `&'a mut [T]` to `*mut T` + add `data_len: usize` for bounds checking. The struct's new aliasing invariant (documented in the type-level docstring) is: `data` is NEVER converted to a wide `&mut [T]`; cell access happens exclusively through `row_mut(r)`, which materializes `&mut [T]` of length BC starting at the block's own `(row_origin + r, col_origin)`. Across blocks, these per-row materializations target disjoint cells (each block owns its own `[col_origin, col_origin + BC)` column range), so no two live `&mut [T]` ever alias.
Also:
- `GridBlockMut::from_raw` is now `unsafe fn` with documented caller contract (raw pointer + length + per-block column-disjoint invariant)
- Added `unsafe impl Send + Sync` for `GridBlockMut<T: Send/Sync>` matching the existing pattern on `BaseBlockIterMut` / `GridSuperBlockMut`
- Renamed `GridBlockMut::padded_cols` pub(crate) accessor to `padded_cols_stride` for naming consistency with `GridBlock` (resolves a PR-X3.1 housekeeping item early)
- Replaced `data_mut() -> &mut [T]` pub(crate) accessor with `data_ptr() -> *mut T` + `data_len() -> usize`. The wide-slice accessor was the materialization vehicle for the UB.
- Updated `iter.rs::row_mut` to materialize via `slice::from_raw_parts_mut` with a debug_assert bounds check and verbatim SAFETY comment
- Updated `super_block.rs::base_blocks_mut` to pass raw pointer + length to the new unsafe `from_raw` (no intermediate strided slice)
- Updated `super_block.rs::tier_mut_2_mutation_visible` test to use `row_mut(0)[0]` instead of the removed `data_mut()` accessor
All 5 gates still green:
- cargo check: PASS
- cargo test --lib hpc::blocked_grid: 111/111 PASS
- cargo test --doc hpc::blocked_grid: 79/79 PASS
- cargo fmt --check: clean
- cargo clippy -D warnings: clean
Codex audit gap noted for PR-X3.1: future audits need a SAFETY-claim verification gate that simulates adversarial iterator usage (e.g., collect all yielded items into a Vec before consuming any) to catch this class of latent UB that passes type-checking but violates the aliasing model.
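One possible shape for the "collect everything before consuming anything" check is sketched below. The helper names (`span_of`, `spans_disjoint`, `yields_disjoint`) are hypothetical and this is not the gate the PR-X3.1 note commits to. It also only compares address spans after the fact, so it complements a tool such as Miri rather than replacing it.

```rust
/// Byte-address span `[start, end)` covered by a slice.
pub fn span_of<T>(view: &[T]) -> (usize, usize) {
    let start = view.as_ptr() as usize;
    (start, start + std::mem::size_of_val(view))
}

/// True when no two spans overlap. Empty spans are ignored.
pub fn spans_disjoint(mut spans: Vec<(usize, usize)>) -> bool {
    spans.retain(|s| s.0 < s.1);
    spans.sort_unstable();
    // After sorting by start, any overlap shows up between neighbours.
    spans.windows(2).all(|w| w[0].1 <= w[1].0)
}

/// Adversarial usage: drain the iterator fully, keep every yielded view
/// alive, and only then check that the views are pairwise disjoint.
pub fn yields_disjoint<'a, T: 'a>(iter: impl Iterator<Item = &'a mut [T]>) -> bool {
    let held: Vec<&mut [T]> = iter.collect();
    spans_disjoint(held.iter().map(|v| span_of(&**v)).collect())
}
```

Feeding the two strided footprints from the review, `(0, 8128)` and `(64, 8192)`, into `spans_disjoint` reports an overlap, while the standard library's `chunks_mut` passes the held-alive check.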
…wildcards
Master consolidation: ndarray::hpc::* becomes the universal CPU-shape-aware
substrate. 10-submodule layout. Invariant 12 replaces jc's zero-dep rule
("certification = determinism + inspectability, not repo separation").
8-week schedule across 6 sprints with concurrent execution where the
dependency graph permits.
PR-X11 — jc consolidation: 6 workers move ewa_sandwich (Pillar-6),
ewa_sandwich_3d (Pillar-7), koestenberger, pflug (Pillar-10),
+ NEW Pillar-8 temporal_sandwich, Pillar-9 Cov<N> high-D, Pillar-11
signature transform into ndarray::hpc::pillar::*. Wasserstein/Sinkhorn-
Knopp/Hungarian primitives go to linalg::wasserstein. jc deprecates to
a thin probe-runner; 1-cycle #[deprecated] shim.
PR-X12 — x265-style codec: 8 workers ship ndarray::hpc::codec::* with
CTU/CU quad-tree, 4 modes (skip/merge/delta/escape), λ-RDO, rANS entropy
coder (chosen over CABAC for cache-friendliness; 0.5% compression-ratio
diff). PR-X9's lazy basin-codebook consumes this codec. Target: ~2.4
bytes/cell on coherent input, ≤ 4 bytes/cell worst-case (no regression).
PR-X13 — OGIT bridge: 4 workers embed the OGIT Cognitive namespace TTL
files (~150 KB) into ndarray via include_str! + ship a minimal Turtle
parser (~250 LoC, no rdflib dep) + O(1) family bitmap lookup. Subsumes
PR-Z1 (OGIT bootstrap) + PR-Z2 (lance-graph CognitiveBridge). 3-repo
coordination collapses to 1 sprint. Bardioc REST client integration
becomes optional follow-on, not blocker.
Phase 1 (Protocol B: plan → savant review → correct) drafts complete:
- pr-x3-cognitive-grid-design.md (shipped as PR #158)
- pr-x4-design.md
- pr-x9-design.md
- pr-z1-ogit-cognitive-bootstrap.md (superseded by PR-X13)
- pr-arithmetic-inventory.md
- pr-x10-linalg-core-design.md
- pr-master-consolidation.md
- pr-x11-jc-consolidation-design.md
- pr-x12-codec-x265-design.md
- pr-x13-ogit-bridge-design.md
Phase 2 (Protocol A: preflight Rust skeleton → parallel-savant fan-out →
workers fill bodies) starts after joint plan-review savant verdict on
all 10 docs. Per-sprint specialist savants: data-flow, layering,
distance-typing, SAFETY-claim, naming-collision, test-coverage.
SAFETY-claim savant exists specifically to catch the class of latent UB
that PR-X3's GridBlockMut had (caught post-merge by codex; preflight
catches it pre-implementation).
Also adds settings.json wildcard permissions (Edit/Write/MultiEdit/
NotebookEdit + Bash touch/cat/tee/bash) per user authorization. Reduces
popup friction for the upcoming 44-worker concurrent execution.
Summary
Ships `BlockedGrid<T, BR, BC>`, a generic, const-generic-shaped, hierarchical block-padded 2-D grid in `crate::hpc::blocked_grid::*`, plus the `blocked_grid_struct!` SoA-of-grids macro. Layout-only, scalar inner loops, forward-compatible with the per-arch SIMD register-stack swap planned for PR-X5.
This is the PR-X3 carve-out from the cognitive-shader roadmap (see `.claude/knowledge/cognitive-shader-foundation.md` §"Current Gaps" and `.claude/knowledge/pr-x3-cognitive-grid-design.md`). All seven sprint workers (A1-A6 + B) landed. Codex P0 audit verdict: READY-FOR-PR, 0 P0, 2 P1 patched. P2 savant pre-merge review next.
What ships
Public API surface (`crate::hpc::blocked_grid::*`):
| File | Contents |
| --- | --- |
| `base.rs` | `BlockedGrid<T, BR, BC>` struct, `new` / `new_with_pad`, `idx` / `get` / `set`, `as_padded_slice*` with `# Footgun` docs, `GridBlock` / `GridBlockMut` view types with `PhantomData` lifetime variance, compile-time `BR > 0 && BC > 0` assert |
| `iter.rs` | `BaseBlockIter` / `BaseBlockIterMut`, `blocks_base` / `blocks_base_mut`, `GridBlockMut::row_mut` |
| `super_block.rs` | `GridSuperBlock` / `GridSuperBlockMut`, `TierBlockIter` / `TierBlockIterMut`, `blocks_tier::<N>` / `blocks_tier_mut::<N>` with documented panic on invalid divisibility |
| `compute.rs` | `map_base` / `map_tier` (PRIMARY compute: immutable self, returns new grid), `bulk_apply_base` / `bulk_apply_tier` (SECONDARY write-back, each carrying `# Data-flow rule` docstring citing `.claude/rules/data-flow.md` Rule #3) |
| `aliases.rs` | Seven type aliases (`ShaderMantissaGrid`, `AmxBf16Grid`, `AmxInt8Grid`, `StripF32Stack2`, `StripF32Stack4`, `SquareF64Stack8`, `HalfSquareU64`); L1/L2/L3/L4 alias impls on `BlockedGrid<T, 64, 64>` only (Q7 ruling) for `blocks_l*` / `map_l*` / `bulk_apply_l*` |
| `tests.rs` + `mod.rs` doctest | Integration tests (W4 `bulk_apply` composition, L1→L2 cascade, footgun verification) + module-level canonical compose doctest + compile_fail guards |
| `grid_struct_macro.rs` | `blocked_grid_struct!` `#[macro_export]` SoA-of-grids macro, generated `{Name}L1Block` / `{Name}L1BlockMut`, `map_l1` / `bulk_apply_l1` on generated structs (L2-L4 macro methods deferred to follow-up), `FieldGridRef` trait + `field_n::<I>()` const-generic accessor |
Design docs (`.claude/knowledge/`):
- `pr-x3-cognitive-grid-design.md` (v2): binding spec, absorbs plan-review savant P0/P1/P2 patches and seven Q1-Q7 rulings
- `pr-x3-plan-review.md`: Phase 2 savant verdict
- `pr-x3-codex-audit.md`: Phase 11 codex audit verdict (READY-FOR-PR)
- `pr-x1-design.md` + `pr-x2-design.md`: drafted in parallel, queued for their own sprints
Layering, data-flow, and distance-typing guardrails
- Layering: zero `#[target_feature]`, zero per-arch imports, zero raw intrinsics. Hardware dispatch happens inside the consumer's closure body via `crate::simd::*` (W1a contract).
- Data-flow: each `&mut self` method carries a `# Data-flow rule` docstring section pointing to the PRIMARY `map_*` / `map_l*` compute path. Was P0 finding A1/A2 from the plan-review savant; v2 split compute from write-back.
- Distance typing: `BlockedGrid` holds `T`; doesn't know what `T` means. Semantics live in consumer closures + future `crate::hpc::cognitive::*` (W7).
Dependency addition
Adds `paste = "1"` to `Cargo.toml` (already present in workspace lock via `crates/burn`; binary impact zero). Re-exported as `#[doc(hidden)] pub use paste;` in `src/lib.rs` to support hygienic ident concat (`[<$name L1Block>]`) inside the `blocked_grid_struct!` macro. Standard stable-Rust mechanism for `macro_rules!` identifier generation.
Test plan
Current state (after codex P1-1 patch at `01a70edb`): 111 lib tests + 79 doctests, all 5 cargo gates green.
- `cargo check -p ndarray --no-default-features --features std`: PASS
- `cargo test -p ndarray --lib --no-default-features --features std hpc::blocked_grid`: 111/111 PASS
- `cargo test --doc -p ndarray --no-default-features --features std hpc::blocked_grid`: 79/79 PASS
- `cargo fmt --all -- --check`: clean
- `cargo clippy -p ndarray --no-default-features --features std -- -D warnings`: clean
Sprint protocol traceability
Per `.claude/knowledge/pr-x3-cognitive-grid-design.md` §"Worker decomposition":
- … (`b348d43c`)
- … (`c6414ec0`) + per-worker file refactor (`b5329f06`)
- … (`01a70edb`)
Roadmap (post-merge)
- … `splat3d/tile.rs` 16×16-tile binning onto `BlockedGrid`
- … (`StackedU64x8<N>`, `StackedF32x16<N>`, `AmxTile<T, R, C>`) in `crate::simd::*` with per-arch `LazyLock` dispatch
- … (`.claude/knowledge/pr-x1-design.md` and `pr-x2-design.md`)
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS