Merged
18 commits
b348d43
docs(pr-x3): CognitiveGrid hierarchical block layout design v1
claude May 18, 2026
c6414ec
docs(pr-x3): design v2 — savant P0/P1 patches + Q1-Q7 rulings
claude May 18, 2026
cb180bd
chore(claude-settings): convert file-tool allow patterns to {**} glob
claude May 18, 2026
a7e9a67
feat(hpc/blocked_grid): add BlockedGrid<T, BR, BC> struct + accessors…
claude May 18, 2026
b5329f0
refactor(hpc/blocked_grid): split A1's mod.rs into per-worker submodules
claude May 18, 2026
c4230a5
docs(pr-x1, pr-x2): draft design specs for SIMD-staged primitives + s…
claude May 18, 2026
af8a4c8
feat(hpc/blocked_grid): add base-block iterators (PR-X3 A2)
claude May 18, 2026
e4c23f7
chore(hpc/blocked_grid): re-export A2 base-block iterators from mod.rs
claude May 18, 2026
195ce67
feat(hpc/blocked_grid): add super-block + tier iterators (PR-X3 A3)
claude May 18, 2026
3dd4f3f
chore(hpc/blocked_grid): re-export A3 super-block + tier iterators fr…
claude May 18, 2026
2ed97a6
feat(hpc/blocked_grid): add map_* compute + bulk_apply_* write-back (…
claude May 18, 2026
b479956
feat(hpc/blocked_grid): add convenience aliases + L1-L4 impls (PR-X3 A5)
claude May 18, 2026
fbbbb35
chore(hpc/blocked_grid): re-export A5 aliases from mod.rs
claude May 18, 2026
32eaf11
test(hpc/blocked_grid): integration tests + module-level doctest (PR-…
claude May 18, 2026
b4c6692
feat(hpc/blocked_grid): add blocked_grid_struct! macro for SoA-of-gri…
claude May 18, 2026
01a70ed
fix(hpc/blocked_grid): apply codex P1 — add data-flow rule to row_mut…
claude May 18, 2026
81766e6
tighten(hpc/blocked_grid): apply P2 savant pre-merge findings + persi…
claude May 18, 2026
656a368
fix(hpc/blocked_grid): UB — overlapping &mut [T] in GridBlockMut (cod…
claude May 18, 2026
563 changes: 563 additions & 0 deletions .claude/knowledge/pr-x1-design.md


506 changes: 506 additions & 0 deletions .claude/knowledge/pr-x2-design.md


58 changes: 58 additions & 0 deletions .claude/knowledge/pr-x3-codex-audit.md
@@ -0,0 +1,58 @@
# PR-X3 Codex P0 Audit — Verdict

Auditor: Sonnet codex P0 auditor (Phase 11 of PR-X3 sprint)
PR: AdaWorldAPI/ndarray#158
Branch audited: `claude/pr-x3-cognitive-grid-design` @ `b4c66921`
Compared against: `origin/master`

Verdict: **READY-FOR-PR**

P0 count: **0**
P1 count: 2 (advisory)
P2 count: 2 (defer to P2 savant)

## P0 findings (must fix before ready-for-review)

None.

## P1 findings (advisory — coordinator applied before P2 savant)

### P1-1 — `GridBlockMut::row_mut` lacked `# Data-flow rule` docstring

`src/hpc/blocked_grid/iter.rs:282-302` (line numbers pre-patch). The audit gate requires every `&mut self` public method on block view types to carry the data-flow rule citation. `row_mut` had module-level data-flow framing but not a method-level `# Data-flow rule` section.

**Patch applied by coordinator** (commit pending): added a `# Data-flow rule` block citing `.claude/rules/data-flow.md` Rule #3 verbatim and pointing readers to `BlockedGrid::map_base` for compute paths.
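The patched docstring itself is not part of this diff. As a rough illustration only, a method-level `# Data-flow rule` section on `row_mut` could look like the sketch below. It assumes a simplified, hypothetical `GridBlockMut` that holds one exclusive `&mut [T]` starting at the block's top-left cell plus a row stride; the real type may be laid out differently.

```rust
/// Simplified, hypothetical stand-in for `GridBlockMut`: an exclusive borrow of
/// the parent buffer starting at the block's top-left cell, plus the row stride.
pub struct GridBlockMut<'a, T, const BR: usize, const BC: usize> {
    data: &'a mut [T],
    stride: usize, // padded column count of the parent grid
}

impl<'a, T, const BR: usize, const BC: usize> GridBlockMut<'a, T, BR, BC> {
    /// Wraps `data`, which must start at the block's top-left cell.
    pub fn new(data: &'a mut [T], stride: usize) -> Self {
        assert!(BR > 0 && BC <= stride, "block must fit inside one padded row");
        assert!(data.len() >= (BR - 1) * stride + BC, "buffer too short for block");
        Self { data, stride }
    }

    /// Returns row `r` of this block as a `BC`-element slice.
    ///
    /// # Data-flow rule
    ///
    /// Write-back only, per `.claude/rules/data-flow.md` Rule #3 ("No `&mut self`
    /// during computation. Ever."). For compute paths use `BlockedGrid::map_base`,
    /// which returns a new grid instead of mutating this one.
    ///
    /// # Panics
    ///
    /// Panics if `r >= BR`.
    pub fn row_mut(&mut self, r: usize) -> &mut [T] {
        assert!(r < BR, "row index out of block bounds");
        let start = r * self.stride;
        &mut self.data[start..start + BC]
    }
}
```

Because this sketch holds a single `&mut [T]` spanning every strided row of the block, only one such view can exist over a buffer at a time; handing out several at once would create overlapping `&mut [T]`.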

### P1-2 — `paste = "1"` dep addition not noted in PR description

`Cargo.toml` (new line: `paste = "1"`). Worker B added the `paste` dependency for identifier concatenation (`[<$name L1Block>]`) in the `blocked_grid_struct!` macro. It is already present in the workspace lockfile via `crates/burn`, and as a proc-macro crate it runs only at compile time, so binary impact is zero. It is re-exported as `#[doc(hidden)] pub use paste;` in `src/lib.rs`.

**Action**: coordinator updates PR description with one line noting the dep.

## P2 findings (deferred to P2 savant)

### P2-1 — `pub` helpers on `GridBlock` / `GridBlockMut` (worker A4 commit)

`src/hpc/blocked_grid/base.rs:413-421, 557-563`. Worker A4 added `data_slice()` / `padded_cols_stride()` on `GridBlock` and `data_mut()` / `padded_cols()` on `GridBlockMut` as `pub` (not `pub(super)`) to enable sibling-module access. Downscoping to `pub(crate)` or `pub(super)` would tighten the public API. P2 savant ruling needed.

### P2-2 — `field_n::<I>()` returns `&dyn FieldGridRef` (type-erased)

`src/hpc/blocked_grid/grid_struct_macro.rs:490-512`. Matches the W3-W6 `soa_struct!` pattern. P2 savant may want a typed `field_grid::<I, FieldT>()` accessor as additive complement.

## Audit gates — pass/fail summary

| # | Gate | Result |
|---|---|---|
| 1 | Zero per-arch surface (target_feature / cfg / intrinsics / per-arch imports) | ✅ PASS (exhaustive grep of `src/hpc/blocked_grid/`) |
| 2 | Data-flow Rule #3 docstring on every `&mut self` compute-adjacent method | ✅ PASS (after P1-1 patch) |
| 3 | Zero distance-aware API surface | ✅ PASS |
| 4 | Every `pub fn` has a working `# Example` doctest | ✅ PASS (79 doctests, 0 failed) |
| 5 | Spec adherence (Q1-Q7 rulings, all 7 type aliases, `new_with_pad`, `# Footgun`, L1-L4 64×64-only, `field_n`, macro `map_*`+`bulk_apply_*` split) | ✅ PASS |
| 6 | Macro `#[macro_export]`, reserved names documented, L2-L4 deferral documented | ✅ PASS |
| 7 | Architectural deviations flagged | ✅ FLAGGED (paste dep — see P1-2) |

## Net call

Zero P0 findings. The P1-1 patch is a one-paragraph docstring addition (no logic change). The P1-2 action is a PR-description tweak. Both are coordinator-level edits — no new sprint worker needed.

**Recommended next phase: Phase 13 (P2 savant pre-merge review)**, with the two P2 findings above pre-flagged for explicit ruling. After P2 savant verdict, coordinator flips PR #158 from draft → ready-for-review and advances to merge ladder.
826 changes: 826 additions & 0 deletions .claude/knowledge/pr-x3-cognitive-grid-design.md


53 changes: 53 additions & 0 deletions .claude/knowledge/pr-x3-p2-savant-review.md
@@ -0,0 +1,53 @@
# PR-X3 P2 Savant Pre-Merge Review — Verdict

Reviewer: Sonnet P2 savant (Phase 13 of PR-X3 sprint)
PR: AdaWorldAPI/ndarray#158
Branch reviewed: `claude/pr-x3-cognitive-grid-design` @ `01a70edb`

Verdict: **SHIP-WITH-FOLLOWUPS**

P2 count: 4 (3 applied pre-merge, 1 deferred to PR-X3.1)

## Highest-leverage tightenings (rank-ordered)

1. **P2-3 — drop stray `T: Copy` from `from_grid` + iterator impls** (APPLIED) — `src/hpc/blocked_grid/base.rs:334, 482` and `iter.rs:55, 87, 144, 187, 193`
2. **P2-1 — downscope four helpers to `pub(crate)`** (APPLIED) — `src/hpc/blocked_grid/base.rs:413, 421, 557, 563`
3. **P2-4 — macro deferral wording strengthened with PR-X3.1 marker** (APPLIED) — `src/hpc/blocked_grid/grid_struct_macro.rs:12-18`
4. **P2-2 — typed `field_grid::<I, FieldT>()` accessor** (DEFERRED → PR-X3.1)

## Detailed findings + rulings

### P2-1: `pub` helpers on `GridBlock` / `GridBlockMut` → `pub(crate)` (APPLIED)

Worker A4 added `data_slice()` / `padded_cols_stride()` on `GridBlock` and `data_mut()` / `padded_cols()` on `GridBlockMut` as `#[doc(hidden)] pub` to enable sibling-module access. Leaving them `pub` means downstream consumers can call `blk.data_slice()` and bypass the `# Footgun` guard on `as_padded_slice`. Downscoped all four to `pub(crate)` and dropped the `#[doc(hidden)]` attribute (no longer needed once visibility is tightened).

### P2-2: `field_n::<I>()` type erasure (DEFERRED → PR-X3.1)

Returns `&dyn FieldGridRef` (erased type), matching the W3-W6 `soa_struct!` pattern. Adding a typed `field_grid::<I, FieldT>()` accessor is additive, but it requires either a `FromFieldGridRef` downcast trait or a macro-generated per-field method; both mean a new trait or an extra macro emit arm, so neither is a trivial addition. The erased form is sufficient for the PR-X3 use case (dimension parity checks); the typed accessor would unlock `let edge: &BlockedGrid<u64, 64, 64> = g.field_grid::<0, u64>()` but no current consumer needs it. Queue for PR-X3.1 alongside macro L2/L3/L4 deferral.
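A minimal sketch of the downcast route, assuming the erased trait exposes an `as_any()` hook built on `std::any::Any`. `Grid`, `CellGrids`, and the trait's method set are hypothetical stand-ins for `BlockedGrid` and the macro output, not the crate's code.

```rust
use std::any::Any;

/// Hypothetical erased view: enough for dimension parity checks.
pub trait FieldGridRef {
    fn rows(&self) -> usize;
    fn cols(&self) -> usize;
    fn as_any(&self) -> &dyn Any;
}

/// Simplified stand-in for `BlockedGrid<T, BR, BC>`.
pub struct Grid<T> {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<T>,
}

impl<T: 'static> FieldGridRef for Grid<T> {
    fn rows(&self) -> usize { self.rows }
    fn cols(&self) -> usize { self.cols }
    fn as_any(&self) -> &dyn Any { self }
}

/// What a macro might emit for a two-field struct `{ edge: u64, weight: f32 }`.
pub struct CellGrids {
    pub edge: Grid<u64>,
    pub weight: Grid<f32>,
}

impl CellGrids {
    /// Erased accessor in the spirit of `field_n::<I>()`.
    pub fn field_n<const I: usize>(&self) -> &dyn FieldGridRef {
        match I {
            0 => &self.edge as &dyn FieldGridRef,
            1 => &self.weight as &dyn FieldGridRef,
            _ => panic!("field index out of range"),
        }
    }

    /// Typed accessor in the spirit of the deferred `field_grid::<I, FieldT>()`.
    /// Returns `None` when `F` is not the field's element type.
    pub fn field_grid<const I: usize, F: 'static>(&self) -> Option<&Grid<F>> {
        self.field_n::<I>().as_any().downcast_ref::<Grid<F>>()
    }
}
```

The trade-off is visible here: an out-of-range `I` panics at run time, whereas a macro-generated per-field method would reject it at compile time.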

### P2-3: Stray `T: Copy` bound on iterator surface (APPLIED)

`GridBlock::from_grid` / `GridBlockMut::from_grid` carried `where T: Copy` even though their bodies only compute index arithmetic and slice `&grid.data[start..end]` — no `T` value is ever copied. This bound propagated into `Iterator for BaseBlockIter` / `BaseBlockIterMut` / `ExactSizeIterator` / the `impl<T: Copy> BlockedGrid<T, BR, BC>` block holding `blocks_base` / `blocks_base_mut`. A consumer with `BlockedGrid<MyNonCopyType, 8, 8>` could only `get` / `set` cells, not iterate. Removed the bound from all six sites. `BlockedGrid::get` / `set` still correctly require `T: Copy` (they actually copy values).
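The distinction the ruling draws can be seen in a simplified model: block iteration only does index arithmetic and hands out borrowed slices, so it needs no bound on `T`, while `get` returns a cell by value and therefore keeps `T: Copy`. The `Grid`/`Block` types and the `from_fn` constructor are hypothetical simplifications, not the crate's API.

```rust
/// Simplified, hypothetical stand-in for `BlockedGrid<T, BR, BC>`:
/// row-major storage padded up to whole BR×BC blocks.
pub struct Grid<T, const BR: usize, const BC: usize> {
    rows: usize,
    cols: usize,
    padded_cols: usize,
    data: Vec<T>,
}

/// Read-only view of one base block; shared slices may overlap safely.
pub struct Block<'a, T, const BR: usize, const BC: usize> {
    data: &'a [T], // from the block's top-left cell to the end of the buffer
    stride: usize,
}

impl<'a, T, const BR: usize, const BC: usize> Block<'a, T, BR, BC> {
    /// Row `r` of the block as a `BC`-element slice.
    pub fn row(&self, r: usize) -> &'a [T] {
        assert!(r < BR, "row index out of block bounds");
        let s = r * self.stride;
        &self.data[s..s + BC]
    }
}

// No bound on `T`: construction and block iteration never copy a `T` value.
impl<T, const BR: usize, const BC: usize> Grid<T, BR, BC> {
    /// Fills every cell, padding included, by calling `f(row, col)`.
    pub fn from_fn(rows: usize, cols: usize, mut f: impl FnMut(usize, usize) -> T) -> Self {
        let padded_rows = rows.div_ceil(BR) * BR;
        let padded_cols = cols.div_ceil(BC) * BC;
        let mut data = Vec::with_capacity(padded_rows * padded_cols);
        for r in 0..padded_rows {
            for c in 0..padded_cols {
                data.push(f(r, c));
            }
        }
        Self { rows, cols, padded_cols, data }
    }

    /// Base blocks in row-major order: index arithmetic plus slicing only.
    pub fn blocks(&self) -> impl Iterator<Item = Block<'_, T, BR, BC>> + '_ {
        let pc = self.padded_cols;
        let padded_rows = self.data.len().checked_div(pc).unwrap_or(0);
        let (nbr, nbc) = (padded_rows / BR, pc / BC);
        let data = &self.data;
        (0..nbr).flat_map(move |bi| {
            (0..nbc).map(move |bj| Block {
                data: &data[bi * BR * pc + bj * BC..],
                stride: pc,
            })
        })
    }
}

// `get` returns a cell by value, so it genuinely needs `T: Copy`.
impl<T: Copy, const BR: usize, const BC: usize> Grid<T, BR, BC> {
    pub fn get(&self, r: usize, c: usize) -> Option<T> {
        (r < self.rows && c < self.cols).then(|| self.data[r * self.padded_cols + c])
    }
}
```

With this split, a `Grid<String, 2, 2>` can be iterated block by block, while `get` is simply unavailable for it.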

### P2-4: Macro L1-only deferral wording (APPLIED)

The v1 macro emits `map_l1` / `bulk_apply_l1` / `blocks_l1` on the generated struct; L2/L3/L4 are deferred. The deferral itself is the right call (emitting lockstep `{Name}L2Block` for a four-field struct requires `paste!`-generated types with `N=4` const generics — non-trivial without regression risk). But the v1 deferral note was low-visibility, risking callers cementing per-field workarounds. Strengthened the wording: explicit PR-X3.1 ticket reference + `TODO(PR-X3.1)` marker + dedicated "per-field workaround warning" subsection alerting readers that per-field call sites won't auto-migrate when PR-X3.1 lands.

## CI signal

No fragile tests in the new modules: no timing-dependent, no env-dependent, no `#[ignore]`-gated tests. The `BaseBlockIterMut` raw-pointer lending-iterator carries three `// SAFETY:` annotations accounting for the aliasing invariant — appropriate level of annotation for this pattern. No CI concern.

The `paste = "1"` dep (P1-2 from codex audit) is already in the workspace lock and has zero binary impact.

## Net call

Three P2 tightenings were applied as a same-day follow-up commit on this branch. P2-2 (typed `field_grid` accessor) is correctly deferred to post-merge and queued for PR-X3.1 alongside the macro L2/L3/L4 emission.

After this commit lands, PR #158 flips draft → ready-for-review and advances to the merge ladder.

## PR-X3.1 follow-up backlog

Queued for a small same-week follow-up PR:
1. Emit lockstep `{Name}L{2,3,4}Block` block view types + `map_l{2,3,4}` + `bulk_apply_l{2,3,4}` methods on the macro-generated SoA-of-grids struct
2. Add `field_grid::<I, FieldT>()` typed accessor alongside the existing `field_n::<I>()` erased accessor
3. Naming consistency: rename `GridBlockMut::padded_cols` → `padded_cols_stride` to match `GridBlock::padded_cols_stride`
119 changes: 119 additions & 0 deletions .claude/knowledge/pr-x3-plan-review.md
@@ -0,0 +1,119 @@
# PR-X3 Plan Review — Savant Verdict

Auditor: Sonnet plan-review savant (Phase 2 of sequential PR-X3 sprint)
Design doc reviewed: `.claude/knowledge/pr-x3-cognitive-grid-design.md` @ b348d43c
Verdict: **READY-WITH-DOC-FIXES**

P0 count: 2 | P1 count: 7 | P2 count: 4

## P0 findings (must fix before sprint can spawn)

### A1 — `bulk_apply_base(&mut self, F)` violates data-flow Rule #3

The methods `bulk_apply_base(&mut self, F)` and `bulk_apply_tier(&mut self, F)` violate `.claude/rules/data-flow.md` Rule #3 ("No `&mut self` during computation. Ever."). The methods are explicitly framed as compute paths in the design doc (§"Tier semantics map to cognitive shader passes" places "the CausalEdge64 mantissa pass" on `bulk_apply_l1`), not builder/constructor paths.

**Patch language**: split the API into two named families:

```rust
// PRIMARY compute path - immutable self, returns new grid (builder pattern)
pub fn map_base<U: Copy + Default, F>(&self, f: F) -> BlockedGrid<U, BR, BC>
where
F: FnMut(&Block<'_, T, BR, BC>, &mut BlockMut<'_, U, BR, BC>);

pub fn map_tier<U: Copy + Default, const N: usize, F>(&self, f: F) -> BlockedGrid<U, BR, BC>
where
F: FnMut(&SuperBlock<'_, T, BR, BC, N>, &mut SuperBlockMut<'_, U, BR, BC, N>);

// SECONDARY write-back variant - in-place mutation, explicit gated write-back
//
// # Data-flow rule
//
// This is the gated write-back variant of [`map_base`]. The closure performs
// write-back operations ONLY (per `.claude/rules/data-flow.md` Rule #3).
// For compute paths use `map_base` which returns a new grid.
pub fn bulk_apply_base<F>(&mut self, f: F)
where
F: FnMut(&mut BlockMut<'_, T, BR, BC>);

pub fn bulk_apply_tier<const N: usize, F>(&mut self, f: F)
where
F: FnMut(&mut SuperBlockMut<'_, T, BR, BC, N>);
```

Same split applies to the L1/L2/L3/L4 convenience aliases on `BlockedGrid<T, 64, 64>`:
- `map_l1` / `map_l2` / `map_l3` / `map_l4` — primary compute paths
- `bulk_apply_l1` / `bulk_apply_l2` / `bulk_apply_l3` / `bulk_apply_l4` — write-back variants
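To make the split concrete, here is a small runnable model of the two families on a row-major grid whose dimensions are already block multiples. `Grid`, `Block`, `BlockMut`, `from_vec`, `get`, and `get_mut` are hypothetical simplifications; tier variants and the L1-L4 aliases are omitted.

```rust
/// Simplified, hypothetical grid whose dimensions are already block multiples.
pub struct Grid<T, const BR: usize, const BC: usize> {
    padded_rows: usize,
    padded_cols: usize,
    data: Vec<T>,
}

/// Read-only view of one base block (top-left cell onward, plus row stride).
pub struct Block<'a, T, const BR: usize, const BC: usize> {
    data: &'a [T],
    stride: usize,
}

/// Mutable view of one base block; only one exists at a time in this sketch.
pub struct BlockMut<'a, T, const BR: usize, const BC: usize> {
    data: &'a mut [T],
    stride: usize,
}

impl<T, const BR: usize, const BC: usize> Block<'_, T, BR, BC> {
    pub fn get(&self, r: usize, c: usize) -> &T {
        assert!(r < BR && c < BC);
        &self.data[r * self.stride + c]
    }
}

impl<T, const BR: usize, const BC: usize> BlockMut<'_, T, BR, BC> {
    pub fn get_mut(&mut self, r: usize, c: usize) -> &mut T {
        assert!(r < BR && c < BC);
        &mut self.data[r * self.stride + c]
    }
}

impl<T, const BR: usize, const BC: usize> Grid<T, BR, BC> {
    pub fn from_vec(padded_rows: usize, padded_cols: usize, data: Vec<T>) -> Self {
        assert!(padded_rows % BR == 0 && padded_cols % BC == 0);
        assert_eq!(data.len(), padded_rows * padded_cols);
        Self { padded_rows, padded_cols, data }
    }

    /// Flat offsets of each block's top-left cell, in row-major block order.
    fn block_starts(padded_rows: usize, padded_cols: usize) -> impl Iterator<Item = usize> {
        (0..padded_rows / BR).flat_map(move |bi| {
            (0..padded_cols / BC).map(move |bj| bi * BR * padded_cols + bj * BC)
        })
    }

    /// PRIMARY compute path: immutable `self`, returns a new grid.
    pub fn map_base<U: Copy + Default, F>(&self, mut f: F) -> Grid<U, BR, BC>
    where
        F: FnMut(&Block<'_, T, BR, BC>, &mut BlockMut<'_, U, BR, BC>),
    {
        let (pr, pc) = (self.padded_rows, self.padded_cols);
        let mut out = Grid::<U, BR, BC>::from_vec(pr, pc, vec![U::default(); self.data.len()]);
        for start in Self::block_starts(pr, pc) {
            let src = Block { data: &self.data[start..], stride: pc };
            let mut dst = BlockMut { data: &mut out.data[start..], stride: pc };
            f(&src, &mut dst);
        }
        out
    }

    /// SECONDARY write-back variant: in-place mutation, write-back only.
    pub fn bulk_apply_base<F>(&mut self, mut f: F)
    where
        F: FnMut(&mut BlockMut<'_, T, BR, BC>),
    {
        let stride = self.padded_cols;
        for start in Self::block_starts(self.padded_rows, stride) {
            let mut blk = BlockMut { data: &mut self.data[start..], stride };
            f(&mut blk);
        }
    }
}
```

The compute path never takes `&mut self`: it reads through `Block` views and writes into a freshly allocated output, which is why it needs `U: Copy + Default` to pre-fill that output. Only the write-back path mutates in place.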

### A2 — Macro-generated bulk_apply methods inherit A1 violation

The `cognitive_grid_struct!`-generated `bulk_apply_l1` / `bulk_apply_l2` / `bulk_apply_l3` / `bulk_apply_l4` carry the same `&mut self` + compute framing problem. The fix propagates from A1: the macro emits BOTH `map_l1` (compute, returns new struct with mapped fields) AND `bulk_apply_l1` (write-back) side by side.

## P1 findings

### H1 — Sprint protocol step 4 contradicts the binding sequential rule

§"Sprint protocol" step 4 in the design doc currently says "Two workers in parallel" (carryover from W3-W6 protocol shape). This contradicts the explicit "5–10 sequential Sonnet workers + 1 Opus coordinator" protocol in §"Worker decomposition". Fix: align step 4 to read "Spawn sprint workers SEQUENTIALLY (per §"Worker decomposition")" and remove "in parallel".

Additionally: with P0 patches adding the `map_*` family alongside `bulk_apply_*`, the composite Worker A scope grows past reliable single-pass Sonnet attention. **Adopt the 7-worker split (A1–A6 + B) as the DEFAULT, not the fallback.**

### F1 — Type name `CognitiveGrid` overstates scope

The type is a generic 2-D blocked grid usable anywhere a hierarchical layout matters (BLAS GEMM blocking, image processing tiles, scientific computing). The "cognitive" prefix in the type name leans into one use case but couples the generic primitive to it semantically. **Rename `CognitiveGrid` → `BlockedGrid`. Add `pub type ShaderMantissaGrid = BlockedGrid<u64, 64, 64>;` to carry the cognitive framing as an alias.** Module path stays at `crate::hpc::blocked_grid::*` (with `cognitive_grid` deprecated alias if needed for back-compat — but PR-X3 is greenfield, so just `blocked_grid`).

The macro renames consistently: `cognitive_grid_struct!` → `blocked_grid_struct!`.

### F2 — Block / BlockMut naming

Cross-checked against existing ndarray types: `Block` is used in `crate::backend::native` BLAS kernels for a different concept (cache-blocked GEMM block sizes). To avoid collision, prefer `GridBlock` / `GridBlockMut` / `GridSuperBlock` / `GridSuperBlockMut` in `crate::hpc::blocked_grid::*`.

### F3 — L1/L2/L3/L4 tier names

Cache hierarchy convention is innermost=L1=fastest, outermost=L4=RAM. The design doc uses this convention. Verify the doc states this explicitly (currently implicit). Add a one-sentence note in the L1-L4 alias docstring: "Following cache-hierarchy convention: L1 = innermost (32 KB), L4 = framebuffer-scale (2 GB)."
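As a quick consistency check on the L1 figure: one 64×64 base block of `u64` (the `ShaderMantissaGrid` element type) occupies 64 × 64 × 8 B = 32,768 B = 32 KB. A minimal const helper, hypothetical and assuming no per-block header or inter-block padding:

```rust
/// Bytes occupied by one BR×BC base block of `T` (no header, no inter-block padding).
pub const fn block_bytes<T, const BR: usize, const BC: usize>() -> usize {
    BR * BC * std::mem::size_of::<T>()
}

/// Evaluated at compile time: one 64×64 block of u64.
pub const SHADER_MANTISSA_BLOCK_BYTES: usize = block_bytes::<u64, 64, 64>();
```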

### G2 — `bulk_apply_tier::<N>` + L2/L3/L4 aliases — keep both

Ruling on Q2: provide BOTH the const-generic `map_tier::<N>` / `bulk_apply_tier::<N>` AND the L2/L3/L4 alias methods. Aliases are convenience for the default 64×64 base; const-generic is the escape hatch for non-default bases. Same applies to the `map_*` family.

### G6 — `T::default()` padding bound is too restrictive

Q5 ruling: ADD `BlockedGrid::new_with_pad(rows, cols, pad_value: T)` constructor that takes the padding fill explicitly. Bound: `T: Copy` only, no `Default`. The `new` constructor stays as `T: Copy + Default` calling `new_with_pad(rows, cols, T::default())`.
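A sketch of the Q5 constructor pair, assuming the padded extent is the logical extent rounded up to whole `BR`×`BC` blocks (consistent with the `# Footgun` wording about right/bottom padding). This simplified version fills every cell, logical and padding, with the given value; the type and field names are hypothetical.

```rust
/// Simplified, hypothetical stand-in for `BlockedGrid<T, BR, BC>`.
pub struct Grid<T, const BR: usize, const BC: usize> {
    rows: usize,
    cols: usize,
    padded_cols: usize,
    data: Vec<T>,
}

// `T: Copy` only: the caller supplies the fill value, so no `Default` is needed.
impl<T: Copy, const BR: usize, const BC: usize> Grid<T, BR, BC> {
    pub fn new_with_pad(rows: usize, cols: usize, pad_value: T) -> Self {
        // Assumption: each dimension is rounded up to a whole number of blocks.
        let padded_rows = rows.div_ceil(BR) * BR;
        let padded_cols = cols.div_ceil(BC) * BC;
        // Every cell, logical and padding alike, starts as `pad_value`.
        let data = vec![pad_value; padded_rows * padded_cols];
        Self { rows, cols, padded_cols, data }
    }
}

// `new` keeps the `Default` bound and delegates to `new_with_pad`.
impl<T: Copy + Default, const BR: usize, const BC: usize> Grid<T, BR, BC> {
    pub fn new(rows: usize, cols: usize) -> Self {
        Self::new_with_pad(rows, cols, T::default())
    }
}
```

A type such as `std::num::NonZeroU8`, which is `Copy` but has no `Default`, can only be used through `new_with_pad`.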

### G3 — `as_padded_slice` exposure

Q6 ruling: KEEP `as_padded_slice` / `as_padded_slice_mut` as a feature (not footgun). Add a `# Footgun` section to each method's docstring explaining: "Returned slice includes padding cells at the right and bottom of the logical extent. Use [`rows`]/[`cols`] to compute logical bounds: logical cell `(r, c)` lives at flat index `r * padded_cols() + c` for `r < rows()` and `c < cols()`; every other index is padding." Plus an example showing how to compute the logical-cell flat index.
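The logical-cell arithmetic a `# Footgun` example would need to teach can be sketched with two free functions over a padded row-major buffer. Both helpers are hypothetical and take the dimensions explicitly rather than calling the grid's accessors.

```rust
/// Flat index of logical cell (r, c) in a padded row-major buffer,
/// or `None` if (r, c) lies outside the logical extent.
pub fn logical_index(r: usize, c: usize, rows: usize, cols: usize, padded_cols: usize) -> Option<usize> {
    (r < rows && c < cols).then(|| r * padded_cols + c)
}

/// Iterates logical cells only, skipping right-hand and bottom padding.
/// Panics if `padded_cols` is 0 or smaller than `cols`.
pub fn logical_cells<T>(padded: &[T], rows: usize, cols: usize, padded_cols: usize) -> impl Iterator<Item = &T> {
    assert!(padded_cols > 0 && cols <= padded_cols);
    padded
        .chunks(padded_cols) // one chunk per padded row
        .take(rows) // drop bottom padding rows
        .flat_map(move |row| &row[..cols]) // drop right-hand padding columns
}
```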

### G4 — `field_n::<I>` compile-time accessors on macro output

Q4 ruling: The `blocked_grid_struct!` macro should emit `field_n::<I>()` const-generic field accessors on the generated struct's L1 block type (matching the `soa_struct!` pattern from W3-W6). Failure to do so would force consumers into runtime field-index lookups in hot paths.

## P2 findings

### J1 — Open question Q3 ruling

`Block<'a, T, BR, BC>` and `BlockMut<'a, T, BR, BC>` are separate types (current spec). Verify they carry `PhantomData<&'a T>` / `PhantomData<&'a mut T>` markers explicitly for lifetime variance, not just by-virtue-of-having-`&'a [T]`-field. Idiomatic Rust 2024.
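Where the explicit marker actually matters: if a view stores `&'a [T]`, that field already carries the lifetime and variance, so the marker is redundant. Once the view stores a raw pointer instead (one way to avoid the overlapping `&mut [T]` that commit `656a368` refers to), the marker is load-bearing; without it `'a` would be an unused parameter (error E0392) and nothing would tie the view to the borrowed buffer. A hypothetical sketch, not the crate's implementation:

```rust
use std::marker::PhantomData;

/// Hypothetical mutable block view over a padded row-major buffer.
pub struct GridBlockMut<'a, T, const BR: usize, const BC: usize> {
    ptr: *mut T,   // block's top-left cell
    stride: usize, // padded column count
    _marker: PhantomData<&'a mut T>, // borrow + variance of a `&'a mut T`
}

impl<'a, T, const BR: usize, const BC: usize> GridBlockMut<'a, T, BR, BC> {
    /// Splits the whole buffer into base blocks that are all alive at once.
    pub fn split_all(data: &'a mut [T], padded_cols: usize) -> Vec<Self> {
        let n = data.len();
        assert!(
            BR > 0 && BC > 0 && padded_cols > 0
                && padded_cols % BC == 0
                && n % (BR * padded_cols) == 0,
            "buffer must be a whole number of blocks"
        );
        let base = data.as_mut_ptr();
        let mut out = Vec::new();
        for bi in 0..n / (BR * padded_cols) {
            for bj in 0..padded_cols / BC {
                // SAFETY: the offset is < n, so it stays inside `data`.
                let ptr = unsafe { base.add(bi * BR * padded_cols + bj * BC) };
                out.push(Self { ptr, stride: padded_cols, _marker: PhantomData });
            }
        }
        out
    }

    /// Row `r` of this block. `&mut self` prevents two live rows of one block.
    pub fn row_mut(&mut self, r: usize) -> &mut [T] {
        assert!(r < BR, "row index out of block bounds");
        // SAFETY: row r of this block is BC contiguous cells inside the original
        // buffer, and no other block view covers any of those cells.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.add(r * self.stride), BC) }
    }
}
```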

### J2 — Open question Q4 ruling

Per-field `#[grid(field_block = ...)]` heterogeneous block shapes — v1 locks to uniform block shape (all fields share `BR, BC`). Per-field extension is additive, not breaking. Document as "future work" in the macro docstring; do NOT support in v1.

### J3 — Open question Q7 ruling

L1-L4 aliases ONLY on `BlockedGrid<T, 64, 64>`. AMX (16×16) / strip (1×16) / half-square (32×64) grids use raw `blocks_tier::<N>` / `map_tier::<N>` / `bulk_apply_tier::<N>`. Document this constraint in the alias docstring.
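The 64×64-only restriction falls out naturally in Rust from an `impl` block written for concrete const arguments. The sketch below is hypothetical: it assumes L1 coincides with the 64×64 base block (matching the 32 KB figure for `u64`) and uses block counting as a stand-in for the real iterator and `map_*` methods.

```rust
/// Simplified, hypothetical stand-in for `BlockedGrid<T, BR, BC>`.
pub struct BlockedGrid<T, const BR: usize, const BC: usize> {
    padded_rows: usize,
    padded_cols: usize,
    data: Vec<T>,
}

/// Alias carrying the cognitive-shader framing.
pub type ShaderMantissaGrid = BlockedGrid<u64, 64, 64>;

// Generic impl: available for every block shape (64×64, AMX 16×16, strip 1×16, ...).
impl<T: Copy + Default, const BR: usize, const BC: usize> BlockedGrid<T, BR, BC> {
    pub fn new(rows: usize, cols: usize) -> Self {
        let (pr, pc) = (rows.div_ceil(BR) * BR, cols.div_ceil(BC) * BC);
        Self { padded_rows: pr, padded_cols: pc, data: vec![T::default(); pr * pc] }
    }

    /// Stand-in for the raw `blocks_base` / `blocks_tier::<N>` family.
    pub fn base_block_count(&self) -> usize {
        (self.padded_rows / BR) * (self.padded_cols / BC)
    }
}

// Concrete const arguments: this method exists only on 64×64 grids, so calling
// `l1_block_count` on a `BlockedGrid<T, 16, 16>` is a compile error.
impl<T: Copy + Default> BlockedGrid<T, 64, 64> {
    /// L1 alias, assumed here to coincide with the 64×64 base block.
    pub fn l1_block_count(&self) -> usize {
        self.base_block_count()
    }
}
```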

### J4 — Add explicit "out of scope: SIMD primitives" warning in module header

The design doc has §"Out of scope" but the module-level docstring (`//!`) should also carry a concise version. Three lines max. Saves consumers from filing "why isn't aos_to_soa SIMD-accelerated" issues.

## Rulings on open questions (Q1–Q7 from design doc)

- **Q1: BlockedGrid** — rename `CognitiveGrid → BlockedGrid`. Add `ShaderMantissaGrid` alias for the cognitive-shader use case.
- **Q2: Both** — `map_tier::<N>` / `bulk_apply_tier::<N>` const-generic entries AND L1-L4 alias methods.
- **Q3: Separate** — keep `Block` / `BlockMut` as distinct types with `PhantomData` lifetime markers (rename to `GridBlock`/`GridBlockMut` per F2).
- **Q4: Compatible** — v1 uniform block shape; per-field extension additive, future work.
- **Q5: Add** — `new_with_pad(rows, cols, pad_value: T)` alongside `new`; `T: Copy` only, no `Default` bound on the new constructor.
- **Q6: Feature** — keep `as_padded_slice*`; add `# Footgun` doc section.
- **Q7: 64×64-only** — L1-L4 aliases only on `BlockedGrid<T, 64, 64>`. AMX / strip / half-square grids use raw `blocks_tier::<N>`.

## Net call

**Recommended next phase: Phase 3 (corrector)** — apply A1+A2 P0 patches + all P1 fixes to the design doc, commit as v2, then spawn Phase 4 sprint workers using the 7-worker split (A1–A6 + B) as the default decomposition. No structural rethink is required. The P0 fixes (map/bulk_apply split) and the P1 `BlockedGrid` rename are mechanical edits that propagate cleanly through the doc.
30 changes: 25 additions & 5 deletions .claude/settings.json
@@ -2,11 +2,31 @@
"permissions": {
"allow": [
"mcp__github__*",
-"Read(**)",
-"Write(**)",
-"Edit(**)",
-"MultiEdit(**)",
-"NotebookEdit(**)",
+"Read({**})",
+"Write(src/{**})",
+"Write(crates/{**})",
+"Write(.claude/knowledge/{**})",
+"Write(.claude/settings.json)",
+"Write(.claude/settings.local.json)",
+"Write(Cargo.toml)",
+"Write(Cargo.lock)",
+"Write(README.md)",
+"Write(CLAUDE.md)",
+"Write(.cargo/config.toml)",
+"Edit(src/{**})",
+"Edit(crates/{**})",
+"Edit(.claude/knowledge/{**})",
+"Edit(.claude/settings.json)",
+"Edit(.claude/settings.local.json)",
+"Edit(Cargo.toml)",
+"Edit(Cargo.lock)",
+"Edit(README.md)",
+"Edit(CLAUDE.md)",
+"Edit(.cargo/config.toml)",
+"MultiEdit(src/{**})",
+"MultiEdit(crates/{**})",
+"MultiEdit(.claude/knowledge/{**})",
+"MultiEdit(Cargo.toml)",
"Bash(git *)",
"Bash(ls *)",
"Bash(ls)",
1 change: 1 addition & 0 deletions Cargo.lock


5 changes: 5 additions & 0 deletions Cargo.toml
@@ -98,6 +98,11 @@ fractal = { path = "crates/fractal", default-features = false, optional = true }
serde = { version = "1.0", optional = true, default-features = false, features = ["alloc"] }
rawpointer = { version = "0.2" }

+# paste — identifier concatenation in macro_rules! expansions.
+# Required by `blocked_grid_struct!` to generate {Name}L1Block / {Name}L1BlockMut
+# view types. Already present in the workspace lockfile (via crates/burn).
+paste = "1"
+

# Cranelift JIT (optional, behind "jit-native" feature)
# For AVX-512 VPOPCNTDQ/VNNI/VPTERNLOG/BITALG support, use the patched fork: