W3-W6: SoA/AoS layout helpers — SoaVec + soa_struct! + aos_to_soa + soa_to_aos + bulk_apply (scalar; SIMD deferred)#156
Merged
Conversation
Worker-prep doc for the W3-W6 sprint. Fixes exact API signatures for:
- W3: SoaVec<T,N> + soa_struct! macro (src/hpc/soa.rs)
- W5+W6: aos_to_soa / soa_to_aos (src/simd_ops.rs)
- W4: bulk_apply / bulk_scan (src/hpc/bulk.rs)
Scope intentionally scalar-only. SIMD acceleration of AoS<->SoA deferred
to a post-bench-harness wave once measured hot paths exist. Public API
forward-compatible with the future per-arch SIMD swap.
Layering rule reaffirmed: zero #[target_feature], zero direct
simd_{type}.rs imports, zero cfg(target_feature) gates in any W3-W6
file (codex audit will block on violations).
One P0 (move aos_to_soa/soa_to_aos from simd_ops.rs to hpc::soa.rs), six P1 clarifications, several P2 polish notes. Layering rule respected, no irreducible ambiguity. Recommend single combined W3+W5+W6 worker plus parallel W4 worker after doc patches land.
Captures the typing rule for palette-256 / HDR popcount / Base17 L1 / Fisher-z / BF16-mantissa direct-transform as separate typed primitives. Documents the worst-case roundtrip anti-pattern (palette > fisher-z > 'cosine' > hamming > popcount > palette) and the typed fast path (palette > euler-gamma-offset > palette-BF16-mantissa-exact-transform). Binding for W7 (cognitive bulk ops, deferred) and any future cognitive distance API work. W3-W6 (in-flight SoA/AoS helpers) are layout-only and explicitly do NOT bake in any distance metric.
Applies plan-review savant findings (commit 35a8e03): - P0-1: aos_to_soa / soa_to_aos move from src/simd_ops.rs to src/hpc/soa.rs (co-located with SoaVec; simd_ops.rs charter is SIMD-only) - P1-1: add field_n<const I> + field_n_mut<const I> compile-time accessors alongside runtime field(i) - P1-2: doc note that T need not be Copy - P1-3: §'Reserved field names' listing macro-method collisions (new/with_capacity/len/is_empty/clear/push/default) - P1-4 (D3): caller-owned invariant rule documented; macro fields stay pub - P1-5: inference fallback hint for aos_to_soa turbofish - P1-7: explicit 'do not manually re-export the macro' (macro_export handles it) - Worker plan: collapse W3+W5+W6 into one worker on hpc/soa.rs; W4 on hpc/bulk.rs (2 workers total in parallel, was 3) Adds §'Out of scope - distance metrics' citing cognitive-distance-typing.md (commit 5927712). W3-W6 helpers are layout-only; workers must not extend toward distance computation. W7 deferral expanded with representative bulk-fn shapes per typed metric (palette-256 / popcount / BF16-mantissa direct transform).
New module src/hpc/soa.rs implementing the SoA/AoS layout helpers per .claude/knowledge/w3-w6-soa-aos-design.md v2 (commit 10151d7). W3: SoaVec<T,N> generic container ([Vec<T>; N] inside) with runtime field(i) + compile-time field_n::<I>(), chunks(k), all_fields(). soa_struct! macro generates named-field structs with Vec<T> per field and inherent new/push/len/clear/Default. W5+W6: aos_to_soa<T,N,F> scalar deinterleave via closure; soa_to_aos<T,N,F> scalar interleave inverse. Co-located with SoaVec per plan-review P0-1 (NOT in simd_ops.rs — that module is SIMD-only). Scalar implementations throughout. No #[target_feature], no simd_{type}.rs imports, no cfg(target_feature). Forward-compatible with future bench-justified SIMD swap via grow-internal-arms. Out of scope: distance metrics (see .claude/knowledge/cognitive-distance-typing.md).
New module src/hpc/bulk.rs implementing chunked AoS traversal per .claude/knowledge/w3-w6-soa-aos-design.md v2 (commit 10151d7). bulk_apply<T, F: FnMut(&mut [T], usize)>: chunks &mut [T] via chunks_mut(chunk_size) and invokes the closure with each chunk plus its absolute starting index. Useful for cache-blocked traversal and for SoA-staging via composition with aos_to_soa inside the closure. bulk_scan: read-only sibling with the same chunking semantics. Both panic on chunk_size == 0. Scalar wrappers — no #[target_feature], no simd_{type}.rs imports. Out of scope: distance metrics (see .claude/knowledge/cognitive-distance-typing.md). The aos_to_soa composition integration test is gated behind cfg(any()) with a TODO note: Worker A's src/hpc/soa.rs has not yet landed on this branch base (ab20d11). When it does, drop the gate to enable the test.
Codex W3-W6 audit P1: chunk_size==usize::MAX is tested but not documented in the public docstring. One-line addition to bulk_apply and bulk_scan: 'A chunk_size of usize::MAX yields the entire slice as a single chunk.' Also persists the W3-W6 audit doc the audit agent couldn't write itself (sandbox permission for .claude/knowledge).
6 tasks
AdaWorldAPI
added a commit
that referenced
this pull request
May 18, 2026
docs(hpc/soa): P2 savant tightenings — f32-only scope + hpc::soa layering rationale + integration test ungate (orphan rescue from #156)
AdaWorldAPI
pushed a commit
that referenced
this pull request
May 18, 2026
…gmoid_f32 Adds the missing F→C direction of the strides-mismatch regression test. Upstream (PR #156 / 589ef56) already landed: - The `x.strides() == out.strides()` guard on `sigmoid_f32`. - `test_sigmoid_f32_c_in_f_out_mismatched_strides` (C-order input, F-order output). This commit adds the SYMMETRIC counterpart: F-order input, C-order output. If a future refactor narrows the guard to only check the C→F direction (e.g. `if x.is_standard_layout() != out.is_standard_layout()` phrased asymmetrically, or a one-sided `as_slice` vs `as_slice_memory_order` mismatch), the C→F test would still pass while F→C silently regressed. Pinning both directions keeps the strides-equality guard symmetric. The original sigmoid_f32 fix work on this branch became redundant when the upstream commit landed (identical code, slightly different comment) — branch reset to master and only the symmetric test is preserved as net-new value. ## Test count cargo test --lib hpc::activations → 18 passed; 0 failed (was 17 upstream: +1) cargo fmt --all --check → clean https://claude.ai/code/session_017GFLBnDy23AWBqvkbHHC41
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Establishes SoA/AoS layout-handoff helpers across the codebase. Scalar-only by design — public API forward-compatible with future bench-justified per-arch SIMD acceleration via the existing
LazyLockdispatch layer, but no SIMD bodies in this PR.W3 + W5 + W6 land as a combined deliverable in
src/hpc/soa.rs; W4 lands separately insrc/hpc/bulk.rs. W7 (cognitive bulk ops over&[Plane]/&[Fingerprint256]/ etc.) is explicitly deferred until a bench harness identifies measured hot paths to accelerate.What's in
src/hpc/soa.rsSoaVec<T,N>,SoaChunks,soa_struct!macrosrc/hpc/soa.rs(same file)aos_to_soa<T,N,F>,soa_to_aos<T,N,F>src/hpc/bulk.rsbulk_apply,bulk_scanPlus four supporting knowledge docs in
.claude/knowledge/:w3-w6-soa-aos-design.md— design contract (v2, after savant patches)w3-w6-plan-review.md— savant's pre-spawn auditcognitive-distance-typing.md— typed-distance rule (palette-256 ≠ HDR popcount ≠ Base17 L1; no roundtrips, no umbrella API)w3-w6-codex-audit.md— post-sprint codex verdictTwo-layer rule (re-affirmed)
W3-W6 lives at the user-code level. Zero
#[target_feature], zerocfg(target_feature), zero directsimd_{type}.rsimports, zero raw intrinsics. Verified by codex audit (grepchecks all returned empty).Distance typing guardrail
cognitive-distance-typing.md(committed in this PR) establishes that distance metrics in this codebase are typed:(PaletteIdx, PaletteIdx, &Buckets, EulerGammaOffset) → PaletteDistance— buckets and Euler offset are integralThe worst-case roundtrip (palette → fisher-z → "cosine" → hamming → popcount → palette) is explicitly illegal because each arrow erases the typing the previous step earned.
W3-W6 helpers are layout-only and explicitly do NOT bake in any distance metric. Both module headers warn against extension toward distance and point at the typing doc. W7 (when it lands) will ship per-metric named bulk fns (
bulk_hdr_popcount_early_exit,bulk_palette256_distance,bulk_palette256_bf16_mantissa_transform), never abulk_distance<T>umbrella.Process — multi-agent sprint
Per the established protocol: plan → savant review → correct → sprint → codex audit → fix P0 → commit → repeat.
hpc::soanotsimd_ops.rs) + 7 P1src/hpc/soa.rscombined (W3+W5+W6) — single commitsrc/hpc/bulk.rs(W4) — single commitusize::MAXdocstring gap), 3 P2 deferreddocs(hpc/bulk): F4 - document usize::MAX chunk_size semantics)Codex audit verdict — verification
cargo check -p ndarray --no-default-features --features stdcargo test --lib --no-default-features --features std hpc::soacargo test --lib --no-default-features --features std hpc::bulkcargo test --doc --no-default-features --features std hpc::soacargo test --doc --no-default-features --features std hpc::bulkcargo fmt --all -- --checkcargo clippy --no-default-features --features std -- -D warningsWhat's deferred (P2 from the audit, explicit non-goals for this PR)
bulk_scannaming (savant suggestedbulk_for_each— kept for symmetry withbulk_apply; rename if downstream finds it misleading)SoaVec::iter_rows()row iterator (usesoa.chunks(1)for now)#[derive(Clone, Debug)]onSoaVec(would requirewhere T: Clone+Debugbounds; macro-generated structs already support derive passthrough)aos_to_soa/soa_to_aos(defer until bench harness identifies a hot path; the public API forward-compatible via grow-internal-arms)W7 explicit deferral
Cognitive bulk ops on
&[Plane]/&[Fingerprint256]/&[PaletteIdx]are NOT in this wave. Two reasons:When W7 revisits, bulk fns will be one-per-metric (
bulk_hdr_popcount_early_exit,bulk_palette256_distance, etc.), MAY internally useSoaVec+bulk_applyfrom this PR for staging.Downstream impact
None. This PR is purely additive: two new modules under
hpc/. No existing signatures change. burn-ndarray, candle, tract, ort, lance-graph continue to build unchanged.The new symbols become available at
ndarray::hpc::soa::*andndarray::hpc::bulk::*. Thesoa_struct!macro is#[macro_export]sondarray::soa_struct!works at the crate root.Test plan
cargo checkcleancargo test --lib hpc::soa hpc::bulk— 45 passed / 0 failedcargo test --doc hpc::soa hpc::bulk— 12 passed / 0 failedcargo fmt --all -- --checkcleancargo clippy --no-default-features --features std -- -D warningscleangrepconfirms zero umbrella API surfacesgrepconfirms zero#[target_feature]/cfg(target_feature)/ per-arch imports / raw intrinsicsGenerated by Claude Code