diff --git a/.gitignore b/.gitignore
index ced6b991..69447569 100644
--- a/.gitignore
+++ b/.gitignore
@@ -107,4 +107,9 @@ docs/superpowers/baselines/**/raw/**
# Agent-generated plans / scratch (not project deliverables)
phase*-plan.md
*-plan.md
+# …but project-deliverable plans land under docs/superpowers/plans/ and
+# must be trackable. The directory itself needs un-ignoring first because
+# the outer `docs/superpowers/*` rule above excludes the dir, and an
+# ignored directory's contents cannot be re-included by a later pattern.
+!docs/superpowers/plans/
!docs/superpowers/plans/*.md
diff --git a/docs/superpowers/plans/2026-05-13-enrich-oom-fix.md b/docs/superpowers/plans/2026-05-13-enrich-oom-fix.md
new file mode 100644
index 00000000..2c0a69fa
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-13-enrich-oom-fix.md
@@ -0,0 +1,829 @@
+# Enrich Pipeline OOM Fix — Streaming Refactor
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Eliminate the `codeiq enrich` OOM on real-world polyglot codebases (~/projects/-scale: 49k files / 434k nodes) while keeping output bytes-identical to today's pipeline. The bar is "no OOM, ever, on any input that fits in disk."
+
+**Architecture:** Three coordinated refactors, sequenced low → high risk so each ships independently and yields measurable improvement on its own.
+
+1. **Phase A — Quick wins** (4 surgical fixes). Targets the actual pprof hotspots: parse-storm, unbounded goroutines, Kuzu buffer pool default, GraphBuilder dual-lifetime. Expected to drop peak RSS by 70-85% on its own.
+2. **Phase B — TreeCursor migration**. Replaces `parser.Walk`'s recursive `Node.Child()` traversal with tree-sitter's `TreeCursor`. Eliminates the largest single allocator (`cachedNode` — 91% of churn).
+3. **Phase C — Streaming three-pass enrich**. The architectural fix. Decouples enrich stages so the full graph is never materialised in Go memory. Memory-safe by construction; scales to 10M+ nodes.
+4. **Phase D — Verification harness**. Memory + wall-time benchmark suite that pins the regression bar in CI.
+
+**Tech Stack:** Go 1.25.10, Kuzu 0.7.1 (`github.com/kuzudb/go-kuzu`), tree-sitter (`github.com/smacker/go-tree-sitter`), SQLite (`mattn/go-sqlite3`). CGO required everywhere. Already in use — no new dependencies for Phases A + B; Phase C may add `lanrat/extsort` (28★, Apache-2) if disk-spill becomes necessary at extreme scale.
+
+**Execution mode:** ralph-loop with **no iteration limit**. The loop drives this plan to completion via the recipe in §"Ralph-loop execution recipe" below. No human gates within phases; humans approve PRs at phase boundaries.
+
+---
+
+## Ralph-loop execution recipe
+
+This plan is the loop's source of truth. The loop's prompt should be:
+
+> `Read /home/dev/projects/codeiq/docs/superpowers/plans/2026-05-13-enrich-oom-fix.md and progress at /home/dev/projects/codeiq/.claude/oom-fix-progress.md. Execute the next undone Task per the iteration recipe in the plan. Update the progress file after every Step. Only exit when OOM FIXED on ~/projects/ is genuinely true per the acceptance criteria in §"Completion promise".`
+
+### Completion promise
+
+The loop MUST NOT emit `OOM FIXED on ~/projects/` until ALL of the following hold:
+
+1. All Phase A, B, C, D tasks are checked complete in the progress file.
+2. The 4 PRs (one per phase) are merged into `main`. Verified via `gh pr list --state merged --head --json mergedAt`.
+3. `/usr/bin/time -v codeiq enrich ~/projects/` runs to completion with exit 0 AND peak RSS < 4 GiB (extracted from `Maximum resident set size (kbytes):` line).
+4. `codeiq stats ~/projects/` returns a non-empty `graph.nodes` count (proves the Kuzu graph populated).
+5. `cd go && CGO_ENABLED=1 go test ./... -count=1 -race` is green on `main` HEAD.
+6. The perf-gate CI added in Task D1 is green on `main` HEAD.
+
+If any criterion is unmet, continue iterating. Do NOT emit a false promise.
+
+### Per-iteration recipe
+
+```dot
+digraph ralph_loop {
+ start [shape=doublecircle, label="iteration start"];
+ read_prog [shape=box, label="Read progress file\n.claude/oom-fix-progress.md"];
+ find_next [shape=diamond, label="Find next undone\nTask in the plan"];
+ all_done [shape=diamond, label="All Tasks done?"];
+ accept [shape=diamond, label="Acceptance criteria\nmet (§ Completion promise)?"];
+ promise [shape=doublecircle, label="OOM FIXED\nEXIT LOOP"];
+ enter_wt [shape=box, label="Enter worktree for Task"];
+ exec_steps [shape=box, label="Execute Task Steps\nin order, marking each\ncomplete in progress file"];
+ tests [shape=diamond, label="All tests + benchmarks\npass for this Task?"];
+ blocker [shape=box, label="Mark Task blocked\nin progress, document\nthe failure"];
+ open_pr [shape=diamond, label="Last Task in Phase?"];
+ pr_phase [shape=box, label="Open PR for the\nwhole Phase"];
+ pr_wait [shape=diamond, label="Phase PR merged?"];
+ next_phase [shape=box, label="Pull main, start\nnext Phase"];
+ next_task [shape=box, label="Move to next\nTask in Phase"];
+
+ start -> read_prog;
+ read_prog -> all_done;
+ all_done -> accept [label="yes"];
+ all_done -> find_next [label="no"];
+ accept -> promise [label="yes"];
+ accept -> find_next [label="no"];
+ find_next -> enter_wt;
+ enter_wt -> exec_steps;
+ exec_steps -> tests;
+ tests -> blocker [label="no"];
+ tests -> open_pr [label="yes"];
+ blocker -> start [label="next iter"];
+ open_pr -> pr_phase [label="yes"];
+ open_pr -> next_task [label="no"];
+ pr_phase -> pr_wait;
+ pr_wait -> next_phase [label="yes"];
+ pr_wait -> start [label="no, wait next iter"];
+ next_phase -> start;
+ next_task -> start;
+}
+```
+
+### Progress file format
+
+The loop maintains `/home/dev/projects/codeiq/.claude/oom-fix-progress.md` (gitignored; the `.claude/` directory is already in `.gitignore`):
+
+```markdown
+# OOM Fix Progress
+
+Started: 2026-05-13
+Plan: docs/superpowers/plans/2026-05-13-enrich-oom-fix.md
+
+## Phase A — Quick wins
+- [x] Task A1: Parse once per file in LanguageEnricher — PR #144 merged 2026-05-13
+- [ ] Task A2: Bounded goroutine pool — in_progress, branch perf/enricher-bounded-pool
+- [ ] Task A3: Cap Kuzu BufferPoolSize + CALL threads
+- [ ] Task A4: Free GraphBuilder maps after Snapshot
+
+## Phase B — TreeCursor migration
+- [ ] Task B1: Rewrite parser.Walk to use TreeCursor
+
+## Phase C — Streaming three-pass enrich
+- [ ] Task C1: Define streaming interfaces (lands with C2)
+- [ ] Task C2: Implement Pass 1 — Index build
+- [ ] Task C3: Pass 2 — Linkers against compact index
+- [ ] Task C4: Pass 3 — Streaming load with bounded-batch BulkLoad
+- [ ] Task C5: Cut over enrich.go to three-pass
+
+## Phase D — Verification harness
+- [ ] Task D1: Memory regression test in CI
+- [ ] Task D2: Real-world acceptance run
+
+## Blockers
+(none currently — update if a Step fails twice)
+
+## Acceptance checklist (§ Completion promise)
+- [ ] All Tasks complete
+- [ ] 4 phase PRs merged
+- [ ] /usr/bin/time -v codeiq enrich ~/projects/ < 4 GiB peak RSS
+- [ ] codeiq stats ~/projects/ returns non-empty graph
+- [ ] go test ./... -count=1 -race green on main
+- [ ] perf-gate CI green on main
+```
+
+### Inter-task / inter-phase semantics
+
+- **Inside a phase**: tasks land as separate commits on ONE phase branch. The loop iterates Steps within a Task to completion, then moves to the next Task on the same branch.
+- **At phase end**: open ONE PR per phase (4 PRs total: `perf/enrich-oom-phase-a`, `…-phase-b`, `…-phase-c`, `…-phase-d`). The PR body lists all Tasks landed and the measured improvement.
+- **Phase gating**: do not start Phase N+1 until Phase N's PR is merged. The loop polls `gh pr view --json state` between iterations; if state ≠ MERGED, the loop's next iteration just re-checks (idempotent). User merges at their cadence; loop does not auto-merge.
+- **Worktree isolation**: each phase runs in its own worktree (`EnterWorktree name=enrich-oom-phase-X`). Worktree torn down after the phase PR merges.
+
+### Failure / blocker handling
+
+- Steps must verify themselves (tests + smoke). If a Step fails twice in a row, the loop DOES NOT retry a third time. It marks the Task `blocked`, writes the diagnosis to `## Blockers` in the progress file, and moves to the next undone Task (cross-phase if needed). At loop wake-up, blockers surface for human review.
+- Some Tasks have hard dependencies (e.g. C3 depends on C2). If C2 is blocked, C3 stays pending; loop moves on to other phases' tasks.
+- No silent skips: every `blocked` is documented with file paths, error messages, and what was tried.
+
+### Stop conditions
+
+The loop stops ONLY when `OOM FIXED on ~/projects/` is emitted, which requires all 6 acceptance criteria in §"Completion promise" to hold. There is no iteration cap; the loop continues until done OR a hard blocker requires human intervention (in which case it surfaces the blocker via a clearly-marked progress-file entry and waits for the next wake-up).
+
+---
+
+**Evidence base (research that informed this plan):**
+
+- Empirical pprof on airflow (9,151 Py files): 91% of all allocations come from `tree-sitter.(*Tree).cachedNode`, with peak RSS 3.8 GB driven by transient parse-storm churn, not retention. inuse_space post-GC is only 4.5 MB — Go GC keeps up at small scale but loses at ~/projects scale.
+- Trajectory: istio (5.2k files / 1.1 GB peak) → airflow (9.1k / 3.8 GB) → ~/projects extrapolation 9-15 GB → OOM on 15 GB host.
+- Code-walk: `enrich.go:68` never nils the `GraphBuilder`, so its dedup maps (~280 MB) coexist with the snapshot slices for the whole pipeline. `enrich.go:69-70` slices grow through every stage and are never chunked.
+- Kuzu: `SystemConfig.DefaultSystemConfig()` allocates 80% of system RAM as buffer pool. Kuzu has no streaming/Appender API in v0.7.1 through v0.11.3 (issue #2739 still open). String-PK hash index for COPY FROM is not buffer-pool tracked and cannot spill (issue #4937).
+- ETL patterns: ID-only dedup (~24 MB for 434k IDs) + compact linker side-table (drops Properties/Annotations, ~35 MB) is the surgical streaming pattern. `extsort` is the fallback for >5M-node scale.
+
+---
+
+## Phase A — Quick wins
+
+Four independent fixes. Each is its own PR. Sequence: A1 → A2 → A3 → A4 (no inter-task dependencies, but the order minimises rebase churn since A1 + A2 both touch `extractor/enricher.go`).
+
+### Task A1: Parse once per file in LanguageEnricher
+
+**Context.** `internal/intelligence/extractor/enricher.go:100-130` spawns one goroutine per source file. Inside, each goroutine iterates the file's nodes and calls `t.ext.Extract(ctx, n)`. Every `Extract()` implementation (`java/extractor.go`, `python/extractor.go`, `typescript/extractor.go`, `golang/extractor.go`) calls `parser.ParseByName(lang, []byte(ctx.Content))` at its top — re-parsing the same file once per node. At ~13 nodes/file on Python this is a 13× over-parse and the dominant allocation driver.
+
+**Fix.** Hoist the parse out of `Extract`. The extractor goroutine parses the file once, then calls a new method `ext.ExtractFromTree(ctx, tree, nodes)` that walks the prebuilt tree to produce edges for all of the file's nodes.
+
+**Files:**
+- Modify: `go/internal/intelligence/extractor/extractor.go` — extend `LanguageExtractor` interface with `ExtractFromTree(ctx Context, tree *sitter.Tree, nodes []*model.CodeNode) []*model.CodeEdge`. Keep the existing `Extract` as a thin wrapper that calls parse + ExtractFromTree, for back-compat in tests.
+- Modify: `go/internal/intelligence/extractor/enricher.go` — in the per-file goroutine, parse once, call `ExtractFromTree` instead of looping `Extract` per node.
+- Modify: `go/internal/intelligence/extractor/{java,python,typescript,golang}/extractor.go` — implement `ExtractFromTree`; refactor `Extract` to wrap it.
+- Modify: `go/internal/intelligence/extractor/{java,python,typescript,golang}/*_test.go` — update tests if they test private parse paths; otherwise no-op.
+
+- [ ] **Step 1: Add `ExtractFromTree` to the interface**
+
+`go/internal/intelligence/extractor/extractor.go`:
+
+```go
+type LanguageExtractor interface {
+ Language() string
+ Extract(ctx Context, node *model.CodeNode) []*model.CodeEdge
+ // ExtractFromTree walks a pre-parsed tree-sitter Tree once and emits
+ // edges for every node in `nodes` belonging to the same file. This
+ // replaces N per-node calls to Extract on the same file (each of which
+ // re-parses) with one call that visits the AST a single time.
+ ExtractFromTree(ctx Context, tree *sitter.Tree, nodes []*model.CodeNode) []*model.CodeEdge
+}
+```
+
+Run `go build ./...` — expect compile errors in the 4 extractor packages until they implement the new method.
+
+- [ ] **Step 2: Implement `ExtractFromTree` in `python/extractor.go`**
+
+Pull the current body of `Extract` (which calls `ParseByName` then walks the tree to find calls) into a helper `walkForCalls(tree, node) []CodeEdge`. New `ExtractFromTree` calls `walkForCalls(tree, n)` once per node, sharing the tree. `Extract` becomes:
+
+```go
+func (e *Extractor) Extract(ctx Context, n *model.CodeNode) []*model.CodeEdge {
+ tree, err := parser.ParseByName("python", []byte(ctx.Content))
+ if err != nil { return nil }
+ defer tree.Close()
+ return e.ExtractFromTree(ctx, tree, []*model.CodeNode{n})
+}
+```
+
+- [ ] **Step 3: Run python extractor tests**
+
+`cd go && CGO_ENABLED=1 go test ./internal/intelligence/extractor/python/... -count=1 -v`
+
+Tests must still pass with identical output.
+
+- [ ] **Step 4: Repeat Steps 2-3 for java, typescript, golang**
+
+- [ ] **Step 5: Update `enricher.go` to call `ExtractFromTree`**
+
+Replace the per-node loop in `enricher.go:97-130` with:
+
+```go
+go func(i int, t task) {
+ defer wg.Done()
+ raw, err := os.ReadFile(t.file)
+ if err != nil { return }
+ ctx := buildContext(t, raw)
+ tree, err := parser.ParseByName(t.ext.Language(), raw)
+ if err != nil { return }
+ defer tree.Close()
+ out[i] = t.ext.ExtractFromTree(ctx, tree, t.nodes)
+}(i, t)
+```
+
+- [ ] **Step 6: Full enrich-related test pass**
+
+```
+cd go && CGO_ENABLED=1 go test ./internal/intelligence/extractor/... ./internal/analyzer/... -count=1
+```
+
+- [ ] **Step 7: Benchmark before/after on fixture-multi-lang**
+
+Build, then time the enrich. Record peak RSS via `/usr/bin/time -v`. Expect: allocations down ~13×, peak RSS down meaningfully but smaller than the gain from Task A2.
+
+- [ ] **Step 8: Commit + PR**
+
+```
+git checkout -b perf/enrich-parse-once-per-file
+git commit -m "perf(enricher): parse tree-sitter tree once per file, not per node
+
+Each LanguageExtractor.Extract reparsed the source file at its top —
+on Python at ~13 nodes/file that meant 13x over-parse. pprof on
+airflow showed 91% of total allocations from tree-sitter.Tree.cachedNode.
+
+Adds ExtractFromTree(ctx, tree, nodes []*CodeNode) []*CodeEdge to the
+LanguageExtractor interface; enricher goroutines now parse once and
+walk the shared tree for every node in that file."
+```
+
+---
+
+### Task A2: Bounded goroutine pool in LanguageEnricher
+
+**Context.** `enricher.go:97-130` spawns one goroutine per source file unbounded. On airflow's 7,456 Python files that's 7,456 concurrent live trees + file content strings. Peak RSS spikes when many of these are live simultaneously.
+
+**Fix.** Replace the bare fan-out with a semaphore-bounded pool sized to `2 * runtime.GOMAXPROCS(0)`. Preserves determinism because results are still written to `out[i]` indexed by task slot.
+
+**Files:**
+- Modify: `go/internal/intelligence/extractor/enricher.go` — add semaphore channel; acquire before `go func`, release at goroutine end.
+- Modify: `go/internal/intelligence/extractor/enricher_test.go` — add a test asserting concurrency cap (count concurrent goroutines via a runtime counter).
+
+- [ ] **Step 1: Write the test first**
+
+```go
+func TestEnricherBoundedConcurrency(t *testing.T) {
+ var inFlight, maxInFlight int32
+ // Drive enricher with N=200 fake tasks that each sleep so we can
+ // observe peak concurrency. Assert max <= 2 * GOMAXPROCS.
+ ...
+ cap := int32(2 * runtime.GOMAXPROCS(0))
+ if maxInFlight > cap {
+ t.Fatalf("peak concurrent goroutines = %d, want <= %d", maxInFlight, cap)
+ }
+}
+```
+
+- [ ] **Step 2: Run test — watch it fail**
+
+- [ ] **Step 3: Implement the semaphore**
+
+```go
+sem := make(chan struct{}, 2*runtime.GOMAXPROCS(0))
+for i, t := range tasks {
+ wg.Add(1)
+ sem <- struct{}{}
+ go func(i int, t task) {
+ defer wg.Done()
+ defer func() { <-sem }()
+ // existing body
+ }(i, t)
+}
+```
+
+- [ ] **Step 4: Watch test pass**
+
+- [ ] **Step 5: Full extractor test pass**
+
+- [ ] **Step 6: Benchmark on airflow (or proxy)**
+
+Time + RSS before vs after. Expected: similar wall time (already bounded by CPU); peak RSS materially down because fewer trees live simultaneously.
+
+- [ ] **Step 7: Commit + PR**
+
+```
+perf(enricher): bound LanguageEnricher goroutine pool to 2 * GOMAXPROCS
+
+Previously the enricher spawned one goroutine per source file with no
+cap. On polyglot Python repos (airflow: 7,456 files) that produced
+7k+ concurrent live tree-sitter Trees + file content strings, driving
+the OOM-prone RSS spike. Bounded semaphore preserves determinism
+(results still indexed by task slot) at no measurable wall-time cost.
+```
+
+---
+
+### Task A3: Cap Kuzu BufferPoolSize + CALL threads
+
+**Context.** `internal/graph/store.go` opens Kuzu with `kuzu.DefaultSystemConfig()` which allocates 80% of system RAM as the buffer pool. On a 15 GiB host that's ~12 GiB reserved by Kuzu before any Go enrich work starts. Plus: COPY FROM parallelism on string-keyed node tables is the worst case (issue #4937 — primary key hash index is not buffer-pool-tracked, cannot spill).
+
+**Fix.**
+1. Expose a `--max-buffer-pool` flag (and config field) that defaults to `min(2 GiB, 25% of system RAM)`. Pass it via `SystemConfig.BufferPoolSize` when opening Kuzu.
+2. Before the first `BulkLoadNodes` COPY, issue `CALL threads = N` where N defaults to `min(4, GOMAXPROCS)`. Lowers Kuzu's COPY parallelism, capping its working set proportionally.
+
+**Files:**
+- Modify: `go/internal/graph/store.go` — change `Open()` and `OpenReadOnly()` to accept a `StoreOptions` struct (or extend the existing one) with `BufferPoolBytes int64`. Default if unset: 2 GiB.
+- Modify: `go/internal/cli/enrich.go` (or root.go) — add a `--max-buffer-pool` flag. Wire through to store.Open.
+- Modify: `go/internal/graph/bulk.go` — before the first COPY, issue `CALL threads = ?` if configured.
+- Modify: `codeiq.yml` example + `internal/config/*.go` — surface the option.
+
+- [ ] **Step 1: Add BufferPoolBytes to store options**
+
+```go
+type StoreOptions struct {
+ Path string
+ BufferPoolBytes int64 // 0 = use default 2 GiB
+ CopyThreads int // 0 = use default min(4, GOMAXPROCS)
+}
+```
+
+- [ ] **Step 2: Apply in graph.Open()**
+
+```go
+cfg := kuzu.DefaultSystemConfig()
+if opts.BufferPoolBytes > 0 {
+ cfg.BufferPoolSize = uint64(opts.BufferPoolBytes)
+} else {
+ cfg.BufferPoolSize = 2 << 30 // 2 GiB
+}
+```
+
+- [ ] **Step 3: Apply CALL threads in BulkLoadNodes** (`bulk.go`)
+
+Before the first COPY:
+
+```go
+if s.copyThreads > 0 {
+ if _, err := s.Cypher(fmt.Sprintf("CALL threads = %d", s.copyThreads)); err != nil {
+ return fmt.Errorf("graph: set copy threads: %w", err)
+ }
+}
+```
+
+- [ ] **Step 4: Wire CLI flag**
+
+In `enrich.go` cobra command:
+
+```go
+cmd.Flags().Int64Var(&maxBufferPool, "max-buffer-pool", 0, "Max Kuzu buffer pool in bytes (default: 2 GiB).")
+cmd.Flags().IntVar(©Threads, "copy-threads", 0, "Threads for Kuzu COPY FROM (default: min(4, GOMAXPROCS)).")
+```
+
+- [ ] **Step 5: Test**
+
+```
+cd go && CGO_ENABLED=1 go test ./internal/graph/... ./internal/cli/... -count=1
+```
+
+- [ ] **Step 6: Smoke with explicit cap**
+
+```
+/tmp/codeiq enrich /tmp/bench-fixture --max-buffer-pool=$((512*1024*1024)) --copy-threads=2
+```
+
+Expected: stats output identical to default.
+
+- [ ] **Step 7: Commit + PR**
+
+```
+perf(graph): cap Kuzu BufferPoolSize and COPY threads
+
+kuzu.DefaultSystemConfig() allocates 80% of system RAM as buffer pool
+(~12 GiB on a 15 GiB host) before any enrich work runs, leaving
+insufficient headroom for Go-side enrichment. Cap at 2 GiB by default;
+expose --max-buffer-pool and --copy-threads CLI flags for tuning.
+```
+
+---
+
+### Task A4: Free GraphBuilder maps after Snapshot
+
+**Context.** `enrich.go:68` calls `snap := builder.Snapshot()` but never releases `builder`. The two dedup maps (`builder.nodes`, `builder.edges`) hold ~280 MB of references to the same `*CodeNode` / `*CodeEdge` objects that the Snapshot slices now hold. They coexist for the entire pipeline lifespan.
+
+**Fix.** Set `builder = nil` immediately after Snapshot returns. Optionally, modify `GraphBuilder.Snapshot()` to clear its internal maps so the builder is reusable but doesn't retain references.
+
+**Files:**
+- Modify: `go/internal/analyzer/graph_builder.go` — `Snapshot()` clears `b.nodes = nil; b.edges = nil` at the end (after copying out into slices).
+- Modify: `go/internal/analyzer/graph_builder_test.go` — add a determinism test that exercises Snapshot twice; document that the second call returns an empty snapshot now (or change the semantics to error on reuse).
+
+Decision point: do we want `Snapshot` to be idempotent (return same snapshot on repeated calls) or single-shot (clear after extraction)? Single-shot is simpler. Tests should confirm the new semantics.
+
+- [ ] **Step 1: Change Snapshot to clear**
+
+```go
+func (b *GraphBuilder) Snapshot() *Snapshot {
+ snap := &Snapshot{
+ Nodes: sortedNodesByID(b.nodes),
+ Edges: sortedEdgesByID(b.edges),
+ Stats: b.Stats(),
+ }
+ // Release dedup maps so the holders can be GC'd before the
+ // downstream enricher pipeline runs.
+ b.nodes = nil
+ b.edges = nil
+ return snap
+}
+```
+
+- [ ] **Step 2: Run GraphBuilder tests**
+
+`cd go && CGO_ENABLED=1 go test ./internal/analyzer/... -count=1`
+
+Add a test:
+
+```go
+func TestSnapshotReleasesMaps(t *testing.T) {
+ b := NewGraphBuilder()
+ b.Add(&detector.Result{Nodes: []*model.CodeNode{{ID: "x"}}})
+ _ = b.Snapshot()
+ if b.nodes != nil || b.edges != nil {
+ t.Fatal("Snapshot must nil maps to allow GC")
+ }
+}
+```
+
+- [ ] **Step 3: Commit + PR**
+
+```
+perf(graph_builder): release dedup maps after Snapshot
+
+GraphBuilder.Snapshot extracts deduped nodes/edges into sorted slices
+but the internal map[string]*CodeNode / map[edgeKey]*CodeEdge held
+references to the same objects, doubling peak retained memory across
+the enrich pipeline (~280 MB on ~/projects scale).
+
+Clear the maps inside Snapshot so the next allocation can collect
+them. Snapshot is now single-shot; documented in code.
+```
+
+---
+
+## Phase A success criterion
+
+After A1-A4 merged, re-run the airflow enrich:
+
+```
+/usr/bin/time -v codeiq enrich ~/projects/polyglot-bench/airflow
+```
+
+Expected: peak RSS drops from 3.8 GB to ~600-800 MB. ~/projects-scale extrapolation drops from 9-15 GB to ~2-4 GB. If this is enough to clear ~/projects without OOM, we can ship Phases B + C as polish; if not, they're load-bearing.
+
+---
+
+## Phase B — TreeCursor migration
+
+### Task B1: Rewrite parser.Walk to use TreeCursor
+
+**Context.** `parser/walk.go:22-32` does `n.Child(i)` recursion. Each `Child()` call routes through `(*Tree).cachedNode` which heap-allocates a `*Node` on first visit and caches it. With ~12 nodes/file and 7k+ files, that's ~84k+ live `*Node` allocations per parse. pprof: 91% of all allocations in the airflow run.
+
+`tree-sitter`'s `TreeCursor` (`bindings.go:602`) traverses without per-node `*Node` allocation. The cursor itself is the only allocation; it reuses a single internal `Node` view across traversal.
+
+**Fix.** Rewrite `parser.Walk` to use TreeCursor. Public function signature stays identical; callers don't change.
+
+**Files:**
+- Modify: `go/internal/parser/walk.go` — rewrite `Walk(node *sitter.Node, fn func(*sitter.Node) bool)`.
+- Verify: every caller in `internal/intelligence/extractor/*` still works.
+
+- [ ] **Step 1: Read tree-sitter cursor API in `~/go/pkg/mod/github.com/smacker/go-tree-sitter@*/bindings.go`**
+
+Methods: `NewTreeCursor(node) *TreeCursor`, `GotoFirstChild() bool`, `GotoNextSibling() bool`, `GotoParent() bool`, `CurrentNode() *Node`, `CurrentFieldName() string`, `Close()`.
+
+- [ ] **Step 2: Rewrite Walk**
+
+```go
+// Walk visits every descendant of `root` in pre-order DFS using a
+// TreeCursor — no per-node *Node heap allocation. Visitor returns
+// false to skip descending into a node's children.
+func Walk(root *sitter.Node, fn func(*sitter.Node) bool) {
+ if root == nil { return }
+ cur := sitter.NewTreeCursor(root)
+ defer cur.Close()
+ descend := fn(cur.CurrentNode())
+ for {
+ if descend && cur.GoToFirstChild() {
+ descend = fn(cur.CurrentNode())
+ continue
+ }
+ for {
+ if cur.GoToNextSibling() {
+ descend = fn(cur.CurrentNode())
+ break
+ }
+ if !cur.GoToParent() { return }
+ }
+ }
+}
+```
+
+- [ ] **Step 3: Run parser tests + every extractor test**
+
+```
+cd go && CGO_ENABLED=1 go test ./internal/parser/... ./internal/intelligence/extractor/... -count=1
+```
+
+Tests must pass without modification — the public Walk API is unchanged.
+
+- [ ] **Step 4: Determinism check**
+
+Run the analyzer twice on fixture-minimal; assert byte-identical Kuzu output.
+
+- [ ] **Step 5: Benchmark**
+
+Allocations should drop 90%+ on airflow.
+
+- [ ] **Step 6: Commit + PR**
+
+```
+perf(parser): use TreeCursor instead of recursive Node.Child traversal
+
+(*Tree).cachedNode was responsible for 91% of total allocations
+during enrich on a polyglot Python repo (airflow run, pprof
+alloc_space top). Each Node.Child(i) call heap-allocated a *Node and
+cached it on the Tree. TreeCursor traverses the same tree without
+per-node *Node allocation.
+
+Public parser.Walk signature unchanged; callers are unmodified.
+```
+
+---
+
+## Phase C — Streaming three-pass enrich
+
+This is the architectural fix. After it lands, the enrich pipeline never materialises the full graph in Go memory. Memory is bounded by configurable batch size + a compact ID/metadata index, both well under 100 MB at ~/projects scale and gracefully scalable to 10M+ nodes.
+
+### Task C1: Define the streaming interfaces
+
+**Context.** Today the enrich pipeline passes `nodes []*model.CodeNode` and `edges []*model.CodeEdge` slices between stages. To stream, we need an iterator/channel-based contract.
+
+**Files:**
+- Create: `go/internal/analyzer/stream.go` — defines `NodeStream`, `EdgeStream`, `NodeIndex` types.
+
+```go
+// NodeStream emits nodes in the order they were ingested from the
+// SQLite cache; deduplicated by ID via a Set held by the
+// stream's source. Implementations must be safe for one consumer.
+type NodeStream interface {
+ Next() (*model.CodeNode, error) // io.EOF when exhausted
+ Close() error
+}
+
+// EdgeStream — symmetric, for edges.
+type EdgeStream interface {
+ Next() (*model.CodeEdge, error)
+ Close() error
+}
+
+// NodeIndex is a compact in-memory index of every node's id + the
+// small set of fields cross-stage enrichers actually need: Kind,
+// Label, FQN, FilePath, Module. Properties and Annotations are
+// omitted. Memory cost at 434k nodes: ~35 MB.
+type NodeIndex interface {
+ Lookup(id string) (CompactNode, bool)
+ LookupByFQN(fqn string) (CompactNode, bool)
+ Len() int
+ Range(fn func(CompactNode) bool) // visit in stable order
+}
+
+type CompactNode struct {
+ ID string
+ Kind model.NodeKind
+ Label string
+ FQN string
+ FilePath string
+ Module string
+ Layer model.Layer
+}
+```
+
+- [ ] **Step 1: Write the type definitions**
+
+- [ ] **Step 2: Add unit tests for the basic shape**
+
+Mock implementations + a smoke test that round-trips a small slice through a `sliceNodeStream`.
+
+- [ ] **Step 3: Commit (no PR yet — this is groundwork for C2)**
+
+This task lands together with C2 in a single PR for review coherence.
+
+### Task C2: Implement Pass 1 — Index build from cache
+
+**Goal.** Stream the SQLite cache once, building a `NodeIndex` (compact) and computing per-node Layer (since LayerClassifier is stateless and can run during the index pass). Memory budget: ~35 MB for the index + one cache row at a time.
+
+**Files:**
+- Modify: `go/internal/analyzer/enrich.go` — split `Enrich()` into `Enrich(opts)` orchestrating three passes.
+- Create: `go/internal/analyzer/pass1_index.go` — `BuildIndex(c *cache.Cache, root string) (NodeIndex, error)`.
+
+Pass 1 logic:
+1. Iterate cache entries via `c.IterateAll` — one entry at a time.
+2. For each cached node: compute Layer (call LayerClassifier inline), build `CompactNode`, insert into the index.
+3. Detect duplicates by ID via the index's internal map (acts as the ID set).
+4. Return the populated index. The cache stays on disk; the full node payloads are NOT held in memory.
+
+- [ ] **Step 1: Implement BuildIndex** with TDD
+
+Test on fixture-minimal: builds an index with the expected node count + compact-field contents.
+
+- [ ] **Step 2: Wire it into enrich.go**
+
+Replace `builder := NewGraphBuilder(); ...; snap := builder.Snapshot()` with `index, err := BuildIndex(c, root)`. The old `nodes, edges` locals are gone.
+
+- [ ] **Step 3: Run all enrich tests — they will fail**
+
+That's expected. The next tasks rewire each stage.
+
+### Task C3: Pass 2 — Linkers against the compact index
+
+**Goal.** Reformat the three linkers to operate on `NodeIndex` and emit new nodes/edges to a channel/buffer instead of mutating slices.
+
+**Files:**
+- Modify: `go/internal/analyzer/linker/topic_linker.go` — `Link(idx NodeIndex, emit func(detector.Result))`.
+- Modify: `go/internal/analyzer/linker/entity_linker.go` — same shape.
+- Modify: `go/internal/analyzer/linker/module_containment_linker.go` — same.
+
+Each linker internally:
+- Iterates `idx.Range(...)` for whatever cross-referencing it does.
+- Emits new `CodeNode`/`CodeEdge` records by calling `emit(...)`.
+- Holds only its own small internal state (e.g. `byModule map[string][]CompactNode`).
+
+- [ ] **Step 1-3 per linker** — TDD-style, write the new signature, port the body, update tests.
+
+- [ ] **Step 4: Smoke against fixture-multi-lang** — linker output diff vs baseline must be zero.
+
+### Task C4: Pass 3 — Streaming load with bounded-batch BulkLoad
+
+**Goal.** Stream the SQLite cache a SECOND time. For each batch of `batchSize` nodes (default 5000):
+1. Read the full payload from cache.
+2. Apply LexicalEnricher (per-file, releases file content after each).
+3. Apply LanguageEnricher (with parse-once-per-file from A1 + bounded pool from A2).
+4. Append linker output appropriate to nodes in this batch.
+5. Hand off to a write goroutine via a bounded channel; the write goroutine calls `BulkLoadNodes(batch)` (already chunked in PR #143).
+6. Release the batch slice — let GC collect.
+
+Memory budget: `batchSize` nodes (~2.5 MB at 5000 nodes × 500 bytes) + the NodeIndex (~35 MB) + write-channel buffer.
+
+**Files:**
+- Create: `go/internal/analyzer/pass3_load.go` — `StreamLoad(c *cache.Cache, idx NodeIndex, linkerOutputs []detector.Result, store *graph.Store, opts) error`.
+- Modify: `go/internal/intelligence/lexical/enricher.go` — operate on per-batch nodes; don't require full-set.
+- Modify: `go/internal/intelligence/extractor/enricher.go` — operate on per-batch nodes.
+
+The ServiceDetector is tricky because it currently walks the filesystem AND stamps every node. Two options:
+- (a) Run it in Pass 1 (before the index is finalised) so it appears in the index and downstream tasks see service mappings.
+- (b) Run it in Pass 3 batch by batch using the index for cross-references.
+
+Recommendation: (a). ServiceDetector's filesystem walk is fast (it doesn't iterate nodes — it walks for build files) and the per-node stamping is fast given the index is in memory.
+
+- [ ] **Step 1: ServiceDetector refactor to run in Pass 1**
+
+- [ ] **Step 2: LexicalEnricher per-batch refactor**
+
+- [ ] **Step 3: LanguageEnricher per-batch refactor** (depends on A1's `ExtractFromTree`)
+
+- [ ] **Step 4: Implement StreamLoad** with bounded write channel
+
+- [ ] **Step 5: Determinism test** — run enrich twice on fixture-multi-lang, diff Kuzu output (use the kuzu_dump utility in parity/).
+
+- [ ] **Step 6: Wall-time + memory benchmark on airflow**
+
+### Task C5: Cut over `enrich.go` to the three-pass pipeline
+
+Replace the old `Enrich()` body with:
+
+```go
+func Enrich(root string, c *cache.Cache, opts EnrichOptions) (EnrichSummary, error) {
+ // Pass 1: build compact index + run ServiceDetector + LayerClassifier
+ index, services, err := BuildIndex(c, root)
+ if err != nil { return EnrichSummary{}, fmt.Errorf("pass1: %w", err) }
+
+ // Pass 2: linkers emit new edges/nodes against the compact index
+ linkerOut, err := RunLinkers(index)
+ if err != nil { return EnrichSummary{}, fmt.Errorf("pass2: %w", err) }
+
+ // Pass 3: stream cache through enrichers in batches, bulk-load to Kuzu
+ store, err := graph.Open(opts.GraphDir, opts.StoreOptions)
+ if err != nil { return EnrichSummary{}, fmt.Errorf("open graph: %w", err) }
+ defer store.Close()
+ if err := store.ApplySchema(); err != nil { return EnrichSummary{}, err }
+
+ summary, err := StreamLoad(c, index, services, linkerOut, store, opts)
+ if err != nil { return EnrichSummary{}, fmt.Errorf("pass3: %w", err) }
+
+ if err := store.CreateIndexes(); err != nil { return EnrichSummary{}, err }
+ return summary, nil
+}
+```
+
+- [ ] **Step 1: Replace enrich.go body**
+
+- [ ] **Step 2: Full test suite**
+
+`cd go && CGO_ENABLED=1 go test ./... -count=1 -race`
+
+- [ ] **Step 3: Determinism diff against pre-cutover output**
+
+Index + enrich fixture-multi-lang on both branches; diff Kuzu graph contents via `parity/kuzu_dump`. Output must be byte-identical.
+
+- [ ] **Step 4: Commit + PR — single big PR for Phase C**
+
+The streaming refactor lands as one PR because the moving parts (index, linkers, load) are interlocked. Reviewers see the full picture once.
+
+```
+perf(enrich): streaming three-pass pipeline for memory-bounded enrich
+
+Pass 1 (index): stream SQLite cache, build compact NodeIndex (~35 MB
+at 434k-node scale; drops Properties + Annotations), run
+ServiceDetector + LayerClassifier in-pass.
+
+Pass 2 (linkers): TopicLinker, EntityLinker, ModuleContainmentLinker
+operate on NodeIndex; emit new nodes/edges to a buffer.
+
+Pass 3 (load): stream cache a second time in batches of 5000 nodes.
+LexicalEnricher + LanguageEnricher apply per batch. Each batch is
+BulkLoaded to Kuzu and released before the next starts.
+
+Memory profile: NodeIndex (35 MB) + one batch (2.5 MB) + linker
+output buffer + Kuzu buffer pool cap (2 GiB from task A3). Total
+peak ~2.1 GiB regardless of input size.
+
+Determinism: Kuzu output byte-identical to pre-cutover on
+fixture-multi-lang (verified via parity/kuzu_dump).
+```
+
+---
+
+## Phase D — Verification harness
+
+### Task D1: Memory regression test in CI
+
+**Goal.** Lock in the gain. Add a CI check that runs `enrich` on a representative target and asserts peak RSS < threshold.
+
+**Files:**
+- Create: `go/internal/analyzer/bench/memory_test.go` (build tag `bench`).
+- Modify: `.github/workflows/perf-gate.yml` — invoke the bench, parse output, fail if RSS > 1 GiB on the test fixture.
+
+The test:
+1. Builds the binary
+2. Runs `index` then `enrich` on a fixture
+3. Captures peak RSS via `/usr/bin/time -v` parsing OR `golang.org/x/sys/unix.Rusage`
+4. Asserts peak < threshold
+
+- [ ] **Step 1: Pick a fixture** — fixture-multi-lang is too small (peak ~50 MB). We need something Python-heavy to exercise the parse-storm path. Add a `fixture-python-heavy/` to testdata: ~500 synthetic Python files generated programmatically. ~5 MB on disk, ~10k AST nodes total.
+
+- [ ] **Step 2: Memory bench harness**
+
+- [ ] **Step 3: CI integration**
+
+Add to perf-gate.yml; threshold: peak RSS < 200 MB on the new fixture. (Tunable; pick after baselining.)
+
+### Task D2: Real-world acceptance run
+
+**Goal.** End-to-end confirmation on ~/projects.
+
+- [ ] **Step 1: Index + enrich + stats on ~/projects/polyglot-bench (~22k files)**
+
+```
+codeiq index ~/projects/polyglot-bench
+codeiq enrich ~/projects/polyglot-bench
+codeiq stats ~/projects/polyglot-bench
+```
+
+Capture peak RSS via `/usr/bin/time -v`. Target: < 2.5 GB.
+
+- [ ] **Step 2: Index + enrich on ~/projects (49k files)**
+
+Same dance. Target: < 4 GB peak. No exit 137. Stats output usable.
+
+- [ ] **Step 3: Document the result**
+
+Add a section to PROJECT_SUMMARY.md or CLAUDE.md noting the new scale-tested target.
+
+---
+
+## Out of scope
+
+- **Duplicate-PK service IDs** (`service:checkbox`, `service:src`) — distinct bug. `service_detector.go:168` builds ID as `"service:" + name`; needs path-qualification. Separate PR.
+- **CSV escape bug in BulkLoadEdges** — JSON properties with commas break Kuzu COPY FROM. Fix candidate: switch the delimiter to `\x1F` (unit separator) or pre-escape. Separate PR.
+- **Kuzu version upgrade** — v0.7.1 → v0.11.x. Worth doing for unrelated reasons but doesn't fix this OOM.
+- **Distributed enrich** — splitting enrich across processes. Premature; revisit at 100M-node scale.
+
+---
+
+## Risk register
+
+| Risk | Mitigation |
+|---|---|
+| Determinism drift between pre/post streaming refactor | After every Phase C task, run `parity/kuzu_dump` diff against baseline on fixture-multi-lang. Block the merge if any diff. |
+| TreeCursor semantics differ subtly from recursive Walk (e.g. order of visitation) | Phase B test pass against every extractor must show identical edge emission. Add a property-based test that compares Walk-old vs Walk-new on synthetic ASTs. |
+| Phase A3 cap of 2 GiB buffer pool starves Kuzu on large reads downstream | The cap is configurable (`--max-buffer-pool`). Default sized for typical workstation; users with more RAM raise the cap. |
+| ServiceDetector run in Pass 1 emits before all nodes seen | ServiceDetector's filesystem walk is independent of node iteration; the per-node stamping happens *after* the walk completes and the full module map is built. Verify with an audit during Step C4.1. |
+| Phase C is one big PR; review burden high | Split into stacked PRs: C1+C2 (index), C3 (linkers), C4 (load), C5 (cutover). Each is independently reviewable. |
+
+---
+
+## Verification checklist (run before declaring the OOM bar met)
+
+- [ ] All Phase A PRs merged. `go test ./... -count=1` green on main.
+- [ ] Phase B PR merged. parser.Walk uses TreeCursor; determinism diff zero on fixture-multi-lang.
+- [ ] Phase C PRs merged. enrich.go is the three-pass orchestrator. `go test ./... -count=1 -race` green.
+- [ ] Phase D bench test added to CI; passes on every PR.
+- [ ] **Real-world acceptance**: `codeiq enrich ~/projects/` completes successfully with peak RSS < 4 GiB. No exit 137. Stats output usable.
+- [ ] CLAUDE.md / PROJECT_SUMMARY.md updated noting the new memory profile + scale ceiling.
+- [ ] Kuzu graph output byte-identical (or documented-different) between pre-refactor and post-refactor on fixture-multi-lang.