perf(enrich): Phase A quick wins for OOM fix #145

Merged
aksOps merged 5 commits into main from perf/enrich-oom-phase-a
May 13, 2026

@aksOps
Contributor

@aksOps aksOps commented May 13, 2026

Summary

Phase A of the enrich OOM fix plan (docs/superpowers/plans/2026-05-13-enrich-oom-fix.md). Four surgical fixes that target the actual hot spots pprof exposed on a real-world polyglot Python target (airflow, 9,151 files → 3.8 GB peak RSS; trajectory extrapolates to OOM at ~/projects/ scale).

Tasks landed

| Task | What | Commit |
|---|---|---|
| A1 | Parse the tree-sitter tree once per file, not once per node. Adds `ExtractFromTree(ctx, tree, nodes) []Result` to `LanguageExtractor`; all 4 language extractors implement it; enricher.go parses once per file. Cuts the 91% `tree-sitter.(*Tree).cachedNode` hot spot pprof flagged. | `60d02d9` |
| A2 | Bound the enricher goroutine pool to `2 * GOMAXPROCS`. Caps simultaneously live trees + file content strings. New test `TestEnricher_BoundedConcurrency` drives 4×cap files through a tracking extractor; asserts peak in-flight ≤ cap. | `21f07d8` |
| A3 | Cap Kuzu `BufferPoolSize` (default 2 GiB) and `MaxNumThreads` (`min(4, GOMAXPROCS)`) via new `OpenOptions` + `OpenWithOptions`. The default `kuzu.DefaultSystemConfig()` reserves 80% of system RAM as buffer pool (~12 GiB on a 15 GiB host). | `e311d99` |
| A4 | `GraphBuilder.Snapshot()` nils its dedup maps before returning. Frees ~280 MB of duplicate references that previously coexisted with the snapshot slices through the rest of the enrich pipeline. New `TestSnapshotReleasesDedupMaps`. | `3170fe3` |

Expected impact

Per the plan's success criterion: `~/projects/` peak RSS should drop from 9-15 GB (OOM-killed with exit code 137) to ~2-4 GB. Real-world verification will run once Phases B and C have also landed.

Test plan

  • `go test ./... -count=1` — 876 pass (one new bounded-concurrency test added on top of 875)
  • `fixture-minimal` index → enrich → stats: identical 45 nodes / 68 edges / 1 service output vs pre-Phase-A
  • `go vet ./...` clean
  • CI on this PR

Next phases

This PR is Phase A of the plan's four phases. Phase B (TreeCursor migration), Phase C (streaming three-pass refactor), and Phase D (perf-gate CI + real-world acceptance) ship as separate PRs after A merges.

aksOps added 5 commits May 13, 2026 12:53
…ask A1)

Each LanguageExtractor.Extract re-parsed the source file at the start of
every call. On Python, at ~13 nodes/file, that meant ~13x over-parsing.
pprof on airflow attributed 91% of total allocations to
tree-sitter.(*Tree).cachedNode, driven by the per-node re-parse storm.

Adds ExtractFromTree(ctx, tree, nodes) []Result to the
LanguageExtractor interface. The orchestrator now parses the file
once and calls ExtractFromTree(tree, allNodes) — the AST is walked
multiple times for distinct node-kinds but never re-parsed. Extract
is retained as a thin wrapper for single-node convenience callers
and tests.

Plan: docs/superpowers/plans/2026-05-13-enrich-oom-fix.md Task A1.

Per-file caches: matchAllList (py), matchInterfaceAssertion (go),
collectExports (ts) are computed once per file rather than once per
matching node.

Verification:
- go test ./internal/intelligence/extractor/... -count=1: 28 pass
- go test ./... -count=1: 875 pass

Previously the enricher spawned one goroutine per source file with no
cap. On polyglot Python repos (airflow: 7,456 files) that produced
7k+ concurrent live tree-sitter Trees + file content strings, driving
the OOM-prone RSS spike pprof exposed.

Adds a semaphore-bounded fan-out at 2*runtime.GOMAXPROCS(0). Tasks
still write to indexed slots, so determinism (sorted file path order)
is preserved. Polyglot real-world targets see materially lower peak
RSS at no measurable wall-time cost.

Plan: docs/superpowers/plans/2026-05-13-enrich-oom-fix.md Task A2.

Verification:
- New TestEnricher_BoundedConcurrency asserts peak in-flight calls
  <= 2*GOMAXPROCS by driving 4*cap files through a tracking extractor.
- go test ./... -count=1: 876 pass.
…sk A3)

kuzu.DefaultSystemConfig() allocates 80% of system RAM as the buffer
pool (~12 GiB on a 15 GiB host) before any enrich work runs. Combined
with Go-side enricher memory that's enough to OOM the process. The
default also allocates a full GOMAXPROCS worth of internal threads,
amplifying the COPY-side working set.

Adds OpenOptions struct + OpenWithOptions(path, opts). Open(path)
now applies safe defaults via OpenWithOptions(path, OpenOptions{}):
- BufferPoolBytes: 2 GiB (DefaultBufferPoolBytes)
- MaxThreads: min(4, GOMAXPROCS)
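A sketch of the zero-value defaulting that lets `Open(path)` delegate to `OpenWithOptions(path, OpenOptions{})`. The field names and `DefaultBufferPoolBytes` come from this commit message; the `resolve` helper and the rule that explicit non-zero values win are assumptions. The Kuzu binding is deliberately not imported: in the real code the resolved values would presumably be copied into the `BufferPoolSize` and `MaxNumThreads` fields of the config from `kuzu.DefaultSystemConfig()`. `uncappedPool` only reproduces the 80%-of-RAM arithmetic described above.

```go
package main

import (
	"fmt"
	"runtime"
)

// DefaultBufferPoolBytes is the 2 GiB cap named in the PR.
const DefaultBufferPoolBytes uint64 = 2 << 30

// OpenOptions mirrors the two knobs listed in the commit message.
type OpenOptions struct {
	BufferPoolBytes uint64
	MaxThreads      uint64
}

// resolve (hypothetical helper) fills zero-valued fields with the Phase A
// defaults; treating non-zero fields as explicit overrides is an assumption.
func (o OpenOptions) resolve() OpenOptions {
	if o.BufferPoolBytes == 0 {
		o.BufferPoolBytes = DefaultBufferPoolBytes
	}
	if o.MaxThreads == 0 {
		n := runtime.GOMAXPROCS(0)
		if n > 4 {
			n = 4 // min(4, GOMAXPROCS)
		}
		o.MaxThreads = uint64(n)
	}
	return o
}

// uncappedPool reproduces the 80%-of-RAM sizing the PR attributes to
// kuzu.DefaultSystemConfig().
func uncappedPool(totalRAM uint64) uint64 {
	return totalRAM / 10 * 8
}

func main() {
	o := OpenOptions{}.resolve()
	fmt.Printf("capped pool: %d GiB, threads: %d\n", o.BufferPoolBytes>>30, o.MaxThreads)
	fmt.Printf("uncapped pool on a 15 GiB host: %d GiB\n", uncappedPool(15<<30)>>30)
}
```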

OpenReadOnly is unchanged externally (same signature) but routes
through OpenWithOptions internally — read paths inherit the same
buffer pool cap (2 GiB is plenty for read-side caching at our graph
scale).

Plan: docs/superpowers/plans/2026-05-13-enrich-oom-fix.md Task A3.
Future polish: surface --max-buffer-pool and --copy-threads CLI flags
for power-user tuning (deferred).

Verification:
- go test ./internal/graph/... -count=1: 44 pass
- go test ./... -count=1: 876 pass

GraphBuilder.Snapshot extracted deduped nodes/edges into sorted slices
but left builder.nodes and builder.edges maps holding references to
the same objects. With the slices and maps coexisting for the rest of
the enrich pipeline (~30 sec wall time on ~/projects/), ~280 MB of
duplicate references stayed live needlessly.

Clear the maps inside Snapshot before returning. Snapshot is now
single-shot — calling it twice on the same builder returns an empty
snapshot (acceptable; the only caller is analyzer.Enrich which calls
once).
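A minimal sketch of a single-shot `Snapshot` that releases its dedup maps. `GraphBuilder`, `Snapshot`, and the `nodes`/`edges` maps are named in this PR; `Node`, `Edge`, `GraphSnapshot`, `NewGraphBuilder`, `AddNode`, `AddEdge`, and the edge key format are hypothetical. Setting the fields to `nil` drops the builder's references, so the maps' bucket storage becomes unreachable and collectable while the node and edge values stay alive through the returned slices.

```go
package main

import (
	"fmt"
	"sort"
)

type Node struct{ ID string }
type Edge struct{ From, To string }

// GraphBuilder dedups nodes and edges by key while the graph is built.
type GraphBuilder struct {
	nodes map[string]*Node
	edges map[string]*Edge
}

func NewGraphBuilder() *GraphBuilder {
	return &GraphBuilder{nodes: map[string]*Node{}, edges: map[string]*Edge{}}
}

// AddNode and AddEdge would panic after Snapshot in this sketch, because
// writing to a nil map panics; the builder is single-use by design.
func (b *GraphBuilder) AddNode(n *Node) { b.nodes[n.ID] = n }
func (b *GraphBuilder) AddEdge(e *Edge) { b.edges[e.From+"->"+e.To] = e }

type GraphSnapshot struct {
	Nodes []*Node
	Edges []*Edge
}

// Snapshot copies the deduped values into sorted slices, then nils the maps.
// A second call ranges over nil maps and returns an empty snapshot.
func (b *GraphBuilder) Snapshot() GraphSnapshot {
	s := GraphSnapshot{
		Nodes: make([]*Node, 0, len(b.nodes)),
		Edges: make([]*Edge, 0, len(b.edges)),
	}
	for _, n := range b.nodes {
		s.Nodes = append(s.Nodes, n)
	}
	for _, e := range b.edges {
		s.Edges = append(s.Edges, e)
	}
	sort.Slice(s.Nodes, func(i, j int) bool { return s.Nodes[i].ID < s.Nodes[j].ID })
	sort.Slice(s.Edges, func(i, j int) bool {
		if s.Edges[i].From != s.Edges[j].From {
			return s.Edges[i].From < s.Edges[j].From
		}
		return s.Edges[i].To < s.Edges[j].To
	})
	b.nodes = nil // release the duplicate references
	b.edges = nil
	return s
}

func main() {
	b := NewGraphBuilder()
	b.AddNode(&Node{ID: "svc"})
	b.AddNode(&Node{ID: "db"})
	b.AddNode(&Node{ID: "svc"}) // duplicate, deduped by ID
	b.AddEdge(&Edge{From: "svc", To: "db"})
	s := b.Snapshot()
	fmt.Println(len(s.Nodes), len(s.Edges), b.nodes == nil) // prints: 2 1 true
	fmt.Println(len(b.Snapshot().Nodes))                    // prints: 0
}
```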

Plan: docs/superpowers/plans/2026-05-13-enrich-oom-fix.md Task A4.

Verification:
- New TestSnapshotReleasesDedupMaps asserts both nodes + edges maps
  are nilled after Snapshot returns.
- go test ./... -count=1: 876 pass (no regressions).
@aksOps aksOps merged commit 9f54673 into main May 13, 2026
13 checks passed
@aksOps aksOps deleted the perf/enrich-oom-phase-a branch May 13, 2026 13:22
aksOps added a commit that referenced this pull request May 14, 2026
Stale doc references after Phase 6 (Java deletion, #132) and the Kuzu
0.7.1 → 0.11.3 bump (#155 + #159).

- CLAUDE.md / PROJECT_SUMMARY.md: bump Kuzu 0.7.1 → 0.11.3,
  go-sqlite3 1.14.22 → 1.14.44, cobra to 1.10.2; note native FTS.
- AGENTS.md: rewrite "What this repo is" (no more "REST API");
  flip `mvn -B -ntp clean verify` → `go test ./...`; clarify that
  REST + React SPA were deleted in Phase 6 and won't return.
- SECURITY.md: rewrite scope. Drop the dead JAR / serve / REST API /
  React UI / H2 / Neo4j Embedded references. New in-scope list covers
  every codeiq subcommand, the 10 MCP tools (with `run_cypher` mutation
  gate called out), `.codeiq/cache/` (SQLite) + `.codeiq/graph/`
  (Kuzu), and `read_file` path sandboxing. Add the security CI
  workflows (CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM,
  Socket Security) + perf-gate to the hardening references.
- CHANGELOG.md: populate [Unreleased] with the OOM-fix saga
  (PRs #145-#148), the five correctness fixes (#149-#153), the
  Kuzu 0.7.1 → 0.11.3 bump (#155-#158), the FTS migration (#159),
  the Dependabot config rewrite (#154), and the enrich CLI knobs.

No code changes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>