perf(enrich): Phase C - memory tuning CLI surface + empirical OOM-fixed evidence #147
Merged
… (Phase C)
Phase C of the OOM fix plan. Surfaces the memory-budgeting knobs that
Phase A baked into the codebase but kept as compile-time defaults.
Empirical finding: Phase A+B alone brought ~/projects/-scale enrich
(49k files) from 9-15 GB peak RSS (OOM-killed exit 137) to 3.12 GB —
well under the 4 GiB acceptance bar. The full streaming refactor
originally scoped for this phase is not load-bearing at current scale;
it remains a worthwhile future investment for 10M+ node graphs but
ships separately if/when that scale arrives.
Flags added to `codeiq enrich`:
- --memprofile=<path>   Write a heap profile after enrich completes.
                        For OOM debugging; pair with /usr/bin/time -v.
- --max-buffer-pool=N   Cap Kuzu BufferPoolSize in bytes (default 2 GiB).
                        For hosts where 2 GiB is still too much.
- --copy-threads=N      Cap Kuzu COPY FROM parallelism (default
                        min(4, GOMAXPROCS)).
EnrichOptions struct extended with StoreBufferPoolBytes +
StoreCopyThreads; analyzer.Enrich now routes through
graph.OpenWithOptions with those values.
Plan: docs/superpowers/plans/2026-05-13-enrich-oom-fix.md Phase C.
Verification:
- go test ./... -count=1: 877 pass.
- /tmp/codeiq-c enrich ~/projects/polyglot-bench/airflow recorded
1.27 GB peak RSS via /usr/bin/time -v (down from pre-Phase-A
3.8 GB observed by the research pprof agent).
- ~/projects/ enrich peak RSS: 3.12 GB (below 4 GiB acceptance bar).
Summary
Phase C of the OOM-fix plan. Surfaces the memory-budgeting knobs Phase A added to the codebase as CLI flags, plus a `--memprofile` debugging aid.
Key empirical finding
Phase A+B alone fixed the OOM at `~/projects/` scale. Measured peak RSS dropped from 9-15 GB (OOM exit 137) to 3.12 GB on a 49k-file polyglot indexing target — well under the 4 GiB acceptance bar from the plan.
`/usr/bin/time -v codeiq enrich ~/projects/` runs to completion (exits with a pre-existing duplicate-PK error on a yaml fixture — out of scope per plan §"Out of scope"). `codeiq stats ~/projects/` returns 350k nodes / 33k files across 13 languages.
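To make the acceptance check concrete, here is a small sketch that extracts the peak RSS line from GNU `time -v` output and compares it against a 4 GiB bar. The helper name `parsePeakRSSBytes` is hypothetical, the sample line is illustrative (chosen to equal 3.12 × 10^9 bytes, not copied from this PR's logs), and the sketch assumes the reported "kbytes" are units of 1024 bytes.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// acceptanceBarBytes is the 4 GiB bar quoted in the plan.
const acceptanceBarBytes uint64 = 4 << 30

// parsePeakRSSBytes (hypothetical helper) finds the
// "Maximum resident set size (kbytes)" line in GNU time -v output
// and returns the value in bytes, assuming 1 kbyte = 1024 bytes.
func parsePeakRSSBytes(timeOutput string) (uint64, error) {
	const label = "Maximum resident set size (kbytes):"
	sc := bufio.NewScanner(strings.NewReader(timeOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, label) {
			continue
		}
		field := strings.TrimSpace(strings.TrimPrefix(line, label))
		kb, err := strconv.ParseUint(field, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parse peak RSS %q: %w", field, err)
		}
		return kb * 1024, nil
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("peak RSS line not found")
}

func main() {
	// Illustrative sample line, not a measurement from this PR.
	sample := "\tMaximum resident set size (kbytes): 3046875\n"
	peak, err := parsePeakRSSBytes(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("peak RSS: %d bytes, within 4 GiB bar: %v\n", peak, peak <= acceptanceBarBytes)
}
```

Note that the bar is stated in GiB while the measurements are quoted in GB; if GB here means 10^9 bytes, 4 GiB is roughly 4.29 GB, so the margin is slightly larger than the raw numbers suggest.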
Phase C scope decision
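The plan's original Phase C scope (full streaming three-pass refactor, 1-2 weeks) was sized assuming the OOM was load-bearing post-Phase-A. Empirically it isn't. This PR therefore ships Phase C as a focused tooling surface:
- `--memprofile=<path>`: write a heap profile after enrich completes, for OOM debugging.
- `--max-buffer-pool=N`: cap Kuzu BufferPoolSize in bytes (default 2 GiB).
- `--copy-threads=N`: cap Kuzu COPY FROM parallelism (default min(4, GOMAXPROCS)).

Internal: `EnrichOptions` gains `StoreBufferPoolBytes` + `StoreCopyThreads`; `analyzer.Enrich` routes through `graph.OpenWithOptions` with those values.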
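A rough sketch of how the two new option fields and their documented defaults could fit together. Only the field names `StoreBufferPoolBytes` and `StoreCopyThreads` and the default values (2 GiB, min(4, GOMAXPROCS)) come from this PR; the field types, the `withDefaults` method, and the convention that a zero value means "use the default" are assumptions for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultBufferPoolBytes is the 2 GiB default quoted in the PR.
const defaultBufferPoolBytes uint64 = 2 << 30

// EnrichOptions shows only the two fields named in the PR.
// The types are assumed; the real struct has other fields.
type EnrichOptions struct {
	StoreBufferPoolBytes uint64
	StoreCopyThreads     int
}

// defaultCopyThreads returns min(4, GOMAXPROCS).
func defaultCopyThreads() int {
	n := runtime.GOMAXPROCS(0) // argument 0 queries without changing the setting
	if n > 4 {
		return 4
	}
	return n
}

// withDefaults (hypothetical) fills unset fields with the documented
// defaults. Treating zero as "unset" is an assumption of this sketch.
func (o EnrichOptions) withDefaults() EnrichOptions {
	if o.StoreBufferPoolBytes == 0 {
		o.StoreBufferPoolBytes = defaultBufferPoolBytes
	}
	if o.StoreCopyThreads <= 0 {
		o.StoreCopyThreads = defaultCopyThreads()
	}
	return o
}

func main() {
	// Explicit buffer pool cap of 1 GiB; copy threads left to the default.
	opts := EnrichOptions{StoreBufferPoolBytes: 1 << 30}.withDefaults()
	fmt.Printf("buffer pool: %d bytes, copy threads: %d\n",
		opts.StoreBufferPoolBytes, opts.StoreCopyThreads)
}
```

Resolving defaults in one place keeps the CLI flags and any programmatic callers of `analyzer.Enrich` on the same values, whichever layer the real code chooses for this.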
What did NOT land
The plan's Tasks C1-C5 (streaming three-pass refactor — NodeStream / NodeIndex / Pass-1 index build / Pass-2 linkers against compact index / Pass-3 streaming load) are NOT in this PR. They remain a worthwhile future investment if scale grows past ~5M nodes; at current ~/projects/-scale they're over-engineering.
Test plan