
perf(enrich): Phase C — memory tuning CLI surface + empirical OOM-fixed evidence#147

Merged: aksOps merged 1 commit into main from perf/enrich-oom-phase-c on May 13, 2026



@aksOps (Contributor) commented May 13, 2026

Summary

Phase C of the OOM-fix plan. Surfaces, as CLI flags, the memory-budgeting knobs that Phase A added to the codebase, plus a `--memprofile` debugging aid.

Key empirical finding

Phase A+B alone fixed the OOM at `~/projects/` scale. On a 49k-file polyglot indexing target, measured peak RSS dropped from 9-15 GB (OOM-killed, exit 137) to 3.12 GB, well under the plan's 4 GiB acceptance bar.

| Stage | Peak RSS on ~/projects/ (49k files) | Status |
| --- | --- | --- |
| Pre-Phase-A (main before #145) | 9-15 GB (OOM-killed, exit 137) | broken |
| Post-Phase-A+B (current main) | 3.12 GB | passes |

`/usr/bin/time -v codeiq enrich ~/projects/` runs to completion without being OOM-killed. It still exits with a pre-existing duplicate-PK error on a YAML fixture, which is out of scope per the plan's §"Out of scope". `codeiq stats ~/projects/` returns 350k nodes / 33k files across 13 languages.
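The peak-RSS numbers in this PR were recorded with `/usr/bin/time -v`, which prints a `Maximum resident set size (kbytes)` line. Below is a minimal Go sketch for checking that line against the plan's 4 GiB bar. It assumes GNU time's output format and treats its kbytes as 1024-byte units (as Linux `getrusage` reports `ru_maxrss`). The function name `peakRSSBytes` is hypothetical, and the sample value is made up, not a measurement from this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// acceptanceBarBytes is the plan's 4 GiB peak-RSS acceptance bar.
const acceptanceBarBytes = 4 << 30

// peakRSSBytes finds the "Maximum resident set size (kbytes):" line in
// GNU `/usr/bin/time -v` output and converts the value to bytes,
// assuming 1 kbyte = 1024 bytes.
func peakRSSBytes(timeOutput string) (int64, error) {
	const key = "Maximum resident set size (kbytes):"
	sc := bufio.NewScanner(strings.NewReader(timeOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if rest, ok := strings.CutPrefix(line, key); ok {
			kb, err := strconv.ParseInt(strings.TrimSpace(rest), 10, 64)
			if err != nil {
				return 0, err
			}
			return kb * 1024, nil
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no %q line found", key)
}

func main() {
	// Made-up sample input, not a captured measurement.
	sample := "\tCommand being timed: \"codeiq enrich\"\n\tMaximum resident set size (kbytes): 3000000\n"
	b, err := peakRSSBytes(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("peak RSS %.2f GiB, under 4 GiB bar: %v\n", float64(b)/(1<<30), b < acceptanceBarBytes)
}
```

Comparing in bytes avoids the GB/GiB mix in the figures above.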

Phase C scope decision

The plan's original Phase C scope (a full streaming three-pass refactor, 1-2 weeks) was sized on the assumption that the OOM would persist after Phase A. Empirically it does not. This PR therefore ships Phase C as a focused tooling surface:

  • `--memprofile=<path>` on `codeiq enrich`: writes a heap profile after enrich completes, for diagnosing memory regressions at scale.
  • `--max-buffer-pool=N`: exposes, in bytes, the Kuzu `BufferPoolSize` cap that Phase A baked in as a 2 GiB default, for hosts where the default is still too generous.
  • `--copy-threads=N`: exposes the Kuzu `MaxNumThreads` cap, which bounds `COPY FROM` parallelism (default min(4, GOMAXPROCS)).
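The following sketch illustrates the flag surface using Go's standard `flag` package. Whether codeiq parses flags this way is an assumption; its CLI may use a different library. `enrichFlags` and `parseEnrichFlags` are hypothetical names. The defaults (2 GiB, min(4, GOMAXPROCS)) are the ones stated in this PR.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"runtime"
)

// Defaults as stated in the PR: 2 GiB buffer pool, min(4, GOMAXPROCS) threads.
const defaultBufferPoolBytes uint64 = 2 << 30

func defaultCopyThreads() int {
	n := runtime.GOMAXPROCS(0) // 0 queries the current value without changing it
	if n > 4 {
		n = 4
	}
	return n
}

// enrichFlags holds the three Phase C flags (hypothetical field names).
type enrichFlags struct {
	memProfile    string
	maxBufferPool uint64
	copyThreads   int
}

func parseEnrichFlags(args []string) (enrichFlags, error) {
	var f enrichFlags
	fs := flag.NewFlagSet("enrich", flag.ContinueOnError)
	fs.StringVar(&f.memProfile, "memprofile", "", "write a heap profile to this path after enrich completes")
	fs.Uint64Var(&f.maxBufferPool, "max-buffer-pool", defaultBufferPoolBytes, "cap Kuzu BufferPoolSize (bytes)")
	fs.IntVar(&f.copyThreads, "copy-threads", defaultCopyThreads(), "cap Kuzu COPY FROM parallelism")
	if err := fs.Parse(args); err != nil {
		return enrichFlags{}, err
	}
	if f.maxBufferPool == 0 || f.copyThreads < 1 {
		return enrichFlags{}, errors.New("--max-buffer-pool and --copy-threads must be positive")
	}
	return f, nil
}

func main() {
	f, err := parseEnrichFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", f)
}
```

Go's `flag` package treats `-flag` and `--flag=value` the same, so `--max-buffer-pool=1073741824` parses as a 1 GiB cap.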

Internal: `EnrichOptions` gains `StoreBufferPoolBytes` + `StoreCopyThreads`; `analyzer.Enrich` routes through `graph.OpenWithOptions` with those values.
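Here is one plausible shape for that plumbing, sketched under assumptions. Only `EnrichOptions`, `StoreBufferPoolBytes`, and `StoreCopyThreads` come from this PR. `storeConfig`, `defaultStoreConfig`, and `storeConfigFor` are hypothetical stand-ins for whatever `graph.OpenWithOptions` does internally. Treating a zero field as "keep the default" is a choice made by the sketch, not confirmed behavior.

```go
package main

import (
	"fmt"
	"runtime"
)

// EnrichOptions shows only the two fields this PR adds; the real struct has more.
// Zero means "not set" in this sketch (an assumption).
type EnrichOptions struct {
	StoreBufferPoolBytes uint64
	StoreCopyThreads     int
}

// storeConfig is a hypothetical stand-in for the settings handed to Kuzu;
// field names follow the Kuzu settings named in this PR.
type storeConfig struct {
	BufferPoolSize uint64
	MaxNumThreads  uint64
}

// defaultStoreConfig mirrors the stated defaults: 2 GiB pool, min(4, GOMAXPROCS) threads.
func defaultStoreConfig() storeConfig {
	threads := runtime.GOMAXPROCS(0)
	if threads > 4 {
		threads = 4
	}
	return storeConfig{BufferPoolSize: 2 << 30, MaxNumThreads: uint64(threads)}
}

// storeConfigFor starts from the defaults and overrides only the fields the caller set.
func storeConfigFor(o EnrichOptions) storeConfig {
	cfg := defaultStoreConfig()
	if o.StoreBufferPoolBytes > 0 {
		cfg.BufferPoolSize = o.StoreBufferPoolBytes
	}
	if o.StoreCopyThreads > 0 {
		cfg.MaxNumThreads = uint64(o.StoreCopyThreads)
	}
	return cfg
}

func main() {
	fmt.Printf("defaults: %+v\n", storeConfigFor(EnrichOptions{}))
	fmt.Printf("capped:   %+v\n", storeConfigFor(EnrichOptions{StoreBufferPoolBytes: 1 << 30, StoreCopyThreads: 2}))
}
```

This design starts from the defaults and overrides only the fields the caller set. Library callers that never touch the new fields therefore keep the Phase A behavior.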

What did NOT land

The plan's Tasks C1-C5 (streaming three-pass refactor: NodeStream / NodeIndex / Pass-1 index build / Pass-2 linkers against compact index / Pass-3 streaming load) are NOT in this PR. They remain a worthwhile future investment if scale grows past ~5M nodes; at the current ~/projects/ scale they would be over-engineering.
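One way a compact index pass could work is to intern node IDs into dense integer handles. Later passes then carry a small handle per node instead of full node records. This is a hypothetical illustration only: `NodeIndex`, `Add`, `Lookup`, and `Kind` are invented names, and none of Tasks C1-C5 landed in this PR.

```go
package main

import "fmt"

// NodeIndex (hypothetical) maps string node IDs to dense uint32 handles.
// The ID strings are still held as map keys; the saving comes from not
// retaining full node structs while linkers run.
type NodeIndex struct {
	ids   map[string]uint32
	kinds []string // handle -> node kind, the only per-node payload kept
}

func NewNodeIndex() *NodeIndex {
	return &NodeIndex{ids: make(map[string]uint32)}
}

// Add registers id during an index-build pass and returns its handle.
// Re-adding an existing id returns the original handle.
func (x *NodeIndex) Add(id, kind string) uint32 {
	if h, ok := x.ids[id]; ok {
		return h
	}
	h := uint32(len(x.kinds))
	x.ids[id] = h
	x.kinds = append(x.kinds, kind)
	return h
}

// Lookup resolves an ID to its handle, as a linker pass might.
func (x *NodeIndex) Lookup(id string) (uint32, bool) {
	h, ok := x.ids[id]
	return h, ok
}

// Kind returns the node kind stored for handle h.
func (x *NodeIndex) Kind(h uint32) string { return x.kinds[h] }

func main() {
	idx := NewNodeIndex()
	from := idx.Add("svc/a.go#Handler", "function")
	idx.Add("svc/b.go#Client", "class")
	// Resolve an edge endpoint to a handle without loading the full node.
	if to, ok := idx.Lookup("svc/b.go#Client"); ok {
		fmt.Printf("edge %d -> %d (%s)\n", from, to, idx.Kind(to))
	}
}
```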

Test plan

  • `go test ./... -count=1` — 877 pass
  • `/tmp/codeiq enrich ~/projects/polyglot-bench/airflow --memprofile=...` — 1.27 GB peak RSS recorded via /usr/bin/time
  • `/tmp/codeiq enrich ~/projects/` — 3.12 GB peak RSS (acceptance bar < 4 GiB)
  • `/tmp/codeiq stats ~/projects/` — 350k nodes returned
  • CI on this PR

… (Phase C)

Phase C of the OOM fix plan. Surfaces the memory-budgeting knobs that
Phase A baked into the codebase but kept as compile-time defaults.

Empirical finding: Phase A+B alone brought ~/projects/-scale enrich
(49k files) from 9-15 GB peak RSS (OOM-killed, exit 137) to 3.12 GB,
well under the 4 GiB acceptance bar. The full streaming refactor
originally scoped for this phase is not needed at current scale;
it remains a worthwhile future investment for 10M+ node graphs but
ships separately if/when that scale arrives.

Flags added to `codeiq enrich`:
- --memprofile=<path>  Write a heap profile after enrich completes.
                       For OOM debugging — pair with /usr/bin/time -v.
- --max-buffer-pool=N  Cap Kuzu BufferPoolSize in bytes (default 2 GiB).
                       For hosts where 2 GiB is still too much.
- --copy-threads=N     Cap Kuzu COPY FROM parallelism (default
                       min(4, GOMAXPROCS)).

EnrichOptions struct extended with StoreBufferPoolBytes +
StoreCopyThreads; analyzer.Enrich now routes through
graph.OpenWithOptions with those values.

Plan: docs/superpowers/plans/2026-05-13-enrich-oom-fix.md Phase C.

Verification:
- go test ./... -count=1: 877 pass.
- /tmp/codeiq-c enrich ~/projects/polyglot-bench/airflow recorded
  1.27 GB peak RSS via /usr/bin/time -v (down from pre-Phase-A
  3.8 GB observed by the research pprof agent).
- ~/projects/ enrich peak RSS: 3.12 GB (below 4 GiB acceptance bar).
@aksOps merged commit 51efda7 into main on May 13, 2026 (13 checks passed)
@aksOps deleted the perf/enrich-oom-phase-c branch May 13, 2026 13:58
