diff --git a/AGENTS.md b/AGENTS.md
index e598e400..84de6b09 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -4,7 +4,7 @@
 
 ## What this repo is
 
-codeiq is a CLI + read-only server that builds a deterministic code-knowledge graph over a codebase. No AI, no external APIs — pure static analysis. See [`/CLAUDE.md`](CLAUDE.md) for the architecture, package map, pipeline, conventions, and gotchas.
+codeiq is a CLI + read-only stdio MCP server that builds a deterministic code-knowledge graph over a codebase. No AI in the index/enrich pipeline; LLM use is opt-in via `codeiq review`. Single static Go binary (CGO for Kuzu + SQLite). See [`/CLAUDE.md`](CLAUDE.md) for the architecture, package map, pipeline, conventions, and gotchas.
 
 ## Pointers, in priority order
 
@@ -22,9 +22,9 @@ codeiq is a CLI + read-only server that builds a deterministic code-knowledge gr
 - **Sign every commit.** The repo-local config (`scripts/setup-git-signed.sh`) makes this automatic; do not rewrite it.
 - **One logical change per commit.** Conventional-commit subjects (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`).
 - **Squash-merge only.** Branch protection rejects merge commits and force-pushes to `main`.
-- **Tests + jacoco gate must pass.** `mvn -B -ntp clean verify` is the contract.
+- **Tests + race + vet must pass.** `cd go && CGO_ENABLED=1 go test ./... -count=1` is the contract; release CI runs `-race` too. 880+ tests today.
 - **Determinism is non-negotiable.** Same input → same output, byte-for-byte. Any new detector ships with a determinism test.
-- **Read-only serving layer.** MCP and REST API on the `serve` path do not mutate. If you find yourself adding `POST /api/` that writes, stop and reconsider.
+- **Read-only MCP server.** Tool calls never write to the graph. Index/enrich happen only via the CLI commands `codeiq index` / `codeiq enrich`. The Java reference's REST API + React SPA were deleted in Phase 6 cutover (#132) and will not be reintroduced.
 - **No secrets in code.** Repo-level GitHub Actions secrets only.
 
 ## Paperclip / RAN-* coordination
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1a220740..2ca50722 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,56 @@ for that specific tag for the per-commit details.
 
 ## [Unreleased]
 
+### Fixed
+
+- `codeiq enrich` survives polyglot codebases at `~/projects/` scale (49k
+  files, 15 GiB host). Pre-fix runs OOM-killed at exit 137; now exits 0
+  with peak RSS 1.8–2.2 GiB. PRs #145, #146, #147, #148.
+- Five enrich pipeline correctness fixes that surfaced at scale (each one
+  blocked the next — landed in order):
+  - PR #149: MCP dispatch arg names in `tools_consolidated` (7 modes were
+    permanently returning `INVALID_INPUT`).
+  - PR #150: pipe-delimited Kuzu COPY staging — JSON property values
+    containing commas (e.g. Python `imports`) no longer break the parser.
+  - PR #151: path-qualified SERVICE node IDs — two modules sharing a name
+    in different folders no longer collide on primary key.
+  - PR #152: TOML detector unquotes quoted keys (e.g. airflow's
+    `.cherry_picker.toml` `"check_sha" = ...`).
+  - PR #153: explicit `QUOTE='"', ESCAPE='"'` on Kuzu COPY so RFC-4180
+    quoting round-trips correctly (Istio EDS cluster names with `|`).
+
+### Changed
+
+- **Kuzu 0.7.1 → 0.11.3** (PR #155). Migrates the embedded graph DB to a
+  release with bundled FTS extension and bound `LIMIT`/`SKIP` parameters.
+- **Real FTS replaces CONTAINS predicates** (PR #159). `SearchByLabel`
+  and `SearchLexical` now route through `CALL QUERY_FTS_INDEX` with BM25
+  ranking; CONTAINS fallback retained for pre-enrich graphs. Auto-suffix
+  `*` on single-token queries preserves prefix-match UX. Two indexes
+  created at enrich time:
+  - `code_node_label_fts` over `(label, fqn_lower)`
+  - `code_node_lexical_fts` over `(prop_lex_comment, prop_lex_config_keys)`
+- **Parameterized `LIMIT`/`SKIP`** across the query layer (PR #159).
+  `intLiteral` helper removed; `fmt.Sprintf("LIMIT %d", n)` replaced with
+  `LIMIT $lim` bindings.
+- **Dropped `stringsToAny` widener** (PR #159). Kuzu 0.11's Go binding
+  accepts `[]string` directly for `IN $param` clauses.
+- **Mutation gate** allow-lists read-only `CALL QUERY_FTS_INDEX` (PR #159);
+  `CREATE_FTS_INDEX` / `DROP_FTS_INDEX` stay blocked under
+  `OpenReadOnly`.
+- **Dependabot config** rewritten (PR #154) — drops the dead Java `maven`
+  (`/`) and `npm` (`/src/main/frontend`) ecosystems, adds `gomod` (`/go`)
+  with groups for `kuzu`, `tree-sitter`, `mcp`, `cobra-viper`, `sqlite`,
+  `test-libs`. Routine bumps land via PRs #155, #156, #157, #158.
+
+### Added
+
+- `codeiq enrich` knobs (PR #147): `--memprofile=` writes a Go
+  heap profile; `--max-buffer-pool=N` overrides the 2 GiB Kuzu cap;
+  `--copy-threads=N` overrides `MaxNumThreads` default.
+- Perf-gate CI step (PR #148): `/usr/bin/time -v codeiq enrich` runs on
+  fixture-multi-lang; fails the build if peak RSS exceeds 300 MB.
+
 ## [v0.3.0] - 2026-05-13
 
 ### Changed
diff --git a/CLAUDE.md b/CLAUDE.md
index 4eeb1edc..8d1fc17b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -26,15 +26,16 @@ landing) and `c630245` (release infra).
 
 - **Go 1.25.10** — toolchain pin; module min is 1.25.0 (clamped by the
   MCP SDK's own `go` directive).
-- **Kuzu 0.7.1** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
-  CGO. v0.11.3 capability matrix documented in `## Gotchas` below.
-- **`mattn/go-sqlite3` 1.14.22** — SQLite analysis cache. CGO.
+- **Kuzu 0.11.3** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
+  CGO. Native FTS via `CALL CREATE_FTS_INDEX` / `QUERY_FTS_INDEX`.
+  Capability matrix documented in `## Gotchas` below.
+- **`mattn/go-sqlite3` 1.14.44** — SQLite analysis cache. CGO.
 - **`smacker/go-tree-sitter`** — AST parsing for Java / Python /
   TypeScript / Go.
 - **`modelcontextprotocol/go-sdk` v1.6** — stdio MCP server. v1.6 API
   shape: `Server.Serve(ctx, mcpsdk.Transport)`; no `NewStdioTransport`
   helper.
-- **`spf13/cobra`** — CLI framework. Subcommand registration via
+- **`spf13/cobra` 1.10.2** — CLI framework. Subcommand registration via
   `internal/cli` blank imports.
 
 ## Architecture
diff --git a/PROJECT_SUMMARY.md b/PROJECT_SUMMARY.md
index 7c503a40..241a66ee 100644
--- a/PROJECT_SUMMARY.md
+++ b/PROJECT_SUMMARY.md
@@ -22,11 +22,12 @@
 
 - **Go 1.25.10** — toolchain pin in `go/go.mod` (module min 1.25.0,
   clamped by `modelcontextprotocol/go-sdk`).
-- **Kuzu 0.7.1** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
-- **`mattn/go-sqlite3` 1.14.22** — SQLite analysis cache.
+- **Kuzu 0.11.3** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
+  Native FTS via `QUERY_FTS_INDEX` (bundled).
+- **`mattn/go-sqlite3` 1.14.44** — SQLite analysis cache.
 - **`smacker/go-tree-sitter`** — AST parsing (Java / Python / TS / Go).
 - **`modelcontextprotocol/go-sdk` v1.6** — stdio MCP server.
-- **`spf13/cobra`** — CLI framework.
+- **`spf13/cobra` 1.10.2** — CLI framework.
 - Manifest files read: `go/go.mod`, `go/go.sum`.
 
 ## Entry points
diff --git a/SECURITY.md b/SECURITY.md
index ce02c32c..e63a3232 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -2,14 +2,14 @@
 
 ## Supported versions
 
-Security fixes are issued against the latest minor release line on Maven Central. While codeiq is pre-1.0 (`0.x.y`) only the **latest** released `0.MINOR.x` line receives backports; older minor lines are EOL the moment a new minor ships.
+Security fixes are issued against the latest minor release line. While codeiq is pre-1.0 (`0.x.y`) only the **latest** released `0.MINOR.x` line receives backports; older minor lines are EOL the moment a new minor ships.
 
 | Version line | Status |
 |---|---|
-| `0.1.x` | Supported (current) |
-| `< 0.1.0` | Unsupported |
+| `0.3.x` | Supported (current — Go single binary) |
+| `0.2.x` and below | Unsupported (Java/Spring Boot reference, deleted at Phase 6 cutover) |
 
-`-SNAPSHOT` builds are development snapshots; they do not receive security fixes by themselves — you should be tracking the latest tagged release.
+Development builds (untagged `main`) are not covered — track the latest tagged release.
 
 ## Reporting a vulnerability
 
@@ -22,8 +22,8 @@ Use one of:
 
 Please include:
 
-- The codeiq version (`java -jar code-iq-*-cli.jar version` or `pom.xml` coordinate).
-- The shortest reproducer you can produce — a CLI command or test case is ideal.
+- The codeiq version (`codeiq --version`).
+- The shortest reproducer you can produce — a CLI command, a test case, or an indexed-fixture path.
 - Your assessment of impact (e.g., RCE, path traversal, info-disclosure, DoS).
 - Whether the issue is in a transitive dependency (please name the dependency + advisory ID if known).
 
@@ -40,26 +40,28 @@ We do not currently run a paid bug bounty.
 
 In-scope:
 
-- The codeiq CLI (`code-iq-*-cli.jar`).
-- The library JAR (`io.github.randomcodespace.iq:code-iq`).
-- The bundled REST API + MCP server (`serve` subcommand) — including path traversal, authn/authz, deserialisation, request smuggling, and SSRF.
-- The bundled React UI assets shipped inside the JAR.
-- The pipeline cache (H2) and graph store (Neo4j Embedded) — including local privilege escalation and data tampering.
+- The `codeiq` CLI binary and every subcommand (`index`, `enrich`, `mcp`, `query`, `find`, `cypher`, `stats`, `flow`, `graph`, `topology`, `review`, `cache`, `plugins`, `config`).
+- The stdio MCP server (`codeiq mcp`) — including its 10 user-facing tools (`graph_summary`, `find_in_graph`, `inspect_node`, `trace_relationships`, `analyze_impact`, `topology_view`, `run_cypher`, `read_file`, `generate_flow`, `review_changes`). The mutation gate on `run_cypher` is in-scope — bypassing it to mutate the read-only Kuzu store is a vulnerability.
+- The pipeline cache (SQLite, `.codeiq/cache/codeiq.sqlite`) and graph store (Kuzu embedded, `.codeiq/graph/codeiq.kuzu`) — including local privilege escalation and data tampering of the indexed graph.
+- File-read sandboxing in `read_file` and `codeiq review` — path traversal out of the indexed root is in-scope.
+- The release pipeline — Goreleaser config, signing keys (cosign keyless via OIDC), GitHub Actions workflows under `.github/workflows/`, and the published artifacts (binary tarballs + checksums + cosign bundles).
 
 Out of scope:
 
 - Vulnerabilities that require pre-existing local code execution on the developer's machine (we ship as a developer tool — by definition you trust the code you point it at).
-- Public-internet attack surface — codeiq does not expose any service to the public internet by default; deploying the `serve` endpoint behind hostile reverse-proxies is out of scope.
-- Findings in third-party services we do not control (Maven Central, GitHub itself, SonarCloud, etc.) — please report those upstream.
+- Public-internet attack surface — codeiq does not expose any service to the public internet. It is a CLI + stdio MCP server only; there is no REST API and no web UI (the Java reference had both; they were deleted in Phase 6 cutover and will not be reintroduced).
+- Vulnerabilities in the LLM endpoint used by `codeiq review` (Ollama local or cloud) — those are the LLM vendor's surface area.
+- Findings in third-party services we do not control (GitHub itself, OpenSSF, Socket Security, etc.) — please report those upstream.
 
 ## Hardening references
 
 - [`shared/runbooks/engineering-standards.md`](shared/runbooks/engineering-standards.md) — CVE policy and quality gates.
 - [`shared/runbooks/rollback.md`](shared/runbooks/rollback.md) §6 — secret rotation flow.
 - `.github/workflows/scorecard.yml` — OpenSSF Scorecard supply-chain checks.
-- GitHub repo-level **CodeQL default setup** (java-kotlin + javascript-typescript + actions) — code scanning, SARIF in the Security tab. Configured under repo Settings → Code security → Code scanning, not via a workflow file (a workflow-driven `codeql.yml` was tried and removed because GitHub rejects duplicate SARIF uploads when default setup is on for the same language).
-- `.github/dependabot.yml` — automated dependency / GHA / npm bumps.
+- `.github/workflows/security.yml` — CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM, Socket Security on every PR.
+- `.github/workflows/perf-gate.yml` — enrich memory regression gate (300 MB ceiling on fixture-multi-lang).
+- `.github/dependabot.yml` — automated `gomod` + `github-actions` bumps, grouped per ecosystem.
 
 ## Changelog
 
-This file is versioned as part of the repo. Material changes (e.g., raising the supported-versions table, changing the disclosure timeline) are announced via a Release note and a Paperclip board comment.
+This file is versioned as part of the repo. Material changes (e.g., raising the supported-versions table, changing the disclosure timeline) are announced via a Release note.
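
The perf gate named in the CHANGELOG and SECURITY.md hunks above (`/usr/bin/time -v codeiq enrich`, 300 MB ceiling on fixture-multi-lang) could be checked along the lines of the Go sketch below. This is not the repo's actual CI step: `peakRSSKB`, `overCeiling`, the sample output, and the reading of 300 MB as 300 × 1024 kbytes are all assumptions made for illustration. The only external fact it relies on is the `Maximum resident set size (kbytes):` line that GNU `time -v` prints.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// rssLabel is the field GNU time prints under -v for peak resident memory.
const rssLabel = "Maximum resident set size (kbytes):"

// peakRSSKB (hypothetical helper) extracts the peak RSS in kbytes from the
// text that `/usr/bin/time -v` writes to stderr.
func peakRSSKB(timeOutput string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(timeOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if rest, ok := strings.CutPrefix(line, rssLabel); ok {
			return strconv.ParseInt(strings.TrimSpace(rest), 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no %q line in time output", rssLabel)
}

// overCeiling (hypothetical helper) reports whether the peak RSS found in
// timeOutput is strictly greater than ceilingKB.
func overCeiling(timeOutput string, ceilingKB int64) (bool, error) {
	kb, err := peakRSSKB(timeOutput)
	if err != nil {
		return false, err
	}
	return kb > ceilingKB, nil
}

func main() {
	// Made-up sample; a real `time -v` run prints many more fields.
	sample := "\tCommand being timed: \"codeiq enrich\"\n" +
		"\tMaximum resident set size (kbytes): 204800\n"

	// Assumption: the 300 MB ceiling is interpreted as 300 * 1024 kbytes.
	const ceilingKB = 300 * 1024

	over, err := overCeiling(sample, ceilingKB)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("over ceiling:", over)
}
```

In a CI job the same check would read the captured stderr of the timed command and exit non-zero when `overCeiling` returns true; a missing RSS line is treated as an error so that the gate cannot pass silently.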