Merged
AGENTS.md (6 changes: 3 additions & 3 deletions)
@@ -4,7 +4,7 @@

 ## What this repo is

-codeiq is a CLI + read-only server that builds a deterministic code-knowledge graph over a codebase. No AI, no external APIs — pure static analysis. See [`/CLAUDE.md`](CLAUDE.md) for the architecture, package map, pipeline, conventions, and gotchas.
+codeiq is a CLI + read-only stdio MCP server that builds a deterministic code-knowledge graph over a codebase. No AI in the index/enrich pipeline; LLM use is opt-in via `codeiq review`. Single static Go binary (CGO for Kuzu + SQLite). See [`/CLAUDE.md`](CLAUDE.md) for the architecture, package map, pipeline, conventions, and gotchas.

 ## Pointers, in priority order

@@ -22,9 +22,9 @@ codeiq is a CLI + read-only server that builds a deterministic code-knowledge gr
 - **Sign every commit.** The repo-local config (`scripts/setup-git-signed.sh`) makes this automatic; do not rewrite it.
 - **One logical change per commit.** Conventional-commit subjects (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`).
 - **Squash-merge only.** Branch protection rejects merge commits and force-pushes to `main`.
-- **Tests + jacoco gate must pass.** `mvn -B -ntp clean verify` is the contract.
+- **Tests + race + vet must pass.** `cd go && CGO_ENABLED=1 go test ./... -count=1` is the contract; release CI runs `-race` too. 880+ tests today.
 - **Determinism is non-negotiable.** Same input → same output, byte-for-byte. Any new detector ships with a determinism test.
-- **Read-only serving layer.** MCP and REST API on the `serve` path do not mutate. If you find yourself adding `POST /api/<verb>` that writes, stop and reconsider.
+- **Read-only MCP server.** Tool calls never write to the graph. Index/enrich happen only via the CLI commands `codeiq index` / `codeiq enrich`. The Java reference's REST API + React SPA were deleted in Phase 6 cutover (#132) and will not be reintroduced.
 - **No secrets in code.** Repo-level GitHub Actions secrets only.

 ## Paperclip / RAN-* coordination
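The read-only rule above, together with the CHANGELOG entry that allow-lists `CALL QUERY_FTS_INDEX` while keeping `CREATE_FTS_INDEX` / `DROP_FTS_INDEX` blocked under `OpenReadOnly`, can be illustrated with a keyword-level gate. This is a minimal sketch under assumptions, not codeiq's mutation gate: the name `allowCypher`, both keyword lists, and the `CodeNode` table name are hypothetical, and a token scan is naive (it also rejects string literals or property names that happen to spell a write keyword), so a real gate would work on a parsed query.

```go
package main

import (
	"fmt"
	"strings"
)

// writeKeywords is a hypothetical deny-list of Cypher clauses that mutate a graph.
var writeKeywords = map[string]bool{
	"CREATE": true, "MERGE": true, "DELETE": true, "DETACH": true,
	"SET": true, "REMOVE": true, "DROP": true, "ALTER": true, "COPY": true,
}

// readOnlyProcedures is a hypothetical allow-list of CALL targets that only read.
var readOnlyProcedures = map[string]bool{
	"QUERY_FTS_INDEX": true,
}

// allowCypher reports whether query looks read-only under this sketch's rules.
func allowCypher(query string) bool {
	// Tokenize on anything that is not an ASCII letter, digit, or underscore,
	// so CREATE_FTS_INDEX stays one token instead of exposing CREATE.
	tokens := strings.FieldsFunc(strings.ToUpper(query), func(r rune) bool {
		return !(r == '_' || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9'))
	})
	for i, tok := range tokens {
		if writeKeywords[tok] {
			return false
		}
		// Every CALL must target an allow-listed procedure.
		if tok == "CALL" && (i+1 >= len(tokens) || !readOnlyProcedures[tokens[i+1]]) {
			return false
		}
	}
	return true
}

func main() {
	for _, q := range []string{
		"MATCH (n) RETURN n LIMIT $lim",
		"CALL QUERY_FTS_INDEX('CodeNode', 'code_node_label_fts', 'kafka*') RETURN node, score",
		"CALL CREATE_FTS_INDEX('CodeNode', 'code_node_label_fts', ['label'])",
		"MATCH (n) DETACH DELETE n",
	} {
		fmt.Printf("%v\t%s\n", allowCypher(q), q)
	}
}
```

Allow-listing procedures by name, rather than deny-listing the FTS writers, keeps any procedure added in a later Kuzu release blocked until someone reviews it.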
CHANGELOG.md (50 changes: 50 additions & 0 deletions)
@@ -14,6 +14,56 @@ for that specific tag for the per-commit details.

 ## [Unreleased]

+### Fixed
+
+- `codeiq enrich` survives polyglot codebases at `~/projects/` scale (49k
+  files, 15 GiB host). Pre-fix runs OOM-killed at exit 137; now exits 0
+  with peak RSS 1.8–2.2 GiB. PRs #145, #146, #147, #148.
+- Five enrich pipeline correctness fixes that surfaced at scale (each one
+  blocked the next — landed in order):
+  - PR #149: MCP dispatch arg names in `tools_consolidated` (7 modes were
+    permanently returning `INVALID_INPUT`).
+  - PR #150: pipe-delimited Kuzu COPY staging — JSON property values
+    containing commas (e.g. Python `imports`) no longer break the parser.
+  - PR #151: path-qualified SERVICE node IDs — two modules sharing a name
+    in different folders no longer collide on primary key.
+  - PR #152: TOML detector unquotes quoted keys (e.g. airflow's
+    `.cherry_picker.toml` `"check_sha" = ...`).
+  - PR #153: explicit `QUOTE='"', ESCAPE='"'` on Kuzu COPY so RFC-4180
+    quoting round-trips correctly (Istio EDS cluster names with `|`).
+
+### Changed
+
+- **Kuzu 0.7.1 → 0.11.3** (PR #155). Migrates the embedded graph DB to a
+  release with bundled FTS extension and bound `LIMIT`/`SKIP` parameters.
+- **Real FTS replaces CONTAINS predicates** (PR #159). `SearchByLabel`
+  and `SearchLexical` now route through `CALL QUERY_FTS_INDEX` with BM25
+  ranking; CONTAINS fallback retained for pre-enrich graphs. Auto-suffix
+  `*` on single-token queries preserves prefix-match UX. Two indexes
+  created at enrich time:
+  - `code_node_label_fts` over `(label, fqn_lower)`
+  - `code_node_lexical_fts` over `(prop_lex_comment, prop_lex_config_keys)`
+- **Parameterized `LIMIT`/`SKIP`** across the query layer (PR #159).
+  `intLiteral` helper removed; `fmt.Sprintf("LIMIT %d", n)` replaced with
+  `LIMIT $lim` bindings.
+- **Dropped `stringsToAny` widener** (PR #159). Kuzu 0.11's Go binding
+  accepts `[]string` directly for `IN $param` clauses.
+- **Mutation gate** allow-lists read-only `CALL QUERY_FTS_INDEX` (PR #159);
+  `CREATE_FTS_INDEX` / `DROP_FTS_INDEX` stay blocked under
+  `OpenReadOnly`.
+- **Dependabot config** rewritten (PR #154) — drops the dead Java `maven`
+  (`/`) and `npm` (`/src/main/frontend`) ecosystems, adds `gomod` (`/go`)
+  with groups for `kuzu`, `tree-sitter`, `mcp`, `cobra-viper`, `sqlite`,
+  `test-libs`. Routine bumps land via PRs #155, #156, #157, #158.
+
+### Added
+
+- `codeiq enrich` knobs (PR #147): `--memprofile=<path>` writes a Go
+  heap profile; `--max-buffer-pool=N` overrides the 2 GiB Kuzu cap;
+  `--copy-threads=N` overrides `MaxNumThreads` default.
+- Perf-gate CI step (PR #148): `/usr/bin/time -v codeiq enrich` runs on
+  fixture-multi-lang; fails the build if peak RSS exceeds 300 MB.
+
 ## [v0.3.0] - 2026-05-13

 ### Changed
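The perf-gate entry says CI runs `/usr/bin/time -v codeiq enrich` on fixture-multi-lang and fails when peak RSS exceeds 300 MB. GNU time's verbose report includes a `Maximum resident set size (kbytes): N` line, so a check over that report might look like the sketch below. The function names, the reading of 300 MB as 300 × 1024 KiB, and the sample report are assumptions; the actual workflow may parse or compare differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const rssLabel = "Maximum resident set size (kbytes):"

// ceilingKiB is a hypothetical reading of the "300 MB" ceiling as 300 MiB.
const ceilingKiB = 300 * 1024

// peakRSSKiB extracts the peak resident set size from GNU `time -v` output.
func peakRSSKiB(report string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if rest, ok := strings.CutPrefix(line, rssLabel); ok {
			return strconv.ParseInt(strings.TrimSpace(rest), 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no %q line in report", rssLabel)
}

// withinCeiling reports whether the run stayed at or under the ceiling.
func withinCeiling(report string) (bool, int64, error) {
	kib, err := peakRSSKiB(report)
	if err != nil {
		return false, 0, err
	}
	return kib <= ceilingKiB, kib, nil
}

func main() {
	report := "\tCommand being timed: \"codeiq enrich\"\n" +
		"\tMaximum resident set size (kbytes): 204800\n" +
		"\tExit status: 0\n"
	ok, kib, err := withinCeiling(report)
	fmt.Println(ok, kib, err)
}
```

Failing on a missing RSS line, instead of treating it as zero, keeps a changed `time` output format from silently passing the gate.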
CLAUDE.md (9 changes: 5 additions & 4 deletions)
@@ -26,15 +26,16 @@ landing) and `c630245` (release infra).

 - **Go 1.25.10** — toolchain pin; module min is 1.25.0 (clamped by the
   MCP SDK's own `go` directive).
-- **Kuzu 0.7.1** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
-  CGO. v0.11.3 capability matrix documented in `## Gotchas` below.
-- **`mattn/go-sqlite3` 1.14.22** — SQLite analysis cache. CGO.
+- **Kuzu 0.11.3** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
+  CGO. Native FTS via `CALL CREATE_FTS_INDEX` / `QUERY_FTS_INDEX`.
+  Capability matrix documented in `## Gotchas` below.
+- **`mattn/go-sqlite3` 1.14.44** — SQLite analysis cache. CGO.
 - **`smacker/go-tree-sitter`** — AST parsing for Java / Python /
   TypeScript / Go.
 - **`modelcontextprotocol/go-sdk` v1.6** — stdio MCP server. v1.6 API
   shape: `Server.Serve(ctx, mcpsdk.Transport)`; no `NewStdioTransport`
   helper.
-- **`spf13/cobra`** — CLI framework. Subcommand registration via
+- **`spf13/cobra` 1.10.2** — CLI framework. Subcommand registration via
   `internal/cli` blank imports.

 ## Architecture
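The updated stack note and the CHANGELOG say `SearchByLabel` / `SearchLexical` now go through `QUERY_FTS_INDEX` and auto-suffix `*` on single-token queries to keep prefix matching. A minimal sketch of that normalization step follows, assuming whitespace tokenization and that a trailing `*` the user already typed should not be doubled; `ftsTerm` is a hypothetical name, and how the resulting term is passed into the `CALL` is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// ftsTerm normalizes a user query before it is handed to QUERY_FTS_INDEX.
// Single-token queries get a trailing '*' so "kaf" still prefix-matches
// "kafka"; multi-token queries are passed through for BM25 ranking.
func ftsTerm(query string) string {
	tokens := strings.Fields(query)
	switch len(tokens) {
	case 0:
		return ""
	case 1:
		tok := tokens[0]
		if strings.HasSuffix(tok, "*") {
			return tok // already a prefix query
		}
		return tok + "*"
	default:
		return strings.Join(tokens, " ")
	}
}

func main() {
	for _, q := range []string{"kaf", "  kafka*  ", "order   service", "   "} {
		fmt.Printf("%q -> %q\n", q, ftsTerm(q))
	}
}
```

Limiting the wildcard to single-token input keeps multi-word searches as plain BM25 term matches rather than widening every token into a prefix query.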
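The CHANGELOG's switch from `fmt.Sprintf("LIMIT %d", n)` to `LIMIT $lim` bindings, and its note that Kuzu 0.11 accepts `[]string` directly for `IN $param`, mean the query text stays constant while values travel as parameters. A sketch of a query builder in that style follows. `searchQuery`, the `CodeNode` label, the `n.id` property, the `$skip` parameter name, and the `int64` conversion are assumptions; the returned map would be passed to the Kuzu binding's prepared-statement execution, which is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// searchQuery returns a constant Cypher string plus the parameters to bind.
// No caller-supplied value is formatted into the text itself.
func searchQuery(ids []string, skip, limit int) (string, map[string]any) {
	q := strings.Join([]string{
		"MATCH (n:CodeNode)",
		"WHERE n.id IN $ids",
		"RETURN n",
		"ORDER BY n.id",
		"SKIP $skip LIMIT $lim",
	}, " ")
	params := map[string]any{
		"ids":  ids,          // bound as a list, no []any widening
		"skip": int64(skip),  // assumed INT64 parameter type
		"lim":  int64(limit), // assumed INT64 parameter type
	}
	return q, params
}

func main() {
	q, params := searchQuery([]string{"svc:a", "svc:b"}, 0, 25)
	fmt.Println(q)
	fmt.Println(params["lim"], params["skip"], params["ids"])
}
```

Because the text never varies with the numbers, the same statement can be prepared once and reused, and no user input reaches the query string.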
PROJECT_SUMMARY.md (7 changes: 4 additions & 3 deletions)
@@ -22,11 +22,12 @@

 - **Go 1.25.10** — toolchain pin in `go/go.mod` (module min 1.25.0,
   clamped by `modelcontextprotocol/go-sdk`).
-- **Kuzu 0.7.1** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
-- **`mattn/go-sqlite3` 1.14.22** — SQLite analysis cache.
+- **Kuzu 0.11.3** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
+  Native FTS via `QUERY_FTS_INDEX` (bundled).
+- **`mattn/go-sqlite3` 1.14.44** — SQLite analysis cache.
 - **`smacker/go-tree-sitter`** — AST parsing (Java / Python / TS / Go).
 - **`modelcontextprotocol/go-sdk` v1.6** — stdio MCP server.
-- **`spf13/cobra`** — CLI framework.
+- **`spf13/cobra` 1.10.2** — CLI framework.
 - Manifest files read: `go/go.mod`, `go/go.sum`.

 ## Entry points
SECURITY.md (34 changes: 18 additions & 16 deletions)
@@ -2,14 +2,14 @@

 ## Supported versions

-Security fixes are issued against the latest minor release line on Maven Central. While codeiq is pre-1.0 (`0.x.y`) only the **latest** released `0.MINOR.x` line receives backports; older minor lines are EOL the moment a new minor ships.
+Security fixes are issued against the latest minor release line. While codeiq is pre-1.0 (`0.x.y`) only the **latest** released `0.MINOR.x` line receives backports; older minor lines are EOL the moment a new minor ships.

 | Version line | Status |
 |---|---|
-| `0.1.x` | Supported (current) |
-| `< 0.1.0` | Unsupported |
+| `0.3.x` | Supported (current — Go single binary) |
+| `0.2.x` and below | Unsupported (Java/Spring Boot reference, deleted at Phase 6 cutover) |

-`-SNAPSHOT` builds are development snapshots; they do not receive security fixes by themselves — you should be tracking the latest tagged release.
+Development builds (untagged `main`) are not covered — track the latest tagged release.

 ## Reporting a vulnerability

@@ -22,8 +22,8 @@ Use one of:

 Please include:

-- The codeiq version (`java -jar code-iq-*-cli.jar version` or `pom.xml` coordinate).
-- The shortest reproducer you can produce — a CLI command or test case is ideal.
+- The codeiq version (`codeiq --version`).
+- The shortest reproducer you can produce — a CLI command, a test case, or an indexed-fixture path.
 - Your assessment of impact (e.g., RCE, path traversal, info-disclosure, DoS).
 - Whether the issue is in a transitive dependency (please name the dependency + advisory ID if known).

@@ -40,26 +40,28 @@ We do not currently run a paid bug bounty.

 In-scope:

-- The codeiq CLI (`code-iq-*-cli.jar`).
-- The library JAR (`io.github.randomcodespace.iq:code-iq`).
-- The bundled REST API + MCP server (`serve` subcommand) — including path traversal, authn/authz, deserialisation, request smuggling, and SSRF.
-- The bundled React UI assets shipped inside the JAR.
-- The pipeline cache (H2) and graph store (Neo4j Embedded) — including local privilege escalation and data tampering.
+- The `codeiq` CLI binary and every subcommand (`index`, `enrich`, `mcp`, `query`, `find`, `cypher`, `stats`, `flow`, `graph`, `topology`, `review`, `cache`, `plugins`, `config`).
+- The stdio MCP server (`codeiq mcp`) — including its 10 user-facing tools (`graph_summary`, `find_in_graph`, `inspect_node`, `trace_relationships`, `analyze_impact`, `topology_view`, `run_cypher`, `read_file`, `generate_flow`, `review_changes`). The mutation gate on `run_cypher` is in-scope — bypassing it to mutate the read-only Kuzu store is a vulnerability.
+- The pipeline cache (SQLite, `.codeiq/cache/codeiq.sqlite`) and graph store (Kuzu embedded, `.codeiq/graph/codeiq.kuzu`) — including local privilege escalation and data tampering of the indexed graph.
+- File-read sandboxing in `read_file` and `codeiq review` — path traversal out of the indexed root is in-scope.
+- The release pipeline — Goreleaser config, signing keys (cosign keyless via OIDC), GitHub Actions workflows under `.github/workflows/`, and the published artifacts (binary tarballs + checksums + cosign bundles).

 Out of scope:

 - Vulnerabilities that require pre-existing local code execution on the developer's machine (we ship as a developer tool — by definition you trust the code you point it at).
-- Public-internet attack surface — codeiq does not expose any service to the public internet by default; deploying the `serve` endpoint behind hostile reverse-proxies is out of scope.
-- Findings in third-party services we do not control (Maven Central, GitHub itself, SonarCloud, etc.) — please report those upstream.
+- Public-internet attack surface — codeiq does not expose any service to the public internet. It is a CLI + stdio MCP server only; there is no REST API and no web UI (the Java reference had both; they were deleted in Phase 6 cutover and will not be reintroduced).
+- Vulnerabilities in the LLM endpoint used by `codeiq review` (Ollama local or cloud) — those are the LLM vendor's surface area.
+- Findings in third-party services we do not control (GitHub itself, OpenSSF, Socket Security, etc.) — please report those upstream.

 ## Hardening references

 - [`shared/runbooks/engineering-standards.md`](shared/runbooks/engineering-standards.md) — CVE policy and quality gates.
 - [`shared/runbooks/rollback.md`](shared/runbooks/rollback.md) §6 — secret rotation flow.
 - `.github/workflows/scorecard.yml` — OpenSSF Scorecard supply-chain checks.
-- GitHub repo-level **CodeQL default setup** (java-kotlin + javascript-typescript + actions) — code scanning, SARIF in the Security tab. Configured under repo Settings → Code security → Code scanning, not via a workflow file (a workflow-driven `codeql.yml` was tried and removed because GitHub rejects duplicate SARIF uploads when default setup is on for the same language).
-- `.github/dependabot.yml` — automated dependency / GHA / npm bumps.
+- `.github/workflows/security.yml` — CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM, Socket Security on every PR.
+- `.github/workflows/perf-gate.yml` — enrich memory regression gate (300 MB ceiling on fixture-multi-lang).
+- `.github/dependabot.yml` — automated `gomod` + `github-actions` bumps, grouped per ecosystem.

 ## Changelog

-This file is versioned as part of the repo. Material changes (e.g., raising the supported-versions table, changing the disclosure timeline) are announced via a Release note and a Paperclip board comment.
+This file is versioned as part of the repo. Material changes (e.g., raising the supported-versions table, changing the disclosure timeline) are announced via a Release note.