From 86f632b3c1bd769162c186eb904d22a6972585b2 Mon Sep 17 00:00:00 2001
From: Amit Kumar
Date: Thu, 14 May 2026 17:44:11 +0000
Subject: [PATCH] docs(README): visual refresh + supply-chain badges
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

User-requested README glow-up. Replaces the dense 109-line version with
a 409-line layout that's actually scannable, plus the badge set the user
asked for (OpenSSF Best Practices, OpenSSF Scorecard, Sigstore, SLSA,
plus a pkg.go.dev reference).

Visual changes:
* Centered title block with subtitle + hero badges in 4 grouped rows
  (release / CI / supply-chain / project-fact).
* Three-column feature grid ("Why codeiq") with deterministic /
  agent-ready / supply-chain-hardened / polyglot / no-AI / single-binary
  callouts.
* ASCII pipeline diagram in "How it works".
* Documentation as a 3-column grouped table (starter / reference /
  operate) for quick navigation.
* Collapsible CLI cheatsheet + MCP tool list.
* Verification section with three concrete commands (cosign-checksum,
  cosign-darwin, gh attestation verify).

Badge additions:
* OpenSSF Best Practices (cii/percentage/12650; auto-updates with
  project score)
* OpenSSF Scorecard (img.shields.io/ossf-scorecard/)
* Sigstore keyless badge (project-fact, not auto-status)
* SLSA build provenance badge (project-fact)
* Perf-gate workflow status
* Scorecard workflow status
* pkg.go.dev reference
* 880+ tests fact
* CGO required fact

Badge omission with explicit footnote:
* SonarQube/SonarCloud: codeiq deliberately replaced Sonar + CodeQL +
  OWASP Dependency-Check with the OSS-CLI security stack in CI (semgrep +
  osv-scanner + trivy + gitleaks + jscpd + govulncheck + native GitHub
  CodeQL). A Sonar badge would misrepresent the setup. Inline note under
  the badge block + cross-link to docs/07-integrations.md.

All badge URLs spot-checked HTTP 200/302 from this host. No code changes.

Co-Authored-By: Claude Opus 4.7
---
 README.md | 406 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 353 insertions(+), 53 deletions(-)

diff --git a/README.md b/README.md
index 279325a8..0f4b5c66 100644
--- a/README.md
+++ b/README.md
@@ -1,81 +1,219 @@
+
+ # codeiq -**Deterministic code-knowledge-graph CLI + stdio MCP server. 100 detectors, 35+ languages. Pure static analysis — no AI in the index/enrich pipeline; LLM use is opt-in for PR review.** +### Deterministic code-knowledge-graph CLI + stdio MCP server + +**Map a polyglot codebase into a queryable graph. 100 detectors. 35+ languages. Zero AI in the pipeline.** -

- Latest release - CI - Security +
+ +

+ Latest release Go 1.25.10 - License + pkg.go.dev + License +

+ +

+ CI + Perf Gate + Security + Scorecard CI +

+ +

+ OpenSSF Best Practices + OpenSSF Scorecard + Sigstore keyless + SLSA Build Provenance +

+ +

100 Detectors 35+ Languages - MCP Stdio - Kuzu 0.11.3 + 880+ Tests + MCP stdio + Kuzu 0.11.3 + CGO required

-codeiq scans a codebase, builds a deterministic graph of services / endpoints / entities / infra / auth / framework usage, and exposes it via: +
+ + + Note on SonarQube: codeiq deliberately uses an in-house OSS-CLI security stack (CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, jscpd, govulncheck) instead of Sonar — see docs/07-integrations.md & security.yml. + + +
+ +--- + +## Table of contents + +- [Why codeiq](#why-codeiq) +- [How it works](#how-it-works) +- [Install](#install) +- [Quickstart](#quickstart) +- [MCP integration](#mcp-integration) +- [CLI cheatsheet](#cli-cheatsheet) +- [Architecture at a glance](#architecture-at-a-glance) +- [Verification](#verification-supply-chain) +- [Documentation](#documentation) +- [Project status](#project-status) +- [Contributing](#contributing) +- [License](#license) + +--- + +## Why codeiq + + + + + + + + + + + + +
+ +### Deterministic + +Same input → same output, byte-for-byte. Detector emissions are confidence-tagged (`LEXICAL` / `SYNTACTIC` / `RESOLVED`); the graph builder dedup-merges with confidence-aware property union and drops phantom edges at snapshot. Every detector ships a determinism test. + + + +### Agent-ready + +Stdio MCP server with 10 read-only tools wired for Claude Code / Cursor / Cline. Mode-driven surface (`graph_summary`, `find_in_graph`, `inspect_node`, `trace_relationships`, `analyze_impact`, `topology_view`) plus `run_cypher` for the power users. + + + +### Supply-chain hardened + +Goreleaser + Cosign keyless via GitHub OIDC + Sigstore Rekor transparency log + Syft SPDX SBOMs + SLSA build provenance attestation + OpenSSF Scorecard + 6 OSS-CLI security scanners in CI. + +
+ +### Polyglot + +100 detectors across **35+ languages**: Java, Kotlin, Scala, Python, TypeScript, JavaScript, Go, Rust, C#, C++, plus IaC (Terraform, Bicep, Helm, Kubernetes, Docker, CloudFormation), config (YAML/JSON/TOML/INI), SQL, protobuf, shell, and more. + + -- a CLI (`codeiq index → enrich → query/stats/find/cypher/topology/flow`) -- a stdio MCP server (10 read-only tools for Claude Code / Cursor) -- an LLM PR review (`codeiq review`, default backend Ollama local; cloud via `OLLAMA_API_KEY`) +### No AI in the pipeline -Same input ⇒ same output, every time. Detector emissions are confidence-tagged (`LEXICAL` / `SYNTACTIC` / `RESOLVED`); the graph builder dedup-merges with confidence-aware property union and drops phantom edges at snapshot. +Index + enrich + every MCP query is pure static analysis. The only LLM touch is the opt-in `codeiq review` subcommand. No telemetry. No auto-update. No outbound network during core flows. + + + +### Single static binary + +~25 MB. CGO embeds Kuzu (graph) + SQLite (cache) + tree-sitter (parser). No daemons. No external services. Works behind corporate firewalls / air-gapped after the initial install. + +
+ +--- + +## How it works + +``` + source ┌─────────────┐ + tree ─► index ──────► ┌──────────┐ ──► enrich ──────► │ Kuzu │ + FileDiscovery │ SQLite │ linkers + │ graph │ + tree-sitter │ cache │ layer classify │ (FTS-idx) │ + 100 detectors │ │ intelligence │ │ + dedup + sort └──────────┘ ServiceDetector └──────┬──────┘ + bulk COPY → Kuzu │ + ▼ + ┌───────────────────────────────────────────────┐ + │ Read-only consumers (all powered by Kuzu): │ + │ stats, find, query, cypher, flow, graph, │ + │ topology, review (+ Ollama LLM) │ + │ mcp (stdio JSON-RPC, 10 tools) │ + └───────────────────────────────────────────────┘ +``` + +Three commands cover the lifecycle: + +| Step | Command | What lands | +|---|---|---| +| **1.** Index | `codeiq index ` | `/.codeiq/cache/codeiq.sqlite` (content-hash keyed; resumable) | +| **2.** Enrich | `codeiq enrich ` | `/.codeiq/graph/codeiq.kuzu/` + BM25 FTS indexes | +| **3.** Query | `codeiq mcp \| stats \| find \| query \| cypher \| ...` | Read-only consumers of the Kuzu store | + +See [`docs/04-main-flows.md`](docs/04-main-flows.md) for per-flow entry points + failure modes. 
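The three-step lifecycle in the table above can be wrapped in a small shell function. This is a minimal sketch, not something shipped with codeiq: the name `codeiq_refresh` is hypothetical, and it assumes the two artifacts named in the table (`.codeiq/cache/codeiq.sqlite` and `.codeiq/graph/codeiq.kuzu`) are created under the repository path passed to `index` and `enrich`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of codeiq): run index, then enrich, and
# confirm that each step left its expected artifact behind.
codeiq_refresh() {
  local repo="${1:?usage: codeiq_refresh <repo-path>}"

  # Step 1: scan files into the SQLite cache.
  codeiq index "$repo" || return 1
  # Assumed cache location, taken from the lifecycle table.
  if [ ! -f "$repo/.codeiq/cache/codeiq.sqlite" ]; then
    echo "index finished but no cache was found" >&2
    return 2
  fi

  # Step 2: load the cache into the Kuzu graph.
  codeiq enrich "$repo" || return 1
  # Assumed graph location, taken from the lifecycle table.
  if [ ! -e "$repo/.codeiq/graph/codeiq.kuzu" ]; then
    echo "enrich finished but no graph was found" >&2
    return 3
  fi

  # Step 3 (stats, find, query, mcp, ...) can now read the graph.
  echo "graph ready: $repo/.codeiq/graph/codeiq.kuzu"
}
```

Calling `codeiq_refresh /path/to/repo` before any read-only command keeps the ordering explicit; the distinct return codes make it easier to tell which stage failed in a CI log.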
+ +--- ## Install -### Pre-built (Linux / macOS) +### Pre-built binary (Linux amd64 / arm64, macOS arm64) ```bash +# Pick your platform; replace if needed curl -L https://github.com/RandomCodeSpace/codeiq/releases/latest/download/codeiq_$(uname -s | tr A-Z a-z)_$(uname -m | sed s/x86_64/amd64/).tar.gz | tar xz sudo install codeiq /usr/local/bin/ codeiq --version ``` -Cosign keyless verification: -```bash -cosign verify-blob \ - --bundle checksums.sha256.cosign.bundle \ - --certificate-identity-regexp 'https://github.com/RandomCodeSpace/codeiq/.github/workflows/release-go.yml@.*' \ - --certificate-oidc-issuer https://token.actions.githubusercontent.com \ - checksums.sha256 -``` - -### From source (Go 1.25.0+ with CGO toolchain) +### `go install` ```bash CGO_ENABLED=1 go install github.com/randomcodespace/codeiq/cmd/codeiq@latest ``` -Or: +> **Requires** Go 1.25.0+ and a C/C++ toolchain (Kuzu, SQLite, and tree-sitter all need CGO). + +### Build from source + ```bash git clone https://github.com/RandomCodeSpace/codeiq.git cd codeiq CGO_ENABLED=1 go build -o /usr/local/bin/codeiq ./cmd/codeiq +codeiq --version ``` +Full setup checklist in [`docs/01-local-setup.md`](docs/01-local-setup.md). + +--- + ## Quickstart ```bash -codeiq index /path/to/repo # scan → SQLite cache (.codeiq/cache/codeiq.sqlite) -codeiq enrich /path/to/repo # load cache → Kuzu graph (.codeiq/graph/codeiq.kuzu) + build FTS indexes -codeiq stats /path/to/repo -codeiq find endpoints /path/to/repo -codeiq query consumers /path/to/repo -codeiq topology /path/to/repo -codeiq flow overview /path/to/repo --format mermaid -codeiq mcp /path/to/repo # stdio MCP server (for Claude Code / Cursor) -codeiq review /path/to/repo --base origin/main --head HEAD # local Ollama +# 1. Scan files → SQLite cache +codeiq index /path/to/repo + +# 2. Load cache → Kuzu graph + FTS indexes +codeiq enrich /path/to/repo + +# 3. 
Ask questions +codeiq stats /path/to/repo +codeiq find endpoints /path/to/repo +codeiq query consumers /path/to/repo +codeiq topology /path/to/repo +codeiq flow overview /path/to/repo --format mermaid + +# 4. Wire into your AI agent (Claude Code / Cursor / Cline) +codeiq mcp /path/to/repo + +# 5. Get an LLM-driven PR review (local Ollama by default) +codeiq review /path/to/repo --base origin/main --head HEAD ``` +--- + ## MCP integration -Add to your MCP client config (`.mcp.json`): +Add to your MCP client config (`.mcp.json` at the repo root, or your editor's MCP settings): ```json { "mcpServers": { - "code-mcp": { + "codeiq": { "command": "codeiq", "args": ["mcp", "/path/to/repo"] } @@ -83,27 +221,189 @@ Add to your MCP client config (`.mcp.json`): } ``` -Ten user-facing tools: six mode-driven (`graph_summary`, `find_in_graph`, `inspect_node`, `trace_relationships`, `analyze_impact`, `topology_view`) plus `run_cypher` (read-only Cypher escape hatch), `read_file`, `generate_flow`, `review_changes`. +
+Ten user-facing tools
+
+| Tool | Modes |
+|---|---|
+| `graph_summary` | `overview` / `categories` / `capabilities` / `provenance` |
+| `find_in_graph` | `nodes` / `edges` / `text` / `fuzzy` / `by_file` / `by_endpoint` |
+| `inspect_node` | `neighbors` / `ego` / `evidence` / `source` |
+| `trace_relationships` | `callers` / `consumers` / `producers` / `dependencies` / `dependents` / `shortest_path` |
+| `analyze_impact` | `blast_radius` / `trace` / `cycles` / `circular_deps` / `dead_code` / `dead_services` / `bottlenecks` |
+| `topology_view` | `summary` / `service` / `service_deps` / `service_dependents` / `flow` |
+| `run_cypher` | Read-only Cypher escape hatch; mutation gate enforced |
+| `read_file` | Path-sandboxed source reader (full file or line range) |
+| `generate_flow` | Architecture flow diagrams (mermaid / dot / yaml), 5 views |
+| `review_changes` | LLM-driven git-diff review against the graph (Ollama) |
+
+
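To check by hand that the server exposes the tools listed above, the handshake can be piped in on stdin. The sketch below only builds the request messages; it assumes the server follows the standard MCP stdio transport (newline-delimited JSON-RPC 2.0) and accepts the `2024-11-05` protocol version string. The function name and the `smoke-test` client name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the three messages an MCP client sends to
# initialize a session and then ask for the tool list, one per line.
mcp_list_tools_request() {
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.0"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}

# Usage (assumes codeiq is installed and the repo has been indexed and enriched):
#   mcp_list_tools_request | codeiq mcp /path/to/repo
```

If the assumptions hold, the response to request `id: 2` should list the tool names from the table. Whether the server answers before exiting when stdin closes is not confirmed here, so treat this as a manual smoke test rather than a supported interface.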
+ +--- + +## CLI cheatsheet + +
+Click to expand
+
+| Command | Purpose |
+|---|---|
+| `index [path]` | Scan files → SQLite analysis cache |
+| `enrich [path]` | Load cache → Kuzu graph + build FTS indexes |
+| `mcp [path]` | Stdio MCP server for Claude Code / Cursor |
+| `stats [path]` | Categorized statistics (graph / languages / frameworks / infra / connections / auth / architecture) |
+| `query [path]` | `consumers` / `producers` / `callers` / `dependencies` / `dependents` |
+| `find [path]` | `endpoints` / `guards` / `entities` / `topics` / `queues` / `services` / `databases` / `components` |
+| `cypher [path]` | Read-only Cypher against Kuzu |
+| `flow [path]` | Architecture diagrams: `overview` / `ci` / `deploy` / `runtime` / `auth` |
+| `graph [path]` | Export full graph as json / yaml / mermaid / dot |
+| `topology [path]` | Service topology + `service-detail` / `blast-radius` / `bottlenecks` / `circular` / `dead` / `path` |
+| `review [path]` | LLM-driven PR review (Ollama local by default; cloud via `OLLAMA_API_KEY`) |
+| `cache ` | Inspect / list / inspect-row / clear the SQLite cache |
+| `plugins ` | List + inspect registered detectors |
+| `version` | Build info (version, commit, date, Go toolchain, platform, features) |
+
+Run `codeiq --help` for full flag listings. Full reference in [`docs/05-configuration.md`](docs/05-configuration.md).
+
+
+ +--- + +## Architecture at a glance + +``` +codeiq/ +├── cmd/codeiq/main.go ── 5-line entry shim +├── internal/ +│ ├── analyzer/ ── index + enrich pipelines + GraphBuilder + ServiceDetector +│ ├── cache/ ── SQLite cache (WAL, content-hash keyed, 5 tables) +│ ├── cli/ ── cobra subcommands + detectors_register.go (choke point) +│ ├── detector/ ── 100 detectors organized by family +│ │ ├── jvm/{java,kotlin,scala}/ python/ typescript/ golang/ +│ │ ├── frontend/ csharp/ systems/{cpp,rust}/ iac/ structured/ +│ │ ├── auth/ proto/ sql/ markup/ script/shell/ generic/ +│ │ └── base/ ── shared helpers (NOT detectors) +│ ├── flow/ ── architecture-flow diagram engine +│ ├── graph/ ── Kuzu facade + FTS + mutation gate +│ ├── intelligence/ ── Lexical enricher + per-language extractors +│ ├── mcp/ ── 10 MCP tools (stdio JSON-RPC) +│ ├── model/ ── CodeNode / CodeEdge / NodeKind (34) / EdgeKind (28) / Confidence / Layer +│ ├── parser/ ── tree-sitter + structured parsers +│ ├── query/ ── service / topology / stats / dead-code Cypher templates +│ └── review/ ── PR-review pipeline (diff + Ollama) +├── parity/ ── parity harness (build tag `parity`) +├── testdata/ ── fixture-minimal + fixture-multi-lang +├── .github/workflows/ ── go-ci, perf-gate, release-go, release-darwin, security, scorecard +└── .goreleaser.yml ── Goreleaser v2 (CGO multi-arch + Cosign + Syft) +``` + +Deep dive in [`docs/02-architecture.md`](docs/02-architecture.md) and [`docs/03-code-map.md`](docs/03-code-map.md). + +--- + +## Verification (supply chain) + +Every release artifact is keyless-signed via Cosign + GitHub OIDC and recorded in the Sigstore Rekor transparency log. SLSA build provenance attestations land in GitHub's attestations store. 
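A verified signature on `checksums.sha256` only covers the manifest itself, so the downloaded tarball still has to be compared against it once the `cosign verify-blob` check in this section has passed. The helper below is a minimal sketch of that comparison: the function name is hypothetical, and it assumes the manifest uses the common `<sha256-hex>  <filename>` line format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check one downloaded artifact against a checksum
# manifest. Assumes "<sha256-hex>  <filename>" lines in the manifest.
verify_against_manifest() {
  local manifest="$1" artifact="$2" name expected actual
  name="$(basename "$artifact")"

  # Look up the expected digest for this file name (first match wins).
  expected="$(awk -v f="$name" '$2 == f || $2 == ("*" f) { print $1; exit }' "$manifest")"
  if [ -z "$expected" ]; then
    echo "no manifest entry for $name" >&2
    return 2
  fi

  # sha256sum is typical on Linux; shasum ships with macOS.
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$artifact" | awk '{ print $1 }')"
  else
    actual="$(shasum -a 256 "$artifact" | awk '{ print $1 }')"
  fi

  if [ "$expected" != "$actual" ]; then
    echo "checksum mismatch for $name" >&2
    return 1
  fi
  echo "OK: $artifact"
}
```

For example, `verify_against_manifest checksums.sha256 codeiq_0.4.1_linux_amd64.tar.gz` would print an `OK:` line only if the digest recorded in the manifest matches the file on disk.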
+ +### Verify the checksum manifest signature + +```bash +cosign verify-blob \ + --bundle checksums.sha256.cosign.bundle \ + --certificate-identity-regexp 'https://github.com/RandomCodeSpace/codeiq/.github/workflows/release-go.yml@.*' \ + --certificate-oidc-issuer https://token.actions.githubusercontent.com \ + checksums.sha256 +``` + +### Verify the darwin tarball (signed separately) + +```bash +cosign verify-blob \ + --bundle codeiq_0.4.1_darwin_arm64.tar.gz.cosign.bundle \ + --certificate-identity-regexp 'https://github.com/RandomCodeSpace/codeiq/.github/workflows/release-darwin.yml@.*' \ + --certificate-oidc-issuer https://token.actions.githubusercontent.com \ + codeiq_0.4.1_darwin_arm64.tar.gz +``` + +### Verify the SLSA build provenance + +```bash +gh attestation verify codeiq_0.4.1_linux_amd64.tar.gz --owner RandomCodeSpace +``` + +--- ## Documentation -| File | Topic | + + + + + + + + + + + +
Starter packReferenceOperate
+ +[Project overview](docs/00-project-overview.md)
+[Local setup](docs/01-local-setup.md)
+[Architecture](docs/02-architecture.md)
+[Main flows](docs/04-main-flows.md) + +
+ +[Code map](docs/03-code-map.md)
+[Configuration](docs/05-configuration.md)
+[Data model](docs/06-data-model.md)
+[Integrations](docs/07-integrations.md) + +
+ +[Testing](docs/08-testing.md)
+[Build / deploy / release](docs/09-build-deploy-release.md)
+[Known risks + TODOs](docs/10-known-risks-and-todos.md)
+[Agent handoff](docs/11-agent-handoff.md) + +
+
+Architectural decisions: [`docs/adr/`](docs/adr/). Repo-specific Claude Code instructions: [`CLAUDE.md`](CLAUDE.md).
+
+---
+
+## Project status
+
+| Surface | State |
 |---|---|
-| [`docs/00-project-overview.md`](docs/00-project-overview.md) | What it is, who it's for, current status |
-| [`docs/01-local-setup.md`](docs/01-local-setup.md) | Prereqs, build, test, common issues |
-| [`docs/02-architecture.md`](docs/02-architecture.md) | Components, data flow, tradeoffs |
-| [`docs/03-code-map.md`](docs/03-code-map.md) | Directory-by-directory tour |
-| [`docs/04-main-flows.md`](docs/04-main-flows.md) | index / enrich / mcp / review lifecycles |
-| [`docs/05-configuration.md`](docs/05-configuration.md) | env vars, `codeiq.yml`, CLI flags |
-| [`docs/06-data-model.md`](docs/06-data-model.md) | Kuzu + SQLite schemas, NodeKind/EdgeKind taxonomy |
-| [`docs/07-integrations.md`](docs/07-integrations.md) | External systems (Ollama, GitHub OIDC, Sigstore) |
-| [`docs/08-testing.md`](docs/08-testing.md) | Test strategy, fixtures, perf-gate |
-| [`docs/09-build-deploy-release.md`](docs/09-build-deploy-release.md) | Goreleaser, CI, supply-chain |
-| [`docs/10-known-risks-and-todos.md`](docs/10-known-risks-and-todos.md) | Gotchas, debt, security-sensitive areas |
-| [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md) | One-stop brief for future AI agents |
-| [`docs/adr/0001-current-architecture.md`](docs/adr/0001-current-architecture.md) | Why the architecture is what it is |
-| [`CLAUDE.md`](CLAUDE.md) | Repo-specific instructions for Claude Code |
+| CLI core (`index` / `enrich` / `stats` / `find` / `query` / `cypher`) | Production |
+| MCP stdio server (10 tools) | Production |
+| Kuzu 0.11.3 + native FTS (BM25) | Production |
+| Goreleaser pipeline + Cosign keyless | Production |
+| 884+ tests passing (race + vet + staticcheck + gosec + govulncheck on every PR) | Production |
+| `codeiq review` (LLM PR review) | Beta: works end-to-end against local Ollama |
+| `parity/` harness | Idle (Java→Go port artifact; build-tag gated) |
+
+Currently on **v0.4.1**. Release history was reset at v0.4.0; see [`docs/00-project-overview.md`](docs/00-project-overview.md) for context.
+
+---
+
+## Contributing
+
+- **Branch off `main`.** Conventional-commit subjects (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`).
+- **One logical change per commit.** Squash-merge only.
+- **Tests + race + vet must pass.** `CGO_ENABLED=1 go test ./... -race -count=1`.
+- **Determinism is non-negotiable.** Every new detector ships positive / negative / determinism tests.
+- **Read-only MCP.** Tool calls never mutate the graph. Index/enrich happen via the CLI.
+- New detector? Don't forget to blank-import it in [`internal/cli/detectors_register.go`](internal/cli/detectors_register.go); see [`CLAUDE.md`](CLAUDE.md) for the full how-to.
+
+Security: please report privately via [GitHub Security Advisories](https://github.com/RandomCodeSpace/codeiq/security/advisories/new).
+
+---
 ## License
-[MIT](LICENSE)
+MIT License
+
+Copyright © codeiq contributors. See [`LICENSE`](LICENSE).