diff --git a/.github/workflows/release-go.yml b/.github/workflows/release-go.yml index 258ecd75..aa0cf874 100644 --- a/.github/workflows/release-go.yml +++ b/.github/workflows/release-go.yml @@ -1,17 +1,14 @@ name: release-go -# Tag-triggered release pipeline for the codeiq Go binary. +# Tag-triggered release pipeline for the codeiq Go binary (linux/amd64 +# + linux/arm64). darwin/arm64 ships from `release-darwin.yml` on the +# same tag. # # Trigger: push a tag matching `v*.*.*` (e.g. `git tag v0.3.0 && git push --tags`). # -# v0.3.0 scope: linux/amd64 + linux/arm64 only. Single ubuntu-latest -# runner builds both via the linux→linux cross-compile with -# gcc-aarch64-linux-gnu (CGO permits this cross — both kuzu and -# go-sqlite3 build cleanly). -# -# darwin/arm64 deferred — needs a macos runner and separate matrix. -# Follow-up: add a `release-darwin.yml` that attaches darwin binaries -# to the same draft Release. +# Single ubuntu-latest runner builds both linux archs via +# linux→linux cross-compile with gcc-aarch64-linux-gnu (CGO permits +# this cross — both kuzu and go-sqlite3 build cleanly). on: push: @@ -57,9 +54,6 @@ jobs: args: release --clean env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - HOMEBREW_TAP_OWNER: RandomCodeSpace - HOMEBREW_TAP_REPO: homebrew-codeiq - HOMEBREW_TAP_GITHUB_TOKEN: ${{ secrets.HOMEBREW_TAP_GITHUB_TOKEN }} - name: Attest release artifacts (build provenance) uses: actions/attest-build-provenance@a2bbfa25375fe432b6a289bc6b6cd05ecd0c4c32 # v4.1.0 with: diff --git a/.goreleaser.yml b/.goreleaser.yml index 092a4f1f..f381b740 100644 --- a/.goreleaser.yml +++ b/.goreleaser.yml @@ -61,8 +61,8 @@ builds: - -X 'github.com/randomcodespace/codeiq/go/internal/buildinfo.Dirty={{.IsGitDirty}}' goos: [linux] goarch: [arm64] - # darwin/arm64 deferred — needs a macos runner. Follow-up: - # release-darwin.yml attaches macOS binaries to the same draft Release. 
+ # darwin/arm64 ships from `release-darwin.yml` (macos-14 runner) and + # attaches to the same Release that this config creates. archives: - id: codeiq @@ -114,28 +114,6 @@ signs: output: true signature: '${artifact}.cosign.bundle' -# Homebrew tap publish — opt-in via $HOMEBREW_TAP_GITHUB_TOKEN. When the -# env var is empty (forks, dry runs), the upload is skipped so the same -# .goreleaser.yml works for the owning org and downstream forks alike. -brews: - - name: codeiq - repository: - owner: '{{ envOrDefault "HOMEBREW_TAP_OWNER" "RandomCodeSpace" }}' - name: '{{ envOrDefault "HOMEBREW_TAP_REPO" "homebrew-codeiq" }}' - token: '{{ envOrDefault "HOMEBREW_TAP_GITHUB_TOKEN" "" }}' - skip_upload: '{{ if eq (envOrDefault "HOMEBREW_TAP_GITHUB_TOKEN" "") "" }}true{{ else }}false{{ end }}' - commit_author: - name: codeiq-bot - email: noreply@github.com - directory: Formula - homepage: 'https://github.com/RandomCodeSpace/codeiq' - description: 'Deterministic code knowledge graph + MCP server' - license: 'Apache-2.0' - install: | - bin.install "codeiq" - test: | - assert_match "codeiq", shell_output("#{bin}/codeiq --version") - release: github: owner: RandomCodeSpace diff --git a/CHANGELOG.md b/CHANGELOG.md index edf4311c..1a220740 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -39,7 +39,6 @@ What ships in v0.3.0 (carrying forward from the c363727 squash + c630245 release - Goreleaser cross-platform binaries (linux/amd64, linux/arm64, darwin/arm64), SPDX SBOMs, Cosign keyless signatures via GitHub OIDC + Sigstore Rekor. -- Optional Homebrew tap publish (`RandomCodeSpace/homebrew-codeiq`). - Per-PR perf-regression gate (`perf-gate.yml`). ### Removed @@ -65,565 +64,3 @@ can `git show c363727:` or `git checkout c363727 -- `. every `v*.*.*` tag push. Each archive ships with an SPDX SBOM (Syft), and the `checksums.sha256` manifest is keyless-signed via Cosign + GitHub OIDC (Sigstore Rekor transparency log). 
Optional - Homebrew tap publish to `RandomCodeSpace/homebrew-codeiq` — - skipped silently when the secret isn't configured, so forks can - reuse the same workflow. Build provenance attestations via - `actions/attest-build-provenance`. Runbook at - [`shared/runbooks/release-go.md`](shared/runbooks/release-go.md). -- **`.github/workflows/perf-gate.yml`** — per-PR perf-regression gate. - Runs `codeiq index` against `fixture-multi-lang` and fails the build - if wall-clock exceeds 8 s, node count drops below 40, or the - phantom-edge drop ratio crosses 50%. Catches regex-pathology - regressions like the CertificateAuthDetector pre-screen miss that - blew up PSA indexing from 0.1 s to 42 s mid-port. -- **Go port (Phases 1-4 of the rewrite)** — codeiq is being ported from - Java/Spring Boot to a single static Go binary on the `port/go-port` - branch. PR #130. 100 detectors at 1:1 parity with the Java side; 34 MCP - tools (deprecated) + 6 consolidated mode-driven tools (new); `codeiq - review` CLI + `review_changes` MCP tool for LLM-driven PR review via - Ollama (Cloud or local). Java tree untouched until Phase 6 cutover. -- **Graph dedup + determinism** (Go side) — `GraphBuilder` deduplicates - nodes by ID with confidence-aware merging, edges by canonical - `(source, target, kind)` tuple. Linker output sorted at the boundary. - `codeiq index` surfaces "Deduped: N nodes, M edges Dropped: K phantom - edges" so graph hygiene is visible. -- **`codeiq review`** — LLM-driven review of `git diff base..head` against - the indexed graph. Defaults to local Ollama (`gpt-oss:20b`); set - `OLLAMA_API_KEY` to flip to Ollama Cloud. `--format=markdown|json`, - `--out`, `--focus`. Graph evidence (nodes-in-file + 1-hop blast radius) - attached per changed file when the Kuzu store is enriched. -- **`review_changes` MCP tool** — same review flow exposed over MCP for - agent-driven invocation. Strictly read-only against the graph. 
-- OpenSSF supply-chain wiring — Best Practices project - [12650](https://www.bestpractices.dev/projects/12650), live Scorecard at - [securityscorecards.dev](https://api.securityscorecards.dev/projects/github.com/RandomCodeSpace/codeiq), - manifest at `.bestpractices.json`, README badges. (RAN-46, RAN-52, RAN-57) -- `.github/workflows/scorecard.yml` — OpenSSF Scorecard analysis on push + - weekly cron (Mondays 06:00 UTC), SARIF → Security tab. All actions - SHA-pinned per Scorecard `Pinned-Dependencies`. -- `.github/workflows/security.yml` — consolidated OSS-CLI security stack - per RAN-46 path-B board ruling: OSV-Scanner (npm SCA), Trivy (filesystem + - Maven + container CVEs + IaC misconfig), Semgrep (SAST: `p/security-audit` - + `p/owasp-top-ten` + `p/java`), Gitleaks (secret scan, full git history), - jscpd (duplication < 3% on production code), `anchore/sbom-action` (SPDX + - CycloneDX SBOM). Six gate-blocking jobs (SBOM is artifact-only). -- `SECURITY.md` — private vulnerability-disclosure policy, supported-versions - table, triage SLAs (acknowledgement < 72 h, initial triage < 7 d), and - coordinated-disclosure timeline. -- `shared/runbooks/` — `engineering-standards.md` (quality gates, code style, - branch/commit/PR rules, testing tiers, security stack, build & distribution, - documentation), `release.md`, `rollback.md`, `first-time-setup.md`, - `test-strategy.md`. SSoT for cross-cutting engineering rules. -- `scripts/setup-git-signed.sh` — one-shot ssh-signed-commit setup helper. -- `CLAUDE.md` "Supply-chain observability (OpenSSF)" section — operator-level - summary of the Best Practices state, Scorecard baseline + target (≥ 8.0/10 - stretch with eight checks at max), known floor reductions, and the OSS-CLI - stack reference. 
(RAN-52 AC #7) -- `PROJECT_SUMMARY.md` (repo-root agent entry doc) and - [`docs/project/`](docs/project/) deep-dives (architecture, data-model, - build-and-run, conventions, ui, flows) — written for AI agents and humans - who need to understand and modify the codebase, every claim grounded in a - file path. Sits alongside `CLAUDE.md` (which remains the canonical - hand-maintained internals doc). -- `docs/specs/` — directory for active architectural design specs. First - entry: `2026-04-27-resolver-spi-and-java-pilot-design.md`, the design for - sub-project 1 of the "robust graph" decomposition (symbol-resolver SPI - between parse and detect, Java pilot via JavaParser's `JavaSymbolSolver`, - `Confidence` enum + `source` field on every `CodeNode` / `CodeEdge`, - 4–6 Java detectors migrated, 9 layers of aggressive testing). Implementation - in flight on `feat/sub-project-1-resolver-spi-and-java-pilot`. -- **Symbol-resolver SPI** (sub-project 1, Phases 1–4 of the resolver-and-Java-pilot - plan): the foundation for moving the graph from regex-class-of-correctness - to AST-and-symbol-resolution-class-of-correctness. New `Confidence` enum - (`LEXICAL`/`SYNTACTIC`/`RESOLVED` with stable `score()` mapping) plus a - `source` field land on every `CodeNode` and `CodeEdge`, round-trip through - Neo4j (bare `confidence`/`source` properties on nodes and `RELATES_TO` - relationships) and through the H2 analysis cache (`CACHE_VERSION` bumped - 4 → 5 so existing v4 caches drop and rebuild on next open). Read paths are - non-throwing — legacy data without these fields reads back as - `LEXICAL`/null, never NPEs. New SPI under - `intelligence/resolver/`: `Resolved` interface + `EmptyResolved` singleton - sentinel, `SymbolResolver` per-language backend, `ResolutionException`, - `ResolverRegistry` (Spring `@Service` with deterministic alphabetical - bootstrap, case-insensitive lookup, per-resolver failure isolation). 
First - backend `JavaSymbolResolver` wraps `javaparser-symbol-solver-core` 3.28.0 - (Apache-2.0, same release train as `javaparser-core`) with a - `JavaSourceRootDiscovery` that walks Maven/Gradle/plain layouts under a - project root (skipping `target/`, `build/`, `node_modules/`, `.git/`, etc.; - symlink-loop-safe via `NOFOLLOW_LINKS`). `DetectorContext` now carries an - `Optional` (`withResolved()` opt-in, `Optional.empty()` for every - detector that doesn't care — fully backward compatible). `Detector.defaultConfidence()` - declares the per-detector floor (`LEXICAL` for regex bases, `SYNTACTIC` for - AST/structured/JavaParser/JavaMessaging bases) and `DetectorEmissionDefaults.applyDefaults` - is wired into every `detector.detect()` call site in `Analyzer.java` — - emissions whose `source` is null get stamped at the orchestration boundary - (detectors that explicitly stamp survive untouched). 11 atomic commits - ship with ~290 new tests covering happy paths, legacy-data fallbacks, - malformed inputs, determinism, concurrency-safe construction, and singleton - invariants. - -- **Resolver pipeline wiring + Java pilot detectors** (sub-project 1, plan - Phases 4 + 6 — follow-up to the SPI scaffolding above): the resolver - is now actually invoked end-to-end and four Java detectors consume - `ctx.resolved()` to emit RESOLVED-tier edges with stable - fully-qualified-name targets. - - `Analyzer` now bootstraps `ResolverRegistry` exactly once per pipeline - entry point (`run` / `runBatchedIndex` / `runSmartIndex`) and threads a - `Resolved` onto every `DetectorContext` at all three detect call sites - (`analyzeFile`, the batched-index variant, the regex-only fallback). - Per-file `ResolutionException` + `RuntimeException` are swallowed and - fall back to `EmptyResolved.INSTANCE`, so one resolver blow-up cannot - take down the whole pass. 
- - `JavaSymbolResolver.resolve()` now lazy-parses raw source `String` - content with a fresh symbol-solver-configured `JavaParser` per call — - a small per-call allocation that lets `Analyzer` pass the file content - directly (the orchestrator-level structured parser doesn't cover Java). - Permissive parsing returns `JavaResolved` with a possibly-error-laden - `CompilationUnit` rather than refusing — production analysis must keep - going across files with syntax errors. - - Four detectors migrated to consume `ctx.resolved()` (purely additive — - every existing detector test passes unchanged): - - **JpaEntityDetector** — `MAPS_TO` edges between entities now carry - `target_fqn` and `Confidence.RESOLVED` when the symbol solver can - pin the relationship target's FQN (handles `@OneToMany List`, - `@ManyToOne Owner`, both direct-field and generic-arg cases). - - **RepositoryDetector** — Spring Data repo `QUERIES` edges plus the - repo node carry the resolved entity FQN (`entity_fqn` / - `target_fqn`) when `JpaRepository` resolves. - - **SpringRestDetector** — endpoints emit a `MAPS_TO` edge to the - `@RequestBody` DTO class when the parameter type resolves, with - `parameter_kind=request_body` + `parameter_name` properties for - downstream consumers (SPA, MCP). - - **ClassHierarchyDetector** — `EXTENDS` / `IMPLEMENTS` edges across - classes, interfaces, and enums now stamp `Confidence.RESOLVED` + - `target_fqn` when the parent type resolves, collapsing four - duplicated in-line edge-emission blocks into a single - `addHierarchyEdge` helper as a side-benefit. - - Backward compatibility is total: when no resolver is registered or - `JavaSymbolResolver.bootstrap` fails, every detector returns the - same simple-name-targeted edge shape it shipped before this slice. - - 18 new wiring + resolved-mode tests on top of the SPI's ~290 — every - migration ships with the plan-required three-mode coverage (resolved, - fallback, mixed). 
-- **AKS read-only deploy hardening** (sub-project 2): runbook at - [`shared/runbooks/aks-read-only-deploy.md`](shared/runbooks/aks-read-only-deploy.md), - JVM-flag-preset launcher at [`scripts/aks-launch.sh`](scripts/aks-launch.sh), - and a sentinel test asserting the script contains every required flag. - Enables `codeiq serve` inside an AKS pod with - `securityContext.readOnlyRootFilesystem=true` and a writable `/tmp` - emptyDir: an init-container copies the graph bundle from Nexus into - `/tmp/codeiq-data`; the main container runs `aks-launch.sh /tmp/codeiq-data`. - Zero source-code changes to the serve profile or Neo4j wiring — solved at - the deployment layer plus Spring-Boot-loader / `java.io.tmpdir` / - `-XX:ErrorFile` / `-XX:HeapDumpPath` overrides. Spec at - [`docs/specs/2026-04-28-aks-read-only-deploy-design.md`](docs/specs/2026-04-28-aks-read-only-deploy-design.md). - -- **Resolver aggressive-testing layers** (sub-project 1, plan Phase 7 — - Layers 1, 3, 4, 5, 6, 7, 8, 9): the spec §12 testing matrix lands as - six new test classes plus a non-default Maven profile. - - **Layer 1** — `JavaSymbolResolverLayer1ExtendedTest` (16 tests): - deeply-nested generics, static / non-static inner classes, records, - sealed hierarchies, enum-with-abstract-methods, default-method - interfaces, abstract classes, annotation types, same simple name in - different packages by import, JDK `Optional` / `Stream` / `List` via - `ReflectionTypeSolver`, multi-source-root cross-references - (`src/main` ↔ `src/test`), wildcard imports, cyclic imports. - - **Layer 3** — `JavaSymbolResolverConcurrencyTest` (already shipped - in the prior commit): virtual-thread fan-out under `N=200` files / - `256` concurrent calls, garbage-input variant. - - **Layer 4** — `JavaSymbolResolverPathologicalTest` (3 tests): - 10K-line class, 1000 imports (most unresolvable), 10-deep generic - nesting; per-test `@Timeout` is the regression sentinel against - quadratic memoization. 
- - **Layer 5** — `JavaSymbolResolverAdversarialTest` (5 tests): - unbalanced braces (strict-success → `EmptyResolved`), mis-tagged - Kotlin / random-bytes (no exception, no null), mixed source root - with `.java` + `.txt` siblings, empty source root (no Java files - anywhere) bootstraps via `ReflectionTypeSolver` alone. - - **Layer 6** — `JavaSymbolResolverDeterminismTest` (already shipped): - same input → same FQN 25× in a row, two independent resolvers - agree, rebootstrap is observably idempotent, deeper FQNs are stable. - - **Layer 7** — `E2EResolverPetclinicTest` (env-gated): runs the - resolver against every `.java` under `$E2E_PETCLINIC_DIR`, asserts - bootstrap < 10 s, no exception, > 50% files produce `JavaResolved` - (i.e. strict-success isn't false-rejecting valid Java). Lighter than - spec §12 Layer 7's full precision/recall comparison — that requires - a pre-resolver baseline JSON checked into test resources, captured - at implementation time. This stand-in is the strongest signal until - that baseline lands. - - **Layer 8** — `JavaSymbolResolverRandomizedTest` (1 test, 100 - samples): hand-rolled randomized generator with fixed seed; per the - plan's license guidance, jqwik (EPL-2.0) is not on the preferred- - license list, and this is the documented JUnit + `java.util.Random` - fallback. Properties: never throws, never returns null, completes - per file in < 1 s. - - **Layer 9** — `mutation` Maven profile (non-default): adds - `pitest-maven` 1.18.0 (Apache-2.0) targeting - `intelligence.resolver.*` and `model.Confidence`. Run with - `mvn -P mutation org.pitest:pitest-maven:mutationCoverage - -Dfrontend.skip=true -Ddependency-check.skip=true`. Reports under - `target/pit-reports/`. 
- - Four robustness fixes from a dual-agent (superpowers + codex) - brainstorm landed on the same branch: `volatile` on - `JavaSymbolResolver`'s `solver` / `combined` fields, strict - parse-success check in the String-source branch (was silently - emitting partial-CU edges on broken parses), `StackOverflowError` - catch in `Analyzer.resolveFor` (pathological generics no longer kill - virtual threads), `try-with-resources` on the `Files.walk` in - `JavaSourceRootDiscovery.containsJavaFile` (fd leak fix). 26 new - tests on top of the resolver wiring slice's 18 — full suite at 3618 - / 0 / 32 skipped, +1 skip is the env-gated E2E petclinic test. - -### Changed - -- Documentation count drift fixed: detector total updated from **97 → 99** - (live count, excluding `Abstract*` and `*Helper*`); `NodeKind` total - updated from **32 → 34** (javadoc at `model/NodeKind.java` was stale by - two entries); `EdgeKind` total updated from **27 → 28** (javadoc at - `model/EdgeKind.java` was stale by one entry). `README.md`, `CLAUDE.md`, - `PROJECT_SUMMARY.md`, `docs/project/*.md`, and the source javadocs are - now in sync. - -- Branch protection on `main` requires every commit to be ssh-signed - (RAN-46 AC #2). Force-pushes to `main` are rejected; squash-merge from - PRs is the only path. -- Top-level `permissions: read-all` on every GitHub Actions workflow per - Scorecard `Token-Permissions`. Per-job permissions opt into narrower - writes only where required (`security-events: write` for SARIF upload; - `id-token: write` for the Scorecard publish step). -- Quality gate stack converged to OSS-CLI only: SpotBugs (`mvn spotbugs:check`), - JaCoCo coverage (≥ 85% line, project-wide), Semgrep + Trivy + OSV-Scanner + - Gitleaks + jscpd from `security.yml`, plus OpenSSF Scorecard as - observability. (RAN-46 path-B board ruling.) - -### Removed - -- SonarCloud, CodeQL (default-setup and workflow-driven), and OWASP - Dependency-Check are no longer part of the merge gate. 
Per the RAN-46 - path-B board ruling, they are not to be re-introduced without an explicit - board reversal — see `shared/runbooks/engineering-standards.md` §5.1. - -### Security - -- **Production-readiness PR 1 of 5 — security baseline.** First half of the - audit findings catalogued under `docs/audits/2026-04-28-serve-path-prod-readiness.md` - (+ `-counter.md`). Closes audit findings #1, #7, #13 (HIGH/MEDIUM) and C2 (MEDIUM). - - **Bearer-token auth on `/api/**` and `/mcp/**`** (audit #1). Added - `spring-boot-starter-security`. New `config/security/SecurityConfig`, - `BearerAuthFilter`, `TokenResolver`. Token source priority: - `CODEIQ_MCP_TOKEN` env > `codeiq.mcp.auth.token` config > startup failure. - Constant-time compare via SHA-256 pre-hash + `MessageDigest.isEqual` — - 32-byte digests on both sides defeat the length oracle. RFC 7235 §2.1 - case-insensitive scheme matching (`Bearer`, `bearer`, etc.). Authorization - header value never reaches a logger from this code. Permit list: - `/`, `/index.html`, `/favicon.ico`, `/assets/**`, `/static/**`, `/error`, - `/actuator/health/{liveness,readiness}` — everything else under - `/api/**`, `/mcp/**`, `/actuator/**` requires the bearer token. - - **Fail-fast on misconfiguration** (audit #14 partial). `mode=bearer` with - no token resolved → throws at startup. `mode=none` with active `serving` - profile and `allow_unauthenticated` not explicitly set → throws at - startup. `mode=mtls` is reserved and explicitly throws "not yet - implemented" rather than silently passing through. - - **Defensive response headers** (audit #13). New - `config/security/SecurityHeadersFilter` sets `X-Content-Type-Options: - nosniff`, `X-Frame-Options: DENY`, `Content-Security-Policy: default-src - 'self'; ... frame-ancestors 'none'`, `Referrer-Policy: no-referrer`, - `Permissions-Policy` disabling geolocation/camera/microphone. 
- `Strict-Transport-Security: max-age=31536000; includeSubDomains` is set - only when `X-Forwarded-Proto: https` is present (AKS terminates TLS at - ingress) — setting HSTS over plain HTTP would lock out misconfigured envs. - - **Uniform error envelope** (audit #7). New - `api/GlobalExceptionHandler` (`@RestControllerAdvice`, - `@Profile("serving")`) maps every uncaught exception to - `{"code","message","request_id"}` with the right HTTP status. - `IllegalArgumentException` → 400 with surfaced message. - `ResponseStatusException` → status code passes through. Anything else → - 500 with generic message; the actual exception is logged at WARN with - the `request_id` so on-call can correlate without leaking stack frames - to the client. `application.yml` now sets - `server.error.include-stacktrace: never` + `include-message: never` + - `include-binding-errors: never` as belt-and-suspenders. - - **Default CORS deny-all in serving** (audit #13). `config/CorsConfig` - default changed from loopback patterns to empty. Empty means register - no mappings → Spring MVC rejects all preflighted cross-origin requests. - Operators who genuinely need cross-origin (e.g. dev with a separate - Vite server on a different port) explicitly set - `codeiq.cors.allowed-origin-patterns`. Logs the resolved state at - startup. The React UI at `/` is unaffected — it's served same-origin. - - **Swagger UI / api-docs disabled in serving** (counter-audit C2). - `springdoc.api-docs.enabled: false` + `springdoc.swagger-ui.enabled: false` - in the serving profile of `application.yml`. The OpenAPI schema is - reconnaissance data; reachable only when running locally or with the - indexing profile. - - **`management.endpoints.web.exposure.include` narrowed** to `health,info` - in serving (was `health,info,metrics`); `health.show-details: never`. - Defense-in-depth alongside the `SecurityFilterChain` `authenticated()` - rule on `/actuator/**`. 
- - **Spring Security autoconfig excluded outside serving.** Without the - `serving` profile (CLI, tests, IDE runs), Spring Security's default - HTTP Basic chain would lock all endpoints — adding the starter would - break ~3000 existing tests that pass through MockMvc with no token. - `application.yml` excludes `SecurityAutoConfiguration`, - `SecurityFilterAutoConfiguration`, `UserDetailsServiceAutoConfiguration` - at the default level; the `serving` profile re-enables them by listing - only `UserDetailsServiceAutoConfiguration` (so the auto user/password - is suppressed but the filter chain is built from `SecurityConfig`). - - **Tests:** 31 new unit tests across `BearerAuthFilterTest` (14 cases: - missing/wrong/empty/correct/lowercase scheme, length-oracle defense, - log-leak audit, `shouldNotFilter` paths, `SecurityContextHolder` cleanup), - `TokenResolverTest` (9 cases for mode/profile/env-priority/fail-fast), - `SecurityHeadersFilterTest` (5 cases for header presence/HSTS gating), - `GlobalExceptionHandlerTest` (3 cases verifying the envelope shape and - no stack-trace leak). Full suite: 3453 tests / 0 failures / 0 errors. - - **Known follow-up (not in this PR):** the React UI cannot read env vars, - so the SPA shell is unauthenticated to access static assets. API/MCP calls - from the UI must inject `Authorization: Bearer ` from - operator-supplied localStorage. A first-class UI auth bootstrap (login - flow + token-issuance endpoint, OR server-side template injection) is its - own design — tracked as a follow-up issue. - -- **Production-readiness PR 2 of 5 — resource limits & abuse protection.** - Closes audit findings #2, #3, C1 (HIGH) and #10, #11 (MEDIUM). - - **Cypher transaction timeout** (audit #2). Neo4j embedded - `GraphDatabaseSettings.transaction_timeout = 30s` configured in - `Neo4jConfig` — every transaction in the JVM, including `run_cypher` - and graph traversals, gets a hard wall-clock cap. 
Catches runaway - variable-length matches before they starve the page cache. - - **Result-set cap on `run_cypher`** (audit #2). Hard row cap at - `mcp.limits.max_results` (default 500); excess rows dropped, response - carries `truncated: true` + `max_results: N`. Defends the JVM heap - against `MATCH (a),(b),(c) RETURN a,b,c LIMIT 999999999` blowups. - - **MCP `traceImpact` depth cap** (audit #10 corrected, C3). New - `mcp.limits.max_depth` field (default 10) wired into - `McpTools.traceImpact` via `Math.min`. Defends against - `RELATES_TO*1..1000` Cartesian explosions on hub nodes. - - **TTL snapshot cache on topology tools** (audit C1). `McpTools. - getCachedData()` now backed by a 60-second TTL snapshot. Without it, - every concurrent `service_dependencies` / `blast_radius` / - `find_path` / `find_bottlenecks` / `find_circular_deps` / - `find_dead_services` / `find_node` call paid the full - `graphStore.findAll()` cost and double-allocated multi-GB heaps. - A bridge fix; the proper refactor (TopologyService → per-tool Cypher) - is a tracked follow-up. - - **Per-client rate limiter** (audit #3). New `RateLimitFilter` using - Bucket4j 8.18.0 (Apache-2.0). Token bucket sized at - `mcp.limits.rate_per_minute` (default 300). Keyed by SHA-256 hash of - the `Authorization` header (so the token never lives in our key map), - falls back to `X-Forwarded-For` (first hop) or `RemoteAddr`. 429 - response with `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining` - headers. Registered before `BearerAuthFilter` so unauthenticated - brute-force is also throttled. - - **`/api/file` content-type sniff** (audit #11 corrected). Added - `Files.probeContentType` guard — non-text MIMEs (`.jks`, `.so`, - `.png`, native libs) return HTTP 415 with the probed type, instead - of being served as garbled `text/plain`. Allowlist: `text/*`, - `application/json`, `application/xml`, `application/x-yaml`, - `application/javascript`. 
The byte cap (already enforced by - `SafeFileReader`) is unchanged. - - **Tomcat slow-client tarpit** (audit #11). `server.tomcat.connection- - timeout: 10s`, `max-swallow-size: 1MB` in the serving profile — - drops connections that hold a virtual thread + Tomcat connection at - 1 KB/s. - - **CodeQL hardening on the security baseline.** Sanitised request - method + URI before logging in `BearerAuthFilter` (CWE-117 / CodeQL - `java/log-injection`); removed env-var name from the bearer-token - bootstrap log line in `TokenResolver` (CodeQL `java/sensitive-log`); - documented the deliberate stateless-bearer rationale on - `SecurityConfig.csrf(disable)` (CodeQL `java/spring-disabled-csrf-protection` - — no exploit path on a no-cookie surface). - - **Tests:** new `RateLimitFilterTest` (10 cases: under/over limit, - separate buckets per client, header-hashing, X-Forwarded-For - precedence, permit-list, default-rate fallback). Existing 6 test - classes updated for the new `McpTools` ctor signature. Full suite: - 3672 tests / 0 failures / 0 errors. - - **Known follow-up:** TopologyService still walks the full snapshot - in-memory after the cache hit — long-term plan is to rewrite each - topology tool as a targeted Cypher query so the snapshot isn't needed. - The cache is the bridge; the rewrite reduces peak memory. - -- **Production-readiness PR 3 of 5 — supply chain & bundle integrity.** - Closes the air-gap drift, missing bundle integrity, and unpinned - scanner versions audit findings. - - **`codeiq bundle` SHA-256 manifest.** Every entry in `bundle.zip` - (manifest, scripts, graph DB files, H2 cache, source tree, flow.html, - optional CLI JAR) is now hashed as it streams through the - `ZipOutputStream`, and a `checksums.sha256` entry is written last in - standard GNU coreutils format. Receivers verify with - `sha256sum -c checksums.sha256`. 
The hash is computed by feeding each - chunk to both the SHA-256 digest and the ZIP stream — no double-read - even for multi-hundred-MB graph databases. Order is deterministic - (sorted dir walks + sorted git ls-files), so the resulting - `checksums.sha256` is byte-stable. - - **No public-internet calls in launcher scripts.** `serve.sh` and - `serve.bat` previously fell back to `curl -fL https://repo1.maven.org/...` - when the CLI JAR wasn't bundled — incompatible with the air-gapped - deploy model documented in `~/.claude/rules/build.md`. The Maven - Central download is removed; if the JAR is missing, the launcher - fails fast and tells the operator to either `--include-jar` when - bundling or stage from an internal artifact mirror. `serve.sh` also - runs `sha256sum -c --quiet checksums.sha256` automatically before - launching (skip with `CODEIQ_SKIP_VERIFY=1`). - - **Pinned Semgrep version.** `.github/workflows/security.yml` was - `pip install semgrep` (floating) — Scorecard's - `Pinned-Dependencies` flagged it. Now pinned to `semgrep==1.161.0` - (latest stable as of 2026-04-28). Bumps go through Dependabot's pip - ecosystem on a documented cadence. - - **Tightened secret-pattern exclusions.** `.gitignore` previously - only matched `.env` / `.env.local` — gaps for `.env.prod`, - `.env.test`, JKS / P12 keystores, SSH private keys, and - cloud-credential JSON. Broadened to `.env.*` plus explicit globs - for `*.jks`, `*.p12`, `*.pfx`, `*.keystore`, `id_{rsa,ecdsa,ed25519,dsa}`, - `credentials.{json,yaml}`, `secrets.{json,yaml}`, - `*.serviceaccount.json`. `.dockerignore` mirrors the same rules - (Docker resolves COPY against the build context, which includes - untracked working-tree files; .dockerignore does not inherit - .gitignore). 
- - **Bundle verification runbook.** `shared/runbooks/release.md` §4a - documents consumer-side `sha256sum -c` workflow, including the - deliberate exclusion of `checksums.sha256` from itself (would be - circular) and the Sigstore/GPG out-of-band signing that backs - `checksums.sha256` against tampering. - - **Tests:** `BundleCommandTest#bundleCreatesZipWithCorrectStructure` - extended with 4 new asserts: serve.sh contains no `curl`/`maven.org` - references (defense against re-introduction), `checksums.sha256` - exists, format-conforms to `<64-hex> `, and excludes itself. - Full suite: 3672 tests / 0 failures / 0 errors. - -- **Production-readiness PR 4 of 5 — observability.** Closes the missing-MDC, - hot-path-health-probe, MCP-error-leak, and structured-logging gaps. - - **`RequestIdFilter` (new).** Populates SLF4J `MDC.request_id` FIRST in - the security chain so every downstream filter, controller, MCP tool, - and exception handler sees the same correlation ID. Strict allow-list - on inbound `X-Request-Id` (8–64 hex/dash/underscore chars) prevents - log-forging; bad inputs are replaced with a generated UUID. Echoes - the ID back to the client in the `X-Request-Id` response header. MDC - is cleared in `finally` to prevent leak across pooled threads (both - Tomcat platform and virtual-thread carriers). Pre-PR-4 every - `MDC.get("request_id")` call returned null; the four downstream - consumers (BearerAuthFilter, RateLimitFilter, GraphController, - GlobalExceptionHandler) all generated synthetic UUIDs that never - correlated. - - **JSON-structured logging** (`logback-spring.xml`). Serving profile - switches the encoder from `%msg%n` plaintext to LogstashEncoder - (`logstash-logback-encoder` 9.0 — MIT). One JSON event per log line - with `ts`, `level`, `logger`, `thread`, `msg`, `stack`, all MDC - entries (`request_id`), and a static `application: codeiq` field for - multi-pod ingestion. 
Indexing/CLI profiles keep plaintext to avoid - JSON noise leaking into `codeiq index` output. - - **`GraphHealthIndicator` 30s TTL cache.** Pre-PR-4 every readiness - probe (k8s default ~1Hz) ran a `MATCH (n) RETURN count(n)` Cypher - query — wakes the page cache, competes with API traffic. - `AtomicReference` lock-free cache absorbs the flood; - one underlying probe per 30s regardless of caller concurrency. - Error response sanitized too: pre-PR-4 the `error` detail - surfaced `e.getMessage()` (CodeQL `java/error-message-exposure`, - permitAll endpoint = anonymous probers). Now only an `error_class` - indicator; full stack is logged at WARN. - - **Liveness/readiness groups** (`application.yml`). Pre-PR-4 - `GraphHealthIndicator` contributed to BOTH probes — a graph-down - event would flap the pod (k8s killing it) instead of just routing - away. Pinned to readiness only: - `liveness: livenessState`, `readiness: readinessState, - graphHealthIndicator`. - - **Prometheus metrics** (`/actuator/prometheus`). Added - `micrometer-registry-prometheus` dep. Exposed under the bearer- - authenticated `/actuator/**` rule (NOT permitAll — full metrics - tree is reconnaissance data). Application tag `codeiq` for - multi-pod scraping. Step interval 10s. - - **Structured MCP error envelope.** Pre-PR-4 every MCP tool catch - block returned `toJson(Map.of("error", e.getMessage()))` — flat - string, no correlation. Refactored to a centralized - `errorEnvelope(code, e)` helper that returns - `{code, message, request_id, error}` (legacy `error` field - preserved for backwards-compat). Codes assigned per failure - category: `INTERNAL_ERROR`, `INVALID_INPUT`, `FILE_READ_FAILED`, - `SERIALIZATION_FAILED`. Full exception logged server-side with - request_id; only sanitized envelope reaches the client. `readFile` - no longer concatenates `e.getMessage()` into a string (CWE-209). 
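The request-ID allow-list and the structured error envelope from the PR 4 notes above fit together: the sanitized ID is what ends up in the envelope's `request_id` field. The Go sketch below illustrates that idea under stated assumptions. It is not the removed `RequestIdFilter` or `errorEnvelope` Java code; the function names are hypothetical, and filling the legacy `error` field with the same sanitized message is an assumption, since the notes only say the field is preserved.

```go
package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
	"regexp"
)

// requestIDPattern accepts 8-64 characters drawn from hex digits, dash and
// underscore, matching the allow-list described in the notes.
var requestIDPattern = regexp.MustCompile(`^[0-9a-fA-F_-]{8,64}$`)

// newUUID returns a random version-4 UUID string (36 characters).
func newUUID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// sanitizeRequestID keeps a well-formed inbound ID and replaces anything
// else (control characters, wrong length, other symbols) with a fresh UUID.
func sanitizeRequestID(inbound string) string {
	if requestIDPattern.MatchString(inbound) {
		return inbound
	}
	return newUUID()
}

// errorEnvelope mirrors the {code, message, request_id, error} shape.
type errorEnvelope struct {
	Code      string `json:"code"`
	Message   string `json:"message"`
	RequestID string `json:"request_id"`
	Error     string `json:"error"` // legacy field; content is an assumption
}

// buildEnvelope serializes a client-safe envelope. The caller passes a
// sanitized message only; the raw exception text stays in server logs.
func buildEnvelope(code, publicMessage, requestID string) (string, error) {
	out, err := json.Marshal(errorEnvelope{
		Code:      code,
		Message:   publicMessage,
		RequestID: requestID,
		Error:     publicMessage,
	})
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```

Rejecting rather than escaping malformed IDs keeps the logging path simple: whatever reaches the log line or the envelope has already passed the allow-list, so a CR/LF sequence can never be echoed back.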
- - **Tests:** new `RequestIdFilterTest` (7 cases — UUID generation, - header pass-through, control-char rejection, length bounds, MDC - clear-in-finally including throw path). `GraphHealthIndicatorTest` - extended with cache-hit assertion (3 calls → 1 underlying - `count()`) and updated for sanitized error fields. - `McpToolsTest#readFileShouldHandleMissingFile` updated for new - envelope contract. Full suite: 3680 tests / 0 failures / 0 errors. - -- **Production-readiness PR 5 of 5 — config validation, integration coverage, - docs refresh.** Final PR of the production-readiness series. Closes the - remaining audit findings around silent oversized-input clamping, missing - end-to-end coverage of the serving filter chain, and stale tech-stack - pins in `CLAUDE.md`. - - **MCP request-bound clamping in `McpTools`.** `queryNodes` / - `queryEdges` `limit` parameters are now `Math.min(requested, - mcp.limits.max_results)` so a caller asking for `LIMIT 1_000_000` no - longer trips the JVM into a multi-GB allocation before the - `run_cypher` row cap kicks in. `getEgoGraph` radius is clamped to - `mcp.limits.max_depth` for the same reason — a `radius=999` ego - walk on a hub node is a Cartesian explosion. `searchGraph` limit - follows the same rule. Per-call defense-in-depth on top of the - transaction-timeout cap from PR 2. - - **`ConfigValidator` hard ceilings + blank-string checks.** Added - explicit validations for fields previously only typed as - `Integer`/`Long` with no range: - - `mcp.limits.max_payload_bytes` — must be `> 0` (was silently - `null` → no payload cap → infinite-row run_cypher could OOM). - - `mcp.limits.rate_per_minute` — must be `> 0`. - - `mcp.limits.max_depth` — must be `1..100`. The 100 ceiling is a - DoS sentinel: variable-length Cypher with depth >100 is - pathological in practice (a graph with 100M nodes and fan-out 5 - reaches every node by depth 12), so anything higher is either a - misconfig or a reconnaissance probe. 
Catch at config-load, not - at query time. - - `mcp.auth.token_env` / `mcp.auth.token` — when mode=bearer, - blank-string values fail validation rather than being silently - coerced to null and then fail-fasting at startup with a - mysterious "no token resolved" message. - - **`ServingChainIntegrationTest` (new — 9 cases).** Fills the gap - where each filter (`RequestIdFilter`, `SecurityHeadersFilter`, - `RateLimitFilter`, `BearerAuthFilter`) had unit-test coverage in - isolation but no test exercised the full chain together. Asserts - the cross-filter contract: 401 envelope shape with `request_id` - echoed in the `X-Request-Id` response header; 429 envelope with - `Retry-After` and `X-RateLimit-Remaining: 0`; security headers - present on every response (success, 401, 429); inbound - `X-Request-Id` propagated end-to-end when valid; control-char - inbound rejected and replaced with a generated UUID; rate-limit - bucket isolation per token (one client exhausting their bucket - does not affect another); health endpoint bypasses auth (kubelet - probes carry no token). Manually chains the four filters via - lambda `FilterChain` instances rather than spinning up a full - `@SpringBootTest` so the run is sub-second and doesn't need - Neo4j. Lives in `io.github.randomcodespace.iq.config.security` - package to access package-private `TokenResolver.resolve()`. - - **`CLAUDE.md` tech-stack pin refresh.** Stale version pins - updated to current: Spring Boot 4.0.5 → 4.0.6, Spring AI 2.0.0-M3 - → 2.0.0-M4, Neo4j Embedded 2026.02.3 → 2026.04.0; added - Bucket4j 8.18.0, logstash-logback-encoder 9.0, - micrometer-registry-prometheus to the dependency list. - - **Tests:** new `ServingChainIntegrationTest` (9 cases). Full - suite: 3689 tests / 0 failures / 0 errors / 32 skipped. - -## [0.1.0] - 2026-03-28 - -First general-availability cut. See the -[v0.1.0 GitHub Release](https://github.com/RandomCodeSpace/codeiq/releases/tag/v0.1.0) -for the full notes. 
- -- 97 detectors across 35+ languages. -- Three-command pipeline: `index` → `enrich` → `serve`. -- Read-only REST API (37 endpoints), MCP server (34 tools, Spring AI 2.0 - streamable HTTP), and React UI shipped inside a single signed JAR. -- Maven Central coordinates: `io.github.randomcodespace.iq:code-iq`. - -## [0.0.1-beta.0] – [0.0.1-beta.46] - 2026-Q1 - -Pre-GA beta line. Full per-tag notes on -[GitHub Releases](https://github.com/RandomCodeSpace/codeiq/releases?q=prerelease%3Atrue). -The beta cadence shipped from `beta-java.yml` on `workflow_dispatch`; each -beta is an immutable Sonatype Central beta artifact + GPG-signed annotated -git tag + GitHub pre-release. - -[Unreleased]: https://github.com/RandomCodeSpace/codeiq/compare/v0.1.0...HEAD -[0.1.0]: https://github.com/RandomCodeSpace/codeiq/releases/tag/v0.1.0 diff --git a/CLAUDE.md b/CLAUDE.md index 409ccd51..8e8f5719 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -36,8 +36,6 @@ landing) and `c630245` (release infra). helper. - **`spf13/cobra`** — CLI framework. Subcommand registration via `internal/cli` blank imports. -- **`golang-jwt/jwt/v5`** — token validation surface (kept from a - serve-mode prototype; serve isn't fully ported yet). ## Architecture @@ -46,10 +44,15 @@ landing) and `c630245` (release infra). ``` index: FileDiscovery → Parsers → Detectors (goroutine pool) → GraphBuilder → SQLite cache enrich: SQLite → Linkers → LayerClassifier → LexicalEnricher → LanguageEnricher → ServiceDetector → Kuzu (COPY FROM) -serve: (deferred — not ported in v0.3.0) mcp: Kuzu → QueryService → 6 consolidated MCP tools + run_cypher escape hatch + review_changes ``` +codeiq has no REST API and no web UI surface — by design. Consumers +interact through the CLI or through the stdio MCP server (read-only). +The Java reference had a `codeiq serve` subcommand (Spring Boot REST ++ React SPA); both were removed in the Go port and will not be +reintroduced. 
+ ### Pipeline components - **`internal/analyzer/file_discovery.go`** — `git ls-files` first, @@ -426,10 +429,8 @@ Release pipeline: - Release tag must be `v*.*.*`; pre-releases use the `vX.Y.Z-rc.N` form (Goreleaser `prerelease: auto` honors it). - Cosign keyless via GitHub OIDC — no long-lived key on the runner. - Verification needs the cert + sig + the OIDC identity regex (see - `shared/runbooks/release-go.md`). -- Homebrew tap publish is opt-in via `HOMEBREW_TAP_GITHUB_TOKEN`. - Forks leave the secret unset and the brew step skips silently. + Verification needs the cosign bundle file + the OIDC identity regex + (see `shared/runbooks/release-go.md`). ## Updating This File diff --git a/PROJECT_SUMMARY.md b/PROJECT_SUMMARY.md index 49d8612c..1b94c05b 100644 --- a/PROJECT_SUMMARY.md +++ b/PROJECT_SUMMARY.md @@ -5,9 +5,8 @@ > > **Canonical depth lives in [`CLAUDE.md`](CLAUDE.md)** (~16 KB, > agent-oriented, hand-maintained). This file is a thin entry point -> that links into `CLAUDE.md`, the runbooks under -> [`shared/runbooks/`](shared/runbooks/), and the deep-dives under -> [`docs/project/`](docs/project/). +> that links into `CLAUDE.md` and the runbooks under +> [`shared/runbooks/`](shared/runbooks/). 
## Identity @@ -54,7 +53,7 @@ codeiq/ │ │ ├── cli/ — cobra subcommands │ │ ├── detector/ — 100 detectors organized by category │ │ ├── flow/ — architecture-flow diagram engine -│ │ ├── graph/ — Kuzu facade (read-only on serve path) +│ │ ├── graph/ — Kuzu facade (read-only) │ │ ├── intelligence/ — lexical + language extractors + evidence + planner │ │ ├── mcp/ — MCP server + tool definitions │ │ ├── model/ — CodeNode, CodeEdge, kinds, Confidence @@ -65,8 +64,7 @@ codeiq/ │ ├── testdata/ — fixtures (fixture-minimal, fixture-multi-lang) │ ├── go.mod │ └── go.sum -├── .github/workflows/ — go-ci, perf-gate, release-go, security, scorecard -├── docs/project/ — architecture + conventions + flows deep-dives +├── .github/workflows/ — go-ci, perf-gate, release-go, release-darwin, security, scorecard ├── shared/runbooks/ — release-go.md + engineering-standards.md ├── CHANGELOG.md ├── CLAUDE.md — SSoT internals doc @@ -104,8 +102,8 @@ CGO_ENABLED=1 go build -o /usr/local/bin/codeiq ./cmd/codeiq ``` **Required env / external services**: none for build. At run-time the -binary reads `OLLAMA_API_KEY` (optional) and `HOMEBREW_TAP_GITHUB_TOKEN` -(release-side only). +binary reads `OLLAMA_API_KEY` (optional, switches `codeiq review` to +Ollama Cloud). ## Conventions an agent must respect @@ -126,8 +124,7 @@ binary reads `OLLAMA_API_KEY` (optional) and `HOMEBREW_TAP_GITHUB_TOKEN` boundary; detectors override only when they have higher-confidence evidence. -Full set in [`CLAUDE.md` §Code Conventions](CLAUDE.md#code-conventions) -and [`docs/project/conventions.md`](docs/project/conventions.md). +Full set in [`CLAUDE.md` §Code Conventions](CLAUDE.md#code-conventions). ## Gotchas @@ -147,8 +144,6 @@ and [`docs/project/conventions.md`](docs/project/conventions.md). 
## Where to look next -- Architecture & components → [`docs/project/architecture.md`](docs/project/architecture.md) -- Conventions (full) → [`docs/project/conventions.md`](docs/project/conventions.md) - Build & release → [`shared/runbooks/release-go.md`](shared/runbooks/release-go.md) - MCP integration → [`README.md#mcp-integration`](README.md#mcp-integration) - Internal SSoT → [`CLAUDE.md`](CLAUDE.md) diff --git a/README.md b/README.md index c8bd356e..9ed83c54 100644 --- a/README.md +++ b/README.md @@ -63,13 +63,6 @@ cosign verify-blob \ checksums.sha256 ``` -### Homebrew - -```bash -brew tap RandomCodeSpace/codeiq -brew install codeiq -``` - ### Build from source Requires Go 1.25.10+ and a C toolchain (CGO). @@ -153,11 +146,10 @@ the graph) are dropped at snapshot. Every run prints a "Deduped: N nodes, M edges Dropped: K phantom edges" line so graph hygiene is visible. -See [`docs/project/architecture.md`](docs/project/architecture.md) for -the pipeline (FileDiscovery → tree-sitter / regex → detectors → -GraphBuilder → linkers → LayerClassifier → Kuzu) and -[`docs/project/conventions.md`](docs/project/conventions.md) for the -detector authoring contract. +Pipeline: FileDiscovery → tree-sitter / regex → detectors → +GraphBuilder → linkers → LayerClassifier → Kuzu. See +[`CLAUDE.md`](CLAUDE.md) for the full architecture and the detector +authoring contract. ## Releases diff --git a/docs/project/architecture.md b/docs/project/architecture.md deleted file mode 100644 index 43186093..00000000 --- a/docs/project/architecture.md +++ /dev/null @@ -1,132 +0,0 @@ -# Architecture - -## High-level shape - -codeiq is a **two-mode Spring Boot application** that ships as one JAR with the React SPA bundled inside: - -- **Indexing mode** (`index`, `enrich`, and most other CLI commands): Spring profile `indexing`, no web server, virtual-thread-driven file scanning + detector pipeline writing to H2 (cache) then Neo4j Embedded (graph). 
-- **Serving mode** (`serve` only): Spring profile `serving`, web server up, REST API + Spring AI MCP server + React SPA reading from the already-populated Neo4j directory. Strictly read-only — no detector code runs in this profile. - -``` - ┌──────────────────────────┐ - filesystem ───► │ index (FileDiscovery + │ - (any repo) │ Detectors + GraphBuilder)│ ──► H2 cache (.codeiq/cache/) - └──────────────────────────┘ - │ - ┌──────────────────────────┐ │ - │ enrich (Linkers + │ ◄───────┘ - │ LayerClassifier + │ - │ ServiceDetector + │ - │ LanguageEnricher + │ - │ LexicalEnricher) │ ──► Neo4j (.codeiq/graph/graph.db) - └──────────────────────────┘ │ - │ - developer / agent ◄── REST + MCP + React SPA ◄──── serve ◄─────┘ - (read-only) (Spring profile = serving) -``` - -Profile selection happens in `CodeIqApplication.java`'s `main` (around the `boolean isServe = "serve".equalsIgnoreCase(command)` block): the first CLI arg is matched against `serve` → `serving`; everything else → `indexing`. `indexing` sets `WebApplicationType.NONE`. - -## Components - -### Pipeline orchestrator (`analyzer/`) -- **Lives in:** `src/main/java/io/github/randomcodespace/iq/analyzer/` -- **Responsibility:** Discover files, route to parsers, fan out to detectors on virtual threads, fold results into a single graph buffer, then run cross-file linkers and the layer classifier. -- **Key files:** - - `Analyzer.java` — top-level pipeline (in-memory mode for `analyze` command). - - `FileDiscovery.java` — `git ls-files` first, falls back to directory walk; maps extensions → languages via `FileClassifier.java`. - - `StructuredParser.java` — routes Java to JavaParser, ANTLR-supported langs to `grammar/AntlrParserFactory.java`, others to raw text. - - `GraphBuilder.java` — buffered build (nodes-first, then edges) — determinism guarantee. - - `LayerClassifier.java` — sets `layer ∈ {frontend|backend|infra|shared|unknown}` on every node. 
- - `ServiceDetector.java` — filesystem walk for build files (30+ build systems) → SERVICE nodes with `CONTAINS` edges. - - `linker/` — 4 linkers run after detectors: `EntityLinker`, `GuardLinker`, `ModuleContainmentLinker`, `TopicLinker` (`Linker.java` is the interface; `LinkResult.java` is the return type). - - `ConfigScanner.java`, `InfrastructureRegistry.java`, `ArchitectureKeywordFilter.java` — supporting passes. -- **Talks to:** `detector/` (fan-out), `cache/AnalysisCache.java` (write), `graph/GraphStore.java` (write — only during `enrich`). -- **Owns:** in-memory graph buffer during a single run. - -### Detector layer (`detector/`) -- **Lives in:** `src/main/java/io/github/randomcodespace/iq/detector/` -- **Responsibility:** 99 concrete detectors that turn parsed files into nodes + edges. Auto-discovered as Spring `@Component`s; no registry to maintain. -- **Categories (one subdir each):** `auth/`, `csharp/`, `frontend/`, `generic/`, `go/`, `iac/`, `jvm/{java,kotlin,scala}/`, `markup/`, `proto/`, `python/`, `script/{shell,...}/`, `sql/`, `structured/`, `systems/{cpp,rust}/`, `typescript/`. -- **Base classes:** `Detector` (interface), `AbstractRegexDetector`, `AbstractJavaParserDetector`, `AbstractAntlrDetector`, `AbstractStructuredDetector`, `AbstractPythonAntlrDetector`, `AbstractPythonDbDetector`, `AbstractTypeScriptDetector`, `AbstractJavaMessagingDetector`. Plus three static helpers: `DetectorDbHelper`, `FrontendDetectorHelper`, `StructuresDetectorHelper`. Full table: see [`conventions.md`](conventions.md) §"Detector base classes". -- **Talks to:** parsed AST input (JavaParser CompilationUnit, ANTLR ParseTree, or raw text) via `DetectorContext`. Writes to a thread-local `DetectorResult`. -- **Owns:** nothing — must be stateless. Spring beans are singletons. 
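The two properties stated above, stateless detectors and a fan-out whose result does not depend on which worker finishes first, can be illustrated with a small Go sketch. It is an assumption-laden analogue of the removed Java pipeline, not a port of it: `detector`, `classDetector`, and `runAll` are hypothetical names, and goroutines stand in for virtual threads.

```go
package main

import (
	"strings"
	"sync"
)

// detector is stateless by construction: everything it needs arrives as
// arguments and everything it produces is returned.
type detector func(path, content string) []string

// classDetector is a toy detector that emits one node ID per line that
// starts with "class ".
func classDetector(path, content string) []string {
	var nodes []string
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "class ") {
			name := strings.TrimPrefix(line, "class ")
			nodes = append(nodes, "node:"+path+":class:"+name)
		}
	}
	return nodes
}

// runAll fans the detector out over all files and folds the results back
// in input order, regardless of which goroutine completes first.
func runAll(paths []string, contents map[string]string, d detector) []string {
	slots := make([][]string, len(paths)) // one slot per file, by index
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			slots[i] = d(p, contents[p]) // each goroutine writes only its own slot
		}(i, p)
	}
	wg.Wait()

	var out []string
	for _, s := range slots {
		out = append(out, s...)
	}
	return out
}
```

Writing into a pre-sized slice by index avoids both a shared lock and a post-hoc sort: ordering is decided by the (already sorted) input list, not by scheduling.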
- -### Graph store (`graph/`) -- **Lives in:** `src/main/java/io/github/randomcodespace/iq/graph/` -- **Responsibility:** Facade over Neo4j Embedded — UNWIND-batched bulk save for writes, raw Cypher for reads (no Spring Data Neo4j hydration on the read path for performance). -- **Key files:** - - `GraphStore.java` — `bulkSave(List, List)`, `queryNodes(...)`, fulltext search via `db.index.fulltext.queryNodes`. Creates 5 indexes on first save (3 b-tree + 2 fulltext — see [`data-model.md`](data-model.md)). - - `GraphRepository.java` — Spring Data Neo4j repository, used **only on the write path** (legacy). -- **Talks to:** Neo4j Embedded via `org.neo4j.graphdb` API (no Bolt for in-process reads). -- **Owns:** the Neo4j directory at `.codeiq/graph/graph.db/`. - -### Analysis cache (`cache/`) -- **Lives in:** `src/main/java/io/github/randomcodespace/iq/cache/` -- **Responsibility:** Per-file content-hash cache so re-running `index` only re-detects changed files. -- **Key files:** `AnalysisCache.java` (H2 schema + read/write API, `ReentrantReadWriteLock`-guarded, `CACHE_VERSION = 4`), `FileHasher.java` (SHA-256, 64-hex output). - -### REST API (`api/`) -- **Lives in:** `src/main/java/io/github/randomcodespace/iq/api/` -- **Files:** `GraphController.java` (`/api/**`), `FlowController.java` (`/api/flow/**`), `TopologyController.java` (`/api/topology/**`), `IntelligenceController.java` (`/api/intelligence/**`), `SafeFileReader.java` (helper, path-traversal guard). -- All controllers carry `@Profile("serving")` — they aren't loaded in indexing mode. -- 37 endpoints, all read-only. Full enumeration in [`CLAUDE.md`](../../CLAUDE.md) §"Server Endpoints". - -### MCP server (`mcp/`) -- **File:** `src/main/java/io/github/randomcodespace/iq/mcp/McpTools.java` — 34 `@McpTool`-annotated methods. Spring AI's `spring-ai-starter-mcp-server-webmvc` auto-registers them on a streamable HTTP transport at `/mcp`. Read-only. 
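The analysis cache described above (a per-file content hash so that a re-run only re-detects changed files) reduces to a small lookup. The sketch below is a hypothetical in-memory Go version for illustration only: the removed implementation was H2-backed and versioned via `CACHE_VERSION`, neither of which is modelled here, and the names `hashCache` and `needsDetect` are invented.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// hashContent returns the 64-character hex SHA-256 of a file's bytes.
func hashContent(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// hashCache remembers the last seen content hash per path.
type hashCache struct {
	mu   sync.Mutex
	seen map[string]string
}

func newHashCache() *hashCache {
	return &hashCache{seen: make(map[string]string)}
}

// needsDetect reports whether the file changed since the last call for the
// same path, and records the new hash when it did.
func (c *hashCache) needsDetect(path string, content []byte) bool {
	h := hashContent(content)
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.seen[path] == h {
		return false // unchanged: detectors can be skipped for this file
	}
	c.seen[path] = h
	return true
}
```

Keying on content rather than modification time means a `git checkout` that rewrites timestamps without changing bytes does not invalidate the cache.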
- -### Intelligence enrichment (`intelligence/`) -- **Lives in:** `src/main/java/io/github/randomcodespace/iq/intelligence/` -- **Sub-packages:** `lexical/` (doc-comment + snippet enrichment), `extractor/` (per-language extractors: `java/`, `typescript/`, `python/`, `go/`), `evidence/` (evidence-pack assembly for retrieval), `query/` (`QueryPlanner` for intelligent routing). -- Runs during `enrich` after structural data is in Neo4j; produces `prop_lex_*` properties indexed by the `lexical_index` fulltext index. - -### CLI (`cli/`) -- **Lives in:** `src/main/java/io/github/randomcodespace/iq/cli/` -- **Files:** 20 — `CodeIqCli.java` (top-level), 14 commands (`Index`, `Enrich`, `Serve`, `Analyze`, `Stats`, `Graph`, `Query`, `Find`, `Cypher`, `Topology`, `Flow`, `Bundle`, `Cache`, `Plugins`), config subcommands (`ConfigCommand`, `ConfigExplainSubcommand`, `ConfigValidateSubcommand`), `VersionCommand`, helper `CliOutput`. -- All commands are `@Component`s; Picocli + Spring integration via `picocli-spring-boot-starter`. - -### React SPA (`src/main/frontend/`) -- See [`ui.md`](ui.md). Vite builds into `src/main/resources/static/` — Spring Boot's static handler serves it from inside the JAR when `codeiq.ui.enabled=true`. - -## Layering / dependency rules - -The package graph enforces a one-way flow: - -``` -cli/ ──► analyzer/ ──► detector/ ─► model/ - │ │ - └► linker/ └► grammar/ (ANTLR factory) - │ - ├► cache/ (H2) - └► graph/ (Neo4j) ──► api/ ──► query/ (read path) - │ - └► mcp/ (same QueryService) -``` - -- `model/` (CodeNode, CodeEdge, NodeKind, EdgeKind) is the dependency floor — depends on nothing in this codebase. -- `detector/` may import `model/` and `grammar/` — never `analyzer/`, `cli/`, or `api/`. -- `api/` and `mcp/` may import `query/` and `model/` — never `detector/` or `analyzer/` (read-only at serving time). -- `analyzer/` may import everything below it — it's the orchestrator. 
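The layering rules above are small enough to express as data. The Go sketch below shows one way such a check could look, purely as an illustration: the `forbidden` table is transcribed from the bullets above, the `violations` function and its input shape are hypothetical, and nothing in the source suggests a checker like this existed in the repository.

```go
package main

import "sort"

// forbidden maps a package to the packages it must never import,
// transcribed from the layering bullets.
var forbidden = map[string][]string{
	"detector": {"analyzer", "cli", "api"},
	"api":      {"detector", "analyzer"},
	"mcp":      {"detector", "analyzer"},
}

// violations takes a package -> imports map and returns every edge that
// breaks the rules, sorted for stable output.
func violations(imports map[string][]string) []string {
	var out []string
	for pkg, deps := range imports {
		for _, dep := range deps {
			if pkg == "model" {
				// model is the dependency floor: any internal import is a violation.
				out = append(out, "model -> "+dep)
				continue
			}
			for _, bad := range forbidden[pkg] {
				if dep == bad {
					out = append(out, pkg+" -> "+dep)
				}
			}
		}
	}
	sort.Strings(out)
	return out
}
```

For example, an input where `api` imports `detector` and `model` imports `graph` would report exactly those two edges, while `detector` importing `model` and `grammar` passes.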
- -The `@Profile("serving")` annotation on every controller and on Neo4j-only beans (see `config/Neo4jConfig.java`) is what enforces "no writes during serving" at runtime; the package layering is convention, not a lint rule. - -## Cross-cutting concerns - -- **Logging:** SLF4J + Spring Boot's default Logback. `application.yml` quiets noisy `org.springframework.ai.mcp` and `PostProcessorRegistrationDelegate` to WARN. -- **Error handling:** Pipeline errors are logged + counted, never abort a whole run. Detector exceptions are caught per-file (the run continues with a logged warning); see `Analyzer.java` task wrapping. CLI commands return `int` exit codes via Picocli. -- **Auth / authz:** None — codeiq runs on the developer's machine. The serving layer trusts the loopback caller. CORS is configurable via `codeiq.cors.allowed-origin-patterns` (`application.yml` / `CorsConfig.java`). -- **Observability:** Spring Boot Actuator (`/actuator/health` with liveness + readiness probes per `application.yml`); `health/GraphHealthIndicator.java` reports Neo4j status. No metrics export — by design (offline tool). -- **Config:** Hierarchical, last-wins: built-in defaults → `~/.codeiq/config.yml` → `./codeiq.yml` → `CODEIQ_*` env → CLI flags. `UnifiedConfigBeans` bridges the unified config to the legacy `CodeIqConfig` bean. Spring-owned keys (`codeiq.neo4j.enabled`, `codeiq.neo4j.bolt.port`, `codeiq.cors.allowed-origin-patterns`, `codeiq.ui.enabled`) live in `application.yml` because they drive `@ConditionalOnProperty` / `@Value` wiring. Full schema: [`docs/codeiq.yml.example`](../codeiq.yml.example). - -## Concurrency model - -- Detector fan-out runs on **virtual threads** (`Executors.newVirtualThreadPerTaskExecutor()` in `Analyzer.java`). Java 25 + JEP 491 means `synchronized` and `j.u.c.locks` no longer pin carrier threads, so the cache's `ReentrantReadWriteLock` is purely a logical concurrency primitive — not a workaround. 
-- Detectors are stateless `@Component` singletons (Spring's default scope). Per-file mutable state lives in method-local `DetectorContext` / `DetectorResult` instances. -- `GraphBuilder` collects results into indexed slots (one per file) so iteration order is independent of thread completion order — this is the determinism guarantee. - -## Why it's shaped this way - -- **Three-stage pipeline (`index`/`enrich`/`serve`) instead of one all-in-one `analyze`:** large codebases (44 K+ files in the original target) blow heap if scanning + Neo4j ingestion happen in the same JVM run. `index` writes to H2 in batches (default 500), `enrich` reads from H2 and bulk-loads with UNWIND. `analyze` is kept as a legacy in-memory shortcut for small repos. See `CLAUDE.md` §"Pipeline". -- **Embedded Neo4j (not a server):** zero-ops deployment for an offline tool; bundle model means the serving host doesn't even need source code, just the `.codeiq/graph/` directory. -- **Read-only serving layer:** lets the server be deployed to a "remote" environment where source code is forbidden, while analysis still happens on the developer's box. See [`CLAUDE.md`](../../CLAUDE.md) §"Critical Rules / Read-Only Serving Layer". -- **Auto-discovery of detectors via `@Component`:** detectors are added by dropping a class — no registry edits, no plugin manifest. The trade-off is that mistakes (forgetting `@Component`) silently disable a detector; the `plugins` CLI command exists to introspect what's actually live. diff --git a/docs/project/build-and-run.md b/docs/project/build-and-run.md deleted file mode 100644 index ba1d5916..00000000 --- a/docs/project/build-and-run.md +++ /dev/null @@ -1,152 +0,0 @@ -# Build & Run - -## Prerequisites - -- **Java 25** (Temurin recommended — pinned in CI: `.github/workflows/ci-java.yml` sets `distribution: 'temurin'` and `java-version: '25'` on `actions/setup-java`). -- **Maven 3.9+** (Maven Wrapper not committed; `mvn` from system path is expected). 
-- **Node.js + npm** for the frontend build. The `frontend-maven-plugin` (configured in `pom.xml`) downloads its own Node automatically — you don't need a system Node unless you run `npm` directly inside `src/main/frontend/`. -- **No Docker, no Postgres, no Redis** — codeiq is offline-first. Neo4j and H2 are embedded. - -## First-time setup - -```bash -git clone https://github.com/RandomCodeSpace/codeiq.git -cd codeiq - -# Quickest validation — skip tests, skip the security gate -mvn clean package -DskipTests -Ddependency-check.skip=true - -# Resulting JAR -ls target/code-iq-*-cli.jar -``` - -The first `mvn verify` (the full CI gate) downloads ~1 GB of NVD data for OWASP dependency-check. Use `-Ddependency-check.skip=true` while iterating locally; CI runs the full check on every push. - -Source for these steps: `pom.xml` (the `` block + plugin executions further down) and [`shared/runbooks/first-time-setup.md`](../../shared/runbooks/first-time-setup.md). - -## Local development loop - -There's no hot-reload story for the Java side — codeiq is a CLI/server, not a long-running dev server. The typical loop: - -```bash -# Edit Java source, then -mvn test -Dtest=YourDetectorTest -Dfrontend.skip=true # fastest single-test cycle -mvn package -DskipTests -Ddependency-check.skip=true # repackage the JAR -java -jar target/code-iq-*-cli.jar index /path/to/scan-target -java -jar target/code-iq-*-cli.jar enrich /path/to/scan-target -java -jar target/code-iq-*-cli.jar serve /path/to/scan-target -``` - -For the **frontend** (live HMR against a running backend): - -```bash -# Terminal 1 — run the Java backend -java -jar target/code-iq-*-cli.jar serve /path/to/scan-target - -# Terminal 2 — run Vite dev server (proxies /api and /mcp to localhost:8080) -cd src/main/frontend -npm install -npm run dev -``` - -Vite proxy config: `src/main/frontend/vite.config.ts` (`server.proxy` at the bottom of the file) — `/api` and `/mcp` go to `http://localhost:8080`. 
- -## Test layers - -- **Unit + integration (JUnit, ~236 test files):** - ```bash - mvn test # all tests - mvn test -Dtest=SpringRestDetectorTest # one class - mvn test -Dsurefire.useFile=false # verbose stderr to console - ``` - Tests live in `src/test/java/**` mirroring the source-tree package layout. **Detector tests must include positive, negative, and determinism cases** — see existing `*DetectorTest.java`. - -- **E2E quality tests (Context7-grounded ground truth):** - ```bash - E2E_PETCLINIC_DIR=/path/to/spring-petclinic mvn test -Dtest=E2EQualityTest - ``` - Ground-truth JSON lives under `src/test/resources/e2e/ground-truth-*.json`. Skipped automatically when the env var isn't set. - -- **Frontend E2E (Playwright):** - ```bash - cd src/main/frontend - npm run test:e2e # headless - npm run test:e2e:headed # with browser visible - npm run test:e2e:report # open last report - ``` - -- **CI gate:** - ```bash - mvn verify - ``` - Includes everything above (`mvn test` plus `spotbugs:check` and `dependency-check:check` bound to the `verify` phase). Failing any of those breaks the build. See `pom.xml` plugin executions and `.github/workflows/ci-java.yml`. - -## Build artifacts - -- **What:** a single fat JAR — `target/code-iq-*-cli.jar` (Spring Boot repackaged executable JAR). -- **Bundles:** all Java deps + the React SPA built into `src/main/resources/static/` by the `frontend-maven-plugin` during `mvn package`. -- **Maven coordinates:** `io.github.randomcodespace.iq:code-iq` (see `` / `` in `pom.xml`). The artifactId stays `code-iq` historically; the binary command is `codeiq`. -- **Releases:** - - Beta: `.github/workflows/beta-java.yml` — `workflow_dispatch` only → Sonatype Central beta + GitHub pre-release. - - GA: `.github/workflows/release-java.yml` — `workflow_dispatch` with a `version` input → builds a GPG-signed release commit on a detached HEAD, deploys to Sonatype Central, then pushes a GPG-signed annotated `vX.Y.Z` tag + GitHub Release. 
**No tag-push trigger; no auto-release on merge.** See [`shared/runbooks/release.md`](../../shared/runbooks/release.md). - -## Deploy - -There is no SaaS surface, no container image, no VPS. codeiq runs on the developer's machine. The deploy flow: - -1. User adds the dep / downloads the JAR from Maven Central or GitHub Releases. -2. User runs `codeiq index → enrich → serve` against their own repo. -3. The `serve` mode binds `0.0.0.0:8080` by default — exposed only to the local machine unless the user reconfigures. - -For codeiq's own release (publishing to Maven Central): see [`shared/runbooks/release.md`](../../shared/runbooks/release.md). Rollback: [`shared/runbooks/rollback.md`](../../shared/runbooks/rollback.md). - -## CLI reference - -20 files under `src/main/java/io/github/randomcodespace/iq/cli/` define 14 user-facing commands. Authoritative table is in [`CLAUDE.md`](../../CLAUDE.md) §"CLI Commands"; condensed here: - -| Command | Purpose | Profile | -|---|---|---| -| `index ` | Memory-efficient batched scan → H2 cache | `indexing` | -| `enrich ` | Load H2 → Neo4j; run linkers, classifier, services | `indexing` | -| `serve ` | Read-only REST + MCP + UI on `http://localhost:8080` | **`serving`** | -| `analyze ` | Legacy in-memory all-in-one (small repos only) | `indexing` | -| `stats ` | 7-category statistics from Neo4j | `indexing` | -| `graph ` | Export graph (JSON / YAML / Mermaid / DOT) | `indexing` | -| `query ` | Preset relationship queries (consumers, producers, ...) 
| `indexing` | -| `find ` | Preset finds (endpoints, guards, entities, topics) | `indexing` | -| `cypher ` | Raw Cypher against Neo4j | `indexing` | -| `topology ` | Service topology (blast radius, cycles, bottlenecks) | `indexing` | -| `flow ` | Architecture flow diagrams | `indexing` | -| `bundle ` | Pack graph + source snapshot into ZIP | `indexing` | -| `cache ` | Inspect / clear / stats H2 cache | `indexing` | -| `plugins ` | List / inspect detectors | `indexing` | -| `config validate` / `config explain` | Unified-config tooling | `indexing` | -| `version` | Show version info | `indexing` | - -Profile selection happens in `CodeIqApplication.java`'s `main` (the `boolean isServe = "serve".equalsIgnoreCase(command)` block) — `serve` activates `serving` (web server on); everything else activates `indexing` (`WebApplicationType.NONE`). - -## Build phases — what runs when - -| Phase | What runs | Source | -|---|---|---| -| `generate-sources` | ANTLR codegen from `*.g4` files | `pom.xml` `antlr4-maven-plugin` | -| `process-resources` | `frontend-maven-plugin`: install Node, `npm ci`, `npm run build` → `src/main/resources/static/` | `pom.xml`, `src/main/frontend/vite.config.ts` (`build.outDir: '../resources/static'`) | -| `compile` / `test-compile` | javac for Java 25 | standard | -| `test` | Surefire — JUnit | standard | -| `verify` | `spotbugs:check`, `dependency-check:check` | `pom.xml` plugin executions; **this is the CI gate** | -| `package` | Spring Boot repackage → executable JAR with embedded SPA | `spring-boot-maven-plugin` | - -## Gotchas - -- **`mvn test` does NOT run the security gate.** SpotBugs and OWASP dependency-check are bound to `verify`. CI runs `mvn verify`. Locally, `mvn verify` is what actually mirrors CI. -- **OWASP NVD download is ~1 GB** and very slow on first run. `-Ddependency-check.skip=true` for fast local cycles; let CI run the full check. -- **`-Dfrontend.skip=true`** skips the frontend-maven-plugin entirely. 
The default `false` (in the `pom.xml` `` block) means `mvn package` always tries to build the SPA. Backend-only contributors should pass `-Dfrontend.skip=true` to avoid pulling Node. -- **Vite output path is relative-up:** `src/main/frontend/vite.config.ts` writes to `'../resources/static'` (= `src/main/resources/static/`) and uses `emptyOutDir: false` so a stale dir won't be wiped — if you see leftover assets, delete `src/main/resources/static/` manually. -- **ANTLR generated sources go under `target/generated-sources/antlr4/`** (per `antlr4-maven-plugin` defaults). Don't edit them; regenerate via `mvn generate-sources`. Modifying the `.g4` files in `src/main/antlr4/` is the supported edit point. -- **Spring Boot startup overhead is 8–16 s** for the embedded Neo4j + Spring context. Expected; not a perf bug. -- **Default index batch size is 500** (`Indexing batch tuning, see CLAUDE.md`). Larger isn't better; 500 outperformed 1000 in the tuning runs that set the default. -- **Tomcat 11.0.21 + Jackson 3.1.1 are pinned overrides** of Spring Boot 4.0.5's BOM (see `` / `` in `pom.xml`'s ``). Both are security bumps. Revert when Spring Boot 4.0.6+ catches up — keep the rationale comments. -- **`@ActiveProfiles("test")` is required on every `@SpringBootTest`** to avoid Neo4j auto-startup conflicts in integration tests. -- **First-run cache version mismatch wipes `.codeiq/cache/`.** Bump `CACHE_VERSION` (constant near the top of `cache/AnalysisCache.java`) whenever you change the hash algorithm or H2 schema. Existing users will lose cache on next run; that's intentional (incorrect cache > slow cache). -- **`SECURITY.md`, `CHANGELOG.md`, `.bestpractices.json`, `LICENSE`** are part of the OpenSSF Best Practices gate (project_id 12650). Do not delete or rename without coordinating — they are referenced by `.bestpractices.json` and the Scorecard workflow. 
-- **CI workflow pins all third-party actions by 40-char SHA** (see `.github/workflows/scorecard.yml`, `.github/workflows/codeql.yml` if present). When adding a new action, pin by SHA — Scorecard's `Pinned-Dependencies` check will downgrade us otherwise. diff --git a/docs/project/conventions.md b/docs/project/conventions.md deleted file mode 100644 index b83665e2..00000000 --- a/docs/project/conventions.md +++ /dev/null @@ -1,126 +0,0 @@ -# Conventions - -Rules to follow when modifying codeiq. Each item is grounded in an existing file. The 7 most important ones are summarized in [`PROJECT_SUMMARY.md`](../../PROJECT_SUMMARY.md) §"Conventions an agent must respect"; this file is the long form. - -## Code style - -- **Java 25 idioms encouraged** — records, sealed classes, pattern matching, virtual threads. Don't down-port to older idioms; this codebase is on the latest LTS-track. -- **Constructor injection only.** No field injection (`@Autowired` on fields), no setter injection. See any `@Component` / `@Service` in the codebase, e.g. `api/GraphController.java`. -- **Property-key constants** — when a string literal appears 3+ times in a file, extract: `private static final String PROP_FRAMEWORK = "framework";`. Saves typo bugs and makes refactors greppable. -- **Spring AI MCP annotations:** use `@McpTool` and `@McpToolParam` (Spring AI 2.x), not `@Tool`/`@ToolParam` (older form). See `mcp/McpTools.java`. -- **UTF-8 explicit:** `StandardCharsets.UTF_8` everywhere — never rely on platform default. `Analyzer.java` shows the import. - -## Error handling - -- **Pipeline errors don't abort the run.** Per-file detector exceptions are caught and logged; the file is skipped, the run continues. See task wrapping in `analyzer/Analyzer.java`. -- **CLI commands return `int` exit codes** via Picocli's `Callable` pattern. See any `cli/*Command.java` (e.g. `cli/EnrichCommand.java`). 
-- **No `System.exit()` from non-CLI code.** `CodeIqApplication.main` is the only place that calls `SpringApplication.exit(...)` and `System.exit(...)`. -- **No silent fallbacks.** If a detector can't parse a file, log it; don't return an empty result that looks indistinguishable from "nothing matched". - -## Naming - -- **Java packages:** `io.github.randomcodespace.iq.` (lowercase, no plurals). Detector subpackages match the language family: `detector/jvm/{java,kotlin,scala}/`, `detector/typescript/`, `detector/python/`, `detector/systems/{cpp,rust}/`. -- **Detector class:** `Detector` — `SpringSecurityDetector`, `FastifyDetector`, `GoStructuresDetector`. Always ends in `Detector`. -- **Detector test class:** `DetectorTest` — colocated under `src/test/java/` with the same package. -- **CLI commands:** `Command` — `IndexCommand`, `EnrichCommand`, `ServeCommand`. Picocli `@Command(name = "")` annotation gives the user-facing name. -- **Node ID format:** `"{prefix}:{filepath}:{type}:{identifier}"` — e.g. `"node:src/main/java/Foo.java:class:Foo"`. The full file path is part of the key — that's how cross-file uniqueness works. -- **Property keys:** snake_case (`auth_type`, `framework`, `roles`). Stored in Neo4j with a `prop_` prefix (`prop_auth_type`, `prop_framework`). -- **Frontend imports:** `@/...` resolves to `src/main/frontend/src/...` (Vite alias in `vite.config.ts`, mirrored in `tsconfig.json`'s `paths`). Always use the alias, never `../../../`. - -## Tests - -- **Location:** `src/test/java//`. ~236 test files total. -- **Layers:** - - **Unit:** plain JUnit, no Spring context. Most detector tests are unit. - - **Integration:** `@SpringBootTest` with `@ActiveProfiles("test")` — required to suppress Neo4j auto-startup. Standalone MockMvc for controller tests (no full context). - - **MCP tools:** test by calling `McpTools` methods directly — no protocol round-trip needed. 
- - **E2E quality:** `E2EQualityTest` validates against Context7-sourced ground truth (`src/test/resources/e2e/ground-truth-*.json`). Requires the env var `E2E_PETCLINIC_DIR` (or similar) to point at a cloned reference repo. -- **Run a single test:** `mvn test -Dtest=ClassName#methodName`. -- **Every detector needs:** - 1. Positive match — input that should fire, output asserted. - 2. Negative match — input that *looks similar* but shouldn't fire (especially for framework detectors). - 3. **Determinism test** — run the detector twice on the same input, assert output is byte-identical. - -## Logging - -- **SLF4J** via Spring Boot's default Logback. Pattern across the codebase: `private static final Logger log = LoggerFactory.getLogger(MyClass.class);`. -- `application.yml` already silences known-noisy loggers (`org.springframework.ai.mcp` → WARN, `PostProcessorRegistrationDelegate` → WARN). Don't add more bare `org.springframework.*` loggers without good cause. -- **No PII concerns** — codeiq scans the user's own code; logs go to the user's terminal. - -## Adding a new detector - -(Authoritative recipe — slightly expanded from [`CLAUDE.md`](../../CLAUDE.md) §"Adding a New Detector".) - -1. **Pick the right base class** (table below) and create `src/main/java/io/github/randomcodespace/iq/detector//Detector.java`. -2. **Annotate with `@Component`** (Spring auto-discovery) **and `@DetectorInfo(name=..., category=..., parser=ParserType.X, languages={...}, nodeKinds={...}, edgeKinds={...}, properties={...})`** (used by the `plugins` CLI command for introspection). Live examples: `detector/jvm/java/SpringSecurityDetector.java`, `detector/go/GoStructuresDetector.java`. -3. **Implement `detect(DetectorContext ctx)`** — return a `DetectorResult` populated with `CodeNode`s and `CodeEdge`s. Detectors are stateless; the `DetectorContext` is your scratch space. -4. **Framework detectors require a discriminator guard** — e.g. 
Quarkus must require `import io.quarkus.*`, Fastify must require `import 'fastify'`. Otherwise you'll match Spring controllers as Quarkus or Express as Fastify. **No exceptions** — this rule is enforced by review. -5. **Property-key constants** for any string literal repeated 3+ times. -6. **Add tests** in `src/test/java/.../detector//DetectorTest.java`: positive, negative, determinism. -7. **Run `mvn test`** — all 236+ tests must still pass. -8. **No registry edit needed** — Spring classpath scan picks up the `@Component`. The `plugins list` CLI command will introspect via `@DetectorInfo`. - -### Detector base classes - -| Class | Use when | -|---|---| -| `Detector` (interface) | You need full control; rare | -| `AbstractRegexDetector` | Pattern-only detection (most detectors) | -| `AbstractJavaParserDetector` | Java AST via JavaParser (Spring, JPA, etc.) | -| `AbstractAntlrDetector` | ANTLR grammar-based (TS, Python, Go, C#, Rust, C++) | -| `AbstractStructuredDetector` | Structured config files (YAML, JSON, TOML, INI, properties) | -| `AbstractPythonAntlrDetector` | Python ANTLR detectors (shared parse, getBaseClassesText, extractClassBody) | -| `AbstractPythonDbDetector` | Python ORM detectors (adds ensureDbNode/addDbEdge via DetectorDbHelper) | -| `AbstractTypeScriptDetector` | TS regex detectors (shared getSupportedLanguages, detect→detectWithRegex) | -| `AbstractJavaMessagingDetector` | Java messaging detectors (shared CLASS_RE, extractClassName, addMessagingEdge) | - -### Shared static helpers (don't subclass — call them) - -| Class | Purpose | -|---|---| -| `DetectorDbHelper` | `ensureDbNode` / `addDbEdge` for any detector emitting `DATABASE_CONNECTION` nodes | -| `FrontendDetectorHelper` | `createComponentNode` / `lineAt` for Angular, React, Vue detectors | -| `StructuresDetectorHelper` | `addImportEdge` / `createStructureNode` for Scala/Kotlin structure detectors | - -## Adding a new CLI command - -1. 
Create `src/main/java/io/github/randomcodespace/iq/cli/Command.java`. -2. Annotate `@Component` and `@picocli.CommandLine.Command(name="", description="...")`. -3. Implement `Callable` returning the exit code. -4. Wire as a subcommand of `CodeIqCli` in `cli/CodeIqCli.java` (it lists subcommands explicitly). -5. If the command needs a Spring profile other than `indexing` (only `serve` does this), update the `if (isServe) ...` block in `CodeIqApplication.main` — note this is **not** generic, so adding another `serving`-profile command means rethinking that conditional. - -## Adding a new REST endpoint - -1. Add a `@GetMapping` method (read-only — no `@PostMapping`/`@PutMapping`/`@DeleteMapping`) to the appropriate controller in `src/main/java/io/github/randomcodespace/iq/api/`. -2. Delegate to `query/QueryService.java` (or one of its peers — `StatsService`, `TopologyService`) — controllers stay thin. -3. **Mirror it in `mcp/McpTools.java`** as a new `@McpTool`. The MCP tool description must explain when an LLM should call it; copy the wording style of existing tools. -4. Add a controller test using standalone MockMvc (no `@SpringBootTest`). - -## Adding a new MCP tool - -1. Add a method on `mcp/McpTools.java` annotated `@McpTool(name="...", description="...")`. -2. Parameters: annotate with `@McpToolParam(description="...")`. -3. Return type: anything Jackson can serialize (typically a `Map` or a record). Jackson's `FAIL_ON_UNKNOWN_PROPERTIES` is globally disabled for MCP-protocol compatibility (`config/JacksonConfig.java`). -4. Test by calling the method directly in a unit test — no protocol round-trip needed. - -## Things to avoid (anti-patterns) - -- **`Set` iteration without sorting** — kills determinism. Use `TreeSet`, `stream().sorted(...)`, or sort the resulting list. -- **Mutable instance state on detectors** — they're Spring singletons; concurrent calls will collide. Per-call state goes in method-local variables / `DetectorContext`. 
-- **Coarse `synchronized` on `AnalysisCache`** — the `ReentrantReadWriteLock` is deliberate. Don't "simplify" to `synchronized` blocks; that serializes reads unnecessarily. -- **Direct `Boolean.TRUE.equals(yamlKey)`** — SnakeYAML parses bare `on` as `Boolean.TRUE`. Use `String.valueOf(key)` for YAML key comparisons (SonarCloud S2159). -- **Regex with nested non-possessive quantifiers** — use `*+` instead of `*` for nested patterns. `([^"\\]*+(?:\\.[^"\\]*+)*+)` not `([^"\\]*(?:\\.[^"\\]*)*)`. Stack-overflow risk (SonarCloud S5998). -- **Adding a new property to `CodeNode` without round-trip-testing** — Neo4j stores properties as `prop_`; `nodeFromNeo4j()` must restore them. A new property that survives `bulkSave` but not `nodeFromNeo4j` will silently disappear when read back. -- **Edges referencing nodes that don't exist yet** — `bulkSave`'s edge UNWIND silently drops rows whose source/target IDs don't match any node. Pre-validate IDs. -- **Generic patterns in framework detectors** — `router.get(...)` matches Express, Fastify, NestJS, Vue Router, Hono, and probably ten others. Always require a framework-specific import. - -## Don't refactor (intentional non-standard choices) - -- **Single-file `NodeKind` and `EdgeKind` enums.** They're long (34/28 values) and could be split, but they're load-bearing for cross-file uniqueness and detector readability. Don't split — keeps the type surface in one diff-friendly file. See `model/NodeKind.java`, `model/EdgeKind.java`. -- **No SDN hydration on the read path.** `graph/GraphStore.java` uses raw Cypher + `nodeFromNeo4j()` for reads; `graph/GraphRepository.java` (Spring Data Neo4j) is used **only for writes**. This is deliberate — SDN's hydration overhead was measured and rejected for the read path. Don't unify them. -- **Auto-discovery via Spring `@Component` on detectors, no explicit registry.** Drop in a class, it's live. The `DetectorRegistry` exists to *introspect* the discovered set, not to register them. 
Don't replace with a manual registry. -- **CLI profile selection in `CodeIqApplication.main` (not via Picocli's mechanism).** It's a string `if/else` on the first arg, and it pre-empts Picocli to set the Spring profile *before* the context starts. Looks ugly; works correctly. SpotBugs flagged the original duplicate branches; the current version was deliberately collapsed. -- **`indexing` profile sets `WebApplicationType.NONE`** — meaning `mvn test` from the IDE without `@ActiveProfiles("test")` will try to start the web server and pin to ports. Always use `@ActiveProfiles("test")` on `@SpringBootTest`. -- **Frontend assets bundled into the JAR (`src/main/resources/static/`)** — no separate frontend deploy. Vite's `outDir: '../resources/static'` is the embed seam; don't move the SPA out of the JAR without re-architecting the deploy story. -- **`prop_*` Neo4j property prefix.** It's a deliberate namespacing scheme to separate domain properties from top-level node attributes (`id`, `kind`, `layer`, etc.). Don't rename. 
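
The `prop_*` namespacing described above can be illustrated with a minimal round-trip sketch. `PropPrefix`, `toStored`, and `toDomain` are hypothetical names introduced here for illustration; they are not the project's `GraphStore` code, and the real mapping may handle more cases.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not the project's GraphStore code.
final class PropPrefix {
    static final String PREFIX = "prop_";

    private PropPrefix() {}

    /** Domain properties -> stored form: every key gains the prop_ prefix. */
    static Map<String, Object> toStored(Map<String, Object> domain) {
        Map<String, Object> stored = new TreeMap<>(); // sorted keys keep output deterministic
        domain.forEach((key, value) -> stored.put(PREFIX + key, value));
        return stored;
    }

    /** Stored form -> domain properties: only prop_* keys are restored. */
    static Map<String, Object> toDomain(Map<String, Object> stored) {
        Map<String, Object> domain = new TreeMap<>();
        stored.forEach((key, value) -> {
            // Top-level attributes such as id, kind, layer carry no prefix and are skipped.
            if (key.startsWith(PREFIX)) {
                domain.put(key.substring(PREFIX.length()), value);
            }
        });
        return domain;
    }
}
```

A round-trip test in this style would write a property, read it back, and assert the two maps are equal, which is the check the conventions ask for whenever a new property is added.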
diff --git a/docs/project/data-model.md b/docs/project/data-model.md deleted file mode 100644 index 5fdae10a..00000000 --- a/docs/project/data-model.md +++ /dev/null @@ -1,127 +0,0 @@ -# Data Model - -codeiq's data model has **three storage layers**, each with its own schema and lifetime: - -| Layer | Backing | Purpose | Lifetime | -|---|---|---|---| -| Domain types | Java records / enums | In-memory shape of nodes/edges, single source of truth | Per JVM run | -| Analysis cache | H2 (file-backed, embedded) | Per-file detection results keyed by content hash; enables incremental re-indexing | `.codeiq/cache/` until manually cleared or `CACHE_VERSION` bump | -| Graph | Neo4j Embedded (Community Edition 2026.02.3) | Final enriched graph for queries, MCP tools, REST API | `.codeiq/graph/graph.db/` until manually cleared | - -## Storage - -### Primary datastore — Neo4j Embedded -- **Defined in:** `pom.xml` `2026.02.3`, bootstrapped in `config/Neo4jConfig.java` (only loaded under the `serving` profile via `@ConditionalOnProperty(value="codeiq.neo4j.enabled", havingValue="true")`). -- **Data dir:** `.codeiq/graph/graph.db/` inside the scanned repo. -- **Migration tool:** none — Neo4j is schemaless; indexes/constraints are created idempotently by `GraphStore.bulkSave()`. - -### Secondary datastore — H2 (analysis cache) -- **Defined in:** `cache/AnalysisCache.java`. H2 is a transitive Spring Boot dependency (no explicit version pin in `pom.xml`). -- **Data dir:** `.codeiq/cache/` inside the scanned repo. -- **Schema versioning:** `CACHE_VERSION = 4` constant near the top of `AnalysisCache.java` (currently line 43; grep the symbol if drifted). On startup, cache reads the stored version; if it doesn't match, the H2 file is wiped and recreated. **Bump `CACHE_VERSION` whenever you change the file-hash algorithm or the schema.** - -## Domain types - -### `CodeNode` and `CodeEdge` -- **Defined in:** `model/CodeNode.java`, `model/CodeEdge.java`. 
-- **Plain Java records / classes** (not JPA entities — Spring Data Neo4j is used only on the write path). Properties live in a `Map`. -- **ID format:** `"{prefix}:{filepath}:{type}:{identifier}"` (e.g. `"node:src/main/java/Foo.java:class:Foo"`). Cross-file uniqueness is enforced by including the full file path. See existing detectors for the prefix convention. - -### `NodeKind` (enum) -- **Defined in:** `model/NodeKind.java`. -- **34 concrete values** (javadoc and file are in sync as of 2026-04-27): - -``` -MODULE, PACKAGE, CLASS, METHOD, ENDPOINT, ENTITY, REPOSITORY, QUERY, -MIGRATION, TOPIC, QUEUE, EVENT, RMI_INTERFACE, CONFIG_FILE, CONFIG_KEY, -WEBSOCKET_ENDPOINT, INTERFACE, ABSTRACT_CLASS, ENUM, ANNOTATION_TYPE, -PROTOCOL_MESSAGE, CONFIG_DEFINITION, DATABASE_CONNECTION, AZURE_RESOURCE, -AZURE_FUNCTION, MESSAGE_QUEUE, INFRA_RESOURCE, COMPONENT, GUARD, -MIDDLEWARE, HOOK, SERVICE, EXTERNAL, SQL_ENTITY -``` - -Each enum constant carries a lowercase `value` (e.g. `CLASS("class")`) used as the string representation in Cypher / JSON / MCP-tool responses. `NodeKind.fromValue(...)` does the reverse lookup via a static `BY_VALUE` map. - -### `EdgeKind` (enum) -- **Defined in:** `model/EdgeKind.java`. -- **28 concrete values** (javadoc and file are in sync as of 2026-04-27): - -``` -DEPENDS_ON, IMPORTS, EXTENDS, IMPLEMENTS, CALLS, INJECTS, EXPOSES, -QUERIES, MAPS_TO, PRODUCES, CONSUMES, PUBLISHES, LISTENS, INVOKES_RMI, -EXPORTS_RMI, READS_CONFIG, MIGRATES, CONTAINS, DEFINES, OVERRIDES, -CONNECTS_TO, TRIGGERS, PROVISIONS, SENDS_TO, RECEIVES_FROM, PROTECTS, -RENDERS, REFERENCES_TABLE -``` - -### `layer` (string property, not an enum) -Every node carries a `layer` property set by `analyzer/LayerClassifier.java` to one of: `frontend`, `backend`, `infra`, `shared`, `unknown`. Classification is deterministic — based on `kind`, `framework`, and path heuristics. 
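
The `value` / `fromValue` / `BY_VALUE` pattern described for `NodeKind` above might look roughly like the sketch below. `NodeKindSketch` is a trimmed, hypothetical stand-in with three constants; the lowercase string for `ENDPOINT` and the exception thrown on an unknown value are assumptions, not confirmed behavior of `model/NodeKind.java`.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Trimmed illustration; the real enum is documented as having 34 constants.
enum NodeKindSketch {
    CLASS("class"),
    SERVICE("service"),
    ENDPOINT("endpoint"); // assumed string form

    // Reverse index built once, after all constants exist.
    private static final Map<String, NodeKindSketch> BY_VALUE =
            Arrays.stream(values())
                  .collect(Collectors.toUnmodifiableMap(NodeKindSketch::value, Function.identity()));

    private final String value;

    NodeKindSketch(String value) {
        this.value = value;
    }

    /** Lowercase string form used in serialized output. */
    String value() {
        return value;
    }

    /** Reverse lookup; throwing on unknown input is an assumption of this sketch. */
    static NodeKindSketch fromValue(String value) {
        NodeKindSketch kind = BY_VALUE.get(value);
        if (kind == null) {
            throw new IllegalArgumentException("Unknown node kind: " + value);
        }
        return kind;
    }
}
```

Detector code would still reference the constants by symbol (`NodeKindSketch.CLASS`); the string form matters only at the serialization boundary.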
- -## H2 cache schema - -Defined in the `SCHEMA_SQL` text block near the top of `cache/AnalysisCache.java` (grep `SCHEMA_SQL`). Tables (verified from the file): - -| Table | Purpose | -|---|---| -| `cache_meta` | `meta_key` (PK) → `meta_value` — stores the `version` row matching `CACHE_VERSION` | -| `files` | `content_hash` (PK) → file path, language, size, parse timestamp; the unit of cache lookup | -| `nodes` | per-file detected nodes; `row_id` AUTO_INCREMENT PK; FK to `files.content_hash` | -| `edges` | per-file detected edges; FK to `files.content_hash` | -| `analysis_runs` | `run_id` (PK), wall-clock metadata for one `index`/`analyze` invocation | - -**Reserved-word note:** H2 reserves `key`, `value`, `order`. The schema uses `meta_key` / `meta_value` etc. — keep that pattern when extending. - -**Concurrency:** the cache uses a `ReentrantReadWriteLock` (`AnalysisCache.java`). Many virtual-thread readers can run in parallel; writers serialize. This is what avoids `ClosedChannelException` against H2's MVStore file channel under concurrent virtual-thread access. - -## Neo4j schema (created by `GraphStore.bulkSave`) - -Indexes created idempotently (`CREATE … IF NOT EXISTS`) inside `GraphStore.bulkSave()` (`graph/GraphStore.java`, around lines 112–122 at time of writing — grep `CREATE INDEX` to relocate): - -| Index | Type | Property | -|---|---|---| -| (unnamed) | b-tree | `(:CodeNode {id})` | -| (unnamed) | b-tree | `(:CodeNode {label_lower})` | -| (unnamed) | b-tree | `(:CodeNode {fqn_lower})` | -| `search_index` | fulltext | `[label_lower, fqn_lower]` over `:CodeNode` | -| `lexical_index` | fulltext | `[prop_lex_comment, prop_lex_config_keys]` over `:CodeNode` | - -The `CLAUDE.md` "Gotchas" section additionally references b-tree indexes on `kind`, `layer`, `module`, `filePath`. **Cross-check before relying on those** — `grep "CREATE INDEX" graph/GraphStore.java` shows only the 3 above plus the 2 fulltext indexes. 
The CLAUDE.md claim may be aspirational or stale. - -### Property round-trip convention - -Domain `properties` Map → Neo4j stored as `prop_` properties. Domain ID, layer, kind, etc. become top-level node properties (`id`, `layer`, `kind`, `label_lower`, `fqn_lower`, `module`, `filePath`). The reverse mapping is in `nodeFromNeo4j()` inside `graph/GraphStore.java`. **Whenever you add a domain property, verify the round-trip survives** — silent property loss is the most common bug class on this seam. - -### Bulk-save batching - -`bulkSave` uses `UNWIND $batch AS props CREATE (n:CodeNode) SET n = props` for nodes (default batch 500) and a similar UNWIND-MATCH-MATCH-CREATE pattern for edges. Edge UNWIND **silently drops rows whose source/target node IDs are missing** — pre-validate before passing in. See [`CLAUDE.md`](../../CLAUDE.md) §"Gotchas". - -## Lifecycle / state machines - -There are no state machines on entities themselves. The closest thing is the **pipeline lifecycle** that produces them: - -``` -file on disk - ─► hashed (SHA-256, FileHasher.java) - ─► H2 cache lookup - ├─ hit → reuse cached nodes/edges - └─ miss → run detectors, write nodes+edges keyed by content_hash - ─► H2 cache populated - -(later, on `enrich`:) - ─► H2 read - ─► UNWIND bulk-load to Neo4j - ─► linkers (Topic, Entity, ModuleContainment, Guard) add cross-file edges - ─► LayerClassifier sets layer property on every node - ─► ServiceDetector adds SERVICE nodes + CONTAINS edges - ─► LanguageEnricher (per-language extractors) adds extractor results - ─► LexicalEnricher adds prop_lex_* + the lexical_index - ─► graph ready for `serve` -``` - -## Schema source of truth - -- **Neo4j shape:** `graph/GraphStore.java` is canonical (it creates the indexes; there are no other DDL sources). Property names like `label_lower` / `fqn_lower` / `prop_*` are decided here. -- **H2 shape:** `cache/AnalysisCache.java`'s `SCHEMA_SQL` constant is canonical. 
There is no separate migration directory — `CACHE_VERSION` is the migration mechanism. -- **Domain shape:** `model/{CodeNode,CodeEdge,NodeKind,EdgeKind}.java` are canonical. Detectors reference these enums by symbol; never use the lowercase string forms in detector code. - -If you change any of the three, **update the other two seams** (or document why you didn't). diff --git a/docs/project/flows.md b/docs/project/flows.md deleted file mode 100644 index 24ef4ba1..00000000 --- a/docs/project/flows.md +++ /dev/null @@ -1,127 +0,0 @@ -# Key Flows - -Four flows worth tracing — they cover the main code paths an agent will need to modify or debug. Each lists the file:line entry and the chain of calls. **Line numbers are accurate at the time of writing (2026-04-27)** but rot — `grep` for the symbol if a line drifts. - ---- - -## Flow: `codeiq index ` — file scan → H2 cache - -**Trigger:** `java -jar code-iq-*-cli.jar index /path/to/repo` from a shell. - -**Path through code:** - -1. `CodeIqApplication.java` `main(...)` — Spring Boot starts. The first arg (`index`) is *not* `serve`, so the app sets profile `indexing` and `WebApplicationType.NONE` (the `if (isServe) ... else ...` block). No web server spins up. -2. `CodeIqApplication.run(args)` — Picocli takes over: `new CommandLine(codeIqCli, factory).execute(args)`. -3. `cli/CodeIqCli.java` — top-level Picocli `@Command`. Subcommand dispatch routes to `cli/IndexCommand.java`. -4. `cli/IndexCommand.call()` — opens `cache/AnalysisCache` (creates the H2 file at `.codeiq/cache/` if missing; checks `CACHE_VERSION`). -5. `analyzer/FileDiscovery.discover(rootPath)` — runs `git ls-files` if the path is a git repo, else walks the filesystem. Returns a list of `DiscoveredFile`s with language tagged via `analyzer/FileClassifier.java`. -6. For each file, in batches (default 500): hash via `cache/FileHasher.hash(...)` (SHA-256), check the cache. - - **Cache hit** → reuse existing nodes/edges from H2. - - **Cache miss** → continue. -7. 
`analyzer/StructuredParser.parse(file)` — routes to JavaParser (Java), `grammar/AntlrParserFactory` (TS/Py/Go/C#/Rust/C++), or raw text. -8. **Detector fan-out** on virtual threads: every `@Component`-annotated `Detector` whose `getSupportedLanguages()` matches gets called with a `DetectorContext`. Results are collected per file. (Auto-discovery via Spring classpath scan; no manual list.) -9. `analyzer/GraphBuilder.addNodes(...) / addEdges(...)` — buffer to indexed slots so order is independent of thread completion. -10. `cache/AnalysisCache.write(contentHash, nodes, edges, runId)` — persist via UNWIND-friendly batches. -11. CLI prints summary; exit code 0. - -**Side effects:** `.codeiq/cache/` H2 file populated/updated. **No Neo4j writes**. No network calls. - -**Failure modes:** -- Per-file detector exceptions: caught + logged in `Analyzer.java`'s task wrapper; the file is skipped, the run continues. -- `CACHE_VERSION` mismatch: H2 file is wiped + recreated automatically on startup. -- Disk-full / permission errors: bubble up, run aborts with non-zero exit. - ---- - -## Flow: `codeiq enrich ` — H2 → Neo4j with linkers + classifiers - -**Trigger:** `java -jar code-iq-*-cli.jar enrich /path/to/repo` (after `index`). - -**Path through code:** - -1. `CodeIqApplication.main(...)` — same profile-selection logic; `enrich` → `indexing` profile, no web server. -2. `cli/EnrichCommand.call()` — opens `cache/AnalysisCache` (read), opens Neo4j Embedded directly via `DatabaseManagementServiceBuilder` (programmatic — Spring's `@Profile("serving")` Neo4j config is *not* loaded here). -3. `EnrichCommand` reads all nodes + edges from H2 in batches. -4. `graph/GraphStore.bulkSave(nodes, edges)` (line numbers approximate at time of writing — grep the Cypher fragment if drifted): - - `MATCH (n) WITH n LIMIT 5000 DETACH DELETE n RETURN count(*)` — clear in chunks if a previous graph existed. 
- - `CREATE INDEX IF NOT EXISTS` for `id`, `label_lower`, `fqn_lower` + `CREATE FULLTEXT INDEX` for `search_index` and `lexical_index`. - - `UNWIND $batch AS props CREATE (n:CodeNode) SET n = props` — nodes, batched (default 500). - - `UNWIND $batch AS e MATCH (a {id: e.src}) MATCH (b {id: e.tgt}) CREATE (a)-[r:EDGE_KIND]->(b)` — edges, batched. **Silently drops rows where source/target IDs miss.** -5. `analyzer/linker/*` — runs in order: `TopicLinker`, `EntityLinker`, `ModuleContainmentLinker`, `GuardLinker`. Each adds cross-file edges (e.g. `PRODUCES`/`CONSUMES` from a topic name appearing in two services). -6. `analyzer/LayerClassifier.classify(...)` — sets `n.layer` on every node based on `kind`, `framework`, and path heuristics. -7. `analyzer/ServiceDetector.detect(rootPath)` — walks the filesystem (not the Neo4j graph) for build files (Maven, Gradle, npm, Cargo, go.mod, etc. — 30+). Creates `:CodeNode {kind: 'service'}` nodes and `CONTAINS` edges to every module/file inside the service boundary. -8. `intelligence/extractor/LanguageEnricher` — runs per-language extractors (`JavaLanguageExtractor`, `TypeScriptLanguageExtractor`, `PythonLanguageExtractor`, `GoLanguageExtractor`) to add language-specific properties. -9. `intelligence/lexical/LexicalEnricher` — extracts doc comments (`DocCommentExtractor`) and persists to `prop_lex_comment`; populates the `lexical_index` fulltext index. -10. CLI prints summary; exit 0. - -**Side effects:** `.codeiq/graph/graph.db/` populated. H2 cache untouched. - -**Failure modes:** -- Edge with missing source/target ID: silently dropped by Cypher MATCH. Mitigation: pre-validate IDs before passing to `bulkSave`. **Most common cause of "missing relationships" bugs.** -- Property round-trip failure: a domain property survives `bulkSave` but `nodeFromNeo4j()` doesn't know to restore it → silent property loss. Verify by reading back any node you just wrote. 
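
The "pre-validate IDs" mitigation above can be sketched as a plain set-membership check run before edges are handed to `bulkSave`. `NodeRef`, `EdgeRef`, and `EdgeValidator` are hypothetical stand-ins invented for this example; the project's `CodeNode` / `CodeEdge` types and any existing validation code may look different.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the project's CodeNode / CodeEdge types.
record NodeRef(String id) {}

record EdgeRef(String sourceId, String targetId, String kind) {}

final class EdgeValidator {
    private EdgeValidator() {}

    /** Returns the edges whose source or target ID matches no node, in input order. */
    static List<EdgeRef> findDangling(List<NodeRef> nodes, List<EdgeRef> edges) {
        Set<String> knownIds = new HashSet<>();
        for (NodeRef node : nodes) {
            knownIds.add(node.id());
        }
        List<EdgeRef> dangling = new ArrayList<>();
        for (EdgeRef edge : edges) {
            // An edge is only safe to save when both endpoints exist.
            if (!knownIds.contains(edge.sourceId()) || !knownIds.contains(edge.targetId())) {
                dangling.add(edge);
            }
        }
        return dangling;
    }
}
```

Logging or failing on a non-empty result turns a silent drop into a visible error, in line with the "no silent fallbacks" convention.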
- ---- - -## Flow: `codeiq serve ` — REST + MCP + UI request lifecycle - -**Trigger:** `java -jar code-iq-*-cli.jar serve /path/to/repo` (after `enrich`). Then a browser hits `http://localhost:8080/explorer` or an MCP client calls a tool. - -**Path through code (cold start):** - -1. `CodeIqApplication.main(...)` — first arg is `serve` → profile `serving` activated; web server starts. -2. Spring loads beans gated by `@Profile("serving")`: all 4 controllers in `api/`, `mcp/McpTools` (via Spring AI starter), the Neo4j `@Configuration` in `config/Neo4jConfig.java` (only when `codeiq.neo4j.enabled=true`). -3. Neo4j Embedded starts; `health/GraphHealthIndicator` reports status to `/actuator/health`. -4. Spring Boot's static-resource handler binds `src/main/resources/static/` (the bundled SPA) to `/`. -5. Server bound — `http://localhost:8080` ready. - -**Path through code (REST request, e.g. `GET /api/stats`):** - -1. Browser hits `/api/stats`. -2. `api/GraphController.getStats(...)` (`@GetMapping("/stats")`) is dispatched (carries `@Profile("serving")`). -3. Controller delegates to `query/StatsService.getStats()`. -4. `StatsService` runs Cypher queries via `graph/GraphStore.queryNodes(...)` (raw Cypher, not SDN). -5. Results aggregated into a `Map` and serialized by Jackson. -6. HTTP response returned. - -**Path through code (MCP tool call, e.g. `find_dead_code`):** - -1. MCP client (Claude Desktop, an LLM agent, the SPA's `McpConsole`) sends a JSON-RPC call to `/mcp` (mounted by Spring AI's `spring-ai-starter-mcp-server-webmvc`). -2. Spring AI dispatches to the matching `@McpTool`-annotated method on `mcp/McpTools.java`. -3. The MCP tool delegates to `query/QueryService.findDeadCode()` (or similar). -4. `QueryService` runs Cypher (filters by semantic edges only — `calls`, `imports`, `depends_on`; excludes structural `contains`, `defines`, and entry points like endpoints / config files — see [`CLAUDE.md`](../../CLAUDE.md) "Gotchas"). -5. 
Result returned as JSON-RPC response. - -**Side effects:** None — strictly read-only. - -**Failure modes:** -- Calling `serve` before `enrich` → `health/GraphHealthIndicator` reports DOWN; queries return empty results. Fix: run `enrich` first. -- CORS rejection if the SPA is being served from a different origin in dev: configure `codeiq.cors.allowed-origin-patterns` in `application.yml` (or env: `CODEIQ_CORS_ALLOWED_ORIGIN_PATTERNS`). -- `FAIL_ON_UNKNOWN_PROPERTIES` is globally disabled (`config/JacksonConfig.java`) — MCP protocol clients won't break on field additions, but it also hides typos in JSON inputs. Validate at the controller boundary. - ---- - -## Flow: Adding a new detector and seeing it run - -**Trigger:** developer adds `MyDetector.java` and rebuilds. - -**Path through code (compile-time + first run):** - -1. `src/main/java/io/github/randomcodespace/iq/detector//MyDetector.java` — new file, `@Component`-annotated, `@DetectorInfo(...)`-annotated, extending one of the `Abstract*Detector` base classes. -2. `mvn package` — compiles the class. -3. On the next `codeiq index `: - - Spring Boot starts under `indexing` profile, classpath-scans `io.github.randomcodespace.iq` for `@Component`s. - - `MyDetector` is instantiated as a singleton bean. - - `analyzer/Analyzer` (or `cli/IndexCommand`) iterates Spring's `Map` of all bean instances. -4. For every file whose language matches `getSupportedLanguages()`, `MyDetector.detect(ctx)` is called on a virtual thread. -5. Returned `DetectorResult` is folded into `GraphBuilder` (nodes-first, then edges). -6. From there: identical to the `index` flow — H2 cache write, then `enrich`, then visible via `serve`. - -**Verification:** -- `codeiq plugins list` introspects via `@DetectorInfo` and confirms the detector is live. -- `codeiq stats ` — node-kind counts should change after re-indexing. -- Unit test `MyDetectorTest` (positive + negative + determinism) must pass via `mvn test`. 
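
The determinism check named in the verification list can be sketched without any test framework. `ToyClassNameDetector` and `DeterminismCheck` are hypothetical and exist only for this example; a real detector returns a `DetectorResult` and the real check would live in a JUnit test. The node ID follows the documented `{prefix}:{filepath}:{type}:{identifier}` format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy detector; real detectors return DetectorResult objects.
final class ToyClassNameDetector {
    private static final Pattern CLASS_RE = Pattern.compile("\\bclass\\s+(\\w+)");

    /** Returns a node ID for every class declaration found, in sorted order. */
    List<String> detect(String filePath, String source) {
        // Per-call state only; a TreeSet makes iteration order independent of hashing.
        Set<String> ids = new TreeSet<>();
        Matcher matcher = CLASS_RE.matcher(source);
        while (matcher.find()) {
            ids.add("node:" + filePath + ":class:" + matcher.group(1));
        }
        return new ArrayList<>(ids);
    }
}

final class DeterminismCheck {
    private DeterminismCheck() {}

    /** Runs the detector twice on the same input and compares the outputs. */
    static boolean isDeterministic(ToyClassNameDetector detector, String path, String source) {
        return detector.detect(path, source).equals(detector.detect(path, source));
    }
}
```

Because the detector keeps no instance fields and sorts its output, two runs over the same input produce equal lists regardless of match order or thread scheduling.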
- -**Failure modes:** -- Forgot `@Component` → silently disabled, no error. Test won't catch it (unit tests instantiate directly). Catch via `codeiq plugins list` showing the detector is missing. -- Missing discriminator guard on a framework detector → false positives across other frameworks. Catch via the negative-match unit test. -- Stateful instance fields → race conditions across virtual threads. Catch via the determinism test. diff --git a/docs/project/ui.md b/docs/project/ui.md deleted file mode 100644 index c46599db..00000000 --- a/docs/project/ui.md +++ /dev/null @@ -1,136 +0,0 @@ -# UI - -App-mode (not library-mode): codeiq ships a single React SPA bundled inside the JAR and served by Spring Boot's static-resource handler at `http://localhost:8080/` when running `codeiq serve`. - -## Stack - -- **Framework:** React 18.3 (`src/main/frontend/package.json`) -- **Build tool:** Vite 6.4 + TypeScript 5.7 (`src/main/frontend/vite.config.ts`, `tsconfig.json`) -- **UI kit:** Ant Design 5.24 + `@ant-design/icons` 5.6 -- **Charts:** ECharts 5.6 via `echarts-for-react` 3.0 -- **Routing:** `react-router-dom` 7 -- **Styling:** AntD's built-in theme system (no Tailwind, no CSS Modules); `context/ThemeContext.tsx` toggles light/dark via AntD's `ConfigProvider` token system. -- **State management:** local component state + a tiny `useApi` hook (`hooks/useApi.ts`); no Redux / Zustand / React Query. -- **Data fetching:** raw `fetch` wrapped in `lib/api.ts` + `hooks/useApi.ts`. - -## Entry & layout - -- **HTML entry:** `src/main/frontend/index.html` (Vite default). -- **JS entry:** `src/main/frontend/src/main.tsx` → renders `` (`src/main/frontend/src/App.tsx`). -- **Root shell:** `App.tsx` wires the AntD `ConfigProvider`, the `ThemeContext.Provider`, and `react-router-dom`'s `BrowserRouter` + `Routes`. -- **Layout:** `components/AppLayout.tsx` — sidebar + content area; light/dark toggle via `useTheme()` from `ThemeContext.tsx`. 
-- **Provider stack** (outer → inner): AntD `ConfigProvider` → `ThemeContext.Provider` → `BrowserRouter` → `AppLayout` → page route.
-
-## Component organization
-
-```
-src/main/frontend/src/
-├── main.tsx                — Vite entry, renders
-├── App.tsx                 — providers + routes
-├── env.d.ts                — Vite env-var types
-├── components/
-│   └── AppLayout.tsx       — sidebar + content layout, theme toggle
-├── context/
-│   └── ThemeContext.tsx    — light/dark toggle
-├── hooks/
-│   └── useApi.ts           — generic API-call hook (loading / error / data)
-├── lib/
-│   ├── api.ts              — fetch wrapper + endpoint helpers
-│   └── mcp-tools.ts        — TOOLS, CATEGORIES, toolsByCategory, McpTool type
-├── pages/                  — one file per route
-│   ├── Dashboard.tsx       — stats overview + MCP tool launcher
-│   ├── CodebaseMap.tsx     — file-tree explorer
-│   ├── Explorer.tsx        — node/edge browser with kind filter + search
-│   └── McpConsole.tsx      — interactive MCP-tool playground
-└── types/
-    └── api.ts              — TypeScript types matching the REST API shapes
-```
-
-**Conventions:**
-- **`@/...` import alias** resolves to `src/main/frontend/src/...` (`vite.config.ts` `resolve.alias` + `tsconfig.json` `paths`). Always use the alias — never `../../../`.
-- **One component per file**, `PascalCase.tsx`.
-- **Pages are at `src/pages/`**; shared/UI primitives at `src/components/`. Reusable, non-page UI primitives haven't grown enough to warrant a `ui/` sublayer yet — fold into `components/` until that becomes painful.
-- **No test colocation** for the SPA — frontend tests are E2E only via Playwright. Component-level testing isn't currently practiced.
-
-## Routes
-
-(Inferred from page filenames; **verify in `src/main/frontend/src/App.tsx`** before relying.)
-
-- `/` → `Dashboard`
-- `/explorer` → `Explorer`
-- `/codebase-map` → `CodebaseMap`
-- `/mcp` → `McpConsole`
-
-## Design system
-
-- **Tokens:** AntD's built-in token system, customized via `ConfigProvider` in `App.tsx` and theme-keyed via `ThemeContext.tsx`. No standalone token file.
-- **Primitives:** AntD components used directly (`Button`, `Layout`, `Menu`, `Table`, `Input`, etc.). No internal wrapper library.
-- **Icons:** `@ant-design/icons` (`SunOutlined`, `MoonOutlined`, etc. — see `components/AppLayout.tsx`).
-
-## Data fetching
-
-`hooks/useApi.ts` wraps `lib/api.ts`'s `api.(...)` calls and exposes `{ data, loading, error, refetch }`. Page components use it like:
-
-```ts
-const { data, loading, error } = useApi(() => api.stats());
-```
-
-Endpoint helpers live in `lib/api.ts`; response types in `types/api.ts`. The MCP tools list — used by `Dashboard` and `McpConsole` — is a static client-side catalog at `lib/mcp-tools.ts` (it mirrors `mcp/McpTools.java` server-side; **must be kept in sync manually** when adding a tool).
-
-## Forms & validation
-
-Minimal — no `react-hook-form` / `formik`. The `McpConsole` builds parameter inputs dynamically from `lib/mcp-tools.ts` definitions; validation is "send and surface server error". This is fine for an internal dev tool.
-
-## i18n / a11y / theming
-
-- **i18n:** none. Strings are inline English. codeiq is a developer tool; no plan to localize.
-- **a11y:** Playwright config integrates `@axe-core/playwright` (`src/main/frontend/package.json` devDep) — accessibility audits run as part of E2E. AntD's primitives carry sensible roles/labels; custom components inherit those.
-- **Theming:** `ThemeContext.tsx` flips a boolean → AntD token theme (`defaultAlgorithm` vs `darkAlgorithm`). The toggle is in the layout header. No `prefers-color-scheme` auto-detection currently — feature gap if you care.
-
-## Performance notes
-
-- **Manual chunk splitting** in `vite.config.ts` (`build.rollupOptions.output.manualChunks`):
-  - `vendor-react` — React + react-dom + react-router-dom
-  - `vendor-antd` — antd + @ant-design/icons
-  - `vendor-echarts` — echarts + echarts-for-react
-  - Keeps the AntD chunk and the ECharts chunk out of the initial paint; both are heavy.
-- **`chunkSizeWarningLimit: 1200`** — Vite's default 500 KB warning was too noisy for the AntD chunk; raised deliberately.
-- **`emptyOutDir: false`** — preserves manually-placed assets in `src/main/resources/static/` between builds. If you see leftover files, delete the dir manually.
-- **`sourcemap: false`** — production output ships without sourcemaps (the JAR is the ship artifact; sourcemaps would balloon it).
-
-## Dev loop
-
-```bash
-# Backend — terminal 1
-java -jar target/code-iq-*-cli.jar serve /path/to/scan-target
-
-# Frontend — terminal 2
-cd src/main/frontend
-npm install   # only first time
-npm run dev   # Vite HMR on :5173, proxies /api and /mcp to :8080
-```
-
-The Vite dev-server proxy is defined at the bottom of `vite.config.ts`:
-
-```ts
-server: {
-  proxy: {
-    '/api': 'http://localhost:8080',
-    '/mcp': 'http://localhost:8080',
-  },
-}
-```
-
-## Production build → JAR embed
-
-`mvn package` triggers `frontend-maven-plugin` which runs `npm ci` + `npm run build`. Vite's `build.outDir: '../resources/static'` writes assets into `src/main/resources/static/`, which Spring Boot's static-resource handler serves out of the JAR at runtime when `codeiq.ui.enabled=true` (default true; toggle in `application.yml`).
-
-To skip the frontend build during backend-only iteration: `mvn test -Dfrontend.skip=true` (the property is wired in `pom.xml`'s `` block as `false`).
-
-## Gotchas
-
-- **`lib/mcp-tools.ts` is hand-maintained** — when you add a new `@McpTool` in `mcp/McpTools.java`, you must mirror the entry in `lib/mcp-tools.ts` for the `McpConsole` and `Dashboard` to know about it. There is no auto-sync.
-- **`emptyOutDir: false`** — stale assets in `src/main/resources/static/` won't be deleted by Vite. If you renamed a chunk or removed a page, manually delete the static dir before the next build.
-- **MCP endpoint path is `/mcp`**, not `/api/mcp` — the Vite proxy reflects this. The Spring AI starter mounts MCP at the root.
-- **AntD chunk size is intentional.** Don't try to "fix" the 500 KB+ AntD chunk by code-splitting per page — the AntD design tokens shouldn't be reloaded per route. The manual chunk in `vite.config.ts` is the right granularity.
diff --git a/shared/runbooks/release-go.md b/shared/runbooks/release-go.md
index f0ec7031..66c940aa 100644
--- a/shared/runbooks/release-go.md
+++ b/shared/runbooks/release-go.md
@@ -17,7 +17,6 @@ The pipeline is **tag-triggered, fully automated, and keyless-signed**:
    the public Rekor log).
 6. GitHub release is created as a **draft** with the verification
    recipe embedded in the release notes header.
-7. Optional Homebrew tap publish — see "Homebrew tap" below.
 
 ## Cutting a release
 
@@ -40,8 +39,6 @@ Within ~5 minutes:
 
 - `release-go` workflow finishes and creates a **draft** Release.
 - Sigstore transparency log records the signature.
-- (If `HOMEBREW_TAP_GITHUB_TOKEN` is configured) the `homebrew-codeiq`
-  tap gets a Formula bump.
 
 Review the draft release on GitHub — verify artifact list, checksums,
 SBOM presence, release notes — then click **Publish release**.
@@ -69,24 +66,6 @@ A successful `cosign verify-blob` proves:
 - The build ran on a GitHub-hosted runner under GitHub's OIDC token.
 - The signature was logged to the Rekor public transparency log.
 
-## Homebrew tap
-
-The tap repo lives at `RandomCodeSpace/homebrew-codeiq` (separate from
-the main repo; Homebrew's convention).
-
-Setup checklist (one-time, by a repo admin):
-
-1. Create the repo `homebrew-codeiq` under the `RandomCodeSpace` org.
-2. Generate a fine-grained PAT with `Contents: write` on
-   `homebrew-codeiq` only.
-3. Add it to `codeiq` repo secrets as `HOMEBREW_TAP_GITHUB_TOKEN`.
-
-After setup, every tag release updates the Formula automatically.
-
-If the secret is **not** set, the Homebrew step in `.goreleaser.yml`
-skips silently — useful for forks and for local `goreleaser release
---snapshot` dry runs.
-
 ## Local dry run
 
 To validate `.goreleaser.yml` without cutting a release:
@@ -98,9 +77,8 @@ ls dist/
 ```
 
 The `--snapshot` flag forces a fake version `-next` and
-disables publish steps (no GitHub upload, no signing, no Homebrew).
-CGO is needed locally — `CGO_ENABLED=1` is set in
-`.goreleaser.yml/env`.
+disables publish steps (no GitHub upload, no signing). CGO is needed
+locally — `CGO_ENABLED=1` is set in `.goreleaser.yml/env`.
 
 ## Failure recovery
 
@@ -111,9 +89,6 @@ CGO is needed locally — `CGO_ENABLED=1` is set in
 - **Signing failure (OIDC token)** — usually transient. Re-run the
   workflow. The OIDC permissions in `release-go.yml` are correct;
   GitHub occasionally has Sigstore connectivity issues.
-- **Homebrew tap PR fails** — check the PAT scope and that the tap
-  repo exists. The main release still publishes; only the Formula
-  bump skips.
 
 ## What this does NOT do