refactor(graph): use Kuzu 0.11 native features (FTS, param LIMIT, []string)#159

Merged
aksOps merged 1 commit into main from explore/kuzu-011-fts-limit-probe
May 14, 2026
@aksOps aksOps commented May 14, 2026

Summary

Kuzu 0.11.3 (merged in #155) lifts several restrictions that 0.7.1 had. This PR unwinds the workarounds we coded against the older runtime and exposes the native capabilities — most notably real FTS with BM25 ranking instead of CONTAINS predicates.

What changed

FTS (fulltext search) — `internal/graph/indexes.go`

  • `CreateIndexes()` was a no-op (Kuzu 0.7.1 needed a network `INSTALL fts` — incompatible with air-gapped builds). 0.11.3 ships FTS bundled.
  • Now creates two FTS indexes after enrich:
    • `code_node_label_fts` over `(label, fqn_lower)`
    • `code_node_lexical_fts` over `(prop_lex_comment, prop_lex_config_keys)`
  • `SearchByLabel` / `SearchLexical` use `CALL QUERY_FTS_INDEX` with BM25 ranking. Auto-appends `*` for prefix match on bare single-token queries (preserves the old "AuthService matches 'auth'" UX).
  • CONTAINS-based fallbacks retained for graphs without enrich (pre-index state).
  • Mutation gate (`MutationKeyword`): `CALL QUERY_FTS_INDEX` is allow-listed (read-only); `CREATE_FTS_INDEX` / `DROP_FTS_INDEX` stay blocked under `OpenReadOnly`.
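The prefix-match rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from `internal/graph/indexes.go`: the helper name `ftsQueryText` is hypothetical, and it assumes a "bare single-token" query means one with no whitespace and no existing `*`.

```go
package main

import (
	"fmt"
	"strings"
)

// ftsQueryText is a hypothetical helper (not the PR's actual function).
// It appends "*" to a bare single-token query so the FTS index treats it
// as a prefix match; everything else is passed through unchanged.
func ftsQueryText(q string) string {
	q = strings.TrimSpace(q)
	if q == "" {
		return q
	}
	// Assumption: multi-token queries and queries that already carry a
	// wildcard are left as the user typed them.
	if strings.ContainsAny(q, " \t\n*") {
		return q
	}
	return q + "*"
}

func main() {
	for _, q := range []string{"auth", "auth service", "auth*"} {
		fmt.Printf("%q -> %q\n", q, ftsQueryText(q))
	}
}
```

Under these assumptions a query such as `auth` becomes `auth*`, while `auth service` goes to the index unmodified.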

Parameterized LIMIT / SKIP

Kuzu 0.11.3 accepts `LIMIT $param` and `SKIP $param` as bound parameters; 0.7.1 required inline literals. Cleaned up at:

  • `internal/graph/indexes.go` — SearchByLabel / SearchLexical
  • `internal/graph/reads.go` — FindByKindPaginated
  • `internal/query/service.go` — FindCycles, FindDeadCode
  • `internal/mcp/tools_graph.go` — list-edges, ego-neighbours, endpoints-by-id
  • Helper `intLiteral` removed (only used to format inline LIMITs).
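To make the difference concrete, here is a rough before/after sketch of a paginated query. The function names, the `kind` and `id` property names, and the query text are illustrative assumptions; only the `CodeNode` table and the `$lim` / `$skip` parameter names come from this PR. The sketch only builds the query string and parameter map and does not call the Kuzu binding.

```go
package main

import "fmt"

// pagedKindCypher binds LIMIT and SKIP as parameters (the 0.11.3 style).
// Property names here are assumptions for illustration.
const pagedKindCypher = "MATCH (n:CodeNode) WHERE n.kind = $kind " +
	"RETURN n.id ORDER BY n.id SKIP $skip LIMIT $lim"

// inlineKindQuery mimics the old 0.7.1 workaround: the integers are
// formatted into the query text because they could not be bound.
func inlineKindQuery(limit, skip int64) string {
	return fmt.Sprintf("MATCH (n:CodeNode) WHERE n.kind = $kind "+
		"RETURN n.id ORDER BY n.id SKIP %d LIMIT %d", skip, limit)
}

// pagedKindQuery returns a constant query plus a parameter map, so the
// query text is identical for every page.
func pagedKindQuery(kind string, limit, skip int64) (string, map[string]any) {
	return pagedKindCypher, map[string]any{
		"kind": kind,
		"skip": skip,
		"lim":  limit,
	}
}

func main() {
	fmt.Println(inlineKindQuery(10, 20))
	q, params := pagedKindQuery("class", 10, 20)
	fmt.Println(q, params)
}
```

One practical benefit of the bound form is that the query text stays constant across pages, which makes it easier to log and to review than a string assembled per call.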

Drop `stringsToAny` widener

Kuzu 0.7's Go binding required `[]any` for list parameters; 0.11.3 accepts `[]string` directly. The widener helper is gone; `query.FindDeadCode` and `topology.FindServicesContainingNodes` now pass `[]string` as-is.
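For readers who have not seen the removed helper, a widener of this kind typically looks like the sketch below. The body is a guess at the shape of `stringsToAny`, not a copy of the deleted code, and the `ids` parameter name is hypothetical.

```go
package main

import "fmt"

// stringsToAny is a reconstruction of what the removed widener probably
// did: copy a []string into a []any element by element.
func stringsToAny(in []string) []any {
	out := make([]any, len(in))
	for i, s := range in {
		out[i] = s
	}
	return out
}

func main() {
	ids := []string{"node-1", "node-2"}

	// Old style: the list had to be widened before it went into the
	// parameter map.
	before := map[string]any{"ids": stringsToAny(ids)}

	// New style: the slice is passed as-is.
	after := map[string]any{"ids": ids}

	fmt.Printf("%T\n%T\n", before["ids"], after["ids"])
}
```

Dropping the helper also avoids one allocation and copy per call, although for the list sizes involved that is unlikely to be measurable.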

Still present in 0.11.3 (workarounds retained)

  • List comprehension binder still rejects out-of-scope vars — keep `properties(nodes(p), 'id')`
  • `EXISTS { … }` subquery still doesn't see outer-scope `$param` — keep rel-pattern alternation
  • Recursive pattern upper bound `[*1..N]` still requires a literal
  • BlastRadius's anonymous-recursive pattern stays

CLAUDE.md

Rewrote the Kuzu quirks section as "lifted in 0.11.3" vs "still present" buckets so future contributors don't reintroduce workarounds that the runtime no longer needs.

Test plan

  • `cd go && CGO_ENABLED=1 go test ./... -count=1` — 883 passed
  • End-to-end on `polyglot-bench/airflow`: enrich exit 0, 95k nodes / 246k edges, FTS search returns BM25-ranked hits
  • End-to-end on `~/projects/`: enrich exit 0, 187k nodes / 414k edges / 1m 29s / 1.88 GiB peak RSS (slight perf improvement vs pre-refactor)
  • `CALL QUERY_FTS_INDEX('CodeNode', 'code_node_label_fts', 'service*') ... ORDER BY score DESC LIMIT 5` returns ranked results (scores 12–14 across method/class kinds)

Out of scope

  • Removing the lower-cased `label_lower` / `fqn_lower` columns from the schema — they still back the CONTAINS fallback path. Migrating fully off them is a separate, riskier schema change.
  • Tokenizer tuning (camelCase splitting, stem) — defaults work well for the codebase patterns observed.

🤖 Generated with Claude Code

…tring)

Kuzu 0.11.3 bundles features that were unavailable or broken in 0.7.1.
This commit unwinds the workarounds documented in CLAUDE.md.

### FTS (fulltext search)

`CreateIndexes()` was a no-op because Kuzu 0.7.1's FTS extension needed
a network INSTALL (incompatible with air-gapped builds). 0.11.3 ships
FTS pre-bundled. `CreateIndexes()` now:

- `INSTALL fts; LOAD EXTENSION fts;`
- `CALL DROP_FTS_INDEX` / `CALL CREATE_FTS_INDEX` for two indexes:
  - `code_node_label_fts`   over `(label, fqn_lower)`
  - `code_node_lexical_fts` over `(prop_lex_comment, prop_lex_config_keys)`

`SearchByLabel` / `SearchLexical` route through `CALL QUERY_FTS_INDEX`
with BM25 score ranking. A trailing `*` is auto-appended when the user
query is a single bare token, giving prefix-match UX similar to the old
CONTAINS behaviour. CONTAINS-based fallbacks remain in place for graphs
that never ran enrich (FTS index would be missing).

The mutation gate (`MutationKeyword`) allows the read-only
`CALL QUERY_FTS_INDEX` procedure; the catalog writers
`CALL CREATE_FTS_INDEX` / `CALL DROP_FTS_INDEX` stay blocked under
`OpenReadOnly`.
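The allow/block split can be pictured with a small check like the one below. This is a simplified sketch under stated assumptions: `blockedFTSProcedure` is a hypothetical name, and the real `MutationKeyword` gate may tokenize queries differently and covers more than these two procedures.

```go
package main

import (
	"fmt"
	"strings"
)

// catalogWriters lists the FTS procedures that change the catalog and
// therefore stay blocked on a read-only connection.
var catalogWriters = []string{"CREATE_FTS_INDEX", "DROP_FTS_INDEX"}

// blockedFTSProcedure returns the name of the first catalog-writing FTS
// procedure found in the query, or "" if there is none. QUERY_FTS_INDEX
// is read-only and is never reported.
// Note: this is a naive substring match, so it would also flag the names
// inside string literals; a real gate would need proper tokenization.
func blockedFTSProcedure(query string) string {
	upper := strings.ToUpper(query)
	for _, name := range catalogWriters {
		if strings.Contains(upper, name) {
			return name
		}
	}
	return ""
}

func main() {
	queries := []string{
		"CALL QUERY_FTS_INDEX('CodeNode', 'code_node_label_fts', 'service*') RETURN node.id, score",
		"CALL DROP_FTS_INDEX('CodeNode', 'code_node_label_fts')",
	}
	for _, q := range queries {
		fmt.Printf("blocked=%q for %s\n", blockedFTSProcedure(q), q)
	}
}
```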

### Parameterized LIMIT / SKIP

Kuzu 0.7.1 rejected `$lim` / `$skip` bindings — values had to be inline
literals. 0.11.3 accepts them as bound parameters. Affected sites:

- `internal/graph/indexes.go` — SearchByLabel / SearchLexical
- `internal/graph/reads.go`   — FindByKindPaginated
- `internal/query/service.go` — FindCycles, FindDeadCode
- `internal/mcp/tools_graph.go` — list-edges, ego-neighbours, endpoints-by-id

Helper `intLiteral` is removed (was only used to format inline LIMITs).

### Drop `stringsToAny` widener

Kuzu 0.7's Go binding required `[]any` for list parameters; `[]string`
tripped `unsupported type` in `goValueToKuzuValue`. 0.11.3's binding
accepts `[]string` directly. The widener helper is removed and its
two callers (`query.FindDeadCode`, `topology.FindServicesContainingNodes`)
pass `[]string` straight.

### CLAUDE.md

Reworked the Kuzu quirks section into "lifted in 0.11.3" vs "still
present" buckets so future contributors don't reintroduce workarounds
that the runtime no longer needs.

### Verification

- `cd go && CGO_ENABLED=1 go test ./... -count=1` — 883 passed
- End-to-end on `~/projects/polyglot-bench/airflow`:
  enrich exit 0, 95k nodes, 246k edges, FTS search returns BM25-ranked hits
- End-to-end on `~/projects/`:
  enrich exit 0, 187k nodes, 414k edges, 1m 29s wall, 1.88 GiB peak RSS;
  FTS `'service*'` returns top-5 ranked at scores ~12-14

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@aksOps aksOps merged commit 799be73 into main May 14, 2026
13 checks passed
@aksOps aksOps deleted the explore/kuzu-011-fts-limit-probe branch May 14, 2026 00:51
aksOps added a commit that referenced this pull request May 14, 2026
Stale doc references after Phase 6 (Java deletion, #132) and the Kuzu
0.7.1 → 0.11.3 bump (#155 + #159).

- CLAUDE.md / PROJECT_SUMMARY.md: bump Kuzu 0.7.1 → 0.11.3,
  go-sqlite3 1.14.22 → 1.14.44, cobra to 1.10.2; note native FTS.
- AGENTS.md: rewrite "What this repo is" (no more "REST API");
  flip `mvn -B -ntp clean verify` → `go test ./...`; clarify that
  REST + React SPA were deleted in Phase 6 and won't return.
- SECURITY.md: rewrite scope. Drop the dead JAR / serve / REST API /
  React UI / H2 / Neo4j Embedded references. New in-scope list covers
  every codeiq subcommand, the 10 MCP tools (with `run_cypher` mutation
  gate called out), `.codeiq/cache/` (SQLite) + `.codeiq/graph/`
  (Kuzu), and `read_file` path sandboxing. Add the security CI
  workflows (CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM,
  Socket Security) + perf-gate to the hardening references.
- CHANGELOG.md: populate [Unreleased] with the OOM-fix saga
  (PRs #145-#148), the five correctness fixes (#149-#153), the
  Kuzu 0.7.1 → 0.11.3 bump (#155-#158), the FTS migration (#159),
  the Dependabot config rewrite (#154), and the enrich CLI knobs.

No code changes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>