fix(enrich): explicit QUOTE/ESCAPE so Kuzu COPY honors RFC-4180 #153
Merged
PR #150 switched the staging file delimiter to '|' to avoid JSON-property comma collisions. That fixes comma-bearing values but breaks when an ID itself contains a literal '|'. Istio's EDS cluster names are exactly this shape:

    json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none

Go's encoding/csv writer DOES wrap such fields in '"' per RFC-4180. But Kuzu's CSV reader defaults to BACKSLASH escaping, not the RFC-4180 doubled-quote form Go produces. With the default Kuzu escape rule, the pipe-bearing quoted field is parsed as multiple fields and the COPY aborts:

    Copy exception: Error in file ... expected 6 values per row, but got more.

Fix: pass `QUOTE='"', ESCAPE='"'` explicitly so Kuzu interprets the RFC-4180 form Go writes. Applies to both copyNodeBatch and copyEdgeBatch.

End-to-end: `codeiq enrich ~/projects/polyglot-bench/istio` now exits 0 (was exit 2 pre-fix): 36k nodes, 55k edges, 20 services.

Regression test TestBulkLoadEdgesPipeInTargetID covers the exact Istio cluster-name shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
aksOps added a commit that referenced this pull request on May 14, 2026
Stale doc references after Phase 6 (Java deletion, #132) and the Kuzu 0.7.1 → 0.11.3 bump (#155 + #159).

- CLAUDE.md / PROJECT_SUMMARY.md: bump Kuzu 0.7.1 → 0.11.3, go-sqlite3 1.14.22 → 1.14.44, cobra to 1.10.2; note native FTS.
- AGENTS.md: rewrite "What this repo is" (no more "REST API"); flip `mvn -B -ntp clean verify` → `go test ./...`; clarify that REST + React SPA were deleted in Phase 6 and won't return.
- SECURITY.md: rewrite scope. Drop the dead JAR / serve / REST API / React UI / H2 / Neo4j Embedded references. New in-scope list covers every codeiq subcommand, the 10 MCP tools (with `run_cypher` mutation gate called out), `.codeiq/cache/` (SQLite) + `.codeiq/graph/` (Kuzu), and `read_file` path sandboxing. Add the security CI workflows (CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM, Socket Security) + perf-gate to the hardening references.
- CHANGELOG.md: populate [Unreleased] with the OOM-fix saga (PRs #145-#148), the five correctness fixes (#149-#153), the Kuzu 0.7.1 → 0.11.3 bump (#155-#158), the FTS migration (#159), the Dependabot config rewrite (#154), and the enrich CLI knobs.

No code changes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
#150 switched the staging-file delimiter to `|` to avoid comma collisions inside JSON property values. That fix works for commas but breaks when an ID itself contains a literal `|`. Istio's EDS cluster names are exactly this shape:
```
json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none
```
Go's `encoding/csv` writer DOES wrap such fields in `"` per RFC-4180. But Kuzu's CSV reader defaults to backslash escaping, not the RFC-4180 doubled-quote form Go produces. With the default Kuzu escape rule, the pipe-bearing quoted field is parsed as multiple fields and COPY aborts:
```
Copy exception: Error in file /tmp/codeiq-edges-3435630223.csv on line 7319:
expected 6 values per row, but got more.
```
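To make the quoting concrete, here is a minimal sketch of what Go's `encoding/csv` emits when the delimiter is `|`. The helper name `encodeRow` and the three-column row are hypothetical and chosen only for illustration; they are not taken from the codeiq source.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeRow is a hypothetical helper (not a codeiq function). It writes one
// record with '|' as the delimiter and returns the raw staged line.
func encodeRow(fields []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = '|' // same delimiter #150 introduced for the staging file
	if err := w.Write(fields); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	line, err := encodeRow([]string{
		"src", // illustrative source ID
		"json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none",
		`{"k":"v"}`, // illustrative JSON property value
	})
	if err != nil {
		panic(err)
	}
	// The pipe-bearing ID is wrapped in quotes, and the quotes inside the
	// JSON value are doubled ("") rather than backslash-escaped.
	fmt.Print(line)
}
```

A reader that expects backslash escapes will not interpret the doubled `""` the way the writer intended, which is why the reader's quote and escape characters need to match what Go writes.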
Fix
Pass `QUOTE='"', ESCAPE='"'` explicitly to the Kuzu COPY FROM clause for both `copyNodeBatch` and `copyEdgeBatch`. Kuzu now reads the RFC-4180 form Go writes.
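As a rough sketch of the shape of the change, the statement could be assembled as below. The helper `copyStatement`, the table name, and the file path are hypothetical, and the `DELIM='|'` option is assumed to carry over from #150; the actual bodies of `copyNodeBatch` and `copyEdgeBatch` are not shown in this description and may differ.

```go
package main

import "fmt"

// copyStatement is a hypothetical helper for illustration only. It builds a
// Kuzu COPY FROM statement that sets the quote and escape characters to '"',
// matching the doubled-quote output of Go's encoding/csv writer.
func copyStatement(table, csvPath string) string {
	// DELIM='|' is an assumption based on #150; QUOTE and ESCAPE are the
	// options this PR adds.
	return fmt.Sprintf(`COPY %s FROM '%s' (DELIM='|', QUOTE='"', ESCAPE='"')`, table, csvPath)
}

func main() {
	// Table name and path are placeholders, not values from the codebase.
	fmt.Println(copyStatement("CodeEdge", "/tmp/edges.csv"))
}
```

Keeping the options in one helper shared by the node and edge paths would make it harder for the two loaders to drift apart, though whether the real code is factored this way is not stated here.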
Test plan
🤖 Generated with Claude Code