fix(parser): unquote TOML keys and section headers #152
Merged
Apache Airflow's `.cherry_picker.toml` uses TOML's quoted-key form:
"check_sha" = "..."
`parseTOML` was reading the LHS as the raw text including the literal
quotes. The TomlStructureDetector then emitted node IDs like
`toml:.cherry_picker.toml:"check_sha"` while the CONTAINS edges (and
any downstream lookup) referenced different shapes — Kuzu's BulkLoad
aborted with:
Copy exception: Unable to find primary key value
"toml:.cherry_picker.toml:""check_sha"""
Bug was symmetric for `["quoted-section"]` headers. Fix both: call
the existing `unquote` helper on the key/section before storing.
Regression tests added in structured_test.go (new file).
End-to-end: `codeiq enrich ~/projects/polyglot-bench/airflow` now
exits 0 (was exit 2): 95k nodes, 246k edges, 165 services loaded.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Apache Airflow's `.cherry_picker.toml` uses TOML's quoted-key form:
```toml
"check_sha" = "..."
```
`parseTOML` was reading the LHS as the raw text including the literal quotes. The `TomlStructureDetector` then emitted node IDs like `toml:.cherry_picker.toml:"check_sha"` while the CONTAINS edges referenced a different escaping shape, and Kuzu's BulkLoadEdges aborted:
```
Error: enrich: bulk load edges: graph: copy CONTAINS:
Copy exception: Unable to find primary key value
"toml:.cherry_picker.toml:""check_sha"""
```
This blocked end-to-end enrich on real-world repos with quoted TOML keys (caught running enrich on `apache/airflow` after #149/#150/#151 landed).
Fix
Use the existing `unquote` helper on both the section header and the key before storing in the parsed map. Symmetric fix because `["quoted-section"]` headers were broken the same way.
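A minimal sketch of the idea, not the code from this PR: `unquoteKey` and `parseLine` are hypothetical names standing in for the existing `unquote` helper and the relevant part of `parseTOML`, whose real signatures are not shown here. It assumes one statement per line and does not handle escape sequences, dotted keys, `[[array]]` tables, or `=` inside a quoted key.

```go
package main

import (
	"fmt"
	"strings"
)

// unquoteKey strips one pair of matching surrounding quotes (" or ') from a
// TOML key or section name. Hypothetical stand-in for the `unquote` helper;
// escape sequences inside the quotes are left untouched.
func unquoteKey(s string) string {
	s = strings.TrimSpace(s)
	if len(s) >= 2 {
		first, last := s[0], s[len(s)-1]
		if first == last && (first == '"' || first == '\'') {
			return s[1 : len(s)-1]
		}
	}
	return s
}

// parseLine classifies a single line as a section header or a key/value pair
// and returns the normalized (unquoted) name. Assumption: simplified grammar,
// splitting on the first '=' only.
func parseLine(line string) (kind, name string) {
	line = strings.TrimSpace(line)
	if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
		return "section", unquoteKey(line[1 : len(line)-1])
	}
	if i := strings.Index(line, "="); i >= 0 {
		return "key", unquoteKey(line[:i])
	}
	return "", ""
}

func main() {
	lines := []string{`"check_sha" = "..."`, `["quoted-section"]`, `plain = 1`}
	for _, l := range lines {
		kind, name := parseLine(l)
		// The node ID shape follows the one quoted in the summary.
		fmt.Printf("%-7s toml:.cherry_picker.toml:%s\n", kind, name)
	}
}
```

With the quotes stripped before the name is stored, the node ID and any edge endpoint built from the same parsed map share one spelling, which is the property the bulk load depends on.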
Test plan
🤖 Generated with Claude Code