fix: detect binary files via NUL bytes, drop x/tools#55

Open
leno23 wants to merge 1 commit into dolph:main from leno23:fix/binary-nul-detection-issue-9

leno23 commented May 17, 2026

Summary

  • Replace godoc/util.IsText with an in-tree NUL-byte and UTF-8 check, applied to the read prefix and to the remainder of the file as it is streamed.
  • Skip any file that contains a NUL byte anywhere, so mixed text/binary content is not rewritten and corrupted.
  • Remove the golang.org/x/tools dependency.
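
A minimal sketch of what such an in-tree check could look like, using only the standard library. The name isTextBytes appears in the review thread for this PR, but the body here is an assumption about its behavior, not the code from the commit.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// isTextBytes is an illustrative guess at the helper named in the review
// thread: a buffer counts as text only if it contains no NUL byte and is
// valid UTF-8. The real implementation in the PR may differ.
func isTextBytes(b []byte) bool {
	return bytes.IndexByte(b, 0) < 0 && utf8.Valid(b)
}

func main() {
	fmt.Println(isTextBytes([]byte("hello, world\n"))) // true
	fmt.Println(isTextBytes([]byte("text\x00binary"))) // false: contains NUL
	fmt.Println(isTextBytes([]byte{0xff, 0xfe}))       // false: invalid UTF-8
}
```

Both bytes.IndexByte and utf8.Valid live in the standard library, which is what makes dropping the golang.org/x/tools dependency possible for this check.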

Test plan

  • go test ./...
  • TestReadSkipsBinaryWithNUL

Closes #9

Made with Cursor

Replace godoc/util.IsText with an in-tree UTF-8 and NUL-byte check on
the read prefix and while streaming the remainder. Skip rewriting files
that contain a NUL anywhere, avoiding corruption of mixed text/binary
content.

Closes dolph#9

Co-authored-by: Cursor <cursoragent@cursor.com>

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae53ef65d9

Comment thread file_handling.go
	if n == 0 {
		return ""
	}
	if !isTextBytes(buf[:n]) {
P1: Handle UTF-8 rune split across probe boundary

Running isTextBytes on the first 1024-byte probe rejects valid UTF-8 files whenever the probe ends in the middle of a multibyte rune (for example, 1023 ASCII bytes followed by é, whose two-byte encoding straddles byte 1024). In that case utf8.Valid returns false for the prefix even though the full file is valid text, so Read() returns "" and replacements are silently skipped for a legitimate text file.

Development

Successfully merging this pull request may close these issues.

Binary detection samples only the first 1024 bytes; mixed files are corrupted