Panic in parseFileHeader when diff header lacks expected ' b/' separator #135

@ByteSizedMarius

Bug

parseFileHeader in diff.go panics with slice bounds out of range [:-1] when strings.Index returns -1 for a diff header that doesn't contain the expected b/ pattern. Admittedly this is an edge case, but I've encountered it twice.

Edit: I figured out that the trigger was a filename checked into git containing a German umlaut (ä). So it's actually only slightly an edge case.
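For context, git prints paths containing non-ASCII bytes in a quoted, octal-escaped form by default (controlled by core.quotePath). The following is a small sketch of how such a path decodes. The helper name decodeGitPath is hypothetical, and it uses Go's strconv.Unquote as a stand-in for the library's own UnescapeChars, on the assumption that the escapes git emits here (octal bytes, \", \\, \n, \t) are also valid Go string escapes.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeGitPath is a hypothetical helper, not part of git-module.
// It returns s unchanged when it is not quoted, and otherwise decodes
// the quoted form using Go string-literal rules (an assumption that
// holds for the octal escapes shown in this issue).
func decodeGitPath(s string) (string, error) {
	if len(s) == 0 || s[0] != '"' {
		return s, nil
	}
	return strconv.Unquote(s)
}

func main() {
	for _, raw := range []string{`"a/f\303\266o.txt"`, `b/foo.txt`} {
		p, err := decodeGitPath(raw)
		fmt.Printf("%s -> %q (err=%v)\n", raw, p, err)
	}
}
```

Because each path is quoted on its own, a header can mix one quoted side with one unquoted side, which is the case the parser does not expect.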

Reproduction

Old test:
func TestParseFileHeader_MalformedDiffLine(t *testing.T) {
	// A diff header without the ' b/' separator causes a panic.
	// This can happen with unusual filenames or for truncated git output.
	input := "diff --git a/somefile a/somefile\n"

	done := make(chan SteamParseDiffResult, 1)
	assert.NotPanics(t, func() {
		StreamParseDiff(strings.NewReader(input), done, 10, 10, 10)
		<-done
	})
}

This panics because strings.Index(line, " b/") returns -1, so the slice expression at diff.go:278 becomes line[beg+2 : -1], which triggers the panic.

Since StreamParseDiff (and (*Repository).Diff) runs in a goroutine, the panic is unrecoverable by the caller and crashes the entire process.
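To illustrate why the caller cannot catch this: recover only works inside the goroutine that panicked. Below is a minimal sketch of a wrapper that contains the panic and reports it as an error instead. All names (parseResult, riskyParse, parseAsync) are hypothetical stand-ins and not the library's API; riskyParse only mimics the failing slice pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// parseResult is a hypothetical stand-in for the library's result type.
type parseResult struct {
	Offset int
	Err    error
}

// riskyParse mimics the failing pattern: it slices with the result of
// strings.Index without checking for -1.
func riskyParse(line string) int {
	middle := strings.Index(line, " b/")
	return len(line[:middle]) // panics when middle == -1
}

// parseAsync runs riskyParse in its own goroutine. The deferred recover
// must live in that goroutine; a recover in the caller would never see
// the panic.
func parseAsync(line string, done chan<- parseResult) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- parseResult{Err: fmt.Errorf("diff parser panicked: %v", r)}
			}
		}()
		done <- parseResult{Offset: riskyParse(line)}
	}()
}

func main() {
	done := make(chan parseResult, 1)
	parseAsync("diff --git a/somefile a/somefile", done)
	res := <-done
	fmt.Println("err:", res.Err)
}
```

A wrapper like this only keeps the process alive; the parser still needs the actual fix so that valid headers are parsed correctly.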

Edit: here are tests with föo.txt / foo.txt:

// TestParseFileHeader_AsymmetricQuoting reproduces a panic in parseFileHeader
// when git quotes only one side of a diff --git header.
//
// Git quotes each path independently based on whether it contains non-ASCII
// characters. A rename from a non-ASCII name to an ASCII name produces:
//
//   diff --git "a/f\303\266o.txt" b/foo.txt
//
// The a-side is quoted (contains ö = \303\266), the b-side is not. The parser
// checks line[len(diffHead)] == '"' and, finding a quote, searches for ` "b/`.
// But the b-side has no quote, so strings.Index returns -1, causing a panic
// at: a := line[beg+2 : middle]
//
// Real-world trigger: git diff --full-index -M <parent> <child> on a repo
// where a file with non-ASCII characters was renamed to an ASCII filename.
func TestParseFileHeader_AsymmetricQuoting(t *testing.T) {
	tests := []struct {
		name string
		diff string
	}{
		{
			name: "quoted a-side, unquoted b-side",
			diff: "diff --git \"a/f\\303\\266o.txt\" b/foo.txt\n" +
				"similarity index 85%\n" +
				"rename from \"f\\303\\266o.txt\"\n" +
				"rename to foo.txt\n" +
				"index abc1234..def5678 100644\n" +
				"--- \"a/f\\303\\266o.txt\"\n" +
				"+++ b/foo.txt\n" +
				"@@ -1,3 +1,3 @@\n" +
				"-old line\n" +
				"+new line\n" +
				" context\n",
		},
		{
			name: "unquoted a-side, quoted b-side",
			diff: "diff --git a/foo.txt \"b/f\\303\\266o.txt\"\n" +
				"similarity index 85%\n" +
				"rename from foo.txt\n" +
				"rename to \"f\\303\\266o.txt\"\n" +
				"index abc1234..def5678 100644\n" +
				"--- a/foo.txt\n" +
				"+++ \"b/f\\303\\266o.txt\"\n" +
				"@@ -1,3 +1,3 @@\n" +
				"-old line\n" +
				"+new line\n" +
				" context\n",
		},
		{
			name: "both sides quoted (should already work)",
			diff: "diff --git \"a/f\\303\\266o.txt\" \"b/f\\303\\266o.txt\"\n" +
				"index abc1234..def5678 100644\n" +
				"--- \"a/f\\303\\266o.txt\"\n" +
				"+++ \"b/f\\303\\266o.txt\"\n" +
				"@@ -1,3 +1,3 @@\n" +
				"-old line\n" +
				"+new line\n" +
				" context\n",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			done := make(chan git.SteamParseDiffResult)
			r := io.NopCloser(strings.NewReader(tt.diff))
			go git.StreamParseDiff(r, done, 100, 100, 100)
			result := <-done
			if result.Err != nil {
				t.Errorf("unexpected error: %v", result.Err)
			}
			if result.Diff == nil {
				t.Fatal("expected non-nil diff")
			}
			if len(result.Diff.Files) != 1 {
				t.Fatalf("expected 1 file, got %d", len(result.Diff.Files))
			}
		})
	}
}

Stack trace

panic: runtime error: slice bounds out of range [:-1]

goroutine 830 [running]:
github.com/aymanbagabas/git-module.(*diffParser).parseFileHeader(0xc0002124c0)
	diff.go:278 +0xa6d
github.com/aymanbagabas/git-module.(*diffParser).parse(0xc0002124c0)
	diff.go:489 +0x3af
github.com/aymanbagabas/git-module.StreamParseDiff(...)
	diff.go:553 +0x1d0

Edit/note: the stack trace shows https://github.com/aymanbagabas/git-module, which is where I noticed the issue, but the code hasn't changed upstream.

Fix

Add a bounds check before slicing:

if middle == -1 {
	return nil, fmt.Errorf("malformed diff header: missing file separator in %q", line)
}

This catches other malformed headers, but it doesn't solve the actual issue: a header where only one side is quoted is valid git output and should parse.

Proposed fix:

func (p *diffParser) parseFileHeader() (*DiffFile, error) {
	line := string(p.buffer)
	p.buffer = nil

	// Git quotes each side of the diff header independently when a path
	// contains non-ASCII characters. All four combinations are possible:
	//   diff --git a/foo b/bar             (neither quoted)
	//   diff --git "a/föo" "b/bär"         (both quoted)
	//   diff --git "a/föo" b/bar           (only a quoted)
	//   diff --git a/foo "b/bär"           (only b quoted)
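	beg := len(diffHead)
	if len(line) <= beg {
		// Guard against a truncated header before indexing line[beg].
		return nil, fmt.Errorf("malformed diff header: %s", line)
	}
	aQuoted := line[beg] == '"'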

	var middle int
	if aQuoted {
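		// Find the closing quote of the a-path to locate the separator,
		// skipping escaped characters such as \" and \\ inside the path.
		end := -1
		for i := beg + 1; i < len(line); i++ {
			if line[i] == '\\' {
				i++ // skip the escaped character
				continue
			}
			if line[i] == '"' {
				end = i - (beg + 1)
				break
			}
		}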
		if end == -1 {
			return nil, fmt.Errorf("malformed diff header: unterminated quote: %s", line)
		}
		middle = beg + 1 + end + 1 // position of space after closing quote
	} else {
		middle = strings.Index(line, ` "b/`)
		if middle == -1 {
			middle = strings.Index(line, ` b/`)
		}
	}

	if middle == -1 || middle >= len(line)-1 {
		return nil, fmt.Errorf("malformed diff header: %s", line)
	}

	bQuoted := line[middle+1] == '"'

	// Extract raw a and b path strings (with prefix, possibly quoted)
	aRaw := line[beg:middle]
	bRaw := line[middle+1:]

	// Strip quotes and a/, b/ prefixes
	var a, b string
	if aQuoted {
		inner := aRaw[1 : len(aRaw)-1]
		a = string(UnescapeChars([]byte(inner[2:])))
	} else {
		a = aRaw[2:]
	}
	if bQuoted {
		inner := bRaw[1 : len(bRaw)-1]
		b = string(UnescapeChars([]byte(inner[2:])))
	} else {
		b = bRaw[2:]
	}

	file := &DiffFile{
		Name:    a,
		oldName: b,
		Type:    DiffFileChange,
	}
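As a standalone way to exercise the four quoting combinations outside the parser, here is a hedged sketch of the header-splitting step on its own. The names splitDiffHeader and stripSide are hypothetical, diffHead is redeclared locally with the value the headers above imply, and strconv.QuotedPrefix / strconv.Unquote (Go 1.17+) stand in for the library's UnescapeChars, assuming git's escapes are valid Go string escapes. For an unquoted a-side it takes the first ` "b/` or ` b/` match, so a path that itself contains " b/" remains ambiguous.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const diffHead = "diff --git "

// stripSide unquotes raw if needed and removes the "a/" or "b/" prefix.
func stripSide(raw, prefix string) (string, error) {
	if strings.HasPrefix(raw, `"`) {
		u, err := strconv.Unquote(raw)
		if err != nil {
			return "", fmt.Errorf("malformed quoted path %q: %w", raw, err)
		}
		raw = u
	}
	if !strings.HasPrefix(raw, prefix) {
		return "", fmt.Errorf("path %q lacks prefix %q", raw, prefix)
	}
	return raw[len(prefix):], nil
}

// splitDiffHeader returns the a and b paths of a "diff --git" line,
// treating each side's quoting independently.
func splitDiffHeader(line string) (string, string, error) {
	if !strings.HasPrefix(line, diffHead) {
		return "", "", fmt.Errorf("malformed diff header: %q", line)
	}
	rest := line[len(diffHead):]

	var aRaw, bRaw string
	if strings.HasPrefix(rest, `"`) {
		// QuotedPrefix honors escaped quotes inside the path.
		q, err := strconv.QuotedPrefix(rest)
		if err != nil {
			return "", "", fmt.Errorf("malformed diff header: unterminated quote: %q", line)
		}
		rest = rest[len(q):]
		if !strings.HasPrefix(rest, " ") {
			return "", "", fmt.Errorf("malformed diff header: %q", line)
		}
		aRaw, bRaw = q, rest[1:]
	} else {
		i := strings.Index(rest, ` "b/`)
		if i == -1 {
			i = strings.Index(rest, " b/")
		}
		if i == -1 {
			return "", "", fmt.Errorf("malformed diff header: missing file separator in %q", line)
		}
		aRaw, bRaw = rest[:i], rest[i+1:]
	}

	a, err := stripSide(aRaw, "a/")
	if err != nil {
		return "", "", err
	}
	b, err := stripSide(bRaw, "b/")
	if err != nil {
		return "", "", err
	}
	return a, b, nil
}

func main() {
	a, b, err := splitDiffHeader(`diff --git "a/f\303\266o.txt" b/foo.txt`)
	fmt.Printf("a=%q b=%q err=%v\n", a, b, err)
}
```

Every failure path returns an error rather than slicing with an unchecked index, so a malformed header cannot panic in this sketch.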
