Spark: Fix type mismatch in SPJ with bucket partition key on string column by ammarchalifah · Pull Request #16424 · apache/iceberg

ammarchalifah · 2026-05-19T14:38:29Z

Problem

When a table is partitioned by bucket(N, string_column), the bucket transform produces an Integer partition value. During Storage Partitioned Joins (SPJ), Spark reads partition values through StructInternalRow, whose getUTF8StringInternal() calls struct.get(ordinal, CharSequence.class). That call assumes the value is always a CharSequence, so reading the Integer bucket value fails with:

IllegalArgumentException: Wrong class, expected java.lang.CharSequence, but was java.lang.Integer, for object: 1
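For context, the Iceberg spec defines the bucket transform as (murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N, so its result is always an int, whatever the source column type. The sketch below is a simplified illustration, not Iceberg's code. It computes that value for a string and then reads it back the way the old getUTF8StringInternal() did. BucketRepro and PartitionTuple are hypothetical names, and PartitionTuple only mimics the class check implied by the error message above.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: why reading a bucket(N, string) partition value as CharSequence fails. */
public class BucketRepro {

  // 32-bit Murmur3 (x86 variant, seed 0), the hash the Iceberg spec uses for bucketing.
  static int murmur3x86_32(byte[] data) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int h1 = 0;
    int len = data.length;
    int blocks = len / 4;
    for (int i = 0; i < blocks; i++) {
      int off = i * 4;
      // Little-endian 4-byte block.
      int k1 = (data[off] & 0xff)
          | (data[off + 1] & 0xff) << 8
          | (data[off + 2] & 0xff) << 16
          | (data[off + 3] & 0xff) << 24;
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;
      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }
    // Remaining 1 to 3 tail bytes (intentional switch fall-through).
    int tail = blocks * 4;
    int k1 = 0;
    switch (len & 3) {
      case 3:
        k1 ^= (data[tail + 2] & 0xff) << 16;
      case 2:
        k1 ^= (data[tail + 1] & 0xff) << 8;
      case 1:
        k1 ^= data[tail] & 0xff;
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h1 ^= k1;
      default:
        break;
    }
    // Finalization mix.
    h1 ^= len;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  // bucket(N, string): hash the UTF-8 bytes, clear the sign bit, take the remainder.
  static Integer bucket(String value, int n) {
    int hash = murmur3x86_32(value.getBytes(StandardCharsets.UTF_8));
    return (hash & Integer.MAX_VALUE) % n;
  }

  // Hypothetical partition tuple whose typed getter rejects values of the wrong class.
  static final class PartitionTuple {
    private final Object[] values;

    PartitionTuple(Object... values) {
      this.values = values;
    }

    <T> T get(int pos, Class<T> javaClass) {
      Object value = values[pos];
      if (value == null || javaClass.isInstance(value)) {
        return javaClass.cast(value);
      }
      throw new IllegalArgumentException(String.format(
          "Wrong class, expected %s, but was %s, for object: %s",
          javaClass.getName(), value.getClass().getName(), value));
    }
  }

  public static void main(String[] args) {
    PartitionTuple partition = new PartitionTuple(bucket("iceberg", 16));
    try {
      // Mirrors the old getUTF8StringInternal(): assumes the value is a CharSequence.
      CharSequence seq = partition.get(0, CharSequence.class);
      System.out.println("unexpected: " + seq);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```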

This affects any SPJ query (e.g., MERGE INTO or JOIN) on tables partitioned with bucket(N, string_column).

Fix

Changed getUTF8StringInternal() to use struct.get(ordinal, Object.class) instead of struct.get(ordinal, CharSequence.class), then call value.toString(). This follows the same pattern already used by getBinaryInternal() in the same class, which uses Object.class to handle multiple possible runtime types.
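A minimal before/after sketch of the change follows. It assumes a simplified typed-struct interface: TypedStruct and structOf are hypothetical stand-ins for Iceberg's StructLike. It also returns String where the real method wraps the result in Spark's UTF8String.

```java
/** Sketch of the getUTF8StringInternal() change, without Spark or Iceberg dependencies. */
public class Utf8StringFixSketch {

  // Hypothetical stand-in for a StructLike-style typed accessor.
  interface TypedStruct {
    <T> T get(int pos, Class<T> javaClass);
  }

  // Hypothetical array-backed struct that rejects values of the wrong class.
  static TypedStruct structOf(Object... values) {
    return new TypedStruct() {
      @Override
      public <T> T get(int pos, Class<T> javaClass) {
        Object value = values[pos];
        if (value == null || javaClass.isInstance(value)) {
          return javaClass.cast(value);
        }
        throw new IllegalArgumentException(
            "Wrong class, expected " + javaClass.getName() + ", but was "
                + value.getClass().getName() + ", for object: " + value);
      }
    };
  }

  // Before: assumes every value is a CharSequence, so an Integer bucket value fails.
  static String getUTF8StringBefore(TypedStruct struct, int ordinal) {
    CharSequence seq = struct.get(ordinal, CharSequence.class);
    return seq.toString();
  }

  // After: read as Object and stringify, so String and Integer values both work.
  static String getUTF8StringAfter(TypedStruct struct, int ordinal) {
    Object value = struct.get(ordinal, Object.class);
    return value == null ? null : value.toString(); // null guard added for the sketch
  }

  public static void main(String[] args) {
    TypedStruct partition = structOf("hr", 7); // identity value, bucket value
    System.out.println(getUTF8StringAfter(partition, 0)); // hr
    System.out.println(getUTF8StringAfter(partition, 1)); // 7
    try {
      getUTF8StringBefore(partition, 1);
    } catch (IllegalArgumentException e) {
      System.out.println("before: " + e.getMessage());
    }
  }
}
```

Reading the value as Object lets the accessor accept whatever runtime type the struct holds, as getBinaryInternal() does. The null guard is an addition for the sketch and may not match the real method.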

The fix is applied to all Spark versions: 3.4, 3.5, 4.0, and 4.1.

Testing

Added testJoinsWithBucketingOnStringColumn using the existing checkJoin helper to cover bucket-only partitioning on string columns.
Added testJoinsWithIdentityAndBucketOnStringColumn as a targeted regression test for the exact scenario from the issue: identity + bucket partitioning on a string column with an SPJ join.
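The identity + bucket scenario places the same string column in the partition key twice, once as a String (identity) and once as an Integer (bucket). The toy model below is not Spark's partition-grouping logic. It reads each partition field the way the fixed accessor does and checks that both join sides report the same keys. toyBucket() is a hypothetical stand-in that uses String.hashCode() instead of Iceberg's Murmur3 hash.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of partition keys for PARTITIONED BY (dep, bucket(8, dep)). */
public class IdentityBucketKeySketch {

  // Hypothetical bucket function; Iceberg's real transform hashes with Murmur3.
  static Integer toyBucket(String value, int numBuckets) {
    return Math.floorMod(value.hashCode(), numBuckets);
  }

  // Partition tuple: identity value (String), then bucket value (Integer).
  static Object[] partitionOf(String dep) {
    return new Object[] {dep, toyBucket(dep, 8)};
  }

  // Reads every field as the fixed getUTF8StringInternal() does: as Object, then toString().
  static List<String> keyAsStrings(Object[] partition) {
    List<String> key = new ArrayList<>();
    for (Object value : partition) {
      key.add(value.toString());
    }
    return key;
  }

  // Distinct partition keys for one side of the join.
  static Set<List<String>> partitionKeys(List<String> deps) {
    Set<List<String>> keys = new LinkedHashSet<>();
    for (String dep : deps) {
      keys.add(keyAsStrings(partitionOf(dep)));
    }
    return keys;
  }

  public static void main(String[] args) {
    Set<List<String>> left = partitionKeys(List.of("hr", "eng", "hr"));
    Set<List<String>> right = partitionKeys(List.of("eng", "hr"));
    // Both sides report the same keys, so their partitions can be paired.
    System.out.println(left.equals(right)); // true
  }
}
```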

Both tests are added consistently across all 4 Spark versions.

Notes

AI tools were used to assist with drafting this change. I have reviewed and validated the logic, tests, and code style end-to-end.

Closes #15349


ammarchalifah · 2026-05-19T14:39:22Z

This PR re-implements the closed PR #15555.

I filed the original bug report and really need this bug fixed.

Resolving issue apache#15349, enable SPJ on bucket(N, col) where col is string (commit e37c145)

github-actions Bot added the spark label May 19, 2026

ammarchalifah wants to merge 1 commit into apache:main from ammarchalifah:fix/spj-bucket-type-mismatch