Spark: Re-enable subquery-expression view tests for the Hive view catalog #16416

Open
wombatu-kun wants to merge 1 commit into apache:main from wombatu-kun:issue/15053-hive-view-subquery

wombatu-kun (Contributor) commented May 19, 2026

Closes #15053

What

TestViews#createViewWithSubqueryExpressionInFilterThatIsRewritten and ...InQueryThatIsRewritten were skipped for the Hive view catalog (SPARK_WITH_HIVE_VIEWS) by an assumeThat that blamed a FileInputFormat instantiation error. This re-enables both tests for the Hive view catalog on Spark 3.5, 4.0 and 4.1.

Why

Investigation (see #15053) shows that the Hive-backed subquery view itself works: creating the view, reading it, and the catalog/namespace SQL rewrite all pass. The only failing step was the cross-catalog negative assertion, which switches to spark_catalog and expects the unqualified table to be unresolvable. That premise does not hold for the Hive view catalog, because it shares its Hive Metastore with Spark's built-in spark_catalog. The Iceberg table is therefore resolvable there, and a native Hive read of it then fails at execution time (Iceberg deliberately writes a placeholder InputFormat when engine.hive.enabled is false, since a native read of an Iceberg table would otherwise return silently wrong results). So this is a test-correctness issue, not an Iceberg view defect; the abstract placeholder InputFormat behavior is intentional and out of scope.

How

  • Remove the blanket assumeThat(...).isNotEqualTo("hive") so both tests run for the Hive view catalog.
  • Branch the cross-catalog negative assertion: for the Hive view catalog assert the raw SQL fails at execution with SparkException (table found in the shared metastore but not natively readable); otherwise keep the existing AnalysisException not-found assertion.
  • Suppress the custom Checkstyle rule AssertThatThrownByWithMessageCheck on these two tests via @SuppressWarnings("checkstyle:AssertThatThrownByWithMessageCheck") — the same inline mechanism already used elsewhere in the repo (e.g. TestReadProjection, TestSerializedMetadata) — because a message assertion is intentionally omitted here.
  • Applied identically to spark/v3.5, spark/v4.0 and spark/v4.1. (spark/v3.4 never had the Hive view catalog parameter, so it is unchanged.)
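
The branching described above can be modeled in plain Java. This is only a sketch and not the code in the diff: the real tests use AssertJ with Spark's AnalysisException and SparkException, whereas the exception classes, the method names and the "other" catalog value below are hypothetical stand-ins chosen so the example runs without Spark on the classpath.

```java
import java.util.concurrent.Callable;

class NegativeAssertionSketch {
  // Hypothetical stand-in for Spark's AnalysisException (table not found at analysis time).
  static class NotFoundAtAnalysis extends Exception {}

  // Hypothetical stand-in for Spark's SparkException (table found, read fails at execution).
  static class FailedAtExecution extends Exception {}

  /** Picks the failure type the negative assertion should expect for a view catalog type. */
  static Class<? extends Exception> expectedFailure(String viewCatalogType) {
    return "hive".equals(viewCatalogType) ? FailedAtExecution.class : NotFoundAtAnalysis.class;
  }

  /** Returns true only if the query throws an exception of the expected type. */
  static boolean failsWith(Callable<?> query, Class<? extends Exception> expected) {
    try {
      query.call();
      return false; // the query unexpectedly succeeded
    } catch (Exception e) {
      return expected.isInstance(e);
    }
  }

  public static void main(String[] args) {
    // Simulated raw SQL run against spark_catalog in each configuration.
    Callable<Object> hiveRead = () -> { throw new FailedAtExecution(); };
    Callable<Object> otherRead = () -> { throw new NotFoundAtAnalysis(); };

    System.out.println(failsWith(hiveRead, expectedFailure("hive")));
    System.out.println(failsWith(otherRead, expectedFailure("other")));
  }
}
```

Only the exception type is compared, which mirrors the decision to omit a message assertion for the Hive branch.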

The Hive-specific assertion checks only the exception type because that is the only stable, version-independent signal. The root cause is a message-less java.lang.InstantiationException (Spark reflectively instantiating the abstract FileInputFormat placeholder), so asserting on its message would check nothing, and the top-level org.apache.spark.SparkException message differs across Spark 3.5/4.0/4.1, so asserting on it would be fragile. The message check is therefore deliberately skipped and the Checkstyle rule suppressed for these two tests only.
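
The root cause can be reproduced in isolation, without Spark or Hadoop. In the sketch below, PlaceholderFormat is a hypothetical abstract class standing in for the placeholder InputFormat, and the reflective call is an assumption about how such a class might be instantiated, not a copy of Spark's code path. It shows that reflectively constructing an abstract class fails with java.lang.InstantiationException, so the exception type is the dependable signal.

```java
class AbstractInstantiationSketch {
  // Hypothetical stand-in for the abstract placeholder InputFormat; not the real Hadoop class.
  abstract static class PlaceholderFormat {}

  // A concrete subclass, included only for contrast.
  static class ConcreteFormat extends PlaceholderFormat {}

  /** Returns the exception raised by reflective instantiation, or null if it succeeds. */
  static Exception tryInstantiate(Class<?> type) {
    try {
      type.getDeclaredConstructor().newInstance();
      return null;
    } catch (ReflectiveOperationException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    Exception failure = tryInstantiate(PlaceholderFormat.class);
    // Print the type and whatever message the JVM supplies, without relying on the message.
    System.out.println("abstract: " + failure.getClass().getName() + ", message=" + failure.getMessage());
    System.out.println("concrete: " + tryInstantiate(ConcreteFormat.class));
  }
}
```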

Testing

createViewWithSubqueryExpressionInFilterThatIsRewritten and createViewWithSubqueryExpressionInQueryThatIsRewritten now pass for all parameters, including SPARK_WITH_HIVE_VIEWS, on Spark 3.5, 4.0 and 4.1.

github-actions bot added the spark label May 19, 2026
wombatu-kun force-pushed the issue/15053-hive-view-subquery branch from eb3cfb9 to b655d39 on May 19, 2026 08:52
…alog

TestViews#createViewWithSubqueryExpressionInFilterThatIsRewritten and ...InQueryThatIsRewritten were skipped for the Hive view catalog via an assumeThat that attributed the failure to a FileInputFormat instantiation error. Investigation shows the Hive-backed subquery view itself works: creating the view, reading it, and the catalog/namespace SQL rewrite all pass. The only failing step was the cross-catalog negative assertion, which switches to spark_catalog and expects the unqualified table to be unresolvable. That premise does not hold for the Hive view catalog because it shares its Hive Metastore with Spark's built-in spark_catalog: the Iceberg table is resolvable there, and a native Hive read of it then fails at execution time (Iceberg deliberately writes a placeholder InputFormat when engine.hive.enabled is false, since a native read of an Iceberg table would otherwise be silently wrong).

Remove the blanket skip so both tests run for the Hive view catalog, and branch the negative assertion: for the Hive view catalog the raw SQL is expected to fail at execution with SparkException (the table is found in the shared metastore but is not natively readable), otherwise the existing AnalysisException not-found assertion is kept. The change is applied identically to spark/v3.5, v4.0 and v4.1. The only stable, version-independent signal for the Hive failure is the exception type: its root cause is a message-less java.lang.InstantiationException (Spark reflectively instantiating the abstract FileInputFormat placeholder) and the top-level SparkException message differs across Spark 3.5/4.0/4.1, so the assertion intentionally checks only the type and the custom Checkstyle rule AssertThatThrownByWithMessageCheck is suppressed on these two tests via @SuppressWarnings, matching existing usages elsewhere in the repo (e.g. TestReadProjection, TestSerializedMetadata).

Both tests now pass for all parameters, including SPARK_WITH_HIVE_VIEWS, on Spark 3.5, 4.0 and 4.1.

Closes apache#15053

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wombatu-kun force-pushed the issue/15053-hive-view-subquery branch from b655d39 to be90e7b on May 20, 2026 12:48