Docs: Fix spark-quickstart to align with Docker setup by KodaiD · Pull Request #16436 · apache/iceberg

KodaiD · 2026-05-20T02:36:34Z

Problem

The "Adding A Catalog" section in spark-quickstart.md runs standalone spark-sql commands, while the rest of the guide uses docker exec with the spark-iceberg image. This inconsistency makes the tutorial difficult to follow. Additionally, the catalog type is described as "JDBC" when the configuration actually uses a Hadoop catalog.

Solutions

This PR updates the section to align with the Docker-based setup used in the rest of the guide, and corrects the catalog type description.

Changes:

CLI tab: Replace spark-sql with a docker exec command and remove the --packages flag, since the package is already bundled in the image
spark-defaults.conf tab: Provide a docker exec command to append the config, and remove spark.jars.packages
Catalog type description: Change "JDBC" to "Hadoop catalog" to match the actual configuration

KodaiD · 2026-05-20T02:37:02Z

Verified locally by following the updated steps.

$ docker exec -it spark-iceberg spark-sql \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=/home/iceberg/warehouse \
    --conf spark.sql.defaultCatalog=local
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
26/05/20 02:01:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
26/05/20 02:01:28 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark Web UI available at http://4cdbfd11e5bc:4041/
Spark master: local[*], Application Id: local-1779242488876
spark-sql ()> CREATE DATABASE local.db;
Time taken: 0.471 seconds
spark-sql ()> CREATE TABLE local.db.sample (id int, name string);
Time taken: 0.285 seconds
spark-sql ()>
What's next:
    Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug spark-iceberg
    Learn more at https://docs.docker.com/go/debug-cli/
$ ls warehouse/db/sample/metadata/
v1.metadata.json        version-hint.text
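
The final listing in the transcript above can be turned into a repeatable check. The following is a minimal sketch, assuming the warehouse directory is visible on the host as it is in the transcript; the function name check_table_metadata is hypothetical and not part of the guide or the image.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a Hadoop-catalog table directory contains the two
# kinds of metadata files shown in the listing above.
# check_table_metadata is a hypothetical helper name.
check_table_metadata() {
    local metadata_dir="$1/metadata"
    # version-hint.text records the current metadata version number.
    if [ ! -f "$metadata_dir/version-hint.text" ]; then
        echo "missing version-hint.text"
        return 1
    fi
    # At least one vN.metadata.json file must exist. If the glob matches
    # nothing, $f holds the literal pattern and the -f test fails.
    local f
    for f in "$metadata_dir"/v*.metadata.json; do
        if [ -f "$f" ]; then
            echo "ok"
            return 0
        fi
    done
    echo "missing metadata json"
    return 1
}
```

With the layout from the transcript, the call would be check_table_metadata warehouse/db/sample.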

$ docker exec -it spark-iceberg bash -c "cat << EOF >> /opt/spark/conf/spark-defaults.conf
spark.sql.extensions                                 org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog                      org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type                 hive
spark.sql.catalog.local                              org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type                         hadoop
spark.sql.catalog.local.warehouse                    /home/iceberg/warehouse
spark.sql.defaultCatalog                             local
EOF"

$ docker exec -it spark-iceberg spark-sql
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
26/05/20 02:09:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
26/05/20 02:09:55 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark Web UI available at http://a68f389eb8e9:4041/
Spark master: local[*], Application Id: local-1779242995931
spark-sql ()> CREATE DATABASE local.db;
Time taken: 0.431 seconds
spark-sql ()> CREATE TABLE local.db.sample (id int, name string);
Time taken: 0.208 seconds
spark-sql ()>
$ ls warehouse/db/sample/metadata/
v1.metadata.json        version-hint.text
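
One thing to keep in mind with the append shown in this transcript: the heredoc uses >>, so running it a second time adds the same properties to spark-defaults.conf again. Below is a minimal sketch of an append that skips keys already present. The helper name append_conf_once is hypothetical, and the sketch assumes it runs where the config file is directly writable (for example, inside the container).

```shell
#!/usr/bin/env bash
# Sketch: append a "key value" line to a Spark properties file only when
# the key is not already set. append_conf_once is a hypothetical helper.
append_conf_once() {
    local file="$1"
    local key="$2"
    local value="$3"
    # awk compares the first whitespace-separated field with the key
    # exactly, so spark.sql.catalog.local does not match
    # spark.sql.catalog.local.type. A missing file counts as "not found".
    if awk -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$key" "$value" >> "$file"
}
```

A call such as append_conf_once /opt/spark/conf/spark-defaults.conf spark.sql.defaultCatalog local would then be safe to repeat.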

kevinjqliu · 2026-05-20T16:16:31Z

Thanks for working on this, @KodaiD.

> Additionally, the catalog type is described as "JDBC" when the configuration actually uses Hadoop catalog.

I think we want to limit the usage of the Hadoop catalog in our docs and encourage JDBC instead.
I tried to do this in #11285 and #11845 before but didn't get a chance to finish.
Would you like to help take it over the finish line?
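
For reference, a JDBC-backed variant of the catalog flags might look roughly like the sketch below. It is an illustration only, not the configuration from this PR or from #11285/#11845. It assumes that a SQLite JDBC driver is available on the image's classpath and uses a placeholder database path; the function name jdbc_catalog_flags is hypothetical. The type=jdbc and uri catalog properties follow Iceberg's JDBC catalog documentation.

```shell
#!/usr/bin/env bash
# Sketch: print spark-sql --conf flags for a JDBC-backed Iceberg catalog.
# ASSUMPTIONS (not from this PR): a SQLite JDBC driver is on the classpath,
# and the JDBC URI passed in is only a placeholder.
# jdbc_catalog_flags is a hypothetical helper name.
jdbc_catalog_flags() {
    local name="$1"
    local warehouse="$2"
    local jdbc_uri="$3"
    printf '%s\n' "--conf spark.sql.catalog.${name}=org.apache.iceberg.spark.SparkCatalog"
    printf '%s\n' "--conf spark.sql.catalog.${name}.type=jdbc"
    printf '%s\n' "--conf spark.sql.catalog.${name}.uri=${jdbc_uri}"
    printf '%s\n' "--conf spark.sql.catalog.${name}.warehouse=${warehouse}"
}

# Example call; the catalog name and warehouse path mirror the transcripts,
# while the SQLite file location is a placeholder.
jdbc_catalog_flags local /home/iceberg/warehouse jdbc:sqlite:/home/iceberg/warehouse/catalog.db
```

The printed flags would replace the local.type=hadoop line in the docker exec command; whether this works in the spark-iceberg image as-is depends on the driver assumption above.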

Docs: Fix spark-quickstart to align with Docker setup

71750d4

github-actions Bot added the docs label May 20, 2026

KodaiD wants to merge 1 commit into apache:main from KodaiD:docs-fix-catalog-section-for-docker
