Docs: Fix spark-quickstart to align with Docker setup #16436

Open
KodaiD wants to merge 1 commit into apache:main from KodaiD:docs-fix-catalog-section-for-docker

KodaiD commented May 20, 2026

Problem

The "Adding A Catalog" section in spark-quickstart.md runs standalone spark-sql commands, while the rest of the guide uses docker exec with the spark-iceberg image. This inconsistency makes the tutorial difficult to follow. Additionally, the catalog type is described as "JDBC" when the configuration actually uses a Hadoop catalog.

Solutions

This PR updates the section to align with the Docker-based setup used in the rest of the guide, and corrects the catalog type description.

Changes:

  • CLI tab: Replace spark-sql with a docker exec command and remove the --packages flag, since the packages it pulled in are already bundled in the image
  • spark-defaults.conf tab: Provide a docker exec command to append the config, and remove spark.jars.packages
  • Catalog type description: Fix JDBC → Hadoop catalog to match the actual configuration

github-actions bot added the docs label May 20, 2026

KodaiD (Author) commented May 20, 2026

Verified locally by following the updated steps.

$ docker exec -it spark-iceberg spark-sql \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=/home/iceberg/warehouse \
    --conf spark.sql.defaultCatalog=local
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
26/05/20 02:01:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
26/05/20 02:01:28 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark Web UI available at http://4cdbfd11e5bc:4041/
Spark master: local[*], Application Id: local-1779242488876
spark-sql ()> CREATE DATABASE local.db;
Time taken: 0.471 seconds
spark-sql ()> CREATE TABLE local.db.sample (id int, name string);
Time taken: 0.285 seconds
spark-sql ()>
What's next:
    Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug spark-iceberg
    Learn more at https://docs.docker.com/go/debug-cli/
$ ls warehouse/db/sample/metadata/
v1.metadata.json        version-hint.text
$ docker exec -it spark-iceberg bash -c "cat << EOF >> /opt/spark/conf/spark-defaults.conf
spark.sql.extensions                                 org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog                      org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type                 hive
spark.sql.catalog.local                              org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type                         hadoop
spark.sql.catalog.local.warehouse                    /home/iceberg/warehouse
spark.sql.defaultCatalog                             local
EOF"

What's next:
    Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug spark-iceberg
    Learn more at https://docs.docker.com/go/debug-cli/
$ docker exec -it spark-iceberg spark-sql
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
26/05/20 02:09:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
26/05/20 02:09:55 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark Web UI available at http://a68f389eb8e9:4041/
Spark master: local[*], Application Id: local-1779242995931
spark-sql ()> CREATE DATABASE local.db;
Time taken: 0.431 seconds
spark-sql ()> CREATE TABLE local.db.sample (id int, name string);
Time taken: 0.208 seconds
spark-sql ()>
What's next:
    Try Docker Debug for seamless, persistent debugging tools in any container or image → docker debug spark-iceberg
    Learn more at https://docs.docker.com/go/debug-cli/
$ ls warehouse/db/sample/metadata/
v1.metadata.json        version-hint.text
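
One caveat with the append step above: because it uses `>>`, running it a second time adds the same lines to spark-defaults.conf again. A minimal sketch of a guard, assuming a POSIX shell and `grep`; the helper name `append_once` is hypothetical and not part of this PR, and the demo writes to a scratch file rather than the real config:

```shell
# Hypothetical helper: append a config line only if that exact line
# is not already present in the file.
append_once() {
  file="$1"
  line="$2"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file, not the container's spark-defaults.conf
conf="$(mktemp)"
append_once "$conf" "spark.sql.defaultCatalog local"
append_once "$conf" "spark.sql.defaultCatalog local"   # second call appends nothing
wc -l < "$conf"
```

Inside the container the same idea could be applied per line to /opt/spark/conf/spark-defaults.conf, though whether the quickstart needs that extra care is a judgment call for the docs.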

kevinjqliu (Contributor) commented

Thanks for working on this, @KodaiD.

> Additionally, the catalog type is described as "JDBC" when the configuration actually uses Hadoop catalog.

I think we want to limit the usage of the Hadoop catalog in our docs and encourage JDBC instead.
I tried to do this in #11285 and #11845 before but didn't get a chance to finish.
Would you like to help take it over the finish line?
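
For illustration only, a JDBC-backed variant of the `local` catalog might look roughly like the fragment below. This is a sketch under assumptions, not the configuration from #11285 or #11845: the SQLite URI and the `catalog.db` file name are hypothetical, and it assumes a matching JDBC driver is available on the Spark classpath in the image, which this thread does not confirm. The `catalog-impl`, `uri`, and `warehouse` keys follow Iceberg's documented catalog properties.

```shell
# Sketch only: write a hypothetical JDBC-catalog fragment to a scratch file for review.
# The SQLite URI and file name are illustrative assumptions, not values from this PR.
jdbc_conf="$(mktemp)"
cat > "$jdbc_conf" << 'EOF'
spark.sql.catalog.local                 org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.catalog-impl    org.apache.iceberg.jdbc.JdbcCatalog
spark.sql.catalog.local.uri             jdbc:sqlite:/home/iceberg/warehouse/catalog.db
spark.sql.catalog.local.warehouse       /home/iceberg/warehouse
EOF
cat "$jdbc_conf"
```

The fragment keeps the catalog name `local` and the warehouse path from the verified commands, so only the catalog implementation and its connection URI would change relative to the Hadoop setup.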
