Antalya 26.3: Resolve problems with paths and compatibility problems with Spark in Azure (v2)#1812
Open
zvonand wants to merge 5 commits into
…solution in next commit)

---

Original cherry-pick message follows:

Merge pull request ClickHouse#100420 from ClickHouse/divanik/rerevert_spark_azure_fixes

Resolve problems with paths and compatibility problems with Spark in Azure (v2)

Conflicts:
- src/Interpreters/IcebergMetadataLog.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
…olution in next commit)

---

Original cherry-pick message follows:

Merge 136b6b2 into a1bf94d

Conflicts:
- src/IO/S3/URI.cpp
- src/IO/S3/URI.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergIterator.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergIterator.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp
- src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h
- src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
- src/Storages/ObjectStorage/StorageObjectStorageSource.cpp
- src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp
Adapted PR 90740 to antalya-26.3:
- src/IO/S3/URI.{cpp,h}: dropped the `S3UriStyle uri_style` parameter from the
  constructor (S3UriStyle does not exist on antalya-26.3) and kept only the new
  `enable_url_encoding` parameter the PR introduces.
- src/Storages/ObjectStorage/Utils.cpp: removed `S3UriStyle::AUTO` arguments
  from URI ctor calls to match the simplified signature.
- IcebergIterator.{cpp,h}: kept antalya-26.3 `table_schema_id` member/init and
  the `setFileMetaInfo` call alongside the PR's `secondary_storages` member,
  new constructor parameter, and `requires_external_storage` check loops.
- IcebergMetadata.{cpp,h}: added `secondary_storages` member, threaded it
  through `Iceberg::getManifestFile` in the prefetcher; kept antalya-26.3's
  `expire_snapshots` dispatcher (`Iceberg::expireSnapshots` +
  `expireSnapshotsResultToPipe`) unchanged.
- StorageObjectStorageSource.cpp: kept `IcebergDataObjectInfo.h` include along
  with PR's `Utils.h` include; kept antalya-26.3 `getCompressionMethod` getter
  and swarm-mode guard; renamed local `task` to `raw` to match PR's variable
  name used in the iceberg-aware block below. Did not import `.storage_id` for
  virtual columns (not part of PR 90740's diff).
- StorageObjectStorageStableTaskDistributor.cpp: kept antalya-26.3 helper
  `getFileIdentifier` and the rich `getAnyUnprocessedFile` / iceberg
  optimization paths; pulled PR's `getMetadataPathFromObjectInfo`
  fallback into `getFileIdentifier` so all callers benefit.
- Mutations.cpp: added local empty `SecondaryStorages` to
  `collectRetainedFiles` / `collectExpiredFiles` so the new mandatory
  parameter on `getManifestList` / `getManifestFileEntriesHandle` compiles
  without threading external storages into `Iceberg::expireSnapshots`.
Dropped: ExpireSnapshotsExecute.{cpp,h}, RemoveOrphanFilesExecute.{cpp,h},
SnapshotFilesTraversal.{cpp,h}. These are the extracted EXECUTE handlers
introduced by upstream commits 933f564 and 6a0ed7f, which are not on antalya-26.3.
PR 90740 only modifies these files to thread `secondary_storages`; the
underlying refactor is the dependency, not PR 90740 itself.
Dropped: arrow `executeExpireSnapshots` / `executeRemoveOrphanFiles` dispatch
in `IcebergMetadata::executeCommand` — depends on dropped files above; the
antalya-26.3 `Iceberg::expireSnapshots` path is kept instead.
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp — depends on PR #933f564a71e (Extract per-command EXECUTE handlers) not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h — depends on PR #933f564a71e (Extract per-command EXECUTE handlers) not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp — depends on PR #933f564a71e not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h — depends on PR #933f564a71e not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp — depends on PR #6a0ed7ff912 (Reuse snapshot traversal) not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h — depends on PR #6a0ed7ff912 not on antalya-26.3
Adapted: src/IO/S3/URI.{cpp,h} — dropped S3UriStyle parameter (type missing on antalya-26.3)
Adapted: src/Storages/ObjectStorage/Utils.cpp — removed S3UriStyle::AUTO from URI ctor calls
Adapted: src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp — added local empty SecondaryStorages to compile against new mandatory parameter
Adapted: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp — kept Iceberg::expireSnapshots path; switched manifest prefetch to pass *secondary_storages
Adapted: src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp — folded PR's getMetadataPathFromObjectInfo into existing getFileIdentifier helper
zvonand added a commit to Altinity/RelEasy that referenced this pull request on May 19, 2026
`commit_cherry_pick_conflict_as_is` and `commit_conflict_markers` were doing `git add --all` before committing the with-conflict-markers checkpoint. That sweeps in everything in the working tree that isn't gitignored, and real C++ repos accumulate plenty outside .gitignore: ClickHouse leaves server runtime data under `tmp/server_data*/store/<uuid>/<part>/...cmrk2`, build pipelines spit out generated headers, autosaves, etc.

Bug seen 2026-05-19: Altinity/ClickHouse#1812 ended up with **696 429 additions across 19 683 files** because tmp/server_data* was tracked-modified at the time of cherry-pick and got swept in.

New helper `_stage_unmerged_paths` uses `git diff --name-only --diff-filter=U` to stage exactly the conflict-marked files. The clean parts of the cherry-pick are already staged by git automatically; only the unmerged paths (whose textual content is the markers themselves) need explicit staging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
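The staging fix in this commit message can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not RelEasy's actual code: the function names `parse_unmerged_paths` and `stage_unmerged_paths`, the `repo` parameter, and the `-z` flag (NUL-separated output, so unusual file names survive parsing) are introduced here for illustration. Only the `git diff --name-only --diff-filter=U` invocation comes from the message itself. The sketch also assumes `repo` is the root of the working tree, because `git diff --name-only` prints paths relative to that root.

```python
import subprocess
from pathlib import Path


def parse_unmerged_paths(diff_output: str) -> list[str]:
    """Split NUL-separated `git diff --name-only -z` output into paths."""
    return [path for path in diff_output.split("\0") if path]


def stage_unmerged_paths(repo: Path) -> list[str]:
    """Stage only the files git reports as unmerged (hypothetical helper).

    Unlike `git add --all`, this never touches untracked or unrelated
    modified files such as runtime data left in the working tree.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=U", "-z"],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    paths = parse_unmerged_paths(result.stdout)
    if paths:
        # "--" stops option parsing, so a path starting with "-" is safe.
        subprocess.run(["git", "add", "--", *paths], cwd=repo, check=True)
    return paths
```

Returning the staged paths lets the caller log exactly what went into the checkpoint commit, which would make an oversized commit like the one reported here easier to spot.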
5636ee3 to 90519de
cherry-pick

The cherry-pick of PR ClickHouse#90740 (90519de) was applied against an upstream merge commit that pulled in `SettingsChangesHistory.cpp` entries from many unrelated PRs. Those PRs were not cherry-picked to antalya-26.3, so the referenced settings are not declared in `Settings.cpp` / `FormatFactorySettings.h`.

This broke any `SET compatibility = 'X.Y'` whose target version is older than 26.3: `SettingsImpl::applyCompatibilitySetting` iterates the history and throws `UNKNOWN_SETTING` (e.g. `optimize_dictget_tuple_element`) before reaching real settings.

Failing fast tests:
- `02324_compatibility_setting`
- `02325_compatibility_setting_2`
- `02970_visible_width_behavior`
- `03006_mv_deduplication_throw_if_async_insert`
- `03011_adaptative_timeout_compatibility`
- `03243_compatibility_setting_with_alias`
- `03274_join_algorithm_default`
- `03773_nullable_sparse_join`

Kept only the entries whose settings exist on antalya-26.3:
- `object_storage_cluster_join_mode` (pre-existing)
- `output_format_parquet_use_custom_encoder`, `output_format_parquet_version`, `output_format_parquet_compliant_nested_types`, `input_format_parquet_use_native_reader_v3` (in `FormatFactorySettings.h`)
- `s3_propagate_credentials_to_other_storages` (the one PR ClickHouse#90740 introduces)

CI report: https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1812&sha=90519de4fcfccf62720a84e54fe851a094e696c0&name_0=PR&name_1=Fast%20test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
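The failure mode and the fix in this commit message can be illustrated with a toy model. This is a Python sketch, not ClickHouse's C++ implementation: `apply_compatibility`, `prune_history`, `UnknownSetting`, the tuple-based version keys, and every setting value shown are assumptions made for illustration. Only the setting names and the behaviour (an undeclared name in the history aborts the rollback) are taken from the message.

```python
class UnknownSetting(KeyError):
    """Stand-in for the UNKNOWN_SETTING error mentioned in the commit message."""


def apply_compatibility(settings, history, target):
    """Roll settings back to the values they had before each version newer than target.

    settings: declared setting name -> current value
    history:  version tuple -> list of (setting name, previous value)
    target:   version tuple requested via the compatibility setting
    """
    applied = dict(settings)
    for version in sorted(history, reverse=True):  # newest first
        if version <= target:
            break  # everything from here on is already part of the target
        for name, previous_value in history[version]:
            if name not in settings:
                # A history entry for a setting that was never declared.
                raise UnknownSetting(name)
            applied[name] = previous_value
    return applied


def prune_history(history, declared):
    """Keep only history entries whose setting is actually declared."""
    return {
        version: [(name, prev) for name, prev in changes if name in declared]
        for version, changes in history.items()
    }


# Values below are invented; only the setting names come from the source.
declared = {"s3_propagate_credentials_to_other_storages": True}
history = {
    (26, 3): [
        ("optimize_dictget_tuple_element", False),  # not declared on this branch
        ("s3_propagate_credentials_to_other_storages", False),
    ],
}
```

With the unpruned `history`, `apply_compatibility(declared, history, (25, 8))` raises on `optimize_dictget_tuple_element`; after `prune_history(history, declared)` the same call succeeds and only the declared setting is rolled back.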
zvonand commented on May 20, 2026
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
This PR addresses several issues: fixes inconsistent path handling in Iceberg caused by mixed usage of storage paths and metadata paths; enforces that Iceberg tables record a table location that is either a URL or an absolute path; adds a fallback for counting file sizes in Azure, because some ClickHouse readers don't support byte counting after traversal; handles version-hint.txt in a manner compatible with Spark; introduces type-level abstractions that make it harder to mix up path types in the future; adds tests for Azure and Local that verify cross-engine interoperability without intermediate uploading/downloading; fixes usage of position deletes, which previously relied on path inference heuristics where that approach is inappropriate (ClickHouse#100420 by @divanik, ClickHouse#90740 by @zvonand).
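The idea of keeping path kinds apart at the type level can be sketched as follows. This is an illustrative Python sketch only: the PR implements its abstractions in C++, and the names `MetadataPath`, `StoragePath`, and `to_storage_path`, the example location, and the rule of rejecting paths outside the table location are all assumptions, not the PR's API. The actual change also deals with files on secondary storages, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataPath:
    """A path exactly as written in Iceberg metadata (URL or absolute path)."""
    value: str


@dataclass(frozen=True)
class StoragePath:
    """A key relative to the storage the table is read through."""
    value: str


def to_storage_path(path: MetadataPath, table_location: str) -> StoragePath:
    """Convert a metadata path into a storage path (hypothetical helper).

    The only way to obtain a StoragePath from a MetadataPath is through this
    function, so the two kinds cannot be swapped by accident.
    """
    if not isinstance(path, MetadataPath):
        raise TypeError(f"expected MetadataPath, got {type(path).__name__}")
    prefix = table_location.rstrip("/") + "/"
    if not path.value.startswith(prefix):
        # Simplification: the sketch rejects anything outside the table location.
        raise ValueError(f"{path.value!r} is outside {table_location!r}")
    return StoragePath(path.value[len(prefix):])


# Hypothetical Azure-style table location used only for this example.
location = "abfss://data@example.dfs.core.windows.net/warehouse/t"
data_file = MetadataPath(location + "/data/part-0.parquet")
key = to_storage_path(data_file, location)
```

Because the two wrappers are distinct classes, a `MetadataPath` and a `StoragePath` holding the same string do not compare equal, and passing a plain string to `to_storage_path` fails immediately instead of silently producing a wrong key.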
Combined port of 2 PRs (group ClickHouse-ClickHouse-pr-90740). Cherry-picked from ClickHouse#100420, ClickHouse#90740.