
Antalya 26.3: Resolve problems with paths and compatibility problems with Spark in Azure (v2)#1812

Open
zvonand wants to merge 5 commits into antalya-26.3 from feature/antalya-26.3/ClickHouse-ClickHouse-pr-90740

@zvonand (Member) commented May 19, 2026

Dropped from this backport: rather than pulling in a missing prerequisite, the AI dropped the following files. Reviewers: please confirm that each one is genuinely optional.

  • src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp: depends on upstream commit 933f564a71e (Extract per-command EXECUTE handlers), which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h: depends on upstream commit 933f564a71e (Extract per-command EXECUTE handlers), which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp: depends on upstream commit 933f564a71e, which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h: depends on upstream commit 933f564a71e, which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp: depends on upstream commit 6a0ed7ff912 (Reuse snapshot traversal), which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h: depends on upstream commit 6a0ed7ff912, which is not on antalya-26.3

Auto-ported prerequisites: RelEasy detected that the requested port depended on PR(s) not yet on the target branch and auto-ported them first (1 PR added). Reviewers: please confirm that the prerequisite scope is appropriate.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR addresses several issues: it fixes inconsistent path handling in Iceberg caused by mixing storage paths and metadata paths; requires Iceberg tables to record a table location that is either a URL or an absolute path; adds a fallback for computing file sizes in Azure, because some ClickHouse readers cannot count bytes after traversal; handles version-hint.txt in a Spark-compatible way; introduces type-level abstractions that make it harder to mix up path types in the future; adds Azure and local tests that verify cross-engine interoperability without intermediate uploads or downloads; and fixes position deletes, which previously relied on path-inference heuristics where that approach is inappropriate (ClickHouse#100420 by @divanik, ClickHouse#90740 by @zvonand).
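The entry mentions type-level abstractions that keep path kinds apart, plus a rule that a table location must be a URL or an absolute path. The diff itself is not reproduced on this page, so the following is only a sketch of the general technique (a tag-parameterized strong typedef); every name in it (TypedPath, StoragePath, MetadataPath, isValidTableLocation, toStoragePath) is hypothetical and is not the PR's actual API. Requires C++17.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical strong typedef: the Tag parameter makes each path kind a distinct type.
template <typename Tag>
class TypedPath
{
public:
    explicit TypedPath(std::string value_) : value(std::move(value_)) {}
    const std::string & str() const { return value; }

private:
    std::string value;
};

struct StoragePathTag {};
struct MetadataPathTag {};

// Key as the object storage client sees it (hypothetical).
using StoragePath = TypedPath<StoragePathTag>;
// Path as written inside Iceberg metadata, e.g. a full abfss:// or s3:// URL (hypothetical).
using MetadataPath = TypedPath<MetadataPathTag>;

// Passing one kind where the other is expected, or a bare string, does not compile.
static_assert(!std::is_convertible_v<MetadataPath, StoragePath>);
static_assert(!std::is_convertible_v<std::string, StoragePath>);

// Hypothetical check for the rule "table location is a URL or an absolute path".
bool isValidTableLocation(const std::string & location)
{
    if (location.empty())
        return false;
    if (location.front() == '/')
        return true;  // absolute local path
    const auto scheme_end = location.find("://");
    return scheme_end != std::string::npos && scheme_end > 0;  // non-empty URL scheme
}

// Hypothetical explicit conversion: strip the table location, prepend the storage prefix.
StoragePath toStoragePath(const MetadataPath & path, const std::string & table_location, const std::string & storage_prefix)
{
    const std::string & full = path.str();
    const bool has_prefix = full.rfind(table_location, 0) == 0;
    // Require a '/' boundary so ".../t2/..." is not treated as being under ".../t".
    const bool on_boundary = full.size() == table_location.size() || full[table_location.size()] == '/';
    if (!has_prefix || !on_boundary)
        throw std::invalid_argument("metadata path is outside the table location: " + full);
    return StoragePath(storage_prefix + full.substr(table_location.size()));
}
```

The point of the design is that the only way from one path kind to the other is an explicit, named conversion, so an accidental mix-up becomes a compile error instead of a wrong object key at runtime.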

Combined port of 2 PR(s) (group ClickHouse-ClickHouse-pr-90740). Cherry-picked from ClickHouse#100420, ClickHouse#90740.

divanik and others added 4 commits May 19, 2026 22:31
…solution in next commit)

---
Original cherry-pick message follows:

Merge pull request ClickHouse#100420 from ClickHouse/divanik/rerevert_spark_azure_fixes

Resolve problems with paths and compatibility problems with Spark in Azure (v2)

# Conflicts:
#	src/Interpreters/IcebergMetadataLog.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
…olution in next commit)

---
Original cherry-pick message follows:

Merge 136b6b2 into a1bf94d

# Conflicts:
#	src/IO/S3/URI.cpp
#	src/IO/S3/URI.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergIterator.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergIterator.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
#	src/Storages/ObjectStorage/StorageObjectStorageSource.cpp
#	src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp
Adapted PR 90740 to antalya-26.3:

- src/IO/S3/URI.{cpp,h}: dropped the `S3UriStyle uri_style` parameter from the
  constructor (S3UriStyle does not exist on antalya-26.3) and kept only the new
  `enable_url_encoding` parameter the PR introduces.
- src/Storages/ObjectStorage/Utils.cpp: removed `S3UriStyle::AUTO` arguments
  from URI ctor calls to match the simplified signature.
- IcebergIterator.{cpp,h}: kept antalya-26.3 `table_schema_id` member/init and
  the `setFileMetaInfo` call alongside the PR's `secondary_storages` member,
  new constructor parameter, and `requires_external_storage` check loops.
- IcebergMetadata.{cpp,h}: added `secondary_storages` member, threaded it
  through `Iceberg::getManifestFile` in the prefetcher; kept antalya-26.3's
  `expire_snapshots` dispatcher (`Iceberg::expireSnapshots` +
  `expireSnapshotsResultToPipe`) unchanged.
- StorageObjectStorageSource.cpp: kept `IcebergDataObjectInfo.h` include along
  with PR's `Utils.h` include; kept antalya-26.3 `getCompressionMethod` getter
  and swarm-mode guard; renamed local `task` to `raw` to match PR's variable
  name used in the iceberg-aware block below. Did not import `.storage_id` for
  virtual columns (not part of PR 90740's diff).
- StorageObjectStorageStableTaskDistributor.cpp: kept antalya-26.3 helper
  `getFileIdentifier` and the rich `getAnyUnprocessedFile` / iceberg
  optimization paths; pulled PR's `getMetadataPathFromObjectInfo`
  fallback into `getFileIdentifier` so all callers benefit.
- Mutations.cpp: added local empty `SecondaryStorages` to
  `collectRetainedFiles` / `collectExpiredFiles` so the new mandatory
  parameter on `getManifestList` / `getManifestFileEntriesHandle` compiles
  without threading external storages into `Iceberg::expireSnapshots`.

Dropped: ExpireSnapshotsExecute.{cpp,h}, RemoveOrphanFilesExecute.{cpp,h},
SnapshotFilesTraversal.{cpp,h}: extracted EXECUTE handlers introduced by
upstream commits 933f564 (and 6a0ed7f), which are not on antalya-26.3.
PR 90740 only modifies these files to thread `secondary_storages`; the
underlying refactor is the dependency, not PR 90740 itself.

Dropped: arrow `executeExpireSnapshots` / `executeRemoveOrphanFiles` dispatch
in `IcebergMetadata::executeCommand` — depends on dropped files above; the
antalya-26.3 `Iceberg::expireSnapshots` path is kept instead.

Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp — depends on PR #933f564a71e (Extract per-command EXECUTE handlers) not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h — depends on PR #933f564a71e (Extract per-command EXECUTE handlers) not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp — depends on PR #933f564a71e not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h — depends on PR #933f564a71e not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp — depends on PR #6a0ed7ff912 (Reuse snapshot traversal) not on antalya-26.3
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h — depends on PR #6a0ed7ff912 not on antalya-26.3
Adapted: src/IO/S3/URI.{cpp,h} — dropped S3UriStyle parameter (type missing on antalya-26.3)
Adapted: src/Storages/ObjectStorage/Utils.cpp — removed S3UriStyle::AUTO from URI ctor calls
Adapted: src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp — added local empty SecondaryStorages to compile against new mandatory parameter
Adapted: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp — kept Iceberg::expireSnapshots path; switched manifest prefetch to pass *secondary_storages
Adapted: src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp — folded PR's getMetadataPathFromObjectInfo into existing getFileIdentifier helper
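The StorageObjectStorageStableTaskDistributor.cpp note above folds the PR's `getMetadataPathFromObjectInfo` fallback into the existing `getFileIdentifier` helper so that every caller gets it. The sketch below only illustrates why routing all callers through one helper matters. Loud assumptions: the struct and both functions are hypothetical simplifications (suffixed `Sketch`), and the fallback order (prefer the path recorded in Iceberg metadata, otherwise use the storage path) is assumed, not taken from the diff.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, heavily simplified stand-in for an object-info record.
struct ObjectInfoSketch
{
    std::string storage_path;                  // key as seen by the object storage client
    std::optional<std::string> metadata_path;  // path as recorded in Iceberg metadata, if known
};

// Assumed fallback: use the metadata path when present, else the storage path.
std::string getMetadataPathFromObjectInfoSketch(const ObjectInfoSketch & info)
{
    return info.metadata_path.value_or(info.storage_path);
}

// Single identifier helper: every caller derives the same key for the same file,
// so different code paths cannot disagree about which path kind identifies it.
std::string getFileIdentifierSketch(const ObjectInfoSketch & info)
{
    return getMetadataPathFromObjectInfoSketch(info);
}
```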
@zvonand added labels May 19, 2026:
  • releasy (Created/managed by RelEasy)
  • antalya-26.3
  • ai-resolved (Port conflict auto-resolved by Claude)
  • auto-prereq-added (Combined PR includes auto-added prerequisite PR(s))
@github-actions (bot) commented May 19, 2026

Workflow [PR], commit [90519de]

zvonand added a commit to Altinity/RelEasy that referenced this pull request May 19, 2026
`commit_cherry_pick_conflict_as_is` and `commit_conflict_markers`
were doing `git add --all` before committing the with-conflict-
markers checkpoint. That sweeps everything in the working tree that
isn't gitignored — and real C++ repos accumulate plenty outside
.gitignore: ClickHouse leaves server runtime data under
`tmp/server_data*/store/<uuid>/<part>/...cmrk2`, build pipelines
spit out generated headers, autosaves, etc.

bug seen 2026-05-19: Altinity/ClickHouse#1812 ended up with
**696 429 additions across 19 683 files** because tmp/server_data*
was tracked-modified at the time of cherry-pick and got swept in.

new helper `_stage_unmerged_paths` uses `git diff --name-only
--diff-filter=U` to stage exactly the conflict-marked files. The
clean parts of the cherry-pick are already staged by git
automatically — only the unmerged paths (whose textual content is
the markers themselves) need explicit staging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
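The RelEasy fix above stages only the conflict-marked files instead of running `git add --all`. RelEasy's actual `_stage_unmerged_paths` is not shown here, so the following C++ sketch (hypothetical function names `parseUnmergedPaths` and `buildStageCommand`) only illustrates the idea: turn the output of `git diff --name-only --diff-filter=U` into an explicit `git add --` argument list.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split the newline-separated output of `git diff --name-only --diff-filter=U`
// into paths. A trailing '\r' is stripped and empty lines are skipped.
std::vector<std::string> parseUnmergedPaths(const std::string & diff_output)
{
    std::vector<std::string> paths;
    std::istringstream stream(diff_output);
    std::string line;
    while (std::getline(stream, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        if (!line.empty())
            paths.push_back(line);
    }
    return paths;
}

// Build the argv that stages exactly those paths. "--" stops git from treating
// a path that starts with '-' as an option. Empty result means nothing to stage.
std::vector<std::string> buildStageCommand(const std::vector<std::string> & paths)
{
    if (paths.empty())
        return {};
    std::vector<std::string> argv = {"git", "add", "--"};
    argv.insert(argv.end(), paths.begin(), paths.end());
    return argv;
}
```

Untracked runtime data such as `tmp/server_data*` never appears in this list, because `--diff-filter=U` selects only unmerged paths. Note that with default settings git quotes paths containing unusual characters in `--name-only` output; a more robust variant would add `-z` and split on NUL bytes.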
@zvonand force-pushed the feature/antalya-26.3/ClickHouse-ClickHouse-pr-90740 branch from 5636ee3 to 90519de May 19, 2026 22:45
 cherry-pick

The cherry-pick of PR ClickHouse#90740 (90519de) was applied against an upstream
merge commit that pulled in `SettingsChangesHistory.cpp` entries from many
unrelated PRs. Those PRs were not cherry-picked to antalya-26.3, so the
referenced settings are not declared in `Settings.cpp` / `FormatFactorySettings.h`.

This broke any `SET compatibility = 'X.Y'` whose target version is older than
26.3 — `SettingsImpl::applyCompatibilitySetting` iterates the history and
throws `UNKNOWN_SETTING` (e.g. `optimize_dictget_tuple_element`) before
reaching real settings.

Failing fast tests:
- `02324_compatibility_setting`
- `02325_compatibility_setting_2`
- `02970_visible_width_behavior`
- `03006_mv_deduplication_throw_if_async_insert`
- `03011_adaptative_timeout_compatibility`
- `03243_compatibility_setting_with_alias`
- `03274_join_algorithm_default`
- `03773_nullable_sparse_join`

Kept only the entries whose settings exist on antalya-26.3:
- `object_storage_cluster_join_mode` (pre-existing)
- `output_format_parquet_use_custom_encoder`, `output_format_parquet_version`,
  `output_format_parquet_compliant_nested_types`,
  `input_format_parquet_use_native_reader_v3` (in `FormatFactorySettings.h`)
- `s3_propagate_credentials_to_other_storages` (the one PR ClickHouse#90740 introduces)

CI report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1812&sha=90519de4fcfccf62720a84e54fe851a094e696c0&name_0=PR&name_1=Fast%20test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
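The failure mode described in this commit can be modelled in a few lines. This is a simplified sketch, not ClickHouse's `SettingsImpl::applyCompatibilitySetting`: the types and functions are hypothetical, versions are assumed to be plain "X.Y" strings, and the history is assumed to be ordered newest first. Applying an older compatibility target reverts every newer change, so a single newer entry that names an undeclared setting aborts the whole operation. The actual fix edited the history entries by hand; `keepDeclaredOnly` merely expresses the rule it applied.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified settings-changes history entry.
struct SettingChangeSketch
{
    std::string version;         // version that introduced the change, "X.Y"
    std::string name;            // setting name
    std::string previous_value;  // value restored for older compatibility targets
};

// Parse "X.Y" into (major, minor) for numeric comparison.
std::pair<int, int> parseVersion(const std::string & version)
{
    const auto dot = version.find('.');
    return {std::stoi(version.substr(0, dot)), std::stoi(version.substr(dot + 1))};
}

// Revert every change newer than the target. History is assumed newest first,
// so the value from the oldest reverted change is written last and wins.
std::map<std::string, std::string> applyCompatibilitySketch(
    const std::vector<SettingChangeSketch> & history,
    const std::set<std::string> & declared_settings,
    const std::string & target_version)
{
    std::map<std::string, std::string> overrides;
    for (const auto & change : history)
    {
        if (parseVersion(change.version) <= parseVersion(target_version))
            continue;  // not newer than the target: nothing to revert
        if (declared_settings.count(change.name) == 0)
            throw std::runtime_error("UNKNOWN_SETTING: " + change.name);
        overrides[change.name] = change.previous_value;
    }
    return overrides;
}

// The rule applied by the fix: keep only entries whose setting is declared.
std::vector<SettingChangeSketch> keepDeclaredOnly(
    const std::vector<SettingChangeSketch> & history,
    const std::set<std::string> & declared_settings)
{
    std::vector<SettingChangeSketch> kept;
    for (const auto & change : history)
        if (declared_settings.count(change.name) != 0)
            kept.push_back(change);
    return kept;
}
```

Targets at or above the newest entry never reach the unknown name, which is why only `SET compatibility` values older than 26.3 failed.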

@zvonand (Member, Author) left a comment

The title should be changed to that of PR 90740, since it is the main port. The same applies to the changelog entry.
