Skip to content

feat: add ReplacePartitions core class (PR1 of 2 for #637)#776

Open
shangxinli wants to merge 4 commits into
apache:mainfrom
shangxinli:feat/replace-partitions-core
Open

feat: add ReplacePartitions core class (PR1 of 2 for #637)#776
shangxinli wants to merge 4 commits into
apache:mainfrom
shangxinli:feat/replace-partitions-core

Conversation

@shangxinli

@shangxinli shangxinli commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds the iceberg::ReplacePartitions class — the dynamic partition overwrite operation. Each AddFile() registers the file's partition for replacement of all existing data files in that partition. Produces an overwrite snapshot with "replace-partitions"="true" in the summary. Unpartitioned tables replace all existing data files.

This is PR1 of 2 for #637. See tracking issue #775 for the split rationale.

What's in PR1

  • iceberg::ReplacePartitions class extending MergingSnapshotUpdate:
    • Builder API: AddFile, ValidateAppendOnly, ValidateFromSnapshot, ValidateNoConflictingData, ValidateNoConflictingDeletes
    • Apply / Summary / CleanUncommitted overrides
    • Uses ManifestFilterManager for partition-based manifest filtering
  • SnapshotSummaryFields::kReplacePartitions = "replace-partitions" constant
  • ReplacePartitions forward declaration in type_fwd.h
  • CMake + Meson registration

What's deferred to PR2

  • Transaction::NewReplacePartitions() API wiring on Table / Transaction
  • End-to-end test file (replace_partitions_test.cc)

Design notes

  • Follows the Java ReplacePartitions semantics: operation type is overwrite, summary carries "replace-partitions"="true".
  • Mirrors the existing C++ FastAppend / MergeAppend patterns (factory Make(), MergingSnapshotUpdate base, builder error collector).
  • Unpartitioned table case treated as table-wide replace via DeleteByRowFilter(AlwaysTrue()), matching Java BaseReplacePartitions.

Test plan

  • Clean build (cmake --build build)
  • clang-format clean
  • End-to-end behavioral tests land in PR2 alongside the Table/Transaction API wiring

Closes part of #637 (PR2 will close it).
Tracking: #775

@shangxinli shangxinli force-pushed the feat/replace-partitions-core branch 4 times, most recently from ae973f6 to e33f433 Compare June 24, 2026 01:01
Adds iceberg::ReplacePartitions — the dynamic partition overwrite
operation. Each AddFile() registers the file's partition for
replacement of all existing data and delete files in that partition.
Produces an overwrite snapshot with "replace-partitions=true" in the
summary. Unpartitioned tables replace all existing data files.

Extends MergingSnapshotUpdate (matching Java's BaseReplacePartitions)
so the full data+delete manifest pipeline, custom-summary-property
handling, and conflict-validation helpers are inherited. AddFile()
unconditionally calls DropPartition(spec_id, file->partition) — for
unpartitioned specs the partition value is empty and the filter
manager matches every file in that spec, so no separate AlwaysTrue
path is needed. Touched partitions are tracked in a PartitionSet;
Validate() uses the partition-scoped overloads of
ValidateAddedDataFiles / ValidateNoNewDeleteFiles, or skips entirely
when no partitions were staged.

Changes:
  * iceberg::ReplacePartitions extending MergingSnapshotUpdate with
    builder API (AddFile, ValidateAppendOnly, ValidateFromSnapshot,
    ValidateNoConflictingData, ValidateNoConflictingDeletes) and
    Validate() override.
  * SnapshotSummaryFields::kReplacePartitions = "replace-partitions".
  * MergingSnapshotUpdate::SetSummaryProperty promoted from private to
    protected so subclasses can stash custom summary entries that
    survive commit retry via the cached-rebuild path.
  * Forward declaration in type_fwd.h.
  * CMake + Meson source registration.

Public API wiring (Table::NewReplacePartitions(),
Transaction::NewReplacePartitions()) and end-to-end tests are
deferred to PR2.

Tracking: apache#775
Related: apache#637
@shangxinli shangxinli force-pushed the feat/replace-partitions-core branch from e33f433 to 12dc230 Compare June 24, 2026 01:13
current_metadata, starting_snapshot_id_, replaced_partitions_, snapshot, io));
}
if (validate_conflicting_deletes_) {
ICEBERG_RETURN_UNEXPECTED(ValidateNoNewDeleteFiles(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java BaseReplacePartitions.validate also calls validateDeletedDataFiles, so concurrent overwrite/delete commits in the replaced partition are rejected

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// the partition values are empty and naturally match every file under that
// spec — no separate AlwaysTrue path is needed, and validation stays scoped
// to the spec rather than the whole table.
ICEBERG_BUILDER_RETURN_IF_ERROR(DropPartition(spec_id, file->partition));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unpartitioned replace is spec-scoped, but Java treats it as table-wide.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

…aFiles

- AddFile() now uses DeleteByRowFilter(AlwaysTrue()) for unpartitioned
  specs instead of DropPartition with empty partition values, matching
  Java BaseReplacePartitions which treats unpartitioned tables as a
  table-wide replace.
- Validate() now also calls ValidateDeletedDataFiles when
  ValidateNoConflictingDeletes is enabled, mirroring Java where
  validateNewDeletes gates both checks. This rejects concurrent
  overwrite/delete commits in the replaced partitions.
- New replace_by_row_filter_ flag drives the AlwaysTrue path in
  Validate() for the unpartitioned case.
// No-op update: no partitions were staged and no table-wide replace was
// requested, so there is nothing to conflict with. Calling the validators
// with AlwaysTrue here would turn an empty builder into a full-table check.
if (!replace_by_row_filter_ && replaced_partitions_.empty()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java rejects no-data-files replace. Suggest to call DataSpec() like Java.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where it's been implemented. Can you check again?

@manuzhang manuzhang left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks replace_partitions.h is missing from src/iceberg/update/meson.build.

@zhjwpku

zhjwpku commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

In the PR description, should "the iceberg::ReplacePartitions class extending SnapshotUpdate" be changed to "the iceberg::ReplacePartitions class extending MergingSnapshotUpdate"?


namespace iceberg {

/// \brief Replaces partitions in a table with new data files.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to mention This is provided to implement SQL compatible with Hive table operations but is not recommended. Instead, use the {@link OverwriteFiles overwrite API} to explicitly overwrite data. like java?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to '/// \note This is provided to implement SQL compatible with Hive table operations but is not recommended. Instead, use OverwriteFiles to explicitly overwrite data.'

…date

- Add replace_partitions.h to src/iceberg/update/meson.build (manuzhang)
- Add Java-style note recommending OverwriteFiles for non-Hive use (zhjwpku)
- Validate(): call DataSpec() to reject no-data-files replace, matching
  Java BaseReplacePartitions which throws when no file was added (manuzhang)
DataSpec() also errors on multi-spec stages, which would silently break a
ReplacePartitions that touches files from more than one spec. Use the
flags AddFile() sets (replace_by_row_filter_ / replaced_partitions_)
instead — the same condition that already gated the no-op early-return
below, now promoted to an error to match Java BaseReplacePartitions.
@shangxinli

Copy link
Copy Markdown
Contributor Author

In the PR description, should "the iceberg::ReplacePartitions class extending SnapshotUpdate" be changed to "the iceberg::ReplacePartitions class extending MergingSnapshotUpdate"?

Updated the PR description — "extending SnapshotUpdate" → "extending MergingSnapshotUpdate". Thanks for catching that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants