@mitchellciupak mitchellciupak commented Jan 14, 2026

Which issue does this PR close?

This PR does not close any existing issues. It addresses an optimization opportunity in the fast append workflow.

Use Case

I need to validate data files before committing them to the table. Currently, validate_added_data_files() is called internally during commit(), which means validation occurs on every commit attempt, including retries.

Enhancement

By disabling validate_added_data_files() in the commit method, I can perform validation once before attempting the commit. Commit retries then no longer re-run validation, which reduces overhead in retry scenarios.

It's a performance optimization that provides more control over the validation/commit lifecycle.
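To make the retry argument concrete, here is a minimal, self-contained sketch. Everything in it (Committer, DataFile, CommitError, commit_with_retry, the simulated conflicts) is a hypothetical stand-in written for illustration and is not the iceberg crate's API. It only counts how often validation runs under the two strategies.

```rust
// Hypothetical stand-ins for illustration; none of these types come from the iceberg crate.
#[derive(Debug, PartialEq)]
enum CommitError {
    Conflict,        // retryable: another writer committed first
    Invalid(String), // not retryable: the files themselves are bad
}

struct DataFile {
    path: String,
}

/// Simulated committer that records how many times validation ran.
struct Committer {
    validations: u32,
    conflicts_left: u32, // number of simulated conflicts before a commit succeeds
}

impl Committer {
    fn new(conflicts: u32) -> Self {
        Committer { validations: 0, conflicts_left: conflicts }
    }

    /// Stand-in for validating added data files (assumed rule: paths must be non-empty).
    fn validate(&mut self, files: &[DataFile]) -> Result<(), CommitError> {
        self.validations += 1;
        for f in files {
            if f.path.is_empty() {
                return Err(CommitError::Invalid("empty data file path".to_string()));
            }
        }
        Ok(())
    }

    /// One commit attempt; validation runs inside it only when `validate` is true.
    fn commit(&mut self, files: &[DataFile], validate: bool) -> Result<(), CommitError> {
        if validate {
            self.validate(files)?;
        }
        if self.conflicts_left > 0 {
            self.conflicts_left -= 1;
            return Err(CommitError::Conflict);
        }
        Ok(())
    }
}

/// Retries on conflicts. When `validate_in_commit` is false, validation runs once up front.
fn commit_with_retry(
    c: &mut Committer,
    files: &[DataFile],
    validate_in_commit: bool,
    max_attempts: u32,
) -> Result<(), CommitError> {
    if !validate_in_commit {
        c.validate(files)?;
    }
    for _ in 0..max_attempts {
        match c.commit(files, validate_in_commit) {
            Ok(()) => return Ok(()),
            Err(CommitError::Conflict) => continue, // retry without re-validating up front
            Err(e) => return Err(e),
        }
    }
    Err(CommitError::Conflict)
}
```

In this model, two simulated conflicts cause three validation passes when validation lives inside commit, and a single pass when it is done up front.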

What changes are included in this PR?

This commit adds an option to FastAppendAction to disable the validation step snapshot_producer.validate_added_data_files() during commits, similar to the existing option to disable snapshot_producer.validate_duplicate_files().

  • Adds an option to FastAppendAction to enable or disable validation of added data files when appending.
  • Wires the option through the relevant code paths in append.rs.

The change is implemented in crates/iceberg/src/transaction/append.rs.
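As a rough illustration of the builder-flag pattern described above, the sketch below models an append action with two independent validation switches. AppendSketch and all of its methods are hypothetical. The setter name with_check_added_data_files is an assumption inferred from the test names in this thread, and the validation rules are placeholders rather than the crate's actual checks.

```rust
use std::collections::HashSet;

/// Hypothetical model of an append action; not the iceberg crate's FastAppendAction.
#[derive(Debug, Clone)]
struct AppendSketch {
    check_duplicate: bool,
    check_added_data_files: bool,
    files: Vec<String>,
}

impl AppendSketch {
    /// Both checks default to enabled, so existing callers keep their behavior.
    fn new() -> Self {
        AppendSketch {
            check_duplicate: true,
            check_added_data_files: true,
            files: Vec::new(),
        }
    }

    fn with_check_duplicate(mut self, v: bool) -> Self {
        self.check_duplicate = v;
        self
    }

    /// Assumed name for the new switch; the real method name may differ.
    fn with_check_added_data_files(mut self, v: bool) -> Self {
        self.check_added_data_files = v;
        self
    }

    fn add_data_files(mut self, files: impl IntoIterator<Item = String>) -> Self {
        self.files.extend(files);
        self
    }

    /// Runs only the enabled checks and reports which ones ran.
    fn commit(&self) -> Result<Vec<&'static str>, String> {
        let mut ran = Vec::new();
        if self.check_added_data_files {
            // Placeholder rule: a data file path must not be empty.
            if self.files.iter().any(|f| f.is_empty()) {
                return Err("empty data file path".to_string());
            }
            ran.push("validate_added_data_files");
        }
        if self.check_duplicate {
            let mut seen = HashSet::new();
            for f in &self.files {
                if !seen.insert(f.as_str()) {
                    return Err(format!("duplicate data file: {}", f));
                }
            }
            ran.push("validate_duplicate_files");
        }
        Ok(ran)
    }
}
```

Defaulting both flags to true keeps the option opt-out, which matches the PR's framing of validation as something a caller chooses to skip rather than something that is off by default.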

Are these changes tested?

These changes have been manually tested outside the test framework. I noticed that the existing with_check_duplicate() method has no test coverage. I'm not sure whether either feature is simply too small to fall within the project's test strategy. If helpful, I can add tests for both with_check_duplicate() and the new validate_added_data_files() method here in this PR.

@CTTY
Collaborator

CTTY commented Jan 14, 2026

Hi, thanks for contributing! The change looks good.

> If helpful, I can add tests for both with_check_duplicate() and the new validate_added_data_files() method here in this PR.

Yes, this would be nice!

@mitchellciupak
Author

Added test_fast_append_with_check_duplicate_false and test_fast_append_with_check_added_data_files_false in crates/iceberg/src/transaction/append.rs. Both pass locally and in CI.
