Add split size session config support in SparkScanBuilder.configureSplitPlanning() #230
Problem
The session config spark.sql.iceberg.split-size was honored in some Spark read paths but ignored in others. SparkStagedScan and SparkMicroBatchStream use the session config (SparkSQLProperties.SPLIT_SIZE), but SparkReadConf.splitSizeOption() checked only the read option (SparkReadOptions.SPLIT_SIZE). As a result, SparkScanBuilder.configureSplitPlanning() did not respect the session config.
Fix
This PR fixes the inconsistency by updating splitSizeOption() to also consider the session configuration (SparkSQLProperties.SPLIT_SIZE), ensuring consistent split-size handling across all Spark reader control flows.
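The intended lookup can be sketched in isolation as follows. This is a minimal, standalone illustration and not the actual Iceberg code: the class SplitSizeResolution, its method signature, and the use of plain maps for read options and session conf are hypothetical, and the read option key "split-size" and the precedence (read option first, then session config) are assumptions.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the lookup performed by splitSizeOption();
// it does not use or mirror the real Iceberg classes.
class SplitSizeResolution {
    // Assumed value of SparkReadOptions.SPLIT_SIZE.
    static final String READ_OPTION_KEY = "split-size";
    // Session config key named in the PR description.
    static final String SESSION_CONF_KEY = "spark.sql.iceberg.split-size";

    // Returns the split size from the read option if set, otherwise from
    // the session config, otherwise empty (assumed precedence).
    static Optional<Long> splitSizeOption(
            Map<String, String> readOptions, Map<String, String> sessionConf) {
        String value = readOptions.get(READ_OPTION_KEY);
        if (value == null) {
            // Before the fix, this fallback was missing on this path.
            value = sessionConf.get(SESSION_CONF_KEY);
        }
        return Optional.ofNullable(value).map(Long::parseLong);
    }

    public static void main(String[] args) {
        Map<String, String> session = Map.of(SESSION_CONF_KEY, "134217728");
        Map<String, String> options = Map.of(READ_OPTION_KEY, "67108864");

        // Only the session config is set: it is now picked up.
        System.out.println(splitSizeOption(Map.of(), session));
        // Both are set: the read option wins under the assumed precedence.
        System.out.println(splitSizeOption(options, session));
    }
}
```

With this ordering, a per-read option can still override the session-wide value, while reads that set no option fall back to the session config instead of ignoring it.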
Testing