Skip to content

perf:support streaming download#381

Merged
luoyuxia merged 9 commits intoapache:mainfrom
zhaohaidao:perf/pr2-streaming-download
Feb 27, 2026
Merged

perf:support streaming download#381
luoyuxia merged 9 commits intoapache:mainfrom
zhaohaidao:perf/pr2-streaming-download

Conversation

@zhaohaidao
Copy link
Contributor

Summary

  • Add configurable streaming remote-log download path.
  • Keep non-streaming range-read path as fallback.

Why

Range-loop reads on remote segments increase per-chunk request overhead and can reduce effective object-store throughput. Streaming download improves transfer shape and reduces remote-read overhead for this workload profile.

What Changed

  • crates/fluss/src/config.rs:
    • scanner_remote_log_streaming_read (default true)
    • scanner_remote_log_streaming_read_concurrency (default 4)
  • crates/fluss/src/client/table/scanner.rs: pass streaming knobs into RemoteLogDownloader.
  • crates/fluss/src/client/table/remote_log.rs:
    • add streaming path (reader_with().chunk().concurrent())
    • keep range-read path for A/B and rollback
  • bindings/cpp/src/lib.rs: switch to Config::default() + field overrides to avoid missing new config fields.

Benchmark

  • Fixed workload:

    • cargo run --release --example custom-consume-table -- --task LdsHomeFeedQH --hours-ago 10 --bucket-mode range --max-buckets 4
    • timeout 180s per run
      | Factor | (r/s) |
      |---|---:|
      | streaming_read=true (B0_v3) 39750.45 |
      | streaming_read=false (F6) 17950.00 |
  • Turning streaming off: -54.84% throughput

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds configurable streaming download support for remote log segments to improve download throughput. The implementation adds a new streaming path using OpenDAL's reader API while keeping the existing range-based download as a fallback option, allowing for easy A/B testing and rollback.

Changes:

  • Added two new configuration fields: scanner_remote_log_streaming_read (default: true) and scanner_remote_log_streaming_read_concurrency (default: 4) to control the streaming behavior
  • Refactored the download implementation to support both streaming and range-based approaches, selected at runtime based on configuration
  • Updated C++ bindings to expose the new configuration options

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/fluss/src/config.rs Adds new config fields for streaming read control with comprehensive tests for the boolean flag behavior
crates/fluss/src/client/table/scanner.rs Passes new streaming config parameters to RemoteLogDownloader
crates/fluss/src/client/table/remote_log.rs Implements streaming download path and refactors existing range-based download into separate helper functions
bindings/cpp/src/lib.rs Updates FFI config struct and connection initialization to include new streaming fields
bindings/cpp/src/ffi_converter.hpp Maps C++ Configuration fields to FFI config structure
bindings/cpp/include/fluss.hpp Adds streaming configuration fields to C++ Configuration struct with appropriate defaults

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 865 to 870
remote_path: &str,
local_path: &Path,
remote_fs_props: &HashMap<String, String>,
streaming_read: bool,
streaming_read_concurrency: usize,
) -> Result<PathBuf> {
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation comment at line 862 states "using streaming read/write" but the function now supports both streaming and range-based downloads depending on the streaming_read parameter. Consider updating the comment to reflect this, such as "Download a file from remote storage to local using either streaming or range-based read/write".

Copilot uses AI. Check for mistakes.
@zhaohaidao
Copy link
Contributor Author

@luoyuxia @fresh-borzoni @leekeiabstraction PTAL if u have time.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohaidao Thanks. LGTM overall.

  • also update bindings/python to respect the config
  • update the docs in website directory

@zhaohaidao
Copy link
Contributor Author

@zhaohaidao Thanks. LGTM overall.

  • also update bindings/python to respect the config
  • update the docs in website directory

Thanks. Comments are addressed. PTAL @luoyuxia

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM! Will wait to see if there's more comments before merge

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left minor comments

@zhaohaidao
Copy link
Contributor Author

@luoyuxia @fresh-borzoni Comments are addressed. PTAL if u have time.

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohaidao Ty, LGTM

@luoyuxia luoyuxia merged commit 0f549ef into apache:main Feb 27, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants