Skip to content

docs: add cpp build guides with examples#376

Open
zhaohaidao wants to merge 9 commits intoapache:mainfrom
zhaohaidao:feat/bzl-mgmt-docs-pr
Open

docs: add cpp build guides with examples#376
zhaohaidao wants to merge 9 commits intoapache:mainfrom
zhaohaidao:feat/bzl-mgmt-docs-pr

Conversation

@zhaohaidao
Copy link
Contributor

Purpose

Linked issue: close #xxx

Brief change log

Tests

API and Format

Documentation

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces C++ “system/build” dependency-mode guidance and wires up Bazel/CMake configuration to support those flows for the Fluss C++ bindings.

Changes:

  • Add Bazel and CMake usage guides for C++ (system/build modes) plus runnable Bazel consumer examples.
  • Introduce a Bazel bzlmod extension to provision Arrow C++ (build/system) and a stable arrow_cpp_dep alias target.
  • Add a protoc bootstrap script and update CMake/Bazel builds to require/propagate protoc for Rust prost-build during C++ builds.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
docs/cpp-cmake-usage.md Adds CMake system/build usage documentation and validation commands.
docs/cpp-bazel-usage.md Adds Bazel system/build usage documentation, examples, and upgrade notes.
bindings/cpp/scripts/ensure_protoc.sh Adds a helper script intended to download/use a baseline protoc.
bindings/cpp/examples/bazel-consumer/system/* Adds runnable Bazel consumer example for system-mode dependencies.
bindings/cpp/examples/bazel-consumer/build/* Adds runnable Bazel consumer example for build-mode dependencies.
bazel/cpp/deps.bzl Adds the bzlmod extension that provisions Arrow C++ for build/system modes.
bazel/cpp/BUILD.bazel Adds a stable alias arrow_cpp_dep to decouple internal repo naming.
bindings/cpp/bazel/cpp/deps.bzl Duplicates the bzlmod extension under bindings/cpp/ (currently unused).
bindings/cpp/bazel/cpp/BUILD.bazel Duplicates the arrow_cpp_dep alias under bindings/cpp/ (currently unused).
bindings/cpp/MODULE.bazel Registers toolchains and configures the new cpp_sdk extension for local builds.
MODULE.bazel Adds repo-level bzlmod configuration mirroring bindings/cpp/MODULE.bazel.
bindings/cpp/CMakeLists.txt Adds CMake system/build dependency mode selection, tool detection, and Arrow build-mode provisioning.
bindings/cpp/BUILD.bazel Ensures PROTOC is set for cargo genrules; wires in //bazel/cpp:arrow_cpp_dep.
bindings/cpp/.gitignore Adjusts ignore rules to keep Bazel consumer examples while ignoring their artifacts.
BUILD.bazel Adds top-level alias targets (//:fluss_cpp, //:consume_table, etc.) for Bazel consumers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 230 to 235
sed -i \
-e 's|path = "../../../../../"|path = "/home/admin/mh/fluss-r2/fluss-rust"|' \
-e 's|system_arrow_prefix = "/usr"|system_arrow_prefix = "/tmp/fluss-system-arrow-19.0.1"|' \
-e 's|system_arrow_shared_library = "lib/x86_64-linux-gnu/libarrow.so"|system_arrow_shared_library = "lib/libarrow.so"|' \
-e 's|system_arrow_runtime_glob = "lib/x86_64-linux-gnu/libarrow.so\\*"|system_arrow_runtime_glob = "lib/libarrow.so*"|' \
"$tmp_dir/MODULE.bazel"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The troubleshooting example uses sed -i (GNU sed syntax). On macOS/BSD sed, -i requires a backup suffix (e.g. -i ''), so the command as written will fail for macOS users. Consider either using a portable approach (e.g. perl -pi -e ...) or documenting the macOS variant alongside the Linux command.

Suggested change
sed -i \
-e 's|path = "../../../../../"|path = "/home/admin/mh/fluss-r2/fluss-rust"|' \
-e 's|system_arrow_prefix = "/usr"|system_arrow_prefix = "/tmp/fluss-system-arrow-19.0.1"|' \
-e 's|system_arrow_shared_library = "lib/x86_64-linux-gnu/libarrow.so"|system_arrow_shared_library = "lib/libarrow.so"|' \
-e 's|system_arrow_runtime_glob = "lib/x86_64-linux-gnu/libarrow.so\\*"|system_arrow_runtime_glob = "lib/libarrow.so*"|' \
"$tmp_dir/MODULE.bazel"
perl -pi -e '
s|path = "../../../../../"|path = "/home/admin/mh/fluss-r2/fluss-rust"|;
s|system_arrow_prefix = "/usr"|system_arrow_prefix = "/tmp/fluss-system-arrow-19.0.1"|;
s|system_arrow_shared_library = "lib/x86_64-linux-gnu/libarrow.so"|system_arrow_shared_library = "lib/libarrow.so"|;
s|system_arrow_runtime_glob = "lib/x86_64-linux-gnu/libarrow.so\*"|system_arrow_runtime_glob = "lib/libarrow.so\*|;
' "$tmp_dir/MODULE.bazel"

Copilot uses AI. Check for mistakes.
Comment on lines +72 to +85
normalize_version_for_protoc_release() {
local v="$1"
# Protobuf release packaging switched from v3.x.y to vX.Y for newer versions.
# For our current agreed baseline (3.25.5), the protoc archive/tag is 25.5.
if [[ "$v" =~ ^3\.([0-9]+\.[0-9]+)$ ]]; then
local stripped="${BASH_REMATCH[1]}"
local major="${stripped%%.*}"
if [[ "$major" -ge 21 ]]; then
echo "$stripped"
return 0
fi
fi
echo "$v"
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalize_version_for_protoc_release() doesn’t handle baseline versions like 3.25.5 correctly. With the current regex it will return 3.25.5, so the script will try to download .../releases/download/v3.25.5/protoc-3.25.5-...zip, which doesn’t exist for modern protobuf releases (they use tags like v25.5 / archives like protoc-25.5-...). Please update the normalization to map 3.<major>.<patch> to <major>.<patch> for the affected ranges (e.g. 3.25.5 -> 25.5) and keep version_matches_baseline() consistent with that mapping.

Copilot uses AI. Check for mistakes.
Comment on lines 109 to 118
if command -v wget >/dev/null 2>&1; then
local wget_args=()
if [[ -n "${https_proxy:-}" || -n "${http_proxy:-}" ]]; then
wget_args+=(--no-check-certificate -e use_proxy=yes)
if [[ -n "${https_proxy:-}" ]]; then
wget_args+=(-e "https_proxy=${https_proxy}")
fi
if [[ -n "${http_proxy:-}" ]]; then
wget_args+=(-e "http_proxy=${http_proxy}")
fi
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the curl path, the wget proxy path disables TLS verification (--no-check-certificate). This should not be enabled automatically just because a proxy is configured; keep verification on by default and require an explicit opt-in for insecure downloads.

Copilot uses AI. Check for mistakes.
Comment on lines 96 to 152
download_file() {
local url="$1"
local out="$2"

if command -v curl >/dev/null 2>&1; then
if [[ -n "${https_proxy:-}" || -n "${http_proxy:-}" ]]; then
curl -fLk "$url" -o "$out"
else
curl -fL "$url" -o "$out"
fi
return 0
fi

if command -v wget >/dev/null 2>&1; then
local wget_args=()
if [[ -n "${https_proxy:-}" || -n "${http_proxy:-}" ]]; then
wget_args+=(--no-check-certificate -e use_proxy=yes)
if [[ -n "${https_proxy:-}" ]]; then
wget_args+=(-e "https_proxy=${https_proxy}")
fi
if [[ -n "${http_proxy:-}" ]]; then
wget_args+=(-e "http_proxy=${http_proxy}")
fi
fi
wget "${wget_args[@]}" -O "$out" "$url"
return 0
fi

echo "ERROR: neither curl nor wget is available for downloading protoc." >&2
return 1
}

ensure_zip_tools() {
command -v unzip >/dev/null 2>&1 || {
echo "ERROR: unzip not found." >&2
exit 1
}
}

setup_proxy_env
ensure_zip_tools

if command -v protoc >/dev/null 2>&1; then
existing_out="$(protoc --version 2>/dev/null || true)"
if [[ "$existing_out" =~ ([0-9]+\.[0-9]+\.[0-9]+) ]]; then
existing_ver="${BASH_REMATCH[1]}"
if version_matches_baseline "$existing_ver" "$PROTOBUF_BASELINE_VERSION"; then
command -v protoc
exit 0
fi
fi
fi

PROTOC_RELEASE_VERSION="$(normalize_version_for_protoc_release "$PROTOBUF_BASELINE_VERSION")"
PROTOC_ARCHIVE="protoc-${PROTOC_RELEASE_VERSION}-${PROTOC_OS}-${PROTOC_ARCH}.zip"
PROTOC_URL="https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_RELEASE_VERSION}/${PROTOC_ARCHIVE}"
PROTOC_PREFIX="${PROTOC_INSTALL_ROOT}/protoc-${PROTOC_RELEASE_VERSION}-${PROTOC_OS}-${PROTOC_ARCH}"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download_file helper disables TLS certificate verification (curl -fLk and wget --no-check-certificate) when proxy variables are set and also downloads the protoc binary from GitHub without any checksum or signature verification. This makes it possible for a network attacker (e.g., via a malicious or compromised proxy or MITM) or a compromised distribution channel to serve a tampered protoc archive that will be executed in your build environment. Harden this by enforcing TLS certificate verification and validating the downloaded archive against a pinned checksum (or signature) before extracting and using the binary.

Copilot uses AI. Check for mistakes.
Comment on lines 21 to 22
PROTOBUF_BASELINE_VERSION="${PROTOBUF_BASELINE_VERSION:-3.25.5}"
PROTOC_INSTALL_ROOT="${PROTOC_INSTALL_ROOT:-/tmp/fluss-cpp-tools}"
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROTOC_INSTALL_ROOT defaults to a shared /tmp/fluss-cpp-tools directory, and the script trusts any existing protoc binary found there without validating its origin or integrity. On multi-user systems another local user can pre-populate that cache path with a malicious executable that will then be used by your build, enabling code execution in the victim’s context. Use a cache location under the invoking user’s home directory (or another non-world-writable path) and/or validate ownership and contents before trusting an existing binary.

Copilot uses AI. Check for mistakes.
Comment on lines 276 to 283
http_archive(
name = "apache_arrow_cpp",
urls = arrow_version["urls"],
strip_prefix = arrow_version["strip_prefix"],
# TODO: Pin sha256/integrity once release packaging is finalized.
patch_cmds = _ARROW_PATCH_CMDS,
build_file_content = _render_arrow_build_file(tag),
)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Arrow C++ source is fetched via http_archive from GitHub without a pinned sha256 or other integrity check. If the remote tarball or the network path is compromised, Bazel will transparently build and link against attacker-controlled code in apache_arrow_cpp. Add a sha256 (or equivalent integrity mechanism) for the Arrow archive to ensure the fetched contents match the expected release.

Copilot uses AI. Check for mistakes.
Comment on lines 276 to 283
http_archive(
name = "apache_arrow_cpp",
urls = arrow_version["urls"],
strip_prefix = arrow_version["strip_prefix"],
# TODO: Pin sha256/integrity once release packaging is finalized.
patch_cmds = _ARROW_PATCH_CMDS,
build_file_content = _render_arrow_build_file(tag),
)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Arrow C++ source is fetched via http_archive from GitHub without a pinned sha256 or other integrity check. If the remote tarball or the network path is compromised, Bazel will transparently build and link against attacker-controlled code in apache_arrow_cpp. Add a sha256 (or equivalent integrity mechanism) for the Arrow archive to ensure the fetched contents match the expected release.

Copilot uses AI. Check for mistakes.
Comment on lines 138 to 143
FetchContent_Declare(
apache_arrow_src
URL ${FLUSS_CPP_ARROW_SOURCE_URL}
SOURCE_SUBDIR cpp
)
FetchContent_MakeAvailable(apache_arrow_src)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FetchContent_Declare(apache_arrow_src ...) pulls the Arrow C++ source from ${FLUSS_CPP_ARROW_SOURCE_URL} without specifying a hash or other integrity verification. If the download is tampered with (e.g., compromised mirror, DNS poisoning, or MITM), CMake will build and link against attacker-controlled Arrow code in your C++ bindings. Add an integrity check (such as URL_HASH with a pinned digest) for the Arrow archive to ensure the downloaded contents are exactly the expected release.

Copilot uses AI. Check for mistakes.
@zhaohaidao zhaohaidao changed the title (WIP)docs: add cpp system and build usage guides with examples docs: add cpp build guides with examples Feb 25, 2026
@zhaohaidao
Copy link
Contributor Author

@luoyuxia @fresh-borzoni @leekeiabstraction PTAL if u have time.

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohaidao On my Mac M1 bazel build failed, let me check if it builds in docker linux

UPD
worked in docker in linux(arm)

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohaidao Ty, looks good overall, left a couple of comments, PTAL

@@ -1 +1 @@
build/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it match examples/bazel-consumer/build ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. build/ at the top can be confusing here.

!examples/bazel-consumer/** already re-includes this path, but I added explicit exceptions for clarity and safety:

  • !examples/bazel-consumer/build/
  • !examples/bazel-consumer/build/**

This makes it unambiguous that examples/bazel-consumer/build is intentionally tracked.

@zhaohaidao zhaohaidao force-pushed the feat/bzl-mgmt-docs-pr branch from 1afb46f to 72bf7af Compare February 27, 2026 04:38
@zhaohaidao
Copy link
Contributor Author

@zhaohaidao Ty, looks good overall, left a couple of comments, PTAL

Thanks, Comments ares addessed. PTAL @fresh-borzoni

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohaidao Thanks for the pr. Left minor comments.
one more question, do you think we need to publish fluss-cpp to bazel registry?

module(name = "fluss_cpp_consumer_build")

bazel_dep(name = "rules_cc", version = "0.2.14")
bazel_dep(name = "fluss-cpp", version = "0.0.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the next release version is 0.1.0? Do you need to change to 0.1.0 in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here both examples use local_path_override(...), so the source is resolved from the local repository path, not from the registry version. In that setup, 0.0.0 vs 0.1.0 does not change build behavior.

I updated both examples to 0.1.0 for consistency.

module(name = "fluss_cpp_consumer_system")

bazel_dep(name = "rules_cc", version = "0.2.14")
bazel_dep(name = "fluss-cpp", version = "0.0.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito:
whether the version should be changed?

@@ -0,0 +1,284 @@
# Fluss C++ Bazel Usage Guide (System / Build Modes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we move document to fluss-rust/website/docs/user-guide/cpp/installation.md for cpp. Can we also move these two document to the website. I guess we may need to merge some comments with installation.md.
Maybe we can do in another pr if you find it'll cost some efforts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. We should move/merge these docs into website/docs/user-guide/cpp/installation.md. To keep this PR focused on Bazel system/build fixes, I suggest doing the website-doc consolidation in a follow-up PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohaidao LGTM, TY

@zhaohaidao
Copy link
Contributor Author

@zhaohaidao Thanks for the pr. Left minor comments. one more question, do you think we need to publish fluss-cpp to bazel registry?

Given the current scope, where Arrow is the main external dependency, it may be more practical to keep the source-based build path as the primary approach for now, and consider Bazel registry-first adoption once
dependency complexity grows.

Bazel is most valuable when:

  • Consumers are already Bazel-native (monorepo integration via bazel_dep(...)).
  • Cross-language reproducibility (Rust/C++) is a hard requirement.
  • Remote cache / remote execution is available to offset heavy native builds.
  • Dependency/platform complexity grows and centralized graph management is needed.

Bazel also adds clear maintenance overhead for us:

  • Release + registry publishing workflow (versioning, tagging, submission, validation).
  • Stronger backward-compatibility commitments for module/extension interfaces.
  • Expanded CI matrix (Linux/macOS, build/system, registry consumption path).
  • More user documentation and support obligations.
  • Ongoing version lifecycle work (patch releases, backports, rollback handling).

Recommendation now:

  • Keep source checkout + build as the default.
  • Keep Bazel support available but not as the primary distribution path.
  • Revisit registry-first Bazel adoption once external dependencies increase and user-side dependency conflicts become frequent.
    @luoyuxia

@zhaohaidao
Copy link
Contributor Author

@fresh-borzoni @luoyuxia Comments are addressed. Thanks so much for your partient and carefull reviews. PTAL if u have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants