Releases · alibaba/paimon-cpp

Supported Features

Data Types

Supports the following field types:
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP (precision 0/3/6/9, with or without time zone), DECIMAL, DATE, ARRAY, MAP, STRUCT, BLOB.

Basic Read/Write Operations for Append and Primary Key (PK) Tables

Basic Operations: Write, commit, scan and read.
Schema Evolution for Read: Adding and removing fields; changing field types, order, and names; configuration updates; and similar changes.
External Path Support: Data files can be written across multiple storage clusters.
Blob Field Support:
-- Large binary fields are stored separately to reduce read/write amplification.
-- Supports streaming read/write of Blob fields, minimizing memory usage.
Bucketing Modes:
-- Append Table: Supports FixedBucket and UnawareBucket modes.
-- PK Table: Supports FixedBucket and PostponeBucket modes.
PK Table Read: Supports reading data in Deletion Vector and Merge-on-Read modes, with 4 merge strategies and basic aggregation functions.
File-Level Indexes: Supports reading Bitmap, BSI (Bit-Sliced Index), and Bloom filter indexes.
Query Optimization: Column pruning, predicate pushdown.
Additional Performance Optimizations: File prefetching and multi-threaded row-to-batch transformation (PK tables only).
Data Cleanup (Append Table only): Orphaned file cleanup, expired snapshot cleanup, and expired partition cleanup.
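The Deletion Vector and Merge-on-Read modes listed under PK Table Read can be illustrated with a small standalone sketch. It does not use the paimon-cpp API: `Row`, `ApplyDeletionVector`, and `MergeOnReadDeduplicate` are hypothetical names, the deletion vector is modeled as one flag per row position, and "keep the row with the highest sequence number per key" is assumed here as one plausible merge strategy rather than a description of the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical row layout: primary key, sequence number, payload.
struct Row {
    int64_t key;
    int64_t sequence;
    std::string value;
};

// Deletion-vector read: drop every row whose position in the file is flagged.
// Assumption: the vector holds exactly one flag per row of the file.
std::vector<Row> ApplyDeletionVector(const std::vector<Row>& file_rows,
                                     const std::vector<bool>& deleted) {
    assert(deleted.size() == file_rows.size());
    std::vector<Row> kept;
    for (std::size_t pos = 0; pos < file_rows.size(); ++pos) {
        if (!deleted[pos]) {
            kept.push_back(file_rows[pos]);
        }
    }
    return kept;
}

// Merge-on-read: rows for the same key may live in several files, so they are
// combined at read time. Here the row with the highest sequence number wins.
std::vector<Row> MergeOnReadDeduplicate(const std::vector<std::vector<Row>>& files) {
    std::map<int64_t, Row> latest;  // ordered by key, so output order is deterministic
    for (const auto& file : files) {
        for (const auto& row : file) {
            auto it = latest.find(row.key);
            if (it == latest.end()) {
                latest.emplace(row.key, row);
            } else if (row.sequence > it->second.sequence) {
                it->second = row;
            }
        }
    }
    std::vector<Row> result;
    result.reserve(latest.size());
    for (const auto& entry : latest) {
        result.push_back(entry.second);
    }
    return result;
}
```

In this sketch the deletion-vector path filters a single file without comparing keys, while the merge-on-read path has to look at every overlapping file before it can emit a row.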

AI-Oriented Features

RowTracking: Supports global row ID assignment.
DataEvolution:
-- Global row IDs are continuous and gap-free.
-- Supports writing to specific fields only (e.g., enabling fast column addition).
-- Different fields may be stored across multiple files; queries automatically merge them during retrieval.
Global Index: Bitmap index, DiskANN-based vector search (lumina), and full-text search (Lucene, under development).
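The DataEvolution behavior described above (fields stored in separate files and merged at query time) can be sketched as follows. This is an illustration only, not the paimon-cpp API: `ColumnFile`, `MergedRow`, and `MergeColumnFiles` are hypothetical names, values are simplified to strings, and every file is assumed to cover exactly the same contiguous row-ID range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical column file: the values of one field for a contiguous
// range of global row IDs starting at first_row_id.
struct ColumnFile {
    std::string field;
    int64_t first_row_id;
    std::vector<std::string> values;
};

// One merged row: field name -> value.
using MergedRow = std::map<std::string, std::string>;

// Stitch files that each hold a different field back into complete rows.
// Because row IDs are continuous and gap-free, the i-th value of every file
// belongs to row (first_row_id + i), so alignment is a plain index lookup.
std::vector<MergedRow> MergeColumnFiles(const std::vector<ColumnFile>& files) {
    if (files.empty()) {
        return {};
    }
    const int64_t first_row_id = files.front().first_row_id;
    const std::size_t row_count = files.front().values.size();
    std::vector<MergedRow> rows(row_count);
    for (const auto& file : files) {
        // Assumption of this sketch: all files cover the same row-ID range.
        assert(file.first_row_id == first_row_id);
        assert(file.values.size() == row_count);
        for (std::size_t i = 0; i < row_count; ++i) {
            rows[i][file.field] = file.values[i];
        }
    }
    return rows;
}
```

Under these assumptions, adding a column amounts to writing one more `ColumnFile` for the same row-ID range; existing files are left untouched.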

File Formats

Supports: Apache ORC, Parquet, Avro, Lance, Blob.

File Systems

Supports: local fs and aliyun-oss.

Other Features

Zero-copy migration: Migrate ORC, Parquet, and other data files into Paimon tables without copying.
Data shuffle support
Branch table: Read and write operations supported.

What's Changed

chore: add third party binary files by @zjw1111 in #1
chore: remove code of conduct by @lucasfang in #2
chore: add issue template and pr template by @zjw1111 in #4
chore: add test for workflows by @lucasfang in #3
fix: prevent glog crash on concurrent initialization by @lxy-9602 in #6
chore: add test workflows for gcc by @lucasfang in #7
chore: add doc release workflow by @lucasfang in #9
fix: fix publish docs workflow by @lucasfang in #12
feat: Add IndexSplit and support returning index scores in read process by @lszskye in #11
chore: update pre-commit cmake format version and add cpplint check by @lucasfang in #13
chore: add license check using apache rat by @lucasfang in #14
feat: support serialize/deserialize for GlobalIndexResult in distributed global index search by @lxy-9602 in #15
fix: resolve multi thread mkdir error by @zjw1111 in #8
chore: correct minor typos and fix compilation warnings by @lxy-9602 in #17
chore: cpplint for more directories by @lucasfang in #16
fix: correct nextRowId in global index snapshot test data by @lxy-9602 in #18
feat(catalog): add LoadTableSchema interface by @dalingmeng in #10
chore: move the location of static library linker instruction by @zjw1111 in #20
chore: add PAIMON_THIRDPARTY_MIRROR_URL env by @lucasfang in #19
fix: fix clang tidy error by @zjw1111 in #21
chore: rename workflow jobs name by @lucasfang in #22
feat(scan): support built-in global index search during scan process by @lszskye in #23
fix(ut): prevent incorrect implicit conversion of string literals to … by @SGZW in #24
feat(scan): support create index readers with field name during scan process by @lszskye in #26
fix: fix compile issues by @zjw1111 in #27
fix: compile error by @ChaomingZhangCN in #28
refactor(global_index): remove global range awareness from plugin by @lxy-9602 in #30
add release ci workflow and remove global no-access-control by @lucasfang in #32
chore: fix clang-tidy error and improve clang-tidy in workflow by @zjw1111 in #35
fix(compile, ut): some compile/ut issues by @SGZW in #29
chore: fix syntax in API example by @letian-jiang in #36
fix: LoadTableSchema returns NotExist error instead of null when table does not exist by @lxy-9602 in #40
feat(test): add tests for global index by @lxy-9602 in #41
chore: specify fmt_ROOT in avro for find package by @zjw1111 in #44
fix: fix orc read timestamp under debian by @lszskye in #43
fix: coredump when sequence field is part of primary key by @lxy-9602 in #46
feat: support map<string, string> to/from json string and string util by @lucasfang in #45
feat: Add vector search support to DataEvolutionBatchScan and rename topk to vector search by @lxy-9602 in #48
Extract interfaces from FileBatchReader to PrefetchFileBatchReader by @lucasfang in #47
fix: Fix build errors with GCC 15 and optimize third-party library build time by @suxiaogang223 in #50
feat: update lumina lib for diskann by @lxy-9602 in #51
feat: support external path for global index by @lszskye in #52
docs: fix global index typo by @mrdrivingduck in #53
fix(executor): Add missing try/catch by @Eyizoha in #54
fix: lazy create merge function in merge file split read by @zjw1111 in #58
feat(catalog, schema): Add existence check and schema improvements by @Eyizoha in #56
fix: fix typo in catalog by @zjw1111 in #61
feat: support specific fs in ReadContext & options in VectorSearch by @lxy-9602 in #57
fix: glog linking error when libunwind is present by @mrdrivingduck in #60
fix(test): handle zero limit in LuminaGlobalIndexTest by @lxy-9602 in #62
chore: Miscellaneous minor improvements by @Eyizoha in #63
feat(catalog, predicate, schema): Add utility APIs by @Eyizoha in #64
fix(ut): fix more ut under gcc8 by @SGZW in #67
feat: support commit metrics of FileStoreCommitImpl to align with CommitMetrics by @SteNicholas in #66
feat: Introduce sst file format for btree global index by @ChaomingZhangCN in #49
fix(lfs): change big test data to lfs mode by @lszskye in #70
fix(lfs): fix pre-commit check for large files by @zjw1111 in #73
feat(memory): Add MemPool Free with alignment by @Eyizoha in #74
fix(build): Improve build system reliability and flexibility by @Eyizoha in #55
feat(cache): add readahead cache for prefetch by @lucasfang in #72
feat:Support specify field ids for table read context by @jichen20210919 in #75
feat: Add global config API and optimize Parquet read thread conf by @Eyizoha in #68
feat: support prefetch for orc by @lucasfang in #77
chore: Fix build_and_package.sh by @Eyizoha in #81
feat(api): Support specifying file system for more APIs by @Eyizoha in #78
feat: update read ahead cache pre buffer strategy by @lucasfang in https://g...