Benchmark JSON serialization to binary formats (Parquet, Avro, Protobuf, ORC). JSON files are loaded into Arrow format as an intermediary. Each codec uses its default compression level, e.g. 3 for zstd and 6 for gzip; Snappy has no configurable compression level.
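The pipeline can be pictured with a short sketch. This is a minimal, hypothetical example of the JSON → Arrow → Parquet leg using the `arrow` and `parquet` crates; the file names and two-field schema are illustrative, and the exact APIs differ slightly between crate versions:

```rust
use std::fs::File;
use std::io::BufReader;
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, Schema};
use arrow::json::ReaderBuilder;
use parquet::arrow::ArrowWriter;
use parquet::basic::{Compression, ZstdLevel};
use parquet::file::properties::WriterProperties;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical schema for newline-delimited JSON records.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]));

    // Stream the JSON file as Arrow RecordBatches.
    let input = BufReader::new(File::open("raw_data/sample.json")?);
    let reader = ReaderBuilder::new(schema.clone()).build(input)?;

    // Write Parquet with zstd level 3, the default level this benchmark uses.
    let props = WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
        .build();
    let mut writer =
        ArrowWriter::try_new(File::create("output/sample.parquet")?, schema, Some(props))?;
    for batch in reader {
        writer.write(&batch?)?;
    }
    writer.close()?;
    Ok(())
}
```

Avro, Protobuf, and ORC follow the same shape: decode JSON into Arrow batches once, then hand the batches to the format-specific writer.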
| Format | Compression | Time (ms) | Size | Files | Ratio | MB/s | Rows/s |
|---|---|---|---|---|---|---|---|
| avro | gzip | 217801.45 | 125.27 MB | 2 | 7.78x | 4.47 | 1566 |
| avro | none | 158799.67 | 595.13 MB | 10 | 1.64x | 6.14 | 2147 |
| avro | snappy | 153354.26 | 179.88 MB | 3 | 5.42x | 6.36 | 2224 |
| avro | zstd | 173574.12 | 127.83 MB | 2 | 7.62x | 5.61 | 1965 |
| orc | gzip | 2299.70 | 46.98 MB | 1 | 20.74x | 423.80 | 148281 |
| orc | none | 9399.54 | 551.47 MB | 5 | 1.77x | 103.69 | 36279 |
| orc | snappy | 761.36 | 93.60 MB | 1 | 10.41x | 1280.08 | 447883 |
| orc | zstd | 1025.39 | 47.41 MB | 1 | 20.56x | 950.47 | 332557 |
| parquet | gzip | 1677.35 | 37.06 MB | 1 | 26.30x | 581.04 | 203298 |
| parquet | none | 14482.22 | 178.76 MB | 2 | 5.45x | 67.30 | 23546 |
| parquet | snappy | 849.15 | 59.76 MB | 1 | 16.31x | 1147.75 | 401580 |
| parquet | zstd | 917.32 | 38.26 MB | 1 | 25.47x | 1062.45 | 371737 |
| protobuf | gzip | 50786.57 | 78.36 MB | 3 | 12.44x | 19.19 | 6714 |
| protobuf | none | 18770.55 | 621.74 MB | 20 | 1.57x | 51.92 | 18167 |
| protobuf | snappy | 22680.47 | 140.67 MB | 5 | 6.93x | 42.97 | 15035 |
| protobuf | zstd | 25615.61 | 51.02 MB | 2 | 19.10x | 38.05 | 13312 |
- Total rows: 341002
- Original JSON size: 974.61 MB
- Fastest: orc + snappy (761.36 ms)
- Slowest: avro + gzip (217801.45 ms)
- Best ratio: parquet + gzip (26.30x)
- Worst ratio: protobuf + none (1.57x)
Build with `cargo build --release` or download a prebuilt binary from the releases page.
```sh
# Convert all JSON files in raw_data/ to all formats
./target/release/serde-bench

# Specific format and compression
./target/release/serde-bench -f parquet -c zstd

# Custom input/output
./target/release/serde-bench -i data/ -o results/

# Benchmark 5 iterations, verbose
./target/release/serde-bench --iterations 5 -v

# Export results as JSON
./target/release/serde-bench --json-file bench.json
```

| Flag | Description | Default |
|---|---|---|
| `-i` | Input files/directories | `raw_data` |
| `-o` | Output directory | `output` |
| `-f` | Format: `parquet`, `avro`, `protobuf`, `orc`, `all` | `all` |
| `-c` | Compression: `none`, `zstd`, `snappy`, `gzip` | `all` |
| `--iterations` | Benchmark iterations | `1` |
| `--dry-run` | Benchmark without writing files | - |
| `-v` | Verbose output | - |
Contributions are welcome! Here's how to get started:
- Fork the repository and clone your fork
- Create a feature branch (`git checkout -b feature/my-feature`)
- Make your changes
- Run `cargo build --release` to ensure it compiles
- Test with sample data: `cargo run --release -- -i raw_data/ --dry-run`
- Push to your fork and open a pull request
Multi-threading support: currently, all benchmarks run single-threaded. Parallel execution could be added for:
- Processing multiple input files concurrently
- Running different format/compression combinations in parallel
- Parallel row group writing for formats that support it (Parquet, ORC)
This would significantly improve benchmark throughput on multi-core systems.
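As a rough illustration of the first item, per-file parallelism could be layered on with the `rayon` crate (an assumed dependency here); `convert_file` is a hypothetical stand-in for the real per-file conversion logic:

```rust
use rayon::prelude::*;
use std::path::{Path, PathBuf};

// Hypothetical: convert one JSON file and return the encoded size in bytes.
fn convert_file(path: &Path) -> std::io::Result<u64> {
    // ... read JSON, encode to the target format, write the output ...
    let _ = path;
    Ok(0)
}

fn convert_all(inputs: &[PathBuf]) -> Vec<std::io::Result<u64>> {
    // par_iter() fans the per-file work across a thread pool sized to the
    // machine's core count; results come back in input order.
    inputs.par_iter().map(|p| convert_file(p)).collect()
}

fn main() {
    let files = vec![PathBuf::from("raw_data/sample.json")];
    let results = convert_all(&files);
    println!("{} file(s) processed", results.len());
}
```

The second item has the same shape: iterate over (format, compression) pairs instead of paths and let rayon schedule the combinations.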