78 changes: 78 additions & 0 deletions CONTRIBUTING.md
@@ -32,6 +32,84 @@ cargo version

To keep code style consistent, run `cargo x lint --fix` to automatically fix any style issues before committing your changes.

## Build and Test

We recommend using `cargo x`, provided by the workspace `xtask` crate, as the single entry point. The alias is defined in `.cargo/config.toml` and maps to `cargo run --package x -- ...`.

Build:

```shell
cargo build --workspace
```

Test:

```shell
cargo x test
# or
cargo test --workspace --no-default-features
```

Lint:

```shell
cargo x lint
```

## Manual workflow (without xtask)

`cargo x lint` runs the following steps. Use these directly when you need more control or want to isolate failures:

```shell
cargo +nightly clippy --tests --all-features --all-targets --workspace -- -D warnings
cargo +nightly fmt --all --check
taplo format --check
typos
hawkeye check
```
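
To isolate failures, it can help to run every step even when an earlier one fails. The following is a minimal sketch, not a description of what `cargo x lint` does internally; `check` and `run_lint_steps` are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helpers: run every lint step and count failures instead of
# stopping at the first one.
failures=0

# Run one command, report whether it passed, and keep going either way.
check() {
  if "$@"; then
    echo "ok:   $*"
  else
    echo "FAIL: $*"
    failures=$((failures + 1))
  fi
}

# The same five commands listed above, each wrapped in check.
run_lint_steps() {
  check cargo +nightly clippy --tests --all-features --all-targets --workspace -- -D warnings
  check cargo +nightly fmt --all --check
  check taplo format --check
  check typos
  check hawkeye check
  echo "$failures step(s) failed"
}
```

Source the snippet and call `run_lint_steps` from the repository root.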

Automatic fix commands:

```shell
cargo +nightly clippy --tests --all-features --all-targets --workspace --allow-staged --allow-dirty --fix
cargo +nightly fmt --all
taplo format
hawkeye format --fail-if-updated=false
```

Install the extra tools with:

```shell
cargo install taplo-cli typos-cli hawkeye
```

## Serialization snapshots and test data generation

Some tests depend on snapshot files under `datasketches/tests/serialization_test_data`. If these files are missing, those tests fail. Regenerate them with:

```shell
python3 ./tools/generate_serialization_test_data.py --all
```

The script pulls `datasketches-java` and `datasketches-cpp` and writes files to:

- `datasketches/tests/serialization_test_data/java_generated_files`
- `datasketches/tests/serialization_test_data/cpp_generated_files`

You can generate them separately:

```shell
python3 ./tools/generate_serialization_test_data.py --java
python3 ./tools/generate_serialization_test_data.py --cpp
```
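
To see which of the two directories still needs generating, a quick check along these lines may help. It is a sketch under the assumption that an absent or empty directory means the data has not been generated; `missing_snapshot_dirs` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: print each expected snapshot directory that is
# missing or empty. The base directory can be overridden as the first argument.
missing_snapshot_dirs() {
  base="${1:-datasketches/tests/serialization_test_data}"
  for sub in java_generated_files cpp_generated_files; do
    dir="$base/$sub"
    if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Run from the repository root; no output means both directories have content.
missing_snapshot_dirs
```

If only one directory is listed, rerun the script with just `--java` or `--cpp`.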

The script needs network access and the following commands on `PATH`:

- Java data: `git`, `java`, `mvn`
- C++ data: `git`, `cmake`, `ctest`

The current `datasketches-java` generation flow requires JDK >= 25 and Maven >= 3.9.11; with older versions, the Maven Enforcer check fails.
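
Before running the script, the required commands can be checked with a short snippet like the one below. This is a sketch; `missing_tools` is a hypothetical helper name, and it only checks that each command is on `PATH`, not its version.

```shell
# Hypothetical helper: print each given command that is not found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing_tools git java mvn      # needed for --java
missing_tools git cmake ctest   # needed for --cpp
```

To compare installed versions against the minimums above, run `java -version` and `mvn -version`.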

## Code of Conduct

We expect all community members to follow our [Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).