Releases: Bears-R-Us/arkouda

Release Notes v2025.12.16

16 Dec 21:40
a7a063a

Arkouda v2025.12.16

This release continues Arkouda’s push toward full NumPy and pandas compatibility, with major progress on multi-dimensional arrays, pandas ExtensionArray support, distributed performance, and developer tooling cleanup.


Supported environments and dependencies

This release was tested in CI with the following language versions:

  • Python: 3.10, 3.11, 3.12, 3.13
  • Chapel: 2.4.0, 2.5.0, 2.6.0

Notable dependency requirements

Runtime dependencies include:

  • NumPy ≥ 2.0
  • pandas ≥ 1.4.0, excluding 2.2.0 (!= 2.2.0)
  • pyarrow ≥ 6.0.1, < 21.0.0
  • tables (PyTables) ≥ 3.10.0
  • h5py ≥ 3.7.0
  • typeguard pinned to 2.10.0

For the full list of dependencies (including optional dev tools such as pytest, Sphinx, and linters), see pyproject.toml.
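
The bounds above can be checked against an installed environment with a few lines of Python. The sketch below is illustrative only: the `REQUIREMENTS` table and the `parse_version` and `satisfies` helpers are hypothetical names, not part of Arkouda, and the parser ignores pre-release suffixes. pyproject.toml remains the authoritative source.

```python
import operator
import re

# Bounds copied from the list above; the table itself is a hypothetical helper.
REQUIREMENTS = {
    "numpy": [(">=", "2.0")],
    "pandas": [(">=", "1.4.0"), ("!=", "2.2.0")],
    "pyarrow": [(">=", "6.0.1"), ("<", "21.0.0")],
    "tables": [(">=", "3.10.0")],
    "h5py": [(">=", "3.7.0")],
    "typeguard": [("==", "2.10.0")],
}

OPS = {">=": operator.ge, "<": operator.lt, "!=": operator.ne, "==": operator.eq}


def parse_version(text, width=3):
    """Turn '3.10.0' into (3, 10, 0), padding short versions with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:width]]
    return tuple(numbers + [0] * (width - len(numbers)))


def satisfies(package, version):
    """Return True if `version` meets every bound listed for `package`."""
    installed = parse_version(version)
    return all(
        OPS[op](installed, parse_version(bound))
        for op, bound in REQUIREMENTS[package]
    )
```

Comparing integer tuples rather than raw strings matters here: as strings, "3.9.2" sorts after "3.10.0", which would wrongly accept an old PyTables.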


Highlights

Multi-Dimensional Array Expansion

Multi-dimensional support is now significantly more complete across the API:

  • Multi-dimensional support added to or enhanced:
  • Fixed Chapel instantiation limits for 3+ dimensions (#4227)
  • Reorganized broadcasting logic and internals (#4978, #4737)

Distributed Performance & Algorithms

  • New repartitionByHash API for distributed workflows (#4500)
  • Adopted Chapel standard sort for distributed sorting (#5039)
  • Refactored FeistelShuffle into innerArray for better performance (#5069)
  • Performance improvements to cumSum / cumProd (#4810)
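
As a conceptual illustration of hash-based repartitioning, the sketch below moves every record to the partition chosen by a hash of its key, so that equal keys end up together. It is plain Python over lists standing in for locales; `owner_of` and `repartition_by_hash` are hypothetical names, and nothing here reflects the signature or internals of Arkouda's repartitionByHash.

```python
import zlib


def owner_of(key, num_locales):
    """Pick a destination partition from a deterministic hash of the key."""
    # crc32 is used because Python's built-in hash() of str varies per process.
    return zlib.crc32(str(key).encode("utf-8")) % num_locales


def repartition_by_hash(partitions):
    """Redistribute (key, value) records so equal keys share one partition."""
    num_locales = len(partitions)
    result = [[] for _ in range(num_locales)]
    for part in partitions:
        for key, value in part:
            result[owner_of(key, num_locales)].append((key, value))
    return result
```

After such a step, per-key work such as a groupby or join can proceed on each partition without further communication, which is the usual motivation for this kind of API.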

pandas Integration & ExtensionArray Progress

  • New Arkouda accessor for pandas Index (#5074, #5110)
  • pandas DataFrame accessor for Arkouda (#4983)
  • Renamed ArkoudaBaseArray to ArkoudaExtensionArray (#5001)
  • ExtensionArray API improvements: _from_sequence (#5078), copy (#5076), argsort (#4993)
  • Refactored factorize to remove pandas dependency (#4940)
  • Registered extension dtypes (#4946)

Developer Experience & CI Modernization

  • CI updated to support Chapel 2.6; dropped Chapel 2.0–2.3 (#4986, #4991)
  • Automated CI build improvements (#4892, #4893)
  • Improved Makefile structure and debug ergonomics (#5128, #5133)
  • Configurable compiled Arkouda dimensionality via make (#5091)
  • Updated Arrow / Parquet handling, including Arrow <19 compatibility (#5146, #5164)

Tooling Cleanup & Code Quality

  • Removed isort, darglint, and pydocstyle (#5060, #5072)
  • Reduced ruff ignores and resolved formatting issues (#4979, #4980, #4982)
  • Fixed mypy issues and improved type precision (#5093)
  • Removed deprecated tests and legacy code paths (#5031)

Bug Fixes & Correctness

  • Fixed edge-case failures for small sizes (size <= 10) (#5054, #5052, #5045)
  • Fixed ak.array negative number handling (#4984)
  • Fixed concatenate(axis=1) behavior (#5030)
  • Fixed CSV parsing for quoted and multiline records (#5080)
  • Improved numerical consistency with NumPy (allclose) (#2956)
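
For reference, NumPy documents the allclose test as |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The pure-Python sketch below restates that rule for equal-length sequences of floats; it is an illustration of the NumPy semantics being matched, not Arkouda's implementation, and it omits broadcasting and the equal_nan option.

```python
import math


def isclose(a, b, rtol=1e-05, atol=1e-08):
    """Scalar form of the NumPy tolerance test (note it is not symmetric)."""
    if math.isnan(a) or math.isnan(b):
        return False  # NaN never compares close in this sketch
    if math.isinf(a) or math.isinf(b):
        return a == b  # infinities match only when identical
    return abs(a - b) <= atol + rtol * abs(b)


def allclose(xs, ys, rtol=1e-05, atol=1e-08):
    """True when every pair of elements passes the tolerance test."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return all(isclose(a, b, rtol, atol) for a, b in zip(xs, ys))
```

Because the relative term scales with |b| only, swapping the arguments can change the result when the two values differ in magnitude.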

Full Changelog: v2025.09.30...v2025.12.16

Auto-Generated Release Notes

What's Changed

Release Notes v2025.09.30

30 Sep 19:54
e077dc4

Release Notes

This release introduces several major new features, performance improvements, and bug fixes across Arkouda’s Python and Chapel codebases.
Highlights include the new pandas ExtensionArray implementation, expanded random number generation features, and improvements to Parquet I/O performance.

Supported environments and dependencies

This release was tested in CI with the following language versions:

  • Python: 3.10, 3.11, 3.12, 3.13
  • Chapel: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0

Notable dependency requirements

Runtime dependencies include:

  • NumPy ≥ 2.0
  • pandas ≥ 1.4.0, excluding 2.2.0 (!= 2.2.0)
  • pyarrow ≥ 6.0.1, < 21.0.0
  • tables (PyTables) ≥ 3.10.0
  • h5py ≥ 3.7.0
  • typeguard pinned to 2.10.0

For the full list of dependencies (including optional dev tools such as pytest, Sphinx, and linters), see pyproject.toml.

Major Changes

Implemented pandas ExtensionArray for Arkouda (Closes #4597, #4907, #4876, #4947) by @ajpotts

Added ak.rand to match np.random.rand (Closes #4736) by @drculhane

Added ak.shares_memory function (Closes #3284) by @ajpotts

Added ak.errstate context manager for error handling (Closes #3286) by @ajpotts

Added ak.Index.sort_values (Closes #3177) by @ajpotts

Added ak.fabs (Closes #4921) by @1RyanK

Added ascending argument to ak.argsort (Closes #4782) by @ajpotts

Improved Parquet read performance, especially for multiple column reads (Closes #4906) by @e-kayrakli

Enabled multi-dim output for ak.random.standard_exponential (Closes #4924) by @drculhane

Added destructors for Chapel-side and Python-side RNGs (Closes #4898) by @drculhane
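
To illustrate what an ascending flag on an argsort means, here is a minimal pure-Python sketch. The `argsort` function below is a stand-in written for this example, not ak.argsort, and its tie-breaking (equal values keep their original relative order in both directions) is an assumption rather than documented Arkouda behavior.

```python
def argsort(values, ascending=True):
    """Return the indices that would sort `values`."""
    # sorted() is stable, and reverse=True keeps equal elements in input order.
    return sorted(
        range(len(values)),
        key=values.__getitem__,
        reverse=not ascending,
    )
```

For example, `argsort([3, 1, 2])` gives `[1, 2, 0]`, and `argsort([3, 1, 2], ascending=False)` gives `[0, 2, 1]`.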

Minor Changes

Expanded axis validation standardization across array API functions (Closes #4831, #4858, #4909, #4932) by @drculhane

Improved docstrings (Closes #3941, #3942, #4852, #4849, #4853, #4947) by @ajpotts, @1RyanK

Added global seed support for reproducibility (Closes #4777, #4726) by @drculhane

Improved shuffle benchmark with Feistel and alternatives (Closes #4818, #4845, #4787) by @1RyanK

Improved benchmark framework (Closes #4811, #4814, #4808, #4816, #4856) by @ajpotts

Added pytest-benchmark dependency (Closes #4821) by @jabraham17

Improved CI builds: Chapel 2.5 support, automated builds, Dockerfile fixes (Closes #4783, #4891, #4910, #4908) by @jaketrookman, @jabraham17

Added pyproject.toml for modern packaging (Closes #4209) by @ajpotts

Refined multi-dim build to reduce size (Closes #4791) by @ajpotts

Improved nbytes handling for bigint arrays (Closes #4850, #4896) by @1RyanK

Improved command registration (Closes #4953) by @e-kayrakli
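
For readers unfamiliar with the Feistel approach named in the shuffle benchmark item above, the sketch below shows the general idea: a keyed Feistel network is a bijection on a power-of-two range, and cycle-walking restricts it to [0, n), so each index maps to a unique position without materializing a permutation. The round function (SHA-256), the round count, and all function names are illustrative assumptions, not Arkouda's implementation.

```python
import hashlib


def round_value(half, key, round_index, half_bits):
    """Keyed pseudo-random function of one half-block (illustrative choice)."""
    data = f"{key}:{round_index}:{half}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def feistel_index(index, n, key, rounds=4):
    """Map an index in [0, n) to a unique position in [0, n)."""
    if not 0 <= index < n:
        raise ValueError("index out of range")
    # Split the smallest even-width bit range covering n into two halves.
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1
    x = index
    while True:
        left, right = x >> half_bits, x & mask
        for r in range(rounds):
            left, right = right, left ^ round_value(right, key, r, half_bits)
        x = (left << half_bits) | right
        if x < n:  # cycle-walk until the value lands back inside [0, n)
            return x


def feistel_shuffle(items, key):
    """Place each item at its Feistel-permuted position."""
    n = len(items)
    out = [None] * n
    for i, item in enumerate(items):
        out[feistel_index(i, n, key)] = item
    return out
```

Each index is computed independently of the others, which is what makes this style of shuffle attractive for distributed arrays.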

Bug Fixes

Fixed ak.where for Categorical (Closes #4881) by @1RyanK

Fixed ak.randint behavior for bool (Closes #4872) by @1RyanK

Fixed conversion of numpy bigint zeros producing empty arrays (Closes #4884) by @1RyanK

Fixed cumsum vs cumulative_sum typo (Closes #4804) by @drculhane

Fixed handling of size/shape in ak.random.poisson (Closes #4916) by @drculhane

Fixed common type promotion in concat and stack (Closes #4889) by @drculhane

Fixed benchmark issues: average rate always zero, array_transfer.dat not populating, io_benchmark parsing (Closes #4824, #4863, #4862) by @ajpotts

Fixed doc build failures with Chapel 2.5.0 (Closes #4838) by @ajpotts

Fixed clang bitshift issue (Closes #4894) by @1RyanK

Fixed MaxArrayDims incorrectness (Closes #4565) by @1RyanK

Fixed negative server return values in rare cases (Closes #4157) by @ajpotts

Fixed intermittent test failures (test_set_uint) (Closes #4153) by @ajpotts

Fixed delGeneratorMsg bug (Closes #4933) by @ajpotts

Fixed PT003, T201, E127, Flake8 errors (Closes #4806, #4874, #4903, #4871) by @ajpotts

Fixed doctest failures in random and client modules (Closes #4798, #4860) by @ajpotts, @drculhane

Auto-generated release notes

What's Changed

Release Notes v2025.08.20

20 Aug 17:22
147a08b

Introduction

This release delivers a mix of new functionality, performance improvements, infrastructure updates, and ongoing work to align Arkouda more closely with NumPy and modern Python standards.

Highlights include:

  • New array operations and utilities (Strings.argsort, Categorical.argsort, isnumeric, deepcopy for ak.array, max_bits_list, and a new searchsorted implementation).
  • Major system-level improvements such as MergeShuffle, repartitionByLocale, enhanced checkpointing (including a server heartbeat and bigint array support), and better configuration utilities.
  • Expanded test coverage and benchmarking, with many benchmarks refactored for maintainability and consistency.
  • Significant documentation work: missing docstrings filled in, doctests added, and adoption of NumPy-style docstring conventions with ruff-based linting.
  • CI and infrastructure updates to improve reliability, including fixes for intermittent failures, expanded multi-dimensional test support, and branch migration from master to main.
  • A number of important bug fixes addressing auto-checkpointing, Arrow dependency compatibility, type hinting, and CI stability.

Together, these changes improve Arkouda’s stability, usability, and developer experience, while continuing to advance its alignment with NumPy semantics.

Major Changes


Minor Changes

Benchmarks

Documentation

CI / Testing / Infra

  • Set max-parallel for multi-dim tests in CI (#4696, PR #4697)
  • Reactivate pytest timeout for unit tests (PR #4689)
  • Parameterize size in test_multi_col_merge (#4713, PR #4714)
  • Update CI to use a slim build for multi-dim testing (#4778, PR #4778)
  • Switch branch from master to main + port PRs (#4733, PR #4734, PR #4740)
  • Skip testing of auto-checkpoints.py on unsupported hardware (PR #4707)

Other

  • Remove redundant cumsum, align cumulative ops to NumPy (#4749, PR #4755)

Bug Fixes

Auto-generated release notes

What's Changed

Release Notes v2025.07.03

22 Jul 17:53
88f2a83

Arkouda v2025.07.03

We're excited to announce a feature-packed release of Arkouda with enhanced NumPy compatibility, powerful new array functions, performance improvements, CI tooling, and major documentation progress.


Features

Array Functions

Checkpointing and Logging

  • Introduced experimental checkpointing of server state, with support for numeric arrays and automatic checkpointing triggered by memory limits or idle time.
    (#2384, PRs #3915, #4391, #4549, PR #4592, #4644)

  • Improved logging behavior:

    • Logs can now be redirected to a file using the server’s logging mechanism (PR #4152)
    • Reduced use of throws in logging routines (PR #4433)

Project Infrastructure

Other


API Enhancements and Compatibility


Performance Improvements

  • Improved performance and stability in ak.permutation, distributed array creation, and sorting
    (#3974, PRs #3975, #4242)

Deprecations and Refactors

  • Removed deprecated or obsolete features:

  • Refactored and modernized core logic:

  • Simplified internals and extended platform support:

  • Added internal or system-level functionality:

    • repartitionByLocaleString and repartitionByHashString server functions
      (#4497, #4499, PRs #4557, #4617)
    • Set union function for Strings arrays (#4244, PR #4245)
    • Compatibility module for Time.totalMicroseconds() (PR #4142)
    • Added missing __all__ to ensure symbol export consistency (#4426, PR #4427)

Benchmark Refactor

Release Notes v2025.01.13

13 Jan 16:06
a3aa4c3

Bug Fixes

  • Issues #3931 and #3933: fixes a Makefile bug that prevented make install-arrow from completing on some systems.
  • Issue #3947: fixes bug where reshape was failing for a single integer argument.
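
For context on the reshape fix, NumPy accepts the target shape as a single integer, a tuple, or separate integers, so `a.reshape(6)`, `a.reshape((2, 3))`, and `a.reshape(2, 3)` are all valid. The sketch below shows one way to normalize those three spellings; `normalize_shape` is a hypothetical helper written for this example, not Arkouda's code, and it omits NumPy's -1 dimension inference.

```python
from math import prod
from numbers import Integral


def normalize_shape(size, *shape):
    """Return the requested shape as a tuple, checking it matches `size`."""
    if len(shape) == 1 and not isinstance(shape[0], Integral):
        dims = tuple(shape[0])  # reshape((2, 3)) style
    else:
        dims = shape  # reshape(6) or reshape(2, 3) style
    dims = tuple(int(d) for d in dims)
    if prod(dims) != size:
        raise ValueError(f"cannot reshape {size} elements into {dims}")
    return dims
```

With this helper, `normalize_shape(6, 6)` returns `(6,)`, while `normalize_shape(6, (2, 3))` and `normalize_shape(6, 2, 3)` both return `(2, 3)`.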

Major changes

  • Issues #3939 and #3957: refactors of the Makefile to streamline offline arkouda builds
  • Issue #3960: creates a comm_diagnostics module for querying comm diagnostic statistics.

Minor changes

  • Issue #3929: Adds Chapel 2.1 and 2.2 to the GitHub CI
  • Issue #3911: minor performance improvement to reduction module
  • Issues #3881, #3882, and #3872: Completes the refactoring of all functions in EfuncMsg.chpl to the new interface.
Auto-generated release notes

New Contributors

Full Changelog: v2024.12.06...v2025.01.13

Release Notes v2024.12.06

07 Dec 02:31
9eba2ce

Bug Fixes

  • Issue #3870 - fixes bug in reshape for bigint type
  • Issue #3821 - fixes bug in stridable indexing of Strings in multilocale
  • PR #3804 - fixes sparseMatToPdarray test failures for distributed arrays
  • PR #3857 - fixes file location reporting in register-commands.py
  • Issue #3842 - fixes mypy CI failures

Major changes

Minor changes

  • NumPy Alignment
    • Issues #3868, #3884, #3781 - code reorganization to align with NumPy
    • Issue #3864 - max and min of bool return bool to match NumPy
    • Issue #3714 - pdarray.shape returns a tuple
    • Issue #3283 - adds mixed-type support to histogram2d and matches the return dtypes to NumPy
  • Issues #3839, #3560, #3796 - refactor benchmarks to use pytest framework and add to CI.
  • Issue #3815, PRs #3880, #3812, #3926, #3912, #3802 - unit test improvements
  • Issues #3902, #3896, #3818, #3883, #3887 - reduce warnings
  • Issue #3708 - refactors array_api to call functions from arkouda.pdarray_creation
  • PRs #3814 and #3826 - performance improvements to array function
  • PR #3862 - updated the hdf5 download link in the Makefile
  • Issue #3905 - assert_equivalent compares shapes of pdarrays
  • PR #3818 - improves documentation for LINUX_INSTALL
  • Issue #3849 - adds SortingAlgorithm enum to __all__ in sorting module
Auto-generated release notes

New Contributors

Full Changelog: v2024.10.02...v2024.12.06

Release Notes v2024.10.02

03 Oct 00:12
a44dd0f

Bug Fixes

  • Issue #3762 - Fix dataframe groupby aggregations when keys contain NaNs
  • Issues #3658, #3650, #3654, #3783, #3784, #3788 and PR #3386 - Fix IO bugs including:
    • reading SegArrays containing NaNs and empty segments with HDF5 and Parquet
    • reading DataFrames containing uint and int SegArray columns
    • CSV address sanitizer "use after free" memory issues
  • Issues #3648, #3676, #3682, #3679, #3687, #3666 - Fix multidimensional bugs in sorting, nonzero, repeat, flatten, and unflatten
  • Issue #3367 - Fixes race condition in SegHead function
  • Issue #3468 - Fixes round trip discrepancies for Index with Categorical values
  • Issue #3649 - Fixes bitshift failures
  • Issue #3467 - Fixes indexing error in DataFrame instantiation

Major Updates

Minor Updates

Auto-Generated Release Notes
Release Notes v2024.06.21

21 Jun 19:30
cf6eeac

Bug Fixes

  • Issues #3074, #3234 - Fix bug reading SegArrays from Parquet files
  • Issues #3001, #3185 - Fix broadcast bugs involving NaNs and Strings
  • Issue #3156 - Fixes Categorical.sort_values bug
  • Issues #3311, #3112 - Fix Parquet multi column byte writing and Parquet string column free
  • Issue #3115 - Fixes non-deterministic sparse_sum failure
  • Issue #3089 - Avoids out-of-memory crashes caused by "in" intents on makeDistArray
  • Issue #3009 and PRs #3232, #3316 - Improve performance of indexof1d and fix handling of null values
  • Issues #3158, #3222 - Fix print bugs involving a DataFrame or Series containing a SegArray

Major Updates

  • PR #3303 - Drops support for Chapel 1.31
  • Issues #3343, #3346 - Pin NumPy < 2.0 and Python < 3.12.4
  • Issue #3148 - Updates IO functions to always return a dictionary
  • PRs #3238, #3314 and Issue #3347 - Reimplements CSV read to increase performance
  • Issue #3108 - Adds groupby.sample and dataframe.groupby.sample
  • Issue #2893 - Changes the behavior of dataframe.GroupBy.count to align with pandas
  • Issues #3086, #3118, #3245, #3322, #3167 and PRs #3110, #3280 - Add updates to Random module:
    • Adds choice, poisson, normal to random number generators
  • PRs #3242, #3305, #3160, #3223, #3237, #3142 - Improvements to Array API:
    • Add documentation for Array API functions
    • Add implementations of vstack, clip, diff, pad and missing stats, search, and sort functions to Array API module
    • Compatibility improvements for Xarray chunk-manager
  • Issues #3213, #3206, #3202, #3208, #3217, #3188 - Add Index and MultiIndex properties:
    • Including levels, equals, names, ndim, etc.
  • Issues #3050, #3192, #3128, #3196, #3198, #3200, #3130, #3123, #3194 - Work on proto tests:
    • Improvements to tests for dataframe, dtypes, groupby, io, numeric, symbol_table
    • Adds make-proto-tests command and updates our CI to run it

Minor Updates

  • Issues #3006, #3007 - Add median and count_nonzero
  • Issues #3079, #3080 - Add sum and += for boolean pdarrays
  • PRs #3221, #3211 - Add NYC taxi tutorial from CUG 2024
Auto-Generated Release Notes
Release Notes v2024.04.19

19 Apr 20:37
8ac2645

Bug Fixes

  • PR #3091 - Fixes Parquet double reads to properly account for null values
  • Issue #3087 - Fixes bug when reading non-float parquet columns with null values
  • Issue #3088 and PR #3090 - Fix an off-by-one bug in sparse_sum_helper

Major Updates

  • Issue #3083 - Optimizes Parquet Strings read
  • Issues #3033, #3054 - Optimize CSV write
  • Issues #3020, #3040 - Adds nan functions to DataFrame and Series
    • isna, notna, dropna, ...
  • Issues #3071, #3084 - Add permutation and shuffle to random number generators
  • Issue #3030 - Creates numpy subdirectory as part of the alignment effort
  • PRs #3056, #3093, #3070, #3072 - Improves and adds Array API functionality including manipulation and set functions

Minor Updates

  • PR #3076 - Adds support for large string Parquet type
  • Issue #3092 - Adds support for TLS token authentication
  • Issue #3045 - Adds map method to Index
  • Issue #3065 - Adds count to DataFrame
  • Issue #2913 - Adds isdecimal to Strings
  • Issue #3002 - Adds clip to pdarray
  • Issue #3062 - Enhances arkouda metrics capability
Auto-Generated Release Notes

New Contributors

Full Changelog: v2024.03.18...v2024.04.19

Release Notes v2024.03.18

18 Mar 22:51
e07f70e

Bug Fixes

  • Issue #3035 - Fixes inconsistent results when broadcasting with empty segments
  • Issue #2939 - Fixes TypeError in DataFrame.reset_index
  • Issue #2966 - Fixes error when pip installing from a tar file
  • Issue #2897 - Fixes bug where DataFrame.corr returns DataFrame without index
  • PR #3021 - Adds SegArray optimization and benchmark bug fix

Major Updates

  • Issue #2958 - Renames akstats to akscipy
  • Issue #2942 - Removes DataFrame.sorted
  • Issue #3024 and PR #2976 - Add sparse sum helper to util with merge based and sort based workflows
  • Issues #2993, #3008, #3017 - Add a random subfolder and stateful Generator objects
  • Issue #2974 - Adds Series.map
  • Issue #3019 - Adds outer join option to DataFrame merge
  • PRs #2936, #2967, #3014, #3027 - Improve Array API functionality, specifically adding stats and manipulation functions

Minor Updates

  • Issue #2929 - Updates DataFrame.size to match pandas
  • Issues #2906, #2945 - Add shift operators between two bool pdarrays and between bool and int64 pdarrays
  • Issues #2916, #2919 - Add isspace and capitalize to Strings
  • Issue #3023 - Adds to_markdown to DataFrame and Series
  • Issue #2957 - Adds Dot Function
  • Issue #2960 - Adds memory_usage functions
  • Issue #2924 - Updates DataFrame documentation
  • Issue #2896 - Updates DataFrame columns to return an Index
  • Issue #2952 - Makes Chapel 1.33 release default for CI testing
  • Issue #2985 - Updates libzmq version in Makefile
  • Issue #2981 - Adds LICENSES folder including the licenses for NumPy, pandas, and SciPy
  • Issues #2969, #2971, #2977, #2989 - Update failing proto_tests
Auto-Generated Release Notes

Full Changelog: v2024.02.02...v2024.03.18