Releases: Bears-R-Us/arkouda
Release Notes v2025.12.16
Arkouda v2025.12.16
This release continues Arkouda’s push toward full NumPy and pandas compatibility, with major progress on multi-dimensional arrays, pandas ExtensionArray support, distributed performance, and developer tooling cleanup.
Supported environments and dependencies
This release was tested in CI with the following language versions:
- Python: 3.10, 3.11, 3.12, 3.13
- Chapel: 2.4.0, 2.5.0, 2.6.0
Notable dependency requirements
Runtime dependencies include:
- NumPy ≥ 2.0
- pandas ≥ 1.4.0, excluding 2.2.0 (!= 2.2.0)
- pyarrow ≥ 6.0.1, < 21.0.0
- tables (PyTables) ≥ 3.10.0
- h5py ≥ 3.7.0
- typeguard pinned to 2.10.0
For the full list of dependencies (including optional dev tools such as pytest, Sphinx, and linters), see pyproject.toml.
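As a rough illustration, the runtime constraints listed above can be transcribed into a small table and checked against a candidate version. This is only a sketch under assumptions: the helper names parse_version and satisfies are hypothetical, only plain numeric release versions are handled, and pyproject.toml remains the authoritative source.

```python
# Hypothetical helper: checks plain "X.Y.Z" versions against the constraints
# listed in these notes. Pre-release and local version tags are not handled.
import operator

OPS = {
    ">=": operator.ge,
    "<": operator.lt,
    "!=": operator.ne,
    "==": operator.eq,
}

# Constraints transcribed from the list above: package -> [(operator, version)].
CONSTRAINTS = {
    "numpy": [(">=", "2.0")],
    "pandas": [(">=", "1.4.0"), ("!=", "2.2.0")],
    "pyarrow": [(">=", "6.0.1"), ("<", "21.0.0")],
    "tables": [(">=", "3.10.0")],
    "h5py": [(">=", "3.7.0")],
    "typeguard": [("==", "2.10.0")],
}


def parse_version(text, width=3):
    """Turn '2.0' into (2, 0, 0) so that tuples compare component-wise."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(package, version):
    """Return True if `version` meets every listed constraint for `package`."""
    candidate = parse_version(version)
    return all(
        OPS[op](candidate, parse_version(bound))
        for op, bound in CONSTRAINTS[package]
    )


print(satisfies("pandas", "2.2.0"))    # the excluded pandas release
print(satisfies("pyarrow", "20.0.0"))  # inside the pyarrow range
```

Comparing integer tuples rather than strings matters here: as strings, "3.9.0" would sort after "3.10.0".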
Highlights
Multi-Dimensional Array Expansion
Multi-dimensional support is now significantly more complete across the API:
- Multi-dimensional support added to or enhanced:
- Fixed Chapel instantiation limits for 3+ dimensions (#4227)
- Reorganized broadcasting logic and internals (#4978, #4737)
Distributed Performance & Algorithms
- New repartitionByHash API for distributed workflows (#4500)
- Adopted Chapel standard sort for distributed sorting (#5039)
- Refactored FeistelShuffle into innerArray for better performance (#5069)
- Performance improvements to cumSum/cumProd (#4810)
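To illustrate the general idea behind hash-based repartitioning, the sketch below sends every key to the partition selected by its hash, so equal keys always land together. This is not Arkouda's implementation: the function names repartition_by_hash and stable_hash are hypothetical, and plain Python lists stand in for locales.

```python
import zlib


def stable_hash(key):
    """Deterministic hash of the key's text form (built-in hash() is salted for str)."""
    return zlib.crc32(str(key).encode("utf-8"))


def repartition_by_hash(keys, values, num_partitions):
    """Group (key, value) pairs into num_partitions buckets by hash of the key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in zip(keys, values):
        target = stable_hash(key) % num_partitions
        partitions[target].append((key, value))
    return partitions


keys = ["a", "b", "a", "c", "b", "a"]
values = [1, 2, 3, 4, 5, 6]
for i, bucket in enumerate(repartition_by_hash(keys, values, 3)):
    print(i, bucket)
```

Because the target partition depends only on the key, a later per-key operation (a group-by, for example) can run on each partition without further data movement.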
pandas Integration & ExtensionArray Progress
- New Arkouda accessor for pandas Index (#5074, #5110)
- pandas DataFrame accessor for Arkouda (#4983)
- Renamed ArkoudaBaseArray → ArkoudaExtensionArray (#5001)
- ExtensionArray API improvements: _from_sequence (#5078), copy (#5076), argsort (#4993)
- Refactored factorize to remove pandas dependency (#4940)
- Registered extension dtypes (#4946)
Developer Experience & CI Modernization
- CI updated to support Chapel 2.6; dropped Chapel 2.0–2.3 (#4986, #4991)
- Automated CI build improvements (#4892, #4893)
- Improved Makefile structure and debug ergonomics (#5128, #5133)
- Configurable compiled Arkouda dimensionality via make (#5091)
- Updated Arrow / Parquet handling, including Arrow <19 compatibility (#5146, #5164)
Tooling Cleanup & Code Quality
- Removed isort, darglint, and pydocstyle (#5060, #5072)
- Reduced ruff ignores and resolved formatting issues (#4979, #4980, #4982)
- Fixed mypy issues and improved type precision (#5093)
- Removed deprecated tests and legacy code paths (#5031)
Bug Fixes & Correctness
- Fixed edge-case failures for small sizes (size <= 10) (#5054, #5052, #5045)
- Fixed ak.array negative number handling (#4984)
- Fixed concatenate(axis=1) behavior (#5030)
- Fixed CSV parsing for quoted and multiline records (#5080)
- Improved numerical consistency with NumPy (allclose) (#2956)
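For context, NumPy documents numpy.allclose as an element-wise test of |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The pure-Python sketch below restates that criterion on lists. The name allclose_sketch is hypothetical, and whether Arkouda's allclose matches NumPy in every detail (NaN handling, for example) is an assumption these notes do not confirm.

```python
def allclose_sketch(a, b, rtol=1e-05, atol=1e-08):
    """Return True if every pair satisfies |x - y| <= atol + rtol * |y|."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length in this sketch")
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


print(allclose_sketch([1.0, 2.0], [1.0, 2.00001]))  # difference within tolerance
print(allclose_sketch([1.0, 2.0], [1.0, 2.001]))    # difference too large
```

Note that the criterion is asymmetric: the relative tolerance scales with the second argument only.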
Full Changelog
v2025.09.30...v2025.12.16
Auto-Generated Release Notes
What's Changed
- Closes #4973: Benchmark for segPdarrayIndex by @1RyanK in #4974
- Adds multi-dim capability to the uniform random number generator by @drculhane in #4956
- Resolves Issue 4810, cumSum and cumProd performance by @drculhane in #4819
- Closes #4949: x.take by @ajpotts in #4950
- Closes #4102: pdaarraymanipulation module needs unit tests by @ajpotts in #4977
- remove D104 ignore code by @ajpotts in #4980
- Closes #2689: Value mismatch with numpy on int(arr) // -float(arr) by @1RyanK in #4976
- Closes #4902: Resolve E203 errors by @ajpotts in #4979
- Closes #4981: Resolve D200 Errors by @ajpotts in #4982
- Improve CI to automate builds (Part 2) by @jabraham17 in #4892
- improve extension array base class by @ajpotts in #4927
- Closes #4954: versioneer.py reference setup.cfg which was removed. by @ajpotts in #4955
- Drop Chapel 2.0-2.3 from the CI build images and add 2.6 by @jabraham17 in #4986
- Closes #4962, adds multi-dim to value_counts by @drculhane in #4988
- Improve CI to automate builds (Part 3) by @jabraham17 in #4893
- Reorganizes the broadcast functions in numpy and array_api by @drculhane in #4978
- Closes #4994: Use innerArray record in Repartion.chpl by @1RyanK in #4995
- Fix docs build by @jabraham17 in #4998
- Part 3 of: #4348 Remove DAR103 errors by @ajpotts in #4992
- Closes #5001: rename ArkoudaBaseArray to ArkoudaExtensionArray by @ajpotts in #5007
- Closes #4984: ak.array with negative numbers still has problems by @1RyanK in #4985
- Closes #4993: ArkoudaArray.argsort by @ajpotts in #4999
- Closes #4227: Hitting Chapel's default instantiation limit when compiling for 3 or more dimensions by @1RyanK in #5002
- Add chapel 2.6 to CI by @jabraham17 in #4991
- Closes #4500: Create a repartitionByHash function by @1RyanK in #5005
- Closes 4737 - renames the registered name broadcast in BroadcastMsg.chpl to gbbroadcast by @drculhane in #5014
- Closes #2956: Implement allclose function by @jaketrookman in #5000
- Add numNodes to server config to support colocale-based runs by @e-kayrakli in #5021
- Closes #4940: Refactor factorize to avoid pandas reference by @ajpotts in #4941
- Adds multi-dimensionality to ak.matmul to match numpy by @drculhane in #5009
- Part 1 of #5008 upgrade to typeguard 4.3.0 by @ajpotts in #5012
- Adds a fix to ak.matmul multi-dim by @drculhane in #5027
- Temporarily disable pytables installation test in the CI by @ajpotts in #5036
- fix some formatting errors identified by the pre-commit config by @ajpotts in #5034
- Closes #5031: remove deprecated tests by @ajpotts in #5032
- Aligns ak.poisson's handling of lam and size to that of numpy by @drculhane in #5023
- Closes 5040, fixing a bug in multi-dim matrix multiplication by @drculhane in #5042
- Closes #5054: test_set_jth when size==10 by @ajpotts in #5055
- Fix sort benchmark names in benchmark v2 by @e-kayrakli in #5037
- Closes #5028: Fix a small issue in interpretAsBytes by @1RyanK in #5029
- Closes #5052: test_randint_array_dtype_multi_dim fails when size==10 by @ajpotts in #5053
- Closes #5030: concatenate gives unexpected zeros when axis=1 by @1RyanK in #5066
- Closes #5064 ak.cast overloads for more precise type checking by @ajpotts in #5065
- Closes #5045: test_is_sorted fails when size<=10 by @ajpotts in #5046
- Closes #5056: GroupBy.min to handle segment of all nans in skipNaN mode by @ajpotts in #5057
- Closes #5049: Errors from pdarrayclass.del at end of unit test runs. by @ajpotts in #5051
- More ruff ignore codes by @ajpotts in #5047
- Omit runs folder from pre-commit checks by @ajpotts in #5048
- Closes #5072: remove darglint and pydocstyle by @ajpotts in #5073
- Closes #5062: ak.array overloads for more precise type checking by @ajpotts in #5063
- Closes #5067: ak.generic_msg overloads for more precise type checking by @ajpotts in #5068
- Closes #4569: Simplify and extend logic in doBigIntBinOpvv and doBigIntBinOpvvBoolReturn by @1RyanK in #4593
- Closes #5060 remove isort by @ajpotts in #5061
- Closes #5069: Investigate moving FeistelShuffle from list Repartition to innerArray by @1RyanK in #5070
- Closes #5093: Fix mypy problems by @1RyanK in #5094
- Closes #5078: ArkoudaExtensionArray._from_sequence by @ajpotts in #5079
- Cl...
Release Notes v2025.09.30
Release Notes
This release introduces several major new features, performance improvements, and bug fixes across Arkouda’s Python and Chapel codebases.
Highlights include the new pandas ExtensionArray implementation, expanded random number generation features, and improvements to Parquet I/O performance.
Supported environments and dependencies
This release was tested in CI with the following language versions:
- Python: 3.10, 3.11, 3.12, 3.13
- Chapel: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0
Notable dependency requirements
Runtime dependencies include:
- NumPy ≥ 2.0
- pandas ≥ 1.4.0, excluding 2.2.0 (!= 2.2.0)
- pyarrow ≥ 6.0.1, < 21.0.0
- tables (PyTables) ≥ 3.10.0
- h5py ≥ 3.7.0
- typeguard pinned to 2.10.0
For the full list of dependencies (including optional dev tools such as pytest, Sphinx, and linters), see pyproject.toml.
Major Changes
Implemented pandas ExtensionArray for Arkouda (Closes #4597, #4907, #4876, #4947) by @ajpotts
Added ak.rand to match np.random.rand (Closes #4736) by @drculhane
Added ak.shares_memory function (Closes #3284) by @ajpotts
Added ak.errstate context manager for error handling (Closes #3286) by @ajpotts
Added ak.Index.sort_values (Closes #3177) by @ajpotts
Added ak.fabs (Closes #4921) by @1RyanK
Added ascending argument to ak.argsort (Closes #4782) by @ajpotts
Improved Parquet read performance, especially for multiple column reads (Closes #4906) by @e-kayrakli
Enabled multi-dim output for ak.random.standard_exponential (Closes #4924) by @drculhane
Added destructors for Chapel-side and Python-side RNGs (Closes #4898) by @drculhane
Minor Changes
Expanded axis validation standardization across array API functions (Closes #4831, #4858, #4909, #4932) by @drculhane
Improved docstrings (Closes #3941, #3942, #4852, #4849, #4853, #4947) by @ajpotts, @1RyanK
Added global seed support for reproducibility (Closes #4777, #4726) by @drculhane
Improved shuffle benchmark with Feistel and alternatives (Closes #4818, #4845, #4787) by @1RyanK
Improved benchmark framework (Closes #4811, #4814, #4808, #4816, #4856) by @ajpotts
Added pytest-benchmark dependency (Closes #4821) by @jabraham17
Improved CI builds: Chapel 2.5 support, automated builds, Dockerfile fixes (Closes #4783, #4891, #4910, #4908) by @jaketrookman, @jabraham17
Added pyproject.toml for modern packaging (Closes #4209) by @ajpotts
Refined multi-dim build to reduce size (Closes #4791) by @ajpotts
Improved nbytes handling for bigint arrays (Closes #4850, #4896) by @1RyanK
Improved command registration (Closes #4953) by @e-kayrakli
Bug Fixes
Fixed ak.where for Categorical (Closes #4881) by @1RyanK
Fixed ak.randint behavior for bool (Closes #4872) by @1RyanK
Fixed conversion of numpy bigint zeros producing empty arrays (Closes #4884) by @1RyanK
Fixed cumsum vs cumulative_sum typo (Closes #4804) by @drculhane
Fixed handling of size/shape in ak.random.poisson (Closes #4916) by @drculhane
Fixed common type promotion in concat and stack (Closes #4889) by @drculhane
Fixed benchmark issues: average rate always zero, array_transfer.dat not populating, io_benchmark parsing (Closes #4824, #4863, #4862) by @ajpotts
Fixed doc build failures with Chapel 2.5.0 (Closes #4838) by @ajpotts
Fixed clang bitshift issue (Closes #4894) by @1RyanK
Fixed MaxArrayDims incorrectness (Closes #4565) by @1RyanK
Fixed negative server return values in rare cases (Closes #4157) by @ajpotts
Fixed intermittent test failures (test_set_uint) (Closes #4153) by @ajpotts
Fixed delGeneratorMsg bug (Closes #4933) by @ajpotts
Fixed PT003, T201, E127, Flake8 errors (Closes #4806, #4874, #4903, #4871) by @ajpotts
Fixed doctest failures in random and client modules (Closes #4798, #4860) by @ajpotts, @drculhane
Auto-generated release notes
What's Changed
- Closes 4801, adds an assert to a line in tests/numpy/random_test.py by @drculhane in #4802
- Closes 4804, fixing a "cumsum" vs "cumulative_sum" typo that was in PR 4755 by @drculhane in #4805
- Closes #4597: implement pandas ExtensionArray for arkouda by @ajpotts in #4598
- Closes #4806: Fix PT003 error by @ajpotts in #4807
- Adds ak.rand function to match np.random.rand by @drculhane in #4736
- Closes 4726, standardizing use of seed in unit tests by @drculhane in #4748
- Closes #4808: Expand pytest.N usage in benchmarks_v2 by @ajpotts in #4809
- Add pytest-benchmark dependency to setup.py by @jabraham17 in #4821
- Closes #4816: sort benchmark_v2/datdir/configs/field_lookup_map.json… by @ajpotts in #4817
- Closes #3284: shares_memory by @ajpotts in #4823
- Closes #4818: Shuffle benchmark by @1RyanK in #4825
- Closes #4798: doctest for random.generator module by @ajpotts in #4799
- Closes #4791: trim down multi-dim build by @ajpotts in #4792
- Fixes #4838: make doc fails when building with chapel 2.5.0 by @ajpotts in #4839
- Closes #4811: parse results for benchmark_v2/split_benchmark.py by @ajpotts in #4813
- Closes #4814: Refactor benchmarks/multiIO.py to new benchmark framework by @ajpotts in #4815
- Closes 4828, fixes some parametrizations by @drculhane in #4829
- Fixes #4824: In io_benchmark.py read Average rate is always zero by @ajpotts in #4826
- Closes #4782: add ascending argument to argsort by @ajpotts in #4784
- Closes #4787: Shuffling alternative by @1RyanK in #4789
- Closes 4831, standardizes axis checking for cases where axis can only be an integer by @drculhane in #4834
- update CI to use chapel 2.5 by @jaketrookman in #4783
- Closes #4845: Add Feistel to Shuffle Benchmark by @1RyanK in #4846
- Improved docstrings in client module by @ajpotts in #4849
- Closes #4853: Problem in shuffle docstring by @1RyanK in #4854
- Closes #4860: remove apollo skips on doctest unit tests by @ajpotts in #4861
- Closes #4856: optional benchmark_v2 submodule by @ajpotts in #4859
- Flake8 errors by @ajpotts in #4871
- Closes #3941: ak.hist_all docs example produces an error by @ajpotts in #4866
- Closes #4874 T201 errors by @ajpotts in #4875
- Closes #4863: 16-array Average rate always returns zero under benchm… by @ajpotts in #4865
- Part 2 of #4348: Remove the DAR103 errors from the docstrings by @ajpotts in #4852
- Closes 4777, creating a global seed by @drculhane in #4848
- Closes #4862: array_transfer.dat not populating with benchmark_v2 by @ajpotts in #4864
- Closes #4850: nbytes is meaningless for bigint arrays by @1RyanK in #4851
- Incorporates standard for axis validation to array_api/manipulation_functions.py by @drculhane in #4858
- Closes 4894: clang issue with bitshifts by @1RyanK in #4895
- Closes #4896: Update calc_num_bytes for bigint by @1RyanK in #4897
- Improve CI to automate builds (Part 1) by @jabraham17 in #4891
- Closes #3286: errstate by @ajpotts in #4886
- Implements destructors for chapel-side and python-side rngs by @drculhane in #4898
- Fix push for build-CI-container by @jabraham17 in #4908
- Closes #4872: ak.randint doesn't behave the same for bool as numpy by @1RyanK in #4873
- Closes #4876 improve extension array api take functions by @ajpotts in #4882
- Fixes common type promotion in array_api concat and stack by @drculhane in #4889
- Closes #4903: Resolve E127 Errors by @ajpotts in #4904
- Closes #3177 Index.sort_values by @ajpotts in #3235
- Fixes handling of size/shape parameter in ak.random.poisson by @drculhane in #4916
- Closes #4921: fabs by @1RyanK in #4922
- Standardizes axis validation and handling in array_api/statistical_functions.py by @drculhane in #4909
- Fix paths in new CI image Dockerfile by @jabraham17 in #4910
- Closes #4836: --saveU...
Release Notes v2025.08.20
Introduction
This release delivers a mix of new functionality, performance improvements, infrastructure updates, and ongoing work to align Arkouda more closely with NumPy and modern Python standards.
Highlights include:
- New array operations and utilities (Strings.argsort, Categorical.argsort, isnumeric, deepcopy for ak.array, max_bits_list, and a new searchsorted implementation).
- Major system-level improvements such as MergeShuffle, repartitionByLocale, enhanced checkpointing (including a server heartbeat and bigint array support), and better configuration utilities.
- Expanded test coverage and benchmarking, with many benchmarks refactored for maintainability and consistency.
- Significant documentation work: missing docstrings filled in, doctests added, and adoption of NumPy-style docstring conventions with ruff-based linting.
- CI and infrastructure updates to improve reliability, including fixes for intermittent failures, expanded multi-dimensional test support, and branch migration from master to main.
- A number of important bug fixes addressing auto-checkpointing, Arrow dependency compatibility, type hinting, and CI stability.
Together, these changes improve Arkouda’s stability, usability, and developer experience, while continuing to advance its alignment with NumPy semantics.
Major Changes
- Improvements to checkpointing of server state:
- Add MergeShuffle option to shuffle (#4075, PR #4094)
- Create a repartitionByLocale function (#4498, PR #4647)
- Create a max_bits_list function (#4621, PR #4622)
- Add deep copying to ak.array (#4691, PR #4741)
- Add isnumeric function (#2915, PR #4694)
- Add Strings.argsort (#4642, PR #4643)
- Add Categorical.argsort (#4724, PR #4727)
- Improve performance of searchsorted (#4656, PR #4656)
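As a reminder of what a searchsorted operation computes, the sketch below uses Python's standard bisect module on a sorted list. It assumes NumPy-style side="left" / side="right" semantics. The name searchsorted_sketch is hypothetical, and the sketch says nothing about how the Arkouda implementation achieves its speedup.

```python
from bisect import bisect_left, bisect_right


def searchsorted_sketch(sorted_values, queries, side="left"):
    """For each query, return the index at which it could be inserted to keep order."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    locate = bisect_left if side == "left" else bisect_right
    return [locate(sorted_values, q) for q in queries]


data = [1, 3, 3, 5, 8]
print(searchsorted_sketch(data, [0, 3, 6]))                # [0, 1, 4]
print(searchsorted_sketch(data, [0, 3, 6], side="right"))  # [0, 3, 4]
```

The two sides differ only for queries that already occur in the data: "left" returns the position before the run of equal values, "right" the position after it.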
Minor Changes
Benchmarks
- Updates/refactors to: encoding, no_op, str-locality, split, setops_multiarray, scan, substring_search, flatten, reduce, io, in1d, bigint_conversion, csvIO, sort_cases, setops, groupby, parquet-fixed-strings, str_locality (#3567, #3573, #4670, #4672, #4679, #3575, #3581, #3568, #3574, #3572, #3571, #4675, #4684, #3578, #3577, #3570, #4682, #3579, PR #4610, PR #4662, PR #4671, PR #4673, PR #4680, PR #4664, PR #4665, PR #4611, PR #4663, PR #4652, PR #4651, PR #4676, PR #4685, PR #4650, PR #4646, PR #4613, PR #4683, PR #4712)
- Reformat benchmark results.py to parse all benchmark results (#4606, PR #4751)
- Create a benchmark for ak.find (#4743, PR #4745)
- Strip out correctness_only mode from benchmarks (#4706, PR #4719)
- Create a pytest.N for benchmark_v2/conftest.py (#4704, PR #4735)
Documentation
- Adds missing docstrings across multiple modules (match, matcher, index, groupby, alignment, categorical, client dtypes, dataframe, logger, installers.py) (#4639, #4640, #4635, #4634, #4629, #4630, #4631, #4632, #4638, #4530, PR #4639, PR #4640, PR #4635, PR #4634, PR #4629, PR #4630, PR #4631, PR #4632, PR #4638, PR #4531)
- Improved docstring rendering for accessor module (#4759, PR #4762)
- Add doctest to series module (#4270, PR #4761)
- Fix doc duplication issue for reorged modules (#4372, PR #4757)
- Switch pydocstyle to NumPy convention (#4518, PR #4518)
- Add ruff docstring linting (#4763, PR #4764)
- Strip ak.connect from examples (#4779, PR #4780)
CI / Testing / Infra
- Set max-parallel for multi-dim tests in CI (#4696, PR #4697)
- Reactivate pytest timeout for unit tests (PR #4689)
- Parameterize size in test_multi_col_merge (#4713, PR #4714)
- Update CI to use a slim build for multi-dim testing (#4778, PR #4778)
- Switch branch from master to main + port PRs (#4733, PR #4734, PR #4740)
- Skip testing of auto-checkpoints.py on unsupported hardware (PR #4707)
Other
Bug Fixes
- Fix auto-checkpointing failure (#4700, PR #4700)
- Fix mypy 17.0.0 errors (#4715, PR #4716)
- Patch Arrow dependency issue (PR #4769)
- Fix intermittent CI failures during driver compilation (#4688, PR #4776)
- Fix DAR103 errors in docstrings (#4348, PR #4772)
- Fix type hint issue with numeric_scalars (#2528, PR #4767)
- Temporarily add ignore codes to .pre-commit-config.yaml (#4793, PR #4794)
- Deletion of SegString’s values and offsets + more (PR #4773)
- Support ak.array for empty strings (#4725, PR #4732)
Auto-generated release notes
What's Changed
- Closes #4692: tolist to match numpy by @ajpotts in #4693
- Closes #3567: Update encoding_benchmark by @ajpotts in #4610
- adds missing docstrigs to the match module by @ajpotts in #4639
- add missing docstring to the matcher module by @ajpotts in #4640
- Closes #3573: Update no_op_benchmark by @ajpotts in #4662
- Closes #4677: refactor small-str-groupby by @ajpotts in #4678
- Closes #4670: update str-locality benchmark by @ajpotts in #4671
- Closes #4672: update split_benchmark.py by @ajpotts in #4673
- Closes #4679: refactor setops_multiarray benchmark by @ajpotts in #4680
- Fix auto-checkpointing failure by @vasslitvinov in #4700
- Closes #3575: Update scan_benchmark by @ajpotts in #4664
- Closes #3581: Update substring_search_benchmark by @ajpotts in #4665
- Closes #4696: set max-parallel for multi-dim tests in CI by @ajpotts in #4697
- Closes #3568: Update flatten_benchmark by @ajpotts in #4611
- Revert "Temporarily don't timeout pytest for debugging" by @ajpotts in #4689
- Closes #3574: Update reduce_benchmark by @ajpotts in https://github....
Release Notes v2025.07.03
Arkouda v2025.07.03
We're excited to announce a feature-packed release of Arkouda with enhanced NumPy compatibility, powerful new array functions, performance improvements, CI tooling, and major documentation progress.
Features
Array Functions
- Added: append, argsort, diff, eye, newaxis, nextafter, percentile, quantile, repeat, result_type, take, tile, vecdot, xp.trapz
(#2998, #3000, #3003, #3004, #3292, #3755, #4393, #4418, #4419, #4458, #4483, #4484, #4502, PR #4101, PR #4127, PR #4146, PR #4219, PR #4361, PR #4393, PR #4394, PR #4418, PR #4419, PR #4552)
- Improved: ak.diff, ak.nextafter, ak.repeat, ak.reshape, ak.take, ak.tile, ak.argsort
(#2998, #3000, #3004, #3755, #4101, #4146, #4147, #4165, #4394, #4418, #4419, #4458, PRs #4101, #4146, #4394, #4552)
- Axis and Broadcasting Enhancements:
Checkpointing and Logging
- Introduced experimental checkpointing of server state, with support for numeric arrays and automatic checkpointing triggered by memory limits or idle time.
(#2384, PRs #3915, #4391, #4549, PR #4592, #4644)
- Improved logging behavior:
Project Infrastructure
- Upgraded Apache Arrow to 19.0.1 for compatibility and stability improvements
(#3981, PRs #3982, #4342, PR #4359)
Other
- Introduced ak.apply, ak.result_type (now with bigint support), and ak.searchsorted
(#3005, #4483, #4235, PRs #3963, #4214, PR #4440, #4484)
- Added ak.coargsort(ascending=...) keyword argument
(#4464, PR #4467)
- Added standard gamma distribution function to ak.random
(#3846, PR #4089)
API Enhancements and Compatibility
- Improved NumPy 2.0 compatibility:
  - Upgraded numpy dependency to 2.0.0 (#4098, PR #4188, PR #4213)
  - Added or aligned: ak.can_cast, ak.sign, ak.result_type, ak.dtype, ak.vecdot, ak.eye, ak.dot, ak.arange, ak.transpose, ak.hstack, ak.where, ak.full, ak.reshape (#3329, #3337, #4092, #4165, #4312, #4321, #4555, #4468, PR #4105, PR #4116, PR #4174, PR #4224, PR #4472, PR #4522, PR #4556)
  - Improved parameter alignment to NumPy (ak.eye, ak.where, ak.histogram, etc.) (#4096, PR #4078, PR #4482)
  - Enabled bool as alias for bool_; enhanced dtype detection for builtins bool, float, int (#4186, #4627, PR #4187, PR #4628)
(#3337, #3329, #3337, #3981, #4092, #4096, #4105, #4116, #4124, #4165, #4186, #4188, #4213, #4224, #4321, #4312, #4468, #4481, #4483, #4501, #4520, #4555, #4556, #4552, #4627; PRs #4078, #4103, #4174, #4185, #4390, #4213, #4505, #4522, #4628)
- Reorganized modules into dedicated numpy/, scipy/ directories for API clarity
(PRs #4103, #4185, #4390)
- Miscellaneous API additions and improvements:
- DataFrame and Merge Improvements
Performance Improvements
- Improved performance and stability in ak.permutation, distributed array creation, and sorting
(#3974, PRs #3975, #4242)
Deprecations and Refactors
- Removed deprecated or obsolete features:
- Refactored and modernized core logic:
  - ak.arange now uses instantiateAndRegister (#4382, PR #4383)
  - Improved logic for binopvv and binopvs (#4459, #4460, PRs #4462, #4563)
  - Reverted ak.zeros behavior to previous default (PR #4141)
  - Refactored import and module layout (__init__.py, sort module, CHPL_HOME independence) (PRs #3972, #4453, #4551)
- Simplified internals and extended platform support:
- Added internal or system-level functionality:
Benchmark Refactor
Release Notes v2025.01.13
Bug Fixes
- Issues #3931 and #3933: fixes bug in the Makefile preventing make install-arrow from successfully completing on some systems.
- Issue #3947: fixes bug where reshape was failing for a single integer argument.
Major changes
- Issues #3939 and #3957: refactors of the Makefile to streamline offline arkouda builds
- Issue #3960: creates a comm_diagnostics module for querying comm diagnostic statistics.
Minor changes
- Issue #3929: Adds chapel 2.1, 2.2 to the github CI
- Issue #3911: minor performance improvement to reduction module
- Issues #3881, #3882, and #3872: Completes the refactoring of all functions in EfuncMsg.chpl to the new interface.
Auto-generated release notes
- Closes #3929: Add chapel 2.1, 2.2 to CI by @ajpotts in #3930
- Part of #3931 bug in make install deps by @ajpotts in #3932
- Closes #3939: install-deps to work offline by @ajpotts in #3940
- Part 1 of #3933: failing make install-arrow by @ajpotts in #3936
- Part 2 of #3933 failing make install arrow by @ajpotts in #3944
- Part of #3911 reduction performance improvements by @ajpotts in #3914
- Closes 3943 issue with reshape by @drculhane in #3947
- Part 3 of #3933: failing make install-arrow by @ajpotts in #3946
- Part 2 of #3957: simplify offline builds by @ajpotts in #3959
- Part 4 of #3957: simplify offline builds by @ajpotts in #3965
- Closes 3881 3882 3872 etc by @drculhane in #3937
- Read multiple row groups in Parquet files correctly by @jhh67 in #3950
- Revert "Read multiple row groups in Parquet files correctly" by @ajpotts in #3969
- Closes #3960: python interface for CommDiagnostics by @ajpotts in #3966
New Contributors
Full Changelog: v2024.12.06...v2025.01.13
Release Notes v2024.12.06
Bug Fixes
- Issue #3870 - fixes bug in reshape for bigint type
- Issue #3821 - fixes bug in stridable indexing of Strings in multilocale
- PR #3804 - fixes sparseMatToPdarray test failures for distributed arrays
- PR #3857 - fixes file location reporting in register-commands.py
- Issue #3842 - fixes mypy CI failures
Major changes
- PRs #3840, #3877 - adds Sparse Matrix creation from Pdarrays
- Issues #3823, #3827 - adds flatten function
- Issues #3782, #3851, #3820 - adds flip function
- Issue #3300 - adds shape function
- Issue #3904 - adds function to return list of all compiled dimensions available
- Issues #3886, #3813, #3866, #3855, #3809, and PRs #3874, #3854, #3847, #3845, #3841, #3832, #3799, #3878 - refactor and improve server side message argument handling and convert modules to new framework
Minor changes
- Numpy Alignment
- Issues #3839, #3560, #3796 - refactor benchmarks to use pytest framework and add to CI.
- Issue #3815, PRs #3880, #3812, #3926, #3912, #3802 - unit test improvements
- Issues #3902, #3896, #3818, #3883, #3887 - reduce warnings
- Issue #3708 - refactors array_api to call functions from arkouda.pdarray_creation
- PRs #3814 and #3826 - performance improvements to array function
- PR #3862 - updated the hdf5 download link in the Makefile
- Issue #3905 - assert_equivalent compares shapes of pdarrays
- PR #3818 - improves documentation for LINUX_INSTALL
- Issue #3849 - adds SortingAlgorithm enum to __all__ in sorting module
Auto-generated release notes
- #3802: sporadic failures of test_assert_frame_equal_check_exact by @ajpotts in #3808
- Closes #3796: Add benchmarks to CI by @ajpotts in #3810
- Closes #3283: histogram2d between different dtypes by @jaketrookman in #3763
- Closes #3811: Roll back test change to determine impact on testing by @bmcdonald3 in #3812
- Fix performance regression in array transfer performance by @jeremiah-corrado in #3814
- Closes #3782: flip function to match numpy by @ajpotts in #3791
- Closes #3815: Disable client_test for nightly due to machine issues by @stress-tess in #3817
- small instruction fix by @ItsQuinnMoore in #3818
- Closes #3820: bug in flip multi-local by @ajpotts in #3822
- remove Commands.chpl from tree by @jeremiah-corrado in #3799
- Fix array transfer performance regression by @jeremiah-corrado in #3826
- Closes #3714: pdarray.shape should be a tuple by @ajpotts in #3803
- Fix sparseMatToPdarray test failures for distributed arrays by @jeremiah-corrado in #3804
- Closes #3827: rename flatten to split by @ajpotts in #3828
- Fixes #3821: Bug in stridable indexing of Strings in multilocale by @stress-tess in #3830
- Closes #3823 flatten function to match numpy by @ajpotts in #3825
- Closes 3818 -- eliminates warning messages about tilde vs not by @drculhane in #3829
- Refactor SparseMatrixMsg to use automated registration by @jeremiah-corrado in #3832
- Closes #3781 move random module to numpy submodule by @ajpotts in #3835
- Closes #3842: Fixes mypy CI failures by @stress-tess in #3843
- Part of argTypeReductionMessage refactor by @ajpotts in #3845
- Refactor arg type reduction message pt2 by @ajpotts in #3847
- Closes #3560 Update argsort_benchmark by @ajpotts in #3838
- Creating Sparse Matrix from Pdarrays by @ShreyasKhandekar in #3840
- Closes #3849: Add SortingAlgorithm enum to __all__ by @stress-tess in #3853
- Fixes #3851: Error when running string flip by @stress-tess in #3852
- Support where-clause evaluation in registration annotations by @jeremiah-corrado in #3841
- Closes #3861: Update hdf5 download link in Makefile by @stress-tess in #3862
- Part 3 of argTypeReductionMessage refactor by @ajpotts in #3854
- Fix broken error reporting for Chapel 2.0 in register-commands.py by @jeremiah-corrado in #3857
- Closes 3809 moves trig and hyp fns to new interface by @drculhane in #3863
- Closes #3868: move squeeze functionality to arkouda.numpy. by @ajpotts in #3869
- Part of #3708: array_api to call functions from arkouda.pdarray_crea… by @ajpotts in #3758
- Fix performance regression in reductions benchmark by @jeremiah-corrado in #3874
- Optimize creation of sparrays from pdarrays by @ShreyasKhandekar in #3877
- Eliminates duplicates in tests/numpy/numeric_test.py by @drculhane in #3880
- LayoutCS deprecation warning fix by @jeremiah-corrado in #3883
- Closes #3884: Remove _squeeze function by @ajpotts in #3885
- Fix build error caused by #3883 by @jeremiah-corrado in #3887
- Closes #3855: refactor boolReductionMsg by @ajpotts in #3876
- Closes 3813 and 3866 -- moves several new functions to the new interface (abs, square, all exp and log, isnan, isinf, isfinite) by @drculhane in #3873
- Closes #3896 PytestUnknownMarkWarning for pytest.mark.skip_if_nl_grea… by @ajpotts in #3897
- Closes #3300: shape function by @ajpotts in #3900
- Closes #3886 refactor idx reduction msg by @ajpotts in #3889
- Closes #3902: truth value of an empty array DeprecationWarning by @ajpotts in #3903
- Closes 3878 - refactors rounding functions to new interface, pulls hash function into their own procs by @drculhane in #3898
- Part of #3839 new benchmarks to output performance graph format by @ajpotts in #3894
- Closes #3864 max and min of bool to return bool like numpy by @ajpotts in #3901
- Closes #3870: bug in reshape for bigint type by @ajpotts in #3907
- Closes #3904: function to return list of all compiled dimensions avai… by @ajpotts in #3909
- Closes #3905: assert_equivalent to compare shapes of pdarrays by @ajpotts in #3906
- Closes #3912: failing unit test test_is_locally_sorted_multidim unde… by @ajpotts in #3913
- Closes #3926: OverMemoryLimitError in pdarrayclass_test by @ajpotts in #3927
New Contributors
- @ItsQuinnMoore made their first contribution in #3818
Full Changelog: v2024.10.02...v2024.12.06
Release Notes v2024.10.02
Bug Fixes
- Issue #3762 - Fix dataframe groupby aggregations when keys contain NaNs
- Issues #3658, #3650, #3654, #3783, #3784, #3788 and PR #3386 - Fix IO bugs including:
  - reading segarrays containing NaNs and empty segments with hdf5 and parquet
  - reading dataframes containing uint and int segarray columns
  - CSV address sanitizer "use after free" memory issues
- Issues #3648, #3676, #3682, #3679, #3687, #3666 - Fix multidimensional bugs in sorting, nonzero, repeat, flatten, and unflatten
- Issue #3367 - Fixes race condition in SegHead function
- Issue #3468 - Fixes round trip discrepancies for Index with Categorical values
- Issue #3649 - Fixes bitshift failures
- Issue #3467 - Fixes indexing error in DataFrame instantiation
Major Updates
- Issues #3628, #3703 - Drop python 3.8 support
- Issue #3355 - Pins scipy<=1.13.1
- Issues #3332, #3334, #3351, #3360, #3417, #3419, #3504, #3613, #3695, #3769, #3767, #3711 and PRs #3363, #3368, #3379 - parquet optimizations:
  - Added fixed length flag for string reads
  - Read strings and byte sizes in batches
  - Simplified source code
- Issues #3336, #3362, #3183, #3364, #3226, #3523, #3278, #3373, #3372, #3627 - Improve random module with a focus on numpy alignment. Adding:
  - exponential, lognormal, logistic
  - multidimensional functionality to Random module
- Issues #3294, #3639, #3665, #3709 - Improve testing and add delete function for multidimensional arrays
- Issues #3425, #3526, #3632, #3656, #3631, #3718, #3720, #3722, #3771, #3657 and PRs #3345, #3358, #3359, #3371, #3518, #3474, #3521, #3525, #3590, #3606, #3603, #3685, #3672, #3691, #3789, #3773, #3786, #3634, #3671, #3655, #3697 - Refactor and improve server side message argument handling
- PRs #3516, #3593, #3745 - Add initial implementation of sparse matrix functionality including matrix multiplication, fill_vals, and to_pdarray
Minor Updates
- Issues #2978, #3702 - Strip out ArrayView (replaced by multidimensional pdarray functionality)
- Issue #3302 - Adds `GroupBy.head`
- Issue #3326 - Adds `DataFrame.assign`
- Issues #3510, #3511 - Update `DataFrame.to_pandas` and `Series.to_pandas` to handle categoricals
- Issues #3293, #3428 - Add `putmask` functionality
- Issue #3297 - Adds `array_equal`
- Issue #3742 - Move numeric module to arkouda.numpy
- Issues #3289, #3288, #3291, #3295, #3299, #3298, #3301, #3287, #3296 - Modify dtypes for better numpy alignment
  - rename `bool` to `bool_`, align with numpy scalar type, remove `translate_np_dtype`
- Issues #3259, #3265, #3267, #3271, #3275, #3269, #3263, #3273, #3261, #3385, #3400, #3403, #3409, #3440, #3445, #3457, #3448, #3452, #3454, #3459, #3461, #3463, #3465, #3407, #3442, #3446, #3405, #3411, #3389, #3212, #3145, #3144, #3143, #3231, #3441, #3447, #3458, #3443, #3462, #3466, #3455, #3444, #3438, #3450, #3460, #3464, #3388, #3430, #3624, #3453, #3413, #3646, #3402, #3439, #3669, #3415, #3421 - Transitions to new testing suite including updating `make test`
- Issues #3508, #3748, #3759, #3727, #3378 - Updates documentation including:
  - chapel tutorial, installation docs, and documentation about memory pressure during server builds
- Issues #3793, #3798, #3797 and PR #3730 - Updates to benchmarks
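
The `putmask` addition listed above shares its name with NumPy's function. As a rough illustration of the NumPy rule only, here is a pure-Python reference. `putmask_reference` is a hypothetical helper, not an Arkouda API, and whether Arkouda matches every detail (such as cycling a short `values` sequence) is an assumption:

```python
from typing import List, Sequence


def putmask_reference(a: List[int], mask: Sequence[bool], values: Sequence[int]) -> None:
    """Overwrite a[n] in place wherever mask[n] is true.

    Follows NumPy's documented rule: position n receives values[n % len(values)],
    so a short `values` sequence is repeated. Hypothetical helper for illustration.
    """
    if len(a) != len(mask):
        raise ValueError("a and mask must have the same length")
    if not values:
        raise ValueError("values must not be empty")
    for n, flagged in enumerate(mask):
        if flagged:
            a[n] = values[n % len(values)]


data = [0, 1, 2, 3, 4]
# The mask is built from the original contents before any element is replaced.
putmask_reference(data, [x > 1 for x in data], [-33, -44])
print(data)  # [0, 1, -33, -44, -33]
```

Note that the replacement value is chosen by the element's position, not by how many masked elements came before it.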
Auto-Generated Release Notes
- Closes #3308 Unify file permissions by @ajpotts in #3309
- Closes #3332: Split Parquet code into multiple files by @bmcdonald3 in #3333
- temporary fix for #3355: pin `scipy<=1.13.1` to avoid CI failures by @ajpotts in #3356
- Closes #3334, #3351: Simplify server side string code and added fixed length by @bmcdonald3 in #3335
- Ignore new Parquet object files by @bmcdonald3 in #3363
- Closes #3336, #3362: Reuse random number generation loop structure by @stress-tess in #3352
- Closes #3259: deprecate test/scipy/scipy_test.py and special_test.py by @ajpotts in #3260
- Closes #3265 deprecate tests/numeric_test by @ajpotts in #3266
- Closes #3267 deprecate tests/dtypes_test by @ajpotts in #3268
- Closes #3271 deprecate tests/index_test by @ajpotts in #3272
- Closes #3275 deprecate tests/categorical_test by @ajpotts in #3277
- Closes #3360: Reduce code duplication in Parquet read code with templates by @bmcdonald3 in #3361
- Closes #3183, #3364: Add `exponential` distribution and aggregation to random generator loop by @stress-tess in #3310
- Simplify Command Map by @jeremiah-corrado in #3345
- Adds arkouda.testing module by @ajpotts in #3186
- Closes #3269 deprecate tests/datetime_test by @ajpotts in #3270
- Remove string.doFormat, replacing with string.format by @jeremiah-corrado in #3365
- Closes #3302 GroupBy.head by @ajpotts in #3324
- Refactor MessageArgs by @jeremiah-corrado in #3358
- Closes Ticket #3263: deprecate tests/dataframe_test by @ajpotts in #3264
- Closes #3273 deprecate tests/series_test by @ajpotts in #3274
- Remove number of files multiplication for IO benchmark by @bmcdonald3 in #3368
- 3231 unique unit tests by @drculhane in #3258
- Adds missing numpy dtypes by @ajpotts in #3330
- Closes #3367 racy condition in SegHead function by @ajpotts in #3369
- Closes #3261 deprecate tests/numpy by @ajpotts in #3262
- Closes #3375: Cleanup indexof1d code by @stress-tess in #3377
- Disable Parquet multi row group test until resolved by @bmcdonald3 in #3379
- Closes #3326 DataFrame.assign by @ajpotts in #3327
- Refactor SymbolTable and error handling by @jeremiah-corrado in #3359
- Resolves #3294 - Add numpy-like delete function by @jeremiah-corrado in #3321
- Closes #3281 rename bool to bool_ to match numpy by @ajpotts in #3282
- Fixes #3392: Fix mypy CI failures by @stress-tess in #3394
- Resolve CSV Asan "use after free" memory issues by @ShreyasKhandekar in #3386
- Closes #3376 more numpy imports by @ajpotts in #3381
- Closes #3385 groupby_test.py by @ajpotts in #3397
- Closes #3400 deprecate alignment_tests.py by @ajpotts in #3401
- Closes #3403 deprecate bigint_agg_test.py by @ajpotts in #3404
- Closes #3409 deprecate tests/client_dtypes_test.py by @ajpotts in #3410
- Closes #3417: Separate Parquet string read code from generic read function by @bmcdonald3 in #3418
- Closes #3419: Remove intertwined list column and string column byte calculation logic by @bmcdonald3 in #3420
- Closes #3425: Improve Msg Function Registration for Module Tracking by @bmcdonald3 in #3424
- Closes #3226: Adds parameterization to test_shuffle and test_permutation by @drculhane in #3320
- Closes #3414 deprecate compare_test.py by @ajpotts in #3416
- Closes #3293: Add `putmask` by @drculhane in #3370
- `add-path` modification for building on Horizon by @brandon-neth in #3423
- Closes #3405 deprecate tests/bitops_test.py by @ajpotts in #3406
- Closes #3411 deprecate tests/client_test.py by @ajpotts in #3412
- Automated command registration by @jeremiah-corrado in #3371
- Closes #3475 make fails when lib and lib64 directories are both present by @ajpotts in #3503
- Closes #3504: Improve Parquet Integration: Stop Using Array Views by @bmcdonald3 in #3505
- Remove support for pre-2.0 versions of Chapel by @jeremiah-corrado in #3477
- Adds skip configuration for multidimensional histogram test by @brandon-neth in #3506
- Updates PROTOs pdarray_creation_test by @drculhane in #3393
- Closes #3514-add pandas-stubs to arkouda-env-dev.yml by @ajpotts in #3515
- Stop requiring manual installation of `chapel-py` to register commands by @jeremiah-corrado in #3518
- Closes #3407 deprecate tests/check.py by @ajpotts in #34...
Release Notes v2024.06.21
Bug Fixes
- Issues #3074, #3234 - Fix bug reading `Segarray`s from parquet files
- Issues #3001, #3185 - Fix broadcast bugs involving `nan`s and `Strings`
- Issue #3156 - Fixes `Categorical.sort_values` bug
- Issues #3311, #3112 - Fix Parquet multi column byte writing and Parquet string column free
- Issue #3115 - Fixes non-deterministic `sparse_sum` failure
- Issue #3089 - Avoids out of memory crashes caused by `in` intents on `makeDistArray`
- Issue #3009 and PRs #3232, #3316 - Improve performance of `indexof1d` and fix handling of null values
- Issues #3158, #3222 - Fix print bugs involving `Dataframe` or `Series` containing a `Segarray`
Major Updates
- PR #3303 - Drops support for Chapel `1.31`
- Issues #3343, #3346 - Pin `numpy < 2.0` and `python < 3.12.4`
- Issue #3148 - Updates IO functions to always return a dictionary
- PRs #3238, #3314 and Issue #3347 - Reimplements CSV read to increase performance
- Issue #3108 - Adds `groupby.sample` and `dataframe.groupby.sample`
- Issue #2893 - Changes the behavior of `dataframe.GroupBy.count` to align with pandas
- Issues #3086, #3118, #3245, #3322, #3167 and PRs #3110, #3280 - Add updates to `Random` module:
  - Adds `choice`, `poisson`, `normal` to random number generators
- PRs #3242, #3305, #3160, #3223, #3237, #3142 - Improvements to Array API:
  - Add documentation for Array API functions
  - Add implementations of `vstack`, `clip`, `diff`, `pad` and missing stats, search, and sort functions to Array API module
  - Compatibility improvements for Xarray chunk-manager
- Issues #3213, #3206, #3202, #3208, #3217, #3188 - Add `Index` and `MultiIndex` properties:
  - Including `levels`, `equals`, `names`, `ndim`, etc
- Issues #3050, #3192, #3128, #3196, #3198, #3200, #3130, #3123, #3194 - Work on proto tests:
  - Improvements to tests for `dataframe`, `dtypes`, `groupby`, `io`, `numeric`, `symbol_table`
  - Adds `make-proto-tests` command and updates our CI to run it
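
The `numpy < 2.0` and `python < 3.12.4` pins in this release's Major Updates can be checked against a local environment before installing. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `below` are hypothetical, and the version parsing is deliberately simplified (it reads only the leading dotted integers and ignores pre-release tags):

```python
import re
import sys
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading dotted integers, e.g. '1.26.4' -> (1, 26, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def below(version: Tuple[int, ...], upper_bound: Tuple[int, ...]) -> bool:
    """Return True if version < upper_bound, padding with zeros so (2,) equals (2, 0)."""
    width = max(len(version), len(upper_bound))
    padded_version = version + (0,) * (width - len(version))
    padded_bound = upper_bound + (0,) * (width - len(upper_bound))
    return padded_version < padded_bound


# Interpreter pin from the release notes: python < 3.12.4
python_ok = below(tuple(sys.version_info[:3]), (3, 12, 4))

# Library pin from the release notes: numpy < 2.0 (None means numpy is not installed)
numpy_ok: Optional[bool]
try:
    numpy_ok = below(parse_version(metadata.version("numpy")), (2, 0))
except metadata.PackageNotFoundError:
    numpy_ok = None

print("python < 3.12.4:", python_ok)
print("numpy < 2.0:", numpy_ok)
```

For anything beyond a quick check, a dedicated version-specifier library is the safer choice, since this sketch does not implement full version-ordering rules.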
Minor Updates
- Issues #3006, #3007 - Add `median` and `count_nonzero`
- Issues #3079, #3080 - Add `sum` and `+=` for boolean pdarrays
- PRs #3221, #3211 - Add NYC taxi tutorial from CUG 2024
Auto-Generated Release Notes
- Closes #3068 add doc strings for numpy imports by @ajpotts in #3077
- Add a random sampling with support for a weights array by @jeremiah-corrado in #3110
- Closes #3112: Fix Parquet string column free by @bmcdonald3 in #3113
- Closes #3115: Fix non-deterministic sparse_sum failure by @stress-tess in #3117
- Closes #3086: Add `choice` to random number generators by @stress-tess in #3114
- Closes #3118: Move `choice` implementation into arkouda by @stress-tess in #3138
- Closes #2947 change the name of the class dataframe.GroupBy by @ajpotts in #3146
- Avoid a warning about mismatched parSafe settings for list initialization by @lydia-duncan in #3149
- Closes #3116 remove DataFrame._columns by @ajpotts in #3147
- Closes #3124-dataframe.pyi-file and Closes #3097 numpy import docs at module level by @ajpotts in #3141
- Closes #3135 Update scipy/special_test by @ajpotts in #3137
- 3050 groupby etc by @drculhane in #3111
- multidimensional array bug fixes by @jeremiah-corrado in #3142
- Closes #3123-make-proto-tests by @ajpotts in #3126
- Closes #2893 dataframe.GroupBy.count to align with pandas by @ajpotts in #3125
- Closes #3051 Update akscipy_test by @ajpotts in #3136
- Fixes #3158: `Dataframe` containing a `Segarray` `.__str__()` bug by @stress-tess in #3161
- Closes #3089: Avoid OOM Crashes caused due to `in` intents on `makeDistArray` by @ShreyasKhandekar in #3163
- Resolve deprecation warning about not using 'new' in dmapped expressions by @jeremiah-corrado in #3162
- Closes #3079 and #3080: Sum and Plus Equal of Boolean Arrays by @jaketrookman in #3154
- Closes #3108: Add `groupby.sample` and `dataframe.groupby.sample` by @stress-tess in #3157
- Closes #3174: loosens type return restrictions of sum by @stress-tess in #3175
- Fixes #3001: nan broadcast bug by @stress-tess in #3173
- Dataframe Indexing by @brandon-neth in #3109
- Closes 3190 add mypy.ini by @ajpotts in #3191
- Closes #3192 PROTO_tests/tests/dtypes_test.py is failing by @ajpotts in #3193
- Fixes #3156: `Categorical.sort_values` bug by @stress-tess in #3168
- Closes #3148: Update IO functions to always return a dictionary by @stress-tess in #3164
- Re # 3128 fixes errors and omissions in PROTO-tests version of datafr… by @drculhane in #3139
- 3130 numeric test slight revamp by @drculhane in #3151
- 1D implementations of median and count_nonzero by @drculhane in #3187
- Closes #3196 PROTO_tests/tests/symbol_table.py failing by @ajpotts in #3197
- Closes #3198 PROTO_tests/tests/io_test.py failing by @ajpotts in #3199
- Closes #3200 PROTO_tests/tests/dataframe_test.py failing by @ajpotts in #3201
- Closes #3204 is_numeric to handle Index and Series type by @ajpotts in #3205
- Closes #3206 MultiIndex.levels by @ajpotts in #3207
- Array-API slice Assignment by @jeremiah-corrado in #3166
- Implement missing stats, search and sort functions for Array API by @jeremiah-corrado in #3160
- Closes #3202 Index.inferred_type by @ajpotts in #3203
- Closes #3208-Index.equals by @ajpotts in #3209
- Closes #3194 add proto tests to CI by @ajpotts in #3195
- Add benchmark for CSV Read and write perf by @ShreyasKhandekar in #3189
- Fixes #3185: strings broadcast bug by @stress-tess in #3210
- Closes #3167: Add `normal` to random number generators by @stress-tess in #3180
- Add NYC taxi tutorial from CUG 2024 by @bmcdonald3 in #3211
- Fix jupyter notebook formatting by @bmcdonald3 in #3221
- Closes #3009: `indexof1d` to handle null values by @stress-tess in #3169
- Compatibility improvements for Xarray chunk-manager by @jeremiah-corrado in #3223
- Closes #3215: Index.__get__item can accept a list by @ajpotts in #3216
- Closes #3217: MultiIndex.get_level_values by @ajpotts in #3218
- Move some definitions from ArrowFunctions header to source by @e-kayrakli in #3236
- Reduce file size for csvIO benchmark by @ShreyasKhandekar in #3239
- Part of #3229: CI failures due to `indexof1d` by @stress-tess in #3232
- Fixes #3074: Bug reading segarrays from parquet files by @stress-tess in #3233
- Closes #3227 add pandas stubs library by @ajpotts in #3228
- Closes #3213 Index properties by @ajpotts in #3214
- Add implementations of `clip`, `diff`, `pad` to Array API module by @jeremiah-corrado in #3237
- Closes #3188 multi index.equals by @ajpotts in #3225
- Fixes #3222: series of segarray print bug by @stress-tess in #3240
- Fixing a missing `iloc` usage by @brandon-neth in #3243
- Closes #3249: Fix issue with finding incorrect conftest file for proto tests by @bmcdonald3 in #3250
- Fixes #3234: segarray with empty segments and nans parquet bug by @stress-tess in #3241
- Array API Documentation by @jeremiah-corrado in #3242
- Fixes #3252: proto `test_segarray_read` failure with multi-locale by @stress-tess in #3254
- Closes #3255 move numeric.floor to numpy module by @ajpotts in #3257
- Remove single-column cases from multi-col-merge test. by @brandon-neth in #3248
- Benchmark Display P...
Release Notes v2024.04.19
Bug Fixes
- PR #3091 - Fixes Parquet `double` reads to properly account for null values
- Issue #3087 - Fixes bug when reading non-`float` parquet columns with null values
- Issue #3088 and PR #3090 - Fix an off by 1 bug in `sparse_sum_helper`
Major Updates
- Issue #3083 - Optimizes Parquet `Strings` read
- Issues #3033, #3054 - Optimize CSV write
- Issues #3020, #3040 - Adds `nan` functions to `DataFrame` and `Series`: `isna`, `notna`, `dropna`, ...
- Issues #3071, #3084 - Add `permutation` and `shuffle` to random number generators
- Issue #3030 - Creates numpy subdirectory as part of the alignment effort
- PRs #3056, #3093, #3070, #3072 - Improves and adds Array API functionality including manipulation and set functions
Minor Updates
- PR #3076 - Adds support for large string Parquet type
- Issue #3092 - Adds support for TLS token authentication
- Issue #3045 - Adds `map` method to `Index`
- Issue #3065 - Adds `count` to `DataFrame`
- Issue #2913 - Adds `isdecimal` to `Strings`
- Issue #3002 - Adds `clip` to `pdarray`
- Issue #3062 - Enhances arkouda metrics capability
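
The `isdecimal` method added to `Strings` shares its name with Python's built-in `str.isdecimal`. Assuming the Arkouda method applies the same rule to each element (the notes above do not spell this out), the difference from the closely related `str.isdigit` can be seen with plain Python strings:

```python
# "\u00b2" is SUPERSCRIPT TWO; "\u0663" is ARABIC-INDIC DIGIT THREE.
samples = ["2024", "3.14", "", "\u00b2", "\u0663"]

# isdecimal() is true only for non-empty strings made entirely of decimal digits;
# isdigit() additionally accepts digit-like characters such as superscripts.
results = {text: (text.isdecimal(), text.isdigit()) for text in samples}

for text, (is_decimal, is_digit) in results.items():
    print(repr(text), "isdecimal:", is_decimal, "isdigit:", is_digit)
```

The superscript is the only sample on which the two predicates disagree. The decimal point in `"3.14"` and the empty string fail both.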
Auto-Generated Release Notes
- Closes #3030 numpy alignment directory structure by @ajpotts in #3038
- Closes #3040 isna and notna for series by @ajpotts in #3048
- Closes #3033: Optimize CSV write by @stress-tess in #3053
- Closes #3063: Fix deprecation warnings by @stress-tess in #3064
- Closes #2913: add isdecimal by @jaketrookman in #3015
- Closes #3045 index.map by @ajpotts in #3057
- Array API Set functions by @jeremiah-corrado in #3070
- Fix --print-used-modules for functions registered with `@arkouda.registerND` by @jeremiah-corrado in #3072
- Closes #3054: Dynamically switch to batching for larger csv writes by @stress-tess in #3061
- Closes #3002: Add `ak.clip` functionality by @drculhane in #3043
- enhance arkouda metrics capability by @hokiegeek2 in #3067
- Partially addresses issue #3050, updating PROTO tests. by @drculhane in #3075
- Add support for large string Parquet type by @bmcdonald3 in #3076
- Closes #3071: Add permutation to our generators by @stress-tess in #3078
- Closes #3065 DataFrame.count by @ajpotts in #3081
- Part of #3088: Generate seed for sparse sum test by @stress-tess in #3090
- Fix Parquet double reads to properly account for null values by @bmcdonald3 in #3091
- Array API Manipulation Function Improvements by @jeremiah-corrado in #3056
- Closes #3084: Add `shuffle` to random number generators by @stress-tess in #3085
- Pdarray indexing by @jeremiah-corrado in #3093
- Fixes #3087: Failure reading non-float parquet columns with null values by @stress-tess in #3094
- Closes #3083: Optimize Parquet string read code by @bmcdonald3 in #3082
- add support for TLS token authentication by @hokiegeek2 in #3096
- Fixes #3088: sparse sum nightly failures by @stress-tess in #3098
- Closes #3020 dataframe.dropna by @ajpotts in #3101
New Contributors
- @drculhane made their first contribution in #3043
Full Changelog: v2024.03.18...v2024.04.19
Release Notes v2024.03.18
Bug Fixes
- Issue #3035 - Fixes inconsistent results when broadcasting with empty segments
- Issue #2939 - Fixes `TypeError` in `DataFrame.reset_index`
- Issue #2966 - Fixes error when pip installing from a tar file
- Issue #2897 - Fixes bug where `DataFrame.corr` returns `DataFrame` without index
- PR #3021 - Adds `SegArray` optimization and benchmark bug fix
Major Updates
- Issue #2958 - Renames `akstats` to `akscipy`
- Issue #2942 - Removes `DataFrame.sorted`
- Issue #3024 and PR #2976 - Add sparse sum helper to util with merge based and sort based workflows
- Issues #2993, #3008, #3017 - Add a random subfolder and stateful `Generator` objects
- Issue #2974 - Adds `Series.map`
- Issue #3019 - Adds outer join option to `DataFrame` merge
- PRs #2936, #2967, #3014, #3027 - Improve Array API functionality specifically adding stats and manipulation functions
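
The sparse sum helper above is described as having merge based and sort based workflows. The notes do not describe Arkouda's implementation, so the following is only a generic single-machine sketch of those two strategies for adding sparse vectors stored as parallel index and value lists. The names `sparse_sum_sort` and `sparse_sum_merge` are hypothetical:

```python
from typing import List, Tuple

# A sparse vector as parallel (indices, values) lists; indices are sorted and unique.
Sparse = Tuple[List[int], List[float]]


def sparse_sum_sort(a: Sparse, b: Sparse) -> Sparse:
    """Sort-based: concatenate both inputs, sort by index, then add runs of equal indices."""
    pairs = sorted(zip(a[0] + b[0], a[1] + b[1]), key=lambda pair: pair[0])
    idx: List[int] = []
    vals: List[float] = []
    for i, v in pairs:
        if idx and idx[-1] == i:
            vals[-1] += v
        else:
            idx.append(i)
            vals.append(v)
    return idx, vals


def sparse_sum_merge(a: Sparse, b: Sparse) -> Sparse:
    """Merge-based: walk the two already-sorted inputs with one cursor each."""
    (ai, av), (bi, bv) = a, b
    idx: List[int] = []
    vals: List[float] = []
    p = q = 0
    while p < len(ai) and q < len(bi):
        if ai[p] == bi[q]:
            idx.append(ai[p])
            vals.append(av[p] + bv[q])
            p += 1
            q += 1
        elif ai[p] < bi[q]:
            idx.append(ai[p])
            vals.append(av[p])
            p += 1
        else:
            idx.append(bi[q])
            vals.append(bv[q])
            q += 1
    # At most one of these tails is non-empty.
    idx.extend(ai[p:])
    vals.extend(av[p:])
    idx.extend(bi[q:])
    vals.extend(bv[q:])
    return idx, vals


a = ([0, 3, 7], [1.0, 2.0, 3.0])
b = ([3, 5], [10.0, 20.0])
print(sparse_sum_sort(a, b))   # ([0, 3, 5, 7], [1.0, 12.0, 20.0, 3.0])
print(sparse_sum_merge(a, b))  # ([0, 3, 5, 7], [1.0, 12.0, 20.0, 3.0])
```

The merge variant avoids the sort but requires both inputs to be sorted by index already. The sort variant accepts inputs in any order at the cost of sorting the concatenation.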
Minor Updates
- Issue #2929 - Updates `DataFrame.size` to match pandas
- Issues #2906, #2945 - Add shift operators between 2 `bool` pdarrays and between a combination of `bool` and `int64` pdarrays
- Issues #2916, #2919 - Add `isspace` and `capitalize` to `Strings`
- Issue #3023 - Adds `to_markdown` to `DataFrame` and `Series`
- Issue #2957 - Adds Dot Function
- Issue #2960 - Adds `memory_usage` functions
- Issue #2924 - Updates `DataFrame` documentation
- Issue #2896 - Updates `DataFrame` columns to return an Index
- Issue #2952 - Makes Chapel 1.33 release default for CI testing
- Issue #2985 - Updates `libzmq` version in Makefile
- Issue #2981 - Adds `LICENSES` folder including the licenses for numpy, pandas, and scipy
- Issues #2969, #2971, #2977, #2989 - Update failing proto_tests
Auto-Generated Release Notes
- Add sort compat modules for new sorting algorithm by @bmcdonald3 in #2941
- Closes #2906 shift operator for boolean vectors by @ajpotts in #2944
- Closes #2916 add isspace for pdarrays by @ajpotts in #2946
- Closes #2949: Add compat modules for 1.34 by @bmcdonald3 in #2950
- Closes #2952: Make Chapel 1.33 release default for CI testing by @bmcdonald3 in #2951
- Remove deprecation warnings about `domain(?)` vs. `domain` by @jeremiah-corrado in #2953
- Closes #2919 add capitalize to pdarrays by @ajpotts in #2948
- Closes #2945: add Shift Operators for `Boolean` and `Int64` by @jaketrookman in #2954
- Closes #2924 update pydoc strings for arkouda dataframe by @ajpotts in #2943
- Closes #2942 bug in sorted by @ajpotts in #2955
- Closes #2958 rename akstats to akscipy by @ajpotts in #2959
- Closes #2929-dataframe-size-to-match-pandas by @ajpotts in #2961
- Closes #2963-PROTO_tests-tests-akscipy-unit-tests-failing by @ajpotts in #2964
- Fixes #2966: pip install from tar error by @stress-tess in #2968
- Array API manipulation functions by @jeremiah-corrado in #2936
- Add sparse sum helper to util by @stress-tess in #2976
- Closes #2969 PROTO_tests/tests/client_test.py unit tests failing by @ajpotts in #2970
- Closes #2971 PROTO_tests/tests/dtypes_test.py unit tests failing by @ajpotts in #2972
- Closes #2977 PROTO_tests/tests/setops_test.py unit tests failing by @ajpotts in #2979
- Address `Random.choice` deprecation by @jeremiah-corrado in #2983
- Closes #2985: Update `libzmq` version in Makefile by @stress-tess in #2986
- Resolve formatting issue in NumpyDType server docs by @jeremiah-corrado in #2992
- Closes #2981 add licenses by @ajpotts in #2988
- Closes #2989 PROTO_tests/tests/pdarray_creation_test.py has failing test by @ajpotts in #2990
- Remove deprecation messages for reader/writer locking default change by @jeremiah-corrado in #2987
- Closes #2994: Remove upper bound on pandas version by @stress-tess in #2995
- Closes #2896 DataFrame columns should return an Index by @ajpotts in #2962
- Closes #2993: Create random subfolder and foundation for generator by @stress-tess in #2997
- Closes #2957: Dot Function by @jaketrookman in #2996
- Closes #2934 DataFrame.unregister_dataframe_by_name string return typ… by @ajpotts in #3011
- Closes #2897 DataFrame.corr returns dataframe without index by @ajpotts in #3012
- Closes #2939 DataFrame.reset_index by @ajpotts in #3013
- Array API stats functions by @jeremiah-corrado in #2967
- Support for creating Array API objects from numpy arrays by @jeremiah-corrado in #3014
- Closes #2960 int version of memory_usage by @ajpotts in #3018
- SegArray optimization & bug fix by @brandon-neth in #3021
- Closes #3019 Add outer join option for dataframe merge by @ajpotts in #3022
- Closes #3031: Update Arkouda for upcoming Chapel 2.0 release by @bmcdonald3 in #3032
- Closes #3008: Add generator sym entry and stateful uniform distribution by @stress-tess in #3016
- Closes #2974 Series.map by @ajpotts in #3010
- Closes #3024: Add merge based workflow and update sort workflow for sparse sum helper by @stress-tess in #3025
- Remove I/O-locking and randomStream.skipTo deprecation messages by @jeremiah-corrado in #3037
- Closes #3023 to_markdown by @ajpotts in #3026
- Array API improvements by @jeremiah-corrado in #3027
- Fixes #3035 - Inconsistent results when broadcasting with empty segments by @stress-tess in #3039
- Closes #3017: Add documentation for our random number generation by @stress-tess in #3044
Full Changelog: v2024.02.02...v2024.03.18