Closes #5303: Pandas ExtensionArray: allow dtype=ak for generic Arkouda-backed arrays #5304
76f3b24 to cfc9e7e
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #5304 +/- ##
========================================
Coverage ? 100.00%
========================================
Files ? 4
Lines ? 63
Branches ? 0
========================================
Hits ? 63
Misses ? 0
Partials ? 0
cfc9e7e to 11fcad3
1RyanK left a comment:
Looks good!
from numpy.typing import NDArray
from pandas.api.extensions import ExtensionArray
from pandas.core.dtypes.base import ExtensionDtype
Would `from pandas.api.extensions import ExtensionDtype` be better?
Fixed.
        return cls(out)

    @classmethod
    def _from_sequence(
One thing to note: `pd.array(ak.pandas.Categorical(...), dtype="ak_int64")` currently raises `NotImplementedError` because pandas routes that path through `ArkoudaArray._from_sequence` (which tries to iterate the categorical) rather than this dispatcher. This PR only supports categorical construction when using the generic `"ak"` dtype or `ArkoudaCategoricalDtype`. That’s probably fine, but it could be good to either (a) document that categorical → concrete dtype casts are unsupported, or (b) add a clearer error/guard for that case.
I think this is outside the scope of this PR.
In [12]: pd.array(Categorical(ak.array(["a","a","b"])), dtype="ak")
Out[12]: ArkoudaCategoricalArray(['a', 'a', 'b'])
appears to work.
I did create a ticket for the issue: #5335
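For reference, option (b) from this thread (a clearer error for categorical → concrete dtype) might look roughly like the sketch below. It is not taken from the PR or from Arkouda: `ToyCategorical`, `check_categorical_dtype`, and the dtype name `"ak_category"` are hypothetical placeholders, and only `"ak"` and `"ak_int64"` are names that appear in the discussion.

```python
class ToyCategorical:
    """Hypothetical stand-in for an Arkouda Categorical (not the real class)."""

    def __init__(self, values):
        self.values = list(values)


# Assumed dtype names for this sketch: "ak" is the generic dtype from the PR;
# "ak_category" is a made-up placeholder for the categorical dtype's name.
CATEGORICAL_OK = frozenset({"ak", "ak_category"})


def check_categorical_dtype(data, dtype_name):
    """Fail early, with a descriptive message, on categorical -> concrete dtype."""
    if isinstance(data, ToyCategorical) and dtype_name not in CATEGORICAL_OK:
        raise TypeError(
            f"Cannot build a {dtype_name!r} array from a Categorical; "
            "use dtype='ak' or the categorical dtype instead."
        )


# Generic dtype: accepted silently.
check_categorical_dtype(ToyCategorical(["a", "a", "b"]), "ak")

# Concrete dtype: rejected with a message that names the offending dtype.
try:
    check_categorical_dtype(ToyCategorical(["a", "a", "b"]), "ak_int64")
except TypeError as err:
    print(err)
```

The point of such a guard is only to replace an opaque failure during iteration with a message that tells the user which dtypes are accepted.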
…eric Arkouda-backed arrays
11fcad3 to 305014d
Summary
This PR introduces a generic Arkouda pandas dtype, `dtype="ak"`, allowing users to construct Arkouda-backed pandas arrays without specifying a concrete Arkouda dtype (e.g. `ak_int64`, `ak_string`) up front.
The new generic dtype improves ergonomics and aligns Arkouda’s pandas integration with standard pandas patterns such as `dtype="string"` or `dtype="category"`.
Motivation
Prior to this change, users had to explicitly specify a concrete Arkouda dtype when constructing pandas objects:
This is unnecessarily verbose and diverges from typical pandas usage, where users usually select a backend or family and allow the system to infer the concrete dtype.
With this PR, users can now write:
and rely on Arkouda to infer the appropriate concrete dtype.
What’s in this PR
1. Generic `ArkoudaDtype`
A new pandas `ExtensionDtype`, `ArkoudaDtype`, is introduced and registered under the name `"ak"`.
Key properties:
- `dtype="ak"` resolves to `ArkoudaDtype`
- `construct_array_type()` returns `ArkoudaExtensionArray`
2. Factory-style dispatch in `_from_sequence`
`ArkoudaExtensionArray._from_sequence` has been refactored into a true factory:
- Treats `"ak"` / `ArkoudaDtype` as a request for backend inference
- `pdarray` → `ArkoudaArray`
- `Strings` → `ArkoudaStringArray`
- `Categorical` → `ArkoudaCategoricalArray`
This makes `_from_sequence` the single construction choke point used by pandas when `dtype="ak"` is specified.
3. Updated documentation
The `_from_sequence` docstring was updated to accurately reflect:
- `dtype="ak"` vs concrete Arkouda dtypes
- `pd.array(..., dtype="ak")`
4. Comprehensive tests
New tests verify that `dtype="ak"` correctly dispatches for:
- `int64`, `float64`
- `Categorical`
- `pd.array` and `pd.Series` construction paths
Tests also document the required construction pattern for categoricals (`pd.array(..., dtype="ak")` prior to `Series`) to avoid pandas eager iteration.
Non-goals / Follow-ups
- `astype("ak")` behavior is intentionally out of scope for this PR
These can be addressed in follow-up PRs if desired.
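The factory-style dispatch described in item 2 can be illustrated with a small, self-contained sketch. This is not the PR's code and does not use Arkouda or pandas: every `Toy*` class and the `_DISPATCH` table are hypothetical stand-ins, chosen only to mirror the input type → array class mapping listed above.

```python
# Hypothetical stand-ins for Arkouda's backing types (pdarray, Strings, Categorical).
class ToyPdarray:
    def __init__(self, values):
        self.values = list(values)


class ToyStrings:
    def __init__(self, values):
        self.values = list(values)


class ToyCategorical:
    def __init__(self, values):
        self.values = list(values)


class ToyExtensionArray:
    """Stand-in for the base extension array; acts as the factory."""

    def __init__(self, data):
        self._data = data

    @classmethod
    def _from_sequence(cls, data, dtype="ak"):
        # Only the generic dtype asks for inference in this sketch.
        if dtype != "ak":
            raise NotImplementedError(f"sketch handles only dtype='ak', got {dtype!r}")
        # Pick the concrete array class from the type of the backing object.
        for backing_type, array_cls in _DISPATCH:
            if isinstance(data, backing_type):
                return array_cls(data)
        raise TypeError(f"no array class registered for {type(data).__name__}")


class ToyArray(ToyExtensionArray):
    pass


class ToyStringArray(ToyExtensionArray):
    pass


class ToyCategoricalArray(ToyExtensionArray):
    pass


# Mapping mirrors the list in item 2: backing type -> concrete array class.
_DISPATCH = [
    (ToyPdarray, ToyArray),
    (ToyStrings, ToyStringArray),
    (ToyCategorical, ToyCategoricalArray),
]

arr = ToyExtensionArray._from_sequence(ToyStrings(["a", "b"]))
print(type(arr).__name__)
```

Keeping the mapping in one table is a design choice of this sketch; it means a new backing type needs one new entry rather than another branch in `_from_sequence`.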
Impact
Example