Integrate Cape Python to work with Dask #94

@kjam

Is your feature request related to a problem? Please describe.
Several users have asked to work with Dask directly instead of Spark and Pandas. Because of its popularity in the Python data science community and its ease of use for out-of-core computation and parallelized workflows, it fits well with the data science needs we are trying to address.

Describe the solution you'd like
We should find out how many changes are needed to get the cape_pandas integrations working with Dask DataFrames. Matt Rocklin took a look during the webinar and pointed out only a few lines that would need to be updated for it to just work (for example, where we explicitly call pd.Series when returning an array as a series).

Describe alternatives you've considered
We could postpone the Dask integration and prioritize other integrations; however, if it truly is as simple as changing a few return statements, I would prefer we do it sooner! :)

Additional context
To hear Matt's comments, check out around the 48-minute mark of this YouTube video: https://www.youtube.com/watch?v=cIvv8EGMDY0&feature=youtu.be - I'm sure he is happy to help if we need extra guidance! 🙌
