Exploring Snowflake Support via PyArrow & Pandas #104

@kjam

Is your feature request related to a problem? Please describe.
We would like to eventually support workflows that use Snowflake. One idea that has come up is to use the Snowflake Python connector's integration with PyArrow and Pandas, both to learn more about Arrow and to support Snowflake data: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html

Describe the solution you'd like
An initial test of whether this workflow is feasible would be useful, so we can see what benchmarks we can create for pulling data into Pandas and then applying Cape policy to it. It might also be worthwhile to dive into the library internals to see how the Query -> Arrow -> DataFrame workflow works!
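
As a starting point for that feasibility test, here is a minimal sketch of a timing harness. It assumes a cursor from the Snowflake Python connector, whose `execute()` and `fetch_pandas_all()` methods run a query and return the result as a Pandas DataFrame. The names `benchmark_fetch_and_policy` and `apply_policy` are hypothetical: `apply_policy` is a placeholder for whatever Cape policy call we end up using, not an existing API.

```python
import time


def benchmark_fetch_and_policy(cursor, query, apply_policy):
    """Time the fetch step and the policy step separately.

    cursor:       a Snowflake connector cursor (assumed), or any object
                  exposing execute() and fetch_pandas_all().
    apply_policy: hypothetical callable standing in for applying Cape policy.
    """
    start = time.perf_counter()
    cursor.execute(query)
    # Returns the full result set as a DataFrame on a real connector cursor.
    df = cursor.fetch_pandas_all()
    fetched = time.perf_counter()

    result = apply_policy(df)
    done = time.perf_counter()

    timings = {
        "fetch_seconds": fetched - start,
        "policy_seconds": done - fetched,
    }
    return result, timings
```

With a real connection, the cursor would come from `snowflake.connector.connect(...).cursor()`, with credentials supplied by the caller. Keeping the two timings separate should make it easier to tell whether the fetch or the policy step dominates.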

Describe alternatives you've considered
We have explored the idea of an ODBC or JDBC layer as another way of solving this issue.

Additional context
We could pick an interesting example use case and check it out! It would also be exciting to hear about the architecture choices they made here and to explore how we might apply policy in or to Arrow.
