evolve logo

A highly efficient, composable, and lightweight ETL and data integration framework.




evolve is currently in early development, and its core API and functionality are still subject to breaking changes. Expect a more stable version to be released in the coming weeks.

evolve is an open-source, platform-agnostic Python framework that enables your data teams to efficiently integrate data from a wide variety of structured or unstructured data sources into your database, data warehouse, or data lake(house) blazingly fast, with minimal memory overhead thanks to the Apache Arrow ecosystem.

It is built for developers with a code-first mindset. You will not find any low-code, clickops, or drag-and-drop shenanigans here. evolve offers you full control of how your data is read, parsed, handled in-memory, transformed, and finally written to any destination you need.

  • Composable - Design data pipelines that fit your own stack, and add any extra (possibly proprietary) sources or targets you need, all through evolve's intuitive and lightweight framework philosophy.
  • Blazing fast - Zero-copy principles built on Apache Arrow give you extremely fast in-memory operations, perfect for OLAP, and easy interoperability with DuckDB, Polars, Spark, DataFusion, and many more query engines (see the interoperability sketch after this list).
  • Customizable - You choose the backend you want to use. Do you prefer DataFrames? Use Polars! Or would you rather work on data using SQL? Then use the DuckDB backend! It is completely up to you.
  • Platform agnostic - Run your ETL/ELT using evolve on your own infrastructure. No vendor lock-in, ever.
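
Because evolve keeps data in Arrow's columnar format, handing a table from one engine to another is cheap. The sketch below uses only the public pyarrow, Polars, and DuckDB APIs (it does not touch evolve itself) to show the kind of zero-copy hand-off the framework builds on:

import duckdb
import polars as pl
import pyarrow as pa

# An Arrow table: the common in-memory currency between engines.
orders = pa.table({"order_id": [1, 2, 3], "amount": [50.0, 120.0, 300.0]})

# A Polars DataFrame over the same Arrow buffers (zero-copy where possible).
df = pl.from_arrow(orders)

# DuckDB can query the Arrow table in place via its replacement scans,
# which resolve the Python variable `orders` directly.
result = duckdb.sql("SELECT * FROM orders WHERE amount > 100").arrow()
print(result)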

Architecture (alpha version)

flowchart TD
    %% Sources (Connectors)
    subgraph Sources
        CSV[Local CSV Source]
        JSON[HDFS JSON Source]
        Parquet[S3 Parquet Source]
        SQL[SQL Source]
        Custom[Custom Source]
    end

    %% Intermediate Representation
    subgraph Backend
        Arrow[Apache Arrow / Polars / DuckDB / Custom]
    end

    %% Targets (Connectors)
    subgraph Targets
        S3[S3 object store]
        Local[Local file system]
        HDFS[Hadoop file system]
        DW[Data Warehouse]
        ML[ML Pipeline]
        Viz[Visualization]
        CustomOut[Custom Format]
    end

    %% Mapping logic
    CSV -->|Map to Arrow| Arrow
    JSON -->|Map to Arrow| Arrow
    SQL -->|Map to Arrow| Arrow
    Custom -->|Conditional Mapping| Arrow
    Parquet -->|Direct Mapping| S3

    Arrow --> S3
    Arrow --> Local
    Arrow --> HDFS
    Arrow --> DW
    Arrow --> ML
    Arrow --> Viz
    Arrow --> CustomOut
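
The Custom Source node in the diagram is where proprietary connectors plug in. evolve's connector base classes are not documented in this README, so the following is only a hypothetical sketch of what such a connector could look like: it reads records from somewhere and maps them into an Arrow table, the common intermediate representation.

import pyarrow as pa

class InMemorySource:
    """Hypothetical custom source; evolve's real connector interface may differ."""

    def __init__(self, records: list[dict]):
        self.records = records

    def read(self) -> pa.Table:
        # Map raw Python records into Arrow, the common in-memory
        # representation shown in the diagram above.
        return pa.Table.from_pylist(self.records)

# Usage: the resulting Arrow table can be handed to any backend or target.
source = InMemorySource([{"id": 1, "total": 150.0}, {"id": 2, "total": 80.0}])
print(source.read())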

Example usage

import evolve as ev

# Pipelines are lazy - only run when told to
pipeline = ev.Pipeline("parquet-ingestion") \
    .with_source(ev.io.FixedWidthFile(...)) \
    .with_target(ev.io.ParquetFile(...)) \
    .with_transform(DropNulls(columns=(..., )))

pipeline.run()  # runs the ETL

You can also configure pipelines with YAML or JSON:

source:
  type: postgres
  host: localhost
  db: prod
  user: admin
  password: secret
  schema: sales
  tables: orders

transforms:
  - type: drop_nulls
    columns: ["order_id", "amount"]
  - type: rename_columns
    mapping:
      order_id: id
      amount: total
  - type: filter_rows
    condition: "total > 100"

target:
  type: parquet
  path: s3://prod/sales/orders.parquet
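
The README does not show how a config file is wired into a pipeline, so the loader below is hypothetical: it only illustrates how the YAML above could map onto the fluent API (ev.Pipeline.from_config is an assumed name, not a documented evolve function).

import yaml  # requires PyYAML
import evolve as ev

with open("pipeline.yml") as f:
    cfg = yaml.safe_load(f)  # parses the YAML above into nested dicts/lists

# Hypothetical: assumes a from_config constructor that builds the same
# Pipeline the fluent builder in "Example usage" would produce.
pipeline = ev.Pipeline.from_config(cfg)
pipeline.run()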

License

evolve is distributed under the terms of both the MIT License and the Apache License (version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details.
