This repository contains an Apache Airflow setup (Docker Compose) and example DAGs for Redshift COPY loads and a Redshift → MySQL export.
- `docker-compose.yaml` — Airflow + Postgres + Redis for local development
- `dags/` — DAGs, helper libs, and SQL templates
- `dags/sql/` — SQL grouped by purpose (`connect/`, `init/`, `copy/`, `query/`)
- `dags/lib/` — Python helpers used by DAGs
- `plugins/` — Airflow plugins (empty by default)
- `config/` — Optional Airflow config overrides (not required)
- `logs/` — Airflow logs (ignored by Git)
- `infra/aws/` — One-time AWS setup scripts/policies for Redshift + S3 access
- `.env.example` — Example environment variables for Docker Compose
- `.gitignore` — Ignores logs, caches, secrets, local IDE files, etc.
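To show how these pieces fit together, here is a minimal sketch of the DAG pattern, assuming hypothetical DAG, task, and template names (the real DAGs and SQL files under `dags/` and `dags/sql/` differ in detail):

```python
# Minimal sketch only: the dag_id, task_id, and SQL template path below are
# hypothetical placeholders, not names used by this repository.
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG(
    dag_id="example_redshift_copy",
    start_date=datetime(2024, 1, 1),
    schedule=None,          # trigger manually while developing locally
    catchup=False,
    template_searchpath=["/opt/airflow/dags/sql"],  # resolve .sql templates by relative path
) as dag:
    SQLExecuteQueryOperator(
        task_id="copy_from_s3",
        conn_id="redshift",              # Airflow connection created in the steps below
        sql="copy/example_copy.sql",     # hypothetical template under dags/sql/copy/
    )
```

The `.sql` file is rendered as a Jinja template at run time, which is why the COPY statements can live under `dags/sql/` rather than inline in the DAG code.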
- Copy `.env.example` to `.env` and adjust values if needed.
- Start the stack:

  ```bash
  docker compose up -d
  ```
- Open the Airflow UI: http://localhost:8080 (default: `airflow`/`airflow`).
- Set connections in Airflow (UI -> Admin -> Connections):
  - `redshift` (Postgres‑compatible Redshift connection)
  - `mysql_local` (MySQL connection used by the export task, created in the UI)
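These two connection IDs are what the export step looks up. A minimal sketch of the Redshift → MySQL transfer, assuming placeholder query, table, and column names (the repository's actual export task may be structured differently):

```python
# Sketch only: the query, table, and column names are placeholders.
from airflow.providers.mysql.hooks.mysql import MySqlHook
from airflow.providers.postgres.hooks.postgres import PostgresHook


def export_redshift_to_mysql() -> None:
    """Read rows from Redshift and bulk-insert them into local MySQL."""
    redshift = PostgresHook(postgres_conn_id="redshift")
    mysql = MySqlHook(mysql_conn_id="mysql_local")

    rows = redshift.get_records("SELECT id, name FROM example_schema.example_table")
    mysql.insert_rows(
        table="example_table",
        rows=rows,
        target_fields=["id", "name"],
    )
```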
Alternatively, you can configure connections via environment variables or a connections JSON, but the UI is simplest for local dev.
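For the environment-variable route, Airflow reads connections from `AIRFLOW_CONN_<CONN_ID>` variables in URI form; one way to generate such a URI (the host and credentials below are placeholders) is:

```python
# Placeholder values: substitute your own Redshift endpoint and credentials.
from airflow.models.connection import Connection

conn = Connection(
    conn_type="postgres",
    host="my-workgroup.012345678901.us-east-1.redshift-serverless.amazonaws.com",
    login="awsuser",
    password="change-me",
    schema="dev",
    port=5439,
)
print(conn.get_uri())  # use the output as the value of AIRFLOW_CONN_REDSHIFT
```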
For Redshift Serverless to read from S3 (COPY from Parquet, etc.), see `infra/aws/`:

- `infra/aws/setup_redshift_iam_role.sh` — creates/updates an IAM role and sets it as the Default IAM Role for a Redshift Serverless namespace
- `infra/aws/policies/` — example trust policy and bucket policy template
- `infra/aws/examples/select/` — sample S3 Select serialization configs
These are not required at runtime by the containers; they’re just operational docs/scripts to bootstrap AWS permissions.
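For context, once the default IAM role is attached to the namespace, COPY statements can authorize against S3 with `IAM_ROLE default`. The table, bucket, and prefix below are placeholders, not this repository's actual SQL:

```python
# Placeholder COPY statement of the kind a dags/sql/copy/ template might render;
# "IAM_ROLE default" tells Redshift Serverless to use the namespace's default role.
EXAMPLE_COPY_SQL = """
COPY example_schema.example_table
FROM 's3://example-bucket/example-prefix/'
IAM_ROLE default
FORMAT AS PARQUET;
"""
```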
- Do not commit secrets. Use `.env` (ignored) or Airflow's secrets backends.
- `logs/` and `__pycache__/` are ignored. If any cache files were present, they've been removed.
- If you extend Python dependencies, prefer building a custom image instead of using `_PIP_ADDITIONAL_REQUIREMENTS` for anything beyond quick checks.