This repo contains the work done for the Data Engineering Zoomcamp 2026 from DataTalks.Club. Currently, only the first module is implemented. In this module, Python, Docker, and PostgreSQL are used to build a data migration pipeline that loads a CSV file with 2021 NY Taxi data into a PostgreSQL database. The CSV files are extracted and written to the database in chunks with the pandas library, through a containerized CLI application. The PostgreSQL database runs in a multi-container application together with PgAdmin, while the Python script is executed in a separate container. The multi-container application must be up before running the script container.
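The chunked CSV-to-PostgreSQL ingestion described above can be sketched as follows. This is a minimal illustration, not the repo's actual script: the function name `ingest_csv` and its parameters are hypothetical, and the real pipeline may differ.

```python
# Hypothetical sketch of chunked CSV ingestion with pandas + SQLAlchemy.
import pandas as pd
from sqlalchemy import create_engine


def ingest_csv(csv_path, table_name, engine, chunksize=100_000):
    """Read the CSV in chunks and append each chunk to the target table."""
    reader = pd.read_csv(csv_path, iterator=True, chunksize=chunksize)
    for chunk in reader:
        # Appending chunk by chunk keeps memory usage bounded,
        # regardless of the total CSV size.
        chunk.to_sql(table_name, con=engine, if_exists="append", index=False)
```

Reading with `chunksize` keeps memory bounded, which is why the source data is ingested in chunks rather than in one `read_csv` call.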
This work relies on Docker for containerization. You must have Docker installed and running in order to build and run the containers.
- Clone this repo

```bash
git clone https://github.com/PLCodingStuff/data-engineering-zoomcamp.git
cd data-engineering-zoomcamp
```

- Get into the `pipeline` folder

```bash
cd pipeline
```

- Start the multi-container application

```bash
docker compose up
```

- Build the containerized script

```bash
docker build -t taxi:v001 .
```

- Check the network

```bash
docker network ls
```

- Run the container

```bash
# The network name is based on the directory name,
# or can be found with the previous command
docker run -it --rm \
  --network=pipeline_default \
  taxi:v001 \
    --pg-user=root \
    --pg-password=root \
    --pg-host=pgdatabase \
    --pg-port=5432 \
    --pg-db=ny_taxi \
    --target-table=yellow_taxi_trips
```

- You can connect to PgAdmin through `localhost:8085`, with email `admin@admin.com` and password `root`.
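The flags passed to `docker run` are consumed by the Python script inside the container. A minimal sketch of how such a CLI might parse them, assuming `argparse` (the repo's actual implementation may differ):

```python
import argparse


def parse_args(argv=None):
    # Parse the connection flags shown in the `docker run` command above.
    parser = argparse.ArgumentParser(
        description="Ingest NY Taxi CSV data into PostgreSQL"
    )
    parser.add_argument("--pg-user", required=True)
    parser.add_argument("--pg-password", required=True)
    parser.add_argument("--pg-host", required=True)
    parser.add_argument("--pg-port", type=int, default=5432)
    parser.add_argument("--pg-db", required=True)
    parser.add_argument("--target-table", required=True)
    return parser.parse_args(argv)


# The parsed values would typically be assembled into a connection URL, e.g.:
#   postgresql://{pg_user}:{pg_password}@{pg_host}:{pg_port}/{pg_db}
```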
You can modify the PostgreSQL and PgAdmin environment variables in `docker-compose.yaml`. The default PostgreSQL user and password are both `root`, set under the fields `POSTGRES_USER` and `POSTGRES_PASSWORD`. The default PgAdmin email is `admin@admin.com` and the password is `root`, set under the fields `PGADMIN_DEFAULT_EMAIL` and `PGADMIN_DEFAULT_PASSWORD`. If you are familiar with Docker configuration files, feel free to make changes; keep in mind that every change in `docker-compose.yaml` also requires a corresponding change to the container run command.
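For orientation, the relevant services in `docker-compose.yaml` might look like the sketch below. The image tags and volume setup are assumptions; only the environment fields, the PgAdmin port, and the default credentials come from this README, so consult the actual file before editing.

```yaml
services:
  pgdatabase:
    image: postgres:13               # image tag is an assumption
    environment:
      POSTGRES_USER: root            # default user (change here)
      POSTGRES_PASSWORD: root        # default password (change here)
      POSTGRES_DB: ny_taxi

  pgadmin:
    image: dpage/pgadmin4            # image name is an assumption
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com
      PGADMIN_DEFAULT_PASSWORD: root
    ports:
      - "8085:80"                    # PgAdmin reachable at localhost:8085
```

If you change `POSTGRES_USER`, `POSTGRES_PASSWORD`, or the database name here, pass the matching `--pg-user`, `--pg-password`, and `--pg-db` values to the script container.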