Pipeline capable of handling messy SIM files and converting them to a more user-friendly format

GOPAD-Datasus/ETL-SIM

ETL Project: SIM

A simple ETL (Extract, Transform, Load) pipeline to facilitate the use of DataSUS's SIM database, with a focus on the files for the years 2012 to 2023.

📌 Overview

  • Extract: Data is sourced from OpenDataSUS and stored in the data/raw folder in .csv format.
  • Transform: Each year's file goes through year-specific changes and has its dtypes optimized. The results are stored in data/processed.
  • Load: The transformed data is loaded into Postgres. Alternatively, the generated files can be accessed at DB-SIM.
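
The transform stage can be pictured roughly as follows. This is only a minimal sketch and not the project's actual code: the file naming (`<year>.csv`), the `YEAR_STEPS` registry, the `rename_column` and `strip_values` helpers, and the `OLD_NAME`/`NEW_NAME` columns are all hypothetical. It uses only the standard library and leaves out both dtype optimization and the Postgres load.

```python
import csv
from pathlib import Path

YEARS = range(2012, 2024)  # 2012 through 2023, the years this README covers


def rename_column(old, new):
    """Build a row-level step that renames one column (hypothetical helper)."""
    def step(row):
        if old in row:
            row[new] = row.pop(old)  # the renamed column moves to the end
        return row
    return step


# Hypothetical registry of year-specific steps; the column names are made up.
YEAR_STEPS = {
    2012: [rename_column("OLD_NAME", "NEW_NAME")],
}


def strip_values(row):
    """Common step: trim stray whitespace around every string value."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }


def transform_file(year, raw_dir, processed_dir):
    """Apply common and year-specific steps to one yearly CSV file."""
    source = Path(raw_dir) / f"{year}.csv"        # assumed file naming
    target = Path(processed_dir) / f"{year}.csv"  # assumed file naming
    target.parent.mkdir(parents=True, exist_ok=True)

    with source.open(newline="", encoding="utf-8") as src:
        rows = [strip_values(row) for row in csv.DictReader(src)]
    for step in YEAR_STEPS.get(year, []):
        rows = [step(row) for row in rows]

    # The header is derived from the first transformed row.
    fieldnames = list(rows[0].keys()) if rows else []
    with target.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return target
```

Under these assumptions, calling `transform_file(year, "data/raw", "data/processed")` for each `year` in `YEARS` would process every file. Keeping year-specific steps in a registry keeps the per-year quirks in one place instead of scattering conditionals through the code.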

🚀 Setup

  1. Clone the repo:
    git clone https://github.com/GOPAD-Datasus/ETL-SIM
  2. Install dependencies:
    poetry install
    • Note: Poetry must be installed for the command above to work.

⚙ Run

  • If your IDE supports in-project virtual environments (like PyCharm), use:
python main.py
  • Otherwise, prefix the command with poetry run:
poetry run python main.py

✨ Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you'd like to change.

📝 License

LGNU | © GOPAD 2025
