A simple ETL (Extract, Transform, Load) pipeline that facilitates the use of DataSUS's SIM database, focusing on the files for the years 2012 to 2023.
- Extract: Data is sourced from OpenDataSUS and stored in the data/raw folder in .csv format.
- Transform: Each year's file goes through year-specific changes, and dtypes are optimized. The results are stored in data/processed.
- Load: The transformed data is loaded into Postgres. Alternatively, the generated files can be accessed at DB-SIM.
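
The three stages above can be sketched with the standard library alone. This is a minimal illustration, not the code in this repository: the column names (`age`, `cause`), the table name `deaths`, and every function name are hypothetical, and `sqlite3` is swapped in for Postgres so the sketch runs without a database server.

```python
import csv
import sqlite3
from pathlib import Path


def extract(raw_path: Path) -> list[dict]:
    """Read one raw CSV file into a list of row dictionaries."""
    with raw_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def transform(rows: list[dict], year: int) -> list[dict]:
    """Tag rows with their year and cast text columns to tighter types.

    The columns used here are hypothetical placeholders; the real
    year-specific changes live in the project itself.
    """
    return [
        {
            "year": year,
            "age": int(row["age"]),                 # text -> integer
            "cause": row["cause"].strip().upper(),  # normalize the code
        }
        for row in rows
    ]


def load(rows: list[dict], connection: sqlite3.Connection) -> int:
    """Insert the rows and return the total row count of the table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS deaths "
        "(year INTEGER, age INTEGER, cause TEXT)"
    )
    connection.executemany(
        "INSERT INTO deaths (year, age, cause) "
        "VALUES (:year, :age, :cause)",
        rows,
    )
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM deaths").fetchone()[0]


def run_pipeline(raw_path: Path, year: int,
                 connection: sqlite3.Connection) -> int:
    """Chain extract -> transform -> load for a single yearly file."""
    return load(transform(extract(raw_path), year), connection)
```

Calling `run_pipeline(Path("data/raw/<file>.csv"), 2012, sqlite3.connect(":memory:"))` on a file with `age` and `cause` columns would return the number of rows stored. Keeping each stage a separate function means a year-specific change only touches `transform`.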
- Clone the repo:
git clone https://github.com/GOPAD-Datasus/ETL-SIM
- Install dependencies:
poetry install
- Note: Poetry must be installed for the command above to work.
- If your IDE supports in-project virtual environments (like PyCharm), use:
python main.py
- Otherwise, add poetry run to the beginning of the command:
poetry run python main.py

Pull requests are welcome. For major changes, please open an issue first to discuss what you'd like to change.
LGNU | © GOPAD 2025