cor-challenge

This repository contains a pipeline to load weather station data from text files to a SQLite database. The data is summarized annually per station. A FastAPI application serves two routes:

/weather: Station data queriable by station and/or date
/weather/summary: Annual station summary data queriable by station and/or year

The repository is structured such that:

Application code: app
Ingestion code: scripts
Source data: data
Test suite: tests
Database (will be created): db

Follow the steps below to run the code.

Development is managed with pre-commit and ruff.

Getting Started

This repo is managed through UV and can be installed through:

uv sync
source .venv/bin/activate

uv is a lightweight, very fast package manager based on rust. Installing this repo with other managers may have unexpected outcomes, particularly when running scripts that read from the app package.

This repo has only been run in a linux environment.

Setting up the database

To create database:

python scripts/create_db.py

This creates an empty SQLite database based on the sqlalchemy models in app/models.py

To load database with data:

python scripts/load.py --dir ./data --sqlite-db sqlite:///./db/weather.db

This parses all weather station text files in the ./data directory and upserts to station_data table using pandas and sqlalchemy. If it is run multiple times, the upsert will check for constraint (station_id, date). If the data for a given (station_id, date) has changed, it will be updated. Duplicates will not be created.

To create annual station summaries:

python scripts/summarize.py

This will summarize the station_data into station_summary using sqlalchemy. If it is run multiple times, the upsert will check for constraint (station_id, year). If the data for a given (station_id, year) has changed, it will be updated. Duplicates will not be created.

The database is now setup. You can view this in your preferred database viewer (e.g. DBeaver)

Running the API

To run the API:

python -m app.main

This will spin up the FastAPI client.

View the Swagger docs at: http://localhost:8000/docs. Here, you can see, read documentation, and test all endpoints. Each endpoint has multiple example options under "Try it" or you can modify these yourself.

Redoc is also available at: http://localhost:8000/redoc

Spinning up API:

Swagger docs:

Swagger docs - route:

Running tests

This repo includes a pytest suite. It creates a mock database called test.db to use with the mock client. The DB is flushed after each run. Temporary data files are written to tests/data for each test and removed on compeletion. All tests should pass.

pytest tests

Notes

This was a developed as a demo case. Here are some modifications I would make if continuing development or deploying to prod.

Changes / if in production:

Use PostgreSQL. SQLite was used for development ease
Use Postgres materialized view for summary statistics
Use migration scripts (e.g. liquibase, alembic) for database creation (generated from SQLAlchemy for convenience in demo)
Use primary key based on station id - date combination (removes need for using constraint on duplicate columns)
Create index on station id and date in both tables
year in summary should be date type
Allow filtering in API by multiple stations or dates

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
app		app
data		data
docs/img		docs/img
scripts		scripts
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cor-challenge

Getting Started

Setting up the database

Running the API

Running tests

Notes

About

Uh oh!

Releases

Packages

Languages

quercoak/cor-challenge

Folders and files

Latest commit

History

Repository files navigation

cor-challenge

Getting Started

Setting up the database

Running the API

Running tests

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages