This repository is a small, public demo extracted from a larger private trading / data-quality system (the full system is under NDA). The goal of this repo is to demonstrate Python engineering quality, data validation, and an LLM labeling/evaluation loop in a way a recruiter can run locally in minutes.
What it does
- Fetches crypto-related articles from NewsAPI for a set of tokens.
- (Optional) Labels each article using an LLM into a small taxonomy:
  - historical: True/False (is the article mostly about past events?)
  - sentiment: one of significant rise | rise | neutral | fall | significant fall
- Writes inspectable artifacts (CSV + JSON) under data/news/<TOKEN>/.
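The taxonomy above can be enforced with a small parser that rejects anything outside the allowed values. The following is a minimal sketch, not the code in get_news.py: the `Label` dataclass, the `parse_label` function, and the assumed reply format (`historical: ..., sentiment: ...`) are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Sentiment values copied from the taxonomy above.
SENTIMENTS = ("significant rise", "rise", "neutral", "fall", "significant fall")


@dataclass(frozen=True)
class Label:
    historical: bool
    sentiment: str


def parse_label(raw: str) -> Optional[Label]:
    """Parse 'historical: True, sentiment: rise'-style text (assumed format).

    Returns None when a field is missing or falls outside the taxonomy.
    """
    fields: Dict[str, str] = {}
    # Accept either commas or newlines between fields.
    for part in raw.replace("\n", ",").split(","):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip().strip("'\"").lower()

    historical = fields.get("historical")
    sentiment = fields.get("sentiment")
    if historical not in ("true", "false") or sentiment not in SENTIMENTS:
        return None  # reject anything outside the taxonomy
    return Label(historical=(historical == "true"), sentiment=sentiment)
```

Returning None instead of guessing keeps malformed model output out of the artifacts; a caller can then decide whether to retry or skip the article.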
What this demo intentionally does not include
- Private strategy logic, proprietary datasets, execution/routing code, infra, monitoring, or any production keys.
- Anything that would reveal client data or confidential decision rules.
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
pip install -r requirements-dev.txt # optional: lint/type-check/tests

Copy the example env file and insert your keys:

cp .env.example .env

Required:
- NEWS_API_KEY (NewsAPI)

Optional (only if you run labeling):
- OPENAI_API_KEY
- OPENAI_MODEL (default: gpt-4o-mini)

Fetch only (no LLM calls):
python get_news.py --tokens BTC,ETH --start 2024-09-27 --end 2024-10-06

Fetch + label (uses OPENAI_API_KEY):

python get_news.py --tokens BTC,ETH --start 2024-09-27 --end 2024-10-06 --label

Outputs (example):
data/
  news/
    BTC/
      2024-09-27_to_2024-10-06_everything_BTC.csv
      2024-09-27_to_2024-10-06_sentiment_BTC.csv
      2024-09-27_to_2024-10-06_rise_dates_BTC.json
      2024-09-27_to_2024-10-06_fall_dates_BTC.json
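
The file names in the example follow a `<start>_to_<end>_<kind>_<TOKEN>.<ext>` pattern. A minimal sketch of a helper that reproduces that pattern is shown below; `artifact_path` and its parameters are hypothetical names chosen for illustration and may not match how get_news.py builds its paths.

```python
from pathlib import Path


def artifact_path(root: Path, token: str, start: str, end: str, kind: str, ext: str) -> Path:
    """Build <root>/news/<TOKEN>/<start>_to_<end>_<kind>_<TOKEN>.<ext>."""
    return root / "news" / token / f"{start}_to_{end}_{kind}_{token}.{ext}"


# Reproduces the first file name from the example listing.
path = artifact_path(Path("data"), "BTC", "2024-09-27", "2024-10-06", "everything", "csv")
print(path.as_posix())
```

Keeping the name a pure function of token, date range, and kind is what makes the artifacts predictable for downstream consumers.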
Caching behavior:
- If *_everything_<TOKEN>.csv exists, the script reuses it and does not hit NewsAPI again.
- If *_sentiment_<TOKEN>.csv exists, the script reuses it and does not call the LLM again.
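
The cache-first rule above amounts to "read the file if it exists, otherwise build and write it". Here is a minimal sketch using only the standard library; `load_or_build` and the `build` callback are hypothetical names, and the real script may use a different CSV reader or a different existence check.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]


def load_or_build(path: Path, build: Callable[[], List[Row]]) -> List[Row]:
    """Return cached rows from `path` if it exists; otherwise call `build` and cache."""
    if path.exists():
        # Cache hit: no network or LLM call is made.
        with path.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))

    rows = build()  # the expensive step (e.g. an API fetch or labeling pass)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        if rows:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

Because the check is existence-only, deleting the CSV is the way to force a fresh fetch or a fresh labeling pass in this sketch.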
- get_news.py: CLI + core logic (NewsAPI fetch, optional LLM labeling, artifact outputs).
- flash_news.ipynb / flash_news.html / Flash News Presentation.pdf: supporting demo materials.
- tests/: pytest unit tests for parsing, caching, and API boundaries.
- .github/workflows/ci.yml: CI pipeline (ruff + mypy + pytest) for Python 3.10–3.12.
LLM Trainer / Reasoning Specialist
- Constrained taxonomy labeling and robust parsing of model output.
- “Prompt → structured label → artifact” loop suitable for evaluation datasets.
AI Data Evaluator / Data Quality Reviewer
- Clear label schema, deterministic outputs, cache-first runs.
- Tests covering messy model output, empty fields, and boundary behavior.
Technical Writer – AI Training
- End-to-end reproducible README, explicit limitations, and artifact description.
Python Coding Specialist / Debugging
- Separation of concerns (NewsApiClient, labeler interface, analyzer orchestration).
- CI: linting, formatting, typing, tests.
Scientific Coding – Maths & Python
- Dataframe-driven outputs, reproducible runs, predictable artifact naming for downstream backtests.
Tool-Calling / Agent Evaluation
- Clean “adapter” interface (LlmClient) that mirrors tool-calling boundaries.
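
To illustrate what an adapter boundary like LlmClient can look like, here is a minimal sketch using `typing.Protocol`. Only the name LlmClient comes from this repo; the `complete` method, `FakeLlmClient`, and `label_article` are assumptions made for the example and may differ from the actual interface in get_news.py.

```python
from typing import List, Protocol


class LlmClient(Protocol):
    """Anything that turns a prompt into raw model text (method name is assumed)."""

    def complete(self, prompt: str) -> str: ...


class FakeLlmClient:
    """Test double: returns a canned reply and records prompts, no network."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: List[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def label_article(client: LlmClient, title: str) -> str:
    """Hypothetical caller: depends only on the adapter, not on a vendor SDK."""
    return client.complete(f"Label this article: {title}").strip()


fake = FakeLlmClient(" rise \n")
print(label_article(fake, "BTC up"))  # prints "rise"
```

Because callers depend only on the protocol, tests can swap in a fake like this one and exercise parsing and caching without an API key.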
- This repo contains no secrets. Add keys only via .env / environment variables.
- .env.example is intentionally non-sensitive.
- The private system includes additional components (data sources, infra, strategy logic) that are not required to understand this demo.
Run quality checks:
ruff check .
ruff format .
mypy .
pytest

MIT License (see LICENSE).