AI Research Engineer — Python • Systems • EvalOps/RLHF Tooling
I build production-grade AI evaluation and data-quality tooling—the systems that make models and AI products reliable:
- Dataset validation + anomaly detection
- Evaluation harnesses + repeatable metrics
- APIs/CLIs + storage + CI gates
- Performance-minded pipelines (batching, streaming, profiling), sketched below
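
To make that concrete, here is a minimal, self-contained sketch of a batched streaming validator over a JSONL preference file. The field name `response`, the `preferences.jsonl` path, and the z-score outlier rule are illustrative assumptions, not code from any project listed below.

```python
import json
import statistics
from pathlib import Path
from typing import Iterator


def stream_records(path: Path, batch_size: int = 1000) -> Iterator[list[dict]]:
    """Stream a JSONL dataset in fixed-size batches to keep memory flat."""
    batch: list[dict] = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


def length_outliers(records: list[dict], z_threshold: float = 3.0) -> list[int]:
    """Flag responses whose length is a crude per-batch z-score outlier (hypothetical check)."""
    lengths = [len(r.get("response", "")) for r in records]
    mean, stdev = statistics.mean(lengths), statistics.pstdev(lengths) or 1.0
    return [i for i, n in enumerate(lengths) if abs(n - mean) / stdev > z_threshold]


if __name__ == "__main__":
    flagged = 0
    for batch in stream_records(Path("preferences.jsonl")):
        flagged += len(length_outliers(batch))
    # Non-zero exit code lets a CI job gate on data quality.
    raise SystemExit(1 if flagged else 0)
```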
| Project | Description | Link |
|---|---|---|
| NLWeb (Microsoft) | Implemented CI/CD pipeline: linting, test matrix, secrets scanning, Dependabot | PR #397 |
rlhf-eval: a signal-based toolkit for auditing preference datasets (duplicates, formatting issues, refusal bias, readability), built to run inside real data pipelines.
- Extensible signals framework (example signal sketched below)
- SQLite → PostgreSQL storage path
- CLI + API surface
- CI-friendly: blocks bad data before training
→ github.com/LewallenAE/rlhf-eval
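
To show what an extensible signal can look like, here is a minimal sketch of a signal plus a runner. It is illustrative only, not the actual rlhf-eval API; `Finding`, `Signal`, and `exact_duplicates` are hypothetical names, and real signals would also cover formatting, refusal bias, and readability.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


# Hypothetical shapes; the real rlhf-eval interfaces may differ.
@dataclass
class Finding:
    signal: str
    index: int
    detail: str


# A signal is any callable that inspects a batch of records and reports findings.
Signal = Callable[[list[dict]], list[Finding]]


def exact_duplicates(records: list[dict]) -> list[Finding]:
    """Flag records whose (prompt, chosen) pair appears more than once."""
    counts = Counter((r.get("prompt"), r.get("chosen")) for r in records)
    return [
        Finding("exact_duplicates", i, "duplicate prompt/chosen pair")
        for i, r in enumerate(records)
        if counts[(r.get("prompt"), r.get("chosen"))] > 1
    ]


def run_signals(records: list[dict], signals: list[Signal]) -> list[Finding]:
    """Run every registered signal and collect findings for reporting or a CI gate."""
    findings: list[Finding] = []
    for signal in signals:
        findings.extend(signal(records))
    return findings
```

A CLI or CI job can then fail the build whenever `run_signals` returns findings, which is what "blocks bad data before training" means in practice.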
| Category | Tools |
|---|---|
| Languages | Python, TypeScript/JavaScript, Java, SQL |
| ML/LLM | PyTorch, Hugging Face |
| Backend | FastAPI, REST, auth patterns |
| Data/Infra | SQLite, PostgreSQL, Docker, GitHub Actions |
- Vertical slices over big rewrites
- Interfaces + invariants first
- Tests that prove behavior (example below)
- Logging/metrics as first-class citizens
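
As a concrete reading of "tests that prove behavior": assuming the hypothetical `exact_duplicates` signal sketched above is saved as a module named `rlhf_signals`, the test asserts the observable contract (both copies get flagged) rather than implementation details.

```python
from rlhf_signals import exact_duplicates  # hypothetical module from the sketch above


def test_exact_duplicates_flags_both_copies():
    records = [
        {"prompt": "p1", "chosen": "a"},
        {"prompt": "p1", "chosen": "a"},  # duplicate of the first record
        {"prompt": "p2", "chosen": "b"},
    ]
    findings = exact_duplicates(records)
    assert sorted(f.index for f in findings) == [0, 1]
    assert all(f.signal == "exact_duplicates" for f in findings)
```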
Applied Evals • EvalOps • Data Quality Engineering • Backend SWE • ML Systems

