heilcheng/DeepMind

Google Summer of Code 2025 / Google DeepMind

About

My journey during Google Summer of Code 2025 with Google DeepMind on the Gemma project. I built tools for evaluating large language models, focusing on systematic benchmarking and domain-specific assessment.

This repository is hosted at haileycheng.com/DeepMind/

Blog

How I Landed a Google DeepMind Project in Google Summer of Code 2025: A Step-by-Step Guide

Read on Medium

Updates

  • May 7: Selected by Google DeepMind for the Gemma project.
  • May 8: Received rejections from two other organizations, which left this project as my GSoC placement.

Proposals are public for anyone curious about the process.

Submission for DeepMind:

Good luck with your GSoC 2026 application.

Projects

OpenEvals

Repository: github.com/heilcheng/openevals

Documentation: haileycheng.com/openevals

OpenEvals is a framework for LLM evaluation that provides standardized benchmarking across academic tasks.

Functionality:

  • Runs standard benchmarks: MMLU, GSM8K, MATH, HumanEval, ARC, TruthfulQA
  • Compares model families: Gemma, Llama, Mistral, Qwen, DeepSeek, and other Hugging Face models
  • Measures efficiency: latency, throughput, memory
  • Runs statistical analyses with confidence intervals
  • Produces publication-ready visualizations
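
To illustrate what a confidence interval on a benchmark score can look like, here is a minimal sketch that computes a Wilson score interval for accuracy. It is not taken from the OpenEvals codebase: the function name `wilson_interval` and its parameters are hypothetical, and OpenEvals may use a different method (for example, bootstrapping).

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as benchmark accuracy.

    Hypothetical helper for illustration only; not part of the OpenEvals API.
    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")

    p = correct / total                      # observed accuracy
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom

    # Clamp to [0, 1] to guard against floating-point drift at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Made-up example: a model answers 80 of 100 benchmark questions correctly.
    low, high = wilson_interval(80, 100)
    print(f"accuracy = 0.80, 95% CI = [{low:.3f}, {high:.3f}]")
```

The Wilson interval is a reasonable choice for this sketch because it stays within [0, 1] and behaves sensibly on small benchmarks or near-perfect scores, where the simpler normal approximation can break down.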

Significance:

LLM evaluation is fragmented. OpenEvals unifies it in one framework, giving consistent benchmarks and reproducible results.

MedExplain Evals

Repository: github.com/heilcheng/medexplain-evals

Documentation: haileycheng.com/medexplain-evals

MedExplain Evals is a domain-specific framework for assessing how well models explain medical information to non-experts.

Functionality:

  • Evaluates medical explanation tasks
  • Measures accuracy, clarity, and safety
  • Provides specialized benchmarks
  • Includes an interactive web interface

Significance:

General benchmarks miss medical nuances, and medical misinformation can cause harm. MedExplain Evals provides targeted evaluation for patient-facing applications.

Resources for GSoC Applicants

Proposal

Original proposal submitted to Google DeepMind:

License

MIT
