This site documents my journey through Google Summer of Code 2025 with Google DeepMind on the Gemma project. I built tools for evaluating large language models, focusing on systematic benchmarking and domain-specific assessment.
This repository is hosted at haileycheng.com/DeepMind/
How I Landed a Google DeepMind Project in Google Summer of Code 2025: A Step-by-Step Guide
- May 7: Selected by Google DeepMind for the Gemma project.
- May 8: Rejections from two other orgs, leading me here.
My proposals are public for anyone curious about the process.
My submission to DeepMind consisted of:
- A proposal (PDF attached)
- A blog post under the demo tag in the Gemma repo: google-deepmind/gemma#244
Good luck with your GSoC 2026 application.
Repository: github.com/heilcheng/openevals
Documentation: haileycheng.com/openevals
OpenEvals is a framework for LLM evaluation that provides standardized benchmarking across academic tasks.
Functionality:
- Runs standard benchmarks: MMLU, GSM8K, MATH, HumanEval, ARC, TruthfulQA
- Compares model families: Gemma, Llama, Mistral, Qwen, DeepSeek, and other models hosted on Hugging Face
- Measures efficiency: latency, throughput, memory (a profiling sketch follows below)
- Runs statistical analyses with confidence intervals (see the sketch right after this list)
- Produces publication-ready visualizations
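As a sketch of the statistical reporting mentioned above, the snippet below computes a benchmark accuracy with a percentile-bootstrap confidence interval. It is illustrative only: the function names and the fake per-item results are mine, not the OpenEvals API.

```python
# Minimal sketch of a benchmark accuracy report with a bootstrap confidence
# interval. Illustrative only; this is not the OpenEvals API.
import random
from typing import List, Tuple


def accuracy(outcomes: List[bool]) -> float:
    """Fraction of benchmark items the model answered correctly."""
    return sum(outcomes) / len(outcomes)


def bootstrap_ci(
    outcomes: List[bool],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for accuracy over per-item pass/fail outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    scores = sorted(
        accuracy([outcomes[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = scores[int((alpha / 2) * n_resamples)]
    hi = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Fake per-item results for one model on one benchmark (e.g. GSM8K).
    demo_rng = random.Random(42)
    results = [demo_rng.random() < 0.62 for _ in range(500)]
    point = accuracy(results)
    low, high = bootstrap_ci(results)
    print(f"accuracy = {point:.3f} (95% CI: {low:.3f} to {high:.3f})")
```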
Significance:
LLM evaluation today is fragmented; OpenEvals unifies it with consistent benchmarks and reproducible results.
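Here is the profiling sketch referenced in the efficiency bullet: a minimal way to record per-request latency, throughput, and peak Python-heap memory around any generation callable. `profile_generation` and `generate_fn` are hypothetical stand-ins, not OpenEvals functions; real model runs would also need GPU or native memory tracking.

```python
# Minimal sketch of the kind of efficiency measurement the framework reports:
# latency per request, throughput, and peak memory. Not the OpenEvals API;
# `generate_fn` stands in for any model's text-generation callable.
import time
import tracemalloc
from statistics import mean
from typing import Callable, List


def profile_generation(generate_fn: Callable[[str], str], prompts: List[str]) -> dict:
    """Time each call, then summarise latency, throughput, and peak memory.

    tracemalloc only tracks Python-heap allocations; GPU or native memory
    would need e.g. torch.cuda.max_memory_allocated() or psutil instead.
    """
    tracemalloc.start()
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        generate_fn(prompt)
        latencies.append(time.perf_counter() - start)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    total = sum(latencies)
    return {
        "mean_latency_s": mean(latencies),
        "throughput_req_per_s": len(prompts) / total,
        "peak_python_mem_mb": peak_bytes / 1e6,
    }


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        # Stand-in "model" so the sketch runs without any LLM dependency.
        return prompt[::-1]

    stats = profile_generation(fake_model, ["What is 2 + 2?"] * 100)
    print(stats)
```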
Repository: github.com/heilcheng/medexplain-evals
Documentation: haileycheng.com/medexplain-evals
MedExplain-Evals is a domain-specific framework for assessing how well models explain medical information to non-experts.
Functionality:
- Evaluates medical explanation tasks
- Measures accuracy, clarity, and safety (a clarity sketch follows this list)
- Provides specialized benchmarks
- Interactive web interface
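The clarity sketch referenced above: one generic signal a patient-facing evaluation can report is a readability score such as Flesch Reading Ease. This is a stand-in heuristic, not the MedExplain-Evals scoring code; accuracy and safety checks need clinician-reviewed references rather than a formula.

```python
# Minimal sketch of one "clarity" signal: Flesch Reading Ease over a model's
# explanation. A generic readability heuristic, not MedExplain-Evals code.
import re


def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Higher is easier to read; roughly 60-70 is plain English, below 30 is dense."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    jargon = ("Hypertension is a chronic elevation of systemic arterial "
              "pressure predisposing to cerebrovascular events.")
    plain = ("High blood pressure means your blood pushes too hard on your "
             "blood vessels. Over time this can cause a stroke.")
    print(f"jargon explanation: {flesch_reading_ease(jargon):.1f}")
    print(f"plain explanation:  {flesch_reading_ease(plain):.1f}")
```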
Significance:
General-purpose benchmarks miss medical nuance, and medical misinformation causes real harm, so patient-facing applications need targeted evaluation.
Related GSoC resources:
- GSoC Guide: Comprehensive platform with tips and resources.
- GSoC 2025 Proposals Archive: Archive of 120+ accepted proposals.
- GSoC Organizations: Search and filter participating orgs.
Original proposal submitted to Google DeepMind:
License: MIT