A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence. More at www.forecastbench.org.
Leaderboards and datasets are updated nightly and available at github.com/forecastingresearch/forecastbench-datasets.
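For example, one way to pull the nightly-updated leaderboards and datasets locally is to clone that repository (a minimal sketch; the repository's internal file layout isn't described here):

```sh
# Grab the nightly-updated leaderboards and datasets (repo URL from above)
git clone https://github.com/forecastingresearch/forecastbench-datasets.git
cd forecastbench-datasets
# See which files the most recent nightly update touched
git log -1 --stat
```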
Instructions for submitting your model to the benchmark can be found here: How-to-submit-to-ForecastBench.
Dig into the details of ForecastBench on the wiki.
@inproceedings{karger2025forecastbench,
title={ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities},
author={Ezra Karger and Houtan Bastani and Chen Yueh-Han and Zachary Jacobs and Danny Halawi and Fred Zhang and Philip E. Tetlock},
year={2025},
booktitle={International Conference on Learning Representations (ICLR)},
url={https://iclr.cc/virtual/2025/poster/28507}
}

To set up a local development environment:

- Clone the repo: `git clone --recurse-submodules <repo-url>.git`
- `cd forecastbench`
- `cp variables.example.mk variables.mk` and set the values accordingly
- Set up your Python virtual environment: `make setup-python-env`, then `source .venv/bin/activate`

To run a cloud function locally:

- `cd directory/containing/cloud/function`
- `eval $(cat path/to/variables.mk | xargs) python main.py`
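The last command is a one-liner for injecting the settings from `variables.mk` into the local run. As a sketch of how it expands, assuming the file holds simple KEY=value lines without spaces (the variable names below are hypothetical, not the actual contents of `variables.mk`):

```sh
# Hypothetical variables.mk contents (illustrative only):
#   CLOUD_PROJECT=my-gcp-project
#   BUCKET_NAME=my-forecastbench-bucket
# `cat ... | xargs` joins those lines onto one line, so the command expands to
#   CLOUD_PROJECT=my-gcp-project BUCKET_NAME=my-forecastbench-bucket python main.py
# i.e. the values become environment variables for that single python invocation.
eval $(cat path/to/variables.mk | xargs) python main.py
```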
Before creating a pull request:
- run `make lint` and fix any errors and warnings
- ensure code has been deployed to Google Cloud Platform and tested (only for our devs; outside contributors can skip this step: we're happy you're contributing and we'll test this on our end)
- fork the repo
- reference the issue number (if one exists) in the commit message
- push to the fork on a branch other than `main`
- create a pull request
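Putting those steps together, a typical contribution flow might look like this (the branch name and issue number below are made-up examples):

```sh
# On your fork, on a branch other than main (names here are hypothetical)
git checkout -b fix-question-resolution
make lint                                  # fix any errors and warnings it reports
git commit -am "Fix question resolution edge case (#42)"
git push origin fix-question-resolution
# then open a pull request from this branch of your fork
```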