SorryDB dataset

This repository hosts the main dataset of the SorryDB project.

sorry_database.json: the principal database file containing a cumulative list of scraped sorries (and some data used by the crawler).
deduplicated_sorries.json: a list containing the most recent instance of a sorry with a given pretty printed goal string for every such goal occurring in the database.

Static lists

We intend to also provide some static lists of sorries that can be used for development, testing, and benchmarking. For now this includes:

static_100_varied_recent_deduplicated_sorries.json: a static list of 100 varied and recent deduplicated sorries for development and testing purposes.

Repositories

This repo is updated with new sorries from repos in the active_repos.json repo list. This was generated from Lean Reservoir using the scrape_resevoir.py script.

Information for repository owners and contributors

We intend to track all sorries in Lean 4 repositories listed by Reservoir. If you want your repository to be indexed, make sure it satisfies the inclusion criteria. In particular, it must have an open source license.

Even if your repository is listed on Reservoir, you can opt out from having it indexed. We can also exclude sorries based on authorship (determined via git blame). In both cases, please contact Lenny Taelman.

Guidelines for automated theorem proving researchers

See ABOUT.md for more information on the motivation and goals of this project.

Some important guidelines:

this is work in progress, and still in an experimental phase; in particular the format of the dataset may change without notice
if you are using or intend to use this dataset, let us know!
if you prove some of these sorries, do not submit automated pull requests without prior consent from the repository owners.

Name		Name	Last commit message	Last commit date
Latest commit History 161 Commits
.gitignore		.gitignore
README.md		README.md
active_repos.json		active_repos.json
deduplicated_sorries.json		deduplicated_sorries.json
sorry_database.json		sorry_database.json
static_100_varied_recent_deduplicated_sorries.json		static_100_varied_recent_deduplicated_sorries.json
update_database_stats.json		update_database_stats.json
update_report.md		update_report.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SorryDB dataset

Contents

Nightly updates

Static lists

Repositories

Information for repository owners and contributors

Guidelines for automated theorem proving researchers

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

SorryDB/sorrydb-data

Folders and files

Latest commit

History

Repository files navigation

SorryDB dataset

Contents

Nightly updates

Static lists

Repositories

Information for repository owners and contributors

Guidelines for automated theorem proving researchers

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Packages