Change commit0 metric from resolved instances to total passed tests #341
Summary
This PR changes the commit0 evaluation metric from counting resolved instances (all tests pass) to counting total passing tests.
Changes
- total_instances from 16 to 3628 (total number of tests across all instances)
- resolved_instances / completed_instances to total_passed_tests / total_instances
Why This Change
The previous metric counted how many instances (repositories) had all tests passing. The new metric counts the total number of individual tests that passed across all instances, providing a more granular measure of success.
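To make the difference concrete, here is a minimal sketch of both calculations on made-up data. The `InstanceResult` type, the function names, and the sample numbers are hypothetical and are not taken from the actual commit0 evaluation code; only the two ratios mirror the formulas described in this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceResult:
    """Hypothetical per-repository result; not the real commit0 schema."""
    name: str
    passed_tests: int
    total_tests: int


def resolved_rate(results: List[InstanceResult]) -> float:
    """Old metric: resolved_instances / completed_instances.

    An instance counts as resolved only if every one of its tests passed.
    """
    completed = len(results)
    if completed == 0:
        return 0.0
    resolved = sum(
        1 for r in results
        if r.total_tests > 0 and r.passed_tests == r.total_tests
    )
    return resolved / completed


def passed_test_rate(results: List[InstanceResult], total_tests: int) -> float:
    """New metric: total_passed_tests / total_instances.

    `total_tests` plays the role of the fixed denominator
    (3628 in this PR); here it is passed in explicitly.
    """
    if total_tests == 0:
        return 0.0
    total_passed = sum(r.passed_tests for r in results)
    return total_passed / total_tests


# Made-up sample: three repositories with 40 tests in total.
results = [
    InstanceResult("repo_a", passed_tests=10, total_tests=10),
    InstanceResult("repo_b", passed_tests=5, total_tests=20),
    InstanceResult("repo_c", passed_tests=0, total_tests=10),
]

old_metric = resolved_rate(results)         # 1 of 3 instances fully passes
new_metric = passed_test_rate(results, 40)  # 15 of 40 tests pass
```

In this sample the old metric gives no credit for the five passing tests in `repo_b`, while the new metric does. Note that because the denominator is fixed, instances that fail to complete contribute zero passed tests rather than being excluded.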
Testing
Next Steps
This PR is marked as draft until we verify the results look correct after re-running all evaluations with the new metric.
Related PR in evaluation repo (to be created): Change BENCHMARK_INSTANCE_COUNTS['commit0'] from 16 to 3628