Conversation

@bastoica
Collaborator

Description

This PR adds two artifacts, Acto (SOSP'23) and Anvil (OSDI'24), to ArtEvalBench.

Changes

  • Added two artifacts
  • Updated benchmark JSONL schema

Testing

Tested manually by downloading, installing, and running the two artifacts, along with their corresponding agent oracles.

Checklist

  • Tests pass locally
  • Code follows project style guidelines

@bastoica bastoica added the enhancement label Dec 10, 2025
@bastoica bastoica requested a review from xuafeng December 10, 2025 07:42
@bastoica bastoica self-assigned this Dec 10, 2025
@xuafeng
Collaborator

xuafeng commented Dec 11, 2025

Thanks @bastoica. Please ping me when it is ready for merge.

@bastoica bastoica marked this pull request as ready for review December 16, 2025 07:18
@bastoica
Collaborator Author

Thanks @xuafeng, this PR is ready.

Having said that, Acto might need some improvements since it needs to download third-party benchmarks, a few of which might not be easily available. I'll open an issue and work on a new PR for this.

@xuafeng xuafeng merged commit afa4640 into sys-intelligence:main Dec 16, 2025
2 checks passed
Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Add Acto (SOSP'23) to ArtEvalBench
  • Add Anvil (OSDI'24) to ArtEvalBench

2 participants