DSEAL: The annotation process employed for the benchmark creation #8

@Fardeen786-eng

Description

Hi, I appreciate the effort to develop a benchmark for evaluating ML agent systems in as many states as possible, but I am most curious about the annotation process used to create these benchmarks. I believe a tutorial on Developing New Benchmarks was one of the TODOs. I went through the paper (https://arxiv.org/pdf/2402.17168), and, correct me if I am wrong, the 31 Kaggle datasets and their available notebooks are used to produce problem sketches, which are then converted into individual problems (query, validator, etc.) that together form one problem set in the benchmark. Could I get more insight into this process, specifically how LLMs were used to generate these problems and how they were refined through human annotation?
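
To check whether I have the structure right, here is a rough sketch of how I currently picture one problem and one problem set. The names (`Problem`, `ProblemSet`, `query`, `validator`, `source_dataset`) are my own guesses for illustration, not the actual schema used in the benchmark:

```python
# Sketch of my current understanding only -- field names are assumptions,
# not the benchmark's real data model.
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Problem:
    """One annotated problem derived from a Kaggle dataset/notebook sketch."""
    query: str                        # natural-language task given to the agent
    validator: Callable[[Any], bool]  # programmatic check of the agent's output
    source_dataset: str               # which of the 31 Kaggle datasets it came from


@dataclass
class ProblemSet:
    """A group of related problems forming one problem set in the benchmark."""
    sketch: str                       # the high-level problem sketch it was derived from
    problems: List[Problem] = field(default_factory=list)


# Toy example: a problem whose validator checks a reported accuracy threshold
toy = Problem(
    query="Train a classifier on this dataset and report test accuracy.",
    validator=lambda result: isinstance(result, float) and result >= 0.75,
    source_dataset="example-kaggle-dataset",
)
```

Is this roughly the shape of the pipeline (dataset + notebook → sketch → individual problems with validators), or am I missing intermediate annotation steps?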
