Open
Labels
documentation, enhancement
Description
To improve usability and reproducibility, this repository should include Jupyter notebooks that demonstrate how to run evaluations using the refactored YESciEval-based pipeline.
Tasks
- Create a notebook, evaluation_single_report.ipynb, that:
  - Demonstrates evaluation of one Deep Research report
  - Uses YESciEval rubrics and scoring
  - Shows:
    - Input report
    - Per-dimension scores
    - Final evaluation output
- Create a notebook, evaluation_collection.ipynb, that:
  - Runs evaluation over a collection of reports
  - Computes aggregated metrics (e.g., mean, distribution)
  - Produces plots (e.g., bar charts, score distributions)
- Ensure notebooks:
  - Import evaluation logic from YESciEval
  - Are runnable end-to-end with minimal setup
  - Clearly document expected inputs and outputs
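As a rough illustration of the output shape the single-report notebook could produce, here is a minimal sketch. It does not use the YESciEval API: `evaluate_report`, the `Judge` type, the dimension names, and the 1-5 scale are all hypothetical stand-ins, and the constant scores are placeholders rather than real results. In the actual notebook, the dummy judges would be replaced by the rubric-based scoring imported from YESciEval.

```python
from statistics import mean
from typing import Callable, Dict

# Hypothetical type: a judge maps report text to an integer score (assumed 1-5).
# In the real notebook this role would be filled by YESciEval rubric scoring.
Judge = Callable[[str], int]


def evaluate_report(report: str, judges: Dict[str, Judge]) -> dict:
    """Score one report on every dimension and collect the results."""
    scores = {name: judge(report) for name, judge in judges.items()}
    return {
        "input_chars": len(report),            # simple summary of the input
        "per_dimension": scores,               # one score per rubric dimension
        "overall": mean(list(scores.values())),  # unweighted mean as final output
    }


# Placeholder judges with constant scores (illustrative only, not real results).
dummy_judges = {
    "coherence": lambda text: 4,
    "coverage": lambda text: 3,
}

result = evaluate_report("Example Deep Research report text.", dummy_judges)
print(result["per_dimension"])
print(result["overall"])
```

The unweighted mean is only one possible way to form the final evaluation output; the notebook should use whatever aggregation the refactored pipeline defines.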
Acceptance Criteria
- Two notebooks are added and committed
- Notebooks run successfully using the refactored evaluation pipeline
- Aggregated metrics are computed and visualized for report collections
- Notebooks serve as clear reference examples for users and evaluators
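The aggregation and plotting criterion could be met along the following lines. This is a sketch under assumptions: per-report results are taken to be plain dictionaries mapping a dimension name to a score, the `aggregate` and `plot_means` helpers are hypothetical, and the sample scores are placeholders rather than real evaluation output. The plot uses matplotlib, which the source does not prescribe.

```python
from collections import Counter
from statistics import mean


def aggregate(results):
    """Return per-dimension means and score distributions across reports."""
    dims = sorted({d for r in results for d in r})
    means = {d: mean(r[d] for r in results if d in r) for d in dims}
    dist = {d: Counter(r[d] for r in results if d in r) for d in dims}
    return means, dist


def plot_means(means):
    """Draw a bar chart of the mean score per dimension."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.bar(list(means.keys()), list(means.values()))
    ax.set_xlabel("Dimension")
    ax.set_ylabel("Mean score")
    ax.set_title("Mean score per dimension")
    return fig


# Placeholder per-report scores (illustrative only, not real results).
results = [
    {"coherence": 4, "coverage": 3},
    {"coherence": 5, "coverage": 3},
    {"coherence": 3, "coverage": 2},
]

means, dist = aggregate(results)
print(means)
print(dist["coverage"])
```

In a notebook cell, calling `plot_means(means)` would render the bar chart; the `dist` counters can feed a histogram of score distributions in the same way.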
Rationale
These notebooks act as executable documentation and lower the barrier for:
- Reproducibility
- Qualitative analysis
- Method comparison
- Adoption of YESciEval-based evaluation workflows