Add Jupyter Notebooks Demonstrating Report-Level and Collection-Level Evaluation with YESciEval #19

@jd-coderepos

Description

To improve usability and reproducibility, this repository should include Jupyter notebooks that demonstrate how to run evaluations using the refactored YESciEval-based pipeline.

Tasks

  • Create a notebook:
    evaluation_single_report.ipynb
    • Demonstrates evaluation of one Deep Research report
    • Uses YESciEval rubrics and scoring
    • Shows:
      • Input report
      • Per-dimension scores
      • Final evaluation output
  • Create a notebook:
    evaluation_collection.ipynb
    • Runs evaluation over a collection of reports
    • Computes aggregated metrics (e.g., mean, distribution)
    • Produces plots (e.g., bar charts, score distributions)
  • Ensure notebooks:
    • Import evaluation logic from YESciEval
    • Are runnable end-to-end with minimal setup
    • Clearly document expected inputs and outputs
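As a rough illustration of the aggregation step in evaluation_collection.ipynb, the sketch below computes a per-dimension mean and score distribution over a small collection. It assumes the pipeline yields one integer score per dimension per report. The `report_scores` dictionary, the dimension names, the 1-5 scale, and the `aggregate` helper are all hypothetical placeholders, not YESciEval's actual API or real results. In the notebook, the scores would come from the refactored evaluation pipeline instead of being hard-coded.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-report scores: placeholder values, not real results.
# Dimension names and the 1-5 scale are assumptions for illustration only.
report_scores = {
    "report_a": {"coherence": 4, "informativeness": 5, "relevancy": 3},
    "report_b": {"coherence": 3, "informativeness": 4, "relevancy": 3},
    "report_c": {"coherence": 5, "informativeness": 4, "relevancy": 4},
}


def aggregate(scores):
    """Return the per-dimension mean and score distribution across reports."""
    by_dimension = {}
    for per_report in scores.values():
        for dimension, score in per_report.items():
            by_dimension.setdefault(dimension, []).append(score)
    return {
        dimension: {
            "mean": mean(values),
            # Map each score value to the number of reports that received it.
            "distribution": dict(sorted(Counter(values).items())),
        }
        for dimension, values in by_dimension.items()
    }


summary = aggregate(report_scores)
for dimension, stats in summary.items():
    print(dimension, stats["mean"], stats["distribution"])
```

The resulting `summary` dictionary is a convenient input for the plotting step: the `distribution` entries map directly onto bar-chart categories and heights.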

Acceptance Criteria

  • Two notebooks are added and committed
  • Notebooks run successfully using the refactored evaluation pipeline
  • Aggregated metrics are computed and visualized for report collections
  • Notebooks serve as clear reference examples for users and evaluators

Rationale

These notebooks act as executable documentation and lower the barrier to:

  • Reproducibility
  • Qualitative analysis
  • Method comparison
  • Adoption of YESciEval-based evaluation workflows

Metadata

Assignees

Labels

  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests