Add Jupyter Notebooks Demonstrating Report-Level and Collection-Level Evaluation with YESciEval

### Description
To improve usability and reproducibility, this repository should include **Jupyter notebooks** that demonstrate how to run evaluations using the refactored YESciEval-based pipeline.

### Tasks
- [ ] Create a notebook:  
  **`evaluation_single_report.ipynb`**
  - Demonstrates evaluation of **one Deep Research report**
  - Uses YESciEval rubrics and scoring
  - Shows:
    - Input report
    - Per-dimension scores
    - Final evaluation output
- [ ] Create a notebook:  
  **`evaluation_collection.ipynb`**
  - Runs evaluation over a **collection of reports**
  - Computes **aggregated metrics** (e.g., mean, distribution)
  - Produces plots (e.g., bar charts, score distributions)
- [ ] Ensure notebooks:
  - Import evaluation logic from YESciEval
  - Are runnable end-to-end with minimal setup
  - Clearly document expected inputs and outputs
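
As a rough illustration of the outputs the two notebooks could produce, here is a minimal sketch using only the Python standard library. It does not call YESciEval. The dimension names (`coherence`, `informativeness`), the 1-5 score scale, the unweighted overall mean, and the helpers `summarize_report` and `aggregate_collection` are all assumptions made for this example. In the actual notebooks, the score dictionaries would come from the refactored YESciEval-based pipeline.

```python
from collections import Counter
from statistics import mean

# Hypothetical input shape: one dict per report, mapping a rubric
# dimension name to an integer score. Names and scale are assumptions.


def summarize_report(scores):
    """Return per-dimension scores plus an unweighted overall mean for one report."""
    return {
        "per_dimension": dict(scores),
        "overall": mean(scores.values()),
    }


def aggregate_collection(reports):
    """Compute the mean and score distribution per dimension across reports."""
    dimensions = sorted({dim for report in reports for dim in report})
    summary = {}
    for dim in dimensions:
        # Skip reports that have no score for this dimension.
        values = [report[dim] for report in reports if dim in report]
        summary[dim] = {
            "n": len(values),
            "mean": mean(values),
            "distribution": dict(sorted(Counter(values).items())),
        }
    return summary


# Made-up scores, for illustration only (not real evaluation results).
example_reports = [
    {"coherence": 4, "informativeness": 5},
    {"coherence": 3, "informativeness": 4},
    {"coherence": 4, "informativeness": 3},
]

single_summary = summarize_report(example_reports[0])
collection_summary = aggregate_collection(example_reports)

for dim, stats in collection_summary.items():
    print(dim, stats)
```

The `distribution` mapping (score value to count) is the kind of structure a bar chart or histogram in `evaluation_collection.ipynb` could be drawn from.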

### Acceptance Criteria
- Two notebooks are added and committed
- Notebooks run successfully using the refactored evaluation pipeline
- Aggregated metrics are computed and visualized for report collections
- Notebooks serve as clear reference examples for users and evaluators

### Rationale
These notebooks act as executable documentation and lower the barrier for:
- Reproducibility
- Qualitative analysis
- Method comparison
- Adoption of YESciEval-based evaluation workflows
