Prompt explore #108
base: main
Conversation
## Rubric Updates (v2.1)

Updated rubric10.txt and rubric20.txt to address 23 stakeholder review comments:

- Add W3C PROV-O provenance support with graph-based validation
- Align with Bridge2AI AI/ML readiness criteria and bioRxiv article
- Add data sustainability indicators (DOI, governance, repositories)
- Support graph-based metadata representations (preprocessing, collection)
- Correct license terminology (replace misleading "Open"/"Public" terms)
- Clarify multimodal definitions for Bridge2AI datasets
- Expand sensitive data examples (voice, activity, retinal images)
- Add dataset merging/integration capability assessment
- Make citation mandatory for all Bridge2AI datasets
- Add hosting platform identification field
- Document review comments and responses in REVIEW_COMMENTS_RESPONSE_REPORT.md

## D4D Evaluations

Regenerated deterministic evaluations for all 4 Bridge2AI projects using claude-sonnet-4-5-20250929 (temperature=0.0):

**Corrected Rubric10 Rankings (50 points max):**
1. AI_READI: 44/50 (88.0%) - Grade A-
2. VOICE: 40/50 (80.0%) - Grade B+
3. CHORUS: 38/50 (76.0%) - Grade B
4. CM4AI: 36/50 (72.0%) - Grade C+

**Corrected Rubric20 Rankings (84 points max):**
1. CHORUS: 78/84 (92.9%) - Grade A
2. CM4AI: 68/84 (81.0%) - Grade B+
3. AI_READI: 44/84 (52.4%) - Grade D

**Score Validation:**
- Fixed LLM math errors in 3 files (CM4AI: -8 points, VOICE: -6 points, CM4AI rubric20: +1 point)
- Post-processing validation essential even at temperature=0.0

**Files Updated:**
- Evaluation JSONs (rubric10 and rubric20 for all projects)
- HTML renderings (8 files: 4 rubric10 + 4 rubric20)
- EVALUATION_SUMMARY_2026-01-12.md - Comprehensive analysis
- COMPARISON_TABLES_2026-01-12.md - Cross-rubric comparison tables

**Key Findings:**
- Universal strengths: Scientific Motivation (5/5) and Technical Transparency (5/5) across all projects
- Universal weakness: Limitations Disclosure (avg 2.3/5) - all projects missing known_limitations, known_biases, anomalies
- Schema compliance: 25% of evaluation files pass current schema validation (5/20 files)
- Rank reversal: CHORUS ranks 3rd in Rubric10 but 1st in Rubric20

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
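The corrections noted under **Score Validation** come from recomputing totals outside the model. A minimal post-processing check might look like the sketch below; the JSON keys `elements`, `score`, and `total_score` are placeholders for illustration, not the repository's actual evaluation schema.

```python
import json
from pathlib import Path

def validate_totals(eval_path: Path) -> None:
    """Recompute an evaluation's total from its per-item scores and flag mismatches.

    Assumes each evaluation JSON carries an `elements` list with numeric `score`
    values plus a reported `total_score`; adjust the keys to the real schema.
    """
    data = json.loads(eval_path.read_text())
    recomputed = sum(item["score"] for item in data["elements"])
    reported = data["total_score"]
    if recomputed != reported:
        print(f"{eval_path.name}: reported {reported}, recomputed {recomputed}")

# Check every evaluation file, regardless of which rubric produced it.
for path in sorted(Path("data/evaluation_llm").rglob("*_claudecode_agent_evaluation.json")):
    validate_totals(path)
```

A check like this is worth running even for temperature=0.0 outputs, since the LLM-reported totals drifted from the per-item sums in three files here.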
Updated render_evaluation_html_rubric10_semantic.py to generate files with a consistent naming convention matching the rubric20 files.
**Changes:**
- Modified script to generate `{PROJECT}_evaluation_rubric10.html` instead of `{PROJECT}_evaluation.html`
- Removed old HTML files without suffix
- Regenerated all 4 rubric10 evaluation HTML files with correct naming
**File naming convention (now consistent):**
- Rubric10: `{PROJECT}_evaluation_rubric10.html`
- Rubric20: `{PROJECT}_evaluation_rubric20.html`
- D4D Human-Readable: `{PROJECT}_d4d_human_readable.html`
**Files affected:**
- scripts/render_evaluation_html_rubric10_semantic.py (line 642)
- AI_READI_evaluation_rubric10.html (renamed from AI_READI_evaluation.html)
- CHORUS_evaluation_rubric10.html (renamed from CHORUS_evaluation.html)
- CM4AI_evaluation_rubric10.html (renamed from CM4AI_evaluation.html)
- VOICE_evaluation_rubric10.html (renamed from VOICE_evaluation.html)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
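As a rough sketch of the equivalent rename step, assuming the rendered reports live under data/d4d_html (the commit itself regenerated the files rather than renaming them, so this is illustrative only):

```python
from pathlib import Path

html_dir = Path("data/d4d_html")  # assumed location of the rendered reports

# Old pattern: {PROJECT}_evaluation.html -> new pattern: {PROJECT}_evaluation_rubric10.html
for old in sorted(html_dir.rglob("*_evaluation.html")):
    new = old.with_name(f"{old.stem}_rubric10{old.suffix}")
    print(f"{old.name} -> {new.name}")
    old.rename(new)
```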
Pull request overview
This pull request updates the D4D rubric evaluation system from version 2.0 to 2.1, incorporating comprehensive review comments and regenerating evaluations for Bridge2AI datasets. The changes include rubric enhancements (W3C PROV-O provenance, AI/ML readiness criteria, sustainability indicators), updated evaluation outputs, and extensive documentation of the review process and results.
Changes:
- Updated rubrics (v2.0 → v2.1) with 23 review comments addressed across rubric10.txt and rubric20.txt
- Regenerated evaluation JSONs for CM4AI and VOICE with updated scoring methodology
- Added comprehensive documentation (REVIEW_COMMENTS_RESPONSE_REPORT.md, EVALUATION_SUMMARY, COMPARISON_TABLES)
- Updated HTML rendering script to include "_rubric10" suffix in output filenames
- Refreshed HTML evaluation reports with new timestamps
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated no comments.
Summary per file:
| File | Description |
|---|---|
| scripts/render_evaluation_html_rubric10_semantic.py | Updated output filename pattern to include "_rubric10" suffix |
| data/rubric/rubric20.txt | Schema v2.1 update with expanded field guide, revised questions (Q3, Q4, Q6, Q8-Q20) |
| data/rubric/rubric10.txt | Schema v2.1 update with expanded field guide, revised elements (3, 8, 10) |
| data/rubric/REVIEW_COMMENTS_RESPONSE_REPORT.md | New comprehensive 307-line documentation of 23 review comments |
| data/evaluation_llm/.../CM4AI_claudecode_agent_evaluation.json (rubric20) | Version 2.1 re-evaluation with score changes (82→68 points) |
| data/evaluation_llm/.../VOICE_claudecode_agent_evaluation.json (rubric10) | Version 2.1 re-evaluation with score changes (44→40 points) |
| data/evaluation_llm/.../CM4AI_claudecode_agent_evaluation.json (rubric10) | Version 2.1 re-evaluation with score changes (48→36 points) |
| data/evaluation_llm/EVALUATION_SUMMARY_2026-01-12.md | New 414-line summary report of evaluation results |
| data/evaluation_llm/COMPARISON_TABLES_2026-01-12.md | New 335-line comparative analysis tables |
| data/d4d_html/.../VOICE_evaluation_rubric10.html | New HTML rendering (1131 lines) |
| data/d4d_html/.../_evaluation.html | Timestamp updates (2025 dates → 2026-01-12/13) |
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.
```diff
 # Generate output filename
 project_name = eval_file.stem.replace('_claudecode_agent_evaluation', '')
-output_path = output_dir / f"{project_name}_evaluation.html"
+output_path = output_dir / f"{project_name}_evaluation_rubric10.html"
```
Copilot AI · Jan 14, 2026
The output filename pattern has been changed from {project_name}_evaluation.html to {project_name}_evaluation_rubric10.html. This change should be documented and any dependent scripts or documentation should be updated to reflect this new naming convention.
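As a hedged example of what such an update might look like in a dependent script, matching the new explicit suffixes rather than the old un-suffixed name (the data/d4d_html directory is assumed for illustration):

```python
from pathlib import Path

html_dir = Path("data/d4d_html")  # assumed report directory

# Discover reports per rubric via the new explicit suffixes instead of *_evaluation.html.
rubric10_reports = sorted(html_dir.rglob("*_evaluation_rubric10.html"))
rubric20_reports = sorted(html_dir.rglob("*_evaluation_rubric20.html"))

for report in rubric10_reports + rubric20_reports:
    print(report.relative_to(html_dir))
```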
Clean up old evaluation HTML files that were replaced with properly named versions containing the _rubric10 suffix.

Files removed:
- AI_READI_evaluation.html → AI_READI_evaluation_rubric10.html
- CHORUS_evaluation.html → CHORUS_evaluation_rubric10.html
- CM4AI_evaluation.html → CM4AI_evaluation_rubric10.html
- VOICE_evaluation.html → VOICE_evaluation_rubric10.html

All projects now use consistent naming with explicit rubric type suffixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
No description provided.