CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

SprocketLab/CARE


Jitian Zhao*, Changho Shin*, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala

Paper Link: TBD

Install

pip install -r requirements.txt

Run pipeline

1) Generate LLM judge outputs

python scripts/save_judge_outputs.py \
  --datasets asset_ratings civilcomments_binary allenai_preference_test_sets/pku_better_binary \
  --mode gaussian_mixture

Output path example: judge_outputs/fully_gaussian/asset/Qwen3-8B.csv
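
To check what step 1 wrote before moving on, a small helper along these lines may be useful. It is only a sketch: it assumes the judge outputs are .csv files somewhere under judge_outputs/, as the example path suggests, and the function name list_judge_outputs is hypothetical, not part of the repository.

```shell
# Print "<line count><TAB><path>" for every CSV under the given directory
# (defaults to judge_outputs). The count includes a header line, if any.
list_judge_outputs() {
  root="${1:-judge_outputs}"
  find "$root" -type f -name '*.csv' | sort | while IFS= read -r f; do
    printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
  done
}
```

Calling list_judge_outputs with no argument from the repository root lists everything under judge_outputs/; an empty result suggests step 1 did not produce any files.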

2) Run aggregations

Fully Gaussian (Table 1 experiment):

python scripts/fully_gaussian_main.py --seed 2024

Gaussian mixture (Table 2 experiment):

python scripts/gaussian_mixture_main.py --seed 42 --datasets civilcomments pku_better
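
The commands above can be chained so that a failure in one step stops the rest. The following is a minimal sketch, not a script shipped with the repository: the names run, run_care_pipeline, and DRY_RUN are hypothetical, and the flags are copied verbatim from the commands above.

```shell
# Sketch: run the documented steps in order, stopping at the first failure.
# Set DRY_RUN=1 to print each command instead of executing it.
set -eu

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_care_pipeline() {
  # Step 1: generate LLM judge outputs.
  run python scripts/save_judge_outputs.py \
    --datasets asset_ratings civilcomments_binary allenai_preference_test_sets/pku_better_binary \
    --mode gaussian_mixture
  # Step 2a: fully Gaussian aggregation.
  run python scripts/fully_gaussian_main.py --seed 2024
  # Step 2b: Gaussian mixture aggregation.
  run python scripts/gaussian_mixture_main.py --seed 42 --datasets civilcomments pku_better
}
```

After sourcing this from the repository root, DRY_RUN=1 run_care_pipeline previews the three commands, and run_care_pipeline alone executes them.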

Citation

TBD
