AutonomousAgentsLab · martinakaduc · Jun 9, 2025 · Jun 12, 2025
diff --git a/.gitignore b/.gitignore
@@ -163,4 +163,8 @@ cython_debug/
 
 logs/*
 
-*.ipynb
+*.ipynb
+finetune/data/
+finetune/*results
+finetune/configs
+finetune/prod_env
diff --git a/README.md b/README.md
@@ -1,27 +1,58 @@
-# SpeechEval
+# SLPHelm
 
-## 0. Env Set up:
-This project uses a Conda environment defined in conda_env.yml. To create and activate the environment:
-```sh
-# Create the environment
-conda env create -f environment.yml
+This repository contains scripts and instructions to run the SLPHelm benchmark.
 
-# Activate the environment
-conda activate SpeechEval
-```
+There are two sub-folders:
+- `finetune`: scripts to fine-tune models on self-generated data.
+- `finetune-ultrasuite`: instructions to create the UltraSuite dataset and fine-tune models with the LLaMA-Factory framework.
 
-Alternatively, if you prefer using pip directly, a requirements.txt file is provided:
-```sh
-# (Optional) create or activate your own environment, then:
-pip install -r requirements.txt
+## How to run the benchmark
+1. Install HELM from the `slp_helm` branch of the fork:
+```bash
+git clone https://github.com/martinakaduc/helm/ -b slp_helm
+cd helm
+pip install -e .
 ```
-## 1. Get model list from huggingface:
 
-We first iterate the models on huggingface. Filter out the model satisfied the following requirement:
-1. Has model tag of: any-to-any, audio-text-to-text
-2. Has vllm support (test by run `vllm serve MODEL_ID`)
-3. Accept audio, text input and output text. (test by run a sample request)
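+2. Run the benchmark, replacing `{model_name}` with the model to evaluate and `{evaluation_dir}` with the output directory: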
+```bash
+# Binary Classification
+helm-run --run-entries \
+    ultra_suite_classification:model={model_name} \
+    --suite binary-suite \
+    --output-path {evaluation_dir} \
+    --disable-cache \
+    --max-eval-instances 1000
+
+# ASR Classification
+helm-run --run-entries \
+    ultra_suite_classification:model={model_name} \
+    --suite asr-suite \
+    --output-path {evaluation_dir} \
+    --disable-cache \
+    --max-eval-instances 1000
+
+# ASR Transcription
+helm-run --run-entries \
+    ultra_suite_asr_transcription:model={model_name} \
+    --suite trans-suite \
+    --output-path {evaluation_dir} \
+    --disable-cache \
+    --max-eval-instances 1000
+
+# Type Classification
+helm-run --run-entries \
+    ultra_suite_classification_breakdown:model={model_name} \
+    --suite type-suite \
+    --output-path {evaluation_dir} \
+    --disable-cache \
+    --max-eval-instances 1000
 
-```sh
-    cd tools && python get_model_list.py
+# Symptom Classification
+helm-run --run-entries \
+    ultra_suite_disorder_symptoms:model={model_name} \
+    --suite symp-suite \
+    --output-path {evaluation_dir} \
+    --disable-cache \
+    --max-eval-instances 1000
 ```
diff --git a/audio/alloy.wav b/audio/alloy.wav
diff --git a/audio/atypical.wav b/audio/atypical.wav
diff --git a/audio/typical.wav b/audio/typical.wav
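The five `helm-run` invocations added to the README differ only in the run entry and the suite name, so they can be generated from a single table. The script below is a minimal sketch, assuming a POSIX shell and that `helm-run` is on `PATH` after step 1. The script name `run_slphelm.sh` and the `DRY_RUN` and `MAX_EVAL_INSTANCES` variables are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SLPHelm repo): builds the five
# helm-run invocations from the README for a single model.
# By default it only prints the commands (DRY_RUN=1); set DRY_RUN=0 to run them.
set -eu

# "<run entry> <suite>" pairs, copied from the README commands.
SCENARIOS='ultra_suite_classification binary-suite
ultra_suite_classification asr-suite
ultra_suite_asr_transcription trans-suite
ultra_suite_classification_breakdown type-suite
ultra_suite_disorder_symptoms symp-suite'

run_all() {
  model_name=$1
  evaluation_dir=$2
  max_instances=${MAX_EVAL_INSTANCES:-1000}  # README default is 1000
  printf '%s\n' "$SCENARIOS" | while read -r entry suite; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: print the command that would be executed.
      printf 'helm-run --run-entries %s:model=%s --suite %s --output-path %s --disable-cache --max-eval-instances %s\n' \
        "$entry" "$model_name" "$suite" "$evaluation_dir" "$max_instances"
    else
      # Real run; stdin is redirected so helm-run cannot consume the loop input.
      helm-run --run-entries "${entry}:model=${model_name}" \
        --suite "$suite" \
        --output-path "$evaluation_dir" \
        --disable-cache \
        --max-eval-instances "$max_instances" </dev/null
    fi
  done
}

# Only run when called with <model_name> <evaluation_dir>.
if [ "$#" -ge 2 ]; then
  run_all "$1" "$2"
fi
```

For example, `sh run_slphelm.sh my-model ./eval` only prints the five commands, while `DRY_RUN=0 sh run_slphelm.sh my-model ./eval` runs them one after another and, because of `set -e`, should stop at the first run that fails.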