A research project analyzing the impact of speaker accent (Native vs. Non-Native English speakers, with or without a medical background) on automatic speech recognition (ASR) accuracy in medical/STEM contexts. The project evaluates transcription quality using multiple metrics, including Word Error Rate (WER), Levenshtein distance, BERTScore, and Named Entity Recognition (NER) performance.
This project investigates how accent affects the accuracy of speech-to-text transcription systems (Whisper/WhisperX) when processing medical content. The analysis compares performance between Native and Non-Native English speakers across different medical scenarios (STEM topics) and evaluates transcription quality using various metrics.
- Audio Transcription: Processes audio files using Whisper/WhisperX models via UnityPredict platform
- Grammar Correction: Applies grammar correction to transcriptions
- WER Analysis: Calculates Word Error Rate for transcriptions compared to reference texts
- BERTScore Evaluation: Measures semantic similarity using contextual embeddings (roberta-large model)
- NER Analysis: Extracts and evaluates medical named entities using spaCy models
- Levenshtein Distance Analysis: Measures the number of single-character edits required to transform the transcription into the reference text
- Statistical Analysis: Performs mixed-effects modeling to compare groups while accounting for participant and topic variability
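The Levenshtein metric above can be sketched as a standard dynamic-programming routine. This is a minimal illustration; the project's `distance_processor.py` may implement it differently:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```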
Accent/
├── EntryPoint.py # Main engine entry point for UnityPredict
├── main.py # Test script for running the engine
├── config.json # Configuration for UnityPredict deployment
├── requirements.txt # Python dependencies
│
├── Processing Scripts/
│ ├── wer_processor_corrected.py # WER calculation for all STEM topics
│ ├── distance_processor.py # Edit distance calculations
│ ├── json_processor.py # Extracts corrected text from JSON responses
│ ├── Bertscore.py # BERTScore evaluation for STEM1-STEM4
│ ├── NER_STEM.py # Named Entity Recognition for STEM1-STEM4
│
├── Analysis Scripts/
│ ├── Mixed_effect_Model.py # Mixed-effects linear model analysis
│
The project analyzes four different medical scenarios (STEM topics):
- STEM1: Gastroenterology case (gastritis, GERD, endoscopy findings)
- STEM2: Cardiovascular case (diabetes, heart failure, myocardial infarction)
- STEM3: Inflammatory bowel disease case (ulcerative colitis, primary sclerosing cholangitis)
- STEM4: Pulmonary case (COPD, emphysema, obstructive lung disease)
- Python 3.7+
- UnityPredict platform access (for ASR models)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd Accent
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download spaCy models:

  ```bash
  python -m spacy download en_ner_bc5cdr_md
  ```

- Configure UnityPredict:
  - Update `config.json` with your UnityPredict engine ID and API keys
  - Ensure the UnityPredict platform is accessible
Run the main script:

```bash
python main.py
```

This will process audio files through the UnityPredict platform, transcribe them using Whisper/WhisperX, and apply grammar correction.
```python
from wer_processor_corrected import process_folder_for_WER

process_folder_for_WER("path/to/transcription/folder")
```

Run the appropriate script for each STEM topic:
```bash
python Bertscore.py
```

Run NER extraction and evaluation:

```bash
python NER_STEM.py  # For STEM1 through STEM4
```
Run mixed-effects modeling:

```bash
python Mixed_effect_Model.py
```

This analyzes the relationship between speaker group (Native vs. Non-Native) and WER, accounting for variability across participants and STEM topics.
- `unitypredict-engines>=1.0.0` - UnityPredict platform integration
- `numpy` - Numerical computations
- `pandas` - Data manipulation
- `statsmodels` - Statistical modeling
- `plotly` - Interactive visualizations
- `matplotlib` - Plotting
- `scispacy>=0.5.4` - Scientific NLP
- `spacy>=3.7.0,<3.8.0` - NLP processing
- `jiwer` - Word Error Rate calculation
- `bert-score` - BERTScore evaluation
- `scikit-learn` - Machine learning utilities
Measures transcription accuracy as the ratio of word-level errors (substitutions, insertions, deletions) to the number of words in the reference text.
Semantic similarity metric using contextual embeddings from pre-trained language models (roberta-large). Provides precision, recall, and F1 scores.
Extracts medical entities (diseases, chemicals, drugs) from transcriptions and evaluates extraction accuracy using precision, recall, and F1 scores.
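Entity-level precision/recall/F1 can be computed from the predicted and true entity sets. This is a minimal sketch; the entity sets shown are hypothetical, and in the project the predicted set would come from the scispaCy model installed above:

```python
def ner_scores(predicted: set, reference: set) -> tuple:
    """Entity-level precision, recall, and F1 for a predicted
    entity set against a reference ("true") entity set."""
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# In the project, predictions come from scispaCy, e.g.:
#   nlp = spacy.load("en_ner_bc5cdr_md")
#   predicted = {(ent.text.lower(), ent.label_) for ent in nlp(text).ents}
# Hypothetical sets for illustration:
predicted = {("gastritis", "DISEASE"), ("omeprazole", "CHEMICAL"),
             ("fatigue", "DISEASE")}
reference = {("gastritis", "DISEASE"), ("omeprazole", "CHEMICAL")}
print(ner_scores(predicted, reference))  # ≈ (0.667, 1.0, 0.8)
```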
The project uses mixed-effects linear models to:
- Compare WER between Native and Non-Native speaker groups
- Account for random effects of participants and STEM topics
- Estimate variance components for different sources of variability
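The modeling step can be sketched with `statsmodels`' `mixedlm` on synthetic data, assuming a long-format table with one WER value per participant-topic pair; column names and effect sizes are illustrative, not the project's real data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic long-format data: one WER value per participant-topic pair
rng = np.random.default_rng(0)
rows = []
for group, base_wer in [("Native", 0.10), ("Non-Native", 0.20)]:
    for p in range(8):  # hypothetical participants per group
        participant_effect = rng.normal(0, 0.03)
        for topic in ["STEM1", "STEM2", "STEM3", "STEM4"]:
            rows.append({
                "wer": base_wer + participant_effect + rng.normal(0, 0.02),
                "group": group,
                "participant": f"{group}-{p}",
                "topic": topic,
            })
df = pd.DataFrame(rows)

# Fixed effect for speaker group, random intercept per participant
model = smf.mixedlm("wer ~ group", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

Topic-level variation can additionally be modeled via `mixedlm`'s `vc_formula` argument; the sketch keeps a single participant-level random intercept for brevity.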
Key findings from the analysis:
- Non-Native speakers show significantly higher WER than Native speakers
- STEM topics show substantial variation in baseline WER
- Participant variance decreases as ASR model output improves
- The performance gap between groups remains relatively stable across topics
- Audio files: `.mp3` format
- Text files: `.txt` format (for direct transcription input)
- CSV files with WER, BERTScore, and NER metrics
- Corrected transcriptions
- Statistical model summaries
Edit config.json to configure:
- UnityPredict engine settings
- Model paths
- Temporary directory locations
- API keys and deployment parameters
- The project uses UnityPredict platform for ASR model access
- Local testing can be done using the `unitypredict_mocktool` directory
- Reference texts for each STEM topic are hardcoded in the processing scripts
- True entity sets for NER evaluation are defined per STEM topic
Yasman Fatapour, AI Research Data Scientist
- UnityPredict platform for ASR model access
- spaCy for medical NER models
- Whisper/WhisperX for speech recognition