Hello! First of all, I want to thank you for this great work and for making the code (even the fine-tuning!) open source together with the model.
The whole speech processing community is grateful!
I am having trouble reproducing the results on AMI SDM. I suspect the gap comes from differences in text normalization, possibly from the test split used, and possibly from the fact that I am not using flash attention.
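To illustrate the text normalization point, here is a minimal sketch of the kind of normalizer that could be applied identically to both hypothesis and reference before scoring. The `normalize` function and its rules (lowercasing, dropping punctuation except apostrophes, collapsing whitespace) are purely my assumption for illustration; they are not the normalization used for the reported numbers, nor anything built into meeteval.

```python
import re

def normalize(text: str) -> str:
    """Hypothetical normalizer: an assumption, not the project's actual recipe."""
    text = text.lower()
    # Replace anything that is not a word character, whitespace, or apostrophe.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(text.split())

hyp = "Okay, so we'll START."
ref = "okay so we'll start"
# After normalization the two strings should compare equal word for word.
print(normalize(hyp) == normalize(ref))
```

If the reference numbers were computed with a different rule set (for example, one that also removes filler words), word counts and substitutions would shift accordingly.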
Currently I have:
Hypothesis: vibevoice_results/ami/ami/ami_hyp.seglst.json
Reference: /raid/users/popcornell/AMI/ami-sdm_supervisions_test.jsonl.gz
Hypothesis segments: 5037
Reference segments: 11589
--- tcpWER (Time-Constrained Permutation WER) ---
Error Rate: 37.50%
Errors: 33806 (S:7504 D:22688 I:3614)
Reference Words: 90149
--- cpWER (Concatenated min-Permutation WER) ---
Error Rate: 36.09%
Errors: 32537 (S:7459 D:22076 I:3002)
Reference Words: 90149
--- DER (Diarization Error Rate) ---
Error Rate: 24.67%
Scored Time: 21628.58s
Miss: 3142.88s | FA: 1511.59s | SpkErr: 681.66s
I used meeteval for all three with default parameters.
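As a sanity check on the log above, the rates can be recomputed from the raw counts. This is only a sketch: the variable and function names are mine, and the formulas are the standard definitions (errors over reference words for the WER variants, and miss plus false alarm plus speaker error over scored time for DER), not something taken from the meeteval source.

```python
def error_rate(sub: int, dele: int, ins: int, ref_words: int) -> float:
    """Standard WER-style rate: (S + D + I) / number of reference words."""
    return (sub + dele + ins) / ref_words

# Counts copied from the report above.
tcpwer = error_rate(sub=7504, dele=22688, ins=3614, ref_words=90149)
cpwer = error_rate(sub=7459, dele=22076, ins=3002, ref_words=90149)

# DER = (miss + false alarm + speaker error) / scored time, all in seconds.
der = (3142.88 + 1511.59 + 681.66) / 21628.58

# Share of tcpWER errors that are deletions, and hypothesis/reference segment ratio.
deletion_share = 22688 / (7504 + 22688 + 3614)
segment_ratio = 5037 / 11589

print(f"tcpWER: {tcpwer:.2%}  cpWER: {cpwer:.2%}  DER: {der:.2%}")
print(f"deletions / all tcpWER errors: {deletion_share:.1%}")
print(f"hypothesis / reference segments: {segment_ratio:.1%}")
```

The recomputed rates match the log, and deletions account for roughly two thirds of the word errors, which may be worth keeping in mind alongside the normalization and test-split hypotheses.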