Reproducing the AMI SDM results

Hello! First of all, I want to thank you for this great work and for making the code (even finetuning!) open source together with the model.
The whole speech processing community is grateful!

I think I have issues reproducing the results on AMI SDM. I suspect the cause is differences in text normalization, possibly the test split used, and possibly the fact that I am not using flash attention?

Currently I have: 

```
Hypothesis: vibevoice_results/ami/ami/ami_hyp.seglst.json
Reference: /raid/users/popcornell/AMI/ami-sdm_supervisions_test.jsonl.gz
Hypothesis segments: 5037
Reference segments: 11589

--- tcpWER (Time-Constrained Permutation WER) ---
  Error Rate: 37.50%
  Errors: 33806 (S:7504 D:22688 I:3614)
  Reference Words: 90149

--- cpWER (Concatenated min-Permutation WER) ---
  Error Rate: 36.09%
  Errors: 32537 (S:7459 D:22076 I:3002)
  Reference Words: 90149

--- DER (Diarization Error Rate) ---
  Error Rate: 24.67%
  Scored Time: 21628.58s
  Miss: 3142.88s | FA: 1511.59s | SpkErr: 681.66s
```

I used meeteval for all three metrics with default parameters.
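
As a sanity check on the numbers above, here is a small sketch that recomputes each error rate from the raw counts in the log and reports what share of the word errors are deletions. The dictionary layout and the helper names (`error_rate`, `deletion_share`) are just my own illustration and are not part of meeteval; the counts are copied verbatim from the output above.

```python
# Counts copied from the scoring output above; nothing here calls meeteval.
REFERENCE_WORDS = 90149

results = {
    "tcpWER": {"S": 7504, "D": 22688, "I": 3614},
    "cpWER": {"S": 7459, "D": 22076, "I": 3002},
}


def error_rate(counts, reference_words):
    """WER-style rate: (substitutions + deletions + insertions) / reference words."""
    return sum(counts.values()) / reference_words


def deletion_share(counts):
    """Fraction of all word errors that are deletions."""
    return counts["D"] / sum(counts.values())


# DER is (missed speech + false alarm + speaker error) / scored time.
MISS, FALSE_ALARM, SPEAKER_ERROR = 3142.88, 1511.59, 681.66
SCORED_TIME = 21628.58
der = (MISS + FALSE_ALARM + SPEAKER_ERROR) / SCORED_TIME

for name, counts in results.items():
    rate = 100 * error_rate(counts, REFERENCE_WORDS)
    share = 100 * deletion_share(counts)
    print(f"{name}: {rate:.2f}% error rate, deletions are {share:.1f}% of errors")

print(f"DER: {100 * der:.2f}%")
```

The recomputed rates match the log, and deletions make up roughly two thirds of the word errors in both tcpWER and cpWER. This is only arithmetic on the reported counts, so by itself it does not tell me which of the suspected causes is responsible.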

