Reproducing the AMI SDM results #237

@popcornell


Hello! First of all, I want to thank you for this great work and for open-sourcing the code (even the fine-tuning code!) together with the model.
The whole speech processing community is grateful!

I am having trouble reproducing the results on AMI SDM. My guess is that this comes from differences in text normalization, possibly from the test split used, and possibly from the fact that I am not using flash attention.

Currently I have:

Hypothesis: vibevoice_results/ami/ami/ami_hyp.seglst.json
Reference: /raid/users/popcornell/AMI/ami-sdm_supervisions_test.jsonl.gz
Hypothesis segments: 5037
Reference segments: 11589

--- tcpWER (Time-Constrained Permutation WER) ---
  Error Rate: 37.50%
  Errors: 33806 (S:7504 D:22688 I:3614)
  Reference Words: 90149

--- cpWER (Concatenated min-Permutation WER) ---
  Error Rate: 36.09%
  Errors: 32537 (S:7459 D:22076 I:3002)
  Reference Words: 90149

--- DER (Diarization Error Rate) ---
  Error Rate: 24.67%
  Scored Time: 21628.58s
  Miss: 3142.88s | FA: 1511.59s | SpkErr: 681.66s
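
As a sanity check, the rates above are consistent with (S + D + I) / reference words for the two WER variants and (Miss + FA + SpkErr) / scored time for DER. Below is a minimal Python sketch that recomputes them from the counts pasted above. The helper name `error_rate` is only for illustration and is not part of meeteval.

```python
# Recompute the reported rates from the raw counts in the output above.
# `error_rate` is a hypothetical helper, not a meeteval function.

def error_rate(substitutions: int, deletions: int, insertions: int, ref_words: int):
    """Return (total errors, error rate) for a WER-style metric."""
    errors = substitutions + deletions + insertions
    return errors, errors / ref_words

REF_WORDS = 90149

tcp_errors, tcp_wer = error_rate(7504, 22688, 3614, REF_WORDS)
cp_errors, cp_wer = error_rate(7459, 22076, 3002, REF_WORDS)

# DER: missed speech + false alarm + speaker confusion over scored time.
der = (3142.88 + 1511.59 + 681.66) / 21628.58

# Share of the errors that are deletions.
tcp_del_share = 22688 / tcp_errors
cp_del_share = 22076 / cp_errors

print(f"tcpWER: {tcp_wer:.2%} ({tcp_errors} errors, deletions {tcp_del_share:.1%})")
print(f"cpWER:  {cp_wer:.2%} ({cp_errors} errors, deletions {cp_del_share:.1%})")
print(f"DER:    {der:.2%}")
```

In both WER variants, deletions account for roughly two thirds of the errors.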

I used meeteval for all three metrics with default parameters.
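
To make the text normalization point concrete, here is a minimal Python sketch of the kind of normalization difference I have in mind. It is a hypothetical example only: `normalize_text` is not the normalization used for the numbers above, nor a meeteval function, and I do not know which normalization was used for the reported results.

```python
import re

def normalize_text(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    # Keep letters, digits, apostrophes and spaces; everything else becomes a space.
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())

reference = "okay so let's start"
hypothesis = "Okay, so... Let's START!"

# Without normalization the word sequences differ token by token;
# after normalization they are identical.
print(hypothesis.split() == reference.split())
print(normalize_text(hypothesis).split() == reference.split())
```

If the reference and hypothesis are normalized differently (casing, punctuation, and so on), words like these would be counted as substitutions even though the transcription is correct.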
