Consider forking and maintaining pyctcdecode or switch to torchaudio.models.decoder

### System Info
transformers[torch-speech]==4.56.2
pyannote-audio==4.0.0

### Who can help?

@Rocketknight1 

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

With the release of pyannote-audio==4.0.0, a few problems arise from the pyctcdecode dependency, which seems to be abandoned.

pyannote-audio==4.0.0 depends on numpy>=2.0.0, but the latest pyctcdecode==0.5.0 (from January 2023) depends on numpy<2.0.0. A PR for numpy 2.0 support has been ignored since February (kensho-technologies/pyctcdecode/pull/116). The restriction of numpy<2.0.0 is arbitrary and was set way before numpy 2.0.0 was announced.
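For anyone who wants to confirm the cap in their own environment, here is a minimal sketch that reads a distribution's declared requirements and reports any exclusive upper bound on numpy. The helper names `numpy_upper_bound` and `installed_numpy_cap` are made up for illustration, and the regex only handles simple specifiers such as `numpy<2.0.0,>=1.15.0`; it is not a full requirement parser.

```python
import re
from importlib import metadata


def numpy_upper_bound(requirements):
    """Return the exclusive '<' bound declared on numpy, or None if there is none.

    `requirements` is a list of requirement strings (or None), in the form
    returned by importlib.metadata.requires().
    """
    for req in requirements or []:
        # Drop any environment marker, e.g. '; python_version < "3.9"'.
        spec = req.split(";", 1)[0].strip()
        # Match the numpy distribution only, not e.g. numpy-financial.
        match = re.match(r"^numpy(?![\w.-])(.*)$", spec, flags=re.IGNORECASE)
        if not match:
            continue
        # Look for a strict '<' specifier; '<=' is deliberately skipped.
        bound = re.search(r"<(?!=)\s*([0-9][0-9.]*)", match.group(1))
        if bound:
            return bound.group(1)
    return None


def installed_numpy_cap(dist="pyctcdecode"):
    """Hypothetical convenience wrapper: inspect an installed distribution."""
    try:
        return numpy_upper_bound(metadata.requires(dist))
    except metadata.PackageNotFoundError:
        return None
```

If pyctcdecode 0.5.0 is installed, `installed_numpy_cap()` should report the `2.0.0` bound that the resolver complains about; on a fork with the restriction removed it would return `None`.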

Another problem arises because one of the last commits to pyctcdecode changes the decoder's output format from a tuple to a dataclass, making it incompatible with the current transformers ASR pipeline code.
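One way the consuming code could tolerate both shapes is a small normalization step. The sketch below is an illustration only: `ExampleBeam` is a hypothetical stand-in, not the actual pyctcdecode class, and the approach assumes the dataclass declares its fields in the same order as the old tuple.

```python
import dataclasses
from typing import Any, Tuple


@dataclasses.dataclass
class ExampleBeam:
    # Hypothetical stand-in for the decoder's newer dataclass output;
    # the real class and its field names may differ.
    text: str
    logit_score: float


def beam_as_tuple(beam: Any) -> Tuple[Any, ...]:
    """Return a beam as a plain tuple, whether it arrives as a tuple or a dataclass."""
    if dataclasses.is_dataclass(beam) and not isinstance(beam, type):
        # Read fields in declaration order, without recursing into the values.
        return tuple(getattr(beam, f.name) for f in dataclasses.fields(beam))
    # Already tuple-like (the older output format).
    return tuple(beam)
```

With this in place, code that unpacks beams positionally keeps working for either format, at the cost of relying on field order rather than field names.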

In other words, the only way to currently use the new pyannote-audio==4.0.0 speaker diarization library together with a Wav2Vec2ProcessorWithLM is to fork pyctcdecode, revert the main branch to the 0.5.0 state, and then remove the numpy<2.0.0 restriction.


```
user@host ~> uv pip install "transformers[torch-speech]==4.56.2" pyannote-audio==4.0.0
  × No solution found when resolving dependencies:
  ╰─▶ Because transformers[torch-speech]==4.56.2 depends on pyctcdecode>=0.4.0 and pyctcdecode>=0.4.0 depends on numpy>=1.15.0,<2.0.0, we can conclude that transformers[torch-speech]==4.56.2 depends on numpy>=1.15.0,<2.0.0. (1)

      Because pyannote-core==6.0.1 depends on numpy>=2.0 and only pyannote-core<=6.0.1 is available, we can conclude that pyannote-core>=6.0.1 depends on numpy>=2.0.
      And because pyannote-audio==4.0.0 depends on pyannote-core>=6.0.1, we can conclude that pyannote-audio==4.0.0 depends on numpy>=2.0.
      And because we know from (1) that transformers[torch-speech]==4.56.2 depends on numpy>=1.15.0,<2.0.0, we can conclude that pyannote-audio==4.0.0 and transformers[torch-speech]==4.56.2 are incompatible.
      And because you require transformers[torch-speech]==4.56.2 and pyannote-audio==4.0.0, we can conclude that your requirements are unsatisfiable.
```

### Reproduction

uv venv
uv pip install "transformers[torch-speech]==4.56.2" pyannote-audio==4.0.0

### Expected behavior

The packages should install together without a dependency conflict.
