StyleTTS2 Backporting by korakoe · Pull Request #1 · Respaired/StyleTTS_Accelerate

korakoe · 2024-12-22T11:15:11Z

Feel free to look at the changes and close

1. Separate the prosody and style encoders.
2. Implement diffusion (BERT embeddings still need work).
3. Do proper cross entropy instead of the STTS1 way of doing it: use `s2s_attn_feat` (now just `s2s_attn`, as in STTS2) directly.
4. Add a `max_len` parameter. It may not be needed, but it was added in case we encounter a massive audio file (simply setting it to a very large number will have the same effect).
5. Alter the dataloader to support `ref_mels` and waves. Waves are unnecessary, since they are only used for the SLM and decoder losses and STTS1 isn't end to end, but they are kept in just in case.

TODO: Add accelerate to the second stage

TODO: Un-hardcode all data params (sr, hop, n_fft, etc.)

TODO: BERT embeddings need to somehow be replaced by something else
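To make the `max_len` note concrete, here is a minimal sketch of how such a cap could behave. It is not the code from this PR: the helper name `crop_to_max_len`, the frame-based windowing, and the choice of one shared window length per batch are all assumptions made for illustration. It shows that a small `max_len` limits the training window, while a very large value leaves the batch bounded only by its shortest item.

```python
import random
from typing import List, Optional, Sequence, Tuple


def crop_to_max_len(
    mel_lengths: Sequence[int],
    max_len: int,
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: pick one (start, end) frame window per utterance.

    The window length is shared across the batch and is the shortest
    utterance length, capped at max_len (both measured in mel frames).
    """
    rng = rng or random.Random()
    window = min(min(mel_lengths), max_len)
    windows = []
    for n_frames in mel_lengths:
        # randint is inclusive on both ends, so the window always fits.
        start = rng.randint(0, n_frames - window)
        windows.append((start, start + window))
    return windows


if __name__ == "__main__":
    lengths = [120, 300, 95]
    # A small cap limits every window to 80 frames.
    print(crop_to_max_len(lengths, max_len=80, rng=random.Random(0)))
    # A very large cap is a no-op: the shortest item (95 frames) decides.
    print(crop_to_max_len(lengths, max_len=10**9, rng=random.Random(0)))
```

Under these assumptions, a single unusually long file cannot inflate memory use, because every item is cropped to the same bounded window.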

korakoe added 6 commits December 18, 2024 13:45

Resolve data loader issues

19c3c79

swap to t_en

Get ready for small training run

c0ff8f0

Merge pull request #1 from ShoukanLabs/STTS2-backport

91cbe74

STTS2 backport

embedding stuff for diffusion

671fc87

Merge pull request #2 from ShoukanLabs/STTS2-backport

acced50

Data and embedding changes

korakoe wants to merge 6 commits into Respaired:main from ShoukanLabs:main