StyleTTS2 Backporting #1

Open
korakoe wants to merge 6 commits into Respaired:main from ShoukanLabs:main

korakoe commented Dec 22, 2024

Feel free to look at the changes and close

1. separate prosody and style encoders

2. implement diffusion (BERT embeddings still need work)

3. compute cross entropy properly instead of the STTS1 way, using `s2s_attn_feat` (now just `s2s_attn`, as in STTS2) directly

4. add a `max_len` parameter; it may not be needed, but it is there in case we encounter a massive audio file (setting it to a very large number has the same effect as not having it)

5. alter the dataloader to support `ref_mels` and waves (waves are unnecessary, since they are only used for the SLM and decoder losses and STTS1 isn't end to end, but they are kept in just in case)
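
To illustrate the `max_len` idea from item 4, here is a minimal sketch of cropping a sequence of frames to a random window. It is not the code in this PR: `crop_to_max_len` is a hypothetical helper, it works on plain Python sequences rather than tensors, and picking the window at random is an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple


def crop_to_max_len(
    frames: Sequence,
    max_len: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List, int]:
    """Crop `frames` to at most `max_len` items (hypothetical helper).

    Returns the cropped frames and the start index of the window.
    Sequences that already fit are returned unchanged with start 0.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    n = len(frames)
    if n <= max_len:
        # A very large max_len always lands here, so nothing is cropped.
        return list(frames), 0

    picker = rng if rng is not None else random
    # randint is inclusive on both ends, so the window always fits.
    start = picker.randint(0, n - max_len)
    return list(frames[start:start + max_len]), start


if __name__ == "__main__":
    mel_frames = list(range(1000))  # stand-in for 1000 mel frames
    cropped, start = crop_to_max_len(mel_frames, 400, random.Random(0))
    print(len(cropped), start)
```

With this shape, passing a very large `max_len` leaves every sample untouched, which matches the note in item 4.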

TODO: Add accelerate to second stage

TODO: Un-hardcode all data params (sr, hop, n_fft, etc)

TODO: BERT embeddings need to somehow be replaced by something else
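
For the "un-hardcode all data params" TODO, one possible shape is a small config object that the dataloader and the mel extraction both read from. This is only a sketch: `AudioParams`, its field names, and the default values are placeholders rather than values confirmed from this repo, and the frame count assumes a centered STFT.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AudioParams:
    """Hypothetical container for the audio settings (placeholder defaults)."""

    sr: int = 24000
    n_fft: int = 2048
    win_length: int = 1200
    hop_length: int = 300
    n_mels: int = 80

    @classmethod
    def from_config(cls, cfg: Mapping[str, Any]) -> "AudioParams":
        """Build from a config mapping, rejecting keys we do not know."""
        known = {f.name for f in fields(cls)}
        unknown = set(cfg) - known
        if unknown:
            raise KeyError(f"unknown audio params: {sorted(unknown)}")
        return cls(**cfg)

    def frames_for(self, n_samples: int) -> int:
        """Number of STFT frames for a waveform, assuming centered framing."""
        return n_samples // self.hop_length + 1


if __name__ == "__main__":
    # e.g. a section loaded from the training config file
    params = AudioParams.from_config({"sr": 16000, "hop_length": 200})
    print(params, params.frames_for(16000))
```

Rejecting unknown keys keeps a typo in the config from silently falling back to a default.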