simple-g2p

Grapheme-to-phoneme (G2P) conversion predicts the phonetic pronunciation of a written word, a task essential to text-to-speech and speech recognition systems. This project implements an LSTM encoder–decoder model, trained on CMUdict, that converts grapheme sequences into phoneme sequences.
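
To illustrate the task: CMUdict stores each entry as a word followed by its ARPAbet phoneme sequence, separated by two spaces. The small helper below is a sketch for parsing that format, not code from this repository.

```python
def parse_cmudict_line(line: str) -> tuple[str, list[str]]:
    """Split a CMUdict entry into its word and ARPAbet phoneme sequence."""
    word, phones = line.strip().split("  ", 1)
    return word, phones.split(" ")

word, phones = parse_cmudict_line("PHONEME  F OW1 N IY0 M")
print(word, phones)  # PHONEME ['F', 'OW1', 'N', 'IY0', 'M']
```

The model's job is to learn this mapping directly from characters, so it can pronounce words that never appear in the dictionary.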

Results

LSTM

The encoder–decoder Bi-LSTM model uses Luong attention, scheduled sampling during training, and greedy decoding at inference. It achieves 77% sequence-level accuracy (a 23% word error rate) on the CMUdict test set. Beam search did not improve on greedy decoding when evaluated on the test set.
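
As a rough sketch of how these pieces fit together, the PyTorch module below combines a bidirectional LSTM encoder, a decoder step with Luong-style ("general") attention, and scheduled sampling inside the decoding loop. All names, dimensions, and the exact attention variant are assumptions for illustration, not the repository's actual implementation.

```python
# Minimal sketch of a Bi-LSTM encoder-decoder with Luong attention and
# scheduled sampling; hyperparameters and layer choices are assumptions.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class Seq2SeqG2P(nn.Module):
    def __init__(self, n_graphemes: int, n_phonemes: int,
                 emb_dim: int = 64, hid_dim: int = 128):
        super().__init__()
        self.enc_emb = nn.Embedding(n_graphemes, emb_dim)
        # Bidirectional LSTM encoder over grapheme embeddings.
        self.encoder = nn.LSTM(emb_dim, hid_dim,
                               bidirectional=True, batch_first=True)
        self.dec_emb = nn.Embedding(n_phonemes, emb_dim)
        # Decoder cell whose hidden size matches the concatenated
        # encoder directions (2 * hid_dim).
        self.decoder = nn.LSTMCell(emb_dim, 2 * hid_dim)
        # Luong "general" attention: score(h_t, h_s) = h_t^T W h_s.
        self.attn_w = nn.Linear(2 * hid_dim, 2 * hid_dim, bias=False)
        self.out = nn.Linear(4 * hid_dim, n_phonemes)

    def forward(self, graphemes: torch.Tensor, phonemes: torch.Tensor,
                teacher_forcing_p: float = 1.0) -> torch.Tensor:
        enc_out, _ = self.encoder(self.enc_emb(graphemes))  # (B, S, 2H)
        batch = graphemes.size(0)
        h = enc_out.new_zeros(batch, enc_out.size(-1))
        c = enc_out.new_zeros(batch, enc_out.size(-1))
        inp = phonemes[:, 0]  # <sos> token
        logits = []
        for t in range(1, phonemes.size(1)):
            h, c = self.decoder(self.dec_emb(inp), (h, c))
            # Attention weights over encoder states, then a context vector.
            scores = torch.bmm(self.attn_w(enc_out),
                               h.unsqueeze(-1)).squeeze(-1)       # (B, S)
            weights = F.softmax(scores, dim=-1).unsqueeze(1)      # (B, 1, S)
            ctx = torch.bmm(weights, enc_out).squeeze(1)          # (B, 2H)
            step_logits = self.out(torch.cat([h, ctx], dim=-1))
            logits.append(step_logits)
            # Scheduled sampling: feed the gold phoneme with probability
            # teacher_forcing_p, otherwise the model's own greedy prediction.
            if random.random() < teacher_forcing_p:
                inp = phonemes[:, t]
            else:
                inp = step_logits.argmax(dim=-1)
        return torch.stack(logits, dim=1)  # (B, T-1, n_phonemes)
```

In this sketch, training would anneal `teacher_forcing_p` from near 1.0 toward 0 so the decoder gradually learns to condition on its own outputs; with `teacher_forcing_p=0`, every step feeds back the argmax of the previous prediction, which is exactly the greedy decoding used at inference.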

Training Graphs

Loss vs Sequence Accuracy: Training loss plotted against sequence-level accuracy over the training epochs.

Loss vs Teacher Forcing: Training loss as the teacher-forcing probability is varied during scheduled sampling.
