PySpeech is a Python script that uses the Vosk speech recognition toolkit to transcribe real-time audio from your microphone.
This script initializes a Vosk model and a PyAudio stream to capture audio from your microphone. It then uses the Vosk recognizer to transcribe the audio offline and in real time, printing both partial and final transcriptions to the console.
- The script uses the Vosk speech recognition toolkit.
- It captures audio using PyAudio.
- Real-time transcription is printed to the console.
- It has only been tested on Windows, but it should also work on Linux and macOS.
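The capture-and-recognize flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of PySpeech.py: the names `iter_transcripts` and `listen`, the chunk size of 4096 frames, the 16-bit sample format, and the `model` directory path are all assumptions made for the example. It relies on the Vosk recognizer returning JSON strings with a `text` key for final results and a `partial` key for partial results.

```python
import json


def iter_transcripts(chunks, recognizer):
    """Yield ("partial", text) or ("final", text) for each audio chunk.

    `recognizer` is expected to behave like vosk.KaldiRecognizer.
    """
    for data in chunks:
        if recognizer.AcceptWaveform(data):
            # End of an utterance: Result() returns JSON like {"text": "..."}.
            text = json.loads(recognizer.Result()).get("text", "")
            kind = "final"
        else:
            # Mid-utterance: PartialResult() returns JSON like {"partial": "..."}.
            text = json.loads(recognizer.PartialResult()).get("partial", "")
            kind = "partial"
        if text:  # skip silence, where the recognizer reports an empty string
            yield kind, text


def listen(model_path="model", rate=16000, chunk_frames=4096):
    """Transcribe the default microphone (hypothetical helper, not PySpeech's own)."""
    # Imported lazily so iter_transcripts can be used without audio hardware.
    import pyaudio
    from vosk import KaldiRecognizer, Model

    recognizer = KaldiRecognizer(Model(model_path), rate)
    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=chunk_frames)

    def microphone():
        # Endless stream of raw 16-bit mono chunks from the input device.
        while True:
            yield stream.read(chunk_frames, exception_on_overflow=False)

    print("Listening...")
    try:
        for kind, text in iter_transcripts(microphone(), recognizer):
            prefix = "You said: " if kind == "final" else "Partial: "
            print(prefix + text)
    finally:
        # Release the audio device even if the loop is interrupted.
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

Keeping the recognition loop in a generator that accepts any iterable of byte chunks means it can be exercised with recorded audio or a stand-in recognizer, without a microphone attached. Calling `listen("path/to/model")` would start live transcription, assuming a Vosk model has been unpacked at that path.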
- Clone the repository:

      git clone https://github.com/Nenotriple/PySpeech.git
      cd PySpeech
- Create and activate a virtual environment:

      python -m venv venv
      venv\Scripts\activate
- Install the required libraries:

      pip install -r requirements.txt
- Run the script:

      python PySpeech.py
- Speak into your microphone:
  - The script will print "Listening..." and start transcribing your speech.
  - Partial transcriptions will be printed as "Partial: ...".
  - Final transcriptions will be printed as "You said: ...".
- Stop the script:
  - Press Ctrl+C to stop the script. The script will handle the interrupt and close the audio stream gracefully. Alternatively, close the terminal.
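The graceful shutdown mentioned in the last step is typically done by catching `KeyboardInterrupt` around the main loop and releasing the audio resources in a `finally` block. The sketch below shows that pattern under stated assumptions: `run_until_interrupted` and `close_audio` are hypothetical helper names, the "Stopping..." message is invented for the example, and the real script may structure its cleanup differently.

```python
def run_until_interrupted(chunks, handle, cleanup):
    """Feed chunks to handle() until Ctrl+C, then always run cleanup().

    Returns the number of chunks handled before stopping.
    """
    handled = 0
    try:
        for data in chunks:
            handle(data)
            handled += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread; catching it
        # lets the script exit without printing a traceback.
        print("\nStopping...")
    finally:
        # Runs on normal exit, on Ctrl+C, and on any other error.
        cleanup()
    return handled


def close_audio(stream, audio):
    """Release a PyAudio stream and the PyAudio instance that opened it."""
    stream.stop_stream()
    stream.close()
    audio.terminate()
```

With a live PyAudio stream, the cleanup argument could be `lambda: close_audio(stream, audio)`, so the microphone is released whether the loop ends normally or is interrupted.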
- Ensure your microphone is properly configured and accessible by PyAudio.
- The microphone should be set as your system's default input device.
- The script is configured to use a sample rate of 16000 Hz and a single audio channel.
- Additional Vosk models can be found here: https://alphacephei.com/vosk/models
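To confirm that PyAudio can see a default microphone and that it accepts the 16000 Hz mono format noted above, a small check along these lines may help. It is a sketch, not part of PySpeech: the function names are hypothetical, and the 16-bit sample format (`paInt16`) is an assumption about how the stream is opened.

```python
def describe_device(info):
    """Summarise a PyAudio device-info dict in one line."""
    return f"{info['name']} ({info['maxInputChannels']} input channel(s))"


def default_microphone_report(rate=16000, channels=1):
    """Return (description, supported) for the default input device.

    description is None when PyAudio reports no default input device.
    """
    import pyaudio  # imported lazily; only needed for the live check

    audio = pyaudio.PyAudio()
    try:
        try:
            info = audio.get_default_input_device_info()
        except OSError:
            # PyAudio raises IOError/OSError when no default input exists.
            return None, False
        try:
            supported = audio.is_format_supported(
                rate,
                input_device=info["index"],
                input_channels=channels,
                input_format=pyaudio.paInt16,
            )
        except ValueError:
            # is_format_supported raises ValueError for unsupported formats.
            supported = False
        return describe_device(info), bool(supported)
    finally:
        audio.terminate()
```

Running `print(default_microphone_report())` before starting the script shows which device would be used and whether it accepts the requested format.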
