This is an example of real-time speech recognition on audio captured from an audio output device, using the Yandex Cloud API.
INSTALLATION:
- install Python 3.10.11
- git clone https://github.com/attyru/ya_cloud_api_demo
- pip install yandex-speechkit pyaudiowpatch argparse grpcio
- cd ./ya_cloud_api_demo
- python -m main --secret your_yandex_api_key_here
USE:
- Select an output device from the list, or view the device indexes with the argument --list_only True and then run with the argument --device N.
- You can view the text recognized from the audio output in the console, in the GUI widget, and in the log file.
Command-line arguments:
- --secret: your API key or IAM token
- --log: path to the log file for recognized text (default: ./recognition_log.txt)
- --duration: session duration in seconds (default: 300)
- --device: forced device number (default: None)
- --list: true or false (default: false)
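As a rough illustration of how the arguments above fit together, here is a minimal argparse sketch. It is not the parser from main.py: the function names build_parser and str_to_bool are hypothetical, and treating --secret as optional and --list as a true/false string are assumptions made for this example only. The flag names and defaults are the ones listed above.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret common true/false spellings passed on the command line."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Recognize speech from an audio output device."
    )
    # Assumption: the key is optional here so that listing devices works without it.
    parser.add_argument("--secret", default=None,
                        help="Yandex API key or IAM token")
    parser.add_argument("--log", default="./recognition_log.txt",
                        help="path to the log file for recognized text")
    parser.add_argument("--duration", type=int, default=300,
                        help="session duration in seconds")
    parser.add_argument("--device", type=int, default=None,
                        help="forced device number")
    parser.add_argument("--list", type=str_to_bool, default=False,
                        help="true or false")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--secret", "your_yandex_api_key_here", "--duration", "60"]
)
print(args.duration, args.log)
```

Passing an explicit list to parse_args keeps the example independent of how the script is launched; a real entry point would normally call parse_args() with no arguments so that it reads the actual command line.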
Known issues:
- The text is recognized in parts, with each part repeating the previous ones, so the output looks cluttered. I'm working on a fix.
- Mixed languages are not recognized, only Russian. Done.
- The widget does not have the ability to interactively resize the window. Done.
- The close button on the widget does not work correctly - it closes the widget but does not terminate the process.
- The minimize button on the widget throws an exception. Done.
Planned features:
- Recognizing speaker identity from local samples.
- Support for the Google API and for engines based on OpenAI's 'whisper' library.