
This is an example of real-time speech recognition of audio captured from an audio output device, using the Yandex Cloud API.

INSTALLATION:

Install Python 3.10.11, then run:
git clone https://github.com/attyru/ya_cloud_api_demo
pip install yandex-speechkit pyaudiowpatch argparse grpcio
cd ./ya_cloud_api_demo
python -m main --secret your_yandex_api_key_here

USE:

Select the output device from the list, or view the device indexes with the argument --list_only True and then run with the argument --device N.
You can view the text recognized from the audio output in the console, in the GUI widget, and in the log file.

Command-line arguments:
--secret    your API key or IAM token
--log       path to the log file for recognized text (default: ./recognition_log.txt)
--duration  session duration in seconds (default: 300)
--device    forced device number (default: None)
--list      true or false (default: false)
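
The arguments listed above could be declared with argparse roughly as follows. This is a minimal sketch and not the actual code in main.py: the helper names str_to_bool and build_parser, the help texts, and the list_only destination are assumptions made for illustration; only the flag names and defaults come from the list above.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert 'true'/'false' text into a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Recognize speech from an audio output device."
    )
    parser.add_argument("--secret", required=True,
                        help="API key or IAM token")
    parser.add_argument("--log", default="./recognition_log.txt",
                        help="path to the log file for recognized text")
    parser.add_argument("--duration", type=int, default=300,
                        help="session duration in seconds")
    parser.add_argument("--device", type=int, default=None,
                        help="forced device number")
    # Assumption: the flag takes an explicit true/false value, as documented.
    parser.add_argument("--list", type=str_to_bool, default=False,
                        dest="list_only",
                        help="only list the available devices")
    return parser


# Parse an explicit argument list so the sketch runs without a real key.
args = build_parser().parse_args(
    ["--secret", "your_yandex_api_key_here", "--duration", "60"]
)
print(args.duration, args.log, args.device, args.list_only)
```

Passing the boolean as text keeps the sketch close to the documented "true or false" form; a store_true flag would be the more common argparse idiom if the interface were free to change.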

Known issues:

The text is recognized in parts, each one taking the previous ones into account, so the output looks ugly. I'm working on a fix.
Mixed languages are not recognized, only Russian. Done.
The widget window cannot be resized interactively. Done.
The close button on the widget does not work correctly: it closes the widget but does not terminate the process.
The minimize button on the widget throws an exception. Done.
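
One possible way to tidy the partial-results output from the first issue above is to print only the words that are new in each partial result. This is a hedged sketch and not the author's fix: it assumes each partial result is a string that repeats the earlier text and appends new words, and the names new_words and stream_increments are hypothetical.

```python
from typing import Iterable, Iterator


def new_words(previous: str, current: str) -> str:
    """Return the words in `current` that extend `previous`."""
    prev_words = previous.split()
    cur_words = current.split()
    if cur_words[:len(prev_words)] == prev_words:
        return " ".join(cur_words[len(prev_words):])
    # The recognizer revised earlier words, so fall back to the full text.
    return current


def stream_increments(partials: Iterable[str]) -> Iterator[str]:
    """Yield only the newly recognized words for each partial result."""
    previous = ""
    for text in partials:
        delta = new_words(previous, text)
        if delta:
            yield delta
        previous = text


for chunk in stream_increments(["hello", "hello world", "hello world again"]):
    print(chunk)
```

Comparing whole words rather than raw characters avoids printing a stray fragment when the recognizer corrects the last word of a partial result.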

Planned features:

Recognizing speaker identity from local samples.
Support for the Google API and for engines based on OpenAI's 'whisper' library.
