A production-ready voice assistant application powered by Google Gemini AI, Google Speech Recognition, and Microsoft Edge Text-to-Speech. Users can interact via text or voice input and receive intelligent spoken responses.
- Voice Input - Speak your questions using your microphone
- Text Input - Type messages for quick interactions
- Voice Output - Get spoken responses from the assistant
- Dark/Light Theme - Toggle between themes for comfort
- Multiple Voices - Choose from various voice options
- AI-Powered - Leverages Google Gemini for intelligent responses
- Responsive Design - Works on desktop and mobile devices
- Production-Ready - Proper logging, error handling, and configuration management
- Python 3.8+
- Flask 3.0.0 - Web framework
- Google Gemini API - AI processing and natural language understanding
- Google Speech Recognition - Speech-to-text conversion
- Microsoft Edge TTS - Text-to-speech synthesis
- Gunicorn - Production WSGI server
- HTML5 - Semantic markup
- CSS3 - Modern styling with custom properties
- JavaScript - Interactive features and API communication
- Bootstrap 5 - UI framework
- Font Awesome - Icons
- Python 3.8 or higher
- pip (Python package manager)
- A Google Gemini API key (available from Google AI Studio)
- A modern web browser with microphone support
- Clone the repository
```bash
git clone https://github.com/hasancoded/voice-assistant.git
cd voice-assistant
```

- Create and activate virtual environment
```bash
# Create virtual environment
python -m venv venv

# Activate it
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```

- Install dependencies
```bash
pip install -r requirements.txt
```

Note: On some systems, you may need to install additional dependencies for PyAudio:

- Ubuntu/Debian: `sudo apt-get install portaudio19-dev python3-pyaudio`
- macOS: `brew install portaudio`
- Windows: PyAudio binaries are available via pip
- Configure environment variables
```bash
# Copy the example file
cp .env.example .env

# Edit .env and add your Google Gemini API key
GEMINI_API_KEY=your_actual_gemini_api_key_here
```

- Run the application

```bash
python -m src.app
```

The application will start on http://localhost:8080
- Open in browser
Navigate to http://localhost:8080 and allow microphone access when prompted.
```bash
# Build the Docker image
docker build -t voice-assistant .

# Run the container
docker run -p 8080:8080 -e GEMINI_API_KEY=your_key voice-assistant
```

```
voice-assistant/
├── src/
│   ├── app.py                 # Main Flask application
│   ├── config.py              # Configuration management
│   ├── routes/
│   │   ├── main.py            # Main page and health check routes
│   │   └── api.py             # API endpoints
│   ├── services/
│   │   ├── ai_service.py      # Google Gemini integration
│   │   ├── speech_service.py  # Speech-to-text service
│   │   └── tts_service.py     # Text-to-speech service
│   └── utils/
│       └── logger.py          # Logging configuration
├── templates/
│   └── index.html             # Main HTML page
├── static/
│   ├── css/
│   │   └── style.css          # Styling
│   └── js/
│       └── script.js          # Frontend JavaScript
├── tests/                     # Test files
├── scripts/                   # Utility scripts
├── logs/                      # Application logs
├── requirements.txt           # Python dependencies
├── Dockerfile                 # Docker configuration
├── LICENSE                    # MIT License
└── README.md                  # This file
```
Text input:

- Type your message in the input box
- Press Enter or click the send button
- Wait for the AI response

Voice input:

- Click the microphone button
- Speak your question clearly
- Click the stop button (or wait)
- The assistant will transcribe and respond (the transcription step is sketched below)
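Behind the scenes, the recording is sent to the backend and transcribed with Google Speech Recognition. The project's `speech_service.py` is not reproduced in this README; the snippet below is only a rough sketch of that transcription step, assuming the SpeechRecognition package and a WAV file (the helper name and file name are illustrative):

```python
# Illustrative sketch of the transcription step (not the project's actual code).
# Assumes the SpeechRecognition package and a WAV recording on disk.
import speech_recognition as sr

def transcribe_wav(path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    try:
        return recognizer.recognize_google(audio)  # free Google Web Speech API
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
    except sr.RequestError as exc:
        raise RuntimeError(f"Speech recognition request failed: {exc}")

if __name__ == "__main__":
    print(transcribe_wav("question.wav"))
```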
Use the dropdown menu in the header to select different voice options:
- Default (US Female)
- Emily (US Female)
- Michael (US Male)
- James (UK Male)
- Allison (US Female)
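These options are synthesized with Microsoft Edge TTS on the backend. How `tts_service.py` maps the dropdown names to Edge voices is not shown in this README; the sketch below only illustrates the general `edge-tts` flow, with a voice name chosen purely for illustration:

```python
# Minimal sketch of Edge TTS synthesis with the edge-tts package.
# The voice name and output file are illustrative, not the app's actual mapping.
import asyncio
import edge_tts

async def synthesize(text: str, voice: str = "en-US-AriaNeural") -> None:
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save("reply.mp3")  # write synthesized speech to disk

asyncio.run(synthesize("Hello! How can I help you today?"))
```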
Click the moon/sun icon in the header to switch between light and dark themes.
Serves the main application page.
Health check endpoint for monitoring.
Response:

```json
{
  "status": "healthy",
  "service": "voice-assistant"
}
```

Convert speech audio to text.
Request: Binary audio data (WAV format)
Response:

```json
{
  "text": "transcribed text"
}
```
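As a hedged example of calling this endpoint from Python, assuming it is exposed at `/speech-to-text` (check `src/routes/api.py` for the actual route and expected content type):

```python
# Hedged example: POST a WAV recording to the speech-to-text endpoint.
# The "/speech-to-text" path is an assumption; see src/routes/api.py.
import requests

with open("question.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/speech-to-text",
        data=f.read(),
        headers={"Content-Type": "audio/wav"},
    )
resp.raise_for_status()
print(resp.json()["text"])
```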
Process user message and generate AI response with speech.

Request:

```json
{
  "userMessage": "your message here",
  "voice": "en-US_EmilyV3Voice"
}
```

Response:
```json
{
  "openaiResponseText": "AI response text",
  "openaiResponseSpeech": "base64_encoded_audio"
}
```
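A hedged example of calling this endpoint from Python, assuming it is exposed at `/process-message` (check `src/routes/api.py` for the actual route) and that the returned audio can be written out as an MP3:

```python
# Hedged example: send a text message and save the spoken reply.
# The "/process-message" path and the .mp3 extension are assumptions.
import base64
import requests

resp = requests.post(
    "http://localhost:8080/process-message",
    json={"userMessage": "What is the weather like on Mars?",
          "voice": "en-US_EmilyV3Voice"},
)
resp.raise_for_status()
body = resp.json()
print(body["openaiResponseText"])

# The speech comes back base64-encoded; decode it to play or save it.
with open("reply.mp3", "wb") as f:
    f.write(base64.b64decode(body["openaiResponseSpeech"]))
```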
All configuration can be managed through environment variables in `.env`:

```bash
# AI Service Configuration
GEMINI_API_KEY=your_gemini_api_key_here

# Server Configuration
FLASK_ENV=production
FLASK_DEBUG=False
PORT=8080
HOST=0.0.0.0

# Logging
LOG_LEVEL=INFO
```
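The project's `src/config.py` is not reproduced here, but a typical way to load these variables with `python-dotenv` looks roughly like the following (only the variable names come from the list above; the class layout is an assumption):

```python
# Hypothetical sketch of how src/config.py might read the variables above.
# Uses python-dotenv; the actual class layout may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # pull values from .env into the process environment

class Config:
    GEMINI_API_KEY = os.getenv("GEMINI_API_KEY", "")
    FLASK_ENV = os.getenv("FLASK_ENV", "production")
    FLASK_DEBUG = os.getenv("FLASK_DEBUG", "False").lower() == "true"
    PORT = int(os.getenv("PORT", "8080"))
    HOST = os.getenv("HOST", "0.0.0.0")
    LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```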
Ensure the virtual environment is activated and dependencies are installed:

```bash
pip install -r requirements.txt
```

If the microphone isn't working:

- Check browser permissions (click the lock icon in the address bar)
- Use HTTPS or localhost (required for microphone access)
- Try a different browser (Chrome/Edge recommended)
If Gemini API calls fail:

- Verify your API key in `.env` is correct
- Check you have API access enabled in Google AI Studio
- Ensure you're using a valid model name (a quick standalone check is sketched below)
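A quick way to check the key and model outside the app is a standalone call with the `google-generativeai` package (the model name below is only an example):

```python
# Standalone check of the Gemini API key, independent of the app.
# Assumes the google-generativeai package; the model name is an example.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Say hello in one sentence.").text)
```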
If port 8080 is already in use, change the port in `.env`:

```bash
PORT=8081
```

- Never commit your `.env` file - it contains sensitive API keys
- Keep your Google Gemini API key secure
- Use environment variables for all secrets
- Be mindful of API usage costs
- Implement rate limiting for production use (see the sketch below)
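Rate limiting is not described as part of this project; one common way to bolt it onto a Flask app is Flask-Limiter, roughly as sketched below (the limits and route are illustrative):

```python
# Hedged sketch: per-client rate limiting with Flask-Limiter.
# The limits and route wiring are illustrative, not part of this project.
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(get_remote_address, app=app, default_limits=["60 per minute"])

@app.route("/ping")
@limiter.limit("10 per minute")  # stricter limit for this route
def ping():
    return {"status": "ok"}
```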
Be aware of API usage costs:
- Google Gemini - Free tier available, check current pricing at https://ai.google.dev/pricing
- Google Speech Recognition - Free tier available for limited usage
- Microsoft Edge TTS - Free service
Monitor your usage in the Google AI Studio dashboard.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues:
- Check the Troubleshooting section
- Review the logs in `logs/app.log`
- Check the browser console for JavaScript errors (F12)
- Open an issue on GitHub
Professional voice assistant powered by Google Gemini AI