Transform text into professional audio and video content using multiple AI models
A powerful web application that combines multiple AI language models (Claude, GPT-4, Gemini, Llama-2) with advanced text-to-speech and video generation services to create professional multimedia content from simple text inputs.
- Claude 3.5 Sonnet (Anthropic) - Advanced reasoning and natural language
- GPT-4 (OpenAI) - Industry-leading language model
- Gemini Pro (Google) - Multimodal AI capabilities
- Llama-2 (Meta via HuggingFace) - Open-source alternative
- OpenAI TTS - High-quality text-to-speech with multiple voices
- ElevenLabs - Premium voice synthesis with realistic intonation
- D-ID - AI avatar video generation with synchronized speech
- ✍️ Improve & Enhance - Polish and refine existing text
- 🎬 Video Script - Convert to professional script format
- 📖 Narration Style - Transform into storytelling narrative
- 🎙️ Podcast Intro - Create engaging podcast intros/outros
- 📚 Story Expansion - Expand ideas into full stories
- 💼 Professional Tone - Business-ready formal content
- 😊 Casual & Friendly - Conversational and approachable
- Glassmorphism design with animated gradients
- Responsive layout for all devices
- Real-time status indicators
- Intuitive user experience
- Dark mode optimized
Try it now: Your App URL
- Python 3.9 or higher
- pip package manager
- API keys for the services you want to use
- Clone the repository
git clone https://github.com/yourusername/multi-model-ai-agent.git
cd multi-model-ai-agent- Install dependencies
pip install -r requirements.txt- Run the application
streamlit run app.py- Open in browser
http://localhost:8501
- Visit console.anthropic.com
- Sign up for an account
- Navigate to API Keys section
- Create a new API key
- Pricing: Pay-as-you-go, ~$0.003 per 1K input tokens
- Visit platform.openai.com
- Create account and verify
- Generate API key
- Pricing: ~$0.03 per 1K tokens (GPT-4), $15 per 1M chars (TTS)
- Visit makersuite.google.com
- Sign in with Google account
- Create API key
- Pricing: Free tier available, then pay-as-you-go
- Visit huggingface.co/settings/tokens
- Create account
- Generate access token
- Pricing: Free tier available for inference API
- Same key as OpenAI GPT-4
- $15 per 1M characters
- Visit elevenlabs.io
- Sign up for account
- Get API key from profile
- Pricing: Free 10K chars/month, Starter $5/month (30K chars)
- Visit d-id.com
- Create account
- Access API credentials
- Pricing: Free trial, then $0.12-$0.30 per video
-
Configure API Keys
- Open sidebar
- Enter your Claude API key (required)
- Optionally add other LLM keys
- Select TTS provider and enter its key
-
Select Models
- Choose your LLM model (Claude, GPT-4, Gemini, or Llama-2)
- Select enhancement mode
- Pick TTS/Video provider and voice
-
Enter Content
- Type or paste your text (can be a topic or full content)
- Examples:
- "Create a professional introduction about AI"
- "Write a podcast intro about climate change"
- "Make a video script explaining blockchain"
-
Generate
- Click "Generate AI Content → Audio/Video"
- AI enhances your text
- Converts to audio or video
- Preview and download
- Start Simple: Begin with Claude + OpenAI TTS for best results
- Experiment: Try different enhancement modes for various styles
- Voice Selection: Test different voices to find the best fit
- Length: Keep text under 4,000 characters for optimal results
- Cost Control: Monitor API usage in respective dashboards
- Streamlit - Web application framework
- Custom CSS - Modern glassmorphism design
- HTML/CSS - Enhanced UI components
- Python 3.9+ - Core programming language
- Requests - HTTP library for API calls
- Base64 - Encoding utilities
- Anthropic Claude API - Text generation
- OpenAI API - GPT-4 & TTS
- Google AI API - Gemini
- HuggingFace API - Llama-2
- ElevenLabs API - Voice synthesis
- D-ID API - Video generation
multi-model-ai-agent/
│
├── app.py # Main Streamlit application
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── .gitignore # Git ignore rules
│
├── screenshots/ # App screenshots
│ ├── dashboard.png
│ ├── api-config.png
│ └── results.png
│
└── .streamlit/ # Streamlit configuration (optional)
└── config.toml
- ✅ No data storage - All processing happens in real-time
- ✅ Secure API calls - Direct HTTPS connections to providers
- ✅ Browser-only keys - API keys stored in browser session
- ✅ No logging - Your content is never saved or logged
- ✅ Open source - Fully transparent codebase
- Never commit API keys to Git
- Use environment variables for production
- Rotate keys regularly
- Monitor API usage dashboards
- Set spending limits on API accounts
- Push code to GitHub
- Visit share.streamlit.io
- Connect your repository
- Set main file to
app.py - Deploy!
# Create Procfile
echo "web: streamlit run app.py --server.port=$PORT" > Procfile
# Deploy
heroku create your-app-name
git push heroku mainFROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch
git checkout -b feature/AmazingFeature
- Commit your changes
git commit -m 'Add some AmazingFeature' - Push to the branch
git push origin feature/AmazingFeature
- Open a Pull Request
- Follow PEP 8 style guide
- Add comments for complex logic
- Test with multiple API providers
- Update documentation for new features
- Rate Limits: API providers have rate limits; add delays for bulk processing
- Cost: Using multiple premium APIs can accumulate costs
- D-ID Videos: May take 1-2 minutes to generate
- Text Length: Very long texts may hit token limits
- Browser Storage: API keys not persisted between sessions
- ✨ Initial release
- 🎨 Beautiful glassmorphism UI
- 🧠 Multi-model LLM support
- 🎵 Three TTS/Video providers
- 📚 Seven enhancement modes
- 🔒 Secure API handling
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Vivek YT
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Vivek YT
- GitHub: @yourusername
- YouTube: Your Channel
- Twitter: @yourhandle
- Email: your.email@example.com
- Anthropic for Claude AI
- OpenAI for GPT-4 and TTS
- Google for Gemini
- Meta AI for Llama-2
- ElevenLabs for voice synthesis
- D-ID for video generation
- Streamlit for the amazing framework
- All contributors and users
If you find this project helpful, please consider:
- ⭐ Starring the repository
- 🐛 Reporting bugs
- 💡 Suggesting new features
- 📢 Sharing with others
- ☕ Buy me a coffee
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@yourdomain.com
- Batch processing for multiple texts
- Custom voice cloning integration
- More LLM models (Cohere, AI21, etc.)
- Audio editing capabilities
- Video template selection
- Multi-language support
- API key encryption
- Usage analytics dashboard
- Export to multiple formats
- Webhook integrations


