🎙️ Multi-Model AI Agent

Transform text into professional audio and video content using multiple AI models

A powerful web application that combines multiple AI language models (Claude, GPT-4, Gemini, Llama-2) with advanced text-to-speech and video generation services to create professional multimedia content from simple text inputs.

🌟 Features

🧠 Multi-Model Text Generation

Claude 3.5 Sonnet (Anthropic) - Advanced reasoning and natural language
GPT-4 (OpenAI) - Industry-leading language model
Gemini Pro (Google) - Multimodal AI capabilities
Llama-2 (Meta via HuggingFace) - Open-source alternative

🎵 Professional Audio/Video Output

OpenAI TTS - High-quality text-to-speech with multiple voices
ElevenLabs - Premium voice synthesis with realistic intonation
D-ID - AI avatar video generation with synchronized speech

✨ Smart Enhancement Modes

✍️ Improve & Enhance - Polish and refine existing text
🎬 Video Script - Convert to professional script format
📖 Narration Style - Transform into storytelling narrative
🎙️ Podcast Intro - Create engaging podcast intros/outros
📚 Story Expansion - Expand ideas into full stories
💼 Professional Tone - Business-ready formal content
😊 Casual & Friendly - Conversational and approachable

🎨 Beautiful Modern UI

Glassmorphism design with animated gradients
Responsive layout for all devices
Real-time status indicators
Intuitive user experience
Dark mode optimized

🚀 Live Demo

Try it now: Your App URL

📸 Screenshots

Main Dashboard

API Configuration

Generated Content

🛠️ Installation

Prerequisites

Python 3.9 or higher
pip package manager
API keys for the services you want to use

Local Setup

Clone the repository

git clone https://github.com/yourusername/multi-model-ai-agent.git
cd multi-model-ai-agent

Install dependencies

pip install -r requirements.txt

Run the application

streamlit run app.py

Open in browser

http://localhost:8501

🔑 Getting API Keys

Required (Choose at least one LLM):

Claude (Recommended)

Visit console.anthropic.com
Sign up for an account
Navigate to API Keys section
Create a new API key
Pricing: Pay-as-you-go, ~$0.003 per 1K input tokens

OpenAI

Visit platform.openai.com
Create account and verify
Generate API key
Pricing: ~$0.03 per 1K tokens (GPT-4), $15 per 1M chars (TTS)

Google Gemini

Visit makersuite.google.com
Sign in with Google account
Create API key
Pricing: Free tier available, then pay-as-you-go

HuggingFace

Visit huggingface.co/settings/tokens
Create account
Generate access token
Pricing: Free tier available for inference API

Audio/Video (Choose one):

OpenAI TTS

Same key as OpenAI GPT-4
$15 per 1M characters

ElevenLabs

Visit elevenlabs.io
Sign up for account
Get API key from profile
Pricing: Free 10K chars/month, Starter $5/month (30K chars)

D-ID

Visit d-id.com
Create account
Access API credentials
Pricing: Free trial, then $0.12-$0.30 per video

📋 Usage Guide

Basic Workflow

Configure API Keys
- Open sidebar
- Enter your Claude API key (required)
- Optionally add other LLM keys
- Select TTS provider and enter its key
Select Models
- Choose your LLM model (Claude, GPT-4, Gemini, or Llama-2)
- Select enhancement mode
- Pick TTS/Video provider and voice
Enter Content
- Type or paste your text (can be a topic or full content)
- Examples:
  - "Create a professional introduction about AI"
  - "Write a podcast intro about climate change"
  - "Make a video script explaining blockchain"
Generate
- Click "Generate AI Content → Audio/Video"
- AI enhances your text
- Converts to audio or video
- Preview and download

Pro Tips

Start Simple: Begin with Claude + OpenAI TTS for best results
Experiment: Try different enhancement modes for various styles
Voice Selection: Test different voices to find the best fit
Length: Keep text under 4,000 characters for optimal results
Cost Control: Monitor API usage in respective dashboards

💻 Tech Stack

Frontend

Streamlit - Web application framework
Custom CSS - Modern glassmorphism design
HTML/CSS - Enhanced UI components

Backend

Python 3.9+ - Core programming language
Requests - HTTP library for API calls
Base64 - Encoding utilities

AI/ML Services

Anthropic Claude API - Text generation
OpenAI API - GPT-4 & TTS
Google AI API - Gemini
HuggingFace API - Llama-2
ElevenLabs API - Voice synthesis
D-ID API - Video generation

📁 Project Structure

multi-model-ai-agent/
│
├── app.py                 # Main Streamlit application
├── requirements.txt       # Python dependencies
├── README.md             # Project documentation
├── .gitignore            # Git ignore rules
│
├── screenshots/          # App screenshots
│   ├── dashboard.png
│   ├── api-config.png
│   └── results.png
│
└── .streamlit/           # Streamlit configuration (optional)
    └── config.toml

🔒 Security & Privacy

✅ No data storage - All processing happens in real-time
✅ Secure API calls - Direct HTTPS connections to providers
✅ Browser-only keys - API keys stored in browser session
✅ No logging - Your content is never saved or logged
✅ Open source - Fully transparent codebase

Best Practices

Never commit API keys to Git
Use environment variables for production
Rotate keys regularly
Monitor API usage dashboards
Set spending limits on API accounts

🚢 Deployment

Streamlit Cloud (Recommended)

Push code to GitHub
Visit share.streamlit.io
Connect your repository
Set main file to app.py
Deploy!

Heroku

# Create Procfile
echo "web: streamlit run app.py --server.port=$PORT" > Procfile

# Deploy
heroku create your-app-name
git push heroku main

Docker

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

🤝 Contributing

Contributions are welcome! Here's how you can help:

Fork the repository
Create a feature branch
```
git checkout -b feature/AmazingFeature
```
Commit your changes
```
git commit -m 'Add some AmazingFeature'
```
Push to the branch
```
git push origin feature/AmazingFeature
```
Open a Pull Request

Development Guidelines

Follow PEP 8 style guide
Add comments for complex logic
Test with multiple API providers
Update documentation for new features

🐛 Known Issues & Limitations

Rate Limits: API providers have rate limits; add delays for bulk processing
Cost: Using multiple premium APIs can accumulate costs
D-ID Videos: May take 1-2 minutes to generate
Text Length: Very long texts may hit token limits
Browser Storage: API keys not persisted between sessions

📝 Changelog

Version 1.0.0 (2024-11-16)

✨ Initial release
🎨 Beautiful glassmorphism UI
🧠 Multi-model LLM support
🎵 Three TTS/Video providers
📚 Seven enhancement modes
🔒 Secure API handling

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2024 Vivek YT

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

👨‍💻 Author

Vivek YT

GitHub: @yourusername
YouTube: Your Channel
Twitter: @yourhandle
Email: your.email@example.com

🙏 Acknowledgments

Anthropic for Claude AI
OpenAI for GPT-4 and TTS
Google for Gemini
Meta AI for Llama-2
ElevenLabs for voice synthesis
D-ID for video generation
Streamlit for the amazing framework
All contributors and users

💖 Support

If you find this project helpful, please consider:

⭐ Starring the repository
🐛 Reporting bugs
💡 Suggesting new features
📢 Sharing with others
☕ Buy me a coffee

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
README.md		README.md
requirements.txt		requirements.txt
streamlit-tts-agent.py		streamlit-tts-agent.py

vivek7557/Multi_Model_AI_Agent

Folders and files

Latest commit

History

Repository files navigation