A fully client-side AI chat application powered by Next.js, Tailwind CSS, and WebLLM. This application runs entirely in your browser with no backend required - all AI processing happens locally on your device.
- 100% Client-Side: All AI processing happens in your browser using WebLLM
- Privacy-First: No data sent to external servers, everything stays on your device
- Multi-Agent System: 🤖 Specialized AI agents work together for comprehensive Sanskrit literature answers
- Streaming Responses: ⚡ Real-time token-by-token generation like ChatGPT
- Intelligent Routing: Orchestrator agent classifies queries and routes to specialized agents
- Iterative Refinement: Agents can request additional searches for better answers
- Real-Time Status Updates: See what each agent is doing as they process your query
- Multiple LLM Support: Choose from several optimized models (Qwen, Gemma, Mistral)
- ChatGPT-like Interface: Professional, modern UI similar to ChatGPT
- Persistent Model Selection: Remembers your chosen model across sessions
- Markdown Support: Rich text formatting in AI responses
- Dark Mode: Automatic dark mode based on system preferences
- Responsive Design: Works great on desktop, tablet, and mobile
- Node.js 18+
- A modern browser with WebGPU support (Chrome 113+, Edge 113+)
- At least 8GB RAM for running larger models
1. Clone the repository:

   ```bash
   git clone <your-repo-url>
   cd rgfe
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Run the development server:

   ```bash
   npm run dev
   ```

4. Open http://localhost:3000 in your browser
The application is deployed on GitHub Pages and can be accessed at: https://[your-username].github.io/rgfe/
This project is configured for automatic deployment to GitHub Pages:
- Enable GitHub Pages: Go to your repository Settings → Pages
- Source: Select "GitHub Actions" as the source
- Automatic Deployment: Every push to the main/master branch will automatically deploy
The deployment workflow will:
- Build the static site using `npm run export`
- Deploy to GitHub Pages
- Make the site available at https://[username].github.io/rgfe/
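Static export for a GitHub Pages project site is typically configured in `next.config.ts`; here is a minimal sketch (the `basePath` value assumes the repository is named rgfe; check the repo's actual config):

```typescript
// next.config.ts - sketch of static-export settings for GitHub Pages.
// basePath assumes the site is served from https://<username>.github.io/rgfe/.
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  output: 'export',              // write a fully static site to ./out
  basePath: '/rgfe',             // project sites live under a sub-path
  images: { unoptimized: true }, // next/image optimization needs a server
};

export default nextConfig;
```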
To deploy manually:
```bash
# Build the static site
npm run export

# The built files will be in the 'out' directory
# Upload the contents of 'out' to your web server
```

To test the static build locally:

```bash
npm run export
npx serve out
```
1. When you first open the application, you'll be prompted to select an AI model
2. Choose a model based on your needs:
   - Qwen3 8B: Recommended - best multilingual capability and reasoning (4.8GB)
   - Gemma 3 4B Instruct: Google's efficient model with strong performance (2.4GB)
   - Mistral 7B Instruct v0.3: High-quality reasoning and instruction following (4.2GB)
   - Qwen3 4B: Balanced performance with good multilingual support (2.4GB)
   - Qwen3 0.6B: Ultra-lightweight model for fast responses (0.4GB)
3. The model will download and cache in your browser (this happens only once)
4. Once loaded, you can start chatting immediately
- Type your message in the input box at the bottom
- Press Enter to send (or click the send button)
- Press Shift + Enter to add a new line
- Click "New Chat" to start a fresh conversation
- Watch for capability badges in the header:
  - ⚡ Streaming enabled: Responses appear in real-time
  - 🔧 Tool calls supported: Model can handle function calls
The selected model is automatically cached in your browser's IndexedDB storage. The model choice persists across browser sessions, so you won't need to download it again.
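For reference, here is a minimal sketch of how an app like this creates a WebLLM engine and streams tokens (the model id is illustrative; use any id from WebLLM's prebuilt model list):

```typescript
// Sketch of engine creation and streaming with WebLLM (model id illustrative).
// Downloaded weights are cached in IndexedDB, so later loads skip the download.
import { CreateMLCEngine } from '@mlc-ai/web-llm';

const engine = await CreateMLCEngine('Qwen3-0.6B-q4f16_1-MLC', {
  initProgressCallback: (p) => console.log(p.text), // download/load progress
});

const stream = await engine.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain Dharma in one paragraph.' }],
  stream: true, // OpenAI-style chunked deltas
});

let answer = '';
for await (const chunk of stream) {
  answer += chunk.choices[0]?.delta?.content ?? ''; // append each new token
  // render `answer` incrementally in the UI here
}
```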
This application features a sophisticated multi-agent system specialized in Sanskrit literature queries. Three specialized AI agents work together to provide comprehensive, well-researched answers:
```
User Query → Orchestrator Agent → Classification
                  ↓
    Sanskrit-related?   Non-Sanskrit → Polite decline
                  ↓
    Searcher Agent → Find relevant texts
                  ↓
    Generator Agent → Analyze & Generate answer
                  ↓
    Need more info? → Refined search → Loop back
                  ↓
    Stream final answer to user
```
- 🤖 Orchestrator Agent - Classifies queries and routes to appropriate agents
- 🔍 Searcher Agent - Finds relevant information from Sanskrit texts using semantic search
- 📝 Generator Agent - Creates comprehensive answers with citations
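A hypothetical sketch of that routing loop (names and types are illustrative, not the actual code under app/lib/agents):

```typescript
// Illustrative routing loop mirroring the diagram above; not the app's
// actual implementation. Agent objects are stubbed with `declare`.
type Passage = { text: string; source: string };
type GenResult = { answer: string; needsMoreInfo: boolean; refinedQuery: string };

declare const orchestrator: { classify(q: string): Promise<{ sanskritRelated: boolean }> };
declare const searcher: { search(q: string): Promise<Passage[]> };
declare const generator: { generate(q: string, p: Passage[]): Promise<GenResult> };

const MAX_ROUNDS = 3; // assumed cap on refinement iterations

async function answerQuery(query: string): Promise<string> {
  const { sanskritRelated } = await orchestrator.classify(query);
  if (!sanskritRelated) return 'Sorry, I can only help with Sanskrit literature questions.';

  let passages = await searcher.search(query);
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const result = await generator.generate(query, passages);
    if (!result.needsMoreInfo) return result.answer; // stream this to the user
    // Generator requested more context: refine the search and loop back.
    passages = passages.concat(await searcher.search(result.refinedQuery));
  }
  return (await generator.generate(query, passages)).answer;
}
```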
The search system uses Google's EmbeddingGemma (300M parameters) running entirely in the browser:
- In-Browser Embeddings: Transformers.js powers local embedding generation
- Privacy-First: No data sent to external servers
- 10,000+ Documents: Sanskrit passages with 512-dimensional embeddings
- Multilingual: Supports 100+ languages
- Fast: Query embeddings generated in ~200-500ms after model load
For details, see EMBEDDINGGEMMA_INTEGRATION.md
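As a rough illustration of what in-browser embedding generation looks like with Transformers.js (the model id below is an assumption, not necessarily what this app loads):

```typescript
// Sketch of query embedding in the browser. The model id is an assumed
// ONNX build of EmbeddingGemma; the app's actual configuration may differ.
import { pipeline } from '@huggingface/transformers';

const embed = await pipeline(
  'feature-extraction',
  'onnx-community/embeddinggemma-300m-ONNX'
);

// Mean-pool token embeddings and L2-normalize into a single query vector.
const tensor = await embed('fire sacrifice ritual', {
  pooling: 'mean',
  normalize: true,
});
const queryVector: number[] = Array.from(tensor.data as Float32Array);
```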
Watch the agents work with real-time status updates:
- π€ "Analyzing your query..."
- π "Searching for relevant information..."
- β "Found 5 relevant sources"
- π "Generating comprehensive answer..."
- β "Answer complete"
Try asking about:
- "Explain the concept of Dharma in the Bhagavad Gita"
- "What are the main schools of Hindu philosophy?"
- "Tell me about the Upanishads"
- "Summarize the story of the Mahabharata"
For more details, see MULTI_AGENT_SYSTEM.md
```
rgfe/
├── app/
│   ├── components/
│   │   ├── AgentChatInterface.tsx   # Multi-agent chat UI
│   │   ├── AgentChatMessage.tsx     # Agent message renderer
│   │   ├── ChatInterface.tsx        # Original chat UI
│   │   ├── ChatMessage.tsx          # Individual message display
│   │   ├── LLMSelector.tsx          # Model selection modal
│   │   └── LoadingScreen.tsx        # Loading screen component
│   ├── hooks/
│   │   ├── useWebLLM.ts             # WebLLM integration hook
│   │   └── useMultiAgent.ts         # Multi-agent orchestration
│   ├── lib/
│   │   ├── agents/
│   │   │   ├── types.ts             # Agent type definitions
│   │   │   ├── orchestrator.ts      # Orchestrator agent
│   │   │   ├── searcher.ts          # Searcher agent
│   │   │   └── generator.ts         # Generator agent
│   │   └── webllm-provider.ts       # AI SDK adapter for WebLLM
│   ├── globals.css                  # Global styles
│   ├── layout.tsx                   # Root layout
│   └── page.tsx                     # Main page component
├── docs/
│   ├── MULTI_AGENT_SYSTEM.md        # Multi-agent architecture docs
│   ├── FEATURES.md                  # Complete feature list
│   ├── ARCHITECTURE.md              # Technical deep dive
│   └── ...
├── public/                          # Static assets
├── package.json
└── README.md
```
- Next.js 15: React framework with App Router
- TypeScript: Type-safe JavaScript
- Tailwind CSS: Utility-first CSS framework
- WebLLM: Browser-based LLM inference powered by MLC
- Vercel AI SDK: Framework for building multi-agent AI systems
- Marked: Markdown parser for rich text rendering
- Zod: Schema validation for agent communications
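To illustrate the last point, here is a hypothetical Zod schema (not the app's actual one) for validating a structured search request an agent receives from the LLM:

```typescript
// Hypothetical schema for inter-agent messages; field names are assumptions.
import { z } from 'zod';

const SearchRequestSchema = z.object({
  query: z.string().min(1),
  maxResults: z.number().int().positive().default(5),
});

type SearchRequest = z.infer<typeof SearchRequestSchema>;

// safeParse rejects malformed LLM output instead of letting it propagate.
const parsed = SearchRequestSchema.safeParse(JSON.parse('{"query":"dharma"}'));
if (parsed.success) {
  const request: SearchRequest = parsed.data; // maxResults defaulted to 5
  console.log(request);
}
```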
To add new models to the selection list, edit `app/components/LLMSelector.tsx`:

```typescript
const availableModels = [
  {
    id: 'model-id-from-webllm',
    name: 'Display Name',
    size: 'Download Size',
    description: 'Model description',
  },
  // Add more models here
];
```

Check the WebLLM model list for available models.
- Global styles: `app/globals.css`
- Component-specific styles: Inline Tailwind classes in each component
- Color scheme: Automatically adapts to system dark/light mode
This application requires WebGPU support:
- ✅ Chrome 113+
- ✅ Edge 113+
- ✅ Opera 99+
- ⚠️ Firefox (experimental support with flags)
- ❌ Safari (not yet supported)
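You can check WebGPU availability at runtime; this small snippet can be pasted into the browser console:

```typescript
// Runtime WebGPU check. The structural cast avoids needing @webgpu/types.
async function hasWebGPU(): Promise<boolean> {
  const gpu = (navigator as { gpu?: { requestAdapter(): Promise<unknown | null> } }).gpu;
  if (!gpu) return false;     // API not exposed by this browser
  const adapter = await gpu.requestAdapter();
  return adapter !== null;    // null means no usable GPU adapter
}

hasWebGPU().then((ok) => console.log(ok ? 'WebGPU available' : 'WebGPU unavailable'));
```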
- First Load: The initial model download may take 1-5 minutes depending on your internet speed
- Memory Usage: Larger models (7B-8B) require more RAM. Close other tabs if experiencing issues
- Generation Speed: Speed depends on your hardware. Newer GPUs provide faster inference
- Model Selection: Start with smaller models (3B-4B) if you have limited resources
Contributions are welcome! Please feel free to submit a Pull Request.
ISC
- WebLLM for browser-based LLM inference
- MLC LLM for the underlying ML compilation technology
- Transformers.js by HuggingFace for in-browser ML models
- EmbeddingGemma by Google DeepMind for semantic embeddings
- Orama for fast in-memory vector search
- Vercel for Next.js
Model fails to load:
- Ensure you're using Chrome 113+ or Edge 113+
- Check WebGPU support at `chrome://gpu`
- Try a smaller model first (Qwen3 0.6B or Gemma 2 2B)
Out of memory errors:
- Close other browser tabs
- Try a smaller model
- Ensure you have enough RAM available
Slow generation:
- Normal for first-time use as model warms up
- Consider using a smaller model
- Check if hardware acceleration is enabled in browser settings
For comprehensive troubleshooting, see TROUBLESHOOTING.md
A dedicated test page is available to verify the EmbeddingGemma + Orama integration:
1. Start the dev server:

   ```bash
   npm run dev
   ```

2. Open the test page: Navigate to http://localhost:3000/test-search
3. Click "Initialize System" to load the embedding model and search index
4. Enter a test query (e.g., "fire sacrifice ritual") and click "Search"
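Under the hood, these searches pair the query embedding with Orama's vector mode. A minimal sketch (schema field names are assumptions; the 512-dimension figure follows the corpus description above):

```typescript
// Sketch of Orama vector search over embedded passages (field names assumed).
import { create, insert, search } from '@orama/orama';

const db = await create({
  schema: {
    text: 'string',
    source: 'string',
    embedding: 'vector[512]', // matches the corpus's 512-dim embeddings
  },
});

await insert(db, {
  text: 'agnim ile purohitam ...',   // sample passage text
  source: 'Rigveda 1.1.1',
  embedding: new Array(512).fill(0), // real vectors come from EmbeddingGemma
});

// `queryVector` would come from the embedding pipeline shown earlier.
const queryVector = new Array(512).fill(0);
const results = await search(db, {
  mode: 'vector',
  vector: { value: queryVector, property: 'embedding' },
  limit: 5,
});
console.log(results.hits.map((h) => ({ score: h.score, source: h.document.source })));
```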
The test page provides:
- Real-time initialization progress
- Performance metrics (initialization, embedding, search times)
- Detailed console output
- Visual search results with scores and sources
For detailed testing instructions, see TESTING_SEARCH.md
- MULTI_AGENT_SYSTEM.md: Multi-agent architecture & design
- EMBEDDINGGEMMA_INTEGRATION.md: In-browser semantic search with EmbeddingGemma
- TESTING_SEARCH.md: Testing semantic search functionality
- FEATURES.md: Complete feature list
- MODEL_CAPABILITIES.md: Streaming & tool call guide
- ARCHITECTURE.md: Technical deep dive
- QUICK_START.md: Getting started guide
- DEPLOYMENT.md: Deployment instructions
- TROUBLESHOOTING.md: Comprehensive troubleshooting
Built with ❤️ using Next.js and WebLLM