AI-Powered Vision Assistant for Accessibility
A comprehensive accessibility-focused vision assistant featuring voice interaction, multilingual support, and intelligent image analysis. Built with modern web technologies and designed for users with visual impairments.
- Three-Mode UX System: Chat Mode, Image Mode, and Voice+Image Mode
- AI-Powered Vision: Advanced image analysis using Groq's Llama 4 Scout
- Voice Interaction: Real-time transcription and bidirectional voice chat
- Text-to-Speech: High-quality TTS with stop controls and number formatting
- Multilingual Support: 12 languages with automatic browser detection
- English, Spanish, French, German, Italian, Portuguese
- Russian, Japanese, Korean, Chinese, Arabic, Hindi
- Auto-detection from browser language preferences (see the sketch after this list)
- Complete UI internationalization (i18n)
- Screen reader optimized interface
- High contrast mode support
- Material Design accessibility standards
- Voice-first interaction design
- Keyboard navigation support
- Backend: Cloudflare Workers with Groq API integration
- Frontend: Vanilla JavaScript with Material Design
- Voice Processing: Groq Whisper for transcription
- Security: Input sanitization and CORS protection
- Performance: Intelligent image compression and caching
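As a rough illustration of the browser-language auto-detection mentioned in the list above, the frontend could map `navigator.language` onto the twelve supported languages and fall back to English. The names below are illustrative, not taken from the project source:

```javascript
// Illustrative sketch only; function and constant names are not from the project source.
const SUPPORTED_LANGUAGES = [
  'en', 'es', 'fr', 'de', 'it', 'pt',
  'ru', 'ja', 'ko', 'zh', 'ar', 'hi',
];

function detectUiLanguage() {
  // navigator.languages lists the user's preferred locales, most preferred first.
  const preferred = navigator.languages || [navigator.language || 'en'];
  for (const locale of preferred) {
    const base = locale.toLowerCase().split('-')[0]; // "pt-BR" -> "pt"
    if (SUPPORTED_LANGUAGES.includes(base)) return base;
  }
  return 'en'; // Fall back to English when nothing matches.
}
```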
- Node.js (v18 or later)
- Cloudflare account
- Groq API key
- Clone the repository
  ```bash
  git clone https://github.com/sortedmess/augen.git
  cd augen
  ```
- Install dependencies
  ```bash
  npm install
  ```
- Set up the environment
  ```bash
  npm run setup  # This copies wrangler.toml.local to wrangler.toml
  ```
- Configure your Groq API key
  ```bash
  npx wrangler secret put GROQ_API_KEY -e production
  ```
- Deploy the worker
  ```bash
  npm run deploy:worker
  ```
- Start local development
  ```bash
  npm run serve
  ```
Visit http://localhost:8080 to see the application running locally.
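To confirm a deployed worker is reachable, you can hit the `/api/health` endpoint described under the API endpoints below. The URL here is a placeholder; substitute your own workers.dev deployment:

```javascript
// Placeholder URL: replace with your own deployed worker's URL.
const WORKER_URL = 'https://your-worker.example.workers.dev';

fetch(`${WORKER_URL}/api/health`)
  .then((res) => console.log('Worker reachable:', res.ok, res.status))
  .catch((err) => console.error('Worker unreachable:', err));
```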
- Vanilla JavaScript → No frameworks, maximum compatibility
- Static hosting → Deploy anywhere (GitHub Pages, Netlify, Vercel)
- Mobile-first → Optimized for touch devices
- High contrast CSS → WCAG 2.1 AA compliant
- Cloudflare Workers → Serverless, global edge deployment
- Groq API integration → Deploy your own worker with your Groq API key
- Groq APIs → Ultra-fast AI inference
- Vision: Llama 4 Scout
- Voice: Whisper for transcription
- TTS: Native browser speech synthesis (offline, no API cost)
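Because TTS runs on the browser's built-in Web Speech API, speaking a description and stopping it is roughly this simple (a minimal sketch, not the project's actual code):

```javascript
// Minimal Web Speech API sketch; not the project's actual implementation.
function speak(text, lang = 'en-US') {
  speechSynthesis.cancel(); // Stop anything already playing.
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang; // Match the detected UI language.
  utterance.rate = 1.0;  // Speaking speed (0.1-10).
  speechSynthesis.speak(utterance);
}

// A "stop" control simply cancels the current utterance queue.
function stopSpeaking() {
  speechSynthesis.cancel();
}
```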
After deploying your own Cloudflare Worker, you'll have:
- `POST /api/analyze` - Vision analysis endpoint
- `POST /api/transcribe` - Voice transcription endpoint
- `POST /api/voice-query` - Voice chat endpoint
- `GET /api/health` - Health check endpoint
Note: You need to deploy your own worker with your Groq API key for production use
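As an example of how a frontend might call the deployed worker, the snippet below posts an image and a question to `/api/analyze`. The field names and response shape are assumptions for illustration, not a documented contract:

```javascript
// Sketch of calling the deployed worker; field names and response shape are assumptions.
async function analyzeImage(imageBlob, question, workerUrl) {
  const formData = new FormData();
  formData.append('image', imageBlob, 'capture.jpg');
  formData.append('prompt', question);

  const response = await fetch(`${workerUrl}/api/analyze`, {
    method: 'POST',
    body: formData,
  });
  if (!response.ok) throw new Error(`Analysis failed: ${response.status}`);
  return response.json(); // e.g. { description: "..." } -- shape is assumed.
}
```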
- No TTS API costs → Uses free native browser speech synthesis
- Low vision API costs → ~90% cheaper than OpenAI GPT-5, with performance that is competitive with or better than GPT-4o
- Live demo available → See the full app in action at the demo site
- Groq API required → Deploy your own Cloudflare Worker with your Groq API key
- Open source → Full code available for customization and deployment
- Restaurant menus
- Street signs
- Product labels
- Documents and papers
- Handwritten notes
- Navigation assistance
- Object identification
- People and activities
- Environmental awareness
- Safety assessment
- Morse code output for deaf-blind users (see the sketch after this list)
- Haptic feedback patterns
- Summary vs detailed descriptions
- Multiple language support
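Morse and haptic output are still roadmap items; one possible direction is the browser Vibration API, sketched speculatively below (none of these names come from the project):

```javascript
// Speculative sketch of Morse output via the Vibration API (roadmap item, mobile browsers only).
const MORSE = {
  a: '.-', b: '-...', c: '-.-.', d: '-..', e: '.', f: '..-.', g: '--.',
  h: '....', i: '..', j: '.---', k: '-.-', l: '.-..', m: '--', n: '-.',
  o: '---', p: '.--.', q: '--.-', r: '.-.', s: '...', t: '-', u: '..-',
  v: '...-', w: '.--', x: '-..-', y: '-.--', z: '--..',
};
const DOT = 100, DASH = 300, GAP = 100, LETTER_GAP = 300; // milliseconds

function vibrateMorse(text) {
  const pattern = []; // Alternates vibration and pause durations.
  for (const char of text.toLowerCase()) {
    const code = MORSE[char];
    if (!code) continue; // Letters only; spaces and punctuation skipped for brevity.
    for (const symbol of code) {
      pattern.push(symbol === '.' ? DOT : DASH, GAP); // vibrate, then pause
    }
    pattern[pattern.length - 1] = LETTER_GAP; // Longer pause between letters.
  }
  if (navigator.vibrate) navigator.vibrate(pattern);
}
```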
```
augen/
├── assets/
│   ├── css/style.css    # Material Design styling
│   └── js/script.js     # Main application logic
├── src/
│   └── worker/
│       └── worker.js    # Cloudflare Workers backend
├── index.html           # Main application interface
├── wrangler.toml        # Cloudflare Workers config
└── package.json         # Project metadata
```
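For orientation, `src/worker/worker.js` in the tree above is the Cloudflare Workers backend. Here is a stripped-down sketch of how such a worker might route the endpoints listed earlier and attach CORS headers; the Groq call itself is omitted, and this is not the actual file:

```javascript
// Stripped-down Cloudflare Worker sketch; not the actual src/worker/worker.js.
const CORS_HEADERS = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

function json(data, status = 200) {
  return new Response(JSON.stringify(data), {
    status,
    headers: { 'Content-Type': 'application/json', ...CORS_HEADERS },
  });
}

export default {
  async fetch(request, env) {
    // Answer CORS preflight requests before any routing.
    if (request.method === 'OPTIONS') {
      return new Response(null, { status: 204, headers: CORS_HEADERS });
    }

    const { pathname } = new URL(request.url);

    if (pathname === '/api/health') {
      return json({ status: 'ok' });
    }

    if (pathname === '/api/analyze' && request.method === 'POST') {
      // The real worker validates and sanitizes the input, then forwards it to the
      // Groq vision API using the GROQ_API_KEY secret (env.GROQ_API_KEY);
      // that call is omitted from this sketch.
      return json({ error: 'Groq call omitted in this sketch' }, 501);
    }

    return json({ error: 'Not found' }, 404);
  },
};
```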
```bash
# Start local server
npm run serve

# Start worker development
npm run dev

# Deploy to production
npm run deploy
```

We welcome contributions! This project is built for the accessibility community.
- Accessibility improvements → Better screen reader support, motor accessibility
- Language support → Multi-language interface and descriptions
- Voice options → Different TTS voices and speeds
- Performance → Faster processing, offline capabilities
- Documentation → Better guides, tutorials, examples
- Fork the repository
- Create a feature branch:
  ```bash
  git checkout -b feature/amazing-feature
  ```
- Make your changes
- Test accessibility with screen readers
- Submit a pull request
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.
- Documentation → Help page
- Issues → GitHub Issues
- Discussions → GitHub Discussions
Built for accessibility
Augen means "eyes" in German - helping everyone see the world through AI.