AI-driven pipeline for image captioning and multilingual translation, with a React UI and a Node API. A Python text-to-audio stage is planned for future integration.
- Frontend: React + Vite, runs the image captioning model in-browser.
- Backend: Node + Express, runs the translation model and exposes a /translate API.
- Models: Hugging Face Transformers (image-to-text + NLLB translation).
- react/: React UI and captioning logic.
- node-server/: Node API for translation and Docker setup.
- Node.js 20
- pnpm 10
- Docker (optional, for containerized Node API)
The AI models run locally on your machine (browser for captioning, server for translation). Make sure enough memory is available; lower-end hardware may struggle with the RAM the models need.
- Start the Node API
cd node-server
pnpm install
node index.js
- Start the React app
cd react
pnpm install
pnpm dev
From the node-server/ folder:
docker compose up --build
- Exposes the API at http://localhost:3000
- React loads an image URL.
- Browser runs the image-captioning model.
- React calls POST /translate with the caption text.
- Node translates EN -> PT-BR and returns the translated text.
- Python TTS integration planned for a later stage.
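
The POST /translate step in the flow above could look roughly like the sketch below on the client side. The base URL, the `text` request field, and the helper names `buildTranslateRequest` and `translateCaption` are assumptions for illustration; check the server code for the actual request and response shape.

```javascript
// Assumed base URL; http://localhost:3000 is where the Docker setup exposes the API.
const API_BASE = "http://localhost:3000";

// Build the fetch arguments for POST /translate.
// The `text` field name is an assumption, not confirmed by the server code.
function buildTranslateRequest(caption) {
  if (typeof caption !== "string" || caption.trim() === "") {
    throw new TypeError("caption must be a non-empty string");
  }
  return {
    url: `${API_BASE}/translate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: caption.trim() }),
    },
  };
}

// Send the caption and return the parsed JSON response.
async function translateCaption(caption) {
  const { url, options } = buildTranslateRequest(caption);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Translation request failed: ${res.status}`);
  }
  return res.json();
}
```

Keeping the request construction separate from the network call makes it possible to check the payload without a running server.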
- CORS is configured to allow http://localhost:5173.
- Update ports or origins in the server if you change the frontend URL.
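
As an illustration of the origin check described above, here is a dependency-free sketch of an Express-style CORS middleware. It is not the project's actual configuration (the server may well use the `cors` package instead); `ALLOWED_ORIGINS`, `corsHeadersFor`, and `corsMiddleware` are hypothetical names.

```javascript
// Origins allowed to call the API. Add or change entries here
// if the frontend URL changes.
const ALLOWED_ORIGINS = new Set(["http://localhost:5173"]);

// Return the CORS response headers for a request's Origin header,
// or an empty object when the origin is not allowed.
function corsHeadersFor(origin) {
  if (!origin || !ALLOWED_ORIGINS.has(origin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    Vary: "Origin",
  };
}

// Express-style middleware built on the helper above.
function corsMiddleware(req, res, next) {
  const headers = corsHeadersFor(req.headers.origin);
  for (const [name, value] of Object.entries(headers)) {
    res.setHeader(name, value);
  }
  // Answer preflight requests directly instead of passing them on.
  if (req.method === "OPTIONS") {
    res.statusCode = 204;
    res.end();
    return;
  }
  next();
}
```

With Express, this would be registered before the routes, for example `app.use(corsMiddleware)`.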