SmartDocs is a tool that extracts software requirements from audio files.
The backend requires several environment variables to be set for proper operation. These variables are defined in `apps/api/.env.example`. When running with Docker, you should copy this file to `apps/api/.env` and customize the values as needed.
These variables are validated in `apps/api/src/shared/config/envs.ts`.
| Variable | Description | Default |
|---|---|---|
| `NODE_ENV` | Node.js environment. | `dev` |
| `PORT` | Port for the API server. | `8080` |
| `CLIENT_URL` | URL of the frontend application. | `http://localhost:3000` |

| Variable | Description | Required |
|---|---|---|
| `DATABASE_URL` | URL for your PostgreSQL database. | Yes |
| `RABBITMQ_URL` | URL for your RabbitMQ instance. | Yes |

| Variable | Description | Default |
|---|---|---|
| `OLLAMA_API_URL` | URL for your local Ollama API server. | `http://localhost:11434` |

| Variable | Description | Required |
|---|---|---|
| `TRANSCRIPTION_MODEL` | Model used for audio transcription (e.g., `tiny`). | Yes |
| `TRANSCRIPTION_LANGUAGE` | Language used for transcription (e.g., `en`, `pt`). | Yes |

| Variable | Description | Required |
|---|---|---|
| `ANALYTICS_MODEL` | Ollama model used for analysis (e.g., `llama3`). | Yes |

| Variable | Description | Default / Required |
|---|---|---|
| `GATEKEEPER_TRANSCRIPTION_MODEL` | Model used for fast transcription by the Gatekeeper. | Required |
| `GATEKEEPER_ANALYTICS_MODEL` | Model used for context validation by the Gatekeeper. | Required |
| `TRANSCRIPTION_LANGUAGE` | Language for transcription. | Required |
| `MAX_RETRIES` | Maximum number of times to sample the audio. | Default: `3` |
| `SAMPLE_DURATION` | Duration (in seconds) of each audio sample. | Default: `30` |
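Putting the tables above together, a minimal `apps/api/.env` might look like the fragment below. The variable names come from the tables; the database credentials and model choices are illustrative placeholders, so adjust them to your setup:

```env
NODE_ENV=dev
PORT=8080
CLIENT_URL=http://localhost:3000

# Placeholder credentials -- replace with your own
DATABASE_URL=postgres://user:password@localhost:5432/smartdocs
RABBITMQ_URL=amqp://localhost:5672

OLLAMA_API_URL=http://localhost:11434

TRANSCRIPTION_MODEL=small
TRANSCRIPTION_LANGUAGE=en
ANALYTICS_MODEL=deepseek-coder

GATEKEEPER_TRANSCRIPTION_MODEL=tiny
GATEKEEPER_ANALYTICS_MODEL=phi3:mini
MAX_RETRIES=3
SAMPLE_DURATION=30
```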
The core philosophy of SmartDocs is "local-first." Your data is processed on your own hardware without being sent to third-party cloud services. The system listens for an audio file, processes it through an asynchronous pipeline, and generates a structured Markdown requirements document saved to your local filesystem.
- Local-First Processing: All processing, from transcription to AI analysis, happens on your machine.
- Event-Driven Architecture: Built on a robust, scalable architecture using RabbitMQ for asynchronous job processing.
- AI-Powered Filtering: A "Gatekeeper" worker uses a lightweight LLM to quickly discard irrelevant audio (e.g., music, noise).
- Multilingual Transcription: Utilizes Whisper for accurate speech-to-text conversion with support for multiple languages.
- Intelligent Analysis: A powerful LLM analyzes the transcription to generate professional Software Requirements Specification (SRS) documents.
- Markdown Output: Generates well-structured, readable Markdown documents instead of raw JSON.
- Document Download: Download generated requirements documents via a dedicated API endpoint.
- Processing Cache: Avoids re-processing by caching results based on the audio file's hash.
- Status Tracking: An API endpoint allows you to monitor the real-time status of your processing job.
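The processing cache in the list above keys results on a hash of the uploaded file. This README does not name the algorithm, so the sketch below assumes SHA-256, and `audioHash` is a hypothetical helper, not the actual `apps/api` code:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive the cache key from the raw audio bytes.
// (SHA-256 is an assumption; SmartDocs may use a different algorithm.)
function audioHash(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Identical uploads map to the same key, so a repeated file can be
// answered from the cache instead of re-running the whole pipeline.
const first = audioHash("same audio bytes");
const second = audioHash("same audio bytes");
console.log(first === second); // true
```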
The system is a TypeScript monorepo managed by Turborepo. The backend is built with Bun and ElysiaJS, communicating with a series of background workers via RabbitMQ.
- API (`apps/api`): The main entry point. It receives an audio file, generates a hash, checks for a cached result, and, if none exists, places a new job in the `q.audio.new` queue.
- Gatekeeper Worker: Consumes from `q.audio.new`. It validates the audio for speech content. If valid, it passes the job to the `q.audio.transcribe` queue.
- Transcriber Worker: Consumes from `q.audio.transcribe`. It performs a full transcription of the audio and places the resulting text in the `q.transcript.analyze` queue.
- Analyst Worker: Consumes from `q.transcript.analyze`. It uses a powerful LLM to generate a structured Markdown SRS document and saves it to the filesystem.
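The hand-offs above can be sketched as a toy in-memory router. The queue names are taken from this README, but the real system uses RabbitMQ and separate worker processes, so treat this purely as an illustration of the message flow:

```typescript
type Job = { hash: string; payload: string };
type Handler = (job: Job) => void;

// Toy stand-in for RabbitMQ: each queue name maps to one consumer.
const consumers = new Map<string, Handler>();
const publish = (queue: string, job: Job) => consumers.get(queue)?.(job);

// Gatekeeper: drop audio without speech, forward the rest.
consumers.set("q.audio.new", (job) => {
  if (job.payload !== "music") publish("q.audio.transcribe", job);
});

// Transcriber: turn audio into text for the analyst.
consumers.set("q.audio.transcribe", (job) => {
  publish("q.transcript.analyze", { ...job, payload: `transcript:${job.hash}` });
});

// Analyst: final stage, emit the Markdown SRS document.
const documents: string[] = [];
consumers.set("q.transcript.analyze", (job) => {
  documents.push(`# Requirements for ${job.hash}`);
});

publish("q.audio.new", { hash: "abc123", payload: "speech" });
publish("q.audio.new", { hash: "def456", payload: "music" }); // discarded
console.log(documents); // ["# Requirements for abc123"]
```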
- Runtime: Bun
- Backend Framework: ElysiaJS
- Frontend: Next.js with React
- Database: PostgreSQL with Drizzle ORM
- Message Broker: RabbitMQ
- Local AI: Ollama
- AI Models: `phi3:mini` (for classification), `deepseek-coder` (for analysis), `nodejs-whisper` with the `tiny` & `small` models (for transcription)
- Audio Processing: FFmpeg
The project runs in a Hybrid Mode:
- Infrastructure: PostgreSQL and RabbitMQ run in Docker.
- Application: The Web App, API, and Workers run locally on your machine using Bun.
- Docker: Install Docker (or OrbStack)
- Bun: Install Bun
- FFmpeg: `brew install ffmpeg`
- Ollama: Must be installed and running on your host machine. Download Ollama
```sh
git clone <your-repository-url>
cd <repository-name>
```

The AI models run on your host machine using Ollama.

- Pull Ollama Models:

```sh
ollama pull phi3:mini
ollama pull deepseek-coder
```
Start the database and message broker using Docker Compose:

```sh
docker compose up -d
```

Install the project dependencies and run the database migrations:

```sh
bun install
cd apps/api && bun run db:migrate
```

You can run the entire application (Web, API, and Workers) with a single command from the project root:

```sh
bun run dev
```

This command uses Turborepo to run the following services in parallel:
- Web App: http://localhost:3000
- API Server: http://localhost:8080
- Gatekeeper Worker
- Transcription Worker
- Analyst Worker
- Press `Ctrl+C` to stop the application services.
- Run `docker compose down` to stop the infrastructure.
- Open http://localhost:3000 in your browser.
- Upload an audio file (MP3, WAV, M4A, MP4).
- The system will process the file through the pipeline. You can watch the progress in the web interface.
- Once processing is complete, click the "Download Requirements Document" button to get your Markdown file.
- Alternatively, access documents via the API at `http://localhost:8080/gateway/download/{audio_hash}`.
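For scripted access, the download endpoint above can be wrapped in a small helper. The URL shape is from this README; `downloadUrl` is a hypothetical convenience function, and the response is assumed to be the Markdown document itself:

```typescript
// Build the download URL for a processed file's requirements document.
// (Base URL and path are from the README; adjust if your PORT differs.)
function downloadUrl(audioHash: string, base = "http://localhost:8080"): string {
  return `${base}/gateway/download/${audioHash}`;
}

// Hypothetical usage once a job has finished processing:
//   const res = await fetch(downloadUrl("abc123"));
//   const markdown = await res.text();
console.log(downloadUrl("abc123"));
// → http://localhost:8080/gateway/download/abc123
```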