Core server and API for Better DEV, an intelligent AI chat platform with tool-calling capabilities and streaming responses.
- Overview
- Architecture
- Features
- Tech Stack
- Prerequisites
- Getting Started
- Project Structure
- API Documentation
- Operational Modes
- Tool System
- Deployment
- Development
- Environment Variables
- Contributing
Better DEV AI Backend is a production-ready NestJS application that powers an intelligent conversational AI platform. It provides:
- Real-time AI Conversations with streaming responses using AI SDK v5
- Intelligent Tool Calling system with web search capabilities
- Conversation Management with persistent storage
- User Authentication with JWT-based security
- Multi-Model AI Support using Groq (Llama models)
- Extensible Tool Architecture for adding custom capabilities

Client Layer
  (React/Next.js Frontend, Mobile Apps, API Clients)
        │  HTTPS/REST
        │  Server-Sent Events (SSE)
        ▼
API Gateway (Nginx)
  - Load Balancing
  - SSL Termination
  - Request Routing
        │
        ▼
NestJS Application Layer
  ├─ Controllers (REST Endpoints)
  │    - AuthController:   /auth/*
  │    - ChatController:   /chat/*
  │    - HealthController: /health
  │
  ├─ Service Layer
  │    - AuthService, ChatService, UserService
  │    - AIService
  │        - analyzeQueryIntent()
  │        - streamResponseWithMode()
  │        - generateResponse()
  │    - Operational Modes
  │        - ModeResolverService
  │        - AutoClassifierService
  │        - ClassificationCacheService
  │
  ├─ Tool System (Extensible)
  │    - ToolRegistry (Service Locator)
  │        - register()  - get()  - toAISDKFormat()
  │    - WebSearchTool
  │        - execute()
  │        - TavilyService, SummaryService
  │    - Future Tools: Calculator, Weather, ImageGen
  │
  └─ Data Access Layer (TypeORM)
       - Repositories: User, Conversation, Message
       - Entity Models with Relations
        │
        ▼
PostgreSQL Database
  Tables:
  - users         (id, email, password, credits, isActive)
  - conversations (id, userId, title, systemPrompt, operationalMode)
  - messages      (id, conversationId, role, content, metadata)
  Relationships:
  - User 1:N Conversations
  - Conversation 1:N Messages
        │
        ▼
External Services
  - Groq API   (AI Models: Llama 3.1, Llama 3.3)
  - Tavily API (Web Search)
User Request Flow:
1. Client → Nginx → NestJS Controller
2. Controller validates JWT via JwtAuthGuard
3. Controller → Service Layer
4. Service Layer → AI Service (with Tool Registry)
5. AI Service → Groq API (streaming)
6. If tool needed → Tool Registry → Specific Tool → External API
7. Stream results back through SSE → Client

Database Transaction Flow:
1. Service Layer → TypeORM Repository
2. Repository → PostgreSQL
3. Response → Service → Controller → Client
🤖 Multi-Model AI Support
- Fast text generation with Llama 3.1 8B
- Advanced tool calling with Llama 3.3 70B
- Automatic model selection based on task
🔧 Extensible Tool System
- Plugin-based architecture
- Type-safe tool definitions with Zod
- Automatic tool registration
- Built-in web search with Tavily
- Automatically determines if queries need web search
💬 Conversation Management
- Persistent conversation history
- Custom system prompts per conversation
- Automatic title generation
- Message metadata (tool calls, citations)
🔒 Security
- JWT-based authentication
- Password hashing with bcrypt
- Request validation with class-validator
- CORS configuration
🎯 Operational Modes (NEW)
- Fast Mode - Quick responses with Llama 3.1 8B (500 tokens, 0.5 temp)
- Thinking Mode - Deep analysis with Llama 3.3 70B (4000 tokens, 0.7 temp)
- Auto Mode - AI-powered classification (default)
- Intelligent query complexity detection
- 5-minute classification caching (70% API call reduction)
- Per-conversation and per-message mode control
- Mode metadata tracking in message history
📡 Real-time Streaming
- Server-Sent Events (SSE)
- AI SDK v5 compatible
- Tool execution visibility
- Progress tracking
- NestJS - Progressive Node.js framework
- TypeScript - Type-safe development
- Express - HTTP server
- PostgreSQL - Relational database
- TypeORM - ORM with entity management
- AI SDK v5 (Vercel) - Unified AI interface
- Groq - Fast AI inference
- Tavily - Web search API
- Passport JWT - Token-based auth
- bcrypt - Password hashing
- class-validator - DTO validation
- class-transformer - Object transformation
- Zod - Schema validation for tools
- Docker - Containerization
- Docker Compose - Multi-container orchestration
- Nginx - Reverse proxy & load balancing
- Node.js 20+ (LTS recommended)
- npm or yarn
- PostgreSQL 15+
- Docker & Docker Compose (for containerized setup)
- Groq API Key (Get one here)
- Tavily API Key (Get one here)
- Clone the repository
  git clone https://github.com/Kashif-Rezwi/better-dev-api.git
  cd better-dev-api
- Create environment file
  cp .env.docker .env
- Configure environment variables
  # Edit .env file
  JWT_SECRET=your-super-secret-jwt-key-change-this
  GROQ_API_KEY=gsk_your_groq_api_key
  TAVILY_API_KEY=tvly-your-tavily-api-key
  DEFAULT_AI_MODEL=openai/gpt-oss-120b
  AI_TEXT_MODEL=llama-3.1-8b-instant
  AI_TOOL_MODEL=llama-3.3-70b-versatile
- Start the application
  npm run docker:up
- Check health
  curl http://localhost:3001/health
- View logs
  npm run docker:logs

To run locally without Docker:
- Clone the repository
  git clone https://github.com/Kashif-Rezwi/better-dev-api.git
  cd better-dev-api
- Install dependencies
  npm install
- Setup PostgreSQL
  # Install PostgreSQL locally or use Docker
  docker run --name nebula-postgres \
    -e POSTGRES_USER=nebula_ai \
    -e POSTGRES_PASSWORD=nebula_ai \
    -e POSTGRES_DB=nebula_db \
    -p 5432:5432 \
    -d postgres:15-alpine
- Create .env file
  # Database
  DATABASE_HOST=localhost
  DATABASE_PORT=5432
  DATABASE_USER=nebula_ai
  DATABASE_PASSWORD=nebula_ai
  DATABASE_NAME=nebula_db
  # JWT
  JWT_SECRET=your-super-secret-jwt-key
  JWT_EXPIRATION=7d
  # App
  PORT=3001
  NODE_ENV=development
  # AI
  GROQ_API_KEY=gsk_your_groq_api_key
  DEFAULT_AI_MODEL=openai/gpt-oss-120b
  AI_TEXT_MODEL=llama-3.1-8b-instant
  AI_TOOL_MODEL=llama-3.3-70b-versatile
  # Tools
  TAVILY_API_KEY=tvly-your-tavily-api-key
- Start development server
  npm run start:dev
- Access the API
  - API: http://localhost:3001
  - Health: http://localhost:3001/health
better-dev-api/
├── src/
│   ├── config/                    # Configuration files
│   │   ├── database.config.ts     # Database connection config
│   │   └── jwt.config.ts          # JWT authentication config
│   │
│   ├── common/                    # Shared resources
│   │   └── guards/
│   │       └── jwt-auth.guard.ts  # JWT authentication guard
│   │
│   ├── modules/                   # Feature modules
│   │   ├── auth/                  # Authentication module
│   │   │   ├── dto/               # Data Transfer Objects
│   │   │   ├── strategies/        # Passport strategies
│   │   │   ├── auth.controller.ts
│   │   │   ├── auth.service.ts
│   │   │   └── auth.module.ts
│   │   │
│   │   ├── user/                  # User management module
│   │   │   ├── entities/          # Database entities
│   │   │   ├── dto/
│   │   │   ├── user.service.ts
│   │   │   └── user.module.ts
│   │   │
│   │   └── chat/                  # Chat & AI module
│   │       ├── entities/          # Conversation, Message
│   │       ├── dto/
│   │       ├── tools/             # Tool system
│   │       │   ├── implementations/   # Tool implementations
│   │       │   │   └── web-search.tool.ts
│   │       │   ├── interfaces/        # Tool interfaces
│   │       │   ├── services/          # Tool services
│   │       │   │   ├── tavily.service.ts
│   │       │   │   └── summary.service.ts
│   │       │   ├── tool.registry.ts   # Tool registry
│   │       │   └── tools.config.ts    # Tool configuration
│   │       ├── chat.controller.ts
│   │       ├── chat.service.ts
│   │       ├── ai.service.ts
│   │       └── chat.module.ts
│   │
│   ├── app.module.ts              # Root module
│   ├── main.ts                    # Application entry point
│   └── health.controller.ts       # Health check endpoint
│
├── nginx/
│   └── nginx.conf                 # Nginx configuration
│
├── docker-compose.yml             # Docker services
├── Dockerfile                     # Production Docker image
├── package.json
├── tsconfig.json
└── README.md
- Development: http://localhost:3001
- Production: Configure via Nginx
POST /auth/register
Content-Type: application/json
{
"email": "user@example.com",
"password": "securepassword123"
}
Response: 201 Created
{
"accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"user": {
"id": "uuid",
"email": "user@example.com",
"credits": 1000
}
}
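For a quick check from a script, the same request can be made with fetch. A minimal sketch assuming a Node 18+ environment (built-in fetch) and the default local port from this README:

```ts
// Hypothetical smoke test for the register endpoint documented above.
const res = await fetch('http://localhost:3001/auth/register', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    email: 'user@example.com',
    password: 'securepassword123',
  }),
});

const { accessToken, user } = await res.json();
console.log(user.credits); // accessToken is sent as a Bearer token on the endpoints below
```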
POST /auth/login
Content-Type: application/json
{
"email": "user@example.com",
"password": "securepassword123"
}
Response: 200 OK
{
"accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"user": {
"id": "uuid",
"email": "user@example.com",
"credits": 1000
}
}

GET /auth/profile
Authorization: Bearer {token}
Response: 200 OK
{
"userId": "uuid",
"email": "user@example.com"
}

POST /chat/conversations/with-message
Authorization: Bearer {token}
Content-Type: application/json
{
"title": "New Conversation",
"systemPrompt": "You are a helpful assistant.",
"firstMessage": "Hello, how are you?"
}
Response: 201 Created
{
"id": "conv-uuid",
"title": "New Conversation",
"systemPrompt": "You are a helpful assistant.",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}

GET /chat/conversations
Authorization: Bearer {token}
Response: 200 OK
[
{
"id": "uuid",
"title": "Conversation Title",
"systemPrompt": "Custom prompt",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z",
"lastMessage": {
"id": "msg-uuid",
"role": "assistant",
"content": "Last message preview...",
"createdAt": "2025-01-01T00:00:00.000Z"
}
}
]

GET /chat/conversations/{id}
Authorization: Bearer {token}
Response: 200 OK
{
"id": "uuid",
"title": "Conversation Title",
"systemPrompt": "Custom prompt",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z",
"messages": [
{
"id": "msg-uuid",
"role": "user",
"content": "Hello",
"createdAt": "2025-01-01T00:00:00.000Z"
},
{
"id": "msg-uuid-2",
"role": "assistant",
"content": "Hi there!",
"createdAt": "2025-01-01T00:00:01.000Z",
"metadata": {
"toolCalls": []
}
}
]
}

POST /chat/conversations/{id}/messages
Authorization: Bearer {token}
Content-Type: application/json
{
"messages": [
{
"id": "msg-1",
"role": "user",
"parts": [
{
"type": "text",
"text": "What's the latest news about AI?"
}
]
}
]
}
Response: 200 OK (Server-Sent Events)
Content-Type: text/event-stream
data: {"type":"text-delta","textDelta":"The latest"}
data: {"type":"text-delta","textDelta":" news..."}
data: {"type":"tool-call-start","toolName":"tavily_web_search"}
data: {"type":"tool-result","result":{...}}
data: {"type":"finish","usage":{...}}
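The stream is standard text/event-stream, so it can be consumed with fetch and a stream reader. A minimal client-side sketch (conversationId, token, and the base URL are placeholders; a robust client would also buffer partial lines across chunks):

```ts
// Minimal sketch: consume the SSE stream from the messages endpoint above.
const res = await fetch(
  `http://localhost:3001/chat/conversations/${conversationId}/messages`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      messages: [
        { id: 'msg-1', role: 'user', parts: [{ type: 'text', text: 'Hello!' }] },
      ],
    }),
  },
);

const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk may contain one or more "data: {...}" lines
  for (const line of decoder.decode(value, { stream: true }).split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const event = JSON.parse(line.slice('data: '.length));
    if (event.type === 'text-delta') process.stdout.write(event.textDelta);
  }
}
```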
PUT /chat/conversations/{id}/system-prompt
Authorization: Bearer {token}
Content-Type: application/json
{
"systemPrompt": "You are an expert in AI and machine learning."
}
Response: 200 OK
{
"id": "uuid",
"title": "Conversation Title",
"systemPrompt": "You are an expert in AI and machine learning.",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}

POST /chat/conversations/{id}/generate-title
Authorization: Bearer {token}
Content-Type: application/json
{
"message": "What's the weather like today?"
}
Response: 200 OK
{
"title": "Weather Inquiry"
}

DELETE /chat/conversations/{id}
Authorization: Bearer {token}
Response: 204 No Content

GET /health
Response: 200 OK
{
"status": "ok",
"timestamp": "2025-01-01T00:00:00.000Z",
"service": "better-dev-ai-chat"
}

The AI Service includes an intelligent query intent analyzer that optimizes tool usage and reduces unnecessary API calls. The analyzeQueryIntent() method automatically determines whether a user query requires real-time web search or can be answered from general knowledge.
// Automatically analyzes each query
const needsWebSearch = await aiService.analyzeQueryIntent(messages);
// Routes to appropriate model
if (needsWebSearch) {
// Use tool-calling model with web search
stream = aiService.streamResponse(messages, tools);
} else {
// Use fast text model without tools
stream = aiService.streamResponse(messages);
}

Queries that trigger web search:
- Current/recent events and news (e.g., "latest AI trends 2025")
- Real-time information (e.g., "today's weather", "current stock price")
- Up-to-date data that changes frequently

Queries that use general knowledge:
- General knowledge questions (e.g., "What is JavaScript?", "Explain OOP")
- Follow-up questions to previous searches (context already available)
- Questions about AI capabilities
- General conversation and clarification
The analyzer checks recent conversation history (last 3 turns) to detect if web search was recently performed. This prevents redundant searches when:
- User asks follow-up questions about search results
- Context is already available from previous tool calls
- User is refining or clarifying a previous query

- 💰 Cost Optimization - Avoids unnecessary tool-calling model usage
- ⚡ Faster Responses - Uses lightweight models when appropriate
- 🎯 Better UX - Reduces wait time for simple queries
- 🔍 Smarter Tool Usage - Only searches when truly needed
The feature uses different AI models for different tasks:
# Fast model for query analysis and simple responses
AI_TEXT_MODEL=llama-3.1-8b-instant
# Powerful model for tool calling and complex tasks
AI_TOOL_MODEL=llama-3.3-70b-versatile

User: "What is machine learning?"
   ↓
analyzeQueryIntent() → NO (general knowledge)
   ↓
Use fast text model → Quick response

User: "What are the latest ML breakthroughs in 2025?"
   ↓
analyzeQueryIntent() → YES (current events)
   ↓
Use tool model + web search → Real-time results
The Operational Modes system provides intelligent chat response modes that automatically adjust AI model selection, token limits, and response styles based on query complexity and user preferences.
Available Modes:
- Fast Mode - Quick, concise responses using lightweight models (Llama 3.1 8B)
- Thinking Mode - Detailed, comprehensive responses using advanced models (Llama 3.3 70B)
- Auto Mode - AI-powered automatic mode selection based on query complexity (default)

User Request
  POST /chat/conversations/:id/messages
  { messages, modeOverride?: "fast" | "thinking" | "auto" }
        │
        ▼
ChatController
  - Validates request
  - Extracts modeOverride (optional)
        │
        ▼
ChatService
  - handleStreamingResponse()
        │
        │  MODE RESOLUTION HIERARCHY
        │  Priority (highest to lowest):
        │   1. Message-level override (request)
        │   2. Conversation-level setting (DB)
        │   3. Default mode ("auto")
        ▼
ModeResolverService
  - resolveMode()
  - getRequestedMode()     → returns "fast" | "thinking" | "auto"
  - resolveEffectiveMode() → returns "fast" | "thinking"
        │
        │  If mode === "auto"
        ▼
AutoClassifierService
  Uses AI to classify query complexity:
  1. Check ClassificationCacheService (5-min TTL)
  2. Quick heuristic: queries < 15 chars → fast
  3. AI classification prompt
     - SIMPLE queries  → fast mode
     - COMPLEX queries → thinking mode
  4. Cache result for future requests
  5. Timeout fallback: 5s → defaults to fast
        │
        │  Returns: { requested, effective }
        ▼
MODE_CONFIG (static configuration per mode)
  fast:                             thinking:
    model: llama-3.1-8b-instant       model: llama-3.3-70b-versatile
    maxTokens: 500                    maxTokens: 4000
    temperature: 0.5                  temperature: 0.7
    systemPrompt: "Be concise"        systemPrompt: "Be detailed"
        │
        │  Pass effective mode config
        ▼
AIService
  - streamResponseWithMode(messages, effectiveMode, ...)
  - Applies mode config (model, tokens, temperature, prompt)
  - Streams response with configured parameters
        │
        │  Stream response + save metadata
        ▼
Message saved with metadata
  {
    "content": "...",
    "metadata": {
      "mode": {
        "requested": "auto",
        "effective": "thinking",
        "modelUsed": "llama-3.3-70b-versatile",
        "temperature": 0.7
      }
    }
  }
1. Mode Types & Configuration (mode.config.ts)
// Type definitions
export type OperationalMode = 'fast' | 'thinking' | 'auto';
export type EffectiveMode = 'fast' | 'thinking';
// Configuration per mode
export const MODE_CONFIG: Record<EffectiveMode, ModeConfig> = {
fast: {
model: 'llama-3.1-8b-instant',
maxTokens: 500,
temperature: 0.5,
systemPrompt: 'Be extremely concise and direct...'
},
thinking: {
model: 'llama-3.3-70b-versatile',
maxTokens: 4000,
temperature: 0.7,
systemPrompt: 'Provide thorough, comprehensive responses...'
}
};

2. ModeResolverService (mode-resolver.service.ts)
Resolves the operational mode using the following hierarchy:
Request Priority:
1. modeOverride (API request body)       ← Highest priority
2. conversation.operationalMode (DB)
3. Default: "auto"                       ← Lowest priority
Key Methods:
- resolveMode() - Main entry point
- getRequestedMode() - Determines mode via hierarchy
- resolveEffectiveMode() - Converts "auto" to concrete mode
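In code, this hierarchy reduces to a pair of fallbacks plus the auto-classification step. A functional sketch (the real logic lives in mode-resolver.service.ts; classifyWithAI is a stand-in for AutoClassifierService and is an assumption):

```ts
type OperationalMode = 'fast' | 'thinking' | 'auto';
type EffectiveMode = 'fast' | 'thinking';

// Illustrative sketch of the resolution hierarchy described above.
export async function resolveMode(
  modeOverride: OperationalMode | undefined,  // 1. message-level override (request body)
  conversationMode: OperationalMode | null,   // 2. conversation-level setting (DB)
  lastUserMessage: string,
  classifyWithAI: (text: string) => Promise<EffectiveMode>, // stand-in for AutoClassifierService
): Promise<{ requested: OperationalMode; effective: EffectiveMode }> {
  // Highest-priority source wins; fall back to "auto"
  const requested: OperationalMode = modeOverride ?? conversationMode ?? 'auto';

  // "auto" is never executed directly; it resolves to a concrete mode
  const effective: EffectiveMode =
    requested === 'auto' ? await classifyWithAI(lastUserMessage) : requested;

  return { requested, effective };
}
```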
3. AutoClassifierService (auto-classifier.service.ts)
AI-powered query complexity analysis:
Classification Flow:
1. Extract last user message
2. Quick heuristic (< 15 chars → fast)
3. Check ClassificationCacheService
   - Cache hit → return cached mode
4. AI classification (with 5s timeout)
   - Send prompt to lightweight model
   - "SIMPLE"  → fast mode
   - "COMPLEX" → thinking mode
5. Cache result (5-minute TTL)
6. Return effective mode
Classification Criteria:
| Query Type | Mode | Examples |
|---|---|---|
| Short questions | Fast | "What is X?", "Define Y" |
| Factual lookups | Fast | "Who invented Z?" |
| Yes/no questions | Fast | "Can I do X?" |
| Multi-part analysis | Thinking | "Compare X and Y in detail" |
| Code implementation | Thinking | "Build a function that..." |
| Debugging requests | Thinking | "Why is this code failing?" |
| Design decisions | Thinking | "Design a system for..." |
4. ClassificationCacheService (classification-cache.service.ts)
In-memory caching to reduce AI calls:
Cache Behavior:
- TTL: 5 minutes per entry
- Key: MD5 hash of last user message text
- Cleanup: Every 60 seconds (removes expired)
- Storage: Map<string, CacheEntry>
Benefits:
- Reduces classification API calls by ~70%
- Improves response time for repeated queries
- Automatic expiration ensures fresh classifications
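A minimal sketch of this behavior, assuming the shape described above (MD5 key of the last user message, 5-minute TTL, 60-second cleanup); the actual implementation lives in classification-cache.service.ts:

```ts
import { createHash } from 'node:crypto';

type EffectiveMode = 'fast' | 'thinking';

interface CacheEntry {
  mode: EffectiveMode;
  expiresAt: number;
}

// Illustrative in-memory TTL cache mirroring the behavior described above.
export class ClassificationCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs = 5 * 60_000; // 5-minute TTL per entry

  constructor() {
    // Periodic cleanup of expired entries (every 60 seconds)
    setInterval(() => this.evictExpired(), 60_000).unref();
  }

  get(messageText: string): EffectiveMode | null {
    const entry = this.entries.get(this.key(messageText));
    return entry && entry.expiresAt > Date.now() ? entry.mode : null;
  }

  set(messageText: string, mode: EffectiveMode): void {
    this.entries.set(this.key(messageText), {
      mode,
      expiresAt: Date.now() + this.ttlMs,
    });
  }

  private key(messageText: string): string {
    return createHash('md5').update(messageText).digest('hex'); // MD5 of the message text
  }

  private evictExpired(): void {
    const now = Date.now();
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) this.entries.delete(key);
    }
  }
}
```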
Conversation Entity (conversation.entity.ts)
ALTER TABLE conversations ADD COLUMN operational_mode VARCHAR(20);
-- Values: 'fast' | 'thinking' | 'auto' | NULL
-- NULL means use default behavior (auto mode)

Message Metadata (message.entity.ts)
// Messages store mode metadata in JSONB
{
"metadata": {
"toolCalls": [...], // Existing tool call data
"mode": { // NEW: Mode tracking
"requested": "auto",
"effective": "thinking",
"modelUsed": "llama-3.3-70b-versatile",
"temperature": 0.7,
"tokensUsed": null // Future: token tracking
}
}
}

POST /chat/conversations/:id/messages
Authorization: Bearer {token}
Content-Type: application/json
{
"messages": [
{
"role": "user",
"parts": [{ "type": "text", "text": "Explain quantum computing" }]
}
],
"modeOverride": "thinking" // Optional: "fast" | "thinking" | "auto"
}
Response: 200 OK (Server-Sent Events)

PUT /chat/conversations/:id/operational-mode
Authorization: Bearer {token}
Content-Type: application/json
{
"mode": "fast" // "fast" | "thinking" | "auto"
}
Response: 200 OK
{
"id": "conv-uuid",
"title": "Conversation Title",
"systemPrompt": "...",
"operationalMode": "fast", // Updated mode
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}

POST /chat/conversations/with-message
Authorization: Bearer {token}
Content-Type: application/json
{
"title": "New Chat",
"systemPrompt": "You are a helpful assistant.",
"operationalMode": "thinking", // Optional
"firstMessage": "Hello!"
}
Response: 201 Created
{
"id": "conv-uuid",
"title": "New Chat",
"operationalMode": "thinking",
...
}

Scenario: User sends "Explain how databases work internally" with no mode override
1. Request arrives at ChatController
   └─> No modeOverride in request

2. ChatService.handleStreamingResponse()
   └─> Calls ModeResolverService.resolveMode()

3. ModeResolverService.getRequestedMode()
   ├─> Check modeOverride: null
   ├─> Check conversation.operationalMode: null
   ├─> Default: "auto"
   └─> Requested mode = "auto"

4. ModeResolverService.resolveEffectiveMode()
   └─> Mode is "auto" → delegate to AutoClassifierService

5. AutoClassifierService.classify()
   ├─> Extract query: "Explain how databases work internally"
   ├─> Length check: 38 chars (> 15) → continue
   ├─> Cache check: MISS (not cached)
   ├─> AI classification prompt sent:
   │     "Query: 'Explain how databases work internally'"
   │     Response: "COMPLEX"
   ├─> Result: "thinking" mode
   └─> Cache result with key: md5(query)

6. Return to ChatService
   └─> { requested: "auto", effective: "thinking" }

7. ChatService gets MODE_CONFIG["thinking"]
   ├─> model: llama-3.3-70b-versatile
   ├─> maxTokens: 4000
   ├─> temperature: 0.7
   └─> systemPrompt: "Provide thorough, comprehensive..."

8. AIService.streamResponseWithMode()
   └─> Apply mode config to streaming request

9. Response streamed to client
   └─> Save message with mode metadata

10. Next identical query
    ├─> Cache HIT → skip AI classification
    └─> Instant mode resolution (< 1ms)
| Operation | Latency | Notes |
|---|---|---|
| Mode resolution (no auto) | < 1ms | Direct lookup |
| Cache hit (auto mode) | < 1ms | MD5 hash lookup |
| Cache miss (auto mode) | ~100-500ms | AI classification call |
| Classification timeout | 5s | Fallback to fast mode |
Optimization Strategies:
- ✅ 5-minute cache TTL reduces ~70% of classification calls
- ✅ Heuristic pre-filter (< 15 chars) gives a ~10% speedup
- ✅ 5-second timeout prevents hanging requests
- ✅ Fail-safe: errors → default to fast mode
Environment Variables:
# Fast mode model (lightweight, quick responses)
AI_TEXT_MODEL=llama-3.1-8b-instant
# Thinking mode model (powerful, detailed responses)
AI_TOOL_MODEL=llama-3.3-70b-versatile

Mode Customization:
Edit src/modules/chat/modes/mode.config.ts:
export const MODE_CONFIG: Record<EffectiveMode, ModeConfig> = {
fast: {
model: process.env.AI_TEXT_MODEL || 'llama-3.1-8b-instant',
maxTokens: 500, // Adjust max response length
temperature: 0.5, // Adjust creativity (0.0 - 1.0)
systemPrompt: '...' // Customize behavior
},
thinking: {
model: process.env.AI_TOOL_MODEL || 'llama-3.3-70b-versatile',
maxTokens: 4000,
temperature: 0.7,
systemPrompt: '...'
}
};

For Users:
- 🚀 Faster responses for simple queries (fast mode)
- 🧠 Deeper answers for complex questions (thinking mode)
- 🤖 Automatic optimization with auto mode (default)
- 🎯 Manual control via mode override or conversation settings
For System:
- 💰 Cost optimization - Use lightweight models when appropriate
- ⚡ Performance - Faster responses with smaller models
- 📊 Observability - Mode metadata tracked per message
- 🔧 Flexibility - Easy to add new modes or adjust configs
The tool system is designed to be extensible and type-safe:
- Tool Interface - Base contract for all tools
- Tool Registry - Central registry for managing tools
- Tool Config - Auto-registers tools on startup
- Tool Implementations - Specific tool logic
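For orientation, the base contract implementations follow looks roughly like this (a sketch inferred from the calculator example below; the actual interface lives in src/modules/chat/tools/interfaces/):

```ts
import { z } from 'zod';

// Hedged sketch of the Tool contract, inferred from the example below.
export interface Tool<TParams extends z.ZodTypeAny = z.ZodTypeAny> {
  readonly name: string;        // unique tool identifier exposed to the model
  readonly description: string; // used by the model to decide when to call the tool
  readonly parameters: TParams; // Zod schema validating the tool arguments
  execute(params: z.infer<TParams>): Promise<unknown>; // the tool's logic
}
```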
// 1. Create tool implementation
import { Injectable } from '@nestjs/common';
import { z } from 'zod';
import { Tool } from '../interfaces/tool.interface';
@Injectable()
export class CalculatorTool implements Tool {
readonly name = 'calculator';
readonly description = 'Performs basic arithmetic operations';
readonly parameters = z.object({
operation: z.enum(['add', 'subtract', 'multiply', 'divide']),
a: z.number(),
b: z.number(),
});
async execute(params: z.infer<typeof this.parameters>) {
const { operation, a, b } = params;
switch (operation) {
case 'add': return a + b;
case 'subtract': return a - b;
case 'multiply': return a * b;
case 'divide': return a / b;
}
}
}
// 2. Register in chat.module.ts
providers: [
// ... existing providers
CalculatorTool,
]
// 3. Register in tools.config.ts
onModuleInit() {
this.toolRegistry.register(this.calculatorTool);
}

The built-in WebSearchTool searches the web for current information using the Tavily API.
Parameters:
- query (string): Search query
- maxResults (number, optional): Max results (1-10, default: 5)
Returns:
{
query: string;
results: Array<{
title: string;
url: string;
snippet: string;
favicon: string;
publishedDate?: string;
relevanceScore: number;
}>;
resultsCount: number;
searchedAt: string;
summary: string;
citations: Array<{
text: string;
sourceIndex: number;
url: string;
}>;
}

This API is deployed on DigitalOcean using a production-grade architecture with Docker, PostgreSQL, Nginx reverse proxy, SSL certificates, and automated CI/CD.
Live API: https://api.betterdev.in

GitHub (push to main)
   ↓  CI/CD
GitHub Actions ──> SSH into VPS ──> Restart Docker Container
                   (auto-build + health check)

DigitalOcean VPS (Ubuntu 22.04)
   ↓
NGINX (SSL, reverse proxy)
   ↓
Docker Container (NestJS API)
   ↓
PostgreSQL (VPS service)
- Create a DigitalOcean Droplet with Ubuntu 22.04 LTS
- SSH into the server:
  ssh root@your-server-ip
- Create a non-root user (best practice):
  adduser kashif
  usermod -aG sudo kashif
  su - kashif
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add user to docker group (no sudo needed)
sudo usermod -aG docker $USER
newgrp docker
# Install Docker Compose
sudo apt-get update
sudo apt-get install docker-compose-plugin
# Verify installation
docker --version
docker compose version

Instead of running PostgreSQL in Docker, we use a standalone database for stability and performance:
# Install PostgreSQL 14
sudo apt update
sudo apt install postgresql postgresql-contrib
# Create database and user
sudo -u postgres psql
CREATE DATABASE better_dev_db;
CREATE USER better_dev WITH PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE better_dev_db TO better_dev;
\q
# Configure PostgreSQL to accept connections
sudo nano /etc/postgresql/14/main/postgresql.conf
# Set: listen_addresses = '*'
sudo nano /etc/postgresql/14/main/pg_hba.conf
# Add: host all all 0.0.0.0/0 md5
sudo systemctl restart postgresql
# Open firewall (if using UFW)
sudo ufw allow 5432/tcp

cd ~
git clone https://github.com/Kashif-Rezwi/better-dev-api.git
cd better-dev-api

Create .env file:
nano .env

Add environment variables:
# Database
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_USER=better_dev
DATABASE_PASSWORD=your_secure_password
DATABASE_NAME=better_dev_db
# JWT
JWT_SECRET=your-super-secret-jwt-key-change-this
JWT_EXPIRATION=7d
# App
PORT=3001
NODE_ENV=production
# AI
GROQ_API_KEY=gsk_your_groq_api_key
DEFAULT_AI_MODEL=openai/gpt-oss-120b
AI_TEXT_MODEL=llama-3.1-8b-instant
AI_TOOL_MODEL=llama-3.3-70b-versatile
# Tools
TAVILY_API_KEY=tvly-your-tavily-api-key

# Build Docker image
docker compose build --no-cache
# Start container in detached mode
docker compose up -d
# Verify API is running
curl http://localhost:3001/health
# View logs
docker compose logs -f

# Install Nginx
sudo apt update
sudo apt install nginx
# Create Nginx configuration
sudo nano /etc/nginx/sites-available/api.betterdev.in

Add the following configuration:
server {
listen 80;
server_name api.betterdev.in;
location / {
proxy_pass http://localhost:3001;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}

Enable the site:
# Create symbolic link
sudo ln -sf /etc/nginx/sites-available/api.betterdev.in /etc/nginx/sites-enabled/
# Test configuration
sudo nginx -t
# Reload Nginx
sudo systemctl reload nginx

# Install Certbot
sudo apt install certbot python3-certbot-nginx
# Obtain SSL certificate (auto-configures Nginx)
sudo certbot --nginx -d api.betterdev.in
# Verify auto-renewal
sudo certbot renew --dry-run

Now your API is accessible at: https://api.betterdev.in
This enables automatic deployment on every push to main branch.
On the VPS:
- Generate SSH key for deployments (no passphrase):
ssh-keygen -t ed25519 -C "github-actions-deploy" -f ~/.ssh/github_actions_deploy
# Add public key to authorized_keys
cat ~/.ssh/github_actions_deploy.pub >> ~/.ssh/authorized_keys
# Copy private key (you'll add this to GitHub Secrets)
cat ~/.ssh/github_actions_deploy
On GitHub:
- Add SSH private key to GitHub Secrets:
  - Go to your repo → Settings → Secrets and variables → Actions
  - Add secret: SSH_PRIVATE_KEY (paste the private key content)
  - Add secret: SERVER_IP (your VPS IP address)
  - Add secret: SERVER_USER (your username, e.g., kashif)
- Create GitHub Actions workflow:
  Create .github/workflows/deploy.yml:
name: Deploy to DigitalOcean
on:
push:
branches: [ main ]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Deploy to VPS
uses: appleboy/ssh-action@v1.0.0
with:
host: ${{ secrets.SERVER_IP }}
username: ${{ secrets.SERVER_USER }}
key: ${{ secrets.SSH_PRIVATE_KEY }}
script: |
cd ~/better-dev-api
git pull origin main
docker compose down
docker compose up -d --build
docker compose logs --tail=50

Now, every push to main automatically deploys to production! 🚀

✅ Auto Deploy - Push to GitHub → automatic deployment
✅ Auto Restart - Docker restarts the API if it crashes
✅ Health Checks - Docker monitors API health
✅ SSL Auto-Renew - Free HTTPS with auto-renewal
✅ Isolated Database - PostgreSQL runs independently
✅ Reverse Proxy - Nginx handles all traffic
✅ Production-Grade - Enterprise-level architecture
# SSH into VPS
ssh kashif@your-server-ip
# View logs
cd ~/better-dev-api
docker compose logs -f
# Restart API
docker compose restart
# Rebuild and restart
docker compose down
docker compose up -d --build
# Check health
curl https://api.betterdev.in/health
# View Nginx logs
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log
# Check SSL certificate
sudo certbot certificates

For local development with Docker:
# Build and start all services
npm run docker:up
# View logs
npm run docker:logs
# Stop services
npm run docker:down
# Restart app only
npm run docker:restart

# Development
npm run start:dev # Start with hot-reload
npm run start:debug # Start with debugger
# Production
npm run build # Build for production
npm run start:prod # Run production build
# Docker
npm run docker:build # Build Docker images
npm run docker:up # Start Docker containers
npm run docker:down # Stop Docker containers
npm run docker:logs # View logs
npm run docker:restart # Restart app container
# Testing
npm run test # Run tests
npm run test:watch # Run tests in watch mode
npm run test:cov       # Generate coverage report

# Linting
npx eslint .
# Format code
npx prettier --write "src/**/*.ts"

# Generate migration
npx typeorm migration:generate -n MigrationName
# Run migrations
npx typeorm migration:run
# Revert migration
npx typeorm migration:revert

| Variable | Description | Example |
|---|---|---|
| DATABASE_HOST | PostgreSQL host | localhost |
| DATABASE_PORT | PostgreSQL port | 5432 |
| DATABASE_USER | Database user | nebula_ai |
| DATABASE_PASSWORD | Database password | securepassword |
| DATABASE_NAME | Database name | nebula_db |
| JWT_SECRET | JWT signing secret | your-secret-key |
| JWT_EXPIRATION | Token expiration | 7d |
| GROQ_API_KEY | Groq API key | gsk_... |
| TAVILY_API_KEY | Tavily API key | tvly-... |
| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 3001 |
| NODE_ENV | Environment | development |
| DEFAULT_AI_MODEL | Default AI model | openai/gpt-oss-120b |
| AI_TEXT_MODEL | Fast text model | llama-3.1-8b-instant |
| AI_TOOL_MODEL | Tool calling model | llama-3.3-70b-versatile |
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow TypeScript best practices
- Use meaningful variable/function names
- Write self-documenting code
- Add comments for complex logic
- Maintain consistent formatting (use Prettier)
- Write unit tests for new features
This project is licensed under the UNLICENSED License.
- NestJS - Framework
- Vercel AI SDK - AI streaming
- Groq - Fast AI inference
- Tavily - Web search API
- TypeORM - Database ORM
For questions or issues:
- Open an Issue
- Contact: GitHub Profile
Built with ❤️ using NestJS and TypeScript