Skip to content

Core server and API for Better DEV, an intelligent AI chat platform with tool-calling capabilities and streaming responses.

Notifications You must be signed in to change notification settings

Kashif-Rezwi/better-dev-api

Repository files navigation

Better DEV AI Backend

Core server and API for Better DEV, an intelligent AI chat platform with tool-calling capabilities and streaming responses.

NestJS TypeScript PostgreSQL Docker

πŸ“‹ Table of Contents


🌟 Overview

Better DEV AI Backend is a production-ready NestJS application that powers an intelligent conversational AI platform. It provides:

  • Real-time AI Conversations with streaming responses using AI SDK v5
  • Intelligent Tool Calling system with web search capabilities
  • Conversation Management with persistent storage
  • User Authentication with JWT-based security
  • Multi-Model AI Support using Groq (Llama models)
  • Extensible Tool Architecture for adding custom capabilities

πŸ—οΈ Architecture

High-Level Design (HLD)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         Client Layer                            β”‚
β”‚  (React/Next.js Frontend, Mobile Apps, API Clients)             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚ HTTPS/REST
                     β”‚ Server-Sent Events (SSE)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      API Gateway (Nginx)                        β”‚
β”‚  - Load Balancing                                               β”‚
β”‚  - SSL Termination                                              β”‚
β”‚  - Request Routing                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   NestJS Application Layer                      β”‚
β”‚                                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚              Controllers (REST Endpoints)               β”‚    β”‚
β”‚  β”‚  - AuthController: /auth/*                              β”‚    β”‚
β”‚  β”‚  - ChatController: /chat/*                              β”‚    β”‚
β”‚  β”‚  - HealthController: /health                            β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                 β”‚                                               β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚                  Service Layer                          β”‚    β”‚
β”‚  β”‚                                                         β”‚    β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚    β”‚
β”‚  β”‚  β”‚ AuthService β”‚  β”‚ ChatService  β”‚  β”‚  UserService  β”‚   β”‚    β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚    β”‚
β”‚  β”‚         β”‚                β”‚                  β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚    AIService           β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚  - analyzeQueryIntent()β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚  - streamResponseWith  β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚    Mode()              β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚  - generateResponse()  β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚                β”‚                  β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ Operational Modes      β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚                        β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”‚ ModeResolver       β”‚ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”‚ Service            β”‚ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚           β”‚            β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”‚ AutoClassifier     β”‚ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”‚ Service            β”‚ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚           β”‚            β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”‚ ClassificationCacheβ”‚ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β”‚ Service            β”‚ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚           β”‚    β”‚
β”‚  β”‚         β”‚                                   β”‚           β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚            β”‚                                   β”‚                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚              Tool System (Extensible)                    β”‚   β”‚
β”‚  β”‚                                                          β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚   β”‚
β”‚  β”‚  β”‚         ToolRegistry (Service Locator)           β”‚    β”‚   β”‚
β”‚  β”‚  β”‚  - register()  - get()  - toAISDKFormat()        β”‚    β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚   β”‚
β”‚  β”‚                       β”‚                                  β”‚   β”‚
β”‚  β”‚         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚   β”‚
β”‚  β”‚         β”‚                           β”‚                    β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”             β”‚   β”‚
β”‚  β”‚  β”‚ WebSearchTool β”‚      β”‚  Future Tools    β”‚             β”‚   β”‚
β”‚  β”‚  β”‚ - execute()   β”‚      β”‚  - Calculator    β”‚             β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚  - Weather       β”‚             β”‚   β”‚
β”‚  β”‚          β”‚              β”‚  - ImageGen      β”‚             β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜             β”‚   β”‚
β”‚  β”‚  β”‚ TavilyService    β”‚                                    β”‚   β”‚
β”‚  β”‚  β”‚ SummaryService   β”‚                                    β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚            Data Access Layer (TypeORM)                  β”‚    β”‚
β”‚  β”‚  - Repositories: User, Conversation, Message            β”‚    β”‚
β”‚  β”‚  - Entity Models with Relations                         β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  PostgreSQL Database                            β”‚
β”‚                                                                 β”‚
β”‚  Tables:                                                        β”‚
β”‚  - users (id, email, password, credits, isActive)               β”‚
β”‚  - conversations (id, userId, title, systemPrompt,              β”‚
β”‚                   operationalMode)                              β”‚
β”‚  - messages (id, conversationId, role, content, metadata)       β”‚
β”‚                                                                 β”‚
β”‚  Relationships:                                                 β”‚
β”‚  - User 1:N Conversations                                       β”‚
β”‚  - Conversation 1:N Messages                                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   External Services                             β”‚
β”‚  - Groq API (AI Models: Llama 3.1, Llama 3.3)                   β”‚
β”‚  - Tavily API (Web Search)                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Component Interactions

User Request Flow:
1. Client β†’ Nginx β†’ NestJS Controller
2. Controller validates JWT via JwtAuthGuard
3. Controller β†’ Service Layer
4. Service Layer β†’ AI Service (with Tool Registry)
5. AI Service β†’ Groq API (streaming)
6. If tool needed β†’ Tool Registry β†’ Specific Tool β†’ External API
7. Stream results back through SSE β†’ Client

Database Transaction Flow:
1. Service Layer β†’ TypeORM Repository
2. Repository β†’ PostgreSQL
3. Response β†’ Service β†’ Controller β†’ Client

✨ Features

Core Capabilities

  • πŸ€– Multi-Model AI Support

    • Fast text generation with Llama 3.1 8B
    • Advanced tool calling with Llama 3.3 70B
    • Automatic model selection based on task
  • πŸ”§ Extensible Tool System

    • Plugin-based architecture
    • Type-safe tool definitions with Zod
    • Automatic tool registration
    • Built-in web search with Tavily
    • Automatically determines if queries need web search
  • πŸ’¬ Conversation Management

    • Persistent conversation history
    • Custom system prompts per conversation
    • Automatic title generation
    • Message metadata (tool calls, citations)
  • πŸ” Security

    • JWT-based authentication
    • Password hashing with bcrypt
    • Request validation with class-validator
    • CORS configuration
  • 🎯 Operational Modes (NEW)

    • Fast Mode - Quick responses with Llama 3.1 8B (500 tokens, 0.5 temp)
    • Thinking Mode - Deep analysis with Llama 3.3 70B (4000 tokens, 0.7 temp)
    • Auto Mode - AI-powered classification (default)
    • Intelligent query complexity detection
    • 5-minute classification caching (70% API call reduction)
    • Per-conversation and per-message mode control
    • Mode metadata tracking in message history
  • πŸ“‘ Real-time Streaming

    • Server-Sent Events (SSE)
    • AI SDK v5 compatible
    • Tool execution visibility
    • Progress tracking

πŸ› οΈ Tech Stack

Backend Framework

  • NestJS - Progressive Node.js framework
  • TypeScript - Type-safe development
  • Express - HTTP server

Database

  • PostgreSQL - Relational database
  • TypeORM - ORM with entity management

AI & Tools

  • AI SDK v5 (Vercel) - Unified AI interface
  • Groq - Fast AI inference
  • Tavily - Web search API

Authentication

  • Passport JWT - Token-based auth
  • bcrypt - Password hashing

Validation

  • class-validator - DTO validation
  • class-transformer - Object transformation
  • Zod - Schema validation for tools

DevOps

  • Docker - Containerization
  • Docker Compose - Multi-container orchestration
  • Nginx - Reverse proxy & load balancing

πŸ“¦ Prerequisites

  • Node.js 20+ (LTS recommended)
  • npm or yarn
  • PostgreSQL 15+
  • Docker & Docker Compose (for containerized setup)
  • Groq API Key (Get one here)
  • Tavily API Key (Get one here)

πŸš€ Getting Started

Option 1: Docker (Recommended)

  1. Clone the repository

    git clone https://github.com/Kashif-Rezwi/better-dev-api.git
    cd better-dev-api
  2. Create environment file

    cp .env.docker .env
  3. Configure environment variables

    # Edit .env file
    JWT_SECRET=your-super-secret-jwt-key-change-this
    GROQ_API_KEY=gsk_your_groq_api_key
    TAVILY_API_KEY=tvly-your-tavily-api-key
    DEFAULT_AI_MODEL=openai/gpt-oss-120b
    AI_TEXT_MODEL=llama-3.1-8b-instant
    AI_TOOL_MODEL=llama-3.3-70b-versatile
  4. Start the application

    npm run docker:up
  5. Check health

    curl http://localhost:3001/health
  6. View logs

    npm run docker:logs

Option 2: Local Development

  1. Clone the repository

    git clone https://github.com/Kashif-Rezwi/better-dev-api.git
    cd better-dev-api
  2. Install dependencies

    npm install
  3. Setup PostgreSQL

    # Install PostgreSQL locally or use Docker
    docker run --name nebula-postgres \
      -e POSTGRES_USER=nebula_ai \
      -e POSTGRES_PASSWORD=nebula_ai \
      -e POSTGRES_DB=nebula_db \
      -p 5432:5432 \
      -d postgres:15-alpine
  4. Create .env file

    # Database
    DATABASE_HOST=localhost
    DATABASE_PORT=5432
    DATABASE_USER=nebula_ai
    DATABASE_PASSWORD=nebula_ai
    DATABASE_NAME=nebula_db
    
    # JWT
    JWT_SECRET=your-super-secret-jwt-key
    JWT_EXPIRATION=7d
    
    # App
    PORT=3001
    NODE_ENV=development
    
    # AI
    GROQ_API_KEY=gsk_your_groq_api_key
    DEFAULT_AI_MODEL=openai/gpt-oss-120b
    AI_TEXT_MODEL=llama-3.1-8b-instant
    AI_TOOL_MODEL=llama-3.3-70b-versatile
    
    # Tools
    TAVILY_API_KEY=tvly-your-tavily-api-key
  5. Start development server

    npm run start:dev
  6. Access the API


πŸ“ Project Structure

better-dev-api/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ config/                      # Configuration files
β”‚   β”‚   β”œβ”€β”€ database.config.ts       # Database connection config
β”‚   β”‚   └── jwt.config.ts            # JWT authentication config
β”‚   β”‚
β”‚   β”œβ”€β”€ common/                      # Shared resources
β”‚   β”‚   └── guards/
β”‚   β”‚       └── jwt-auth.guard.ts    # JWT authentication guard
β”‚   β”‚
β”‚   β”œβ”€β”€ modules/                     # Feature modules
β”‚   β”‚   β”œβ”€β”€ auth/                    # Authentication module
β”‚   β”‚   β”‚   β”œβ”€β”€ dto/                 # Data Transfer Objects
β”‚   β”‚   β”‚   β”œβ”€β”€ strategies/          # Passport strategies
β”‚   β”‚   β”‚   β”œβ”€β”€ auth.controller.ts
β”‚   β”‚   β”‚   β”œβ”€β”€ auth.service.ts
β”‚   β”‚   β”‚   └── auth.module.ts
β”‚   β”‚   β”‚
β”‚   β”‚   β”œβ”€β”€ user/                    # User management module
β”‚   β”‚   β”‚   β”œβ”€β”€ entities/            # Database entities
β”‚   β”‚   β”‚   β”œβ”€β”€ dto/
β”‚   β”‚   β”‚   β”œβ”€β”€ user.service.ts
β”‚   β”‚   β”‚   └── user.module.ts
β”‚   β”‚   β”‚
β”‚   β”‚   └── chat/                    # Chat & AI module
β”‚   β”‚       β”œβ”€β”€ entities/            # Conversation, Message
β”‚   β”‚       β”œβ”€β”€ dto/
β”‚   β”‚       β”œβ”€β”€ tools/               # Tool system
β”‚   β”‚       β”‚   β”œβ”€β”€ implementations/ # Tool implementations
β”‚   β”‚       β”‚   β”‚   └── web-search.tool.ts
β”‚   β”‚       β”‚   β”œβ”€β”€ interfaces/      # Tool interfaces
β”‚   β”‚       β”‚   β”œβ”€β”€ services/        # Tool services
β”‚   β”‚       β”‚   β”‚   β”œβ”€β”€ tavily.service.ts
β”‚   β”‚       β”‚   β”‚   └── summary.service.ts
β”‚   β”‚       β”‚   β”œβ”€β”€ tool.registry.ts # Tool registry
β”‚   β”‚       β”‚   └── tools.config.ts  # Tool configuration
β”‚   β”‚       β”œβ”€β”€ chat.controller.ts
β”‚   β”‚       β”œβ”€β”€ chat.service.ts
β”‚   β”‚       β”œβ”€β”€ ai.service.ts
β”‚   β”‚       └── chat.module.ts
β”‚   β”‚
β”‚   β”œβ”€β”€ app.module.ts                # Root module
β”‚   β”œβ”€β”€ main.ts                      # Application entry point
β”‚   └── health.controller.ts         # Health check endpoint
β”‚
β”œβ”€β”€ nginx/
β”‚   └── nginx.conf                   # Nginx configuration
β”‚
β”œβ”€β”€ docker-compose.yml               # Docker services
β”œβ”€β”€ Dockerfile                       # Production Docker image
β”œβ”€β”€ package.json
β”œβ”€β”€ tsconfig.json
└── README.md

πŸ“‘ API Documentation

Base URL

  • Development: http://localhost:3001
  • Production: Configure via Nginx

Authentication

Register User

POST /auth/register
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "securepassword123"
}

Response: 201 Created
{
  "accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "uuid",
    "email": "user@example.com",
    "credits": 1000
  }
}

Login

POST /auth/login
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "securepassword123"
}

Response: 200 OK
{
  "accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "uuid",
    "email": "user@example.com",
    "credits": 1000
  }
}

Get Profile

GET /auth/profile
Authorization: Bearer {token}

Response: 200 OK
{
  "userId": "uuid",
  "email": "user@example.com"
}

Conversations

Create Conversation with First Message

POST /chat/conversations/with-message
Authorization: Bearer {token}
Content-Type: application/json

{
  "title": "New Conversation",
  "systemPrompt": "You are a helpful assistant.",
  "firstMessage": "Hello, how are you?"
}

Response: 201 Created
{
  "id": "conv-uuid",
  "title": "New Conversation",
  "systemPrompt": "You are a helpful assistant.",
  "createdAt": "2025-01-01T00:00:00.000Z",
  "updatedAt": "2025-01-01T00:00:00.000Z"
}

List Conversations

GET /chat/conversations
Authorization: Bearer {token}

Response: 200 OK
[
  {
    "id": "uuid",
    "title": "Conversation Title",
    "systemPrompt": "Custom prompt",
    "createdAt": "2025-01-01T00:00:00.000Z",
    "updatedAt": "2025-01-01T00:00:00.000Z",
    "lastMessage": {
      "id": "msg-uuid",
      "role": "assistant",
      "content": "Last message preview...",
      "createdAt": "2025-01-01T00:00:00.000Z"
    }
  }
]

Get Conversation

GET /chat/conversations/{id}
Authorization: Bearer {token}

Response: 200 OK
{
  "id": "uuid",
  "title": "Conversation Title",
  "systemPrompt": "Custom prompt",
  "createdAt": "2025-01-01T00:00:00.000Z",
  "updatedAt": "2025-01-01T00:00:00.000Z",
  "messages": [
    {
      "id": "msg-uuid",
      "role": "user",
      "content": "Hello",
      "createdAt": "2025-01-01T00:00:00.000Z"
    },
    {
      "id": "msg-uuid-2",
      "role": "assistant",
      "content": "Hi there!",
      "createdAt": "2025-01-01T00:00:01.000Z",
      "metadata": {
        "toolCalls": []
      }
    }
  ]
}

Send Message (Streaming)

POST /chat/conversations/{id}/messages
Authorization: Bearer {token}
Content-Type: application/json

{
  "messages": [
    {
      "id": "msg-1",
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What's the latest news about AI?"
        }
      ]
    }
  ]
}

Response: 200 OK (Server-Sent Events)
Content-Type: text/event-stream

data: {"type":"text-delta","textDelta":"The latest"}
data: {"type":"text-delta","textDelta":" news..."}
data: {"type":"tool-call-start","toolName":"tavily_web_search"}
data: {"type":"tool-result","result":{...}}
data: {"type":"finish","usage":{...}}

Update System Prompt

PUT /chat/conversations/{id}/system-prompt
Authorization: Bearer {token}
Content-Type: application/json

{
  "systemPrompt": "You are an expert in AI and machine learning."
}

Response: 200 OK
{
  "id": "uuid",
  "title": "Conversation Title",
  "systemPrompt": "You are an expert in AI and machine learning.",
  "createdAt": "2025-01-01T00:00:00.000Z",
  "updatedAt": "2025-01-01T00:00:00.000Z"
}

Generate Title

POST /chat/conversations/{id}/generate-title
Authorization: Bearer {token}
Content-Type: application/json

{
  "message": "What's the weather like today?"
}

Response: 200 OK
{
  "title": "Weather Inquiry"
}

Delete Conversation

DELETE /chat/conversations/{id}
Authorization: Bearer {token}

Response: 204 No Content

Health Check

GET /health

Response: 200 OK
{
  "status": "ok",
  "timestamp": "2025-01-01T00:00:00.000Z",
  "service": "better-dev-ai-chat"
}

🧠 AI Service & Intelligent Query Routing

Query Intent Analysis

The AI Service includes an intelligent query intent analyzer that optimizes tool usage and reduces unnecessary API calls. The analyzeQueryIntent() method automatically determines whether a user query requires real-time web search or can be answered from general knowledge.

How It Works

// Automatically analyzes each query
const needsWebSearch = await aiService.analyzeQueryIntent(messages);

// Routes to appropriate model
if (needsWebSearch) {
  // Use tool-calling model with web search
  stream = aiService.streamResponse(messages, tools);
} else {
  // Use fast text model without tools
  stream = aiService.streamResponse(messages);
}

Intelligence Features

βœ… Queries that trigger web search:

  • Current/recent events and news (e.g., "latest AI trends 2025")
  • Real-time information (e.g., "today's weather", "current stock price")
  • Up-to-date data that changes frequently

❌ Queries that use general knowledge:

  • General knowledge questions (e.g., "What is JavaScript?", "Explain OOP")
  • Follow-up questions to previous searches (context already available)
  • Questions about AI capabilities
  • General conversation and clarification

Smart Conversation Context

The analyzer checks recent conversation history (last 3 turns) to detect if web search was recently performed. This prevents redundant searches when:

  • User asks follow-up questions about search results
  • Context is already available from previous tool calls
  • User is refining or clarifying a previous query

Benefits

  • πŸ’° Cost Optimization - Avoids unnecessary tool-calling model usage
  • ⚑ Faster Responses - Uses lightweight models when appropriate
  • 🎯 Better UX - Reduces wait time for simple queries
  • πŸ” Smarter Tool Usage - Only searches when truly needed

Configuration

The feature uses different AI models for different tasks:

# Fast model for query analysis and simple responses
AI_TEXT_MODEL=llama-3.1-8b-instant

# Powerful model for tool calling and complex tasks
AI_TOOL_MODEL=llama-3.3-70b-versatile

Example Flow

User: "What is machine learning?"
  ↓
AnalyzeQueryIntent() β†’ NO (general knowledge)
  ↓
Use fast text model β†’ Quick response
  ↓
User: "What are the latest ML breakthroughs in 2025?"
  ↓
AnalyzeQueryIntent() β†’ YES (current events)
  ↓
Use tool model + web search β†’ Real-time results

🎯 Operational Modes

Overview

The Operational Modes system provides intelligent chat response modes that automatically adjust AI model selection, token limits, and response styles based on query complexity and user preferences.

Available Modes:

  • Fast Mode - Quick, concise responses using lightweight models (Llama 3.1 8B)
  • Thinking Mode - Detailed, comprehensive responses using advanced models (Llama 3.3 70B)
  • Auto Mode - AI-powered automatic mode selection based on query complexity (default)

High-Level Design (HLD)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         User Request                            β”‚
β”‚  POST /chat/conversations/:id/messages                          β”‚
β”‚  { messages, modeOverride?: "fast"|"thinking"|"auto" }          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ChatController                               β”‚
β”‚  - Validates request                                            β”‚
β”‚  - Extracts modeOverride (optional)                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ChatService                                  β”‚
β”‚  - handleStreamingResponse()                                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚  β”‚      MODE RESOLUTION HIERARCHY             β”‚
               β”‚  β”‚                                            β”‚
               β”‚  β”‚  Priority (highest to lowest):             β”‚
               β”‚  β”‚  1. Message-level override (request)       β”‚
               β”‚  β”‚  2. Conversation-level setting (DB)        β”‚
               β”‚  β”‚  3. Default mode ("auto")                  β”‚
               β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    ModeResolverService                           β”‚
β”‚  - resolveMode()                                                 β”‚
β”‚  - getRequestedMode() β†’ Returns: "fast"|"thinking"|"auto"        β”‚
β”‚  - resolveEffectiveMode() β†’ Returns: "fast"|"thinking"           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β”‚ If mode === "auto"
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 AutoClassifierService                            β”‚
β”‚  Uses AI to classify query complexity                            β”‚
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚           Classification Logic                           β”‚    β”‚
β”‚  β”‚                                                          β”‚    β”‚
β”‚  β”‚  1. Check ClassificationCacheService (5-min TTL)         β”‚    β”‚
β”‚  β”‚  2. Quick heuristic: queries < 15 chars β†’ fast           β”‚    β”‚
β”‚  β”‚  3. AI classification prompt                             β”‚    β”‚
β”‚  β”‚     - SIMPLE queries β†’ fast mode                         β”‚    β”‚
β”‚  β”‚     - COMPLEX queries β†’ thinking mode                    β”‚    β”‚
β”‚  β”‚  4. Cache result for future requests                     β”‚    β”‚
β”‚  β”‚  5. Timeout fallback: 5s β†’ defaults to fast              β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β”‚ Returns: { requested, effective }
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    MODE_CONFIG                                   β”‚
β”‚  Static configuration for each mode                              β”‚
β”‚                                                                  β”‚
β”‚  fast: {                         thinking: {                     β”‚
β”‚    model: llama-3.1-8b-instant     model: llama-3.3-70b-versatileβ”‚
β”‚    maxTokens: 500                  maxTokens: 4000               β”‚
β”‚    temperature: 0.5                temperature: 0.7              β”‚
β”‚    systemPrompt: "Be concise"      systemPrompt: "Be detailed"   β”‚
β”‚  }                                }                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β”‚ Pass effective mode config
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    AIService                                     β”‚
β”‚  - streamResponseWithMode(messages, effectiveMode, ...)          β”‚
β”‚  - Applies mode config (model, tokens, temperature, prompt)      β”‚
β”‚  - Streams response with configured parameters                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β”‚ Stream response + save metadata
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                Message Saved with Metadata                       β”‚
β”‚  {                                                               β”‚
β”‚    content: "...",                                               β”‚
β”‚    metadata: {                                                   β”‚
β”‚      mode: {                                                     β”‚
β”‚        requested: "auto",                                        β”‚
β”‚        effective: "thinking",                                    β”‚
β”‚        modelUsed: "llama-3.3-70b-versatile",                     β”‚
β”‚        temperature: 0.7                                          β”‚
β”‚      }                                                           β”‚
β”‚    }                                                             β”‚
β”‚  }                                                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Low-Level Design (LLD)

Component Breakdown

1. Mode Types & Configuration (mode.config.ts)

// Type definitions
export type OperationalMode = 'fast' | 'thinking' | 'auto';
export type EffectiveMode = 'fast' | 'thinking';

// Configuration per mode
export const MODE_CONFIG: Record<EffectiveMode, ModeConfig> = {
  fast: {
    model: 'llama-3.1-8b-instant',
    maxTokens: 500,
    temperature: 0.5,
    systemPrompt: 'Be extremely concise and direct...'
  },
  thinking: {
    model: 'llama-3.3-70b-versatile',
    maxTokens: 4000,
    temperature: 0.7,
    systemPrompt: 'Provide thorough, comprehensive responses...'
  }
};

2. ModeResolverService (mode-resolver.service.ts)

Resolves operational mode using hierarchy:

Request Priority:
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ 1. modeOverride (API request body)      β”‚ ← Highest priority
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ 2. conversation.operationalMode (DB)    β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ 3. Default: "auto"                      β”‚ ← Lowest priority
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Methods:

  • resolveMode() - Main entry point
  • getRequestedMode() - Determines mode via hierarchy
  • resolveEffectiveMode() - Converts "auto" to concrete mode

3. AutoClassifierService (auto-classifier.service.ts)

AI-powered query complexity analysis:

Classification Flow:
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ 1. Extract last user message            β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ 2. Quick heuristic (< 15 chars β†’ fast)  β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ 3. Check ClassificationCacheService     β”‚
  β”‚    - Cache hit β†’ return cached mode     β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ 4. AI classification (with 5s timeout)  β”‚
  β”‚    - Send prompt to lightweight model   β”‚
  β”‚    - "SIMPLE" β†’ fast mode               β”‚
  β”‚    - "COMPLEX" β†’ thinking mode          β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ 5. Cache result (5-minute TTL)          β”‚
  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
  β”‚ 6. Return effective mode                β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Classification Criteria:

Query Type Mode Examples
Short questions Fast "What is X?", "Define Y"
Factual lookups Fast "Who invented Z?"
Yes/no questions Fast "Can I do X?"
Multi-part analysis Thinking "Compare X and Y in detail"
Code implementation Thinking "Build a function that..."
Debugging requests Thinking "Why is this code failing?"
Design decisions Thinking "Design a system for..."

4. ClassificationCacheService (classification-cache.service.ts)

In-memory caching to reduce AI calls:

Cache Behavior:
  - TTL: 5 minutes per entry
  - Key: MD5 hash of last user message text
  - Cleanup: Every 60 seconds (removes expired)
  - Storage: Map<string, CacheEntry>

Benefits:

  • Reduces classification API calls by ~70%
  • Improves response time for repeated queries
  • Automatic expiration ensures fresh classifications

Database Schema

Conversation Entity (conversation.entity.ts)

ALTER TABLE conversations ADD COLUMN operational_mode VARCHAR(20);
-- Values: 'fast' | 'thinking' | 'auto' | NULL

-- NULL means use default behavior (auto mode)

Message Metadata (message.entity.ts)

// Messages store mode metadata in JSONB
{
  "metadata": {
    "toolCalls": [...],  // Existing tool call data
    "mode": {            // NEW: Mode tracking
      "requested": "auto",
      "effective": "thinking",
      "modelUsed": "llama-3.3-70b-versatile",
      "temperature": 0.7,
      "tokensUsed": null  // Future: token tracking
    }
  }
}

API Endpoints

1. Send Message with Mode Override

POST /chat/conversations/:id/messages
Authorization: Bearer {token}
Content-Type: application/json

{
  "messages": [
    {
      "role": "user",
      "parts": [{ "type": "text", "text": "Explain quantum computing" }]
    }
  ],
  "modeOverride": "thinking"  // Optional: "fast" | "thinking" | "auto"
}

Response: 200 OK (Server-Sent Events)

2. Update Conversation Mode

PUT /chat/conversations/:id/operational-mode
Authorization: Bearer {token}
Content-Type: application/json

{
  "mode": "fast"  // "fast" | "thinking" | "auto"
}

Response: 200 OK
{
  "id": "conv-uuid",
  "title": "Conversation Title",
  "systemPrompt": "...",
  "operationalMode": "fast",  // Updated mode
  "createdAt": "2025-01-01T00:00:00.000Z",
  "updatedAt": "2025-01-01T00:00:00.000Z"
}

3. Create Conversation with Mode

POST /chat/conversations/with-message
Authorization: Bearer {token}
Content-Type: application/json

{
  "title": "New Chat",
  "systemPrompt": "You are a helpful assistant.",
  "operationalMode": "thinking",  // Optional
  "firstMessage": "Hello!"
}

Response: 201 Created
{
  "id": "conv-uuid",
  "title": "New Chat",
  "operationalMode": "thinking",
  ...
}

Request Flow Example

Scenario: User sends "Explain how databases work internally" with no mode override

1. Request arrives at ChatController
   └─> No modeOverride in request

2. ChatService.handleStreamingResponse()
   └─> Calls ModeResolverService.resolveMode()

3. ModeResolverService.getRequestedMode()
   β”œβ”€> Check modeOverride: null
   β”œβ”€> Check conversation.operationalMode: null
   └─> Default: "auto"
   └─> Requested mode = "auto"

4. ModeResolverService.resolveEffectiveMode()
   └─> Mode is "auto" β†’ delegate to AutoClassifierService

5. AutoClassifierService.classify()
   β”œβ”€> Extract query: "Explain how databases work internally"
   β”œβ”€> Length check: 38 chars (> 15) β†’ continue
   β”œβ”€> Cache check: MISS (not cached)
   β”œβ”€> AI classification prompt sent:
   β”‚   "Query: 'Explain how databases work internally'"
   β”‚   Response: "COMPLEX"
   β”œβ”€> Result: "thinking" mode
   └─> Cache result with key: md5(query)

6. Return to ChatService
   └─> { requested: "auto", effective: "thinking" }

7. ChatService gets MODE_CONFIG["thinking"]
   └─> model: llama-3.3-70b-versatile
   └─> maxTokens: 4000
   └─> temperature: 0.7
   └─> systemPrompt: "Provide thorough, comprehensive..."

8. AIService.streamResponseWithMode()
   └─> Apply mode config to streaming request

9. Response streamed to client
   └─> Save message with mode metadata

10. Next identical query
    └─> Cache HIT β†’ skip AI classification
    └─> Instant mode resolution (< 1ms)

Performance Characteristics

Operation Latency Notes
Mode resolution (no auto) < 1ms Direct lookup
Cache hit (auto mode) < 1ms MD5 hash lookup
Cache miss (auto mode) ~100-500ms AI classification call
Classification timeout 5s Fallback to fast mode

Optimization Strategies:

  • βœ… 5-minute cache TTL reduces 70% of classification calls
  • βœ… Heuristic pre-filter (< 15 chars) ~10% speedup
  • βœ… 5-second timeout prevents hanging requests
  • βœ… Fail-safe: errors β†’ default to fast mode

Configuration

Environment Variables:

# Fast mode model (lightweight, quick responses)
AI_TEXT_MODEL=llama-3.1-8b-instant

# Thinking mode model (powerful, detailed responses)
AI_TOOL_MODEL=llama-3.3-70b-versatile

Mode Customization:

Edit src/modules/chat/modes/mode.config.ts:

export const MODE_CONFIG: Record<EffectiveMode, ModeConfig> = {
  fast: {
    model: process.env.AI_TEXT_MODEL || 'llama-3.1-8b-instant',
    maxTokens: 500,        // Adjust max response length
    temperature: 0.5,      // Adjust creativity (0.0 - 1.0)
    systemPrompt: '...'    // Customize behavior
  },
  thinking: {
    model: process.env.AI_TOOL_MODEL || 'llama-3.3-70b-versatile',
    maxTokens: 4000,
    temperature: 0.7,
    systemPrompt: '...'
  }
};

Benefits

For Users:

  • πŸš€ Faster responses for simple queries (fast mode)
  • 🧠 Deeper answers for complex questions (thinking mode)
  • πŸ€– Automatic optimization with auto mode (default)
  • 🎯 Manual control via mode override or conversation settings

For System:

  • πŸ’° Cost optimization - Use lightweight models when appropriate
  • ⚑ Performance - Faster responses with smaller models
  • πŸ“Š Observability - Mode metadata tracked per message
  • πŸ”§ Flexibility - Easy to add new modes or adjust configs

πŸ”§ Tool System

Architecture

The tool system is designed to be extensible and type-safe:

  1. Tool Interface - Base contract for all tools
  2. Tool Registry - Central registry for managing tools
  3. Tool Config - Auto-registers tools on startup
  4. Tool Implementations - Specific tool logic

Creating a New Tool

// 1. Create tool implementation
import { Injectable } from '@nestjs/common';
import { z } from 'zod';
import { Tool } from '../interfaces/tool.interface';

@Injectable()
export class CalculatorTool implements Tool {
  readonly name = 'calculator';
  
  readonly description = 'Performs basic arithmetic operations';
  
  readonly parameters = z.object({
    operation: z.enum(['add', 'subtract', 'multiply', 'divide']),
    a: z.number(),
    b: z.number(),
  });

  async execute(params: z.infer<typeof this.parameters>) {
    const { operation, a, b } = params;
    
    switch (operation) {
      case 'add': return a + b;
      case 'subtract': return a - b;
      case 'multiply': return a * b;
      case 'divide': return a / b;
    }
  }
}

// 2. Register in chat.module.ts
providers: [
  // ... existing providers
  CalculatorTool,
]

// 3. Register in tools.config.ts
onModuleInit() {
  this.toolRegistry.register(this.calculatorTool);
}

Available Tools

Web Search Tool (tavily_web_search)

Searches the web for current information using Tavily API.

Parameters:

  • query (string): Search query
  • maxResults (number, optional): Max results (1-10, default: 5)

Returns:

{
  query: string;
  results: Array<{
    title: string;
    url: string;
    snippet: string;
    favicon: string;
    publishedDate?: string;
    relevanceScore: number;
  }>;
  resultsCount: number;
  searchedAt: string;
  summary: string;
  citations: Array<{
    text: string;
    sourceIndex: number;
    url: string;
  }>;
}

🚒 Deployment

This API is deployed on DigitalOcean using a production-grade architecture with Docker, PostgreSQL, Nginx reverse proxy, SSL certificates, and automated CI/CD.

Live API: https://api.betterdev.in


Production Architecture

GitHub (push to main)
       ↓ CI/CD
GitHub Actions ───> SSH into VPS ───> Restart Docker Container
                                     (auto-build + health check)
DigitalOcean VPS (Ubuntu 22.04)
       ↓
NGINX (SSL, reverse proxy)
       ↓
Docker Container (NestJS API)
       ↓
PostgreSQL (VPS service)

🌐 DigitalOcean VPS Deployment

Step 1: Prepare the VPS

  1. Create a DigitalOcean Droplet with Ubuntu 22.04 LTS

  2. SSH into the server:

    ssh root@your-server-ip
  3. Create a non-root user (best practice):

    adduser kashif
    usermod -aG sudo kashif
    su - kashif

Step 2: Install Docker & Docker Compose

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Add user to docker group (no sudo needed)
sudo usermod -aG docker $USER
newgrp docker

# Install Docker Compose
sudo apt-get update
sudo apt-get install docker-compose-plugin

# Verify installation
docker --version
docker compose version

Step 3: Install PostgreSQL on VPS

Instead of running PostgreSQL in Docker, we use a standalone database for stability and performance:

# Install PostgreSQL 14
sudo apt update
sudo apt install postgresql postgresql-contrib

# Create database and user
sudo -u postgres psql

CREATE DATABASE better_dev_db;
CREATE USER better_dev WITH PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE better_dev_db TO better_dev;
\q

# Configure PostgreSQL to accept connections
sudo nano /etc/postgresql/14/main/postgresql.conf
# Set: listen_addresses = '*'

sudo nano /etc/postgresql/14/main/pg_hba.conf
# Add: host all all 0.0.0.0/0 md5

sudo systemctl restart postgresql

# Open firewall (if using UFW)
sudo ufw allow 5432/tcp

Step 4: Clone Repository

cd ~
git clone https://github.com/Kashif-Rezwi/better-dev-api.git
cd better-dev-api

Step 5: Configure Environment

Create .env file:

nano .env

Add environment variables:

# Database
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_USER=better_dev
DATABASE_PASSWORD=your_secure_password
DATABASE_NAME=better_dev_db

# JWT
JWT_SECRET=your-super-secret-jwt-key-change-this
JWT_EXPIRATION=7d

# App
PORT=3001
NODE_ENV=production

# AI
GROQ_API_KEY=gsk_your_groq_api_key
DEFAULT_AI_MODEL=openai/gpt-oss-120b
AI_TEXT_MODEL=llama-3.1-8b-instant
AI_TOOL_MODEL=llama-3.3-70b-versatile

# Tools
TAVILY_API_KEY=tvly-your-tavily-api-key

Step 6: Build & Start Docker Container

# Build Docker image
docker compose build --no-cache

# Start container in detached mode
docker compose up -d

# Verify API is running
curl http://localhost:3001/health

# View logs
docker compose logs -f

Step 7: Install & Configure Nginx Reverse Proxy

# Install Nginx
sudo apt update
sudo apt install nginx

# Create Nginx configuration
sudo nano /etc/nginx/sites-available/api.betterdev.in

Add the following configuration:

server {
    listen 80;
    server_name api.betterdev.in;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Enable the site:

# Create symbolic link
sudo ln -sf /etc/nginx/sites-available/api.betterdev.in /etc/nginx/sites-enabled/

# Test configuration
sudo nginx -t

# Reload Nginx
sudo systemctl reload nginx

Step 8: Enable HTTPS with Certbot (Free SSL)

# Install Certbot
sudo apt install certbot python3-certbot-nginx

# Obtain SSL certificate (auto-configures Nginx)
sudo certbot --nginx -d api.betterdev.in

# Verify auto-renewal
sudo certbot renew --dry-run

Now your API is accessible at: https://api.betterdev.in


Step 9: Set Up GitHub Actions CI/CD (Auto Deploy)

This enables automatic deployment on every push to main branch.

On the VPS:

  1. Generate SSH key for deployments (no passphrase):
    ssh-keygen -t ed25519 -C "github-actions-deploy" -f ~/.ssh/github_actions_deploy
    
    # Add public key to authorized_keys
    cat ~/.ssh/github_actions_deploy.pub >> ~/.ssh/authorized_keys
    
    # Copy private key (you'll add this to GitHub Secrets)
    cat ~/.ssh/github_actions_deploy

On GitHub:

  1. Add SSH private key to GitHub Secrets:

    • Go to your repo β†’ Settings β†’ Secrets and variables β†’ Actions
    • Add secret: SSH_PRIVATE_KEY (paste the private key content)
    • Add secret: SERVER_IP (your VPS IP address)
    • Add secret: SERVER_USER (your username, e.g., kashif)
  2. Create GitHub Actions workflow:

Create .github/workflows/deploy.yml:

name: Deploy to DigitalOcean

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    
    steps:
      - name: Deploy to VPS
        uses: appleboy/ssh-action@v1.0.0
        with:
          host: ${{ secrets.SERVER_IP }}
          username: ${{ secrets.SERVER_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            cd ~/better-dev-api
            git pull origin main
            docker compose down
            docker compose up -d --build
            docker compose logs --tail=50

Now, every push to main automatically deploys to production! πŸš€


🎯 Production Features

βœ… Auto Deploy - Push to GitHub β†’ automatic deployment
βœ… Auto Restart - Docker restarts API if it crashes
βœ… Health Checks - Docker monitors API health
βœ… SSL Auto-Renew - Free HTTPS with auto-renewal
βœ… Isolated Database - PostgreSQL runs independently
βœ… Reverse Proxy - Nginx handles all traffic
βœ… Production-Grade - Enterprise-level architecture


πŸ”„ Managing Deployment

# SSH into VPS
ssh kashif@your-server-ip

# View logs
cd ~/better-dev-api
docker compose logs -f

# Restart API
docker compose restart

# Rebuild and restart
docker compose down
docker compose up -d --build

# Check health
curl https://api.betterdev.in/health

# View Nginx logs
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log

# Check SSL certificate
sudo certbot certificates

🐳 Local Docker Development

For local development with Docker:

# Build and start all services
npm run docker:up

# View logs
npm run docker:logs

# Stop services
npm run docker:down

# Restart app only
npm run docker:restart

πŸ’» Development

Available Scripts

# Development
npm run start:dev        # Start with hot-reload
npm run start:debug      # Start with debugger

# Production
npm run build            # Build for production
npm run start:prod       # Run production build

# Docker
npm run docker:build     # Build Docker images
npm run docker:up        # Start Docker containers
npm run docker:down      # Stop Docker containers
npm run docker:logs      # View logs
npm run docker:restart   # Restart app container

# Testing
npm run test             # Run tests
npm run test:watch       # Run tests in watch mode
npm run test:cov         # Generate coverage report

Code Quality

# Linting
npx eslint .

# Format code
npx prettier --write "src/**/*.ts"

Database Migrations

# Generate migration
npx typeorm migration:generate -n MigrationName

# Run migrations
npx typeorm migration:run

# Revert migration
npx typeorm migration:revert

πŸ” Environment Variables

Required Variables

Variable Description Example
DATABASE_HOST PostgreSQL host localhost
DATABASE_PORT PostgreSQL port 5432
DATABASE_USER Database user nebula_ai
DATABASE_PASSWORD Database password securepassword
DATABASE_NAME Database name nebula_db
JWT_SECRET JWT signing secret your-secret-key
JWT_EXPIRATION Token expiration 7d
GROQ_API_KEY Groq API key gsk_...
TAVILY_API_KEY Tavily API key tvly-...

Optional Variables

Variable Description Default
PORT Server port 3001
NODE_ENV Environment development
DEFAULT_AI_MODEL Default AI model openai/gpt-oss-120b
AI_TEXT_MODEL Fast text model llama-3.1-8b-instant
AI_TOOL_MODEL Tool calling model llama-3.3-70b-versatile

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Coding Standards

  • Follow TypeScript best practices
  • Use meaningful variable/function names
  • Write self-documenting code
  • Add comments for complex logic
  • Maintain consistent formatting (use Prettier)
  • Write unit tests for new features

πŸ“„ License

This project is licensed under the UNLICENSED License.


πŸ™ Acknowledgments


πŸ“ž Support

For questions or issues:


Built with ❀️ using NestJS and TypeScript

About

Core server and API for Better DEV, an intelligent AI chat platform with tool-calling capabilities and streaming responses.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages