Core server and API for Better DEV, an intelligent AI chat platform with tool-calling capabilities and streaming responses.
- Overview
- Architecture
- Features
- Tech Stack
- Prerequisites
- Getting Started
- Project Structure
- API Documentation
- Operational Modes
- Tool System
- Deployment
- Development
- Environment Variables
- Contributing
Better DEV AI Backend is a production-ready NestJS application that powers an intelligent conversational AI platform. It provides:
- Real-time AI Conversations with streaming responses using AI SDK v5
- Intelligent Tool Calling system with web search capabilities
- Conversation Management with persistent storage
- User Authentication with JWT-based security
- Multi-Model AI Support using Groq (Llama models)
- Extensible Tool Architecture for adding custom capabilities

Client Layer
  (React/Next.js Frontend, Mobile Apps, API Clients)
        │  HTTPS/REST
        │  Server-Sent Events (SSE)
        ▼
API Gateway (Nginx)
  - Load Balancing
  - SSL Termination
  - Request Routing
        │
        ▼
NestJS Application Layer
  ├─ Controllers (REST Endpoints)
  │    - AuthController:   /auth/*
  │    - ChatController:   /chat/*
  │    - HealthController: /health
  │
  ├─ Service Layer
  │    - AuthService, ChatService, UserService
  │    - AIService
  │        - analyzeQueryIntent()
  │        - streamResponseWithMode()
  │        - generateResponse()
  │    - Operational Modes
  │        - ModeResolverService
  │        - AutoClassifierService
  │        - ClassificationCacheService
  │
  ├─ Tool System (Extensible)
  │    - ToolRegistry (Service Locator)
  │        - register()  - get()  - toAISDKFormat()
  │    - WebSearchTool
  │        - execute()
  │        - TavilyService, SummaryService
  │    - Future Tools: Calculator, Weather, ImageGen
  │
  └─ Data Access Layer (TypeORM)
       - Repositories: User, Conversation, Message
       - Entity Models with Relations
        │
        ▼
PostgreSQL Database
  Tables:
  - users         (id, email, password, credits, isActive)
  - conversations (id, userId, title, systemPrompt, operationalMode)
  - messages      (id, conversationId, role, content, metadata)
  Relationships:
  - User 1:N Conversations
  - Conversation 1:N Messages
        │
        ▼
External Services
  - Groq API   (AI Models: Llama 3.1, Llama 3.3)
  - Tavily API (Web Search)
User Request Flow:
1. Client → Nginx → NestJS Controller
2. Controller validates JWT via JwtAuthGuard
3. Controller → Service Layer
4. Service Layer → AI Service (with Tool Registry)
5. AI Service → Groq API (streaming)
6. If tool needed → Tool Registry → Specific Tool → External API
7. Stream results back through SSE → Client

Database Transaction Flow:
1. Service Layer → TypeORM Repository
2. Repository → PostgreSQL
3. Response → Service → Controller → Client
🤖 Multi-Model AI Support
- Fast text generation with Llama 3.1 8B
- Advanced tool calling with Llama 3.3 70B
- Automatic model selection based on task
🔧 Extensible Tool System
- Plugin-based architecture
- Type-safe tool definitions with Zod
- Automatic tool registration
- Built-in web search with Tavily
- Automatically determines if queries need web search
💬 Conversation Management
- Persistent conversation history
- Custom system prompts per conversation
- Automatic title generation
- Message metadata (tool calls, citations)
🔒 Security
- JWT-based authentication
- Password hashing with bcrypt
- Request validation with class-validator
- CORS configuration
🎯 Operational Modes (NEW)
- Fast Mode - Quick responses with Llama 3.1 8B (500 tokens, 0.5 temp)
- Thinking Mode - Deep analysis with Llama 3.3 70B (4000 tokens, 0.7 temp)
- Auto Mode - AI-powered classification (default)
- Intelligent query complexity detection
- 5-minute classification caching (70% API call reduction)
- Per-conversation and per-message mode control
- Mode metadata tracking in message history
📡 Real-time Streaming
- Server-Sent Events (SSE)
- AI SDK v5 compatible
- Tool execution visibility
- Progress tracking
- NestJS - Progressive Node.js framework
- TypeScript - Type-safe development
- Express - HTTP server
- PostgreSQL - Relational database
- TypeORM - ORM with entity management
- AI SDK v5 (Vercel) - Unified AI interface
- Groq - Fast AI inference
- Tavily - Web search API
- Passport JWT - Token-based auth
- bcrypt - Password hashing
- class-validator - DTO validation
- class-transformer - Object transformation
- Zod - Schema validation for tools
- Docker - Containerization
- Docker Compose - Multi-container orchestration
- Nginx - Reverse proxy & load balancing
- Node.js 20+ (LTS recommended)
- npm or yarn
- PostgreSQL 15+
- Docker & Docker Compose (for containerized setup)
- Groq API Key (Get one here)
- Tavily API Key (Get one here)
- Clone the repository
  git clone https://github.com/Kashif-Rezwi/better-dev-api.git
  cd better-dev-api
- Create environment file
  cp .env.docker .env
- Configure environment variables
  # Edit .env file
  JWT_SECRET=your-super-secret-jwt-key-change-this
  GROQ_API_KEY=gsk_your_groq_api_key
  TAVILY_API_KEY=tvly-your-tavily-api-key
  DEFAULT_AI_MODEL=openai/gpt-oss-120b
  AI_TEXT_MODEL=llama-3.1-8b-instant
  AI_TOOL_MODEL=llama-3.3-70b-versatile
- Start the application
  npm run docker:up
- Check health
  curl http://localhost:3001/health
- View logs
  npm run docker:logs

To run locally without Docker:
- Clone the repository
  git clone https://github.com/Kashif-Rezwi/better-dev-api.git
  cd better-dev-api
- Install dependencies
  npm install
- Setup PostgreSQL
  # Install PostgreSQL locally or use Docker
  docker run --name nebula-postgres \
    -e POSTGRES_USER=nebula_ai \
    -e POSTGRES_PASSWORD=nebula_ai \
    -e POSTGRES_DB=nebula_db \
    -p 5432:5432 \
    -d postgres:15-alpine
- Create .env file
  # Database
  DATABASE_HOST=localhost
  DATABASE_PORT=5432
  DATABASE_USER=nebula_ai
  DATABASE_PASSWORD=nebula_ai
  DATABASE_NAME=nebula_db
  # JWT
  JWT_SECRET=your-super-secret-jwt-key
  JWT_EXPIRATION=7d
  # App
  PORT=3001
  NODE_ENV=development
  # AI
  GROQ_API_KEY=gsk_your_groq_api_key
  DEFAULT_AI_MODEL=openai/gpt-oss-120b
  AI_TEXT_MODEL=llama-3.1-8b-instant
  AI_TOOL_MODEL=llama-3.3-70b-versatile
  # Tools
  TAVILY_API_KEY=tvly-your-tavily-api-key
- Start development server
  npm run start:dev
- Access the API
  - API: http://localhost:3001
  - Health: http://localhost:3001/health
better-dev-api/
├── src/
│   ├── config/                    # Configuration files
│   │   ├── database.config.ts     # Database connection config
│   │   └── jwt.config.ts          # JWT authentication config
│   │
│   ├── common/                    # Shared resources
│   │   └── guards/
│   │       └── jwt-auth.guard.ts  # JWT authentication guard
│   │
│   ├── modules/                   # Feature modules
│   │   ├── auth/                  # Authentication module
│   │   │   ├── dto/               # Data Transfer Objects
│   │   │   ├── strategies/        # Passport strategies
│   │   │   ├── auth.controller.ts
│   │   │   ├── auth.service.ts
│   │   │   └── auth.module.ts
│   │   │
│   │   ├── user/                  # User management module
│   │   │   ├── entities/          # Database entities
│   │   │   ├── dto/
│   │   │   ├── user.service.ts
│   │   │   └── user.module.ts
│   │   │
│   │   └── chat/                  # Chat & AI module
│   │       ├── entities/          # Conversation, Message
│   │       ├── dto/
│   │       ├── tools/             # Tool system
│   │       │   ├── implementations/   # Tool implementations
│   │       │   │   └── web-search.tool.ts
│   │       │   ├── interfaces/        # Tool interfaces
│   │       │   ├── services/          # Tool services
│   │       │   │   ├── tavily.service.ts
│   │       │   │   └── summary.service.ts
│   │       │   ├── tool.registry.ts   # Tool registry
│   │       │   └── tools.config.ts    # Tool configuration
│   │       ├── chat.controller.ts
│   │       ├── chat.service.ts
│   │       ├── ai.service.ts
│   │       └── chat.module.ts
│   │
│   ├── app.module.ts              # Root module
│   ├── main.ts                    # Application entry point
│   └── health.controller.ts       # Health check endpoint
│
├── nginx/
│   └── nginx.conf                 # Nginx configuration
│
├── docker-compose.yml             # Docker services
├── Dockerfile                     # Production Docker image
├── package.json
├── tsconfig.json
└── README.md
- Development: http://localhost:3001
- Production: Configure via Nginx
POST /auth/register
Content-Type: application/json
{
"email": "user@example.com",
"password": "securepassword123"
}
Response: 201 Created
{
"accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"user": {
"id": "uuid",
"email": "user@example.com",
"credits": 1000
}
}
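For a quick check from a script, the same request can be made with fetch. A minimal sketch assuming a Node 18+ environment (built-in fetch) and the default local port from this README:

```ts
// Hypothetical smoke test for the register endpoint documented above.
const res = await fetch('http://localhost:3001/auth/register', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    email: 'user@example.com',
    password: 'securepassword123',
  }),
});

const { accessToken, user } = await res.json();
console.log(user.credits); // accessToken is sent as a Bearer token on the endpoints below
```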
POST /auth/login
Content-Type: application/json
{
"email": "user@example.com",
"password": "securepassword123"
}
Response: 200 OK
{
"accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"user": {
"id": "uuid",
"email": "user@example.com",
"credits": 1000
}
}

GET /auth/profile
Authorization: Bearer {token}
Response: 200 OK
{
"userId": "uuid",
"email": "user@example.com"
}

POST /chat/conversations/with-message
Authorization: Bearer {token}
Content-Type: application/json
{
"title": "New Conversation",
"systemPrompt": "You are a helpful assistant.",
"firstMessage": "Hello, how are you?"
}
Response: 201 Created
{
"id": "conv-uuid",
"title": "New Conversation",
"systemPrompt": "You are a helpful assistant.",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}

GET /chat/conversations
Authorization: Bearer {token}
Response: 200 OK
[
{
"id": "uuid",
"title": "Conversation Title",
"systemPrompt": "Custom prompt",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z",
"lastMessage": {
"id": "msg-uuid",
"role": "assistant",
"content": "Last message preview...",
"createdAt": "2025-01-01T00:00:00.000Z"
}
}
]

GET /chat/conversations/{id}
Authorization: Bearer {token}
Response: 200 OK
{
"id": "uuid",
"title": "Conversation Title",
"systemPrompt": "Custom prompt",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z",
"messages": [
{
"id": "msg-uuid",
"role": "user",
"content": "Hello",
"createdAt": "2025-01-01T00:00:00.000Z"
},
{
"id": "msg-uuid-2",
"role": "assistant",
"content": "Hi there!",
"createdAt": "2025-01-01T00:00:01.000Z",
"metadata": {
"toolCalls": []
}
}
]
}

POST /chat/conversations/{id}/messages
Authorization: Bearer {token}
Content-Type: application/json
{
"messages": [
{
"id": "msg-1",
"role": "user",
"parts": [
{
"type": "text",
"text": "What's the latest news about AI?"
}
]
}
]
}
Response: 200 OK (Server-Sent Events)
Content-Type: text/event-stream
data: {"type":"text-delta","textDelta":"The latest"}
data: {"type":"text-delta","textDelta":" news..."}
data: {"type":"tool-call-start","toolName":"tavily_web_search"}
data: {"type":"tool-result","result":{...}}
data: {"type":"finish","usage":{...}}
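The stream is standard text/event-stream, so it can be consumed with fetch and a stream reader. A minimal client-side sketch (conversationId, token, and the base URL are placeholders; a robust client would also buffer partial lines across chunks):

```ts
// Minimal sketch: consume the SSE stream from the messages endpoint above.
const res = await fetch(
  `http://localhost:3001/chat/conversations/${conversationId}/messages`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({
      messages: [
        { id: 'msg-1', role: 'user', parts: [{ type: 'text', text: 'Hello!' }] },
      ],
    }),
  },
);

const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk may contain one or more "data: {...}" lines
  for (const line of decoder.decode(value, { stream: true }).split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const event = JSON.parse(line.slice('data: '.length));
    if (event.type === 'text-delta') process.stdout.write(event.textDelta);
  }
}
```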
PUT /chat/conversations/{id}/system-prompt
Authorization: Bearer {token}
Content-Type: application/json
{
"systemPrompt": "You are an expert in AI and machine learning."
}
Response: 200 OK
{
"id": "uuid",
"title": "Conversation Title",
"systemPrompt": "You are an expert in AI and machine learning.",
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}

POST /chat/conversations/{id}/generate-title
Authorization: Bearer {token}
Content-Type: application/json
{
"message": "What's the weather like today?"
}
Response: 200 OK
{
"title": "Weather Inquiry"
}

DELETE /chat/conversations/{id}
Authorization: Bearer {token}
Response: 204 No Content

GET /health
Response: 200 OK
{
"status": "ok",
"timestamp": "2025-01-01T00:00:00.000Z",
"service": "better-dev-ai-chat"
}

The AI Service includes an intelligent query intent analyzer that optimizes tool usage and reduces unnecessary API calls. The analyzeQueryIntent() method automatically determines whether a user query requires real-time web search or can be answered from general knowledge.
// Automatically analyzes each query
const needsWebSearch = await aiService.analyzeQueryIntent(messages);
// Routes to appropriate model
if (needsWebSearch) {
// Use tool-calling model with web search
stream = aiService.streamResponse(messages, tools);
} else {
// Use fast text model without tools
stream = aiService.streamResponse(messages);
}

Queries that trigger web search:
- Current/recent events and news (e.g., "latest AI trends 2025")
- Real-time information (e.g., "today's weather", "current stock price")
- Up-to-date data that changes frequently

Queries that use general knowledge:
- General knowledge questions (e.g., "What is JavaScript?", "Explain OOP")
- Follow-up questions to previous searches (context already available)
- Questions about AI capabilities
- General conversation and clarification
The analyzer checks recent conversation history (last 3 turns) to detect if web search was recently performed. This prevents redundant searches when:
- User asks follow-up questions about search results
- Context is already available from previous tool calls
- User is refining or clarifying a previous query

- 💰 Cost Optimization - Avoids unnecessary tool-calling model usage
- ⚡ Faster Responses - Uses lightweight models when appropriate
- 🎯 Better UX - Reduces wait time for simple queries
- 🔍 Smarter Tool Usage - Only searches when truly needed
The feature uses different AI models for different tasks:
# Fast model for query analysis and simple responses
AI_TEXT_MODEL=llama-3.1-8b-instant
# Powerful model for tool calling and complex tasks
AI_TOOL_MODEL=llama-3.3-70b-versatile

User: "What is machine learning?"
   ↓
analyzeQueryIntent() → NO (general knowledge)
   ↓
Use fast text model → Quick response

User: "What are the latest ML breakthroughs in 2025?"
   ↓
analyzeQueryIntent() → YES (current events)
   ↓
Use tool model + web search → Real-time results
The Operational Modes system provides intelligent chat response modes that automatically adjust AI model selection, token limits, and response styles based on query complexity and user preferences.
Available Modes:
- Fast Mode - Quick, concise responses using lightweight models (Llama 3.1 8B)
- Thinking Mode - Detailed, comprehensive responses using advanced models (Llama 3.3 70B)
- Auto Mode - AI-powered automatic mode selection based on query complexity (default)

User Request
  POST /chat/conversations/:id/messages
  { messages, modeOverride?: "fast" | "thinking" | "auto" }
        │
        ▼
ChatController
  - Validates request
  - Extracts modeOverride (optional)
        │
        ▼
ChatService
  - handleStreamingResponse()
        │
        │  MODE RESOLUTION HIERARCHY
        │  Priority (highest to lowest):
        │   1. Message-level override (request)
        │   2. Conversation-level setting (DB)
        │   3. Default mode ("auto")
        ▼
ModeResolverService
  - resolveMode()
  - getRequestedMode()     → returns "fast" | "thinking" | "auto"
  - resolveEffectiveMode() → returns "fast" | "thinking"
        │
        │  If mode === "auto"
        ▼
AutoClassifierService
  Uses AI to classify query complexity:
  1. Check ClassificationCacheService (5-min TTL)
  2. Quick heuristic: queries < 15 chars → fast
  3. AI classification prompt
     - SIMPLE queries  → fast mode
     - COMPLEX queries → thinking mode
  4. Cache result for future requests
  5. Timeout fallback: 5s → defaults to fast
        │
        │  Returns: { requested, effective }
        ▼
MODE_CONFIG (static configuration per mode)
  fast:                             thinking:
    model: llama-3.1-8b-instant       model: llama-3.3-70b-versatile
    maxTokens: 500                    maxTokens: 4000
    temperature: 0.5                  temperature: 0.7
    systemPrompt: "Be concise"        systemPrompt: "Be detailed"
        │
        │  Pass effective mode config
        ▼
AIService
  - streamResponseWithMode(messages, effectiveMode, ...)
  - Applies mode config (model, tokens, temperature, prompt)
  - Streams response with configured parameters
        │
        │  Stream response + save metadata
        ▼
Message saved with metadata
  {
    "content": "...",
    "metadata": {
      "mode": {
        "requested": "auto",
        "effective": "thinking",
        "modelUsed": "llama-3.3-70b-versatile",
        "temperature": 0.7
      }
    }
  }
1. Mode Types & Configuration (mode.config.ts)
// Type definitions
export type OperationalMode = 'fast' | 'thinking' | 'auto';
export type EffectiveMode = 'fast' | 'thinking';
// Configuration per mode
export const MODE_CONFIG: Record<EffectiveMode, ModeConfig> = {
fast: {
model: 'llama-3.1-8b-instant',
maxTokens: 500,
temperature: 0.5,
systemPrompt: 'Be extremely concise and direct...'
},
thinking: {
model: 'llama-3.3-70b-versatile',
maxTokens: 4000,
temperature: 0.7,
systemPrompt: 'Provide thorough, comprehensive responses...'
}
};

2. ModeResolverService (mode-resolver.service.ts)
Resolves the operational mode using the following hierarchy:
Request Priority:
1. modeOverride (API request body)       ← Highest priority
2. conversation.operationalMode (DB)
3. Default: "auto"                       ← Lowest priority
Key Methods:
- resolveMode() - Main entry point
- getRequestedMode() - Determines mode via hierarchy
- resolveEffectiveMode() - Converts "auto" to concrete mode
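In code, this hierarchy reduces to a pair of fallbacks plus the auto-classification step. A functional sketch (the real logic lives in mode-resolver.service.ts; classifyWithAI is a stand-in for AutoClassifierService and is an assumption):

```ts
type OperationalMode = 'fast' | 'thinking' | 'auto';
type EffectiveMode = 'fast' | 'thinking';

// Illustrative sketch of the resolution hierarchy described above.
export async function resolveMode(
  modeOverride: OperationalMode | undefined,  // 1. message-level override (request body)
  conversationMode: OperationalMode | null,   // 2. conversation-level setting (DB)
  lastUserMessage: string,
  classifyWithAI: (text: string) => Promise<EffectiveMode>, // stand-in for AutoClassifierService
): Promise<{ requested: OperationalMode; effective: EffectiveMode }> {
  // Highest-priority source wins; fall back to "auto"
  const requested: OperationalMode = modeOverride ?? conversationMode ?? 'auto';

  // "auto" is never executed directly; it resolves to a concrete mode
  const effective: EffectiveMode =
    requested === 'auto' ? await classifyWithAI(lastUserMessage) : requested;

  return { requested, effective };
}
```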
3. AutoClassifierService (auto-classifier.service.ts)
AI-powered query complexity analysis:
Classification Flow:
1. Extract last user message
2. Quick heuristic (< 15 chars → fast)
3. Check ClassificationCacheService
   - Cache hit → return cached mode
4. AI classification (with 5s timeout)
   - Send prompt to lightweight model
   - "SIMPLE"  → fast mode
   - "COMPLEX" → thinking mode
5. Cache result (5-minute TTL)
6. Return effective mode
Classification Criteria:
| Query Type | Mode | Examples |
|---|---|---|
| Short questions | Fast | "What is X?", "Define Y" |
| Factual lookups | Fast | "Who invented Z?" |
| Yes/no questions | Fast | "Can I do X?" |
| Multi-part analysis | Thinking | "Compare X and Y in detail" |
| Code implementation | Thinking | "Build a function that..." |
| Debugging requests | Thinking | "Why is this code failing?" |
| Design decisions | Thinking | "Design a system for..." |
4. ClassificationCacheService (classification-cache.service.ts)
In-memory caching to reduce AI calls:
Cache Behavior:
- TTL: 5 minutes per entry
- Key: MD5 hash of last user message text
- Cleanup: Every 60 seconds (removes expired)
- Storage: Map<string, CacheEntry>
Benefits:
- Reduces classification API calls by ~70%
- Improves response time for repeated queries
- Automatic expiration ensures fresh classifications
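A minimal sketch of this behavior, assuming the shape described above (MD5 key of the last user message, 5-minute TTL, 60-second cleanup); the actual implementation lives in classification-cache.service.ts:

```ts
import { createHash } from 'node:crypto';

type EffectiveMode = 'fast' | 'thinking';

interface CacheEntry {
  mode: EffectiveMode;
  expiresAt: number;
}

// Illustrative in-memory TTL cache mirroring the behavior described above.
export class ClassificationCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs = 5 * 60_000; // 5-minute TTL per entry

  constructor() {
    // Periodic cleanup of expired entries (every 60 seconds)
    setInterval(() => this.evictExpired(), 60_000).unref();
  }

  get(messageText: string): EffectiveMode | null {
    const entry = this.entries.get(this.key(messageText));
    return entry && entry.expiresAt > Date.now() ? entry.mode : null;
  }

  set(messageText: string, mode: EffectiveMode): void {
    this.entries.set(this.key(messageText), {
      mode,
      expiresAt: Date.now() + this.ttlMs,
    });
  }

  private key(messageText: string): string {
    return createHash('md5').update(messageText).digest('hex'); // MD5 of the message text
  }

  private evictExpired(): void {
    const now = Date.now();
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) this.entries.delete(key);
    }
  }
}
```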
Conversation Entity (conversation.entity.ts)
ALTER TABLE conversations ADD COLUMN operational_mode VARCHAR(20);
-- Values: 'fast' | 'thinking' | 'auto' | NULL
-- NULL means use default behavior (auto mode)

Message Metadata (message.entity.ts)
// Messages store mode metadata in JSONB
{
"metadata": {
"toolCalls": [...], // Existing tool call data
"mode": { // NEW: Mode tracking
"requested": "auto",
"effective": "thinking",
"modelUsed": "llama-3.3-70b-versatile",
"temperature": 0.7,
"tokensUsed": null // Future: token tracking
}
}
}

POST /chat/conversations/:id/messages
Authorization: Bearer {token}
Content-Type: application/json
{
"messages": [
{
"role": "user",
"parts": [{ "type": "text", "text": "Explain quantum computing" }]
}
],
"modeOverride": "thinking" // Optional: "fast" | "thinking" | "auto"
}
Response: 200 OK (Server-Sent Events)

PUT /chat/conversations/:id/operational-mode
Authorization: Bearer {token}
Content-Type: application/json
{
"mode": "fast" // "fast" | "thinking" | "auto"
}
Response: 200 OK
{
"id": "conv-uuid",
"title": "Conversation Title",
"systemPrompt": "...",
"operationalMode": "fast", // Updated mode
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}

POST /chat/conversations/with-message
Authorization: Bearer {token}
Content-Type: application/json
{
"title": "New Chat",
"systemPrompt": "You are a helpful assistant.",
"operationalMode": "thinking", // Optional
"firstMessage": "Hello!"
}
Response: 201 Created
{
"id": "conv-uuid",
"title": "New Chat",
"operationalMode": "thinking",
...
}

Scenario: User sends "Explain how databases work internally" with no mode override
1. Request arrives at ChatController
   └─> No modeOverride in request

2. ChatService.handleStreamingResponse()
   └─> Calls ModeResolverService.resolveMode()

3. ModeResolverService.getRequestedMode()
   ├─> Check modeOverride: null
   ├─> Check conversation.operationalMode: null
   ├─> Default: "auto"
   └─> Requested mode = "auto"

4. ModeResolverService.resolveEffectiveMode()
   └─> Mode is "auto" → delegate to AutoClassifierService

5. AutoClassifierService.classify()
   ├─> Extract query: "Explain how databases work internally"
   ├─> Length check: 38 chars (> 15) → continue
   ├─> Cache check: MISS (not cached)
   ├─> AI classification prompt sent:
   │     "Query: 'Explain how databases work internally'"
   │     Response: "COMPLEX"
   ├─> Result: "thinking" mode
   └─> Cache result with key: md5(query)

6. Return to ChatService
   └─> { requested: "auto", effective: "thinking" }

7. ChatService gets MODE_CONFIG["thinking"]
   ├─> model: llama-3.3-70b-versatile
   ├─> maxTokens: 4000
   ├─> temperature: 0.7
   └─> systemPrompt: "Provide thorough, comprehensive..."

8. AIService.streamResponseWithMode()
   └─> Apply mode config to streaming request

9. Response streamed to client
   └─> Save message with mode metadata

10. Next identical query
    ├─> Cache HIT → skip AI classification
    └─> Instant mode resolution (< 1ms)
| Operation | Latency | Notes |
|---|---|---|
| Mode resolution (no auto) | < 1ms | Direct lookup |
| Cache hit (auto mode) | < 1ms | MD5 hash lookup |
| Cache miss (auto mode) | ~100-500ms | AI classification call |
| Classification timeout | 5s | Fallback to fast mode |
Optimization Strategies:
- ✅ 5-minute cache TTL reduces ~70% of classification calls
- ✅ Heuristic pre-filter (< 15 chars) gives a ~10% speedup
- ✅ 5-second timeout prevents hanging requests
- ✅ Fail-safe: errors → default to fast mode
Environment Variables:
# Fast mode model (lightweight, quick responses)
AI_TEXT_MODEL=llama-3.1-8b-instant
# Thinking mode model (powerful, detailed responses)
AI_TOOL_MODEL=llama-3.3-70b-versatile

Mode Customization:
Edit src/modules/chat/modes/mode.config.ts:
export const MODE_CONFIG: Record<EffectiveMode, ModeConfig> = {
fast: {
model: process.env.AI_TEXT_MODEL || 'llama-3.1-8b-instant',
maxTokens: 500, // Adjust max response length
temperature: 0.5, // Adjust creativity (0.0 - 1.0)
systemPrompt: '...' // Customize behavior
},
thinking: {
model: process.env.AI_TOOL_MODEL || 'llama-3.3-70b-versatile',
maxTokens: 4000,
temperature: 0.7,
systemPrompt: '...'
}
};

For Users:
- 🚀 Faster responses for simple queries (fast mode)
- 🧠 Deeper answers for complex questions (thinking mode)
- 🤖 Automatic optimization with auto mode (default)
- 🎯 Manual control via mode override or conversation settings
For System:
- 💰 Cost optimization - Use lightweight models when appropriate
- ⚡ Performance - Faster responses with smaller models
- 📊 Observability - Mode metadata tracked per message
- 🔧 Flexibility - Easy to add new modes or adjust configs
The tool system is designed to be extensible and type-safe:
- Tool Interface - Base contract for all tools
- Tool Registry - Central registry for managing tools
- Tool Config - Auto-registers tools on startup
- Tool Implementations - Specific tool logic
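For orientation, the base contract implementations follow looks roughly like this (a sketch inferred from the calculator example below; the actual interface lives in src/modules/chat/tools/interfaces/):

```ts
import { z } from 'zod';

// Hedged sketch of the Tool contract, inferred from the example below.
export interface Tool<TParams extends z.ZodTypeAny = z.ZodTypeAny> {
  readonly name: string;        // unique tool identifier exposed to the model
  readonly description: string; // used by the model to decide when to call the tool
  readonly parameters: TParams; // Zod schema validating the tool arguments
  execute(params: z.infer<TParams>): Promise<unknown>; // the tool's logic
}
```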
// 1. Create tool implementation
import { Injectable } from '@nestjs/common';
import { z } from 'zod';
import { Tool } from '../interfaces/tool.interface';
@Injectable()
export class CalculatorTool implements Tool {
readonly name = 'calculator';
readonly description = 'Performs basic arithmetic operations';
readonly parameters = z.object({
operation: z.enum(['add', 'subtract', 'multiply', 'divide']),
a: z.number(),
b: z.number(),
});
async execute(params: z.infer<typeof this.parameters>) {
const { operation, a, b } = params;
switch (operation) {
case 'add': return a + b;
case 'subtract': return a - b;
case 'multiply': return a * b;
case 'divide': return a / b;
}
}
}
// 2. Register in chat.module.ts
providers: [
// ... existing providers
CalculatorTool,
]
// 3. Register in tools.config.ts
onModuleInit() {
this.toolRegistry.register(this.calculatorTool);
}

The built-in WebSearchTool searches the web for current information using the Tavily API.
Parameters:
- query (string): Search query
- maxResults (number, optional): Max results (1-10, default: 5)
Returns:
{
query: string;
results: Array<{
title: string;
url: string;
snippet: string;
favicon: string;
publishedDate?: string;
relevanceScore: number;
}>;
resultsCount: number;
searchedAt: string;
summary: string;
citations: Array<{
text: string;
sourceIndex: number;
url: string;
}>;
}

This API is deployed on DigitalOcean using a production-grade architecture with Docker, PostgreSQL, Nginx reverse proxy, SSL certificates, and automated CI/CD.
Live API: https://api.betterdev.in

GitHub (push to main)
   ↓  CI/CD
GitHub Actions ──> SSH into VPS ──> Restart Docker Container
                   (auto-build + health check)

DigitalOcean VPS (Ubuntu 22.04)
   ↓
NGINX (SSL, reverse proxy)
   ↓
Docker Container (NestJS API)
   ↓
PostgreSQL (VPS service)
- Create a DigitalOcean Droplet with Ubuntu 22.04 LTS
- SSH into the server:
  ssh root@your-server-ip
- Create a non-root user (best practice):
  adduser kashif
  usermod -aG sudo kashif
  su - kashif
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add user to docker group (no sudo needed)
sudo usermod -aG docker $USER
newgrp docker
# Install Docker Compose
sudo apt-get update
sudo apt-get install docker-compose-plugin
# Verify installation
docker --version
docker compose version

Instead of running PostgreSQL in Docker, we use a standalone database for stability and performance:
# Install PostgreSQL 14
sudo apt update
sudo apt install postgresql postgresql-contrib
# Create database and user
sudo -u postgres psql
CREATE DATABASE better_dev_db;
CREATE USER better_dev WITH PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE better_dev_db TO better_dev;
\q
# Configure PostgreSQL to accept connections
sudo nano /etc/postgresql/14/main/postgresql.conf
# Set: listen_addresses = '*'
sudo nano /etc/postgresql/14/main/pg_hba.conf
# Add: host all all 0.0.0.0/0 md5
sudo systemctl restart postgresql
# Open firewall (if using UFW)
sudo ufw allow 5432/tcp

cd ~
git clone https://github.com/Kashif-Rezwi/better-dev-api.git
cd better-dev-api

Create .env file:
nano .env

Add environment variables:
# Database
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_USER=better_dev
DATABASE_PASSWORD=your_secure_password
DATABASE_NAME=better_dev_db
# JWT
JWT_SECRET=your-super-secret-jwt-key-change-this
JWT_EXPIRATION=7d
# App
PORT=3001
NODE_ENV=production
# AI
GROQ_API_KEY=gsk_your_groq_api_key
DEFAULT_AI_MODEL=openai/gpt-oss-120b
AI_TEXT_MODEL=llama-3.1-8b-instant
AI_TOOL_MODEL=llama-3.3-70b-versatile
# Tools
TAVILY_API_KEY=tvly-your-tavily-api-key

# Build Docker image
docker compose build --no-cache
# Start container in detached mode
docker compose up -d
# Verify API is running
curl http://localhost:3001/health
# View logs
docker compose logs -f

# Install Nginx
sudo apt update
sudo apt install nginx
# Create Nginx configuration
sudo nano /etc/nginx/sites-available/api.betterdev.in

Add the following configuration:
server {
listen 80;
server_name api.betterdev.in;
location / {
proxy_pass http://localhost:3001;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}

Enable the site:
# Create symbolic link
sudo ln -sf /etc/nginx/sites-available/api.betterdev.in /etc/nginx/sites-enabled/
# Test configuration
sudo nginx -t
# Reload Nginx
sudo systemctl reload nginx

# Install Certbot
sudo apt install certbot python3-certbot-nginx
# Obtain SSL certificate (auto-configures Nginx)
sudo certbot --nginx -d api.betterdev.in
# Verify auto-renewal
sudo certbot renew --dry-run

Now your API is accessible at: https://api.betterdev.in
This enables automatic deployment on every push to main branch.
On the VPS:
- Generate SSH key for deployments (no passphrase):
ssh-keygen -t ed25519 -C "github-actions-deploy" -f ~/.ssh/github_actions_deploy
# Add public key to authorized_keys
cat ~/.ssh/github_actions_deploy.pub >> ~/.ssh/authorized_keys
# Copy private key (you'll add this to GitHub Secrets)
cat ~/.ssh/github_actions_deploy
On GitHub:
- Add SSH private key to GitHub Secrets:
  - Go to your repo → Settings → Secrets and variables → Actions
  - Add secret: SSH_PRIVATE_KEY (paste the private key content)
  - Add secret: SERVER_IP (your VPS IP address)
  - Add secret: SERVER_USER (your username, e.g., kashif)
- Create GitHub Actions workflow:
  Create .github/workflows/deploy.yml:
name: Deploy to DigitalOcean
on:
push:
branches: [ main ]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Deploy to VPS
uses: appleboy/ssh-action@v1.0.0
with:
host: ${{ secrets.SERVER_IP }}
username: ${{ secrets.SERVER_USER }}
key: ${{ secrets.SSH_PRIVATE_KEY }}
script: |
cd ~/better-dev-api
git pull origin main
docker compose down
docker compose up -d --build
docker compose logs --tail=50

Now, every push to main automatically deploys to production! 🚀

✅ Auto Deploy - Push to GitHub → automatic deployment
✅ Auto Restart - Docker restarts the API if it crashes
✅ Health Checks - Docker monitors API health
✅ SSL Auto-Renew - Free HTTPS with auto-renewal
✅ Isolated Database - PostgreSQL runs independently
✅ Reverse Proxy - Nginx handles all traffic
✅ Production-Grade - Enterprise-level architecture
# SSH into VPS
ssh kashif@your-server-ip
# View logs
cd ~/better-dev-api
docker compose logs -f
# Restart API
docker compose restart
# Rebuild and restart
docker compose down
docker compose up -d --build
# Check health
curl https://api.betterdev.in/health
# View Nginx logs
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log
# Check SSL certificate
sudo certbot certificates

For local development with Docker:
# Build and start all services
npm run docker:up
# View logs
npm run docker:logs
# Stop services
npm run docker:down
# Restart app only
npm run docker:restart

# Development
npm run start:dev # Start with hot-reload
npm run start:debug # Start with debugger
# Production
npm run build # Build for production
npm run start:prod # Run production build
# Docker
npm run docker:build # Build Docker images
npm run docker:up # Start Docker containers
npm run docker:down # Stop Docker containers
npm run docker:logs # View logs
npm run docker:restart # Restart app container
# Testing
npm run test # Run tests
npm run test:watch # Run tests in watch mode
npm run test:cov       # Generate coverage report

# Linting
npx eslint .
# Format code
npx prettier --write "src/**/*.ts"

# Generate migration
npx typeorm migration:generate -n MigrationName
# Run migrations
npx typeorm migration:run
# Revert migration
npx typeorm migration:revert

| Variable | Description | Example |
|---|---|---|
| DATABASE_HOST | PostgreSQL host | localhost |
| DATABASE_PORT | PostgreSQL port | 5432 |
| DATABASE_USER | Database user | nebula_ai |
| DATABASE_PASSWORD | Database password | securepassword |
| DATABASE_NAME | Database name | nebula_db |
| JWT_SECRET | JWT signing secret | your-secret-key |
| JWT_EXPIRATION | Token expiration | 7d |
| GROQ_API_KEY | Groq API key | gsk_... |
| TAVILY_API_KEY | Tavily API key | tvly-... |
| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 3001 |
| NODE_ENV | Environment | development |
| DEFAULT_AI_MODEL | Default AI model | openai/gpt-oss-120b |
| AI_TEXT_MODEL | Fast text model | llama-3.1-8b-instant |
| AI_TOOL_MODEL | Tool calling model | llama-3.3-70b-versatile |
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow TypeScript best practices
- Use meaningful variable/function names
- Write self-documenting code
- Add comments for complex logic
- Maintain consistent formatting (use Prettier)
- Write unit tests for new features
This project is licensed under the UNLICENSED License.
- NestJS - Framework
- Vercel AI SDK - AI streaming
- Groq - Fast AI inference
- Tavily - Web search API
- TypeORM - Database ORM
For questions or issues:
- Open an Issue
- Contact: GitHub Profile
Built with ❤️ using NestJS and TypeScript