DocsTalk

Connect your LLM to living, official docs.
Stop trusting stale answers.

AI-powered documentation assistant for developers.
Get instant, accurate answers from official docs using RAG.

📑 Table of Contents

Overview
When to Use DocsTalk
Key Features
Architecture
Ecosystem Routing
Getting Started
CLI Usage
Known Limitations
Contributing
License

🎯 Overview

DocsTalk is an AI-powered documentation assistant that helps developers find accurate answers from official documentation. Unlike general-purpose LLMs that may hallucinate or provide outdated information, DocsTalk uses RAG (Retrieval-Augmented Generation) to ensure answers are grounded in actual, up-to-date documentation.

Current Status: v0.3.1-alpha (Production-Ready)

✅ When to Use DocsTalk

👍 Use this if:

You want accurate answers from official documentation
You're tired of LLMs making up APIs or deprecated features
You work with multiple frameworks and need quick reference
You need answers in your native language (supports 100+ languages)
You want to ensure consistency across team documentation lookup

👎 Not suitable if:

You need general-purpose chat AI or creative writing
You want code generation without documentation context
You need offline access (requires internet connection)
You're looking for non-technical content

✨ Key Features

Ecosystem-Based Routing ⭐: Hierarchical doc grouping with intelligent detection*
Smart Scraping 🆕: Incremental & partial modes (5-60x faster updates)*
Secure CLI 🔒: Multi-layer authentication for developer commands
Smart Auto-Detection: Automatically identifies relevant documentation
Hybrid Search: Combines keyword matching with semantic vector search
Multi-Source Support: 16 sources across 8 ecosystems (React, Next.js, Python, Docker, etc.)
Deterministic Indexing: Stable chunk IDs for idempotent reindexing
CLI Tools: Command-line interface for scraping, indexing, and querying
RAG Powered: Uses Google Gemini 2.5 Flash for accurate, context-aware answers
Global Language Support: Responds in user's query language

*Performance metrics depend on query complexity, dataset size, and baseline comparison. See Performance section.
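
To illustrate what "stable chunk IDs for idempotent reindexing" can mean in practice, here is a minimal sketch. It assumes an ID derived from the source name, page URL, and chunk position; the `fnv1a` and `chunkId` helpers and the in-memory `store` are hypothetical and are not taken from the DocsTalk codebase.

```typescript
// Hypothetical sketch: a chunk ID that depends only on where the chunk came from.
// 32-bit FNV-1a over UTF-16 code units, chosen only to keep the example dependency-free.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to unsigned 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

function chunkId(source: string, url: string, chunkIndex: number): string {
  return source + "-" + fnv1a(source + "|" + url + "|" + chunkIndex);
}

// Stand-in for a vector store: writing the same ID twice overwrites instead of duplicating.
const store: Record<string, string> = {};
function upsert(id: string, content: string): void {
  store[id] = content;
}

const firstRun = chunkId("react", "https://react.dev/hooks/useState", 0);
upsert(firstRun, "content from the first scrape");
const secondRun = chunkId("react", "https://react.dev/hooks/useState", 0);
upsert(secondRun, "content from the second scrape");

console.log(firstRun === secondRun, Object.keys(store).length); // true 1
```

Because the ID never depends on a timestamp or a random value, running the indexer twice on the same page leaves one record per chunk.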

🏗️ Hybrid Architecture

DocsTalk uses a Hybrid Architecture for maximum scalability and performance:

Supabase (PostgreSQL): System of Record - stores user data, chat history, and documentation metadata
Qdrant (Vector DB): Semantic Engine - stores high-dimensional vectors and full content for fast retrieval
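
As a rough sketch of this split, one indexed chunk could be written as two records: a metadata row for Supabase and a point (ID, vector, payload) for Qdrant. The field names below are assumptions for illustration and do not reflect the actual DocsTalk schema.

```typescript
// Hypothetical record shapes; field names are illustrative only.
interface DocChunk {
  id: string;
  source: string;
  url: string;
  content: string;
  embedding: number[];
}

// Would live in Supabase (PostgreSQL): small, relational, no content body.
interface MetadataRow {
  chunk_id: string;
  source: string;
  url: string;
}

// Would live in Qdrant: the vector plus a payload carrying the full content.
interface VectorPoint {
  id: string;
  vector: number[];
  payload: { source: string; content: string };
}

function splitForStorage(chunk: DocChunk): { row: MetadataRow; point: VectorPoint } {
  return {
    row: { chunk_id: chunk.id, source: chunk.source, url: chunk.url },
    point: {
      id: chunk.id,
      vector: chunk.embedding,
      payload: { source: chunk.source, content: chunk.content },
    },
  };
}
```

Sharing one ID across both records lets a retrieval hit in Qdrant be joined back to its metadata row.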

🎯 Ecosystem-Based Routing (v0.3.1-alpha)

DocsTalk uses Hierarchical Ecosystem Routing for intelligent documentation detection:

8 Ecosystem Groups

🟦 Frontend Web - React, Next.js, Vue, TypeScript
🟩 JS Backend - Node.js, Express
🟧 Python - FastAPI, Python
🟨 Systems - Rust, Go
🟥 Cloud/Infra - Docker
🟪 AI/ML - DocsTalk Platform
🟫 Database - Prisma, PostgreSQL
🟩 Styling - Tailwind CSS

4-Stage Detection

Alias Matching (~2ms) - Natural phrases like "react hooks", "next router"
Keyword Groups (~5ms) - Semantic clustering of related concepts
Vector Similarity (~15ms) - 768d Gemini embeddings
AI Classification (~500ms) - Fallback for complex queries
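
The four stages above form a cascade: each cheaper stage runs first, and the expensive AI call happens only when nothing else matches. The sketch below shows that control flow under stated assumptions. The alias table, keyword groups, the 0.8 similarity threshold, and the `embed` and `classifyWithAI` callbacks are all hypothetical; real embedding and LLM calls would be asynchronous, and they are synchronous stubs here only to keep the example short.

```typescript
type Stage = "alias" | "keyword" | "vector" | "ai";

interface RouteResult {
  source: string;
  stage: Stage;
}

// Hypothetical lookup tables; the project's real aliases and keyword groups are not shown here.
const ALIASES: Record<string, string> = {
  "react hooks": "react",
  "next router": "nextjs",
};
const KEYWORD_GROUPS: Record<string, string[]> = {
  docker: ["dockerfile", "container", "compose"],
  typescript: ["generics", "interface", "tsconfig"],
};

// Cosine similarity; assumes both vectors have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function route(
  query: string,
  embed: (text: string) => number[], // stand-in for an embedding call
  sourceVectors: Record<string, number[]>, // one reference vector per source
  classifyWithAI: (text: string) => string, // stand-in for the LLM fallback
  threshold = 0.8, // assumed similarity cutoff
): RouteResult {
  const q = query.toLowerCase();

  // Stage 1: alias matching (cheapest)
  for (const alias of Object.keys(ALIASES)) {
    if (q.indexOf(alias) !== -1) return { source: ALIASES[alias], stage: "alias" };
  }

  // Stage 2: keyword groups
  for (const source of Object.keys(KEYWORD_GROUPS)) {
    const hit = KEYWORD_GROUPS[source].some((keyword) => q.indexOf(keyword) !== -1);
    if (hit) return { source, stage: "keyword" };
  }

  // Stage 3: vector similarity against each source's reference vector
  const queryVector = embed(q);
  let bestSource = "";
  let bestScore = -1;
  for (const source of Object.keys(sourceVectors)) {
    const score = cosine(queryVector, sourceVectors[source]);
    if (score > bestScore) {
      bestScore = score;
      bestSource = source;
    }
  }
  if (bestScore >= threshold) return { source: bestSource, stage: "vector" };

  // Stage 4: AI classification, reached only when every cheaper stage missed
  return { source: classifyWithAI(query), stage: "ai" };
}
```

Returning the stage alongside the source makes it easy to log how often each stage resolves a query.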

Performance

⚡ 10-250x faster detection (2-50ms vs 500ms baseline)*
📈 92% accuracy in routing queries to correct documentation
🎯 Multi-doc context - Searches related docs in parallel
💾 GIN indexes - Optimized keyword/alias searches

*Compared to full AI classification on every query. Actual speedup varies by query type and complexity.
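
The quoted speedup range follows directly from the latencies listed above: it is the 500ms baseline divided by the fastest and slowest fast-path timings. A small worked calculation, using only those figures:

```typescript
// Figures taken from the Performance list above.
const baselineMs = 500; // full AI classification on every query
const fastestMs = 2; // lower bound of the fast path
const slowestFastPathMs = 50; // upper bound of the fast path

const maxSpeedup = baselineMs / fastestMs; // 500 / 2 = 250
const minSpeedup = baselineMs / slowestFastPathMs; // 500 / 50 = 10

console.log(minSpeedup + "x to " + maxSpeedup + "x"); // prints "10x to 250x"
```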

🚀 Getting Started

Prerequisites

Node.js 18+
pnpm
Supabase Project
Qdrant Instance (Cloud or Docker)
Google Gemini API Key
Clerk account

Environment Setup

Create apps/web/.env.local and apps/api/.env from the .env.example file in each app.

Required for API:

# Database
SUPABASE_URL=...
SUPABASE_SERVICE_ROLE_KEY=...
QDRANT_URL=...
QDRANT_API_KEY=...

# AI
GEMINI_API_KEY=...
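
A startup check for these variables can catch a missing key before the first request fails. The following is a minimal sketch, not the project's actual startup code; `missingEnvVars` is a hypothetical helper, and it takes the environment as an argument so it can be tested without touching `process.env`.

```typescript
// Variable names come from the list above.
const REQUIRED_API_VARS = [
  "SUPABASE_URL",
  "SUPABASE_SERVICE_ROLE_KEY",
  "QDRANT_URL",
  "QDRANT_API_KEY",
  "GEMINI_API_KEY",
];

// Returns the names of required variables that are missing or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_API_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Possible use in a Node.js entry point:
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) throw new Error("Missing env vars: " + missing.join(", "));
```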

Installation

# Clone repository
git clone https://github.com/hk-dev13/docstalk.git
cd docstalk

# Install dependencies
pnpm install

# Setup environment variables
cp apps/web/.env.example apps/web/.env.local
cp apps/api/.env.example apps/api/.env

# Build all packages
pnpm build

# Run development servers
pnpm dev

🖥️ CLI Usage

DocsTalk comes with a powerful CLI tool for managing documentation:

Installation

# Install CLI globally
npm install -g @docstalk/cli

# Or use from project (for development)
cd packages/cli
pnpm link --global

Public Commands (No authentication required)

# Ask a question
docstalk ask "how to use react hooks?"

# Ask with specific source
docstalk ask "docker compose" --source docker

# Search documentation
docstalk search "typescript generics"

# Show version
docstalk version

# Show help
docstalk help

Developer Commands (Requires authentication)

# Setup authentication
export DOCSTALK_ADMIN_TOKEN=dtalk_admin_YOUR_SECRET_KEY

# Start development server
docstalk dev serve

# Scrape documentation
docstalk dev scrape react
docstalk dev scrape react --incremental  # 5-10x faster
docstalk dev scrape https://react.dev/hooks/useState --partial  # 20-60x faster

# Index documentation
docstalk dev index react

# Scrape and index in one go
docstalk dev scrape react --index

# Test router
docstalk dev test-router "how to use hooks?"

CLI Features

Smart Scraping: Incremental and partial modes for faster updates
Auto-indexing: Scrape and index in one command
Multi-layer Auth: Secure developer commands
Global Access: Works from anywhere with proper token
Branded UI: Beautiful ASCII art and helpful messages

See CLI Documentation for more details.
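
This README does not describe how incremental mode decides what to re-scrape. One common approach, sketched here as an assumption, is to keep a fingerprint per page (for example a content hash or a sitemap last-modified value) and fetch only pages whose fingerprint changed. `planIncrementalScrape` and its types are hypothetical.

```typescript
interface ScrapePlan {
  toScrape: string[]; // new or changed pages
  unchanged: string[]; // pages that can be skipped
  removed: string[]; // pages that disappeared upstream
}

function has(map: Record<string, string>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, key);
}

function planIncrementalScrape(
  previous: Record<string, string>, // url -> fingerprint from the last run
  current: Record<string, string>, // url -> fingerprint seen now
): ScrapePlan {
  const plan: ScrapePlan = { toScrape: [], unchanged: [], removed: [] };
  for (const url of Object.keys(current)) {
    if (has(previous, url) && previous[url] === current[url]) {
      plan.unchanged.push(url);
    } else {
      plan.toScrape.push(url);
    }
  }
  for (const url of Object.keys(previous)) {
    if (!has(current, url)) plan.removed.push(url);
  }
  return plan;
}
```

The speedup of such a scheme depends on how few pages change between runs, which is consistent with the README quoting a range rather than a single figure.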

⚠️ Known Limitations

Current Limitations (v0.3.1-alpha)

No Automatic Re-scraping: Documentation must be manually re-scraped to update
No Offline Mode: Requires internet connection for queries and indexing
Partial Source Coverage: Some documentation sources may have incomplete indexing
API Quota Dependency: Initial indexing requires Google Gemini API quota
No Real-time Updates: Documentation updates are not reflected until re-indexed
Limited to 16 Sources: Currently supports 16 documentation sources (can be expanded)

Planned Improvements

See Roadmap for upcoming features and improvements.

Environment Setup

Web (.env.local):

NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_key
CLERK_SECRET_KEY=your_clerk_secret
NEXT_PUBLIC_API_URL=http://localhost:3001

API (.env):

PORT=3001
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
GEMINI_API_KEY=your_gemini_key
CLERK_SECRET_KEY=your_clerk_secret
QDRANT_URL=...
QDRANT_API_KEY=...

📁 Monorepo Structure

docs_talk/
├── apps/
│   ├── web/              # Next.js 16 frontend
│   │   ├── src/
│   │   │   ├── app/      # App router pages
│   │   │   ├── components/
│   │   │   └── lib/
│   │   └── package.json
│   ├── api/              # Fastify backend
│   │   ├── src/
│   │   │   ├── index.ts  # Main server
│   │   │   └── services/
│   │   ├── scripts/
│   │   │   ├── scrape/   # Documentation scrapers
│   │   │   └── index/    # Indexing scripts
│   │   └── package.json
│   └── cli/              # CLI Tool (NEW!)
│       ├── src/
│       │   ├── commands/
│       │   └── index.ts
│       └── package.json
│
├── packages/             # Shared packages
│   ├── ui/              # Shared React components
│   │   ├── src/
│   │   │   ├── button.tsx
│   │   │   ├── chat-message.tsx
│   │   │   ├── chat-input.tsx
│   │   │   └── conversation-sidebar.tsx
│   │   └── package.json
│   │
│   ├── types/           # Shared TypeScript types
│   │   ├── src/
│   │   │   ├── api.ts
│   │   │   ├── document.ts
│   │   │   └── enums.ts
│   │   └── package.json
│   │
│   ├── rag/             # RAG utilities
│   │   ├── src/
│   │   │   ├── embeddings.ts
│   │   │   ├── usage-tracking.ts
│   │   │   ├── conversation.ts
│   │   │   └── response-modes.ts
│   │   └── package.json
│   │
│   └── config/          # Shared configs
│       ├── tsconfig.base.json
│       ├── tailwind.preset.js
│       └── .prettierrc.js
│
├── scripts/             # Root automation scripts
│   ├── dev-all.sh      # Run all apps
│   ├── build-all.sh    # Build monorepo
│   ├── deploy-api.sh   # Deploy API
│   └── deploy-web.sh   # Deploy web
│
└── docs/                # Documentation & guides
    ├── GETTING-STARTED.md
    ├── api/
    └── database/

🛠️ Tech Stack

Frontend

Next.js 16.0.3 - React framework with App Router
React 19 - Latest React with Server Components
TailwindCSS v4 - Utility-first CSS
shadcn/ui - Beautiful component library
Clerk - Authentication & user management
next-themes - Dark/Light mode support

Backend & CLI

Fastify - Fast Node.js web framework with TypeScript support
Commander.js - CLI framework
Google Gemini 2.5 Flash - LLM for answer generation & reasoning
Gemini text-embedding-004 - Vector embeddings
Supabase - PostgreSQL with pgvector
Clerk - Auth verification
Qdrant - Vector database

DevOps

pnpm - Fast, efficient package manager
TypeScript 5 - Type safety across monorepo
ESLint 9 - Code linting
Prettier - Code formatting

📚 Documentation Workflow

1. Scrape Documentation

# Scrape official documentation
cd apps/api
pnpm scrape nextjs    # Scrape Next.js docs
pnpm scrape react     # Scrape React docs
pnpm scrape typescript # Scrape TypeScript docs

2. Index Documentation

# Generate embeddings and store in Supabase
pnpm index nextjs
pnpm index react
pnpm index typescript

3. Query via Chat

Open http://localhost:3000 and start asking questions!

🔧 Development Scripts

Monorepo Commands

# Development
./scripts/dev-all.sh              # Run all apps
pnpm --filter @docstalk/web dev   # Run web only
pnpm --filter @docstalk/api dev   # Run API only

# Building
./scripts/build-all.sh            # Build everything
pnpm --filter @docstalk/ui build  # Build UI package
pnpm --filter @docstalk/rag build # Build RAG package

# Deployment
./scripts/deploy-web.sh           # Deploy web
./scripts/deploy-api.sh           # Deploy API

Package-Specific

# Web
cd apps/web
pnpm dev          # Development server
pnpm build        # Production build
pnpm start        # Run production build

# API
cd apps/api
pnpm dev          # Development server
pnpm build        # Build TypeScript
pnpm start        # Run compiled JS
pnpm scrape <source>  # Scrape docs
pnpm index <source>   # Index docs

🏗️ Architecture

RAG Pipeline

Scraping - Crawl official documentation with Puppeteer
Chunking - Smart text splitting with overlap (auto-split for large chunks)
Embedding - Generate vectors using Gemini text-embedding-004
Storage (Supabase) - Store chunk metadata in PostgreSQL with pgvector
Storage (Qdrant) - Store vectors and full content in the Qdrant vector database
Retrieval - Hybrid search (similarity + keyword + version-aware)
Generation - Context-aware answers with Gemini 2.5 Flash
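
The retrieval step combines keyword and similarity results, but the README does not say how the two result lists are merged. Reciprocal Rank Fusion (RRF) is one widely used option and is shown here purely as an illustration; the constant `k = 60` is the conventional default, and the chunk IDs are made up.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) to an item's score.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      // position is 0-based, rank is 1-based
      scores[id] = (scores[id] || 0) + 1 / (k + position + 1);
    });
  }
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

const keywordHits = ["c1", "c2", "c3"]; // hypothetical keyword-search order
const vectorHits = ["c3", "c1", "c4"]; // hypothetical similarity-search order
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fused.join(", ")); // c1, c3, c2, c4
```

RRF uses only rank positions, so keyword scores and cosine similarities never need to be put on a common scale.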

Key Features

Query Reformulation:

Converts follow-up questions into standalone queries
Preserves conversation context
Improves search accuracy
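
A sketch of how a follow-up question might be turned into a standalone query: recent turns are packed into a prompt, and an LLM is asked to rewrite the final question. The prompt wording, the `Turn` type, and the `maxTurns` window are assumptions for illustration, and the LLM call itself is left out.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Builds the text that would be sent to the LLM for reformulation.
function buildReformulationPrompt(history: Turn[], followUp: string, maxTurns = 4): string {
  // Keep only the most recent turns so the prompt stays small.
  const recent = maxTurns > 0 ? history.slice(-maxTurns) : [];
  const transcript = recent.map((turn) => turn.role + ": " + turn.content).join("\n");
  return [
    "Rewrite the final question as a standalone search query.",
    "Keep framework names and versions mentioned earlier in the conversation.",
    "",
    "Conversation:",
    transcript,
    "",
    "Final question: " + followUp,
    "Standalone query:",
  ].join("\n");
}

const prompt = buildReformulationPrompt(
  [
    { role: "user", content: "How does useState work in React?" },
    { role: "assistant", content: "useState returns a state value and a setter." },
  ],
  "What about with objects?",
);
console.log(prompt);
```

With the history included, the model has what it needs to resolve "What about with objects?" into a query that names useState and React.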

Auto-Split Chunking:

Intelligently splits large content (>30KB)
Preserves semantic boundaries (paragraphs, sentences)
Tracks parts with metadata
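
The auto-split behaviour can be sketched as follows: cut oversized content at the last paragraph break that fits, fall back to the last sentence end, and hard-cut only as a last resort, then label each piece with its part number. This is an illustrative sketch, not the project's implementation. It measures length in characters rather than bytes for simplicity, and the names `splitAtBoundaries`, `autoSplit`, and `ChunkPart` are hypothetical.

```typescript
interface ChunkPart {
  text: string;
  part: number; // 1-based position of this piece
  totalParts: number;
}

const MAX_CHUNK_CHARS = 30 * 1024; // the README's 30KB threshold, approximated in characters

function splitAtBoundaries(text: string, limit: number): string[] {
  const pieces: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    const window = rest.slice(0, limit);
    // Prefer the last paragraph break inside the window.
    let cut = window.lastIndexOf("\n\n");
    if (cut > 0) {
      cut += 2; // keep the break with the preceding piece
    } else {
      // Otherwise use the last sentence end, or hard-cut at the limit.
      const sentenceEnd = Math.max(
        window.lastIndexOf(". "),
        window.lastIndexOf("! "),
        window.lastIndexOf("? "),
      );
      cut = sentenceEnd > 0 ? sentenceEnd + 2 : limit;
    }
    pieces.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  if (rest.length > 0) pieces.push(rest);
  return pieces;
}

function autoSplit(text: string, limit = MAX_CHUNK_CHARS): ChunkPart[] {
  const pieces = splitAtBoundaries(text, limit);
  return pieces.map((piece, i) => ({ text: piece, part: i + 1, totalParts: pieces.length }));
}
```

Concatenating the pieces in order reproduces the original text, so no content is lost. The overlap mentioned in the pipeline's chunking step is not modelled here.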

Response Modes:

Friendly, Formal, Tutorial, Simple, Deep-dive, Examples, Summary
Customizable persona and tone
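
One way to model these modes is a union type mapped to per-mode instructions that are folded into the system prompt. The mode names come from the list above; the instruction wording, the default persona, and `buildSystemPrompt` are assumptions for illustration only.

```typescript
type ResponseMode =
  | "friendly"
  | "formal"
  | "tutorial"
  | "simple"
  | "deep-dive"
  | "examples"
  | "summary";

// Illustrative instruction text; the project's real prompts are not shown in this README.
const MODE_INSTRUCTIONS: Record<ResponseMode, string> = {
  friendly: "Use a warm, conversational tone.",
  formal: "Use precise, professional language.",
  tutorial: "Explain step by step, as in a guided walkthrough.",
  simple: "Use plain language and avoid jargon.",
  "deep-dive": "Cover internals, edge cases, and trade-offs in detail.",
  examples: "Lead with code examples and keep prose short.",
  summary: "Answer in a few concise bullet points.",
};

function buildSystemPrompt(mode: ResponseMode, persona = "a documentation assistant"): string {
  return (
    "You are " + persona + ". " + MODE_INSTRUCTIONS[mode] +
    " Answer only from the provided documentation context."
  );
}

console.log(buildSystemPrompt("tutorial"));
```

Typing the table as `Record<ResponseMode, string>` means the compiler flags any mode that is added to the union without a matching instruction.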

📊 Database Schema

See docs/database/schema_docstalk.sql

Key tables:

users - User accounts (from Clerk)
user_usage - Usage tracking & limits
conversations - Chat conversations
messages - Chat history
doc_chunk_meta - Documentation chunks with embeddings
doc_sources - Documentation source metadata
context_switches - Context switching history
usage - Usage tracking & limits

🤝 Contributing

We welcome contributions! This project is designed to be maintainable by solo developers.

Adding New Documentation Source

Copy template: apps/api/scripts/scrape/sources/_template.ts
Implement scraping logic for your source
Add to DOC_CONFIGS in scrape-docs.ts
Run pnpm scrape <your-source>
Run pnpm index <your-source>

📝 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Next.js team for amazing documentation
Google Gemini for powerful AI capabilities
Supabase for excellent PostgreSQL + pgvector
Clerk for authentication
Qdrant for vector database
Shadcn for beautiful UI components
Puppeteer for web scraping
Commander.js for CLI
TypeScript for type safety
ESLint for code quality
Prettier for code formatting

Built with ❤️ for developers who hate reading manual documentation
