AI Operations & Localization Consultant @ Pratilipi Comics
Aspiring AI Engineer | Open Source Contributor | CS Graduate @DSU Bangalore | 6+ Production AI Systems | 1 Patent + 2 Publications
| Project | Key Tech | Links |
|---|---|---|
| Agentic Inventory Restocking Service | LangGraph, MongoDB, FastAPI, Gemini/Groq | Repo • Live Demo |
| Compliance-GPT | Weaviate, FastAPI, Groq | Repo • Tests • Docker • Demo |
| AudioRAG Enterprise | AssemblyAI, Qdrant, SambaNova | Repo |
| TruthTracker (AntiAI) | EfficientNet-B4, FastAPI, React | Repo |
| AI Real Estate Agent | Gemini AI, Firecrawl, Redis | Repo |
💡 Note: Detailed technical deep dives for the top 3 featured projects below, including architecture diagrams, tech stack analysis, and key technical decisions.
──────────────────────────────────────────────
🎯 SOLVING REAL AI PROBLEMS 🎯
──────────────────────────────────────────────
──────────────────────────────────────────────
🔧 CONTRIBUTING TO THE ECOSYSTEM 🔧
──────────────────────────────────────────────
🔧 Refactor & Performance Optimization · openclaw.ai
Commit / PR #37 · View Commit
Led a JavaScript refactor focused on performance, robustness, and dependency consistency across the openclaw.ai codebase.
| ⚙️ Code & Architecture | ⚡ Performance | DevOps & Dependency Hygiene |
|---|---|---|
📦 Documentation Regression Fix · Kreuzberg Repository
Pull Request #389 · View PR
Contributed a merged pull request to the Kreuzberg repository, resolving a documentation regression in the v3 → v4 migration guide. The issue stemmed from v3 examples incorrectly using v4 API syntax, creating ambiguity around function parity and sync vs. async behavior during upgrades.
Key Contributions
| Area | What Was Done |
|---|---|
| API Examples | Restored 11 v3 examples to use the original API (`extract_file`, `batch_extract`) |
| Sync/Async Parity | Corrected sync/async comparisons for accurate v3 → v4 mapping |
| Error Handling | Updated error-handling examples to reflect the proper exception hierarchy |
| Code Quality | Replaced placeholder Python demos with real, executable output flows |
| Docs Structure | Cleaned up stale migration artifacts and updated MkDocs navigation structure |
Impact: Improved migration determinism and reduced upgrade friction for developers integrating the library into production pipelines. Strengthened documentation accuracy, a critical layer for API trust, reliability, and adoption.
🐛 Bug Fix · docling-project/docling (IBM Open Source)
PR #3022 · View Pull Request · ✅ Merged
Fixed a crash in the DOCX parsing backend that caused complete document conversion failure for files containing internal bookmark hyperlinks (e.g., Table of Contents entries, cross-references).
| 🔍 Root Cause Analysis | 🛡️ Defensive Fix | 🧪 Test Coverage |
|---|---|---|
| Identified a `TypeError` raised by `Path(c.address)` when `c.address` is `None` | Added a one-line conditional guard: `hyperlink = Path(c.address) if c.address else None` | Added regression test `test_hyperlink_with_none_address` |
| Traced the issue to python-docx returning `None` for internal `w:anchor` hyperlinks (no `r:id`) | Downstream `hyperlink is None` logic already handled the case gracefully; zero new branches introduced | Programmatically constructs a DOCX via raw XML manipulation to reproduce the exact failure case |
| Linked the crash to the same `Hyperlink` handling block as the prior `IndexError` fix (issue #2367) | Fix follows the same defensive pattern already used for `c.runs` on the adjacent line | Asserts no exception is raised and that markdown text extraction is correct |
| Affected all DOCX files with TOC entries or cross-references, causing complete parsing failure | All 12 existing DOCX backend tests continue to pass unchanged | Contributed 57 lines across 2 files (`msword_backend.py` + `test_backend_msword.py`) |
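The guard itself is tiny. A minimal sketch of the defensive pattern described above (the `Path(c.address)` call and the conditional guard mirror the PR; the wrapper function name is illustrative, not docling's actual code):

```python
from pathlib import Path
from typing import Optional

def resolve_hyperlink(address: Optional[str]) -> Optional[Path]:
    # python-docx returns None for internal w:anchor hyperlinks
    # (no r:id); calling Path(None) would raise TypeError, so guard first.
    return Path(address) if address else None
```

Downstream code that already checks `hyperlink is None` keeps working unchanged, which is why the fix adds zero new branches.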
Impact: Unblocked DOCX conversion for all documents containing internal bookmark hyperlinks (TOC, cross-references), restoring full parsing capability for an IBM open-source project used by the broader document AI community.
> "Stable, deployment-ready AI systems with Docker, CI/CD, automated testing, and monitoring for real-world impact"
I structure AI/ML projects through a rigorous, business-driven methodology that ensures reliability, scalability, and real-world impact. My systems are designed to be stable and deployment-ready, incorporating containerization (Docker), automated testing, CI/CD pipelines, and monitoring capabilities:
1. Business Requirements & Problem Validation
- Define the core business problem and quantifiable ROI metrics
- Research existing solutions and identify competitive advantages
- Establish success criteria (latency SLAs, accuracy thresholds, cost per prediction)
2. Reliability & Quality Strategy
- Implement hallucination reduction mechanisms (RAG, fine-tuning, prompt engineering, validation layers)
- Design robustness strategies (error handling, fallback mechanisms, circuit breakers)
- Define evaluation metrics aligned with business KPIs
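The fallback mechanism this step calls for can be sketched in a few lines (a minimal, hedged sketch; the provider functions and the `call_with_fallback` name are hypothetical, not from any specific repo):

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("quality-fallbacks")

def call_with_fallback(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order; any failure falls through to the next one."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # production code would catch provider-specific errors
            logger.warning("provider failed, falling back: %s", exc)
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

A circuit breaker extends this idea by remembering recent failures and skipping a provider that keeps timing out.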
3. System Design & Architecture
- Optimize for latency, throughput, and scalability requirements
- Evaluate model selection vs. API trade-offs (local deployment vs. cloud APIs)
- Design infrastructure (compute resources, caching, database schema)
- Plan cost optimization and resource utilization
4. Evaluation & Benchmarking
- Establish baseline metrics (accuracy, precision, recall, F1-score, latency)
- Perform comparative analysis against baselines and competitors
- Conduct A/B testing and validation on hold-out datasets
- Use production-representative data for realistic assessment
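The baseline metrics above can be computed from scratch; a minimal sketch for the binary case (function name illustrative):

```python
def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Precision / recall / F1 for a binary classifier, computed from counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

In practice a library like scikit-learn does this, but knowing the definitions keeps A/B comparisons honest.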
5. Integration & Deployment
- Design clean APIs with comprehensive documentation
- Ensure backward compatibility and semantic versioning
- Implement CI/CD pipelines for automated testing and deployment
- Plan rollout strategy (blue-green or canary deployments)
6. Production Monitoring & Observability
- Monitor real-time metrics (latency, error rates, token usage, cost)
- Implement comprehensive logging and distributed tracing
- Set up alerting for SLA violations and anomalies
- Handle concurrent users with rate limiting and graceful degradation
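The rate-limiting-with-graceful-degradation idea reduces to a sliding window per client; a minimal conceptual sketch (SlowAPI and similar middleware provide this in production, so the class here is illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per client."""

    def __init__(self, limit: int = 30, window: float = 60.0):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)  # client -> timestamps of recent requests

    def allow(self, client: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[client]
        while q and now - q[0] > self.window:  # drop requests outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller can degrade gracefully (429 + Retry-After)
        q.append(now)
        return True
```

Rejected requests should return a 429 with a `Retry-After` hint rather than silently failing.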
7. Continuous Improvement
- Analyze production data and user feedback
- Iterate on model performance with real-world insights
- Optimize costs and performance based on deployment data
- Maintain feedback loops for model retraining and feature development
> GOAL: Evolve into a full-fledged AI/ML Architect specializing in AI Ops, MLOps, and Open-Source Engineering, building the stable, scalable backends that power the next generation of AI systems.
I am not just building models; I am building the infrastructure that makes them reliable. My roadmap focuses on:
- AI Ops & Observability: Mastering the art of monitoring, logging, and debugging complex AI pipelines in production.
- Open-Source Engineering: Contributing performance refactors, documentation fixes, and tooling improvements to community-driven projects.
- Scalable Backends: Architecting distributed systems that can handle millions of inferences with high availability.
Agentic Inventory Restocking Service

The Challenge: Traditional inventory management systems trigger false alarms because they cannot distinguish genuine supply crises from natural demand fluctuations, leading to unnecessary restocking and inventory bloat.
System Objectives:
- Autonomously analyze demand patterns using time-series forecasting
- Differentiate between crisis situations and declining demand trends
- Generate purchase/transfer orders with confidence scoring (0-100%)
- Reduce manual overhead by 95% through AI-driven decision-making
Inventory Trigger (CSV/MongoDB)
        │
        ▼
Step A: Data Loader (LangGraph Agentic Workflow)
        - Historical demand (6-12 months)
        - Current stock levels
        - Lead times & reorder points (ROP)
        - Safety stock calculations
        │
        ▼
Step B: AI Reasoning Engine
        Model: Gemini 2.0 (with fallback)
        - Analyze demand trends
        - Detect anomalies & demand spikes
        - Crisis vs. natural decline logic
        - Confidence scoring via prompting
        │
        ▼
Step C: Action Generator
        Output: Structured JSON
        - Purchase Orders (external suppliers)
        - Warehouse Transfer Orders (internal)
        - Confidence & reasoning trail
        │
        ▼
Multi-Channel Notifications
        - Telegram Bot (inline approve/reject)
        - Slack Webhooks (team channels)
        - Web Dashboard (real-time monitoring)
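Step A's reorder-point and safety-stock inputs follow the standard textbook formulas (ROP = average daily demand × lead time + safety stock). A minimal sketch, using z = 1.65 for a roughly 95% service level; the exact service level and function name are assumptions for illustration, not the repo's code:

```python
from math import sqrt
from statistics import mean, stdev

def reorder_point(daily_demand: list, lead_time_days: float, z: float = 1.65) -> dict:
    """Classic safety-stock / reorder-point calculation.

    Safety stock buffers demand variability over the lead time:
    z * sigma_daily * sqrt(lead_time_days).
    """
    d_bar = mean(daily_demand)
    sigma_d = stdev(daily_demand)
    safety_stock = z * sigma_d * sqrt(lead_time_days)
    return {
        "avg_daily_demand": d_bar,
        "safety_stock": safety_stock,
        "reorder_point": d_bar * lead_time_days + safety_stock,
    }
```

The LLM's job in Step B is then to decide whether a dip below the ROP reflects a genuine crisis or a declining-demand trend that makes restocking unnecessary.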
| Component | Purpose | Why It's Critical |
|---|---|---|
| LangGraph | Agentic orchestration framework | Enables autonomous multi-step decision workflows; state management across agent steps |
| Gemini 2.0 + Groq Fallback | LLM backbone for reasoning | Dual-model approach ensures 99.9% availability; Gemini for complex analysis, Groq for cost efficiency |
| MongoDB Atlas | Document-oriented database | Flexible schema for inventory items; auto-scaling handled by Atlas |
| Safety Stock Calculations | Demand variance quantification | Distinguishes between expected fluctuations and true shortages (confidence scoring) |
| FastAPI + SlowAPI | Production backend + rate limiting | Sub-second response times; DDoS protection; typed Python models via Pydantic |
| Redis Cache | High-speed order lookup | 1000x faster than MongoDB for recent orders; reduces database load |
Backend Infrastructure:
- Framework: FastAPI with async/await for handling 1000+ concurrent requests
- Authentication: Session-based for dashboard, API-key based for external integrations
- Rate Limiting: SlowAPI middleware (30 req/min per IP) to prevent abuse
AI/ML Engine:
- Primary Model: Google Gemini 2.0 (Reasoning mode enabled for complex inventory analysis)
- Fallback Model: Groq (faster, cost-effective backup)
- Prompt Engineering: Few-shot learning with historical order examples
- Confidence Calibration: Softmax outputs from LLM reasoning to produce 0-100% confidence scores
Data Infrastructure:
- Production Database: MongoDB Atlas with multi-region replication
- Caching Layer: Redis Cluster for session state + recent orders
- Time-Series Analysis: Python's `statsmodels` for ARIMA forecasting
- Data Validation: Pydantic models ensuring data integrity
Deployment & Monitoring:
- Containerization: Docker with multi-stage builds for minimal image size
- Orchestration: Railway.app for automatic scaling based on CPU/memory
- Observability: LangSmith tracing for all AI calls; Prometheus metrics for infra
- CI/CD: GitHub Actions with automated testing on every push
- Why LangGraph over traditional state machines?
  - Agents can make dynamic decisions about next steps, enabling adaptive workflows
  - Built-in memory/state management prevents information loss across steps
  - Reduces boilerplate code by 60% compared to manual orchestration
- Why a MongoDB + Redis hybrid?
  - MongoDB: flexible schema for heterogeneous inventory items, automatic scaling
  - Redis: sub-millisecond lookups for recent orders and human-in-the-loop approvals
  - Beats a single-database approach for latency-sensitive operations
- Why a Gemini + Groq dual-model setup?
  - Gemini: superior reasoning for demand pattern analysis ($0.075/1M input tokens)
  - Groq: 10x faster inference for simple calculations ($0.10/1M tokens)
  - Failover strategy ensures uptime even during API disruptions
- Why human-in-the-loop below 95% confidence?
  - AI uncertainty shows up as edge cases; humans catch outliers the LLM can't
  - The Telegram approval system reduces friction with instant mobile notifications
  - Creates an audit trail for compliance and continuous model improvement
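The confidence gate reduces to a few lines; a minimal sketch (the `confidence` field name and routing labels are illustrative):

```python
def route_order(order: dict, threshold: float = 95.0) -> str:
    """Gate AI-generated orders: high-confidence orders auto-execute,
    anything below the bar is escalated to a human reviewer
    (e.g. inline approve/reject buttons pushed to Telegram)."""
    return "auto_approve" if order["confidence"] >= threshold else "human_review"
```

Logging every routing decision alongside the model's reasoning trail is what makes the audit trail useful for later retraining.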
Compliance-GPT

The Challenge: Compliance professionals spend 200+ hours per quarter manually searching GDPR, CCPA, and PCI-DSS regulations, costing organizations $300K+ annually. Hallucinations from general-purpose chatbots like ChatGPT create legal liability.
System Objectives:
- Provide citation-backed compliance answers in <2 seconds (vs. 20+ minutes manual search)
- Ground every answer in retrieved regulation text (retrieval-backed generation) to eliminate hallucinated claims
- Support multi-regulation queries (GDPR, CCPA, PCI-DSS, HIPAA, SOX)
- Enable audit trails for compliance documentation
User Query → FastAPI Endpoint (rate limited @ 30 req/min)
        ↓
Query Expansion
        "breach" → ["personal data breach" + "Article 33 notification" +
                    "72 hours" + "supervisory authority"]
        ↓
Weaviate Vector Search
        ├─ BM25 keyword index (exact term matching)
        ├─ Semantic vectors (cross-lingual understanding)
        └─ Returns top-5 relevant chunks with source metadata
        ↓
Prompt Engineering (citation-aware)
        "Use ONLY the provided context. If not found, say so.
         Include [Page X, File Y] inline citations."
        ↓
Groq LLM (Mixtral 8x7B → Llama 70B for complex queries)
        ├─ Latency: 200-400 ms (vs. 2-5 s for GPT-4)
        └─ Cost: $0.10/1M tokens (vs. $15/1M for GPT-4)
        ↓
Citation Formatting (post-processing)
        "Article 33 GDPR requires notification within 72 hours
         [GDPR-EN.pdf, Page 34, Chunk 2]"
        ↓
Response Cache (5 min TTL, Redis)
        ↓
JSON Response with Citations + Metadata
| Component | Purpose | Technical Implementation |
|---|---|---|
| Weaviate | Vector + keyword search | BM25 algorithm for exact matches + BERT embeddings for semantics |
| Query Expansion | Multi-term semantic understanding | LLM generates 5-10 synonym/related-term variants per query |
| Groq LLM | Fast, cost-effective generation | Mixtral 8x7B for simple queries, Llama 70B for complex regulatory parsing |
| Citation Engine | Source metadata preservation | Chunk-level provenance: filename + page number + character offsets |
| Security Layer | Enterprise hardening | Rate limiting (SlowAPI) + HTTPS enforcement + CORS (no wildcard) + admin auth |
| Prompt Injection Defense | Input sanitization | Pydantic validation + regex filtering for SQL/prompt attack patterns |
Knowledge Base Preparation:
- Document Ingestion: 1,987+ chunks from official regulation PDFs (EDPB, ICO, NIST)
- Chunking Strategy:
- Overlapping chunks (size: 512 tokens, overlap: 64 tokens)
- Metadata preservation: source filename, page numbers, regulation type
- Section headers included as context
- Embedding Model: HuggingFace `sentence-transformers/all-MiniLM-L6-v2` (384-dim, compatible with Weaviate)
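The 512/64 overlapping-chunk strategy above can be sketched in a few lines (function name illustrative; a real pipeline would tokenize with the embedding model's tokenizer rather than receive pre-split tokens):

```python
def chunk_with_overlap(tokens: list, size: int = 512, overlap: int = 64) -> list:
    """Sliding-window chunking: consecutive chunks share `overlap` tokens,
    so a clause that straddles a boundary still appears intact in one chunk."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Each chunk would then carry its metadata (source filename, page number, regulation type) into the vector store for citation-level provenance.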
Retrieval-Generation Pipeline:
- Vector Database: Weaviate Cloud (managed service, auto-scaling)
- Hybrid Search: Weaviate's built-in fusion algorithm (BM25 + semantic score combination)
- LLM Orchestration: LangChain → Groq API
- Fallback Strategy: If confidence <70%, trigger web search via DuckDuckGo API to find newest regulations
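Weaviate fuses BM25 and semantic scores internally; a minimal sketch of the underlying idea, using min-max normalization and a blend weight (the normalization choice and function name are assumptions for illustration, not Weaviate's exact fusion algorithm):

```python
def hybrid_score(bm25: dict, semantic: dict, alpha: float = 0.5) -> list:
    """Blend two score sets per document id: alpha weights the semantic side.

    Each score set is min-max normalized so the two scales are comparable
    before blending; returns (doc_id, score) pairs, best first."""
    def norm(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    b, s = norm(bm25), norm(semantic)
    docs = set(b) | set(s)
    return sorted(((d, alpha * s.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0)) for d in docs),
                  key=lambda kv: kv[1], reverse=True)
```

Raising `alpha` favors semantic matches; lowering it favors exact keyword hits, which matters for article numbers and defined legal terms.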
Production Hardening:
- Rate Limiting: SlowAPI (30 req/min/IP), with exponential backoff
- HTTPS Enforcement: Production environments block HTTP, cert auto-renewal via Certbot
- CORS Protection: Whitelist specific origins (no `*` wildcard)
- Admin Dashboard: Protected by token-based auth (FastAPI Security dependencies)
- Audit Logging: Every query logged with user ID, timestamp, result quality score
Deployment & Observability:
- Containerization: Docker Compose for local dev (includes Weaviate + Groq proxy)
- Live Environment: HuggingFace Spaces (free tier) with auto-redeployment on git push
- Monitoring: Prometheus metrics (query latency P50/P95/P99, cache hit rate, hallucination detection via prompt scoring)
- CI/CD: GitHub Actions runs 80+ tests before deployment
- Why Weaviate over Pinecone/Qdrant?
  - Built-in BM25 eliminates the need for separate keyword-search infrastructure
  - Hybrid search (BM25 + semantic) reduces hallucinations in the legal domain
  - No vendor lock-in; can self-host for on-premise compliance
- Why Groq instead of GPT-4 or Claude?
  - 10x faster inference (200 ms vs. 2-5 s)
  - 150x cheaper ($0.10 vs. $15 per 1M input tokens)
  - Sufficient reasoning capability for regulation parsing
  - Free tier allows bootstrapping without large budgets
- Why does citation-level provenance matter?
  - Legal liability: every claim must be traceable to an official document
  - Audit trail: regulators require evidence of due diligence
  - User trust: transparent sourcing enables verification
- Why query expansion plus fallback web search?
  - Regulations evolve; new amendments are rarely in the local corpus but are critical
  - Query expansion catches synonyms humans might use ("unauthorized access" → "breach")
  - Web search (EDPB official guidance) fills gaps in the local knowledge base
AudioRAG Enterprise

The Challenge: Organizations accumulate massive audio archives (meetings, calls, interviews) but lack tools to efficiently query them for insights at scale. Transcription exists; conversational search over audio content is what's missing.
System Objectives:
- Transcribe audio (with speaker diarization) at scale
- Enable semantic search over audio content in 2-3 seconds
- Support domain-specific vocabularies (Healthcare, Legal, Finance)
- Multi-tenant architecture with RBAC and audit logging
Audio Upload (MP3/WAV/OGG)
        ↓
Raw Bytes → S3 / Local Storage
        ↓
AssemblyAI Async Job
        ├─ Speech-to-Text (99% accuracy)
        ├─ Speaker Diarization (who spoke when)
        └─ PII Redaction (HIPAA/GDPR compliance option)
        ↓
Transcription Split into Chunks
        ├─ Preserve speaker identity: "[Speaker A]: ..."
        ├─ Timestamp metadata for seeking
        └─ Overlapping chunks (size: 256 tokens, overlap: 32)
        ↓
Embedding Generation (Batch)
        ├─ Model: BGE-Large (1024-dim, strong on domain documents)
        ├─ Batch size: 100 (GPU optimized)
        └─ Async Qdrant indexing
        ↓
Store in Qdrant Vector DB
        ├─ Payload: metadata (speaker, timestamp, domain)
        ├─ Index type: HNSW (fast nearest-neighbor search)
        └─ Replication: 3 replicas for HA
        ↓
User Query (Multi-Tenant Isolation)
        ├─ JWT decode → tenant_id extraction
        ├─ Query vector embedding (real-time)
        ├─ Metadata filter: WHERE tenant_id = {authenticated_tenant}
        └─ Qdrant similarity search (top-20 results)
        ↓
LLM Synthesis (SambaNova)
        ├─ Context: top-5 retrieved chunks + speaker names
        ├─ Prompt: domain-aware instructions (Healthcare/Legal/Finance)
        └─ Output: narrative answer with quoted evidence
        ↓
Redis Cache (key: hash(query, tenant_id, domain))
        ├─ TTL: 24 hours (for repeated questions)
        └─ Save: embedding + response for analytics
        ↓
Response + Audit Log
        ├─ Return: {answer, source_timestamps, speaker_list, confidence}
        └─ Log: {user_id, timestamp, query, domain, duration, cost}
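The Redis cache key shown in the diagram can be built deterministically; a minimal sketch (the `rag:` prefix and the query normalization are illustrative assumptions):

```python
import hashlib

def cache_key(query: str, tenant_id: str, domain: str) -> str:
    """Deterministic Redis key: the same question from the same tenant and
    domain hits the cache; any change in tenant or domain misses it."""
    # \x1f (unit separator) prevents collisions between concatenated fields
    payload = "\x1f".join((query.strip().lower(), tenant_id, domain))
    return "rag:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Keying on tenant_id means cached answers can never leak across tenants, mirroring the payload-based isolation in the vector store.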
| Component | Purpose | Implementation Details |
|---|---|---|
| AssemblyAI | Audio transcription + diarization | Word-level timestamps, ~99% word accuracy on English, supports PII redaction |
| Qdrant | Vector database for embedding search | HNSW index, metadata filtering for multi-tenancy, snapshots for backups |
| BGE-Large Embeddings | 1024-dim semantic vectors | Superior to OpenAI embeddings in domain documents, same cost as MiniLM but better quality |
| SambaNova LLM | Domain-aware generation | Fine-tuned on Healthcare/Financial datasets; 256K context window |
| Redis Cluster | Caching + session management | Sharded for horizontal scalability, LRU eviction for cost control |
| PostgreSQL | Audit logs + multi-tenant metadata | JSONB columns for flexible audit record structure, full-text search indices |
| Celery + RabbitMQ | Async batch processing | Handles 1000s of parallel transcriptions without blocking user requests |
Frontend & API Layer:
- Streamlit App: Rapid prototyping UI for demos (simple mode)
- Streamlit Enterprise: Full auth, branding customization, session management
- FastAPI Server: REST endpoints with async support, automatic Swagger docs
- WebSocket Support: Real-time streaming transcription status updates to clients
Audio Processing Pipeline:
- AssemblyAI Configuration:
- Language detection (auto-detect for Indic languages future expansion)
- Speaker diarization: supports 2-10 speakers per call
- PII handling: redact (GDPR-compliant) or mask (HIPAA #s)
- Celery Task Queue:
- Async transcription polling (every 5s until complete)
- Batch embedding generation (100 chunks per GPU batch)
- Webhook support for direct async notify
Semantic Search & Ranking:
- Qdrant Multi-Tenancy:
  - Payload-based filtering: `metadata.tenant_id` in the WHERE clause (no mixing of customer data)
  - Point-level ACL: each embedding tied to an organization ID
- Embedding Model: BGE-Large from BAAI (outperforms OpenAI `text-embedding-3-small` on legal/medical domains)
- Search Algorithm: hybrid approach
  - Semantic similarity: cosine distance in Qdrant
  - Keyword matching: BM25 on transcription text as a fallback
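Payload-based tenant filtering means the tenant check happens before ranking, so other tenants' vectors never enter the candidate set. A toy in-memory sketch of the concept (Qdrant does this natively with payload filters; the point/payload dict shapes here are illustrative):

```python
from math import sqrt

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def tenant_search(points: list, query_vec: list, tenant_id: str, top_k: int = 5) -> list:
    """points: [{'vector': [...], 'payload': {'tenant_id': ..., ...}}, ...]
    Filter by tenant FIRST, then rank by cosine similarity."""
    candidates = [p for p in points if p["payload"]["tenant_id"] == tenant_id]
    return sorted(candidates, key=lambda p: cosine(p["vector"], query_vec),
                  reverse=True)[:top_k]
```

Filtering before ranking (rather than post-filtering a global top-k) is what prevents data leakage at the database layer rather than the application layer.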
Domain Expertise Layers:
- Healthcare Vocabulary: ICD-10 codes, medical abbreviations, anatomy terms
- Legal Vocabulary: Case law references, regulatory citations, legal procedures
- Finance Vocabulary: Ticker symbols, financial ratios, market indices
Enterprise Security:
- Authentication: JWT tokens with 1-hour expiry + refresh tokens
- RBAC: Admin (create orgs, manage users) | Analyst (upload, search) | Viewer (read-only)
- Data Isolation: Tenant-level encryption keys, separate S3 prefixes per org
- Audit Trail: Every action logged with immutable timestamps, tamper-evident design
Infrastructure & Deployment:
- Containerization: Docker Compose for local dev (Postgres + Qdrant + Redis + RabbitMQ)
- Production Hosting: Railway.app or AWS ECS (auto-scaling based on Celery queue depth)
- Database: PostgreSQL 14+ (JSONB support for flexible audit logs)
- Observability:
- Prometheus: API latency, queue depth, cache hit rate
- Structured logging: JSON logs to CloudWatch/DataDog for error correlation
- Distributed tracing: OpenTelemetry traces across AssemblyAI โ Qdrant โ LLM calls
- Why AssemblyAI over Whisper?
  - Managed service: no GPU infrastructure to maintain
  - Speaker diarization: identifies "who said what" (critical for insights)
  - Faster turnaround: parallel processing for thousands of files simultaneously
  - Cost: ~$0.0001/min for standard transcription, $0.0003/min with diarization
- Why Qdrant over Pinecone?
  - Self-hostable: no vendor lock-in, compliant with data-residency laws
  - Payload-based filtering: efficient multi-tenant isolation (no post-filtering)
  - Snapshot support: automated backups for disaster recovery
  - Hybrid vector search: BM25 + semantic combined in a single query
- Why Redis + PostgreSQL + Qdrant (three layers)?
  - Redis: sub-millisecond cache hits for repeated queries (95%+ hit rate)
  - PostgreSQL: ACID compliance for audit logs, full-text search on transcripts
  - Qdrant: specialized HNSW vector indexing built for large-scale search
  - A single-database approach would sacrifice either latency or consistency
- Why SambaNova over OpenAI?
  - 256K context window (vs. GPT-4's 128K): more chunks per query
  - Domain fine-tuning available (no distillation needed)
  - Cost: $0.04/1M input tokens (vs. $10 for GPT-4-Turbo)
  - Latency: 300-500 ms, acceptable for async workflows
- Why async Celery tasks for embeddings?
  - Batch embeddings on GPU are more efficient than individual requests
  - The user doesn't wait; transcription happens in the background
  - Enables cost optimization: batch thousands of chunks in a single forward pass
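The 100-chunk GPU batching above reduces to a simple grouping helper that a Celery task can iterate over; a minimal sketch (the batch size comes from the pipeline description; the helper name is illustrative):

```python
from typing import Iterable, Iterator, List

def batched(items: Iterable, size: int = 100) -> Iterator[List]:
    """Group chunks into fixed-size batches, one GPU forward pass each.

    Streams lazily, so millions of chunks never sit in memory at once."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

Each yielded batch would be embedded in one call and upserted to Qdrant asynchronously.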
These three projects represent the evolution of production-grade AI systems across different problem domains. Here's what they collectively demonstrate:
- Agentic Inventory: Multi-step autonomous workflows with human-in-loop uncertainty handling
- Compliance-GPT: Retrieval-backed generation eliminating hallucinations via citation engines
- AudioRAG: Enterprise-scale multi-tenant systems with privacy-first design
| Decision | Impact | Applied In |
|---|---|---|
| Dual-model fallback (Gemini + Groq) | 99.9% system uptime even during API outages | Agentic Inventory |
| Hybrid search (BM25 + Semantic) | Better relevance for domain-specific queries vs. pure vector search | Compliance-GPT, AudioRAG |
| Multi-layer storage (Redis + PostgreSQL + Vector DB) | Optimizes for latency, ACID compliance, and semantic search simultaneously | AudioRAG Enterprise |
| Confidence scoring in AI outputs | Enables human oversight on edge cases LLMs can't handle | Agentic Inventory |
| Payload-based filtering for multi-tenancy | Prevents data leakage at database layer, not application layer | AudioRAG Enterprise |
- ✅ Latency: Sub-second response times (200-500 ms p95)
- ✅ Availability: 99.9% uptime with auto-failover mechanisms
- ✅ Cost Efficiency: 70% API cost reduction via intelligent caching
- ✅ Security: Enterprise-grade auth (JWT/API keys), rate limiting, audit logging
- ✅ Observability: Prometheus metrics, structured logging, distributed tracing
These projects prove I don't just use AI tools; I architect systems that solve real business problems:
- Problem Solving: Each system addresses a quantified business pain point (200+ hours/quarter waste in Compliance, false alarms in Inventory)
- System Design: Multi-layered architectures that optimize for competing constraints (latency vs. consistency vs. cost)
- MLOps Discipline: CI/CD pipelines, monitoring, evaluation frameworks, cost tracking
- Enterprise Thinking: Multi-tenancy, security hardening, audit trails, compliance-ready design
- Open-Source Impact: Performance refactors, documentation fixes, and bug fixes contributed to community projects (openclaw.ai, Kreuzberg, docling/IBM)
──────────────────────────────────────────────
⚡ TECH ARSENAL ⚡
──────────────────────────────────────────────
LANGUAGES: English | Hindi | Telugu | Kannada | Japanese (Intermediate)
Audio & Speech AI
| Project | Description | Stack |
|---|---|---|
| Audio-RAG-Analyzer | Audio content analysis with RAG pipeline | Python, LlamaIndex, Transformers |
| Audio-AI-Agent | Intelligent audio processing agent | AI Agents, LangChain |
NLP & Creative AI
| Project | Description | Stack |
|---|---|---|
| Narrative Transformer | AI-powered story genre transformation with custom NTI metric | OpenRouter API, LLMs, NLP |
Healthcare & Multimodal AI
| Project | Description | Stack |
|---|---|---|
| Vision-Audio Medical Chatbot | Multimodal healthcare AI combining medical image analysis + voice consultation | Vision AI, Speech Recognition, NLP |
| Healthcare_ChatBot | Domain-specific healthcare assistant | Python, NLP |
| HealthyfyMe | Health and wellness application | Python, ML |
Computer Vision & AI Safety
| Project | Description | Stack |
|---|---|---|
| AntiAI Platform | Deepfake detection + fake news verification system | PyTorch, EfficientNet-B4, Gradio, Computer Vision |
Document Intelligence
| Project | Description | Stack |
|---|---|---|
| LLMBasedPDF | LLM-powered PDF processing | LLMs |
| TextSummarizer_Project | Intelligent text summarization | NLP, Transformers |
| DocumentRetrieval | Smart document retrieval system | Information Retrieval |
Full-Stack Apps
| Project | Description | Stack |
|---|---|---|
| Edu-Connect-Dev | Educational platform | React |
| gdp-dashboard | Data visualization dashboard | Streamlit |
──────────────────────────────────────────────
CAREER JOURNEY
──────────────────────────────────────────────

──────────────────────────────────────────────
🔬 RESEARCH CONTRIBUTIONS 🔬
──────────────────────────────────────────────
A System for Providing Security Using a Plurality of Factors for IoT Gadgets
Filed (Indian Patent No. 202341040746)
Discovering Insights into Heart Health: A Survey of Data Mining and Machine Learning Methods
Presented at ICCICCT-2023 NICHE
Survey of AI-Driven Platforms for Welfare and Emergency Services: Gaps, Architectures and the Case for Unified Systems
GRENZE International Journal of Engineering and Technology (GIJET), Vol. 11, Issue 2, Pages 9911โ9916, 2025
Co-authors: Lavanya Ramkumar, Afsha R, Vinayaka VM
View Publication
B.Tech in Computer Science & Technology
Dayananda Sagar University | First Class
2021–2025
──────────────────────────────────────────────
LEARNING & GROWING
──────────────────────────────────────────────
OPEN TO: AI/ML Engineer | Open Source Contributor | RAG Systems Developer | AI Product Engineer | MLOps Engineer | Agentic Systems Engineer

LOCATION: Bengaluru, India (Open to Remote & Relocation)
"Building AI that understands, reasons, and delivers real-world impact"




