- Problem Statement
- Project Overview
- Architecture Overview
- Features
- Technology Stack
- Folder Structure
- Prerequisites
- Getting Started
- Deployment Commands
- Application URLs
- Video Processing Pipeline
- Auto Scaling Across Multiple AZs
- Troubleshooting
- API Documentation
- Scaling for High Load
Build a production-ready video streaming platform that can handle video uploads, process them into adaptive streaming formats (DASH/HLS), and deliver content efficiently through a CDN. The platform must handle concurrent video processing, provide real-time status updates, implement rate limiting, and scale automatically based on demand while optimizing costs using spot instances.
VisionSync is a comprehensive cloud-native video streaming platform built with modern AWS services and containerization technologies. The platform enables users to upload videos through presigned URLs, automatically processes them into adaptive streaming formats, and delivers content via CloudFront CDN with real-time status updates through WebSocket connections.
The architecture leverages AWS ECS Fargate with intelligent spot/regular instance selection (70% spot, 30% regular) for cost-optimized video processing. Videos are transcoded into multiple resolutions (1080p, 720p, 480p, 360p) using FFmpeg and packaged in a DASH-compliant streaming format with adaptive bitrate switching.
The backend is deployed across multiple AWS EC2 instances in an autoscaling group, connected to a MongoDB replica set (1 primary + 2 secondary nodes) for data persistence and Redis for caching. An Application Load Balancer distributes traffic across backend instances, while Lambda functions orchestrate the video processing workflow triggered by SQS messages. All infrastructure is managed through Pulumi IaC, with Ansible handling automated deployment and configuration management across multiple availability zones.
VisionSync consists of five main components working together to deliver a scalable video streaming platform:
1. Frontend (React + TypeScript):
- Modern React application with Shadcn UI components for video upload and streaming interface
- Real-time progress tracking using Socket.IO for video processing status updates
- DASH.js integration for adaptive streaming with automatic quality switching
- Containerized with Docker and deployed on EC2 in public subnet
- Optimized build served through CloudFront for global low-latency access
2. Backend (Node.js + Express + TypeScript):
- RESTful API with Express handling video upload via presigned S3 URLs
- Socket.IO for real-time WebSocket connections broadcasting processing status
- Advanced rate limiting with multiple algorithms (token bucket, sliding window)
- Deployed in private subnet across auto-scaling EC2 instances (min: 1, max: 5)
- Connected to ALB for traffic distribution and health monitoring
- Integration with AWS services: S3, SQS, Lambda, CloudFront
3. Video Processing (ECS Fargate + FFmpeg):
- Containerized FFmpeg processor running on AWS ECS Fargate
- Intelligent instance selection: 70% Spot (cost-optimized), 30% Regular (reliability)
- Downloads videos from S3 raw bucket, transcodes to multiple resolutions
- Generates DASH manifests with adaptive bitrate streaming
- Uploads processed chunks and manifests to S3 processed bucket
- Sends webhook to backend upon completion
4. Database Layer:
- MongoDB Replica Set: 1 primary + 2 secondary nodes in private subnet (zone 1c)
- Redis Cache: Single instance for session management and caching
- Deployed and configured using Ansible automation
- High availability with automatic failover
5. Infrastructure (AWS + Pulumi IaC):
- VPC: Multi-AZ deployment across 3 availability zones (ap-southeast-1a, 1b, 1c)
- Subnets: Public (2 AZs for ALB/bastion), Private (3 AZs for apps/databases)
- Application Load Balancer: Cross-AZ traffic distribution with health checks
- Auto Scaling Group: Dynamic scaling for backend EC2 (CPU-based: 10%-80%)
- ECS Cluster: Fargate tasks for video processing with auto-scaling based on SQS depth
- Lambda Function: Orchestrates ECS task launches from SQS triggers
- S3 Buckets: Separate buckets for raw and processed videos
- CloudFront CDN: Global content delivery with cache optimization
- SQS Queue: Message queue for video processing jobs with dead letter queue
- ECR Repositories: Docker image storage for backend, frontend, and video processor
- IAM Roles: Least privilege access control for all services
- CloudWatch: Logs, metrics, and alarms for monitoring
- Bastion Host: Secure SSH access to private subnet resources
The architecture flow:
- User uploads video → Frontend requests a presigned S3 URL from the backend
- Video uploaded to S3 raw bucket → triggers Lambda via SQS message
- Lambda launches ECS Fargate task (Spot or Regular based on file size)
- ECS container downloads, processes, and uploads to S3 processed bucket
- Webhook notifies Backend → updates MongoDB → emits Socket.IO event
- Frontend receives status update → displays video with CloudFront URL
- User streams video via DASH player with adaptive quality switching
- Adaptive Bitrate Streaming: DASH-compliant streaming with automatic quality switching based on network conditions
- Multi-Resolution Support: Videos transcoded to 1080p, 720p, 480p, 360p for optimal device compatibility
- FFmpeg Processing: Professional-grade video compression and chunking (4-6 second segments)
- Real-Time Progress: Socket.IO powered live updates for upload and processing status
- Presigned URL Upload: Secure direct-to-S3 uploads without server overhead
- Automatic Thumbnail Generation: Creates thumbnails during video processing
- Smart ECS Instance Selection: 70% Spot instances (up to 70% cost savings), 30% Regular Fargate for reliability
- Intelligent Fallback: Automatic retry on Regular instances if Spot unavailable
- File Size-Based Strategy: Files of 1GB or larger automatically use Regular Fargate for stability
- Batch Processing Mode: Lightweight jobs processed efficiently with batch mode enabled
- S3 Lifecycle Policies: Automatic storage class transitions for cost management
- CloudFront CDN: Global content delivery with intelligent caching for hot data
- Multi-AZ Deployment: Infrastructure spans 3 availability zones (ap-southeast-1a, 1b, 1c)
- Backend Auto Scaling: CPU-based scaling (10%-80% thresholds), 1-5 instances
- ECS Task Auto Scaling: SQS depth-based scaling for video processing workload
- MongoDB Replica Set: 1 Primary + 2 Secondary nodes with automatic failover
- ALB Health Checks: Continuous monitoring with automatic traffic rerouting
- Zero-Downtime Deployments: Rolling updates with health verification
- Multiple Rate Limiting Algorithms: Token bucket, sliding window, fixed window implementations
- Redis Caching: Session management, rate limiting, and Socket.IO backplane
- Dead Letter Queue: Failed processing jobs captured for debugging and retry
- Comprehensive Monitoring: CloudWatch logs, metrics, and alarms
- Security Best Practices: Private subnets, IAM least privilege, security groups
- WebSocket Real-Time Updates: Live status broadcasts for all connected clients
- Infrastructure as Code: Complete Pulumi-based IaC for reproducible infrastructure
- Ansible Automation: Automated deployment, configuration, and database setup
- One-Command Deployment: make deploy-all deploys the entire platform
- Fast Update Deployments: make deploy-fast for quick code updates
- ECR Integration: Private Docker registry for all container images
- Automated Environment Configuration: Dynamic .env generation from Pulumi outputs
- SSH Bastion Access: Secure gateway to private subnet resources
Frontend:
- React with TypeScript for type-safe development
- Vite for lightning-fast build tooling
- Tailwind CSS for utility-first responsive styling
- Shadcn/ui for modern, accessible component library
- DASH.js for adaptive streaming video playback
- Socket.IO Client for real-time WebSocket connections
- Lucide React for beautiful, consistent icons
- Nginx for reverse proxy and optimized static file serving
- Docker multi-stage builds for production deployment
Backend:
- Node.js 18+ with Express and TypeScript
- Socket.IO for real-time bidirectional communication
- Express Rate Limit with multiple algorithms (token bucket, sliding window)
- Express Validator for input validation and sanitization
- Helmet for security headers
- Mongoose for MongoDB object modeling
- AWS SDK v3 for S3, SQS, CloudFront integration
- Multer for multipart/form-data handling
- UUID for unique identifier generation
- Docker multi-stage builds with optimized layers
Video Processing Container:
- FFmpeg for video transcoding, compression, and chunking
- Node.js runtime for orchestration logic
- AWS SDK for S3 download/upload operations
- Adaptive quality settings based on instance type
- Custom webhook notification system
- Docker containerized for ECS Fargate deployment
Database & Cache:
- MongoDB: Replica set with 1 Primary + 2 Secondary nodes for HA
- Redis: In-memory cache for sessions, rate limiting, Socket.IO adapter
- Mongoose schema validation and middleware
- Connection pooling and retry logic
AWS Services:
- Compute: EC2 (t3.micro), ECS Fargate (2 vCPU, 4GB RAM), Lambda (Node.js 18)
- Storage: S3 (raw/processed buckets), ECR (container registry)
- Networking: VPC, ALB, NAT Gateway, Internet Gateway, Security Groups
- Messaging: SQS with dead letter queue
- CDN: CloudFront for global content delivery
- Monitoring: CloudWatch Logs, Metrics, Alarms
- IAM: Role-based access control with least privilege
Infrastructure Architecture:
- VPC: Multi-AZ across ap-southeast-1a, 1b, 1c
- Public Subnets: 2 AZs for ALB, Bastion, Frontend (10.10.1.0/24, 10.10.2.0/24)
- Private Subnets: 3 AZs for Backend, MongoDB, Redis (10.10.3-5.0/24)
- Auto Scaling Groups: CPU-based scaling for backend (10%-80%)
- ECS Cluster: SQS-based auto-scaling for video processing
- Bastion Host: Secure SSH gateway to private resources
DevOps & Automation:
- IaC: Pulumi with TypeScript for infrastructure provisioning
- Configuration Management: Ansible playbooks for automated setup
- CI/CD: Makefile-based deployment pipeline
- Container Registry: AWS ECR for private image storage
- Version Control: Git with modular IaC structure
- SSH Key Management: Automated key generation and distribution
- Environment Management: Dynamic .env generation from IaC outputs
- /client: Frontend Application
  - /src: React application source code with TypeScript
    - /components: Reusable UI components (Button, Card, Progress, etc.)
    - /lib: Utility functions and configurations
  - Dockerfile: Multi-stage build for optimized production image
  - nginx.conf: Nginx configuration for serving static files
  - vite.config.ts: Vite build configuration
  - tailwind.config.js: Tailwind CSS customization
  - package.json: Dependencies (React, TypeScript, DASH.js, Socket.IO)
- /server: Backend API
  - /src: Node.js/Express backend source
    - /config: Database and AWS service configurations
    - /models: Mongoose schemas for MongoDB
    - /routes: API endpoint definitions
    - /services: Business logic (video, S3, SQS services)
    - /middleware: Authentication, rate limiting, validation
    - server.ts: Express server with Socket.IO setup
  - Dockerfile: Multi-stage build with security best practices
  - package.json: Dependencies (Express, Mongoose, Socket.IO, AWS SDK)
  - .env: Environment variables (S3, SQS, MongoDB, Redis config)
- /container: Video Processing Worker
  - /src: FFmpeg video processing logic
    - process-video.ts: Main processing orchestrator
    - ffmpeg-service.ts: FFmpeg wrapper for transcoding
    - s3-service.ts: S3 upload/download operations
    - webhook-service.ts: Completion notification
  - Dockerfile: FFmpeg + Node.js container
  - package.json: Dependencies (AWS SDK, FFmpeg)
- /lambda: ECS Task Orchestrator
  - /src or /dist: Lambda function code
    - index.js: SQS trigger handler, ECS task launcher
  - Configuration for Spot/Regular instance selection
  - Package dependencies for AWS SDK
- /IaC: Infrastructure as Code (Pulumi)
  - /src: Modular infrastructure components
    - /networking: VPC, subnets, ALB, security groups
    - /compute: EC2, ECS, Lambda, ECR, Autoscaling
    - /storage: S3 buckets with lifecycle policies
    - /database: MongoDB and Redis instance configs
    - /messaging: SQS queue and dead letter queue
    - /monitoring: CloudWatch logs, metrics, alarms
    - /security: IAM roles and policies
    - /config: Centralized configuration
  - /bin: Pulumi app entry point
  - index.ts: Main infrastructure export file
  - Pulumi.yaml: Pulumi project configuration
  - tsconfig.json: TypeScript configuration
- /ansible: Configuration Management
  - site.yml: Main playbook for full deployment
  - deploy-backend.yml: Backend deployment playbook
  - deploy-client.yml: Frontend deployment playbook
  - setup-mongodb-replica-set.yml: MongoDB cluster setup
  - redis-docker-setup.yml: Redis installation
  - hosts.ini / inventory.j2: Dynamic inventory templates
  - production-env.j2: Environment variable templates
- /doc: Documentation
  - Architecture diagrams and detailed explanations
  - Video processing pipeline documentation
  - Critical implementation examples
- Makefile: Deployment Automation
  - deploy-all: One-command full deployment
  - deploy-fast: Quick code update deployment
  - setup-mongodb: MongoDB replica set setup
  - setup-redis: Redis cache setup
  - Database management and status check commands
  - Container build and push commands
  - Infrastructure provisioning commands
- docker-compose.yml: Local development environment (optional)
- README.md: Project documentation
- Project-details.md: Technical implementation details
Before deploying the application, ensure you have the following:
Required:
- AWS Account with permissions for EC2, ECS, S3, Lambda, CloudFront, IAM, VPC, ALB
- AWS CLI installed and configured (aws configure with access keys)
- Docker installed (version 20.10+) for building and pushing containers
- Node.js (version 18 or above) and npm installed
- Pulumi installed for infrastructure as code (curl -fsSL https://get.pulumi.com | sh)
- Pulumi Account (free tier works) - sign up at pulumi.com
- Ansible installed for configuration management (pip install ansible)
- TypeScript (version 5 or above) installed globally (npm install -g typescript)
- Make utility (pre-installed on Linux/Mac, Windows users can use WSL)
- SSH key for AWS EC2 access (will be auto-generated if it does not exist)
Optional:
- MongoDB Atlas account (for development, production uses local replica set)
- Redis (for local development, production uses AWS-deployed Redis)
- FFmpeg (for local video processing testing)
AWS Service Limits to Check:
- ECS Fargate: At least 10 concurrent tasks
- EC2: At least 5 t3.micro instances in your region
- S3: Unlimited (default)
- VPC: At least 1 VPC available
- Elastic IPs: At least 3 available
# Clone the repository
git clone https://github.com/yourusername/vision-sync.git
cd vision-sync
# Deploy everything (infrastructure + backend + databases)
make deploy-all
This single command will:
- Install all dependencies (frontend, backend, lambda, container, IaC)
- Build all components
- Deploy AWS infrastructure (VPC, EC2, ECS, S3, Lambda, ALB, etc.)
- Build and push Docker images to ECR
- Configure databases (MongoDB replica set + Redis)
- Deploy backend services
- Show deployment URLs
1. Clone and Install
git clone https://github.com/yourusername/vision-sync.git
cd vision-sync
# Install all dependencies
make install
2. Configure AWS
# Set your AWS credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (ap-southeast-1), Output format (json)
# Verify configuration
aws sts get-caller-identity
3. Configure Pulumi
cd IaC
pulumi login
pulumi stack init dev # or your preferred stack name
pulumi config set aws:region ap-southeast-1
cd ..
4. Set Up Environment Variables
Create a .env file in /server directory:
# AWS Configuration
AWS_REGION=ap-southeast-1
S3_BUCKET_RAW=<will be auto-filled by make update-env>
S3_BUCKET_PROCESSED=<will be auto-filled by make update-env>
SQS_QUEUE_URL=<will be auto-filled by make update-env>
CLOUDFRONT_DOMAIN=<will be auto-filled by make update-env>
# Database (local defaults shown; MongoDB Atlas also works for development)
MONGODB_URI=mongodb://localhost:27017/vision-sync
REDIS_URL=redis://localhost:6379
# Server Configuration
PORT=5000
NODE_ENV=development
FRONTEND_URL=http://localhost:3000
CORS_ORIGIN=*
# Socket.IO
SOCKET_IO_CORS_ORIGIN=http://localhost:3000
# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
Create a .env file in /client directory:
# Backend API URL
VITE_API_URL=http://localhost:5000
# Socket.IO URL
VITE_SOCKET_URL=http://localhost:5000
Note: After running make deploy, use make update-env to automatically populate AWS resource values in server/.env.
5. Build All Components
make build
6. Deploy Infrastructure
# Deploy AWS infrastructure
make deploy
# This will:
# - Create VPC, subnets, security groups
# - Launch EC2 instances (bastion, backend, MongoDB, Redis)
# - Set up ECS cluster and task definitions
# - Create S3 buckets, SQS queue, Lambda function
# - Configure ALB, CloudFront distribution
# - Automatically update server/.env with resource URLs
7. Setup Databases
# Setup MongoDB replica set (1 Primary + 2 Secondary)
make setup-mongodb
# Setup Redis cache
make setup-redis
# Or setup both at once
make setup-all-db
8. Build and Push Docker Images
# Build and push all containers to ECR
make push-containers
9. Deploy Backend Services
make deploy-services
10. Verify Deployment
# Check deployment status
make status
# View all resource URLs
make outputs
# Test backend health
curl http://<BACKEND_IP>:5000/health
For local development without AWS:
# Terminal 1: Start backend
cd server
npm run dev
# Terminal 2: Start frontend
cd client
npm run dev
# Terminal 3: Start MongoDB (if local)
mongod --replSet rs0
# Terminal 4: Start Redis (if local)
redis-server
Access the application:
- Frontend: http://localhost:3000
- Backend: http://localhost:5000
- API Health: http://localhost:5000/health
The application provides comprehensive automation through Makefile:
🚀 Complete Deployment
make deploy-all # Deploy everything: infra + backend + databases + services
make deploy-fast # Quick update for code changes only
make deploy # Deploy AWS infrastructure with Pulumi
make destroy # Destroy all AWS resources (with confirmation)
make status # Check deployment status and resource summary
make outputs # Show all Pulumi outputs (S3, SQS, CloudFront, etc.)
make push-containers # Build and push all images (backend, frontend, video processor)
make container # Build and push video processor container to ECR
make docker-clean # Clean Docker resources and free up space
make setup-mongodb # Setup MongoDB replica set (1 Primary + 2 Secondary)
make setup-redis # Setup Redis cache with Docker
make setup-all-db # Setup both MongoDB and Redis
make check-mongodb # Verify MongoDB replica set status
make check-redis # Verify Redis server status
make deploy-backend # Full backend deployment to EC2
make update-backend # Update backend container only
make status-backend # Check backend health and status
make logs-backend # View backend container logs
make ssh-backend # SSH into backend EC2 instance
make install # Install all dependencies (server, client, lambda, container, IaC)
make build # Build all components
make dev # Start local development servers (backend + frontend)
make clean # Clean build artifacts
make reset # Clean everything and reinstall
make update-env # Auto-update server/.env with AWS resource values
make env # Show basic environment variables
make create-inventory # Create Ansible inventory from Pulumi outputs
make check-ansible # Validate Ansible configuration
make logs-lambda # View Lambda function logs
make logs-ecs # View ECS container logs
make logs-server # View local server logs
make help # Show all available commands
make troubleshoot # Show common issues and solutions
make post-deploy # Show configuration after deployment
First-Time Deployment:
make install # Install dependencies
make deploy-all # Deploy everything
Code Update:
make deploy-fast # Quick update
Database Issues:
make check-mongodb # Check MongoDB status
make check-redis # Check Redis status
make setup-all-db # Recreate databases
Debugging:
make status # Overall status
make logs-backend # Backend logs
make logs-ecs # Video processing logs
make troubleshoot # Common solutions
After successful deployment, access your services at:
# Get all URLs
make outputs
Main Application:
- Frontend: http://<FRONTEND_PUBLIC_IP> or https://<CLOUDFRONT_DOMAIN>
- Backend API: http://<ALB_DNS_NAME> or http://<BACKEND_IP>:5000
- Health Check: http://<BACKEND_IP>:5000/health
- Socket.IO: ws://<BACKEND_IP>:5000
AWS Resources:
- S3 Raw Bucket: s3://<RAW_BUCKET_NAME>
- S3 Processed Bucket: s3://<PROCESSED_BUCKET_NAME>
- CloudFront Distribution: https://<CLOUDFRONT_DOMAIN>
- SQS Queue: <SQS_QUEUE_URL>
- ECR Repository: <ECR_REPOSITORY_URL>
Example URLs:
Frontend: http://54.251.192.45
Backend: http://alb-vision-1234567890.ap-southeast-1.elb.amazonaws.com
Health: http://54.251.192.45:5000/health
CloudFront: https://d3abc123xyz.cloudfront.net
Direct Access:
# SSH to backend
ssh -i ~/.ssh/vision-sync-backend ubuntu@<BACKEND_IP>
# SSH to bastion (for accessing private resources)
ssh -i ~/.ssh/vision-sync-backend ubuntu@<BASTION_IP>
# From bastion, access MongoDB primary
ssh ubuntu@<MONGODB_PRIMARY_IP>
# From bastion, access Redis
ssh ubuntu@<REDIS_IP>
VisionSync implements a sophisticated serverless video processing pipeline optimized for cost and performance:
User Upload → Backend → S3 Raw → SQS → Lambda → ECS Fargate → FFmpeg Processing → S3 Processed → Webhook → Backend → Socket.IO → User
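
The flow above implies a small set of video states. The following is a minimal sketch of how those states and their allowed transitions could be modelled. The state names are borrowed from the Socket.IO event names used in this README (video:uploading, video:processing, video:ready, video:error); the `VideoStatus` type, the `TRANSITIONS` table, and the retry edge from error back to processing are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical status model; names mirror the Socket.IO events in this README.
type VideoStatus = "uploading" | "processing" | "ready" | "error";

// Allowed transitions. The error -> processing edge models a retry (assumption).
const TRANSITIONS: Record<VideoStatus, VideoStatus[]> = {
  uploading: ["processing", "error"],
  processing: ["ready", "error"],
  ready: [],
  error: ["processing"],
};

// Returns true when moving from one status to another is permitted.
function canTransition(from: VideoStatus, to: VideoStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the transition is not permitted.
function advance(from: VideoStatus, to: VideoStatus): VideoStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid status transition: ${from} -> ${to}`);
  }
  return to;
}
```

Guarding updates this way would keep a late or duplicated webhook from moving a video that is already ready back into processing.
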
1. Video Upload Initiation
// User requests presigned URL from backend
POST /api/videos/upload-url
Body: { filename, fileSize, contentType }
// Backend generates presigned S3 URL (valid for 15 minutes)
Response: { uploadUrl, videoId, expiresIn }
// User uploads directly to S3 using presigned URL
PUT <uploadUrl>
Body: <video file>
2. SQS Message Trigger
// Backend sends processing message to SQS
await sqsService.sendVideoProcessingMessage(
config.S3_BUCKET_RAW,
`videos/${videoId}/${filename}`,
videoId
);
// Updates video status in MongoDB
status: "PROCESSING"
// Emits Socket.IO event
socket.emit('video:status', { videoId, status: 'processing' })
3. Lambda Orchestration
// Lambda triggered by SQS message
// Determines processing strategy based on file size
const useSpot = fileSize < 1_000_000_000 && Math.random() < 0.7; // 70% Spot
// Launches ECS Fargate task
await ecs.runTask({
cluster: ECS_CLUSTER,
taskDefinition: TASK_DEFINITION,
capacityProviderStrategy: useSpot ?
[{ capacityProvider: 'FARGATE_SPOT', weight: 1 }] :
[{ capacityProvider: 'FARGATE', weight: 1 }],
overrides: {
containerOverrides: [{
environment: [
{ name: 'VIDEO_ID', value: videoId },
{ name: 'S3_KEY', value: s3Key },
{ name: 'WEBHOOK_URL', value: webhookUrl },
{ name: 'FFMPEG_PRESET', value: useSpot ? 'medium' : 'fast' }
]
}]
}
});
4. ECS Container Processing
The container performs these steps:
// Download video from S3
await downloadFromS3(bucket, key, localPath);
// Process with FFmpeg
const resolutions = useSpot ?
['720p', '480p', '360p'] : // Cost-optimized
['1080p', '720p', '480p', '360p']; // Quality-optimized
// For each resolution
for (const resolution of resolutions) {
// Transcode video
await ffmpeg
.input(inputPath)
.size(resolution)
.videoBitrate(bitrate)
.audioBitrate('128k')
.outputOptions([
'-f dash', // DASH format
`-seg_duration ${segmentDuration}`, // 4-6 second segments
'-use_timeline 1',
'-use_template 1',
'-adaptation_sets "id=0,streams=v id=1,streams=a"'
])
.save(outputPath);
}
// Generate manifest.mpd
await generateDashManifest(outputDir);
// Upload all chunks and manifest to S3
await uploadDirectory(outputDir, processedBucket, videoId);
// Generate thumbnail
await generateThumbnail(inputPath, thumbnailPath);
await uploadToS3(thumbnailPath, processedBucket, `${videoId}/thumbnail.jpg`);
5. Webhook Notification
// Container sends webhook to backend
POST <WEBHOOK_URL>/api/webhook/processing-complete
Body: {
videoId,
status: 'ready',
manifestUrl: `${cloudfrontDomain}/${videoId}/manifest.mpd`,
thumbnailUrl: `${cloudfrontDomain}/${videoId}/thumbnail.jpg`,
resolutions: ['1080p', '720p', '480p', '360p'],
duration: 600,
processingTime: 180
}
// Backend updates MongoDB
await Video.findByIdAndUpdate(videoId, {
status: 'ready',
manifestUrl,
thumbnailUrl,
resolutions,
processedAt: new Date()
});
// Emits Socket.IO event
io.to(videoId).emit('video:ready', {
videoId,
manifestUrl,
thumbnailUrl
});
6. Client Playback
// Frontend receives Socket.IO event
socket.on('video:ready', ({ videoId, manifestUrl }) => {
// Initialize DASH player
const player = dashjs.MediaPlayer().create();
player.initialize(videoElement, manifestUrl, autoPlay);
// Player automatically selects quality based on bandwidth
player.updateSettings({
streaming: {
abr: {
autoSwitchBitrate: { video: true }
}
}
});
});
Spot vs Regular Instance Selection:
| Condition | Instance Type | Cost Savings | Trade-off |
|---|---|---|---|
| File < 1GB AND 70% probability | Spot | 70% cheaper | May be interrupted |
| File ≥ 1GB OR Spot unavailable | Regular | Standard cost | Guaranteed completion |
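
The selection rule in the table can be written as a small pure function. This is a sketch under stated assumptions: `chooseCapacityProvider`, the `spotAvailable` flag, and the injectable `roll` parameter are hypothetical names introduced here for testability; the 1 GB threshold and the 70% ratio come from the Lambda snippet in step 3.

```typescript
type CapacityProvider = "FARGATE_SPOT" | "FARGATE";

const ONE_GB = 1_000_000_000; // same threshold as the Lambda snippet in step 3
const SPOT_RATIO = 0.7;       // 70% of eligible jobs go to Spot

// `roll` is a number in [0, 1); passing it in makes the decision deterministic in tests.
function chooseCapacityProvider(
  fileSizeBytes: number,
  spotAvailable: boolean = true,
  roll: number = Math.random(),
): CapacityProvider {
  if (fileSizeBytes >= ONE_GB) return "FARGATE"; // large files: guaranteed completion
  if (!spotAvailable) return "FARGATE";          // fallback when Spot capacity is missing
  return roll < SPOT_RATIO ? "FARGATE_SPOT" : "FARGATE";
}

// Example: a 500 MB file with a roll of 0.5 is placed on Spot.
const exampleProvider = chooseCapacityProvider(500_000_000, true, 0.5);
```
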
Processing Settings by Instance:
| Setting | Spot Instance | Regular Instance |
|---|---|---|
| Resolutions | 720p, 480p, 360p | 1080p, 720p, 480p, 360p |
| FFmpeg Preset | medium | fast |
| CRF | 25 | 23 |
| Segment Duration | 6 seconds | 4 seconds |
| Threads | 1 | 2 |
| Max Processing Time | 60 minutes | 30 minutes |
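
One way to keep these settings in a single place is a typed profile map that the container reads at startup. The sketch below is illustrative only: `ProcessingProfile`, `PROFILES`, and `encoderArgs` are hypothetical names, and the values are copied from the table above. The flags themselves (-preset, -crf, -threads, -f dash, -seg_duration) are standard FFmpeg output options.

```typescript
interface ProcessingProfile {
  resolutions: string[];
  preset: "medium" | "fast";
  crf: number;
  segmentSeconds: number;
  threads: number;
  maxMinutes: number;
}

// Values taken from the "Processing Settings by Instance" table.
const PROFILES: Record<"spot" | "regular", ProcessingProfile> = {
  spot: {
    resolutions: ["720p", "480p", "360p"],
    preset: "medium",
    crf: 25,
    segmentSeconds: 6,
    threads: 1,
    maxMinutes: 60,
  },
  regular: {
    resolutions: ["1080p", "720p", "480p", "360p"],
    preset: "fast",
    crf: 23,
    segmentSeconds: 4,
    threads: 2,
    maxMinutes: 30,
  },
};

// Builds the profile-dependent FFmpeg output arguments as an argv-style array.
function encoderArgs(profile: ProcessingProfile): string[] {
  return [
    "-preset", profile.preset,
    "-crf", String(profile.crf),
    "-threads", String(profile.threads),
    "-f", "dash",
    "-seg_duration", String(profile.segmentSeconds),
  ];
}
```

Passing arguments as an array rather than one concatenated string avoids shell-quoting problems when the values are handed to a child process.
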
Additional Optimizations:
- S3 Lifecycle: Move old videos to Glacier after 90 days
- CloudFront: Cache popular videos at edge locations
- Batch Mode: Process multiple small jobs together
- Dead Letter Queue: Failed jobs are retried up to 3 times before being moved to the dead letter queue
Users receive live updates throughout the process:
// Upload progress
socket.emit('video:uploading', { videoId, progress: 45 });
// Processing started
socket.emit('video:processing', { videoId, stage: 'transcoding' });
// Processing progress (from container webhooks)
socket.emit('video:processing', { videoId, stage: 'encoding-720p', progress: 60 });
// Processing complete
socket.emit('video:ready', { videoId, manifestUrl, thumbnailUrl });
// Error handling
socket.emit('video:error', { videoId, error: 'Processing failed', retryable: true });
This pipeline ensures efficient, cost-effective video processing with high reliability and excellent user experience.
Our autoscaling setup distributes backend instances across multiple availability zones for maximum fault tolerance and performance.
Availability Zones:
- AZ-A (ap-southeast-1a): Public subnet (frontend, ALB) + Private subnet (backend)
- AZ-B (ap-southeast-1b): Public subnet (ALB) + Private subnet (backend, ECS tasks)
- AZ-C (ap-southeast-1c): Private subnet (MongoDB replica set, Redis)
Subnet Layout:
Public Subnets:
├── AZ-A: 10.10.1.0/24 (Frontend EC2, ALB, Bastion)
└── AZ-B: 10.10.2.0/24 (ALB)
Private Subnets:
├── AZ-A: 10.10.3.0/24 (Backend EC2)
├── AZ-B: 10.10.4.0/24 (Backend EC2, ECS Tasks)
└── AZ-C: 10.10.5.0/24 (MongoDB Primary + Secondary, Redis)
Backend Auto Scaling Group Configuration:
Desired Capacity: 2 instances
Minimum Size: 1 instance
Maximum Size: 5 instances
Instance Type: t3.micro
Health Check: ALB with 300 seconds grace period
Evaluation Period: 2 minutes
ECS Auto Scaling Configuration:
Service: Video Processing
Desired Tasks: 0 (scales based on SQS)
Minimum Tasks: 0
Maximum Tasks: 10
Scaling Metric: SQS Queue Depth
Target: 1 message per task
Scale-out Cooldown: 60 seconds
Scale-in Cooldown: 300 seconds
Initial Backend Deployment:
2 instances:
├── AZ-A: 1 backend instance (private subnet)
└── AZ-B: 1 backend instance (private subnet)
Scale-Up Scenarios:
3 instances: AZ-A (2), AZ-B (1)
4 instances: AZ-A (2), AZ-B (2)
5 instances: AZ-A (3), AZ-B (2) or AZ-A (2), AZ-B (3)
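
The scale-up scenarios follow an even split with the remainder going to the first zones in the list. A minimal sketch of that arithmetic is shown below; `distributeAcrossAzs` is a hypothetical helper, and in practice the Auto Scaling group decides which zone receives the extra instance, so the result for odd counts may be mirrored.

```typescript
// Splits instanceCount as evenly as possible across the given zones.
// Any remainder is assigned to the zones listed first (an assumption for illustration).
function distributeAcrossAzs(
  instanceCount: number,
  azs: string[],
): Record<string, number> {
  const result: Record<string, number> = {};
  const base = azs.length > 0 ? Math.floor(instanceCount / azs.length) : 0;
  const remainder = azs.length > 0 ? instanceCount % azs.length : 0;
  azs.forEach((az, index) => {
    result[az] = base + (index < remainder ? 1 : 0);
  });
  return result;
}

// Example: five backend instances across the two backend zones.
const fiveInstances = distributeAcrossAzs(5, ["AZ-A", "AZ-B"]);
```
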
ECS Task Distribution:
Video processing tasks distribute across:
├── Private Subnet AZ-A (10.10.3.0/24)
└── Private Subnet AZ-B (10.10.4.0/24)
Tasks are assigned based on:
- Available resources in each AZ
- Current task count per AZ
- Spot vs Regular capacity provider selection
Backend EC2 Auto Scaling:
| Policy | Metric | Threshold | Duration | Action | Cooldown |
|---|---|---|---|---|---|
| Scale Out | CPU Utilization | > 80% | 2 minutes | +1 instance | 300s |
| Scale In | CPU Utilization | < 10% | 5 minutes | -1 instance | 300s |
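
The two policies can be expressed as a decision over recent per-minute CPU averages. The sketch below is a simplified model, not how CloudWatch alarms are actually evaluated: `decideScaling` and `applyScaling` are hypothetical names, cooldowns are ignored, and one sample per minute is assumed.

```typescript
type ScalingAction = "scale-out" | "scale-in" | "none";

// samples: average CPU percentage per minute, oldest first.
function decideScaling(samples: number[]): ScalingAction {
  // True when the last `minutes` samples all satisfy the predicate.
  const sustained = (minutes: number, pred: (cpu: number) => boolean): boolean =>
    samples.length >= minutes && samples.slice(-minutes).every(pred);

  if (sustained(2, (cpu) => cpu > 80)) return "scale-out"; // > 80% for 2 minutes
  if (sustained(5, (cpu) => cpu < 10)) return "scale-in";  // < 10% for 5 minutes
  return "none";
}

// Applies the action while respecting the group bounds (min 1, max 5).
function applyScaling(
  current: number,
  action: ScalingAction,
  min: number = 1,
  max: number = 5,
): number {
  if (action === "scale-out") return Math.min(max, current + 1);
  if (action === "scale-in") return Math.max(min, current - 1);
  return current;
}
```
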
ECS Task Auto Scaling:
| Policy | Metric | Threshold | Action | Cooldown |
|---|---|---|---|---|
| Scale Out | SQS Messages | > 1 per task | +1 task | 60s |
| Scale In | SQS Messages | 0 messages | -1 task | 300s |
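
The ECS policy steps the task count by one based on queue depth. A rough sketch of a single evaluation step follows; `nextTaskCount` is a hypothetical helper that ignores the cooldown periods and treats "0 tasks with messages waiting" as exceeding the one-message-per-task target.

```typescript
// Returns the task count after one scaling evaluation.
// Bounds default to the configuration above: min 0, max 10.
function nextTaskCount(
  current: number,
  queueDepth: number,
  min: number = 0,
  max: number = 10,
): number {
  // Messages per running task; with no tasks, any backlog counts as over target.
  const perTask =
    current === 0 ? (queueDepth > 0 ? Infinity : 0) : queueDepth / current;

  if (perTask > 1) return Math.min(max, current + 1);      // scale out by one task
  if (queueDepth === 0) return Math.max(min, current - 1); // scale in by one task
  return current;
}
```

Starting from zero tasks, a single queued message therefore launches the first task, and an empty queue drains the service back to zero one task at a time.
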
Additional Triggers:
- Network Throttling: Scale out if NetworkIn > 10MB/s sustained
- Memory Pressure: Scale out if MemoryUtilization > 85%
- Socket Connections: Scale out if concurrent Socket.IO connections > 1000
Single AZ Failure Scenario:
Detection (< 30 seconds):
- ALB health checks detect unhealthy instances
- CloudWatch alarms trigger
- Auto Scaling marks instances as unhealthy
Traffic Rerouting (immediate):
- ALB stops routing to failed AZ
- All traffic goes to healthy AZ instances
- Socket.IO connections reconnect automatically
Recovery (5-10 minutes):
- Auto Scaling launches replacement instances
- New instances register with ALB
- Health checks pass, traffic resumes
Data Consistency:
- MongoDB replica set maintains data integrity
- Redis persists session data
- Video processing jobs retry from SQS
Database Resilience:
- MongoDB: Automatic failover from Primary to Secondary (< 10 seconds)
- Redis: Persistence enabled, AOF every second
- Backup: Automated daily snapshots
Load Balancer Integration:
- ALB spans public subnets in AZ-A and AZ-B
- Cross-zone load balancing enabled
- Health checks every 30 seconds (3 unhealthy = remove)
- Deregistration delay: 30 seconds (for graceful shutdown)
- Sticky sessions: Enabled (for Socket.IO)
Zero-Downtime Deployments:
# Rolling update strategy
1. Deploy new version to 1 instance
2. Wait for health checks to pass
3. Deploy to next instance
4. Repeat until all updated
5. Keep minimum 50% capacity during update
1. Video Upload Fails
# Check S3 bucket permissions
aws s3 ls s3://<RAW_BUCKET_NAME>
# Verify backend can generate presigned URLs
curl http://<BACKEND_IP>:5000/health
# Check backend logs
make logs-backend
# Solution: Verify IAM role has S3 PutObject permission
2. Video Processing Stuck
# Check SQS queue for messages
aws sqs get-queue-attributes \
--queue-url <SQS_QUEUE_URL> \
--attribute-names ApproximateNumberOfMessages
# Check Lambda logs
make logs-lambda
# Check ECS tasks
aws ecs list-tasks --cluster vision-sync-cluster
# Check dead letter queue
aws sqs receive-message --queue-url <DLQ_URL>
# Solution: Check Lambda has permission to launch ECS tasks
3. Socket.IO Not Connecting
# Test Socket.IO endpoint
curl http://<BACKEND_IP>:5000/socket.io/
# Check CORS configuration
# server/.env should have: CORS_ORIGIN=*
# Check Redis connection
ssh -i ~/.ssh/vision-sync-backend ubuntu@<REDIS_IP>
docker exec redis-server redis-cli ping
# Solution: Ensure Socket.IO CORS allows frontend domain
4. MongoDB Connection Errors
# Check replica set status
make check-mongodb
# Connect to primary
ssh -i ~/.ssh/vision-sync-backend ubuntu@<MONGODB_PRIMARY_IP>
mongosh --eval "rs.status()"
# Check if replica set is initialized
mongosh --eval "rs.isMaster()"
# Solution: Re-run MongoDB setup
make setup-mongodb
5. Redis Connection Issues
# Check Redis status
make check-redis
# Test Redis connection
ssh -i ~/.ssh/vision-sync-backend ubuntu@<REDIS_IP>
docker ps | grep redis
docker logs redis-server
# Solution: Restart Redis container
ssh ubuntu@<REDIS_IP> "docker restart redis-server"
6. Backend 502/503 Errors
# Check ALB target health
aws elbv2 describe-target-health \
--target-group-arn <TARGET_GROUP_ARN>
# Check backend health
curl http://<BACKEND_IP>:5000/health
# Check backend logs
make logs-backend
# Solution: Verify security groups allow ALB → Backend traffic
7. ECS Tasks Failing
# View ECS task logs
make logs-ecs
# Check task definition
aws ecs describe-task-definition \
--task-definition vision-sync-video-processor
# Check ECR image exists
aws ecr describe-images \
--repository-name vision-sync-video-processor
# Solution: Rebuild and push container
make container
System Overview:
make status # Overall deployment status
make outputs # All resource URLs
make troubleshoot # Common issues guide
Backend Debugging:
make logs-backend # View backend logs
make ssh-backend # SSH into backend
make status-backend # Backend health check
# Inside backend instance
docker ps
docker logs vision-sync-backend
docker exec -it vision-sync-backend sh
Database Debugging:
make check-mongodb # MongoDB status
make check-redis # Redis status
# MongoDB replica set info
ssh ubuntu@<MONGODB_IP>
mongosh --eval "rs.status()" | grep -E "(stateStr|name)"
# Redis info
ssh ubuntu@<REDIS_IP>
docker exec redis-server redis-cli info | grep connected
Video Processing Debugging:
make logs-ecs # ECS container logs
make logs-lambda # Lambda orchestration logs
# Check SQS queue
aws sqs get-queue-attributes \
--queue-url <SQS_URL> \
--attribute-names All
# Check specific ECS task
aws ecs describe-tasks \
--cluster vision-sync-cluster \
--tasks <TASK_ARN>
VisionSync provides a comprehensive RESTful API for video management, upload, and streaming operations.
Production: http://<ALB_DNS_NAME> or http://<BACKEND_IP>:5000
Development: http://localhost:5000
CloudFront CDN: https://<CLOUDFRONT_DOMAIN>
Currently, the API uses session-based authentication. Future versions will implement JWT tokens.
Multiple rate limiting algorithms are implemented. The following limits apply:
- Default: 100 requests per 15 minutes per IP
- Upload endpoint: 10 requests per 15 minutes per IP
- Streaming: Unlimited (handled by CloudFront)
Rate limit headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1736524800
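The limits and headers above could be produced by a fixed-window counter. The sketch below is one possible approach, not necessarily the algorithm the backend uses. `createRateLimiter` and its in-memory `Map` are hypothetical names; a multi-instance deployment would more likely keep the counters in Redis.

```javascript
// Fixed-window rate limiter sketch (assumed algorithm, for illustration only).
const WINDOW_MS = 15 * 60 * 1000; // 15 minutes, matching the limits listed above

function createRateLimiter(limit, windowMs = WINDOW_MS, now = () => Date.now()) {
  const windows = new Map(); // ip -> { start, count }

  return function check(ip) {
    const t = now();
    let entry = windows.get(ip);
    // Start a fresh window the first time an IP is seen or once the old window expires.
    if (!entry || t - entry.start >= windowMs) {
      entry = { start: t, count: 0 };
      windows.set(ip, entry);
    }
    const allowed = entry.count < limit;
    if (allowed) entry.count += 1;
    return {
      allowed,
      headers: {
        'X-RateLimit-Limit': String(limit),
        'X-RateLimit-Remaining': String(limit - entry.count),
        // Unix time in seconds at which the current window ends.
        'X-RateLimit-Reset': String(Math.ceil((entry.start + windowMs) / 1000)),
      },
    };
  };
}

// Example: the upload endpoint limit of 10 requests per 15 minutes per IP.
const checkUpload = createRateLimiter(10);
```

When `allowed` is false, a server would respond with 429 Too Many Requests and the same headers. Entries are never evicted in this sketch, so a real implementation would also need to expire old windows.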
GET /health
Description: Check server health and status
Response:
{
"status": "ok",
"timestamp": "2025-01-09T10:30:00.000Z",
"uptime": 3600,
"mongodb": "connected",
"redis": "connected",
"services": {
"s3": "available",
"sqs": "available",
"ecs": "available"
}
}
Status Codes:
- 200 OK - Server is healthy
- 503 Service Unavailable - Server or dependencies are down
POST /api/videos/upload-url
Content-Type: application/json
Description: Request a presigned S3 URL for direct video upload
Request Body:
{
"filename": "my-video.mp4",
"fileSize": 52428800,
"contentType": "video/mp4"
}
Validation:
- filename: Required, string, max 255 chars
- fileSize: Required, number, max 5GB (5368709120 bytes)
- contentType: Required, must be video/* MIME type
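The validation rules above can be sketched as a small function. This is a minimal illustration, not the backend's actual validator: `validateUploadRequest` is a hypothetical name, and rejecting empty filenames and non-positive sizes is an assumption that goes slightly beyond the stated rules.

```javascript
const MAX_FILE_SIZE = 5368709120; // 5 GB limit stated in the validation rules
const MAX_FILENAME_LENGTH = 255;

function validateUploadRequest(body) {
  const errors = [];
  const { filename, fileSize, contentType } = body ?? {};

  if (typeof filename !== 'string' || filename.length === 0) {
    errors.push('filename is required and must be a string');
  } else if (filename.length > MAX_FILENAME_LENGTH) {
    errors.push('filename must be at most 255 characters');
  }

  // Assumption: sizes must be positive, finite numbers.
  if (typeof fileSize !== 'number' || !Number.isFinite(fileSize) || fileSize <= 0) {
    errors.push('fileSize is required and must be a positive number');
  } else if (fileSize > MAX_FILE_SIZE) {
    errors.push('fileSize exceeds the 5 GB limit');
  }

  // Accept any video/* MIME type, for example video/mp4 or video/webm.
  if (typeof contentType !== 'string' || !/^video\/[\w.+-]+$/.test(contentType)) {
    errors.push('contentType must be a video/* MIME type');
  }

  return { valid: errors.length === 0, errors };
}
```

A handler could map an oversized `fileSize` to 413 Payload Too Large and any other failure to 400 Bad Request, in line with the status codes documented for this endpoint.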
Response:
{
"success": true,
"data": {
"uploadUrl": "https://vision-sync-raw-bucket.s3.amazonaws.com/videos/...",
"videoId": "677f3a5c9e8f1b2c3d4e5f6a",
"expiresIn": 900,
"key": "videos/677f3a5c9e8f1b2c3d4e5f6a/my-video.mp4"
}
}
Status Codes:
- 200 OK - Upload URL generated successfully
- 400 Bad Request - Invalid request parameters
- 413 Payload Too Large - File size exceeds limit
- 429 Too Many Requests - Rate limit exceeded
- 500 Internal Server Error - S3 service error
Usage Example:
// Step 1: Get presigned URL
const response = await fetch('/api/videos/upload-url', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
filename: file.name,
fileSize: file.size,
contentType: file.type
})
});
// The response wraps the payload in a data object (see the Response example above)
const { data: { uploadUrl, videoId } } = await response.json();
// Step 2: Upload directly to S3
await fetch(uploadUrl, {
method: 'PUT',
body: file,
headers: { 'Content-Type': file.type }
});
// Step 3: Confirm upload
await fetch(`/api/videos/${videoId}/confirm`, { method: 'POST' });
POST /api/videos/:videoId/confirm
Description: Trigger video processing after successful S3 upload
Path Parameters:
- videoId: ID of the video (a 24-character hexadecimal string in the examples, matching the MongoDB _id field)
Response:
{
"success": true,
"message": "Video processing started",
"data": {
"videoId": "677f3a5c9e8f1b2c3d4e5f6a",
"status": "processing",
"estimatedTime": 300
}
}
Status Codes:
- 200 OK - Processing initiated
- 404 Not Found - Video not found
- 409 Conflict - Video already processing
GET /api/videos
Description: Retrieve list of all uploaded videos
Query Parameters:
- status (optional): Filter by status (uploading, processing, ready, failed)
- limit (optional): Number of videos per page (default: 20, max: 100)
- skip (optional): Number of videos to skip (pagination)
- sort (optional): Sort field (createdAt, filename, duration)
- order (optional): Sort order (asc, desc)
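As an illustration of how these query parameters combine, the sketch below builds a request URL with the standard `URLSearchParams` API. `buildVideosUrl` is a hypothetical helper; clamping `limit` to 100 on the client is an assumption that mirrors the documented maximum.

```javascript
function buildVideosUrl(baseUrl, { status, limit, skip, sort, order } = {}) {
  const params = new URLSearchParams();
  if (status !== undefined) params.set('status', status);
  // Clamp to the documented maximum page size of 100.
  if (limit !== undefined) params.set('limit', String(Math.min(limit, 100)));
  if (skip !== undefined) params.set('skip', String(skip));
  if (sort !== undefined) params.set('sort', sort);
  if (order !== undefined) params.set('order', order);

  const query = params.toString();
  return query ? `${baseUrl}/api/videos?${query}` : `${baseUrl}/api/videos`;
}

// Example: the 20 most recently created videos that are ready to stream.
const readyVideosUrl = buildVideosUrl('http://localhost:5000', {
  status: 'ready',
  limit: 20,
  sort: 'createdAt',
  order: 'desc',
});
```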
Response:
{
"success": true,
"data": {
"videos": [
{
"_id": "677f3a5c9e8f1b2c3d4e5f6a",
"filename": "my-video.mp4",
"fileSize": 52428800,
"status": "ready",
"manifestUrl": "https://d3abc123xyz.cloudfront.net/677f3a5c.../manifest.mpd",
"thumbnailUrl": "https://d3abc123xyz.cloudfront.net/677f3a5c.../thumbnail.jpg",
"duration": 120,
"resolutions": ["1080p", "720p", "480p", "360p"],
"uploadedAt": "2025-01-09T10:00:00.000Z",
"processedAt": "2025-01-09T10:03:45.000Z",
"processingTime": 225
}
],
"total": 45,
"limit": 20,
"skip": 0
}
}
Status Codes:
- 200 OK - Videos retrieved successfully
- 400 Bad Request - Invalid query parameters
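The `total`, `limit`, and `skip` fields in the response are enough to page through the full list. The helpers below are a hypothetical sketch of that arithmetic and are not part of the API.

```javascript
// Returns the skip value for the next page, or null when the last page was reached.
function nextPageSkip({ total, limit, skip }) {
  const next = skip + limit;
  return next < total ? next : null;
}

// Lists every skip value needed to read `total` items in pages of `limit`.
function pageOffsets(total, limit) {
  if (limit <= 0) throw new RangeError('limit must be positive');
  const offsets = [];
  for (let skip = 0; skip < total; skip += limit) offsets.push(skip);
  return offsets;
}
```

With the sample response (`total: 45`, `limit: 20`, `skip: 0`), the next request would use `skip=20`, and three requests in total cover all videos.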
GET /api/videos/:videoId
Description: Retrieve detailed information about a specific video
Path Parameters:
- videoId: ID of the video (a 24-character hexadecimal string in the examples, matching the MongoDB _id field)
Response:
{
"success": true,
"data": {
"_id": "677f3a5c9e8f1b2c3d4e5f6a",
"filename": "my-video.mp4",
"originalFilename": "my-video.mp4",
"fileSize": 52428800,
"contentType": "video/mp4",
"status": "ready",
"manifestUrl": "https://d3abc123xyz.cloudfront.net/677f3a5c.../manifest.mpd",
"thumbnailUrl": "https://d3abc123xyz.cloudfront.net/677f3a5c.../thumbnail.jpg",
"duration": 120,
"resolutions": ["1080p", "720p", "480p", "360p"],
"s3Keys": {
"raw": "videos/677f3a5c.../my-video.mp4",
"processed": "processed/677f3a5c.../"
},
"metadata": {
"codec": "h264",
"width": 1920,
"height": 1080,
"fps": 30,
"bitrate": 5000000
},
"uploadedAt": "2025-01-09T10:00:00.000Z",
"processedAt": "2025-01-09T10:03:45.000Z",
"processingTime": 225,
"processingDetails": {
"instanceType": "fargate_spot",
"preset": "medium",
"compressionRatio": 0.65
}
}
}
Status Codes:
- 200 OK - Video found
- 404 Not Found - Video does not exist
DELETE /api/videos/:videoId
Description: Delete video and all associated files from S3
Path Parameters:
- videoId: ID of the video (a 24-character hexadecimal string in the examples, matching the MongoDB _id field)
Query Parameters:
- deleteFiles (optional): Whether to delete S3 files (default: true)
Response:
{
"success": true,
"message": "Video deleted successfully",
"data": {
"deletedFiles": {
"raw": true,
"processed": true
}
}
}
Status Codes:
- 200 OK - Video deleted successfully
- 404 Not Found - Video not found
- 500 Internal Server Error - Failed to delete S3 files
GET /api/videos/:videoId/status
Description: Get real-time processing status (also available via Socket.IO)
Path Parameters:
- videoId: ID of the video (a 24-character hexadecimal string in the examples, matching the MongoDB _id field)
Response:
{
"success": true,
"data": {
"videoId": "677f3a5c9e8f1b2c3d4e5f6a",
"status": "processing",
"stage": "encoding-720p",
"progress": 65,
"estimatedTimeRemaining": 120,
"currentResolution": "720p",
"completedResolutions": ["1080p"],
"message": "Encoding 720p resolution..."
}
}
Video Status Values:
- uploading - File being uploaded to S3
- processing - Video being transcoded
- ready - Video processed and available for streaming
- failed - Processing failed
Processing Stages:
- queued - Waiting in SQS queue
- downloading - Downloading from S3 raw bucket
- analyzing - Analyzing video metadata
- encoding-1080p / encoding-720p / encoding-480p / encoding-360p - Encoding a specific resolution
- generating-manifest - Creating DASH manifest
- uploading-chunks - Uploading to S3 processed bucket
- finalizing - Cleaning up and notifying backend
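For clients that cannot use Socket.IO, the status endpoint can be polled until the video reaches a terminal status (`ready` or `failed`). The following is a minimal sketch under stated assumptions: `waitForVideo`, `fetchStatus`, the 2-second interval, and the attempt cap are all hypothetical choices, not part of the API.

```javascript
// Statuses after which no further changes are expected.
const TERMINAL_STATUSES = new Set(['ready', 'failed']);

function isTerminalStatus(status) {
  return TERMINAL_STATUSES.has(status);
}

// fetchStatus(videoId) must resolve to the `data` object of the status response.
async function waitForVideo(videoId, { fetchStatus, intervalMs = 2000, maxAttempts = 150 }) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const data = await fetchStatus(videoId);
    if (isTerminalStatus(data.status)) return data;
    // Wait before asking again so polling stays well under the rate limit.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for video ${videoId}`);
}
```

In a browser, `fetchStatus` could be `(id) => fetch('/api/videos/' + id + '/status').then((r) => r.json()).then((body) => body.data)`. At the default rate limit of 100 requests per 15 minutes, a 2-second interval would exhaust the budget in a little over three minutes, so a longer interval may be needed for large files.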
POST /api/webhook/processing-complete
Content-Type: application/json
X-Webhook-Secret: <shared_secret>
Description: Called by ECS container when video processing completes
Request Body:
{
"videoId": "677f3a5c9e8f1b2c3d4e5f6a",
"status": "ready",
"manifestUrl": "https://d3abc123xyz.cloudfront.net/677f3a5c.../manifest.mpd",
"thumbnailUrl": "https://d3abc123xyz.cloudfront.net/677f3a5c.../thumbnail.jpg",
"resolutions": ["1080p", "720p", "480p", "360p"],
"duration": 120,
"processingTime": 225,
"metadata": {
"codec": "h264",
"fps": 30
}
}
Response:
{
"success": true,
"message": "Webhook processed successfully"
}
VisionSync uses Socket.IO for real-time bidirectional communication.
Connection:
import io from 'socket.io-client';
const socket = io('http://<BACKEND_IP>:5000', {
transports: ['websocket', 'polling'],
reconnection: true,
reconnectionDelay: 1000,
reconnectionAttempts: 5
});
Join Video Room:
socket.emit('video:join', { videoId: '677f3a5c9e8f1b2c3d4e5f6a' });
Leave Video Room:
socket.emit('video:leave', { videoId: '677f3a5c9e8f1b2c3d4e5f6a' });
Video Upload Progress:
socket.on('video:upload-progress', (data) => {
// data: { videoId, progress: 45, bytesUploaded: 23592960, totalBytes: 52428800 }
console.log(`Upload progress: ${data.progress}%`);
});
Video Processing Started:
socket.on('video:processing-started', (data) => {
// data: { videoId, status: 'processing', estimatedTime: 300 }
console.log('Processing started');
});
Video Processing Progress:
socket.on('video:processing-progress', (data) => {
// data: {
// videoId,
// stage: 'encoding-720p',
// progress: 65,
// currentResolution: '720p',
// completedResolutions: ['1080p']
// }
console.log(`${data.stage}: ${data.progress}%`);
});
Video Ready:
socket.on('video:ready', (data) => {
// data: {
// videoId,
// manifestUrl,
// thumbnailUrl,
// resolutions: ['1080p', '720p', '480p', '360p'],
// duration: 120
// }
console.log('Video ready for streaming!');
initializePlayer(data.manifestUrl);
});
Video Processing Failed:
socket.on('video:error', (data) => {
// data: {
// videoId,
// error: 'Processing failed: Invalid codec',
// retryable: true,
// stage: 'encoding-1080p'
// }
console.error('Video processing error:', data.error);
});
Connection Status:
socket.on('connect', () => {
console.log('Connected to server');
});
socket.on('disconnect', (reason) => {
console.log('Disconnected:', reason);
});
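// In Socket.IO v3 and later, 'reconnect' is emitted by the Manager: use socket.io.on('reconnect', ...)
socket.on('reconnect', (attemptNumber) => {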
console.log('Reconnected after', attemptNumber, 'attempts');
});
All endpoints return errors in a consistent format:
{
"success": false,
"error": {
"code": "VIDEO_NOT_FOUND",
"message": "Video with ID 677f3a5c9e8f1b2c3d4e5f6a not found",
"details": {
"videoId": "677f3a5c9e8f1b2c3d4e5f6a"
}
}
}
Common HTTP Status Codes:
- 200 OK - Request successful
- 201 Created - Resource created successfully
- 400 Bad Request - Invalid request parameters
- 401 Unauthorized - Authentication required
- 403 Forbidden - Insufficient permissions
- 404 Not Found - Resource not found
- 409 Conflict - Resource conflict (e.g., video already processing)
- 413 Payload Too Large - File size exceeds limit
- 429 Too Many Requests - Rate limit exceeded
- 500 Internal Server Error - Server error
- 503 Service Unavailable - Service temporarily unavailable
Common Error Codes:
- VIDEO_NOT_FOUND - Video does not exist
- VIDEO_ALREADY_PROCESSING - Video is already being processed
- INVALID_FILE_TYPE - Unsupported video format
- FILE_TOO_LARGE - File exceeds 5GB limit
- UPLOAD_FAILED - S3 upload failed
- PROCESSING_FAILED - Video processing encountered an error
- RATE_LIMIT_EXCEEDED - Too many requests
- S3_ERROR - S3 service error
- SQS_ERROR - SQS service error
- MONGODB_ERROR - Database error
- REDIS_ERROR - Cache error
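Because every endpoint shares the same success and error envelope, a client can unwrap responses in one place. The sketch below is a hypothetical client-side helper: `ApiError`, `unwrapResponse`, the `UNKNOWN_ERROR` fallback code, and the choice to treat 429 and 503 as retryable are all assumptions, not part of the API.

```javascript
class ApiError extends Error {
  constructor(status, { code, message, details } = {}) {
    super(message ?? `Request failed with status ${status}`);
    this.name = 'ApiError';
    this.status = status;
    // Hypothetical fallback for responses that carry no error code.
    this.code = code ?? 'UNKNOWN_ERROR';
    this.details = details ?? {};
  }

  // Assumption: rate limiting and temporary unavailability are worth retrying.
  get retryable() {
    return this.status === 429 || this.status === 503;
  }
}

// Returns body.data on success, otherwise throws an ApiError built from body.error.
function unwrapResponse(status, body) {
  if (body && body.success === true) return body.data;
  throw new ApiError(status, body && body.error ? body.error : undefined);
}
```

A caller would typically pass `response.status` and the parsed JSON body, then branch on `error.code` (for example `VIDEO_NOT_FOUND`) rather than on the human-readable message.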
Allowed Origins:
- Development: http://localhost:3000, http://localhost:5173
- Production: Configured via the CORS_ORIGIN environment variable
Allowed Methods:
GET,POST,PUT,DELETE,OPTIONS
Allowed Headers:
Content-Type,Authorization,X-Requested-With
Exposed Headers:
X-RateLimit-Limit,X-RateLimit-Remaining,X-RateLimit-Reset
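The origin rules above can be sketched as a simple allow-list check. This is a hedged illustration: `parseAllowedOrigins` and `isOriginAllowed` are hypothetical helpers, and treating `CORS_ORIGIN` as a comma-separated list that falls back to the development origins when unset is an assumption about the configuration format.

```javascript
const DEV_ORIGINS = ['http://localhost:3000', 'http://localhost:5173'];

// Assumption: CORS_ORIGIN holds one origin, a comma-separated list, or "*".
function parseAllowedOrigins(corsOrigin) {
  if (!corsOrigin) return DEV_ORIGINS;
  return corsOrigin
    .split(',')
    .map((origin) => origin.trim())
    .filter(Boolean);
}

function isOriginAllowed(origin, allowedOrigins) {
  return allowedOrigins.includes('*') || allowedOrigins.includes(origin);
}
```

The same allow-list would need to be applied to both the HTTP API and the Socket.IO server, since each performs its own CORS check.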
Multi-AZ High Availability:
- 3 Availability Zones: Distributed across ap-southeast-1a, 1b, 1c
- Cross-AZ Load Balancing: ALB distributes traffic across all healthy instances
- Zone-Level Fault Tolerance: Automatic failover if entire AZ fails
- Database Replication: MongoDB replica set with automatic failover
Current Capacity:
- Backend: 1-5 EC2 instances (t3.micro, auto-scaling)
- Video Processing: 0-10 ECS Fargate tasks (2 vCPU, 4GB RAM each)
- Concurrent Uploads: ~50 per minute
- Concurrent Processing Jobs: ~10
- Storage: Unlimited S3
- CDN: CloudFront with global edge locations
Performance Metrics:
- Video Upload: Direct to S3, ~50MB/s
- Processing Time: 3-5 minutes per GB (depends on Spot vs Regular)
- Streaming Latency: <100ms via CloudFront
- API Response Time: <200ms average
- Socket.IO Latency: <50ms
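The "3-5 minutes per GB" figure can be turned into a rough estimate for a given upload. The helper below is a hypothetical sketch that scales linearly with file size and takes 1 GB as 1024³ bytes (consistent with the 5GB = 5368709120 bytes limit); it ignores fixed overhead such as queueing and task startup, so small files will take longer than it suggests.

```javascript
const BYTES_PER_GB = 1024 ** 3;

// Returns a rough processing-time range in seconds at 3 to 5 minutes per GB.
function estimateProcessingSeconds(fileSizeBytes) {
  const gigabytes = fileSizeBytes / BYTES_PER_GB;
  return {
    minSeconds: Math.round(gigabytes * 3 * 60),
    maxSeconds: Math.round(gigabytes * 5 * 60),
  };
}

// A file at the 5 GB upload limit: between 900 and 1500 seconds (15 to 25 minutes).
const largestUpload = estimateProcessingSeconds(5368709120);
```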
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes with clear messages
- Push to your fork (git push origin feature/amazing-feature)
- Open a Pull Request with a detailed description
Development Guidelines:
- Follow TypeScript best practices
- Write descriptive commit messages
- Add tests for new features
- Update documentation
- Follow existing code style
Get Help:
- Open an issue on GitHub for bugs or feature requests
- Check existing issues before creating new ones
- Provide detailed information for faster resolution