Author: Darya Alekseevna Chubarova (Чубарова Дарья Алексеевна)
A complete system for reducing flickering in frame-by-frame video segmentation using temporal smoothing techniques.
Live Demo: https://nickscherbakov.github.io/mask-stabilization/
A comprehensive presentation website (in Russian) showcasing:
- Problem statement and visual explanations
- System architecture and technologies
- Stabilization methods with formulas
- Experimental results and metrics
- API documentation
- Q&A section for homework defense
See docs/GITHUB_PAGES_SETUP.md for GitHub Pages setup instructions.
This project implements a full pipeline for:
- Video Segmentation using DeepLabv3 (PyTorch/torchvision)
- Mask Stabilization with multiple temporal smoothing methods
- Metrics Calculation to measure stability improvements
- REST API for easy integration (FastAPI)
- Interactive Analysis with Jupyter notebooks
```
┌──────────────┐      ┌──────────────┐      ┌───────────────┐
│    Upload    │─────▶│ Segmentation │─────▶│ Stabilization │
│    Video     │      │ (DeepLabv3)  │      │  (Temporal)   │
└──────────────┘      └──────┬───────┘      └───────┬───────┘
                             │                      │
                             ▼                      ▼
                      ┌──────────────┐      ┌──────────────┐
                      │    Masks     │      │   Smoothed   │
                      │   (Before)   │      │    Masks     │
                      └──────────────┘      └──────┬───────┘
                                                   │
                                                   ▼
                                            ┌──────────────┐
                                            │   Metrics    │
                                            │ Calculation  │
                                            └──────────────┘
```
```
mask-stabilization/
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── Dockerfile                 # Docker container setup
├── docker-compose.yml         # Docker composition
│
├── docs/                      # Presentation website (GitHub Pages)
│   ├── index.html             # Main presentation page (Russian)
│   └── GITHUB_PAGES_SETUP.md  # Setup instructions
│
├── src/
│   ├── __init__.py
│   ├── main.py                # FastAPI server
│   ├── segmentation.py        # DeepLabv3 segmentation
│   ├── stabilization.py       # Temporal smoothing methods
│   ├── metrics.py             # Stability metrics (IoU, etc.)
│   └── utils.py               # Utility functions
│
├── notebooks/
│   └── analysis.ipynb         # Interactive analysis notebook
│
├── frontend/
│   ├── index.html             # Web frontend (HTML/CSS/JS)
│   └── README.md              # Frontend documentation
│
├── spark_frontend/
│   └── SPARK_PROMPT.md        # GitHub Spark frontend prompt
│
├── examples/
│   └── .gitkeep               # Place test videos here
│
└── results/
    └── .gitkeep               # Processing results stored here
```
1. Clone the repository:

   ```bash
   git clone https://github.com/NickScherbakov/mask-stabilization.git
   cd mask-stabilization
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
Build and run with Docker Compose:

```bash
docker-compose up --build
```

The API will be available at http://localhost:8000.
```bash
# From the project root directory
uvicorn src.main:app --host 0.0.0.0 --port 8000 --reload
```

Access the interactive API documentation at http://localhost:8000/docs.
```bash
jupyter notebook notebooks/analysis.ipynb
```

The notebook provides a step-by-step demonstration of the entire pipeline.
```bash
curl -X POST "http://localhost:8000/api/upload" \
  -F "file=@path/to/video.mp4"
```

Response:

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "uploaded",
  "video_info": {
    "fps": 30.0,
    "frame_count": 150,
    "width": 1920,
    "height": 1080
  }
}
```

```bash
curl -X POST "http://localhost:8000/api/segment" \
  -H "Content-Type: application/json" \
  -d '{"job_id": "YOUR_JOB_ID", "target_class": "person"}'
```

```bash
curl -X POST "http://localhost:8000/api/stabilize" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "YOUR_JOB_ID",
    "method": "moving_average",
    "window_size": 5
  }'
```

```bash
curl "http://localhost:8000/api/status/YOUR_JOB_ID"
```

```bash
curl "http://localhost:8000/api/metrics/YOUR_JOB_ID"
```

Response:

```json
{
  "iou_before": {
    "mean": 0.8234,
    "std": 0.0521,
    "min": 0.6543,
    "max": 0.9876
  },
  "iou_after": {
    "mean": 0.9123,
    "std": 0.0234,
    "min": 0.8234,
    "max": 0.9912
  },
  "improvement": {
    "iou_improvement": 0.0889,
    "iou_improvement_percent": 10.8,
    "instability_reduction_percent": 57.3
  }
}
```

```bash
curl "http://localhost:8000/api/frames/YOUR_JOB_ID/comparison/25" \
  --output frame_25.png
```

Frame types: `mask_before`, `mask_after`, `comparison`.
```python
{
    0: 'background',
    15: 'person',
    7: 'car',
    6: 'bus',
    8: 'truck',
    9: 'boat',
    17: 'cat',
    18: 'dog',
    19: 'horse',
    20: 'sheep',
    21: 'cow'
}
```

Averages probability maps over a temporal window.
Parameters:
window_size: 3, 5, 7, or 9 (must be odd)
Formula:
smoothed[i] = mean(masks[i-w:i+w+1])
Use case: General-purpose smoothing, good balance between smoothness and responsiveness.
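The moving-average formula above can be sketched in NumPy (an illustrative re-implementation for the presentation, not the actual code in src/stabilization.py; boundary frames here use a truncated window, which is one reasonable convention):

```python
import numpy as np

def moving_average_smooth(masks: np.ndarray, window_size: int = 5) -> np.ndarray:
    """Average probability maps over a centered temporal window.

    masks: array of shape (T, H, W) with per-pixel mask probabilities.
    Frames near the clip boundaries average over a truncated window.
    """
    assert window_size % 2 == 1, "window_size must be odd"
    w = window_size // 2
    T = masks.shape[0]
    out = np.empty_like(masks, dtype=np.float64)
    for i in range(T):
        lo, hi = max(0, i - w), min(T, i + w + 1)
        out[i] = masks[lo:hi].mean(axis=0)  # smoothed[i] = mean(masks[i-w:i+w+1])
    return out
```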
Computes median across temporal window for each pixel.
Parameters:
window_size: 3, 5, 7, or 9 (must be odd)
Formula:
smoothed[i] = median(masks[i-w:i+w+1])
Use case: Robust to outliers, preserves sharp edges better than moving average.
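The median variant differs from the moving average only in the reduction applied per window; a minimal sketch under the same truncated-boundary assumption (again illustrative, not the repository's implementation):

```python
import numpy as np

def median_smooth(masks: np.ndarray, window_size: int = 5) -> np.ndarray:
    """Per-pixel median over a centered temporal window (truncated at clip edges)."""
    assert window_size % 2 == 1, "window_size must be odd"
    w = window_size // 2
    T = masks.shape[0]
    out = np.empty_like(masks, dtype=np.float64)
    for i in range(T):
        lo, hi = max(0, i - w), min(T, i + w + 1)
        out[i] = np.median(masks[lo:hi], axis=0)  # smoothed[i] = median(masks[i-w:i+w+1])
    return out
```

Because a single-frame spike is never the window median, isolated flicker frames are removed entirely rather than merely attenuated.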
Weighted average giving more importance to recent frames.
Parameters:
alpha: 0.1 to 0.9 (smoothing factor)
- Lower α = more smoothing
- Higher α = more responsive
Formula:
smoothed[t] = α * original[t] + (1-α) * smoothed[t-1]
Use case: Adaptive smoothing, good for varying motion speeds.
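The recurrence above translates directly into a short NumPy sketch (illustrative only; the first frame is taken as the initial state, which is an assumption, not something the README specifies):

```python
import numpy as np

def exponential_smooth(masks: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """smoothed[t] = alpha * original[t] + (1 - alpha) * smoothed[t-1]."""
    assert 0.0 < alpha < 1.0, "alpha must be in (0, 1)"
    out = np.empty_like(masks, dtype=np.float64)
    out[0] = masks[0]  # assumed initialization: first frame passes through unchanged
    for t in range(1, masks.shape[0]):
        out[t] = alpha * masks[t] + (1.0 - alpha) * out[t - 1]
    return out
```

Unlike the windowed filters, this needs only the previous smoothed frame, so it runs in constant memory and is suitable for streaming.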
Measures overlap between consecutive frames:
IoU = |A ∩ B| / |A ∪ B|
Higher IoU = more temporal consistency
Instability = 1 - IoU
Higher instability = more flickering
- Mean IoU: Average consistency across all frame transitions
- IoU Standard Deviation: Variability in consistency
- Instability Reduction: Percentage decrease in flickering
- Min/Max IoU: Range of consistency values
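The consecutive-frame IoU and instability defined above can be sketched as follows (an illustrative re-implementation, not src/metrics.py; the convention that two empty masks count as fully consistent is an assumption):

```python
import numpy as np

def temporal_iou(masks: np.ndarray) -> np.ndarray:
    """IoU between each pair of consecutive binary masks; input (T, H, W), output (T-1,)."""
    a, b = masks[:-1].astype(bool), masks[1:].astype(bool)
    inter = np.logical_and(a, b).sum(axis=(1, 2))
    union = np.logical_or(a, b).sum(axis=(1, 2))
    # Empty-vs-empty pairs (union == 0) are treated as perfectly consistent.
    return np.where(union > 0, inter / np.maximum(union, 1), 1.0)

def instability(masks: np.ndarray) -> np.ndarray:
    """Instability = 1 - IoU; higher values indicate more flickering."""
    return 1.0 - temporal_iou(masks)
```

The reported mean/std/min/max statistics are then just aggregates of this per-transition array, computed before and after stabilization.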
A clean, modern web interface is available in the frontend/ directory.
1. Start the API server:

   ```bash
   uvicorn src.main:app --host 0.0.0.0 --port 8000
   ```

2. Open the frontend:
   - Simply open frontend/index.html in a web browser, or
   - Serve it with a simple HTTP server:

     ```bash
     cd frontend
     python -m http.server 8080
     ```

     then navigate to http://localhost:8080/index.html
The frontend provides:
- Drag-and-drop video upload with format validation
- Real-time processing status with progress tracking
- Interactive frame viewer with navigation controls
- Metrics visualization showing IoU improvements
- Configuration options for object classes and stabilization methods
- Responsive design that works on all screen sizes
See frontend/README.md for detailed documentation.
For an alternative GitHub Spark interface, see the prompt in spark_frontend/SPARK_PROMPT.md.
- Upload a video
- Segment targeting "person" class
- Apply moving average with window_size=5
- Visualize results:
- Frame-by-frame comparison
- IoU improvement chart
- Quantitative metrics
Typical improvements with window_size=5:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Mean IoU | 0.823 | 0.912 | +10.8% |
| IoU Std | 0.052 | 0.023 | -55.8% |
| Instability | 0.177 | 0.088 | -50.3% |
```bash
# Add tests in the tests/ directory
pytest tests/
```

- segmentation.py: VideoSegmenter class using DeepLabv3
- stabilization.py: MaskStabilizer with temporal filtering methods
- metrics.py: Metric calculation functions
- utils.py: Helper functions for video/image processing
- main.py: FastAPI application with REST endpoints
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is for educational purposes (Homework Assignment 5).
- DeepLabv3 model from torchvision
- FastAPI framework
- OpenCV for video processing
- DeepLabv3: Rethinking Atrous Convolution for Semantic Image Segmentation
- Temporal consistency in video segmentation
- Moving average and median filters for temporal smoothing
- Video Selection: Start with short videos (5-10 seconds) for faster processing
- Class Selection: Choose "person" for best results with human subjects
- Window Size: Start with 5, increase for more smoothing
- Alpha Value: Try 0.3 for balanced exponential smoothing
Issue: CUDA out of memory

```python
# Solution: use CPU mode or reduce batch size
segmenter = VideoSegmenter(device='cpu')
```

Issue: Video won't upload
- Check file format (.mp4, .avi, .mov supported)
- Ensure file size < 100MB
- Verify video codec compatibility
Issue: Segmentation is slow
- Use GPU if available
- Reduce video resolution
- Process fewer frames
For issues and questions, please open an issue on GitHub.
Status: ✅ Ready for deployment and testing
Version: 1.0.0
Last Updated: December 2025