This project watches a YouTube live stream (webcam) and detects humans in real time, drawing bounding boxes around them and saving the processed frames as images. It also provides a web interface to view the processed stream live.
I'm just doing this for fun. This is not for any other use.
- Clone this repository
- Install the required packages:
  ```bash
  pip install -r requirements.txt
  ```

- Make sure you have FFmpeg installed on your system
- Set up your environment configuration:
  ```bash
  cp .env.example .env
  ```

- Edit the `.env` file with your preferred settings:

  ```
  YOUTUBE_URL=https://www.youtube.com/watch?v=your_youtube_video_id
  ```
- Make sure you have Docker and Docker Compose installed on your system
- Set up your environment configuration:
  ```bash
  cp .env.example .env
  ```

- Edit the `.env` file with your YouTube URL
- Build and run the container:

  ```bash
  docker compose up --build
  ```

Run the script:

```bash
python live_human_detector.py
```

The script will automatically:
- Connect to the YouTube stream specified in your .env file
- Process the video frames
- Detect and mark humans with bounding boxes
- Save the processed frames to the `output` folder
- Start a web server at http://localhost:5005
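The connect step above could be sketched like this, assuming the script uses yt-dlp's `-g` flag (which prints a direct media URL) and then hands that URL to OpenCV; the helper names here are hypothetical and the real script's internals may differ:

```python
import subprocess

def stream_url_command(youtube_url):
    # yt-dlp -g prints the direct media URL for the stream without downloading
    return ["yt-dlp", "-g", "-f", "best", youtube_url]

def resolve_stream_url(youtube_url):
    # Hypothetical helper: run yt-dlp and capture the URL it prints
    result = subprocess.run(stream_url_command(youtube_url),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# The resolved URL could then be opened with OpenCV:
#   cap = cv2.VideoCapture(resolve_stream_url(url))
#   ok, frame = cap.read()
```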
Start the container:
```bash
docker compose up
```

To stop the container:

```bash
docker compose down
```

Once the application is running, open a web browser and navigate to:
http://localhost:5005
You'll see the live stream with bounding boxes drawn around detected humans.
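Browser-viewable live streams like this are typically served as MJPEG over a `multipart/x-mixed-replace` response. A minimal sketch of that framing (the boundary name and generator shape are assumptions, not taken from the actual server code):

```python
def mjpeg_chunk(jpeg_bytes):
    # One part of a multipart/x-mixed-replace response; the browser
    # replaces the displayed image each time a new part arrives.
    return (b"--frame\r\n"
            b"Content-Type: image/jpeg\r\n\r\n" + jpeg_bytes + b"\r\n")

def mjpeg_stream(jpeg_frames):
    # Generator suitable for a Flask Response with
    # mimetype="multipart/x-mixed-replace; boundary=frame"
    for jpeg in jpeg_frames:
        yield mjpeg_chunk(jpeg)
```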
The following environment variables can be set in the .env file:
| Variable | Description | Default |
|---|---|---|
| YOUTUBE_URL | URL of the YouTube live stream to process | https://www.youtube.com/watch?v=your_youtube_video_id |
| CONFIDENCE_THRESHOLD | Minimum confidence level (0-1) for human detection | 0.5 |
| SAVE_OUTPUT | Whether to save the processed video to disk | true |
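Reading these variables with the defaults from the table might look like the sketch below; the real script may use python-dotenv rather than raw environment lookups, and the key names in the returned dict are an assumption:

```python
import os

# Defaults taken from the configuration table above
DEFAULTS = {
    "YOUTUBE_URL": "https://www.youtube.com/watch?v=your_youtube_video_id",
    "CONFIDENCE_THRESHOLD": "0.5",
    "SAVE_OUTPUT": "true",
}

def load_config(env=None):
    env = os.environ if env is None else env
    get = lambda key: env.get(key, DEFAULTS[key])
    return {
        "youtube_url": get("YOUTUBE_URL"),
        "confidence_threshold": float(get("CONFIDENCE_THRESHOLD")),
        "save_output": get("SAVE_OUTPUT").lower() == "true",
    }
```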
The processed frames are saved to timestamped directories in the output folder:
```
output/session_20230815_143027/
├── frame_000001.jpg
├── frame_000002.jpg
├── frame_000003.jpg
└── ...
```
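The naming scheme above could be produced by helpers like these (the timestamp format is inferred from the example layout; the function names are hypothetical):

```python
import os
from datetime import datetime

def new_session_dir(root="output"):
    # e.g. output/session_20230815_143027
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return os.path.join(root, f"session_{stamp}")

def frame_path(session_dir, index):
    # Zero-padded to six digits so frames sort lexicographically
    return os.path.join(session_dir, f"frame_{index:06d}.jpg")
```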
Frames are saved in the following cases:
- When humans are detected in the frame
- Periodically (every 5 seconds) to maintain context
Each saved frame includes:
- Bounding boxes around detected humans
- The current FPS calculation
- A timestamp showing when the frame was captured
- Python 3.7+
- OpenCV
- PyTorch
- yt-dlp (YouTube downloader)
- FFmpeg (for video processing)
- Flask (for web interface)
- Docker
- Docker Compose
This application uses a pre-trained Faster R-CNN model from PyTorch's torchvision library for human detection. The model is trained on the COCO dataset, which makes it robust at detecting people in varied scenes.
This project is licensed under the MIT License - see the LICENSE file for details.
IMPORTANT: This software is provided for educational and research purposes only.
- This project is designed for analyzing publicly available streams only.
- Always respect privacy and copyright laws in your jurisdiction.
- Do not use this software for surveillance or to violate anyone's privacy.
- The authors and contributors are not responsible for any misuse of this software.

