Object detection, tracking, and pairwise relations analysis using YOLOv8 and Ultralytics.
Contact: brandongarate177@gmail.com
Demo: `test4.mp4` (input) → `annotated.mp4` (output with overlays)
```bash
# Clone the repo
git clone <repo-url>
cd YOLO-Object-Tracking

# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt
```

```bash
# Run the full pipeline on a video file
python main.py --source input/sample.mp4

# Choose a robot policy
python main.py --source input/sample.mp4 --policy follow_person
python main.py --source input/sample.mp4 --policy protect_object
python main.py --source input/sample.mp4 --policy center_between

# With optional flags
python main.py --source input/sample.mp4 --model yolov8n.pt --conf 0.3 --output-dir output

# Re-run without re-tracking (uses cached tracks from a previous run)
python main.py --source input/sample.mp4 --load-tracks output/tracks.json --policy protect_object
```

All outputs are saved to the `output/` directory:
| File | Description |
|---|---|
| `relations.csv` | Per-frame pairwise object relations |
| `actions.csv` | Per-frame robot policy actions |
| `distance_plot.png` | Distance over time for tracked object pairs |
| `annotated.mp4` | Video with detection overlays and policy actions |
| `tracks.json` | Cached tracking data (skips re-tracking on re-runs) |
```
main.py                      Thin entry point
yolo_tracking/
    cli.py                   CLI argument parsing & pipeline orchestration
    config.py                PipelineConfig dataclass (all thresholds)
    models.py                Detection dataclass & type aliases
    tracker.py               YOLO detection & tracking (BoT-SORT)
    relations.py             Pairwise relations: distance, side, trend
    visualizer.py            Annotated video rendering
    policies/
        base.py              Policy abstract base class
        follow_person.py     Option 1 — Follow the Person
        protect_object.py    Option 2 — Protect an Object
        center_between.py    Option 3 — Center Between Two Objects
    utils/
        geometry.py          pixel_distance, left_or_right
```
- Detection dataclass — All modules share a typed `Detection` object instead of raw dicts, providing IDE autocomplete and catching typos at definition time.
- PipelineConfig — Every threshold and parameter lives in one dataclass, populated from CLI args. No magic numbers scattered through the code.
- Policy plugin system — An abstract `Policy` base class with a registry/factory. Adding a new policy only requires creating a new file in `policies/` and registering it (see the sketch after this list).
- Track caching — YOLO tracking is the slowest step. Tracks are saved to JSON after step 1, so relations/policy/visualization can be re-run instantly with `--load-tracks`.
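The registry/factory can be as small as a decorator over a dict. A minimal sketch, assuming a single `decide()` method per policy; the names here are illustrative, not the repo's exact `policies/base.py`:

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping CLI policy names to classes.
POLICY_REGISTRY: dict[str, type["Policy"]] = {}

def register_policy(name: str):
    """Class decorator that adds a Policy subclass to the registry."""
    def wrap(cls: type["Policy"]) -> type["Policy"]:
        POLICY_REGISTRY[name] = cls
        return cls
    return wrap

class Policy(ABC):
    @abstractmethod
    def decide(self, frame_info: list[dict], width: int, height: int) -> str:
        """Return the robot action string for one frame."""

@register_policy("follow_person")
class FollowPerson(Policy):
    def decide(self, frame_info, width, height):
        # Toy body for illustration only.
        return "SEARCH" if not frame_info else "ALIGNED + HOLD_POSITION"

def make_policy(name: str) -> Policy:
    """Factory: look up a policy by its CLI name (--policy)."""
    return POLICY_REGISTRY[name]()
```

With this shape, the orchestration code only ever calls `make_policy(args.policy)`; adding a new policy never touches it.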
Video: `input/test4.mp4`
- Objects:
- Scene:
- Why chosen:
- Difficulties:
- Distance — Euclidean pixel distance between the center points of two detected objects.
- Side — compares the x-coordinates of the two object centers to determine whether object B is to the left of, to the right of, or aligned with object A.
- Trend — compares the distance between a pair of objects at frame t vs. frame t−1:
  - `approaching` — distance is decreasing
  - `moving_away` — distance is increasing
  - `stable` — distance is roughly unchanged
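In code, these three relations reduce to a few lines of geometry. A sketch consistent with the helper names listed for `utils/geometry.py` (`pixel_distance`, `left_or_right`); the `trend` function and its `eps` noise band are assumptions for illustration:

```python
import math

def pixel_distance(center_a: tuple[float, float], center_b: tuple[float, float]) -> float:
    """Euclidean pixel distance between two object centers."""
    return math.hypot(center_b[0] - center_a[0], center_b[1] - center_a[1])

def left_or_right(center_a: tuple[float, float], center_b: tuple[float, float]) -> str:
    """Is B left of, right of, or horizontally aligned with A?"""
    if center_b[0] < center_a[0]:
        return "left"
    if center_b[0] > center_a[0]:
        return "right"
    return "aligned"

def trend(dist_t: float, dist_prev: float, eps: float = 2.0) -> str:
    """Compare pair distance at frame t vs t-1; eps is an assumed noise band."""
    if dist_t < dist_prev - eps:
        return "approaching"
    if dist_t > dist_prev + eps:
        return "moving_away"
    return "stable"
```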
Three robot policies are implemented; select one with `--policy`.
`follow_person` — the robot follows a single person in the frame (the one with the lowest track ID, i.e. the most persistent track).
| Condition | Action |
|---|---|
| No person detected | SEARCH |
| Person left of center (>10% offset) | TURN_LEFT |
| Person right of center (>10% offset) | TURN_RIGHT |
| Person centered | ALIGNED |
| Person bbox small (<20% frame height) | MOVE_FORWARD |
| Person bbox large (>40% frame height) | MOVE_BACKWARD |
| Person bbox in range | HOLD_POSITION |
Output format: `"{direction} + {distance_cmd}"` (e.g. `"TURN_LEFT + MOVE_FORWARD"`)
`protect_object` — monitors a non-person object and raises an alert if any person gets too close.
| Condition | Action |
|---|---|
| No non-person object in frame | NO_TARGET |
| Person within alert distance of object | RAISE_ALERT (object #id) |
| No person near the object | IDLE |
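The decision logic maps almost directly onto this table. A minimal sketch, assuming the same per-object dict fields used by `robot_policy` below and the 200 px default alert distance; which non-person object gets protected is an assumption here:

```python
import math

def protect_object_action(frame_info: list[dict], alert_distance: float = 200.0) -> str:
    """Sketch of the protect_object decision table above."""
    persons = [o for o in frame_info if o["class_id"] == 0]
    objects = [o for o in frame_info if o["class_id"] != 0]
    if not objects:
        return "NO_TARGET"
    target = objects[0]  # assumption: protect the first non-person object
    tx, ty = target["center"]
    for person in persons:
        px, py = person["center"]
        if math.hypot(px - tx, py - ty) < alert_distance:
            return f"RAISE_ALERT (object #{target['track_id']})"
    return "IDLE"
```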
`center_between` — the robot aligns itself at the midpoint between the two most widely separated objects.
| Condition | Action |
|---|---|
| Fewer than 2 objects | NOT_ENOUGH |
| Midpoint is left of frame center | MOVE_LEFT (between X and Y) |
| Midpoint is right of frame center | MOVE_RIGHT (between X and Y) |
| Midpoint is close to center | CENTERED (between X and Y) |
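A sketch of this decision table; the 5% centering tolerance is an assumed value, not necessarily the repo's:

```python
import math
from itertools import combinations

def center_between_action(frame_info: list[dict], image_width: int,
                          center_tol: float = 0.05) -> str:
    """Sketch of the center_between decision table above."""
    if len(frame_info) < 2:
        return "NOT_ENOUGH"

    def dist(pair):
        (ax, ay), (bx, by) = pair[0]["center"], pair[1]["center"]
        return math.hypot(bx - ax, by - ay)

    # The two most widely separated objects define the target midpoint.
    a, b = max(combinations(frame_info, 2), key=dist)
    midpoint_x = (a["center"][0] + b["center"][0]) / 2
    offset = midpoint_x - image_width / 2
    label = f"(between {a['track_id']} and {b['track_id']})"
    if abs(offset) <= center_tol * image_width:
        return f"CENTERED {label}"
    return f"MOVE_LEFT {label}" if offset < 0 else f"MOVE_RIGHT {label}"
```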
The `follow_person` logic, end to end:

```python
def robot_policy(frame_info, image_width, image_height):
    """
    frame_info: list of detected objects for the current frame
        (track_id, class_id, bbox, center).
    Returns: a string representing the robot's action for this frame.
    """
    # Keep only person detections (COCO class 0).
    persons = [obj for obj in frame_info if obj["class_id"] == 0]
    if len(persons) == 0:
        return "SEARCH"

    # Follow the person with the lowest track ID (the most persistent track).
    person = sorted(persons, key=lambda x: x["track_id"])[0]
    cx, cy = person["center"]
    x_center = image_width / 2
    x1, y1, x2, y2 = person["bbox"]
    box_height = y2 - y1

    # Horizontal control: only turn when the offset exceeds 10% of the
    # frame width, to avoid jitter from pixel-level noise.
    offset_x = cx - x_center
    if abs(offset_x) > 0.1 * image_width:
        direction = "TURN_LEFT" if offset_x < 0 else "TURN_RIGHT"
    else:
        direction = "ALIGNED"

    # Distance control: bbox height is a proxy for distance to the person.
    if box_height < 0.2 * image_height:
        distance_cmd = "MOVE_FORWARD"
    elif box_height > 0.4 * image_height:
        distance_cmd = "MOVE_BACKWARD"
    else:
        distance_cmd = "HOLD_POSITION"

    return f"{direction} + {distance_cmd}"
```

- Horizontal alignment (10% of frame width): Avoids jittery turn commands from minor pixel-level position noise. A person within 10% of center is considered aligned.
- Too far (bbox height < 20% of frame): When the person's bounding box is less than 20% of the frame height, they appear small and distant — the robot should move forward.
- Too close (bbox height > 40% of frame): When the person fills more than 40% of the frame vertically, they are very close — the robot should back up.
- Confidence threshold (0.3): Filters out low-confidence YOLO detections that add noise to the relations analysis.
- Alert distance (200px): For the protect_object policy — a person within 200 pixels of the protected object triggers an alert.
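These are exactly the kinds of values that `PipelineConfig` centralizes. A hypothetical sketch of such a dataclass; the field names are illustrative, not the repo's actual `config.py`:

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Illustrative sketch of the all-thresholds-in-one-place idea."""
    model: str = "yolov8n.pt"
    conf_threshold: float = 0.3      # YOLO detection confidence cutoff
    align_frac: float = 0.10         # horizontal alignment band (frame width)
    too_far_frac: float = 0.20       # bbox height below this => MOVE_FORWARD
    too_close_frac: float = 0.40     # bbox height above this => MOVE_BACKWARD
    alert_distance_px: float = 200.0 # protect_object alert radius
    output_dir: str = "output"
```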
`relations.csv`:

```
frame,objectA_id,objectA_class,objectB_id,objectB_class,distance,side,relation
0,1,person,7,bottle,145.2,left,stable
1,1,person,7,bottle,130.9,left,approaching
2,1,person,7,bottle,118.4,left,approaching
```

`actions.csv`:

```
frame,action
0,SEARCH
1,TURN_LEFT + MOVE_FORWARD
2,ALIGNED + HOLD_POSITION
```
See `output/distance_plot.png` — distance between each tracked object pair over time.
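For reference, a plot like this takes only a few lines starting from `relations.csv`; a sketch with pandas/matplotlib (not necessarily how the repo generates it):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sketch only: one line per tracked object pair, distance vs. frame.
df = pd.read_csv("output/relations.csv")
for (a, b), pair in df.groupby(["objectA_id", "objectB_id"]):
    plt.plot(pair["frame"], pair["distance"], label=f"{a} ↔ {b}")
plt.xlabel("frame")
plt.ylabel("distance (px)")
plt.legend()
plt.savefig("output/distance_plot.png")
```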
- Track ID consistency: YOLO occasionally fails to assign track IDs on certain frames (e.g. during fast motion or occlusion). We handle this by skipping untracked frames rather than assigning fake IDs that corrupt the relations analysis.
- Temporal relation gaps: When objects are temporarily occluded, the distance history between pairs must be preserved across frame gaps to correctly compute approaching/moving_away trends (see the sketch after this list).
- Output file size: The default mp4v codec produced very large annotated videos (~334MB). Switching to H.264 (avc1) reduced this significantly.
- Architecture: Restructuring from a flat script layout to a proper Python package required careful dependency management to avoid circular imports while keeping `python main.py` working.
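One way to preserve pair history across occlusion gaps is to key the last known distance by track-ID pair rather than by frame index. A minimal sketch; the `max_gap` and `eps` values are assumptions, not the repo's:

```python
from collections import defaultdict

# Last known (frame, distance) per object pair, keyed by track IDs so a
# temporary occlusion does not reset the history.
last_seen: dict[tuple[int, int], tuple[int, float]] = defaultdict(lambda: (-1, 0.0))

def trend_with_gaps(pair: tuple[int, int], frame: int, dist: float,
                    eps: float = 2.0, max_gap: int = 30) -> str:
    """Compare against the last frame this pair was seen, even if that
    was several frames ago; beyond max_gap the history is considered stale."""
    prev_frame, prev_dist = last_seen[pair]
    last_seen[pair] = (frame, dist)
    if prev_frame < 0 or frame - prev_frame > max_gap:
        return "stable"  # no usable history yet
    if dist < prev_dist - eps:
        return "approaching"
    if dist > prev_dist + eps:
        return "moving_away"
    return "stable"
```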
- How YOLOv8 and the Ultralytics tracking API (BoT-SORT) work together for multi-object tracking with persistent IDs across frames.
- Computing pairwise spatial and temporal relations between tracked objects (distance, left/right, approaching/moving away).
- Building a simple rule-based robot policy that translates visual observations into actions.
- Designing a plugin system with abstract base classes and a registry pattern for extensible robot policies.
- Video processing with OpenCV — reading, annotating, and writing video frames with overlays.