YOLO Object Tracking & Relations Analysis

Object detection, tracking, and pairwise relations analysis using YOLOv8 and Ultralytics.

Contact: brandongarate177@gmail.com

Original video:

test4.mp4

After:

annotated.mp4

Setup

# Clone the repo
git clone <repo-url>
cd YOLO-Object-Tracking

# Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

Usage

# Run the full pipeline on a video file
python main.py --source input/sample.mp4

# Choose a robot policy
python main.py --source input/sample.mp4 --policy follow_person
python main.py --source input/sample.mp4 --policy protect_object
python main.py --source input/sample.mp4 --policy center_between

# With optional flags
python main.py --source input/sample.mp4 --model yolov8n.pt --conf 0.3 --output-dir output

# Re-run without re-tracking (uses cached tracks from a previous run)
python main.py --source input/sample.mp4 --load-tracks output/tracks.json --policy protect_object

Outputs

All outputs are saved to the output/ directory:

File                 Description
relations.csv        Per-frame pairwise object relations
actions.csv          Per-frame robot policy actions
distance_plot.png    Distance over time plot for tracked object pairs
annotated.mp4        Video with detection overlays and policy actions
tracks.json          Cached tracking data (skip re-tracking on re-runs)

Architecture

main.py                              Thin entry point
yolo_tracking/
  cli.py                             CLI argument parsing & pipeline orchestration
  config.py                          PipelineConfig dataclass (all thresholds)
  models.py                          Detection dataclass & type aliases
  tracker.py                         YOLO detection & tracking (BoT-SORT)
  relations.py                       Pairwise relations: distance, side, trend
  visualizer.py                      Annotated video rendering
  policies/
    base.py                          Policy abstract base class
    follow_person.py                 Option 1 — Follow the Person
    protect_object.py                Option 2 — Protect an Object
    center_between.py                Option 3 — Center Between Two Objects
  utils/
    geometry.py                      pixel_distance, left_or_right

Design Decisions

  • Detection dataclass — All modules share a typed Detection object instead of raw dicts, providing IDE autocomplete and letting static tooling catch field-name typos.
  • PipelineConfig — Every threshold and parameter lives in one dataclass, populated from CLI args. No magic numbers scattered in code.
  • Policy plugin system — An abstract Policy base class with a registry/factory; a minimal sketch follows this list. Adding a new policy only requires creating a new file in policies/ and registering it.
  • Track caching — YOLO tracking is the slowest step. Tracks are saved to JSON after step 1 so you can re-run relations/policy/visualization instantly with --load-tracks.
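
A minimal sketch of the plugin pattern, with illustrative names (the real base class and registry live in yolo_tracking/policies/base.py and may be organized differently):

from abc import ABC, abstractmethod

_POLICY_REGISTRY = {}

def register_policy(name):
    """Class decorator: map a --policy name to a Policy subclass."""
    def decorator(cls):
        _POLICY_REGISTRY[name] = cls
        return cls
    return decorator

class Policy(ABC):
    @abstractmethod
    def decide(self, frame_info, image_width, image_height):
        """Return an action string for the current frame."""

@register_policy("follow_person")
class FollowPerson(Policy):
    def decide(self, frame_info, image_width, image_height):
        return "SEARCH" if not frame_info else "ALIGNED + HOLD_POSITION"  # placeholder logic

def make_policy(name):
    """Factory used by the CLI to resolve --policy <name>."""
    return _POLICY_REGISTRY[name]()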

Video Description

Video: input/test4.mp4

  • Objects:
  • Scene:
  • Why chosen:
  • Difficulties:

Object Relations

Distance Calculation

Euclidean pixel distance between the center points of two detected objects.
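
A minimal sketch of this computation, assuming (cx, cy) center tuples (the project's version lives in yolo_tracking/utils/geometry.py and may differ in detail):

import math

def pixel_distance(center_a, center_b):
    """Euclidean distance in pixels between two (cx, cy) center points."""
    return math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])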

Left / Right Relationship

Compares the x-coordinates of two object centers to determine if object B is to the left, right, or aligned with object A.
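
A corresponding sketch (the tolerance parameter for the "aligned" case is an assumption; the real helper is in yolo_tracking/utils/geometry.py):

def left_or_right(center_a, center_b, tolerance=0.0):
    """Position of B relative to A along the x-axis: left, right, or aligned."""
    dx = center_b[0] - center_a[0]
    if dx < -tolerance:
        return "left"
    if dx > tolerance:
        return "right"
    return "aligned"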

Approaching / Moving Away

Compares the distance between a pair of objects at frame t vs frame t-1 (a sketch follows the list):

  • approaching — distance is decreasing
  • moving_away — distance is increasing
  • stable — distance is roughly unchanged
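
A minimal sketch of this classification; the epsilon noise band is an assumed value, not the project's actual threshold:

def distance_trend(dist_now, dist_prev, epsilon=2.0):
    """Classify the change in pairwise distance between consecutive frames.

    epsilon (pixels) is an assumed noise band: changes smaller than
    this are treated as "stable".
    """
    delta = dist_now - dist_prev
    if delta < -epsilon:
        return "approaching"
    if delta > epsilon:
        return "moving_away"
    return "stable"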

Robot Policies (Tier 2 — Extra Credit)

Three robot policies are implemented. Select one with --policy.

Option 1: Follow the Person (--policy follow_person)

The robot follows a single person in the frame, chosen deterministically as the lowest track ID (see the policy function below).

Condition                                  Action
No person detected                         SEARCH
Person left of center (>10% offset)        TURN_LEFT
Person right of center (>10% offset)       TURN_RIGHT
Person centered                            ALIGNED
Person bbox small (<20% frame height)      MOVE_FORWARD
Person bbox large (>40% frame height)      MOVE_BACKWARD
Person bbox in range                       HOLD_POSITION

Output format: "{direction} + {distance_cmd}" (e.g. "TURN_LEFT + MOVE_FORWARD")

Option 2: Protect an Object (--policy protect_object)

Monitors a non-person object and raises an alert if any person gets too close (a sketch follows the table).

Condition                                  Action
No non-person object in frame              NO_TARGET
Person within alert distance of object     RAISE_ALERT (object #id)
No person near the object                  IDLE
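
A hypothetical sketch of these rules; how the protected object is selected is an assumption, and the real implementation lives in yolo_tracking/policies/protect_object.py:

import math

def protect_object_policy(frame_info, alert_distance=200.0):
    """Alert when any person comes within alert_distance of the object.

    Assumes each detection is a dict with class_id, track_id, and
    center (cx, cy); COCO class 0 is "person".
    """
    persons = [d for d in frame_info if d["class_id"] == 0]
    objects = [d for d in frame_info if d["class_id"] != 0]
    if not objects:
        return "NO_TARGET"
    target = objects[0]  # assumed selection of the protected object
    for person in persons:
        if math.dist(person["center"], target["center"]) <= alert_distance:
            return f"RAISE_ALERT (object #{target['track_id']})"
    return "IDLE"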

Option 3: Center Between Two Objects (--policy center_between)

The robot aligns itself with the midpoint between the two objects that are farthest apart (a sketch follows the table).

Condition                                  Action
Fewer than 2 objects                       NOT_ENOUGH
Midpoint is left of frame center           MOVE_LEFT (between X and Y)
Midpoint is right of frame center          MOVE_RIGHT (between X and Y)
Midpoint is close to center                CENTERED (between X and Y)
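
A hypothetical sketch of these rules; the center_tolerance threshold is an assumed value, and the real implementation lives in yolo_tracking/policies/center_between.py:

import itertools
import math

def center_between_policy(frame_info, image_width, center_tolerance=0.05):
    """Steer toward the midpoint of the two farthest-apart detections."""
    if len(frame_info) < 2:
        return "NOT_ENOUGH"
    # Find the pair of detections whose centers are farthest apart
    a, b = max(
        itertools.combinations(frame_info, 2),
        key=lambda pair: math.dist(pair[0]["center"], pair[1]["center"]),
    )
    midpoint_x = (a["center"][0] + b["center"][0]) / 2
    offset = midpoint_x - image_width / 2
    label = f"(between {a['track_id']} and {b['track_id']})"
    if offset < -center_tolerance * image_width:
        return f"MOVE_LEFT {label}"
    if offset > center_tolerance * image_width:
        return f"MOVE_RIGHT {label}"
    return f"CENTERED {label}"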

Policy Function

def robot_policy(frame_info, image_width, image_height):
    """
    frame_info: list of detected objects for the current frame, each with
        track_id, class_id, bbox (x1, y1, x2, y2), and center (cx, cy).
    Returns: a string representing the robot's action for this frame.
    """
    # COCO class 0 is "person"
    persons = [obj for obj in frame_info if obj["class_id"] == 0]
    if not persons:
        return "SEARCH"

    # Follow one person deterministically: the lowest (earliest) track ID
    person = min(persons, key=lambda x: x["track_id"])
    cx, _ = person["center"]
    x_center = image_width / 2

    x1, y1, x2, y2 = person["bbox"]
    box_height = y2 - y1

    # Horizontal control: turn toward the person if they are more than
    # 10% of the frame width off center
    offset_x = cx - x_center
    if abs(offset_x) > 0.1 * image_width:
        direction = "TURN_LEFT" if offset_x < 0 else "TURN_RIGHT"
    else:
        direction = "ALIGNED"

    # Distance control: bbox height serves as a proxy for proximity
    if box_height < 0.2 * image_height:
        distance_cmd = "MOVE_FORWARD"
    elif box_height > 0.4 * image_height:
        distance_cmd = "MOVE_BACKWARD"
    else:
        distance_cmd = "HOLD_POSITION"

    return f"{direction} + {distance_cmd}"

Threshold Choices

  • Horizontal alignment (10% of frame width): Avoids jittery turn commands from minor pixel-level position noise. A person within 10% of center is considered aligned.
  • Too far (bbox height < 20% of frame): When the person's bounding box is less than 20% of the frame height, they appear small and distant — the robot should move forward.
  • Too close (bbox height > 40% of frame): When the person fills more than 40% of the frame vertically, they are very close — the robot should back up.
  • Confidence threshold (0.3): Filters out low-confidence YOLO detections that add noise to the relations analysis.
  • Alert distance (200px): For the protect_object policy — a person within 200 pixels of the protected object triggers an alert.

Example Outputs

Relations CSV (output/relations.csv)

frame,objectA_id,objectA_class,objectB_id,objectB_class,distance,side,relation
0,1,person,7,bottle,145.2,left,stable
1,1,person,7,bottle,130.9,left,approaching
2,1,person,7,bottle,118.4,left,approaching

Actions CSV (output/actions.csv)

frame,action
0,SEARCH
1,TURN_LEFT + MOVE_FORWARD
2,ALIGNED + HOLD_POSITION

Distance Plot

See output/distance_plot.png — shows distance between each tracked object pair over time.
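
A sketch of how such a plot could be produced from relations.csv (column names follow the example CSV below; the project's plotting code may differ):

import csv
from collections import defaultdict

import matplotlib.pyplot as plt

# Collect a distance-over-time series per object pair from relations.csv
series = defaultdict(list)  # (objectA_id, objectB_id) -> [(frame, distance), ...]
with open("output/relations.csv", newline="") as f:
    for row in csv.DictReader(f):
        key = (row["objectA_id"], row["objectB_id"])
        series[key].append((int(row["frame"]), float(row["distance"])))

for (a, b), points in series.items():
    frames, distances = zip(*sorted(points))
    plt.plot(frames, distances, label=f"{a} vs {b}")

plt.xlabel("frame")
plt.ylabel("distance (px)")
plt.legend()
plt.savefig("output/distance_plot.png")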


Challenges

  • Track ID consistency: YOLO occasionally fails to assign track IDs on certain frames (e.g. during fast motion or occlusion). We handle this by skipping untracked frames rather than assigning fake IDs that corrupt the relations analysis.
  • Temporal relation gaps: When objects are temporarily occluded, the distance history between pairs must be preserved across frame gaps to correctly compute approaching/moving_away trends.
  • Output file size: The default mp4v codec produced very large annotated videos (~334MB). Switching to H.264 (avc1) reduced file size significantly (see the codec sketch after this list).
  • Architecture: Restructuring from a flat script layout to a proper Python package required careful dependency management to avoid circular imports while keeping python main.py working.
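
For the codec point above, the change amounts to the FourCC passed to OpenCV's VideoWriter (the dimensions are illustrative, and avc1 availability depends on the local OpenCV/FFmpeg build):

import cv2

fps, width, height = 30, 1280, 720  # illustrative values

# mp4v (MPEG-4 Part 2) produced very large files; avc1 (H.264) is much smaller
fourcc = cv2.VideoWriter_fourcc(*"avc1")
writer = cv2.VideoWriter("output/annotated.mp4", fourcc, fps, (width, height))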

What I Learned

  • How YOLOv8 and the Ultralytics tracking API (BoT-SORT) work together for multi-object tracking with persistent IDs across frames.
  • Computing pairwise spatial and temporal relations between tracked objects (distance, left/right, approaching/moving away).
  • Building a simple rule-based robot policy that translates visual observations into actions.
  • Designing a plugin system with abstract base classes and a registry pattern for extensible robot policies.
  • Video processing with OpenCV — reading, annotating, and writing video frames with overlays.
