Object Detection Pipeline with Gemini & DINO Models

This project implements a complete object detection workflow to identify short roadside bollards using two AI models:

- Google's Gemini Vision API
- Grounding DINO (Hugging Face)

Features

Gemini-based Detection (`gemini_api.py`)

Sends images to Gemini 2.0 Flash via the API.
Extracts bounding boxes for bollards.
Saves:
- JSON predictions per image
- Visualizations with red boxes around detected bollards
Supports hyperparameter tuning (temperature, top_p, top_k).
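
The bounding-box extraction step could look roughly like the sketch below. It assumes the model replies with a bare JSON list whose items carry a `box_2d` field as `[ymin, xmin, ymax, xmax]` normalized to 0-1000 (the convention Google documents for Gemini bounding-box output) plus a `label`. The function name `to_pixel_boxes` is hypothetical and is not taken from `gemini_api.py`.

```python
import json


def to_pixel_boxes(response_text, width, height):
    """Convert normalized boxes to pixel (xmin, ymin, xmax, ymax) tuples.

    Assumes response_text is a JSON list of objects with a "box_2d" field
    in [ymin, xmin, ymax, xmax] order, normalized to the range 0-1000.
    """
    detections = json.loads(response_text)
    boxes = []
    for det in detections:
        ymin, xmin, ymax, xmax = det["box_2d"]
        boxes.append({
            "label": det.get("label", "bollard"),
            "box": (
                round(xmin * width / 1000),
                round(ymin * height / 1000),
                round(xmax * width / 1000),
                round(ymax * height / 1000),
            ),
        })
    return boxes


# Example reply for a 1000x500 image (made-up values for illustration).
sample = '[{"box_2d": [100, 200, 500, 400], "label": "bollard"}]'
print(to_pixel_boxes(sample, width=1000, height=500))
```

The pixel tuples can then be drawn as red rectangles or written to the per-image JSON file. If the reply is wrapped in a Markdown code fence, it would need to be stripped before `json.loads`.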

Grounding DINO Detection (`dino.py`)

Loads the Grounding DINO model from Hugging Face locally.
Detects bollard-like objects using text prompts.
Saves:
- Annotated images with class names and bounding boxes
- Corresponding JSON files

Frame Extraction (`extract_frames_from_mcap.py`)

Converts .mcap video files into image frames.
Helps generate a dataset from video input.

Setup Instructions

1. Clone the Repository

2. Install Python Requirements

pip install -r requirements.txt

3a. Set up Google Gemini API

Get your API key from Google AI Studio for developers.

Uncomment the client line in the Gemini script and insert your key:

client = genai.Client(api_key='YOUR_KEY_HERE')
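
As an alternative to pasting the key into the source file, it can be read from the environment so that it never ends up in version control. This is a minimal sketch, not what the script currently does. The helper name `load_api_key` and the variable name `GEMINI_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in env_var, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage with the client line shown above (requires the google-genai package):
# client = genai.Client(api_key=load_api_key())
```

Export the variable in your shell before running the script, for example `export GEMINI_API_KEY=...` on Linux or macOS.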

3b. Set up DINO

Clone this repository: https://github.com/IDEA-Research/GroundingDINO

4. Run Inference

Gemini

python gemini_api.py

DINO

python dino.py

Notes

Make sure the image filenames in chosen_dataset/ end with .png or .jpg.
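
A quick way to check the dataset folder before running inference is sketched below. The helper `split_images` is hypothetical. It matches the two extensions exactly as written above, since the source does not say whether the scripts accept upper-case or `.jpeg` variants.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".png", ".jpg"}


def split_images(folder="chosen_dataset"):
    """Return (usable, skipped) file names found directly inside folder."""
    usable, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        if path.suffix in ALLOWED_SUFFIXES:
            usable.append(path.name)
        else:
            skipped.append(path.name)
    return usable, skipped


if __name__ == "__main__" and Path("chosen_dataset").is_dir():
    usable, skipped = split_images()
    print(f"{len(usable)} usable images, {len(skipped)} skipped: {skipped}")
```

Files reported as skipped can be renamed or converted before running the detection scripts.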

Tune DINO detection thresholds (BOX_THRESHOLD, TEXT_THRESHOLD) in dino.py.
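
To see why the box threshold matters, the sketch below applies a simplified version of the filter to a few made-up detections. In Grounding DINO, BOX_THRESHOLD drops boxes whose best text-match score is too low, while TEXT_THRESHOLD controls which prompt words are attached to a kept box as its label. Only the first is modelled here. The scores and threshold values are illustrative and are not taken from `dino.py`.

```python
def filter_by_box_threshold(detections, box_threshold):
    """Keep detections whose score is strictly above box_threshold."""
    return [d for d in detections if d["score"] > box_threshold]


# Made-up detections for illustration only.
detections = [
    {"label": "bollard", "score": 0.62},
    {"label": "bollard", "score": 0.41},
    {"label": "bollard", "score": 0.28},
]

for threshold in (0.25, 0.35, 0.50):
    kept = filter_by_box_threshold(detections, threshold)
    print(f"BOX_THRESHOLD={threshold:.2f} keeps {len(kept)} of {len(detections)} boxes")
```

Raising the threshold trades missed bollards for fewer false positives, so it is worth checking a handful of annotated images after each change.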

Tune Gemini hyperparameters via the configs list in gemini_api.py.
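
The exact shape of the configs list is not shown here, so the following is only one plausible layout: a list of dictionaries, one per combination of temperature, top_p and top_k. The candidate values are placeholders.

```python
from itertools import product

# Placeholder candidate values; adjust to the ranges you want to explore.
temperatures = [0.0, 0.5]
top_ps = [0.8, 0.95]
top_ks = [20, 40]

# One dictionary per combination (2 x 2 x 2 = 8 runs).
configs = [
    {"temperature": t, "top_p": p, "top_k": k}
    for t, p, k in product(temperatures, top_ps, top_ks)
]

for cfg in configs:
    print(cfg)
```

With the google-genai package, each dictionary could be passed on as `types.GenerateContentConfig(**cfg)`. Keeping the grid small is advisable because every entry means one API call per image.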

Extra models

Trial projects exploring other models (SLIME, YOLOE, OWLv2, OWL-ViT, YOLO-World and YOLO-World-V2) are stored in the home folder of the Lenovo desktop with serial number PF-5M3JNS. To view one, open its folder as a project in Visual Studio Code.

Repository: vilota-dev/Object_Detection_Models