This project implements a complete object detection workflow to identify short roadside bollards using two AI models:
- Google's Gemini Vision API
- Grounding DINO (Hugging Face)
gemini_api.py (Gemini):
- Sends images to Gemini 2.0 Flash via the API.
- Extracts bounding boxes for bollards.
- Saves:
  - JSON predictions per image
  - Visualizations with red boxes around detected bollards
- Supports hyperparameter tuning (temperature, top_p, top_k).
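Gemini's object-detection output is commonly requested as JSON with each box given as [ymin, xmin, ymax, xmax] on a 0-1000 normalized grid. The sketch below assumes that convention and a hypothetical response shape (a JSON list of objects with "box_2d" and "label" keys); the prompt and keys actually used in gemini_api.py may differ. It converts each box to pixel coordinates in (x0, y0, x1, y1) order, ready for drawing.

```python
import json


def parse_gemini_boxes(response_text, image_width, image_height):
    """Convert a Gemini JSON detection response into pixel-space boxes.

    Assumes (hypothetically) a JSON list such as
    [{"box_2d": [ymin, xmin, ymax, xmax], "label": "bollard"}, ...]
    with coordinates normalized to 0-1000.
    """
    # Drop markdown fence lines (they start with backticks) that the model
    # may wrap around the JSON payload.
    lines = [ln for ln in response_text.splitlines() if not ln.strip().startswith("`")]
    items = json.loads("\n".join(lines))

    detections = []
    for item in items:
        ymin, xmin, ymax, xmax = item["box_2d"]
        detections.append({
            "label": item.get("label", "bollard"),
            # Scale from the 0-1000 grid to pixels; output order is (x0, y0, x1, y1).
            "box": (
                round(xmin / 1000 * image_width),
                round(ymin / 1000 * image_height),
                round(xmax / 1000 * image_width),
                round(ymax / 1000 * image_height),
            ),
        })
    return detections
```

With Pillow, each returned box can be drawn in red with ImageDraw.Draw(image).rectangle(det["box"], outline="red", width=3).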
dino.py (Grounding DINO):
- Loads the Hugging Face GroundingDINO model locally.
- Detects bollards and similar objects from natural-language text prompts.
- Saves:
  - Annotated images with class names and bounding boxes
  - Corresponding JSON files
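Writing one JSON file per annotated image keeps predictions easy to diff and reload. The sketch below is a minimal illustration; the helper name save_detections_json and the record layout (image name plus a list of label/score/box entries) are hypothetical and not necessarily what dino.py writes.

```python
import json
from pathlib import Path


def save_detections_json(image_path, detections, out_dir):
    """Write one JSON file per image, named after the image stem (hypothetical layout).

    `detections` is a list of dicts with "label", "score" and "box" keys.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "image": Path(image_path).name,
        "detections": [
            {
                "label": d["label"],
                # Cast to plain floats so tensor/NumPy scalars serialize cleanly.
                "score": round(float(d["score"]), 4),
                "box": [float(v) for v in d["box"]],
            }
            for d in detections
        ],
    }
    out_path = out_dir / f"{Path(image_path).stem}.json"
    out_path.write_text(json.dumps(record, indent=2))
    return out_path
```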
Frame extraction:
- Converts .mcap video recordings into image frames.
- Helps generate a dataset from video input.
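Consecutive video frames are often near-duplicates, so a dataset built from a recording is usually subsampled. The sketch below keeps at most one frame per time interval; it assumes a hypothetical input of (log_time_ns, encoded_image_bytes) pairs and a hypothetical frame_<timestamp> naming scheme, and it is not the project's actual converter.

```python
from pathlib import Path


def save_sampled_frames(frames, out_dir, min_interval_s=1.0, ext=".jpg"):
    """Write at most one frame per `min_interval_s` seconds.

    `frames` is an iterable of (log_time_ns, encoded_image_bytes) pairs, e.g.
    compressed-image messages read from an .mcap file (hypothetical interface).
    Returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    min_interval_ns = int(min_interval_s * 1e9)
    last_kept_ns = None
    written = []
    for log_time_ns, data in frames:
        # Skip frames that are too close in time to the last saved one.
        if last_kept_ns is not None and log_time_ns - last_kept_ns < min_interval_ns:
            continue
        path = out_dir / f"frame_{log_time_ns}{ext}"
        path.write_bytes(data)  # bytes are written as-is (already JPEG/PNG encoded)
        written.append(path)
        last_kept_ns = log_time_ns
    return written
```

If the recordings are ROS 2 bags, such pairs could come from the mcap library's make_reader with the mcap_ros2 DecoderFactory, using each message's log_time and the decoded CompressedImage's data field; the image topic name depends on the recording.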
Install the dependencies:
pip install -r requirements.txt
For Gemini, get an API key from Google AI Studio.
Then uncomment the following line in gemini_api.py and insert your key:
client = genai.Client(api_key='YOUR_KEY_HERE')
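Hardcoding the key risks committing it to version control. A minimal alternative, assuming you export the key in an environment variable (the name GEMINI_API_KEY and the helper get_gemini_api_key are illustrative choices, not part of the project), is sketched below.

```python
import os


def get_gemini_api_key(env_var="GEMINI_API_KEY"):
    """Read the Gemini API key from an environment variable instead of hardcoding it.

    The variable name is an assumption; use whatever name your shell exports.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Gemini API key.")
    return key
```

The client line would then become client = genai.Client(api_key=get_gemini_api_key()).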
For Grounding DINO, clone the repository at https://github.com/IDEA-Research/GroundingDINO
To run the Gemini pipeline:
python gemini_api.py
To run the Grounding DINO pipeline:
python dino.py
Make sure the image filenames in chosen_dataset/ end with .png or .jpg.
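Before a long run, it can help to check which files in the dataset folder would be picked up. The sketch below splits the folder into accepted images and skipped files; the helper name is hypothetical, and its case-sensitive matching follows the note literally, so whether the scripts also accept .PNG, .JPG or .jpeg is not confirmed.

```python
from pathlib import Path

ALLOWED_SUFFIXES = (".png", ".jpg")


def split_dataset_files(dataset_dir="chosen_dataset"):
    """Return (images, skipped): file names ending in .png/.jpg vs. everything else.

    Matching is case-sensitive; subdirectories are ignored.
    """
    images, skipped = [], []
    for path in sorted(Path(dataset_dir).iterdir()):
        if not path.is_file():
            continue
        (images if path.name.endswith(ALLOWED_SUFFIXES) else skipped).append(path.name)
    return images, skipped
```

Any names in the skipped list (for example .jpeg or upper-case .JPG files) are candidates for renaming.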
Tune DINO detection thresholds (BOX_THRESHOLD, TEXT_THRESHOLD) in dino.py.
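As we understand the upstream GroundingDINO inference code, BOX_THRESHOLD keeps a candidate box only if its highest token-similarity score exceeds the threshold, and TEXT_THRESHOLD selects which prompt tokens make up the predicted label. The pure-Python sketch below mirrors that logic for illustration; the real code works on tensors and merges sub-word tokens with the tokenizer, and the defaults 0.35/0.25 are example values, not necessarily those in dino.py.

```python
def filter_grounding_predictions(logits, boxes, tokens, box_threshold=0.35, text_threshold=0.25):
    """Apply Grounding DINO-style box and text thresholds (simplified sketch).

    logits: one row per candidate box, one column per prompt token (sigmoid scores).
    boxes:  one box per row of `logits`.
    tokens: the prompt's tokens, aligned with the columns of `logits`.
    """
    results = []
    for row, box in zip(logits, boxes):
        score = max(row)
        # BOX_THRESHOLD: drop boxes whose best token score is not above the threshold.
        if score <= box_threshold:
            continue
        # TEXT_THRESHOLD: the label is built from the tokens scoring above it.
        phrase = " ".join(tok for tok, s in zip(tokens, row) if s > text_threshold)
        results.append({"box": box, "score": score, "phrase": phrase})
    return results
```

Raising BOX_THRESHOLD yields fewer, more confident boxes; raising TEXT_THRESHOLD yields shorter labels containing only the most strongly matched prompt words.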
Tune Gemini hyperparameters via the configs list in gemini_api.py.
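A configs list for a sweep can be generated as the Cartesian product of candidate values. The sketch below assumes each config is a plain dict with temperature, top_p and top_k keys; the exact structure of the list in gemini_api.py is not shown here, so the helper names and format are hypothetical.

```python
from itertools import product


def build_configs(temperatures, top_ps, top_ks):
    """Cartesian product of sampling settings, one dict per run (hypothetical format)."""
    return [
        {"temperature": t, "top_p": p, "top_k": k}
        for t, p, k in product(temperatures, top_ps, top_ks)
    ]


def config_tag(cfg):
    """Short string for naming one run's output folder, e.g. 't0.2_p0.9_k40'."""
    return f"t{cfg['temperature']}_p{cfg['top_p']}_k{cfg['top_k']}"
```

With the google-genai SDK, each dict could be passed as config=types.GenerateContentConfig(**cfg) to client.models.generate_content, and config_tag keeps the outputs of different runs apart.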
Personal trial projects exploring other models (SLIME, YOLOE, OWLv2, OWL-ViT, YOLO-World and YOLO-World-V2) are stored in the home folder of the Lenovo desktop with serial number PF-5M3JNS. To view one, open its folder as a project in Visual Studio Code.