Basic Visual Synthesis is a project that analyzes images and generates detailed commentary by integrating YOLO-based object detection with Anything LLM (Deepseek R1 14B) using a RAG (Retrieval-Augmented Generation) approach. The pipeline detects objects in an image, retrieves additional context from a built-in knowledge base, and then uses the LLM to produce comprehensive commentary on the image.
Project Description: "This project aims to perform image analysis and generate detailed commentary by integrating YOLO-based object detection with the Deepseek LLM using a RAG (Retrieval Augmented Generation) approach."
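The detect, retrieve, and prompt stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `KNOWLEDGE_BASE`, `detect_labels`, `retrieve_context`, and `build_prompt` are hypothetical names, the knowledge-base entries are made up, and the `yolov8n.pt` weights file is an assumed default.

```python
from typing import Dict, List

# Hypothetical stand-in for the project's built-in knowledge base.
KNOWLEDGE_BASE: Dict[str, str] = {
    "person": "A human being; often the main subject of a scene.",
    "dog": "A domesticated animal commonly kept as a pet.",
}


def detect_labels(image_path: str, weights: str = "yolov8n.pt") -> List[str]:
    """Return the class name of every object YOLO detects in the image."""
    # Imported lazily so the retrieval and prompt steps run without Ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    result = model(image_path)[0]  # one Results object per input image
    return [result.names[int(box.cls[0])] for box in result.boxes]


def retrieve_context(labels: List[str]) -> List[str]:
    """Look up each distinct label in the knowledge base, in first-seen order."""
    distinct = dict.fromkeys(labels)  # de-duplicates while preserving order
    return [
        f"{label}: {KNOWLEDGE_BASE[label]}"
        for label in distinct
        if label in KNOWLEDGE_BASE
    ]


def build_prompt(labels: List[str], context: List[str]) -> str:
    """Combine detections and retrieved context into a single LLM prompt."""
    detected = ", ".join(dict.fromkeys(labels)) or "nothing"
    background = "\n".join(context) or "(no additional context found)"
    return (
        f"Objects detected in the image: {detected}\n"
        f"Background knowledge:\n{background}\n"
        "Write a detailed commentary on the image."
    )
```

Calling `build_prompt(labels, retrieve_context(labels))` on the output of `detect_labels("photo.jpg")` would give the text that is then sent to the LLM. Labels with no knowledge-base entry still appear in the detection list but contribute no background text.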
- Object Detection: Utilizes Ultralytics YOLO for detecting objects in images.
- Context Retrieval: Provides additional context via a built-in knowledge base.
- RAG Approach: Integrates Anything LLM with Deepseek R1 14B to generate detailed commentary using a Retrieval-Augmented Generation method.
- Colored Logging: Uses Colorama with Python's logging module for enhanced, colored log output.
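As an illustration of the colored logging feature, one common way to combine Colorama with the `logging` module is a custom formatter that wraps each message in a color chosen by level. The sketch below is an assumption about how this could be done, not the project's own code; `ColorFormatter`, `LEVEL_COLORS`, and `get_logger` are hypothetical names, and the plain ANSI fallback exists only so the example runs when Colorama is not installed.

```python
import logging

try:
    from colorama import Fore, Style, init

    init(autoreset=True)  # also enables ANSI colors on Windows consoles
except ImportError:
    # Fallback with the same ANSI codes, so the sketch runs without Colorama.
    class Fore:
        CYAN, GREEN, YELLOW, RED = "\033[36m", "\033[32m", "\033[33m", "\033[31m"

    class Style:
        BRIGHT, RESET_ALL = "\033[1m", "\033[0m"


# Assumed mapping from log level to color.
LEVEL_COLORS = {
    logging.DEBUG: Fore.CYAN,
    logging.INFO: Fore.GREEN,
    logging.WARNING: Fore.YELLOW,
    logging.ERROR: Fore.RED,
    logging.CRITICAL: Fore.RED + Style.BRIGHT,
}


class ColorFormatter(logging.Formatter):
    """Format the record as usual, then wrap it in the color for its level."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        color = LEVEL_COLORS.get(record.levelno, "")
        return f"{color}{message}{Style.RESET_ALL}"


def get_logger(name: str = "visual_synthesis") -> logging.Logger:
    """Return a logger with one colored stream handler attached."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            ColorFormatter("%(asctime)s [%(levelname)s] %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

With this in place, `get_logger().warning("Low-confidence detection")` would print the line in yellow, while errors appear in red.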
- Python Version: Tested with Python 3.8 and above.
- Clone the Repository:
    git clone https://github.com/oaslananka/BasicVisualSynthesis.git
    cd BasicVisualSynthesis
- Create a Virtual Environment (Optional but Recommended):
    python -m venv .venv
    source .venv/bin/activate
- Install Required Packages:
    pip install ultralytics colorama requests
- Configure API Keys and Model Paths:
  - Update the API_KEY and BASE_URL constants in the code as needed.
- Run the Script:
    python VisualSynthesisRAG.py
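To show where the API_KEY and BASE_URL constants typically come into play, here is a hedged sketch of building a chat request for an Anything LLM instance. Only the two constant names come from this README. The workspace slug, the `/api/v1/workspace/<slug>/chat` path, the bearer-token header, the `textResponse` field, and the helper names `build_chat_request` and `ask_llm` are assumptions based on the Anything LLM developer API and should be checked against your own instance.

```python
from typing import Dict, Tuple

# Placeholder values; replace them with the settings for your own instance.
API_KEY = "YOUR_API_KEY"
BASE_URL = "http://localhost:3001"
WORKSPACE = "my-workspace"  # hypothetical workspace slug


def build_chat_request(
    prompt: str,
    api_key: str = API_KEY,
    base_url: str = BASE_URL,
    workspace: str = WORKSPACE,
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return the (url, headers, payload) triple for one chat call."""
    # Endpoint path is an assumption; verify it in your instance's API docs.
    url = f"{base_url.rstrip('/')}/api/v1/workspace/{workspace}/chat"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"message": prompt, "mode": "chat"}
    return url, headers, payload


def ask_llm(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text reply."""
    import requests  # imported lazily; only needed for the real network call

    url, headers, payload = build_chat_request(prompt)
    response = requests.post(url, headers=headers, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors such as 401 or 404
    # "textResponse" is the assumed name of the reply field.
    return response.json().get("textResponse", "")
```

Keeping request construction separate from the network call makes it easy to verify the URL and headers after changing either constant, without needing a running server.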