Merged

Changes from all commits
41 commits
3dd5820
train successfully
dnth Mar 30, 2025
82ff375
update exporter
dnth Mar 30, 2025
1dd1191
add sample inference for image
dnth Mar 31, 2025
03819c2
working inference
dnth Mar 31, 2025
46496a7
working video det
dnth Mar 31, 2025
1967b29
add cuda and trt ep
dnth Mar 31, 2025
ebb4123
working webcam inference
dnth Mar 31, 2025
cb9b321
update
dnth Mar 31, 2025
7b93a27
use cuda and trt for image inference
dnth Mar 31, 2025
085abcc
add video inference
dnth Mar 31, 2025
141c142
add webcam specs
dnth Mar 31, 2025
aba0d1c
update readme
dnth Mar 31, 2025
1da7f63
update gradio demo
dnth Mar 31, 2025
a1a605d
update readme
dnth Mar 31, 2025
f3c6ded
update command
dnth Mar 31, 2025
298868f
update readme
dnth Mar 31, 2025
33c480c
update
dnth Mar 31, 2025
3e5f9df
update
dnth Mar 31, 2025
75bf332
update
dnth Mar 31, 2025
94184c8
update
dnth Mar 31, 2025
2453ee5
Refine README instructions for live inference commands, ensuring cons…
dnth Apr 2, 2025
34be469
update
dnth Apr 2, 2025
1bd403d
udpate
dnth Apr 2, 2025
f1e3cdd
update
dnth Apr 2, 2025
2708b88
Update README.md
dnth Apr 2, 2025
50727ee
update
dnth Apr 2, 2025
4926ec9
update
dnth Apr 2, 2025
a3951b6
Update README.md
dnth Apr 2, 2025
bda06dd
Update README.md
dnth Apr 2, 2025
688b4d5
update quickstart
dnth Apr 2, 2025
158b793
add credit
dnth Apr 2, 2025
fc1ace4
update
dnth Apr 2, 2025
5578640
update
dnth Apr 2, 2025
083c802
update
dnth Apr 2, 2025
eaf1bf4
update
dnth Apr 2, 2025
66b6ea9
update
dnth Apr 2, 2025
3f09ecc
update readme
dnth Apr 2, 2025
cd17ba8
add dash
dnth Apr 2, 2025
09f87ed
update key features
dnth Apr 2, 2025
7180e6d
updte
dnth Apr 2, 2025
a47979a
add openvino export
dnth Apr 2, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
outputs/
trt_cache/
# Dataset
dataset_collections/
checkpoints/

# Byte-compiled / optimized / DLL files
__pycache__/
215 changes: 165 additions & 50 deletions README.md
@@ -12,6 +12,44 @@
<p>DEIMKit is a Python wrapper for <a href="https://github.com/ShihuaHuang95/DEIM">DEIM: DETR with Improved Matching for Fast Convergence</a>. Check out the original repo for more details.</p>
</div>



<!-- Add HTML Table of Contents -->
<div align="center">
<br />
<table>
<tr>
<td align="center">
<a href="#-why-deimkit">🤔 Why DEIMKit?</a>
</td>
<td align="center">
<a href="#-key-features">🌟 Key Features</a>
</td>
<td align="center">
<a href="#-installation">📦 Installation</a>
</td>
<td align="center">
<a href="#-usage">🚀 Usage</a>
</td>
</tr>
<tr>
<td align="center">
<a href="#-inference">💡 Inference</a>
</td>
<td align="center">
<a href="#-training">🏋️ Training</a>
</td>
<td align="center">
<a href="#-export">💾 Export</a>
</td>
<td align="center">
<a href="#-disclaimer">⚠️ Disclaimer</a>
</td>
</tr>
</table>
</div>

<br />
<div align="center">
<a href="https://colab.research.google.com/github/dnth/DEIMKit/blob/main/nbs/colab-quickstart.ipynb">
<img src="https://img.shields.io/badge/Open%20In-Colab-blue?style=for-the-badge&logo=google-colab" alt="Open In Colab"/>
@@ -22,29 +60,35 @@
</div>
</div>


## 🤔 Why DEIMKit?

- **Pure Python Configuration** - No complicated YAML files, just clean Python code
- **Cross-Platform Simplicity** - Single command installation on Linux, macOS, and Windows
- **Intuitive API** - Load, train, predict, export in just a few lines of code

## 🌟 Key Features

* **💡 Inference**
* [x] Single Image & Batch Prediction
* [x] Load Pretrained & Custom Models
* [x] Built-in Result Visualization
* [x] Live ONNX Inference (Webcam, Video, Image)
* **🏋️ Training**
* [x] Single & Multi-GPU Training
* [x] Custom Dataset Support (COCO Format)
* [x] Flexible Configuration via Pure Python
* **💾 Export**
* [x] Export Trained Models to ONNX
* [x] ONNX Model with Integrated Preprocessing
* **🛠️ Utilities & Demos**
* [x] Cross-Platform Support (Linux, macOS, Windows)
* [x] Pixi Environment Management Integration
* [x] Interactive Gradio Demo Script

## 📦 Installation

### 📥 Using pip
If you're installing with pip, first install [torch](https://pytorch.org/get-started/locally/) and torchvision as prerequisites.

Next, install the package.
Bleeding edge version
@@ -57,7 +101,7 @@
Stable version
pip install git+https://github.com/dnth/DEIM.git@v0.1.1
```

### 🔌 Using Pixi

> [!TIP]
> I recommend using [Pixi](https://pixi.sh) to run this package. Pixi makes it easy to install the right version of Python and the dependencies to run this package on any platform!
@@ -85,7 +129,7 @@
This will download a toy dataset with 8 images and train a model on it for 3 epochs.

If this runs without any issues, you've got a working Python environment with all the dependencies installed. This also installs DEIMKit in editable mode for development. See the [pixi cheatsheet](#-pixi-cheat-sheet) below for more.

## 🚀 Usage

List models supported by DEIMKit

@@ -103,7 +147,7 @@
```python
list_models()
 ...
'deim_hgnetv2_x']
```

### 💡 Inference

Load a pretrained model by the original authors
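
The actual snippet is collapsed in this diff view, so here is a minimal sketch of what loading and predicting could look like. The `load_model` and `predict` names are assumptions for illustration, not confirmed by this diff:

```python
from deimkit import load_model  # hypothetical import; the real snippet is collapsed here

# Load one of the model names returned by list_models()
model = load_model("deim_hgnetv2_s")  # assumed loader name

# Run prediction on a single image; visualize is assumed to draw the boxes
result = model.predict("sample.jpg", visualize=True)
```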

@@ -157,7 +201,7 @@
Stomata Dataset

See the [demo notebook on using pretrained models](nbs/pretrained-model-inference.ipynb) and [custom model inference](nbs/custom-model-inference.ipynb) for more details.

### 🏋️ Training

DEIMKit provides a simple interface for training your own models.
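
The training snippet is collapsed in this diff view as well, so here is a minimal sketch of what such an interface could look like. The `Config` and `Trainer` names are assumptions for illustration, not confirmed by this diff:

```python
from deimkit import Config, Trainer  # assumed names; the real snippet is collapsed below

# Start from a base configuration for one of the supported models
config = Config.from_model_name("deim_hgnetv2_s")  # hypothetical constructor

# Train on the dataset described by the config (COCO format)
trainer = Trainer(config)
trainer.fit(epochs=50)
```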

@@ -225,7 +269,8 @@
Navigate to http://localhost:6006/ in your browser to view the training progress.

![alt text](assets/tensorboard.png)

### 💾 Export
Currently, the export function only exports a model to ONNX so that it can be run with ONNX Runtime (see [Live Inference](#-live-inference) for details). I think one could get pretty far with this even on a low-resource machine. Drop an issue if you think this should be extended to other formats.

```python
from deimkit.exporter import Exporter
# (exporter setup collapsed in the diff view)
output_path = exporter.to_onnx(
    ...  # arguments collapsed in the diff view
)
```

> [!NOTE]
> The exported model will accept raw BGR images of any size. It will also handle the preprocessing internally. Credit to [PINTO0309](https://github.com/PINTO0309/DEIM) for the implementation.
>
> ![onnx model](assets/exported_onnx.png)
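
Running the exported model yourself with ONNX Runtime should then be straightforward. Below is a minimal sketch assuming a `[1, H, W, 3]` uint8 BGR input layout; the actual input name, dtype, and shape depend on the exported graph, so inspect `session.get_inputs()` first:

```python
import cv2
import numpy as np
import onnxruntime as ort

# CPU by default; swap in CUDAExecutionProvider if onnxruntime-gpu is installed
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)  # confirm the expected layout before feeding data

image = cv2.imread("image.jpg")  # raw BGR image, any size
batch = image[np.newaxis, ...]   # assumed [1, H, W, 3] uint8 layout
outputs = session.run(None, {inp.name: batch})
```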

> [!TIP]
> If you want to export to OpenVINO, you can do so directly from the ONNX model.
>
>
> ```python
> import onnx
>
> model = onnx.load("best.onnx")
>
> # Change the mode attribute of the GridSample node to bilinear,
> # as the linear mode is not supported in OpenVINO
> for node in model.graph.node:
>     if node.op_type == 'GridSample':
>         for i, attr in enumerate(node.attribute):
>             if attr.name == 'mode' and attr.s == b'linear':
>                 # Replace 'linear' with 'bilinear'
>                 node.attribute[i].s = b'bilinear'
>
> # Save the modified model
> onnx.save(model, "best_prep_openvino.onnx")
> ```
> You can then use the live inference script to run inference on the OpenVINO model.
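
If you prefer to load the converted model with the OpenVINO runtime directly rather than through the live inference script, here is a minimal sketch (assuming the `openvino` package is installed):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("best_prep_openvino.onnx")  # the file saved in the tip above
compiled = core.compile_model(model, "CPU")
# compiled(...) can then be fed the same input you would pass to ONNX Runtime
```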

### 🖥️ Gradio App
Run a Gradio app to interact with your model. The app accepts raw BGR images of any size and handles the preprocessing internally using the exported ONNX model.

```bash
python scripts/gradio_demo.py \
--model "best.onnx" \
--classes "classes.txt" \
--examples "Rock Paper Scissors SXSW.v14i.coco/test"
```
![alt text](assets/gradio_demo.png)

> [!NOTE]
> The demo app uses the exported ONNX model and ONNX Runtime for inference. I have also made the ONNX model accept any input size, even though the original model was trained on 640x640 images.
> This means you can use any image size you want. Play around with the input size slider to see what works best for your model.
> Some objects remain detectable even at lower input sizes, which means you can use a lower input size to speed up inference.

### 🎥 Live Inference
Run live inference on a video, image, or webcam using ONNX Runtime. This runs on the CPU by default.
If you would like to use the CUDA backend, install the `onnxruntime-gpu` package and uninstall the `onnxruntime` package.
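
For example:

```bash
pip uninstall onnxruntime
pip install onnxruntime-gpu
```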

To run inference on a webcam, set the `--webcam` flag.

```bash
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--webcam # Use webcam as input source
--classes classes.txt # Path to the classes file, one class name per line
--video-width 720 # Webcam capture width in pixels
--provider tensorrt # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
```
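
For reference, the classes file is plain text with one class name per line, e.g. for the rock paper scissors dataset used below:

```text
rock
paper
scissors
```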

Because the preprocessing is handled inside the ONNX model, the input size is not limited to the 640x640 resolution the model was trained on; you can use any input size for inference. Integrating the preprocessing into the ONNX model also lets us run inference at very high FPS, since it uses more efficient ONNX operators.

The following is a model I trained on a custom dataset using the `deim_hgnetv2_s` model and exported to ONNX. Here are some examples of webcam inference at different video resolutions.

Webcam resolution of 1920x1080 pixels (1080p):

https://github.com/user-attachments/assets/bd98eb1e-feff-4b53-9fa9-d4aff6a724e0

Webcam resolution of 1280x720 pixels (720p):

https://github.com/user-attachments/assets/31a8644e-e0c6-4bba-9d4f-857a3d0b53e1

Webcam resolution of 848x480 pixels (480p):

https://github.com/user-attachments/assets/aa267f05-5dbd-4824-973c-62f3b8f59c80

Webcam resolution of 640x480 pixels (480p):

https://github.com/user-attachments/assets/3d0c04c0-645a-4d54-86c0-991930491113

Webcam resolution of 320x240 pixels (240p):

https://github.com/user-attachments/assets/f4afff9c-3e6d-4965-ab86-0d4de7ce1a44




For video inference, specify the path to the video file as the input. Output video will be saved as `onnx_result.mp4` in the current directory.

```bash
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--video video.mp4 # Path to the input video file
--classes classes.txt # Path to the classes file, one class name per line
--video-width 320 # Video frame width in pixels
--provider cpu # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
```

The following is an inference using the pretrained `deim_hgnetv2_x` model trained on COCO. See how I exported the pretrained model to ONNX in [this notebook](nbs/export.ipynb).

https://github.com/user-attachments/assets/77070ea4-8407-4648-ade3-01cacd77b51b


For image inference, specify the path to the image file as the input.

```bash
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--image image.jpg # Path to the input image file
--classes classes.txt # Path to the classes file, one class name per line
--provider cpu # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
```




The following is a demo of image inference:

![image](assets/sample_result_image_1.jpg)

> [!TIP]
> If you are using Pixi, you can run the live inference script with the following command, using the same arguments as above.
@@ -308,7 +423,7 @@
> If you want to use the CPU, replace `cuda` with `cpu` in the command above.


## 📝 Pixi Cheat Sheet
Here are some useful tasks you can run with Pixi.

Run a quickstart
@@ -352,7 +467,7 @@
```bash
pixi run -e cpu live-inference --onnx model.onnx --input video.mp4 --class-names classes.txt
```

Launch Gradio app
```bash
pixi run gradio-demo --model "best_prep.onnx" --classes "classes.txt" --examples "Rock Paper Scissors SXSW.v14i.coco/test"
```

@@ -366,5 +481,5 @@
```bash
pixi run export --config config.yml --checkpoint model.pth --output model.onnx
```



## ⚠️ Disclaimer
I'm not affiliated with the original DEIM authors. I just found the model interesting and wanted to try it out. The changes made here are my own. Please cite and star the original repo if you find this useful.
Binary file added assets/exported_onnx.png
Binary file modified assets/gradio_demo.png
Binary file added assets/sample_result_image_1.jpg