Merged

Changes from all commits
41 commits
3dd5820
train successfully
dnth Mar 30, 2025
82ff375
update exporter
dnth Mar 30, 2025
1dd1191
add sample inference for image
dnth Mar 31, 2025
03819c2
working inference
dnth Mar 31, 2025
46496a7
working video det
dnth Mar 31, 2025
1967b29
add cuda and trt ep
dnth Mar 31, 2025
ebb4123
working webcam inference
dnth Mar 31, 2025
cb9b321
update
dnth Mar 31, 2025
7b93a27
use cuda and trt for image inference
dnth Mar 31, 2025
085abcc
add video inference
dnth Mar 31, 2025
141c142
add webcam specs
dnth Mar 31, 2025
aba0d1c
update readme
dnth Mar 31, 2025
1da7f63
update gradio demo
dnth Mar 31, 2025
a1a605d
update readme
dnth Mar 31, 2025
f3c6ded
update command
dnth Mar 31, 2025
298868f
update readme
dnth Mar 31, 2025
33c480c
update
dnth Mar 31, 2025
3e5f9df
update
dnth Mar 31, 2025
75bf332
update
dnth Mar 31, 2025
94184c8
update
dnth Mar 31, 2025
2453ee5
Refine README instructions for live inference commands, ensuring cons…
dnth Apr 2, 2025
34be469
update
dnth Apr 2, 2025
1bd403d
udpate
dnth Apr 2, 2025
f1e3cdd
update
dnth Apr 2, 2025
2708b88
Update README.md
dnth Apr 2, 2025
50727ee
update
dnth Apr 2, 2025
4926ec9
update
dnth Apr 2, 2025
a3951b6
Update README.md
dnth Apr 2, 2025
bda06dd
Update README.md
dnth Apr 2, 2025
688b4d5
update quickstart
dnth Apr 2, 2025
158b793
add credit
dnth Apr 2, 2025
fc1ace4
update
dnth Apr 2, 2025
5578640
update
dnth Apr 2, 2025
083c802
update
dnth Apr 2, 2025
eaf1bf4
update
dnth Apr 2, 2025
66b6ea9
update
dnth Apr 2, 2025
3f09ecc
update readme
dnth Apr 2, 2025
cd17ba8
add dash
dnth Apr 2, 2025
09f87ed
update key features
dnth Apr 2, 2025
7180e6d
updte
dnth Apr 2, 2025
a47979a
add openvino export
dnth Apr 2, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
outputs/
trt_cache/
# Dataset
dataset_collections/
checkpoints/

# Byte-compiled / optimized / DLL files
__pycache__/
215 changes: 165 additions & 50 deletions README.md
@@ -12,6 +12,44 @@
<p>DEIMKit is a Python wrapper for <a href="https://github.com/ShihuaHuang95/DEIM">DEIM: DETR with Improved Matching for Fast Convergence</a>. Check out the original repo for more details.</p>
</div>



<!-- Add HTML Table of Contents -->
<div align="center">
<br />
<table>
<tr>
<td align="center">
<a href="#-why-deimkit">🤔 Why DEIMKit?</a>
</td>
<td align="center">
<a href="#-key-features">🌟 Key Features</a>
</td>
<td align="center">
<a href="#-installation">📦 Installation</a>
</td>
<td align="center">
<a href="#-usage">🚀 Usage</a>
</td>
</tr>
<tr>
<td align="center">
<a href="#-inference">💡 Inference</a>
</td>
<td align="center">
<a href="#-training">🏋️ Training</a>
</td>
<td align="center">
<a href="#-export">💾 Export</a>
</td>
<td align="center">
<a href="#-disclaimer">⚠️ Disclaimer</a>
</td>
</tr>
</table>
</div>

<br />
<div align="center">
<a href="https://colab.research.google.com/github/dnth/DEIMKit/blob/main/nbs/colab-quickstart.ipynb">
<img src="https://img.shields.io/badge/Open%20In-Colab-blue?style=for-the-badge&logo=google-colab" alt="Open In Colab"/>
@@ -22,29 +60,35 @@
</div>
</div>


## 🤔 Why DEIMKit?

- **Pure Python Configuration** - No complicated YAML files, just clean Python code
- **Cross-Platform Simplicity** - Single command installation on Linux, macOS, and Windows
- **Intuitive API** - Load, train, predict, export in just a few lines of code

## 🌟 Key Features

* **💡 Inference**
* [x] Single Image & Batch Prediction
* [x] Load Pretrained & Custom Models
* [x] Built-in Result Visualization
* [x] Live ONNX Inference (Webcam, Video, Image)
* **🏋️ Training**
* [x] Single & Multi-GPU Training
* [x] Custom Dataset Support (COCO Format)
* [x] Flexible Configuration via Pure Python
* **💾 Export**
* [x] Export Trained Models to ONNX
* [x] ONNX Model with Integrated Preprocessing
* **🛠️ Utilities & Demos**
* [x] Cross-Platform Support (Linux, macOS, Windows)
* [x] Pixi Environment Management Integration
* [x] Interactive Gradio Demo Script

## 📦 Installation

### 📥 Using pip
If you're installing with pip, first install [torch](https://pytorch.org/get-started/locally/) and torchvision as prerequisites.

Next, install the package.
Bleeding edge version
@@ -57,7 +101,7 @@
Stable version
pip install git+https://github.com/dnth/DEIM.git@v0.1.1
```

### 🔌 Using Pixi

> [!TIP]
> I recommend using [Pixi](https://pixi.sh) to run this package. Pixi makes it easy to install the right version of Python and the dependencies to run this package on any platform!
@@ -85,7 +129,7 @@
This will download a toy dataset with 8 images and train a model on it for 3 epochs.

If this runs without any issues, you've got a working Python environment with all the dependencies installed. This also installs DEIMKit in editable mode for development. See the [pixi cheatsheet](#-pixi-cheat-sheet) below for more.

## 🚀 Usage

List models supported by DEIMKit

@@ -103,7 +147,7 @@
```python
list_models()
 ...
'deim_hgnetv2_x']
```

### 💡 Inference

Load a pretrained model by the original authors
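
The actual snippet is collapsed in this diff view, so here is a minimal sketch of what loading and predicting could look like. The `load_model` and `predict` names are assumptions for illustration, not confirmed by this diff:

```python
from deimkit import load_model  # hypothetical import; the real snippet is collapsed here

# Load one of the model names returned by list_models()
model = load_model("deim_hgnetv2_s")  # assumed loader name

# Run prediction on a single image; visualize is assumed to draw the boxes
result = model.predict("sample.jpg", visualize=True)
```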

@@ -157,7 +201,7 @@
Stomata Dataset

See the [demo notebook on using pretrained models](nbs/pretrained-model-inference.ipynb) and [custom model inference](nbs/custom-model-inference.ipynb) for more details.

### 🏋️ Training

DEIMKit provides a simple interface for training your own models.
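
The training snippet is collapsed in this diff view as well, so here is a minimal sketch of what such an interface could look like. The `Config` and `Trainer` names are assumptions for illustration, not confirmed by this diff:

```python
from deimkit import Config, Trainer  # assumed names; the real snippet is collapsed below

# Start from a base configuration for one of the supported models
config = Config.from_model_name("deim_hgnetv2_s")  # hypothetical constructor

# Train on the dataset described by the config (COCO format)
trainer = Trainer(config)
trainer.fit(epochs=50)
```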

@@ -225,7 +269,8 @@
Navigate to http://localhost:6006/ in your browser to view the training progress.

![alt text](assets/tensorboard.png)

### 💾 Export
Currently, the export function only exports a model to ONNX so that it can be run with ONNX Runtime (see [Live Inference](#-live-inference) for details). I think one could get pretty far with this even on a low-resource machine. Drop an issue if you think this should be extended to other formats.

```python
from deimkit.exporter import Exporter
# (exporter setup collapsed in the diff view)
output_path = exporter.to_onnx(
    ...  # arguments collapsed in the diff view
)
```

> [!NOTE]
> The exported model will accept raw BGR images of any size. It will also handle the preprocessing internally. Credit to [PINTO0309](https://github.com/PINTO0309/DEIM) for the implementation.
>
> ![onnx model](assets/exported_onnx.png)
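
Running the exported model yourself with ONNX Runtime should then be straightforward. Below is a minimal sketch assuming a `[1, H, W, 3]` uint8 BGR input layout; the actual input name, dtype, and shape depend on the exported graph, so inspect `session.get_inputs()` first:

```python
import cv2
import numpy as np
import onnxruntime as ort

# CPU by default; swap in CUDAExecutionProvider if onnxruntime-gpu is installed
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)  # confirm the expected layout before feeding data

image = cv2.imread("image.jpg")  # raw BGR image, any size
batch = image[np.newaxis, ...]   # assumed [1, H, W, 3] uint8 layout
outputs = session.run(None, {inp.name: batch})
```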

> [!TIP]
> If you want to export to OpenVINO, you can do so directly from the ONNX model.
>
>
> ```python
> import onnx
>
> model = onnx.load("best.onnx")
>
> # Change the mode attribute of the GridSample node to bilinear,
> # as the linear mode is not supported in OpenVINO
> for node in model.graph.node:
>     if node.op_type == 'GridSample':
>         for i, attr in enumerate(node.attribute):
>             if attr.name == 'mode' and attr.s == b'linear':
>                 # Replace 'linear' with 'bilinear'
>                 node.attribute[i].s = b'bilinear'
>
> # Save the modified model
> onnx.save(model, "best_prep_openvino.onnx")
> ```
> You can then use the live inference script to run inference on the OpenVINO model.
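
If you prefer to load the converted model with the OpenVINO runtime directly rather than through the live inference script, here is a minimal sketch (assuming the `openvino` package is installed):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("best_prep_openvino.onnx")  # the file saved in the tip above
compiled = core.compile_model(model, "CPU")
# compiled(...) can then be fed the same input you would pass to ONNX Runtime
```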

### 🖥️ Gradio App
Run a Gradio app to interact with your model. The app accepts raw BGR images of any size and handles the preprocessing internally using the exported ONNX model.

```bash
python scripts/gradio_demo.py \
--model "best.onnx" \
--classes "classes.txt" \
--examples "Rock Paper Scissors SXSW.v14i.coco/test"
```
![alt text](assets/gradio_demo.png)

> [!NOTE]
> The demo app uses the exported ONNX model and ONNX Runtime for inference. I have also made the ONNX model accept any input size, even though the original model was trained on 640x640 images.
> This means you can use any image size you want. Play around with the input size slider to see what works best for your model.
> Some objects remain detectable even at lower input sizes, which means you can use a lower input size to speed up inference.

### 🎥 Live Inference
Run live inference on a video, image, or webcam using ONNX Runtime. This runs on the CPU by default.
If you would like to use the CUDA backend, install the `onnxruntime-gpu` package and uninstall the `onnxruntime` package.
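
For example:

```bash
pip uninstall onnxruntime
pip install onnxruntime-gpu
```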

To run inference on a webcam, set the `--webcam` flag.

```bash
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--webcam # Use webcam as input source
--classes classes.txt # Path to the classes file, one class name per line
--video-width 720 # Webcam capture width in pixels
--provider tensorrt # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
```
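
For reference, the classes file is plain text with one class name per line, e.g. for the rock paper scissors dataset used below:

```text
rock
paper
scissors
```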

Because the preprocessing is handled inside the ONNX model, the input size is not limited to the 640x640 resolution the model was trained on; you can use any input size for inference. Integrating the preprocessing into the ONNX model also lets us run inference at very high FPS, since it uses more efficient ONNX operators.

The following is a model I trained on a custom dataset using the `deim_hgnetv2_s` model and exported to ONNX. Here are some examples of webcam inference at different video resolutions.

Webcam resolution of 1920x1080 pixels (1080p):

https://github.com/user-attachments/assets/bd98eb1e-feff-4b53-9fa9-d4aff6a724e0

Webcam resolution of 1280x720 pixels (720p):

https://github.com/user-attachments/assets/31a8644e-e0c6-4bba-9d4f-857a3d0b53e1

Webcam resolution of 848x480 pixels (480p):

https://github.com/user-attachments/assets/aa267f05-5dbd-4824-973c-62f3b8f59c80

Webcam resolution of 640x480 pixels (480p):

https://github.com/user-attachments/assets/3d0c04c0-645a-4d54-86c0-991930491113

Webcam resolution of 320x240 pixels (240p):

https://github.com/user-attachments/assets/f4afff9c-3e6d-4965-ab86-0d4de7ce1a44




For video inference, specify the path to the video file as the input. Output video will be saved as `onnx_result.mp4` in the current directory.

```bash
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--video video.mp4 # Path to the input video file
--classes classes.txt # Path to the classes file, one class name per line
--video-width 320 # Video frame width in pixels
--provider cpu # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
```

The following is an inference using the pretrained `deim_hgnetv2_x` model trained on COCO. See how I exported the pretrained model to ONNX in [this notebook](nbs/export.ipynb).

https://github.com/user-attachments/assets/77070ea4-8407-4648-ade3-01cacd77b51b


For image inference, specify the path to the image file as the input.

```bash
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--image image.jpg # Path to the input image file
--classes classes.txt # Path to the classes file, one class name per line
--provider cpu # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
```




The following is a demo of image inference:

![image](assets/sample_result_image_1.jpg)

> [!TIP]
> If you are using Pixi, you can run the live inference script with the following command, using the same arguments as above.
@@ -308,7 +423,7 @@
> If you want to use the CPU, replace `cuda` with `cpu` in the command above.


## 📝 Pixi Cheat Sheet
Here are some useful tasks you can run with Pixi.

Run a quickstart
@@ -352,7 +467,7 @@
```bash
pixi run -e cpu live-inference --onnx model.onnx --input video.mp4 --class-names classes.txt
```

Launch Gradio app
```bash
pixi run gradio-demo --model "best_prep.onnx" --classes "classes.txt" --examples "Rock Paper Scissors SXSW.v14i.coco/test"
```

@@ -366,5 +481,5 @@
```bash
pixi run export --config config.yml --checkpoint model.pth --output model.onnx
```



## ⚠️ Disclaimer
I'm not affiliated with the original DEIM authors. I just found the model interesting and wanted to try it out. The changes made here are my own. Please cite and star the original repo if you find this useful.
Binary file added assets/exported_onnx.png
Binary file modified assets/gradio_demo.png
Binary file added assets/sample_result_image_1.jpg