From 64888ef46c52b695a57992d716dac707ae52505c Mon Sep 17 00:00:00 2001
From: David Newmon
Date: Sun, 7 Dec 2025 07:15:00 -0500
Subject: [PATCH] feat: add RT-DETR training notebook with specs

---
 .../rtdetr/rtdetr.ipynb                       | 636 ++++++++++++++++++
 .../rtdetr/specs/classmap.txt                 |  80 +++
 .../rtdetr/specs/download_coco.sh             |  69 ++
 .../rtdetr/specs/specs.yaml                   | 150 +++++
 4 files changed, 935 insertions(+)
 create mode 100644 notebooks/tao_launcher_starter_kit/rtdetr/rtdetr.ipynb
 create mode 100755 notebooks/tao_launcher_starter_kit/rtdetr/specs/classmap.txt
 create mode 100755 notebooks/tao_launcher_starter_kit/rtdetr/specs/download_coco.sh
 create mode 100644 notebooks/tao_launcher_starter_kit/rtdetr/specs/specs.yaml

diff --git a/notebooks/tao_launcher_starter_kit/rtdetr/rtdetr.ipynb b/notebooks/tao_launcher_starter_kit/rtdetr/rtdetr.ipynb
new file mode 100644
index 0000000..5fc0baa
--- /dev/null
+++ b/notebooks/tao_launcher_starter_kit/rtdetr/rtdetr.ipynb
@@ -0,0 +1,636 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Object Detection using TAO RT-DETR\n",
+    "\n",
+    "Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you take a model trained on one task and re-train it for a different task.\n",
+    "\n",
+    "Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.\n",
+    "\n",
+    "\n",
+    "\n",
+    "## What is RT-DETR?\n",
+    "**RT-DETR** (Real-Time DEtection TRansformer) is a high-performance transformer-based object detection model designed specifically for real-time applications. RT-DETR eliminates the need for post-processing heuristics like Non-Maximum Suppression (NMS), which are typically required by CNN-based real-time detectors like the YOLO series. Unlike standard DETR models, RT-DETR employs an efficient hybrid encoder to decouple intra-scale interaction and cross-scale fusion, significantly improving inference speed. It supports interchangeable backbones, such as ResNet and HGNetv2, to flexibly balance latency and accuracy.\n",
+    "\n",
+    "In TAO, different types of backbone networks are supported: ResNet, EfficientViT, FAN, and ConvNext v1/v2. In this notebook, we use the default ResNet50 backbone."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Learning Objectives\n",
+    "\n",
+    "In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:\n",
+    "\n",
+    "* Take a pretrained model and train an RT-DETR model on the COCO dataset\n",
+    "* Evaluate the trained model\n",
+    "* Run inference with the trained model and visualize the result\n",
+    "* Export the trained model to a .onnx file for deployment to DeepStream\n",
+    "* Generate a TensorRT engine using tao-deploy and verify the engine through evaluation\n",
+    "\n",
+    "At the end of this notebook, you will have generated a trained `rtdetr` model\n",
+    "which you may deploy via [DeepStream](https://developer.nvidia.com/deepstream-sdk).\n",
+    "\n",
+    "## Table of Contents\n",
+    "\n",
+    "This notebook shows an example use case of RT-DETR using Train Adapt Optimize (TAO) Toolkit.\n",
+    "\n",
+    "0. [Set up env variables and map drives](#head-0)\n",
+    "1. [Installing the TAO launcher](#head-1)\n",
+    "2. [Prepare dataset and pre-trained model](#head-2)\n",
+    "3. [Provide training specification](#head-3)\n",
+    "4. [Run TAO training](#head-4)\n",
+    "5. [Evaluate a trained model](#head-5)\n",
+    "6. [Visualize inferences](#head-6)\n",
+    "7. [Deploy](#head-7)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 0. Set up env variables and map drives \n",
+    "\n",
+    "The following notebook requires the user to set an env variable called `$LOCAL_PROJECT_DIR` as the path to the user's workspace. Please note that the dataset to run this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collateral generated by the TAO experiments will be output to `$LOCAL_PROJECT_DIR/rtdetr/results`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.\n",
+    "\n",
+    "The TAO launcher uses docker containers under the hood, and **for our data and results directories to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher.
\n", + "\n", + "`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we can map directories in which we save the data, specs, results and cache. You should configure it for your specific case so these directories are correctly visible to the docker container.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Please define this local project directory that needs to be mapped to the TAO docker session.\n", + "%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments\n", + "\n", + "os.environ[\"HOST_DATA_DIR\"] = os.path.join(os.getenv(\"LOCAL_PROJECT_DIR\", os.getcwd()), \"data\")\n", + "os.environ[\"HOST_RESULTS_DIR\"] = os.path.join(os.getenv(\"LOCAL_PROJECT_DIR\", os.getcwd()), \"rtdetr\", \"results\")\n", + "\n", + "# Set this path if you don't run the notebook from the samples directory.\n", + "# %env NOTEBOOK_ROOT=~/tao-samples/deformable_detr\n", + "\n", + "# The sample spec files are present in the same path as the downloaded samples.\n", + "os.environ[\"HOST_SPECS_DIR\"] = os.path.join(\n", + " os.getenv(\"NOTEBOOK_ROOT\", os.getcwd()),\n", + " \"specs\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! mkdir -p $HOST_DATA_DIR\n", + "! mkdir -p $HOST_SPECS_DIR\n", + "! mkdir -p $HOST_RESULTS_DIR" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Mapping up the local directories to the TAO docker.\n", + "import json\n", + "import os\n", + "mounts_file = os.path.expanduser(\"~/.tao_mounts.json\")\n", + "tao_configs = {\n", + " \"Mounts\":[\n", + " # Mapping the Local project directory\n", + " {\n", + " \"source\": os.environ[\"LOCAL_PROJECT_DIR\"],\n", + " \"destination\": \"/workspace/tao-experiments\"\n", + " },\n", + " {\n", + " \"source\": os.environ[\"HOST_DATA_DIR\"],\n", + " \"destination\": \"/data\"\n", + " },\n", + " {\n", + " \"source\": os.environ[\"HOST_SPECS_DIR\"],\n", + " \"destination\": \"/specs\"\n", + " },\n", + " {\n", + " \"source\": os.environ[\"HOST_RESULTS_DIR\"],\n", + " \"destination\": \"/results\"\n", + " }\n", + " ],\n", + " \"DockerOptions\": {\n", + " \"shm_size\": \"16G\",\n", + " \"ulimits\": {\n", + " \"memlock\": -1,\n", + " \"stack\": 67108864\n", + " },\n", + " \"user\": \"{}:{}\".format(os.getuid(), os.getgid()),\n", + " \"network\": \"host\"\n", + " }\n", + "}\n", + "# Writing the mounts file.\n", + "with open(mounts_file, \"w\") as mfile:\n", + " json.dump(tao_configs, mfile, indent=4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!cat ~/.tao_mounts.json" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Installing the TAO launcher \n", + "The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.\n", + "\n", + "Please note that TAO Toolkit recommends users to run the TAO launcher in a virtual env with python 3.6.9. You may follow the instruction in this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. 
Once you have set up virtualenvwrapper, please set the version of python to be used in the virtual env by using the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running\n",
+    "\n",
+    "```sh\n",
+    "export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x\n",
+    "```\n",
+    "where x >= 7 and <= 10\n",
+    "\n",
+    "We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO python package, please make sure the following software requirements are met:\n",
+    "* python >=3.7, <=3.10.x\n",
+    "* docker-ce > 19.03.5\n",
+    "* docker-API 1.40\n",
+    "* nvidia-container-toolkit > 1.3.0-1\n",
+    "* nvidia-container-runtime > 3.4.0-1\n",
+    "* nvidia-docker2 > 2.5.0-1\n",
+    "* nvidia-driver > 455+\n",
+    "\n",
+    "Once you have installed the pre-requisites, please log in to the docker registry nvcr.io using the command below\n",
+    "\n",
+    "```sh\n",
+    "docker login nvcr.io\n",
+    "```\n",
+    "\n",
+    "You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.\n",
+    "\n",
+    "Once you have logged in, you may verify the TAO launcher installation by running the cell below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# View the versions of the TAO launcher\n",
+    "!tao info"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Prepare dataset and pre-trained model "
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.1 Prepare dataset"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We will be using the COCO dataset for this tutorial. The following script downloads the COCO dataset automatically."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create local dir\n",
+    "!mkdir -p $HOST_DATA_DIR\n",
+    "# Download the data\n",
+    "!bash $HOST_SPECS_DIR/download_coco.sh $HOST_DATA_DIR"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Verification\n",
+    "!ls -l $HOST_DATA_DIR/raw-data"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.2 Download pre-trained model"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "According to the [NVIDIA Forums](https://forums.developer.nvidia.com/t/backbone-models-for-rt-detr/353239/3), the ResNet-50 checkpoint downloaded below is the recommended source of pretrained backbone weights."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!mkdir -p $LOCAL_PROJECT_DIR/rtdetr/pretrained_weights\n",
+    "!wget -P $LOCAL_PROJECT_DIR/rtdetr/pretrained_weights https://download.pytorch.org/models/resnet50-0676ba61.pth"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Provide training specification \n",
+    "\n",
+    "We provide a specification file to configure the training parameters, including:\n",
+    "\n",
+    "* dataset: configure the dataset and augmentation methods\n",
+    "  * train_data_sources:\n",
+    "    * image_dir: the root directory for train images\n",
+    "    * json_file: annotation file for train data. required to be in COCO json format\n",
+    "  * val_data_sources: \n",
+    "    * image_dir: the root directory for validation images\n",
+    "    * json_file: annotation file for validation data. required to be in COCO json format\n",
+    "  * num_classes: number of classes in your training data\n",
+    "  * batch_size: batch size for dataloader\n",
+    "  * workers: number of workers to do data loading\n",
+    "* model: configure the model setting\n",
+    "  * pretrained_backbone_path: path to the pretrained backbone weights. This notebook uses the ResNet50 backbone\n",
+    "  * num_feature_levels: number of feature levels used from backbone\n",
+    "  * dec_layers: number of decoder layers\n",
+    "  * enc_layers: number of encoder layers\n",
+    "  * num_queries: number of queries for the model\n",
+    "  * with_box_refine: flag to enable bbox refinement\n",
+    "  * dropout_ratio: dropout ratio\n",
+    "* train: configure the training hyperparameters\n",
+    "  * num_gpus: number of gpus \n",
+    "  * num_nodes: number of nodes (num_nodes=1 for single node)\n",
+    "  * validation_interval: validation interval\n",
+    "  * optim:\n",
+    "    * lr_backbone: learning rate for backbone\n",
+    "    * lr: learning rate for the rest of the model\n",
+    "    * lr_steps: learning rate decay step milestone (MultiStep)\n",
+    "  * num_epochs: number of epochs\n",
+    "  * activation_checkpoint: recompute activations in the backward pass to save GPU memory. Default is `True`.\n",
+    "  * precision: If set to fp16, the training is run on Automatic Mixed Precision (AMP)\n",
+    "  * distributed_strategy: Default is `ddp`. `ddp_sharded` is also supported.\n",
+    "\n",
+    "* **Note that the sample spec is not meant to produce SOTA accuracy on COCO. To reproduce SOTA, you should set `num_feature_levels` to 4 to match the original params. In addition, the use of NVImageNet weights also causes a slightly lower mAP when compared with ImageNet weights.**\n",
+    "\n",
+    "Please refer to the TAO documentation on RT-DETR for the full list of configurable parameters.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "!cat $HOST_SPECS_DIR/specs.yaml"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Run TAO training \n",
+    "* Provide the sample spec file and the output directory location for models\n",
+    "* Evaluation uses COCO metrics. For more info, please refer to: https://cocodataset.org/#detection-eval\n",
+    "* WARNING: [according to the original paper](https://arxiv.org/abs/2010.04159), COCO training takes about 325 hours to complete using 8 V100 gpus. As a result, **we highly recommend that you run training with multiple high-end gpus (e.g. V100, A100)**\n",
+    "* If you wish to speed up training, you may run with mixed precision by setting `train.precision` (the provided spec already sets `bf16`; `fp16` is also supported)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# NOTE: The following paths are set from the perspective of the TAO Docker.\n",
+    "\n",
+    "# The data is saved here\n",
+    "%env DATA_DIR = /data\n",
+    "%env SPECS_DIR = /specs\n",
+    "%env RESULTS_DIR = /results"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "print(\"For multi-GPU, change train.num_gpus in specs.yaml based on your machine\")\n",
+    "print(\"For multi-node, change train.num_gpus and train.num_nodes in specs.yaml based on your machine\")\n",
+    "# If you face an out-of-memory issue, you may reduce the batch size by passing dataset.batch_size=2 as an override\n",
+    "!tao model rtdetr train \\\n",
+    "    -e $SPECS_DIR/specs.yaml \\\n",
+    "    train.num_gpus=4 \\\n",
+    "    results_dir=$RESULTS_DIR/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print('Trained checkpoints:')\n",
+    "print('---------------------')\n",
+    "!ls -ltrh $HOST_RESULTS_DIR/train"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint\n",
+    "# %env NUM_EPOCH=029\n",
+    "\n",
+    "# Get the name of the checkpoint corresponding to your set epoch\n",
+    "# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH\n",
+    "# %env CHECKPOINT={tmp[0]}\n",
+    "\n",
+    "# Or get the latest checkpoint\n",
+    "os.environ[\"CHECKPOINT\"] = os.path.join(os.getenv(\"HOST_RESULTS_DIR\"), \"train/rtdetr_model_latest.pth\")\n",
+    "\n",
+    "print('Rename a trained model: ')\n",
+    "print('---------------------')\n",
+    "!cp $CHECKPOINT $HOST_RESULTS_DIR/train/rtdetr_model.pth\n",
+    "!ls -ltrh $HOST_RESULTS_DIR/train/rtdetr_model.pth"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Evaluate a trained model \n",
+    "\n",
+    "In this section, we run the `evaluate` tool to evaluate the trained model and produce the mAP metric.\n",
+    "\n",
+    "The same `specs.yaml` specification file configures the evaluation parameters, including:\n",
+    "\n",
+    "* model: configure the model setting\n",
+    "  * this config should remain the same as your trained model's configuration.\n",
+    "* dataset: configure the dataset and augmentation methods\n",
+    "  * test_data_sources:\n",
+    "    * image_dir: the root directory for evaluation images \n",
+    "    * json_file: annotation file for evaluation data. required to be in COCO json format.\n",
+    "  * num_classes: number of classes you used for training\n",
+    "  * eval_class_ids: the classes you would like to evaluate. \\\n",
+    "    Note that the provided spec sets this to null, so evaluation computes the average over all classes. \\\n",
+    "    Set it to a list of class IDs (e.g. [1] for person in the COCO dataset) to evaluate only those classes.\n",
+    "  * batch_size\n",
+    "  * workers\n",
+    "* evaluate:\n",
+    "  * num_gpus: number of gpus\n",
+    "  * conf_threshold: a threshold for confidence scores\n",
+    "\n",
+    "* **NOTE: You need to change the model path (`evaluate.checkpoint`) in the spec file or via a command-line override based on your setting.**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Evaluate on TAO model\n",
+    "!tao model rtdetr evaluate \\\n",
+    "    -e $SPECS_DIR/specs.yaml \\\n",
+    "    evaluate.checkpoint=$RESULTS_DIR/train/rtdetr_model.pth \\\n",
+    "    results_dir=$RESULTS_DIR/"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Visualize Inferences \n",
+    "In this section, we run the `inference` tool to generate inferences on the trained model and visualize the results. The `inference` tool produces annotated image outputs and txt files that contain prediction information.\n",
+    "\n",
+    "The same `specs.yaml` specification file configures the inference parameters, including:\n",
+    "\n",
+    "* model: configure the model setting\n",
+    "  * this config should remain the same as your trained model's configuration\n",
+    "* dataset: configure the dataset and augmentation methods\n",
+    "  * infer_data_sources:\n",
+    "    * image_dir: the list of directories for inference images\n",
+    "    * classmap: path to the class map text file\n",
+    "  * num_classes: number of classes you used for training\n",
+    "  * batch_size\n",
+    "  * workers\n",
+    "* inference:\n",
+    "  * conf_threshold: the confidence score threshold\n",
+    "  * color_map: the color mapping for each class. The predicted bboxes will be drawn in the mapped color for each class\n",
+    "* **NOTE: You need to change the model path (`inference.checkpoint`) in the spec file or via a command-line override based on your setting.**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!tao model rtdetr inference \\\n",
+    "    -e $SPECS_DIR/specs.yaml \\\n",
+    "    inference.checkpoint=$RESULTS_DIR/train/rtdetr_model.pth \\\n",
+    "    results_dir=$RESULTS_DIR/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Simple grid visualizer\n",
+    "!pip3 install \"matplotlib>=3.3.3, <4.0\"\n",
+    "import matplotlib.pyplot as plt\n",
+    "import os\n",
+    "from math import ceil\n",
+    "valid_image_ext = ['.jpg']\n",
+    "\n",
+    "def visualize_images(output_path, num_cols=4, num_images=10):\n",
+    "    num_rows = int(ceil(float(num_images) / float(num_cols)))\n",
+    "    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)  # squeeze=False keeps axarr 2-D even for a single row\n",
+    "    f.tight_layout()\n",
+    "    a = [os.path.join(output_path, image) for image in os.listdir(output_path)\n",
+    "         if os.path.splitext(image)[1].lower() in valid_image_ext]\n",
+    "    for idx, img_path in enumerate(a[:num_images]):\n",
+    "        col_id = idx % num_cols\n",
+    "        row_id = idx // num_cols\n",
+    "        img = plt.imread(img_path)\n",
+    "        axarr[row_id, col_id].imshow(img)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Visualizing the sample images.\n",
+    "IMAGE_DIR = os.path.join(os.environ['HOST_RESULTS_DIR'], \"inference\", \"images_annotated\")\n",
+    "COLS = 2 # number of columns in the visualizer grid.\n",
+    "IMAGES = 4 # number of images to visualize.\n",
+    "\n",
+    "visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. 
Deploy " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "# Export the RGB model to ONNX model\n", + "!tao model rtdetr export \\\n", + " -e $SPECS_DIR/specs.yaml \\\n", + " export.checkpoint=$RESULTS_DIR/train/rtdetr_model.pth \\\n", + " export.onnx_file=$RESULTS_DIR/export/rtdetr_model.onnx \\\n", + " results_dir=$RESULTS_DIR/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate TensorRT engine using tao deploy\n", + "!tao deploy rtdetr gen_trt_engine -e $SPECS_DIR/gen_trt_engine.yaml \\\n", + " gen_trt_engine.onnx_file=$RESULTS_DIR/export/rtdetr_model.onnx \\\n", + " gen_trt_engine.trt_engine=$RESULTS_DIR/gen_trt_engine/rtdetr_model.engine \\\n", + " results_dir=$RESULTS_DIR/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Evaluate with generated TensorRT engine\n", + "!tao deploy rtdetr evaluate -e $SPECS_DIR/specs.yaml \\\n", + " evaluate.trt_engine=$RESULTS_DIR/gen_trt_engine/rtdetr_model.engine \\\n", + " results_dir=$RESULTS_DIR" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Inference with generated TensorRT engine\n", + "!tao deploy rtdetr inference -e $SPECS_DIR/specs.yaml \\\n", + " inference.trt_engine=$RESULTS_DIR/gen_trt_engine/rtdetr_model.engine \\\n", + " results_dir=$RESULTS_DIR" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualizing the sample images.\n", + "IMAGE_DIR = os.path.join(os.environ['HOST_RESULTS_DIR'], \"trt_inference\", \"images_annotated\")\n", + "COLS = 2 # number of columns in the visualizer grid.\n", + "IMAGES = 4 # number of images to visualize.\n", + "\n", + "visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook has come to an end." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/notebooks/tao_launcher_starter_kit/rtdetr/specs/classmap.txt b/notebooks/tao_launcher_starter_kit/rtdetr/specs/classmap.txt new file mode 100755 index 0000000..941cb4e --- /dev/null +++ b/notebooks/tao_launcher_starter_kit/rtdetr/specs/classmap.txt @@ -0,0 +1,80 @@ +person +bicycle +car +motorcycle +airplane +bus +train +truck +boat +traffic light +fire hydrant +stop sign +parking meter +bench +bird +cat +dog +horse +sheep +cow +elephant +bear +zebra +giraffe +backpack +umbrella +handbag +tie +suitcase +frisbee +skis +snowboard +sports ball +kite +baseball bat +baseball glove +skateboard +surfboard +tennis racket +bottle +wine glass +cup +fork +knife +spoon +bowl +banana +apple +sandwich +orange +broccoli +carrot +hot dog +pizza +donut +cake +chair +couch +potted plant +bed +dining table +toilet +tv +laptop +mouse +remote +keyboard +cell phone +microwave +oven +toaster +sink +refrigerator +book +clock +vase +scissors +teddy bear +hair drier +toothbrush diff --git a/notebooks/tao_launcher_starter_kit/rtdetr/specs/download_coco.sh b/notebooks/tao_launcher_starter_kit/rtdetr/specs/download_coco.sh new file mode 100755 index 0000000..8d58211 --- /dev/null +++ b/notebooks/tao_launcher_starter_kit/rtdetr/specs/download_coco.sh @@ -0,0 +1,69 @@ +#!/bin/bash +# Script can be used to download COCO dataset. +# usage: +# bash download_coco.sh /data-dir +set -e +set -x + + +if [ -z "$1" ]; then + echo "usage download_coco.sh [data dir]" + exit +fi + +UNZIP="unzip -nq" + +# Create the output directories. +OUTPUT_DIR="${1%}/raw-data" +mkdir -p "${OUTPUT_DIR}" +CURRENT_DIR=$(pwd) + +# Helper function to download and unpack a .zip file. +function download_and_unzip() { + local BASE_URL=${1} + local FILENAME=${2} + + if [ ! -f ${FILENAME} ]; then + echo "Downloading ${FILENAME} to $(pwd)" + wget -nd -c "${BASE_URL}/${FILENAME}" + else + echo "Skipping download of ${FILENAME}" + fi + echo "Unzipping ${FILENAME}" + ${UNZIP} ${FILENAME} +} + +cd ${OUTPUT_DIR} + +# Download the images. +BASE_IMAGE_URL="http://images.cocodataset.org/zips" + +TRAIN_IMAGE_FILE="train2017.zip" +download_and_unzip ${BASE_IMAGE_URL} ${TRAIN_IMAGE_FILE} +TRAIN_IMAGE_DIR="${OUTPUT_DIR}/train2017" + +VAL_IMAGE_FILE="val2017.zip" +download_and_unzip ${BASE_IMAGE_URL} ${VAL_IMAGE_FILE} +VAL_IMAGE_DIR="${OUTPUT_DIR}/val2017" + +TEST_IMAGE_FILE="test2017.zip" +download_and_unzip ${BASE_IMAGE_URL} ${TEST_IMAGE_FILE} +TEST_IMAGE_DIR="${OUTPUT_DIR}/test2017" + +# Download the annotations. 
+BASE_INSTANCES_URL="http://images.cocodataset.org/annotations" +INSTANCES_FILE="annotations_trainval2017.zip" +download_and_unzip ${BASE_INSTANCES_URL} ${INSTANCES_FILE} + +TRAIN_OBJ_ANNOTATIONS_FILE="${OUTPUT_DIR}/annotations/instances_train2017.json" +VAL_OBJ_ANNOTATIONS_FILE="${OUTPUT_DIR}/annotations/instances_val2017.json" + +TRAIN_CAPTION_ANNOTATIONS_FILE="${OUTPUT_DIR}/annotations/captions_train2017.json" +VAL_CAPTION_ANNOTATIONS_FILE="${OUTPUT_DIR}/annotations/captions_val2017.json" + +# Download the test image info. +BASE_IMAGE_INFO_URL="http://images.cocodataset.org/annotations" +IMAGE_INFO_FILE="image_info_test2017.zip" +download_and_unzip ${BASE_IMAGE_INFO_URL} ${IMAGE_INFO_FILE} + +TESTDEV_ANNOTATIONS_FILE="${OUTPUT_DIR}/annotations/image_info_test-dev2017.json" \ No newline at end of file diff --git a/notebooks/tao_launcher_starter_kit/rtdetr/specs/specs.yaml b/notebooks/tao_launcher_starter_kit/rtdetr/specs/specs.yaml new file mode 100644 index 0000000..112f8ad --- /dev/null +++ b/notebooks/tao_launcher_starter_kit/rtdetr/specs/specs.yaml @@ -0,0 +1,150 @@ +dataset: + train_data_sources: + - image_dir: "/data/raw-data/train2017" + json_file: "/data/raw-data/annotations/instances_train2017.json" + val_data_sources: + image_dir: "/data/raw-data/val2017" + json_file: "/data/raw-data/annotations/instances_val2017.json" + test_data_sources: + image_dir: "/data/raw-data/" + json_file: "/data/raw-data/annotations/instanced_test2017.json" + infer_data_sources: + image_dir: + - "/data/raw-data/val2017/" + classmap: "/specs/classmap.txt" + batch_size: 8 + workers: 12 + remap_mscoco_category: false + dataset_type: "default" + num_classes: 91 + eval_class_ids: null + augmentation: + multi_scales: + - 480 + - 512 + - 544 + - 576 + - 608 + - 640 + - 672 + - 704 + - 736 + - 768 + - 800 + train_spatial_size: + - 640 + - 640 + eval_spatial_size: + - 640 + - 640 + distortion_prob: 0.8 + iou_crop_prob: 0.0 + preserve_aspect_ratio: false +inference: + color_map: + person: tan + bicycle: silver + car: blue + motorcycle: black + airplane: white + bus: yellow + train: red + truck: grey + boat: navy + traffic light: green + fire hydrant: red + stop sign: crimson + parking meter: slate + bench: brown + bird: teal + cat: orange + dog: goldenrod + horse: chestnut + sheep: ivory + cow: white + elephant: grey + bear: brown + zebra: black + giraffe: yellow + backpack: blue + umbrella: magenta + handbag: beige + tie: burgundy + suitcase: black + frisbee: lime + skis: neon + snowboard: violet + sports ball: orange + kite: skyblue + baseball bat: wood + baseball glove: tan + skateboard: charcoal + surfboard: turquoise + tennis racket: yellow + bottle: clear + wine glass: transparent + cup: white + fork: silver + knife: steel + spoon: silver + bowl: ceramic + banana: yellow + apple: red + sandwich: beige + orange: orange + broccoli: green + carrot: orange + hot dog: sienna + pizza: gold + donut: pink + cake: chocolate + chair: mahogany + couch: charcoal + potted plant: emerald + bed: white + dining table: oak + toilet: white + tv: black + laptop: silver + mouse: grey + remote: black + keyboard: black + cell phone: black + microwave: white + oven: silver + toaster: chrome + sink: white + refrigerator: silver + book: blue + clock: white + vase: cyan + scissors: red + teddy bear: brown + hair drier: black + toothbrush: blue +evaluate: + checkpoint: "/results/train/rtdetr_model.pth" + conf_threshold: 0.0 +model: + backbone: "resnet_50" + train_backbone: True + #pretrained_backbone_path: 
"/workspace/tao-experiments/rtdetr/pretrained_weights/c_radio_v2_b.pth" + pretrained_backbone_path: "/workspace/tao-experiments/rtdetr/pretrained_weights/nvidia_resnet50_200821.pth" + return_interm_indices: [1, 2, 3] + dec_layers: 6 + enc_layers: 1 + num_queries: 300 +train: + num_gpus: 1 + validation_interval: 1 + optim: + lr_backbone: 1e-5 + lr: 1e-4 + lr_steps: [1000] + momentum: 0.9 + num_epochs: 40 + precision: bf16 + activation_checkpoint: True + enable_ema: True + ema: + decay: 0.999