From c4e6ff099607adfb9021b79a0e9d4918fe3ec772 Mon Sep 17 00:00:00 2001
From: nathan contino
Date: Mon, 16 Jun 2025 14:28:03 -0400
Subject: [PATCH 1/2] Add inline docs

---
 README.md | 106 +++++++++++++++++++++++++++++-------------------
 meta.json |   4 ++-
 2 files changed, 61 insertions(+), 49 deletions(-)

diff --git a/README.md b/README.md
index 99f396f..bf0a88c 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,22 @@
 # Viam Torchvision Module
-
-This is a [Viam module](https://docs.viam.com/extend/modular-resources/) providing a model of vision service for [TorchVision's New Multi-Weight Support API](https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/).
+This is a [Viam module](https://docs.viam.com/extend/modular-resources/) providing a [vision service](https://docs.viam.com/services/vision/#api) model for [TorchVision's New Multi-Weight Support API](https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/).



-For a given model architecture (e.g. *ResNet50*), multiple weights can be available and each of those weights comes with Metadata (preprocessing and labels).
+For a given model architecture (e.g. *ResNet50*), multiple weights can be available. Each of those weights comes with preprocessing and label metadata.
 
 ## Getting started
 
+First, [create a machine](https://docs.viam.com/how-tos/configure/) in Viam.
+
 To use this module, follow these instructions to [add a module from the Viam Registry](https://docs.viam.com/modular-resources/configure/#add-a-module-from-the-viam-registry) and select the `viam:vision:torchvision` model from the [`torchvision` module](https://app.viam.com/module/viam/torchvision).
+
+Navigate to the [**CONFIGURE** tab](https://docs.viam.com/configure/) of your [machine](https://docs.viam.com/fleet/machines/) in the [Viam app](https://app.viam.com/).
+
+[Add vision / torchvision to your machine](https://docs.viam.com/configure/#components).
+
 Depending on the type of models configured, the module implements:
 
 - For detectors:
@@ -22,20 +27,58 @@ Depending on the type of models configured, the module implements:
   - `GetClassifications()`
   - `GetClassificationsFromCamera()`
 
-> [!NOTE]
->See [vision service API](https://docs.viam.com/services/vision/#api) for more details.
+## viam:vision:torchvision
+
+To configure the `torchvision` model, use the following template:
+
+```json
+"attributes": {
+  "model_name": <string>,
+  "labels_confidences": {
+    <string>: <float>,
+    <string>: <float>
+  },
+  "default_minimum_confidence": <float>
+}
+```
+
+### Attributes
 
-## Configure your `torchvision` vision service
+The only **required attribute** to configure your torchvision vision service is a `model_name`:
 
-> [!NOTE]
-> Before configuring your vision service, you must [create a machine](https://docs.viam.com/how-tos/configure/).
-Navigate to the [**CONFIGURE** tab](https://docs.viam.com/configure/) of your [machine](https://docs.viam.com/fleet/machines/) in the [Viam app](https://app.viam.com/).
-[Add vision / torchvision to your machine](https://docs.viam.com/configure/#components).
+| Name | Type | Inclusion | Default | Description |
+| ------------ | ------ | ------------ | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `model_name` | string | **Required** | | Vision model name as expected by the method [get_model()](https://pytorch.org/vision/main/models.html#listing-and-retrieving-available-models) from the torchvision multi-weight API. |
+
+### Optional attributes
 
-### Example configuration with a camera and transform camera
+| Name | Type | Inclusion | Default | Description |
+| ---------------------------- | --------------------- | --------- | ----------- | ----------- |
+| `weights` | string | Optional | `DEFAULT` | Weights name as expected by the method [get_model()](https://pytorch.org/vision/main/models.html#listing-and-retrieving-available-models) from the torchvision multi-weight API. |
+| `default_minimum_confidence` | float | Optional | | Default minimum confidence used to filter results for all labels that are not specified in `labels_confidences`. |
+| `labels_confidences` | dict[str, float] | Optional | | Dictionary specifying minimum confidence thresholds for specific labels. Example: `{"grasshopper": 0.5, "cricket": 0.45}`. If a label has a confidence set lower than `default_minimum_confidence`, that confidence overrides the default for the specified label. If `labels_confidences` is left blank, no per-label filtering is applied. |
+| `use_weight_transform` | bool | Optional | `True` | Loads the preprocessing transform from the weights metadata. |
+| `input_size` | List[int] | Optional | `None` | Resizes the image to the given size. Overrides the resize from the weights metadata. |
+| `mean_rgb` | [float, float, float] | Optional | `[0, 0, 0]` | Specifies the mean values for normalization in RGB order. |
+| `std_rgb` | [float, float, float] | Optional | `[1, 1, 1]` | Specifies the standard deviation values for normalization in RGB order. |
+| `swap_r_and_b` | bool | Optional | `False` | If `True`, swaps the R and B channels in the input image. Use this if the images passed as inputs to the model are in the OpenCV (BGR) format. |
+| `channel_last` | bool | Optional | `False` | If `True`, converts the image tensor to channel-last format. |
+
+### Preprocessing transforms behavior and order
+
+- If there is a transform in the metadata of the weights and `use_weight_transform` is `True`, `weights_transform` is added to the pipeline.
+- If `input_size` is provided, the image is resized using `v2.Resize()` to the specified size.
+- If both `mean_rgb` and `std_rgb` are provided, the image is normalized using `v2.Normalize()` with the specified mean and standard deviation values.
+- If `swap_r_and_b` is set to `True`, the first and last channels are swapped.
+- If `channel_last` is `True`, a transformation is applied to convert the channel order to the last dimension format: (C, H, W) -> (H, W, C).
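The ordering rules above can be summarized in a short sketch. This is a hypothetical re-implementation for illustration only: the function name `preprocessing_steps` and the string step identifiers are not part of the module's actual API.

```python
# Illustrative sketch of the preprocessing pipeline order described above.
# The function name and the string identifiers are hypothetical, not the
# module's real implementation.
def preprocessing_steps(use_weight_transform=True, input_size=None,
                        mean_rgb=None, std_rgb=None,
                        swap_r_and_b=False, channel_last=False):
    """Return the names of the preprocessing steps in application order."""
    steps = []
    if use_weight_transform:
        steps.append("weights_transform")  # transform from the weights metadata
    if input_size is not None:
        steps.append("v2.Resize")          # resize to input_size
    if mean_rgb is not None and std_rgb is not None:
        steps.append("v2.Normalize")       # normalize with mean_rgb / std_rgb
    if swap_r_and_b:
        steps.append("swap_r_and_b")       # swap first and last channels
    if channel_last:
        steps.append("channel_last")       # (C, H, W) -> (H, W, C)
    return steps

print(preprocessing_steps(input_size=[224, 224], swap_r_and_b=True))
# → ['weights_transform', 'v2.Resize', 'swap_r_and_b']
```

Note that `input_size` only overrides the resize from the weights metadata; the rest of the weights transform still runs first when `use_weight_transform` is `True`.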
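The confidence-filtering rules from the optional-attributes table above can likewise be sketched. The function `filter_detections` and the `(label, confidence)` tuple format are illustrative assumptions, not the service's actual implementation.

```python
# Hypothetical sketch of the filtering implied by `labels_confidences` and
# `default_minimum_confidence`; not the module's real code.
def filter_detections(detections, labels_confidences=None,
                      default_minimum_confidence=None):
    """Keep (label, confidence) pairs that meet their applicable threshold."""
    kept = []
    for label, confidence in detections:
        if labels_confidences and label in labels_confidences:
            threshold = labels_confidences[label]   # per-label threshold wins,
                                                    # even if lower than default
        elif default_minimum_confidence is not None:
            threshold = default_minimum_confidence  # fall back to the default
        else:
            threshold = 0.0                         # no filtering configured
        if confidence >= threshold:
            kept.append((label, confidence))
    return kept

print(filter_detections([("grasshopper", 0.48), ("cricket", 0.48)],
                        labels_confidences={"grasshopper": 0.5, "cricket": 0.45}))
# → [('cricket', 0.48)]
```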
+
+#### Full example configuration
 
 The following JSON config file includes the following resources:
+
 - TorchVision module
 - modular resource (TorchVision vision service)
 - a [webcam camera](https://docs.viam.com/components/camera/webcam/)
@@ -99,40 +142,7 @@ The following JSON config file includes the following resources:
 }
 ```
 
+### Resources
 
-### Attributes description
-
-The only **required attribute** to configure your torchvision vision service is a `model_name`:
-
-
-| Name | Type | Inclusion | Default | Description |
-| ------------ | ------ | ------------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `model_name` | string | **Required** | | Vision model name as expected by the method [get_model()](https://pytorch.org/vision/main/models.html#listing-and-retrieving-available-models) from torchvision multi-weight API. |
-
-
-
-## Supplementaries
-### Optional config attributes
-| Name | Type | Inclusion | Default | Description |
-| ---------------------------- | --------------------- | --------- | ----------- | -------------------------------------------------------------------------------------------------- |
-| `weights` | string | Optional | `DEFAULT` | Weights model name as expected by the method [get_model()](https://pytorch.org/vision/main/models.html#listing-and-retrieving-available-models) from torchvision multi-weight API. |
-| `default_minimum_confidence` | float | Optional | | Default minimum confidence for filtering all labels that are not specified in `label_confidences`. |
-| `labels_confidences` | dict[str, float] | Optional | | Dictionary specifying minimum confidence thresholds for specific labels. Example: `{"grasshopper": 0.5, "cricket": 0.45}`. If a label has a confidence set lower that `default_minimum_confidence`, that confidence over-writes the default for the specified label if `labels_confidences` is left blank, no filtering on labels will be applied. |
-| `use_weight_transform` | bool | Optional | True | Loads preprocessing transform from weights metadata. |
-| `input size` | List[int] | Optional | `None` | Resize the image. Overides resize from weights metadata. |
-| `mean_rgb` | [float, float, float] | Optional | `[0, 0, 0]` | Specifies the mean and standard deviation values for normalization in RGB order |
-| `std_rgb` | [float, float, float] | Optional | `[1, 1, 1]` | Specifies the standard deviation values for normalization in RGB order. |
-| `swap_r_and_b` | bool | Optional | `False` | If True, swaps the R and B channels in the input image. Use this if the images passed as inputs to the model are in the OpenCV format. |
-| `channel_last` | bool | Optional | `False` | If True, the image tensor will be converted to channel-last format. Default is False. |
-### Preprocessing transforms behavior and **order**:
-  - If there are a transform in the metadata of the weights and `use_weight_transform` is True, `weights_transform` is added to the pipeline.
-  - If `input_size` is provided, the image is resized using `v2.Resize()` to the specified size.
-  - If both mean and standard deviation values are provided in `normalize`, the image is normalized using `v2.Normalize()` with the specified mean and standard deviation values.
-  - If `swap_R_and_B` is set to `True`, first and last channel are swapped.
-  - If `channel_last` is `True`, a transformation is applied to convert the channel order to the last dimension format. (C, H ,W) -> (H, W, X).
-
-
-### RESOURCES
 - [Table of all available classification weights](https://pytorch.org/vision/main/models.html#table-of-all-available-classification-weights)
 - [Quantized models](https://pytorch.org/vision/main/models.html#quantized-models)

diff --git a/meta.json b/meta.json
index 47f4931..009ee64 100644
--- a/meta.json
+++ b/meta.json
@@ -6,7 +6,9 @@
   "models": [
     {
       "api": "rdk:service:vision",
-      "model": "viam:vision:torchvision"
+      "model": "viam:vision:torchvision",
+      "short_description": "Service wrapper for the torchvision computer vision library.",
+      "markdown_link": "README.md#viamvisiontorchvision"
     }
   ],
   "build": {

From 38767413e9d035f741c4cbbf40cb0c817feb40f3 Mon Sep 17 00:00:00 2001
From: nathan contino
Date: Mon, 16 Jun 2025 14:39:42 -0400
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index bf0a88c..8bf4b96 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ Depending on the type of models configured, the module implements:
 To configure the `torchvision` model, use the following template:
 
 ```json
-"attributes": {
+{
   "model_name": <string>,
   "labels_confidences": {
     <string>: <float>,