2 changes: 2 additions & 0 deletions .github/actions/build-doc/Dockerfile
@@ -8,6 +8,8 @@ RUN pip install sphinxcontrib-plantuml==0.30

RUN pip install breathe==4.35.0

RUN pip install myst-parser==3.0.1

COPY download_releases.py /usr/local/bin
COPY build.sh /usr/local/bin/build.sh

1 change: 1 addition & 0 deletions conf.py
@@ -17,6 +17,7 @@
'sphinx_rtd_theme',
'sphinxcontrib.plantuml',
'breathe',
'myst_parser'
]

html_theme = "sphinx_rtd_theme"
644 changes: 644 additions & 0 deletions manual/images/preoptimized.svg
1 change: 1 addition & 0 deletions manual/index.rst
@@ -12,3 +12,4 @@ SyNAP Manual
framework_api.rst
npu_operators.rst
java.rst
test.md
148 changes: 148 additions & 0 deletions manual/inference.md
@@ -0,0 +1,148 @@
# Inference

## Introduction

1. The easiest way to get started is to use the CLI commands.
2. For application development, C++ and Python APIs are available.

The simplest way to start experimenting with *SyNAP* is to use the sample precompiled models and applications that come preinstalled on the board.

> **Important**: On Android the sample models can be found in `/vendor/firmware/models/` while on Yocto Linux they are in `/usr/share/synap/models/`. In this document we will refer to this directory as `$MODELS`.
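
For convenience, the examples below assume that `MODELS` is set as an environment variable; a minimal sketch (pick the path matching the OS image on your board):

```sh
# Set MODELS to the sample-model directory (illustrative; adjust to your image)
export MODELS=/usr/share/synap/models        # Yocto Linux
# export MODELS=/vendor/firmware/models      # Android (e.g. via adb shell)
```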

The models are organized in broad categories according to the type of data they take as input and the information they generate as output. Inside each category, models are organized by topic (for example "imagenet"), and for each topic a set of models and sample input data is provided.

For each category a corresponding command line test application is provided.

| **Category** | **Input** | **Output** | **Test App** |
|-----------------------|-----------|--------------------------------------------------|---------------------|
| image_classification | image | probabilities (one per class) | synap_cli_ic |
| object_detection | image | detections (bound.box+class+probability) | synap_cli_od |
| image_processing | image | image | synap_cli_ip |

In addition to the specific applications listed above, `synap_cli` can be used to execute models of all categories. The purpose of this application is not to provide high-level outputs but to measure inference timings. This is the only sample application that can be used with models requiring secure inputs or outputs.

### `synap_cli_ic` application

This command line application allows you to easily execute *image_classification* models.

It takes as input:
- the converted synap model (*.synap* extension)
- one or more images (*jpeg* or *png* format)

It generates as output:
- the top 5 most probable classes for each input image provided

> **Note**: The jpeg/png input image(s) are resized in software to the size of the network input tensor. This resizing is not included in the classification time displayed.

Example:
```sh
$ cd $MODELS/image_classification/imagenet/model/mobilenet_v2_1.0_224_quant
$ synap_cli_ic -m model.synap ../../sample/goldfish_224x224.jpg
Loading network: model.synap
Input image: ../../sample/goldfish_224x224.jpg
Classification time: 3.00 ms
Class Confidence Description
1 18.99 goldfish, Carassius auratus
112 9.30 conch
927 8.70 trifle
29 8.21 axolotl, mud puppy, Ambystoma mexicanum
122 7.71 American lobster, Northern lobster, Maine lobster, Homarus americanus
```

### `synap_cli_od` application

This command line application allows you to easily execute *object_detection* models.

It takes as input:
- the converted synap model (*.synap* extension)
- optionally the confidence threshold for detected objects
- one or more images (*jpeg* or *png* format)

It generates as output:
- the list of objects detected in each input image, with the following information for each detection:
- bounding box
- class index
- confidence

> **Note**: The jpeg/png input image(s) are resized in software to the size of the network input tensor.

Example:
```sh
$ cd $MODELS/object_detection/people/model/mobilenet224_full1/
$ synap_cli_od -m model.synap ../../sample/sample001_640x480.jpg
Input image: ../../sample/sample001_640x480.jpg (w = 640, h = 480, c = 3)
Detection time: 26.94 ms
# Score Class Position Size Description
0 0.95 0 94,193 62,143 person
```

> **Important**: The output of object detection models is not standardized; many different formats exist. The output format used has to be specified when the model is converted, see `model_conversion_tutorial`. If this information is missing or the format is unknown, `synap_cli_od` doesn't know how to interpret the result, so it fails with the error message: *"Failed to initialize detector"*.

### `synap_cli_ip` application

This command line application allows you to execute *image_processing* models. The most common case is the execution of super-resolution models, which take a low-resolution image as input and generate a higher-resolution image as output.

It takes as input:
- the converted synap model (*.synap* extension)
- optionally the region of interest in the image (if supported by the model)
- one or more raw images with one of the following extensions: *nv12*, *nv21*, *rgb*, *bgr*, *bgra*, *gray* or *bin*

It generates as output:
- a file containing the processed image for each input file.

The output file is called `outimage<i>_<W>x<H>.<ext>`, where `<i>` is the index of the corresponding input file, `<W>` and `<H>` are the dimensions of the image, and `<ext>` depends on the type of the output image, for example `nv12` or `rgb`. The output files are created in the current directory, and this can be changed with the `--out-dir` option.

> **Note**: The input image(s) are automatically resized to the size of the network input tensor. This is not supported for `nv12`: if the network takes an `nv12` image as input, the file provided must be in the same format and its *WxH* dimensions must match those of the network input tensor.

> **Note**: Any `png` or `jpeg` image can be converted to `nv12` and rescaled to the required size using the `image_to_raw` command available in the *SyNAP* `toolkit` (for more info see `using-docker-label`). In the same way, the generated raw `nv12` or `rgb` images can be converted to `png` or `jpeg` format using the `image_from_raw` command.
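
As a sketch only (the option names below are assumptions chosen for illustration, not the actual toolkit syntax; check the toolkit help for the real options):

```sh
# Hypothetical invocations: option names are placeholders, see the SyNAP toolkit docs
image_to_raw   --format nv12 --size 1920x1080 --output ref_1920x1080.nv12 ref.png
image_from_raw --format nv12 --size 3840x2160 --output result.png outimage0_3840x2160.nv12
```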

Example:
```sh
$ cd $MODELS/image_processing/super_resolution/model/sr_qdeo_y_uv_1920x1080_3840x2160
$ synap_cli_ip -m model.synap ../../sample/ref_1920x1080.nv12
Input buffer: input_0 size: 1036800
Input buffer: input_1 size: 2073600
Output buffer: output_13 size: 4147200
Output buffer: output_14 size: 8294400

Input image: ../../sample/ref_1920x1080.nv12
Inference time: 30.91 ms
Writing output to file: outimage0_3840x2160.nv12
```

### `synap_cli_ic2` application

This application executes two models in sequence: the input image is fed to the first model, and its output is then fed to the second, which performs classification as in `synap_cli_ic`. It provides an easy way to experiment with 2-stage inference, where for example the first model is a *preprocessing* model that performs downscaling and/or format conversion and the second is an *image_classification* model.

It takes as input:
- the converted synap *preprocessing* model (*.synap* extension)
- the converted synap *classification* model (*.synap* extension)
- one or more images (*jpeg* or *png* format)

It generates as output:
- the top 5 most probable classes for each input image provided

> **Note**: The shape of the output tensor of the first model must match that of the input of the second model.

Example:
```sh
$ pp=$MODELS/image_processing/preprocess/model/convert_nv12@1920x1080_rgb@224x224
$ cd $MODELS/image_classification/imagenet/model/mobilenet_v2_1.0_224_quant
$ synap_cli_ic2 -m $pp/model.synap -m2 model.synap ../../sample/goldfish_1920x1080.nv12

Inference time: 4.34 ms
Class Confidence Description
1 19.48 goldfish, Carassius auratus
122 10.68 American lobster, Northern lobster, Maine lobster, Homarus americanus
927 9.69 trifle
124 9.69 crayfish, crawfish, crawdad, crawdaddy
314 9.10 cockroach, roach
```

The classification output is very close to what we get with `synap_cli_ic`; the minor difference is due to the image having been rescaled from NV12. The higher overall inference time is due to the processing required to rescale and convert the 1920x1080 input image.

### `synap_cli` application

This command line application can be used to run models of all categories. The purpose of `synap_cli` is not to show inference results but to benchmark network execution times, so it provides additional options to run inference multiple times and collect statistics.

An additional feature is that `synap_cli` can automatically generate input images with random content. This
61 changes: 61 additions & 0 deletions manual/introduction.md
@@ -0,0 +1,61 @@
Introduction
============

SyNAP is a software tool that optimizes neural network models for on-device inference by targeting *NPU* or *GPU* hardware accelerators in [Synaptics Astra Embedded Processors](https://www.synaptics.com/products/embedded-processors). To do this, it takes models in their original representation (e.g., TensorFlow Lite, PyTorch, or ONNX) and compiles them to a binary network graph `.synap` format specific to the target hardware, ready for inference.

Optimizing models for NPU
-------------------------

Optimization of models for embedded applications using ahead-of-time compilation can usually be done with a [single command](optimizing_models.md). Optimization options (e.g. [mixed quantization](tutorials/model_import), [heterogeneous inference](heterogeneous_inference)) can also be passed at compile time using a [YAML metafile](conversion-metafile), and the model can be signed and encrypted to support Synaptics SyKURE™ secure inference technology.
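
A typical ahead-of-time compilation might look like the sketch below (the model, metafile, and target names are placeholders; see [optimizing_models.md](optimizing_models.md) for the exact command syntax):

```sh
# Sketch: compile a TFLite model for a specific target (all names are placeholders)
synap convert --model mobilenet_v2.tflite --meta mobilenet_v2.yaml --target SL1680 --out-dir compiled
```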

![synap](images/preoptimized.svg)

> [!NOTE]
> While optimal for the target hardware, a pre-optimized model is target specific and will fail to execute on different hardware.

Running inference
-----------------

There are a number of ways you can run [inference](inference.md) using compiled `.synap` models on Synaptics Astra hardware:

- Image classification, object detection, and image processing using `synap_cli` commands.
- Gstreamer plugin and Python examples for streaming media (e.g., webcam object detection).
- Embedded applications developed in C++ or Python can use the [SyNAP Framework API](./framework_api.rst).

> [!IMPORTANT]
> The simplest way to start experimenting with *SyNAP* is to use the sample precompiled models and applications that come preinstalled on the Synaptics Astra board.

JIT compilation
---------------

For portable apps (e.g., targeting Android) you might consider the [JIT compilation](jit_compilation.md) approach instead. This approach uses a Tensorflow Lite external delegate to run inference using the original `.tflite` model directly.

This offers the greatest hardware portability, but there are a few disadvantages to this approach. Using this method requires that any hardware-specific optimizations be done in the TensorFlow training or TFLite model export stages, which is much more involved than post-training quantization using SyNAP. Additionally, initialization can take a few seconds on first inference, and secure media paths are not available.
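
For example, a stock `.tflite` model can be exercised through a TFLite external delegate with the standard `benchmark_model` tool; in the sketch below the delegate library path is an assumption, not the actual file name shipped with SyNAP:

```sh
# Sketch: run an unmodified .tflite model through an external delegate
# (the delegate library path below is a placeholder)
benchmark_model --graph=mobilenet_v2.tflite \
                --external_delegate_path=/vendor/lib64/libsynap_tflite_delegate.so
```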

Model Profiling & Benchmarks
----------------------------

SyNAP provides [analysis tools](sysfs-inference-counter) in order to identify bottlenecks and optimize models. These include:

- Overall model inference timing
- NPU runtime statistics (e.g., overall layer and I/O buffer utilization)
- Model profiling (e.g., per-layer operator type, execution time, memory usage)

You can also find a [comprehensive list of reference models and benchmarks](benchmark).

NPU Hardware
------------

SyNAP aims to make best use of supported [neural network operators](npu_operators) in order to accelerate on-device inference using the available NPU or GPU hardware. The NPUs themselves consist of several distinct types of functional unit:

- **Convolutional Core**: Optimized to only execute convolutions (int8, int16, float16).
- **Tensor Processor**: Optimized to execute highly parallel operations (int8, int16, float16).
- **Parallel Processing Unit**: 128-bit SIMD execution unit (slower, but more flexible).
- **Internal RAM**: Used to cache data and weights.


| Chip | Neural Network Core | Tensor Processor | Parallel Processing Unit |
|--------------|---------------------|--------------------|--------------------------|
| VS640, SL1640| 4 | 2 Full + 4 Lite | 1 |
| VS680, SL1680| 22 | 8 Full | 1 |

103 changes: 103 additions & 0 deletions manual/java.md
@@ -0,0 +1,103 @@
# Direct Access in Android Applications

In Android, in addition to the NN API, SyNAP can be accessed directly by applications. The main benefits of direct access are zero-copy input/output and the execution of optimized models compiled ahead of time with the SyNAP toolkit.

Access to SyNAP can be performed via custom JNI C++ code using the `synapnb` library. The library can be used as usual; the only constraint is to use the SyNAP allocator, which can be obtained with `synap_allocator()`.

Another option is to use custom JNI C code using the `synap_device` library. In this case, there are no constraints. The library allows creating new I/O buffers with the function `synap_allocate_io_buffer`. It is also possible to use existing DMABUF handles obtained, for instance, from gralloc with `synap_create_io_buffer`. The DMABUF can be accessed with standard Linux DMABUF APIs (i.e., `mmap`/`munmap`/`ioctls`).

SyNAP provides a sample JNI library that shows how to use the `synap_device` library in a Java application. The code is located in `java` and can be included in an existing Android application by adding the following lines to the `settings.gradle` of the application:

```groovy
include ':synap'
project(':synap').projectDir = file("[absolute path to synap]/java")
```
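
The application module can then declare a dependency on it in its `build.gradle`; a minimal sketch, assuming the standard Gradle project layout:

```groovy
// app/build.gradle (sketch)
dependencies {
    implementation project(':synap')
}
```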

The code can then be used as follows:

```java
package com.synaptics.synap;

public class InferenceEngine {

    /**
     * Perform inference using the given model
     *
     * @param model EBG model
     * @param inputs arrays containing model input data, one byte array per network input,
     *               of the size expected by the network
     * @param outputs arrays where to store the output of the network, one byte array per network
     *                output, of the size expected by the network
     */
    public static void infer(byte[] model, byte[][] inputs, byte[][] outputs) {

        Synap synap = Synap.getInstance();

        // load the network
        Network network = synap.createNetwork(model);

        // create input buffers and attach them to the network
        IoBuffer[] inputBuffers = new IoBuffer[inputs.length];
        Attachment[] inputAttachments = new Attachment[inputs.length];

        for (int i = 0; i < inputs.length; i++) {
            // create the input buffer of the desired length
            inputBuffers[i] = synap.createIoBuffer(inputs[i].length);

            // attach the buffer to the network (make sure you keep a reference to the
            // attachment to prevent it from being garbage collected and destroyed)
            inputAttachments[i] = network.attachIoBuffer(inputBuffers[i]);

            // set the buffer as the i-th input of the network
            inputAttachments[i].useAsInput(i);

            // copy the input data to the buffer
            inputBuffers[i].copyFromBuffer(inputs[i], 0, 0, inputs[i].length);
        }

        // create the output buffers and attach them to the network
        IoBuffer[] outputBuffers = new IoBuffer[outputs.length];
        Attachment[] outputAttachments = new Attachment[outputs.length];

        for (int i = 0; i < outputs.length; i++) {
            // create the output buffer of the desired length
            outputBuffers[i] = synap.createIoBuffer(outputs[i].length);

            // attach the buffer to the network (make sure you keep a reference to the
            // attachment to prevent it from being garbage collected and destroyed)
            outputAttachments[i] = network.attachIoBuffer(outputBuffers[i]);

            // set the buffer as the i-th output of the network
            outputAttachments[i].useAsOutput(i);
        }

        // run the network
        network.run();

        // copy the result data from the output buffers to the output arrays
        for (int i = 0; i < outputs.length; i++) {
            outputBuffers[i].copyToBuffer(outputs[i], 0, 0, outputs[i].length);
        }

        // release resources (this will happen automatically when the objects are garbage
        // collected, but that may take some time, so it is better to release them explicitly
        // as soon as possible)

        network.release(); // this will automatically release the attachments

        for (int i = 0; i < inputs.length; i++) {
            inputBuffers[i].release();
        }

        for (int i = 0; i < outputs.length; i++) {
            outputBuffers[i].release();
        }
    }
}
```
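
A call site might then look like the following sketch, where the model path, input file, and output size are placeholders chosen for illustration (they must match the actual network deployed on the device):

```java
import com.synaptics.synap.InferenceEngine;
import java.nio.file.Files;
import java.nio.file.Paths;

public class InferDemo {
    public static void main(String[] args) throws Exception {
        // placeholder paths and sizes: adjust to the model actually deployed on the device
        byte[] model = Files.readAllBytes(Paths.get(
                "/vendor/firmware/models/image_classification/imagenet/model/mobilenet_v2_1.0_224_quant/model.synap"));
        byte[][] inputs = { Files.readAllBytes(Paths.get("/data/local/tmp/input_224x224.rgb")) };
        byte[][] outputs = { new byte[1001] };   // assumed one byte per class for a quantized imagenet model
        InferenceEngine.infer(model, inputs, outputs);
        System.out.println("Confidence of class 1: " + (outputs[0][1] & 0xff));
    }
}
```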

> **Note**:
>
> To simplify application development, by default the VSSDK allows untrusted applications (such as applications sideloaded or downloaded from the Google Play store) to use the SyNAP API. Since the API uses limited hardware resources, this can lead to situations in which a third-party application interferes with platform processes. To restrict access to SyNAP to platform applications only, remove the file `vendor/vsi/sepolicy/synap_device/untrusted_app.te`.