2 changes: 2 additions & 0 deletions .github/actions/build-doc/Dockerfile
@@ -8,6 +8,8 @@ RUN pip install sphinxcontrib-plantuml==0.30

RUN pip install breathe==4.35.0

RUN pip install myst-parser==3.0.1

COPY download_releases.py /usr/local/bin
COPY build.sh /usr/local/bin/build.sh

1 change: 1 addition & 0 deletions conf.py
@@ -17,6 +17,7 @@
'sphinx_rtd_theme',
'sphinxcontrib.plantuml',
'breathe',
'myst_parser'
]

html_theme = "sphinx_rtd_theme"
644 changes: 644 additions & 0 deletions manual/images/preoptimized.svg
1 change: 1 addition & 0 deletions manual/index.rst
@@ -12,3 +12,4 @@ SyNAP Manual
framework_api.rst
npu_operators.rst
java.rst
test.md
148 changes: 148 additions & 0 deletions manual/inference.md
@@ -0,0 +1,148 @@
# Inference

## Introduction

1. The easiest way to get started is to use the CLI commands.
2. For application development, C++ and Python APIs are available.

The simplest way to start experimenting with *SyNAP* is to use the sample precompiled models and applications that come preinstalled on the board.

> **Important**: On Android the sample models can be found in `/vendor/firmware/models/` while on Yocto Linux they are in `/usr/share/synap/models/`. In this document we will refer to this directory as `$MODELS`.
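
For convenience, the examples below assume that `MODELS` is set as an environment variable; a minimal sketch (pick the path matching the OS image on your board):

```sh
# Set MODELS to the sample-model directory (illustrative; adjust to your image)
export MODELS=/usr/share/synap/models        # Yocto Linux
# export MODELS=/vendor/firmware/models      # Android (e.g. via adb shell)
```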

The models are organized in broad categories according to the type of data they take as input and the information they generate as output. Inside each category, models are organized by topic (for example "imagenet"), and for each topic a set of models and sample input data is provided.

For each category a corresponding command line test application is provided.

| **Category** | **Input** | **Output** | **Test App** |
|-----------------------|-----------|--------------------------------------------------|---------------------|
| image_classification | image | probabilities (one per class) | synap_cli_ic |
| object_detection | image | detections (bound.box+class+probability) | synap_cli_od |
| image_processing | image | image | synap_cli_ip |

In addition to the specific applications listed above, `synap_cli` can be used to execute models of all categories. The purpose of this application is not to provide high-level outputs but to measure inference timings. This is the only sample application that can be used with models requiring secure inputs or outputs.

### `synap_cli_ic` application

This command line application allows you to easily execute *image_classification* models.

It takes as input:
- the converted synap model (*.synap* extension)
- one or more images (*jpeg* or *png* format)

It generates as output:
- the top 5 most probable classes for each input image provided

> **Note**: The jpeg/png input image(s) are resized in software to the size of the network input tensor. This resizing is not included in the classification time displayed.

Example:
```sh
$ cd $MODELS/image_classification/imagenet/model/mobilenet_v2_1.0_224_quant
$ synap_cli_ic -m model.synap ../../sample/goldfish_224x224.jpg
Loading network: model.synap
Input image: ../../sample/goldfish_224x224.jpg
Classification time: 3.00 ms
Class Confidence Description
1 18.99 goldfish, Carassius auratus
112 9.30 conch
927 8.70 trifle
29 8.21 axolotl, mud puppy, Ambystoma mexicanum
122 7.71 American lobster, Northern lobster, Maine lobster, Homarus americanus
```

### `synap_cli_od` application

This command line application allows you to easily execute *object_detection* models.

It takes as input:
- the converted synap model (*.synap* extension)
- optionally the confidence threshold for detected objects
- one or more images (*jpeg* or *png* format)

It generates as output:
- the list of objects detected in each input image, with the following information for each detection:
- bounding box
- class index
- confidence

> **Note**: The jpeg/png input image(s) are resized in software to the size of the network input tensor.

Example:
```sh
$ cd $MODELS/object_detection/people/model/mobilenet224_full1/
$ synap_cli_od -m model.synap ../../sample/sample001_640x480.jpg
Input image: ../../sample/sample001_640x480.jpg (w = 640, h = 480, c = 3)
Detection time: 26.94 ms
# Score Class Position Size Description
0 0.95 0 94,193 62,143 person
```

> **Important**: The output of object detection models is not standardized; many different formats exist. The output format used has to be specified when the model is converted, see `model_conversion_tutorial`. If this information is missing or the format is unknown, `synap_cli_od` doesn't know how to interpret the result, so it fails with the error message: *"Failed to initialize detector"*.

### `synap_cli_ip` application

This command line application allows you to execute *image_processing* models. The most common case is the execution of super-resolution models, which take a low-resolution image as input and generate a higher-resolution image as output.

It takes as input:
- the converted synap model (*.synap* extension)
- optionally the region of interest in the image (if supported by the model)
- one or more raw images with one of the following extensions: *nv12*, *nv21*, *rgb*, *bgr*, *bgra*, *gray* or *bin*

It generates as output:
- a file containing the processed image for each input file.

The output file is called `outimage<i>_<W>x<H>.<ext>`, where `<i>` is the index of the corresponding input file, `<W>` and `<H>` are the dimensions of the image, and `<ext>` depends on the type of the output image, for example `nv12` or `rgb`. The output files are created in the current directory, and this can be changed with the `--out-dir` option.

> **Note**: The input image(s) are automatically resized to the size of the network input tensor. This is not supported for `nv12`: if the network takes an `nv12` image as input, the file provided must be in the same format and its *WxH* dimensions must match those of the network input tensor.

> **Note**: Any `png` or `jpeg` image can be converted to `nv12` and rescaled to the required size using the `image_to_raw` command available in the *SyNAP* `toolkit` (for more info see `using-docker-label`). In the same way, the generated raw `nv12` or `rgb` images can be converted to `png` or `jpeg` format using the `image_from_raw` command.
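
As a sketch only (the option names below are assumptions chosen for illustration, not the actual toolkit syntax; check the toolkit help for the real options):

```sh
# Hypothetical invocations: option names are placeholders, see the SyNAP toolkit docs
image_to_raw   --format nv12 --size 1920x1080 --output ref_1920x1080.nv12 ref.png
image_from_raw --format nv12 --size 3840x2160 --output result.png outimage0_3840x2160.nv12
```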

Example:
```sh
$ cd $MODELS/image_processing/super_resolution/model/sr_qdeo_y_uv_1920x1080_3840x2160
$ synap_cli_ip -m model.synap ../../sample/ref_1920x1080.nv12
Input buffer: input_0 size: 1036800
Input buffer: input_1 size: 2073600
Output buffer: output_13 size: 4147200
Output buffer: output_14 size: 8294400

Input image: ../../sample/ref_1920x1080.nv12
Inference time: 30.91 ms
Writing output to file: outimage0_3840x2160.nv12
```

### `synap_cli_ic2` application

This application executes two models in sequence: the input image is fed to the first model, and its output is then fed to the second, which performs classification as in `synap_cli_ic`. It provides an easy way to experiment with 2-stage inference, where for example the first model is a *preprocessing* model that performs downscaling and/or format conversion and the second is an *image_classification* model.

It takes as input:
- the converted synap *preprocessing* model (*.synap* extension)
- the converted synap *classification* model (*.synap* extension)
- one or more images (*jpeg* or *png* format)

It generates as output:
- the top 5 most probable classes for each input image provided

> **Note**: The shape of the output tensor of the first model must match that of the input of the second model.

Example:
```sh
$ pp=$MODELS/image_processing/preprocess/model/convert_nv12@1920x1080_rgb@224x224
$ cd $MODELS/image_classification/imagenet/model/mobilenet_v2_1.0_224_quant
$ synap_cli_ic2 -m $pp/model.synap -m2 model.synap ../../sample/goldfish_1920x1080.nv12

Inference time: 4.34 ms
Class Confidence Description
1 19.48 goldfish, Carassius auratus
122 10.68 American lobster, Northern lobster, Maine lobster, Homarus americanus
927 9.69 trifle
124 9.69 crayfish, crawfish, crawdad, crawdaddy
314 9.10 cockroach, roach
```

The classification output is very close to what we get with `synap_cli_ic`; the minor difference is due to the image having been rescaled from NV12. The higher overall inference time is due to the processing required to rescale and convert the 1920x1080 input image.

### `synap_cli` application

This command line application can be used to run models of all categories. The purpose of `synap_cli` is not to show inference results but to benchmark network execution times, so it provides additional options to run inference multiple times and collect statistics.

An additional feature is that `synap_cli` can automatically generate input images with random content. This
61 changes: 61 additions & 0 deletions manual/introduction.md
@@ -0,0 +1,61 @@
Introduction
============

SyNAP is a software tool that optimizes neural network models for on-device inference by targeting *NPU* or *GPU* hardware accelerators in [Synaptics Astra Embedded Processors](https://www.synaptics.com/products/embedded-processors). To do this, it takes models in their original representation (e.g., TensorFlow Lite, PyTorch, or ONNX) and compiles them to a binary network graph `.synap` format specific to the target hardware, ready for inference.

Optimizing models for NPU
-------------------------

Optimization of models for embedded applications using ahead-of-time compilation can usually be done with a [single command](optimizing_models.md). Optimization options (e.g. [mixed quantization](tutorials/model_import), [heterogeneous inference](heterogeneous_inference)) can also be passed at compile time using a [YAML metafile](conversion-metafile), and the model can be signed and encrypted to support Synaptics SyKURE™ secure inference technology.
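
A typical ahead-of-time compilation might look like the sketch below (the model, metafile, and target names are placeholders; see [optimizing_models.md](optimizing_models.md) for the exact command syntax):

```sh
# Sketch: compile a TFLite model for a specific target (all names are placeholders)
synap convert --model mobilenet_v2.tflite --meta mobilenet_v2.yaml --target SL1680 --out-dir compiled
```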

![synap](images/preoptimized.svg)

> [!NOTE]
> While optimal for the target hardware, a pre-optimized model is target specific and will fail to execute on different hardware.

Running inference
-----------------

There are a number of ways you can run [inference](inference.md) using compiled `.synap` models on Synaptics Astra hardware:

- Image classification, object detection, and image processing using `synap_cli` commands.
- Gstreamer plugin and Python examples for streaming media (e.g., webcam object detection).
- Embedded applications developed in C++ or Python can use the [SyNAP Framework API](./framework_api.rst).

> [!IMPORTANT]
> The simplest way to start experimenting with *SyNAP* is to use the sample precompiled models and applications that come preinstalled on the Synaptics Astra board.

JIT compilation
---------------

For portable apps (e.g., targeting Android) you might consider the [JIT compilation](jit_compilation.md) approach instead. This approach uses a Tensorflow Lite external delegate to run inference using the original `.tflite` model directly.

This offers the greatest hardware portability, but there are a few disadvantages to this approach. Using this method requires that any hardware-specific optimizations be done in the TensorFlow training or TFLite model export stages, which is much more involved than post-training quantization using SyNAP. Additionally, initialization can take a few seconds on first inference, and secure media paths are not available.
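
For example, a stock `.tflite` model can be exercised through a TFLite external delegate with the standard `benchmark_model` tool; in the sketch below the delegate library path is an assumption, not the actual file name shipped with SyNAP:

```sh
# Sketch: run an unmodified .tflite model through an external delegate
# (the delegate library path below is a placeholder)
benchmark_model --graph=mobilenet_v2.tflite \
                --external_delegate_path=/vendor/lib64/libsynap_tflite_delegate.so
```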

Model Profiling & Benchmarks
----------------------------

SyNAP provides [analysis tools](sysfs-inference-counter) in order to identify bottlenecks and optimize models. These include:

- Overall model inference timing
- NPU runtime statistics (e.g., overall layer and I/O buffer utilization)
- Model profiling (e.g., per-layer operator type, execution time, memory usage)

You can also find a [comprehensive list of reference models and benchmarks](benchmark).

NPU Hardware
------------

SyNAP aims to make best use of supported [neural network operators](npu_operators) in order to accelerate on-device inference using the available NPU or GPU hardware. The NPUs themselves consist of several distinct types of functional unit:

- **Convolutional Core**: Optimized to only execute convolutions (int8, int16, float16).
- **Tensor Processor**: Optimized to execute highly parallel operations (int8, int16, float16).
- **Parallel Processing Unit**: 128-bit SIMD execution unit (slower, but more flexible).
- **Internal RAM**: Used to cache data and weights.


| Chip | Neural Network Core | Tensor Processor | Parallel Processing Unit |
|--------------|---------------------|--------------------|--------------------------|
| VS640, SL1640| 4 | 2 Full + 4 Lite | 1 |
| VS680, SL1680| 22 | 8 Full | 1 |

103 changes: 103 additions & 0 deletions manual/java.md
@@ -0,0 +1,103 @@
# Direct Access in Android Applications

In Android, in addition to the NN API, SyNAP can be accessed directly by applications. The main benefits of direct access are zero-copy input/output and the execution of optimized models compiled ahead of time with the SyNAP toolkit.

Access to SyNAP can be performed via custom JNI C++ code using the `synapnb` library. The library can be used as usual; the only constraint is to use the SyNAP allocator, which can be obtained with `synap_allocator()`.

Another option is to use custom JNI C code using the `synap_device` library. In this case, there are no constraints. The library allows creating new I/O buffers with the function `synap_allocate_io_buffer`. It is also possible to use existing DMABUF handles obtained, for instance, from gralloc with `synap_create_io_buffer`. The DMABUF can be accessed with standard Linux DMABUF APIs (i.e., `mmap`/`munmap`/`ioctls`).

SyNAP provides a sample JNI library that shows how to use the `synap_device` library in a Java application. The code is located in `java` and can be included in an existing Android application by adding the following lines to the `settings.gradle` of the application:

```groovy
include ':synap'
project(':synap').projectDir = file("[absolute path to synap]/java")
```
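
The application module can then declare a dependency on it in its `build.gradle`; a minimal sketch, assuming the standard Gradle project layout:

```groovy
// app/build.gradle (sketch)
dependencies {
    implementation project(':synap')
}
```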

The code can then be used as follows:

```java
package com.synaptics.synap;

public class InferenceEngine {

    /**
     * Perform inference using the given model
     *
     * @param model EBG model
     * @param inputs arrays containing model input data, one byte array per network input,
     *               of the size expected by the network
     * @param outputs arrays where to store the output of the network, one byte array per network
     *                output, of the size expected by the network
     */
    public static void infer(byte[] model, byte[][] inputs, byte[][] outputs) {

        Synap synap = Synap.getInstance();

        // load the network
        Network network = synap.createNetwork(model);

        // create input buffers and attach them to the network
        IoBuffer[] inputBuffers = new IoBuffer[inputs.length];
        Attachment[] inputAttachments = new Attachment[inputs.length];

        for (int i = 0; i < inputs.length; i++) {
            // create the input buffer of the desired length
            inputBuffers[i] = synap.createIoBuffer(inputs[i].length);

            // attach the buffer to the network (make sure you keep a reference to the
            // attachment to prevent it from being garbage collected and destroyed)
            inputAttachments[i] = network.attachIoBuffer(inputBuffers[i]);

            // set the buffer as the i-th input of the network
            inputAttachments[i].useAsInput(i);

            // copy the input data to the buffer
            inputBuffers[i].copyFromBuffer(inputs[i], 0, 0, inputs[i].length);
        }

        // create the output buffers and attach them to the network
        IoBuffer[] outputBuffers = new IoBuffer[outputs.length];
        Attachment[] outputAttachments = new Attachment[outputs.length];

        for (int i = 0; i < outputs.length; i++) {
            // create the output buffer of the desired length
            outputBuffers[i] = synap.createIoBuffer(outputs[i].length);

            // attach the buffer to the network (make sure you keep a reference to the
            // attachment to prevent it from being garbage collected and destroyed)
            outputAttachments[i] = network.attachIoBuffer(outputBuffers[i]);

            // set the buffer as the i-th output of the network
            outputAttachments[i].useAsOutput(i);
        }

        // run the network
        network.run();

        // copy the result data from the output buffers to the output arrays
        for (int i = 0; i < outputs.length; i++) {
            outputBuffers[i].copyToBuffer(outputs[i], 0, 0, outputs[i].length);
        }

        // release resources (this will happen automatically when the objects are garbage
        // collected, but that may take some time, so it is better to release them explicitly
        // as soon as possible)

        network.release(); // this will automatically release the attachments

        for (int i = 0; i < inputs.length; i++) {
            inputBuffers[i].release();
        }

        for (int i = 0; i < outputs.length; i++) {
            outputBuffers[i].release();
        }
    }
}
```
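
A call site might then look like the following sketch, where the model path, input file, and output size are placeholders chosen for illustration (they must match the actual network deployed on the device):

```java
import com.synaptics.synap.InferenceEngine;
import java.nio.file.Files;
import java.nio.file.Paths;

public class InferDemo {
    public static void main(String[] args) throws Exception {
        // placeholder paths and sizes: adjust to the model actually deployed on the device
        byte[] model = Files.readAllBytes(Paths.get(
                "/vendor/firmware/models/image_classification/imagenet/model/mobilenet_v2_1.0_224_quant/model.synap"));
        byte[][] inputs = { Files.readAllBytes(Paths.get("/data/local/tmp/input_224x224.rgb")) };
        byte[][] outputs = { new byte[1001] };   // assumed one byte per class for a quantized imagenet model
        InferenceEngine.infer(model, inputs, outputs);
        System.out.println("Confidence of class 1: " + (outputs[0][1] & 0xff));
    }
}
```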

> **Note**:
>
> To simplify application development, by default the VSSDK allows untrusted applications (such as applications sideloaded or downloaded from the Google Play store) to use the SyNAP API. Since the API uses limited hardware resources, this can lead to situations in which a third-party application interferes with platform processes. To restrict access to SyNAP to platform applications only, remove the file `vendor/vsi/sepolicy/synap_device/untrusted_app.te`.