Hardware-Aware AI Model Optimization with NetsPresso

Author Profile

Soner Yildirim

Soner is an electrical engineer by education and a data scientist by passion. He writes about data science topics such as machine learning and deep learning algorithms, data visualization, data analysis, and Python and R packages for data science. In his free time, he enjoys watching and playing soccer.


The tremendous growth of data science and machine learning has resulted in a rapidly increasing number of applications across a wide range of industries. More and more businesses have adopted machine learning to create value, improve their processes, and increase productivity and profitability. As a natural result, machine learning applications have spread beyond traditional environments such as the cloud and local servers.

Machine learning use cases in IoT devices, mobile phones, and many other edge devices are quite common now. Thanks to the improvements in machine learning tools and edge computing, the power of Edge AI has become more noticeable.

Edge AI simply means deploying machine learning and AI applications on physical devices such as sensors, mobile phones, and other electronics. It is called edge because computations are done at the edge (i.e., on the device where the data is located) rather than at a central hub such as the cloud or a local server.

NetsPresso is an AI model optimization platform for edge devices. It is hardware-aware, meaning you can select the device to optimize for. This is critical for Edge AI because, unlike cloud-based applications, edge devices have limited computation power and memory.
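To make "limited computation power and memory" concrete, here is a minimal sketch in plain PyTorch (not part of NetsPresso) that estimates a model's parameter count and FP32 size, the kind of footprint that has to fit on an edge device:

```python
import torch
import torchvision

# Rough footprint estimate: parameter count and FP32 size in MB.
# Plain PyTorch, shown only to illustrate the memory constraint.
def model_footprint(model: torch.nn.Module):
    params = sum(p.numel() for p in model.parameters())
    size_mb = params * 4 / (1024 ** 2)  # 4 bytes per FP32 parameter
    return params, size_mb

model = torchvision.models.resnet18(weights=None)
print(model_footprint(model))  # roughly 11.7M parameters, ~45 MB
```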

NetsPresso currently has three modules, which together cover the entire pipeline from model creation to deployment:

  • Model Searcher: Searches for optimized models specific to a target device

  • Model Compressor: Compresses models for computational efficiency

  • Model Launcher: Converts and/or packages models to a suitable format for a specific target device

You can use one or more of these modules to create a customized pipeline. For instance, if you already have a model, you can compress and deploy it to a target device using the model compressor and model launcher modules. If you just have a dataset and want to create a pipeline with NetsPresso, you can use all three modules.

Creating pipelines with NetsPresso shortens development time, letting machine learning engineers and practitioners focus on other areas of the product. Let's create a project on NetsPresso and walk through the process of using all three modules.

Building a Project with NetsPresso

Go to the NetsPresso website and click on one of the three modules in the drop-down menu of the start button. You will then be directed to the console as shown on the left:

In the models section, you can see the models you created in NetsPresso in addition to NetsPresso's ready-to-use models such as sample_yolov5s_VOC and sample_yolov4_efficientnet_voc. You can also upload a model and use it with the model compressor and model launcher modules.

We will create a model from scratch on NetsPresso, so we open the projects page in the console. We can also see our previous projects on this page. To start a new one, just click on the new project button.

It is important to note that NetsPresso is currently a low-code platform with a GUI. API and CLI interfaces are planned for future releases, which will further increase the usability, scalability, and efficiency of the platform.

On the next page, we need to select the training method. The two methods currently available are Quick Search and Retraining; a third method, Advanced Search, will be added in a future release. Retraining, as its name suggests, is for retraining an existing model. Since we are creating a new model, we select Quick Search.

In the Quick Search pane, the first step is to give the project a name and write a short memo of what the project is for. Then, we select the task, dataset, and target device.

The task type is object detection. We can select one of the available sample datasets or upload our own dataset, which can be done in the datasets tab of the console. One of NetsPresso's defining features is hardware awareness: we can optimize the model for a selected target device.

NetsPresso currently supports three of the most popular devices for edge AI model development: NVIDIA Jetson, Raspberry Pi, and Intel Server. More are on the way: future releases will let users choose from a wider assortment of devices, including Arm Virtual Hardware, the Renesas RZ Series, NVIDIA Jetson Orin, and more.

Below these selections are the settings for the output format. NetsPresso shows the suggested frameworks for the selected target device, so we don't have to worry about whether the resulting model's format will be suitable for the device.
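For a sense of what a device-suitable output format looks like in practice, here is a hedged sketch that exports a PyTorch model to ONNX, a common interchange format for edge runtimes. The model, file name, and input shape are illustrative placeholders, not NetsPresso's actual output:

```python
import torch
import torchvision

# Illustrative ONNX export; the model and the 640x640 input size are
# placeholders, not what NetsPresso produces for a given device.
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 640, 640)  # batch, channels, height, width

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["images"],
    output_names=["predictions"],
    opset_version=12,
)
```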

The next section on the project page is model training, where we can customize the target latency, default image size, and number of input channels. The notes under each setting guide us to a proper selection.
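To make the target latency setting concrete, the sketch below measures the average forward-pass time for a given image size and channel count. This is a generic local benchmark, not how NetsPresso profiles the actual target device:

```python
import time
import torch

# Generic latency benchmark: average forward-pass time in milliseconds.
# The image size and channel count mirror the project settings above.
def measure_latency(model, image_size=640, channels=3, runs=50):
    model.eval()
    x = torch.randn(1, channels, image_size, image_size)
    with torch.no_grad():
        for _ in range(10):  # warm-up runs to stabilize timings
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1000  # ms per inference
```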

In the advanced options section, we can change the number of training epochs and configure hyperparameter tuning and data augmentation. Once we are done with the project settings, we click on the next button, which takes us to the model recommendations page.

On this page, we see several suggested models along with their latency and size values. We choose the one that best fits our needs and click on the next button.

Then, we can select a training server and start the project. If no server is available at the moment, we can click on the start later button, which starts the project when a server becomes available. We can also see how many credits will be spent before confirming the start, a valuable insight especially when working on a large project. NetsPresso sends us an email when the project starts and ends, which makes it easier to track projects.

Once the training is completed, we can save the model. Saved models can be viewed on the projects page.

There is a summary section that shows the task type, dataset, target device, and the objective of the selected model. We are also able to see when the project was created. 

To see more detailed information about the project, click on the arrow at the bottom right, which takes us to the project details page. There we can see the selected settings as well as the hyperparameter values.

In the results tab, we can see the model performance based on different metrics and the settings of the selected device. The testing tab contains sample predictions from the test or validation datasets.

As of this writing, NetsPresso is a computer vision-focused platform that only supports detection tasks. Future releases will add functionality such as classification and segmentation.

Retrain, Compress, Convert

In the previous section, we learned how to create a new project with Quick Search. NetsPresso also allows for retraining, compressing, and converting existing models.

To retrain, compress, or convert a model, go to the models page in the console. We can select an existing model (previously created by us or provided by NetsPresso) or upload a new one. The desired action is started by clicking the corresponding icon on the right-hand side of each model.

If we click on the compress icon, it will take us to the Model Compressor module. It offers two options: Automatic Compression and Advanced Compression.

Click on the compress button under the desired option; then, select the base model to be compressed. In Automatic Compression, we can adjust the compression ratio, a value between 0 and 1. In Advanced Compression, we choose a compression method from a given list, and an explanation of each method is provided so that we know how the compression is done.
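To give a feel for what a compression ratio between 0 and 1 means, here is a minimal magnitude-pruning sketch in plain PyTorch. It is a generic illustration of one common compression method, not NetsPresso's actual algorithm:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model; the compression ratio is the fraction of weights removed.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
compression_ratio = 0.5  # between 0 and 1, as in Automatic Compression

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        # Zero out the smallest-magnitude weights (L1 criterion).
        prune.l1_unstructured(module, name="weight", amount=compression_ratio)
        prune.remove(module, "weight")  # make the pruning permanent
```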

The retraining procedure is similar to creating a new project with Quick Search; the only difference is that we need to select a base model to be retrained. Two important points to keep in mind when retraining:

  • If we are retraining the model right after the Model Search, we can select a different dataset to build another model with the same base model architecture. 

  • If we are retraining the model after compression, it is recommended to use the same dataset that was originally used to train the model so that accuracy can be recovered properly. It is more like a fine-tuning process, as sketched below.
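A minimal sketch of that recovery step, assuming a `compressed_model` and a `train_loader` over the original training set already exist (a generic classification loss stands in for brevity; detection training uses task-specific losses):

```python
import torch

# Illustrative accuracy-recovery loop after compression; assumes
# `compressed_model` and `train_loader` (built from the original
# training dataset) already exist. Hyperparameters are placeholders.
optimizer = torch.optim.SGD(compressed_model.parameters(), lr=1e-4, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()  # stand-in for a detection loss

compressed_model.train()
for epoch in range(5):  # a few epochs usually suffice for recovery
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(compressed_model(images), labels)
        loss.backward()
        optimizer.step()
```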

Conversion optimizes your model for a different target device. We select a base model to be converted, choose the target device, and click on the start converting button. After a while, the converted models become available on the models page of the console.
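After conversion, it is good practice to sanity-check the converted artifact with the target runtime. Below is a hedged smoke test that loads an ONNX file with ONNX Runtime and runs a dummy input; the file name and input shape are placeholders:

```python
import numpy as np
import onnxruntime as ort

# Smoke test for a converted ONNX model; the file name and input
# shape are placeholders for whatever the conversion produced.
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)

outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])  # inspect output tensor shapes
```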

Final thoughts

Edge AI makes machine learning applications more widely useful by increasing automation across a wide range of processes that would be impractical or infeasible to run in a centralized cloud environment.

Since computation power and memory are limiting factors in such cases, model optimization is of crucial importance. NetsPresso helps in this regard by providing a hardware-aware model optimization platform.
