Optimizing AI Models for ST Target Devices with PyNetsPresso Integration

Introduction

The demand for AI at the edge is growing rapidly, driven by the cost efficiency and low latency it offers. However, the rapid advancement of AI models has also driven up model sizes, making them increasingly difficult to deploy on edge devices, whose memory capacity and power budgets are tightly constrained. Optimizing AI models for maximum size reduction with minimal accuracy loss is therefore a critical task for edge computing. NetsPresso®, a hardware-aware AI model optimization platform, optimizes AI models through structural pruning and filter decomposition. Its companion tool PyNetsPresso, a Python-based development tool for optimizing AI models for target devices, enables this optimization with just a few lines of code.

To simplify deployment on edge devices, many semiconductor companies provide their own SDKs and software tools for running AI models on the devices they manufacture. STMicroelectronics offers tools such as STM32 Model Zoo and STM32Cube.AI Developer Cloud to support fine-tuning and deployment of AI models: STM32 Model Zoo covers AI model selection, training, and evaluation, while STM32Cube.AI Developer Cloud handles quantization and benchmarking on ST target devices.

To help users of ST devices achieve better performance with optimized models, we have integrated PyNetsPresso's model pruning into the STM32 Model Zoo + STM32Cube.AI Developer Cloud pipeline.

Concretely, we take the following steps:

  1. Use STM32 Model Zoo to obtain a MobileNetV2 model pre-trained on ImageNet, then train, quantize, and evaluate it with the TensorFlow flowers dataset.

  2. Use PyNetsPresso to prune the model, reducing both its size and its inference latency.

  3. Fine-tune the pruned model with STM32 Model Zoo on the TensorFlow flowers dataset.

  4. Compare the performance of the above models on the STM32H747I-DISCO using STM32Cube.AI Developer Cloud.

  5. Show that the model pruned with PyNetsPresso suffers only a minimal drop in accuracy.

In particular, this demo uses the upcoming release, STM32 Model Zoo v2.0, set for release at the end of this November. The new version is expected to add support for custom model files and the ability to selectively apply fine-tuning options of your choice.

Figure 1. Pipeline of STM32 Model Zoo and STM32Cube.AI

Install and Set Up STM32 Model Zoo

The following steps are required to use STM32 Model Zoo:

  1. Sign up for STM32Cube.AI Developer Cloud to run benchmarks on ST devices.

  2. Log in to STM32Cube.AI Developer Cloud:

import os
import getpass

# The Model Zoo tools read these credentials from environment variables
email = 'YOUR_EMAIL_ADDRESS'
os.environ['stmai_username'] = email
print('Enter your password')
password = getpass.getpass()
os.environ['stmai_password'] = password
os.environ['NO_SSL_VERIFY'] = "1"

  3. Install STM32 Model Zoo (clone the git repository):

git clone https://github.com/STMicroelectronics/stm32ai-modelzoo.git

  4. Install the required Python libraries:

pip install -r requirements.txt

  5. Install PyNetsPresso:

pip install netspresso==1.1.7
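To verify the installation, you can check the package version from Python. A quick sanity check (importlib.metadata requires Python 3.8+):

# Confirm the installed PyNetsPresso version matches the one used in this post
from importlib.metadata import version

print(version("netspresso"))  # expected: 1.1.7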

  6. Set the paths needed to use Model Zoo:

import os
import sys

from IPython import get_ipython

path_to_modelzoo = "../../image_classification/"

# Make the Model Zoo packages importable from this notebook
modelzoo_subdirs = [
    '../../common', './src/data_augmentation', './src/preprocessing',
    './src/training', './src/utils', './src/evaluation', './deployment',
    './src/quantization', './src/prediction', './src/benchmarking', './src/models',
]
for subdir in modelzoo_subdirs:
    sys.path.append(os.path.relpath(path_to_modelzoo + subdir))
sys.path.append(os.path.relpath('./notebook_utils'))

from notebook_utils import CustomMagics
get_ipython().register_magics(CustomMagics)

Preparing the Base Model

STM32 Model Zoo provides various pre-trained models in its models directory. We choose a MobileNetV2 model pre-trained on ImageNet and train this classification model on the tf_flowers dataset.

Figure 2. Image samples from the tf_flowers classification dataset

You can also use any custom classification dataset, as long as it follows this format:

dataset_directory/
  class_a/
    a_image_1.jpg
    a_image_2.jpg
  class_b/
    b_image_1.jpg
    b_image_2.jpg
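Before training, it can be helpful to sanity-check that your dataset follows this layout. A minimal sketch (the dataset path is a placeholder to replace with your own):

# Count images per class in a folder-per-class dataset
from pathlib import Path

dataset_dir = Path('../../datasets/flowers')  # replace with your dataset path
for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
    n_images = sum(1 for f in class_dir.iterdir()
                   if f.suffix.lower() in {'.jpg', '.jpeg', '.png'})
    print(f'{class_dir.name}: {n_images} images')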

Then, set the following parameters:

operation_mode : The operation mode for this run. Set it to chain_tqe to run the training followed by the quantization and evaluation of the model.

general.model_path : Path to the model to be trained. You can use any model from the Model Zoo or a custom .h5 model.

dataset.training_path : Path to the training dataset. The training set should be a directory containing a sub-directory for each class.

hydra.run.dir : Path to save the trained model.
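These parameters can be passed as command-line overrides (as we do below) or set in the Model Zoo's user_config.yaml file, discussed next. To inspect the current values from the notebook, here is a minimal sketch; the file location and key names are assumptions based on the parameters above, and PyYAML is required:

# Peek at the Model Zoo configuration before running the chain
import yaml

with open(path_to_modelzoo + 'src/user_config.yaml') as f:  # assumed location
    cfg = yaml.safe_load(f)

print(cfg.get('operation_mode'))
print(cfg.get('general', {}).get('model_path'))
print(cfg.get('dataset', {}).get('training_path'))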

Before fine-tuning, it's important to review the user_config.yaml file in image_classification/src and check whether any parameters need to be modified. Once the configuration looks right, we run the following code, which performs model training, quantization to int8, and evaluation in sequence.

%%tee trained_model_output
%run ../../image_classification/src/stm32ai_main.py \
    operation_mode='chain_tqe' \
    general.model_path='../../image_classification/pretrained_models/mobilenetv2.h5' \
    dataset.training_path='../../datasets/flowers' \
    hydra.run.dir='experiments_outputs/training'

Measuring the Benchmark of the Base Model with STM32Cube.AI Developer Cloud

With STM32Cube.AI Developer Cloud, benchmarks can be run on a range of ST devices. Here we benchmark on the STM32H747I-DISCO; the result will later be compared with that of the pruned model.

%%tee baseline_benchmark_output
%run ../../image_classification/src/stm32ai_main.py \
    operation_mode='benchmarking' \
    general.model_path='experiments_outputs/training/quantized_models/quantized_model.tflite' \
    hydra.run.dir='experiments_outputs/baseline_benchmark'

The benchmark result of the base model is:

Optimizing the AI Model by Pruning with PyNetsPresso

Before using PyNetsPresso, you must sign up for NetsPresso®. Then, sign in with your email and password in the code section below.

import getpass
from netspresso.client import SessionClient
from netspresso.compressor import ModelCompressor

email = 'YOUR_NETSPRESSO_EMAIL_ADDRESS'
print('Enter your password')
password = getpass.getpass()

session = SessionClient(email=email, password=password)
compressor = ModelCompressor(user_session=session)

After signing in, upload the AI model to be pruned with PyNetsPresso. After uploading the model, save the model_id of your uploaded model.

from netspresso.compressor import Task, Framework

model = compressor.upload_model(
    model_name='model',
    task=Task.IMAGE_CLASSIFICATION,
    framework=Framework.TENSORFLOW_KERAS,
    file_path='experiments_outputs/training/saved_models/best_model.h5',
    input_shapes=[{'batch': 1, 'channel': 3, 'dimension': [128, 128]}])
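The upload call returns an object describing the uploaded model; printing its ID is an easy way to save it for the compression step (this assumes the returned object exposes a model_id attribute, as in the PyNetsPresso examples):

# Save this ID for the compression step below
# (assumes the upload response exposes a `model_id` attribute)
print(model.model_id)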

For advanced compression using PyNetsPresso, enter the required parameters. We use L2-norm pruning as the compression method, Structured Layer-adaptive Sparsity for the Magnitude-based Pruning (SLAMP) as the recommendation method, and set the recommendation ratio to 0.5.

For more options, please visit the PyNetsPresso docs.

import os

from netspresso.compressor import CompressionMethod
from netspresso.compressor import RecommendationMethod

if not os.path.exists('experiments_outputs/compressed_models'):
    os.makedirs('experiments_outputs/compressed_models')

compressed_model = compressor.recommendation_compression(
    model_id='YOUR_UPLOADED_MODEL_ID',  # the model_id you saved after uploading
    model_name='compressed_model.h5',
    compression_method=CompressionMethod.PR_L2,
    recommendation_method=RecommendationMethod.SLAMP,
    recommendation_ratio=0.5,
    output_path='experiments_outputs/compressed_models/compressed_model.h5')
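As a quick check before retraining, you can compare the on-disk sizes of the original and pruned Keras models. This is only a rough proxy; the quantized TFLite benchmark below is the meaningful comparison:

# Compare file sizes of the original and pruned .h5 models
import os

orig_size = os.path.getsize('experiments_outputs/training/saved_models/best_model.h5')
pruned_size = os.path.getsize('experiments_outputs/compressed_models/compressed_model.h5')
print(f'original: {orig_size / 1024:.1f} KiB, '
      f'pruned: {pruned_size / 1024:.1f} KiB '
      f'({100 * (1 - pruned_size / orig_size):.1f}% smaller)')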

Retraining, Quantization, Evaluation, and Benchmarking of the Pruned Model

To prepare the pruned model for comparison with the base model, we apply the same training, quantization, evaluation, and benchmarking steps described above to the pruned model. The latest version of STM32 Model Zoo allows convenient retraining with just the pruned model, simplifying the process.

%%tee retrained_model_output
%run ../../image_classification/src/stm32ai_main.py \
    operation_mode='chain_tqe' \
    general.model_path='experiments_outputs/compressed_models/compressed_model.h5' \
    dataset.training_path='../../datasets/flowers' \
    hydra.run.dir='experiments_outputs/retraining'

%%tee compressed_benchmark_output
%run ../../image_classification/src/stm32ai_main.py \
    operation_mode='benchmarking' \
    general.model_path='experiments_outputs/retraining/quantized_models/quantized_model.tflite' \
    hydra.run.dir='experiments_outputs/compressed_benchmark'

The benchmark result of the pruned model is:

[INFO] : Total RAM : 258.0 (KiB)
[INFO] : RAM Activations : 225.17 (KiB)
[INFO] : RAM Runtime : 32.83 (KiB)
[INFO] : Total Flash : 522.0 (KiB)
[INFO] : Flash Weights : 406.86 (KiB)
[INFO] : Estimated Flash Code : 115.14 (KiB)
[INFO] : MACCs : 19.1 (M)
[INFO] : Number of cycles : 40.454 (M)
[INFO] : Inference Time : 101.13 (ms)
[INFO] : Benchmark complete.

Benchmark result of the model pruned with PyNetsPresso on the STM32H747I-DISCO
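Since both benchmark logs were captured with the %%tee magic, you can compute the latency improvement directly in the notebook. A minimal sketch, assuming the custom %%tee magic from notebook_utils stores each cell's output as a string in the named variable:

# Extract 'Inference Time : X (ms)' from each captured benchmark log
import re

def inference_ms(log_text):
    match = re.search(r'Inference Time\s*:\s*([\d.]+)', str(log_text))
    return float(match.group(1))

base_ms = inference_ms(baseline_benchmark_output)
pruned_ms = inference_ms(compressed_benchmark_output)
print(f'Latency improvement: {100 * (base_ms - pruned_ms) / base_ms:.1f}%')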

Conclusion

In this post, we showed how to leverage STM32 Model Zoo to fine-tune pre-trained models and measure their performance on target devices using STM32Cube.AI. We also walked through the AI model pruning process with PyNetsPresso, demonstrating how well it fits alongside the software solutions that device companies provide. The results show that dramatic model compression is achievable with minimal accuracy loss, highlighting the power and convenience of PyNetsPresso: through L2-norm-based pruning, we achieved a 23.7% improvement in inference latency with an accuracy drop of just 1.78%.