Revolutionizing Mobile AI: How NetsPresso® Turbocharges Semantic Segmentation Models for Real-Time Performance

Author

Nota AI Marketing team

YoonJae Yang

AI Application Developer, Nota AI

  • Android app and model deployment

Hyungjun Lee

Research Engineer, Nota AI

  • AI model development and lightweighting


Introduction

In the world of artificial intelligence, semantic segmentation stands out as a crucial task: a segmentation model classifies every pixel of an image or video, marking which pixels belong to specific target objects. In this article, our focus narrows down to a rather familiar target - "people." The applications of semantic segmentation are diverse, ranging from background removal during video conferencing and video calls to enhancing the art of photo editing.

However, here's the catch. Semantic segmentation models demand a substantial amount of computational power. Why? Because they must meticulously classify every single pixel within an input image. For instance, they need to determine whether a given pixel belongs to the background or represents a person. This high computational complexity poses a significant challenge, especially when real-time background removal is a necessity, as in the case of video conferencing or video calls.
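To put that in perspective, here is a rough back-of-the-envelope calculation. The 512×512 input resolution and 30 FPS target below are illustrative assumptions, not values from our experiments:

```python
# Illustrative only: resolution and frame rate are assumed, not measured values.
height, width = 512, 512           # assumed input resolution
pixels_per_frame = height * width  # every pixel needs a person/background decision
target_fps = 30                    # a typical real-time video target

decisions_per_second = pixels_per_frame * target_fps
print(f"{pixels_per_frame:,} pixel classifications per frame")       # 262,144
print(f"{decisions_per_second:,} pixel classifications per second")  # 7,864,320
```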

 

Test Platform

Our journey into the world of mobile optimization begins with a specific set of tools and hardware:

  • Operating System: Android

  • AI Model: PIDNet (Xu et al., CVPR 2023)

  • AI Model Format: ONNX

  • Target Hardware: Galaxy S23 (SM-S911**)

Additionally, we have made our project available on GitHub, making it accessible to fellow developers and AI enthusiasts. You can explore our work at this GitHub repository. We're proud to acknowledge the contributions of YoonJae Yang and Hyungjun Lee to this project.
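Before moving to the phone, a quick desktop-side sanity check with ONNX Runtime in Python can confirm that the exported model loads and produces a segmentation map of the expected shape. This is only a minimal sketch: the file name pidnet.onnx and the 1×3×512×512 input shape are assumptions, not the exact values used in our project.

```python
import numpy as np
import onnxruntime as ort

# Assumed file name; replace with the actual exported PIDNet ONNX file.
session = ort.InferenceSession("pidnet.onnx", providers=["CPUExecutionProvider"])

inp = session.get_inputs()[0]
print("input:", inp.name, inp.shape)  # e.g. [1, 3, 512, 512]

# Dummy image tensor matching the assumed input shape.
dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})

for out_meta, out in zip(session.get_outputs(), outputs):
    print("output:", out_meta.name, out.shape)  # per-pixel class scores
```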

 

Scenario Using NetsPresso®

Now, let's delve into a critical scenario where we explore how to optimize segmentation models for mobile devices using NetsPresso®.

 

Initial Challenge

We embarked on our journey using PIDNet, a cutting-edge model known for its efficiency in segmentation tasks. However, despite its prowess, we encountered a significant obstacle: on the device, the original model ran at only 1.86 FPS (frames per second), i.e., well over half a second per frame. To put it simply, this meant a noticeable lag between processed frames, which isn't acceptable for real-time applications.
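The FPS figures quoted in this article were measured on-device in the Android app, but the same idea can be sketched in Python with ONNX Runtime: time repeated inference calls and convert the mean latency into frames per second. The warm-up and iteration counts below are arbitrary choices, and the input shape is assumed as above.

```python
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("pidnet.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)  # assumed input shape

# Warm-up runs so one-time initialization cost is not counted.
for _ in range(5):
    session.run(None, {inp.name: dummy})

# Timed runs: average latency over repeated inferences.
n_runs = 50
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, {inp.name: dummy})
elapsed = time.perf_counter() - start

latency_ms = elapsed / n_runs * 1000
print(f"mean latency: {latency_ms:.1f} ms -> {1000 / latency_ms:.2f} FPS")
```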

 

The Power of Compression

To address this challenge, we turned to NetsPresso® and harnessed its structured pruning capabilities. Our objective was clear: compress the model without sacrificing accuracy. We experimented with two levels of compression - 40% and 50%.
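NetsPresso®'s structured pruning is applied through its own platform, so the snippet below is not its API. It is only a conceptual illustration, using PyTorch's built-in pruning utilities, of what "removing 40% of the channels of a convolution layer" means. Note that torch.nn.utils.prune only zeroes the selected channels, whereas true structured pruning, as performed by NetsPresso®, physically removes them so the compressed model actually runs faster.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A stand-in convolution layer; PIDNet itself is built from many such layers.
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)

# Structured pruning: zero out 40% of the output channels (dim=0 of the weight),
# choosing the channels with the smallest L2 norm.
prune.ln_structured(conv, name="weight", amount=0.4, n=2, dim=0)
prune.remove(conv, "weight")  # make the zeroed weights permanent

# Count how many output channels survived.
channel_norms = conv.weight.detach().flatten(1).norm(dim=1)
kept = int((channel_norms > 0).sum())
print(f"kept {kept} of {conv.out_channels} output channels")  # ~77 of 128
```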

40% Compression

Our first experiment was a revelation. By compressing the model by 40%, we not only managed to maintain nearly the same accuracy (measured by mean Intersection over Union, mIoU) but also achieved a significant reduction in latency. We compared the original model's performance with the 40% compressed model, and the results were impressive.

Table 1: Performance Comparison - Original vs. 40% Compressed Model

The 40% compressed model significantly reduced the time taken to produce results compared to the original model.

50% Compression

Encouraged by the success of our first experiment, we pushed the boundaries further by compressing the model by 50%. This time, there was some loss in accuracy, but the gains in latency were noteworthy.

Even with a 50% compression, we managed to significantly improve latency while accepting a slight dip in accuracy.

 

Model Performance

Table 2: Summary of the performance of all PIDNet models discussed

These results underline the significant improvement in latency achieved by compressing the PIDNet model using NetsPresso®'s structured pruning. This optimization makes it a more viable choice for real-time applications, such as video conferencing and video calls, on mobile devices.

In conclusion, our journey into mobile optimization has shown that with the right tools and techniques, even the most computationally demanding AI models can be made efficient for mobile platforms. This opens up exciting possibilities for applications that require real-time, on-device AI processing.
