Quantization and Deployment of Deep Neural Networks on Microcontrollers
- URL: http://arxiv.org/abs/2105.13331v1
- Date: Thu, 27 May 2021 17:39:06 GMT
- Title: Quantization and Deployment of Deep Neural Networks on Microcontrollers
- Authors: Pierre-Emmanuel Novac (1), Ghouthi Boukli Hacene (2 and 3), Alain
Pegatoquet (1), Benoît Miramond (1), Vincent Gripon (2) ((1) Université
Côte d'Azur, CNRS, LEAT, Sophia Antipolis, France, (2) IMT Atlantique,
Brest, France, (3) MILA, Montreal, Canada)
- Abstract summary: This work focuses on quantization and deployment of deep neural networks onto low-power 32-bit microcontrollers.
A new framework for end-to-end deep neural networks training, quantization and deployment is presented.
Execution using single-precision 32-bit floating-point as well as fixed-point on 8- and 16-bit integers is supported.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Embedding Artificial Intelligence onto low-power devices is a challenging
task that has been partly overcome with recent advances in machine learning and
hardware design. Presently, deep neural networks can be deployed on embedded
targets to perform different tasks such as speech recognition, object detection
or Human Activity Recognition. However, there is still room for optimization of
deep neural networks onto embedded devices. These optimizations mainly address
power consumption, memory and real-time constraints, but also ease of
deployment at the edge. Moreover, there is still a need for a better
understanding of what can be achieved for different use cases. This work
focuses on quantization and deployment of deep neural networks onto low-power
32-bit microcontrollers. The quantization methods relevant in the context of
embedded execution on a microcontroller are first outlined. Then, a new
framework for end-to-end deep neural network training, quantization and
deployment is presented. This framework, called MicroAI, is designed as an
alternative to existing inference engines (TensorFlow Lite for Microcontrollers
and STM32Cube.AI). Our framework can indeed be easily adjusted and/or extended
for specific use cases. Execution using single-precision 32-bit floating-point
as well as fixed-point on 8- and 16-bit integers is supported. The proposed
quantization method is evaluated on three different datasets (UCI-HAR, Spoken
MNIST and GTSRB). Finally, a comparative study between MicroAI and the two
existing embedded inference engines is provided in terms of memory and power
efficiency. On-device evaluation is done using ARM Cortex-M4F-based
microcontrollers (Ambiq Apollo3 and STM32L452RE).
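As a concrete illustration of the fixed-point formats mentioned above, the sketch below quantizes a few float weights into 8-bit Qm.n values with a power-of-two scale factor. This is a minimal sketch of uniform fixed-point quantization in general, not MicroAI's actual implementation; all names and values are illustrative.

```c
/* Minimal sketch of uniform fixed-point quantization of the kind the
 * paper evaluates: weights are mapped to 8-bit integers in Qm.n format
 * with a power-of-two scale factor. Names are illustrative, not taken
 * from MicroAI. Extending to 16 bits only widens the type and bounds. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Pick the number of fractional bits n so that the largest magnitude
 * still fits in a signed 8-bit integer (Q(7-n).n format). */
static int choose_frac_bits(const float *w, int len) {
    float max_abs = 0.0f;
    for (int i = 0; i < len; i++) {
        float a = fabsf(w[i]);
        if (a > max_abs) max_abs = a;
    }
    int int_bits = (int)ceilf(log2f(max_abs + 1e-8f));
    if (int_bits < 0) int_bits = 0;     /* all-fractional case */
    return 7 - int_bits;                /* 7 data bits in an int8 */
}

static int8_t quantize(float x, int frac_bits) {
    long q = lroundf(x * (float)(1 << frac_bits));
    if (q > 127) q = 127;               /* saturate to int8 range */
    if (q < -128) q = -128;
    return (int8_t)q;
}

int main(void) {
    const float w[4] = {0.75f, -1.25f, 0.031f, 1.9f};
    int n = choose_frac_bits(w, 4);
    printf("Q%d.%d format\n", 7 - n, n);
    for (int i = 0; i < 4; i++) {
        int8_t q = quantize(w[i], n);
        printf("%+.3f -> %4d -> %+.3f\n", w[i], q,
               (float)q / (float)(1 << n));
    }
    return 0;
}
```

Power-of-two scale factors are attractive on microcontrollers because dequantization reduces to a bit shift rather than a multiply.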
Related papers
- Accelerating TinyML Inference on Microcontrollers through Approximate Kernels [3.566060656925169]
In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on microcontrollers.
Our evaluation on an STM32-Nucleo board and 2 popular CNNs trained on the CIFAR-10 dataset shows that, compared to state-of-the-art exact inference, our solutions achieve an average latency reduction of 21%.
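A minimal sketch of one common approximate-computing idea is shown below: truncating the low bits of each operand before the multiply-accumulate, trading a bounded error for cheaper narrow multiplies. The paper's actual kernel designs are more elaborate; everything here is an illustrative assumption.

```c
/* Hedged sketch of approximate computing via operand truncation (not
 * the paper's exact kernels): drop low bits before the MAC so narrower,
 * cheaper multiplies can be used, then rescale the product. */
#include <stdint.h>
#include <stdio.h>

/* Exact int8 dot product, for reference. */
static int32_t dot_exact(const int8_t *a, const int8_t *b, int len) {
    int32_t acc = 0;
    for (int i = 0; i < len; i++) acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Approximate dot product: drop 'trunc' low bits of each operand,
 * multiply the narrowed values, then shift the product back. */
static int32_t dot_approx(const int8_t *a, const int8_t *b,
                          int len, int trunc) {
    int32_t acc = 0;
    for (int i = 0; i < len; i++) {
        int32_t ai = a[i] >> trunc;       /* narrowed operands */
        int32_t bi = b[i] >> trunc;
        acc += (ai * bi) << (2 * trunc);  /* restore magnitude */
    }
    return acc;
}

int main(void) {
    int8_t a[4] = {100, -50, 25, 12};
    int8_t b[4] = {-90, 40, 77, -3};
    printf("exact  = %ld\n", (long)dot_exact(a, b, 4));   /* -9111 */
    printf("approx = %ld\n", (long)dot_approx(a, b, 4, 2));
    return 0;
}
```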
arXiv Detail & Related papers (2024-09-25T11:10:33Z)
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
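The sketch below illustrates the general lookup-table idea for ultra-low-precision kernels: with 2-bit weights there are only four possible products per activation, so every multiply in the inner loop becomes a table lookup. This is a simplified, assumption-based illustration, not DeepGEMM's actual memory layout or SIMD kernel.

```c
/* Hedged sketch of lookup-table-based low-precision dot products
 * (simplified; not DeepGEMM's actual implementation): 2-bit weights
 * index a small per-activation product table instead of multiplying. */
#include <stdint.h>
#include <stdio.h>

/* Codebook for 2-bit weights; values are illustrative. */
static const int8_t WCODE[4] = {-2, -1, 1, 2};

int main(void) {
    const int8_t act[4]   = {37, -12, 90, 5};   /* activations      */
    const uint8_t widx[4] = {3, 0, 2, 1};       /* 2-bit weight ids */

    int32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        /* Table of the 4 possible products for this activation. In a
         * real kernel such tables are built once per activation vector
         * and reused across all output rows, amortizing their cost. */
        int16_t lut[4];
        for (int k = 0; k < 4; k++) lut[k] = (int16_t)WCODE[k] * act[i];
        acc += lut[widx[i]];                    /* lookup, no multiply */
    }
    printf("dot = %ld\n", (long)acc);
    return 0;
}
```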
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
- Evaluation of Convolution Primitives for Embedded Neural Networks on 32-bit Microcontrollers [0.0]
We propose an implementation for the ARM Cortex-M processor family with an open-source deployment platform (NNoM).
Our benchmark reveals a linear relationship between theoretical MACs and energy consumption.
We discuss the significant reduction in latency and energy consumption due to the use of SIMD instructions.
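On Cortex-M4/M7 the benefit largely comes from packed dual-MAC instructions such as SMLAD, which performs two 16x16 multiply-accumulates on packed 32-bit words in one instruction. Below is a portable C emulation of that packed operation as a sketch; on a real target one would use the CMSIS __SMLAD intrinsic rather than this emulation.

```c
/* Hedged sketch of the packed dual-MAC behind Cortex-M SIMD speedups.
 * This emulates SMLAD in portable C; on hardware, use CMSIS __SMLAD. */
#include <stdint.h>
#include <stdio.h>

/* Emulate SMLAD: (x.lo * y.lo) + (x.hi * y.hi) + acc */
static int32_t smlad_emul(uint32_t x, uint32_t y, int32_t acc) {
    int16_t xl = (int16_t)(x & 0xFFFF), xh = (int16_t)(x >> 16);
    int16_t yl = (int16_t)(y & 0xFFFF), yh = (int16_t)(y >> 16);
    return acc + (int32_t)xl * yl + (int32_t)xh * yh;
}

int main(void) {
    /* Two int16 operand pairs packed into one 32-bit word each. */
    int16_t a[2] = {300, -7}, b[2] = {-2, 1000};
    uint32_t pa = (uint16_t)a[0] | ((uint32_t)(uint16_t)a[1] << 16);
    uint32_t pb = (uint16_t)b[0] | ((uint32_t)(uint16_t)b[1] << 16);

    /* One "instruction" does two MACs: 300*-2 + -7*1000 = -7600 */
    printf("acc = %ld\n", (long)smlad_emul(pa, pb, 0));
    return 0;
}
```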
arXiv Detail & Related papers (2023-03-19T16:17:19Z)
- Keyword Spotting System and Evaluation of Pruning and Quantization Methods on Low-power Edge Microcontrollers [7.570300579676175]
Keyword spotting (KWS) is beneficial for voice-based user interactions with low-power devices at the edge.
This paper shows our small-footprint KWS system running on an STM32F7 microcontroller with a Cortex-M7 core @ 216 MHz and 512 KB of static RAM.
arXiv Detail & Related papers (2022-08-04T16:49:45Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
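A minimal sketch of a latency- and accuracy-aware reward of the kind this summary describes follows, assuming a simple penalty on deadline overrun; the weighting, names, and functional form are illustrative assumptions, not the paper's formulation.

```c
/* Hedged sketch of a latency- and accuracy-aware reward. The lambda
 * weight and deadline penalty are illustrative assumptions. */
#include <stdio.h>

/* Higher accuracy is rewarded; latency beyond a deadline is penalized. */
static double reward(double accuracy, double latency_ms,
                     double deadline_ms, double lambda) {
    double overrun = latency_ms > deadline_ms ? latency_ms - deadline_ms
                                              : 0.0;
    return accuracy - lambda * overrun;
}

int main(void) {
    /* 92% accurate but 8 ms over a 40 ms deadline: 0.92 - 0.01*8 */
    printf("r = %.3f\n", reward(0.92, 48.0, 40.0, 0.01));
    return 0;
}
```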
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
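The sketch below shows the patch-by-patch idea in a simplified 1D setting: only one patch plus its halo is buffered at a time, so peak memory drops from the full feature-map size to roughly the patch size. The sizes and the 3-tap filter are illustrative, not MCUNetV2's actual scheduler.

```c
/* Hedged 1D sketch of patch-by-patch inference: instead of buffering
 * the whole intermediate feature map, hold one patch plus its halo in
 * RAM at a time, lowering peak memory. Sizes are illustrative. */
#include <stdio.h>

#define N      16   /* input length                    */
#define PATCH   4   /* patch size                      */
#define HALO    1   /* overlap needed by a 3-tap filter */

int main(void) {
    float in[N], out[N];
    for (int i = 0; i < N; i++) in[i] = (float)i;

    /* Peak working buffer: PATCH + 2*HALO floats instead of N floats. */
    float buf[PATCH + 2 * HALO];

    for (int p = 0; p < N; p += PATCH) {
        /* Stage the patch plus halo (zero-padded at the borders). */
        for (int i = -HALO; i < PATCH + HALO; i++) {
            int src = p + i;
            buf[i + HALO] = (src >= 0 && src < N) ? in[src] : 0.0f;
        }
        /* 3-tap moving-average "convolution" over this patch only. */
        for (int i = 0; i < PATCH && p + i < N; i++)
            out[p + i] = (buf[i] + buf[i + 1] + buf[i + 2]) / 3.0f;
    }

    for (int i = 0; i < N; i++) printf("%.2f ", out[i]);
    printf("\n");
    return 0;
}
```

The trade-off, as the paper notes, is recomputation in the halo regions, which neural architecture search can co-optimize against.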
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Neural network relief: a pruning algorithm based on neural activity [47.57448823030151]
We propose a simple importance-score metric that deactivates unimportant connections.
We achieve comparable performance for LeNet architectures on MNIST.
The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations.
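A minimal sketch of activity-based importance scoring follows, assuming a weight-magnitude-times-mean-activation metric and a fixed threshold; the paper's exact metric differs, and all values here are illustrative.

```c
/* Hedged sketch of importance-score pruning: score each connection by
 * |weight| times the mean |activation| of its input neuron, then
 * deactivate (zero) connections below a threshold. The metric and
 * threshold are illustrative assumptions, not the paper's exact ones. */
#include <math.h>
#include <stdio.h>

#define IN  4
#define OUT 2

int main(void) {
    /* Weights of a tiny dense layer and mean |activation| per input. */
    float w[OUT][IN] = {{ 0.8f, -0.02f,  0.4f, 0.01f},
                        {-0.5f,  0.3f, -0.03f, 0.9f}};
    float mean_abs_x[IN] = {1.0f, 0.2f, 0.7f, 1.5f};
    float threshold = 0.1f;   /* illustrative cut-off */

    int kept = 0;
    for (int o = 0; o < OUT; o++) {
        for (int i = 0; i < IN; i++) {
            float score = fabsf(w[o][i]) * mean_abs_x[i];
            if (score < threshold) w[o][i] = 0.0f;  /* deactivate */
            else kept++;
        }
    }
    printf("kept %d of %d connections\n", kept, OUT * IN);
    return 0;
}
```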
arXiv Detail & Related papers (2021-09-22T15:33:49Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
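The sketch below decomposes small integer weights into two {-1, +1} branches, so a quantized dot product becomes a weighted sum of binary dot products, which hardware can execute with XNOR/popcount. The greedy decomposition and sizes are illustrative assumptions, not the paper's exact scheme.

```c
/* Hedged sketch of {-1,+1} decomposition: an integer weight
 * w = sum_i 2^i * b_i with each b_i in {-1,+1}, so one quantized dot
 * product splits into several binary dot products. Illustrative only. */
#include <stdio.h>

#define N       4
#define BRANCH  2   /* two {-1,+1} branches cover w in {-3,-1,+1,+3} */

int main(void) {
    int w[N] = {3, -1, 1, -3};   /* weights of form 2*b1 + b0 */
    int x[N] = {5, -2, 7, 1};    /* activations                */

    /* Greedy decomposition into binary branches b[i][j] in {-1,+1}. */
    int b[BRANCH][N];
    for (int j = 0; j < N; j++) {
        int r = w[j];
        for (int i = BRANCH - 1; i >= 0; i--) {
            b[i][j] = (r >= 0) ? 1 : -1;   /* sign of the remainder */
            r -= b[i][j] * (1 << i);
        }
    }

    /* Dot product = sum over branches of 2^i * (binary dot product). */
    int acc = 0, ref = 0;
    for (int i = 0; i < BRANCH; i++) {
        int part = 0;
        for (int j = 0; j < N; j++) part += b[i][j] * x[j];
        acc += (1 << i) * part;
    }
    for (int j = 0; j < N; j++) ref += w[j] * x[j];
    printf("decomposed = %d, reference = %d\n", acc, ref);  /* 21, 21 */
    return 0;
}
```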
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Efficient Neural Network Deployment for Microcontroller [0.0]
This paper explores and generalizes convolutional neural network deployment for microcontrollers.
The memory savings and performance will be compared with CMSIS-NN framework developed for ARM Cortex-M CPUs.
The final purpose is to develop a tool that consumes a PyTorch model with trained network weights and turns it into an optimized C/C++ inference engine for microcontrollers with low memory (kilobyte level) and limited computing capability.
arXiv Detail & Related papers (2020-07-02T19:21:05Z)