Keyword Spotting System and Evaluation of Pruning and Quantization
Methods on Low-power Edge Microcontrollers
- URL: http://arxiv.org/abs/2208.02765v1
- Date: Thu, 4 Aug 2022 16:49:45 GMT
- Title: Keyword Spotting System and Evaluation of Pruning and Quantization
Methods on Low-power Edge Microcontrollers
- Authors: Jingyi Wang, Shengchen Li
- Abstract summary: Keyword spotting (KWS) is beneficial for voice-based user interactions with low-power devices at the edge.
This paper presents our small-footprint KWS system running on an STM32F7 microcontroller with a Cortex-M7 core @216MHz and 512KB of static RAM.
- Score: 7.570300579676175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keyword spotting (KWS) is beneficial for voice-based user interactions with
low-power devices at the edge. The edge devices are usually always-on, so edge
computing brings bandwidth savings and privacy protection. The devices
typically have limited memory, computational performance, power, and cost
budgets; Cortex-M based microcontrollers are a typical example. The challenge is to meet
the high computation and low-latency requirements of deep learning on these
devices. This paper first presents our small-footprint KWS system running on an
STM32F7 microcontroller with a Cortex-M7 core @216MHz and 512KB of static RAM.
Our selected convolutional neural network (CNN) architecture reduces the number
of operations for KWS to meet the constraints of edge devices. Our baseline
system generates a classification result every 37 ms, including the real-time
audio feature extraction. This paper further evaluates the actual performance
of different pruning and quantization methods on the microcontroller, including
different granularities of sparsity, skipping zero weights, weight-prioritized
loop order, and SIMD instructions. The results show that accelerating
unstructured pruned models on microcontrollers is considerably challenging, and
that structured pruning is more hardware-friendly than unstructured pruning.
The results also verify the performance improvements from quantization and
SIMD instructions.
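The pruning finding lends itself to a short illustration. Below is a minimal C sketch (illustrative only, not the authors' kernels) of why skipping scattered zero weights buys little on an in-order Cortex-M core, while structured pruning simply shortens a dense, branch-free loop:

```c
#include <stdint.h>
#include <stddef.h>

/* Unstructured pruning: zeros are scattered, so skipping them costs a
 * data-dependent branch per weight and an irregular access pattern. */
int32_t dot_skip_zeros(const int8_t *w, const int8_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i] != 0) {            /* per-element branch: little net speedup */
            acc += (int32_t)w[i] * (int32_t)x[i];
        }
    }
    return acc;
}

/* Structured pruning: whole channels/rows are removed at export time, so
 * the kernel just runs a shorter dense loop with no branches to mispredict. */
int32_t dot_dense(const int8_t *w, const int8_t *x, size_t n_kept)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n_kept; i++) {
        acc += (int32_t)w[i] * (int32_t)x[i];
    }
    return acc;
}
```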
Related papers
- Accelerating TinyML Inference on Microcontrollers through Approximate Kernels [3.566060656925169]
In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on microcontrollers.
Our evaluation on an STM32-Nucleo board and 2 popular CNNs trained on the CIFAR-10 dataset shows that, compared to state-of-the-art exact inference, our solutions deliver an average latency reduction of 21%.
arXiv Detail & Related papers (2024-09-25T11:10:33Z)
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
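A minimal C sketch of the lookup-table idea (an illustration of the general technique, not DeepGEMM's actual kernels; the 2-bit codebook is an assumption): four 2-bit weights are packed per byte, and the dot product of each 4-activation group against all 256 possible weight bytes is precomputed, so inference reduces to table lookups and adds.

```c
#include <stdint.h>

/* Decode a 2-bit code to a weight value; the codebook {-2,-1,0,1} is assumed. */
static inline int32_t decode2(uint8_t code) { return (int32_t)code - 2; }

/* Precompute dot products of one 4-activation group against every possible
 * packed weight byte. In a real GEMM this table is built once per group and
 * reused across all output rows, which is where the savings come from. */
static void build_lut(const int8_t x[4], int32_t lut[256])
{
    for (int b = 0; b < 256; b++) {
        int32_t s = 0;
        for (int k = 0; k < 4; k++)
            s += decode2((uint8_t)((b >> (2 * k)) & 0x3)) * (int32_t)x[k];
        lut[b] = s;
    }
}

/* Dot product of n_bytes packed weight bytes (4 weights each) with x. */
int32_t lut_dot(const uint8_t *w_packed, const int8_t *x, int n_bytes)
{
    int32_t acc = 0;
    for (int g = 0; g < n_bytes; g++) {
        int32_t lut[256];
        build_lut(&x[4 * g], lut);
        acc += lut[w_packed[g]];    /* one lookup replaces four multiplies */
    }
    return acc;
}
```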
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
- Evaluation of Convolution Primitives for Embedded Neural Networks on 32-bit Microcontrollers [0.0]
We propose an implementation for the ARM Cortex-M processor family with an open-source deployment platform (NNoM).
Our benchmark reveals a linear relationship between theoretical MACs and energy consumption.
We discuss the significant reduction in latency and energy consumption due to the use of SIMD instructions.
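As a concrete example of the SIMD gain, the Cortex-M DSP extension offers SMLAD, which performs two 16-bit multiply-accumulates in one instruction. A hedged sketch (assumes a CMSIS device header such as stm32f7xx.h providing the __SMLAD intrinsic; function names are illustrative):

```c
#include <stdint.h>
#include <string.h>
#include "stm32f7xx.h"   /* CMSIS device header; provides __SMLAD */

/* q15 dot product: each __SMLAD does two 16-bit MACs in one instruction. */
int32_t dot_q15_simd(const int16_t *a, const int16_t *b, uint32_t n)
{
    int32_t acc = 0;
    uint32_t i = 0;
    for (; i + 2 <= n; i += 2) {
        uint32_t va, vb;
        memcpy(&va, &a[i], 4);      /* pack two q15 values into one word */
        memcpy(&vb, &b[i], 4);
        acc = (int32_t)__SMLAD(va, vb, (uint32_t)acc);
    }
    for (; i < n; i++)              /* scalar tail for odd n */
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```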
arXiv Detail & Related papers (2023-03-19T16:17:19Z)
- Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution [11.336229510791481]
We discuss a novel execution paradigm for microcontroller deep learning.
It modifies the execution of neural networks to avoid materialising full buffers in memory.
This is achieved by exploiting the properties of operators, which can consume/produce a fraction of their input/output at a time.
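A minimal sketch of the partial-execution idea (in the spirit of Pex, not its implementation; all sizes and operators are stand-ins): two chained pointwise operators process one row at a time, so only a one-row scratch buffer is live instead of a full intermediate feature map.

```c
#include <stdint.h>

#define W 96   /* illustrative row width */

/* Two chained pointwise ops over h rows: only one W-byte scratch row is
 * ever live, instead of a full W * h intermediate buffer. */
void run_chain(const int8_t *in, int8_t *out, int h)
{
    int8_t scratch[W];                        /* W bytes, not W * h */
    for (int y = 0; y < h; y++) {
        const int8_t *row = &in[y * W];
        for (int x = 0; x < W; x++)           /* op1: halve (stand-in) */
            scratch[x] = (int8_t)(row[x] / 2);
        for (int x = 0; x < W; x++)           /* op2: ReLU */
            out[y * W + x] = scratch[x] > 0 ? scratch[x] : 0;
    }
}
```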
arXiv Detail & Related papers (2022-11-30T18:47:30Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
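A short C sketch of patch-by-patch inference (an illustration of the scheduling idea, not MCUNetV2's code): a memory-heavy early stage is evaluated on small spatial patches, re-reading a halo around each patch; in a chained pipeline the intermediates then scale with the patch size rather than the full feature map. The 3x3 box filter stands in for a real layer.

```c
#include <stdint.h>

#define IMG   64
#define PATCH 16

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Stand-in stage: a 3x3 box filter evaluated for one patch only. It reads
 * a 1-pixel halo around the patch, which is the recomputation overhead. */
static void stage_on_patch(const int8_t img[IMG][IMG], int y0, int x0,
                           int8_t out[IMG][IMG])
{
    for (int y = y0; y < y0 + PATCH; y++)
        for (int x = x0; x < x0 + PATCH; x++) {
            int s = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    s += img[clampi(y + dy, 0, IMG - 1)]
                            [clampi(x + dx, 0, IMG - 1)];
            out[y][x] = (int8_t)(s / 9);
        }
}

void patch_based_stage(const int8_t img[IMG][IMG], int8_t out[IMG][IMG])
{
    for (int y0 = 0; y0 < IMG; y0 += PATCH)
        for (int x0 = 0; x0 < IMG; x0 += PATCH)
            stage_on_patch(img, y0, x0, out);
}
```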
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly deliver true inference acceleration.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
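One reading of how micro-structured unification can truly accelerate inference, sketched in C (hypothetical; the paper's actual scheme may differ): if all weights inside a small block share one unified value, the block's multiply-accumulates collapse into an add-reduce plus a single multiply, and pruned blocks are skipped wholesale without per-weight branches.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK 4   /* illustrative micro-block size */

/* w_shared[j] is the unified weight of block j (0 means the block is pruned). */
int32_t dot_unified(const int8_t *w_shared, const int8_t *x, size_t n_blocks)
{
    int32_t acc = 0;
    for (size_t j = 0; j < n_blocks; j++) {
        if (w_shared[j] == 0)
            continue;                       /* skip a pruned block wholesale */
        int32_t s = 0;
        for (int k = 0; k < BLOCK; k++)     /* add-reduce the activations */
            s += x[j * BLOCK + k];
        acc += (int32_t)w_shared[j] * s;    /* one multiply per block */
    }
    return acc;
}
```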
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- Quantization and Deployment of Deep Neural Networks on Microcontrollers [0.0]
This work focuses on quantization and deployment of deep neural networks onto low-power 32-bit microcontrollers.
A new framework for end-to-end deep neural networks training, quantization and deployment is presented.
Execution using single-precision 32-bit floating-point as well as fixed-point arithmetic on 8- and 16-bit integers is supported.
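For context, a minimal sketch of the affine 8-bit quantization scheme commonly used in such deployments (illustrative; the paper's framework may use different conventions):

```c
#include <stdint.h>
#include <math.h>

typedef struct {
    float   scale;        /* real value = scale * (q - zero_point) */
    int32_t zero_point;
} qparams_t;

/* float -> int8, saturating to the representable range */
int8_t quantize(float v, qparams_t p)
{
    int32_t q = (int32_t)lroundf(v / p.scale) + p.zero_point;
    if (q < -128) q = -128;
    if (q > 127)  q = 127;
    return (int8_t)q;
}

/* int8 -> float, e.g. to compare against the float reference model */
float dequantize(int8_t q, qparams_t p)
{
    return p.scale * (float)(q - p.zero_point);
}
```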
arXiv Detail & Related papers (2021-05-27T17:39:06Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Efficient Neural Network Deployment for Microcontroller [0.0]
This paper explores and generalizes convolutional neural network deployment for microcontrollers.
The memory savings and performance are compared with the CMSIS-NN framework developed for ARM Cortex-M CPUs.
The final purpose is to develop a tool that consumes a PyTorch model with trained network weights and turns it into an optimized C/C++ inference engine for microcontrollers with low memory (kilobyte level) and limited compute.
arXiv Detail & Related papers (2020-07-02T19:21:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.