Deep Compression for PyTorch Model Deployment on Microcontrollers
- URL: http://arxiv.org/abs/2103.15972v1
- Date: Mon, 29 Mar 2021 22:08:44 GMT
- Title: Deep Compression for PyTorch Model Deployment on Microcontrollers
- Authors: Eren Dogan, H. Fatih Ugurdag, Hasan Unlu
- Abstract summary: This paper adds model compression, specifically Deep Compression, to Unlu's earlier work on arXiv.
In the case of the LeNet-5 model, the memory footprint was reduced by 12.45x, and the inference speed was boosted by 2.57x.
- Score: 0.2578242050187029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network deployment on low-cost embedded systems, hence on
microcontrollers (MCUs), has recently been attracting more attention than ever.
Since MCUs have limited memory capacity as well as limited compute speed, it is
critical that we employ model compression, which reduces both memory and
compute requirements. In this paper, we add model compression,
specifically Deep Compression, and further optimize Unlu's earlier work on
arXiv, which efficiently deploys PyTorch models on MCUs. First, we prune the
weights in convolutional and fully connected layers. Secondly, the remaining
weights and activations are quantized to 8-bit integers from 32-bit
floating-point. Finally, forward pass functions are compressed using special
data structures for sparse matrices, which store only nonzero weights (without
impacting performance and accuracy). In the case of the LeNet-5 model, the
memory footprint was reduced by 12.45x, and the inference speed was boosted by
2.57x.
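The three steps above map directly onto small array operations. Below is a minimal NumPy sketch of the pipeline (magnitude pruning, 8-bit quantization, CSR storage of the surviving weights, and a sparse forward pass); the sparsity level, layer shape, and helper names are illustrative choices, not values taken from the paper.

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Step 1: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w).astype(np.float32)

def quantize_int8(w):
    """Step 2: map float32 weights to int8 with a per-layer scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def to_csr(q):
    """Step 3: keep only the nonzero weights, in compressed sparse row form."""
    values, col_idx, row_ptr = [], [], [0]
    for row in q:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return (np.array(values, dtype=np.int8),
            np.array(col_idx, dtype=np.uint16),
            np.array(row_ptr, dtype=np.uint32))

def sparse_matvec(values, col_idx, row_ptr, x):
    """Forward pass over the sparse layer with int32 accumulators."""
    y = np.zeros(len(row_ptr) - 1, dtype=np.int32)
    for i in range(len(y)):
        s, e = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(values[s:e].astype(np.int32),
                      x[col_idx[s:e]].astype(np.int32))
    return y

w = np.random.randn(120, 84).astype(np.float32)   # shaped like a LeNet-5 FC layer
q, scale = quantize_int8(prune_by_magnitude(w))
values, col_idx, row_ptr = to_csr(q)
x = np.random.randint(-128, 128, size=84).astype(np.int8)
y = sparse_matvec(values, col_idx, row_ptr, x)
```

On the device, the three CSR arrays would be emitted as constant C arrays inside the generated forward-pass function; storing int8 values plus small indices instead of a dense float32 matrix is what shrinks the memory footprint.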
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Compressed Real Numbers for AI: a case-study using a RISC-V CPU [2.0516276923852415]
We focus on two families of formats that have achieved interesting results in compressing binary32 numbers in machine learning applications.
We propose a way to decompress a tensor of bfloat/posits just before computations.
arXiv Detail & Related papers (2023-09-11T07:54:28Z)
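For context, bfloat16 (one of the formats the entry above likely refers to) keeps the top 16 bits of a binary32 value, so compression and decompression reduce to bit shifts. A minimal NumPy sketch, using plain truncation rather than the rounding a production codec might apply; posit decoding is format-specific and omitted here.

```python
import numpy as np

def compress_bfloat16(x):
    """Keep the top 16 bits of each binary32 (sign, 8-bit exponent, 7-bit mantissa)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)   # truncation; real codecs may round

def decompress_bfloat16(b):
    """Re-expand to float32 just before computation, as the entry proposes."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.random.randn(4).astype(np.float32)
x_hat = decompress_bfloat16(compress_bfloat16(x))  # ~2-3 significant decimal digits survive
```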
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
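The core trick in the entry above is replacing ultra-low-precision multiplies with table lookups. A minimal NumPy sketch of the idea with a hypothetical 2-bit codebook; DeepGEMM's actual kernels implement the lookups with SIMD shuffle instructions, which this sketch does not model.

```python
import numpy as np

# Hypothetical 2-bit codebooks; all 16 products precomputed once into a table.
W_LEVELS = np.array([-2, -1, 1, 2], dtype=np.int32)
A_LEVELS = np.array([0, 1, 2, 3], dtype=np.int32)
LUT = W_LEVELS[:, None] * A_LEVELS[None, :]   # LUT[wi, ai] = weight * activation

def lut_dot(w_codes, a_codes):
    """Dot product done entirely with table lookups instead of multiplies."""
    return int(LUT[w_codes, a_codes].sum())

w_codes = np.random.randint(0, 4, size=256)   # 2-bit weight indices
a_codes = np.random.randint(0, 4, size=256)   # 2-bit activation indices
y = lut_dot(w_codes, a_codes)
```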
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
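The peak-memory idea in the entry above can be shown on a single layer: compute the output one spatial tile at a time, so only a small input tile, not the whole feature map, is live at once. A minimal PyTorch sketch for a same-padding, stride-1 3x3 convolution; the function and parameter names are my own, and a real schedule would keep activations tiled across several layers rather than one.

```python
import torch
import torch.nn.functional as F

def conv3x3_patchwise(x, weight, patch=8):
    """Same-padding 3x3 stride-1 conv computed one output tile at a time, so
    only a (patch+2)^2 input tile is live at once instead of the full map."""
    n, _, H, W = x.shape
    xp = F.pad(x, (1, 1, 1, 1))                  # zero halo for 'same' output
    out = torch.empty(n, weight.shape[0], H, W)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            h, w = min(patch, H - i), min(patch, W - j)
            tile = xp[:, :, i:i + h + 2, j:j + w + 2]             # tile plus halo
            out[:, :, i:i + h, j:j + w] = F.conv2d(tile, weight)  # no padding
    return out

x, w = torch.randn(1, 3, 32, 32), torch.randn(8, 3, 3, 3)
assert torch.allclose(conv3x3_patchwise(x, w), F.conv2d(x, w, padding=1), atol=1e-5)
```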
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
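The two primitives combined in the entry above can be sketched independently: channel pruning drops whole filters by L1 norm, and tensor decomposition factors the remaining kernel into two thin matrices. A minimal NumPy sketch; the paper couples the two decisions jointly, which this deliberately does not capture, and the ratios here are arbitrary.

```python
import numpy as np

def prune_channels(w, keep_ratio=0.5):
    """Drop the conv output channels with the smallest L1 filter norms."""
    norms = np.abs(w).sum(axis=(1, 2, 3))        # w: (out_ch, in_ch, kh, kw)
    keep = np.argsort(norms)[-int(len(norms) * keep_ratio):]
    return w[np.sort(keep)]                      # preserve channel order

def low_rank_decompose(w, rank):
    """Factor the matricized kernel into two thin matrices via truncated SVD."""
    m = w.reshape(w.shape[0], -1)                # (out_ch, in_ch*kh*kw)
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    a = u[:, :rank] * s[:rank]                   # (out_ch, rank)
    b = vt[:rank]                                # (rank, in_ch*kh*kw)
    return a, b                                  # m is approximately a @ b

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_pruned = prune_channels(w, keep_ratio=0.5)     # 32 channels remain
a, b = low_rank_decompose(w_pruned, rank=16)
```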
- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization bit-width, among 2, 4, and 8 bits, for each individual weight and activation tensor.
Given an MCU-class memory bound of 2MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as the state-of-the-art solutions.
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
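To make the search space in the entry above concrete, here is a minimal sketch that quantizes tensors uniformly and picks a per-tensor bit-width from {2, 4, 8} under the paper's 2MB weight budget. A greedy rule stands in for the paper's Reinforcement Learning agent, and activations are ignored for brevity; tensor shapes are invented.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Uniform symmetric fake-quantization to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def search_bitwidths(tensors, budget_bytes, choices=(2, 4, 8)):
    """Greedy stand-in for the RL agent: give every tensor the minimum width,
    then widen tensors (largest first) as far as the memory budget allows."""
    bits = [min(choices)] * len(tensors)
    used = sum(t.size * min(choices) for t in tensors) / 8   # bits -> bytes
    for i in sorted(range(len(tensors)), key=lambda k: -tensors[k].size):
        for b in sorted(choices, reverse=True):
            extra = tensors[i].size * (b - bits[i]) / 8
            if used + extra <= budget_bytes:
                used += extra
                bits[i] = b
                break
    return bits

tensors = [np.random.randn(*s).astype(np.float32) for s in [(500, 400), (120, 84)]]
bits = search_bitwidths(tensors, budget_bytes=2 * 1024 * 1024)  # the 2MB bound
w0_q = quantize_uniform(tensors[0], bits[0])
```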
- Compression strategies and space-conscious representations for deep neural networks [0.3670422696827526]
Recent advances in deep learning have made available powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications.
However, CNNs have millions of parameters and are thus not deployable on resource-limited platforms.
In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization.
arXiv Detail & Related papers (2020-07-15T19:41:19Z)
- Efficient Neural Network Deployment for Microcontroller [0.0]
This paper explores and generalizes convolutional neural network deployment for microcontrollers.
The memory savings and performance will be compared with CMSIS-NN framework developed for ARM Cortex-M CPUs.
The final purpose is to develop a tool that consumes a PyTorch model with trained network weights and turns it into an optimized C/C++ inference engine for microcontrollers with low memory (kilobyte level) and limited compute capability.
arXiv Detail & Related papers (2020-07-02T19:21:05Z)
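A toy illustration of the PyTorch-to-C idea in the entry above: serialize a trained layer's weights into constant C arrays together with a naive forward function. The real tool also handles convolutions and memory planning; the function and symbol names here are invented for the example.

```python
import torch

def emit_c_linear(layer, name):
    """Emit a trained nn.Linear as constant C arrays plus a forward function."""
    w = layer.weight.detach().numpy()
    b = layer.bias.detach().numpy()
    rows = ",\n  ".join(", ".join(f"{v:.6f}f" for v in row) for row in w)
    bias = ", ".join(f"{v:.6f}f" for v in b)
    return f"""static const float {name}_w[{w.shape[0]}][{w.shape[1]}] = {{
  {rows}
}};
static const float {name}_b[{w.shape[0]}] = {{{bias}}};

void {name}_forward(const float *x, float *y) {{
  for (int i = 0; i < {w.shape[0]}; i++) {{
    y[i] = {name}_b[i];
    for (int j = 0; j < {w.shape[1]}; j++) y[i] += {name}_w[i][j] * x[j];
  }}
}}"""

print(emit_c_linear(torch.nn.Linear(84, 10), "fc3"))  # e.g. LeNet-5's last layer
```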
- Kernel Quantization for Efficient Network Compression [59.55192551370948]
Kernel Quantization (KQ) aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss.
Inspired by the evolution from weight pruning to filter pruning, we propose to quantize in both kernel and weight level.
Experiments on the ImageNet classification task prove that KQ needs 1.05 and 1.62 bits on average in VGG and ResNet18, respectively, to represent each parameter in the convolution layer.
arXiv Detail & Related papers (2020-03-11T08:00:04Z)
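The kernel-level half of the entry above can be sketched as a k-means codebook over whole 3x3 kernels, so each kernel is stored as a short index instead of nine floats. A minimal NumPy sketch; the codebook size and iteration count are arbitrary, and KQ's additional weight-level quantization stage is omitted.

```python
import numpy as np

def kernel_quantize(w, n_codes=64, iters=10):
    """Cluster whole 3x3 kernels into a shared codebook with plain k-means;
    each kernel is then stored as one index instead of nine floats."""
    kernels = w.reshape(-1, 9)                   # (out_ch*in_ch, 3*3)
    codebook = kernels[np.random.choice(len(kernels), n_codes, replace=False)]
    for _ in range(iters):
        dist = ((kernels[:, None, :] - codebook[None]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for k in range(n_codes):
            members = kernels[assign == k]
            if len(members):
                codebook[k] = members.mean(0)
    return codebook, assign    # indices cost log2(n_codes)/9 bits per weight

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
codebook, assign = kernel_quantize(w)            # 6 bits per 3x3 kernel
```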
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.