A Survey on Deep Neural Network Compression: Challenges, Overview, and
Solutions
- URL: http://arxiv.org/abs/2010.03954v1
- Date: Mon, 5 Oct 2020 13:12:46 GMT
- Title: A Survey on Deep Neural Network Compression: Challenges, Overview, and
Solutions
- Authors: Rahul Mishra, Hari Prabhat Gupta, and Tanima Dutta
- Abstract summary: Deep Neural Networks (DNNs) have gained unprecedented performance due to their automated feature extraction capability.
This paper presents a review of the existing literature on compressing DNN models to reduce both storage and computation requirements.
We divide the existing approaches into five broad categories, i.e., network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous, based upon the mechanism incorporated for compressing the DNN model.
- Score: 18.095948566754874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) have achieved unprecedented performance
due to their automated feature extraction capability. This high performance has
led to the significant incorporation of DNN models in different Internet of
Things (IoT) applications over the past decade. However, the colossal
computation, energy, and storage requirements of DNN models make their
deployment prohibitive on resource-constrained IoT devices. Therefore, several
compression techniques have been proposed in recent years for reducing the
storage and computation requirements of DNN models. These techniques approach
DNN compression from different perspectives while aiming for minimal accuracy
compromise, which motivates a comprehensive overview of DNN compression
techniques. In this paper, we present a comprehensive review of the existing
literature on compressing DNN models to reduce both storage and computation
requirements. We divide the existing approaches into five broad categories,
i.e., network pruning, sparse representation, bits precision, knowledge
distillation, and miscellaneous, based upon the mechanism incorporated for
compressing the DNN model. The paper also discusses the challenges associated
with each category of DNN compression techniques. Finally, we provide a quick
summary of the existing work under each category along with future directions
in DNN compression.
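To make the first four of these categories concrete, the sketch below shows, in plain NumPy, the basic operations they typically build on: magnitude-based pruning (network pruning / sparse representation), symmetric 8-bit quantization (bits precision), and a temperature-scaled soft-target loss (knowledge distillation). This is a minimal generic illustration, not any specific algorithm from the survey; the sparsity level, bit width, and temperature are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Network pruning / sparse representation: zero out the smallest-magnitude
    weights so that roughly `sparsity` fraction of entries become zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def uniform_quantize(weights, num_bits=8):
    """Bits precision: map float weights to `num_bits`-bit integers
    (symmetric, per-tensor quantization); dequantize with q * scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Knowledge distillation: cross-entropy between temperature-softened
    teacher and student distributions (soft-target term only)."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p_teacher = softmax(teacher_logits / temperature)
    log_p_student = np.log(softmax(student_logits / temperature) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean() * temperature ** 2

# Example: prune then quantize a random weight matrix.
w = np.random.randn(256, 128).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.7)
q, scale = uniform_quantize(w_pruned, num_bits=8)
print("nonzero fraction:", mask.mean(),
      "| max dequantization error:", np.abs(w_pruned - q * scale).max())
```

In practice, compression pipelines usually fine-tune the model after pruning or quantization to recover the accuracy lost by these operations.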
Related papers
- GhostRNN: Reducing State Redundancy in RNN with Cheap Operations [66.14054138609355] (2024-11-20)
  We propose an efficient RNN architecture, GhostRNN, which reduces hidden state redundancy with cheap operations.
  Experiments on KWS and SE tasks demonstrate that the proposed GhostRNN significantly reduces memory usage (by 40%) and computation cost while keeping performance similar.
- NAS-BNN: Neural Architecture Search for Binary Neural Networks [55.058512316210056] (2024-08-28)
  We propose a novel neural architecture search scheme for binary neural networks, named NAS-BNN.
  Our discovered binary model family outperforms previous BNNs over a wide range of operation counts (OPs), from 20M to 200M.
  In addition, we validate the transferability of the searched BNNs on the object detection task; our binary detectors achieve a new state-of-the-art result, e.g., 31.6% mAP with 370M OPs, on the MS dataset.
- "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach [49.744093838327615] (2024-03-01)
  We provide a novel compression approach to wide and fully-connected deep neural nets.
  Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
- Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization [1.0235078178220354] (2023-12-23)
  We propose an automated framework to compress Deep Neural Networks (DNNs) in a hardware-aware manner by jointly employing pruning and quantization.
  Our framework achieves 39% average energy reduction across datasets with 1.7% average accuracy loss and significantly outperforms the state-of-the-art approaches.
- Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks [11.19282454437627] (2023-08-09)
  Spiking Neural Networks (SNNs) are event-driven and highly energy-efficient networks.
  It is difficult to deploy these networks on resource-limited edge devices directly.
  We propose an improved end-to-end minimax optimization method for this sparse learning problem.
- Sparsifying Binary Networks [3.8350038566047426] (2022-07-11)
  Binary neural networks (BNNs) have demonstrated their ability to solve complex tasks with accuracy comparable to full-precision deep neural networks (DNNs).
  Despite recent improvements, they suffer from a fixed and limited compression factor that may prove insufficient for certain devices with very limited resources.
  We propose sparse binary neural networks (SBNNs), a novel model and training scheme that introduces sparsity in BNNs along with a new quantization function for binarizing the network's weights.
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168] (2022-05-01)
  The Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
  It is a challenge to efficiently train SNNs due to their non-differentiability.
  We propose the Differentiation on Spike Representation (DSR) method, which can achieve high performance.
- Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099] (2022-03-11)
  We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
  A hybrid model combining the LR-TT-DNN with a convolutional neural network (CNN) is set up to boost performance.
  Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models, with fewer model parameters, can outperform their TT-DNN and CNN+(TT-DNN) counterparts.
- Compacting Deep Neural Networks for Internet of Things: Methods and Applications [14.611047945621511] (2021-03-20)
  Deep Neural Networks (DNNs) have shown great success in completing complex tasks.
  However, DNNs inevitably bring high computational cost and storage consumption due to the complexity of their hierarchical structures.
  This paper presents a comprehensive study of compacting-DNN technologies.
- AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for Enabling Ubiquitous Intelligent Mobiles [21.919700946676393] (2020-06-08)
  We propose AdaDeep to explore the desired trade-off between performance and resource constraints.
  AdaDeep can achieve up to 18.6x latency reduction, 9.8x energy-efficiency improvement, and 37.3x storage reduction in DNNs while incurring negligible accuracy loss.
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752] (2020-01-01)
  We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space (a minimal sketch of this pattern idea follows the list).
  With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
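As referenced in the PatDNN entry above, pattern-based weight pruning keeps only a small, fixed set of positions inside each convolution kernel, which leaves structure the compiler can exploit. The sketch below illustrates that general idea in NumPy under stated assumptions: the pattern library, the 3x3 kernel shape, and the 4-weights-per-kernel choice are illustrative only, not PatDNN's actual pattern set or algorithm.

```python
import numpy as np

# An assumed library of 4-entry patterns for a 3x3 kernel, given as flattened
# index positions that are KEPT. PatDNN selects its own pattern set; these are
# illustrative placeholders.
PATTERNS = [
    (0, 1, 3, 4), (1, 2, 4, 5), (3, 4, 6, 7), (4, 5, 7, 8),
    (1, 3, 4, 5), (3, 4, 5, 7),
]

def pattern_prune_kernel(kernel):
    """Keep the 4 weights of a 3x3 kernel matching the pattern that preserves
    the most magnitude; zero out the remaining positions."""
    flat = kernel.reshape(-1)
    best = max(PATTERNS, key=lambda p: np.abs(flat[list(p)]).sum())
    mask = np.zeros(9, dtype=flat.dtype)
    mask[list(best)] = 1.0
    return (flat * mask).reshape(3, 3), best

def pattern_prune_conv(weight):
    """Apply per-kernel pattern pruning to a conv weight of shape
    (out_channels, in_channels, 3, 3)."""
    pruned = np.empty_like(weight)
    for o in range(weight.shape[0]):
        for i in range(weight.shape[1]):
            pruned[o, i], _ = pattern_prune_kernel(weight[o, i])
    return pruned

# Example: pattern-prune a random convolution weight tensor.
w = np.random.randn(8, 4, 3, 3).astype(np.float32)
w_pruned = pattern_prune_conv(w)
print("kept fraction:", (w_pruned != 0).mean())  # roughly 4/9 per kernel
```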
This list is automatically generated from the titles and abstracts of the papers on this site.