AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for
Enabling Ubiquitous Intelligent Mobiles
- URL: http://arxiv.org/abs/2006.04432v1
- Date: Mon, 8 Jun 2020 09:42:12 GMT
- Title: AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for
Enabling Ubiquitous Intelligent Mobiles
- Authors: Sicong Liu, Junzhao Du, Kaiming Nan, Zimu Zhou, Atlas Wang, Yingyan Lin
- Abstract summary: We propose AdaDeep to explore the desired trade-off between performance and resource constraints.
AdaDeep can achieve up to $18.6\times$ latency reduction, $9.8\times$ energy-efficiency improvement, and $37.3\times$ storage reduction in DNNs while incurring negligible accuracy loss.
- Score: 21.919700946676393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a
tremendously growing demand for bringing DNN-powered intelligence into mobile
platforms. While the potential of deploying DNNs on resource-constrained
platforms has been demonstrated by DNN compression techniques, the current
practice suffers from two limitations: 1) only stand-alone compression
schemes are investigated, even though each compression technique suits only
certain types of DNN layers; and 2) most compression techniques are optimized
for DNNs' inference accuracy, without explicitly considering other
application-driven system performance (e.g., latency and energy cost) and the
varying resource availability across platforms (e.g., storage and processing
capability). To this end, we propose AdaDeep, a usage-driven, automated DNN
compression framework for systematically exploring the desired trade-off
between performance and resource constraints, from a holistic system level.
Specifically, in a layer-wise manner, AdaDeep automatically selects the most
suitable combination of compression techniques and the corresponding
compression hyperparameters for a given DNN. Thorough evaluations on six
datasets and across twelve devices demonstrate that AdaDeep can achieve up to
$18.6\times$ latency reduction, $9.8\times$ energy-efficiency improvement, and
$37.3\times$ storage reduction in DNNs while incurring negligible accuracy
loss. Furthermore, AdaDeep also uncovers multiple novel combinations of
compression techniques.
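To make the selection problem concrete, below is a minimal, illustrative sketch of the kind of per-layer search AdaDeep automates. All technique names, cost scalings, and accuracy drops are made-up placeholders, and the brute-force loop stands in for the paper's actual learned optimizer; only the shape of the problem (per-layer technique choice under platform budgets) comes from the abstract.

```python
# Hypothetical sketch: pick one compression technique per layer so that
# estimated latency and storage fit the budgets while the predicted
# accuracy drop stays minimal. Numbers below are illustrative only.
from itertools import product

# (technique, latency_scale, storage_scale, est_accuracy_drop)
OPTIONS = [
    ("none",           1.00, 1.00, 0.000),
    ("weight_pruning", 0.70, 0.40, 0.004),
    ("svd_factorize",  0.55, 0.30, 0.008),
    ("8bit_quantize",  0.50, 0.25, 0.003),
]

def search(layer_costs, latency_budget, storage_budget):
    """Exhaustively score technique assignments for a tiny DNN.

    layer_costs: list of (latency_ms, storage_mb) per layer.
    Returns (total_drop, techniques) for the cheapest-in-accuracy
    assignment that fits both budgets, or None if infeasible.
    """
    best = None
    for choice in product(OPTIONS, repeat=len(layer_costs)):
        lat = sum(l * o[1] for (l, _), o in zip(layer_costs, choice))
        sto = sum(s * o[2] for (_, s), o in zip(layer_costs, choice))
        drop = sum(o[3] for o in choice)
        if lat <= latency_budget and sto <= storage_budget:
            if best is None or drop < best[0]:
                best = (drop, [o[0] for o in choice])
    return best

# Example: a 3-layer model under tight latency and storage budgets.
print(search([(12.0, 4.0), (8.0, 16.0), (3.0, 1.0)],
             latency_budget=14.0, storage_budget=8.0))
```

In the real framework the budgets would come from the target platform's measured storage, latency, and energy profiles rather than fixed constants.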
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - Sparsifying Binary Networks [3.8350038566047426]
Binary neural networks (BNNs) have demonstrated their ability to solve complex tasks with accuracy comparable to full-precision deep neural networks (DNNs).
Despite recent improvements, they suffer from a fixed and limited compression factor that may be insufficient for devices with very limited resources.
We propose sparse binary neural networks (SBNNs), a novel model and training scheme which introduces sparsity in BNNs and a new quantization function for binarizing the network's weights.
arXiv Detail & Related papers (2022-07-11T15:54:41Z) - Nonlinear Tensor Ring Network [39.89070144585793]
- Nonlinear Tensor Ring Network [39.89070144585793]
State-of-the-art deep neural networks (DNNs) have been widely applied to various real-world applications and achieve significant performance on cognitive problems.
By converting redundant models into compact ones, compression techniques are a practical solution for reducing storage and memory consumption.
In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed.
arXiv Detail & Related papers (2021-11-12T02:02:55Z) - Sub-bit Neural Networks: Learning to Compress and Accelerate Binary
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - AdaSpring: Context-adaptive and Runtime-evolutionary Deep Model
- AdaSpring: Context-adaptive and Runtime-evolutionary Deep Model Compression for Mobile Applications [15.134752032646231]
We present AdaSpring, a context-adaptive and self-evolutionary DNN compression framework.
It enables runtime-adaptive compression to be carried out locally and online.
Experiment outcomes show that AdaSpring achieves up to 3.1x latency reduction and 4.2x energy-efficiency improvement in DNNs.
arXiv Detail & Related papers (2021-01-28T03:30:04Z) - A Survey on Deep Neural Network Compression: Challenges, Overview, and
Solutions [18.095948566754874]
Deep Neural Networks (DNNs) have achieved unprecedented performance due to their automated feature extraction capability.
This paper reviews the existing literature on compressing DNN models to reduce both storage and computation requirements.
We divide existing approaches into five broad categories (network pruning, sparse representation, bit precision, knowledge distillation, and miscellaneous) based on the compression mechanism employed.
arXiv Detail & Related papers (2020-10-05T13:12:46Z) - GAN Slimming: All-in-One GAN Compression by A Unified Optimization
Framework [94.26938614206689]
We propose the first unified optimization framework combining multiple compression means for GAN compression, dubbed GAN Slimming.
We apply GS to compress CartoonGAN, a state-of-the-art style transfer network, by up to 47 times, with minimal visual quality degradation.
arXiv Detail & Related papers (2020-08-25T14:39:42Z) - SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost
- SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation [97.78417228445883]
We present SmartExchange, an algorithm-hardware co-design framework for energy-efficient inference of deep neural networks (DNNs).
We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layer's weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all powers of two.
We further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance.
arXiv Detail & Related papers (2020-05-07T12:12:49Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)