AdaSpring: Context-adaptive and Runtime-evolutionary Deep Model
Compression for Mobile Applications
- URL: http://arxiv.org/abs/2101.11800v1
- Date: Thu, 28 Jan 2021 03:30:04 GMT
- Title: AdaSpring: Context-adaptive and Runtime-evolutionary Deep Model
Compression for Mobile Applications
- Authors: Sicong Liu, Bin Guo, Ke Ma, Zhiwen Yu, Junzhao Du
- Abstract summary: We present AdaSpring, a context-adaptive and self-evolutionary DNN compression framework.
It enables runtime-adaptive DNN compression locally and online.
Experiments show that AdaSpring obtains up to 3.1x latency reduction and 4.2x energy efficiency improvement in DNNs.
- Score: 15.134752032646231
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: There are many deep learning (e.g., DNN) powered mobile and wearable
applications today continuously and unobtrusively sensing the ambient
surroundings to enhance all aspects of human lives. To enable robust and
private mobile sensing, DNNs tend to be deployed locally on
resource-constrained mobile devices via model compression. Current practice,
whether hand-crafted DNN compression techniques that optimize
DNN-related performance (e.g., parameter size) or on-demand DNN compression
methods that optimize hardware-dependent metrics (e.g., latency),
cannot run locally online because it requires offline retraining to ensure
accuracy. Moreover, neither approach considers runtime-adaptive
compression to account for the dynamic nature of the deployment context
of mobile applications. To address those challenges, we present AdaSpring, a
context-adaptive and self-evolutionary DNN compression framework. It enables
runtime-adaptive DNN compression locally and online. Specifically, it uses
ensemble training of a retraining-free, self-evolutionary network to
integrate multiple alternative DNN compression configurations (i.e., compressed
architectures and weights). It then introduces a runtime search strategy to
quickly find the most suitable compression configuration and evolve the
corresponding weights. In an evaluation on five tasks across three platforms and
a real-world case study, experiments show that AdaSpring obtains up to
3.1x latency reduction and 4.2x energy efficiency improvement in DNNs, compared
to hand-crafted compression techniques, while incurring only <= 6.2ms
runtime-evolution latency.
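The runtime search described in the abstract can be pictured with a small sketch. This is not AdaSpring's actual algorithm: it is a plain constrained selection over a handful of pre-profiled candidates, and every name (`CompressionConfig`, `Context`, `select_config`) and every number below is hypothetical, chosen only to illustrate picking a compression configuration that fits the current deployment context.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CompressionConfig:
    """A hypothetical pre-profiled compression configuration."""
    name: str
    accuracy: float     # estimated accuracy in [0, 1]
    latency_ms: float   # estimated inference latency on the device
    energy_mj: float    # estimated energy per inference


@dataclass(frozen=True)
class Context:
    """Hypothetical runtime constraints from the deployment context."""
    latency_budget_ms: float
    energy_budget_mj: float


def select_config(configs: Sequence[CompressionConfig], ctx: Context) -> CompressionConfig:
    """Return the most accurate configuration that meets both budgets.

    If no configuration is feasible, fall back to the lowest-latency one.
    """
    if not configs:
        raise ValueError("at least one configuration is required")
    feasible = [
        c for c in configs
        if c.latency_ms <= ctx.latency_budget_ms and c.energy_mj <= ctx.energy_budget_mj
    ]
    if feasible:
        return max(feasible, key=lambda c: c.accuracy)
    return min(configs, key=lambda c: c.latency_ms)


# Illustrative values only; they are not measurements from the paper.
CANDIDATES = [
    CompressionConfig("uncompressed", accuracy=0.92, latency_ms=40.0, energy_mj=90.0),
    CompressionConfig("channel_pruned", accuracy=0.90, latency_ms=22.0, energy_mj=50.0),
    CompressionConfig("pruned_and_factorized", accuracy=0.87, latency_ms=13.0, energy_mj=28.0),
]

if __name__ == "__main__":
    # A tighter budget, e.g. when the battery is low, pushes the choice
    # toward a more aggressively compressed configuration.
    for ctx in (Context(100.0, 100.0), Context(25.0, 60.0), Context(5.0, 5.0)):
        print(ctx, "->", select_config(CANDIDATES, ctx).name)
```

Re-running `select_config` whenever the context changes gives the flavor of runtime adaptation; the weight evolution step that the abstract mentions is not modeled here.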
Related papers
- Fast Feedforward 3D Gaussian Splatting Compression [55.149325473447384]
Fast Feedforward 3D Gaussian Splatting Compression (FCGS) is an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass.
FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
arXiv Detail & Related papers (2024-10-10T15:13:08Z) - Resource Constrained Model Compression via Minimax Optimization for
Spiking Neural Networks [11.19282454437627]
Spiking Neural Networks (SNNs) are event-driven and highly energy-efficient.
It is difficult to deploy these networks on resource-limited edge devices directly.
We propose an improved end-to-end Minimax optimization method for this sparse learning problem.
arXiv Detail & Related papers (2023-08-09T02:50:15Z) - FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing [5.815300670677979]
We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an asymmetric environment.
Our method achieves a 60% lower bitrate than a state-of-the-art SC method without decreasing accuracy and is up to 16x faster than offloading with existing standards.
arXiv Detail & Related papers (2023-02-21T14:03:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - You Only Compress Once: Towards Effective and Elastic BERT Compression
via Exploit-Explore Stochastic Nature Gradient [88.58536093633167]
Existing model compression approaches require re-compression or fine-tuning across diverse constraints to accommodate various hardware deployments.
We propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.
Compared with state-of-the-art algorithms, YOCO-BERT provides more compact models while achieving 2.1%-4.5% average accuracy improvement on the GLUE benchmark.
arXiv Detail & Related papers (2021-06-04T12:17:44Z) - Incremental Training and Group Convolution Pruning for Runtime DNN
Performance Scaling on Heterogeneous Embedded Platforms [23.00896228073755]
Inference for Deep Neural Networks is increasingly being executed locally on mobile and embedded platforms.
In this paper, we present a dynamic DNN using incremental training and group convolution pruning.
It achieved 10.6x (energy) and 41.6x (time) wider dynamic range by combining with task mapping and DVFS.
arXiv Detail & Related papers (2021-05-08T05:38:01Z) - A Survey on Deep Neural Network Compression: Challenges, Overview, and
Solutions [18.095948566754874]
Deep Neural Networks (DNNs) have achieved unprecedented performance due to their automated feature extraction capability.
This paper reviews the existing literature on compressing DNN models to reduce both storage and computation requirements.
We divide the existing approaches into five broad categories, i.e., network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous, based upon the mechanism incorporated for compressing the DNN model.
arXiv Detail & Related papers (2020-10-05T13:12:46Z) - AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for
Enabling Ubiquitous Intelligent Mobiles [21.919700946676393]
We propose AdaDeep to explore the desired trade-off between performance and resource constraints.
AdaDeep can achieve up to $18.6\times$ latency reduction, $9.8\times$ energy-efficiency improvement, and $37.3\times$ storage reduction in DNNs while incurring negligible accuracy loss.
arXiv Detail & Related papers (2020-06-08T09:42:12Z) - A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration
Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.