DyCE: Dynamic Configurable Exiting for Deep Learning Compression and
Scaling
- URL: http://arxiv.org/abs/2403.01695v1
- Date: Mon, 4 Mar 2024 03:09:28 GMT
- Title: DyCE: Dynamic Configurable Exiting for Deep Learning Compression and
Scaling
- Authors: Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras and
Deepu John
- Abstract summary: DyCE is a dynamic early-exit framework that decouples design considerations from each other and from the base model.
It significantly reduces computational complexity, by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than 0.5%.
- Score: 1.9686770963118378
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern deep learning (DL) models necessitate the employment of scaling and
compression techniques for effective deployment in resource-constrained
environments. Most existing techniques, such as pruning and quantization, are
generally static. On the other hand, dynamic compression methods, such as early
exits, reduce complexity by recognizing the difficulty of input samples and
allocating computation as needed. Dynamic methods, despite their superior
flexibility and potential to coexist with static methods, pose significant
implementation challenges because any change to a dynamic component affects
subsequent processing. Moreover, most current dynamic compression
designs are monolithic and tightly integrated with base models, thereby
complicating the adaptation to novel base models. This paper introduces DyCE,
a dynamic, configurable early-exit framework that decouples design
considerations from each other and from the base model. Utilizing this
framework, various types and positions of exits can be organized according to
predefined configurations, which can be dynamically switched in real-time to
accommodate evolving performance-complexity requirements. We also propose
techniques for generating optimized configurations based on any desired
trade-off between performance and computational complexity. This empowers
future researchers to focus on improving individual exits without
inadvertently compromising overall system performance. The efficacy of this approach
is demonstrated through image classification tasks with deep CNNs. DyCE
significantly reduces computational complexity, by 23.5% for ResNet152 and
25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than
0.5%. Furthermore, DyCE offers advantages over existing dynamic methods in
terms of real-time configuration and fine-grained performance tuning.
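
To make the configuration-switching idea concrete, below is a minimal PyTorch sketch of a threshold-based configurable early-exit wrapper. It is an illustration under stated assumptions, not the paper's implementation: the class name, the per-exit confidence thresholds, and the softmax-confidence exit rule are all hypothetical, and batch size 1 is assumed so each exit decision is a single scalar.

import torch
import torch.nn as nn

class ConfigurableEarlyExitNet(nn.Module):
    # Wraps a sequence of backbone stages and attaches one exit head per
    # stage. A "configuration" is a list of per-exit confidence
    # thresholds; swapping configurations at run time moves the model
    # along the performance-complexity trade-off without retraining.
    def __init__(self, stages, exit_heads, thresholds):
        super().__init__()
        assert len(stages) == len(exit_heads) == len(thresholds)
        self.stages = nn.ModuleList(stages)
        self.exit_heads = nn.ModuleList(exit_heads)
        self.thresholds = thresholds  # currently active configuration

    def set_configuration(self, thresholds):
        # Real-time switch: only the threshold list changes.
        self.thresholds = thresholds

    def forward(self, x):
        for stage, head, tau in zip(self.stages, self.exit_heads,
                                    self.thresholds):
            x = stage(x)
            logits = head(x)
            conf = torch.softmax(logits, dim=-1).max(dim=-1).values
            # Exit as soon as the top-class confidence clears this
            # exit's threshold; a threshold above 1.0 disables an exit.
            if conf.item() >= tau:
                return logits
        return logits  # the final exit always answers

if __name__ == "__main__":
    # Toy demo: three linear "stages", each with a linear exit head.
    stages = [nn.Linear(16, 16) for _ in range(3)]
    heads = [nn.Linear(16, 10) for _ in range(3)]
    net = ConfigurableEarlyExitNet(stages, heads, [0.9, 0.7, 0.0])
    _ = net(torch.randn(1, 16))             # batch size 1
    net.set_configuration([0.6, 0.5, 0.0])  # trade accuracy for speed

Under such a scheme, generating optimized configurations as the paper proposes would correspond to searching over threshold lists and retaining only those that are Pareto-optimal in accuracy versus expected computation.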
Related papers
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradation with a single unified model, instead of a separate task-specific model for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - Convolutional Neural Network Compression via Dynamic Parameter Rank
Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
arXiv Detail & Related papers (2024-01-15T23:52:35Z) - Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning.
Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z) - SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution [16.56592303409295]
Dynamic convolution achieves better performance for efficient CNNs at the cost of negligible FLOPs increase.
We propose a new framework, Sparse Dynamic Convolution (SD-Conv), to naturally integrate these two paths.
arXiv Detail & Related papers (2022-04-05T14:03:54Z) - Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely deliver true inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Learning to Continuously Optimize Wireless Resource In Episodically
Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z) - Neural Network Compression Via Sparse Optimization [23.184290795230897]
We propose a model compression framework based on the recent progress on sparse optimization.
We achieve up to 7.2 and 2.9 times FLOPs reduction, at the same level of evaluation accuracy, on VGG16 for CIFAR10 and ResNet50 for ImageNet, respectively.
arXiv Detail & Related papers (2020-11-10T03:03:55Z) - AntiDote: Attention-based Dynamic Optimization for Neural Network
Runtime Efficiency [42.00372941618975]
We propose a dynamic CNN optimization framework based on the neural network attention mechanism.
Our method could bring 37.4% to 54.5% FLOPs reduction with negligible accuracy drop on various test networks.
arXiv Detail & Related papers (2020-08-14T18:48:13Z) - Structured Sparsification with Joint Optimization of Group Convolution
and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.