DyCE: Dynamic Configurable Exiting for Deep Learning Compression and
Scaling
- URL: http://arxiv.org/abs/2403.01695v1
- Date: Mon, 4 Mar 2024 03:09:28 GMT
- Title: DyCE: Dynamic Configurable Exiting for Deep Learning Compression and
Scaling
- Authors: Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras and
Deepu John
- Abstract summary: DyCE is a dynamic early-exit framework that decouples design considerations from each other and from the base model.
It significantly reduces computational complexity, by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than 0.5%.
- Score: 1.9686770963118378
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern deep learning (DL) models necessitate the employment of scaling and
compression techniques for effective deployment in resource-constrained
environments. Most existing techniques, such as pruning and quantization, are
generally static. In contrast, dynamic compression methods, such as early
exits, reduce complexity by recognizing the difficulty of input samples and
allocating computation as needed. Despite their superior flexibility and their
potential to coexist with static methods, dynamic methods pose significant
implementation challenges, because any change to a dynamic component
influences all subsequent processing. Moreover, most current dynamic compression
designs are monolithic and tightly integrated with their base models, which
complicates adaptation to novel base models. This paper introduces DyCE,
a dynamically configurable early-exit framework that decouples design
considerations from each other and from the base model. Utilizing this
framework, various types and positions of exits can be organized according to
predefined configurations, which can be dynamically switched in real-time to
accommodate evolving performance-complexity requirements. We also propose
techniques for generating optimized configurations based on any desired
trade-off between performance and computational complexity. This allows
future researchers to focus on improving individual exits without
inadvertently compromising overall system performance. The efficacy of this approach
is demonstrated through image classification tasks with deep CNNs. DyCE
significantly reduces computational complexity, by 23.5% for ResNet152 and
25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than
0.5%. Furthermore, DyCE offers advantages over existing dynamic methods in
terms of real-time configuration and fine-grained performance tuning.
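As a rough illustration of the early-exit scheme the abstract describes (this is a minimal sketch, not the authors' implementation; all names such as `EarlyExitRunner` and `set_config` are hypothetical), the base model can be viewed as a chain of backbone stages with classifiers attached at intermediate points, where a runtime-swappable configuration decides which exits are active and how confident an exit must be before inference stops early:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

class EarlyExitRunner:
    """Runs a chain of backbone stages with optional exit classifiers."""

    def __init__(self, stages, exits):
        self.stages = stages   # list of feature-transforming callables
        self.exits = exits     # {stage_index: classifier callable}
        self.config = {}       # {stage_index: min confidence to exit early}

    def set_config(self, thresholds):
        # Swap the active configuration at run time; no retraining needed.
        self.config = thresholds

    def infer(self, x):
        feats = x
        for i, stage in enumerate(self.stages):
            feats = stage(feats)
            if i in self.config and i in self.exits:
                probs = softmax(self.exits[i](feats))
                if probs.max() >= self.config[i]:
                    return int(probs.argmax()), i  # early exit taken
        # Fall through to the final exit attached to the last stage.
        last = len(self.stages) - 1
        probs = softmax(self.exits[last](feats))
        return int(probs.argmax()), last

# Toy demo: three identical stages, one linear classifier per exit point.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))                # 3 classes, 4 features
stages = [lambda f: f * 2.0 for _ in range(3)]
exits = {i: (lambda f, W=W: W @ f) for i in range(3)}
runner = EarlyExitRunner(stages, exits)

runner.set_config({})                      # conservative: never exit early
_, exit_used = runner.infer(np.ones(4))    # exit_used == 2 (final exit)

runner.set_config({0: 0.0})                # aggressive: always exit at stage 0
_, exit_used = runner.infer(np.ones(4))    # exit_used == 0
```

Swapping the threshold dictionary moves the model along the performance-complexity curve at run time, which is the configurability the abstract emphasizes; generating good threshold sets for a desired trade-off is what the paper's configuration-search techniques address.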
Related papers
- ReStNet: A Reusable & Stitchable Network for Dynamic Adaptation on IoT Devices [16.762206782460296]
ReStNet dynamically constructs a hybrid network by stitching two pre-trained models together. It achieves flexible accuracy-efficiency trade-offs at runtime while significantly reducing training cost.
arXiv Detail & Related papers (2025-06-08T16:14:37Z) - Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - Convolutional Neural Network Compression via Dynamic Parameter Rank
Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
arXiv Detail & Related papers (2024-01-15T23:52:35Z) - Deep learning model compression using network sensitivity and gradients [3.52359746858894]
We present model compression algorithms for both non-retraining and retraining conditions.
In the first case, we propose the Bin & Quant algorithm for compression of the deep learning models using the sensitivity of the network parameters.
In the second case, we propose our novel gradient-weighted k-means clustering algorithm (GWK).
arXiv Detail & Related papers (2022-10-11T03:02:40Z) - Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
A simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model in this work.
It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z) - A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate
Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight, both during training and inference, highly effective and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a subset of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on the connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned in various depth on demand, improving real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net)
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z) - Fully Dynamic Inference with Deep Neural Networks [19.833242253397206]
Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped.
On the CIFAR-10 dataset, LC-Net results in up to 11.9$\times$ fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods.
On the ImageNet dataset, LC-Net achieves up to 1.4$\times$ fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
arXiv Detail & Related papers (2020-07-29T23:17:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.