LayerCollapse: Adaptive compression of neural networks
- URL: http://arxiv.org/abs/2311.17943v2
- Date: Thu, 8 Feb 2024 20:28:28 GMT
- Title: LayerCollapse: Adaptive compression of neural networks
- Authors: Soheil Zibakhsh Shabgahi, Mohammad Sohail Shariff, Farinaz Koushanfar
- Abstract summary: We present LayerCollapse, a form of structured pruning to reduce the depth of fully connected layers.
We develop a novel regularizer allowing for post-training compression without finetuning, while having limited impact on performance.
- Score: 15.248788216228842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handling the ever-increasing scale of contemporary deep learning and
transformer-based models poses a significant challenge. Overparameterized
Transformer networks outperform prior art in natural language processing and
computer vision. These models contain hundreds of millions of parameters,
demanding significant computational resources and making them prone to
overfitting. In this work we present LayerCollapse, a form of structured
pruning to reduce the depth of fully connected layers. We develop a novel
regularizer allowing for post-training compression without finetuning, while
having limited impact on performance. LayerCollapse controls model
expressiveness with regularization on the activations between fully connected
layers, modulating the linearity of activation functions. A linear activation
function reduces the rank of the transformation to the rank of the
corresponding linear transformation. We demonstrate the effectiveness of
LayerCollapse by showing its compression capabilities in sentiment analysis
and image classification benchmarks. Moreover, we show LayerCollapse is an
effective compression-aware regularization method in a language modeling
benchmark.
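The collapse step described in the abstract can be illustrated with a minimal NumPy sketch. It only shows why a linear activation lets two fully connected layers fold into one. The PReLU-style activation, the quadratic `linearity_penalty`, the `collapse` helper, and the layer sizes are all assumptions made for illustration; the abstract does not specify the exact regularizer or architecture.

```python
import numpy as np

def prelu(x, alpha):
    """PReLU-style activation: identity for x >= 0, slope alpha for x < 0.
    With alpha = 1 it is exactly the identity, i.e. fully linear."""
    return np.where(x >= 0, x, alpha * x)

def linearity_penalty(alpha):
    """Hypothetical regularizer term that pushes the negative slope toward 1.
    (An assumed form, not taken from the paper.)"""
    return (1.0 - alpha) ** 2

def collapse(W1, b1, W2, b2):
    """Fold y = W2 @ (W1 @ x + b1) + b2 into a single layer y = W @ x + b."""
    return W2 @ W1, W2 @ b1 + b2

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 32, 8  # illustrative sizes only
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)
x = rng.normal(size=d_in)

# Two-layer forward pass with a fully linear activation (alpha = 1).
two_layer = W2 @ prelu(W1 @ x + b1, alpha=1.0) + b2

# Equivalent single layer after collapsing.
W, b = collapse(W1, b1, W2, b2)
one_layer = W @ x + b

max_abs_diff = float(np.max(np.abs(two_layer - one_layer)))
params_before = W1.size + b1.size + W2.size + b2.size
params_after = W.size + b.size
print(max_abs_diff, params_before, params_after)
```

In this sketch the collapsed matrix `W` has rank at most `min(d_in, d_hidden, d_out)`, and the two outputs agree up to floating-point error. With an activation slope different from 1 the two computations would diverge, which is why a regularizer that drives the activation toward linearity during training is needed before collapsing without finetuning.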
Related papers
- LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging [20.774060844559838]
Existing depth compression methods remove redundant non-linear activation functions and merge the consecutive convolution layers into a single layer.
These methods suffer from a critical drawback; the kernel size of the merged layers becomes larger.
We show that this problem can be addressed by jointly pruning convolution layers and activation functions.
We propose LayerMerge, a novel depth compression method that selects which activation layers and convolution layers to remove.
arXiv Detail & Related papers (2024-06-18T17:55:15Z)
- Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios [14.48369551534582]
A learning-based approach seeks to minimize the compromise between compression rate and reconstructed image quality.
A successful technique consists in introducing a deep hyperprior that operates within a 2-level nested latent variable model.
This paper extends this concept by designing a generalized L-level nested generative model with a Markov chain structure.
arXiv Detail & Related papers (2024-06-10T11:00:26Z)
- Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics [10.673414267895355]
We present a novel approach for compressing overparameterized models.
Our algorithm improves the training efficiency by more than 2x, without compromising generalization.
arXiv Detail & Related papers (2023-11-08T23:57:03Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation on the optimal number of tokens one position should focus on.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
- Scaling Private Deep Learning with Low-Rank and Sparse Gradients [5.14780936727027]
We propose a framework that exploits the low-rank and sparse structure of neural networks to reduce the dimension of gradient updates.
A novel strategy is utilized to sparsify the gradients, resulting in low-dimensional, less noisy updates.
Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.
arXiv Detail & Related papers (2022-07-06T14:09:47Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Compressing Deep ODE-Nets using Basis Function Expansions [105.05435207079759]
We consider formulations of the weights as continuous-depth functions using linear combinations of basis functions.
This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance.
In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments.
arXiv Detail & Related papers (2021-06-21T03:04:51Z)
- Rethinking Skip Connection with Layer Normalization in Transformers and ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate how the scale factor affects the effectiveness of the skip connection.
arXiv Detail & Related papers (2021-05-15T11:44:49Z)