Training Acceleration of Low-Rank Decomposed Networks using Sequential
Freezing and Rank Quantization
- URL: http://arxiv.org/abs/2309.03824v1
- Date: Thu, 7 Sep 2023 16:33:42 GMT
- Title: Training Acceleration of Low-Rank Decomposed Networks using Sequential
Freezing and Rank Quantization
- Authors: Habib Hajimolahoseini and Walid Ahmed and Yang Liu
- Abstract summary: We propose two techniques for accelerating low-rank decomposed models without requiring small ranks for decomposition.
These methods include rank optimization and sequential freezing of layers.
Experiments show that these techniques can improve model throughput by up to 60% during training and 37% during inference when combined.
- Score: 5.914653351242832
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Low Rank Decomposition (LRD) is a model compression technique applied
to the weight tensors of deep learning models in order to reduce the number of
trainable parameters and computational complexity. However, due to the high
number of new layers added to the architecture after applying LRD, it may not
lead to high training/inference acceleration if the decomposition ranks are not
small enough. The issue is that using small ranks increases the risk of a
significant accuracy drop after decomposition. In this paper, we propose two
techniques for accelerating low-rank decomposed models without requiring small
ranks for decomposition. These methods include rank optimization and sequential
freezing of decomposed layers. We perform experiments on both convolutional and
transformer-based models. Experiments show that these techniques can improve
model throughput by up to 60% during training and 37% during inference when
combined, while preserving accuracy close to that of the original models.
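As a rough illustration of the two ideas, the minimal sketch below decomposes a PyTorch linear layer into two low-rank factors via truncated SVD and then freezes the decomposed blocks one at a time during training. The layer type, the hand-picked ranks, the freezing schedule, and helper names such as decompose_linear are assumptions made for illustration; the paper's rank optimization/quantization step is not shown, and this is not the authors' implementation.

```python
# Sketch of Low Rank Decomposition (LRD) of a linear layer plus sequential
# freezing of the decomposed blocks. Ranks, schedule, and helper names are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


def decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace W (out x in) with two factors: (out x r) @ (r x in)."""
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]          # out x r, singular values folded in
    V_r = Vh[:rank, :]                    # r x in
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)


# Toy model with two decomposed blocks.
model = nn.Sequential(
    decompose_linear(nn.Linear(512, 512), rank=64),
    nn.ReLU(),
    decompose_linear(nn.Linear(512, 10), rank=8),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

decomposed_blocks = [model[0], model[2]]
freeze_after = {0: 2, 1: 4}  # block index -> epoch at which it is frozen

for epoch in range(6):
    # Sequential freezing: once a block's epoch is reached, stop updating it.
    for i, block in enumerate(decomposed_blocks):
        if epoch >= freeze_after[i]:
            for p in block.parameters():
                p.requires_grad_(False)
    x = torch.randn(32, 512)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```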
Related papers
- AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning [9.51289606759621]
Training and fine-tuning large language models (LLMs) come with challenges related to memory and computational requirements.
Various techniques have been developed to tackle these challenges, such as low-rank adaptation (LoRA).
We introduce a new method inspired by a phenomenon we formally prove: as training progresses, the rank of the estimated gradient gradually decreases.
arXiv Detail & Related papers (2024-10-23T13:53:26Z)
- Accelerating the Low-Rank Decomposed Models [4.817356884702073]
We present a comprehensive study of how to modify the low-rank decomposition technique in AI models.
This allows us to benefit from both high accuracy and low memory consumption, as well as faster training and inference.
arXiv Detail & Related papers (2024-07-24T20:26:58Z)
- Maestro: Uncovering Low-Rank Structures via Trainable Decomposition [15.254107731735553]
Deep Neural Networks (DNNs) have been a large driver for AI breakthroughs in recent years.
They have been getting increasingly large as they become more accurate and safe.
This means that their training becomes increasingly costly and time-consuming.
We propose Maestro, a framework for trainable low-rank layers.
arXiv Detail & Related papers (2023-08-28T23:08:15Z)
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
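A minimal sketch of the underlying idea: keep a frozen base weight and express the cumulative update as a trainable low-rank product. The rank, the initialization, and InRank's incremental rank-growth rule are assumptions simplified away here.

```python
# Rough sketch: W = W0 (frozen) + B @ A, so the cumulative weight update
# stays rank <= r by construction. Rank, init, and the incremental
# rank-growth rule are illustrative assumptions.
import torch
import torch.nn as nn


class LowRankUpdateLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.weight0 = nn.Parameter(torch.randn(out_features, in_features),
                                    requires_grad=False)  # frozen base weight
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight0 + self.B @ self.A).T


layer = LowRankUpdateLinear(256, 256, rank=16)
out = layer(torch.randn(8, 256))
print(out.shape)  # torch.Size([8, 256])
```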
arXiv Detail & Related papers (2023-06-20T03:03:04Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix products with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
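For context, the classical column-row sampling estimator that this line of work builds on can be sketched as follows; the winner-take-all selection that distinguishes WTA-CRS is not shown, and the norm-proportional sampling distribution here is only the standard textbook choice.

```python
# Classical unbiased column-row sampling estimator of A @ B: sample k
# column/row pairs with probability proportional to their norms and rescale.
# This is the generic technique, not the WTA-CRS estimator itself.
import torch


def crs_matmul(A: torch.Tensor, B: torch.Tensor, k: int) -> torch.Tensor:
    p = A.norm(dim=0) * B.norm(dim=1)   # ||A[:, i]|| * ||B[i, :]||
    p = p / p.sum()
    idx = torch.multinomial(p, k, replacement=True)
    scale = 1.0 / (k * p[idx])          # rescaling keeps the estimator unbiased
    return (A[:, idx] * scale) @ B[idx, :]


A, B = torch.randn(128, 512), torch.randn(512, 64)
approx = crs_matmul(A, B, k=64)
exact = A @ B
print((approx - exact).norm() / exact.norm())  # relative error of the estimate
```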
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Cuttlefish: Low-Rank Model Training without All the Tuning [55.984294012024755]
We introduce Cuttlefish, an automated low-rank training approach.
Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged.
Our results show that Cuttlefish generates models up to 5.6 times smaller than full-rank models, and attains up to a 1.2 times faster end-to-end training process.
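A hedged sketch of the stable-rank quantity mentioned above (squared Frobenius norm over squared spectral norm), with an assumed relative-change test standing in for the paper's actual convergence criterion.

```python
# Stable rank of a weight matrix and a simple convergence check. The
# relative-change threshold is an assumption for illustration only.
import torch


def stable_rank(W):
    s = torch.linalg.svdvals(W)                  # singular values, descending
    return float((s ** 2).sum() / (s[0] ** 2))   # ||W||_F^2 / ||W||_2^2


def has_converged(history, tol=0.01):
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return abs(curr - prev) / max(prev, 1e-12) < tol


W = torch.randn(512, 512)
history = [stable_rank(W)]
# During full-rank training one would append stable_rank(layer.weight) every
# few steps; once has_converged(history) holds for every layer, factorize the
# weights and continue training the low-rank factors instead.
```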
arXiv Detail & Related papers (2023-05-04T04:20:20Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on all three datasets on image classification in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 on ImageNet2012, our reduced model can reach more than two times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
- TRP: Trained Rank Pruning for Efficient Deep Neural Networks [69.06699632822514]
We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training.
A nuclear regularization optimized by sub-gradient descent is utilized to further promote low rank in TRP.
The TRP trained network inherently has a low-rank structure, and is approximated with negligible performance loss.
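A small sketch of how nuclear-norm regularization can be applied by sub-gradient descent, as mentioned above: a sub-gradient of the nuclear norm at W is U @ Vh from the SVD of W. The regularization strength and schedule are assumptions, not TRP's exact settings.

```python
# Sub-gradient step on the nuclear norm ||W||_* (sum of singular values),
# used to push a weight matrix towards low rank. Strength and schedule are
# illustrative assumptions.
import torch

lam = 1e-4                      # assumed regularization strength
W = torch.randn(256, 256)       # stands in for a layer's weight

U, S, Vh = torch.linalg.svd(W, full_matrices=False)
nuclear_norm = S.sum()
subgrad = U @ Vh                # sub-gradient of the nuclear norm at W

# Applied alongside the usual task-loss gradient step in TRP-style training:
with torch.no_grad():
    W -= lam * subgrad
```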
arXiv Detail & Related papers (2020-04-30T03:37:36Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
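A minimal sketch of the SVD-training idea from the entry above: keep the weight in factored form W = U diag(s) V^T, penalize non-orthogonality of U and V, and apply an L1 penalty on s to sparsify singular values. Shapes and penalty weights are illustrative assumptions, not the paper's settings.

```python
# SVD-training style layer: factored weight with orthogonality and
# singular-value sparsity regularizers. Penalty weights are assumptions.
import torch
import torch.nn as nn


class SVDLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        k = min(in_features, out_features)
        self.U = nn.Parameter(torch.randn(out_features, k) * 0.05)
        self.s = nn.Parameter(torch.ones(k))
        self.V = nn.Parameter(torch.randn(in_features, k) * 0.05)

    def forward(self, x):
        # W = U diag(s) V^T, applied as x @ W^T
        return x @ ((self.U * self.s) @ self.V.T).T

    def regularizer(self, ortho_w=1.0, sparse_w=1e-3):
        k = self.s.numel()
        I = torch.eye(k)
        ortho = ((self.U.T @ self.U - I) ** 2).sum() + \
                ((self.V.T @ self.V - I) ** 2).sum()
        return ortho_w * ortho + sparse_w * self.s.abs().sum()


layer = SVDLinear(256, 128)
x = torch.randn(4, 256)
loss = layer(x).pow(2).mean() + layer.regularizer()
loss.backward()
```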