TRP: Trained Rank Pruning for Efficient Deep Neural Networks
- URL: http://arxiv.org/abs/2004.14566v1
- Date: Thu, 30 Apr 2020 03:37:36 GMT
- Title: TRP: Trained Rank Pruning for Efficient Deep Neural Networks
- Authors: Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Yingyong Qi,
Yiran Chen, Weiyao Lin, Hongkai Xiong
- Abstract summary: We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training.
A nuclear regularization optimized by sub-gradient descent is utilized to further promote low rank in TRP.
The TRP trained network inherently has a low-rank structure, and is approximated with negligible performance loss.
- Score: 69.06699632822514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To enable DNNs on edge devices like mobile phones, low-rank approximation has
been widely adopted because of its solid theoretical rationale and efficient
implementations. Several previous works attempted to directly approximate a
pretrained model by low-rank decomposition; however, small approximation errors
in the parameters can propagate into a large prediction loss. As a result, performance
usually drops significantly, and substantial fine-tuning is
required to recover accuracy. Clearly, it is not optimal to separate
low-rank approximation from training. Unlike previous works, this paper
integrates low rank approximation and regularization into the training process.
We propose Trained Rank Pruning (TRP), which alternates between low rank
approximation and training. TRP maintains the capacity of the original network
while imposing low-rank constraints during training. A nuclear regularization
optimized by stochastic sub-gradient descent is utilized to further promote low
rank in TRP. The TRP trained network inherently has a low-rank structure, and
is approximated with negligible performance loss, thus eliminating the
fine-tuning process after low rank decomposition. The proposed method is
comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous
compression methods using low rank approximation.
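Read as pseudocode, the alternating scheme in the abstract can be sketched in a few lines of PyTorch. The sketch below is an illustration of the idea only, not the authors' implementation: the helper names (`truncated_svd_`, `nuclear_subgradient_`), the rank ratio, the projection period, and the restriction to `nn.Linear` layers are assumptions (TRP itself also covers convolutional layers by reshaping their kernels into 2-D matrices).

```python
# Illustrative sketch of a TRP-style loop: ordinary SGD training, a stochastic
# sub-gradient of the nuclear norm added to each weight gradient, and a periodic
# low-rank (truncated SVD) projection of the weights. Hyperparameters are placeholders.
import torch
import torch.nn as nn

def truncated_svd_(weight: torch.Tensor, rank_ratio: float = 0.5) -> None:
    """Replace a 2-D weight matrix by its rank-k truncated SVD, in place."""
    U, S, Vh = torch.linalg.svd(weight.data, full_matrices=False)
    k = max(1, int(rank_ratio * S.numel()))
    weight.data.copy_(U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :])

def nuclear_subgradient_(weight: torch.Tensor, coeff: float = 1e-4) -> None:
    """Add coeff * U @ Vh, a sub-gradient of the nuclear norm, to the existing gradient."""
    U, _, Vh = torch.linalg.svd(weight.data, full_matrices=False)
    weight.grad.add_(coeff * (U @ Vh))

def train_trp(model: nn.Module, loader, epochs: int = 10, period: int = 20, lr: float = 0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    step = 0
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            # stochastic sub-gradient of the nuclear-norm regularizer
            for m in model.modules():
                if isinstance(m, nn.Linear):
                    nuclear_subgradient_(m.weight)
            opt.step()
            step += 1
            # periodic low-rank projection of the weights ("rank pruning")
            if step % period == 0:
                with torch.no_grad():
                    for m in model.modules():
                        if isinstance(m, nn.Linear):
                            truncated_svd_(m.weight)
    return model
```

In the paper, the combination of periodic projection and the nuclear regularizer is what leaves the trained network approximable with negligible loss, so no fine-tuning is needed after the final decomposition; the period, rank ratio, and regularization coefficient above would have to be tuned.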
Related papers
- Efficient Neural Network Training via Subset Pretraining [5.352839075466439]
In training neural networks, it is common practice to use partial gradients computed over batches.
The loss minimum of the training set can be expected to be well-approximated by the minima of its subsets.
Experiments confirm that results equivalent to conventional training can be reached.
arXiv Detail & Related papers (2024-10-21T21:31:12Z)
- Grad-Instructor: Universal Backpropagation with Explainable Evaluation Neural Networks for Meta-learning and AutoML [0.0]
An Evaluation Neural Network (ENN) is trained via deep reinforcement learning to predict the performance of the target network.
The ENN then works as an additional evaluation function during backpropagation.
arXiv Detail & Related papers (2024-06-15T08:37:51Z)
- Maestro: Uncovering Low-Rank Structures via Trainable Decomposition [15.254107731735553]
Deep Neural Networks (DNNs) have been a large driver for AI breakthroughs in recent years.
They have been getting increasingly large as they become more accurate and safe.
This means that their training becomes increasingly costly and time-consuming.
We propose Maestro, a framework for trainable low-rank layers.
arXiv Detail & Related papers (2023-08-28T23:08:15Z)
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has been shown to be an effective approach for improving model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients [36.078414964088196]
Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network.
Current methods are insufficient to enable this optimization and lead to a large degradation in model performance.
We propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune.
Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
arXiv Detail & Related papers (2022-02-16T15:18:55Z)
- Back to Basics: Efficient Network Compression via IMP [22.586474627159287]
Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
arXiv Detail & Related papers (2021-11-01T11:23:44Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computational load at the same accuracy (see the sketch after this list).
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
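The last entry above describes a closely related route to the same goal: keep each layer in the factored form U diag(s) V^T so that training never needs an explicit SVD of the full weight, push U and V toward orthogonality, and sparsify the singular values s. A minimal sketch of such a factored layer follows; the class name `SVDLinear`, the initialization, and the penalty weights are illustrative assumptions, not that paper's released code.

```python
# Hedged sketch of an SVD-training layer: the weight is never materialized;
# U and V are regularized toward orthonormal columns and s toward sparsity.
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Linear layer kept in factored form W = U diag(s) V^T throughout training."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        r = min(in_features, out_features)
        self.U = nn.Parameter(torch.randn(out_features, r) / r ** 0.5)
        self.s = nn.Parameter(torch.ones(r))
        self.V = nn.Parameter(torch.randn(in_features, r) / r ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x @ W^T with W = U diag(s) V^T, computed without building W
        return ((x @ self.V) * self.s) @ self.U.t()

    def regularizer(self, ortho: float = 1e-2, sparse: float = 1e-3) -> torch.Tensor:
        # orthogonality penalty on U and V, L1 sparsity penalty on s
        eye = torch.eye(self.s.numel(), device=self.s.device)
        ortho_term = ((self.U.t() @ self.U - eye) ** 2).sum() \
                   + ((self.V.t() @ self.V - eye) ** 2).sum()
        return ortho * ortho_term + sparse * self.s.abs().sum()
```

During training the `regularizer()` term would be added to the task loss; after training, entries of `s` near zero (and the corresponding columns of `U` and `V`) can be dropped to obtain the low-rank layer.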
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.