Compact Model Training by Low-Rank Projection with Energy Transfer
- URL: http://arxiv.org/abs/2204.05566v3
- Date: Wed, 14 Aug 2024 15:31:26 GMT
- Title: Compact Model Training by Low-Rank Projection with Energy Transfer
- Authors: Kailing Guo, Zhenquan Lin, Canyang Chen, Xiaofen Xing, Fang Liu, Xiangmin Xu
- Abstract summary: Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning.
Previous low-rank network compression methods compress networks by approximating pre-trained models and re-training.
We devise a new training method, low-rank projection with energy transfer, that trains low-rank compressed networks from scratch and achieves competitive performance.
- Score: 13.446719541044663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning. Most previous low-rank network compression methods compress networks by approximating pre-trained models and re-training. However, the optimal solution in the Euclidean space may be quite different from the one with low-rank constraint. A well-pre-trained model is not a good initialization for the model with low-rank constraints. Thus, the performance of a low-rank compressed network degrades significantly. Compared with other network compression methods such as pruning, low-rank methods attract less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. We propose to alternately perform stochastic gradient descent training and projection of each weight matrix onto the corresponding low-rank manifold. Compared to re-training on the compact model, this enables full utilization of model capacity since solution space is relaxed back to Euclidean space after projection. The matrix energy (the sum of squares of singular values) reduction caused by projection is compensated by energy transfer. We uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. In modern networks, a batch normalization (BN) layer can be merged into the previous convolution layer for inference, thereby influencing the optimal low-rank approximation of the previous layer. We propose BN rectification to cut off its effect on the optimal low-rank approximation, which further improves the performance.
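The projection and energy-transfer step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the projection is a truncated SVD and that the energy of the pruned singular values is moved to the kept ones through a single scale factor, so that the sum of squared singular values is unchanged. The exact transfer rule in the paper may differ, and the function name `project_with_energy_transfer` is hypothetical.

```python
import numpy as np


def project_with_energy_transfer(weight, rank):
    """Project `weight` onto rank-`rank` matrices, then restore its energy.

    Energy here means the sum of squared singular values, which equals
    the squared Frobenius norm of the matrix.
    """
    if not 1 <= rank <= min(weight.shape):
        raise ValueError("rank must be between 1 and min(weight.shape)")

    # Thin SVD; singular values come back sorted in descending order.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    kept = s[:rank]

    total_energy = np.sum(s ** 2)
    kept_energy = np.sum(kept ** 2)

    # Assumption: the pruned energy is transferred by one common scale
    # factor applied to all kept singular values.
    scale = np.sqrt(total_energy / kept_energy) if kept_energy > 0 else 1.0

    # Rebuild the rank-`rank` matrix from the rescaled singular values.
    return (u[:, :rank] * (kept * scale)) @ vt[:rank, :]


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))
w_lr = project_with_energy_transfer(w, rank=2)

print("rank after projection:", np.linalg.matrix_rank(w_lr))
print("energy before:", np.sum(w ** 2))
print("energy after: ", np.sum(w_lr ** 2))
```

In a training loop following the abstract, a step like this would be applied to each weight matrix between phases of ordinary stochastic gradient descent, so the weights are free to leave the low-rank set during the gradient updates and are pulled back at each projection.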
Related papers
- TrAct: Making First-layer Pre-Activations Trainable [65.40281259525578]
We consider the training of the first layer of vision models and notice the clear relationship between pixel values and update magnitudes.
An image with low contrast has a smaller impact on learning than an image with higher contrast.
A very bright or very dark image has a stronger impact on the weights than an image with moderate brightness.
arXiv Detail & Related papers (2024-10-31T14:25:55Z)
- Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition [11.399520888150468]
We present a theoretically justified technique termed Low-Rank Induced Training (LoRITa).
LoRITa promotes low-rankness through the composition of linear layers and compresses by using singular value truncation.
We demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks.
arXiv Detail & Related papers (2024-05-06T00:58:23Z)
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z)
- Riemannian Low-Rank Model Compression for Federated Learning with Over-the-Air Aggregation [2.741266294612776]
Low-rank model compression is a widely used technique for reducing the computational load when training machine learning models.
Existing compression techniques are not directly applicable to efficient over-the-air (OTA) aggregation in federated learning systems.
We propose a novel manifold optimization formulation for low-rank model compression in FL that does not relax the low-rank constraint.
arXiv Detail & Related papers (2023-06-04T18:32:50Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 with ImageNet2012, our reduced model can reach more than two times speed-up in terms of GMAC with merely 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- TRP: Trained Rank Pruning for Efficient Deep Neural Networks [69.06699632822514]
We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training.
A nuclear regularization optimized by sub-gradient descent is utilized to further promote low rank in TRP.
The TRP trained network inherently has a low-rank structure, and is approximated with negligible performance loss.
arXiv Detail & Related papers (2020-04-30T03:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.