Adaptive Low-Rank Regularization with Damping Sequences to Restrict Lazy
Weights in Deep Networks
- URL: http://arxiv.org/abs/2106.09677v1
- Date: Thu, 17 Jun 2021 17:28:14 GMT
- Title: Adaptive Low-Rank Regularization with Damping Sequences to Restrict Lazy
Weights in Deep Networks
- Authors: Mohammad Mahdi Bejani, Mehdi Ghatee
- Abstract summary: This paper detects a subset of the weighting layers that cause overfitting. The overfitting recognizes by matrix and tensor condition numbers.
An adaptive regularization scheme entitled Adaptive Low-Rank (ALR) is proposed that converges a subset of the weighting layers to their Low-Rank Factorization (LRF)
The experimental results show that ALR regularizes the deep networks well with high training speed and low resource usage.
- Score: 13.122543280692641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Overfitting is one of the critical problems in deep neural networks. Many
regularization schemes try to prevent overfitting blindly; however, they
decrease the convergence speed of training algorithms. Adaptive regularization
schemes can address overfitting more intelligently, since they usually do not
affect the entire set of network weights. This paper detects the subset of
weight layers that cause overfitting, where overfitting is recognized via
matrix and tensor condition numbers. An adaptive regularization scheme entitled
Adaptive Low-Rank (ALR) is proposed that drives a subset of the weight layers
towards their Low-Rank Factorization (LRF) by minimizing a new Tikhonov-based
loss function. ALR also encourages lazy weights to contribute to the
regularization as the number of epochs grows: it uses a damping sequence to
increase the layer-selection likelihood in later epochs. Thus, before the
training accuracy starts to fall, ALR reduces the lazy weights and regularizes
the network substantially. The experimental results show that ALR regularizes
deep networks well, with high training speed and low resource usage.
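A minimal sketch of the ALR idea follows, assuming a PyTorch setting; the condition-number threshold, the target rank, and the exact form of the damping sequence are illustrative assumptions rather than the paper's reported choices. The sketch flags ill-conditioned weight matrices, pulls them towards their truncated-SVD (low-rank) factorization with a Tikhonov-style penalty, and lets a damping sequence raise the layer-selection likelihood as epochs accumulate.

```python
# Illustrative sketch of ALR-style adaptive low-rank regularization.
# Threshold, rank, and damping sequence are assumptions, not the paper's settings.
import torch


def flag_overfitting_layers(weights, cond_threshold=1e3):
    """Indices of 2-D weight matrices whose condition number is large."""
    return [i for i, w in enumerate(weights)
            if w.dim() == 2 and torch.linalg.cond(w) > cond_threshold]


def low_rank_penalty(w, rank=4):
    """Frobenius distance between W and its best rank-`rank` approximation."""
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    w_r = (u[:, :rank] * s[:rank]) @ vh[:rank, :]
    # Treat the low-rank target as a constant in the backward pass (a design
    # choice of this sketch, to avoid differentiating through the SVD).
    return ((w - w_r.detach()) ** 2).sum()


def selection_probability(epoch, damping=0.9):
    """Damping sequence: the chance that a flagged layer joins the penalty
    grows towards 1 in later epochs, so lazy weights are eventually regularized."""
    return 1.0 - damping ** (epoch + 1)


def alr_loss(task_loss, weights, epoch, lam=1e-4):
    """Tikhonov-style total loss: task loss plus a damped low-rank penalty."""
    p = selection_probability(epoch)
    penalty = sum(low_rank_penalty(weights[i])
                  for i in flag_overfitting_layers(weights)
                  if torch.rand(()) < p)
    return task_loss + lam * penalty
```

In a training loop, alr_loss would replace the plain task loss at each step; the growing selection probability is one way to read the damping-sequence behaviour described in the abstract.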
Related papers
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z) - A Framework for Provably Stable and Consistent Training of Deep
Feedforward Networks [4.21061712600981]
We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios.
This algorithm combines the standard descent gradient and the gradient clipping method.
We show, in theory and through experiments, that our algorithm updates have low variance, and the training loss reduces in a smooth manner.
arXiv Detail & Related papers (2023-05-20T07:18:06Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Adaptive Self-supervision Algorithms for Physics-informed Neural
Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z) - Logit Attenuating Weight Normalization [5.856897366207895]
Deep networks trained using gradient-based optimizers are a popular choice for solving classification and ranking problems.
Without appropriately tuned $\ell_2$ regularization or weight decay, such networks have the tendency to make output scores (logits) and network weights large.
We propose a method called Logit Attenuating Weight Normalization (LAWN) that can be stacked onto any gradient-based optimizer.
arXiv Detail & Related papers (2021-08-12T16:44:24Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains (a minimal sketch of this rising-penalty idea appears after this list).
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Improve Generalization and Robustness of Neural Networks via Weight
Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z) - Adaptive Low-Rank Factorization to regularize shallow and deep neural
networks [9.607123078804959]
We use Low-Rank matrix Factorization (LRF) to drop some parameters of the learning model during training.
The best results of AdaptiveLRF on the SVHN and CIFAR-10 datasets are 98% and 94.1% F-measure, and 97.9% and 94% accuracy, respectively.
arXiv Detail & Related papers (2020-05-05T08:13:30Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
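Complementing the rising-penalty idea in the Neural Pruning via Growing Regularization entry above, the sketch below shows an L2 term whose coefficient grows with the training step; the function name and the base/growth/cap values are illustrative assumptions, not the authors' implementation.

```python
# Illustrative growing L2 penalty: the coefficient rises with the training
# step until it reaches a cap. Base/growth/cap values are placeholders.
import torch


def growing_l2_penalty(params, step, base=1e-5, growth=1e-7, cap=1e-2):
    """L2 term whose weight increases linearly with the training step."""
    lam = min(base + growth * step, cap)
    return lam * sum(p.pow(2).sum() for p in params)


# Usage inside a training loop (hypothetical):
#   loss = task_loss + growing_l2_penalty(model.parameters(), step)
```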