A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix
Factorization
- URL: http://arxiv.org/abs/2212.14150v2
- Date: Fri, 11 Aug 2023 07:42:04 GMT
- Title: A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix
Factorization
- Authors: Jian Cao, Chen Qian, Yihui Huang, Dicheng Chen, Yuncheng Gao, Jiyang
Dong, Di Guo, Xiaobo Qu
- Abstract summary: Implicit regularization is an important way to interpret neural networks.
Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF).
- Score: 21.64166573203593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit regularization is an important way to interpret neural networks.
Recent theory has begun to explain implicit regularization through the model of
deep matrix factorization (DMF) by analyzing the trajectory of the discrete
gradient dynamics of the optimization process. The steps of these discrete
dynamics are relatively small but not infinitesimal, which fits well with the
practical training of neural networks. So far, discrete gradient dynamics
analysis has been applied successfully to shallow networks, but the computation
becomes prohibitively complex for deep networks. In this work, we introduce
another discrete gradient dynamics approach to explain implicit regularization,
namely landscape analysis, which focuses on critical regions of the loss
landscape such as saddle points and local minima. We theoretically establish
the connection between
saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that,
for a rank-R matrix reconstruction, DMF will converge to a second-order
critical point after R stages of SPE. This conclusion is further experimentally
verified on a low-rank matrix reconstruction problem. This work provides a new
theory to analyze implicit regularization in deep learning.
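To make the stage-wise picture concrete, the following is a minimal NumPy sketch, not the authors' code, of gradient descent on a depth-3 matrix factorization fitted to observed entries of a rank-3 matrix. The matrix size, depth, initialization scale, learning rate, and effective-rank threshold are illustrative assumptions and may need tuning to make the plateaus clearly visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth rank-R matrix to reconstruct; n and R are illustrative choices.
n, R = 20, 3
M = rng.standard_normal((n, R)) @ rng.standard_normal((R, n))
M /= np.linalg.norm(M, 2)  # normalize the spectral norm to 1 for convenience

# Observe a random subset of the entries (low-rank matrix reconstruction).
mask = rng.random((n, n)) < 0.5

# Depth-3 matrix factorization W = W3 @ W2 @ W1 with small near-zero
# initialization, trained by plain gradient descent on the masked squared error.
depth, init_scale, lr, steps = 3, 1e-2, 0.1, 30000
Ws = [init_scale * rng.standard_normal((n, n)) for _ in range(depth)]

def effective_rank(A, tol=1e-2):
    """Number of singular values of A exceeding tol * ||M||_2 (= tol here)."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

for t in range(steps):
    # End-to-end product W = W_d ... W_1 and the gradient of the loss w.r.t. W.
    W = Ws[0]
    for Wi in Ws[1:]:
        W = Wi @ W
    G = mask * (W - M)  # d/dW of 0.5 * ||mask * (W - M)||_F^2

    # Gradient w.r.t. factor i: (product of layers above)^T @ G @ (layers below)^T.
    grads = []
    for i in range(depth):
        above = np.eye(n)
        for Wj in Ws[i + 1:]:
            above = Wj @ above
        below = np.eye(n)
        for Wj in Ws[:i]:
            below = Wj @ below
        grads.append(above.T @ G @ below.T)
    for i in range(depth):
        Ws[i] = Ws[i] - lr * grads[i]

    if t % 2000 == 0:
        print(t, round(0.5 * float(np.sum(G ** 2)), 5), effective_rank(W))
```

With this kind of small, roughly balanced initialization, the printed loss typically decreases through a sequence of plateaus, and the effective rank of the end-to-end product tends to grow by about one per escaped plateau, matching the stage-wise SPE picture described above; the exact trajectory depends on the assumed hyperparameters.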
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z)
- Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z)
- Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit [10.64241024049424]
The deep linear network (DLN) is a model for implicit regularization in gradient-based optimization of overparametrized learning architectures.
We investigate the link between the geometry of the DLN and its training dynamics for matrix completion with rigorous analysis and numerics.
We propose that implicit regularization is a result of bias towards high state space volume.
arXiv Detail & Related papers (2022-10-22T17:03:10Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- Support Vectors and Gradient Dynamics for Implicit Bias in ReLU Networks [45.886537625951256]
We study gradient flow dynamics in the parameter space when training single-neuron ReLU networks.
Specifically, we discover implicit bias in terms of support vectors in ReLU networks, which play a key role in why and how ReLU networks generalize well.
arXiv Detail & Related papers (2022-02-11T08:55:58Z)
- Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank [1.9350867959464846]
In deep learning, gradient descent tends to prefer solutions which generalize well.
In this paper we analyze the dynamics of gradient descent in the simplified setting of linear networks and of an estimation problem.
arXiv Detail & Related papers (2020-11-27T15:08:34Z)
- Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics [1.5393457051344297]
We propose reparametrizing ReLU NNs as continuous piecewise linear splines (a minimal sketch of this spline view appears after this list).
We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum.
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
arXiv Detail & Related papers (2020-08-04T19:19:49Z)
- Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs [115.35745188028169]
We extend conditioning analysis to deep neural networks (DNNs) in order to investigate their learning dynamics.
We show that batch normalization (BN) can stabilize training but sometimes results in the false impression of a local minimum.
We experimentally observe that BN can improve the layer-wise conditioning of the optimization problem.
arXiv Detail & Related papers (2020-02-25T11:40:27Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
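The spline view from the "Shallow Univariate ReLu Networks as Splines" entry above can be made concrete with a short sketch. The following is a minimal NumPy illustration, not the authors' code; the network width, the random weights, and the evaluation grid are illustrative assumptions. It reads off the knot locations of a random one-hidden-layer univariate ReLU network and checks numerically that the function is affine between consecutive knots.

```python
import numpy as np

rng = np.random.default_rng(1)

# A shallow univariate ReLU network f(x) = sum_i v_i * relu(w_i * x + b_i) + c.
# The width and the random weights below are illustrative choices.
width = 8
w = rng.standard_normal(width)
b = rng.standard_normal(width)
v = rng.standard_normal(width)
c = 0.0

def f(x):
    """Evaluate the network on a 1-D array of inputs."""
    pre = np.outer(np.atleast_1d(x), w) + b      # shape (num_points, width)
    return np.maximum(pre, 0.0) @ v + c

# Spline view: unit i switches on/off at the knot x_i = -b_i / w_i, where the
# slope of its contribution jumps between 0 and v_i * w_i.
knots = np.sort(-b / w)

# Between consecutive knots f is affine, so second differences of f on a fine
# grid vanish (up to floating point) except at grid points next to a knot.
x = np.linspace(knots[0] - 1.0, knots[-1] + 1.0, 2001)
y = f(x)
second_diff = y[2:] - 2.0 * y[1:-1] + y[:-2]
step = x[1] - x[0]
near_knot = np.min(np.abs(x[1:-1, None] - knots[None, :]), axis=1) < step
print("knots:", np.round(knots, 3))
print("max |second difference| away from knots:",
      float(np.abs(second_diff[~near_knot]).max()))
```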
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.