A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix
Factorization
- URL: http://arxiv.org/abs/2212.14150v2
- Date: Fri, 11 Aug 2023 07:42:04 GMT
- Title: A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix
Factorization
- Authors: Jian Cao, Chen Qian, Yihui Huang, Dicheng Chen, Yuncheng Gao, Jiyang
Dong, Di Guo, Xiaobo Qu
- Abstract summary: Implicit regularization is an important way to interpret neural networks.
Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF).
- Score: 21.64166573203593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit regularization is an important way to interpret neural networks.
Recent theory has begun to explain implicit regularization through the model of
deep matrix factorization (DMF) by analyzing the trajectory of the discrete
gradient dynamics of the optimization process. The steps of these discrete
dynamics are relatively small but not infinitesimal, which fits well with the
practical training of neural networks. So far, discrete gradient dynamics
analysis has been applied successfully to shallow networks, but the computation
becomes prohibitively complex for deep networks. In this work, we introduce
another discrete gradient dynamics approach to explain implicit regularization,
namely landscape analysis, which focuses on critical regions of the loss
landscape such as saddle points and local minima. We theoretically establish
the connection between
saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that,
for a rank-R matrix reconstruction, DMF will converge to a second-order
critical point after R stages of SPE. This conclusion is further experimentally
verified on a low-rank matrix reconstruction problem. This work provides a new
theory to analyze implicit regularization in deep learning.
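To make the stage-wise picture concrete, the following is a minimal NumPy sketch, not the authors' code, of gradient descent on a depth-3 matrix factorization fitted to observed entries of a rank-3 matrix. The matrix size, depth, initialization scale, learning rate, and effective-rank threshold are illustrative assumptions and may need tuning to make the plateaus clearly visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth rank-R matrix to reconstruct; n and R are illustrative choices.
n, R = 20, 3
M = rng.standard_normal((n, R)) @ rng.standard_normal((R, n))
M /= np.linalg.norm(M, 2)  # normalize the spectral norm to 1 for convenience

# Observe a random subset of the entries (low-rank matrix reconstruction).
mask = rng.random((n, n)) < 0.5

# Depth-3 matrix factorization W = W3 @ W2 @ W1 with small near-zero
# initialization, trained by plain gradient descent on the masked squared error.
depth, init_scale, lr, steps = 3, 1e-2, 0.1, 30000
Ws = [init_scale * rng.standard_normal((n, n)) for _ in range(depth)]

def effective_rank(A, tol=1e-2):
    """Number of singular values of A exceeding tol * ||M||_2 (= tol here)."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

for t in range(steps):
    # End-to-end product W = W_d ... W_1 and the gradient of the loss w.r.t. W.
    W = Ws[0]
    for Wi in Ws[1:]:
        W = Wi @ W
    G = mask * (W - M)  # d/dW of 0.5 * ||mask * (W - M)||_F^2

    # Gradient w.r.t. factor i: (product of layers above)^T @ G @ (layers below)^T.
    grads = []
    for i in range(depth):
        above = np.eye(n)
        for Wj in Ws[i + 1:]:
            above = Wj @ above
        below = np.eye(n)
        for Wj in Ws[:i]:
            below = Wj @ below
        grads.append(above.T @ G @ below.T)
    for i in range(depth):
        Ws[i] = Ws[i] - lr * grads[i]

    if t % 2000 == 0:
        print(t, round(0.5 * float(np.sum(G ** 2)), 5), effective_rank(W))
```

With this kind of small, roughly balanced initialization, the printed loss typically decreases through a sequence of plateaus, and the effective rank of the end-to-end product tends to grow by about one per escaped plateau, matching the stage-wise SPE picture described above; the exact trajectory depends on the assumed hyperparameters.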
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z)
- Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z)
- Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit [10.64241024049424]
The deep linear network (DLN) is a model for implicit regularization in gradient-based optimization of overparametrized learning architectures.
We investigate the link between the geometry of the DLN and its training dynamics for matrix completion with rigorous analysis and numerics.
We propose that implicit regularization is a result of bias towards high state space volume.
arXiv Detail & Related papers (2022-10-22T17:03:10Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- Support Vectors and Gradient Dynamics for Implicit Bias in ReLU Networks [45.886537625951256]
We study gradient flow dynamics in the parameter space when training single-neuron ReLU networks.
Specifically, we discover implicit bias in terms of support vectors in ReLU networks, which play a key role in why and how ReLU networks generalize well.
arXiv Detail & Related papers (2022-02-11T08:55:58Z)
- Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank [1.9350867959464846]
In deep learning, gradient descent tends to prefer solutions which generalize well.
In this paper we analyze the dynamics of gradient descent in the simplified setting of linear networks and of an estimation problem.
arXiv Detail & Related papers (2020-11-27T15:08:34Z)
- Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics [1.5393457051344297]
We propose reparametrizing ReLU NNs as continuous piecewise linear splines (a minimal sketch of this spline view appears after this list).
We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum.
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
arXiv Detail & Related papers (2020-08-04T19:19:49Z)
- Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs [115.35745188028169]
We extend conditioning analysis to deep neural networks (DNNs) in order to investigate their learning dynamics.
We show that batch normalization (BN) can stabilize training but sometimes results in the false impression of a local minimum.
We experimentally observe that BN can improve the layer-wise conditioning of the optimization problem.
arXiv Detail & Related papers (2020-02-25T11:40:27Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
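The spline view from the "Shallow Univariate ReLu Networks as Splines" entry above can be made concrete with a short sketch. The following is a minimal NumPy illustration, not the authors' code; the network width, the random weights, and the evaluation grid are illustrative assumptions. It reads off the knot locations of a random one-hidden-layer univariate ReLU network and checks numerically that the function is affine between consecutive knots.

```python
import numpy as np

rng = np.random.default_rng(1)

# A shallow univariate ReLU network f(x) = sum_i v_i * relu(w_i * x + b_i) + c.
# The width and the random weights below are illustrative choices.
width = 8
w = rng.standard_normal(width)
b = rng.standard_normal(width)
v = rng.standard_normal(width)
c = 0.0

def f(x):
    """Evaluate the network on a 1-D array of inputs."""
    pre = np.outer(np.atleast_1d(x), w) + b      # shape (num_points, width)
    return np.maximum(pre, 0.0) @ v + c

# Spline view: unit i switches on/off at the knot x_i = -b_i / w_i, where the
# slope of its contribution jumps between 0 and v_i * w_i.
knots = np.sort(-b / w)

# Between consecutive knots f is affine, so second differences of f on a fine
# grid vanish (up to floating point) except at grid points next to a knot.
x = np.linspace(knots[0] - 1.0, knots[-1] + 1.0, 2001)
y = f(x)
second_diff = y[2:] - 2.0 * y[1:-1] + y[:-2]
step = x[1] - x[0]
near_knot = np.min(np.abs(x[1:-1, None] - knots[None, :]), axis=1) < step
print("knots:", np.round(knots, 3))
print("max |second difference| away from knots:",
      float(np.abs(second_diff[~near_knot]).max()))
```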
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.