Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent
- URL: http://arxiv.org/abs/2410.16247v1
- Date: Mon, 21 Oct 2024 17:52:01 GMT
- Title: Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent
- Authors: Santhosh Karnik, Anna Veselovska, Mark Iwen, Felix Krahmer,
- Abstract summary: We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime.
We prove the first tensor result of its kind for gradient descent rather than gradient flow.
- Score: 4.031100721019478
- License:
- Abstract: We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, encouraged by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated by an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of using a small random initialization.
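To make the setting concrete, here is a minimal sketch (not the authors' exact experimental setup) of the tubal tensor product computed via the FFT along the third mode, together with plain gradient descent on an overparametrized tubal factorization A * B started from a small random initialization. The function names (t_product, t_transpose, effective_tubal_rank), the toy dimensions, the learning rate, and the rank tolerance are illustrative assumptions; the behavior one would expect from the paper's result is that the effective tubal rank of A * B stays small and settles at the tubal rank of the target rather than climbing to full rank.

```python
# Illustrative sketch only: tubal (t-)product via FFT along the tube dimension,
# plus gradient descent on an overparametrized factorization with small random init.
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3):
    FFT along the third mode, slice-wise matrix products, inverse FFT."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    n1, _, n3 = A.shape
    C_hat = np.empty((n1, B.shape[1], n3), dtype=complex)
    for k in range(n3):
        C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k]
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_transpose(X):
    """Tensor transpose: transpose each frontal slice and reverse slices 2..n3."""
    return np.concatenate([X[:, :, :1], X[:, :, 1:][:, :, ::-1]], axis=2).transpose(1, 0, 2)

def effective_tubal_rank(X, tol=1e-3):
    """Largest number of singular values above tol among Fourier-domain slices."""
    X_hat = np.fft.fft(X, axis=2)
    return max(int((np.linalg.svd(X_hat[:, :, k], compute_uv=False) > tol).sum())
               for k in range(X.shape[2]))

rng = np.random.default_rng(0)
n, n3, r = 10, 4, 2                            # toy sizes; target has tubal rank r = 2
Y = t_product(rng.standard_normal((n, r, n3)) / np.sqrt(n),
              rng.standard_normal((r, n, n3)) / np.sqrt(n))

alpha, lr = 1e-3, 0.1                          # small random initialization, step size
A = alpha * rng.standard_normal((n, n, n3))    # overparametrized: inner dimension n >> r
B = alpha * rng.standard_normal((n, n, n3))

# Gradient descent on 0.5 * ||A * B - Y||_F^2; the gradients are themselves t-products.
for step in range(1001):
    R = t_product(A, B) - Y
    A, B = (A - lr * t_product(R, t_transpose(B)),
            B - lr * t_product(t_transpose(A), R))
    if step % 200 == 0:
        print(step, effective_tubal_rank(t_product(A, B)), float(np.linalg.norm(R)))
```

In this sketch the reported effective tubal rank starts near zero, rises to the target rank, and stays there while the residual decreases, which is the kind of low-tubal-rank bias the abstract describes; with a large initialization this behavior is not expected.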
Related papers
- Low-Rank Tensor Learning by Generalized Nonconvex Regularization [25.115066273660478]
We study the problem of low-rank tensor learning, where only a few samples of the underlying tensor are observed.
A family of nonconvex functions is employed to characterize the low-rankness of the underlying tensor.
An algorithm based on majorization-minimization is proposed to solve the resulting nonconvex problem.
arXiv Detail & Related papers (2024-10-24T03:33:20Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Understanding Deflation Process in Over-parametrized Tensor
Decomposition [17.28303004783945]
We study the training dynamics for gradient flow on over-parametrized tensor decomposition problems.
Empirically, such a training process often first fits the larger components and then discovers the smaller components.
arXiv Detail & Related papers (2021-06-11T18:51:36Z) - Implicit Regularization in Deep Tensor Factorization [0.0]
We introduce deep Tucker and TT unconstrained factorizations to deal with the completion task.
Experiments on both synthetic and real data show that gradient descent promotes solutions of low rank.
arXiv Detail & Related papers (2021-05-04T07:48:40Z) - Implicit Regularization in Tensor Factorization [17.424619189180675]
Implicit regularization in deep learning is perceived as a tendency of gradient-based optimization to fit training data with predictors of minimal "complexity".
We argue that tensor rank may pave the way to explaining both the implicit regularization of neural networks and the properties of real-world data that translate this implicit regularization into generalization.
arXiv Detail & Related papers (2021-02-19T15:10:26Z) - On the Implicit Bias of Initialization Shape: Beyond Infinitesimal
Mirror Descent [55.96478231566129]
We show that relative scales play an important role in determining the learned model.
We develop a technique for deriving the inductive bias of gradient flow.
arXiv Detail & Related papers (2021-02-19T07:10:48Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)