Flat minima generalize for low-rank matrix recovery
- URL: http://arxiv.org/abs/2203.03756v1
- Date: Mon, 7 Mar 2022 22:35:20 GMT
- Title: Flat minima generalize for low-rank matrix recovery
- Authors: Lijun Ding, Dmitriy Drusvyatskiy, Maryam Fazel
- Abstract summary: We show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions.
For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well.
- Score: 16.956238550063365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empirical evidence suggests that for a variety of overparameterized nonlinear
models, most notably in neural network training, the growth of the loss around
a minimizer strongly impacts its performance. Flat minima -- those around which
the loss grows slowly -- appear to generalize well. This work takes a step
towards understanding this phenomenon by focusing on the simplest class of
overparameterized nonlinear models: those arising in low-rank matrix recovery.
We analyze overparameterized matrix and bilinear sensing, robust PCA,
covariance matrix estimation, and single hidden layer neural networks with
quadratic activation functions. In all cases, we show that flat minima,
measured by the trace of the Hessian, exactly recover the ground truth under
standard statistical assumptions. For matrix completion, we establish weak
recovery, although empirical evidence suggests exact recovery holds here as
well. We complete the paper with synthetic experiments that illustrate our
findings.
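As a concrete illustration of the flatness measure in the abstract, here is a minimal numerical sketch (not the authors' code; the dimensions n, r, k, the sample size m, and the probe count are illustrative choices). It sets up overparameterized symmetric matrix sensing, estimates the trace of the Hessian at a zero-loss point with Hutchinson probes and finite-difference Hessian-vector products, and compares it with the closed form (1/m) * sum_i ||(A_i + A_i^T) X||_F^2, which for Gaussian measurements concentrates around 2(n+1)||X||_F^2. This is the mechanism behind the recovery results: among global minimizers, the flattest ones have the smallest Frobenius norm of the factor, i.e. the smallest nuclear norm of X X^T.
```python
# Minimal sketch, not the paper's code: flatness (trace of the Hessian) in
# overparameterized symmetric matrix sensing, f(X) = (1/2m) sum_i (<A_i, X X^T> - y_i)^2.
import numpy as np

rng = np.random.default_rng(0)
n, r, k, m = 15, 2, 15, 400              # dimension, true rank, factor width (k >= r), #measurements

X_star = rng.standard_normal((n, r))                    # ground-truth rank-r factor
X0 = np.hstack([X_star, np.zeros((n, k - r))])          # a zero-loss point of the overparameterized problem
A = rng.standard_normal((m, n, n))                      # Gaussian sensing matrices
y = np.einsum('mij,ij->m', A, X0 @ X0.T)                # noiseless measurements

def grad(X):
    """Gradient of f at X."""
    res = np.einsum('mij,ij->m', A, X @ X.T) - y        # residuals <A_i, X X^T> - y_i
    G = np.einsum('m,mij->ij', res, A)
    return (G + G.T) @ X / m

def hutchinson_trace(X, probes=200, eps=1e-4):
    """Estimate tr(Hessian of f at X) with Rademacher probes and
    central finite-difference Hessian-vector products."""
    est = 0.0
    for _ in range(probes):
        V = rng.choice([-1.0, 1.0], size=X.shape)
        hess_vec = (grad(X + eps * V) - grad(X - eps * V)) / (2 * eps)
        est += np.sum(V * hess_vec)
    return est / probes

# Closed form at a zero-loss point: (1/m) sum_i ||(A_i + A_i^T) X||_F^2
closed_form = np.sum(np.einsum('mij,jk->mik', A + A.transpose(0, 2, 1), X0) ** 2) / m

print("Hutchinson estimate of tr(Hessian):", hutchinson_trace(X0))
print("closed form at zero loss          :", closed_form)
print("2(n+1)*||X||_F^2 (expected value) :", 2 * (n + 1) * np.sum(X0 ** 2))
```
Finite-difference Hessian-vector products are used only to keep the sketch dependency-free; an autodiff framework would give the same trace estimate.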
Related papers
- Simplicity Bias via Global Convergence of Sharpness Minimization [43.658859631741024]
We show that label noise SGD always minimizes the sharpness on the manifold of models with zero loss for two-layer networks.
We also find a novel property of the trace of the Hessian of the loss at approximate stationary points on the manifold of zero loss.
arXiv Detail & Related papers (2024-10-21T18:10:37Z) - Minimum-Norm Interpolation Under Covariate Shift [14.863831433459902]
In-distribution research on high-dimensional linear regression has led to the identification of a phenomenon known as benign overfitting.
We prove the first non-asymptotic excess risk bounds for benignly-overfit linear interpolators in the transfer learning setting.
arXiv Detail & Related papers (2024-03-31T01:41:57Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of solutions that minimize the trace of the Hessian in deep linear networks.
We show that, for all depths greater than one and under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (a simplified depth-2 illustration appears after this list).
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Implicit Regularization for Group Sparsity [33.487964460794764]
We show that gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure.
We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates.
In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression.
arXiv Detail & Related papers (2023-01-29T20:54:03Z) - An Unconstrained Layer-Peeled Perspective on Neural Collapse [20.75423143311858]
We introduce a surrogate model called the unconstrained layer-peeled model (ULPM).
We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem exhibiting neural collapse in its global minimizer.
We show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used.
arXiv Detail & Related papers (2021-10-06T14:18:47Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Entropy Minimizing Matrix Factorization [102.26446204624885]
Nonnegative Matrix Factorization (NMF) is a widely-used data analysis technique, and has yielded impressive results in many real-world tasks.
In this study, an Entropy Minimizing Matrix Factorization framework (EMMF) is developed to make matrix factorization robust to outliers.
Considering that outliers are usually far fewer than normal samples, a new entropy loss function is established for matrix factorization.
arXiv Detail & Related papers (2021-03-24T21:08:43Z) - Learning Mixtures of Low-Rank Models [89.39877968115833]
We study the problem of learning mixtures of low-rank models.
We develop an algorithm that is guaranteed to recover the unknown matrices with near-optimal sample complexity.
In addition, the proposed algorithm is provably stable against random noise.
arXiv Detail & Related papers (2020-09-23T17:53:48Z) - A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix
Completion [60.52730146391456]
We propose a new scalable nonconvex low-rank regularizer, the "nuclear norm minus Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values and allows fast optimization.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z)
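The depth-2 illustration referenced in the flatness-regularization entry above is sketched below (again my own simplification, restricted to the fully observed case rather than RIP measurements; the dimensions p, n, d and the target rank are arbitrary). For f(W1, W2) = ||W2 W1 - M||_F^2 / 2 with W1 of size d x n and W2 of size p x d, the Hessian trace at any zero-loss factorization equals n*||W2||_F^2 + p*||W1||_F^2, and its minimum over all factorizations W2 W1 = M is 2*sqrt(n*p) times the nuclear (Schatten 1-) norm of M, attained at a balanced SVD factorization.
```python
# Simplified sketch (fully observed, depth-2 case only) of the
# trace-of-Hessian <-> Schatten-1-norm connection for matrix factorization.
import numpy as np

rng = np.random.default_rng(1)
p, n, d = 8, 6, 10                         # output dim, input dim, inner (overparameterized) width
M = rng.standard_normal((p, 3)) @ rng.standard_normal((3, n))   # rank-3 target

def hessian_trace(W1, W2):
    # Closed form at zero loss: sum over coordinate perturbations of the
    # squared Frobenius norms of the directional derivatives of W2 @ W1.
    return n * np.sum(W2 ** 2) + p * np.sum(W1 ** 2)

# Flattest zero-loss factorization: balanced SVD factors, padded to width d.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
c = (p / n) ** 0.25                        # balancing scale minimizing n*c^2 + p/c^2
W2_flat = np.hstack([U * np.sqrt(s) * c, np.zeros((p, d - len(s)))])
W1_flat = np.vstack([np.sqrt(s)[:, None] * Vt / c, np.zeros((d - len(s), n))])

# Another zero-loss factorization: insert an invertible mixing matrix between the factors.
G = np.eye(d) + 0.5 * rng.standard_normal((d, d))
W2_other, W1_other = W2_flat @ G, np.linalg.solve(G, W1_flat)

nuclear_norm = np.sum(s)
print("2*sqrt(n*p)*||M||_*       :", 2 * np.sqrt(n * p) * nuclear_norm)
print("tr(Hessian), balanced     :", hessian_trace(W1_flat, W2_flat))
print("tr(Hessian), mixed        :", hessian_trace(W1_other, W2_other))
print("fit error of mixed factors:", np.linalg.norm(W2_other @ W1_other - M))
```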