Small random initialization is akin to spectral learning: Optimization
and generalization guarantees for overparameterized low-rank matrix
reconstruction
- URL: http://arxiv.org/abs/2106.15013v1
- Date: Mon, 28 Jun 2021 22:52:39 GMT
- Title: Small random initialization is akin to spectral learning: Optimization
and generalization guarantees for overparameterized low-rank matrix
reconstruction
- Authors: Dominik Stöger and Mahdi Soltanolkotabi
- Abstract summary: We show that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods.
This implicit spectral bias, which is provably more prominent for overparameterized models, puts the gradient descent iterates on a trajectory towards solutions of low-rank matrix reconstruction that are both globally optimal and generalize well.
- Score: 35.585697639325105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently there has been significant theoretical progress on understanding the
convergence and generalization of gradient-based methods on nonconvex losses
with overparameterized models. Nevertheless, many aspects of optimization and
generalization and in particular the critical role of small random
initialization are not fully understood. In this paper, we take a step towards
demystifying this role by proving that small random initialization followed by
a few iterations of gradient descent behaves akin to popular spectral methods.
We also show that this implicit spectral bias from small random initialization,
which is provably more prominent for overparameterized models, also puts the
gradient descent iterations on a particular trajectory towards solutions that
are not only globally optimal but also generalize well. Concretely, we focus on
the problem of reconstructing a low-rank matrix from a few measurements via a
natural nonconvex formulation. In this setting, we show that the trajectory of
the gradient descent iterations from small random initialization can be
approximately decomposed into three phases: (I) a spectral or alignment phase
where we show that the iterates have an implicit spectral bias akin to
spectral initialization allowing us to show that at the end of this phase the
column space of the iterates and the underlying low-rank matrix are
sufficiently aligned, (II) a saddle avoidance/refinement phase where we show
that the trajectory of the gradient iterates moves away from certain degenerate
saddle points, and (III) a local refinement phase where we show that after
avoiding the saddles the iterates converge quickly to the underlying low-rank
matrix. Underlying our analysis are insights for the analysis of
overparameterized nonconvex optimization schemes that may have implications for
computational problems beyond low-rank reconstruction.
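To make the setting concrete, here is a minimal NumPy sketch of the phenomenon the abstract describes: gradient descent from a small random initialization on an overparameterized factorized formulation of low-rank matrix sensing. This is not the authors' code; the Gaussian measurement ensemble, the initialization scale alpha, the step size eta, and the problem sizes are illustrative assumptions. The printout is meant to show the alignment phase: the column space of the iterates lines up with that of the ground-truth matrix well before the reconstruction error itself drops.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n x n rank-r ground truth, k > r overparameterized factor,
# m Gaussian measurements y_i = <A_i, M_star> (values are assumptions, not from the paper).
n, r, k, m = 30, 2, 5, 600

U_star = np.linalg.qr(rng.standard_normal((n, r)))[0]
M_star = U_star @ U_star.T                        # ground-truth low-rank PSD matrix

A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2                # symmetric measurement matrices
y = np.einsum('mij,ij->m', A, M_star)             # noiseless linear measurements

def gradient(X):
    """Gradient of f(X) = 1/(4m) * sum_i (<A_i, X X^T> - y_i)^2."""
    res = np.einsum('mij,ij->m', A, X @ X.T) - y  # residuals
    return (np.einsum('m,mij->ij', res, A) / m) @ X

def misalignment(X):
    """1 - cos(largest principal angle) between col(X) and col(U_star); 0 means aligned."""
    Q = np.linalg.qr(X)[0]
    return 1.0 - np.linalg.svd(U_star.T @ Q, compute_uv=False).min()

alpha, eta, T = 1e-6, 0.1, 400                    # small init scale, step size, iterations
X = alpha * rng.standard_normal((n, k))           # small random initialization

for t in range(T + 1):
    if t % 50 == 0:
        err = np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star)
        print(f"iter {t:3d}  misalignment {misalignment(X):.2e}  rel. error {err:.2e}")
    X -= eta * gradient(X)
```

With these assumed parameters the misalignment typically collapses within the first hundred or so iterations while the relative error is still close to one, and only afterwards does the error itself fall, loosely mirroring the spectral/alignment phase followed by the refinement phases described in the abstract.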
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Symmetric Matrix Completion with ReLU Sampling [15.095194065320987]
We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with entry-dependent sampling.
In particular, we consider rectified linear unit (ReLU) sampling, where only positive entries are observed.
arXiv Detail & Related papers (2024-06-09T15:14:53Z) - Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that the methods under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z) - Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled
Gradient Descent, Even with Overparameterization [48.65416821017865]
This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD).
It converges linearly at a constant rate independent of the condition number of the low-rank object.
It maintains the low per-iteration cost of gradient descent for a variety of tasks.
arXiv Detail & Related papers (2023-10-09T21:16:57Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Implicit Balancing and Regularization: Generalization and Convergence
Guarantees for Overparameterized Asymmetric Matrix Sensing [28.77440901439686]
A series of recent papers have begun to study this role for nonconvex symmetric Positive Semi-Definite (PSD) matrix sensing problems.
In this paper, we show that the trajectory of gradient descent from small random initialization moves towards solutions that are both globally optimal and generalize well.
arXiv Detail & Related papers (2023-03-24T19:05:52Z) - Algorithmic Regularization in Model-free Overparametrized Asymmetric
Matrix Factorization [16.325663190517773]
We consider the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization.
We show that gradient descent converges to the best low-rank approximation of the observed matrix.
arXiv Detail & Related papers (2022-03-06T00:07:53Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled
Gradient Descent [34.0533596121548]
Low-rank matrix estimation is a nonconvex problem that finds numerous applications in signal processing, machine learning and imaging science.
We show that ScaledGD achieves the best of both worlds: it converges linearly at a rate independent of the condition number of the low-rank matrix.
Our analysis is also applicable to general loss functions that are restricted strongly convex and smooth over low-rank matrices; a minimal sketch of the scaled update appears after this list.
arXiv Detail & Related papers (2020-05-18T17:17:16Z)
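As a rough companion to the two ScaledGD entries above, the sketch below contrasts plain gradient descent with a scaled update that preconditions the gradient by the inverse Gram matrix of the current factor, which is the mechanism behind the reported condition-number-independent linear rate. It is a sketch under my own assumptions (full observation of an ill-conditioned PSD matrix, a moderate random initialization, and an illustrative step size), not the setup or code of either paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ill-conditioned rank-r ground truth (condition number 100); sizes are illustrative assumptions.
n, r = 50, 3
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
M = U @ np.diag([1.0, 0.1, 0.01]) @ U.T

def run(scaled, eta=0.25, T=400):
    """Minimize 1/4 ||X X^T - M||_F^2 with plain or scaled gradient descent."""
    X = 0.1 * rng.standard_normal((n, r))   # moderate random init (the paper itself uses spectral init)
    for _ in range(T):
        G = (X @ X.T - M) @ X               # gradient of the factored objective
        if scaled:
            G = G @ np.linalg.inv(X.T @ X)  # precondition by the inverse Gram matrix
        X -= eta * G
    return np.linalg.norm(X @ X.T - M) / np.linalg.norm(M)

print("plain GD  relative error:", run(scaled=False))
print("ScaledGD  relative error:", run(scaled=True))
```

Under these assumptions plain gradient descent typically stalls on the poorly conditioned directions within this iteration budget, while the scaled variant drives the error much lower, illustrating the contrast the ScaledGD summaries describe.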