Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled
Gradient Descent, Even with Overparameterization
- URL: http://arxiv.org/abs/2310.06159v1
- Date: Mon, 9 Oct 2023 21:16:57 GMT
- Title: Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled
Gradient Descent, Even with Overparameterization
- Authors: Cong Ma, Xingyu Xu, Tian Tong, Yuejie Chi
- Abstract summary: This chapter introduces a new algorithmic approach, dubbed scaled gradient (ScaledGD)
It converges linearly at a constant rate independent of the condition number of the low-rank object.
It maintains the low periteration cost of gradient descent for a variety of tasks.
- Score: 48.65416821017865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many problems encountered in science and engineering can be formulated as
estimating a low-rank object (e.g., matrices and tensors) from incomplete, and
possibly corrupted, linear measurements. Through the lens of matrix and tensor
factorization, one of the most popular approaches is to employ simple iterative
algorithms such as gradient descent (GD) to recover the low-rank factors
directly, which allow for small memory and computation footprints. However, the
convergence rate of GD depends linearly, and sometimes even quadratically, on
the condition number of the low-rank object, and therefore, GD slows down
painstakingly when the problem is ill-conditioned. This chapter introduces a
new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that
provably converges linearly at a constant rate independent of the condition
number of the low-rank object, while maintaining the low per-iteration cost of
gradient descent for a variety of tasks including sensing, robust principal
component analysis and completion. In addition, ScaledGD continues to admit
fast global convergence to the minimax-optimal solution, again almost
independent of the condition number, from a small random initialization when
the rank is over-specified in the presence of Gaussian noise. In total,
ScaledGD highlights the power of appropriate preconditioning in accelerating
nonconvex statistical estimation, where the iteration-varying preconditioners
promote desirable invariance properties of the trajectory with respect to the
symmetry in low-rank factorization without hurting generalization.
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates Gradient Normalization without (NSGDC) its gradient reduction variant (NSGDC-VR)
We present significant improvements in the theoretical results for both algorithms.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Max-affine regression via first-order methods [7.12511675782289]
The max-affine model ubiquitously arises in applications in signal processing and statistics.
We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch gradient descent (SGD) for max-affine regression.
arXiv Detail & Related papers (2023-08-15T23:46:44Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of
Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS)
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - Global Convergence of Sub-gradient Method for Robust Matrix Recovery:
Small Initialization, Noisy Measurements, and Over-parameterization [4.7464518249313805]
Sub-gradient method (SubGM) is used to recover a low-rank matrix from a limited number of measurements.
We show that SubGM converges to the true solution, even under arbitrarily large and arbitrarily dense noise values.
arXiv Detail & Related papers (2022-02-17T17:50:04Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - On Asymptotic Linear Convergence of Projected Gradient Descent for
Constrained Least Squares [22.851500417035947]
This manuscript presents a unified framework for the analysis of projected gradient descent in the context of constrained least squares.
We present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application of the recipe on four fundamental problems.
arXiv Detail & Related papers (2021-12-22T09:49:51Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled
Gradient Descent [34.0533596121548]
Low-rank matrix estimation converges convex problem that finds numerous applications in signal processing, machine learning and imaging science.
We show that ScaledGD achieves the best of the best in terms of the number of the low-rank matrix.
Our analysis is also applicable to general loss that are similar to low-rank gradient descent.
arXiv Detail & Related papers (2020-05-18T17:17:16Z) - On the Convergence of Stochastic Gradient Descent with Low-Rank
Projections for Convex Low-Rank Matrix Problems [19.24470467199451]
We revisit the use of Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations.
We show that SGD may indeed be practically applicable to solving large-scale convex relaxations of low-rank matrix recovery problems.
arXiv Detail & Related papers (2020-01-31T06:00:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.