On the Convergence of Stochastic Gradient Descent with Low-Rank
Projections for Convex Low-Rank Matrix Problems
- URL: http://arxiv.org/abs/2001.11668v2
- Date: Sun, 14 Jun 2020 14:26:36 GMT
- Title: On the Convergence of Stochastic Gradient Descent with Low-Rank
Projections for Convex Low-Rank Matrix Problems
- Authors: Dan Garber
- Abstract summary: We revisit the use of Stochastic Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations of low-rank matrix recovery problems.
We show that SGD may indeed be practically applicable to solving large-scale convex relaxations of low-rank matrix recovery problems.
- Score: 19.24470467199451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the use of Stochastic Gradient Descent (SGD) for solving convex
optimization problems that serve as highly popular convex relaxations for many
important low-rank matrix recovery problems such as \textit{matrix completion},
\textit{phase retrieval}, and more. The computational limitation of applying
SGD to solving these relaxations in large-scale is the need to compute a
potentially high-rank singular value decomposition (SVD) on each iteration in
order to enforce the low-rank-promoting constraint. We begin by considering a
simple and natural sufficient condition so that these relaxations indeed admit
low-rank solutions. This condition is also necessary for a certain notion of
low-rank-robustness to hold. Our main result shows that under this condition
which involves the eigenvalues of the gradient vector at optimal points, SGD
with mini-batches, when initialized with a "warm-start" point, produces
iterates that are low-rank with high probability, and hence only a low-rank SVD
computation is required on each iteration. This suggests that SGD may indeed be
practically applicable to solving large-scale convex relaxations of low-rank
matrix recovery problems. Our theoretical results are accompanied with
supporting preliminary empirical evidence. As a side benefit, our analysis is
quite simple and short.
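For concreteness, the following is a minimal, illustrative sketch (not the authors' code) of the algorithmic template described in the abstract: mini-batch SGD started from a warm-start point, with each iterate projected back onto a nuclear-norm ball via an SVD. The names `grad_minibatch`, `warm_start`, `tau`, and `eta` are assumptions for illustration; the paper's point is that, under its eigenvalue condition, the projection only ever needs a low-rank (truncated) SVD because the iterates remain low-rank with high probability.

```python
# Hedged sketch of projected mini-batch SGD over the nuclear-norm ball
# {X : ||X||_* <= tau}; illustrative only, not the paper's implementation.
import numpy as np

def project_l1_ball(s, tau):
    """Euclidean projection of a nonnegative vector s onto the l1-ball of radius tau."""
    if s.sum() <= tau:
        return s
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - tau))[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.maximum(s - theta, 0.0)

def project_nuclear_ball(X, tau):
    """Project X onto {X : ||X||_* <= tau}: SVD, then project the singular values."""
    # A full SVD is used here for simplicity; in the regime the paper analyzes,
    # a truncated (low-rank) SVD suffices since the projected matrix is low-rank.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_proj = project_l1_ball(s, tau)
    r = max(int(np.count_nonzero(s_proj)), 1)
    return U[:, :r] @ np.diag(s_proj[:r]) @ Vt[:r, :]

def sgd_nuclear_ball(grad_minibatch, warm_start, tau, eta, n_iters, rng):
    """Mini-batch SGD from a warm-start point; grad_minibatch(X, rng) returns
    an unbiased stochastic gradient of the objective at X."""
    X = project_nuclear_ball(warm_start, tau)
    for _ in range(n_iters):
        G = grad_minibatch(X, rng)                  # stochastic mini-batch gradient
        X = project_nuclear_ball(X - eta * G, tau)  # projected SGD step
    return X
```

The SVD-plus-l1-projection step above is the standard Euclidean projection onto the nuclear-norm ball; the cost of that projection, rather than the gradient computation, is the bottleneck the paper identifies when the SVD cannot be truncated to low rank.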
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization [48.65416821017865]
This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD).
It converges linearly at a constant rate independent of the condition number of the low-rank object.
It maintains the low per-iteration cost of gradient descent for a variety of tasks.
arXiv Detail & Related papers (2023-10-09T21:16:57Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems [19.930021245647705]
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning.
In this paper we consider standard convex relaxations for such problems.
We show that the extragradient method converges to an optimal solution with rate $O(1/t)$ while requiring only two low-rank SVDs per iteration.
arXiv Detail & Related papers (2022-02-08T17:47:40Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Provable Low Rank Plus Sparse Matrix Separation Via Nonconvex Regularizers [0.0]
This paper considers a large class of problems where we seek to recover a low-rank matrix and/or sparse vector from some set of measurements.
While methods based on convex relaxations suffer from estimator bias, and other methods require the rank or sparsity to be known a priori, we use nonconvex regularizers.
We present a novel analysis of a proximal alternating descent algorithm applied to such problems.
arXiv Detail & Related papers (2021-09-26T22:09:42Z)
- Low-rank matrix recovery with non-quadratic loss: projected gradient method and regularity projection oracle [23.84884127542249]
Existing results for low-rank matrix recovery largely focus on quadratic loss.
We show that a critical component in provable low-rank recovery with non-quadratic loss is a regularity projection oracle.
arXiv Detail & Related papers (2020-08-31T17:56:04Z)
- A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Completion [60.52730146391456]
We propose a new scalable nonconvex low-rank regularizer, dubbed the "nuclear Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values and allows fast optimization.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z)
- Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
- Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent [34.0533596121548]
Low-rank matrix estimation is a fundamental problem that finds numerous applications in signal processing, machine learning and imaging science.
We show that ScaledGD converges linearly at a rate independent of the condition number of the low-rank matrix (see the illustrative sketch after this list).
Our analysis is also applicable to general loss functions beyond the quadratic loss.
arXiv Detail & Related papers (2020-05-18T17:17:16Z)
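Two of the entries above summarize ScaledGD; as a point of comparison with the projection-based SGD sketch earlier, here is a hedged, illustrative reconstruction of a single ScaledGD-style update on low-rank factors. The assumed shapes and the `grad_f` oracle are illustrative, not taken from the listed papers.

```python
# Illustrative ScaledGD-style step for minimizing f(L @ R.T) over factors
# L (n x r) and R (m x r); a hedged sketch, not the authors' implementation.
import numpy as np

def scaled_gd_step(L, R, grad_f, eta):
    """One step of factored gradient descent preconditioned by (R^T R)^{-1}
    and (L^T L)^{-1}, the scaling that makes the rate independent of the
    condition number of the low-rank matrix."""
    G = grad_f(L @ R.T)  # gradient of the loss at the current matrix estimate
    L_new = L - eta * G @ R @ np.linalg.inv(R.T @ R)
    R_new = R - eta * G.T @ L @ np.linalg.inv(L.T @ L)
    return L_new, R_new
```

The preconditioners are only r x r, so the per-iteration cost stays close to that of plain factored gradient descent, which is consistent with the "low per-iteration cost" claim in the summaries above.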