On the Convergence of Stochastic Gradient Descent with Low-Rank
Projections for Convex Low-Rank Matrix Problems
- URL: http://arxiv.org/abs/2001.11668v2
- Date: Sun, 14 Jun 2020 14:26:36 GMT
- Title: On the Convergence of Stochastic Gradient Descent with Low-Rank
Projections for Convex Low-Rank Matrix Problems
- Authors: Dan Garber
- Abstract summary: We revisit the use of Stochastic Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations of low-rank matrix recovery problems.
We show that SGD may indeed be practically applicable to solving large-scale convex relaxations of low-rank matrix recovery problems.
- Score: 19.24470467199451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the use of Stochastic Gradient Descent (SGD) for solving convex
optimization problems that serve as highly popular convex relaxations for many
important low-rank matrix recovery problems such as \textit{matrix completion},
\textit{phase retrieval}, and more. The computational limitation of applying
SGD to solving these relaxations in large-scale is the need to compute a
potentially high-rank singular value decomposition (SVD) on each iteration in
order to enforce the low-rank-promoting constraint. We begin by considering a
simple and natural sufficient condition so that these relaxations indeed admit
low-rank solutions. This condition is also necessary for a certain notion of
low-rank-robustness to hold. Our main result shows that under this condition
which involves the eigenvalues of the gradient vector at optimal points, SGD
with mini-batches, when initialized with a "warm-start" point, produces
iterates that are low-rank with high probability, and hence only a low-rank SVD
computation is required on each iteration. This suggests that SGD may indeed be
practically applicable to solving large-scale convex relaxations of low-rank
matrix recovery problems. Our theoretical results are accompanied with
supporting preliminary empirical evidence. As a side benefit, our analysis is
quite simple and short.
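For concreteness, the following is a minimal, illustrative sketch (not the authors' code) of the algorithmic template described in the abstract: mini-batch SGD started from a warm-start point, with each iterate projected back onto a nuclear-norm ball via an SVD. The names `grad_minibatch`, `warm_start`, `tau`, and `eta` are assumptions for illustration; the paper's point is that, under its eigenvalue condition, the projection only ever needs a low-rank (truncated) SVD because the iterates remain low-rank with high probability.

```python
# Hedged sketch of projected mini-batch SGD over the nuclear-norm ball
# {X : ||X||_* <= tau}; illustrative only, not the paper's implementation.
import numpy as np

def project_l1_ball(s, tau):
    """Euclidean projection of a nonnegative vector s onto the l1-ball of radius tau."""
    if s.sum() <= tau:
        return s
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - tau))[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.maximum(s - theta, 0.0)

def project_nuclear_ball(X, tau):
    """Project X onto {X : ||X||_* <= tau}: SVD, then project the singular values."""
    # A full SVD is used here for simplicity; in the regime the paper analyzes,
    # a truncated (low-rank) SVD suffices since the projected matrix is low-rank.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_proj = project_l1_ball(s, tau)
    r = max(int(np.count_nonzero(s_proj)), 1)
    return U[:, :r] @ np.diag(s_proj[:r]) @ Vt[:r, :]

def sgd_nuclear_ball(grad_minibatch, warm_start, tau, eta, n_iters, rng):
    """Mini-batch SGD from a warm-start point; grad_minibatch(X, rng) returns
    an unbiased stochastic gradient of the objective at X."""
    X = project_nuclear_ball(warm_start, tau)
    for _ in range(n_iters):
        G = grad_minibatch(X, rng)                  # stochastic mini-batch gradient
        X = project_nuclear_ball(X - eta * G, tau)  # projected SGD step
    return X
```

The SVD-plus-l1-projection step above is the standard Euclidean projection onto the nuclear-norm ball; the cost of that projection, rather than the gradient computation, is the bottleneck the paper identifies when the SVD cannot be truncated to low rank.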
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization [48.65416821017865]
This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD).
It converges linearly at a constant rate independent of the condition number of the low-rank object.
It maintains the low per-iteration cost of gradient descent for a variety of tasks.
arXiv Detail & Related papers (2023-10-09T21:16:57Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems [19.930021245647705]
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning.
In this paper we consider standard convex relaxations for such problems.
We show that the extragradient method converges to an optimal solution with rate $O(1/t)$ while requiring only two low-rank SVDs per iteration.
arXiv Detail & Related papers (2022-02-08T17:47:40Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Provable Low Rank Plus Sparse Matrix Separation Via Nonconvex Regularizers [0.0]
This paper considers a large class of problems where we seek to recover a low-rank matrix and/or sparse vector from some set of measurements.
While methods based on convex relaxations suffer from estimator bias, and other methods require the rank or sparsity to be known a priori, we use nonconvex regularizers.
We present a novel analysis of a proximal alternating descent algorithm applied to such problems.
arXiv Detail & Related papers (2021-09-26T22:09:42Z)
- Low-rank matrix recovery with non-quadratic loss: projected gradient method and regularity projection oracle [23.84884127542249]
Existing results for low-rank matrix recovery largely focus on quadratic loss.
We show that a critical component in provable low-rank recovery with non-quadratic loss is a regularity projection oracle.
arXiv Detail & Related papers (2020-08-31T17:56:04Z)
- A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Completion [60.52730146391456]
We propose a new scalable nonconvex low-rank regularizer, dubbed the "nuclear Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values and allows fast optimization.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z)
- Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
- Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent [34.0533596121548]
Low-rank matrix estimation is a fundamental problem that finds numerous applications in signal processing, machine learning and imaging science.
We show that ScaledGD converges linearly at a rate independent of the condition number of the low-rank matrix (see the illustrative sketch after this list).
Our analysis is also applicable to general loss functions beyond the quadratic loss.
arXiv Detail & Related papers (2020-05-18T17:17:16Z)
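Two of the entries above summarize ScaledGD; as a point of comparison with the projection-based SGD sketch earlier, here is a hedged, illustrative reconstruction of a single ScaledGD-style update on low-rank factors. The assumed shapes and the `grad_f` oracle are illustrative, not taken from the listed papers.

```python
# Illustrative ScaledGD-style step for minimizing f(L @ R.T) over factors
# L (n x r) and R (m x r); a hedged sketch, not the authors' implementation.
import numpy as np

def scaled_gd_step(L, R, grad_f, eta):
    """One step of factored gradient descent preconditioned by (R^T R)^{-1}
    and (L^T L)^{-1}, the scaling that makes the rate independent of the
    condition number of the low-rank matrix."""
    G = grad_f(L @ R.T)  # gradient of the loss at the current matrix estimate
    L_new = L - eta * G @ R @ np.linalg.inv(R.T @ R)
    R_new = R - eta * G.T @ L @ np.linalg.inv(L.T @ L)
    return L_new, R_new
```

The preconditioners are only r x r, so the per-iteration cost stays close to that of plain factored gradient descent, which is consistent with the "low per-iteration cost" claim in the summaries above.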