Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix
Optimization Problems
- URL: http://arxiv.org/abs/2202.04026v1
- Date: Tue, 8 Feb 2022 17:47:40 GMT
- Title: Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix
Optimization Problems
- Authors: Dan Garber, Atara Kaplan
- Abstract summary: Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning.
In this paper we consider standard convex relaxations for such problems.
We show that the \textit{extragradient method} converges to an optimal solution with rate $O(1/t)$ while requiring only two \textit{low-rank} SVDs per iteration.
- Score: 19.930021245647705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank and nonsmooth matrix optimization problems capture many fundamental
tasks in statistics and machine learning. While significant progress has been
made in recent years in developing efficient methods for \textit{smooth}
low-rank optimization problems that avoid maintaining high-rank matrices and
computing expensive high-rank SVDs, advances for nonsmooth problems have been
slow paced.
In this paper we consider standard convex relaxations for such problems.
Mainly, we prove that under a natural \textit{generalized strict
complementarity} condition and under the relatively mild assumption that the
nonsmooth objective can be written as a maximum of smooth functions, the
\textit{extragradient method}, when initialized with a "warm-start" point,
converges to an optimal solution with rate $O(1/t)$ while requiring only two
\textit{low-rank} SVDs per iteration. We give a precise trade-off between the
rank of the SVDs required and the radius of the ball in which we need to
initialize the method. We support our theoretical results with empirical
experiments on several nonsmooth low-rank matrix recovery tasks, demonstrating
that using simple initializations, the extragradient method produces exactly
the same iterates when full-rank SVDs are replaced with SVDs of rank that
matches the rank of the (low-rank) ground-truth matrix to be recovered.
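To make the per-iteration cost concrete, here is a minimal sketch (not the authors' implementation) of one extragradient iteration for a problem of the form $\min_{\|X\|_* \le \tau} f(X)$, in which the exact Euclidean projection onto the nuclear-norm ball is approximated with a rank-$r$ truncated SVD. The names grad_f, eta, tau, and r are illustrative assumptions rather than notation taken from the paper.

```python
# Hedged sketch of one low-rank extragradient iteration for
#     min_{ ||X||_* <= tau }  f(X),
# where the exact projection onto the nuclear-norm ball is replaced by a
# rank-r truncated SVD (an assumption-laden illustration, not the paper's code).
import numpy as np
from scipy.sparse.linalg import svds


def project_l1(s, tau):
    """Project a nonnegative vector s onto {x >= 0 : sum(x) <= tau}."""
    if s.sum() <= tau:
        return s
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(u) + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1.0)
    return np.maximum(s - theta, 0.0)


def lowrank_nuclear_projection(Y, tau, r):
    """Approximate Euclidean projection of Y onto {X : ||X||_* <= tau} using
    only a rank-r SVD of Y; it coincides with the exact projection whenever
    the exact projection has rank at most r."""
    U, s, Vt = svds(Y, k=r)          # top-r singular triplets (requires r < min(Y.shape))
    return (U * project_l1(s, tau)) @ Vt


def lowrank_extragradient_step(X, grad_f, eta, tau, r):
    """One extragradient iteration: two gradient evaluations and two rank-r
    SVD-based projections."""
    X_mid = lowrank_nuclear_projection(X - eta * grad_f(X), tau, r)      # extrapolation point
    X_new = lowrank_nuclear_projection(X - eta * grad_f(X_mid), tau, r)  # corrected step
    return X_new
```

The sketch treats f as if a gradient were directly available; in the paper's nonsmooth setting the objective is a maximum of smooth functions and the updates are applied to a corresponding smooth saddle-point formulation. In the regime the paper analyzes (a warm-start initialization under generalized strict complementarity), the rank-$r$ projection is expected to coincide with the exact full-SVD projection, which is what makes two low-rank SVDs per iteration sufficient.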
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as bi-level optimization problems (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions when the algorithm only has access to noisy evaluations of the objective function at the points it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Low-Rank Extragradient Methods for Scalable Semidefinite Optimization [0.0]
We focus on high-dimensional and plausible settings in which the problem admits a low-rank solution.
We provide several theoretical results proving that, under these circumstances, the well-known Extragradient method converges to a solution of the constrained optimization problem.
arXiv Detail & Related papers (2024-02-14T10:48:00Z) - Bilevel Optimization under Unbounded Smoothness: A New Algorithm and
Convergence Analysis [17.596465452814883]
Current bilevel optimization algorithms assume that the gradient of the upper-level function is Lipschitz continuous, i.e., that its smoothness is bounded.
Recent studies reveal that certain neural networks exhibit unbounded smoothness, violating this assumption.
arXiv Detail & Related papers (2024-01-17T20:28:15Z) - Low-Rank Mirror-Prox for Nonsmooth and Low-Rank Matrix Optimization
Problems [19.930021245647705]
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning.
In this paper we consider standard convex relaxations for such problems.
We prove that under a \textit{strict complementarity} condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, approximated variants of two popular \textit{mirror-prox} methods converge to an optimal solution while requiring only low-rank SVD computations.
arXiv Detail & Related papers (2022-06-23T08:10:54Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-Assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inaccurate solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix
Completion [60.52730146391456]
We propose a new scalable nonconvex low-rank regularizer, the "nuclear Frobenius norm" regularizer, which is adaptive and sound.
It bypasses the computation of singular values and allows fast optimization by general-purpose algorithms.
It obtains state-of-the-art recovery performance while being the fastest among existing matrix learning methods.
arXiv Detail & Related papers (2020-08-14T18:47:58Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z) - On the Convergence of Stochastic Gradient Descent with Low-Rank
Projections for Convex Low-Rank Matrix Problems [19.24470467199451]
We revisit the use of Stochastic Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations of low-rank matrix problems.
We show that SGD may indeed be practically applicable to solving large-scale convex relaxations of low-rank matrix recovery problems.
arXiv Detail & Related papers (2020-01-31T06:00:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.