On Stochastic Variance Reduced Gradient Method for Semidefinite
Optimization
- URL: http://arxiv.org/abs/2101.00236v1
- Date: Fri, 1 Jan 2021 13:55:32 GMT
- Title: On Stochastic Variance Reduced Gradient Method for Semidefinite
Optimization
- Authors: Jinshan Zeng and Yixuan Zha and Ke Ma and Yuan Yao
- Abstract summary: The SVRG method has been regarded as one of the most effective methods.
However, there is a significant gap between its theory and practice when adapted to semidefinite programming (SDP).
In this paper, we fill this gap by exploiting a new variant of the original SVRG with Option I adapted to semidefinite optimization.
- Score: 14.519696724619074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The low-rank stochastic semidefinite optimization has attracted rising
attention due to its wide range of applications. The nonconvex reformulation
based on low-rank factorization significantly improves the computational
efficiency but brings new challenges to the analysis. The stochastic
variance reduced gradient (SVRG) method has been regarded as one of the most
effective methods. SVRG in general consists of two loops, where a reference
full gradient is first evaluated in the outer loop and then used to yield a
variance reduced estimate of the current gradient in the inner loop. Two
options have been suggested to yield the output of the inner loop, where Option
I sets the output as its last iterate, and Option II yields the output via
random sampling from all the iterates in the inner loop. However, there is a
significant gap between the theory and practice of SVRG when adapted to the
stochastic semidefinite programming (SDP). SVRG practically works better with
Option I, while most existing theoretical results focus on Option II. In
this paper, we fill this gap by exploiting a new semi-stochastic variant of
the original SVRG with Option I adapted to semidefinite optimization.
Equipped with this, we establish the global linear submanifold convergence
(i.e., converging exponentially fast to a submanifold of a global minimum under
the orthogonal group action) of the proposed SVRG method, given a provable
initialization scheme and under certain smoothness and restricted strongly
convex assumptions. Our analysis includes the effects of the mini-batch size
and update frequency in the inner loop as well as two practical step size
strategies, the fixed and stabilized Barzilai-Borwein step sizes. Some
numerical results in matrix sensing demonstrate the efficiency of the proposed
SVRG method, which outperforms its Option II counterpart as well as other methods.
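To make the two-loop structure and the role of Option I concrete, below is a minimal Python sketch of standard SVRG with Option I applied to a low-rank factorized matrix-sensing objective. It is an illustration only, not the paper's semi-stochastic variant or initialization scheme: the objective, the sensing data (A, b), and the fixed step size eta are assumptions, and the stabilized Barzilai-Borwein step size is omitted for brevity.

```python
# Minimal sketch (assumed setup, not the paper's exact algorithm) of SVRG with
# Option I on a low-rank factorized matrix-sensing objective
#   min_{U in R^{n x r}}  f(U) = (1/2m) * sum_i (<A_i, U U^T> - b_i)^2.
import numpy as np

def grad_i(U, A_i, b_i):
    """Gradient of 0.5 * (<A_i, U U^T> - b_i)^2 with respect to U."""
    residual = np.sum(A_i * (U @ U.T)) - b_i
    return residual * (A_i + A_i.T) @ U

def full_grad(U, A, b):
    return sum(grad_i(U, A_i, b_i) for A_i, b_i in zip(A, b)) / len(b)

def svrg_option_I(U0, A, b, eta=1e-3, outer_iters=50, inner_iters=None,
                  batch=1, seed=0):
    rng = np.random.default_rng(seed)
    m = len(b)
    inner_iters = inner_iters or m          # update frequency of the inner loop
    U_ref = U0.copy()
    for _ in range(outer_iters):
        g_ref = full_grad(U_ref, A, b)      # reference full gradient (outer loop)
        U = U_ref.copy()
        for _ in range(inner_iters):
            idx = rng.choice(m, size=batch, replace=False)
            g_cur = sum(grad_i(U, A[i], b[i]) for i in idx) / batch
            g_old = sum(grad_i(U_ref, A[i], b[i]) for i in idx) / batch
            v = g_cur - g_old + g_ref       # variance-reduced gradient estimate
            U = U - eta * v
        U_ref = U                           # Option I: output = last inner iterate
    return U_ref
```

Option II would instead draw the new reference point uniformly at random from the inner-loop iterates; the abstract notes that Option I is the choice that works better in practice for this problem.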
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates normalized SGD with clipping (NSGDC) and its variance reduction variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient [40.22217106270146]
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods.
In composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from random gradient estimation.
This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances.
arXiv Detail & Related papers (2024-05-28T02:27:53Z) - Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence
and Variance Reduction [26.9632099249085]
We propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings.
We equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $\widetilde{\mathcal{O}}(n+1/\epsilon)$ gradient evaluations.
arXiv Detail & Related papers (2023-08-11T10:17:29Z) - Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed
Smoothness Conditions [9.518010235273785]
We present a novel fully single-loop and Hessian-inversion-free algorithmic framework for bilevel optimization.
We show that, with a slight modification, our approach can handle a more general multi-objective robust bilevel optimization problem.
arXiv Detail & Related papers (2023-06-21T07:32:29Z) - Versatile Single-Loop Method for Gradient Estimator: First and Second
Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for periodic convergence.
Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data.
arXiv Detail & Related papers (2022-09-01T11:05:26Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and propose variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) method.
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconcave min-max problems.
Our experiments show that the predicted advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)