Stochastic gradient-free descents
- URL: http://arxiv.org/abs/1912.13305v5
- Date: Tue, 14 Jan 2020 10:31:56 GMT
- Title: Stochastic gradient-free descents
- Authors: Xiaopeng Luo and Xin Xu
- Abstract summary: We propose stochastic gradient-free methods and accelerated methods with momentum for solving stochastic optimization problems.
We analyze the convergence behavior of these methods under the mean-variance framework.
- Score: 8.663453034925363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose stochastic gradient-free methods and accelerated
methods with momentum for solving stochastic optimization problems. All these
methods rely on stochastic directions rather than stochastic gradients. We
analyze the convergence behavior of these methods under the mean-variance
framework, and also provide a theoretical analysis of the inclusion of
momentum in stochastic settings, which reveals that the momentum term we use
adds a deviation of order $\mathcal{O}(1/k)$ but controls the variance at the
order $\mathcal{O}(1/k)$ for the $k$th iteration. It is thus shown that, when
employing a decaying stepsize $\alpha_k=\mathcal{O}(1/k)$, the stochastic
gradient-free methods can still maintain the sublinear convergence rate
$\mathcal{O}(1/k)$ and the accelerated methods with momentum can achieve a
convergence rate $\mathcal{O}(1/k^2)$ in probability for the strongly convex
objectives with Lipschitz gradients; and all these methods converge to a
solution with a zero expected gradient norm when the objective function is
nonconvex, twice differentiable and bounded below.
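The abstract describes the method only at a high level. As a rough Python sketch of the ingredients it names (random search directions in place of stochastic gradients, a momentum term, and a decaying stepsize $\alpha_k=\mathcal{O}(1/k)$), where the objective, direction sampler, and constants are placeholder assumptions rather than the authors' exact scheme:
```python
import numpy as np

def stochastic_direction_descent(f, x0, n_iters=1000, beta=0.9, seed=0):
    """Minimal sketch of a gradient-free descent with momentum.

    At step k we sample a random direction d, estimate the slope of f
    along d by a finite difference, and take a momentum-damped step with
    decaying stepsize alpha_k = O(1/k).  This illustrates the ingredients
    analyzed in the paper, not the authors' exact method.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)                   # momentum buffer
    h = 1e-6                               # finite-difference width
    for k in range(1, n_iters + 1):
        d = rng.standard_normal(x.shape)
        d /= np.linalg.norm(d)             # unit random direction
        slope = (f(x + h * d) - f(x)) / h  # directional derivative estimate
        g = slope * d                      # stochastic direction surrogate
        m = beta * m + (1 - beta) * g      # momentum term
        alpha = 1.0 / k                    # decaying stepsize O(1/k)
        x = x - alpha * m
    return x

# Example: strongly convex quadratic with Lipschitz gradient.
f = lambda x: 0.5 * np.dot(x, x)
print(stochastic_direction_descent(f, x0=np.ones(5)))
```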
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
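For reference, minimal sketches of the two update rules this entry refers to, in their generic textbook forms (the paper's smoothed clipping variant differs in detail):
```python
import numpy as np

def gd_clipped(grad, x, lr=0.1, clip=1.0, n_iters=100):
    """Gradient descent with (hard) gradient clipping: the update
    direction is rescaled whenever its norm exceeds `clip`."""
    for _ in range(n_iters):
        g = grad(x)
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
        x = x - lr * g
    return x

def gd_polyak(f, grad, x, f_star=0.0, n_iters=100):
    """Gradient descent with the Polyak stepsize
    alpha_k = (f(x_k) - f*) / ||grad f(x_k)||^2, assuming f* is known."""
    for _ in range(n_iters):
        g = grad(x)
        alpha = (f(x) - f_star) / (np.dot(g, g) + 1e-12)
        x = x - alpha * g
    return x

f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
print(gd_clipped(grad, np.ones(3)), gd_polyak(f, grad, np.ones(3)))
```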
arXiv Detail & Related papers (2024-09-23T13:11:37Z)
- Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods [0.3222802562733786]
We consider minimizing finite-sum expectation objective functions via Hessian-averaging based subsampled Newton methods.
These methods allow for inexactness and have fixed per-iteration Hessian approximation costs.
We present novel analysis techniques and discuss challenges for their practical implementation.
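A schematic sketch of the Hessian-averaging idea: average subsampled Hessian estimates across iterations before solving the Newton system. The noise model below is an illustrative assumption; the paper's sampling rules and inexactness conditions are omitted:
```python
import numpy as np

def hessian_averaged_newton(grad, hess_sample, x, n_iters=50):
    """Sketch of a Hessian-averaging subsampled Newton method: keep a
    running average H_bar of noisy subsampled Hessians and use it in
    the Newton step.  Illustrative only."""
    d = x.size
    H_bar = np.eye(d)
    for k in range(1, n_iters + 1):
        H_k = hess_sample(x)                     # subsampled Hessian estimate
        H_bar = ((k - 1) * H_bar + H_k) / k      # uniform running average
        x = x - np.linalg.solve(H_bar, grad(x))  # (exact) Newton step
    return x

# Toy quadratic whose sampled Hessian carries additive noise.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
hess_sample = lambda x: A + 0.1 * rng.standard_normal((2, 2))
print(hessian_averaged_newton(grad, hess_sample, np.ones(2)))
```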
arXiv Detail & Related papers (2024-08-14T03:27:48Z)
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped methods for stochastic optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha-1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
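The clipping device behind such results, in its simplest form; the clip level and stepsize schedules below are placeholders, not the paper's tuned choices:
```python
import numpy as np

def clipped_sgd(stoch_grad, x, n_iters=1000, seed=0):
    """Generic clipped SGD: truncating the stochastic gradient to a norm
    ball tames heavy-tailed noise, trading a small bias for a bounded
    variance.  Schedules here are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    for k in range(1, n_iters + 1):
        g = stoch_grad(x, rng)
        clip = 10.0 / np.sqrt(k)   # shrinking clip level (placeholder)
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
        x = x - (1.0 / k) * g
    return x

# Heavy-tailed noise: Student-t with 2.5 degrees of freedom.
stoch_grad = lambda x, rng: x + rng.standard_t(2.5, size=x.shape)
print(clipped_sgd(stoch_grad, np.ones(3)))
```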
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence [9.11726703830074]
In this paper, we consider solving the distributed optimization problem under a communication-restricted setting.
We study a method based on compressed exact diffusion, termed CEDAS.
In particular, when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when
arXiv Detail & Related papers (2023-01-14T09:49:15Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$ and $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov acceleration.
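A minimal extragradient descent-ascent loop for the purely bilinear special case $F = G = 0$, without the acceleration machinery that AG-EG adds on top (stepsize and test problem are illustrative):
```python
import numpy as np

def extragradient_saddle(B, x, y, lr=0.1, n_iters=200):
    """Plain extragradient descent-ascent for the bilinear saddle point
    min_x max_y x^T B y: extrapolate to a midpoint, then update using
    the gradient evaluated at that midpoint."""
    for _ in range(n_iters):
        # Half step (extrapolation).
        x_half = x - lr * (B @ y)     # grad_x of x^T B y
        y_half = y + lr * (B.T @ x)   # grad_y of x^T B y
        # Full step, using gradients at the midpoint.
        x = x - lr * (B @ y_half)
        y = y + lr * (B.T @ x_half)
    return x, y

B = np.array([[0.0, 1.0], [-1.0, 2.0]])
print(extragradient_saddle(B, np.ones(2), np.ones(2)))
```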
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
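A rough sketch of how an SVRG-style control variate can be placed inside a Langevin update; this general recipe is an assumption for illustration, not the paper's exact algorithms or constants:
```python
import numpy as np

def svrg_langevin(grads, x, eta=1e-3, epochs=20, seed=0):
    """Sketch of variance-reduced Langevin dynamics: an SVRG-style
    control variate replaces the minibatch gradient inside the Langevin
    update x <- x - eta*g + sqrt(2*eta)*xi.  `grads` is a list of
    per-example gradient functions.  Schematic only."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    for _ in range(epochs):
        snapshot = x.copy()
        full = sum(g(snapshot) for g in grads) / n  # full gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced gradient estimate (control variate).
            g = grads[i](x) - grads[i](snapshot) + full
            noise = rng.standard_normal(x.shape)
            x = x - eta * g + np.sqrt(2 * eta) * noise
    return x

# Toy target: Gaussian potential split into two "examples".
grads = [lambda x: x, lambda x: x]
print(svrg_langevin(grads, np.ones(3)))
```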
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
- Fast Margin Maximization via Dual Acceleration [52.62944011696364]
We present and analyze a momentum-based method for training linear classifiers with an exponentially-tailed loss.
This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual.
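To convey the momentum flavor only, here is plain Nesterov acceleration applied directly to the primal exponential loss of a linear classifier; the paper instead derives its method from the convex dual of the max-margin problem:
```python
import numpy as np

def accel_exp_loss(X, y, n_iters=500, lr=0.5):
    """Nesterov-accelerated gradient descent on the exponential loss
    L(w) = mean(exp(-y_i * <w, x_i>)) for a linear classifier.  This
    shows the momentum idea only, not the paper's dual method."""
    n, d = X.shape
    w = np.zeros(d)
    w_prev = np.zeros(d)
    for k in range(1, n_iters + 1):
        v = w + (k - 1) / (k + 2) * (w - w_prev)  # Nesterov look-ahead
        margins = y * (X @ v)
        grad = -(X * (y * np.exp(-margins))[:, None]).mean(axis=0)
        w_prev = w
        w = v - lr * grad
    return w

X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 0.5], [-1.0, -0.3]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w = accel_exp_loss(X, y)
print(w, np.sign(X @ w) == y)
```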
arXiv Detail & Related papers (2021-07-01T16:36:39Z)
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points in high probability with best-known iteration complexity for smooth losses.
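The three ingredients named above, combined in schematic form (constants and schedules are placeholder assumptions, not the paper's tuned choices):
```python
import numpy as np

def clipped_normalized_momentum_sgd(stoch_grad, x, n_iters=1000, beta=0.9,
                                    lr=0.01, clip=1.0, seed=0):
    """Schematic combination of gradient clipping, momentum, and a
    normalized update, the three ingredients this entry refers to."""
    rng = np.random.default_rng(seed)
    m = np.zeros_like(x)
    for _ in range(n_iters):
        g = stoch_grad(x, rng)
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # clip
        m = beta * m + (1 - beta) * g                          # momentum
        x = x - lr * m / (np.linalg.norm(m) + 1e-12)           # normalized step
    return x

stoch_grad = lambda x, rng: x + rng.standard_t(2.5, size=x.shape)
print(clipped_normalized_momentum_sgd(stoch_grad, np.ones(3)))
```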
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
- A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance-reduced gradient (SVRG) method.
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
The method requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
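A sketch of where a variance-control parameter $\lambda$ could enter an SVRG-style estimator; the exact estimator below is illustrative, not the paper's VCSG:
```python
import numpy as np

def vcsg_sketch(grads, x, lr=0.1, lam=0.5, epochs=20, seed=0):
    """Sketch of an SVRG-style update with a blending parameter lam in
    [0, 1] that interpolates between the plain stochastic gradient
    (lam = 0) and the fully variance-reduced SVRG estimate (lam = 1),
    so as not to over-reduce the variance.  Illustrative only."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    for _ in range(epochs):
        snapshot = x.copy()
        full = sum(g(snapshot) for g in grads) / n
        for _ in range(n):
            i = rng.integers(n)
            g_plain = grads[i](x)
            correction = full - grads[i](snapshot)  # SVRG control variate
            x = x - lr * (g_plain + lam * correction)
    return x

grads = [lambda x: x - 1.0, lambda x: x + 1.0]  # minimizer at x = 0
print(vcsg_sketch(grads, np.ones(2)))
```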
arXiv Detail & Related papers (2021-02-19T12:22:56Z)
- Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball [17.33867778750777]
We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB) for the general stochastic approximation problem.
For SGD, in the convex and smooth setting, we provide the first almost sure convergence rates for a weighted average of the iterates.
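For reference, the two updates compared in this entry, in textbook form with a decaying stepsize:
```python
import numpy as np

def sgd(stoch_grad, x, n_iters=1000, seed=0):
    """Plain SGD with decaying stepsize 1/k."""
    rng = np.random.default_rng(seed)
    for k in range(1, n_iters + 1):
        x = x - (1.0 / k) * stoch_grad(x, rng)
    return x

def shb(stoch_grad, x, beta=0.9, n_iters=1000, seed=0):
    """Stochastic heavy ball: SGD plus a multiple of the previous
    displacement x_k - x_{k-1} (textbook form)."""
    rng = np.random.default_rng(seed)
    x_prev = x.copy()
    for k in range(1, n_iters + 1):
        x_next = x - (1.0 / k) * stoch_grad(x, rng) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

stoch_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
print(sgd(stoch_grad, np.ones(3)), shb(stoch_grad, np.ones(3)))
```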
arXiv Detail & Related papers (2020-06-14T11:12:05Z)
- Can speed up the convergence rate of stochastic gradient methods to $\mathcal{O}(1/k^2)$ by a gradient averaging strategy? [8.663453034925363]
We show that if the iterative variance we defined is always dominant, even slightly, in the gradient iterations, then the proposed gradient averaging strategy can increase the convergence rate.
This conclusion suggests how we should control the gradient iterations to improve the rate of convergence.
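One plausible shape of such a strategy, sketched under the assumption that "gradient averaging" means stepping along a running average of past stochastic gradients; the paper's precise rule and variance condition may differ:
```python
import numpy as np

def averaged_gradient_descent(stoch_grad, x, n_iters=1000, seed=0):
    """Illustration of a gradient averaging strategy: step along the
    running average of all stochastic gradients seen so far, with a
    decaying stepsize.  Whether this accelerates convergence depends,
    per the paper, on the iterative variance staying dominant."""
    rng = np.random.default_rng(seed)
    g_avg = np.zeros_like(x)
    for k in range(1, n_iters + 1):
        g = stoch_grad(x, rng)
        g_avg = ((k - 1) * g_avg + g) / k  # running average of gradients
        x = x - (1.0 / k) * g_avg
    return x

stoch_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
print(averaged_gradient_descent(stoch_grad, np.ones(3)))
```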
arXiv Detail & Related papers (2020-02-25T09:58:23Z)