Stochastic Hamiltonian Gradient Methods for Smooth Games
- URL: http://arxiv.org/abs/2007.04202v1
- Date: Wed, 8 Jul 2020 15:42:13 GMT
- Title: Stochastic Hamiltonian Gradient Methods for Smooth Games
- Authors: Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal
Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas
- Abstract summary: We focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games.
Using tools from the optimization literature, we show that SHGD converges linearly to a neighbourhood of a stationary point.
Our results provide the first global non-asymptotic last-iterate convergence guarantees for stochastic unconstrained bilinear games and for more general games satisfying a "sufficiently bilinear" condition.
- Score: 51.47367707388402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of adversarial formulations in machine learning has brought
renewed motivation for smooth games. In this work, we focus on the class of
stochastic Hamiltonian methods and provide the first convergence guarantees for
certain classes of stochastic smooth games. We propose a novel unbiased
estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight
its benefits. Using tools from the optimization literature we show that SHGD
converges linearly to the neighbourhood of a stationary point. To guarantee
convergence to the exact solution, we analyze SHGD with a decreasing step-size
and we also present the first stochastic variance reduced Hamiltonian method.
Our results provide the first global non-asymptotic last-iterate convergence
guarantees for the class of stochastic unconstrained bilinear games and for the
more general class of stochastic games that satisfy a "sufficiently bilinear"
condition, notably including some non-convex non-concave problems. We
supplement our analysis with experiments on stochastic bilinear and
sufficiently bilinear games, where our theory is shown to be tight, and on
simple adversarial machine learning formulations.
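To make the method concrete, below is a minimal sketch of SHGD on a toy stochastic bilinear game min_x max_y E_i[x^T A_i y]. The game's vector field is v(w) = (A y, -A^T x) and the Hamiltonian is H(w) = 0.5 ||v(w)||^2, whose gradient is J^T v for the Jacobian J of v. A single sample gives a biased product, so the sketch draws two independent samples i and j per step so that J_i^T v_j is unbiased; this is one natural unbiased construction used here as an illustration, not necessarily the paper's exact estimator. The problem sizes, noise level, and step size are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100
A0 = np.eye(d)                                  # shared mean payoff matrix
As = A0 + 0.1 * rng.normal(size=(n, d, d))      # per-sample matrices A_i

x, y = rng.normal(size=d), rng.normal(size=d)
eta = 0.05                                      # constant step size (illustrative)

for t in range(2000):
    i, j = rng.integers(n), rng.integers(n)     # two independent samples
    Ai, Aj = As[i], As[j]
    # For the bilinear game, the estimator J_i^T v_j works out to:
    gx = Ai @ (Aj.T @ x)                        # estimate of dH/dx
    gy = Ai.T @ (Aj @ y)                        # estimate of dH/dy
    x -= eta * gx                               # descend the Hamiltonian
    y -= eta * gy

print(np.linalg.norm(np.concatenate([x, y])))   # distance to the stationary point (origin)
```

With the constant step size, the iterates settle in a noise-dominated neighbourhood of the stationary point at the origin; as the abstract notes, a decreasing step size (or variance reduction) is needed for convergence to the exact solution.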
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing a quadratic approximation of the objective within the trust region.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - High Probability Analysis for Non-Convex Stochastic Optimization with
Clipping [13.025261730510847]
Gradient clipping is a technique for dealing with heavy-tailed noise when training neural networks.
Most theoretical guarantees only provide an in-expectation analysis and focus only on optimization performance.
Our analysis provides a relatively complete picture of the theoretical guarantees of optimization algorithms with gradient clipping (a generic clipped-SGD step is sketched after this list).
arXiv Detail & Related papers (2023-07-25T17:36:56Z) - Formal guarantees for heuristic optimization algorithms used in machine
learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for large-scale machine learning (ML) optimization problems.
We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient and Nesterov's acceleration (a plain extragradient step is sketched after this list).
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with an unfavourable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Minibatch and Momentum Model-based Methods for Stochastic Non-smooth
Non-convex Optimization [3.4809730725241597]
We make two important extensions to model-based methods.
First, we propose a new minibatch scheme which takes a set of samples to approximate the model function in each iteration.
Second, motivated by the success of momentum techniques, we propose a new momentum-based model.
arXiv Detail & Related papers (2021-06-06T05:31:57Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints (IQC) theory to first-order methods for smooth and strongly monotone games.
We provide for the first time a global convergence rate for the negative momentum method (NM) with an iteration complexity $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z)
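Two techniques recurring in the list above can be made concrete with short sketches. First, for the gradient-clipping entries, a generic global-norm clipped-SGD step; the function name, threshold, and step size are hypothetical illustrations, not any specific paper's algorithm.

```python
import numpy as np

def clipped_sgd_step(w, grad, eta=0.01, clip_threshold=1.0):
    """One SGD step with global-norm gradient clipping.

    If ||grad|| exceeds clip_threshold, the gradient is rescaled to have
    norm exactly clip_threshold, bounding the effect of a single
    heavy-tailed stochastic gradient on the iterate.
    """
    g_norm = np.linalg.norm(grad)
    if g_norm > clip_threshold:
        grad = grad * (clip_threshold / g_norm)
    return w - eta * grad
```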
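Second, for the bilinearly-coupled saddle-point entry, a sketch of one plain (non-accelerated) extragradient step on L(x, y) = F(x) + x^T B y - G(y); the paper's AG-EG method additionally combines this with acceleration, which is omitted here, and the interface below is likewise a hypothetical illustration.

```python
import numpy as np

def extragradient_step(x, y, B, grad_F, grad_G, eta=0.1):
    """One extragradient step for min_x max_y  F(x) + x^T B y - G(y).

    grad_F and grad_G are callables returning the gradients of F and G.
    Gradients are first evaluated at (x, y) to form an extrapolated
    point, then re-evaluated there to produce the actual update.
    """
    gx = lambda u, v: grad_F(u) + B @ v       # descent direction in x
    gy = lambda u, v: B.T @ u - grad_G(v)     # ascent direction in y
    x_half, y_half = x - eta * gx(x, y), y + eta * gy(x, y)   # extrapolation
    return x - eta * gx(x_half, y_half), y + eta * gy(x_half, y_half)
```

For example, with F(x) = 0.5||x||^2 and G(y) = 0.5||y||^2 (so grad_F = grad_G = identity), iterating this step drives (x, y) to the saddle point at the origin.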
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.