Explore Aggressively, Update Conservatively: Stochastic Extragradient
Methods with Variable Stepsize Scaling
- URL: http://arxiv.org/abs/2003.10162v2
- Date: Thu, 5 Nov 2020 13:34:04 GMT
- Title: Explore Aggressively, Update Conservatively: Stochastic Extragradient
Methods with Variable Stepsize Scaling
- Authors: Yu-Guan Hsieh, Franck Iutzeler, J\'er\^ome Malick, Panayotis
Mertikopoulos
- Abstract summary: Extragradient methods have become a staple for solving large-scale saddlepoint problems in machine learning.
We show in this paper that running vanilla extragradient with gradients may jeopardize its convergence, even in simple bilinear models.
We show that this modification allows the method to converge even with gradients, and we derive sharp convergence rates under an error bound condition.
- Score: 34.35013145885164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Owing to their stability and convergence speed, extragradient methods have
become a staple for solving large-scale saddle-point problems in machine
learning. The basic premise of these algorithms is the use of an extrapolation
step before performing an update; thanks to this exploration step,
extra-gradient methods overcome many of the non-convergence issues that plague
gradient descent/ascent schemes. On the other hand, as we show in this paper,
running vanilla extragradient with stochastic gradients may jeopardize its
convergence, even in simple bilinear models. To overcome this failure, we
investigate a double stepsize extragradient algorithm where the exploration
step evolves at a more aggressive time-scale compared to the update step. We
show that this modification allows the method to converge even with stochastic
gradients, and we derive sharp convergence rates under an error bound
condition.
Related papers
- Stagewise Boosting Distributional Regression [0.0]
We propose a stagewise boosting-type algorithm for distributional regression.
We extend it with a novel regularization method, correlation filtering, to provide additional stability.
Besides the advantage of processing large datasets, the nature of the approximations can lead to better results.
arXiv Detail & Related papers (2024-05-28T15:40:39Z) - One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware
Quantization Training [12.400950982075948]
Weight quantization is an effective technique to compress deep neural networks for their deployment on edge devices with limited resources.
Traditional loss-aware quantization methods commonly use the quantized gradient to replace the full-precision gradient.
This paper proposes a one-step forward and backtrack way for loss-aware quantization to get more accurate and stable gradient direction.
arXiv Detail & Related papers (2024-01-30T05:42:54Z) - Aiming towards the minimizers: fast convergence of SGD for
overparametrized problems [25.077446336619378]
We propose a regularity regime which endows the gradient method with the same worst-case complexity as the gradient method.
All existing guarantees require the gradient method to take small steps, thereby resulting in a much slower linear rate of convergence.
We demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
arXiv Detail & Related papers (2023-06-05T05:21:01Z) - Stochastic Gradient Methods with Preconditioned Updates [47.23741709751474]
There are several algorithms for such problems, but existing methods often work poorly when badly scaled and/or ill-conditioned.
Here we include preconditionimater based on Hutchinson's approach to approxing the diagonal Hessian.
We prove convergence both when smoothness and the PL condition are assumed.
arXiv Detail & Related papers (2022-06-01T07:38:08Z) - Cutting Some Slack for SGD with Adaptive Polyak Stepsizes [35.024680868164445]
We consider the family of SPS (Stochastic gradient with a Polyak Stepsize) adaptive methods.
We first show that SPS and its recent variants can all be seen as extensions of the Passive-Aggressive methods applied to nonlinear problems.
We use this insight to develop new variants of the SPS method that are better suited to nonlinear models.
arXiv Detail & Related papers (2022-02-24T19:31:03Z) - One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient
Reinforcement Learning [61.662504399411695]
We introduce a novel method mixing multiple inner steps that enjoys a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and or binary weights the training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Carath\'eodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carath'eodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.