Randomized Stochastic Gradient Descent Ascent
- URL: http://arxiv.org/abs/2111.13162v1
- Date: Thu, 25 Nov 2021 16:44:19 GMT
- Title: Randomized Stochastic Gradient Descent Ascent
- Authors: Othmane Sebbouh and Marco Cuturi and Gabriel Peyré
- Abstract summary: An increasing number of machine learning problems, such as robust or adversarial variants of existing algorithms, require minimizing a loss function that is itself defined as a maximum.
We propose RSGDA (Randomized SGDA), a variant of ESGDA with a stochastic loop size and a simpler theoretical analysis.
- Score: 37.887266927498395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasing number of machine learning problems, such as robust or
adversarial variants of existing algorithms, require minimizing a loss function
that is itself defined as a maximum. Carrying a loop of stochastic gradient
ascent (SGA) steps on the (inner) maximization problem, followed by an SGD step
on the (outer) minimization, is known as Epoch Stochastic Gradient
Descent Ascent (ESGDA). While successful in practice, the theoretical
analysis of ESGDA remains challenging, with no clear guidance on choices for
the inner loop size nor on the interplay between inner/outer step sizes. We
propose RSGDA (Randomized SGDA), a variant of ESGDA with stochastic loop size
with a simpler theoretical analysis. RSGDA comes with the first (among SGDA
algorithms) almost sure convergence rates when used on nonconvex
min/strongly-concave max settings. RSGDA can be parameterized using optimal
loop sizes that guarantee the best convergence rates known to hold for SGDA. We
test RSGDA on toy and larger scale problems, using distributionally robust
optimization and single-cell data matching using optimal transport as a
testbed.
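To make the descent-ascent structure concrete, here is a minimal NumPy sketch of the loop described in the abstract: a random number of stochastic gradient ascent steps on the inner (max) variable, followed by a single stochastic gradient descent step on the outer (min) variable. The toy objective, the geometric distribution over inner-loop sizes, and all step sizes are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy min-max objective: f(x, y) = 0.5*||x||^2 + x.y - 0.5*||y||^2,
# minimized over x and maximized over y (strongly concave in y).
def grad_x(x, y):
    return x + y + 0.01 * rng.normal(size=x.shape)   # noisy gradient in x

def grad_y(x, y):
    return x - y + 0.01 * rng.normal(size=y.shape)   # noisy gradient in y

def rsgda(x, y, outer_steps=500, eta_x=0.05, eta_y=0.2, p=0.5):
    """Randomized SGDA sketch: the inner-loop length is drawn at random
    (here geometric with mean 1/p) rather than fixed as in ESGDA."""
    for _ in range(outer_steps):
        for _ in range(rng.geometric(p)):   # stochastic number of SGA steps
            y = y + eta_y * grad_y(x, y)    # ascent on the inner max problem
        x = x - eta_x * grad_x(x, y)        # one SGD step on the outer min problem
    return x, y

x, y = rsgda(np.ones(3), np.ones(3))
print(x, y)   # both should be driven toward 0 on this toy problem
```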
Related papers
- SketchySGD: Reliable Stochastic Optimization via Randomized Curvature
Estimates [19.420605210427635]
SketchySGD improves upon existing gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian.
We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum.
In the ill-conditioned setting we show SketchySGD converges at a faster rate than SGD for least-squares problems.
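As a rough illustration of that idea, the sketch below forms a randomized low-rank approximation of a subsampled Hessian for a least-squares problem and uses it, plus a small ridge term, to precondition SGD with a fixed stepsize. The rank, regularizer, batch sizes, and stepsize are arbitrary choices for this sketch; the actual SketchySGD preconditioner and its update and stepsize rules are specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned least-squares toy problem: minimize 0.5/n * ||A w - b||^2.
n, d = 2000, 50
A = rng.normal(size=(n, d)) @ np.diag(np.logspace(0, -3, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.01 * rng.normal(size=n)

def low_rank_inv(batch, rank=20, rho=0.1):
    """Randomized rank-`rank` approximation of the subsampled Hessian
    H_S = A_S^T A_S / |S|; returns g -> (V diag(lam+rho) V^T + rho (I - V V^T))^{-1} g."""
    As = A[batch]
    H = As.T @ As / len(batch)
    Q, _ = np.linalg.qr(H @ rng.normal(size=(d, rank)))  # randomized range finder
    lam, U = np.linalg.eigh(Q.T @ H @ Q)                 # small eigenproblem
    V = Q @ U                                            # approximate eigenpairs (V, lam)
    def apply(g):
        c = V.T @ g
        return g / rho - V @ (c / rho) + V @ (c / (lam + rho))
    return apply

precond = low_rank_inv(rng.choice(n, 512, replace=False))

# SGD with a fixed stepsize, preconditioned by the low-rank Hessian approximation.
w = np.zeros(d)
for t in range(1000):
    batch = rng.choice(n, 32, replace=False)
    g = A[batch].T @ (A[batch] @ w - b[batch]) / len(batch)  # stochastic gradient
    w -= 0.3 * precond(g)
print("distance to w_true:", np.linalg.norm(w - w_true))
```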
arXiv Detail & Related papers (2022-11-16T01:05:41Z) - SGDA with shuffling: faster convergence for nonconvex-PŁ minimax
optimization [18.668531108219415]
We present a theoretical analysis of stochastic gradient descent-ascent (SGDA) with shuffling for minimax optimization problems.
We analyze both simultaneous and alternating SGDA with random reshuffling (SGDA-RR) for nonconvex objectives satisfying the Polyak-Łojasiewicz (PŁ) condition in the maximization variable.
Our rates also extend to mini-batch GDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA).
Finally, we present a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate in the primal-PŁ setting.
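As a rough sketch of the reshuffling idea (with a toy finite-sum objective and step sizes of my choosing, not the paper's exact algorithm or analysis), each epoch below visits the samples in a freshly shuffled order and applies one simultaneous descent/ascent step per sample:

```python
import numpy as np

rng = np.random.default_rng(0)
c = rng.normal(size=100)   # data defining the finite-sum minimax objective

# Per-sample gradients of f_i(x, y) = 0.5*(x - c_i)^2 + x*y - 0.5*y^2.
def gx(i, x, y): return (x - c[i]) + y
def gy(i, x, y): return x - y

def sgda_rr(epochs=50, eta_x=0.05, eta_y=0.1):
    """SGDA with random reshuffling: every epoch is a without-replacement
    pass over the data in a freshly permuted order."""
    x, y = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(c)):
            dx, dy = gx(i, x, y), gy(i, x, y)
            x, y = x - eta_x * dx, y + eta_y * dy   # simultaneous update
    return x, y

print(sgda_rr())
```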
arXiv Detail & Related papers (2022-10-12T08:05:41Z) - Faster Single-loop Algorithms for Minimax Optimization without Strong
Concavity [30.97847679999505]
Gradient descent ascent (GDA) is the simplest single-loop algorithm for nonconvex minimax optimization.
Recent work shows inferior convergence rates of GDA in theory, even under one-sided strong concavity.
Two alternative single-loop algorithms -- alternating GDA and smoothed GDA -- are presented.
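For reference, here is a minimal sketch contrasting simultaneous GDA with the alternating variant mentioned above (the smoothed variant is omitted); the quadratic toy objective and step size are illustrative assumptions, not the paper's setting.

```python
import numpy as np

# Toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 (min over x, max over y).
def gx(x, y): return x + y
def gy(x, y): return x - y

def gda(steps=1000, eta=0.1, alternating=False):
    x, y = 1.0, 1.0
    for _ in range(steps):
        if alternating:
            x = x - eta * gx(x, y)        # descent step first ...
            y = y + eta * gy(x, y)        # ... the ascent step then uses the updated x
        else:
            dx, dy = gx(x, y), gy(x, y)   # simultaneous: both gradients at (x, y)
            x, y = x - eta * dx, y + eta * dy
    return x, y

print("simultaneous:", gda())
print("alternating: ", gda(alternating=True))
```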
arXiv Detail & Related papers (2021-12-10T15:35:42Z) - Last Iterate Risk Bounds of SGD with Decaying Stepsize for
Overparameterized Linear Regression [122.70478935214128]
Stochastic gradient descent (SGD) has been demonstrated to generalize well in many deep learning applications.
This paper provides problem-dependent analysis on the last iterate risk bounds of SGD with decaying stepsize.
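Below is a small sketch of the setting: overparameterized linear regression, SGD with a decaying stepsize, and the last iterate returned without averaging. The data model, decay schedule, and constants are illustrative assumptions rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: more parameters than samples.
n, d = 50, 200
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

w = np.zeros(d)
eta0 = 1.0
for t in range(1, 5001):
    i = rng.integers(n)                   # one sample per step
    g = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of the squared loss
    w -= (eta0 / np.sqrt(t)) * g          # polynomially decaying stepsize
# Report the risk of the LAST iterate (no iterate averaging).
print("last-iterate training risk:", 0.5 * np.mean((X @ w - y) ** 2))
```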
arXiv Detail & Related papers (2021-10-12T17:49:54Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable recursive averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
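The summary above only says the estimator is an easily implementable recursive averaging of past stochastic gradients. The sketch below shows one recursive estimator of that flavor, with a 1/t weight and a correction term evaluated on the same sample at consecutive iterates; this is an illustrative reconstruction under that assumption, not necessarily the paper's exact ROOT-SGD update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly convex quadratic f(x) = 0.5 * x^T H x with noisy gradient oracles.
d = 10
H = np.diag(np.linspace(1.0, 5.0, d))

def grad(x, xi):
    """Stochastic gradient with noise sample xi; the same sample can be
    evaluated at two points, as recursive estimators require."""
    return H @ x + 0.1 * xi

def recursive_one_over_t(steps=2000, eta=0.05):
    x = np.ones(d)
    v = grad(x, rng.normal(size=d))          # plain stochastic gradient at t = 1
    for t in range(2, steps + 1):
        x_new = x - eta * v
        xi = rng.normal(size=d)              # one fresh sample per iteration
        # Recursive 1/t-weighted averaging of past gradients with a correction.
        v = grad(x_new, xi) + (1.0 - 1.0 / t) * (v - grad(x, xi))
        x = x_new
    return x

print("final ||x||:", np.linalg.norm(recursive_one_over_t()))
```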
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, but only up to an error neighborhood of the solution.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
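As a purely illustrative sketch of letting the minibatch size evolve during training rather than fixing it (the growth test and all constants below are hypothetical choices of mine, not the paper's adaptive scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares toy problem.
n, d = 5000, 20
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.5 * rng.normal(size=n)

w = np.zeros(d)
batch, eta = 8, 0.02
for t in range(2000):
    idx = rng.choice(n, batch, replace=False)
    g = A[idx].T @ (A[idx] @ w - b[idx]) / batch   # minibatch gradient
    w -= eta * g                                   # constant step-size
    # Hypothetical adaptation rule: once the minibatch gradient is small
    # relative to a threshold shrinking like 1/sqrt(batch), grow the batch so
    # the constant step-size keeps making progress instead of hovering in a
    # noise ball.
    if np.linalg.norm(g) < 3.0 / np.sqrt(batch) and batch < 1024:
        batch *= 2
print("final batch size:", batch, "error:", np.linalg.norm(w - w_true))
```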
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD) in non-convex problems.
We show that SGD avoids strict saddle points/manifolds with probability $1$ under vanishing step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
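As an illustrative sketch of the experience-replay idea in this streaming setting (the Markov chain, buffer policy, and step size below are assumptions of mine, not the paper's exact algorithm or guarantees): rather than updating on the freshly observed, highly correlated sample, store observations in a buffer and update on a uniformly drawn past sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariates follow a slowly mixing AR(1) Markov chain; responses are linear.
d = 5
w_true = rng.normal(size=d)
x = rng.normal(size=d)

w = np.zeros(d)
buffer = []
eta = 0.05
for t in range(20000):
    # One step of the Markov chain: consecutive covariates are highly correlated.
    x = 0.95 * x + np.sqrt(1 - 0.95**2) * rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()
    buffer.append((x.copy(), y))
    # Experience replay: update on a uniformly random past sample from the
    # buffer, which is far less correlated with the recent iterates.
    xs, ys = buffer[rng.integers(len(buffer))]
    w -= eta * (xs @ w - ys) * xs          # SGD step on the replayed sample
print("parameter error:", np.linalg.norm(w - w_true))
```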
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - An Optimal Multistage Stochastic Gradient Method for Minimax Problems [8.615625517708324]
We study the minimax optimization problem in the smooth and strongly convex-strongly concave setting.
We first analyze the Gradient Descent Ascent (GDA) method with constant stepsize.
We propose a multistage variant of GDA that runs in multiple stages with a particular learning rate decay schedule.
arXiv Detail & Related papers (2020-02-13T18:01:18Z)
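As a rough sketch of the multistage idea (with an arbitrary toy problem, stage lengths, and decay factor of my choosing; the paper specifies the actual schedule and its guarantees), run GDA with a constant stepsize within each stage and shrink the stepsize between stages:

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly convex-strongly concave toy: f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,
# observed through noisy gradients.
def gx(x, y): return x + y + 0.1 * rng.normal()
def gy(x, y): return x - y + 0.1 * rng.normal()

def multistage_gda(stages=5, steps_per_stage=2000, eta0=0.2, decay=0.5):
    """GDA with a constant stepsize inside each stage; the stepsize is cut by
    `decay` between stages so the iterates settle into smaller noise balls."""
    x, y, eta = 1.0, 1.0, eta0
    for _ in range(stages):
        for _ in range(steps_per_stage):
            dx, dy = gx(x, y), gy(x, y)
            x, y = x - eta * dx, y + eta * dy
        eta *= decay                       # stage-wise learning-rate decay
    return x, y

print(multistage_gda())   # should approach the saddle point (0, 0)
```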
This list is automatically generated from the titles and abstracts of the papers in this site.