Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax
Optimization
- URL: http://arxiv.org/abs/2206.02953v1
- Date: Tue, 7 Jun 2022 00:37:37 GMT
- Title: Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax
Optimization
- Authors: Aniket Das, Bernhard Sch\"olkopf, Michael Muehlebach
- Abstract summary: We analyze convergence rates of gradient algorithms for smooth finite-sum minimax optimization.
We show that, for many such algorithms, sampling points without replacement leads to faster convergence compared to sampling with replacement.
- Score: 12.794526174053134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze the convergence rates of stochastic gradient algorithms for smooth
finite-sum minimax optimization and show that, for many such algorithms,
sampling the data points without replacement leads to faster convergence
compared to sampling with replacement. For the smooth and strongly
convex-strongly concave setting, we consider gradient descent ascent and the
proximal point method, and present a unified analysis of two popular
without-replacement sampling strategies, namely Random Reshuffling (RR), which
shuffles the data every epoch, and Single Shuffling or Shuffle Once (SO), which
shuffles only at the beginning. We obtain tight convergence rates for RR and SO
and demonstrate that these strategies lead to faster convergence than uniform
sampling. Moving beyond convexity, we obtain similar results for smooth
nonconvex-nonconcave objectives satisfying a two-sided Polyak-{\L}ojasiewicz
inequality. Finally, we demonstrate that our techniques are general enough to
analyze the effect of data-ordering attacks, where an adversary manipulates the
order in which data points are supplied to the optimizer. Our analysis also
recovers tight rates for the incremental gradient method, where the data points
are not shuffled at all.
Related papers
- Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quant clipping strategy for Gradient Descent (SGD)
We use gradient new outliers as norm clipping chains.
We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Convergence of First-Order Methods for Constrained Nonconvex
Optimization with Dependent Data [7.513100214864646]
We show the worst-case complexity of convergence $tildeO(t-1/4)$ and MoreautildeO(vareps-4)$ for smooth non- optimization.
We obtain first online nonnegative matrix factorization algorithms for dependent data based on projected gradient methods with adaptive step sizes and optimal convergence.
arXiv Detail & Related papers (2022-03-29T17:59:10Z) - Nesterov Accelerated Shuffling Gradient Method for Convex Optimization [15.908060383231371]
We show that our algorithm has an improved rate of $mathcalO (1/T)$ using unified shuffling schemes.
Our convergence analysis does not require an assumption on bounded domain or a bounded gradient condition.
Numerical simulations demonstrate the efficiency of our algorithm.
arXiv Detail & Related papers (2022-02-07T21:23:17Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the citetgarber 2020online, where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that citetgarber 2020online algorithms achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - SGD with shuffling: optimal rates without component convexity and large
epoch requirements [60.65928290219793]
We consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once)
We establish minimax optimal convergence rates of these algorithms up to poly-log factor gaps.
We further sharpen the tight convergence results for RandomShuffle by removing the drawbacks common to all prior arts.
arXiv Detail & Related papers (2020-06-12T05:00:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.