Stability and Generalization for Minibatch SGD and Local SGD
- URL: http://arxiv.org/abs/2310.01139v2
- Date: Mon, 30 Oct 2023 07:30:10 GMT
- Title: Stability and Generalization for Minibatch SGD and Local SGD
- Authors: Yunwen Lei, Tao Sun, Mingrui Liu
- Abstract summary: Minibatch gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization.
We provide a stability and generalization analysis of minibatch SGD and local SGD to understand their learnability.
We show both minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
- Score: 46.45496260281998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing scale of data propels the popularity of leveraging parallelism
to speed up optimization. Minibatch stochastic gradient descent (minibatch
SGD) and local SGD are two popular methods for parallel optimization. The
existing theoretical studies show a linear speedup of these methods with
respect to the number of machines, which, however, is measured by optimization
errors. As a comparison, the stability and generalization of these methods are
much less studied. In this paper, we provide a stability and generalization analysis of
minibatch SGD and local SGD to understand their learnability by introducing a novel
expectation-variance decomposition. We incorporate training
errors into the stability analysis, which shows how small training errors help
generalization for overparameterized models. We show both minibatch and local
SGD achieve a linear speedup to attain the optimal risk bounds.
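To make the two algorithms in the abstract concrete, here is a minimal sketch contrasting one communication round of minibatch SGD (the K workers' minibatch gradients are averaged into a single synchronized step) with one round of local SGD (each worker takes several local steps before the models are averaged). The least-squares objective, step size, batch size, and worker count below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares data (illustrative assumption only):
# each of the K workers holds its own shard (X_k, y_k).
d, n_per_worker, K = 10, 200, 4
w_true = rng.normal(size=d)
shards = []
for _ in range(K):
    X = rng.normal(size=(n_per_worker, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_per_worker)
    shards.append((X, y))

def grad(w, X, y, idx):
    """Minibatch gradient of 0.5*||Xw - y||^2 / n on the sampled indices."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

def minibatch_sgd_round(w, lr=0.05, b=16):
    """One synchronized round: average the K workers' minibatch gradients."""
    g = np.zeros_like(w)
    for X, y in shards:
        idx = rng.choice(len(y), size=b, replace=False)
        g += grad(w, X, y, idx)
    return w - lr * g / K

def local_sgd_round(w, lr=0.05, b=16, local_steps=5):
    """One communication round: each worker runs several local SGD steps,
    then the resulting models are averaged."""
    local_models = []
    for X, y in shards:
        w_k = w.copy()
        for _ in range(local_steps):
            idx = rng.choice(len(y), size=b, replace=False)
            w_k -= lr * grad(w_k, X, y, idx)
        local_models.append(w_k)
    return np.mean(local_models, axis=0)

w_mb, w_loc = np.zeros(d), np.zeros(d)
for _ in range(50):
    w_mb = minibatch_sgd_round(w_mb)
    w_loc = local_sgd_round(w_loc)
print("minibatch SGD error:", np.linalg.norm(w_mb - w_true))
print("local SGD error:   ", np.linalg.norm(w_loc - w_true))
```

In this toy run both schemes reach comparable error; the paper's contribution is the stability-based analysis showing that both methods can exploit the parallel workers to attain the optimal risk bounds with a linear speedup.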
Related papers
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as bi-level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication [37.210933391984014]
Local SGD is a popular optimization method in distributed learning, often outperforming other algorithms in practice.
We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions.
We also show the min-max optimality of accelerated mini-batch SGD for several problem classes.
arXiv Detail & Related papers (2024-05-19T20:20:03Z) - SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization [18.668531108219415]
We present a theoretical approach for solving minimax optimization problems using stochastic gradient descent-ascent (SGDA); a minimal sketch of the update scheme appears after this list.
We analyze both simultaneous and alternating SGDA with random reshuffling (SGDA-RR) for nonconvex objectives with Polyak-Łojasiewicz (PŁ) geometry.
Our rates also extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA).
Finally, we present a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate for primal-PŁ-
arXiv Detail & Related papers (2022-10-12T08:05:41Z) - Learning from time-dependent streaming data with online stochastic
algorithms [7.283533791778357]
This paper addresses optimization in a streaming setting with time-dependent and biased estimates.
We analyze several first-order methods, including stochastic gradient descent (SGD), mini-batch SGD, and time-varying mini-batch SGD, along with their Polyak-Ruppert averages.
arXiv Detail & Related papers (2022-05-25T07:53:51Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in finite time, up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Minibatch vs Local SGD for Heterogeneous Distributed Learning [28.80878557506603]
We argue that Minibatch SGD dominates all existing analyses of Local SGD in this setting.
We present the first upper bound for Local SGD that improves over Minibatch SGD in a non-homogeneous regime.
arXiv Detail & Related papers (2020-06-08T16:40:49Z)
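As referenced in the SGDA-with-shuffling entry above, the following is a minimal sketch of one epoch of stochastic gradient descent-ascent with random reshuffling (SGDA-RR) on a toy quadratic minimax problem, showing both the simultaneous and the alternating update rules. The objective, coefficients, and step sizes are illustrative assumptions and are not taken from that paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy minimax problem (illustrative assumption):
#   min_x max_y  f(x, y) = mean_i [ 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2 ]
n = 64
a = rng.uniform(1.0, 2.0, size=n)
b = rng.uniform(-1.0, 1.0, size=n)
c = rng.uniform(1.0, 2.0, size=n)

def grads(x, y, i):
    """Per-sample gradients of f_i with respect to x (descent) and y (ascent)."""
    gx = a[i] * x + b[i] * y
    gy = b[i] * x - c[i] * y
    return gx, gy

def sgda_rr_epoch(x, y, lr_x=0.05, lr_y=0.05, alternating=False):
    """One epoch of SGDA with random reshuffling: visit every sample once,
    in a freshly shuffled order (sampling without replacement)."""
    for i in rng.permutation(n):
        gx, gy = grads(x, y, i)
        if alternating:
            # Alternating SGDA: update x first, then recompute the y-gradient.
            x = x - lr_x * gx
            _, gy = grads(x, y, i)
            y = y + lr_y * gy
        else:
            # Simultaneous SGDA: both updates use the same iterate.
            x, y = x - lr_x * gx, y + lr_y * gy
    return x, y

x, y = 1.0, 1.0
for _ in range(20):
    x, y = sgda_rr_epoch(x, y, alternating=True)
print("approximate saddle point:", x, y)
```

For this quadratic objective the saddle point is at the origin, so the iterates should shrink toward (0, 0) over the epochs; the reshuffled (without-replacement) sampling is the distinguishing feature analyzed in that paper.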
This list is automatically generated from the titles and abstracts of the papers on this site.