Select without Fear: Almost All Mini-Batch Schedules Generalize
Optimally
- URL: http://arxiv.org/abs/2305.02247v2
- Date: Mon, 23 Oct 2023 14:28:11 GMT
- Title: Select without Fear: Almost All Mini-Batch Schedules Generalize
Optimally
- Authors: Konstantinos E. Nikolakakis, Amin Karbasi, Dionysis Kalogerias
- Abstract summary: We establish matching upper and lower generalization error bounds for mini-batch GD with either deterministic or stochastic, data-independent batch selection rules.
For smooth (non-Lipschitz) nonconvex losses, we show that full-batch (deterministic) GD is essentially optimal among all possible batch schedules.
- Score: 38.3493773521059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish matching upper and lower generalization error bounds for
mini-batch Gradient Descent (GD) training with either deterministic or
stochastic, data-independent, but otherwise arbitrary batch selection rules. We
consider smooth Lipschitz-convex/nonconvex/strongly-convex loss functions, and
show that classical upper bounds for Stochastic GD (SGD) also hold verbatim for
such arbitrary nonadaptive batch schedules, including all deterministic ones.
Further, for convex and strongly-convex losses we prove matching lower bounds
directly on the generalization error uniform over the aforementioned class of
batch schedules, showing that all such batch schedules generalize optimally.
Lastly, for smooth (non-Lipschitz) nonconvex losses, we show that full-batch
(deterministic) GD is essentially optimal, among all possible batch schedules
within the considered class, including all stochastic ones.
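To make the setting concrete, below is a minimal sketch, not the authors' code, of mini-batch GD driven by a nonadaptive batch schedule: the sequence of batches is fixed (deterministically or at random) before training and never depends on the data or the iterates, so full-batch GD, fixed cyclic batching, and SGD-style random sampling are all special cases. The function names, the least-squares instance, and the sizes n, T, b below are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): mini-batch GD under a
# nonadaptive batch schedule, i.e. a sequence of index sets chosen
# independently of the data and of the iterates.
import numpy as np

def minibatch_gd(grad_fn, w0, data, schedule, lr=0.1):
    """Run mini-batch GD following a prescribed batch schedule.

    grad_fn(w, batch) -> gradient of the empirical loss on `batch`
    schedule          -> list of index arrays fixed before seeing the data
    """
    w = np.asarray(w0, dtype=float)
    for batch_idx in schedule:
        w = w - lr * grad_fn(w, data[batch_idx])
    return w

# Example nonadaptive schedules over n samples, T iterations, batch size b
# (hypothetical sizes chosen for illustration).
n, T, b = 100, 50, 10
rng = np.random.default_rng(0)
full_batch = [np.arange(n) for _ in range(T)]                         # deterministic: full-batch GD
cyclic = [np.arange(t * b, (t + 1) * b) % n for t in range(T)]        # deterministic: fixed cyclic batches
random_batches = [rng.choice(n, b, replace=False) for _ in range(T)]  # stochastic, data-independent (SGD-like)

# Hypothetical least-squares instance: each data row stacks features and target.
X, y = rng.normal(size=(n, 3)), rng.normal(size=n)
data = np.hstack([X, y[:, None]])
grad_fn = lambda w, batch: 2 * batch[:, :3].T @ (batch[:, :3] @ w - batch[:, 3]) / len(batch)
w_hat = minibatch_gd(grad_fn, np.zeros(3), data, random_batches, lr=0.05)
```

All three example schedules lie in the nonadaptive class described in the abstract, for which the classical SGD upper bounds hold verbatim and, for (strongly) convex losses, the matching lower bounds apply.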
Related papers
- Global Well-posedness and Convergence Analysis of Score-based Generative Models via Sharp Lipschitz Estimates [1.3124513975412255]
We establish the global well-posedness and convergence of score-based generative models (SGMs).
For the smooth case, we start from a Lipschitz bound of the score function with optimal time length.
The optimality is validated by an example whose score Lipschitz constant is bounded initially but blows up in finite time.
arXiv Detail & Related papers (2024-05-25T07:31:24Z) - Generalized Schrödinger Bridge Matching [54.171931505066]
Generalized Schr"odinger Bridge (GSB) problem setup is prevalent in many scientific areas both within and without machine learning.
We propose Generalized Schr"odinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances.
We show that such a generalization can be cast as solving conditional optimal control, for which variational approximations can be used.
arXiv Detail & Related papers (2023-10-03T17:42:11Z) - Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for
Full-Batch GD [31.80268332522017]
We provide sharp path-dependent generalization and excess error guarantees for full-batch Gradient Descent (GD) on smooth losses (possibly non-Lipschitz).
Our full-batch generalization error and excess risk bounds are significantly tighter than the existing bounds for GD when the loss is smooth (but possibly non-Lipschitz).
arXiv Detail & Related papers (2022-04-26T17:05:57Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Between Stochastic and Adversarial Online Convex Optimization: Improved
Regret Bounds via Smoothness [2.628557920905129]
We establish novel regret bounds for online convex optimization in a setting that interpolates between i.i.d. and fully adversarial losses.
To accomplish this goal, we introduce two key quantities associated with the loss sequence, that we call the cumulative variance and the adversarial variation.
In the fully i.i.d. case, our bounds match the rates one would expect from results in acceleration, and in the fully adversarial case they gracefully deteriorate to match the minimax regret.
arXiv Detail & Related papers (2022-02-15T16:39:33Z) - Black-Box Generalization [31.80268332522017]
We provide the first generalization error analysis for black-box learning through derivative-free optimization.
We show that both generalization bounds are independent of $d$ and $K$, under appropriate choices of a slightly decreased learning rate.
arXiv Detail & Related papers (2022-02-14T17:14:48Z) - Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via
Root-Entropic Regularization [16.536558038560695]
We consider prediction with expert advice when the data-generating mechanism varies arbitrarily within an unknown constraint set.
The Hedge algorithm was recently shown to be simultaneously minimax optimal for i.i.d. data.
We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes, and prove that one can adaptively obtain minimax regret at all levels (a minimal sketch of the baseline Hedge update appears after this list).
arXiv Detail & Related papers (2020-07-13T17:54:34Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the mini-batch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z) - Information Directed Sampling for Linear Partial Monitoring [112.05623123909895]
We introduce information directed sampling (IDS) for partial monitoring with a linear reward and observation structure.
IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game.
We extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
arXiv Detail & Related papers (2020-02-25T21:30:56Z)