Related papers: Never Go Full Batch (in Stochastic Convex Optimization)

Never Go Full Batch (in Stochastic Convex Optimization)

URL: http://arxiv.org/abs/2107.00469v1
Date: Tue, 29 Jun 2021 16:07:50 GMT
Title: Never Go Full Batch (in Stochastic Convex Optimization)
Authors: Idan Amir, Yair Carmon, Tomer Koren, Roi Livni
Abstract summary: We study the generalization performance of $textfull-batch$ optimization algorithms for convex optimization. We provide a new separation result showing that, while algorithms such as gradient descent can generalize and optimize the population risk to within $epsilon$ after $O (1/epsilon2)$, full-batch methods either need at least $Omega (1/epsilon4)$ iterations or exhibit a dimension-dependent sample complexity.
Score: 42.46711831860667
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the generalization performance of $\text{full-batch}$ optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), that include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants. We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within $\epsilon$ after $O(1/\epsilon^2)$ iterations, full-batch methods either need at least $\Omega(1/\epsilon^4)$ iterations or exhibit a dimension-dependent sample complexity.

Related papers

Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints [12.624604051853657]
We propose smoothed primal-dual algorithms for solving nonexact optimization problems with linear inequality constraints. Our algorithms are single-loop iterations based on one gradient at each sample. Unlike existing methods, our algorithms are free sub, large sizes or increasing parameters and use dual variable updates to ensure feasibility.
arXiv Detail & Related papers (2025-04-10T09:59:43Z)
Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates [15.27596975662702]
We explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients. Our results match the minimax lower bound in citekamath2022, indicating that the theoretical limit of convex optimization under DP is achievable.
arXiv Detail & Related papers (2024-08-19T11:07:05Z)
Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance [17.199063087458907]
We quantify the convergence rate of the Mirror Descent algorithm with a class of uniformly convex mirror maps. This algorithm does not require any explicit gradient clipping or normalization. We complement our results with information-theoretic lower bounds showing that no other algorithm using only first-order oracles can achieve improved rates.
arXiv Detail & Related papers (2022-02-23T17:08:40Z)
A Fully Single Loop Algorithm for Bilevel Optimization without Hessian Inverse [121.54116938140754]
We propose a new Hessian inverse free Fully Single Loop Algorithm for bilevel optimization problems. We show that our algorithm converges with the rate of $O(epsilon-2)$.
arXiv Detail & Related papers (2021-12-09T02:27:52Z)
Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions. We show that the first algorithm, which is a generalization of citeGhaRuswan20 to the $T$ level case, can achieve a sample complexity of $mathcalO (1/epsilon$6) This is the first time that such an online algorithm designed for the (un) multi-level setting, obtains the same sample complexity under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we denote the non-strongly setting on the magnitude of a gradient-free minimax optimization problem. We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples. We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(minn, n2/d)$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of gradient algorithm for general composite objective functions with nonsmooth components. We provide complexity for constant and variable stepsize iteration policies obtaining that, for minibatch size $N$, after $mathcalO(frac1Nepsilon)$ $epsilon-$subity is attained in expected quadratic distance to optimal solution.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems. Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.