Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe
Algorithm under Parallelization
- URL: http://arxiv.org/abs/2205.12751v1
- Date: Wed, 25 May 2022 13:01:09 GMT
- Title: Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe
Algorithm under Parallelization
- Authors: Benjamin Dubois-Taine, Francis Bach, Quentin Berthet, Adrien Taylor
- Abstract summary: We consider the problem of minimizing the sum of two convex functions.
One has Lipschitz-continuous gradients, and can be accessed via oracles, whereas the other is "simple"
We show that one can achieve an $epsilon$ primaldual gap (in expectation) in $tildeO (1/ sqrtepsilon)$ iterations.
- Score: 9.166498349629157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of minimizing the sum of two convex functions. One of
those functions has Lipschitz-continuous gradients, and can be accessed via
stochastic oracles, whereas the other is "simple". We provide a Bregman-type
algorithm with accelerated convergence in function values to a ball containing
the minimum. The radius of this ball depends on problem-dependent constants,
including the variance of the stochastic oracle. We further show that this
algorithmic setup naturally leads to a variant of Frank-Wolfe achieving
acceleration under parallelization. More precisely, when minimizing a smooth
convex function on a bounded domain, we show that one can achieve an $\epsilon$
primal-dual gap (in expectation) in $\tilde{O}(1/ \sqrt{\epsilon})$ iterations,
by only accessing gradients of the original function and a linear maximization
oracle with $O(1/\sqrt{\epsilon})$ computing units in parallel. We illustrate
this fast convergence on synthetic numerical experiments.
Related papers
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Revisiting Subgradient Method: Complexity and Convergence Beyond
Lipschitz Continuity [29.56022052936922]
Subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical complexity results for the subgradient method to convex and weakly convex objective functions.
We provide convergence results for non-Lipschitz convex and weakly convex objective functions using proper diminishing rules on the step sizes.
arXiv Detail & Related papers (2023-05-23T15:26:36Z) - Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD [38.221784575853796]
This work considers the problem of finding first-order stationary point of a non atau function with potentially constant smoothness using a gradient.
We develop a technique that allows us to prove $mathcalO(fracmathrmpolylog(T)sigmatT)$ convergence rates without assuming uniform bounds on the noise.
arXiv Detail & Related papers (2023-02-13T18:13:36Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for BFG convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - On the Complexity of Finding Small Subgradients in Nonsmooth
Optimization [31.714928102950584]
We show that no dimension-free rate can be achieved by a deterministic algorithm.
We show how the convergence rate of finding $(delta,epsilon)$-stationary points can be improved in case the function is convex.
arXiv Detail & Related papers (2022-09-21T13:30:00Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We show a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $mathcalO!!!(sigma2-2_avg!! .
We also show that the heterogeneity term in rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Thinking Outside the Ball: Optimal Learning with Gradient Descent for
Generalized Linear Stochastic Convex Optimization [37.177329562964765]
We consider linear prediction with a convex Lipschitz loss, or more generally, convex optimization problems of generalized linear form.
We show that in this setting, early iteration stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $epsilon$.
But instead of uniform convergence in a norm ball, which we show can guarantee suboptimal learning using $Theta (1/epsilon4)$ samples, we rely on uniform convergence in a distribution-dependent ball.
arXiv Detail & Related papers (2022-02-27T09:41:43Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $min_x max_y f(x) + langle Ax, yrangle - g*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Parameter-free Stochastic Optimization of Variationally Coherent
Functions [19.468067110814808]
We design and analyze an algorithm for first-order optimization of a class of functions on $mathbbRdilon.
It is the first to achieve both these at the same time.
arXiv Detail & Related papers (2021-01-30T15:05:34Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.