Improved Complexities for Stochastic Conditional Gradient Methods under
Interpolation-like Conditions
- URL: http://arxiv.org/abs/2006.08167v2
- Date: Wed, 26 Jan 2022 19:49:59 GMT
- Title: Improved Complexities for Stochastic Conditional Gradient Methods under
Interpolation-like Conditions
- Authors: Tesi Xiao, Krishnakumar Balasubramanian, Saeed Ghadimi
- Abstract summary: We show that the conditional gradient method requires $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle to find an $\epsilon$-optimal solution.
By including a gradient sliding step, we show that the number of calls reduces to $\mathcal{O}(\epsilon^{-1.5})$.
- Score: 12.096252285460814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze stochastic conditional gradient methods for constrained
optimization problems arising in over-parametrized machine learning. We show
that one could leverage the interpolation-like conditions satisfied by such
models to obtain improved oracle complexities. Specifically, when the objective
function is convex, we show that the conditional gradient method requires
$\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle to find an
$\epsilon$-optimal solution. Furthermore, by including a gradient sliding step,
we show that the number of calls reduces to $\mathcal{O}(\epsilon^{-1.5})$.
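To make the setting concrete, below is a minimal sketch of a stochastic conditional gradient (Frank-Wolfe) loop on a noiseless least-squares problem, where interpolation holds exactly because the labels are generated without noise. The constraint set, batch size, and step-size rule are illustrative choices, not the schedules analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, radius = 1000, 20, 5.0
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true                      # zero label noise: the interpolation condition holds at w_true

def lmo_l1(g, r):
    """Linear minimization oracle for the l1-ball: argmin_{||v||_1 <= r} <g, v>."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -r * np.sign(g[i])
    return v

w = np.zeros(d)
for k in range(500):
    batch = rng.choice(n, size=32, replace=False)      # illustrative fixed batch size
    g = X[batch].T @ (X[batch] @ w - y[batch]) / 32    # stochastic gradient of the least-squares loss
    v = lmo_l1(g, radius)
    gamma = 2.0 / (k + 2)                              # classical Frank-Wolfe step size
    w = (1 - gamma) * w + gamma * v
```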
Related papers
- Gradient Methods with Online Scaling [19.218484733179356]
We introduce a framework to accelerate the convergence of gradient-based methods with online learning.
We show, for the first time, that the widely used hypergradient descent method improves on the convergence of gradient descent.
arXiv Detail & Related papers (2024-11-04T05:04:18Z)
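As a point of reference for the hypergradient idea above, here is a minimal sketch of hypergradient descent on a toy quadratic; the update `alpha += beta * (g @ g_prev)` is the standard hypergradient step-size rule, and all constants are illustrative.

```python
import numpy as np

def f(x):                        # toy quadratic standing in for any differentiable objective
    return 0.5 * x @ x

def grad(x):
    return x

x = np.full(10, 5.0)
alpha, beta = 0.01, 0.001        # initial step size and hypergradient learning rate (illustrative)
g_prev = grad(x)
for _ in range(100):
    g = grad(x)
    alpha += beta * (g @ g_prev) # gradient step on alpha: the "hypergradient" update
    x = x - alpha * g
    g_prev = g
```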
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z)
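To illustrate the clipping mechanism referenced above, here is a minimal sketch of gradient descent with a clipped step on a toy objective whose smoothness grows with the gradient norm (the hallmark of $(L_0,L_1)$-smoothness); the constants are illustrative, not the paper's exact scheme.

```python
import numpy as np

def f(x):
    return np.exp(x[0]) + np.exp(-x[0]) + x[1] ** 2       # curvature grows with the gradient norm

def grad(x):
    return np.array([np.exp(x[0]) - np.exp(-x[0]), 2 * x[1]])

x = np.array([3.0, 5.0])
eta, clip = 0.5, 1.0                   # illustrative base step size and clipping threshold
for _ in range(200):
    g = grad(x)
    g_norm = np.linalg.norm(g)
    step = eta * min(1.0, clip / g_norm)   # clipped step: ~eta/||g|| far out, ~eta near the optimum
    x -= step * g
```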
- Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity
The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical iteration-complexity results for the subgradient method to cover non-Lipschitz convex and weakly convex minimization.
arXiv Detail & Related papers (2023-05-23T15:26:36Z)
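For completeness, a minimal sketch of the classical subgradient method on the nonsmooth $\ell_1$ norm as a toy objective, with the usual $1/\sqrt{k}$ diminishing step size; the objective and constants are illustrative.

```python
import numpy as np

def f(x):
    return np.abs(x).sum()          # nonsmooth convex objective

def subgrad(x):
    return np.sign(x)               # a valid subgradient of the l1 norm (0 is allowed where x == 0)

x = np.full(10, 3.0)
for k in range(1, 1001):
    x -= (1.0 / np.sqrt(k)) * subgrad(x)   # diminishing step size alpha_k = 1/sqrt(k)
```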
- Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps [77.8999425439444]
We present a first-order method that achieves near-optimal convergence rates for convex-concave min-max problems.
Our work is based on the fact that the update rule of the Proximal Point method can be approximated to arbitrary accuracy by iterating a contraction map.
arXiv Detail & Related papers (2023-01-10T12:18:47Z)
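The contraction-map idea can be sketched directly: the proximal point update solves $z = z_k - \eta F(z)$, whose solution is the fixed point of the contraction $u \mapsto z_k - \eta F(u)$ for small $\eta$; iterating that map approximates the prox step, and a single inner iteration recovers extragradient. A toy bilinear example with illustrative constants:

```python
import numpy as np

def F(z):
    x, y = z
    return np.array([y, -x])   # saddle operator of f(x, y) = x * y

z = np.array([1.0, 1.0])
eta = 0.5                      # eta * Lipschitz(F) < 1 makes the inner map a contraction
for _ in range(100):
    u = z.copy()
    for _ in range(5):         # fixed-point iterations of u -> z - eta * F(u) approximate the prox step
        u = z - eta * F(u)     # a single inner iteration from u = z recovers extragradient
    z = u                      # accept the approximate proximal point
```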
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$ and $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
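As a reference point for the problem class above, here is a minimal sketch of a plain extragradient descent-ascent loop on a bilinearly-coupled objective with quadratic $F$ and $G$; it shows the extragradient mechanism only, not the accelerated AG-EG scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
B = rng.standard_normal((d, d))      # bilinear coupling matrix

def grad_x(x, y):
    return x + B @ y                 # gradient in x of 0.5||x||^2 + x^T B y - 0.5||y||^2

def grad_y(x, y):
    return B.T @ x - y               # gradient in y of the same objective

x, y = rng.standard_normal(d), rng.standard_normal(d)
eta = 0.05
for _ in range(1000):
    xh = x - eta * grad_x(x, y)      # extrapolation (look-ahead) step
    yh = y + eta * grad_y(x, y)
    x = x - eta * grad_x(xh, yh)     # update using gradients at the extrapolated point
    y = y + eta * grad_y(xh, yh)
```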
- Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method that separates the complexity of the master-to-agents communication from that of the local gradient calls.
The proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ bound of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
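The sliding idea can be sketched in simplified form: the gradient of the expensive component $p$ is computed once per outer iteration and held fixed while an inner loop uses only gradients of the cheap component $q$, keeping calls to $\nabla p$ low. The functions and constants below are illustrative, and the sketch omits the acceleration the actual method relies on.

```python
import numpy as np

# minimize r(w) = p(w) + q(w); sliding skips fresh gradients of p during the inner loop
def grad_p(w):
    return w - 1.0            # p(w) = 0.5*||w - 1||^2 (the "expensive" component)

def grad_q(w):
    return 4.0 * w            # q(w) = 2*||w||^2 (the "cheap" component)

w = np.zeros(3)
eta = 0.1
for _ in range(100):          # outer loop: one gradient of p per iteration
    gp = grad_p(w)            # held fixed during the inner loop
    for _ in range(10):       # inner loop: only q gradients, so calls to grad_p stay low
        w -= eta * (gp + grad_q(w))
```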
- Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Gradient Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
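A minimal sketch of a variance-reduced Langevin step (an SVRG-style gradient estimator plus injected Gaussian noise) on a toy least-squares potential; step size and inverse temperature are illustrative, and this is a generic variant rather than the exact algorithms from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):                        # gradient of the i-th summand 0.5*(a_i @ x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
eta, beta = 1e-3, 10.0                   # step size and inverse temperature (illustrative)
for epoch in range(20):
    snap, mu = x.copy(), full_grad(x)    # snapshot point and its full gradient
    for _ in range(n):
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(snap, i) + mu                             # SVRG-style estimate
        x = x - eta * v + np.sqrt(2 * eta / beta) * rng.standard_normal(d)  # Langevin step
```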
- Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data [7.513100214864646]
We show a worst-case convergence rate of $\tilde{O}(t^{-1/4})$ and a complexity of $\tilde{O}(\varepsilon^{-4})$, measured by the gradient of the Moreau envelope, for smooth nonconvex optimization.
We obtain the first online nonnegative matrix factorization algorithms for dependent data, based on projected gradient methods with adaptive step sizes and optimal convergence.
arXiv Detail & Related papers (2022-03-29T17:59:10Z)
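To illustrate the algorithmic template, here is a minimal sketch of projected stochastic gradient descent with an AdaGrad-norm style adaptive step size, where samples come from a slowly mixing AR(1) chain to mimic dependent (non-i.i.d.) data; the constraint set and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

def project_box(x):                 # projection onto the box [0, 1]^d, a simple constraint set
    return np.clip(x, 0.0, 1.0)

x = np.full(d, 0.5)
acc = 1e-8                          # accumulator for an AdaGrad-norm style step size
state = rng.standard_normal(d)      # dependent data: samples follow a slowly mixing AR(1) chain
for k in range(1000):
    state = 0.9 * state + 0.1 * rng.standard_normal(d)
    g = x - state                   # stochastic gradient of 0.5*||x - sample||^2
    acc += g @ g
    x = project_box(x - g / np.sqrt(acc))   # adaptive step size 1/sqrt(accumulated norm)
```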
- A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for constrained stochastic multi-level composition optimization.
We show that the number of calls to the stochastic first-order oracle and the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z)
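A minimal sketch of a projection-free (conditional gradient) update for a two-level composition $f_1(f_2(x))$, where the inner function value is tracked by a moving average before the chain rule is applied; the maps, oracles, and schedules below are illustrative stand-ins rather than the paper's multi-level algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, radius = 10, 3.0

def f2_sample(x):                   # noisy inner map; its Jacobian is the identity here
    return x + 0.01 * rng.standard_normal(d)

def grad_f1(u):                     # outer function f1(u) = 0.5*||u||^2
    return u

def lmo_l2(g, r):                   # linear minimization oracle for the l2-ball of radius r
    return -r * g / (np.linalg.norm(g) + 1e-12)

x = rng.standard_normal(d)
u = f2_sample(x)                    # running estimate of the inner value f2(x)
for k in range(300):
    tau = 2.0 / (k + 2)
    u = (1 - tau) * u + tau * f2_sample(x)   # moving average tracks f2(x) without projections
    g = grad_f1(u)                           # chain rule (identity inner Jacobian)
    v = lmo_l2(g, radius)
    gamma = 2.0 / (k + 2)
    x = (1 - gamma) * x + gamma * v
```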
- Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-start [39.13249645102326]
We study bilevel problems in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map.
Several recent works have proposed algorithms which warm-start the lower-level problem.
We show that without warm-start, it is still possible to achieve order-wise optimal sample complexity.
arXiv Detail & Related papers (2022-02-07T18:35:46Z)
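The cold-start strategy can be sketched for a linear lower-level contraction: each outer iteration re-solves the fixed point from scratch, and the hypergradient term $(I - A)^{-\top} r$ is approximated by a truncated Neumann series. The specific maps and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
G = rng.standard_normal((d, d))
A = 0.5 * G / np.linalg.norm(G, 2)      # rescaled so the lower-level map is a contraction
w_target = rng.standard_normal(d)

def phi(w, x):                          # lower level: w*(x) is the fixed point of w = phi(w, x)
    return A @ w + x

def hypergrad(x, T=50):
    w = np.zeros(d)
    for _ in range(T):                  # cold start: re-solve the fixed point from scratch
        w = phi(w, x)
    r = w - w_target                    # residual of the upper objective 0.5*||w*(x) - w_target||^2
    v, term = r.copy(), r.copy()
    for _ in range(T):                  # truncated Neumann series for (I - A)^{-T} r
        term = A.T @ term
        v += term
    return v                            # dw*/dx = (I - A)^{-1} here, so this is the hypergradient

x = np.zeros(d)
for _ in range(100):                    # upper-level gradient descent
    x -= 0.3 * hypergrad(x)
```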
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
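A minimal sketch of the optimistic update at the core of such algorithms, shown as plain optimistic gradient descent-ascent on a toy bilinear min-max problem; the Adagrad-style adaptive scaling analyzed in the paper is omitted.

```python
def grads(x, y):                    # gradients for the toy min-max problem f(x, y) = x * y
    return y, x                     # (df/dx, df/dy)

x, y = 1.0, 1.0
eta = 0.1
gx_prev, gy_prev = grads(x, y)
for _ in range(500):
    gx, gy = grads(x, y)
    x -= eta * (2 * gx - gx_prev)   # optimistic step: extrapolates using the previous gradient
    y += eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy
```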
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.