Improved Complexities for Stochastic Conditional Gradient Methods under
Interpolation-like Conditions
- URL: http://arxiv.org/abs/2006.08167v2
- Date: Wed, 26 Jan 2022 19:49:59 GMT
- Title: Improved Complexities for Stochastic Conditional Gradient Methods under
Interpolation-like Conditions
- Authors: Tesi Xiao, Krishnakumar Balasubramanian, Saeed Ghadimi
- Abstract summary: We show that the conditional gradient method requires $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle to find an $\epsilon$-optimal solution.
By including a gradient sliding step, we show that the number of calls reduces to $\mathcal{O}(\epsilon^{-1.5})$.
- Score: 12.096252285460814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze stochastic conditional gradient methods for constrained
optimization problems arising in over-parametrized machine learning. We show
that one could leverage the interpolation-like conditions satisfied by such
models to obtain improved oracle complexities. Specifically, when the objective
function is convex, we show that the conditional gradient method requires
$\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle to find an
$\epsilon$-optimal solution. Furthermore, by including a gradient sliding step,
we show that the number of calls reduces to $\mathcal{O}(\epsilon^{-1.5})$.
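The projection-free update behind such conditional gradient (Frank-Wolfe) methods can be sketched as follows; the simplex constraint, quadratic objective, noise level, and step-size rule below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def lmo_simplex(grad):
    # Linear-minimization oracle over the probability simplex:
    # argmin_{s in simplex} <grad, s> is the vertex at the smallest coordinate.
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def stochastic_frank_wolfe(stoch_grad, x0, n_iters=200):
    # stoch_grad(x) returns an unbiased estimate of the gradient at x.
    x = x0.copy()
    for t in range(n_iters):
        g = stoch_grad(x)
        s = lmo_simplex(g)               # call the linear-minimization oracle
        gamma = 2.0 / (t + 2)            # classic diminishing step size
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Illustrative objective: f(x) = 0.5 * ||x - b||^2 with noisy gradients,
# whose unconstrained minimizer b already lies in the simplex.
rng = np.random.default_rng(0)
b = np.array([0.1, 0.2, 0.7])
noisy_grad = lambda x: (x - b) + 0.01 * rng.standard_normal(x.shape)
x = stochastic_frank_wolfe(noisy_grad, np.ones(3) / 3)
```

Every iterate remains feasible by construction, which is why no projection step is ever needed.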
Related papers
- Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization [2.0657831823662574]
When the problem is nonconvex and high-dimensional, the sample complexity of first-order methods may depend linearly on the problem dimension, which is undesirable.
We prove that DISFOM can sharpen the variance with a dimension dependence of only $\mathcal{O}(\log d)$.
arXiv Detail & Related papers (2024-06-27T18:38:42Z) - Revisiting Subgradient Method: Complexity and Convergence Beyond
Lipschitz Continuity [29.56022052936922]
The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical complexity results for the subgradient method to convex and weakly convex objective functions.
We provide convergence results for non-Lipschitz convex and weakly convex objective functions using proper diminishing rules on the step sizes.
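For concreteness, the basic subgradient method with a diminishing step-size rule can be sketched on the nonsmooth convex function f(x) = |x|; the test function and the 1/sqrt(t+1) rule here are illustrative choices, not the paper's specific setting:

```python
import numpy as np

def subgradient_method(subgrad, x0, n_iters=1000):
    # Subgradient descent with diminishing steps alpha_t = 1/sqrt(t+1).
    # The method is not monotone, so we track the best iterate seen.
    x = x0
    best = x0
    for t in range(n_iters):
        g = subgrad(x)
        x = x - g / np.sqrt(t + 1)
        if abs(x) < abs(best):
            best = x
    return best

# f(x) = |x| is convex and nonsmooth at 0; sign(x) is a valid subgradient.
subgrad_abs = lambda x: np.sign(x) if x != 0 else 0.0
x_best = subgradient_method(subgrad_abs, x0=5.0)
```

Once the iterates cross the minimizer, they oscillate with an amplitude bounded by the current step size, so the best iterate keeps improving as the steps shrink.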
arXiv Detail & Related papers (2023-05-23T15:26:36Z) - Min-Max Optimization Made Simple: Approximating the Proximal Point
Method via Contraction Maps [77.8999425439444]
We present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems.
Our work is based on the fact that the update rule of the Proximal Point method can be approximated up to arbitrary accuracy.
arXiv Detail & Related papers (2023-01-10T12:18:47Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient and accelerated gradient steps.
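The extragradient building block of such descent-ascent schemes can be sketched on the toy bilinear game min_x max_y x*y; the step size and this particular game are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def F(z):
    # Monotone operator of the bilinear game L(x, y) = x * y:
    # F(x, y) = (dL/dx, -dL/dy) = (y, -x). Plain gradient descent-ascent
    # diverges on this problem; extragradient converges.
    x, y = z
    return np.array([y, -x])

def extragradient(z0, eta=0.5, n_iters=50):
    z = np.asarray(z0, dtype=float)
    for _ in range(n_iters):
        z_half = z - eta * F(z)    # extrapolation (look-ahead) step
        z = z - eta * F(z_half)    # update using the look-ahead operator
    return z

z = extragradient([1.0, 1.0])  # converges toward the saddle point (0, 0)
```

The look-ahead evaluation of the operator is what damps the rotation that makes simultaneous gradient descent-ascent spiral outward.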
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex, under a similarity assumption.
We propose a method to solve such problems via master-to-agents communication and local calls.
The proposed rate is much sharper than the existing $\mathcal{O}(\sqrt{L_q/\mu})$ one.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
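The base update these variants build on adds Gaussian noise to a stochastic gradient step; a minimal sketch on the quadratic potential f(x) = x^2/2 (potential, step size, and noise scale are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01          # step size (also the discretization step)
x = 0.0
samples = []
# Potential f(x) = x^2 / 2, grad f(x) = x; the Langevin stationary
# distribution is proportional to exp(-f), i.e. a standard normal.
for t in range(20000):
    noisy_grad = x + 0.1 * rng.standard_normal()   # stochastic gradient
    # SGLD update: gradient step plus sqrt(2*eta)-scaled Gaussian noise.
    x = x - eta * noisy_grad + np.sqrt(2 * eta) * rng.standard_normal()
    samples.append(x)
samples = np.asarray(samples[5000:])  # discard burn-in
```

The injected noise lets the iterates escape poor local minima, at the price of sampling around, rather than converging to, a minimizer.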
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Convergence of First-Order Methods for Constrained Nonconvex
Optimization with Dependent Data [7.513100214864646]
We show a worst-case convergence complexity of $\tilde{O}(t^{-1/4})$ and $\tilde{O}(\varepsilon^{-4})$ in terms of the Moreau envelope for smooth nonconvex optimization.
We obtain the first online nonnegative matrix factorization algorithms for dependent data based on projected gradient methods with adaptive step sizes and optimal convergence.
arXiv Detail & Related papers (2022-03-29T17:59:10Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
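A SAG-style estimator keeps the last sampled gradient of each component and maintains their average, so each iteration touches only one sample; the least-squares setup and unconstrained gradient step below are illustrative assumptions, not the paper's composite method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Finite sum f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2.
f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])   # gradient of component i

step = 1.0 / (16 * (A ** 2).sum(axis=1).max())   # conservative SAG step size
x = np.zeros(d)
table = np.zeros((n, d))     # last sampled gradient of each component
avg = table.mean(axis=0)     # running average, maintained in O(d) per step
for t in range(2000):
    i = rng.integers(n)      # draw ONE sample per iteration
    g_new = grad_i(x, i)
    avg += (g_new - table[i]) / n   # refresh the average incrementally
    table[i] = g_new
    x -= step * avg
```

As the table fills in, the averaged estimate tracks the full gradient while its variance shrinks, which is what lets one-sample methods match variance-reduced rates.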
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of calls to the stochastic oracle and the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - Bilevel Optimization with a Lower-level Contraction: Optimal Sample
Complexity without Warm-start [39.13249645102326]
We study bilevel problems in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map.
Several recent works have proposed algorithms which warm-start the lower-level problem.
We show that without warm-start, it is still possible to achieve order-wise optimal sample complexity.
arXiv Detail & Related papers (2022-02-07T18:35:46Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.