Complexity of Minimizing Projected-Gradient-Dominated Functions with Stochastic First-order Oracles
- URL: http://arxiv.org/abs/2408.01839v1
- Date: Sat, 3 Aug 2024 18:34:23 GMT
- Title: Complexity of Minimizing Projected-Gradient-Dominated Functions with Stochastic First-order Oracles
- Authors: Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran,
- Abstract summary: This work investigates the performance limits of projected first-order methods for minimizing functions under the $(alpha,tau,mathcal)$-projected-dominance property.
- Score: 38.45952947660789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work investigates the performance limits of projected stochastic first-order methods for minimizing functions under the $(\alpha,\tau,\mathcal{X})$-projected-gradient-dominance property, that asserts the sub-optimality gap $F(\mathbf{x})-\min_{\mathbf{x}'\in \mathcal{X}}F(\mathbf{x}')$ is upper-bounded by $\tau\cdot\|\mathcal{G}_{\eta,\mathcal{X}}(\mathbf{x})\|^{\alpha}$ for some $\alpha\in[1,2)$ and $\tau>0$ and $\mathcal{G}_{\eta,\mathcal{X}}(\mathbf{x})$ is the projected-gradient mapping with $\eta>0$ as a parameter. For non-convex functions, we show that the complexity lower bound of querying a batch smooth first-order stochastic oracle to obtain an $\epsilon$-global-optimum point is $\Omega(\epsilon^{-{2}/{\alpha}})$. Furthermore, we show that a projected variance-reduced first-order algorithm can obtain the upper complexity bound of $\mathcal{O}(\epsilon^{-{2}/{\alpha}})$, matching the lower bound. For convex functions, we establish a complexity lower bound of $\Omega(\log(1/\epsilon)\cdot\epsilon^{-{2}/{\alpha}})$ for minimizing functions under a local version of gradient-dominance property, which also matches the upper complexity bound of accelerated stochastic subgradient methods.
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$ requires.
We also prove that there is no natural first-order method with $mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity [22.615156512223763]
We propose variance- optimistic sliding (SVOGS) method, which takes the advantage of the finite-sum structure in the objective.
We prove $mathcal O(delta D2/varepsilon)$, communication complexity of $mathcal O(n+sqrtndelta D2/varepsilon)$, and local calls of $tildemathcal O(n+sqrtndelta+L)D2/varepsilon)$.
arXiv Detail & Related papers (2024-05-25T08:34:49Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We show a proof to show DEAREST requires at most $mathcal O(+sqrtmnLvarepsilon-2)$ first-order oracle (IFO) calls and $mathcal O(Lvarepsilon-2/sqrt1-lambda_W)$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity.
We proposed a method to solve problems master to agents' communication and local calls.
The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Decentralized Stochastic Variance Reduced Extragradient Method [25.21457349137344]
This paper studies decentralized convex-concave minimax optimization problems of the form $min_xmax_y fx,y triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msum
arXiv Detail & Related papers (2022-02-01T16:06:20Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave
Minimax Problem [16.689304539024036]
In this paper, we consider non-asymotic behavior of finding second-order stationary point for minimax problem.
For high-dimensional problems, we propose anf to expensive cost form second-order oracle which solves the cubic sub-problem in gradient via descent and Chebyshev expansion.
arXiv Detail & Related papers (2021-10-10T14:54:23Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.