Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity
- URL: http://arxiv.org/abs/2409.10773v2
- Date: Tue, 1 Oct 2024 15:57:06 GMT
- Title: Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity
- Authors: Site Bai, Brian Bullins,
- Abstract summary: We provide tight lower bounds for the oracle complexity of minimizing high-order H"older smooth and uniformly convex functions.
Our analysis generalizes previous lower bounds for functions under first- and second-order smoothness as well as those for uniformly convex functions.
- Score: 6.972653925522813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we provide tight lower bounds for the oracle complexity of minimizing high-order H\"older smooth and uniformly convex functions. Specifically, for a function whose $p^{th}$-order derivatives are H\"older continuous with degree $\nu$ and parameter $H$, and that is uniformly convex with degree $q$ and parameter $\sigma$, we focus on two asymmetric cases: (1) $q > p + \nu$, and (2) $q < p+\nu$. Given up to $p^{th}$-order oracle access, we establish worst-case oracle complexities of $\Omega\left( \left( \frac{H}{\sigma}\right)^\frac{2}{3(p+\nu)-2}\left( \frac{\sigma}{\epsilon}\right)^\frac{2(q-p-\nu)}{q(3(p+\nu)-2)}\right)$ in the first case with an $\ell_\infty$-ball-truncated-Gaussian smoothed hard function and $\Omega\left(\left(\frac{H}{\sigma}\right)^\frac{2}{3(p+\nu)-2}+ \log^2\left(\frac{\sigma^{p+\nu}}{H^q}\right)^\frac{1}{p+\nu-q}\right)$ in the second case, for reaching an $\epsilon$-approximate solution in terms of the optimality gap. Our analysis generalizes previous lower bounds for functions under first- and second-order smoothness as well as those for uniformly convex functions, and furthermore our results match the corresponding upper bounds in the general setting.
Related papers
- Complexity of Minimizing Projected-Gradient-Dominated Functions with Stochastic First-order Oracles [38.45952947660789]
This work investigates the performance limits of projected first-order methods for minimizing functions under the $(alpha,tau,mathcal)$-projected-dominance property.
arXiv Detail & Related papers (2024-08-03T18:34:23Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$ requires.
We also prove that there is no natural first-order method with $mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - On Partially Unitary Learning [0.0]
An optimal mapping between Hilbert spaces $IN$ of $left|psirightrangle$ and $OUT$ of $left|phirightrangle$ is presented.
An algorithm finding the global maximum of this optimization problem is developed and it's application to a number of problems is demonstrated.
arXiv Detail & Related papers (2024-05-16T17:13:55Z) - On the Complexity of Finite-Sum Smooth Optimization under the
Polyak-{\L}ojasiewicz Condition [14.781921087738967]
This paper considers the optimization problem of the form $min_bf xinmathbb Rd f(bf x)triangleq frac1nsum_i=1n f_i(bf x)$, where $f(cdot)$ satisfies the Polyak--Lojasiewicz (PL) condition with parameter $mu$ and $f_i(cdot)_i=1n$ is $L$-mean-squared smooth.
arXiv Detail & Related papers (2024-02-04T17:14:53Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $tildeTheta(sd2 + sd/epsilon)$ bits.
For $p in (1,2)$,we obtain an $tildeO(sd2/epsilon + sd/mathrmpoly(epsilon)$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - The First Optimal Acceleration of High-Order Methods in Smooth Convex
Optimization [88.91190483500932]
We study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems.
The reason for this is that these algorithms require performing a complex binary procedure, which makes them neither optimal nor practical.
We fix this fundamental issue by providing the first algorithm with $mathcalOleft(epsilon-2/(p+1)right) $pth order oracle complexity.
arXiv Detail & Related papers (2022-05-19T16:04:40Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $tildeO(k/sqrtepsilon)$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave
Minimax Problem [16.689304539024036]
In this paper, we consider non-asymotic behavior of finding second-order stationary point for minimax problem.
For high-dimensional problems, we propose anf to expensive cost form second-order oracle which solves the cubic sub-problem in gradient via descent and Chebyshev expansion.
arXiv Detail & Related papers (2021-10-10T14:54:23Z) - Revisiting EXTRA for Smooth Distributed Optimization [70.65867695317633]
We give a sharp complexity analysis for EXTRA with the improved $Oleft(left(fracLmu+frac11-sigma_2(W)right)logfrac1epsilon (1-sigma_2(W))right)$.
Our communication complexities of the accelerated EXTRA are only worse by the factors of $left(logfracLmu (1-sigma_2(W))right)$ and $left(logfrac1epsilon (1-
arXiv Detail & Related papers (2020-02-24T08:07:08Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.