On the Condition Number Dependency in Bilevel Optimization
- URL: http://arxiv.org/abs/2511.22331v1
- Date: Thu, 27 Nov 2025 11:03:24 GMT
- Title: On the Condition Number Dependency in Bilevel Optimization
- Authors: Lesi Chen, Jingzhao Zhang
- Abstract summary: Bilevel optimization minimizes an objective function defined by an upper-level problem whose feasible region is the solution of a lower-level problem. For second-order and arbitrarily smooth problems, we show $Ω(κ_y^{13/4} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively.
- Score: 23.985835962136793
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $ε$-stationary point with first-order methods when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(κ^4 ε^{-2})$ upper bound that is near-optimal in $ε$. However, the optimal dependency on the condition number $κ$ is unknown. In this work, we establish a new $Ω(κ^2 ε^{-2})$ lower bound and a $\tilde{\mathcal{O}}(κ^{7/2} ε^{-2})$ upper bound for this problem, giving the first provable gap between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show $Ω(κ_y^{13/4} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $Ω(κ/\sqrt{ε})$ to $Ω(κ^{5/4}/\sqrt{ε})$. (3) For smooth stochastic problems, we show an $Ω(κ^4 ε^{-4})$ lower bound.
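For readers skimming the listing, the nonconvex-strongly-convex setting above can be written compactly as follows. The notation here is generic rather than taken from the paper: $f$ is the upper-level objective, $g(x,\cdot)$ is $\mu$-strongly convex and $L$-smooth in $y$, and $κ = L/\mu$ is the condition number.

```latex
\min_{x \in \mathbb{R}^{d_x}} \; \varphi(x) \;:=\; f\bigl(x,\, y^{*}(x)\bigr)
\qquad \text{where} \qquad
y^{*}(x) \;=\; \operatorname*{arg\,min}_{y \in \mathbb{R}^{d_y}} \; g(x, y).
```

An $ε$-stationary point is then an $x$ with $\|\nabla \varphi(x)\| \le ε$; the bounds above count first-order oracle calls to $f$ and $g$ needed to find one.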
Related papers
- Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds [27.851263935083736]
We investigate distributed online convex optimization with compressed communication. We propose a novel algorithm that achieves improved regret bounds of $\tilde{O}(n\sqrt{T})$ and $\tilde{O}(n\ln T)$ (up to compression-dependent factors) for convex and strongly convex functions, respectively.
arXiv Detail & Related papers (2026-01-08T13:05:36Z) - Stochastic Bilevel Optimization with Heavy-Tailed Noise [27.792016944321627]
This paper considers smooth stochastic bilevel optimization in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex, under heavy-tailed gradient noise.
arXiv Detail & Related papers (2025-09-18T13:37:40Z) - Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization [27.377966916440432]
We characterize the complexity of finding an $ε$-stationary point for bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the fully first-order method F$^2$SA, achieving the $\tilde{\mathcal{O}}(ε^{-4})$ upper bound for first-order smooth problems.
arXiv Detail & Related papers (2025-09-03T02:02:52Z) - Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness [57.93371273485736]
We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $\|\nabla^2 f(x)\| \le \ell\left(\|\nabla f(x)\|\right)$, which generalizes $L$-smoothness and $(L_0,L_1)$-smoothness.
arXiv Detail & Related papers (2025-08-09T08:28:06Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems. Our contribution is to design TTGDA algorithms that are effective beyond this setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Optimal and Efficient Algorithms for Decentralized Online Convex Optimization [51.00357162913229]
Decentralized online convex optimization (D-OCO) is designed to minimize a sequence of global loss functions using only local computations and communications. We develop a novel D-OCO algorithm that can reduce the regret bounds for convex and strongly convex functions to $\tilde{O}(n\rho^{-1/4}\sqrt{T})$ and $\tilde{O}(n\rho^{-1/2}\log T)$, respectively. Our analysis reveals that the projection-free variant can achieve $O(nT^{3/4})$ and $O(n$…
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - Projection-Free Methods for Stochastic Simple Bilevel Optimization with
Convex Lower-level Problem [16.9187409976238]
We study a class of convex bilevel optimization problems, also known as simple bilevel optimization.
We introduce novel bilevel optimization methods that approximate the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-08-15T02:37:11Z) - Perseus: A Simple and Optimal High-Order Method for Variational
Inequalities [81.32967242727152]
A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star\rangle \geq 0$ for all $x \in \mathcal{X}$.
We propose a $p$th-order method that does not require any line-search procedure and provably converges to a weak solution at a rate of $O(ε^{-2/(p+1)})$.
arXiv Detail & Related papers (2022-05-06T13:29:14Z) - Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave
Saddle-Point Problems with Bilinear Coupling [84.47780064014262]
We study a convex-concave saddle-point problem with bilinear coupling: $\min_x \max_y\, f(x) + y^\top \mathbf{A} x - g(y)$.
arXiv Detail & Related papers (2021-12-30T20:31:46Z) - Nonconvex-Nonconcave Min-Max Optimization with a Small Maximization
Domain [11.562923882714093]
We study the problem of finding approximate first-order stationary points in optimization problems of the form $\min_{x \in X} \max_{y \in Y} f(x,y)$.
Our approach relies upon replacing the function $f(x,\cdot)$ with its $k$th-order Taylor approximation (in $y$) and finding a near-stationary point in $Y$.
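As a sketch of the surrogate just described, in our own notation (assuming $f$ is $k$-times differentiable in $y$), the $k$th-order Taylor approximation around a point $\bar{y}$ is:

```latex
f_{k}(x, y; \bar{y}) \;=\; \sum_{j=0}^{k} \frac{1}{j!}\,
\nabla_{y}^{\,j} f(x, \bar{y})\,
\bigl[\underbrace{y-\bar{y},\,\ldots,\,y-\bar{y}}_{j \text{ times}}\bigr],
```

so each inner maximization becomes a degree-$k$ polynomial in $y$ over $Y$, which is the step that exploits the small maximization domain.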
arXiv Detail & Related papers (2021-10-08T07:46:18Z) - Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max
Optimization [31.0295459253155]
We provide a first-order oracle lower bound for finding stationary points of min-max optimization problems.
Our analysis shows that the upper bound is optimal in the $ε$ dependence, up to the dependence on $κ$. It suggests that there is a significant gap in the condition-number dependence between the $\mathcal{O}(κ^3 ε^{-4})$ upper bound in (Lin et al., 2020a) and our lower bound.
arXiv Detail & Related papers (2021-04-18T04:30:01Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm, the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm, for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Projection Efficient Subgradient Method and Optimal Nonsmooth
Frank-Wolfe Method [54.93433440034386]
We find a feasible $ε$-suboptimal solution using only $O(ε^{-1})$ PO calls and optimal $O(ε^{-2})$ FO calls.
Our experiments confirm that these methods achieve significant speedups over the state-of-the-art, for a problem with costly PO and LMO calls.
arXiv Detail & Related papers (2020-10-05T08:16:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.