On the Condition Number Dependency in Bilevel Optimization
- URL: http://arxiv.org/abs/2511.22331v1
- Date: Thu, 27 Nov 2025 11:03:24 GMT
- Title: On the Condition Number Dependency in Bilevel Optimization
- Authors: Lesi Chen, Jingzhao Zhang
- Abstract summary: Bilevel optimization minimizes an objective function defined by an upper-level problem whose feasible region is the solution of a lower-level problem. For second-order and arbitrarily smooth problems, we show $Ω(κ_y^{13/4} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively.
- Score: 23.985835962136793
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $ε$-stationary point with first-order methods when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(κ^4 ε^{-2})$ upper bound that is near-optimal in $ε$. However, the optimal dependency on the condition number $κ$ is unknown. In this work, we establish a new $Ω(κ^2 ε^{-2})$ lower bound and a $\tilde{\mathcal{O}}(κ^{7/2} ε^{-2})$ upper bound for this problem, giving the first provable gap between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show $Ω(κ_y^{13/4} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $Ω(κ/\sqrt{ε})$ to $Ω(κ^{5/4}/\sqrt{ε})$. (3) For smooth stochastic problems, we show an $Ω(κ^4 ε^{-4})$ lower bound.
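For readers skimming the listing, the nonconvex-strongly-convex setting above can be written compactly as follows. The notation here is generic rather than taken from the paper: $f$ is the upper-level objective, $g(x,\cdot)$ is $\mu$-strongly convex and $L$-smooth in $y$, and $κ = L/\mu$ is the condition number.

```latex
\min_{x \in \mathbb{R}^{d_x}} \; \varphi(x) \;:=\; f\bigl(x,\, y^{*}(x)\bigr)
\qquad \text{where} \qquad
y^{*}(x) \;=\; \operatorname*{arg\,min}_{y \in \mathbb{R}^{d_y}} \; g(x, y).
```

An $ε$-stationary point is then an $x$ with $\|\nabla \varphi(x)\| \le ε$; the bounds above count first-order oracle calls to $f$ and $g$ needed to find one.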
Related papers
- Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds [27.851263935083736]
We investigate distributed online convex optimization with compressed communication. We propose a novel algorithm that achieves improved regret bounds of $\tilde{O}(n\sqrt{T})$ and $\tilde{O}(n\ln T)$ (up to compression-dependent factors) for convex and strongly convex functions, respectively.
arXiv Detail & Related papers (2026-01-08T13:05:36Z) - Stochastic Bilevel Optimization with Heavy-Tailed Noise [27.792016944321627]
This paper considers smooth stochastic bilevel optimization in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex, under heavy-tailed gradient noise.
arXiv Detail & Related papers (2025-09-18T13:37:40Z) - Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization [27.377966916440432]
We characterize the complexity of finding an $ε$-stationary point for bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the fully first-order method F$^2$SA, achieving the $\tilde{\mathcal{O}}(ε^{-4})$ upper bound for first-order smooth problems.
arXiv Detail & Related papers (2025-09-03T02:02:52Z) - Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness [57.93371273485736]
We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $\|\nabla^2 f(x)\| \le \ell\left(\|\nabla f(x)\|\right)$, which generalizes $L$-smoothness and $(L_0,L_1)$-smoothness.
arXiv Detail & Related papers (2025-08-09T08:28:06Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems. Our contribution is to design TTGDA algorithms that are effective beyond this setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Optimal and Efficient Algorithms for Decentralized Online Convex Optimization [51.00357162913229]
Decentralized online convex optimization (D-OCO) is designed to minimize a sequence of global loss functions using only local computations and communications. We develop a novel D-OCO algorithm that can reduce the regret bounds for convex and strongly convex functions to $\tilde{O}(n\rho^{-1/4}\sqrt{T})$ and $\tilde{O}(n\rho^{-1/2}\log T)$, respectively. Our analysis reveals that the projection-free variant can achieve $O(nT^{3/4})$ and $O(n$…
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - Projection-Free Methods for Stochastic Simple Bilevel Optimization with
Convex Lower-level Problem [16.9187409976238]
We study a class of convex bilevel optimization problems, also known as simple bilevel optimization.
We introduce novel bilevel optimization methods that approximate the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-08-15T02:37:11Z) - Perseus: A Simple and Optimal High-Order Method for Variational
Inequalities [81.32967242727152]
A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star\rangle \geq 0$ for all $x \in \mathcal{X}$.
We propose a $p$th-order method that does not require any line-search procedure and provably converges to a weak solution at a rate of $O(ε^{-2/(p+1)})$.
arXiv Detail & Related papers (2022-05-06T13:29:14Z) - Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave
Saddle-Point Problems with Bilinear Coupling [84.47780064014262]
We study a convex-concave saddle-point problem with bilinear coupling: $\min_x \max_y\, f(x) + y^\top \mathbf{A} x - g(y)$.
arXiv Detail & Related papers (2021-12-30T20:31:46Z) - Nonconvex-Nonconcave Min-Max Optimization with a Small Maximization
Domain [11.562923882714093]
We study the problem of finding approximate first-order stationary points in optimization problems of the form $\min_{x \in X} \max_{y \in Y} f(x,y)$.
Our approach relies upon replacing the function $f(x,\cdot)$ with its $k$th-order Taylor approximation (in $y$) and finding a near-stationary point in $Y$.
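As a sketch of the surrogate just described, in our own notation (assuming $f$ is $k$-times differentiable in $y$), the $k$th-order Taylor approximation around a point $\bar{y}$ is:

```latex
f_{k}(x, y; \bar{y}) \;=\; \sum_{j=0}^{k} \frac{1}{j!}\,
\nabla_{y}^{\,j} f(x, \bar{y})\,
\bigl[\underbrace{y-\bar{y},\,\ldots,\,y-\bar{y}}_{j \text{ times}}\bigr],
```

so each inner maximization becomes a degree-$k$ polynomial in $y$ over $Y$, which is the step that exploits the small maximization domain.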
arXiv Detail & Related papers (2021-10-08T07:46:18Z) - Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max
Optimization [31.0295459253155]
We provide a first-order oracle lower bound for finding stationary points of min-max optimization problems.
Our analysis shows that the upper bound is optimal in the $ε$ dependence, up to the dependence on $κ$. It suggests that there is a significant gap in the condition-number dependence between the $\mathcal{O}(κ^3 ε^{-4})$ upper bound in (Lin et al., 2020a) and our lower bound.
arXiv Detail & Related papers (2021-04-18T04:30:01Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm, the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm, for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Projection Efficient Subgradient Method and Optimal Nonsmooth
Frank-Wolfe Method [54.93433440034386]
We find a feasible $ε$-suboptimal solution using only $O(ε^{-1})$ PO calls and optimal $O(ε^{-2})$ FO calls.
Our experiments confirm that these methods achieve significant speedups over the state-of-the-art, for a problem with costly PO and LMO calls.
arXiv Detail & Related papers (2020-10-05T08:16:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.