Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
- URL: http://arxiv.org/abs/2509.02937v1
- Date: Wed, 03 Sep 2025 02:02:52 GMT
- Title: Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
- Authors: Lesi Chen, Junru Li, Jingzhao Zhang
- Abstract summary: We study the complexity of finding an $\epsilon$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method, F${}^2$SA, achieving the $\tilde{\mathcal{O}}(\epsilon^{-6})$ upper complexity bound for first-order smooth problems.
- Score: 27.377966916440432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the complexity of finding an $\epsilon$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method, F${}^2$SA, achieving the $\tilde{\mathcal{O}}(\epsilon^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $\Omega(\epsilon^{-4})$ complexity lower bound in its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F${}^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods F${}^2$SA-$p$ that uses $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p \epsilon^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $\Omega(\epsilon^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region $p = \Omega( \log \epsilon^{-1} / \log \log \epsilon^{-1})$.
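The forward-difference reformulation in the abstract can be made concrete. Below is a minimal sketch, not the authors' implementation: the toy quadratic instance, step sizes, and all function names are illustrative assumptions. It estimates the hyper-gradient of $\phi(x) = f(x, y^*(x))$ by differencing the Danskin gradients of the perturbed value function $v(t) = \min_y [\, g(x,y) + t f(x,y) \,]$ at $t = \delta$ and $t = 0$, one standard way to realize the forward-difference view described above.

```python
import numpy as np

# Minimal sketch of the forward-difference view of hyper-gradient
# approximation described in the abstract. This is NOT the authors'
# implementation: the toy instance and names below are assumptions.
#
# Idea: for phi(x) = f(x, y*(x)) with y*(x) = argmin_y g(x, y), define
# v(t) = min_y [ g(x, y) + t * f(x, y) ]. Danskin's theorem gives
# grad_x v(t) = grad_x g(x, y_t) + t * grad_x f(x, y_t), and the forward
# difference (grad_x v(delta) - grad_x v(0)) / delta recovers the
# hyper-gradient as delta -> 0.

rng = np.random.default_rng(0)
d = 5
B = rng.standard_normal((d, d))
a = rng.standard_normal(d)

# Upper level: f(x, y) = 0.5||x||^2 + a^T y.
# Lower level: g(x, y) = 0.5||y - Bx||^2, strongly convex in y.
grad_f_x = lambda x, y: x
grad_f_y = lambda x, y: a
grad_g_y = lambda x, y: y - B @ x
grad_g_x = lambda x, y: -B.T @ (y - B @ x)

def argmin_y(x, t, steps=500, lr=0.2):
    """Minimize g(x, .) + t * f(x, .) over y by plain gradient descent."""
    y = np.zeros(d)
    for _ in range(steps):
        y -= lr * (grad_g_y(x, y) + t * grad_f_y(x, y))
    return y

def hypergrad_forward_difference(x, delta=1e-3):
    """First-order (forward) finite-difference hyper-gradient estimate."""
    y0 = argmin_y(x, 0.0)       # unperturbed lower-level solution
    yd = argmin_y(x, delta)     # perturbed lower-level solution
    return (grad_g_x(x, yd) - grad_g_x(x, y0)) / delta + grad_f_x(x, yd)

x = rng.standard_normal(d)
exact = x + B.T @ a             # closed-form hyper-gradient of this toy problem
err = np.linalg.norm(hypergrad_forward_difference(x) - exact)
print(f"hyper-gradient error: {err:.2e}")   # small FD + inner-solver error
```

Per the abstract, F${}^2$SA-$p$ replaces this one-sided difference with a $p$th-order finite-difference stencil in $t$ (several perturbed lower-level solves), which is what buys the faster rate under higher-order smoothness.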
Related papers
- On the Condition Number Dependency in Bilevel Optimization [23.985835962136793]
Bilevel optimization minimizes an objective function defined by an upper-level problem whose feasible region is the solution set of a lower-level problem. For second-order and hyper-smooth problems, we show improved complexity bounds, with condition-number dependence on the order of $\kappa_y^{13/4}$ in the second-order case.
arXiv Detail & Related papers (2025-11-27T11:03:24Z) - Stochastic Bilevel Optimization with Heavy-Tailed Noise [27.792016944321627]
This paper considers smooth stochastic bilevel optimization in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex, under heavy-tailed noise.
arXiv Detail & Related papers (2025-09-18T13:37:40Z) - Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and $(L_0, L_1)$-Smoothness [57.93371273485736]
We study first-order methods for convex optimization problems with functions $f$ satisfying the recently proposed $\ell$-smoothness condition $\|\nabla^2 f(x)\| \le \ell\left(\|\nabla f(x)\|\right)$, which generalizes $L$-smoothness and $(L_0,L_1)$-smoothness.
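For reference, the familiar smoothness classes are the special cases below (a standard observation, not taken from this abstract):

```latex
\|\nabla^2 f(x)\| \le \ell\big(\|\nabla f(x)\|\big),\qquad
\ell(t) \equiv L \ \Rightarrow\ L\text{-smoothness},\qquad
\ell(t) = L_0 + L_1 t \ \Rightarrow\ (L_0, L_1)\text{-smoothness}.
```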
arXiv Detail & Related papers (2025-08-09T08:28:06Z) - On the Complexity of First-Order Methods in Stochastic Bilevel
Optimization [9.649991673557167]
We consider the problem of finding stationary points in stochastic bilevel optimization when the lower-level problem is unconstrained and strongly convex.
Existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them.
We propose a simple first-order method that converges to an $\epsilon$-stationary point using $O(\epsilon^{-6})$ and $O(\epsilon^{-4})$ accesses to first-order $y^*$-aware oracles.
arXiv Detail & Related papers (2024-02-11T04:26:35Z) - Achieving ${O}(ε^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization [19.273672650548722]
We show how to achieve an $O(\epsilon^{-1.5})$ sample complexity for finding an $\epsilon$-stationary point in nonconvex-strongly-convex stochastic bilevel optimization. As far as we know, this is the first Hessian/Jacobian-free method with an $O(\epsilon^{-1.5})$ sample complexity for this setting.
arXiv Detail & Related papers (2023-12-06T16:34:58Z) - Projection-Free Methods for Stochastic Simple Bilevel Optimization with
Convex Lower-level Problem [16.9187409976238]
We study a class of convex bilevel optimization problems, also known as simple bilevel optimization.
We introduce novel bilevel optimization methods that approximate the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-08-15T02:37:11Z) - Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general nonconvex-strongly-convex bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in nonconvex-strongly-convex problems.
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - Perseus: A Simple and Optimal High-Order Method for Variational
Inequalities [81.32967242727152]
A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star \rangle \geq 0$ for all $x \in \mathcal{X}$.
We propose a $p$th-order method that does not require any line-search procedure and provably converges to a weak solution at a rate of $O(\epsilon^{-2/(p+1)})$.
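Since the stated rate is to a weak solution, it may help to recall the standard $\epsilon$-approximate weak (Minty) formulation (our phrasing of the usual convention, not taken from this abstract):

```latex
\text{find } \hat{x} \in \mathcal{X} \text{ such that }\;
\langle F(x),\, \hat{x} - x \rangle \le \epsilon
\quad \text{for all } x \in \mathcal{X}.
```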
arXiv Detail & Related papers (2022-05-06T13:29:14Z) - Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization [47.27237492375659]
We study the bilinearly coupled minimax problem: $\min_x \max_y f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega\left(\left(\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}}\right) \log\left(\frac{1}{\varepsilon}\right)\right)$.
arXiv Detail & Related papers (2022-01-19T05:56:19Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y \rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
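For context, the base (non-adaptive) Condat-Vu iteration is compact enough to sketch. Below is a minimal illustration on an assumed toy instance ($f(x) = \tfrac{1}{2}\|x - b\|^2$ and $g = \|\cdot\|_1$, so $\mathrm{prox}_{\sigma g^*}$ is a clip onto the unit $\ell_\infty$-ball); the paper's contribution is choosing the step sizes adaptively, which this sketch does not do.

```python
import numpy as np

# Minimal sketch of the base (non-adaptive) Condat-Vu primal-dual iteration
# for min_x max_y f(x) + <Ax, y> - g*(y). The paper's adaptive step-size
# rule is NOT implemented; the toy instance and constants are assumptions.
# Toy: f(x) = 0.5||x - b||^2 (so L_f = 1) and g = ||.||_1, whose conjugate
# g* is the indicator of the unit infinity-ball, so prox_{sigma g*} = clip.

rng = np.random.default_rng(1)
m, n = 20, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)

grad_f = lambda x: x - b
prox_gstar = lambda v: np.clip(v, -1.0, 1.0)

sigma = 1.0 / np.linalg.norm(A, 2) ** 2
tau = 0.4                       # satisfies 1/tau - sigma * ||A||^2 >= L_f

x, y = np.zeros(n), np.zeros(m)
for _ in range(2000):
    x_next = x - tau * (grad_f(x) + A.T @ y)            # primal gradient step
    y = prox_gstar(y + sigma * (A @ (2 * x_next - x)))  # dual prox step with extrapolation
    x = x_next

# x now approximately minimizes 0.5||x - b||^2 + ||Ax||_1
print("objective:", 0.5 * np.sum((x - b) ** 2) + np.sum(np.abs(A @ x)))
```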
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling stochastic bilevel optimization problems.
MSTSA allows us to control the error in its iterates caused by inexact solutions to the lower-level subproblem.
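The damping mechanism can be illustrated generically. The following is a simplified moving-average gradient recursion in the spirit of momentum-assisted single-timescale methods, not the exact MSTSA update; the toy objective, noise model, and constants are assumptions.

```python
import numpy as np

# Simplified moving-average (momentum) gradient recursion in the spirit
# of momentum-assisted single-timescale methods. NOT the exact MSTSA
# update: the toy objective, noise model, and constants are assumptions.

rng = np.random.default_rng(2)
d = 10
x = rng.standard_normal(d)
h = np.zeros(d)                          # momentum-averaged gradient estimate

true_grad = lambda x: x                  # toy objective: 0.5 * ||x||^2
noisy_grad = lambda x: true_grad(x) + 0.5 * rng.standard_normal(d)

beta, lr = 0.1, 0.1
for _ in range(2000):
    # Averaging damps gradient noise and, in the bilevel setting, the
    # error from only approximately solving the lower-level subproblem.
    h = (1.0 - beta) * h + beta * noisy_grad(x)
    x = x - lr * h                       # single timescale: one loop, one step size

print("||x|| after training:", np.linalg.norm(x))   # near 0, up to a noise floor
```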
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$), and an algorithm for finding $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)