On the Convergence of Single-Loop Stochastic Bilevel Optimization with Approximate Implicit Differentiation
- URL: http://arxiv.org/abs/2602.23633v1
- Date: Fri, 27 Feb 2026 03:12:08 GMT
- Title: On the Convergence of Single-Loop Stochastic Bilevel Optimization with Approximate Implicit Differentiation
- Authors: Yubo Zhou, Luo Luo, Guang Dai, Haishan Ye
- Abstract summary: We provide a refined convergence analysis of the Single-loop Stochastic Approximate Implicit Differentiation (SSAID) algorithm. Our result is noteworthy in two aspects: (i) it matches the optimal $\mathcal{O}(\epsilon^{-2})$ rate of state-of-the-art multi-loop methods; and (ii) it provides the first explicit, fine-grained characterization of the $\kappa$-dependence.
- Score: 44.084531611147305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Bilevel Optimization has emerged as a fundamental framework for meta-learning and hyperparameter optimization. Despite the practical prevalence of single-loop algorithms--which update lower and upper variables concurrently--their theoretical understanding, particularly in the stochastic regime, remains significantly underdeveloped compared to their multi-loop counterparts. Existing analyses often yield suboptimal convergence rates or obscure the critical dependence on the lower-level condition number $\kappa$, frequently burying it within generic Lipschitz constants. In this paper, we bridge this gap by providing a refined convergence analysis of the Single-loop Stochastic Approximate Implicit Differentiation (SSAID) algorithm. We prove that SSAID achieves an $\epsilon$-stationary point with an oracle complexity of $\mathcal{O}(\kappa^7 \epsilon^{-2})$. Our result is noteworthy in two aspects: (i) it matches the optimal $\mathcal{O}(\epsilon^{-2})$ rate of state-of-the-art multi-loop methods (e.g., stocBiO) while maintaining the computational efficiency of a single-loop update; and (ii) it provides the first explicit, fine-grained characterization of the $\kappa$-dependence for stochastic AID-based single-loop methods. This work demonstrates that SSAID is not merely a heuristic approach, but admits a rigorous theoretical foundation with convergence guarantees competitive with mainstream multi-loop frameworks.
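To make the single-loop structure concrete, below is a minimal sketch of a generic single-loop AID-style update on a toy quadratic bilevel problem: the lower-level variable, the linear-system (implicit-differentiation) variable, and the upper-level variable each take one stochastic step per iteration. The toy objectives, noise model, and step sizes are illustrative assumptions, not the exact SSAID algorithm or its hyperparameters.

```python
# Minimal sketch of a single-loop AID-style stochastic bilevel update on a toy
# quadratic problem. The objectives, noise model, and step sizes below are
# illustrative assumptions, not the exact SSAID algorithm or its parameters.
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y = 5, 5
A = rng.standard_normal((d_y, d_x))
b = rng.standard_normal(d_y)
lam = 1.0  # makes the lower-level objective strongly convex in y

# Upper level:  f(x, y) = 0.5 * ||y - b||^2
# Lower level:  g(x, y) = 0.5 * ||y - A x||^2 + 0.5 * lam * ||y||^2
def grad_y_g(x, y, noise=0.1):
    return (y - A @ x) + lam * y + noise * rng.standard_normal(d_y)

def hess_yy_g():
    return (1.0 + lam) * np.eye(d_y)      # constant Hessian for this toy problem

def jac_xy_g():
    return -A                              # d/dx of grad_y g(x, y)

def grad_y_f(x, y, noise=0.1):
    return (y - b) + noise * rng.standard_normal(d_y)

def grad_x_f(x, y):
    return np.zeros(d_x)                   # f has no direct dependence on x here

alpha, beta, gamma = 0.05, 0.2, 0.2        # upper-, lower-, and linear-system step sizes
x, y, v = np.zeros(d_x), np.zeros(d_y), np.zeros(d_y)

for k in range(2000):
    # one stochastic step on the lower-level variable
    y = y - beta * grad_y_g(x, y)
    # one stochastic step on the linear system  hess_yy_g @ v = grad_y_f  (approximate implicit differentiation)
    v = v - gamma * (hess_yy_g() @ v - grad_y_f(x, y))
    # AID hypergradient estimate and one upper-level step, all in the same loop iteration
    h = grad_x_f(x, y) - jac_xy_g().T @ v
    x = x - alpha * h

# report the exact hypergradient norm as a rough stationarity measure
y_star = A @ x / (1.0 + lam)
print(np.linalg.norm(A.T @ (y_star - b) / (1.0 + lam)))
```

A multi-loop method would instead run the y and v updates to (near) convergence before every x step; the single-loop scheme interleaves all three updates, which is exactly the regime where the refined $\kappa$-dependence matters.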
Related papers
- Active Bipartite Ranking with Smooth Posterior Distributions [1.9838140219494644]
Bipartite ranking is a statistical learning problem involved in many applications and widely studied in the passive context. We propose a novel algorithm, referred to as smooth-rank and designed for the continuous setting, which aims to minimise the distance between the ROC curve of the estimated ranking rule and the optimal one w.r.t. the $\sup$ norm. We provide a problem-dependent upper bound on the expected sampling time of smooth-rank and establish a problem-dependent lower bound on the expected sampling time of any PAC$(,)$ algorithm.
arXiv Detail & Related papers (2026-02-27T18:32:08Z) - From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under Similarity [12.097833603814252]
We show that Byzantine-robust distributed optimization can be cast as a general optimization problem with inexact gradient oracles. We propose two optimization schemes to speed up the convergence.
arXiv Detail & Related papers (2026-02-03T09:56:23Z) - Scalable Second-order Riemannian Optimization for $K$-means Clustering [22.839486642997187]
We present a new formulation of the $K$-means clustering problem as optimization over a submanifold. By factorizing the $K$-means manifold into a product manifold, we show how each Newton subproblem can be solved. Our experiments show that the proposed method converges significantly faster than the state-of-the-art first-order non-negative low-rank factorization method.
arXiv Detail & Related papers (2025-09-25T22:49:00Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we address both problems simultaneously by revisiting a simple variant of the STORM algorithm (a minimal STORM-style sketch appears after this list).
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained Markov decision processes (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of the area under the precision-recall curve (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasingly large momentum parameter for the first-order moment is sufficient for the convergence of adaptive scaling methods. We also give insights into increasing the momentum in a stagewise manner, in accordance with a stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
Trap avoidance is studied by viewing the procedure as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a powerful tool for many machine learning problems.
We propose a novel sample-efficient hypergradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
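As referenced in the entry on near-optimal non-convex stochastic optimization above, the following is a minimal sketch of a STORM-style recursive momentum estimator that uses a single sample per iteration, avoiding $\mathrm{poly}(\epsilon^{-1})$-sized batches. The toy objective, noise model, and constant momentum and step size are illustrative assumptions, not the specific variant analyzed in that paper.

```python
# Minimal sketch of a STORM-style recursive momentum estimator using one sample
# per iteration. The toy objective, noise model, and constant parameters are
# illustrative assumptions, not the specific variant analyzed in the paper above.
import numpy as np

rng = np.random.default_rng(1)
d = 10
target = rng.standard_normal(d)

def stoch_grad(x, xi):
    # gradient of 0.5 * ||x - target||^2 under one noisy "data point" xi
    return (x - target) + xi

eta, a = 0.05, 0.1                                          # step size and momentum parameter
x_prev = np.zeros(d)
d_est = stoch_grad(x_prev, 0.5 * rng.standard_normal(d))    # initial estimate from a single sample
x = x_prev - eta * d_est

for t in range(1000):
    xi = 0.5 * rng.standard_normal(d)    # one fresh sample, evaluated at both iterates
    d_est = stoch_grad(x, xi) + (1.0 - a) * (d_est - stoch_grad(x_prev, xi))
    x_prev, x = x, x - eta * d_est

print(np.linalg.norm(x - target))        # rough check: distance to the minimizer
```

The key point is the correction term (d_est - stoch_grad(x_prev, xi)) evaluated with the same sample xi as the current gradient, which keeps the estimator variance controlled without large batches.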
This list is automatically generated from the titles and abstracts of the papers on this site.