Problem-Parameter-Free Decentralized Bilevel Optimization
- URL: http://arxiv.org/abs/2510.24288v1
- Date: Tue, 28 Oct 2025 10:50:04 GMT
- Title: Problem-Parameter-Free Decentralized Bilevel Optimization
- Authors: Zhiwei Zhai, Wenjing Yan, Ying-Jun Angela Zhang
- Abstract summary: Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the performance of well-tuned state-of-the-art methods up to polylogarithmic factors.
- Score: 31.15538292038612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior knowledge of problem parameters-such as smoothness, convexity, or communication network topologies-to determine appropriate stepsizes. In practice, these problem parameters are typically unavailable, leading to substantial manual effort for hyperparameter tuning. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. AdaSDBO leverages adaptive stepsizes based on cumulative gradient norms to update all variables simultaneously, dynamically adjusting its progress and eliminating the need for problem-specific hyperparameter tuning. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the performance of well-tuned state-of-the-art methods up to polylogarithmic factors. Extensive numerical experiments demonstrate that AdaSDBO delivers competitive performance compared to existing decentralized bilevel optimization methods while exhibiting remarkable robustness across diverse stepsize configurations.
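The mechanism described above — stepsizes driven by cumulative gradient norms, with all variables updated simultaneously over a communication graph — can be sketched roughly as follows. This is a minimal illustration under assumed names (the mixing matrix `W`, the per-agent gradient oracles, and the AdaGrad-norm-style stepsize rule are placeholders), not the authors' AdaSDBO pseudocode; the actual algorithm also maintains additional auxiliary variables for hypergradient estimation.

```python
# Rough sketch of a single-loop, adaptive-stepsize decentralized bilevel update.
# All names and the stepsize rule are illustrative assumptions, not AdaSDBO itself.
import numpy as np

def adaptive_step(eta0, cum_sq):
    # stepsize shrinks with accumulated squared gradient norms,
    # so no smoothness or network constants are needed up front
    return eta0 / np.sqrt(1.0 + cum_sq)

def run(W, x, y, grad_f_x, grad_g_y, T=1000, eta0=1.0):
    n = W.shape[0]                      # number of agents in the network
    cum_x = cum_y = 0.0
    for _ in range(T):
        gx = np.stack([grad_f_x(x[i], y[i], i) for i in range(n)])  # upper-level grads
        gy = np.stack([grad_g_y(x[i], y[i], i) for i in range(n)])  # lower-level grads
        cum_x += np.sum(gx ** 2)
        cum_y += np.sum(gy ** 2)
        # gossip (mixing) step followed by simultaneous adaptive gradient steps
        x = W @ x - adaptive_step(eta0, cum_x) * gx
        y = W @ y - adaptive_step(eta0, cum_y) * gy
    return x, y
```

The design point emphasized by the abstract is that the stepsizes depend only on observed gradients, so no problem parameters (smoothness constants, convexity moduli, or network spectral quantities) enter the configuration.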
Related papers
- Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes [3.246129789918632]
The training of deep neural networks is inherently a nonconvex optimization problem. Standard approaches such as stochastic gradient descent (SGD) require simultaneous updates to all parameters. We propose a novel method, Stochastic Alternating Minimization with Trainable step sizes (SAMT). SAMT achieves better performance with fewer parameter updates compared to state-of-the-art methods.
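As a loose illustration of alternating minimization with per-block step sizes (the two-block least-squares model and the fixed step sizes below are placeholder assumptions, not the SAMT method itself):

```python
# Toy alternating minimization: update one parameter block at a time,
# each block with its own step size. Model and step sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
w, b = np.zeros(5), 0.0
eta_w, eta_b = 0.1, 0.5              # per-block step sizes (trainable in SAMT's spirit)
for _ in range(300):
    r = X @ w + b - y                # residual of a linear model
    w -= eta_w * (X.T @ r) / len(y)  # block 1: weights, with the bias frozen
    r = X @ w + b - y
    b -= eta_b * r.mean()            # block 2: bias, with the weights frozen
```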
arXiv Detail & Related papers (2025-08-06T08:23:38Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
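One common way to write such a smoothed reformulation (the Gibbs-style inner distribution below is an assumption for illustration, not necessarily the paper's exact construction) is

$$
\min_{x}\; \mathbb{E}_{y \sim p_\sigma(\cdot \mid x)}\big[F(x, y)\big],
\qquad
p_\sigma(y \mid x) \propto \exp\!\left(-\frac{g(x, y)}{\sigma}\right),
$$

where $F$ is the outer loss, $g$ the inner loss, and $\sigma > 0$ a smoothing temperature; as $\sigma \to 0$ the inner distribution concentrates on the inner minimizers.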
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis [21.932550214810533]
We propose two novel tuning-free algorithms, D-TFBO and S-TFBO.
D-TFBO employs a double-loop structure with stepsizes adaptively adjusted by the "inverse of cumulative gradient norms" strategy.
S-TFBO features a simpler fully single-loop structure that updates three variables simultaneously with a theory-motivated joint design of adaptive stepsizes for all variables.
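Taken at face value, an "inverse of cumulative gradient norms" stepsize has the generic form (a plausible reading of the strategy, not the paper's exact constants)

$$
\eta_t \;=\; \frac{\eta_0}{\delta + \sum_{s=1}^{t} \lVert g_s \rVert},
$$

where $g_s$ is the gradient observed at iteration $s$ and $\delta > 0$ guards against division by zero; D-TFBO applies such rules inside a double loop, while S-TFBO couples the stepsizes of its three variables within a single loop.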
arXiv Detail & Related papers (2024-10-07T15:50:30Z) - A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix. It consists of three distinct modules, namely, a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module. This work proposes a context-aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
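For context, gradient-based bilevel hyperparameter methods typically build on the implicit-function hypergradient, stated here for a strongly convex lower level (this is the standard identity, not a claim about this paper's specific estimator):

$$
\nabla_\lambda F\big(\lambda, w^*(\lambda)\big)
\;=\; \nabla_\lambda F \;-\; \nabla^2_{\lambda w} g \,\big(\nabla^2_{ww} g\big)^{-1} \nabla_w F ,
$$

where $F$ is the validation (upper-level) loss, $g$ the training (lower-level) loss, and $w^*(\lambda) = \arg\min_w g(\lambda, w)$.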
arXiv Detail & Related papers (2022-08-25T14:25:16Z) - Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems [5.940592509070767]
We develop a sequentially convergent Value Function Based Difference-of-Convex Algorithm with inexactness (VF-iDCA).
Our experiments confirm our theoretical findings and show that the proposed VF-iDCA yields superior performance when applied to tune hyperparameters.
arXiv Detail & Related papers (2022-06-13T08:51:10Z) - Continuation Path with Linear Convergence Rate [18.405645120971496]
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems are solved sequentially.
We present a primal-dual analysis of the path-following algorithm and determine how accurately each subproblem should be solved to guarantee a linear convergence rate on a target problem.
Considering optimization with a sparsity-inducing penalty, we analyze the change of the active sets with respect to the regularization parameter.
arXiv Detail & Related papers (2021-12-09T18:42:13Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new dual convex approach is proposed with a novel integration of three ingredients: a regularized policy update, a dual regularizer, and Nesterov's gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method based on the Bregman distance.
We also propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) using a variance-reduction technique.
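For reference, the Bregman distance induced by a differentiable strictly convex function $\phi$ (the standard definition, included only to unpack the method's name) is

$$
D_\phi(x, y) \;=\; \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle ,
$$

which reduces to $\tfrac{1}{2}\lVert x - y \rVert^2$ when $\phi(x) = \tfrac{1}{2}\lVert x \rVert^2$; methods of this family replace the Euclidean proximal term in the update with such a distance.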
arXiv Detail & Related papers (2021-07-26T16:18:43Z) - Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an $\mathcal{A}$-based constrained optimization problem, where $\mathcal{A}$ is a black-box optimization algorithm.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters.
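A generic averaged zeroth-order gradient estimator (illustrative only; the two-point sphere-sampling scheme, smoothing radius `mu`, and the `validation_loss` objective named below are assumptions rather than HOZOG's exact estimator) looks like:

```python
# Averaged zeroth-order estimate of the gradient of a black-box objective F at lam:
# sample random unit directions, take two-point finite differences, and average.
import numpy as np

def zo_gradient(F, lam, num_dirs=20, mu=1e-3, rng=None):
    rng = rng or np.random.default_rng()
    d = lam.shape[0]
    est = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                              # unit direction
        est += (F(lam + mu * u) - F(lam - mu * u)) / (2 * mu) * u
    return d * est / num_dirs                               # dimension factor for sphere sampling

# hypothetical usage: lam = lam - step * zo_gradient(validation_loss, lam)
```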
arXiv Detail & Related papers (2021-02-17T21:03:05Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel, efficient stochastic gradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for training such ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT)
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)