Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic
Update and Transient Iteration Complexity
- URL: http://arxiv.org/abs/2402.03167v2
- Date: Mon, 26 Feb 2024 11:30:42 GMT
- Title: Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic
Update and Transient Iteration Complexity
- Authors: Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan
- Abstract summary: We introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient complexity.
D-SOBA achieves the state-of-the-art rate, gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions.
- Score: 38.54552875789929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic bilevel optimization (SBO) is becoming increasingly essential in
machine learning due to its versatility in handling nested structures. To
address large-scale SBO, decentralized approaches have emerged as effective
paradigms in which nodes communicate with immediate neighbors without a central
server, thereby improving communication efficiency and enhancing algorithmic
robustness. However, current decentralized SBO algorithms face challenges,
including expensive inner-loop updates and unclear understanding of the
influence of network topology, data heterogeneity, and the nested bilevel
algorithmic structures. In this paper, we introduce a single-loop decentralized
SBO (D-SOBA) algorithm and establish its transient iteration complexity, which,
for the first time, clarifies the joint influence of network topology and data
heterogeneity on decentralized bilevel algorithms. D-SOBA achieves the
state-of-the-art asymptotic rate, asymptotic gradient/Hessian complexity, and
transient iteration complexity under more relaxed assumptions compared to
existing methods. Numerical experiments validate our theoretical findings.
Related papers
- Lower Bounds and Optimal Algorithms for Non-Smooth Convex Decentralized Optimization over Time-Varying Networks [57.24087627267086]
We consider the task of minimizing the sum of convex functions stored in a decentralized manner across the nodes of a communication network.
Lower bounds on the number of decentralized communications and (sub)gradient computations required to solve the problem have been established.
We develop the first optimal algorithm that matches these lower bounds and offers substantially improved theoretical performance compared to the existing state of the art.
arXiv Detail & Related papers (2024-05-28T10:28:45Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized gradient descent (D-SGDA) algorithm.
Our results analyze the impact of different top factors on the generalization of D-SGDA.
We also balance it with the generalization to obtain the optimal convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate [26.676582181833584]
Decentralized multi-level optimization is challenging because of the multilevel structure and decentralized communication.
We develop two novel decentralized optimization algorithms to optimize the multi-level compositional problem.
arXiv Detail & Related papers (2023-06-06T00:23:28Z) - Communication-Efficient Federated Bilevel Optimization with Local and
Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this type of problems.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z) - DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization [27.317118892531827]
We develop a new decentralized bilevel optimization called DIAMOND (decentralized single-timescale approximation with momentum and gradient-tracking)
We show that the DIAMOND enjoys $mathcalO(epsilon-3/2)$ in sample and communication complexities for achieving an $epsilon$-stationary solution.
arXiv Detail & Related papers (2022-12-05T15:58:00Z) - A Penalty-Based Method for Communication-Efficient Decentralized Bilevel
Programming [14.35928967799696]
Bilevel programming has recently received attention in the literature, due to its wide range of applications.
The underlying bilevel optimization problem is solved either by a single machine or in the case of multiple machines connected in a star-shaped network.
This paper introduces a penalty function based decentralized algorithm with theoretical guarantees for this class of optimization problems.
arXiv Detail & Related papers (2022-11-08T08:39:30Z) - Optimal Complexity in Non-Convex Decentralized Learning over
Time-Varying Networks [8.860889476382594]
Decentralized optimization with time-varying networks is an emerging paradigm in machine learning.
It saves remarkable communication overhead in large-scale deep training and is more robust in wireless scenarios especially when nodes are moving.
arXiv Detail & Related papers (2022-11-01T15:37:54Z) - DESTRESS: Computation-Optimal and Communication-Efficient Decentralized
Nonconvex Finite-Sum Optimization [43.31016937305845]
Internet-of-things, networked sensing, autonomous systems and federated learning call for decentralized algorithms for finite-sum optimizations.
We develop DEcentralized STochastic REcurSive methodDESTRESS for non finite-sum optimization.
Detailed theoretical and numerical comparisons show that DESTRESS improves upon prior decentralized algorithms.
arXiv Detail & Related papers (2021-10-04T03:17:41Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.