DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization
- URL: http://arxiv.org/abs/2212.02376v2
- Date: Wed, 7 Dec 2022 16:09:28 GMT
- Title: DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization
- Authors: Peiwen Qiu, Yining Li, Zhuqing Liu, Prashant Khanduri, Jia Liu, Ness
B. Shroff, Elizabeth Serena Bentley, Kurt Turck
- Abstract summary: We develop a new decentralized bilevel optimization algorithm called DIAMOND (decentralized single-timescale stochastic approximation with momentum and gradient-tracking).
We show that DIAMOND enjoys $\mathcal{O}(\epsilon^{-3/2})$ in sample and communication complexities for achieving an $\epsilon$-stationary solution.
- Score: 27.317118892531827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized bilevel optimization has received increasing attention recently
due to its foundational role in many emerging multi-agent learning paradigms
(e.g., multi-agent meta-learning and multi-agent reinforcement learning) over
peer-to-peer edge networks. However, to work with the limited computation and
communication capabilities of edge networks, a major challenge in developing
decentralized bilevel optimization techniques is to lower sample and
communication complexities. This motivates us to develop a new decentralized
bilevel optimization algorithm called DIAMOND (decentralized single-timescale stochastic
approximation with momentum and gradient-tracking). The contributions of this
paper are as follows: i) our DIAMOND algorithm adopts a single-loop structure
rather than following the natural double-loop structure of bilevel
optimization, which offers low computation and implementation complexity; ii)
compared to existing approaches, the DIAMOND algorithm does not require any
full gradient evaluations, which further reduces both sample and computational
complexities; iii) through a careful integration of momentum information and
gradient tracking techniques, we show that the DIAMOND algorithm enjoys
$\mathcal{O}(\epsilon^{-3/2})$ in sample and communication complexities for
achieving an $\epsilon$-stationary solution, both of which are independent of
the dataset sizes and significantly outperform existing works. Extensive
experiments also verify our theoretical findings.
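To make the abstract's ingredients concrete, below is a minimal schematic sketch of a single-loop decentralized bilevel update that combines a momentum-averaged hypergradient estimate with gradient tracking over a mixing matrix. It is not the authors' DIAMOND implementation: the ring network, mixing weights, step sizes, and toy quadratic local objectives are assumptions chosen only so the script runs end to end, and the toy lower-level problem makes the implicit (hyper)gradient trivial, whereas the general case needs second-order information.

```python
import numpy as np

# Schematic sketch only (not the authors' DIAMOND code): single-loop
# decentralized bilevel updates with momentum and gradient tracking on a
# toy quadratic problem. All problem data and step sizes are assumptions.

rng = np.random.default_rng(0)
n_agents, dim = 4, 5

# Doubly stochastic mixing matrix for a ring network (assumption).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] += 0.25
    W[i, (i + 1) % n_agents] += 0.25

# Per-agent targets defining the toy local objectives (assumptions):
#   lower level: g_i(x, y) = 0.5 * ||y - x - B_i||^2  =>  y_i*(x) = x + B_i
#   upper level: f_i(x)    = 0.5 * ||y_i*(x) - A_i||^2
A = rng.normal(size=(n_agents, dim))
B = rng.normal(size=(n_agents, dim))

def lower_grad(x_i, y_i, i):
    # nabla_y g_i(x, y) for the quadratic lower-level problem above
    return y_i - x_i - B[i]

def hyper_grad(y_i, i):
    # Implicit gradient of f_i w.r.t. x; for this toy g_i it reduces to
    # y_i*(x) - A_i, approximated by plugging in the current y_i.
    return y_i - A[i]

x = np.zeros((n_agents, dim))   # upper-level variables, one row per agent
y = np.zeros((n_agents, dim))   # lower-level variables
m = np.array([hyper_grad(y[i], i) for i in range(n_agents)])  # momentum buffers
g = m.copy()                    # gradient trackers

alpha, beta, eta = 0.05, 0.2, 0.9   # step sizes and momentum weight (assumptions)

for t in range(300):
    # Single-loop structure: exactly one lower-level step per iteration.
    y = y - beta * np.array([lower_grad(x[i], y[i], i) for i in range(n_agents)])

    # Momentum-averaged hypergradient estimate at each agent
    # (deterministic in this toy, stochastic in the general setting).
    h = np.array([hyper_grad(y[i], i) for i in range(n_agents)])
    m_new = eta * m + (1.0 - eta) * h

    # Gradient tracking: mix the trackers over the network, then add the
    # local change in the momentum estimate.
    g = W @ g + (m_new - m)
    m = m_new

    # Consensus step on x followed by a descent step along the tracker.
    x = W @ x - alpha * g

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("mean hypergradient norm:",
      np.linalg.norm(np.mean([hyper_grad(y[i], i) for i in range(n_agents)], axis=0)))
```

The single-loop structure shows up as one lower-level step per outer iteration, and the tracker g keeps each agent's descent direction aligned with the network-average momentum estimate; the momentum average here stands in, very loosely, for the variance-reduced estimator that lets DIAMOND avoid full gradient evaluations.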
Related papers
- Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic
Update and Transient Iteration Complexity [38.54552875789929]
We introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity.
D-SOBA achieves the state-of-the-art rate, gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions.
arXiv Detail & Related papers (2024-02-05T16:35:30Z)
- Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate [26.676582181833584]
Decentralized multi-level optimization is challenging because of the multilevel structure and decentralized communication.
We develop two novel decentralized optimization algorithms to optimize the multi-level compositional problem.
arXiv Detail & Related papers (2023-06-06T00:23:28Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this type of problem.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z)
- INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks [24.02913189682224]
Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first to achieve both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
arXiv Detail & Related papers (2022-07-27T04:19:28Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, so they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks [42.76623191830371]
We propose a gossip-based distributed bilevel optimization algorithm.
Networked agents can solve both the inner and outer optimization problems in a single timescale.
Our algorithm achieves the state-of-the-art efficiency and test accuracy; a minimal sketch of the gossip-averaging primitive such methods rely on appears after this list.
arXiv Detail & Related papers (2022-06-22T06:38:54Z)
- Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
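As noted in the gossip-based bilevel entry above, the sketch below illustrates randomized gossip averaging, the basic communication primitive that gossip-based decentralized methods build on. It is an illustration under assumed settings (ring topology, dimensions, iteration budget), not the algorithm of any particular paper in this list.

```python
import numpy as np

# Minimal sketch of randomized gossip averaging (assumed ring topology):
# at each step one random link is activated and its two endpoints average
# their local vectors, driving all agents toward the network-wide mean.

rng = np.random.default_rng(1)
n_agents, dim = 8, 3
edges = [(i, (i + 1) % n_agents) for i in range(n_agents)]  # ring links

x = rng.normal(size=(n_agents, dim))   # each agent's local estimate
target = x.mean(axis=0)                # gossip preserves this global mean

for t in range(2000):
    i, j = edges[rng.integers(len(edges))]   # activate one random link
    avg = 0.5 * (x[i] + x[j])
    x[i] = avg
    x[j] = avg

print("max deviation from the network average:", np.abs(x - target).max())
```

Because every pairwise averaging step preserves the global mean, the agents' local vectors contract toward that mean without any central coordinator; bilevel variants layer the inner and outer gradient updates on top of this primitive.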
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.