DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization
- URL: http://arxiv.org/abs/2212.02376v2
- Date: Wed, 7 Dec 2022 16:09:28 GMT
- Title: DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization
- Authors: Peiwen Qiu, Yining Li, Zhuqing Liu, Prashant Khanduri, Jia Liu, Ness
B. Shroff, Elizabeth Serena Bentley, Kurt Turck
- Abstract summary: We develop a new decentralized bilevel optimization algorithm called DIAMOND (decentralized single-timescale stochastic approximation with momentum and gradient-tracking).
We show that DIAMOND enjoys $\mathcal{O}(\epsilon^{-3/2})$ in sample and communication complexities for achieving an $\epsilon$-stationary solution.
- Score: 27.317118892531827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized bilevel optimization has received increasing attention recently
due to its foundational role in many emerging multi-agent learning paradigms
(e.g., multi-agent meta-learning and multi-agent reinforcement learning) over
peer-to-peer edge networks. However, to work with the limited computation and
communication capabilities of edge networks, a major challenge in developing
decentralized bilevel optimization techniques is to lower sample and
communication complexities. This motivates us to develop a new decentralized
bilevel optimization algorithm called DIAMOND (decentralized single-timescale stochastic
approximation with momentum and gradient-tracking). The contributions of this
paper are as follows: i) our DIAMOND algorithm adopts a single-loop structure
rather than following the natural double-loop structure of bilevel
optimization, which offers low computation and implementation complexity; ii)
compared to existing approaches, the DIAMOND algorithm does not require any
full gradient evaluations, which further reduces both sample and computational
complexities; iii) through a careful integration of momentum information and
gradient tracking techniques, we show that the DIAMOND algorithm enjoys
$\mathcal{O}(\epsilon^{-3/2})$ in sample and communication complexities for
achieving an $\epsilon$-stationary solution, both of which are independent of
the dataset sizes and significantly outperform existing works. Extensive
experiments also verify our theoretical findings.
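To make the abstract's ingredients concrete, below is a minimal schematic sketch of a single-loop decentralized bilevel update that combines a momentum-averaged hypergradient estimate with gradient tracking over a mixing matrix. It is not the authors' DIAMOND implementation: the ring network, mixing weights, step sizes, and toy quadratic local objectives are assumptions chosen only so the script runs end to end, and the toy lower-level problem makes the implicit (hyper)gradient trivial, whereas the general case needs second-order information.

```python
import numpy as np

# Schematic sketch only (not the authors' DIAMOND code): single-loop
# decentralized bilevel updates with momentum and gradient tracking on a
# toy quadratic problem. All problem data and step sizes are assumptions.

rng = np.random.default_rng(0)
n_agents, dim = 4, 5

# Doubly stochastic mixing matrix for a ring network (assumption).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] += 0.25
    W[i, (i + 1) % n_agents] += 0.25

# Per-agent targets defining the toy local objectives (assumptions):
#   lower level: g_i(x, y) = 0.5 * ||y - x - B_i||^2  =>  y_i*(x) = x + B_i
#   upper level: f_i(x)    = 0.5 * ||y_i*(x) - A_i||^2
A = rng.normal(size=(n_agents, dim))
B = rng.normal(size=(n_agents, dim))

def lower_grad(x_i, y_i, i):
    # nabla_y g_i(x, y) for the quadratic lower-level problem above
    return y_i - x_i - B[i]

def hyper_grad(y_i, i):
    # Implicit gradient of f_i w.r.t. x; for this toy g_i it reduces to
    # y_i*(x) - A_i, approximated by plugging in the current y_i.
    return y_i - A[i]

x = np.zeros((n_agents, dim))   # upper-level variables, one row per agent
y = np.zeros((n_agents, dim))   # lower-level variables
m = np.array([hyper_grad(y[i], i) for i in range(n_agents)])  # momentum buffers
g = m.copy()                    # gradient trackers

alpha, beta, eta = 0.05, 0.2, 0.9   # step sizes and momentum weight (assumptions)

for t in range(300):
    # Single-loop structure: exactly one lower-level step per iteration.
    y = y - beta * np.array([lower_grad(x[i], y[i], i) for i in range(n_agents)])

    # Momentum-averaged hypergradient estimate at each agent
    # (deterministic in this toy, stochastic in the general setting).
    h = np.array([hyper_grad(y[i], i) for i in range(n_agents)])
    m_new = eta * m + (1.0 - eta) * h

    # Gradient tracking: mix the trackers over the network, then add the
    # local change in the momentum estimate.
    g = W @ g + (m_new - m)
    m = m_new

    # Consensus step on x followed by a descent step along the tracker.
    x = W @ x - alpha * g

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("mean hypergradient norm:",
      np.linalg.norm(np.mean([hyper_grad(y[i], i) for i in range(n_agents)], axis=0)))
```

The single-loop structure shows up as one lower-level step per outer iteration, and the tracker g keeps each agent's descent direction aligned with the network-average momentum estimate; the momentum average here stands in, very loosely, for the variance-reduced estimator that lets DIAMOND avoid full gradient evaluations.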
Related papers
- Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic
Update and Transient Iteration Complexity [38.54552875789929]
We introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity.
D-SOBA achieves the state-of-the-art rate, gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions.
arXiv Detail & Related papers (2024-02-05T16:35:30Z)
- Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate [26.676582181833584]
Decentralized multi-level optimization is challenging because of the multilevel structure and decentralized communication.
We develop two novel decentralized optimization algorithms to optimize the multi-level compositional problem.
arXiv Detail & Related papers (2023-06-06T00:23:28Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this type of problem.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z)
- INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks [24.02913189682224]
Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first to achieve both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
arXiv Detail & Related papers (2022-07-27T04:19:28Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, so they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks [42.76623191830371]
We propose a gossip-based distributed bilevel optimization algorithm.
Networked agents can solve both the inner and outer optimization problems in a single timescale.
Our algorithm achieves the state-of-the-art efficiency and test accuracy; a minimal sketch of the gossip-averaging primitive such methods rely on appears after this list.
arXiv Detail & Related papers (2022-06-22T06:38:54Z)
- Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
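As noted in the gossip-based bilevel entry above, the sketch below illustrates randomized gossip averaging, the basic communication primitive that gossip-based decentralized methods build on. It is an illustration under assumed settings (ring topology, dimensions, iteration budget), not the algorithm of any particular paper in this list.

```python
import numpy as np

# Minimal sketch of randomized gossip averaging (assumed ring topology):
# at each step one random link is activated and its two endpoints average
# their local vectors, driving all agents toward the network-wide mean.

rng = np.random.default_rng(1)
n_agents, dim = 8, 3
edges = [(i, (i + 1) % n_agents) for i in range(n_agents)]  # ring links

x = rng.normal(size=(n_agents, dim))   # each agent's local estimate
target = x.mean(axis=0)                # gossip preserves this global mean

for t in range(2000):
    i, j = edges[rng.integers(len(edges))]   # activate one random link
    avg = 0.5 * (x[i] + x[j])
    x[i] = avg
    x[j] = avg

print("max deviation from the network average:", np.abs(x - target).max())
```

Because every pairwise averaging step preserves the global mean, the agents' local vectors contract toward that mean without any central coordinator; bilevel variants layer the inner and outer gradient updates on top of this primitive.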
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.