Decentralized Gossip-Based Stochastic Bilevel Optimization over
Communication Networks
- URL: http://arxiv.org/abs/2206.10870v1
- Date: Wed, 22 Jun 2022 06:38:54 GMT
- Title: Decentralized Gossip-Based Stochastic Bilevel Optimization over
Communication Networks
- Authors: Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
- Abstract summary: We propose a gossip-based distributed bilevel optimization algorithm.
Networked agents can solve both the inner and outer problems in a single timescale.
Our algorithm achieves state-of-the-art training efficiency and test accuracy.
- Score: 42.76623191830371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization has gained growing interest, with numerous
applications in meta learning, minimax games, reinforcement learning, and
nested composition optimization. This paper studies the problem of distributed
bilevel optimization over a network where agents can only communicate with
neighbors, including examples from multi-task learning, multi-agent learning, and
federated learning. We propose a gossip-based distributed
bilevel learning algorithm that allows networked agents to solve both the inner
and outer optimization problems in a single timescale and share information via
network propagation. We show that our algorithm enjoys an
$\mathcal{O}(\frac{1}{K \epsilon^2})$ per-agent sample complexity for general
nonconvex bilevel optimization and $\mathcal{O}(\frac{1}{K \epsilon})$ for
strongly convex objectives, achieving a speedup that scales linearly with the
network size $K$. The sample complexities are optimal in both $\epsilon$ and $K$.
We test our algorithm on the examples of hyperparameter tuning and
decentralized reinforcement learning. Simulated experiments confirm that our
algorithm achieves state-of-the-art training efficiency and test accuracy.
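The single-timescale gossip update described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration only, assuming per-agent stochastic oracles `grad_outer_estimate` (a hypergradient estimate) and `grad_inner`, and a doubly stochastic mixing matrix `W` encoding the communication network; it is a sketch of the general pattern, not the authors' exact algorithm.

```python
import numpy as np

def gossip_bilevel_step(x, y, W, grad_outer_estimate, grad_inner, alpha, beta):
    """One synchronous round of a gossip-based, single-timescale bilevel update.

    x: (K, d_x) array of outer variables, one row per agent
    y: (K, d_y) array of inner variables, one row per agent
    W: (K, K) doubly stochastic mixing (gossip) matrix of the network
    grad_outer_estimate(k, x_k, y_k): stochastic hypergradient estimate at agent k
    grad_inner(k, x_k, y_k): stochastic gradient of agent k's inner objective
    alpha, beta: outer and inner step sizes
    """
    K = x.shape[0]
    gx = np.stack([grad_outer_estimate(k, x[k], y[k]) for k in range(K)])
    gy = np.stack([grad_inner(k, x[k], y[k]) for k in range(K)])
    # Gossip step: each agent averages neighbors' iterates through W, then
    # takes a local stochastic step; inner and outer variables are updated
    # in the same round, i.e. on a single timescale.
    x_next = W @ x - alpha * gx
    y_next = W @ y - beta * gy
    return x_next, y_next
```

Each round mixes neighbors' copies of both variables via `W` and then takes one inner and one outer stochastic step, which is what "single timescale" refers to.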
Related papers
- Faster Adaptive Decentralized Learning Algorithms [24.379734053137597]
We propose a class of faster adaptive decentralized algorithms (i.e., AdaMDOS and AdaMDOF) for distributed stochastic and finite-sum optimization.
Experimental results demonstrate the efficiency of our algorithms.
arXiv Detail & Related papers (2024-08-19T08:05:33Z) - Federated Multi-Level Optimization over Decentralized Networks [55.776919718214224]
We study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors.
We propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale.
Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications.
arXiv Detail & Related papers (2023-10-10T00:21:10Z) - DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization [27.317118892531827]
We develop a new decentralized bilevel optimization framework called DIAMOND (decentralized single-timescale approximation with momentum and gradient-tracking).
We show that DIAMOND enjoys $\mathcal{O}(\epsilon^{-3/2})$ sample and communication complexities for achieving an $\epsilon$-stationary solution.
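For context, the gradient-tracking mechanism named in DIAMOND's expansion can be sketched generically as below; the mixing matrix `W` and per-agent oracle `grad` are placeholders, and this is not DIAMOND itself (which additionally uses momentum and handles the bilevel structure).

```python
import numpy as np

def gradient_tracking_step(x, g_track, g_prev, W, grad, step):
    """One round of generic decentralized gradient tracking.

    x:       (K, d) agent iterates
    g_track: (K, d) per-agent trackers of the network-average gradient
    g_prev:  (K, d) local gradients evaluated at the previous iterates
    W:       (K, K) doubly stochastic mixing matrix
    grad(k, x_k): local (stochastic) gradient oracle of agent k
    """
    K = x.shape[0]
    # Move along the tracked average-gradient direction after gossip mixing.
    x_next = W @ x - step * g_track
    g_new = np.stack([grad(k, x_next[k]) for k in range(K)])
    # Tracker update: mix neighbors' trackers and add the local gradient
    # increment, so every tracker follows the network-average gradient.
    g_track_next = W @ g_track + g_new - g_prev
    return x_next, g_track_next, g_new
```

Initialization typically sets `g_track` and `g_prev` to the local gradients at the starting iterates.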
arXiv Detail & Related papers (2022-12-05T15:58:00Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Faster Adaptive Momentum-Based Federated Methods for Distributed
Composition Optimization [14.579475552088692]
We propose a class of faster federated composition optimization algorithms (i.e., MFCGD and AdaMFCGD) to solve the nonconvex distributed composition problems.
In particular, our adaptive algorithm (i.e., AdaMFCGD) uses a unified adaptive matrix to flexibly incorporate various adaptive learning rates.
arXiv Detail & Related papers (2022-11-03T15:17:04Z) - Fast Adaptive Federated Bilevel Optimization [14.579475552088692]
We propose a novel adaptive federated bilevel optimization algorithm (i.e., AdaFBiO) to solve distributed bilevel optimization problems.
AdaFBiO uses unified adaptive matrices to flexibly incorporate various adaptive learning rates for updating variables in both the upper-level (UL) and lower-level (LL) problems.
We provide a convergence analysis framework for AdaFBiO and prove that it needs a sample complexity of $\tilde{O}(\epsilon^{-3})$ and a communication complexity of $\tilde{O}(\epsilon^{-2})$ to obtain an $\epsilon$-stationary point.
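The role of a "unified adaptive matrix" can be illustrated with a generic diagonal preconditioner in the AdaGrad/Adam family; the sketch below only shows how such a matrix rescales a gradient step and is not the AdaFBiO update itself.

```python
import numpy as np

def adaptive_step(x, g, v, lr=0.01, beta2=0.99, eps=1e-8):
    """Generic diagonal-adaptive update x <- x - lr * A^{-1} g, where the
    adaptive matrix A = diag(sqrt(v) + eps) accumulates squared gradients."""
    v = beta2 * v + (1.0 - beta2) * g ** 2    # running second-moment estimate
    x = x - lr * g / (np.sqrt(v) + eps)       # precondition the step by A^{-1}
    return x, v
```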
arXiv Detail & Related papers (2022-11-02T13:55:47Z) - INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks [24.02913189682224]
Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first to achieve both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
arXiv Detail & Related papers (2022-07-27T04:19:28Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - A Low Complexity Decentralized Neural Net with Centralized Equivalence
using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network across distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds than naive parallelization while retaining theoretical convergence guarantees.
Experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.