INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks
- URL: http://arxiv.org/abs/2207.13283v2
- Date: Thu, 28 Jul 2022 14:34:30 GMT
- Title: INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks
- Authors: Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, and Jia Liu
- Abstract summary: Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first to achieve both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
- Score: 24.02913189682224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, decentralized bilevel optimization problems have received
increasing attention in the networking and machine learning communities thanks
to their versatility in modeling decentralized learning problems over
peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent
reinforcement learning, personalized training, and Byzantine-resilient
learning). However, for decentralized bilevel optimization over peer-to-peer
networks with limited computation and communication capabilities, achieving
low sample and communication complexities are two fundamental challenges that
remain under-explored so far. In this paper, we make the first
attempt to investigate the class of decentralized bilevel optimization problems
with nonconvex and strongly-convex structure corresponding to the outer and
inner subproblems, respectively. Our main contributions in this paper are
two-fold: i) We first propose a deterministic algorithm called INTERACT
(inner-gradient-descent-outer-tracked-gradient) that requires the sample
complexity of $\mathcal{O}(n \epsilon^{-1})$ and communication complexity of
$\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where
$n$ and $\epsilon > 0$ are the number of samples at each agent and the desired
stationarity gap, respectively. ii) To relax the need for full gradient
evaluations in each iteration, we propose a stochastic variance-reduced version
of INTERACT (SVR-INTERACT), which improves the sample complexity to
$\mathcal{O}(\sqrt{n} \epsilon^{-1})$ while achieving the same communication
complexity as the deterministic algorithm. To our knowledge, this work is the
first that achieves both low sample and communication complexities for solving
decentralized bilevel optimization problems over networks. Our numerical
experiments also corroborate our theoretical findings.
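To make the algorithmic recipe above concrete, the following is a minimal NumPy sketch of a generic gradient-tracking decentralized bilevel loop on a toy quadratic problem: each agent takes a few gradient-descent steps on its strongly convex inner objective, forms a hypergradient through the implicit function theorem, and combines gossip averaging with a tracked outer gradient. The objectives, mixing matrix, step sizes, and hypergradient construction are illustrative assumptions for exposition only, not the exact INTERACT or SVR-INTERACT updates from the paper.

import numpy as np

def make_agent(dim_x, dim_y, rng):
    # Toy per-agent data: inner objective g_i(x, y) = 0.5*||A y - B x||^2 + 0.5*mu*||y||^2,
    # outer objective f_i(x, y) = 0.5*||y - c||^2. All of these are illustrative placeholders.
    return {"A": rng.standard_normal((dim_y, dim_y)),
            "B": rng.standard_normal((dim_y, dim_x)),
            "c": rng.standard_normal(dim_y)}

def inner_grad(agent, x, y, mu=1.0):
    # Gradient of the strongly convex inner objective with respect to y.
    return agent["A"].T @ (agent["A"] @ y - agent["B"] @ x) + mu * y

def hypergrad(agent, x, y, mu=1.0):
    # Hypergradient d f_i(x, y*(x)) / dx via the implicit function theorem,
    # using the current y as a surrogate for the inner solution y*(x).
    dy_f = y - agent["c"]                                  # partial f / partial y
    H = agent["A"].T @ agent["A"] + mu * np.eye(len(y))    # inner Hessian w.r.t. y
    J = -agent["A"].T @ agent["B"]                         # mixed derivative of inner gradient w.r.t. x
    return -J.T @ np.linalg.solve(H, dy_f)                 # partial f / partial x is zero in this toy problem

def decentralized_bilevel_sketch(W, agents, dim_x, dim_y, T=200, K=5, alpha=0.01, beta=0.05):
    # W is a doubly stochastic mixing matrix encoding the peer-to-peer network.
    n = len(agents)
    x = np.zeros((n, dim_x))
    y = np.zeros((n, dim_y))
    g_old = np.stack([hypergrad(a, x[i], y[i]) for i, a in enumerate(agents)])
    s = g_old.copy()                      # tracked estimate of the global outer gradient
    for _ in range(T):
        for i, a in enumerate(agents):    # a few inner gradient-descent steps per agent
            for _ in range(K):
                y[i] -= beta * inner_grad(a, x[i], y[i])
        x = W @ x - alpha * s             # gossip averaging plus a tracked-gradient step
        g_new = np.stack([hypergrad(a, x[i], y[i]) for i, a in enumerate(agents)])
        s = W @ s + g_new - g_old         # gradient-tracking correction
        g_old = g_new
    return x.mean(axis=0)

# Example usage on a hypothetical 4-agent ring with uniform mixing weights.
rng = np.random.default_rng(0)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
agents = [make_agent(3, 2, rng) for _ in range(4)]
x_hat = decentralized_bilevel_sketch(W, agents, dim_x=3, dim_y=2)

The update s = W @ s + g_new - g_old is the standard gradient-tracking correction and is the generic counterpart of the "outer-tracked-gradient" idea in the algorithm's name; replacing the full hypergradient evaluation with a mini-batch, variance-reduced estimator is the kind of change the paper's SVR-INTERACT variant makes to improve the sample complexity to $\mathcal{O}(\sqrt{n} \epsilon^{-1})$.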
Related papers
- A Communication and Computation Efficient Fully First-order Method for Decentralized Bilevel Optimization [16.020878731214083]
This paper introduces a fully first-order decentralized method for decentralized bilevel optimization, $\text{C}^2$DFB.
$\text{C}^2$DFB is both computation- and communication-efficient.
arXiv Detail & Related papers (2024-10-18T02:00:45Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for finite-sum nonconvex-nonconcave problems.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Communication-Efficient Federated Bilevel Optimization with Local and
Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this class of problems.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z) - DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization [27.317118892531827]
We develop a new decentralized bilevel optimization algorithm called DIAMOND (decentralized single-timescale approximation with momentum and gradient-tracking).
We show that DIAMOND enjoys $\mathcal{O}(\epsilon^{-3/2})$ sample and communication complexities for achieving an $\epsilon$-stationary solution.
arXiv Detail & Related papers (2022-12-05T15:58:00Z) - A Penalty-Based Method for Communication-Efficient Decentralized Bilevel Programming [14.35928967799696]
This paper introduces a penalty function-based decentralized algorithm for solving bilevel programming problems over a decentralized network.
A key feature of the proposed algorithm is the estimation of the hyper-gradient of the penalty function.
Our theoretical framework ensures non-asymptotic convergence to the optimal solution of the original problem under various convexity conditions.
arXiv Detail & Related papers (2022-11-08T08:39:30Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms restrict their single-machine setting so that they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Decentralized Gossip-Based Stochastic Bilevel Optimization over
Communication Networks [42.76623191830371]
We propose a gossip-based distributed bilevel optimization algorithm.
Agents can solve both the networked inner and outer problems on a single timescale.
Our algorithm achieves the state-of-the-art efficiency and test accuracy.
arXiv Detail & Related papers (2022-06-22T06:38:54Z) - Local Stochastic Bilevel Optimization with Momentum-Based Variance
Reduction [104.41634756395545]
We study Federated Bilevel Optimization problems. Specifically, we first propose the FedBiO, a deterministic gradient-based algorithm.
We show FedBiO has complexity of $\mathcal{O}(\epsilon^{-1.5})$.
Our algorithms show superior performances compared to other baselines in numerical experiments.
arXiv Detail & Related papers (2022-05-03T16:40:22Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\widetilde{\mathcal{O}}\big(\sqrt{\tfrac{\beta/\mu}{1-\rho}}\,\log \tfrac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - GT-STORM: Taming Sample, Communication, and Memory Complexities in
Decentralized Non-Convex Learning [11.129095449532283]
Decentralized nonconvex optimization has received increasing attention in machine learning in recent years.
Three fundamental challenges in designing decentralized optimization algorithms are how to reduce their sample, communication, and memory complexities.
arXiv Detail & Related papers (2021-05-04T00:44:48Z) - A Low Complexity Decentralized Neural Net with Centralized Equivalence
using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network across distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)