INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks
- URL: http://arxiv.org/abs/2207.13283v2
- Date: Thu, 28 Jul 2022 14:34:30 GMT
- Title: INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks
- Authors: Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, and Jia Liu
- Abstract summary: Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first to achieve both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
- Score: 24.02913189682224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, decentralized bilevel optimization problems have received
increasing attention in the networking and machine learning communities thanks
to their versatility in modeling decentralized learning problems over
peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent
reinforcement learning, personalized training, and Byzantine-resilient
learning). However, for decentralized bilevel optimization over peer-to-peer
networks with limited computation and communication capabilities, achieving
low sample and communication complexities are two fundamental challenges that
remain under-explored so far. In this paper, we make the first
attempt to investigate the class of decentralized bilevel optimization problems
with nonconvex and strongly-convex structure corresponding to the outer and
inner subproblems, respectively. Our main contributions in this paper are
two-fold: i) We first propose a deterministic algorithm called INTERACT
(inner-gradient-descent-outer-tracked-gradient) that requires the sample
complexity of $\mathcal{O}(n \epsilon^{-1})$ and communication complexity of
$\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where
$n$ and $\epsilon > 0$ are the number of samples at each agent and the desired
stationarity gap, respectively. ii) To relax the need for full gradient
evaluations in each iteration, we propose a stochastic variance-reduced version
of INTERACT (SVR-INTERACT), which improves the sample complexity to
$\mathcal{O}(\sqrt{n} \epsilon^{-1})$ while achieving the same communication
complexity as the deterministic algorithm. To our knowledge, this work is the
first that achieves both low sample and communication complexities for solving
decentralized bilevel optimization problems over networks. Our numerical
experiments also corroborate our theoretical findings.
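To make the algorithmic recipe above concrete, the following is a minimal NumPy sketch of a generic gradient-tracking decentralized bilevel loop on a toy quadratic problem: each agent takes a few gradient-descent steps on its strongly convex inner objective, forms a hypergradient through the implicit function theorem, and combines gossip averaging with a tracked outer gradient. The objectives, mixing matrix, step sizes, and hypergradient construction are illustrative assumptions for exposition only, not the exact INTERACT or SVR-INTERACT updates from the paper.

import numpy as np

def make_agent(dim_x, dim_y, rng):
    # Toy per-agent data: inner objective g_i(x, y) = 0.5*||A y - B x||^2 + 0.5*mu*||y||^2,
    # outer objective f_i(x, y) = 0.5*||y - c||^2. All of these are illustrative placeholders.
    return {"A": rng.standard_normal((dim_y, dim_y)),
            "B": rng.standard_normal((dim_y, dim_x)),
            "c": rng.standard_normal(dim_y)}

def inner_grad(agent, x, y, mu=1.0):
    # Gradient of the strongly convex inner objective with respect to y.
    return agent["A"].T @ (agent["A"] @ y - agent["B"] @ x) + mu * y

def hypergrad(agent, x, y, mu=1.0):
    # Hypergradient d f_i(x, y*(x)) / dx via the implicit function theorem,
    # using the current y as a surrogate for the inner solution y*(x).
    dy_f = y - agent["c"]                                  # partial f / partial y
    H = agent["A"].T @ agent["A"] + mu * np.eye(len(y))    # inner Hessian w.r.t. y
    J = -agent["A"].T @ agent["B"]                         # mixed derivative of inner gradient w.r.t. x
    return -J.T @ np.linalg.solve(H, dy_f)                 # partial f / partial x is zero in this toy problem

def decentralized_bilevel_sketch(W, agents, dim_x, dim_y, T=200, K=5, alpha=0.01, beta=0.05):
    # W is a doubly stochastic mixing matrix encoding the peer-to-peer network.
    n = len(agents)
    x = np.zeros((n, dim_x))
    y = np.zeros((n, dim_y))
    g_old = np.stack([hypergrad(a, x[i], y[i]) for i, a in enumerate(agents)])
    s = g_old.copy()                      # tracked estimate of the global outer gradient
    for _ in range(T):
        for i, a in enumerate(agents):    # a few inner gradient-descent steps per agent
            for _ in range(K):
                y[i] -= beta * inner_grad(a, x[i], y[i])
        x = W @ x - alpha * s             # gossip averaging plus a tracked-gradient step
        g_new = np.stack([hypergrad(a, x[i], y[i]) for i, a in enumerate(agents)])
        s = W @ s + g_new - g_old         # gradient-tracking correction
        g_old = g_new
    return x.mean(axis=0)

# Example usage on a hypothetical 4-agent ring with uniform mixing weights.
rng = np.random.default_rng(0)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
agents = [make_agent(3, 2, rng) for _ in range(4)]
x_hat = decentralized_bilevel_sketch(W, agents, dim_x=3, dim_y=2)

The update s = W @ s + g_new - g_old is the standard gradient-tracking correction and is the generic counterpart of the "outer-tracked-gradient" idea in the algorithm's name; replacing the full hypergradient evaluation with a mini-batch, variance-reduced estimator is the kind of change the paper's SVR-INTERACT variant makes to improve the sample complexity to $\mathcal{O}(\sqrt{n} \epsilon^{-1})$.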
Related papers
- A Communication and Computation Efficient Fully First-order Method for Decentralized Bilevel Optimization [16.020878731214083]
This paper introduces a fully first-order decentralized method for decentralized bilevel optimization, $\text{C}^2$DFB.
$\text{C}^2$DFB is both computation- and communication-efficient.
arXiv Detail & Related papers (2024-10-18T02:00:45Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for finite-sum nonconvex-nonconcave problems.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Communication-Efficient Federated Bilevel Optimization with Local and
Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this class of problems.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z) - DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization [27.317118892531827]
We develop a new decentralized bilevel optimization algorithm called DIAMOND (decentralized single-timescale approximation with momentum and gradient-tracking).
We show that DIAMOND enjoys $\mathcal{O}(\epsilon^{-3/2})$ sample and communication complexities for achieving an $\epsilon$-stationary solution.
arXiv Detail & Related papers (2022-12-05T15:58:00Z) - A Penalty-Based Method for Communication-Efficient Decentralized Bilevel Programming [14.35928967799696]
This paper introduces a penalty function-based decentralized algorithm for solving bilevel programming problems over a decentralized network.
A key feature of the proposed algorithm is the estimation of the hyper-gradient of the penalty function.
Our theoretical framework ensures non-asymptotic convergence to the optimal solution of the original problem under various convexity conditions.
arXiv Detail & Related papers (2022-11-08T08:39:30Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms restrict their single-machine setting so that they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Decentralized Gossip-Based Stochastic Bilevel Optimization over
Communication Networks [42.76623191830371]
We propose a gossip-based distributed bilevel optimization algorithm.
Agents can solve both the networked inner and outer problems on a single timescale.
Our algorithm achieves the state-of-the-art efficiency and test accuracy.
arXiv Detail & Related papers (2022-06-22T06:38:54Z) - Local Stochastic Bilevel Optimization with Momentum-Based Variance
Reduction [104.41634756395545]
We study Federated Bilevel Optimization problems. Specifically, we first propose the FedBiO, a deterministic gradient-based algorithm.
We show FedBiO has complexity of $\mathcal{O}(\epsilon^{-1.5})$.
Our algorithms show superior performances compared to other baselines in numerical experiments.
arXiv Detail & Related papers (2022-05-03T16:40:22Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\widetilde{\mathcal{O}}\big(\sqrt{\tfrac{\beta/\mu}{1-\rho}}\,\log \tfrac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - GT-STORM: Taming Sample, Communication, and Memory Complexities in
Decentralized Non-Convex Learning [11.129095449532283]
Decentralized nonconvex optimization has received increasing attention in machine learning in recent years.
Three fundamental challenges in designing decentralized optimization algorithms are how to reduce their sample, communication, and memory complexities.
arXiv Detail & Related papers (2021-05-04T00:44:48Z) - A Low Complexity Decentralized Neural Net with Centralized Equivalence
using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network across distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)