On the Communication Complexity of Decentralized Bilevel Optimization
- URL: http://arxiv.org/abs/2311.11342v4
- Date: Sat, 1 Jun 2024 18:32:08 GMT
- Title: On the Communication Complexity of Decentralized Bilevel Optimization
- Authors: Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao
- Abstract summary: We propose two novel decentralized bilevel gradient descent algorithms based on simultaneous and alternating update strategies.
Our algorithms can achieve faster convergence rates and lower communication costs than existing methods.
This is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting.
- Score: 40.45379954138305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic bilevel optimization finds widespread applications in machine learning, including meta-learning, hyperparameter optimization, and neural architecture search. To extend stochastic bilevel optimization to distributed data, several decentralized stochastic bilevel optimization algorithms have been developed. However, existing methods often suffer from slow convergence rates and high communication costs in heterogeneous settings, limiting their applicability to real-world tasks. To address these issues, we propose two novel decentralized stochastic bilevel gradient descent algorithms based on simultaneous and alternating update strategies. Our algorithms can achieve faster convergence rates and lower communication costs than existing methods. Importantly, our convergence analyses do not rely on strong assumptions regarding heterogeneity. More importantly, our theoretical analysis clearly discloses how the additional communication required for estimating the hypergradient under the heterogeneous setting affects the convergence rate. To the best of our knowledge, this is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting. Furthermore, we demonstrate how to establish the convergence rate for the alternating update strategy when combined with the variance-reduced gradient. Finally, experimental results confirm the efficacy of our algorithms.
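For reference, the decentralized stochastic bilevel problem discussed in the abstract is usually written in the following generic form; the notation below (m nodes, upper-level losses f_i, lower-level losses g_i) is illustrative and not taken verbatim from the paper.

```latex
% Generic decentralized (stochastic) bilevel problem over m nodes;
% illustrative notation, not necessarily the paper's own.
\[
  \min_{x \in \mathbb{R}^{d_x}} \; \Phi(x) = \frac{1}{m}\sum_{i=1}^{m} f_i\bigl(x, y^{*}(x)\bigr)
  \quad \text{s.t.} \quad
  y^{*}(x) = \operatorname*{arg\,min}_{y \in \mathbb{R}^{d_y}} \frac{1}{m}\sum_{i=1}^{m} g_i(x, y).
\]
% When g = \frac{1}{m}\sum_i g_i is strongly convex in y, the implicit function
% theorem gives the hypergradient (with f = \frac{1}{m}\sum_i f_i):
\[
  \nabla \Phi(x) = \nabla_x f\bigl(x, y^{*}(x)\bigr)
    - \nabla^2_{xy} g\bigl(x, y^{*}(x)\bigr)
      \bigl[\nabla^2_{yy} g\bigl(x, y^{*}(x)\bigr)\bigr]^{-1}
      \nabla_y f\bigl(x, y^{*}(x)\bigr).
\]
```

Because the Hessian-inverse-vector product in this formula couples data from all nodes, a node cannot estimate the hypergradient from local information alone when data is heterogeneous, which is one reason hypergradient estimation adds communication rounds on top of the usual model averaging.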
Related papers
- Towards Differentiable Multilevel Optimization: A Gradient-Based Approach [1.6114012813668932]
This paper introduces a novel gradient-based approach for multilevel optimization.
Our method significantly reduces computational complexity while improving both solution accuracy and convergence speed.
To the best of our knowledge, this is one of the first algorithms to provide a general version of implicit differentiation.
arXiv Detail & Related papers (2024-10-15T06:17:59Z)
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
- Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than prior methods.
We characterize the proposed algorithm under various conditions and show how it specializes to online sample-based methods.
arXiv Detail & Related papers (2024-05-15T19:03:08Z)
- Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate [26.676582181833584]
Decentralized multi-level optimization is challenging because of the multi-level structure and the decentralized communication.
We develop two novel decentralized algorithms for multi-level compositional optimization problems.
arXiv Detail & Related papers (2023-06-06T00:23:28Z)
- Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that the variant FedBiOAcc-Local converges at the same rate for this class of problems.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and therefore cannot handle distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators (a minimal sketch of gradient tracking appears after this list).
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a powerful tool for many machine learning problems.
We propose a novel stochastic bilevel optimization algorithm, stocBiO, featuring a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
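For readers unfamiliar with the gradient-tracking communication mechanism mentioned above, here is a minimal NumPy sketch on a toy decentralized least-squares problem. It illustrates only the generic technique under simplifying assumptions (deterministic local gradients, a fixed doubly stochastic mixing matrix, a single-level objective) and is not an implementation of any of the listed papers' algorithms; all names here are illustrative.

```python
# Minimal sketch of decentralized gradient tracking (generic technique, not a
# specific paper's method): each node mixes its iterate and its gradient tracker
# with neighbors via a doubly stochastic matrix W, so the trackers approach the
# average gradient even when local objectives are heterogeneous.
import numpy as np

rng = np.random.default_rng(0)
m, d = 4, 5                                             # nodes, dimension
A = [rng.standard_normal((10, d)) for _ in range(m)]    # heterogeneous local data
b = [rng.standard_normal(10) for _ in range(m)]

def local_grad(i, x):
    """Gradient of node i's local least-squares loss 0.5/n * ||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Doubly stochastic mixing matrix for a 4-node ring.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

eta = 0.1
x = np.zeros((m, d))                                    # one iterate per node
g = np.stack([local_grad(i, x[i]) for i in range(m)])   # gradient trackers

for _ in range(300):
    x_next = W @ x - eta * g                            # mix with neighbors, step along tracker
    g = (W @ g
         + np.stack([local_grad(i, x_next[i]) for i in range(m)])
         - np.stack([local_grad(i, x[i]) for i in range(m)]))   # track the average gradient
    x = x_next

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```

The tracker of each node converges toward the average of the local gradients, so every node descends along a direction that reflects all local objectives despite heterogeneous data; decentralized bilevel methods typically apply the same idea to their upper- and lower-level updates.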