Communication-Efficient Federated Hypergradient Computation via
Aggregated Iterative Differentiation
- URL: http://arxiv.org/abs/2302.04969v3
- Date: Fri, 16 Jun 2023 02:26:22 GMT
- Title: Communication-Efficient Federated Hypergradient Computation via
Aggregated Iterative Differentiation
- Authors: Peiyao Xiao and Kaiyi Ji
- Abstract summary: AggITD is simple to implement and significantly reduces the communication cost.
We show that the proposed AggITD-based algorithm achieves the same sample complexity as existing approximate implicit differentiation (AID)-based approaches.
- Score: 14.494626833445915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated bilevel optimization has attracted increasing attention due to
emerging machine learning and communication applications. The biggest challenge
lies in computing the gradient of the upper-level objective function (i.e.,
hypergradient) in the federated setting due to the nonlinear and distributed
construction of a series of global Hessian matrices. In this paper, we propose
a novel communication-efficient federated hypergradient estimator via
aggregated iterative differentiation (AggITD). AggITD is simple to implement
and significantly reduces the communication cost by conducting the federated
hypergradient estimation and the lower-level optimization simultaneously. We
show that the proposed AggITD-based algorithm achieves the same sample
complexity as existing approximate implicit differentiation (AID)-based
approaches with much fewer communication rounds in the presence of data
heterogeneity. Our results also shed light on the great advantage of ITD over
AID in the federated/distributed hypergradient estimation. This differs from
the comparison in the non-distributed bilevel optimization, where ITD is less
efficient than AID. Our extensive experiments demonstrate the great
effectiveness and communication efficiency of the proposed method.
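To make the ITD-versus-AID distinction above concrete, here is a minimal single-machine sketch of the two generic hypergradient estimators on a toy bilevel problem. This is only an illustration under assumptions made for the example (the objectives inner_obj/outer_obj, step size alpha, and unrolling depth K are invented), and it is not the paper's federated AggITD estimator, which additionally aggregates client information across rounds.

```python
# A minimal single-machine sketch contrasting ITD and AID hypergradient
# estimation on a toy bilevel problem. This is NOT the paper's federated
# AggITD estimator; inner_obj, outer_obj, K, and alpha are assumptions
# made purely for illustration.
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg


def inner_obj(y, x):
    # Toy lower-level objective g(x, y); strongly convex in y.
    return 0.5 * jnp.sum((y - x) ** 2) + 0.1 * jnp.sum(y ** 2)


def outer_obj(y, x):
    # Toy upper-level objective f(x, y).
    return 0.5 * jnp.sum(y ** 2) + jnp.sum(x * y)


def itd_hypergradient(x, y0, K=20, alpha=0.1):
    """ITD: unroll K lower-level gradient steps, then backpropagate
    through the whole trajectory to get d f(y_K(x), x) / d x."""
    def unrolled(x_):
        y = y0
        for _ in range(K):
            y = y - alpha * jax.grad(inner_obj, argnums=0)(y, x_)
        return outer_obj(y, x_)

    return jax.grad(unrolled)(x)


def aid_hypergradient(x, y_star):
    """AID: at an approximate lower-level solution y_star, solve
    [grad^2_y g] v = grad_y f with Hessian-vector products, then return
    grad_x f - (grad_x grad_y g) v (implicit-function hypergradient)."""
    grad_y_f = jax.grad(outer_obj, argnums=0)(y_star, x)

    def hvp(v):
        # Hessian-vector product grad^2_y g(y_star, x) @ v via double grad.
        return jax.grad(
            lambda y: jnp.vdot(jax.grad(inner_obj, argnums=0)(y, x), v)
        )(y_star)

    v, _ = cg(hvp, grad_y_f)
    cross = jax.grad(
        lambda x_: jnp.vdot(jax.grad(inner_obj, argnums=0)(y_star, x_), v)
    )(x)
    return jax.grad(outer_obj, argnums=1)(y_star, x) - cross


if __name__ == "__main__":
    x = jnp.array([1.0, -2.0])
    y0 = jnp.zeros(2)
    print("ITD estimate:", itd_hypergradient(x, y0))

    # Approximate lower-level solution for the AID route.
    y_star = y0
    for _ in range(200):
        y_star = y_star - 0.1 * jax.grad(inner_obj, argnums=0)(y_star, x)
    print("AID estimate:", aid_hypergradient(x, y_star))
```

The abstract attributes AggITD's communication savings to carrying out the (aggregated) hypergradient estimation and the lower-level optimization simultaneously; in this sketch, that corresponds to the fact that the ITD backward pass traverses the same inner iterates the lower-level solver already produced, whereas the AID route requires a separate Hessian-vector-product solve after the lower-level problem has been approximately solved.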
Related papers
- Distributed Noncoherent Joint Transmission Based on Multi-Agent Reinforcement Learning for Dense Small Cell MISO Systems [8.146481327854545]
We consider a dense small cell (DSC) network where multi-antenna small cell base stations (SBSs) transmit data over a shared band.
arXiv Detail & Related papers (2024-08-22T02:11:14Z) - On the Communication Complexity of Decentralized Bilevel Optimization [40.45379954138305]
We propose two novel decentralized bilevel gradient descent algorithms based on simultaneous and alternating update strategies.
Our algorithms can achieve faster convergence rates and lower communication costs than existing methods.
This is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting.
arXiv Detail & Related papers (2023-11-19T14:56:26Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for federated conditional stochastic optimization.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Communication-Efficient Federated Bilevel Optimization with Local and
Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this type of problem.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z) - Fast Adaptive Federated Bilevel Optimization [14.579475552088692]
We propose a novel adaptive federated bilevel optimization algorithm (i.e., AdaFBiO) to solve distributed bilevel optimization problems.
AdaFBiO uses unified adaptive matrices to flexibly incorporate various adaptive learning rates for updating variables in both the upper-level (UL) and lower-level (LL) problems.
We provide a convergence analysis framework for our AdaFBiO algorithm, and prove that it needs a sample complexity of $\tilde{O}(\epsilon^{-3})$ with a communication complexity of $\tilde{O}(\epsilon^{-2})$ to obtain an $\epsilon$-stationary point.
arXiv Detail & Related papers (2022-11-02T13:55:47Z) - ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement [80.94378602238432]
We propose an efficient structure named Correspondence Efficient Transformer (ECO-TR) by finding correspondences in a coarse-to-fine manner.
To achieve this, multiple transformer blocks are stage-wisely connected to gradually refine the predicted coordinates.
Experiments on various sparse and dense matching tasks demonstrate the superiority of our method in both efficiency and effectiveness against existing state-of-the-art methods.
arXiv Detail & Related papers (2022-09-25T13:05:33Z) - Versatile Single-Loop Method for Gradient Estimator: First and Second
Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator).
Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data.
arXiv Detail & Related papers (2022-09-01T11:05:26Z) - Resource-constrained Federated Edge Learning with Heterogeneous Data:
Formulation and Analysis [8.863089484787835]
We propose a distributed approximate Newton-type training scheme, namely FedOVA, to address the statistical challenge brought by heterogeneous data.
FedOVA decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning.
arXiv Detail & Related papers (2021-10-14T17:35:24Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization has arisen as a powerful tool for many machine learning problems.
We propose a novel algorithm named stocBiO, which features a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several datasets demonstrate the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new distributed error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)