Decentralized Hyper-Gradient Computation over Time-Varying Directed
Networks
- URL: http://arxiv.org/abs/2210.02129v3
- Date: Tue, 13 Jun 2023 05:04:17 GMT
- Title: Decentralized Hyper-Gradient Computation over Time-Varying Directed
Networks
- Authors: Naoyuki Terashita, Satoshi Hara
- Abstract summary: This paper addresses the communication issues that arise when estimating hyper-gradients in decentralized federated learning (FL).
We introduce an alternative optimality condition for FL using an averaging operation on model parameters and gradients.
We confirm the convergence of our estimator to the true hyper-gradient both theoretically and empirically.
- Score: 13.274835852615572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the communication issues when estimating hyper-gradients
in decentralized federated learning (FL). Hyper-gradients in decentralized FL
quantify how the performance of the globally shared optimal model is influenced
by perturbations in clients' hyper-parameters. In prior work, clients trace
this influence through the communication of Hessian matrices over a static
undirected network, resulting in (i) excessive communication costs and (ii)
inability to make use of more efficient and robust networks, namely,
time-varying directed networks. To solve these issues, we introduce an
alternative optimality condition for FL using an averaging operation on model
parameters and gradients. We then employ Push-Sum as the averaging operation,
which is a consensus optimization technique for time-varying directed networks.
As a result, the hyper-gradient estimator derived from our optimality condition
enjoys two desirable properties: (i) it requires only Push-Sum communication of
vectors and (ii) it can operate over time-varying directed networks. We confirm
the convergence of our estimator to the true hyper-gradient both theoretically
and empirically, and we further demonstrate that it enables two novel
applications: decentralized influence estimation and personalization over
time-varying networks.
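The estimator's communication primitive is Push-Sum averaging. Below is a minimal numpy sketch of that primitive only, not the authors' implementation: each node pushes a value and a weight through column-stochastic mixing matrices (one per round, with non-zeros matching the directed edges active in that round), and the value-to-weight ratio at every node converges to the network-wide average.

```python
import numpy as np

def push_sum_average(values, mixing_matrices):
    """Estimate the network-wide average of per-node vectors via Push-Sum.

    values:          (n_nodes, dim) array of initial vectors, one per node.
    mixing_matrices: iterable of (n_nodes, n_nodes) column-stochastic matrices,
                     one per round; entry A[j, i] is the share node i pushes to
                     node j, so non-zeros follow the directed edges of that round.
    Returns per-node ratio estimates, which converge to the true average.
    """
    x = values.astype(float).copy()   # pushed values
    w = np.ones(len(values))          # pushed weights
    for A in mixing_matrices:
        x = A @ x                     # each node sums what it receives
        w = A @ w
    return x / w[:, None]             # de-biased estimate at every node


# Example: 3 nodes on a directed ring with self-loops, uniform splitting.
if __name__ == "__main__":
    vals = np.array([[1.0], [2.0], [6.0]])   # true average is 3.0
    ring = np.array([[0.5, 0.0, 0.5],
                     [0.5, 0.5, 0.0],
                     [0.0, 0.5, 0.5]])       # column-stochastic
    print(push_sum_average(vals, [ring] * 50))  # every row approaches 3.0
```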
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
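As a rough illustration of the extrapolation component only, here is a hedged sketch of a Nesterov-style look-ahead applied after a decentralized aggregation step; the function and its arguments are hypothetical, and DFedCata's actual update (including the Moreau-envelope part) differs in detail.

```python
import numpy as np

def extrapolated_aggregate(x_curr, x_prev, neighbor_weights, neighbor_params, beta=0.9):
    """Illustrative Nesterov-style extrapolation around a decentralized
    aggregation step (a sketch, not DFedCata's exact update).

    x_curr, x_prev:   this client's current and previous parameter vectors
    neighbor_weights: mixing weights over neighbors (should sum to 1)
    neighbor_params:  list of neighbor parameter vectors
    beta:             momentum/extrapolation coefficient
    """
    aggregated = sum(w * p for w, p in zip(neighbor_weights, neighbor_params))
    return aggregated + beta * (x_curr - x_prev)   # look-ahead on the aggregate
```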
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Decentralized Optimization in Time-Varying Networks with Arbitrary Delays [22.40154714677385]
We consider a decentralized optimization problem for networks affected by communication delays.
Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems.
To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs.
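A toy sketch of the virtual-node idea described above: each delayed link is replaced by a chain of forwarding-only nodes, one per round of delay, so the delayed network becomes a larger delay-free directed graph. The function name and the edge/delay encoding are illustrative assumptions, not the paper's notation.

```python
def add_delay_nodes(edges, delays):
    """Replace each delayed link (u, v) with a chain of virtual non-computing
    relay nodes, one per round of delay (illustrative sketch).

    edges:  list of directed links (u, v) between computing nodes
    delays: dict mapping (u, v) -> integer delay in rounds
    Returns the augmented directed edge list.
    """
    augmented, fresh = [], 0
    for (u, v) in edges:
        prev = u
        for _ in range(delays.get((u, v), 0)):   # one relay per round of delay
            relay = f"virtual_{fresh}"
            fresh += 1
            augmented.append((prev, relay))
            prev = relay
        augmented.append((prev, v))
    return augmented


# A link with delay 2 becomes u -> virtual_0 -> virtual_1 -> v.
print(add_delay_nodes([("u", "v")], {("u", "v"): 2}))
```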
arXiv Detail & Related papers (2024-05-29T20:51:38Z) - Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust
Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
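For intuition, a minimal sketch of a low-rank (and optionally sparse) recurrent weight parameterization of the kind this line of work studies; the helper below is hypothetical, and the paper's CfC models are considerably more structured.

```python
import numpy as np

def low_rank_recurrent_weights(hidden, rank, sparsity=0.0, seed=0):
    """Build a recurrent weight matrix W = U @ V with rank << hidden, which cuts
    the parameter count from hidden^2 to roughly 2*hidden*rank, optionally
    zeroing a fraction of entries with a fixed random mask (illustrative only).
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((hidden, rank)) / np.sqrt(hidden)
    V = rng.standard_normal((rank, hidden)) / np.sqrt(rank)
    W = U @ V
    if sparsity > 0.0:
        mask = rng.random(W.shape) >= sparsity   # keep entries with prob 1 - sparsity
        W = W * mask
    return W
```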
arXiv Detail & Related papers (2023-10-05T21:44:18Z) - Hierarchical Federated Learning in Wireless Networks: Pruning Tackles Bandwidth Scarcity and System Heterogeneity [32.321021292376315]
We propose a pruning-enabled hierarchical federated learning (PHFL) algorithm for heterogeneous networks (HetNets).
We first derive an upper bound of the convergence rate that clearly demonstrates the impact of the model pruning and wireless communications.
We validate the effectiveness of our proposed PHFL algorithm in terms of test accuracy, wall clock time, energy consumption and bandwidth requirement.
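As a generic illustration of the pruning ingredient, a short magnitude-pruning sketch of a client update before upload; the helper is hypothetical, and PHFL's actual pruning rule and wireless model are not reproduced here.

```python
import numpy as np

def magnitude_prune(params, keep_ratio):
    """Keep only the largest-magnitude entries of a (flattened) client update,
    which is what would be transmitted to reduce bandwidth (illustrative sketch).
    """
    k = max(1, int(keep_ratio * params.size))
    keep = np.argsort(np.abs(params))[-k:]      # indices of the k largest magnitudes
    pruned = np.zeros_like(params)
    pruned[keep] = params[keep]
    return pruned, keep
```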
arXiv Detail & Related papers (2023-08-03T07:03:33Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
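A hedged sketch of the general setting rather than the paper's optimized bit allocation: agents quantize their estimates before exchanging them and run an adapt-then-combine least-mean-squares step. All names and defaults below are assumptions for illustration.

```python
import numpy as np

def uniform_quantize(x, bits, x_max=1.0):
    """Uniform quantizer for estimates exchanged between agents (generic sketch;
    how many bits each agent gets is exactly what the paper optimizes)."""
    levels = 2 ** bits - 1
    clipped = np.clip(x, -x_max, x_max)
    return np.round((clipped + x_max) / (2 * x_max) * levels) / levels * (2 * x_max) - x_max

def diffusion_lms_step(w, neighbor_estimates, u, d, mu=0.05):
    """One adapt-then-combine step: a local LMS update on the sample (u, d),
    then a uniform convex combination with the received neighbor estimates."""
    w_adapt = w + mu * u * (d - u @ w)                     # local least-mean-squares step
    stacked = np.stack([w_adapt] + list(neighbor_estimates))
    return stacked.mean(axis=0)                            # combine with neighbors
```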
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - FLCC: Efficient Distributed Federated Learning on IoMT over CSMA/CA [0.0]
Federated Learning (FL) has emerged as a promising approach for privacy preservation.
This article investigates the performance of FL on an application that might be used to improve a remote healthcare system over ad hoc networks.
We present two metrics to evaluate the network performance: 1) probability of successful transmission while minimizing the interference, and 2) performance of distributed FL model in terms of accuracy and loss.
arXiv Detail & Related papers (2023-03-29T16:36:42Z) - A Q-Learning-based Approach for Distributed Beam Scheduling in mmWave
Networks [18.22250038264899]
We consider the problem of distributed downlink beam scheduling and power allocation for millimeter-Wave (mmWave) cellular networks.
Multiple base stations belonging to different service operators share the same unlicensed spectrum with no central coordination or cooperation among them.
We propose a distributed scheduling approach to power allocation and adaptation for efficient interference management over the shared spectrum by modeling each BS as an independent Q-learning agent.
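A minimal tabular Q-learning agent of the kind each base station could run independently; the class, state/action encoding, and hyper-parameters below are illustrative assumptions rather than the paper's design.

```python
import numpy as np

class IndependentQLearner:
    """One agent per base station, choosing a discrete power level
    (sketch of the independent-agent idea, not the paper's exact scheme)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        # epsilon-greedy choice over the discrete power levels
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # standard one-step Q-learning target
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])
```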
arXiv Detail & Related papers (2021-10-17T02:58:13Z) - Low-Latency Federated Learning over Wireless Channels with Differential
Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
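For context, a standard client-side DP sanitization sketch (clip, then add Gaussian noise); this is the generic Gaussian mechanism used in DP federated learning, not the paper's delay-minimization procedure, and the function name is an assumption.

```python
import numpy as np

def dp_sanitize_update(update, clip_norm, noise_multiplier, rng=None):
    """Clip a client's model update to a norm bound and add Gaussian noise,
    the usual per-client step for differentially private FL (generic sketch)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```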
arXiv Detail & Related papers (2021-06-20T13:51:18Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds while still enjoying theoretical guarantees.
Our experiments on several datasets show the effectiveness of our algorithm and also confirm our theory.
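As a didactic stand-in, a pairwise surrogate for AUC on a mini-batch; the paper optimizes a different, communication-efficient reformulation, so treat this purely as an illustration of what "AUC maximization" means.

```python
import numpy as np

def pairwise_auc_surrogate(scores, labels, margin=1.0):
    """Penalize positive-negative pairs whose score gap falls below a margin,
    a standard surrogate for the AUC objective (illustrative only).

    scores: model scores for a mini-batch
    labels: binary labels in {0, 1}
    """
    pos, neg = scores[labels == 1], scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        return 0.0
    gaps = pos[:, None] - neg[None, :]                   # all positive-negative pairs
    return np.mean(np.maximum(0.0, margin - gaps) ** 2)  # squared hinge on the gap
```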
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Toward fast and accurate human pose estimation via soft-gated skip
connections [97.06882200076096]
This paper is on highly accurate and highly efficient human pose estimation.
We re-analyze the design of skip connections in the context of improving both the accuracy and the efficiency over the state-of-the-art.
Our model achieves state-of-the-art results on the MPII and LSP datasets.
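A one-line sketch of a soft-gated skip connection, where a learnable gate scales the identity path before it is added to the residual branch; the exact gating and its placement in the pose network are not reproduced here, and the snippet is only an assumed minimal form.

```python
import numpy as np

def soft_gated_skip(x, residual_branch, alpha):
    """Soft-gated skip connection: scale the identity path by a learnable gate
    alpha (scalar or per-channel) and add the residual branch (sketch)."""
    return alpha * x + residual_branch(x)


# Example: a gate of 0.5 halves the identity contribution.
out = soft_gated_skip(np.ones(4), residual_branch=lambda t: 2.0 * t, alpha=0.5)
print(out)  # [2.5 2.5 2.5 2.5]
```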
arXiv Detail & Related papers (2020-02-25T18:51:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.