Heterogeneous Federated Learning on a Graph
- URL: http://arxiv.org/abs/2209.08737v1
- Date: Mon, 19 Sep 2022 03:18:10 GMT
- Title: Heterogeneous Federated Learning on a Graph
- Authors: Huiyuan Wang and Xuyang Zhao and Wei Lin
- Abstract summary: Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in machine learning practice.
In this work, we consider parameter estimation in federated learning with data distribution and communication heterogeneity, as well as limited computational capacity of local devices.
We highlight that our algorithm transmits only parameters along edges of $G$ at each iteration, with convergence rate $O(T^{-1}\log T)$, where $T$ denotes the number of iterations.
- Score: 9.135254524746847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning, where algorithms are trained across multiple
decentralized devices without sharing local data, is increasingly popular in
distributed machine learning practice. Typically, a graph structure $G$ exists
behind local devices for communication. In this work, we consider parameter
estimation in federated learning with data distribution and communication
heterogeneity, as well as limited computational capacity of local devices. We
encode the distribution heterogeneity by parametrizing distributions on local
devices with a set of distinct $p$-dimensional vectors. We then propose to
jointly estimate parameters of all devices under the $M$-estimation framework
with the fused Lasso regularization, encouraging an equal estimate of
parameters on connected devices in $G$. We provide a general result for our
estimator depending on $G$, which can be further calibrated to obtain
convergence rates for various specific problem setups. Surprisingly, our
estimator attains the optimal rate under a certain graph fidelity condition on
$G$, as if we could aggregate all samples sharing the same distribution. If the
graph fidelity condition is not met, we propose an edge selection procedure via
multiple testing to ensure the optimality. To ease the burden of local
computation, a decentralized stochastic version of ADMM is provided, with
convergence rate $O(T^{-1}\log T)$ where $T$ denotes the number of iterations.
We highlight that our algorithm transmits only parameters along edges of $G$
at each iteration, without requiring a central machine, which preserves
privacy. We further extend it to the case where devices are randomly
inaccessible during the training process, with a similar algorithmic
convergence guarantee. The computational and statistical efficiency of our
method is evidenced by simulation experiments and the 2020 US presidential
election data set.
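As a rough illustration of the joint estimation problem described in the abstract, the estimator can be written as a graph-fused penalized M-estimation objective; the specific loss, penalty norm, and sample weighting are not given above, so the notation below is an assumption:
$$ \min_{\theta_1,\dots,\theta_m \in \mathbb{R}^p} \; \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i} \ell(\theta_i; x_{ij}) \;+\; \lambda \sum_{(i,k)\in E(G)} \lVert \theta_i - \theta_k \rVert_2 $$
Here device $i$ holds $n_i$ samples $x_{ij}$, $\ell$ is the local loss of the M-estimation framework, $E(G)$ is the edge set of the communication graph, and the fused-Lasso penalty with weight $\lambda$ encourages connected devices in $G$ to share an equal parameter estimate.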
Related papers
- A Federated Distributionally Robust Support Vector Machine with Mixture of Wasserstein Balls Ambiguity Set for Distributed Fault Diagnosis [3.662364375995991]
We study the problem of training a distributionally robust (DR) support vector machine (SVM) in a federated fashion over a network composed of a central server and $G$ clients, without sharing data.
We propose two distributed optimization algorithms for training the global FDR-SVM.
arXiv Detail & Related papers (2024-10-04T19:21:45Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization [0.552480439325792]
We consider an $n$-agent distributed optimization problem with imperfect information characterized in a parametric sense.
We propose a coupled distributed stochastic approximation algorithm, in which every agent updates its current belief about the unknown parameter.
We quantitatively characterize the factors that affect the algorithm performance, and prove that the mean-squared error of the decision variable is bounded by $\mathcal{O}\left(\frac{1}{nk}\right)+\mathcal{O}\left(\frac{1}{\sqrt{n}(1-\rho_w)}\right)\frac{1}{k^{1.5}}$.
arXiv Detail & Related papers (2024-04-21T14:18:49Z) - A Unified Framework for Gradient-based Clustering of Distributed Data [51.904327888475606]
We develop a family of distributed clustering algorithms that work over networks of users.
DGC-$\mathcal{F}_\rho$ is specialized to popular clustering losses like $K$-means and Huber loss.
We show that consensus fixed points of DGC-$\mathcal{F}_\rho$ are equivalent to fixed points of gradient clustering over the full data.
arXiv Detail & Related papers (2024-02-02T10:44:42Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
Group distributionally robust optimization (GDRO)
Online learning techniques reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
A novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k\log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - An Improved Analysis of Gradient Tracking for Decentralized Machine
Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agent's common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency of the convergence rate on the parameter $p$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z) - FedLGA: Towards System-Heterogeneity of Federated Learning via Local
Gradient Approximation [21.63719641718363]
We formalize the system-heterogeneous FL problem and propose a new algorithm, called FedLGA, which addresses this problem by bridging the divergence local model updates via epoch approximation.
The results of comprehensive experiments on multiple datasets show that FedLGA outperforms current FL benchmarks against the system-heterogeneity.
arXiv Detail & Related papers (2021-12-22T16:05:09Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
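To make the neighbor-only communication pattern concrete (it is shared by the main paper's decentralized stochastic ADMM and the mini-batch ADMM entry above), below is a minimal consensus-ADMM sketch in Python. It is not the authors' algorithm: the quadratic local losses, the toy graph, the penalty parameter rho, and all names are illustrative assumptions; the point is only that each device updates its parameters using local data plus the parameters received from its graph neighbors, with no central machine.

# A minimal decentralized consensus-ADMM sketch (illustrative; NOT the algorithm from the paper above).
import numpy as np

def decentralized_admm(targets, edges, rho=1.0, num_iters=500):
    """Consensus ADMM over a graph: node i minimizes 0.5*||x - targets[i]||^2
    subject to agreement with its neighbors, using only neighbor-to-neighbor messages."""
    m, p = targets.shape
    neighbors = [[] for _ in range(m)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    x = targets.copy()        # each device's current parameter estimate
    alpha = np.zeros((m, p))  # each device's dual variable

    for _ in range(num_iters):
        x_old = x.copy()
        # Primal step: device i uses its own data (targets[i], alpha[i]) and the
        # parameters x_old[j] received from its neighbors j -- nothing else.
        for i in range(m):
            d_i = len(neighbors[i])
            nbr_sum = sum((x_old[i] + x_old[j] for j in neighbors[i]), np.zeros(p))
            x[i] = (targets[i] - alpha[i] + rho * nbr_sum) / (1.0 + 2.0 * rho * d_i)
        # Dual step: again only the freshly exchanged neighbor parameters are needed.
        for i in range(m):
            alpha[i] += rho * sum((x[i] - x[j] for j in neighbors[i]), np.zeros(p))
    return x

# Toy run on a 4-node path graph: all devices should approximately agree on the
# average of the local targets, which minimizes the sum of the quadratic losses.
rng = np.random.default_rng(0)
targets = rng.normal(size=(4, 3))
estimates = decentralized_admm(targets, edges=[(0, 1), (1, 2), (2, 3)])
print(np.abs(estimates - targets.mean(axis=0)).max())  # small after enough iterations

On a connected graph the local estimates converge to the common minimizer of the summed quadratic losses (here, the average of the local targets), and every message exchanged during training is just a parameter vector sent along an edge of the graph.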