Heterogeneous Federated Learning on a Graph
- URL: http://arxiv.org/abs/2209.08737v1
- Date: Mon, 19 Sep 2022 03:18:10 GMT
- Title: Heterogeneous Federated Learning on a Graph
- Authors: Huiyuan Wang and Xuyang Zhao and Wei Lin
- Abstract summary: Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in machine learning practice.
In this work, we consider parameter estimation in federated learning with data distribution and communication heterogeneity, as well as limited computational capacity of local devices.
We highlight that our algorithm transmits only parameters along edges of $G$ at each iteration and converges at rate $O(T^{-1}\log T)$, where $T$ denotes the number of iterations.
- Score: 9.135254524746847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning, where algorithms are trained across multiple
decentralized devices without sharing local data, is increasingly popular in
distributed machine learning practice. Typically, a graph structure $G$ exists
behind local devices for communication. In this work, we consider parameter
estimation in federated learning with data distribution and communication
heterogeneity, as well as limited computational capacity of local devices. We
encode the distribution heterogeneity by parametrizing distributions on local
devices with a set of distinct $p$-dimensional vectors. We then propose to
jointly estimate parameters of all devices under the $M$-estimation framework
with the fused Lasso regularization, encouraging an equal estimate of
parameters on connected devices in $G$. We provide a general result for our
estimator depending on $G$, which can be further calibrated to obtain
convergence rates for various specific problem setups. Surprisingly, our
estimator attains the optimal rate under certain graph fidelity condition on
$G$, as if we could aggregate all samples sharing the same distribution. If the
graph fidelity condition is not met, we propose an edge selection procedure via
multiple testing to ensure the optimality. To ease the burden of local
computation, a decentralized stochastic version of ADMM is provided, with
convergence rate $O(T^{-1}\log T)$ where $T$ denotes the number of iterations.
We highlight that our algorithm transmits only parameters along edges of $G$
at each iteration, without requiring a central machine, which preserves
privacy. We further extend it to the case where devices are randomly
inaccessible during the training process, with a similar algorithmic
convergence guarantee. The computational and statistical efficiency of our
method is evidenced by simulation experiments and the 2020 US presidential
election data set.
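Concretely, the joint estimator described in the abstract can be written schematically as graph fused-Lasso penalized M-estimation; the notation below ($m$ devices, local losses $L_i$, penalty level $\lambda$, norm $\|\cdot\|$) is generic and not fixed by the abstract:
$\hat{\theta} = \arg\min_{\theta_1,\dots,\theta_m \in \mathbb{R}^p} \sum_{i=1}^{m} L_i(\theta_i) + \lambda \sum_{(i,j)\in E(G)} \|\theta_i - \theta_j\|,$
where $L_i$ is the empirical loss on device $i$, $E(G)$ is the edge set of $G$, and the penalty encourages connected devices to share an estimate.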
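The decentralized ADMM mentioned in the abstract exchanges parameters only along edges of $G$, without a central machine. The sketch below is a plain consensus-ADMM loop for local least-squares losses, written only to illustrate that communication pattern; it is not the paper's stochastic, fused-Lasso algorithm, and the names (`decentralized_admm`, `rho`, `edges`) are illustrative.

```python
# Minimal sketch: decentralized consensus ADMM over a graph (NOT the paper's
# exact algorithm). Each device i holds local least-squares data (A_i, b_i),
# keeps a parameter x_i and a dual variable u_i, and at every iteration
# exchanges ONLY its current parameter with its neighbours in G.
import numpy as np

def decentralized_admm(A, b, edges, rho=1.0, iters=200):
    m, p = len(A), A[0].shape[1]
    neighbours = {i: [] for i in range(m)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    x = [np.zeros(p) for _ in range(m)]   # primal parameter on each device
    u = [np.zeros(p) for _ in range(m)]   # dual variable on each device

    for _ in range(iters):
        # Primal step: each device solves a small local problem using its own
        # data and the parameters most recently received from its neighbours.
        x_new = []
        for i in range(m):
            d_i = len(neighbours[i])
            rhs = A[i].T @ b[i] - u[i] + rho * sum(x[i] + x[j] for j in neighbours[i])
            lhs = A[i].T @ A[i] + 2.0 * rho * d_i * np.eye(p)
            x_new.append(np.linalg.solve(lhs, rhs))
        # Dual step: driven only by disagreement with neighbours.
        for i in range(m):
            u[i] = u[i] + rho * sum(x_new[i] - x_new[j] for j in neighbours[i])
        x = x_new
    return x

# Toy usage: 3 devices on a path graph 0-1-2 sharing one true parameter.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
A = [rng.normal(size=(20, 5)) for _ in range(3)]
b = [A_i @ theta + 0.1 * rng.normal(size=20) for A_i in A]
estimates = decentralized_admm(A, b, edges=[(0, 1), (1, 2)])
print(max(np.linalg.norm(e - theta) for e in estimates))  # small if it converged
```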
Related papers
- A Federated Distributionally Robust Support Vector Machine with Mixture of Wasserstein Balls Ambiguity Set for Distributed Fault Diagnosis [3.662364375995991]
We study the problem of training a distributionally robust (DR) support vector machine (SVM) in a federated fashion over a network comprised of a central server and $G$ clients without sharing data.
We propose two distributed optimization algorithms for training the global FDR-SVM.
arXiv Detail & Related papers (2024-10-04T19:21:45Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization [0.552480439325792]
We consider a distributed optimization problem over $n$ agents with imperfect information characterized in a parametric sense.
We propose a coupled distributed stochastic approximation algorithm, in which every agent updates the current beliefs of its unknown parameter.
We quantitatively characterize the factors that affect the algorithm performance, and prove that the mean-squared error of the decision variable is bounded by $\mathcal{O}\left(\frac{1}{nk}\right)+\mathcal{O}\left(\frac{1}{\sqrt{n}(1-\rho_w)}\right)\frac{1}{k^{1.5}}$.
arXiv Detail & Related papers (2024-04-21T14:18:49Z)
- General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it (a minimal sketch of this classical mechanism appears after this list).
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error.
arXiv Detail & Related papers (2023-01-31T18:47:42Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\left(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T}\right)$ with a communication cost of $O(k\log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- An Improved Analysis of Gradient Tracking for Decentralized Machine Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agents' common goal is to find a model that minimizes the average of all local loss functions (a minimal gradient-tracking sketch appears after this list).
We improve the dependency on $p$ from $\mathcal{O}(p^{-2})$ to $\mathcal{O}(p^{-1})$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z)
- FedLGA: Towards System-Heterogeneity of Federated Learning via Local Gradient Approximation [21.63719641718363]
We formalize the system-heterogeneous FL problem and propose a new algorithm, called FedLGA, which addresses this problem by bridging the divergence of local model updates via epoch approximation.
The results of comprehensive experiments on multiple datasets show that FedLGA outperforms current FL benchmarks under system heterogeneity.
arXiv Detail & Related papers (2021-12-22T16:05:09Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
arXiv Detail & Related papers (2020-07-02T18:08:14Z)
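As referenced in the "General Gaussian Noise Mechanisms" entry above, the classical private mean estimator releases the empirical mean plus Gaussian noise. Below is a minimal spherical-noise sketch, calibrated for $\rho$-zCDP under the assumption that each record lies in $[0,1]^d$; the paper itself studies more general, possibly correlated noise, and the function name `private_mean` is illustrative.

```python
# Minimal sketch of the classical Gaussian noise mechanism for private mean
# estimation (the baseline referenced above). Assumes each record lies in
# [0, 1]^d, so the L2 sensitivity of the empirical mean is sqrt(d) / n;
# noise is calibrated to rho-zCDP via sigma = sensitivity / sqrt(2 * rho).
# Spherical noise only -- the paper studies more general correlated noise.
import numpy as np

def private_mean(data: np.ndarray, rho: float, rng=None) -> np.ndarray:
    """Unbiased, differentially private estimate of the mean of `data`."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = data.shape
    sensitivity = np.sqrt(d) / n              # L2 sensitivity of the mean
    sigma = sensitivity / np.sqrt(2.0 * rho)  # Gaussian mechanism for rho-zCDP
    return data.mean(axis=0) + rng.normal(scale=sigma, size=d)

# Toy usage: 1000 points in [0, 1]^3, privacy parameter rho = 0.5.
rng = np.random.default_rng(1)
x = rng.uniform(size=(1000, 3))
print(private_mean(x, rho=0.5, rng=rng))
```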
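For the gradient-tracking entry above, the textbook GT recursion is sketched below for intuition: each agent mixes its iterate and its gradient tracker with its neighbours through a doubly stochastic matrix $W$, then corrects the tracker with the change in its local gradient. This is the generic method, not that paper's refined analysis; `gradient_tracking`, `eta`, and the toy quadratic losses are illustrative.

```python
# Minimal sketch of gradient tracking (GT) for decentralized optimization.
# W is a doubly stochastic mixing matrix supported on the communication
# graph; grads(i, x) returns agent i's local gradient at x.
import numpy as np

def gradient_tracking(grads, W, x0, eta=0.05, iters=500):
    n, p = x0.shape
    x = x0.copy()
    y = np.stack([grads(i, x[i]) for i in range(n)])  # gradient trackers
    g_old = y.copy()
    for _ in range(iters):
        x = W @ x - eta * y                      # mix with neighbours, then step
        g_new = np.stack([grads(i, x[i]) for i in range(n)])
        y = W @ y + g_new - g_old                # track the average gradient
        g_old = g_new
    return x.mean(axis=0)

# Toy usage: 4 agents on a ring, local quadratic losses 0.5 * ||x - c_i||^2,
# whose average is minimized at the mean of the c_i.
n, p = 4, 3
rng = np.random.default_rng(2)
c = rng.normal(size=(n, p))
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])           # doubly stochastic ring
x_hat = gradient_tracking(lambda i, x: x - c[i], W, np.zeros((n, p)))
print(np.allclose(x_hat, c.mean(axis=0), atol=1e-3))
```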
This list is automatically generated from the titles and abstracts of the papers on this site.