On the Unreasonable Effectiveness of Federated Averaging with
Heterogeneous Data
- URL: http://arxiv.org/abs/2206.04723v1
- Date: Thu, 9 Jun 2022 18:25:25 GMT
- Title: On the Unreasonable Effectiveness of Federated Averaging with
Heterogeneous Data
- Authors: Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong
Zhang
- Abstract summary: Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning.
This paper explains the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions.
- Score: 39.600069116159695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing theory predicts that data heterogeneity will degrade the performance
of the Federated Averaging (FedAvg) algorithm in federated learning. However,
in practice, the simple FedAvg algorithm converges very well. This paper
explains the seemingly unreasonable effectiveness of FedAvg that contradicts
the previous theoretical predictions. We find that the key assumption of
bounded gradient dissimilarity in previous theoretical analyses is too
pessimistic to characterize data heterogeneity in practical applications. For a
simple quadratic problem, we demonstrate there exist regimes where large
gradient dissimilarity does not have any negative impact on the convergence of
FedAvg. Motivated by this observation, we propose a new quantity, average drift
at optimum, to measure the effects of data heterogeneity, and explicitly use it
to present a new theoretical analysis of FedAvg. We show that the average drift
at optimum is nearly zero across many real-world federated training tasks,
whereas the gradient dissimilarity can be large. And our new analysis suggests
FedAvg can have identical convergence rates in homogeneous and heterogeneous
data settings, and hence, leads to better understanding of its empirical
success.
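To make the abstract's quadratic example concrete, below is a minimal NumPy sketch (not the authors' code; the objectives, dimensions, and step sizes are arbitrary illustrative choices, and the two printed quantities are informal proxies for the paper's definitions). The clients share a Hessian but have very different optima, so the gradient dissimilarity at the global optimum is large, yet the local-update drifts cancel on average and FedAvg behaves as if the data were homogeneous.

# Minimal illustration (NumPy), not the paper's code: quadratic clients that share a
# Hessian but have very different optima. Gradient dissimilarity at the global optimum
# is large, yet the client drifts after local steps cancel on average, so FedAvg is
# unaffected by the heterogeneity. All constants below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
M, d, K, lr, rounds = 10, 5, 20, 0.05, 200   # clients, dimension, local steps, step size, rounds

H = np.eye(d)                                 # shared client Hessian (identity for simplicity)
b = rng.normal(scale=10.0, size=(M, d))       # very different client optima -> heterogeneous data

def grad(m, x):
    # Gradient of F_m(x) = 0.5 * (x - b_m)^T H (x - b_m).
    return H @ (x - b[m])

x_star = b.mean(axis=0)                       # optimum of the average objective (since H is shared)

# Gradient dissimilarity at the optimum: (1/M) sum_m ||grad F_m(x*)||^2 (note grad F(x*) = 0).
dissimilarity = np.mean([np.linalg.norm(grad(m, x_star)) ** 2 for m in range(M)])

def local_update(m, x):
    # K steps of local gradient descent on client m, starting from x.
    for _ in range(K):
        x = x - lr * grad(m, x)
    return x

# Informal proxy for the "average drift at optimum": how far the averaged model moves
# when every client starts local training from x*. With a shared Hessian this is ~0.
avg_drift = np.linalg.norm(np.mean([local_update(m, x_star) for m in range(M)], axis=0) - x_star)

# Plain FedAvg from a random initial point.
x = rng.normal(size=d)
for _ in range(rounds):
    x = np.mean([local_update(m, x) for m in range(M)], axis=0)

print(f"gradient dissimilarity at optimum : {dissimilarity:.3f}")               # large
print(f"average drift at optimum          : {avg_drift:.3e}")                   # ~0
print(f"FedAvg distance to optimum        : {np.linalg.norm(x - x_star):.3e}")  # ~0

With these settings the printed dissimilarity is large while the average drift at optimum and the final FedAvg error sit at numerical zero, which is the regime the abstract describes: large gradient dissimilarity with no adverse effect on convergence.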
Related papers
- A New Theoretical Perspective on Data Heterogeneity in Federated Optimization [39.75009345804017]
In federated learning (FL), data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence rate.
In particular, for many FL algorithms, the convergence rate grows dramatically when the number of local updates becomes large.
This paper aims to bridge this gap between theoretical understanding and practical performance by providing a theoretical analysis from a new perspective.
arXiv Detail & Related papers (2024-07-22T11:52:58Z)
- Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity [54.145730036889496]
This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity.
A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages robust aggregation and can freely select the number of local update rounds.
arXiv Detail & Related papers (2024-03-20T08:15:08Z)
- Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
arXiv Detail & Related papers (2023-12-25T20:02:51Z)
- Exact nonlinear state estimation [0.0]
The majority of data assimilation methods in the geosciences are based on Gaussian assumptions.
Non-parametric, particle-based DA algorithms have superior accuracy, but their application to high-dimensional models still poses operational challenges.
This article introduces a new nonlinear estimation theory which attempts to bridge the existing gap in DA methodology.
arXiv Detail & Related papers (2023-10-17T03:44:29Z)
- Federated Conformal Predictors for Distributed Uncertainty Quantification [83.50609351513886]
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
arXiv Detail & Related papers (2023-05-27T19:57:27Z)
- A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex Models and Heterogeneous Data [0.261072980439312]
We propose a unified momentum-based paradigm (UMP) with two decentralized SGD variants, which provides convergence guarantees for general non-convex objectives.
In theory, we provide the convergence analysis of both approaches for these non-convex models.
arXiv Detail & Related papers (2023-03-01T02:13:22Z)
- A Convergence Theory for Federated Average: Beyond Smoothness [28.074273047592065]
Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing.
As a leading algorithm in this setting, Federated Averaging (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices, has been widely used.
This paper provides a theoretical convergence study on Federated Learning.
arXiv Detail & Related papers (2022-11-03T04:50:49Z)
- Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points [19.891597817559038]
Federated Learning (FL) is a promising framework with great potential for privacy preservation and for lowering the computation load at the cloud.
Recent work raised concerns about two popular FL methods, FedAvg and FedProx: (1) their fixed points do not correspond to stationary points of the original optimization problem, and (2) the common model found might not generalize well locally.
We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates.
arXiv Detail & Related papers (2021-06-29T09:59:43Z)
- A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg [49.76940694847521]
Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data.
In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today.
We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies.
arXiv Detail & Related papers (2020-07-11T05:59:08Z)
- FedDANE: A Federated Newton-Type Method [49.9423212899788]
Federated learning aims to jointly learn statistical models over massively distributed datasets.
We propose FedDANE, an optimization method that we adapt from DANE, a classical distributed optimization method, to handle federated learning.
arXiv Detail & Related papers (2020-01-07T07:44:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.