On Convergence of FedProx: Local Dissimilarity Invariant Bounds,
Non-smoothness and Beyond
- URL: http://arxiv.org/abs/2206.05187v1
- Date: Fri, 10 Jun 2022 15:35:10 GMT
- Title: On Convergence of FedProx: Local Dissimilarity Invariant Bounds,
Non-smoothness and Beyond
- Authors: Xiao-Tong Yuan and Ping Li
- Abstract summary: We develop a novel local dissimilarity invariant convergence theory for FedProx and its minibatch stochastic extension through the lens of algorithmic stability.
Preliminary experimental results on a series of benchmark FL datasets are reported to demonstrate the benefit of minibatching for improving the sample efficiency of FedProx.
- Score: 41.082982732100696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The FedProx algorithm is a simple yet powerful distributed proximal point
optimization method widely used for federated learning (FL) over heterogeneous
data. Despite its popularity and remarkable success witnessed in practice, the
theoretical understanding of FedProx is largely underinvestigated: the
appealing convergence behavior of FedProx is so far characterized under certain
non-standard and unrealistic dissimilarity assumptions of local functions, and
the results are limited to smooth optimization problems. In order to remedy
these deficiencies, we develop a novel local dissimilarity invariant
convergence theory for FedProx and its minibatch stochastic extension through
the lens of algorithmic stability. As a result, we derive several new and
deeper insights into FedProx for non-convex federated optimization, including:
1) convergence guarantees independent of local-dissimilarity-type conditions;
2) convergence guarantees for non-smooth FL problems; and 3) linear speedup
with respect to the minibatch size and the number of sampled devices. Our
theory reveals, for the first time, that local dissimilarity and smoothness are
not prerequisites for FedProx to attain favorable complexity bounds. Preliminary
experimental results on a series of benchmark FL datasets are reported to
demonstrate the benefit of minibatching for improving the sample efficiency of
FedProx.
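To make the proximal-point structure described above concrete, the following is a minimal Python sketch of one FedProx-style communication round. It is an illustrative reconstruction under stated assumptions (a hypothetical per-client gradient callback grad_fk for the local loss F_k, an inexact inner solver based on plain gradient descent, and a fixed proximal weight mu), not the authors' implementation or experimental code.

```python
# Illustrative FedProx-style round (sketch only; names and solver are assumptions).
import numpy as np

def local_prox_update(w_global, grad_fk, mu=0.1, lr=0.01, steps=50):
    """Approximately minimize F_k(w) + (mu/2) * ||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(steps):
        # Gradient of the proximal objective: grad F_k(w) + mu * (w - w_global).
        w -= lr * (grad_fk(w) + mu * (w - w_global))
    return w

def fedprox_round(w_global, sampled_client_grads, mu=0.1):
    """One round: sampled clients inexactly solve their proximal subproblems,
    then the server averages the returned local models."""
    local_models = [local_prox_update(w_global, g, mu=mu)
                    for g in sampled_client_grads]
    return np.mean(local_models, axis=0)
```

The minibatch stochastic extension analyzed in the paper would, under this reading, replace grad_fk with a minibatch gradient estimate on each sampled client; the sketch keeps full local gradients only for brevity.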
Related papers
- Tighter Performance Theory of FedExProx [85.92481138826949]
We revisit FedExProx - a recently proposed distributed optimization method designed to enhance convergence properties of parallel algorithms via extrapolation.
We develop a novel analysis framework, establishing a tighter linear convergence rate for non-strongly convex quadratic problems.
We extend the applicability of our analysis to general functions satisfying the Polyak-Lojasiewicz condition, improving upon the previous strongly convex analysis.
arXiv Detail & Related papers (2024-10-20T11:53:25Z)
- Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets [23.640506243685863]
In practical federated learning (FL), the presence of malicious attacks and data heterogeneity often introduces biases into the learning process.
We propose a Normalized Gradients Aggregation method (Fed-NGA) which normalizes the uploaded local gradients to unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$.
arXiv Detail & Related papers (2024-08-18T16:50:39Z)
- FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices.
We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency, and competitive communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z)
- Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points [19.891597817559038]
Federated Learning (FL) is a promising framework that has great potential for privacy preservation and for lowering the computation load at the cloud.
Recent work raised concerns about two popular methods, FedAvg and FedProx: (1) their fixed points do not correspond to stationary points of the original optimization problem, and (2) the common model found might not generalize well locally.
We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates.
arXiv Detail & Related papers (2021-06-29T09:59:43Z)
- Distributionally Robust Federated Averaging [19.875176871167966]
We present communication-efficient distributed algorithms for distributionally robust federated learning, based on periodic averaging with adaptive sampling.
We give corroborating experimental evidence for our theoretical results in federated learning settings.
arXiv Detail & Related papers (2021-02-25T03:32:09Z)
- A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg [49.76940694847521]
Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data.
In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today.
We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies.
arXiv Detail & Related papers (2020-07-11T05:59:08Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
- FedDANE: A Federated Newton-Type Method [49.9423212899788]
Federated learning aims to jointly learn statistical models over massively distributed datasets.
We propose FedDANE, an optimization method adapted from the classical distributed method DANE, to handle federated learning; a hedged sketch of a DANE-style local subproblem is given after this list.
arXiv Detail & Related papers (2020-01-07T07:44:41Z)
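The following is a minimal Python sketch of the kind of gradient-corrected proximal subproblem that DANE-style methods such as FedDANE solve locally each round. The function names, the penalty weight mu, and the inner gradient-descent solver are illustrative assumptions, not the paper's algorithm verbatim.

```python
# DANE-style local subproblem (illustrative sketch; not FedDANE's official code).
# Each sampled client k approximately solves
#   min_w  F_k(w) - <grad F_k(w_t) - g_t, w> + (mu/2) * ||w - w_t||^2,
# where g_t is the gradient aggregated over the sampled clients at round t.
import numpy as np

def dane_style_local_update(w_t, g_t, grad_fk, mu=0.1, lr=0.01, steps=50):
    correction = grad_fk(w_t) - g_t  # local-vs-aggregated gradient correction
    w = w_t.copy()
    for _ in range(steps):
        # Gradient of the corrected proximal objective.
        w -= lr * (grad_fk(w) - correction + mu * (w - w_t))
    return w

def feddane_style_round(w_t, client_grads, mu=0.1):
    """One round: aggregate client gradients at w_t, then average the clients'
    approximate solutions of their corrected proximal subproblems."""
    g_t = np.mean([g(w_t) for g in client_grads], axis=0)
    local_models = [dane_style_local_update(w_t, g_t, g, mu=mu)
                    for g in client_grads]
    return np.mean(local_models, axis=0)
```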