DFedADMM: Dual Constraints Controlled Model Inconsistency for
Decentralized Federated Learning
- URL: http://arxiv.org/abs/2308.08290v1
- Date: Wed, 16 Aug 2023 11:22:36 GMT
- Authors: Qinglun Li, Li Shen, Guanghao Li, Quanjun Yin, Dacheng Tao
- Abstract summary: Decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network.
Existing DFL methods still suffer from two major challenges: local inconsistency and local heterogeneous overfitting.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To address the communication burden issues associated with federated learning
(FL), decentralized federated learning (DFL) discards the central server and
establishes a decentralized communication network, where each client
communicates only with neighboring clients. However, existing DFL methods still
suffer from two major challenges: local inconsistency and local heterogeneous
overfitting, which have not been fundamentally addressed. To tackle these
issues, we propose two novel DFL algorithms, DFedADMM and its enhanced version
DFedADMM-SAM. The
DFedADMM algorithm employs primal-dual optimization (ADMM) by utilizing dual
variables to control the model inconsistency raised from the decentralized
heterogeneous data distributions. The DFedADMM-SAM algorithm further improves
on DFedADMM by employing a Sharpness-Aware Minimization (SAM) optimizer, which
uses gradient perturbations to generate locally flat models and searches for
models with uniformly low loss values to mitigate local heterogeneous
overfitting. Theoretically, we derive convergence rates of $\small
\mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^2}\Big)$ and $\small
\mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^2}+
\frac{1}{T^{3/2}K^{1/2}}\Big)$ in the non-convex setting for DFedADMM and
DFedADMM-SAM, respectively, where $1 - \psi$ represents the spectral gap of the
gossip matrix. Empirically, extensive experiments on MNIST, CIFAR10 and
CIFAR100 datasets demonstrate that our algorithms exhibit superior performance
in terms of both generalization and convergence speed compared to existing
state-of-the-art (SOTA) optimizers in DFL.
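For intuition, here is a minimal numpy sketch of the kind of update the abstract describes: an ADMM-style local step in which a dual variable penalizes the drift between the local model and the neighborhood consensus, with an optional SAM perturbation. The quadratic-penalty form, the constants, and all names are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def local_step(x, z, lam, grad_f, lr=0.05, rho=0.1, sam_radius=None):
    """One illustrative DFedADMM(-SAM)-style local update.

    x: local model; z: average of the neighboring models (consensus target);
    lam: dual variable that accumulates, and penalizes, the drift x - z.
    The quadratic penalty form and all constants here are assumptions.
    """
    g = grad_f(x)
    if sam_radius is not None:
        # SAM: take the gradient at a perturbed point in the ascent
        # direction, steering the client toward uniformly low-loss regions.
        g = grad_f(x + sam_radius * g / (np.linalg.norm(g) + 1e-12))
    # Gradient of the augmented Lagrangian
    #   f(x) + <lam, x - z> + (rho/2) * ||x - z||^2.
    x = x - lr * (g + lam + rho * (x - z))
    lam = lam + rho * (x - z)  # dual ascent: tightens the consistency constraint
    return x, lam
```

After such a step each client would gossip its model to neighbors and recompute z, so the dual term acts as the explicit control on model inconsistency that the abstract describes.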
Related papers
- Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets
In practical federated learning (FL), malicious attacks and data heterogeneity often introduce biases into the learning process.
We propose a Federated Normalized Gradients Algorithm (Fed-NGA), which normalizes uploaded local gradients to unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$.
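To make the normalization step concrete, here is a hypothetical numpy helper in the spirit of that summary; the function name and the epsilon guard are assumptions, not the paper's code.

```python
import numpy as np

def normalized_aggregate(local_grads):
    """Average the clients' gradients after rescaling each to a unit vector.

    Normalization bounds the influence of any single (possibly Byzantine)
    client, since every uploaded update carries the same magnitude. With M
    clients and p parameters this costs O(pM). Illustrative sketch only.
    """
    units = [g / (np.linalg.norm(g) + 1e-12) for g in local_grads]
    return np.mean(units, axis=0)
```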
arXiv Detail & Related papers (2024-08-18T16:50:39Z)
- Asymmetrically Decentralized Federated Learning
Decentralized Federated Learning (DFL) has emerged, which discards the server in favor of a peer-to-peer (P2P) communication framework.
This paper proposes the DFedSGPSM algorithm, which is based on asymmetric topologies and utilizes the Push-Sum protocol.
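Push-Sum itself is a standard primitive for averaging over directed, asymmetric topologies; below is a minimal sketch of one round, assuming a column-stochastic mixing matrix P, and leaving out everything DFedSGPSM adds on top of it.

```python
import numpy as np

def push_sum_round(X, w, P):
    """One Push-Sum gossip round on a directed graph.

    X: (n, d) array, one model per node; w: (n,) push-sum weights
    (initialized to ones); P: (n, n) column-stochastic matrix where
    P[i, j] is the share node j pushes to node i. The de-biased ratio
    X / w converges to the network average even when P is asymmetric.
    """
    X, w = P @ X, P @ w
    return X, w, X / w[:, None]  # (values, weights, de-biased estimates)
```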
arXiv Detail & Related papers (2023-10-08T09:46:26Z)
- Improving the Model Consistency of Decentralized Federated Learning
Decentralized Federated Learning (DFL) discards the central server, and each client communicates only with its neighbors in a decentralized communication network.
Existing DFL suffers from inconsistency among local clients, which results in inferior performance compared to centralized FL (CFL).
We propose DFedSAM and its variant DFedSAM-MGS, where $1-\lambda$ is the spectral gap of the gossip matrix and $Q$ is the number of gossip steps.
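For intuition on why the spectral gap $1-\lambda$ appears in such bounds, here is a generic multiple-gossip-steps sketch (not DFedSAM-MGS itself), assuming a doubly stochastic mixing matrix W.

```python
import numpy as np

def multi_gossip(X, W, Q):
    """Apply Q mixing steps to the stacked client models X (one per row).

    With W doubly stochastic, disagreement across clients contracts
    roughly by lambda^Q, where lambda is W's second-largest eigenvalue
    modulus and 1 - lambda is the spectral gap: more gossip steps buy
    consistency at the price of extra communication.
    """
    for _ in range(Q):
        X = W @ X
    return X
```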
arXiv Detail & Related papers (2023-02-08T14:37:34Z)
- Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and bias of the global model.
Experiments on (semi-supervised) image classification tasks demonstrate the superiority of FedVRA over existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z)
- On the Convergence of Heterogeneous Federated Learning with Arbitrary Adaptive Online Model Pruning
We present a unifying framework for heterogeneous FL algorithms with arbitrary adaptive online model pruning.
In particular, we prove that under certain sufficient conditions, these algorithms converge to a stationary point of standard FL for general smooth cost functions.
We illuminate two key factors impacting convergence: pruning-induced noise and minimum coverage index.
arXiv Detail & Related papers (2022-01-27T20:43:38Z)
- Achieving Personalized Federated Learning with Sparse Local Models
Federated learning (FL) is vulnerable to heterogeneously distributed data.
To counter this issue, personalized FL (PFL) was proposed to produce dedicated local models for each individual user.
Existing PFL solutions either demonstrate unsatisfactory generalization towards different model architectures or cost enormous extra computation and memory.
We propose FedSpa, a novel PFL scheme that employs personalized sparse masks to customize sparse local models on the edge.
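To make the sparse-mask idea concrete, here is a hypothetical sketch; the top-k magnitude rule below is an assumption for illustration, not necessarily how FedSpa constructs its personalized masks.

```python
import numpy as np

def topk_mask(model, keep_ratio=0.2):
    """Binary mask keeping the largest-magnitude weights (assumed rule)."""
    k = max(1, int(keep_ratio * model.size))
    thresh = np.partition(np.abs(model).ravel(), -k)[-k]
    return (np.abs(model) >= thresh).astype(model.dtype)

def personalize(global_model, mask):
    # Each client trains and stores only its masked subnetwork, so
    # personalization comes with a sparse, cheaper local model.
    return global_model * mask
```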
arXiv Detail & Related papers (2022-01-27T08:43:11Z)
- FedLGA: Towards System-Heterogeneity of Federated Learning via Local Gradient Approximation
We formalize the system-heterogeneous FL problem and propose a new algorithm, called FedLGA, which addresses this problem by bridging the divergence of local model updates via local gradient approximation.
The results of comprehensive experiments on multiple datasets show that FedLGA outperforms current FL benchmarks under system heterogeneity.
arXiv Detail & Related papers (2021-12-22T16:05:09Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)