A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning
- URL: http://arxiv.org/abs/2507.06542v1
- Date: Wed, 09 Jul 2025 04:56:56 GMT
- Title: A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning
- Authors: Tongtian Zhu, Tianyu Zhang, Mingze Wang, Zhanpeng Zhou, Can Wang,
- Abstract summary: We study how communication should be scheduled over time, including determining when and how frequently devices synchronize. We find that fully connected communication at the final step, implemented by a single global merging, is sufficient to match the performance of server-based training.
- Score: 17.386971981099588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized learning provides a scalable alternative to traditional parameter-server-based training, yet its performance is often hindered by limited peer-to-peer communication. In this paper, we study how communication should be scheduled over time, including determining when and how frequently devices synchronize. Our empirical results show that concentrating communication budgets in the later stages of decentralized training markedly improves global generalization. Surprisingly, we find that fully connected communication at the final step, implemented by a single global merging, is sufficient to match the performance of server-based training. We further show that low communication in decentralized learning preserves the mergeability of local models throughout training. Our theoretical contributions, which explain these phenomena, are the first to establish that the globally merged model of decentralized SGD can converge faster than centralized mini-batch SGD. Technically, we reinterpret part of the discrepancy among local models, which was previously considered detrimental noise, as a constructive component that accelerates convergence. This work challenges the common belief that decentralized learning generalizes poorly under data heterogeneity and limited communication, while offering new insights into model merging and neural network loss landscapes.
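The communication schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the least-squares objective, the ring gossip topology, and every name below (`local_sgd_step`, `ring_gossip`, `train_with_final_merge`, `make_toy_data`) are assumptions chosen only to show the shape of the idea, namely that workers train mostly locally, communicate sparsely with neighbours, and are averaged once globally at the final step.

```python
import numpy as np


def local_sgd_step(w, X, y, lr, rng, batch_size=8):
    """One mini-batch SGD step on a least-squares loss for a single worker."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
    return w - lr * grad


def ring_gossip(models):
    """Average each worker with its two ring neighbours (sparse communication)."""
    left = np.roll(models, 1, axis=0)
    right = np.roll(models, -1, axis=0)
    return (models + left + right) / 3.0


def train_with_final_merge(data, steps=200, lr=0.02, gossip_every=50, seed=0):
    """Local SGD with infrequent ring gossip, then one global merging at the end."""
    rng = np.random.default_rng(seed)
    dim = data[0][0].shape[1]
    models = np.zeros((len(data), dim))  # one parameter vector per worker
    for t in range(1, steps + 1):
        for k, (X, y) in enumerate(data):
            models[k] = local_sgd_step(models[k], X, y, lr, rng)
        if t % gossip_every == 0:  # hypothetical low-communication schedule
            models = ring_gossip(models)
    merged = models.mean(axis=0)  # single global merging at the final step
    return models, merged


def make_toy_data(n_workers=4, n_samples=64, dim=5, seed=1):
    """Synthetic heterogeneous data: each worker sees a shifted feature mean."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    data = []
    for k in range(n_workers):
        X = rng.normal(loc=0.2 * k, size=(n_samples, dim))
        y = X @ w_true + 0.1 * rng.normal(size=n_samples)
        data.append((X, y))
    return data, w_true


if __name__ == "__main__":
    data, w_true = make_toy_data()
    models, merged = train_with_final_merge(data)
    print("distance of merged model to w_true:", np.linalg.norm(merged - w_true))
```

On a convex toy problem like this one, parameter averaging is unsurprising; the paper's claim concerns neural networks, where the mergeability of local models is the nontrivial part.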
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
- Coordination-free Decentralised Federated Learning on Complex Networks: Overcoming Heterogeneity [2.6849848612544]
Federated Learning (FL) is a framework for performing a learning task in an edge computing scenario.
We propose a communication-efficient Decentralised Federated Learning (DFL) algorithm able to cope with them.
Our solution allows devices communicating only with their direct neighbours to train an accurate model.
arXiv Detail & Related papers (2023-12-07T18:24:19Z)
- Scheduling and Communication Schemes for Decentralized Federated Learning [0.31410859223862103]
A decentralized federated learning (DFL) model with the stochastic gradient descent (SGD) algorithm has been introduced.
Three scheduling policies for DFL have been proposed for communications between the clients and the parallel servers.
Results show that the proposed scheduling policies have an impact both on the speed of convergence and on the final global model.
arXiv Detail & Related papers (2023-11-27T17:35:28Z)
- Event-Triggered Decentralized Federated Learning over Resource-Constrained Edge Devices [12.513477328344255]
Federated learning (FL) is a technique for distributed machine learning (ML).
In traditional FL algorithms, trained models at the edge are periodically sent to a central server for aggregation.
We develop a novel methodology for fully decentralized FL, where devices conduct model aggregation via cooperative consensus formation.
arXiv Detail & Related papers (2022-11-23T00:04:05Z)
- DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training [84.81043932706375]
We propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named Dis-PFL.
Dis-PFL employs personalized sparse masks to customize sparse local models on the edge.
We demonstrate that our method can easily adapt to heterogeneous local clients with varying computation complexities.
arXiv Detail & Related papers (2022-06-01T02:20:57Z)
- Decentralized Event-Triggered Federated Learning with Heterogeneous Communication Thresholds [12.513477328344255]
We propose a novel methodology for distributed model aggregations via asynchronous, event-triggered consensus iterations over a network graph topology.
We demonstrate that our methodology achieves the globally optimal learning model under standard assumptions in distributed learning and graph consensus literature.
arXiv Detail & Related papers (2022-04-07T20:35:37Z)
- Finite-Time Consensus Learning for Decentralized Optimization with Nonlinear Gossiping [77.53019031244908]
We present a novel decentralized learning framework based on nonlinear gossiping (NGO), that enjoys an appealing finite-time consensus property to achieve better synchronization.
Our analysis on how communication delay and randomized chats affect learning further enables the derivation of practical variants.
arXiv Detail & Related papers (2021-11-04T15:36:25Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with the problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- Consensus Control for Decentralized Deep Learning [72.50487751271069]
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
arXiv Detail & Related papers (2021-02-09T13:58:33Z)
- Continual Local Training for Better Initialization of Federated Models [14.289213162030816]
Federated learning (FL) refers to the learning paradigm that trains machine learning models directly in decentralized systems.
The popular FL algorithm Federated Averaging (FedAvg) suffers from weight divergence.
We propose the local continual training strategy to address this problem.
arXiv Detail & Related papers (2020-05-26T12:27:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.