Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum
- URL: http://arxiv.org/abs/2311.18578v3
- Date: Fri, 27 Jun 2025 13:40:04 GMT
- Title: Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum
- Authors: Riccardo Zaccone, Sai Praneeth Karimireddy, Carlo Masone, Marco Ciccone
- Abstract summary: Federated Learning (FL) has emerged as the state-of-the-art approach for learning from decentralized data in privacy-constrained scenarios. Despite significant research efforts, existing approaches often degrade severely due to the joint effect of heterogeneity and partial client participation. In this work, we propose a novel Generalized Heavy-Ball Momentum (GHBM). We show that GHBM substantially improves state-of-the-art performance under random uniform client sampling.
- Score: 19.473386008007942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) has emerged as the state-of-the-art approach for learning from decentralized data in privacy-constrained scenarios. However, system and statistical challenges hinder its real-world applicability, requiring efficient learning from edge devices and robustness to data heterogeneity. Despite significant research efforts, existing approaches often degrade severely due to the joint effect of heterogeneity and partial client participation. In particular, while momentum appears as a promising approach for overcoming statistical heterogeneity, in current approaches its update is biased towards the most recently sampled clients. As we show in this work, this is the reason why it fails to outperform FedAvg, preventing its effective use in real-world large-scale scenarios. In this work, we propose a novel Generalized Heavy-Ball Momentum (GHBM) and theoretically prove it enables convergence under unbounded data heterogeneity in cyclic partial participation, thereby advancing the understanding of momentum's effectiveness in FL. We then introduce adaptive and communication-efficient variants of GHBM that match the communication complexity of FedAvg in settings where clients can be stateful. Extensive experiments on vision and language tasks confirm our theoretical findings, demonstrating that GHBM substantially improves state-of-the-art performance under random uniform client sampling, particularly in large-scale settings with high data heterogeneity and low client participation. Code is available at https://rickzack.github.io/GHBM.
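To make the momentum idea in the abstract concrete, the following is a minimal sketch of classical (Polyak) heavy-ball momentum added to a FedAvg-style server update, using toy scalar clients with loss 0.5 * (theta - target)^2. It is not the paper's GHBM algorithm. The `lag` parameter, which lets the momentum term look back more than one round, is only an illustrative assumption about what a generalized heavy-ball term could look like, and all names (`local_update`, `run_rounds`, `lag`) are hypothetical.

```python
import random


def local_update(theta, target, lr=0.1, steps=5):
    """Run a few gradient steps on the toy client loss 0.5 * (theta - target)**2."""
    for _ in range(steps):
        theta -= lr * (theta - target)
    return theta


def run_rounds(targets, rounds=200, clients_per_round=2, beta=0.5, lag=1, seed=0):
    """FedAvg with a server-side heavy-ball term.

    Each round averages the sampled clients' local models and adds
    beta * (theta_t - theta_{t-lag}) / lag. With lag=1 this is classical
    heavy-ball momentum; lag > 1 is an illustrative assumption, not the
    update rule from the paper.
    """
    rng = random.Random(seed)
    history = [0.0]  # server models, oldest first
    for _ in range(rounds):
        theta = history[-1]
        # Partial participation: sample a subset of clients uniformly.
        sampled = rng.sample(range(len(targets)), clients_per_round)
        avg = sum(local_update(theta, targets[i]) for i in sampled) / clients_per_round
        # Look back `lag` rounds, falling back to the first model early on.
        past = history[max(0, len(history) - 1 - lag)]
        momentum = (theta - past) / lag
        history.append(avg + beta * momentum)
    return history


if __name__ == "__main__":
    targets = [-1.0, 0.0, 4.0]  # heterogeneous client optima, mean is 1.0
    full = run_rounds(targets, clients_per_round=3)
    partial = run_rounds(targets, clients_per_round=1)
    print("full participation, final model:", full[-1])
    print("partial participation, final model:", partial[-1])
```

With all clients participating, the server model settles at the mean of the client optima. With one client sampled per round, the iterate keeps moving as the sampled client changes, which is the partial-participation setting the abstract is concerned with.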
Related papers
- FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios [14.18492489954482]
Federated Learning (FL) enables decentralized model training while preserving data privacy. Despite its benefits, FL faces challenges with non-identically distributed (non-IID) data. We propose FedWCM, a method that dynamically adjusts momentum using global and per-round data.
arXiv Detail & Related papers (2025-07-20T14:24:57Z) - Client-Centric Federated Adaptive Optimization [78.30827455292827]
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private. We propose Federated-Centric Adaptive Optimization, which is a class of novel federated optimization approaches.
arXiv Detail & Related papers (2025-01-17T04:00:50Z) - Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation [1.519321208145928]
Federated Learning (FL) offers a promising framework for individuals aiming to collaboratively develop a shared model. Variations in data distribution among clients can profoundly affect FL methodologies, primarily due to instabilities in the aggregation process. We propose a novel FL framework, combining individual parameter pruning and regularization techniques to improve the robustness of individual clients' models to aggregate.
arXiv Detail & Related papers (2024-12-19T16:22:37Z) - Towards Efficient Model-Heterogeneity Federated Learning for Large Models [18.008063521900702]
We introduce HeteroTune, an innovative fine-tuning framework tailored for model-heterogeneity federated learning (MHFL).
In particular, we propose a novel parameter-efficient fine-tuning structure, called FedAdapter, which employs a multi-branch cross-model aggregator.
Benefiting from the lightweight FedAdapter, our approach significantly reduces both the computational and communication overhead.
arXiv Detail & Related papers (2024-11-25T09:58:51Z) - Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability [23.466997173249034]
FedAPM includes novel structures that (i) compensate for missed computations due to unavailability with only $O(1)$ additional memory and computation with respect to standard FedAvg.
We show that FedAPM converges to a stationary point despite non-stationary dynamics.
arXiv Detail & Related papers (2024-09-26T00:38:18Z) - DynamicFL: Federated Learning with Dynamic Communication Resource Allocation [34.97472382870816]
Federated Learning (FL) is a collaborative machine learning framework that allows multiple users to train models utilizing their local data in a distributed manner.
We introduce DynamicFL, a new FL framework that investigates the trade-offs between global model performance and communication costs.
We show that DynamicFL surpasses current state-of-the-art methods with up to a 10% increase in model accuracy.
arXiv Detail & Related papers (2024-09-08T05:53:32Z) - Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z) - FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH (Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client selection algorithm.
It outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity.
It achieves substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z) - Take History as a Mirror in Heterogeneous Federated Learning [9.187993085263209]
Federated Learning (FL) allows several clients to cooperatively train machine learning models without disclosing the raw data.
In this work, we propose a novel asynchronous FL framework called Federated Historical Learning (FedHist).
FedHist effectively addresses the challenges posed by both Non-IID data and gradient staleness.
arXiv Detail & Related papers (2023-12-16T11:40:49Z) - Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the orchestration of a parameter server (PS).
In this paper, we propose a novel federated primal-dual algorithm with model sparsification for non-convex and non-smooth FL problems.
Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z) - Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment in deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Momentum Benefits Non-IID Federated Learning Simply and Provably [22.800862422479913]
Federated learning is a powerful paradigm for large-scale machine learning.
FedAvg and SCAFFOLD are two prominent algorithms to address these challenges.
This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD.
arXiv Detail & Related papers (2023-06-28T18:52:27Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - Adaptive Federated Learning via New Entropy Approach [14.595709494370372]
Federated Learning (FL) has emerged as a prominent distributed machine learning framework.
In this paper, we propose an adaptive FEDerated learning algorithm based on ENTropy theory (FedEnt) to alleviate the parameter deviation among heterogeneous clients.
arXiv Detail & Related papers (2023-03-27T07:57:04Z) - FS-Real: Towards Real-World Cross-Device Federated Learning [60.91678132132229]
Federated Learning (FL) aims to train high-quality models in collaboration with distributed clients while not uploading their local data.
There is still a considerable gap between the flourishing FL research and real-world scenarios, mainly caused by the characteristics of heterogeneous devices and their scale.
We propose an efficient and scalable prototyping system for real-world cross-device FL, FS-Real.
arXiv Detail & Related papers (2023-03-23T15:37:17Z) - FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices.
We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competing communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z) - Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity [6.355248215478912]
Federated learning (FL) is an effective mechanism for data privacy in recommender systems by running machine learning model training on-device.
While prior FL optimizations tackled the data and system heterogeneity challenges faced by FL, they assume the two are independent of each other.
This paper takes a data-driven approach to show the inter-dependence of data and system heterogeneity in real-world data and quantifies its impact on the overall model quality and fairness.
arXiv Detail & Related papers (2022-05-30T20:59:35Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Finite-Time Consensus Learning for Decentralized Optimization with Nonlinear Gossiping [77.53019031244908]
We present a novel decentralized learning framework based on nonlinear gossiping (NGO), that enjoys an appealing finite-time consensus property to achieve better synchronization.
Our analysis on how communication delay and randomized chats affect learning further enables the derivation of practical variants.
arXiv Detail & Related papers (2021-11-04T15:36:25Z) - Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z) - Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer [72.5190560787569]
In computer vision, learning from long tailed datasets is a recurring theme, especially for natural image datasets.
Our proposal posits a meta-distributional scenario, where the data generating mechanism is invariant across the label-conditional feature distributions.
This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes.
arXiv Detail & Related papers (2020-11-25T00:13:11Z) - FedDANE: A Federated Newton-Type Method [49.9423212899788]
Federated learning aims to jointly learn statistical models over massively distributed datasets.
We propose FedDANE, an optimization method that we adapt from DANE, to handle federated learning.
arXiv Detail & Related papers (2020-01-07T07:44:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.