Related papers: Communication-Efficient Federated Learning by Exploiting Spatio-Temporal Correlations of Gradients

URL: http://arxiv.org/abs/2601.10491v1
Date: Thu, 15 Jan 2026 15:11:41 GMT
Title: Communication-Efficient Federated Learning by Exploiting Spatio-Temporal Correlations of Gradients
Authors: Shenlong Zheng, Zhen Zhang, Yuhui Deng, Geyong Min, Lin Cui
Abstract summary: GradESTC is a compression technique that exploits both spatial and temporal correlations. It significantly reduces communication overhead by transmitting lightweight combination coefficients and a limited number of updated basis vectors instead of the full gradients.
Score: 28.747595687821843
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Communication overhead is a critical challenge in federated learning, particularly in bandwidth-constrained networks. Although many methods have been proposed to reduce communication overhead, most focus solely on compressing individual gradients, overlooking the temporal correlations among them. Prior studies have shown that gradients exhibit spatial correlations, typically reflected in low-rank structures. Through empirical analysis, we further observe a strong temporal correlation between client gradients across adjacent rounds. Based on these observations, we propose GradESTC, a compression technique that exploits both spatial and temporal gradient correlations. GradESTC exploits spatial correlations to decompose each full gradient into a compact set of basis vectors and corresponding combination coefficients. By exploiting temporal correlations, only a small portion of the basis vectors need to be dynamically updated in each round. GradESTC significantly reduces communication overhead by transmitting lightweight combination coefficients and a limited number of updated basis vectors instead of the full gradients. Extensive experiments show that, upon reaching a target accuracy level near convergence, GradESTC reduces uplink communication by an average of 39.79% compared to the strongest baseline, while maintaining comparable convergence speed and final accuracy to uncompressed FedAvg. By effectively leveraging spatio-temporal gradient structures, GradESTC offers a practical and scalable solution for communication-efficient federated learning.
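The decomposition idea in the abstract can be illustrated with a small sketch. This is not the authors' GradESTC algorithm: it assumes a truncated SVD as the way to obtain basis vectors, synthetic gradients, and a basis that is reused unchanged in the next round. The names `fit_basis`, `encode`, `decode`, and `relative_error` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def fit_basis(grad, rank):
    """Return the top-`rank` left singular vectors of a 2-D gradient matrix."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]

def encode(grad, basis):
    """Combination coefficients of `grad` in an orthonormal `basis`."""
    return basis.T @ grad

def decode(basis, coeffs):
    """Rebuild an approximate gradient from the basis and its coefficients."""
    return basis @ coeffs

def relative_error(grad, approx):
    """Frobenius-norm reconstruction error relative to the true gradient."""
    return np.linalg.norm(grad - approx) / np.linalg.norm(grad)

rng = np.random.default_rng(0)
m, n, rank = 64, 32, 4  # assumed layer shape and rank, chosen for illustration

# Synthetic low-rank gradient for one round (stands in for spatial correlation).
left = rng.standard_normal((m, rank))
grad_t = left @ rng.standard_normal((rank, n))

# The next round shares the same column space plus small noise
# (stands in for temporal correlation between adjacent rounds).
grad_next = left @ rng.standard_normal((rank, n)) + 0.01 * rng.standard_normal((m, n))

basis = fit_basis(grad_t, rank)          # sent once: m * rank values
coeffs_next = encode(grad_next, basis)   # sent in the next round: rank * n values
approx_next = decode(basis, coeffs_next)

err = relative_error(grad_next, approx_next)
full_cost = m * n        # values needed to send the full gradient
reuse_cost = rank * n    # values needed when the old basis is reused
print(f"relative error: {err:.4f}, values sent: {reuse_cost} vs {full_cost}")
```

In this toy setting the second round sends only the coefficients, because the old basis still spans the new gradient well. A real scheme would also need a rule for deciding which basis vectors to refresh, which this sketch leaves out.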

Related papers

Hierarchical Federated Learning with SignSGD: A Highly Communication-Efficient Approach [16.51305515824504]
Hierarchical federated learning (HFL) has emerged as a key approach for large-scale wireless and Internet of Things systems. Methods such as sign-based gradient descent (SignSGD) offer an essential solution, but existing theory and algorithms do not naturally extend to hierarchical settings. We introduce a scalable HFL algorithm, HierSignSGD, and provide a convergence analysis for SignSGD in a hierarchical setting.
arXiv Detail & Related papers (2026-02-02T17:18:03Z)
Gradient Projection onto Historical Descent Directions for Communication-Efficient Federated Learning [0.8220217498103312]
Federated Learning (FL) enables decentralized model training across multiple clients while preserving data privacy. We introduce two algorithms: ProjFL, designed for unbiased compressors, and ProjFL+EF, for biased compressors through an Error Feedback mechanism.
arXiv Detail & Related papers (2025-11-05T13:11:30Z)
Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z)
Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata. DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation [23.649838489244917]
Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs. We propose RS-DGC, a simple yet effective dynamic gradient compression scheme that leverages a neighborhood statistics indicator for RS image interpretation. We achieve an accuracy improvement of 0.51% with more than 50 times communication compression on the NWPU-RESISC45 dataset.
arXiv Detail & Related papers (2023-12-29T09:24:26Z)
Efficient Semantic Matching with Hypercolumn Correlation [58.92933923647451]
HCCNet is an efficient yet effective semantic matching method. It exploits the full potential of multi-scale correlation maps. It eschews the reliance on expensive match-wise relationship mining on the 4D correlation map.
arXiv Detail & Related papers (2023-11-07T20:40:07Z)
Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective. It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD.
arXiv Detail & Related papers (2022-06-28T13:10:40Z)
Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server. Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression. We demonstrate that the framework achieves performance almost identical to the case with no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
Compressing gradients by exploiting temporal correlation in momentum-SGD [17.995905582226463]
We analyze compression methods that exploit temporal correlation in systems with and without error-feedback. Experiments with the ImageNet dataset demonstrate that our proposed methods offer significant reduction in the rate of communication. We prove the convergence of SGD under an expected error assumption by establishing a bound for the minimum gradient norm.
arXiv Detail & Related papers (2021-08-17T18:04:06Z)
Fast Federated Learning by Balancing Communication Trade-Offs [9.89867121050673]
Federated Learning (FL) has recently received a lot of attention for large-scale privacy-preserving machine learning. High communication overheads due to frequent gradient transmissions decelerate FL. We propose an enhanced FL scheme, namely Fast FL (FFL), that jointly and dynamically adjusts the two variables to minimize the learning error.
arXiv Detail & Related papers (2021-05-23T21:55:14Z)
Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers. We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
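Two entries in the list above refer to sign-based compression (SignSGD). A minimal sketch of the general idea follows, assuming a majority-vote aggregation rule, which is one common choice for sign-based methods and not necessarily what HierSignSGD or the rate-distortion paper uses. The function names and the toy gradients are hypothetical.

```python
import numpy as np

def sign_compress(grad):
    """Keep only the sign of each gradient entry (+1 or -1), one bit per entry."""
    return np.where(grad >= 0, 1, -1).astype(np.int8)

def majority_vote(sign_list):
    """Aggregate worker votes by the elementwise sign of their sum (ties go to +1)."""
    total = np.sum(np.stack(sign_list).astype(np.int32), axis=0)
    return np.where(total >= 0, 1, -1).astype(np.int8)

# Toy local gradients from three workers (illustrative values only).
grads = [
    np.array([0.5, -1.2, 0.1]),
    np.array([0.3, -0.4, -0.2]),
    np.array([-0.1, -0.9, 0.7]),
]

signs = [sign_compress(g) for g in grads]
update = majority_vote(signs)
print(update.tolist())  # [1, -1, 1]
```

Each worker uploads one bit per coordinate instead of a full-precision value, which is where the communication saving comes from. The aggregated direction is then scaled by a learning rate on the server side.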

This list is automatically generated from the titles and abstracts of the papers on this site.