Related papers: An Efficient Gradient-Aware Error-Bounded Lossy Compressor for Federated Learning

An Efficient Gradient-Aware Error-Bounded Lossy Compressor for Federated Learning

URL: http://arxiv.org/abs/2511.05770v1
Date: Fri, 07 Nov 2025 23:59:09 GMT
Title: An Efficient Gradient-Aware Error-Bounded Lossy Compressor for Federated Learning
Authors: Zhijing Ye, Sheng Di, Jiamin Wang, Zhiqing Zhong, Zhaorui Zhang, Xiaodong Yu,
Abstract summary: Federated learning (FL) enables collaborative model training without exposing clients' private data.<n>EBLC is particularly appealing for its fine-grained utility-compression tradeoff.<n>We propose an EBLC framework tailored for FL gradient data to achieve high compression ratios while preserving model accuracy.
Score: 7.649286962189554
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Federated learning (FL) enables collaborative model training without exposing clients' private data, but its deployment is often constrained by the communication cost of transmitting gradients between clients and the central server, especially under system heterogeneity where low-bandwidth clients bottleneck overall performance. Lossy compression of gradient data can mitigate this overhead, and error-bounded lossy compression (EBLC) is particularly appealing for its fine-grained utility-compression tradeoff. However, existing EBLC methods (e.g., SZ), originally designed for smooth scientific data with strong spatial locality, rely on generic predictors such as Lorenzo and interpolation for entropy reduction to improve compression ratio. Gradient tensors, in contrast, exhibit low smoothness and weak spatial correlation, rendering these predictors ineffective and leading to poor compression ratios. To address this limitation, we propose an EBLC framework tailored for FL gradient data to achieve high compression ratios while preserving model accuracy. The core of it is an innovative prediction mechanism that exploits temporal correlations across FL training rounds and structural regularities within convolutional kernels to reduce residual entropy. The predictor is compatible with standard quantizers and entropy coders and comprises (1) a cross-round magnitude predictor based on a normalized exponential moving average, and (2) a sign predictor that leverages gradient oscillation and kernel-level sign consistency. Experiments show that this new EBLC yields up to 1.53x higher compression ratios than SZ3 with lower accuracy loss. Integrated into a real-world FL framework, APPFL, it reduces end-to-end communication time by 76.1%-96.2% under various constrained-bandwidth scenarios, demonstrating strong scalability for real-world FL deployments.

Related papers

Gradient Projection onto Historical Descent Directions for Communication-Efficient Federated Learning [0.8220217498103312]
Federated Learning (FL) enables decentralized model training across multiple clients while preserving data privacy.<n>We introduce two algorithms: ProjFL, designed for unbiased compressors, and ProjFL+EF, for biased compressors through an Error Feedback mechanism.
arXiv Detail & Related papers (2025-11-05T13:11:30Z)
Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning [29.727339562140653]
Current data compression methods, such as sparsification in Federated Averaging (FedAvg), effectively enhance the communication efficiency of Federated Learning (FL) These methods encounter challenges such as the straggler problem and diminished model performance due to heterogeneous bandwidth and non-IID data. We introduce a bandwidth-aware compression framework for FL, aimed at improving communication efficiency while mitigating the problems associated with non-IID data.
arXiv Detail & Related papers (2024-08-27T02:28:27Z)
Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm. LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds. We show strong evidences that variable-length is beneficial for compression in FL. We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees [53.950234267704]
We introduce Global-QSGD, an All-reduce gradient-compatible quantization method.<n>We show that it accelerates distributed training by up to 3.51% over baseline quantization methods.
arXiv Detail & Related papers (2023-05-29T21:32:15Z)
Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compression for model updates. tradeoff between compression and model accuracy in the networked environment remains unclear. We present a framework to maximize the final model accuracy by strategically adjusting the compression each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server. Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression. We demonstrate that the framework achieves almost identical performance with the case that performs no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
Compressing gradients by exploiting temporal correlation in momentum-SGD [17.995905582226463]
We analyze compression methods that exploit temporal correlation in systems with and without error-feedback. Experiments with the ImageNet dataset demonstrate that our proposed methods offer significant reduction in the rate of communication. We prove the convergence of SGD under an expected error assumption by establishing a bound for the minimum gradient norm.
arXiv Detail & Related papers (2021-08-17T18:04:06Z)
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training [74.43625662170284]
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. We propose a new compression technique that leverages similarity in the gradient distribution amongst learners to provide significantly improved scalability. We experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic and provides high compression rates (65-400X) and excellent scalability (up to 64 learners and 8-12X larger batch sizes over standard training) without significant accuracy loss.
arXiv Detail & Related papers (2021-04-21T02:22:10Z)
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC) Our evaluation shows SIDCo speeds up training by up to 41:7%, 7:6%, and 1:9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.