$γ$-FedHT: Stepsize-Aware Hard-Threshold Gradient Compression in Federated Learning
- URL: http://arxiv.org/abs/2505.12479v1
- Date: Sun, 18 May 2025 15:55:50 GMT
- Title: $γ$-FedHT: Stepsize-Aware Hard-Threshold Gradient Compression in Federated Learning
- Authors: Rongwei Lu, Yutong Jiang, Jinrui Zhang, Chunyang Li, Yifei Zhu, Bin Chen, Zhi Wang
- Abstract summary: Gradient compression can effectively alleviate communication bottlenecks in Federated Learning (FL). We introduce a fundamental formulation of Error-Feedback into the theoretical framework of FL. We show that $\gamma$-FedHT improves accuracy by up to $7.42\%$ over Top-$k$ under equal communication traffic.
- Score: 15.458263187587097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient compression can effectively alleviate communication bottlenecks in Federated Learning (FL). Contemporary state-of-the-art sparse compressors, such as Top-$k$, exhibit high computational complexity, up to $\mathcal{O}(d\log_2{k})$, where $d$ is the number of model parameters. The hard-threshold compressor, which simply transmits elements with absolute values higher than a fixed threshold, is thus proposed to reduce the complexity to $\mathcal{O}(d)$. However, hard-threshold compression causes accuracy degradation in FL, where the datasets are non-IID and the stepsize $\gamma$ is decreasing for model convergence. The decaying stepsize shrinks the updates and causes the compression ratio of the hard-threshold compressor to drop rapidly to an aggressive ratio, at or below which the model accuracy has been observed to degrade severely. To address this, we propose $\gamma$-FedHT, a stepsize-aware low-cost compressor with Error-Feedback to guarantee convergence. Given that the traditional theoretical framework of FL does not consider Error-Feedback, we introduce a fundamental formulation of Error-Feedback into this framework. We prove that $\gamma$-FedHT has a convergence rate of $\mathcal{O}(\frac{1}{T})$ ($T$ representing total training iterations) under $\mu$-strongly convex cases and $\mathcal{O}(\frac{1}{\sqrt{T}})$ under non-convex cases, \textit{same as FedAVG}. Extensive experiments demonstrate that $\gamma$-FedHT improves accuracy by up to $7.42\%$ over Top-$k$ under equal communication traffic on various non-IID image datasets.
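The hard-threshold idea described in the abstract can be sketched in a few lines: keep only the coordinates whose magnitude exceeds a threshold (an elementwise $\mathcal{O}(d)$ test, versus $\mathcal{O}(d\log_2{k})$ for Top-$k$ selection) and carry the discarded mass forward via Error-Feedback. The sketch below is a generic illustration of that combination, not the paper's exact $\gamma$-FedHT algorithm (which additionally adapts the threshold to the stepsize $\gamma$); the function name and threshold value are hypothetical.

```python
import numpy as np

def hard_threshold_compress(grad, residual, threshold):
    """Hard-threshold sparsifier with Error-Feedback.

    Adds the residual carried over from the previous round, keeps
    entries whose magnitude exceeds `threshold` (O(d) elementwise
    work), and stores the discarded mass back into the residual so
    no gradient information is permanently lost.
    """
    corrected = grad + residual           # error-feedback correction
    mask = np.abs(corrected) > threshold  # O(d) elementwise test
    sent = np.where(mask, corrected, 0.0)
    new_residual = corrected - sent       # error kept for next round
    return sent, new_residual

# Toy usage: small entries are withheld and remembered, not dropped.
g = np.array([0.5, -0.01, 0.2, -0.003])
r = np.zeros_like(g)
sent, r = hard_threshold_compress(g, r, threshold=0.1)
# sent → [0.5, 0.0, 0.2, 0.0]; r → [0.0, -0.01, 0.0, -0.003]
```

Because `sent + r` always equals the corrected gradient, the compression error is fed back into later rounds, which is what makes convergence guarantees possible for such biased compressors.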
Related papers
- (Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum [7.095058159492494]
Stochastic heavy-ball momentum (SHB) is commonly used to train machine learning models and often provides empirical improvements over stochastic gradient descent.
We show that SHB can attain an accelerated rate of convergence when the mini-batch size is larger than a threshold $b^*$ that depends on the condition number $\kappa$.
arXiv Detail & Related papers (2024-01-12T18:17:28Z)
- Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost. We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z)
- Learning Low-Rank Representations for Model Compression [6.721845345130468]
We propose a Low-Rank Representation Vector Quantization ($\text{LR}^2\text{VQ}$) method that outperforms previous VQ algorithms in various tasks and architectures.
In our method, the compression ratio could be directly controlled by $m$, and the final accuracy is solely determined by $\tilde{d}$.
With a proper $\tilde{d}$, we evaluate $\text{LR}^2\text{VQ}$ with ResNet-18/ResNet-50 on the ImageNet classification dataset, achieving 2.8%/1.0% top-1 accuracy improvements.
arXiv Detail & Related papers (2022-11-21T12:15:28Z)
- Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning [78.30395044401321]
We develop a novel model-based approach to reinforcement learning (MBRL).
It relaxes the assumptions on the target transition model to belong to a generic family of mixture models.
It achieves up to a 50% reduction in wall-clock time in some continuous control environments.
arXiv Detail & Related papers (2022-06-02T17:27:49Z) - Stochastic Gradient Methods with Compressed Communication for
Decentralized Saddle Point Problems [0.0]
We develop two compression based gradient algorithms to solve a class of non-smooth strongly convex-strongly concave saddle-point problems.
Our first algorithm is a Restart-based Decentralized Proximal Gradient method with Compression (C-RDPSG) for general settings.
Next, we present a Decentralized Proximal Variance Reduced Gradient algorithm with Compression (C-DPSVRG) for finite sum setting.
arXiv Detail & Related papers (2022-05-28T15:17:19Z) - Settling the Sample Complexity of Model-Based Offline Reinforcement
Learning [50.5790774201146]
Offline reinforcement learning (RL) learns from pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, taking the method beyond the classical independent-compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Rethinking gradient sparsification as total error minimization [0.0]
Gradient compression is a widely-established remedy to tackle the communication bottleneck in the distributed training of deep neural networks (DNNs).
We argue that a rethinking of gradient sparsification, especially for DNNs, is necessary -- one that moves from per-iteration optimality to optimality over the entire training.
arXiv Detail & Related papers (2021-08-02T14:52:42Z) - CFedAvg: Achieving Efficient Communication and Fast Convergence in
Non-IID Federated Learning [8.702106020664612]
Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data.
High communication costs can arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections.
We introduce a communication-efficient framework called CFedAvg for FL with non-IID datasets, which works with SNR-constrained compressors.
arXiv Detail & Related papers (2021-06-14T04:27:19Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We prove that HSDMPG can attain a generalization bound on the order of $\mathcal{O}\big(1/\sqrt{n}\big)$, matching the intrinsic excess error of the learning model, for quadratic as well as more general loss functions.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.