Adaptive Compression in Federated Learning via Side Information
- URL: http://arxiv.org/abs/2306.12625v3
- Date: Mon, 22 Apr 2024 00:14:54 GMT
- Title: Adaptive Compression in Federated Learning via Side Information
- Authors: Berivan Isik, Francesco Pase, Deniz Gunduz, Sanmi Koyejo, Tsachy Weissman, Michele Zorzi
- Abstract summary: We propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}} || p_{\theta})$ bits of communication.
We show that our method can be integrated into many existing stochastic compression frameworks to attain the same (and often higher) test accuracy with up to $82$ times smaller bitrate than the prior work -- corresponding to 2,650 times overall compression.
- Score: 28.401993810064255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup where the server, throughout the training process, has side information in the form of a global distribution $p_{\theta}$ that is close to the clients' distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$'s and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}}|| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks to attain the same (and often higher) test accuracy with up to $82$ times smaller bitrate than the prior work -- corresponding to 2,650 times overall compression.
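To make the mechanism above concrete, the following is a minimal sketch (not the authors' exact framework) of the generic technique that achieves a communication cost of roughly $D_{KL}(q_{\phi^{(n)}} || p_{\theta})$ bits when the decoder has side information: importance sampling over shared-randomness candidates drawn from $p_{\theta}$, sometimes referred to as minimal random coding. The one-dimensional Gaussian choices for $q$ and $p$, the seed, and all variable names below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' exact algorithm) of compression with server-side
# side information: client and server share a random seed, so both can regenerate
# the same 2**num_bits candidates drawn from the server-side prior p_theta; the
# client transmits only the index of a candidate chosen by importance sampling so
# that it is approximately distributed according to its local distribution q_phi.
# One-dimensional Gaussians are used purely for illustration.

import numpy as np


def kl_gaussians(mu_q, sig_q, mu_p, sig_p):
    """KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ) in nats."""
    return np.log(sig_p / sig_q) + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5


def client_encode(seed, num_bits, mu_q, sig_q, mu_p, sig_p):
    """Client side: draw shared candidates from p_theta, pick an index via importance weights."""
    rng = np.random.default_rng(seed)                        # shared randomness
    candidates = rng.normal(mu_p, sig_p, size=2**num_bits)   # samples from p_theta
    # log importance weights: log q(z) - log p(z) for Gaussian densities
    log_w = (-(candidates - mu_q) ** 2 / (2 * sig_q**2) - np.log(sig_q)) - (
        -(candidates - mu_p) ** 2 / (2 * sig_p**2) - np.log(sig_p)
    )
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    return rng.choice(2**num_bits, p=probs)                  # only num_bits bits are sent


def server_decode(seed, num_bits, index, mu_p, sig_p):
    """Server side: regenerate the same candidates from the shared seed and read off the index."""
    rng = np.random.default_rng(seed)
    candidates = rng.normal(mu_p, sig_p, size=2**num_bits)
    return candidates[index]


if __name__ == "__main__":
    mu_q, sig_q = 0.8, 1.0   # hypothetical client-only distribution q_phi
    mu_p, sig_p = 0.0, 1.0   # hypothetical global side information p_theta at the server
    kl_bits = kl_gaussians(mu_q, sig_q, mu_p, sig_p) / np.log(2)
    num_bits = int(np.ceil(kl_bits)) + 2                     # roughly D_KL(q || p) + O(1) bits
    idx = client_encode(42, num_bits, mu_q, sig_q, mu_p, sig_p)
    sample = server_decode(42, num_bits, idx, mu_p, sig_p)
    print(f"D_KL(q||p) ~ {kl_bits:.2f} bits, index sent with {num_bits} bits, decoded sample {sample:.3f}")
```

Known channel-simulation results suggest that on the order of $D_{KL}(q || p) + \log(D_{KL}(q || p) + 1) + O(1)$ bits suffice for the transmitted index to encode an (approximate) sample from $q$; the point of the sketch is only to illustrate why the cost scales with the KL divergence to the server-side side information rather than with the entropy of $q$ itself.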
Related papers
- Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning [51.560590617691005]
We investigate whether it is possible to "squeeze more juice" out of each cohort than what is possible in a single communication round.
Our approach leads to up to a 74% reduction in the total communication cost needed to train an FL model in the cross-device setting.
arXiv Detail & Related papers (2024-06-03T08:48:49Z) - Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates [11.616782769625003]
The presence of adversarial (a.k.a. Byzantine) clients makes federated learning (FL) prone to arbitrary manipulation.
We show that the rate of improvement in learning accuracy diminishes with respect to the number of subsampled clients.
We also observe that, under a careful choice of step size, the learning error due to Byzantine clients decreases with the number of local steps.
arXiv Detail & Related papers (2024-02-20T07:40:11Z) - Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training results.
In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework.
Our experiments show that FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement over the top-performing method with less than 15% of the communication cost on Tiny-ImageNet.
arXiv Detail & Related papers (2023-08-11T09:58:47Z) - Timely Asynchronous Hierarchical Federated Learning: Age of Convergence [59.96266198512243]
We consider an asynchronous hierarchical federated learning setting with a client-edge-cloud framework.
The clients exchange the trained parameters with their corresponding edge servers, which update the locally aggregated model.
The goal of each client is to converge to the global model, while maintaining timeliness of the clients.
arXiv Detail & Related papers (2023-06-21T17:39:16Z) - Towards Bias Correction of FedAvg over Nonuniform and Time-Varying
Communications [26.597515045714502]
Federated learning (FL) is a decentralized learning framework wherein a parameter server (PS) and a collection of clients collaboratively train a model via a global objective.
We show that when the channel conditions are heterogeneous across clients and are changing over time, the FedAvg global model becomes biased; the proposed scheme postpones the global model broadcast to control the gossip-type information mixing errors.
arXiv Detail & Related papers (2023-06-01T01:52:03Z) - Privacy Amplification via Compression: Achieving the Optimal
Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation [20.909302074826666]
Privacy and communication constraints are two major bottlenecks in federated learning (FL) and analytics (FA).
We show that in order to achieve the optimal error under $(\varepsilon, \delta)$-DP, it is sufficient for each client to send $\Theta\left(n \min\left(\varepsilon, \varepsilon^2\right)\right)$ bits for FL and $\Theta\left(\log\left(n \min\left(\varepsilon, \varepsilon^2\right)\right)\right)$ bits for FA (a rough numeric illustration is given after this list).
arXiv Detail & Related papers (2023-04-04T05:37:17Z) - Federated Learning with Regularized Client Participation [1.433758865948252]
Federated Learning (FL) is a distributed machine learning approach where multiple clients work together to solve a machine learning task.
One of the key challenges in FL is the issue of partial participation, which occurs when a large number of clients are involved in the training process.
We propose a new technique and design a novel regularized client participation scheme.
arXiv Detail & Related papers (2023-02-07T18:26:07Z) - Optimizing Server-side Aggregation For Robust Federated Learning via
Subspace Training [80.03567604524268]
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning systems.
We propose SmartFL, a generic approach that optimizes the server-side aggregation process.
We provide theoretical analyses of the convergence and generalization capacity for SmartFL.
arXiv Detail & Related papers (2022-11-10T13:20:56Z) - Rate-Distortion Theoretic Bounds on Generalization Error for Distributed
Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping the other clients' algorithms uncompressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z) - A Bayesian Federated Learning Framework with Online Laplace
Approximation [144.7345013348257]
Federated learning allows multiple clients to collaboratively learn a globally shared model.
We propose a novel FL framework that uses online Laplace approximation to approximate posteriors on both the client and server side.
We achieve state-of-the-art results on several benchmarks, clearly demonstrating the advantages of the proposed method.
arXiv Detail & Related papers (2021-02-03T08:36:58Z) - Timely Communication in Federated Learning [65.1253801733098]
We consider a global learning framework in which a parameter server (PS) trains a global model by using $n$ clients without actually storing the client data centrally at a cloud server.
Under the proposed scheme, at each iteration, the PS waits for $m$ available clients and sends them the current model.
We find the average age of information experienced by each client and numerically characterize the age-optimal $m$ and $k$ values for a given $n$.
arXiv Detail & Related papers (2020-12-31T18:52:08Z)