Preserved central model for faster bidirectional compression in
distributed settings
- URL: http://arxiv.org/abs/2102.12528v1
- Date: Wed, 24 Feb 2021 19:48:20 GMT
- Title: Preserved central model for faster bidirectional compression in
distributed settings
- Authors: Constantin Philippenko and Aymeric Dieuleveut
- Abstract summary: We develop a new approach to tackle communication constraints in a distributed learning problem with a central server.
We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression.
- Score: 12.323007278097249
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a new approach to tackle communication constraints in a
distributed learning problem with a central server. We propose and analyze a
new algorithm that performs bidirectional compression and achieves the same
convergence rate as algorithms using only uplink (from the local workers to the
central server) compression. To obtain this improvement, we design MCM, an
algorithm such that the downlink compression only impacts local models, while
the global model is preserved. As a result, and contrary to previous works, the
gradients on the local workers are computed on perturbed models. Consequently,
convergence proofs are more challenging and require a precise control of this
perturbation. To ensure it, MCM additionally combines model compression with a
memory mechanism. This analysis opens new doors, e.g., incorporating worker-dependent randomized models and partial participation.
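Since the abstract describes the mechanism only at a high level, here is a minimal sketch of that pattern: the central model is never degraded by downlink compression, only the broadcast increment is compressed, a downlink memory keeps the workers' perturbed copies close to the central model, and uplink gradients are compressed against per-worker memories. The function names, the rand-k compressor, the step sizes and the exact update rules below are illustrative assumptions, not the paper's actual MCM recursion.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(x, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates, rescaled.
    A stand-in for the generic unbiased compression operators assumed here."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def bidirectional_compression_sketch(grad_fns, d, steps, lr=0.05, k=10, alpha=0.2):
    """Hypothetical loop with a preserved central model: downlink compression
    perturbs only the workers' local copies, never the server's model w."""
    n = len(grad_fns)
    w = np.zeros(d)                      # central model, kept uncompressed
    H = np.zeros(d)                      # downlink memory (server and workers)
    h = [np.zeros(d) for _ in range(n)]  # per-worker uplink memories
    for _ in range(steps):
        # Downlink: broadcast a compressed increment, not the full model.
        omega = rand_k(w - H, k, rng)
        w_hat = H + omega                # perturbed model seen by the workers
        H = H + alpha * omega            # memory update limits the perturbation
        # Uplink: each worker compresses its gradient against a memory term.
        agg = np.zeros(d)
        for i, grad in enumerate(grad_fns):
            g = grad(w_hat)              # gradient computed at the perturbed model
            delta = rand_k(g - h[i], k, rng)
            agg += h[i] + delta          # server-side estimate of the gradient
            h[i] = h[i] + alpha * delta
        w = w - lr * agg / n             # global model update stays uncompressed
    return w

# Toy usage: four quadratic objectives f_i(w) = 0.5 * ||w - c_i||^2.
d = 50
centers = [rng.standard_normal(d) for _ in range(4)]
grads = [lambda w, c=c: w - c for c in centers]
w_out = bidirectional_compression_sketch(grads, d, steps=2000)
print(np.linalg.norm(w_out - np.mean(centers, axis=0)))  # should be close to zero
```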
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback (a minimal error-feedback sketch follows this list).
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors (a permutation-compressor sketch follows this list).
We show that MARINA-P with permutation compressors can achieve a server-to-worker communication complexity improving with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z)
- Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping the other clients' algorithms uncompressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z)
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compression for model updates.
However, the tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework to maximize the final model accuracy by strategically adjusting the compression in each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
- Joint Global and Local Hierarchical Priors for Learned Image Compression [30.44884350320053]
Recently, learned image compression methods have shown superior performance compared to the traditional hand-crafted image codecs.
We propose a novel entropy model called Information Transformer (Informer) that exploits both local and global information in a content-dependent manner.
Our experiments demonstrate that Informer improves rate-distortion performance over the state-of-the-art methods on the Kodak and Tecnick datasets.
arXiv Detail & Related papers (2021-12-08T06:17:37Z)
- Slashing Communication Traffic in Federated Learning by Transmitting Clustered Model Updates [12.660500431713336]
Federated Learning (FL) is an emerging decentralized learning framework through which multiple clients can collaboratively train a learning model.
However, heavy communication traffic can be incurred by exchanging model updates between clients and the parameter server (PS) via the Internet.
In this work, we devise the Model Update Compression by Soft Clustering (MUCSC) algorithm to compress model updates transmitted between clients and the PS.
arXiv Detail & Related papers (2021-05-10T07:15:49Z)
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameter transmission.
In this paper, we propose effective model poisoning (MP) algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit (a low-rank compression sketch follows this list).
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
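For the "Differential error feedback" entry above, the sketch below illustrates plain error feedback with a top-k sparsifier: the residual discarded by compression is stored locally and added back into the next message. The paper's specific differential-quantization scheme and decentralized topology are not reproduced here; all names and parameters are illustrative.

```python
import numpy as np

def top_k(x, k):
    """Biased top-k sparsifier: keep only the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def error_feedback_step(grad, w, e, lr=0.1, k=5):
    """One error-feedback step: compress the update plus the residual left
    over from earlier rounds, transmit the compressed part, keep the rest."""
    p = lr * grad(w) + e    # add back the accumulated compression error
    c = top_k(p, k)         # the only message actually transmitted
    return w - c, p - c     # apply the message; store the new residual

# Toy usage on f(w) = 0.5 * ||w - target||^2.
rng = np.random.default_rng(1)
target = rng.standard_normal(50)
w, e = np.zeros(50), np.zeros(50)
for _ in range(500):
    w, e = error_feedback_step(lambda v: v - target, w, e)
print(np.linalg.norm(w - target))   # converges despite aggressive sparsification
```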
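For the MARINA-P entry above, the sketch below shows one standard way to build correlated downlink compressors from a shared random permutation (a Perm-K-style construction): each worker receives a different scaled coordinate block, and the average over workers reconstructs the vector exactly. This is a generic illustration of permutation compressors, not necessarily the paper's exact operator.

```python
import numpy as np

def perm_compressors(x, n_workers, rng):
    """Split the coordinates of x into n blocks via a shared random
    permutation; worker i gets only block i, scaled by n_workers."""
    d = x.size
    assert d % n_workers == 0, "sketch assumes the number of workers divides d"
    perm = rng.permutation(d)
    blocks = np.array_split(perm, n_workers)
    messages = []
    for idx in blocks:
        m = np.zeros_like(x)
        m[idx] = n_workers * x[idx]   # rescale so the worker-average is exact
        messages.append(m)
    return messages

# Sanity check: averaging the workers' messages recovers x exactly.
rng = np.random.default_rng(0)
x = rng.standard_normal(12)
msgs = perm_compressors(x, 4, rng)
assert np.allclose(np.mean(msgs, axis=0), x)
```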
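For the PowerGossip entry above, the sketch below shows generic low-rank compression of a model difference via a single power-iteration step, in the spirit of PowerSGD: only two thin factors need to be communicated. The gossip averaging and the reuse of factors across iterations described in the paper are omitted; names and shapes are illustrative.

```python
import numpy as np

def low_rank_compress(M, r, rng):
    """Approximate M by thin factors P (m x r) and Q (n x r) using one
    power-iteration step; only P and Q are transmitted."""
    Q = rng.standard_normal((M.shape[1], r))
    P = M @ Q
    P, _ = np.linalg.qr(P)       # orthonormalise the left factor
    Q = M.T @ P
    return P, Q                  # M is approximated by P @ Q.T

# Toy usage on a nearly low-rank "model difference" between two workers.
rng = np.random.default_rng(2)
U, V = rng.standard_normal((64, 4)), rng.standard_normal((32, 4))
diff = U @ V.T + 0.01 * rng.standard_normal((64, 32))
P, Q = low_rank_compress(diff, r=4, rng=rng)
print(np.linalg.norm(diff - P @ Q.T) / np.linalg.norm(diff))  # small relative error
```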
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.