Compressed and Sparse Models for Non-Convex Decentralized Learning
- URL: http://arxiv.org/abs/2311.05760v2
- Date: Wed, 5 Jun 2024 21:43:04 GMT
- Title: Compressed and Sparse Models for Non-Convex Decentralized Learning
- Authors: Andrew Campbell, Hang Liu, Leah Woldemariam, Anna Scaglione
- Abstract summary: Frequent model communication is a significant bottleneck to the efficiency of decentralized machine learning.
We present Malcom-PSGD, a novel decentralized ML algorithm that combines model sparsification with gradient compression.
Our method reduces communication costs by approximately $75\%$ when compared to the state-of-the-art.
- Score: 6.14375469212514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research highlights frequent model communication as a significant bottleneck to the efficiency of decentralized machine learning (ML), especially for large-scale and over-parameterized neural networks (NNs). To address this, we present Malcom-PSGD, a novel decentralized ML algorithm that combines gradient compression techniques with model sparsification. We promote model sparsity by adding $\ell_1$ regularization to the objective and present a decentralized proximal SGD method for training. Our approach employs vector source coding and dithering-based quantization for the compressed gradient communication of sparsified models. Our analysis demonstrates that Malcom-PSGD achieves a convergence rate of $\mathcal{O}(1/\sqrt{t})$ with respect to the iterations $t$, assuming a constant consensus and learning rate. This result is supported by our proof for the convergence of non-convex compressed Proximal SGD methods. Additionally, we conduct a bit analysis, providing a closed-form expression for the communication costs associated with Malcom-PSGD. Numerical results verify our theoretical findings and demonstrate that our method reduces communication costs by approximately $75\%$ when compared to the state-of-the-art.
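To make the ingredients above concrete, here is a minimal Python sketch of one local update in the spirit of Malcom-PSGD: gossip-style mixing with neighbors, a stochastic gradient step, the $\ell_1$ proximal map (soft-thresholding) that promotes sparsity, and subtractive dithered quantization of the transmitted difference. The function names, mixing rule, and uniform quantizer are assumptions for illustration; the paper's vector source coding and exact consensus weights are omitted.
```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (promotes model sparsity)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def dithered_quantize(x, num_levels=16, rng=None):
    """Uniform quantizer with subtractive random dither (unbiased on average)."""
    rng = rng or np.random.default_rng()
    scale = np.max(np.abs(x)) + 1e-12
    step = 2.0 * scale / num_levels
    dither = rng.uniform(-0.5, 0.5, size=x.shape)
    return (np.round(x / step + dither) - dither) * step

def local_step(x_i, neighbor_models, grad_fn, lr=0.05, l1=1e-3, mixing=0.5):
    """One illustrative proximal-SGD-style update for agent i:
    gossip averaging with neighbors, a stochastic gradient step,
    then the l1 proximal map (soft-thresholding)."""
    consensus = (1 - mixing) * x_i + mixing * np.mean(neighbor_models, axis=0)
    x_half = consensus - lr * grad_fn(x_i)
    x_new = soft_threshold(x_half, lr * l1)
    # Only a quantized (and, in the paper, source-coded) difference is shared.
    msg = dithered_quantize(x_new - x_i)
    return x_new, msg
```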
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning-enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain private user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Transfer Learning on Multi-Dimensional Data: A Novel Approach to Neural Network-Based Surrogate Modeling [0.0]
Convolutional neural networks (CNNs) have gained popularity as the basis for such surrogate models.
We propose training a CNN surrogate model on a mixture of numerical solutions to both the $d$-dimensional problem and its ($d-1$)-dimensional approximation.
We demonstrate our approach on a multiphase flow test problem, using transfer learning to train a dense fully-convolutional encoder-decoder CNN on the two classes of data.
arXiv Detail & Related papers (2024-10-16T05:07:48Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Compressed Federated Reinforcement Learning with a Generative Model [11.074080383657453]
Reinforcement learning has recently gained unprecedented popularity, yet it still grapples with sample inefficiency.
Addressing this challenge, federated reinforcement learning (FedRL) has emerged, wherein agents collaboratively learn a single policy by aggregating local estimations.
We propose CompFedRL, a communication-efficient FedRL approach incorporating both periodic aggregation and (direct/error-feedback) compression mechanisms.
arXiv Detail & Related papers (2024-03-26T15:36:47Z) - Generative Fractional Diffusion Models [53.36835573822926]
We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics.
Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID.
arXiv Detail & Related papers (2023-10-26T17:53:24Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective.
It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD.
arXiv Detail & Related papers (2022-06-28T13:10:40Z) - Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, moving beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - A Distributed Training Algorithm of Generative Adversarial Networks with Quantized Gradients [8.202072658184166]
We propose DQGAN, a distributed GAN training algorithm with quantized gradients, which is the first distributed training method with quantized gradients for GANs.
The method builds on the single-machine Optimistic Mirror Descent (OMD) algorithm and is applicable to any gradient compression scheme that satisfies a general $\delta$-approximate compressor condition.
Theoretically, we establish the non-asymptotic convergence of DQGAN algorithm to first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the
arXiv Detail & Related papers (2020-10-26T06:06:43Z) - rTop-k: A Statistical Estimation Approach to Distributed SGD [5.197307534263253]
We propose a simple statistical estimation model for gradients that captures their sparsity and yields a statistically optimal communication scheme.
Through extensive experiments on image and language domains with the CIFAR-10, ImageNet, and Penn Treebank datasets, we show that a skewed application of top-k and random-k sparsification consistently and significantly outperforms either method applied alone (a minimal sketch of this skewed selection follows after this list).
arXiv Detail & Related papers (2020-05-21T16:27:46Z)
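As referenced in the rTop-k entry above, the following minimal Python sketch shows the skewed combination of top-k and random-k sparsification: restrict the gradient to its largest-magnitude coordinates first, then keep a random subset of them. The parameter names and the absence of any rescaling are assumptions of this sketch, not the paper's exact estimator.
```python
import numpy as np

def rtop_k(grad, r, k, rng=None):
    """Illustrative rTop-k-style sparsifier: restrict to the r largest-magnitude
    coordinates, then keep a uniformly random subset of k of them."""
    rng = rng or np.random.default_rng()
    flat = grad.ravel()
    top_r = np.argpartition(np.abs(flat), -r)[-r:]     # indices of the r largest entries
    chosen = rng.choice(top_r, size=k, replace=False)  # random k of those
    sparse = np.zeros_like(flat)
    sparse[chosen] = flat[chosen]
    return sparse.reshape(grad.shape)
```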
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.