Truncated Non-Uniform Quantization for Distributed SGD
- URL: http://arxiv.org/abs/2402.01160v1
- Date: Fri, 2 Feb 2024 05:59:48 GMT
- Title: Truncated Non-Uniform Quantization for Distributed SGD
- Authors: Guangfeng Yan, Tan Li, Yuanzhang Xiao, Congduan Li and Linqi Song
- Abstract summary: We introduce a novel two-stage quantization strategy to enhance the communication efficiency of distributed Stochastic Gradient Descent (SGD).
The proposed method initially employs truncation to mitigate the impact of long-tail noise, followed by a non-uniform quantization of the post-truncation gradients based on their statistical characteristics.
Our proposed algorithm outperforms existing quantization schemes, striking a superior balance between communication efficiency and convergence performance.
- Score: 17.30572818507568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the communication bottleneck challenge in distributed learning,
our work introduces a novel two-stage quantization strategy designed to enhance
the communication efficiency of distributed Stochastic Gradient Descent (SGD).
The proposed method initially employs truncation to mitigate the impact of
long-tail noise, followed by a non-uniform quantization of the post-truncation
gradients based on their statistical characteristics. We provide a
comprehensive convergence analysis of the quantized distributed SGD,
establishing theoretical guarantees for its performance. Furthermore, by
minimizing the convergence error, we derive optimal closed-form solutions for
the truncation threshold and non-uniform quantization levels under given
communication constraints. Both theoretical insights and extensive experimental
evaluations demonstrate that our proposed algorithm outperforms existing
quantization schemes, striking a superior balance between communication
efficiency and convergence performance.
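The two-stage idea in the abstract (truncate first, then quantize the truncated gradient on non-uniform levels) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the threshold `alpha`, the hand-picked `levels`, the use of unbiased stochastic rounding, and all function names are assumptions made for the example. The paper derives its own closed-form threshold and levels, which are not reproduced here.

```python
import numpy as np

def truncate(grad, alpha):
    """Stage 1: clip every gradient entry to [-alpha, alpha]."""
    return np.clip(grad, -alpha, alpha)

def stochastic_quantize(values, levels, rng):
    """Stage 2: unbiased stochastic rounding onto sorted, possibly non-uniform levels.

    Assumes every value lies within [levels[0], levels[-1]].
    """
    levels = np.asarray(levels, dtype=float)
    # Index of the level just below (or equal to) each value.
    idx = np.clip(np.searchsorted(levels, values, side="right") - 1,
                  0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    # Round up with probability proportional to the distance from the lower level,
    # so the expected output equals the input.
    p_up = (values - lo) / (hi - lo)
    round_up = rng.random(values.shape) < p_up
    return np.where(round_up, hi, lo)

def truncated_nonuniform_quantize(grad, alpha, levels, rng):
    """Truncate, then quantize the post-truncation gradient."""
    return stochastic_quantize(truncate(grad, alpha), levels, rng)

rng = np.random.default_rng(0)
# Toy heavy-tailed "gradient" (Student-t); purely illustrative data.
grad = rng.standard_t(df=3, size=10_000) * 0.01
alpha = 0.03  # hypothetical truncation threshold
# Hypothetical levels, denser near zero where most entries concentrate.
levels = alpha * np.array([-1.0, -0.5, -0.2, -0.05, 0.0, 0.05, 0.2, 0.5, 1.0])

quantized = truncated_nonuniform_quantize(grad, alpha, levels, rng)
print("distinct values sent:", np.unique(quantized).size)
print("mean error vs truncated gradient:",
      abs(quantized.mean() - truncate(grad, alpha).mean()))
```

In this sketch truncation introduces a bounded bias by discarding the tail, while the stochastic rounding step adds variance but no further bias. That bias-variance split is the trade-off a choice of threshold and levels has to balance.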
Related papers
- Clipped Uniform Quantizers for Communication-Efficient Federated Learning [3.38220960870904]
This paper introduces an approach to employ clipped uniform quantization in federated learning settings.
By employing optimal clipping thresholds and adaptive quantization schemes, our method significantly curtails the bit requirements for model weight transmissions.
arXiv Detail & Related papers (2024-05-22T05:48:25Z) - Quantization Avoids Saddle Points in Distributed Optimization [1.579622195923387]
Distributed nonconvex optimization underpins key functionalities of numerous distributed systems.
The aim of this paper is to prove that quantization can effectively escape saddle points and ensure convergence to a second-order stationary point.
With an easily adjustable quantization, the approach allows a user to aggressively reduce communication overhead.
arXiv Detail & Related papers (2024-03-15T15:58:20Z) - Rethinking Clustered Federated Learning in NOMA Enhanced Wireless
Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented.
Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Improved Quantization Strategies for Managing Heavy-tailed Gradients in
Distributed Learning [20.91559450517002]
It is observed that gradient distributions are heavy-tailed, with outliers significantly influencing the design of compression strategies.
Existing parameter quantization methods experience performance degradation when this heavy-tailed feature is ignored.
We introduce a novel compression scheme specifically engineered for heavy-tailed gradients, which effectively combines truncation with quantization.
arXiv Detail & Related papers (2024-02-02T06:14:31Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Near-Term Distributed Quantum Computation using Mean-Field Corrections
and Auxiliary Qubits [77.04894470683776]
We propose schemes for near-term distributed quantum computing that involve limited information transfer and conservative entanglement production.
We build upon these concepts to produce an approximate circuit-cutting technique for the fragmented pre-training of variational quantum algorithms.
arXiv Detail & Related papers (2023-09-11T18:00:00Z) - Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z) - Symmetry Regularization and Saturating Nonlinearity for Robust
Quantization [5.1779694507922835]
We present three insights to robustify a network against quantization.
We propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL).
Applying the proposed methods during training can enhance the robustness of arbitrary neural networks against quantization.
arXiv Detail & Related papers (2022-07-31T02:12:28Z) - Wireless Quantized Federated Learning: A Joint Computation and
Communication Design [36.35684767732552]
In this paper, we aim to minimize the total convergence time of FL, by quantizing the local model parameters prior to uplink transmission.
We jointly optimize the computing, communication resources and number of quantization bits, in order to guarantee minimized convergence time across all global rounds.
arXiv Detail & Related papers (2022-03-11T12:30:08Z) - Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.