Truncated Non-Uniform Quantization for Distributed SGD
- URL: http://arxiv.org/abs/2402.01160v1
- Date: Fri, 2 Feb 2024 05:59:48 GMT
- Title: Truncated Non-Uniform Quantization for Distributed SGD
- Authors: Guangfeng Yan, Tan Li, Yuanzhang Xiao, Congduan Li and Linqi Song
- Abstract summary: We introduce a novel two-stage quantization strategy to enhance the communication efficiency of distributed gradient Descent (SGD)
The proposed method initially employs truncation to mitigate the impact of long-tail noise, followed by a non-uniform quantization of the post-truncation gradients based on their statistical characteristics.
Our proposed algorithm outperforms existing quantization schemes, striking a superior balance between communication efficiency and convergence performance.
- Score: 17.30572818507568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the communication bottleneck challenge in distributed learning,
our work introduces a novel two-stage quantization strategy designed to enhance
the communication efficiency of distributed Stochastic Gradient Descent (SGD).
The proposed method initially employs truncation to mitigate the impact of
long-tail noise, followed by a non-uniform quantization of the post-truncation
gradients based on their statistical characteristics. We provide a
comprehensive convergence analysis of the quantized distributed SGD,
establishing theoretical guarantees for its performance. Furthermore, by
minimizing the convergence error, we derive optimal closed-form solutions for
the truncation threshold and non-uniform quantization levels under given
communication constraints. Both theoretical insights and extensive experimental
evaluations demonstrate that our proposed algorithm outperforms existing
quantization schemes, striking a superior balance between communication
efficiency and convergence performance.
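To make the two-stage idea concrete, below is a minimal Python sketch of truncation followed by non-uniform quantization. It is not the authors' exact scheme: the truncation threshold is supplied by the caller rather than derived from the closed-form solution, and the cubic level spacing is a hypothetical stand-in for the statistically optimized levels the paper derives.

```python
import numpy as np

def truncated_nonuniform_quantize(grad, threshold, num_levels):
    """Two-stage compression sketch: clip each gradient entry to
    [-threshold, threshold], then snap it to the nearest of a fixed,
    non-uniform set of quantization levels."""
    # Stage 1: truncation suppresses the long-tail noise.
    clipped = np.clip(grad, -threshold, threshold)

    # Stage 2: non-uniform levels, denser near zero where most of the
    # post-truncation gradient mass lies. The paper derives the optimal
    # levels from the gradient statistics; the cubic spacing below is
    # only an illustrative stand-in.
    t = np.linspace(-1.0, 1.0, num_levels)
    levels = threshold * np.sign(t) * np.abs(t) ** 3

    # Nearest-level assignment; the integer codes are what a worker sends.
    codes = np.abs(clipped[:, None] - levels[None, :]).argmin(axis=1)
    return levels[codes], codes

# Example: 4-bit (16-level) compression of a heavy-tailed synthetic gradient.
rng = np.random.default_rng(0)
g = rng.standard_t(df=3, size=1000)
g_hat, codes = truncated_nonuniform_quantize(g, threshold=3.0, num_levels=16)
print("quantization MSE:", np.mean((g - g_hat) ** 2))
```

In a real distributed-SGD pipeline the quantizer would typically use unbiased stochastic rounding between neighboring levels so that the aggregated gradient remains an unbiased estimate.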
Related papers
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into optimal quantization strategy.
Experimental results demonstrate that our method compresses the memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
- QT-DoG: Quantization-aware Training for Domain Generalization [58.439816306817306]
We propose Quantization-aware Training for Domain Generalization (QT-DoG).
QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
We demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms.
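As a rough illustration of quantization acting as an implicit regularizer, the sketch below fake-quantizes weights in the forward pass so that rounding perturbs them like injected noise. The min-max uniform quantizer and the function name are assumptions for illustration, not QT-DoG's actual procedure.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Quantize weights onto 2**num_bits uniform levels and immediately
    dequantize. The rounding error behaves like weight noise, which is
    the implicit regularization effect attributed to quantization-aware
    training."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (2 ** num_bits - 1) + 1e-12   # avoid divide-by-zero
    return np.round((w - lo) / scale) * scale + lo

# In quantization-aware training, the forward pass would use the
# fake-quantized weights while the optimizer updates the float copy.
w = np.random.default_rng(1).normal(size=(4, 4))
print("max induced perturbation:", np.abs(w - fake_quantize(w, num_bits=4)).max())
```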
arXiv Detail & Related papers (2024-10-08T13:21:48Z)
- Clipped Uniform Quantizers for Communication-Efficient Federated Learning [3.38220960870904]
This paper introduces an approach to employ clipped uniform quantization in federated learning settings.
By employing optimal clipping thresholds and adaptive quantization schemes, our method significantly curtails the bit requirements for model weight transmissions.
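A minimal sketch of clipped uniform quantization is shown below, assuming a caller-chosen clipping threshold; the paper's contribution lies in choosing that threshold optimally and adapting the scheme, which this toy version does not reproduce.

```python
import numpy as np

def clipped_uniform_quantize(w, clip, num_bits):
    """Clip weights to [-clip, clip], then quantize them on a uniform grid
    with 2**num_bits levels. Only the integer codes (plus clip and
    num_bits) need to be transmitted to the server."""
    step = 2.0 * clip / (2 ** num_bits - 1)
    codes = np.round((np.clip(w, -clip, clip) + clip) / step).astype(np.int64)
    return codes * step - clip        # server-side dequantization

w = np.random.default_rng(2).normal(scale=0.1, size=10_000)
w_hat = clipped_uniform_quantize(w, clip=3 * w.std(), num_bits=4)
print("per-weight distortion:", np.mean((w - w_hat) ** 2))
```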
arXiv Detail & Related papers (2024-05-22T05:48:25Z)
- Quantization Avoids Saddle Points in Distributed Optimization [1.579622195923387]
Distributed nonconvex optimization underpins key functionalities of numerous distributed systems.
The aim of this paper is to prove that quantization can enable the iterates to effectively escape saddle points and converge to a second-order stationary point.
With an easily adjustable quantization granularity, the approach allows a user to aggressively reduce the communication overhead.
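The operator below is a standard unbiased stochastic quantizer with a tunable grid spacing, included only to illustrate the kind of adjustable, randomness-injecting quantization analyzed in this line of work; it is not the paper's specific scheme.

```python
import numpy as np

def stochastic_quantize(x, step, rng=None):
    """Unbiased stochastic rounding onto a grid of spacing `step`: each
    entry is rounded up or down with probabilities chosen so that
    E[q(x)] = x. The injected randomness is the kind of perturbation that
    helps iterates leave saddle points; `step` is the user-tunable
    granularity trading accuracy against communication cost."""
    rng = rng or np.random.default_rng()
    lower = np.floor(x / step) * step
    p_up = (x - lower) / step                  # probability of rounding up
    return lower + step * (rng.random(x.shape) < p_up)

x = np.array([0.23, -1.71, 0.05])
print(stochastic_quantize(x, step=0.5))        # each entry lands on the 0.5-spaced grid
```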
arXiv Detail & Related papers (2024-03-15T15:58:20Z)
- Rethinking Clustered Federated Learning in NOMA Enhanced Wireless Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-orthogonal multiple access (NOMA) under non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented.
Solutions to address the challenges posed by non-IID conditions are proposed based on the analysis of these properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z)
- Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning [20.91559450517002]
It is observed that gradient distributions are heavy-tailed, with outliers significantly influencing the design of compression strategies.
Existing parameter quantization methods experience performance degradation when this heavy-tailed feature is ignored.
We introduce a novel compression scheme specifically engineered for heavy-tailed gradients, which effectively combines truncation with quantization.
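A small numerical illustration (not the paper's method) of why heavy tails matter for quantizer design: a handful of outliers inflate the range a naive quantizer must cover, which is exactly what truncating before quantizing avoids. The Student-t sample is an assumed stand-in for real gradient noise.

```python
import numpy as np

# Compare the dynamic range a quantizer must cover for light- versus
# heavy-tailed "gradients". A few outliers in the heavy-tailed case force a
# much coarser step size unless the values are truncated first.
rng = np.random.default_rng(3)
samples = {
    "gaussian": rng.normal(size=100_000),
    "heavy-tailed": rng.standard_t(df=2, size=100_000),
}
for name, g in samples.items():
    print(f"{name:>12}: std={g.std():.2f}  max|g|={np.abs(g).max():.1f}")
# Truncating the heavy-tailed sample at a few standard deviations keeps the
# quantization step small for the bulk of the entries, which is the
# truncation-plus-quantization combination engineered here.
```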
arXiv Detail & Related papers (2024-02-02T06:14:31Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and AML.
This paper proposes federated conditional stochastic optimization algorithms for the distributed federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Near-Term Distributed Quantum Computation using Mean-Field Corrections and Auxiliary Qubits [77.04894470683776]
We propose near-term distributed quantum computing schemes that involve limited information transfer and conservative entanglement production.
We build upon these concepts to produce an approximate circuit-cutting technique for the fragmented pre-training of variational quantum algorithms.
arXiv Detail & Related papers (2023-09-11T18:00:00Z)
- Symmetry Regularization and Saturating Nonlinearity for Robust Quantization [5.1779694507922835]
We present three insights to robustify a network against quantization.
We propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL).
Applying the proposed methods during training can enhance the robustness of arbitrary neural networks against quantization.
arXiv Detail & Related papers (2022-07-31T02:12:28Z)
- Wireless Quantized Federated Learning: A Joint Computation and Communication Design [36.35684767732552]
In this paper, we aim to minimize the total convergence time of FL, by quantizing the local model parameters prior to uplink transmission.
We jointly optimize the computing and communication resources and the number of quantization bits in order to minimize the convergence time across all global rounds.
arXiv Detail & Related papers (2022-03-11T12:30:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.