Correlated Quantization for Faster Nonconvex Distributed Optimization
- URL: http://arxiv.org/abs/2401.05518v1
- Date: Wed, 10 Jan 2024 19:29:17 GMT
- Title: Correlated Quantization for Faster Nonconvex Distributed Optimization
- Authors: Andrei Panferov, Yury Demidovich, Ahmad Rammal, Peter Richtárik
- Abstract summary: Quantization (Alistarh et al., 2017) is an important (stochastic) compression technique that reduces the volume of transmitted bits during each communication round.
We analyze the forefront distributed non-convex optimization algorithm MARINA (Gorbunov et al., 2022).
We expand the theoretical framework of MARINA to accommodate a substantially broader range of potentially correlated compressors.
- Score: 1.2744523252873352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization (Alistarh et al., 2017) is an important (stochastic) compression
technique that reduces the volume of transmitted bits during each communication
round in distributed model training. Suresh et al. (2022) introduce correlated
quantizers and show their advantages over independent counterparts by analyzing
distributed SGD communication complexity. We analyze the forefront distributed
non-convex optimization algorithm MARINA (Gorbunov et al., 2022) utilizing the
proposed correlated quantizers and show that it outperforms the original MARINA
and distributed SGD of Suresh et al. (2022) with regard to the communication
complexity. We significantly refine the original analysis of MARINA without any
additional assumptions using the weighted Hessian variance (Tyurin et al.,
2022), and then we expand the theoretical framework of MARINA to accommodate a
substantially broader range of potentially correlated and biased compressors,
thus broadening the applicability of the method beyond the conventional
independent unbiased compressor setup. Extensive experimental results
corroborate our theoretical findings.
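To make the shared-randomness idea behind correlated quantizers concrete, here is a minimal Python sketch comparing independent stochastic rounding with a correlated variant in which all workers draw their rounding thresholds from one shared, stratified random sequence. The specific coupling and the function names are illustrative assumptions, not the exact quantizer of Suresh et al. (2022) or of this paper.

```python
import numpy as np

def independent_round(x, rng):
    """Baseline: each worker rounds its coordinate to floor/ceil independently."""
    low = np.floor(x)
    return low + (rng.random(x.shape) < (x - low))

def correlated_round(x, rng):
    """Correlated variant (sketch): all workers draw thresholds from one shared,
    stratified random sequence, randomly assigned to workers, so individual
    rounding errors tend to cancel when the server averages the messages.
    Each threshold is still marginally Uniform[0, 1), so every worker's
    quantizer remains unbiased."""
    n = x.shape[0]
    low = np.floor(x)
    u = (rng.random() + np.arange(n) / n) % 1.0   # shared stratified thresholds
    u = rng.permutation(u)                        # random assignment to workers
    return low + (u < (x - low))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1024)              # one coordinate per "worker"
print("independent mean error:", abs(independent_round(x, rng).mean() - x.mean()))
print("correlated  mean error:", abs(correlated_round(x, rng).mean() - x.mean()))
```

The point of the sketch mirrors the abstract: each worker's quantizer stays unbiased, while the shared randomness lowers the variance of the averaged message, which is the property the communication-complexity analysis exploits.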
Related papers
- Theoretical Convergence Guarantees for Variational Autoencoders [2.8167997311962942]
Variational Autoencoders (VAE) are popular generative models used to sample from complex data distributions.
This paper aims to bridge that gap by providing non-asymptotic convergence guarantees for VAE trained using both Gradient Descent and Adam algorithms.
Our theoretical analysis applies to both Linear VAE and Deep Gaussian VAE, as well as several VAE variants, including $\beta$-VAE and IWAE.
arXiv Detail & Related papers (2024-10-22T07:12:38Z) - An Improved Analysis of Langevin Algorithms with Prior Diffusion for
Non-Log-Concave Sampling [27.882407333690267]
We show that the modified Langevin algorithm with prior diffusion can converge dimension-independently for strongly log-concave target distributions.
We also prove that the modified Langevin algorithm obtains dimension-independent convergence in KL divergence under different step size schedules.
arXiv Detail & Related papers (2024-03-10T11:50:34Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Score-Based Diffusion meets Annealed Importance Sampling [89.92133671626327]
Annealed Importance Sampling remains one of the most effective methods for marginal likelihood estimation.
We leverage recent progress in score-based generative modeling to approximate the optimal extended target distribution for AIS proposals.
arXiv Detail & Related papers (2022-08-16T12:13:29Z) - Distributed Methods with Absolute Compression and Error Compensation [1.52292571922932]
Communication compression is a powerful approach to alleviating the communication bottleneck in distributed training.
In this paper, we generalize the analysis of EC-SGD with absolute compression to the arbitrary sampling strategy.
Our rates improve upon the previously known ones in this setting.
arXiv Detail & Related papers (2022-03-04T15:41:14Z) - Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
arXiv Detail & Related papers (2021-12-17T22:04:55Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman minimization divergence.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We expand the theory of MARINA to support potentially correlated compressors, extending the method beyond the classical independent compressors setting; a minimal sketch of a MARINA-style step with such a compressor appears after this list.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Generative Modeling with Denoising Auto-Encoders and Langevin Sampling [88.83704353627554]
We show that both DAE and DSM provide estimates of the score of the smoothed population density.
We then apply our results to the homotopy method of arXiv:1907.05600 and provide theoretical justification for its empirical success.
arXiv Detail & Related papers (2020-01-31T23:50:03Z)