Correlated Quantization for Faster Nonconvex Distributed Optimization
- URL: http://arxiv.org/abs/2401.05518v1
- Date: Wed, 10 Jan 2024 19:29:17 GMT
- Title: Correlated Quantization for Faster Nonconvex Distributed Optimization
- Authors: Andrei Panferov, Yury Demidovich, Ahmad Rammal, Peter Richtárik
- Abstract summary: Quantization (Alistarh et al., 2017) is an important (stochastic) compression technique that reduces the volume of transmitted bits during each communication round.
We analyze the forefront distributed non-convex optimization algorithm MARINA (Gorbunov et al., 2022).
We expand the theoretical framework of MARINA to accommodate a substantially broader range of potentially correlated compressors.
- Score: 1.2744523252873352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization (Alistarh et al., 2017) is an important (stochastic) compression
technique that reduces the volume of transmitted bits during each communication
round in distributed model training. Suresh et al. (2022) introduce correlated
quantizers and show their advantages over independent counterparts by analyzing
distributed SGD communication complexity. We analyze the forefront distributed
non-convex optimization algorithm MARINA (Gorbunov et al., 2022) utilizing the
proposed correlated quantizers and show that it outperforms the original MARINA
and distributed SGD of Suresh et al. (2022) with regard to the communication
complexity. We significantly refine the original analysis of MARINA without any
additional assumptions using the weighted Hessian variance (Tyurin et al.,
2022), and then we expand the theoretical framework of MARINA to accommodate a
substantially broader range of potentially correlated and biased compressors,
thus broadening the applicability of the method beyond the conventional
independent unbiased compressor setup. Extensive experimental results
corroborate our theoretical findings.
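To make the shared-randomness idea behind correlated quantizers concrete, here is a minimal Python sketch comparing independent stochastic rounding with a correlated variant in which all workers draw their rounding thresholds from one shared, stratified random sequence. The specific coupling and the function names are illustrative assumptions, not the exact quantizer of Suresh et al. (2022) or of this paper.

```python
import numpy as np

def independent_round(x, rng):
    """Baseline: each worker rounds its coordinate to floor/ceil independently."""
    low = np.floor(x)
    return low + (rng.random(x.shape) < (x - low))

def correlated_round(x, rng):
    """Correlated variant (sketch): all workers draw thresholds from one shared,
    stratified random sequence, randomly assigned to workers, so individual
    rounding errors tend to cancel when the server averages the messages.
    Each threshold is still marginally Uniform[0, 1), so every worker's
    quantizer remains unbiased."""
    n = x.shape[0]
    low = np.floor(x)
    u = (rng.random() + np.arange(n) / n) % 1.0   # shared stratified thresholds
    u = rng.permutation(u)                        # random assignment to workers
    return low + (u < (x - low))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1024)              # one coordinate per "worker"
print("independent mean error:", abs(independent_round(x, rng).mean() - x.mean()))
print("correlated  mean error:", abs(correlated_round(x, rng).mean() - x.mean()))
```

The point of the sketch mirrors the abstract: each worker's quantizer stays unbiased, while the shared randomness lowers the variance of the averaged message, which is the property the communication-complexity analysis exploits.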
Related papers
- Theoretical Convergence Guarantees for Variational Autoencoders [2.8167997311962942]
Variational Autoencoders (VAE) are popular generative models used to sample from complex data distributions.
This paper aims to bridge that gap by providing non-asymptotic convergence guarantees for VAE trained using both Gradient Descent and Adam algorithms.
Our theoretical analysis applies to both Linear VAE and Deep Gaussian VAE, as well as several VAE variants, including $\beta$-VAE and IWAE.
arXiv Detail & Related papers (2024-10-22T07:12:38Z) - An Improved Analysis of Langevin Algorithms with Prior Diffusion for
Non-Log-Concave Sampling [27.882407333690267]
We show that the modified Langevin algorithm with prior diffusion can converge dimension-independently for strongly log-concave target distributions.
We also prove that the modified Langevin algorithm obtains dimension-independent convergence in KL divergence under different step size schedules.
arXiv Detail & Related papers (2024-03-10T11:50:34Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Score-Based Diffusion meets Annealed Importance Sampling [89.92133671626327]
Annealed Importance Sampling remains one of the most effective methods for marginal likelihood estimation.
We leverage recent progress in score-based generative modeling to approximate the optimal extended target distribution for AIS proposals.
arXiv Detail & Related papers (2022-08-16T12:13:29Z) - Distributed Methods with Absolute Compression and Error Compensation [1.52292571922932]
Communication compression is a powerful approach to alleviating the communication bottleneck in distributed training.
In this paper, we generalize the analysis of EC-SGD with absolute compression to the arbitrary sampling strategy.
Our rates improve upon the previously known ones in this setting.
arXiv Detail & Related papers (2022-03-04T15:41:14Z) - Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
arXiv Detail & Related papers (2021-12-17T22:04:55Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman minimization divergence.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We expand the theory of MARINA to support potentially correlated compressors, extending the method beyond the classical independent compressors setting; a minimal sketch of a MARINA-style step with such a compressor appears after this list.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Generative Modeling with Denoising Auto-Encoders and Langevin Sampling [88.83704353627554]
We show that both DAE and DSM provide estimates of the score of the smoothed population density.
We then apply our results to the homotopy method of arXiv:1907.05600 and provide theoretical justification for its empirical success.
arXiv Detail & Related papers (2020-01-31T23:50:03Z)