Federated Optimization Algorithms with Random Reshuffling and Gradient Compression
- URL: http://arxiv.org/abs/2206.07021v1
- Date: Tue, 14 Jun 2022 17:36:47 GMT
- Title: Federated Optimization Algorithms with Random Reshuffling and Gradient Compression
- Authors: Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik
- Abstract summary: We provide the first analysis of methods with gradient compression and without-replacement sampling.
We show how to reduce the variance coming from gradient quantization through the use of control iterates.
We outline several settings in which they improve upon existing algorithms.
- Score: 2.7554288121906296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient compression is a popular technique for improving the communication
complexity of stochastic first-order methods in distributed training of machine
learning models. However, the existing works consider only with-replacement
sampling of stochastic gradients. In contrast, it is well known in practice and
recently confirmed in theory that stochastic methods based on
without-replacement sampling, e.g., the Random Reshuffling (RR) method, perform
better than those that sample the gradients with replacement. In this work, we
close this gap in the literature and provide the first analysis of methods with
gradient compression and without-replacement sampling. We first develop a
distributed variant of random reshuffling with gradient compression (Q-RR), and
show how to reduce the variance coming from gradient quantization through the
use of control iterates. Next, to have a better fit to Federated Learning
applications, we incorporate local computation and propose a variant of Q-RR
called Q-NASTYA. Q-NASTYA uses local gradient steps and different local and
global stepsizes. Next, we show how to reduce compression variance in this
setting as well. Finally, we prove the convergence results for the proposed
methods and outline several settings in which they improve upon existing
algorithms.
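The abstract combines three ingredients: without-replacement (random reshuffling) passes over each client's data, unbiased gradient quantization, and control iterates that progressively remove the quantization variance. Below is a minimal single-process simulation of that combination on synthetic quadratics. It is a sketch in the spirit of Q-RR / DIANA-RR, not the authors' exact algorithms: the losses, stepsize, dithered quantizer, and the 0.5 shift-update factor are illustrative assumptions, and Q-NASTYA's local steps with separate local and global stepsizes are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, n_local = 10, 4, 8            # model dimension, clients, samples per client

# Illustrative local losses: f_{m,i}(x) = 0.5 * (a_{m,i}^T x - b_{m,i})^2.
a = rng.normal(size=(n_clients, n_local, d))
b = rng.normal(size=(n_clients, n_local))

def grad(m, i, x):
    return (a[m, i] @ x - b[m, i]) * a[m, i]

def full_loss(x):
    return 0.5 * np.mean((np.einsum('mid,d->mi', a, x) - b) ** 2)

def quantize(v, levels=4):
    """Unbiased random (dithered) quantization onto a uniform grid."""
    scale = np.max(np.abs(v)) + 1e-12
    y = v / scale * levels
    low = np.floor(y)
    q = low + (rng.random(v.shape) < (y - low))   # round up with prob. equal to the remainder
    return q / levels * scale

def run(epochs=300, gamma=0.02, use_control_iterates=True):
    x = np.zeros(d)
    h = np.zeros((n_clients, n_local, d))     # one control iterate per local sample
    for _ in range(epochs):
        perms = [rng.permutation(n_local) for _ in range(n_clients)]  # reshuffle once per epoch
        for t in range(n_local):
            msgs = []
            for m in range(n_clients):
                i = perms[m][t]
                g = grad(m, i, x)
                if use_control_iterates:
                    delta = quantize(g - h[m, i])    # compress only the difference
                    msgs.append(h[m, i] + delta)     # server rebuilds this from its own copy of h
                    h[m, i] = h[m, i] + 0.5 * delta  # move the shift toward the sample gradient
                else:
                    msgs.append(quantize(g))         # plain compressed random-reshuffling update
            x = x - gamma * np.mean(msgs, axis=0)
    return full_loss(x)

print("quantized RR with control iterates:", run(use_control_iterates=True))
print("quantized RR, no variance control: ", run(use_control_iterates=False))
```

In a real deployment the server would maintain its own copy of the shifts h, so each client only transmits the quantized difference; the simulation above simply reconstructs the message in place.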
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
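The summary above does not spell out the two algorithmic ideas behind FO-SGD, so the snippet below only shows the classical scaled sign compressor (one bit per coordinate plus one scalar) that this line of work builds on. It is a generic baseline, not the FO-SGD scheme itself.

```python
import numpy as np

def one_bit_compress(g):
    """Classical scaled sign compression: d bits plus one float.

    Sends sign(g) and the average magnitude ||g||_1 / d, so the receiver
    reconstructs (||g||_1 / d) * sign(g).  Generic baseline, not FO-SGD.
    """
    scale = np.linalg.norm(g, ord=1) / g.size
    signs = np.sign(g).astype(np.int8)
    return scale, signs

def one_bit_decompress(scale, signs):
    return scale * signs.astype(np.float64)

g = np.random.default_rng(1).normal(size=8)
scale, signs = one_bit_compress(g)
print(np.round(g, 2))
print(np.round(one_bit_decompress(scale, signs), 2))
```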
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
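To make the entry above concrete: for Gaussian process regression with kernel matrix K and noise variance sigma^2, the posterior-mean coefficients are alpha* = (K + sigma^2 I)^{-1} y, the minimizer of the quadratic 0.5 * alpha^T (K + sigma^2 I) alpha - y^T alpha. The sketch below runs plain full-batch gradient descent on that quadratic; the paper's stochastic dual descent adds the specific stochastic and design choices it advocates, none of which are reproduced here, and the RBF kernel and generous noise level are illustrative assumptions chosen to keep the toy problem well conditioned.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D regression data (illustrative choices only).
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
noise_var = 1.0

def rbf(A, B, lengthscale=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

M = rbf(X, X) + noise_var * np.eye(len(X))   # K + sigma^2 I

# Gradient descent on g(alpha) = 0.5 * alpha^T M alpha - y^T alpha,
# whose unique minimizer is the GP posterior-mean weight vector M^{-1} y.
alpha = np.zeros(len(X))
step = 1.0 / np.linalg.eigvalsh(M).max()
for _ in range(2000):
    alpha -= step * (M @ alpha - y)

print("max |alpha - exact|:", np.max(np.abs(alpha - np.linalg.solve(M, y))))
```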
- Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we propose the first unified study of variance reduction techniques for stochastic proximal point algorithms.
We introduce a generic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants.
Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
arXiv Detail & Related papers (2023-08-18T05:11:50Z)
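As a concrete reference point for the entry above, the sketch below implements the basic stochastic proximal point update for least-squares components, where the proximal subproblem has a closed-form (rank-one) solution. The variance-reduced variants studied in the paper (proximal SVRG, SAGA, and relatives) build on this step but are not reproduced here, and the data and stepsize are illustrative.

```python
import numpy as np

def prox_point_step(x, a, b, gamma):
    """One stochastic proximal point step for f_i(z) = 0.5 * (a^T z - b)^2:
    argmin_z f_i(z) + ||z - x||^2 / (2 * gamma), available in closed form."""
    return x - gamma * (a @ x - b) / (1.0 + gamma * (a @ a)) * a

rng = np.random.default_rng(3)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                                  # consistent system, optimum = x_true

x = np.zeros(d)
for _ in range(3000):
    i = rng.integers(n)                         # one component per iteration
    x = prox_point_step(x, A[i], b[i], gamma=0.5)
print("distance to optimum:", np.linalg.norm(x - x_true))
```

Unlike the plain stochastic gradient step, the proximal step above remains stable even for fairly large values of gamma, which is the usual motivation for proximal point methods.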
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
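The entry above combines a conditional gradient (Frank-Wolfe) step with a SAG-style averaged gradient estimator that touches a single sample per iteration. The sketch below shows that pattern on a smooth least-squares objective over an l1-ball; the constraint set, the classical 2/(k+2) stepsize, and the omission of the non-smooth composite term are simplifications for illustration, not the paper's exact method or guarantees.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, radius = 100, 20, 5.0

# Finite-sum objective F(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2,
# minimised over the l1-ball of the given radius (illustrative setup).
A = rng.normal(size=(n, d))
b = A @ (0.3 * radius * rng.dirichlet(np.ones(d)))   # planted target inside the ball

def component_grad(i, x):
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
table = np.zeros((n, d))          # last seen gradient for each sample (SAG memory)
g_avg = np.zeros(d)               # running average of the table

for k in range(1, 20001):
    i = rng.integers(n)
    g_new = component_grad(i, x)              # one stochastic gradient per iteration
    g_avg += (g_new - table[i]) / n           # SAG-style update of the average
    table[i] = g_new

    # Linear minimisation oracle for the l1-ball: all mass on the coordinate
    # with the largest absolute averaged gradient, with the opposite sign.
    j = np.argmax(np.abs(g_avg))
    s = np.zeros(d)
    s[j] = -radius * np.sign(g_avg[j])

    eta = 2.0 / (k + 2)                       # classical Frank-Wolfe step size
    x = (1 - eta) * x + eta * s

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```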
- Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case that performs no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
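As a rough stand-in for the compression step in the entry above, the snippet below keeps the top-k coordinates of a gradient and quantizes the surviving values with a few bits each, the generic sparsification-plus-quantization pattern. The paper's framework additionally applies quantized compressed sensing with a dedicated reconstruction step at the parameter server, which is not reproduced here.

```python
import numpy as np

def compress(g, k=4, bits=4):
    """Top-k sparsification followed by uniform scalar quantization of the survivors."""
    idx = np.argsort(np.abs(g))[-k:]                      # indices of the k largest entries
    vals = g[idx]
    scale = np.max(np.abs(vals)) + 1e-12
    levels = 2 ** (bits - 1) - 1
    q = np.round(vals / scale * levels).astype(np.int8)   # k small integers + k indices + 1 float
    return idx, q, scale, levels

def decompress(idx, q, scale, levels, d):
    g_hat = np.zeros(d)
    g_hat[idx] = q.astype(np.float64) / levels * scale
    return g_hat

g = np.random.default_rng(5).normal(size=16)
idx, q, scale, levels = compress(g)
print(np.round(g, 2))
print(np.round(decompress(idx, q, scale, levels, g.size), 2))
```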
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
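To fix terminology for the entry above: in the minibatch variant every worker contributes one without-replacement sample per communication round, while in the local variant each worker runs a full shuffled local epoch before the iterates are averaged. The sketch below contrasts the two communication patterns on synthetic quadratics; the losses and stepsize are illustrative assumptions, and the paper's synchronized-shuffling modification is not shown.

```python
import numpy as np

rng = np.random.default_rng(6)
d, workers, n_local, gamma = 5, 4, 16, 0.05

a = rng.normal(size=(workers, n_local, d))
b = rng.normal(size=(workers, n_local))

def grad(m, i, x):
    # gradient of 0.5 * (a_{m,i}^T x - b_{m,i})^2
    return (a[m, i] @ x - b[m, i]) * a[m, i]

def minibatch_rr_epoch(x):
    perms = [rng.permutation(n_local) for _ in range(workers)]
    for t in range(n_local):                  # communicate at every step
        x = x - gamma * np.mean([grad(m, perms[m][t], x) for m in range(workers)], axis=0)
    return x

def local_rr_epoch(x):
    local_iterates = []
    for m in range(workers):                  # communicate once per epoch
        z = x.copy()
        for i in rng.permutation(n_local):
            z = z - gamma * grad(m, i, z)
        local_iterates.append(z)
    return np.mean(local_iterates, axis=0)

x_mb = x_loc = np.zeros(d)
for _ in range(100):
    x_mb = minibatch_rr_epoch(x_mb)
    x_loc = local_rr_epoch(x_loc)
print("minibatch RR iterate:", np.round(x_mb, 3))
print("local RR iterate:    ", np.round(x_loc, 3))
```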
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variance of the ZO gradient estimator by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
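For reference alongside the entry above, here is the standard two-point zeroth-order gradient estimator with random Gaussian perturbations, i.e., the random-sampling baseline that ZO-RL replaces with a learned sampling policy. The learned policy itself is not sketched, and the test function is an assumption used only for the sanity check.

```python
import numpy as np

rng = np.random.default_rng(7)

def zo_gradient(f, x, mu=1e-4, num_dirs=200):
    """Two-point zeroth-order estimate of grad f(x) from random Gaussian directions."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.size)                  # random perturbation direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

# Sanity check on f(z) = ||z||^2, whose true gradient is 2 * z (estimate is noisy).
f = lambda z: float(np.sum(z ** 2))
x = np.array([1.0, -2.0, 3.0])
print("ZO estimate  :", zo_gradient(f, x))
print("true gradient:", 2 * x)
```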