Differentially Quantized Gradient Methods
- URL: http://arxiv.org/abs/2002.02508v4
- Date: Tue, 26 Apr 2022 20:45:48 GMT
- Title: Differentially Quantized Gradient Methods
- Authors: Chung-Yi Lin, Victoria Kostina, and Babak Hassibi
- Abstract summary: We show that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$.
No algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$.
- Score: 53.3186247068836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider the following distributed optimization scenario. A worker has access
to training data that it uses to compute the gradients while a server decides
when to stop iterative computation based on its target accuracy or delay
constraints. The server receives all its information about the problem instance
from the worker via a rate-limited noiseless communication channel. We
introduce the principle we call Differential Quantization (DQ) that prescribes
compensating the past quantization errors to direct the descent trajectory of a
quantized algorithm towards that of its unquantized counterpart. Assuming that
the objective function is smooth and strongly convex, we prove that
Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction
factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where
$\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient
descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and
$R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n
/\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD is the same as
that of unquantized GD, i.e., there is no loss due to quantization. We show
that no algorithm within a certain class can converge faster than
$\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to
1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically
optimal. The principle of differential quantization continues to apply to
gradient methods with momentum such as Nesterov's accelerated gradient descent,
and Polyak's heavy ball method. For these algorithms as well, if the rate is
above a certain threshold, there is no loss in contraction factor obtained by
the differentially quantized algorithm compared to its unquantized counterpart.
Experimental results on least-squares problems validate our theoretical
analysis.
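The abstract's description of the DQ principle maps naturally onto a short simulation. The sketch below is an illustrative reconstruction, not the paper's exact DQ-GD recursion: the uniform scalar quantizer, its fixed dynamic range, the step size, and the toy least-squares data are all assumptions; the paper's quantizer design and the shrinking dynamic range behind the stated contraction factor are in the full text.

```python
# Illustrative sketch of the differential quantization (DQ) principle from the
# abstract: the worker compensates accumulated quantization errors so that the
# quantized descent trajectory tracks the unquantized GD trajectory. The
# quantizer, its fixed dynamic range, the step size, and the data are
# assumptions, not the paper's exact construction.
import numpy as np


def quantize(u, rate_bits=4, dyn_range=10.0):
    """Simple uniform scalar quantizer on [-dyn_range, dyn_range],
    roughly 2**rate_bits levels per coordinate."""
    step = 2.0 * dyn_range / (2 ** rate_bits)
    return np.clip(np.round(u / step) * step, -dyn_range, dyn_range)


def dq_gd(grad, x0, eta, iters, rate_bits=4):
    """Differentially quantized GD (sketch). The server holds x; the worker
    also tracks y, the iterate unquantized GD would have produced, and
    quantizes the gradient plus a correction steering x back toward y."""
    x = x0.copy()   # server iterate (driven by quantized messages)
    y = x0.copy()   # worker's copy of the unquantized GD trajectory
    for _ in range(iters):
        g = grad(y)                   # gradient along the unquantized trajectory
        u = g + (x - y) / eta         # compensate accumulated quantization error
        q = quantize(u, rate_bits)    # R bits per dimension sent to the server
        x = x - eta * q               # server update from the quantized message
        y = y - eta * g               # worker advances the unquantized counterpart
        # Invariant: x - y = -eta * (q - u), i.e. only the *current*
        # quantization error separates the two trajectories.
    return x


# Toy least-squares problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda z: A.T @ (A @ z - b) / len(b)
L = np.linalg.eigvalsh(A.T @ A / len(b)).max()

x_hat = dq_gd(grad, np.zeros(5), eta=1.0 / L, iters=200)
print(np.linalg.norm(grad(x_hat)))  # small, but limited by the fixed quantizer range
```

With a fixed dynamic range, this sketch only reaches a quantization-limited neighborhood of the minimizer; the linear contraction factor $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$ stated in the abstract relies on the paper's quantizer and its schedule for shrinking the dynamic range across iterations.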
Related papers
- Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Quantum state tomography via non-convex Riemannian gradient descent [13.100184125419691]
In this work, we derive a quantum state tomography scheme that improves the dependence on $\kappa$.
The improvement comes from the application of the non-convex Riemannian gradient descent (RGD) algorithm.
Our theoretical results of extremely fast convergence and nearly optimal error bounds are corroborated by numerical results.
arXiv Detail & Related papers (2022-10-10T14:19:23Z) - Finite-Time Error Bounds for Greedy-GQ [20.51105692499517]
We show that the Greedy-GQ algorithm converges quickly, as characterized by a finite-time error bound.
Our analysis provides guidance on choosing step-sizes for faster convergence.
arXiv Detail & Related papers (2022-09-06T15:04:57Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We analyze a training algorithm for distributed computation in which workers communicate at varying frequencies.
In this work, we obtain a tighter convergence rate that depends on the gradient noise $\sigma^2$ and the average delay.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Generalization Bounds for Gradient Methods via Discrete and Continuous
Prior [8.76346911214414]
We show a new high-probability generalization bound of order $O\big(\frac{1}{n} + \frac{L^2}{n^2}\sum_{t=1}^{T}(\gamma_t/\varepsilon_t)^2\big)$ for gradient Langevin Dynamics (GLD).
We can also obtain new bounds for certain variants of SGD.
arXiv Detail & Related papers (2022-05-27T07:23:01Z) - Polyak-Ruppert Averaged Q-Leaning is Statistically Efficient [90.14768299744792]
We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a., averaged Q-learning) in a $\gamma$-discounted MDP.
We establish normality for the iteration-averaged $\bar{\boldsymbol{Q}}_T$.
In short, our theoretical analysis shows that averaged Q-learning is statistically efficient.
arXiv Detail & Related papers (2021-12-29T14:47:56Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale gradient from step $t - d_t$, for an arbitrary delay $d_t$.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization [59.65871549878937]
We prove that the practical single-loop accelerated gradient tracking needs $O\big((\frac{\gamma}{1-\sigma_\gamma})^2\sqrt{\frac{L}{\epsilon}}\big)$.
Our convergence rates improve significantly over the ones of $O(\frac{1}{\epsilon^{5/7}})$ and $O\big((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon}\big)$.
arXiv Detail & Related papers (2021-04-06T15:34:14Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex optimization that finds an $\epsilon$-critical point using an optimal number of gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)