Fugu-MT Paper Translation (Abstract): Differentially Quantized Gradient Methods

Paper summary: Differentially Quantized Gradient Methods

arxiv url: http://arxiv.org/abs/2002.02508v4
Date: Tue, 26 Apr 2022 20:45:48 GMT
Status: translation complete
Last updated in system: 2023-01-03 10:10:36.617628
Title: Differentially Quantized Gradient Methods
Title (reference translation): Differentially Quantized Gradient Methods
Authors: Chung-Yi Lin, Victoria Kostina, and Babak Hassibi
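Abstract summary: We show that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$. No algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$.
Reference score (independently computed attention score): 53.3186247068836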
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n /\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.
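As a worked illustration of the rate threshold stated in the abstract, the sketch below evaluates the contraction factor $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$ and the smallest rate at which quantization costs nothing. It reads the threshold as $\log_2(\rho_n/\sigma_{\mathrm{GD}})$, which follows from requiring $\rho_n 2^{-R} \leq \sigma_{\mathrm{GD}}$. The function names and the sample values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math


def dqgd_contraction(sigma_gd: float, rho_n: float, rate_bits: float) -> float:
    """Contraction factor max{sigma_GD, rho_n * 2^(-R)} from the abstract."""
    return max(sigma_gd, rho_n * 2.0 ** (-rate_bits))


def threshold_rate(sigma_gd: float, rho_n: float) -> float:
    """Smallest rate R (bits per dimension) with rho_n * 2^(-R) <= sigma_GD."""
    return math.log2(rho_n / sigma_gd)


# Hypothetical values: sigma_GD = 0.9 and covering efficiency rho_n = 2.
sigma_gd, rho_n = 0.9, 2.0
r_star = threshold_rate(sigma_gd, rho_n)

for rate in (0.5, 1.0, r_star, 2.0, 4.0):
    factor = dqgd_contraction(sigma_gd, rho_n, rate)
    print(f"R = {rate:.3f} bits/dim -> contraction factor {factor:.4f}")
```

Below the threshold the quantization term $\rho_n 2^{-R}$ dominates; at or above it the factor equals $\sigma_{\mathrm{GD}}$. A factor of 1 or more means the bound no longer guarantees convergence.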
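Abstract (reference translation): Consider the following distributed optimization scenario. A worker has access to the training data it uses to compute gradients, while a server decides when to stop the iterative computation based on its target accuracy or delay constraints. The server receives all of its information about the problem instance from the worker over a rate-limited noiseless communication channel. We introduce the principle of Differential Quantization (DQ), which compensates for past quantization errors so as to steer the descent trajectory of a quantized algorithm toward that of its unquantized counterpart. Assuming the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n /\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD equals that of unquantized GD, i.e., quantization incurs no loss. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), DQ-GD is asymptotically optimal. The principle of differential quantization also applies to gradient methods with momentum, such as Nesterov's accelerated gradient descent and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, the differentially quantized algorithm loses nothing in contraction factor compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.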
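The error-compensation idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the recursion (subtract the previous quantization error from the gradient before quantizing) is one way to realize the principle, the objective is a hypothetical diagonal least-squares problem, and the quantizer is a toy fixed-step rounding rule with unbounded range rather than a rate-$R$ quantizer. All names are hypothetical.

```python
def grad(x, h, c):
    """Gradient of f(x) = 0.5 * sum_j h[j] * (x[j] - c[j])**2 (assumed objective)."""
    return [hj * (xj - cj) for hj, xj, cj in zip(h, x, c)]


def quantize(u, step):
    """Toy quantizer: round each coordinate to the nearest multiple of `step`."""
    return [step * round(uj / step) for uj in u]


def dq_gd(x0, h, c, eta, step, iters):
    """Error-compensated quantized GD.

    Returns the server iterate, the unquantized GD iterate, and the last
    quantization error, so the two trajectories can be compared.
    """
    x_hat = list(x0)            # server iterate, built from quantized messages only
    e_prev = [0.0] * len(x0)    # previous quantization error
    x_gd = list(x0)             # unquantized GD, tracked for comparison only
    for _ in range(iters):
        # Worker: undo the effect of the past error to recover the GD point.
        x = [xh + eta * e for xh, e in zip(x_hat, e_prev)]
        # Worker: compensate the past error before quantizing the gradient.
        u = [gj - e for gj, e in zip(grad(x, h, c), e_prev)]
        q = quantize(u, step)
        e_prev = [qj - uj for qj, uj in zip(q, u)]
        # Server: descent step using the quantized message.
        x_hat = [xh - eta * qj for xh, qj in zip(x_hat, q)]
        # Reference: plain gradient descent step.
        x_gd = [xj - eta * gj for xj, gj in zip(x_gd, grad(x_gd, h, c))]
    return x_hat, x_gd, e_prev


# Hypothetical problem: curvatures 1 and 4, so eta = 2 / (L + mu) = 0.4.
x_hat, x_gd, e_last = dq_gd([0.0, 0.0], [1.0, 4.0], [1.0, -2.0],
                            eta=0.4, step=0.1, iters=50)
print("server iterate:", x_hat)
print("unquantized GD:", x_gd)
```

In this sketch the server iterate differs from the unquantized GD iterate only by the step size times the most recent quantization error, so errors do not accumulate across iterations. A rate-limited version would additionally need a bounded quantizer whose dynamic range shrinks over time, which is omitted here.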

Related papers