Fugu-MT Paper Translation (Abstract): On Biased Compression for Distributed Learning

Paper overview: On Biased Compression for Distributed Learning

arxiv url: http://arxiv.org/abs/2002.12410v4
Date: Sun, 14 Jan 2024 16:36:09 GMT
Status: translation complete
Last updated in system: 2024-01-18 04:16:40.457155
Title: On Biased Compression for Distributed Learning
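Title (reference translation): On Biased Compression for Distributed Learning
Authors: Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, and Mher Safaryan
Abstract summary: We show for the first time that biased compressors lead to linear convergence rates in both the single-node and distributed settings. We propose new biased compressors with promising theoretical guarantees and practical performance.
Reference score (independently computed attention score): 55.89300593805943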
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.
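Abstract (reference translation): In recent years, various communication compression techniques have emerged as tools for alleviating the communication bottleneck in distributed learning. However, although biased compressors often show superior performance in practice compared with the more studied and better understood unbiased compressors, very little is known about them. In this work, we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates in both the single-node and distributed settings. We prove that the distributed compressed SGD method, employed with an error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, through a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why, and by how much, biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.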
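The error feedback mechanism named in the abstract can be illustrated with a minimal single-node sketch. It is not the paper's algorithm or experimental setup. It assumes Top-k sparsification as the biased compressor (the abstract does not name a specific compressor), full gradients (the $C=0$ case), and a toy diagonal least-squares problem. The names `top_k` and `ef_compressed_gd`, the step size, and the iteration count are illustrative choices.

```python
import numpy as np


def top_k(x, k):
    """Assumed biased compressor: keep the k largest-magnitude entries of x."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out


def ef_compressed_gd(A, b, k, step, iters):
    """Gradient descent on f(x) = 0.5 * ||A x - b||^2 with error feedback.

    Only the compressed vector c would be communicated; whatever the
    compressor drops is kept in e and added back at the next step.
    """
    x = np.zeros(A.shape[1])
    e = np.zeros_like(x)                # accumulated compression error
    for _ in range(iters):
        grad = A.T @ (A @ x - b)        # full gradient, no stochastic noise
        p = e + step * grad             # scaled gradient plus carried-over error
        c = top_k(p, k)                 # compressed update
        x = x - c
        e = p - c                       # remember what was not applied
    return x


# Toy problem (hypothetical): d = 10, smoothness L = 4, strong convexity mu = 1.
A = np.diag(np.linspace(1.0, 2.0, 10))
b = np.ones(10)
x_star = np.linalg.solve(A, b)
x_hat = ef_compressed_gd(A, b, k=2, step=0.002, iters=20000)
print("distance to optimum:", np.linalg.norm(x_hat - x_star))
```

The step size here is deliberately conservative: with Top-k keeping 2 of 10 coordinates, a small step keeps the accumulated error `e` bounded relative to the gradient, which matches the abstract's observation that more compression (larger $\delta$) slows the linear rate.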

Related papers

Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
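We propose a new algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Flattened one-bit stochastic gradient descent (FO-SGD) relies on two simple algorithmic ideas.
Paper reference translation (metadata) (2024-05-17T21:17:27Z)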
DiffRate : Differentiable Compression Rate for Efficient Vision Transformers [98.33906104846386]
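Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens. DiffRate is a new token compression method with several appealing properties that prior work lacks.
Paper reference translation (metadata) (2023-05-29T10:15:19Z)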
EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
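In distributed optimization and learning, communication between the different computing units is often the bottleneck. There are two classes of compression operators, and the algorithms that use them are separate. This paper proposes a new algorithm that recovers DIANA and EF21 as particular cases.
Paper reference translation (metadata) (2022-05-09T10:44:23Z)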
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
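Modeling latent variables with priors and hyperpriors is an important problem in variational image compression. When the latent variables are observed from a vectorized perspective, correlations exist among them. Our model achieves better rate-distortion performance and a 3.18x compression speed-up.
Paper reference translation (metadata) (2022-03-21T11:44:17Z)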
Distributed Methods with Absolute Compression and Error Compensation [1.52292571922932]
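Communication compression is a powerful approach to alleviating this problem. In this paper, we generalize the analysis of EC-SGD with absolute compression to arbitrary sampling strategies. In this setting, our rates improve upon those previously known.
Paper reference translation (metadata) (2022-03-04T15:41:14Z)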
Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
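This paper shows that the MARINA method of Gorbunov et al. (2021) can be regarded as the state-of-the-art method in terms of theoretical communication complexity. The theory of MARINA is extended beyond the classical independent-compressor setting to support potentially correlated compressors.
Paper reference translation (metadata) (2021-10-07T09:38:15Z)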
Escaping Saddle Points with Compressed SGD [8.014396597444296]
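Stochastic gradient descent (SGD) is an optimization method for large-scale distributed machine learning. We show that SGD augmented with gradient compression converges to an $\varepsilon$-first-order stationary point. When the gradient is not Lipschitz, SGD with the RandomK compressor converges to an $\varepsilon$-SOSP in the same number of iterations as SGD.
Paper reference translation (metadata) (2021-05-21T01:56:43Z)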
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
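Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys threshold estimation quality comparable to that of DGC. SIDCo speeds up training by up to 41:7%, 7:6%, and 1:9% compared with the no-compression baseline, Topk, and DGC compressors, respectively.
Paper reference translation (metadata) (2021-01-26T13:06:00Z)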

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.
