Fugu-MT Paper Translation (Abstract): Distributed Momentum Methods Under Biased Gradient Estimations

Paper abstract: Distributed Momentum Methods Under Biased Gradient Estimations

arxiv url: http://arxiv.org/abs/2403.00853v1
Date: Thu, 29 Feb 2024 18:03:03 GMT
Status: translation complete
Last updated in system: 2024-03-05 16:29:38.984727
Title: Distributed Momentum Methods Under Biased Gradient Estimations
Title (reference translation): Distributed momentum methods based on biased gradient estimation
Authors: Ali Beikmohammadi, Sarit Khirirat, Sindri Magnússon
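Abstract summary: Distributed gradient methods are gaining prominence for solving large-scale machine learning problems that involve data distributed across multiple nodes. In many distributed machine learning applications, however, unbiased gradient estimates are difficult to obtain. This paper establishes non-asymptotic convergence bounds for distributed momentum methods under biased gradient estimation.
Reference score (independently computed attention score): 6.046591474843391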
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Distributed stochastic gradient methods are gaining prominence in solving large-scale machine learning problems that involve data distributed across multiple nodes. However, obtaining unbiased stochastic gradients, which have been the focus of most theoretical research, is challenging in many distributed machine learning applications. The gradient estimations easily become biased, for example, when gradients are compressed or clipped, when data is shuffled, and in meta-learning and reinforcement learning. In this work, we establish non-asymptotic convergence bounds on distributed momentum methods under biased gradient estimation on both general non-convex and $\mu$-PL non-convex problems. Our analysis covers general distributed optimization problems, and we work out the implications for special cases where gradient estimates are biased, i.e., in meta-learning and when the gradients are compressed or clipped. Our numerical experiments on training deep neural networks with Top-$K$ sparsification and clipping verify faster convergence performance of momentum methods than traditional biased gradient descent.
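Abstract (reference translation): Distributed stochastic gradient methods are attracting attention for solving large-scale machine learning problems that involve data distributed across multiple nodes. However, obtaining unbiased stochastic gradients, which have been the focus of most theoretical research, is difficult in many distributed machine learning applications. Gradient estimates easily become biased, for example when gradients are compressed or clipped, when data is shuffled, and in meta-learning and reinforcement learning. In this work, we establish non-asymptotic convergence bounds for distributed momentum methods under biased gradient estimation, for both general non-convex problems and $\mu$-PL non-convex problems. The analysis covers general distributed optimization problems and works out the implications for special cases in which gradient estimates are biased, namely meta-learning and compressed or clipped gradients. Numerical experiments on training deep neural networks with Top-$K$ sparsification and clipping verify that momentum methods converge faster than traditional biased gradient descent.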
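The abstract names the ingredients (several nodes, a momentum update, and gradients biased by Top-$K$ sparsification or clipping) but does not spell out the update rule. The following is a minimal sketch of one plausible instance, not the authors' algorithm: each worker applies a biased estimator to its local gradient, the server averages the results, and an exponential-moving-average momentum step is taken. The function names (`top_k`, `clip`, `distributed_momentum`, `make_problem`), the quadratic local objectives, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np


def top_k(g, k):
    """Keep the k largest-magnitude entries of g and zero the rest (biased compressor)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out


def clip(g, tau):
    """Rescale g so that its Euclidean norm is at most tau (biased estimator)."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)


def distributed_momentum(grads, x0, estimator, lr=0.1, beta=0.9, steps=500):
    """Momentum on the average of per-worker biased gradient estimates.

    grads: list of callables, one per worker, each returning a local gradient.
    estimator: map applied to every local gradient (e.g. top_k or clip).
    """
    x = x0.copy()
    v = np.zeros_like(x0)
    for _ in range(steps):
        # Each worker sends a (possibly biased) estimate; the server averages them.
        avg = np.mean([estimator(g(x)) for g in grads], axis=0)
        v = beta * v + (1.0 - beta) * avg  # momentum buffer
        x = x - lr * v                     # parameter update
    return x


def make_problem(n_workers=4, dim=10, seed=0):
    """Hypothetical test problem: worker i holds f_i(x) = 0.5 * ||x - c_i||^2."""
    rng = np.random.default_rng(seed)
    targets = rng.normal(size=(n_workers, dim))
    grads = [lambda x, c=c: x - c for c in targets]
    return grads, targets.mean(axis=0)  # the global minimizer is the mean of the c_i


if __name__ == "__main__":
    grads, x_star = make_problem()
    x0 = np.full(10, 20.0)
    estimators = {
        "unbiased": lambda g: g,
        "clipped": lambda g: clip(g, 10.0),
        "top-3": lambda g: top_k(g, 3),
    }
    for name, est in estimators.items():
        x = distributed_momentum(grads, x0, est)
        print(name, "distance to minimizer:", np.linalg.norm(x - x_star))
```

In this toy setting the clipped variant behaves like the unbiased one once the iterates are close enough that clipping is inactive, whereas plain Top-$K$ without any correction may settle at a point other than the true minimizer, which is the kind of bias the paper's bounds are meant to account for.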

Related papers

Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm [56.06235614890066]
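Gradient descent (GD) and stochastic gradient descent (SGD) are widely used in many application domains. This paper carefully analyzes the dynamics of terminal attractor-based GD at different stages of the gradient flow.
Paper reference translation (metadata) (2024-09-10T14:15:56Z)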
Almost sure convergence rates of stochastic gradient methods under gradient domination [2.96614015844317]
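Global and local gradient domination properties have been shown to be a more realistic replacement for strong convexity. The convergence rate $f(X_n)-f^*\in o\big(n^{-\frac{1}{4\beta-1}+\epsilon}\big)$ is obtained for the last iterate of gradient descent. We show how to apply the results to training tasks in both supervised learning and reinforcement learning.
Paper reference translation (metadata) (2024-05-22T12:40:57Z)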
Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [0.8192907805418583]
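We show that methods using biased gradients converge to critical points for smooth non-convex functions. We also show that the effect of the bias can be reduced with appropriate tuning.
Paper reference translation (metadata) (2024-02-05T10:17:36Z)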
Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
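We show that gradient descent, when done right (in the sense of using specific insights from the optimization and kernel communities), is highly effective. The paper describes the design intuitively and explains the design choices. The method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
Paper reference translation (metadata) (2023-10-31T16:15:13Z)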
On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
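This paper shows that the phenomenon can be characterized precisely in the context of kernel methods. Considering the minimization of a quadratic objective in a separable Hilbert space, we show that, with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
Paper reference translation (metadata) (2022-02-28T13:01:04Z)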
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum is evaluated on vision tasks and achieves state-of-the-art results on other tasks, including language processing.
Paper reference translation (metadata) (2021-06-22T03:13:23Z)
Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity [24.428843425522107]
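We study the dynamics of gradient descent over diagonal linear networks through its continuous-time counterpart, namely gradient flow. We show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias.
Paper reference translation (metadata) (2021-06-17T14:16:04Z)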
Stability and Generalization of Stochastic Gradient Methods for Minimax Problems [71.60601421935844]
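Many machine learning problems can be formulated as minimax problems, for example generative adversarial networks (GANs). We provide a comprehensive analysis of how gradient methods for minimax problems generalize from training examples.
Paper reference translation (metadata) (2021-05-08T22:38:00Z)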
Deep learning: a statistical viewpoint [120.94133818355645]
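Deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-perfect solutions to non-convex training problems. We conjecture that specific principles underlie these phenomena.
Paper reference translation (metadata) (2021-03-16T16:26:36Z)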
Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank [1.9350867959464846]
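In deep learning, gradient descent tends to prefer solutions that generalize well. This paper analyzes the dynamics of gradient descent in the simplified setting of linear networks and of an estimation problem.
Paper reference translation (metadata) (2020-11-27T15:08:34Z)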
On the Convergence of SGD with Biased Gradients [28.400751656818215]
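We analyze biased stochastic gradient methods (SGD) in which individual updates are corrupted by compression. We quantify how the bias affects the attainable accuracy and the convergence rate.
Paper reference translation (metadata) (2020-07-31T19:37:59Z)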
A Study of Gradient Variance in Deep Learning [56.437755740715396]
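We introduce Gradient Clustering, a method that minimizes the variance of the average mini-batch gradient through stratified sampling. We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training.
Paper reference translation (metadata) (2020-07-09T03:23:10Z)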

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

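This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.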