Fugu-MT Paper Translation (Summary): Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

Paper summary: Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

arxiv url: http://arxiv.org/abs/2002.05685v2
Date: Wed, 4 Nov 2020 16:17:37 GMT
Status: Translation complete
Last updated in system: 2023-01-01 10:03:04.549447
Title: Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
Authors: Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban
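Abstract summary: The Euler discretization of FULD is observed to have algorithmic similarities with natural gradient methods and gradient clipping, which offers a new perspective on their role in deep learning.
Reference score (attention level, computed independently by this site): 39.9241638707715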
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning, where the problem is non-convex and the gradient noise might exhibit heavy-tailed behavior, as empirically observed in recent studies. In this study, we consider a continuous-time variant of SGDm, known as the underdamped Langevin dynamics (ULD), and investigate its asymptotic properties under heavy-tailed perturbations. Supported by recent studies from statistical physics, we argue both theoretically and empirically that the heavy tails of such perturbations can result in a bias even when the step-size is small, in the sense that the optima of the stationary distribution of the dynamics might not match the optima of the cost function to be optimized. As a remedy, we develop a novel framework, which we call fractional ULD (FULD), and prove that FULD targets the so-called Gibbs distribution, whose optima exactly match the optima of the original cost. We observe that the Euler discretization of FULD has noteworthy algorithmic similarities with natural gradient methods and gradient clipping, bringing a new perspective on understanding their role in deep learning. We support our theory with experiments conducted on a synthetic model and neural networks.
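For readers unfamiliar with ULD, the following is a minimal sketch of the standard Gaussian-noise underdamped Langevin dynamics that the abstract takes as the continuous-time counterpart of SGDm. It is not the paper's FULD: the heavy-tailed driving noise and the corrected drift are not reproduced here. The function names, the quadratic test cost, and all parameter values (`step`, `gamma`, `beta`) are assumptions chosen for illustration. With Gaussian noise the dynamics target the Gibbs distribution proportional to exp(-beta * f(x)), so for f(x) = x^2 / 2 the samples of x should have variance near 1 / beta.

```python
import numpy as np


def uld_euler(grad, x0, step=0.01, gamma=1.0, beta=1.0, n_steps=100_000, seed=0):
    """Euler-type discretization of underdamped Langevin dynamics (Gaussian noise):

        dv = -(gamma * v + grad f(x)) dt + sqrt(2 * gamma / beta) dB_t
        dx = v dt

    The velocity is updated first and the position then uses the new velocity,
    which mirrors the update order of SGD with momentum.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    # Standard deviation of the Brownian increment over one step of size `step`.
    noise_scale = np.sqrt(2.0 * gamma * step / beta)
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        v = v - step * (gamma * v + grad(x)) + noise_scale * rng.standard_normal(x.shape)
        x = x + step * v
        samples[k] = x
    return samples


def quadratic_grad(x):
    """Gradient of the illustrative cost f(x) = 0.5 * ||x||^2."""
    return x


samples = uld_euler(quadratic_grad, x0=[3.0], beta=4.0)
kept = samples[10_000:]  # discard the initial transient
# Under the Gibbs target the variance of x is 1 / beta = 0.25;
# the empirical estimate should land nearby, up to sampling and step-size error.
print("mean:", kept.mean(), "variance:", kept.var())
```

Swapping the Gaussian increment for a heavy-tailed one without changing the drift is the setting in which the abstract reports a mismatch between the optima of the stationary distribution and the optima of the cost; FULD is the paper's proposed correction.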

Related papers

Role of Momentum in Smoothing Objective Function and Generalizability of Deep Neural Networks [0.6906005491572401]
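We show that the noise in stochastic gradient descent (SGD) with momentum smooths the objective function, to a degree determined by the learning rate, the batch size, the momentum factor, and an upper bound on the gradient norm. We also provide experimental results supporting the assertion that model generalizability depends on the noise level.
Paper reference translation (metadata) (2024-02-04T02:48:28Z)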
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise [16.12834917344859]
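It is widely conjectured that the heavy-ball momentum method provides accelerated convergence and should work well in large-batch settings. We show that heavy-ball momentum achieves an accelerated $\tilde{\mathcal{O}}(\sqrt{\kappa})$ convergence rate for the bias term of SGD while still attaining a near-optimal convergence rate. This implies that SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning and federated learning.
Paper reference translation (metadata) (2023-12-22T09:58:39Z)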
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
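We introduce the unhinged loss, a concise loss function that offers mathematical opportunities to analyze closed-form dynamics. The unhinged loss also makes it possible to consider more practical techniques, such as time-varying learning rates and feature normalization.
Paper reference translation (metadata) (2023-12-13T02:11:07Z)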
The Marginal Value of Momentum for Small Learning Rate SGD [20.606430391298815]
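Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. Experiments show that momentum has only limited benefits for both optimization and generalization in practical training regimes where the optimal learning rate is not very large.
Paper reference translation (metadata) (2023-07-27T21:01:26Z)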
Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
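Mean-field Langevin dynamics (MFLD) is a nonlinear generalization of Langevin dynamics that incorporates a distribution-dependent drift. Recent work has shown that MFLD globally minimizes an entropy-regularized convex functional on the space of measures. We provide a framework for proving uniform-in-time propagation of chaos for MFLD that accounts for the errors due to finite-particle approximation, time discretization, and stochastic gradient approximation.
Paper reference translation (metadata) (2023-06-12T16:28:11Z)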
Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
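We optimize the information-theoretic generalization bound by manipulating the noise structure in SGLD. We prove that, under a constraint that guarantees low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
Paper reference translation (metadata) (2021-10-26T15:02:27Z)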
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent [3.0079490585515343]
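Stochastic gradient descent (SGD) is relatively well understood in the vanishing learning rate regime. We propose to study the basic properties of SGD and its variants in the non-vanishing learning rate regime.
Paper reference translation (metadata) (2020-12-07T12:31:43Z)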
Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
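This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime. We show that SGD converges along the large-eigenvalue directions of the data matrix, whereas GD converges along the small-eigenvalue directions.
Paper reference translation (metadata) (2020-11-04T21:07:52Z)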
Dynamic of Stochastic Gradient Descent with State-Dependent Noise [84.64013284862733]
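Stochastic gradient descent (SGD) and its variants are the mainstream methods for training deep neural networks. We show that the covariance of the SGD noise in the local region around a local minimum is a quadratic function of the state. We propose a novel power-law dynamics with state-dependent diffusion to approximate the dynamics of SGD.
Paper reference translation (metadata) (2020-06-24T13:34:38Z)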

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
