Fugu-MT 論文翻訳(概要): AdamD: Improved bias-correction in Adam

論文の概要: AdamD: Improved bias-correction in Adam

arxiv url: http://arxiv.org/abs/2110.10828v2
Date: Fri, 22 Oct 2021 17:26:48 GMT
ステータス: 翻訳完了
システム内更新日: 2021-10-25 11:55:31.197145
Title: AdamD: Improved bias-correction in Adam
Title（参考訳）: AdamD: バイアス補正を改善したAdam
Authors: John St John
Abstract要約: デフォルトのバイアス補正では、Adamはトレーニングの早い段階で要求された勾配更新よりも大きくなるだろう。 Adamのデフォルトの実装は、最初に提案されたバイアス補正手順と初期ステップの振る舞いのために、ハイパーパラメータ$beta_1、beta$と同等に敏感であるかもしれない。
参考スコア（独自算出の注目度）: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Here I present a small update to the bias-correction term in the Adam optimizer that has the advantage of making smaller gradient updates in the first several steps of training. With the default bias-correction, Adam may actually make larger than requested gradient updates early in training. By only including the well-justified bias-correction of the second moment gradient estimate, $v_t$, and excluding the bias-correction on the first-order estimate, $m_t$, we attain these more desirable gradient update properties in the first series of steps. The default implementation of Adam may be as sensitive as it is to the hyperparameters $\beta_1, \beta_2$ partially due to the originally proposed bias correction procedure, and its behavior in early steps.
Abstract（参考訳）: ここでは,adamオプティマイザにおけるバイアス補正項の小さな更新について紹介する。デフォルトのバイアス補正では、Adamはトレーニングの早い段階で要求された勾配更新よりも大きくなるだろう。第2モーメント勾配の推定値である$v_t$を適切に補正したバイアス補正と、第1次推定値である$m_t$のバイアス補正を除いて、これらのより望ましい勾配更新特性を第1のステップで達成する。 Adamのデフォルトの実装は、もともと提案されたバイアス補正手順と初期ステップの振る舞いのために、ハイパーパラメータ$\beta_1, \beta_2$に匹敵する敏感である。

関連論文リスト

Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps [65.64965527170156]
我々は、強化学習に広く用いられているAdam optimiserに適応する。我々は、Adam-Relがエポック内で局所的なタイムステップを使用しており、基本的にターゲット変更後のAdamのタイムステップを0にリセットしていることを示す。次に,RLにおいて勾配ノルムの増加が生じることを示すとともに,理論モデルと観測データとの差について検討する。
論文参考訳（メタデータ） (2024-12-22T18:01:08Z)
ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate [21.378608502899077]
本稿では,ADOPTという新しい適応勾配法を提案する。これは,有界雑音の仮定に依存することなく,$mathcalOの最適収束率を実現する。 ADOPTは、画像分類、生成モデル、自然言語処理、深層強化学習など、幅広いタスクにおいて、Adamとその変種と比較して優れた結果が得られる。
論文参考訳（メタデータ） (2024-11-05T06:57:47Z)
DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation [0.0]
我々は,AdamによるDPの従来の利用は,勾配計算における独立雑音の追加により,第2モーメント推定においてバイアスが発生することを観察した。このバイアスは、非プライベートなAdamの振る舞いやAdamのサイン降下解釈と矛盾する低分散パラメータ更新のための異なるスケーリングにつながる。
論文参考訳（メタデータ） (2023-04-21T18:43:37Z)
Provable Adaptivity of Adam under Non-uniform Smoothness [79.25087082434975]
アダムは急速に収束するため、実用的な用途で広く採用されている。アダムの既存の収束解析は、有界な滑らかさの仮定に依存する。本稿では,ランダムにリシャッフルされたAdamの学習率の低下に伴う収束について検討する。
論文参考訳（メタデータ） (2022-08-21T14:57:47Z)
Understanding AdamW through Proximal Methods and Scale-Freeness [57.47324825501137]
Adam は $ell$ regularizer Adam-$ell$ の一般化である。 AdamWは、Adam-$ell$の更新ルールからAdam-$ell$の勾配を分離する。我々はAdamWがAdam-$ell$よりも有利であることを示し、ネットワークの勾配が複数のスケールを示すことを期待する度合いを示す。
論文参考訳（メタデータ） (2022-01-31T21:00:55Z)
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
我々は,重力減衰グローバリゼーションにおいても,目的の異なる解に確実に異なる誤差で収束できることを示す。凸と重み減衰正則化を用いると、Adamを含む任意の最適化アルゴリズムは同じ解に収束することを示す。
論文参考訳（メタデータ） (2021-08-25T17:58:21Z)
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
textscAdaMomentum on vision, and achieves state-the-art results on other task including language processing。 textscAdaMomentum on vision, and achieves state-the-art results on other task including language processing。 textscAdaMomentum on vision, and achieves state-the-art results on other task including language processing。
論文参考訳（メタデータ） (2021-06-22T03:13:23Z)
Adam$^+$: A Stochastic Method with Adaptive Variance Reduction [56.051001950733315]
Adamはディープラーニングアプリケーションに広く使われている最適化手法である。我々はAdam$+$(Adam-plusと発音する)という新しい方法を提案する。画像分類,言語モデリング,自動音声認識など,さまざまなディープラーニングタスクに関する実証研究により,Adam$+$がAdamを著しく上回ることを示した。
論文参考訳（メタデータ） (2020-11-24T09:28:53Z)
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients [112.00379151834242]
本稿では,Adamにおける2乗勾配のランニング平均を重み付き平均に置き換える適応学習率の原理を提案する。これにより、より高速な適応が可能となり、より望ましい経験的収束挙動がもたらされる。
論文参考訳（メタデータ） (2020-06-21T21:47:43Z)
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights [53.8489656709356]
正規化技術は現代の深層学習の恩恵である。しかし、運動量を導入することで、スケール不変の重みに対する効果的なステップサイズが急速に小さくなることがしばしば見過ごされる。本稿では,この2つの材料の組み合わせが,有効ステップサイズと準最適モデル性能の早期劣化につながることを検証した。
論文参考訳（メタデータ） (2020-06-15T08:35:15Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。