Fugu-MT Paper Translation (Summary): Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity

Paper summary: Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity

arxiv url: http://arxiv.org/abs/2410.08198v1
Date: Thu, 10 Oct 2024 17:58:53 GMT
Status: Translation complete
Last updated in system: 2024-10-31 04:46:03.686300
Title: Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
Authors: Shuo Xie, Mohamad Amin Mohamadi, Zhiyuan Li
Abstract summary: Our experiments confirm that Adam performs much worse when the favorable $\ell_\infty$-geometry is changed, while SGD provably remains unaffected.
Reference score (attention score computed by this site): 6.270305440413688
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adam outperforms SGD when training language models. Yet this advantage is not well-understood theoretically -- previous convergence analysis for Adam and SGD mainly focuses on the number of steps $T$ and is already minimax-optimal in non-convex cases, which are both $\widetilde{O}(T^{-1/4})$. In this work, we argue that the exploitation of nice $\ell_\infty$-geometry is the key advantage of Adam over SGD. More specifically, we give a new convergence analysis for Adam under novel assumptions that loss is smooth under $\ell_\infty$-geometry rather than the more common $\ell_2$-geometry, which yields a much better empirical smoothness constant for GPT-2 and ResNet models. Our experiments confirm that Adam performs much worse when the favorable $\ell_\infty$-geometry is changed while SGD provably remains unaffected. We also extend the convergence analysis to blockwise Adam under novel blockwise smoothness assumptions.
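The abstract's central assumption is smoothness under $\ell_\infty$-geometry rather than $\ell_2$-geometry. The Python sketch below builds intuition for the difference. It assumes the standard notion of smoothness with respect to a norm, in which gradient differences are measured in the dual norm:

- $\|\nabla f(x)-\nabla f(y)\|_1 \le L\|x-y\|_\infty$ for $\ell_\infty$;
- $\|\nabla f(x)-\nabla f(y)\|_2 \le L\|x-y\|_2$ for $\ell_2$.

For a quadratic $f(x)=\tfrac12 x^\top H x$, these constants are the $(\infty\to 1)$ operator norm of $H$ and, for positive semidefinite $H$, its largest eigenvalue. The helper names and the toy 2x2 matrix are illustrative assumptions. This is not the paper's procedure for measuring the GPT-2 or ResNet constants.

```python
import itertools
import math


def matvec(H, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def linf_smoothness(H):
    """Smoothness constant of f(x) = 0.5 * x^T H x w.r.t. the l_inf norm:
    the smallest L with ||H(x - y)||_1 <= L * ||x - y||_inf, i.e. the
    (inf -> 1) operator norm of H. ||Hx||_1 is convex in x, so its maximum
    over the unit l_inf ball is attained at a vertex s in {-1, +1}^d.
    Brute force over 2^d vertices: only usable for small d."""
    d = len(H)
    return max(
        sum(abs(c) for c in matvec(H, s))
        for s in itertools.product((-1.0, 1.0), repeat=d)
    )


def l2_smoothness_2x2(H):
    """l2 smoothness constant of a symmetric PSD 2x2 quadratic: its largest eigenvalue."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    return (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)


def rotate(H, theta):
    """Return R H R^T, where R is the 2x2 rotation by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]
    RH = [[sum(R[i][k] * H[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(RH[i][k] * R[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


H = [[1.0, 0.0], [0.0, 0.0]]     # axis-aligned curvature
H_rot = rotate(H, math.pi / 4)   # same spectrum, rotated by 45 degrees
print(l2_smoothness_2x2(H), l2_smoothness_2x2(H_rot))  # both about 1.0
print(linf_smoothness(H), linf_smoothness(H_rot))      # about 1.0 vs about 2.0
```

Rotating $H$ leaves the $\ell_2$ constant unchanged, consistent with SGD being unaffected by such a change of basis. The same rotation doubles the $\ell_\infty$ constant, which is the quantity the abstract ties to Adam's convergence.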

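The abstract also mentions blockwise Adam, but the summary does not define it. The Python sketch below therefore rests on stated assumptions:

- each block of coordinates shares one second-moment scalar;
- that scalar is updated with the mean of the squared gradient entries in the block;
- the function name and this averaging rule are hypothetical.

Singleton blocks recover ordinary coordinate-wise Adam. A single block containing every coordinate uses one scalar step size that depends only on $\|g\|_2^2$, so its update commutes with rotations, as SGD's does.

```python
import math


def blockwise_adam_step(x, g, m, v, blocks, t, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """One bias-corrected step of a blockwise Adam variant (hypothetical sketch).

    x, g, m: per-coordinate lists (parameters, gradient, first moment).
    v: one second-moment scalar per block.
    blocks: list of index lists that partition range(len(x)).
    t: 1-based step counter used for bias correction.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = list(v)
    x = list(x)
    for b, idx in enumerate(blocks):
        # Assumed rule: the block's statistic is the mean of g_i^2 over the block.
        mean_sq = sum(g[i] ** 2 for i in idx) / len(idx)
        v[b] = beta2 * v[b] + (1 - beta2) * mean_sq
        denom = math.sqrt(v[b] / (1 - beta2 ** t)) + eps
        for i in idx:
            x[i] -= lr * (m[i] / (1 - beta1 ** t)) / denom
    return x, m, v


g = [3.0, -0.5]
# Singleton blocks: ordinary Adam; the first step is close to -lr * sign(g).
x_adam, _, _ = blockwise_adam_step([0.0, 0.0], g, [0.0, 0.0], [0.0, 0.0],
                                   blocks=[[0], [1]], t=1, lr=0.1)
# One block: a shared step size, so the step stays parallel to -g.
x_one, _, _ = blockwise_adam_step([0.0, 0.0], g, [0.0, 0.0], [0.0],
                                  blocks=[[0, 1]], t=1, lr=0.1)
print(x_adam)  # approximately [-0.1, 0.1]
print(x_one)   # approximately [-0.1395, 0.0232]
```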
Related papers

Simple Convergence Proof of Adam From a Sign-like Descent Perspective [58.89890024903816]
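We show that Adam achieves the optimal rate of $\mathcal{O}(\frac{1}{T^{1/4}})$, improving on the previous $\mathcal{O}(\frac{\ln T}{T^{1/4}})$. Our theoretical analysis offers new insight into the role of momentum as a key factor in guaranteeing convergence.
Reference translation (metadata) (2025-07-08T13:19:26Z)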
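The sign-like descent view can be read directly off the update rule:

- with $\beta_1=\beta_2=0$ and $\epsilon=0$, Adam's direction $m_t/\sqrt{v_t}$ equals $g_t/|g_t|=\operatorname{sign}(g_t)$ coordinate-wise;
- $\operatorname{sign}(g)$ is the steepest-descent direction under the $\ell_\infty$ norm, since it maximizes $\langle g, d\rangle$ over $\|d\|_\infty\le 1$.

The minimal Python sketch below checks that reduction. The helper name is hypothetical and bias correction is omitted. It illustrates the perspective only and is not the paper's proof.

```python
import math


def adam_direction(g, m, v, beta1, beta2, eps=0.0):
    """Coordinate-wise Adam direction m / (sqrt(v) + eps), without bias correction.

    Coordinates with v == 0 (zero gradient history) get direction 0.
    Returns the direction and the updated (m, v) state.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    d = [mi / (math.sqrt(vi) + eps) if vi > 0 else 0.0 for mi, vi in zip(m, v)]
    return d, m, v


g = [2.0, -0.3, 0.0]
d, _, _ = adam_direction(g, [0.0] * 3, [0.0] * 3, beta1=0.0, beta2=0.0)
print(d)  # approximately [1.0, -1.0, 0.0], i.e. sign(g)
```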
When Can You Get Away with Low Memory Adam? [48.30892531847662]
We show that $\textit{SlimAdam}$ matches Adam's performance and stability while saving up to 98% of the total second moments. Code for $\textit{SlimAdam}$ is available at https://github.com/dayal-kalra/low-Memory-adam.
Reference translation (metadata) (2025-03-03T18:59:40Z)
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization [5.896194021915813]
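Adam with weight decay (AdamW) is widely praised for its superior performance on language modeling tasks. To understand the advantages of AdamW, we show that it implicitly performs constrained optimization. In the full-batch setting, we show that if AdamW converges under any non-increasing learning rate schedule whose partial sums diverge, it must converge to a KKT point of the original loss under an $\ell_\infty$-norm constraint.
Reference translation (metadata) (2024-04-05T23:56:50Z)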
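The title describes AdamW as implicitly performing $\ell_\infty$-norm-constrained optimization. A short worked argument suggests where such a box could come from. It rests on three assumptions:

- the normalized part of the update satisfies $|u_i|\le 1$, as in the sign-descent limit of Adam (real Adam updates can exceed this);
- weight decay $\lambda>0$ is decoupled;
- the learning rate $\eta$ satisfies $\eta\lambda\le 1$.

If $|x_i|\le 1/\lambda$, then $|x_i - \eta(u_i + \lambda x_i)| = |(1-\eta\lambda)x_i - \eta u_i| \le (1-\eta\lambda)/\lambda + \eta = 1/\lambda$. So iterates that start in the box $\|x\|_\infty\le 1/\lambda$ stay there. The Python sketch below checks this numerically; the names are hypothetical.

```python
import random


def decoupled_wd_step(x, u, lr, wd):
    """x <- x - lr * (u + wd * x): a normalized update u plus decoupled weight decay."""
    return [xi - lr * (ui + wd * xi) for xi, ui in zip(x, u)]


rng = random.Random(0)
lr, wd = 0.05, 0.1   # lr * wd = 0.005 <= 1
radius = 1.0 / wd    # the box ||x||_inf <= 1 / wd from the argument above

# Random bounded updates never leave the box.
x = [0.0] * 4
max_abs = 0.0
for _ in range(5000):
    u = [rng.uniform(-1.0, 1.0) for _ in x]   # any update with |u_i| <= 1
    x = decoupled_wd_step(x, u, lr, wd)
    max_abs = max(max_abs, max(abs(xi) for xi in x))
print(max_abs <= radius + 1e-9)  # True

# A constant push toward the boundary approaches it without crossing it.
x = [0.0, 0.0]
for _ in range(5000):
    x = decoupled_wd_step(x, [-1.0, 1.0], lr, wd)
print(x)  # close to [10.0, -10.0]
```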
UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization [20.399244578926474]
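We introduce UAdam, a unified framework for Adam-type algorithms, equipped with a general form of the second-order moment that covers methods such as NAdamBound, AdaFom, and Adan. We show that UAdam converges to a neighborhood of stationary points at a rate of $\mathcal{O}(1/T)$.
Reference translation (metadata) (2023-05-09T13:07:03Z)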
Provable Adaptivity of Adam under Non-uniform Smoothness [79.25087082434975]
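Adam is widely adopted in practical applications because of its fast convergence. Existing convergence analyses of Adam rely on a bounded smoothness assumption. In this paper, we study the convergence of randomly reshuffled Adam with diminishing learning rates.
Reference translation (metadata) (2022-08-21T14:57:47Z)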
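Random reshuffling, the sampling scheme in the summary, replaces i.i.d. sampling with a fresh random permutation of the dataset at the start of each epoch, so every sample is visited exactly once per epoch. The Python generator below sketches that loop with a diminishing learning rate. The per-epoch decay $\eta_k=\eta_0/\sqrt{k+1}$ and the function name are illustrative assumptions, not the schedule analyzed in the paper. Each yielded triple would drive one Adam step.

```python
import math
import random


def reshuffled_schedule(n_samples, n_epochs, lr0, seed=0):
    """Yield (epoch, sample_index, lr) under random reshuffling.

    Each epoch draws a fresh permutation of range(n_samples); the learning
    rate decays per epoch as lr0 / sqrt(epoch + 1) (an illustrative choice).
    """
    rng = random.Random(seed)
    order = list(range(n_samples))
    for epoch in range(n_epochs):
        rng.shuffle(order)
        lr = lr0 / math.sqrt(epoch + 1)
        for i in order:
            yield epoch, i, lr


for epoch, i, lr in reshuffled_schedule(n_samples=4, n_epochs=2, lr0=0.1):
    print(epoch, i, round(lr, 4))  # an Adam step on sample i would go here
```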
Understanding AdamW through Proximal Methods and Scale-Freeness [57.47324825501137]
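Adam combined with an $\ell_2$ regularizer is referred to as Adam-$\ell_2$. AdamW decouples the gradient of the $\ell_2$ regularizer from the update rule of Adam-$\ell_2$. We show that AdamW's advantage over Adam-$\ell_2$ correlates with the degree to which the network's gradients are expected to exhibit multiple scales.
Reference translation (metadata) (2022-01-31T21:00:55Z)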
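The summary contrasts Adam-$\ell_2$ with AdamW. The difference is where the regularizer gradient enters:

- Adam-$\ell_2$ adds $\lambda x$ to the loss gradient before the coordinate-wise normalization;
- AdamW applies $\lambda x$ outside that normalization.

The minimal Python sketch below shows a consequence related to the scale-freeness in the title. Rescaling the loss gradient by a constant leaves the AdamW step essentially unchanged. The same rescaling can reverse the Adam-$\ell_2$ step, because the fixed term $\lambda x$ is normalized together with the rescaled gradient. The function names are hypothetical. Scaling the decay by the learning rate is one common convention and is assumed here.

```python
import math


def adam_family_step(x, g, m, v, t, lr, wd, decoupled,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """One bias-corrected step of Adam-l2 (decoupled=False) or AdamW (decoupled=True).

    Assumed convention: x <- x - lr * (adam_step + wd * x) for the decoupled case.
    """
    if not decoupled:
        # Adam-l2: the regularizer gradient wd * x joins the loss gradient and
        # is normalized together with it.
        g = [gi + wd * xi for gi, xi in zip(g, x)]
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    new_x = []
    for xi, mi, vi in zip(x, m, v):
        step = (mi / (1 - beta1 ** t)) / (math.sqrt(vi / (1 - beta2 ** t)) + eps)
        if decoupled:
            # AdamW: weight decay bypasses the adaptive normalization.
            step += wd * xi
        new_x.append(xi - lr * step)
    return new_x, m, v


def first_step(g, decoupled, x=(1.0,), lr=0.1, wd=0.1):
    """First step (t = 1) from zero optimizer state."""
    x = list(x)
    new_x, _, _ = adam_family_step(x, g, [0.0] * len(x), [0.0] * len(x),
                                   t=1, lr=lr, wd=wd, decoupled=decoupled)
    return new_x


for scale in (1.0, 1000.0):
    g = [-0.05 * scale]  # same loss-gradient direction, rescaled
    print(scale, first_step(g, decoupled=True), first_step(g, decoupled=False))
# AdamW gives about [1.09] for both scales; Adam-l2 gives about [0.9], then [1.1].
```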
A Novel Convergence Analysis for Algorithms of the Adam Family [105.22760323075008]
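This paper presents a general convergence proof for a family of Adam-style methods, including Adam, AMSGrad, and AdaBound. Our analysis is simple and generic enough that it can be used to establish convergence for a broader family of non-convex optimization problems.
Reference translation (metadata) (2021-12-07T02:47:58Z)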
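Among the methods named, Adam and AMSGrad differ only in the second-moment estimate that divides the step:

- Adam uses the exponential moving average $v_t=\beta_2 v_{t-1}+(1-\beta_2)g_t^2$;
- AMSGrad divides by the running maximum $\hat v_t=\max(\hat v_{t-1}, v_t)$, so with a fixed learning rate its effective step size never increases.

Below is a one-dimensional Python sketch with a hypothetical function name; bias correction is omitted for brevity.

```python
def second_moments(grads, beta2=0.999, amsgrad=False):
    """Second-moment estimates used in the denominator at each step (1-D, no bias correction)."""
    v, v_max, out = 0.0, 0.0, []
    for g in grads:
        v = beta2 * v + (1 - beta2) * g * g  # Adam's exponential moving average
        v_max = max(v_max, v)                # AMSGrad's running maximum
        out.append(v_max if amsgrad else v)
    return out


grads = [4.0, 0.0, 0.0]
print(second_moments(grads, beta2=0.5))                # [8.0, 4.0, 2.0]
print(second_moments(grads, beta2=0.5, amsgrad=True))  # [8.0, 8.0, 8.0]
```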
Adam$^+$: A Stochastic Method with Adaptive Variance Reduction [56.051001950733315]
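Adam is an optimization method widely used in deep learning applications. We propose a new method named Adam$^+$ (pronounced Adam-plus). Empirical studies on various deep learning tasks, including image classification, language modeling, and automatic speech recognition, show that Adam$^+$ significantly outperforms Adam.
Reference translation (metadata) (2020-11-24T09:28:53Z)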
A new regret analysis for Adam-type algorithms [78.825194932103]
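In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_1 \to 0$ schedule. We propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_1$.
Reference translation (metadata) (2020-03-21T19:19:51Z)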
A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
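We give a simple proof of convergence covering both Adam and Adagrad, in which Adagrad achieves a rate of $O(d\ln(N)/\sqrt{N})$. With suitable hyperparameters, Adam converges at the same rate $O(d\ln(N)/\sqrt{N})$.
Reference translation (metadata) (2020-03-05T01:56:17Z)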
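The summary covers Adam and Adagrad in a single proof. Their structural difference is the second-moment accumulator:

- Adagrad divides by the square root of the running sum of squared gradients, so on a constant gradient its effective step shrinks like $1/\sqrt{t}$;
- Adam's bias-corrected exponential moving average keeps the step roughly constant.

Below is a minimal Python sketch with hypothetical names; Adam's momentum is omitted ($\beta_1=0$).

```python
import math


def effective_steps(grads, lr, method, beta2=0.999):
    """Per-step update magnitude lr * |g_t| / sqrt(v_t) for a 1-D gradient stream.

    'adagrad': v_t is the running sum of g^2.
    'adam':    v_t is the bias-corrected EMA of g^2 (momentum omitted, beta1 = 0).
    Assumes g_t != 0 so that v_t > 0.
    """
    v, steps = 0.0, []
    for t, g in enumerate(grads, start=1):
        if method == "adagrad":
            v += g * g
            v_hat = v
        elif method == "adam":
            v = beta2 * v + (1 - beta2) * g * g
            v_hat = v / (1 - beta2 ** t)
        else:
            raise ValueError(method)
        steps.append(lr * abs(g) / math.sqrt(v_hat))
    return steps


grads = [1.0] * 4
print(effective_steps(grads, 1.0, "adagrad"))  # [1.0, 0.707..., 0.577..., 0.5]
print(effective_steps(grads, 1.0, "adam"))     # about [1.0, 1.0, 1.0, 1.0]
```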

The related papers list is generated automatically from the titles and abstracts of papers on this site.
