Fugu-MT Paper Translation (Summary): A Simple Convergence Proof of Adam and Adagrad

arxiv url: http://arxiv.org/abs/2003.02395v3
Date: Mon, 17 Oct 2022 13:20:40 GMT
Status: Translation complete
Last updated (in system): 2022-12-26 06:24:34.248865
Title: A Simple Convergence Proof of Adam and Adagrad
Authors: Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier
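Abstract summary: We give a simple convergence proof covering both Adam and Adagrad, with a rate of $O(d\ln(N)/\sqrt{N})$ for Adagrad. With suitable hyper-parameters, Adam converges at the same rate $O(d\ln(N)/\sqrt{N})$; with its default parameters, it does not converge.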
Reference score (site-computed attention score): 74.24716715922759
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients. We show that in expectation, the squared norm of the objective gradient averaged over the trajectory has an upper-bound which is explicit in the constants of the problem, parameters of the optimizer, the dimension $d$, and the total number of iterations $N$. This bound can be made arbitrarily small, and with the right hyper-parameters, Adam can be shown to converge with the same rate of convergence $O(d\ln(N)/\sqrt{N})$. When used with the default parameters, Adam doesn't converge, however, and just like constant step-size SGD, it moves away from the initialization point faster than Adagrad, which might explain its practical success. Finally, we obtain the tightest dependency on the heavy ball momentum decay rate $\beta_1$ among all previous convergence bounds for non-convex Adam and Adagrad, improving from $O((1-\beta_1)^{-3})$ to $O((1-\beta_1)^{-1})$.
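To make the last claim of the abstract concrete, here is a worked example of the momentum-dependent factor alone. It assumes the commonly used default $\beta_1 = 0.9$ (a value the abstract does not state) and ignores the constants hidden in the $O(\cdot)$ notation:

```latex
\beta_1 = 0.9 \;\Rightarrow\; 1 - \beta_1 = 0.1, \qquad
(1-\beta_1)^{-3} = 10^{3}, \qquad
(1-\beta_1)^{-1} = 10^{1}, \qquad
\frac{(1-\beta_1)^{-3}}{(1-\beta_1)^{-1}} = (1-\beta_1)^{-2} = 10^{2}.
```

So, for this value of $\beta_1$, the factor of the bound that depends on momentum is about 100 times smaller under the improved dependency.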
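The setting described in the abstract (a smooth, possibly non-convex objective with bounded gradients, and the squared gradient norm averaged over the trajectory) can be illustrated with the textbook update rules of both optimizers. This is a minimal sketch under stated assumptions, not the paper's parametrization or proof: the toy objective $f(x) = \sum_i \ln(1 + x_i^2)$, the starting point, the Adagrad step size `lr=0.5`, and the use of exact (noise-free) gradients instead of stochastic ones are all hypothetical choices. Adam uses its usual defaults ($\alpha = 10^{-3}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$), and Adagrad is shown without the heavy-ball momentum that the paper also covers.

```python
import math
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def grad_f(x: Vector) -> Vector:
    """Gradient of the hypothetical toy objective f(x) = sum_i log(1 + x_i^2).

    Assumption for illustration only (not from the paper): f is smooth and
    non-convex (f'' < 0 for |x_i| > 1), and every partial derivative
    2x / (1 + x^2) lies in [-1, 1], so the gradients are bounded.
    """
    return [2.0 * xi / (1.0 + xi * xi) for xi in x]


def sq_norm(g: Vector) -> float:
    """Squared Euclidean norm of a gradient vector."""
    return sum(gi * gi for gi in g)


def adagrad(x0: Vector, grad: GradFn, lr: float = 0.5,
            eps: float = 1e-8, steps: int = 1000) -> List[float]:
    """Diagonal Adagrad without momentum; returns ||grad f(x_n)||^2 per step."""
    x = list(x0)
    g2_sum = [0.0] * len(x)  # per-coordinate running sum of squared gradients
    history = []
    for _ in range(steps):
        g = grad(x)  # exact gradient (assumption: no sampling noise)
        history.append(sq_norm(g))
        for i, gi in enumerate(g):
            g2_sum[i] += gi * gi
            x[i] -= lr * gi / (math.sqrt(g2_sum[i]) + eps)
    return history


def adam(x0: Vector, grad: GradFn, lr: float = 1e-3, beta1: float = 0.9,
         beta2: float = 0.999, eps: float = 1e-8,
         steps: int = 1000) -> List[float]:
    """Textbook Adam with bias correction; returns ||grad f(x_n)||^2 per step."""
    x = list(x0)
    m = [0.0] * len(x)  # exponential moving average of gradients (momentum)
    v = [0.0] * len(x)  # exponential moving average of squared gradients
    history = []
    for t in range(1, steps + 1):
        g = grad(x)
        history.append(sq_norm(g))
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias-corrected second moment
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return history


def trajectory_average(history: List[float]) -> float:
    """Average of ||grad f||^2 over the trajectory (deterministic analogue
    of the quantity whose expectation the paper bounds)."""
    return sum(history) / len(history)


if __name__ == "__main__":
    x0 = [3.0, -2.0, 0.5]  # hypothetical starting point
    print("Adagrad:", trajectory_average(adagrad(x0, grad_f)))
    print("Adam:   ", trajectory_average(adam(x0, grad_f)))
```

Running the script prints the trajectory average of $\|\nabla f(x_n)\|^2$ for each optimizer. With exact gradients on a single toy problem this says nothing about the expectation bound itself; it only shows which quantity that bound controls.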

Related papers

Simple Convergence Proof of Adam From a Sign-like Descent Perspective [58.89890024903816]
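We show that Adam achieves the optimal rate of $\mathcal{O}(1/T^{1/4})$, rather than the previous $\mathcal{O}(\ln T/T^{1/4})$. Our theoretical analysis provides new insight into the role of momentum as a key factor in guaranteeing convergence.
Reference translation (metadata) (2025-07-08T13:19:26Z)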
Convergence Rate Analysis of LION [54.28350823319057]
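LION converges to a Karush-Kuhn-Tucker (KKT) point at a rate of $\mathcal{O}(\sqrt{d}K^{-1/4})$ over $K$ iterations, measured by the gradient norm. Compared with standard SGD, LION is shown to achieve lower loss and higher performance.
Reference translation (metadata) (2024-11-12T11:30:53Z)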
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance [23.112775335244258]
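We first analyze RMSProp, a special case of Adam with adaptive learning rates. We develop a new upper bound on the first-order term in the descent lemma, which is also a function of the gradient norm. Our results for both RMSProp and Adam match the complexity established in \cite{arvani2023lower}.
Reference translation (metadata) (2024-04-01T19:17:45Z)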
Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods [25.831462008050387]
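The last-iterate convergence of the stochastic gradient descent (SGD) algorithm has attracted interest because of its good practical performance despite a lack of theoretical understanding. It remains unclear whether last-iterate convergence can be provably extended to wider composite optimization and non-Euclidean norms.
Reference translation (metadata) (2023-12-13T21:41:06Z)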
Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
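We show that Adam converges to an $\epsilon$-stationary point with $O(\epsilon^{-4})$ gradient complexity under more realistic conditions. We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
Reference translation (metadata) (2023-04-27T06:27:37Z)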
High Probability Convergence of Stochastic Gradient Methods [15.829413808059124]
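We show convergence bounds that depend on the initial distance to the optimal solution. We also show that high-probability bounds can be obtained for AdaGrad-Norm.
Reference translation (metadata) (2023-02-28T18:42:11Z)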
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
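We study a training algorithm for distributed workers with varying communication frequencies. We prove a tighter convergence rate of the form $\mathcal{O}(\sigma^2\epsilon^{-2} + \ldots)$ that involves the average delay $\tau_{avg}$. We also show that the heterogeneity term is affected by the workers' average delay.
Reference translation (metadata) (2022-06-16T17:10:57Z)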
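Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm is widely applied to solve minimax optimization problems. This paper fills a gap in its convergence theory by studying variable convergence under Kurdyka-Łojasiewicz (KŁ) geometry.
Reference translation (metadata) (2021-02-09T05:35:53Z)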
Last iterate convergence of SGD for Least-Squares in the Interpolation regime [19.05750582096579]
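We study the noiseless model in the fundamental least-squares setup. We assume that an optimal predictor perfectly fits the inputs and outputs, $\langle\theta_*, \phi(X)\rangle = Y$, where $\phi(X)$ denotes an infinite-dimensional nonlinear feature map.
Reference translation (metadata) (2021-02-05T14:02:20Z)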
On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
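This paper analyzes the trajectories of stochastic gradient descent (SGD). We show that SGD avoids strict saddle points/manifolds with probability $1$ for the step-size policies considered.
Reference translation (metadata) (2020-06-19T14:11:26Z)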

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
