Fugu-MT Paper Translation (Abstract): On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

Paper summary: On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

arxiv url: http://arxiv.org/abs/2402.03982v1
Date: Tue, 6 Feb 2024 13:19:26 GMT
Status: Translation complete
Last updated in system: 2024-02-07 14:56:02.112783
Title: On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Title (reference translation): On the Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Authors: Yusu Hong and Junhong Lin
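Abstract summary: The Adaptive Momentum Estimation (Adam) algorithm is highly effective across a variety of deep learning tasks. Under a general noise model covering affine variance noise, bounded noise, and sub-Gaussian noise, the paper shows that Adam can find a stationary point with high probability at a $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate.
Reference score (attention score computed independently by this site): 4.9495085874952895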
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Adaptive Momentum Estimation (Adam) algorithm is highly effective in training various deep learning tasks. Despite this, there's limited theoretical understanding for Adam, especially when focusing on its vanilla form in non-convex smooth scenarios with potential unbounded gradients and affine variance noise. In this paper, we study vanilla Adam under these challenging conditions. We introduce a comprehensive noise model which governs affine variance noise, bounded noise and sub-Gaussian noise. We show that Adam can find a stationary point with a $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate in high probability under this general noise model where $T$ denotes the total number of iterations, matching the lower rate of stochastic first-order algorithms up to logarithmic factors. More importantly, we reveal that Adam is free of tuning step-sizes with any problem-parameters, yielding a better adaptation property than the Stochastic Gradient Descent under the same conditions. We also provide a probabilistic convergence result for Adam under a generalized smooth condition which allows unbounded smoothness parameters and has been illustrated empirically to more accurately capture the smooth property of many practical objective functions.
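Abstract (reference translation): The Adaptive Momentum Estimation (Adam) algorithm is highly effective for training on a variety of deep learning tasks. Nevertheless, the theoretical understanding of Adam is limited, especially for its vanilla form in non-convex smooth settings with potentially unbounded gradients and affine variance noise. This paper studies vanilla Adam under these challenging conditions. It introduces a comprehensive noise model that covers affine variance noise, bounded noise, and sub-Gaussian noise. Under this general noise model, Adam is shown to find a stationary point with high probability at a $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate, where $T$ is the total number of iterations, matching the lower rate of stochastic first-order algorithms up to logarithmic factors. More importantly, Adam does not need its step sizes tuned with any problem parameters, which gives it a better adaptation property than Stochastic Gradient Descent under the same conditions. The paper also gives a probabilistic convergence result for Adam under a generalized smoothness condition that allows unbounded smoothness parameters and has been shown empirically to capture the smoothness of many practical objective functions more accurately.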
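As a rough illustration of the setting the abstract describes, the sketch below runs the standard bias-corrected Adam update on a toy non-convex objective with gradient noise whose variance grows affinely with the squared gradient norm. It is not the authors' code or experimental setup: the objective, the noise constants `sigma0` and `sigma1`, the Gaussian noise, and the default hyperparameters are all assumptions chosen for illustration, and the step size here is a fixed constant rather than any schedule analyzed in the paper.

```python
import math
import random


def grad_f(x):
    """Gradient of the toy non-convex objective f(x) = sum(x_i^2 / (1 + x_i^2))."""
    return [2.0 * xi / (1.0 + xi * xi) ** 2 for xi in x]


def grad_norm(x):
    """Euclidean norm of the exact gradient, used as a stationarity measure."""
    return math.sqrt(sum(gi * gi for gi in grad_f(x)))


def noisy_grad(x, rng, sigma0=0.1, sigma1=0.5):
    """Stochastic gradient with affine variance noise (illustrative constants).

    The Gaussian noise is scaled so that
    E||noise||^2 = sigma0^2 + sigma1^2 * ||grad f(x)||^2.
    """
    g = grad_f(x)
    sq_norm = sum(gi * gi for gi in g)
    std = math.sqrt((sigma0 ** 2 + sigma1 ** 2 * sq_norm) / len(x))
    return [gi + rng.gauss(0.0, std) for gi in g]


def adam(x0, steps, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Vanilla Adam with bias correction, applied coordinate-wise."""
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (momentum) estimate
    v = [0.0] * len(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = noisy_grad(x, rng)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias-corrected second moment
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_start = [1.5, -2.0, 0.5]
x_end = adam(x_start, steps=5000)
print("gradient norm at start:", grad_norm(x_start))
print("gradient norm at end:  ", grad_norm(x_end))
```

The gradient norm is printed because the convergence results in the abstract are stated in terms of reaching a stationary point, not a global minimum. Setting `beta1=0.0` in this sketch removes the momentum term and gives an RMSProp-style update with bias correction.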

Related papers

Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed [83.8485684139678]
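Methods with adaptive step sizes, such as AdaGrad and Adam, are essential for training modern deep learning models. We prove that AdaGrad can have provably bad high-probability convergence when the noise is heavy-tailed. We propose a new version of AdaGrad called Clip-RAdaGradD (Clipped Reweighted AdaGrad with Delay).
Paper reference translation (metadata) (2024-06-06T18:49:10Z)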
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance [23.112775335244258]
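We first analyze RMSProp, a special case of Adam with adaptive learning rates. We develop a new upper bound on the first-order term in the descent lemma, which is also a function of the gradient norm. Our results for both RMSProp and Adam match the complexity established in \cite{arvani2023lower}.
Paper reference translation (metadata) (2024-04-01T19:17:45Z)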
High Probability Convergence of Adam Under Unbounded Gradients and Affine Variance Noise [4.9495085874952895]
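We show that Adam can converge to a stationary point in high probability at a rate of $\mathcal{O}\left(\mathrm{poly}(\log T)/\sqrt{T}\right)$ under coordinate-wise "affine" noise variance. It is also revealed that Adam confines its gradient magnitudes within an order of $\mathcal{O}\left(\mathrm{poly}(\log T)\right)$, and a convergence rate that adapts to the noise level is also obtained.
Paper reference translation (metadata) (2023-11-03T15:55:53Z)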
UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization [20.399244578926474]
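We introduce a unified framework for Adam-type algorithms (UAdam). It is equipped with a general form of the second-order moment that covers variants such as NAdam, AdaBound, AdaFom, and Adan. We show that UAdam converges to a neighborhood of stationary points at a rate of $\mathcal{O}(1/T)$.
Paper reference translation (metadata) (2023-05-09T13:07:03Z)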
Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
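We show that, under more realistic conditions, Adam converges to an $\epsilon$-stationary point with $O(\epsilon^{-4})$ gradient complexity. We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
Paper reference translation (metadata) (2023-04-27T06:27:37Z)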
The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
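We show that deviating from the assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators. In particular, the optimal noise distribution differs from the data distribution and can even belong to a different family.
Paper reference translation (metadata) (2022-03-02T13:59:20Z)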
A Novel Convergence Analysis for Algorithms of the Adam Family [105.22760323075008]
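This paper presents a generic proof of convergence for a family of Adam-style methods, including Adam, AMSGrad, and AdaBound. Our analysis is simple and generic enough that it can be used to establish convergence for a broader family of non-convex optimization problems.
Paper reference translation (metadata) (2021-12-07T02:47:58Z)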
Adam$^+$: A Stochastic Method with Adaptive Variance Reduction [56.051001950733315]
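Adam is an optimization method widely used in deep learning applications. We propose a new method named Adam$^+$ (pronounced Adam-plus). Empirical studies on various deep learning tasks, including image classification, language modeling, and automatic speech recognition, show that Adam$^+$ significantly outperforms Adam.
Paper reference translation (metadata) (2020-11-24T09:28:53Z)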
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients [112.00379151834242]
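This paper proposes a principle for adaptive learning rates that replaces the running mean of squared gradients in Adam with a weighted mean. This enables faster adaptation and leads to more desirable empirical convergence behavior.
Paper reference translation (metadata) (2020-06-21T21:47:43Z)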

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.

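This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).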