Fugu-MT Paper Translation (Abstract): A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings

Paper overview: A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings

arxiv url: http://arxiv.org/abs/2309.08339v3
Date: Wed, 3 Apr 2024 13:36:16 GMT
Status: Translation complete
Last updated in system: 2024-04-05 20:22:43.439028
Title: A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings
Title (reference translation): A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings
Authors: Alokendu Mazumder, Rishabh Sabharwal, Manan Tayal, Bhartendu Kumar, Punit Rathore,
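Abstract summary: In neural network training, RMSProp and Adam remain widely favoured algorithms. We theoretically analyse the convergence of Adam with a constant step size. We show that, despite the accumulation of a few past gradients, the key driver of convergence in Adam is the non-increasing step size.
Reference score (independently computed attention score): 1.246305060872372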
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In neural network training, RMSProp and Adam remain widely favoured optimisation algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyse a constant step size version of Adam in the non-convex setting and discuss why it is important for the convergence of Adam to use a fixed step size. This work demonstrates the derivation and effective implementation of a constant step size for Adam, offering insights into its performance and efficiency in non-convex optimisation scenarios. (i) First, we provide proof that these adaptive gradient algorithms are guaranteed to reach criticality for smooth non-convex objectives with constant step size, and we give bounds on the running time. Both deterministic and stochastic versions of Adam are analysed in this paper. We show sufficient conditions for the derived constant step size to achieve asymptotic convergence of the gradients to zero with minimal assumptions. Next, (ii) we design experiments to empirically study Adam's convergence with our proposed constant step size against state-of-the-art step size schedulers on classification tasks. Lastly, (iii) we also demonstrate that our derived constant step size has better abilities in reducing the gradient norms, and empirically, we show that despite the accumulation of a few past gradients, the key driver for convergence in Adam is the non-increasing step sizes.
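Abstract (reference translation): In neural network training, RMSProp and Adam remain widely favoured optimisation algorithms. One of the keys to their performance is selecting the correct step size. In addition, questions about their theoretical convergence properties remain a subject of interest. In this paper, we theoretically analyse a constant step size version of Adam in the non-convex setting and discuss why using a fixed step size is important for the convergence of Adam. This work demonstrates the derivation and effective implementation of a constant step size for Adam, offering insights into its performance and efficiency in non-convex optimisation scenarios. (i) First, we show that these adaptive gradient algorithms are guaranteed to reach criticality for smooth non-convex objectives with a constant step size, and we give bounds on the running time. Both deterministic and stochastic versions of Adam are analysed. We show sufficient conditions for the derived constant step size to achieve asymptotic convergence of the gradients to zero under minimal assumptions. Next, (ii) we design experiments to empirically study Adam's convergence with the proposed constant step size against state-of-the-art step size schedulers on classification tasks. Lastly, experiments show that, despite the accumulation of past gradients, the key driver of convergence in Adam is the non-increasing step size.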
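
To make the setting in the abstract concrete, the following is a minimal sketch of deterministic Adam run with a constant step size on a smooth non-convex toy objective. The objective f(x) = x^4/4 - x^2/2, the starting point, the step count and the value alpha = 0.01 are all illustrative assumptions; in particular, alpha is an arbitrary placeholder and is not the "exact" step size derived in the paper, which the abstract does not state. The sketch only records the gradient norm at each iterate, the quantity the abstract says should approach zero.

```python
import math


def grad(x):
    # Gradient of the toy non-convex objective f(x) = x**4 / 4 - x**2 / 2
    # (an assumed example; critical points at x = -1, 0, 1).
    return x**3 - x


def adam_constant_step(x0, alpha, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update with a step size alpha that never changes."""
    x, m, v = x0, 0.0, 0.0
    grad_norms = []
    for t in range(1, steps + 1):
        g = grad(x)
        grad_norms.append(abs(g))            # gradient norm before the update
        m = beta1 * m + (1 - beta1) * g      # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1**t)           # bias correction
        v_hat = v / (1 - beta2**t)
        x -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return x, grad_norms


# alpha = 0.01 is a hypothetical placeholder, not the paper's derived constant.
x_final, grad_norms = adam_constant_step(x0=2.0, alpha=0.01, steps=2000)
print("first gradient norm:", grad_norms[0])
print("last gradient norm: ", grad_norms[-1])
```

Comparing the first and last entries of `grad_norms` gives a rough picture of how far the iterates have moved toward a critical point; it says nothing about the running-time bounds or the stochastic case analysed in the paper.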

Related papers