Fugu-MT Paper Translation (Summary): Adam Converges in Nonsmooth Nonconvex Optimization

Paper summary: Adam Converges in Nonsmooth Nonconvex Optimization

arxiv url: http://arxiv.org/abs/2606.22326v1
Date: Sun, 21 Jun 2026 04:00:35 GMT
Status: Translation complete
Last updated in system: 2026-06-25 19:06:22.256173
Title: Adam Converges in Nonsmooth Nonconvex Optimization
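Authors: Zijian Liu
Abstract summary: Adam is one of the most widely implemented and influential modern optimizers. We present the first finite-time analysis of Adam's convergence rate that keeps the bias-correction step and makes no further algorithmic modifications.
Reference score (attention score computed in-house): 3.8357180714081327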
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adam is one of the most widely implemented and influential modern optimizers. Why is it effective across different optimization problems in practice? This question has arguably been central to the optimization community over the last decade and has motivated a substantial body of work aimed at understanding Adam's convergence behavior. However, existing studies have mainly focused on the convergence rate of Adam in smooth nonconvex optimization, which does not adequately capture practical settings, since many real-world problems, such as those arising in training neural networks, are nonsmooth. Thus, these studies cannot fully explain the popularity and empirical success of Adam. Recently, an insightful and powerful framework called Online-to-Nonconvex Conversion has opened a new route to analyzing Adam in nonsmooth nonconvex optimization. Unfortunately, prior works along this line share two common limitations. First, all of them ignore the important bias-correction term in the original Adam algorithm. Second, and more importantly, many of them require extra operations that are not used in Adam, such as a clipping step. Therefore, a convergence guarantee for the original Adam method remains unclear. In this work, we present the first finite-time analysis for the classical form of Adam, i.e., with the bias-correction step and without further algorithmic modifications, and prove that a randomly scaled learning rate ensures a convergence rate of $1/T^{\frac{2}{13}}$ for nonsmooth nonconvex optimization. Moreover, our result provably applies to the modern heavy-tailed noise regime, which is closer to practice. Interestingly, our theory is established under the parameter choice $\beta_1=\beta_2$, which aligns with recent empirical studies.
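The abstract centers on three ingredients: the classical Adam update with bias correction, the tied choice $\beta_1=\beta_2$, and a randomly scaled learning rate. The Python below is a minimal sketch of how those pieces fit together, not the paper's procedure. Assumptions: the random scale $s_t$ is drawn from an exponential distribution with mean 1 (the abstract does not specify the distribution), the function name `adam_random_scaled`, the helper `subgrad_l1`, and all default hyperparameters are hypothetical, and the sketch omits whatever output rule the paper's analysis uses.

```python
import math
import random
from typing import Callable, List, Sequence


def adam_random_scaled(
    grad: Callable[[List[float]], List[float]],
    x0: Sequence[float],
    lr: float = 1e-2,
    beta: float = 0.9,  # hypothetical default; used for both beta1 and beta2
    eps: float = 1e-8,
    steps: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Classical Adam with bias correction, beta1 = beta2, and a learning
    rate multiplied each step by a random scale s_t.
    ASSUMPTION: s_t ~ Exp(1) (mean 1); the abstract does not fix this choice."""
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)  # first-moment exponential moving average
    v = [0.0] * len(x)  # second-moment exponential moving average
    beta1 = beta2 = beta  # tied momentum parameters, as in the abstract
    for t in range(1, steps + 1):
        g = grad(x)
        s = rng.expovariate(1.0)  # one random scale per step, shared by all coordinates
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction of the first moment
            v_hat = v[i] / (1 - beta2 ** t)  # bias correction of the second moment
            x[i] -= s * lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def subgrad_l1(x: List[float]) -> List[float]:
    # Hypothetical test problem: a subgradient of the nonsmooth f(x) = |x_1| + |x_2|
    return [float((xi > 0) - (xi < 0)) for xi in x]


if __name__ == "__main__":
    print(adam_random_scaled(subgrad_l1, [1.0, -2.0], lr=1e-2, steps=2000))
```

One consequence of tying $\beta_1=\beta_2=\beta$ is easy to check by hand: the Cauchy-Schwarz inequality gives $m_t^2 \le (1-\beta^t)\,v_t$ coordinate-wise, so after bias correction $|\hat m_t| \le \sqrt{\hat v_t}$, and each step moves every coordinate by at most $s_t\,\eta$ (with $\eta$ the base learning rate). On the nonsmooth test problem the iterates therefore travel toward 0 and then oscillate in a band whose width is set by the learning rate.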


Related papers