Fugu-MT paper translation (abstract): On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization

Paper summary: On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization

arxiv url: http://arxiv.org/abs/2109.14213v1
Date: Wed, 29 Sep 2021 06:38:39 GMT
Status: translation complete
Last updated in system: 2021-09-30 23:17:29.535232
Title: On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization
Title (reference translation): On the one-sided convergence of Adam-type algorithms in non-convex non-concave min-max optimization
Authors: Zehao Dou, Yuanzhi Li
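Abstract summary: This paper shows that Adam-type algorithms converge to one-sided first-order stationary points in min-max optimization problems under the one-sided MVI condition. It also verifies empirically that this one-sided MVI condition is satisfied for standard GANs.
Reference score (independently computed attention score): 43.504548777955854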
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Adam-type methods, the extension of adaptive gradient methods, have shown great performance in the training of both supervised and unsupervised machine learning models. In particular, Adam-type optimizers have been widely used empirically as the default tool for training generative adversarial networks (GANs). On the theory side, however, despite the existence of theoretical results showing the efficiency of Adam-type methods in minimization problems, the reason of their wonderful performance still remains absent in GAN's training. In existing works, the fast convergence has long been considered as one of the most important reasons and multiple works have been proposed to give a theoretical guarantee of the convergence to a critical point of min-max optimization algorithms under certain assumptions. In this paper, we firstly argue empirically that in GAN's training, Adam does not converge to a critical point even upon successful training: Only the generator is converging while the discriminator's gradient norm remains high throughout the training. We name this one-sided convergence. Then we bridge the gap between experiments and theory by showing that Adam-type algorithms provably converge to a one-sided first order stationary points in min-max optimization problems under the one-sided MVI condition. We also empirically verify that such one-sided MVI condition is satisfied for standard GANs after trained over standard data sets. To the best of our knowledge, this is the very first result which provides an empirical observation and a strict theoretical guarantee on the one-sided convergence of Adam-type algorithms in min-max optimization.
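Abstract (reference translation): Adam-type methods, which extend adaptive gradient methods, have shown excellent performance in training both supervised and unsupervised machine learning models. In particular, Adam-type optimizers are widely used in practice as the default tool for training generative adversarial networks (GANs). On the theory side, however, although there are theoretical results showing the efficiency of Adam-type methods in minimization problems, the reason for their strong performance in GAN training remains unexplained. Existing work has long regarded fast convergence as one of the most important reasons, and several papers give theoretical guarantees that min-max optimization algorithms converge to a critical point under certain assumptions. This paper first argues empirically that, in GAN training, Adam does not converge to a critical point even when training succeeds: only the generator converges, while the discriminator's gradient norm stays high throughout training. The authors call this one-sided convergence. They then bridge the gap between experiments and theory by showing that Adam-type algorithms provably converge to one-sided first-order stationary points in min-max optimization problems under the one-sided MVI condition. They also verify empirically that this one-sided MVI condition is satisfied for standard GANs after training on standard data sets. To the best of the authors' knowledge, this is the first result to provide both an empirical observation and a rigorous theoretical guarantee for the one-sided convergence of Adam-type algorithms in min-max optimization.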
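The one-sided behaviour described in the abstract can be made concrete by tracking the two players' gradient norms separately. The following is a minimal sketch under stated assumptions, not the authors' experiment: the objective `f(x, y) = 0.5 * ||x||^2 + <b, y>`, the names `Adam`, `toy_grads`, and `run_toy`, and all hyperparameters are hypothetical choices made only for illustration. The objective is deliberately contrived (it is linear and unbounded in `y`) so that the max player's gradient norm stays fixed while the min player's gradient norm shrinks.

```python
import numpy as np


class Adam:
    """Minimal Adam update for one parameter vector (standard bias-corrected form)."""

    def __init__(self, dim, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(dim)  # first-moment estimate
        self.v = np.zeros(dim)  # second-moment estimate
        self.t = 0              # step counter

    def direction(self, grad):
        """Return lr * m_hat / (sqrt(v_hat) + eps) for the given gradient."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def toy_grads(x, y, b):
    """Gradients of the hypothetical objective f(x, y) = 0.5 * ||x||^2 + <b, y>."""
    return x, b  # grad_x f = x; grad_y f = b, independent of (x, y)


def run_toy(steps=3000):
    """Simultaneous Adam descent on x and Adam ascent on y; return final gradient norms."""
    b = np.array([0.5, -0.5])
    x = np.array([1.0, -2.0])  # min player (plays the generator's role)
    y = np.zeros(2)            # max player (plays the discriminator's role)
    opt_x, opt_y = Adam(x.size), Adam(y.size)
    for _ in range(steps):
        gx, gy = toy_grads(x, y, b)
        x = x - opt_x.direction(gx)  # descent step for the min player
        y = y + opt_y.direction(gy)  # ascent step for the max player
    gx, gy = toy_grads(x, y, b)
    return np.linalg.norm(gx), np.linalg.norm(gy)


if __name__ == "__main__":
    gx_norm, gy_norm = run_toy()
    print(f"||grad_x f|| = {gx_norm:.2e}, ||grad_y f|| = {gy_norm:.2e}")
```

In this toy setting only the min player's gradient norm approaches zero, which is the pattern the abstract reports for the generator in GAN training. In a real GAN the same diagnostic would be computed from the generator and discriminator parameter gradients, which this sketch does not attempt.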

Related papers

On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning [22.486361028522374]
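We show that, from a statistical perspective, layer-wise preconditioning methods are provably necessary. We show that SGD is a suboptimal feature learner once inputs go beyond the ideal isotropic case. We also show that standard tools such as Adam preconditioning and batch norm only mildly mitigate these issues.
Paper reference translation (metadata) (2025-02-03T19:08:32Z)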
Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case [0.0]
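This paper examines the convergence of vanilla Adam and the challenges of non-ergodic convergence. The findings build a solid theoretical foundation for using Adam to solve non-ergodic optimization problems.
Paper reference translation (metadata) (2023-07-20T12:02:17Z)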
Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
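We show that Adam converges to an $\epsilon$-stationary point with $O(\epsilon^{-4})$ gradient complexity under more realistic conditions. We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
Paper reference translation (metadata) (2023-04-27T06:27:37Z)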
Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
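Decentralized minimax optimization has been studied actively in recent years because of its wide range of applications in machine learning. This paper proposes two new decentralized minimax optimization algorithms for nonconvex-nonconcave problems.
Paper reference translation (metadata) (2023-04-24T02:19:39Z)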
A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning [0.6526824510982802]
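Adaptive gradient methods are popular for optimizing deep neural networks. Recent examples include AdaGrad and Adam. We develop a generic framework for adaptive gradient methods.
Paper reference translation (metadata) (2022-06-04T17:55:33Z)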
A Novel Convergence Analysis for Algorithms of the Adam Family [105.22760323075008]
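This paper presents a general convergence proof for a family of Adam-style methods, including Adam, AMSGrad, and AdaBound. The analysis is simple and generic enough that it can be used to establish convergence for a broader family of non-convex optimization problems.
Paper reference translation (metadata) (2021-12-07T02:47:58Z)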
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
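We show that, even with weight decay regularization, optimizers can converge to different solutions of the objective with provably different errors. With a convex objective and weight decay regularization, we show that any optimization algorithm, including Adam, converges to the same solution.
Paper reference translation (metadata) (2021-08-25T17:58:21Z)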
Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration [12.744658958445024]
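Adam is one of the most influential adaptive algorithms for training deep neural networks. Existing approaches, such as decreasing the adaptive learning rate or adopting a large batch size, try to promote the convergence of Adam-type algorithms. This paper proposes an alternative, easy-to-check condition that depends only on the parameters of the history-based learning rate.
Paper reference translation (metadata) (2021-01-14T06:42:29Z)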
Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization [15.731908248435348]
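We develop gradient descent averaging and primal-dual averaging algorithms for the strongly convex case. Primal-dual averaging attains the optimal convergence rate in terms of output averaging, while SC-PDA attains the optimal individual convergence. Several experiments on SVMs and deep learning models validate the correctness of the theoretical analysis and the effectiveness of the algorithms.
Paper reference translation (metadata) (2020-12-29T01:40:30Z)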
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
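This paper studies a distributed variant of large-scale AUC maximization with neural networks. In theory, our method requires far fewer communication rounds. Experiments on several datasets demonstrate the effectiveness of the method and corroborate the theory.
Paper reference translation (metadata) (2020-05-05T18:08:23Z)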

The related papers list is generated automatically from the titles and abstracts of papers on this site.
