Fugu-MT paper translation (abstract): Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks


arxiv url: http://arxiv.org/abs/2402.05155v1
Date: Wed, 7 Feb 2024 16:14:04 GMT
Status: translation complete
Last updated in system: 2024-02-09 17:45:36.336423
Title: Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks
Authors: Arnulf Jentzen, Adrian Riekert
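Abstract summary: Rigorously explaining why SGD methods succeed in training ANNs remains an open problem. We disprove that SGD methods can find a global minimizer with high probability. Moreover, we show that in the training of such ANNs, SGD methods fail with high probability to converge to global minimizers.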
Reference score (site-computed attention score): 6.708125191843434
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic gradient descent (SGD) optimization methods such as the plain vanilla SGD method and the popular Adam optimizer are nowadays the method of choice in the training of artificial neural networks (ANNs). Despite the remarkable success of SGD methods in ANN training in numerical simulations, in essentially all practically relevant scenarios it remains an open problem to rigorously explain why SGD methods seem to succeed in training ANNs. In particular, in most practically relevant supervised learning problems, SGD methods seem, with high probability, not to converge to global minimizers in the optimization landscape of the ANN training problem. Nevertheless, disproving the convergence of SGD methods to global minimizers has remained an open research problem. In this work we solve this research problem for shallow ANNs with the rectified linear unit (ReLU) and related activations and the standard mean square error loss: we disprove that, in the training of such ANNs, SGD methods (such as the plain vanilla SGD, the momentum SGD, the AdaGrad, the RMSprop, and the Adam optimizers) can find a global minimizer with high probability. Even stronger, we reveal that in the training of such ANNs, SGD methods fail with high probability to converge to global minimizers in the optimization landscape. The findings of this work do not, however, disprove that SGD methods succeed in training ANNs, since they do not exclude the possibility that SGD methods find good local minimizers whose risk values are close to the risk values of the global minimizers. In this context, another key contribution of this work is to establish the existence of a hierarchical structure of local minimizers with distinct risk values in the optimization landscape of ANN training problems with ReLU and related activations.
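The abstract lists several SGD-type optimizers. The minimal sketch below writes out their textbook update rules in plain Python, assuming common default hyperparameters (momentum β = 0.9, RMSprop ρ = 0.9, Adam β₁ = 0.9, β₂ = 0.999, ε = 1e-8) and the heavy-ball form of momentum (conventions differ between references and libraries). The class names and the toy quadratic objective are illustrative assumptions, not the paper's notation or its ReLU training setting.

```python
import math
from typing import List

Vector = List[float]


class SGD:
    """Plain vanilla SGD: theta <- theta - lr * g."""

    def __init__(self, lr: float) -> None:
        self.lr = lr

    def step(self, theta: Vector, grad: Vector) -> Vector:
        return [p - self.lr * g for p, g in zip(theta, grad)]


class MomentumSGD:
    """Heavy-ball momentum (one common convention): m <- beta*m + g; theta <- theta - lr*m."""

    def __init__(self, lr: float, beta: float = 0.9) -> None:
        self.lr, self.beta, self.m = lr, beta, []

    def step(self, theta: Vector, grad: Vector) -> Vector:
        if not self.m:
            self.m = [0.0] * len(theta)
        self.m = [self.beta * m + g for m, g in zip(self.m, grad)]
        return [p - self.lr * m for p, m in zip(theta, self.m)]


class AdaGrad:
    """v <- v + g^2; theta <- theta - lr * g / (sqrt(v) + eps)."""

    def __init__(self, lr: float, eps: float = 1e-8) -> None:
        self.lr, self.eps, self.v = lr, eps, []

    def step(self, theta: Vector, grad: Vector) -> Vector:
        if not self.v:
            self.v = [0.0] * len(theta)
        self.v = [v + g * g for v, g in zip(self.v, grad)]
        return [p - self.lr * g / (math.sqrt(v) + self.eps)
                for p, g, v in zip(theta, grad, self.v)]


class RMSprop:
    """v <- rho*v + (1-rho)*g^2; theta <- theta - lr * g / (sqrt(v) + eps)."""

    def __init__(self, lr: float, rho: float = 0.9, eps: float = 1e-8) -> None:
        self.lr, self.rho, self.eps, self.v = lr, rho, eps, []

    def step(self, theta: Vector, grad: Vector) -> Vector:
        if not self.v:
            self.v = [0.0] * len(theta)
        self.v = [self.rho * v + (1 - self.rho) * g * g for v, g in zip(self.v, grad)]
        return [p - self.lr * g / (math.sqrt(v) + self.eps)
                for p, g, v in zip(theta, grad, self.v)]


class Adam:
    """Adam with bias-corrected first and second moment estimates."""

    def __init__(self, lr: float, beta1: float = 0.9, beta2: float = 0.999,
                 eps: float = 1e-8) -> None:
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = [], [], 0

    def step(self, theta: Vector, grad: Vector) -> Vector:
        if not self.m:
            self.m, self.v = [0.0] * len(theta), [0.0] * len(theta)
        self.t += 1
        self.m = [self.b1 * m + (1 - self.b1) * g for m, g in zip(self.m, grad)]
        self.v = [self.b2 * v + (1 - self.b2) * g * g for v, g in zip(self.v, grad)]
        c1, c2 = 1 - self.b1 ** self.t, 1 - self.b2 ** self.t  # bias corrections
        return [p - self.lr * (m / c1) / (math.sqrt(v / c2) + self.eps)
                for p, m, v in zip(theta, self.m, self.v)]


def grad_quadratic(theta: Vector) -> Vector:
    """Gradient of the toy objective f(theta) = sum_i (theta_i - 3)^2."""
    return [2.0 * (p - 3.0) for p in theta]


def run(opt, steps: int) -> Vector:
    """Run an optimizer from theta = [0.0] on the toy objective."""
    theta = [0.0]
    for _ in range(steps):
        theta = opt.step(theta, grad_quadratic(theta))
    return theta


if __name__ == "__main__":
    for opt in (SGD(0.1), MomentumSGD(0.1), AdaGrad(1.0), RMSprop(0.001), Adam(0.001)):
        print(type(opt).__name__, run(opt, 50_000))
```

On this convex one-dimensional toy problem all five optimizers end up near θ = 3. The paper's results concern the non-convex landscape of shallow ReLU networks, where, according to the abstract, such convergence to a global minimizer fails with high probability.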
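One elementary mechanism that produces non-global local minimizers for shallow ReLU networks is a set of neurons that are inactive ("dead") on every input. The sketch below is an illustration under stated assumptions, not the construction used in the paper: a hypothetical one-input network f(x) = Σ_j v_j·ReLU(w_j x + b_j) + c, eleven grid points on [0, 1], target y = x, mean squared error risk, and full-batch gradient descent. If w_j x + b_j < 0 for every neuron and every data point, the gradients with respect to w_j, b_j, v_j vanish exactly, and this remains true in a small neighborhood of the parameters. Training then only fits the output bias c to the mean of the targets, so the risk stalls at their variance, while a network with one active neuron fits y = x exactly.

```python
from typing import List, Tuple

# Hypothetical shallow ReLU network with one input:
#   f(x) = sum_j v[j] * relu(w[j] * x + b[j]) + c
Params = Tuple[List[float], List[float], List[float], float]

# Hypothetical training data: 11 grid points on [0, 1] with target y = x.
XS: List[float] = [i / 10 for i in range(11)]
YS: List[float] = list(XS)


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def forward(p: Params, x: float) -> float:
    w, b, v, c = p
    return sum(vj * relu(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c


def risk(p: Params, xs: List[float], ys: List[float]) -> float:
    """Mean squared error over the data set."""
    return sum((forward(p, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(p: Params, xs: List[float], ys: List[float]) -> Params:
    """Exact gradient of the risk, using relu'(z) = 0 for z <= 0."""
    w, b, v, _ = p
    n = len(xs)
    gw, gb, gv = [0.0] * len(w), [0.0] * len(w), [0.0] * len(w)
    gc = 0.0
    for x, y in zip(xs, ys):
        r = 2.0 * (forward(p, x) - y) / n  # d(risk)/d(output) at this sample
        gc += r
        for j in range(len(w)):
            z = w[j] * x + b[j]
            if z > 0.0:  # only active neurons receive any gradient
                gv[j] += r * z
                gw[j] += r * v[j] * x
                gb[j] += r * v[j]
    return gw, gb, gv, gc


def gradient_descent(p: Params, lr: float, steps: int) -> Params:
    """Full-batch gradient descent on the risk over (XS, YS)."""
    for _ in range(steps):
        gw, gb, gv, gc = gradient(p, XS, YS)
        w, b, v, c = p
        p = (
            [a - lr * g for a, g in zip(w, gw)],
            [a - lr * g for a, g in zip(b, gb)],
            [a - lr * g for a, g in zip(v, gv)],
            c - lr * gc,
        )
    return p


if __name__ == "__main__":
    # Both neurons are inactive on [0, 1]: x - 2 < 0 and -0.5 * x - 1 < 0.
    dead = ([1.0, -0.5], [-2.0, -1.0], [0.7, 1.3], 0.0)
    trained = gradient_descent(dead, lr=0.1, steps=500)
    print("hidden parameters unchanged:", trained[:3] == dead[:3])
    # c approaches mean(y) = 0.5 and the risk approaches Var(y) = 0.1.
    print("bias c:", trained[3], "risk:", risk(trained, XS, YS))
    # One active neuron, relu(x) = x on [0, 1], attains the global minimum risk 0.
    active = ([1.0], [0.0], [1.0], 0.0)
    print("risk of active network:", risk(active, XS, YS))
```

Because each single-sample stochastic gradient of the hidden parameters is also zero at such a point, plain SGD, and momentum, AdaGrad, RMSprop, or Adam started from a zero optimizer state, would likewise leave the hidden parameters unchanged.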

Related papers

Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks [5.052293146674794]
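Proving or disproving that the true risk of SGD optimization methods converges to the optimal true risk value in the training of DNNs is an open problem. We show that in the training of any fully connected feedforward DNN, it does not hold that the true risk of the considered optimizer converges in probability to the optimal true risk value.
Reference translation (metadata) (2025-03-03T15:36:01Z)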
Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
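We study optimization methods that leverage a linear minimization oracle (LMO) over a norm ball. We propose a new family of algorithms that use the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
Reference translation (metadata) (2025-02-11T13:10:34Z)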
Stability and Generalization for Distributed SGDA [70.97400503482353]
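We propose a stability-based generalization analysis framework for distributed SGDA. We give a comprehensive analysis of the stability error, the generalization gap, and the population risk. The theoretical results reveal a trade-off between the generalization gap and the optimization error.
Reference translation (metadata) (2024-11-14T11:16:32Z)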
Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation [3.6185342807265415]
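Explaining the success and the limitations of SGD methods in rigorous theoretical terms is an open research problem. In this work we prove, for a large class of SGD methods, that the considered optimizer fails with high probability to converge to global minimizers of the optimization problem. The general non-convergence results of this work apply not only to the plain vanilla standard SGD method but also to many accelerated and adaptive SGD methods.
Reference translation (metadata) (2024-10-14T14:11:37Z)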
Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates [3.6185342807265415]
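Deep learning algorithms are a key ingredient of many artificial intelligence (AI) systems. Deep learning algorithms typically consist of a class of deep neural networks trained by stochastic gradient descent (SGD) optimization methods.
Reference translation (metadata) (2024-07-11T00:10:35Z)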
The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication [37.210933391984014]
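Local SGD is a popular optimization method in distributed learning and often outperforms other algorithms in practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions. We also show the min-max optimality of accelerated mini-batch SGD for several problem classes.
Reference translation (metadata) (2024-05-19T20:20:03Z)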
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
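Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint in the loss function. We study how the location of the collocation points affects the trainability of these models. We propose an adaptive collocation scheme that progressively allocates more collocation points to regions where the model makes larger errors.
Reference translation (metadata) (2022-07-08T18:17:06Z)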
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
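Local methods are a promising approach to reducing communication time. We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
Reference translation (metadata) (2022-02-12T15:12:17Z)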
Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions [1.7149364927872015]
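Stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs). In this work we study SGD type optimization methods in the training of fully connected feedforward DNNs with rectified linear unit (ReLU) activation.
Reference translation (metadata) (2021-12-13T11:45:36Z)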
Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency [15.04034188283642]
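Local SGD is a promising approach to overcoming the communication overhead in distributed learning. We show that local SGDA can provably optimize distributed minimax problems on both homogeneous and heterogeneous data.
Reference translation (metadata) (2021-02-25T20:15:18Z)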
TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation [79.4205462326301]
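TaylorGAN is a novel approach to score function-based natural language generation (NLG). It augments gradient estimation with off-policy updates and a first-order Taylor expansion. This makes it possible to train NLG models from scratch with smaller batch sizes.
Reference translation (metadata) (2020-11-27T02:26:15Z)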
Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
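We study a weakly supervised problem called learning with complementary labels. We show that the quality of gradient estimation matters more in risk minimization. We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
Reference translation (metadata) (2020-07-05T04:19:37Z)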
Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
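The communication bottleneck is a critical problem in large-scale deep learning. We propose a detached error feedback (DEF) algorithm that shows better convergence than error feedback for non-convex distributed problems. We also propose DEFA, which accelerates the generalization of DEF and shows better bounds than DEF.
Reference translation (metadata) (2020-04-11T03:50:59Z)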

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
