Fugu-MT 論文翻訳(概要): AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

論文の概要: AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

arxiv url: http://arxiv.org/abs/2010.07468v5
Date: Sun, 20 Dec 2020 22:30:36 GMT
ステータス: 翻訳完了
システム内更新日: 2022-10-07 02:39:56.902914
Title: AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Title（参考訳）: AdaBelief Optimizer: 観測勾配におけるステップサイズ適応
Authors: Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar Tatikonda, Nicha Dvornek, Xenophon Papademetris, James S. Duncan
Abstract要約: 本稿では,適応法のような高速収束,SGDのような優れた一般化,訓練安定性の3つの目標を同時に達成するために,AdaBeliefを提案する。我々は、AdaBeliefを広範囲な実験で検証し、画像分類と言語モデリングにおいて、高速収束と高精度で他の手法よりも優れていることを示す。
参考スコア（独自算出の注目度）: 16.625086653851543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability.We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the stepsize according to the "belief" in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer
Abstract（参考訳）: ディープラーニングの最も一般的な最適化は、適応法(adamなど)と加速スキーム(sgdと運動量による確率勾配降下(sgd)など)に広く分類できる。畳み込みニューラルネットワーク(CNN)のような多くのモデルでは、適応的手法は一般的にSGDよりも高速に収束するが、より良く一般化する; 生成的敵対的ネットワーク(GAN)のような複雑な設定では、適応的手法は一般にその安定性のためにデフォルトとなるが、適応的手法は3つの目標を同時に達成するためにAdaBeliefを提案する。 AdaBelief の直感は、現在の勾配方向の "belief" に従ってステップ化を適応させることである。雑音勾配の指数的移動平均(EMA)を次のステップでの勾配の予測として見た場合、観測された勾配が予測から大きくずれた場合には、現在の観測を不信にし、小さなステップを踏む。画像分類や言語モデリングにおいて,他の手法よりも高速に収束し,高い精度を発揮できることを示すため,adabeliefを広範な実験で検証した。特にimagenetでは、adabeliefはsgdと同等の精度を実現している。さらに、Cifar10上でのGANのトレーニングでは、AdaBeliefはAdamオプティマイザと比較して高い安定性を示し、生成したサンプルの品質を向上させる。コードはhttps://github.com/juntang-zhuang/Adabelief-Optimizerで入手できる。

関連論文リスト

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization [6.468625143772815]
Adamのような適応的な勾配最適化手法は、さまざまな機械学習タスクにわたるディープニューラルネットワークのトレーニングで広く使われている。これらの手法は、降下勾配 (SGD) と比較して最適下一般化に苦しむことが多く、不安定性を示す。非ゼロ値で2階モーメント推定を初期化する。
論文参考訳（メタデータ） (2024-12-03T04:28:14Z)
ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
我々は, 高速(指数率), ab initio(超自由)勾配に基づく適応法を提案する。本手法の主な考え方は,状況認識による$alphaの適応である。これは任意の次元 n の問題に適用でき、線型にしかスケールできない。
論文参考訳（メタデータ） (2023-09-12T14:36:13Z)
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [158.19276683455254]
アダプティブ勾配アルゴリズムは、重ボール加速の移動平均アイデアを借用し、勾配の第1次モーメントを正確に推定し、収束を加速する。ネステロフ加速は、理論上はボール加速よりも早く収束し、多くの経験的ケースでも収束する。本稿では,計算勾配の余分な計算とメモリオーバーヘッドを回避するため,Nesterov運動量推定法(NME)を提案する。 Adan は視覚変換器 (ViT と CNN) で対応する SoTA を上回り,多くの人気ネットワークに対して新たな SoTA を設定する。
論文参考訳（メタデータ） (2022-08-13T16:04:39Z)
A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning [0.6526824510982802]
適応勾配法はディープニューラルネットワークの最適化に人気がある。最近の例にはAdaGradとAdamがある。我々は適応的勾配法のための汎用的なフレームワークを開発する。
論文参考訳（メタデータ） (2022-06-04T17:55:33Z)
Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
TomはAdamの新しい変種であり、ニューラルネットワークによって渡される損失の風景の勾配の傾向を考慮に入れている。 Tomは両方の精度でAdagrad、Adadelta、RMSProp、Adamを上回り、より早く収束する。
論文参考訳（メタデータ） (2021-09-07T20:19:40Z)
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
textscAdaMomentum on vision, and achieves state-the-art results on other task including language processing。 textscAdaMomentum on vision, and achieves state-the-art results on other task including language processing。 textscAdaMomentum on vision, and achieves state-the-art results on other task including language processing。
論文参考訳（メタデータ） (2021-06-22T03:13:23Z)
Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent [1.7874193862154875]
本稿では,適応勾配降下法から勾配勾配降下法DSTAdaへのスケーリング遷移を減少させる手法を提案する。実験の結果,DSTAdaは高速で精度が高く,安定性と堅牢性も向上した。
論文参考訳（メタデータ） (2021-06-12T11:28:58Z)
Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks [82.61182037130405]
勾配降下(SGD)は深いネットワークを訓練するための主要なアプローチです。本研究では,現在と過去の勾配の違いに基づいて,Adamに基づく変分を比較する。 resnet50を勾配降下訓練したネットワークのアンサンブルと融合実験を行った。
論文参考訳（メタデータ） (2021-03-26T18:55:08Z)
Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
レジリエンスとモメンタム(AdaRem)を用いた適応勾配法を提案する。 AdaRemは、過去の1つのパラメータの変化方向が現在の勾配の方向と一致しているかどうかに応じてパラメータワイズ学習率を調整する。本手法は,学習速度とテスト誤差の観点から,従来の適応学習率に基づくアルゴリズムよりも優れていた。
論文参考訳（メタデータ） (2020-10-21T14:49:00Z)
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients [112.00379151834242]
本稿では,Adamにおける2乗勾配のランニング平均を重み付き平均に置き換える適応学習率の原理を提案する。これにより、より高速な適応が可能となり、より望ましい経験的収束挙動がもたらされる。
論文参考訳（メタデータ） (2020-06-21T21:47:43Z)
Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs [25.158203665218164]
適応勾配法は有限時間後にランダムシャッフルSGDよりも高速であることを示す。我々の知る限り、適応的勾配法は有限時間後にSGDよりも高速であることを示すのはこれが初めてである。
論文参考訳（メタデータ） (2020-06-12T09:39:47Z)
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
我々は、勾配収束法を期待する適応勾配法を証明した。解析では、非理解勾配境界の最適化において、より適応的な勾配法に光を当てた。
論文参考訳（メタデータ） (2018-08-16T20:25:28Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。