Fugu-MT paper translation (abstract): Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent

Paper summary: Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent

arxiv url: http://arxiv.org/abs/2106.06749v1
Date: Sat, 12 Jun 2021 11:28:58 GMT
Status: translation complete
Last updated in system: 2021-06-19 19:08:51.206920
Title: Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent
Authors: Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
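Abstract summary: This paper proposes DSTAda, a decreasing scaling transition from adaptive gradient descent to stochastic gradient descent. Experiments show that DSTAda converges faster, reaches higher accuracy, and is more stable and robust.
Reference score (attention score computed independently by this site): 1.7874193862154875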
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Researchers have proposed the adaptive gradient descent algorithm and its variants, such as AdaGrad, RMSProp, Adam, and AMSGrad. Although these algorithms are faster in the early stage of training, their generalization ability in the later stage is often not as good as that of stochastic gradient descent. Recently, some researchers have combined adaptive gradient descent and stochastic gradient descent to obtain the advantages of both, with good results. Building on this research, we propose a decreasing scaling transition from adaptive gradient descent to stochastic gradient descent (DSTAda). For the stochastic gradient descent stage of training, we use a learning rate that decreases linearly with the number of iterations instead of a constant learning rate. We achieve a smooth and stable transition from adaptive gradient descent to stochastic gradient descent through scaling. We also give a theoretical proof of the convergence of DSTAda under the framework of online learning. Our experimental results show that the DSTAda algorithm has a faster convergence speed, higher accuracy, and better stability and robustness. Our implementation is available at: https://github.com/kunzeng/DSTAdam.
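The abstract describes two ingredients: a scaling factor that moves the update smoothly from an adaptive step to an SGD step, and an SGD learning rate that decreases linearly with the iteration count. The following is a minimal sketch of that idea on a one-dimensional quadratic, not the authors' implementation. The function names (`linear_decay`, `blended_update`, `run_demo`), the exponential form of the scaling factor `rho ** step`, the Adam-style moment estimates, and all default hyperparameters are assumptions made for illustration; the abstract does not specify them.

```python
import math


def linear_decay(step, total_steps, lr_start, lr_end):
    """Learning rate that moves linearly from lr_start to lr_end."""
    frac = min(step / total_steps, 1.0)
    return lr_start + (lr_end - lr_start) * frac


def blended_update(x, grad, state, step, total_steps,
                   adam_lr=0.001, sgd_lr_start=0.1, sgd_lr_end=0.001,
                   beta1=0.9, beta2=0.999, rho=0.99, eps=1e-8):
    """One update that blends an Adam-style step with an SGD step.

    Assumption: the weight on the adaptive step is rho ** step, so it
    starts near 1 and shrinks toward 0 as training proceeds.
    """
    # Adam-style first and second moment estimates with bias correction.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** step)
    v_hat = state["v"] / (1 - beta2 ** step)

    scale = rho ** step  # assumed scaling factor
    adaptive_lr = adam_lr / (math.sqrt(v_hat) + eps)
    sgd_lr = linear_decay(step, total_steps, sgd_lr_start, sgd_lr_end)

    # Convex combination of the adaptive and SGD step sizes.
    lr = scale * adaptive_lr + (1 - scale) * sgd_lr
    return x - lr * m_hat


def run_demo(total_steps=500):
    """Minimize f(x) = (x - 3)^2 starting from x = 0."""
    x = 0.0
    state = {"m": 0.0, "v": 0.0}
    for step in range(1, total_steps + 1):  # step starts at 1 for bias correction
        grad = 2.0 * (x - 3.0)
        x = blended_update(x, grad, state, step, total_steps)
    return x
```

Calling `run_demo()` returns a value close to the minimizer 3. Writing the step size as a single convex combination means there is no hard switch point between the two phases, which is one plausible reading of the "smooth and stable transition" in the abstract.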

Related papers

Posterior Approximation using Stochastic Gradient Ascent with Adaptive Stepsize [24.464140786923476]
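Posterior approximation lets nonparametric models such as Dirichlet process mixtures scale to larger datasets at a fraction of the cost. Gradient ascent is a modern approach in machine learning and is widely used to train deep neural networks. This work investigates gradient ascent as a fast algorithm for posterior approximation of Dirichlet process mixtures.
Paper reference translation (metadata) (2024-12-12T05:33:23Z)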
How to guess a gradient [68.98681202222664]
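We show that gradients are more structured than previously thought. Exploiting this structure significantly improves gradient-free optimization schemes. We highlight new challenges in closing the large gap between optimizing with exact gradients and guessing the gradients.
Paper reference translation (metadata) (2023-12-07T21:40:44Z)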
ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
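We propose a fast (exponential-rate), ab initio (hyperparameter-free) gradient-based adaptation method. The main idea of the method is to adapt the learning rate $\alpha$ through situational awareness. It can be applied to problems of any dimension n and scales only linearly with the dimension.
Paper reference translation (metadata) (2023-09-12T14:36:13Z)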
One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
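The method is based on projected stochastic gradient descent on the log-likelihood function, corrected by a single step of the Fisher scoring algorithm. Theory and simulations show that it is an interesting alternative to the usual stochastic gradient descent with averaging or to adaptive stochastic gradient descent.
Paper reference translation (metadata) (2023-06-09T13:43:07Z)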
On Training Implicit Models [75.20173180996501]
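We propose a new gradient estimate for implicit models, called the phantom gradient, that avoids the computational cost of the exact gradient. Experiments on large-scale tasks show that these lightweight phantom gradients accelerate the backward pass of implicit model training by about 1.7x.
Paper reference translation (metadata) (2021-11-09T14:40:24Z)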
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum on vision, and achieves state-of-the-art results on other tasks including language processing.
Paper reference translation (metadata) (2021-06-22T03:13:23Z)
Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent [1.7874193862154875]
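Momentum stochastic gradient descent uses the accumulated gradient as the update direction for the current parameters. Plain stochastic gradient descent is not corrected by the accumulated gradient. The TSGD algorithm trains faster, reaches higher accuracy, and is more stable.
Paper reference translation (metadata) (2021-06-12T11:42:04Z)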
Reparametrizing gradient descent [0.0]
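This paper proposes an optimization algorithm called norm-adapted gradient descent. The algorithm can also be compared to quasi-Newton methods, but it seeks roots rather than stationary points.
Paper reference translation (metadata) (2020-10-09T20:22:29Z)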
Neural gradients are near-lognormal: improved quantized and sparse training [35.28451407313548]
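The distribution of neural gradients is approximately lognormal. We propose two closed-form analytical methods for reducing the computational and memory burden of neural gradients. To the best of our knowledge, this paper is the first to (1) quantize gradients to a 6-bit floating-point format, or (2) achieve up to 85% gradient sparsity, in each case without degrading accuracy.
Paper reference translation (metadata) (2020-06-15T07:00:15Z)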
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
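Adaptive gradient algorithms use the history of gradients to perform updates and are ubiquitous in training deep neural networks. This paper analyzes a variant of the OptimisticOA algorithm for non-concave min-max problems. Experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
Paper reference translation (metadata) (2019-12-26T22:10:10Z)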

The related papers list is generated automatically from the titles and abstracts of papers on this site.

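This page shows information on the specified paper.
The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any results arising from its use.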