Fugu-MT Paper Translation (Abstract): Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent

arxiv url: http://arxiv.org/abs/2106.06753v1
Date: Sat, 12 Jun 2021 11:42:04 GMT
Status: Translation complete
Last updated in system: 2021-06-19 18:33:31.121706
Title: Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent
Authors: Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
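Abstract summary: Momentum stochastic gradient descent uses the accumulated gradient as the update direction for the current parameters. The direction of plain stochastic gradient descent is not corrected by the accumulated gradient. The TSGD algorithm trains faster, reaches higher accuracy, and is more stable.
Reference score (attention score computed by this site): 1.7874193862154875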
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Plain stochastic gradient descent and momentum stochastic gradient descent are very widely used in deep learning because of their simple settings and low computational complexity. Momentum stochastic gradient descent uses the accumulated gradient as the update direction for the current parameters, which gives it a faster training speed. The direction of plain stochastic gradient descent, by contrast, is not corrected by the accumulated gradient; for the parameters currently being updated it is the optimal direction, so its update is more accurate. We combine the fast training speed of momentum stochastic gradient descent with the high accuracy of plain stochastic gradient descent, and propose a scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent (TSGD) method. In addition, a learning rate that decreases linearly with the iterations is used instead of a constant learning rate. The TSGD algorithm uses a larger step size in the early stage to speed up training and a smaller step size in the later stage to converge steadily. Our experimental results show that the TSGD algorithm has faster training speed, higher accuracy and better stability. Our implementation is available at: https://github.com/kunzeng/TSGD.
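The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' update rule (their implementation is at the repository linked above). It assumes, purely for illustration, that the update direction is a weighted blend of an exponential moving average of gradients and the current gradient, with the weight `rho ** t` shrinking geometrically, and that the learning rate falls linearly from `lr_start` to `lr_end`. The function name `tsgd_like_minimize` and all parameter names and defaults are hypothetical, and the gradient is deterministic here rather than stochastic.

```python
def tsgd_like_minimize(grad, x0, steps=300, lr_start=0.1, lr_end=0.01,
                       beta=0.9, rho=0.97):
    """Illustrative momentum-to-plain transition (assumed form, not the paper's)."""
    x = list(x0)
    m = [0.0] * len(x)  # running momentum direction
    for t in range(1, steps + 1):
        g = grad(x)
        # Exponential moving average of past gradients (momentum direction).
        m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
        # Assumed scaling weight: near 1 early (momentum), near 0 late (plain).
        w = rho ** t
        # Learning rate that decreases linearly with the iteration count.
        lr = lr_start + (lr_end - lr_start) * (t - 1) / max(steps - 1, 1)
        # Step along the blended direction.
        x = [xi - lr * (w * mi + (1.0 - w) * gi)
             for xi, mi, gi in zip(x, m, g)]
    return x


def quad_grad(x):
    """Gradient of the toy objective 0.5 * (1 * x0^2 + 10 * x1^2)."""
    return [1.0 * x[0], 10.0 * x[1]]


if __name__ == "__main__":
    print(tsgd_like_minimize(quad_grad, [5.0, -3.0]))
```

Setting `rho=0.0` reduces every step to a plain gradient step, while `rho=1.0` keeps the momentum direction throughout, so the one parameter controls how quickly the transition happens in this sketch.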

Related papers

Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
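We propose a new algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Flattened one-bit stochastic gradient descent (FO-SGD) relies on two simple algorithmic ideas.
Paper reference translation (metadata) (2024-05-17T21:17:27Z)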
One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training [12.400950982075948]
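Weight quantization is an effective technique for compressing deep neural networks for deployment on edge devices with limited resources. Traditional loss-aware quantization methods commonly use the quantized gradient in place of the full-precision gradient. This paper proposes a one-step forward and backtrack method for loss-aware quantization that obtains a more accurate and stable gradient direction.
Paper reference translation (metadata) (2024-01-30T05:42:54Z)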
One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
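The method is based on projected stochastic gradient descent on the log-likelihood function, corrected by a single step of the Fisher scoring algorithm. Theory and simulations show that it is an interesting alternative to the usual stochastic gradient descent with averaging and to adaptive stochastic gradient descent.
Paper reference translation (metadata) (2023-06-09T13:43:07Z)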
Scaling Forward Gradient With Local Losses [117.22685584919756]
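Forward gradient learning is a biologically plausible alternative to backprop for training deep neural networks. We show that the variance of the forward gradient can be reduced substantially by applying perturbations to activations rather than to weights. The proposed approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
Paper reference translation (metadata) (2022-10-07T03:52:27Z)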
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [158.19276683455254]
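Adaptive gradient algorithms borrow the moving-average idea of heavy-ball acceleration to estimate the first-order moment of the gradient accurately and accelerate convergence. Nesterov acceleration converges faster than heavy-ball acceleration in theory and also in many empirical cases. This paper proposes a Nesterov momentum estimation (NME) method that avoids the extra computation and memory overhead of computing gradients. Adan surpasses the corresponding SoTA on vision transformers (ViTs) and CNNs, and sets a new SoTA for many popular networks.
Paper reference translation (metadata) (2022-08-13T16:04:39Z)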
On Training Implicit Models [75.20173180996501]
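We propose a new gradient estimate for implicit models, called the phantom gradient, that avoids the costly computation of the exact gradient. Experiments on large-scale tasks show that these lightweight phantom gradients accelerate the backward pass of training implicit models by about 1.7 times.
Paper reference translation (metadata) (2021-11-09T14:40:24Z)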
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum on vision, and achieves state-of-the-art results on other tasks, including language processing.
Paper reference translation (metadata) (2021-06-22T03:13:23Z)
Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent [1.7874193862154875]
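This paper proposes DSTAda, a decreasing scaling transition method from adaptive gradient descent to stochastic gradient descent. Experimental results show that DSTAda is faster and more accurate, with better stability and robustness.
Paper reference translation (metadata) (2021-06-12T11:28:58Z)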
SSGD: A safe and efficient method of gradient descent [0.5099811144731619]
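Gradient descent plays an important role in solving a wide range of optimization problems. The super gradient descent method updates parameters by concealing the length of the gradient. Our algorithm can defend against attacks on the gradient.
Paper reference translation (metadata) (2020-12-03T17:09:20Z)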
Anderson acceleration of coordinate descent [5.794599007795348]
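On multiple machine learning problems, coordinate descent performs significantly better than full-gradient methods. This paper proposes an accelerated version of coordinate descent using extrapolation.
Paper reference translation (metadata) (2020-11-19T19:01:48Z)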
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
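Adaptive algorithms perform updates using the history of gradients and are ubiquitous in training deep neural networks. This paper analyzes a variant of the OptimisticOA algorithm for non-concave min-max problems. Experiments show that the behavior of adaptive versus non-adaptive gradient algorithms in GAN training can be observed empirically.
Paper reference translation (metadata) (2019-12-26T22:10:10Z)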

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

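This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).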