Fugu-MT Paper Translation (Abstract): Tom: Leveraging trend of the observed gradients for faster convergence

Paper overview: Tom: Leveraging trend of the observed gradients for faster convergence

arxiv url: http://arxiv.org/abs/2109.03820v1
Date: Tue, 7 Sep 2021 20:19:40 GMT
Status: Translation complete
Last updated in system: 2021-09-10 14:11:39.885900
Title: Tom: Leveraging trend of the observed gradients for faster convergence
Title (reference translation): Tom: Leveraging the trend of the observed gradients for faster convergence
Authors: Anirudh Maiya, Inumella Sricharan, Anshuman Pandey, Srinivas K. S
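Abstract summary: Tom is a new variant of Adam that takes into account the trend of the gradients in the loss landscape traversed by the neural network. Tom outperforms Adagrad, Adadelta, RMSProp, and Adam in accuracy and converges faster.
Reference score (independently calculated attention score): 0.0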
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The success of deep learning can be attributed to various factors, such as increased computational power, large datasets, deep convolutional neural networks, and optimizers. In particular, the choice of optimizer affects generalization, convergence rate, and training stability. Stochastic Gradient Descent (SGD) is a first-order iterative optimizer that applies the gradient update uniformly to all parameters. This uniform update may not be suitable across the entire training phase. A rudimentary solution is to employ a fine-tuned learning rate scheduler that decreases the learning rate as a function of the iteration. To eliminate the dependency on learning rate schedulers, adaptive gradient optimizers such as AdaGrad, AdaDelta, RMSProp, and Adam employ a parameter-wise scaling term for the learning rate, which is a function of the gradient itself. We propose the Tom (Trend over Momentum) optimizer, a novel variant of Adam that takes into account the trend observed for the gradients in the loss landscape traversed by the neural network. In the proposed Tom optimizer, an additional smoothing equation is introduced to address the trend observed during the process of optimization. The smoothing parameter introduced for the trend requires no tuning and can be used with its default value. Experimental results on image classification datasets such as CIFAR-10, CIFAR-100, and CINIC-10 show that Tom outperforms Adagrad, Adadelta, RMSProp, and Adam in accuracy and converges faster. The source code is publicly available at https://github.com/AnirudhMaiya/Tom
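Abstract (reference translation): The success of deep learning may be due to a variety of factors, including greater computational power, large datasets, deep convolutional neural networks, and optimizers. The choice of optimizer in particular affects generalization, convergence rate, and training stability. Stochastic Gradient Descent (SGD) is a first-order iterative optimizer that updates all parameters uniformly. This uniform update may not be appropriate for the whole training phase. A rudimentary remedy is a fine-tuned learning rate scheduler that reduces the learning rate as a function of the iteration. To remove the dependency on learning rate schedulers, adaptive gradient optimizers such as AdaGrad, AdaDelta, RMSProp, and Adam use a parameter-wise scaling term for the learning rate that is a function of the gradient itself. This paper proposes the Tom (Trend over Momentum) optimizer, a new variant of Adam that takes into account the trend of the gradients in the loss landscape traversed by the neural network. In the proposed Tom optimizer, an additional smoothing equation is introduced to handle the trend observed during optimization. The smoothing parameter introduced for this trend needs no tuning and can be used with its default value. Experimental results on classification datasets such as the CIFAR-10, CIFAR-100, and CINIC-10 image datasets show that Tom is more accurate and converges faster than Adagrad, Adadelta, RMSProp, and Adam. The source code is available at https://github.com/AnirudhMaiya/Tom.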
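The abstract describes Tom only at a high level: Adam plus one extra smoothing equation that tracks the trend of the gradients. The sketch below is one possible reading of that idea, not the paper's actual update rule. It assumes a Holt-style (double exponential smoothing) trend term added to Adam's first-moment estimate. The names `trend_adam_step`, `new_state`, `minimize`, and `beta3`, as well as all default values, are hypothetical and chosen for illustration; the exact equations and defaults are in the authors' repository.

```python
import math


def new_state():
    """Per-parameter optimizer state (hypothetical layout)."""
    return {"t": 0, "level": 0.0, "trend": 0.0, "v": 0.0}


def trend_adam_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
                    beta3=0.1, eps=1e-8):
    """One Adam-like update with an assumed Holt-style trend term.

    This is an illustrative assumption, not the published Tom equations.
    All defaults, including beta3, are arbitrary illustrative values.
    """
    state["t"] += 1
    t = state["t"]
    prev_level = state["level"]
    prev_trend = state["trend"]

    # Level: moving average of gradients, extrapolated by the previous trend.
    level = beta1 * (prev_level + prev_trend) + (1.0 - beta1) * grad
    # Trend: moving average of the change in the level (the extra equation).
    trend = beta3 * (level - prev_level) + (1.0 - beta3) * prev_trend
    # Second moment: the usual Adam parameter-wise scaling term.
    v = beta2 * state["v"] + (1.0 - beta2) * grad * grad

    state["level"], state["trend"], state["v"] = level, trend, v

    # Adam-style bias correction, applied here to level + trend (assumption).
    m_hat = (level + trend) / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(grad_fn, theta0, steps):
    """Run the sketch optimizer on a scalar parameter."""
    theta, state = theta0, new_state()
    for _ in range(steps):
        theta = trend_adam_step(theta, grad_fn(theta), state)
    return theta


if __name__ == "__main__":
    # Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
    print(minimize(lambda x: 2.0 * (x - 3.0), 0.0, 300))
```

Setting `trend` to zero at every step reduces this sketch to plain Adam, which makes the effect of the extra smoothing equation easy to isolate in a comparison.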

Related papers

Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
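Decentralized federated learning (DFL) eliminates reliance on a client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. This paper proposes a novel DNCFL algorithm to address these problems.
Reference translation of paper (metadata) (2025-04-17T08:32:25Z)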
Architect Your Landscape Approach (AYLA) for Optimizations in Deep Learning [0.0]
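Stochastic Gradient Descent (SGD) and its variants (such as Adam) are the foundation of optimization in deep learning. This paper introduces AYLA, a novel optimization technique that improves adaptability and efficiency.
Reference translation of paper (metadata) (2025-04-02T16:31:39Z)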
Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
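Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. This paper proposes a fast-slow parameter update strategy that implicitly approximates the up-to-date unlearning direction.
Reference translation of paper (metadata) (2024-09-29T15:17:33Z)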
AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
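We propose low-memory optimization with adaptive learning rate (AdaLomo) for large language models. AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, lowering the hardware barrier to training large language models.
Reference translation of paper (metadata) (2023-10-16T09:04:28Z)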
ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
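We propose a fast (exponential-rate), ab initio (hyperparameter-free), gradient-based adaptation method. The main idea of the method is to adapt the learning rate $\alpha$ through situational awareness. It can be applied to problems of any dimension n and scales only linearly.
Reference translation of paper (metadata) (2023-09-12T14:36:13Z)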
Scaling Forward Gradient With Local Losses [117.22685584919756]
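Forward gradient learning is a biologically plausible alternative to backprop for training deep neural networks. We show that the variance of the forward gradient can be significantly reduced by applying perturbations to activations rather than weights. The proposed approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
Reference translation of paper (metadata) (2022-10-07T03:52:27Z)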
Step-size Adaptation Using Exponentiated Gradient Updates [21.162404996362948]
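We show that augmenting a given optimizer with an adaptive step-size tuning method can greatly improve performance. We maintain a global step-size scale for the update as well as a gain factor for each coordinate. We show that the proposed approach can achieve high accuracy on standard models without a specially tuned learning rate schedule.
Reference translation of paper (metadata) (2022-01-31T23:17:08Z)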
Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models? [46.01087792062936]
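We compare classes of estimators via the relative performance of the best method in each class. This allows the tuning sensitivity of learning algorithms to be rigorously quantified.
Reference translation of paper (metadata) (2021-08-26T16:01:37Z)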
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum on vision, and achieves state-of-the-art results on other tasks including language processing.
Reference translation of paper (metadata) (2021-06-22T03:13:23Z)
Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks [82.61182037130405]
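Stochastic gradient descent (SGD) is the main approach for training deep networks. This work compares Adam-based variants built on the difference between the present and past gradients. Experiments were also conducted on an ensemble of networks and on its fusion with a ResNet50 trained with stochastic gradient descent.
Reference translation of paper (metadata) (2021-03-26T18:55:08Z)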
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
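We propose GradInit, an automated and architecture-agnostic method for initializing neural networks. The variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value. It also makes it possible to train the original Post-LN Transformer for machine translation without learning rate warmup.
Reference translation of paper (metadata) (2021-02-16T11:45:35Z)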

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

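This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.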