Fugu-MT Paper Translation (Abstract): Momentum Centering and Asynchronous Update for Adaptive Gradient Methods

Paper summary: Momentum Centering and Asynchronous Update for Adaptive Gradient Methods

arxiv url: http://arxiv.org/abs/2110.05454v1
Date: Mon, 11 Oct 2021 17:43:59 GMT
Status: Translation complete
Last updated in system: 2021-10-12 21:08:15.447129
Title: Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Title (reference translation): Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Authors: Juntang Zhuang, Yifan Ding, Tommy Tang, Nicha Dvornek, Sekhar Tatikonda, James S. Duncan
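Abstract summary: ACProp (Asynchronous-centering-Prop) is an adaptive optimizer that combines centering of the second momentum with asynchronous update. ACProp has both strong theoretical properties and strong empirical performance.
Reference score (independently computed attention score): 11.016301322840354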
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer which combines centering of second momentum and asynchronous update (e.g. for the $t$-th update, the denominator uses information up to step $t-1$, while the numerator uses the gradient at the $t$-th step). ACProp has both strong theoretical properties and empirical performance. With the example by Reddi et al. (2018), we show that asynchronous optimizers (e.g. AdaShift, ACProp) have a weaker convergence condition than synchronous optimizers (e.g. Adam, RMSProp, AdaBelief); within asynchronous optimizers, we show that centering of second momentum further weakens the convergence condition. We demonstrate that ACProp has a convergence rate of $O(\frac{1}{\sqrt{T}})$ for the stochastic non-convex case, which matches the oracle rate and outperforms the $O(\frac{\log T}{\sqrt{T}})$ rate of RMSProp and Adam. We validate ACProp in extensive empirical studies: ACProp outperforms both SGD and other adaptive optimizers in image classification with CNNs, and outperforms well-tuned adaptive optimizers in the training of various GAN models, reinforcement learning and transformers. To sum up, ACProp has good theoretical properties, including a weak convergence condition and an optimal convergence rate, and strong empirical performance, including good generalization like SGD and training stability like Adam.
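Abstract (reference translation): ACProp (Asynchronous-centering-Prop) is an adaptive optimizer that combines centering of the second momentum with asynchronous update (for example, in the $t$-th update the denominator uses information up to step $t-1$, while the numerator uses the gradient at step $t$). ACProp has strong theoretical properties and strong empirical performance. Using the example of Reddi et al. (2018), it is shown that asynchronous optimizers (e.g. AdaShift, ACProp) have a weaker convergence condition than synchronous optimizers (e.g. Adam, RMSProp, AdaBelief). We show that ACProp has a convergence rate of $O(\frac{1}{\sqrt{T}})$ for the stochastic non-convex case, which matches the oracle rate and is better than the $O(\frac{\log T}{\sqrt{T}})$ rate of RMSProp and Adam. ACProp outperforms SGD and other adaptive optimizers in image classification with CNNs, and outperforms well-tuned adaptive optimizers in the training of various GAN models, reinforcement learning, and transformers. In summary, ACProp has good theoretical properties, including a weak convergence condition and an optimal convergence rate, and strong empirical performance, including good generalization like SGD and training stability like Adam.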
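The abstract names two ingredients: an asynchronous update (the denominator at step $t$ uses information only up to step $t-1$, while the numerator is the gradient at step $t$) and centering of the second momentum. The following is a minimal NumPy sketch of one step built from just those two ideas. It is not the authors' implementation: the function name `acprop_like_step`, the Adam-style exponential moving averages, the default hyperparameters, and the placement of `eps` are all assumptions made for illustration.

```python
import numpy as np


def acprop_like_step(x, g, m, s, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One asynchronous, centered adaptive step (illustrative sketch only).

    x: parameters, g: gradient at the current step t,
    m: running mean of gradients up to step t-1,
    s: running centered second momentum up to step t-1.
    All names and defaults here are hypothetical.
    """
    # Asynchronous update: the numerator is the current gradient g,
    # the denominator is built from s, which has not yet seen g.
    x_new = x - lr * g / (np.sqrt(s) + eps)

    # Only after the parameter update is g folded into the statistics.
    m_new = beta1 * m + (1.0 - beta1) * g
    # Centering: track the squared deviation of g from its running mean
    # rather than the raw squared gradient (assumed form).
    s_new = beta2 * s + (1.0 - beta2) * (g - m_new) ** 2
    return x_new, m_new, s_new
```

Because `s` is read before `g` is folded into it, the scaling applied to `g` does not depend on `g` itself, which is the property the abstract calls asynchronous. The abstract does not say how the statistics are initialized, so this sketch expects the caller to pass a positive initial `s` (for example, one estimated from a warm-up gradient).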

Related papers

Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness [26.49261268883266]
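We study how optimization algorithms affect group fairness in deep neural networks. We show that the choice of optimizer does affect fairness outcomes, particularly under severe imbalance. This work highlights the role of adaptive updates as an important mechanism for promoting fair outcomes.
Paper reference translation (metadata) (2025-04-21T06:20:50Z)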
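AlphaAdam: Asynchronous Masked Optimization with Dynamic Alpha for Selective Updates [17.490809667438818]
We propose AlphaAdam, an optimization framework for large language models (LLMs). By decoupling parameter updates and dynamically adjusting their strength, AlphaAdam accelerates convergence and improves training stability.
Paper reference translation (metadata) (2025-01-30T02:10:23Z)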
AdaFisher: Adaptive Second Order Optimization via Fisher Information [22.851200800265914]
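We propose AdaFisher, an adaptive second-order optimizer that uses a block-diagonal approximation of the Fisher information matrix for adaptive gradient preconditioning. We show that AdaFisher outperforms state-of-the-art (SOTA) optimizers in both accuracy and convergence speed.
Paper reference translation (metadata) (2024-05-26T01:25:02Z)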
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance [23.112775335244258]
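We first analyze RMSProp, a special case of Adam with adaptive learning rates. We develop a new upper bound on the first-order term in the descent lemma, which is also a function of the gradient norm. The results for both RMSProp and Adam match the complexity established in \cite{arvani2023lower}.
Paper reference translation (metadata) (2024-04-01T19:17:45Z)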
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
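Meta-adaptive optimizers (MADA) is a unified framework that generalizes several known optimizers and dynamically learns the most suitable one during training. We empirically compare MADA with other popular optimizers on vision and language tasks and find that MADA consistently outperforms Adam and other popular optimizers. AVGrad replaces the maximum operator with an averaging operator, which is better suited to hyper-gradient optimization.
Paper reference translation (metadata) (2024-01-17T00:16:46Z)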
Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
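Online gradient descent (OGD) is known to be doubly optimal under strong convexity or monotonicity assumptions. In this paper, we design AdaOGD, a fully adaptive OGD algorithm that does not require prior knowledge of these parameters.
Paper reference translation (metadata) (2023-10-21T18:38:13Z)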
Transformers as Support Vector Machines [54.642793677472724]
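We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem. We characterize the implicit bias of one-layer transformers optimized with gradient descent. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separate and select optimal tokens.
Paper reference translation (metadata) (2023-08-31T17:57:50Z)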
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE To Adaptive and non-adaptive momentum) framework. We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter on SGDM.
Paper reference translation (metadata) (2023-07-02T18:16:06Z)
Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction [19.71671771503269]
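We develop a new framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning. Convergence guarantees are established theoretically in the convex setting. The proposed method can achieve highly competitive performance.
Paper reference translation (metadata) (2021-07-30T05:33:43Z)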
Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
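The area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPRC) are common metrics for evaluating classification performance on imbalanced problems. In this paper, we propose a method for optimizing AUPRC for deep learning.
Paper reference translation (metadata) (2021-04-18T06:22:21Z)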
Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup [56.27526702716774]
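In this paper, we modify the A3C algorithm with TD(0), called A3C-TD(0), with provable convergence guarantees. Under i.i.d. sampling, A3C-TD(0) attains a sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy. Compared with the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ for 2
Paper reference translation (metadata) (2020-12-31T09:07:09Z)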

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
