Fugu-MT paper translation (abstract): PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning

Paper abstract: PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning

arxiv url: http://arxiv.org/abs/2505.22085v1
Date: Wed, 28 May 2025 08:07:34 GMT
Status: Translation complete
Last updated in system: 2025-05-29 17:35:50.483011
Title: PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning
Title (reference translation): PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning
Authors: Arnulf Jentzen, Julian Kranz, Adrian Riekert
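Abstract summary: Averaging techniques such as Ruppert-Polyak averaging and exponential moving averaging (EMA) are powerful approaches for accelerating stochastic gradient descent (SGD) optimization methods such as the popular ADAM optimizer. This work proposes an averaging approach called parallel averaged ADAM (PADAM), which computes different averaged variants of ADAM in parallel and dynamically selects the variant with the smallest optimization error during training.
Reference score (independently computed attention score): 5.052293146674794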
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Averaging techniques such as Ruppert-Polyak averaging and exponential moving averaging (EMA) are powerful approaches to accelerate optimization procedures of stochastic gradient descent (SGD) optimization methods such as the popular ADAM optimizer. However, depending on the specific optimization problem under consideration, the type and the parameters for the averaging need to be adjusted to achieve the smallest optimization error. In this work we propose an averaging approach, which we refer to as parallel averaged ADAM (PADAM), in which we compute different averaged variants of ADAM in parallel and, during the training process, dynamically select the variant with the smallest optimization error. A central feature of this approach is that it requires no more gradient evaluations than the usual ADAM optimizer, as each of the averaged trajectories relies on the same underlying ADAM trajectory and thus on the same underlying gradients. We test the proposed PADAM optimizer in 13 stochastic optimization and deep neural network (DNN) learning problems and compare its performance with known optimizers from the literature such as standard SGD, momentum SGD, ADAM with and without EMA, and ADAMW. In particular, we apply the compared optimizers to physics-informed neural network, deep Galerkin, deep backward stochastic differential equation and deep Kolmogorov approximations for boundary value partial differential equation problems from scientific machine learning, as well as to DNN approximations for optimal control and optimal stopping problems. In nearly all of the considered examples PADAM achieves, sometimes among others and sometimes exclusively, essentially the smallest optimization error. This work thus strongly suggests considering PADAM for scientific machine learning problems and also motivates further research on adaptive averaging procedures within the training of DNNs.
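Abstract (reference translation): Averaging techniques such as Ruppert-Polyak averaging and exponential moving averaging (EMA) are powerful approaches for speeding up stochastic gradient descent (SGD) optimization methods such as the popular ADAM optimizer. However, depending on the specific optimization problem under consideration, the type and parameters of the averaging must be adjusted to achieve the smallest optimization error. This work proposes an averaging approach called parallel averaged ADAM (PADAM), which computes different averaged variants of ADAM in parallel and dynamically selects the variant with the smallest optimization error during training. A central feature of the approach is that it requires no more gradient evaluations than the usual ADAM optimizer, because every averaged trajectory relies on the same underlying ADAM trajectory. The proposed PADAM optimizer is tested on 13 stochastic optimization and deep neural network (DNN) learning problems, and its performance is compared with known optimizers from the literature such as standard SGD, momentum SGD, ADAM with and without EMA, and ADAMW. In particular, the compared optimizers are applied to physics-informed neural network, deep Galerkin, deep backward stochastic differential equation, and deep Kolmogorov approximations for boundary value partial differential equation problems from scientific machine learning, as well as to DNN approximations for optimal control and optimal stopping problems. In nearly all of the considered examples, PADAM achieves essentially the smallest optimization error, sometimes together with other methods and sometimes alone. This work strongly suggests considering PADAM for scientific machine learning problems and motivates further research on adaptive averaging procedures in the training of DNNs.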
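
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name padam_sketch, the particular set of averaged variants (the raw iterate, a Ruppert-Polyak-style running mean, and EMAs with three decay rates), the evaluation interval, and the use of a user-supplied loss estimate for the selection step are all assumptions made for illustration. What it aims to show is that every averaged variant is updated from the same Adam iterate, so the stochastic gradient is evaluated only once per step.

```python
import numpy as np


def padam_sketch(grad_fn, loss_fn, theta0, steps=2000, lr=1e-2,
                 beta1=0.9, beta2=0.999, eps=1e-8,
                 ema_decays=(0.9, 0.99, 0.999), eval_every=100, seed=0):
    """Run one Adam trajectory and track several averaged copies of it.

    grad_fn(theta, rng) returns a stochastic gradient and loss_fn(theta)
    returns an estimate of the optimization error. Both are hypothetical
    user-supplied callables, not part of any library.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)  # first-moment estimate
    v = np.zeros_like(theta)  # second-moment estimate

    # Averaged variants: raw iterate, running mean, and one EMA per decay rate.
    variants = {"adam": theta.copy(), "polyak": theta.copy()}
    for d in ema_decays:
        variants[f"ema_{d}"] = theta.copy()

    best_name, grad_evals = "adam", 0
    for t in range(1, steps + 1):
        # The only gradient evaluation in this step.
        g = grad_fn(theta, rng)
        grad_evals += 1

        # Standard Adam update with bias correction.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

        # Update every averaged variant from the same iterate (no gradients).
        variants["adam"] = theta.copy()
        variants["polyak"] += (theta - variants["polyak"]) / (t + 1)
        for d in ema_decays:
            key = f"ema_{d}"
            variants[key] = d * variants[key] + (1 - d) * theta

        # Assumed selection rule: periodically pick the variant whose
        # estimated error is currently the smallest.
        if t % eval_every == 0:
            best_name = min(variants, key=lambda k: loss_fn(variants[k]))

    return variants[best_name], best_name, grad_evals


# Toy problem (an assumption): minimize E[0.5 * ||theta - X||^2], X ~ N(mu, I).
mu = np.array([1.0, -2.0])


def grad_fn(theta, rng):
    return theta - (mu + rng.normal(size=theta.shape))


def loss_fn(theta):
    # Exact excess risk for the toy problem; in practice this would be
    # replaced by a Monte Carlo or validation estimate.
    return 0.5 * float(np.sum((theta - mu) ** 2))


theta_best, chosen, n_grad_evals = padam_sketch(grad_fn, loss_fn, np.zeros(2))
print(chosen, theta_best, n_grad_evals)
```

In this sketch the selection only reads the averaged copies and never feeds back into the Adam iterate, so the cost beyond plain Adam is the extra memory for the copies and the periodic loss evaluations.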

Related papers

Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems [5.052293146674794]
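This work is inspired by the classical Polyak-Ruppert averaging approach. It applies averaged variants of the Adam method to the training of deep neural networks (DNNs). In each of the numerical examples, the averaged variants of Adam that were employed outperform standard Adam and standard SGD.
Paper reference translation (metadata) (2025-01-10T16:15:25Z)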
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
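We address general meta-learning problems that are widespread in modern deep learning. These problems are often formulated as Bi-Level Optimizations (BLO). We introduce a new perspective by transforming a given BLO problem into a stochastic optimization problem in which the inner loss function becomes a smooth distribution and the outer loss becomes the expected loss over the inner distribution.
Paper reference translation (metadata) (2024-10-14T12:10:06Z)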
Convergence rates for the Adam optimizer [4.066869900592636]
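We propose a new vector field function, which we call the Adam vector field. This vector field accurately describes the Adam optimization process but differs from the negative gradient of the objective function. Our convergence analysis reveals that Adam does not converge to critical points of the objective function.
Paper reference translation (metadata) (2024-07-29T22:49:04Z)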
Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses [5.052293146674794]
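It is known that standard stochastic gradient descent (SGD) optimization methods, as well as accelerated and adaptive SGD optimization methods such as Adam, fail to converge if the learning rates do not converge to zero. This work proposes and studies a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates.
Paper reference translation (metadata) (2024-06-20T14:07:39Z)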
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
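Meta-Adaptive Optimizers (MADA) is a unified framework that generalizes several known optimizers and dynamically learns the most suitable one during training. We empirically compare MADA with other popular optimizers on vision and language tasks and find that MADA consistently outperforms Adam and other popular optimizers. AVGrad replaces the maximum operator with an averaging operator, which is better suited to hyper-gradient optimization.
Paper reference translation (metadata) (2024-01-17T00:16:46Z)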
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE to Adaptive and non-adaptive momentum) optimizer framework. We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter on SGDM.
Paper reference translation (metadata) (2023-07-02T18:16:06Z)
An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
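We investigate the effectiveness of various zeroth-order (ZO) optimization methods for optimizing molecular objectives. We show the advantages of ZO sign-based gradient descent (ZO-signGD). We demonstrate the effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
Paper reference translation (metadata) (2022-10-27T01:58:10Z)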
Data-driven evolutionary algorithm for oil reservoir well-placement and control optimization [3.012067935276772]
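A generalized data-driven evolutionary algorithm (GDDE) is proposed to reduce the number of simulations run in well-placement and control optimization problems. A probabilistic neural network (PNN) is adopted as a classifier to select informative and promising candidates.
Paper reference translation (metadata) (2022-06-07T09:07:49Z)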
Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications [5.660384137948734]
We show that the proposed algorithm can converge to the correct distribution with a controllable bias under mild conditions.
Paper reference translation (metadata) (2020-06-29T20:57:20Z)
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients [112.00379151834242]
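We propose a principle for adaptive learning rates in which the running mean of the squared gradients in Adam is replaced by a weighted mean. This enables faster adaptation and leads to more desirable empirical convergence behavior.
Paper reference translation (metadata) (2020-06-21T21:47:43Z)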
Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
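Self-directed online learning optimization integrates deep neural networks (DNNs) with finite element method (FEM) calculations. The algorithm was validated on four types of problems: compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization. It reduced computation time by 2 to 5 orders of magnitude compared with the direct-use approach and outperformed all state-of-the-art algorithms tested in the experiments.
Paper reference translation (metadata) (2020-02-04T20:00:28Z)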

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

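This page shows the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).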