Fugu-MT 論文翻訳(概要): Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

論文の概要: Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

arxiv url: http://arxiv.org/abs/2109.03445v5
Date: Tue, 20 Feb 2024 12:58:09 GMT
ステータス: 翻訳完了
システム内更新日: 2024-02-21 22:09:07.009230
Title: Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning
Title（参考訳）: バッチ非同期確率近似の収束と強化学習への応用
Authors: Rajeeva L. Karandikar and M. Vidyasagar
Abstract要約: 近似 (SA) は $f(theta) = 0$ という形の方程式の解を見つけるための標準ツールとなっている。強化学習(RL)における重要なテクニックである$Q$-learningのようなアプリケーションでは、$theta_t$の1つのコンポーネントのみが$t$ごとに更新される。本研究の課題は textbfBlock Asynchronous SA (BASA) を各ステップで研究することである。
参考スコア（独自算出の注目度）: 2.3134637611088653
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ever since its introduction in the classic paper of Robbins and Monro in 1951, Stochastic Approximation (SA) has become a standard tool for finding a solution of an equation of the form $f(\theta) = 0$, when only noisy measurements of $f(\cdot)$ are available. In most situations, \textit{every component} of the putative solution $\theta_t$ is updated at each step $t$. In some applications such as $Q$-learning, a key technique in Reinforcement Learning (RL), \textit{only one component} of $\theta_t$ is updated at each $t$. This is known as \textbf{asynchronous} SA. The topic of study in the present paper is to study \textbf{Block Asynchronous SA (BASA)}, in which, at each step $t$, \textit{some but not necessarily all} components of $\theta_t$ are updated. The theory presented here embraces both conventional (synchronous) SA as well as asynchronous SA, and all in-between possibilities. We also prove bounds on the \textit{rate} of convergence of $\theta_t$ to the solutions. As a prelude to the new results, we also briefly survey some results on the convergence of the Stochastic Gradient method, proved in a companion paper by the present authors.
Abstract（参考訳）: 1951年にロビンズとモンロの古典的論文で紹介されて以来、確率近似 (stochastic approximation, sa) は、ノイズの大きい測定値である$f(\theta) = 0$ の形の方程式の解を見つけるための標準的なツールとなっている。ほとんどの場合、配置ソリューション $\theta_t$ の \textit{every component} は各ステップ $t$ で更新される。 q$-learningのようないくつかのアプリケーションでは、強化学習(rl)のキーテクニックである \textit{only one component} of $\theta_t$がそれぞれ$t$で更新される。これは \textbf{asynchronous} SA として知られている。本稿では,各ステップ$t$, \textit{some, but not always all} component of $\theta_t$ を更新した \textbf{Block Asynchronous SA (BASA)} について検討する。ここで提示される理論は、従来の(同期) SA だけでなく、非同期 SA も含んでいる。また、解に対する$\theta_t$ の収束の \textit{rate} 上の境界も証明する。筆者らによる共著論文で証明された確率勾配法の収束に関するいくつかの結果について,新しい結果の先行研究として,簡単な調査を行った。

関連論文リスト

Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods [13.677904140815386]
Gradient Descent (SGD) は機械学習の研究で広く使われている。本稿では,より緩やかなステップサイズ条件下でのSGDの収束解析法を提案する。
論文参考訳（メタデータ） (2025-04-17T02:56:20Z)
From Continual Learning to SGD and Back: Better Rates for Continual Linear Models [50.11453013647086]
以前見られたタスクの損失を、$k$の繰り返しの後、忘れること、すなわち、分析する。実現可能な最小二乗の設定において、新しい最上界を創出する。我々は、タスクを繰り返しないランダム化だけで、十分に長いタスクシーケンスで破滅的な事態を防げることを初めて証明した。
論文参考訳（メタデータ） (2025-04-06T18:39:45Z)
Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
非ガウス成分分析(Non-Gaussian Component Analysis、NGCA)は、高次元データセットにおいて非ガウス方向を求める統計的タスクである。本稿では Sum-of-Squares フレームワークにおける NGCA の複雑さについて考察する。
論文参考訳（メタデータ） (2024-10-28T18:19:13Z)
On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
本稿では、RMSPropとその運動量拡張を考察し、$frac1Tsum_k=1Tの収束速度を確立する。我々の収束率は、次元$d$を除くすべての係数に関して下界と一致する。収束率は$frac1Tsum_k=1Tと類似していると考えられる。
論文参考訳（メタデータ） (2024-02-01T07:21:32Z)
Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
目的関数 $J(cdot)$ の定常点を求めるグラディエント・Descent (SGD) 法の収束特性について検討した。この結果は、すべての定常点が大域最小値である性質を持つ invex' 関数のクラスに適用できる。
論文参考訳（メタデータ） (2023-12-05T15:22:39Z)
Convergence Analysis of Decentralized ASGD [1.8710230264817358]
本稿では,ノード間の部分同期や制限的ネットワークトポロジを必要としない分散非同期SGD(DASGD)に対する新しい収束速度解析法を提案する。我々の収束証明は、固定段数と任意の非滑らかで同質でL字型の目的函数を仮定する。
論文参考訳（メタデータ） (2023-09-07T14:50:31Z)
Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
我々は、$symbol$k$収束問題に対して、新しいマップベースのアルゴリズム(mathsfnorMtext-mathsfSGD$)を提案する。
論文参考訳（メタデータ） (2023-05-10T01:12:11Z)
Topology-aware Generalization of Decentralized SGD [89.25765221779288]
本稿では,分散型Valpha-10安定降下(D-SGD)の一般化可能性について検討する。 D-SGDの一般化性は、初期訓練段階における接続性と正の相関があることを証明した。
論文参考訳（メタデータ） (2022-06-25T16:03:48Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
我々は、平均$n$スムーズでおそらくは非カラー関数のほぼ定常点を求める問題を再考する。我々は$smallsfcolorgreen$を一般化し、事実上あらゆるサンプリングメカニズムで確実に動作するようにします。我々は、スムーズな非カラー状態における最適境界の最も一般的な、最も正確な解析を提供する。
論文参考訳（メタデータ） (2022-06-05T21:32:33Z)
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning [3.8098187557917464]
この論文は$d$-dimensional recursion approximation, $$theta_n+1=theta_n + alpha_n + 1 f(theta_n, Phi_n+1)に関するものである。主な結果は、ドスカー・バラダン・リャプノフドリフト条件(DV3)の平均流とバージョンに関する追加条件の下で確立される。 a example is given where $f$ and $barf$ are linear in $theta$, and $Phi$ is a geometryal.
論文参考訳（メタデータ） (2021-10-27T13:38:25Z)
On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
本稿では,勾配降下(SGD)の軌跡を解析する。我々はSGDが厳格なステップサイズポリシーのために1ドルでサドルポイント/マニフォールドを避けることを示す。
論文参考訳（メタデータ） (2020-06-19T14:11:26Z)
Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball [17.33867778750777]
一般近似問題に対する勾配降下法(SGD)と重球法(SHB)について検討した。 SGD の場合、凸と滑らかな設定において、イテレートの重み付き平均に対して、最初の最も確実な収束式を提供する。
論文参考訳（メタデータ） (2020-06-14T11:12:05Z)
Confidence sequences for sampling without replacement [39.98103047898974]
私たちは$thetastar$に対して信頼性シーケンスを設計するための一連のツールを提示します。 CS は、(C_n)_n=1N$ の信頼集合の列で、サイズが小さく、高い確率で$thetastar$ を同時に含む。次に、WoRサンプリングのためのHoeffding-およびExperiical-Bernstein-type time-uniform CSと固定時間信頼区間を示す。
論文参考訳（メタデータ） (2020-06-08T04:30:25Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
非同期Q-ラーニングはマルコフ決定過程(MDP)の最適行動値関数(またはQ-関数)を学習することを目的としている。 Q-関数の入出力$varepsilon$-正確な推定に必要なサンプルの数は、少なくとも$frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$の順である。
論文参考訳（メタデータ） (2020-06-04T17:51:00Z)
A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
我々はAdam Adagradと$O(d(N)/st)$アルゴリズムの収束の証明を示す。 Adamはデフォルトパラメータで使用する場合と同じ収束$O(d(N)/st)$で収束する。
論文参考訳（メタデータ） (2020-03-05T01:56:17Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。