Fugu-MT 論文翻訳(概要): Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

論文の概要: Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

arxiv url: http://arxiv.org/abs/2109.03445v4
Date: Mon, 3 Apr 2023 08:15:35 GMT
ステータス: 翻訳完了
システム内更新日: 2023-04-05 02:02:32.786275
Title: Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning
Title（参考訳）: バッチ非同期確率近似の収束と強化学習への応用
Authors: Rajeeva L. Karandikar and M. Vidyasagar
Abstract要約: 我々は、そのようなアルゴリズムが研究中の地図の固定点に収束することを証明するための一般的な方法論を開発する。本稿では、値に対する時間差$TD(lambda)$と、最適なアクション値関数を求めるための$Q$-learningアルゴリズムを分析する。
参考スコア（独自算出の注目度）: 1.1127149900807178
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a zero or a fixed point of a vector-valued funtion, when only noisy measurements of the function are available. In the literature to date, one makes a distinction between ``synchronous'' updating, whereby every component of the current guess is updated at each time, and ``asynchronous'' updating, whereby only one component is updated. In this paper, we study an intermediate situation that we call ``batch asynchronous stochastic approximation'' (BASA), in which, at each time instant, \textit{some but not all} components of the current estimated solution are updated. BASA allows the user to trade off memory requirements against time complexity. We develop a general methodology for proving that such algorithms converge to the fixed point of the map under study. These convergence proofs make use of weaker hypotheses than existing results. Specifically, existing convergence proofs require that the measurement noise is a zero-mean i.i.d\ sequence or a martingale difference sequence. In the present paper, we permit biased measurements, that is, measurement noises that have nonzero conditional mean. Also, all convergence results to date assume that the stochastic step sizes satisfy a probabilistic analog of the well-known Robbins-Monro conditions. We replace this assumption by a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to Reinforcement Learning, we analyze the temporal difference algorithm $TD(\lambda)$ for value iteration, and the $Q$-learning algorithm for finding the optimal action-value function. In both cases, we establish the convergence of these algorithms, under milder conditions than in the existing literature.
Abstract（参考訳）: 確率近似(英: stochastic approximation、SA)アルゴリズムは、関数のノイズ測定のみが利用可能であるとき、ベクトル値の試行の零点または定点を見つけるために広く用いられる確率的手法である。これまでの文献では、‘synchronous’ の更新と、現在の推測のすべてのコンポーネントが毎回更新される `synchronous'' の更新とを区別し、1つのコンポーネントだけが更新される。本稿では,現在推定されている解のコンポーネントを瞬時に更新する,‘batch asynchronous stochastic approximation’(basa)と呼ばれる中間状態について検討する。 BASAにより、ユーザーはメモリ要件を時間的複雑さと引き換えることができる。このようなアルゴリズムが研究中の写像の不動点に収束することを示す一般的な方法を開発した。これらの収束証明は、既存の結果よりも弱い仮説を用いる。具体的には、既存の収束証明は、測定ノイズがゼロ平均i.i.d\列またはマルティンゲール差分列である必要がある。本稿では,非ゼロ条件平均の計測ノイズについて,偏りの測定を許可する。また、すべての収束結果は、確率的ステップサイズがよく知られたロビンズ・モンロ条件の確率的類似性を満たすと仮定している。この仮定を,マルコフ過程の既約性に関する純粋決定論的条件に置き換える。 Reinforcement Learning への具体的な応用として、時間差分アルゴリズム $TD(\lambda)$ for value iteration と、最適なアクション値関数を見つけるための $Q$-learning アルゴリズムを解析する。どちらの場合も、既存の文献よりも穏やかな条件下でこれらのアルゴリズムの収束を確立する。

関連論文リスト

Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods [13.677904140815386]
Gradient Descent (SGD) は機械学習の研究で広く使われている。本稿では,より緩やかなステップサイズ条件下でのSGDの収束解析法を提案する。
論文参考訳（メタデータ） (2025-04-17T02:56:20Z)
From Continual Learning to SGD and Back: Better Rates for Continual Linear Models [50.11453013647086]
以前見られたタスクの損失を、$k$の繰り返しの後、忘れること、すなわち、分析する。実現可能な最小二乗の設定において、新しい最上界を創出する。我々は、タスクを繰り返しないランダム化だけで、十分に長いタスクシーケンスで破滅的な事態を防げることを初めて証明した。
論文参考訳（メタデータ） (2025-04-06T18:39:45Z)
Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
非ガウス成分分析(Non-Gaussian Component Analysis、NGCA)は、高次元データセットにおいて非ガウス方向を求める統計的タスクである。本稿では Sum-of-Squares フレームワークにおける NGCA の複雑さについて考察する。
論文参考訳（メタデータ） (2024-10-28T18:19:13Z)
On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
本稿では、RMSPropとその運動量拡張を考察し、$frac1Tsum_k=1Tの収束速度を確立する。我々の収束率は、次元$d$を除くすべての係数に関して下界と一致する。収束率は$frac1Tsum_k=1Tと類似していると考えられる。
論文参考訳（メタデータ） (2024-02-01T07:21:32Z)
Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
目的関数 $J(cdot)$ の定常点を求めるグラディエント・Descent (SGD) 法の収束特性について検討した。この結果は、すべての定常点が大域最小値である性質を持つ invex' 関数のクラスに適用できる。
論文参考訳（メタデータ） (2023-12-05T15:22:39Z)
Convergence Analysis of Decentralized ASGD [1.8710230264817358]
本稿では,ノード間の部分同期や制限的ネットワークトポロジを必要としない分散非同期SGD(DASGD)に対する新しい収束速度解析法を提案する。我々の収束証明は、固定段数と任意の非滑らかで同質でL字型の目的函数を仮定する。
論文参考訳（メタデータ） (2023-09-07T14:50:31Z)
Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
我々は、$symbol$k$収束問題に対して、新しいマップベースのアルゴリズム(mathsfnorMtext-mathsfSGD$)を提案する。
論文参考訳（メタデータ） (2023-05-10T01:12:11Z)
Topology-aware Generalization of Decentralized SGD [89.25765221779288]
本稿では,分散型Valpha-10安定降下(D-SGD)の一般化可能性について検討する。 D-SGDの一般化性は、初期訓練段階における接続性と正の相関があることを証明した。
論文参考訳（メタデータ） (2022-06-25T16:03:48Z)
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
通信周波数の異なる分散計算作業者のトレーニングアルゴリズムを示す。本研究では,より厳密な収束率を$mathcalO!!(sigma2-2_avg!)とする。また,不均一性の項は,作業者の平均遅延によっても影響されることを示した。
論文参考訳（メタデータ） (2022-06-16T17:10:57Z)
Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
我々は、平均$n$スムーズでおそらくは非カラー関数のほぼ定常点を求める問題を再考する。我々は$smallsfcolorgreen$を一般化し、事実上あらゆるサンプリングメカニズムで確実に動作するようにします。我々は、スムーズな非カラー状態における最適境界の最も一般的な、最も正確な解析を提供する。
論文参考訳（メタデータ） (2022-06-05T21:32:33Z)
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning [3.8098187557917464]
この論文は$d$-dimensional recursion approximation, $$theta_n+1=theta_n + alpha_n + 1 f(theta_n, Phi_n+1)に関するものである。主な結果は、ドスカー・バラダン・リャプノフドリフト条件(DV3)の平均流とバージョンに関する追加条件の下で確立される。 a example is given where $f$ and $barf$ are linear in $theta$, and $Phi$ is a geometryal.
論文参考訳（メタデータ） (2021-10-27T13:38:25Z)
On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
本稿では,勾配降下(SGD)の軌跡を解析する。我々はSGDが厳格なステップサイズポリシーのために1ドルでサドルポイント/マニフォールドを避けることを示す。
論文参考訳（メタデータ） (2020-06-19T14:11:26Z)
Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball [17.33867778750777]
一般近似問題に対する勾配降下法(SGD)と重球法(SHB)について検討した。 SGD の場合、凸と滑らかな設定において、イテレートの重み付き平均に対して、最初の最も確実な収束式を提供する。
論文参考訳（メタデータ） (2020-06-14T11:12:05Z)
Confidence sequences for sampling without replacement [39.98103047898974]
私たちは$thetastar$に対して信頼性シーケンスを設計するための一連のツールを提示します。 CS は、(C_n)_n=1N$ の信頼集合の列で、サイズが小さく、高い確率で$thetastar$ を同時に含む。次に、WoRサンプリングのためのHoeffding-およびExperiical-Bernstein-type time-uniform CSと固定時間信頼区間を示す。
論文参考訳（メタデータ） (2020-06-08T04:30:25Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
非同期Q-ラーニングはマルコフ決定過程(MDP)の最適行動値関数(またはQ-関数)を学習することを目的としている。 Q-関数の入出力$varepsilon$-正確な推定に必要なサンプルの数は、少なくとも$frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$の順である。
論文参考訳（メタデータ） (2020-06-04T17:51:00Z)
A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
我々はAdam Adagradと$O(d(N)/st)$アルゴリズムの収束の証明を示す。 Adamはデフォルトパラメータで使用する場合と同じ収束$O(d(N)/st)$で収束する。
論文参考訳（メタデータ） (2020-03-05T01:56:17Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。