Fugu-MT paper translation (abstract): Adaptive Temporal Difference Learning with Linear Function Approximation

Paper overview: Adaptive Temporal Difference Learning with Linear Function Approximation

arxiv url: http://arxiv.org/abs/2002.08537v2
Date: Mon, 11 Oct 2021 03:30:04 GMT
Status: translation complete
Last updated in system: 2022-12-30 07:16:24.734877
Title: Adaptive Temporal Difference Learning with Linear Function Approximation
Title (reference translation): Adaptive Temporal Difference Learning with Linear Function Approximation
Authors: Tao Sun, Han Shen, Tianyi Chen, Dongsheng Li
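Abstract summary: This paper revisits the temporal difference (TD) learning algorithm for policy evaluation tasks in reinforcement learning. We develop a provably convergent adaptive projected variant of the TD(0) learning algorithm with linear function approximation. We evaluate the performance of AdaTD(0) and AdaTD($\lambda$) on several standard reinforcement learning tasks.
Reference score (independently computed attention score): 29.741034258674205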
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes, TD(0) suffers from slow convergence. Motivated by the tight link between the TD(0) learning algorithm and the stochastic gradient methods, we develop a provably convergent adaptive projected variant of the TD(0) learning algorithm with linear function approximation that we term AdaTD(0). In contrast to TD(0), AdaTD(0) is robust or less sensitive to the choice of stepsizes. Analytically, we establish that to reach an $\epsilon$ accuracy, the number of iterations needed is $\tilde{O}(\epsilon^{-2}\ln^4\frac{1}{\epsilon}/\ln^4\frac{1}{\rho})$ in the general case, where $\rho$ represents the speed at which the underlying Markov chain converges to its stationary distribution. This implies that the iteration complexity of AdaTD(0) is no worse than that of TD(0) in the worst case. When the stochastic semi-gradients are sparse, we provide theoretical acceleration of AdaTD(0). Going beyond TD(0), we develop an adaptive variant of TD($\lambda$), which is referred to as AdaTD($\lambda$). Empirically, we evaluate the performance of AdaTD(0) and AdaTD($\lambda$) on several standard reinforcement learning tasks, which demonstrate the effectiveness of our new approaches.
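Abstract (reference translation): This paper revisits the temporal difference (TD) learning algorithm for policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. TD(0) often suffers from slow convergence. Motivated by the close link between the TD(0) learning algorithm and stochastic gradient methods, we develop a provably convergent adaptive projected variant of the TD(0) learning algorithm with linear function approximation, called AdaTD(0). In contrast to TD(0), AdaTD(0) is robust or less sensitive to the choice of stepsizes. Analytically, to reach $\epsilon$ accuracy, the number of iterations needed is $\tilde{O}(\epsilon^{-2}\ln^4\frac{1}{\epsilon}/\ln^4\frac{1}{\rho})$. This means that the iteration complexity of AdaTD(0) is no worse than that of TD(0) in the worst case. When the stochastic semi-gradients are sparse, we provide a theoretical acceleration of AdaTD(0). Going beyond TD(0), we develop an adaptive variant of TD($\lambda$), called AdaTD($\lambda$). Empirically, we evaluate the performance of AdaTD(0) and AdaTD($\lambda$) on several standard reinforcement learning tasks and demonstrate the effectiveness of the new approaches.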
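The idea described in the abstract, a projected TD(0) update whose stepsize adapts to the observed semi-gradients, can be illustrated with a minimal sketch. It assumes an AdaGrad-norm-style scalar stepsize and is not the authors' AdaTD(0) update as specified in the paper. The ring-shaped Markov reward process, the one-hot features, the projection radius, and all function names (adaptive_td0, project, true_values, l2_error) are hypothetical choices made for illustration only.

```python
import math
import random

GAMMA = 0.9     # discount factor (assumed for this toy problem)
N_STATES = 5    # states on a ring; reward 1 is earned when leaving state 0


def features(s):
    """One-hot features: tabular evaluation as a special case of linear FA."""
    phi = [0.0] * N_STATES
    phi[s] = 1.0
    return phi


def step(s, rng):
    """Random walk on the ring under the fixed policy being evaluated."""
    reward = 1.0 if s == 0 else 0.0
    s_next = (s + rng.choice((-1, 1))) % N_STATES
    return reward, s_next


def project(theta, radius):
    """Euclidean projection onto the ball {theta : ||theta|| <= radius}."""
    norm = math.sqrt(sum(x * x for x in theta))
    if norm <= radius:
        return theta
    return [x * radius / norm for x in theta]


def adaptive_td0(n_steps=20000, alpha=1.0, radius=10.0, eps=1e-8, seed=0):
    """Projected TD(0) with an AdaGrad-norm-style stepsize (illustrative)."""
    rng = random.Random(seed)
    theta = [0.0] * N_STATES
    v = 0.0  # running sum of squared semi-gradient norms
    s = 0
    for _ in range(n_steps):
        r, s_next = step(s, rng)
        phi, phi_next = features(s), features(s_next)
        value = sum(p * t for p, t in zip(phi, theta))
        value_next = sum(p * t for p, t in zip(phi_next, theta))
        delta = r + GAMMA * value_next - value   # TD error
        g = [delta * p for p in phi]             # stochastic semi-gradient
        v += sum(x * x for x in g)
        lr = alpha / math.sqrt(v + eps)          # adaptive scalar stepsize
        theta = project([t + lr * x for t, x in zip(theta, g)], radius)
        s = s_next
    return theta


def true_values(sweeps=2000):
    """Reference values from repeated Bellman backups on the known chain."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [
            (1.0 if s == 0 else 0.0)
            + GAMMA * 0.5 * (v[(s - 1) % N_STATES] + v[(s + 1) % N_STATES])
            for s in range(N_STATES)
        ]
    return v


def l2_error(a, b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    theta = adaptive_td0()
    print("estimate:", [round(x, 3) for x in theta])
    print("error vs. reference values:", l2_error(theta, true_values()))
```

Because the stepsize is divided by the accumulated semi-gradient norm, rescaling alpha changes the update far less than it would in plain TD(0) with a constant stepsize, which is the kind of robustness the abstract refers to. The projection radius must be large enough to contain the fixed point; the value 10.0 here is an assumption that holds for this toy chain.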

Related papers

Accelerating Multi-Task Temporal Difference Learning under Low-Rank Representation [12.732028509861829]
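We study the policy evaluation problem in multi-task reinforcement learning (RL) under a low-rank representation setting. We propose a new TD learning method that integrates a so-called truncated singular value decomposition step into the TD learning update. Experiments show that the proposed method outperforms classical TD learning, and that the performance gap grows as $r$ decreases.
Paper reference translation (metadata) (2025-03-03T20:07:45Z)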
Statistical Efficiency of Distributional Temporal Difference Learning [24.03281329962804]
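We analyze the finite-sample performance of distributional temporal difference learning (CTD) and quantile temporal difference learning (QTD). For a $\gamma$-discounted infinite-horizon decision process, we show that NTD needs $\tilde{O}\left(\frac{1}{\varepsilon^{2p}(1-\gamma)^{2p}}\right)$ to achieve an $\varepsilon$-optimal estimator with high probability. We establish a new Freedman's inequality in Hilbert spaces, which is of independent interest.
Paper reference translation (metadata) (2024-03-09T06:19:53Z)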
Distributed TD(0) with Almost No Communication [15.321579527891457]
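We present a non-asymptotic analysis of temporal difference learning with linear function approximation. We demonstrate a version of the linear time speedup phenomenon, in which the convergence time of the distributed process is a factor of $N$ faster than the convergence time of TD(0).
Paper reference translation (metadata) (2023-05-25T17:00:46Z)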
Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
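This paper studies delayed online convex optimization (OCO) in non-stationary environments. We first propose a simple algorithm, DOGD, which performs a gradient descent step for each delayed gradient in order of arrival. We then develop an improved algorithm that reduces the dynamic regret bound achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
Paper reference translation (metadata) (2023-05-20T07:54:07Z)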
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation [44.27439128304058]
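We study the finite-time behaviour of the TD learning algorithm. Under a suitable step-size choice, we derive finite-time bounds on the parameter error of tail-averaged TD.
Paper reference translation (metadata) (2022-10-12T04:37:54Z)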
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
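We present a training algorithm for distributed computation workers with varying communication frequency. We establish a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1}\right)$. We also show that the heterogeneity term is affected by the average delay of the workers.
Paper reference translation (metadata) (2022-06-16T17:10:57Z)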
PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method [49.93717224277131]
We propose a new ETD method called PER-ETD (PEriodically Restarted-ETD). PER-ETD converges to the same desired fixed point as ETD, but improves on ETD's exponential sample complexity.
Paper reference translation (metadata) (2021-10-13T17:40:12Z)
Predictor-Corrector(PC) Temporal Difference(TD) Learning (PCTD) [0.0]
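Predictor-Corrector Temporal Difference (PCTD) is what I call the discrete-time reinforcement learning (RL) algorithm translated from the theory of discrete-time ODEs. I propose a new class of TD learning algorithms. The parameter being approximated has a guaranteed order-of-magnitude reduction in the Taylor series error of the solution to the ODE.
Paper reference translation (metadata) (2021-04-15T18:54:16Z)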
A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
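We propose a new algorithm, Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA), for tackling the problem. MSTSA lets us control the error in the iterations caused by inexact solutions to the lower-level subproblem.
Paper reference translation (metadata) (2021-02-15T07:10:33Z)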
Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup [56.27526702716774]
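This paper revisits the A3C algorithm with TD(0), termed A3C-TD(0), with provable convergence guarantees. Under i.i.d. sampling, A3C-TD(0) obtains a sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy. Compared with the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5})$ for two-timescale actor-critic, this is a linear speedup.
Paper reference translation (metadata) (2020-12-31T09:07:09Z)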
Reducing Sampling Error in Batch Temporal Difference Learning [42.30708351947417]
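Temporal difference (TD) learning is one of the foundations of modern reinforcement learning. This paper considers using TD(0), a canonical TD algorithm, to estimate the value function of a given policy from a batch of data.
Paper reference translation (metadata) (2020-08-15T15:30:06Z)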
Reanalysis of Variance Reduced Temporal Difference Learning [57.150444843282]
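The variance-reduced TD (VRTD) algorithm proposed by Korda and La applies variance reduction directly to online TD learning with Markovian samples. We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate.
Paper reference translation (metadata) (2020-01-07T05:32:43Z)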

The related papers list is generated automatically from the titles and abstracts of papers on this site.
