Fugu-MT Paper Translation (Abstract): Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy

Paper overview: Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy

arxiv url: http://arxiv.org/abs/2009.13736v3
Date: Sat, 3 Apr 2021 23:43:26 GMT
Status: Translation complete
Last updated in system: 2022-10-13 04:58:26.258882
Title: Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Title (reference translation): Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Authors: Yunshu Du, Garrett Warnell, Assefaw Gebremedhin, Peter Stone, Matthew E. Taylor
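Abstract summary: We introduce Lucid Dreaming for Experience Replay (LiDER), a framework that allows replay experiences to be refreshed by leveraging the agent's current policy. LiDER consistently improves performance over the baseline in six Atari 2600 games.
Reference score (independently computed attention measure): 48.8675653453076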
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL) algorithms by allowing an agent to store and reuse its past experiences in a replay buffer. While many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, thus far they have not considered strategies for refreshing experiences inside the buffer. In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy. LiDER consists of three steps: First, LiDER moves an agent back to a past state. Second, from that state, LiDER then lets the agent execute a sequence of actions by following its current policy -- as if the agent were "dreaming" about the past and can try out different behaviors to encounter new experiences in the dream. Third, LiDER stores and reuses the new experience if it turned out better than what the agent previously experienced, i.e., to refresh its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER; we present in this work a case study of applying LiDER to an actor-critic based algorithm. Results show LiDER consistently improves performance over the baseline in six Atari 2600 games. Our open-source implementation of LiDER and the data used to generate all plots in this work are available at github.com/duyunshu/lucid-dreaming-for-exp-replay.
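Abstract (reference translation): Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL) algorithms by letting an agent store its past experiences in a replay buffer and reuse them. Many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, but so far none has considered strategies for refreshing the experiences inside the buffer. This paper introduces Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that refreshes replay experiences by leveraging the agent's current policy. LiDER consists of three steps. First, LiDER moves the agent back to a past state. Second, from that state, LiDER lets the agent execute a sequence of actions by following its current policy. Third, LiDER stores and reuses the new experience if it turns out better than what the agent previously experienced, thereby refreshing its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER, and this work presents a case study of applying LiDER to an actor-critic based algorithm. The results show that LiDER consistently improves performance over the baseline in six Atari 2600 games. The open-source implementation of LiDER and the data used to generate all plots in this work are available at github.com/duyunshu/lucid-dreaming-for-exp-replay.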
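The three steps described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation (see the repository linked in the abstract for that). The toy `ChainEnv`, the `Experience` record, the `restore` method, and the choice to overwrite the old buffer entry are all hypothetical names and simplifications introduced here, and the sketch assumes an environment that can be reset to an arbitrary stored state.

```python
import random
from dataclasses import dataclass


@dataclass
class Experience:
    state: int    # past environment state (hypothetical toy representation)
    ret: float    # discounted return previously obtained from that state


class ChainEnv:
    """Toy deterministic chain (an assumption for illustration only).

    Positions run from 0 to `goal`; reaching `goal` yields reward 1.0.
    """

    def __init__(self, goal=5, horizon=10):
        self.goal = goal
        self.horizon = horizon
        self.pos = 0

    def restore(self, state):
        # Step 1 requires an environment that can return to a past state.
        self.pos = state

    def step(self, action):
        # action is +1 (right) or -1 (left); position is clipped to [0, goal].
        self.pos = max(0, min(self.goal, self.pos + action))
        done = self.pos == self.goal
        return self.pos, (1.0 if done else 0.0), done


def rollout(env, policy, gamma=0.9):
    """Follow `policy` from the env's current state; return the discounted return."""
    total, discount = 0.0, 1.0
    state = env.pos
    for _ in range(env.horizon):
        state, reward, done = env.step(policy(state))
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def lider_refresh(env, buffer, policy, rng):
    """One refresh attempt; returns True if the sampled experience was refreshed."""
    idx = rng.randrange(len(buffer))
    old = buffer[idx]
    env.restore(old.state)            # Step 1: move the agent back to a past state
    new_ret = rollout(env, policy)    # Step 2: act from there with the current policy
    if new_ret > old.ret:             # Step 3: keep the new experience only if better
        buffer[idx] = Experience(old.state, new_ret)  # overwrite: sketch assumption
        return True
    return False


if __name__ == "__main__":
    env = ChainEnv()
    # Experiences gathered by an older, unsuccessful policy (return 0.0 everywhere).
    buffer = [Experience(state=s, ret=0.0) for s in range(5)]
    rng = random.Random(0)
    current_policy = lambda state: 1  # the "current" policy always moves right
    refreshed = sum(lider_refresh(env, buffer, current_policy, rng) for _ in range(20))
    print(refreshed, buffer)
```

In this toy setting the comparison in Step 3 uses the discounted return, which is one plausible reading of "turned out better"; how the refreshed experience is stored and later sampled is left open by the abstract.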

Related papers

Reliability-Adjusted Prioritized Experience Replay [5.342556166066767]
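We propose an extension of Prioritized Experience Replay (PER) that introduces a new measure of the reliability of temporal-difference errors. Theoretically, the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER.
Paper reference translation (metadata) (2025-06-23T10:35:36Z)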
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic [4.776906435812746]
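Retrospex is an agent framework that analyzes past experiences in depth. It combines the action likelihood of the LLM with action values estimated by a reinforcement learning critic. We evaluate Retrospex in the ScienceWorld, ALFWorld, and Webshop environments.
Paper reference translation (metadata) (2025-05-17T03:28:24Z)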
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [70.25689961697523]
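We propose a generalizable algorithm that enhances sequential reasoning through cross-task experience sharing and selection. Our work bridges a gap in existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences.
Paper reference translation (metadata) (2024-10-22T03:59:53Z)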
OER: Offline Experience Replay for Continual Offline Reinforcement Learning [25.985985377992034]
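It is desirable for an agent to continually learn new skills through a sequence of pre-collected offline datasets. We formulate a new setting, continual offline reinforcement learning (CORL), in which an agent learns a sequence of offline reinforcement learning tasks. We propose a model-based experience selection method for building the replay buffer.
Paper reference translation (metadata) (2023-05-23T08:16:44Z)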
Eventual Discounting Temporal Logic Counterfactual Experience Replay [42.20459462725206]
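The standard RL framework can be too myopic to find maximally satisfying policies. First, we develop a new value-function-based proxy using a technique we call eventual discounting. Second, we develop a new experience replay method for generating off-policy data.
Paper reference translation (metadata) (2023-03-03T18:29:47Z)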
Look Back When Surprised: Stabilizing Reverse Experience Replay for Neural Approximation [7.6146285961466]
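We consider the recently developed and theoretically rigorous Reverse Experience Replay (RER). Through experiments, we show that it outperforms techniques such as Prioritized Experience Replay (PER) on a variety of tasks.
Paper reference translation (metadata) (2022-06-07T10:42:02Z)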
Retrieval-Augmented Reinforcement Learning [63.32076191982944]
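We train a network to map a dataset of past experiences to optimal behavior. The retrieval process is trained to retrieve information from the dataset that is useful in the current context. We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
Paper reference translation (metadata) (2022-02-17T02:44:05Z)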
Replay For Safety [51.11953997546418]
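In experience replay, past transitions are stored in a memory buffer and reused during learning. We show that a safe policy can be achieved by using an appropriate biased sampling scheme.
Paper reference translation (metadata) (2021-12-08T11:10:57Z)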
Revisiting Fundamentals of Experience Replay [91.24213515992595]
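We present a systematic and extensive analysis of experience replay in Q-learning methods. We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
Paper reference translation (metadata) (2020-07-13T21:22:17Z)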
Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
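We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy. We apply the proposed method empirically to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3).
Paper reference translation (metadata) (2020-06-23T17:17:44Z)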
Bootstrapping a DQN Replay Memory with Synthetic Experiences [0.0]
We propose an algorithm that generates synthetic experiences in nondeterministic discrete environments to assist the learner. Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it helps the agent learn faster and better than the classic version.
Paper reference translation (metadata) (2020-02-04T15:36:36Z)
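Several entries in the list above use Prioritized Experience Replay (PER) as a reference point, and the main abstract contrasts LiDER with methods that bias how experiences are sampled. As a rough illustration of what such biased sampling looks like, the sketch below draws buffer indices with probability proportional to (|TD error| + eps) ** alpha, the proportional variant of PER. The function names, the default values of `alpha` and `eps`, and the example TD errors are assumptions for illustration, and importance-sampling corrections are omitted.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, k, rng, alpha=0.6):
    """Draw k buffer indices (with replacement) according to the PER probabilities."""
    probs = per_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=k)


if __name__ == "__main__":
    td_errors = [0.1, 1.0, 2.0]            # made-up TD errors for three transitions
    print(per_probabilities(td_errors))    # larger error gives larger probability
    print(sample_indices(td_errors, k=5, rng=random.Random(0)))
```

Setting `alpha=0` recovers uniform sampling, which is the usual way to interpolate between plain experience replay and fully greedy prioritization.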

The related papers list is generated automatically from the titles and abstracts of papers on this site.

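This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).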