Fugu-MT Paper Translation (Abstract): Rainbow-DemoRL: Combining Improvements in Demonstration-Augmented Reinforcement Learning

Paper overview: Rainbow-DemoRL: Combining Improvements in Demonstration-Augmented Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.27400v1
Date: Sat, 28 Mar 2026 20:34:20 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.938263
Title: Rainbow-DemoRL: Combining Improvements in Demonstration-Augmented Reinforcement Learning
Title (reference translation): Rainbow-DemoRL: Improvements in Demonstration-Augmented Reinforcement Learning
Authors: Dwait Bhatt, Shih-Chieh Chou, Nikolay Atanasov
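Abstract summary: Several approaches have been proposed to improve the sample efficiency of online reinforcement learning (RL) by leveraging demonstrations collected offline. The authors classify existing demonstration-augmented RL approaches into three categories and conduct an empirical study of their strengths, weaknesses, and combinations. The analysis shows that directly reusing offline data and initializing with behavior cloning outperform more complex offline RL pretraining methods for improving online sample efficiency.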
Reference score (independently computed attention score): 5.7784578751617275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Several approaches have been proposed to improve the sample efficiency of online reinforcement learning (RL) by leveraging demonstrations collected offline. The offline data can be used directly as transitions to optimize RL objectives, or offline policy and value functions can first be learned from the data and then used for online finetuning or to provide reference actions. While each of these strategies has shown compelling results, it is unclear which method has the most impact on sample efficiency, whether these approaches can be combined, and if there are cumulative benefits. We classify existing demonstration-augmented RL approaches into three categories and perform an extensive empirical study of their strengths, weaknesses, and combinations to isolate the contribution of each strategy and determine effective hybrid combinations for sample-efficient online RL. Our analysis reveals that directly reusing offline data and initializing with behavior cloning consistently outperform more complex offline RL pretraining methods for improving online sample efficiency.
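Abstract (reference translation): Several approaches have been proposed to improve the sample efficiency of online reinforcement learning (RL) by making use of demonstrations collected offline. The offline data can be used directly as transitions for optimizing RL objectives, or an offline policy and value function can first be learned from the data and then used for online fine-tuning or for providing reference actions. Each of these strategies has shown convincing results, but it is unclear which method affects sample efficiency the most, whether the methods can be combined, and whether their benefits are cumulative. We classify existing demonstration-augmented RL approaches into three categories and carry out an extensive empirical study of their strengths, weaknesses, and combinations, isolating the contribution of each strategy and determining effective hybrid combinations for sample-efficient online RL. The analysis shows that directly reusing offline data and initializing with behavior cloning consistently outperform more complex offline RL pretraining methods for improving online sample efficiency.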

Related papers