Fugu-MT Paper Translation (Abstract): Single-Reset Divide & Conquer Imitation Learning

Paper summary: Single-Reset Divide & Conquer Imitation Learning

arxiv url: http://arxiv.org/abs/2402.09355v1
Date: Wed, 14 Feb 2024 17:59:47 GMT
Status: Translation complete
Last updated in system: 2024-02-15 14:06:58.593517
Title: Single-Reset Divide & Conquer Imitation Learning
Title (reference translation): Single-Reset Divide and Conquer Imitation Learning
Authors: Alexandre Chenu, Olivier Serris, Olivier Sigaud, Nicolas Perrin-Gilbert
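Abstract summary: Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms. Several algorithms have been developed to learn from a single demonstration.
Reference score (independently computed attention level): 49.87201678501027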
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms. To cope with the difficulty of accessing multiple demonstrations, some algorithms have been developed to learn from a single demonstration. In particular, the Divide & Conquer Imitation Learning algorithms leverage a sequential bias to learn a control policy for complex robotic tasks using a single state-based demonstration. The latest version, DCIL-II, demonstrates remarkable sample efficiency. This novel method operates within an extended Goal-Conditioned Reinforcement Learning framework, ensuring compatibility between intermediate and subsequent goals extracted from the demonstration. However, a fundamental limitation arises from the assumption that the system can be reset to specific states along the demonstrated trajectory, confining the application to simulated systems. In response, we introduce an extension called Single-Reset DCIL (SR-DCIL), designed to overcome this constraint by relying on a single initial state reset rather than sequential resets. To address this more challenging setting, we integrate two mechanisms inspired by the Learning from Demonstrations literature, including a Demo-Buffer and Value Cloning, to guide the agent toward compatible success states. In addition, we introduce Approximate Goal Switching to facilitate training to reach goals distant from the reset state. Our paper makes several contributions, highlighting the importance of the reset assumption in DCIL-II, presenting the mechanisms of SR-DCIL variants and evaluating their performance in challenging robotic tasks compared to DCIL-II. In summary, this work offers insights into the significance of reset assumptions in the framework of DCIL and proposes SR-DCIL, a first step toward a versatile algorithm capable of learning control policies under a weaker reset assumption.
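Abstract (reference translation): Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms. To cope with the difficulty of accessing multiple demonstrations, several algorithms have been developed that learn from a single demonstration. In particular, the Divide & Conquer Imitation Learning algorithms exploit a sequential bias to learn control policies for complex robotic tasks from a single state-based demonstration. The latest version, DCIL-II, shows remarkable sample efficiency. This new method operates within an extended Goal-Conditioned Reinforcement Learning framework and ensures compatibility between the intermediate goals and the subsequent goals extracted from the demonstration. However, a fundamental limitation arises from the assumption that the system can be reset to specific states along the demonstrated trajectory, which restricts its application to simulated systems. In response, we introduce an extension called SR-DCIL, which overcomes this constraint by relying on a single initial-state reset rather than sequential resets. To address this more difficult setting, we integrate two mechanisms inspired by the Learning from Demonstrations literature, a Demo-Buffer and Value Cloning, to guide the agent toward compatible success states. In addition, we introduce Approximate Goal Switching to facilitate training to reach goals far from the reset state. This paper highlights the importance of the reset assumption in DCIL-II, presents the mechanisms of the SR-DCIL variants, and evaluates their performance on challenging robotic tasks in comparison with DCIL-II. In summary, this work offers insights into the significance of reset assumptions in the DCIL framework and proposes SR-DCIL, a first step toward a versatile algorithm capable of learning control policies under a weaker reset assumption.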
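
The abstract's core idea, splitting one demonstration into a sequence of goals and pursuing them in order from a single reset state, can be illustrated with a toy sketch. This is not the authors' implementation and contains no learning: the function names (`extract_goals`, `rollout_single_reset`), the distance-based goal spacing, the tolerance, and the straight-line controller standing in for a learned goal-conditioned policy are all assumptions made for illustration. The Demo-Buffer, Value Cloning, and Approximate Goal Switching mechanisms are not modeled here.

```python
import math


def extract_goals(demo, spacing):
    """Pick intermediate goals along a demonstrated trajectory.

    A goal is emitted each time the distance travelled along the
    demonstration since the last goal reaches `spacing` (an assumed
    heuristic). The final demonstrated state is always the last goal.
    """
    goals = []
    travelled = 0.0
    for prev, cur in zip(demo, demo[1:]):
        travelled += math.dist(prev, cur)
        if travelled >= spacing:
            goals.append(cur)
            travelled = 0.0
    if not goals or goals[-1] != demo[-1]:
        goals.append(demo[-1])
    return goals


def rollout_single_reset(start, goals, step_size=0.1, tol=0.05, max_steps=1000):
    """Run one episode from a single reset state.

    The agent targets goals in order and switches to the next goal as
    soon as it is within `tol` of the current one. Returns the final
    state and the number of goals reached.
    """
    state = start
    goal_idx = 0
    for _ in range(max_steps):
        goal = goals[goal_idx]
        dist = math.dist(state, goal)
        if dist <= tol:
            # Current goal reached: switch to the next one, no reset.
            goal_idx += 1
            if goal_idx == len(goals):
                break
            continue
        # Hypothetical stand-in for a learned goal-conditioned policy:
        # move straight toward the current goal.
        step = min(step_size, dist)
        state = tuple(s + step * (g - s) / dist for s, g in zip(state, goal))
    return state, goal_idx


# Usage: an L-shaped demonstration in the plane.
demo = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]
goals = extract_goals(demo, spacing=1.0)
final_state, reached = rollout_single_reset(demo[0], goals)
print(goals, final_state, reached)
```

In this sketch the episode never returns to an intermediate demonstrated state; every goal after the first must be approached from wherever the previous goal was reached, which is the setting the abstract describes as more challenging than sequential resets.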

Related papers