Fugu-MT paper translation (abstract): Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials

Paper abstract: Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials

arxiv url: http://arxiv.org/abs/2210.05178v3
Date: Sat, 23 Sep 2023 23:25:32 GMT
Status: Translation complete
Last updated in system: 2023-09-27 04:42:39.865490
Title: Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials
Title (reference translation): Pre-training for robots: offline RL enables learning new tasks from a handful of trials
Authors: Aviral Kumar, Anikait Singh, Frederik Ebert, Mitsuhiko Nakamoto, Yanlai Yang, Chelsea Finn, Sergey Levine
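Abstract summary: We present a framework based on offline RL that aims to learn new tasks effectively by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, using as few as 10 demonstrations. To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot.
Reference score (independently computed attention score): 97.95400776235736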
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Progress in deep learning highlights the tremendous potential of utilizing diverse robotic datasets for attaining effective generalization and makes it enticing to consider leveraging broad datasets for attaining robust generalization in robotic learning as well. However, in practice, we often want to learn a new skill in a new environment that is unlikely to be contained in the prior data. Therefore we ask: how can we leverage existing diverse offline datasets in combination with small amounts of task-specific data to solve new tasks, while still enjoying the generalization benefits of training on large amounts of data? In this paper, we demonstrate that end-to-end offline RL can be an effective approach for doing this, without the need for any representation learning or vision-based pre-training. We present pre-training for robots (PTR), a framework based on offline RL that attempts to effectively learn new tasks by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations. PTR utilizes an existing offline RL method, conservative Q-learning (CQL), but extends it to include several crucial design decisions that enable PTR to actually work and outperform a variety of prior methods. To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging an existing dataset of diverse multi-task robot data collected in a variety of toy kitchens. We also demonstrate that PTR can enable effective autonomous fine-tuning and improvement in a handful of trials, without needing any demonstrations. An accompanying overview video can be found in the supplementary material and at this URL: https://sites.google.com/view/ptr-final/
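Abstract (reference translation): Advances in deep learning highlight the remarkable potential of using diverse robotic datasets to achieve effective generalization. In practice, however, we often want to learn a new skill in a new environment that is not covered by the prior data. We therefore ask how existing diverse offline datasets can be combined with a small amount of task-specific data to solve new tasks while retaining the generalization benefits of training on large amounts of data. In this paper, we show that end-to-end offline RL can do this effectively, without requiring representation learning or vision-based pre-training. We propose PTR (pre-training for robots), a framework based on offline RL that learns new tasks effectively by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task. PTR uses an existing offline RL method, conservative Q-learning (CQL), but extends it with several crucial design decisions that allow PTR to actually work and to outperform a variety of prior methods. To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging a dataset of diverse multi-task robot data collected in a variety of toy kitchens. We also demonstrate that PTR enables effective autonomous fine-tuning and improvement in a handful of trials, without requiring any demonstrations. An accompanying overview video can be found in the supplementary material and at this URL: https://sites.google.com/view/ptr-final/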
