RoboCLIP: One Demonstration is Enough to Learn Robot Policies
- URL: http://arxiv.org/abs/2310.07899v1
- Date: Wed, 11 Oct 2023 21:10:21 GMT
- Title: RoboCLIP: One Demonstration is Enough to Learn Robot Policies
- Authors: Sumedh A Sontakke, Jesse Zhang, Sébastien M. R. Arnold, Karl
Pertsch, Erdem Bıyık, Dorsa Sadigh, Chelsea Finn, Laurent Itti
- Abstract summary: RoboCLIP is an online imitation learning method that uses a single demonstration in the form of a video demonstration or a textual description of the task to generate rewards.
RoboCLIP can also utilize out-of-domain demonstrations, like videos of humans solving the task for reward generation, circumventing the need to have the same demonstration and deployment domains.
- Score: 72.24495908759967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward specification is a notoriously difficult problem in reinforcement
learning, requiring extensive expert supervision to design robust reward
functions. Imitation learning (IL) methods attempt to circumvent these problems
by utilizing expert demonstrations but typically require a large number of
in-domain expert demonstrations. Inspired by advances in the field of
Video-and-Language Models (VLMs), we present RoboCLIP, an online imitation
learning method that uses a single demonstration (overcoming the large data
requirement) in the form of a video demonstration or a textual description of
the task to generate rewards without manual reward function design.
Additionally, RoboCLIP can also utilize out-of-domain demonstrations, like
videos of humans solving the task for reward generation, circumventing the need
to have the same demonstration and deployment domains. RoboCLIP utilizes
pretrained VLMs without any finetuning for reward generation. Reinforcement
learning agents trained with RoboCLIP rewards demonstrate 2-3 times higher
zero-shot performance than competing imitation learning methods on downstream
robot manipulation tasks, doing so using only one video/text demonstration.
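The core idea in the abstract — scoring an agent's rollout by its similarity to a single video or text demonstration in a pretrained VLM's embedding space, with no finetuning — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed_video` is a hypothetical stand-in for a pretrained video-language encoder, and the frame features are random placeholders.

```python
import math
import random

def embed_video(frames):
    """Hypothetical stand-in for a pretrained VLM video encoder:
    mean-pool per-frame feature vectors into one embedding."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def similarity_reward(episode_frames, demo_embedding):
    """Sparse reward, assigned once at episode end: similarity between
    the agent's rollout video and the single demonstration embedding.
    The demonstration embedding could equally come from a text
    description encoded into the same VLM space."""
    return cosine(embed_video(episode_frames), demo_embedding)

# Placeholder features standing in for VLM frame encodings.
random.seed(0)
demo_frames = [[random.gauss(0, 1) for _ in range(64)] for _ in range(16)]
episode_frames = [[random.gauss(0, 1) for _ in range(64)] for _ in range(32)]
reward = similarity_reward(episode_frames, embed_video(demo_frames))
assert -1.0 <= reward <= 1.0
```

Because the demonstration and the rollout only meet in the shared embedding space, the same scalar reward works whether the demonstration is an in-domain robot video, an out-of-domain human video, or a textual task description.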
Related papers
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z) - Augmented Reality Demonstrations for Scalable Robot Imitation Learning [25.026589453708347]
This paper presents an innovative solution: an Augmented Reality (AR)-assisted framework for demonstration collection.
We empower non-roboticist users to produce demonstrations for robot IL using devices like the HoloLens 2.
We validate our approach with experiments on three classical robotics tasks: reach, push, and pick-and-place.
arXiv Detail & Related papers (2024-03-20T18:30:12Z) - Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training [69.54948297520612]
Learning a generalist embodied agent poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets.
We introduce a novel framework to tackle these challenges, which leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos.
Our method generates high-fidelity future videos for planning and enhances the fine-tuned policies compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-02-22T09:48:47Z) - SWBT: Similarity Weighted Behavior Transformer with the Imperfect
Demonstration for Robotic Manipulation [32.78083518963342]
We propose a novel framework named Similarity Weighted Behavior Transformer (SWBT).
SWBT learns effectively from both expert and imperfect demonstrations without interacting with the environment.
We are the first to attempt to integrate imperfect demonstrations into the offline imitation learning setting for robot manipulation tasks.
arXiv Detail & Related papers (2024-01-17T04:15:56Z) - Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned Policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We learn our policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z) - Learning Complicated Manipulation Skills via Deterministic Policy with
Limited Demonstrations [9.640594614636049]
Deep reinforcement learning can efficiently develop policies for manipulators.
It takes time to collect sufficient high-quality demonstrations in practice.
Human demonstrations may be unsuitable for robots.
arXiv Detail & Related papers (2023-03-29T05:56:44Z) - Learning Agile Skills via Adversarial Imitation of Rough Partial
Demonstrations [19.257876507104868]
Learning agile skills is one of the main challenges in robotics.
We propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations.
We show that by using a Wasserstein GAN formulation and transitions from demonstrations with rough and partial information as input, we are able to extract policies that are robust and capable of imitating demonstrated behaviors.
arXiv Detail & Related papers (2022-06-23T13:34:11Z) - Bottom-Up Skill Discovery from Unsegmented Demonstrations for
Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z) - SQUIRL: Robust and Efficient Learning from Video Demonstration of
Long-Horizon Robotic Manipulation Tasks [8.756012472587601]
Deep reinforcement learning (RL) can be used to learn complex manipulation tasks.
RL requires the robot to collect a large amount of real-world experience.
SQUIRL performs a new but related long-horizon task robustly given only a single video demonstration.
arXiv Detail & Related papers (2020-03-10T20:26:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.