RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
- URL: http://arxiv.org/abs/2409.14674v1
- Date: Mon, 23 Sep 2024 02:50:33 GMT
- Title: RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
- Authors: Yinpei Dai, Jayjun Lee, Nima Fazeli, Joyce Chai,
- Abstract summary: We propose a scalable data generation pipeline that augments expert demonstrations with failure recovery trajectories.
We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework.
Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer on RLbench.
- Score: 19.023560632891467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework, which combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy as an actor to predict the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLbench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks and zero-shot unseen tasks, achieving superior performance in both simulated and real world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io.
Related papers
- Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts [21.249837293326497]
Generalizable reward function is central to reinforcement learning and planning for robots.
This paper transfers video-language models with robust generalization into a language-conditioned reward function.
Our model shows outstanding generalization to new environments and new instructions for robot planning and reinforcement learning.
arXiv Detail & Related papers (2024-07-20T13:22:59Z) - LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
Vision Language Models (VLMs) can process state information as visual-textual prompts and respond with policy decisions in text.
We propose LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as conversations.
arXiv Detail & Related papers (2024-06-28T17:59:12Z) - Yell At Your Robot: Improving On-the-Fly from Language Corrections [84.09578841663195]
We show that high-level policies can be readily supervised with human feedback in the form of language corrections.
This framework enables robots not only to rapidly adapt to real-time language feedback, but also incorporate this feedback into an iterative training scheme.
arXiv Detail & Related papers (2024-03-19T17:08:24Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - REFLECT: Summarizing Robot Experiences for Failure Explanation and
Correction [28.015693808520496]
REFLECT is a framework which queries Large Language Models for failure reasoning based on a hierarchical summary of robot past experiences.
We show that REFLECT is able to generate informative failure explanations that assist successful correction planning.
arXiv Detail & Related papers (2023-06-27T18:03:15Z) - LARG, Language-based Automatic Reward and Goal Generation [8.404316955848602]
We develop an approach that converts a text-based task description into its corresponding reward and goal-generation functions.
We evaluate our approach for robotic manipulation and demonstrate its ability to train and execute policies in a scalable manner.
arXiv Detail & Related papers (2023-06-19T14:52:39Z) - Language-Driven Representation Learning for Robotics [115.93273609767145]
Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks.
We introduce a framework for language-driven representation learning from human videos and captions.
We find that Voltron's language-driven learning outperform the prior-of-the-art, especially on targeted problems requiring higher-level control.
arXiv Detail & Related papers (2023-02-24T17:29:31Z) - Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes into account multiple inputs.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
arXiv Detail & Related papers (2022-09-11T16:28:25Z) - What Matters in Language Conditioned Robotic Imitation Learning [26.92329260907805]
We study the most critical challenges in learning language conditioned policies from offline free-form imitation datasets.
We present a novel approach that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark.
arXiv Detail & Related papers (2022-04-13T08:45:32Z) - Learning Language-Conditioned Robot Behavior from Offline Data and
Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.