Language-Conditioned Imitation Learning for Robot Manipulation Tasks
- URL: http://arxiv.org/abs/2010.12083v1
- Date: Thu, 22 Oct 2020 21:49:08 GMT
- Title: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
- Authors: Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Stefan Lee,
Chitta Baral, Heni Ben Amor
- Abstract summary: We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
- Score: 39.40937105264774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is a popular approach for teaching motor skills to robots.
However, most approaches focus on extracting policy parameters from execution
traces alone (i.e., motion trajectories and perceptual data). No adequate
communication channel exists between the human expert and the robot to describe
critical aspects of the task, such as the properties of the target object or
the intended shape of the motion. Motivated by insights into the human teaching
process, we introduce a method for incorporating unstructured natural language
into imitation learning. At training time, the expert can provide
demonstrations along with verbal descriptions in order to describe the
underlying intent (e.g., "go to the large green bowl"). The training process
then interrelates these two modalities to encode the correlations between
language, perception, and motion. The resulting language-conditioned visuomotor
policies can be conditioned at runtime on new human commands and instructions,
which allows for more fine-grained control over the trained policies while also
reducing situational ambiguity. We demonstrate in a set of simulation
experiments how our approach can learn language-conditioned manipulation
policies for a seven-degree-of-freedom robot arm and compare the results to a
variety of alternative methods.
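A minimal sketch of a language-conditioned visuomotor policy in this spirit is given below: a recurrent instruction encoder, a small image encoder, and a motor head that maps the fused features together with the current joint state to 7-DoF motor commands, trained by behaviour cloning on paired demonstrations and verbal descriptions. All module choices, names, and dimensions are illustrative assumptions and do not reproduce the paper's architecture.
```python
# Minimal sketch of a language-conditioned visuomotor policy (illustrative only;
# not the architecture from the paper). Inputs: token ids for the instruction,
# an RGB image observation, and the robot's current joint state.
import torch
import torch.nn as nn


class LanguageConditionedPolicy(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, joint_dim=7):
        super().__init__()
        # Language encoder: embed tokens and summarize with a GRU.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Perception encoder: small CNN over the RGB observation.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Motor head: fuse language, vision, and proprioception into joint commands.
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + 32 + joint_dim, 128), nn.ReLU(),
            nn.Linear(128, joint_dim),
        )

    def forward(self, tokens, image, joint_state):
        _, h = self.gru(self.embed(tokens))   # final hidden state: (1, B, hidden_dim)
        lang = h[-1]                          # (B, hidden_dim)
        vis = self.cnn(image)                 # (B, 32)
        return self.head(torch.cat([lang, vis, joint_state], dim=-1))


# One behaviour-cloning step on a toy batch of (instruction, observation, action) pairs.
policy = LanguageConditionedPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (8, 12))      # tokenized instructions
image = torch.randn(8, 3, 64, 64)             # RGB observations
joint_state = torch.randn(8, 7)               # current joint configuration
expert_action = torch.randn(8, 7)             # demonstrated joint commands

optimizer.zero_grad()
loss = nn.functional.mse_loss(policy(tokens, image, joint_state), expert_action)
loss.backward()
optimizer.step()
```
At runtime the same forward pass is simply conditioned on a new instruction, which is what lets the trained policy be steered by unseen commands.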
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Contrastive Language, Action, and State Pre-training for Robot Learning [1.1000499414131326]
We introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning.
Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment.
We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning (a minimal contrastive-alignment sketch appears after this list).
arXiv Detail & Related papers (2023-04-21T07:19:33Z)
- Language-Driven Representation Learning for Robotics [115.93273609767145]
Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks.
We introduce a framework for language-driven representation learning from human videos and captions.
We find that Voltron's language-driven learning outperforms the prior state of the art, especially on targeted problems requiring higher-level control.
arXiv Detail & Related papers (2023-02-24T17:29:31Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Coarse-to-Fine Imitation Learning: Robot Manipulation from a Single Demonstration [8.57914821832517]
We introduce a simple new method for visual imitation learning, which allows a novel robot manipulation task to be learned from a single human demonstration.
Our method models imitation learning as a state estimation problem, with the state defined as the end-effector's pose.
At test time, the end-effector moves to the estimated state along a linear path, at which point the original demonstration's end-effector velocities are simply replayed (a toy rollout sketch appears after this list).
arXiv Detail & Related papers (2021-05-13T16:36:55Z)
- Language Conditioned Imitation Learning over Unstructured Data [9.69886122332044]
We present a method for incorporating free-form natural language conditioning into imitation learning.
Our approach learns perception from pixels, natural language understanding, and multitask continuous control end-to-end as a single neural network.
We show this dramatically improves language conditioned performance, while reducing the cost of language annotation to less than 1% of total data.
arXiv Detail & Related papers (2020-05-15T17:08:50Z)
- Metric-Based Imitation Learning Between Two Dissimilar Anthropomorphic Robotic Arms [29.08134072341867]
One major challenge in imitation learning is the correspondence problem.
We introduce a distance measure between dissimilar embodiments.
We find that the measure is well suited for describing the similarity between embodiments and for learning imitation policies by minimizing this distance.
arXiv Detail & Related papers (2020-02-25T19:47:19Z)
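For the CLASP entry above, the shared behaviour-text embedding space rests on a CLIP-style contrastive objective. The sketch below shows only a plain symmetric InfoNCE alignment between behaviour and caption embeddings; it does not include CLASP's distributional extension, and the encoders and feature sizes are placeholder assumptions.
```python
# Plain CLIP-style contrastive alignment between behaviour and text embeddings
# (symmetric InfoNCE; CLASP's distributional extension is not reproduced here).
# Encoders and dimensions are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

behaviour_encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
text_encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 32))

def contrastive_loss(behaviour_feats, text_feats, temperature=0.07):
    # L2-normalize both modalities, then score every behaviour against every caption.
    b = F.normalize(behaviour_encoder(behaviour_feats), dim=-1)
    t = F.normalize(text_encoder(text_feats), dim=-1)
    logits = b @ t.T / temperature
    targets = torch.arange(len(b))   # matching pairs lie on the diagonal
    # Symmetric cross-entropy: behaviour-to-text and text-to-behaviour.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# Toy batch: 16 (behaviour summary, caption embedding) pairs.
loss = contrastive_loss(torch.randn(16, 64), torch.randn(16, 300))
loss.backward()
```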
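For the coarse-to-fine entry above, the test-time procedure is concrete enough to sketch: drive the end-effector along a straight line to an estimated target pose, then replay the demonstration's end-effector velocities open-loop. The toy rollout below illustrates this with positions only; the pose estimate, velocities, and time step are placeholder values, not the paper's setup.
```python
# Toy sketch of the coarse-to-fine test-time procedure: a straight-line approach
# to an estimated target pose, followed by open-loop replay of demonstrated
# end-effector velocities. Values below are placeholders for illustration.
import numpy as np

def coarse_to_fine_rollout(current_pose, estimated_pose, demo_velocities, steps=50, dt=0.05):
    """Return the sequence of end-effector positions visited during execution."""
    trajectory = [current_pose]
    # Coarse stage: straight-line path from the current pose to the estimated pose.
    for i in range(1, steps + 1):
        trajectory.append(current_pose + (estimated_pose - current_pose) * i / steps)
    # Fine stage: integrate the demonstrated velocities from the reached pose.
    pose = trajectory[-1]
    for v in demo_velocities:
        pose = pose + v * dt
        trajectory.append(pose)
    return np.stack(trajectory)

# Toy example: 3-D end-effector position only (orientation omitted for brevity).
poses = coarse_to_fine_rollout(
    current_pose=np.zeros(3),
    estimated_pose=np.array([0.4, 0.1, 0.2]),              # e.g. output of a pose estimator
    demo_velocities=np.tile([0.0, 0.0, -0.05], (20, 1)),   # recorded downward motion
)
print(poses.shape)  # (71, 3): 1 start + 50 coarse + 20 fine waypoints
```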
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.