Language-Conditioned Imitation Learning for Robot Manipulation Tasks
- URL: http://arxiv.org/abs/2010.12083v1
- Date: Thu, 22 Oct 2020 21:49:08 GMT
- Title: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
- Authors: Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Stefan Lee,
Chitta Baral, Heni Ben Amor
- Abstract summary: We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
- Score: 39.40937105264774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is a popular approach for teaching motor skills to robots.
However, most approaches focus on extracting policy parameters from execution
traces alone (i.e., motion trajectories and perceptual data). No adequate
communication channel exists between the human expert and the robot to describe
critical aspects of the task, such as the properties of the target object or
the intended shape of the motion. Motivated by insights into the human teaching
process, we introduce a method for incorporating unstructured natural language
into imitation learning. At training time, the expert can provide
demonstrations along with verbal descriptions in order to describe the
underlying intent (e.g., "go to the large green bowl"). The training process
then interrelates these two modalities to encode the correlations between
language, perception, and motion. The resulting language-conditioned visuomotor
policies can be conditioned at runtime on new human commands and instructions,
which allows for more fine-grained control over the trained policies while also
reducing situational ambiguity. We demonstrate in a set of simulation
experiments how our approach can learn language-conditioned manipulation
policies for a seven-degree-of-freedom robot arm and compare the results to a
variety of alternative methods.
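A minimal sketch of a language-conditioned visuomotor policy in this spirit is given below: a recurrent instruction encoder, a small image encoder, and a motor head that maps the fused features together with the current joint state to 7-DoF motor commands, trained by behaviour cloning on paired demonstrations and verbal descriptions. All module choices, names, and dimensions are illustrative assumptions and do not reproduce the paper's architecture.
```python
# Minimal sketch of a language-conditioned visuomotor policy (illustrative only;
# not the architecture from the paper). Inputs: token ids for the instruction,
# an RGB image observation, and the robot's current joint state.
import torch
import torch.nn as nn


class LanguageConditionedPolicy(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, joint_dim=7):
        super().__init__()
        # Language encoder: embed tokens and summarize with a GRU.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Perception encoder: small CNN over the RGB observation.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Motor head: fuse language, vision, and proprioception into joint commands.
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + 32 + joint_dim, 128), nn.ReLU(),
            nn.Linear(128, joint_dim),
        )

    def forward(self, tokens, image, joint_state):
        _, h = self.gru(self.embed(tokens))   # final hidden state: (1, B, hidden_dim)
        lang = h[-1]                          # (B, hidden_dim)
        vis = self.cnn(image)                 # (B, 32)
        return self.head(torch.cat([lang, vis, joint_state], dim=-1))


# One behaviour-cloning step on a toy batch of (instruction, observation, action) pairs.
policy = LanguageConditionedPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (8, 12))      # tokenized instructions
image = torch.randn(8, 3, 64, 64)             # RGB observations
joint_state = torch.randn(8, 7)               # current joint configuration
expert_action = torch.randn(8, 7)             # demonstrated joint commands

optimizer.zero_grad()
loss = nn.functional.mse_loss(policy(tokens, image, joint_state), expert_action)
loss.backward()
optimizer.step()
```
At runtime the same forward pass is simply conditioned on a new instruction, which is what lets the trained policy be steered by unseen commands.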
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Contrastive Language, Action, and State Pre-training for Robot Learning [1.1000499414131326]
We introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning.
Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment.
We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning (a minimal contrastive-alignment sketch appears after this list).
arXiv Detail & Related papers (2023-04-21T07:19:33Z)
- Language-Driven Representation Learning for Robotics [115.93273609767145]
Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks.
We introduce a framework for language-driven representation learning from human videos and captions.
We find that Voltron's language-driven learning outperforms the prior state of the art, especially on targeted problems requiring higher-level control.
arXiv Detail & Related papers (2023-02-24T17:29:31Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Coarse-to-Fine Imitation Learning: Robot Manipulation from a Single Demonstration [8.57914821832517]
We introduce a simple new method for visual imitation learning, which allows a novel robot manipulation task to be learned from a single human demonstration.
Our method models imitation learning as a state estimation problem, with the state defined as the end-effector's pose.
At test time, the end-effector moves to the estimated state along a linear path, at which point the original demonstration's end-effector velocities are simply replayed (a toy rollout sketch appears after this list).
arXiv Detail & Related papers (2021-05-13T16:36:55Z)
- Language Conditioned Imitation Learning over Unstructured Data [9.69886122332044]
We present a method for incorporating free-form natural language conditioning into imitation learning.
Our approach learns perception from pixels, natural language understanding, and multitask continuous control end-to-end as a single neural network.
We show this dramatically improves language conditioned performance, while reducing the cost of language annotation to less than 1% of total data.
arXiv Detail & Related papers (2020-05-15T17:08:50Z)
- Metric-Based Imitation Learning Between Two Dissimilar Anthropomorphic Robotic Arms [29.08134072341867]
One major challenge in imitation learning is the correspondence problem.
We introduce a distance measure between dissimilar embodiments.
We find that the measure is well suited for describing the similarity between embodiments and for learning imitation policies by minimizing this distance.
arXiv Detail & Related papers (2020-02-25T19:47:19Z)
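For the CLASP entry above, the shared behaviour-text embedding space rests on a CLIP-style contrastive objective. The sketch below shows only a plain symmetric InfoNCE alignment between behaviour and caption embeddings; it does not include CLASP's distributional extension, and the encoders and feature sizes are placeholder assumptions.
```python
# Plain CLIP-style contrastive alignment between behaviour and text embeddings
# (symmetric InfoNCE; CLASP's distributional extension is not reproduced here).
# Encoders and dimensions are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

behaviour_encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
text_encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 32))

def contrastive_loss(behaviour_feats, text_feats, temperature=0.07):
    # L2-normalize both modalities, then score every behaviour against every caption.
    b = F.normalize(behaviour_encoder(behaviour_feats), dim=-1)
    t = F.normalize(text_encoder(text_feats), dim=-1)
    logits = b @ t.T / temperature
    targets = torch.arange(len(b))   # matching pairs lie on the diagonal
    # Symmetric cross-entropy: behaviour-to-text and text-to-behaviour.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# Toy batch: 16 (behaviour summary, caption embedding) pairs.
loss = contrastive_loss(torch.randn(16, 64), torch.randn(16, 300))
loss.backward()
```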
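For the coarse-to-fine entry above, the test-time procedure is concrete enough to sketch: drive the end-effector along a straight line to an estimated target pose, then replay the demonstration's end-effector velocities open-loop. The toy rollout below illustrates this with positions only; the pose estimate, velocities, and time step are placeholder values, not the paper's setup.
```python
# Toy sketch of the coarse-to-fine test-time procedure: a straight-line approach
# to an estimated target pose, followed by open-loop replay of demonstrated
# end-effector velocities. Values below are placeholders for illustration.
import numpy as np

def coarse_to_fine_rollout(current_pose, estimated_pose, demo_velocities, steps=50, dt=0.05):
    """Return the sequence of end-effector positions visited during execution."""
    trajectory = [current_pose]
    # Coarse stage: straight-line path from the current pose to the estimated pose.
    for i in range(1, steps + 1):
        trajectory.append(current_pose + (estimated_pose - current_pose) * i / steps)
    # Fine stage: integrate the demonstrated velocities from the reached pose.
    pose = trajectory[-1]
    for v in demo_velocities:
        pose = pose + v * dt
        trajectory.append(pose)
    return np.stack(trajectory)

# Toy example: 3-D end-effector position only (orientation omitted for brevity).
poses = coarse_to_fine_rollout(
    current_pose=np.zeros(3),
    estimated_pose=np.array([0.4, 0.1, 0.2]),              # e.g. output of a pose estimator
    demo_velocities=np.tile([0.0, 0.0, -0.05], (20, 1)),   # recorded downward motion
)
print(poses.shape)  # (71, 3): 1 start + 50 coarse + 20 fine waypoints
```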
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.