Using Both Demonstrations and Language Instructions to Efficiently Learn
Robotic Tasks
- URL: http://arxiv.org/abs/2210.04476v2
- Date: Fri, 28 Apr 2023 09:38:07 GMT
- Title: Using Both Demonstrations and Language Instructions to Efficiently Learn
Robotic Tasks
- Authors: Albert Yu, Raymond J. Mooney
- Abstract summary: DeL-TaCo is a method for conditioning a robotic policy on task embeddings comprised of two components: a visual demonstration and a language instruction.
To our knowledge, this is the first work to show that simultaneously conditioning a multi-task robotic manipulation policy on both demonstration and language embeddings improves sample efficiency and generalization over conditioning on either modality alone.
- Score: 21.65346551790888
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Demonstrations and natural language instructions are two common ways to
specify and teach robots novel tasks. However, for many complex tasks, a
demonstration or language instruction alone contains ambiguities, preventing
tasks from being specified clearly. In such cases, a combination of both a
demonstration and an instruction more concisely and effectively conveys the
task to the robot than either modality alone. To instantiate this problem
setting, we train a single multi-task policy on a few hundred challenging
robotic pick-and-place tasks and propose DeL-TaCo (Joint Demo-Language Task
Conditioning), a method for conditioning a robotic policy on task embeddings
comprised of two components: a visual demonstration and a language instruction.
By allowing these two modalities to mutually disambiguate and clarify each
other during novel task specification, DeL-TaCo (1) substantially decreases the
teacher effort needed to specify a new task and (2) achieves better
generalization performance on novel objects and instructions over previous
task-conditioning methods. To our knowledge, this is the first work to show
that simultaneously conditioning a multi-task robotic manipulation policy on
both demonstration and language embeddings improves sample efficiency and
generalization over conditioning on either modality alone. See additional
materials at https://deltaco-robot.github.io/
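To make the conditioning scheme concrete, below is a minimal sketch of a policy conditioned jointly on a demonstration embedding and a language embedding. This is not the authors' implementation: the class name, network sizes, embedding dimensions, and action dimension are illustrative assumptions, and the pretrained demonstration and language encoders are replaced by placeholder tensors.

```python
# Minimal sketch (not the DeL-TaCo implementation): a policy conditioned on
# both a demonstration embedding and a language embedding. All dimensions,
# names, and the fusion scheme below are illustrative assumptions.
import torch
import torch.nn as nn

class JointDemoLanguagePolicy(nn.Module):
    def __init__(self, demo_dim=64, lang_dim=64, obs_dim=30, act_dim=7):
        super().__init__()
        # Fuse the two task-specification modalities into a single task embedding.
        self.task_encoder = nn.Sequential(
            nn.Linear(demo_dim + lang_dim, 128), nn.ReLU(),
            nn.Linear(128, 64),
        )
        # The policy head maps (observation, task embedding) to an action.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + 64, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, demo_emb, lang_emb):
        task = self.task_encoder(torch.cat([demo_emb, lang_emb], dim=-1))
        return self.policy(torch.cat([obs, task], dim=-1))

# Placeholder embeddings stand in for the outputs of pretrained demo/language encoders.
policy = JointDemoLanguagePolicy()
obs = torch.randn(8, 30)       # batch of 8 observations
demo_emb = torch.randn(8, 64)  # demonstration embeddings
lang_emb = torch.randn(8, 64)  # language-instruction embeddings
print(policy(obs, demo_emb, lang_emb).shape)  # torch.Size([8, 7])
```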
Related papers
- NaturalVLM: Leveraging Fine-grained Natural Language for
Affordance-Guided Visual Manipulation [21.02437461550044]
Many real-world tasks demand intricate multi-step reasoning.
We introduce a benchmark, NrVLM, comprising 15 distinct manipulation tasks.
We propose a novel learning framework that completes the manipulation task step-by-step according to the fine-grained instructions.
arXiv Detail & Related papers (2024-03-13T09:12:16Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- Continual Robot Learning using Self-Supervised Task Inference [19.635428830237842]
We propose a self-supervised task inference approach to continually learn new tasks.
We use a behavior-matching self-supervised learning objective to train a novel Task Inference Network (TINet).
A multi-task policy is built on top of the TINet and trained with reinforcement learning to optimize performance over tasks.
arXiv Detail & Related papers (2023-09-10T09:32:35Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions
with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)
- Robustness of Learning from Task Instructions [15.462970803323563]
Traditional supervised learning mostly works on individual tasks and requires training on a large set of task-specific examples.
To build a system that can quickly and easily generalize to new tasks, task instructions have been adopted as an emerging form of supervision.
This work investigates the system robustness when the instructions of new tasks are (i) manipulated, (ii) paraphrased, or (iii) from different levels of conciseness.
arXiv Detail & Related papers (2022-12-07T17:54:59Z)
- Instruction-driven history-aware policies for robotic manipulations [82.25511767738224]
We propose a unified transformer-based approach that takes into account multiple inputs.
In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations.
We evaluate our method on the challenging RLBench benchmark and on a real-world robot.
arXiv Detail & Related papers (2022-09-11T16:28:25Z)
- Towards More Generalizable One-shot Visual Imitation Learning [81.09074706236858]
A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences.
One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations.
We push for a higher level of generalization ability by investigating a more ambitious multi-task setup.
arXiv Detail & Related papers (2021-10-26T05:49:46Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and
Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Towards Coordinated Robot Motions: End-to-End Learning of Motion
Policies on Transform Trees [63.31965375413414]
We propose to solve multi-task problems through learning structured policies from human demonstrations.
Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces.
We derive an end-to-end learning objective function that is suitable for the multi-task problem.
arXiv Detail & Related papers (2020-12-24T22:46:22Z)
- Ask Your Humans: Using Human Instructions to Improve Generalization in
Reinforcement Learning [32.82030512053361]
We propose the use of step-by-step human demonstrations in the form of natural language instructions and action trajectories.
We find that human demonstrations help solve the most complex tasks.
We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2020-11-01T14:39:46Z)
- DeComplex: Task planning from complex natural instructions by a
collocating robot [3.158346511479111]
Executing human-intended tasks is not trivial, as natural language expressions can have large linguistic variations.
Existing works assume that either a single task instruction is given to the robot at a time or an instruction contains multiple independent tasks.
We propose a method to find the intended order of execution of multiple inter-dependent tasks given in natural language instruction.
arXiv Detail & Related papers (2020-08-23T18:10:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.