Grounding Language with Visual Affordances over Unstructured Data
- URL: http://arxiv.org/abs/2210.01911v1
- Date: Tue, 4 Oct 2022 21:16:48 GMT
- Title: Grounding Language with Visual Affordances over Unstructured Data
- Authors: Oier Mees, Jessica Borja-Diaz, Wolfram Burgard
- Abstract summary: We propose a novel approach to efficiently learn language-conditioned robot skills from unstructured, offline and reset-free data.
We exploit a self-supervised visuo-lingual affordance model, which requires language annotations for as little as 1% of the total data.
We find that our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches.
- Score: 26.92329260907805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have shown that Large Language Models (LLMs) can be applied to
ground natural language to a wide variety of robot skills. However, in
practice, learning multi-task, language-conditioned robotic skills typically
requires large-scale data collection and frequent human intervention to reset
the environment or help correct the current policies. In this work, we
propose a novel approach to efficiently learn general-purpose
language-conditioned robot skills from unstructured, offline and reset-free
data in the real world by exploiting a self-supervised visuo-lingual affordance
model, which requires annotating as little as 1% of the total data with
language. We evaluate our method in extensive experiments both in simulated and
real-world robotic tasks, achieving state-of-the-art performance on the
challenging CALVIN benchmark and learning over 25 distinct visuomotor
manipulation tasks with a single policy in the real world. We find that when
paired with LLMs to break down abstract natural language instructions into
subgoals via few-shot prompting, our method is capable of completing
long-horizon, multi-tier tasks in the real world, while requiring an order of
magnitude less data than previous approaches. Code and videos are available at
http://hulc2.cs.uni-freiburg.de
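The abstract couples two ideas: few-shot prompting an LLM to break an abstract instruction into subgoals, and a language-conditioned policy guided by a self-supervised visuo-lingual affordance model that proposes where to interact. The sketch below is a minimal illustration of that control loop, not the code released at the URL above; the `complete`, `affordance`, `policy`, `get_image`, and `step` interfaces are hypothetical placeholders standing in for an LLM API, the affordance model, the visuomotor policy, and the robot.
```python
"""Minimal sketch of the control loop described in the abstract.

Assumptions (not from the released code): `complete` is any text-completion
function (e.g. an LLM API call), `affordance` predicts a target region to
approach given an image and a subgoal, and `policy` is a language-conditioned
visuomotor policy. All names here are hypothetical.
"""
from typing import Callable, List

import numpy as np

FEW_SHOT_PROMPT = """\
Break the instruction into short robot subgoals, one per line.

Instruction: tidy up the workspace
Subgoals:
open the drawer
put the red block in the drawer
close the drawer

Instruction: {instruction}
Subgoals:
"""


def decompose(instruction: str, complete: Callable[[str], str]) -> List[str]:
    """Few-shot prompt an LLM to split an abstract instruction into subgoals."""
    reply = complete(FEW_SHOT_PROMPT.format(instruction=instruction))
    return [line.strip() for line in reply.splitlines() if line.strip()]


def run_task(
    instruction: str,
    complete: Callable[[str], str],
    affordance: Callable[[np.ndarray, str], np.ndarray],  # image, text -> target region
    policy: Callable[[np.ndarray, str], np.ndarray],      # image, text -> low-level action
    get_image: Callable[[], np.ndarray],
    step: Callable[[np.ndarray], None],
    max_steps: int = 100,
) -> None:
    """For each subgoal: reach toward the predicted affordance region,
    then hand control to the language-conditioned policy."""
    for subgoal in decompose(instruction, complete):
        image = get_image()
        target = affordance(image, subgoal)  # where to interact, per the affordance model
        step(target)                         # coarse reach toward the region (assumption)
        for _ in range(max_steps):
            action = policy(get_image(), subgoal)
            step(action)
```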
Related papers
- Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics [25.2461925479135]
Video-Language Critic is a reward model that can be trained on readily available cross-embodiment data.
Our model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward alone.
arXiv Detail & Related papers (2024-05-30T12:18:06Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes [72.83187997344406]
ARNOLD is a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes.
ARNOLD comprises 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals.
arXiv Detail & Related papers (2023-04-09T21:42:57Z)
- Language-Driven Representation Learning for Robotics [115.93273609767145]
Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks.
We introduce a framework for language-driven representation learning from human videos and captions.
We find that Voltron's language-driven learning outperforms the prior state of the art, especially on targeted problems requiring higher-level control.
arXiv Detail & Related papers (2023-02-24T17:29:31Z)
- Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models [70.82705830137708]
We introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL).
We utilize semi-supervised language labels, leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data (see the sketch after this list).
DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
arXiv Detail & Related papers (2022-11-21T18:56:00Z)
- What Matters in Language Conditioned Robotic Imitation Learning [26.92329260907805]
We study the most critical challenges in learning language conditioned policies from offline free-form imitation datasets.
We present a novel approach that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark.
arXiv Detail & Related papers (2022-04-13T08:45:32Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [119.29555551279155]
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks [30.936692970187416]
General-purpose robots must learn to relate human language to their perceptions and actions.
We present CALVIN, an open-source simulated benchmark to learn long-horizon language-conditioned tasks.
arXiv Detail & Related papers (2021-12-06T18:37:33Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Language Conditioned Imitation Learning over Unstructured Data [9.69886122332044]
We present a method for incorporating free-form natural language conditioning into imitation learning.
Our approach learns perception from pixels, natural language understanding, and multitask continuous control end-to-end as a single neural network.
We show this dramatically improves language conditioned performance, while reducing the cost of language annotation to less than 1% of total data.
arXiv Detail & Related papers (2020-05-15T17:08:50Z)
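The DIAL entry above describes propagating language labels onto unlabelled demonstrations via CLIP's semantic understanding. Below is a minimal sketch of that general idea using the public `clip` package: score a few candidate instructions against a demonstration frame and keep the best match. The candidate texts and file path are placeholders, and this is an illustration of the idea, not the DIAL authors' pipeline.
```python
# Sketch of CLIP-based instruction relabeling in the spirit of DIAL (not the
# authors' code): score candidate instructions against a demonstration frame
# and keep the best match. Candidate texts and the frame path are placeholders.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

candidates = ["open the drawer", "pick up the red block", "push the sliding door"]
image = preprocess(Image.open("demo_frame.png")).unsqueeze(0).to(device)
tokens = clip.tokenize(candidates).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ txt_feat.T).squeeze(0)

print(candidates[scores.argmax().item()])  # best-matching instruction for this frame
```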
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.