Collecting Interactive Multi-modal Datasets for Grounded Language
Understanding
- URL: http://arxiv.org/abs/2211.06552v3
- Date: Tue, 21 Mar 2023 06:38:48 GMT
- Title: Collecting Interactive Multi-modal Datasets for Grounded Language
Understanding
- Authors: Shrestha Mohanty, Negar Arabzadeh, Milagro Teruel, Yuxuan Sun, Artem
Zholus, Alexey Skrynnik, Mikhail Burtsev, Kavya Srinet, Aleksandr Panov,
Arthur Szlam, Marc-Alexandre Côté, Julia Kiseleva
- Abstract summary: We formalized the task of a collaborative embodied agent guided by natural language.
We developed a tool for extensive and scalable data collection.
We collected the first dataset for interactive grounded language understanding.
- Score: 66.30648042100123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human intelligence can remarkably adapt quickly to new tasks and
environments. Starting from a very young age, humans acquire new skills and
learn how to solve new tasks either by imitating the behavior of others or by
following provided natural language instructions. To facilitate research that
can enable similar capabilities in machines, we made the following
contributions: (1) formalized the task of a collaborative embodied agent guided by
natural language; (2) developed a tool for extensive and scalable data collection;
and (3) collected the first dataset for interactive grounded language
understanding.
Related papers
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z)
- Policy Learning with a Language Bottleneck [65.99843627646018]
Policy Learning with a Language Bottleneck (PLLB) is a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a rule-generation step guided by language models and an update step where agents learn new policies guided by the rules.
In a two-player communication game, a maze-solving task, and two image reconstruction tasks, we show that PLLB agents not only learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
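The rule-then-update alternation described above can be caricatured in a few lines of Python. This is a hypothetical sketch, not the paper's implementation: the function names, the frequency-counting "rule generator" (standing in for the language-model step), and the probability-mixing update are all illustrative assumptions.

```python
def generate_rule(episodes):
    """Stand-in for the LM-guided rule-generation step: summarize which
    action most often led to reward as a simple symbolic rule."""
    counts = {}
    for action, reward in episodes:
        if reward > 0:
            counts[action] = counts.get(action, 0) + 1
    best = max(counts, key=counts.get) if counts else None
    return {"prefer": best}

def update_policy(policy, rule, lr=0.5):
    """Update step: shift probability mass toward the rule-endorsed action."""
    if rule["prefer"] is None:
        return policy
    new = {a: p * (1 - lr) for a, p in policy.items()}
    new[rule["prefer"]] += lr
    return new

# One round of the alternation on toy data: two actions, three episodes.
policy = {"left": 0.5, "right": 0.5}
episodes = [("left", 0), ("right", 1), ("right", 1)]
rule = generate_rule(episodes)        # rule prefers "right"
policy = update_policy(policy, rule)  # mass shifts toward "right"
```

In the framework itself the rule generator is a language model and the update step trains a full policy network; the toy version only shows how the two phases feed each other.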
arXiv Detail & Related papers (2024-05-07T08:40:21Z)
- Transforming Human-Centered AI Collaboration: Redefining Embodied Agents Capabilities through Interactive Grounded Language Instructions [23.318236094953072]
Human intelligence's adaptability is remarkable, allowing us to adjust to new tasks and multi-modal environments swiftly.
The research community is actively pursuing the development of interactive "embodied agents".
These agents must possess the ability to promptly request feedback in case communication breaks down or instructions are unclear.
arXiv Detail & Related papers (2023-05-18T07:51:33Z)
- Grounding Language with Visual Affordances over Unstructured Data [26.92329260907805]
We propose a novel approach to efficiently learn language-conditioned robot skills from unstructured, offline and reset-free data.
We exploit a self-supervised visuo-lingual affordance model, which requires as little as 1% of the total data with language.
We find that our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches.
arXiv Detail & Related papers (2022-10-04T21:16:48Z)
- IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022 [63.07251290802841]
We propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment.
The primary goal of the competition is to approach the problem of how to develop interactive embodied agents.
This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community.
arXiv Detail & Related papers (2022-05-27T06:12:48Z)
- Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021 [58.196738777207315]
We propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment.
The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment.
arXiv Detail & Related papers (2022-05-05T01:20:09Z)
- NeurIPS 2021 Competition IGLU: Interactive Grounded Language Understanding in a Collaborative Environment [71.11505407453072]
We propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment.
The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment.
This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community: Natural Language Understanding and Generation (NLU/G) and Reinforcement Learning (RL).
arXiv Detail & Related papers (2021-10-13T07:13:44Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning [16.776753238108036]
We present a method for combining multi-agent communication and traditional data-driven approaches to natural language learning.
Our starting point is a language model that has been trained on generic, not task-specific language data.
We then place this model in a multi-agent self-play environment that generates task-specific rewards used to adapt or modulate the model.
arXiv Detail & Related papers (2020-05-14T15:32:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.