IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents
- URL: http://arxiv.org/abs/2206.00142v1
- Date: Tue, 31 May 2022 23:08:22 GMT
- Title: IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents
- Authors: Artem Zholus, Alexey Skrynnik, Shrestha Mohanty, Zoya Volovikova,
Julia Kiseleva, Artur Szlam, Marc-Alexandre Côté, Aleksandr I. Panov
- Score: 54.300585048295225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the IGLU Gridworld: a reinforcement learning environment
for building and evaluating language-conditioned embodied agents in a scalable
way. The environment features visual agent embodiment, interactive learning
through collaboration, language-conditioned RL, and a combinatorially hard
task space (3D block building).
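Since the abstract positions IGLU Gridworld as a standard reinforcement
learning environment, a typical use would follow a Gym-style interaction
loop. The sketch below is a minimal illustration; the package name
(gridworld), the environment id (IGLUGridworld-v0), and the four-tuple
step signature are assumptions based on 2022-era Gym conventions, not
confirmed details of the release.

    # Minimal interaction sketch, assuming a Gym-style API.
    # The package name, environment id, and step signature are assumptions;
    # consult the IGLU Gridworld repository for the actual interface.
    import gym
    import gridworld  # assumed package that registers the IGLU environments

    env = gym.make("IGLUGridworld-v0")  # assumed environment id
    obs = env.reset()

    done = False
    episode_return = 0.0
    while not done:
        action = env.action_space.sample()  # random placeholder policy
        obs, reward, done, info = env.step(action)
        episode_return += reward

    env.close()
    print(f"Episode return: {episode_return:.2f}")

A language-conditioned agent would replace the random policy with one that
also consumes the builder-architect dialog, but the loop structure would
stay the same.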
Related papers
- Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments [42.06453257292203]
We propose a hierarchical framework that combines the deep language comprehension of large language models with the adaptive action-execution capabilities of reinforcement learning agents.
We have demonstrated the effectiveness of our approach in two different environments: in IGLU, where agents are instructed to build structures, and in Crafter, where agents perform tasks and interact with objects in the surrounding environment according to language commands.
arXiv Detail & Related papers (2024-07-12T14:19:36Z)
- LEGENT: Open Platform for Embodied Agents [60.71847900126832]
We introduce LEGENT, an open, scalable platform for developing embodied agents using Large Language Models (LLMs) and Large Multimodal Models (LMMs).
LEGENT offers a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface.
In experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks.
arXiv Detail & Related papers (2024-04-28T16:50:12Z)
- Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z)
- Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments [20.114506226598508]
Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a navigation task that requires an agent to follow a language instruction in a realistic environment.
We propose a new graph-based environment representation to address these problems.
arXiv Detail & Related papers (2023-01-11T08:04:18Z)
- CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations [98.30038910061894]
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions.
We propose CLEAR: Cross-Lingual and Environment-Agnostic Representations.
Our language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation tasks.
arXiv Detail & Related papers (2022-07-05T17:38:59Z)
- SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark [62.34200575624785]
We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG).
SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack).
We evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LMs using SILG.
arXiv Detail & Related papers (2021-10-20T17:02:06Z)
- Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning [126.57680291438128]
We study whether scalability can be achieved via a disentangled representation.
We evaluate semantic tracklets on the visual multi-agent particle environment (VMPE) and on the challenging visual multi-agent GFootball environment.
Notably, this method is the first to successfully learn a strategy for five players in the GFootball environment using only visual data.
arXiv Detail & Related papers (2021-08-06T22:19:09Z)
- VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement Learning [14.553086325168803]
We present VisualHints, a novel environment for multimodal reinforcement learning (RL) involving text-based interactions along with visual hints obtained from the environment.
We introduce an extension of the TextWorld cooking environment with the addition of visual clues interspersed throughout the environment.
The goal is to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal.
arXiv Detail & Related papers (2020-10-26T18:51:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.