SILG: The Multi-environment Symbolic Interactive Language Grounding
Benchmark
- URL: http://arxiv.org/abs/2110.10661v1
- Date: Wed, 20 Oct 2021 17:02:06 GMT
- Title: SILG: The Multi-environment Symbolic Interactive Language Grounding
Benchmark
- Authors: Victor Zhong and Austin W. Hanjie and Sida I. Wang and Karthik
Narasimhan and Luke Zettlemoyer
- Abstract summary: We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG).
SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack).
We evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LMs using SILG.
- Score: 62.34200575624785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing work in language grounding typically studies single environments. How
do we build unified models that apply across multiple environments? We propose
the multi-environment Symbolic Interactive Language Grounding benchmark (SILG),
which unifies a collection of diverse grounded language learning environments
under a common interface. SILG consists of grid-world environments that require
generalization to new dynamics, entities, and partially observed worlds (RTFM,
Messenger, NetHack), as well as symbolic counterparts of visual worlds that
require interpreting rich natural language with respect to complex scenes
(ALFWorld, Touchdown). Together, these environments provide diverse grounding
challenges in richness of observation space, action space, language
specification, and plan complexity. In addition, we propose the first shared
model architecture for RL on these environments, and evaluate recent advances
such as egocentric local convolution, recurrent state-tracking, entity-centric
attention, and pretrained LMs using SILG. Our shared architecture achieves
comparable performance to environment-specific architectures. Moreover, we find
that many recent modelling advances do not result in significant gains on
environments other than the one they were designed for. This highlights the
need for a multi-environment benchmark. Finally, the best models significantly
underperform humans on SILG, which suggests ample room for future work. We hope
SILG enables the community to quickly identify new methodologies for language
grounding that generalize to a diverse set of environments and their associated
challenges.
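
To make the idea of a common interface concrete, below is a minimal sketch, assuming a gym-style reset/step loop, of how environments with symbolic (name-based) observations and accompanying text might be exposed behind one API. The class and field names (SymbolicObservation, SymbolicEnv, ToyGridEnv) and the toy environment are illustrative assumptions and do not reproduce SILG's actual interface.

```python
# Hypothetical sketch of a shared symbolic-observation interface.
# Names and fields are assumptions for illustration, not SILG's real API.
from dataclasses import dataclass
from typing import List, Tuple
import random


@dataclass
class SymbolicObservation:
    text: str                    # task or manual text for the episode
    grid: List[List[str]]        # symbolic entity names per cell, not pixels
    position: Tuple[int, int]    # agent location in the grid


class SymbolicEnv:
    """Common interface: every environment returns symbolic observations."""

    def reset(self) -> SymbolicObservation:
        raise NotImplementedError

    def step(self, action: int) -> Tuple[SymbolicObservation, float, bool]:
        raise NotImplementedError


class ToyGridEnv(SymbolicEnv):
    """Toy stand-in environment used only to exercise the shared interface."""

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, size: int = 5):
        self.size = size

    def reset(self) -> SymbolicObservation:
        self.pos = (0, 0)
        self.goal = (self.size - 1, self.size - 1)
        return self._observe()

    def step(self, action: int) -> Tuple[SymbolicObservation, float, bool]:
        dr, dc = self.ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        return self._observe(), (1.0 if done else 0.0), done

    def _observe(self) -> SymbolicObservation:
        grid = [["empty"] * self.size for _ in range(self.size)]
        grid[self.goal[0]][self.goal[1]] = "goal"
        grid[self.pos[0]][self.pos[1]] = "agent"
        return SymbolicObservation(text="go to the goal", grid=grid,
                                   position=self.pos)


if __name__ == "__main__":
    env = ToyGridEnv()
    obs, done = env.reset(), False
    for _ in range(500):                     # random policy, capped episode
        obs, reward, done = env.step(random.randrange(4))
        if done:
            break
    print("reached goal:", done)
```

The point of such an interface is that a single shared policy can consume (text, grid) pairs from any conforming environment, which is the property a shared architecture across RTFM, Messenger, NetHack, ALFWorld, and Touchdown would rely on.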
Related papers
- ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments [13.988804095409133]
We propose the ReALFRED benchmark, which employs real-world scenes, objects, and room layouts for agents to learn to complete household tasks.
Specifically, we extend the ALFRED benchmark with updates for larger environmental spaces with smaller visual domain gaps.
With ReALFRED, we analyze previously crafted methods for the ALFRED benchmark and observe that they consistently yield lower performance in all metrics.
arXiv Detail & Related papers (2024-07-26T07:00:27Z) - Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z) - Navigation with Large Language Models: Semantic Guesswork as a Heuristic
for Planning [73.0990339667978]
Navigation in unfamiliar environments presents a major challenge for robots.
We use language models to bias exploration of novel real-world environments.
We evaluate LFG in challenging real-world environments and simulated benchmarks.
arXiv Detail & Related papers (2023-10-16T06:21:06Z) - Graph based Environment Representation for Vision-and-Language
Navigation in Continuous Environments [20.114506226598508]
Vision-and-Language Navigation in Continuous Environments (VLN-CE) is a navigation task that requires an agent to follow a language instruction in a realistic environment.
We propose a new environment representation in order to solve the above problems.
arXiv Detail & Related papers (2023-01-11T08:04:18Z) - Improving Policy Learning via Language Dynamics Distillation [87.27583619910338]
We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions.
We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
arXiv Detail & Related papers (2022-09-30T19:56:04Z) - CLEAR: Improving Vision-Language Navigation with Cross-Lingual,
Environment-Agnostic Representations [98.30038910061894]
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions.
We propose CLEAR: Cross-Lingual and Environment-Agnostic Representations.
Our language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation task.
arXiv Detail & Related papers (2022-07-05T17:38:59Z) - IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents [54.300585048295225]
We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language conditioned embodied agents in a scalable way.
The environment features visual agent embodiment, interactive learning through collaboration, language-conditioned RL, and a combinatorially hard task space (3D block building).
arXiv Detail & Related papers (2022-05-31T23:08:22Z) - VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement
Learning [14.553086325168803]
We present VisualHints, a novel environment for multimodal reinforcement learning (RL) involving text-based interactions along with visual hints obtained from the environment.
We introduce an extension of the TextWorld cooking environment with the addition of visual clues interspersed throughout the environment.
The goal is to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal.
arXiv Detail & Related papers (2020-10-26T18:51:02Z) - Ecological Semantics: Programming Environments for Situated Language
Understanding [25.853707930426175]
Grounded language learning approaches offer the promise of deeper understanding by situating learning in richer, more structured training environments.
We propose treating environments as "first-class citizens" in semantic representations.
We argue that models must begin to understand and program in the language of affordances.
arXiv Detail & Related papers (2020-03-10T08:24:41Z)