A Systematic Survey of Text Worlds as Embodied Natural Language Environments
- URL: http://arxiv.org/abs/2107.04132v1
- Date: Thu, 8 Jul 2021 22:15:16 GMT
- Title: A Systematic Survey of Text Worlds as Embodied Natural Language Environments
- Authors: Peter A Jansen
- Abstract summary: Text Worlds are virtual environments for embodied agents that, unlike 2D or 3D environments, are rendered exclusively using textual descriptions.
These environments offer an alternative to higher-fidelity 3D environments due to their low barrier to entry.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text Worlds are virtual environments for embodied agents that, unlike 2D or 3D environments, are rendered exclusively using textual descriptions. These environments offer an alternative to higher-fidelity 3D environments due to their low barrier to entry, providing the ability to study semantics, compositional inference, and other high-level tasks with rich high-level action spaces while controlling for perceptual input. This systematic survey outlines recent developments in tooling, environments, and agent modeling for Text Worlds, while examining recent trends in knowledge graphs, common sense reasoning, and transfer learning of Text World performance to higher-fidelity environments, as well as near-term development targets that, once achieved, would make Text Worlds an attractive general research paradigm for natural language processing.
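To make the paradigm concrete, a Text World's interaction protocol can be sketched as a read-act loop over purely textual observations and commands. The snippet below is a minimal, hypothetical illustration in Python; the environment class and its reset/step methods are invented for this sketch and are not the API of any specific framework such as TextWorld or Jericho.

```python
# Minimal, hypothetical sketch of a Text World interaction loop.
# Class and method names are invented for illustration, not a real framework API.

class TinyTextWorld:
    """A one-room environment rendered exclusively as text."""

    def __init__(self):
        self.inventory = set()

    def reset(self) -> str:
        self.inventory.clear()
        return ("You are in a small room. A key lies on the table. "
                "The only door is locked.")

    def step(self, command: str):
        """Apply a natural-language command; return (observation, reward, done)."""
        command = command.lower().strip()
        if command == "take key":
            self.inventory.add("key")
            return "You pick up the key.", 0.0, False
        if command == "open door" and "key" in self.inventory:
            return "You unlock the door and step outside.", 1.0, True
        return "Nothing happens.", 0.0, False


if __name__ == "__main__":
    env = TinyTextWorld()
    obs = env.reset()
    # A scripted walkthrough; a learned agent would instead map obs -> command.
    for command in ["take key", "open door"]:
        print(">", command)
        obs, reward, done = env.step(command)
        print(obs)
```

The point of the abstraction is that both the observation space and the action space are purely textual, which is what lets Text Worlds control for perceptual input while retaining a rich high-level action space.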
Related papers
- ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments [13.988804095409133]
We propose the ReALFRED benchmark, which employs real-world scenes, objects, and room layouts to train agents to complete household tasks.
Specifically, we extend the ALFRED benchmark with updates for larger environmental spaces with smaller visual domain gaps.
With ReALFRED, we analyze methods previously developed for the ALFRED benchmark and observe that they consistently yield lower performance across all metrics.
arXiv Detail & Related papers (2024-07-26T07:00:27Z) - Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z) - "What's my model inside of?": Exploring the role of environments for
grounded natural language understanding [1.8829370712240063]
In this thesis we adopt an ecological approach to grounded natural language understanding (NLU) research.
We develop novel training and annotation approaches for procedural text understanding based on text-based game environments.
We propose a design for AI-augmented "social thinking environments" for knowledge workers like scientists.
arXiv Detail & Related papers (2024-02-04T15:52:46Z) - Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z) - CLEAR: Improving Vision-Language Navigation with Cross-Lingual,
Environment-Agnostic Representations [98.30038910061894]
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions.
We propose CLEAR: Cross-Lingual and Environment-Agnostic Representations.
Our language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation tasks.
arXiv Detail & Related papers (2022-07-05T17:38:59Z) - SILG: The Multi-environment Symbolic Interactive Language Grounding
Benchmark [62.34200575624785]
We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG).
SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack).
We evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained LMs using SILG.
arXiv Detail & Related papers (2021-10-20T17:02:06Z) - Learning Knowledge Graph-based World Models of Textual Environments [16.67845396797253]
This work focuses on the task of building world models of text-based game environments.
Our world model learns to simultaneously: (1) predict changes in the world caused by an agent's actions when representing the world as a knowledge graph; and (2) generate the set of contextually relevant natural language actions required to operate in the world.
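As a rough illustration of this formulation, the world state can be held as a set of (subject, relation, object) triples, with one function predicting the post-action graph and another enumerating the natural language actions the graph currently affords. The sketch below is a hand-written, symbolic stand-in for the paper's learned models; all entity and relation names are invented for illustration.

```python
# Hedged sketch: world state as knowledge-graph triples, with hand-written rules
# standing in for the paper's learned world model. Entities/relations are invented.

from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def apply_action(kg: Set[Triple], action: str) -> Set[Triple]:
    """Predict the post-action knowledge graph (toy symbolic update rule)."""
    new_kg = set(kg)
    if action == "take key":
        new_kg.discard(("key", "on", "table"))
        new_kg.add(("key", "in", "inventory"))
    return new_kg

def valid_actions(kg: Set[Triple]) -> Set[str]:
    """Generate the contextually relevant natural-language actions the graph affords."""
    actions = set()
    if ("key", "on", "table") in kg:
        actions.add("take key")
    if ("key", "in", "inventory") in kg and ("door", "is", "locked") in kg:
        actions.add("open door")
    return actions

if __name__ == "__main__":
    kg = {("player", "in", "room"), ("key", "on", "table"), ("door", "is", "locked")}
    print(valid_actions(kg))          # {'take key'}
    kg = apply_action(kg, "take key")
    print(valid_actions(kg))          # {'open door'}
```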
arXiv Detail & Related papers (2021-06-17T15:45:54Z) - Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task
Feasibility in Interactive Visual Environments [54.405920619915655]
We introduce Mobile app Tasks with Iterative Feedback (MoTIF), a dataset with natural language commands for the greatest number of interactive environments to date.
MoTIF is the first to contain natural language requests for interactive environments that are not satisfiable.
We perform initial feasibility classification experiments and only reach an F1 score of 37.3, verifying the need for richer vision-language representations.
arXiv Detail & Related papers (2021-04-17T14:48:02Z) - VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement
Learning [14.553086325168803]
We present VisualHints, a novel environment for multimodal reinforcement learning (RL) involving text-based interactions along with visual hints obtained from the environment.
We introduce an extension of the TextWorld cooking environment with the addition of visual clues interspersed throughout the environment.
The goal is to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal.
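A minimal sketch of the kind of fusion this setup encourages: embed the text observation and the visual hint separately, concatenate them into a single state vector, and score candidate natural language commands against that state. The encoders and scoring rule below are toy stand-ins (random projections rather than learned networks) and do not reflect the VisualHints baseline models.

```python
# Hedged sketch of text + visual-hint fusion for command scoring; the encoders
# below are random projections standing in for learned text/vision models.

import numpy as np

def encode_text(text: str, dim: int = 8) -> np.ndarray:
    """Toy text encoder: a per-string seeded random projection."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def encode_image(image: np.ndarray, dim: int = 8) -> np.ndarray:
    """Toy visual encoder: project flattened pixels; stands in for a CNN."""
    rng = np.random.default_rng(0)
    return image.flatten() @ rng.standard_normal((image.size, dim))

def choose_command(observation: str, hint_image: np.ndarray, commands: list) -> str:
    """Fuse both modalities by concatenation and pick the most similar command."""
    state = np.concatenate([encode_text(observation), encode_image(hint_image)])

    def score(cmd: str) -> float:
        c = encode_text(cmd, dim=state.size)
        return float(state @ c / (np.linalg.norm(state) * np.linalg.norm(c) + 1e-8))

    return max(commands, key=score)

if __name__ == "__main__":
    hint = np.zeros((4, 4))  # placeholder for a visual hint from the environment
    print(choose_command("You are in a kitchen.", hint, ["take knife", "open fridge"]))
```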
arXiv Detail & Related papers (2020-10-26T18:51:02Z) - Zero-Shot Compositional Policy Learning via Language Grounding [13.45138913186308]
Humans can adapt to new tasks quickly by leveraging prior knowledge about the world such as language descriptions.
We introduce a new research platform, BabyAI++, in which the dynamics of environments are disentangled from visual appearance.
We find that current language-guided RL/IL techniques overfit to the training environments and suffer from a huge performance drop when facing unseen combinations.
arXiv Detail & Related papers (2020-04-15T16:58:19Z) - Environment-agnostic Multitask Learning for Natural Language Grounded
Navigation [88.69873520186017]
We introduce a multitask navigation model that can be seamlessly trained on Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks.
Experiments show that environment-agnostic multitask learning significantly reduces the performance gap between seen and unseen environments.
arXiv Detail & Related papers (2020-03-01T09:06:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.