Layout-aware Dreamer for Embodied Referring Expression Grounding
- URL: http://arxiv.org/abs/2212.00171v2
- Date: Fri, 2 Dec 2022 16:00:40 GMT
- Title: Layout-aware Dreamer for Embodied Referring Expression Grounding
- Authors: Mingxiao Li, Zehao Wang, Tinne Tuytelaars, Marie-Francine Moens
- Abstract summary: We study the problem of Embodied Referring Expression Grounding, where an agent needs to navigate in a previously unseen environment.
We have designed an autonomous agent called Layout-aware Dreamer (LAD).
LAD learns to infer the room category distribution of neighboring unexplored areas along the path for coarse layout estimation.
To learn an effective exploration of the environment, the Goal Dreamer imagines the destination beforehand.
- Score: 49.33508853581283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the problem of Embodied Referring Expression
Grounding, where an agent needs to navigate in a previously unseen environment
and localize a remote object described by a concise high-level natural language
instruction. When facing such a situation, a human tends to imagine what the
destination may look like and to explore the environment based on prior
knowledge of the environmental layout, such as the fact that a bathroom is more
likely to be found near a bedroom than a kitchen. We have designed an
autonomous agent called Layout-aware Dreamer (LAD), which consists of two
novel modules, the Layout Learner and the Goal Dreamer, to mimic this
cognitive decision process. The Layout Learner learns to infer the room
category distribution of neighboring unexplored areas along the path for coarse
layout estimation, which effectively introduces layout common sense of
room-to-room transitions to our agent. To learn an effective exploration of the
environment, the Goal Dreamer imagines the destination beforehand. Our agent
achieves new state-of-the-art performance on the public leaderboard of the
REVERIE dataset in challenging unseen test environments, improving
navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43%
over the previous state of the art. The code is released at
https://github.com/zehao-wang/LAD
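
The two modules can be read as prediction heads attached to the agent's navigation graph. Below is a minimal, illustrative sketch of a Layout-Learner-style head that assigns a room-category distribution to each unexplored neighboring node; it is not the released LAD code, and the feature size, number of room categories, and all identifiers are assumptions made for this example.

```python
# Minimal sketch of a Layout-Learner-style head (illustrative only, not the
# released LAD implementation). FEAT_DIM, NUM_ROOM_TYPES, and all identifiers
# below are assumptions for this example.
import torch
import torch.nn as nn

FEAT_DIM = 768        # assumed per-node visual/contextual feature size
NUM_ROOM_TYPES = 30   # assumed number of room categories (bedroom, bathroom, ...)

class RoomLayoutHead(nn.Module):
    """Predicts a room-category distribution for each unexplored neighbor node."""

    def __init__(self, feat_dim: int = FEAT_DIM, num_rooms: int = NUM_ROOM_TYPES):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, num_rooms),
        )

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_nodes, feat_dim) features of candidate frontier nodes
        # returns:    (num_nodes, num_rooms) room-category probabilities
        return self.mlp(node_feats).softmax(dim=-1)

# Toy usage: prefer the neighbor most likely to lead to the instructed room type.
head = RoomLayoutHead()
frontier_feats = torch.randn(4, FEAT_DIM)        # 4 unexplored neighbors
room_probs = head(frontier_feats)                # (4, NUM_ROOM_TYPES)
target_room_idx = 3                              # hypothetical index of "bathroom"
best_neighbor = room_probs[:, target_room_idx].argmax().item()
```

In an agent of this kind, such per-node room distributions could then be combined with the instruction to bias which frontier node is visited next.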
Related papers
- What Is Near?: Room Locality Learning for Enhanced Robot
Vision-Language-Navigation in Indoor Living Environments [9.181624273492828]
We propose WIN, a commonsense learning model for Vision Language Navigation (VLN) tasks.
WIN predicts the local neighborhood map based on prior knowledge of living spaces and the current observation.
We show that local-global planning based on locality knowledge and indoor-layout prediction allows the agent to efficiently select the appropriate action.
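
As a rough illustration of how such room-locality knowledge could bias action selection, the sketch below re-scores candidate neighbors with a hand-written room-to-room transition prior; the prior table, room names, and weighting rule are invented for this example and are not taken from the WIN paper.

```python
# Illustrative-only sketch: re-score candidate directions with a commonsense
# room-to-room transition prior. The prior table, room names, and weighting
# are invented for this example; they are not taken from the WIN paper.
TRANSITION_PRIOR = {
    ("bedroom", "bathroom"): 0.8,   # bathrooms are often adjacent to bedrooms
    ("bedroom", "kitchen"): 0.1,
    ("hallway", "bathroom"): 0.5,
    ("hallway", "kitchen"): 0.6,
}

def rescore(candidates, current_room, target_room, prior_weight=0.5):
    """Blend each candidate's visual score with the layout prior.

    candidates: list of (predicted_room, visual_score) for unexplored neighbors.
    Returns the index of the best candidate under the blended score.
    """
    def blended(room, score):
        prior = TRANSITION_PRIOR.get((current_room, room), 0.2)  # default prior
        bonus = 1.0 if room == target_room else 0.0
        return (1 - prior_weight) * score + prior_weight * (prior + bonus)

    scores = [blended(room, s) for room, s in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy usage: starting from a bedroom, looking for a bathroom.
cands = [("kitchen", 0.7), ("bathroom", 0.4), ("hallway", 0.5)]
best = rescore(cands, current_room="bedroom", target_room="bathroom")
print(best)  # index of the candidate judged most promising
```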
arXiv Detail & Related papers (2023-09-10T14:15:01Z)
- TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors [29.255373211228548]
TIDEE tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.
TIDEE explores a home environment, detects objects that are out of their natural place, infers plausible object contexts for them, localizes such contexts in the current scene, and repositions the objects.
We test TIDEE on tidying up disorganized scenes in the AI2THOR simulation environment.
arXiv Detail & Related papers (2022-07-21T21:19:18Z)
- Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation [117.26891277593205]
We focus on navigation and address the problem that existing navigation algorithms lack experience and common sense.
Inspired by the human ability to think twice before moving and to conceive several feasible paths to seek a goal in unfamiliar scenes, we present a route-planning method called the Path Estimation and Memory Recalling (PEMR) framework.
We show strong experimental results of PEMR on the EmbodiedQA navigation task.
arXiv Detail & Related papers (2021-10-16T13:30:55Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that describes the route step by step.
This setting deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Scene-Intuitive Agent for Remote Embodied Visual Grounding [89.73786309180139]
Humans learn from life events to form intuitions about visual environments and language.
We present an agent that mimics such human behaviors.
arXiv Detail & Related papers (2021-03-24T02:37:48Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
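
A bare-bones sketch of this idea is shown below: an encoder-decoder that takes a partial egocentric occupancy map and outputs per-cell probabilities for a map that extends beyond the visible regions. The channel layout, network sizes, and architecture here are assumptions for illustration, not the winning Habitat entry.

```python
# Minimal illustrative sketch of occupancy anticipation (assumed architecture,
# not the paper's winning model): encode a partial egocentric occupancy map and
# decode per-cell probabilities for regions beyond what is currently visible.
import torch
import torch.nn as nn

class OccupancyAnticipator(nn.Module):
    def __init__(self, in_ch: int = 2, out_ch: int = 2):
        # Assumed 2 channels per map cell: [occupied, explored].
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_ch, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, partial_map: torch.Tensor) -> torch.Tensor:
        # partial_map: (B, 2, H, W) map built from visible regions only
        # returns:     (B, 2, H, W) anticipated map probabilities
        return torch.sigmoid(self.decoder(self.encoder(partial_map)))

model = OccupancyAnticipator()
anticipated = model(torch.rand(1, 2, 64, 64))   # same spatial size as the input
```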
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Diagnosing the Environment Bias in Vision-and-Language Navigation [102.02103792590076]
Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations.
Recent works that study VLN observe a significant performance drop when tested on unseen environments, indicating that the neural agent models are highly biased towards training environments.
In this work, we design novel diagnosis experiments via environment re-splitting and feature replacement, looking into possible reasons for this environment bias.
arXiv Detail & Related papers (2020-05-06T19:24:33Z)