Pushing it out of the Way: Interactive Visual Navigation
- URL: http://arxiv.org/abs/2104.14040v1
- Date: Wed, 28 Apr 2021 22:46:41 GMT
- Title: Pushing it out of the Way: Interactive Visual Navigation
- Authors: Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi
- Abstract summary: We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
- Score: 62.296686176988125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the environment beyond just moving forward/backward and turning left/right. Sometimes, the best way to navigate is to push something out of the way. In this paper, we study the problem of interactive navigation, where agents learn to change the environment to navigate more efficiently to their goals. To this end, we introduce the Neural Interaction Engine (NIE) to explicitly predict the changes in the environment caused by the agent's actions. By modeling these changes while planning, we find that agents exhibit significant improvements in their navigational capabilities. More specifically, we consider two downstream tasks in the physics-enabled, visually rich AI2-THOR environment: (1) reaching a target while the path to it is blocked, and (2) moving an object to a target location by pushing it. For both tasks, agents equipped with an NIE significantly outperform agents that lack an understanding of the effects of their actions, indicating the benefits of our approach.
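To make the core idea concrete, below is a minimal sketch of an NIE-style forward model in PyTorch. The class name, feature dimensions, action set, and output parameterization are all illustrative assumptions; the paper describes the actual architecture.

```python
# Hypothetical sketch of an NIE-style forward model, NOT the paper's code:
# given a visual state embedding and an action, predict the rigid motion
# (translation + yaw) that the action causes for a nearby object.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NIESketch(nn.Module):
    def __init__(self, state_dim: int = 512, num_actions: int = 6, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 4),  # predicted (dx, dy, dz, d_yaw) of the object
        )

    def forward(self, state: torch.Tensor, action_onehot: torch.Tensor) -> torch.Tensor:
        # Condition the prediction on both the scene and the chosen action.
        return self.net(torch.cat([state, action_onehot], dim=-1))

nie = NIESketch()
state = torch.randn(1, 512)                                  # visual embedding of the scene
push = F.one_hot(torch.tensor([0]), num_classes=6).float()   # e.g., a "push forward" action
predicted_delta = nie(state, push)                           # expected object displacement
```

A planner can roll candidate actions through such a model and prefer those whose predicted effect clears the blocked path (or moves the object toward its goal location).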
Related papers
- ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over the prior state of the art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
- Emergence of Maps in the Memories of Blind Navigation Agents [68.41901534985575]
Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment.
We ask whether machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps.
Unlike animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms.
arXiv Detail & Related papers (2023-01-30T20:09:39Z)
- What do navigation agents learn about their environment? [39.74076893981299]
We introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents.
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
arXiv Detail & Related papers (2022-06-17T01:33:43Z)
- Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies have observed a slowdown in performance improvements on both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task of directing an agent to carry out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to navigate robustly while mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn a navigation policy.
Our experiments, performed in AI2-THOR, show that our model outperforms the baselines on both the success rate (SR) and success weighted by path length (SPL) metrics; a reference computation of these metrics is sketched after this list.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
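For reference, SR and SPL (mentioned in the last entry above) are the standard embodied-navigation metrics of Anderson et al. (2018). A minimal computation in Python, not code from any of the papers listed here:

```python
# Success rate (SR) and Success weighted by Path Length (SPL).
# SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i indicates success,
# l_i is the shortest-path length, and p_i is the path length the agent took.
from typing import Sequence

def success_rate(successes: Sequence[bool]) -> float:
    return sum(successes) / len(successes)

def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        taken_lengths: Sequence[float]) -> float:
    return sum(
        s * l / max(p, l)
        for s, l, p in zip(successes, shortest_lengths, taken_lengths)
    ) / len(successes)

print(success_rate([True, False]))                 # 0.5
print(spl([True, False], [2.0, 3.0], [2.5, 4.0]))  # 0.4
```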
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.