Explore and Explain: Self-supervised Navigation and Recounting
- URL: http://arxiv.org/abs/2007.07268v1
- Date: Tue, 14 Jul 2020 18:00:49 GMT
- Title: Explore and Explain: Self-supervised Navigation and Recounting
- Authors: Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli,
Lorenzo Baraldi, Rita Cucchiara
- Abstract summary: We devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path.
In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes.
Our model integrates a novel self-supervised exploration module with penalty, and a fully-attentive captioning model for explanation.
- Score: 43.52107532692226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied AI has been recently gaining attention as it aims to foster the
development of autonomous and intelligent agents. In this paper, we devise a
novel embodied setting in which an agent needs to explore a previously unknown
environment while recounting what it sees during the path. In this context, the
agent needs to navigate the environment driven by an exploration goal, select
proper moments for description, and output natural language descriptions of
relevant objects and scenes. Our model integrates a novel self-supervised
exploration module with penalty, and a fully-attentive captioning model for
explanation. Also, we investigate different policies for selecting proper
moments for explanation, driven by information coming from both the environment
and the navigation. Experiments are conducted on photorealistic environments
from the Matterport3D dataset and investigate the navigation and explanation
capabilities of the agent as well as the role of their interactions.
Related papers
- NavHint: Vision and Language Navigation Agent with a Hint Generator [31.322331792911598]
We provide indirect supervision to the navigation agent through a hint generator that provides detailed visual descriptions.
The hint generator assists the navigation agent in developing a global understanding of the visual environment.
We evaluate our method on the R2R and R4R datasets and achieve state-of-the-art on several metrics.
arXiv Detail & Related papers (2024-02-04T16:23:16Z)
- Embodied Agents for Efficient Exploration and Smart Scene Description [47.82947878753809]
We tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment.
We propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning on images.
Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions.
arXiv Detail & Related papers (2023-01-17T19:28:01Z)
- What do navigation agents learn about their environment? [39.74076893981299]
We introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents.
We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment.
arXiv Detail & Related papers (2022-06-17T01:33:43Z)
- Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
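The entropy-minimization idea above can be illustrated with a minimal, hypothetical sketch. The paper estimates state-visitation entropy with a learned latent state-space model; here, purely for illustration, a tabular count-based estimate stands in for that model, and the function names (`visitation_entropy`, `intrinsic_reward`) are invented for this example, not the authors' API.

```python
import numpy as np

def visitation_entropy(counts):
    """Shannon entropy of the empirical state-visitation distribution."""
    p = counts / counts.sum()
    p = p[p > 0]  # drop unvisited states (0 * log 0 := 0)
    return -np.sum(p * np.log(p))

def intrinsic_reward(counts, state):
    """Reward = reduction in visitation entropy after visiting `state`.

    Positive when the visit concentrates the distribution (more control),
    negative when it spreads it out.
    """
    before = visitation_entropy(counts)
    counts = counts.copy()
    counts[state] += 1
    after = visitation_entropy(counts)
    return before - after

# Toy example: the agent has mostly occupied state 0 so far.
counts = np.array([8.0, 1.0, 1.0])
r_stay = intrinsic_reward(counts, 0)  # revisiting the common state
r_new = intrinsic_reward(counts, 1)   # visiting a rare state
```

Under this objective, revisiting the already-common state yields a positive reward while visiting a rare state yields a negative one, which is what drives the agent toward predictable, controllable futures.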
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions.
This setting deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies report a slow-down in performance improvements on both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) is the task in which an agent carries out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct a robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.