Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation
- URL: http://arxiv.org/abs/2107.06011v4
- Date: Tue, 25 Apr 2023 08:26:47 GMT
- Title: Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation
- Authors: Pierre Marza, Laetitia Matignon, Olivier Simonin, Christian Wolf
- Abstract summary: We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings.
A learning-based agent from the literature trained with the proposed auxiliary losses was the winning entry to the Multi-Object Navigation Challenge.
- Score: 11.868792440783055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of visual navigation, the capacity to map a novel environment
is necessary for an agent to exploit its observation history in the considered
place and efficiently reach known goals. This ability can be associated with
spatial reasoning, where an agent is able to perceive spatial relationships and
regularities, and discover object characteristics. Recent work introduces
learnable policies parametrized by deep neural networks and trained with
Reinforcement Learning (RL). In classical RL setups, the capacity to map and
reason spatially is learned end-to-end, from reward alone. In this setting, we
introduce supplementary supervision in the form of auxiliary tasks designed to
favor the emergence of spatial perception capabilities in agents trained for a
goal-reaching downstream objective. We show that learning to estimate metrics
quantifying the spatial relationships between an agent at a given location and
a goal to reach has a high positive impact in Multi-Object Navigation settings.
Our method significantly improves the performance of different baseline agents
that build either an explicit or an implicit representation of the environment,
even matching the performance of incomparable oracle agents taking ground-truth
maps as input. A learning-based agent from the literature trained with the
proposed auxiliary losses was the winning entry to the Multi-Object Navigation
Challenge, part of the CVPR 2021 Embodied AI Workshop.
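As a concrete illustration of the proposed auxiliary supervision, the sketch below adds two heads on top of a policy's hidden state that predict spatial metrics relating the agent to a goal: its distance and its relative direction. This is a minimal PyTorch sketch under assumed names and shapes (SpatialAuxHeads, hidden_dim, the direction binning), not the authors' exact architecture.

```python
# Minimal sketch (PyTorch) of auxiliary spatial-metric supervision.
# Module names, shapes, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAuxHeads(nn.Module):
    """Predicts spatial metrics relating the agent to a goal from the
    policy's hidden state: (i) the distance to the goal and (ii) the
    relative direction to it, discretized into bins."""

    def __init__(self, hidden_dim: int = 512, n_direction_bins: int = 12):
        super().__init__()
        self.dist_head = nn.Linear(hidden_dim, 1)                 # regression
        self.dir_head = nn.Linear(hidden_dim, n_direction_bins)   # classification

    def forward(self, h):
        return self.dist_head(h).squeeze(-1), self.dir_head(h)

def auxiliary_loss(heads, h, gt_dist, gt_dir_bin, w_dist=1.0, w_dir=1.0):
    """Auxiliary loss added to the main RL objective. gt_dist is the
    ground-truth distance to the goal and gt_dir_bin the index of the
    ground-truth direction bin, both available at training time from
    the simulator."""
    pred_dist, dir_logits = heads(h)
    loss_dist = F.mse_loss(pred_dist, gt_dist)
    loss_dir = F.cross_entropy(dir_logits, gt_dir_bin)
    return w_dist * loss_dist + w_dir * loss_dir

# Usage inside a training step (schematic):
#   total_loss = rl_loss + lambda_aux * auxiliary_loss(heads, h_t, d_t, b_t)
```

The auxiliary terms only shape the representation during training; at test time the heads can be discarded and the policy acts as usual.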
Related papers
- Improving Zero-Shot ObjectNav with Generative Communication [60.84730028539513]
We propose a new method for improving zero-shot ObjectNav.
Our approach takes into account that the ground agent may have a limited and sometimes obstructed view.
arXiv Detail & Related papers (2024-08-03T22:55:26Z)
- Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks [0.0]
We show how the method of slow feature analysis (SFA) overcomes limitations of standard learned representations by generating interpretable representations of visual data.
We employ SFA in a modern reinforcement learning context, analyse and compare representations and illustrate where hierarchical SFA can outperform other feature extractors on navigation tasks.
arXiv Detail & Related papers (2024-02-19T11:35:01Z)
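For intuition, linear slow feature analysis can be written in a few lines: whiten the input time series, then keep the directions whose temporal differences have the least power. This NumPy sketch is illustrative only (the paper applies hierarchical SFA to visual inputs); linear_sfa and its arguments are assumed names.

```python
# A minimal sketch of linear Slow Feature Analysis (SFA) with NumPy.
import numpy as np

def linear_sfa(X, n_components=2):
    """X: (T, D) time series. Returns the n slowest features,
    i.e. unit-variance projections whose temporal derivative has
    minimal power."""
    X = X - X.mean(axis=0)
    # Whiten: rotate/scale so the data covariance is the identity.
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W_white = evecs / np.sqrt(evals + 1e-8)   # epsilon guards tiny eigenvalues
    Z = X @ W_white
    # Slowness: eigen-decompose the covariance of temporal differences;
    # eigenvectors with the SMALLEST eigenvalues vary most slowly.
    dZ = np.diff(Z, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.cov(dZ, rowvar=False))
    return Z @ d_evecs[:, :n_components]      # eigh sorts ascending: slowest first
```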
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment.
We implement this representation in a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation [61.08389704326803]
Vision-and-language navigation (VLN) is the task of enabling an embodied agent to navigate to a remote location in real scenes by following natural language instructions.
Most previous approaches use either entire-view features or object-centric features to represent navigable candidates.
We propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability.
arXiv Detail & Related papers (2023-03-28T08:00:46Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
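A standard way to realize such a discretizing bottleneck is vector quantization with a straight-through gradient, sketched below; the codebook size, loss weights, and the class name VQBottleneck are assumptions in the spirit of VQ-VAE, not necessarily the paper's exact design.

```python
# A minimal sketch of a discretizing (vector-quantization) bottleneck
# applied to continuous goal embeddings.
import torch
import torch.nn as nn

class VQBottleneck(nn.Module):
    def __init__(self, n_codes=64, dim=128):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z):
        # z: (B, dim) continuous goal embedding.
        # Snap each vector to its nearest codebook entry.
        d = torch.cdist(z, self.codebook.weight)        # (B, n_codes)
        idx = d.argmin(dim=1)
        z_q = self.codebook(idx)
        # Straight-through estimator: quantized forward, identity backward.
        z_st = z + (z_q - z).detach()
        # Codebook and commitment terms pull encoder and codes together.
        vq_loss = ((z_q - z.detach()) ** 2).mean() \
                + 0.25 * ((z - z_q.detach()) ** 2).mean()
        return z_st, idx, vq_loss
```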
- Embodied Learning for Lifelong Visual Perception [33.02424587900808]
We study lifelong visual perception in an embodied setup, where we develop new models and compare various agents that navigate in buildings.
The purpose of the agents is to recognize objects and other semantic classes in the whole building at the end of a process that combines exploration and active visual learning.
arXiv Detail & Related papers (2021-12-28T10:47:13Z)
- Learning to Map for Active Semantic Goal Navigation [40.193928212509356]
We propose a novel framework that actively learns to generate semantic maps outside the field of view of the agent.
We show how different objectives can be defined by balancing exploration with exploitation.
Our method is validated in the visually realistic environments offered by the Matterport3D dataset.
arXiv Detail & Related papers (2021-06-29T18:01:30Z)
- Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z)
- Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment [73.9469267445146]
First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as AI2-THOR pose significant sample-efficiency challenges for reinforcement learning agents.
We show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task.
arXiv Detail & Related papers (2020-10-28T19:27:26Z)
- Exploiting Scene-specific Features for Object Goal Navigation [9.806910643086043]
We introduce a new reduced dataset that speeds up the training of navigation models.
Our proposed dataset permits models that do not exploit online-built maps to be trained in a reasonable time.
We propose the SMTSC model, an attention-based model capable of exploiting the correlation between scenes and objects contained in them.
arXiv Detail & Related papers (2020-08-21T10:16:01Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
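A minimal version of the occupancy-anticipation idea is an encoder-decoder over egocentric map crops that outputs occupancy logits for cells beyond the visible region. The sketch below assumes a two-channel occupied/explored encoding and illustrative layer sizes, not the authors' exact model.

```python
# A minimal sketch of occupancy anticipation: a small conv net that takes
# the occupancy visible in the current egocentric view and predicts
# occupancy for the full local region, including unseen cells.
import torch
import torch.nn as nn

class OccupancyAnticipator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # logits: occupied, explored
        )

    def forward(self, visible_occupancy):
        # visible_occupancy: (B, 2, H, W) egocentric map; cells outside
        # the field of view are zeros and must be anticipated.
        return self.net(visible_occupancy)

# Training (schematic): binary cross-entropy against the ground-truth
# local map, so the model learns layout priors (e.g. walls continue).
#   loss = F.binary_cross_entropy_with_logits(model(x), gt_map)
```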