Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
- URL: http://arxiv.org/abs/2402.17587v3
- Date: Fri, 22 Mar 2024 07:23:51 GMT
- Title: Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
- Authors: Xiaohan Lei, Min Wang, Wengang Zhou, Li Li, Houqiang Li,
- Abstract summary: Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment.
We propose a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image goal navigation.
Our method surpasses previous state-of-the-art work, with a classical segmentation model (0.684 vs. 0.561 success) or a robust model (0.702 vs. 0.561 success)
- Score: 88.84058353659107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a new embodied vision task, Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment. The main challenge of this task lies in identifying the target object from different viewpoints while rejecting similar distractors. Existing ImageGoal Navigation methods usually adopt the simple Exploration-Exploitation framework and ignore the identification of specific instance during navigation. In this work, we propose to imitate the human behaviour of ``getting closer to confirm" when distinguishing objects from a distance. Specifically, we design a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image goal navigation. Our method allows for active switching among the exploration, verification, and exploitation actions, thereby facilitating the agent in making reasonable decisions under different situations. On the challenging HabitatMatterport 3D semantic (HM3D-SEM) dataset, our method surpasses previous state-of-the-art work, with a classical segmentation model (0.684 vs. 0.561 success) or a robust model (0.702 vs. 0.561 success)
Related papers
- GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation [65.71524410114797]
GOAT-Bench is a benchmark for the universal navigation task GO to AnyThing (GOAT)
In GOAT, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image.
We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities.
arXiv Detail & Related papers (2024-04-09T20:40:00Z) - Prioritized Semantic Learning for Zero-shot Instance Navigation [2.537056548731396]
We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training.
We propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents.
Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task.
arXiv Detail & Related papers (2024-03-18T10:45:50Z) - Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z) - How To Not Train Your Dragon: Training-free Embodied Object Goal
Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z) - Navigating to Objects Specified by Images [86.9672766351891]
We present a system that can perform the task in both simulation and the real world.
Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation.
On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy 7x and a state-of-the-art ImageNav model 2.3x.
arXiv Detail & Related papers (2023-04-03T17:58:00Z) - Instance-Specific Image Goal Navigation: Training Embodied Agents to
Find Object Instances [90.61897965658183]
We consider the problem of embodied visual navigation given an image-goal (ImageNav)
Unlike related navigation tasks, ImageNav does not have a standardized task definition which makes comparison across methods difficult.
We present the Instance-specific ImageNav task (ImageNav) to address these limitations.
arXiv Detail & Related papers (2022-11-29T02:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.