Prioritized Semantic Learning for Zero-shot Instance Navigation
- URL: http://arxiv.org/abs/2403.11650v2
- Date: Tue, 16 Jul 2024 18:13:07 GMT
- Title: Prioritized Semantic Learning for Zero-shot Instance Navigation
- Authors: Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang
- Abstract summary: We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training.
We propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents.
Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training. Previous object navigation approaches apply the image-goal navigation (ImageNav) task (go to the location of an image) for pretraining, and transfer the agent to achieve object goals using a vision-language model. However, these approaches lead to issues of semantic neglect, where the model fails to learn meaningful semantic alignments. In this paper, we propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents. Specifically, a semantic-enhanced PSL agent is proposed and a prioritized semantic training strategy is introduced to select goal images that exhibit clear semantic supervision and relax the reward function from strict exact view matching. At inference time, a semantic expansion inference scheme is designed to preserve the same granularity level of the goal semantic as training. Furthermore, for the popular HM3D environment, we present an Instance Navigation (InstanceNav) task that requires going to a specific object instance with detailed descriptions, as opposed to the Object Navigation (ObjectNav) task where the goal is defined merely by the object category. Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task. Code will be released at https://github.com/XinyuSun/PSL-InstanceNav.
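The abstract describes prioritized semantic training only at a high level (code is promised at the repository above). As a rough, hypothetical sketch of the goal-selection idea, the snippet below scores candidate goal views with CLIP and keeps only those where one object category clearly dominates; the model choice, category list, helper name, and margin threshold are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of prioritized semantic goal selection: keep goal images
# whose vision-language embedding aligns clearly with a single object category,
# and drop semantically ambiguous views.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

CATEGORIES = ["chair", "bed", "plant", "toilet", "tv_monitor", "sofa"]

@torch.no_grad()
def select_semantic_goals(image_paths, margin=0.1):
    """Return goal images whose top category clearly dominates (assumed criterion)."""
    text = clip.tokenize([f"a photo of a {c}" for c in CATEGORIES]).to(device)
    text_feat = model.encode_text(text)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)

    selected = []
    for path in image_paths:
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        img_feat = model.encode_image(image)
        img_feat /= img_feat.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_feat @ text_feat.T).softmax(dim=-1).squeeze(0)
        top2 = probs.topk(2).values
        # "Clear semantic supervision": the best category must beat the
        # runner-up by a margin, otherwise the view is discarded.
        if (top2[0] - top2[1]).item() > margin:
            selected.append(path)
    return selected
```

A training pipeline would then use only the selected views as ImageNav goals, which is the sense in which semantically clear supervision is "prioritized" in this sketch.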
Related papers
- GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
We propose a new framework for IIN, Gaussian Splatting for Visual Navigation (GaussNav), which constructs a novel map representation based on 3D Gaussian Splatting (3DGS).
Our GaussNav framework demonstrates a significant performance improvement, with Success weighted by Path Length (SPL) increasing from 0.347 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
arXiv Detail & Related papers (2024-03-18T09:56:48Z)
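Success weighted by Path Length (SPL), the metric GaussNav improves from 0.347 to 0.578 above, is a standard embodied-navigation measure (Anderson et al., 2018): SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i marks success on episode i, l_i is the shortest-path length, and p_i is the agent's actual path length. A minimal reference implementation (not tied to any paper's codebase):

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length (Anderson et al., 2018).

    successes[i]        -- 1 if episode i reached the goal, else 0
    shortest_lengths[i] -- geodesic shortest-path length for episode i
    path_lengths[i]     -- length of the path the agent actually took
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)

# Example: two episodes, one success along a near-optimal path.
print(spl([1, 0], [5.0, 4.0], [6.0, 10.0]))  # ≈ 0.4167
```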
- Right Place, Right Time! Generalizing ObjectNav to Dynamic Environments with Portable Targets [55.581423861790945]
We present a novel formulation to generalize ObjectNav to dynamic environments with non-stationary objects.
We first address several challenging issues with dynamizing existing topological scene graphs.
We then present a benchmark for P-ObjectNav using a combination of reinforcement learning, and Large Language Model (LLM)-based navigation approaches.
arXiv Detail & Related papers (2024-03-14T22:33:22Z)
- Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation [88.84058353659107]
Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment.
We propose a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image goal navigation.
Our method surpasses the previous state-of-the-art, whether using a classical segmentation model (0.684 vs. 0.561 success) or a robust one (0.702 vs. 0.561 success).
arXiv Detail & Related papers (2024-02-25T07:59:10Z)
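The IEVE entry above names its three phases but not their mechanics. Purely as a hypothetical illustration of an exploration-verification-exploitation control loop (the thresholds, helper names, and transition rules are assumptions, not IEVE's actual design):

```python
from enum import Enum, auto

class Mode(Enum):
    EXPLORE = auto()  # search the environment for candidate objects
    VERIFY = auto()   # approach a candidate and check it against the goal image
    EXPLOIT = auto()  # head straight for the confirmed instance

def next_mode(mode, match_score, low=0.3, high=0.8):
    """Hypothetical mode switching driven by how well the current
    observation matches the goal image (match_score in [0, 1])."""
    if mode is Mode.EXPLORE and match_score >= low:
        return Mode.VERIFY        # plausible candidate spotted
    if mode is Mode.VERIFY:
        if match_score >= high:
            return Mode.EXPLOIT   # candidate confirmed
        if match_score < low:
            return Mode.EXPLORE   # false positive; resume searching
    return mode

# Example: match confidence rises as the agent approaches the true instance.
mode = Mode.EXPLORE
for score in (0.1, 0.4, 0.4, 0.9):
    mode = next_mode(mode, score)
print(mode)  # Mode.EXPLOIT
```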
"Zero-shot" means that the target the agent needs to find is not trained during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
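Scoring geometric frontiers with semantic knowledge, as the training-free entry above describes, can be illustrated with a small assumed sketch (the co-occurrence table, weights, and scoring rule below are invented for illustration and are not the paper's method):

```python
# Hypothetical frontier scoring: combine travel cost with a language-prior
# estimate of how likely the goal category is near objects already observed
# around each frontier (e.g., a TV is likely near a sofa).
CO_OCCURRENCE_PRIOR = {
    ("tv_monitor", "sofa"): 0.9,
    ("tv_monitor", "bed"): 0.2,
    ("plant", "window"): 0.7,
}

def frontier_score(goal, nearby_objects, distance, alpha=1.0, beta=0.1):
    semantic = max(
        (CO_OCCURRENCE_PRIOR.get((goal, obj), 0.1) for obj in nearby_objects),
        default=0.1,
    )
    # Prefer semantically promising frontiers, discounted by travel distance.
    return alpha * semantic - beta * distance

# Pick the best frontier among candidates (object lists and distances made up).
frontiers = [(["sofa"], 3.0), (["bed"], 1.5), ([], 0.5)]
best = max(frontiers, key=lambda f: frontier_score("tv_monitor", f[0], f[1]))
print(best)  # (['sofa'], 3.0): the semantic prior outweighs the longer travel
```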
- Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances [90.61897965658183]
We consider the problem of embodied visual navigation given an image-goal (ImageNav)
Unlike related navigation tasks, ImageNav does not have a standardized task definition which makes comparison across methods difficult.
We present the Instance-specific ImageNav task (IIN) to address these limitations.
arXiv Detail & Related papers (2022-11-29T02:29:35Z)
- Learning to Map for Active Semantic Goal Navigation [40.193928212509356]
We propose a novel framework that actively learns to generate semantic maps outside the field of view of the agent.
We show how different objectives can be defined by balancing exploration with exploitation.
Our method is validated in the visually realistic environments offered by the Matterport3D dataset.
arXiv Detail & Related papers (2021-06-29T18:01:30Z)
- SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation [22.0915442335966]
This paper focuses on visual semantic navigation, the task of producing actions for an active agent to navigate to a specified target object category in an unknown environment.
We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module.
Our experiments demonstrate that the proposed scene completion module improves the efficiency of the downstream navigation policies.
arXiv Detail & Related papers (2020-12-08T15:59:47Z)
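As a toy analogue of the confidence-aware scene completion idea in the SSCNav entry above (the architecture below is a minimal stand-in, not SSCNav's network): a model completes a partial top-down semantic grid and emits a per-cell confidence that a downstream navigation policy can threshold.

```python
import torch
import torch.nn as nn

class TinyCompletion(nn.Module):
    """Toy stand-in for a semantic scene completion module: predicts per-cell
    class logits plus a confidence map for a top-down semantic grid."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(num_classes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(32, num_classes, 1)   # completed semantics
        self.conf_head = nn.Conv2d(32, 1, 1)            # confidence in [0, 1]

    def forward(self, partial_semantic_grid):
        h = self.backbone(partial_semantic_grid)
        return self.seg_head(h), torch.sigmoid(self.conf_head(h))

# A downstream policy can ignore completed regions below a confidence threshold.
model = TinyCompletion()
logits, conf = model(torch.zeros(1, 6, 64, 64))
trusted = conf > 0.5
```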
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.