SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic
Navigation
- URL: http://arxiv.org/abs/2012.04512v2
- Date: Mon, 22 Mar 2021 01:15:16 GMT
- Title: SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic
Navigation
- Authors: Yiqing Liang, Boyuan Chen, Shuran Song
- Abstract summary: This paper focuses on visual semantic navigation, the task of producing actions for an active agent to navigate to a specified target object category in an unknown environment.
We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module.
Our experiments demonstrate that the proposed scene completion module improves the efficiency of the downstream navigation policies.
- Score: 22.0915442335966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on visual semantic navigation, the task of producing
actions for an active agent to navigate to a specified target object category
in an unknown environment. To complete this task, the algorithm should
simultaneously locate and navigate to an instance of the category. In
comparison to traditional point-goal navigation, this task requires the
agent to have a stronger contextual prior of indoor environments. We introduce
SSCNav, an algorithm that explicitly models scene priors using a
confidence-aware semantic scene completion module to complete the scene and
guide the agent's navigation planning. Given a partial observation of the
environment, SSCNav first infers a complete scene representation with semantic
labels for the unobserved scene together with a confidence map associated with
its own prediction. Then, a policy network infers the action from the scene
completion result and confidence map. Our experiments demonstrate that the
proposed scene completion module improves the efficiency of the downstream
navigation policies. Video, code, and data: https://sscnav.cs.columbia.edu/
Related papers
- SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation [83.4599149936183]
Existing zero-shot object navigation methods prompt the LLM with the text of spatially close objects.
We propose to represent the observed scene with 3D scene graph.
We conduct extensive experiments on MP3D, HM3D and RoboTHOR environments, where SG-Nav surpasses previous state-of-the-art zero-shot methods by more than 10% SR on all benchmarks.
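SG-Nav's summary describes serializing an observed 3D scene graph into a prompt for an LLM. A hedged sketch of what that serialization might look like (the graph format, relation names, and prompt wording are hypothetical, not the paper's actual prompt):

```python
# Hypothetical scene graph: nodes are observed objects, edges are
# spatial relations between them.
scene_graph = {
    "nodes": ["chair", "table", "door"],
    "edges": [("chair", "next to", "table"),
              ("table", "near", "door")],
}

def graph_to_prompt(graph, goal):
    """Flatten a scene graph into a text prompt asking the LLM which
    observed object is the best waypoint toward the goal category."""
    lines = [f"Objects: {', '.join(graph['nodes'])}."]
    lines += [f"{a} is {rel} {b}." for a, rel, b in graph["edges"]]
    lines.append(f"Which object should I approach to find a {goal}?")
    return "\n".join(lines)

print(graph_to_prompt(scene_graph, "couch"))
```

Encoding relations explicitly (rather than a flat object list) is what lets the LLM reason about spatial structure when ranking candidate directions.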
arXiv Detail & Related papers (2024-10-10T17:57:19Z)
- Prioritized Semantic Learning for Zero-shot Instance Navigation [2.537056548731396]
We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training.
We propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents.
Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task.
arXiv Detail & Related papers (2024-03-18T10:45:50Z)
- Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [16.32780793344835]
We propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation.
Our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception.
The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability.
arXiv Detail & Related papers (2024-02-29T06:31:18Z)
- Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network [3.0820097046465285]
"Zero-shot" means that the target the agent needs to find is not trained during the training phase.
We propose the Class-Independent Relationship Network (CIRN) to address the issue of coupling navigation ability with target features during training.
Our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task.
arXiv Detail & Related papers (2023-10-15T16:42:14Z)
- Explore and Tell: Embodied Visual Captioning in 3D Environments [83.00553567094998]
In real-world scenarios, a single image may not offer a good viewpoint, hindering fine-grained scene understanding.
We propose a novel task called Embodied Captioning, which equips visual captioning models with navigation capabilities.
We propose a Cascade Embodied Captioning model (CaBOT), which comprises a navigator and a captioner, to tackle this task.
arXiv Detail & Related papers (2023-08-21T03:46:04Z)
- How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
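The training-free method above scores geometric frontiers by propagating semantic knowledge from language priors. A toy sketch of that idea, assuming a hypothetical co-occurrence prior and hand-made frontier labels (not the paper's actual statistics or scoring rule):

```python
# Assumed language prior: likelihood of finding the goal ("couch")
# near each observed object label. Values are illustrative only.
prior = {"tv": 0.8, "table": 0.5, "sink": 0.1}

def score_frontier(nearby_labels, goal_prior):
    """Score a frontier by the strongest semantic cue among the object
    labels observed near it; unknown labels contribute nothing."""
    return max((goal_prior.get(l, 0.0) for l in nearby_labels), default=0.0)

# Objects detected near each candidate frontier in the partial map.
frontiers = {"A": ["tv", "table"], "B": ["sink"], "C": []}
best = max(frontiers, key=lambda f: score_frontier(frontiers[f], prior))
print(best)  # → A
```

The agent then navigates to the best-scoring frontier, replacing a learned exploration policy with priors over which rooms plausibly contain the goal.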
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
- ESceme: Vision-and-Language Navigation with Episodic Scene Memory [72.69189330588539]
Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.
We introduce a mechanism of Episodic Scene memory (ESceme) for VLN that wakes an agent's memories of past visits when it enters the current scene.
arXiv Detail & Related papers (2023-03-02T07:42:07Z)
- Predicting Dense and Context-aware Cost Maps for Semantic Robot Navigation [35.45993685414002]
We investigate the task of object goal navigation in unknown environments where the target is specified by a semantic label.
We propose a deep neural network architecture and loss function to predict dense cost maps that implicitly contain semantic context.
We also present a novel way of fusing mid-level visual representations in our architecture to provide additional semantic cues for cost map prediction.
arXiv Detail & Related papers (2022-10-17T11:43:19Z)
- VTNet: Visual Transformer Network for Object Goal Navigation [36.15625223586484]
We introduce a Visual Transformer Network (VTNet) for learning informative visual representation in navigation.
In a nutshell, VTNet embeds object and region features with their location cues as spatial-aware descriptors.
Experiments in the artificial environment AI2-Thor demonstrate that VTNet significantly outperforms state-of-the-art methods in unseen testing environments.
arXiv Detail & Related papers (2021-05-20T01:23:15Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions.
This setup deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose Structured Scene Memory (SSM), a crucial architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.