GaussNav: Gaussian Splatting for Visual Navigation
- URL: http://arxiv.org/abs/2403.11625v2
- Date: Wed, 20 Mar 2024 02:01:58 GMT
- Title: GaussNav: Gaussian Splatting for Visual Navigation
- Authors: Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li,
- Abstract summary: Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
Our framework constructs a novel map representation based on 3D Gaussian Splatting (3DGS)
Our framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
- Score: 92.13664084464514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors. Existing map-based navigation methods largely adopt the representation form of Bird's Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene. To address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects. Our GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. Our code will be made publicly available.
Related papers
- GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z) - HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z) - GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting [82.29476781526752]
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques.
GaussianObject is a framework to represent and render the 3D object with Gaussian splatting that achieves high rendering quality with only 4 input images.
GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and our-collected unposed images.
arXiv Detail & Related papers (2024-02-15T18:42:33Z) - VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language
Model [28.79971953667143]
VoroNav is a semantic exploration framework to extract exploratory paths and planning nodes from a semantic map constructed in real time.
By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model.
arXiv Detail & Related papers (2024-01-05T08:05:07Z) - Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z) - 3D-Aware Object Goal Navigation via Simultaneous Exploration and
Identification [19.125633699422117]
We propose a framework for 3D-aware ObjectNav based on two straightforward sub-policies.
Our framework achieves the best performance among all modular-based methods on the Matterport3D and Gibson datasets.
arXiv Detail & Related papers (2022-12-01T07:55:56Z) - SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depicts step-by-step.
This approach deviates from real-world problems in which human-only describes what the object and its surrounding look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.