IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
- URL: http://arxiv.org/abs/2508.00823v1
- Date: Fri, 01 Aug 2025 17:59:56 GMT
- Title: IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
- Authors: Wenxuan Guo, Xiuwei Xu, Hang Yin, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu,
- Abstract summary: IGL-Nav is an Incremental 3D Gaussian framework for efficient and 3D-aware image-goal navigation.<n>It can handle the more challenging free-view image-goal setting and be deployed on real-world robotic platform.
- Score: 78.00035681410348
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual navigation with an image as goal is a fundamental and challenging problem. Conventional methods either rely on end-to-end RL learning or modular-based policy with topological graph or BEV map as memory, which cannot fully model the geometric relationship between the explored 3D environment and the goal image. In order to efficiently and accurately localize the goal image in 3D space, we build our navigation system upon the renderable 3D gaussian (3DGS) representation. However, due to the computational intensity of 3DGS optimization and the large search space of 6-DoF camera pose, directly leveraging 3DGS for image localization during agent exploration process is prohibitively inefficient. To this end, we propose IGL-Nav, an Incremental 3D Gaussian Localization framework for efficient and 3D-aware image-goal navigation. Specifically, we incrementally update the scene representation as new images arrive with feed-forward monocular prediction. Then we coarsely localize the goal by leveraging the geometric information for discrete space matching, which can be equivalent to efficient 3D convolution. When the agent is close to the goal, we finally solve the fine target pose with optimization via differentiable rendering. The proposed IGL-Nav outperforms existing state-of-the-art methods by a large margin across diverse experimental configurations. It can also handle the more challenging free-view image-goal setting and be deployed on real-world robotic platform using a cellphone to capture goal image at arbitrary pose. Project page: https://gwxuan.github.io/IGL-Nav/.
Related papers
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation [9.77843053500054]
We propose SGLoc, a novel localization system that directly regresses camera poses from 3D Gaussian Splatting (3DGS) representation by leveraging semantic information.<n>Our method utilizes the semantic relationship between 2D image and 3D scene representation to estimate the 6DoF pose without prior pose information.
arXiv Detail & Related papers (2025-07-16T08:39:08Z) - 3DGeoDet: General-purpose Geometry-aware Image-based 3D Object Detection [17.502554516157893]
3DGeoDet is a novel geometry-aware 3D object detection approach.<n>It effectively handles single- and multi-view RGB images in indoor and outdoor environments.
arXiv Detail & Related papers (2025-06-11T09:18:36Z) - GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency.<n>Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z) - GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs [33.74118487769923]
We introduce GSemSplat, a framework that learns semantic representations linked to 3D Gaussians without per-scene optimization, dense image collections or calibration.<n>We employ a dual-feature approach that leverages both region-specific and context-aware semantic features as supervision in the 2D space.
arXiv Detail & Related papers (2024-12-22T09:06:58Z) - SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z) - GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.<n>We propose a new framework for IIN, Gaussian Splatting for Visual Navigation (GaussNav), which constructs a novel map representation based on 3D Gaussian Splatting (3DGS)<n>Our GaussNav framework demonstrates a significant performance improvement, with Success weighted by Path Length (SPL) increasing from 0.347 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
arXiv Detail & Related papers (2024-03-18T09:56:48Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
In this paper, we propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.