SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation
- URL: http://arxiv.org/abs/2507.12027v1
- Date: Wed, 16 Jul 2025 08:39:08 GMT
- Title: SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation
- Authors: Beining Xu, Siting Zhu, Hesheng Wang
- Abstract summary: We propose SGLoc, a novel localization system that directly regresses camera poses from 3D Gaussian Splatting (3DGS) representation by leveraging semantic information. Our method utilizes the semantic relationship between 2D image and 3D scene representation to estimate the 6DoF pose without prior pose information.
- Score: 9.77843053500054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SGLoc, a novel localization system that directly regresses camera poses from 3D Gaussian Splatting (3DGS) representation by leveraging semantic information. Our method utilizes the semantic relationship between 2D image and 3D scene representation to estimate the 6DoF pose without prior pose information. In this system, we introduce a multi-level pose regression strategy that progressively estimates and refines the pose of query image from the global 3DGS map, without requiring initial pose priors. Moreover, we introduce a semantic-based global retrieval algorithm that establishes correspondences between 2D (image) and 3D (3DGS map). By matching the extracted scene semantic descriptors of 2D query image and 3DGS semantic representation, we align the image with the local region of the global 3DGS map, thereby obtaining a coarse pose estimation. Subsequently, we refine the coarse pose by iteratively optimizing the difference between the query image and the rendered image from 3DGS. Our SGLoc demonstrates superior performance over baselines on 12scenes and 7scenes datasets, showing excellent capabilities in global localization without initial pose prior. Code will be available at https://github.com/IRMVLab/SGLoc.
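The coarse-to-fine flow described in the abstract (descriptor matching for a coarse pose, then iterative refinement against a rendered image) can be illustrated with a minimal sketch. This is not the SGLoc implementation: the function names, the 2-parameter toy pose, the `toy_render` stand-in for 3DGS rendering, the cosine-similarity matching, and the finite-difference gradient are all assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_coarse_pose(query_descriptor, map_regions):
    """Pick the map region whose descriptor best matches the query.

    Each region is a dict with hypothetical keys "descriptor" and "pose".
    """
    if not map_regions:
        raise ValueError("map_regions must not be empty")
    best = max(
        map_regions,
        key=lambda r: cosine_similarity(query_descriptor, r["descriptor"]),
    )
    return list(best["pose"])


def photometric_loss(rendered, query_image):
    """Mean squared difference between rendered and query pixel values."""
    return sum((r - q) ** 2 for r, q in zip(rendered, query_image)) / len(query_image)


def refine_pose(coarse_pose, render, query_image, steps=200, lr=0.1, eps=1e-4):
    """Gradient descent on the photometric loss.

    Central finite differences stand in for a differentiable renderer.
    """
    pose = list(coarse_pose)
    for _ in range(steps):
        grad = []
        for i in range(len(pose)):
            plus, minus = list(pose), list(pose)
            plus[i] += eps
            minus[i] -= eps
            diff = photometric_loss(render(plus), query_image) - photometric_loss(
                render(minus), query_image
            )
            grad.append(diff / (2 * eps))
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return pose


def toy_render(pose):
    """Hypothetical stand-in for rendering: maps a 2-parameter pose to 3 'pixels'."""
    x, y = pose
    return [x, y, x + y]


# Toy usage: two map regions, a query taken at the (unknown) pose [1.0, 2.0].
map_regions = [
    {"descriptor": [1.0, 0.0, 0.0], "pose": [0.8, 2.3]},
    {"descriptor": [0.0, 1.0, 0.0], "pose": [5.0, 5.0]},
]
query_image = toy_render([1.0, 2.0])
coarse = retrieve_coarse_pose([0.9, 0.1, 0.0], map_regions)
refined = refine_pose(coarse, toy_render, query_image)
print("coarse:", coarse, "refined:", refined)
```

A real system would replace the toy pose with a full 6DoF parameterization and the toy renderer with a differentiable 3DGS rasterizer. The sketch only shows why a good coarse estimate matters: refinement starts close enough to the true pose for local optimization to converge.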
Related papers
- IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation [78.00035681410348]
IGL-Nav is an Incremental 3D Gaussian framework for efficient and 3D-aware image-goal navigation.
It can handle the more challenging free-view image-goal setting and be deployed on real-world robotic platform.
arXiv Detail & Related papers (2025-08-01T17:59:56Z) - Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization [29.793562435104707]
We propose a scene representation for visual localization that combines an explicit geometry model (3DGS) with an implicit feature field.
We use a 3D structure-informed clustering procedure to regularize the representation learning and seamlessly convert the features to segmentations.
The resulting privacy- and non-privacy-preserving localization pipelines, evaluated on multiple real-world datasets, show state-of-the-art performances.
arXiv Detail & Related papers (2025-07-31T13:58:15Z) - EG-Gaussian: Epipolar Geometry and Graph Network Enhanced 3D Gaussian Splatting [9.94641948288285]
EG-Gaussian utilizes epipolar geometry and graph networks for 3D scene reconstruction.
Our approach significantly improves reconstruction accuracy compared to 3DGS-based methods.
arXiv Detail & Related papers (2025-04-18T08:10:39Z) - OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies [112.80292725951921]
OVGaussian is a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation.
We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images.
To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a
arXiv Detail & Related papers (2024-12-31T07:55:35Z) - Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels.
FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z) - No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z) - LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images [7.363332481155945]
This paper presents a vision-based localization pipeline utilizing the 3D Gaussian Splatting (GS) technique as scene representation.
During the mapping phase, structure-from-motion (SfM) is applied first, followed by the generation of a GS map.
High-precision pose is achieved through the analysis-by-synthesis manner on the map.
arXiv Detail & Related papers (2024-10-15T11:17:18Z) - GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
We propose a two-stage procedure that integrates dense and robust keypoint descriptors from the lightweight XFeat feature extractor into 3DGS.
In the second stage, the initial pose estimate is refined by minimizing the rendering-based photometric warp loss.
Benchmarking on widely used indoor and outdoor datasets demonstrates improvements over recent neural rendering-based localization methods.
arXiv Detail & Related papers (2024-09-24T23:18:32Z) - InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed.
InstantSplat employs a self-supervised framework that optimizes 3D scene representation and camera poses.
It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data [73.06536202251915]
3D Shape represented as point cloud has achieved advancements in multimodal pre-training to align image and language descriptions.
We propose GS-CLIP for the first attempt to introduce 3DGS into multimodal pre-training to enhance 3D representation.
arXiv Detail & Related papers (2024-02-09T05:46:47Z) - Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild [96.09941542587865]
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.
In this way, we precisely align 3D models to objects in RGB images which results in significantly improved 3D pose estimates.
We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement compared to state-of-the-art refinement methods in multiple metrics.
arXiv Detail & Related papers (2020-07-17T12:34:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.