Related papers: iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion

URL: http://arxiv.org/abs/2511.14149v1
Date: Tue, 18 Nov 2025 05:22:22 GMT
Title: iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion
Authors: Hao Wang, Linqing Zhao, Xiuwei Xu, Jiwen Lu, Haibin Yan
Abstract summary: iGaussian is a two-stage feed-forward framework that achieves real-time camera pose estimation through direct 3D Gaussian inversion. Experimental results on the NeRF Synthetic, Mip-NeRF 360, and T&T+DB datasets demonstrate a significant performance improvement over previous methods.
Score: 62.09575122593993
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent trends in SLAM and visual navigation have embraced 3D Gaussians as the preferred scene representation, highlighting the importance of estimating camera poses from a single image using a pre-built Gaussian model. However, existing approaches typically rely on an iterative render-compare-refine loop, where candidate views are first rendered using NeRF or Gaussian Splatting, then compared against the target image, and finally, discrepancies are used to update the pose. This multi-round process incurs significant computational overhead, hindering real-time performance in robotics. In this paper, we propose iGaussian, a two-stage feed-forward framework that achieves real-time camera pose estimation through direct 3D Gaussian inversion. Our method first regresses a coarse 6DoF pose using a Gaussian Scene Prior-based Pose Regression Network with spatial uniform sampling and guided attention mechanisms, then refines it through feature matching and multi-model fusion. The key contribution lies in our cross-correlation module that aligns image embeddings with 3D Gaussian attributes without differentiable rendering, coupled with a Weighted Multiview Predictor that fuses features from multiple strategically sampled viewpoints. Experimental results on the NeRF Synthetic, Mip-NeRF 360, and T&T+DB datasets demonstrate a significant performance improvement over previous methods, reducing median rotation errors to 0.2° while achieving 2.87 FPS tracking on mobile robots, which is an impressive 10 times speedup compared to optimization-based approaches. Code: https://github.com/pythongod-exe/iGaussian
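
Note on the rotation metric: the abstract reports a median rotation error in degrees. The listing does not state the paper's exact evaluation protocol, so the following is only a minimal sketch of the commonly used geodesic angle between an estimated and a ground-truth rotation matrix. The function names, the `rot_z` helper, and the sample angles are illustrative assumptions, not values or code from the paper.

```python
import math
from statistics import median

def rotation_error_deg(r_est, r_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(r_gt^T r_est) equals the sum of elementwise products.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (hypothetical demo helper)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Made-up estimates against an identity ground truth; not results from the paper.
identity = rot_z(0.0)
errors = [rotation_error_deg(rot_z(d), identity) for d in (0.1, 0.2, 0.5)]
median_error = median(errors)
```

Taking the median rather than the mean keeps a few badly failed frames from dominating the summary, which is presumably why pose benchmarks tend to report it.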

Related papers

SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views [18.814209805277503]
SPFSplatV2, an efficient feed-forward framework for 3D Gaussian splatting from sparse multi-view images, is presented. The method achieves state-of-the-art performance in both in-domain and out-of-domain novel view synthesis.
arXiv Detail & Related papers (2025-09-21T21:37:56Z)
3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians [15.776720879897345]
We introduce 3DGEER, an Exact and Efficient Volumetric Gaussian Rendering method. Our method consistently outperforms prior methods, establishing a new state-of-the-art in real-time neural rendering.
arXiv Detail & Related papers (2025-05-29T22:52:51Z)
GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images. Our model achieves real-time 3D Gaussian reconstruction during inference. This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images [20.089890859122168]
We introduce UniGS, a novel 3D Gaussian reconstruction and novel view synthesis model. UniGS predicts a high-fidelity representation of 3D Gaussians from an arbitrary number of posed sparse-view images.
arXiv Detail & Related papers (2024-10-17T03:48:02Z)
EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Multi-view Camera Settings [11.248908608011941]
3D Gaussian Splatting methods have demonstrated exceptional capability in real-time novel view synthesis for human models. We propose a novel pipeline named EVA-Gaussian for 3D human novel view synthesis across diverse multi-view camera settings.
arXiv Detail & Related papers (2024-10-02T11:23:08Z)
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting [59.277480452459315]
We propose a principled sensitivity pruning score that preserves visual fidelity and foreground details at significantly higher compression ratios. We also propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model without changing its training pipeline.
arXiv Detail & Related papers (2024-06-14T17:53:55Z)
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images [102.7646120414055]
We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians. On the large-scale RealEstate10K and ACID benchmarks, MVSplat achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps).
arXiv Detail & Related papers (2024-03-21T17:59:58Z)
GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos [56.40776739573832]
We present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions. We introduce a pose refinement technique to improve hand and foot pose accuracy by aligning normal maps and silhouettes.
arXiv Detail & Related papers (2024-02-26T14:40:15Z)
GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting [82.29476781526752]
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques. GaussianObject is a framework to represent and render the 3D object with Gaussian splatting that achieves high rendering quality with only 4 input images. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and our own collection of unposed images.
arXiv Detail & Related papers (2024-02-15T18:42:33Z)
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis [70.24111297192057]
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting.
arXiv Detail & Related papers (2023-12-04T18:59:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.