SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2503.05174v1
- Date: Fri, 07 Mar 2025 06:40:06 GMT
- Title: SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
- Authors: Linqi Yang, Xiongwei Zhao, Qihao Sun, Ke Wang, Ao Chen, Peng Kang,
- Abstract summary: We introduce SplatPose, a novel framework that synergizes 3D Gaussian Splatting (3DGS) with a dual-branch neural architecture to achieve high-precision pose estimation.<n> Experiments on three benchmark datasets demonstrate that SplatPose achieves state-of-the-art 6-DoF pose estimation accuracy in single RGB settings.
- Score: 3.6688867031495223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 6-DoF pose estimation is a fundamental task in computer vision with wide-ranging applications in augmented reality and robotics. Existing single RGB-based methods often compromise accuracy due to their reliance on initial pose estimates and susceptibility to rotational ambiguity, while approaches requiring depth sensors or multi-view setups incur significant deployment costs. To address these limitations, we introduce SplatPose, a novel framework that synergizes 3D Gaussian Splatting (3DGS) with a dual-branch neural architecture to achieve high-precision pose estimation using only a single RGB image. Central to our approach is the Dual-Attention Ray Scoring Network (DARS-Net), which innovatively decouples positional and angular alignment through geometry-domain attention mechanisms, explicitly modeling directional dependencies to mitigate rotational ambiguity. Additionally, a coarse-to-fine optimization pipeline progressively refines pose estimates by aligning dense 2D features between query images and 3DGS-synthesized views, effectively correcting feature misalignment and depth errors from sparse ray sampling. Experiments on three benchmark datasets demonstrate that SplatPose achieves state-of-the-art 6-DoF pose estimation accuracy in single RGB settings, rivaling approaches that depend on depth or multi-view images.
Related papers
- Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames [10.859307261818362]
Estimating the 6D pose of textureless objects from RBG images is an important problem in robotics.<n>We propose a comprehensive active perception framework for estimating the 6D poses of textureless objects using only RGB images.
arXiv Detail & Related papers (2025-03-05T18:28:32Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
We propose a two-stage procedure that integrates dense and robust keypoint descriptors from the lightweight XFeat feature extractor into 3DGS.<n>In the second stage, the initial pose estimate is refined by minimizing the rendering-based photometric warp loss.<n> Benchmarking on widely used indoor and outdoor datasets demonstrates improvements over recent neural rendering-based localization methods.
arXiv Detail & Related papers (2024-09-24T23:18:32Z) - GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting [25.780452115246245]
We propose a novel test-time camera pose refinement (CPR) framework, GS-CPR.<n>This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods.<n>The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences.
arXiv Detail & Related papers (2024-08-20T17:58:23Z) - Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z) - InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel and lightning-fast neural reconstruction system that builds accurate 3D representations from as few as 2-3 images.<n>InstantSplat integrates dense stereo priors and co-visibility relationships between frames to initialize pixel-aligned by progressively expanding the scene.<n>It achieves an acceleration of over 20 times in reconstruction, improves visual quality (SSIM) from 0.3755 to 0.7624 than COLMAP with 3D-GS, and is compatible with multiple 3D representations.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - RGB-based Category-level Object Pose Estimation via Decoupled Metric
Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z) - A Combined Approach Toward Consistent Reconstructions of Indoor Spaces
Based on 6D RGB-D Odometry and KinectFusion [7.503338065129185]
We propose a 6D RGB-D odometry approach that finds the relative camera pose between consecutive RGB-D frames by keypoint extraction.
We feed the estimated pose to the highly accurate KinectFusion algorithm, which fine-tune the frame-to-frame relative pose.
Our algorithm outputs a ready-to-use polygon mesh (highly suitable for creating 3D virtual worlds) without any postprocessing steps.
arXiv Detail & Related papers (2022-12-25T22:52:25Z) - SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation [98.83762558394345]
SO-Pose is a framework for regressing all 6 degrees-of-freedom (6DoF) for the object pose in a cluttered environment from a single RGB image.
We introduce a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects.
Cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we can further improve accuracy and robustness.
arXiv Detail & Related papers (2021-08-18T19:49:29Z) - Extreme Rotation Estimation using Dense Correlation Volumes [73.35119461422153]
We present a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting.
We observe that, even when images do not overlap, there may be rich hidden cues as to their geometric relationship.
We propose a network design that can automatically learn such implicit cues by comparing all pairs of points between the two input images.
arXiv Detail & Related papers (2021-04-28T02:00:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.