SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views
- URL: http://arxiv.org/abs/2509.17246v1
- Date: Sun, 21 Sep 2025 21:37:56 GMT
- Title: SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views
- Authors: Ranran Huang, Krystian Mikolajczyk,
- Abstract summary: SPFSplatV2, an efficient feed-forward framework for 3D Gaussian splatting from sparse multi-view images, is presented.<n>Method achieves state-of-the-art performance in both in-domain and out-of-domain novel view synthesis.
- Score: 18.814209805277503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SPFSplatV2, an efficient feed-forward framework for 3D Gaussian splatting from sparse multi-view images, requiring no ground-truth poses during training and inference. It employs a shared feature extraction backbone, enabling simultaneous prediction of 3D Gaussian primitives and camera poses in a canonical space from unposed inputs. A masked attention mechanism is introduced to efficiently estimate target poses during training, while a reprojection loss enforces pixel-aligned Gaussian primitives, providing stronger geometric constraints. We further demonstrate the compatibility of our training framework with different reconstruction architectures, resulting in two model variants. Remarkably, despite the absence of pose supervision, our method achieves state-of-the-art performance in both in-domain and out-of-domain novel view synthesis, even under extreme viewpoint changes and limited image overlap, and surpasses recent methods that rely on geometric supervision for relative pose estimation. By eliminating dependence on ground-truth poses, our method offers the scalability to leverage larger and more diverse datasets. Code and pretrained models will be available on our project page: https://ranrhuang.github.io/spfsplatv2/.
Related papers
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches.<n>Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details.<n>We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z) - Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting [33.7339252839354]
We introduce a new feed-forward architecture that detects 3D Gaussian primitives at a sub-pixel level.<n>Inspired by keypoint detection, our decoder learns to distribute primitives across image patches.<n>Our resulting pose-free model generates scenes in seconds, achieving state-of-the-art novel view synthesis for feed-forward models.
arXiv Detail & Related papers (2025-12-17T14:59:21Z) - iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion [62.09575122593993]
iGaussian is a two-stage feed-forward framework that achieves real-time camera pose estimation through direct 3D Gaussian inversion.<n> Experimental results on the NeRF Synthetic, Mip-NeRF 360, and T&T+DB datasets demonstrate a significant performance improvement over previous methods.
arXiv Detail & Related papers (2025-11-18T05:22:22Z) - Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes [10.279549537925062]
3D Gaussian Splatting (3DGS) has emerged as a key rendering pipeline for digital asset creation.<n>To address the issues of unstable pose estimation and scene representation distortion, we approach the problem from two aspects.<n>For pose estimation, we leverage LiDAR-IMU Odometry to provide prior poses for cameras in large-scale environments.<n>For scene representation, we introduce normal vector constraints and effective rank regularization to enforce consistency in the direction and shape of Gaussian primitives.
arXiv Detail & Related papers (2025-11-10T06:45:08Z) - No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views [17.221166075016257]
SPFSplat is an efficient framework for 3D Gaussian splatting from sparse multi-view images.<n>It employs a shared feature extraction backbone, enabling simultaneous prediction of 3D Gaussian primitives and camera poses.<n>It achieves state-of-the-art performance in novel view synthesis even under significant viewpoint changes and limited image overlap.
arXiv Detail & Related papers (2025-08-02T03:19:13Z) - UFV-Splatter: Pose-Free Feed-Forward 3D Gaussian Splatting Adapted to Unfavorable Views [9.974268614169155]
A common rendering setup for training feed-forward approaches places a 3D object at the world origin and renders it from cameras pointed toward the origin.<n>We introduce a novel adaptation framework that enables pretrained pose-free feed-forward 3DGS models to handle unfavorable views.
arXiv Detail & Related papers (2025-07-30T02:56:47Z) - Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing [6.997091164331322]
Visual relocalization is fundamental to remote sensing and UAV applications.<n>Existing methods face inherent trade-offs: image-based retrieval and pose regression approaches lack precision.<n>We introduce $mathrmHi2$-GSLoc, a dual-hierarchical relocalization framework that follows a sparse-to-dense and coarse-to-fine paradigm.
arXiv Detail & Related papers (2025-07-21T14:47:56Z) - GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z) - No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.<n>Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos [56.40776739573832]
We present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA)
Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions.
We introduce a pose refinement technique to improve hand and foot pose accuracy by aligning normal maps and silhouettes.
arXiv Detail & Related papers (2024-02-26T14:40:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.