YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2511.07321v1
- Date: Mon, 10 Nov 2025 17:21:54 GMT
- Title: YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
- Authors: Botao Ye, Boqi Chen, Haofei Xu, Daniel Barath, Marc Pollefeys,
- Abstract summary: YoNoSplat is a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images.<n>Our model is highly versatile, operating effectively with both posed and unposed, calibrated and uncalibrated inputs.<n>It achieves state-of-the-art performance on standard benchmarks in both pose-free and pose-dependent settings.
- Score: 79.38712054342625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fast and flexible 3D scene reconstruction from unstructured image collections remains a significant challenge. We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images. Our model is highly versatile, operating effectively with both posed and unposed, calibrated and uncalibrated inputs. YoNoSplat predicts local Gaussians and camera poses for each view, which are aggregated into a global representation using either predicted or provided poses. To overcome the inherent difficulty of jointly learning 3D Gaussians and camera parameters, we introduce a novel mixing training strategy. This approach mitigates the entanglement between the two tasks by initially using ground-truth poses to aggregate local Gaussians and gradually transitioning to a mix of predicted and ground-truth poses, which prevents both training instability and exposure bias. We further resolve the scale ambiguity problem by a novel pairwise camera-distance normalization scheme and by embedding camera intrinsics into the network. Moreover, YoNoSplat also predicts intrinsic parameters, making it feasible for uncalibrated inputs. YoNoSplat demonstrates exceptional efficiency, reconstructing a scene from 100 views (at 280x518 resolution) in just 2.69 seconds on an NVIDIA GH200 GPU. It achieves state-of-the-art performance on standard benchmarks in both pose-free and pose-dependent settings. Our project page is at https://botaoye.github.io/yonosplat/.
Related papers
- iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion [62.09575122593993]
iGaussian is a two-stage feed-forward framework that achieves real-time camera pose estimation through direct 3D Gaussian inversion.<n> Experimental results on the NeRF Synthetic, Mip-NeRF 360, and T&T+DB datasets demonstrate a significant performance improvement over previous methods.
arXiv Detail & Related papers (2025-11-18T05:22:22Z) - LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos [24.61106294159454]
LongSplat addresses challenges in novel view synthesis (NVS) from casually captured long videos characterized by irregular camera motion, unknown camera poses, and expansive scenes.<n>LongSplat is a robust unposed 3D Gaussian Splatting framework featuring: (1) Incremental Joint Optimization that concurrently optimize camera poses and 3D Gaussians to avoid local minima and ensure global consistency; (2) a robust Pose Estimation Module leveraging learned 3D priors; and (3) an efficient Octree Anchor Formation mechanism that converts dense point clouds into anchors based on spatial density.
arXiv Detail & Related papers (2025-08-19T17:59:56Z) - No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views [17.221166075016257]
SPFSplat is an efficient framework for 3D Gaussian splatting from sparse multi-view images.<n>It employs a shared feature extraction backbone, enabling simultaneous prediction of 3D Gaussian primitives and camera poses.<n>It achieves state-of-the-art performance in novel view synthesis even under significant viewpoint changes and limited image overlap.
arXiv Detail & Related papers (2025-08-02T03:19:13Z) - AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [68.94737256959661]
AnySplat is a feed forward network for novel view synthesis from uncalibrated image collections.<n>A single forward pass yields a set of 3D Gaussian primitives encoding both scene geometry and appearance.<n>In extensive zero shot evaluations, AnySplat matches the quality of pose aware baselines in both sparse and dense view scenarios.
arXiv Detail & Related papers (2025-05-29T17:49:56Z) - No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z) - A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose [44.13819148680788]
We develop a novel construct-and-optimize method for sparse view synthesis without camera poses.
Specifically, we construct a solution by using monocular depth and projecting pixels back into the 3D world.
We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views.
arXiv Detail & Related papers (2024-05-06T17:36:44Z) - COLMAP-Free 3D Gaussian Splatting [88.420322646756]
We propose a novel method to perform novel view synthesis without any SfM preprocessing.
We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time.
Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
arXiv Detail & Related papers (2023-12-12T18:39:52Z) - 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal
Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
arXiv Detail & Related papers (2020-04-09T20:55:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.