FlowR: Flowing from Sparse to Dense 3D Reconstructions
- URL: http://arxiv.org/abs/2504.01647v2
- Date: Mon, 04 Aug 2025 19:36:55 GMT
- Title: FlowR: Flowing from Sparse to Dense 3D Reconstructions
- Authors: Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, Peter Kontschieder
- Abstract summary: We propose a flow matching model that learns a flow to connect novel view renderings from possibly sparse reconstructions to renderings that we expect from dense reconstructions. Our model is trained on a novel dataset of 3.6M image pairs and can process up to 45 views at 540x960 resolution (91K tokens) on one H100 GPU in a single forward pass.
- Score: 60.28571003356382
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: 3D Gaussian splatting enables high-quality novel view synthesis (NVS) at real-time frame rates. However, its quality drops sharply as we depart from the training views. Thus, dense captures are needed to match the high-quality expectations of applications like Virtual Reality (VR). However, such dense captures are very laborious and expensive to obtain. Existing works have explored using 2D generative models to alleviate this requirement by distillation or generating additional training views. These models typically rely on a noise-to-data generative process conditioned only on a handful of reference input views, leading to hallucinations, inconsistent generation results, and subsequent reconstruction artifacts. Instead, we propose a multi-view, flow matching model that learns a flow to directly connect novel view renderings from possibly sparse reconstructions to renderings that we expect from dense reconstructions. This enables augmenting scene captures with consistent, generated views to improve reconstruction quality. Our model is trained on a novel dataset of 3.6M image pairs and can process up to 45 views at 540x960 resolution (91K tokens) on one H100 GPU in a single forward pass. Our pipeline consistently improves NVS in sparse- and dense-view scenarios, leading to higher-quality reconstructions than prior works across multiple, widely-used NVS benchmarks.
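The core idea is data-to-data flow matching: rather than starting from noise, the model learns the velocity that transports a rendering of the current (possibly sparse) reconstruction toward the rendering expected from a dense capture. Below is a minimal, hypothetical PyTorch sketch of that training objective and sampler; `VelocityNet` is a toy stand-in for the paper's multi-view model, and the linear interpolant and Euler integrator are standard flow-matching choices assumed here for illustration, not FlowR's actual implementation.

```python
# Hypothetical sketch of data-to-data flow matching between paired renderings:
# x_sparse = rendering from the sparse reconstruction, x_dense = the rendering
# expected from a dense reconstruction. Not the authors' architecture or code.
import torch
import torch.nn as nn


class VelocityNet(nn.Module):
    """Toy stand-in for the multi-view model; predicts a per-pixel velocity."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the scalar time t to a per-pixel channel and concatenate.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:])
        return self.net(torch.cat([x_t, t_map], dim=1))


def flow_matching_loss(model, x_sparse, x_dense):
    """Linear-interpolant flow matching loss on a paired batch."""
    b = x_sparse.shape[0]
    t = torch.rand(b, device=x_sparse.device)            # time uniform in [0, 1]
    tb = t.view(-1, 1, 1, 1)
    x_t = (1 - tb) * x_sparse + tb * x_dense             # point on the straight path
    v_target = x_dense - x_sparse                         # constant velocity of that path
    return ((model(x_t, t) - v_target) ** 2).mean()


@torch.no_grad()
def sample(model, x_sparse, steps: int = 20):
    """Integrate the learned ODE from the sparse rendering toward a dense-quality view."""
    x = x_sparse.clone()
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * model(x, t)                          # simple Euler step
    return x
```

Because sampling starts from the sparse-reconstruction rendering itself rather than from pure noise, the generated views stay anchored to the captured scene content, which is the consistency argument the abstract makes against noise-to-data generation.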
Related papers
- SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting [7.9061560322289335]
We propose an MVS-based learning approach that regresses 2DGS surface parameters in a feed-forward fashion to perform 3D shape reconstruction and NVS from sparse-view images. The resulting pipeline attains state-of-the-art results on the DTU 3D reconstruction benchmark in terms of Chamfer distance to ground truth, as well as state-of-the-art NVS.
arXiv Detail & Related papers (2025-05-04T16:33:47Z) - Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views [29.85363432402896]
We propose a novel neural rendering framework for unposed and extremely sparse-view 3D reconstruction in unbounded 360° scenes.
By employing a dense stereo reconstruction model to recover coarse geometry, we introduce a layer-specific bootstrap optimization to refine the noisy geometry and fill occluded regions in the reconstruction.
Our approach outperforms existing state-of-the-art methods in terms of rendering quality and surface reconstruction accuracy.
arXiv Detail & Related papers (2025-03-31T17:59:25Z) - GenFusion: Closing the Loop between Reconstruction and Generation via Videos [24.195304481751602]
We propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. We also propose a cyclical fusion pipeline that iteratively adds restoration frames from the generative model to the training set.
arXiv Detail & Related papers (2025-03-27T07:16:24Z) - SplatVoxel: History-Aware Novel View Streaming without Temporal Training [29.759664150610362]
We study the problem of novel view streaming from sparse-view videos. Existing novel view synthesis methods struggle with temporal coherence and visual fidelity. We propose a hybrid splat-voxel feed-forward scene reconstruction approach.
arXiv Detail & Related papers (2025-03-18T20:00:47Z) - Stable Virtual Camera: Generative View Synthesis with Diffusion Models [51.71244310522393]
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene. Our approach overcomes the limitations of prior work through a simple model design, an optimized training recipe, and a flexible sampling strategy. Our method can generate high-quality videos lasting up to half a minute with seamless loop closure.
arXiv Detail & Related papers (2025-03-18T17:57:22Z) - DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes [81.56206845824572]
Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction.
Few-shot methods often struggle with poor reconstruction quality in vast environments.
This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes.
arXiv Detail & Related papers (2024-11-19T07:51:44Z) - SCube: Instant Large-Scale Scene Reconstruction using VoxSplats [55.383993296042526]
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.
Our method encodes reconstructed scenes using a novel representation VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold.
arXiv Detail & Related papers (2024-10-26T00:52:46Z) - Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360° 3D scene using priors from latent diffusion models (LDMs).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
arXiv Detail & Related papers (2024-05-26T11:01:39Z) - SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting [6.506706621221143]
3D Gaussian Splatting (3DGS) has recently enabled real-time rendering of 3D scenes for novel view synthesis. This technique requires dense training views to accurately reconstruct 3D geometry. We introduce SparseGS, an efficient training pipeline designed to address the limitations of 3DGS in scenarios with sparse training views.
arXiv Detail & Related papers (2023-11-30T21:38:22Z) - DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model [86.37536249046943]
DMV3D is a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion.
Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering.
arXiv Detail & Related papers (2023-11-15T18:58:41Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)