MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
- URL: http://arxiv.org/abs/2503.10604v1
- Date: Thu, 13 Mar 2025 17:48:41 GMT
- Title: MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
- Authors: Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang,
- Abstract summary: MuDG is an innovative framework that integrates Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction.<n>We show that MuDG outperforms existing methods in both reconstruction and photorealistic synthesis quality.
- Score: 44.592566642185425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving. Nevertheless, critical limitations persist: reconstruction-based methods exhibit substantial performance deterioration under significant viewpoint deviations from training trajectories, while generation-based techniques struggle with temporal coherence and precise scene controllability. To overcome these challenges, we present MuDG, an innovative framework that integrates Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction. MuDG leverages aggregated LiDAR point clouds with RGB and geometric priors to condition a multi-modal video diffusion model, synthesizing photorealistic RGB, depth, and semantic outputs for novel viewpoints. This synthesis pipeline enables feed-forward NVS without computationally intensive per-scene optimization, providing comprehensive supervision signals to refine 3DGS representations for rendering robustness enhancement under extreme viewpoint changes. Experiments on the Open Waymo Dataset demonstrate that MuDG outperforms existing methods in both reconstruction and synthesis quality.
Related papers
- LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis [17.470869402542533]
Novel view synthesis (NVS) in low-light scenes remains a significant challenge due to degraded inputs.
We propose LL-Gaussian, a novel framework for 3D reconstruction and enhancement from low-light sRGB images.
Compared to state-of-the-art NeRF-based methods, LL-Gaussian achieves up to 2,000 times faster inference and reduces training time to just 2%.
arXiv Detail & Related papers (2025-04-14T15:39:31Z) - Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis [22.767866875051013]
We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address limitations of current methods.
By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones.
Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
arXiv Detail & Related papers (2025-04-02T17:59:46Z) - StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting [5.759434800012218]
StructGS is a framework that enhances 3D Gaussian Splatting (3DGS) for improved novel-view synthesis in 3D reconstruction.<n>Our framework significantly reduces computational redundancy, enhances detail capture and supports high-resolution rendering from low-resolution inputs.
arXiv Detail & Related papers (2025-03-09T05:39:44Z) - T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions.
We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
arXiv Detail & Related papers (2024-11-29T07:45:24Z) - MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction [84.07233691641193]
We introduce MonoGSDF, a novel method that couples primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction.
To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization.
Experiments on real-world datasets outperforms prior methods while maintaining efficiency.
arXiv Detail & Related papers (2024-11-25T20:07:07Z) - MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields [73.49548565633123]
Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering.
Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images.
We propose a view framework based on 3D Gaussian Splatting, named MCGS, enabling scene reconstruction from sparse input views.
arXiv Detail & Related papers (2024-10-15T08:39:05Z) - Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods [4.6836510920448715]
This study explores the capabilities of Neural Radiance Fields (NeRF) and Gaussian-based methods in the context of 3D scene reconstruction.
We assess performance based on tracking accuracy, mapping fidelity, and view synthesis.
Findings reveal that NeRF excels in view synthesis, offering unique capabilities in generating new perspectives from existing data.
arXiv Detail & Related papers (2024-08-08T07:11:57Z) - Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections [30.321151430263946]
This paper presents Wild-GS, an innovative adaptation of 3DGS optimized for unconstrained photo collections.
Wild-GS determines the appearance of each 3D Gaussian by their inherent material attributes, global illumination and camera properties per image, and point-level local variance of reflectance.
This novel design effectively transfers the high-frequency detailed appearance of the reference view to 3D space and significantly expedites the training process.
arXiv Detail & Related papers (2024-06-14T19:06:07Z) - Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction [89.53963284958037]
We propose a novel motion-aware enhancement framework for dynamic scene reconstruction.
Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow.
For the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed.
arXiv Detail & Related papers (2024-03-18T03:46:26Z) - GS-IR: 3D Gaussian Splatting for Inverse Rendering [71.14234327414086]
We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS)
We extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions.
The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering.
arXiv Detail & Related papers (2023-11-26T02:35:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.