Hybrid Gaussian Splatting for Novel Urban View Synthesis
- URL: http://arxiv.org/abs/2510.12308v1
- Date: Tue, 14 Oct 2025 09:09:13 GMT
- Title: Hybrid Gaussian Splatting for Novel Urban View Synthesis
- Authors: Mohamed Omran, Farhad Zanjani, Davide Abati, Jens Petersen, Amirhossein Habibian,
- Abstract summary: This paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes, where participants are required to generate renders of the same urban environment as viewed from a different traversal. Our solution is inspired by hybrid methods in scene generation and generative simulators merging Gaussian splatting and diffusion models. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, achieving second place overall.
- Score: 9.298287928508492
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes: starting from car-centric frames captured during some training traversals, participants are required to generate renders of the same urban environment as viewed from a different traversal (e.g. a different street lane or car direction). Our solution is inspired by hybrid methods in scene generation and generative simulators that merge Gaussian splatting and diffusion models, and it is composed of two stages: first, we fit a 3D reconstruction of the scene and render novel views as seen from the target cameras; then, we enhance the resulting frames with a dedicated single-step diffusion model. We discuss specific choices made in the initialization of the Gaussian primitives as well as the finetuning of the enhancer model and the curation of its training data. We report the performance of our model design and ablate its components in terms of novel view quality as measured by PSNR, SSIM and LPIPS. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, achieving second place overall.
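The two-stage design described above lends itself to a compact sketch. The following Python skeleton illustrates the control flow only; `render_view` and `enhance_frame` are hypothetical placeholders standing in for the fitted Gaussian-splatting renderer and the single-step diffusion enhancer, not the authors' actual code.

```python
# Minimal sketch of the two-stage pipeline from the abstract.
# Both callables are assumptions, not the paper's implementation:
#   render_view(pose)    -> raw render from the fitted 3D reconstruction
#   enhance_frame(image) -> single-step diffusion enhancement of that render
from typing import Callable, List
import numpy as np

def synthesize_novel_views(
    render_view: Callable[[np.ndarray], np.ndarray],
    enhance_frame: Callable[[np.ndarray], np.ndarray],
    target_poses: List[np.ndarray],
) -> List[np.ndarray]:
    outputs = []
    for pose in target_poses:
        raw = render_view(pose)             # stage 1: splat the scene from the target camera
        outputs.append(enhance_frame(raw))  # stage 2: one denoising step to clean up artifacts
    return outputs
```

The split mirrors the paper's framing: the reconstruction stage supplies geometrically consistent but possibly degraded renders, and the enhancer repairs per-frame appearance.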
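Since the report evaluates novel-view quality with PSNR, SSIM and LPIPS, here is a hedged example of how those three metrics are commonly computed with standard libraries; the challenge's exact evaluation script may differ.

```python
# Common reference implementations of the three reported metrics.
# This mirrors standard practice, not the challenge's official scorer.
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred, gt: HxWx3 float arrays in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2.0 - 1.0
    with torch.no_grad():
        lp = lpips.LPIPS(net="alex")(to_tensor(pred), to_tensor(gt)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```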
Related papers
- ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models [27.324967736816337]
Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. We propose a two-stage pipeline that leverages two key insights. First, we train a powerful bidirectional generative model with a novel opacity mixing strategy. Second, we distill it into a causal auto-regressive model that generates hundreds of frames in a single pass.
arXiv Detail & Related papers (2026-02-28T06:22:40Z) - ViSE: A Systematic Approach to Vision-Only Street-View Extrapolation [8.962361530943976]
This report presents our winning solution, which took first place in the RealADSim Workshop NVS track at ICCV 2025. To address the core challenges of street view extrapolation, we introduce a comprehensive four-stage pipeline. On the RealADSim-NVS benchmark, our method achieves a final score of 0.441, ranking first among all participants.
arXiv Detail & Related papers (2025-10-21T06:50:20Z) - FMGS-Avatar: Mesh-Guided 2D Gaussian Splatting with Foundation Model Priors for 3D Monocular Avatar Reconstruction [18.570290675633732]
We introduce Mesh-Guided 2D Gaussian Splatting, where 2D primitives are attached directly to template mesh faces with constrained position, rotation, and movement. We leverage foundation models trained on large-scale datasets, such as Sapiens, to complement the limited visual cues from monocular videos. Experimental evaluation demonstrates superior reconstruction quality compared to existing methods, with notable gains in geometric accuracy and appearance fidelity.
arXiv Detail & Related papers (2025-09-18T08:41:41Z) - BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images [21.811586185200706]
This paper addresses the challenge of reconstructing vehicles from sparse-view inputs. We leverage depth maps and a robust pose estimation architecture to synthesize novel views. We present a novel dataset featuring both synthetic and real-world public transportation vehicles.
arXiv Detail & Related papers (2025-07-16T10:04:35Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors [13.883695200241524]
RI3D is a novel approach that harnesses the power of diffusion models to reconstruct high-quality novel views given a sparse set of input images. Our key contribution is separating the view synthesis process into two tasks of reconstructing visible regions and hallucinating missing regions. We produce results with detailed textures in both visible and missing regions that outperform state-of-the-art approaches on a diverse set of scenes.
arXiv Detail & Related papers (2025-03-13T20:16:58Z) - Extrapolated Urban View Synthesis Benchmark [53.657271730352214]
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. We will release the data to help advance self-driving and urban robotics simulation technology.
arXiv Detail & Related papers (2024-12-06T18:41:39Z) - NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage novel-view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z) - AIM 2024 Sparse Neural Rendering Challenge: Methods and Results [64.19942455360068]
This paper reviews the challenge on Sparse Neural Rendering that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2024.
The challenge aims at producing novel camera view synthesis of diverse scenes from sparse image observations.
Participants are asked to optimise objective fidelity to the ground-truth images as measured via the Peak Signal-to-Noise Ratio (PSNR) metric.
arXiv Detail & Related papers (2024-09-23T14:17:40Z) - Efficient Depth-Guided Urban View Synthesis [52.841803876653465]
We introduce Efficient Depth-Guided Urban View Synthesis (EDUS) for fast feed-forward inference and efficient per-scene fine-tuning. EDUS exploits noisy predicted geometric priors as guidance to enable generalizable urban view synthesis from sparse input images. Our results indicate that EDUS achieves state-of-the-art performance in sparse view settings when combined with fast test-time optimization.
arXiv Detail & Related papers (2024-07-17T08:16:25Z) - SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model.
arXiv Detail & Related papers (2024-03-29T09:20:29Z) - Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z) - Fast Non-Rigid Radiance Fields from Monocularized Data [66.74229489512683]
This paper proposes a new method for full 360° inward-facing novel view synthesis of non-rigidly deforming scenes.
At the core of our method are (1) an efficient deformation module that decouples the processing of spatial and temporal information for accelerated training and inference, and (2) a static module representing the canonical scene as a fast hash-encoded neural radiance field.
In both cases, our method is significantly faster than previous methods, converging in less than 7 minutes and achieving real-time framerates at 1K resolution, while obtaining a higher visual accuracy for generated novel views.
arXiv Detail & Related papers (2022-12-02T18:51:10Z)