EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis
- URL: http://arxiv.org/abs/2601.15951v1
- Date: Thu, 22 Jan 2026 13:39:29 GMT
- Title: EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis
- Authors: Sheng Miao, Sijin Li, Pan Wang, Dongfeng Bai, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao,
- Abstract summary: EvolSplat4D is a feed-forward framework that moves beyond existing per-pixel paradigms by unifying volume-based and pixel-based Gaussian prediction.<n>We show that EvolSplat4D reconstructs both static and dynamic environments with superior accuracy and consistency, outperforming both per-scene optimization and state-of-the-art feed-forward baselines.
- Score: 43.898895514609286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis (NVS) of static and dynamic urban scenes is essential for autonomous driving simulation, yet existing methods often struggle to balance reconstruction time with quality. While state-of-the-art neural radiance fields and 3D Gaussian Splatting approaches achieve photorealism, they often rely on time-consuming per-scene optimization. Conversely, emerging feed-forward methods frequently adopt per-pixel Gaussian representations, which lead to 3D inconsistencies when aggregating multi-view predictions in complex, dynamic environments. We propose EvolSplat4D, a feed-forward framework that moves beyond existing per-pixel paradigms by unifying volume-based and pixel-based Gaussian prediction across three specialized branches. For close-range static regions, we predict consistent geometry of 3D Gaussians over multiple frames directly from a 3D feature volume, complemented by a semantically-enhanced image-based rendering module for predicting their appearance. For dynamic actors, we utilize object-centric canonical spaces and a motion-adjusted rendering module to aggregate temporal features, ensuring stable 4D reconstruction despite noisy motion priors. Far-Field scenery is handled by an efficient per-pixel Gaussian branch to ensure full-scene coverage. Experimental results on the KITTI-360, KITTI, Waymo, and PandaSet datasets show that EvolSplat4D reconstructs both static and dynamic environments with superior accuracy and consistency, outperforming both per-scene optimization and state-of-the-art feed-forward baselines.
Related papers
- RU4D-SLAM: Reweighting Uncertainty in Gaussian Splatting SLAM for 4D Scene Reconstruction [8.13353479857245]
4D reconstruction, especially 4D Gaussian splatting, offers a promising direction for addressing these challenges.<n>We propose a robust and efficient framework, namely Reweighting Uncertainty in Gaussian Splatting SLAM (RU4D-SLAM) for 4D scene reconstruction.<n>Our method substantially outperforms state-of-the-art approaches in both trajectory accuracy and 4D scene reconstruction.
arXiv Detail & Related papers (2026-02-24T11:47:43Z) - Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding [54.859943475818234]
We present Motion4D, a novel framework that integrates 2D priors from foundation models into a unified 4D Gaussian Splatting representation.<n>Our method features a two-part iterative optimization framework: 1) Sequential optimization, which updates motion and semantic fields in consecutive stages to maintain local consistency, and 2) Global optimization, which jointly refines all attributes for long-term coherence.<n>Our method significantly outperforms both 2D foundation models and existing 3D-based approaches across diverse scene understanding tasks, including point-based tracking, video object segmentation, and novel view synthesis.
arXiv Detail & Related papers (2025-12-03T09:32:56Z) - 4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos [52.89084603734664]
We present 4D3R, a pose-free dynamic neural rendering framework that decouples static and dynamic components through a two-stage approach.<n>Our approach achieves up to 1.8dB PSNR improvement over state-of-the-art methods.
arXiv Detail & Related papers (2025-11-07T13:25:50Z) - Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image.<n>Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.<n>We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving [12.006435326659526]
We introduce a novel 4D Gaussian Splatting (4DGS) approach to improve dynamic scene rendering.<n> Specifically, we employ a 2D semantic segmentation foundation model to self-supervise the 4D semantic features of Gaussians.<n>By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation.
arXiv Detail & Related papers (2025-03-09T19:58:51Z) - 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [115.67081491747943]
Dynamic 3D scene representation and novel view synthesis are crucial for enabling AR/VR and metaverse applications.<n>We reformulate the reconstruction of a time-varying 3D scene as approximating its underlying 4D volume.<n>We derive several compact variants that effectively reduce the memory footprint to address its storage bottleneck.
arXiv Detail & Related papers (2024-12-30T05:30:26Z) - UrbanGS: Semantic-Guided Gaussian Splatting for Urban Scene Reconstruction [86.4386398262018]
UrbanGS uses 2D semantic maps and an existing dynamic Gaussian approach to distinguish static objects from the scene.<n>For potentially dynamic objects, we aggregate temporal information using learnable time embeddings.<n>Our approach outperforms state-of-the-art methods in reconstruction quality and efficiency.
arXiv Detail & Related papers (2024-12-04T16:59:49Z) - 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes [33.14021987166436]
We introduce 4DRotorGS, a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians.
As an explicit spatial-temporal representation, 4DRotorGS demonstrates powerful capabilities for modeling complicated dynamics and fine details.
We further implement our temporal slicing and acceleration framework, achieving real-time rendering speeds of up to 277 FPS on an 3090 GPU and 583 FPS on a 4090 GPU.
arXiv Detail & Related papers (2024-02-05T18:59:04Z) - Real-time Photorealistic Dynamic Scene Representation and Rendering with
4D Gaussian Splatting [8.078460597825142]
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics.
We propose to approximate the underlying-temporal rendering volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling.
Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics.
arXiv Detail & Related papers (2023-10-16T17:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.