DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes
- URL: http://arxiv.org/abs/2510.24734v1
- Date: Tue, 14 Oct 2025 03:32:46 GMT
- Title: DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes
- Authors: Qirui Hou, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, Jianxun Cui,
- Abstract summary: We propose DrivingScene, an online framework that reconstructs 4D dynamic scenes from only two consecutive surround-view images. Our key innovation is a lightweight residual flow network that predicts the non-rigid motion of dynamic objects per camera.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Real-time, high-fidelity reconstruction of dynamic driving scenes is challenged by complex dynamics and sparse views, with prior methods struggling to balance quality and efficiency. We propose DrivingScene, an online, feed-forward framework that reconstructs 4D dynamic scenes from only two consecutive surround-view images. Our key innovation is a lightweight residual flow network that predicts the non-rigid motion of dynamic objects per camera on top of a learned static scene prior, explicitly modeling dynamics via scene flow. We also introduce a coarse-to-fine training paradigm that circumvents the instabilities common to end-to-end approaches. Experiments on the nuScenes dataset show our image-only method simultaneously generates high-quality depth, scene flow, and 3D Gaussian point clouds online, significantly outperforming state-of-the-art methods in both dynamic reconstruction and novel view synthesis.
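The abstract's core idea, total motion as a rigid component from the static scene prior plus a learned non-rigid residual, can be illustrated with a minimal numeric sketch. Everything here is illustrative (the residual network is replaced by a toy function, and all values are made up), not the paper's actual implementation:

```python
import numpy as np

def rigid_flow(points, R, t):
    """Flow of static 3D points induced by camera ego-motion (R, t)."""
    return points @ R.T + t - points

def residual_flow(points):
    """Stand-in for the lightweight residual flow network: a toy field
    that moves far-away points toward the camera, as a dynamic object
    would. The real method predicts this per pixel, per camera."""
    mask = points[:, 2] > 5.0            # pretend these points are dynamic
    res = np.zeros_like(points)
    res[mask, 2] = -0.5
    return res

points = np.array([[0.0, 0.0, 2.0],      # a static point
                   [1.0, 0.0, 10.0]])    # a "dynamic" point
R = np.eye(3)                            # identity rotation for the demo
t = np.array([0.0, 0.0, -0.1])           # ego vehicle moving forward

# Scene flow = rigid (ego-motion) part + non-rigid residual part.
flow = rigid_flow(points, R, t) + residual_flow(points)
print(flow)
```

The decomposition is what keeps the residual network lightweight: ego-motion explains most of the observed flow, so the network only has to model the small non-rigid remainder.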
Related papers
- Flow4R: Unifying 4D Reconstruction and Tracking with Scene Flow [61.297800738187355]
Flow4R predicts a minimal per-pixel property set (3D point position, scene flow, pose weight, and confidence) from two-view inputs using a Vision Transformer. Trained jointly on static and dynamic datasets, Flow4R achieves state-of-the-art performance on 4D reconstruction and tracking tasks.
arXiv Detail & Related papers (2026-02-15T06:58:08Z) - MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments [12.796165448365949]
MOSAIC-GS is a novel, fully explicit, and computationally efficient approach for high-fidelity dynamic scene reconstruction from monocular videos. We leverage multiple geometric cues, such as depth, optical flow, dynamic object segmentation, and point tracking. We demonstrate that MOSAIC-GS achieves substantially faster optimization and rendering compared to existing methods.
arXiv Detail & Related papers (2026-01-08T20:48:24Z) - 4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos [52.89084603734664]
We present 4D3R, a pose-free dynamic neural rendering framework that decouples static and dynamic components through a two-stage approach. Our approach achieves up to 1.8 dB PSNR improvement over state-of-the-art methods.
arXiv Detail & Related papers (2025-11-07T13:25:50Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams [14.211339652447462]
Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams is crucial for numerous real-world applications. We introduce StreamSplat, the first fully feed-forward framework that transforms uncalibrated video streams of arbitrary length into dynamic 3D representations in an online manner.
arXiv Detail & Related papers (2025-06-10T14:52:36Z) - FreeDriveRF: Monocular RGB Dynamic NeRF without Poses for Autonomous Driving via Point-Level Dynamic-Static Decoupling [13.495102292705253]
FreeDriveRF reconstructs dynamic driving scenes using only sequential RGB images without requiring pose inputs. We introduce a warped ray-guided dynamic object rendering consistency loss, utilizing optical flow to better constrain the dynamic modeling process.
arXiv Detail & Related papers (2025-05-14T14:02:49Z) - 3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction [1.2603104712715607]
This paper proposes a novel 3D Gaussian point distribution method for dynamic street scene reconstruction. Our approach eliminates moving objects while preserving high-fidelity static scene details. Experimental results demonstrate that our method achieves high reconstruction quality, improved rendering performance, and adaptability in large-scale dynamic environments.
arXiv Detail & Related papers (2025-03-15T05:41:59Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes. We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Shape of Motion: 4D Reconstruction from a Single Video [42.42669078777769]
We introduce a method for reconstructing generic dynamic scenes, featuring explicit, persistent 3D motion trajectories in the world coordinate frame. First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases. Second, we take advantage of off-the-shelf data-driven priors such as monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals.
arXiv Detail & Related papers (2024-07-18T17:59:08Z) - EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes [57.12439406121721]
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.
For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene.
We then leverage a composite dynamic Gaussian graph to handle multiple moving objects.
We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater details and maintain panoramic consistency.
arXiv Detail & Related papers (2023-12-13T06:30:51Z) - DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting [31.75611852583603]
We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions.
arXiv Detail & Related papers (2023-11-30T18:59:11Z) - EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping.
Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z)
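Several entries above share one idea: Shape of Motion represents scene motion with a compact set of SE(3) motion bases, and DynMF decomposes per-point motions into a small set of trajectories. The common structure, each point's trajectory as a per-point weighted blend of K shared basis motions, can be sketched numerically. All names and values below are illustrative (the bases are pure translations here; the actual methods blend full SE(3) transforms):

```python
import numpy as np

K, T = 2, 3
# basis_motion[k, t] = translation of basis k at timestep t (toy values)
basis_motion = np.array([
    [[0, 0, 0], [1, 0, 0], [2, 0, 0]],   # basis 0: moves along +x
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],   # basis 1: static
], dtype=float)

points = np.array([[0.0, 0.0, 5.0],      # a dynamic point
                   [3.0, 1.0, 8.0]])     # a static point
weights = np.array([[1.0, 0.0],          # point 0 follows basis 0
                    [0.0, 1.0]])         # point 1 follows basis 1

def position_at(t):
    """Point positions at timestep t under the blended basis motions."""
    return points + weights @ basis_motion[:, t]

print(position_at(2))
```

The appeal of the decomposition is that K is far smaller than the number of points, so the motion of a dense scene is captured by a few trajectories plus cheap per-point weights.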
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.