SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
- URL: http://arxiv.org/abs/2411.15482v1
- Date: Sat, 23 Nov 2024 07:39:30 GMT
- Title: SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
- Authors: Su Sun, Cheng Zhao, Zhuoyang Sun, Yingjie Victor Chen, Mei Chen,
- Abstract summary: We introduce SplatFlow, a Self-Supervised Dynamic Gaussian Splatting within Motion Flow Fields (NMFF)
S SplatFlow learns 4D space-time representations without requiring tracked 3D bounding boxes, enabling accurate dynamic scene reconstruction and novel view RGB, depth and flow synthesis.
- Score: 11.564716761428251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing Dynamic Gaussian Splatting methods for complex dynamic urban scenarios rely on accurate object-level supervision from expensive manual labeling, limiting their scalability in real-world applications. In this paper, we introduce SplatFlow, a Self-Supervised Dynamic Gaussian Splatting within Neural Motion Flow Fields (NMFF) to learn 4D space-time representations without requiring tracked 3D bounding boxes, enabling accurate dynamic scene reconstruction and novel view RGB, depth and flow synthesis. SplatFlow designs a unified framework to seamlessly integrate time-dependent 4D Gaussian representation within NMFF, where NMFF is a set of implicit functions to model temporal motions of both LiDAR points and Gaussians as continuous motion flow fields. Leveraging NMFF, SplatFlow effectively decomposes static background and dynamic objects, representing them with 3D and 4D Gaussian primitives, respectively. NMFF also models the status correspondences of each 4D Gaussian across time, which aggregates temporal features to enhance cross-view consistency of dynamic components. SplatFlow further improves dynamic scene identification by distilling features from 2D foundational models into 4D space-time representation. Comprehensive evaluations conducted on the Waymo Open Dataset and KITTI Dataset validate SplatFlow's state-of-the-art (SOTA) performance for both image reconstruction and novel view synthesis in dynamic urban scenarios.
Related papers
- Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM [0.0]
D4DGS-SLAM is the first SLAM based on 4DGS map representation for dynamic environments.
By incorporating the temporal dimension into scene representation, D4DGS-SLAM enables high-quality reconstruction of dynamic scenes.
We show that our method outperforms state-of-the-art approaches in both camera pose tracking and map quality.
arXiv Detail & Related papers (2025-04-07T08:56:35Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.
We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving [12.006435326659526]
We introduce a novel 4D Gaussian Splatting (4DGS) approach to improve dynamic scene rendering.
Specifically, we employ a 2D semantic segmentation foundation model to self-supervise the 4D semantic features of Gaussians.
By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation.
arXiv Detail & Related papers (2025-03-09T19:58:51Z) - UrbanGS: Semantic-Guided Gaussian Splatting for Urban Scene Reconstruction [86.4386398262018]
UrbanGS uses 2D semantic maps and an existing dynamic Gaussian approach to distinguish static objects from the scene.
For potentially dynamic objects, we aggregate temporal information using learnable time embeddings.
Our approach outperforms state-of-the-art methods in reconstruction quality and efficiency.
arXiv Detail & Related papers (2024-12-04T16:59:49Z) - DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes [71.61083731844282]
We present DeSiRe-GS, a self-supervised gaussian splatting representation.
It enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
arXiv Detail & Related papers (2024-11-18T05:49:16Z) - KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter [49.85369344101118]
We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering.
Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions.
Our KFD-NeRF demonstrates similar or even superior performance within comparable computational time and state-of-the-art view synthesis performance with thorough training.
arXiv Detail & Related papers (2024-07-18T05:48:24Z) - Dynamic 3D Gaussian Fields for Urban Areas [60.64840836584623]
We present an efficient neural 3D scene representation for novel-view synthesis (NVS) in large-scale, dynamic urban areas.
We propose 4DGF, a neural scene representation that scales to large-scale dynamic urban areas.
arXiv Detail & Related papers (2024-06-05T12:07:39Z) - GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation [28.780488884938997]
We introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames.
Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting.
Our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis.
arXiv Detail & Related papers (2024-03-19T02:22:21Z) - 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes [33.14021987166436]
We introduce 4DRotorGS, a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians.
As an explicit spatial-temporal representation, 4DRotorGS demonstrates powerful capabilities for modeling complicated dynamics and fine details.
We further implement our temporal slicing and acceleration framework, achieving real-time rendering speeds of up to 277 FPS on an 3090 GPU and 583 FPS on a 4090 GPU.
arXiv Detail & Related papers (2024-02-05T18:59:04Z) - Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle [9.082693946898733]
We introduce a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos.
In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS)
Our proposed approach showcases a substantial efficiency improvement, achieving a $5times$ faster training speed compared to the per-frame 3DGS modeling.
arXiv Detail & Related papers (2023-12-06T11:25:52Z) - DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z) - EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via
Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping.
Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z) - Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis [58.5779956899918]
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements.
We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians.
We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing.
arXiv Detail & Related papers (2023-08-18T17:59:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.