MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos
- URL: http://arxiv.org/abs/2406.00434v2
- Date: Tue, 29 Oct 2024 09:50:00 GMT
- Title: MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos
- Authors: Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lyv, Peng Wang, Wenping Wang, Junhui Hou
- Abstract summary: MoDGS is a new pipeline to render novel views of dynamic scenes from a casually captured monocular video.
Experiments demonstrate MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video.
- Score: 65.31707882676292
- License:
- Abstract: In this paper, we propose MoDGS, a new pipeline to render novel views of dynamic scenes from a casually captured monocular video. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but struggle to reconstruct dynamic scenes on casually captured input videos whose cameras are either static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms state-of-the-art methods by a significant margin. The code will be publicly available.
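Single-view depth predictions are only defined up to an unknown per-frame scale and shift, so depth-guided supervision of this kind typically aligns the prior to the rendered geometry before penalising residuals. The abstract does not give the exact form of MoDGS's robust depth loss; the snippet below is a minimal PyTorch sketch of one common scale- and shift-invariant formulation, and all names and details are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: supervising rendered depth with a single-view depth prior.
# MoDGS's actual robust depth loss is not specified in the abstract; this is one
# common scale- and shift-invariant formulation, shown for illustration only.
import torch


def scale_shift_invariant_depth_loss(rendered_depth: torch.Tensor,
                                     mono_depth: torch.Tensor,
                                     eps: float = 1e-6) -> torch.Tensor:
    """L1 depth loss after aligning the monocular prior to the rendered depth.

    Both inputs are (H, W) depth maps. `mono_depth` comes from a single-view
    depth estimator and is only defined up to an affine transform, so it is
    first fitted to the rendered depth with a per-image scale and shift.
    """
    d = mono_depth.flatten()
    r = rendered_depth.flatten().detach()  # fit the prior to the current geometry

    # Closed-form least-squares fit r ≈ s * d + t.
    d_mean, r_mean = d.mean(), r.mean()
    cov = ((d - d_mean) * (r - r_mean)).mean()
    var = (d - d_mean).pow(2).mean()
    s = cov / (var + eps)
    t = r_mean - s * d_mean

    aligned = s * mono_depth + t
    return (rendered_depth - aligned).abs().mean()
```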
Related papers
- Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos [101.48581851337703]
We present BTimer, the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes.
Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames.
Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets.
arXiv Detail & Related papers (2024-12-04T18:15:06Z)
- RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos [39.384910552854926]
We present RoDyGS, an optimization pipeline for dynamic Gaussian Splatting from casual videos.
It effectively learns motion and underlying geometry of scenes by separating dynamic and static primitives.
We also introduce a comprehensive benchmark, Kubric-MRig, that provides extensive camera and object motion along with simultaneous multi-view captures.
arXiv Detail & Related papers (2024-12-04T07:02:49Z)
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases (see the sketch after this list).
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- Modeling Ambient Scene Dynamics for Free-view Synthesis [31.233859111566613]
We introduce a novel method for dynamic free-view synthesis of ambient scenes from a monocular capture.
Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes.
arXiv Detail & Related papers (2024-06-13T17:59:11Z)
- MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds [27.802537831023347]
We introduce 4D Motion Scaffolds (MoSca), a modern 4D reconstruction system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild.
Experiments demonstrate state-of-the-art performance on dynamic rendering benchmarks and its effectiveness on real videos.
arXiv Detail & Related papers (2024-05-27T17:59:07Z)
- Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion.
Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints.
arXiv Detail & Related papers (2023-04-04T11:25:44Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video [76.19076002661157]
Non-Rigid Neural Radiance Fields (NR-NeRF) is a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes.
We show that even a single consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views.
arXiv Detail & Related papers (2020-12-22T18:46:12Z)
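The "compact set of SE3 motion bases" mentioned in the Shape of Motion entry above suggests a skinning-style parameterisation in which each point's trajectory is a weighted blend of a few time-varying rigid transforms. The sketch below illustrates that idea under this assumption; the paper's actual blending (for example, proper SE(3) interpolation rather than blending transformed positions) may differ, and all names are illustrative.

```python
# Hypothetical sketch of "motion bases": each point's position at time t is a
# convex combination of a small set of rigid transforms (linear blend skinning).
import torch


def blend_motion_bases(points: torch.Tensor,        # (N, 3) canonical positions
                       weights: torch.Tensor,       # (N, B) per-point basis logits
                       rotations: torch.Tensor,     # (B, 3, 3) basis rotations at time t
                       translations: torch.Tensor,  # (B, 3) basis translations at time t
                       ) -> torch.Tensor:
    """Warp canonical points to time t by blending B rigid motions."""
    # Apply every basis transform to every point: result has shape (B, N, 3).
    transformed = torch.einsum('bij,nj->bni', rotations, points) + translations[:, None, :]
    # Normalise the per-point weights and take a convex combination over bases.
    w = torch.softmax(weights, dim=1)                # (N, B)
    return torch.einsum('nb,bni->ni', w, transformed)
```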
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.