Real-time Photorealistic Dynamic Scene Representation and Rendering with
4D Gaussian Splatting
- URL: http://arxiv.org/abs/2310.10642v3
- Date: Thu, 22 Feb 2024 15:08:49 GMT
- Title: Real-time Photorealistic Dynamic Scene Representation and Rendering with
4D Gaussian Splatting
- Authors: Zeyu Yang, Hongye Yang, Zijie Pan, Li Zhang
- Abstract summary: Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics.
We propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling.
Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficients of 4D spherindrical harmonics.
- Score: 8.078460597825142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing dynamic 3D scenes from 2D images and generating diverse views
over time is challenging due to scene complexity and temporal dynamics. Despite
advancements in neural implicit models, limitations persist: (i) Inadequate
Scene Structure: Existing methods struggle to reveal the spatial and temporal
structure of dynamic scenes from directly learning the complex 6D plenoptic
function. (ii) Scaling Deformation Modeling: Explicitly modeling scene element
deformation becomes impractical for complex dynamics. To address these issues,
we consider the spacetime as an entirety and propose to approximate the
underlying spatio-temporal 4D volume of a dynamic scene by optimizing a
collection of 4D primitives, with explicit geometry and appearance modeling.
Learning to optimize the 4D primitives enables us to synthesize novel views at
any desired time with our tailored rendering routine. Our model is conceptually
simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that
can rotate arbitrarily in space and time, as well as view-dependent and
time-evolved appearance represented by the coefficients of 4D spherindrical
harmonics. This approach offers simplicity, flexibility for variable-length
video and end-to-end training, and efficient real-time rendering, making it
suitable for capturing complex dynamic scene motions. Experiments across
various benchmarks, including monocular and multi-view scenarios, demonstrate
our 4DGS model's superior visual quality and efficiency.
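To make the 4D-primitive idea concrete, below is a minimal sketch of how a single 4D Gaussian can be conditioned on a query time to yield an ordinary 3D Gaussian for splatting, together with a temporal visibility weight from the marginal over t. It assumes standard multivariate Gaussian conditioning; the names (`Gaussian4D`, `condition_on_time`) are illustrative and not taken from the authors' code, and the appearance coefficients are treated as an opaque array.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Gaussian4D:
    mean: np.ndarray       # (4,) spatio-temporal center (x, y, z, t)
    cov: np.ndarray        # (4, 4) anisotropic covariance, free to rotate jointly in space and time
    opacity: float         # base opacity, modulated by how close the query time is to the primitive
    sh_coeffs: np.ndarray  # appearance coefficients (4D spherindrical harmonics in the paper),
                           # treated as an opaque array in this sketch


def condition_on_time(g: Gaussian4D, t: float):
    """Slice a 4D Gaussian at query time t via standard Gaussian conditioning.

    Returns the conditional 3D mean and covariance (a regular 3D Gaussian that a
    3DGS-style rasterizer can splat) and an effective opacity scaled by the
    marginal density over t, so primitives fade in and out around their own time.
    """
    mu_x, mu_t = g.mean[:3], g.mean[3]
    S_xx = g.cov[:3, :3]   # spatial block
    S_xt = g.cov[:3, 3]    # space-time cross-covariance
    S_tt = g.cov[3, 3]     # temporal variance

    dt = t - mu_t
    mean_3d = mu_x + S_xt / S_tt * dt                # conditional mean drifts with t
    cov_3d = S_xx - np.outer(S_xt, S_xt) / S_tt      # conditional (3x3) covariance
    temporal_weight = float(np.exp(-0.5 * dt * dt / S_tt))
    return mean_3d, cov_3d, g.opacity * temporal_weight


# Example: a primitive centered at t = 0.4 queried at t = 0.5.
g = Gaussian4D(
    mean=np.array([0.0, 0.0, 1.0, 0.4]),
    cov=np.diag([0.01, 0.01, 0.02, 0.05]),
    opacity=0.9,
    sh_coeffs=np.zeros(16 * 3),
)
mean_3d, cov_3d, alpha = condition_on_time(g, t=0.5)
```

The conditioned 3D Gaussian can then be rasterized with a conventional splatting pipeline, with its color evaluated from the appearance coefficients for the query view direction and time.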
Related papers
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion [22.11178016375823]
DimensionX is a framework designed to generate 3D and 4D scenes from just a single image with video diffusion.
Our approach begins with the insight that both the spatial structure of a 3D scene and the temporal evolution of a 4D scene can be effectively represented through sequences of video frames.
arXiv Detail & Related papers (2024-11-07T18:07:31Z)
- Real-Time Spatio-Temporal Reconstruction of Dynamic Endoscopic Scenes with 4D Gaussian Splatting [1.7947477507955865]
This paper presents ST-Endo4DGS, a novel framework for modeling dynamic endoscopic scenes.
This approach enables precise representation of deformable tissue, capturing spatial and temporal correlations in real time.
arXiv Detail & Related papers (2024-11-02T11:24:27Z) - SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes [7.590932716513324]
We present SpectroMotion, a novel approach that combines 3D Gaussian Splatting (3DGS) with physically-based rendering (PBR) and deformation fields to reconstruct dynamic specular scenes.
arXiv Detail & Related papers (2024-10-22T17:59:56Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, Diffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z)
- Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes [69.52540205439989]
We introduce Im4D, a hybrid representation that consists of a grid-based geometry representation and a multi-view image-based appearance representation.
We represent the scene appearance by the original multi-view videos and a network that learns to predict the color of a 3D point from image features.
We show that Im4D achieves state-of-the-art rendering quality and can be trained efficiently, while realizing real-time rendering at 79.8 FPS for 512x512 images.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis [58.5779956899918]
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements.
We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians.
We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing.
arXiv Detail & Related papers (2023-08-18T17:59:21Z)
- SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes [75.9110646062442]
We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner.
Our method takes multi-view RGB videos and background images from static cameras with known camera parameters as input.
We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.
arXiv Detail & Related papers (2023-08-16T09:50:35Z)
- LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling [69.56581851211841]
We propose a novel Local 4D implicit Representation for Dynamic clothed human, named LoRD.
Our key insight is to encourage the network to learn the latent codes of local part-level representation.
LoRD has a strong capability for representing 4D humans, and outperforms state-of-the-art methods on practical applications.
arXiv Detail & Related papers (2022-08-18T03:49:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.