GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
- URL: http://arxiv.org/abs/2501.04782v1
- Date: Wed, 08 Jan 2025 19:01:12 GMT
- Title: GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
- Authors: Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem,
- Abstract summary: We introduce a novel neural representation that combines 3D Gaussian splatting with continuous camera motion modeling.
Experimental results show that our hierarchical learning, combined with robust camera motion modeling, captures complex dynamic scenes with strong temporal consistency.
This memory-efficient approach achieves high-quality rendering at impressive speeds.
- Score: 28.981174430968643
- License:
- Abstract: Efficient neural representations for dynamic video scenes are critical for applications ranging from video compression to interactive simulations. Yet, existing methods often face challenges related to high memory usage, lengthy training times, and temporal consistency. To address these issues, we introduce a novel neural video representation that combines 3D Gaussian splatting with continuous camera motion modeling. By leveraging Neural ODEs, our approach learns smooth camera trajectories while maintaining an explicit 3D scene representation through Gaussians. Additionally, we introduce a spatiotemporal hierarchical learning strategy, progressively refining spatial and temporal features to enhance reconstruction quality and accelerate convergence. This memory-efficient approach achieves high-quality rendering at impressive speeds. Experimental results show that our hierarchical learning, combined with robust camera motion modeling, captures complex dynamic scenes with strong temporal consistency, achieving state-of-the-art performance across diverse video datasets in both high- and low-motion scenarios.
Related papers
- Representing Long Volumetric Video with Temporal Gaussian Hierarchy [80.51373034419379]
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos.
We propose a novel 4D representation, named Temporal Gaussian Hierarchy, to compactly model long volumetric videos.
This work is the first approach capable of efficiently handling minutes of volumetric video data while maintaining state-of-the-art rendering quality.
arXiv Detail & Related papers (2024-12-12T18:59:34Z) - 4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes [19.24815625343669]
SaRO-GS is a novel dynamic scene representation capable of achieving real-time rendering.
To handle temporally complex dynamic scenes, we introduce a Scale-aware Residual Field.
Our method has demonstrated state-of-the-art performance.
arXiv Detail & Related papers (2024-12-09T08:44:19Z) - MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variencoenco Autocoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce a downstream task of Sketch Guided Videopainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z) - SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream [20.552076533208687]
A spike camera is a specialized high-speed visual sensor that offers advantages such as high temporal resolution and high dynamic range.
We introduce SpikeGS, the method to learn 3D Gaussian fields solely from spike stream.
Our method can reconstruct view synthesis results with fine texture details from a continuous spike stream captured by a moving spike camera.
arXiv Detail & Related papers (2024-09-23T16:28:41Z) - Dynamic 3D Gaussian Fields for Urban Areas [60.64840836584623]
We present an efficient neural 3D scene representation for novel-view synthesis (NVS) in large-scale, dynamic urban areas.
We propose 4DGF, a neural scene representation that scales to large-scale dynamic urban areas.
arXiv Detail & Related papers (2024-06-05T12:07:39Z) - VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction [59.40711222096875]
We present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting.
Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets.
arXiv Detail & Related papers (2024-02-27T11:40:50Z) - Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene
Reconstruction [29.83056271799794]
Implicit neural representation has paved the way for new approaches to dynamic scene reconstruction and rendering.
We propose a deformable 3D Gaussians Splatting method that reconstructs scenes using 3D Gaussians and learns them in canonical space.
Through a differential Gaussianizer, the deformable 3D Gaussians not only achieve higher rendering quality but also real-time rendering speed.
arXiv Detail & Related papers (2023-09-22T16:04:02Z) - DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z) - NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed
Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities belonging to three categories: static, deforming, and new areas.
arXiv Detail & Related papers (2022-10-28T07:11:05Z) - Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.