V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
- URL: http://arxiv.org/abs/2409.13648v2
- Date: Mon, 23 Sep 2024 08:04:53 GMT
- Title: V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
- Authors: Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, Lan Xu
- Abstract summary: V^3 (Viewing Volumetric Videos) is a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians.
Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs.
As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Experiencing high-fidelity volumetric video as seamlessly as 2D videos is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy that reduces storage requirements while keeping training fast. The first stage employs hash encoding and a shallow MLP to learn motion, then prunes Gaussians to meet the streaming requirements, while the second stage fine-tunes the other Gaussian attributes using a residual entropy loss and a temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage. Meanwhile, we design a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3, outperforming other methods by enabling high-quality rendering and streaming on common devices, a capability not previously demonstrated. As the first approach to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Our project page with source code is available at https://authoritywang.github.io/v3/.
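To make the key idea concrete, below is a minimal sketch of packing per-frame Gaussian attributes into fixed-size 2D images that an ordinary video encoder can then compress. The attribute layout, quantization scheme, and function names are illustrative assumptions, not the paper's actual format.

```python
# Minimal sketch: serialize per-frame Gaussian attributes into 2D images so a
# standard video codec can compress them. Layout and quantization here are
# illustrative; V^3's actual packing format may differ.
import numpy as np

def pack_frame(positions, scales, colors, side=1024):
    """Pack N Gaussians' attributes into one RGB image per attribute group.

    positions: (N, 3) float32 scene coordinates
    scales:    (N, 3) float32
    colors:    (N, 3) float32 in [0, 1]
    Returns {name: (image, lo, hi)} with (side, side, 3) uint8 images.
    """
    n = positions.shape[0]
    assert n <= side * side, "too many Gaussians for one image"
    images = {}
    for name, attr in (("position", positions), ("scale", scales), ("color", colors)):
        lo, hi = attr.min(axis=0), attr.max(axis=0)
        q = np.zeros((side * side, 3), dtype=np.uint8)
        # Normalize each channel to [0, 255]; (lo, hi) travels as side info so
        # the player can dequantize after hardware video decoding. Gaussians
        # must keep a consistent order across frames so the packed "video" is
        # temporally coherent and codec-friendly.
        q[:n] = np.round(255 * (attr - lo) / np.maximum(hi - lo, 1e-8)).astype(np.uint8)
        images[name] = (q.reshape(side, side, 3), lo, hi)
    return images
```

Provided the ordering stays stable across frames, the packed image sequence behaves like an ordinary 2D video, which is what allows standard hardware codecs (e.g., H.264) on phones to decode the stream efficiently.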
Related papers
- Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
Diffusion Transformers (DiTs) dominate video generation, but their high computational cost severely limits real-world applicability.
We propose a training-free framework termed Sparse VideoGen (SVG) that leverages the inherent sparsity in 3D Full Attention to boost inference efficiency.
SVG achieves up to 2.28x and 2.33x end-to-end speedup on CogVideoX-v1.5 and HunyuanVideo, respectively, while preserving generation quality.
arXiv Detail & Related papers (2025-02-03T19:29:16Z)
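The excerpt above only names the mechanism; as a rough illustration, the sketch below implements block-sparse attention, in which each query block attends only to a small top-scoring subset of key/value blocks. The block-scoring heuristic and parameters are placeholders, not SVG's actual head-wise sparsity patterns.

```python
# Block-sparse attention sketch: skip most key/value blocks per query block.
# Assumes T is divisible by `block` for brevity; the scoring proxy is made up.
import numpy as np

def block_sparse_attention(Q, K, V, block=64, keep_ratio=0.2):
    """Q, K, V: (T, d) token matrices. Attends over top-scoring key blocks."""
    T, d = Q.shape
    nb = T // block
    out = np.zeros_like(V)
    # Cheap per-block proxy: mean-pooled keys (illustrative stand-in).
    pooled = K[:nb * block].reshape(nb, block, d).mean(axis=1)   # (nb, d)
    for qb in range(nb):
        q = Q[qb * block:(qb + 1) * block]                       # (block, d)
        scores = q.mean(axis=0) @ pooled.T                       # (nb,)
        top = np.argsort(scores)[-max(1, int(keep_ratio * nb)):]
        idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in top])
        s = q @ K[idx].T / np.sqrt(d)
        w = np.exp(s - s.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                        # row softmax
        out[qb * block:(qb + 1) * block] = w @ V[idx]
    return out
```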
- GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting
We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames.
Experiment results show that GSVC achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs.
arXiv Detail & Related papers (2025-01-21T11:30:51Z)
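As a toy illustration of the primitive GSVC fits per frame, the sketch below renders an image from isotropic 2D Gaussians with additive blending; the actual method optimizes splat parameters against each frame (e.g., by gradient descent on reconstruction error) with a more careful formulation.

```python
# Render an image as a sum of isotropic 2D Gaussian splats (simplified:
# additive blending rather than proper alpha compositing).
import numpy as np

def render_splats(means, sigmas, colors, H, W):
    """means: (N, 2) pixel coords (x, y); sigmas: (N,); colors: (N, 3) in [0, 1]."""
    ys, xs = np.mgrid[0:H, 0:W]
    img = np.zeros((H, W, 3), dtype=np.float32)
    for (mx, my), s, c in zip(means, sigmas, colors):
        # Gaussian footprint of this splat over the whole image.
        w = np.exp(-((xs - mx) ** 2 + (ys - my) ** 2) / (2 * s * s))
        img += w[..., None] * c
    return np.clip(img, 0, 1)
```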
- Representing Long Volumetric Video with Temporal Gaussian Hierarchy
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos.
We propose a novel 4D representation, named Temporal Gaussian Hierarchy, to compactly model long volumetric videos.
This work is the first approach capable of efficiently handling minutes of volumetric video data while maintaining state-of-the-art rendering quality.
arXiv Detail & Related papers (2024-12-12T18:59:34Z)
- QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos
Online free-viewpoint video (FVV) streaming is a challenging problem, which is relatively under-explored.
We propose a novel framework for QUantized and Efficient ENcoding for streaming FVV using 3D Gaussian Splatting.
We further propose a quantization-sparsity framework, which contains a learned latent decoder for effectively quantizing attribute residuals other than Gaussian positions.
arXiv Detail & Related papers (2024-12-05T18:59:55Z)
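QUEEN's learned latent decoder is beyond a short excerpt, but the general shape of streaming quantized per-frame residuals can be sketched with a plain uniform quantizer as a stand-in:

```python
# Residual quantization sketch: only integer codes cross the wire each frame.
# The uniform quantizer is an illustrative stand-in for QUEEN's learned one.
import numpy as np

def encode_residual(prev_attr, curr_attr, step=1e-3):
    """Quantize the frame-to-frame change of a Gaussian attribute tensor."""
    residual = curr_attr - prev_attr
    return np.round(residual / step).astype(np.int32)

def decode_residual(prev_attr, codes, step=1e-3):
    """Receiver reconstructs the attribute from the previous frame + codes."""
    return prev_attr + codes.astype(np.float32) * step
```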
- HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting
This paper proposes an efficient framework, dubbed HiCoM, with three key components.
First, we construct a compact and robust initial 3DGS representation using a perturbation smoothing strategy.
Next, we introduce a Hierarchical Coherent Motion mechanism that leverages the inherent non-uniform distribution and local consistency of 3D Gaussians.
Experiments conducted on two widely used datasets show that our framework improves the learning efficiency of state-of-the-art methods by about 20%.
arXiv Detail & Related papers (2024-11-12T04:40:27Z)
- Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
We present a novel approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance.
Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame.
We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets.
arXiv Detail & Related papers (2024-09-12T18:33:13Z)
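As a rough consistency check on the storage figure above (our arithmetic, with an assumed 30 fps playback rate, not a figure from the paper): 350 KB per frame at 30 fps is 350 × 30 = 10,500 KB/s, roughly 10.5 MB/s or about 84 Mbit/s, which is why per-frame storage in the hundreds of kilobytes is the relevant budget for headset streaming.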
- SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length
This paper introduces SwinGS, a framework for training, delivering, and rendering volumetric video in a real-time streaming fashion.
We show that SwinGS reduces transmission costs by 83.6% compared to previous work with negligible compromise in PSNR.
We also develop an interactive WebGL viewer enabling real-time volumetric video playback on most devices with modern browsers.
arXiv Detail & Related papers (2024-09-12T05:33:15Z)
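A minimal sketch of the sliding-window idea the title refers to: each incoming frame contributes a chunk of Gaussians, and chunks that fall out of the window are evicted so the client's working set stays bounded regardless of video length. The data layout is an assumption for illustration.

```python
# Sliding-window sketch: the client keeps only Gaussians alive within the
# current window of frames, bounding memory for arbitrarily long videos.
from collections import deque

class GaussianWindow:
    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()  # (frame_id, gaussians_chunk) pairs

    def push(self, frame_id, gaussians_chunk):
        """Admit the chunk streamed for `frame_id`, evicting expired chunks."""
        self.window.append((frame_id, gaussians_chunk))
        while self.window and self.window[0][0] <= frame_id - self.window_size:
            self.window.popleft()

    def active_gaussians(self):
        """Chunks to splat for the current frame."""
        return [chunk for _, chunk in self.window]
```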
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
Video-3DGS is a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors.
Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos.
It enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.
arXiv Detail & Related papers (2024-06-04T17:57:37Z)
- VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
VideoRF is the first approach to enable real-time streaming and rendering of dynamic radiance fields on mobile platforms.
We show that the feature image stream can be efficiently compressed by 2D video codecs.
We have developed a real-time interactive player that enables online streaming and rendering of dynamic scenes.
arXiv Detail & Related papers (2023-12-03T14:14:35Z)
- Scalable Neural Video Representations with Learnable Positional Features
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 to 34.57.
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
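A minimal sketch in the spirit of NVP's learnable positional features: a small learnable (x, y, t) feature grid decoded by a shallow MLP, trained by regressing sampled pixel colors. Grid sizes, layer widths, and names are illustrative assumptions, not NVP's actual architecture.

```python
# Latent-grid video sketch: a learnable feature volume indexed by (x, y, t),
# decoded to RGB by a small MLP. Fit by minimizing MSE to sampled pixels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentVideo(nn.Module):
    def __init__(self, gx=64, gy=64, gt=32, feat=16):
        super().__init__()
        # Learnable 3D feature volume standing in for positional features.
        self.grid = nn.Parameter(torch.randn(1, feat, gt, gy, gx) * 0.01)
        self.mlp = nn.Sequential(
            nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid()
        )

    def forward(self, coords):
        """coords: (N, 3) in [-1, 1], ordered (x, y, t); returns (N, 3) RGB."""
        # grid_sample's last grid dim is (x, y, z); time plays the role of z.
        g = F.grid_sample(
            self.grid, coords.view(1, -1, 1, 1, 3), align_corners=True
        )  # -> (1, feat, N, 1, 1)
        return self.mlp(g.view(self.grid.shape[1], -1).t())
```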