Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
- URL: http://arxiv.org/abs/2409.08353v1
- Date: Thu, 12 Sep 2024 18:33:13 GMT
- Title: Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
- Authors: Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu
- Abstract summary: We present a novel approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance.
Our approach achieves a compression ratio of up to 120 times, requiring only approximately 350 KB of storage per frame.
We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets.
- Score: 44.50599475213118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350 KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.
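As a concrete illustration of the motion-appearance disentanglement, below is a minimal PyTorch sketch of how skin Gaussians might be anchored to joint Gaussians at the first frame and then warped for coarse motion prediction. The tensor shapes, k-nearest binding, and inverse-distance weights are illustrative assumptions, not the paper's exact formulation.

    import torch

    def anchor_skin_to_joints(skin_xyz, joint_xyz, k=4):
        # Frame-0 initialization: bind each skin Gaussian to its k nearest
        # joint Gaussians with inverse-distance blend weights (an assumed
        # weighting scheme, not necessarily the paper's).
        dist, idx = torch.cdist(skin_xyz, joint_xyz).topk(k, largest=False)
        w = 1.0 / (dist + 1e-6)
        return idx, w / w.sum(dim=1, keepdim=True)

    def warp_skin(skin_xyz0, joint_xyz0, joint_xyz_t, idx, w):
        # Coarse alignment: move each skin Gaussian by the blended
        # translations of its anchor joints (rotation blending omitted).
        delta = joint_xyz_t - joint_xyz0                    # (J, 3) joint motion
        return skin_xyz0 + (w.unsqueeze(-1) * delta[idx]).sum(dim=1)

The fine-grained stage would then optimize per-Gaussian appearance and residual motion on top of this coarse prediction, as the abstract describes.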
Related papers
- HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting [7.507657419706855]
This paper proposes an efficient framework, dubbed HiCoM, with three key components.
First, we construct a compact and robust initial 3DGS representation using a perturbation smoothing strategy.
Next, we introduce a Hierarchical Coherent Motion mechanism that leverages the inherent non-uniform distribution and local consistency of 3D Gaussians.
Experiments conducted on two widely used datasets show that the framework improves the learning efficiency of state-of-the-art methods by about 20%.
arXiv Detail & Related papers (2024-11-12T04:40:27Z)
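A rough PyTorch sketch of the hierarchical-motion idea above: nearby Gaussians share the motion of a coarse anchor and add a small per-Gaussian residual. The anchor layout and residual parameterization are assumptions for illustration, not HiCoM's exact design.

    import torch

    def hierarchical_motion(xyz, anchors, anchor_motion, residual):
        # Coarse level: each Gaussian inherits its nearest anchor's motion,
        # exploiting local consistency; fine level: a per-Gaussian residual.
        idx = torch.cdist(xyz, anchors).argmin(dim=1)
        return xyz + anchor_motion[idx] + residual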
- V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians [53.614560799043545]
V^3 (Viewing Volumetric Videos) is a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians.
Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs.
As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience.
arXiv Detail & Related papers (2024-09-20T16:54:27Z)
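The codec-friendly packing idea above can be sketched in a few lines of NumPy: quantize per-Gaussian attributes to 8 bits and tile them into an RGB image per frame, so that the frame sequence becomes an ordinary 2D video for a hardware codec. The layout, resolution, and min-max quantization below are illustrative choices, not V^3's actual format.

    import numpy as np

    def pack_gaussians_to_frame(attrs, width=1024):
        # attrs: (N, C) float array of per-Gaussian attributes, with C % 3 == 0.
        lo, hi = attrs.min(0), attrs.max(0)
        q = np.round((attrs - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
        pixels = q.reshape(-1, 3)                    # pack channels as RGB
        height = -(-len(pixels) // width)            # ceil division
        frame = np.zeros((height * width, 3), np.uint8)
        frame[: len(pixels)] = pixels
        return frame.reshape(height, width, 3), (lo, hi)  # image + dequant range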
- SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length [2.4844080708094745]
This paper introduces SwinGS, a framework for training, delivering, and rendering volumetric video in a real-time streaming fashion.
We show that SwinGS reduces transmission costs by 83.6% compared to previous work, with negligible compromise in PSNR.
We also develop an interactive WebGL viewer enabling real-time volumetric video playback on most devices with modern browsers.
arXiv Detail & Related papers (2024-09-12T05:33:15Z)
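The sliding-window delivery above can be pictured with a small scheduler sketch: keep only the Gaussian chunks covering the current playback window and evict the rest as playback advances. The per-frame chunk granularity and the fetch interface are hypothetical.

    from collections import deque

    class SlidingWindowStream:
        def __init__(self, window=8):
            self.window = window
            self.chunks = deque()             # (frame_id, gaussians) pairs

        def advance(self, frame_id, fetch_chunk):
            # Fetch the chunk for the new frame, evict chunks that have
            # slid out of the window, return the active set for rendering.
            self.chunks.append((frame_id, fetch_chunk(frame_id)))
            while self.chunks[0][0] <= frame_id - self.window:
                self.chunks.popleft()
            return [g for _, g in self.chunks]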
- HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting [48.59338619051709]
HiFi4G is an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage.
It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame.
arXiv Detail & Related papers (2023-12-06T12:36:53Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Human Performance Modeling and Rendering via Neural Animated Mesh [40.25449482006199]
We bridge the traditional mesh workflow with a new class of neural rendering techniques.
In this paper, we present a novel approach for rendering human views from video.
We demonstrate our approach on various platforms, inserting virtual human performances into AR headsets.
arXiv Detail & Related papers (2022-09-18T03:58:00Z)
- Context-Aware Video Reconstruction for Rolling Shutter Cameras [52.28710992548282]
In this paper, we propose a context-aware global-shutter (GS) video reconstruction architecture.
We first estimate the bilateral motion field so that the pixels of the two RS frames are warped to a common GS frame.
Then, a refinement scheme is proposed to guide the GS frame synthesis along with bilateral occlusion masks to produce high-fidelity GS video frames.
arXiv Detail & Related papers (2022-05-25T17:05:47Z)
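For intuition, the warping step of such a pipeline can be sketched in PyTorch: given a backward displacement field toward the common global-shutter frame (one per rolling-shutter input), each frame is resampled with grid_sample. The flow is treated as an input here; estimating it (and the bilateral occlusion masks) is the learned part of the method.

    import torch
    import torch.nn.functional as F

    def warp_to_gs(rs_frame, flow):
        # rs_frame: (B, C, H, W) rolling-shutter image; flow: (B, 2, H, W)
        # backward displacements toward the common global-shutter frame.
        b, _, h, w = rs_frame.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), -1).float().expand(b, h, w, 2)
        tgt = grid + flow.permute(0, 2, 3, 1)         # displaced sample positions
        tgt[..., 0] = tgt[..., 0] / (w - 1) * 2 - 1   # normalize to [-1, 1]
        tgt[..., 1] = tgt[..., 1] / (h - 1) * 2 - 1
        return F.grid_sample(rs_frame, tgt, align_corners=True)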
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.