HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion
- URL: http://arxiv.org/abs/2305.06356v2
- Date: Thu, 11 May 2023 17:59:43 GMT
- Title: HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion
- Authors: Mustafa Işık, Martin Rünz, Markos Georgopoulos, Taras Khakhulin, Jonathan Starck, Lourdes Agapito, Matthias Nießner
- Abstract summary: We introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input.
Our novel representation acts as a dynamic video encoding that captures fine details at high compression rates.
We demonstrate challenges that emerge from using such high-resolution data and show that our newly introduced HumanRF effectively leverages this data.
- Score: 7.592039690054564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representing human performance at high fidelity is an essential building
block in diverse applications, such as film production, computer games or
videoconferencing. To close the gap to production-level quality, we introduce
HumanRF, a 4D dynamic neural scene representation that captures full-body
appearance in motion from multi-view video input, and enables playback from
novel, unseen viewpoints. Our novel representation acts as a dynamic video
encoding that captures fine details at high compression rates by factorizing
space-time into a temporal matrix-vector decomposition. This allows us to
obtain temporally coherent reconstructions of human actors for long sequences,
while representing high-resolution details even in the context of challenging
motion. While most research focuses on synthesizing at resolutions of 4MP or
lower, we address the challenge of operating at 12MP. To this end, we introduce
ActorsHQ, a novel multi-view dataset that provides 12MP footage from 160
cameras for 16 sequences with high-fidelity, per-frame mesh reconstructions. We
demonstrate challenges that emerge from using such high-resolution data and
show that our newly introduced HumanRF effectively leverages this data, making
a significant step towards production-level quality novel view synthesis.
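The core factorization can be pictured with a short PyTorch sketch. This is an illustrative toy under assumed shapes, not HumanRF's actual implementation (which builds on multiresolution hash grids and differs in detail): per-component 3D feature grids act as the "matrix" factors, per-frame weight sequences act as the "vector" factors, and their weighted sum gives the space-time feature.
```python
import torch

class TemporalMatrixVectorField(torch.nn.Module):
    """Toy low-rank space-time field: F(x, t) ~ sum_r G_r(x) * v_r(t),
    where each G_r is a dense 3D feature grid ("matrix" factor) and each
    v_r is a per-frame weight sequence ("vector" factor)."""

    def __init__(self, n_components=4, grid_res=32, n_frames=100, feat_dim=8):
        super().__init__()
        # One dense 3D feature grid per low-rank component.
        self.grids = torch.nn.Parameter(
            0.1 * torch.randn(n_components, feat_dim, grid_res, grid_res, grid_res))
        # One scalar weight per component and frame.
        self.temporal = torch.nn.Parameter(0.1 * torch.randn(n_components, n_frames))

    def forward(self, xyz, frame_idx):
        # xyz: (N, 3) query points in [-1, 1]; frame_idx: (N,) frame indices.
        n_comp = self.grids.shape[0]
        pts = xyz.view(1, -1, 1, 1, 3).expand(n_comp, -1, -1, -1, -1)
        # Trilinearly sample every component grid at the query points.
        feats = torch.nn.functional.grid_sample(
            self.grids, pts, align_corners=True)[..., 0, 0]  # (R, C, N)
        weights = self.temporal[:, frame_idx]                 # (R, N)
        # Sum over components: temporal weights modulate spatial features.
        return (feats * weights.unsqueeze(1)).sum(dim=0).T   # (N, C)
```
Because the temporal variation is carried by low-dimensional per-frame vectors while the spatial grids are shared across the whole sequence, long captures can be represented far more compactly than with one volume per frame, which is the intuition behind the high compression rates claimed above.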
Related papers
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce a downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z)
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of a video diffusion model and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling [33.00658723633997]
We present PKU-DyMVHumans, a versatile human-centric dataset for high-fidelity reconstruction and rendering of dynamic human scenarios.
It comprises 8.2 million frames captured by more than 56 synchronized cameras across diverse scenarios.
Inspired by recent advancements in neural radiance field (NeRF)-based scene representations, we carefully set up an off-the-shelf framework.
arXiv Detail & Related papers (2024-03-24T10:06:40Z)
- NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads [2.5999037208435705]
We propose a new multi-view capture setup composed of 16 calibrated machine vision cameras.
With our setup, we collect a new dataset of over 4700 high-resolution, high-framerate sequences of more than 220 human heads.
In order to reconstruct high-fidelity human heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles.
arXiv Detail & Related papers (2023-05-04T17:52:18Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
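The aggregation idea can be illustrated with a minimal, hypothetical sketch (not DynIBaR's actual code): project each 3D sample into nearby source views, bilinearly sample each view's feature map, and blend the results with per-view weights. The `proj_mats` and `view_weights` inputs are assumed to be given by the calibration and view-selection stages.
```python
import torch
import torch.nn.functional as F

def aggregate_source_features(pts, feat_maps, proj_mats, view_weights):
    """Project 3D points into each source view, bilinearly sample that view's
    feature map, and blend across views with per-view weights.
    pts: (N, 3) world points; feat_maps: (V, C, H, W) source-view features;
    proj_mats: (V, 3, 4) world-to-pixel projection matrices (assumed given);
    view_weights: (V,) non-negative weights, e.g. larger for source views
    whose viewing direction is close to the target ray. Returns (N, C)."""
    V, C, H, W = feat_maps.shape
    homo = torch.cat([pts, torch.ones(pts.shape[0], 1)], dim=-1)  # (N, 4)
    uvw = torch.einsum('vij,nj->vni', proj_mats, homo)            # (V, N, 3)
    uv = uvw[..., :2] / uvw[..., 2:].clamp(min=1e-6)              # pixel coords
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    grid = torch.stack([2 * uv[..., 0] / (W - 1) - 1,
                        2 * uv[..., 1] / (H - 1) - 1], dim=-1)    # (V, N, 2)
    sampled = F.grid_sample(feat_maps, grid.unsqueeze(2),
                            align_corners=True)[..., 0]           # (V, C, N)
    w = view_weights.view(V, 1, 1)
    return (w * sampled).sum(0).T / view_weights.sum()            # (N, C)
```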
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are assigned probabilities of belonging to three categories: static, deforming, and new areas.
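A toy sketch of this decomposition, with stand-in MLP fields and assumed shapes rather than the paper's actual representation: a small network predicts the three category probabilities per space-time point, and the output feature is the probability-weighted blend of three specialized sub-fields.
```python
import torch

class DecomposedField(torch.nn.Module):
    """Sketch of a NeRFPlayer-style decomposition (details differ from the
    paper): blend static / deforming / new sub-fields by predicted
    per-point category probabilities."""

    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        self.prob_net = torch.nn.Sequential(
            torch.nn.Linear(4, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 3))
        # Stand-ins for the static, deforming, and newly-appearing fields.
        self.fields = torch.nn.ModuleList([
            torch.nn.Sequential(torch.nn.Linear(4, hidden), torch.nn.ReLU(),
                                torch.nn.Linear(hidden, feat_dim))
            for _ in range(3)])

    def forward(self, xyzt):
        # xyzt: (N, 4) space-time query points.
        probs = torch.softmax(self.prob_net(xyzt), dim=-1)        # (N, 3)
        feats = torch.stack([f(xyzt) for f in self.fields], -1)   # (N, F, 3)
        return (feats * probs.unsqueeze(1)).sum(-1)               # (N, F)
```
Routing mostly-static points to a compact static field is what makes the representation both compact and streamable: only the deforming and new regions need per-time capacity.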
arXiv Detail & Related papers (2022-10-28T07:11:05Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data for training; no high-frame-rate videos are needed.
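As a rough illustration of the idea (the paper's model is more elaborate), the hypothetical sketch below contrasts naive linear interpolation of joint positions with a learned model that adds a non-linear residual on top of that baseline; the architecture and joint count are assumptions.
```python
import torch

def lerp_pose(p0, p1, alpha):
    """Naive baseline: linear interpolation of joint positions."""
    return (1 - alpha) * p0 + alpha * p1

class ResidualMotionModel(torch.nn.Module):
    """Toy stand-in for a learned motion model: predicts a non-linear
    correction on top of the linear-interpolation baseline."""

    def __init__(self, n_joints=24):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * 3 * n_joints + 1, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 3 * n_joints))

    def forward(self, p0, p1, alpha):
        # p0, p1: (B, J, 3) keyframe joint positions; alpha: (B, 1) in [0, 1].
        base = lerp_pose(p0, p1, alpha.unsqueeze(-1))
        x = torch.cat([p0.flatten(1), p1.flatten(1), alpha], dim=-1)
        return base + self.net(x).view_as(p0)  # learned non-linear residual
```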
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Neural 3D Video Synthesis [18.116032726623608]
We propose a novel approach for 3D video synthesis that is able to represent multi-view video recordings of a dynamic real-world scene.
Our approach takes the high quality and compactness of static neural radiance fields in a new direction: to a model-free, dynamic setting.
We demonstrate that our method can render high-fidelity wide-angle novel views at over 1K resolution, even for highly complex and dynamic scenes.
arXiv Detail & Related papers (2021-03-03T18:47:40Z)