DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity
Human-centric Rendering
- URL: http://arxiv.org/abs/2307.10173v2
- Date: Sat, 30 Sep 2023 06:24:23 GMT
- Title: DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity
Human-centric Rendering
- Authors: Wei Cheng, Ruixiang Chen, Wanqi Yin, Siming Fan, Keyu Chen, Honglin
He, Huiwen Luo, Zhongang Cai, Jingbo Wang, Yang Gao, Zhengming Yu, Zhengyu
Lin, Daxuan Ren, Lei Yang, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu,
Dahua Lin, Bo Dai, Kwan-Yee Lin
- Abstract summary: We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames of data.
We construct a professional multi-view capture system comprising 60 synchronized cameras with a maximum resolution of 4096 x 3000, capturing at 15 fps, with rigorous camera calibration.
- Score: 126.00165445599764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realistic human-centric rendering plays a key role in both computer vision
and computer graphics. Rapid algorithmic progress has been made over the years,
yet existing human-centric rendering datasets and benchmarks are rather
impoverished in terms of diversity, which is crucial for rendering quality.
Researchers are usually constrained to exploring and evaluating a small set
of rendering problems on current datasets, while real-world applications
require methods to be robust across different scenarios. In this work, we
present DNA-Rendering, a large-scale, high-fidelity repository of human
performance data for neural actor rendering. DNA-Rendering presents several
alluring attributes. First, our dataset contains over 1500 human subjects, 5000
motion sequences, and 67.5M frames of data. Second, we provide rich assets
for each subject -- 2D/3D human body keypoints, foreground masks, SMPLX models,
cloth/accessory materials, multi-view images, and videos. These assets boost
current methods' accuracy on downstream rendering tasks. Third, we construct a
professional multi-view capture system comprising 60 synchronized cameras with
a maximum resolution of 4096 x 3000, capturing at 15 fps, together with
rigorous camera calibration, ensuring high-quality resources for task training and
evaluation. Along with the dataset, we provide a large-scale, quantitative
benchmark with multiple tasks to evaluate the existing progress in novel view
synthesis, novel pose animation synthesis, and novel identity rendering. In
this manuscript, we describe our DNA-Rendering effort as revealing new
observations, challenges, and future directions for human-centric rendering.
The dataset, code, and benchmarks will be publicly
available at https://dna-rendering.github.io/
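To make the asset inventory and capture figures above concrete, here is a minimal sketch of a per-subject record together with a sanity check of the reported data volume. The field names are hypothetical (they do not reflect the actual DNA-Rendering file layout), and the 15-second average clip length is our assumption, chosen because it exactly reproduces the stated 67.5M frames.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical per-subject asset record mirroring the modalities listed in
# the abstract. Names are illustrative, not the released directory layout.
@dataclass
class SubjectAssets:
    subject_id: str
    keypoints_2d: List[str] = field(default_factory=list)  # per-view 2D body keypoints
    keypoints_3d: List[str] = field(default_factory=list)  # triangulated 3D keypoints
    masks: List[str] = field(default_factory=list)         # foreground segmentation masks
    smplx_params: List[str] = field(default_factory=list)  # per-frame SMPL-X fits
    images: List[str] = field(default_factory=list)        # multi-view RGB frames
    videos: List[str] = field(default_factory=list)        # per-camera videos

# Sanity check: 5,000 sequences x 60 synchronized cameras x 15 fps; an
# assumed 15 s average clip reproduces the stated 67.5M frames exactly.
n_sequences, n_cameras, fps, avg_clip_seconds = 5_000, 60, 15, 15
total_frames = n_sequences * n_cameras * fps * avg_clip_seconds
assert total_frames == 67_500_000
print(f"{total_frames / 1e6:.1f}M frames")
```

Under that assumption, each sequence contributes roughly 225 frames per camera, a useful mental model when budgeting storage or dataloader throughput.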
Related papers
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits, as it enables one to: (1) learn token locations for transformer models; (2) directly regress the 3D camera poses of 2D images with respect to NeRF models.
This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research (a toy pose-alignment sketch follows this entry).
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
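The pose-regression benefit above can be illustrated with a toy analysis-by-synthesis loop: given a differentiable renderer over an implicit field, a camera translation can be recovered by gradient descent on a photometric loss. This is a generic sketch under our own assumptions, not Implicit-Zoo's actual pipeline (which regresses poses directly); the scalar Gaussian field below is a stand-in for a trained NeRF.

```python
import torch
import torch.nn.functional as F

# Toy scalar "radiance field" standing in for a trained NeRF.
def field(x):
    return torch.exp(-(x ** 2).sum(-1))

def render(cam_t, dirs, n_samples=32):
    # March each ray from the camera origin and average field samples:
    # a crude stand-in for volume rendering, differentiable w.r.t. cam_t.
    ts = torch.linspace(0.5, 3.0, n_samples).view(-1, 1, 1)
    pts = cam_t + ts * dirs        # (n_samples, n_rays, 3)
    return field(pts).mean(dim=0)  # (n_rays,)

torch.manual_seed(0)
dirs = F.normalize(torch.randn(64, 3), dim=-1)
target = render(torch.tensor([0.2, -0.1, -2.0]), dirs)  # "observed" pixels

cam_t = torch.tensor([0.0, 0.0, -2.5], requires_grad=True)  # rough initial guess
opt = torch.optim.Adam([cam_t], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = ((render(cam_t, dirs) - target) ** 2).mean()
    loss.backward()
    opt.step()
print("recovered camera translation:", cam_t.detach())
```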
- MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures [44.172804112944625]
We present MVHumanNet, a dataset that comprises multi-view human action sequences of 4,500 human identities.
Our dataset contains 9,000 daily outfits, 60,000 motion sequences and 645 million frames with extensive annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPLX parameters, and corresponding textual descriptions.
arXiv Detail & Related papers (2023-12-05T18:50:12Z)
- Relightable Neural Human Assets from Multi-view Gradient Illuminations [39.70530019396583]
We present UltraStage, a new 3D human dataset that contains more than 2,000 high-quality human assets captured under both multi-view and multi-illumination settings.
Inspired by recent advances in neural representation, we interpret each example into a neural human asset which allows novel view synthesis under arbitrary lighting conditions.
We show that our neural human assets achieve extremely high capture performance and can represent fine details such as facial wrinkles and cloth folds (a diffuse relighting sketch follows this entry).
arXiv Detail & Related papers (2022-12-15T08:06:03Z)
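As a rough illustration of relighting captured assets under a new light direction, the sketch below applies single-bounce diffuse (Lambertian) shading given per-pixel albedo and normals. This is a generic shading model chosen for illustration; UltraStage's neural assets model appearance far more richly than this.

```python
import numpy as np

def relight_lambertian(albedo, normals, light_dir):
    """Diffuse relighting: per-pixel albedo scaled by the clamped cosine term."""
    l = light_dir / np.linalg.norm(light_dir)
    shading = np.clip(normals @ l, 0.0, None)  # (H, W) cosine foreshortening
    return albedo * shading[..., None]

# Toy usage with a flat normal map; real albedo/normal maps would come from
# the multi-view, multi-illumination captures.
albedo = np.random.rand(4, 4, 3)
normals = np.tile(np.array([0.0, 0.0, 1.0]), (4, 4, 1))
img = relight_lambertian(albedo, normals, np.array([0.3, 0.3, 1.0]))
print(img.shape)  # (4, 4, 3)
```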
- Human Performance Modeling and Rendering via Neural Animated Mesh [40.25449482006199]
We bridge the traditional mesh pipeline with a new class of neural rendering techniques.
In this paper, we present a novel approach for rendering human views from video.
We demonstrate our approach on various platforms, inserting virtual human performances into real scenes viewed through AR headsets.
arXiv Detail & Related papers (2022-09-18T03:58:00Z)
- Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large-scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z)
- HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling [83.57675975092496]
HuMMan is a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames.
HuMMan has several appealing properties: 1) multi-modal data and annotations including color images, point clouds, keypoints, SMPL parameters, and textured meshes.
arXiv Detail & Related papers (2022-04-28T17:54:25Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via neural features (a back-projection sketch follows this entry).
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
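A minimal sketch of the first step of the RGBD pipeline above: lifting a depth map to a colored point cloud with pinhole intrinsics. The intrinsics and inputs below are synthetic placeholders, and the neural-feature rendering and depth-inpainting stages are not shown.

```python
import numpy as np

def backproject_rgbd(depth, rgb, K):
    """Lift an RGBD frame to a colored 3D point cloud using pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0  # missing depth is commonly encoded as 0
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    points = np.stack([x, y, z], axis=1)[valid]
    colors = rgb.reshape(-1, 3)[valid]
    return points, colors

# Toy usage with synthetic data; real depths/intrinsics come from the sensor.
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
depth = np.random.uniform(0.5, 5.0, (480, 640))
rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
pts, cols = backproject_rgbd(depth, rgb, K)
print(pts.shape, cols.shape)  # (N, 3) (N, 3)
```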
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- A Review of Deep Learning Techniques for Markerless Human Motion on Synthetic Datasets [0.0]
Estimating human posture has recently gained increasing attention in the computer vision community.
We present a model that can predict the skeleton of an animation based solely on 2D images.
The implementation uses DeepLabCut on a custom dataset to perform the necessary processing steps.
arXiv Detail & Related papers (2022-01-07T15:42:50Z)
- HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media [16.711606354731533]
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities captured simultaneously.
We provide benchmarks on HUMAN4D with state-of-the-art human pose estimation and 3D pose estimation methods (an evaluation-metric sketch follows this entry).
arXiv Detail & Related papers (2021-10-14T09:03:35Z)
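For pose-estimation benchmarks like those reported on HUMAN4D, a standard headline metric is MPJPE, the mean per-joint position error. A minimal sketch, with synthetic poses standing in for real predictions and ground truth:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error, in the units of the inputs (e.g., mm)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy usage: a batch of 8 poses with 17 joints each, in millimeters.
rng = np.random.default_rng(0)
gt = rng.uniform(-1000, 1000, (8, 17, 3))
pred = gt + rng.normal(0.0, 20.0, gt.shape)  # simulated ~20 mm joint noise
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```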
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.