HUMBI: A Large Multiview Dataset of Human Body Expressions and Benchmark Challenge
- URL: http://arxiv.org/abs/2110.00119v1
- Date: Thu, 30 Sep 2021 23:19:25 GMT
- Title: HUMBI: A Large Multiview Dataset of Human Body Expressions and Benchmark Challenge
- Authors: Jae Shin Yoon, Zhixuan Yu, Jaesik Park, Hyun Soo Park
- Abstract summary: This paper presents a new large multiview dataset called HUMBI for human body expressions with natural clothing.
107 synchronized HD cameras are used to capture 772 distinctive subjects across gender, ethnicity, age, and style.
We reconstruct high-fidelity body expressions using 3D mesh models, which allows view-specific appearance to be represented.
- Score: 33.26419876973344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a new large multiview dataset called HUMBI for human body
expressions with natural clothing. The goal of HUMBI is to facilitate modeling
view-specific appearance and geometry of five primary body signals including
gaze, face, hand, body, and garment from assorted people. 107 synchronized HD
cameras are used to capture 772 distinctive subjects across gender, ethnicity,
age, and style. With the multiview image streams, we reconstruct high fidelity
body expressions using 3D mesh models, which allows representing view-specific
appearance. We demonstrate that HUMBI is highly effective in learning and
reconstructing a complete human model and is complementary to the existing
datasets of human body expressions with limited views and subjects such as
MPII-Gaze, Multi-PIE, Human3.6M, and Panoptic Studio datasets. Based on HUMBI,
we formulate a new benchmark challenge of a pose-guided appearance rendering
task that aims to substantially extend photorealism in modeling diverse human
expressions in 3D, which is the key enabling factor of authentic social
tele-presence. HUMBI is publicly available at http://humbi-data.net
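The reconstruction described in the abstract rests on standard multiview geometry: with many synchronized, calibrated cameras, 2D observations of the same body point can be lifted to 3D. Below is a minimal sketch of linear (DLT) triangulation for intuition only; the function, toy cameras, and variable names are illustrative assumptions, not HUMBI's released tooling or the authors' actual pipeline.

```python
# Minimal sketch (not HUMBI's pipeline): triangulate one 3D point from its 2D
# observations in several calibrated views via linear triangulation (DLT).
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """proj_mats : list of (3, 4) camera projection matrices P = K [R | t]
    points_2d : list of (2,) pixel coordinates, one per view
    returns   : (3,) triangulated 3D point in world coordinates
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) = P[0] @ X   and   v * (P[2] @ X) = P[1] @ X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Toy usage: two synthetic cameras observing the point (0, 0, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                    # camera at the origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])    # camera shifted on x
X_true = np.array([0.0, 0.0, 5.0, 1.0])
obs = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate_point([P1, P2], obs))  # approximately [0, 0, 5]
```

With 107 views, the same least-squares system simply gains more rows per point, which is what makes dense, high-fidelity surface reconstruction tractable in such a capture setup.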
Related papers
- PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion [43.850899288337025]
PSHuman is a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model.
It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions.
To enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X.
arXiv Detail & Related papers (2024-09-16T10:13:06Z)
- 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z)
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875]
VLOGGER is a method for audio-driven human video generation from a single input image.
We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls.
We show applications in video editing and personalization.
arXiv Detail & Related papers (2024-03-13T17:59:02Z)
- MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction [12.942635715952525]
Multiple cameras can provide comprehensive multi-view video coverage of a person.
Previous studies have overlooked the challenges posed by self-occlusion under multiple views.
We introduce a method to reconstruct the 3D human body from multiple uncalibrated camera views.
arXiv Detail & Related papers (2024-03-08T05:03:25Z)
- MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures [44.172804112944625]
We present MVHumanNet, a dataset that comprises multi-view human action sequences of 4,500 human identities.
Our dataset contains 9,000 daily outfits, 60,000 motion sequences, and 645 million annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPLX parameters, and corresponding textual descriptions.
arXiv Detail & Related papers (2023-12-05T18:50:12Z)
- XAGen: 3D Expressive Human Avatars Generation [76.69560679209171]
XAGen is the first 3D generative model for human avatars capable of expressive control over body, face, and hands.
We propose a multi-part rendering technique that disentangles the synthesis of body, face, and hands.
Experiments show that XAGen surpasses state-of-the-art methods in terms of realism, diversity, and expressive control abilities.
arXiv Detail & Related papers (2023-11-22T18:30:42Z)
- HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances [84.7225785061814]
HiFECap simultaneously captures human pose, clothing, facial expression, and hands just from a single RGB video.
Our method also captures high-frequency details, such as deforming wrinkles on the clothes, better than the previous works.
arXiv Detail & Related papers (2022-10-11T17:57:45Z)
- Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis [52.720314035084215]
This work targets using a general deep learning framework to synthesize free-viewpoint images of arbitrary human performers.
We present a simple yet powerful framework, named Generalizable Neural Performer (GNR), that learns a generalizable and robust neural body representation.
Experiments on GeneBody-1.0 and ZJU-Mocap show that our method is more robust than recent state-of-the-art generalizable methods.
arXiv Detail & Related papers (2022-04-25T17:14:22Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
- S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling [103.65625425020129]
We represent the pedestrian's shape, pose and skinning weights as neural implicit functions that are directly learned from data.
We demonstrate the effectiveness of our approach on various datasets and show that our reconstructions outperform existing state-of-the-art methods.
arXiv Detail & Related papers (2021-01-17T02:16:56Z)