HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive
Media
- URL: http://arxiv.org/abs/2110.07235v1
- Date: Thu, 14 Oct 2021 09:03:35 GMT
- Title: HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive
Media
- Authors: Anargyros Chatzitofis, Leonidas Saroglou, Prodromos Boutis, Petros
Drakoulis, Nikolaos Zioulis, Shishir Subramanyam, Bart Kevelham, Caecilia
Charbonnier, Pablo Cesar, Dimitrios Zarpalas, Stefanos Kollias, Petros Daras
- Abstract summary: We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by marker-based MoCap, volumetric capture and audio recording systems.
We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce HUMAN4D, a large and multimodal 4D dataset that contains a
variety of human activities simultaneously captured by a professional
marker-based MoCap, a volumetric capture and an audio recording system. By
capturing 2 female and 2 male professional actors performing various
full-body movements and expressions, HUMAN4D provides a diverse set of motions
and poses encountered as part of single- and multi-person daily, physical and
social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD),
volumetric and audio data. Despite the existence of multi-view color datasets
captured with the use of hardware (HW) synchronization, to the best of our
knowledge, HUMAN4D is the first and only public resource that provides
volumetric depth maps with high synchronization precision due to the use of
intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned scanned
and rigged 3D character complements HUMAN4D to enable joint research on
time-varying and high-quality dynamic meshes. We provide evaluation baselines
by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D
compression methods. For the former, we apply 2D and 3D pose estimation
algorithms both on single- and multi-view data cues. For the latter, we
benchmark open-source 3D codecs on volumetric data, targeting online volumetric
video encoding at steady bit rates. Furthermore, qualitative and quantitative
visual comparisons between mesh-based volumetric data reconstructed at different
quality levels showcase the available options with respect to 4D representations.
HUMAN4D is introduced to the computer vision and graphics research communities
to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD
and audio data cues. The dataset and its code are available at
https://tofis.github.io/myurls/human4d.
Related papers
- Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions [27.677520981665012]
Harmony4D is a dataset for human-human interaction featuring in-the-wild activities such as wrestling, dancing, MMA, and more.
We use a flexible multi-view capture system to record these dynamic activities and provide annotations for human detection, tracking, 2D/3D pose estimation, and mesh recovery for closely interacting subjects.
arXiv Detail & Related papers (2024-10-27T00:05:15Z)
- Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers [28.38686299271394]
We propose a framework for 3D sequence-to-sequence (seq2seq) human pose detection.
Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships.
Our method is evaluated on Human3.6M, a popular 3D human pose estimation dataset.
arXiv Detail & Related papers (2024-01-30T03:00:25Z)
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and a data volume of 67.5M frames.
We construct a professional multi-view capture system comprising 60 synchronized cameras with a maximum resolution of 4096 x 3000, 15 fps capture speed, and strict camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
- 4DHumanOutfit: a multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements [19.538122092286894]
4DHumanOutfit presents a new dataset of densely sampled spatio-temporal 4D human data of different actors, outfits and motions.
The dataset can be seen as a cube of data containing 4D motion sequences along three axes corresponding to identity, outfit and motion.
This rich dataset has numerous potential applications for the processing and creation of digital humans.
arXiv Detail & Related papers (2023-06-12T19:59:27Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- FLAG3D: A 3D Fitness Activity Dataset with Language Instruction [89.60371681477791]
We present FLAG3D, a large-scale 3D fitness activity dataset with language instruction containing 180K sequences of 60 categories.
We show that FLAG3D contributes great research value for various challenges, such as cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation.
arXiv Detail & Related papers (2022-12-09T02:33:33Z)
- LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling [69.56581851211841]
We propose a novel Local 4D implicit Representation for Dynamic clothed human, named LoRD.
Our key insight is to encourage the network to learn the latent codes of local part-level representation.
LoRD has a strong capability for representing 4D humans and outperforms state-of-the-art methods in practical applications.
arXiv Detail & Related papers (2022-08-18T03:49:44Z)
- HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling [83.57675975092496]
HuMMan is a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames.
HuMMan has several appealing properties: 1) multi-modal data and annotations including color images, point clouds, keypoints, SMPL parameters, and textured meshes.
arXiv Detail & Related papers (2022-04-28T17:54:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.