HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
- URL: http://arxiv.org/abs/2204.13686v2
- Date: Sun, 16 Apr 2023 12:26:14 GMT
- Title: HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
- Authors: Zhongang Cai, Daxuan Ren, Ailing Zeng, Zhengyu Lin, Tao Yu, Wenjia
Wang, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan
Zhang, Chen Change Loy, Lei Yang, Ziwei Liu
- Abstract summary: HuMMan is a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 4D human sensing and modeling are fundamental tasks in vision and graphics
with numerous applications. With the advances of new sensors and algorithms,
there is an increasing demand for more versatile datasets. In this work, we
contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human
subjects, 400k sequences and 60M frames. HuMMan has several appealing
properties: 1) multi-modal data and annotations including color images, point
clouds, keypoints, SMPL parameters, and textured meshes; 2) a popular mobile
device is included in the sensor suite; 3) a set of 500 actions, designed to
cover fundamental movements; 4) multiple tasks such as action recognition, pose
estimation, parametric human recovery, and textured mesh reconstruction are
supported and evaluated. Extensive experiments on HuMMan underscore the need for
further study on challenges such as fine-grained action recognition, dynamic
human mesh reconstruction, point cloud-based parametric human recovery, and
cross-device domain gaps.
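The five modality types listed in point 1 can be illustrated with a minimal per-frame record. This is a hypothetical sketch: the class name, field names, and file layout are assumptions for illustration, not HuMMan's actual schema or API; only the SMPL parameter dimensions (72-dim pose, 10-dim shape) follow the standard SMPL model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical per-frame record covering HuMMan's five annotation
# modalities; names and shapes are illustrative assumptions.
@dataclass
class HuMManFrame:
    color_image: Tuple[int, int, int]               # (H, W, 3) image shape
    point_cloud: List[Tuple[float, float, float]]   # XYZ points
    keypoints_3d: List[Tuple[float, float, float]]  # 3D joint keypoints
    smpl_pose: List[float]                          # 72 axis-angle values (24 joints x 3)
    smpl_shape: List[float]                         # 10 shape coefficients
    mesh_path: str                                  # textured mesh stored on disk

    def validate(self) -> bool:
        # Standard SMPL uses a 72-dim pose vector and a 10-dim shape vector.
        return len(self.smpl_pose) == 72 and len(self.smpl_shape) == 10

frame = HuMManFrame(
    color_image=(1080, 1920, 3),
    point_cloud=[(0.0, 0.0, 1.5)],
    keypoints_3d=[(0.0, 0.9, 1.5)],
    smpl_pose=[0.0] * 72,
    smpl_shape=[0.0] * 10,
    mesh_path="subject_0001/action_0042/frame_000000.obj",
)
print(frame.validate())  # True
```

A record like this makes the cross-modal alignment explicit: every frame carries synchronized image, geometry, keypoint, and parametric-body data, which is what enables the multi-task benchmarks described above.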
Related papers
- MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
We present MVHumanNet, a dataset that comprises multi-view human action sequences of 4,500 human identities.
Our dataset contains 9,000 daily outfits, 60,000 motion sequences and 645 million extensive annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPLX parameters, and corresponding textual descriptions.
arXiv Detail & Related papers (2023-12-05T18:50:12Z)
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume.
We construct a professional multi-view capture system of 60 synchronized cameras with up to 4096 x 3000 resolution at 15 fps, with strict camera calibration.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
- MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing
MM-Fi is the first multi-modal non-intrusive 4D human dataset with 27 daily or rehabilitation action categories.
MM-Fi consists of over 320k synchronized frames of five modalities from 40 human subjects.
arXiv Detail & Related papers (2023-05-12T05:18:52Z)
- SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
We introduce a new synthetic dataset, SynBody, with three appealing features.
The dataset comprises 1.2M images with corresponding accurate 3D annotations, covering 10,000 human body models, 1,187 actions, and various viewpoints.
arXiv Detail & Related papers (2023-03-30T13:30:12Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- HSPACE: Synthetic Parametric Humans Animated in Complex Environments
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying ages, genders, body proportions, and ethnicities with hundreds of motions and scenes to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z)
- HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities captured simultaneously.
We benchmark HUMAN4D with state-of-the-art human pose estimation and 3D pose estimation methods.
arXiv Detail & Related papers (2021-10-14T09:03:35Z)