Human-centric Scene Understanding for 3D Large-scale Scenarios
- URL: http://arxiv.org/abs/2307.14392v1
- Date: Wed, 26 Jul 2023 08:40:46 GMT
- Title: Human-centric Scene Understanding for 3D Large-scale Scenarios
- Authors: Yiteng Xu, Peishan Cong, Yichen Yao, Runnan Chen, Yuenan Hou, Xinge Zhu, Xuming He, Jingyi Yu, Yuexin Ma
- Abstract summary: We present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife.
Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, action recognition, etc.
- Score: 52.12727427303162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-centric scene understanding is significant for real-world applications,
but it is extremely challenging due to the existence of diverse human poses and
actions, complex human-environment interactions, severe occlusions in crowds,
etc. In this paper, we present a large-scale multi-modal dataset for
human-centric scene understanding, dubbed HuCenLife, which is collected in
diverse daily-life scenarios with rich and fine-grained annotations. Our
HuCenLife can benefit many 3D perception tasks, such as segmentation,
detection, action recognition, etc., and we also provide benchmarks for these
tasks to facilitate related research. In addition, we design novel modules for
LiDAR-based segmentation and action recognition, which are more applicable for
large-scale human-centric scenarios and achieve state-of-the-art performance.
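As a rough illustration of how per-point annotations in a dataset like HuCenLife might be consumed for LiDAR-based segmentation, the sketch below shows a minimal PyTorch-style dataset wrapper. The directory layout, file names, and label scheme here are hypothetical and are not taken from the HuCenLife release; they only convey the general pattern of pairing LiDAR frames with per-point labels.

# Minimal sketch of loading LiDAR frames with per-point labels for a
# segmentation task. All paths, filenames, and label fields below are
# hypothetical -- they illustrate the idea, not the actual HuCenLife format.
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset


class PointCloudSegDataset(Dataset):
    """Pairs each LiDAR frame (N x 4: x, y, z, intensity) with per-point labels."""

    def __init__(self, root: str, split: str = "train", max_points: int = 65536):
        self.frames = sorted(Path(root, split, "lidar").glob("*.npy"))
        self.labels_dir = Path(root, split, "labels")
        self.max_points = max_points

    def __len__(self) -> int:
        return len(self.frames)

    def __getitem__(self, idx: int):
        points = np.load(self.frames[idx]).astype(np.float32)      # (N, 4) point cloud
        labels = np.load(self.labels_dir / self.frames[idx].name)  # (N,) per-point class ids
        # Randomly subsample to a fixed size so frames can be batched together.
        if points.shape[0] > self.max_points:
            keep = np.random.choice(points.shape[0], self.max_points, replace=False)
            points, labels = points[keep], labels[keep]
        return torch.from_numpy(points), torch.from_numpy(labels.astype(np.int64))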
Related papers
- HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes [21.2539366684941]
We propose an unsupervised 3D detection method for human-centric scenarios by transferring the knowledge from synthetic human instances to real scenes.
Remarkably, our method exhibits superior performance compared to current state-of-the-art techniques.
arXiv Detail & Related papers (2024-03-05T08:37:05Z)
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model.
We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data samples.
We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z)
- MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures [44.172804112944625]
We present MVHumanNet, a dataset that comprises multi-view human action sequences of 4,500 human identities.
Our dataset contains 9,000 daily outfits, 60,000 motion sequences, and 645 million frames with extensive annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPLX parameters, and corresponding textual descriptions.
arXiv Detail & Related papers (2023-12-05T18:50:12Z)
- Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-Scene Interaction (HSI) generation is challenging and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
arXiv Detail & Related papers (2023-12-05T12:03:00Z)
- Hulk: A Universal Knowledge Translator for Human-Centric Tasks [69.8518392427151]
We present Hulk, the first multimodal human-centric generalist model.
It addresses 2D vision, 3D vision, skeleton-based, and vision-language tasks without task-specific finetuning.
Hulk achieves state-of-the-art performance on 11 benchmarks.
arXiv Detail & Related papers (2023-12-04T07:36:04Z)
- Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints [25.550524178542833]
We propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction.
We use 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity.
We show that our approach achieves state-of-the-art performance on a wide range of evaluation metrics.
arXiv Detail & Related papers (2023-06-01T18:27:48Z)
- HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying ages, genders, proportions, and ethnicities with hundreds of motions and scenes in order to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three-million-frame, multi-view furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.