RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
- URL: http://arxiv.org/abs/2403.19501v1
- Date: Thu, 28 Mar 2024 15:31:36 GMT
- Title: RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
- Authors: Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang
- Abstract summary: We present RELI11D, a high-quality multimodal human motion dataset involving LiDAR, IMU system, RGB camera, and Event camera.
It records the motions of 10 actors performing 5 sports in 7 scenes, including 3.32 hours of synchronized LiDAR point clouds, IMU measurement data, RGB videos, and Event streams.
To address the challenge of integrating different modalities, we propose LEIR, a multimodal baseline that effectively utilizes LiDAR Point Cloud, Event stream, and RGB.
- Score: 44.670169033884896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comprehensive capturing of human motions requires both accurate capture of complex poses and precise localization of the human within scenes. Most HPE datasets and methods primarily rely on RGB, LiDAR, or IMU data. However, using these modalities alone, or in combination, may not be adequate for HPE, particularly for complex and fast movements. For holistic human motion understanding, we present RELI11D, a high-quality multimodal human motion dataset involving LiDAR, an IMU system, an RGB camera, and an Event camera. It records the motions of 10 actors performing 5 sports in 7 scenes, including 3.32 hours of synchronized LiDAR point clouds, IMU measurement data, RGB videos, and Event streams. Through extensive experiments, we demonstrate that RELI11D presents considerable challenges and opportunities, as it contains many rapid and complex motions that require precise localization. To address the challenge of integrating different modalities, we propose LEIR, a multimodal baseline that effectively utilizes LiDAR point clouds, Event streams, and RGB through our cross-attention fusion strategy. We show that LEIR exhibits promising results for both rapid motions and daily motions, and that exploiting the characteristics of multiple modalities can indeed improve HPE performance. Both the dataset and source code will be released publicly to the research community, fostering collaboration and enabling further exploration in this field.
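The abstract names a cross-attention fusion strategy over LiDAR, Event, and RGB features but does not detail its architecture. Below is a minimal PyTorch sketch of one way three modality token streams could be fused with cross-attention; the module names, dimensions, and the choice of LiDAR tokens as queries are illustrative assumptions, not the released LEIR design.

```python
# Hypothetical sketch of cross-attention fusion over three modality
# embeddings (LiDAR, Event, RGB). All names and shapes are assumptions
# for illustration, not the paper's actual LEIR implementation.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse per-frame LiDAR, Event, and RGB tokens via cross-attention.

    LiDAR tokens act as queries; Event and RGB tokens provide keys and
    values, so geometric features are refined with motion/appearance cues.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn_event = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, lidar, event, rgb):
        # Each input: (batch, tokens, dim) features from an assumed
        # modality-specific encoder upstream.
        x, _ = self.attn_event(query=lidar, key=event, value=event)
        lidar = lidar + x  # residual fusion with Event cues
        x, _ = self.attn_rgb(query=lidar, key=rgb, value=rgb)
        lidar = lidar + x  # residual fusion with RGB cues
        return lidar + self.mlp(self.norm(lidar))  # fused token sequence

# Usage: fused tokens would feed a pose/translation regression head.
fusion = CrossModalFusion()
lidar = torch.randn(2, 128, 256)  # e.g. 128 point-cluster tokens
event = torch.randn(2, 64, 256)   # e.g. 64 event-voxel tokens
rgb = torch.randn(2, 196, 256)    # e.g. 14x14 image patch tokens
out = fusion(lidar, event, rgb)   # shape: (2, 128, 256)
```

One plausible reason to query with LiDAR tokens, under these assumptions, is that the point cloud carries the global position needed for scene-level localization, while Event and RGB streams contribute the fine, high-rate appearance detail for rapid motions.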
Related papers
- Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms [29.577583619354314]
We propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera.
To build a more comprehensive benchmark, we report results for over 20 mainstream HAR models for future works to compare against.
arXiv Detail & Related papers (2024-08-19T07:52:20Z)
- Motion Capture from Inertial and Vision Sensors [60.5190090684795]
MINIONS is a large-scale Motion capture dataset collected from INertial and visION Sensors.
We conduct experiments on multi-modal motion capture using a monocular camera and very few IMUs.
arXiv Detail & Related papers (2024-07-23T09:41:10Z)
- MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark [63.878793340338035]
Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras.
Existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting.
We present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments.
arXiv Detail & Related papers (2024-03-29T15:08:37Z)
- LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment [59.320414108383055]
We present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation.
We propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses.
arXiv Detail & Related papers (2024-02-27T03:08:44Z)
- Headset: Human emotion awareness under partial occlusions multimodal dataset [19.57427512904342]
We present a new multimodal database to help advance the development of immersive technologies.
Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs).
The dataset can be helpful in the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video.
arXiv Detail & Related papers (2024-02-14T11:42:15Z)
- I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions [42.87514729260336]
I'm-HOI is a monocular scheme to faithfully capture the 3D motions of both the human and object in a novel setting.
It combines general motion inference and category-aware refinement.
Our dataset and code will be released to the community.
arXiv Detail & Related papers (2023-12-10T08:25:41Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds [58.402752909624716]
Existing motion capture datasets are largely short-range and cannot yet meet the needs of long-range applications.
We propose LiDARHuman26M, a new human motion capture dataset captured by LiDAR at a much longer range to overcome this limitation.
Our dataset also includes the ground truth human motions acquired by the IMU system and the synchronous RGB images.
arXiv Detail & Related papers (2022-03-28T12:52:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.