Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes
- URL: http://arxiv.org/abs/2308.00628v2
- Date: Sun, 6 Aug 2023 14:47:00 GMT
- Title: Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes
- Authors: Bohao Fan, Siqi Wang, Wenxuan Guo, Wenzhao Zheng, Jianjiang Feng, Jie Zhou
- Abstract summary: Human-M3 is an outdoor multi-modal multi-view multi-person human pose database.
It includes not only multi-view RGB videos of outdoor scenes but also corresponding pointclouds.
In order to obtain accurate human poses, we propose an algorithm based on multi-modal data input.
- Score: 35.90042512490975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D human pose estimation in outdoor environments has garnered increasing
attention recently. However, prevalent 3D human pose datasets pertaining to
outdoor scenes lack diversity, as they predominantly utilize only one type of
modality (RGB image or pointcloud), and often feature only one individual
within each scene. This limited scope of dataset infrastructure considerably
hinders the variability of available data. In this article, we propose
Human-M3, an outdoor multi-modal multi-view multi-person human pose database
which includes not only multi-view RGB videos of outdoor scenes but also
corresponding pointclouds. In order to obtain accurate human poses, we propose
an algorithm based on multi-modal data input to generate ground truth
annotation. This benefits from robust pointcloud detection and tracking, which
solves the problem of inaccurate human localization and matching ambiguity that
may exist in previous multi-view RGB videos in outdoor multi-person scenes, and
generates reliable ground truth annotations. Evaluation of algorithms based on
different modalities shows that this database is challenging and suitable
for future research. Furthermore, we propose a 3D human pose estimation
algorithm based on multi-modal data input, which demonstrates the advantages of
multi-modal data input for 3D human pose estimation. Code and data will be
released on https://github.com/soullessrobot/Human-M3-Dataset.
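The abstract describes an estimator that combines RGB and point-cloud inputs but does not specify its architecture. As a purely illustrative sketch (not the authors' method), the common late-fusion pattern can be shown with hypothetical per-person feature vectors: pooled image features and pooled point-cloud features are concatenated, then a regression head maps them to per-joint 3D coordinates. All names, dimensions, and the linear head below are assumptions for illustration only.

```python
import numpy as np

def fuse_multimodal_features(rgb_feat: np.ndarray, pc_feat: np.ndarray) -> np.ndarray:
    """Late fusion: concatenate per-person RGB and point-cloud feature vectors."""
    return np.concatenate([rgb_feat, pc_feat], axis=-1)

def regress_joints(fused: np.ndarray, W: np.ndarray, b: np.ndarray,
                   num_joints: int = 15) -> np.ndarray:
    """Toy linear head mapping a fused feature vector to (num_joints, 3) coordinates."""
    out = fused @ W + b
    return out.reshape(num_joints, 3)

rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal(128)  # stand-in for pooled image-branch features
pc_feat = rng.standard_normal(64)    # stand-in for pooled point-cloud features
fused = fuse_multimodal_features(rgb_feat, pc_feat)

W = rng.standard_normal((fused.shape[-1], 15 * 3))
b = np.zeros(15 * 3)
pose = regress_joints(fused, W, b)
print(pose.shape)  # (15, 3)
```

In practice the fused representation would feed a learned network rather than a fixed linear map; the sketch only illustrates why complementary modalities (appearance from RGB, metric geometry from point clouds) are combined before pose regression.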
Related papers
- Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers [28.38686299271394]
We propose a framework for 3D sequence-to-sequence (seq2seq) human pose detection.
The spatial module represents human pose features from intra-image content, while the frame-image relation module extracts temporal relationships across frames.
Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset.
arXiv Detail & Related papers (2024-01-30T03:00:25Z)
- LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation [31.651300414497822]
LiCamPose is a pipeline that integrates multi-view RGB and sparse point cloud information to estimate robust 3D human poses from a single frame.
LiCamPose is evaluated on four datasets, including two public datasets, one synthetic dataset, and one challenging self-collected dataset.
arXiv Detail & Related papers (2023-12-11T14:30:11Z)
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume.
We construct a professional multi-view system to capture data, which contains 60 synchronous cameras with max 4096 x 3000 resolution, 15 fps speed, and stern camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- Multi-person 3D pose estimation from unlabelled data [2.54990557236581]
We present a model based on Graph Neural Networks capable of predicting the cross-view correspondence of the people in the scenario.
We also present a Multilayer Perceptron that takes the 2D points to yield the 3D poses of each person.
arXiv Detail & Related papers (2022-12-16T22:03:37Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation [35.791868530073955]
We present PandaNet, a new single-shot, anchor-based and multi-person 3D pose estimation approach.
The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression in a single forward pass.
It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box.
arXiv Detail & Related papers (2021-01-07T10:32:17Z)
- Multi-Person Absolute 3D Human Pose Estimation with Weak Depth Supervision [0.0]
We introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion.
Our algorithm is a monocular, multi-person, absolute pose estimator.
We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates.
arXiv Detail & Related papers (2020-04-08T13:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.