FusePose: IMU-Vision Sensor Fusion in Kinematic Space for Parametric
Human Pose Estimation
- URL: http://arxiv.org/abs/2208.11960v1
- Date: Thu, 25 Aug 2022 09:35:27 GMT
- Title: FusePose: IMU-Vision Sensor Fusion in Kinematic Space for Parametric
Human Pose Estimation
- Authors: Yiming Bao, Xu Zhao and Dahong Qian
- Abstract summary: We propose a framework called emphFusePose under a parametric human kinematic model.
We aggregate different information of IMU or vision data and introduce three distinctive sensor fusion approaches: NaiveFuse, KineFuse and AdaDeepFuse.
The performance of 3D human pose estimation is improved over the baseline.
- Score: 12.821740951249552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There exist challenging problems in 3D human pose estimation, such as
poor performance caused by occlusion and self-occlusion. Recently, IMU-vision
sensor fusion has been regarded as valuable for solving these problems. However,
previous research on fusing heterogeneous IMU and vision data fails to
adequately utilize either raw IMU data or reliable high-level vision features.
To facilitate more efficient sensor fusion, in
this work we propose a framework called FusePose under a parametric
human kinematic model. Specifically, we aggregate different information of IMU
or vision data and introduce three distinctive sensor fusion approaches:
NaiveFuse, KineFuse and AdaDeepFuse. NaiveFuse servers as a basic approach that
only fuses simplified IMU data and estimated 3D pose in euclidean space. While
in kinematic space, KineFuse is able to integrate the calibrated and aligned
IMU raw data with converted 3D pose parameters. AdaDeepFuse further develops
this kinematical fusion process to an adaptive and end-to-end trainable manner.
Comprehensive experiments with ablation studies demonstrate the rationality and
superiority of the proposed framework. The performance of 3D human pose
estimation is improved over the baseline. On the Total Capture dataset, KineFuse
surpasses the previous state of the art, which uses IMU only for testing, by
8.6%. AdaDeepFuse surpasses the state of the art that uses IMU for both training
and testing by 8.5%. Moreover, we validate the generalization capability of our
framework through experiments on the Human3.6M dataset.
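The NaiveFuse approach described above, which fuses simplified IMU data with an estimated 3D pose directly in Euclidean space, can be pictured as a confidence-weighted blend of per-joint positions. The sketch below is only an illustration of that idea, not the paper's implementation; the function name, the `w_vision` weight, and the toy inputs are all assumptions.

```python
def naive_fuse(pose_vision, pose_imu, w_vision=0.7):
    """Blend two lists of (x, y, z) joint positions in Euclidean space.

    pose_vision -- 3D joints estimated by the vision branch
    pose_imu    -- 3D joints derived from simplified IMU data
    w_vision    -- hypothetical confidence weight given to the vision estimate
    """
    return [
        tuple(w_vision * v + (1.0 - w_vision) * u for v, u in zip(jv, ju))
        for jv, ju in zip(pose_vision, pose_imu)
    ]

# Toy two-joint example: equal weighting simply averages the two estimates.
fused = naive_fuse([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                   [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)], w_vision=0.5)
```

KineFuse and AdaDeepFuse instead operate in kinematic space, on the parameters of a parametric body model rather than raw Euclidean joint coordinates, which is what lets the fused pose stay anatomically consistent.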
Related papers
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability - both at training and test time - to query any arbitrary point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
arXiv Detail & Related papers (2024-07-10T10:44:18Z)
- Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs [15.017274891943162]
Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision.
Inertial sensors have been introduced to provide a complementary source of information.
It remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses.
arXiv Detail & Related papers (2024-04-27T09:02:42Z)
- SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z)
- GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates [10.534984939225014]
3D object detection serves as the core basis of the perception tasks in autonomous driving.
GOOD is a general optimization-based fusion framework that achieves satisfying detection without training additional models.
Experiments on the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP.
arXiv Detail & Related papers (2023-03-17T07:05:04Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- Adversarial Parametric Pose Prior [106.12437086990853]
We learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.
We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images.
arXiv Detail & Related papers (2021-12-08T10:05:32Z)
- Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Training vs. test data domain gaps often negatively affect model performance.
We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces.
AHuP is built around a practical assumption that in real applications, data from target domain could be inaccessible or only limited information can be acquired.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
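One way to read the depth-to-scale (D2S) idea, under standard pinhole-camera assumptions, is that each joint receives its own projection scale computed from the root depth plus that joint's depth offset, instead of one global scale. The names below (`focal`, `z_root`, `dz`) are hypothetical; this is only a sketch of the per-joint scale variants the summary mentions, not the paper's actual projection function.

```python
def d2s_scales(focal, z_root, dz):
    # Per-joint perspective scale s_j = focal / (z_root + dz_j):
    # joints nearer the camera (negative dz_j) project with a larger scale.
    return [focal / (z_root + d) for d in dz]

# Three joints at depth offsets of -500, 0, and +500 (e.g. mm) from the root.
scales = d2s_scales(focal=1000.0, z_root=5000.0, dz=[-500.0, 0.0, 500.0])
```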
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild [77.43884383743872]
We present AdaFuse, an adaptive multiview fusion method to enhance the features in occluded views.
We extensively evaluate the approach on three public datasets including Human3.6M, Total Capture and CMU Panoptic.
We also create a large-scale synthetic dataset, Occlusion-Person, which allows us to perform numerical evaluation on the occluded joints.
arXiv Detail & Related papers (2020-10-26T03:19:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.