FusePose: IMU-Vision Sensor Fusion in Kinematic Space for Parametric
Human Pose Estimation
- URL: http://arxiv.org/abs/2208.11960v1
- Date: Thu, 25 Aug 2022 09:35:27 GMT
- Title: FusePose: IMU-Vision Sensor Fusion in Kinematic Space for Parametric
Human Pose Estimation
- Authors: Yiming Bao, Xu Zhao and Dahong Qian
- Abstract summary: We propose a framework called emphFusePose under a parametric human kinematic model.
We aggregate different information of IMU or vision data and introduce three distinctive sensor fusion approaches: NaiveFuse, KineFuse and AdaDeepFuse.
The performance of 3D human pose estimation is improved over the baseline.
- Score: 12.821740951249552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There exist challenging problems in 3D human pose estimation, such as
poor performance caused by occlusion and self-occlusion. Recently, IMU-vision
sensor fusion has been regarded as valuable for solving these problems. However,
previous research on fusing heterogeneous IMU and vision data fails to
adequately utilize either raw IMU data or reliable high-level vision features.
To facilitate more efficient sensor fusion, in
this work we propose a framework called FusePose under a parametric
human kinematic model. Specifically, we aggregate different information of IMU
or vision data and introduce three distinctive sensor fusion approaches:
NaiveFuse, KineFuse and AdaDeepFuse. NaiveFuse servers as a basic approach that
only fuses simplified IMU data and estimated 3D pose in euclidean space. While
in kinematic space, KineFuse is able to integrate the calibrated and aligned
IMU raw data with converted 3D pose parameters. AdaDeepFuse further develops
this kinematical fusion process to an adaptive and end-to-end trainable manner.
Comprehensive experiments with ablation studies demonstrate the rationality and
superiority of the proposed framework. The performance of 3D human pose
estimation is improved over the baseline. On the Total Capture dataset, KineFuse
surpasses the previous state of the art, which uses IMU only for testing, by
8.6%. AdaDeepFuse surpasses the state of the art that uses IMU for both training
and testing by 8.5%. Moreover, we validate the generalization capability of our
framework through experiments on the Human3.6M dataset.
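The NaiveFuse approach described above, which fuses simplified IMU data with an estimated 3D pose directly in Euclidean space, can be pictured as a confidence-weighted blend of per-joint positions. The sketch below is only an illustration of that idea, not the paper's implementation; the function name, the `w_vision` weight, and the toy inputs are all assumptions.

```python
def naive_fuse(pose_vision, pose_imu, w_vision=0.7):
    """Blend two lists of (x, y, z) joint positions in Euclidean space.

    pose_vision -- 3D joints estimated by the vision branch
    pose_imu    -- 3D joints derived from simplified IMU data
    w_vision    -- hypothetical confidence weight given to the vision estimate
    """
    return [
        tuple(w_vision * v + (1.0 - w_vision) * u for v, u in zip(jv, ju))
        for jv, ju in zip(pose_vision, pose_imu)
    ]

# Toy two-joint example: equal weighting simply averages the two estimates.
fused = naive_fuse([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                   [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)], w_vision=0.5)
```

KineFuse and AdaDeepFuse instead operate in kinematic space, on the parameters of a parametric body model rather than raw Euclidean joint coordinates, which is what lets the fused pose stay anatomically consistent.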
Related papers
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability - both at training and test time - to query any arbitrary point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
arXiv Detail & Related papers (2024-07-10T10:44:18Z)
- Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs [15.017274891943162]
Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision.
Inertial sensors have been introduced to provide a complementary source of information.
It remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses.
arXiv Detail & Related papers (2024-04-27T09:02:42Z)
- SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z)
- GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates [10.534984939225014]
3D object detection serves as the core basis of the perception tasks in autonomous driving.
GOOD is a general optimization-based fusion framework that achieves satisfying detection without training additional models.
Experiments on the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP.
arXiv Detail & Related papers (2023-03-17T07:05:04Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- Adversarial Parametric Pose Prior [106.12437086990853]
We learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.
We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images.
arXiv Detail & Related papers (2021-12-08T10:05:32Z)
- Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Training vs. test data domain gaps often negatively affect model performance.
We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces.
AHuP is built around a practical assumption that in real applications, data from target domain could be inaccessible or only limited information can be acquired.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
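One way to read the depth-to-scale (D2S) idea, under standard pinhole-camera assumptions, is that each joint receives its own projection scale computed from the root depth plus that joint's depth offset, instead of one global scale. The names below (`focal`, `z_root`, `dz`) are hypothetical; this is only a sketch of the per-joint scale variants the summary mentions, not the paper's actual projection function.

```python
def d2s_scales(focal, z_root, dz):
    # Per-joint perspective scale s_j = focal / (z_root + dz_j):
    # joints nearer the camera (negative dz_j) project with a larger scale.
    return [focal / (z_root + d) for d in dz]

# Three joints at depth offsets of -500, 0, and +500 (e.g. mm) from the root.
scales = d2s_scales(focal=1000.0, z_root=5000.0, dz=[-500.0, 0.0, 500.0])
```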
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild [77.43884383743872]
We present AdaFuse, an adaptive multiview fusion method to enhance the features in occluded views.
We extensively evaluate the approach on three public datasets including Human3.6M, Total Capture and CMU Panoptic.
We also create a large-scale synthetic dataset, Occlusion-Person, which allows us to perform numerical evaluation on the occluded joints.
arXiv Detail & Related papers (2020-10-26T03:19:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.