Related papers: Benchmarking 3D Human Pose Estimation Models under Occlusions

URL: http://arxiv.org/abs/2504.10350v2
Date: Mon, 02 Jun 2025 16:24:00 GMT
Title: Benchmarking 3D Human Pose Estimation Models under Occlusions
Authors: Filipa Lino, Carlos Santiago, Manuel Marques
Abstract summary: Human Pose Estimation (HPE) involves detecting and localizing keypoints on the human body from visual data. This paper presents a benchmark on the robustness of 3D HPE models under realistic occlusion conditions. We evaluate nine state-of-the-art 2D-to-3D HPE models, spanning convolutional, transformer-based, graph-based, and diffusion-based architectures.
Score: 6.858859328420893
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human Pose Estimation (HPE) involves detecting and localizing keypoints on the human body from visual data. In 3D HPE, occlusions, where parts of the body are not visible in the image, pose a significant challenge for accurate pose reconstruction. This paper presents a benchmark on the robustness of 3D HPE models under realistic occlusion conditions, involving combinations of occluded keypoints commonly observed in real-world scenarios. We evaluate nine state-of-the-art 2D-to-3D HPE models, spanning convolutional, transformer-based, graph-based, and diffusion-based architectures, using the BlendMimic3D dataset, a synthetic dataset with ground-truth 2D/3D annotations and occlusion labels. All models were originally trained on Human3.6M and tested here without retraining to assess their generalization. We introduce a protocol that simulates occlusion by adding noise into 2D keypoints based on real detector behavior, and conduct both global and per-joint sensitivity analyses. Our findings reveal that all models exhibit notable performance degradation under occlusion, with diffusion-based models underperforming despite their stochastic nature. Additionally, a per-joint occlusion analysis identifies consistent vulnerability in distal joints (e.g., wrists, feet) across models. Overall, this work highlights critical limitations of current 3D HPE models in handling occlusions, and provides insights for improving real-world robustness.
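
The occlusion protocol described in the abstract (perturbing the 2D keypoints that are labeled as occluded before they are lifted to 3D) can be illustrated with a minimal sketch. The function name `simulate_occlusion`, the `sigma` parameter, and the isotropic Gaussian noise are assumptions made here for illustration; the paper derives its noise from real detector behavior, which this sketch does not reproduce.

```python
import random


def simulate_occlusion(keypoints_2d, occluded, sigma=10.0, rng=None):
    """Return a copy of the 2D keypoints with noise added to occluded joints.

    keypoints_2d: list of (x, y) pixel coordinates, one per joint.
    occluded:     list of booleans, True where the joint is occluded.
    sigma:        noise standard deviation in pixels (hypothetical stand-in
                  for detector-derived noise statistics).
    """
    rng = rng or random.Random()
    noisy = []
    for (x, y), is_occluded in zip(keypoints_2d, occluded):
        if is_occluded:
            # Assumption: isotropic Gaussian noise, not the paper's model.
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
        noisy.append((x, y))
    return noisy


# Example: perturb only the second joint of a three-joint skeleton.
keypoints = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]
labels = [False, True, False]
perturbed = simulate_occlusion(keypoints, labels, sigma=5.0, rng=random.Random(0))
```

Visible joints pass through unchanged, so the perturbed list could be fed to any 2D-to-3D lifting model to compare its output against the clean-input output.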

Related papers

DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion [57.83515140886807]
We introduce the task of Deficiency-Aware 3D Pose Estimation. DeProPose is a flexible method that simplifies the network architecture to reduce training complexity. We have developed a novel 3D human pose estimation dataset.
arXiv Detail & Related papers (2025-02-23T03:22:54Z)
Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation [9.637714330461037]
We propose a novel method of hard example synthesis that is model-agnostic. We demonstrate an improvement in correct detection rate of up to 20% across several ROBI-dataset objects using state-of-the-art pose estimation models.
arXiv Detail & Related papers (2024-12-05T16:00:55Z)
Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection [10.782354892545651]
We present OAD2D, which discriminates against motion abnormalities based on reconstructing 3D coordinates of mesh vertices and human joints from monocular videos. We reformulate abnormal posture estimation by coupling it with a Motion to Text (M2T) model, in which a VQVAE is employed to quantize motion features. Our approach demonstrates the robustness of abnormal behavior detection against severe and self-occlusions, as it reconstructs human motion trajectories in global coordinates.
arXiv Detail & Related papers (2024-07-23T18:41:16Z)
3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement [6.858859328420893]
This work identifies and addresses a gap in the current state of the art in 3D Human Pose Estimation (HPE). We introduce our novel BlendMimic3D dataset, designed to mimic real-world situations where occlusions occur. We also propose a 3D pose refinement block, employing a Graph Convolutional Network (GCN) to enhance pose representation through a graph model.
arXiv Detail & Related papers (2024-04-24T18:49:37Z)
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation. It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level. The outlined method shows reduction in data requirements, removal of the necessity of depth information in zero-shot category-level 6D pose estimation task, and increased performance, quantitatively demonstrated through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [59.13757801286343]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. We introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for feature space misalignment and the Spatial Noise Compensator (SNC) for significant noise.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [71.2556016049579]
ManiPose is a manifold-constrained multi-hypothesis model for human-pose 2D-to-3D lifting. By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses. We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation [28.24765523800196]
We propose 3D-aware Neural Body Fitting (3DNBF) for 3D human pose estimation. In particular, we propose a generative model of deep features based on a volumetric human representation with Gaussian ellipsoidal kernels emitting 3D pose-dependent feature vectors. The neural features are trained with contrastive learning to become 3D-aware and hence to overcome the 2D-3D ambiguity.
arXiv Detail & Related papers (2023-08-19T22:41:00Z)
Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings. We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only. PONet estimates the 3D orientation of these limbs by taking advantage of the local image evidence to recover the 3D pose. We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering [3.007707487678111]
We propose a framework that synthesizes silhouette and 2D keypoint data and directly regresses the SMPL pose and shape parameters. A neural 3D mesh is exploited to enable silhouette supervision on the fly, which contributes to great improvements in shape estimation. We are among the state of the art on the 3DPW dataset in terms of pose accuracy and evidently outperform the rank-1 method in terms of shape accuracy.
arXiv Detail & Related papers (2021-08-01T02:09:16Z)
Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Training vs. test data domain gaps often negatively affect model performance. We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces. AHuP is built around the practical assumption that, in real applications, data from the target domain could be inaccessible or only limited information can be acquired.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable. We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervision. Our proposed model employs three consecutive differentiable transformations: forward-kinematics, camera-projection, and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames. Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.