VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
- URL: http://arxiv.org/abs/2508.06757v1
- Date: Sat, 09 Aug 2025 00:13:46 GMT
- Title: VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
- Authors: Yash Garg, Saketh Bachu, Arindam Dutta, Rohit Lal, Sarosij Bose, Calvin-Khang Ta, M. Salman Asif, Amit Roy-Chowdhury
- Abstract summary: VOccl3D is a Video-based human Occlusion dataset with 3D body pose and shape annotations. Inspired by works such as AGORA and BEDLAM, the dataset was constructed using advanced computer graphics rendering techniques.
- Score: 12.739233840342958
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human pose and shape (HPS) estimation methods have been extensively studied, with many demonstrating high zero-shot performance on in-the-wild images and videos. However, these methods often struggle in challenging scenarios involving complex human poses or significant occlusions. Although some studies address 3D human pose estimation under occlusion, they typically evaluate performance on datasets that lack realistic or substantial occlusions, e.g., most existing datasets introduce occlusions with random patches over the human or clipart-style overlays, which may not reflect real-world challenges. To bridge this gap in realistic occlusion datasets, we introduce a novel benchmark dataset, VOccl3D, a Video-based human Occlusion dataset with 3D body pose and shape annotations. Inspired by works such as AGORA and BEDLAM, we constructed this dataset using advanced computer graphics rendering techniques, incorporating diverse real-world occlusion scenarios, clothing textures, and human motions. Additionally, we fine-tuned recent HPS methods, CLIFF and BEDLAM-CLIFF, on our dataset, demonstrating significant qualitative and quantitative improvements across multiple public datasets, as well as on the test split of our dataset, while comparing its performance with other state-of-the-art methods. Furthermore, we leveraged our dataset to enhance human detection performance under occlusion by fine-tuning an existing object detector, YOLO11, thus leading to a robust end-to-end HPS estimation system under occlusions. Overall, this dataset serves as a valuable resource for future research aimed at benchmarking methods designed to handle occlusions, offering a more realistic alternative to existing occlusion datasets. See the Project page for code and dataset: https://yashgarg98.github.io/VOccl3D-dataset/
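The quantitative comparisons mentioned in the abstract are typically reported with the standard 3D HPS metrics MPJPE and PA-MPJPE. The sketch below is a minimal, generic implementation of those two metrics in numpy; it is not the authors' evaluation code, and the joint count and units are placeholder assumptions (17 joints, arbitrary units).

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth 3D joints, in the input's units."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-Aligned MPJPE: rigidly align the prediction to the
    ground truth (similarity transform) before measuring the error."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation via SVD of the cross-covariance matrix (Kabsch)
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # correct an improper rotation (reflection)
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    scale = S.sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)

# Toy check: a similarity-transformed skeleton has zero PA-MPJPE
rng = np.random.default_rng(0)
gt = rng.standard_normal((17, 3))
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
pred = 1.2 * gt @ Rz.T + np.array([0.5, -0.3, 0.1])
print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

MPJPE penalizes the global rotation, scale, and translation of this toy prediction, while PA-MPJPE removes them first and so comes out at numerical zero; benchmarks report both because the gap between them indicates how much of a method's error is global misalignment rather than articulated pose error.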
Related papers
- Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset [6.181093777643576]
3D human pose estimation poses several challenges due to the complex geometry of the human body and self-occluding joints. We introduce a framework that effectively conditions on the distribution of human poses and pose history. We present a large-scale indoor dataset, MVPose3D, which contains multiple modalities.
arXiv Detail & Related papers (2025-12-11T06:11:24Z) - Benchmarking 3D Human Pose Estimation Models under Occlusions [6.858859328420893]
Human Pose Estimation (HPE) involves detecting and localizing keypoints on the human body from visual data. This paper presents a benchmark on the robustness of 3D HPE models under realistic occlusion conditions. We evaluate nine state-of-the-art 2D-to-3D HPE models, spanning convolutional, transformer-based, graph-based, and diffusion-based architectures.
arXiv Detail & Related papers (2025-04-14T16:00:25Z) - DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion [57.83515140886807]
We introduce the task of Deficiency-Aware 3D Pose Estimation. DeProPose is a flexible method that simplifies the network architecture to reduce training complexity. We have developed a novel 3D human pose estimation dataset.
arXiv Detail & Related papers (2025-02-23T03:22:54Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z) - LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment [59.320414108383055]
We present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation.
We propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses.
arXiv Detail & Related papers (2024-02-27T03:08:44Z) - Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects [32.32128461720876]
Single-view 3D shape retrieval is a challenging task that is increasingly important with the growth of available 3D data.
We systematically evaluate single-view 3D shape retrieval along three different axes: the presence of object occlusions and truncations, generalization to unseen 3D shape data, and generalization to unseen objects in the input images.
arXiv Detail & Related papers (2023-12-31T05:39:38Z) - LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation [31.651300414497822]
LiCamPose is a pipeline that integrates multi-view RGB and sparse point cloud information to estimate robust 3D human poses via single frame.
LiCamPose is evaluated on four datasets, including two public datasets, one synthetic dataset, and one challenging self-collected dataset.
arXiv Detail & Related papers (2023-12-11T14:30:11Z) - Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z) - RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z) - LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering [3.007707487678111]
We propose a framework that synthesizes silhouette and 2D keypoint data and directly regresses the SMPL pose and shape parameters. A neural 3D mesh renderer is exploited to enable silhouette supervision on the fly, which contributes to great improvements in shape estimation. The method is among the state of the art on the 3DPW dataset in pose accuracy and clearly outperforms the rank-1 method in shape accuracy.
arXiv Detail & Related papers (2021-08-01T02:09:16Z) - Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Training vs. test data domain gaps often negatively affect model performance.
We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces.
AHuP is built around the practical assumption that, in real applications, target-domain data may be inaccessible or only limited information can be acquired.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.