PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2505.10888v1
- Date: Fri, 16 May 2025 05:49:23 GMT
- Title: PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation
- Authors: Saad Manzur, Bryan Vela, Brandon Vela, Aditya Agrawal, Lan-Anh Dang-Vu, David Li, Wayne Hayes
- Abstract summary: We present a standardized testing environment in which each method is evaluated on a variety of datasets. We propose PoseBench3D, a unified framework designed to systematically re-evaluate prior and future models.
- Score: 1.470703050699957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable three-dimensional human pose estimation is becoming increasingly important for real-world applications, yet much of prior work has focused solely on the performance within a single dataset. In practice, however, systems must adapt to diverse viewpoints, environments, and camera setups -- conditions that differ significantly from those encountered during training, which is often the case in real-world scenarios. To address these challenges, we present a standardized testing environment in which each method is evaluated on a variety of datasets, ensuring consistent and fair cross-dataset comparisons -- allowing for the analysis of methods on previously unseen data. Therefore, we propose PoseBench3D, a unified framework designed to systematically re-evaluate prior and future models across four of the most widely used datasets for human pose estimation -- with the framework able to support novel and future datasets as the field progresses. Through a unified interface, our framework provides datasets in a pre-configured yet easily modifiable format, ensuring compatibility with diverse model architectures. We re-evaluated the work of 18 methods, either trained or gathered from existing literature, and reported results using both Mean Per Joint Position Error (MPJPE) and Procrustes Aligned Mean Per Joint Position Error (PA-MPJPE) metrics, yielding more than 100 novel cross-dataset evaluation results. Additionally, we analyze performance differences resulting from various pre-processing techniques and dataset preparation parameters -- offering further insight into model generalization capabilities.
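For reference, the two metrics named in the abstract are commonly defined as below. This is a sketch of the standard textbook definitions, not a statement of how PoseBench3D implements them. The symbols are assumptions introduced here: $J$ is the number of joints, $\hat{p}_j$ and $p_j$ are the predicted and ground-truth 3D positions of joint $j$, and $s$, $R$, $t$ are the scale, rotation, and translation of the similarity transform used for Procrustes alignment.

```latex
% MPJPE: mean Euclidean distance between predicted and ground-truth joints.
\mathrm{MPJPE} = \frac{1}{J} \sum_{j=1}^{J} \left\lVert \hat{p}_j - p_j \right\rVert_2

% Procrustes alignment: best similarity transform in the least-squares sense.
(s^\ast, R^\ast, t^\ast) = \operatorname*{arg\,min}_{s > 0,\; R \in SO(3),\; t \in \mathbb{R}^3}
  \sum_{j=1}^{J} \left\lVert s R \hat{p}_j + t - p_j \right\rVert_2^2

% PA-MPJPE: MPJPE measured after applying that transform to the prediction.
\mathrm{PA\text{-}MPJPE} = \frac{1}{J} \sum_{j=1}^{J}
  \left\lVert s^\ast R^\ast \hat{p}_j + t^\ast - p_j \right\rVert_2
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE is never larger than MPJPE for the same pose pair; it isolates errors in the articulated pose itself.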
Related papers
- Investigating Domain Gaps for Indoor 3D Object Detection [60.55242233729081]
We consider the task of adapting indoor 3D object detectors from one dataset to another. In this paper, we present a benchmark with ScanNet, SUN RGB-D and 3D Front datasets, as well as our newly proposed large-scale datasets ProcTHOR-OD and ProcFront. We conduct experiments on different adaptation scenarios including synthetic-to-real adaptation, point cloud quality adaptation, layout adaptation and instance feature adaptation, analyzing the impact of different domain gaps on 3D object detectors.
arXiv Detail & Related papers (2025-08-24T16:34:19Z) - VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions [12.739233840342958]
VOccl3D is a Video-based human Occlusion dataset with 3D body pose and shape annotations. Inspired by works such as AGORA and BEDLAM, we constructed this dataset using advanced computer graphics rendering techniques.
arXiv Detail & Related papers (2025-08-09T00:13:46Z) - Ensemble-Based Deepfake Detection using State-of-the-Art Models with Robust Cross-Dataset Generalisation [0.0]
Machine learning-based Deepfake detection models have achieved impressive results on benchmark datasets. But their performance often deteriorates significantly when evaluated on out-of-distribution data. In this work, we investigate an ensemble-based approach for improving the generalization of deepfake detection systems.
arXiv Detail & Related papers (2025-07-08T13:54:48Z) - Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models [7.61977883644433]
We propose four dimensions to evaluate data quality: professionalism, readability, reasoning, and cleanliness. We introduce Meta-rater, a multi-dimensional data selection method that integrates these dimensions with existing quality metrics through learned optimal weightings. Experiments demonstrate that Meta-rater doubles convergence speed for 1.3B parameter models and improves downstream task performance by 3.23, with advantages that scale to models as large as 7.2B parameters.
arXiv Detail & Related papers (2025-04-19T06:12:33Z) - Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation [8.365430750061506]
RGB-based 3D pose estimation methods have been successful with the development of deep learning. Most existing methods do not operate well for testing images whose distribution is far from that of training data. In this paper, we introduce an unsupervised domain adaptation framework for 3D pose estimation.
arXiv Detail & Related papers (2025-01-14T19:56:43Z) - EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event Sequences (EvS) refer to sequential data characterized by irregular sampling intervals and a mix of categorical and numerical features. EBES is a comprehensive benchmark for EvS classification with sequence-level targets. It features standardized evaluation scenarios and protocols, along with an open-source PyTorch library that implements 9 modern models.
arXiv Detail & Related papers (2024-10-04T13:03:43Z) - What is the Right Notion of Distance between Predict-then-Optimize Tasks? [35.842182348661076]
We show that traditional dataset distances, which rely solely on feature and label dimensions, lack informativeness in the Predict-then-Optimize (PtO) context.
We propose a new dataset distance that incorporates the impacts of downstream decisions.
Our results show that this decision-aware dataset distance effectively captures adaptation success in PtO contexts.
arXiv Detail & Related papers (2024-09-11T04:13:17Z) - SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities [55.87169702896249]
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. We present a complete and fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment. Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications.
arXiv Detail & Related papers (2024-07-16T12:52:29Z) - Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z) - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z) - 3D Adversarial Augmentations for Robust Out-of-Domain Predictions [115.74319739738571]
We focus on improving the generalization to out-of-domain data.
We learn a set of vectors that deform the objects in an adversarial fashion.
We perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model.
arXiv Detail & Related papers (2023-08-29T17:58:55Z) - Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z) - State-of-the-art Models for Object Detection in Various Fields of Application [0.0]
COCO minival, COCO test, Pascal VOC 2007, ADE20K, and ImageNet are reviewed.
The datasets are handpicked after closely comparing them with the rest in terms of diversity, quality of data, minimal bias, labeling quality etc.
It lists the top models and their optimal use cases for each of the respective datasets.
arXiv Detail & Related papers (2022-11-01T20:25:32Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation [24.009674750548303]
Testing a pre-trained 3D pose estimator on a new dataset results in a major performance drop.
We propose AdaptPose, an end-to-end framework that generates synthetic 3D human motions from a source dataset.
Our method outperforms previous work in cross-dataset evaluations by 14% and previous semi-supervised learning methods that use partial 3D annotations by 16%.
arXiv Detail & Related papers (2021-12-22T00:27:52Z) - Post-hoc Models for Performance Estimation of Machine Learning Inference [22.977047604404884]
Estimating how well a machine learning model performs during inference is critical in a variety of scenarios.
We systematically generalize performance estimation to a diverse set of metrics and scenarios.
We find that proposed post-hoc models consistently outperform the standard confidence baselines.
arXiv Detail & Related papers (2021-10-06T02:20:37Z) - Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation [97.93687743378106]
Existing 3D pose estimation models suffer a performance drop when applied to new scenarios with unseen poses.
We propose a novel framework, Inference Stage Optimization (ISO), for improving the generalizability of 3D pose models.
Remarkably, it yields new state-of-the-art of 83.6% 3D PCK on MPI-INF-3DHP, improving upon the previous best result by 9.7%.
arXiv Detail & Related papers (2020-07-04T09:45:18Z) - Novel Human-Object Interaction Detection via Adversarial Domain Generalization [103.55143362926388]
We study the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios.
The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations.
We propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction.
arXiv Detail & Related papers (2020-05-22T22:02:56Z)