Related papers: PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation

PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation

URL: http://arxiv.org/abs/2505.17475v1
Date: Fri, 23 May 2025 04:58:20 GMT
Title: PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation
Authors: Uyoung Jeong, Jonathan Freer, Seungryul Baek, Hyung Jin Chang, Kwang In Kim,
Abstract summary: PoseBH is a new multi-dataset training framework for pose estimation.<n>It tackles keypoint heterogeneity and limited supervision through two key techniques.<n>Our learned keypoint embeddings transfer effectively to hand shape estimation (InterHand2.6M) and human body shape estimation (3DPW)
Score: 38.19172513799442
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We study multi-dataset training (MDT) for pose estimation, where skeletal heterogeneity presents a unique challenge that existing methods have yet to address. In traditional domains, \eg regression and classification, MDT typically relies on dataset merging or multi-head supervision. However, the diversity of skeleton types and limited cross-dataset supervision complicate integration in pose estimation. To address these challenges, we introduce PoseBH, a new MDT framework that tackles keypoint heterogeneity and limited supervision through two key techniques. First, we propose nonparametric keypoint prototypes that learn within a unified embedding space, enabling seamless integration across skeleton types. Second, we develop a cross-type self-supervision mechanism that aligns keypoint predictions with keypoint embedding prototypes, providing supervision without relying on teacher-student models or additional augmentations. PoseBH substantially improves generalization across whole-body and animal pose datasets, including COCO-WholeBody, AP-10K, and APT-36K, while preserving performance on standard human pose benchmarks (COCO, MPII, and AIC). Furthermore, our learned keypoint embeddings transfer effectively to hand shape estimation (InterHand2.6M) and human body shape estimation (3DPW). The code for PoseBH is available at: https://github.com/uyoung-jeong/PoseBH.

Related papers

A New Teacher-Reviewer-Student Framework for Semi-supervised 2D Human Pose Estimation [33.01458098153753]
We propose a novel semi-supervised 2D human pose estimation method by utilizing a newly designed Teacher-Reviewer-Student framework.<n>Specifically, we first mimic the phenomenon that human beings constantly review previous knowledge for consolidation to design our framework.<n> Secondly, we introduce a Multi-level Feature Learning strategy, which utilizes the outputs from different stages of the backbone to estimate the heatmap to guide network training.
arXiv Detail & Related papers (2025-01-16T14:40:02Z)
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation. Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process. Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach [12.042768320132694]
We propose a novel approach integrating multi-teacher knowledge distillation with a unified skeleton representation. Our networks are jointly trained on the COCO and MPII datasets, containing 17 and 16 keypoints, respectively. Our joint models achieved an average accuracy of 70.89 and 76.40, compared to 53.79 and 55.78 when trained on a single dataset and evaluated on both.
arXiv Detail & Related papers (2024-05-30T14:14:39Z)
Effective Whole-body Pose Estimation with Two-stages Distillation [52.92064408970796]
Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an image. We present a two-stage pose textbfDistillation for textbfWhole-body textbfPose estimators, named textbfDWPose, to improve their effectiveness and efficiency.
arXiv Detail & Related papers (2023-07-29T03:49:28Z)
Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module [74.80776648785897]
The previous method ignored two problems: (i) When conducting interactive training between large model and lightweight model, the pseudo label of lightweight model will be used to guide large models. We propose a semi-supervised 2D human pose estimation framework driven by a position inconsistency pseudo label correction module (SSPCM) To further improve the performance of the student model, we use the semi-supervised Cut-Occlude based on pseudo keypoint perception to generate more hard and effective samples.
arXiv Detail & Related papers (2023-03-08T02:57:05Z)
MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model [27.849059115252008]
We propose a novel framework of single-stage instance-aware pose estimation by modeling the joint distribution of human keypoints. Our MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints.
arXiv Detail & Related papers (2023-02-17T08:29:33Z)
2D Human Pose Estimation with Explicit Anatomical Keypoints Structure Constraints [15.124606575017621]
We present a novel 2D human pose estimation method with explicit anatomical keypoints structure constraints. Our proposed model can be plugged in the most existing bottom-up or top-down human pose estimation methods. Our methods perform favorably against the most existing bottom-up and top-down human pose estimation methods.
arXiv Detail & Related papers (2022-12-05T11:01:43Z)
Pose for Everything: Towards Category-Agnostic Pose Estimation [93.07415325374761]
Category-Agnostic Pose Estimation (CAPE) aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition. A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images. We also introduce Multi-category Pose (MP-100) dataset, which is a 2D pose dataset of 100 object categories containing over 20K instances and is well-designed for developing CAPE algorithms.
arXiv Detail & Related papers (2022-07-21T09:40:54Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation [21.250031729596085]
Hand pose estimation is more challenging than body pose estimation due to severe articulation, self-occlusion and high dexterity of the hand. We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly.
arXiv Detail & Related papers (2020-01-24T03:27:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.