Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach
- URL: http://arxiv.org/abs/2405.20084v1
- Date: Thu, 30 May 2024 14:14:39 GMT
- Title: Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach
- Authors: Muhammad Saif Ullah Khan, Dhavalkumar Limbachiya, Didier Stricker, Muhammad Zeshan Afzal
- Abstract summary: We propose a novel approach integrating multi-teacher knowledge distillation with a unified skeleton representation.
Our networks are jointly trained on the COCO and MPII datasets, containing 17 and 16 keypoints, respectively.
Our joint models achieved an average accuracy of 70.89 and 76.40, compared to 53.79 and 55.78 when trained on a single dataset and evaluated on both.
- Score: 12.042768320132694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human pose estimation is a key task in computer vision with various applications such as activity recognition and interactive systems. However, the lack of consistency in the annotated skeletons across different datasets poses challenges in developing universally applicable models. To address this challenge, we propose a novel approach integrating multi-teacher knowledge distillation with a unified skeleton representation. Our networks are jointly trained on the COCO and MPII datasets, containing 17 and 16 keypoints, respectively. We demonstrate enhanced adaptability by predicting an extended set of 21 keypoints, 4 (COCO) and 5 (MPII) more than original annotations, improving cross-dataset generalization. Our joint models achieved an average accuracy of 70.89 and 76.40, compared to 53.79 and 55.78 when trained on a single dataset and evaluated on both. Moreover, we also evaluate all 21 predicted points by our two models by reporting an AP of 66.84 and 72.75 on the Halpe dataset. This highlights the potential of our technique to address one of the most pressing challenges in pose estimation research and application - the inconsistency in skeletal annotations.
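The abstract's core idea can be sketched in code: map each dataset's native keypoints into a shared 21-keypoint skeleton, then train a student against both the (masked) annotations and the averaged teacher predictions. The index maps, the joint ordering, and the loss weighting below are illustrative assumptions, not the paper's actual correspondences or formulation.

```python
import numpy as np

# Hypothetical index maps from each dataset's joints into a shared
# 21-slot skeleton; the paper defines the real correspondences.
COCO_TO_UNIFIED = list(range(17))      # 17 COCO joints -> slots 0..16
MPII_TO_UNIFIED = list(range(5, 21))   # 16 MPII joints -> slots 5..20
NUM_UNIFIED = 21

def to_unified(keypoints, index_map, num_unified=NUM_UNIFIED):
    """Scatter a dataset-specific (K, 2) keypoint array into the unified
    skeleton; the boolean mask marks which slots received supervision."""
    unified = np.zeros((num_unified, 2))
    mask = np.zeros(num_unified, dtype=bool)
    for src, dst in enumerate(index_map):
        unified[dst] = keypoints[src]
        mask[dst] = True
    return unified, mask

def distill_loss(student_pred, teacher_preds, gt, gt_mask, alpha=0.5):
    """Combine a masked ground-truth term (only annotated slots) with a
    distillation term against the averaged teachers (a generic multi-teacher
    KD formulation, not necessarily the paper's exact loss)."""
    gt_term = np.mean((student_pred[gt_mask] - gt[gt_mask]) ** 2)
    teacher_mean = np.mean(teacher_preds, axis=0)
    kd_term = np.mean((student_pred - teacher_mean) ** 2)
    return alpha * gt_term + (1 - alpha) * kd_term
```

Because unannotated slots are excluded from the ground-truth term but still covered by the teachers, every unified keypoint receives a training signal regardless of which dataset the sample came from.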
Related papers
- SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches [1.1172147007388977]
SketchMind is a multi-agent framework for evaluating and improving student-drawn scientific sketches.
It comprises modular agents responsible for parsing, sketch perception, cognitive alignment, and iterative feedback with sketch modification.
Experts noted the system's potential to meaningfully support conceptual growth through guided revision.
arXiv Detail & Related papers (2025-06-29T11:35:10Z)
- PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation [38.19172513799442]
PoseBH is a new multi-dataset training framework for pose estimation.
It tackles keypoint heterogeneity and limited supervision through two key techniques.
Our learned keypoint embeddings transfer effectively to hand shape estimation (InterHand2.6M) and human body shape estimation (3DPW).
arXiv Detail & Related papers (2025-05-23T04:58:20Z)
- 8-Calves Image dataset [0.0]
8-Calves is a benchmark for evaluating object detection and identity preservation in temporally consistent environments.
The dataset consists of a 1-hour video of eight Holstein Friesian calves with unique coat patterns and 900 static frames.
arXiv Detail & Related papers (2025-03-17T23:47:52Z)
- Keypoint-Integrated Instruction-Following Data Generation for Enhanced Human Pose and Action Understanding in Multimodal Models [1.9890559505377343]
Current vision-language multimodal models are well-adapted for general visual understanding tasks.
We introduce a method for generating such data by integrating human keypoints with traditional visual features such as captions and bounding boxes.
We fine-tune the LLaVA-1.5-7B model using this dataset and evaluate it on the benchmark, achieving significant improvements.
arXiv Detail & Related papers (2024-09-14T05:07:57Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Multi-Dataset Multi-Task Learning for COVID-19 Prognosis [25.371798627482065]
We introduce a novel multi-dataset multi-task training framework that predicts COVID-19 prognostic outcomes from chest X-rays.
Our framework hypothesizes that assessing severity scores enhances the model's ability to classify prognostic severity groups.
arXiv Detail & Related papers (2024-05-22T15:57:44Z)
- PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers [2.954890575035673]
Data-free quantization can potentially address data privacy and security concerns in model compression.
Recently, PSAQ-ViT designed a relative value metric, patch similarity, to generate data from pre-trained vision transformers (ViTs).
In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs.
arXiv Detail & Related papers (2022-09-13T01:55:53Z)
- I^2R-Net: Intra- and Inter-Human Relation Network for Multi-Person Pose Estimation [30.204633647947293]
We present the Intra- and Inter-Human Relation Networks (I2R-Net) for Multi-Person Pose Estimation.
First, the Intra-Human Relation Module operates on a single person and aims to capture Intra-Human dependencies.
Second, the Inter-Human Relation Module considers the relation between multiple instances and focuses on capturing Inter-Human interactions.
arXiv Detail & Related papers (2022-06-22T07:44:41Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results in in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation [86.41437210485932]
We aim at advancing zero-shot HOI detection to detect both seen and unseen HOIs simultaneously.
We propose a novel end-to-end zero-shot HOI Detection framework via vision-language knowledge distillation.
Our method outperforms the previous SOTA by 8.92% on unseen mAP and 10.18% on overall mAP.
arXiv Detail & Related papers (2022-04-01T07:27:19Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Beyond Tracking: Using Deep Learning to Discover Novel Interactions in Biological Swarms [3.441021278275805]
We propose training deep network models to predict system-level states directly from generic graphical features from the entire view.
Because the resulting predictive models are not based on human-understood predictors, we use explanatory modules.
This represents an example of augmented intelligence in behavioral ecology -- knowledge co-creation in a human-AI team.
arXiv Detail & Related papers (2021-08-20T22:50:41Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Whole-Body Human Pose Estimation in the Wild [88.09875133989155]
COCO-WholeBody extends COCO dataset with whole-body annotations.
It is the first benchmark that has manual annotations on the entire human body.
A single-network model, named ZoomNet, is devised to take into account the hierarchical structure of the full human body.
arXiv Detail & Related papers (2020-07-23T08:35:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.