Related papers: Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models

URL: http://arxiv.org/abs/2512.00572v2
Date: Thu, 04 Dec 2025 14:45:21 GMT
Title: Integrating Skeleton Based Representations for Robust Yoga Pose Classification Using Deep Learning Models
Authors: Mohammed Mohiuddin, Syed Mohammod Minhaz Hossain, Sumaiya Khanam, Prionkar Barua, Aparup Barua, MD Tamim Hossain,
Abstract summary: We introduce a curated dataset, 'Yoga-16', which addresses limitations of existing datasets. We systematically evaluate three deep learning architectures (VGG16, ResNet50, and Xception) using three input modalities (direct images, MediaPipe Pose skeleton images, and YOLOv8 Pose skeleton images). Experiments demonstrate that skeleton-based representations outperform raw image inputs, with the highest accuracy of 96.09% achieved by VGG16 with MediaPipe Pose skeleton input.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Yoga is a popular form of exercise worldwide due to its spiritual and physical health benefits, but incorrect postures can lead to injuries. Automated yoga pose classification has therefore gained importance to reduce reliance on expert practitioners. While human pose keypoint extraction models have shown high potential in action recognition, systematic benchmarking for yoga pose recognition remains limited, as prior works often focus solely on raw images or a single pose extraction model. In this study, we introduce a curated dataset, 'Yoga-16', which addresses limitations of existing datasets, and systematically evaluate three deep learning architectures (VGG16, ResNet50, and Xception), using three input modalities (direct images, MediaPipe Pose skeleton images, and YOLOv8 Pose skeleton images). Our experiments demonstrate that skeleton-based representations outperform raw image inputs, with the highest accuracy of 96.09% achieved by VGG16 with MediaPipe Pose skeleton input. Additionally, we provide interpretability analysis using Grad-CAM, offering insights into model decision-making for yoga pose classification with cross-validation analysis.
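To make the "skeleton image" input modality concrete, the sketch below rasterizes a set of 2D keypoints and limb connections onto a small binary grid. It is only an illustration and not the authors' pipeline: in the paper the keypoints come from MediaPipe Pose or YOLOv8 Pose, whereas here the five-point toy skeleton, the edge list, and the function name render_skeleton are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # normalized (x, y), each in [0, 1]


def render_skeleton(
    keypoints: Sequence[Point],
    edges: Sequence[Tuple[int, int]],
    size: int = 32,
) -> List[List[int]]:
    """Rasterize a skeleton onto a size x size binary grid (1 = limb pixel)."""
    canvas = [[0] * size for _ in range(size)]

    def to_pixel(p: Point) -> Tuple[int, int]:
        # Clamp so that a coordinate of exactly 1.0 still lands inside the grid.
        col = min(size - 1, max(0, int(p[0] * size)))
        row = min(size - 1, max(0, int(p[1] * size)))
        return row, col

    for a, b in edges:
        r0, c0 = to_pixel(keypoints[a])
        r1, c1 = to_pixel(keypoints[b])
        # Walk along the limb in enough steps to touch every pixel on the way.
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for s in range(steps + 1):
            t = s / steps
            row = round(r0 + t * (r1 - r0))
            col = round(c0 + t * (c1 - c0))
            canvas[row][col] = 1
    return canvas


# Hypothetical toy skeleton (not a real pose-model topology):
# 0 = head, 1 = hip, 2 = left hand, 3 = right hand, 4 = feet.
TOY_KEYPOINTS = [(0.5, 0.1), (0.5, 0.5), (0.2, 0.3), (0.8, 0.3), (0.5, 0.9)]
TOY_EDGES = [(0, 1), (1, 4), (2, 3)]

image = render_skeleton(TOY_KEYPOINTS, TOY_EDGES, size=16)
for row in image:
    print("".join("#" if v else "." for v in row))
```

An image of this kind keeps only the body configuration and discards background, clothing, and lighting, which is one plausible reason a classifier trained on it could be more robust than one trained on raw photos.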

Related papers

Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis [0.6524460254566905]
This study aims to assess the effectiveness of Contrastive Language-Image Pretraining (CLIP) in classifying human postures. Applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The fine-tuned CLIP model, tested on 3,826 images, achieves an accuracy of over 85%.
arXiv Detail & Related papers (2025-01-13T11:20:44Z)
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding [0.0]
3DYoga90 is organized within a three-level label hierarchy. Our dataset includes meticulously curated RGB yoga pose videos and 3D skeleton sequences.
arXiv Detail & Related papers (2023-10-16T07:15:31Z)
Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation [72.50214227616728]
Several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one. We study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments. We design an adversarial strategy focusing on generating natural appearance changes of the subject, and against which we could expect a disentangled network to be robust.
arXiv Detail & Related papers (2023-09-20T22:22:21Z)
An Efficient Deep Convolutional Neural Network Model For Yoga Pose Recognition Using Single Images [2.6717276381722033]
This paper presents YPose, an efficient deep convolutional neural network (CNN) model to recognize yoga asanas from RGB images. The proposed model has been tested on the Yoga-82 dataset.
arXiv Detail & Related papers (2023-06-27T19:34:46Z)
LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body. It is fully differentiable and optimizable with disentangled shape and pose latent spaces. Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame. Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information. Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
FixMyPose: Pose Correctional Captioning and Retrieval [67.20888060019028]
We introduce a new captioning dataset named FixMyPose to address automated pose correction systems. We collect descriptions of correcting a "current" pose to look like a "target" pose. To avoid ML biases, we maintain a balance across characters with diverse demographics.
arXiv Detail & Related papers (2021-04-04T21:45:44Z)
Yoga-82: A New Dataset for Fine-grained Classification of Human Poses [46.319423568714505]
We present a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes. Yoga-82 consists of complex poses where fine annotations may not be possible. The dataset contains a three-level hierarchy including body positions, variations in body positions, and the actual pose names.
arXiv Detail & Related papers (2020-04-22T01:43:44Z)
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames. Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images [42.27703025887059]
A main problem with the standard supervised approach is that it often yields anatomically implausible poses. We propose a semi-supervised method that can make effective use of images with and without pose annotations. The results of experiments show that the proposed reflective architecture makes estimated poses anatomically plausible.
arXiv Detail & Related papers (2020-04-08T05:02:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.