An Efficient Deep Convolutional Neural Network Model For Yoga Pose
Recognition Using Single Images
- URL: http://arxiv.org/abs/2306.15768v1
- Date: Tue, 27 Jun 2023 19:34:46 GMT
- Authors: Santosh Kumar Yadav, Apurv Shukla, Kamlesh Tiwari, Hari Mohan Pandey,
Shaik Ali Akbar
- Abstract summary: This paper presents YPose, an efficient deep convolutional neural network (CNN) model to recognize yoga asanas from RGB images.
The proposed model has been tested on the Yoga-82 dataset.
- Score: 2.6717276381722033
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pose recognition deals with designing algorithms to locate human body joints
in a 2D/3D space and run inference on the estimated joint locations for
predicting the poses. Yoga poses include some very complex postures, which
pose challenges for computer vision algorithms such as occlusion,
inter-class similarity, intra-class variability, and viewpoint complexity.
This paper presents YPose, an efficient deep convolutional neural network (CNN)
model to recognize yoga asanas from RGB images. The proposed model consists of
four steps: (a) first, the region of interest (ROI) is extracted from the
original images using segmentation-based approaches; (b) second, these
refined images are passed to a CNN architecture based
on the backbone of EfficientNets for feature extraction; (c) third, dense
refinement blocks, adapted from the architecture of densely connected networks,
are added to learn more diversified features; and (d) fourth, global average
pooling and fully connected layers are applied for the classification of the
multi-level hierarchy of the yoga poses. The proposed model has been tested on
the Yoga-82 dataset. It is a publicly available benchmark dataset for yoga pose
recognition. Experimental results show that the proposed model achieves
state-of-the-art performance on this dataset, with an accuracy of 93.28%, an
improvement of approximately 13.9 percentage points over the earlier
state of the art (79.35%). The code will be made publicly available.
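The last two steps of the pipeline described in the abstract (dense refinement blocks, then global average pooling and fully connected layers over a multi-level hierarchy) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the ROI segmentation and the EfficientNet backbone are replaced by a random stand-in feature map, all weights are random placeholders, the function names are hypothetical, and the class counts 6, 20 and 82 are an assumption based on the three-level hierarchy of the Yoga-82 dataset rather than on anything stated in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed class counts for the three hierarchy levels (not given in the abstract).
LEVEL_CLASSES = {"level1": 6, "level2": 20, "level3": 82}


def dense_block(x, num_layers=3, growth=8):
    """Toy dense block: each layer sees the concatenation of all earlier outputs.

    x has shape (H, W, C). A 1x1 convolution is modelled as a per-pixel
    matrix multiply followed by ReLU; weights are random placeholders.
    """
    features = x
    for _ in range(num_layers):
        w = rng.standard_normal((features.shape[-1], growth)) * 0.1
        new = np.maximum(features @ w, 0.0)  # 1x1 conv + ReLU
        # Dense connectivity: append the new maps to everything seen so far.
        features = np.concatenate([features, new], axis=-1)
    return features


def global_average_pool(x):
    """Average over the spatial axes: (H, W, C) -> (C,)."""
    return x.mean(axis=(0, 1))


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def classify(pooled):
    """One fully connected softmax head per hierarchy level."""
    out = {}
    for name, n_classes in LEVEL_CLASSES.items():
        w = rng.standard_normal((pooled.shape[0], n_classes)) * 0.1
        out[name] = softmax(pooled @ w)
    return out


# Stand-in for a backbone feature map (7x7 spatial grid, 32 channels).
backbone_features = rng.standard_normal((7, 7, 32))
refined = dense_block(backbone_features)
pooled = global_average_pool(refined)
probs = classify(pooled)
```

With the default settings, the dense block grows the channel dimension from 32 to 56 (three layers adding 8 channels each), pooling collapses the 7x7 grid to a single 56-dimensional vector, and each head returns a probability distribution over its own level of the hierarchy.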
Related papers
- Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization.
arXiv Detail & Related papers (2023-07-29T20:46:44Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- 3D Multi-Object Tracking with Differentiable Pose Estimation [0.0]
We propose a novel approach for joint 3D multi-object tracking and reconstruction from RGB-D sequences in indoor environments.
We leverage those correspondences to inform a graph neural network to solve for the optimal, temporally-consistent 7-DoF pose trajectories of all objects.
Our method improves the accumulated MOTA score for all test sequences by 24.8% over existing state-of-the-art methods.
arXiv Detail & Related papers (2022-06-28T06:46:32Z)
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
- Wise-SrNet: A Novel Architecture for Enhancing Image Classification by Learning Spatial Resolution of Feature Maps [0.5892638927736115]
One of the main challenges since the advancement of convolutional neural networks is how to connect the extracted feature map to the final classification layer.
In this paper, we aim to tackle this problem by replacing the GAP layer with a new architecture called Wise-SrNet.
It is inspired by the depthwise convolutional idea and is designed for processing spatial resolution while not increasing computational cost.
arXiv Detail & Related papers (2021-04-26T00:37:11Z)
- Road Segmentation for Remote Sensing Images using Adversarial Spatial Pyramid Networks [28.32775611169636]
We introduce a new model to apply structured domain adaption for synthetic image generation and road segmentation.
A novel scale-wise architecture is introduced to learn from the multi-level feature maps and improve the semantics of the features.
Our model achieves a state-of-the-art 78.86 IOU on the Massachusetts dataset with 14.89M parameters and 86.78B FLOPs, with 4x fewer FLOPs but higher accuracy (+3.47% IOU).
arXiv Detail & Related papers (2020-08-10T11:00:19Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Yoga-82: A New Dataset for Fine-grained Classification of Human Poses [46.319423568714505]
We present a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes.
Yoga-82 consists of complex poses where fine annotations may not be possible.
The dataset contains a three-level hierarchy including body positions, variations in body positions, and the actual pose names.
arXiv Detail & Related papers (2020-04-22T01:43:44Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
- Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from local streams are aggregated to enhance the robust prediction of the rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)