Unsupervised Human Pose Estimation through Transforming Shape Templates
- URL: http://arxiv.org/abs/2105.04154v1
- Date: Mon, 10 May 2021 07:15:56 GMT
- Title: Unsupervised Human Pose Estimation through Transforming Shape Templates
- Authors: Luca Schmidtke, Athanasios Vlontzos, Simon Ellershaw, Anna Lukens,
Tomoki Arichi, Bernhard Kainz
- Abstract summary: We present a novel method for learning pose estimators for human adults and infants in an unsupervised fashion.
We demonstrate the effectiveness of our approach on two different datasets including adults and infants.
- Score: 2.729524133721473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human pose estimation is a major computer vision problem with applications
ranging from augmented reality and video capture to surveillance and movement
tracking. In the medical context, the latter may be an important biomarker for
neurological impairments in infants. Whilst many methods exist, their
application has been limited by the need for well annotated large datasets and
the inability to generalize to humans of different shapes and body
compositions, e.g. children and infants. In this paper we present a novel
method for learning pose estimators for human adults and infants in an
unsupervised fashion. We approach this as a learnable template matching problem
facilitated by deep feature extractors. Human-interpretable landmarks are
estimated by transforming a template consisting of predefined body parts that
are characterized by 2D Gaussian distributions. Enforcing a connectivity prior
guides our model to meaningful human shape representations. We demonstrate the
effectiveness of our approach on two different datasets including adults and
infants.
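The template idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It only assumes that each body part is a 2D Gaussian with mean `mean` and covariance `cov`, and that it is moved by an affine map x -> A x + t, under which the mean becomes A·mean + t and the covariance becomes A·cov·Aᵀ. All names (`transform_part`, `gaussian_density`, `rotation`, the "upper arm" example values) are hypothetical and chosen for illustration.

```python
import math

def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(a):
    """Transpose a 2x2 matrix."""
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def rotation(theta, scale=1.0):
    """Scaled 2D rotation matrix (one simple choice of affine part A)."""
    c, s = math.cos(theta) * scale, math.sin(theta) * scale
    return [[c, -s], [s, c]]

def transform_part(mean, cov, A, t):
    """Push the Gaussian N(mean, cov) through x -> A x + t."""
    new_mean = [A[0][0] * mean[0] + A[0][1] * mean[1] + t[0],
                A[1][0] * mean[0] + A[1][1] * mean[1] + t[1]]
    new_cov = matmul2(matmul2(A, cov), transpose2(A))
    return new_mean, new_cov

def gaussian_density(p, mean, cov):
    """Evaluate the 2D Gaussian density at point p."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    # Quadratic form d^T cov^{-1} d, using the closed-form 2x2 inverse.
    q = (cov[1][1] * dx * dx
         - (cov[0][1] + cov[1][0]) * dx * dy
         + cov[0][0] * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# Hypothetical template part: an "upper arm" elongated along the x-axis.
template_mean = [0.0, 0.0]
template_cov = [[4.0, 0.0], [0.0, 1.0]]

# Rotate the part by 90 degrees and move it to (10, 5) in the image.
A = rotation(math.pi / 2)
t = [10.0, 5.0]
part_mean, part_cov = transform_part(template_mean, template_cov, A, t)

# The transformed mean can serve as a landmark estimate for this part.
peak = gaussian_density(part_mean, part_mean, part_cov)
```

In a learned setting, the parameters `A` and `t` for each part would presumably be predicted by a network rather than set by hand; here they are fixed only to show how a template part moves and changes orientation.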
Related papers
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
- Challenges in Video-Based Infant Action Recognition: A Critical Examination of the State of the Art [9.327466428403916]
We introduce a groundbreaking dataset called "InfActPrimitive", encompassing five significant infant milestone action categories.
We conduct an extensive comparative analysis employing cutting-edge skeleton-based action recognition models.
Our findings reveal that, although the PoseC3D model achieves the highest accuracy at approximately 71%, the remaining models struggle to accurately capture the dynamics of infant actions.
arXiv Detail & Related papers (2023-11-21T02:36:47Z)
- Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation [29.037799937729687]
3D human pose estimation has developed impressively in recent years, but only a few works focus on infants, who have different bone lengths and for whom data are limited.
Here, we show that our model attains state-of-the-art MPJPE performance of 43.6 mm on the SyRIP dataset and 21.2 mm on the MINI-RGBD dataset.
We also show that our method, ZeDO-i, can attain efficient domain adaptation even when only a small amount of data is available.
arXiv Detail & Related papers (2023-11-17T20:49:37Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Invariant Representation Learning for Infant Pose Estimation with Small Data [14.91506452479778]
We release a hybrid synthetic and real infant pose dataset with small yet diverse real images as well as generated synthetic infant poses.
In our ablation study, with identical network structure, models trained on the SyRIP dataset show noticeable improvement over those trained on the only other public infant pose datasets.
One of our best infant pose estimation performers on the state-of-the-art DarkPose model shows mean average precision (mAP) of 93.6.
arXiv Detail & Related papers (2020-10-13T01:10:14Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- AMIL: Adversarial Multi Instance Learning for Human Pose Estimation [24.175298058941515]
We present a structure-aware network to discreetly consider priors during the training of the network.
We propose generative adversarial networks as our learning model in which we design two residual multiple instance learning (MIL) models.
The proposed adversarial residual multi-instance neural network that is based on pooling has been validated on two datasets.
arXiv Detail & Related papers (2020-03-18T01:22:16Z)
- Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals [56.65062746564091]
We aim to capture the pose of neuroscience model organisms, without using any manual supervision, to study how neural circuits orchestrate behaviour.
Our key contribution is the explicit and independent modeling of appearance, shape and poses in an unpaired image translation framework.
We demonstrate improved pose estimation accuracy on Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm) and Danio rerio (zebrafish).
arXiv Detail & Related papers (2020-01-23T15:34:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.