Efficient, Self-Supervised Human Pose Estimation with Inductive Prior
Tuning
- URL: http://arxiv.org/abs/2311.02815v1
- Date: Mon, 6 Nov 2023 01:19:57 GMT
- Title: Efficient, Self-Supervised Human Pose Estimation with Inductive Prior
Tuning
- Authors: Nobline Yoo, Olga Russakovsky
- Abstract summary: We analyze the relationship between reconstruction quality and pose estimation accuracy.
We develop a model pipeline that outperforms the baseline, using less than one-third the amount of training data.
We show that a combination of well-engineered reconstruction losses and inductive priors can help coordinate pose learning alongside reconstruction.
- Score: 30.256493625913127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of 2D human pose estimation (HPE) is to localize anatomical
landmarks, given an image of a person in a pose. SOTA techniques make use of
thousands of labeled figures (finetuning transformers or training deep CNNs),
acquired using labor-intensive crowdsourcing. On the other hand,
self-supervised methods re-frame the HPE task as a reconstruction problem,
enabling them to leverage the vast amount of unlabeled visual data, though at
the present cost of accuracy. In this work, we explore ways to improve
self-supervised HPE. We (1) analyze the relationship between reconstruction
quality and pose estimation accuracy, (2) develop a model pipeline that
outperforms the baseline which inspired our work, using less than one-third the
amount of training data, and (3) offer a new metric suitable for
self-supervised settings that measures the consistency of predicted body part
length proportions. We show that a combination of well-engineered
reconstruction losses and inductive priors can help coordinate pose learning
alongside reconstruction in a self-supervised paradigm.
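The proposed metric measures the consistency of predicted body-part length proportions. The paper's exact definition is not given here, but a minimal sketch of one plausible formulation follows: compute limb lengths from predicted keypoints, normalize by a reference limb, and measure how much those ratios vary across images (lower variation means more anatomically consistent predictions). The skeleton topology (`LIMBS`), the reference limb, and the function names are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical skeleton: pairs of keypoint indices forming limbs
# (illustrative only; the paper's skeleton definition may differ).
LIMBS = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]
REFERENCE_LIMB = 0  # normalize all lengths by the first limb

def limb_lengths(keypoints):
    """keypoints: (num_joints, 2) array of predicted 2D coordinates."""
    return np.array([np.linalg.norm(keypoints[a] - keypoints[b])
                     for a, b in LIMBS])

def proportion_consistency(predictions):
    """predictions: (num_images, num_joints, 2) predicted keypoints.

    Returns the mean standard deviation of limb-length ratios across
    images; lower values indicate more consistent body proportions.
    """
    ratios = []
    for kp in predictions:
        lengths = limb_lengths(kp)
        ratios.append(lengths / (lengths[REFERENCE_LIMB] + 1e-8))
    return float(np.mean(np.std(np.array(ratios), axis=0)))
```

Note that this formulation is scale-invariant: uniformly rescaling a pose leaves the ratios, and hence the score, unchanged, which is what makes such a metric usable without ground-truth labels.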
Related papers
- Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing [42.371736670824575]
We propose a novel framework that combines visual pose estimators with bioimpedance sensing to capture the 3D pose of people by taking self-contact into account.
We validate our approach using a new dataset of synchronized RGB video, bioimpedance measurements, and 3D motion capture.
We also present a miniature wearable bioimpedance sensor that enables efficient large-scale collection of contact-aware training data.
arXiv Detail & Related papers (2025-12-04T14:45:38Z)
- 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information [2.457872341625575]
A novel Semantic Graph Attention Network can benefit from the ability of self-attention to capture global context.
A Body Part Decoder assists in extracting and refining the information related to specific segments of the body.
A Geometry Loss makes a critical constraint on the structural skeleton of the body, ensuring that the model's predictions adhere to the natural limits of human posture.
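A geometry loss of this kind typically penalizes predictions that violate skeletal structure. The paper's exact loss is not reproduced here; as one common hypothetical instance, the sketch below penalizes left/right bone-length asymmetry, a standard structural prior on the human skeleton. The limb pairings (`SYMMETRIC_LIMBS`) and function name are assumptions for illustration.

```python
import numpy as np

# Hypothetical left/right limb pairs (keypoint index pairs) whose
# lengths should match under bilateral symmetry; indices illustrative.
SYMMETRIC_LIMBS = [((0, 1), (2, 3)), ((1, 4), (3, 5))]

def symmetry_geometry_loss(joints):
    """joints: (num_joints, 3) predicted 3D joint positions.

    Sums the absolute difference between each left limb's length and
    its right counterpart's; zero for a perfectly symmetric skeleton.
    """
    total = 0.0
    for (la, lb), (ra, rb) in SYMMETRIC_LIMBS:
        left = np.linalg.norm(joints[la] - joints[lb])
        right = np.linalg.norm(joints[ra] - joints[rb])
        total += abs(left - right)
    return total
```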
arXiv Detail & Related papers (2024-06-03T10:59:00Z)
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervision to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator.
arXiv Detail & Related papers (2022-03-29T14:45:53Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation [1.1501261942096426]
We introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation.
Our model is able to capture the long-range dependencies between body joints.
Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-11-01T13:48:55Z)
- THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3d reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z)
- Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3D human model, GHUM, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations: forward-kinematics, camera-projection, and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.