Poseur: Direct Human Pose Regression with Transformers
- URL: http://arxiv.org/abs/2201.07412v1
- Date: Wed, 19 Jan 2022 04:31:57 GMT
- Title: Poseur: Direct Human Pose Regression with Transformers
- Authors: Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang and Anton van den Hengel
- Abstract summary: We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
- Score: 119.79232258661995
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a direct, regression-based approach to 2D human pose estimation
from single images. We formulate the problem as a sequence prediction task,
which we solve using a Transformer network. This network directly learns a
regression mapping from images to the keypoint coordinates, without resorting
to intermediate representations such as heatmaps. This approach avoids much of
the complexity associated with heatmap-based approaches. To overcome the
feature misalignment issues of previous regression-based methods, we propose an
attention mechanism that adaptively attends to the features that are most
relevant to the target keypoints, considerably improving the accuracy.
Importantly, our framework is end-to-end differentiable, and naturally learns
to exploit the dependencies between keypoints. Experiments on MS-COCO and MPII,
two predominant pose-estimation datasets, demonstrate that our method
significantly improves upon the state-of-the-art in regression-based pose
estimation. More notably, ours is the first regression-based approach to
perform favorably compared to the best heatmap-based pose estimation methods.
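The abstract contrasts heatmap-based decoding with direct regression of keypoint coordinates, and describes attending to the features most relevant to each target keypoint. The NumPy sketch below illustrates both ideas under simplifying assumptions. It is not Poseur's implementation: the refinement step stands in for the paper's learned attention with plain bilinear sampling of a feature map at the current keypoint estimate, followed by a hypothetical linear offset head `w_off`. The names `heatmap_argmax`, `bilinear_sample` and `refine_step` are illustrative only.

```python
import numpy as np

def heatmap_argmax(heatmaps: np.ndarray) -> np.ndarray:
    """Heatmap-style decoding: (K, H, W) score maps -> (K, 2) integer-grid (x, y) peaks."""
    K, H, W = heatmaps.shape
    flat_idx = heatmaps.reshape(K, -1).argmax(axis=1)
    # Row-major flattening: index = y * W + x, so the result is quantized to the grid.
    return np.stack([flat_idx % W, flat_idx // W], axis=1).astype(float)

def bilinear_sample(feat: np.ndarray, xy: np.ndarray) -> np.ndarray:
    """Sample a (C, H, W) feature map at continuous (x, y) points of shape (K, 2) -> (K, C)."""
    C, H, W = feat.shape
    x = np.clip(xy[:, 0], 0, W - 1)
    y = np.clip(xy[:, 1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0  # fractional offsets in [0, 1)
    # Each feat[:, yi, xi] term has shape (C, K); weights broadcast over the K axis.
    out = (feat[:, y0, x0] * (1 - wx) * (1 - wy) + feat[:, y0, x1] * wx * (1 - wy)
           + feat[:, y1, x0] * (1 - wx) * wy + feat[:, y1, x1] * wx * wy)
    return out.T  # (C, K) -> (K, C)

def refine_step(feat: np.ndarray, xy: np.ndarray, w_off: np.ndarray) -> np.ndarray:
    """Read features at each current keypoint estimate and add an offset
    predicted by a hypothetical linear head w_off of shape (C, 2)."""
    return xy + bilinear_sample(feat, xy) @ w_off

# Toy check: every location stores its offset to a target at (5.0, 2.0),
# so one refinement step from (1.5, 6.5) lands on the sub-pixel target.
ys, xs = np.meshgrid(np.arange(8.0), np.arange(8.0), indexing="ij")
feat = np.stack([5.0 - xs, 2.0 - ys])
print(refine_step(feat, np.array([[1.5, 6.5]]), np.eye(2)))  # [[5. 2.]]
```

The contrast is the point of the sketch: `heatmap_argmax` can only return grid positions and needs a dense H×W map per keypoint, whereas the regression path outputs continuous coordinates and only reads features near each estimate.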
Related papers
- Improving Robustness for Pose Estimation via Stable Heatmap Regression [19.108116394510258]
A heatmap regression method is proposed to alleviate network vulnerability to small perturbations.
A maximum stability training loss is used to reduce the difficulty of optimization.
The proposed method achieves a significant advance in robustness over state-of-the-art approaches on two benchmark datasets.
(arXiv, 2021-05-08T03:07:05Z)
- Pose Recognition with Cascade Transformers [31.7059023190426]
We present a regression-based pose recognition method using Transformers.
Heatmap-based methods generally achieve higher accuracy but are subject to various heuristic designs, whereas regression-based methods involve fewer non-differentiable intermediate steps.
In the experiments, we report competitive results for pose recognition compared with existing regression-based methods.
(arXiv, 2021-04-14T17:00:22Z)
- Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework, which was previously inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR).
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z) - TFPose: Direct Human Pose Estimation with Transformers [83.03424247905869]
We formulate the pose estimation task as a sequence prediction problem that can be solved effectively by transformers.
Our framework is simple and direct, bypassing the drawbacks of the heatmap-based pose estimation.
Experiments on the MS-COCO and MPII datasets demonstrate that our method can significantly improve the state of the art in regression-based pose estimation.
arXiv Detail & Related papers (2021-03-29T04:18:54Z) - End-to-End Trainable Multi-Instance Pose Estimation with Transformers [68.93512627479197]
We propose a new end-to-end trainable approach for multi-instance pose estimation by combining a convolutional neural network with a transformer.
Inspired by recent work on end-to-end trainable object detection with transformers, we use a transformer encoder-decoder architecture together with a bipartite matching scheme to directly regress the pose of all individuals in a given image.
Our model, called POse Estimation Transformer (POET), is trained using a novel set-based global loss that consists of a keypoint loss, a keypoint visibility loss, a center loss and a class loss.
arXiv Detail & Related papers (2021-03-22T18:19:22Z) - Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, a distinct visual feature is extracted for each guided point by the localization subnet.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z) - Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive
Keypoint Estimates [76.51095823248104]
We present several schemes, rarely or not thoroughly studied before, for improving keypoint detection and grouping (keypoint regression) performance.
First, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of treating heatmaps and regression separately, which improves keypoint regression.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
(arXiv, 2020-06-28)
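The POET entry in the list above mentions directly regressing every person's pose and training with a bipartite matching scheme. A minimal sketch of the matching step follows. It assumes each pose is a list of (x, y) tuples, uses only an L1 keypoint cost (a simplification: per the summary, POET's loss also has visibility, center and class terms), and replaces the Hungarian algorithm with brute-force search, which is only feasible for a handful of poses. Practical code would typically call something like `scipy.optimize.linear_sum_assignment`. The names `pose_l1` and `match_poses` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float]]  # one (x, y) pair per keypoint

def pose_l1(pred: Pose, gt: Pose) -> float:
    """Sum of absolute coordinate differences over corresponding keypoints."""
    return sum(abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt))

def match_poses(preds: List[Pose], gts: List[Pose]) -> List[Tuple[int, int]]:
    """Return (pred_idx, gt_idx) pairs with minimal total L1 cost.

    Brute force over one-to-one assignments; assumes len(preds) >= len(gts),
    as in set prediction where the model emits a fixed number of pose slots.
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground-truth poses")
    best_cost, best = float("inf"), []
    # Each permutation picks a distinct prediction index for every ground-truth pose.
    for chosen in permutations(range(len(preds)), len(gts)):
        cost = sum(pose_l1(preds[p], gts[g]) for g, p in enumerate(chosen))
        if cost < best_cost:
            best_cost, best = cost, [(p, g) for g, p in enumerate(chosen)]
    return best

preds = [[(10, 10), (12, 14)], [(50, 50), (52, 55)], [(0, 0), (1, 1)]]
gts = [[(51, 49), (53, 56)], [(9, 11), (12, 15)]]
print(match_poses(preds, gts))  # [(1, 0), (0, 1)]
```

Unmatched predictions (here index 2) would be treated as "no person" slots in a set-based loss; only matched pairs contribute a keypoint regression term.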