Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?
- URL: http://arxiv.org/abs/2107.03332v1
- Date: Wed, 7 Jul 2021 16:20:12 GMT
- Title: Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?
- Authors: Yanjie Li, Sen Yang, Shoukui Zhang, Zhicheng Wang, Wankou Yang,
Shu-Tao Xia, Erjin Zhou
- Abstract summary: We propose a Simple yet promising Disentangled Representation for keypoint coordinate (SimDR).
In detail, we propose to disentangle the representation of horizontal and vertical coordinates for keypoint location, leading to a more efficient scheme without extra upsampling and refinement.
- Score: 44.313782042852246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The 2D heatmap representation has dominated human pose estimation for years
due to its high performance. However, heatmap-based approaches have some
drawbacks: 1) The performance drops dramatically in the low-resolution images,
which are frequently encountered in real-world scenarios. 2) To improve the
localization precision, multiple upsample layers may be needed to recover the
feature map resolution from low to high, which are computationally expensive.
3) Extra coordinate refinement is usually necessary to reduce the quantization
error of downscaled heatmaps. To address these issues, we propose a
Simple yet promising Disentangled Representation for
keypoint coordinate (SimDR), reformulating human keypoint localization
as a task of classification. In detail, we propose to disentangle the
representation of horizontal and vertical coordinates for keypoint location,
leading to a more efficient scheme without extra upsampling and refinement.
Comprehensive experiments conducted over COCO dataset show that the proposed
heatmap-free methods outperform heatmap-based counterparts in all
tested input resolutions, especially in lower resolutions by a large margin.
Code will be made publicly available at https://github.com/leeyegy/SimDR.
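The disentangled representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of plain lists, and the bins-per-pixel factor `k` are assumptions made for illustration and are not stated in the abstract. The idea shown is that each keypoint gets one class label along the horizontal axis and one along the vertical axis, and decoding is an argmax over each 1D score vector.

```python
# Hypothetical sketch of a disentangled (per-axis) keypoint representation.
# All names and the factor `k` are illustrative assumptions, not the paper's API.

def encode(x, y, k=2.0):
    """Map a keypoint (x, y) in pixels to one class index per axis.

    `k` is an assumed number of bins per pixel; k > 1 gives sub-pixel bins.
    """
    return round(x * k), round(y * k)


def one_hot(index, length):
    """Build a 1D classification target with a single active bin."""
    return [1.0 if i == index else 0.0 for i in range(length)]


def decode(x_scores, y_scores, k=2.0):
    """Recover (x, y) by taking the argmax of each 1D score vector."""
    x_bin = max(range(len(x_scores)), key=x_scores.__getitem__)
    y_bin = max(range(len(y_scores)), key=y_scores.__getitem__)
    return x_bin / k, y_bin / k


# Usage: a 64x48 input with k = 2 needs 128 horizontal and 96 vertical bins.
width, height, k = 64, 48, 2.0
x_bin, y_bin = encode(10.5, 7.0, k)
x_target = one_hot(x_bin, int(width * k))
y_target = one_hot(y_bin, int(height * k))
recovered = decode(x_target, y_target, k)
```

Under these assumptions the two vectors hold 128 + 96 = 224 scores per keypoint, whereas a dense 2D map at the 64x48 input resolution would hold 3072 cells, which is one way to see why no upsampling stage is needed to reach fine localization.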
Related papers
- SHaRPose: Sparse High-Resolution Representation for Human Pose
Estimation [39.936860590417346]
We propose a framework that only uses Sparse High-resolution Representations for human Pose estimation (SHaRPose)
Our model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and infers at a speed of $1.4\times$ faster than ViTPose-Base.
arXiv Detail & Related papers (2023-12-17T16:29:16Z)
- Attention Map Guided Transformer Pruning for Edge Device [98.42178656762114]
Vision transformer (ViT) has achieved promising success in both holistic and occluded person re-identification (Re-ID) tasks.
We propose a novel attention map guided (AMG) transformer pruning method, which removes both redundant tokens and heads.
Comprehensive experiments on Occluded DukeMTMC and Market-1501 demonstrate the effectiveness of our proposals.
arXiv Detail & Related papers (2023-04-04T01:51:53Z)
- Heatmap Regression via Randomized Rounding [105.75014893647538]
We propose a simple yet effective quantization system to address the sub-pixel localization problem.
The proposed system encodes the fractional part of numerical coordinates into the ground truth heatmap using a probabilistic approach during training.
arXiv Detail & Related papers (2020-09-01T04:54:22Z)
- Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, for each guided point, different visual feature is extracted by the localization.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z)
- Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive
Keypoint Estimates [76.51095823248104]
We present several schemes that are rarely or unthoroughly studied before for improving keypoint detection and grouping (keypoint regression) performance.
First, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of separating them for improving keypoint regression.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
arXiv Detail & Related papers (2020-06-28T01:14:59Z)
- Attentive One-Dimensional Heatmap Regression for Facial Landmark
Detection and Tracking [73.35078496883125]
We propose a novel attentive one-dimensional heatmap regression method for facial landmark localization.
First, we predict two groups of 1D heatmaps to represent the marginal distributions of the x and y coordinates.
Second, a co-attention mechanism is adopted to model the inherent spatial patterns existing in x and y coordinates.
Third, based on the 1D heatmap structures, we propose a facial landmark detector capturing spatial patterns for landmark detection on an image.
arXiv Detail & Related papers (2020-04-05T06:51:22Z)
- Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation [33.71628590745982]
We present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images.
We propose a simple and effective compression method to drastically reduce the size of this representation.
Our method performs favorably when compared to state of the art on both multi-person and single-person 3D human pose estimation datasets.
arXiv Detail & Related papers (2020-04-01T10:37:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.